Journal of Statistical Theory and Applications

Volume 20, Issue 2, June 2021, Pages 355 - 363

Modeling Frailty Using Birnbaum Saunders Distribution for Bivariate Survival Data

Lal Pawimawha*, ORCID
Department of Statistics, Pachhunga University College, Mizoram, 796001, India
*Corresponding author. Email:
Corresponding Author
Lal Pawimawha
Received 23 June 2020, Accepted 18 February 2021, Available Online 12 March 2021.
DOI to use a DOI?
Bayesian model comparison, Birnbaum–Saunders frailty, Exponential power distribution, Pareto distribution, Shared frailty, Weibull distribution

Frailty models are survival models, which are designed to explore the properties of the unobserved heterogeneity in individuals pertaining to disease and death. In this paper, we propose the new frailty model called Birnbaum–Saunders frailty model with Weibull, Pareto, and exponential power as baseline distributions. Estimation of the proposed model parameters was done using Bayesian estimation procedure under Markov chain Monte Carlo technique. A simulation study was also presented to compare the true values and the estimated parameters. We then applied these proposed models to kidney infection data. Models comparison was done using various information criteria and Bayes factor. The best model for the data related to infection occur due to insertion of catheter is suggested.

© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (


Many researchers have used Cox [1] proportional-hazards model to analyze the survival data by considering the study population as homogeneous indicating that all the individuals are considered as having the same risk for getting diseases or death. But, in many practical situations, it may not be possible to consider all the individuals as homogeneous since they are differ from each other in contracting disease due to genetic factor or environmental factor or some other factor, which are generally omitted during the analysis of the survival data. In these conditions, it is not possible to assume the dependency between the survival times. To investigate such data, it is important to explain dependency within the subject in the multiple event times. In such case, it is expected to present an additional random part “frailty” into the proportional hazard model. Here, the random family impact itself is of prime interest in the investigation. These models are becoming alluring and more attractive due to the introduction of extra unobserved variable and name as “frailty,” which is assumed to be shared by a pair of organs.

The frailty model is also known as a random effect model, which is designed to account for heterogeneity amongst the groups of individuals due to unobserved factors. Generally, it is omitted by many researchers. Clayton [2] first proposed the model for such data without using frailty. The term frailty was first used by Vaupel et al. [3]. Keyfitz and Littman [4] also indicate that omitting individual heterogeneity can cause inaccurate conclusions. Balakrishnan and Liu [5] discussed Birnbaum–Saunders (BS) as frailty distribution in semi-parametric frailty model. In addition, BSas frailty distribution is used to analyze medical and cancer data (Leao et al. [68]). Many distributions such as gamma, inverse Gaussian, positive stable distribution, power variance function, Weibull, compound Poisson are also engaged as frailty distribution for modeling heterogeneity in the populations.

Generally, the frailty model is developed as an unobserved random variable is adding multiplicatively on the baseline hazard function (BHF). Considering univariate frailty model, let a continuous random variable Y be a life time of an individual and the random variable V be the frailty variable. The conditional hazard function (CHF) for a given frailty variable V=v at time y>0 is,

where m0(y) is a BHF at time y> 0. X_ and β_ are the vector form of covariates and regression coefficients. The conditional survival function (CSF) for given frailty at time y> 0 is,
where M0(y) is the cumulative BHF at time y> 0.

The marginal survival function(SF) can be obtained by taking integration of Equation (1.3) with respect to frailty variable V having probability density function (pdf) f(v), and is given as

where LV(.) is the Laplace transform (LT) of the frailty variable V.

In this manuscript, we consider right censored data with BS as the frailty distribution. Weibull, Pareto and exponential power are used as the baseline distributions to explore the salient features of the shared BS frailty models based on multiplicative hazard. Here, the dependence between the survival times is due to BS distributed common frailty variable V. The next sections are organized as follows: in Section 2, the general shared frailty model was introduced. Frailty distribution is given in Section 3. In Section 4, baseline distributions are given. Sections 5 and 6 are reserved for proposed models and Estimation strategies. Section 7 present the application to real life data and in Section 8, conclusion is given.


In the study, suppose there are n individuals having first and second failure times (y1k,y2k), Xpk (p = 1,2,…,q) be the found covariate for the kth individual. Here it is acknowledged that the two survival times share a similar kind of covariates. Let Vk represent the shared frailty for the kth individual and are assumed to be adding multiplicatively on BHF. Another assumption is that conditional independence between the two survival times of an individual for given shared frailty. Under these cases, the CHF and CSF for the kth individual at jth (j = 1,2) survival times yjk for given frailty are obtained as

where m0(yjk) is a BHF at time yjk>0 and β_ is a vector form of regression coefficients.

The conditional cumulative hazard function (CuHF) for the kth individual at the jth survival time yjk>0 for given frailty Vk = vk is

where ηk=eX_kβ_ and M0(yjk) is the cumulative BHF at time yjk>0.

The CSF for kth individual at jth survival time yjk>0 for given frailty Vk = vk is


Under the assumption of independence, the bivariate CSF for given frailty Vk = vk at time y1k>0 and y2k>0 is


The unconditional SF (in bivariate case) at time y1k>0 and y2k>0 can be derived by taking integration of Equation (2.4) with respect to frailty variable Vk having the pdf f(vk), and is given as

where LVk(.) is the LT of frailty variable of Vk for kth individual.

Here onward we represent S(y1k,y2kX_k) as S(y1k,y2k).


The first frailty distribution we have considered here is BS distribution [9]. The pdf of the BS distribution is unimodal. The shape of the HF plays a significant role in the analysis of lifetime data. Leiva [10] also showed that BS distribution is right-skewed (asymmetrical), continuous, and unimodal. It is otherwise called the fatigue life distribution and has gotten extensive consideration because of its theoretical arguments, its alluring properties and its connection with the normal distribution. In this regard, Mann et al. [11] first conjectured that the HF is not an increasing function, but the average HF is nearly nondecreasing function. Later, Kundu et al. [12] and Bebbington et al. [13] showed that the HF of the BS distribution is a unimodal function. In many real-life situations, the HF may not be monotone, and it increases up to a point and then decreases.

A continuous random variable V is said to follow BS distribution with scale parameter α and shape parameter ξ if its probability density function is

having mean ξ2(α2+2) and variance ξ24 (5α4+4α2)

The LT is derived as


For the identifiability issues, we assume V has mean one which presents restriction on the parameters α = θ and ξ = 2θ2+2. Under this restriction, the pdf and the LT of a BS distribution are derived as

and LV(s)=e1θ21+4s+2θ2θ2(2+θ2)(1+4s+2θ2)1/4θ(2+θ2+1+4s+2θ2θ2(2+θ2))212+θ2(θ2(2+θ2))7/4(1+4s+2θ2θ2(2+θ2))3/2

Substituting the LT in Equation (2.5), we get the unconditional bivariate survival function for the kth individual at the time y1k and y2k as,

where M01(y1k) and M02(y2k) are the cumulative BHFs of the life time random variables Y1k and Y2k, respectively.


The first baseline distribution engaged here is Weibull distribution, which is one of the most commonly used distribution for modeling life-time data in survival analysis. A continuous random variable Y is said to have two parameter Weibull distribution if the SF, the HF, and the CuHF, respectively;

where λ and α represents scale and shape parameters, respectively. The HF is increasing when α>1 and decreasing when α<1 and constant hazard rate is observed for the distribution when α = 1.

The second baseline distribution engaged here is the Pareto distribution [14]. A continuous random variable Y is said to have the Pareto distribution with the scale parameter λ and the shape parameter α if its SF is

and the HF and the CuHF as

It is observed that h(y) decreases with y, λ>0, α>0. Hence this distribution is also among the class of decreasing failure rate.

The third and last baseline distribution we have considered here is the exponential power distribution. A continuous random variable Y is said to have the exponential power distribution if its SF is

where λ and α represents scale and shape parameters of the distribution respectively. The HF and the CuHF are respectively,

It is observed that the HF is increasing, decreasing and U shaped for the exponential power distribution.


Replacing the CuHF for the Weibull distribution, Pareto distribution, and exponential power distribution in Equation (3.3), we get the unconditional bivariate SFs at time y1k>0 and y2k>0 as


Here from now onwards we denote Equations (5.15.3) as Model I, Model II, and Model III, respectively. Model I is shared BS frailty model with Weibull baseline distribution, Model II is shared BS frailty model with Pareto baseline distribution, and Model III is shared BS frailty model with exponential power baseline distribution.


Let us consider n individuals under the study having first and second failure times given by (y1k,y2k). Let d1k and d2k be considered as the observed censoring times for the kth individual (k=1,2,3,,n) associated with the first and second recurrence times respectively. It is also assumed that censoring scheme and life times of individuals are independent.

The likelihood function (LF) associated with the kth individual having bivariate life time random variable is given by

and the LF is
where σ, ψ_, and β_ are respectively the frailty parameter, the baseline parameters and the regression coefficients, which are in vector form.

Where n1,n2,n3, and n4 are the random number of observations for which first and second failure times (y1k,y2k) observed to lie in the ranges y1k<d1k,y2k<d2k; y1k<d1k,y2k>d2k; y1k>d1k,y2k<d2k and y1k>d1k,y2k>d2k, respectively and


The LFs given in Equation (6.1) is very difficult to solve using the Newton–Raphson method. Maximum likelihood estimations (MLEs) may not be convergent since large number of parameters are involved in the models. Therefore, the Bayesian approach of estimation is engaged to estimate the parameters involved in the models, which does not suffer any such kind of troubles. Many authors have studied the Bayesian approach for the estimation of frailty model parameters. Some of the authors are Ibrahim et al. [15] and references therein, Santos and Achcar [16]. Santos and Achcar [16] have used log-normal as frailty distribution with Weibull and generalized gamma distribution as the baseline in parametric models. Ibrahim et al. [15] and references therein used gamma frailty in the Weibull model and the piecewise exponential model.

The joint posterior density function associated with the parameters for given failure times is obtained as

where qi(.) (i=1,2,3,,5) represents the prior density function having known hyper parameters which is concurring justification for baseline parameters and variance of frailty; gi(.) represents prior density function associated with the regression coefficient βi; β_i is regression coefficients, which represent in a vector except βi, i=1,2,,k and L(.) is given by Equation (6.1). Here all the parameters are assumed to be independently distributed.

Metropolis–Hasting and Gibbs Sampler Algorithm is utilized for the estimation of the parameters in the models fitted with the above prior density function and likelihood equation. Geweke test and Gelman–Rubin Statistics given by Geweke [17] and Gelman and Rubin [18] to observe the Markov chain to be converged to a stationary distribution. To explore the property of the chain, to calculate the burn-in period and to choose autocorrelation lag, we utilized trace plots, coupling from the past plots and sample autocorrelation plots, respectively.

The adequacy of the models is checked by using predictive intervals, which is based on the sample generated from the posterior predictive density. For comparing different proposed models, we used various information criteria such as Bayesian Information Criteria (BIC), Akaike Information Criteria (AIC), and Deviance Information Criteria (DIC). Also, the Bayes factor Buv (Kass and Raftery [19]) is utilized for comparing the models Mu against Mv.


The models performance was examined by applying the proposed models to the data related to infection of kidney occurring due to catheter insertion [20]. First, Kolmogorov–Smirnov (KS) test is utilized to check the goodness of fit for the given data and it is seen that the p values for the first and second recurrences are adequately enormous to say that there is no motivation to reject the hypothesis that the first and second recurrence time to follow the distributions with SF as given in Equations (4.1), (4.4), and (4.7) in univariate case and this condition is assumed to be valid for bivariate case. The p values obtained from KS test are given in Table 1.

Recurrence Times
Baseline Distribution First Second
Weibull 0.6671 0.5003
Pareto 0.9736 0.8452
Exponential power 0.7012 0.4929
Table 1

p values of Kolmogorov–Smirnov (KS) statistics for goodness of fit test for kidney infection data set.

The posterior results of the different proposed models are presented in Tables 24. It comprises the estimate (posterior mean), standard error (SE), 95% lower and upper credible limits (LCL and UCL), Gelman and Rubin (GR) statistics values with p values and Geweke (GK) test values. From Tables 24, it is observed that regression coefficients for all the models are more or less the same. Also for all these different proposed models, the value zero is not a credible value for all the credible intervals of the regression coefficients, so all the covariates seem to be significant. It is observed that the values of frailty parameter for all the models are different. In particular, the sex effect β2= −1.8987, −0.8597, and −1.6724 indicates that in the frailty models the female patients have significantly lower risk of infection than the male patients for all the models since the credible interval of regression coefficient does not contain zero. Also, estimate of θ = 1.1217, 0.6175, and 1.1938 for the models shows that there exists heterogeneity in the population of patients though each patient within a cluster share the same covariates that is, age, sex, and disease type.

Parameter Estimate SE LCL UCL GK Values p Values GR Values
Burn in period = 7200; Autocorrelation lag = 250
α1 1.2271 0.0322 1.1701 1.2938 0.0070 0.5028 0.9999
α2 1.1558 0.0482 1.0510 1.2344 0.0115 0.5046 1.0017
λ1 0.0178 0.0047 0.0093 0.0271 0.0097 0.5038 1.0002
λ2 0.0150 0.0045 0.0074 0.0243 0.0019 0.5007 0.9999
θ 1.1217 0.0486 1.0409 1.2200 0.0094 0.5037 1.0011
β1 0.0082 0.0005 0.0072 0.0091 0.0002 0.5000 1.0001
β2 −1.8987 0.3475 −2.5616 −1.2270 −0.0120 0.5000 1.0008
β3 0.0421 0.0047 0.0331 0.0510 0.0106 0.5042 1.0000
β4 0.0616 0.0050 0.0528 0.0711 0.0049 0.5019 1.0001
β5 0.0021 0.0005 0.0011 0.0030 0.0006 0.5002 1.0000

SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.

Table 2

Posterior results for Model I using kidney infection data set.

Parameter Estimate SE LCL UCL GK Values p Values GR Values
Burn in period = 4900; Autocorrelation lag = 220
α1 2.2022 0.2959 1.6299 2.8432 −0.0023 0.4990 1.0005
α2 2.6089 0.4641 1.7035 3.4416 0.0045 0.5018 1.0085
λ1 0.0159 0.0043 0.0102 0.0264 0.0069 0.5027 1.0000
λ2 0.0089 0.0005 0.0080 0.0099 0.0021 0.5008 0.9999
θ 0.6175 0.0514 0.5427 0.7241 0.0234 0.5093 1.0014
β1 0.0020 0.0005 0.0010 0.0029 0.0048 0.5019 1.0006
β2 −0.8597 0.1796 −1.1968 −0.5025 −0.0074 0.5019 1.0058
β3 0.0620 0.0051 0.0529 0.0712 0.0180 0.5072 1.0033
β4 0.0620 0.0051 0.0525 0.0712 0.0051 0.5020 1.0056
β5 0.0209 0.0052 0.0116 0.0303 −0.0006 0.4997 1.0035

SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.

Table 3

Posterior results for Model II using kidney infection data set.

Parameter Estimate SE LCL UCL GK Values p Values GR Values
Burn in period = 5100; Autocorrelation lag = 375
α1 0.5117 0.0293 0.4515 0.5720 −0.0024 0.4990 1.0014
α2 0.5952 0.0250 0.5517 0.6415 −0.0042 0.4982 0.9999
λ1 0.1623 0.0330 0.1073 0.2432 0.0024 0.5009 1.0005
λ2 0.0891 0.0049 0.0807 0.0989 0.0038 0.5015 0.9999
θ 1.1938 0.0479 1.1116 1.2846 0.0054 0.5021 1.0020
β1 0.0015 0.0004 0.0006 0.0024 −0.0043 0.4982 1.0009
β2 −1.6724 0.3614 −2.3837 -0.8913 0.0023 0.4982 1.0001
β3 0.0741 0.0048 0.0646 0.0828 0.0083 0.5033 1.0000
β4 0.0822 0.0049 0.0731 0.0906 0.0001 0.4999 1.0000
β5 0.0021 0.0005 0.0012 0.0030 0.0089 0.5035 1.0009

SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.

Table 4

Posterior results for Model II using kidney infection data set.

The adequacy of the different proposed models is checked by constructing 95% and 50% predictive intervals of a generated random sample from predictive distribution as given in Gelfand and Ghosh [21] and Sahu et al. [22] and counted the number of times recurrence happen in the intervals for first and second kidney infections. Under the Model I, Model II, and Model III, 76, 65; 76, 62; and 71, 58, respectively out of 76 observations are contained in the 95% and 50% predictive intervals. This shows that the three models are adequate for the data.

The best model as per AIC, BIC, and DIC values is Model I, since the values of AIC, BIC, and DIC of Model I are smaller than Model II and Model III, which is given in Table 5. But the distinction between AIC, BIC, and DIC values for Model I, Model II, and Model III are very little, so AIC, BIC, and DIC values may not be commendable to make a choice between the proposed models. Here, we consider Bayes factor to compare model u against model v. The value in Table 6 shows that Model I is better than Model II and Model III as the corresponding value of 2log(Buv) is greater than 10 indicating that there is a very strong positive to favor Model I than Model II and Model III for the given dataset, which affirms our earlier result given in Table 5. Hence from all the demonstrated comparison criteria we can say Model I is better than Model II and Model III for modeling the given data.

Model No. AIC BIC DIC Log likelihood
Model I 690.1043 706.4801 675.1946 −335.0521
Model II 702.0792 718.4551 685.4432 −341.0396
Model III 698.0372 714.413 685.1324 −339.0186

BIC, Bayesian Information Criteria; AIC, Akaike Information Criteria; DIC, Deviance Information Criteria.

Table 5

AIC, BIC, and DIC values for all the proposed models.

Numerator Model Against Denominator Model 2loge(Buv) Range Evidence Against Model in Denominator
MI against MII 11.1430 10 Very Strong Positive
MI against MIII 14.1158 0 and <2 Not significant
MIII against MII 2.9728 2 and <6 Positive
Table 6

The values of Bayes factor and decision based on it.


In this paper we discussed the results for BS shared frailty models with Weibull, Pareto, and exponential power as baseline distributions. Our main objective is to investigate which model fits better with the given infectious disease data. We performed the analysis of the given data by using R. Since, the LFs do not converge and maximum likelihood fails to estimate the parameters, Bayesian approach was utilized to analyze the data.

The comparison between the three proposed models was done using AIC, BIC, and DIC values. The smallest AIC value was Model I(BS frailty model with Weibull baseline distribution). The same result holds for BIC and DIC value. But these differences are not much significant. To take the decision about Model I, Model II, and Model III, we used the Bayes factor. We observed that, Model I is the best. The values of frailty variances were also calculated, which are 1.2197, 0.3971 and 1.3515 for Model I, Model II and Model III respectively indicating that there is a strong evidence of high degree of heterogeneity in the population of patients. Bayes factor was employed to test the frailty parameter θ=0 and it was observed that frailty is present. Negative value of regression coefficient (β2) of covariate sex indicated that the female patients have a slightly lower risk of infection.

By referring all the above analysis, we suggested that the new shared BS frailty model with Weibull distribution as baseline distribution is the best in the proposed models for modeling of kidney infection data. This also confirms that model with frailty is appropriate to model kidney infection dataset. Therefore, from the above results, it is evident that cluster effects (frailty variable) significantly incorporate with the recurrence time of kidney infection due to insertion of catheter. One important property that was observed from BS frailty model with Weibull baseline distribution is the estimate of frailty variance, which is larger than earlier frailty models given by McGrilchrist and Aisbett [20] on log-normal frailty, Hanagal and Dabade [23] on compound Poisson frailty, and Hanagal and Dabade [24] on gamma frailty models.


The authors declare that they have no conflicts of interest.


The author developed the models, performed the analytic calculations, numerical simulations, wrote the manuscript and discussed the results.


The author would like to thank the referee for their valuable suggestions and comments which improved the earlier version of the manuscript.


5.N. Balakrishnan and L. Liu, Revstat., Vol. 16, 2018, pp. 231-255.
11.N.R. Mann, R.E. Schafer, and N.D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, John Wiley and Sons, New York, NY, USA, 1974.
13.M. Bebbington, C.-D. Lai, and R. Zitikis, Math. Sci., Vol. 33, 2008, pp. 49-56.
14.J.V. Deshpande and S.G. Purohit, Life Time Data: Statistical Models and Methods, World Scientific, New Jersey, NJ, USA, 2005.
16.C.A. Santos and J.A. Achcar, J. Stat. Theory Appl., Vol. 9, 2010, pp. 233-253.
18.A. Gelman and D.B. Rubin, J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith (editors), Bayesian Statistics 4, Oxford University Press, Oxford, 1992, pp. 625-632.
23.D.D. Hanagal and A.D. Dabade, Int. J. Stat. Manag. Syst., Vol. 7, 2012, pp. 36-84.
Journal of Statistical Theory and Applications
20 - 2
355 - 363
Publication Date
ISSN (Online)
ISSN (Print)
DOI to use a DOI?
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (

Cite this article

AU  - Lal Pawimawha
PY  - 2021
DA  - 2021/03/12
TI  - Modeling Frailty Using Birnbaum Saunders Distribution for Bivariate Survival Data
JO  - Journal of Statistical Theory and Applications
SP  - 355
EP  - 363
VL  - 20
IS  - 2
SN  - 2214-1766
UR  -
DO  -
ID  - Pawimawha2021
ER  -