Bayesian Analysis of Modified Weibull distribution under progressively censored competing risk model
BBayesian Analysis of Modified Weibulldistribution under progressively censoredcompeting risk model
Arabin Kumar DeyDepartment of Mathematics, IIT GuwahatiAbhilash JhaDepartment of Mathematics, IIT GuwahatiSanku DeySt. Anthony’s CollegeMay 24, 2016
Abstract
In this paper we study bayesian analysis of Modified Weibull distribution underprogressively censored competing risk model. This study is made for progressivelycensored data. We use deterministic scan Gibbs sampling combined with slice sam-pling to generate from the posterior distribution. Posterior distribution is formedby taking prior distribution as reference prior. A real life data analysis is shown forillustrative purpose.
Keywords:
Modified Weibull, Competing risk, likelihood function, Slice sampling.1 a r X i v : . [ s t a t . C O ] M a y Introduction
In survival and medical studies it is quite common that more than one cause of failure maybe directed to a system at the same time. It is often interesting that an investigator needsto estimate a specific risk in presence of other risk factors. In statistical literature analysisof such risk is known as competing risk model. Parametric inference of competing riskmodels are studied by many authors assuming that competing risks follow different lifetimedistributions such as gamma, exponential and Weibull distribution; see for example Berksonand Elveback (1960), Cox (1959), David Moeschberger (1978). However determinationof the cause of the failure is more difficult many times to observe than to follow up thetime to failure. Under the assumption that risks follow exponential distribution, inferenceof the model studied by Miyakawa (1984). Kundu and Basu (2000) considered the samemodel, studied by Miyakawa under the assumption that every member of a certain targetpopulation either dies of a particular cause, say cancer or by other causes. Model canbe more flexible by considering the fact that some individual may be alive at the end ofexperiment i.e. data are censored. Such models are available in literature even underbayesian set up; see for example Kundu and Pradhan (2011), Pareek, Kundu and Kumar(2009).Here we consider competing risk under progressive type-II censoring. The censoringscheme is defined as follows. Consider n individuals in a study and assume that there arek causes of failure which are known. At time of each failure, one of more surviving unitsmay be removed from the study at random. The data from a progressively type-II censoredsample is as follows : ( X m : n , δ , R ) , · · · , ( X m : m : n , δ m , R m ) . X m : n < · · · < X m : m : n . Notethat the complete and type-II right censored samples are special cases of the above schemewhen R = · · · = R m = 0 and R = R = · · · = R m − = 0 and R m = n − m respectively.For an exhaustive list of references and further details on progressive censoring, the readermay refer to the book by Balakrishnan and Aggarwala (2000).The main focus of this paper is the analysis of parametric competing risk model whenthe data is progressively censored. No work has been done taking modified Weibull asparametric distribution of cause of the failures. Modified Weibull has different forms. Weconsider a specific form mentioned in section 2 which is not well-studied. In bayesian anal-2sis we choose reference prior. Use of reference prior is rare as it makes the computationalproblem harder, though the best motivation for prior selection can be obtained throughreference prior. Also many times expression of the prior may not be tractable. Howeverusing gibbs combined with slice can provide a solution to this problem. Bayes estimatorsand credible intervals are also obtained.The organization of the chapter is as follows. In section 2 we describe the model andpresent the definitions and notation used throughout the paper. The bayesian estimationof the different parameters are considered in Section 3. Numerical results are provided insection 4. We illustrate the performance of those techniques in section 5 using real dataset. Finally some conclusions are drawn in Section 6. We assume ( X i , X i ) , i = 1 , · · · , n are n independent, identically distributed (i.i.d.) mod-ified Weibull random variable. Further X i and X i are independent for all i = 1 , · · · , n and X i = min { X i , X i } . We observe the sample ( X m : n , δ , R ) , · · · , ( X m : m : n , δ m , R m ) as-suming all the causes of failure are known. We assume that the X ji ’s are modified Weibulldistribution with parameters λ j for i = 1 , · · · , n and for j = 1 ,
2. The distribution function F j ( · ) of X ji has the following form : F j ( t ) = 1 − e λ j α (1 − e ( tα ) β ) , j = 1 and 2. Therefore foreach cause j , the pdf of failure time can be given by, f j ( t ) = λβ ( tα ) ( β − e ( tα ) β e λ i α (1 − e ( tα ) β ) The likelihood function of the observed data is L ( λ , λ , β, α ) = m (cid:89) i =1 S R i T ( t i ; α, β, λ + λ ) m (cid:89) i =1 [ f ( t i ; α, β, λ ) S ( t i ; α, β, λ )] × m (cid:89) i = m [ f ( t i ; α, β, λ ) S ( t i ; α, β, λ )]where S T ( t i ; α, β, λ + λ ) = (1 − F T ( t i ; α, β, λ + λ )) . L ( λ , λ , β, α ) = n ( n − R − · · · ( n − R − · · · − R m − − m + 1) × λ m λ m − m β m e (cid:80) mi =1 ( tiα ) β [ m (cid:89) i =1 ( t i α ) β − ] e ( λ + λ ) α (cid:80) mi =1 ( R i +1)(1 − e ( tiα ) β ) The key idea is to derive reference prior described by ? (1992) is to get prior π ( θ ) thatmaximizes the expected posterior information about the parameters. Let I ( θ ) is expectedinformation about θ given the data X = x . Then I ( θ ) = E X ( K ( p ( θ | x ) || π ( θ )))where K ( p ( θ | ˜ x ) || π ( θ )) is the Kullback-Leibler distance between posterior p ( θ | x ) and prior p ( θ ) can be given by K ( π ( θ | x ) || π ( θ )) = (cid:90) Ω θ ln π ( θ | x ) π ( θ ) π ( θ | x ) dθ If the set up is as follows, θ = ( θ , θ ), where θ is p × θ is p ×
1. We define p = p + p . where θ is the parameter of interest and θ is a nuisance parameter.Let I ( θ ) = I ( θ ) = I ( θ ) I ( θ ) I ( θ ) I ( θ ) We can show π ( θ | θ ) = | I ( θ ) | c ( θ ), where c ( θ ) is the constant that makes thisdistribution a proper density. We will be using this fact to get the expression for the priorin conditional distribution at each Gibbs sampler step. Reference Prior for Gibbs sampling Set-up
In the Gibbs sampling set-up we try to reduce the posterior as the conditional distri-bution of one parameter given the other parameters and the data. For example, we areinterested in finding the p ( λ | λ , β, α, x ) ∝ p ( x | λ , λ , β, α ) p reference prior ( λ | λ , β, α ) . Com-plication of finding reference prior can be much simplied by using Gibbs sampler method.4 ( λ | λ , β, α ) ∝ (cid:112) J ( λ | λ , β, α, x )= (cid:115) − (cid:18) ∂ l∂λ (cid:19) π ( β | λ , λ , α ) ∝ (cid:112) J ( β | λ , λ , α, x )= (cid:115) − (cid:18) ∂ l∂β (cid:19) π ( α | β, λ , λ ) ∝ (cid:112) J ( α | β, λ , λ , x )= (cid:115) − (cid:18) ∂ l∂α (cid:19) π ( λ | β, λ , α ) ∝ (cid:112) J ( λ | β, λ , α, x )= (cid:115) − (cid:18) ∂ l∂λ (cid:19) However generation of random number from these conditional densities is not tractablethrough inverse transform method due to its highly complicated form. Therefore we useSlice sampler to generate sample from those conditional distribution. Steps of the algo-rithms and calculation of HPD region can be provided as follows :
1. Choose a starting value of λ , λ , α, β . 5. Use slice sampling to generate λ using above λ , α, β .3. Use this new λ and earlier λ , α, β to generate new λ .4. Use new λ , λ and earlier α, β to generate new α .5. Use new λ , λ , α to generate new β .6. repeat step 2-5 M times to generate new ( λ i , λ i , α i , β i ) i = 1 , · · · , m .7. Mean Bayesian estimate of λ , λ , α, β is given by:-ˆ λ B = 1 M M (cid:88) k =1 λ k , ˆ λ B = 1 M M (cid:88) k =1 λ k ˆ α B = 1 M M (cid:88) k =1 α k , ˆ β B = 1 M M (cid:88) k =1 β k
8. To obtain HPD(Highest Posterior Density) region of λ , we order { λ i } , as λ <λ < · · · < λ M ) . Then 100(1 - γ )% HPD region of λ become( λ j ) , λ j + M − Mγ ) ) , f or j = 1 , · · · , M γ Therefore 100(1 - γ )% HPD region of λ becomes ( λ j ∗ ) , λ j ∗ + M − Mγ ) ) , where j ∗ issuch that λ j ∗ + M − Mγ ) − λ j ∗ ) ≤ λ j + M − Mγ ) − λ j ) for all j = 1 , · · · , M γ . Similarly, we can obtain the HPD credible interval for λ , α and β . In this section we conduct a simulation study to investigate the performance of the proposedBayes estimators. We generate sample points from the actual distribution assuming theparameters provided along with different censoring schemes and then apply it to our modelto predict the parameters and compare the results. To generate data sets currently we6ssign the cause of failure as 1 or 2 with probability 0.5. We have generated 200 datapoints. Then we have divided it in 2 cases. In case 1 we consider all 200 data points(which is
Type 2 Censoring ) and in case 2 we will apply censoring on those data (whichis
Progressive censoring ) and then predict the parameters. Below are the differentschemes of data sets used for validation.
Scheme λ = 1 , λ = 0 . , α = 0 . , β = 0 . , n = 200, Type 2 Censoring Scheme λ = 1 , λ = 0 . , α = 0 . , β = 0 . , n = 200, Progressive Type 2 Censoring Scheme λ = 1 , λ = 1 , α = 0 . , β = 0 . , n = 200, Type 2 Censoring Scheme λ = 1 , λ = 1 , α = 0 . , β = 0 . , n = 200, Progressive Type 2 Censoring Scheme 1
Figure 1: Histogram and trace plot of parameter values for Scheme 17arameters Actual Result λ λ α β Scheme 2
Figure 2: Histogram and trace plot of parameter values for Scheme 2Parameters Actual Result λ λ α β cheme 3 Figure 3: Histogram and trace plot of parameter values for Scheme 3Parameters Actual Result λ λ α β Scheme 4 λ λ α β Now We will move on to apply our model on real life data. For that we have considered thefollicular cell lymphoma data from Pintilie (2007) where additional details about data set10an be found. The data set can be downloaded from follic.txt, and consists of 541 patientswith early disease stage follicular cell lymphoma (I or II) and treated with radiation only(chemo = 0) or a combined treatment with radiation and chemotherapy (chemo = 1).Parameters recorded were path1, ldh, clinstg, blktxcat, relsite, chrt, survtime, stat, dftime,dfcens, resp and stnum. The two competing risks are death without relapse and notreatment response . The patient’s ages (age: mean = 57 and sd = 14) and haemoglobinlevels (hgb: mean = 138 and sd = 15) were also recorded. The median follow-up time was5.5 years. There are more parameters which are not of our concern.First we read the data, compute the cause of failure indicator. Below is the code tocalculate the cause. R > e v c e n s < − as . numeric ( f o l $ r e s p == ”NR” | f o l $ r e l s i t e != ” ” ) R > c r c e n s < − as . numeric ( f o l $ r e s p == ”CR” & f o l $ r e l s i t e == ” ” & f o l $ s t a t == 1 ) R > c a u s e < − i f e l s e ( e v c e n s == 1 , 1 , i f e l s e ( c r c e n s == 1 , 2 , 0 ) R > t ab l e ( c a u s e )c a u s e0 1 2193 272 76There are 272 (no treatment response or relapse) events due to the disease, 76 competingrisk events (death without relapse) and 193 censored individuals. The event times aredenoted as dftime. Thus our data set is prepared.Now we will consider this data set and apply it on our model. We will consider 3 casesof analysis as defined below :-Case 1 We will only consider those data in which cause = 1 and cause = 2 are considered.Case 2 Here we will consider the censor data too. Now to prepare data set we will first sortdata according to time. Then we will remove data points between two non-zerocauses.For Case 1, the following are the bayes estimate corresponding to the parameters, λ =0.12128, λ = 0.01762, α = 2.33876, β = 0.44871. For Case 2, bayes estimates are, λ =11.034, λ = 0.004, α = 30.228, β = 0.536.We show the histogram and trace plot of the parameters of the Follicular Data Case 1and Case 2 in 5, 5 respectively.Figure 5: Histogram and trace plot of parameters for Follicular Data Case 112igure 6: Histogram and trace plot of parameters for Follicular Data Case 2Parameters posterior median Lower HPD Upper HPD λ λ α β α apparently seem to look awkward as its widthis very high. However taking posterior median as true value of the parameters of modifiedWeibull, we can easily cross-check it provides a good fit for histograms of the distribution ofcause-specific data. This verifies the distributional assumption of this parametric approach.13 Conclusion
We consider the bayesian analysis of the competing risks data when they are Type-IIprogressively censored. Numerical simulation shows the posterior median calculated viaslice sampling combined with gibbs sampler works quite well. In this article we considerwith two causes of failure only, the work can be extended to more than two causes of failure.We select parameters to follow reference prior which is very flexible. The similar work canbe done for type-I progressively censored data. The work is on progress. Slice samplinguses step out methods where selection of width plays an important role. An automatedchoice of such selection based on the data and distribution can enhance the quality of thealgorithm.
SUPPLEMENTARY MATERIAL
References
Abdel-Hamid, Alaa H and Al-Hussaini, Essam K (2011). Inference for a progressive stressmodel from Weibull distribution under progressive type-II censoring.
Journal of Com-putational and applied mathematics , no. 17, 5259-5271.Atreya, S. F. and Rizk, M. M. (2015). Prediction under modified Weibull distribution withapplications.
International Journal of Scientific & Engineering Research , 1625–1632.Asgharzadeh, A., Valiollahi, R. and Kundu, D. (2015). Prediction for future failures inWeibull Distribution under hybrid censoring International Journal of Scientific & Engi-neering Research , 1625–1632.Azzalini, A. (2005). The skew-normal distribution and related multivariate families. ,159–188.Balakrishnan, N (2007) Progressive censoring methodology: an appraisal Test
Springer ,no. 16, 211–259 14alakrishnan and Aggarwala (2000) Progressive censoring : Theory, Methods, and Appli-cations
Statistics For Industry and Technology , Birkhauser , BostonBerkson, J. and Elveback, L. (1960). Competing exponential risks with particular inferenceto the study of smoking lung cancer
Journal of the American Statistical Association ,84-89.Bernardo, Jos´e M (1979). Reference posterior distributions for Bayesian Inference. Journalof the Royal Statistical Society
B 41 , 113 - 147.Bernardo, Jos´e M (1992). On development of reference priors.
Bayesian Statistics , 61 -77.Berger, James O and Bernardo, Jos´e M and Sun, Dongchu (2009). Competing exponen-tial risks with particular inference to the study of smoking lung cancer The Annals ofStatistics , no. 2, 905–938.Cox, D. R. (1959). The analysis of exponentially distributed lifetimes of with two types offailure times Journal of Royal Statistics
B 21 , 411 - 421.David, H. A. and Moeschberger (1978). The Theory of Competing Risks
Griffin
London.Kaplan, E. L. and Meier, P. (1958). Non-parametric estimation from incomplete observation
Journal of American Statistical Association , vol. 53, 457 - 481.Peterson, Jr., A. P. (1977). Expressing the Kaplan-Meier estimator as a function of em-pirical survival functions
Journal of American Statistical Association , vol. 72, 854 -858.Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals.
Annals of Statis-tics , vol. 16, 927 - 953.Chen, Ming-Hui and Shao, Qi-Man (2005). Monte Carlo Estimation of Bayesian Credibleand HPD Intervals.
Journal of Computational and Graphical Statistics , 159–188.15amien, Paul and Wakefield, Jon and Walker, Stephen (1999). Gibbs sampling for Bayesiannon-conjugate and hierarchical models by using auxiliary variables. Journal of the RoyalStatistical Society. Series B, Statistical Methodology , 331–344.Gasmi, S. and Berzig, M. (2011). Parameters Estimation of the Modified Weibull Distribu-tion Based on Type I Censored Samples. Applied Mathematical Sciences , 2899–2917.Hemmati, F. and Khorram, E. (2016). On Adaptive Progressively Type-II Censored Com-peting Risks Data. Communications in Statistics-Simulation and Computation just-accepted .Pareek, Bhuvanesh and Kundu, Debasis and Kumar, Sumit (2009) Bayesian analysis ofprogressively censored competing risks data
Computational Statistics & Data Analysis , no. 12, 4083–4094.Kundu, Debasis and Pradhan, Biswabrata (2011) On progressively censored competingrisks data for Weibull distributions. Sankhya B , no. 2, 276–296.Kundu, Debasis and Basu, S. (2000) Analysis of incomplete data in presence of competingrisk Journal of Statistical Planning and Inference , no. 2, 276–296.Preda, V., Panaitescu, E. and Constantinescu, A. (2010). Bayes estimators of Modified-Weibull distribution parameters using Lindley’s approximation Wseas Transactions onMathematics , 539–549.Miyakawa, M. (1984). Analysis of incomplete data in competing risks model IEEE Trans-actions on Reliability Analysis , no. 4, 293-296.Neal, Radford M (2003). Slice sampling Annals of statistics , 705–741.Sarhan, Ammar M. and Apaloo, J. (2013). Exponentiated modified Weibull extensiondistribution Reliability Engineering and System Safety , 137–144.Satagopan, JM and Ben-Porat, L and Berwick, M and Robson, M and Kutler, D andAuerbach, AD (2004), A note on competing risks in survival data analysis
Britishjournal of cancer vol. 91, no. 7, 1229–1235.16padhyaya, S. K., and Gupta, A. (2010). A Bayes analysis of modified Weibull distributionvia Markov chain Monte Carlo simulation,
J. Statist. Comput. Simul. , 241-254.Xie, M., Tang, Y. and Goh, TN. (2002). A modified Weibull extension with bathtub-shapedfailure rate function. Reliability Engineering and System Safety76