Subgroup Identification and Interpretation with Bayesian Nonparametric Models in Health Care Claims Data
SSubgroup Identification and Interpretation withBayesian Nonparametric Models in Health CareClaims Data
Christoph F. Kurz
Helmholtz Zentrum München, GermanyInstitute of Health Economics and Health Care Management [email protected]
Laura A. Hatfield
Harvard Medical SchoolDepartment of Health Care Policy
Abstract
Inpatient care is a large share of total health care spending, making analysis ofinpatient utilization patterns an important part of understanding what drives healthcare spending growth. Common features of inpatient utilization measures includezero inflation, over-dispersion, and skewness, all of which complicate statisticalmodeling. Mixture modeling is a popular approach that can accommodate thesefeatures of health care utilization data. In this work, we add a nonparametricclustering component to such models. Our fully Bayesian model framework allowsfor an unknown number of mixing components, so that the data determine thenumber of mixture components. When we apply the modeling framework to dataon hospital lengths of stay for patients with lung cancer, we find distinct subgroupsof patients with differences in means and variances of hospital days, health andtreatment covariates, and relationships between covariates and length of stay.
Inpatient hospital services account for a small share of health care utilization, but the majority oftotal health care spending. [10] To understand the variation in this major component of health careexpenditures, researchers have sought to identify patient subgroups with different utilization andspending patterns. [9]Health care resource use data are often non-negative, right-skewed, heavy-tailed, and multi-modalwith a point mass at zero. Desirable analytical approaches for these data should be sufficientlypowerful and flexible to accommodate all these features. Common generalized linear models (GLMs)for count data use the Poisson, geometric, and Negative Binomial distributions, which do not accountfor over-dispersion, zero inflation, and multi-modality. [6, 8] More sophisticated models for countsinclude the zero-inflated or hurdle model [3] and finite mixture models (FMMs), also known as latentclass models. FMMs identify groups of observations with similar outcomes using unsupervisedclustering. FMM models may be implemented with count distributions, such as Poisson [13] andNegative Binomial [2], and may be augmented with a hurdle component for excess zeros. [1]FMMs provide better fit than standard GLMs and the hurdle model. [2] In addition, mixture modelsaccommodate multi-modality and can even link mixture component prevalences to covariates. [11]
Interpretable ML Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017), LongBeach, CA, USA. a r X i v : . [ s t a t . M L ] N ov ixture models avoid the hurdle model’s sharp dichotomy between users and non-users. A keyquestion in mixture models is the optimal number of components. (Note that we use component ,rather than cluster , to describe the subpopulations identified by FMMs.) Too many components mayoverfit the data and impair model interpretation, while too few components limit the flexibility ofthe mixture to approximate the true underlying data structure. The number of components can bedecided ex ante , by choosing a convenient and interpretable number such as two or three, or ex post ,by calculating models with different numbers of components and comparing their fit statistics.Our model uses a Dirichlet prior (DP) for the mixing component. In this fully Bayesian hierarchicalmixture model, the optimal number of components is determined simultaneously with the model fit,and the model can incorporate prior information about the number of components. This one-stageprocess yields the ideal number of components and allows interpretation of each component. Thepotentially unbounded infinite mixture model avoids both over- and under-fitting by allowing the datato determine the optimal number of components. [14]We are motivated by the desire to understand variation in days spent in hospital among patientsdiagnosed with lung cancer. Lung cancer is the most common cancer worldwide and a major cause ofcancer-related mortality. Previous work has shown substantial heterogeneity in the patterns of healthcare utilization among patients with lung cancer. [15] To understand patient subpopulations definedby inpatient hospital days and effects of covariates on hospital days, we develop and apply NegativeBinomial regression models that use a DP prior for nonparametric mixing of the components. Studiesfound that nonparametric models can outperform current methods, especially in the case of healthclaims data. [4] The basis of our model is a Negative Binomial regression model. We extend this model to a mixtureof Negative Binomial components indexed by k = 1 , . . . , K . Each component is parameterizedby a mean µ = µ , ..., µ K , and a precision parameter ψ = ψ , ..., ψ K . As before, we specify aregression model for the mean parameter µ k = exp ( x β k ) . We define z = z , ..., z K as the componentassignment variable, where each z k is a K -indicator vector. The component assignment depends onthe mixing variable c = c , ..., c K , with (cid:80) Kk =1 c k = 1 . Thus, the likelihood is: p ( y | µ, ψ ) = N (cid:89) n =1 K (cid:89) k =1 ( NegBin ( y n | µ k , ψ k )) z nk We define the marginal over component assignments as a categorical distribution: p ( z | c ) = N (cid:89) n =1 K (cid:89) k =1 c z nk k , and the prior over the K -simplex mixing vector as a Dirichlet distribution with hyperparameter α : p ( c ) = Dir ( c | α ) = Γ( Kα )Γ( α ) K K (cid:89) k =1 c α − k , where Γ( · ) is the Gamma function. Using a Dirichlet prior for the distribution of component means inmixture models does not require one to specify the number of components. Instead, a concentrationparameter α controls it implicitly. The Dirichlet prior serves as a nonparametric prior on the mixturecomponents. Finally, we specify priors for the regression coefficients and the precision parameters: β kd ∼ N ( m , s ) ,ψ k ∼ LN ( a , b ) . We chose the following weakly informative [5] hyperparameters: α = 0 . , m = 0 . , s = 10 . , a = 0 . , b = 2 . . In addition to the usual regression parameters, this nonparametric mixture models produces severaladditional parameters of interest. For each mixture component k , we want to estimate the relative2revalence of the mixture component in the data and parameters of the mixture component’s dis-tribution, such as the mean, variance, and regression coefficients. The mixture weights c k are theprobabilities associated with each component. In addition, for each observation, we want to estimatethe mixture component from which it was most likely drawn, also called the component assignment z . We analyzed health care billing claims provided by the AOK Research Institute. AOK covers around30% of the German resident population. The data set contains patient-level information on inpatientand outpatient diagnoses and procedures from 2009 to 2012, as well as service utilization. We used astudy population previously derived from this data set consisting of patients with incident lung cancerin 2009. More information on the data set and the selection criteria can be found in [16].The outcome of interest was the total number of inpatient hospital days for each patient in the yearafter diagnosis. Inpatient hospital days are defined as the number of days from formal admission tohospital until discharge (i.e., hospital outpatient procedures do not count as hospital days), summedover all hospitalizations in a year. Admission and discharge on the same day count as one hospital day,so only individuals who were never admitted have zero hospital days. We included only individualswho survived for the full year, resulting in N = 7118 individual observations. The mean number ofhospital days is 44, with a maximum of 296 and 59 zero observations.We included the following covariates in the model: age, sex, treatment type during the course ofthe disease (chemotherapy, radiation therapy, or surgery), number of other tumor sites at diagnosis,number of metastases at diagnosis, Charlson comorbidity index, and district type of residence (majorcity, urban district, rural district, or thinly populated rural district). The Charlson comorbidity indexwas calculated using ICD-10 codes as in [17] with the slight modification of excluding the diagnosisof lung cancer out of the group “solid tumor without metastation". For the AOK data set, the DP-NB finds 3 components as having the highest expected posteriormixture weights. In the following, we only show results based on this final model with 3 components.Component 1 contains only 6% (420/7118) of all observations and corresponds to individuals whospend, on average, fewer days in hospital but with very high variance. The mode of hospital days forcomponent 1 is 5 for the DP-NB and 7 for the DP-ZINB (zero-mode omitted). Component 2 is thelargest, with 58% (4091/7118) of individuals; they stay longer in hospital (the mode is at 24 days)with less variance. Component 3 comprises 37% (2607/7118) of the population, and these patientshave the most hospital days (mode at 39 days) and again high variance.
Comp. 1
Comp. 2
Comp. 3incidence rate ratio
10% 20% 30% 40%chemo+radio+surgeryradio+surgerychemo+surgerychemo+radiosurgeryradiation therapychemotherapyno therapy
A B
Figure 1: (A) DP-NB estimation results for all three components on the AOK data set. Parameterestimates are presented as incidence rate ratios and 95% high probability density intervals. Intervalsthat exclude the 1 are highlighted in purple. Intercept is not shown.Figure 1 (A) shows β k parameter estimates for each component of the DP-NB as incidence rate ratios(IRRs) alongside the Bayesian high probability density intervals (HPDIs). These coefficients can3e interpreted as the multiplicative increase in the expected number of hospital days for every oneunit increase in the predictor. For example, in components 1 and 2, treatment is associated withmore hospital days, across all modalities. The IRR for the combination of all three treatments is thelargest, 8.3. That is, compared to patients who receive no treatment, those who receive chemotherapy,radiation, and surgery have 8.3 times as many expected hospital days. In general, all treatmentcombinations have higher IRRs in component 1 than in components 2 and 3. The only exceptionis radiotherapy, which has an IRR of 4.1 in component 2, higher than 0.6 in component 3, and 3.1in component 1. The IRR for the number of metastases is decreasing over the components: 1.24,1.10 in 2, and 0.99 in 3. On the other hand, the IRR for the number of multiple tumors is increasingfrom 0.69 to 0.81 to 0.95 across components 1, 2, and 3. In component 3, radiation is associatedwith fewer hospital days and surgery is null, whereas chemotherapy and combinations are associatedwith more hospital days. Demographic factors and baseline health were less strongly associated withhospital days. Age appears to have no relationship to hospital days (the IRRs are around 1.0 in allcomponents), and sex has a very small estimate in component 1 (IRR of 0.93). Regional factors areonly important for individuals in component 3 and only for urban districts, where the IRR is 0.64.When we use hard assignments to classify individuals into components according to the highestposterior probability, we see that treatment patterns are very different across components (see Figure 1(B)). Chemotherapy plus radiation is the most common treatment in all components, but individuals incomponent 1 are far more likely to receive this combination (42% compared to 23% in components 2and 3). Surgery alone is the second most common treatment in components 2 (22%) and 3 (17%), andchemotherapy alone is the second most common treatment in component 1 (20%), but it is infrequentin 2 (12%) and 3 (13%). Compared to people in component 1, those in components 2 and 3 are morelikely to receive chemotherapy combined with surgery or surgery and radiation. This paper explores a Bayesian regression model for count data that combines nonparametric cluster-ing with the advantages of finite mixture models. It avoids model selection and decision bias becauseall parameters, including the ideal number of mixture components, are estimated from the data.For the AOK data set, the DP models find three components of individuals with strikingly differentdistributions of hospital days and treatment patterns. In the treatment of lung cancer, surgery offersthe best prospect of cure. If diagnosed at an early stage, it is possible to remove the tumor as awhole, such that no further treatment is necessary. If the tumor is already bigger, surgical resectionwith chemotherapy and radiation is the treatment of choice. In metastatic lung cancer, palliativechemotherapy, possibly accompanied by radiation therapy for individual metastases, may alleviatesymptoms and prolong survival. [7]Component 1 has the fewest hospital days on average. In this component, we find many patientswith chemotherapy only, and chemotherapy in combination radiation therapy. This, and the lack ofsurgery, likely indicate that these patients were already in an advanced (metastatic) stage at diagnosis.For these patients, it is likely that therapy had a palliative intent with a focus on improving qualityof life. In contrast, patients in components 2 and 3 were more likely to have surgery only, surgeryand chemotherapy, and the combination of all three treatments. This indicates diagnosis at an earlierstage and more aggressive treatment. The treatments received by people in components 2 and 3 arequite similar, with only the proportion of surgery being slightly higher in 2 than in 3. However, thecoefficients governing the relationships between treatment type and hospital days in these componentsare quite different. While radiation therapy is associated with significantly more hospital days incomponent 2, it has the opposite association in component 3. Moreover, the strong and positiveassociation of surgery with hospital days in component 2 fades in component 3, where surgery has norelationship to hospital days. Together, these results suggest that people in components 2 and 3 getsimilar treatment combinations, but for different reasons.There are several limitations to this study. First, DP mixture models present computational challenges.For example, care must be taken when fitting Bayesian mixture models to avoid the so-called "label-switching problem" caused by the model being invariant under permutations of the indices of thecomponents (i.e., the indices of the model components may be permuted across chains). It is crucialto run multiple Markov chains, inspect the resulting posterior samples, and apply posterior checks, aswe have done here. In addition, because nonparametric mixture models allow inference over such a4arge parameter space (i.e., DP mixture models use a prior over essentially all distributions), posteriorcomputation may be intractable. Neal [12] provides an overview of the state-of-the-art samplingalgorithms, but much progress has been made since then. In our applications, fitting this model tothe AOK data took approximately 4 hours on a current quadcore CPU with 32GB RAM, comparedto only 3 minutes to compute a finite mixture model with a pre-specified number of components.Further research should investigate how variational Bayesian methods could improve speed and howthis affects the accuracy of the estimates.In addition, interpretation is necessarily more difficult in complex models such as these. In thecase of the DP-NB model, with three mixture components, the number of regression coefficients isthree times the number of covariates. However, inference on multiple parameters simultaneously isrelatively straightforward in Bayesian models, which is another advantage of this approach.
References [1] Teresa Bago d’Uva. Latent class models for utilisation of health care.
Health economics , 15(4):329–343,2006.[2] Partha Deb, Pravin K Trivedi, et al. Demand for medical care by the elderly: a finite mixture approach.
Journal of applied Econometrics , 12(3):313–336, 1997.[3] Naihua Duan, Willard G Manning, Carl N Morris, and Joseph P Newhouse. A comparison of alternativemodels for the demand for medical care.
Journal of business & economic statistics , 1(2):115–126, 1983.[4] Gilbert W Fellingham, Athanasios Kottas, and Brian M Hartman. Bayesian nonparametric predictivemodeling of group health claims.
Insurance: Mathematics and Economics , 60:1–10, 2015.[5] Andrew Gelman. Prior distributions for variance parameters in hierarchical models (comment on article bybrowne and draper).
Bayesian analysis , 1(3):515–534, 2006.[6] William Greene. Accounting for excess zeros and sample selection in poisson and negative binomial regres-sion models. Technical report, New York University, Leonard N. Stern School of Business, Department ofEconomics, 1994.[7] C Gridelli, F Perrone, F Nelli, S Ramponi, and F De Marinis. Quality of life in lung cancer patients.
Annalsof oncology , 12(suppl 3):S21–S25, 2001.[8] Joseph M Hilbe.
Modeling count data . Springer, 2011.[9] Aroon D Hingorani, Daniëlle A van der Windt, Richard D Riley, Keith Abrams, Karel GM Moons,Ewout W Steyerberg, Sara Schroter, Willi Sauerbrei, Douglas G Altman, and Harry Hemingway. Prognosisresearch strategy (progress) 4: stratified medicine research.
Bmj , 346:e5793, 2013.[10] David Kashihara and Kelly Carper. Statistical brief 272: National health care expenses in the us civiliannoninstitutionalized population, 2007.
Agency for Healthcare Research and Quality , 2009.[11] Borislava Mihaylova, Andrew Briggs, Anthony O’Hagan, and Simon G Thompson. Review of statisticalmethods for analysing healthcare resources and costs.
Health economics , 20(8):897–916, 2011.[12] Radford M Neal. Markov chain sampling methods for dirichlet process mixture models.
Journal ofcomputational and graphical statistics , 9(2):249–265, 2000.[13] Winfried Pohlmeier and Volker Ulrich. An econometric model of the two-part decisionmaking process inthe demand for health care.
Journal of Human Resources , pages 339–361, 1995.[14] CE Rasmussen. The infinite gaussian mixture model.
Advances in Neural Information Processing Systems ,pages 554–560, 2000.[15] Megan S Schuler, Nina R Joyce, Haiden A Huskamp, Elizabet B Lamont, and Laura A Hatfield. Medicarebeneficiaries with advanced cancer experience diverse patterns of care from diagnosis to death.
HealthAffairs , 36(7):1193–1200, 2017.[16] Larissa Schwarzkopf, Margarethe Wacker, Rolf Holle, Reiner Leidl, Christian Günster, Jürgen-BernhardAdler, and Rudolf Maria Huber. Cost-components of lung cancer care within the first three years afterinitial diagnosis in context of different treatment regimens.
Lung Cancer , 90(2):274–280, 2015.[17] Vijaya Sundararajan, Toni Henderson, Catherine Perry, Amanda Muggivan, Hude Quan, and William AGhali. New icd-10 version of the charlson comorbidity index predicted in-hospital mortality.
Journal ofclinical epidemiology , 57(12):1288–1294, 2004., 57(12):1288–1294, 2004.