Analysis of Prescription Drug Utilization with Beta Regression Models
Guojun Gan ∗ Emiliano A. Valdez † August 4, 2020
Abstract
The healthcare sector in the U.S. is complex and is also a large sector that generates about 20% of the country's gross domestic product. Healthcare analytics has been used by researchers and practitioners to better understand the industry. In this paper, we examine and demonstrate the use of Beta regression models to study the utilization of brand name drugs in the U.S. and to understand the variability of brand name drug utilization across different areas. The models are fitted to public datasets obtained from the Centers for Medicare & Medicaid Services and the Internal Revenue Service. Integrated Nested Laplace Approximation (INLA) is used to perform the inference. The numerical results show that Beta regression models can fit the brand name drug claim rates well and that including spatial dependence improves the performance of the Beta regression models. Such models can be used to reflect the effect of prescription drug utilization when updating an insured's health risk in a risk scoring model.
Keywords: Beta regression, healthcare analytics, random effects, spatial modeling
Introduction

In the pharmaceutical industry, a brand name drug is a prescription drug that is developed and patented by a company. The patent gives the company a period of market exclusivity, during which the company sells the brand name drug at a significantly high price. For more information about drug pricing, readers are referred to Schoonveld (2015). When the patent expires, generic versions of the brand name drug are marketed at lower prices by other companies. Generally, most states allow pharmacists to substitute brand name drugs with generic versions, unless otherwise directed by physicians. Nevertheless, according to the Association for Accessible Medicines (AAM) (Klinger, 2017), generics account for 89% of prescriptions dispensed but only 26% of total drug costs in the U.S. Recent annual prescription spending in the United States has been on the rise. According to the NHE (National Health Expenditure) fact sheet, prescription drug spending in the U.S. increased 2.5% to $335.0 billion in 2018, faster than the 1.4% growth in 2017.

This large disparity in the costs between brand name and generic drugs motivated us to investigate the presence of feature variables, e.g., location, income, and demographics, that drive the decision to choose one type over the other. It is also our intent to prescribe a model that health insurers can replicate to reflect these decision drivers in a risk adjustment program (Duncan, 2011), a mechanism widely popular for assessing the health risk of an insured (Hileman et al., 2016). This mechanism is an umbrella of commercially designed risk assessment models that are sometimes called risk scoring models.

∗ Corresponding author, Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA. Email: [email protected].
† Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA. Email: [email protected].
While models do vary, they share a common objective of predicting the healthcare cost per member per year using the prior year's medical utilization and expenditures as well as additional risk factors such as location and demographics.

The decision to prescribe or purchase generic or brand name drugs has been studied before. Brinberg and Cummings (1984) used two behavioral intention models to examine the decision to purchase generic prescription drugs and found several differences between individuals who intend to purchase generic prescription drugs and those who do not. For example, non-intenders believed it was less likely that generic drugs were a safe product than intenders. Hamann et al. (2013) studied psychiatrists' decision making between generic and branded antipsychotics and found that psychiatrists were more likely to choose branded drugs when imagining choosing the drug for themselves. Hassali et al. (2014) reviewed experiences of implementing generic medicine policy in eight countries, including the U.S. The review indicates that pharmacists play an essential role in promoting generic medicines, as they substituted 83.8% of prescriptions that allowed substitution, and that generic prescribing is still not common practice in the U.S. because many physicians have negative perceptions about generic medicines and lack in-depth knowledge about the bioequivalence concept applied in the U.S. Kesselheim et al. (2016) studied variations in patients' perceptions and use of generic drugs based on a national survey and found a substantial shift in that more patients have positive views of generic drugs.

In the aforementioned work, Brinberg and Cummings (1984), Hamann et al. (2013), and Kesselheim et al. (2016) used survey data to study the behavior of patients and physicians. The review conducted by Hassali et al. (2014) was based on a literature search using several electronic databases and search engines such as the ISI (Institute for Scientific Information) Web of Knowledge and Google.
Public pharmaceutical drug utilization data have not been used to study the factors driving the decision to choose one type over the other. In this paper, we investigate the rates of brand name drug claims in the U.S. In particular, we investigate the variations of the brand name drug claim rates in different areas of the U.S. Our study supplements the aforementioned studies based on survey data. Since rates are values in the interval (0, 1), we model them with Beta regression models.

Related Work

Healthcare is an important practice area of actuaries due to the existence of health insurance. In this section, we review some work that is related to health insurance analytics. Early work includes Bolnick (2004), Petertil (2005), and Bachler et al. (2006). Bolnick (2004) proposed a framework for long-term healthcare cost projections by incorporating key healthcare cost drivers such as life expectancy, biological morbidity, and economic morbidity. Using the framework, Bolnick (2004) considered various plausible future scenarios that encompass a reasonable range of healthcare cost outcomes. Petertil (2005) studied the relative significance of aging as a driver of healthcare cost beyond age fifty and used Medicare data to draw general conclusions that utilization and cost differ by age. Bachler et al. (2006) studied the impact of chronic and nonchronic insured members on cost trends and found that the classification of chronic members can affect the trends significantly.

Recently, there have been many more studies on healthcare analytics than ten years ago. Duncan (2011) devoted a book to healthcare analytics for actuaries. In particular, this book covers the essentials of health risk and case studies on the use of predictive modeling in risk adjustment. For example, Chapter 14 of this book discusses the risk adjustment method used by the Centers for Medicare and Medicaid Services (CMS). Frees et al.
(2011) extended the standard two-part model to predict the frequency and amount of healthcare expenditures and used data from the Medical Expenditure Panel Survey (MEPS) to calibrate and test the model. Shi and Zhang (2013) built a game-theoretic model based on copulas to study the effect of managed care on healthcare utilization compared to traditional fee-for-service plans in the private health insurance market.

Getzen (2016) developed a method to evaluate projections of future medical spending used by the U.S. Medicare and Medicaid programs and found that the recent set of projections (1998–2010) is more accurate than the older set of projections (1986–1995) because the recent projections incorporate lagged macroeconomic effects. Duncan et al. (2016) investigated several statistical techniques (e.g., lasso GLM, multivariate adaptive regression splines, random forests, decision trees, and boosted trees) for modeling future healthcare costs and found that the traditional regression approach does not perform well.

Huang et al. (2017) compared different models and model selection methods for health costs in Episode Treatment Groups (ETGs) and found that random forest feature selection is preferable in terms of efficiency and accuracy. Richardson and Hartman (2018) proposed a Bayesian nonparametric regression model for predicting healthcare claims. Their numerical results show that the Bayesian nonparametric model outperforms standard linear regression and generalized Beta regression in terms of predictive accuracy. Brockett et al. (2018) investigated the use of Data Envelopment Analysis (DEA), a method developed in management science to measure the relative efficiency of organizations, to assess the potential savings of Medicare. Their analysis shows that Medicare Advantage plans are more efficient in reducing health expenditures than the original Medicare. In Brockett et al.
(2019), the authors used linear regression models to estimate the effect of medical loss ratio (MLR) and efficiency on the quality of care for Medicaid managed care plans. The results show that the effect of medical services efficiency on the quality of care is insignificant. Kan et al. (2019) explored the use of machine learning techniques for risk adjustment and found that penalized regression performs better than ordinary least squares in predicting healthcare costs.

Our work presented in this paper is related to healthcare analytics but is different from the work mentioned above. In particular, we examine Beta regression models with spatial dependence to model the percentages of brand name drug claims. The methods and findings of this paper can be used by practitioners in their risk modeling and adjustment.
Data

We use a public dataset called the Part D Prescriber Public Use File (PUF) from the Centers for Medicare & Medicaid Services (CMS) and another public dataset called Individual Income Tax Statistics from the IRS (Internal Revenue Service).

The data in the Part D Prescriber PUF cover calendar years 2013 through 2016 and contain information on prescription drug events (PDEs) incurred by Medicare beneficiaries with a Part D prescription drug plan. The data consist of a detailed data file and two summary tables. The "Part D Prescriber Summary Table" contains overall drug utilization, drug costs, and beneficiary counts organized by National Provider Identifier (NPI). In this paper, we use the 2016 Part D Prescriber Summary Table, which contains 1,131,550 records and 84 variables. The file name is PartD Prescriber PUF NPI 16.txt; it was accessed on Jan 20, 2020. The Part D Prescriber Summary Table contains information about the brand and generic drug claims at the provider level. Other information includes the zip code of providers, the count of beneficiaries, the average age of beneficiaries, and the average risk score of beneficiaries.

In this paper, our interest is in studying variations of the brand name drug claim rates in different areas of the U.S. To that end, we need to aggregate the data to different areas. However, the data can be aggregated to different levels: the state level, the zip code level, or some customized level between the state level and the zip code level. On the one hand, aggregating the data to the state level is not desirable for the following reasons:

• The brand name drug claim rate exhibits variations within individual states.
• The U.S. has only 50 states, so aggregating the data to the state level produces only 50 data points.

On the other hand, aggregating the data to the zip code level is also not desirable:

• The dataset contains about 20,000 different zip codes, so aggregating the data to the zip code level will produce about 20,000 data points. This causes challenges in modeling the spatial effects because a large number of sites requires many parameters.
• Some zip codes do not correspond to geographical areas but to large-volume customers or post office boxes.
• Aggregating the data to the zip code level produces highly volatile brand name drug claim rates. That is, the rates are highly volatile between zip codes.

As a result, we decide to aggregate the data to some customized level between the state level and the zip code level.
Figure 1: A mesh of the 48 contiguous states with 530 triangles.

To aggregate the data to a customized level, we first need to split the U.S. land into small areas, i.e., create a mesh of the U.S. land. This can be done conveniently by using the R function inla.mesh.2d from the INLA package. Figure 1 shows a mesh of the 48 contiguous states of the U.S. that consists of 530 triangles. Aggregating the data to the 530 triangular areas will produce a dataset with about 530 data points. This is computationally reasonable for modeling spatial effects.

Aggregating the data to the triangles shown in Figure 1 is done as follows. First, we obtain the longitudes and the latitudes of zip codes from the R package noncensus, which contains regional information and demographic data determined by the U.S. Census Bureau. Note that each zip code is associated with a longitude and a latitude. Second, we determine the triangles to which the zip codes belong by using the longitudes and the latitudes. Finally, we aggregate the data based on the indices of the triangles.

The Part D Summary Table contains average ages and average risk scores of the beneficiaries. The average risk scores are average HCC (Hierarchical Condition Category) risk scores, which estimate how beneficiaries' FFS (Fee-For-Service) spending will compare to the overall average of the entire Medicare population. Before aggregating the data, we convert average ages and average risk scores to total ages and total risk scores by multiplying the average numbers by the count of beneficiaries. After aggregation is done, we convert total ages and total risk scores back to average ages and average risk scores by dividing the totals by the aggregated count of beneficiaries. The Part D Summary Table also contains missing values. We remove missing values before aggregating the data.
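The paper builds the mesh in R with inla.mesh.2d; the assign-and-aggregate step described above can be sketched in Python with a toy triangulation. All coordinates, counts, and column names below are made up for illustration, and scipy's Delaunay triangulation stands in for the INLA mesh:

```python
import numpy as np
import pandas as pd
from scipy.spatial import Delaunay

# Hypothetical zip-code-level records with Part D style fields.
zips = pd.DataFrame({
    "lon":          [-72.3, -72.1, -89.5, -89.7, -118.2],
    "lat":          [ 41.8,  41.9,  40.1,  40.0,   34.1],
    "brand_claims": [  120,    80,   200,   150,    400],
    "total_claims": [  500,   300,   900,   600,   1500],
    "beneficiaries":[   60,    40,   110,    70,    200],
    "avg_age":      [ 71.0,  69.5,  72.3,  70.1,   68.9],
})

# A toy triangulation standing in for the mesh of the U.S.
mesh_nodes = np.array([[-125, 30], [-65, 30], [-125, 50], [-65, 50], [-95, 40]])
tri = Delaunay(mesh_nodes)

# Assign each zip code to the triangle containing it (-1 means outside the mesh).
zips["triangle"] = tri.find_simplex(zips[["lon", "lat"]].to_numpy())

# Convert averages to totals before aggregating, as the paper describes.
zips["total_age"] = zips["avg_age"] * zips["beneficiaries"]

agg = zips[zips["triangle"] >= 0].groupby("triangle").sum(numeric_only=True)
agg["brandrate"] = agg["brand_claims"] / agg["total_claims"]
agg["avgage"] = agg["total_age"] / agg["beneficiaries"]
print(agg[["brandrate", "avgage"]])
```

The same totals-then-divide logic applies to the risk scores; it guarantees that the triangle-level averages are beneficiary-weighted rather than simple averages of provider-level averages.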
Figure 2: Distributions of the brand name drug claim rates. Panel (a) maps the rates by triangular area (white triangles mean that data are not available in these areas); panel (b) shows the frequency distribution of the rates.

Figure 2(a) shows the distribution of the brand name drug claim rates in different triangular areas. From the figure, we see that 19 triangular areas do not have data. As a result, the aggregated dataset contains 511 observations. From the figure, we also see that some triangles in the middle are light red. This suggests that modeling the spatial effects may improve the fit of Beta regression models to the data.

Table 1: Summary statistics of the Part D data (Min, 1st Q, Median, Mean, 3rd Q, and Max of brandrate, avgage, and avgscore).
The Individual Income Tax Statistics dataset from the IRS contains 29,874 records, each of which is described by 144 variables; it was accessed on Jan 12, 2020. It contains information about the number of returns and the amount of returns in different categories. For example, it contains the number of returns with unemployment compensation and the unemployment compensation amount. The return counts and amounts are aggregated to the zip code level. Although this dataset does not contain all information about an area, it does provide some demographic and economic information, which might be useful for explaining the variation of the brand name drug claim rate. For example, the number of returns from an area is related to the population of the area.

We aggregate the tax data to the 530 triangular areas in the same way as we aggregate the Part D data. After aggregating the data, we divide the total dollar amount of an area by the corresponding number of returns to get the average dollar amount of the area. The total number of returns and the average amounts in different categories are used in the data analysis. Since this dataset has 144 variables, we do not show the summary statistics of these variables here.

Beta Regression Models

In this section, we describe some Beta regression models. In particular, we present four models: the basic Beta regression model, the Beta regression model with random effects, the Beta-Besag model, and the Beta-BYM model.

The density function of the Beta distribution is typically defined as (Klugman et al., 2012):

    f(y; p, q) = [Γ(p + q) / (Γ(p)Γ(q))] y^(p−1) (1 − y)^(q−1),    0 < y < 1,    (1)

where p > 0 and q > 0 are shape parameters. Reparameterized in terms of a mean parameter µ and a dispersion parameter φ, the density becomes

    f(y; µ, φ) = [Γ(φ) / (Γ(µφ)Γ((1 − µ)φ))] y^(µφ−1) (1 − y)^((1−µ)φ−1),    0 < y < 1,    (2)

where 0 < µ < 1 and φ >
0. The shape parameters can be obtained from the mean and the dispersion as follows: p = φµ and q = φ(1 − µ).

The basic Beta regression model is described as follows. Suppose that we have n observations. For i = 1, 2, ..., n, let x_i = (x_{i1}, x_{i2}, ..., x_{ik})^T and y_i be the vector of k regressors and the response in the ith case, respectively. The responses y_1, y_2, ..., y_n are assumed to form a random sample such that

    y_i ∼ Beta(µ_i, φ).

The mean µ_i is linked to the regressors as follows:

    g(µ_i) = η_i = x_i^T β,    (3)

where g(·) is the link function and β = (β_1, β_2, ..., β_k)^T is the vector of regression coefficients. The variance of y_i can be estimated as

    Var(y_i) = µ_i(1 − µ_i) / (1 + φ).    (4)

Random effects models are commonly used to analyze areal or spatial data. For example, Tufvesson et al. (2019) applied random effects models to model car insurance data with geographical locations of the policyholders. There are two types of random effects models: unstructured and structured. Unstructured random effects models assume independence of the random effects, while structured random effects models allow for spatial dependence.

The second model we consider is a Beta regression model with a single random effect. In this model, a random effect is introduced into the linear predictor as follows:

    g(µ_i) = η_i = x_i^T β + v_i,    (5)

where the v_i are i.i.d. Gaussian noise, i.e.,

    v_i ∼ N(0, 1/ψ_v),    (6)

where ψ_v is a precision parameter. This model can help to control for unobserved heterogeneity when the heterogeneity is not correlated with the independent variables.
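The mean-dispersion parameterization in Equation (2) and the variance formula in Equation (4) can be checked numerically; the following sketch uses scipy, with arbitrary values of µ and φ:

```python
from scipy import stats

def shapes_from_mean_dispersion(mu, phi):
    """Map (mu, phi) in Equation (2) back to the shapes (p, q) of Equation (1)."""
    return mu * phi, (1.0 - mu) * phi

mu, phi = 0.3, 50.0
p, q = shapes_from_mean_dispersion(mu, phi)
dist = stats.beta(p, q)

print(dist.mean())  # recovers mu
print(dist.var())   # matches mu * (1 - mu) / (1 + phi), Equation (4)
```

A large φ therefore concentrates the distribution around its mean, which is why φ is often called a precision parameter.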
The basic Beta regression model assumes that the heterogeneity is correlated with the independent variables.

The third model we consider is the Besag model, which introduces a structured random effect into the linear predictor:

    g(µ_i) = η_i = x_i^T β + u_i,    (7)

where the u_i follow a CAR(1) model, i.e., an order-1 conditional autoregressive model (Besag and Kooperberg, 1995). Building a CAR model requires a neighborhood graph, which tells which areas are neighbors of each other. Figure 3 shows a neighborhood graph where areas that are neighbors are connected by blue lines. In our models, we assume that triangles that share a vertex or a side are neighbors of each other.

Given a neighborhood graph, the CAR(1) model assumes that the random effect in an area is related to the random effects in the area's neighbors through a conditional Gaussian distribution:

    u_i | {u_j : j ∈ A_i} ∼ N( (1/|A_i|) Σ_{j∈A_i} u_j, 1/(ψ_u |A_i|) ),    (8)

where A_i is the set of indices of neighbors of the ith area, |A_i| denotes the number of elements in A_i, and ψ_u is a precision parameter. Note that the CAR(1) model can be written as a multivariate Gaussian model for u = (u_1, u_2, ..., u_n) (Besag and Kooperberg, 1995; Wang et al., 2018):

    u ∼ N(0, (1/ψ_u) Q^{−1}),    (9)

where Q is a highly sparse matrix defined by

    Q_{ij} = |A_i| if i = j;    −1 if j ∈ A_i;    0 otherwise.

Figure 3: The neighborhood graph used to model spatial effects.

The fourth model is the BYM (Besag-York-Mollié) model, which combines an unstructured random effect and a structured random effect (Besag et al., 1991):

    g(µ_i) = η_i = x_i^T β + u_i + v_i,    (10)

where the u_i are modeled by a CAR(1) model as specified in Equation (8) and the v_i are i.i.d. Gaussian noise as specified in Equation (6). The structured effects u model the spatial effect. The unstructured effects v = (v_1, v_2, ..., v_n) are used to model additional random variations that are not explained by geographical locations.
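The structure matrix Q of Equation (9) can be built directly from a neighborhood list; here is a minimal sketch on a hypothetical four-area chain graph:

```python
import numpy as np

def car_precision(neighbors):
    """Build the CAR(1) structure matrix Q from a neighborhood list:
    Q[i, i] = |A_i|; Q[i, j] = -1 if j is a neighbor of i; 0 otherwise."""
    n = len(neighbors)
    Q = np.zeros((n, n))
    for i, A_i in enumerate(neighbors):
        Q[i, i] = len(A_i)
        for j in A_i:
            Q[i, j] = -1.0
    return Q

# A toy neighborhood graph: areas 0-1-2-3 in a chain.
neighbors = [[1], [0, 2], [1, 3], [2]]
Q = car_precision(neighbors)
print(Q)

# Q is symmetric and its rows sum to zero, so Q is singular:
# the CAR(1) prior is improper (intrinsic), as is standard for Besag models.
print(Q.sum(axis=1))
```

For the 511 triangular areas the same construction applies, with each A_i read off the neighborhood graph in Figure 3.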
Between the two effects u and v, a larger effect of v in the model means that less exchange of information between areas is allowed. The relative strengths of the unstructured effects v and the structured effects u are controlled by the two hyperparameters ψ_v and ψ_u, respectively.

Table 2 summarizes the four Beta regression models described above. The first two models do not consider spatial dependence, while the last two models allow for spatial dependence.

Table 2: Linear predictors of the four models.

    Model name    Linear predictor
    BetaReg       Equation (3)
    BetaRE        Equation (5)
    BetaBesag     Equation (7)
    BetaBYM       Equation (10)

There are several choices of the link function (Ferrari and Cribari-Neto, 2004):

• the logit link: g(µ) = log(µ / (1 − µ));
• the probit link: g(µ) = Φ^{−1}(µ), where Φ(·) is the standard normal distribution function;
• the log-log link: g(µ) = −log(−log(µ));
• the complementary log-log link: g(µ) = log(−log(1 − µ));
• the Cauchy link: g(µ) = tan(π(µ − 1/2)).

Figure 4: Some link functions for Beta regression models.
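Each link maps µ in (0, 1) onto the real line and must be invertible so that µ_i = g^{−1}(η_i) can be recovered from the linear predictor. A sketch of the five links and their inverses (the value of µ is arbitrary):

```python
import numpy as np
from scipy.stats import norm, cauchy

# The five link functions, each mapping mu in (0, 1) to the real line.
links = {
    "logit":   lambda mu: np.log(mu / (1 - mu)),
    "probit":  lambda mu: norm.ppf(mu),
    "loglog":  lambda mu: -np.log(-np.log(mu)),
    "cloglog": lambda mu: np.log(-np.log(1 - mu)),
    "cauchit": lambda mu: cauchy.ppf(mu),  # equals tan(pi * (mu - 1/2))
}
inverses = {
    "logit":   lambda eta: 1 / (1 + np.exp(-eta)),
    "probit":  lambda eta: norm.cdf(eta),
    "loglog":  lambda eta: np.exp(-np.exp(-eta)),
    "cloglog": lambda eta: 1 - np.exp(-np.exp(eta)),
    "cauchit": lambda eta: cauchy.cdf(eta),
}

mu = 0.42
for name, g in links.items():
    eta = g(mu)
    print(f"{name:8s} g(mu) = {eta:+.4f}, g^-1(g(mu)) = {inverses[name](eta):.4f}")
```

The heavy tails of the Cauchy link make it the most distinct member of the family, which is consistent with its somewhat different behavior in the results below.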
Results
In this section, we present the results of fitting the four Beta regression models (see Table 2) to the brand name drug claim data.
Before we fit the models, we transform the covariates by using the Yeo-Johnson transformation introduced by Yeo and Johnson (2000). The purpose of transforming the covariates is to reduce their magnitudes for the convenience of model fitting. The Yeo-Johnson transformation is similar to the power transformation but can handle zero and negative values. In this transformation, a value x is transformed as follows:

    x* = log(1 + x)   if x ≥ 0;
    x* = −log(1 − x)  if x < 0.    (11)

We transform all covariates except for the average risk score before fitting the models. After transforming the data, we split the dataset into two sets: one for fitting the models and one for validating the models. We use 80% of the data for fitting and the remaining 20% for evaluation. Figure 5 shows the distribution of the brand name drug claim rates for the training set and the test set.

Figure 5: The distribution of the brand name drug claim rates for the training and test sets. Panel (a) shows the training set and panel (b) the test set.
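The transformation in Equation (11) is a one-liner; the sample values below are arbitrary:

```python
import numpy as np

def transform(x):
    """The transformation in Equation (11): shrinks magnitudes while
    handling zero and negative values, unlike a plain log transform."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.log1p(x), -np.log1p(-x))

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(transform(x))
```

The transform is odd and monotone, so it compresses large dollar amounts symmetrically without changing the ordering of the covariate values.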
The data contain 145 covariates: 143 from the individual income tax file and 2 (i.e., avgage and avgscore) from the Part D Summary Table. Including all these covariates in the models causes a parameter identification problem because many of these covariates are highly correlated.

To select covariates for fitting the models, we use lasso regularized generalized linear models (GLMs), which have been implemented in the R package glmnet (Friedman et al., 2010). Tufvesson et al. (2019) used this method to select covariates. However, the package glmnet does not support Beta regression models. To circumvent this problem, we first create a new binary response variable by comparing the brand name drug claim rates to the average rate. If a rate is above the average, then the corresponding response value is 1. Otherwise, the response value is 0. Then we use the lasso regularized logistic regression supported by glmnet to select covariates. We assume that the covariates that help to separate high and low rates can also serve as useful predictors.
Figure 6: Results from a cross-validation run of the lasso regularized logistic GLM based on the training set.

Figure 6 shows the results from a cross-validation run of the lasso regularized logistic GLM with different penalties λ. In the figure, the left dashed vertical line corresponds to the penalty λ_min that minimizes the cross-validation error. The second dashed vertical line corresponds to the largest penalty λ_1se whose cross-validation error is within one standard error of the minimal error. We use the model with the penalty λ_1se to select covariates. This gives us 12 covariates, which are described in Table 3. These 12 selected covariates are used to fit the Beta regression models.

Figure 7 shows a correlation plot of the response variable and the covariates. From the figure, we see that many of these variables are positively correlated. The variable A11070 is negatively correlated with a few other variables.
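The binarize-then-lasso workaround can be sketched as follows. This is not the paper's glmnet run: it uses scikit-learn on synthetic data, C plays the role of 1/λ, and the λ_1se rule is not reproduced, so it only illustrates the selection idea:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 145 covariates; only two truly matter.
n, k = 400, 20
X = rng.normal(size=(n, k))
rate = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 3])))  # latent "brand rate"

# Binarize the rate against its mean, as described above.
y = (rate > rate.mean()).astype(int)

# L1-penalized logistic regression; nonzero coefficients are "selected".
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(StandardScaler().fit_transform(X), y)

selected = np.flatnonzero(model.coef_[0])
print("selected covariate indices:", selected)
```

With a sufficiently strong penalty, the lasso zeroes out most of the uninformative covariates while keeping the ones that separate high and low rates.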
Table 3: Description of the selected covariates.

    Covariate    Description
    avgscore     Average risk score
    VITA         Number of volunteer income tax assistance (VITA) prepared returns
    A00900       Business or professional net income (less loss) amount
    A02300       Unemployment compensation amount
    A03150       Individual retirement arrangement payments amount
    A03230       Tuition and fees deduction amount
    A18450       State and local general sales tax amount
    A18800       Personal property taxes amount
    A07230       Nonrefundable education credit amount
    A85770       Total premium tax credit amount
    A11070       Additional child tax credit amount
    A11902       Overpayments refunded amount

Figure 7: Correlation of the selected covariates and the response variable.

Fitting of the models
Bayesian methods have been used to analyze spatial data for about twenty years, since around the year 2000, with the advent of Markov chain Monte Carlo (MCMC) simulation methods (Blangiardo and Cameletti, 2015). In fact, the advent of MCMC has allowed researchers to use Bayesian methods to develop complex models on large datasets. However, one major drawback of MCMC methods is that they are computationally demanding, especially for large datasets. INLA (Integrated Nested Laplace Approximation) is an alternative to MCMC for Bayesian inference (Rue and Held, 2005; Rue et al., 2009). A major advantage of INLA is that it is a deterministic algorithm that is capable of producing accurate results quickly. Since INLA was embedded into R through the R package INLA, it has become popular among researchers and practitioners. For example, Tufvesson et al. (2019) used INLA to model car insurance frequency and severity. In this paper, we use the R package INLA (Blangiardo and Cameletti, 2015; Wang et al., 2018) to fit the four Beta regression models.

The four models described in Section 4 can be formulated as Bayesian hierarchical models. For example, the BetaBYM model can be expressed as

    y_i | η_i ∼ Beta(g^{−1}(η_i), φ),    (12a)
    η_i = x_i^T β + u_i + v_i,    (12b)
    v_i | ψ_v ∼ N(0, ψ_v^{−1}),    (12c)
    u | ψ_u ∼ N(0, ψ_u^{−1} Q^{−1}),    (12d)
    ψ ∼ π(ψ),    (12e)

where π(·) denotes a prior distribution for the two hyperparameters ψ_v and ψ_u. Common choices for the prior distribution of ψ include independent gamma distributions. In this paper, we use the default priors in INLA.

The deviance information criterion (DIC) of Spiegelhalter et al. (2002) and the Watanabe-Akaike information criterion (WAIC) of Watanabe (2010) are commonly used to select Bayesian models. The DIC is motivated by the Akaike information criterion (AIC) and is defined as
    DIC = E[D(θ)] + p_D = D(E[θ]) + 2 p_D,    (13)

where D(θ) = −2 log p(y | θ) is the deviance of the model and p_D denotes the effective number of parameters, defined as

    p_D = E[D(θ)] − D(E[θ]).

In a Bayesian model, the deviance is a random variable, and the expected deviance under the posterior distribution is used as a measure of fit. Between two models, the model with a lower DIC value is preferred.

The WAIC is similar to the DIC, but the effective number of parameters is calculated differently. The WAIC is defined as
    WAIC = D(E[θ]) + 2 p_W,    (14)

where

    p_W = Σ_{i=1}^{n} { E[(log p(y_i | θ))^2] − (E[log p(y_i | θ)])^2 }.

The WAIC is interpreted in the same way as the DIC. That is, the lower the WAIC, the more preferable the model. Between the DIC and the WAIC, Gelman et al. (2014) argue that the WAIC is preferred.

We also use two measures to validate the out-of-sample performance of the models. The first measure is the concordance correlation coefficient (CCC), which is used to measure the agreement between two variables. The CCC is defined as (Lin, 1989):
    CCC = 2 ρ σ_1 σ_2 / (σ_1^2 + σ_2^2 + (µ_1 − µ_2)^2),    (15)

where ρ is the correlation between the observed values and the predicted values, σ_1 and µ_1 are the standard deviation and the mean of the observed values, respectively, and σ_2 and µ_2 are the standard deviation and the mean of the predicted values, respectively. The value of the CCC ranges from −1 to 1, and a higher value indicates better agreement. The second measure, denoted RSE, is defined as

    RSE = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2 ),    (16)

where y_i and ŷ_i represent the ith observed value and the ith predicted value, respectively. Between two models, the model with a lower RSE value is better.

Table 4 shows the DIC and the WAIC of fitting the four Beta regression models with different link functions to the training dataset. If we look at the rows, we see that the performance based on the five link functions is quite similar, except for the Cauchy link. If we look at the columns, we see that the BetaBYM model performs the best. In terms of DIC, the BetaBYM model with the Cauchy link performs the best. In terms of WAIC, however, the BetaBYM model with the log-log link is the best. Nevertheless, the performance of the BetaBYM model is quite similar for the first four link functions, so using the logit link function is not a bad choice.

Table 5 shows the out-of-sample performance of the models with different link functions on the test set. In terms of the CCC, the BetaBYM model with the log-log link performs the best because the corresponding CCC value is the highest among the 20 cases.

Table 4: In-sample performance of the models with different link functions.
    Model        Logit      Probit     Loglog     Cloglog    Cauchy
    DIC
    BetaReg      -2103.82   -2102.92   -2101.53   -2104.36   -2109.25
    BetaRE       -2103.60   -2102.20   -2100.60   -2104.10   -2109.79
    BetaBesag    -2105.79   -2109.89   -2109.58   -2107.41   -2114.87
    BetaBYM      -2116.93   -2115.13   -2112.55   -2118.06   -2132.92
    WAIC
    BetaReg      -2070.47   -2070.84   -2071.36   -2069.72   -2062.49
    BetaRE       -2070.60   -2071.07   -2071.56   -2069.88   -2063.02
    BetaBesag    -2077.25   -2081.93   -2083.42   -2077.41   -2069.28
    BetaBYM      -2089.08   -2091.05   -2091.60   -2088.89   -2087.56

Table 5: Out-of-sample performance of the models with different link functions.
    Model        Logit      Probit     Loglog     Cloglog    Cauchy
    CCC
    BetaReg      0.42846    0.42810    0.42749    0.42801    0.42138
    BetaRE       0.42834    0.42790    0.42733    0.42786    0.42126
    BetaBesag    0.45289    0.46855    0.47168    0.45622    0.44610
    BetaBYM      0.48595    0.48954
    RSE
    BetaReg      0.01380    0.01376    0.01371    0.01383    0.01412
    BetaRE       0.01380    0.01376    0.01371    0.01383    0.01413
    BetaBesag    0.01364    0.01352    0.01342    0.01367    0.01412
    BetaBYM      0.01359    0.01352
lll ll llllll ll ll lll llll l ll lll ll lllll llllll llll lllllllll ll lll l lllll lllll l lllll lll llllll l l llllllll l llll llll lll lll llllllll llllll ll ll lllllll llllllll lll llllllllll lll ll llll ll ll ll l ll ll lllll ll l llll l llllllll lllll lll ll l lllllllllll llll llll ll lll lllll lllll ll l llllllll ll l l ll ll lllllllllll lll ll ll llllllllllll lllll llll lllll llll lllllll llll lllll ll ll llllllllllll ll lllllll lll lll lllll llll lllll l lll l llllll ll llll ll llllll lll l lllll l lll . . . Observed brand rate P r ed i c t ed b r and r a t e l l ll l ll ll lll llllllll lll ll llll llll llll lll ll lllll llll lll ll llll llll lll ll l ll ll llll lll lllll llll l llll l ll ll lll (c) The BetaBesag model. lll ll llllll ll ll lll llll l ll lll ll lllll llllll llll lllllllll ll lll l lllll lllll l lllll lll llllll l l llllllll l llll llll lll lll llllllll llllll ll ll lllllll llllllll lll llllllllll lll ll llll ll ll ll l ll ll lllll ll l llll l llllllll lllll lll ll l lllllllllll llll llll ll lll lllll lllll ll l llllllll ll l l ll ll lllllllllll lll ll ll llllllllllll lllll llll lllll llll lllllll llll lllll ll ll llllllllllll ll lllllll lll lll lllll llll lllll l lll l llllll ll llll ll llllll lll l lllll l lll . . . Observed brand rate P r ed i c t ed b r and r a t e l l ll l ll ll lll llllllll lll ll llll llll llll lll ll lllll llll lll ll llll llll lll ll l ll ll llll lll lllll llll l llll l ll ll lll (d) The BetaBYM model. Figure 8: Scatter plots of the observed brand name drug claim rates and those predicted bythe four models with the log-log link function. The out-of-sample predictions are plotted asblue dots, while the in-sample fitted values are plotted as black circles.17f the RSE, the BetaBYM model with the log-log link is also the best as it has the lowestRSE value.Figure 8 shows the scatter plots of the observed brand name drug claim rates and thepredicted values produced by the four models with the log-log link function. 
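The CCC reported in Table 5 is Lin's concordance correlation coefficient (Lin, 1989), which measures agreement between observed and predicted rates. A minimal Python sketch (the paper's computations were done in R; the function name and data below are illustrative):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Lin's (1989) concordance correlation coefficient.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variances; 1 indicates perfect agreement.
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Identical observed and predicted rates give a CCC of exactly 1.
print(ccc([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 1.0
```

Unlike the Pearson correlation, the CCC penalizes both a shift in location and a difference in scale between the two series, which is why it is a natural summary for predicted-versus-observed plots such as Figure 8.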
From the four scatter plots, we see that the predicted values of the four models are quite similar. We also see that none of the four models fits the large rates well: the highest predicted value is around 0.25, while the largest observed value is around 0.41 (see Table 1). Figure 9 shows the distributions of the brand name drug claim rates predicted by the four models with the log-log link function across the triangular areas. The maps look quite similar, and the differences between them are hard to discern. In addition, these maps look similar to the map of the observed rates shown in Figure 2(a).
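The four models above are Beta regressions in the mean-precision parameterization of Ferrari and Cribari-Neto (2004): the response follows Beta(μφ, (1-μ)φ), where the link function (here the log-log link) maps the linear predictor to a mean μ in (0, 1), and φ is the precision hyperparameter reported in Tables 6-9. A minimal Python sketch (the paper's models were fitted with R-INLA; the φ value below is illustrative):

```python
import numpy as np
from scipy import stats

def loglog_inverse(eta):
    """Inverse of the log-log link g(mu) = -log(-log(mu)); maps R to (0, 1)."""
    return np.exp(-np.exp(-eta))

def beta_mean_precision(mu, phi):
    """Beta(mu*phi, (1-mu)*phi) of Ferrari and Cribari-Neto (2004):
    mean mu in (0, 1), precision phi > 0, Var(y) = mu*(1-mu)/(1+phi)."""
    return stats.beta(a=mu * phi, b=(1.0 - mu) * phi)

# Linear predictor -> mean claim rate -> response distribution for one area.
eta = -0.554                     # e.g., the fitted intercept in Table 6
mu = loglog_inverse(eta)         # about 0.175, a plausible claim rate
dist = beta_mean_precision(mu, phi=50.0)   # phi = 50 is illustrative only

# The random-effect hyperparameters psi in Tables 7-9 are also precisions:
# the implied standard deviation is psi ** -0.5, so a very large psi means
# the random-effect component contributes almost no extra variation.
```

Because the inverse link is bounded in (0, 1) and the Beta density is supported on (0, 1), predictions are automatically valid rates, which is the main appeal of Beta regression over ordinary linear regression for this response.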
[Figure 9 about here.] Figure 9: Distributions of the brand name drug claim rates predicted by the four models with the log-log link function: (a) the basic Beta regression model, (b) the BetaRE model, (c) the BetaBesag model, (d) the BetaBYM model.

Tables 6, 7, 8, and 9 show the fitted coefficients and hyperparameters of the four models with the log-log link function. In particular, the tables show the mean, the standard deviation, the 2.5% quantile, the median, and the 97.5% quantile of the estimated parameters. From the 2.5% and the 97.5% quantiles, we obtain the 95% credibility intervals of the fitted parameters.

Table 6: Coefficients and hyperparameters from fitting the basic Beta regression model with the log-log link.
             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.554   0.106  -0.763   -0.554  -0.345
avgscore
VITA
A00900
A02300
A03150
A03230      -0.002   0.013  -0.027   -0.002   0.023
A18450      -0.044   0.013  -0.069   -0.044  -0.019
A18800
A07230      -0.198   0.044  -0.285   -0.198  -0.111
A85770
A11070      -0.234   0.103  -0.437   -0.234  -0.031
A11902
φ

Table 7: Coefficients and hyperparameters from fitting the BetaRE model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.553   0.107  -0.763   -0.553  -0.344
avgscore
VITA
A00900
A02300
A03150
A03230      -0.002   0.013  -0.027   -0.002   0.023
A18450      -0.044   0.013  -0.069   -0.044  -0.019
A18800
A07230      -0.198   0.045  -0.285   -0.198  -0.111
A85770
A11070      -0.234   0.104  -0.438   -0.234  -0.030
A11902
φ
ψ

Table 8: Coefficients and hyperparameters from fitting the BetaBesag model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.574   0.109  -0.788   -0.574  -0.361
avgscore
VITA
A00900
A02300
A03150
A03230
A18450      -0.036   0.014  -0.064   -0.036  -0.007
A18800      -0.012   0.031  -0.074   -0.012   0.049
A07230      -0.215   0.048  -0.310   -0.215  -0.121
A85770
A11070      -0.206   0.110  -0.421   -0.206   0.011
A11902
φ
ψ

Table 9: Coefficients and hyperparameters from fitting the BetaBYM model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.580   0.111  -0.800   -0.580  -0.362
avgscore
VITA
A00900
A02300
A03150
A03230
A18450      -0.031   0.014  -0.059   -0.031  -0.002
A18800      -0.021   0.031  -0.083   -0.021   0.041
A07230      -0.225   0.049  -0.321   -0.225  -0.128
A85770
A11070      -0.195   0.114  -0.419   -0.196   0.028
A11902
φ
ψ
ψ

If we look at the 95% credibility intervals of the fitted coefficients, we see that the intervals of avgscore, VITA, A02300, and
A11902 contain only positive values. This means that these variables tend to have a positive impact on the response, i.e., the brand name drug claim rates.

From these tables, we also see two variables whose 95% credibility intervals contain only negative values: A07230 and A18450. From Table 3, we see that both variables are related to education spending. Since the fitted coefficients of these two variables are negative, the two variables tend to have a negative impact on the brand name drug claim rates.

Tables 7, 8, and 9 also show the estimated hyperparameters of the unstructured random effects and the structured random effects. Note that the hyperparameters are precision parameters, which are the inverse of the variance of the random effects: the higher the hyperparameter, the lower the variance of the random effects. In Table 7, we see that the estimated hyperparameter ψ is very large. This means that the random effects component of the model has a very small variance and does not contribute much to explaining the variation of the response variable.

In Table 8, we see that the fitted precision parameter ψ of the structured random effects is much lower than the fitted ψ in Table 7. In Table 9, we see the same pattern: the fitted ψ of the structured random effects is much lower than the fitted ψ of the unstructured random effects. This means that the structured random effects help more than the unstructured random effects in explaining the response variation.

In summary, the above results show that modeling the spatial effect improves the performance of the Beta regression model, while including unstructured random effects improves the performance only marginally. To further assess the comprehensiveness of our Beta regression models, we provide additional results using models suggested in Duncan et al. (2016); see Appendix A.

Conclusions

The healthcare sector in the U.S. is a large sector that generates about 20% of the country's gross domestic product. This significance of healthcare has motivated researchers to develop enhanced tools and approaches to better understand the industry through healthcare analytics.
According to the 2019 Predictive Analytics in Health Care Trend Forecast (Society of Actuaries, 2019), predictive analytics is poised to reshape the healthcare industry by achieving three aims: improved patient outcomes, higher quality of care, and reduced costs. In particular, McKinsey estimates that big data analytics can help the U.S. healthcare industry save more than $300 billion per year, with two thirds of the savings coming from reductions of approximately 8% in national healthcare expenditures.

In this paper, we examined and demonstrated the use of Beta regression models to study the utilization of brand name drugs in the U.S. in order to understand the variability of the brand name drug claim rates across different areas of the country. We studied different Beta regression models with and without spatial effects and fitted these models to public datasets obtained from the CMS and the IRS. The numerical results show that Beta regression models are able to fit the brand name drug claim rates well and that modeling the spatial effects improves the performance of the Beta regression models. We also found some significant variables that help to explain the variation of the brand name drug claim rates across different areas.

The methods and findings in this paper are useful for healthcare actuaries in their data analysis. By reflecting the effect of prescription drug utilization, these models can be used to update an insured's risk score in a risk adjustment model. Specifically, healthcare actuaries can incorporate the geographic variation into the models used to select preferred providers.
References
Bachler, R., Duncan, I., and Juster, I. (2006). A comparative analysis of chronic and nonchronic insured commercial member cost trends. North American Actuarial Journal, 10(4):76-89.

Besag, J. and Kooperberg, C. (1995). On conditional and intrinsic autoregression. Biometrika, 82(4):733-746.

Besag, J., York, J., and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43:1-20.

Blangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-Temporal Bayesian Models with R-INLA. Wiley, Hoboken, NJ.

Bolnick, H. J. (2004). A framework for long-term actuarial projections of health care costs: The importance of population aging and other factors. North American Actuarial Journal, 8(4):1-29.

Brinberg, D. and Cummings, V. (1984). Purchasing generic prescription drugs: An analysis using two behavioral intention models. In NA - Advances in Consumer Research, volume 11, pages 229-234.

Brockett, P., Golden, L., Yang, C. C., and Young, D. (2019). Medicaid managed care: Efficiency, medical loss ratio, and quality of care. North American Actuarial Journal, 0(0):1-16.

Brockett, P. L., Golden, L. L., and Yang, C. C. (2018). Potential "savings" of Medicare: The analysis of Medicare advantage and accountable care organizations. North American Actuarial Journal, 22(3):458-472.

Duncan, I. (2011). Health Risk Adjustment and Predictive Modeling. ACTEX Publications, Winsted, CT.

Duncan, I., Loginov, M., and Ludkovski, M. (2016). Testing alternative regression frameworks for predictive modeling of health care costs. North American Actuarial Journal, 20(1):65-87.

Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7):799-815.

Frees, E. W., Gao, J., and Rosenberg, M. A. (2011). Predicting the frequency and amount of health care expenditures. North American Actuarial Journal, 15(3):377-392.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, Articles, 33(1):1-22.

Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24:997-1016.

Getzen, T. E. (2016). Accuracy of long-range actuarial projections of health care costs. North American Actuarial Journal, 20(2):101-113.

Hamann, J., Mendel, R., Kissling, W., and Leucht, S. (2013). Psychiatrists' decision making between branded and generic drugs. European Neuropsychopharmacology, 23(7):686-690.

Hassali, M. A., Alrasheedy, A. A., McLachlan, A., Nguyen, T. A., AL-Tamimi, S. K., Ibrahim, M. I. M., and Aljadhey, H. (2014). The experiences of implementing generic medicine policy in eight countries: A review and recommendations for a successful promotion of generic medicine use. Saudi Pharmaceutical Journal, 22(6):491-503.

Hileman, G. R., Mehmud, S. M., and Rosenberg, M. A. (2016). Risk scoring in health insurance: A primer. Society of Actuaries.

Huang, S., Hartman, B., and Brazauskas, V. (2017). Model selection and averaging of health costs in episode treatment groups. ASTIN Bulletin, 47(1):153-167.

Kan, H. J., Kharrazi, H., Chang, H.-Y., Bodycombe, D., Lemke, K., and Weiner, J. P. (2019). Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults. PLOS ONE, 14:1-13.

Kesselheim, A. S., Gagne, J. J., Franklin, J. M., Eddings, W., Fulchino, L. A., Avorn, J., and Campbell, E. G. (2016). Variations in patients' perceptions and use of generic drugs: Results of a national survey. Journal of General Internal Medicine, 31(6):609-614.

Klinger, E. (2017). 2017 generic drug access and savings in the U.S. report. https://accessiblemeds.org/resources/blog/2017-generic-drug-access-and-savings-us-report. Accessed on 9/27/2018.

Klugman, S., Panjer, H., and Willmot, G. (2012). Loss Models: From Data to Decisions. Wiley, Hoboken, NJ, 4th edition.

Lin, L. I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1):255-268.

Petertil, J. P. (2005). Aging curves for health care costs in retirement. North American Actuarial Journal, 9(3):22-49.

Richardson, R. and Hartman, B. (2018). Bayesian nonparametric regression models for modeling and predicting healthcare claims. Insurance: Mathematics and Economics, 83:1-8.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press, Boca Raton, FL.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319-392.

Schoonveld, E. (2015). The Price of Global Health: Drug Pricing Strategies to Balance Patient Access and the Funding of Innovation. Gower Publishing Company, Burlington, VT, 2nd edition.

Shi, P. and Zhang, W. (2013). Managed care and health care utilization: Specification of bivariate models using copulas. North American Actuarial Journal, 17(4):306-324.

Society of Actuaries (2019). 2019 Predictive analytics in health care trend forecast. Accessed: 2019-8-23.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583-639.

Tufvesson, O., Lindström, J., and Lindström, E. (2019). Spatial statistical modelling of insurance risk: A spatial epidemiological approach to car insurance. Scandinavian Actuarial Journal, 0(0):1-15.

Wang, X., Yue, Y. R., and Faraway, J. (2018). Bayesian Regression Modeling with INLA. CRC Press, Boca Raton, FL.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11:3571-3594.

Yeo, I.-K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4):954-959.
A Additional results
Duncan et al. (2016) tested several alternative regression frameworks for predictive modeling of health care costs. In particular, they tested multiple linear regression models, the lasso, multivariate adaptive regression splines (MARS), random forests, M5 decision trees, and boosted trees. For details of these models, readers are referred to Duncan et al. (2016) and the references therein. In this section, we present the out-of-sample prediction results of these models.

Table 10: Parameters and the R packages of the additional models.

Model              R Package     Parameters
Linear model       lm            Multiple linear regression without interactions
Lasso              glmnet        Gaussian family
MARS               earth         Interactions up to degree = 1; Generalized Cross Validation penalty = 1
Random forests     randomForest  Number of trees = 100; minimum node size = 10
M5 decision trees  Cubist        Number of committees = 25
Boosted trees      gbm           Number of trees = 5000; Shrinkage = 0.01

Table 11: Out-of-sample prediction results of the additional models.

Model             CCC     RSE
Linear model      0.4129  0.0141
Lasso             0.0000  0.0154
MARS              0.4420  0.0143
Random forest     0.4657  0.0126
M5 decision tree  0.4330  0.0131
Boosted trees     0.4658  0.0133

Table 10 shows the parameters and the R packages we used to test the additional models. Table 11 shows the out-of-sample prediction results obtained by these models. Figure 10 shows the scatter plots of the observed brand name drug claim rates and the values predicted by the additional models. The variables input to these models are the same as those selected for the Beta regression models described in Table 3.

From Table 11, we see that the lasso model produced the worst result. The reason is that the lasso regression produced an intercept-only model for these data; Figure 10(b) confirms this, as its predicted values are constant. Comparing Tables 5 and 11, we see that none of these models performs better than the Beta regression models in terms of
CCC. However, the tree-based models (i.e., random forests, M5 decision trees, and boosted trees) produced lower
RSE than the Beta regression model.

[Figure 10 about here.] Figure 10: Scatter plots of the observed brand name drug claim rates and the values predicted by the additional models: (a) the linear regression model, (b) the lasso model, (c) the MARS model, (d) the random forests model, (e) the M5 decision trees model, (f) the boosted trees model.
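The appendix comparison can be outlined with Python stand-ins for the R packages in Table 10; this is a hedged sketch, not the paper's code. The data are synthetic, earth (MARS) and Cubist (M5) have no direct scikit-learn analogue and are omitted, and the boosted-trees size is reduced from Table 10's 5000 trees for speed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))    # synthetic area-level features
# Synthetic claim rates in (0, 1) via a logistic transform of a linear signal.
y = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.normal(size=400))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear model": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Random forest": RandomForestRegressor(
        n_estimators=100, min_samples_leaf=10, random_state=0),
    "Boosted trees": GradientBoostingRegressor(
        n_estimators=500, learning_rate=0.01, random_state=0),
}

def ccc(y_true, y_pred):
    # Lin's concordance correlation coefficient, as in Tables 5 and 11.
    mx, my = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mx) * (y_pred - my))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mx - my) ** 2)

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: out-of-sample CCC = {ccc(y_te, pred):.3f}")
```

A heavily penalized lasso can shrink every coefficient to zero, producing constant predictions with a CCC of 0, which mirrors the intercept-only behavior reported for the lasso in Table 11.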