Analysis of Prescription Drug Utilization with Beta Regression Models
Guojun Gan ∗ Emiliano A. Valdez † August 4, 2020
Abstract
The healthcare sector in the U.S. is complex and is also a large sector that generates about 20% of the country's gross domestic product. Healthcare analytics has been used by researchers and practitioners to better understand the industry. In this paper, we examine and demonstrate the use of Beta regression models to study the utilization of brand name drugs in the U.S. and to understand the variability of brand name drug utilization across different areas. The models are fitted to public datasets obtained from the Centers for Medicare & Medicaid Services and the Internal Revenue Service. Integrated Nested Laplace Approximation (INLA) is used to perform the inference. The numerical results show that Beta regression models can fit the brand name drug claim rates well and that including spatial dependence improves the performance of the Beta regression models. Such models can be used to reflect the effect of prescription drug utilization when updating an insured's health risk in a risk scoring model.
Keywords: Beta regression, healthcare analytics, random effects, spatial modeling
Introduction

In the pharmaceutical industry, a brand name drug is a prescription drug that is developed and patented by a company. The patent gives the company a period of market exclusivity, during which the company sells the brand name drug at a significantly high price. For more information about drug pricing, readers are referred to Schoonveld (2015). When the patent expires, generic versions of the brand name drug are marketed at lower prices by other companies. Generally, most states allow pharmacists to substitute brand name drugs with generic versions, unless otherwise directed by physicians. Nevertheless, according to the Association for Accessible Medicines (AAM) (Klinger, 2017), generics account for 89% of prescriptions dispensed but only 26% of total drug costs in the U.S. Recent annual prescription spending in the United States has been on the rise. According to the NHE (National Health Expenditure) fact sheet, prescription drug spending in the U.S. increased 2.5% to $335.0 billion in 2018, faster than the 1.4% growth in 2017.

This large disparity in the costs between brand name and generic drugs motivated us to investigate the presence of feature variables, e.g., location, income, and demographics, that drive the decision to choose one type over the other. It is also our intent to prescribe a model that health insurers can replicate to reflect these decision drivers in a risk adjustment program (Duncan, 2011), a mechanism widely popular for assessing the health risk of an insured (Hileman et al., 2016). This mechanism is an umbrella of commercially designed risk assessment models that are sometimes called risk scoring models.

∗ Corresponding author, Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA. Email: [email protected].
† Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA. Email: [email protected].
While models do vary, they share a common objective of predicting the healthcare cost per member per year using the prior year's medical utilization and expenditures as well as additional risk factors such as location and demographics.

The decision to prescribe or purchase generic or brand name drugs has been studied before. Brinberg and Cummings (1984) used two behavioral intention models to examine the decision to purchase generic prescription drugs and found several differences between individuals who intend to purchase generic prescription drugs and those who do not. For example, non-intenders believed it was less likely that generic drugs were a safe product than intenders. Hamann et al. (2013) studied psychiatrists' decision making between generic and branded antipsychotics and found that psychiatrists were more likely to choose branded drugs when imagining choosing the drug for themselves. Hassali et al. (2014) reviewed experiences of implementing generic medicine policy in eight countries, including the U.S. The review indicates that pharmacists play an essential role in promoting generic medicines, as they substituted 83.8% of prescriptions that allowed substitution, and that generic prescribing is still not common practice in the U.S. because many physicians have negative perceptions about generic medicines and lack in-depth knowledge about the bioequivalence concept applied in the U.S. Kesselheim et al. (2016) studied variations in patients' perceptions and use of generic drugs based on a national survey and found a substantial shift in that more patients have positive views of generic drugs.

In the aforementioned work, Brinberg and Cummings (1984), Hamann et al. (2013), and Kesselheim et al. (2016) used survey data to study the behavior of patients and physicians. The review conducted by Hassali et al. (2014) was based on a literature search using several electronic databases and search engines such as the ISI (Institute for Scientific Information) Web of Knowledge and Google.
Public pharmaceutical drug utilization data have not been used to study the factors driving the decision to choose one type over the other. In this paper, we investigate the rates of brand name drug claims in the U.S. In particular, we investigate the variations of the brand name drug claim rates in different areas of the U.S. Our study supplements the aforementioned studies based on survey data. Since rates are values in the interval (0, 1), we model them with Beta regression models.

Related Work

Healthcare is an important practice area of actuaries due to the existence of health insurance. In this section, we review some work that is related to health insurance analytics. Early work includes Bolnick (2004), Petertil (2005), and Bachler et al. (2006). Bolnick (2004) proposed a framework for long-term healthcare cost projections by incorporating key healthcare cost drivers such as life expectancy, biological morbidity, and economic morbidity. Using the framework, Bolnick (2004) considered various plausible future scenarios that encompass a reasonable range of healthcare cost outcomes. Petertil (2005) studied the relative significance of aging as a driver of healthcare cost beyond age fifty and used Medicare data to draw general conclusions that utilization and cost differ by age. Bachler et al. (2006) studied the impact of chronic and nonchronic insured members on cost trends and found that the classification of chronic members can affect the trends significantly.

Recently, there have been many more studies on healthcare analytics than ten years ago. Duncan (2011) devoted a book to healthcare analytics for actuaries. In particular, this book covers the essentials of health risk and case studies on the use of predictive modeling in risk adjustment. For example, Chapter 14 of this book discusses the risk adjustment method used by the Centers for Medicare and Medicaid Services (CMS). Frees et al.
(2011) extended the standard two-part model to predict the frequency and amount of healthcare expenditures and used data from the Medical Expenditure Panel Survey (MEPS) to calibrate and test the model. Shi and Zhang (2013) built a game-theoretic model based on copulas to study the effect of managed care on healthcare utilization compared to traditional fee-for-service plans in the private health insurance market.

Getzen (2016) developed a method to evaluate projections of future medical spending used by the U.S. Medicare and Medicaid programs and found that the recent set of projections (1998–2010) is more accurate than the older set of projections (1986–1995) because the recent projections incorporate lagged macroeconomic effects. Duncan et al. (2016) investigated several statistical techniques (e.g., lasso GLM, multivariate adaptive regression splines, random forests, decision trees, and boosted trees) for modeling future healthcare costs and found that the traditional regression approach does not perform well.

Huang et al. (2017) compared different models and model selection methods for health costs in Episode Treatment Groups (ETGs) and found that random forest feature selection is preferable in terms of efficiency and accuracy. Richardson and Hartman (2018) proposed a Bayesian nonparametric regression model for predicting healthcare claims. Their numerical results show that the Bayesian nonparametric model outperforms standard linear regression and generalized Beta regression in terms of predictive accuracy. Brockett et al. (2018) investigated the use of Data Envelopment Analysis (DEA), a method developed in management science to measure the relative efficiency of organizations, to assess the potential savings of Medicare. Their analysis shows that Medicare Advantage plans are more efficient in reducing health expenditures than the original Medicare. In Brockett et al.
(2019), the authors used linear regression models to estimate the effect of medical loss ratio (MLR) and efficiency on the quality of care for Medicaid managed care plans. The results show that the effect of medical services efficiency on the quality of care is insignificant. Kan et al. (2019) explored the use of machine learning techniques for risk adjustment and found that penalized regression performs better than ordinary least squares in predicting healthcare costs.

Our work presented in this paper is related to healthcare analytics but is different from the work mentioned above. In particular, we examine Beta regression models with spatial dependence to model the percentages of brand name drug claims. The methods and findings of this paper can be used by practitioners in their risk modeling and adjustment.
Data

We use a public dataset called the Part D Prescriber Public Use File (PUF) from the Centers for Medicare & Medicaid Services (CMS) and another public dataset called Individual Income Tax Statistics from the IRS (Internal Revenue Service).

The data in the Part D Prescriber PUF cover calendar years 2013 through 2016 and contain information on prescription drug events (PDEs) incurred by Medicare beneficiaries with a Part D prescription drug plan. The data consist of a detailed data file and two summary tables. The "Part D Prescriber Summary Table" contains overall drug utilization, drug costs, and beneficiary counts organized by National Provider Identifier (NPI). In this paper, we use the 2016 Part D Prescriber Summary Table, which contains 1,131,550 records and 84 variables. The file name is PartD Prescriber PUF NPI 16.txt; it was accessed on Jan 20, 2020. The Part D Prescriber Summary Table contains information about the brand and generic drug claims at the provider level. Other information includes the zip code of providers, the count of beneficiaries, the average age of beneficiaries, and the average risk score of beneficiaries.

In this paper, our interest is in studying variations of the brand name drug claim rates in different areas of the U.S. To that end, we need to aggregate the data to different areas. However, the data can be aggregated to different levels: the state level, the zip code level, or some customized level between the state level and the zip code level. On the one hand, aggregating the data to the state level is not desirable for the following reasons:

• The brand name drug claim rate exhibits variations within individual states.
• The U.S. has only 50 states, so aggregating the data to the state level produces only 50 data points.

On the other hand, aggregating the data to the zip code level is also not desirable:

• The dataset contains about 20,000 different zip codes, so aggregating the data to the zip code level will produce about 20,000 data points. This causes challenges in modeling the spatial effects because a large number of sites requires many parameters.
• Some zip codes do not correspond to geographical areas but to large-volume customers or post office boxes.
• Aggregating the data to the zip code level produces highly volatile brand name drug claim rates. That is, the rates are highly volatile between zip codes.

As a result, we decide to aggregate the data to some customized level between the state level and the zip code level.
Figure 1: A mesh of the 48 contiguous states with 530 triangles.

To aggregate the data to a customized level, we first need to split the U.S. land into small areas, i.e., create a mesh of the U.S. land. This can be done conveniently by using the R function inla.mesh.2d from the INLA package. Figure 1 shows a mesh of the 48 contiguous states of the U.S. that consists of 530 triangles. Aggregating the data to the 530 triangular areas will produce a dataset with about 530 data points. This is computationally reasonable for modeling spatial effects.

Aggregating the data to the triangles shown in Figure 1 is done as follows. First, we obtain the longitudes and the latitudes of zip codes from the R package noncensus, which contains regional information and demographic data determined by the U.S. Census Bureau. Note that each zip code is associated with a longitude and a latitude. Second, we determine the triangles to which the zip codes belong by using the longitudes and the latitudes. Finally, we aggregate the data based on the indices of the triangles.

The Part D Summary Table contains average ages and average risk scores of the beneficiaries. The average risk scores are average HCC (Hierarchical Condition Category) risk scores, which estimate how beneficiaries' FFS (Fee-For-Service) spending will compare to the overall average of the entire Medicare population. Before aggregating the data, we convert average ages and average risk scores to total ages and total risk scores by multiplying the average numbers by the count of beneficiaries. After aggregation is done, we convert total ages and total risk scores back to average ages and average risk scores by dividing the totals by the aggregated count of beneficiaries. The Part D Summary Table also contains missing values. We remove missing values before aggregating the data.
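The paper builds the mesh in R with inla.mesh.2d; the assign-and-aggregate step described above can be sketched in Python with a toy triangulation. All coordinates, counts, and column names below are made up for illustration, and scipy's Delaunay triangulation stands in for the INLA mesh:

```python
import numpy as np
import pandas as pd
from scipy.spatial import Delaunay

# Hypothetical zip-code-level records with Part D style fields.
zips = pd.DataFrame({
    "lon":          [-72.3, -72.1, -89.5, -89.7, -118.2],
    "lat":          [ 41.8,  41.9,  40.1,  40.0,   34.1],
    "brand_claims": [  120,    80,   200,   150,    400],
    "total_claims": [  500,   300,   900,   600,   1500],
    "beneficiaries":[   60,    40,   110,    70,    200],
    "avg_age":      [ 71.0,  69.5,  72.3,  70.1,   68.9],
})

# A toy triangulation standing in for the mesh of the U.S.
mesh_nodes = np.array([[-125, 30], [-65, 30], [-125, 50], [-65, 50], [-95, 40]])
tri = Delaunay(mesh_nodes)

# Assign each zip code to the triangle containing it (-1 means outside the mesh).
zips["triangle"] = tri.find_simplex(zips[["lon", "lat"]].to_numpy())

# Convert averages to totals before aggregating, as the paper describes.
zips["total_age"] = zips["avg_age"] * zips["beneficiaries"]

agg = zips[zips["triangle"] >= 0].groupby("triangle").sum(numeric_only=True)
agg["brandrate"] = agg["brand_claims"] / agg["total_claims"]
agg["avgage"] = agg["total_age"] / agg["beneficiaries"]
print(agg[["brandrate", "avgage"]])
```

The same totals-then-divide logic applies to the risk scores; it guarantees that the triangle-level averages are beneficiary-weighted rather than simple averages of provider-level averages.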
Figure 2: Distributions of the brand name drug claim rates. Panel (a) maps the rates by triangular area (white triangles mean that data are not available in these areas); panel (b) shows the frequency distribution of the rates.

Figure 2(a) shows the distribution of the brand name drug claim rates in different triangular areas. From the figure, we see that 19 triangular areas do not have data. As a result, the aggregated dataset contains 511 observations. From the figure, we also see that some triangles in the middle are light red. This suggests that modeling the spatial effects may improve the fit of Beta regression models to the data.

Table 1: Summary statistics of the Part D data (Min, 1st Q, Median, Mean, 3rd Q, and Max of brandrate, avgage, and avgscore).
The Individual Income Tax Statistics dataset from the IRS contains 29,874 records, each of which is described by 144 variables; it was accessed on Jan 12, 2020. It contains information about the number of returns and the amount of returns in different categories. For example, it contains the number of returns with unemployment compensation and the unemployment compensation amount. The return counts and amounts are aggregated to the zip code level. Although this dataset does not contain all information about an area, it does provide some demographic and economic information, which might be useful for explaining the variation of the brand name drug claim rate. For example, the number of returns from an area is related to the population of the area.

We aggregate the tax data to the 530 triangular areas in the same way as we aggregate the Part D data. After aggregating the data, we divide the total dollar amount of an area by the corresponding number of returns to get the average dollar amount of the area. The total number of returns and the average amounts in different categories are used in the data analysis. Since this dataset has 144 variables, we do not show the summary statistics of these variables here.

Beta Regression Models

In this section, we describe some Beta regression models. In particular, we present four models: the basic Beta regression model, the Beta regression model with random effects, the Beta-Besag model, and the Beta-BYM model.

The density function of the Beta distribution is typically defined as (Klugman et al., 2012):

    f(y; p, q) = [Γ(p + q) / (Γ(p)Γ(q))] y^(p−1) (1 − y)^(q−1),    0 < y < 1,    (1)

where p > 0 and q > 0 are shape parameters. Reparameterized in terms of a mean parameter µ and a dispersion parameter φ, the density becomes

    f(y; µ, φ) = [Γ(φ) / (Γ(µφ)Γ((1 − µ)φ))] y^(µφ−1) (1 − y)^((1−µ)φ−1),    0 < y < 1,    (2)

where 0 < µ < 1 and φ >
0. The shape parameters can be obtained from the mean and the dispersion as follows: p = φµ and q = φ(1 − µ).

The basic Beta regression model is described as follows. Suppose that we have n observations. For i = 1, 2, ..., n, let x_i = (x_{i1}, x_{i2}, ..., x_{ik})^T and y_i be the vector of k regressors and the response in the ith case, respectively. The responses y_1, y_2, ..., y_n are assumed to form a random sample such that

    y_i ∼ Beta(µ_i, φ).

The mean µ_i is linked to the regressors as follows:

    g(µ_i) = η_i = x_i^T β,    (3)

where g(·) is the link function and β = (β_1, β_2, ..., β_k)^T is the vector of regression coefficients. The variance of y_i can be estimated as

    Var(y_i) = µ_i(1 − µ_i) / (1 + φ).    (4)

Random effects models are commonly used to analyze areal or spatial data. For example, Tufvesson et al. (2019) applied random effects models to model car insurance data with geographical locations of the policyholders. There are two types of random effects models: unstructured and structured. Unstructured random effects models assume independence of the random effects, while structured random effects models allow for spatial dependence.

The second model we consider is a Beta regression model with a single random effect. In this model, a random effect is introduced into the linear predictor as follows:

    g(µ_i) = η_i = x_i^T β + v_i,    (5)

where the v_i are i.i.d. Gaussian noise, i.e.,

    v_i ∼ N(0, 1/ψ_v),    (6)

where ψ_v is a precision parameter. This model can help to control for unobserved heterogeneity when the heterogeneity is not correlated with the independent variables.
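The mean-dispersion parameterization in Equation (2) and the variance formula in Equation (4) can be checked numerically; the following sketch uses scipy, with arbitrary values of µ and φ:

```python
from scipy import stats

def shapes_from_mean_dispersion(mu, phi):
    """Map (mu, phi) in Equation (2) back to the shapes (p, q) of Equation (1)."""
    return mu * phi, (1.0 - mu) * phi

mu, phi = 0.3, 50.0
p, q = shapes_from_mean_dispersion(mu, phi)
dist = stats.beta(p, q)

print(dist.mean())  # recovers mu
print(dist.var())   # matches mu * (1 - mu) / (1 + phi), Equation (4)
```

A large φ therefore concentrates the distribution around its mean, which is why φ is often called a precision parameter.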
The basic Beta regression model assumes that the heterogeneity is correlated with the independent variables.

The third model we consider is the Besag model, which introduces a structured random effect into the linear predictor:

    g(µ_i) = η_i = x_i^T β + u_i,    (7)

where the u_i follow a CAR(1) model, i.e., an order-1 conditional autoregressive model (Besag and Kooperberg, 1995). Building a CAR model requires a neighborhood graph, which tells which areas are neighbors of each other. Figure 3 shows a neighborhood graph where areas that are neighbors are connected by blue lines. In our models, we assume that triangles that share a vertex or a side are neighbors of each other.

Given a neighborhood graph, the CAR(1) model assumes that the random effect in an area is related to the random effects in the area's neighbors through a conditional Gaussian distribution:

    u_i | {u_j : j ∈ A_i} ∼ N( (1/|A_i|) Σ_{j∈A_i} u_j, 1/(ψ_u |A_i|) ),    (8)

where A_i is the set of indices of neighbors of the ith area, |A_i| denotes the number of elements in A_i, and ψ_u is a precision parameter. Note that the CAR(1) model can be written as a multivariate Gaussian model for u = (u_1, u_2, ..., u_n) (Besag and Kooperberg, 1995; Wang et al., 2018):

    u ∼ N(0, (1/ψ_u) Q^{−1}),    (9)

where Q is a highly sparse matrix defined by

    Q_{ij} = |A_i| if i = j;    −1 if j ∈ A_i;    0 otherwise.

Figure 3: The neighborhood graph used to model spatial effects.

The fourth model is the BYM (Besag-York-Mollié) model, which combines an unstructured random effect and a structured random effect (Besag et al., 1991):

    g(µ_i) = η_i = x_i^T β + u_i + v_i,    (10)

where the u_i are modeled by a CAR(1) model as specified in Equation (8) and the v_i are i.i.d. Gaussian noise as specified in Equation (6). The structured effects u model the spatial effect. The unstructured effects v = (v_1, v_2, ..., v_n) are used to model additional random variations that are not explained by geographical locations.
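The structure matrix Q of Equation (9) can be built directly from a neighborhood list; here is a minimal sketch on a hypothetical four-area chain graph:

```python
import numpy as np

def car_precision(neighbors):
    """Build the CAR(1) structure matrix Q from a neighborhood list:
    Q[i, i] = |A_i|; Q[i, j] = -1 if j is a neighbor of i; 0 otherwise."""
    n = len(neighbors)
    Q = np.zeros((n, n))
    for i, A_i in enumerate(neighbors):
        Q[i, i] = len(A_i)
        for j in A_i:
            Q[i, j] = -1.0
    return Q

# A toy neighborhood graph: areas 0-1-2-3 in a chain.
neighbors = [[1], [0, 2], [1, 3], [2]]
Q = car_precision(neighbors)
print(Q)

# Q is symmetric and its rows sum to zero, so Q is singular:
# the CAR(1) prior is improper (intrinsic), as is standard for Besag models.
print(Q.sum(axis=1))
```

For the 511 triangular areas the same construction applies, with each A_i read off the neighborhood graph in Figure 3.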
Between the two effects u and v, a larger effect of v in the model means that less exchange of information between areas is allowed. The relative strengths of the unstructured effects v and the structured effects u are controlled by the two hyperparameters ψ_v and ψ_u, respectively.

Table 2 summarizes the four Beta regression models described above. The first two models do not consider spatial dependence, while the last two models allow for spatial dependence.

Table 2: Linear predictors of the four models.

    Model name    Linear predictor
    BetaReg       Equation (3)
    BetaRE        Equation (5)
    BetaBesag     Equation (7)
    BetaBYM       Equation (10)

There are several choices of the link function (Ferrari and Cribari-Neto, 2004):

• the logit link: g(µ) = log(µ / (1 − µ));
• the probit link: g(µ) = Φ^{−1}(µ), where Φ(·) is the standard normal distribution function;
• the log-log link: g(µ) = −log(−log(µ));
• the complementary log-log link: g(µ) = log(−log(1 − µ));
• the Cauchy link: g(µ) = tan(π(µ − 1/2)).

Figure 4: Some link functions for Beta regression models.
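Each link maps µ in (0, 1) onto the real line and must be invertible so that µ_i = g^{−1}(η_i) can be recovered from the linear predictor. A sketch of the five links and their inverses (the value of µ is arbitrary):

```python
import numpy as np
from scipy.stats import norm, cauchy

# The five link functions, each mapping mu in (0, 1) to the real line.
links = {
    "logit":   lambda mu: np.log(mu / (1 - mu)),
    "probit":  lambda mu: norm.ppf(mu),
    "loglog":  lambda mu: -np.log(-np.log(mu)),
    "cloglog": lambda mu: np.log(-np.log(1 - mu)),
    "cauchit": lambda mu: cauchy.ppf(mu),  # equals tan(pi * (mu - 1/2))
}
inverses = {
    "logit":   lambda eta: 1 / (1 + np.exp(-eta)),
    "probit":  lambda eta: norm.cdf(eta),
    "loglog":  lambda eta: np.exp(-np.exp(-eta)),
    "cloglog": lambda eta: 1 - np.exp(-np.exp(eta)),
    "cauchit": lambda eta: cauchy.cdf(eta),
}

mu = 0.42
for name, g in links.items():
    eta = g(mu)
    print(f"{name:8s} g(mu) = {eta:+.4f}, g^-1(g(mu)) = {inverses[name](eta):.4f}")
```

The heavy tails of the Cauchy link make it the most distinct member of the family, which is consistent with its somewhat different behavior in the results below.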
Results
In this section, we present the results of fitting the four Beta regression models (see Table 2) to the brand name drug claim data.
Before we fit the models, we transform the covariates by using the Yeo-Johnson transformation introduced by Yeo and Johnson (2000). The purpose of transforming the covariates is to reduce their magnitudes for the convenience of model fitting. The Yeo-Johnson transformation is similar to the power transformation but can handle zero and negative values. In this transformation, a value x is transformed as follows:

    x* = log(1 + x)   if x ≥ 0;
    x* = −log(1 − x)  if x < 0.    (11)

We transform all covariates except for the average risk score before fitting the models. After transforming the data, we split the dataset into two sets: one for fitting the models and one for validating the models. We use 80% of the data for fitting and the remaining 20% for evaluation. Figure 5 shows the distribution of the brand name drug claim rates for the training set and the test set.

Figure 5: The distribution of the brand name drug claim rates for the training and test sets. Panel (a) shows the training set and panel (b) the test set.
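The transformation in Equation (11) is a one-liner; the sample values below are arbitrary:

```python
import numpy as np

def transform(x):
    """The transformation in Equation (11): shrinks magnitudes while
    handling zero and negative values, unlike a plain log transform."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.log1p(x), -np.log1p(-x))

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(transform(x))
```

The transform is odd and monotone, so it compresses large dollar amounts symmetrically without changing the ordering of the covariate values.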
The data contain 145 covariates: 143 from the individual income tax file and 2 (i.e., avgage and avgscore) from the Part D Summary Table. Including all these covariates in the models causes a parameter identification problem because many of these covariates are highly correlated.

To select covariates for fitting the models, we use lasso regularized generalized linear models (GLMs), which have been implemented in the R package glmnet (Friedman et al., 2010). Tufvesson et al. (2019) used this method to select covariates. However, the package glmnet does not support Beta regression models. To circumvent this problem, we first create a new binary response variable by comparing the brand name drug claim rates to the average rate. If a rate is above the average, then the corresponding response value is 1. Otherwise, the response value is 0. Then we use the lasso regularized logistic regression supported by glmnet to select covariates. We assume that the covariates that help to separate high and low rates can also serve as useful predictors.
Figure 6: Results from a cross-validation run of the lasso regularized logistic GLM based on the training set.

Figure 6 shows the results from a cross-validation run of the lasso regularized logistic GLM with different penalties λ. In the figure, the left dashed vertical line corresponds to the penalty λ_min that minimizes the cross-validation error. The second dashed vertical line corresponds to the largest penalty λ_1se whose cross-validation error is within one standard error of the minimal error. We use the model with the penalty λ_1se to select covariates. This gives us 12 covariates, which are described in Table 3. These 12 selected covariates are used to fit the Beta regression models.

Figure 7 shows a correlation plot of the response variable and the covariates. From the figure, we see that many of these variables are positively correlated. The variable A11070 is negatively correlated with a few other variables.
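The binarize-then-lasso workaround can be sketched as follows. This is not the paper's glmnet run: it uses scikit-learn on synthetic data, C plays the role of 1/λ, and the λ_1se rule is not reproduced, so it only illustrates the selection idea:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 145 covariates; only two truly matter.
n, k = 400, 20
X = rng.normal(size=(n, k))
rate = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 3])))  # latent "brand rate"

# Binarize the rate against its mean, as described above.
y = (rate > rate.mean()).astype(int)

# L1-penalized logistic regression; nonzero coefficients are "selected".
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(StandardScaler().fit_transform(X), y)

selected = np.flatnonzero(model.coef_[0])
print("selected covariate indices:", selected)
```

With a sufficiently strong penalty, the lasso zeroes out most of the uninformative covariates while keeping the ones that separate high and low rates.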
Table 3: Description of the selected covariates.

    Covariate    Description
    avgscore     Average risk score
    VITA         Number of volunteer income tax assistance (VITA) prepared returns
    A00900       Business or professional net income (less loss) amount
    A02300       Unemployment compensation amount
    A03150       Individual retirement arrangement payments amount
    A03230       Tuition and fees deduction amount
    A18450       State and local general sales tax amount
    A18800       Personal property taxes amount
    A07230       Nonrefundable education credit amount
    A85770       Total premium tax credit amount
    A11070       Additional child tax credit amount
    A11902       Overpayments refunded amount

Figure 7: Correlation of the selected covariates and the response variable.

Fitting of the models
Bayesian methods have been used to analyze spatial data for about twenty years, since around the year 2000, with the advent of Markov chain Monte Carlo (MCMC) simulation methods (Blangiardo and Cameletti, 2015). In fact, the advent of MCMC has allowed researchers to use Bayesian methods to develop complex models on large datasets. However, one major drawback of MCMC methods is that they are computationally demanding, especially for large datasets. INLA (Integrated Nested Laplace Approximation) is an alternative to MCMC for Bayesian inference (Rue and Held, 2005; Rue et al., 2009). A major advantage of INLA is that it is a deterministic algorithm that is capable of producing accurate results quickly. Since INLA was embedded into R through the R package INLA, it has become popular among researchers and practitioners. For example, Tufvesson et al. (2019) used INLA to model car insurance frequency and severity. In this paper, we use the R package INLA (Blangiardo and Cameletti, 2015; Wang et al., 2018) to fit the four Beta regression models.

The four models described in Section 4 can be formulated as Bayesian hierarchical models. For example, the BetaBYM model can be expressed as

    y_i | η_i ∼ Beta(g^{−1}(η_i), φ),    (12a)
    η_i = x_i^T β + u_i + v_i,    (12b)
    v_i | ψ_v ∼ N(0, ψ_v^{−1}),    (12c)
    u | ψ_u ∼ N(0, ψ_u^{−1} Q^{−1}),    (12d)
    ψ ∼ π(ψ),    (12e)

where π(·) denotes a prior distribution for the two hyperparameters ψ_v and ψ_u. Common choices for the prior distribution of ψ include independent gamma distributions. In this paper, we use the default priors in INLA.

The deviance information criterion (DIC) of Spiegelhalter et al. (2002) and the Watanabe-Akaike information criterion (WAIC) of Watanabe (2010) are commonly used to select Bayesian models. The DIC is motivated by the Akaike information criterion (AIC) and is defined as
    DIC = E[D(θ)] + p_D = D(E[θ]) + 2 p_D,    (13)

where D(θ) = −2 log p(y | θ) is the deviance of the model and p_D denotes the effective number of parameters, defined as

    p_D = E[D(θ)] − D(E[θ]).

In a Bayesian model, the deviance is a random variable, and the expected deviance under the posterior distribution is used as a measure of fit. Between two models, the model with a lower DIC value is preferred.

The WAIC is similar to the DIC, but the effective number of parameters is calculated differently. The WAIC is defined as
    WAIC = D(E[θ]) + 2 p_W,    (14)

where

    p_W = Σ_{i=1}^{n} { E[(log p(y_i | θ))^2] − (E[log p(y_i | θ)])^2 }.

The WAIC is interpreted in the same way as the DIC. That is, the lower the WAIC, the more preferable the model. Between the DIC and the WAIC, Gelman et al. (2014) argue that the WAIC is preferred.

We also use two measures to validate the out-of-sample performance of the models. The first measure is the concordance correlation coefficient (CCC), which is used to measure the agreement between two variables. The CCC is defined as (Lin, 1989):
    CCC = 2 ρ σ_1 σ_2 / (σ_1^2 + σ_2^2 + (µ_1 − µ_2)^2),    (15)

where ρ is the correlation between the observed values and the predicted values, σ_1 and µ_1 are the standard deviation and the mean of the observed values, respectively, and σ_2 and µ_2 are the standard deviation and the mean of the predicted values, respectively. The value of the CCC ranges from −1 to 1, and a higher value indicates better agreement. The second measure, denoted RSE, is defined as

    RSE = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2 ),    (16)

where y_i and ŷ_i represent the ith observed value and the ith predicted value, respectively. Between two models, the model with a lower RSE value is better.

Table 4 shows the DIC and the WAIC of fitting the four Beta regression models with different link functions to the training dataset. If we look at the rows, we see that the performance based on the five link functions is quite similar, except for the Cauchy link. If we look at the columns, we see that the BetaBYM model performs the best. In terms of DIC, the BetaBYM model with the Cauchy link performs the best. In terms of WAIC, however, the BetaBYM model with the log-log link is the best. Nevertheless, the performance of the BetaBYM model is quite similar for the first four link functions, so using the logit link function is not a bad choice.

Table 5 shows the out-of-sample performance of the models with different link functions on the test set. In terms of the CCC, the BetaBYM model with the log-log link performs the best because the corresponding CCC value is the highest among the 20 cases.

Table 4: In-sample performance of the models with different link functions.
    Model        Logit      Probit     Loglog     Cloglog    Cauchy
    DIC
    BetaReg      -2103.82   -2102.92   -2101.53   -2104.36   -2109.25
    BetaRE       -2103.60   -2102.20   -2100.60   -2104.10   -2109.79
    BetaBesag    -2105.79   -2109.89   -2109.58   -2107.41   -2114.87
    BetaBYM      -2116.93   -2115.13   -2112.55   -2118.06   -2132.92
    WAIC
    BetaReg      -2070.47   -2070.84   -2071.36   -2069.72   -2062.49
    BetaRE       -2070.60   -2071.07   -2071.56   -2069.88   -2063.02
    BetaBesag    -2077.25   -2081.93   -2083.42   -2077.41   -2069.28
    BetaBYM      -2089.08   -2091.05   -2091.60   -2088.89   -2087.56

Table 5: Out-of-sample performance of the models with different link functions.
    Model        Logit      Probit     Loglog     Cloglog    Cauchy
    CCC
    BetaReg      0.42846    0.42810    0.42749    0.42801    0.42138
    BetaRE       0.42834    0.42790    0.42733    0.42786    0.42126
    BetaBesag    0.45289    0.46855    0.47168    0.45622    0.44610
    BetaBYM      0.48595    0.48954
    RSE
    BetaReg      0.01380    0.01376    0.01371    0.01383    0.01412
    BetaRE       0.01380    0.01376    0.01371    0.01383    0.01413
    BetaBesag    0.01364    0.01352    0.01342    0.01367    0.01412
    BetaBYM      0.01359    0.01352
lll ll llllll ll ll lll llll l ll lll ll lllll llllll llll lllllllll ll lll l lllll lllll l lllll lll llllll l l llllllll l llll llll lll lll llllllll llllll ll ll lllllll llllllll lll llllllllll lll ll llll ll ll ll l ll ll lllll ll l llll l llllllll lllll lll ll l lllllllllll llll llll ll lll lllll lllll ll l llllllll ll l l ll ll lllllllllll lll ll ll llllllllllll lllll llll lllll llll lllllll llll lllll ll ll llllllllllll ll lllllll lll lll lllll llll lllll l lll l llllll ll llll ll llllll lll l lllll l lll . . . Observed brand rate P r ed i c t ed b r and r a t e l l ll l ll ll lll llllllll lll ll llll llll llll lll ll lllll llll lll ll llll llll lll ll l ll ll llll lll lllll llll l llll l ll ll lll (c) The BetaBesag model. lll ll llllll ll ll lll llll l ll lll ll lllll llllll llll lllllllll ll lll l lllll lllll l lllll lll llllll l l llllllll l llll llll lll lll llllllll llllll ll ll lllllll llllllll lll llllllllll lll ll llll ll ll ll l ll ll lllll ll l llll l llllllll lllll lll ll l lllllllllll llll llll ll lll lllll lllll ll l llllllll ll l l ll ll lllllllllll lll ll ll llllllllllll lllll llll lllll llll lllllll llll lllll ll ll llllllllllll ll lllllll lll lll lllll llll lllll l lll l llllll ll llll ll llllll lll l lllll l lll . . . Observed brand rate P r ed i c t ed b r and r a t e l l ll l ll ll lll llllllll lll ll llll llll llll lll ll lllll llll lll ll llll llll lll ll l ll ll llll lll lllll llll l llll l ll ll lll (d) The BetaBYM model. Figure 8: Scatter plots of the observed brand name drug claim rates and those predicted bythe four models with the log-log link function. The out-of-sample predictions are plotted asblue dots, while the in-sample fitted values are plotted as black circles.17f the RSE, the BetaBYM model with the log-log link is also the best as it has the lowestRSE value.Figure 8 shows the scatter plots of the observed brand name drug claim rates and thepredicted values produced by the four models with the log-log link function. 
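The CCC reported in Table 5 is Lin's concordance correlation coefficient (Lin, 1989), which measures agreement between observed and predicted rates. A minimal Python sketch (the paper's computations were done in R; the function name and data below are illustrative):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Lin's (1989) concordance correlation coefficient.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variances; 1 indicates perfect agreement.
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Identical observed and predicted rates give a CCC of exactly 1.
print(ccc([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 1.0
```

Unlike the Pearson correlation, the CCC penalizes both a shift in location and a difference in scale between the two series, which is why it is a natural summary for predicted-versus-observed plots such as Figure 8.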
From the four scatter plots, we see that the predicted values of the four models are quite similar. We also see that none of the four models fits the large rates well: the highest predicted value is around 0.25, while the largest observed value is around 0.41 (see Table 1). Figure 9 shows the distributions of the brand name drug claim rates predicted by the four models with the log-log link function across the triangular areas. The maps look quite similar, and the differences between them are hard to discern. In addition, these maps look similar to the map of the observed rates shown in Figure 2(a).
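The four models above are Beta regressions in the mean-precision parameterization of Ferrari and Cribari-Neto (2004): the response follows Beta(μφ, (1-μ)φ), where the link function (here the log-log link) maps the linear predictor to a mean μ in (0, 1), and φ is the precision hyperparameter reported in Tables 6-9. A minimal Python sketch (the paper's models were fitted with R-INLA; the φ value below is illustrative):

```python
import numpy as np
from scipy import stats

def loglog_inverse(eta):
    """Inverse of the log-log link g(mu) = -log(-log(mu)); maps R to (0, 1)."""
    return np.exp(-np.exp(-eta))

def beta_mean_precision(mu, phi):
    """Beta(mu*phi, (1-mu)*phi) of Ferrari and Cribari-Neto (2004):
    mean mu in (0, 1), precision phi > 0, Var(y) = mu*(1-mu)/(1+phi)."""
    return stats.beta(a=mu * phi, b=(1.0 - mu) * phi)

# Linear predictor -> mean claim rate -> response distribution for one area.
eta = -0.554                     # e.g., the fitted intercept in Table 6
mu = loglog_inverse(eta)         # about 0.175, a plausible claim rate
dist = beta_mean_precision(mu, phi=50.0)   # phi = 50 is illustrative only

# The random-effect hyperparameters psi in Tables 7-9 are also precisions:
# the implied standard deviation is psi ** -0.5, so a very large psi means
# the random-effect component contributes almost no extra variation.
```

Because the inverse link is bounded in (0, 1) and the Beta density is supported on (0, 1), predictions are automatically valid rates, which is the main appeal of Beta regression over ordinary linear regression for this response.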
[Figure 9 about here.] Figure 9: Distributions of the brand name drug claim rates predicted by the four models with the log-log link function: (a) the basic Beta regression model, (b) the BetaRE model, (c) the BetaBesag model, (d) the BetaBYM model.

Tables 6, 7, 8, and 9 show the fitted coefficients and hyperparameters of the four models with the log-log link function. In particular, the tables show the mean, the standard deviation, the 2.5% quantile, the median, and the 97.5% quantile of the estimated parameters. From the 2.5% and the 97.5% quantiles, we obtain the 95% credibility intervals of the fitted parameters.

Table 6: Coefficients and hyperparameters from fitting the basic Beta regression model with the log-log link.
             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.554   0.106  -0.763   -0.554  -0.345
avgscore
VITA
A00900
A02300
A03150
A03230      -0.002   0.013  -0.027   -0.002   0.023
A18450      -0.044   0.013  -0.069   -0.044  -0.019
A18800
A07230      -0.198   0.044  -0.285   -0.198  -0.111
A85770
A11070      -0.234   0.103  -0.437   -0.234  -0.031
A11902
φ

Table 7: Coefficients and hyperparameters from fitting the BetaRE model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.553   0.107  -0.763   -0.553  -0.344
avgscore
VITA
A00900
A02300
A03150
A03230      -0.002   0.013  -0.027   -0.002   0.023
A18450      -0.044   0.013  -0.069   -0.044  -0.019
A18800
A07230      -0.198   0.045  -0.285   -0.198  -0.111
A85770
A11070      -0.234   0.104  -0.438   -0.234  -0.030
A11902
φ
ψ

Table 8: Coefficients and hyperparameters from fitting the BetaBesag model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.574   0.109  -0.788   -0.574  -0.361
avgscore
VITA
A00900
A02300
A03150
A03230
A18450      -0.036   0.014  -0.064   -0.036  -0.007
A18800      -0.012   0.031  -0.074   -0.012   0.049
A07230      -0.215   0.048  -0.310   -0.215  -0.121
A85770
A11070      -0.206   0.110  -0.421   -0.206   0.011
A11902
φ
ψ

Table 9: Coefficients and hyperparameters from fitting the BetaBYM model with the log-log link.

             Mean    Std    0.025 Q  0.5 Q   0.975 Q
(Intercept) -0.580   0.111  -0.800   -0.580  -0.362
avgscore
VITA
A00900
A02300
A03150
A03230
A18450      -0.031   0.014  -0.059   -0.031  -0.002
A18800      -0.021   0.031  -0.083   -0.021   0.041
A07230      -0.225   0.049  -0.321   -0.225  -0.128
A85770
A11070      -0.195   0.114  -0.419   -0.196   0.028
A11902
φ
ψ
ψ

If we look at the 95% credibility intervals of the fitted coefficients, we see that the intervals of avgscore, VITA, A02300, and
A11902 contain only positive values. This means that these variables tend to have a positive impact on the response, i.e., the brand name drug claim rates.

From these tables, we also see two variables whose 95% credibility intervals contain only negative values: A07230 and A18450. From Table 3, we see that both variables are related to education spending. Since the fitted coefficients of these two variables are negative, the two variables tend to have a negative impact on the brand name drug claim rates.

Tables 7, 8, and 9 also show the estimated hyperparameters of the unstructured random effects and the structured random effects. Note that the hyperparameters are precision parameters, which are the inverse of the variance of the random effects: the higher the hyperparameter, the lower the variance of the random effects. In Table 7, we see that the estimated hyperparameter ψ is very large. This means that the random effects component of the model has a very small variance and does not contribute much to explaining the variation of the response variable.

In Table 8, we see that the fitted precision parameter ψ of the structured random effects is much lower than the fitted ψ in Table 7. In Table 9, we see the same pattern: the fitted ψ of the structured random effects is much lower than the fitted ψ of the unstructured random effects. This means that the structured random effects help more than the unstructured random effects in explaining the response variation.

In summary, the above results show that modeling the spatial effect improves the performance of the Beta regression model, while including unstructured random effects improves the performance only marginally. To further assess the comprehensiveness of our Beta regression models, we provide additional results using models suggested in Duncan et al. (2016); see Appendix A.

Conclusions

The healthcare sector in the U.S. is a large sector that generates about 20% of the country's gross domestic product. This significance of healthcare has motivated researchers to develop enhanced tools and approaches to better understand the industry through healthcare analytics.
According to the 2019 Predictive Analytics in Health Care Trend Forecast (Society of Actuaries, 2019), predictive analytics is poised to reshape the healthcare industry by achieving three aims: improved patient outcomes, higher quality of care, and reduced costs. In particular, McKinsey estimates that big data analytics can help the U.S. healthcare industry save more than $300 billion per year, with two thirds of the savings coming from reductions of approximately 8% in national healthcare expenditures.

In this paper, we examined and demonstrated the use of Beta regression models to study the utilization of brand name drugs in the U.S. in order to understand the variability of the brand name drug claim rates across different areas of the country. We studied different Beta regression models with and without spatial effects and fitted these models to public datasets obtained from the CMS and the IRS. The numerical results show that Beta regression models are able to fit the brand name drug claim rates well and that modeling the spatial effects improves the performance of the Beta regression models. We also found some significant variables that help to explain the variation of the brand name drug claim rates across different areas.

The methods and findings in this paper are useful for healthcare actuaries in their data analysis. By reflecting the effect of prescription drug utilization, these models can be used to update an insured's risk score in a risk adjustment model. Specifically, healthcare actuaries can incorporate the geographic variation into the models used to select preferred providers.
References
Bachler, R., Duncan, I., and Juster, I. (2006). A comparative analysis of chronic and nonchronic insured commercial member cost trends. North American Actuarial Journal, 10(4):76-89.

Besag, J. and Kooperberg, C. (1995). On conditional and intrinsic autoregression. Biometrika, 82(4):733-746.

Besag, J., York, J., and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43:1-20.

Blangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-Temporal Bayesian Models with R-INLA. Wiley, Hoboken, NJ.

Bolnick, H. J. (2004). A framework for long-term actuarial projections of health care costs: The importance of population aging and other factors. North American Actuarial Journal, 8(4):1-29.

Brinberg, D. and Cummings, V. (1984). Purchasing generic prescription drugs: An analysis using two behavioral intention models. In NA - Advances in Consumer Research, volume 11, pages 229-234.

Brockett, P., Golden, L., Yang, C. C., and Young, D. (2019). Medicaid managed care: Efficiency, medical loss ratio, and quality of care. North American Actuarial Journal, 0(0):1-16.

Brockett, P. L., Golden, L. L., and Yang, C. C. (2018). Potential "savings" of Medicare: The analysis of Medicare advantage and accountable care organizations. North American Actuarial Journal, 22(3):458-472.

Duncan, I. (2011). Health Risk Adjustment and Predictive Modeling. ACTEX Publications, Winsted, CT.

Duncan, I., Loginov, M., and Ludkovski, M. (2016). Testing alternative regression frameworks for predictive modeling of health care costs. North American Actuarial Journal, 20(1):65-87.

Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7):799-815.

Frees, E. W., Gao, J., and Rosenberg, M. A. (2011). Predicting the frequency and amount of health care expenditures. North American Actuarial Journal, 15(3):377-392.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, Articles, 33(1):1-22.

Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24:997-1016.

Getzen, T. E. (2016). Accuracy of long-range actuarial projections of health care costs. North American Actuarial Journal, 20(2):101-113.

Hamann, J., Mendel, R., Kissling, W., and Leucht, S. (2013). Psychiatrists' decision making between branded and generic drugs. European Neuropsychopharmacology, 23(7):686-690.

Hassali, M. A., Alrasheedy, A. A., McLachlan, A., Nguyen, T. A., AL-Tamimi, S. K., Ibrahim, M. I. M., and Aljadhey, H. (2014). The experiences of implementing generic medicine policy in eight countries: A review and recommendations for a successful promotion of generic medicine use. Saudi Pharmaceutical Journal, 22(6):491-503.

Hileman, G. R., Mehmud, S. M., and Rosenberg, M. A. (2016). Risk scoring in health insurance: A primer. Society of Actuaries.

Huang, S., Hartman, B., and Brazauskas, V. (2017). Model selection and averaging of health costs in episode treatment groups. ASTIN Bulletin, 47(1):153-167.

Kan, H. J., Kharrazi, H., Chang, H.-Y., Bodycombe, D., Lemke, K., and Weiner, J. P. (2019). Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults. PLOS ONE, 14:1-13.

Kesselheim, A. S., Gagne, J. J., Franklin, J. M., Eddings, W., Fulchino, L. A., Avorn, J., and Campbell, E. G. (2016). Variations in patients' perceptions and use of generic drugs: Results of a national survey. Journal of General Internal Medicine, 31(6):609-614.

Klinger, E. (2017). 2017 generic drug access and savings in the U.S. report. https://accessiblemeds.org/resources/blog/2017-generic-drug-access-and-savings-us-report. Accessed on 9/27/2018.

Klugman, S., Panjer, H., and Willmot, G. (2012). Loss Models: From Data to Decisions. Wiley, Hoboken, NJ, 4th edition.

Lin, L. I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1):255-268.

Petertil, J. P. (2005). Aging curves for health care costs in retirement. North American Actuarial Journal, 9(3):22-49.

Richardson, R. and Hartman, B. (2018). Bayesian nonparametric regression models for modeling and predicting healthcare claims. Insurance: Mathematics and Economics, 83:1-8.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press, Boca Raton, FL.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319-392.

Schoonveld, E. (2015). The Price of Global Health: Drug Pricing Strategies to Balance Patient Access and the Funding of Innovation. Gower Publishing Company, Burlington, VT, 2nd edition.

Shi, P. and Zhang, W. (2013). Managed care and health care utilization: Specification of bivariate models using copulas. North American Actuarial Journal, 17(4):306-324.

Society of Actuaries (2019). 2019 Predictive analytics in health care trend forecast. Accessed: 2019-8-23.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583-639.

Tufvesson, O., Lindström, J., and Lindström, E. (2019). Spatial statistical modelling of insurance risk: A spatial epidemiological approach to car insurance. Scandinavian Actuarial Journal, 0(0):1-15.

Wang, X., Yue, Y. R., and Faraway, J. (2018). Bayesian Regression Modeling with INLA. CRC Press, Boca Raton, FL.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11:3571-3594.

Yeo, I.-K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4):954-959.
A Additional results
Duncan et al. (2016) tested several alternative regression frameworks for predictive modeling of health care costs. In particular, they tested multiple linear regression models, the lasso, multivariate adaptive regression splines (MARS), random forests, M5 decision trees, and boosted trees. For details of these models, readers are referred to Duncan et al. (2016) and the references therein. In this section, we present the out-of-sample prediction results of these models.

Table 10: Parameters and the R packages of the additional models.

Model              R Package     Parameters
Linear model       lm            Multiple linear regression without interactions
Lasso              glmnet        Gaussian family
MARS               earth         Interactions up to degree = 1; Generalized Cross Validation penalty = 1
Random forests     randomForest  Number of trees = 100; minimum node size = 10
M5 decision trees  Cubist        Number of committees = 25
Boosted trees      gbm           Number of trees = 5000; Shrinkage = 0.01

Table 11: Out-of-sample prediction results of the additional models.

Model             CCC     RSE
Linear model      0.4129  0.0141
Lasso             0.0000  0.0154
MARS              0.4420  0.0143
Random forest     0.4657  0.0126
M5 decision tree  0.4330  0.0131
Boosted trees     0.4658  0.0133

Table 10 shows the parameters and the R packages we used to test the additional models. Table 11 shows the out-of-sample prediction results obtained by these models. Figure 10 shows the scatter plots of the observed brand name drug claim rates and the values predicted by the additional models. The variables input to these models are the same as those selected for the Beta regression models described in Table 3.

From Table 11, we see that the lasso model produced the worst result. The reason is that the lasso regression produced an intercept-only model for these data; Figure 10(b) confirms this, as its predicted values are constant. Comparing Tables 5 and 11, we see that none of these models performs better than the Beta regression models in terms of
CCC. However, the tree-based models (i.e., random forests, M5 decision trees, and boosted trees) produced lower
RSE than the Beta regression model.

[Figure 10 about here.] Figure 10: Scatter plots of the observed brand name drug claim rates and the values predicted by the additional models: (a) the linear regression model, (b) the lasso model, (c) the MARS model, (d) the random forests model, (e) the M5 decision trees model, (f) the boosted trees model.
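The appendix comparison can be outlined with Python stand-ins for the R packages in Table 10; this is a hedged sketch, not the paper's code. The data are synthetic, earth (MARS) and Cubist (M5) have no direct scikit-learn analogue and are omitted, and the boosted-trees size is reduced from Table 10's 5000 trees for speed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))    # synthetic area-level features
# Synthetic claim rates in (0, 1) via a logistic transform of a linear signal.
y = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.normal(size=400))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear model": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Random forest": RandomForestRegressor(
        n_estimators=100, min_samples_leaf=10, random_state=0),
    "Boosted trees": GradientBoostingRegressor(
        n_estimators=500, learning_rate=0.01, random_state=0),
}

def ccc(y_true, y_pred):
    # Lin's concordance correlation coefficient, as in Tables 5 and 11.
    mx, my = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mx) * (y_pred - my))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mx - my) ** 2)

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: out-of-sample CCC = {ccc(y_te, pred):.3f}")
```

A heavily penalized lasso can shrink every coefficient to zero, producing constant predictions with a CCC of 0, which mirrors the intercept-only behavior reported for the lasso in Table 11.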