An Extension of Generalized Linear Models to Finite Mixture Outcome Distributions
Andrew M. Raim*, Nagaraj K. Neerchal†, and Jorge G. Morel†
* Center for Statistical Research and Methodology, U.S. Census Bureau
† Department of Mathematics and Statistics, University of Maryland, Baltimore County
Abstract
Finite mixture distributions arise in sampling a heterogeneous population. Data drawn from such a population will exhibit extra variability relative to any single subpopulation. Statistical models based on finite mixtures can assist in the analysis of categorical and count outcomes when standard generalized linear models (GLMs) cannot adequately account for variability observed in the data. We propose an extension of GLM where the response is assumed to follow a finite mixture distribution, while the regression of interest is linked to the mixture's mean. This approach may be preferred over a finite mixture of regressions when the population mean is the quantity of interest; here, only a single regression function must be specified and interpreted in the analysis. A technical challenge is that the mean of a finite mixture is a composite parameter which does not appear explicitly in the density. The proposed model is completely likelihood-based and maintains the link to the regression through a certain random effects structure. We consider typical GLM cases where means are either real-valued, constrained to be positive, or constrained to be on the unit interval. The resulting model is applied to two example datasets through a Bayesian analysis: one with success/failure outcomes and one with count outcomes. Supporting the extra variation is seen to improve residual plots and to appropriately widen prediction intervals.
This paper is released to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. For correspondence: A.M. Raim ([email protected]), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C. 20233, U.S.A.

1 Introduction

The Generalized Linear Model (GLM) is heavily used by researchers and practitioners for regression analysis on categorical, count, and continuous outcomes (McCullagh and Nelder, 1989). Standard GLM theory assumes an exponential family distribution, such as Poisson to model counts and Binomial to model success/failure data. These distributions are limited in the amount of variability they can express. GLM users often encounter the issue of overdispersion, where the data exhibit variability which cannot be expressed by the model. This can manifest itself in a number of ways, depending on the specific nature of the overdispersion and its departure from the model. For example, assuming independence in clustered data can result in standard error estimates which are too small and lead to tests with an inflated type I error rate (Morel and Neerchal, 2012, Chapter 1).

The objective of this paper is to extend the GLM so that a finite mixture of J simpler densities can be used as the distribution for the response. There is a well-established literature on finite mixtures of regressions, in which each component distribution of a finite mixture is linked to a separate regression (Frühwirth-Schnatter, 2006). An analyst may employ a finite mixture of regressions model if heterogeneity is suspected in the relationship between covariate x and response y among sampled units, yet not enough is known to model the heterogeneity explicitly. Specifying regressions for J latent subpopulations may complicate model selection in practice.
Often, the interest may be in modeling the mean response, and heterogeneity is simply a nuisance rather than a target for inference. This motivates us to formulate the Mixture Link model, which uses a finite mixture to capture extra variation, but constrains the mean of the finite mixture to be linked to a single regression function. The mean of a finite mixture is composed of multiple parameters which may not appear directly in the likelihood. Central to the development of Mixture Link is the set in which the link constraint is honored. In the case of positive-valued means, this constraint set is a polytope, while for probability-valued means it is the intersection of a polyhedron and a unit cube. For real-valued means, the constraint set is described through the basis of a linear space. A random effects structure is assumed on this set to complete specification of the likelihood. Under Poisson and Normal outcome types, the random effects can be integrated out to yield a tractable form for the density. The case of Binomial outcomes is more computationally challenging. Taking a Bayesian approach to inference, a simple Random-Walk Metropolis-Hastings sampler can be used for the Normal and Poisson Mixture Link models. For Binomial outcomes, we consider a Metropolis-within-Gibbs sampler with data augmentation to avoid repeated evaluation of the marginal density.

A number of methods have been established to handle overdispersion. Morel and Neerchal (2012) provide an overview in the settings of count and categorical data. One common approach is to extend a basic distribution by assuming the presence of latent random variables, and then integrating them out. The Beta-Binomial (Otake and Prentice, 1984), Zero-Inflated Binomial (Hall, 2000), and Random-Clumped Binomial (Morel and Nagaraj, 1993) distributions are all obtained in this way starting from the Binomial distribution.
Similarly, the Negative Binomial and Zero-Inflated Negative Binomial distributions (Hilbe, 2011) are obtained starting from the Poisson distribution. In this same way, the t-distribution (Liu and Rubin, 1995) may be considered an overdispersion model relative to the Normal distribution. Generalized Linear Mixed Models are obtained by adding random effects to the regression function (McCulloch et al., 2008); the marginal likelihood of the outcomes usually cannot be written without an integral for non-normal outcomes. Quasi-likelihood methods extend the likelihood in ways that do not yield a proper likelihood, but allow inference to be made on regression coefficients. A simple quasi-likelihood is obtained from placing a dispersion multiplier on the variance (Agresti, 2002, Section 4.7). The method of Wedderburn (1974) requires specification of only the mean-variance relationship to form a system of equations and carry out inference. Generalized Estimating Equations (GEE) is a quasi-likelihood method for grouped data where the analyst assumes a working correlation structure for observations taken within a subject (Hardin and Hilbe, 2012). Some Bayesian overdispersion methods are discussed in the collection assembled by Dey et al. (2000); for example, Basu and Mukhopadhyay (2000) consider generalizing the link function of a GLM to a mixture distribution and Dey and Ravishanker (2000) propose generalized exponential families for the outcome. More recently, Klein et al. (2015) proposed a Bayesian approach to generalized additive models under the Zero-Inflated Negative Binomial model to estimate complicated regression functions.

The rest of the paper proceeds as follows. Section 2 formulates the Mixture Link general model. Section 3 develops Mixture Link under probability-valued means, with special attention given to Binomial outcomes. Sections 4 and 5 develop Mixture Link for positive- and real-valued means, respectively, and obtain specific models for Poisson and Normal outcomes.
Section 6 presents example data analyses with Mixture Link Binomial and Mixture Link Poisson. Finally, Section 7 concludes the paper. The mixlink package for R (available from http://cran.r-project.org) provides much of the Mixture Link functionality discussed in this paper.

2 Mixture Link Formulation
The usual GLM formulation is based on a density in the exponential dispersion family,

f(y | θ, φ) = exp{ [θ y − b(θ)] / a(φ) + c(y; φ) },  (2.1)

where θ is the canonical parameter which influences the mean and φ is the dispersion parameter. Here it can be shown that E(y) = b'(θ) and Var(y) = a(φ) b''(θ), and expressions for the score vector and information matrix can be obtained (Agresti, 2002, Section 4.4). Estimation can be carried out routinely, using Newton-Raphson or scoring algorithms to compute maximum likelihood estimates, or standard MCMC algorithms for a Bayesian analysis. Our objective is to modify this framework to allow a finite mixture as the outcome distribution, establishing a link between the mixture mean and a regression function of interest. Because finite mixtures can support more variation than distributions of the form (2.1), this extension should naturally support variation beyond standard GLMs. We are especially interested in finite mixtures of three common GLM outcome types: Normal, Binomial, and Poisson.

Consider a random variable Y following the finite mixture distribution,

f(y | θ) = Σ_{j=1}^J π_j g(y | θ_j).  (2.2)

Here, the mixing proportions π = (π_1, ..., π_J) belong to the probability simplex S_J = { λ ∈ [0,1]^J : λ_j ≥ 0, λ^T 1 = 1 }. The densities g(y | θ_j) belong to a common family parameterized by θ_j = (μ_j, φ_j), consisting of a mean parameter μ_j = ∫ y g(y | θ_j) dν(y), with all other parameters contained in φ_j. Writing ν as the dominating measure for the densities g allows expectations over discrete and continuous random variables to be treated with a common integral notation. The overall expected value is E(Y) = Σ_{j=1}^J π_j μ_j = π^T μ. The μ_j may naturally be restricted to a subset of ℝ, depending on the outcome type. For example, if Y is a count, μ_j ∈ [0, ∞) often represents a rate. Alternatively, if Y is the number of successes among m trials, which result in either success or failure, then μ_j ∈ [0, 1] can represent the probability of a success. In general, denote the natural space of μ_j as M, so that μ = (μ_1, ..., μ_J) is an element of M^J.

In a regression setting, we observe a random sample Y_1, ..., Y_n from the finite mixture

f(y_i | θ_i) = Σ_{j=1}^J π_j g(y_i | μ_ij, φ_ij),  (2.3)

with an associated (fixed) predictor x_i ∈ ℝ^d, for i ∈ {1, ..., n}. As in the traditional GLM, we wish to link E(Y_i) to a regression function such as x_i^T β through an inverse link function G. To simplify expressions in the rest of the paper, denote ϑ(x) as the inverse-linked regression G(x^T β). We will write ϑ_i = G(x_i^T β) for brevity when specifically referring to the ith observation, and ϑ in place of ϑ(x) when not emphasizing a specific observation. With this notation, our objective is to link

π^T μ_i = ϑ_i.  (2.4)

The left-hand side of (2.4) must vary with the observation for the link to be achievable. In this work, we will assume that subpopulation means μ_i = (μ_i1, ..., μ_iJ) are specific to the ith observation, but that mixing proportions π are common across observations. In contrast to the traditional GLM setting, π^T μ_i is a composite parameter which does not appear directly in the density of Y_i. Therefore, we cannot simply plug ϑ_i into the likelihood. To enforce (2.4), consider the set

A(ϑ, π) = { μ ∈ M^J : μ^T π = ϑ }.  (2.5)

For a given β and π, restricting ourselves to μ_i ∈ A(ϑ_i, π) is equivalent to enforcing the link. We will write A as a shorthand for A(ϑ, π) and A_i for A(ϑ_i, π). Our approach will be to take μ_i as a random effect drawn from the set A(ϑ_i, π).

Figure 1: Examples of the set A(ϑ, π) in dimension J = 3: (a) probability-valued means with ϑ = 0.65, (b) positive means with ϑ = 2, (c) real-valued means with ϑ = 0.

In Sections 3, 4, and 5 we will consider several commonly used choices of the space M (the unit interval, the positive real line, and the real line, respectively) to determine an appropriate distribution for μ_i. Figure 1 displays an example of the set A(ϑ_i, π) for each of these three cases. Boyd and Vandenberghe (2004) is a useful reference for basic concepts in the analysis of convex sets which emerge in the remainder of the paper. Note that x_i = 1 may be taken for all i = 1, ..., n to yield a non-regression version of Mixture Link.

Selection of a distribution over A(ϑ, π) determines the density of Y_i,

f(y_i | β, π, φ_i) = ∫ Σ_{j=1}^J π_j g(y_i | μ_ij, φ_ij) · f_{A(i)}(μ_i) dμ_i = Σ_{j=1}^J π_j ∫ g(y_i | w, φ_ij) · f_{A(i),j}(w) dw.  (2.6)

Here, f_{A(i)} represents the J-dimensional random effects density over A(ϑ_i, π) and f_{A(i),j} represents the marginal density of the jth coordinate. In the trivial case J = 1, there is only a single point in A(ϑ_i, π), and f(y_i | β, π, φ_i) simplifies to g(y_i | ϑ_i, φ_i). In general, evaluating f(y_i | β, π, φ_i) requires computation of J univariate integrals, which can be achieved numerically using quadrature or other standard techniques. This can become a computational burden if f(y_i | β, π, φ_i) must be computed many times (e.g. for a simulation or iterative estimation procedure) or if f_{A(i),j}(w) is difficult to evaluate. By construction, E(Y_i) = ϑ_i, but the variance and other moments depend on g and the distribution of μ_i. As in more basic finite mixture models, the value of density (2.6) is invariant to permutations of the subpopulation labels {1, ..., J}.

3 Probability-valued Means

Consider the setting M = [0, 1]. The set A(ϑ_i, π) = { μ ∈ [0,1]^J : μ^T π = ϑ_i } is a bounded convex set in ℝ^J.
Therefore, we have the decomposition

A(ϑ_i, π) = { Σ_{ℓ=1}^{k_i} λ_ℓ v_ℓ^(i) : λ ∈ S_{k_i} } = { V^(i) λ : λ ∈ S_{k_i} }.  (3.1)

The J × k_i matrix V^(i) is composed of the columns v_1^(i), ..., v_{k_i}^(i), which are vertices of A(ϑ_i, π). Any element μ ∈ A(ϑ_i, π) can be written as a convex combination of these vertices. The matrix V^(i) depends on both π and ϑ_i; both its elements and the dimension k_i may vary with the observation i = 1, ..., n. The vector λ^(i) belongs to the probability simplex S_{k_i}.

The Minkowski-Weyl decomposition of a polyhedron is P = { Σ_{ℓ=1}^k λ_ℓ v_ℓ : λ ∈ S_k } + { Σ_{ℓ=1}^h λ_ℓ ξ_ℓ : λ ≥ 0 }, relative to extreme points v_1, ..., v_k (i.e. vertices) and extreme directions ξ_1, ..., ξ_h of P. The set A_i in (3.1) is a polytope, a bounded polyhedron not having extreme directions, for which we need only consider extreme points. Assuming a distribution on the coefficients of the Minkowski-Weyl decomposition has been advocated by Danaher et al. (2012), who sought a class of priors to enforce biologically motivated polyhedral constraints in a Bayesian analysis.

A natural choice for a random effects distribution on S_{k_i} is λ^(i) ind∼ Dirichlet_{k_i}(α). However, this choice leads to each component of μ_i = V^(i) λ^(i) following the distribution of a linear combination of a k-dimensional Dirichlet. This distribution is computationally impractical; for example, its density has no known closed form for general k (Provost and Cheong, 2000). Our approach will first be to state the model using a Dirichlet random effect, then to state a more practical form of the model using Beta random effects with matched first and second moments. This ensures, for example, that E(μ_i) ∈ A(ϑ_i, π).
The Dirichlet formulation of the model is

Y_i ind∼ Σ_{j=1}^J π_j g(y_i | μ_ij, φ_ij),  (3.2)
μ_i = V^(i) λ^(i), where V^(i) contains vertices of A(ϑ_i, π),
λ^(i) ind∼ Dirichlet_{k_i}(α^(i)).

We restrict α^(i) to the k_i-dimensional vector κ1, so that all λ^(i) follow a symmetric Dirichlet distribution parameterized by a single scalar κ; this is done for several reasons. First, the dimension k_i can vary with the observation, so that an arbitrary α would not be compatible with all observations. Second, the ordering of the vertices in V^(i) is somewhat arbitrary, and it is difficult to maintain a correspondence between individual vertices and the elements of α. Figure 2 plots the symmetric Dirichlet density for several κ when k = 3. Note that κ = 1 corresponds to the uniform distribution on the simplex, while 0 < κ < 1 places more mass toward the vertices of the simplex and κ > 1 places more mass toward its center. Denote ℓ_ij and u_ij as the smallest and largest elements, respectively, of the jth row of V^(i); then (ℓ_ij, u_ij) forms the support of μ_ij.

Figure 2: The Dirichlet(λ | κ1) density for several settings of κ. Only λ_1 and λ_2 are plotted since λ_3 = 1 − λ_1 − λ_2.

The Beta formulation of the model is

Y_i ind∼ Σ_{j=1}^J π_j g(y_i | μ_ij, φ_ij),  (3.3)
μ_ij = (u_ij − ℓ_ij) ψ_ij + ℓ_ij,  j = 1, ..., J,
ψ_ij ∼ Beta(a_ij, b_ij).

To obtain a_ij and b_ij, we first compute

E(μ_ij) = (u_ij − ℓ_ij) a_ij / (a_ij + b_ij) + ℓ_ij, and
Var(μ_ij) = (u_ij − ℓ_ij)^2 a_ij b_ij / [ (a_ij + b_ij)^2 (a_ij + b_ij + 1) ].

Next, for λ ∼ Dirichlet_{k_i}(κ1) and v_{j·}^(i)T denoting the jth row of V^(i), we can obtain

E(v_{j·}^(i)T λ) = v̄_{j·}^(i) and Var(v_{j·}^(i)T λ) = [ v_{j·}^(i)T v_{j·}^(i) − k_i (v̄_{j·}^(i))^2 ] / [ k_i (1 + k_i κ) ],

where v̄_{j·}^(i) denotes the mean of the elements of v_{j·}^(i). Equating E(μ_ij) to E(v_{j·}^(i)T λ) and Var(μ_ij) to Var(v_{j·}^(i)T λ) and solving for a_ij and b_ij, we obtain

a_ij = (v̄_{j·}^(i) − ℓ_ij)^2 [ k_i (1 + k_i κ) / (v_{j·}^(i)T v_{j·}^(i) − k_i (v̄_{j·}^(i))^2) ] (u_ij − v̄_{j·}^(i)) / (u_ij − ℓ_ij) − (v̄_{j·}^(i) − ℓ_ij) / (u_ij − ℓ_ij),  (3.4)
b_ij = a_ij (u_ij − v̄_{j·}^(i)) / (v̄_{j·}^(i) − ℓ_ij).  (3.5)

In the special case that k = 2, we have

v̄_{j·}^(i) = (1/2) [ min_{ℓ∈{1,2}} v_{jℓ}^(i) + max_{ℓ∈{1,2}} v_{jℓ}^(i) ] = (1/2) (ℓ_ij + u_ij),
v̄_{j·}^(i) − ℓ_ij = u_ij − v̄_{j·}^(i),  v_{j·}^(i)T v_{j·}^(i) = u_ij^2 + ℓ_ij^2,

from which it can be shown that a_ij = κ and b_ij = κ.

Raim (2014) observes through simulation that, although the linear-combination-of-Dirichlet density can differ substantially from the moment-matched Beta density, the density of model (3.3) is a close approximation to the density of model (3.2). We have paid specific attention to the marginal distributions of the coordinates of μ_i rather than the full joint distribution; it is seen from (2.6) that only the marginals influence the overall Mixture Link distribution. The density of model (3.3) is now given by

f(y_i | β, π, φ_i, κ) = Σ_{j=1}^J π_j ∫ g(y_i | H_ij(w), φ_ij) · B(w | a_ij, b_ij) dw,  (3.6)

where B(x | a, b) denotes the Beta density and H_ij(x) = (u_ij − ℓ_ij) x + ℓ_ij.

Computation of the Mixture Link density and its moments depends on the vertices of the set A. For the case J = 2, it is easy to identify the vertices of A graphically by plotting the line μ_1 π_1 + μ_2 π_2 = ϑ and visually identifying the points at which it intersects the unit rectangle. An illustration is given in Figure 3.

Figure 3: An illustration of the set A(ϑ, π) = { μ ∈ [0,1]^J : μ^T π = ϑ } for J = 2.

Formulas for the vertices in this case are stated now as a lemma.
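As a numerical sanity check on the moment matching in (3.4)-(3.5), the following sketch (plain Python; the function name is ours, not from the mixlink package) computes a_ij and b_ij from one row of a vertex matrix. In the special case k = 2 it should reduce to a = b = κ.

```python
# Sketch: moment-matched Beta(a, b) parameters from one row v of the
# vertex matrix V, following (3.4)-(3.5). Assumes v is not constant.
def beta_params(v, kappa):
    k = len(v)
    vbar = sum(v) / k                       # row mean, \bar{v}_{j.}
    l, u = min(v), max(v)                   # support endpoints (l_ij, u_ij)
    # k(1 + k kappa) / (v^T v - k vbar^2): reciprocal of Var(v^T lambda)
    c = k * (1 + k * kappa) / (sum(x * x for x in v) - k * vbar ** 2)
    a = (vbar - l) ** 2 * c * (u - vbar) / (u - l) - (vbar - l) / (u - l)
    b = a * (u - vbar) / (vbar - l)
    return a, b
```

For example, beta_params([0.2, 0.8], kappa=1.5) is numerically (1.5, 1.5), matching the k = 2 special case noted above.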
Lemma 3.1. Suppose J = 2 and A has two distinct vertices v_1, v_2. Then the vertices are given by

v_1 = (ϑ/π_1, 0) if ϑ/π_1 ≤ 1, and (1, (ϑ − π_1)/π_2) otherwise,
v_2 = ((ϑ − π_2)/π_1, 1) if (ϑ − π_2)/π_1 ≥ 0, and (0, ϑ/π_2) otherwise,

where π_2 = 1 − π_1.

Proof. Using μ_1 π_1 + μ_2 π_2 = ϑ, we have

μ_1 = (1/π_1)(ϑ − μ_2 π_2) and μ_2 = (1/π_2)(ϑ − μ_1 π_1),  (3.7)

where μ_1 ∈ [0,1] and μ_2 ∈ [0,1] must hold. To obtain v_1, take μ_1 as large as possible, noting expressions (3.7). If μ_1 = 1 is a valid solution (i.e. a point in A), then μ_2 = (1/π_2)(ϑ − π_1). Otherwise, take μ_2 as small as possible to maximize μ_1; this yields μ_1 = ϑ/π_1 and μ_2 = 0. A similar argument taking μ_1 as small as possible yields v_2.

We may also locate the vertices v_1, v_2 systematically in the following way. Fix μ_1 = 0 and solve for μ_2 so that μ^T π = ϑ. Then fix μ_1 = 1 and solve for μ_2. Then fix μ_2 at the values 0 and 1 and solve for μ_1. At most two of these four solutions are contained in A; these are the vertices. We will soon see that this idea generalizes to J > 2. Note that it is also possible to have k = 1 vertices when J = 2. For example, if π = (1/2, 1/2) and ϑ = 1, then μ_1 = 1, μ_2 = 1 is the only solution to μ_1 π_1 + μ_2 π_2 = ϑ in [0,1]^2, and therefore A is a singleton set.

For the general (J ≥ 2) case, Lemma 3.2 characterizes points in A which need to be considered when searching for the extreme points: we must only consider those with at most one component not equal to 0 or 1.

Lemma 3.2 (Characterization of Extreme Points of A). Suppose v = (v_1, ..., v_J) is a point in A with two or more components strictly between 0 and 1. Then v is not an extreme point of A.

Proof. Suppose without loss of generality that v ∈ A with v_1 ∈ (0,1) and v_2 ∈ (0,1). Then

v^T π = ϑ ⟺ v_1 π_1 + v_2 π_2 + (v_3 π_3 + ··· + v_J π_J) = ϑ ⟺ v_1 π_1 + v_2 π_2 = ϑ*,

where ϑ* = ϑ − (v_3 π_3 + ··· + v_J π_J). We can now use Lemma 3.1 to obtain vertices, say a and b, of the line segment

L = { (μ_1, μ_2, v_3, ..., v_J) ∈ [0,1]^J : μ_1 π_1 + μ_2 π_2 = ϑ* },

where (v_3, ..., v_J) are held fixed and only (μ_1, μ_2) may vary. Explicitly, we have

a = (ϑ*/π_1, 0, v_3, ..., v_J) if ϑ*/π_1 ≤ 1, and (1, (ϑ* − π_1)/π_2, v_3, ..., v_J) otherwise,
b = ((ϑ* − π_2)/π_1, 1, v_3, ..., v_J) if (ϑ* − π_2)/π_1 ≥ 0, and (0, ϑ*/π_2, v_3, ..., v_J) otherwise.

By construction, v is in the line segment strictly between a and b, with a ≠ b. Furthermore, since L ⊆ A, we have that a, b ∈ A. Therefore, v cannot be an extreme point of A.

This can be used to formulate a simple procedure to identify all extreme points of A, which is given as Algorithm 3.1. Notice that it considers J · 2^(J−1) points; this would be impractical for large J, but is manageable for the smaller values of J that are commonly used in finite mixtures.

Algorithm 3.1 Find vertices of the set A(ϑ, π).
function FindVertices(ϑ, π)
    V ← ∅
    for j = 1, ..., J do
        if π_j > 0 then
            for all μ_{−j} ∈ {0, 1}^(J−1) do
                μ*_j ← π_j^(−1) [ϑ − μ_{−j}^T π_{−j}]
                v* ← (μ_1, ..., μ_{j−1}, μ*_j, μ_{j+1}, ..., μ_J)
                V ← V ∪ {v*} if v* ∈ A(ϑ, π)
    return matrix V with columns v* ∈ V

We will now formulate a Mixture Link Binomial distribution. Suppose g(y_i | w, φ_ij) = Bin(y_i | m_i, w), so that y_i represents a count of successes out of m_i independent trials. Model (3.3) becomes

Y_i ind∼ Σ_{j=1}^J π_j (m_i choose y_i) μ_ij^{y_i} (1 − μ_ij)^{m_i − y_i},  (3.8)
μ_ij = (u_ij − ℓ_ij) ψ_ij + ℓ_ij,  j = 1, ..., J,
ψ_ij ∼ Beta(a_ij, b_ij).

To draw from this distribution:
1. Compute the matrix V given x, β, and π.
2. Compute a_j and b_j for j = 1, ..., J according to (3.4)-(3.5), and let (ℓ_j, u_j) be the minimum and maximum element, respectively, of the jth row of V.
3. Let μ_j = (u_j − ℓ_j) ψ_j + ℓ_j with ψ_j ∼ Beta(a_j, b_j), for j = 1, ..., J.
4. Draw Z ∼ Discrete(1, ..., J; π).
5. Draw Y ∼ Binomial(m, μ_Z).

Here, Discrete(1, ..., k; p) denotes the discrete distribution with values 1, ..., k and corresponding probabilities p = (p_1, ..., p_k). Moments of Y can be computed using moments of μ_j for j = 1, ..., J. In particular, after some algebra, we obtain

Var(Y) = mϑ(1 − mϑ) + m(m − 1) Σ_{j=1}^J π_j [ v_{j·}^T v_{j·} + κ (k v̄_{j·})^2 ] / [ k (1 + κk) ].

Some remarks about the Mixture Link Binomial distribution follow.
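Algorithm 3.1 and the sampling steps above can be sketched in plain Python. The function names are illustrative (not from the mixlink package), and the draw below uses the Dirichlet formulation (3.2) directly rather than the moment-matched Beta form:

```python
import itertools
import random

def find_vertices(theta, pi, tol=1e-9):
    """Algorithm 3.1: enumerate the vertices of A(theta, pi) when M = [0, 1]."""
    J = len(pi)
    verts = set()
    for j in range(J):
        if pi[j] <= 0:
            continue
        # Fix the other J - 1 coordinates at 0/1 and solve for coordinate j.
        for rest in itertools.product([0.0, 1.0], repeat=J - 1):
            mu = list(rest[:j]) + [0.0] + list(rest[j:])
            mu[j] = (theta - sum(m * p for m, p in zip(mu, pi))) / pi[j]
            if -tol <= mu[j] <= 1.0 + tol:      # candidate lies in A(theta, pi)
                verts.add(tuple(round(m, 9) for m in mu))
    return sorted(verts)

def draw_mixture_link_binomial(m, theta, pi, kappa, rng=random):
    """One draw from Mixture Link Binomial via the Dirichlet formulation (3.2)."""
    V = find_vertices(theta, pi)                # vertices stored as rows here
    g = [rng.gammavariate(kappa, 1.0) for _ in V]
    lam = [x / sum(g) for x in g]               # lambda ~ Dirichlet_k(kappa 1)
    mu = [sum(lam[l] * V[l][j] for l in range(len(V))) for j in range(len(pi))]
    z = rng.choices(range(len(pi)), weights=pi)[0]   # Z ~ Discrete(1..J; pi)
    return sum(rng.random() < mu[z] for _ in range(m))  # Y ~ Binomial(m, mu_Z)
```

For example, find_vertices(0.65, [0.3, 0.7]) returns the two vertices (0, 0.65/0.7) and (1, 0.5), matching Lemma 3.1, while find_vertices(1.0, [0.5, 0.5]) returns the singleton (1, 1).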
Remark 3.3. For the case m = 1, where y represents a single success or failure, E(Y) = ϑ implies P(Y = y) = ϑ^y (1 − ϑ)^{1−y}, and Mixture Link simplifies to the usual Bernoulli regression model. In this case, the distribution depends only on its β parameter. When m > 1, this trivial simplification does not take place.
Remark 3.4.
Note that because v_{j·}^T v_{j·} ≤ k and v̄_{j·} ≤ 1, we have Σ_{j=1}^J π_j [ v_{j·}^T v_{j·} + κ (k v̄_{j·})^2 ] ≤ k (1 + κk), yielding the bound Var(Y) ≤ m(m − 1) − mϑ(mϑ − 1), which holds regardless of π and κ.

Remark 3.5.
The expression Var(Y) is non-increasing in κ. This can be seen from

∂/∂κ Var(Y) = −[ m(m − 1) / (1 + κk)^2 ] Σ_{j=1}^J π_j Σ_{ℓ=1}^k (v_{jℓ} − v̄_{j·})^2 ≤ 0.

Remark 3.6.
Binomial(m, ϑ) is a special case of Mixture Link Binomial, obtained when π = (1/J, ..., 1/J) and κ → ∞. This can be seen directly from the Dirichlet formulation of Mixture Link (3.2). Let π = (1/J, ..., 1/J), so that A(ϑ, π) = { μ ∈ [0,1]^J : μ_1 + ··· + μ_J = Jϑ }. A vertex v* of A(ϑ, π) is obtained by taking, say, the first elements v*_1, ..., v*_{[Jϑ]} to be 1, v*_{[Jϑ]+1} = Jϑ − [Jϑ], and the remaining elements of v* to be zero. Here, [x] represents the integer part of a real number x. By Lemma 3.2, v* is a vertex of A(ϑ, π). The remaining vertices can be obtained by permuting the elements of v*. If ṽ*_1, ..., ṽ*_s are the unique elements of v* with multiplicities |ṽ*_1|, ..., |ṽ*_s|, then there are k = J! / { |ṽ*_1|! ··· |ṽ*_s|! } unique permutations of v* to use as columns in the matrix V. Notice that, for any a, j ∈ {1, ..., J}, the element ṽ*_a appears in the jth row v_{j·}^T of V exactly (J − 1)! / { (|ṽ*_a| − 1)! Π_{ℓ≠a} |ṽ*_ℓ|! } times; this is the number of unique permutations of {v*_1, ..., v*_J} keeping one of the elements fixed. Then we have

1^T v_{j·} = Σ_{a=1}^s ṽ*_a (J − 1)! / { (|ṽ*_a| − 1)! Π_{ℓ≠a} |ṽ*_ℓ|! } = Σ_{a=1}^s ṽ*_a [ J! / Π_{ℓ=1}^s |ṽ*_ℓ|! ] (|ṽ*_a| / J) = (k/J) Σ_{a=1}^s ṽ*_a |ṽ*_a| = (k/J) Jϑ = kϑ.  (3.9)

When κ → ∞, a draw λ ∼ Dirichlet_k(κ1) becomes a point mass at its expected value k^{−1} 1, so that (3.9) gives μ = V λ = k^{−1} V 1 = ϑ 1. It can now be seen that

f(y) = Σ_{j=1}^J π_j (m choose y) μ_j^y (1 − μ_j)^{m−y} = Σ_{j=1}^J (1/J) (m choose y) ϑ^y (1 − ϑ)^{m−y}

is the Binomial(m, ϑ) distribution. Analogous statements for some of these remarks can be made about the Mixture Link Poisson and Mixture Link Normal distributions, discussed in Sections 4 and 5; we have focused on the Binomial case for brevity.

Remark 3.7.
Mixture Link Binomial becomes a zero- and/or m-inflated Binomial model when κ → 0. As in Remark 3.6, we will work directly from the Dirichlet formulation. As κ → 0, a draw λ ∼ Dirichlet_k(κ1) behaves as a discrete uniform random variable on {e_1, ..., e_k}, the columns of the k × k identity matrix, which represent the vertices of the simplex S_k. Here, the Mixture Link distribution becomes

f(y) = Σ_{j=1}^J π_j Σ_{ℓ=1}^k (1/k) Bin(y | m, v_{j·}^T e_ℓ) = Σ_{j=1}^J Σ_{ℓ=1}^k (π_j / k) (m choose y) v_{jℓ}^y (1 − v_{jℓ})^{m−y}.

Recall from Lemma 3.2 that, for each ℓ = 1, ..., k, at most one of {v_{1ℓ}, ..., v_{Jℓ}} can take on a value outside of {0, 1}. Terms with v_{jℓ} = 0 represent a point mass at zero, while terms with v_{jℓ} = 1 represent a point mass at m.

Remark 3.8.
Mixture Link Binomial is closely related to two other Binomial models for overdispersion. Starting from (3.6), if we could take ℓ_ij = 0 and u_ij = 1, we would have

f(y_i | β, π, φ_i, κ) = Σ_{j=1}^J π_j ∫ Bin(y_i | m_i, w) · B(w | a_ij, b_ij) dw = Σ_{j=1}^J π_j (m_i choose y_i) B(a_ij + y_i, b_ij + m_i − y_i) / B(a_ij, b_ij).

Therefore, Mixture Link Binomial can be seen as a constrained form of a finite mixture of J Beta-Binomial densities. Also, recall the Random-Clumped Binomial (RCB) distribution (Morel and Nagaraj, 1993), whose density is given by

f(y | π, ρ) = π_1 Bin(y | m, μ_1) + π_2 Bin(y | m, μ_2),

where π_1 = π, π_2 = 1 − π, and μ_1 = (1 − ρ)π + ρ, μ_2 = (1 − ρ)π. The free parameters of the distribution are π ∈ (0,1) and ρ ∈ (0,1). Note that π_1 μ_1 + π_2 μ_2 = π, so that this particular choice of (μ_1, μ_2) is in the set A(π, (π_1, π_2)). Therefore, RCB can be seen as a special case of Mixture Link Binomial.

4 Positive-valued Means

The setting M = [0, ∞) is commonly required for count data and time-to-event data. Just as in Section 3, the set A(ϑ, π) = { μ ∈ [0, ∞)^J : μ^T π = ϑ } is a closed convex hyperplane segment within ℝ^J. Therefore, the decomposition (3.1) also applies, but the procedure to compute vertices is much simpler. First note that for J = 2, v_1 = (ϑ/π_1, 0) and v_2 = (0, ϑ/π_2) are the vertices of A. To see this, suppose μ* is an arbitrary point in A. Then we must have, for some λ ∈ [0,1],

(μ*_1, μ*_2) = λ v_1 + (1 − λ) v_2 = (λ ϑ/π_1, (1 − λ) ϑ/π_2).

Taking λ = μ*_1 π_1 / ϑ satisfies the first equation μ*_1 = λ ϑ/π_1, and also gives (1 − λ) ϑ/π_2 = (ϑ − μ*_1 π_1)/π_2 = μ*_2 to satisfy the second equation. Similarly to Lemma 3.2, we characterize the extreme points of A for the case of positive means by Lemma 4.1. The proof is similar to that of Lemma 3.2, and therefore omitted.

Lemma 4.1 (Characterization of Extreme Points of A). Suppose v = (v_1, ..., v_J) is a point in A with two or more components which are strictly positive. Then v is not an extreme point of A.

If v = (0, ..., 0, v_j, 0, ..., 0) is a point in A, then v^T π = ϑ implies v_j π_j = ϑ. There are exactly J such points in A, yielding V = Diag(ϑ/π_1, ..., ϑ/π_J). Poisson Mixture Link can now be formulated similarly as in Section 3. Note that, in this case, the Dirichlet and Beta assumptions on μ_i lead to exactly the same model. Taking g(y_i | w, φ_ij) = Poisson(y_i | w), the model becomes

Y_i ind∼ Σ_{j=1}^J π_j e^{−μ_ij} μ_ij^{y_i} / y_i!,
μ_i = V^(i) λ^(i),
λ^(i) ind∼ Dirichlet_{k_i}(κ1).

Expressions involving the vertices simplify in the case of positive means, with J = k_i, ℓ_ij = 0, u_ij = v_{jj}^(i), v̄_{j·}^(i) = v_{jj}^(i)/J, v_{j·}^(i)T v_{j·}^(i) = (v_{jj}^(i))^2, H_ij(w) = v_{jj}^(i) w, a_ij = κ, and b_ij = κ(J − 1). The marginal distribution of each coordinate of λ^(i) ∼ Dirichlet_J(κ1) is Beta(κ, κ(J − 1)), so the density becomes

f(y_i | β, π, κ) = Σ_{j=1}^J π_j ∫ [ e^{−H_ij(w)} H_ij(w)^{y_i} / y_i! ] · B(w | κ, κ(J − 1)) dw
= Σ_{j=1}^J π_j ∫ [ e^{−v_{jj}^(i) w} (v_{jj}^(i) w)^{y_i} / y_i! ] · [ w^{κ−1} (1 − w)^{κ(J−1)−1} / B(κ, κ(J − 1)) ] dw
= ϑ_i^{y_i} [ Γ(y_i + κ) Γ(κJ) / (Γ(y_i + κJ) Γ(κ) Γ(y_i + 1)) ] Σ_{j=1}^J π_j^{1−y_i} · F(−ϑ_i/π_j; y_i + κ, y_i + Jκ),

where F(x; a, b) = [B(a, b − a)]^{−1} ∫_0^1 w^{a−1} (1 − w)^{b−a−1} e^{xw} dw is the confluent hypergeometric function of the first kind and B(a, b) = Γ(a)Γ(b)/Γ(a + b) is the beta function (Johnson et al., 2005, Chapter 1). Implementations of F(x; a, b) are available in computing packages such as the GNU Scientific Library. The variance of Y becomes

Var(Y) = ϑ + Σ_{j=1}^J π_j v̄_{j·}^2 − ϑ^2 + Σ_{j=1}^J π_j [ v_{j·}^T v_{j·} − k (v̄_{j·})^2 ] / [ k (1 + κk) ]
= ϑ + ϑ^2 [ (κ + 1) / (J (1 + Jκ)) Σ_{j=1}^J π_j^{−1} − 1 ].

Drawing random variables from Mixture Link Poisson is similar to the method given in Section 3 for Mixture Link Binomial:
1. Compute the matrix of vertices V given x, β, and π.
2. Let μ_j = ψ_j · ϑ/π_j with ψ_j ∼ Beta(κ, κ(J − 1)), for j = 1, ..., J.
3. Draw Z ∼ Discrete(1, ..., J; π).
4. Draw Y ∼ Poisson(μ_Z).

Remark 4.2. The expression Var(Y) is decreasing in κ, since

∂/∂κ Var(Y) = −[ ϑ^2 (J − 1) / (J (1 + Jκ)^2) ] Σ_{j=1}^J π_j^{−1} < 0.
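Since the vertex matrix is diagonal with entries ϑ/π_j, the Mixture Link Poisson sampling steps reduce to a few lines. The sketch below is plain Python with an illustrative function name; the Poisson draw uses simple CDF inversion, which is adequate for small means.

```python
import math
import random

def draw_mixture_link_poisson(theta, pi, kappa, rng=random):
    """One draw: vertices V = Diag(theta/pi_1, ..., theta/pi_J) and
    psi_j ~ Beta(kappa, kappa (J - 1)); assumes J >= 2."""
    J = len(pi)
    z = rng.choices(range(J), weights=pi)[0]       # Z ~ Discrete(1..J; pi)
    psi = rng.betavariate(kappa, kappa * (J - 1))  # marginal of lambda_j
    mu = psi * theta / pi[z]                       # mu_Z = psi * theta / pi_Z
    # Y ~ Poisson(mu) by inversion of the CDF
    u, y, p = rng.random(), 0, math.exp(-mu)
    cdf = p
    while u > cdf and p > 0.0:
        y += 1
        p *= mu / y
        cdf += p
    return y
```

By construction E(Y) = ϑ, since E(ψ_j) = 1/J and Σ_j π_j · ϑ/(J π_j) = ϑ.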
5 Real-valued Means

In the case M = ℝ, the set A(ϑ, π) = { μ ∈ ℝ^J : μ^T π = ϑ } forms a hyperplane in ℝ^J and can be decomposed as A(ϑ, π) = { μ̄ ∈ ℝ^J : μ̄^T π = 0 } + ϑ1. For any μ̄ in the subspace { μ̄ ∈ ℝ^J : μ̄^T π = 0 }, we can write μ̄_J = −π_J^{−1}(π_1 μ̄_1 + ··· + π_{J−1} μ̄_{J−1}), with μ̄_j unrestricted for j = 1, ..., J − 1. Therefore a basis for the subspace is given by the columns of the J × (J − 1) matrix

V = [    1           0       ···      0
         0           1       ···      0
         ⋮           ⋮                ⋮
         0           0       ···      1
     −π_1/π_J   −π_2/π_J   ···  −π_{J−1}/π_J ].

We can therefore represent any μ ∈ A(ϑ, π) as μ = Vλ + ϑ1 for some λ ∈ ℝ^{J−1}.

A natural choice for a random effects distribution on A(ϑ, π) is to take λ_j iid∼ N(0, κ^2) for j = 1, ..., J − 1, so that

μ ∼ N(ϑ1, κ^2 V V^T), where V V^T = [ I, −π_J^{−1} π_{−J} ; −π_J^{−1} π_{−J}^T, π_J^{−2} π_{−J}^T π_{−J} ],

I denotes the (J − 1) × (J − 1) identity matrix, and π_{−J} = (π_1, ..., π_{J−1}). The Mixture Link density depends only on the diagonal terms of the random effect variance,

f(y_i | β, π, φ_i, κ) = Σ_{j=1}^J π_j ∫ g(y_i | w, φ_ij) · N(w | ϑ_i, κ^2 a_ij) dw,  (5.1)

where a_ij = 1 for j = 1, ..., J − 1 and a_iJ = π_J^{−2} π_{−J}^T π_{−J}.

To obtain a Mixture Link analogue to the commonly used ordinary least squares model, suppose g(y_i | w, φ_ij) = N(y_i | w, σ_j^2). In this case, it can be shown that (5.1) simplifies to the finite mixture

f(y_i | β, π, σ_1^2, ..., σ_J^2, κ) = Σ_{j=1}^J π_j N(y_i | ϑ_i, κ^2 a_ij + σ_j^2),  (5.2)

where each of the subpopulations has a common mean. If the J subpopulations are assumed to be homoskedastic, (5.2) further simplifies to a finite mixture of two densities,

f(y_i | β, π, σ^2, κ) = (1 − π_J) N(y_i | ϑ_i, κ^2 + σ^2) + π_J N(y_i | ϑ_i, κ^2 π_J^{−2} π_{−J}^T π_{−J} + σ^2).

Focusing on the homoskedastic model, it is straightforward to draw from the distribution:
1. Draw Z_i ∼ Discrete(1, 2; (1 − π_J, π_J)).
2. Draw Y_i from N(y_i | ϑ_i, κ^2 a_j + σ^2), where Z_i = j, with a_1 = 1 and a_2 = π_J^{−2} π_{−J}^T π_{−J}.

An expression for the variance is given by

Var(Y_i) = κ^2 (1 − π_J + π_J^{−1} π_{−J}^T π_{−J}) + σ^2.
We now present two examples of data analysis with the Mixture Link distribution. The Hiroshima data discussed in Section 6.1 features a Binomial outcome. The Arizona Medpar data has a count outcome and is discussed in Section 6.2.

For a complete Bayesian specification of Mixture Link Binomial and Mixture Link Poisson, we assume priors
\[
\beta \sim \mathrm{N}(0, \Omega_\beta), \qquad \pi \sim \mathrm{Dirichlet}(\gamma), \qquad \kappa \sim \mathrm{Gamma}(a_\kappa, b_\kappa),
\]
where the parameterization of the Gamma is taken to have $\mathrm{E}(\kappa) = a_\kappa / b_\kappa$. In the absence of a priori knowledge, a somewhat vague choice of hyperparameters is $\Omega_\beta = 1000 I_d$, $\gamma = \mathbf{1}$, $a_\kappa = 1$, and $b_\kappa = 2$.

To diagnose the fit of models with non-Normal outcomes, we make use of randomized quantile residuals (Dunn and Smyth, 1996). Interpretation of quantile residuals is similar to routine residual analysis in ordinary least squares regression: quantile residuals from an adequate model fit behave as an independent sample from the standard Normal distribution. For $y_i$ drawn independently from a continuous distribution $F(\cdot \mid \theta)$ with estimate $\hat\theta$, the quantile residual is defined as $r_i = \Phi^{-1}\{F(y_i \mid \hat\theta)\}$. For $y_i$ drawn independently from a discrete distribution, there is an additional randomization: the residual is defined by $r_i = \Phi^{-1}\{u_i\}$, using $u_i$ drawn uniformly on the interval between $\lim_{\varepsilon \downarrow 0} F(y_i - \varepsilon \mid \hat\theta)$ and $F(y_i \mid \hat\theta)$. A Bayesian version of the quantile residual using draws $\theta^{(1)}, \ldots, \theta^{(R)}$ from the posterior distribution $f(\theta \mid y)$ is $r_i = R^{-1} \sum_{r=1}^{R} \Phi^{-1}\{u_i^{(r)}\}$, where each $u_i^{(r)}$ is drawn uniformly on the interval between $\lim_{\varepsilon \downarrow 0} F(y_i - \varepsilon \mid \theta^{(r)})$ and $F(y_i \mid \theta^{(r)})$.

We will also evaluate models using prediction intervals computed from the posterior predictive distribution.
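The discrete-case residual above can be sketched in a few lines. The following is an illustrative stand-alone computation for a Poisson fit, in which simulated data and known means play the role of an actual fitted model; it is not the paper's analysis code:

```python
import numpy as np
from scipy.stats import poisson, norm

def quantile_residuals(y, mu, rng):
    """Randomized quantile residuals for Poisson outcomes y with fitted means mu.

    For discrete data, u_i is drawn uniformly between the left CDF limit
    F(y_i - 1 | mu_i) and F(y_i | mu_i), then mapped through the Normal quantile
    function."""
    lo = poisson.cdf(y - 1, mu)   # left limit of the CDF at y_i (0 when y_i = 0)
    hi = poisson.cdf(y, mu)
    u = rng.uniform(lo, hi)       # element-wise uniform draws on (lo, hi)
    return norm.ppf(u)

rng = np.random.default_rng(0)
mu = np.full(500, 4.0)            # "fitted" means; here the true means
y = rng.poisson(mu)               # data generated from the fitted model
r = quantile_residuals(y, mu, rng)
```

Since the data here really do come from the fitted model, `r` should resemble a standard Normal sample. The Bayesian version described above simply averages $\Phi^{-1}\{u_i^{(r)}\}$ over posterior draws $\theta^{(r)}$ in the same way.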
Recall that the posterior predictive distribution for a new sample $\tilde y$ given the observed sample $y$ is
\[
f(\tilde y \mid y) = \int f(\tilde y \mid \theta, y) f(\theta \mid y) \, d\nu(\theta) = \int f(\tilde y \mid \theta) f(\theta \mid y) \, d\nu(\theta),
\]
where $\nu$ denotes an appropriate dominating measure. Then to sample from $f(\tilde y \mid y)$:
1. Draw $\theta^{(1)}, \ldots, \theta^{(R)}$ from the posterior $f(\theta \mid y)$.
2. Draw $\tilde y^{(r)}$ from $f(\tilde y \mid \theta^{(r)})$ for $r = 1, \ldots, R$.
Now $(\tilde y^{(1)}, \ldots, \tilde y^{(R)})$ is a draw from the posterior predictive distribution. A prediction for the $i$th observation is given by $R^{-1} \sum_{r=1}^{R} \tilde y_i^{(r)}$, and a prediction interval with coverage probability $1 - \alpha$ for the $i$th observation is given by the $\alpha/2$ and $1 - \alpha/2$ quantiles of $(\tilde y_i^{(1)}, \ldots, \tilde y_i^{(R)})$.

Label switching is a common issue in Bayesian analysis of finite mixtures (Jasra et al., 2005). For Mixture Link, the $\pi$ parameters are susceptible to this problem. Because finite mixtures are invariant to permutation of the labels, the parameters corresponding to labels $\{1, \ldots, J\}$ can change during the course of an MCMC computation. Therefore, special care must be taken when summarizing parameters using MCMC draws. In this work, we take the simple approach of reordering the components within each draw $\pi^{(r)}$, in ascending order, for each $r = 1, \ldots, R$.

6.1 Hiroshima Data

Awa et al. (1971) and Sofuni et al. (1978) study the effects of radiation exposure on chromosome aberrations in survivors of the atomic bombs that were used in Hiroshima and Nagasaki. We consider a subset of the data, as presented in Morel and Neerchal (2012), on n = 648 subjects in Hiroshima. For the ith subject, a chromosome analysis has been carried out on m_i circulating lymphocytes to determine the number y_i containing chromosome aberrations. Neutron and gamma radiation exposure (measured in rads) are available as potential covariates.

Table 1: DIC for Hiroshima models.

Model      DIC
Binomial   3625.34
RCB        3148.05
BB         2984.49
MixLinkJ2  2876.64
MixLinkJ3  2878.01
MixLinkJ4  2875.93

As in Raim et al.
(2015), we consider the regression
\[
\vartheta_i = G(\beta_0 + \beta_1 x_i + \beta_2 x_i^2), \tag{6.1}
\]
where $x_i$ is a normalized sum of neutron and gamma doses, and we take $G$ to be the logistic CDF (as in logistic regression).

We compare six Binomial-type models with (6.1) as the regression function: Binomial, Random-Clumped Binomial (RCB), Beta-Binomial (BB), and Mixture Link with J = 2, 3, 4. Posterior sampling by random-walk Metropolis-Hastings (RWMH) requires evaluating the Mixture Link integrals numerically for each of the n observations. Alternatively, Appendix A proposes a Metropolis-within-Gibbs (MWG) sampler (Robert and Casella, 2010, Section 10.3) where the ψ_i are taken as augmented data (Tanner and Wong, 1987) to avoid the expensive integration. An RWMH sampler was used to obtain posterior draws under the Binomial, RCB, and BB models, while the MWG sampler from Appendix A was used for Mixture Link. For each Mixture Link model, we carried out a preliminary "pilot" MCMC, which was used to tune the proposal distribution for a final MCMC run and achieve satisfactory mixing. Mixing was assessed primarily through trace plots and autocorrelation plots of the saved draws. Trace plots for the selected Mixture Link model are shown in Figure 6. For all models, a multivariate Normal proposal distribution was selected by hand to achieve acceptance rates between about 15% and 30%. Final MCMC runs for Mixture Link were carried out for 55,000 iterations; the first 5,000 were discarded as a burn-in sample, and 1 of every 50 remaining draws from the chain was saved. For Binomial, BB, and RCB, we used 50,000 iterations overall, with the first 5,000 discarded as burn-in, saving 1 of every 50 remaining.

Table 1 shows the Deviance Information Criterion (DIC) for these models. The three Mixture Link models fit best according to DIC; BB has a smaller DIC than RCB by a large margin, and Binomial gives the worst fit, as expected. Table 2 reports means, standard deviations, 2.5% quantiles, and 97.5% quantiles for each parameter from the posterior draws. Generally, signs and magnitudes of the β estimates agree between models.
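Summaries like those in Table 2 are computed only after the label-switching adjustment described earlier: the components of each draw π^(r) are sorted in ascending order before means and quantiles are taken. A minimal sketch of this sort-within-draw step, using toy draws in place of the actual MCMC output:

```python
import numpy as np

rng = np.random.default_rng(1)
R, J = 1000, 2

# Toy "MCMC draws" of pi = (pi_1, pi_2), centered at (0.3, 0.7), with labels
# randomly swapped at roughly half of the iterations to mimic label switching.
draws = np.array([0.3, 0.7]) + rng.normal(0.0, 0.01, size=(R, J))
swap = rng.random(R) < 0.5
draws[swap] = draws[swap][:, ::-1]

# Naive summaries average over switched labels; sort within each draw first.
draws_sorted = np.sort(draws, axis=1)
post_mean = draws_sorted.mean(axis=0)
ci = np.quantile(draws_sorted, [0.025, 0.975], axis=0)  # 2.5% / 97.5% rows
```

Without the sort, both component means collapse toward 0.5; after sorting, the summaries recover the two distinct weights.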
Standard deviations and credible intervals are somewhat larger for the BB and MixLink models than for RCB and Binomial. Figure 4 displays quantile residuals for the Binomial, BB, and MixLinkJ2 models. Residuals from BB and MixLinkJ2 are markedly closer to a N(0, 1) sample than the Binomial residuals, as can be seen from the Q-Q plots. For all models, there is a systematic pattern in residuals vs. predicted proportions, an indication that the mean is not fully explained by regression function (6.1). Finally, Figure 5 plots x_i against observed y_i/m_i, along with 95% prediction intervals for Binomial, BB, and MixLinkJ2. The intervals computed by MixLinkJ2, and to a lesser extent BB, translate the extra variability in the observed data into wider prediction intervals.
[Figure 4: Quantile residuals for Hiroshima models. Panels (a)-(c): residuals vs. predicted proportion for Binomial, BB, and MixLinkJ2; panels (d)-(f): Normal Q-Q plots for the same three models.]

[Figure 5: Observed proportions y_i/m_i vs. x_i for Hiroshima data are plotted as open circles. Smaller solid dots represent 95% prediction intervals (upper and lower curves) and predictions (middle curve) from the respective model: (a) Binomial, (b) BB, (c) MixLinkJ2.]

Table 2: Posterior summaries for Hiroshima models.
[Table 2 entries: means, SDs, and 2.5%/97.5% quantiles of the parameters under the Binomial, RCB, BB, and MixLinkJ2 fits.]
[Figure 6: Trace plots for MixLinkJ2 fit to Hiroshima data: Beta0, Beta1, Beta2, Pi1, Pi2, and kappa vs. iteration.]

6.2 Arizona Medpar Data
The azpro data in the COUNT R package are taken from Arizona cardiovascular patient files in 1991. The dataset contains 3,589 observations on subjects from 17 hospitals. The outcome of interest, length of hospital stay y, is a count. Several indicator variables are available as covariates: procedure takes values 1 for Coronary Artery Bypass Graft and 0 for Percutaneous Transluminal Coronary Angioplasty, sex is 1 for male and 0 for female, type of admission admit is 1 if emergency and 0 if elective, age75 is 1 if the patient's age is at least 75 and 0 otherwise, and hospital is a code to identify the hospital. For this example, we consider only the 376 observations with hospital = 6, and take the regression function to be
\[
\mathrm{E}(y_i) = \exp\{\beta_0 + \beta_1 \cdot \texttt{procedure}_i + \beta_2 \cdot \texttt{sex}_i + \beta_3 \cdot \texttt{admit}_i + \beta_4 \cdot \texttt{age75}_i\}.
\]
We compare count regression models based on Poisson, NegBin, and Mixture Link with J = 2, ..., 8. Draws of θ were made in a partitioned manner to improve mixing of the chain: a proposal for either β, π, or κ was drawn at a time, keeping the other parameters fixed, and either accepted or rejected. In some cases where J > 2, the components of π were also drawn individually to further improve mixing. We assessed mixing primarily through trace plots and autocorrelation plots of the saved draws. For all models, the multivariate Normal proposal distribution was tuned by hand to achieve acceptance rates between about 15% and 30%. MCMC was carried out for 55,000 iterations; the first 5,000 were discarded as a burn-in sample, and 1 of every 20 remaining draws from the chain was saved.

Table 3 compares DIC across all fitted models. Because Poisson is a special case of NegBin, it is not surprising that the DIC of NegBin indicates a superior fit. It is interesting that the DIC of MixLink appears to improve gradually as the number of mixture components J is increased. Taking J > 8 resulted in poor diagnostics, so those results are not shown. Figure 9 displays the trace plots for MixLinkJ8, which was selected among the seven Mixture Link models for further analysis.

We proceed by comparing the Poisson, NegBin, and MixLinkJ8 models. Table 4 reports means, standard deviations, 2.5% quantiles, and 97.5% quantiles of each parameter computed from the posterior draws. Generally, the signs and magnitudes of the means of β are similar. The standard deviations of β are smallest for Poisson and largest for NegBin; the credible intervals based on the quantiles are correspondingly narrowest for Poisson and widest for NegBin. For MixLinkJ8, κ takes on rather large values, which effectively reduces Var(Y_i) over i = 1, ..., n.

Figure 7 plots quantile residuals against predictions and also displays Q-Q plots to assess Normality. The predictions have been computed by taking means of draws from the posterior predictive distribution. Note that there are only 16 distinct values of the covariate x, and observations with a common covariate value are likely to obtain similar predictions. The residuals produced by MixLinkJ8 exhibit the best behavior of the three models, with the least departure from standard Normality.
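The predictions and interval limits used in these plots follow the posterior predictive recipe from the start of Section 6. A minimal sketch for a Poisson count regression, with hypothetical Normal "posterior draws" of β standing in for the saved MCMC output:

```python
import numpy as np

rng = np.random.default_rng(2)
R = 2000
x = np.array([0.0, 1.0, 2.0])    # a few covariate values (hypothetical)

# Stand-in posterior draws of (beta0, beta1); in practice these come from MCMC.
beta = rng.normal(loc=[1.0, 0.5], scale=0.05, size=(R, 2))

mu = np.exp(beta[:, [0]] + beta[:, [1]] * x)   # R x 3 matrix of Poisson means
y_rep = rng.poisson(mu)                        # posterior predictive draws

pred = y_rep.mean(axis=0)                              # point predictions
lower, upper = np.quantile(y_rep, [0.025, 0.975], axis=0)  # 95% limits
```

Each column of `y_rep` is a posterior predictive sample for one covariate value; its mean gives the prediction and its empirical 2.5%/97.5% quantiles give the prediction interval, exactly as described for the figures above.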
There is still a pattern where smaller predictions tend to have more variable residuals, which indicates that further refinement of the regression function may be needed.

Finally, Figure 8 displays boxplots of y for each of the 16 possible covariate values, with 95% prediction intervals from both the Poisson and MixLinkJ8 models. These intervals were computed from the 2.5% and 97.5% quantiles of the posterior predictive distribution. Intervals for the NegBin model are not shown because the upper limits are far above the range of the plots in all cases. In some cases, the Poisson intervals appear to be too narrow to capture the observed variability of the data, while MixLinkJ8 widens the intervals to reflect the variability.

Regression on the mean is commonly carried out with exponential family distributions in the Generalized Linear Model framework, but extending this idea to finite mixture distributions is not completely straightforward. This paper formulated the Mixture Link distribution, which establishes a link from a finite mixture

Table 3: DIC for Arizona Medpar models.
Model      DIC
Poisson    2392.62
NegBin     2125.11
MixLinkJ2  2095.07
MixLinkJ3  2096.85
MixLinkJ4  2065.76
MixLinkJ5  2061.04
MixLinkJ6  2062.23
MixLinkJ7  2059.73
MixLinkJ8  2059.39
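DIC values such as these are computed from the saved posterior draws. In its common form, DIC = D̄ + p_D, where D(θ) = −2 log f(y | θ) is the deviance, D̄ is its posterior mean, and p_D = D̄ − D(θ̄) with θ̄ a plug-in such as the posterior mean; the paper does not spell out its plug-in choice, so the following sketch uses the posterior mean and a simple Poisson likelihood with simulated data:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
y = rng.poisson(4.0, size=100)           # observed counts (simulated)
lam = rng.normal(4.0, 0.2, size=500)     # stand-in posterior draws of the mean

def deviance(l):
    # D(theta) = -2 log f(y | theta) for an iid Poisson(l) likelihood
    return -2.0 * poisson.logpmf(y, l).sum()

dev = np.array([deviance(l) for l in lam])
dev_bar = dev.mean()                      # posterior mean deviance, D-bar
p_d = dev_bar - deviance(lam.mean())      # effective number of parameters
dic = dev_bar + p_d
```

Smaller DIC indicates a better trade-off between fit (D̄) and effective complexity (p_D), which is how the models in Tables 1 and 3 are ranked.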
Table 4: Posterior summaries for Arizona Medpar models.
Poisson     mean     SD      2.5%     97.5%
intercept   1.4947   0.0541  1.3885   1.6012
procedure   0.8447   0.0369  0.7713   0.9161
sex        -0.0292   0.0370 -0.1024   0.0429
admit       0.2813   0.0469  0.1896   0.3749
age75       0.0366   0.0388 -0.0402   0.1092

NegBin      mean     SD      2.5%     97.5%
intercept   1.4972   0.0861  1.3323   1.6698
procedure   0.8492   0.0593  0.7333   0.9634
sex        -0.0422   0.0626 -0.1651   0.0781
admit       0.2889   0.0750  0.1391   0.4366
age75       0.0335   0.0649 -0.0960   0.1628

[Entries for the MixLinkJ8 parameters (β, π_1-π_8, κ) are not recoverable here.]
[Figure 7: Quantile residuals for Arizona Medpar data. Panels (a)-(c): residuals vs. predicted count for Poisson, NegBin, and MixLinkJ8; panels (d)-(f): Normal Q-Q plots for the same three models.]

[Figure 8: Boxplots of observed y_i for each of the 16 possible covariate values in the Arizona Medpar data. Covariate values are displayed as a string representing (procedure, sex, admit, age75); for example, "1010" represents procedure = admit = 1 and sex = age75 = 0. Red dash-dot lines represent 95% prediction limits from Poisson and blue dashed lines are from MixLink.]
[Figure 9 panels: traces over 2500 MCMC iterations for the intercept, procedure, sex, admit, and age75 coefficients, the mixture weights Pi1-Pi8, and kappa.]
Figure 9: Trace plots for MixLinkJ8 model fit to Arizona Medpar dataset.

Mixture Link ties the mixture mean to the regression function by assuming a random effects structure on the constrained parameter space. Specific variants of Mixture Link were obtained for Binomial, Poisson, and Normal outcomes. Integrals in the general Binomial case appeared not to have a tractable form, but the Normal case could be integrated to yield another (constrained) Normal finite mixture, and integrals in the Poisson case were evaluated using the confluent hypergeometric function. Some interesting connections were noted, for example, between Mixture Link Binomial and the Random-Clumped Binomial and Beta-Binomial distributions. Example regression analyses using Mixture Link Binomial and Poisson models demonstrated their utility in handling overdispersion. Simpler models could adequately estimate the regression, yet failed to capture the variability seen in the data. This became especially apparent in portions of the analysis that depend heavily on the model, such as diagnosing model fit with quantile residuals or computing prediction intervals from the posterior predictive distribution. Because Mixture Link is completely likelihood-based, such procedures are readily available; this can be seen as an advantage over quasi-likelihood methods when a flexible mean-variance relationship is needed. R code for the Mixture Link model is provided in the mixlink package on CRAN (http://cran.r-project.org).

The Mixture Link approach leads to a novel class of distributions with an interesting set of challenges for practical use in data analysis. Initial results in Raim (2014), Raim et al. (2015), and the present paper appear promising, especially under Bayesian inference, but more work is needed to determine the suitability of Mixture Link for wider application. In particular, it may be worthwhile to investigate analytical properties of Mixture Link models, such as differentiability, especially in the Binomial case. Such properties may be needed to establish appropriate methods for maximum likelihood estimation, large-sample properties of maximum likelihood estimates, and approximation of the posterior distribution by a Normal distribution.
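The quantile residual diagnostics used above follow the randomized quantile residual construction of Dunn and Smyth (1996). A minimal Python sketch of the idea is below; the function and variable names are ours for illustration and are not part of the mixlink package, whose implementation is in R/C++.

```python
import numpy as np
from scipy import stats

def randomized_quantile_residuals(y, cdf, seed=None):
    """For a discrete outcome y with fitted model CDF F, draw u_i
    uniformly between F(y_i - 1) and F(y_i); under a well-fitting
    model, Phi^{-1}(u_i) is approximately N(0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(cdf(y - 1), cdf(y))  # F(y - 1) is 0 below the support
    return stats.norm.ppf(u)

# Check against the true model: Poisson counts with mean 5
rng = np.random.default_rng(1)
y = rng.poisson(5.0, size=2000)
r = randomized_quantile_residuals(y, lambda k: stats.poisson.cdf(k, 5.0), seed=2)
```

Plotting such residuals against fitted values, or in a Normal Q-Q plot as in Figure 7, should show no systematic structure when the model captures the variability in the data.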
Acknowledgements
We thank Professors Thomas Mathew, Yi Huang, and Yaakov Malinovsky at the University of Maryland, Baltimore County (UMBC) for serving on the committee of the dissertation in which this work was initiated. We thank the UMBC High Performance Computing Facility for use of its computational resources, and for financial support of the first author through a multiple-year graduate assistantship.
A Appendix: MCMC for Binomial Mixture Link
An MCMC algorithm based on model (3.8) can be formulated with $\psi_{ij}$ as augmented data. This approach avoids the expensive numerical integration needed to compute the likelihood. The joint distribution of all random quantities is
$$ f(\mathbf{y}, \boldsymbol{\psi}, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) = \left\{ \prod_{i=1}^n Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) \right\} f(\boldsymbol{\beta}) \, f(\boldsymbol{\pi}) \, f(\kappa), $$
where
$$ Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) = \sum_{j=1}^J \pi_j \, \text{Bin}(y_i \mid m_i, H_{ij}(\psi_{ij})) \, \text{B}(\psi_{ij} \mid a_{ij}, b_{ij}), $$
and $H_{ij}(x) = (u_{ij} - \ell_{ij}) x + \ell_{ij}$. Gibbs steps to sample $\boldsymbol{\beta}$, $\boldsymbol{\pi}$, $\kappa$, and $\boldsymbol{\Psi} = \{\boldsymbol{\psi}_i : i = 1, \ldots, n\}$ will not yield closed forms. Instead, we use simple random-walk Metropolis-Hastings (Robert and Casella, 2010, Section 7.5) to propose draws for each random quantity.

To obtain draws of the constrained parameters $\boldsymbol{\pi}$, $\kappa$, and $\boldsymbol{\Psi}$, we draw unconstrained random variables from the sampler and transform them to the constrained space. Generally, denote $\boldsymbol{\xi}$ as one of the constrained parameters, whose full conditional density is $f(\boldsymbol{\xi} \mid \text{Rest})$, and let $h$ be a bijection from the space of $\boldsymbol{\xi}$ to $\mathbb{R}^k$. The density of $\boldsymbol{\phi} = h(\boldsymbol{\xi})$ is then $f(h^{-1}(\boldsymbol{\phi}) \mid \text{Rest}) \, |\det J(\boldsymbol{\phi})|$, where $J(\boldsymbol{\phi}) = \partial \boldsymbol{\xi} / \partial \boldsymbol{\phi}$. Starting from a given $\boldsymbol{\phi} = h(\boldsymbol{\xi})$, a proposed $\boldsymbol{\phi}^*$ is accepted with probability
$$ \min\left\{ 1, \; \frac{f(h^{-1}(\boldsymbol{\phi}^*) \mid \text{Rest}) \cdot |\det J(\boldsymbol{\phi}^*)|}{f(h^{-1}(\boldsymbol{\phi}) \mid \text{Rest}) \cdot |\det J(\boldsymbol{\phi})|} \right\}. $$

Note that the function $Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa)$ must be evaluated at each step. By computing $Q$ in C/C++, performance can be improved greatly over a pure R (R Core Team, 2015) implementation of the sampler. The Rcpp package by Eddelbuettel and Francois (2011), for example, greatly facilitates a hybrid implementation of R and C++. (The mixlink package currently provides Mixture Link Binomial and Poisson distributions and MCMC samplers; functions to compute maximum likelihood estimates using numerical optimization are also implemented.)

Gibbs step for β.
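The β update is plain random-walk Metropolis, since β is unconstrained and needs no Jacobian correction. A minimal Python sketch with a toy stand-in log target is below (the names `rwmh_step` and `log_q` are ours for illustration; the paper's implementation uses R and C++):

```python
import numpy as np

def rwmh_step(beta, log_q, v_prop, rng):
    """One random-walk Metropolis step: propose from N(beta, v_prop)
    and accept with probability min{1, q(beta*)/q(beta)}."""
    L = np.linalg.cholesky(v_prop)
    proposal = beta + L @ rng.standard_normal(beta.size)
    if np.log(rng.uniform()) < log_q(proposal) - log_q(beta):
        return proposal
    return beta

# Toy stand-in for log q(beta | Rest): a bivariate standard Normal
log_q = lambda b: -0.5 * (b @ b)
rng = np.random.default_rng(0)
beta = np.zeros(2)
draws = np.empty((5000, 2))
for r in range(5000):
    beta = rwmh_step(beta, log_q, 0.8 * np.eye(2), rng)
    draws[r] = beta
```

In the actual sampler, `log_q` would evaluate $\sum_i \log Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) + \log f(\boldsymbol{\beta})$; comparing on the log scale avoids underflow in the product over observations.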
Consider the unnormalized density
$$ q(\boldsymbol{\beta} \mid \text{Rest}) = \left\{ \prod_{i=1}^n Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) \right\} f(\boldsymbol{\beta}). $$
Suppose $\boldsymbol{\beta}^{(r)}$ is the current iterate of $\boldsymbol{\beta}$ in the simulation, and draw $\boldsymbol{\beta}^*$ from the proposal distribution $N(\boldsymbol{\beta}^{(r)}, V^{\text{prop}}_{\beta})$. Draw $U \sim \mathcal{U}(0, 1)$ and take
$$ \boldsymbol{\beta}^{(r+1)} = \begin{cases} \boldsymbol{\beta}^* & \text{if } U < \dfrac{q(\boldsymbol{\beta}^* \mid \text{Rest})}{q(\boldsymbol{\beta}^{(r)} \mid \text{Rest})}, \\ \boldsymbol{\beta}^{(r)} & \text{otherwise}. \end{cases} $$

Gibbs step for π. Consider the unnormalized density
$$ q(\boldsymbol{\pi} \mid \text{Rest}) = \left\{ \prod_{i=1}^n Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) \right\} f(\boldsymbol{\pi}). $$
Suppose $\boldsymbol{\pi}^{(r)}$ is the current iterate of $\boldsymbol{\pi}$ in the simulation. Denote $\mathcal{S}_J$ as the probability simplex in dimension $J$ with typical element $\mathbf{p} = (p_1, \ldots, p_J)$. Note that the multinomial logit function $h(\mathbf{p}) = (\log(p_1/p_J), \ldots, \log(p_{J-1}/p_J))$ is a bijection from $\mathcal{S}_J$ to $\mathbb{R}^{J-1}$. Therefore, we can draw $\boldsymbol{\phi}^*$ from the proposal distribution $N(h(\boldsymbol{\pi}^{(r)}), V^{\text{prop}}_{\pi})$ on $\mathbb{R}^{J-1}$ and let $\boldsymbol{\pi}^* = h^{-1}(\boldsymbol{\phi}^*)$ be the candidate for the next iterate. Denote $J(\boldsymbol{\phi}) = \partial \boldsymbol{\pi} / \partial \boldsymbol{\phi}$ as the $J \times (J-1)$ Jacobian of the transformation from $\boldsymbol{\phi}$ to $\boldsymbol{\pi}$, and let $\det J(\boldsymbol{\phi})$ be the determinant ignoring the $J$th row. Draw $U \sim \mathcal{U}(0, 1)$ and take
$$ \boldsymbol{\pi}^{(r+1)} = \begin{cases} \boldsymbol{\pi}^* & \text{if } U < \dfrac{q(\boldsymbol{\pi}^* \mid \text{Rest})}{q(\boldsymbol{\pi}^{(r)} \mid \text{Rest})} \cdot \dfrac{|\det J(\boldsymbol{\phi}^*)|}{|\det J(\boldsymbol{\phi}^{(r)})|}, \\ \boldsymbol{\pi}^{(r)} & \text{otherwise}. \end{cases} $$

Gibbs step for κ. Consider the unnormalized density
$$ q(\kappa \mid \text{Rest}) = \left\{ \prod_{i=1}^n Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa) \right\} f(\kappa). $$
Suppose $\kappa^{(r)}$ is the current iterate of $\kappa$ in the simulation. Draw $\phi^*$ from the proposal distribution $N(\log(\kappa^{(r)}), V^{\text{prop}}_{\kappa})$ and let $\kappa^* = \exp(\phi^*)$ be the candidate for the next iterate. The Jacobian of the transformation from $\phi$ to $\kappa$ is $\partial \kappa / \partial \phi = \exp(\phi)$. Draw $U \sim \mathcal{U}(0, 1)$ and take
$$ \kappa^{(r+1)} = \begin{cases} \kappa^* & \text{if } U < \dfrac{q(\kappa^* \mid \text{Rest})}{q(\kappa^{(r)} \mid \text{Rest})} \cdot \dfrac{\exp(\phi^*)}{\exp(\phi^{(r)})}, \\ \kappa^{(r)} & \text{otherwise}. \end{cases} $$

Gibbs step for ψ. Consider the unnormalized density
$$ q(\boldsymbol{\psi} \mid \text{Rest}) = \prod_{i=1}^n Q(y_i, \boldsymbol{\psi}_i, \boldsymbol{\beta}, \boldsymbol{\pi}, \kappa). $$
The $\boldsymbol{\psi}_i$ are independent conditional on the remaining random variables, so we may draw them one at a time. Suppose $\boldsymbol{\psi}^{(r)}_i$ is the current iterate of $\boldsymbol{\psi}_i$ in the simulation. Let $G$ be the CDF of the logistic distribution, which is a bijection from $\mathbb{R}$ to the unit interval. Denote $\boldsymbol{\phi}^{(r)} = (G^{-1}(\psi^{(r)}_{i1}), \ldots, G^{-1}(\psi^{(r)}_{iJ}))$. The Jacobian of the transformation from $\boldsymbol{\phi}$ to $\boldsymbol{\psi}_i$ is
$$ \frac{\partial \boldsymbol{\psi}_i}{\partial \boldsymbol{\phi}} = \text{Diag}(G'(\phi_1), \ldots, G'(\phi_J)) \implies \det\left( \frac{\partial \boldsymbol{\psi}_i}{\partial \boldsymbol{\phi}} \right) = \prod_{j=1}^J G'(\phi_j), $$
where $G'$ represents the logistic density. Draw $\boldsymbol{\phi}^*$ from the proposal distribution $N(\boldsymbol{\phi}^{(r)}, V^{\text{prop}}_{\phi})$ and let $\boldsymbol{\psi}^*_i = (G(\phi^*_1), \ldots, G(\phi^*_J))$ be the candidate for the next iterate. Draw $U \sim \mathcal{U}(0, 1)$ and take
$$ \boldsymbol{\psi}^{(r+1)}_i = \begin{cases} \boldsymbol{\psi}^*_i & \text{if } U < \dfrac{q(\boldsymbol{\psi}^*_i \mid \text{Rest})}{q(\boldsymbol{\psi}^{(r)}_i \mid \text{Rest})} \cdot \dfrac{\prod_{j=1}^J G'(\phi^*_j)}{\prod_{j=1}^J G'(\phi^{(r)}_j)}, \\ \boldsymbol{\psi}^{(r)}_i & \text{otherwise}. \end{cases} $$

References
Alan Agresti. Categorical Data Analysis. Wiley-Interscience, 2nd edition, 2002.

Akio A. Awa, Takeo Honda, Toshio Sofuni, Shotaro Neriishi, Michihiro C. Yoshida, and Takashi Matsui. Chromosome-aberration frequency in cultured blood-cells in relation to radiation dose of A-bomb survivors. The Lancet, 298(7730):903–905, 1971.

Sanjib Basu and Saurabh Mukhopadhyay. Binary response regression with normal scale mixture links. In Dipak K. Dey, Sujit K. Ghosh, and Bani K. Mallick, editors, Generalized Linear Models: A Bayesian Perspective, pages 231–242. CRC Press, 2000.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Michelle R. Danaher, Anindya Roy, Zhen Chen, Sunni L. Mumford, and Enrique F. Schisterman. Minkowski-Weyl priors for models with parameter constraints: An analysis of the BioCycle study. Journal of the American Statistical Association, 107(500):1395–1409, 2012.

Dipak K. Dey and Nalini Ravishanker. Bayesian approaches for overdispersion in generalized linear models. In Dipak K. Dey, Sujit K. Ghosh, and Bani K. Mallick, editors, Generalized Linear Models: A Bayesian Perspective, pages 73–88. CRC Press, 2000.

Dipak K. Dey, Sujit K. Ghosh, and Bani K. Mallick. Generalized Linear Models: A Bayesian Perspective. CRC Press, 2000.

Peter K. Dunn and Gordon K. Smyth. Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3):236–244, 1996.

Dirk Eddelbuettel and Romain Francois. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(1):1–18, 2011.

Sylvia Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer, 2006.

Daniel B. Hall. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics, 56(4):1030–1039, 2000.

James W. Hardin and Joseph M. Hilbe. Generalized Estimating Equations. Chapman and Hall/CRC, 2nd edition, 2012.

Joseph M. Hilbe. Negative Binomial Regression. Cambridge University Press, 2nd edition, 2011.

A. Jasra, C. C. Holmes, and D. A. Stephens. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20(1):50–67, 2005.

Norman L. Johnson, Samuel Kotz, and Adrienne W. Kemp. Univariate Discrete Distributions. Wiley-Interscience, 3rd edition, 2005.

Nadja Klein, Thomas Kneib, and Stefan Lang. Bayesian generalized additive models for location, scale, and shape for zero-inflated and overdispersed count data. Journal of the American Statistical Association, 110(509):405–419, 2015.

Chuanhai Liu and Donald B. Rubin. ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5:19–39, 1995.

P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall/CRC, 2nd edition, 1989.

Charles E. McCulloch, Shayle R. Searle, and John M. Neuhaus. Generalized, Linear, and Mixed Models. Wiley-Interscience, 2nd edition, 2008.

Jorge G. Morel and Neerchal K. Nagaraj. A finite mixture distribution for modelling multinomial extra variation. Biometrika, 80(2):363–371, 1993.

Jorge G. Morel and Nagaraj K. Neerchal. Overdispersion Models in SAS. SAS Institute, 2012.

Masanori Otake and Ross L. Prentice. The analysis of chromosomally aberrant cells based on beta-binomial distribution. Radiation Research, 98(3):456–470, 1984.

Serge B. Provost and Young-Ho Cheong. On the distribution of linear combinations of the components of a Dirichlet random vector. Canadian Journal of Statistics, 28(2):417–425, 2000.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015.

Andrew M. Raim. Computational methods in finite mixtures using approximate information and regression linked to the mixture mean. Ph.D. Thesis, Department of Mathematics and Statistics, University of Maryland, Baltimore County, 2014.

Andrew M. Raim, Marissa N. Gargano, Nagaraj K. Neerchal, and Jorge G. Morel. Bayesian analysis of overdispersed binomial data using mixture link regression. In JSM Proceedings, Statistical Computing Section, pages 2794–2808. Alexandria, VA: American Statistical Association, 2015.

Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2010.

T. Sofuni, T. Honda, M. Itoh, S. Neriishi, and M. Otake. Relationship between the radiation dose and chromosome aberrations in atomic bomb survivors of Hiroshima and Nagasaki. Journal of Radiation Research, 19(2):126–140, 1978.

Martin A. Tanner and Wing Hung Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528–540, 1987.

R. W. M. Wedderburn. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method.