[PDF] Modeling Sums of Exchangeable Binary Variables

Abstract

We introduce a new model for sums of exchangeable binary random variables. The proposed distribution is an approximation to the exact distributional form, and relies on the theory of completely monotone functions and the Laplace transform of a gamma distribution function. Using Monte Carlo methods, we show that this new model compares favorably to the beta binomial model with respect to estimating the success probability of the Bernoulli trials and the correlation between any two variables in the exchangeable set. We apply the new methodology to two classic data sets and the results are summarized.

Full PDF

AARTICLE TEMPLATE

Modeling Sums of Exchangeable Binary Variables

Ryan Elmore

University of DenverDaniels College of BusinessDepartment of Business Information and Analytics2101 S. University Blvd, Suite 580Denver, CO 80210

ARTICLE HISTORY

Compiled July 24, 2020

ABSTRACT

We introduce a new model for sums of exchangeable binary random variables. Theproposed distribution is an approximation to the exact distributional form, andrelies on the theory of completely monotone functions and the Laplace transform ofa gamma distribution function. Using Monte Carlo methods, we show that this newmodel compares favorably to the beta binomial model with respect to estimatingthe success probability of the Bernoulli trials and the correlation between any twovariables in the exchangeable set. We apply the new methodology to two classic datasets and the results are summarized.

KEYWORDS exchangeable binary variables; overdispersion

1. Introduction

Correlated binary outcomes, either by design or through natural conditions, is a com-mon occurrence in many ﬁelds. Examples include measuring a Bernoulli outcome ina repeated measures study, teratological risk assessment, studies of familial diseasesand genetic traits, and group randomization studies, among many others. Kuk (2004)provides a nice introduction to developmental toxicity studies and the statistical issuestherein. We will summarize the details from their paper as its development is closelyrelated to what is presented here.In a standard developmental toxicology study, pregnant laboratory animals are oftenrandomly assigned to receive varying dose levels of a toxic substance during a majorperiod of organogenesis. Their lives are usually terminated before giving birth, their

CONTACT Ryan Elmore. Email:

[email protected] a r X i v : . [ s t a t . M E ] J u l terus is subsequently removed and examined for possible birth defects. For each litterin such a study, there is a sequence of Bernoulli random variables X , X , . . . , X m where X i = 0 ,

1, denoting the absence or presence of the birth defect.It is commonly assumed that members of the same litter will behave more similarlythan nonlittermates and, therefore, one may assume a degree of correlation betweenlittermates. Kuk (2004) notes that litter eﬀect can be accounted for by assuming theintralitter correlation is induced by a random eﬀect that is shared by all fetuses inthe same litter. This random eﬀect accounts for all of the environmental and geneticfactors that littermates share in common. Pang and Kuk (2005) point out that, “failureto account for litter eﬀect and the overdispersion it induces will lead to estimates withoverstated precision.”In earlier work, Williams (1975) states that it is necessary to model variation be-tween fetuses in the same litter and variation between litters receiving the same treat-ment. It is this insight that leads to the development of the beta binomial modelfor application to toxicological experiments involving reproduction and teratogenicity.Williams (1975) essentially assumes that the probability of response varies as a betadistribution between dose groups to model the overdispersion due to litter eﬀect. Itshould be noted that Skellam (1948) was ﬁrst to propose the idea of using the betadistribution to describe variation in the probability parameter of the binomial dis-tribution. For a large portion of the past 40+ years, the beta binomial distributionhas been the gold standard when it comes to modeling clustered binary data. Addi-tional models include a correlated binomial model proposed by Kupper and Haseman(1978), a correlated beta binomial model discussed by Paul (1979) and Pack (1986),an extended beta binomial model introduced by Prentice (1986), and “additive” and“multiplicative” generalizations of the binomial given in Altham (1978).George and Bowman (1995) developed an exact distribution for sums of ex-changeable binary variables. In addition, they propose an approximating model for λ k = P ( X = X = · · · = X k = 1), k = 1 , , . . . , m using λ k = f ( k ; β ) where f isthe completely monotone folded-logistic function. A drawback to this model is thatwith only a single parameter β , this model lacks the ﬂexibility of many two-parametermodels, such as the beta binomial model, when estimating success probability andintra-cluster correlation.Kuk (2004) notes that the shape of the beta binomial probability function is oftenU-shaped, J-shaped, or reverse J-shaped instead of unimodal with the mode near theexpected value of mp . Hence, all of the probability mass could be concentrated at0 and m , and a value near the “expected” value could be very unlikely. Essentiallywhat happens is that the beta binomial, and other existing distributions, tend tounderestimate the risk of at least one littermate having a birth defect. Kuk (2004)introduced the q -power model that is based on the exchangeable theory developedin George and Bowman (1995). The shared response model, introduced in Pang andKuk (2005), can model the data without overestimating the probability of no aﬀected2etuses.In this manuscript, we propose a new (approximate) distribution for handling ex-changeable binary data. Our model is based on the theory of George and Bowman(1995), and is similar to Kuk (2004), Yu and Zelterman (2008), and Bowman (2016)in its development. Background information and the new model are introduced in Sec-tion 2. In Section 3, we present the results of a large-scale Monte Carlo study designedto assess several statistical properties of the proposed model relative to those of ex-isting methods. Two classic examples are analyzed in Section 4 and our concludingremarks are reported in Section 5.

2. Model Development2.1.

Background

Most of the substantial work on sequences of exchangeable binary variables revolvesaround the famous theorem of Bruno de Finetti as stated in Diaconis (1977).

Theorem 2.1 (de Finetti) . Let { Y i } ∞ i =1 be an inﬁnite sequence of random variableswith { Y i } mi =1 exchangeable for each m ; then there is a unique probability measure µ on [0 , such that for each ﬁxed sequence of zeros and ones { e i } mi =1 , we have P [ Y = e , . . . , Y m = e m ] = (cid:90) p s (1 − p ) m − s d µ ( p ) where s = (cid:80) e i . Several comments are in order regarding this theorem. Perhaps most importantis the fact that the unique measure µ exists only if we have an inﬁnite sequenceof exchangeable binary random variables { Y i } ∞ i =1 . This result is known to fail forﬁnite sets, say { Y i } ri =1 , of exchangeable binary variables. Fortunately, two ﬁnite formsof de Finetti’s theorem are developed in Diaconis (1977) and are restated here forcompleteness. Theorem 2.2 (Diaconis (1977)) . Let { Y i } ri =1 be an exchangeable sequence which canbe extended to an exchangeable sequence of length k > r . Then there is a measure µ k on [0 , such that if e , e , . . . , e r is any sequence of zeros and ones and s = (cid:80) ri =1 e i ,then (cid:12)(cid:12)(cid:12)(cid:12) P [ Y = e , . . . , Y r = e r ] − (cid:90) p s (1 − p ) r − s d µ k ( p ) (cid:12)(cid:12)(cid:12)(cid:12) < ck , (1) where c is a constant that does not depend on the sequence e i . Corollary 2.3.

Let { Y i } ∞ i =1 be an inﬁnite sequence of random variables with { Y i } mi =1 xchangeable for each m ; then there is a unique probability measure µ on [0 , suchthat for each ﬁxed sequence of zeros and ones { e i } mi =1 , we have P [ S m = s ] = (cid:18) ms (cid:19) (cid:90) p s (1 − p ) m − s d µ ( p ) where s = (cid:80) e i = 0 , , . . . , m . Results such the previous three are potentially one motivation, but certainly notthe only one, for using the beta binomial distribution to model sums of correlatedbinary variables, see e.g.

Skellam (1948), Griﬃths (1973), Williams (1975), and Pren-tice (1986). The beta binomial model is deﬁned on S m for sums of correlated binaryrandom variables having latent response probability p as P [ S m = s | p ] = (cid:18) ms (cid:19) p s (1 − p ) m − s for s = 0 , , . . . , m, (2)where p ∼ Beta ( α, β ). This implies that, unconditionally, P [ S m = s ] = (cid:90) (cid:18) ms (cid:19) p s (1 − p ) m − s d G ( p )= (cid:0) ms (cid:1) B( α, β ) (cid:90) p α + s − (1 − p ) m + β − s − d p = (cid:0) ms (cid:1) B( α + s, m + β − s )B( α, β ) , for s = 0 , , . . . , m (3)with B( a, b ) = Γ( a )Γ( b )Γ( a + b ) . The critical, yet non-veriﬁable assumptions in using such amodel as the distribution of ﬁnite sums of exchangeable binary variables are that:(i) the set of exchangeable variables can be embedded into an inﬁnite sequence ofexchangeable binary random variables, and (ii) the unique measure given in Theorem2.1, µ , is a beta distribution function.Prentice (1986) introduces an alternative form of the beta binomial distributionby deﬁning µ = α ( α + β ) − and γ = ρ (1 − ρ ) − . The mass function under thisparameterization is deﬁned by P [ S m = s ; µ, γ ] = (cid:18) ms (cid:19) s − (cid:89) a =0 ( µ + γa ) m − s − (cid:89) a =0 (1 − µ + γa ) m − (cid:89) a =0 (1 + γa ) , (4) s = 0 , , . . . , m , for 0 < µ < γ >

0. Using this form, ρ = γ (1 + γ ) − and small4egative correlations are permissible. In fact, one can show that ρ ≥ max {− µ ( m − µ − − , − ¯ µ ( m − ¯ µ − − } , or equivalently, γ ≥ max {− µ ( m − − , ¯ µ ( m − − } , where ¯ µ = 1 − µ . Note that usual binomial variation corresponds to γ = 0, whereas,binomial variation corresponds to inﬁnite parameter values under the original param-eterization.We will now shift our attention to ﬁnite sums of exchangeable binary random vari-ables. Modeling ﬁnite sums of exchangeable binary random variables is explored indetail in the papers of George and Bowman (1995), Bowman and George (1995), andGeorge and Kodell (1996), among other papers outlined in Section 1. In George andBowman (1995), an exact distribution for the sum of exchangeable binary randomvariables is derived and their development is summarized next.Let Y = ( Y , Y , . . . , Y m ) T denote a vector of exchangeable binary random variables.By exchangeable, we mean that( Y , Y , . . . , Y k ) T D = ( Y π (1) , Y π (2) , . . . , Y π ( k ) ) T for any permutation π of the integers { , , . . . , k } . We are interested in making infer-ences on the quantity S m = (cid:80) mj =1 Y j for k ≤ m . From a straightforward application ofthe Inclusion-Exclusion principle in probability, the exact distribution of these sumscan be shown to be P [ S m = s ] = (cid:18) ms (cid:19) m − s (cid:88) k =0 ( − k (cid:18) m − sk (cid:19) p s + k , for s = 0 , , . . . , m, (5)where, p j = P [ Y = 1 , Y = 1 , . . . , Y j = 1] , j = 1 , , . . . , m, and1 = p ≥ p ≥ p ≥ · · · ≥ p m . (6)Parameter estimates of p , . . . , p m , and hence any k th -order correlation can be foundin the following way. An observation S m = s is simply an indicator random variablefollowing a multinomial distribution having cell probabilities given by P [ S m = j ] for5 = 0 , , . . . , m . Using the inversion formula p j = m − j (cid:88) k =0 (cid:0) m − jk (cid:1)(cid:0) mk (cid:1) P [ S m = m − k ] , we can ﬁnd the desired estimates. Variance estimates are computed based on thedistributional properties of multinomial probabilities. Rather than estimating eachindividual p j using a saturated approach, it is possible to model these parametersusing a function which preserves the constraints given above, namely that1 = p ≥ p ≥ p ≥ · · · ≥ p m , and m − s (cid:88) k =0 ( − k (cid:18) m − sk (cid:19) p s + k ≥ . (7)George and Bowman (1995) suggest using the folded-logistic function to model thesequences of probabilities in order to approximate the model deﬁned in Equation (5).The folded-logistic function is deﬁned by p x ( β ) = 21 + ( x + 1) β for x ≥ β >

0. Under this parameterization, Equation (5) becomes P [ S m = s ; β ] = (cid:18) ms (cid:19) m − s (cid:88) k =0 ( − k (cid:18) m − sk (cid:19)

21 + ( s + k + 1) β , (8)for s = 0 , , . . . , m . The estimation problem is now reduced to estimating a singleparameter, β , rather than estimating the m individual p ’s.Kuk (2004) introduces two additional distributions based on the theory of com-pletely monotone functions. Note that a function ϕ is completely monotone if it pos-sesses derivatives ϕ ( n ) of all orders and ( − n ϕ ( n ) ( λ ) ≥

0, for λ >

0. In the ﬁrst, Kukmodels the sequence of probabilities 1 = p ≥ . . . ≥ p m by λ k = P ( X = X = . . . = X k = 1) = p k γ where k = 0 , , . . . , m and 0 ≤ p, γ ≤

1. The parameter p is the marginal responseprobability and the parameter γ controls the degree of association between litter-mates. A value of γ = 1 corresponds to independence between littermates while avalue of γ = 0 corresponds to complete dependence between littermates. Under this6arameterization, (5) can be written as P ( S m = s ; p, γ ) = (cid:18) ms (cid:19) m − s (cid:88) k =0 ( − k (cid:18) m − sk (cid:19) p ( s + k ) γ . (9)Kuk refers to this model as the p -power distribution.Kuk mentions that one may also use the same type of power-family model for X (cid:48) = 1 − X . In this case, q = 1 − p = P ( X (cid:48) = 1) = P ( X = 0) and, therefore, λ (cid:48) k = P ( X (cid:48) = X (cid:48) = · · · = X (cid:48) k = 1) = P ( X = X = · · · = X k = 0) = q k γ . This results in the q -power probability distribution given by P ( S m = s ; q, γ ) = P ( S (cid:48) m = m − s | q, γ ) = (cid:18) ms (cid:19) s (cid:88) k =0 ( − k (cid:18) sk (cid:19) q ( m − s + k ) γ (10)where 0 ≤ q, γ ≤

1. Kuk advocates for the use of the q -power distribution over the p -power distribution when modeling overdispersed binary data. Laplace Transform of the Gamma (LapGam) Distribution

Our development relies on the theory presented in Section 2.1, along with the followingtheory on the diﬀerence operator ∆, as given in Feller (1971). The diﬀerence operator ∆is deﬁned on a sequence { c n } to be ∆ c n = c n +1 − c n . If we apply the diﬀerence operatorto the new sequence ∆ c n , we get another sequence ∆ c n = ∆(∆ c n ). Similarly, thehigher-order diﬀerences are deﬁned recursively by ∆ r c n = ∆(∆ r − c n ), where ∆ = ∆.It can be shown that the r th -order diﬀerence can be written as∆ r c n = r (cid:88) k =0 (cid:18) rk (cid:19) ( − r + k c n + k . (11)This leads to the following deﬁnition. Deﬁnition 2.4 (Feller V2) . A sequence { c n } such that ( − r ∆ r c ν ≥ r, ν is called a completely monotone sequence.Applying these results to the sequence { p n } given above, we see that if { p n } iscompletely monotone, then the constraints deﬁned in (7) are satisﬁed. We will modelsuch a sequence using a completely monotone function . Our main result, summarizednext, is a direct application of the following theorem using the gamma distributionfunction. Theorem 2.5 (Feller V2) . A function ϕ on (0 , ∞ ) is the Laplace transform of a robability distribution F, iﬀ it is completely monotone, and ϕ (0) = 1 . Result 2.1.

The function p x deﬁned by p x ( α, β ) = 1[1 + βx ] α (12)for x ≥ β, α > F be the distribution function of a gamma random variable withmean αβ and variance αβ , for parameters α, β >

0. The Laplace transform of F isgiven by ϕ λ ( α, β ) = (cid:90) ∞ e − λx d F ( x ) = 1(1 + βλ ) α . (13)Therefore, p x ( α, β ) is a completely monotone function by Feller’s result given above.Similarly to the ideas presented in George and Bowman (1995) and Kuk (2004), wewill use this function as a model for the sequence 1 = p ≥ p ≥ · · · ≥ p m . Therefore,an approximate distribution of S m under this parameterization is P [ S m = s ; α, β ] = (cid:18) ms (cid:19) m − s (cid:88) k =0 ( − k (cid:18) m − sk (cid:19) β ( s + k )] α (14)for s = 0 , , . . . , m . We will refer to this distribution as the Laplace transform of thegamma distribution, or the LapGam for short. Estimation

Let S m , S m , . . . , S mn be a random sample of sums of exchangeable binary randomvariables deﬁned by S mi = (cid:80) mj =1 Y ij where P [ Y ij = 1] = p . We will assume that S mi follows the distribution given by (5) with p s + k = p s + k ( α, β ) = 1[1 + β ( s + k )] α . (15)8hus, the log-likelihood function for this sample can be written as l n ( α, β ; s ) = log (cid:32) n (cid:89) i =1 P [ S m = s i ; α, β ] (cid:33) = n (cid:88) i =1 log (cid:18) ms i (cid:19) + n (cid:88) i =1 log (cid:32) m − s i (cid:88) k =0 ( − k (cid:18) m − s i k (cid:19) p s i + k ( α, β ) (cid:33) = n (cid:88) i =1 log (cid:18) ms i (cid:19) + n (cid:88) i =1 log (cid:32) m − s i (cid:88) k =0 ( − k (cid:0) m − s i k (cid:1) [1 + β ( s + k )] α (cid:33) . (16)Our interest is in ﬁnding estimators ˆ α and ˆ β , and consequently estimators of theprobability parameters and correlations (of potentially all orders). It is straightforwardto write a Newton-Raphson algorithm to maximize this likelihood, or simply use anoptimization method in R or python to ﬁnd the MLEs.The asymptotic theory of MLEs tells us that √ n (cid:32)(cid:32) ˆ α ˆ β (cid:33) − (cid:32) α β (cid:33)(cid:33) L −→ N ( , I − ( θ )) (17)where θ = ( α , β ) T is the true value of the parameter and I − is the inverse ofthe Fisher-Information matrix. From (17), we can apply the Delta method to ﬁndasymptotic distributions of additional quantities (Lehmann 1999). For example, weknow that ˆ p = 1(1 + ˆ β ) ˆ α by the invariance principle of MLEs and, hence, √ n (cid:16) g ( ˆ α, ˆ β ) − g ( α , β ) (cid:17) = √ n ( ˆ p − p ) L −→ N ( , ˙ g (cid:48) I − ( θ ) ˙ g )using the delta method with g deﬁned by g ( α, β ) = 1(1 + β ) α see Equation (12). Note that ˙ g is the vector of partial derivatives of g evaluated atthe true value θ . Similarly, a function h : R → R can be deﬁned to ﬁnd the jointasymptotic distribution of (ˆ p, ˆ ρ ) (cid:48) . 9 . Monte Carlo Simulation In order to assess the performance of the LapGam model in a controlled environment,we conducted a large-scale Monte Carlo study. Three additional models were chosen asa basis of comparison in this simulation study: the George and Bowman (1995) modelgiven in equation (8), the beta binomial model (4) using the parameterization deﬁnedin Prentice (1986), and the q -power model (10) deﬁned in Kuk (2004). The modelswere evaluated in terms of estimating the binary response probability p and ﬁrst-order correlation ρ for sums of correlated binary variables. We considered 40 diﬀerentscenarios corresponding to p = 0 . , . , . , . , and 0 . ρ = 0 . , . , . , and0 .

20. We varied the number of Bernoulli trials using m = 10 and m = 15. For everyscenario, B = 1000 samples of size 100 were simulated. The data for this simulationstudy were generated using the bindata() package (Leisch, Weingessel, and Hornik1998) available in the R software (R Core Team 2020). For the sake of brevity, we onlydiscuss the results of the simulations when m = 10 in this manuscript. The m = 15scenario is similar to what is shown here. A comprehensive summary of the full set ofsimulation results is available from the author upon request.We ﬁrst discuss the results of estimating p , the success probability. The simulatedsampling distributions of ˆ p can be see in Figure 1. Speciﬁcally, each row ( p ) and column( ρ ) combination corresponds to the parameter values that were used to generate thedata. Each box shows the estimated sampling distributions of ˆ p under the four modelsin question. As can be seen in this ﬁgure, the beta binomial and the LapGam modelsperform almost identically across the ten diﬀerent scenarios presented here. On theother hand, the estimates of p based on the folded-logistic and q -power models showevidence of bias in certain situations. For example, there is noticeable bias in bothwhen the intra-cluster correlation is high (0.2) and the success probability is low (0.1).Figure 2 shows the results when estimating ρ . Each row ( ρ ) and column ( p ) combi-nation indicates the parameter values that were used to generate the data and showsthe estimated sampling distributions of ˆ ρ under the four models in question. Simi-lar to the story told above, the beta binomial and LapGam models tend to performwell at estimating ρ in each scenario, whereas the other two show some bias. Thefolded-logistic model, in particular, does a bad job at estimating ρ when the successprobability is 0.5 and the individual trials are weakly correlated, ρ = 0 .

4. Examples4.1.

Brassica Data

The following example consists of data presented in Skellam (1948) and Altham (1978)on the secondary association of chromosomes in

Brassica , a group of plants belongingto the Mustard family (botany.com). If the probability of association is constant within10 igure 1.

The estimated sampling distributions of ˆ p using the beta binomial, folded-logistic, LapGam, and q -power models. Each row and column correspond to particular values of p and ρ , respectively, that were usedto generate the data in each simulation. The horizontal dashed lines serve as references for the targeted valuesof p to be estimated. igure 2. The estimated sampling distributions of ˆ ρ using the beta binomial, folded-logistic, LapGam, and q -power models. Each row and column correspond to particular values of ρ and p , respectively, that were usedto generate the data in each simulation. The horizontal dashed lines serve as references for the targeted valuesof ρ to be estimated. able 1. The observed and expected counts for the

Brassica data using the binomial, folded-logistic, betabinomial, LapGam, and q -power models. m Observed Binomial FL BB LapGam q -power0 32 24.86 50.59 33.97 33.97 32.431 103 103.24 90.57 97.16 97.20 102.022 122 142.93 104.45 127.67 127.61 122.693 80 65.96 91.39 78.20 78.23 79.86 Table 2.

The parameter estimates of p and ρ using the folded-logistic, beta binomial, LapGam, and q -powermodels. FL BB LapGam q -power p ρ q -power model.The p -values for the chi-square goodness of ﬁt statistics for the beta binomial, LGa,and the q -power models are 0.8594, 0.8621, and 0.9993, respectively. On the otherhand, the usual binomial and the folded logistic ﬁts are rejected according to thechi-square test with p -values equal to 0.0439 and 0.0048, respectively. The probabilityof association for a given bivalent and the correlation among pairs of bivalents aregiven in Table 2. As can be seen in this classic example, it is diﬃcult to distinguishwhich model provides the best summary of these data between the beta binomial,LGa, and the q -power models. It is fairly easy to discount the estimates based on thefolded-logistic function, however. Brazil Data

Our second example consists of data related to a survey of deaths in children froma particularly poor region in northeast Brazil. The raw data were reported in Sastry(1997) and reproduced in Yu and Zelterman (2008). The data consist of 1051 uniquefamilies with a total of 2946 children. The number of children in the families rangedfrom one to eight, with two being the most frequent. The outcome of interest is a binaryvariable indicating childhood mortality. The data are assumed to be exchangeable asthey are overdispersed within families relative to a binomial model.13s discussed in Yu and Zelterman (2008), it appears that the mortality rate andwithin-family correlation diﬀer as a function of family size. They account for thisdiﬀerence using a quadratic function of m , family size, in the logit of µ , mortalityrate, and the log of γ in the beta binomial model speciﬁcation deﬁned in Equation (4).Rather than requiring strict functional forms, we take a semi-parametric approach toﬁtting the beta binomial and LapGam models to the Brazil data. That is, we modelthe logit of µ and the log of γ using cubic splines in m , i.e.logit[ µ ( m )] = s ( m ) (cid:48) η and log[ γ ( m )] = s ( m ) (cid:48) η where s ( m ) is the cubic-spline basis representation of m and η and η are vectors ofparameters to be estimated from the data. Similarly, we handle within-family diﬀer-ences in α and β from Equation (14) using the log, orlog[ α ( m )] = s ( m ) (cid:48) η and log[ β ( m )] = s ( m ) (cid:48) η . Again, η and η are estimable parameter values and s ( m ) is the cubic-spline basisrepresentation of m .The estimated deviances for the semi-parametric ﬁts of the beta binomial andLapGam models are 18.79 and 19.09, respectively. The estimated mortality rates andwithin-family correlations are presented in Figure 3. The estimates associated withthe beta binomial model are given as circles along the solid line and the LapGamestimates are shown using triangles on dashed lines. The estimated probabilities arevirtually identical across the two models whereas there are noticeable diﬀerences inwithin-family correlation. As a comparison to Yu and Zelterman (2008) and for com-pleteness, the estimated deviance using a quadratic function of m in α and β is 19.16,suggesting a slightly better ﬁt when using the semi-parametric model.

5. Discussion

In this paper, we introduced a new model, LapGam, for approximating the distribu-tion of sums of exchangeable binary variables. The model is developed using a novelapplication of completely monotone functions and the diﬀerence operator to the exact14 igure 3.

The left panel shows the estimated mortality rates as functions of m for the beta binomial model(circles, solid line) and the LapGam model (triangles, dashed). The right panel shows the estimated within-family correlation for each family size ranging from two to eight for the beta binomial model (circles, solid line)and the LapGam model (triangles, dashed). distribution as developed in George and Bowman (1995). In addition, we demonstratethe eﬃcacy of maximum likelihood estimation of the LapGam using a large-scale sim-ulation study. Lastly, we demonstrate the use of this model by applying the results totwo classic applications.The simulation study shows that this new LapGam model performs on par with thewell-known beta binomial distribution under a wide variety of simulated conditions.These results provide conﬁdence in our conclusions associated with the two classicexamples that we analyzed in Section 4.As other authors have noted (see Yu and Zelterman (2008) and references therein),the exact distribution, as well as its many approximations, rarely out perform thebeta binomial model when applied to estimating parameters in sums of exchangeablebinary variables. This is the case here as well. However, we did show that our newproposal, the LapGam model, performs as well as the beta binomial model in all ofour simulated scenarios.In closing, we wish to emphasize that the theory and numerical results that arepresented in this manuscript should serve a building block for future research on sumsof exchangeable Bernoulli variables. As an example, there are likely similar results tothose presented in Section 2.1 for additional distribution functions. Hopefully thesefuture studies will come to fruition. References

Altham, P. M. E. 1978. “Two generalizations of the binomial distribution.”

Journal of theRoyal Statistical Society Series C: Applied Statistics

27: 162–167.Bowman, D., and E. O. George. 1995. “A saturated model for analyzing exchangeable binarydata: Applications to clinical and developmental toxicity studies.”

Journal of the AmericanStatistical Association

90: 871–879. owman, Dale. 2016. “Statistical inference for familial disease models assuming exchangeabil-ity.” Statistics & Probability Letters

Synthese

36 (2):271–281.Feller, W. 1971.

An Introduction to Probability Theory and Its Applications . 2nd ed., Vol. II.John Wiley and Sons.George, E. O., and D. Bowman. 1995. “A full likelihood procedure for analysing exchangeablebinary data.”

Biometrics

51: 512–523.George, E. O., and R. L. Kodell. 1996. “Tests of independence, treatment heterogeneity, anddose-related trend with exchangeable binary data.”

Journal of the American StatisticalAssociation

91: 1602–1610.Griﬃths, D. A. 1973. “Maximum Likelihood Estimation for the Beta-binomial Distribution andAn Application to the Household Distribution of the Total Number of Cases of a Disease.”

Biometrics

29: 637–648.Kuk, Anthony Y. C. 2004. “A litter-based approach to risk assessment in developmental tox-icity studies via a power family of completely monotone functions.”

Journal of the RoyalStatistical Society Series C: Applied Statistics

53 (2): 369–386.Kupper, L. L., and J. K. Haseman. 1978. “The use of a correlated binomial model for theanalysis of certain toxicological experiments.”

Biometrics

34: 69–76.Lehmann, E. L. 1999.

Elements of Large Sample Theory . Springer-Verlag.Leisch, F., A. Weingessel, and K. Hornik. 1998.

On the Generation of Correlated ArtiﬁcialBinary Data . Working Paper 13. SFB “Adaptive Information Systems and Modeling inEconomics and Management Science”. .Pack, S.E. 1986. “Hypothesis testing for proportion with overdispersion.”

Biometrics

42: 85–89.Pang, Zhen, and Anthony Y. C. Kuk. 2005. “A Shared Response Model for Cluster BinaryData in Developmental Toxicity Studies.”

Biometrics

61: 1076–1084.Paul, S.R. 1979. “A clumped beta-binomial model for the analysis of clustered attribute data.”

Biometrics

35: 821–824.Prentice, R. L. 1986. “Binary Regression Using An Extended Beta-binomial Distribution,With Discussion of Correlation Induced By Covariate Measurement Errors.”

Journal of theAmerican Statistical Association

81: 321–327.R Core Team. 2020.

R: A Language and Environment for Statistical Computing . Vienna,Austria: R Foundation for Statistical Computing. .Sastry, Narayan. 1997. “A nested frailty model for survival data, with an application to thestudy of child survival in northeast Brazil.”

Journal of the American Statistical Association

92 (438): 426–435.Skellam, J.G. 1948. “A probability distribution derived from the binomial distribution byregarding the probability of success as variable between sets of trials.”

Journal of the RoyalStatistical Society Series B: Methodological

10 (2): 257–261.Williams, D. A. 1975. “The Analysis of Binary Responses From Toxicological ExperimentsInvolving Reproduction and Teratogenicity.”

Biometrics

31: 949–952.Yu, Chang, and Daniel Zelterman. 2008. “Sums of exchangeable Bernoulli random variables or family and litter frequency data.” Computational Statistics & Data Analysis

52 (3):1636–1649.52 (3):1636–1649.