On the Existence of Conditional Maximum Likelihood Estimates of the Binary Logit Model with Fixed Effects
Martin Mugnier ∗ †

Abstract
By exploiting the results of McFadden (1974) on conditional logit estimation, we show that the existence and uniqueness of conditional maximum likelihood estimates of the binary logit model with fixed effects are fully determined by the configuration of the data points. Our results extend those in Albert and Anderson (1984) for the cross-sectional case and can be used to build a simple algorithm that detects spurious estimates in finite samples. As an illustration, we exhibit an artificial dataset for which Stata's clogit command returns spurious estimates.
Keywords: separation, collinearity, binary choice models, fixed effects.
JEL Codes:
C13, C25.

∗ I would like to thank Xavier D'Haultfœuille and Louis-Daniel Pape for their comments. All remaining errors are mine.
† CREST, [email protected].

Introduction
Suppose there are $i = 1, \dots, n$ individuals who are observed making a choice from $j \in \{0, 1\}$ alternatives over $t = 1, \dots, T$ choice situations. Individual $i$'s sequence of choices is $Y_i := (Y_{i1}, \dots, Y_{iT})$, with elements $Y_{it} = \mathbb{1}\{i \text{ chooses } 1 \text{ in } t\}$ that indicate $i$'s choice in choice situation $t$. Suppose, in addition, that we observe individual-specific covariates $X_{it} \in \mathbb{R}^p$ for individual $i$ in choice situation $t$. Individual $i$'s sequence of covariates is then $X_i := (X_{i1}, \dots, X_{iT})$. Let $Z_n := (Y_i, X_i)_{1 \le i \le n}$ denote the sample data. Provided it exists and is unique, the conditional maximum likelihood estimator, which we denote by $\widehat{\beta}^{\mathrm{CMLE}}_n$, satisfies
$$\widehat{\beta}^{\mathrm{CMLE}}_n := \arg\max_{\beta \in \mathbb{R}^p} \log L(\beta; Z_n), \tag{1.1}$$
where
$$\log L(\beta; Z_n) := \sum_{i=1}^N \log \frac{\exp\left(\sum_{t=1}^T Y_{it} X_{it} \beta\right)}{\sum_{d_i \in \{0,1\}^T : \sum_{t=1}^T (d_{it} - Y_{it}) = 0} \exp\left(\sum_{t=1}^T d_{it} X_{it} \beta\right)}.$$
As Andersen (1970) showed (see also Rasch, 1961, for an earlier reference), $\widehat{\beta}^{\mathrm{CMLE}}_n$ is consistent for the binary logit model with fixed effects. However, $\widehat{\beta}^{\mathrm{CMLE}}_n$ may fail to exist in finite samples: the maximum in (1.1) may not be unique, or the conditional log-likelihood may not attain a maximum. While Albert and Anderson (1984) established necessary and sufficient conditions for the existence of maximum likelihood estimates in cross-sectional logistic regression models,¹ such conditions are lacking for $\widehat{\beta}^{\mathrm{CMLE}}_n$. Yet they are of interest for at least two reasons. First, practitioners often use off-the-shelf programs with built-in nonlinear solvers that do not systematically detect, flag, or even report situations where estimates do not exist (see, e.g., McCullough and Vinod, 2003). Specifically, we show below with an artificial dataset that Stata's clogit stops after a few iterations and returns spurious results, which further illustrates that the problem has not been widely recognized.²
Second, $\widehat{\beta}^{\mathrm{CMLE}}_n$ does not take advantage of all the variation available in the data: for an individual whose choices never change over time ($\sum_{t=1}^T Y_{it} \in \{0, T\}$), the only sequence $d_i$ in the denominator of (1.1) is $Y_i$ itself, so that individual contributes zero to $\log L(\beta; Z_n)$ whatever the value of $\beta$. This mechanically increases the probability of existence failure, as we shall see. In this paper, we extend the data separation rules established in Albert and Anderson (1984) for cross-sectional logistic models. Our results hold under a rank condition similar to the full-rank assumption that Albert and Anderson (1984, p. 2) impose on the matrix of covariates (see Assumption 1 below). In Section 2, we derive our main existence theorem and provide a simple algorithm to detect existence failures. In Section 3, we exhibit an artificial dataset for which existence fails, yet clogit does not detect the failure and returns spurious estimates.

¹ Albert and Anderson (1984) show that, under a rank condition on the design matrix, maximum likelihood estimates exist if and only if there is no binary linear classifier that perfectly predicts the outcome from the covariates for all data points lying outside the decision frontier.

² Notably, the clogit section of the Stata User's Guide does not discuss the existence problem at all.

Following the terminology employed in McFadden (1974), the sample $Z_n$ results from a choice experiment composed of $n$ distinct trials $(X_i, B_i)$, where $B_i$ is the alternative set defined as
$$B_i := \left\{ d := (d_1, \dots, d_T) \in \{0, 1\}^T : \sum_{t=1}^T (d_t - Y_{it}) = 0 \right\}.$$
The alternative set contains $r_{ni} := \binom{T}{\sum_{t=1}^T Y_{it}}$ alternatives $d^j_i$, indexed by $j = 1, \dots, r_{ni}$, with vectors of attributes $\sum_{t=1}^T d^j_{it} X_{it}$. Note that the number of alternatives, $r_{ni}$, may differ from one individual to another. Let us define $r_n := \sum_{i=1}^N r_{ni}$. Rewriting Axiom 5 in McFadden (1974, p. 116) to fit our framework, we make the following assumption.

Assumption 1
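For all $\beta \in \mathbb{R}^p$, the $r_n \times p$ matrix whose rows are
$$\sum_{t=1}^T d^j_{it} X_{it} - \sum_{s=1}^{r_{ni}} \left( \sum_{t=1}^T d^s_{it} X_{it} \right) \frac{\exp\left(\sum_{t=1}^T d^s_{it} X_{it} \beta\right)}{\sum_{\ell=1}^{r_{ni}} \exp\left(\sum_{t=1}^T d^\ell_{it} X_{it} \beta\right)}$$
for $j = 1, \dots, r_{ni}$ and $i = 1, \dots, N$ is of rank $p$.

Assumption 1 holds when the data vary sufficiently across periods. A necessary condition is $r_n \ge p + N$: for each $i$, the $r_{ni}$ rows associated with individual $i$, weighted by the conditional choice probabilities (the exponential ratios in the display above), sum to zero, so they span a space of dimension at most $r_{ni} - 1$. This condition is likely to hold in practice if $p$ is moderate. If $N \ge p$ and every individual's choices vary over time, it holds automatically, since then $r_{ni} \ge 2$ for all $i$ and $r_n \ge 2N \ge p + N$; but it may also hold for $N < p$ if $T$ is large. The following condition is an adaptation of Axiom 6 in McFadden (1974, p. 116).

Assumption 2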
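There does not exist $\beta^* \in \mathbb{R}^p \setminus \{0\}$ satisfying
$$\sum_{t=1}^T (d^j_{it} - Y_{it}) X_{it} \beta^* \ge 0$$
for $j = 1, \dots, r_{ni}$ and $i = 1, \dots, N$.

Assumption 2 is reminiscent of the separation and quasi-complete separation relationships (4) and (6) in Albert and Anderson (1984). However, it is specific to the panel data structure considered here. In words, it rules out any nonzero vector $b = -\beta^*$ such that, for every individual, $\sum_{t=1}^T Y_{it} X_{it} b \ge \sum_{t=1}^T d^j_{it} X_{it} b$ for all $d^j_i \in B_i$, that is, any linear index that weakly ranks each observed sequence first within its alternative set.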
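To make the objects above concrete, here is a minimal Python sketch, assuming only NumPy; all function names are hypothetical and are not the interface of the author's BinLogitCMLE module. It enumerates each alternative set $B_i$, forms the vectors $\sum_{t=1}^T (d^j_{it} - Y_{it}) X_{it}$ that enter Assumption 2, and evaluates (1.1) in the equivalent form $-\sum_i \log \sum_j \exp\big(\sum_t (d^j_{it} - Y_{it}) X_{it} \beta\big)$. Assumption 1 is checked through the rank of the stacked difference vectors; this rests on our own observation that each row in Assumption 1 equals the corresponding difference vector minus a probability-weighted average of that individual's difference vectors, so both matrices share the same null space and the condition does not depend on $\beta$. The toy panel is hypothetical and is not the dataset of Section 3.

```python
import itertools

import numpy as np


def alternative_set(y):
    """Enumerate B_i: all d in {0,1}^T with the same number of ones as y."""
    T, k = len(y), int(sum(y))
    return np.array([[1 if t in ones else 0 for t in range(T)]
                     for ones in itertools.combinations(range(T), k)])


def difference_vectors(y, x):
    """Rows sum_t (d_jt - y_t) x_t for each d_j in B_i; shape (r_ni, p)."""
    y = np.asarray(y)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)  # (T, p)
    return (alternative_set(y) - y) @ x


def cond_loglik(beta, Y, X):
    """Conditional log-likelihood (1.1) as -sum_i log sum_j exp(w_ij beta)."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    total = 0.0
    for y, x in zip(Y, X):
        s = difference_vectors(y, x) @ beta  # contains 0 for d_j = y
        m = s.max()                          # log-sum-exp stabilization
        total -= m + np.log(np.exp(s - m).sum())
    return total


def assumption1_holds(Y, X):
    """Rank form of Assumption 1: the stacked difference vectors have rank p."""
    W = np.vstack([difference_vectors(y, x) for y, x in zip(Y, X)])
    return bool(np.linalg.matrix_rank(W) == W.shape[1])


# Hypothetical toy panel (T = 3, p = 1); the third individual never switches.
Y_toy = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
X_toy = [[0.2, 0.9, 0.4], [0.7, 0.6, 0.1], [0.3, 0.5, 0.8], [0.9, 0.2, 0.8]]
print(assumption1_holds(Y_toy, X_toy))          # True
print(cond_loglik(np.zeros(1), Y_toy, X_toy))   # -3*log(3), about -3.296
for b in (1.0, 5.0, 25.0):
    # Within each individual, y_it = 1 exactly in the high-x periods, so the
    # objective keeps increasing toward 0 as b grows: no finite maximizer.
    print(b, cond_loglik(np.array([b]), Y_toy, X_toy))
```

At $\beta = 0$ each individual contributes $-\log r_{ni}$, and the individual whose choices never change contributes nothing, since its alternative set contains only $Y_i$. Because every difference vector in this toy panel is nonpositive, the objective increases monotonically in $\beta$ without attaining a maximum, which is the kind of failure Assumption 2 rules out.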
Theorem 2.1
Suppose Assumption 1 holds. Then, Assumption 2 is necessary and sufficient for the existence of a finite and unique $\widehat{\beta}^{\mathrm{CMLE}}_n$.

Theorem 2.1 gives a necessary and sufficient condition for the existence and uniqueness of conditional maximum likelihood estimates that depends only on the configuration of the data points. It follows from an application of Lemma 3 in McFadden (1974). We now turn to the problem of finding an automated procedure for detecting whether Assumption 2 holds in practice.
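A minimal sketch of such a procedure follows, assuming NumPy and SciPy are available. It computes the minimum of the quadratic program stated in Theorem 2.2 below, with weights bounded below by one, from a matrix W whose rows are the vectors $\sum_{t=1}^T (d^j_{it} - Y_{it}) X_{it}$ stacked over $i$ and $j$. Writing $\beta_{ij} = 1 + \gamma_{ij}$ with $\gamma_{ij} \ge 0$ turns the program into a nonnegative least-squares problem, solved here with scipy.optimize.nnls. Assumption 1 is checked through the rank of W, which we argue is equivalent: each row in Assumption 1 is the corresponding row of W minus a probability-weighted average of that individual's rows, so the two matrices have the same null space. The function names, the example matrix, and the tolerance are illustrative choices of ours, not the interface of the author's BinLogitCMLE module.

```python
import numpy as np
from scipy.optimize import nnls


def qp_minimum(W):
    """Minimum of u'u subject to u = sum_k beta_k W[k] and beta_k >= 1.

    W is an (m, p) array whose rows are the vectors sum_t (d_jt - Y_it) X_it,
    stacked over individuals i and alternatives j. With beta = 1 + gamma and
    gamma >= 0, u = W'1 + W'gamma, so the problem becomes the nonnegative
    least-squares problem min ||W'gamma - b||^2 with b = -W'1.
    """
    W = np.asarray(W, dtype=float)
    A = W.T                       # (p, m): one column per difference vector
    b = -A.sum(axis=1)            # -sum_k W[k]
    _, rnorm = nnls(A, b)         # rnorm = min over gamma >= 0 of ||A gamma - b||_2
    return rnorm ** 2


def cmle_exists(W, tol=1e-10):
    """Check Assumption 1 (rank form) and Assumption 2 (via the quadratic program)."""
    W = np.asarray(W, dtype=float)
    if np.linalg.matrix_rank(W) < W.shape[1]:  # Assumption 1 fails
        return False
    return bool(qp_minimum(W) < tol)         # tol is an arbitrary placeholder


# Hypothetical difference vectors (p = 1): every row is <= 0, as under separation.
W_sep = np.array([[-0.7], [0.0], [-0.5], [0.0], [-0.5], [-0.6]])
print(qp_minimum(W_sep))                         # 5.29 (= 2.3**2) up to rounding
print(cmle_exists(W_sep))                        # False
print(cmle_exists(np.vstack([W_sep, [[0.4]]])))  # True
```

Because the minimum is computed in floating point, the zero test needs a tolerance; a sensible value depends on the scale of the covariates, so 1e-10 is only a placeholder.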
Theorem 2.2
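Suppose Assumption 1 holds. Then, Assumption 2 holds if and only if the minimum in the following quadratic programming problem is zero:
$$\min_{u, \beta} \; u'u \quad \text{subject to} \quad u = \sum_{i=1}^N \sum_{j=1}^{r_{ni}} \beta_{ij} \sum_{t=1}^T (d^j_{it} - Y_{it}) X_{it} \quad \text{and} \quad \beta_{ij} \ge 1.$$
The lower bound on $\beta_{ij}$ rules out the trivial solution $\beta_{ij} = 0$; since multiplying all $\beta_{ij}$ by a positive constant multiplies $u$ by the same constant, whether the minimum is zero does not depend on which positive bound is used. Theorem 2.2 follows from an application of Lemma 4 in McFadden (1974). Note that a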
Python module called
BinLogitCMLE that implements the program given in Theorem 2.2 before computing $\widehat{\beta}^{\mathrm{CMLE}}_n$ is publicly available on the author's GitHub page (https://github.com/martinmugnier/BinLogitCMLE).

An Example with Artificial Data
We generate an artificial dataset of 10 individuals (with personal identifiers id) who are observed at periods $t \in \{1, 2, 3\}$. The choice variable is $y_{it} \in \{0, 1\}$, and there is a single regressor $x_{it} \in [0, 1]$.

Table: artificial dataset (columns: id, t, $x_{it}$, $y_{it}$)

Note that $y_{it} = 1$ if and only if $x_{it} > 0.5$. Hence, the stacked data are separated in the sense of Albert and Anderson (1984). The data also violate Assumption 2: within each individual, the periods with $y_{it} = 1$ are exactly those with the largest values of $x_{it}$, so $\sum_{t=1}^3 (d^j_{it} - y_{it}) x_{it} \le 0$ for every $d^j_i \in B_i$, and $\beta^* = -1$ satisfies the inequality in Assumption 2. While Stata's logit command detects the separation (see Figure 1), clogit does not detect the violation of Assumption 2 and returns spurious estimates after three iterations (see Figure 2). Although the estimated standard error is very large and the log-likelihood is almost zero, the user may miss the crucial fact that these results are informative only about the nonexistence of the estimate. We note that
BinLogitCMLE detects the existence failure, as expected.

References
Albert, A. and Anderson, J. A. (1984), ‘On the existence of maximum likelihoodestimates in logistic regression models’,