On the Existence of Conditional Maximum Likelihood Estimates of the Binary Logit Model with Fixed Effects

Abstract

By exploiting McFadden's (1974) results on conditional logit estimation, we show that there exists a one-to-one mapping between existence and uniqueness of conditional maximum likelihood estimates of the binary logit model with fixed effects and the configuration of data points. Our results extend those in Albert and Anderson (1984) for the cross-sectional case and can be used to build a simple algorithm that detects spurious estimates in finite samples. As an illustration, we exhibit an artificial dataset for which Stata's \texttt{clogit} command returns spurious estimates.


Martin Mugnier †

Keywords: separation, collinearity, binary choice models, fixed effects.

JEL Codes: C13, C25.

∗ I would like to thank Xavier D'Haultfœuille and Louis-Daniel Pape for their comments. All remaining errors are mine.

† CREST.

Introduction

Suppose there are $i = 1, \ldots, n$ individuals who are observed making a choice from $j \in \{0, 1\}$ alternatives, over $t = 1, \ldots, T$ choice situations. Individual $i$'s sequence of choices is $Y_i := (Y_{i1}, \ldots, Y_{iT})$ with elements $Y_{it} = \mathbb{1}\{i \text{ chooses } 1 \text{ in } t\}$ that indicate $i$'s choice in choice situation $t$. Suppose in addition that we observe individual-specific covariates $X_{it} \in \mathbb{R}^p$ for individual $i$ in choice situation $t$. Individual $i$'s sequence of covariates is then $X_i := (X_{i1}, \ldots, X_{iT})$. Let $Z_n := (Y_i, X_i)_{1 \leq i \leq n}$ denote the sample data. Provided it exists and is unique, the conditional maximum likelihood estimator (that we denote by $\hat{\beta}^{\mathrm{CMLE}}_n$) satisfies

$$\hat{\beta}^{\mathrm{CMLE}}_n := \arg\max_{\beta \in \mathbb{R}^p} \log L(\beta; Z_n), \tag{1.1}$$

where

$$\log L(\beta; Z_n) := \sum_{i=1}^{N} \log \frac{\exp\left(\sum_{t=1}^{T} Y_{it} X_{it} \beta\right)}{\sum_{d_i : \sum_{t=1}^{T} (d_{it} - Y_{it}) = 0} \exp\left(\sum_{t=1}^{T} d_{it} X_{it} \beta\right)}.$$

As Andersen (1970) showed (see also Rasch, 1961, for an earlier reference), $\hat{\beta}^{\mathrm{CMLE}}_n$ is consistent for the binary logit model with fixed effects. However, it may be the case that $\hat{\beta}^{\mathrm{CMLE}}_n$ does not exist in finite samples (the maximum in (1.1) is not unique, or the conditional log-likelihood does not have a maximum). While Albert and Anderson (1984) established necessary and sufficient existence conditions for maximum likelihood estimates in cross-sectional logistic regression models, such conditions for $\hat{\beta}^{\mathrm{CMLE}}_n$ are lacking. Yet, these are of interest for at least two reasons. First, practitioners often use off-the-shelf programs with built-in nonlinear solvers which do not systematically detect, tag, or even report situations where estimates do not exist (see, e.g., McCullough and Vinod, 2003). Specifically, we show below with an artificial dataset that Stata's clogit stops after some iterations and returns spurious results, which further illustrates that the problem has not been widely recognized.
Albert and Anderson (1984) show that, under a rank condition on the design matrix, maximum likelihood estimates exist if and only if there is no binary linear classifier that perfectly predicts the outcome from the covariates for all data points lying outside the decision frontier. Notably, the "clogit" section in the Stata User's Guide does not discuss the existence problem at all.

Second, $\hat{\beta}^{\mathrm{CMLE}}_n$ does not take advantage of all the variation available in the data. This mechanically increases the probability of existence failure, as we shall see. In this paper, we extend the data separation rules established in Albert and Anderson (1984) for cross-sectional logistic models. Our results hold under a rank condition which is similar to Albert and Anderson's (1984, p. 2) full rank assumption imposed on the matrix of covariates (see Assumption 1 below).

In Section 2, we derive our main existence theorem and provide a simple algorithm to detect existence failures. In Section 3, we exhibit an artificial dataset for which existence fails but clogit does not detect such failure and returns spurious estimates.

Following the terminology employed in McFadden (1974), the sample $Z_n$ results from a choice experiment composed of $n$ distinct trials $(X_i, B_i)$, where $B_i$ is the alternative set defined as

$$B_i := \left\{ d := (d_1, \ldots, d_T) \in \{0, 1\}^T : \sum_{t=1}^{T} (d_t - Y_{it}) = 0 \right\}.$$

The alternative set contains $r_{ni} := \binom{T}{\sum_{t=1}^{T} Y_{it}}$ alternatives $d^j_i$, indexed by $j = 1, \ldots, r_{ni}$, and with vectors of attributes $\sum_{t=1}^{T} d^j_{it} X_{it}$. Note that the number of alternatives, $r_{ni}$, can differ from one individual to another. Let us define $r_n := \sum_{i=1}^{N} r_{ni}$. By rewriting Axiom 5 in McFadden (1974, p. 116) to fit our framework, we shall make the following assumption.

Assumption 1

For all $\beta \in \mathbb{R}^p$, the $r_n \times p$ matrix whose rows are

$$\left( \sum_{t=1}^{T} d^j_{it} X_{it} - \sum_{s=1}^{r_{ni}} \frac{\sum_{t=1}^{T} d^s_{it} X_{it} \exp\left(\sum_{t=1}^{T} d^s_{it} X_{it} \beta\right)}{\sum_{\ell=1}^{r_{ni}} \exp\left(\sum_{t=1}^{T} d^{\ell}_{it} X_{it} \beta\right)} \right)$$

for $j = 1, \ldots, r_{ni}$ and $i = 1, \ldots, N$ is of rank $p$.

Assumption 1 holds when the data vary sufficiently across periods. A necessary condition is $r_n \geq p + N$. This is likely to hold in practice if $p$ has a reasonable size. If $N \geq p$, it will generally hold since $r_{ni} \geq 2$, but it may also hold for $N < p$ if $T$ is large. The following condition is an adaptation of Axiom 6 in McFadden (1974, p. 116).

Assumption 2

There does not exist $\beta^* \in \mathbb{R}^p \setminus \{0\}$ satisfying

$$\sum_{t=1}^{T} (d^j_{it} - Y_{it}) X_{it} \beta^* \geq 0$$

for $j = 1, \ldots, r_{ni}$ and $i = 1, \ldots, N$.

Assumption 2 is reminiscent of the separation and quasi-complete separation relationships (4) and (6) in Albert and Anderson (1984). However, it is specific to the panel data structure considered here.
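For a scalar regressor ($p = 1$), Assumption 2 can be checked by inspection: a nonzero $\beta^*$ satisfying the inequalities exists exactly when the scalars $\sum_t (d^j_{it} - Y_{it}) X_{it}$ never take both strictly positive and strictly negative values. The following is a minimal Python sketch of that special case only. The function names and the two toy panels are hypothetical illustrations, not the author's BinLogitCMLE code and not the paper's dataset.

```python
from itertools import combinations


def alternative_set(y):
    """Enumerate B_i: all 0/1 sequences of length T with the same sum as y."""
    T, k = len(y), sum(y)
    alternatives = []
    for ones in combinations(range(T), k):
        d = [0] * T
        for t in ones:
            d[t] = 1
        alternatives.append(tuple(d))
    return alternatives


def difference_rows(x, y):
    """Scalars sum_t (d_t - y_t) * x_t for every d in B_i (case p = 1)."""
    return [
        sum((d_t - y_t) * x_t for d_t, y_t, x_t in zip(d, y, x))
        for d in alternative_set(y)
    ]


def assumption2_holds_scalar(panel, tol=1e-12):
    """True iff no nonzero scalar beta* makes every row times beta* >= 0.

    With p = 1 this is equivalent to the rows containing both a strictly
    positive and a strictly negative value.
    """
    rows = [r for x, y in panel for r in difference_rows(x, y)]
    has_positive = any(r > tol for r in rows)
    has_negative = any(r < -tol for r in rows)
    return has_positive and has_negative


# Hypothetical toy panels: each individual is (x sequence, y sequence), T = 3.
# In `separated`, y = 1 exactly when x > 0.5, so all rows are <= 0.
separated = [([0.2, 0.7, 0.9], [0, 1, 1]), ([0.6, 0.1, 0.3], [1, 0, 0])]
# In `mixed`, the second individual chooses 1 at a low value of x.
mixed = [([0.2, 0.7, 0.9], [0, 1, 1]), ([0.6, 0.1, 0.3], [0, 1, 0])]

separated_ok = assumption2_holds_scalar(separated)
mixed_ok = assumption2_holds_scalar(mixed)
```

For $p > 1$ the sign check no longer suffices, and a linear or quadratic program is needed instead.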

Theorem 2.1

Suppose Assumption 1 holds. Then, Assumption 2 is necessary and sufficient for the existence of a finite and unique $\hat{\beta}^{\mathrm{CMLE}}_n$.

Theorem 2.1 gives a necessary and sufficient condition for existence and uniqueness of conditional maximum likelihood estimates that depends only on the configuration of data points. It follows from an application of Lemma 3 in McFadden (1974). We now turn to the problem of finding an automated procedure for detecting whether Assumption 2 holds in practice.
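As a brief illustration of what Theorem 2.1 rules out, the sketch below evaluates the conditional log-likelihood (1.1) by brute-force enumeration of each alternative set, for a scalar regressor. The function name `cond_loglik` and the toy panel are assumptions made for illustration (the panel is not the paper's dataset), and enumeration is only practical for small $T$.

```python
import math
from itertools import combinations


def cond_loglik(beta, panel):
    """Conditional log-likelihood (1.1) for a scalar regressor (p = 1).

    `panel` is a list of (x, y) pairs, one per individual, where x and y
    are equal-length sequences over the T choice situations.
    """
    total = 0.0
    for x, y in panel:
        T, k = len(y), sum(y)
        observed = beta * sum(y_t * x_t for x_t, y_t in zip(x, y))
        # One index per alternative d in B_i: beta * sum_t d_t * x_t.
        scores = [
            beta * sum(x[t] for t in ones)
            for ones in combinations(range(T), k)
        ]
        # Log-sum-exp with the maximum factored out for numerical stability.
        m = max(scores)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        total += observed - log_denominator
    return total


# Hypothetical toy panel in which y = 1 exactly when x > 0.5.
separated_panel = [([0.2, 0.7, 0.9], [0, 1, 1]), ([0.6, 0.1, 0.3], [1, 0, 0])]

betas = (0.0, 1.0, 5.0, 20.0)
values = [cond_loglik(b, separated_panel) for b in betas]
```

On this toy panel the values in `values` rise toward zero as $\beta$ grows, so the conditional log-likelihood has no finite maximizer, which is the situation Assumption 2 excludes.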

Theorem 2.2

Suppose Assumption 1 holds. Then, Assumption 2 holds if and only if the minimum in the following quadratic programming problem is zero:

$$\min_{u, \beta} \; u'u \quad \text{subject to} \quad u = \sum_{i=1}^{N} \sum_{j=1}^{r_{ni}} \beta_{ij} \sum_{t=1}^{T} (d^j_{it} - Y_{it}) X_{it} \quad \text{and} \quad \beta_{ij} \geq 1.$$

Theorem 2.2 follows from an application of Lemma 4 in McFadden (1974). A Python module called BinLogitCMLE, which solves the program given in Theorem 2.2 before computing $\hat{\beta}^{\mathrm{CMLE}}_n$, is publicly available on the author's GitHub page: https://github.com/martinmugnier/BinLogitCMLE.

An Example with Artificial Data

We generate an artificial dataset of 10 individuals (with personal identifiers id) who are observed over three periods $t$. The choice variable is $y_{it} \in \{0, 1\}$ and there is a single nonnegative regressor $x_{it}$.

[Data table with columns id, t, $x_{it}$, $y_{it}$; values missing.]

Note that $y_{it} = 1$ if and only if $x_{it} > 0.5$. Hence, the stacked data are separated in the sense of Albert and Anderson (1984). In fact, it is easy to check that the data also violate Assumption 2. While the logit program detects the separation (see Figure 1), clogit does not detect the violation of Assumption 2 and returns spurious estimates after three iterations (see Figure 2). Although the estimated standard error is quite large and the log-likelihood is almost zero, the user may miss the crucial fact that these results are informative only about the nonexistence of the estimate. We note that BinLogitCMLE detects the existence failure as expected.

References

Albert, A. and Anderson, J. A. (1984), 'On the existence of maximum likelihood estimates in logistic regression models',