Fixed Effects Binary Choice Models with Three or More Periods∗

Laurent Davezies†    Xavier D’Haultfœuille‡    Martin Mugnier§

Abstract
We consider fixed effects binary choice models with a fixed number of periods T and without a large support condition on the regressors. If the time-varying unobserved terms are i.i.d. with known distribution F, Chamberlain (2010) shows that the common slope parameter is point-identified if and only if F is logistic. However, he considers in his proof only T = 2. We show that the result does not in fact generalize to T ≥ 3: the common slope parameter and some parameters of the distribution of the shocks can be identified when F belongs to a family including the logit distribution. Identification is based on a conditional moment restriction. We give necessary and sufficient conditions on the covariates for this restriction to identify the parameters. In addition, we show that under mild conditions, the corresponding GMM estimator reaches the semiparametric efficiency bound when T = 3.

Keywords: binary choice model, panel data, point identification, conditional moment restrictions.
JEL Codes:
C14, C23, C25.

∗We would like to thank Pascal Lavergne for his comments.
†CREST, [email protected].
‡CREST, [email protected].
§CREST, [email protected].

1 Introduction
In this paper, we revisit the classical binary choice model with fixed effects. Specifically, let T denote the number of periods and suppose that we observe, for individual i, (Y_it, X_it)_{t=1,...,T} with

Y_it = 1{X_it'β_0 + γ_i − ε_it ≥ 0},   (1.1)

where β_0 ∈ R^K is unknown and ε_it ∈ R is an idiosyncratic shock. The nonlinear nature of the model and the absence of restriction on the distribution of γ_i conditional on X_i := (X_i1, ..., X_iT) render the identification of β_0 difficult. Rasch (1960) shows that if the (ε_it)_{t=1,...,T} are i.i.d. with a logistic distribution, a conditional maximum likelihood can be used to identify and estimate β_0. Chamberlain (2010) establishes a striking converse of Rasch's result: if the (ε_it)_{t=1,...,T} are i.i.d. with distribution F and the support of X_i is bounded, β_0 is point identified only if F is logistic. Other papers have circumvented such a negative result by either considering large support regressors (see in particular Manski, 1987; Honore and Lewbel, 2002) or allowing for dependence between the shocks (see Magnac, 2004).

It turns out, however, that Chamberlain (2010) only proves his result for T = 2. And in fact, we show that his result does not generalize to T ≥ 3. Specifically, we consider distributions F satisfying

F(x)/(1 − F(x)) = ∑_{k=1}^τ w_k exp(λ_k x)   or   (1 − F(x))/F(x) = ∑_{k=1}^τ w_k exp(−λ_k x),   (1.2)

with T ≥ τ + 1, (w_1, ..., w_τ) ∈ (0, ∞) × [0, ∞)^{τ−1} and 1 = λ_1 < ... < λ_τ. We study the identification of β_0, assuming that λ_0 := (λ_1, ..., λ_τ) is known, but also that of θ_0 := (β_0', λ_0')'. In both cases, the weights w_1, ..., w_τ remain unknown, thus allowing for much more flexibility on the distribution of ε_it than in the logit case. Our main insight is that for any F satisfying (1.2), a conditional moment restriction holds. We then give necessary as well as sufficient conditions for such moment restrictions to identify β_0 or θ_0. The necessary conditions show for instance that with τ ≥ 2, identification of β_0 cannot be achieved with a single, binary X_it. On the other hand, our sufficient conditions imply that, at least if γ is constant, θ_0 is identified if, conditional on (X_{j,t'})_{(j,t')≠(k,t)}, X_{k,t} takes at least 2τ values. Note that Johnson (2004) considers the same family with τ = 2 and T = 3. However, he does not study the general case and does not show any formal identification result based on the corresponding moment conditions.

Obviously, the conditional moment condition can be used to construct GMM estimators. This means, in particular, that √n-consistent estimation is possible beyond the logit case when T > 2, overturning again the negative results of Chamberlain (2010) and Magnac (2004). Further, we show that if T = 3 and mild additional restrictions hold, the optimal GMM estimator based on our conditional moment conditions reaches the semiparametric efficiency bound of the model. This means that at least when T = 3, these moment conditions contain all the information of the model. We also show through simulations that this information is sufficient to form rather precise estimators for usual sample sizes.

The remainder of the paper is organized as follows. Section 2 gives necessary and sufficient conditions for point identification of β_0 and the λ_j. Section 3 discusses estimation and the semiparametric efficiency bound of the model. Section 4 reports results from a Monte-Carlo study. Section 5 concludes. All the proofs are collected in the appendix.

2 Identification

We drop the subscript i in the absence of ambiguity and let Y := (Y_1, ..., Y_T)', X := (X_1, ..., X_T)' and X_t := (X_{1,t}, ..., X_{K,t})'. For any set A ⊂ R^p (for any p ≥ 1), A^* := A\{0} and |A| denotes the cardinality of A. Hereafter, we maintain the following conditions.

Assumption 1 (Binary choice panel model) Equation (1.1) holds and:

1. (X, γ) and (ε_t)_{1≤t≤T} are independent and the (ε_t)_{1≤t≤T} are i.i.d. with a known cumulative distribution function (cdf) F.
2. For all (k, t), E[X_{k,t}^2] < ∞.
3. β_0 ∈ (R^K)^*.

The first condition is also considered in Chamberlain (2010). The second condition is a standard moment restriction on the covariates. Finally, the third condition excludes the case β_0 = 0. This case can be treated separately, as the following proposition shows.

Proposition 2.1
Suppose that Assumption 1 holds, F is strictly increasing on R and there exist (t, t') ∈ {1, ..., T}^2 such that E[(X_t − X_{t'})(X_t − X_{t'})'] is nonsingular. Then β_0 = 0 if and only if

P(Y_t = 1, Y_{t'} = 0 | Y_t + Y_{t'} = 1, X_t, X_{t'}) = 1/2 a.s.   (2.1)

Condition (2.1) can be tested by a specification test on the nonparametric regression of D = Y_t(1 − Y_{t'}) on (X_t, X_{t'}), conditional on the event Y_t + Y_{t'} = 1. See, e.g., Bierens (1990) or Hong and White (1995).

Turning to identification on (R^K)^*, we first recall the negative result of Chamberlain (2010).

Theorem 2.2
Suppose that T = 2, Assumption 1 holds, F is strictly increasing on R with bounded, continuous derivative and Supp(X) is compact. If, for all β_0 ∈ (R^K)^*, β_0 is identified, then F(x)/(1 − F(x)) = w exp(λx) for some (w, λ) ∈ (R_+^*)^2.

Our results below imply, however, that this negative result does not generalize to T > 2. To this end, we consider a family of distributions that includes the logistic distribution and is defined as follows. Hereafter, Λ_τ denotes a subset of {(λ_1, ..., λ_τ) ∈ R^τ : 1 = λ_1 < ... < λ_τ}.

Assumption 2 (“Generalized” logistic distributions)
There exists a known τ ∈ {1, ..., T − 1}, unknown w := (w_1, ..., w_τ) ∈ (0, ∞) × [0, ∞)^{τ−1} and λ_0 := (λ_1, ..., λ_τ) ∈ Λ_τ such that:

Either F(x)/(1 − F(x)) = ∑_{j=1}^τ w_j exp(λ_j x)   (First type),
or (1 − F(x))/F(x) = ∑_{j=1}^τ w_j exp(−λ_j x)   (Second type).

Noteworthy, the family of “generalized” logistic distributions we consider differs from those introduced by Balakrishnan and Leung (1988) and Stukel (1988).
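As a simple numerical sanity check (our own illustration, not part of the original analysis; the function name `gen_logistic_cdf` is ours), a first-type cdf can be built directly from (w, λ) via F = G/(1 + G) with G(x) = ∑_j w_j exp(λ_j x), and τ = 1 with w_1 = 1 recovers the standard logistic cdf:

```python
import numpy as np

def gen_logistic_cdf(x, w, lam):
    # First-type family: F(x) / (1 - F(x)) = sum_j w_j * exp(lam_j * x),
    # hence F(x) = G(x) / (1 + G(x)) with G(x) = sum_j w_j * exp(lam_j * x).
    x = np.asarray(x, dtype=float)
    G = sum(wj * np.exp(lj * x) for wj, lj in zip(w, lam))
    return G / (1.0 + G)

x = np.linspace(-8.0, 8.0, 401)

# tau = 1, w = (1,), lam = (1,): the standard logistic cdf
F1 = gen_logistic_cdf(x, w=[1.0], lam=[1.0])
assert np.allclose(F1, 1.0 / (1.0 + np.exp(-x)))

# tau = 2, lam = (1, 2): still a strictly increasing cdf with limits 0 and 1
F2 = gen_logistic_cdf(x, w=[0.3, 0.7], lam=[1.0, 2.0])
assert np.all(np.diff(F2) > 0) and F2[0] < 1e-3 and F2[-1] > 1 - 1e-3
```

Since G is a positive combination of increasing exponentials, F = G/(1 + G) is automatically a valid, strictly increasing cdf for any admissible (w, λ).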
We fix min{λ_1, ..., λ_τ} to 1 as the scale of the latent variable X_it'β_0 + γ_i − ε_it is not identified. Also, if F is of the second type, then one can show that the cdf of −ε_it is of the first type. Thus, up to changing (Y_t, X_t) into (1 − Y_t, −X_t), we can assume without loss of generality, as we do afterwards, that F is of the first type. We shall see that τ + 1 periods are sufficient to achieve identification. Hence, we assume, again without loss of generality, that T = τ + 1: if T > τ + 1, we can always focus on τ + 1 periods.

We consider the identification of not only β_0 but also λ_0. We then let θ_0 := (β_0', λ_0')' and Θ := (R^K)^* × Λ_τ. We also define, for any (y, x, θ) ∈ {0, 1}^T × Supp(X) × Θ,

m(y, x; θ) := ∑_{t=1}^T 1{y_t = 1, y_{t'} = 0 ∀ t' ≠ t} M_t(x; θ),

where for all j ∈ {1, ..., T}, M_j(x; θ) is the (1, j)-cofactor of the matrix

( 1                ...  1
  exp(λ_1 x_1'β)   ...  exp(λ_1 x_T'β)
  ⋮                     ⋮
  exp(λ_τ x_1'β)   ...  exp(λ_τ x_T'β) ).

As we also consider identification of β_0 alone, we also let, with a slight abuse of notation, m(y, x; β) := m(y, x; (β', λ_0')'). Our first result shows that the conditional moment of m(Y, X; θ_0) is zero.

Theorem 2.3
If Assumptions 1-2 hold, we have, almost surely,

E[m(Y, X; θ_0) | X] = 0.   (2.2)

Theorem 2.3 shows there exists a known moment condition which potentially identifies θ_0 in a model more general than the logistic one. It shows that, as the number of periods T increases, there is an increasing class of distributions F for which β_0 (or θ_0) can be point identified. This is consistent with the idea that if T = ∞, β_0 is point identified for any F, by using variations in X_t of a single individual. It also complements the results of Chernozhukov et al. (2013) showing that bounds on β_0 for general F shrink quickly as T increases.

Note that the result also holds with T = τ + 1 = 2 (or, more generally, with T > τ = 1). In such a case, the conditional moment condition can be written

E[1{Y_2 > Y_1} exp(X_1'β_0) − 1{Y_1 > Y_2} exp(X_2'β_0) | X] = 0.

This conditional moment generates the first-order conditions of the theoretical conditional likelihood, since the latter is equivalent to

E[ (X_2 − X_1)/(exp(X_1'β_0) + exp(X_2'β_0)) (1{Y_2 > Y_1} exp(X_1'β_0) − 1{Y_1 > Y_2} exp(X_2'β_0)) ] = 0.

The discussion above implies that with T = τ + 1 = 2, β_0 is identified by (2.2) as soon as E[(X_2 − X_1)(X_2 − X_1)'] is nonsingular. We now consider sufficient conditions for (2.2) to identify θ_0 (or β_0) more generally, not only with τ = 1. The moment conditions in the general case are highly nonlinear, making it difficult to provide a complete characterization. First, we consider the case where γ is actually constant. For any (k, t) ∈ {1, ..., K} × {1, ..., T}, we let X_k := (X_{k,1}, ..., X_{k,T}), X_{−k} := (X_{k',t})_{k'≠k, t=1,...,T} and X_{k,−t} := (X_{k,s})_{s≠t}.

Proposition 2.4
Assume that Assumptions 1-2 are satisfied, T = τ + 1 ≥ 2, V(γ) = 0 and, for all (k, t) ∈ {1, ..., K} × {1, ..., T}, |Supp(X_{k,t} | X_{−k}, X_{k,−t})| ≥ 2τ. Then,

E[m(Y, X; θ) | X] = 0 a.s. ⇒ θ = θ_0.   (2.3)

Proposition 2.4 shows that in the absence of fixed effects, the conditional moment condition E[m(Y, X; θ) | X] = 0 is sufficient to identify θ_0 under mild restrictions on the distribution of X. In particular, all components of X may be discrete. The result relies in particular on the fact that for any λ ∈ Λ_τ, the family of functions (v ↦ exp(λ_j v))_{j=1,...,τ} forms a Chebyshev system (see, e.g., Krein and Nudelman, 1977): v ↦ ∑_{j=1}^T a_j exp(λ_j v) does not vanish more than T − 1 times unless it is identically zero. Because we consider identification based on (2.2) alone, we suppose this additional restriction to be unknown by the econometrician. A close inspection of the proof reveals that the support restrictions on X could actually be weakened further, but at the expense of complicating the condition.

We now turn to the case where γ is nondegenerate and possibly correlated with X, which is more realistic in practice. For any (t, ℓ, x) ∈ {1, ..., T} × {1, ..., τ} × Supp(X), let us define

a_{t,ℓ,x} : v ↦ E[ exp(λ_ℓ γ) / ( C(γ, x; θ_0, t) (1 + ∑_{j=1}^τ w_j δ_j(x; θ_0, t) exp(λ_j(β_{0k} v + γ))) ) | X = x ],

where C(γ, x; θ_0, t) := ∏_{t'≠t} (1 + ∑_{j=1}^τ w_j exp(λ_j(x_{t'}'β_0 + γ))) and δ_j(x; θ_0, t) := exp(λ_j x_t'β_0). We consider the following conditions.

Assumption 3
1. There exist (t_1, t_2) ∈ {1, ..., T}^2 such that E[(X_{t_1} − X_{t_2})(X_{t_1} − X_{t_2})'] is nonsingular.
2. There exists k ∈ {1, ..., K} such that β_{0k} ≠ 0 and, almost surely, X_k | X_{−k} admits a density with respect to the Lebesgue measure.

Assumption 4

1. X_k ⊥⊥ γ | X_{−k}.
2. There exists some t_3 ∈ {1, ..., T}\{t_1, t_2} such that, for all (β_k, λ) ∈ R^* × Λ_τ, {λ_1 β_k, ..., λ_τ β_k} ∩ {λ_{01} β_{0k}, ..., λ_{0τ} β_{0k}} = ∅ implies that the τ(τ + 1) functions {a_{t_3,ℓ,x}(v) exp(λ_{0ℓ} β_{0k} v), a_{t_3,ℓ,x}(v) exp(λ_1 β_k v), ..., a_{t_3,ℓ,x}(v) exp(λ_τ β_k v)}_{ℓ=1,...,τ} form a free family of functions over R, for almost all x ∈ Supp(X).

Assumption 4'

1. X_k ⊥⊥ γ | X_{−k}.
2. There exists some t_3 ∈ {1, ..., T}\{t_1, t_2} such that, for all β_k ∈ R^*, {λ_{01} β_k, ..., λ_{0τ} β_k} ∩ {λ_{01} β_{0k}, ..., λ_{0τ} β_{0k}} = ∅ implies that the τ(τ + 1) functions {a_{t_3,ℓ,x}(v) exp(λ_{0ℓ} β_{0k} v), a_{t_3,ℓ,x}(v) exp(λ_{01} β_k v), ..., a_{t_3,ℓ,x}(v) exp(λ_{0τ} β_k v)}_{ℓ=1,...,τ} form a free family of functions over R, for almost all x ∈ Supp(X).

Again, Assumptions 3 and 4 (or 3 and 4') are supposed to be unknown by the econometrician. Assumption 3.1 requires some variation in (X_{t_1} − X_{t_2})'β_0. Assumption 3.2 imposes that at least one regressor is continuously distributed. Assumptions 4 and 4' are very close, Assumption 4' being a weaker form of Assumption 4 that turns out to be sufficient to identify β_0 only, when λ_0 is supposed to be known. When combined with Assumption 3.2, Assumption 4.1 (or Assumption 4'.1) is similar to, but less restrictive than, Assumption R.iii of Magnac and Maurin (2007) or Assumptions A.2-3 in Honore and Lewbel (2002). Importantly, it does not imply any large support restriction. Assumptions 4.2 and 4'.2 are high-level conditions that we discuss below.

Proposition 2.5
Suppose that Assumptions 1-3 hold and T = τ + 1. Then:

1. If Assumption 4 holds as well, then (2.3) holds.
2. If Assumption 4' holds as well,

E[m(Y, X; β) | X] = 0 a.s. ⇒ β = β_0.   (2.4)

The proof relies on two main ingredients. The first is, again, the upper bound on the number of roots of exponential “polynomials”. The second is the analyticity of the conditional moment as a function of X_{k,t}. By a continuation theorem on real analytic functions (see e.g. Corollary 1.2.5 in Krantz and Parks, 2002), this allows us to extend the conditional moment function from any x ∈ Supp(X) to any x' such that x'_{j,t} = x_{j,t} for all (j, t) ≠ (k, t') and x'_{k,t'} ∈ R.

Assumptions 4.2 and 4'.2 are high-level and technical. We conjecture that they hold under mild restrictions on the distribution of γ. The following proposition, restricted to T = 3 and a binary γ, substantiates this claim.

Proposition 2.6
Let T = τ + 1 = 3 and Λ_τ ⊂ {(1, λ_2) : λ_2 > 1}. If |Supp(γ | X)| = 2 almost surely, Assumption 4'.2 is satisfied.

We now turn to necessary conditions for (2.4) to hold. We consider the following assumption.
Assumption 5 P(X ∈ {x ∈ R^{KT} : |{x_1, ..., x_T}| = T}) > 0.

Assumption 5 requires that, with positive probability, X = (X_1, ..., X_T)' takes distinct values at all periods. Since we focus here on T ≥ 3, this excludes in particular the case where X_t is binary. But contrary to Assumption 3, Assumption 5 does not exclude the case where all covariates are discrete, and can be expected to hold if |Supp(X_t)| ≥ T. The following proposition shows that Assumption 5 is actually necessary for the conditional moment condition E[m(Y, X; β) | X] = 0 to identify β_0.

Proposition 2.7
Suppose that Assumptions 1-2 are satisfied and T = τ + 1 ≥ 3. Then, if (2.4) holds, Assumption 5 holds as well.

3 Estimation

In the following, we assume that λ_0 is known and focus on the estimation of β_0. The conditional moment condition (2.4) can be transformed into unconditional conditions such that standard GMM estimators can easily be constructed. Letting g(X) ∈ R^K, such estimators β̂ satisfy

β̂ = argmin_{β ∈ B} ( (1/n) ∑_{i=1}^n g(X_i) m(Y_i, X_i; β) )' ( (1/n) ∑_{i=1}^n g(X_i) m(Y_i, X_i; β) ),   (3.1)

where B is a compact subset of (R^K)^*. The optimal estimator among this class is obtained by choosing g^*(X) := R(X)/Ω(X), with R(X) = E[∇_β m(Y, X; β_0) | X] and Ω(X) = V[m(Y, X; β_0) | X] (see Chamberlain, 1987). Given that R(X) and Ω(X) are unknown, an asymptotically efficient GMM estimator can be obtained in two steps. In a first step, g(X) is chosen arbitrarily and we compute the corresponding estimator β̂_1. In a second step, we compute ĝ^*(X) = R̂(X)/Ω̂(X), where R̂(X) = Ê[∇_β m(Y, X; β̂_1) | X] and Ω̂(X) = V̂[m(Y, X; β̂_1) | X] are standard nonparametric estimators (e.g., kernel or series estimators). We then compute the estimator β̂^* based on ĝ^*(X). Under regularity conditions displayed in, e.g., Newey (1990), we have

√n (β̂^* − β_0) →_d N(0, V),   (3.2)

with V := E[Ω(X)^{−1} R(X)R(X)']^{−1}. (Estimation of θ_0 could be performed in the same way as that of β_0, but it is unclear to us whether the corresponding estimator would reach the semiparametric efficiency bound of θ_0, something we prove below for β_0.) To obtain this result, two assumptions are worth mentioning. The first is an identifiability condition when using the optimal instruments: E[g^*(X) m(Y, X; β)] = 0 ⇒ β = β_0. Such a condition may fail to hold, as shown by Dominguez and Lobato (2004).
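To illustrate how a GMM estimator of the form (3.1) can be implemented, the sketch below (our own illustrative code, not the authors' implementation) estimates β_0 in the simplest case τ = 1, T = 2, K = 1 with logistic shocks, using the conditional moment E[1{Y_2 > Y_1} exp(X_1 β_0) − 1{Y_1 > Y_2} exp(X_2 β_0) | X] = 0 of Section 2 with the arbitrary instrument g(X) = X_2 − X_1; the two-step optimal-instrument refinement is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0 = 20_000, 1.0

# T = 2, K = 1; fixed effect correlated with X; logistic shocks (tau = 1)
X = rng.normal(size=(n, 2))
gamma = 0.5 * X.sum(axis=1) + rng.normal(size=n)
eps = rng.logistic(size=(n, 2))
Y = (beta0 * X + gamma[:, None] - eps >= 0.0).astype(float)

def m(beta):
    # m(Y, X; beta) = 1{Y2 > Y1} exp(X1 beta) - 1{Y1 > Y2} exp(X2 beta)
    return ((Y[:, 1] > Y[:, 0]) * np.exp(beta * X[:, 0])
            - (Y[:, 0] > Y[:, 1]) * np.exp(beta * X[:, 1]))

g = X[:, 1] - X[:, 0]                      # arbitrary instrument g(X)
grid = np.linspace(0.25, 2.5, 901)
obj = np.array([np.mean(g * m(b)) ** 2 for b in grid])
beta_hat = grid[obj.argmin()]              # GMM estimate, close to beta0
```

Here a grid search replaces a numerical optimizer purely to keep the sketch dependency-free; the fixed effect may depend on X arbitrarily, as allowed by the model.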
Other estimators relying on the full set of moments can be used to prevent this identification failure (see in particular Dominguez and Lobato, 2004; Hsu and Kuan, 2011; Lavergne and Patilea, 2013). The second condition is that E[Ω(X)^{−1} R(X)R(X)'] exists and is nonsingular. Nonsingularity holds if and only if E[R(X)R(X)'] is nonsingular, which is a local identification condition.

We now establish that with T = τ + 1 = 3, the semiparametric efficiency bound actually coincides with the asymptotic variance V of the optimal GMM estimator. The result holds under the following condition.

Assumption 6

1. E[Ω^{−1}(X) R(X)R(X)'] exists and is nonsingular.
2. |Supp(γ | X)| ≥ 2 almost surely.

We already discussed the first condition. The second condition we impose is weaker than that imposed by Chamberlain (2010), namely Supp(γ | X) = R.

Theorem 3.1
Assume T = τ + 1 = 3, λ_0 is known with λ_2 = 2 and Assumptions 1-3 and 6 hold. Then the semiparametric efficiency bound of β_0, V^*(β_0), is finite and satisfies V^*(β_0) = V.

Intuitively, this result states that all the information content of the model is included in the conditional moment restriction E[m(Y, X; β_0) | X] = 0. It complements, for T = τ + 1 = 3, the result of Hahn (1997), which states that the conditional maximum likelihood estimator is the efficient estimator of β_0 if F is logistic. The difference between the two results is that here, (w_1, w_2) is unknown rather than known and equal to (1, 0).

4 Monte-Carlo simulations
We conduct numerical simulations in order to characterize the finite sample performance of β̂^*. We let T = τ + 1 = 3 and consider both (w_1, w_2) = (0.1, 0.9) and (w_1, w_2) = (0.5, 0.5). We set λ_0 = (1, 5) and suppose it is known. Next, we let K = 1 and β_0 = 1, with X_t ∈ {−1, 0, 1} (note that X_t is not binary). We first draw X_1 uniformly over {−1, 0, 1}, then draw X_2 uniformly over {−1, 0, 1}\{X_1} and finally let X_3 be the remaining element in {−1, 0, 1}\{X_1, X_2}. Note that Assumption 3.2 fails to hold with such an X. But as explained above, this condition is only sufficient, not necessary, for identification. We then consider five data generating processes (DGPs) where the r.v. γ is:

i. Constant: γ = 0.
ii. Discrete and independent of X: P(γ = −1/2 | X) = P(γ = 0 | X) = P(γ = 1/2 | X) = 1/3.
iii. Continuous and independent of X: γ | X ∼ U([−1/2, 1/2]).
iv. Discrete and correlated with X: γ = UZ, where (U, Z) ∈ {−1/2, 1/2} × {0, 1} and the probabilities P(U = 1/2 | X) and P(Z = 1 | X, U) depend on X.
v. Continuous and correlated with X: γ = UZ, where U | X ∼ U([0, 1/2]), Z ∈ {−1/2, 1/2} and P(Z = 1/2 | X, U) depends on X.

We finally consider n ∈ {500, 1,000, 2,000, 4,000}. With the DGPs above, the subsample effectively used in the estimation, namely {i ∈ {1, ..., n} : ∑_{t=1}^3 Y_it = 1}, represents on average 47.8% of the initial sample.

To compute the optimal GMM estimator, the usual practice is to estimate ĝ^* using an inefficient GMM estimator. However, in the current set-up, such estimators are often equal to zero if g is not chosen appropriately. To overcome this finite sample issue, we first use a rough estimator g̃^* of g^* based on the conditional maximum likelihood estimator β̂_CML of β_0, assuming a logistic distribution. Then, using g̃^*, we obtain an initial GMM estimator β̃, which allows us to compute a second (and consistent) estimator ĝ^* of g^*. Finally, we compute the asymptotically optimal GMM estimator β̂^* using ĝ^*.

Table 1: Simulation results for β̂^*

[Bias and RMSE of β̂^*, for each DGP i.-v., each sample size n and each value of w_1.]

Notes: β_0 = 1, λ_0 = (1, 5), w_2 = 1 − w_1. The optimal instruments are estimated using conditional means and β̂_CML. The results are based on 10,000 sample replications.
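For concreteness, the simulation design above can be mimicked as follows. This is our own sketch, with illustrative names: we use DGP ii, and we take λ = (1, 2) rather than the λ_0 of the text so that the inverse cdf of a first-type F reduces to a quadratic in e^x and can be solved in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0 = 10_000, 1.0
w1, w2 = 0.1, 0.9          # weights, with w2 = 1 - w1

def draw_eps(size, w1, w2, rng):
    # Inverse-cdf draw from a first-type F with lam = (1, 2), chosen here so
    # that F/(1-F) = w1 e^x + w2 e^{2x} = u/(1-u) is a quadratic in t = e^x,
    # solved by its positive root.
    u = rng.uniform(size=size)
    c = u / (1.0 - u)
    t = (-w1 + np.sqrt(w1 ** 2 + 4.0 * w2 * c)) / (2.0 * w2)
    return np.log(t)

# X_1, X_2, X_3: a random permutation of {-1, 0, 1}, as in the text
X = np.array([rng.permutation([-1.0, 0.0, 1.0]) for _ in range(n)])
# DGP ii: gamma discrete, independent of X, uniform on {-1/2, 0, 1/2}
gamma = rng.choice([-0.5, 0.0, 0.5], size=n)
eps = draw_eps((n, 3), w1, w2, rng)
Y = (beta0 * X + gamma[:, None] - eps >= 0.0).astype(float)

# Only observations with exactly one Y_t = 1 enter the moment function m
share_used = (Y.sum(axis=1) == 1).mean()
```

With w1 + w2 = 1, the median of ε is 0 (since G(0) = 1), which makes the share of informative observations easy to sanity-check against the figure reported in the text.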
For each DGP and the two values of w_1, Table 1 reports the estimated bias and root mean square error (RMSE) of β̂^*. The estimator β̂^* is precise in the absence of fixed effects. When fixed effects are introduced, the bias and RMSE vary with (w, λ_0). Overall, the results suggest that for a given sample size n, the bias and RMSE are lower when w_2 − w_1 increases, when |Supp(γ | X)| increases or when γ is uncorrelated with X. The second case is consistent with our conjecture about Assumption 4.

5 Conclusion

This paper addresses the problem of point identification of the common slope parameter in a static panel binary model with exogenous and bounded regressors. We derive necessary and sufficient conditions for global point identification based on a conditional moment restriction when T ≥ 3, and we show that the corresponding GMM estimator reaches the semiparametric efficiency bound when T = 3. Our paper leaves a few questions unanswered. A first one is whether the family of F considered here is the only one for which point identification can be achieved. Another one is whether the GMM estimator still reaches the semiparametric efficiency bound when T > 3. Both questions raise difficult issues and deserve future investigation.

References
Balakrishnan, N. and Leung, M. (1988), ‘Order statistics from the Type I generalized logistic distribution’, Communications in Statistics - Simulation and Computation 17(1), 25–50.

Bierens, H. J. (1990), ‘A consistent conditional moment test of functional form’, Econometrica 58(6), 1443–1458.

Chamberlain, G. (1987), ‘Asymptotic efficiency in estimation with conditional moment restrictions’, Journal of Econometrics 34(3), 305–334.

Chamberlain, G. (2010), ‘Binary response models for panel data: Identification and information’, Econometrica 78(1), 159–168.

Chernozhukov, V., Fernández-Val, I., Hahn, J. and Newey, W. (2013), ‘Average and quantile effects in nonseparable panel models’, Econometrica 81(2), 535–580.

Davezies, L., D’Haultfœuille, X. and Mugnier, M. (2020), ‘Online appendix for fixed effects binary choice models with three or more periods’, https://faculty.crest.fr/xdhaultfoeuille/wp-content/uploads/sites/9/2020/09/online-appendix.pdf.

Dominguez, M. A. and Lobato, I. N. (2004), ‘Consistent estimation of models defined by conditional moment restrictions’, Econometrica 72(5), 1601–1615.

Hahn, J. (1997), ‘A note on the efficient semiparametric estimation of some exponential panel models’, Econometric Theory 13(4), 583–588.

Hong, Y. and White, H. (1995), ‘Consistent specification testing via nonparametric series regression’, Econometrica 63(5), 1133–1159.

Honore, B. E. and Lewbel, A. (2002), ‘Semiparametric binary choice panel data models without strictly exogeneous regressors’, Econometrica 70(5), 2053–2063.

Hsu, S.-H. and Kuan, C.-M. (2011), ‘Estimation of conditional moment restrictions without assuming parameter identifiability in the implied unconditional moments’, Journal of Econometrics 165(1), 87–99.

Johnson, E. G. (2004), ‘Identification in discrete choice models with fixed effects’, Working paper, Bureau of Labor Statistics.

Krantz, S. and Parks, H. (2002), A Primer of Real Analytic Functions, Advanced Texts Series, Birkhäuser Boston.

Krein, M. and Nudelman, A. A. (1977), The Markov Moment Problem and Extremal Problems, American Mathematical Society.

Lavergne, P. and Patilea, V. (2013), ‘Smooth minimum distance estimation and testing with conditional estimating equations: uniform in bandwidth theory’, Journal of Econometrics 177(1), 47–59.

Magnac, T. (2004), ‘Binary variables and sufficiency: Generalizing conditional logit’, Econometrica 72(6), 1859–1876.

Magnac, T. and Maurin, E. (2007), ‘Identification and information in monotone binary models’, Journal of Econometrics 139(1), 76–104.

Manski, C. F. (1987), ‘Semiparametric analysis of random effects linear models from binary panel data’, Econometrica 55(2), 357–362.

Newey, W. K. (1990), ‘Efficient instrumental variables estimation of nonlinear models’, Econometrica 58(4), 809–837.

Rasch, G. (1960), Probabilistic Models for Some Intelligence and Attainment Tests, Copenhagen: Denmarks Paedagogiske Institute.

Stukel, T. A. (1988), ‘Generalized logistic models’, Journal of the American Statistical Association 83(402), 426–431.

van der Vaart, A. W. (2000), Asymptotic Statistics, Cambridge University Press.
A Proofs of the results
A.1 Proposition 2.1
The sufficiency part is obvious. To prove necessity, suppose β_0 ≠ 0. Since E[(X_t − X_{t'})(X_t − X_{t'})'] is nonsingular, there exists a subset S of the support of (X_t, X_{t'}) such that P(S) > 0 and, for all (x_t, x_{t'}) ∈ S, (x_t − x_{t'})'β_0 has constant, non-zero sign. Without loss of generality, let us assume (x_t − x_{t'})'β_0 > 0. Let G(x) = F(x)/(1 − F(x)). Because G is strictly increasing, we have, for all g ∈ R,

G(x_t'β_0 + g) > G(x_{t'}'β_0 + g).

Equivalently,

F(x_t'β_0 + g)(1 − F(x_{t'}'β_0 + g)) > F(x_{t'}'β_0 + g)(1 − F(x_t'β_0 + g)).

In other words,

P(Y_t = 1, Y_{t'} = 0 | X_t = x_t, X_{t'} = x_{t'}, γ = g) > P(Y_t = 0, Y_{t'} = 1 | X_t = x_t, X_{t'} = x_{t'}, γ = g),

and the result follows by integration over g.

A.2 Theorem 2.3
Let us define

A(x, γ; θ_0) := ( ∑_{j=1}^τ w_j exp(λ_j(x_1'β_0 + γ))   ...   ∑_{j=1}^τ w_j exp(λ_j(x_T'β_0 + γ))
                  exp(λ_1 x_1'β_0)                       ...   exp(λ_1 x_T'β_0)
                  ⋮                                            ⋮
                  exp(λ_τ x_1'β_0)                       ...   exp(λ_τ x_T'β_0) ).

Let A_i(x, γ; θ_0) denote the i-th row of A(x, γ; θ_0). Then

A_1(x, γ; θ_0) = ∑_{j=1}^τ w_j exp(λ_j γ) A_{j+1}(x, γ; θ_0).

It follows that for all (x, γ) ∈ Supp(X) × R,

det A(x, γ; θ_0) = 0.

By Assumption 2 and since we focus on the first type therein, we have G(x) := F(x)/(1 − F(x)) = ∑_{j=1}^{T−1} w_j exp(λ_j x). Now, developing det A(x, γ; θ_0) with respect to the first row yields, by definition of the function m,

∑_{y ∈ {0,1}^T} m(y, x; θ_0) ∏_{t: y_t = 1} G(x_t'β_0 + γ) = 0.

Multiplying this equality by ∏_t (1 − F(x_t'β_0 + γ)), we obtain

∑_{y ∈ {0,1}^T} m(y, x; θ_0) ∏_{t: y_t = 1} F(x_t'β_0 + γ) ∏_{t: y_t = 0} (1 − F(x_t'β_0 + γ)) = 0.

This equation is equivalent to E[m(Y, X; θ_0) | X, γ] = 0 a.s. The result follows by integration over γ.

A.3 Proposition 2.4
Let us suppose that θ = (β', λ')' ∈ Θ satisfies

E[m(Y, X; θ) | X] = 0,   (A.1)

and let us show that θ = θ_0. Since γ = γ̄ almost surely for some γ̄, Equation (A.1) is equivalent to:

∑_{i=1}^τ w_i exp(λ_{0i} γ̄) det(A^i(x)) = 0,   (A.2)

for almost all x ∈ Supp(X), with

A^i(x) := ( exp(λ_{0i} x_1'β_0)   ...   exp(λ_{0i} x_T'β_0)
            exp(λ_1 x_1'β)        ...   exp(λ_1 x_T'β)
            ⋮                           ⋮
            exp(λ_τ x_1'β)        ...   exp(λ_τ x_T'β) ).

Let S denote the subset of Supp(X) on which (A.2) holds. Further, let X(β) = {x ∈ S : |{x_2'β, ..., x_T'β}| = T − 1}. We first show that

P(X(β)) > 0.   (A.3)

This is trivial for T = 2. Otherwise, note first that there exists k_0 such that β_{k_0} ≠ 0. Then:

X(β) = { x ∈ S : ∀ t ≥ 3, x_{k_0,t} ∉ { (x_2'β − x_{−k_0,t}'β_{−k_0})/β_{k_0}, ..., (x_{t−1}'β − x_{−k_0,t}'β_{−k_0})/β_{k_0} } }.

The condition |Supp(X_{k_0,t} | X_2, ..., X_{t−1}, X_{−k_0,t})| ≥ 2τ for all t ≥ 3 (with X_{−k_0,t} = (X_{j,t})_{j≠k_0}) ensures that almost surely,

Supp(X_{k_0,t} | X_2, ..., X_{t−1}, X_{−k_0,t}) ⊄ { (X_2'β − X_{−k_0,t}'β_{−k_0})/β_{k_0}, ..., (X_{t−1}'β − X_{−k_0,t}'β_{−k_0})/β_{k_0} },

and thus (A.3) holds.

Now fix x ∈ X(β) and k ∈ {1, ..., K}. Using again |Supp(X_{k,t} | X_{−k}, X_{k,−t})| ≥ 2τ, there exists A ⊂ R, |A| ≥ 2τ, such that for all x̃ verifying x̃_{j,t} = x_{j,t} for j ≠ k or t ≠ 1 and x̃_{k,1} = x_{k,1} + v, v ∈ A, we have x̃ ∈ S. Applying (A.2) to such x̃ and developing each determinant with respect to the first column, we obtain that for all v ∈ A,

∑_{j=1}^τ ( (−1)^j exp(λ_j x_1'β) ∑_{i=1}^τ w_i exp(λ_{0i} γ̄) det(A^i_{{j+1},{1}}(x)) ) exp(λ_j β_k v)
+ ∑_{i=1}^τ w_i exp(λ_{0i}(x_1'β_0 + γ̄)) det(A^i_{{1},{1}}(x)) exp(λ_{0i} β_{0k} v) = 0,   (A.4)

where A^i_{j,k}(x) denotes the sub-matrix of A^i(x) once row j and column k have been removed.

We first assume that β_{0k} ≠ 0. Suppose that there exists i_0 such that for all j ∈ {1, ..., τ}, λ_j β_k ≠ λ_{0i_0} β_{0k}. The left-hand side of (A.4) is a polynomial of exponential functions with at most 2τ distinct exponential functions and it is equal to 0 on 2τ distinct points v. Then, by Lemma B.1 and because the coefficient of exp(λ_{0i_0} β_{0k} v) is w_{i_0} exp(λ_{0i_0}(x_1'β_0 + γ̄)) det(A^{i_0}_{{1},{1}}(x)), we have

det(A^{i_0}_{{1},{1}}(x)) = 0.   (A.5)

Now, because |{x_2'β, ..., x_T'β}| = T − 1, the definition of Chebyshev systems implies that det(A^{i_0}_{{1},{1}}(x)) ≠ 0, a contradiction. Hence, for all i ∈ {1, ..., τ}, there exists ℓ(i) such that λ_{ℓ(i)} β_k = λ_{0i} β_{0k}. Because λ_{ℓ(i)} and λ_{0i} are both positive, the sign of β_k is then equal to the sign of β_{0k}. Let us suppose without loss of generality (since β_{0k} ≠ 0) that β_{0k} > 0. Then λ_{01} β_{0k} < ... < λ_{0τ} β_{0k}, implying, since β_k > 0, that λ_{ℓ(1)} < ... < λ_{ℓ(τ)}. Hence, ℓ(i) = i for all i ∈ {1, ..., τ} and i = 1 yields β_k = β_{0k}. In turn, this latter equality implies that λ = λ_0.

We now consider the case β_{0k} = 0. Let us assume that β_k ≠ 0. The left-hand side of (A.4) is a polynomial of exponential functions with at most τ + 1 distinct exponential functions (since λ_{0j} β_{0k} = 0 for all j) and it is equal to 0 on 2τ distinct points v. Then, by Lemma B.1,

∑_{i=1}^τ w_i exp(λ_{0i}(x_1'β_0 + γ̄)) det(A^i_{{1},{1}}(x)) = 0.

Now, notice that det(A^i_{{1},{1}}(x)) does not depend on i. As a result,

∑_{i=1}^τ w_i exp(λ_{0i}(x_1'β_0 + γ̄)) = 0,

which is a contradiction. Hence β_k = 0 = β_{0k}. Note that we do not identify λ in this case, but its identification is achieved by the previous paragraph, since there exists k such that β_{0k} ≠ 0. This concludes the proof.

A.4 Proposition 2.5
1. Without loss of generality, we assume hereafter that t = 1 so that t , t ≥
2. Letus suppose that θ = ( β, λ ) ∈ Θ satisfies E [ m ( Y, X ; θ ) | X ] = 0 , (A.6)and let us show that θ = θ . Equation (A.6) is equivalent to τ X i =1 w i E exp( λ i γ ) C ( γ, x ; θ , (cid:16) P τj =1 w j exp( λ j ( x β + γ )) (cid:17) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X = x × det exp( λ i x β ) . . . exp( λ i x T β )exp( λ x β ) . . . exp( λ x T β )... ...exp( λ τ x β ) . . . exp( λ τ x T β ) = 0 , (A.7)for almost all x ∈ Supp( X ). Let S denote the subset of Supp( X ) on which (A.7)holds. Further, let X ( β ) = { x ∈ S : |{ x β, . . . , x T β }| = T } . By Assumption 3,19 ( X ( β )) = 1. Now, fix x ∈ X ( β ). By Assumption 3 again, there exist ε ≤ ≤ ε withmax( − ε, ε ) >
0, such that for almost every e x verifying e x t = x t for t > e x j = x j for j = k , | e x k, − x k, | ∈ [ ε, ε ], we have e x ∈ Supp( X ). Applying (A.7) to such e x andusing X k ⊥⊥ γ | X − k , we obtain τ X i =1 w i a ,i,x ( v ) det (cid:16) A i ( v ) (cid:17) = 0 , (A.8)for almost every v ∈ [ ε, ε ], with A i ( v ) = exp( λ i ( x β + β k v )) . . . exp( λ i x T β )exp( λ ( x β + β k v )) . . . exp( λ x T β )... ...exp( λ τ ( x β + β k v )) . . . exp( λ τ x T β ) . Let A iJ,K ( v ) denote the sub-matrix of A i ( v ) once the rows and columns with indicesin J ⊂ { , . . . , T } and K ⊂ { , . . . , T } , respectively, have been removed. We simplynote A iJ,K when A iJ,K ( v ) does not depend on v . Then, developping each A i ( v ) withrespect to the first column, we obtain, for almost every v ∈ [ ε, ε ], τ X i =1 w i " det (cid:16) A i { } , { } (cid:17) exp( λ i ( x β + β k v )) a ,i,x ( v )+ τ X j =1 ( − j det (cid:16) A i { j +1 } , { } (cid:17) exp( λ j ( x β + β k v )) a ,i,x ( v ) = 0 . (A.9)Now, by Lemma B.2, the left-hand side of (A.9) is real analytic, where we recall thata function f : I → R is real analytic if f is equal to its Taylor series at every point of I . Then, by the continuation theorem for real analytic functions (see e.g. Corollary1.2.5 in Krantz and Parks, 2002), (A.8) holds for all v ∈ R . Now, fix i ∈ { , . . . , τ } and let us assume that there is no t ( i ) ∈ { , . . . , τ } such that λ t ( i ) β k = λ i β k . Then,Assumption 4.2 ensures that the functions of v in (A.8) are linearly independent, sothat det (cid:16) A i { t } , { } (cid:17) = 0 , ∀ t ∈ { , . . . , T } , (A.10)Because |{ ( x β, . . . , x T β }| = T , we have, by definition of Chebyshev systems,det (cid:16) A i { } , { } (cid:17) = 0 , i ∈ { , . . . , τ } , there exists t ( i ) such that λ t ( i ) β k = λ i β k . Because λ t ( i ) and λ i are both positive, the sign of β k is then equalto the sign of β k . 
Let us suppose without loss of generality (since, by Assumption 3, $\beta_k^0 \ne 0$) that $\beta_k^0 > 0$. Then $\lambda_1 \beta_k < \dots < \lambda_\tau \beta_k$, implying, since $\beta_k^0 > 0$, that $\lambda_{t(1)}^0 < \dots < \lambda_{t(\tau)}^0$. Hence, $t(i) = i$ for all $i \in \{1, \dots, \tau\}$, and $i = 1$ yields $\beta_k = \beta_k^0$. In turn, this latter equality implies that $\lambda = \lambda^0$.

Now, in (A.8), $\lambda_i = \lambda_i^0$ for all $i \in \{1, \dots, \tau\}$ and $\beta_k = \beta_k^0$. With $\lambda$ replaced by $\lambda^0$ and $\beta_k$ replaced by $\beta_k^0$, (A.9) and Assumption 4.2 still imply that for all $i \in \{1, \dots, \tau\}$,
\[
\det\!\left(A^i_{\{t\},\{1\}}\right) = 0, \quad \forall t \in \{1, \dots, T\} \setminus \{i+1\}. \tag{A.11}
\]
Because $|\{x_1'\beta^0, \dots, x_T'\beta^0\}| = T$, we have, by definition of Chebyshev systems,
\[
\det\!\left(A^i_{\{1,t\},\{1,n\}}\right) \ne 0, \quad \forall (t, n) \in \left(\{2, \dots, T\} \setminus \{i+1\}\right) \times \{2, \dots, T\}.
\]
This, together with (A.11), implies that for $t \in \{2, \dots, T\} \setminus \{i+1\}$, the first row of $A^i_{\{t\},\{1\}}$ is a non-trivial linear combination of the other rows. In other words, for all $t \ne i$, there exists a non-zero vector $(\overline{w}_{t,j})_{j=1,\dots,\tau}$ with $\overline{w}_{t,t} = 0$ such that for all $s \ge 2$,
\[
\exp(\lambda_i x_s'\beta) = \sum_{j=1}^{\tau} \overline{w}_{t,j} \exp(\lambda_j^0 x_s'\beta^0). \tag{A.12}
\]
Let us define $P_t(u) = \sum_{j=1}^{\tau} \overline{w}_{t,j} \exp(\lambda_j^0 u)$ for all $t \in \{1, \dots, \tau\}$. Then, for all $s \ge 2$,
\[
P_1(x_s'\beta^0) = \dots = P_{i-1}(x_s'\beta^0) = P_{i+1}(x_s'\beta^0) = \dots = P_\tau(x_s'\beta^0).
\]
Moreover, because $x \in \mathcal{X}(\beta)$, we have $|\{x_2'\beta^0, \dots, x_T'\beta^0\}| = \tau$. Then, by Lemma B.1, all the $P_t$, $t \ne i$, are equal as functions. But this implies that for all $(t, j) \in (\{1, \dots, \tau\} \setminus \{i\})^2$, $\overline{w}_{t,j} = \overline{w}_{j,j} = 0$. Therefore, by (A.12) again, there exist strictly positive constants $(c_1, \dots, c_\tau) \in (0, \infty)^\tau$ such that $\exp(\lambda_i x_t'\beta) = c_i \exp(\lambda_i^0 x_t'\beta^0)$ for all $t \ge 2$. In other words, there exists $K \in \mathbb{R}$ such that for all $t \ge 2$,
\[
x_t'(\beta - \beta^0) = K. \tag{A.13}
\]
This equality holds in particular for the periods $t$ and $t'$ in Assumption 3.1. Moreover, because $x \in \mathcal{X}(\beta)$ was arbitrary and $P(\mathcal{X}(\beta)) = 1$, this implies that almost surely, $(X_t - X_{t'})'(\beta - \beta^0) = 0$. The first part of Assumption 3 implies $\beta = \beta^0$, which ends the proof.

We follow the exact same reasoning, except that $\lambda^0$ in $\theta^0$ is replaced by $\tilde{\lambda}$. In particular, we obtain the same equation as (A.9), with $\tilde{\lambda}$ in place of $\lambda^0$. Then (A.10) holds under Assumption 3'.2 instead of Assumption 3.2. This implies that $\beta_k = \beta_k^0$. The proof that $\beta_j = \beta_j^0$ for $j \ne k$ is exactly as above.

A.5 Proposition 2.6
We leave $x$ and the conditioning on $X = x$ implicit here. We also let $C(\gamma) := C(\gamma, x; \theta^0, t)$, $\alpha_i := w_i \delta_i(x; \theta^0, t)$, $a_i := \lambda_i \beta_k$, $b_i := \lambda_i^0 \beta_k^0$, $\{\gamma_1, \gamma_2\} := \mathrm{Supp}(\gamma \mid X = x)$, and let $(q_1, q_2)$ denote the corresponding probabilities. We must prove that for all $\mu = (\mu_{j\ell})_{j=0,1,2,\, \ell=1,2}$, if for all $v \in \mathbb{R}$,
\[
\sum_{j=1}^{2} e^{a_j v} \sum_{p=1}^{2} q_p \frac{C(\gamma_p)}{1 + \sum_{i=1}^{2} \alpha_i e^{\lambda_i^0 \gamma_p} e^{b_i v}} \left(\sum_{\ell=1}^{2} \mu_{j\ell} e^{\lambda_\ell^0 \gamma_p}\right) + \sum_{\ell=1}^{2} e^{b_\ell v} \sum_{p=1}^{2} q_p \mu_{0\ell} e^{\lambda_\ell^0 \gamma_\ell} \frac{C(\gamma_p)}{1 + \sum_{i=1}^{2} \alpha_i e^{\lambda_i^0 \gamma_p} e^{b_i v}} = 0,
\]
then $\mu = 0$. Let us define, for $p \in \{1, 2\}$,
\[
f_{j,p}(v) = \begin{cases} \dfrac{e^{a_j v}}{1 + \sum_{i=1}^{2} \alpha_i e^{\lambda_i^0 \gamma_p} e^{b_i v}} & \text{if } j \in \{1, 2\}, \\[6pt] \dfrac{e^{b_{j-2} v}}{1 + \sum_{i=1}^{2} \alpha_i e^{\lambda_i^0 \gamma_p} e^{b_i v}} & \text{if } j \in \{3, 4\}, \end{cases} \qquad G_{j,p}(\mu) = \begin{cases} q_p C(\gamma_p) \sum_{\ell=1}^{2} \mu_{j\ell} e^{\lambda_\ell^0 \gamma_p} & \text{if } j \in \{1, 2\}, \\[6pt] q_p \mu_{0,j-2}\, e^{\lambda_{j-2}^0 \gamma_{j-2}} C(\gamma_p) & \text{if } j \in \{3, 4\}. \end{cases}
\]
Then Assumption 4'.2 can be rewritten as follows:
\[
\sum_{j=1}^{4} \sum_{p=1}^{2} G_{j,p}(\mu) f_{j,p}(v) = 0 \ \ \forall v \in \mathbb{R} \ \Rightarrow \ \mu = 0. \tag{A.14}
\]
To prove (A.14), first remark that if $G_{j,p}(\mu) = 0$ for all $(j, p)$, then $\mu = 0$. This is trivial for the $\mu_{0\ell}$. For the $\mu_{j\ell}$, $j \ge 1$, this follows from Lemma B.1. Thus, Assumption 4'.2 holds if the family $(f_{j,p})_{j=1,\dots,4,\, p=1,2}$ is free, i.e. if for all $\nu = (\nu_{jp})_{j=1,\dots,4,\, p=1,2}$,
\[
\sum_{j=1}^{4} \sum_{p=1}^{2} \nu_{jp} f_{j,p}(v) = 0 \ \ \forall v \in \mathbb{R} \ \Rightarrow \ \nu = 0.
\]
Multiplying by the (positive) product of the two denominators, this amounts to showing that if, for all $v \in \mathbb{R}$,
\[
\begin{aligned}
& (\nu_{11} + \nu_{12}) e^{a_1 v} + \alpha_1 (\nu_{11} e^{\lambda_1^0 \gamma_2} + \nu_{12} e^{\lambda_1^0 \gamma_1}) e^{(a_1 + b_1) v} + \alpha_2 (\nu_{11} e^{\lambda_2^0 \gamma_2} + \nu_{12} e^{\lambda_2^0 \gamma_1}) e^{(a_1 + b_2) v} \\
+\; & (\nu_{21} + \nu_{22}) e^{a_2 v} + \alpha_1 (\nu_{21} e^{\lambda_1^0 \gamma_2} + \nu_{22} e^{\lambda_1^0 \gamma_1}) e^{(a_2 + b_1) v} + \alpha_2 (\nu_{21} e^{\lambda_2^0 \gamma_2} + \nu_{22} e^{\lambda_2^0 \gamma_1}) e^{(a_2 + b_2) v} \\
+\; & (\nu_{31} + \nu_{32}) e^{b_1 v} + \alpha_1 (\nu_{31} e^{\lambda_1^0 \gamma_2} + \nu_{32} e^{\lambda_1^0 \gamma_1}) e^{2 b_1 v} + \alpha_2 (\nu_{31} e^{\lambda_2^0 \gamma_2} + \nu_{32} e^{\lambda_2^0 \gamma_1}) e^{(b_1 + b_2) v} \\
+\; & (\nu_{41} + \nu_{42}) e^{b_2 v} + \alpha_1 (\nu_{41} e^{\lambda_1^0 \gamma_2} + \nu_{42} e^{\lambda_1^0 \gamma_1}) e^{(b_1 + b_2) v} + \alpha_2 (\nu_{41} e^{\lambda_2^0 \gamma_2} + \nu_{42} e^{\lambda_2^0 \gamma_1}) e^{2 b_2 v} = 0,
\end{aligned}
\]
then $\nu := (\nu_{11}, \nu_{12}, \nu_{21}, \nu_{22}, \nu_{31}, \nu_{32}, \nu_{41}, \nu_{42}) = 0$. The proof of this point, which is long and cumbersome, is detailed in our online Appendix (Davezies et al., 2020).

A.6 Proposition 2.7
Let us suppose that Assumption 5 fails. Without loss of generality, assume that $X_1 = X_2$ almost surely. Let us define $y^1 := (1, 0, \dots, 0)'$, $y^2 := (0, 1, 0, \dots, 0)'$ and $f(x; \beta) := E[m(Y, X; \beta, \lambda) \mid X = x]$. By definition,
\[
f(X; \beta) = \sum_{y \in \{0,1\}^T} P(Y = y \mid X)\, m(y, X; \beta). \tag{A.15}
\]
Moreover, almost surely,
\[
\begin{aligned}
P(Y = y^1 \mid X) &= \int F(X_1'\beta^0 + \gamma)(1 - F(X_2'\beta^0 + \gamma))(1 - F(X_3'\beta^0 + \gamma)) \cdots (1 - F(X_T'\beta^0 + \gamma))\, \mathrm{d}F_{\gamma|X}(\gamma) \\
&= \int F(X_2'\beta^0 + \gamma)(1 - F(X_1'\beta^0 + \gamma))(1 - F(X_3'\beta^0 + \gamma)) \cdots (1 - F(X_T'\beta^0 + \gamma))\, \mathrm{d}F_{\gamma|X}(\gamma) \\
&= P(Y = y^2 \mid X), \tag{A.16}
\end{aligned}
\]
where the second equality uses $X_1 = X_2$. Next,
\[
m(y^1, X; \beta) = \det \begin{pmatrix} \exp(\lambda_1 X_2'\beta) & \dots & \exp(\lambda_1 X_T'\beta) \\ \vdots & & \vdots \\ \exp(\lambda_{T-1} X_2'\beta) & \dots & \exp(\lambda_{T-1} X_T'\beta) \end{pmatrix} = \det \begin{pmatrix} \exp(\lambda_1 X_1'\beta) & \exp(\lambda_1 X_3'\beta) & \dots & \exp(\lambda_1 X_T'\beta) \\ \vdots & & & \vdots \\ \exp(\lambda_{T-1} X_1'\beta) & \exp(\lambda_{T-1} X_3'\beta) & \dots & \exp(\lambda_{T-1} X_T'\beta) \end{pmatrix} = -m(y^2, X; \beta), \tag{A.17}
\]
where the second equality again uses $X_1 = X_2$. Moreover, for all $y$ such that $\sum_t y_t = 1$ and $y \notin \{y^1, y^2\}$, $m(y, X; \beta) = 0$ because the corresponding cofactor includes two identical columns (since $X_1 = X_2$). Finally, if $\sum_t y_t \ne 1$, we also have $m(y, X; \beta) = 0$. In view of (A.15), these last points, combined with (A.16)-(A.17), imply $f(X; \beta) = 0$. Since $\beta$ was arbitrary, it means that (2.3) does not identify $\beta^0$. The result follows.

A.7 Theorem 3.1
Let us first summarize the proof. We link the current model with a “complete” model where $\gamma$ is also observed. This model is fully parametric and thus can be analyzed easily. Specifically, we show in a first step that this complete model is differentiable in quadratic mean (see, e.g., van der Vaart, 2000, pp. 64-65 for a definition) and has a nonsingular information matrix. In a second step, we establish an abstract expression for the semiparametric efficiency bound. This expression involves in particular the kernel $\mathcal{K}$ of the conditional expectation operator $g \mapsto E[g(X, Y) \mid X, \gamma]$. In a third step, we show that
\[
\mathcal{K} = \left\{(x, y) \mapsto q(x)\, m(x, y; \beta^0) : E[q^2(X)] < \infty\right\}. \tag{A.18}
\]
The fourth step of the proof concludes.

First step: the complete model is differentiable in quadratic mean and has a nonsingular information matrix.
Let $p(y \mid x, g; \beta) = P(Y = y \mid X = x, \gamma = g; \beta)$. We check that the conditions of Lemma 7.6 in van der Vaart (2000) hold. Under Assumptions 1-2, we have
\[
p(y \mid x, g; \beta) = \prod_{t: y_t = 1} F(x_t'\beta + g) \prod_{t: y_t = 0} \left(1 - F(x_t'\beta + g)\right),
\]
where $F$ is $C^\infty$ on $\mathbb{R}$ and takes values in $(0, 1)$. Hence, $\beta \mapsto \ln p(y \mid x, g; \beta)$ is differentiable. Let $S_\beta = \partial \ln p(Y \mid X, \gamma; \beta^0)/\partial\beta$ and let $S_{\beta k}$ denote its $k$-th component. We prove that $E[S_{\beta k}^2] < \infty$. First, remark that
\[
S_{\beta k} = \sum_{t=1}^{T} X_{k,t} \frac{f(X_t'\beta^0 + \gamma)}{F(X_t'\beta^0 + \gamma)\left[1 - F(X_t'\beta^0 + \gamma)\right]} \left[Y_t - F(X_t'\beta^0 + \gamma)\right].
\]
Then,
\[
|S_{\beta k}| \le \sum_{t=1}^{T} |X_{k,t}| \frac{f(X_t'\beta^0 + \gamma)}{F(X_t'\beta^0 + \gamma)(1 - F(X_t'\beta^0 + \gamma))} = \sum_{t=1}^{T} |X_{k,t}| \frac{\sum_{j=1}^{T-1} w_j \lambda_j e^{\lambda_j (X_t'\beta^0 + \gamma)}}{\sum_{j=1}^{T-1} w_j e^{\lambda_j (X_t'\beta^0 + \gamma)}} \le \lambda_\tau \sum_{t=1}^{T} |X_{k,t}|, \tag{A.19}
\]
where we have used the triangle inequality and $|Y_t - F(X_t'\beta^0 + \gamma)| \le 1$. It follows that $E[S_{\beta k}^2] < \infty$. By the dominated convergence theorem and again (A.19), $\beta \mapsto E[S_\beta S_\beta']$ is continuous. Therefore, the conditions in Lemma 7.6 in van der Vaart (2000) hold, and the complete model is differentiable in quadratic mean. Moreover,
\[
E[S_\beta S_\beta'] = E[V(S_\beta \mid X, \gamma)] = \sum_{t=1}^{T} E\left[\frac{f^2(X_t'\beta^0 + \gamma)}{F(X_t'\beta^0 + \gamma)\left[1 - F(X_t'\beta^0 + \gamma)\right]} X_t X_t'\right].
\]
Then, if for some $\lambda \in \mathbb{R}^K$, $\lambda' E[S_\beta S_\beta'] \lambda = 0$, we would have $X_t'\lambda = 0$ almost surely for all $t \in \{1, \dots, T\}$. By Assumption 3.1, this implies $\lambda = 0$. Hence, the information matrix $E[S_\beta S_\beta']$ is nonsingular.

Second step: $V^*$ depends on the orthogonal projection of $E[S_\beta \mid X, Y]$ on $\mathcal{K}$.

Let $\tilde{\psi} = (\tilde{\psi}_1, \dots, \tilde{\psi}_K)'$ denote the efficient influence function, as defined p. 363 of van der Vaart (2000). Then $V^* = E[\tilde{\psi}\tilde{\psi}']$ and $E[\tilde{\psi}] = 0$. Let $\mathcal{S} = \mathrm{span}(S_\beta)$, $\mathcal{G} = \{q : E[q^2(X, \gamma)] < \infty,\ E[q(X, \gamma)] = 0\}$ and, for any closed convex set $A$ and any $h = (h_1, \dots, h_K)'$, let $\Pi_A$ denote the orthogonal projection on $A$ and $\Pi_A(h) = (\Pi_A(h_1), \dots, \Pi_A(h_K))'$.
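The score formula in the first step can be checked numerically. The sketch below is a minimal illustration, not the paper's estimator: it assumes the form $F = G/(1+G)$ with $G(v) = \sum_j w_j e^{\lambda_j v}$, which is consistent with the ratio $f/[F(1-F)] = \sum_j w_j \lambda_j e^{\lambda_j(\cdot)}/\sum_j w_j e^{\lambda_j(\cdot)}$ used in (A.19); the values of $w_j$, $\lambda_j$, the covariates and the outcomes are made up.

```python
import numpy as np

# Minimal numerical check (hypothetical w_j, lambda_j and data): the analytic
# score S_beta of the complete model, using f/[F(1-F)] = G'/G for
# G(v) = sum_j w_j exp(lambda_j v) and F = G/(1+G), matches a central
# finite-difference gradient of the complete-model log-likelihood.
w = np.array([0.7, 0.3])
lam = np.array([1.0, 1.8])

def G(v):
    return (w * np.exp(np.outer(np.atleast_1d(v), lam))).sum(axis=-1)

def F(v):
    return G(v) / (1.0 + G(v))

def dlogG(v):  # G'/G = f / [F (1 - F)]
    e = w * np.exp(np.outer(np.atleast_1d(v), lam))
    return (e * lam).sum(axis=-1) / e.sum(axis=-1)

X = np.array([[0.5, -1.0], [1.2, 0.3], [-0.4, 0.8]])  # T = 3 periods, K = 2
y = np.array([1.0, 0.0, 1.0])                          # made-up outcomes
beta, gamma = np.array([0.6, -0.9]), 0.2               # made-up parameters

def loglik(b):
    v = X @ b + gamma
    return np.sum(y * np.log(F(v)) + (1 - y) * np.log(1 - F(v)))

v = X @ beta + gamma
score = X.T @ (dlogG(v) * (y - F(v)))   # analytic S_beta at (beta, gamma)

eps, num = 1e-6, np.zeros(2)
for k in range(2):                       # central differences, coordinate k
    d = np.eye(2)[k] * eps
    num[k] = (loglik(beta + d) - loglik(beta - d)) / (2 * eps)

print(np.allclose(score, num, atol=1e-6))  # True
```

The agreement confirms the simplification $y/F - (1-y)/(1-F)$ times $f$ reducing to $(G'/G)(y - F)$, which is what makes the bound (A.19) immediate.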
By Equation (25.29), Lemma 25.34 (since the complete model is differentiable in quadratic mean by the first step) and the same reasoning as in Example 25.36 of van der Vaart (2000), $\tilde{\psi}$ is the function of $(X, Y)$ of minimal $L^2$-norm satisfying
\[
\tilde{\chi} = \Pi_{\mathcal{S} + \mathcal{G}}(\tilde{\psi}), \tag{A.20}
\]
where $\tilde{\chi}$ is the efficient influence function of the large model. Because this large model is parametric, we have
\[
\tilde{\chi} = E[S_\beta S_\beta']^{-1} S_\beta. \tag{A.21}
\]
Equation (A.20) implies $E[(\tilde{\psi} - \tilde{\chi})\tilde{\chi}'] = 0$. Thus, defining $\ell_\beta = E[S_\beta \mid Y, X]$, we get
\[
E[\tilde{\psi}\ell_\beta'] = E[\tilde{\psi}S_\beta'] = \mathrm{Id}. \tag{A.22}
\]
Moreover, because $E[S_\beta \mid X, \gamma] = 0$, $\mathcal{S}$ and $\mathcal{G}$ are orthogonal. Thus, (A.20) is equivalent to $\Pi_{\mathcal{S}}(\tilde{\chi}) = \Pi_{\mathcal{S}}(\tilde{\psi})$ and $\Pi_{\mathcal{G}}(\tilde{\chi}) = \Pi_{\mathcal{G}}(\tilde{\psi})$. Moreover, (A.21) implies that $\Pi_{\mathcal{G}}(\tilde{\chi}) = 0$. Hence, $\tilde{\psi} \in \mathcal{K}^K$. Now, because $\Pi_{\mathcal{K}}$ is an orthogonal projector, we have
\[
E\left[\tilde{\psi}\,\Pi_{\mathcal{K}}(\ell_\beta)'\right] = E\left[\Pi_{\mathcal{K}}(\tilde{\psi})\,\ell_\beta'\right] = E[\tilde{\psi}\ell_\beta'] = \mathrm{Id},
\]
where the last equality follows by (A.22). Hence, if $\Pi_{\mathcal{K}}(\ell_\beta)'\lambda = 0$ a.s., we would have $\lambda = 0$. In other words, $E[\Pi_{\mathcal{K}}(\ell_\beta)\Pi_{\mathcal{K}}(\ell_\beta)']$ is nonsingular. Now, consider the set
\[
\mathcal{F} = \left\{E[\Pi_{\mathcal{K}}(\ell_\beta)\Pi_{\mathcal{K}}(\ell_\beta)']^{-1}\Pi_{\mathcal{K}}(\ell_\beta) + v : E[v\,\Pi_{\mathcal{K}}(\ell_\beta)'] = 0\right\}.
\]
$\mathcal{F}$ is thus the set of vector-valued functions $\psi$ satisfying the equation $E[\psi\,\Pi_{\mathcal{K}}(\ell_\beta)'] = \mathrm{Id}$. Hence, $\tilde{\psi}$ being the element of $\mathcal{F}$ with minimum $L^2$-norm, we obtain
\[
\tilde{\psi} = E[\Pi_{\mathcal{K}}(\ell_\beta)\Pi_{\mathcal{K}}(\ell_\beta)']^{-1}\Pi_{\mathcal{K}}(\ell_\beta).
\]
Finally, because $V^* = E[\tilde{\psi}\tilde{\psi}']$,
\[
V^* = E[\Pi_{\mathcal{K}}(\ell_\beta)\Pi_{\mathcal{K}}(\ell_\beta)']^{-1}. \tag{A.23}
\]

Third step: (A.18) holds.
Let $r \in \mathcal{K}$ and let us prove that $r(y, x) = q(x)\, m(y, x; \beta^0)$ for some $q$. First, by definition of $\mathcal{K}$, we have, for almost all $(g, x) \in \mathrm{Supp}(\gamma, X)$,
\[
\begin{aligned}
0 = \; & r((0,0,0), x) + r((1,0,0), x)\, G(x_1'\beta^0 + g) + r((0,1,0), x)\, G(x_2'\beta^0 + g) \\
& + r((0,0,1), x)\, G(x_3'\beta^0 + g) + r((1,1,0), x)\, G(x_1'\beta^0 + g) G(x_2'\beta^0 + g) \\
& + r((1,0,1), x)\, G(x_1'\beta^0 + g) G(x_3'\beta^0 + g) + r((0,1,1), x)\, G(x_2'\beta^0 + g) G(x_3'\beta^0 + g) \\
& + r((1,1,1), x)\, G(x_1'\beta^0 + g) G(x_2'\beta^0 + g) G(x_3'\beta^0 + g). \tag{A.24}
\end{aligned}
\]
Let $a_t := x_t'\beta^0$ for $t \in \{1, 2, 3\}$ and, for the sake of conciseness, let us remove the dependence of $r$ on $x$. Then, using Assumption 2, we obtain, for almost all $(g, x)$,
\[
0 = A_0 e^{0 \times g} + A_1 e^{g} + A_2 e^{\lambda g} + A_3 e^{2g} + A_4 e^{(1+\lambda)g} + A_5 e^{2\lambda g} + A_6 e^{3g} + A_7 e^{(2+\lambda)g} + A_8 e^{(1+2\lambda)g} + A_9 e^{3\lambda g},
\]
with
\[
\begin{aligned}
A_0 &= r(0,0,0), \\
A_1 &= w_1\left[r(1,0,0) e^{a_1} + r(0,1,0) e^{a_2} + r(0,0,1) e^{a_3}\right], \\
A_2 &= w_2\left[r(1,0,0) e^{\lambda a_1} + r(0,1,0) e^{\lambda a_2} + r(0,0,1) e^{\lambda a_3}\right], \\
A_3 &= w_1^2\left[r(1,1,0) e^{a_1 + a_2} + r(1,0,1) e^{a_1 + a_3} + r(0,1,1) e^{a_2 + a_3}\right], \\
A_4 &= w_1 w_2\left[r(1,1,0)\left(e^{a_1 + \lambda a_2} + e^{a_2 + \lambda a_1}\right) + r(1,0,1)\left(e^{a_1 + \lambda a_3} + e^{a_3 + \lambda a_1}\right) + r(0,1,1)\left(e^{a_2 + \lambda a_3} + e^{a_3 + \lambda a_2}\right)\right], \\
A_5 &= w_2^2\left[r(1,1,0) e^{\lambda(a_1 + a_2)} + r(1,0,1) e^{\lambda(a_1 + a_3)} + r(0,1,1) e^{\lambda(a_2 + a_3)}\right], \\
A_6 &= w_1^3\, r(1,1,1) e^{a_1 + a_2 + a_3}, \\
A_7 &= w_1^2 w_2\, r(1,1,1)\left[e^{a_1 + a_2 + \lambda a_3} + e^{a_1 + \lambda a_2 + a_3} + e^{\lambda a_1 + a_2 + a_3}\right], \\
A_8 &= w_1 w_2^2\, r(1,1,1)\left[e^{a_1 + \lambda(a_2 + a_3)} + e^{a_2 + \lambda(a_1 + a_3)} + e^{a_3 + \lambda(a_1 + a_2)}\right], \\
A_9 &= w_2^3\, r(1,1,1) e^{\lambda(a_1 + a_2 + a_3)}.
\end{aligned}
\]
Since $\lambda = 2$ is excluded by assumption, there are three cases left depending on the number of different exponents in Equation (A.24).

First, we consider $\lambda \notin \{3/2, 3\}$. By Lemma B.1 and because $|\mathrm{Supp}(\gamma \mid X)| \ge 10$, we obtain $A_k = 0$ for all $k \in \{0, \dots, 9\}$. $A_0 = A_9 = 0$ imply that $r(0,0,0) = r(1,1,1) = 0$. Next, $A_3 = A_5 = 0$ implies that either $r(1,1,0) = r(1,0,1) = r(0,1,1) = 0$ or
\[
\begin{aligned}
r(1,1,0) &= -r(1,0,1) e^{\lambda(a_3 - a_2)} - r(0,1,1) e^{\lambda(a_3 - a_1)}, \\
r(1,1,0) &= -r(1,0,1) e^{a_3 - a_2} - r(0,1,1) e^{a_3 - a_1}. \tag{A.25}
\end{aligned}
\]
Consider the second case. $A_4 = 0$ implies, since $(r(1,1,0), r(1,0,1), r(0,1,1)) \ne (0,0,0)$,
\[
r(1,1,0) = -r(1,0,1)\frac{e^{a_1 + \lambda a_3} + e^{a_3 + \lambda a_1}}{e^{a_1 + \lambda a_2} + e^{a_2 + \lambda a_1}} - r(0,1,1)\frac{e^{a_2 + \lambda a_3} + e^{a_3 + \lambda a_2}}{e^{a_1 + \lambda a_2} + e^{a_2 + \lambda a_1}}.
\]
By assumption, for almost every $x = (x_1, x_2, x_3)$, $a_2 \ne a_3$ and $a_1 \ne a_3$. Then, using the latter display with equation (A.25) yields, since $\lambda \ne 1$,
\[
\begin{aligned}
r(1,0,1) &= r(0,1,1)\frac{e^{\lambda(a_3 - a_1)} - e^{a_3 - a_1}}{e^{a_3 - a_2} - e^{\lambda(a_3 - a_2)}}, \\
r(1,0,1) &= r(0,1,1)\left[e^{\lambda(a_3 - a_1)} - \frac{e^{a_2 + \lambda a_3} + e^{a_3 + \lambda a_2}}{e^{a_1 + \lambda a_2} + e^{a_2 + \lambda a_1}}\right] \times \left[\frac{e^{a_1 + \lambda a_3} + e^{a_3 + \lambda a_1}}{e^{a_1 + \lambda a_2} + e^{a_2 + \lambda a_1}} - e^{\lambda(a_3 - a_2)}\right]^{-1}.
\end{aligned}
\]
Since $(r(1,1,0), r(1,0,1), r(0,1,1)) \ne (0,0,0)$, $r(1,0,1) \ne 0$ and $r(0,1,1) \ne 0$. Equating the two right-hand sides above and rearranging then yields an equality between exponential expressions that can hold only if two of the $a_t$ coincide. By assumption, the set of $x$ for which this occurs is of probability zero. In other words, for almost every $x$,
\[
r((1,1,0), x) = r((1,0,1), x) = r((0,1,1), x) = 0.
\]
$A_1 = A_2 = 0$ implies that either $r(1,0,0) = r(0,1,0) = r(0,0,1) = 0$ or
\[
\begin{aligned}
r(0,0,1) &= -e^{a_1 - a_3} r(1,0,0) - e^{a_2 - a_3} r(0,1,0), \\
r(0,0,1) &= -e^{\lambda(a_1 - a_3)} r(1,0,0) - e^{\lambda(a_2 - a_3)} r(0,1,0).
\end{aligned}
\]
In the first case, almost surely $r(Y, X) = 0 = 0 \times m(Y, X; \beta^0)$. In the second case, $r(Y, X) = q(X) \times m(Y, X; \beta^0)$ for some $q \in L^2_X$. The result follows. Now, we turn to $\lambda = 3/2$.
Then, for almost all $(g, x) \in \mathrm{Supp}(\gamma, X)$,
\[
0 = A_0 e^{0 \times g} + A_1 e^{g} + A_2 e^{\frac{3}{2}g} + A_3 e^{2g} + A_4 e^{\frac{5}{2}g} + (A_5 + A_6) e^{3g} + A_7 e^{\frac{7}{2}g} + A_8 e^{4g} + A_9 e^{\frac{9}{2}g}.
\]
By Lemma B.1 and because $|\mathrm{Supp}(\gamma \mid X)| \ge 9$, we obtain $A_5 + A_6 = 0$ and $A_k = 0$ for all $k \notin \{5, 6\}$. $A_0 = A_9 = 0$ implies that $r(0,0,0) = r(1,1,1) = 0$, which in turn implies that $A_6 = 0$ and thus $A_5 = 0$. Hence, we have $A_k = 0$ for all $k \in \{0, \dots, 9\}$ and the same reasoning as when $\lambda \notin \{3/2, 3\}$ allows us to obtain the result.

Finally, we consider $\lambda = 3$. Then, for all $(g, x)$,
\[
0 = A_0 e^{0 \times g} + A_1 e^{g} + A_3 e^{2g} + (A_2 + A_6) e^{3g} + A_4 e^{4g} + A_7 e^{5g} + A_5 e^{6g} + A_8 e^{7g} + A_9 e^{9g}.
\]
By Lemma B.1 and because $|\mathrm{Supp}(\gamma \mid X)| \ge 9$, we obtain $A_2 + A_6 = 0$ and $A_k = 0$ for all $k \notin \{2, 6\}$. $A_0 = A_9 = 0$ implies that $r(0,0,0) = r(1,1,1) = 0$, which in turn implies that $A_6 = 0$ and thus $A_2 = 0$. Hence, $A_k = 0$ for all $k \in \{0, \dots, 9\}$ and the result follows again as when $\lambda \notin \{3/2, 3\}$.

Fourth step: conclusion.

By Steps 2 and 3, there exists $q(X)$ such that $\Pi_{\mathcal{K}}(\ell_\beta) = q(X)\, m(Y, X; \beta^0)$. Moreover, by definition of the orthogonal projection, $\Pi_{\mathcal{K}}(\ell_\beta) - \ell_\beta \in (\mathcal{K}^\perp)^K$. Hence, again by Step 3, we have, for all $\tilde{q} \in L^2_X$,
\[
E\left[\tilde{q}(X)\, q(X)\, m(Y, X; \beta^0)^2\right] = E\left[\ell_\beta\, \tilde{q}(X)\, m(Y, X; \beta^0)\right].
\]
This implies that
\[
q(X)\Omega(X) = E[\ell_\beta\, m(Y, X; \beta^0) \mid X].
\]
As a result, because $\ell_\beta = E[S_\beta \mid Y, X]$,
\[
\Pi_{\mathcal{K}}(\ell_\beta) = \Omega^{-1}(X)\, m(Y, X; \beta^0)\, E[\ell_\beta\, m(Y, X; \beta^0) \mid X] = \Omega^{-1}(X)\, m(Y, X; \beta^0)\, E[S_\beta\, m(Y, X; \beta^0) \mid X].
\]
Then, using (A.23), we obtain
\[
V^* = E\left[\Omega^{-1}(X)\, E[S_\beta\, m(Y, X; \beta^0) \mid X]\, E[S_\beta\, m(Y, X; \beta^0) \mid X]'\right]^{-1}.
\]
Now, by the end of the proof of Theorem 2.3, we have, for all $\beta$,
\[
0 = E_\beta[m(Y, X; \beta) \mid X, \gamma].
\]
As a result,
\[
0 = \nabla_\beta E_\beta[m(Y, X; \beta) \mid X, \gamma] = E_\beta[\nabla_\beta m(Y, X; \beta) \mid X, \gamma] + E_\beta[m(Y, X; \beta) S_\beta \mid X, \gamma].
\]
Evaluating this equality at $\beta^0$ and integrating over $\gamma$ yields
\[
E[S_\beta\, m(Y, X; \beta^0) \mid X] = -E[\nabla_\beta m(Y, X; \beta^0) \mid X] = -R(X).
\]
We conclude that
\[
V^* = E\left[\Omega^{-1}(X) R(X) R(X)'\right]^{-1} = V,
\]
which is a well-defined matrix by Assumption 6.1.

B Technical lemmas
The following two lemmas are key in the proof of Proposition 2.5.
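The first of them, Lemma B.1 below, states that a non-trivial exponential polynomial $\sum_{i=1}^{n} a_i e^{\alpha_i x}$ with distinct exponents has at most $n - 1$ real roots, so that $n$ distinct roots force all coefficients to vanish. A small numerical illustration (not a proof, with made-up coefficients) counts the sign changes of such a function on a fine grid:

```python
import numpy as np

# Illustration of Lemma B.1 (made-up coefficients): P(x) = sum_i a_i exp(alpha_i x)
# with n = 3 nonzero terms and distinct alpha_i has at most n - 1 = 2 real roots.
alpha = np.array([-1.0, 0.5, 2.0])   # distinct exponents
a = np.array([1.0, -3.0, 1.0])       # nonzero coefficients

grid = np.linspace(-10.0, 10.0, 200001)
P = (a * np.exp(np.outer(grid, alpha))).sum(axis=1)
sign_changes = int(np.sum(np.sign(P[:-1]) != np.sign(P[1:])))

print(sign_changes)                   # here P crosses zero exactly twice
print(sign_changes <= len(a) - 1)     # True: the Lemma B.1 bound holds
```

Here $P$ is positive for very negative $x$ (the $e^{-x}$ term dominates), negative at $0$ (where $P(0) = -1$), and positive for large $x$ (the $e^{2x}$ term dominates), so it attains the maximal number of roots allowed by the lemma.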
Lemma B.1 Let $n \ge 1$, let $(\alpha_1, \dots, \alpha_n)$ be $n$ distinct real numbers, $(a_1, \dots, a_n) \in \mathbb{R}^n$ and $P(x) = \sum_{i=1}^{n} a_i \exp(\alpha_i x)$. If $P$ has $n$ distinct roots, then $a_1 = \dots = a_n = 0$.

Lemma B.2 For any $(t, \ell) \in \{1, \dots, T\} \times \{1, \dots, \tau\}$, $a_{t,\ell,x}$ is real analytic for almost all $x \in \mathrm{Supp}(X)$.

B.1 Proof of Lemma B.1

This follows by induction on $n$ and Rolle's theorem; see e.g. Chapter 2, Section 2 of Krein and Nudelman (1977).

B.2 Proof of Lemma B.2
We want to prove that each function $a_{t,\ell,x}$ is real analytic for almost all $x \in \mathrm{Supp}(X)$. Fix $x \in \mathrm{Supp}(X)$, and let $\tilde{w}_j^\gamma := w_j \delta_j(x, \theta^0, t) \exp(\lambda_j \gamma)$ and $\tilde{\lambda}_j := \lambda_j \beta_k$. Let us define
\[
f : (v, \gamma) \mapsto 1 \Big/ \Big(\sum_{j=1}^{\tau} \tilde{w}_j^\gamma \exp(\tilde{\lambda}_j v)\Big).
\]
We have
\[
a_{t,\ell,x}(v) = \int \frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma)\, \mathrm{d}F_{\gamma|X=x}(\gamma), \quad \forall v \in \mathbb{R}.
\]
We prove the result in three steps. First, we establish a bound on the derivatives of $f$. Second, we show that $a_{t,\ell,x}$ is $C^\infty$, and we bound its derivatives. Finally, we show that $a_{t,\ell,x}$ is real analytic.

First step: for all $k \ge 0$ and all $(v, \gamma)$,
\[
\left|\frac{\partial^k}{\partial v^k} f(v, \gamma)\right| \le k!\, (e\lambda_\tau |\beta_k|)^k f(v, \gamma). \tag{B.1}
\]
For any infinitely differentiable real function $g : \mathbb{R} \times \mathrm{Supp}(\gamma \mid X = x) \to \mathbb{R}$, we let $g^{(k)}(v, \gamma) = \partial^k g(v, \gamma)/\partial v^k$ and define $P : (v, \gamma) \mapsto \sum_{j=1}^{\tau} \tilde{w}_j^\gamma \tilde{\lambda}_j \exp(\tilde{\lambda}_j v)$. First, remark that for any positive integer $k$,
\[
\left|P^{(k)}(v, \gamma)\right| = \left|\sum_{j=1}^{\tau} \tilde{w}_j^\gamma \tilde{\lambda}_j^{k+1} \exp(\tilde{\lambda}_j v)\right| \le |\tilde{\lambda}_\tau|^{k+1} \sum_{j=1}^{\tau} \tilde{w}_j^\gamma \exp(\tilde{\lambda}_j v) = |\tilde{\lambda}_\tau|^{k+1}/f(v, \gamma). \tag{B.2}
\]
Now, we prove (B.1) by induction. The result is trivial for $k = 0$. Suppose that it holds for $j = 0, \dots, k$, $k \ge 0$. Remark that $f^{(1)} = -f \times (fP)$. Then, by applying twice the general Leibniz rule, we obtain
\[
\begin{aligned}
\left|f^{(k+1)}\right| &= \left|\sum_{j=0}^{k} \binom{k}{j} f^{(j)} (fP)^{(k-j)}\right| \le \sum_{j=0}^{k} \binom{k}{j} \left|f^{(j)}\right| \left|(fP)^{(k-j)}\right| \\
&\le f \sum_{j=0}^{k} \binom{k}{j} j!\, (e|\tilde{\lambda}_\tau|)^j \left|\sum_{i=0}^{k-j} \binom{k-j}{i} f^{(i)} P^{(k-j-i)}\right| \\
&\le f \sum_{j=0}^{k} \binom{k}{j} j!\, (e|\tilde{\lambda}_\tau|)^j \sum_{i=0}^{k-j} \binom{k-j}{i} i!\, (e|\tilde{\lambda}_\tau|)^i\, f\, \frac{|\tilde{\lambda}_\tau|^{k-j-i+1}}{f} \\
&\le f\, |\tilde{\lambda}_\tau|^{k+1} e^{k} \sum_{j=0}^{k} \binom{k}{j} j! \sum_{i=0}^{k-j} \binom{k-j}{i} i!,
\end{aligned}
\]
where we used the induction hypothesis to get the second and third inequalities. The last inequality follows from $e^i \le e^{k-j}$ for all $i \le k - j$. Now, notice that for any $k \in \mathbb{N}^*$, we have
\[
\sum_{s=0}^{k} \binom{k}{s} s! = \sum_{s=0}^{k} \frac{k!}{(k-s)!} \le k!\, e. \tag{B.3}
\]
As a result,
\[
\left|f^{(k+1)}\right| \le f\, |\tilde{\lambda}_\tau|^{k+1} e^{k} \sum_{j=0}^{k} \binom{k}{j} j!\, (k-j)!\, e = f\, |\tilde{\lambda}_\tau|^{k+1} e^{k} \times e \times (k+1) \times k! = (k+1)!\, \left(e|\tilde{\lambda}_\tau|\right)^{k+1} f,
\]
and thus the induction hypothesis holds for $k + 1$. This ends the first step.

Second step: $a_{t,\ell,x}$ is $C^\infty$ and for all $k \ge 0$,
\[
\sup_{v \in \mathbb{R}} \left|\frac{\partial^k a_{t,\ell,x}(v)}{\partial v^k}\right| \le C_{t,\ell,x,\theta^0}\, k!\, (e\lambda_\tau|\beta_k|)^k, \tag{B.4}
\]
for some $C_{t,\ell,x,\theta^0} > 0$. For all $v \in \mathbb{R}$, we have $1/f(v, \gamma) \ge \tilde{w}_\ell^\gamma \exp(\tilde{\lambda}_\ell v)$ and $C(\gamma, x; \theta^0, t) \ge 1$. Thus,
\[
\frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma) \le \frac{1}{w_\ell \delta_\ell(x, \theta^0, t)}. \tag{B.5}
\]
Hence, (B.4) holds for $k = 0$, with $C_{t,\ell,x,\theta^0} = 1/[w_\ell \delta_\ell(x, \theta^0, t)]$. Next, $v \mapsto \exp(\lambda_\ell \gamma)\, f(v, \gamma)/C(\gamma, x; \theta^0, t)$ is $C^\infty$ and, by (B.5) and the previous step, we have, for any $k \ge 0$,
\[
\left|\frac{\partial^k}{\partial v^k}\left(\frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma)\right)\right| \le k!\, (e\lambda_\tau|\beta_k|)^k\, \frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma) \le \frac{k!\, (e\lambda_\tau|\beta_k|)^k}{w_\ell \delta_\ell(x, \theta^0, t)}.
\]
Thus, by the dominated convergence theorem, $a_{t,\ell,x}$ is $C^k$ and we have
\[
\left|\frac{\partial^k a_{t,\ell,x}(v)}{\partial v^k}\right| \le \int \left|\frac{\partial^k}{\partial v^k}\left(\frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma)\right)\right| \mathrm{d}F_{\gamma|X=x}(\gamma) \le k!\, (e\lambda_\tau|\beta_k|)^k \int \frac{\exp(\lambda_\ell \gamma)}{C(\gamma, x; \theta^0, t)}\, f(v, \gamma)\, \mathrm{d}F_{\gamma|X=x}(\gamma) = k!\, (e\lambda_\tau|\beta_k|)^k\, a_{t,\ell,x}(v) \le C_{t,\ell,x,\theta^0}\, k!\, (e\lambda_\tau|\beta_k|)^k.
\]

Third step: $a_{t,\ell,x}$ is real analytic. It suffices to show that there exists $R > 0$ such that, for all $v_0$, $a_{t,\ell,x}$ coincides with its Taylor expansion at $v_0$ on $(v_0 - R, v_0 + R)$. Let $R < 1/(2e\lambda_\tau|\beta_k|)$. First, by the second step, we have, for any $v \in (v_0 - R, v_0 + R)$,
\[
\left|\frac{(v - v_0)^k}{k!}\, \frac{\partial^k a_{t,\ell,x}(v_0)}{\partial v^k}\right| \le \frac{R^k}{k!} \sup_{v} \left|\frac{\partial^k a_{t,\ell,x}(v)}{\partial v^k}\right| \le C_{t,\ell,x,\theta^0}\, (Re\lambda_\tau|\beta_k|)^k, \tag{B.6}
\]
and the corresponding series converges since $Re\lambda_\tau|\beta_k| < 1$. Thus, the Taylor series of $a_{t,\ell,x}$ converges at $v$, for any $v \in (v_0 - R, v_0 + R)$. Finally, by the second step again and Taylor's theorem applied to $a_{t,\ell,x}(v)$, we obtain, for any $K > 0$ and $|v - v_0| < R$:
\[
\left|a_{t,\ell,x}(v) - \sum_{k=0}^{K} \frac{(v - v_0)^k}{k!}\, \frac{\partial^k a_{t,\ell,x}(v_0)}{\partial v^k}\right| \le \frac{R^{K+1}}{(K+1)!} \sup_{|v - v_0| < R}
\]