A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity
Louise Laage*
This version: March 14, 2020
Abstract
This paper studies a class of linear panel models with random coefficients. We do not restrict the joint distribution of the time-invariant unobserved heterogeneity and the covariates. We investigate identification of the average partial effect (APE) when fixed-effect techniques cannot be used to control for the correlation between the regressors and the time-varying disturbances. Relying on control variables, we develop a constructive two-step identification argument. The first step identifies nonparametrically the conditional expectation of the disturbances given the regressors and the control variables, and the second step uses "between-group" variations, correcting for endogeneity, to identify the APE. We propose a natural semiparametric estimator of the APE, show its $\sqrt{n}$ asymptotic normality and compute its asymptotic variance. The estimator is computationally easy to implement, and Monte Carlo simulations show favorable finite sample properties. Control variables arise in various economic and econometric models, and we provide variations of our argument to obtain identification in some applications. As an empirical illustration, we estimate the average elasticity of intertemporal substitution in a labor supply model with random coefficients.

*[email protected], Toulouse School of Economics, University of Toulouse Capitole, Toulouse, France. I am grateful to my advisors Donald W. K. Andrews, Xiaohong Chen and especially Yuichi Kitamura for their guidance and support. I also thank Anna Bykhovskaya, Philip A. Haile, Yu Jung Hwang, John Eric Humphries, Rosa Matzkin, Patrick Moran, Peter C. B. Phillips, Pedro Sant'Anna, Masayuki Sawada, Edward Vytlacil as well as participants at the Yale econometrics seminar for helpful conversations and comments on this project. I acknowledge financial support from the grants ERC POEMH 337665 and ANR-17-EURE-0010, and from the IAAE travel grant. All errors are mine.

1 Introduction
This paper considers a random coefficient panel model whose outcome equation is

$$y_{it} = x_{it}'\mu_i + \alpha_i + \epsilon_{it}, \quad i \le n,\ t \le T, \quad (1)$$

where the number of periods $T$ is fixed and the number of units $n$ is large. The scalar $\epsilon_{it}$ is a time-varying disturbance. The impact of the vector $x_{it} \in \mathbb{R}^{d_x}$ of covariates on the scalar dependent variable $y_{it}$ is linear in $\mu_i$, a vector-valued time-invariant unobserved heterogeneity. In order to depict situations in which the researcher does not know what drives heterogeneity in the impact of $x_{it}$, a fixed-effect approach is adopted, that is, we do not impose assumptions on the joint distribution of $(\mu_i, (x_{it})_{t \le T})$. This positions (1) in the class of correlated random coefficient (CRC) models: attention is given to recovering properties of the distribution of the unobserved heterogeneity $\mu$, a question complicated by the correlation between this vector and the regressors. For instance, a linear least squares regression computed with a single cross-section will not consistently estimate $E(\mu)$.

Model (1) has been studied in the seminal paper by Chamberlain (1992), and more recently in Arellano and Bonhomme (2012) and Graham and Powell (2012). Under various sets of constraints on the variations over time of the regressor and disturbance, we now know how to identify the conditional mean of the unobserved heterogeneity (Chamberlain (1992) for when $T > d_x + 1$, Graham and Powell (2012) for when $T = d_x + 1$) and even the conditional distribution of $\mu$ (Arellano and Bonhomme (2012)). An essential assumption for these identification arguments is a strict exogeneity condition on the regressors. In model (1), this strict exogeneity condition can be written $E(\epsilon_{it} \mid x_{i1}, \dots, x_{iT}) = 0$. Note that under the strict exogeneity condition, the covariates $(x_{it})_{t \le T}$ can be correlated with the vector of unobservables $(\alpha_i, \mu_i, (\epsilon_{it})_{t \le T})$ through their correlation with $(\alpha_i, \mu_i)$. This correlation can be controlled for by a fixed-effect transformation because $(\alpha_i, \mu_i)$ is time-invariant. Loosely speaking, this condition implies that the endogeneity of the model can be "captured by a fixed effect". It prohibits the presence of time-varying omitted variables correlated with the regressors $x_{it}$ and, as pointed out, e.g., in Arellano and Bonhomme (2012), does not allow for sequentially exogenous regressors. When instruments satisfying an orthogonality condition are available, one might be tempted to estimate the average effect using a fixed-effect instrumental variables estimator or a first-difference instrumental variables estimator (see, e.g., Wooldridge (2010)). But due to the randomness of the unobserved effect and its potential correlation with the regressors, such estimators are in general inconsistent.

[Footnote: In a model more general than (1), where the partial derivative $\partial E(y \mid x, q)/\partial x$ depends on the time-varying disturbance, Graham and Powell (2012) allows for a correlation between this disturbance and the regressor $x_{it}$, but a marginal stationarity condition must be satisfied.]

The parameter of interest is the average partial effect (APE), $E_q[\partial E(y_t \mid x_t, q_t)/\partial x \mid x_t = \bar{x}]$, where the outer expectation is over the vector of unobservables $q_t = (\mu, \alpha, \epsilon_t)$. It is an average of the partial effect of $x_t$ on $y_t$ over the distribution of the unobserved heterogeneity $q_t$, and in (1) this average partial effect is equal to $E(\mu)$.
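To illustrate the last point concretely, the following small simulation (an illustrative data-generating process of our own choosing, not one from the paper) shows that the single cross-section least squares slope is inconsistent for $E(\mu)$ when $\mu_i$ is correlated with $x_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Correlated random coefficient: mu_i loads on the same shock as x_i.
a = rng.normal(size=n)
mu = 1.0 + a                         # E(mu) = 1
x = 2.0 + a + rng.normal(size=n)     # Cov(mu, x) = 1 != 0
eps = rng.normal(size=n)
y = x * mu + eps

# Cross-section OLS slope of y on x (with intercept).
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta[1])   # close to 2, far from E(mu) = 1
```

With this design the population OLS slope is $1 + \mathrm{Cov}(x\,a, x)/\mathrm{Var}(x) = 2$, so the cross-section regression roughly doubles the true average coefficient.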
A discussion of the difficulty of identifying the APE in correlated random coefficient panel models can be found in Graham and Powell (2012).

The core idea of this paper is to use the control function approach (CFA) to control for endogeneity. More specifically, we assume the existence of control variables such that, conditional on the control variables at all time periods, the time-varying disturbance at time $t$ is conditionally mean independent of the regressors at all time periods. Then, intuitively, if one wants to disentangle the regressors from the unobserved heterogeneity, the instrument must impact variations over time of the regressors conditional on the control variables. An invertibility condition formalizes this intuition. Equipped with such instruments and control variables, identification of the average partial effect is obtained using a two-step approach. First, the individual unobserved heterogeneity is "differenced away" using the individual time variation of the regressors, which identifies nonparametrically the conditional expectation of the disturbances given regressors and control variables, that is, the term controlling the endogeneity of the model. Second, the endogeneity is corrected for using this identified nonparametric function, and "between-group" variations allow us to pin down the average effect. This identification argument is constructive and its structure suggests a natural multi-step estimator, following the identification steps after estimating the control variables. We define the estimator in Section 4.1 and show consistency and asymptotic normality in Sections 4.2 through 4.4. The derivation of the asymptotic properties of our estimator is challenging due to the presence of nonparametric regression estimators using nonparametrically estimated regressors.
This relates to a broad literature on estimation with generated covariates in which, to our knowledge, there are no results directly applicable to our estimator on the asymptotic distribution of sample moments depending on nonparametric two-step sieve estimators.

The identification argument does not rely on an exact specification of the control variables. Interestingly, this approach gives flexibility to the two-step method, and we show in Section 3 that identification can be obtained in variations of model (1) with different types of violation of the strict exogeneity condition. A panel random coefficient model with sample selection is one such example, and we also apply the idea in a model such as (1) when sequential exogeneity holds, that is, when the regressors are predetermined.

We review here the literature this paper is connected to. Linear panel models with random coefficients, such as (1), are sometimes referred to as models with individual-specific slopes or variable-coefficient models. They are surveyed for example in Wooldridge (2010) and Hsiao (2014). Although we focus on the fixed-$T$ framework, such models have been studied when $T$ is allowed to grow; see, e.g., Pesaran and Smith (1995). For fixed $T$, Wooldridge (2005a) shows that consistency of the standard fixed-effects estimator in these models requires the random coefficients to be mean independent of the detrended regressors. Important recent results on correlated random coefficient panel models are in Arellano and Bonhomme (2012) and Graham and Powell (2012), the papers most closely related to ours. Both papers build upon Chamberlain (1992), which studies efficiency bounds in semiparametric models. As an application, Chamberlain (1992) derives the semiparametric variance bound of the APE in a correlated random coefficient model and provides an efficient estimator. Arellano and Bonhomme (2012) investigate a model very similar to (1) under strict exogeneity.
They obtain identification of the variance and of the distribution of the unobserved effect by leveraging information on the time dependence of the time-varying disturbances. They require that the number of periods be strictly greater than the number of regressors (including a constant if any), an assumption that we maintain. On the other hand, Graham and Powell (2012) focus on identification of the APE when $T$ is exactly equal to the number of regressors. In this case, the method developed in Chamberlain (1992) cannot be applied. They develop an alternative identification argument exploiting the subsample of "stayers" with little regressor variation as a first step, and construct an estimator. Other recent papers have analyzed nonseparable panel models under a fixed-effect approach. One such paper is Evdokimov (2010), which studies identification and estimation of a model where the outcome equation is additively separable in a nonparametric function of regressors and scalar time-invariant unobserved heterogeneity, and a residual term. Other papers are Chernozhukov, Fernández-Val, Hahn, and Newey (2013), studying partial identification of the average structural function and quantile structural function in nonseparable panel models with time-invariant unobserved heterogeneity, and Hoderlein and White (2012). An alternative to the fixed-effect approach is the correlated random effect approach, which imposes restrictions on the conditional distribution of the unobserved heterogeneity given the regressor. Examples of this approach in panel data are, among others, Altonji and Matzkin (2005), Bester and Hansen (2009), Arellano and Bonhomme (2016) and Graham, Hahn, Poirier, and Powell (2018).

As stated earlier, papers studying the APE in correlated random coefficient panel models do not allow for time-varying endogeneity. To the best of our knowledge, we are the first to prove identification of the APE in CRC models with time-varying endogeneity.
However, the use of exclusion restrictions or of the control function approach (see, e.g., Newey, Powell, and Vella (1999), Blundell and Powell (2003)) in models with random coefficients is not new. Cross-section models include Wooldridge (1997), Wooldridge (2003) and Heckman and Vytlacil (1998). They impose an exclusion restriction on the random coefficient and homogeneity conditions on the impact of the instruments on the regressors, and identify the average treatment effect. More recently, Masten and Torgovitsky (2016) specify a nonseparable first stage, thus allowing for heterogeneity in the impact of the instrument. They retrieve the conditional APE under the assumption that the random coefficient is independent of the instrument and the regressor. Hoderlein, Holzmann, and Meister (2017) analyze a triangular model with random coefficients in both stages, independent of instruments and exogenous regressors: for such a model they show nonidentification of the distribution of the random coefficients in general, and that an independence condition between the random coefficients is required for identification. An analogous approach in a panel model is employed in Murtazashvili and Wooldridge (2016), which studies a random coefficient model with endogenous regressors and endogenous switching. Additionally, Murtazashvili and Wooldridge (2008) show that the fixed-effect instrumental variables estimator is consistent only under a similar set of assumptions. Exploiting the panel aspect of the data to "difference away" the time-invariant unobserved heterogeneity allows us to avoid imposing such restrictions on the joint distribution of the unobserved heterogeneity, the regressors, and the instruments.

Section 2 reviews the model and constructs the main two-step identification argument. Some extensions are provided to ease the burden of the curse of dimensionality. The model describes a very generic form of endogeneity, and Section 3 leverages this aspect to obtain identification in related models.
In Section 4, an estimator is provided for the APE in the main model. This estimator is computationally easy to implement as it uses closed-form expressions and does not require optimization. Monte Carlo simulations show favorable finite sample properties in Section 5. Finally, Section 6 turns to an empirical illustration. Using the Panel Study of Income Dynamics, we estimate the average elasticity of intertemporal substitution in a labor supply model with random coefficients.

The following section sets up the model and the control function approach assumption. Section 2.2 lays out the identification argument, imposing an invertibility condition which is then studied in more detail in Section 2.3. Finally, in Section 2.4, we show how to improve upon the identification method in some cases where more is known about the data generating process.

2.1 Model
For a sample of units indexed by $i$, for $i \le n$, the outcome variable in period $t$, for $t \le T$, is given by

$$y_{it} = x_{it}'\mu_i + \alpha_i + \epsilon_{it},$$

where $x_{it} \in \mathbb{R}^{d_x}$ is a vector of observed variables, $\epsilon_{it}$ is a time-varying disturbance, and $\mu_i$ is a time-invariant vector which represents individual unobserved heterogeneity. We consider the case where $T$ is fixed and $n$ large, and assume $T \ge d_x + 2$. More details on this last condition are in Section 2.2.1. Denoting by $y_i = (y_{i1}, \dots, y_{iT})'$ the vector of outcomes of unit $i$, $X_i = (x_{i1}, \dots, x_{iT})'$ the matrix of regressors, and $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{iT})'$ the vector of error terms, we can rewrite (1) as

$$y_i = X_i \mu_i + \alpha_i 1_T + \epsilon_i,$$

where $1_T$ is the vector of size $T$ composed of ones.

The parameters of interest we focus on are the average effects $E(\mu_i)$ and $E(\alpha_i)$. A standard assumption in the panel correlated random coefficient literature is strict exogeneity of the regressors, which in Model (1) takes the form $E(\epsilon_{it} \mid X_i, \alpha_i, \mu_i) = 0$. However, as pointed out in Section 1, this assumption does not allow for the presence of time-varying omitted variables. We seek to relax the strict exogeneity condition. We will assume the availability of instrumental variables $z_{it} \in \mathbb{R}^{d_z}$ satisfying the following assumption, where we write $Z_i$ for the individual matrix of instruments and $x_{it,d}$ for each scalar-valued regressor, so that $x_{it} = (x_{it,d})_{d \le d_x}$.

Assumption 2.1.
1. $(X_i, Z_i, \epsilon_i, \mu_i, \alpha_i)$ is i.i.d. across $i$, and (1) holds with $E(\epsilon_{it}) = 0$.
2. For each $t \le T$, there exists an identified function $C_t$ such that, defining $v_{it} = C_t(x_{it}, z_{it}) \in \mathbb{R}^{d_v}$,
$$E(\epsilon_{it} \mid x_{i1}, \dots, x_{iT}, v_{i1}, \dots, v_{iT}) = f_t(v_{i1}, \dots, v_{iT}).$$

Assumption 2.1 (2) is a control function approach (CFA) assumption and $v_{it}$ is a control variable. Besides its panel aspect, this assumption is similar to the condition imposed in Newey, Powell, and Vella (1999).

[Footnote: A slight difference is that they impose $E(\epsilon_i \mid z_i, v_i) = f(v_i)$ in a cross-section regression, without endogenous variables in the conditioning set. Their definition of the control variable as $v_i = x_i - E(x_i \mid z_i)$ implies $E(\epsilon_i \mid x_i, v_i) = f(v_i)$, which is a cross-section version of the condition we impose.]

Define $V_i = (v_{i1}, \dots, v_{iT})$. If for all $t \le T$ a cross-section control function assumption is satisfied, that is, $E(\epsilon_{it} \mid x_{it}, v_{it}) = h_t(v_{it})$, and if $(x_{it}, \epsilon_{it}, v_{it})$ is i.i.d. over both $i$ and $t$, then Assumption 2.1 (2) is satisfied with $f_t(V_i) = h_t(v_{it})$. Note that we normalized $E(\epsilon_{it}) = E(f_t(V_i)) = 0$ for all $t \le T$. This is without loss of generality since a constant is not separately identifiable from $E(\alpha_i)$.

Control variables satisfying Assumption 2.1 (2) are typically provided by a first-step selection equation. We will provide a few examples of such selection equations in Section 3. We mention here the leading example of a triangular system,

$$y_{it} = x_{1it}'\mu_{1i} + x_{2it}'\mu_{2i} + \alpha_i + \epsilon_{it}, \quad (2)$$
$$x_{2it} = E(x_{2it} \mid x_{1it}, z_{it}) + v_{it}, \qquad E(\epsilon_i \mid V_i, X_i) = f(V_i),$$

where $x_{1it} \in \mathbb{R}^{d_1}$ is a vector of exogenous regressors and $x_{2it} \in \mathbb{R}^{d_2}$ a vector of potentially endogenous regressors. We can rewrite (2) as (1) taking $d_x$ to be $d_1 + d_2$, $x_{it} = (x_{1it}', x_{2it}')'$, $\mu_i = (\mu_{1i}', \mu_{2i}')'$ and $v_{it} \in \mathbb{R}^{d_2}$ as defined. Then Assumption 2.1 (2) is satisfied with $C_t(x, z) = x_2 - E(x_{2it} \mid x_{1it} = x_1, z_{it} = z)$, writing $x = (x_1', x_2')'$. The control variable in this model is the residual of the regression of $x_{2it}$ on $(x_{1it}, z_{it})$. Another well-known choice of control variable, studied in Blundell and Powell (2003) and Imbens and Newey (2009), is the scalar random variable $v_{it} = F_{x_2 \mid x_1, z}(x_{2it} \mid x_{1it}, z_{it})$.

Note. Unlike this particular case, Model (1) and our general definition of the control variables as $v_{it} = C_t(x_{it}, z_{it})$ in Assumption 2.1 (2) do not make explicit which of the regressors are endogenous. Nor does the condition $E(\epsilon_{it} \mid X_i, V_i) = f_t(V_i)$. Our general identification results will not distinguish endogenous from exogenous variables, as this information is embedded in the specific definition of the control variables, which is determined outside of the model. This deliberate lack of precision is offset by a gain in flexibility, and we will present in Section 3 a variety of specifications to which our general identification argument applies.
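To fix ideas, here is a minimal sketch of constructing the control variable of the triangular example as a first-stage residual. The data-generating process and the pooled (rather than period-by-period) first stage are simplifying assumptions of our own, not specifications from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5000, 4

# Hypothetical linear first stage: x2_it = 0.5 + 1.0*x1_it + 2.0*z_it + v_it.
x1 = rng.normal(size=(n, T))
z = rng.normal(size=(n, T))
v = rng.normal(size=(n, T))
x2 = 0.5 + 1.0 * x1 + 2.0 * z + v

# Control variable: residual of the regression of x2 on (1, x1, z), an
# estimate of C_t(x_it, z_it) = x2 - E(x2 | x1, z), here pooled over t.
W = np.column_stack([np.ones(n * T), x1.ravel(), z.ravel()])
coef = np.linalg.lstsq(W, x2.ravel(), rcond=None)[0]
v_hat = (x2.ravel() - W @ coef).reshape(n, T)

print(np.corrcoef(v_hat.ravel(), v.ravel())[0, 1])   # close to 1
```

With a time-varying first stage one would simply run the regression separately for each $t$, yielding one $\hat C_t$ per period.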
Define $u_{it} = \epsilon_{it} - f_t(V_i)$ and $u_i = (u_{i1}, \dots, u_{iT})'$. By Assumption 2.1, $E(u_i \mid X_i, V_i) = 0$. We also write $f(V_i) = (f_1(V_i), \dots, f_T(V_i))'$, where $V_i$ is of dimension $T d_v$. The vector of primitives for each unit $i$ is $W_i = (X_i, Z_i, V_i, \mu_i, u_i)$. The model (1) can be rewritten

$$y_{it} = x_{it}'\mu_i + \alpha_i + f_t(V_i) + u_{it}.$$

The extra term $f_t(V_i)$ captures the "time-varying endogeneity": it is an unobserved time-varying random variable correlated with the regressors. The random variable $\alpha_i + f_t(V_i)$ is composed of two unobserved elements. First, $\alpha_i$, which is time-invariant but varies across individuals. Second, $f_t(V_i)$, which varies with both $i$ and $t \le T$. Thus, $f(V_i)$ and $E(\alpha_i \mid V_i)$ are not separately identifiable. However, we normalized $E(f(V_i)) = 0$, which should intuitively guarantee that $f(V_i)$ and $E(\alpha_i)$ are separately identifiable. Our identification procedure will indeed prove this fact. Exploiting the time invariance of $\alpha_i$, we take time differences to eliminate this term. We will then later obtain identification of $E(\alpha_i)$ using $E(f(V_i)) = 0$. For $t \le T - 1$,

$$y_{i\,t+1} - y_{it} = [x_{i\,t+1} - x_{it}]'\mu_i + f_{t+1}(V_i) - f_t(V_i) + u_{i\,t+1} - u_{it},$$

that is,

$$\dot{y}_{it} = \dot{x}_{it}'\mu_i + g_t(V_i) + \dot{u}_{it}, \quad (3)$$

with $\dot{y}_{it} = y_{i\,t+1} - y_{it}$, $\dot{x}_{it} = x_{i\,t+1} - x_{it}$, $g_t(V_i) = f_{t+1}(V_i) - f_t(V_i)$, and $\dot{u}_{it} = u_{i\,t+1} - u_{it}$. We write the model in vector form, defining $\dot{X}_i = (\dot{x}_{i1}, \dots, \dot{x}_{i\,T-1})'$, a $(T-1) \times d_x$ matrix, and the $(T-1)$-dimensional vectors $\dot{y}_i = (\dot{y}_{i1}, \dots, \dot{y}_{i\,T-1})'$, $g(V_i) = (g_1(V_i), \dots, g_{T-1}(V_i))'$ and $\dot{u}_i = (\dot{u}_{i1}, \dots, \dot{u}_{i\,T-1})'$. Equation (3) can then be rewritten as

$$\dot{y}_i = \dot{X}_i \mu_i + g(V_i) + \dot{u}_i, \quad (4)$$

with, by assumption, $E(\dot{u}_i \mid \dot{X}_i, V_i) = 0$.

Note. Having $\epsilon_{it} = f_t(V_i) + u_{it}$ with $E(u_i \mid X_i, V_i) = 0$ is a flexible specification: $f_t(V_i)$ is allowed to depend on each of the $v_{is}$, $s \le T$. In Section 3, we take advantage of this property of Assumption 2.1 (2) and obtain identification in a class of models similar to (1) without contemporaneous endogeneity but where only sequential exogeneity is imposed.

We now introduce two matrices commonly used in the panel CRC literature. The first one is the $(T-1) \times (T-1)$ matrix $M_i = I_{T-1} - \dot{X}_i(\dot{X}_i'\dot{X}_i)^{-1}\dot{X}_i'$ if $\dot{X}_i$ is of full rank, or $M_i = I - \dot{X}_i\dot{X}_i^{+}$ if not, where $\dot{X}_i^{+}$ is the Moore-Penrose inverse (implying $\dot{X}_i\dot{X}_i^{+}\dot{X}_i = \dot{X}_i$). $M_i$ is a projection matrix projecting onto the space orthogonal to the columns of $\dot{X}_i$. The second one, defined only if $\dot{X}_i$ has full column rank, is the $d_x \times (T-1)$ matrix $Q_i = (\dot{X}_i'\dot{X}_i)^{-1}\dot{X}_i'$. By definition, $M_i\dot{X}_i\mu_i = 0$ and $Q_i\dot{X}_i\mu_i = \mu_i$.

Before going further into the identification argument, we mention here some of the limitations of using the matrices $M_i$ and $Q_i$.
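The algebra behind these two matrices can be checked numerically; a short sketch (arbitrary dimensions and simulated draws, chosen for illustration) verifying that $M_i$ annihilates the columns of $\dot{X}_i$ while $Q_i$ recovers $\mu_i$ from $\dot{X}_i\mu_i$:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d_x = 5, 2                       # T - 1 = 4 rows after differencing
Xd = rng.normal(size=(T - 1, d_x))  # a draw of X-dot, full column rank a.s.
mu = rng.normal(size=d_x)

M = np.eye(T - 1) - Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
Q = np.linalg.inv(Xd.T @ Xd) @ Xd.T

# M is an orthogonal projection (M = M' = M^2) and annihilates X-dot mu.
assert np.allclose(M, M.T) and np.allclose(M, M @ M)
assert np.allclose(M @ (Xd @ mu), 0)
# Q inverts the within-unit map mu -> X-dot mu.
assert np.allclose(Q @ (Xd @ mu), mu)
```

Applied to equation (4), the first identity removes $\dot{X}_i\mu_i$ and leaves $g(V_i)$ plus noise; the second isolates $\mu_i$ plus terms in $g(V_i)$ and $\dot{u}_i$.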
First, when $\dot{X}_i$ has full column rank, that is, when $Q_i$ is defined, if $T = d_x + 1$ then $\dot{X}_i$ is square and invertible, so $M_i = 0$ and the within-group transformation is uninformative about $g$. Therefore, as in Arellano and Bonhomme (2012), we study panels where $T \ge d_x + 2$. Second, $Q_i$ can be very large. Indeed, the norm of $q(X) = (X'X)^{-1}X'$ goes to infinity as $\det(X'X)$ approaches 0. In particular, the norm of $q(X)$ is not necessarily bounded when $X$ lies in a compact subset of the space of matrices of size $(T-1) \times d_x$. The identification argument will involve expectations of the product of the vector of outcome variables by $Q_i$, but these will be properly defined only if $E(\|Q_i \dot{u}_i\|) < \infty$. Given the properties of $Q_i$ that we highlighted, this is a strong condition. This issue is discussed in detail in Graham and Powell (2012), which studies Model (1) when $T = d_x + 1$. Their identification method imposes a positive density of "stayers", that is, of individuals with non-full column rank $\dot{X}_i$. The stayers are leveraged for identification of their common parameters, yet under this condition $E(\|Q_i \dot{u}_i\|) < \infty$ is unlikely to hold. They thus provide an alternative closed-form expression for the average effect as the limit of a conditional expectation conditional on $\det(\dot{X}_i) > h$, the limit being taken as $h \to 0$. As they point out, even when $T > d_x + 1$ and $E(\|Q_i \dot{u}_i\|) < \infty$, as is the case here, for similar reasons the semiparametric variance bound for $E(\mu_i)$ computed by Chamberlain (1992) might not be bounded. That is, $E(\mu_i)$ is not regularly identified. Their limit closed-form equation can instead be used to identify the average effect.

We acknowledge these issues and for the sake of simplicity will assume that the needed moments are finite, noting that the identification strategy used in Graham and Powell (2012) could be extended. The trimmed parameter $E(\mu_i \mid \det(\dot{X}_i'\dot{X}_i) > \delta)$ is also identified under standard assumptions, and our argument will actually suggest an estimator for this parameter, the asymptotic properties of which are studied in more detail using these standard assumptions. This is also the object estimated in Arellano and Bonhomme (2012). Note that whether or not $\dot{X}_i'\dot{X}_i$ is invertible, $M_i$ is an orthogonal projection matrix. This implies that the function $M(X) = I_{T-1} - X(X'X)^{-1}X'$ is bounded.

Assumption 2.2.
1. $\dot{X}_i$ is of full column rank with probability 1.
2. $E(\|\dot{u}_i\|) < \infty$, $E(\|Q_i \dot{u}_i\|) < \infty$, $E(\|Q_i g(V_i)\|) < \infty$, and $E(\|(\mu_i, \alpha_i)\|) < \infty$.

With expectations now properly defined, since $M_i$ and $Q_i$ are functions of $\dot{X}_i$, we use Assumption 2.1 (2) to obtain

$$M_i \dot{y}_i = M_i g(V_i) + M_i \dot{u}_i, \quad E(M_i \dot{u}_i \mid \dot{X}_i, V_i) = 0, \quad (5)$$
$$Q_i \dot{y}_i = \mu_i + Q_i g(V_i) + Q_i \dot{u}_i, \quad E(Q_i \dot{u}_i \mid \dot{X}_i, V_i) = 0. \quad (6)$$

Equation (5) is a within-group transformation that allows us to separate $g$ from $\mu_i$ and to identify it, while equation (6) isolates $\mu_i$ from $\dot{X}_i$ and uses the knowledge of $g$ to identify $E(\mu)$ by taking expectations.

Identification of $g_t(\cdot)$: This function $g_t$ itself is not an object of interest, but the procedure developed here to identify the average partial effect requires its identification as a first step. Note that (5) gives

$$E(M_i \dot{y}_i \mid \dot{X}_i, V_i) = M_i g(V_i), \quad (7)$$

since $M_i$ is a function of $\dot{X}_i$.
For $V$ a given value of $V_i$, $g(V)$ is a $(T-1)$-dimensional vector, and one may hope to recover $g$ using the conditional expectation (7). However, because $M_i$ is a projection matrix, it is singular. It is therefore not possible to identify $g(V_i)$ directly using (7), despite $M_i$ being observed. Instead of using $E(M_i \dot{y}_i \mid \dot{X}_i, V_i)$, we focus on $k(V) = E(M_i \dot{y}_i \mid V_i = V)$, which satisfies

$$E(M_i \dot{y}_i \mid V_i = V) = E(M_i \mid V_i = V)\, g(V) = M(V) g(V), \quad (8)$$

where we write $M(V) = E(M_i \mid V_i = V) = E(M(\dot{X}_i) \mid V_i = V) = E(I - \dot{X}_i(\dot{X}_i'\dot{X}_i)^{-1}\dot{X}_i' \mid V_i = V)$. If $M(V)$ is invertible for a given value $V$ on the support of $V_i$, then (8) gives a closed-form expression for $g(V)$. This suggests the following invertibility condition to obtain identification of the whole function.

Assumption 2.3. The matrix $M(V_i)$ is invertible, $P_V$-a.s.

Note that this assumption is a condition solely on observables, and can therefore be tested using available data. Under Assumptions 2.1, 2.2 and 2.3, we obtain

$$g(V_i) = M(V_i)^{-1} E(M_i \dot{y}_i \mid V_i), \quad P_V\text{-a.s.} \quad (9)$$

Intuitively, Assumption 2.3 precludes $g(V_i)$ from being of the form $\dot{X}_i \beta_i$ and thus from distorting a proper identification of $E(\mu_i)$. Indeed, $g(V_i) = \dot{X}_i \beta_i \Rightarrow M_i g(V_i) = 0 \Rightarrow M(V_i) g(V_i) = 0 \Rightarrow g(V_i) = 0$, which can hold with a nonzero $\beta_i$ only with probability 0 since $\dot{X}_i$ is of full column rank with probability 1. This means that the term $\dot{X}_i \mu_i$ is separately identifiable from $g(V_i)$ by Assumption 2.3.

Note. Instead of taking time differences, one could define $a(V_i) = E(\alpha_i \mid V_i)$, $\tilde{M}_i = I_T - X_i(X_i'X_i)^{-1}X_i'$ and $\tilde{M}(V) = E(\tilde{M}_i \mid V_i = V)$. Equation (8) becomes $E(\tilde{M}_i y_i \mid V_i) = \tilde{M}(V_i)[f(V_i) + a(V_i)1_T] + E(\tilde{M}_i 1_T [\alpha_i - a(V_i)] \mid V_i)$, where the second term on the RHS is a priori nonzero. Hence one cannot use the above-explained method to identify $f + a$ and must exploit the time invariance of $\alpha_i$.

Average partial effect $E(\mu)$: Under Assumptions 2.1, 2.2 and 2.3, $g_t(\cdot)$ is identified for $t \le T - 1$ and $Q_i$ is well-defined with probability 1. Equation (6) implies

$$\mu_i = Q_i \dot{y}_i - Q_i g(V_i) - Q_i \dot{u}_i, \quad (10)$$

where, by the law of iterated expectations and Assumption 2.2, $E(Q_i \dot{u}_i) = 0$. This implies

$$E(\mu_i) = E(Q_i \dot{y}_i - Q_i g(V_i)), \quad (11)$$

which identifies $E(\mu)$ since all elements on the right-hand side are observed or identified.

Result 2.1. Under Assumptions 2.1, 2.2 and 2.3, the average effect $E(\mu)$ is identified.

As mentioned in Section 2.2.1, one might worry that the conditions $E(\|Q_i \dot{u}_i\|) < \infty$ and $E(\|Q_i g(V_i)\|) < \infty$ of Assumption 2.2 do not hold. In this case, we propose an alternative object of interest which is identified under standard conditions. We define $1_{\delta i} = 1\{\det(\dot{X}_i'\dot{X}_i) > \delta\}$ and $Q_{\delta i} = 1_{\delta i} Q_i$. Then

$$E(\mu \mid \delta) = \frac{E(1_{\delta i} Q_i \dot{y}_i - 1_{\delta i} Q_i g(V_i))}{P(\det(\dot{X}_i'\dot{X}_i) > \delta)} = \frac{E(Q_{\delta i} \dot{y}_i - Q_{\delta i} g(V_i))}{P(\det(\dot{X}_i'\dot{X}_i) > \delta)}, \quad (12)$$

which identifies $E(\mu \mid \delta) = E(\mu_i \mid \det(\dot{X}_i'\dot{X}_i) > \delta)$ since all the terms on the right-hand side of (12) are identified. The required conditions are $E(\|Q_{\delta i} \dot{u}_i\|) < \infty$ and $E(\|Q_{\delta i} g(V_i)\|) < \infty$, which one can show are satisfied if, for instance, $\dot{X}_i$ has bounded support, $E(\|\dot{u}_i\|) < \infty$ and $E(\|g(V_i)\|) < \infty$.

It remains to identify $E(\alpha_i)$, which we obtain using the variables in period 1. We multiply (10) by $x_{i1}'$ and subtract from $y_{i1}$:

$$y_{i1} - x_{i1}'\mu_i = y_{i1} - x_{i1}'[Q_i \dot{y}_i - Q_i g(V_i) - Q_i \dot{u}_i].$$

The model gives $y_{i1} - x_{i1}'\mu_i = \alpha_i + \epsilon_{i1}$ where $E(\epsilon_{i1}) = 0$. Combining the two, we obtain

$$E(\alpha_i) = E(y_{i1} - x_{i1}'[Q_i \dot{y}_i - Q_i g(V_i)]),$$

where the right-hand side is identified or observed. This identifies $E(\alpha_i)$.

Note. The focus of this paper being on allowing for time-varying endogeneity in correlated random coefficient panel models, the parameter of interest was chosen to be the average effect $E(\mu)$. However, more properties of the unobserved heterogeneity can be obtained, as is shown in Arellano and Bonhomme (2012). This will be explained in Section 2.4.3, but note that by $E(\dot{u}_i \mid \dot{X}_i, V_i) = 0$, $E(\mu_i \mid \dot{X}_i)$ is also identified, with $E(\mu_i \mid \dot{X}_i) = E(Q_i \dot{y}_i - Q_i g(V_i) \mid \dot{X}_i)$.

Average effect of an exogenous intervention:
Consider a policy intervention that changes $x_{it}$ for each unit $i$ in a given period $t$. The average effect of this exterior intervention is an object of interest in the analysis of such policies, and Blundell and Powell (2003) studies its identifiability in different models when the change in covariates is exogenous, i.e., independent of the unobservable error terms. The unobservables in the CRC model we study in this paper are $(\mu_i, \alpha_i, (\epsilon_{it})_{t \le T})$, and an exogenous shift can be a variation $\Delta_i$ independent of $(\alpha_i, \mu_i, (\epsilon_{it})_{t \le T})$, in which case the average impact of the policy is $E(\Delta_i)'E(\mu_i)$. However, it might be of interest to consider policy interventions where the variation $\Delta$ is correlated with $x_{it}$, hence correlated with $(\mu_i, \alpha_i)$, while exogenous in the sense that it is independent of $(\epsilon_{it})_{t \le T}$. For example, consider an exogenous intervention that shifts $x_{it}$ to $l(x_{it})$. The average outcome after this intervention is $E(l(x_{it})'\mu_i + \alpha_i + \epsilon_{it})$ and depends on the joint distribution of $(\mu_i, x_{it})$, where $\mu_i$ is unobservable. It could potentially be challenging to obtain since we left this joint distribution unrestricted, but because Equation (10) expresses $\mu_i$ as a function of the primitives, it can be once more plugged in to recover average effects. The change in expected outcome is

$$E(l(x_{it})'\mu_i + \alpha_i + \epsilon_{it} - [x_{it}'\mu_i + \alpha_i + \epsilon_{it}]) = E([l(x_{it}) - x_{it}]'\mu_i)$$
$$= E([l(x_{it}) - x_{it}]'[Q_i \dot{y}_i - Q_i g(V_i) - Q_i \dot{u}_i])$$
$$= E([l(x_{it}) - x_{it}]'[Q_i \dot{y}_i - Q_i g(V_i)]),$$

where the final equality holds by exogeneity of the change in regressors. All elements in the last expectation are identified, thus identifying the average change in outcome.

2.3 Invertibility of $M(V)$

We now provide conditions under which the matrix $M(V)$ is nonsingular almost surely in $V$, as required by Assumption 2.3. We first state a set of high-level conditions and prove that they imply Assumption 2.3.
We will then explore on a case-by-case basis situations in which these high-level conditions are satisfied. We also provide some extensions. We use the notation $\mathrm{Int}(\cdot)$ to refer to the interior of a set; $S_{W|\bar{V}}$ refers to the support of the random variable $W$ conditional on the variable $V$ taking the value $\bar{V}$. $\mathrm{Rank}(A)$ refers to the column rank of a matrix $A$, $GL_{T-1}(\mathbb{R})$ is the space of matrices of size $(T-1) \times (T-1)$ that are invertible, and $p_{W|V}(\cdot \mid \cdot)$ is the conditional density of a random variable $W$ conditional on the variable $V$. We will write, for two random variables $A$ and $B$, $S_A$ for the support of $A$ and $S_{A|b}$ for the support of $A$ conditional on $B = b$.

[Footnote: The support of a continuous random variable $Z$ with density $p_Z$ is defined as the closure of the set where $p_Z$ takes nonzero values.]

Before giving a formal statement, a brief intuition is given here on what properties of the random variables are used to show invertibility of $M(V)$. Recall that $M(V) = E(M \mid V) = E(M(\dot{X}) \mid V)$, where $M(\dot{X}) = I_{T-1} - \dot{X}(\dot{X}'\dot{X})^{-1}\dot{X}'$ is an orthogonal projection matrix. $M(\dot{X})$ projects onto the space orthogonal to the $d_x$ columns of $\dot{X}$, where each column $k$ corresponds to the $T-1$ time differences of the regressor $x_{i\cdot,k}$. By the properties of orthogonal projection matrices, $M(\dot{X}) = M(\dot{X})^2 = M(\dot{X})' = M(\dot{X})'M(\dot{X})$. This implies that $M(V) = E(M(\dot{X})'M(\dot{X}) \mid V)$. Thus, for a given $V \in S_V$,

$$M(V) \notin GL_{T-1}(\mathbb{R}) \iff \exists c \in \mathbb{R}^{T-1}\setminus\{0\},\ M(V)c = 0 \implies \exists c \in \mathbb{R}^{T-1}\setminus\{0\},\ c'M(V)c = 0$$
$$\iff \exists c \in \mathbb{R}^{T-1}\setminus\{0\},\ E(c'M(\dot{X})'M(\dot{X})c \mid V) = 0 \iff \exists c \in \mathbb{R}^{T-1}\setminus\{0\},\ E(\|M(\dot{X})c\|^2 \mid V) = 0,$$

and since $\|M(\dot{X})c\|^2$ is a nonnegative function, this implies that $\|M(\dot{X})c\|^2 = 0$ for almost every $\dot{X}$ conditional on $V$, i.e., $P_{\dot{X}|V}$-a.s. That is, $M(\dot{X})c = 0$, $P_{\dot{X}|V}$-a.s. This result is very useful here: it implies that if a sum of orthogonal projections of a given vector $c$ is zero, then each of the orthogonal projections of $c$ is zero.
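Before continuing, this mechanism can be made concrete with a small numerical sketch (an illustrative design of our own: draws of $\dot{X}$ at a fixed value of $V$) approximating $M(V) = E(M(\dot{X}) \mid V)$ by simulation, first in a well-behaved case and then in a degenerate case where all draws share a common column:

```python
import numpy as np

rng = np.random.default_rng(3)
T, d_x, n = 5, 2, 20_000            # M(X-dot) is (T-1) x (T-1) = 4 x 4

def M_of(Xd):
    # Orthogonal projection onto the space orthogonal to the columns of X-dot.
    return np.eye(Xd.shape[0]) - Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T

# Case 1: X-dot varies freely given V, so the null spaces of M(X-dot)
# (the column spaces of X-dot) have trivial intersection across draws.
M_bar = np.zeros((T - 1, T - 1))
for _ in range(n):
    M_bar += M_of(rng.normal(size=(T - 1, d_x)))
M_bar /= n
print(np.linalg.eigvalsh(M_bar).min())   # bounded away from zero

# Case 2: every draw of X-dot contains the column c = (1,1,1,1)', so each
# M(X-dot) annihilates c and the average M(V) is singular in direction c.
c = np.ones(T - 1)
M_deg = np.zeros((T - 1, T - 1))
for _ in range(n):
    M_deg += M_of(np.column_stack([c, rng.normal(size=T - 1)]))
M_deg /= n
print(np.abs(M_deg @ c).max())           # approximately zero
```

In the first case the smallest eigenvalue of the simulated $M(V)$ settles near $(T-1-d_x)/(T-1) = 0.5$, while in the second a single shared direction in the null spaces is enough to destroy invertibility.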
Thus, the goal of any assumption implying invertibility of $M(V)$ is to make sure that, as the value of $X$ varies on $\mathcal{S}_{X|V}$, the null spaces of the matrices $M(X)$ have trivial intersection. The null space of $M(X)$ being the space spanned by the columns of $X$, intuitively this requirement will be satisfied if $\mathcal{S}_{X|V}$ contains sufficiently different draws of $X$. The following result provides one way to formalize this explanation.

Assumption 2.4.
The following holds almost surely in $V$.
1. $\mathrm{Int}(\mathcal{S}_{X|V}) \neq \emptyset$,
2. There exists a basis $e = (e_1, .., e_{T-1})$ of $\mathbb{R}^{T-1}$ such that for each $t \le T-1$ there exists $X^{(t)} \in \mathrm{Int}(\mathcal{S}_{X|V})$ with $p_{X|V}(X^{(t)}|V) > 0$, $\mathrm{Rank}(X^{(t)}) = d_x$ and $X^{(t)\prime} e_t = 0$.

Result 2.2. Under Assumption 2.4, $M(V)$ is nonsingular almost surely in $V$.

Note. Some comments are necessary on the conditions imposed in Assumption 2.4.
1. If $e$ is the canonical basis of $\mathbb{R}^{T-1}$, that is, $e_1 = (1, 0, .., 0)'$, ..., $e_{T-1} = (0, .., 0, 1)'$, then $X^{(t)\prime} e_t = 0$ requires the $t$-th time difference of the draw to be zero, i.e., $x_{it} = x_{i\,t+1} = \bar x$. In this case, because $E(y_{i\,t+1} - y_{it} \mid x_{it} = x_{i\,t+1} = \bar x, V_i = V) = f_{t+1}(V) - f_t(V) = g_t(V)$, $g_t$ is trivially identified. Note that the condition $p_{X|V}(X^{(t)}|V) > 0$ only requires finitely many such draws ($T-1$, exactly), and we do not need to know these values of $\bar x$ to obtain identification; we simply need their existence.
2. The existence of a draw $X^{(t)}$ such that $X^{(t)}$ is of full column rank $d_x$ and $X^{(t)\prime} e_t = 0$ requires the number of columns of $X^{(t)}$ to be lower than $T-1$, i.e., $d_x \le T-2$. Note also that a regressor whose time differences are constant over time cannot be included: the vector $(1, .., 1)'$ would then always lie in the null space of the matrix $M_i$, which would imply that the matrix $M(V_i)$ is singular.
3. The statement here is about invertibility of $M(V)$ almost everywhere on the support of $V$. In Section 4.1, we will construct an estimator, and to study its asymptotic properties we will assume that $M(V)$ is invertible for all $V \in \mathcal{S}_V$, to ensure that $\lambda_{\min}(M(V))$ is bounded away from zero uniformly in $V$.

Assumption 2.3 is implicitly an assumption on the first stage used to construct the control variables $V$, i.e., it is an assumption on the role of the instruments $(z_t)_{t \le T}$. To better understand it, some particular cases are discussed here, as well as some extensions.

Example 1:
We focus here on Model (2) when there are no exogenous regressors and the endogenous regressor is scalar, i.e., $d_1 = 0$ and $d_2 = d_x = 1$. We write

  $x_{it} = b_t(z_{it}) + v_{it}, \qquad E(v_{it}|z_{it}) = 0.$  (13)

For simplicity we also consider the case where the instrument $z_{it}$ is real-valued. In this case, we show that for any value of $V$ and as $Z$ varies on $\mathcal{S}_{Z|V}$, it is sufficient that two draws of $X$ be non-collinear. Since $X$ is the column vector of the values of $x$ over time, the non-collinearity is a condition requiring the instrument to have an impact on the variations over time of $x$. This is also visible in the following assumption.

Assumption 2.5.
1. For all $t \le T$, $b_t$ is a continuously differentiable function,
2. Almost surely in $V$: $\mathrm{Int}(\mathcal{S}_{Z|V}) \neq \emptyset$. Moreover, there exists $Z^V \in \mathrm{Int}(\mathcal{S}_{Z|V})$ such that $p_{Z|V}(Z^V|V) > 0$ and, for some $t \le T-1$, $d b_t(z^V_{it})/dz_t \neq 0$.

Result 2.3. If (13) holds and Assumption 2.5 is satisfied, $M(V)$ is nonsingular $P_V$-a.s.

Example 2:
Consider again Model (2) but where now the first stage is linear. That is, $x_{it} \in \mathbb{R}^{d_x}$, $z_{it} \in \mathbb{R}^{d_z}$ and

  $x_{it} = A_t z_{it} + v_{it}, \qquad E(v_{it}|z_{it}) = 0,$  (14)

with $A_t$ of size $d_x \times d_z$. Taking the basis $e$ to be the canonical basis of $\mathbb{R}^{T-1}$, Condition 2 of Assumption 2.4 says that almost surely in $V$ and for all $t \le T-1$, there exists $X^{(t)} \in \mathcal{S}_{X|V}$ such that $e_t' X^{(t)} = \Delta x^{(t)}_t = 0$, i.e., the $t$-th time difference of the draw is zero. By (14), for a fixed value $V$ of the vector of control variables, this imposes for all $t \le T-1$ the existence of $Z^{(t)} \in \mathcal{S}_{Z|V}$ such that $A_{t+1} z^{(t)}_{t+1} - A_t z^{(t)}_t + \Delta v_t = 0$, where $\Delta v_t = v_{t+1} - v_t$. It is visible that this is a condition on the dynamics of $(A_t z_t)_{t \le T}$ conditional on $V$, which can be translated into conditions on the dynamics of the instrument depending on the matrices $(A_t)_{t \le T}$.

For instance, this condition will be satisfied if $A_{t+1}$ has full row rank (which implies $d_z \ge d_x$) and if $A_{t+1}'(A_{t+1}A_{t+1}')^{-1}(A_t z_t - \Delta v_t) \in \mathcal{S}_{z_{t+1}|V, z_t}$ for some $z_t \in \mathcal{S}_{z_t|V}$. The support of $z_{t+1}$ conditional on $V$ and $z_t$ needs to be large enough. If, for instance, the matrix $A_t = A$ does not vary over time and is of full row rank, it is the support of $z_{t+1} - z_t$ given $V$ which must be large enough, as this condition becomes $A'(AA')^{-1}\Delta v_t \in \mathcal{S}_{z_t - z_{t+1}|V}$.

Another example is if the instrument does not vary over time, $z_t = z$ for all $t \le T$. Condition 2 of Assumption 2.4 would be satisfied in this case if, for all $t \le T-1$, $A_{t+1} - A_t$ is of full row rank and $(A_t - A_{t+1})'[(A_{t+1} - A_t)(A_{t+1} - A_t)']^{-1}\Delta v_t \in \mathcal{S}_{z|V}$. Observe that having $z_t$ constant over time transfers the time-variation requirement onto the matrices $A_t$, as we now need $A_{t+1} - A_t$ to have full row rank. One of the applications in Section 3, studying sequential exogeneity, exploits this possibility as it uses $x_{i1}$ as an instrument for all time periods.

Extension 1:
The support condition in Assumption 2.4 does not allow for deterministic relations between regressors. If some regressors do depend deterministically on others, it is possible to rewrite the model and obtain sufficient conditions guaranteeing invertibility of the matrix. Writing $x_{it} = (x^1_{it}, .., x^s_{it}, x^{s+1}_{it}, .., x^{d_x}_{it})$, where the first $s$ components of $x_{it}$ do not have functional dependence, we assume there are $d_x - s$ functions $(l_k)_{s+1 \le k \le d_x}$ such that for $s+1 \le k \le d_x$, $x^k_{it} = l_k(x^1_{it}, .., x^s_{it})$. We define $X^s = (x^1_i, .., x^s_i)$, the collection of time differences for the first $s$ components of $x_{it}$. With this new setting we can rewrite Assumption 2.4.

Assumption 2.6. The following holds almost surely in $V$.
1. $\mathrm{Int}(\mathcal{S}_{X^s|V}) \neq \emptyset$,
2. For all $s+1 \le k \le d_x$, $l_k$ is a continuous function,
3. For all $t \le T-1$, there exists $X^{(t)}$ such that $(X^{(t)})^s \in \mathrm{Int}(\mathcal{S}_{X^s|V})$, $p_{X|V}(X^{(t)}|V) > 0$, $\mathrm{Rank}(X^{(t)}) = d_x$ and $X^{(t)\prime} e_t = 0$.

Result 2.4. Under Assumption 2.6, $M(V)$ is nonsingular almost surely in $V$.

Note that the support condition is on $X^s$ while the orthogonality condition is on the whole collection of columns of $X$. Assuming full rank implies that the $l_k$ functions cannot be linear.

Extension 2:
The various sets of assumptions suggested so far do not handle the case where the conditional distribution of $X$ given $V$ is discrete. That is, if say the control variable comes from a selection equation $x_{it} = c(z_{it}, v_{it})$, then $z_{it}$ cannot be a discrete random variable. It is however possible to extend the previous framework to obtain invertibility of $M(V)$ when $z$ is a discrete random variable. We point out this compatibility here, which extends to the overall identification argument, but we will often assume in the rest of the paper that $x$, $z$ and $v$ are continuously distributed. (Note that we do not assume that $V$ is continuously distributed, because we directly assume that $V$ is a control variable satisfying the control function assumption as well as the invertibility assumption. Typically the construction of the control variable will require $V$ to be continuously distributed; see, e.g., Assumption (ii) of Theorem 1 in Imbens and Newey (2009).)

Assume that the vector $Z_i$ conditional on $V_i = V$ takes $N(V)$ values with positive probability. For each value $Z^{(N)}$, $N \le N(V)$, we denote by $X^{(N)}$ and $M^{(N)}$ the corresponding matrix of regressors and projection matrix. Using the fact that each $M^{(N)}$ is an orthogonal projection matrix, we look at the singularity condition on $M(V)$:

  $M(V) \notin GL_{T-1}(\mathbb{R}) \iff \exists c \in \mathbb{R}^{T-1}\setminus\{0\},\ M(V)c = 0$
  $\Rightarrow \exists c,\ c'M(V)c = 0$
  $\Rightarrow \exists c,\ \sum_{N \le N(V)} c'M^{(N)\prime}M^{(N)}c = 0$
  $\Rightarrow \exists c,\ \sum_{N \le N(V)} \|M^{(N)}c\|^2 = 0$
  $\Rightarrow \forall N \le N(V),\ M^{(N)}c = 0.$

An assumption yielding invertibility of $M(V)$, $P_V$-a.s., is as follows.

Assumption 2.7. Almost surely in $V$, there exists a basis $e = (e_1, .., e_{T-1})$ of $\mathbb{R}^{T-1}$ such that for each $t \le T-1$ there exists $N_t(V) \le N(V)$ such that $X^{(N_t(V))\prime} e_t = 0$.

Result 2.5. If Assumption 2.7 holds, then $M(V)$ is nonsingular $P_V$-a.s.

Note that if Assumption 2.2 holds, $X^{(N)}$ is of full column rank for all $N$. Thus, for a given $X^{(N)}$, there can be at most $(T-1) - d_x$ linearly independent vectors in the null space of $X^{(N)\prime}$. Assumption 2.7 therefore implies that the support of $Z$ conditional on $V$ must have at least $\lceil (T-1)/(T-1-d_x) \rceil$ points.

If it is known to the researcher that the random coefficients associated with some covariates $l_{it} \in \mathbb{R}^{d_l}$ have a degenerate distribution, we propose a different procedure. Consider the model

  $y_{it} = l_{it}' b + x_{it}' \mu_i + \epsilon_{it},$  (15)

where $x_{it} = (x_{1it}, x_{2it}) \in \mathbb{R}^{d_x}$ and, as in Model (2), $x_{1it} \in \mathbb{R}^{d_1}$ are exogenous regressors and $x_{2it} \in \mathbb{R}^{d_2}$ are allowed to be endogenous. We also write $l_{it} = (l_{1it}, l_{2it})$, where $l_{1it} \in \mathbb{R}^{d_{l_1}}$ is exogenous and $l_{2it} \in \mathbb{R}^{d_{l_2}}$ is endogenous; these are regressors known to have a homogeneous impact. In the case where the control variables are the residuals of the regression of $x_{2it}$, these extensions are useful for two reasons. First, if all coefficients are assumed heterogeneous, the procedure described in Section 2.2 requires $T$ to be at least
$d_x + d_l + 2$, which means that the vector of control variables will be of dimension at least $(d_{l_2} + d_2)(d_x + d_l + 2)$. $V$ is an argument of the function $g$, which will be nonparametrically estimated; a high dimension of $V$ is undesirable because of the curse of dimensionality. However, if $l_{it}$ is known to have a homogeneous impact, $T$ only needs to be larger than $d_x + 1$, which is a less restrictive requirement. We will show that the dimension of the conditioning sets in that case does not have to exceed $d_2(d_x + 2)$ (which is reached if $T$ is taken to be exactly $d_x + 2$). Assume that

  $E(\epsilon_{it} \mid Z^L_i, X_i, V_i) = f_t(V_i),$

where, as before, $V_i$ is an identified function of the regressors $X_i$ and the instruments $Z_i$, and $Z^L_i$ is composed of $L_i$ and of instruments for $L_i$. The matrices $M_i$ and $Q_i$ are the same functions of $X_i$ as in the main model. Using the within-group operation,

  $M_i y_i = M_i L_i b + M_i g(V_i) + M_i u_i, \quad \text{with } E(M_i u_i \mid Z^L_i, X_i, V_i) = 0,$  (16)

so that

  $E(M_i y_i \mid V_i) = E(M_i L_i \mid V_i) b + M(V_i) g(V_i)$
  $\Rightarrow M_i M(V_i)^{-1} E(M_i y_i \mid V_i) = M_i M(V_i)^{-1} E(M_i L_i \mid V_i) b + M_i g(V_i).$

The left multiplication by $M_i M(V_i)^{-1}$ leads to having the term $M_i g(V_i)$, which also appears in (16). This suggests a modification of the procedure developed in Robinson (1988) for the identification of $b$. Indeed, defining $\Delta y_i = y_i - M(V_i)^{-1} E(M_i y_i \mid V_i)$ and $\Delta L_i = L_i - M(V_i)^{-1} E(M_i L_i \mid V_i)$, we obtain

  $M_i \Delta y_i = M_i \Delta L_i b + M_i u_i.$

Since $E(Z^{L\prime}_i M_i u_i) = E(Z^{L\prime}_i M_i E(u_i \mid Z^L_i, X_i, V_i)) = 0$, we obtain

  $b = E(Z^{L\prime}_i M_i \Delta L_i)^{-1} E(Z^{L\prime}_i M_i \Delta y_i),$  (17)

under the assumption that $E(Z^{L\prime}_i M_i \Delta L_i)$ is nonsingular. Once $b$ is identified, identification of $g$ and $E(\mu_i)$ will be obtained by applying the results of Sections 2.2.2 and 2.2.3 to $y_{it} - l_{it}'b$.

If $d_x = 1$, that is, if the researcher is interested in relaxing the homogeneity assumption for one endogenous regressor, then it is required that $T \ge 3$. Taking $v_{it}$ to be scalar and $T = 3$, the dimension of the conditioning set for the nonparametric regressions needed for identification is 3, independently of the number of regressors in $l_{it}$.

2.4.2 Using all time periods when $T > k_x + 2$

For identification, it is required that $T \ge k_x + 2$, so if $T > k_x + 2$, one can select $k_x + 2$ periods among the $T$ available and obtain identification, assuming the control function approach assumption as well as the invertibility condition hold for this subset of periods. However, it is possible to use the $T$ time periods without increasing the dimension of the conditioning set.

We assume here that $T > k_x + 2$, and denote by $\mathcal{T}$ the set of subsets of $\{1, .., T\}$ of cardinality $k_x + 2$. The cardinality of $\mathcal{T}$ is $\binom{T}{k_x+2}$. Consider $\tau \in \mathcal{T}$, a subset of $k_x + 2$ time periods, $\tau = (t_1, ..., t_{k_x+2})$ where $t_1 < \cdots < t_{k_x+2}$. We write with a superscript $\tau$ the vectors that are defined using only the time periods in $\tau$. For instance, $V^\tau_i = (v_{it_1}, .., v_{it_{k_x+2}})$. Then (1) implies $y^\tau_i = X^\tau_i \mu_i + \epsilon^\tau_i$. Now we modify the control function approach assumption.
Assumption 2.8. There exist a set of functions $(h^\tau_t)_{t \in \tau, \tau \in \mathcal{T}}$ and identified functions $(C_t)_{t \le T}$ such that, defining $v_{it} = C_t(x_{it}, z_{it}) \in \mathbb{R}^{d_v}$,

  $\forall \tau \in \mathcal{T},\ \forall t \in \tau, \quad E(\epsilon_{it} \mid X^\tau_i, V^\tau_i) = h^\tau_t(V^\tau_i).$

Indeed, the assumption that we used in the main model is $E(\epsilon_{it} \mid X_i, V_i) = f_t(V_i)$, and it does not imply Assumption 2.8. If Assumption 2.1 (2) holds, then by the law of iterated expectations $E(\epsilon_{it} \mid X^\tau_i, V^\tau_i) = E(f_t(V_i) \mid X^\tau_i, V^\tau_i)$, which is not necessarily a function of $V^\tau_i$ only. However, the independence assumptions we make in all our applications directly satisfy Assumption 2.8.

For a given $\tau$ in $\mathcal{T}$, changing the definition of $g_t$ to $g^\tau_t = h^\tau_{t+1} - h^\tau_t$, identification of the vector of functions $g^\tau$ follows from the same first step, provided that $M^\tau(V^\tau)$ is invertible. In the main model, identification of $E(\mu_i)$ follows from (10), which would become $E(\mu_i) = E(Q^\tau_i y^\tau_i - Q^\tau_i g^\tau(V^\tau_i))$. But since this holds for every subset $\tau$, we can also write

  $E(\mu_i) = \binom{T}{k_x+2}^{-1} \sum_{\tau \in \mathcal{T}} E(Q^\tau_i y^\tau_i - Q^\tau_i g^\tau(V^\tau_i)).$

(I thank Donald Andrews for this suggestion.)

2.4.3 Identifying higher-order properties of $\mu_i$

In a model with strict exogeneity, Arellano and Bonhomme (2012) extend the method in Chamberlain (1992) to identify the variance matrix, and even the distribution, of $(\alpha_i, \mu_i)$ under various restrictions on the time dependence of $\epsilon_{it}$ and on the joint distribution of $(\epsilon_i, \alpha_i, \mu_i)$. The argument first identifies the common parameters. Then, subtracting the common part from the outcome variables, higher-order moments of $\mu_i$ are separated from those of $\epsilon_i$ using the above-mentioned restrictions. We note here that their argument can be combined with the assumptions made in the present paper so as to allow for endogeneity of the regressors. Indeed, $g(V_i)$ being recovered using the method described in Section 2.2.2, the analysis of Arellano and Bonhomme (2012) can be conducted on $y_i - g(V_i) = X_i \mu_i + u_i$, which takes the same form as in their paper. We refer to their paper for more details on the procedure to recover these moments.

3 Applications

In this section, we propose a direct application of the model and also describe some models, different from the main model (1), in which, using the appropriate control variables, the two-step approach also provides identification results under some conditions.
3.1 Production functions

Consider a decision variable $x_{it}$ chosen by an agent and an outcome variable $y_{it}$ realized after the choice of $x_{it}$, given by $y_{it} = x_{it}' \mu_i + \epsilon_{it}$. Such a production function can be used to model education outcomes, where $x_{it}$ is any type of parental investment and the random coefficients represent heterogeneity in the returns to investment at the child level. But $y_{it}$ can also be a firm or farm output, with $x_{it}$ being capital, labor and/or land inputs.

The use of a triangular system in such a model is suggested in Imbens and Newey (2009), and we follow this example here, using for each time period the decision problem to obtain a selection equation. An important difference is that we assume that the agent does not know $(\mu_i, \epsilon_{it})$ at the time of the decision. Instead, she has information about it contained in $\eta_{it} \in \mathbb{R}$, a scalar random variable. Writing $C_t(x, z)$ a cost function, with $z$ the cost shifters, she chooses $x_{it}$ to maximize an expected profit,

  $x_{it} = \arg\max_x E(y_{it} - C_t(x, z_{it}) \mid z_{it}, \eta_{it}).$  (18)

This implies the existence of a function $H_t$ such that $x_{it} = H_t(z_{it}, \eta_{it})$. We assume that for all $t \le T$, $H_t(z_{it}, \eta)$ is strictly monotonic in $\eta$ with probability 1, $\eta_t$ is continuously distributed and its CDF is strictly increasing. We also assume that $(\eta_{it}, \epsilon_{it})_{t \le T} \perp (z_{it})_{t \le T}$. Defining $v_{it} = F_{x_t|z_t}(x_{it}|z_{it}) = F_{\eta_t}(\eta_{it})$, these assumptions, as shown in Imbens and Newey (2009), imply that $\epsilon_{it}$ is independent of $Z_i$ conditional on $V_i$. Therefore,

  $E(\epsilon_{it} \mid V_i, X_i) = E(E(\epsilon_{it} \mid V_i, X_i, Z_i) \mid V_i, X_i) = E(E(\epsilon_{it} \mid V_i, Z_i) \mid V_i, X_i) = E(\epsilon_{it} \mid V_i) =: f_t(V_i).$

This proves that the model studied here satisfies the control function assumption, Assumption 2.1. If in addition the invertibility assumption, Assumption 2.3, holds, the identification results obtained in the previous section apply and the average returns to input are identified. Note however that this result requires the information $\eta_{it}$ to be scalar, while the unobserved heterogeneity $(\mu_i, \epsilon_{it})$ in the outcome equation is of higher dimension. On the other hand, it does not require the instrument to be independent of the returns $\mu_i$.

3.2 Sample selection

Consider a panel model with random coefficients and sample selection. Loosely speaking, if the selection is correlated with the disturbance of the main equation, an endogeneity problem arises, since the regressors of the selected individuals will be correlated with the disturbance as well. Das, Newey, and Vella (2003) study a nonparametric model of sample selection in a cross-sectional setting and address the endogeneity issue with a selection equation which provides them with a control variable.

The selection equation studied here is similar, and some of the arguments closely follow theirs, but the outcome equation differs: as in (1), it is a panel random coefficients specification. The selection model we consider is

  $y^*_{it} = x_{it}' \mu_i + \alpha_i + \epsilon_{it},$
  $d_{it} = 1(\eta_{it} \le C_t(x_{it}, z_{it})),$  (19)
  $y_{it} = d_{it} y^*_{it},$

where $z_{it}$ is an instrument. Let $d_i = (d_{it})_{t \le T}$, and write $d_i = 1$ if $d_{it} = 1$ for all $t \le T$. Also, let $p_{it} = E(d_{it} \mid x_{it}, z_{it}) = P(\eta_{it} \le C_t(x_{it}, z_{it}))$, $P_i = (p_{it})_{t \le T}$, and assume that for each $t$ there is a function $f_t$ such that for all $t \le T$,

  $E(\epsilon_{it} \mid d_i = 1, X_i, P_i) = f_t(P_i).$  (20)

Note that, as pointed out in Das, Newey, and Vella (2003) in the cross-sectional case, this assumption is satisfied in particular if $(\epsilon_{is}, \eta_{is})_{s \le T} \perp (X_i, Z_i)$ and if the cdf of $\eta_t$, $F_t$, is strictly increasing.
Indeed, in this case, defining $\nu_{it} = F_t(\eta_{it})$, $\nu_i = (\nu_{it})_{t \le T}$ and $p_{it} = F_t(C_t(x_{it}, z_{it}))$, then $d_{it} = 1(\nu_{it} \le p_{it})$ and

  $E(\epsilon_{it} \mid d_i = 1, X_i, P_i) = E(E(\epsilon_{it} \mid \nu_i, X_i, Z_i) \mid d_i = 1, X_i, P_i) = E(E(\epsilon_{it} \mid \nu_i) \mid \nu_i \le P_i) =: f_t(P_i),$

as desired (where, by an abuse of notation, the inequality $\nu_i \le P_i$ denotes the inequality component by component). Note that the joint distribution of $(\epsilon_{it}, \eta_{it})$ is unrestricted under these assumptions.

This conditional expectation has a form similar to the control function assumption we maintained in the identification section on the main model, where the control variable is now $p_{it}$ and is identified through a cross-sectional regression of $d_{it}$, for each period $t$. Identification can thus be obtained by a similar two-step argument. The important difference is that all the conditional expectations are evaluated for the subsample such that $d_i = 1$, that is, the subsample of individuals who are selected in all periods. To be more precise, define $u_{it} = \epsilon_{it} - f_t(P_i)$, $\tilde x_{it} = d_{i\,t+1} x_{i\,t+1} - d_{it} x_{it}$, $\tilde g_t(P_i) = d_{i\,t+1} f_{t+1}(P_i) - d_{it} f_t(P_i)$, and similarly $\tilde u_{it}$ and the matrices and vectors $\tilde X_i$, $\tilde u_i$, $\tilde g(P_i)$, $\tilde M_i$ and $\tilde Q_i$. Note that for the subsample such that $d_i = 1$, we have $\tilde X_i = X_i$. Hence,

  $E(\tilde M_i \tilde y_i \mid d_i = 1, P_i) = E(\tilde M_i \tilde X_i \mu_i + \tilde M_i \tilde g(P_i) + \tilde M_i \tilde u_i \mid d_i = 1, P_i)$
  $= E(M_i X_i \mu_i + M_i g(P_i) + M_i u_i \mid d_i = 1, P_i) = \tilde M(P_i) g(P_i),$

where we define $\tilde M(P_i) = E(\tilde M_i \mid d_i = 1, P_i)$. This first-step equation identifies $g$ on the support of $P_i$ if $\tilde M(P_i)$ is invertible a.s. The second-step equation will be given by

  $E(\tilde Q_i \tilde y_i - \tilde Q_i \tilde g(P_i) \mid d_i = 1) = E(\mu_i \mid d_i = 1).$

Assumption 3.1. $E(\epsilon_t \mid d = 1, X, P) = f_t(P)$, and $\tilde M(P)$ is invertible almost surely in $P$.

Result 3.1.
Under Assumptions 4.7 and 3.1, $E(\mu \mid d = 1)$ is identified.

The identified object is the average effect conditional on selection, $E(\mu \mid d = 1)$, which is in general different from $E(\mu)$ unless $\mu \perp (\eta, X, Z)$. That additional restrictions are needed to identify $E(\mu)$ is intuitive: when $d_{it} \neq 1$ in some periods, selection can be systematically related to $\mu_i$. If $T > d_x + 2$, we recommend using the procedure described in Section 2.4.2 and computing the average effect conditional on being selected in a subset of time periods. Averaging over all subsets identifies a conditional average effect under some additional conditions. This avoids using only the subsample of individuals for whom $d_{it} = 1$ for all $t \le T$, which can be quite small if $T$ is large (and still fixed).

Other cases studied in Das, Newey, and Vella (2003) can be handled here. For instance, the model allows for regressors $s_{it}$ to be subject to selection as well, that is, to be unobserved for the population such that $d_{it} = 0$. If $s_{it}$ does not enter $C_t$ and the condition $E(\epsilon_t \mid d = 1, X, S, P) = f_t(P)$ holds, the identification argument remains valid.

This sample selection model can also be adapted to the case where some of the regressors are endogenous. If these regressors are not multiplied by random coefficients, the argument of Section 2.4.1 can be applied. If they are accompanied by random coefficients, on the other hand, we suggest using the control function approach on the endogenous regressors, the control variables being, for instance, the residuals of the regression of the endogenous regressors on the exogenous regressors and instruments. The identification method developed above would then use a vector of control variables which includes these residuals in addition to the propensity scores.

One last case worth mentioning, to which our two-step approach can be adapted, is when some regressors are endogenous and subject to selection. We briefly explain how to construct the control variables. The model is

  $y_{it} = d_{it} y^*_{it}, \quad \text{with } y^*_{it} = x_{1it}' \mu_{1i} + x_{2it}' \mu_{2i} + \epsilon_{it},$
  $x_{2it} = d_{it} x^*_{2it}, \quad \text{with } x^*_{2it} = \pi_t(x_{1it}, z_{1it}) + v_{it},$  (21)
  $d_{it} = 1(\nu_{it} \le p_t(x_{1it}, z_{1it}, z_{2it})) = 1(\nu_{it} \le p_{it}), \quad \text{with } \nu_{it} \sim U[0, 1].$

Assume that $(\epsilon_{is}, \nu_{is}, v_{is})_{s \le T} \perp (x_{1is}, z_{1is}, z_{2is})_{s \le T}$. In this model, $x_2$ is the endogenous regressor. Identification of $p_{it}$ holds by $E(d_{it} \mid x_{1it}, z_{1it}, z_{2it}) = p_{it}$. Moreover,

  $E(v_{it} \mid d_{it} = 1, x_{1it}, z_{1it}, z_{2it}) = E(E(v_{it} \mid \nu_{it}) \mid \nu_{it} \le p_{it}, x_{1it}, z_{1it}, z_{2it}) =: \phi_t(p_{it}),$

implying $E(x_{2it} \mid d_{it} = 1, x_{1it}, z_{1it}, z_{2it}) = \pi_t(x_{1it}, z_{1it}) + \phi_t(p_{it})$. This gives

  $x_{2it} - E(x_{2it} \mid d_{it} = 1, x_{1it}, z_{1it}, z_{2it}) = v_{it} - \phi_t(p_{it}) =: \bar v_{it},$  (22)

where $\bar v_{it}$ is identified. That is, the residuals for individuals selected in the sample are also control variables. The corresponding estimator will not need generated covariates. We define again $\nu_i = (\nu_{is})_{s \le T}$, and similarly $d_i$, $P_i$, $\bar V_i$, $X_i = (X_{1i}, X_{2i})$ and $\phi(P_i) = (\phi_t(p_{it}))_{t \le T}$. Note that the function $(V_i, P_i) \mapsto (\bar V_i, P_i)$ is one-to-one. Therefore,

  $E(\epsilon_{it} \mid d_i = 1, X_i, \bar V_i, P_i) = E(E(\epsilon_{it} \mid \nu_i, V_i, X_{1i}, Z_{1i}, Z_{2i}) \mid \nu_i \le P_i, X_i, V_i, P_i)$
  $= E(E(\epsilon_{it} \mid \nu_i, V_i) \mid \nu_i \le P_i, X_i, V_i, P_i) =: h_t(V_i, P_i) = h_t(\bar V_i + \phi(P_i), P_i) =: f_t(\bar V_i, P_i).$

This conditional expectation is as in Assumption 2.1, where the control variables are $(\bar V_i, P_i)$. This double use of the control function approach is already suggested in Das, Newey, and Vella (2003). We presented here a slight modification such that the identification requires two steps instead of three.
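A minimal simulation sketch of this double control-variable construction, a propensity score plus a selected-sample first-stage residual as in (21)-(22), follows. The selection index, the first stage, the logistic latent error and the degree-2 polynomial sieve are all illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x1, z1, z2 = rng.standard_normal((3, n))
idx = 0.5 * x1 + z1 + z2                      # selection index p_t(x1, z1, z2) (illustrative)
u = rng.logistic(size=n)                      # latent selection error
d = (u <= idx).astype(float)                  # equivalent to nu_it <= p_it in (21)
p = 1.0 / (1.0 + np.exp(-idx))                # true propensity score

e = rng.standard_normal(n)
v = 0.5 * u + e                               # first-stage error, correlated with selection
x2 = d * (1.0 + x1 + 2.0 * z1 + v)            # endogenous regressor, observed if selected

def series_fit(W, y):
    """Degree-2 polynomial series regression; returns fitted values."""
    k = W.shape[1]
    feats = [np.ones(len(W))] + [W[:, j] for j in range(k)] \
          + [W[:, i] * W[:, j] for i in range(k) for j in range(i, k)]
    R = np.column_stack(feats)
    return R @ np.linalg.lstsq(R, y, rcond=None)[0]

W = np.column_stack([x1, z1, z2])
p_hat = series_fit(W, d)                      # first control variable: propensity score
sel = d == 1
vbar_hat = x2[sel] - series_fit(W[sel], x2[sel])   # second control: vbar = v - phi(p)
print(round(np.corrcoef(p_hat, p)[0, 1], 2))
```

The two estimated controls $(\hat p_{it}, \hat{\bar v}_{it})$ are exactly the inputs the two-step identification argument would then use.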
This for instance allows one to use the formula for the asymptotic variance given in Section 4.4 (provided $\pi$ is not an object of interest), as it is known that increasing the number of steps typically changes the asymptotic variance matrix.

3.3 Relaxing a strict exogeneity condition

Instead of focusing on contemporaneous endogeneity, that is, the joint dependence of $(\epsilon_{it}, x_{it})$, one can use the framework of this paper to relax restrictions on the joint dependence of $(\epsilon_{it}, x_{i\,t+1}, ..., x_{iT})$. This corresponds to relaxing the strict exogeneity condition imposed in Arellano and Bonhomme (2012) to allow for sequential exogeneity.

The model is as (1),

  $y_{it} = x_{it}' \mu_i + \alpha_i + \epsilon_{it},$  (23)

where $E(\epsilon_{it} \mid x_{it}) = 0$ but we do not impose $E(\epsilon_{it} \mid X_i) = 0$: $x_{i\,t+1}$ can be impacted by $\epsilon_{it}$. As in the main model, the idea is to look for an identified vector $V_i$ such that $E(\epsilon_{it} \mid x_{i1}, .., x_{iT}, v_{i1}, .., v_{iT}) = f_t(v_{i1}, .., v_{iT}) = f_t(V_i)$. We consider the case where $x_{it}$ is a Markov process and write $x_{i\,t+1} = m_t(x_{it}) + \eta_{i\,t+1}$, $1 \le t \le T-1$, where the $\eta_{it}$ are i.i.d. over time. (One could alternatively consider $x_{i\,t+1} = m_{t+1}(x_{it}, \eta_{i\,t+1})$ with $\eta$ scalar and $m_t$ strictly monotonic in $\eta$, and use the control variable suggested in Imbens and Newey (2009).) We also assume that $(\epsilon_{it}, \eta_{i\,t+1}, .., \eta_{iT}) \perp (x_{i1}, \eta_{i2}, .., \eta_{it})$, which implies that $(\epsilon_{it}, \eta_{i\,t+1}, .., \eta_{iT}) \perp (x_{i1}, .., x_{it})$. That is, the innovations giving the evolution of $x$ after time $t$, and $\epsilon_t$, are independent of past values of $x$. (As for the sample selection example, independence is stronger than needed: conditional mean independence of $E(\epsilon_{it} \mid x_{i1}, \eta_{i2}, .., \eta_{iT})$ with respect to $x_{i1}$ is sufficient.) However, the joint distribution of $(\epsilon_{it}, \eta_{i\,t+1}, .., \eta_{iT})$ is not restricted, which allows for sequential exogeneity. Then, for all $t$ less than $T$,

  $E(\epsilon_{it} \mid x_{i1}, .., x_{iT}, \eta_{i2}, .., \eta_{iT}) = E(\epsilon_{it} \mid x_{i1}, \eta_{i2}, .., \eta_{iT}) = E(\epsilon_{it} \mid \eta_{i\,t+1}, .., \eta_{iT}) =: f_t(\eta_{i2}, .., \eta_{iT}) =: f_t(V_i),$

where the first equality holds by the Markov structure of $x$, and the second by the independence assumption on the error terms. We define $V_i = (\eta_{i2}, .., \eta_{iT})$ to be the control variable. Note that the $\eta_{it}$ are all identified as the residuals of reduced-form regressions. Assuming independence between $\eta_{it}$ for $t \ge 2$ and $x_{i1}$ might restrict the joint distribution of $(x_{it}, \mu_i)$ if $x_{i1}$ is correlated with $\mu_i$, but it does not necessarily require independence between $x_{it}$ and $\mu_i$.

Defining as previously $M_i = I - X_i(X_i'X_i)^{-1}X_i'$, $M(V_i) = E(M_i \mid V_i)$, $u_{it} = \epsilon_{it} - f_t(V_i)$ and $g_t(V_i) = f_{t+1}(V_i) - f_t(V_i)$, (23) together with the independence assumptions guarantees, as in the main model,

  $E(M_i y_i \mid V_i) = M(V_i) g(V_i), \quad \text{and} \quad E(Q_i y_i) = E(\mu_i) + E(Q_i g(V_i)).$  (24)

A two-step procedure, as in the main model, requires $M(V)$ to be nonsingular: a first step identifies the vector of functions $g$ and a second step identifies the average effect.

That $M(V)$ can be nonsingular is nontrivial here. Indeed, by definition $V_i = (\eta_{i2}, .., \eta_{iT})$ while $M_i$ is constructed using the vector of variables $X_i$: therefore the expectation of $M_i$ conditional on $V_i$ is an expectation over $x_{i1}$ only, which is the instrument here. To show that this invertibility condition can actually hold, let us look more closely at the case where $x_t$ is a scalar AR(1) process, that is, $x_{i\,t+1} = \rho x_{it} + \eta_{i\,t+1}$ with $\rho \neq 0$, $\rho \neq 1$, and $(\epsilon_{it}, \eta_{i\,t+1}, .., \eta_{iT}) \perp (x_{i1}, \eta_{i2}, .., \eta_{it})$. By definition, $M_i = I - X_i X_i'/(X_i'X_i)$. Moreover,

  $x_{it} = \rho^{t-1} x_{i1} + \sum_{s=2}^{t} \rho^{t-s} \eta_{is} \ \Rightarrow\ x_{i\,t+1} - x_{it} = \rho^{t-1}(\rho - 1) x_{i1} + (\rho - 1)\sum_{s=2}^{t} \rho^{t-s} \eta_{is} + \eta_{i\,t+1},$

therefore, defining the two vectors $C = (\rho - 1)(1, \rho, .., \rho^{T-2})' \in \mathbb{R}^{T-1}$ and $C(V) = (\eta_{i2},\ (\rho - 1)\eta_{i2} + \eta_{i3},\ ..,\ (\rho - 1)\sum_{s=2}^{T-1} \rho^{T-1-s}\eta_{is} + \eta_{iT})' \in \mathbb{R}^{T-1}$, we can write

  $X_i = x_{i1} C + C(V_i).$  (25)

For a given value $\bar V \in \mathcal{S}_V$, we proved in Section 2.3 that

  $M(\bar V) \notin GL_{T-1}(\mathbb{R}) \iff \exists a \in \mathbb{R}^{T-1}\setminus\{0\},\ M(\bar V)a = 0$
  $\Rightarrow \exists a \in \mathbb{R}^{T-1}\setminus\{0\},\ M_i a = 0,\ P_{X|V=\bar V}$-a.s.
  $\Rightarrow \exists a \in \mathbb{R}^{T-1}\setminus\{0\},\ X_i \text{ is collinear to } a,\ P_{X|V=\bar V}$-a.s.

The draws of $X$ from $P_{X|V=\bar V}$, as can be seen in (25), differ only in the value of $x_1$: these draws are the sum of two vectors, $C(\bar V)$, which is fixed since the draws are conditional on $V = \bar V$, and $x_1 C$, proportional to the constant vector $C$. Note that $\mathcal{S}_{x_1|V=\bar V} = \mathcal{S}_{x_1}$ since $x_1 \perp V$. If there are two nonzero points $x_1^{(1)}$ and $x_1^{(2)}$ in $\mathcal{S}_{x_1}$ such that $a$ is collinear to $X^{(1)} = x_1^{(1)} C + C(\bar V)$ and to $X^{(2)} = x_1^{(2)} C + C(\bar V)$, then since $a \neq 0$, $X^{(1)}$ and $X^{(2)}$ are collinear. Since $C \neq 0$, this implies that either $C$ and $C(\bar V)$ are proportional, or $C(\bar V) = 0$. Note that $C(\bar V) = 0$ only if $\bar V = 0$, and one can show that if $C$ and $C(\bar V)$ are proportional, this implies that $\bar V \in \{ b \cdot (1, .., 1)' \mid b \in \mathbb{R} \} =: D$. Hence, the existence of such $x_1^{(1)}$ and $x_1^{(2)}$ implies $\bar V \in D$. $D$ is a subset of $\mathbb{R}^{T-1}$ with $P_V$-measure 0 if $V_i$ is continuously distributed on $\mathbb{R}^{T-1}$. We summarize the arguments in the following assumption and result.

Assumption 3.2.
1. (23) holds and for all $t \le T-1$, $x_{i\,t+1} = \rho x_{it} + \eta_{i\,t+1}$ with $\rho \notin \{0, 1\}$, $(\epsilon_{it}, \eta_{i\,t+1}, .., \eta_{iT}) \perp (x_{i1}, \eta_{i2}, .., \eta_{it})$, and $(X_i, \mu_i, \alpha_i, \epsilon_i, \eta_i)$ is i.i.d.,
2. Either $x_{i1}$ has a discrete distribution with at least two support points, or $x_{i1}$ is continuously distributed and $\mathrm{Int}(\mathcal{S}_{x_1}) \neq \emptyset$. Moreover, $(\eta_{i2}, .., \eta_{iT})$ is continuously distributed on $\mathbb{R}^{T-1}$, and $f_t$ is continuous.

Result 3.2. Under Assumptions 4.7 and 3.2, $E(\mu_i, \alpha_i)$ is identified.

This result is an example of the case where identification does not require the instrument to vary over time, but where the impact of $z_{it} = x_{i1}$ on each time period $x_{it}$ creates sufficient time variation of the regressor, as a result of the condition $\rho \notin \{0, 1\}$.

Note 1. While the model exposed above allows for the regressor to be correlated with past disturbances, the assumptions are not compatible with a lagged dependent variable as regressor. Indeed, writing $x_{i\,t+1} = m_t(x_{it}) + \eta_{i\,t+1}$ implicitly imposes a homogeneous dependence on the past value that is not consistent with the specification $y_{i\,t+1} = \mu_i y_{it} + \epsilon_{it}$.

Note 2. Using the first value of the regressor as an instrument for the correlation between the residual $\epsilon$ at a given time and the future values of the regressors can be extended to models where, in addition to contemporaneous endogeneity, there is such a feedback effect. First, note that Assumption 2.1 (2), that $E(\epsilon_{it} \mid X_i, V_i) = f_t(V_i)$, does not exclude feedback for the control variables themselves, as all time periods of $(v_{is})_{s \le T}$ are arguments of the function $f_t$. One could however wonder whether a feedback effect in the instruments $z_{it}$ is allowed within the framework of this paper. For instance, the production function example given in Section 3.1 imposed $(\eta_{it}, \epsilon_{it})_{t \le T} \perp (z_{it})_{t \le T}$. As in the example studied above, it is actually possible to use the first observation of the instruments to construct the control variables.
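Returning to the AR(1) case, the decomposition (25), on which the invertibility argument rests, can be verified numerically ($\rho$, $T$ and the distributions below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
T, rho, n = 5, 0.6, 1000              # rho not in {0, 1}
eta = rng.standard_normal((n, T))     # eta_{i2}, .., eta_{iT} in columns 1..T-1
x = np.empty((n, T))
x[:, 0] = rng.standard_normal(n)      # x_{i1}, playing the role of the instrument
for t in range(1, T):
    x[:, t] = rho * x[:, t - 1] + eta[:, t]

X = np.diff(x, axis=1)                # rows x_{i,t+1} - x_{i,t}

# C = (rho - 1) * (1, rho, .., rho^{T-2})'
C = (rho - 1) * rho ** np.arange(T - 1)
# C(V_i): the part of the differences driven by the innovations only
CV = np.empty((n, T - 1))
for t in range(1, T):
    CV[:, t - 1] = (rho - 1) * sum(rho ** (t - 1 - s) * eta[:, s] for s in range(1, t)) \
                   + eta[:, t]

# decomposition (25): conditional on V, draws of X differ only through x_{i1}
assert np.allclose(X, x[:, [0]] * C + CV)
print("decomposition (25) holds")
```

Conditional on $V$, the matrix $C(V)$ part is fixed and only the scalar $x_{i1}$ moves the draws along the fixed direction $C$, which is exactly why two distinct support points of $x_{i1}$ deliver invertibility.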
We show this briefly in the same production function model, but with a Markov structure on the instrument. Recall $x_{it} = H_t(z_{it}, \eta_{it})$, and assume that the instrument satisfies $z_{i\,t+1} = m_{t+1}(z_{it}, \nu_{i\,t+1})$, where the $\nu_{it} \in \mathbb{R}$ are i.i.d. over $t$. Assume moreover that $m_{t+1}(z_{it}, \cdot)$ is strictly increasing with probability 1, $\nu_t$ is continuously distributed, its CDF is strictly increasing for all $t$, and

  $(\epsilon_{it}, \eta_{i1}, .., \eta_{iT}, \nu_{i\,t+1}, .., \nu_{iT}) \perp (z_{i1}, \nu_{i2}, .., \nu_{it}).$

This assumption guarantees $\epsilon_{it} \perp z_{it}$ but does not restrict the joint distribution of $\epsilon_{it}$ and $\nu_{is}$, $s \ge t+1$. Define $r_{it} = F_{z_{t+1}|z_t}(z_{i\,t+1}|z_{it})$, $s_{it} = F_{x_t|z_t}(x_{it}|z_{it})$, and $V_i = (s_{i1}, r_{i1}, s_{i2}, .., r_{i\,T-1}, s_{iT})$. The vector $V_i$ is identified as a collection of residuals of cross-section reduced-form regressions. Moreover,

  $E(\epsilon_{it} \mid X_i, V_i) = E(\epsilon_{it} \mid H_1(z_{i1}, \eta_{i1}), ..., H_T(z_{iT}, \eta_{iT}), s_{i1}, r_{i1}, s_{i2}, .., r_{i\,T-1}, s_{iT})$
  $= E(\epsilon_{it} \mid r_{i1}, ..., r_{i\,T-1}, s_{i1}, ..., s_{iT}) =: f_t(V_i),$

by the strict monotonicity of the functions $(m_t)_{2 \le t \le T}$, $(H_t)_{t \le T}$, $(F_{\nu_t})_{2 \le t \le T}$ and $(F_{\eta_t})_{t \le T}$. Under invertibility of $M(V)$, $P_V$ almost surely, identification of the average effect is obtained.

4 Estimation
As seen in Section 2, the proof of identification of E p µ q in Model (1) is constructive. An estimator can therefore be constructed naturally by following the identification steps and replacing population moments with their sample analogs. We assumed that the control variables are given by v it “ C t p x it , z it q where C t is identified: for ˆ C t an estimator of this function, either parametric or nonparametric depending on the form of the control variables, v it is estimated with ˆ v it “ ˆ C t p x it , z it q . The conditional expectation functions M p V q “ E p M i | V i “ V q and k p V q “ E p M i y i | V i “ V q are estimated nonparametrically using the generated values ˆ V as regressors, and the function g “ M ´1 k is estimated by plugging the estimators ˆ M p V q and ˆ k p V q into this formula. As we highlighted in Section 2.2.1, the condition E p|| Q i || q ă 8 may not hold in the data, so out of caution we estimate E p µ | δ q , where we defined δ i “ 1 p det p X i X i q ą δ q . The estimator for E p µ | δ q will be a sample analog of Equation (12), plugging in the estimators of g and V . It is clear that the asymptotic properties of this estimator will depend on the definition of the control variables, that is, on C t . The focus of the asymptotic analysis will thus be on an important example in the class of models satisfying (1) and Assumption 2.1. More specifically, the model is

y it “ x 1 it µ 1 i ` x 2 it µ 2 i ` α i ` ε it , (26)
x 2 it “ b t p x 1 it , z it q ` v it , E p v it | x 1 it , z it q “ 0 ,

where x 1 it P R d 1 , x 2 it P R d 2 , z it P R d z , and where Assumption 2.1 holds. Here the regressors x 1 are exogenous while x 2 can be endogenous. Note that v it “ x 2 it ´ E p x 2 it | x 1 it , z it q .
The control variables in this model are the residuals of the nonparametric regression of the endogenous regressors on the exogenous regressors and the instruments. The estimators used in the asymptotic analysis are as described above, where the estimator ˆ v it is the residual from the nonparametric regression estimation of x 2 it , and all estimators of the nonparametric regressions will be series estimators. We proceed in this section with an explicit definition of the estimators and a stepwise proof of asymptotic normality of ˆ µ . All proofs are in the Appendix. The vector of control variables is V i “ p v i1 , .. , v iT q P R T d 2 , where v it “ x 2 it ´ E p x 2 it | x 1 it , z it q . We write ξ it “ p x 1 it , z it q . Consider the L ˆ 1 vector of approximating functions r L p ξ t q “ p r 1 L p ξ t q , .. , r LL p ξ t qq and write r it “ r L p ξ it q . We define the series estimator of the regression function E p x 2 it | ξ it “ ξ t q “ b t p ξ t q to be ˆ β t r L p ξ t q where ˆ β t is L ˆ d 2 , and

ˆ β t “ p R t R t q ´1 ÿ i r L p ξ it q x 2 it “ p R t R t q ´1 R t X 2 t , (27)

where R t “ p r 1 t , .. , r nt q is L ˆ n and X 2 t “ p x 2 1 t , .. , x 2 nt q is d 2 ˆ n . The control variables are defined as the residuals of this regression. Later, the support S V of V will be assumed bounded. However, the values obtained using the estimated residuals might not be in S V : it will be convenient for the asymptotic analysis to introduce a transformation τ of the generated variables to ensure that their transformed values lie in S V . Specifically, we assume that the support of v t is of the form Ś d 2 d “ 1 r v td , ¯ v td s so that the support of V is S V “ Ś d ď d 2 , t ď T r v td , ¯ v td s . We define τ such that for V “ p v 1 , .. , v T q P R T d 2 , τ p V q P S V and the p d 2 p t ´ 1 q ` d q th component of τ p V q satisfies

τ p V q p t ´ 1 q d 2 ` d “ v t,d if v t,d P r v td ; ¯ v td s , τ p V q p t ´ 1 q d 2 ` d “ v td if v t,d ď v td , and τ p V q p t ´ 1 q d 2 ` d “ ¯ v td if v t,d ě ¯ v td ,

where v t,d is the d th component of v t .
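For concreteness, the first step, the series regression (27) followed by the projection onto the box support, can be sketched numerically. The sketch below is illustrative only: it assumes a single endogenous regressor, a degree-two polynomial sieve in the exogenous regressor and the instrument, and known support bounds, none of which is imposed by the paper.

```python
import numpy as np

def series_residuals(x2, basis):
    """First-step series regression as in (27): regress x2 on the sieve
    basis and return the residuals, i.e. the estimated control variables."""
    beta, *_ = np.linalg.lstsq(basis, x2, rcond=None)  # (R'R)^{-1} R'x2
    return x2 - basis @ beta

def project_onto_box(v, lower, upper):
    """Projection tau onto the assumed support: the closest point of the
    box [lower, upper], applied componentwise."""
    return np.clip(v, lower, upper)

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-1.0, 1.0, n)                 # exogenous regressor
z = rng.uniform(-1.0, 1.0, n)                  # instrument
v = rng.uniform(-0.5, 0.5, n)                  # true control variable
x2 = 1.0 + x1 - z + 0.5 * x1 * z + v           # endogenous regressor: b(x1, z) + v

# degree-two polynomial sieve basis in (x1, z), an illustrative choice
basis = np.column_stack([np.ones(n), x1, z, x1 * z, x1**2, z**2])
v_tilde = series_residuals(x2, basis)
v_hat = project_onto_box(v_tilde, -0.5, 0.5)
```

Here the regression function lies in the span of the sieve, so the estimated residuals are close to the true control variables; in general the sieve dimension L must grow with n as in the assumptions below.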
Write r it “ r L p ξ it q and ˆ b it “ ˆ β t r it , and define b it “ b t p ξ it q and ˆ b t “ ˆ β t r L . We also define the residuals ˜ v it “ x 2 it ´ ˆ b it , and ˜ V i “ p ˜ v i1 , .. , ˜ v iT q . Our estimator for V i will then be ˆ V i “ τ p ˜ V i q : τ p ˜ V i q is the projection of ˜ V i onto S V such that if ˜ V i lies outside of S V , then τ p ˜ V i q is the point on the boundary of the support that is the closest to ˜ V i . Note that for all draws of V i , τ p V i q “ V i and || ˆ V i ´ V i || ď || ˜ V i ´ V i || . Let p K p V q “ p p 1 K p V q , .. , p KK p V qq denote a K ˆ 1 vector of approximating functions, and write p i “ p K p V i q and ˆ p i “ p K p ˆ V i q . An estimator of h W p V q “ E p w i | V i “ V q for a generic scalar random variable w i using the generated ˆ V is p K p V q ˆ π W where ˆ π W is a vector of size K given by

ˆ π W “ p ˆ P ˆ P q ´1 ÿ i p K p ˆ V i q w i “ p ˆ P ˆ P q ´1 ˆ P W , (28)

where ˆ P “ p ˆ p 1 , .. , ˆ p n q is K ˆ n and W “ p w 1 , .. , w n q is a vector of size n . Using this general definition, we construct component by component the estimators ˆ M and ˆ k for the matrix and vector valued functions M and k . We obtain p K p V q ˆ π M,st , an estimator of the p s, t q component of the matrix M , taking w i to be p M i q s,t . Similarly, an estimator of the s th component of k will be p K p V q ˆ π k,s , choosing w i “ p M i y i q s . Under Assumptions 2.1 and 2.3, we have g p V q “ M p V q ´1 k p V q . A straightforward estimator of g is thus ˆ g p V q “ ˆ M p V q ´1 ˆ k p V q . The closed-form expression (12) suggests the use of a sample average to estimate E p µ | δ q , plugging in the nonparametric estimator of g evaluated at the generated values. The estimator is

ˆ µ “ ř ni “ 1 δ i Q i r y i ´ ˆ g p ˆ V i qs { ř ni “ 1 δ i “ ř ni “ 1 Q δi r y i ´ ˆ g p ˆ V i qs { ř ni “ 1 δ i .

We now turn to the asymptotic properties of ˆ µ .
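Abstracting from the specific matrices M i and Q i , the second step, the series regression (28) on the generated regressor, can be sketched as follows. The example is illustrative: it uses a scalar control variable, a univariate power-series basis, and a smooth conditional expectation, none of which matches the paper's exact setting.

```python
import numpy as np

def poly_basis(v, degree):
    """Power-series basis p^K(v) = (1, v, ..., v^degree) for scalar v."""
    return np.vander(v, degree + 1, increasing=True)

def series_estimate(v_hat, w, degree):
    """Two-step series estimator of h_W(V) = E(w | V) as in (28),
    fitted on the generated values v_hat."""
    P_hat = poly_basis(v_hat, degree)
    pi_hat, *_ = np.linalg.lstsq(P_hat, w, rcond=None)  # (P'P)^{-1} P'w
    return P_hat @ pi_hat

rng = np.random.default_rng(1)
n = 2000
v = rng.uniform(-1.0, 1.0, n)               # true control variable
v_hat = v + rng.normal(0.0, 0.02, n)        # generated regressor with first-step noise
w = np.sin(v) + rng.normal(0.0, 0.1, n)     # scalar w with h_W(v) = sin(v)

h_hat = series_estimate(v_hat, w, degree=6)
```

The same construction, applied componentwise to p M i q s,t and p M i y i q s , yields ˆ M and ˆ k , and ˆ g “ ˆ M ´1 ˆ k is then averaged as above to form ˆ µ .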
This type of asymptotic analysis is the subject of a wide literature on nonparametric and semiparametric estimation with generated covariates. Before laying out the main results of our asymptotic analysis, we give here a brief overview of this literature. Papers studying asymptotic normality of semiparametric estimators, such as Newey (1994a), Chen, Linton, and Van Keilegom (2003), Ai and Chen (2003) and Ichimura and Lee (2010) among many other references, have a level of generality which encompasses the case where the regressors are themselves estimated. However, the conditions given in these papers are “high-level” conditions and are not easily applied to the composition of nonparametrically estimated infinite-dimensional nuisance parameters. Examples of asymptotic derivations in specific models with generated regressors are papers already cited such as Newey, Powell, and Vella (1999), Imbens and Newey (2009), and Das, Newey, and Vella (2003), as the use of a nonparametric control function approach naturally suggests an estimator with generated covariates. Others are, e.g., Ahn and Powell (1993), Blundell and Powell (2004), Newey (2009) and Escanciano, Jacho-Chávez, and Lewbel (2016). Moreover, recent contributions have focused on obtaining general asymptotic results for such semiparametric estimators. Among them, Hahn and Ridder (2013) derive, in the spirit of Newey (1994a), a general formula for the asymptotic variance of estimators with generated regressors. However, they do not provide results on how to obtain asymptotic normality for particular classes of estimators. For estimators with generated regressors depending on a nonparametrically estimated function, this type of analysis can be found for instance in Escanciano, Jacho-Chávez, and Lewbel (2014), Mammen, Rothe, and Schienle (2016) and Hahn, Liao, and Ridder (2018).
Escanciano, Jacho-Chávez, and Lewbel (2014) obtain a uniform expansion of a weighted sample average of residuals obtained from kernel-estimated nonparametric regressions with generated covariates, which can then be used to prove asymptotic normality of a class of semiparametric estimators. Mammen, Rothe, and Schienle (2016) study the asymptotic normality of a general class of semiparametric GMM estimators depending on a nonparametric nuisance parameter, also constructed with generated covariates. Our estimator of the APE ˆ µ belongs to this class of estimators, although of a simpler form since it has a closed-form expression. Moreover, we use series to construct the nonparametric estimates, while the infinite dimensional nuisance parameter in Mammen, Rothe, and Schienle (2016) is a conditional expectation estimated with a local polynomial estimator, and they do not specify an estimator for the generated covariates. Estimators in Hahn, Liao, and Ridder (2018) have a structure closer to that of ˆ µ : they study nonparametric two-step sieve M estimators, but focus on known functionals. They show asymptotic normality of their estimator when standardized by a finite sample variance and give a practical estimator of this variance. They do not however provide an explicit formula for the asymptotic variance. The estimator we analyze in this section is instead an estimated functional of the two-step nonparametric estimators. Using a different type of proof technique with lower level conditions on the primitives of a more specific class of models, we show asymptotic normality and obtain the asymptotic variance of a generic class of estimators to which ours belongs. See, e.g., Mammen, Rothe, and Schienle (2016) for a literature review on semiparametric estimation with generated covariates and an explanation of the specificity of this type of estimation. We introduce some notations. For a vector a P R p , || a || is its Euclidean norm. We also denote by || .
|| F the Frobenius norm (the canonical norm) in the space of matrices M p p R q , and || . || 2 the matrix norm induced by || . || on R p (the spectral norm). We recall that for a given matrix A P M p p R q , || A || F “ p ř i,j ď p a 2 ij q 1 { 2 “ tr p A A q 1 { 2 . To avoid tedious notations, we will regularly omit the subscript F : || A || without index implies that the norm considered is the Frobenius norm. The index will be displayed when clarity requires it. We define λ min p A q to be the smallest eigenvalue of the matrix A (when it has one), similarly λ max p A q , as well as λ 1 p A q ď ... ď λ p p A q all the eigenvalues ranked in increasing order (when they exist). We will use the following results. First, for all A P M p p R q , || A || 2 ď || A || F . This inequality also holds for nonsquare matrices. Also, for A a symmetric matrix, || A || 2 “ max i ď p | λ i p A q| and || A || 2 F “ ř pi “ 1 λ i p A q 2 . By definition of || . || 2 , || A a || ď || A || 2 || a || . We also write, for g a vector of functions of x P S x Ă R k , || g || 8 “ sup S x || g p . q|| . For l “ p l 1 , .. , l k q P N k , we define | l | “ ř kj “ 1 l j , and the partial derivative B l g p x q “ B | l | g p x q{B x l 1 1 ... B x l k k . We will use the norm | g | d “ max | l |ď d sup x P S x ||B l g p x q|| when g is d times differentiable. We denote by B g p x q the Jacobian matrix pB g p x q{B x 1 , ..., B g p x q{B x k q . In what follows, LLN denotes the weak law of large numbers, C a generic constant (whose value can change from one line to another) and, for a sequence p c n q n P N P R N , the notation c n Ñ 0 means c n ÝÝÝÑ n Ñ8 0. We now proceed step by step to derive uniform and MSE convergence rates of all nonparametric estimators.
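The norm facts just recalled are easy to verify numerically; the following check (numpy, purely illustrative) confirms the inequality between the spectral and Frobenius norms, the induced-norm bound, and the eigenvalue characterizations for a symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
a = rng.normal(size=4)

fro = np.linalg.norm(A, "fro")     # Frobenius norm, tr(A'A)^{1/2}
spec = np.linalg.norm(A, 2)        # spectral norm, induced by the Euclidean norm

S = (A + A.T) / 2                  # symmetrize to obtain a symmetric matrix
eigs = np.linalg.eigvalsh(S)       # real eigenvalues of S, in increasing order
```

For the symmetric matrix S, the spectral norm equals the largest eigenvalue in absolute value and the squared Frobenius norm equals the sum of squared eigenvalues, as used in the proofs.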
Sample mean square error of the estimator of the control variables:
The generated covariates ˆ V i are constructed as the estimated residuals of T regressions. The method to generate regressors is as in Newey, Powell, and Vella (1999) and we use some of their results. Assumption 4.1.
There exist γ 1 ą 0 and a p L q such that a L { n a p L q ÝÝÝÑ n Ñ8 0 and for all t ď T :
1. p x 2 it , ξ it q is i.i.d over i , continuously distributed and Var p x 2 t | ξ t q is bounded,
2. There exists a L ˆ L nonsingular matrix Γ t such that for R L p ξ t q “ Γ t r L p ξ t q , E p R L p ξ t q R L p ξ t q q has smallest eigenvalue bounded away from zero uniformly in L ,
3. There exists β Lt such that sup S ξt || b t p ξ t q ´ β Lt r L p ξ t q|| ď CL ´ γ 1 ,
4. sup S ξt || R L p ξ t q|| ď a p L q . Result 4.1.
Under Assumption 4.1,

1 n n ÿ i “ 1 || V i ´ ˆ V i || 2 “ O P ` L { n ` L ´ 2 γ 1 ˘ “ O P p ∆ 2 n q , (29)
max i ď n || V i ´ ˆ V i || “ O P p a p L q ∆ n q . (30)

If for instance b t is continuously differentiable up to order p , writing d ξ “ d 1 ` d z , then Assumption 4.1 (3) holds with γ 1 “ p { d ξ for different choices of sieve basis. Conditions satisfying Assumption 4.1 (2) typically require the support of ξ t to be bounded and the density of ξ t to be bounded away from 0 on its support. This restriction is not desirable. Indeed, in applications where the density of the regressors goes to 0 at the boundaries, regressors will be trimmed to consider only a subset of S ξ where the density is bounded away from 0. However, we are interested here in the average effect E p µ q and counterfactuals involving population means. Trimming arbitrarily on regressors to estimate a conditional effect is contrary to this goal. We therefore provide a set of conditions allowing the density of the regressor to go to 0 at the boundary of its support when the support is bounded. We follow Imbens and Newey (2009), which develops an argument of Andrews (1991), in assuming a polynomial lower bound on the rate of decrease of the density. Formally, we assume that S ξt is of the form Ś d ξ d “ 1 r ξ td ; ¯ ξ td s . Recall that ξ td is the d th component of ξ t . In this case, the set of conditions in Assumption 4.1 can be modified as follows. Assumption 4.2.
There exist α ą 0 and γ 1 ą 0 such that a L { n L α ` ÝÝÝÑ n Ñ8 0 and @ t ď T :
1. p x 2 it , ξ it q is i.i.d over i , continuously distributed and Var p x 2 t | ξ t q is bounded,
2. r L p . q is the power series basis, and @ ξ t P S ξt , f ξt p ξ t q ě C Π d ξ d “ 1 p ξ dt ´ ξ dt q α p ¯ ξ dt ´ ξ dt q α ,
3. There exists β Lt such that sup S ξt || b t p ξ t q ´ β Lt r L p ξ t q|| ď CL ´ γ 1 . Result 4.2.
Under Assumption 4.2, (29) and (30) hold.
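As an illustration of the polynomial minorant in Assumption 4.2 (2), a Beta(α+1, α+1) density on [0, 1] equals x^α (1 − x)^α up to a constant, so it vanishes at the boundary at exactly the polynomial rate allowed. A quick numerical check (standard library only, illustrative):

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

alpha = 2.0
# normalizing constant: 1 / B(alpha + 1, alpha + 1)
C = math.gamma(2 * alpha + 2) / math.gamma(alpha + 1) ** 2

# the density equals C * x^alpha * (1 - x)^alpha, hence it satisfies the
# polynomial lower bound of Assumption 4.2 (2) with equality
gaps = [abs(beta_pdf(x, alpha + 1, alpha + 1) - C * x**alpha * (1 - x) ** alpha)
        for x in (0.01, 0.25, 0.5, 0.9, 0.999)]
```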
Allowing for unbounded support is a desirable extension as well and is made possible using a method similar to Chen, Hong, and Tamer (2005) and Chen, Hong, and Tarozzi (2008), but is outside the scope of this paper.
Convergence rates of two-step series estimators:
The estimator of E p w i | V i “ V q “ h W p V q defined in (28) using the generated control variables is as in Newey, Powell, and Vella (1999), aside from the panel aspect. However, they impose an orthogonality condition which by definition does not hold for our specific choices of w i , and this rules out a direct application of their asymptotic results. More specifically, writing w “ h W p V q ` e W , E p e W | V q “ 0 ,
An additional assumption required to directly apply Newey, Powell, and Vella (1999) would be E p e W | X 1 , V, Z q “
0, that is, that e W is conditionally mean-independent of all variables involved in the first step, that is, in the construction of the control variables. However this condition does not hold when w i is either a component of the matrix M “ I ´ X p X X q ´1 X or of the vector M y , because

E p M | X 1 , X 2 , Z q “ M ‰ E p M | V q “ M p V q , E p M y | X 1 , X 2 , Z q “ M g p V q ` M E p u | X 1 , X 2 , Z q ‰ E p M y | V q “ k p V q “ M p V q g p V q .

This difference has been documented for instance in Hahn and Ridder (2013) and Mammen, Rothe, and Schienle (2016). It has implications for the convergence rate of the two-step estimator, which will have an extra term, and, as will be clear in a later part of the paper, on the asymptotic variance of a linear functional of this estimator. To account for the extra term E p e W | X 1 , V, Z q “ E p e W | X, V q where X “ p X 1 , X 2 q , we write

w “ h W p V q ` e W , E p e W | V q “ 0 , w “ h W p V q ` ρ W p X, Z q ` e W ˚ , E p e W ˚ | X, Z q “ 0 , (31)

where ρ W p X, Z q “ E p e W | X, Z q “ E p w | X, Z q ´ E p w | V q . As was done for estimation of b t , we first state results on ˆ h under generic assumptions, then show that these assumptions are satisfied when the regressors have bounded support and their joint density goes to 0 at the boundary of the support. Assumption 4.3.
1. e W ˚ i is i.i.d and E pp e W ˚ i q 2 | X, Z q is bounded on S X,Z ,
2. h W is Lipschitz on S V and ρ W is bounded on S X,Z ,
3. There exists a K ˆ K nonsingular matrix Γ such that for P K p V q “ Γ p K p V q , E p P K p V q P K p V q q has smallest eigenvalue bounded away from zero uniformly in K ,
4. There exist γ 2 and π KW such that sup S V | h W p V q ´ p K p V q π KW | ď CK ´ γ 2 ,
5. For sup S V || P K p V q|| ď b 0 p K q and sup S V ||B P K p V q{B V || ď b 1 p K q , ? K b 1 p K q ∆ n ÝÝÝÑ n Ñ8 0 and a K { n b 0 p K q ÝÝÝÑ n Ñ8 0 . Result 4.3.
Under Assumptions 4.1 and 4.3,

ż | ˆ h W p V q ´ h W p V q| 2 dF p V q “ O P p K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q , (32)
sup V P S V | ˆ h W p V q ´ h W p V q| “ O P ´ b 0 p K qp K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q 1 { 2 ¯ . (33)

The additional term in the mean-squared error convergence rate compared to, e.g., Newey, Powell, and Vella (1999) or Hahn, Liao, and Ridder (2018) (Section 3 of the Online Appendix) is ∆ 2 n b 1 p K q 2 . It comes from the correlation between e W and ˆ P ´ P . We insert here a corollary stating the rate of convergence of the first order partial derivative of the two-step nonparametric estimator. This will be used in the proof of asymptotic normality. Its proof follows from the proof of Result 4.3. Corollary 4.1.
Under Assumptions 4.1 and 4.3, if sup S V |B h W p V q ´ π K W B p K p V q| ď CK ´ γ 2 , then

sup V P S V |B ˆ h W p V q ´ B h W p V q| “ O P ´ b 1 p K qp K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q 1 { 2 ¯ .

As mentioned earlier, Assumption 4.3 (3) is often shown to hold for regressors with bounded support and density bounded away from 0 on their support. For reasons stated earlier, we prefer avoiding any trimming on covariates. We therefore give a set of conditions allowing the density of the regressors to go to zero on the boundaries of the support. Recall that the support of V is S V “ Ś d ď d 2 , t ď T r v td , ¯ v td s . Assumption 4.4.
1. e W ˚ i is i.i.d and E pp e W ˚ q 2 | X, Z q is bounded on S X,Z ,
2. h W is Lipschitz on S V and ρ W is bounded on S X,Z ,
3. p K p . q is the power series basis, and @ V P S V , f V p V q ě C Π d ď d 2 , t ď T p v t,d ´ v td q α p ¯ v td ´ v t,d q α ,
4. There exist γ 2 and π KW such that sup S V | h W p V q ´ π K W p K p V q| ď CK ´ γ 2 ,
5. b p K q “ K α ` { ∆ n Ñ n Ñ8 0 and b p K q “ K α ` { {? n Ñ n Ñ8 0 . Result 4.4.
Under Assumptions 4.2 and 4.4, (32) and (33) hold.
For these results to apply to our choice of w i “ p M i q s,t and w i “ p M i y i q t for 1 ď s, t ď T ´ 1 , we impose the following conditions. Assumption 4.5.
1. E p u | X, Z q and Var p|| u || | X, Z q are bounded on S X,Z ,
2. M and k are Lipschitz and g is bounded on S V ,
3. p K p . q is the power series basis, and @ V P S V , f V p V q ě C Π d ď d 2 , t ď T p v t,d ´ v td q α p ¯ v td ´ v t,d q α ,
4. There exist γ 2 and p π K M,st q s,t and p π K k,t q t such that sup S V | M st p V q ´ π K M,st p K p V q| ď CK ´ γ 2 and sup S V | k t p V q ´ π K k,t p K p V q| ď CK ´ γ 2 for all s, t ď T ´ 1 ,
5. K α ` { ∆ n Ñ n Ñ8 0 and K α ` { {? n Ñ n Ñ8 0 .

Under Assumptions 2.1, 4.2 and 4.5, the convergence rates of ˆ M and ˆ k in sup norm and mean square norms are therefore given by (32) and (33). Convergence rate for ˆ g : Recall that ˆ g p V q “ ˆ M p V q ´1 ˆ k p V q . The rate of convergence of ˆ g p . q is obtained using continuity arguments. We will assume the following set of conditions. Assumption 4.6. M and g are continuous on S V , S V is a compact set and M p V q is invertible for all values V P S V . Then k “ M g is continuous as well, and || M || 8 , || M ´1 || 8 and || g || 8 exist. Note that the continuity assumption somewhat overlaps with Assumption 4.5 (4) and (2), as the existence of a linear approximation relies on smoothness assumptions. Moreover, while Assumption 2.3 requires the matrix to be invertible only P V a.s., we assume here that M p V q is invertible for all values in the support. Under these conditions, the following MSE and sup norm rates are obtained for ˆ g . Result 4.5.
Under Assumptions 4.2, 4.5 and 4.6, assuming b 0 p K q 2 p K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q Ñ 0 ,

ż || ˆ g p V q ´ g p V q|| 2 dF p V q “ O P p K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q ,
|| ˆ g p V q ´ g p V q|| 8 “ O P p b 0 p K qp K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q 1 { 2 q .

Consistency of ˆ µ : Equipped with the convergence rate results on the nonparametric estimators, we can now show consistency of the APE estimator ˆ µ . Recall that, writing δ i “ 1 p det p X i X i q ą δ q and Q δi “ δ i Q i , we defined the estimator for E p µ | δ q to be

ˆ µ “ ř ni “ 1 Q δi r y i ´ ˆ g p ˆ V i qs { ř ni “ 1 δ i .

Write γ n “ b 0 p K qp K { n ` K ´ 2 γ 2 ` ∆ 2 n b 1 p K q 2 q 1 { 2 . Assumption 4.7.
Assume E p|| Q δ ||q ă 8 and E p|| Q δ y ||q ă 8 . Result 4.6.
Suppose Assumptions 4.2, 4.4, 4.6, and 4.7 hold. Assume also γ n Ñ 0 , a p L q ∆ n Ñ 0 and that g is continuously differentiable on S V . Then ˆ µ Ñ P E p µ | δ q . We now derive the asymptotic normality of ˆ µ . The analysis is carried out in several steps. First, we modify the trimming function. We then explain how to linearize our estimator as a function of the nonparametric two-step sieve estimators. We obtain an asymptotic expansion of a general linear functional of nonparametric two-step sieve estimators, which we then apply to the obtained linearization of our estimator. Finally we prove that the linear approximation is valid and derive asymptotic normality of ˆ µ . Recall that we defined ˆ V i “ τ p ˜ V i q , where ˜ V i “ p ˜ v it q t ď T is the vector of residuals from the sieve regression of x 2 it on ξ it “ p x 1 it , z it q and τ projects onto S V “ Ś d ď d 2 , t ď T r v td ; ¯ v td s . The proof of asymptotic normality will use smoothness properties of τ and will require it to be twice differentiable, which is not the case when τ is the projection defined in the previous section. We thus change the definition of τ so that it now projects onto a bounded superset of S V . Importantly, we will not focus anymore on allowing for the density to be 0 on the boundary of S V . Define ς ą 0 and τ ς : x P R ÞÑ ς p e ´ x {p ς q ` x { ς ´ q . Note that lim x Ñ´8 τ ς p x q “ ´ ς , lim x Ñ`8 τ ς p x q “ ´ ς and we also have τ ς p q “ τ ς p q “ τ ς p q “
0. For V P R T d , the p d p t ´ q ` d q th component of τ p V q is given by τ p V q p t ´ q k ` d “ $’’’’&’’’’% v td , if v td P r v td ; ¯ v td s ,v td ` τ ς p v td ´ v td q , if v td ď v td , ¯ v td ´ τ ς p ¯ v td ´ v td q , if v td ě ¯ v td , and define as before ˆ V i “ τ p ˜ V i q . The support of τ is R T d and we now have ˆ V i P S ςV “ Ś d ď d , t ď T r v td ´ ς ; ¯ v td ` ς s . We will refer to S ςV as the “extended support”.Each component of τ is a twice differentiable function of V , implying that τ itself is twicecontinuously differentiable. Moreover for all V P S V , B τ {B V “ I T k which will imply that thederivative of a function m composed with τ evaluated at V , m p τ p V qq , is equal to the derivativeof m p V q whenever V P S V . On the extended support, that is for all V P S ςV , |B τ {B V | ď C and |B τ {B V | ď C for some constant C .It will also be convenient to use extensions of the various regression functions used at differentplaces in our proofs. For a function m : S V Ñ R p (for any given p P N ) such that m is twicecontinuously differentiable on S V , we define m ς : S ςV Ñ R p an extension of m , twice continuouslydifferentiable. That is, for all V in S V , m ς p V q “ m p V q , and m ς must be twice continuouslydifferentiable on the extended support S ςV . Note that if there exists a sequence of functions p m n q n P N converging uniformly to m ς on the extended support S ςV , the sequence of restrictions of p m n q n P N on S V converges uniformly to m . We previously used, for g a function of the variable V , the norm | g | d “ max | l |ď d sup V P S V ||B l g p . q|| . A corresponding norm for the extended functions will change thesupremum to a supremum over the extended support, i.e., | g | ςd “ max | l |ď d sup V P S ςV ||B l g p . q|| .As was the case with our previous definition of τ , || ˆ V i ´ V i || ď || ˜ V i ´ V i || . 
This guarantees that our results on the sup-norm convergence rates of the nonparametric two-step estimators ˆ M and ˆ k and of their derivatives remain valid, provided some changes are made to the definition of the vector of basis functions p K p . q and to the approximation condition (4) of Assumption 4.3. First, p K p . q is defined on the extended support, and the bounds b 0 p K q and b 1 p K q are also defined as bounds on the sup norm over the extended support. Second, the approximation condition must be imposed on the extended functions M ς and k ς . Under these modified conditions, because the extended functions remain Lipschitz, the rates of convergence of the nonparametric two-step estimators to the extended functions are the same, the rate of convergence of ˆ g is unchanged and consistency of ˆ µ holds. We point out here that we will not show that our asymptotic normality result applies to cases where the density of the regressors goes to zero on the boundaries of their support, as we did for consistency (see Assumptions 4.2 and 4.4). Indeed, in contrast to the consistency proof, we will use rates on the sup norm of the nonparametric estimates as well as of their derivatives when the suprema are defined over the extended support. This rules out a direct application of the approach allowing the density of the regressors to go to zero on the boundaries of their support. This approach would require Condition (3) of Assumption 4.4 to hold on S ςV , which cannot be true if the density of the regressors is 0 on the boundary of the original support. Computing rates on the extended support allowing for this case is beyond the scope of this paper. We therefore remain silent on the choice of the basis. We study asymptotic normality of ? n p ˆ µ ´ E p µ qq , where we rewrite ˆ µ “ ˆ µ δ { p 1 n ř ni “ 1 δ i q , with ˆ µ δ “ 1 n ř ni “ 1 Q δi r y i ´ ˆ g p ˆ V i qs . We will first study ˆ µ δ ´ E p µ i δ i q .
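The redefined smooth truncation above can be emulated by any twice continuously differentiable clipping map. The sketch below uses ς tanh p x { ς q as a stand-in for τ ς (an assumed choice for illustration, not the paper's exact function): it is the identity on the support, twice differentiable at the boundaries, and overshoots by at most ς on each side, so the extended support behaves as in the text.

```python
import numpy as np

def smooth_clip(v, lower, upper, sigma=0.1):
    """Smooth truncation onto the extended support [lower - sigma, upper + sigma]:
    identity on [lower, upper], twice continuously differentiable at the
    boundaries (tanh has unit slope and zero curvature at 0), with an
    overshoot of at most sigma on each side.  tanh is an illustrative
    stand-in for the paper's tau_sigma."""
    v = np.asarray(v, dtype=float)
    out = v.copy()
    lo = v < lower
    hi = v > upper
    out[lo] = lower + sigma * np.tanh((v[lo] - lower) / sigma)
    out[hi] = upper + sigma * np.tanh((v[hi] - upper) / sigma)
    return out

grid = np.linspace(-2.0, 2.0, 401)
clipped = smooth_clip(grid, -0.5, 0.5, sigma=0.1)
```

Because tanh p x q ď x for x ě 0 , this map also satisfies the contraction property || τ p ˜ V q ´ V || ď || ˜ V ´ V || for V in the support, as required for the convergence rate arguments.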
We write G “ pp b t q t ď T , k, M q for a vector of generic functions with b t : S ξt ÞÑ R d 2 , k : S ςV ÞÑ R T ´1 and M : S ςV ÞÑ M T ´1 p R q . For clarity we choose to write G 0 “ pp b 0 t q t ď T , k 0 , M 0 q for the true values of these functions, that is, for the nonparametric primitives of the model. Note that the functions we consider here are functions on the extended support. We dropped the exponent ς and will display it to avoid confusion whenever necessary. We decompose

? n p ˆ µ δ ´ E p µ i δ i qq “ 1 ? n ř ni “ 1 Q δi r y i ´ ˆ g p ˆ V i qs ´ ? n E p µ i δ i q , “ 1 ? n ř ni “ 1 r δ i µ i ´ E p µ i δ i qs ` 1 ? n ř ni “ 1 Q δi u i ` 1 ? n ř ni “ 1 r Q δi g p V i q ´ E p Q δ g p V qqs ´ 1 ? n ř ni “ 1 r Q δi ˆ g p ˆ V i q ´ E p Q δ g p V qqs , “ 1 ? n ř ni “ 1 r δ i µ i ´ E p µ i δ i qs ` 1 ? n ř ni “ 1 Q δi u i ´ ? n r X n p ˆ G q ´ X n p G 0 qs , (34)

where we define

χ p W i , G q “ Q δi M p τ rp x 2 it ´ b t p ξ it qq t ď T sq ´1 k p τ rp x 2 it ´ b t p ξ it qq t ď T sq , X n p G q “ 1 n ř ni “ 1 r χ p W i , G q ´ E p χ p W i , G qqs ,

τ is as defined in Section 4.4.1 and ensures that the argument of M and k lies in S ςV . We recall that W i “ p X i , V i , Z i , u i , µ i , α i q stands for the whole vector of primitive variables. We write the variables as column vectors, e.g. V i “ p v i1 , ... , v iT q . We also define X p G q “ E p χ p W i , G qq ´ E p χ p W i , G 0 qq . Note that X p G 0 q “ 0 . We make explicit the dependence of our estimator on the functions p b t q t ď T . The use of generated covariates in place of the true value of the variables has a twofold impact on semiparametric estimators such as ˆ µ . First, the nuisance parameters M and k are estimated using the generated values. Second, the estimators ˆ M and ˆ k are evaluated at the generated values when plugged into the sample average that defines ˆ µ . The dependence of X on p b t q t ď T highlights the latter aspect. The two first terms in Equation (34) are normalized sums of i.i.d random variables.
Their asymptotic normality can be established by a standard CLT argument. We focus on the last term, ? n r X n p ˆ G q ´ X n p G 0 qs , which has a standard form except for its dependence on a composition of the infinite dimensional nuisance parameters. Specifically, we define a class of continuous functions H endowed with a pseudometric || . || H such that G 0 P H . Arguments yielding asymptotic normality typically require the following set of conditions. Define H δ “ t G P H : || G ´ G 0 || H ď δ u . Assumption 4.8.
1. For all δ n “ o p 1 q , sup || G ´ G 0 || H ď δ n || X n p G q ´ X p G q ´ X n p G 0 q|| “ o P p n ´ 1 { 2 q ,
2. The pathwise derivative of X at G 0 evaluated at G ´ G 0 , X p G q p G 0 qr G ´ G 0 s , exists in all directions r G ´ G 0 s , and for all G P H δ n with δ n “ o p 1 q , || X p G q ´ X p G q p G 0 qr G ´ G 0 s || ď c || G ´ G 0 || 2 H , for some constant c ě 0 ,
3. || ˆ G ´ G 0 || H “ o P p n ´ 1 { 4 q . Result 4.7.
Under Assumption 4.8,

? n r X n p ˆ G q ´ X n p G 0 qs “ ? n X p G q p G 0 qr ˆ G ´ G 0 s ` o P p 1 q .

Under Assumption 4.8, the asymptotic distribution of ˆ µ depends on the asymptotic behavior of ? n X p G q p G 0 qr ˆ G ´ G 0 s , where we write X p G q p G 0 q for the pathwise derivative of X at G 0 . It is a linear functional of a vector of nonparametric estimators. The definition of H and in particular of || . || H is not straightforward here. The choice of H will be driven by the stochastic equicontinuity condition, that is, Condition (1) of Assumption 4.8, following Chen, Linton, and Van Keilegom (2003), and the choice of || . || H will be driven by Condition (2). The structure of our asymptotic analysis is as follows: we first derive the asymptotic distribution of ? n X p G q p G 0 qr ˆ G ´ G 0 s by studying the general case of a linear functional of the two-step sieve estimator of a nonparametric regression function and obtain its asymptotic variance. We then specify our choice of H and || . || H , and show that Assumption 4.8 holds for this choice. Using Result 4.7, we obtain the asymptotic distribution of the standardized ˆ µ . We therefore focus here on the linearized term. The pathwise derivative applied to the estimators can be decomposed as the sum of T ` 2 terms: define χ p k q p W i qr ˜ k s , p χ p bt q p W i qr ˜ b t sq t ď T and χ p M q p W i qr ˜ M s to be the partial pathwise derivatives of χ with respect to k , b t and M (respectively) at the true value G 0 , evaluated (respectively) at ˜ k , p ˜ b t q t ď T and ˜ M .
We have χ p M q p W i qr ˜ M s : “ χ p M q p W i , G qr ˜ M s “ ´ Q δi M p V i q ´ ˜ M p V i q M p V i q ´ k p V i q“ ´ Q δi M p V i q ´ ˜ M p V i q g p V i q “ ´r g p V i q b p Q δi M p V i q ´ qs Vec p ˜ M p V i qq ,χ p k q p W i qr ˜ k s : “ χ p k q p W i , G qr ˜ k s “ Q δi M p V i q ´ ˜ k p V i q ,χ p bt q p W i qr ˜ b t s : “ χ p t q p W i , G qr ˜ b t s “ ´ Q δi B g B v t p V i q ˜ b t p ξ it q where v t denotes the t th component of V , and where B g B v t p V i q is a Jacobian matrix of size p T ´ qˆ d .Note that the function τ does not appear in the above formula, nor does any of its partial orderderivatives. This is because when evaluated at the true value of V , by design τ simplifies to theidentity function on S V and its Jacobian is the identity matrix.We define X p k q r ˜ k s the partial pathwise derivative of X with respect to k at G and evaluatedat ˜ k , and similarly X p M q r ˜ M s and p X p bt q r ˜ b t sq t ď T . Assuming one can interchange expectation anddifferentiation, we follow Mammen et al (2016) and write X p G q r G ´ G s “ X p k q r k ´ k s ` X p M q r M ´ M s ` T ÿ t “ X p bt q r b t ´ b ,t s , “ ż V r λ M p v q Vec pp M ´ M qp v qqs dF V p v q ` ż V r λ k p v q p k ´ k qp v qs dF V p v q` T ÿ t “ ż ξ t λ bt p ξ t q p b t ´ b ,t qp ξ t q dF ξ t p ξ t q , (35)where the functions λ . are defined using the partial pathwise derivatives as λ M p v q “ ´ E ´ g p V i q b p Q δi M p V i q ´ q | V i “ v ¯ “ ´ g p v q b r E p Q δi | V i “ v q M p v q ´ s ,λ k p v q “ E p Q δi M p V i q ´ | V i “ v q “ E p Q δi | V i “ v q M p v q ´ ,λ bt p ξ t q “ ´ E ˆ Q δi B g p V i qB v t | ξ it “ ξ t ˙ . .4.3 Linear application of a nonparametric two-step sieve estimator To obtain the asymptotic properties of a linear functional of nonparametric two-step series esti-mators, we now return to Model (31) and treat the general case. The object of interest in thissection is the value of a linear function a evaluated at h W where h W p v q “ E p W | V “ v q . 
We use the nonparametric two-step sieve estimator ˆ h W . Functionals of nonparametric estimators have been widely studied for different types of nonparametric estimators (see, e.g., Newey (1994b) for kernel estimators and Newey (1997) for series estimators). However, the linear functional here is also evaluated at the more complicated two-step nonparametric estimators constructed in the previous sections. Its asymptotic distribution cannot be derived directly from the aforementioned results. Hahn, Liao, and Ridder (2018) derive asymptotic normality results for nonlinear functionals of two-step nonparametric sieve estimators when the sieve estimators are from a general class of nonlinear sieve regression estimators. Characterizing the finite sample variance, they provide a practical estimator of it, but do not specify a formula for the asymptotic variance, arguing that it does not always have an analytical form. This is not an issue in our case: using a different type of proof, with lower-level conditions on the primitives of a more specific class of models, we derive the asymptotic normality and the asymptotic variance of our sieve estimators for a class of models where the orthogonality condition between the first and second stage does not hold. The estimator of a p h W q will be a p ˆ h W q , and the purpose of this section is to write, under general conditions on the random variables W, V, e W ˚ and the functions h W and ρ W , the term ? n p a p h W q ´ a p ˆ h W qq as 1 ? n ř ni “ 1 s Wi,n ` o P p 1 q . We will then apply the derived results to X p k q r ˆ k ´ k 0 s and X p M q r ˆ M ´ M 0 s . We consider the case where w P R , V P R T d 2 , a p h q P R d a . Define ρ Wi “ ρ W p X i , Z i q and the following matrices,

ˆ H Wt “ 1 n ř ni “ 1 ˆ p i pB h Wς p V i q{B v t b r it q “ 1 n ř ni “ 1 ˆ p i pB h W p V i q{B v t b r it q , ˆ d P Wt “ 1 n ř ni “ 1 ρ Wi pB p K p V i q{B v t b r it q , H Wt “ E r p i pB h Wς p V i q{B v t b r it qs “ E r p i pB h W p V i q{B v t b r it qs , d P Wt “ E r ρ Wi pB p K p V i q{B v t b r it qs , A “ p a p p 1 K q , .. , a p p KK qq ,

where h Wς is the functional extension of h W and where the equalities on the first two matrices hold because h ς and h are equal on S V . Recall that since || τ p v q ´ τ p ˜ v q|| ď || v ´ ˜ v || , under Assumption 4.1 we have 1 n ř ni “ 1 || v i ´ ˆ v i || 2 “ O P ` L { n ` L ´ 2 γ 1 ˘ “ O P p ∆ 2 n q . Assumption 4.9.
1. The data $W_i$ are i.i.d.;
2. $\|a(g)\|\le C\,|g|_0$;
3. $h_W$ is twice continuously differentiable with bounded first and second derivatives, and $\rho_W$ is bounded;
4. There exist $\gamma_1$ and $\beta_{Lt}$ such that for all $t\le T$, $\sup_{S_{\xi_t}}\|b_{0,t}(\xi_t)-\beta_{Lt}'r^L(x_t,z_t)\|\le CL^{-\gamma_1}$. There exist $\gamma_2$ and $\pi_{KW}$ such that $\sup_{S_{\varsigma V}}\|h_{W\varsigma}(v)-p^K(v)'\pi_{KW}\|\le CK^{-\gamma_2}$;
5. For all $t\le T$, there exists $\Gamma_{1t}$, an $L\times L$ nonsingular matrix, such that for $R^{Lt}(\xi_t)=\Gamma_{1t}r^{Lt}(\xi_t)$, $\mathrm E(R^{Lt}(\xi_t)R^{Lt}(\xi_t)')$ has smallest eigenvalue bounded away from 0 uniformly in $L$. There exists $\Gamma_2$, a $K\times K$ nonsingular matrix, such that for $P^K(V)=\Gamma_2p^K(V)$, $\mathrm E(P^K(V)P^K(V)')$ has smallest eigenvalue bounded away from 0 uniformly in $K$;
6. $\|A\|$ is bounded;
7. For $|R^{Lt}(\xi_t)|_0\le a_0(L)$, $|P^K(V)|_{0,\varsigma}\le b_0(K)$, $|P^K(V)|_{1,\varsigma}\le b_1(K)$, $|P^K(V)|_{2,\varsigma}\le b_2(K)$, we have $\sqrt nK^{-\gamma_2}=o(1)$, $\max(\sqrt K,\sqrt L\,b_1(K))\,a_0(L)\sqrt{L/n}=o(1)$, $b_1(K)\sqrt nL^{-\gamma_1}=o(1)$, $b_1(K)[\sqrt{L/n}+L^{-\gamma_1}][K+L]=o(1)$, $b_2(K)[L/\sqrt n+\sqrt nL^{-\gamma_1}]=o(1)$, $b_1(K)\sqrt K[\sqrt{L/n}+L^{-\gamma_1}]=o(1)$;
8. $\mathrm E(\|v_t\|^2\,|\,\xi_t)$ and $\mathrm{Var}(e_{W*}\,|\,X,Z)$ are bounded on $S_{\xi_t}$ and $S_{X,Z}$ respectively.
As stated above, we do not specify the sieve basis.
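For concreteness, the object studied here can be sketched in a few lines: a first-step series regression produces generated control variables $\hat v$, and a second step regresses $W$ on a basis evaluated at $\hat v$ to produce $\hat h_W$. The data generating process, polynomial bases and truncation orders below are illustrative choices, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Toy triangular model with a generated regressor: the control variable v
# is the first-stage residual, and h_W(v) = E[w | v] is the second-stage target.
z = rng.uniform(-1, 1, n)
v = rng.normal(scale=0.3, size=n)
x = np.sin(z) + v                                  # first stage: x = b_0(z) + v
w = np.cos(v) + rng.normal(scale=0.1, size=n)      # outcome: h_W(v) = cos(v)

def power_basis(u, degree):
    # polynomial sieve basis (stand-in for r^L and p^K)
    return np.column_stack([u**j for j in range(degree + 1)])

# Step 1: series regression of x on z; generated control variable vhat
R1 = power_basis(z, 5)
beta = np.linalg.lstsq(R1, x, rcond=None)[0]
vhat = x - R1 @ beta

# Step 2: series regression of w on p^K(vhat)
P = power_basis(vhat, 4)
pi = np.linalg.lstsq(P, w, rcond=None)[0]

def hhat(vgrid):
    return power_basis(vgrid, 4) @ pi

grid = np.linspace(-0.5, 0.5, 11)
max_err = np.max(np.abs(hhat(grid) - np.cos(grid)))
assert max_err < 0.15   # hhat_W is close to the true h_W on the interior
```

Because the second-step regressor $\hat v$ is itself estimated, the first-step error propagates into $\hat h_W$; this is the source of the extra derivative terms appearing in the linearization of this section.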
Lemma 4.1.
Under Assumptions 4.1 and 4.9,
$$\sqrt n\,\big[a(\hat h_W)-a(h_W)\big] = \frac1{\sqrt n}\sum_{i=1}^n A\,\mathrm E(p_ip_i')^{-1}\Big[p_ie_{Wi} + \sum_{t=1}^T(H_{Wt}-dP_{Wt})\big(I_k\otimes\mathrm E(r_{it}r_{it}')^{-1}\big)(v_{it}\otimes r_{it})\Big] + o_P(1).$$

We use the proof techniques of Lemma 2 of Newey, Powell, and Vella (1999) to obtain this approximation. However, as mentioned previously, an essential orthogonality condition they assume, namely the conditional mean independence of $w-\mathrm E(w|V)$ from $(X,Z)$, does not hold in our model. For this reason we obtain an extra term depending on $\rho_W$, the term $dP_{Wt}$, which would be zero if $\rho_W(X,Z)=0$. Another difference is the summation over $t$ of the $H_{Wt}$ and $dP_{Wt}$ terms, due to the vector of control variables being composed of $T$ estimated residuals coming from $T$ different cross-section regressions.

Assumption 4.9 (6) is a condition on the generic functional $a$ applied to the elements of the approximating basis. The functionals appearing in $\hat\mu$ are derived from the linearization of $X$. They all take the form of an expectation $a(h_W)=\int\lambda_a(v)h_W(v)\,dF_V(v)$. This is exactly the mean square continuity condition of Newey (1997), which he shows is sufficient to obtain $\sqrt n$ asymptotic normality of linear functionals of linear sieve estimators. We similarly exploit properties implied by this specification of $a$, but instead obtain an intermediary result on the mean square convergence of the term in brackets on the RHS of Lemma 4.1. We will use this in the following section to show that Condition (6) of Assumption 4.9 holds, and to then obtain the total asymptotic variance matrix of $\sqrt n\,X^{(G_0)}[\hat G-G_0]$.

Assumption 4.10.
1. There exists a function $\lambda_a:\mathbb R^{Td}\mapsto\mathbb R^{d_a}$ such that $a(h_W)=\int\lambda_a(v)h_W(v)\,dF_V(v)$;
2. There exists $\iota_{Ka}$ such that $|\lambda_a(V)-\iota_{Ka}'p^K(V)|_0=O(K^{-\gamma_2})$, and $(\iota_{Laht},\iota_{La\rho t})$ such that as $L\to\infty$, $\mathrm E\big(\big\|\mathrm E\big[\lambda_a(V)\frac{\partial h_W(V)}{\partial v_{td}}\,\big|\,\xi_t\big]-\iota_{Laht}'r^L(\xi_t)\big\|^2\big)\to 0$ and $\mathrm E\big(\big\|\mathrm E\big[\rho_{Wi}\frac{\partial\lambda_a(V)}{\partial v_{td}}\,\big|\,\xi_t\big]-\iota_{La\rho t}'r^L(\xi_t)\big\|^2\big)\to 0$;
3. $b_1(K)K^{-\gamma_2}=o(1)$;
4. $\mathrm{Var}(e_{Wi}|V)$ is bounded on $S_V$.

Define $\tilde\lambda_a(v)=A\,\mathrm E(p_ip_i')^{-1}p^K(v)$, $\tilde\lambda^{\partial}_{a,td}(\xi_t)=A\,\mathrm E(p_ip_i')^{-1}H_{Wtd}\,\mathrm E(r_{it}r_{it}')^{-1}r^L(\xi_t)$ and $\partial\tilde\lambda_{a,td}(\xi_t)=A\,\mathrm E(p_ip_i')^{-1}dP_{Wtd}\,\mathrm E(r_{it}r_{it}')^{-1}r^L(\xi_t)$. Write Assumption 4.9' for Assumption 4.9 without Condition (6).

Lemma 4.2.
Under Assumptions 4.9' and 4.10, as $K,L\to\infty$, $\mathrm E\big(\|e_W(\tilde\lambda_a(V)-\lambda_a(V))\|^2\big)\to 0$, $\mathrm E\big(\big\|v_{td}\big[\tilde\lambda^{\partial}_{a,td}(\xi_t)-\mathrm E\big(\lambda_a(V)\frac{\partial h_W(V)}{\partial v_{td}}\,\big|\,\xi_t\big)\big]\big\|^2\big)\to 0$ and $\mathrm E\big(\big\|v_{td}\big(\partial\tilde\lambda_{a,td}-\mathrm E\big[\rho_W\frac{\partial\lambda_a(V)}{\partial v_{td}}\,\big|\,\xi_t\big]\big)\big\|^2\big)\to 0$.

Note that Condition (2) is a sup norm rate condition on both $\lambda_a$ and $\partial\lambda_a/\partial V$, stronger than only assuming $\mathrm E(\|\lambda_a(V)-\iota_{Ka}'p^K(V)\|^2)\to 0$, which would be enough for $\sqrt n$ asymptotic normality. Loosely speaking, because $dP_{Wt}$ includes derivatives of the vector of basis functions, left multiplication of $dP_{Wt}$ by the matrix $A$, which is the matrix of expectations of $\lambda_a$ multiplied by the basis functions, will yield under Condition (2) an approximation of the derivative of $\lambda_a$. This will then appear in the asymptotic variance of our estimator when applied to the specific functionals.

By Result 4.7, under Assumption 4.8, $\sqrt n\,[X_n(\hat G)-X_n(G_0)]$ is asymptotically equivalent to $\sqrt n\,X^{(G_0)}[\hat G-G_0]$. $X^{(G_0)}$ is a sum of linear functionals applied to the components of $G-G_0$, where $G=((b_t)_{t\le T},k,M)$; see (35). We thus apply Lemma 4.1 to these functionals, choosing $w_i$ to be either a component of $M_iy_i$ or of $M_i$. By analogy with the general model, to apply Lemma 4.1 to $X^{(M)}$ we define the following objects:
$$e_{Mi} = M_i-\mathrm E(M_i|V_i) = M_i-M_0(V_i), \qquad M^*_i = M_i-\mathrm E(M_i|X_i,Z_i) = 0,$$
$$\rho_M(X_i,Z_i) = \mathrm E(M_i|X_i,Z_i)-\mathrm E(M_i|V_i) = M_i-M_0(V_i),$$
and similarly for $X^{(k)}[\hat k]$,
$$e_{ki} = M_iy_i-\mathrm E(M_iy_i|V_i) = [M_i-M_0(V_i)]\,g_0(V_i)+M_iu_i,$$
$$e_{k*i} = M_iy_i-\mathrm E(M_iy_i|X_i,Z_i) = M_i[u_i-\mathrm E(u_i|X_i,Z_i)],$$
$$\rho_k(X_i,Z_i) = \mathrm E(M_iy_i|X_i,Z_i)-\mathrm E(M_iy_i|V_i) = [M_i-M_0(V_i)]\,g_0(V_i)+M_i\,\mathrm E(u_i|X_i,Z_i).$$
It will be convenient to assume $\mathrm E(u_i|X_i,Z_i)=0$.

We now define the analogs of the matrices $A$, $dP_{Wt}$ and $H_{Wt}$. For a given $v$, $\lambda_M^j(v)$ is the $j$-th column of the matrix $\lambda_M(v)$ and
$$\Lambda_M = \int_v\big[\lambda_M^1(v)p_{1K}(v),\ldots,\lambda_M^1(v)p_{KK}(v),\ldots,\lambda_M^{(T-1)^2}(v)p_{1K}(v),\ldots,\lambda_M^{(T-1)^2}(v)p_{KK}(v)\big]\,dF_V(v),$$
of dimension $d_x\times K(T-1)^2$. Define similarly the matrix $\Lambda_k$. We will interchangeably index the columns of $\lambda_M$ as $\lambda_M^d$ with $d\le(T-1)^2$ and as $\lambda_M^{st}$ with $1\le s,t\le T-1$. We will also need
$$H_{Mt} = \mathrm E\Big[\frac{\partial\,\mathrm{Vec}(M_0(V_i))}{\partial v_t}\otimes p_i\otimes r_{it}\Big], \qquad H_{kt} = \mathrm E\Big[\frac{\partial k_0(V_i)}{\partial v_t}\otimes p_i\otimes r_{it}\Big],$$
$$dP_{Mt} = \mathrm E\Big[\mathrm{Vec}(\rho_{Mi})\otimes\frac{\partial p^K(V_i)}{\partial v_t}\otimes r_{it}\Big], \qquad dP_{kt} = \mathrm E\Big[\rho_{ki}\otimes\frac{\partial p^K(V_i)}{\partial v_t}\otimes r_{it}\Big].$$
The regression functions $b_t$ are estimated nonparametrically, and the asymptotic distribution of functionals of such objects is studied in Newey (1997). For those, we define, for a given $\xi_t$, $\lambda_{b_t}^j(\xi_t)$ the $j$-th column of the matrix $\lambda_{b_t}(\xi_t)$ and, for each $t$,
$$\Lambda_{b_t} = \int_{\xi_t}\big[\lambda_{b_t}^1(\xi_t)r_{1L}(\xi_t),\ldots,\lambda_{b_t}^1(\xi_t)r_{LL}(\xi_t),\ldots,\lambda_{b_t}^{T-1}(\xi_t)r_{1L}(\xi_t),\ldots,\lambda_{b_t}^{T-1}(\xi_t)r_{LL}(\xi_t)\big]\,dF_{\xi_t}(\xi_t).$$

We now state the assumptions required to apply Lemma 4.1 to the functionals $X^{(M)}$ and $X^{(k)}$, applied respectively to $\hat M$ and $\hat k$, where, as mentioned in the discussion before Assumption 4.9, we do not specify the basis of approximating functions.

Assumption 4.11.
1. $M_0$ and $g_0$ are twice continuously differentiable with bounded first and second order derivatives;
2. $\mathrm E(\|Q_{\delta i}\|^2)<\infty$, $\mathrm E(\|u\|^2|X,Z)$ is bounded on $S_{X,Z}$, and $S_{\xi_t}$ for all $t\le T$ and $S_V$ are bounded;
3. There exist $\gamma_1$ and $\beta_{Lt}$ such that for all $t\le T$, $\sup_{S_{\xi_t}}\|b_{0,t}(\xi_t)-\beta_{Lt}'r^L(\xi_t)\|\le CL^{-\gamma_1}$. There exist $\gamma_2$, $\pi_{KM}$ and $\pi_{Kk}$ such that $\sup_{S_{\varsigma V}}\|M_\varsigma(v)-p^K(v)'\pi_{KM}\|\le CK^{-\gamma_2}$ and $\sup_{S_{\varsigma V}}\|k_\varsigma(v)-p^K(v)'\pi_{Kk}\|\le CK^{-\gamma_2}$;
4. For all $t\le T$, there exists $\Gamma_{1t}$, an $L\times L$ nonsingular matrix, such that for $R^{Lt}(\xi_t)=\Gamma_{1t}r^{Lt}(\xi_t)$, $\mathrm E(R^{Lt}(\xi_t)R^{Lt}(\xi_t)')$ has smallest eigenvalue bounded away from 0 uniformly in $L$. There exists $\Gamma_2$, a $K\times K$ nonsingular matrix, such that for $P^K(V)=\Gamma_2p^K(V)$, $\mathrm E(P^K(V)P^K(V)')$ has smallest eigenvalue bounded away from 0 uniformly in $K$;
5. $\|\Lambda_M\|$, $\|\Lambda_k\|$, and $\|\Lambda_{b_t}\|$ are bounded;
6. For $|R^{Lt}(\xi_t)|_0\le a_0(L)$, $|P^K(V)|_{0,\varsigma}\le b_0(K)$, $|P^K(V)|_{1,\varsigma}\le b_1(K)$, $|P^K(V)|_{2,\varsigma}\le b_2(K)$, we have $\sqrt nK^{-\gamma_2}=o(1)$, $\max(\sqrt K,\sqrt L\,b_1(K))\,a_0(L)\sqrt{L/n}=o(1)$, $b_1(K)\sqrt nL^{-\gamma_1}=o(1)$, $b_1(K)[\sqrt{L/n}+L^{-\gamma_1}][K+L]=o(1)$, $b_2(K)[L/\sqrt n+\sqrt nL^{-\gamma_1}]=o(1)$, $b_1(K)\sqrt K[\sqrt{L/n}+L^{-\gamma_1}]=o(1)$;
7. $\mathrm E(u|X,Z)=0$, and $\mathrm E(\|v_t\|^2|\xi_t)$ and $\mathrm E(\|u\|^2|X,Z)$ are bounded on $S_{\xi_t}$ and $S_{X,Z}$ respectively.
Assumption 4.11 (7) is imposed to simplify computations. It amounts to strengthening the control function assumption, that is, Assumption 2.1 (2). Applying Lemma 4.1, we obtain the following linearization.
Result 4.8.
Under Assumptions 2.1, 4.6 and 4.11,
$$\sqrt n\,X^{(G_0)}[\hat G-G_0] = o_P(1) + \frac1{\sqrt n}\sum_{i=1}^n\Lambda_M\big(I_{(T-1)^2}\otimes\Theta_1\big)\mathrm{Vec}(e_{Mi})\otimes p_i + \frac1{\sqrt n}\sum_{i=1}^n\Lambda_k\big(I_{T-1}\otimes\Theta_1\big)e_{ki}\otimes p_i \qquad (36)$$
$$+ \frac1{\sqrt n}\sum_{i=1}^n\sum_{t=1}^T\Big[\Lambda_M\big(I_{(T-1)^2}\otimes\Theta_1\big)(H_{Mt}-dP_{Mt}) + \Lambda_k\big(I_{T-1}\otimes\Theta_1\big)(H_{kt}-dP_{kt}) + \Lambda_{b_t}\Big]\big(I_k\otimes\Theta_2\big)\,v_{it}\otimes r_{it},$$
where we define $\Theta_1=\mathrm E(p_ip_i')^{-1}$ and $\Theta_2=\mathrm E(r_{it}r_{it}')^{-1}$.

Note that we can now write $n^{-1/2}\sum_{i=1}^n[\delta_i\mu_i-\mathrm E(\mu\delta)] + n^{-1/2}\sum_{i=1}^nQ_{\delta i}u_i - \sqrt n\,X^{(G_0)}[\hat G-G_0]$ as $n^{-1/2}\sum_{i=1}^ns_{i,n}+o_P(1)$, where
$$s_{i,n} = [\delta_i\mu_i-\mathrm E(\mu\delta)] + Q_{\delta i}u_i - \Lambda_M\big(I_{(T-1)^2}\otimes\Theta_1\big)\mathrm{Vec}(e_{Mi})\otimes p_i - \Lambda_k\big(I_{T-1}\otimes\Theta_1\big)e_{ki}\otimes p_i$$
$$+ \sum_{t=1}^T\Big[\Lambda_M\big(I_{(T-1)^2}\otimes\Theta_1\big)(dP_{Mt}-H_{Mt}) + \Lambda_k\big(I_{T-1}\otimes\Theta_1\big)(dP_{kt}-H_{kt}) - \Lambda_{b_t}\Big]\big(I_k\otimes\Theta_2\big)\,v_{it}\otimes r_{it}.$$
We define $\Omega=\mathrm{Var}(s_{i,n})$ and the following objects:
$$\tilde Q_{\delta i} = Q_{\delta i}-\mathrm E(Q_{\delta i}|V_i)\,M_0(V_i)^{-1}M_i, \qquad \Omega_1 = \mathrm{Var}\Big([\delta_i\mu_i-\mathrm E(\mu\delta)] + \tilde Q_{\delta i}u_i + \sum_{t=1}^T\mathrm E\Big(\tilde Q_{\delta i}\frac{\partial g_0(V_i)}{\partial v_t}\,\Big|\,\xi_{it}\Big)v_{it}\Big).$$

Assumption 4.12.
1. Each function $\lambda_a(\cdot)$ that is a column of $\lambda_M(\cdot)$ or $\lambda_k(\cdot)$ is continuously differentiable, and there exists $\iota_{Ka}$ such that $|\lambda_a(V)-\iota_{Ka}'p^K(V)|_0=O(K^{-\gamma_2})$, and $(\iota_{Laht},\iota_{La\rho t})$ such that as $L\to\infty$, $\mathrm E\big(\big\|\mathrm E\big[\lambda_a(V)\frac{\partial h_W(V)}{\partial v_{td}}\,\big|\,\xi_t\big]-\iota_{Laht}'r^L(\xi_t)\big\|^2\big)\to 0$ and $\mathrm E\big(\big\|\mathrm E\big[\rho_{Wi}\frac{\partial\lambda_a(V)}{\partial v_{td}}\,\big|\,\xi_t\big]-\iota_{La\rho t}'r^L(\xi_t)\big\|^2\big)\to 0$. For all $t\le T$, $\lambda_{b_t}$ is continuous and there exists $\iota_{Lbt}$ such that $\mathrm E\big(\|\lambda_{b_t}(\xi_t)-\iota_{Lbt}'r^L(\xi_t)\|^2\big)\to 0$ as $L\to\infty$;
2. $b_1(K)K^{-\gamma_2}=o(1)$;
3. $\mathrm E(\|Q_{\delta i}\|^4)<\infty$, $\mathrm E(\|\mu_i\|^2)<\infty$, and there exists $C>0$ such that $\Omega_1\ge CI_{d_x}$.

The condition $\Omega_1\ge CI_{d_x}$ holds if, for instance, $\mathrm{Var}(\mu_i|X_i,Z_i,u_i,V_i)\ge CI_{d_x}$ for some $C>0$, as is typically assumed. We now state the result giving the asymptotic variance of the estimator and guaranteeing that Assumption 4.11 (5) holds. The boundedness of the two last matrices is added for later results on asymptotic normality. We will write Assumption 4.11' for Assumption 4.11 without its condition (5).

Result 4.9.
Under Assumptions 2.1, 4.6, 4.11' and 4.12, $\Omega^{-1/2}\to_{n\to\infty}\Omega_1^{-1/2}$. Moreover, $\|\Lambda_M\|$, $\|\Lambda_k\|$, $\|\Lambda_{b_t}\|$, $\|\Lambda_M(I_{(T-1)^2}\otimes\Theta_1)(H_{Mt}-dP_{Mt})\|$, and $\|\Lambda_k(I_{T-1}\otimes\Theta_1)(H_{kt}-dP_{kt})\|$ are bounded.

We now know that under Assumption 4.12, Assumption 4.11 (5) holds. Thus, under Assumptions 4.11' and 4.12, Equation (36) on $\sqrt n\,X^{(G_0)}[\hat G-G_0]$ holds.

We now assemble the arguments of Sections 4.4.2 and 4.4.4.1 to obtain the asymptotic distribution of $\hat\mu$. Recall that if Assumption 4.8 holds, Result 4.7 guarantees that $\sqrt n\,(\widehat{\mu\delta}-\mathrm E(\mu_i\delta_i)) = n^{-1/2}\sum_{i=1}^ns_{i,n}+o_P(1)$. We thus now focus on showing that Assumption 4.8 does hold.

Condition (1) is a stochastic equicontinuity condition. We follow Section 4 in Chen, Linton, and Van Keilegom (2003) (CLVK hereafter) in our choice of the space $\mathcal H$, as they establish easy-to-check conditions implying stochastic equicontinuity in some spaces. For $S_W$ a bounded subset of $\mathbb R^k$, we define for a function $g:S_W\mapsto\mathbb R$ and $\varrho>0$ the norm $\|g\|_{\infty,\varrho} = |g|_{\lfloor\varrho\rfloor} + \max_{|r|=\lfloor\varrho\rfloor}\sup_{w_1\neq w_2}\frac{|\partial^rg(w_1)-\partial^rg(w_2)|}{\|w_1-w_2\|^{\varrho-\lfloor\varrho\rfloor}}$. We define $C_c^\varrho(S_W)$ to be the set of continuous functions $g:S_W\mapsto\mathbb R$ such that $\|g\|_{\infty,\varrho}\le c$. The set $\mathcal H^\varrho_{1t,c}=C_c^\varrho(S_{\xi_t})^k$ will be the class of vector-valued functions taking values in $\mathbb R^k$, each component of which lies in $C_c^\varrho(S_{\xi_t})$. We recall that the generic functions $k$ and $M$ are defined on the extended support $S_{\varsigma V}$. Hence we define $\mathcal H^\varrho_{M,c,\underline c}=C_c^\varrho(S_{\varsigma V})^{(T-1)^2}\cap\{g:\forall V\in S_{\varsigma V},\ \lambda_{\min}(M_g(V))>\underline c\}$, where $M_g(V)$ is the matrix formed by the coefficients of $g(V)$, and $\mathcal H^\varrho_{k,c}=C_c^\varrho(S_{\varsigma V})^{T-1}$. Finally, for the entire vector of infinite-dimensional parameters $G$, we define the set $\mathcal H^\varrho_{c,\underline c}=\big(\times_{t\le T}\mathcal H^\varrho_{1t,c}\big)\times\mathcal H^\varrho_{M,c,\underline c}\times\mathcal H^\varrho_{k,c}$ and take $\mathcal H$ to be $\mathcal H^\varrho_{c,\underline c}$.

Our choice of the norm on $\mathcal H$, $\|\cdot\|_{\mathcal H}$, is justified by Condition (2). The functional $X$ is a function of $M$, $k$ and $(b_t)_{t\le T}$, where $M$ and $k$ are composed with $(b_t)_{t\le T}$. These compositions imply, as was clear in the computations, that the linearization will involve the first-order partial derivatives of $M$ and $k$. It also implies that the difference between $X(G)-X(G_0)$ and $X^{(G_0)}[G-G_0]$ can be easily controlled by, among other terms, the distance between first-order partial derivatives of these functions. A natural norm on $\mathcal H^\varrho_{c,\underline c}$ is therefore $\|G\|_{\mathcal H} = \sum_{j=1}^{(T-1)^2}|M_j-M_{0,j}|_{1,\varsigma} + \sum_{j=1}^{T-1}|k_j-k_{0,j}|_{1,\varsigma} + \sum_{t\le T}\sum_{j=1}^d\|b_{t,j}-b_{0,t,j}\|_\infty$, where by an abuse of notation $M_j$ is the $j$-th component of $\mathrm{Vec}(M)$. This norm is our choice of norm in the remainder of this section.

Assumption 4.13. $\varrho>\max(Td,\,d_z+d)/2$.

Result 4.10.
Defining $\mathcal H=\mathcal H^\varrho_{c,\underline c}$ and $\|\cdot\|_{\mathcal H}$ as described, if Assumption 4.11' (1) and (2), Assumption 4.12 (3) and Assumption 4.13 hold, then Assumption 4.8 (1) and (2) hold.

We chose $\mathcal H$ and $\|\cdot\|_{\mathcal H}$ and provided a set of conditions guaranteeing that Conditions (1) and (2) of Assumption 4.8 hold. Condition (3) is a condition on the convergence rate of the estimators $\hat b_t$, $\hat k$ and $\hat M$. The rate of convergence of $\|\hat b_{t,j}-b_{0,t,j}\|_\infty$ for all $(t,j)$ is given by Equation (30); see the proof of Result 4.1. The rates of convergence of $|\hat k_j-k_{0,j}|_{1,\varsigma}$ and $|\hat M_j-M_{0,j}|_{1,\varsigma}$ are given by Corollary 4.1. The conditions required to apply this corollary must be adapted to the extended support, as we did for other results. Assumption 4.11' already includes most of these conditions, specifying an approximation rate of $M$ and $k$ over the extended support and defining the rates $b_0$, $b_1$ and $b_2$ as bounds on sup-norms of derivatives of $P^K$ defined over the extended support. Only a slight modification of Condition (3) needs to be added:

"There exist $\gamma_1$ and $\beta_{Lt}$ such that for all $t\le T$, $\sup_{S_{\xi_t}}\|b_{0,t}(\xi_t)-\beta_{Lt}'r^L(\xi_t)\|\le CL^{-\gamma_1}$. There exist $\gamma_2$, $\pi_{KM,st}$ and $\pi_{Kk,t}$ such that $|M_\varsigma(\cdot)_{st}-p^K(\cdot)'\pi_{KM,st}|_{1,\varsigma}\le CK^{-\gamma_2}$ and $|k_\varsigma(\cdot)_t-p^K(\cdot)'\pi_{Kk,t}|_{1,\varsigma}\le CK^{-\gamma_2}$, for all $1\le s,t\le T-1$."

This modification is a stronger assumption, changing the approximation rate to be in the $|\cdot|_1$ norm instead of the sup norm. Assumption 4.11" is the modified version of Assumption 4.11'. We can now state the following result.

Result 4.11.
Under Assumptions 2.1, 4.6, 4.11", 4.12 and 4.13, assuming moreover that $a_0(L)\Delta_n=o(n^{-1/2})$ and $b_1(K)\big[K/n+K^{-2\gamma_2}+\Delta_n^2b_1(K)^2\big]^{1/2}=o(n^{-1/4})$, then
$$\sqrt n\,\big[\widehat{\mu\delta}-\mathrm E(\mu_i\delta_i)\big] = \frac1{\sqrt n}\sum_{i=1}^ns_{i,n}+o_P(1).$$

Assumption 4.14. $\mathrm E[\|\mu_i-\mathrm E(\mu)\|^2]<+\infty$, $\mathrm E[\|Q_{\delta i}\|^2]<\infty$ and $\mathrm{Var}(\delta_i)>0$. Also, $\mathrm E(\|v_t\|^2|\xi_t)$ and $\mathrm E(\|u\|^2|X,Z)$ are bounded on $S_{\xi_t}$ for all $t\le T$ and on $S_{X,Z}$ respectively.
We can now state the main result of this section,
Result 4.12.
Under Assumptions 2.1, 4.6, 4.11", 4.12, 4.13 and 4.14, assuming moreover that $a_0(L)\Delta_n=o(n^{-1/2})$ and $b_1(K)\big[K/n+K^{-2\gamma_2}+\Delta_n^2b_1(K)^2\big]^{1/2}=o(n^{-1/4})$,
$$\sqrt n\,\big[\hat\mu-\mathrm E(\mu|\delta)\big]\to_d N\big(0,\ \Phi^{-2}\,\Xi\big),$$
where
$$\Xi = \Omega_1 + \mathrm E\big((\delta_i-\Phi)s_i\big)\,\mathrm E(\mu|\delta)' + \mathrm E(\mu|\delta)\,\mathrm E\big((\delta_i-\Phi)s_i\big)' + \big(\Phi_2-\Phi^2\big)\,\mathrm E(\mu|\delta)\mathrm E(\mu|\delta)'.$$

Monte Carlo simulations
We explore the properties of our multi-step estimator with Monte Carlo simulations when the model is a specific case of the model studied in the asymptotic analysis, Model (26). More specifically, the data generating process we consider is the following:
$$y_{it} = x_{1it}\mu_{1i} + x_{2it}\mu_{2i} + \underbrace{\sin(v_{it}) + u_{it}}_{=\,\varepsilon_{it}}, \qquad i=1..n,\ t\le T,$$
where the random coefficients are drawn according to $\mu_i=A\nu_i$, with $A$ a fixed $2\times 2$ matrix, $\nu_{1i}\sim U[0,1]$, $\nu_{2i}\sim U[0,1]$, $\nu_1\perp\nu_2$, and the specification for the covariates, instruments and time-varying disturbances is, for all $t\le T$,
$$\tilde x_{1it}\sim U[0,1],\quad \tilde z_{2it}\sim U[0,1],\quad v_{it}\sim U[-0.5,0.5],\quad \tilde X_i\perp\tilde Z_i,$$
$$x_{1it}=(\mu_{1i})^{1/4}\,\tilde x_{1it},\quad z_{2it}=(\mu_{2i})^{1/4}\,\tilde z_{2it},\quad x_{2it}=(x_{1it}+z_{2it})/2+v_{it}.$$
In this design, the control function is $f_t(V_i)=\sin(v_{it})$, giving $g_t(V_i)=\sin(v_{i,t+1})-\sin(v_{it})$. As for the random coefficients, the design implies that $\mu_1$ has support $[0,1]$, $\mathrm E(\mu_{1i})=0.5$ and $\mathrm{Var}(\mu_{1i})=1/12$, and that $\mu_2$ also has bounded support. The supports of $x_{1t}$ and $x_{2t}$ are bounded as well. The heterogeneity is quite substantial in this design. This simulation design imposes the random coefficients and the regressors to covary. To ensure that the condition $\mathrm E(\|Q_iu_i\|)<\infty$ holds, we impose $z_{2it}$ and $x_{1it}$ to depend multiplicatively on $\mu_{2i}$ and $\mu_{1i}$ raised to the power $1/4$, following an observation made in Graham and Powell (2012).

We show here results from $R$ simulation draws for several sample sizes $n$. Each draw requires estimation of the conditional expectation of $x_2$ given $x_1$ and $z_2$, and estimation of the functions $M(\cdot)=\mathrm E(M_i|V_i=\cdot)$ and $k(\cdot)=\mathrm E(M_iy_i|V_i=\cdot)$. The conditional expectation of $x_2$ is used to construct the generated covariates $\hat V_i$. Recall that $g(V)=M^{-1}(V)k(V)$. For each simulation draw $r$, an estimate $\hat g_r$ of the function $g$ is computed. We report in Figure 1 the pointwise average of these estimates, $\bar g(V)=\sum_{r\le R}\hat g_r(V)/R$, as well as the pointwise 5th and 95th quantiles for each value of $V$.

For each draw $r$, the estimators $\hat\mu_{1r}$ and $\hat\mu_{2r}$ of the average partial effects $\mathrm E(\mu_{1i})$ and $\mathrm E(\mu_{2i})$ are computed following the second step of our estimation procedure. That is, we plug the estimator of the function $g$ into a sample analog of formula (11). Figures 2 and 3 are smoothed histograms of the obtained estimators of these average effects. For each coefficient, we used the same scale for different sample sizes, but we did not use the same scale across coefficients. These plots are compatible with the asymptotic normality result of Section 4.4. It is noticeable that the variance of the estimator of $\mathrm E(\mu_{2i})$, which is the average partial effect of the endogenous variable, is larger than the variance of the estimator of $\mathrm E(\mu_{1i})$. Overall, this shows that even in small samples of size 1000, the estimator of the average partial effects performs relatively well and, in particular, does not seem to be biased.

As an additional exercise, we compare in Figure 4 the distribution of the estimator constructed in this paper to two different estimators of the impact of $x_1$ and $x_2$. The first one is the first-difference instrumental variable estimator, $\hat\mu_{FDIV}$, as defined in Wooldridge (2010) Section 11.4. This estimator is consistent under a homogeneity assumption.
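For reference, the simulation design above can be generated in a few lines; the entries of $A$, the noise scale of $u_{it}$ and the sample sizes below are illustrative placeholders rather than the paper's exact values, while the power-$1/4$ scaling and the construction of $x_2$ follow the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 1000, 3                      # placeholder sizes

# Random coefficients mu_i = A nu_i, with independent uniform nu's.
A = np.array([[1.0, 0.0],           # illustrative entries for A
              [0.5, 0.5]])
nu = rng.uniform(0, 1, size=(n, 2))
mu = nu @ A.T                       # columns: mu_1i, mu_2i

x1_tilde = rng.uniform(0, 1, size=(n, T))
z2_tilde = rng.uniform(0, 1, size=(n, T))
v = rng.uniform(-0.5, 0.5, size=(n, T))
u = rng.normal(scale=0.1, size=(n, T))   # illustrative noise scale

# Covariates covary with the coefficients through the power-1/4 scaling,
# and x2 is endogenous because it loads on v:
x1 = (mu[:, [0]] ** 0.25) * x1_tilde
z2 = (mu[:, [1]] ** 0.25) * z2_tilde
x2 = (x1 + z2) / 2 + v

y = x1 * mu[:, [0]] + x2 * mu[:, [1]] + np.sin(v) + u
```

With the placeholder $A$ above, $\mu_{1i}=\nu_{1i}\sim U[0,1]$, so the sample mean of the first coefficient should be close to $0.5$.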
The second estimator is $\hat\mu_{CRC}$, an estimator which is consistent under heterogeneity if there is no time-varying endogeneity. More precisely, $\hat\mu_{CRC}=\sum_{i=1}^nQ_iy_i/n$: it corresponds to the second step of the estimator studied in this paper. It is visible from the figure that, because of the biases coming from either heterogeneity or time-varying endogeneity, the true value of the average effect might not be in the confidence intervals of estimators neglecting either of these features.

As an empirical exercise, we apply our method to a model of labor supply with heterogeneous elasticity of intertemporal substitution (EIS). The EIS is an essential object of interest in the study of labor supply as it quantifies how labor supply responds to variations of the wage rate over time. More specifically, the model we consider is
$$\ln h_{it} = \alpha_i + \ln\omega_{it}\,\mu_i + \chi_{it}'b + \varepsilon_{it}, \qquad i=1..n,\ t=1..T, \qquad (37)$$
where $h_{it}$ is the number of annual hours worked, $\omega_{it}$ is the hourly wage, and $\chi_{it}$ is composed of additional demographics. The individual elasticity of intertemporal substitution $\mu_i$ enters the individual utility function, and heterogeneity in preferences may covary with $\omega_{it}$ and $\chi_{it}$. This justifies not restricting the joint distribution of these random variables and taking a fixed-effect approach to identification and estimation. We allow the log wage rate variable to be endogenous.

A version of (37) without random coefficient, i.e., where $\mu_i=\mu$ almost surely, is studied in Ziliak (1997), which also focuses on estimation of the EIS. In that paper, the demographics are assumed to satisfy the sequential exogeneity condition $\mathrm E(\varepsilon_{it}|\chi_{is})=0$ for all $s\le t$. On the other hand, the wage variable is considered contemporaneously endogenous due to either nonlinear income taxes, omitted variables or measurement error. The wage is therefore assumed to only satisfy $\mathrm E(\varepsilon_{it}|\ln\omega_{is})=0$ for $s<t$.

We will estimate $\mathrm E(\mu_i)$ under a different set of conditions, using the data set constructed in Ziliak (1997) and the identification results of Section 2. Consider a panel of periods 1 to $T$, preceded by periods $0,-1,\ldots,-\tau$. Define $v_{it}=\ln\omega_{it}-\mathrm E(\ln\omega_{it}|\chi_{i1},\ln\omega_{i0})$ and $V_i=(v_{it})_{t\le T}$. We assume that there exist $(f_t(\cdot))_{t\le T}$ such that
$$\text{for all } t\le T,\qquad \mathrm E(\varepsilon_{it}\,|\,V_i,\ln\omega_{i0},\chi_{i1},\chi_{i0},\ldots,\chi_{i,-\tau}) = f_t(V_i), \qquad (38)$$
where the normalization condition $\mathrm E(f_t(V_i))=0$ is imposed. The vector $(\chi_{i1},\chi_{i0},\ldots,\chi_{i,-\tau})$ corresponds to the set of additional instruments mentioned in Section 2.4.2. By sequential exogeneity of $\chi_{it}$, its values in $s$ for $2\le s\le T$ cannot be in this set of instruments. For the same reason, we do not use $\chi_{it}$ as instrument to construct the control variables $v_{it}$. Instead, we use the initial values $\chi_{i1}$ and $\ln\omega_{i0}$. This is similar to the approach described in Section 3.3. The conditional expectation equation (38) holds if, for instance, for all $t$, $(\ln\omega_{is-1},\chi_{is},\chi_{is-1},\ldots,\chi_{i,-\tau})_{s\le t}\perp(v_{is},\varepsilon_{is})_{s\ge t}$, a condition which would also imply the moment conditions used in Ziliak (1997).

Defining $M_i$ as in Section 2, we need $T$ large enough for $M_i$ not to be the null matrix with probability 1. Moreover, we use the log wage one period before the beginning of the panel as instrument to construct the control variables. We also use values of $\chi_{it}$ drawn before period 1 as instrumental variables to estimate $b$. These requirements imply that the total number of time periods used must be greater than 4.

The dataset constructed in Ziliak (1997) is described in Section 2.1 of that paper. It is a selected sample from the Survey Research Center subsample of the Panel Study of Income Dynamics.
It is composed of 532 men aged 22 to 55, married and working at all periods of the panel. We define the demographics $\chi_{it}$ as number of children, age and an indicator of bad health. We use a panel of years 1979 to 1982, where period 1 is year 1980, period $T$ is year 1982 and $\tau=1$. Note that the sample size is not as large as is desirable in semiparametric estimation.

We start by estimating the generated covariates $v_{it}$, writing
$$\ln\omega_{it} = \gamma_{1t}\ln\omega_{i0} + \gamma_{2t}'\chi^{GC}_{i1} + v_{it},$$
where $\chi^{GC}_{i1}$ includes $\chi_{i1}$ and $\text{age}^2_{i1}$. We choose this linear specification with a quadratic in age instead of a fully nonparametric one to avoid the curse of dimensionality, which potentially has a strong impact given our small sample size. We then estimate successively the vector $b$, the functions $g_t(\cdot)=f_{t+1}(\cdot)-f_t(\cdot)$ for $t\le T-1$, and the average partial effect $\mathrm E(\mu_i)$. These steps require estimation of conditional expectation functions conditional on $V$. We choose the same basis of approximating functions of $V$ (power series) and the same number of approximating terms for each of these functions. The exact choice of approximating functions is decided using a leave-one-out cross-validation (CV) criterion. By design, the estimator of $\mathrm E(\mu_i)$ depends on the inverse of the matrix function $M(\cdot)=\mathrm E(M_i|V_i=\cdot)$, while it depends linearly on the other conditional expectations. For that reason, we chose as criterion function the mean square forecast error of the random variable $M_i$. The set of conditioning variables is $(v_{i1},v_{i2},v_{i3})$, hence the terms that can be included in the sieve basis are the $v_{it}$ raised to various powers and interactions of those (in addition to a constant term). We report the CV values for some specifications in Table 1; the candidate bases include $(v_{it},v_{it}^2)_{t\le T}$ with and without the interactions $v_{i1}v_{i2}$ and $v_{i2}v_{i3}$, $(v_{it},v_{it}^2,v_{it}^3)_{t\le T}$ with and without interactions, and $(v_{it})_{t\le T}$ with and without interactions. Our choice is the power series basis of degree 2.

[Table 1: Terms included in the candidate sieve bases and associated CV values.]

We then estimate $b$. The set of instruments is $Z^\chi_i=(\chi_{i0},\chi_{i1},\text{age}^2_{i0},\text{age}^2_{i1})$. Defining the differences $\Delta\ln h_i$ and $\Delta\chi_i$ as in Section 2.4.1, and their estimators as $\widehat{\Delta\ln h}_i$ and $\widehat{\Delta\chi}_i$, our estimator of $b$ is
$$\hat b = \Big(\sum_{i=1}^n\widehat{\Delta\chi}_i{}'Z_i\Big(\sum_{i=1}^nZ_iZ_i'\Big)^{-1}\sum_{i=1}^nZ_i\widehat{\Delta\chi}_i\Big)^{-1}\Big(\sum_{i=1}^n\widehat{\Delta\chi}_i{}'Z_i\Big(\sum_{i=1}^nZ_iZ_i'\Big)^{-1}\sum_{i=1}^nZ_i\widehat{\Delta\ln h}_i\Big),$$
and we obtain an estimate $\hat b$ whose first and third components (children and bad health) are negative. Finally, we estimate $\mathrm E(\mu_i)$ by the two-step approach explained in the main body of the paper. We first estimate $M(\cdot)$ and $k(\cdot)=\mathrm E(M_i[\ln h_i-\chi_ib]\,|\,V_i=\cdot)$ using a series approximation, plugging in the estimate $\hat b$. Using these estimators $\hat M(\cdot)$ and $\hat k(\cdot)$, our estimate of $g$ is $\hat g=\hat M^{-1}\hat k$.
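The construction $\hat g=\hat M^{-1}\hat k$ can be sketched on synthetic data: each entry of $M_i$ and of the $k$ target is series-regressed on a basis in $V$, and the two fits are combined pointwise. Dimensions, bases and the data generating process below are illustrative, not the empirical specification:

```python
import numpy as np

rng = np.random.default_rng(3)
n, dM = 3000, 2
V = rng.uniform(-1, 1, n)

def basis(u, degree=3):
    # power-series sieve basis in the (here scalar) control variable
    return np.column_stack([u**j for j in range(degree + 1)])

def M0(v):                          # smooth "true" M(.)
    return np.array([[2 + v, 0.3], [0.3, 2 - v]])

def g0(v):                          # "true" g(.) to recover
    return np.array([np.sin(v), v**2])

# Noisy observations M_i and k_i = M_i g0(V_i) + noise
M_obs = np.empty((n, dM, dM))
k_obs = np.empty((n, dM))
for i in range(n):
    M_obs[i] = M0(V[i]) + 0.1 * rng.normal(size=(dM, dM))
    k_obs[i] = M_obs[i] @ g0(V[i]) + 0.1 * rng.normal(size=dM)

# One series regression per entry of M and per entry of k
P = basis(V)
coef_M = np.linalg.lstsq(P, M_obs.reshape(n, -1), rcond=None)[0]
coef_k = np.linalg.lstsq(P, k_obs, rcond=None)[0]

def ghat(v):
    p = basis(np.array([v]))[0]
    Mhat = (p @ coef_M).reshape(dM, dM)       # fitted E[M | V = v]
    return np.linalg.solve(Mhat, p @ coef_k)  # ghat(v) = Mhat(v)^{-1} khat(v)

err = np.max(np.abs(ghat(0.2) - g0(0.2)))
assert err < 0.15
```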
The final step to obtain the estimate of the average partial effect, that is, of the average elasticity of intertemporal substitution, entails computing the sample analog of the moment equality $\mathrm E(Q_i[\ln h_i-\chi_ib-g(V_i)])=\mathrm E(\mu_i)$. This gives $\hat\mu=\frac1n\sum_{i=1}^nQ_i[\ln h_i-\chi_i\hat b-\hat g(\hat V_i)]$, our estimate of the average EIS.

In this paper, we studied a correlated random coefficient panel model and relaxed the strict exogeneity condition imposed in the literature to allow for time-varying endogeneity. We proved identification of the average partial effect $\mathrm E(\mu_i)$. Moreover, we provided an estimator of $\mathrm E(\mu_i|\det(X_i'X_i)>\delta)$, showed its asymptotic normality and computed its asymptotic variance.

We highlight two directions for future research. First, our estimation focuses on $\mathrm E(\mu|\delta)$, which depends on a constant $\delta$. However, $\delta$ is arbitrarily fixed in the paper and we do not give directions on how to choose its value when implementing the suggested estimator. It would be of interest to follow Graham and Powell (2012) and study the asymptotic properties of $\mathrm E(\mu_i|\det(X_i'X_i)>\delta_n)$ with $\delta_n\to 0$. This could give a sense of an optimal choice of $\delta_n$ as a function of the sample size $n$. Note that extending the asymptotic analysis of Graham and Powell (2012) is nontrivial, as our estimation procedure includes an additional step with computation of nonparametric two-step series estimators.

The identification argument required $T>d_x+1$. This can be quite restrictive, and long enough panels might not be available to identify average partial effects in models with multiple covariates with random coefficients. A second direction for future work would be to relax this condition and obtain identification in the case $T=d_x+1$, as is done in Graham and Powell (2012).

References
Abadir, K. M., and
J. R. Magnus (2005):
Matrix algebra , vol. 1. Cambridge University Press.
Ahn, H., and
J. L. Powell (1993): “Semiparametric estimation of censored selection modelswith a nonparametric selection mechanism,”
Journal of Econometrics , 58(1-2), 3–29.
Ai, C., and
X. Chen (2003): “Efficient estimation of models with conditional moment restrictionscontaining unknown functions,”
Econometrica , 71(6), 1795–1843.
Altonji, J. G., and
R. L. Matzkin (2005): “Cross section and panel data estimators for non-separable models with endogenous regressors,”
Econometrica , 73(4), 1053–1102.
Andrews, D. W. (1991): “Asymptotic normality of series estimators for nonparametric andsemiparametric regression models,”
Econometrica: Journal of the Econometric Society , pp. 307–345.
Arellano, M., and
S. Bonhomme (2012): “Identifying Distributional Characteristics in RandomCoefficients Panel Data Models,”
The Review of Economic Studies , 79(3), 987–1020.
Arellano, M., and
S. Bonhomme (2016): “Nonlinear panel data estimation via quantile regres-sions,”
The Econometrics Journal , 19(3), C61–C94.
Bester, C. A., and
C. Hansen (2009): “Identification of Marginal Effects in a NonparametricCorrelated Random Effects Model,”
Journal of Business & Economic Statistics , 27(2), 235–250.
Blundell, R. W., and
J. L. Powell (2003):
Endogeneity in Nonparametric and SemiparametricRegression Models vol. 2 of
Econometric Society Monographs , p. 312357. Cambridge UniversityPress. (2004): “Endogeneity in semiparametric binary response models,”
The Review of EconomicStudies , 71(3), 655–679.
Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,”
Econometrica ,60(3), 567–596.
Chen, X., H. Hong, and
E. Tamer (2005): “Measurement error models with auxiliary data,”
The Review of Economic Studies , 72(2), 343–366.
Chen, X., H. Hong, and
A. Tarozzi (2008): “Semiparametric efficiency in GMM models withauxiliary data,”
The Annals of Statistics , 36(2), 808–843.50 hen, X., O. Linton, and
I. Van Keilegom (2003): “Estimation of semiparametric modelswhen the criterion function is not smooth,”
Econometrica , 71(5), 1591–1608.
Chernozhukov, V., I. Fern´andez-Val, J. Hahn, and
W. Newey (2013): “Average andquantile effects in nonseparable panel models,”
Econometrica , 81(2), 535–580.
Das, M., W. K. Newey, and
F. Vella (2003): “Nonparametric estimation of sample selectionmodels,”
The Review of Economic Studies , 70(1), 33–58.
Escanciano, J. C., D. Jacho-Ch´avez, and
A. Lewbel (2016): “Identification and estimationof semiparametric two-step models,”
Quantitative Economics , 7(2), 561–589.
Escanciano, J. C., D. T. Jacho-Ch´avez, and
A. Lewbel (2014): “Uniform convergenceof weighted sums of non and semiparametric residuals for estimation and testing,”
Journal ofEconometrics , 178, 426–443.
Evdokimov, K. (2010): “Identification and estimation of a nonparametric panel data model withunobserved heterogeneity,”
Department of Economics, Princeton University . Graham, B. S., J. Hahn, A. Poirier, and
J. L. Powell (2018): “A quantile correlated randomcoefficients panel data model,”
Journal of Econometrics , 206(2), 305–335.
Graham, B. S., and
J. L. Powell (2012): “Identification and estimation of average partial effectsin irregular correlated random coefficient panel data models,”
Econometrica , 80(5), 2105–2152.
Hahn, J., Z. Liao, and
G. Ridder (2018): “Nonparametric two-step sieve M estimation andinference,”
Econometric Theory , pp. 1–44.
Hahn, J., and
G. Ridder (2013): “Asymptotic variance of semiparametric estimators with gen-erated regressors,”
Econometrica , 81(1), 315–340.
Heckman, J., and
E. Vytlacil (1998): “Instrumental variables methods for the correlatedrandom coefficient model: Estimating the average rate of return to schooling when the return iscorrelated with schooling,”
Journal of Human Resources , pp. 974–987.
Hoderlein, S., H. Holzmann, and
A. Meister (2017): “The triangular model with randomcoefficients,”
Journal of econometrics , 201(1), 144–169.
Hoderlein, S., and
H. White (2012): “Nonparametric identification in nonseparable panel datamodels with generalized fixed effects,”
Journal of Econometrics , 168(2), 300–314.
Horn, R. A., and
C. R. Johnson (2012):
Matrix analysis . Cambridge university press.51 siao, C. (2014):
Analysis of Panel Data , Econometric Society Monographs. Cambridge UniversityPress, 3 edn.
Ichimura, H., and S. Lee (2010): "Characterization of the asymptotic distribution of semiparametric M-estimators," Journal of Econometrics, 159(2), 252–266.
Imbens, G. W., and W. K. Newey (2009): "Identification and estimation of triangular simultaneous equations models without additivity," Econometrica, 77(5), 1481–1512.
Mammen, E., C. Rothe, and M. Schienle (2016): "Semiparametric estimation with generated covariates," Econometric Theory, 32(5), 1140–1177.
Masten, M. A., and A. Torgovitsky (2016): "Identification of instrumental variable correlated random coefficients models," Review of Economics and Statistics, 98(5), 1001–1005.
Murtazashvili, I., and J. M. Wooldridge (2008): "Fixed effects instrumental variables estimation in correlated random coefficient panel data models," Journal of Econometrics, 142(1), 539–552.

Murtazashvili, I., and J. M. Wooldridge (2016): "A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching," Journal of Econometrics, 190(2), 252–266.
Newey, W. K. (1994a): "The asymptotic variance of semiparametric estimators," Econometrica, pp. 1349–1382.

Newey, W. K. (1994b): "Kernel estimation of partial means and a general variance estimator," Econometric Theory, 10(2), 1–21.

Newey, W. K. (1997): "Convergence rates and asymptotic normality for series estimators," Journal of Econometrics, 79(1), 147–168.

Newey, W. K. (2009): "Two-step series estimation of sample selection models," The Econometrics Journal, 12, S217–S229.
Newey, W. K., J. L. Powell, and F. Vella (1999): "Nonparametric estimation of triangular simultaneous equations models," Econometrica, 67(3), 565–603.
Pesaran, M. H., and R. Smith (1995): "Estimating long-run relationships from dynamic heterogeneous panels," Journal of Econometrics, 68(1), 79–113.
Wooldridge, J. M. (1997): "On two stage least squares estimation of the average treatment effect in a random coefficient model," Economics Letters, 56(2), 129–133.

Wooldridge, J. M. (2003): "Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model," Economics Letters, 79(2), 185–191.

Wooldridge, J. M. (2005a): "Fixed-effects and related estimators for correlated random-coefficient and treatment-effect panel data models," Review of Economics and Statistics, 87(2), 385–390.

Wooldridge, J. M. (2005b): Unobserved Heterogeneity and Estimation of Average Partial Effects, pp. 27–55. Cambridge University Press.

Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data. MIT Press.
Ziliak, J. P. (1997): "Efficient estimation with panel data when instruments are predetermined: an empirical comparison of moment-condition estimators," Journal of Business & Economic Statistics, 15(4), 419–431.

Appendix
A Simulation Results
Figure 1: Estimation of g for the two sample sizes (left and right panels). Plot of the true value of g (red dashed line), the pointwise average ḡ (green line), and the 90 percent Monte Carlo confidence bands (black dotted lines). The functions are evaluated at V = (v, ., ., .), in which case g(V) = −sin(v).

Figure 2: Estimation of E(μ_i): distribution of μ̂ around its true value, for the two sample sizes (left and right panels).

Figure 3: Estimation of E(μ_i): distribution of μ̂ around its true value, for the two sample sizes (left and right panels).

Figure 4: Comparison of estimators for the two components of μ (left and right panels).

B Proofs of Results in Section 2 and 3
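The proofs in this appendix repeatedly manipulate the annihilator matrix M(X) = I − X(X'X)^{-1}X', the orthogonal projection onto the space orthogonal to the columns of X, and its conditional mean M(V) = E(M(X) | V). A minimal numerical sanity check of the properties used below (symmetry, idempotence, rank), with illustrative dimensions not taken from the simulations:

```python
import numpy as np

rng = np.random.default_rng(0)
T_minus_1, d_x = 4, 2  # illustrative dimensions, not from the paper
X = rng.normal(size=(T_minus_1, d_x))  # full column rank with probability 1

# Annihilator of the columns of X
M = np.eye(T_minus_1) - X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(M @ X, 0)   # M annihilates the columns of X
assert np.allclose(M, M.T)     # symmetric
assert np.allclose(M @ M, M)   # idempotent: an orthogonal projection
assert np.linalg.matrix_rank(M) == T_minus_1 - d_x  # rank is (T-1) - d_x
```

In particular, M(X)c = 0 exactly when c lies in the column space of X, which is the fact driving the invertibility arguments for M(V) below.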
Proof of Result 2.2:
Consider a draw of V ∈ S_V satisfying Assumption 2.4 (1) and (2). Then Int(S_{X|V}) ≠ ∅, and there exist a basis e = (e_1, .., e_{T-1}) and, for each t ≤ T-1, a matrix X^(t) ∈ Int(S_{X|V}) such that p_{X|V}(X^(t) | V) > 0, Rank(X^(t)) = d_x, and X^(t)' e_t = 0. S_{X|V} is a subset of R^{(T-1) x d_x}, so the continuity arguments will be in R^{(T-1) x d_x}. Fix t ≤ T-1. X^(t) is of full column rank d_x, which implies that det(X^(t)' X^(t)) ≠ 0. The determinant function being continuous, as well as the density p_{X|V}(. | V), there exists an open ball B_t ⊂ Int(S_{X|V}) such that (1) X^(t) ∈ B_t, (2) for all X ∈ B_t, p_{X|V}(X | V) > 0, and (3) for all X ∈ B_t, Rank(X) = d_x.

Take c ∈ R^{T-1} such that M(V)c = 0. Then we know by the argument given in Section 2.3.1 that M(X)c = 0, P_{X|V}-a.s. Since the density p_{X|V}(. | V) is strictly positive on B_t, M(X)c = 0 for all X in B_t except on a set of measure 0. Additionally, since every X in B_t is of full rank, M(X)c is a continuous function of X on B_t. These two facts imply that M(X)c is uniformly 0 on B_t, and in particular M(X^(t))c = 0. Moreover, X^(t)' e_t = 0 implies M(X^(t)) e_t = e_t. Thus,

for all t ≤ T-1, M(X^(t))c = 0  ⇒  for all t ≤ T-1, e_t' M(X^(t))c = 0  ⇒  for all t ≤ T-1, e_t' c = c_t = 0  ⇒  c = 0.

Hence M(V) is invertible, and this holds almost surely in V. Proof of Result 2.3:
We write x_it = b_t(z_{it+1}, z_it) + v_it and, in vector form, X_i = B(Z_i) + V_i, where X_i is a column vector of size T-1. Fix a draw of V and, for the corresponding Z^V defined in Condition (2) of Assumption 2.5, define the variable X^V = B(Z^V) + V. Wlog we can assume that X^V ≠ 0 and db_t(z^V_it)/dz_t ≠ 0. We also define an open ball B around X^V such that for all X ∈ B, X ≠ 0 and p_{X|V}(X | V) > 0. The function M(.), which maps X to the orthogonal projection matrix projecting onto the space orthogonal to the columns of X, is continuous on B by the same argument as in the proof of Result 2.2. Take c such that M(V)c = 0. As in that proof, we then have ||M(X)c|| = 0 for all X ∈ B.

For this same draw of V and the corresponding Z^V, there exists t ≤ T-1 such that db_t(z̄_it)/dz_t ≠ 0. For a given δ > 0, define Z^V_δ = (z^V_{i1}, .., z^V_{it-1}, z^V_it + δ, z^V_{it+1}, .., z^V_{iT}) and X^V_δ = B(Z^V_δ) + V. For δ small enough, X^V_δ ∈ B. Note that X^V_δ and X^V are column vectors, and all components of X^V_δ are the same as those of X^V but two. If T ≥ 4, one can deduce directly that there exists δ such that X^V_δ and X^V are not collinear. If T = 3, for δ small enough, one can show that X^V_δ and X^V can be collinear only if x^V_{i1} = x^V_{i2}. Since B is an open ball, one can change X^V within B to ensure x^V_{i1} ≠ x^V_{i2}. We now have X^V_δ and X^V, two noncollinear vectors of B. However, M(X^V)c = 0 implies that c and X^V are collinear, and M(X^V_δ)c = 0 implies that c and X^V_δ are collinear, which would imply, if c ≠ 0, that X^V and X^V_δ are collinear. Therefore c must be 0. This implies that M(V) is nonsingular. Proof of Result 2.4:
The proof of invertibility of M(V) under Assumption 2.6 follows the same steps as the proof of Result 2.2, using additionally the continuity of the functions (l_k)_{s+1 ≤ k ≤ d_x}. Proof of Result 2.5:
The proof of this result follows as in the proof of Result 2.2, without the continuity arguments.
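The key step in the proof of Result 2.3 is that, for a single nonzero column X, M(X)c = 0 forces c to be collinear with X, so that two noncollinear vectors in the ball B pin down c = 0. A small numerical illustration of that step (dimensions and draws are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
Tm1 = 3
x = rng.normal(size=(Tm1, 1))  # a single-column X, as in the proof of Result 2.3

# Annihilator of span(x)
M = np.eye(Tm1) - x @ x.T / (x.T @ x).item()

# c proportional to x is annihilated...
c = 2.0 * x[:, 0]
assert np.allclose(M @ c, 0)

# ...while a generic c is not: so M(X1)c = 0 and M(X2)c = 0 for two
# noncollinear X1, X2 can only hold simultaneously when c = 0.
c_generic = rng.normal(size=Tm1)
assert not np.allclose(M @ c_generic, 0)
```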
Proof of Result 3.2:
Under Assumption 3.2, for a given value V̄, if M(V̄) is not invertible, there exist two nonzero draws of x, x_1 and x_2, with positive density in the continuously distributed case, or positive probability in the discretely distributed case, such that x_1 C + C(V̄) and x_2 C + C(V̄) are proportional. Thus M(V̄) not invertible implies V̄ ∈ D. The function g is then identified over S_V \ D. However, g is continuous and the support of V is dense in R^{T-1}, which allows us to identify g over S_V. Since g is identified, the second identification step described in Section 2 allows for identification of E(μ_i) and E(α_i). C Proofs of Results in Section 4
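The results of this section concern the first step, in which b_t is estimated by series regression and the control variable is the residual v̂_it = x_it − b̂_t. A minimal polynomial-series sketch of that step (the function, sample size, and number of terms are illustrative assumptions, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 500, 4                        # sample size and number of series terms (illustrative)
z = rng.uniform(-1.0, 1.0, size=n)
b = lambda u: np.sin(1.5 * u)        # hypothetical first-stage function b_t
x = b(z) + 0.3 * rng.normal(size=n)  # v_it = x_it - b_t(z_it) is the control variable

# Series estimator of b_t: regress x on the power basis r^L(z) = (1, z, ..., z^{L-1})
R = np.vander(z, L, increasing=True)         # n x L matrix of basis terms
beta = np.linalg.lstsq(R, x, rcond=None)[0]  # series coefficients
b_hat = R @ beta
v_hat = x - b_hat                            # generated control variable v-hat

# crude accuracy check on the fitted first stage
assert np.mean((b_hat - b(z)) ** 2) < 0.01
```

The proofs below then track how the first-step error, of order Δ_n, propagates into the second-step estimators built from the generated v̂_it.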
C.1 Proof of Consistency of μ̂

In what follows, T will denote the triangle inequality, M the Markov inequality, CS indicates the use of the Cauchy-Schwarz inequality, LLN the weak law of large numbers, and C a generic constant (whose value can change from one line to another); we follow Imbens and Newey (2009) in denoting with CM (for Conditional Markov) the result that if E(|a_n| | b_n) = O_P(r_n) then |a_n| = O_P(r_n). For a sequence (c_n)_{n∈N} ∈ R^N, the notation c_n → c stands for c_n → c as n → ∞. Proof of Result 4.1.
Under Assumption 4.1, using Theorem 1 of Newey (1997) and Lemma A1 of Newey, Powell, and Vella (1999) (see e.g. Equations A.3 and A.5), we have, for all t ≤ T,

(1/n) Σ_{i=1}^n ||v_it − ṽ_it||² = O_P(Δ_n²), as well as max_i ||v_it − ṽ_it|| = O_P(a(L)Δ_n), where Δ_n = √(L/n) + L^{-γ}, and sup_{S_ξt} ||b_t − b̂_t|| = O_P(a(L)Δ_n). (39)

Define Ṽ_i = (ṽ_i1, ..., ṽ_iT). Since ||V̂_i − V_i|| ≤ ||Ṽ_i − V_i||, the result applies.

Proof of Result 4.2. Define a(L) as the corresponding power of L. Under these conditions, Lemma S.3 of Imbens and Newey (2009) can be modified using Andrews (1991) (Equations 3.14 or A.40) to account for the fact that ξ_t is not scalar. One obtains that, since r^L is the power series basis of functions, there exists a nonsingular L × L matrix Γ̃_ξt such that, for r̃^L(ξ_t) = Γ̃_ξt r^L(ξ_t), E(r̃^L(ξ_t) r̃^L(ξ_t)') = I_L, implying that Assumption 4.1 (2) holds. One also obtains sup_{S_ξ} ||r̃^L(ξ)|| ≤ a(L), as required in Assumption 4.1 (4). Thus Assumption 4.1 is satisfied and Result 4.1 applies. Proof of Result 4.3.
Instead of applying the general results of Section 5.2 of the Online Appendix of Hahn, Liao, and Ridder (2018), we directly extend the proof of Theorem 12 of Imbens and Newey (2009) (IN09 hereafter), because it uses lower-level conditions similar to the ones we seek to impose. We adapt some of their claims to our model, where E(e^W_i | X_i, Z_i) ≠ 0. Define P = (p_1, .., p_n)', Q = P'P/n, Q̂ = P̂'P̂/n, ρ^W_i = ρ^W(X_i, Z_i), as well as the vectors e^W = (e^W_1, .., e^W_n)', ρ⃗^W = (ρ^W_1, .., ρ^W_n)' and e^{W*} = (e^{W*}_1, .., e^{W*}_n)'. Note that e^W = ρ⃗^W + e^{W*}.

Because the series estimator is unchanged by a linear transformation of the basis of functions, we can assume that p_i = p^K(V_i). As argued in Newey (1997), we can assume without loss of generality that, under Assumption 4.3, E(p^K_i p^K_i') = I_K. By construction, V̂_i ∈ S_V, and under Assumption 4.1, (29) holds. Therefore, as in Lemma S.5 of Imbens and Newey (2009), we have

||Q − I_K|| = O_P(b(K)√(K/n)), (40)
||P'e^W/n|| = O_P(√(K/n)), (41)
||P̂ − P||²/n = O_P(b_1(K)²Δ_n²), (42)
||Q̂ − Q|| = O_P(b_1(K)²Δ_n² + √K b_1(K)Δ_n). (43)

Hence by Assumption 4.3 (5), ||Q̂ − I_K|| = o_P(1), and as in Lemma S.6 of Imbens and Newey (2009), with probability going to 1, λ_min(Q̂) ≥ C and λ_min(Q) ≥ C.

We now show how the rate of convergence is impacted by the conditional mean dependence of e^W on (X, Z) by deriving the rate of ||π̂_W − π_KW||, where we recall π̂_W = Q̂^{-1}P̂'W/n. We define H_W = (h_W(V_1), .., h_W(V_n))', Ĥ_W = (h_W(V̂_1), .., h_W(V̂_n))', π̃_W = Q̂^{-1}P̂'Ĥ_W/n, π̄_W = Q̂^{-1}P̂'H_W/n. We decompose

||π̂_W − π_KW|| ≤ ||π̂_W − π̄_W|| (A) + ||π̄_W − π̃_W|| (B) + ||π̃_W − π_KW|| (C).

The first term can in turn be decomposed as (A) = Q̂^{-1}P̂'[ρ⃗^W + e^{W*}]/n. Since (X_i, Z_i, e^{W*}_i) are i.i.d., we have E(e^{W*}_i | X_1, Z_1, .., X_n, Z_n) = 0, E((e^{W*}_i)² | X_1, Z_1, .., X_n, Z_n) = E((e^{W*}_i)² | X_i, Z_i) ≤ C, and E(e^{W*}_i e^{W*}_j | X_1, Z_1, .., X_n, Z_n) = 0 for i ≠ j. This gives

E(||Q̂^{1/2}Q̂^{-1}P̂'e^{W*}/n||² | X_1, Z_1, .., X_n, Z_n) = tr(Q̂^{-1/2}P̂'E(e^{W*}e^{W*'} | X_1, Z_1, .., X_n, Z_n)P̂Q̂^{-1/2})/n² ≤ C tr(P̂(P̂'P̂)^{-1}P̂')/n ≤ CK/n.

This implies by M that Q̂^{1/2}Q̂^{-1}P̂'e^{W*}/n = O_P(√(K/n)), and by λ_min(Q̂) ≥ C w.p.a. 1, that Q̂^{-1}P̂'e^{W*}/n = O_P(√(K/n)). This rate is the same as in Lemma S.7 (i) of IN09, since e^{W*} is by definition conditionally mean-independent of the regressors generating V.

As for the second term appearing in (A), we write Q̂^{-1}P̂'ρ⃗^W/n = Q̂^{-1}P'ρ⃗^W/n + Q̂^{-1}(P̂ − P)'ρ⃗^W/n. Since E(ρ^W_i | V_i) = 0 and (ρ^W_i, V_i) are i.i.d., we know that, as in (41), ||P'ρ⃗^W/n|| = O_P(√(K/n)). Therefore, by λ_min(Q̂) ≥ C w.p.a. 1, ||Q̂^{-1}P'ρ⃗^W/n|| = O_P(√(K/n)). Moreover, (P̂ − P)'ρ⃗^W/n = (1/n) Σ_{i=1}^n (p̂^K_i − p^K_i)ρ^W_i and

||(1/n) Σ_{i=1}^n (p̂^K_i − p^K_i)ρ^W_i|| ≤ [(1/n) Σ_{i=1}^n ||p̂^K_i − p^K_i||²]^{1/2} [(1/n) Σ_{i=1}^n |ρ^W_i|²]^{1/2} ≤ C b_1(K) [(1/n) Σ_{i=1}^n ||V̂_i − V_i||²]^{1/2} = O_P(b_1(K)Δ_n).

This implies, by λ_min(Q̂) ≥ C w.p.a. 1, that Q̂^{-1}(P̂ − P)'ρ⃗^W/n = O_P(b_1(K)Δ_n). This gives a convergence rate for (A): ||π̂_W − π̄_W|| = O_P(√(K/n) + b_1(K)Δ_n).

By Lemma S.7 (ii) and (iii) of IN09, (B) = O_P(Δ_n) since h_W(.) is Lipschitz, and (C) = O_P(K^{-γ}) using Assumption 4.3 (4). This implies that

||π̂_W − π_KW|| = O_P(√(K/n) + Δ_n + K^{-γ} + b_1(K)Δ_n) = O_P(√(K/n) + K^{-γ} + b_1(K)Δ_n),

which differs from the rate √(K/n) + K^{-γ} + Δ_n obtained in IN09. The structure of the proof showed that the extra term b_1(K)Δ_n comes from the correlation between e^W and P̂ − P, which is nonzero because P̂ is constructed using the estimated V̂, which themselves depend on the covariates (X, Z). Deriving a rate of convergence required linearizing the term P̂ − P.

We can now conclude that

∫ |ĥ_W(V) − h_W(V)|² dF_V(V) ≤ 2 ∫ |ĥ_W(V) − p^K(V)'π_KW|² dF_V(V) + 2 ∫ |p^K(V)'π_KW − h_W(V)|² dF_V(V)
≤ 2 (π̂_W − π_KW)' [∫ p^K(V) p^K(V)' dF_V(V)] (π̂_W − π_KW) + CK^{-2γ} = O_P(K/n + K^{-2γ} + Δ_n² b_1(K)²),

where the last line holds by the normalization E(Q) = I_K. Moreover,

sup_{V∈S_V} |ĥ_W(V) − h_W(V)| ≤ sup_{V∈S_V} |ĥ_W(V) − p^K(V)'π_KW| + O_P(K^{-γ}) = O_P(b(K)(K/n + K^{-2γ} + Δ_n² b_1(K)²)^{1/2}). Proof of Result 4.4.
As in the proof of Result 4.2, define b(K) and b_1(K) as the corresponding powers of K. Then Assumption 4.4 (5) implies Assumption 4.3 (5), and together with Assumption 4.4 (3), implies Assumption 4.3 (3). To prove Result 4.5, we will need the two following lemmas. Lemma C.1.
Under Assumption 4.6, the function V ∈ S_V ↦ λ_min(M(V)) is continuous.

Proof of Lemma C.1. For all V, M(V) is symmetric. Using the Hoffman-Wielandt inequality (see e.g. Corollary 6.3.8, p. 408, in Horn and Johnson (2012)), for two values V_1 and V_2,

Σ_{i=1}^{T-1} |λ_i(M(V_1)) − λ_i(M(V_2))|² ≤ ||M(V_1) − M(V_2)||²_F,

where we index the eigenvalues (λ_i)_{i=1}^{T-1} in increasing order. This implies

|λ_min(M(V_1)) − λ_min(M(V_2))| ≤ ||M(V_1) − M(V_2)||_F. (44)

Since M(.) is a continuous function, this concludes the argument. Note that the Lipschitz inequality (44) will also be used in the proof for the convergence rates.

Lemma C.2. Under Assumption 4.6, there exists c > 0 such that for all V ∈ S_V, λ_min(M(V)) ≥ c.

Proof of Lemma C.2. Under Assumption 4.6, M(V) is nonsingular for all V ∈ S_V. This implies that for all V ∈ S_V, λ_min(M(V)) > 0. Since S_V is a compact set and the function V ↦ λ_min(M(V)) is continuous by Lemma C.1, it attains its minimum. This minimum value cannot be 0, hence there exists c > 0 such that λ_min(M(V)) ≥ c for all V ∈ S_V. Proof of Result 4.5.
Under Assumptions 4.2 and 4.4, writing Γ_n and γ_n respectively for the mean-square and sup-norm rates of convergence, we have

∫ ||k̂(V) − k(V)||² dF(V) = O_P(Γ_n), sup_{V∈S_V} ||k̂(V) − k(V)|| = O_P(γ_n),
∫ ||M̂(V) − M(V)||²_F dF(V) = O_P(Γ_n), sup_{V∈S_V} ||M̂(V) − M(V)||_F = O_P(γ_n),

since the Frobenius norm and the Euclidean norm are square roots of sums of squared elements, and this rate was obtained for each element of M(V) and k(V).

We write 1_n = 1(min_{V∈S_V} λ_min(M̂(V)) > c/2). Using (44), we have

λ_min(M̂(V)) > λ_min(M(V)) − ||M̂(V) − M(V)||_F, hence min_{V∈S_V} λ_min(M̂(V)) > c − sup_{V∈S_V} ||M̂(V) − M(V)||_F,

where the last implication uses Lemma C.2. Hence 1_n ≥ 1(sup_{V∈S_V} ||M̂(V) − M(V)||_F ≤ c/2). By Assumptions 4.2 and 4.4, γ_n → 0, hence 1_n = 1 w.p.a. 1.

To obtain the sup-norm rate, we write, for all V ∈ S_V,

g(V) − ĝ(V) = M(V)^{-1} k(V) − M̂(V)^{-1} k̂(V) = M(V)^{-1} [M̂(V) − M(V)] M̂(V)^{-1} k(V) + M̂(V)^{-1} [k(V) − k̂(V)].

Using T, the norm inequality and the definition of the induced norm, this gives

1_n ||ĝ(V) − g(V)|| ≤ 1_n C ||M̂(V) − M(V)||_F ||k||_∞ + 1_n C ||k(V) − k̂(V)||,

where C depends only on c. This implies that 1_n sup_{V∈S_V} ||ĝ(V) − g(V)|| = O_P(γ_n). To obtain the mean-square error rate, we write

1_n ∫ ||ĝ(V) − g(V)||² dF(V) = 1_n ∫ ||M(V)^{-1}[M̂(V) − M(V)]M̂(V)^{-1}k(V) + M̂(V)^{-1}[k(V) − k̂(V)]||² dF(V)
≤ 1_n C ||k||²_∞ ∫ ||M̂(V) − M(V)||²_F dF(V) + 1_n C ∫ ||k̂(V) − k(V)||² dF(V),

which implies 1_n ∫ ||ĝ(V) − g(V)||² dF(V) = O_P(Γ_n). Proof of Result 4.6.
To prove consistency, we need to show that (1/n) Σ_{i=1}^n Q_i^δ ĝ(V̂_i) →_P E(Q g(V) δ). Indeed, by the LLN, (1/n) Σ_{i=1}^n δ_i →_P P(det(X_i'X_i) > δ), and by Assumption 4.7 and the LLN, (1/n) Σ_{i=1}^n Q_i^δ y_i →_P E(Q' y δ) also holds. Consistency would then follow from Equation (12). To obtain (1/n) Σ_{i=1}^n Q_i^δ ĝ(V̂_i) →_P E(Q g(V) δ), we decompose

(1/n) Σ_{i=1}^n Q_i^δ ĝ(V̂_i) − E(Q g(V) δ) = (1/n) Σ_{i=1}^n Q_i^δ [ĝ(V̂_i) − g(V_i)] + (1/n) Σ_{i=1}^n Q_i^δ g(V_i) − E(Q g(V) δ) := A_n + B_n.

We have

||A_n|| = ||(1/n) Σ_{i=1}^n Q_i^δ [ĝ(V̂_i) − g(V̂_i)] + (1/n) Σ_{i=1}^n Q_i^δ [g(V̂_i) − g(V_i)]||
≤ sup_{V∈S_V} ||ĝ(V) − g(V)|| (1/n) Σ_{i=1}^n ||Q_i^δ|| + C max_i ||V̂_i − V_i|| (1/n) Σ_{i=1}^n ||Q_i^δ|| = O_P(γ_n + a(L)Δ_n) (1/n) Σ_{i=1}^n ||Q_i^δ||,

where the first term in the inequality follows from V̂_i ∈ S_V by design, and the second from g being continuously differentiable on a compact set, hence Lipschitz continuous on this set. The last equality follows from Equation (30) and Result 4.5. We assumed that γ_n → 0 and a(L)Δ_n → 0 as n goes to infinity; thus we obtain ||A_n|| = o_P(1).

C.2 Proof of Asymptotic Normality of μ̂

We first introduce some more notation. We define b⃗_t = (b_1t, ..., b_nt)' = (b_t(ξ_1t), ..., b_t(ξ_nt))', v⃗_t = (v_1t, .., v_nt)', V⃗ = (V_1, .., V_n), x⃗ = (X_1, .., X_n) and similarly z⃗. For the results in Section 4.4.3, we also define the vector h⃗_Wς = (h_Wς(V_1), .., h_Wς(V_n))' = (h_W(V_1), .., h_W(V_n))', since V_i ∈ S_V for all i ≤ n, and the vector ĥ⃗_Wς = (h_Wς(V̂_1), .., h_Wς(V̂_n))'. Proof of Result 4.7.
This proof is a special case of Theorem 2 in Chen, Linton, and Van Keilegom (2003). By Assumption 4.8 (3), there exists δ_n = o(1) such that P(||Ĝ − G_0|| > δ_n) → 0. On the event {||Ĝ − G_0|| ≤ δ_n},

||X_n(Ĝ) − X_n(G_0) − X'(G_0)[Ĝ − G_0]|| ≤ ||X_n(Ĝ) − X(Ĝ) − X_n(G_0) + X(G_0)|| + ||X(Ĝ) − X(G_0) − X'(G_0)[Ĝ − G_0]|| = o_P(n^{-1/2})

by Assumption 4.8. Hence ||X_n(Ĝ) − X_n(G_0) − X'(G_0)[Ĝ − G_0]|| = o_P(n^{-1/2}). Proof of Lemma 4.1.
As argued in the proof of Result 4.3, under Assumption 4.9 (5) we can impose without loss of generality the normalizations E(p_i p_i') = I_K and E(r_it r_it') = I_L. We define

Δ_∂P = b_1(K)√(L/n), Δ_Q = b_1(K)²Δ_n² + √K b_1(K)Δ_n + b(K)√(K/n), Δ_Q1 = a(L)√(L/n), and Δ_H = b_1(K)Δ_n√L + a(L)√(K/n).

Recall that Δ_n = √(L/n) + L^{-γ}, and note that b(K) ≤ b_1(K) ≤ b_2(K). Under Assumption 4.9 (7), √n K^{-γ} = o(1) and

√K Δ_Q = O(√K [√K b_1(K)Δ_n + b(K)√(K/n)]) = O(K b_1(K)[√(L/n) + L^{-γ}]) = o(1),
Δ_∂P √L = b_1(K) L/√n = o(1), b_1(K) Δ_Q1 √L = b_1(K) a(L) L/√n = o(1), b_1(K)Δ_n = o(1),
√L Δ_H = O(b_1(K) L (√(L/n) + L^{-γ}) + a(L)√(KL/n)) = o(1),
Δ_Q b_1(K) = O(b_1(K)² √K [L^{-γ} + √(L/n)] + b_1(K) b(K)√(K/n)) = o(1),
√n b_2(K)Δ_n² = b_2(K)[L/√n + √n L^{-2γ}] = o(1).

These results imply in particular that Δ_∂P = o(1), Δ_Q = o(1), Δ_Q1 = o(1) and Δ_H = o(1). All these rates will be used in the steps of the proof.

We define, for all t ≤ T, Q_t = R_t'R_t/n. As in the proof of Result 4.3, we obtain ||Q_t − I_L|| = O_P(Δ_Q1) = o_P(1) for t ≤ T, and ||Q̂ − I_K|| = O_P(Δ_Q) = o_P(1). This implies, as argued in NPV99, that the eigenvalues of Q̂ are bounded away from 0 w.p.a. 1, therefore ||BQ̂^{-1}|| ≤ ||B|| O_P(1) and ||BQ̂^{-1/2}|| ≤ ||B|| O_P(1) for any matrix B. Using

||Ĥ_∂Wt − H_∂Wt|| ≤ ||(1/n) Σ_i (p̂_i − p_i)(∂h_Wς(V_i)/∂v_t ⊗ r_it)|| + ||(1/n) Σ_i p_i(∂h_W(V_i)/∂v_t ⊗ r_it) − E[p_i(∂h_W(V_i)/∂v_t ⊗ r_it)]||,

with

||(1/n) Σ_i (p̂_i − p_i)(∂h_Wς(V_i)/∂v_t ⊗ r_it)|| ≤ b_1(K)Δ_n sup_{S_ςV} ||∂h_Wς|| tr(R_t'R_t)^{1/2}/√n = O_P(b_1(K)Δ_n√L),

E(||(1/n) Σ_i p_i(∂h_W(V_i)/∂v_t ⊗ r_it) − E[p_i(∂h_W(V_i)/∂v_t ⊗ r_it)]||²) ≤ (1/n²) E[tr(Σ_i p_i(∂h_W(V_i)/∂v_t ⊗ r_it)'(∂h_W(V_i)/∂v_t ⊗ r_it)p_i')] ≤ (C/n) E(tr(p_i r_it' r_it p_i')) ≤ C a(L)² K/n,

we obtain ||Ĥ_∂Wt − H_∂Wt|| = O_P(Δ_H). Thus, by Assumption 4.9 (7), ||Ĥ_∂Wt − H_∂Wt|| = o_P(1). Moreover, since ρ_W(.) is a bounded function and E(r_it' r_it) = E(tr(I_L)) = L, we have for all t ≤ T,

E(||P̂_∂Wt − P_∂Wt||²) ≤ (C/n²) Σ_i E[tr((∂p^K(V_i)/∂v_t)(∂p^K(V_i)/∂v_t)' r_it' r_it)] ≤ C b_1(K)² L/n,

which implies by M that ||P̂_∂Wt − P_∂Wt|| = O_P(Δ_∂P) = o_P(1).

Since a(.) is linear, a(ĥ_W) = A π̂_W. Using Assumption 4.9 (2),

||a(p^K(.)'π_KW) − a(h_W(.))|| ≤ C sup_{S_V} ||p^K(.)'π_KW − h_W(.)|| ≤ C sup_{S_ςV} ||p^K(.)'π_KW − h_Wς(.)|| ≤ C K^{-γ},

which implies, using Assumption 4.9 (7), that

√n [a(ĥ_W) − a(h_W)] = √n A [π̂_W − π_KW] + o_P(1) = A Q̂^{-1} P̂'(W − P̂ π_KW)/√n + o_P(1).

Since ||A Q̂^{-1} P̂'(ĥ⃗_Wς − P̂ π_KW)|| ≤ ||A Q̂^{-1/2}|| ||Q̂^{-1/2} P̂'|| ||ĥ⃗_Wς − P̂ π_KW|| ≤ √n ||A Q̂^{-1/2}|| O_P(1) √n sup_{S_ςV} ||p^K(.)'π_KW − h_Wς(.)||, then A Q̂^{-1} P̂'(ĥ⃗_Wς − P̂ π_KW)/√n = o_P(1) by ||A|| bounded. Therefore we obtain, as in NPV99,

√n (a(ĥ_W) − a(h_W)) = A Q̂^{-1} P̂' e^W/√n (B) + A Q̂^{-1} P̂'(h⃗_Wς − ĥ⃗_Wς)/√n (C) + o_P(1). (45)

We first focus on the term (B), which we decompose as

(B) = A P' e^W/√n + A(Q̂^{-1} − I)P' e^W/√n (B1) + A Q̂^{-1}(P̂ − P)' e^{W*}/√n (B2) + A Q̂^{-1}(P̂ − P)' ρ⃗^W/√n (B3).

Since E(||P' e^W/√n||²) = tr[E(P' e^W (e^W)' P)]/n = tr[E(E(e^W (e^W)' | V⃗) P P')]/n ≤ C tr[I_K] = O(K) by Assumption 4.9 (8), by M we have ||P' e^W/√n|| = O_P(K^{1/2}). Therefore,

||(B1)|| ≤ ||A Q̂^{-1}|| ||I_K − Q̂|| ||P' e^W/√n|| = ||A|| O_P(1) ||I_K − Q̂|| ||P' e^W/√n|| = O_P(Δ_Q K^{1/2}),

and (B1) = o_P(1) under Assumption 4.9 (8). We now look at the extra terms (B2) and (B3), where we decomposed e^W as ρ⃗^W + e^{W*}, since e^W itself is not conditionally mean-independent of P̂ − P, while e^{W*} is. Indeed, since E(e^{W*} | x⃗, z⃗) = 0,

E(||(P̂ − P)' e^{W*}/√n||² | x⃗, z⃗) = (1/n) tr[E((P̂ − P)' e^{W*}(e^{W*})'(P̂ − P) | x⃗, z⃗)] = (1/n) tr[(P̂ − P)' E(e^{W*}(e^{W*})' | x⃗, z⃗)(P̂ − P)] ≤ (C/n) ||P̂ − P||².

By the proof of Result 4.3, ||P̂ − P||²/n = O_P(b_1(K)²Δ_n²) (the difference is that now b_1(K) is defined as the sup rate over the extended support S_ςV), hence by CM, ||(P̂ − P)' e^{W*}/√n|| = O_P(b_1(K)Δ_n). Then

||A Q̂^{-1}(P̂ − P)' e^{W*}/√n|| ≤ ||A Q̂^{-1}|| ||(P̂ − P)' e^{W*}||/√n = ||A|| O_P(1) ||(P̂ − P)' e^{W*}||/√n, so (B2) = O_P(b_1(K)Δ_n) = o_P(1).

We now focus on (B3). We have (P̂ − P)' ρ⃗^W/√n = (1/√n) Σ_i (p̂^K_i − p^K_i) ρ^W_i, and a second-order Taylor expansion gives

||(p̂^K_i − p^K_i) − (∂p^K(τ(V_i))/∂V')(∂τ(V_i)/∂V')(Ṽ_i − V_i)|| ≤ C b_2(K) ||Ṽ_i − V_i||²,

which can be rewritten as ||(p̂^K_i − p^K_i) − (∂p^K(V_i)/∂v')(Ṽ_i − V_i)|| ≤ C b_2(K) ||Ṽ_i − V_i||², since V_i ∈ S_V and we chose τ so that its Jacobian matrix is the identity matrix on S_V. Hence

||A Q̂^{-1} [(P̂ − P)' ρ⃗^W − Σ_i (∂p^K(V_i)/∂V')(Ṽ_i − V_i) ρ^W_i]/√n|| ≤ C O_P(1) b_2(K) Σ_i ||Ṽ_i − V_i||²/√n = O_P(√n b_2(K) Δ_n²) = o_P(1),

by Assumption 4.9 (7). Therefore (B3) = A Q̂^{-1} Σ_i (∂p^K(V_i)/∂V')(Ṽ_i − V_i) ρ^W_i/√n + o_P(1). We can decompose

−(ṽ_it − v_it) = β̂_t' r_it − b_it = ([Q_t^{-1} R_t'(x⃗_t − R_t β_Lt)/n])' r_it + β_Lt' r_it − b_it = [(Q_t^{-1} R_t' v⃗_t/n)' r_it] + [(Q_t^{-1} R_t'(b⃗_t − R_t β_Lt)/n)' r_it] + [β_Lt' r_it − b_it],

and then apply this decomposition to (B3): (B3) = −[(B3.1) + (B3.2) + (B3.3)] + o_P(1), where

(B3.1) = Σ_{t=1}^T A Q̂^{-1} Σ_i ρ^W_i (∂p^K(V_i)/∂v_t)[Q_t^{-1} R_t'(b⃗_t − R_t β_Lt)/n]' r_it/√n,
(B3.2) = Σ_{t=1}^T A Q̂^{-1} Σ_i ρ^W_i (∂p^K(V_i)/∂v_t)[β_Lt' r_it − b_it]/√n,
(B3.3) = Σ_{t=1}^T A Q̂^{-1} Σ_i ρ^W_i (∂p^K(V_i)/∂v_t)[Q_t^{-1} R_t' v⃗_t/n]' r_it/√n.

(B3.1) can be rewritten as

(B3.1) = Σ_{t=1}^T A Q̂^{-1} (1/n) Σ_i ρ^W_i (∂p^K(V_i)/∂v_t ⊗ r_it') Vec(Q_t^{-1} R_t'(b⃗_t − R_t β_Lt))/√n = Σ_{t=1}^T A Q̂^{-1} P_∂Wt (I_d ⊗ Q_t^{-1} R_t') Vec(b⃗_t − R_t β_Lt)/√n

(see e.g. p. 282 of Abadir and Magnus (2005)), where ||Vec(b⃗_t − R_t β_Lt)|| ≤ √n sup_{S_ξt} ||b_t(.) − β_Lt' r^L(.)|| ≤ √n L^{-γ}. Defining, for d ≤ d_x, the matrix P_∂Wtd = (1/n) Σ_i ρ^W_i (∂p^K(V_i)/∂v_td) r_it', where v_td is the d-th component of v_t, we have P_∂Wt = (P_∂Wt1, .., P_∂Wtd_x), and we can write

||A Q̂^{-1} P_∂Wt (I_d ⊗ Q_t^{-1} R_t')||² = Σ_{d=1}^{d_x} ||A Q̂^{-1} P_∂Wtd Q_t^{-1} R_t'||² = Σ_{d=1}^{d_x} tr(A Q̂^{-1} P_∂Wtd Q_t^{-1} R_t' R_t Q_t^{-1} P_∂Wtd' Q̂^{-1} A') = n Σ_{d=1}^{d_x} tr(A Q̂^{-1} P_∂Wtd Q_t^{-1} P_∂Wtd' Q̂^{-1} A'),

and P_∂Wtd Q_t^{-1} P_∂Wtd' = ((1/n) Σ_i ρ^W_i (∂p^K(V_i)/∂v_td) r_it')((1/n) Σ_i r_it r_it')^{-1}((1/n) Σ_i ρ^W_i (∂p^K(V_i)/∂v_td) r_it')'. By R_t(R_t'R_t)^{-1}R_t' being an orthogonal projection matrix,

P_∂Wtd Q_t^{-1} P_∂Wtd' ≤ (1/n) Σ_i (ρ^W_i)² (∂p^K(V_i)/∂v_td)(∂p^K(V_i)/∂v_td)',

implying ||P_∂Wtd Q_t^{-1/2}|| ≤ b_1(K) and

tr(A Q̂^{-1} P_∂Wtd Q_t^{-1} P_∂Wtd' Q̂^{-1} A') ≤ (1/n) Σ_i (ρ^W_i)² tr(A Q̂^{-1} (∂p^K(V_i)/∂v_td)(∂p^K(V_i)/∂v_td)' Q̂^{-1} A') ≤ (1/n) ||A Q̂^{-1}||² Σ_i (ρ^W_i)² ||∂p^K(V_i)/∂v_td||² = O_P(1) b_1(K)²,

since ρ_W(.) is bounded. Hence ||A Q̂^{-1} P_∂Wt (I_d ⊗ Q_t^{-1} R_t')||² = O_P(n b_1(K)²), and we obtain, by Assumption 4.9 (7),

||(B3.1)|| = O_P(√n b_1(K) √n L^{-γ}/√n) = O_P(b_1(K) √n L^{-γ}) = o_P(1).

Focusing now on the second term in the expression of (B3),

||(B3.2)|| = ||Σ_{t=1}^T A Q̂^{-1} Σ_i ρ^W_i (∂p^K(V_i)/∂v_t)[β_Lt' r_it − b_it]||/√n ≤ ||A Q̂^{-1}|| n b_1(K) C L^{-γ}/√n = O_P(1) b_1(K) √n L^{-γ} = o_P(1),

hence (B3) = −(B3.3) + o_P(1), with

(B3.3) = Σ_{t=1}^T A Q̂^{-1} P_∂Wt (I_d ⊗ Q_t^{-1}) Vec(R_t' v⃗_t)/√n.

First, by E(||v_it||² | ξ_it = ξ_t) bounded, ||R_t' v⃗_t/√n|| = O_P(√L), which gives

||Σ_{t=1}^T A Q̂^{-1} P_∂Wt [I_d ⊗ Q_t^{-1} − I_dL] Vec(R_t' v⃗_t)/√n|| ≤ ||A Q̂^{-1}|| Σ_{t=1}^T ||P_∂Wt (I_d ⊗ Q_t^{-1/2})|| ||I_d ⊗ Q_t^{-1/2}|| ||I_d ⊗ (I_L − Q_t)|| ||R_t' v⃗_t/√n|| ≤ O_P(1) C b_1(K) O_P(1) Δ_Q1 √L = O_P(b_1(K) Δ_Q1 √L) = o_P(1),

by Assumption 4.9 (7). Similarly,

||Σ_{t=1}^T A Q̂^{-1}(P̂_∂Wt − P_∂Wt) Vec(R_t' v⃗_t)/√n|| ≤ C ||A Q̂^{-1}|| ||P̂_∂Wt − P_∂Wt|| ||R_t' v⃗_t/√n|| = O_P(Δ_∂P √L) = o_P(1).

Finally, we write P̄_∂Wtd = E(ρ^W_i (∂p^K(V_i)/∂v_td) r_it'), as well as v_itd for the d-th component of v_it and v⃗_td = (v_1td, .., v_ntd)'. Then

E(||P̄_∂Wtd R_t' v⃗_td/√n||²) = tr(P̄_∂Wtd E(R_t' E(v⃗_td v⃗_td' | x⃗, z⃗) R_t) P̄_∂Wtd')/n ≤ C tr(P̄_∂Wtd E(R_t' R_t) P̄_∂Wtd')/n ≤ C tr(P̄_∂Wtd P̄_∂Wtd') = C tr(E(ρ^W_i (∂p^K(V_i)/∂v_td) r_it') E(r_it r_it')^{-1} E(ρ^W_i r_it (∂p^K(V_i)/∂v_td)')) ≤ C E((ρ^W_i)² (∂p^K(V_i)/∂v_td)'(∂p^K(V_i)/∂v_td)) ≤ C b_1(K)²,

where the second-to-last inequality follows from taking the orthogonal projection matrix argument to the limit. This implies that ||P̄_∂Wtd R_t' v⃗_td/√n|| = O_P(b_1(K)), and

||Σ_{t=1}^T A(Q̂^{-1} − I_K) P̄_∂Wt Vec(R_t' v⃗_t)/√n|| ≤ Σ_{t=1}^T Σ_{d=1}^{d_x} ||A Q̂^{-1}(I_K − Q̂) P̄_∂Wtd R_t' v⃗_td/√n|| ≤ ||A Q̂^{-1}|| O_P(Δ_Q) O_P(b_1(K)) = O_P(Δ_Q b_1(K)) = o_P(1),

by Assumption 4.9 (7). We can now write

(B3.3) = (1/√n) A Σ_{t=1}^T P̄_∂Wt Vec(R_t' v⃗_t) + o_P(1) = (1/√n) Σ_i A Σ_{t=1}^T P̄_∂Wt (v_it ⊗ r_it) + o_P(1),

where, were the weight matrices not normalized, the term appearing in the sum over n would become Σ_{t≤T} A E(p_i p_i')^{-1} P̄_∂Wt (I_d ⊗ E(r_it r_it')^{-1})(v_it ⊗ r_it). Adding all terms appearing in (B), one obtains

(B) = (1/√n) Σ_i A [p_i e^W_i − Σ_{t=1}^T P̄_∂Wt (v_it ⊗ r_it)] + o_P(1).

The second term in √n(a(ĥ_W) − a(h_W)) is (C) = A Q̂^{-1} P̂'(h⃗_Wς − ĥ⃗_Wς)/√n. This term is similar to the second term in the corresponding decomposition of NPV99 (p. 598), where the regression function becomes h_Wς. Since v ↦ h_Wς(τ(v)) is by composition twice continuously differentiable and has bounded second-order derivatives on the extended support, one obtains, using NPV99,

(C) = (1/√n) Σ_i A Σ_{t=1}^T H_∂Wt (v_it ⊗ r_it) + o_P(1),

adapting to the fact that h_Wς is here a function of T generated covariates instead of one, and using that √n L^{-γ}, √n b_2(K)Δ_n², √L Δ_Q1, √K Δ_Q and √L Δ_H converge to zero as n goes to infinity. Note that, absent the normalization of the weight matrices, the term summed over n in the previous equation would become A E(p_i p_i')^{-1} Σ_{t=1}^T H_∂Wt (I_d ⊗ E(r_it r_it')^{-1})(v_it ⊗ r_it). Proof of Lemma 4.2.
By Assumption 4.9 (5), we can assume wlog that E(p_i p_i') = I_K and E(r_it r_it') = I_L. Note that λ̃_a(v) = A p^K(v) = E(λ_a(V) p^K(V)') p^K(v) is the mean-square projection of λ_a on the functional space spanned by p^K. As in the proof of Theorem 3 in Newey (1997), this implies that E(||λ̃_a(V) − λ_a(V)||²) ≤ E(||ι_Ka' p^K(V) − λ_a(V)||²), which gives

E(||e^W (λ̃_a(V) − λ_a(V))||²) ≤ C K^{-2γ} → 0,

using Assumption 4.10 (3) and (4). Following NPV99, writing λ̃^∂_a,td(ξ_t) = A H_∂Wtd r^L(ξ_t) = E(λ̃_a(V)(∂h_W(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t), and since E([λ̃_a(V) − λ_a(V)](∂h_W(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t) is the mean-square projection of the function E((λ̃_a(V) − λ_a(V))(∂h_W(V)/∂v_td) | ξ_t) on the functional space spanned by r^L, by properties of projections we have

E(||λ̃^∂_a,td(ξ_t) − E(λ_a(V)(∂h_W(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t)||²) ≤ E(||[λ̃_a(V) − λ_a(V)](∂h_W(V)/∂v_td)||²) ≤ C E(||λ̃_a(V) − λ_a(V)||²) → 0,

where the last inequality holds by Assumption 4.9 (3). Since E(λ_a(V)(∂h_W(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t) is the mean-square projection of E[λ_a(V)(∂h_W(V)/∂v_td) | ξ_t], then

E(||E(λ_a(V)(∂h_W(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t) − E[λ_a(V)(∂h_W(V)/∂v_td) | ξ_t]||²) ≤ E(||ι_Laht' r^L(ξ_t) − E[λ_a(V)(∂h_W(V)/∂v_td) | ξ_t]||²) → 0.

This implies E(||λ̃^∂_a,td(ξ_t) − E(λ_a(V)(∂h_W(V)/∂v_td) | ξ_t)||²) → 0, and by Assumption 4.9 (8),

E(||v_td [λ̃^∂_a,td(ξ_t) − E(λ_a(V)(∂h_W(V)/∂v_td) | ξ_t)]||²) → 0.

Moreover,

E(||∂λ̃_a(V)/∂v_td − ∂λ_a(V)/∂v_td||²) ≤ 2 E(||E(λ_a(V) p^K(V)')(∂p^K(V)/∂v_td) − ι_Ka'(∂p^K(V)/∂v_td)||²) + 2 E(||ι_Ka'(∂p^K(V)/∂v_td) − ∂λ_a(V)/∂v_td||²),

where the second term in the sum converges to 0 by Assumption 4.10 (2) and (3). The first term is

E(||E([λ_a(V) − ι_Ka' p^K(V)] p^K(V)')(∂p^K(V)/∂v_td)||²) ≤ b_1(K)² ||E([λ_a(V) − ι_Ka' p^K(V)] p^K(V)')||² = O(b_1(K)² K^{-2γ}) → 0,

where the last equality is obtained by the same argument as in the previous proof. This implies that E(||∂λ̃_a(V)/∂v_td − ∂λ_a(V)/∂v_td||²) → 0. Since ∂λ̃_a,td(ξ_t) = A P̄_∂Wtd r^L(ξ_t) = E(A ρ^W (∂p^K(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t), we have, by the property of mean-square projections,

E(||∂λ̃_a,td(ξ_t) − E(ρ^W (∂λ_a(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t)||²) ≤ E(|ρ^W|² ||∂λ̃_a(V)/∂v_td − ∂λ_a(V)/∂v_td||²) → 0, and
E(||E(ρ^W (∂λ_a(V)/∂v_td) ⊗ r^L(ξ_t)') r^L(ξ_t) − E[ρ^W (∂λ_a(V)/∂v_td) | ξ_t]||²) ≤ E(||ι_Laρt' r^L(ξ_t) − E[ρ^W (∂λ_a(V)/∂v_td) | ξ_t]||²) → 0,

which implies E(||∂λ̃_a,td − E[ρ^W (∂λ_a(V)/∂v_td) | ξ_t]||²) → 0 and E(||v_td (∂λ̃_a,td − E[ρ^W (∂λ_a(V)/∂v_td) | ξ_t])||²) → 0. Proof of Result 4.8.
We first focus on the functional $X^{(bt)}[b_t] = \int \lambda_{bt}(\xi_t)' b_t(\xi_t)\, dF_{\xi_t}(\xi_t)$. Newey (1997) shows in the proof of Theorem 2 (equation (A.7), p. 164, and the subsequent text) that if $\|X^{(bt)}[b_t]\| \le C |b_t|_0$, $\sqrt{n}\, L^{-\gamma} \to 0$, $\Delta_Q = a(L)\sqrt{L/n} \to 0$, and $\|\Lambda_{bt}\|$ is bounded, then
$$\sqrt{n}\, X^{(bt)}[\hat b_t - b_t] = \Lambda_{bt} \sum_{i=1}^n r_{it} \otimes v_{it} / \sqrt{n} + o_P(1).$$
For all $t \le T$, under Assumption 4.11 (1) and (2), $\|X^{(bt)}[b_t]\| = \|E( Q_i \frac{\partial g}{\partial v_t}(V_i)\, b_t(\xi_{it}) )\| \le C |b_t|_0$. The other required conditions hold by Assumption 4.11.

We now check that the conditions of Assumption 4.9 hold for the functionals $X^{(k)}$ and $X^{(M)}$ applied to the two-step estimators $\hat k$ and $\hat M$. Under Assumption 4.6 we have by Lemma C.2 that $\lambda_{\min}(M(V)) \ge C$. Together with Assumption 4.11 (1) and (2), this guarantees that $\|X^{(k)}[\tilde k]\| = \|E( Q_i M(V_i)^{-1} \tilde k(V_i) )\| \le C |\tilde k|_0$ and $\|X^{(M)}[\tilde M]\| = \|E( Q_i M(V_i)^{-1} \tilde M(V_i) M(V_i)^{-1} g(V_i) )\| \le C |\tilde M|_0$. Hence Assumption 4.9 (2) holds for each functional.

Moreover, $\rho_M(X_i, Z_i) = M_i - M(V_i)$, where $M_i = I - X_i (X_i' X_i)^{-1} X_i'$ if $X_i$ is of full rank, or $M_i = I - X_i X_i^+$ if not, with $X_i^+$ the Moore-Penrose inverse. In either case, $\|M_i\| \le \|M_i\|_F \le C$ and $\|M(V_i)\|_F = \|E(M_i \mid V_i)\|_F \le C$, ensuring that $\rho_M$ is a bounded function. By the same argument and Assumption 4.11 (1), (2) and (7), $\rho_k(X_i, Z_i) = [M_i - M(V_i)] g(V_i)$ is uniformly bounded. By Assumptions 2.1 and 2.3, $k(V) = M^{-1}(V) g(V)$; therefore, by Assumption 4.11 (1), $k$ is twice continuously differentiable, implying that Assumption 4.9 (3) holds for each functional. Also, $E(\|e^{M*}\|^2 \mid X, Z) = 0$ for all $(X, Z)$ since $M_i$ is a function of $(X_i, Z_i)$, and by Assumption 4.11 (7), for all $(X, Z)$, $E(\|e^{k*}\|^2 \mid X, Z) = E(\|M u\|^2 \mid X, Z) \le C$, ensuring that Assumption 4.9 (8) holds for each functional. The other conditions of Assumption 4.9 are direct consequences of Assumption 4.11.

Proof of Result 4.9.
Under Assumptions 4.11' and 4.12, Conditions (1), (2) and (3) of Assumption 4.10 hold for $w_i = (M_i y_i)_t$ and $w_i = (M_i)_{st}$, associated respectively with $\lambda^k_t$ and $\lambda^M_{s,t}$. We showed that Assumption 4.11 implies that $\rho_M$ and $\rho_k$ are bounded, as well as $E(\|e^{M*}\|^2 \mid X, Z)$ and $E(\|e^{k*}\|^2 \mid X, Z)$: this implies that for all $V$, $\mathrm{Var}(e^M \mid V) \le C$ and $\mathrm{Var}(e^k \mid V) \le C$. Condition (4) of Assumption 4.10 is also satisfied for our choices of $w_i$, hence we can apply Lemma 4.2.

We now use Equation (36) to construct the asymptotic variance. Define
$$s_i = [\delta_i \mu_i - E(\mu\delta)] + Q^\delta_i u_i - \lambda_M(V_i) \mathrm{Vec}(e^M_i) - \lambda_k(V_i) e^k_i + \sum_{t=1}^T \Big( E\Big[ \frac{\partial \lambda_M(V_i)}{\partial v_t} \mathrm{Vec}(\rho^M_i) \,\Big|\, \xi_{it} \Big] - E\Big[ \lambda_M(V_i) \frac{\partial M(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big] + E\Big[ \frac{\partial \lambda_k(V_i)}{\partial v_t} \rho^k_i \,\Big|\, \xi_{it} \Big] - E\Big[ \lambda_k(V_i) \frac{\partial k(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big] - \lambda_{bt}(\xi_{it}) \Big) v_{it},$$
where, by a convenient abuse of notation, we denote by $\frac{\partial \lambda_M(V)}{\partial v_t} \mathrm{Vec}(\rho^M_i)$ the sum $\sum_{j \le (T-1)^2} \rho^M_{i,j} \frac{\partial \lambda^j_M(V)}{\partial v_t}$, with $\rho^M_{i,j}$ the $j$th component of the vector $\mathrm{Vec}(\rho_M(X_i, Z_i))$, and similarly for $\lambda_k$. We will, in a later step of this proof, simplify the formula for $s_i$.

We conveniently decompose the difference $s_{i,n} - s_i$ as
$$\begin{aligned} s_{i,n} - s_i = {} & \lambda_M(V_i)\, \mathrm{Vec}(e^M_i) - \Lambda_M (I_{(T-1)^2} \otimes \Theta)\, \mathrm{Vec}(e^M_i) \otimes p_i \\ & + \lambda_k(V_i)\, e^k_i - \Lambda_k (I_{T-1} \otimes \Theta)\, e^k_i \otimes p_i \\ & + \sum_{t=1}^T E\Big[ \lambda_M(V_i) \frac{\partial M(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big] v_{it} - \Lambda_M (I_{(T-1)^2} \otimes \Theta) H_{Mt} (I_k \otimes \Theta)\, v_{it} \otimes r_{it} \\ & + \sum_{t=1}^T E\Big[ \lambda_k(V_i) \frac{\partial k(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big] v_{it} - \Lambda_k (I_{T-1} \otimes \Theta) H_{kt} (I_k \otimes \Theta)\, v_{it} \otimes r_{it} \\ & + \sum_{t=1}^T \Lambda_M (I_{(T-1)^2} \otimes \Theta)\, dP_{Mt} (I_k \otimes \Theta)\, v_{it} \otimes r_{it} - E\Big[ \frac{\partial \lambda_M(V_i)}{\partial v_t} \mathrm{Vec}(\rho^M_i) \,\Big|\, \xi_{it} \Big] v_{it} \\ & + \sum_{t=1}^T \Lambda_k (I_{T-1} \otimes \Theta)\, dP_{kt} (I_k \otimes \Theta)\, v_{it} \otimes r_{it} - E\Big[ \frac{\partial \lambda_k(V_i)}{\partial v_t} \rho^k_i \,\Big|\, \xi_{it} \Big] v_{it} \\ & + \sum_{t=1}^T \lambda_{bt}(\xi_{it})\, v_{it} - \Lambda_{bt} (I_k \otimes \Theta)\, v_{it} \otimes r_{it}, \end{aligned}$$
and $E(\|s_{i,n} - s_i\|^2)$ is bounded by the sum of the expected squared norms of the elements of each line, up to a multiplicative constant.
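The best-approximation step invoked repeatedly in these bounds — a mean square projection onto the span of a sieve basis has $L^2$ error no larger than that of any fixed coefficient vector $\iota$ — can be checked numerically. A minimal Python sketch, with a hypothetical target function and polynomial basis that are not the paper's objects:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5000, 4

# Hypothetical target lambda(v) and sieve basis p^K(v) = (1, v, v^2, v^3);
# neither comes from the paper, they only illustrate the projection inequality.
v = rng.uniform(-1, 1, size=n)
lam = np.exp(v)                        # a smooth function to approximate
P = np.vander(v, K, increasing=True)   # columns 1, v, ..., v^{K-1}

# Mean-square projection: coefficients minimizing the sample analogue of
# E(|c' p^K(V) - lambda(V)|^2), i.e. least squares of lam on the basis.
coef, *_ = np.linalg.lstsq(P, lam, rcond=None)
proj_mse = np.mean((P @ coef - lam) ** 2)

# Any other coefficient vector iota (here: a truncated Taylor expansion of exp)
iota = np.array([1.0, 1.0, 0.5, 1.0 / 6.0])
other_mse = np.mean((P @ iota - lam) ** 2)

# The projection is the best approximation in the span of p^K.
assert proj_mse <= other_mse
```

By construction the least-squares fit minimizes the sample mean squared error, so the inequality holds for every choice of `iota`, mirroring the bound $E(\|\tilde\lambda_a(V) - \lambda_a(V)\|^2) \le E(\|\iota' p^K(V) - \lambda_a(V)\|^2)$.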
To show that it converges to 0 as $n$ goes to infinity, we use the fact that Assumption 4.10 holds for each $\lambda_a$, where $\lambda_a$ is a column of either $\lambda_M$ or $\lambda_k$. By Assumption 4.12 (1) and Assumption 4.11 (7), the expected squared norm of the term in the last line also converges to 0 as $n \to \infty$. These arguments imply that $E(\|s_{i,n} - s_i\|^2) \to 0$.

By the proof of Result 4.8 and Assumption 4.12 (1), the functions multiplying the residuals appearing in the definition of $s_i$ are all bounded. Together with Assumption 4.12 (3), this guarantees $E([s_i' c]^2) < \infty$. For a constant vector $c \in \mathbb{R}^{d_x}$,
$$\big| c' [ E(s_{in} s_{in}') - E(s_i s_i') ] c \big| \le E\big([s_{in}' c - s_i' c]^2\big) + 2\, E\big([s_i' c]^2\big)^{1/2} E\big([s_{in}' c - s_i' c]^2\big)^{1/2}.$$
Hence $| c' [E(s_{in} s_{in}') - E(s_i s_i')] c | \to 0$ for all $c$, implying $E(s_{in} s_{in}') - E(s_i s_i') \to 0$. That is, $\Omega_n \to \mathrm{Var}(s_i)$ as $n \to \infty$.

We can now simplify the formula for $s_i$ using the primitives of the model. Indeed, note that
$$\lambda_M(V_i)\, \mathrm{Vec}(e^M_i) + \lambda_k(V_i)\, e^k_i = E(Q^\delta_i \mid V_i) M(V_i)^{-1} M_i u_i,$$
$$\lambda_M(V_i) \frac{\partial M(V_i)}{\partial v_t} + \lambda_k(V_i) \frac{\partial k(V_i)}{\partial v_t} = E(Q^\delta_i \mid V_i) \frac{\partial g(V_i)}{\partial v_t},$$
$$\frac{\partial \lambda_M(V_i)}{\partial v_t} \mathrm{Vec}(\rho^M_i) + \frac{\partial \lambda_k(V_i)}{\partial v_t} \rho^k_i = - E(Q^\delta_i \mid V_i) M(V_i)^{-1} \big( M_i - M(V_i) \big) \frac{\partial g(V_i)}{\partial v_t},$$
and since $\lambda_{bt}(\xi_{it}) = - E\big( Q^\delta_i \frac{\partial g(V_i)}{\partial v_t} \,\big|\, \xi_{it} = \xi_t \big)$, we obtain
$$\begin{aligned} s_i &= [\delta_i \mu_i - E(\mu\delta)] + Q^\delta_i u_i - E(Q^\delta_i \mid V_i) M(V_i)^{-1} M_i u_i + \sum_{t=1}^T E\Big( \big[ Q^\delta_i - E(Q^\delta_i \mid V_i) M(V_i)^{-1} M_i \big] \frac{\partial g(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big) v_{it} \\ &= [\delta_i \mu_i - E(\mu\delta)] + \tilde Q^\delta_i u_i + \sum_{t=1}^T E\Big( \tilde Q^\delta_i \frac{\partial g(V_i)}{\partial v_t} \,\Big|\, \xi_{it} \Big) v_{it}, \end{aligned}$$
with $\tilde Q^\delta_i = Q^\delta_i - E(Q^\delta_i \mid V_i) M(V_i)^{-1} M_i$. Thus $\mathrm{Var}(s_i) = \Omega$ and $\Omega_n \to_{n\to\infty} \Omega$. Since $\Omega \ge C I_{d_x}$, we obtain $\Omega_n^{-1/2} \to_{n\to\infty} \Omega^{-1/2} \le C^{-1/2} I_{d_x}$.

We again use Lemma 4.2 to prove that $\|\Lambda_M\|$, $\|\Lambda_k\|$, and $\|\Lambda_{bt}\|$ are bounded. Indeed, since w.l.o.g. we can assume $\Theta = I_L$, we have $\|\Lambda_{bt}\|^2 = \mathrm{tr}(\Lambda_{bt} \Lambda_{bt}') = \mathrm{tr}(\Lambda_{bt} (I_k \otimes \Theta\Theta') \Lambda_{bt}')$. Using the notation of Lemma 4.2 with $\tilde\lambda^j_{bt}(\xi) = E\big(\lambda^j_{bt}(\xi_{it})\, r^L(\xi_{it})'\big) r^L(\xi)$, this gives $\|\Lambda_{bt}\|^2 = \mathrm{tr}\big( \sum_{j \le T-1} E[ \tilde\lambda^j_{bt}(\xi_{it}) \tilde\lambda^j_{bt}(\xi_{it})' ] \big)$. However, by Lemma 4.2, we know that $E(\|\tilde\lambda^j_{bt}(\xi_{it}) - \lambda^j_{bt}(\xi_{it})\|^2) \to 0$ under Assumption 4.12. The same reasoning we used for $E(s_{in} s_{in}') - E(s_i s_i')$ applies, and since $\lambda^j_{bt}(\cdot)$ is a bounded function, we obtain $E\big( \tilde\lambda^j_{bt}(\xi_{it}) \tilde\lambda^j_{bt}(\xi_{it})' \big) \to_{n\to\infty} E\big( \lambda^j_{bt}(\xi_{it}) \lambda^j_{bt}(\xi_{it})' \big)$. Therefore
$$\|\Lambda_{bt}\|^2 \to_{n\to\infty} \mathrm{tr}\Big( \sum_{j \le T-1} E\big( \lambda^j_{bt}(\xi_{it}) \lambda^j_{bt}(\xi_{it})' \big) \Big) \le C.$$
Hence $\|\Lambda_{bt}\|$ is bounded. The same arguments applied to the functions $\lambda_M$, $\lambda_k$, as well as to $E\big[ \frac{\partial \lambda_M(V_i)}{\partial v_t} \mathrm{Vec}(\rho^M_i) \mid \xi_{it} \big]$, $E\big[ \lambda_M(V_i) \frac{\partial M(V_i)}{\partial v_t} \mid \xi_{it} \big]$, $E\big[ \frac{\partial \lambda_k(V_i)}{\partial v_t} \rho^k_i \mid \xi_{it} \big]$ and $E\big[ \lambda_k(V_i) \frac{\partial k(V_i)}{\partial v_t} \mid \xi_{it} \big]$, imply that $\|\Lambda_M\|$, $\|\Lambda_k\|$, $\|\Lambda_M (I_{(T-1)^2} \otimes \Theta)(H_{Mt} - dP_{Mt})\|$, and $\|\Lambda_k (I_{T-1} \otimes \Theta)(H_{kt} - dP_{kt})\|$ are bounded.

Proof of Result 4.10.
We start by showing that the stochastic equicontinuity condition, Condition (1), holds. Lemma 1 of CLVK shows that if $(W_i)_{i=1}^n$ is i.i.d., Assumption 4.8 (1) holds if: (A) the class $\mathcal F = \{\chi(W, G) : G \in \mathcal H_{\varrho c, c_0}\}$ is $P$-Donsker, i.e., it satisfies $\int_0^\infty \sqrt{\log N_{[\,]}(\epsilon, \mathcal F, \|\cdot\|_{L^2(P)})}\, d\epsilon < \infty$, where $N_{[\,]}(\epsilon, \mathcal F, \|\cdot\|_{L^2(P)})$ is the covering number with bracketing; and (B) $\chi(\cdot, G)$ is $L^2(P)$-continuous at $G_0$, that is, $E(\|\chi(W_i, G) - \chi(W_i, G_0)\|^2) \to 0$ as $\|G - G_0\|_{\mathcal H} \to 0$. We now check that each of these conditions is satisfied under our assumptions.

Condition (A): We use $j, l$ to index components of vectors. As in CLVK, it is enough to prove that $\mathcal F_l = \{\chi_l(W, G) : G \in \mathcal H_{\varrho c, c_0}\}$ is $P$-Donsker for each component $l$ of $\chi(\cdot)$. Recall that $\chi(W_i, G) = Q^\delta_i\, M\big( \tau[ (x_{it} - b_t(\xi_{it}))_{t \le T} ] \big)^{-1} k\big( \tau[ (x_{it} - b_t(\xi_{it}))_{t \le T} ] \big)$. We examine $\chi(W_i, G) - \chi(W_i, G_0)$ and write, by an abuse of notation and only in this proof, $V_i = (x_{it} - b_t(\xi_{it}))_{t \le T}$ and $V_{0,i} = (x_{it} - b_{0,t}(\xi_{it}))_{t \le T}$. Note that $V_0 = \tau(V_0)$. We decompose
$$\begin{aligned} \chi(W_i, G) - \chi(W_i, G_0) = {} & Q^\delta_i M(\tau[V])^{-1} \big[ k(\tau[V]) - k_0(\tau[V]) \big] \\ & + Q^\delta_i M(\tau[V])^{-1} \big[ M_0(\tau[V]) - M(\tau[V]) \big] M_0(\tau[V])^{-1} k_0(V_0) \qquad (46) \\ & + Q^\delta_i M_0(\tau[V])^{-1} \big[ M_0(V_0) - M_0(\tau[V]) \big] M_0(V_0)^{-1} k_0(V_0) \\ & + Q^\delta_i M(\tau[V])^{-1} \big[ k_0(\tau[V]) - k_0(V_0) \big]. \end{aligned}$$
Since $(G, G_0) \in \mathcal H_{\varrho c,c_0} \times \mathcal H_{\varrho c,c_0}$, the norms of each functional and of its first order derivatives are bounded. Moreover, the derivatives of $\tau$ are bounded. This implies that $\|M_0(V_0) - M_0(\tau[V])\| \le c\, \|V_0 - V\|$, and the same result holds for $k_0$. Hence, using (46),
$$|\chi_l(W_i, G) - \chi_l(W_i, G_0)| \le \|\chi(W_i, G) - \chi(W_i, G_0)\| \le C \|Q^\delta_i\| \Big( \sum_{j=1}^{(T-1)^2} |M_j - M_{0,j}|_\varsigma + \sum_{j=1}^{T-1} |k_j - k_{0,j}|_\varsigma + \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big),$$
where the constant $C$ depends on $c$ and $c_0$. By Assumption 4.13, $E(\|Q^\delta_i\|^2) < \infty$, which implies by the proof of Theorem 3 of CLVK that
$$N_{[\,]}(\epsilon, \mathcal F, \|\cdot\|_{L^2(P)}) \le N\big(\epsilon/c_Q, \mathcal C^\varrho_c(S_{\varsigma V}), \|\cdot\|_\infty\big)^{(T-1)^2 + T - 1}\, \prod_{t \le T} N\big(\epsilon/c_Q, \mathcal C^\varrho_c(S_{\xi_t}), \|\cdot\|_\infty\big)^{d},$$
where $N(\epsilon, \mathcal C^\varrho_c(S_W), \|\cdot\|_\infty)$ denotes the covering number of the class $\mathcal C^\varrho_c(S_W)$, and $c_Q = [(T-1)^2 + T - 1 + Td]\, E(\|Q^\delta_i\|^2)^{1/2}$ is the size of the brackets constructed in CLVK. It is known that for $S_W$ a bounded subset of $\mathbb R^k$, $\log N(\epsilon, \mathcal C^\varrho_c(S_W), \|\cdot\|_\infty) \le C \epsilon^{-k/\varrho}$. By Assumption 4.13, $\varrho > \max(Td, d_z + d)/2$, which implies that $\mathcal F_l$ is $P$-Donsker. Therefore, Condition (A) is satisfied.

Condition (B): By $E(\|Q^\delta_i\|^2) \le C$ and using once more the decomposition given by (46), $E(\|\chi(W_i, G) - \chi(W_i, G_0)\|^2) \le C \|G - G_0\|^2_{\mathcal H}$, which gives the wanted result.

We now show that Assumption 4.8 (2) holds. This condition is on the remainder of the linearization, $\|X(G) - X(G_0) - X^{(G)}(G_0)[G - G_0]\|$. Note that
$$\begin{aligned} X(G) - X(G_0) - X^{(G)}(G_0)[G - G_0] = {} & E\big( Q^\delta_i M(\tau[V])^{-1} [k(\tau[V]) - k_0(\tau[V])] \big) - E\big( Q^\delta_i M_0(V_0)^{-1} [k(V_0) - k_0(V_0)] \big) \\ & + E\big( Q^\delta_i M(\tau[V])^{-1} [k_0(\tau[V]) - k_0(V_0)] \big) - E\Big( Q^\delta_i M_0(V_0)^{-1} \frac{\partial k_0}{\partial V}(V_0)[V - V_0] \Big) \\ & + E\big( Q^\delta_i M(\tau[V])^{-1} [M_0(\tau[V]) - M(\tau[V])] M_0(\tau[V])^{-1} k_0(V_0) \big) - E\big( Q^\delta_i M_0(V_0)^{-1} [M_0(V_0) - M(V_0)] M_0(V_0)^{-1} k_0(V_0) \big) \\ & + E\big( Q^\delta_i M_0(\tau[V])^{-1} [M_0(V_0) - M_0(\tau[V])] M_0(V_0)^{-1} k_0(V_0) \big) - E\Big( \big[ k_0(V_0)' M_0(V_0)^{-1} \otimes Q^\delta_i{}' M_0(V_0)^{-1} \big]\, \mathrm{Vec}\Big( \frac{\partial M_0}{\partial V}(V_0) \Big) [V - V_0] \Big). \end{aligned}$$
We use this decomposition and bound each line separately. We show here how to find upper bounds for the first and second lines, both of which will be less than $\|G - G_0\|^2_{\mathcal H}$ up to a multiplicative constant. The upper bounds for the third and fourth lines of this decomposition can be obtained in a similar fashion. By the triangle inequality, this will give $\|X(G) - X(G_0) - X^{(G)}(G_0)[G - G_0]\| \le C \|G - G_0\|^2_{\mathcal H}$, as desired.
First,
$$\begin{aligned} \Big| E\big( Q^\delta_i & M(\tau[V])^{-1} [k(\tau[V]) - k_0(\tau[V])] \big) - E\big( Q^\delta_i M_0(V_0)^{-1} [k(V_0) - k_0(V_0)] \big) \Big| \\ \le {} & \Big| E\big( Q^\delta_i M(\tau[V])^{-1} [M_0(\tau[V]) - M(\tau[V])] M_0(V_0)^{-1} [k(\tau[V]) - k_0(\tau[V])] \big) \Big| \\ & + \Big| E\big( Q^\delta_i M(\tau[V])^{-1} [M_0(V_0) - M_0(\tau[V])] M_0(V_0)^{-1} [k(\tau[V]) - k_0(\tau[V])] \big) \Big| \\ & + \Big| E\big( Q^\delta_i M_0(V_0)^{-1} [(k - k_0)(\tau[V]) - (k - k_0)(V_0)] \big) \Big| \\ \le {} & C\, E(\|Q^\delta_i\|) \Bigg( \Big( \sum_{j=1}^{(T-1)^2} |M_j - M_{0,j}|_\varsigma \Big) \Big( \sum_{j=1}^{T-1} |k_j - k_{0,j}|_\varsigma \Big) + \Big( \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big) \Big( \sum_{j=1}^{T-1} |k_j - k_{0,j}|_\varsigma \Big) \\ & \qquad + \Big( \sum_{j=1}^{T-1} |k_j - k_{0,j}|_\varsigma \Big) \Big( \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big) \Bigg) \le C \|G - G_0\|^2_{\mathcal H}. \end{aligned}$$
As for the second line of the decomposition of $X(G) - X(G_0) - X^{(G)}(G_0)[G - G_0]$, we write
$$\begin{aligned} \Big| E\big( Q^\delta_i & M(\tau[V])^{-1} [k_0(\tau[V]) - k_0(V_0)] \big) - E\Big( Q^\delta_i M_0(V_0)^{-1} \frac{\partial k_0}{\partial V}(V_0)[V - V_0] \Big) \Big| \\ \le {} & \Big| E\big( Q^\delta_i M(\tau[V])^{-1} [M_0(V_0) - M_0(\tau[V])] M_0(V_0)^{-1} [k_0(\tau[V]) - k_0(V_0)] \big) \Big| \\ & + \Big| E\big( Q^\delta_i M(\tau[V])^{-1} [M_0(\tau[V]) - M(\tau[V])] M_0(V_0)^{-1} [k_0(\tau[V]) - k_0(V_0)] \big) \Big| \\ & + \Big| E\Big( Q^\delta_i M_0(V_0)^{-1} \Big[ k_0(\tau[V]) - k_0(V_0) - \frac{\partial k_0}{\partial V}(V_0)[V - V_0] \Big] \Big) \Big| \\ \le {} & C\, E(\|Q^\delta_i\|) \Bigg( \Big( \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big)^2 + \Big( \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big) \Big( \sum_{j=1}^{(T-1)^2} |M_j - M_{0,j}|_\varsigma \Big) \\ & \qquad + \Big( \sum_{t \le T} \sum_{j=1}^{d} \|b_{t,j} - b_{0,t,j}\|_\infty \Big)^2 \Bigg) \le C \|G - G_0\|^2_{\mathcal H}, \end{aligned}$$
where the inequality for the third term in this equation holds by Assumption 4.11 (1), by the Jacobian of $\tau$ being the identity matrix when evaluated at $V_0$ (since $V_0 \in S_V$), and by the second order derivative of $\tau$ being bounded.

Proof of Result 4.12.
Define $\Phi = P(\det(X_i' X_i) > \delta)$ and $\varphi_n = n^{-1} \sum_{i=1}^n \delta_i$. The estimator of the average effect $E(\mu \mid \delta)$ is $\hat\mu = \hat\mu_\delta / \varphi_n$. We define $\Sigma = \mathrm{Var}((s_i', \delta_i)')$. Since $\Omega > 0$, $\Sigma > 0$. We decompose
$$\sqrt{n}\, \varphi_n [\hat\mu - E(\mu \mid \delta)] = \sqrt{n} [\hat\mu_\delta - E(\mu\delta)] + \sqrt{n}\, \frac{E(\mu\delta)}{\Phi} [\Phi - \varphi_n] = \sqrt{n}\, \big( I_{d_x}\,;\, -E(\mu \mid \delta) \big) \Big[ \begin{pmatrix} \hat\mu_\delta \\ \varphi_n \end{pmatrix} - \begin{pmatrix} E(\mu\delta) \\ \Phi \end{pmatrix} \Big], \qquad (47)$$
where $( I_{d_x}\,;\, -E(\mu \mid \delta) )$ is of size $d_x \times (d_x + 1)$.

We first show
$$\sqrt{n}\, \Sigma^{-1/2} \Big[ \begin{pmatrix} \hat\mu_\delta \\ \varphi_n \end{pmatrix} - \begin{pmatrix} E(\mu\delta) \\ \Phi \end{pmatrix} \Big] \to_d N(0, I_{d_x + 1}). \qquad (48)$$
By Result 4.11, $\sqrt{n} \big[ (\hat\mu_\delta', \varphi_n)' - (E(\mu\delta)', \Phi)' \big] = \frac{1}{\sqrt{n}} \sum_{i=1}^n (s_{i,n}', \delta_i - \Phi)' + o_P(1)$.

Define $\Sigma_n = \mathrm{Var}((s_{i,n}', \delta_i - \Phi)')$. We obtain the asymptotic distribution in two steps. We first prove that $\sqrt{n}\, \Sigma_n^{-1/2} \big[ (\hat\mu_\delta', \varphi_n)' - (E(\mu\delta)', \Phi)' \big] \to_d N(0, I_{d_x+1})$. We show in a second step that $\Sigma_n \to \Sigma$, which will yield the desired result. We follow Newey, Powell, and Vella (1999) in proving a Lindeberg condition for $c' \Sigma_n^{-1/2} (s_{i,n}', \delta_i - \Phi)'$ for any constant vector $c \in \mathbb{R}^{d_x + 1}$ such that $\|c\| = 1$. More precisely, if for all such $c$, $\frac{1}{\sqrt{n}}\, c' \Sigma_n^{-1/2} \sum_{i=1}^n (s_{i,n}', \delta_i - \Phi)' \to_d N(0, 1)$, this first result will be a consequence of the Cramér-Wold theorem. Write $S_{i,n} = c' \Sigma_n^{-1/2} (s_{i,n}', \delta_i - \Phi)'$; then $E(S_{i,n}) = 0$ and $E(S_{i,n}^2) = 1$, and it remains to verify the Lindeberg condition, i.e., for any $\epsilon > 0$, $E\big( S_{i,n}^2\, 1(|S_{i,n}| > \epsilon \sqrt{n}) \big) \to 0$. Note that since $\rho_M$ and $\rho_k$ are bounded under Assumption 4.11 and $E(\|u_i\|^4 \mid X_i = X, Z_i = Z) \le C$ for all $(X, Z)$, we have $E[\|e^k_i\|^4 \mid V_i = V] \le C$ and $E[\|\mathrm{Vec}(e^M_i)\|^4 \mid V_i = V] \le C$.

Fix $\epsilon > 0$. We normalize $\Theta = I_K$ and $\Theta = I_L$ for the two bases, and obtain
$$\begin{aligned} n \epsilon^2\, E\big( S_{i,n}^2\, 1(|S_{i,n}| > \epsilon \sqrt{n}) \big) \le E(S_{i,n}^4) \le C \Big( & E[\|\delta_i \mu_i - E(\mu\delta)\|^4] + E[\|Q^\delta_i u_i\|^4] + \|\Lambda_M\|^4\, E[\|\mathrm{Vec}(e^M_i) \otimes p_i\|^4] + \|\Lambda_k\|^4\, E[\|e^k_i \otimes p_i\|^4] \\ & + \sum_{t \le T} \big\| \Lambda_M (H_{Mt} - dP_{Mt}) + \Lambda_k (H_{kt} - dP_{kt}) + \Lambda_{bt} \big\|^4\, E[\|v_{it} \otimes r_{it}\|^4] + E[\|\delta_i - \Phi\|^4] \Big). \end{aligned}$$
We can bound $E(\|v_{it} \otimes r_{it}\|^4) = E(\|r_{it}\|^4 \|v_{it}\|^4) \le C\, E(\|r_{it}\|^4)$ by Assumption 4.14, and $E(\|r_{it}\|^4) \le a(L)^2\, \mathrm{tr}(E(r_{it} r_{it}')) = a(L)^2 L$. Similarly, by Assumption 4.14, $E(\|e^k_i \otimes p_i\|^4) = O(b(K)^2 K)$ and $E(\|\mathrm{Vec}(e^M_i) \otimes p_i\|^4) = O(b(K)^2 K)$. Therefore, by Result 4.9,
$$n \epsilon^2\, E\big( S_{i,n}^2\, 1(|S_{i,n}| > \epsilon \sqrt{n}) \big) = O\big( b(K)^2 K + a(L)^2 L \big).$$
Assumption 4.11 (6) implies that both $\Delta_Q$ terms are $o(1)$, in turn implying $\sqrt{K/n}\, b(K) \to 0$ and $\sqrt{L/n}\, a(L) \to 0$. Therefore the Lindeberg condition $E( S_{i,n}^2\, 1(|S_{i,n}| > \epsilon \sqrt{n}) ) \to 0$ holds. It remains to show that $\Sigma_n \to \Sigma$; this is a consequence of the proof of Result 4.9. Now we can use (47) with (48) to obtain, by a delta method argument,
$$\sqrt{n}\, \varphi_n [\hat\mu - E(\mu \mid \delta)] \to_d N\big( 0,\; (I_{d_x}\,;\, -E(\mu \mid \delta))\, \Sigma\, (I_{d_x}\,;\, -E(\mu \mid \delta))' \big),$$
hence
$$\sqrt{n}\, [\hat\mu - E(\mu \mid \delta)] \to_d N\big( 0,\; \Phi^{-2}\, (I_{d_x}\,;\, -E(\mu \mid \delta))\, \Sigma\, (I_{d_x}\,;\, -E(\mu \mid \delta))' \big).$$
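The ratio step behind (47) can be illustrated in a scalar Monte Carlo. The data-generating choices below are hypothetical (they are not the paper's model); the sketch only demonstrates that the decomposition of $\sqrt{n}\,\varphi_n[\hat\mu - E(\mu \mid \delta)]$ is an exact algebraic identity, before any asymptotics:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical scalar version: mu_hat = mu_hat_delta / phi_n estimates
# E(mu | delta = 1) = E(mu * delta) / Phi.
delta = (rng.uniform(size=n) < 0.8).astype(float)   # trimming indicator, Phi = 0.8
mu_i = 1.0 + rng.normal(size=n) + 0.5 * delta       # "random coefficient" draws

Phi = 0.8
mu_delta_hat = np.mean(mu_i * delta)
phi_n = np.mean(delta)
mu_hat = mu_delta_hat / phi_n

E_mu_delta = 1.5 * Phi          # E(mu * delta) under the design above
mu0 = E_mu_delta / Phi          # E(mu | delta = 1) = 1.5

# Exact decomposition behind (47):
#   sqrt(n) * phi_n * (mu_hat - mu0)
#     = sqrt(n) * [(mu_delta_hat - E(mu delta)) - mu0 * (phi_n - Phi)]
lhs = np.sqrt(n) * phi_n * (mu_hat - mu0)
rhs = np.sqrt(n) * ((mu_delta_hat - E_mu_delta) - mu0 * (phi_n - Phi))
assert abs(lhs - rhs) < 1e-8
```

Since the identity holds sample by sample, the limit distribution of $\sqrt{n}[\hat\mu - E(\mu\mid\delta)]$ follows from the joint normality of $(\hat\mu_\delta, \varphi_n)$ and $\varphi_n \to_p \Phi$, exactly as in the delta method argument above.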