A Control Function Approach to Estimate Panel Data Binary Response Model
Amaresh K Tiwari †

Abstract
We propose a new control function (CF) method to estimate a binary response model in a triangular system with multiple unobserved heterogeneities. The CFs are the expected values of the heterogeneity terms in the reduced form equations conditional on the histories of the endogenous and the exogenous variables. The method requires weaker restrictions compared to CF methods with similar imposed structures. If the support of endogenous regressors is large, average partial effects are point-identified even when instruments are discrete. Bounds are provided when the support assumption is violated. An application and Monte Carlo experiments compare several alternative methods with ours.
Keywords:
Triangular System, Unobserved Heterogeneities, Point & Partial Identification, Average Partial Effects, Child Labor
JEL Classifications:
C13, C18, C33, J13

* I would like to thank anonymous referees for helpful comments. Thanks are due to seminar participants at the Bank of Estonia, the 10th Nordic Econometric Meeting (Stockholm), the Institute of Mathematics and Statistics (University of Tartu), and the Inaugural Baltic Economic Conference (Vilnius) for the same. I would especially like to thank Soham Sahoo for helping me with the data. All remaining errors are mine.

† University of Tartu, School of Economics and Business Administration, Narva maantee 18, 51009 Tartu, Estonia. Phone: +3727376374, Email: [email protected] & [email protected]
Introduction
Chamberlain (2010) and Arellano and Bonhomme (2011) point out that when panel data outcomes are discrete, serious identification issues arise when covariates are correlated with unobserved heterogeneity. Chamberlain shows that for a binary choice model with fixed $T$, quantities of interest such as the Average Partial Effect (APE) may not be point identified, or may not possess a $\sqrt{N}$-consistent estimator. Notwithstanding this underidentification result, various methods have been proposed to estimate the structural measures of interest.

Arellano and Bonhomme (2011) provide an overview of, and categorize, some of the methods developed to estimate the quantities of interest. These include the fixed effect (FE) approach, which treats heterogeneity or individual effects as parameters to be estimated, and for which several approaches have been proposed to correct for the bias due to the incidental parameter problem. Wooldridge (2019) points out that the FE approach, though promising, suffers from a number of shortcomings. First, the number of time periods needed for the bias adjustments to work well is often greater than is available in many applications. Secondly, the recent bias adjustment methods require the assumptions of stationarity and weak dependence; in some cases, the strong assumption of serial independence (conditional on the heterogeneity) is maintained. However, in empirical work dealing with linear models, it has been found that idiosyncratic errors exhibit serial dependence.
Also, “the requirement of stationarity is strong and has substantive restrictions as it rules out staples in empirical work such as including separate year effects, which can be estimated very precisely given a large cross section.”

There is another class of models that acknowledges the fact that many nonlinear panel data models are not point identified at fixed $T$ and consequently discusses set identification (bound analysis) for certain quantities of interest such as the marginal effects. These papers show that the bounds become tighter as the number of time periods, $T$, increases. However, the methods in most of these papers are still limited to discrete covariates. Moreover, these papers and papers using the FE approach assume that conditional on unobserved heterogeneity all covariates are exogenous or predetermined; this, as argued in Hoderlein and White (2012) (henceforth HW), may not always hold true.

In this paper, we relax the assumption of conditional exogeneity to allow for endogenous covariates that are continuous, and develop a control function method to identify and estimate structural measures such as the Average Structural Function (ASF) and the APE while accounting for endogeneity and heterogeneity in a triangular system.

Some of the papers that have developed control function methods to study binary or fractional response outcomes are Rivers and Vuong (1988), Blundell and Powell (2004) (BP), Papke and Wooldridge (2008) (PW), Rothe (2009) and Semykina and Wooldridge (2018) (SW). A partial list of papers that have studied nonparametric control function estimation of nonseparable models, including binary response models, includes Altonji and Matzkin (2005) (AM), Florens et al.
(2008), Imbens and Newey (2009) (IN), and HW, where the focus is on estimating the heterogeneous effect of an endogenous treatment.

To our knowledge, the papers that allow certain unobserved heterogeneities to be correlated with the exogenous variables while developing a control function method are: PW, Fernández-Val and Vella (2011) (FV), HW, Kim and Petrin (2017), and SW. While PW and SW use the framework of correlated random effects to account for correlation between individual specific effects and the exogenous variables, FV consider fixed effect estimation in both stages, where the control variable is based on estimates of the fixed effects in the reduced form equation. FV's method, though more general, requires large $T$ to correct the bias due to the incidental parameters problem. HW develop a generalized version of differencing, which differences out the fixed effects, to identify local average responses in nonseparable and binary response models. However, identification in HW requires that individuals do not experience a change in covariates over time; this requirement that the support of all regressors overlap over time could be hard to satisfy (e.g., it rules out time trends and time dummies). Kim and Petrin exploit restrictions in the conditional moment of unobserved heterogeneity given instruments to develop a “generalized control function.” We allow for the correlation between the unobserved individual effects and the instruments in a manner similar to PW's and SW's.

Typically, in a simultaneous triangular system, unobserved heterogeneity in the reduced form equations is assumed to be scalar, where the identifying assumption is that conditional on these scalar time-varying heterogeneity terms (errors) or their CDF, which are identified, all covariates are independent of the heterogeneity in the structural equation. However, we know that economic models suggest heterogeneity in tastes, technologies, abilities, etc. that are unobserved.
Also, some of this unobserved heterogeneity may well be multidimensional. Kasy (2011) shows that for the existing control function methods, identification fails when the reduced form equations have multiple unobserved heterogeneities.

The exceptions to our knowledge are PW, FV, HW and SW, who consider panel data where the multiple heterogeneities consist of time invariant random effects and idiosyncratic errors. While the imposed structures in PW and SW are similar to ours, they make the traditional control function assumption, and so their control function is scalar, whereas our control function is vector valued, with a dimension that depends on the dimension of unobserved heterogeneity and the number of endogenous variables. HW's specification of the triangular system does not nest ours, and FV's fixed effects method requires long panels.

We propose that the expected values of the heterogeneity terms of the reduced form equations conditional on the history of the endogenous variables, $\mathbf{X}_i \equiv (\mathbf{x}_{i1}, \dots, \mathbf{x}_{iT})$, and on that of the exogenous variables, $\mathbf{Z}_i \equiv (\mathbf{z}_{i1}, \dots, \mathbf{z}_{iT})$, be used as control functions. The proposed control functions are identified when the distributions of the heterogeneity terms are specified. We argue that (1) for triangular systems with set-ups similar to ours, these control functions imply a weaker restriction than the commonly made control function assumptions, and (2) the traditionally used control functions may not provide consistent estimates in a panel data setting such as ours.

Our method, while being simple, makes a number of contributions to the literature. First, we allow for multiple heterogeneities, albeit with restrictions, in the triangular system, where most papers adopting the control function approach to handle endogeneity do not.
Secondly, when the support of the endogenous variables is large, the ASF and the APEs are point-identified even when the instruments have a small support. We exploit panel data with repeated observations of the same unit for the purpose of point-identification when the support requirement is met. Sharp bounds on the ASF and the APEs are provided when the support assumption is not satisfied. Thirdly, the method accounts for multiple endogenous variables, all of which are determined simultaneously, whereas most papers on control functions consider a single endogenous variable. Finally, our model retains the attractive features of PW's, where no assumptions are made on the serial dependence among the outcome variable.

Using data from India, we estimate the causal effects of household income and wealth on the incidence of child labor. We find a strong effect of correcting for endogeneity, and show that the standard parametric models give a misleading picture of the causal effect of income and wealth on child labor.

The rest of the paper is organized as follows. In section 2 we introduce the model and discuss identification and estimation of structural measures of interest for a discrete response model in a triangular system with random effects. In section 3 we discuss the results of the Monte Carlo experiments, where we compare our estimator with some of the existing methods for panel data binary response models with imposed structures similar to ours. Section 4 contains the application of the proposed estimator to study income and wealth effects on the incidence of child labor. Finally, in section 5 we conclude.
The following have been put in the appendix: proofs of the lemmas, propositions, and theorems (Appendix A), generalized estimating equation (GEE) estimation of the probit conditional mean function (Appendix B), an extension of the random effect model in the main text to allow for random coefficients (Appendix C), large sample properties of the estimator (Appendix D), and other technical details (Appendix E).

Consider the following binary choice model in a triangular set-up:

$$ y_{it} = 1\{ y^*_{it} = (\mathbf{w}'_{it}, \mathbf{x}'_{it})\boldsymbol{\varphi} + \theta_i + \zeta_{it} > 0 \}, \tag{2.1} $$

where $1\{\cdot\}$ is an indicator function that takes value 1 if its argument holds true and 0 otherwise. In (2.1), $\theta_i$ is the unobserved time invariant individual effect and $\zeta_{it}$ is the idiosyncratic error component. The variables, $\mathbf{x}_{it}$, are endogenous in the sense that $\zeta_{it} \not\perp\!\!\!\perp \mathbf{x}_{it} \mid \theta_i$, whereas most papers studying panel data binary choice models assume that $\zeta_{it} \perp\!\!\!\perp \mathbf{x}_{it} \mid \theta_i$. We assume that each of the endogenous variables is continuous and has a large support. The dimension of $\mathbf{x}_{it}$ is $d_x$ and the dimension of the exogenous variables, $\mathbf{w}_{it}$, is $d_w$.

The reduced form in the triangular system, which is estimated in the first stage, is a system of $d_x$ linear equations,

$$ \mathbf{x}_{it} = \pi \mathbf{z}_{it} + \boldsymbol{\alpha}_i + \boldsymbol{\epsilon}_{it}. \tag{2.2} $$

In (2.2), $\pi$ has a row dimension of $d_x$, $\boldsymbol{\alpha}_i \equiv (\alpha_{i1}, \dots, \alpha_{id_x})'$ is the $(d_x \times 1)$ vector of unobserved time invariant individual effects, $\boldsymbol{\epsilon}_{it} \equiv (\epsilon_{it1}, \dots, \epsilon_{itd_x})'$ is the $(d_x \times 1)$ vector of idiosyncratic error terms, and $\mathbf{z}_{it} \equiv (\mathbf{w}'_{it}, \tilde{\mathbf{z}}'_{it})'$ is of dimension $d_z$. The dimension of the vector of instruments, $\tilde{\mathbf{z}}_{it}$, is greater than or equal to the dimension of $\mathbf{x}_{it}$.
Such exclusion restrictions, where $\tilde{\mathbf{z}}_{it}$ appears in the reduced form but not in the structural equation, are justified on economic grounds.

Since the exogenous variables, $\mathbf{w}_{it}$, have no bearing on the identification results obtained in the paper, to ease notation we suppress $\mathbf{w}_{it}$ in the binary response model in the rest of the paper. All assumptions and results are to be understood as conditional on $\mathbf{w}_{it}$. Secondly, in the rest of the paper, except when needed, we will drop the individual subscript, $i$.

While we refer to (2.2) as the reduced form equation, it is possible that the triangular system in (2.1) and (2.2) is in fact fully simultaneous (see Blundell and Powell, 2004, for examples). However, even if a simultaneous system is not triangular, a triangular representation such as the above can be easily derived if the simultaneous equations involving $y^*_t$ and $\mathbf{x}_t$ are linear and the errors are additively separable. Also, the triangular model can be generalized to allow for random coefficients instead of fixed coefficients. For the sake of exposition, we limit the analysis to fixed coefficients with random effects; a straightforward extension of the method to allow for random coefficients is discussed in Appendix C.

We first define some notation. Let $\mathbf{X} \equiv (\mathbf{x}_1, \dots, \mathbf{x}_T)$, a $(d_x \times T)$ matrix, denote the history of the endogenous variables, $\mathbf{x}$; let $\mathbf{Z} \equiv (\mathbf{z}_1, \dots, \mathbf{z}_T)$, of dimension $(d_z \times T)$, denote the history of the exogenous variables, $\mathbf{z}$; similarly, $\boldsymbol{\zeta} \equiv (\zeta_1, \dots, \zeta_T)'$ is a vector containing the realizations of idiosyncratic shocks in the structural equation, and $\boldsymbol{\epsilon} \equiv (\boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_T)$ is a $(d_x \times T)$ matrix containing the realizations of idiosyncratic shocks in the reduced form equations.

The first assumptions toward identifying the structural measures of interest such as the ASF and the APE are:

AS 1 $\boldsymbol{\zeta}, \boldsymbol{\epsilon} \perp\!\!\!\perp \mathbf{Z}, \theta, \boldsymbol{\alpha}$.
AS 2 (a) $\theta, \boldsymbol{\zeta} \mid \mathbf{X}, \mathbf{Z}, \boldsymbol{\alpha} \sim \theta, \boldsymbol{\zeta} \mid \boldsymbol{\epsilon}, \mathbf{Z}, \boldsymbol{\alpha} \sim \theta, \boldsymbol{\zeta} \mid \boldsymbol{\epsilon}, \boldsymbol{\alpha}$, where $\boldsymbol{\epsilon} = \mathbf{X} - E(\mathbf{X} \mid \mathbf{Z}, \boldsymbol{\alpha})$.
(b) $\theta, \zeta_t \perp\!\!\!\perp \boldsymbol{\epsilon}_{-t} \mid \boldsymbol{\alpha}, \boldsymbol{\epsilon}_t$.

Assumptions AS 1 and AS 2, which serve to account for unobserved confounders such as the unobserved heterogeneities that are fixed at least in short panels and to eliminate the confounding influences of observed and unobserved confounders, are weaker than the identifying assumptions in the traditional control function method. In the traditional control function method, (a) $\mathbf{Z}$ is assumed independent of all heterogeneity terms, $\theta, \zeta_t, \boldsymbol{\alpha}, \boldsymbol{\epsilon}$, and (b) it is assumed that $\theta + \zeta_t \perp\!\!\!\perp \mathbf{X} \mid \boldsymbol{\alpha} + \boldsymbol{\epsilon}_t = \boldsymbol{\upsilon}_t$; such an assumption also implies that heterogeneity in each of the $d_x$ reduced form equations is scalar, whereas one would like to allow for additional heterogeneities such as individual effects and/or random coefficients. We allow $\mathbf{Z}$ to be correlated with $\theta$ and $\boldsymbol{\alpha}$, and in part (a) of AS 2 assume that conditional on the history of reduced form error terms, $\boldsymbol{\epsilon}$ and $\boldsymbol{\alpha}$, $\mathbf{Z}$, and thereby $\mathbf{X}$, is independent of the structural error terms $\theta$ and $\zeta_t$. The assumption in AS 2 (b), where only contemporaneous errors are correlated, has been made to ease exposition, and can be dropped.

To see why assumption AS 2 (a) is justified or realistic in empirical settings, consider the empirical example in BP, in which they estimate the causal effect of “other” household income on the work participation decision of men without college education.
The triangular model in BP augmented with individual effects is given as:

$$ y_t = 1\{ y^*_t = x_t \varphi + z_{1t} \varphi_z + \theta + \zeta_t > 0 \} \tag{2.3} $$

$$ x_t = z_{1t} \pi_1 + z_{2t} \pi_2 + z_{3t} \pi_3 + \alpha + \epsilon_t, \tag{2.4} $$

where $y^*_t$ is the number of hours worked in a week by the man in the house; $x_t$, which is weekly “other” household income and which includes the income of the spouse, is endogenous; $z_{1t}$ is a set of strictly exogenous variables that includes various observable social demographic variables; $z_{2t}$ is a set of strictly exogenous variables that includes household characteristics, for example, the education level of the spouse; and the instrument, $z_{3t}$, is the weekly welfare benefit entitlement variable, which is excluded from the structural equation (2.3). This entitlement variable measures the transfer income the family would receive if neither spouse was working.

The above triangular representation can be obtained by augmenting with individual effects the simultaneous equation model considered in Blundell and Smith (1994):

$$ y^*_t = x_t \varphi + z_{1t} \varphi_z + \theta + \zeta_t \tag{2.5} $$

$$ x_t = y^*_t \beta_y + z_{2t} \beta_{z2} + z_{3t} \beta_{z3} + \gamma + \xi_t. \tag{2.6} $$

Since the structural equation (2.5) is derived using a wage equation (see also BP), the individual effect, $\theta = f(\mu, \omega)$, could be a composite of unobserved “taste” for work, $\mu$, and unobserved ability/productivity, $\omega$; whereas $\gamma$ in equation (2.6) could represent the household's or spouse's unobserved productivity (see Blundell et al., 2007). Substituting $x_t \varphi + z_{1t} \varphi_z + \theta + \zeta_t$ for $y^*_t$ in equation (2.6), we get the reduced form in equation (2.4), where

$$ \pi_1 = \frac{\varphi_z \beta_y}{1 - \varphi \beta_y}, \quad \pi_2 = \frac{\beta_{z2}}{1 - \varphi \beta_y}, \quad \pi_3 = \frac{\beta_{z3}}{1 - \varphi \beta_y}, \quad \alpha = \frac{\theta \beta_y + \gamma}{1 - \varphi \beta_y}, \quad \text{and} \quad \epsilon_t = \frac{\zeta_t \beta_y + \xi_t}{1 - \varphi \beta_y}. $$

Let $\mathbf{z}_t \equiv \{ z_{1t}, z_{2t}, z_{3t} \}$. First, given what the unobserved heterogeneities, $\theta$, $\gamma$ and $\alpha = g(\theta, \gamma)$, are, it is quite likely that $\mathbf{z}_t$, which includes the education level of the couple and the welfare benefits they receive, is correlated with them.
Second, if, as in HW, $\zeta_t$ in equation (2.5) represents new private information revealed to the household, which affects both $y^*_t$ and $x_t$, then (a) $\xi_t = f(\zeta_t)$ in equation (2.6) and (b) even after conditioning on individual effects, $x_t$ and $\zeta_t$ would be dependent.

For the example above and in general, given $\mathbf{Z}$, the only source of dependence between $\mathbf{X}$ and $(\theta, \zeta_t)$ is through the relationship between $(\boldsymbol{\alpha}, \boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_T)$ and $(\theta, \zeta_t)$. Therefore, given $\mathbf{Z}$, conditioning on $(\boldsymbol{\alpha}, \boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_T)$ eliminates this source of dependency. Since $\boldsymbol{\epsilon}_t$ and $\zeta_t$ are by assumption AS 1 independent of $\mathbf{Z}$, it can be shown that assumption AS 2 (a) boils down to $\theta \perp\!\!\!\perp \mathbf{Z} \mid \boldsymbol{\alpha}$. In other words, what we are assuming is that no information about $\mathbf{Z}$ is contained in $\theta$ over and above that contained in $\boldsymbol{\alpha}$. This assumption, in effect, is similar to the assumption of strict exogeneity in panel data models: once the endogeneity of $\mathbf{x}_t$ has been addressed by conditioning on $(\boldsymbol{\alpha}, \boldsymbol{\epsilon}_1, \dots, \boldsymbol{\epsilon}_T)$, given time-invariant heterogeneity, $\boldsymbol{\alpha}$, $\mathbf{Z}$ has no extraneous influence on $y^*_t$.

Since the unobserved conditioning variables, $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$, cannot be identified separately, for identifying structural measures our method requires that we be able to recover the conditional distribution of $\boldsymbol{\alpha}$ given $\mathbf{X}$ and $\mathbf{Z}$ so that the control functions, which are based on $E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z})$, can be estimated. However, we do not know of any semi or nonparametric estimator, and it is outside the scope of this paper to develop one, where the distribution or the expectation of individual effects or random coefficients conditional on $\mathbf{X}$ and $\mathbf{Z}$ is estimated for a system of regressions. With a parametric specification of the error components, however, this conditional distribution is obtained readily.

Biørn (2004) proposed a step-wise maximum likelihood method for estimating systems of regression equations, where the distributions of error components are specified.
We assume that the conditional distribution of $\boldsymbol{\alpha}$ given $\mathbf{Z}$ and the marginal distribution of $\boldsymbol{\epsilon}_t$ are as follows:

AS 3 $\boldsymbol{\alpha} \mid \mathbf{Z} \sim N[E(\boldsymbol{\alpha} \mid \mathbf{Z}), \Lambda_{\alpha\alpha}]$ and $\boldsymbol{\epsilon}_t \sim N[\mathbf{0}, \Sigma_{\epsilon\epsilon}]$, where $E(\boldsymbol{\alpha} \mid \mathbf{Z}) = \bar{\pi}\bar{\mathbf{z}}$ could be either Chamberlain's or Mundlak's specification for correlated random effects.

Thus the tail, $\mathbf{a} = \boldsymbol{\alpha} - E(\boldsymbol{\alpha} \mid \mathbf{Z}) = \boldsymbol{\alpha} - \bar{\pi}\bar{\mathbf{z}}$, is distributed normally with conditional mean zero and variance $\Lambda_{\alpha\alpha}$. Given assumption AS 3, we can write the reduced form in (2.2) as

$$ \mathbf{x}_t = \pi \mathbf{z}_t + \bar{\pi}\bar{\mathbf{z}} + \mathbf{a} + \boldsymbol{\epsilon}_t. \tag{2.7} $$

Some recent papers that have employed Mundlak's specification for correlated random effects are listed in SW. Though we have assumed the error terms to be normally distributed, as we discuss here, violation of the assumed normality of reduced form errors is unlikely to have a bearing on the estimates. First, the coefficients in Biørn (2004) are estimated by the method of generalized least squares (GLS), which does not require normality of the errors, $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$. Second, Biørn shows that the ML estimates of the covariance matrices, $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$, for a moderately large $N$ are approximately the same as those that are obtained when the distributions of $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$ are unknown. Third, estimating the reduced form equation augmented with $\bar{\mathbf{z}} = T^{-1} \sum_{t=1}^{T} \mathbf{z}_t$ (to account for the correlation between $\boldsymbol{\alpha}$ and $\mathbf{Z}$) by GLS yields fixed-effects (FE) estimates of $\pi$ for time varying $\mathbf{z}_t$ (see Wooldridge, 2019).

For scalar $x$, Baltagi et al. (2010) allow for heteroscedastic $a$ and serial correlation among $\epsilon_t$. Thus, when $d_x = 1$, the assumptions that $a$ is completely independent of $\mathbf{Z}$ and that the $\epsilon_t$'s are i.i.d. can be weakened to allow for non-spherical error components. In lemma 1 in the appendix we derive the conditional distribution of $\boldsymbol{\alpha}$ given $\mathbf{X}$ and $\mathbf{Z}$ for the estimator in Baltagi et al. (2010).

However, since we want to account for the endogeneity of multiple endogenous regressors, we will stick to assumption AS 3, and estimate the first-stage parameters, $\Theta \equiv \{ \pi, \bar{\pi}, \Sigma_{\epsilon\epsilon}, \Lambda_{\alpha\alpha} \}$, of the reduced form equations (2.7) using Biørn's step-wise likelihood method, which is briefly described in Appendix E.

Now, by assumptions AS 1 and AS 2 the dependence of $(\theta, \boldsymbol{\zeta})$ on $\mathbf{X}$, $\mathbf{Z}$, and $\boldsymbol{\alpha}$ is characterized by $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$. If $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$ could be identified, we could augment the structural equation with $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$ and estimate the coefficients, $\boldsymbol{\varphi}$. Since $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$ are not identified separately, the traditional control function approach assumes that the composite error, $\boldsymbol{\upsilon}_t = \boldsymbol{\alpha} + \boldsymbol{\epsilon}_t$, which is estimated as the residuals of the reduced form equations, is independent of $\mathbf{Z}$ and that conditional on $\boldsymbol{\upsilon}_t$, $\mathbf{X}$ is independent of $\theta + \zeta_t$. Such an assumption, as we discuss in detail, could quite likely be violated.

Instead, we propose $\hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z}) \equiv E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z})$ and $\hat{\boldsymbol{\epsilon}}_t(\mathbf{X}, \mathbf{Z}) \equiv E(\boldsymbol{\epsilon}_t \mid \mathbf{X}, \mathbf{Z})$, which can be identified (see Lemma 2), as control functions for panel data. One motivation for proposing these is Lemma 1, where we show that:

Lemma 1
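If (i) assumptions AS 1 and AS 2 hold and (ii) $E(\theta \mid \boldsymbol{\alpha})$ and $E(\zeta_t \mid \boldsymbol{\epsilon}_t)$ are linear in $\boldsymbol{\alpha}$ and $\boldsymbol{\epsilon}_t$ respectively so that $E(\theta \mid \boldsymbol{\alpha}) + E(\zeta_t \mid \boldsymbol{\epsilon}_t) = \boldsymbol{\rho}_\alpha \boldsymbol{\alpha} + \boldsymbol{\rho}_\epsilon \boldsymbol{\epsilon}_t$, then $E(\theta + \zeta_t \mid \mathbf{X}, \mathbf{Z})$ depends on $(\mathbf{X}, \mathbf{Z})$ only through $\hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z})$ and $\hat{\boldsymbol{\epsilon}}_t(\mathbf{X}, \mathbf{Z})$.

Proof of Lemma 1
Now,

$$ \begin{aligned} E(\theta + \zeta_t \mid \mathbf{X}, \mathbf{Z}) &= E\big( E(\theta + \zeta_t \mid \mathbf{X}, \mathbf{Z}, \boldsymbol{\alpha}) \mid \mathbf{X}, \mathbf{Z} \big) \\ &= E\big( E(\theta + \zeta_t \mid \boldsymbol{\alpha}, \boldsymbol{\epsilon}_t) \mid \mathbf{X}, \mathbf{Z} \big) \\ &= E\big( E(\theta \mid \boldsymbol{\alpha}) + E(\zeta_t \mid \boldsymbol{\epsilon}_t) \mid \mathbf{X}, \mathbf{Z} \big) \\ &= \boldsymbol{\rho}_\alpha E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z}) + \boldsymbol{\rho}_\epsilon E(\boldsymbol{\epsilon}_t \mid \mathbf{X}, \mathbf{Z}), \end{aligned} \tag{2.8} $$

where the first equality is due to the law of iterated expectations, the second is due to assumption AS 2, the third is due to AS 1, and the fourth is due to assumption (ii) in the lemma.

In Lemma 2 we show that:

Lemma 2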
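Let $\mathbf{x}_t = \pi \mathbf{z}_t + \bar{\pi}\bar{\mathbf{z}} + \mathbf{a} + \boldsymbol{\epsilon}_t$, $t \in \{1, \dots, T\}$, where $\bar{\mathbf{z}} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{z}_t$, and let AS 3 hold. Then $\boldsymbol{\alpha} = \bar{\pi}\bar{\mathbf{z}} + \mathbf{a}$, given $\mathbf{X}$ and $\mathbf{Z}$, is distributed with conditional mean

$$ E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z}) \equiv \hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z}, \Theta) = \bar{\pi}\bar{\mathbf{z}} + E(\mathbf{a} \mid \mathbf{X}, \mathbf{Z}) = \bar{\pi}\bar{\mathbf{z}} + \Omega \Sigma_{\epsilon\epsilon}^{-1} \sum_{t=1}^{T} (\mathbf{x}_t - \pi \mathbf{z}_t - \bar{\pi}\bar{\mathbf{z}}), $$

where $\Omega = [T \Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}$ is the conditional variance of $\boldsymbol{\alpha}$ given $\mathbf{X}$ and $\mathbf{Z}$; $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$ being the covariance matrices of $\mathbf{a}$ and $\boldsymbol{\epsilon}_t$ respectively.

Proof of Lemma 2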
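Given in Appendix A.

The conditional mean of $\boldsymbol{\epsilon}_t$ given $\mathbf{X}$ and $\mathbf{Z}$ is then given by

$$ \hat{\boldsymbol{\epsilon}}_t(\mathbf{X}, \mathbf{Z}, \Theta) = \mathbf{x}_t - \pi \mathbf{z}_t - E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z}) = \mathbf{x}_t - \pi \mathbf{z}_t - \hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z}, \Theta) = \boldsymbol{\upsilon}_t - \hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z}, \Theta). $$

Remark 1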
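The expected posterior estimate, $\hat{\boldsymbol{\alpha}}$, of $\boldsymbol{\alpha}$ in Lemma 2 is the empirical Bayes or James-Stein shrinkage estimator of $\boldsymbol{\alpha}$ (see Efron, 2010). Empirical Bayes estimation has gained a certain popularity in economics. In education economics, it is employed as a procedure to calculate teacher value added and often as a way to make imprecise estimates more reliable (see Guarino et al., 2015, and the references therein).

Now, we can write $\hat{\boldsymbol{\alpha}}$ in Lemma 2 as

$$ \hat{\boldsymbol{\alpha}} = \bar{\pi}\bar{\mathbf{z}} + E(\mathbf{a} \mid \mathbf{X}, \mathbf{Z}) = \bar{\pi}\bar{\mathbf{z}} + \underbrace{\Big[ \Sigma_{\epsilon\epsilon}^{-1} + \tfrac{1}{T} \Lambda_{\alpha\alpha}^{-1} \Big]^{-1} \Sigma_{\epsilon\epsilon}^{-1}}_{\text{shrinkage factor}} \frac{1}{T} \sum_{t=1}^{T} (\mathbf{x}_t - \pi \mathbf{z}_t - \bar{\pi}\bar{\mathbf{z}}). $$

With the reduced form equation being specified as in equation (2.7) and given assumption AS 3, it can be verified that, for a given $\Theta$, the MLE of $\mathbf{a}$ is $\frac{1}{T} \sum_{t=1}^{T} (\mathbf{x}_t - \pi \mathbf{z}_t - \bar{\pi}\bar{\mathbf{z}})$. Since for small $T$ the MLE of $\mathbf{a}$ is less reliable, the shrinkage factor of the empirical Bayes estimator shrinks the MLE of $\mathbf{a}$ towards its mean, 0, thus shrinking the ML estimate of $\boldsymbol{\alpha}$ towards its prior mean, $\bar{\pi}\bar{\mathbf{z}}$. Given consistent estimates of the reduced-form parameters, $\Theta$, the empirical Bayes estimate, $\hat{\boldsymbol{\alpha}}$, of $\boldsymbol{\alpha}$ is the minimum mean squared error predictor of $\boldsymbol{\alpha}$ under normality, and therefore a justified estimator of $\boldsymbol{\alpha}$.

[Footnote: In lemma 1, part (b), given in appendix A we derive $E(a \mid \mathbf{X}, \mathbf{Z})$ when $a$ and $\epsilon_t$ are both scalar, $a$ is heteroscedastic, and the distribution of $\epsilon_t$ is non-spherical.]

[Footnote: For notational convenience, we use $\hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z}, \Theta)$, $\hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z})$ and $\hat{\boldsymbol{\alpha}}$ interchangeably; the same for $\hat{\boldsymbol{\epsilon}}_t(\mathbf{X}, \mathbf{Z}, \Theta)$, $\hat{\boldsymbol{\epsilon}}_t(\mathbf{X}, \mathbf{Z})$ and $\hat{\boldsymbol{\epsilon}}_t$.]

For large $T$, since the estimates of $\pi$ are the FE estimates of $\pi$, it can be shown that $\hat{\boldsymbol{\alpha}}$ consistently estimates the fixed effects, $\boldsymbol{\alpha}$. With the FE estimates of $\boldsymbol{\alpha}$ given by $\hat{\boldsymbol{\alpha}}_{FE} = \frac{1}{T} \sum_{t=1}^{T} (\mathbf{x}_t - \pi \mathbf{z}_t)$, we can write $\hat{\boldsymbol{\alpha}}$ as

$$ \hat{\boldsymbol{\alpha}} = \bar{\pi}\bar{\mathbf{z}} + \Big[ \Sigma_{\epsilon\epsilon}^{-1} + \tfrac{1}{T} \Lambda_{\alpha\alpha}^{-1} \Big]^{-1} \Sigma_{\epsilon\epsilon}^{-1} (\hat{\boldsymbol{\alpha}}_{FE} - \bar{\pi}\bar{\mathbf{z}}). $$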
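To make the formulas in Lemma 2 and in this remark concrete, the following is a minimal numerical sketch for a single cross-sectional unit. It assumes the first-stage parameters are already available (here they are simply passed in as arrays rather than estimated by Biørn's method), and the function names `control_functions` and `shrinkage_factor` are hypothetical, introduced only for illustration.

```python
import numpy as np

def control_functions(X, Z, pi, pi_bar, Sigma_ee, Lambda_aa):
    """Posterior means of alpha and eps_t for one unit, following Lemma 2.

    X: (d_x, T) history of endogenous variables.
    Z: (d_z, T) history of exogenous variables.
    pi, pi_bar: (d_x, d_z) reduced-form coefficients (assumed known here).
    """
    T = X.shape[1]
    z_bar = Z.mean(axis=1)                       # time average of z_t
    prior_mean = pi_bar @ z_bar                  # E(alpha | Z) under AS 3
    V = X - pi @ Z                               # composite residuals v_t, (d_x, T)
    S_inv = np.linalg.inv(Sigma_ee)
    Omega = np.linalg.inv(T * S_inv + np.linalg.inv(Lambda_aa))
    resid_sum = (V - prior_mean[:, None]).sum(axis=1)
    alpha_hat = prior_mean + Omega @ S_inv @ resid_sum
    eps_hat = V - alpha_hat[:, None]             # eps_hat_t = v_t - alpha_hat
    return alpha_hat, eps_hat

def shrinkage_factor(T, Sigma_ee, Lambda_aa):
    """[Sigma^{-1} + Lambda^{-1} / T]^{-1} Sigma^{-1}, as in Remark 1."""
    S_inv = np.linalg.inv(Sigma_ee)
    return np.linalg.inv(S_inv + np.linalg.inv(Lambda_aa) / T) @ S_inv
```

The two expressions for $\hat{\boldsymbol{\alpha}}$ (the $\Omega$ form and the shrinkage form) can be checked against each other numerically, and evaluating `shrinkage_factor` for increasing `T` shows the factor moving towards the identity matrix.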
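Assuming $N$ is large enough for the reduced form parameters to have been consistently estimated, since $\hat{\boldsymbol{\alpha}}_{FE}$ converges in probability to $\boldsymbol{\alpha}$ and $[ \Sigma_{\epsilon\epsilon}^{-1} + \frac{1}{T} \Lambda_{\alpha\alpha}^{-1} ]^{-1} \Sigma_{\epsilon\epsilon}^{-1}$ to an identity matrix as $T \to \infty$, by the continuous mapping theorem it can be shown that $\hat{\boldsymbol{\alpha}} \xrightarrow{p} \boldsymbol{\alpha}$, and consequently $\hat{\boldsymbol{\epsilon}}_t \xrightarrow{p} \boldsymbol{\epsilon}_t$.

Now, given that $\hat{\boldsymbol{\epsilon}}_t = \boldsymbol{\upsilon}_t - \hat{\boldsymbol{\alpha}}$, there is a one-to-one mapping between $(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$ and $(\boldsymbol{\upsilon}_t, \hat{\boldsymbol{\alpha}})$, and therefore the conditioning $\sigma$-algebra, $\sigma(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$, is the same as the $\sigma$-algebra, $\sigma(\boldsymbol{\upsilon}_t, \hat{\boldsymbol{\alpha}})$. Hence, conditioning on the proposed control functions, $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$, is equivalent to conditioning on $\boldsymbol{\upsilon}_t$ and additional individual specific information as summarized by $\hat{\boldsymbol{\alpha}}(\mathbf{X}, \mathbf{Z})$. In other words, in assuming that $(\zeta_t, \theta) \perp\!\!\!\perp \mathbf{X} \mid \hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}}$, we are saying that no information about $\mathbf{X}$ is contained in $(\zeta_t, \theta)$ over and above that contained in $(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$. This, as also discussed in Remark 2 and Remark 3, may not hold true for $\boldsymbol{\upsilon}_t(\mathbf{x}_t, \mathbf{z}_t) = \mathbf{x}_t - \pi \mathbf{z}_t$ as a control function.

Assuming $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$ as control functions would be equivalent to assuming the following:

ACF 1 (a) $\boldsymbol{\zeta}, \theta \mid \mathbf{X}, \mathbf{Z}, \hat{\boldsymbol{\alpha}} \sim \boldsymbol{\zeta}, \theta \mid \mathbf{V}, \mathbf{Z}, \hat{\boldsymbol{\alpha}} \sim \boldsymbol{\zeta}, \theta \mid \mathbf{V}, \hat{\boldsymbol{\alpha}}$, where $\mathbf{V} \equiv (\boldsymbol{\upsilon}_1, \dots, \boldsymbol{\upsilon}_T) = \mathbf{X} - \pi \mathbf{Z}$ and $\hat{\boldsymbol{\alpha}} = E(\boldsymbol{\alpha} \mid \mathbf{X}, \mathbf{Z})$.
(b) $\zeta_t, \theta \perp\!\!\!\perp \mathbf{V}_{-t} \mid \boldsymbol{\upsilon}_t, \hat{\boldsymbol{\alpha}}$.

In part (a), the assumption is that the dependence of $(\theta, \boldsymbol{\zeta})$ on $\mathbf{X}$ and $\mathbf{Z}$ is completely characterized by $\mathbf{V}$ and $\hat{\boldsymbol{\alpha}}$. By part (b), because conditioning on $(\boldsymbol{\upsilon}_t, \hat{\boldsymbol{\alpha}})$ is equivalent to conditioning on $(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$, $\mathbf{X}$ is independent of $(\zeta_t, \theta)$ given $(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$. Part (b), as in AS 2 (b), is to facilitate comparison with the traditional control function method and can be dropped.

Remark 2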
When $\mathbf{Z} \not\perp\!\!\!\perp (\theta, \boldsymbol{\alpha})$, the requirement of the traditional control function method that $\mathbf{Z}$ be independent of $(\boldsymbol{\upsilon}_t, \zeta_t + \theta)$ is violated and $\zeta_t + \theta \perp\!\!\!\perp \mathbf{Z} \mid \boldsymbol{\upsilon}_t$ does not hold generally. In that case, assumption ACF 1 seems plausible, as $\zeta_t + \theta$ is mean independent of $(\mathbf{X}, \mathbf{Z})$ given $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$. This assumption is related to the dependence assumptions in AM, Bester and Hansen (2009) (BH) and HW, where the distribution of unobserved effects depends on the observed variables only through certain functions of the observed variables. These functions, as BH argue, may be viewed as sufficient statistics. AM assume that $(\zeta_t, \theta)$ is independent of $\mathbf{X}$ given certain summary statistics such as the mean, $T^{-1} \sum_{t=1}^{T} \mathbf{x}_t$, or index functions of summary statistics, while in BH these functions of observed variables are assumed to be unrestricted index functions. In our case, the control function, $(\hat{\boldsymbol{\epsilon}}_t, \hat{\boldsymbol{\alpha}})$, is motivated by the result that under certain restrictions, the mean of $\theta + \zeta_t$ given the histories, $(\mathbf{X}, \mathbf{Z})$, of the endogenous and the exogenous variables depends on $(\mathbf{X}, \mathbf{Z})$ only through $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$. Moreover, as $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ consistently estimates $(\boldsymbol{\alpha}, \boldsymbol{\epsilon}_t)$ when $T$ is large (see Remark 1), it implies that for large $T$, assumptions ACF 1 and AS 2 are asymptotically equivalent.

[Footnote: When there is a single endogenous regressor, so that the reduced form has a single equation, one can employ the estimation method in Gu and Koenker (2017), who, for longitudinal data, have developed a non-parametric method to estimate the empirical Bayes estimates of the individual effects, $\alpha$, and the distribution of $\alpha$. Since the posterior mean $\hat{\alpha}$ is estimated non-parametrically, the large sample properties of the structural coefficients will have to be worked out anew.]

Remark 3
When $\mathbf{Z} \perp\!\!\!\perp (\theta, \boldsymbol{\alpha}, \zeta_t, \boldsymbol{\epsilon}_t)$, as in the traditional control function approach, then $(\zeta_t, \theta) \perp\!\!\!\perp \mathbf{Z} \mid \mathbf{V}$ holds true. Since $\mathbf{V}$ is invertible in $\mathbf{X}$ when $\mathbf{Z}$ is given, we have $(\zeta_t, \theta) \mid \mathbf{X}, \mathbf{Z} \sim (\zeta_t, \theta) \mid \mathbf{V}, \mathbf{Z} \sim (\zeta_t, \theta) \mid \mathbf{V}$. In panel data, therefore, when $\mathbf{Z}$ is independent of the unobserved heterogeneities, $\mathbf{V}$ should be employed as the control function.

Given that the unobserved heterogeneities, $(\theta, \boldsymbol{\alpha})$, which represent unobserved, time-invariant attributes such as preferences, technologies, or abilities, influence the choice of $\mathbf{x}_t$ in each time period, it is not only with $\mathbf{x}_t$ that the errors, $\theta + \zeta_t$, are correlated, but generally with the entire history, $\mathbf{X}$, of the endogenous variable. Moreover, since $\mathbf{x}_t$ is endogenous, due to potential feedback from $y_t$ to $\mathbf{x}_s$ for $s > t$, it is likely that the optimal choice of $\mathbf{x}_t$ depends on $\zeta$ from the past, or more generally $\zeta$ from other time periods, which is likely to make $\zeta_t$ and $\mathbf{x}$ from other time periods dependent. If only $\boldsymbol{\upsilon}_t$ is employed as the control function, as in the traditional control function method, then $\mathbf{X}$ may not be conditionally independent of $(\zeta_t, \theta)$. Therefore, there will exist some partial correlation between $y_t$ and $\mathbf{x}$ in other time periods if the dependency between the structural errors, $(\theta, \zeta_t)$, and the history, $\mathbf{X}$, is not accounted for. Employing only $\boldsymbol{\upsilon}_t$ as the control function places a strong restriction on the dependence between $(\theta, \zeta_t)$ and $\mathbf{X}$, which, as shown in Proposition 1, is unlikely to hold.

Proposition 1
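Let $\mathbf{Z} \perp\!\!\!\perp (\theta, \boldsymbol{\alpha}, \zeta_t, \boldsymbol{\epsilon}_t)$. When $(\theta, \zeta_t)$ and $(\boldsymbol{\alpha}, \boldsymbol{\epsilon}_t)$ are correlated, then $\theta + \zeta_t \perp\!\!\!\perp \mathbf{X} \mid \mathbf{V}$ whereas $\theta + \zeta_t \not\perp\!\!\!\perp \mathbf{X} \mid \boldsymbol{\upsilon}_t$, where $\boldsymbol{\upsilon}_t = \boldsymbol{\alpha} + \boldsymbol{\epsilon}_t = \mathbf{x}_t - \pi \mathbf{z}_t$ and $\mathbf{V} \equiv (\boldsymbol{\upsilon}_1, \dots, \boldsymbol{\upsilon}_T)$.

Proof of Proposition 1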
Given in Appendix A.
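The content of Proposition 1 can be illustrated with a small simulation. The sketch below is only an illustration under assumed values, not part of the proof: it takes scalar heterogeneity, $T = 2$, unit variances, $\pi \mathbf{z}_t = 0$ (so that $x_t = \upsilon_t$), and hypothetical loadings of 0.8 and 0.2 of the structural errors on $\alpha$ and $\epsilon_t$. It then projects $\theta + \zeta_1$ on $\upsilon_1$ and $\upsilon_2$; a non-zero coefficient on $\upsilon_2$ means that $\upsilon_1$ alone does not absorb the dependence on the history.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200_000, 2

# Reduced-form heterogeneity; Z is independent of everything, so pi * z_t is set to 0
alpha = rng.normal(size=N)                  # time-invariant effect
eps = rng.normal(size=(N, T))               # idiosyncratic errors
v = alpha[:, None] + eps                    # v_t = alpha + eps_t (equals x_t here)

# Structural heterogeneity loads differently on alpha and eps_t (assumed values)
theta = 0.8 * alpha + rng.normal(size=N)
zeta1 = 0.2 * eps[:, 0] + rng.normal(size=N)
u1 = theta + zeta1                          # theta + zeta_1

# Linear projection of u1 on a constant, v_1 and v_2
W = np.column_stack([np.ones(N), v[:, 0], v[:, 1]])
coef, *_ = np.linalg.lstsq(W, u1, rcond=None)
b_v1, b_v2 = coef[1], coef[2]
print(f"coefficient on v_1: {b_v1:.3f}, coefficient on v_2: {b_v2:.3f}")
```

Under these assumed values the population projection coefficients work out to 0.4 on $\upsilon_1$ and 0.2 on $\upsilon_2$, so $\theta + \zeta_1$ is not mean independent of $x_2$ given $\upsilon_1$ alone. If the two loadings were equal, the coefficient on $\upsilon_2$ would vanish in this linear-normal design, which is why the sketch uses distinct loadings.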
Since by assumption ACF 1, $\theta + \zeta_t \perp\!\!\!\perp \mathbf{X} \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t$, we have

$$ E(y^*_t \mid \mathbf{X}, \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = \mathbf{x}'_t \boldsymbol{\varphi} + E(\theta + \zeta_t \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t). $$

To estimate $\boldsymbol{\varphi}$ parametrically, we further assume that

AS 4 (i) $\theta + \zeta_t = E(\theta + \zeta_t \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) + \eta_t = \boldsymbol{\varphi}_\alpha \hat{\boldsymbol{\alpha}} + \boldsymbol{\varphi}_\epsilon \hat{\boldsymbol{\epsilon}}_t + \eta_t$, and that (ii) $\eta_t \perp\!\!\!\perp \mathbb{X}_t = (\mathbf{x}'_t, \hat{\boldsymbol{\alpha}}', \hat{\boldsymbol{\epsilon}}'_t)'$.

[Footnote: For example, in the study of child labor in section 4, the endogenous variables (household income, amount of land owned by a household, and an index of household ownership of productive farm assets) in each period will also depend on unobserved household characteristics such as parents' abilities, quality of land, or possibly other omitted variables fixed at the household level. Moreover, apart from contemporaneous shocks, $(\zeta_t, \boldsymbol{\epsilon}_t)$, that affect the current choices of both $\mathbf{x}_t$ and $y_t$, there may be feedback from lagged values of $\zeta$ or $y$ to $\mathbf{x}_t$. In the study of child labor, for example, current choices of labor supply can impact the future choices of the endogenous variables mentioned above. It is thus possible that $(\theta, \zeta_t)$ and $\mathbf{X}$, as in the considered example of child labor, could be dependent.]

The two vectors, $\boldsymbol{\varphi}_\alpha$ and $\boldsymbol{\varphi}_\epsilon$, when estimated give us a test of exogeneity of $\mathbf{x}_t$. Now, though the residual, $\eta_t = \theta + \zeta_t - E(\theta + \zeta_t \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$, is uncorrelated with the conditioning variables, $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$, for estimation of binary response models a stronger condition of independence is required. The assumption that the residual, $U - E(U \mid V)$, is independent of the conditioning variables, $V$, has been made in Chamberlain (1984) and PW for estimation of parametric models such as the correlated random effects (CRE) probit and fractional response models.
This independence condition is, however, satisfied when $(U, V)$ are jointly normal.

Given the above assumption, equation (2.1) can then be written as

$$y_t = 1\{\mathbb{X}_t' \Theta + \eta_t > 0\}, \tag{2.9}$$

where $\Theta \equiv (\boldsymbol{\varphi}', \boldsymbol{\varphi}_\alpha', \boldsymbol{\varphi}_\epsilon')'$. Since $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$ are both of dimension $d_x$, the dimension of $\mathbb{X}_t$ is $3d_x$. The conditions for $\Theta$ in (2.9) to be identified when $\eta_t$ is assumed to follow a known distribution are: (a) $\eta_t$ is independent of $\mathbb{X}_t$ and (b) $\operatorname{rank}(E(\mathbb{X}_t \mathbb{X}_t')) = 3d_x$. In theorem 1 we show that condition (b) is satisfied.

Theorem 1
If (i) $\operatorname{rank}(E(\boldsymbol{x}_t \boldsymbol{x}_t')) = d_x$; (ii) $\operatorname{rank}(\Pi) = d_x$, where $\Pi = (\pi \;\; \bar{\pi})$; (iii) $\operatorname{rank}(E((\boldsymbol{z}_t', \bar{\boldsymbol{z}}')'(\boldsymbol{z}_t', \bar{\boldsymbol{z}}'))) = k$, where $k = \dim((\boldsymbol{z}_t', \bar{\boldsymbol{z}}')')$; and (iv) AS 3 holds so that the covariance matrices of $\boldsymbol{\epsilon}_t$ and $\boldsymbol{\alpha}$ are of full rank, then $\operatorname{rank}(E(\mathbb{X}_t \mathbb{X}_t')) = 3d_x$.

Proof of Theorem 1
Given in Appendix A.

Condition (ii) is the rank condition in IN, and it underscores the necessity of exclusion restrictions for identification. Conditions (i) to (iii) in theorem 1 are standard conditions for identification of $\boldsymbol{\varphi}$ in the traditional control function methods, where the control function is the composite error, $\boldsymbol{\upsilon}_t = \boldsymbol{\alpha} + \boldsymbol{\epsilon}_t = \boldsymbol{x}_t - \pi \boldsymbol{z}_t$. Our conditioning variables, however, are $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$, which are also functions of $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$. Positive definiteness of $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$ in condition (iv) helps establish the statement of the theorem. Appendix B discusses how one can use the method of generalized estimating equations (GEE), which can account for heteroscedasticity and serial dependence in the response outcome, to estimate $\Theta$.

Footnote: Note that we have suppressed $\boldsymbol{w}_t$ in $\mathbb{X}_t$, where $\boldsymbol{w}_t$ is of dimension $d_w$. So, in fact, the dimension of $\mathbb{X}_t$ is $3d_x + d_w$. Suppressing $\boldsymbol{w}_t$ in $\mathbb{X}_t$, however, results in no loss of generality.

Following theorem 1, since the components of $\boldsymbol{x}_t$ are continuous, with scale and location normalization, $\Theta$ can be estimated by semiparametric methods without specifying the distribution of $\eta_t$ (see Horowitz, 2009, for a review of identification results for semiparametric binary choice models).

Alternatively, one can estimate $\boldsymbol{\varphi}$ and measures like the ASF by semiparametric methods in BP or in Rothe, where the conditional distribution of $\theta + \zeta_t$ given $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$, as in assumption AS 4, is not specified. BP extend the matching estimator of $\boldsymbol{\varphi}$ for the single-index model without endogeneity, whereas Rothe develops a semiparametric maximum likelihood (SML) method for binary response models to handle endogeneity using control functions. These semiparametric methods, however, require that $\boldsymbol{z} = \tilde{\boldsymbol{z}}$ contains an instrument that is continuous.
If all instruments are discrete, the "rank condition" in BP and condition (ii) of Theorem 1 in Rothe, necessary for identification, are violated.

We do not pursue semiparametric estimation of binary choice models with the control functions developed in this paper any further. Semiparametric estimation and the large sample properties of the estimates are left for future research.

Wooldridge (2015), in providing an overview of control function (CF) methods, writes, "in evaluating the scope of an estimation method, it is important to understand how it works in familiar settings, including cases when it is not necessarily needed." While noting that the standard CF method for cross-sectional data, when applied to linear models, gives coefficients for the endogenous variables that are equal to the standard two-stage least squares (2SLS) coefficients, Wooldridge points out certain advantages of the CF approach compared to the 2SLS approach, such as providing a robust, regression-based Hausman test of exogeneity. Before ending this subsection, we therefore show that using $\hat{\boldsymbol{\alpha}}$ and $\hat{\boldsymbol{\epsilon}}_t$ as additional covariates in linear panel data models is equivalent to estimating the models by a certain 2SLS method.

Now, in linear panel data models,

$$y_t = \boldsymbol{x}_t' \boldsymbol{\varphi} + \theta + \zeta_t, \tag{2.10}$$

when instruments are correlated with time-invariant heterogeneity, fixed effect two-stage least squares (FE2SLS), which employs time-variant instruments that are deviations from the group mean, $\ddot{\boldsymbol{z}}_t = \boldsymbol{z}_t - \bar{\boldsymbol{z}}$, is employed. An alternative approach (see Wooldridge, 2010a, chapter 11) is to write $\theta$ in equation (2.10) as $\theta = E(\theta \mid Z) + \tau$, specify $E(\theta \mid Z)$ as in Mundlak, and estimate the model by pooled 2SLS using $(\boldsymbol{z}_t, \bar{\boldsymbol{z}})$ as instruments.

By assumptions AS 1, AS 2, AS 3 and condition (ii) of lemma 1, we have $E(\theta \mid Z) = E(E(\theta \mid \boldsymbol{\alpha}) \mid Z) = E(\boldsymbol{\rho}_\alpha \boldsymbol{\alpha} \mid Z) = \boldsymbol{\rho}_\alpha \bar{\pi} \bar{\boldsymbol{z}}$.
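Thus we can write the model in equation (2.10) as

$$y_t = \boldsymbol{x}_t' \boldsymbol{\varphi} + \boldsymbol{\rho}_\alpha \bar{\pi} \bar{\boldsymbol{z}} + \tau + \zeta_t, \tag{2.11}$$

where the heterogeneity term, $\tau + \zeta_t$, and $X$ are dependent even though $\tau + \zeta_t$ is mean independent of $Z$. Thus, we can estimate $\boldsymbol{\varphi}$ in (2.11) by using instrumental variables.

Also, by assumptions ACF 1 and AS 4, the linear model in equation (2.10) can be written as

$$y_t = \boldsymbol{x}_t' \boldsymbol{\varphi} + \boldsymbol{\varphi}_\alpha \hat{\boldsymbol{\alpha}} + \boldsymbol{\varphi}_\epsilon \hat{\boldsymbol{\epsilon}}_t + \eta_t, \tag{2.12}$$

where $\eta_t = \theta + \zeta_t - E(\theta + \zeta_t \mid X, Z) = \theta + \zeta_t - (\boldsymbol{\varphi}_\alpha \hat{\boldsymbol{\alpha}} + \boldsymbol{\varphi}_\epsilon \hat{\boldsymbol{\epsilon}}_t)$ is mean independent of $(X, Z)$.

Footnote: Conditioning on $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$ is equivalent to conditioning on $\boldsymbol{\upsilon}_t = \boldsymbol{x}_t - \pi \boldsymbol{z}_t$ and $\hat{\boldsymbol{\alpha}}$. Hence, if identification requires that, conditional on the control variables $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$ (and therefore on $\boldsymbol{\upsilon}_t$ and $\hat{\boldsymbol{\alpha}}$), the vector $\boldsymbol{x}_t$ contains at least one continuously distributed component with a non-zero coefficient, then it would be necessary that $\boldsymbol{z}_t$ contains a continuously distributed regressor.

Theorem 2
Let the estimate of $\boldsymbol{\varphi}$ in (2.11) by pooled 2SLS using $(\ddot{\boldsymbol{z}}_t, \bar{\boldsymbol{z}})$ as additional instruments be denoted by $\hat{\boldsymbol{\varphi}}_{IV}$, and let the estimate of the same obtained from estimating (2.12) by pooling the data be denoted by $\hat{\boldsymbol{\varphi}}_{CF}$; then $\hat{\boldsymbol{\varphi}}_{CF} = \hat{\boldsymbol{\varphi}}_{IV}$.

Proof of Theorem 2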
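Given in Appendix A.

Estimating (2.11) by pooled 2SLS using $(\ddot{\boldsymbol{z}}_t, \bar{\boldsymbol{z}})$ as additional instruments is similar to estimating by the error component two-stage least squares (EC2SLS) method proposed in Baltagi (1981), with the difference that the time-varying instruments are allowed to be correlated with the individual-specific unobserved heterogeneity.

With $\hat{\boldsymbol{\alpha}}$ and $\hat{\boldsymbol{\epsilon}}_t$ as control functions, we have

$$\Pr(y_t = 1 \mid X, \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = \int 1\{-(\zeta_t + \theta) < \boldsymbol{x}_t' \boldsymbol{\varphi}\}\, dF(\zeta_t + \theta \mid X, \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = \int 1\{-(\zeta_t + \theta) < \boldsymbol{x}_t' \boldsymbol{\varphi}\}\, dF(\zeta_t + \theta \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = F(\boldsymbol{x}_t' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t),$$

where $F(\boldsymbol{x}_t' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ is the conditional CDF of $\zeta_t + \theta$ given $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ evaluated at $\boldsymbol{x}_t' \boldsymbol{\varphi}$. Given a particular value $\bar{\boldsymbol{x}}$ of $\boldsymbol{x}_t$, averaging $F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ over $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$, we get the ASF:

$$G(\bar{\boldsymbol{x}}) = \int F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = \int \left[ \int 1\{\bar{\boldsymbol{x}}' \boldsymbol{\varphi} + \theta + \zeta_t > 0\}\, dF(\theta + \zeta_t \mid \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) \right] dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = E_{\theta + \zeta}\left(1\{\bar{\boldsymbol{x}}' \boldsymbol{\varphi} + \theta + \zeta_t > 0\}\right). \tag{2.13}$$

The APE of changing a variable, say $x_k$, from $\bar{x}_k$ to $\bar{x}_k + \Delta_k$ can be obtained as

$$\frac{\Delta G(\bar{\boldsymbol{x}})}{\Delta_k} = \frac{G(\bar{\boldsymbol{x}}_{-k}, \bar{x}_k + \Delta_k) - G(\bar{\boldsymbol{x}})}{\Delta_k}. \tag{2.14}$$

To point-identify the ASF, $G(\bar{\boldsymbol{x}})$, it is required that $F(\boldsymbol{x}_t' \boldsymbol{\varphi} = \bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}} = \bar{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t = \bar{\boldsymbol{\epsilon}})$ be evaluated at all values of $(\bar{\boldsymbol{\alpha}}, \bar{\boldsymbol{\epsilon}})$ in the support of the unconditional distribution of $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}})$. This requires that the support of the conditional distribution of $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}})$ conditional on $\boldsymbol{x}_t = \bar{\boldsymbol{x}}$ be equal to the support of the unconditional distribution. It ensures that for any group of individuals defined in terms of $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$, at least some experience $\boldsymbol{x}_t = \bar{\boldsymbol{x}}$. This is analogous to the overlap condition in the program evaluation literature, where treatment is discrete.

For many triangular systems that employ the control function $\boldsymbol{\upsilon}_t$, or $\{F(\upsilon_{1t}), \ldots, F(\upsilon_{d_x, t})\}'$ (where $F(\upsilon_{1t})$, the CDF of $\upsilon_{1t}$, is equal to $F(x_{1t} \mid \boldsymbol{z}_t)$, the CDF of $x_{1t}$ given $\boldsymbol{z}_t$), the requirement of common support necessitates that, along with the rank condition (see theorem 1), the set of instruments, $\boldsymbol{z}_t$, contains a continuous instrument with large support (this is discussed in IN, Florens et al., and Blundell and Powell (2003)). In lemma 3 we show that when the instruments have small support (that is, when instruments are binary, discrete, or continuous but without large support) the support requirement for $G(\bar{\boldsymbol{x}})$ to be point-identified by the "partial-mean" formulation in equation (2.13) is satisfied if $\boldsymbol{x}$ has a large support.

Lemma 3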
If the endogenous variables, $\boldsymbol{x}$, have a large support, then under AS 3, the support of the conditional distribution of $\hat{\boldsymbol{\alpha}}(X, Z, \Theta)$ and $\hat{\boldsymbol{\epsilon}}_t(X, Z, \Theta)$, conditional on $\boldsymbol{x}_t = \bar{\boldsymbol{x}}$, is the same as the support of their marginal distribution.

Proof of Lemma 3
Given in Appendix A.

In our approach, the control functions, $\hat{\boldsymbol{\epsilon}}_t$ and $\hat{\boldsymbol{\alpha}}$, are smooth, unbounded functions of the $\boldsymbol{x}_t$'s, $t \in \{1, \ldots, T\}$. Therefore, when $\boldsymbol{x}$ is continuous with a large support, because the $\boldsymbol{x}_s$'s, $s \neq t$, are unrestricted, the ranges of $\hat{\boldsymbol{\alpha}}$ and $\hat{\boldsymbol{\epsilon}}_t = \boldsymbol{x}_t - \pi \boldsymbol{z}_t - \hat{\boldsymbol{\alpha}}$ conditional on $\boldsymbol{x}_t = \bar{\boldsymbol{x}}$ do not depend on $\boldsymbol{x}_t$. Since the result does not rely on any kind of restriction on the support of the $\boldsymbol{z}_t$'s, our method circumvents the need to have a continuous instrument with large support to point-identify the ASF and/or the APEs when some of the $\boldsymbol{x}$'s have large supports.

Before proceeding further, we note that these results could be useful for computing the quantile structural function (QSF), as in IN, for the kind of triangular set-ups considered in Blundell and Powell (2003), where the structural equation is nonseparable in errors, but additively separable in the reduced form. When the nonseparable structural and reduced form equations are strictly increasing in their respective scalar errors, D'Haultfœuille and Février (2015) and Torgovitsky (2015) show that the structural function, $y_t = g(\boldsymbol{x}_t, \varepsilon_t)$, is point-identified with discrete instruments. The key observation in both papers is that establishing point identification is tantamount to solving a functional fixed point problem. While the papers differ in their approach to establishing sufficient conditions under which the fixed point problem admits a unique solution, they are able to avoid the partial-mean formulation in equation (2.13), or in equation (6) of IN, to identify the QSF, which in their case is the same as the structural function. Since $g(\boldsymbol{x}_t, \varepsilon_t)$ is required to be strictly monotonic in $\varepsilon_t$, these methods, however, are not suitable for identification in discrete choice models.

When the support condition in lemma 3 is not satisfied, one can establish bounds on the ASF and the APEs. Let $\mathcal{A}$ be the unconditional support of $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ and $\mathcal{A}(\bar{\boldsymbol{x}}) \equiv \{\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t : f(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t \mid \bar{\boldsymbol{x}}) > 0\}$ be the support of $(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ conditional on $\bar{\boldsymbol{x}}$.
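When the support of $\boldsymbol{x}$ is bounded and the instruments have a small support, then $\mathcal{A} \neq \mathcal{A}(\bar{\boldsymbol{x}})$. Now, let

$$\tilde{G}(\bar{\boldsymbol{x}}) = \int_{\mathcal{A}(\bar{\boldsymbol{x}})} F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) \tag{2.15}$$

be the identified object and let $P(\bar{\boldsymbol{x}}) = \int_{\mathcal{A} \cap \mathcal{A}(\bar{\boldsymbol{x}})^c} dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$. Since

$$G(\bar{\boldsymbol{x}}) = \tilde{G}(\bar{\boldsymbol{x}}) + \int_{\mathcal{A} \cap \mathcal{A}(\bar{\boldsymbol{x}})^c} F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$$

and since $0 \leq F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) \leq 1$, the above equation implies that $G(\bar{\boldsymbol{x}}) \in [\tilde{G}(\bar{\boldsymbol{x}}), \tilde{G}(\bar{\boldsymbol{x}}) + P(\bar{\boldsymbol{x}})]$; that is, $G(\bar{\boldsymbol{x}})$ is set-identified. The bounds on $G(\bar{\boldsymbol{x}})$ are sharp since there are no restrictions on $E(y_t \mid \boldsymbol{x}_t = \bar{\boldsymbol{x}}, \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t) = F(\bar{\boldsymbol{x}}' \boldsymbol{\varphi}; \hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}}_t)$ imposed by the data.

It follows then that the APEs are also set-identified when the support requirement in lemma 3 is not met. To derive bounds for the APE of changing $x_k$ from $\bar{x}_k$ to $\bar{x}_k + \Delta_k$, let us first denote $(\bar{\boldsymbol{x}}_{-k}', \bar{x}_k + \Delta_k)'$ by $\bar{\boldsymbol{x}}_{\Delta_k}$. Since the ASF, $G(\bar{\boldsymbol{x}}_{\Delta_k})$, at $\bar{\boldsymbol{x}}_{\Delta_k}$ is partially identified, where $G(\bar{\boldsymbol{x}}_{\Delta_k}) \in [\tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}), \tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}) + P(\bar{\boldsymbol{x}}_{\Delta_k})]$, the APE of $x_k$ at $\bar{\boldsymbol{x}}$, $\Delta G(\bar{\boldsymbol{x}}) / \Delta_k$, lies in the interval

$$\left[ \frac{\tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}) - \tilde{G}(\bar{\boldsymbol{x}}) - P(\bar{\boldsymbol{x}})}{\Delta_k},\; \frac{\tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}) + P(\bar{\boldsymbol{x}}_{\Delta_k}) - \tilde{G}(\bar{\boldsymbol{x}})}{\Delta_k} \right], \tag{2.16}$$

where the sharpness of the bounds on the APE derives from that of the bounds on the ASF.

Once we have the consistent estimates, $\hat{\Theta}$, of $\Theta$, to estimate the bounds on the APE of a variable, $x_k$, we first, as in IN, estimate the support of the estimates, $(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it})$, given $\bar{\boldsymbol{x}}$ as

$$\hat{\mathcal{A}}(\bar{\boldsymbol{x}}) = \{\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} : \hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} \mid \bar{\boldsymbol{x}}) \geq \delta(\bar{p}),\; (\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it}) \in \hat{\mathcal{A}}\},$$

where $\hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} \mid \bar{\boldsymbol{x}})$, which is the estimate of the conditional density of $(\hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it})$ given $\bar{\boldsymbol{x}}$, is obtained by employing the method of estimating the conditional density function in Hall et al. (2004). $\hat{\mathcal{A}}$ is an estimator of the support, $\mathcal{A}$, containing all $(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it})$. In the above, $\delta(\bar{p})$ is the trimming parameter and, as discussed in Cadre et al. (2013), is obtained as a solution to the following equation:

$$\int_{\{\hat{f} \geq \delta\}} \hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} \mid \bar{\boldsymbol{x}}) = \bar{p},$$

where $\bar{p}$ is a fixed probability level. When $\bar{p}$ is close to 1, the upper level set, $\{\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} : \hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} \mid \bar{\boldsymbol{x}}) \geq \delta(\bar{p})\}$, is close to the support of the conditional distribution.

Footnote: R's 'np' package developed by Hayfield and Racine (2008) implements the method. The package's 'npcdens' function computes kernel conditional density estimates of $p$ variables conditional on $q$ variables.

To estimate $\delta(\bar{p})$, let

$$\hat{H}(\gamma) = \frac{1}{NT} \sum_{i,t} 1\{\hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it} \mid \bar{\boldsymbol{x}}) \leq \gamma\}$$

be the estimate of $H(\gamma) = \Pr(f(\hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it} \mid \bar{\boldsymbol{x}}) \leq \gamma)$, and let the estimate of the $(1 - \bar{p})$-quantile of the law of $f(\hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it} \mid \bar{\boldsymbol{x}})$ be $\gamma(\bar{p}) = \inf\{\gamma \in \mathbb{R} : \hat{H}(\gamma) \geq 1 - \bar{p}\}$. $\gamma(\bar{p})$ can be computed by considering the order statistic induced by the sample $\hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_1, \hat{\hat{\boldsymbol{\epsilon}}}_{11} \mid \bar{\boldsymbol{x}}), \ldots, \hat{f}(\hat{\hat{\boldsymbol{\alpha}}}_N, \hat{\hat{\boldsymbol{\epsilon}}}_{NT} \mid \bar{\boldsymbol{x}})$. Cadre et al. note that whenever $H(\gamma)$ is continuous at $\gamma(\bar{p})$, then $\delta(\bar{p}) = \gamma(\bar{p})$. Following IN, for the application in section 4, we set $\bar{p} =$ […] $\hat{\mathcal{A}}(\bar{\boldsymbol{x}})$, we can estimate $\tilde{G}(\cdot)$ and $P(\cdot)$ in (2.15) at $\bar{\boldsymbol{x}}$ as

$$\hat{\tilde{G}}(\bar{\boldsymbol{x}}) = \frac{1}{NT} \sum_{i,t} \Phi(\bar{\boldsymbol{x}}' \hat{\boldsymbol{\varphi}} + \hat{\boldsymbol{\varphi}}_\alpha \hat{\hat{\boldsymbol{\alpha}}}_i + \hat{\boldsymbol{\varphi}}_\epsilon \hat{\hat{\boldsymbol{\epsilon}}}_{it})\, 1[(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it}) \in \hat{\mathcal{A}}(\bar{\boldsymbol{x}})] \quad \text{and} \quad \hat{P}(\bar{\boldsymbol{x}}) = \frac{1}{NT} \sum_{i,t} 1[(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it}) \notin \hat{\mathcal{A}}(\bar{\boldsymbol{x}})] \tag{2.17}$$

respectively. Now that we can estimate $\tilde{G}(\cdot)$ and $P(\cdot)$ at any $\boldsymbol{x}$, the bounds on the APE in (2.16), too, can be computed.

In Appendix D we derive the asymptotic covariance matrix of the second-stage coefficient estimates when the first stage estimation involves estimating a system of regressions using the method in Biørn (2004).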
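The trimming step and the bound estimators in (2.15) to (2.17) can be sketched in a few lines. The following is only an illustrative sketch, not the implementation used in the paper: it assumes that the conditional density estimates and the probit indices evaluated at $\bar{\boldsymbol{x}}$ have already been computed elsewhere (for instance with a kernel conditional density estimator), and the function names `level_set_threshold`, `asf_bounds` and `ape_bounds` are hypothetical.

```python
import math
from statistics import NormalDist


def level_set_threshold(f_hat, p_bar):
    """gamma(p_bar) = inf{g : H_hat(g) >= 1 - p_bar}, where H_hat is the
    empirical CDF of the conditional density estimates f_hat."""
    ordered = sorted(f_hat)
    n = len(ordered)
    # Rank of the (1 - p_bar)-quantile; the small tolerance guards against
    # floating-point noise in n * (1 - p_bar).
    k = max(1, math.ceil(n * (1.0 - p_bar) - 1e-9))
    return ordered[k - 1]


def asf_bounds(index, f_hat, p_bar):
    """Return (G_tilde, G_tilde + P), the estimated bounds on the ASF.

    index[j] -- probit index at x_bar for observation j (assumed precomputed)
    f_hat[j] -- estimated conditional density of the controls given x_bar
    """
    delta = level_set_threshold(f_hat, p_bar)
    n = len(index)
    cdf = NormalDist().cdf
    # Average Phi(index) over observations inside the estimated support.
    g_tilde = sum(cdf(v) for v, f in zip(index, f_hat) if f >= delta) / n
    # Share of observations trimmed out of the estimated support.
    p_out = sum(1 for f in f_hat if f < delta) / n
    return g_tilde, g_tilde + p_out


def ape_bounds(bounds_base, bounds_shift, step):
    """Interval (2.16) from ASF bounds at x_bar and at x_bar shifted by step > 0."""
    lo_base, hi_base = bounds_base
    lo_shift, hi_shift = bounds_shift
    return (lo_shift - hi_base) / step, (hi_shift - lo_base) / step
```

A call such as `asf_bounds(index, f_hat, p_bar=0.8)` returns the lower and upper bound on $G(\bar{\boldsymbol{x}})$; the value 0.8 is purely illustrative and is not the trimming level used in the application.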
Given the covariance matrix of the second-stage coefficient estimates, we also derive the confidence intervals (CIs) proposed in Imbens and Manski (2004) for the partially identified APEs.

However, first, because the expressions needed to compute the covariance matrices might be computationally involved, and secondly, because new expressions for the covariance matrix of the second-stage coefficient estimates will have to be derived when a different estimator for the first stage reduced form is employed, we suggest that a bootstrap procedure be employed to approximate the variance of the estimated coefficients. To obtain bootstrap standard errors for control function methods, both parts of the estimation are included for every bootstrap sample (see Wooldridge, 2015), where resampling, as in PW, can be done at the level of the cross-sectional unit.

In this section we discuss the results of the Monte Carlo (MC) experiments, which we conduct to analyze the finite sample behaviour of our model and to compare the estimates of APEs from our estimator and from alternative estimators to the true measures of the APEs. Since we want to compare the performance of our estimator to the performances of alternative estimators with set-ups similar to ours, such as that of PW, which has a single endogenous regressor, we first conduct the simulation exercise with one endogenous variable, $x$. And since our method allows for multiple endogenous regressors, we also experiment with two endogenous regressors, $(x_1, x_2)$.

In the first simulation exercise, we consider the following data generating process (DGP):
$y_{it} = 1$ if $\varphi x_{it} + \theta_i + \zeta_{it} > 0$ and 0 otherwise, (3.1)
where
$x_{it} = \pi z_{it} + \alpha_i + \epsilon_{it}$, i =
1, . . . , n , t =
1, . . . , 5, (3.2)and where z it is the instrument. We assume that ϕ = − π = α i and θ i to be correlated with the vector of instruments, Z i = ( z i , . . . , z i ) ′ . The z it ’s are i.i.d and marginally distributed as N [ σ z ] , where σ z =
5. The variables, Z i , α i , and θ i , are drawn from the following distribution: ( Z ′ i , α i , θ i ) ′ ∼ N h Σ z αθ i , where σ α = σ θ = ρ z α = ρ z θ = ρ αθ = α i , the conditional correlationbetween z it and θ i , ρ z θ | α = ρ z θ − ρ z α ρ αθ =
0, which, in this case, also implies thatconditional on α i , θ i ⊥⊥ Z i | α i .In accordance with assumption AS 1, we assume that ( ζζζ i , ǫǫǫ i ) ⊥⊥ ( Z ′ i , α i , θ i ) ′ , and draw ( ζ it , ǫ it ) from N h Σ ζǫ i , where the elements σ ζ , σ ǫ , and ρ ζǫ of Σ ζǫ are assumed as σ ζ = σ ǫ = ρ ζǫ = θ i ⊥⊥ Z i | α i and ( Z ′ i , α i , θ i ) ′ ⊥⊥ ( ζζζ i , ǫǫǫ i ) ,together satisfy AS 2 (a) , and since ( ζ it , ǫ it ) are i.i.d., they also satisfy AS 2 (b).From this DGP we generate ( Z ′ i , α i , θ i ) ′ and ( ζ it , ǫ it ) of varying size, n , with t fixed at t =
5. We then discretized z it to take value 1 if z it > ( Z ′ i , α i , θ i ) ′ and ( ζ it , ǫ it ) , we generate x it according to (3.2) and then y it accordingto (3.1).We have showed that by estimating y it = { ϕ x it + ϕ α ˆ α i + ϕ ǫ ˆ ǫ it + η it > } , (3.3)where ˆ α i and ˆ ǫ it are the control variables, as a probit model we can obtain consistentestimates of the ASF and the APE. The control variables, ˆ α i and ˆ ǫ it , are obtained fromthe estimates of the reduced form equation (3.2), augmented with ¯ π ¯ z i = ¯ π T − ∑ Tt = z it ,which is estimated as a random effect model by the method of MLE. The APE at ¯ x after estimating equation (3.3) could be obtained by averaging1 ∆ x Φ (cid:18) ϕ x it + ϕ α ˆ α i + ϕ ǫ ˆ ǫ it σ η (cid:19) (3.4)over ˆ α and ˆ ǫ at x it = ¯ x + ∆ x and x it = ¯ x and taking the difference. In the above, σ η isthe variance of η it in equation (3.3) and Φ is the standard cumulative normal densityfunction.Now, while in practice the heterogeneity terms, ( θ i , ζ it ) and ( α i , ǫ it ) , are unobserved, inMC experiments we do know what these values are. By averaging 1 { x it ϕ + θ i + ζ it > } over ( θ i , ζ it ) at x it = ¯ x to obtain G ( ¯ x ) and the same at x it = ¯ x + ∆ x to obtain Given the DGP assumptions, it can be verified that ζζζ i ⊥⊥ Z i | α i , ǫǫǫ i , θ i and θ i ⊥⊥ Z i | α i , ǫǫǫ i ; these two thenimply that θ i , ζζζ i ⊥⊥ Z i | α i , ǫǫǫ i or equivalently θ i , ζζζ i ⊥⊥ X i | α i , ǫǫǫ i .17 ( ¯ x + ∆ x ) , we could compute the true measure of APE, ∂ G ( x it ) ∂ x , at x it = ¯ x by computing G ( ¯ x + ∆ x ) − G ( ¯ x ) ∆ x . For the exercise, we chose ¯ x = ∆ x = ( θ i , ζ it ) , there is some variability in the values of ∂ G ( ¯ x ) ∂ x over thereplications; the average over the replications for every sample size is reported in thetables containing the results. For notational convenience we will denote the true APEby ∂ G ( ¯ x ) ∂ x . 
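The computation of the true APE described above, averaging the indicator over the heterogeneity draws at $\bar{x}$ and at $\bar{x} + \Delta x$ and differencing, can be sketched as follows. This is a minimal illustration rather than the simulation code behind the reported results: the function name `true_ape` and all numerical values are hypothetical, and $\theta_i$ and $\zeta_{it}$ are drawn as independent normals.

```python
import random


def true_ape(phi, x_bar, dx, sd_theta, sd_zeta, draws=200_000, seed=0):
    """Finite-difference APE, (G(x_bar + dx) - G(x_bar)) / dx, where G(x) is
    the average of 1{phi * x + theta + zeta > 0} over simulated draws."""
    rng = random.Random(seed)
    # Composite heterogeneity theta + zeta; the same draws are used at both
    # evaluation points so that simulation noise largely cancels.
    u = [rng.gauss(0.0, sd_theta) + rng.gauss(0.0, sd_zeta) for _ in range(draws)]
    g_base = sum(phi * x_bar + e > 0 for e in u) / draws
    g_shift = sum(phi * (x_bar + dx) + e > 0 for e in u) / draws
    return (g_shift - g_base) / dx
```

Under these illustrative normality assumptions the ASF has the closed form $G(x) = \Phi(\varphi x / \sqrt{\sigma_\theta^2 + \sigma_\zeta^2})$, which gives a simple check on the simulated value.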
Estimates of the APE from any of the models considered in this section will be denoted by $\widehat{\partial G(\bar{x}) / \partial x}$.

One of the alternative estimators whose set-up is similar to ours is the method proposed by PW. To address the issue of endogeneity, PW also propose a two-step control function method. They first assume that $\theta_i = E(\theta_i \mid Z_i) + \tau_i = \bar{\pi}_\theta \bar{z}_i + \tau_i$ and $\alpha_i = E(\alpha_i \mid Z_i) + a_i = \bar{\pi}_\alpha \bar{z}_i + a_i$, where $\bar{z}_i = T^{-1} \sum_{t=1}^{T} z_{it}$. Given the assumptions, they write the triangular system in (3.1) and (3.2) as

$$y_{it} = 1\{\varphi x_{it} + \bar{\pi}_\theta \bar{z}_i + \tau_i + \zeta_{it} > 0\} \tag{3.5}$$

$$x_{it} = \pi z_{it} + \bar{\pi}_\alpha \bar{z}_i + \upsilon^{PW}_{it}, \tag{3.6}$$

where $\upsilon^{PW}_{it} = a_i + \epsilon_{it}$. They then make the control function assumption that $\tau_i + \zeta_{it} \perp\!\!\!\perp x_{it} \mid \upsilon^{PW}_{it}$. This allows them to estimate the APE at $x_{it} = \bar{x}$ by averaging

$$\frac{1}{\Delta x} \Phi\left( \frac{\varphi x_{it} + \bar{\pi}_\theta \bar{z}_i + \rho \upsilon^{PW}_{it}}{\sigma_{PW}} \right)$$

over $\bar{\pi}_\theta \bar{z}$ and $\upsilon^{PW}$ at $x_{it} = \bar{x} + \Delta x$ and $x_{it} = \bar{x}$ and taking the difference. In the above, $\rho$ is the population regression coefficient of $\tau_i + \zeta_{it}$ on $\upsilon^{PW}_{it}$, and $\upsilon^{PW}_{it}$ is obtained as the residual after estimating (3.6) in the first stage. The conditional distribution of $\tau_i + \zeta_{it}$ given $\upsilon^{PW}_{it}$ is assumed to follow a normal distribution with variance $\sigma^2_{PW}$. If their method gives consistent estimates of the APE, then it must be that the above measure is equal to $\partial G(\bar{x}) / \partial x$.

In Chamberlain's correlated random effects (CRE) probit and in Chamberlain's conditional logit (CL), $x_{it}$ is assumed to be independent of the idiosyncratic term, $\zeta_{it}$. While in the CRE probit model $E(\theta_i \mid X_i)$ is specified, in the CL model the distribution of $\theta_i$ is left unspecified. Assuming that $\theta_i = \bar{\pi}_\theta \bar{x}_i + \tau_i$, where $\bar{\pi}_\theta \bar{x}_i$ is the specification for $E(\theta_i \mid X_i)$, the structural equation for the CRE probit model is given by $y_{it} = 1\{\varphi x_{it} + \bar{\pi}_\theta \bar{x}_i + \tau_i + \zeta_{it} > 0\}$, where $\tau_i = \theta_i - E(\theta_i \mid X_i)$.
τ i + ζ it is assumed independent of X i and is distributed normally with variance σ CRE .The CRE probit model is estimated as a probit model by pooling the data. If the CREprobit model, too, gives consistent measure of APE then it has to be that1 ∆ x (cid:20) Z Φ (cid:18) ϕ ( x it + ∆ x ) + ¯ π θ ¯ x i σ CRE (cid:19) dF ( ¯ π θ ¯ x ) − Z Φ (cid:18) ϕ x it + ¯ π θ ¯ x i σ CRE (cid:19) dF ( ¯ π θ ¯ x ) (cid:21) = ∂ G ( x it ) ∂ x ,where the LHS is the measure of APE of x at x it pertaining to the CRE probit model. he structural equation for the CL model is same as equation (3.1), where ζ it follows alogistic distribution. The APE of x at x it for the CL model is1 ∆ x (cid:20) Z Λ ( x it + ∆ x , θ i ) dF ( θ ) − Z Λ ( x it , θ i ) dF ( θ ) (cid:21) ,where Λ ( x it , θ i ) = Pr ( y it = | x it , θ i ) = exp ( ϕ x it + θ i ) + exp ( ϕ x it + θ i ) . Once we have estimated ϕ byestimating the CL model, we can estimate the APE by averaging Λ ( x it + ∆ x , θ i ) and Λ ( x it , θ i ) over θ i and taking the difference.Table 1 provides the results for various sample size, n , with m = method, to the alternative estimators considered above.Table 1: Performance of the APE, ∂ G ( ¯ x = ) ∂ x , for alternative estimators. True APE CRECF Method Papke and Wooldridge Chamberlain’s CRE Probit Chamberlain’s LogitMean RMSE Mean RMSE Mean RMSE Mean RMSE Mean N = 200 -.0931 .0445 -.0920 .0724 -.0354 .0561 -.0578 .0473 -.1070 N = 500 -.0944 .0283 -.0932 .0654 -.0353 .0462 -.0578 .0310 -.1061 N = 1000 -.0935 .0203 -.0936 .0616 -.0353 .0408 -.0579 .0238 -.1057 N = 2000 -.0934 .0143 -.0936 .0597 -.0353 .0381 -.0579 .0189 -.1057 N = 5000 -.0939 .0088 -.0936 .0592 -.0353 .0370 -.0579 .0144 -.1053RMSE is Root Mean Square Error and Mean is the mean value of m = In Figure 1 we plot the densities of m = d ∂ G ( x it ) / ∂ x − ∂ G ( x it ) / ∂ x ,at x it =
1, obtained for the four estimation methods for different sample sizes. It can be seen from the figure that for each of the alternative estimators, the APE of $x$ is estimated with a bias, which persists as the sample size grows larger. Thus, even as the variance of $\widehat{\partial G(x_{it}) / \partial x} - \partial G(x_{it}) / \partial x$ for each of the alternative methods decreases, the root mean square error (RMSE) for the alternative methods in Table 1 decreases quite slowly.

Figure 1: Comparison with Alternative Estimators: Density of $\widehat{\partial G(\bar{x}) / \partial x} - \partial G(\bar{x}) / \partial x$ at $\bar{x}$. Panels: (a) N = 500, (b) N = 1000, (c) N = 2000, (d) N = 5000. Curves: CRECF Method, Chamberlain's Logit, Chamberlain's Probit, Papke and Wooldridge.

Footnote: The acronym derives from the fact that the control functions are based on correlated random effects in the reduced form equations.

Since the CL and CRE probit models do not account for the endogeneity of $x_{it}$ with respect to the transitory errors, $\zeta_{it}$, the methods can give biased results. Unexpectedly, however, the method proposed by PW, which tries to account for the correlation between $Z_i$ and $\theta_i$ and the correlation of $x_{it}$ with both $\theta_i$ and $\zeta_{it}$, gives the least satisfactory results. This suggests that under a more general DGP, as in our MC experiments, their control function assumption that $\tau_i + \zeta_{it} \perp\!\!\!\perp X_i \mid \upsilon^{PW}_{it}$, where both $\tau_i + \zeta_{it}$ and $\upsilon^{PW}_{it}$ are assumed to be independent of $Z_i$, is, as discussed in remark 3 and proposition 1, likely to be violated.

To validate the claims made in remark 3, we conduct some MC experiments to compare the APE when $V_{PW,i} \equiv \{\upsilon_{PW,i1}, \ldots, \upsilon_{PW,iT}\}$ is used as a control function as against when only $\upsilon_{PW,it}$ is used as a control function. In Figure 2 below, we have plotted the density of the difference between the estimated and the true APEs, $\widehat{\partial G(x_{it}) / \partial x} - \partial G(x_{it}) / \partial x$, where the estimated APEs are obtained by varying the control functions and the instruments in PW's model. As can be seen in the figure, when the instrument is continuous with a large support, employing $V_{PW,i}$ yields consistent estimates of the APE, whereas if only $\upsilon_{PW,it}$ is employed as control function, then we get estimates that are biased. When the same instrument is discretized to take values 1 and 0, we get biased estimates when either $V_{PW,i}$ or $\upsilon_{PW,it}$ is employed as control function. This is because when the instrument is discrete, even though $V_{PW,i}$ is the appropriate control function, the APE is not point but partially identified.

That the APEs when using the traditional control function, as in PW, may not be point identified when the instrument, $z_{it}$, is binary, even when $x_{it}$ has a large support, can be seen in Figure 3 (c), which plots the level sets of the kernel estimates of the joint density of $(x, \upsilon^{PW}_{it})$. The figure suggests that the common support requirement for point identification may be satisfied only over a small range of $x$ values in PW's model. By contrast, from Figure 3 (a) and (b), we can deduce that the support of the conditional distribution of $(\hat{\epsilon}_{it}, \hat{\alpha}_i)$ given $x$ is almost the same for large ranges of $x$.
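The support problem with a binary instrument can be made concrete with a stripped-down simulation. The sketch below is not PW's model: it uses the simpler traditional control function $\upsilon = x - \pi z$ with a single binary $z$, and the function name `cf_given_x` and all parameter values are hypothetical. Conditional on $x$ being close to $\bar{x}$, the control function can only fall near $\bar{x}$ (when $z = 0$) or near $\bar{x} - \pi$ (when $z = 1$), so its conditional support is far smaller than its unconditional support.

```python
import random


def cf_given_x(x_bar, pi, half_width, n=20_000, seed=0):
    """Simulate (z, x), keep draws with x within half_width of x_bar, and
    return (z, v) pairs, where v = x - pi * z is the control function."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        z = 1 if rng.gauss(0.0, 1.0) > 0 else 0   # binary instrument
        x = pi * z + rng.gauss(0.0, 2.0)          # reduced form with composite error
        if abs(x - x_bar) <= half_width:
            kept.append((z, x - pi * z))
    return kept


pairs = cf_given_x(x_bar=1.0, pi=1.5, half_width=0.05)
for z in (0, 1):
    values = [v for zz, v in pairs if zz == z]
    # For each z the retained control-function values sit in a narrow band.
    print(z, min(values), max(values))
```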
Figure 2: Density of the difference between the estimated and the true APEs, $\widehat{\partial G(\bar{x}) / \partial x} - \partial G(\bar{x}) / \partial x$, at $\bar{x}$. The estimated APEs, $\widehat{\partial G(\bar{x}) / \partial x}$, are obtained by varying the control functions and the instruments in PW's model. Sample size: N = 5000. Curves: continuous instrument with CF $\upsilon_{PW,it}$; binary instrument with CF $\upsilon_{PW,it}$; continuous instrument with CF $V_{PW,i}$; binary instrument with CF $V_{PW,i}$.

Figure 3: Level curves of the estimated joint density of $x$ and various control functions. Panels: (a) joint density of $\hat{\epsilon}_{it}$ and $x$; (b) joint density of $\hat{\alpha}_i$ and $x$; (c) joint density of $\upsilon_{it}$ and $x$.

As our method allows for multiple endogenous regressors, we also conduct a simulation exercise with two endogenous regressors. The two instruments, $\boldsymbol{z}_{it} = (z_{1it}, z_{2it})$,
2, and ρ z z = Z ′ i ≡ { zzz i , . . . , zzz i } and the individual effects are correlated, and fol-low the joint distribution: ( Z ′ i , α i , α i , θ i ) ′ ∼ N h Σ z αθ i , where σ α = σ α = σ θ = ρ z α = ρ z α = ρ z α = ρ z α = ρ α α = ρ z θ = ρ z θ = ρ α θ = ρ α θ = ααα i = ( α i , α i ) , the conditional correlation between zzz it and θ i is 0. Havinggenerated the data, we then discretize the instruments to take values 0 and 1: z it takesvalue 1 if it is non-negative and 0 otherwise, while z it takes value 1 if it is greater thanor equal 1 and 0 otherwise. The idiosyncratic error terms ( ζ it , ǫ it , ǫ it ) are drawn fromN h Σ ζǫ i , where the elements of Σ ζǫ are assumed as σ ζ = σ ǫ = σ ǫ = ρ ζǫ = ρ ζǫ = ρ ǫ ǫ = Z i and the error terms in place, we next generate x it , x it and y it according to: x it = − z it + .05 z it + α i + ep it x it = .025 z it + .75 z it + α i + ep it y it = {− x it + x it + θ i + ζ it > } .We compute the APEs at ¯ x = x = ∆ x = ∆ x = n , with m = Performance of the APEs, ∂ G ( ¯ x = x = ) ∂ x and ∂ G ( ¯ x = x = ) ∂ x , for alternativeestimators. True APE CRECF Method Chamberlain’s CRE Probit Chamberlain’s LogitMean RMSE Mean RMSE Mean RMSE Mean N = 500 x -0.3328 0.0538 -0.3385 0.0761 -0.3874 0.1114 -0.4294 x N = 1000 x -0.3325 0.0383 -0.3395 0.0659 -0.3869 0.103 -0.428 x N = 2000 x -0.331 0.0282 -0.3402 0.0618 -0.3869 0.1009 -0.4281 x N = 5000 x -0.3316 0.0192 -0.3407 0.0579 -0.387 0.098 -0.4281 x m = The results therefore imply that assuming ( ˆ ǫǫǫ it , ˆ ααα i ) as control function for identifyingthe ASF and APE, may not be restrictive, and that the developed method can yieldconsistent result.To conclude, this finite sample study establishes the following:(1) Our method performs well with sample sizes frequently encountered in practice.
(2) It performs better than the alternative estimators with set-ups similar to ours.
(3) Employing $V_{PW}$, instead of $\boldsymbol{\upsilon}^{PW}_t$, as a control function yielded consistent APEs when the instrument had a large support. This suggests that when $Z$ is independent of the error terms and the instruments, $\boldsymbol{z}_t$, have a large support, then employing $V$ as a control function could yield consistent APEs.

Child labor is a pressing concern in all developing countries. According to the International Labour Office's 2016 estimates, worldwide, 152 million children in the age group of 5 to 17 years are victims of child labor, 62 million of whom are in the Asia-Pacific region. Conditions of child labor can vary. Many children work in hazardous industries that take a toll on their health. Moreover, when children work, they forego education and human capital accumulation, with deleterious effects on their future earning potential. Furthermore, since there is a positive externality to human capital accumulation, as argued by Baland and Robinson (2000) (BR), the social return to such accumulation, too, is not realized.

There is a huge literature, both empirical and theoretical, that has sought to understand the mechanism underlying child labor. What has emerged is that poverty (Basu and Van 1998 & BR), along with imperfections in the labor and land markets (Bhalotra and Heady, 2003; Dumas, 2007; Basu et al., 2010) and the capital market (BR), are the major causes of child labor. BR show that child labor increases when endowments of parents are low, and that when capital market imperfections exist and parents cannot borrow, child labor becomes inefficiently high.

Basu et al. (2010) (BDD) point out that papers like Bhalotra and Heady (2003) (BHy) and Dumas (2007) show that in some developing countries the amount of work the children of a household do increases with the amount of land possessed by the household. Since land is usually strongly correlated with a household's income, this finding seems to challenge the presumption that child labor involves the poorest households. They argue that these perverse findings are a facet of labor and land market imperfections, and that in developing countries, poor households, in order to escape poverty, want to send their children to work but are unable to do so because they have no access to labor markets close to their home. In such a situation, if the household comes to acquire some wealth, say land, its children, if only to escape penury, will start working. However, if the household's land ownership continues to rise, then beyond a point the household will be well-off enough and it will not want to make its children work.

BHy argue that on one hand there is the negative wealth effect of large landholding on child labor, whereby large landholdings generate higher income and, thereby, make it easier for the household to forgo the income that child labor would bring. On the other hand there is the substitution effect, where due to labor market imperfections, owners of land who are unable to productively hire labor on their farms have an incentive to employ their children. Since the marginal product of child labor is increasing in farm size, this incentive is stronger amongst larger landowners. The value of work experience will also tend to increase in farm size if the child stands to inherit the family farm. Furthermore, they argue that large landowners who cannot productively hire labor would want to sell their land rather than employ their children on it, but, because of land market failure, are unable to do so.
Thus, land market failure reinforces labor market failure.

Cockburn and Dostie (2007) (CD), in their analysis of child labor in Ethiopia, find that in the presence of labor market imperfections, all assets need not be child labor enhancing. They find that certain productive assets that enable an increase in total family income may not necessarily increase child labor. They show that assets such as oxen and ploughs that are operated by adults decrease child labor. To test this hypothesis, in our empirical specification we include an index of productive farm assets.

Now, while land and labor market imperfections may exist in developing countries, the extent of imperfection may not be uniform across all countries, or regions within a country. Hence, the relationship between child labor and different kinds of assets, such as landholding or agrarian assets, is an empirical question. The question is important because policy implications could be different under different relationships between various kinds of assets and child labor. For example, if one were to confirm the findings in BHy and BDD, then if monetary transfers are used to increase landholding, or land redistribution is done in favor of the poor, child labor may in fact increase. On the other hand, when monetary transfers are used to increase agrarian assets, then, if an inverse relationship between agrarian assets and child labor holds, such transfers could reduce the incidence of child labor.

In our data, we find non-agricultural income to be much higher than agricultural income (Table 4). This suggests that land is not the only source of income as in BHy and BDD. BDD assume that land is the only source of income and derive a regression equation where household income is left out.
Since non-agricultural income constitutes a major portion of total household income, we also control for household income.

We also find that, over time, the land size distribution has become more unequal (Table 4), which indicates that a land market exists in the regions from where the data has been collected. Now, if a land market exists, even if imperfect, then land owned by households is unlikely to be exogenous to a household's labor supply, as it is in BHy and BDD, where land is mainly inherited; rather, it is likely to be determined endogenously along with the household's labor supply decisions, including those concerning its children. However, endogeneity could also arise due to omitted variables. To account for the endogeneity of landholding along with that of productive assets and household income, we employ the method developed in the paper.

Data and Empirical Model

We conduct our empirical analysis at the level of the child using two waves, 2006-07 and 2009-2010, of the data from the Young Lives Study (YLS), a panel study from six districts of the state of United Andhra Pradesh (henceforth AP) in India. We restrict our sample to children in the age group of 5 to 14 years in 2007 living in rural areas. Finally, excluding children for whom relevant information was missing, we were left with 2458 children. Table 3 and Table 4 have the relevant descriptive statistics.

Table 3:
Work Status by Age Group
| Age Group (Year 2007) | Not Working | Working | Total | Age Group (Year 2010) | Not Working | Working | Total |
|---|---|---|---|---|---|---|---|
| 5 to 7 years | 45.25 | 5.02 | 50.27 | 8 to 10 years | 31.25 | 19.03 | 50.27 |
| 8 to 14 years | 22.88 | 26.85 | 49.73 | 11 to 17 years | 14.98 | 34.75 | 49.73 |
| Total | 68.13 | 31.87 | 100.00 | Total | 46.23 | 53.77 | 100.00 |

The figures are in percentage. Total number of children in each period: 2458
Children were asked how much time they spent in the reference period (a typical day in the last week) doing (a) wage labor, (b) non-wage labor, or (c) domestic chores. If the answer was a positive number of hours for any one of the activities, then the binary variable DWORK was assigned value 1, and 0 otherwise. The major component of work (not reported here) is due to domestic chores. While both domestic and non-domestic work registered an increase over the years, the increase in the proportion of children doing non-domestic work was higher.

In Table 4 we find that while land ownership has become more unequal, the average size of land owned increased over the years. The Farming Asset Index, which too increased over the years, was constructed by Principal Component Analysis of several variables, each of which indicates the number of farming related assets of each kind that the household owns.

Let $y_{it}$ be the binary variable that takes value 1 if the parents of child $i$ decide that the child works and 0 otherwise. The decision is modelled as in equation (2.1), where $y^*_{it}$ is the amount of time devoted to work by child $i$ in period $t$. The set of endogenous variables, $\mathbf{x}_{it}$, includes income ($INCOME_{it}$) of the household to which child $i$ belongs, size of the land holdings ($LAND_{it}$), and the index of productive farm assets ($ASSET_{it}$).

Footnote: In 2014 the north-western portion of the then Andhra Pradesh was separated to form the new state of Telangana.

Footnote: Wage labor involves activities for pay, work done for money outside of the household, or work done for someone not a part of the household. Non-wage labor includes tasks on the family farm, cattle herding (household and/or community), other family business, shepherding, and piecework or handicrafts done at home (not just farming). Domestic work includes tasks and chores such as fetching water, firewood, cleaning, cooking, washing, and shopping.

Footnote: Farming assets consist of agriculture tools, carts, pesticide pumps, ploughs, water pumps, threshers, tractors, and other farm equipment.

Table 4: Descriptive Statistics

| Variable | 2007 Mean | 2007 SD | 2010 Mean | 2010 SD |
|---|---|---|---|---|
| Child characteristics | | | | |
| Sex (Male=1, Female=0) | 0.52 | 0.50 | 0.52 | 0.50 |
| Age (yrs.) | 8.07 | 2.97 | 11.07 | 2.97 |
| Household characteristics | | | | |
| Parents participated in NREGS (Yes=1 & No=0) | 0.33 | 0.47 | 0.66 | 0.47 |
| Total number of days parents worked in NREGS | 9.21 | 21.44 | 36.00 | 48.10 |
| Land Owned (acre) | 2.32 | 3.42 | 3.86 | 43.53 |
| Farm Asset Index | -0.13 | 0.98 | 0.22 | 1.46 |
| Gini Coefficient for Land Owned | 0.62 | | 0.74 | |
| Total Income of Household (in thousand ₹) | 30.91 | 34.35 | 48.88 | 60.24 |
| Annual non-agricultural income (₹) | 20787 | 35813 | 29013 | 62225 |
| Annual agricultural income (₹) | 5060 | 23319 | 9936 | 42746 |
| Does a household own farm assets (Yes=1 & No=0) | 0.69 | 0.46 | 0.91 | 0.29 |
| Number of farm assets | 4.70 | 11.06 | 6.29 | 9.01 |
| Engineered Road to the Locality (Yes=1 & No=0) | 0.32 | 0.47 | 0.58 | 0.49 |
| Drinkable Water in the Locality (Yes=1 & No=0) | 0.87 | 0.34 | 0.86 | 0.34 |
| National Bank in the Locality (Yes=1 & No=0) | 0.23 | 0.41 | 0.08 | 0.27 |
| Hospital in the Locality (Yes=1 & No=0) | 0.37 | 0.89 | 0.38 | 0.48 |
| Community (Mandal) characteristics | | | | |
| Total NREGS amount sanctioned (in million ₹) | 7.25 | 8.30 | 20.19 | 19.17 |

Total number of children/observations in each period: 2458

To address the issues of endogeneity and heterogeneity, we employ the two-step control function methodology developed in the paper, where the control functions, $\hat{\boldsymbol{\alpha}}_i' = (\hat{\alpha}_{INCOME,i}, \hat{\alpha}_{LAND,i}, \hat{\alpha}_{ASSET,i})$ and $\hat{\boldsymbol{\epsilon}}_{it}' = (\hat{\epsilon}_{INCOME,it}, \hat{\epsilon}_{LAND,it}, \hat{\epsilon}_{ASSET,it})$, are obtained from the estimates of the first stage reduced form equations (2.7). After augmenting the structural equation (2.1) with the control functions, we get the modified structural equation (2.9), which we estimate as a probit model.

To identify the impact of the endogenous variables on the parents' decision to make their children work, we employ the following instruments: (1) NREGS, which is the total sanctioned amount at the mandal (region) level at the beginning of the financial year (in 2008-09 prices) to support an employment guarantee scheme; (2)
CASTE, caste (social group) of the child; and (3) a set of four indicator variables that capture the level of infrastructural development in the household's locality/settlement.

Footnote: For many children, as we know, the optimal choice of $y^*_{it}$ is the corner solution, $y^*_{it} = 0$. For corner solution outcomes, we are interested in features of the distribution such as $\int \Pr(y^*_{it} > 0 \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it})\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}})$ and $\int E(y^*_{it} \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it})\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}})$, where $\mathcal{X}_{it}$ includes $\mathbf{x}_{it}$ and

$$E(y^*_{it} \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it}) = \Pr(y^*_{it} = 0 \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it}) \cdot 0 + \Pr(y^*_{it} > 0 \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it}) \cdot E(y^*_{it} \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it}, y^*_{it} > 0)$$
$$= \Pr(y^*_{it} > 0 \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it})\, E(y^*_{it} \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it}, y^*_{it} > 0).$$

Due to lack of space, in this application we study only $\int \Pr(y^*_{it} > 0 \mid \mathcal{X}_{it}, \hat{\boldsymbol{\alpha}}_i, \hat{\boldsymbol{\epsilon}}_{it})\, dF(\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\epsilon}})$.

The National Rural Employment Guarantee Scheme (NREGS) was initiated in 2006 by the Government of India with the objective to alleviate rural poverty. NREGS legally entitles rural households to 100 days of employment in unskilled manual labour (on public work projects) at a prefixed wage. Now, it can be seen in Table 4 that over the years, the proportion of children with either parent working in NREGS almost doubled. This increase in participation was accompanied by a rise in the number of days of work on NREGS projects as well. Afridi et al. (2016), claiming NREGS to be a valid instrument for income, argue that since the funds sanctioned at the beginning of the financial year are not affected by current demand for work, the funds sanctioned are exogenous, and more funds imply more work opportunities in NREGS, which can have a positive effect on household income. Also, the total fund allocation to NREGS increased during the period 2007-2010. However, this increase was not uniform across the 15 mandals.

Our second instrument is the caste, a system of social stratification, to which the child belongs. India is beleaguered with a caste system.
Within this caste system, historically, the Scheduled Castes and Scheduled Tribes (SC/ST's) have been economically backward and concentrated in low-skill (mostly agricultural) occupations in rural areas. Moreover, they were also subject to centuries of systematic caste based discrimination, both economically and socially. The historical tradition of social division through the caste system created a social stratification along education, occupation, income, and wealth lines that has continued into modern India. Faring better than SC/ST's are those belonging to the “Other Backward Classes” (OBC). Hence, given the fact that income and wealth, both land and productive assets, vary with caste, we choose CASTE as our second instrument, which is a discrete variable that takes three values: 1 if the child belongs to an SC/ST household, 2 if the child belongs to OBC, and 3 if the child belongs to the group labelled as “Others” (OT).

We claim that CASTE is a valid instrument for landholding because, though average wealth and income are evidently distributed along caste lines, we do not find a significant variation in child labor or school enrolment across the caste or social group to which the child belongs (Table 5). In other words, no social group is inherently disposed to make their children work or send them to school. This could be because rising awareness, over time, about returns from education persuades families of all castes to send their children to school. We find support for the assertion in the literature too. Hnatkovska et al. (2012) find significant convergence in the education attainment levels and occupation choice of SC/ST's and non-SC/ST's between 1983 and 2004-

Footnote: Data on the sanctioned funds at the mandal level was obtained from the Andhra Pradesh Government's website on NREGS (http://nrega.ap.gov.in/).

Footnote: In fact, this stratification was so endemic that the constitution of India aggregated these castes into a schedule of the constitution and provided them with affirmative action cover in both education and public sector employment. This constitutional initiative was viewed as a key component of attaining the goal of raising the social and economic status of the SC/ST's to the levels of the non-SC/ST's.

Footnote: The Government of India classifies some of its citizens, based on social and economic conditions, as Other Backward Classes (OBC). The OBC list is dynamic (castes and communities can be added or removed) and is supposed to change from time to time depending on social, educational and economic factors. In the constitution, OBC's are described as “socially and educationally backward classes”, and the government is enjoined to ensure their social and educational development.

Table 5:
Descriptive Statistics of some Variables by Caste
| Year: 2007 | Scheduled Castes/Tribes | Other Backward Classes | Others |
|---|---|---|---|
| Household Income (in thousand ₹) | 31.22 (33.94) | 31.64 (34.29) | 43.21 (48.59) |
| Land Owned (in acre) | 1.58 (2.12) | 2.32 (3.51) | 3.08 (4.53) |
| Index of Productive Farm Asset | -0.22 (0.71) | -0.14 (1.02) | 0.04 (1.17) |
| School Dummy | 0.90 | 0.89 | 0.96 |

Standard errors in parentheses.

Discussion of Results

The results of the first stage reduced form equations in Table 6 suggest that our instruments are good predictors of the endogenous variables. First, we find that an increase in the amount sanctioned for NREGS projects increases household income. Secondly, CASTE does, on average, correctly predict the economic status (income, landholding, and assets) of households. Finally, the dummy variables indicating the level of infrastructure development are positively correlated with the index of productive farm assets.

Table 6:
First Stage Reduced Form Estimates: Joint Estimation of Income, Land, and Farm Assets Equations
| | Income | Landholding | Farm Asset |
|---|---|---|---|
| Total NREGS amount sanctioned (in million ₹) | 0.047∗∗∗ (0.009) | -0.008 (0.007) | -0.0003 (0.0002) |
| Caste (SC/ST = 1, OBC = 2, OT = 3) | 9.220∗∗∗ (1.217) | ∗∗∗ (0.726) | ∗∗∗ (0.0300) |
| Drinkable Water in the Locality (Yes=1 & No=0) | 5.417 (5.703) | -1.879 (4.260) | 0.341∗∗ (0.150) |
| National Bank in the Locality (Yes=1 & No=0) | -2.785 (2.130) | 4.684∗∗ (1.591) | ∗∗∗ (0.0561) |
| Hospital in the Locality (Yes=1 & No=0) | -0.689 (1.248) | -4.143∗∗∗ (0.932) | ∗ (0.033) |
| Other Exogenous Variables of the Structural Equation: Age and Sex of the Child | Yes | Yes | Yes |

Biørn's stepwise MLE was employed to obtain these estimates. Significance levels: ∗: 10%, ∗∗: 5%, ∗∗∗: 1%. Standard errors (SE) in parentheses.

Before we begin to discuss the results of the second-stage estimation in Table 7, we state a few points regarding the estimation. (a) The only exogenous explanatory variables in our parsimonious specification are the age and the sex of the children. (b) The specification includes district dummies, a time dummy, and the interaction of the two to account for the fact that the districts to which children belong may have different economic growth trajectories as well as trends related to work and education. The time dummy allows us to control for changes in demand and supply of work over time. (c) Since the support assumption for point identification of the APEs is not met, we estimate the bounds on the APEs and the 95% confidence interval (CI) for the partially identified APEs. (d) For the continuous variables, the bounds on the APE of a variable were computed by increasing the variable by one standard deviation from its mean, where the mean and the SD of the variable are from the 2010 data. For age, the bounds on the APE were computed by increasing the mean age in 2010 by 1 year. (e) The standard errors of the coefficients were estimated using the analytical expression of the covariance matrix derived in Appendix D.

Footnote: Though we do not report them here, we did not find nonlinear terms of income, land, and productive assets to be significant. We had also included four education related dummy variables, two for the father and two for the mother. The dummy variables for the mother, for example, indicated (1) if the mother had some schooling and (2) if the mother had attended secondary or post secondary school. The education dummies, though substantially affecting household income, did not seem to affect child labor propensity. This suggests that parents' education level has had no independent impact except through income.

Table 7:
Household Income and Wealth Effect on Incidence of Child Labor
| | CRE Probit: Coeff. | CF Method: Coeff. | CF Method: APE Bounds (CI below) | CFs | CFs: Coeff. |
|---|---|---|---|---|---|
| Income | 0.003∗∗∗ | -0.0234∗∗∗ | [-0.00532, -0.00451] | $\hat{\alpha}_{INCOME}$ | ∗∗∗ |
| | | | [0.00679, 0.00749] | $\hat{\alpha}_{LAND}$ | -0.015∗∗ |
| | (0.002) | (0.007) | [0.00676, 0.00752] | | (0.0065) |
| Farm Asset Index | -0.011 | -0.976∗∗∗ | [-0.22077, -0.18734] | $\hat{\alpha}_{ASSET}$ | ∗∗∗ |
| | (0.0279) | (0.169) | [-0.22121, -0.1869] | | (0.129) |
| Age | 2.019∗∗∗ | ∗∗∗ | [0.0757, 0.12533] | $\hat{\epsilon}_{INCOME}$ | ∗∗∗ |
| | (0.072) | (0.057) | [0.0755, 0.12552] | | (0.003) |
| Sex | 0.644∗∗∗ | ∗∗∗ | [0.07355, 0.12318] | $\hat{\epsilon}_{LAND}$ | -0.031∗∗∗ |
| | (0.042) | (0.0473) | [0.07322, 0.12351] | | (0.0075) |
| | | | | $\hat{\epsilon}_{ASSET}$ | ∗∗∗ |
| | | | | | (0.185) |

Total number of children: 2458. Total number of observations with positive outcome: 2128.
Significance levels: ∗: 10%, ∗∗: 5%, ∗∗∗: 1%. Standard errors (SE) in parentheses.
We begin by comparing the results from Chamberlain's CRE probit model with the estimates obtained from applying the method developed in this paper. The significance of the estimated coefficients of the control functions suggests that income, land size, and productive farm assets are endogenously determined along with the household's labor supply decisions, including those concerning the child. When income and wealth are not instrumented, as in the CRE probit, we get, in light of the discussion in the paper, an incorrect sign for the coefficient on income. Moreover, the results of the CRE probit suggest that ownership of land and farm assets does not affect child labor, which, given much recent evidence, is unlikely in a developing country. The results, thus, make clear the importance of accounting for the endogeneity of income, landholding, and farm assets.

The estimates from the control function method suggest that children of households that have a higher landholding are more likely to engage in work. This is in conformity with the findings in BDD, BHy and CD, where, due to the presence of land, labor, and credit market imperfections, ownership of a large amount of land provides incentives for children to work. As far as income is concerned, we find that higher household income reduces the chances of child labor, which again confirms poverty to be a cause of child labor.

Since the upper and lower bounds of the APE of productive farm assets are large in magnitude, and since the CI is only marginally wider than the bounds, it seems that ownership of farm assets leads to a significantly high reduction in children's participation in work. Dumas, BHy and CD argue that an increase in asset holding that increases the marginal productivity of labor induces two opposite effects on labor. While the income effect of increased wealth tends to reduce labor time, the substitution effect, due to the absence of a labor market, provides incentives for work, and tends to increase children's labor time.
Our results suggest that the wealth effect of farm assets, which are not likely to be operated by children, dominates and reduces children's labor time. Secondly, since the prevalence of farm assets is high in those regions where there has been infrastructure development, it seems that a lack of infrastructure development that impedes access to, or does not provide incentives to acquire, productive farm assets may be an important factor determining child labor. Finally, we find that older children and boys are more likely to work.

The objective of the paper has been to develop a method to estimate structural measures of interest, such as the average partial effects, for a panel data binary response model in a triangular system while accounting for multiple unobserved heterogeneities. The unobserved heterogeneity terms consist of time invariant random effects/coefficients and idiosyncratic errors. We propose that the expected values of the heterogeneity terms, conditional on the histories of the endogenous variables, $X_i \equiv (\mathbf{x}_{i1}', \ldots, \mathbf{x}_{iT}')'$, and the exogenous variables, $Z_i \equiv (\mathbf{z}_{i1}', \ldots, \mathbf{z}_{iT}')'$, be used as control functions (CF).

The proposed method makes a number of contributions to the literature. First, among the class of triangular systems with imposed structures similar to ours, the proposed CF method requires weaker restrictions than the traditional control function methods. Secondly, when instruments have a small support, the CFs, which exploit panel data, help in point-identifying structural measures such as the APEs when the endogenous variables have a large support. Bounds on the structural measures are provided when the support assumption is not satisfied. Thirdly, the method allows for multiple endogenous variables, all of which are determined simultaneously.
Finally, in an equivalence result we showed that for linear panel data models, when the structural equation is augmented with the proposed control functions, the resulting estimates are equivalent to the ones obtained when the structural model is estimated by a certain two-stage least squares procedure. Also, Monte Carlo experiments show that, compared to alternative panel data binary choice models similar to ours, our method performs better.

The estimator was applied to estimate the causal effects of income, land size, and farm assets on the incidence of child labor. We found that household income and ownership of farming assets significantly lower the incidence of child labor, suggesting a strong income effect of farm assets. Secondly, large landholding increases the likelihood of child labor, suggesting a substitution effect of land ownership. Thirdly, a test of exogeneity revealed that land size is determined endogenously along with household labor supply decisions, contrary to what most empirical studies on child labor in developing countries assume.

Finally, we would like to note that an important generalization would be to identify and estimate the proposed control functions without making distributional assumptions about the heterogeneity terms of the reduced form equations.

Footnote: In a separate set of regressions that included only the exogenous variables, we tried to assess if the infrastructure variables had independent impacts on work and schooling decisions of children. These variables turned out to be insignificant, suggesting that the demand for child labor or opportunities for schooling were not affected by infrastructure development or its lack in rural AP. In other words, infrastructure had its effect on work and schooling outcomes only through its impact on the economic conditions of certain households, which validates using infrastructure variables as instruments for farming assets.
References

(2011). Bias corrections for two-step fixed effects panel data estimators. Journal of Econometrics, (2), 144–162.
Afridi, F., Mukhopadhyay, A. and Sahoo, S. (2016). Female Labor Force Participation and Child Education in India: Evidence from the National Rural Employment Guarantee Scheme. IZA Journal of Labor & Development, doi:10.1186/s40175-016-0053-y.
Altonji, J. G. and Matzkin, R. L. (2005). Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica, (4), 1053–1102.
Arellano, M. and Bonhomme, S. (2011). Nonlinear Panel Data Analysis. Annual Review of Economics, 395–424.
Baland, J. M. and Robinson, J. A. (2000). Is Child Labor Inefficient? Journal of Political Economy, 663–679.
Baltagi, B. (1981). Simultaneous equations with error components. Journal of Econometrics, (2), 189–200.
Baltagi, B. H., Song, S. H. and Jung, B. C. (2010). Testing for Heteroskedasticity and Serial Correlation in a Random Effects Panel Data Model. Journal of Econometrics, 122–124.
Basu, K., Das, S. and Dutta, B. (2010). Child Labor and Household Wealth: Theory and Empirical Evidence of an Inverted-U. Journal of Development Economics, 8–14.
— and Van, P. H. (1998). The Economics of Child Labor. American Economic Review, 412–427.
Bester, C. A. and Hansen, C. (2009). Identification of Marginal Effects in a Nonparametric Correlated Random Effects Model. Journal of Business and Economic Statistics, 235–250.
Bhalotra, S. and Heady, C. (2003). Child Farm Labor: The Wealth Paradox. World Bank Economic Review, 197–227.
Biørn, E. (2004). Regression Systems for Unbalanced Panel Data: A Stepwise Maximum Likelihood Procedure. Journal of Econometrics, 281–291.
Blundell, R., MaCurdy, T. and Meghir, C. (2007). Chapter 69 Labor Supply Models: Unobserved Heterogeneity, Nonparticipation and Dynamics. Handbook of Econometrics, vol. 6, Elsevier, pp. 4667–4775.
— and Powell, J. (2003). Endogeneity in Nonparametric and Semiparametric Regression Models. In M. Dewatripont, L. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress, vol. 2, Cambridge: Cambridge University Press.
— and — (2004). Endogeneity in Semiparametric Binary Response Models. Review of Economic Studies, 655–679.
— and Smith, R. J. (1994). Coherency and Estimation in Simultaneous Models with Censored or Qualitative Dependent Variables. Journal of Econometrics, (1-2), 355–373.
Cadre, B., Pelletier, B. and Pudlo, P. (2013). Estimation of Density Level Sets with a given Probability Content. Journal of Nonparametric Statistics, (1), 261–272.
Chamberlain, G. (1984). Panel Data. In Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, vol. 2, Elsevier.
— (2010). Binary Response Models for Panel Data: Identification and Information. Econometrica, 159–168.
Cockburn, J. and Dostie, B. (2007). Child Work and Schooling: The Role of Household Asset Profiles and Poverty in Rural Ethiopia. Journal of African Economies, 519–563.
Constantinou, P. and Dawid, A. P. (2017). Extended conditional independence and applications in causal inference. Annals of Statistics, (6), 2618–2653.
D'Haultfœuille, X. and Février, P. (2015). Identification of Nonseparable Triangular Models with Discrete Instruments. Econometrica, (3), 1199–1210.
Dumas, C. (2007). Why do Parents make their Children Work? A Test of the Poverty Hypothesis in Rural Areas of Burkina Faso. Oxford Economic Papers, 301–329.
Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Institute of Mathematical Statistics Monographs, Cambridge University Press.
Florens, J. P., Heckman, J. J., Meghir, C. and Vytlacil, E. (2008). Identification of treatment effects using control functions in models with continuous, endogenous treatment and heterogeneous effects. Econometrica, (5), 1191–1206.
Greene, W. (2004). Convenient Estimators for the Panel Probit Model: Further Results. Empirical Economics, 21–47.
Gu, J. and Koenker, R. (2017). Unobserved Heterogeneity in Income Dynamics: An Empirical Bayes Perspective. Journal of Business & Economic Statistics, (1), 1–16.
Guarino, C. M., Maxfield, M., Reckase, M. D., Thompson, P. N. and Wooldridge, J. M. (2015). An Evaluation of Empirical Bayes's Estimation of Value-Added Teacher Performance Measures. Journal of Educational and Behavioral Statistics, (2), 190–222.
Hall, P., Racine, J. and Li, Q. (2004). Cross-Validation and the Estimation of Conditional Probability Densities. Journal of the American Statistical Association, 1015–1026.
Hayfield, T. and Racine, J. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, Articles, (5), 1–32.
Heiss, F. and Winschel, V. (2008). Likelihood Approximation by Numerical Integration on Sparse Grids. Journal of Econometrics, (1), 62–80.
Hnatkovska, V., Lahiri, A. and Paul, S. (2012). Castes and Labor Mobility. American Economic Journal: Applied Economics, (2), 274–307.
Hoderlein, S. and Sherman, R. (2015). Identification and Estimation in a Correlated Random Coefficients Binary Response Model. Journal of Econometrics, 135–149.
— and White, H. (2012). Nonparametric Identification in Nonseparable Panel Data Models with Generalized Fixed Effects. Journal of Econometrics, 300–314.
Horowitz, J. L. (2009). Semiparametric and Nonparametric Methods in Econometrics. Springer, 2nd edn.
Imbens, G. W. and Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. Econometrica, (6), 1845–1857.
— and Newey, W. K. (2009). Identification and Estimation of Triangular Simultaneous Equations Models without Additivity. Econometrica, 1481–1512.
James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, Calif.: University of California Press, pp. 361–379.
Kasy, M. (2011). Identification in Triangular Systems Using Control Functions. Econometric Theory, (3).
Kim, K. I. and Petrin, A. (2017). A Generalized Non-parametric Instrumental Variable-Control Function Approach to Estimation in Non-linear Settings, Working Paper.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, (1), 13–22.
Miller, K. S. (1981). On the Inverse of the Sum of Matrices. Mathematics Magazine, (2), 67–72.
Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 69–85.
Newey, W. K. (1984). A Method of Moment Interpretation of Sequential Estimators. Economics Letters, 201–206.
Papke, L. E. and Wooldridge, J. M. (2008). Panel Data Methods for Fractional Response Variables with an application to Test Pass Rates. Journal of Econometrics, 121–133.
Rivers, D. and Vuong, Q. H. (1988). Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics, (3), 347–366.
Rothe, C. (2009). Semiparametric Estimation of Binary Response Models with Endogenous Regressors. Journal of Econometrics, 51–64.
Semykina, A. and Wooldridge, J. M. (2018). Binary Response Panel Data Models with Sample Selection and Self-Selection. Journal of Applied Econometrics, (2), 179–197.
Torgovitsky, A. (2015). Identification of Nonseparable Models Using Instruments With Small Support. Econometrica, 1185–1197.
Wooldridge, J. M. (2010a). Econometric Analysis of Cross Section and Panel Data. MIT Press.
— (2010b). Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
— (2015). Control Function Methods in Applied Econometrics. Journal of Human Resources, (2), 420–445.
— (2019). Correlated Random Effects Models with Unbalanced Panels. Journal of Econometrics, (1), 137–150.

Proofs
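Lemma 2 (a) Let $X \equiv (\mathbf{x}_1', \ldots, \mathbf{x}_T')'$ and $Z \equiv (\mathbf{z}_1, \ldots, \mathbf{z}_T)$. If $\mathbf{x}_t$ is specified as

$$\mathbf{x}_t = \pi \mathbf{z}_t + \underbrace{\bar{\pi}\bar{\mathbf{z}} + \mathbf{a}}_{\boldsymbol{\alpha}} + \boldsymbol{\epsilon}_t, \qquad t \in \{1, \ldots, T\}, \tag{A-1}$$

where $\bar{\mathbf{z}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{z}_t$, $Z \perp\!\!\!\perp (\mathbf{a}, \boldsymbol{\epsilon}_t)$, $\mathbf{a} \perp\!\!\!\perp \boldsymbol{\epsilon}_t$, $\boldsymbol{\epsilon}_t$ is i.i.d., $\mathbf{a} \sim N(0, \Lambda_{\alpha\alpha})$ and $\boldsymbol{\epsilon}_t \sim N(0, \Sigma_{\epsilon\epsilon})$, then

$$E(\boldsymbol{\alpha} \mid X, Z) \equiv \hat{\boldsymbol{\alpha}}(X, Z, \Theta) = \bar{\pi}\bar{\mathbf{z}} + E(\mathbf{a} \mid X, Z) = \bar{\pi}\bar{\mathbf{z}} + \Omega\Sigma_{\epsilon\epsilon}^{-1} \sum_{t=1}^{T} (\mathbf{x}_t - \pi \mathbf{z}_t - \bar{\pi}\bar{\mathbf{z}}),$$

where $\Omega = [T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}$ is the conditional variance of $\mathbf{a}$ given $X$ and $Z$.

(b) Suppose we have a single endogenous variable, $x_t$. Let $X \equiv (x_1, \ldots, x_T)$ and define $Z \equiv (\mathbf{z}_1, \ldots, \mathbf{z}_T)$. Suppose $x_t$ is given by

$$x_t = \pi \mathbf{z}_t + \bar{\pi}\bar{\mathbf{z}} + a + \epsilon_t, \qquad t = 1, \ldots, T, \tag{A-2}$$

where $\bar{\mathbf{z}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{z}_t$. If the errors, $\boldsymbol{\epsilon} \equiv (\epsilon_1, \ldots, \epsilon_T)'$, are normally distributed with mean 0 and are non-spherical such that $E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}') = \Omega_{\epsilon\epsilon}$, $a \sim N(0, \sigma_\alpha^2)$, $a \perp\!\!\!\perp \boldsymbol{\epsilon}$, and $Z \perp\!\!\!\perp (a, \boldsymbol{\epsilon})$, then

$$E(a \mid X, Z) = \hat{a}(X, Z) = (x_1 - \pi \mathbf{z}_1 - \bar{\pi}\bar{\mathbf{z}})\,\omega_1 + \ldots + (x_T - \pi \mathbf{z}_T - \bar{\pi}\bar{\mathbf{z}})\,\omega_T,$$

where $(\omega_1, \ldots, \omega_T)' = \Omega_{\epsilon\epsilon}^{-1} e_T \,(e_T' \Omega_{\epsilon\epsilon}^{-1} e_T + \sigma_\alpha^{-2})^{-1}$ and $e_T$ is a vector of ones of dimension $T$.

Proof 1 (a) To obtain $\hat{\mathbf{a}}(X, Z) = E(\mathbf{a} \mid X, Z)$, we first derive $f(\mathbf{a} \mid X, Z)$, the conditional density function of $\mathbf{a}$ given $X$ and $Z$. By Bayes' rule we have

$$f(\mathbf{a} \mid X, Z) = \frac{f(X, Z \mid \mathbf{a})\, f(\mathbf{a})}{f(X, Z)} = \frac{f(X \mid Z, \mathbf{a})\, f(Z \mid \mathbf{a})\, f(\mathbf{a})}{f(X \mid Z)\, f(Z)} = \frac{f(X \mid Z, \mathbf{a})\, f(\mathbf{a})}{f(X \mid Z)}, \tag{A-3}$$

where the last equality is obtained because $Z$ is independent of the residual individual effects, $\mathbf{a}$; that is, $f(Z \mid \mathbf{a}) = f(Z)$.

Since $\mathbf{a} \perp\!\!\!\perp \boldsymbol{\epsilon}_t$, $\boldsymbol{\epsilon}_t$ is i.i.d., $\mathbf{a} \sim N(0, \Lambda_{\alpha\alpha})$ and $\boldsymbol{\epsilon}_t \sim N(0, \Sigma_{\epsilon\epsilon})$, then, given (A-1), $X$, given $Z$, is normally distributed with mean $((\pi \mathbf{z}_1 + \bar{\pi}\bar{\mathbf{z}})', \ldots, (\pi \mathbf{z}_T + \bar{\pi}\bar{\mathbf{z}})')'$ and variance $\Sigma = I_T \otimes \Sigma_{\epsilon\epsilon} + E_T \otimes \Lambda_{\alpha\alpha}$, where $I_T$ is an identity matrix of dimension $T$ and $E_T$ is a $T \times T$ matrix of ones. That is, $f(X \mid Z)$ in (A-3) is given by

$$f(X \mid Z) = \frac{1}{\sqrt{(2\pi)^{mT} |\Sigma|}} \exp\left(-\frac{1}{2} R' \Sigma^{-1} R\right),$$

where

$$R = X - \begin{pmatrix} \pi \mathbf{z}_1 + \bar{\pi}\bar{\mathbf{z}} \\ \vdots \\ \pi \mathbf{z}_T + \bar{\pi}\bar{\mathbf{z}} \end{pmatrix} = \begin{pmatrix} \mathbf{r}_1 \\ \vdots \\ \mathbf{r}_T \end{pmatrix}$$

and $m = d_x$ is the dimension of $\mathbf{x}_t$.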
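As a numerical sanity check on part (a), the sketch below compares the closed-form conditional mean $\Omega\Sigma_{\epsilon\epsilon}^{-1}\sum_t \mathbf{r}_t$ with the generic Gaussian conditioning formula $\mathrm{Cov}(\mathbf{a}, R)\,\Sigma^{-1} R$, using $\Sigma = I_T \otimes \Sigma_{\epsilon\epsilon} + E_T \otimes \Lambda_{\alpha\alpha}$ as defined above. It is written in Python with NumPy; the function names, the helper `random_spd`, the dimensions and the simulated inputs are illustrative assumptions and are not taken from the paper. If the lemma holds, the two computations should agree up to floating-point error.

```python
import numpy as np


def cf_closed_form(R, Sigma_ee, Lambda_aa):
    """Closed form of Lemma 2(a): returns (Omega Sigma_ee^{-1} sum_t r_t, Omega).

    R is a (T, m) array whose t-th row is r_t = x_t - pi z_t - pibar zbar.
    """
    T = R.shape[0]
    Sigma_ee_inv = np.linalg.inv(Sigma_ee)
    Omega = np.linalg.inv(T * Sigma_ee_inv + np.linalg.inv(Lambda_aa))
    return Omega @ Sigma_ee_inv @ R.sum(axis=0), Omega


def cf_generic_gaussian(R, Sigma_ee, Lambda_aa):
    """Generic conditioning: E(a | R) = Cov(a, R) Var(R)^{-1} vec(R)."""
    T, m = R.shape
    # Var(R) = I_T (x) Sigma_ee + E_T (x) Lambda_aa, with E_T a T x T matrix of ones.
    Sigma = np.kron(np.eye(T), Sigma_ee) + np.kron(np.ones((T, T)), Lambda_aa)
    # Cov(a, R) = e_T' (x) Lambda_aa, because every r_t equals a + eps_t.
    cov_a_R = np.kron(np.ones((1, T)), Lambda_aa)
    # R.reshape(-1) stacks the rows, i.e. (r_1', ..., r_T')'.
    return cov_a_R @ np.linalg.solve(Sigma, R.reshape(-1))


def random_spd(rng, m):
    """Hypothetical helper: a random symmetric positive definite m x m matrix."""
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)


rng = np.random.default_rng(0)
T, m = 4, 3                        # illustrative panel length and dimension of x_t
Sigma_ee = random_spd(rng, m)      # stand-in for Sigma_{epsilon epsilon}
Lambda_aa = random_spd(rng, m)     # stand-in for Lambda_{alpha alpha}
R = rng.standard_normal((T, m))    # stand-in residuals, not real data

a_hat, Omega = cf_closed_form(R, Sigma_ee, Lambda_aa)
a_hat_generic = cf_generic_gaussian(R, Sigma_ee, Lambda_aa)
print(np.allclose(a_hat, a_hat_generic))
```

The closed form only requires inverting $m \times m$ matrices, whereas the generic formula solves an $mT \times mT$ system, which is the practical reason the Kronecker structure is worth exploiting.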
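Since $\mathrm{rank}(E_T) = 1$, we can use example 5 in Miller (1981), which is on the inverse of a sum of two Kronecker products, to obtain
$$\Sigma^{-1} = I_T \otimes \Sigma_{\epsilon\epsilon}^{-1} - E_T \otimes [\Sigma_{\epsilon\epsilon} + \mathrm{tr}(E_T)\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}\Sigma_{\epsilon\epsilon}^{-1} \quad \text{and} \quad |\Sigma| = |\Sigma_{\epsilon\epsilon}|^{(T-1)}\,|\Sigma_{\epsilon\epsilon} + \mathrm{tr}(E_T)\Lambda_{\alpha\alpha}|,$$
where $\mathrm{tr}(E_T) = T$, which allows us to write $f(X \mid Z)$ as
$$f(X \mid Z) = \frac{1}{\sqrt{(2\pi)^{mT}|\Sigma_{\epsilon\epsilon}|^{(T-1)}|\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}|}} \times \exp\left(-\frac{1}{2}\left[R'[I_T \otimes \Sigma_{\epsilon\epsilon}^{-1}]R - \sum_{t=1}^{T}\mathbf r_t'[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t\right]\right). \qquad \text{(A-4)}$$
Since $f(X \mid Z, \mathbf a) = f((\boldsymbol\epsilon_1', \dots, \boldsymbol\epsilon_T')')$, the $\boldsymbol\epsilon_t$'s are i.i.d., $\boldsymbol\epsilon_t \sim N(0, \Sigma_{\epsilon\epsilon})$, and $\mathbf a \sim N(0, \Lambda_{\alpha\alpha})$, $f(X \mid Z, \mathbf a)f(\mathbf a)$ in (A-3) is
$$f(X \mid Z, \mathbf a)f(\mathbf a) = \frac{1}{\sqrt{(2\pi)^{mT+m}|\Sigma_{\epsilon\epsilon}|^{T}|\Lambda_{\alpha\alpha}|}} \times \exp\left(-\frac{1}{2}\left[(R - e_T \otimes \mathbf a)'[I_T \otimes \Sigma_{\epsilon\epsilon}]^{-1}(R - e_T \otimes \mathbf a) + \mathbf a'\Lambda_{\alpha\alpha}^{-1}\mathbf a\right]\right), \qquad \text{(A-5)}$$
where $R - e_T \otimes \mathbf a = (\boldsymbol\epsilon_1', \dots, \boldsymbol\epsilon_T')'$, $e_T$ being a vector of ones of dimension $T$.

The following matrix results,
1. $(A_{m \times m} \otimes B_{n \times n})^{-1} = A_{m \times m}^{-1} \otimes B_{n \times n}^{-1}$,
2. $(A_{p \times q} \otimes B_{r \times s})(C_{q \times k} \otimes D_{s \times l}) = A_{p \times q}C_{q \times k} \otimes B_{r \times s}D_{s \times l}$ and
3. $e_T'I_Te_T = T$,
allow us to write the expression in the square brackets in (A-5) as
$$R'[I_T \otimes \Sigma_{\epsilon\epsilon}^{-1}]R - \mathbf a'\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t - \sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1}\mathbf a + \mathbf a'[\Lambda_{\alpha\alpha}^{-1} + T\Sigma_{\epsilon\epsilon}^{-1}]\mathbf a. \qquad \text{(A-6)}$$
Using the results in (A-4), (A-5) and (A-6) and the result that $|A^{-1}| = |A|^{-1}$ if $A$ is nonsingular, we get
$$f(\mathbf a \mid X, Z) = \frac{f(X \mid Z, \mathbf a)f(\mathbf a)}{f(X \mid Z)} = \frac{1}{\sqrt{(2\pi)^{m}|\Sigma_{\epsilon\epsilon}[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}|}} \times \exp\left(-\frac{1}{2}\left[\sum_{t=1}^{T}\mathbf r_t'[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t - \mathbf a'\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t - \sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1}\mathbf a + \mathbf a'[\Lambda_{\alpha\alpha}^{-1} + T\Sigma_{\epsilon\epsilon}^{-1}]\mathbf a\right]\right).$$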
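(A-7)
Let $[\Lambda_{\alpha\alpha}^{-1} + T\Sigma_{\epsilon\epsilon}^{-1}] = \Omega^{-1}$. Then $\sum_{t=1}^{T}\mathbf r_t'[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t$ and $\sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1}$ in (A-7), after a few matrix manipulations, can be written as
$$\sum_{t=1}^{T}\mathbf r_t'[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t = \sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1}\Omega\,\Omega^{-1}\,\Omega\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t \quad \text{and} \qquad \text{(A-8)}$$
$$\sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1} = \sum_{t=1}^{T}\mathbf r_t'\Sigma_{\epsilon\epsilon}^{-1}\Omega\,\Omega^{-1} \quad \text{respectively.} \qquad \text{(A-9)}$$
Given (A-8) and (A-9), we can write $f(\mathbf a \mid X, Z)$ in (A-7) as
$$f(\mathbf a \mid X, Z) = \frac{1}{\sqrt{(2\pi)^{m}|\Omega|}}\exp\left(-\frac{1}{2}\left[\mathbf a - \Omega\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t\right]'\Omega^{-1}\left[\mathbf a - \Omega\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}\mathbf r_t\right]\right).$$
In other words, $\mathbf a$, given $X$ and $Z$, is normally distributed with conditional mean
$$E(\mathbf a \mid X, Z) = \hat{\mathbf a}(X, Z) = \Omega\Sigma_{\epsilon\epsilon}^{-1}\sum_{t=1}^{T}(\mathbf x_t - \pi\mathbf z_t - \bar\pi\bar{\mathbf z})$$
and conditional variance $\Omega = \Sigma_{\epsilon\epsilon}[\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha}]^{-1}\Lambda_{\alpha\alpha}$.

(b) While discussing the restrictions imposed on the reduced form equation, we had stated that when $d_x = 1$, the assumption that $a$ and $\epsilon_t$ are completely independent of $Z$ can be weakened to allow for non-spherical error components. Suppose that $\epsilon_t$, $t = 1, \dots, T$, are serially dependent such that $\boldsymbol\epsilon \equiv (\epsilon_1, \dots, \epsilon_T)'$ is normally distributed with $E(\boldsymbol\epsilon\boldsymbol\epsilon') = \Omega_{\epsilon\epsilon}$, and $a$ is normally distributed and is heteroscedastic as in Baltagi et al. (2010).

To obtain $\hat a(X, Z) = E(a \mid X, Z)$, as in part (a), we first derive $f(a \mid X, Z)$. Using the fact that $Z \perp\!\!\!\perp a$, by an application of Bayes' rule, as in part (a), equation (A-3), we have $f(a \mid X, Z) = \frac{f(X \mid Z, a)f(a)}{f(X \mid Z)}$, where $f(a)$ is the normal density function of $a$.

Now, since in (A-2) $a \perp\!\!\!\perp \boldsymbol\epsilon$, $a \sim N(0, \sigma_\alpha^2)$, and $\boldsymbol\epsilon \sim N(0, \Omega_{\epsilon\epsilon})$, it follows that $X$, given $Z$, is normally distributed with mean $((\pi\mathbf z_1 + \bar\pi\bar{\mathbf z}), \dots, (\pi\mathbf z_T + \bar\pi\bar{\mathbf z}))'$ and variance $\Sigma = \Omega_{\epsilon\epsilon} + \sigma_\alpha^2e_Te_T'$, where $e_T$ is a vector of ones of dimension $T$. That is,
$$f(X \mid Z) = \frac{1}{\sqrt{(2\pi)^{T}|\Sigma|}}\exp\left(-\frac{1}{2}R'\Sigma^{-1}R\right), \quad \text{where } R = X - \begin{pmatrix}\pi\mathbf z_1 + \bar\pi\bar{\mathbf z}\\ \vdots\\ \pi\mathbf z_T + \bar\pi\bar{\mathbf z}\end{pmatrix}, \qquad \text{(A-10)}$$
and where, by the Sherman-Morrison formula,
$$\Sigma^{-1} = \Omega_{\epsilon\epsilon}^{-1} - \frac{\sigma_\alpha^2\,\Omega_{\epsilon\epsilon}^{-1}e_Te_T'\Omega_{\epsilon\epsilon}^{-1}}{1 + \sigma_\alpha^2\,e_T'\Omega_{\epsilon\epsilon}^{-1}e_T} \quad \text{and} \quad |\Sigma| = |\Omega_{\epsilon\epsilon}|\,(1 + \sigma_\alpha^2\,e_T'\Omega_{\epsilon\epsilon}^{-1}e_T).$$
Since $X$ given $(Z, a)$ has the same distribution as $\boldsymbol\epsilon = R - ae_T$, we have
$$f(X \mid Z, a)f(a) = \frac{1}{\sqrt{(2\pi)^{T+1}|\Omega_{\epsilon\epsilon}|\sigma_\alpha^2}}\exp\left(-\frac{1}{2}\left[(R - ae_T)'\Omega_{\epsilon\epsilon}^{-1}(R - ae_T) + \frac{a^2}{\sigma_\alpha^2}\right]\right). \qquad \text{(A-11)}$$
Finally, because $f(a \mid X, Z) = \frac{f(X \mid Z, a)f(a)}{f(X \mid Z)}$, using (A-10) and (A-11) it can be shown that $a$, given $X$ and $Z$, is normally distributed with conditional mean
$$E(a \mid X, Z) = \hat a(X, Z) = (x_1 - \pi\mathbf z_1 - \bar\pi\bar{\mathbf z})\omega_1 + \dots + (x_T - \pi\mathbf z_T - \bar\pi\bar{\mathbf z})\omega_T,$$
where $(\omega_1, \dots, \omega_T)' = \Omega_{\epsilon\epsilon}^{-1}e_T\,/\,(e_T'\Omega_{\epsilon\epsilon}^{-1}e_T + \sigma_\alpha^{-2})$, and conditional variance $\sigma_\alpha^2(\sigma_\alpha^2\,e_T'\Omega_{\epsilon\epsilon}^{-1}e_T + 1)^{-1}$.

Proposition 1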
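Let $Z \perp\!\!\!\perp (\theta, \boldsymbol\alpha, \zeta_t, \boldsymbol\epsilon_t)$. When $\theta$ and $\boldsymbol\alpha$ are correlated and so are $\zeta_t$ and $\boldsymbol\epsilon_t$, then $\theta + \zeta_t \perp\!\!\!\perp X \mid V$ whereas $\theta + \zeta_t \not\perp\!\!\!\perp X \mid \boldsymbol\upsilon_t$, where $\boldsymbol\upsilon_t = \boldsymbol\alpha + \boldsymbol\epsilon_t = \mathbf x_t - \pi\mathbf z_t$ and $V \equiv (\boldsymbol\upsilon_1, \dots, \boldsymbol\upsilon_T)$.

Proof 2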
Now, to show that $r_t = \theta + \zeta_t \perp\!\!\!\perp X \mid V$, we can show that
$$E(f(r_t) \mid X, V) = E(f(r_t) \mid V), \qquad \text{(A-12)}$$
where $f$ is a real, bounded and measurable function (see Proposition 2.3 in Constantinou and Dawid, 2017).

Since $X = \pi Z + V$, there is a one-to-one mapping between $(X, V)$ and $(\pi Z, V)$, and therefore the conditioning $\sigma$-algebra, $\sigma(X, V)$, is the same as the $\sigma$-algebra $\sigma(\pi Z, V)$. Hence,
$$E(f(r_t) \mid X, V) = E(f(r_t) \mid \pi Z, V). \qquad \text{(A-13)}$$
Since $r_t$ and $V$ are independent of $\pi Z$, we get $E(f(r_t) \mid \pi Z, V) = E(f(r_t) \mid V)$, and therefore we have $E(f(r_t) \mid X, V) = E(f(r_t) \mid V)$, which is what we wanted to show.

When the control function is $\boldsymbol\upsilon_t$, we have
$$E(f(r_t) \mid X, \boldsymbol\upsilon_t) = E(f(r_t) \mid X_{-t}, \mathbf x_t, \boldsymbol\upsilon_t) = E(f(r_t) \mid X_{-t}, \pi\mathbf z_t + \boldsymbol\upsilon_t, \boldsymbol\upsilon_t) = E(f(r_t) \mid X_{-t}, \pi\mathbf z_t, \boldsymbol\upsilon_t),$$
where the second equality follows because $\mathbf x_t = \pi\mathbf z_t + \boldsymbol\upsilon_t$ and the third by the same logic by which we get (A-13). Because $r_t = \theta + \zeta_t$ and $\boldsymbol\upsilon_s = \boldsymbol\alpha + \boldsymbol\epsilon_s$, $s \neq t$, are correlated even after conditioning on $\boldsymbol\upsilon_t$, it follows that, conditional on $\boldsymbol\upsilon_t$, $r_t$ is correlated with $\mathbf x_s = \pi\mathbf z_s + \boldsymbol\upsilon_s$, $s \neq t$; that is,
$$E(f(r_t) \mid X_{-t}, \pi\mathbf z_t, \boldsymbol\upsilon_t) \neq E(f(r_t) \mid \pi\mathbf z_t, \boldsymbol\upsilon_t) = E(f(r_t) \mid \boldsymbol\upsilon_t).$$

Theorem 1
If (i) $\mathrm{rank}(E(\mathbf x_t\mathbf x_t')) = d_x$; (ii) $\mathrm{rank}(\Pi) = d_x$, where $\Pi = (\pi \;\; \bar\pi)$; (iii) $\mathrm{rank}(E((\mathbf z_t', \bar{\mathbf z}')'(\mathbf z_t', \bar{\mathbf z}'))) = k$, where $k = \dim((\mathbf z_t', \bar{\mathbf z}')')$; and (iv) assumption AS 3 holds so that the covariance matrices of $\boldsymbol\epsilon_t$ and $\boldsymbol\alpha$ are of full rank, then $\mathrm{rank}(E(\mathbb X_t\mathbb X_t')) = 3d_x$.

Proof 3
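Now, condition (i) of the lemma is the "rank condition" for the standard probit model when $\mathbf x_{it}$ is exogenous and the object of interest is $\boldsymbol\varphi$ or marginal effects. This condition is assumed to hold true. Similarly, condition (iii) is the rank condition for the identification of the reduced form coefficients, $\Pi = (\pi \;\; \bar\pi)$, which is also assumed to hold.

To begin with, without loss of generality assume that $\mathbf z_t$ is uncorrelated with the individual effects $\boldsymbol\alpha$ so that $\bar\pi\bar{\mathbf z} = 0$. This implies that we can ignore $\bar{\mathbf z}$ in the reduced form equation (A-1) and consider only the dimension of $\mathbf z_t$, which is $d_z$, in condition (iii) of the lemma, and that $\Pi = \pi$, $\hat{\boldsymbol\alpha} = \hat{\mathbf a}$, $\hat{\boldsymbol\epsilon}_t = \mathbf x_t - \pi\mathbf z_t - \hat{\mathbf a}$, $k = d_z$, and $\pi$ is a $d_x \times d_z$ matrix.

Since $\hat{\boldsymbol\epsilon}_t = \boldsymbol\upsilon_t - \hat{\boldsymbol\alpha}$, if $E(\mathbb X_t\mathbb X_t')$, where $\mathbb X_t = (\mathbf x_t', \hat{\boldsymbol\epsilon}_t', \hat{\boldsymbol\alpha}')'$, were to be invertible (or equivalently have a rank of $3d_x$) so, too, would
$$E\left[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\,(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')\right]$$
be. This is equivalent to stating that the columns of $(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')$ are linearly independent. Now, if the columns of $(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')$ are linearly independent, then every subset of its columns, too, is linearly independent. Thus, to show the statement of the theorem to be true, we can show that
$$\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']) = d_x, \qquad \text{(A-14)}$$
$$\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']) = d_x \quad \text{and} \qquad \text{(A-15)}$$
$$\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\hat{\boldsymbol\alpha}']) = d_x, \qquad \text{(A-16)}$$
as both $\boldsymbol\upsilon_t$ and $\hat{\boldsymbol\alpha}$ are vectors of dimension $d_x$.

Since $E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']$ in eq. (A-14) has $3d_x$ rows and $d_x$ columns, $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']) \leq d_x$. Consider $\mathbf x_t\mathbf x_t'$ in eq. (A-14). By condition (i) of the lemma, $\mathrm{rank}(E[\mathbf x_t\mathbf x_t']) = d_x$, which implies that the $d_x$ columns of $E[\mathbf x_t\mathbf x_t']$ are linearly independent. This then implies that the $d_x$ columns of $E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']$ are also linearly independent; that is, $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']) \geq d_x$. Thus, we can conclude that $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\mathbf x_t']) = d_x$.

Again, given that $E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']$ in eq. (A-15) has $3d_x$ rows and $d_x$ columns, $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']) \leq d_x$. If we can show that the rank of $E[\boldsymbol\upsilon_t\boldsymbol\upsilon_t']$ in eq. (A-15) is $d_x$, then it would imply that the $d_x$ columns of $E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']$ are also linearly independent, implying that $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']) \geq d_x$. Thus, we would be able to show that $\mathrm{rank}(E[(\mathbf x_t', \boldsymbol\upsilon_t', \hat{\boldsymbol\alpha}')'\boldsymbol\upsilon_t']) = d_x$, as desired.

To show that $E[\boldsymbol\upsilon_t\boldsymbol\upsilon_t']$ has a full column rank of $d_x$ is equivalent to showing that $\boldsymbol\upsilon_t'\mathbf c = (\mathbf x_t' - \mathbf z_t'\pi')\mathbf c \neq 0$ almost surely (a.s.) whenever $\mathbf c \neq 0$, $\mathbf c \in \mathbb R^{d_x}$, and $\mathbf x_t \neq \pi\mathbf z_t$ a.s. Now, by condition (i) of the lemma,
$$\mathbf x_t'\mathbf c \neq 0 \ \text{a.s.} \qquad \text{(A-17)}$$
By condition (ii), according to which the rank of $\pi'$ is $d_x$,
$$\pi'\mathbf c = \mathbf c_\pi \neq 0, \ \text{where } \mathbf c_\pi \text{ is of dimension } d_z \times 1. \qquad \text{(A-18)}$$
By condition (iii) of the lemma and (A-18), we have
$$\mathbf z_t'\pi'\mathbf c = \mathbf z_t'\mathbf c_\pi \neq 0 \ \text{a.s.} \qquad \text{(A-19)}$$
From (A-17) and (A-19) we can conclude that $E[\boldsymbol\upsilon_t\boldsymbol\upsilon_t']$ has a full column rank of $d_x$; thus we are able to establish (A-15).

Similarly, to show (A-16) to be true, we can show that $\hat{\boldsymbol\alpha}'\mathbf c \neq 0$ a.s. whenever $\mathbf c \neq 0$, $\mathbf c \in \mathbb R^{d_x}$. Now,
$$\hat{\boldsymbol\alpha}'\mathbf c = \left([T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\Sigma_{\epsilon\epsilon}^{-1}\left(\sum_{t=1}^{T}\boldsymbol\upsilon_t\right)\right)'\mathbf c = \left(\sum_{t=1}^{T}\boldsymbol\upsilon_t'\right)\Sigma_{\epsilon\epsilon}^{-1}[T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\mathbf c = \left(\sum_{t=1}^{T}\boldsymbol\upsilon_t'\right)\mathbf c_\alpha.$$
Because $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$, the covariance matrices of $\boldsymbol\epsilon_t$ and $\boldsymbol\alpha$ respectively, are symmetric positive definite matrices, $\Sigma_{\epsilon\epsilon}^{-1}[T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}$ is nonsingular. This implies that $\Sigma_{\epsilon\epsilon}^{-1}[T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\mathbf c = \mathbf c_\alpha \neq 0$.

Since by (A-17) and (A-19), $(\mathbf x_t' - \mathbf z_t'\pi')\mathbf c_\alpha = \boldsymbol\upsilon_t'\mathbf c_\alpha \neq 0$ when $\mathbf x_t \neq \pi\mathbf z_t$ a.s., we therefore have
$$\hat{\boldsymbol\alpha}'\mathbf c = \left(\sum_{t=1}^{T}\boldsymbol\upsilon_t'\right)\mathbf c_\alpha \neq 0 \ \text{a.s.,} \qquad \text{(A-20)}$$
thus establishing (A-16). Having established (A-14), (A-15) and (A-16), we can conclude that $\mathrm{rank}(E(\mathbb X_t\mathbb X_t')) = 3d_x$.

Theorem 2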
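Let the linear structural model be $y_t = \mathbf x_t'\boldsymbol\varphi + \theta + \zeta_t$, $t \in \{1, \dots, T\}$. Under assumption AS 3 and condition (ii) of lemma 1 we can write $\theta$ as $\theta = E(\theta \mid Z) + \tau = \boldsymbol\rho_\alpha\bar\pi\bar{\mathbf z} + \tau$, which allows us to write the linear structural equation as
$$y_t = \mathbf x_t'\boldsymbol\varphi + \boldsymbol\rho_\alpha\bar\pi\bar{\mathbf z} + \tau + \zeta_t. \qquad \text{(A-21)}$$
Under assumptions ACF 1 and AS 4 in the main text, we can write $\theta + \zeta_t$ in the structural equation as $\theta + \zeta_t = E(\theta + \zeta_t \mid X, Z) + \eta_t = \hat{\boldsymbol\alpha}'\boldsymbol\varphi_\alpha + \hat{\boldsymbol\epsilon}_t'\boldsymbol\varphi_\epsilon + \eta_t$, which allows us to write the linear structural equation as
$$y_t = \mathbf x_t'\boldsymbol\varphi_{CF} + \hat{\boldsymbol\alpha}'\boldsymbol\varphi_\alpha + \hat{\boldsymbol\epsilon}_t'\boldsymbol\varphi_\epsilon + \eta_t, \qquad \text{(A-22)}$$
where $\tau + \zeta_t$ is mean independent of $Z$ and $\eta_t = \theta + \zeta_t - E(\theta + \zeta_t \mid X, Z)$ is mean independent of $X$ and $Z$.

Given that $X$ and $\tau + \zeta_t$ are dependent, let $\hat{\boldsymbol\varphi}_{IV}$ denote the estimated coefficient of $\mathbf x_t$ when (A-21) is estimated by two stage least squares (2SLS) with instruments $\bar{\mathbf z} = \frac{1}{T}\sum_{t=1}^{T}\mathbf z_t$, $\ddot{\mathbf z}_t = \mathbf z_t - \bar{\mathbf z}$, and $\bar{\mathbf z}'\bar\pi$. And let $\hat{\boldsymbol\varphi}_{CF}$ denote the estimate of $\boldsymbol\varphi$ obtained from estimating (A-22) by pooling the data. In the following, we show that $\hat{\boldsymbol\varphi}_{CF} = \hat{\boldsymbol\varphi}_{IV}$.

Proof 4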
Without any loss of generality assume that there is only one endogenous regressor, $x_t$; the proof generalizes to multiple regressors. Given the assumption in AS 3, the reduced form is given by
$$x_t = \mathbf z_t'\pi + \bar{\mathbf z}'\bar\pi + a + \epsilon_t, \qquad \text{(A-23)}$$
where $\bar{\mathbf z}'\bar\pi + a = \alpha$, $a \sim N(0, \sigma_\alpha^2)$ and $\epsilon_t \sim N(0, \sigma_\epsilon^2)$. Here, $\bar y$, $\bar x$ and $\bar{\mathbf z}$ denote group means of $y_t$, $x_t$, and $\mathbf z_t$ respectively.

Now, by Mundlak (1978) we know that $\pi$ is equal to the within estimator, $\pi_w$, of the reduced form equation (A-23) and $\bar\pi = \pi_b - \pi_w$, where $\pi_b$ is the between estimator of $\pi$ in (A-23). Thus by lemma 2, the control functions, $\hat\alpha$ and $\hat\epsilon_t$, in (A-22) are respectively given by
$$\hat\alpha = \bar{\mathbf z}'(\pi_b - \pi_w) + (1 - \lambda)(\bar x - \bar{\mathbf z}'\pi_b),$$
$$\hat\epsilon_t = x_t - \mathbf z_t'\pi_w - \bar{\mathbf z}'(\pi_b - \pi_w) - (1 - \lambda)(\bar x - \bar{\mathbf z}'\pi_b),$$
where $\lambda = \frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + T\sigma_\alpha^2}$.

It is convenient to write the panel model in (A-22) in vector form as
$$Y = X\varphi + \hat A\varphi_\alpha + \hat E\varphi_\epsilon + \boldsymbol\eta, \qquad \text{(A-24)}$$
where $Y$ is an $NT \times 1$ data matrix which has the observations on $y_{it}$ stacked in $NT$ rows; similarly, $X$ has stacked observations on $x_{it}$, $\hat A$ on $\hat\alpha_i$, and $\hat E$ on $\hat\epsilon_{it}$. Now, given $\hat\alpha$ and $\hat\epsilon_t$, we can write $\hat A$ and $\hat E$, respectively, as
$$\hat A = PZ\bar\pi + (1 - \lambda)PR \quad \text{and} \quad \hat E = QR + \lambda PR,$$
where $R$ is an $NT \times 1$ matrix, which has the residuals of the reduced form equation (A-23), $r_{it} = x_{it} - \mathbf z_{it}'\pi_w - \bar{\mathbf z}_i'(\pi_b - \pi_w)$, stacked in $NT$ rows, $Z$ has stacked observations on $\mathbf z_{it}$, $P$ is a matrix that averages the observations across time for each individual, i.e., $P = I_N \otimes \tilde J_T$, where $J_T$ is a matrix of ones of dimension $T$ and $\tilde J_T = J_T/T$, and $Q = I_{NT} - P$ is a matrix that obtains the deviations from individual means. Given the above, we can write (A-24) as
$$Y = X\varphi + (PZ\bar\pi + (1 - \lambda)PR)\varphi_\alpha + (QR + \lambda PR)\varphi_\epsilon + \boldsymbol\eta.$$
(A-25)
Because $P$ and $Q$ are idempotent and $P$ is orthogonal to $Q$, premultiplying (A-25) throughout by $Q$, we get
$$\ddot Y = \ddot X\varphi + \ddot R\varphi_\epsilon + \ddot{\boldsymbol\eta}, \qquad \text{(A-26)}$$
where $\ddot Y$ is an $NT \times 1$ matrix that has $\ddot y_{it} = y_{it} - \bar y_i$, the deviations of $y_{it}$ from the individual means, $\bar y_i$ (within transformation), stacked in $NT$ rows, and $\ddot R = \ddot X - \ddot Z\pi_w$ is the matrix obtained after within transforming the residuals, $R$. $\ddot R$, incidentally, are the residuals obtained after applying the within transformation to the reduced form equation (A-23) and estimating it as a fixed effect model.

Now, premultiplying (A-25) throughout by $P$, we get
$$\bar Y = \bar X\varphi + (\bar Z\bar\pi + (1 - \lambda)\bar R)\varphi_\alpha + \lambda\bar R\varphi_\epsilon + \bar{\boldsymbol\eta}, \qquad \text{(A-27)}$$
where $\bar Y$ is an $NT \times 1$ matrix that has the individual means, $\bar y_i$ (between transformation), stacked in $NT$ rows, and $\bar R = \bar X - \bar Z\pi_b$ is the matrix obtained after between transforming the residuals, $R$. $\bar R$, incidentally, are the residuals obtained after applying the between transformation to the reduced form equation (A-23) and estimating it by OLS.

Since the column space of the right-hand-side variables, $[\bar X, \bar Z\bar\pi + (1 - \lambda)\bar R, \lambda\bar R]$, in (A-27) is the same as that of $[\bar X, \bar Z\bar\pi, \bar R]$, the projections onto the two matrices will be the same. Therefore, the estimates of $\varphi$, $\varphi_\alpha$, and $\varphi_\epsilon$ in (A-27) will be the same from estimating the following equation:
$$\bar Y = \bar X\varphi + \bar Z\bar\pi\varphi_\alpha + \bar R\varphi_\epsilon + \bar{\boldsymbol\eta}. \qquad \text{(A-28)}$$
From (A-26) and (A-28), it is thus evident that estimating the following equation
$$Y = X\varphi + \bar Z\bar\pi\varphi_\alpha + [QR + PR]\varphi_\epsilon + \boldsymbol\eta \qquad \text{(A-29)}$$
by OLS would yield the same result as estimating (A-25) by OLS.

By the Frisch-Waugh-Lovell Theorem, $\varphi$ and $\varphi_\alpha$ in (A-29) are then estimated as
$$\begin{bmatrix}\hat\varphi_{CF}\\ \hat\varphi_\alpha\end{bmatrix} = \left[\begin{bmatrix}X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\left[QM_{\ddot R} + PM_{\bar R}\right]\begin{bmatrix}X & \bar Z\hat{\bar\pi}\end{bmatrix}\right]^{-1}\begin{bmatrix}X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\left[QM_{\ddot R} + PM_{\bar R}\right]Y,$$
where $\hat{\bar\pi}$ is the estimate of $\bar\pi$, $M_{\ddot R} = I_{NT} - \ddot R[\ddot R'\ddot R]^{-1}\ddot R'$ and $M_{\bar R} = I_{NT} - \bar R[\bar R'\bar R]^{-1}\bar R'$.
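Since $X$ and $Y$ can be decomposed as $X = \ddot X + \bar X$ and $Y = \ddot Y + \bar Y$ respectively, we can write the above equation as
$$\begin{bmatrix}\hat\varphi_{CF}\\ \hat\varphi_\alpha\end{bmatrix} = \left[\begin{bmatrix}\ddot X' + \bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\left[QM_{\ddot R} + PM_{\bar R}\right]\begin{bmatrix}\ddot X + \bar X & \bar Z\hat{\bar\pi}\end{bmatrix}\right]^{-1}\begin{bmatrix}\ddot X' + \bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\left[QM_{\ddot R} + PM_{\bar R}\right]\left[\ddot Y + \bar Y\right].$$
Taking into account the orthogonality conditions, the above simplifies to
$$\begin{bmatrix}\hat\varphi_{CF}\\ \hat\varphi_\alpha\end{bmatrix} = \left[\begin{bmatrix}\ddot X'M_{\ddot R}\ddot X & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix}\bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}M_{\bar R}\begin{bmatrix}\bar X & \bar Z\hat{\bar\pi}\end{bmatrix}\right]^{-1}\left[\begin{bmatrix}\ddot X'M_{\ddot R}\ddot Y\\ 0\end{bmatrix} + \begin{bmatrix}\bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}M_{\bar R}\bar Y\right]. \qquad \text{(A-30)}$$
Because $\ddot R = M_{\ddot Z}\ddot X$, where $M_{\ddot Z} = I_{NT} - \ddot Z[\ddot Z'\ddot Z]^{-1}\ddot Z'$, it can be verified that $\ddot X'M_{\ddot R} = \ddot X'P_{\ddot Z}$, where $P_{\ddot Z} = \ddot Z[\ddot Z'\ddot Z]^{-1}\ddot Z'$ is the projection matrix. And because $\bar R = M_{\bar Z}\bar X$, where $M_{\bar Z} = I_{NT} - \bar Z[\bar Z'\bar Z]^{-1}\bar Z'$, it can be verified that $\bar X'M_{\bar R} = \bar X'P_{\bar Z}$, where $P_{\bar Z} = \bar Z[\bar Z'\bar Z]^{-1}\bar Z'$, and that $\hat{\bar\pi}'\bar Z'M_{\bar R} = \hat{\bar\pi}'\bar Z'$. Thus, we can write $\hat\varphi_{CF}$ and $\hat\varphi_\alpha$ in (A-30) as
$$\begin{bmatrix}\hat\varphi_{CF}\\ \hat\varphi_\alpha\end{bmatrix} = \left[\begin{bmatrix}\ddot X'P_{\ddot Z}\ddot X & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix}\bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}P_{\bar Z}\begin{bmatrix}\bar X & \bar Z\hat{\bar\pi}\end{bmatrix}\right]^{-1}\left[\begin{bmatrix}\ddot X'P_{\ddot Z}\ddot Y\\ 0\end{bmatrix} + \begin{bmatrix}\bar X'\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}P_{\bar Z}\bar Y\right], \qquad \text{(A-31)}$$
which is the same as the estimates $\hat\varphi_{IV}$ and $\hat\rho_\alpha$ when (A-21) is estimated by 2SLS using $\bar{\mathbf z}$, $\ddot{\mathbf z}_t = \mathbf z_t - \bar{\mathbf z}$ and $\bar{\mathbf z}'\hat{\bar\pi}$ as instruments.

If only the within transform, equation (A-26), is estimated, then by a similar process as above, beginning with the Frisch-Waugh-Lovell Theorem, one obtains
$$\hat\varphi^w_{CF} = [\ddot X'P_{\ddot Z}\ddot X]^{-1}\ddot X'P_{\ddot Z}\ddot Y, \qquad \text{(A-32)}$$
which is the same as the within estimate, $\hat\varphi^w_{IV}$, when (A-21) is estimated by fixed effect two-stage least squares (FE2SLS) that utilizes $\ddot{\mathbf z}_t = \mathbf z_t - \bar{\mathbf z}$ as instruments.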
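The within-transform equivalence in (A-32) can be illustrated on simulated data. The sketch below regresses the within-transformed outcome on the within-transformed regressor and the first-stage residual (the control function route), and compares the coefficient with the FE2SLS formula. The data generating process, sample sizes and variable names are hypothetical choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dz = 200, 5, 2  # hypothetical sample sizes and number of instruments

# Simulated panel with one endogenous regressor (illustrative DGP only)
z = rng.normal(size=(N, T, dz))
a = rng.normal(size=(N, 1))                  # individual effect in the reduced form
eps = rng.normal(size=(N, T))                # reduced-form idiosyncratic error
x = z @ np.array([1.0, -0.5]) + a + eps
theta = 0.8 * a + rng.normal(size=(N, 1))    # structural effect, correlated with a
zeta = 0.6 * eps + rng.normal(size=(N, T))   # structural error, correlated with eps
y = 1.5 * x + theta + zeta


def within(v):
    """Subtract individual time means (the Q transformation)."""
    return v - v.mean(axis=1, keepdims=True)


Xdd = within(x).reshape(-1, 1)
Ydd = within(y).reshape(-1, 1)
Zdd = within(z).reshape(-1, dz)

# First stage on within-transformed data: fitted values and residuals
pi_w = np.linalg.lstsq(Zdd, Xdd, rcond=None)[0]
X_fit = Zdd @ pi_w   # P_Z applied to Xdd
Rdd = Xdd - X_fit    # M_Z applied to Xdd

# Control function route: OLS of Ydd on [Xdd, Rdd], keep the coefficient on Xdd
W = np.hstack([Xdd, Rdd])
phi_cf = np.linalg.lstsq(W, Ydd, rcond=None)[0][0, 0]

# FE2SLS route: [Xdd' P_Z Xdd]^{-1} Xdd' P_Z Ydd
phi_fe2sls = ((X_fit.T @ Ydd) / (X_fit.T @ Xdd)).item()

print(phi_cf, phi_fe2sls)
```

The two numbers should coincide up to floating-point error, because the residual-maker of $\ddot R$ applied to $\ddot X$ returns the first-stage fitted values.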
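If only the between transform, equation (A-28), is estimated, then, again by a process similar to the one used to derive (A-31), one finally obtains
$$\begin{bmatrix}\hat\varphi^b_{CF}\\ \hat\varphi^b_\alpha\end{bmatrix} = \left[\begin{bmatrix}\bar X'P_{\bar Z}\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\begin{bmatrix}P_{\bar Z}\bar X & \bar Z\hat{\bar\pi}\end{bmatrix}\right]^{-1}\begin{bmatrix}\bar X'P_{\bar Z}\\ \hat{\bar\pi}'\bar Z'\end{bmatrix}\bar Y,$$
which is the estimate of $\varphi^b_{IV}$ and $\rho^b_\alpha$ when (A-21) is between transformed and estimated by two-stage least squares (2SLS) with $\bar{\mathbf z}$ and $\bar{\mathbf z}'\hat{\bar\pi}$ as instruments. After some algebraic manipulations, we get
$$\hat\varphi^b_{CF} = \left[\bar X'P_{\bar Z}M_{\bar Z\hat{\bar\pi}}P_{\bar Z}\bar X\right]^{-1}\bar X'P_{\bar Z}M_{\bar Z\hat{\bar\pi}}\bar Y,$$
where $M_{\bar Z\hat{\bar\pi}} = I - \bar Z\hat{\bar\pi}[\hat{\bar\pi}'\bar Z'\bar Z\hat{\bar\pi}]^{-1}\hat{\bar\pi}'\bar Z'$.

Lemma 3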
If the endogenous variables, $\mathbf x$, have a large support, then under AS 3, the support of the conditional distribution of $\hat{\boldsymbol\alpha}(X, Z, \Theta)$ and $\hat{\boldsymbol\epsilon}_t(X, Z, \Theta)$, conditional on $\mathbf x_t = \bar{\mathbf x}$, is the same as the support of their marginal distribution.

Proof 5
(a) We have shown that the expected values of $\boldsymbol\alpha = \bar\pi\bar{\mathbf z} + \mathbf a$ and $\boldsymbol\epsilon_t$ given $Z$ and $X$, where $\mathbf a$ and $\boldsymbol\epsilon_t$ are normally distributed with variances $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$ respectively, are given by
$$E(\boldsymbol\alpha \mid X, Z) = \hat{\boldsymbol\alpha} = \bar\pi\bar{\mathbf z} + \hat{\mathbf a} = \bar\pi\bar{\mathbf z} + \sum_{t=1}^{T}\Omega(\mathbf x_t - \Pi\mathcal Z_t) \quad \text{and}$$
$$E(\boldsymbol\epsilon_t \mid X, Z) = \hat{\boldsymbol\epsilon}_t = \mathbf x_t - \Pi\mathcal Z_t - \sum_{s=1}^{T}\Omega(\mathbf x_s - \Pi\mathcal Z_s)$$
respectively, where $\Pi = (\pi, \bar\pi)$, $\mathcal Z_t = (\mathbf z_t', \bar{\mathbf z}')'$ and $\Omega = [T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\Sigma_{\epsilon\epsilon}^{-1}$.

Because the support of $\mathbf x_t$ is $\mathbb R^{d_x}$ and $\Omega$ is a $d_x \times d_x$ nonsingular matrix,
$$\mathrm{Supp}(\hat{\boldsymbol\alpha}) = \mathrm{Supp}(\hat{\boldsymbol\epsilon}_t) = \mathbb R^{d_x}$$
whether or not $\mathbf z_t$ has a large support. Now fix $\mathbf x_t = \bar{\mathbf x}$. Then, because the $\mathbf x_s$'s, $s \neq t$, are not restricted, we have
$$\mathrm{Supp}(\hat{\boldsymbol\alpha} \mid \mathbf x_t = \bar{\mathbf x}) = \mathrm{Supp}\left(\bar\pi\bar{\mathbf z} + \Omega(\bar{\mathbf x} - \Pi\mathcal Z_t) + \sum_{s \neq t}\Omega(\mathbf x_s - \Pi\mathcal Z_s)\right) = \mathbb R^{d_x} \quad \text{and}$$
$$\mathrm{Supp}(\hat{\boldsymbol\epsilon}_t \mid \mathbf x_t = \bar{\mathbf x}) = \mathrm{Supp}\left([I_m - \Omega](\bar{\mathbf x} - \Pi\mathcal Z_t) - \sum_{s \neq t}\Omega(\mathbf x_s - \Pi\mathcal Z_s)\right) = \mathbb R^{d_x}.$$

(b) When we have a single endogenous variable, $x_t$, given by
$$x_t = \Pi\mathcal Z_t + a + \epsilon_t, \quad t = 1, \dots, T,$$
where the errors, $\boldsymbol\epsilon \equiv (\epsilon_1, \dots, \epsilon_T)'$, are non-spherical such that $E(\boldsymbol\epsilon\boldsymbol\epsilon') = \Omega_{\epsilon\epsilon}$, an invertible $T \times T$ matrix, and $a$ is normally distributed with variance $\sigma_\alpha^2$, then we showed that
$$\hat a(X, Z, \Theta) = (x_1 - \Pi\mathcal Z_1)\omega_1 + \dots + (x_T - \Pi\mathcal Z_T)\omega_T,$$
where $(\omega_1, \dots, \omega_T)' = \Omega_{\epsilon\epsilon}^{-1}e_T\,/\,(e_T'\Omega_{\epsilon\epsilon}^{-1}e_T + \sigma_\alpha^{-2})$ and $e_T$ is a vector of ones of dimension $T$. Given that the $x_t$'s have large supports, using a similar argument as in part (a), we get $\mathrm{Supp}(\hat a \mid x_t = \bar x) = \mathbb R$ and $\mathrm{Supp}(\hat\epsilon_t \mid x_t = \bar x) = \mathbb R$.

B Estimation Of Probit Conditional Mean Function

If $\eta_t$ in equation (2.9) in the main text is assumed to follow a normal distribution, then
$$E(y_t \mid X, Z) = \Phi((\mathbb X_t'\Theta)/\sigma), \qquad \text{(B-1)}$$
where $\mathbb X_t = (\mathbf x_t', \hat{\boldsymbol\alpha}'(X, Z), \hat{\boldsymbol\epsilon}_t'(X, Z))'$, $\Theta = (\boldsymbol\varphi', \boldsymbol\rho_\alpha', \boldsymbol\rho_\epsilon')'$, and $\sigma^2$ is the variance of $\eta_t$. Since in probit models the coefficients can only be identified up to scale, in this section, with a slight abuse of notation, we denote the scaled parameters, $\Theta/\sigma$, by $\Theta$. To estimate $\Theta$, one can employ nonlinear least squares by pooling the data. However, as PW discuss, since $\mathrm{Var}(y_t \mid X, Z)$ will most likely be heteroscedastic and since there will be serial correlation across time in the joint distribution, $F(y_1, \dots, y_T \mid X, Z)$, the estimates, though consistent, will be inefficient, and the usual standard errors will be biased. PW argue that modelling $F(y_1, \dots, y_T \mid X, Z)$ when the $y_t$'s are fractional responses and applying MLE methods, while possible, is not trivial. Moreover, if the model for $F(y_1, \dots, y_T \mid X, Z)$ is misspecified but $E(y_t \mid X, Z)$ is correctly specified, the MLE will be inconsistent for $\Theta$ and the resulting APEs. This is likely to be true when response outcomes are binary.

To account for heteroscedasticity and serial dependence when all covariates are exogenous, PW employ the method of multivariate weighted nonlinear least squares (MWNLS) to obtain efficient estimates of $\Theta$.
To get the correct estimates of the standard errors, the method requires a parametric model of $\mathrm{Var}(y_i \mid X_i, Z_i)$, where $y_i$ is the $T \times 1$ vector of responses. The conditional variance of $y_t$ is specified as
$$\mathrm{Var}(y_t \mid X, Z) = \tau\,m(\mathbb X_t, \Theta)(1 - m(\mathbb X_t, \Theta)), \qquad \text{(B-2)}$$
where $m(\mathbb X_t, \Theta) = \Phi(\mathbb X_t'\Theta)$ and $0 < \tau \leq 1$. For the covariances, $\mathrm{Cov}(y_t, y_r \mid X, Z)$, a "working" version, which can be misspecified for $\mathrm{Var}(y \mid X, Z)$, is assumed. This, in the context of panel data, is what underlies the method of generalized estimating equation (GEE), as described in Liang and Zeger (1986). The main advantage of GEE lies in the consistent and unbiased estimation of the parameters' standard errors even when the correlation structure is misspecified. Also, GEE and MWNLS are asymptotically equivalent whenever they use the same estimates of the $T \times T$ positive definite matrix, $\mathrm{Var}(y \mid X, Z)$.

(Footnote: Another possibility would be to assume a form of heteroscedasticity, such as multiplicative heteroscedasticity, as is common in the heteroscedastic probit model. However, since it is likely that there will be serial dependence across time, we favour the GEE method proposed in this section, which can potentially account for both heteroscedasticity and serial dependence.)

Generally, the conditional correlations, $\mathrm{Cov}(y_t, y_s \mid X, Z)$, are a function of $X$ and $Z$. In the GEE literature, the "working correlation matrix" is that which assumes the dependency structure to be invariant over all observations; that is, the correlations are not a function of $X$ and $Z$. Here we will focus on a particular correlation matrix that is suited for panel data applications with small $T$. In the GEE literature it is called an "exchangeable" correlation pattern.
Exchangeable correlation assumes constant time dependency, so that all the off-diagonal elements of the correlation matrix are equal. Other correlation patterns could also be assumed, such as "autoregressive", which assumes the correlations to be an exponential function of the time lag, or "stationary M", which assumes constant correlations within equal time intervals.

The GEE method suggests that the parameter, $\rho$, that characterizes $\mathrm{Var}(y \mid X, Z) = V(X, Z, \Theta, \tau, \rho)$ can be estimated using simple functions of the residuals, $u_t$,
$$u_t = y_t - E(y_t \mid X, Z) = y_t - m(\mathbb X_t, \Theta),$$
where the mean function, $E(y_t \mid X, Z)$, is correctly specified. With the variance defined in (B-2), we can define standardized errors as
$$e_t = \frac{u_t}{\sqrt{m(\mathbb X_t, \Theta)(1 - m(\mathbb X_t, \Theta))}}.$$
Then we have $\mathrm{Var}(e_t \mid X, Z) = \tau$. The exchangeability assumption is that the pairwise correlations between pairs of standardized errors are constant, say $\rho$. This, to reiterate, is a "working" assumption that leads to an estimated variance matrix to be used in MWNLS. Neither consistency of the estimator of $\rho$, nor valid inference, will rest on exchangeability being true.

To estimate a common correlation parameter, let $\tilde\Theta$ be a preliminary, consistent estimator of $\Theta$. $\tilde\Theta$ could be the pooled ML estimate of the heteroscedastic probit model. Define the residuals, $\tilde u_t$, as $\tilde u_t = y_t - m(\mathbb X_t, \tilde\Theta)$ and the standardized residuals as
$$\tilde e_t = \frac{\tilde u_t}{\sqrt{m(\mathbb X_t, \tilde\Theta)(1 - m(\mathbb X_t, \tilde\Theta))}}.$$
Then a natural estimator of a common correlation coefficient is
$$\tilde\rho = [NT(T - 1)]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s \neq t}\tilde e_{it}\tilde e_{is}. \qquad \text{(B-3)}$$
Under standard regularity conditions, without much restriction on $\mathrm{Corr}(e_t, e_s \mid X, Z)$, the plim of $\tilde\rho$ is
$$\mathrm{plim}(\tilde\rho) = [T(T - 1)]^{-1}\sum_{t=1}^{T}\sum_{s \neq t}E(e_{it}e_{is}) \equiv \rho^*. \qquad \text{(B-4)}$$
If $\mathrm{Corr}(e_t, e_s \mid X, Z)$ happens to be the same for all $t \neq s$, then $\tilde\rho$ consistently estimates this constant correlation.
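The estimator in (B-3) can be sketched in a few lines. The function below is only an illustration under stated assumptions: the name `exchangeable_rho`, the array layout, and the idea of passing the preliminary fitted index $\mathbb X_{it}'\tilde\Theta$ as an input are all hypothetical conveniences, not part of the paper or of any package.

```python
import numpy as np
from statistics import NormalDist

# Standard normal CDF applied elementwise (standard library only)
_norm_cdf = np.vectorize(NormalDist().cdf)


def exchangeable_rho(y, index):
    """Common correlation estimate in the spirit of (B-3).

    y     : (N, T) array of binary responses
    index : (N, T) array of preliminary fitted indices (hypothetical input)
    """
    m = _norm_cdf(index)                   # fitted probit mean
    e = (y - m) / np.sqrt(m * (1.0 - m))   # standardized residuals
    N, T = y.shape
    row_sum = e.sum(axis=1)
    # sum over t and s != t of e_it * e_is equals (sum_t e_it)^2 - sum_t e_it^2
    cross = row_sum**2 - (e**2).sum(axis=1)
    return cross.sum() / (N * T * (T - 1))


# Tiny usage example with made-up numbers
y_demo = np.array([[1, 0, 1], [0, 0, 1]])
index_demo = np.zeros((2, 3))
rho_demo = exchangeable_rho(y_demo, index_demo)
print(rho_demo)
```

The double sum over $s \neq t$ is computed through the identity noted in the comment, which avoids an explicit loop over time pairs.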
Generally, $\tilde\rho$ consistently estimates the average of these correlations across all $(t, s)$ pairs, which is defined as $\rho^*$. Given the estimated $T \times T$ working correlation matrix, $C(\tilde\rho)$, which has unity down its diagonal and $\tilde\rho$ everywhere else, we can construct the estimated working variance matrix:
$$V(X, Z, \tilde\Theta, \tilde\rho) = D(X, Z, \tilde\Theta)^{1/2}\,C(\tilde\rho)\,D(X, Z, \tilde\Theta)^{1/2} = V(X, Z, \tilde\Upsilon),$$
where $D(X, Z, \Theta)$ is the $T \times T$ diagonal matrix with $m(\mathbb X_t, \Theta)(1 - m(\mathbb X_t, \Theta))$ down its diagonal. (Note that dropping the variance scale factor, $\tau$, has no effect on estimation or inference.)

Estimation by MWNLS then involves solving for $\hat\Theta$ by minimizing the following with respect to $\Theta$:
$$\min_\Theta \sum_{i=1}^{N}[y_i - m_i(X_i, Z_i, \Theta)]'[V(X_i, Z_i, \tilde\Upsilon)]^{-1}[y_i - m_i(X_i, Z_i, \Theta)], \qquad \text{(B-5)}$$
where $m_i(X_i, Z_i, \Theta)$ is the $T$ vector with $t$-th element being $m(\mathbb X_{it}, \Theta)$.

The requirement of GEE is that the mean model, $E(y_t \mid X, Z)$, be correctly specified, else the GEE approach to estimation can give inconsistent results. We have, given our identifying assumptions, shown that $E(y_t \mid X, Z) = \Phi(\mathbb X_t'\Theta)$, and therefore we can employ GEE to account for serial correlation across time. Once the control functions have been estimated, one can then use the STATA command "xtgee", which fits generalized linear models and allows one to specify the within-group correlation structure for the panels, to estimate $\Theta$.

C Panel Probit Model with Random Coefficients in a Triangular System
In this section we extend the model with random effects studied in section 2 to allow for random coefficients, and discuss identification of certain structural measures of interest. Consider the following binary choice random coefficient model
$$y_{it} = 1\{y^*_{it} = \mathbb X_{it}'\boldsymbol\varphi_i + \zeta_{it} > 0\}, \qquad \text{(C-1)}$$
where $\mathbb X_{it} = (x_{it}, \mathbf w_{it}')'$ and $\zeta_{it}$ are the idiosyncratic errors. Here we consider a single continuous endogenous variable, $x_{it}$, with a large support. Let $d_w$ be the dimension of the exogenous variables, $\mathbf w_{it}$. In (C-1), the random coefficients are $\boldsymbol\varphi_i = \boldsymbol\varphi + \boldsymbol\theta_i$, where $\boldsymbol\varphi$ is a $(1 + d_w) \times 1$ vector of constants and $E(\boldsymbol\theta_i) = 0$. Thus, $\boldsymbol\varphi$ is the average slope that we might be interested in.

The reduced form in the triangular system is given by:
$$x_{it} = \mathbf z_{it}'\boldsymbol\alpha_i + \epsilon_{it}, \qquad \text{(C-2)}$$
where $\mathbf z_{it} = (\mathbf w_{it}', \tilde{\mathbf z}_{it}')'$. The dimension of the vector of instruments, $\tilde{\mathbf z}_{it}$, $d_z$, is greater than or equal to 1. $\boldsymbol\alpha_i = \boldsymbol\alpha + \mathbf a_i$ is the $(d_w + d_z) \times 1$ vector of random coefficients, where $\boldsymbol\alpha$ is a vector of constants and $\mathbf a_i$ a vector of stationary random variables with zero means and constant variance-covariances. And finally, $\epsilon_{it}$ is a scalar idiosyncratic term.

The identifying distributional restrictions are summarized as follows:

RC 1 (a) $(\boldsymbol\theta_i, \boldsymbol\zeta_i), (\mathbf a_i, \boldsymbol\epsilon_i) \perp\!\!\!\perp Z_i$ and (b) $\boldsymbol\theta_i, \mathbf a_i \perp\!\!\!\perp \boldsymbol\zeta_i, \boldsymbol\epsilon_i$, where $Z_i \equiv (\mathbf z_{i1}, \dots, \mathbf z_{iT})'$ is a $T \times (d_w + d_z)$ matrix, $\boldsymbol\zeta_i \equiv (\zeta_{i1}, \dots, \zeta_{iT})'$, and $\boldsymbol\epsilon_i \equiv (\epsilon_{i1}, \dots, \epsilon_{iT})'$.

In the above assumption, $\mathbf z_{it}$ is independent of the random coefficients, $(\boldsymbol\varphi_i, \boldsymbol\alpha_i)$, and the idiosyncratic errors, $(\zeta_{it}, \epsilon_{it})$. Also, as in the random effects model, we assume that the random coefficients and the idiosyncratic errors are independent of each other.

RC 2 $\boldsymbol\theta_i, \zeta_{it} \mid X_i, Z_i, \mathbf a_i \sim \boldsymbol\theta_i, \zeta_{it} \mid X_i - E(X_i \mid Z_i, \mathbf a_i), Z_i, \mathbf a_i \sim \boldsymbol\theta_i, \zeta_{it} \mid \boldsymbol\epsilon_i, Z_i, \mathbf a_i \sim \boldsymbol\theta_i, \zeta_{it} \mid \boldsymbol\epsilon_i, \mathbf a_i$, where $X_i \equiv (x_{i1}, \dots, x_{iT})'$ and $\boldsymbol\epsilon_i = X_i - E(X_i \mid Z_i, \mathbf a_i) = X_i - Z_i(\boldsymbol\alpha + \mathbf a_i)$.

In RC 2 the assumption is that the dependence of the structural error terms $\boldsymbol\theta_i$ and $\zeta_{it}$ on $X_i$, $Z_i$, and $\mathbf a_i$ is completely characterized by the reduced form error components, $\boldsymbol\epsilon_i$ and $\mathbf a_i$. If, given $(\epsilon_{it}, \mathbf a_i)$, only contemporaneous correlations matter, then $\boldsymbol\theta_i, \zeta_{it} \perp\!\!\!\perp \boldsymbol\epsilon_{i,-t} \mid (\epsilon_{it}, \mathbf a_i)$.

As in the model for random effects, we specify the marginal distributions of $\mathbf a_i$ and $\epsilon_{it}$. We assume that

RC 3 $\mathbf a_i \sim N(0, \Sigma_a)$ and $\epsilon_{it} \sim N(0, \sigma_\epsilon^2)$.

Let $\Theta \equiv \{\boldsymbol\alpha, \Sigma_a, \sigma_\epsilon^2\}$ denote the set of parameters of the random coefficient model in (C-2), the reduced form equation.
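The random coefficient model is a standard one, and most statistical packages have routines to estimate $\Theta$.

Given assumption RC 2, we have
$$E(\boldsymbol\theta_i \mid X_i, Z_i, \mathbf a_i) = E(\boldsymbol\theta_i \mid \mathbf a_i, \boldsymbol\epsilon_i) = E(\boldsymbol\theta_i \mid \mathbf a_i) = \boldsymbol\rho_{\theta a}\mathbf a_i \quad \text{and}$$
$$E(\zeta_{it} \mid X_i, Z_i, \mathbf a_i) = E(\zeta_{it} \mid \mathbf a_i, \boldsymbol\epsilon_i) = E(\zeta_{it} \mid \boldsymbol\epsilon_i) = \boldsymbol\rho_{\zeta\epsilon}\boldsymbol\epsilon_i, \qquad \text{(C-3)}$$
where the second equality in each of the above follows from part (b) of assumption RC 1. In the above, $\boldsymbol\rho_{\theta a}$ is the $(d_w + 1) \times (d_w + d_z)$ matrix of population regression coefficients of $\boldsymbol\theta_i$ on $\mathbf a_i$, and $\boldsymbol\rho_{\zeta\epsilon}$ is the population regression coefficient of $\zeta_{it}$ on $\boldsymbol\epsilon_i$.

Our assumptions and (C-3) then imply that the conditional expectation of $y^*_{it}$ given $X_i$, $Z_i$, and $\mathbf a_i$ is given by
$$E(y^*_{it} \mid X_i, Z_i, \mathbf a_i) = \mathbb X_{it}'\boldsymbol\varphi + \mathbb X_{it}'\boldsymbol\rho_{\theta a}\mathbf a_i + \boldsymbol\rho_{\zeta\epsilon}\boldsymbol\epsilon_i.$$
Because the stochastic part, $\mathbf a_i$, of the random coefficients in the reduced form equation is unobserved, the conditioning variable, $\boldsymbol\epsilon_i = X_i - Z_i(\boldsymbol\alpha + \mathbf a_i)$, too, is not identified. To estimate the structural parameters, as in the model with random effects, we first integrate out $\mathbf a_i$ from $E(y^*_{it} \mid X_i, Z_i, \mathbf a_i)$ with respect to its conditional distribution, $f(\mathbf a_i \mid X_i, Z_i)$, to obtain
$$E(y^*_{it} \mid X_i, Z_i) = \int E(y^*_{it} \mid X_i, Z_i, \mathbf a_i)f(\mathbf a_i \mid X_i, Z_i)\,d\mathbf a_i = \mathbb X_{it}'\boldsymbol\varphi + \mathbb X_{it}'\boldsymbol\rho_{\theta a}\hat{\mathbf a}_i + \boldsymbol\rho_{\zeta\epsilon}\hat{\boldsymbol\epsilon}_i, \qquad \text{(C-4)}$$
where $\hat{\mathbf a}_i = E(\mathbf a_i \mid X_i, Z_i)$ and $\hat{\boldsymbol\epsilon}_i = X_i - Z_i(\boldsymbol\alpha + \hat{\mathbf a}_i)$. In lemma C1 we show that

LEMMA C1 If $x_{it} = \mathbf z_{it}'\boldsymbol\alpha + \mathbf z_{it}'\mathbf a_i + \epsilon_{it}$, and if RC 1 and RC 3 hold, then
$$E(\mathbf a_i \mid X_i, Z_i) = \hat{\mathbf a}_i(X_i, Z_i, \Theta) = \left[\sum_{t=1}^{T}\mathbf z_{it}\mathbf z_{it}' + \sigma_\epsilon^2\Sigma_a^{-1}\right]^{-1}\left(\sum_{t=1}^{T}\mathbf z_{it}(x_{it} - \mathbf z_{it}'\boldsymbol\alpha)\right).$$

PROOF OF LEMMA C1 Given in section C.1 of this appendix.

From (C-3) and (C-4), it therefore follows that
$$E(\boldsymbol\varphi_i \mid X_i, Z_i) = E(\boldsymbol\varphi + \boldsymbol\theta_i \mid X_i, Z_i) = \boldsymbol\varphi + \boldsymbol\rho_{\theta a}\hat{\mathbf a}_i \quad \text{and} \quad E(\zeta_{it} \mid X_i, Z_i) = \boldsymbol\rho_{\zeta\epsilon}\hat{\boldsymbol\epsilon}_i. \qquad \text{(C-5)}$$
Writing $\boldsymbol\varphi_i$ and $\zeta_{it}$ in error form as $\boldsymbol\varphi_i = \boldsymbol\varphi + \boldsymbol\rho_{\theta a}\hat{\mathbf a}_i + \tilde{\boldsymbol\theta}_i$ and $\zeta_{it} = \boldsymbol\rho_{\zeta\epsilon}\hat{\boldsymbol\epsilon}_i + \tilde\zeta_{it}$ respectively, we can write the structural equation (C-1) as
$$y_{it} = 1\{y^*_{it} = \mathbb X_{it}'\boldsymbol\varphi + \mathbb X_{it}'\boldsymbol\rho_{\theta a}\hat{\mathbf a}_i + \boldsymbol\rho_{\zeta\epsilon}\hat{\boldsymbol\epsilon}_i + \mathbb X_{it}'\tilde{\boldsymbol\theta}_i + \tilde\zeta_{it} > 0\}. \qquad \text{(C-6)}$$
While $\tilde{\boldsymbol\theta}_i$ and $\tilde\zeta_{it}$ are mean independent of $X_i$ and $Z_i$, if, as in Chamberlain (1984), we make a stronger assumption of complete independence and assume that $\tilde{\boldsymbol\theta}_i$ and $\tilde\zeta_{it}$ are distributed normally with mean zero and variances $\Sigma_\theta$ and 1 respectively, then the parameters, $\Theta \equiv \{\boldsymbol\varphi, \boldsymbol\rho_{\theta a}, \boldsymbol\rho_{\zeta\epsilon}, \Sigma_\theta\}$, of the above model can be estimated by the integrated maximum likelihood method, where one can integrate out $\tilde{\boldsymbol\theta}_i$ using numerical multidimensional integration (see Heiss and Winschel, 2008). Alternatively, maximum simulated likelihood or Markov Chain Monte Carlo (MCMC) methods, as discussed in Greene (2004), can also be used to obtain $\Theta$.

Once $\Theta$ is estimated, the following measures of interest can be obtained. (A) The expected value, $E(\boldsymbol\varphi_i \mid X_i, Z_i) = \boldsymbol\varphi + \boldsymbol\rho_{\theta a}\hat{\mathbf a}_i$.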
(B) The Average Partial Effect (APE) of changing a variable, say $w$, in time period $t$ from $w_{it}$ to $w_{it} + \Delta_w$ can be obtained as
$$\frac{\Delta G(\mathbb X_{it})}{\Delta_w} = \frac{G(\mathbb X_{it,-w}, (w_{it} + \Delta_w)) - G(\mathbb X_{it})}{\Delta_w}, \quad \text{where} \quad G(\mathbb X_{it}) = \int \Phi\left(\frac{\mathbb X_{it}'\boldsymbol\varphi + \mathbb X_{it}'\boldsymbol\rho_{\theta a}\hat{\mathbf a}_i + \boldsymbol\rho_{\zeta\epsilon}\hat{\boldsymbol\epsilon}_i}{\sqrt{1 + \mathbb X_{it}'\Sigma_\theta\mathbb X_{it}}}\right)dF(\hat{\mathbf a}, \hat{\boldsymbol\epsilon}).$$
The above integral (partial mean) rests on the assumption that a support requirement analogous to that in lemma 3 of the main text is satisfied.

Since we were able to identify the average slope coefficients and the APEs when we augmented the structural equation with $\hat{\mathbf a}_i' \otimes \mathbb X_{it}'$ and $\hat{\boldsymbol\epsilon}_i$, as in the random effects case, we propose $(\hat{\boldsymbol\epsilon}_i, \hat{\mathbf a}_i)$ to be used as control functions.

ACF 2 $\zeta_{it}, \boldsymbol\theta_i \mid X_i, Z_i, \hat{\mathbf a}_i \sim \zeta_{it}, \boldsymbol\theta_i \mid \hat{\boldsymbol\epsilon}_i, Z_i, \hat{\mathbf a}_i \sim \zeta_{it}, \boldsymbol\theta_i \mid \hat{\boldsymbol\epsilon}_i, \hat{\mathbf a}_i$, where $\hat{\boldsymbol\epsilon}_i \equiv (\hat\epsilon_{i1}, \dots, \hat\epsilon_{iT})' = X_i - Z_i(\boldsymbol\alpha + \hat{\mathbf a}_i)$ and $\hat{\mathbf a}_i = E(\mathbf a_i \mid X_i, Z_i)$.

In the above, $\hat{\boldsymbol\epsilon}_i(X_i, Z_i)$ and $\hat{\mathbf a}_i(X_i, Z_i)$ are assumed to fully characterize the dependence of $X_i$ and $Z_i$ on the structural errors, $\zeta_{it}$ and $\boldsymbol\theta_i$, in (C-1). With $\hat{\boldsymbol\epsilon}_i$ and $\hat{\mathbf a}_i$ as control functions, the semiparametric method in Hoderlein and Sherman (2015) can be employed to estimate the mean of $\boldsymbol\varphi_i$.

Kasy (2011) considers non-separable triangular systems for cross-sectional data to characterize systems for which control functions exist, such as $C(x, z) = x - E(x \mid z)$ or $C(x, z) = F(x \mid z)$, where $F$ is the conditional cumulative distribution function of $x$ given $z$.
Kasy shows that when unobserved heterogeneity in the first-stage reduced form equations is multi-dimensional, as in reduced form equations with random coefficients, the errors in the structural equation are not independent of the endogenous covariates, $x$, or the instruments, $z$, given $C(x, z)$.

We consider panel data, where the random coefficients are time invariant, and our control functions, $\hat{\boldsymbol{\epsilon}}_i$ and $\hat{\boldsymbol{a}}_i$, are different from those considered in Kasy. Since $\hat{\boldsymbol{a}}_i$, a function of $X_i$ and $Z_i$, summarizes certain individual specific information, as argued in section 2.1, the assumption in ACF 2 is that the dependence of $(\boldsymbol{\theta}_i, \zeta_{it})$ on $(X_i, Z_i)$ can be reduced to dependence of $(\boldsymbol{\theta}_i, \zeta_{it})$ on $(\hat{\boldsymbol{a}}_i, \hat{\boldsymbol{\epsilon}}_i)$, which is akin to the dependence assumption in papers such as Altonji and Matzkin (2005) and Bester and Hansen (2009). The assumption is motivated by the result that under the restrictions in RC 1, RC 2, and (C-3), the expectations of $\zeta_{it}$ and $\boldsymbol{\theta}_i$ given $(X_i, Z_i)$ depend on $(X_i, Z_i)$ only through $\hat{\boldsymbol{\epsilon}}_i$ and $\hat{\boldsymbol{a}}_i$ respectively.

C.1 Proof of Lemma C1
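As in the models with random effects, to obtain $\hat{\boldsymbol{a}}(X, Z) = E(\boldsymbol{a} \mid X, Z)$ we first derive $f(\boldsymbol{a} \mid X, Z)$. Again, using the fact that $Z \perp\!\!\!\perp \boldsymbol{a}$, by an application of Bayes' rule, we have
$$f(\boldsymbol{a} \mid X, Z) = \frac{f(X \mid Z, \boldsymbol{a})\, f(\boldsymbol{a})}{f(X \mid Z)}.$$
Since $\boldsymbol{a} \perp\!\!\!\perp \epsilon_t$, $\boldsymbol{a} \sim N(0, \Sigma_a)$, and $\epsilon_t \sim N(0, \sigma^2_\epsilon)$, it follows that $X$, given $Z$, is normally distributed with mean $Z'\boldsymbol{\alpha}$ and variance $\Sigma = \sigma^2_\epsilon I_T + Z'\Sigma_a Z$, where $I_T$ is an identity matrix of dimension $T$. That is,
$$f(X \mid Z) = \frac{1}{\sqrt{(2\pi)^T |\Sigma|}} \exp\!\left(-\tfrac{1}{2} R'\Sigma^{-1}R\right), \quad\text{where } R = X - Z'\boldsymbol{\alpha}, \tag{C-7}$$
and where, by the Woodbury matrix identity,
$$\Sigma^{-1} = \frac{1}{\sigma^2_\epsilon} I_T - \frac{1}{\sigma^2_\epsilon} Z'[\sigma^2_\epsilon \Sigma_a^{-1} + ZZ']^{-1} Z,$$
and, by the matrix determinant lemma, $|\Sigma| = \sigma_\epsilon^{2(T-k)}\, |\sigma^2_\epsilon \Sigma_a^{-1} + ZZ'|\, |\Sigma_a|$, with $k$ the dimension of $\boldsymbol{a}$.

Since $X$ given $(Z, \boldsymbol{a})$ has the same distribution as $\boldsymbol{\epsilon} = R - Z'\boldsymbol{a}$, we have
$$f(X \mid Z, \boldsymbol{a})\, f(\boldsymbol{a}) = \frac{1}{\sqrt{(2\pi)^{T+k}\, \sigma_\epsilon^{2T}\, |\Sigma_a|}} \exp\!\left(-\frac{1}{2\sigma^2_\epsilon}\Big[ [R - Z'\boldsymbol{a}]'[R - Z'\boldsymbol{a}] + \sigma^2_\epsilon\, \boldsymbol{a}'\Sigma_a^{-1}\boldsymbol{a} \Big]\right), \tag{C-8}$$
where $k$ is the dimension of $\boldsymbol{a}$.

Since $f(\boldsymbol{a} \mid X, Z) = f(X \mid Z, \boldsymbol{a}) f(\boldsymbol{a}) / f(X \mid Z)$, as shown earlier, using (C-7) and (C-8) it can be shown that $\boldsymbol{a}$ given $X$ and $Z$ is normally distributed with mean
$$E(\boldsymbol{a} \mid X, Z) = \hat{\boldsymbol{a}}(X, Z) = [\sigma^2_\epsilon \Sigma_a^{-1} + ZZ']^{-1} Z (X - Z'\boldsymbol{\alpha}),$$
and conditional variance $\sigma^2_\epsilon [\sigma^2_\epsilon \Sigma_a^{-1} + ZZ']^{-1}$.

D Asymptotic Covariance Matrix for Structural Parameters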
Though obtaining the parameters of the second stage, given the first stage consistent estimates $\hat{\Theta}_1$, is asymptotically equivalent to estimating the subsequent stage parameters had the true value $\Theta_1^*$ been known, to obtain correct inference about the structural parameters one has to account for the fact that, instead of the true values of the first stage reduced form parameters, we use their estimated values. Here we are assuming that the first stage estimation involves the estimation of a system of regressions using Biørn's method and that in the second stage a probit model is estimated using the method of multivariate weighted nonlinear least squares (MWNLS).

Newey (1984) has shown that sequential estimators can be interpreted as members of a class of Method of Moments (MM) estimators and that this interpretation facilitates derivation of asymptotic covariance matrices for multi-step estimators. Let $\Theta = (\Theta_1', \Theta_2')'$, where $\Theta_1$ and $\Theta_2$ are respectively the parameters to be estimated in the first and second step of the sequential estimator. Following Newey (1984), we write the first and second step estimation as an MM estimation based on the following population moment conditions:
$$E(\mathcal{L}_{i\Theta_1}) = E\,\frac{\partial \ln L_i(\Theta_1)}{\partial \Theta_1} = 0 \quad\text{and}\quad E(H_{i\Theta_2}(\Theta_1, \Theta_2)) = 0,$$
where $L_i(\Theta_1)$ is the likelihood function for individual $i$ for the first step system of reduced form equations and $E(H_{i\Theta_2}(\Theta_1, \Theta_2)) = 0$ is the population moment condition for estimating $\Theta_2$ given $\Theta_1$.

The estimates for $\Theta_1$ and $\Theta_2$ are obtained by solving the sample analog of the above population moment conditions.
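The sample analog of the moment conditions for the first step estimation is given by
$$\frac{1}{N}\mathcal{L}_{\Theta_1}(\hat{\Theta}_1) = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \mathcal{L}_i(\hat{\Theta}_1)}{\partial \Theta_1} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ln L_i(\hat{\Theta}_1)}{\partial \Theta_1},$$
where $\mathcal{L}_i(\Theta_1)$ and the first order conditions with respect to $\Theta_1 = (\boldsymbol{\delta}', \mathrm{vec}(\Lambda_{\alpha\alpha})', \mathrm{vec}(\Sigma_{\epsilon\epsilon})')'$ are given in appendix E, and $N$ is the total number of individuals.

The sample analog of the population moment condition for the second step estimation is given by
$$\frac{1}{N} H_{\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2) = \frac{1}{N}\sum_{i=1}^{N} H_{i\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2).$$
We have shown that the structural equations augmented with the control functions $\hat{\boldsymbol{\alpha}}_i(X_i, Z_i, \Theta_1)$ and $\hat{\boldsymbol{\epsilon}}_{it}(X_i, Z_i, \Theta_1)$ lead to the identification of $\Theta_2$. Let $\Theta_1^*$ be the true value of $\Theta_1$. Under the assumptions we make, solving $\frac{1}{N}\sum_{i=1}^{N} H_{i\Theta_2}(\hat{\Theta}_1, \Theta_2) = 0$ is asymptotically equivalent to solving $\frac{1}{N}\sum_{i=1}^{N} H_{i\Theta_2}(\Theta_1^*, \Theta_2) = 0$, where $\hat{\Theta}_1$ is a consistent first step estimate of $\Theta_1$. Hence $\hat{\Theta}_2$, obtained by solving $\frac{1}{N} H_{\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2) = 0$, is a consistent estimate of $\Theta_2$.

[Footnote: While we have written our reduced form equation as $\boldsymbol{x}_{it} = \pi \boldsymbol{z}_{it} + \bar{\pi}\bar{\boldsymbol{z}}_i + \boldsymbol{a}_i + \boldsymbol{\epsilon}_{it}$, Biørn writes it as $\boldsymbol{x}_{it} = Z_{it}'\boldsymbol{\delta} + \boldsymbol{a}_i + \boldsymbol{\epsilon}_{it}$, where $Z_{it} = \mathrm{diag}((\boldsymbol{z}_{it}', \bar{\boldsymbol{z}}_i')', \ldots, (\boldsymbol{z}_{it}', \bar{\boldsymbol{z}}_i')')$ and $\boldsymbol{\delta} = (\mathrm{vec}(\pi)', \mathrm{vec}(\bar{\pi})')'$.]

To derive the asymptotic distribution of the second step estimates $\hat{\Theta}_2$, consider the stacked sample moment conditions:
$$\frac{1}{N}\begin{bmatrix} \mathcal{L}_{\Theta_1}(\hat{\Theta}_1) \\ H_{\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2) \end{bmatrix} = 0. \tag{D-1}$$
A series of Taylor expansions of $\mathcal{L}_{\Theta_1}(\hat{\Theta}_1)$ and $H_{\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2)$ around $\Theta^*$ gives
$$\frac{1}{N}\begin{bmatrix} \mathcal{L}_{\Theta_1\Theta_1} & 0 \\ H_{\Theta_2\Theta_1} & H_{\Theta_2\Theta_2} \end{bmatrix} \begin{bmatrix} \sqrt{N}(\hat{\Theta}_1 - \Theta_1^*) \\ \sqrt{N}(\hat{\Theta}_2 - \Theta_2^*) \end{bmatrix} = -\frac{1}{\sqrt{N}}\begin{bmatrix} \mathcal{L}_{\Theta_1} \\ H_{\Theta_2} \end{bmatrix}. \tag{D-2}$$
In matrix notation the above can be written as
$$B_{\Theta\Theta N}\, \sqrt{N}(\hat{\Theta} - \Theta^*) = -\frac{1}{\sqrt{N}} \Lambda_{\Theta N},$$
where $\Lambda_{\Theta N}$ is evaluated at $\Theta^*$ and $B_{\Theta\Theta N}$ is evaluated at points somewhere between $\hat{\Theta}$ and $\Theta^*$. Under the standard regularity conditions for the Generalized Method of Moments (GMM) (see Newey, 1984), $B_{\Theta\Theta N}$ converges in probability to the lower block triangular matrix $B_* = \lim E(B_{\Theta\Theta N})$, given by
$$B_* = \begin{bmatrix} \mathcal{L}_{\Theta_1\Theta_1} & 0 \\ H_{\Theta_2\Theta_1} & H_{\Theta_2\Theta_2} \end{bmatrix},$$
where $\mathcal{L}_{\Theta_1\Theta_1} = E(\mathcal{L}_{i\Theta_1\Theta_1})$, $H_{\Theta_2\Theta_1} = E(H_{i\Theta_2\Theta_1})$, and $H_{\Theta_2\Theta_2} = E(H_{i\Theta_2\Theta_2})$. $\frac{1}{\sqrt{N}}\Lambda_N$ converges in distribution to a normal random variable with mean zero and covariance matrix $A_* = \lim E\,\frac{1}{N}\Lambda_N\Lambda_N'$, where
$$A_* = \begin{bmatrix} V_{LL} & V_{LH} \\ V_{HL} & V_{HH} \end{bmatrix},$$
and a typical element of $A_*$, say $V_{LH}$, is given by $V_{LH} = E[\mathcal{L}_{i\Theta_1}(\Theta_1)\, H_{i\Theta_2}(\Theta_1, \Theta_2)']$. Under the regularity conditions, $\sqrt{N}(\hat{\Theta} - \Theta^*)$ is asymptotically normal with zero mean and covariance matrix $B_*^{-1} A_* B_*^{-1\prime}$, that is,
$$\sqrt{N}(\hat{\Theta} - \Theta^*) \overset{a}{\sim} N\big[0,\; B_*^{-1} A_* B_*^{-1\prime}\big]. \tag{D-3}$$
By an application of the partitioned inverse formula and some matrix manipulation, we get the asymptotic covariance matrix of $\sqrt{N}(\hat{\Theta}_2 - \Theta_2^*)$, $V_2^*$, where
$$\begin{aligned} V_2^* ={}& H_{\Theta_2\Theta_2}^{-1} V_{HH} H_{\Theta_2\Theta_2}^{-1\prime} + H_{\Theta_2\Theta_2}^{-1} H_{\Theta_2\Theta_1} \big\{ \mathcal{L}_{\Theta_1\Theta_1}^{-1} V_{LL} \mathcal{L}_{\Theta_1\Theta_1}^{-1\prime} \big\} H_{\Theta_2\Theta_1}' H_{\Theta_2\Theta_2}^{-1\prime} \\ &- H_{\Theta_2\Theta_2}^{-1} \big\{ H_{\Theta_2\Theta_1} \mathcal{L}_{\Theta_1\Theta_1}^{-1} V_{LH} + V_{HL} \mathcal{L}_{\Theta_1\Theta_1}^{-1\prime} H_{\Theta_2\Theta_1}' \big\} H_{\Theta_2\Theta_2}^{-1\prime}. \end{aligned} \tag{D-4}$$
To estimate $V_2^*$, the sample analog of $B_*$, namely $B_N$ given in (D-2), and the sample analog of $A_*$, $A_N = \frac{1}{N}\Lambda_N\Lambda_N'$, have to be computed. A typical element of $A_N$, say $V_{LHN}$, is given by $V_{LHN} = \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_{i\Theta_1}(\hat{\Theta}_1)\, H_{i\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2)'$.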
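As a quick numerical sanity check of the partitioned-inverse step behind (D-4), the sketch below compares the lower-right block of the full sandwich $B_*^{-1} A_* B_*^{-1\prime}$ with the expanded expression. The function names `two_step_cov` and `full_sandwich` and the randomly generated blocks are illustrative assumptions, not objects from the paper; in an application the blocks would be replaced by the sample analogs described above.

```python
import numpy as np


def two_step_cov(L11, H21, H22, V_LL, V_LH, V_HH):
    """Second-step covariance block, written out term by term as in (D-4)."""
    L11_inv = np.linalg.inv(L11)
    H22_inv = np.linalg.inv(H22)
    V_HL = V_LH.T
    # Contribution of first-step sampling error.
    first_step = H21 @ L11_inv @ V_LL @ L11_inv.T @ H21.T
    # Cross terms between first-step scores and second-step moments.
    cross = H21 @ L11_inv @ V_LH + V_HL @ L11_inv.T @ H21.T
    return H22_inv @ (V_HH + first_step - cross) @ H22_inv.T


def full_sandwich(L11, H21, H22, V_LL, V_LH, V_HH):
    """Full B^{-1} A B^{-1}' with B lower block triangular."""
    p, q = L11.shape[0], H22.shape[0]
    B = np.block([[L11, np.zeros((p, q))], [H21, H22]])
    A = np.block([[V_LL, V_LH], [V_LH.T, V_HH]])
    B_inv = np.linalg.inv(B)
    return B_inv @ A @ B_inv.T


# Hypothetical inputs: p first-step and q second-step parameters.
rng = np.random.default_rng(0)
p, q = 3, 2
L11 = rng.normal(size=(p, p)) + 3.0 * np.eye(p)
H22 = rng.normal(size=(q, q)) + 3.0 * np.eye(q)
H21 = rng.normal(size=(q, p))
M = rng.normal(size=(p + q, p + q))
A = M @ M.T  # symmetric positive semi-definite, like a moment covariance
V_LL, V_LH, V_HH = A[:p, :p], A[:p, p:], A[p:, p:]

V2 = two_step_cov(L11, H21, H22, V_LL, V_LH, V_HH)
V_full = full_sandwich(L11, H21, H22, V_LL, V_LH, V_HH)
max_gap = float(np.max(np.abs(V2 - V_full[p:, p:])))
print(max_gap)
```

If $H_{\Theta_2\Theta_1} = 0$, the expression collapses to $H_{\Theta_2\Theta_2}^{-1} V_{HH} H_{\Theta_2\Theta_2}^{-1\prime}$, which is the covariance one would report if the first-step parameters were known.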
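The first and the second order conditions of the first-stage likelihood function for estimating $\Theta_1$, which are used to compute the sample analog of $\mathcal{L}_{\Theta_1\Theta_1}$ and to compute $A_N$, are provided in appendix E.

For the binary response model, the score function pertaining to the minimand in equation (B-5) is given by
$$H_{i\Theta_2}(\Theta_1, \Theta_2) = -\nabla_{\Theta_2} m_i(X_i, Z_i, \Theta)'\,[V(X_i, Z_i, \tilde{\Upsilon})]^{-1}\,[y_i - m_i(X_i, Z_i, \Theta)] = -\nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)'\, \tilde{V}^{-1} u_i,$$
where $m_i(\Theta_1, \Theta_2) \equiv m_i(X_i, Z_i, \Theta)$ is a $T$ vector with $t$-th element $m(\mathbb{X}_{it}, \Theta_2) = \Phi(\boldsymbol{x}_{it}'\boldsymbol{\varphi} + \boldsymbol{\rho}_\alpha \hat{\boldsymbol{\alpha}}_i + \boldsymbol{\rho}_\epsilon \hat{\boldsymbol{\epsilon}}_{it}) \equiv m_{it}(\Theta_1, \Theta_2)$, $u_i$ is a $T$ vector with $t$-th element $y_{it} - m_{it}(\Theta_1, \Theta_2)$, and $\tilde{V} \equiv V(X_i, Z_i, \tilde{\Upsilon})$. Now $\nabla_{\Theta_2} m_{it}(\Theta_1, \Theta_2) = \phi(\mathbb{X}_{it}'\Theta_2)\, \mathbb{X}_{it}'$, where $\mathbb{X}_{it} = (\boldsymbol{x}_{it}', \hat{\boldsymbol{\alpha}}_i'(\Theta_1), \hat{\boldsymbol{\epsilon}}_{it}'(\Theta_1))'$ and $\Theta_2 = (\boldsymbol{\varphi}', \boldsymbol{\rho}_\alpha', \boldsymbol{\rho}_\epsilon')'$.

Wooldridge (2010a) and Wooldridge (2010b) show (see Problem 12.11 in Wooldridge, 2010b) that $H_{\Theta_2\Theta_2}$ of $B_*$ is given by
$$H_{\Theta_2\Theta_2} = E[H_{i\Theta_2\Theta_2}(\Theta_1, \Theta_2)] = E[\nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)'\, \tilde{V}^{-1}\, \nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)],$$
which can be approximated as
$$\frac{1}{N}\sum_{i=1}^{N} \nabla_{\Theta_2} m_i(\hat{\Theta}_1, \hat{\Theta}_2)'\, \hat{V}^{-1}\, \nabla_{\Theta_2} m_i(\hat{\Theta}_1, \hat{\Theta}_2),$$
where $\hat{V} = V(X_i, Z_i, \hat{\Upsilon}) = V(X_i, Z_i, \hat{\Theta}, \hat{\rho})$.

We now compute $\sum_{i=1}^{N} H_{i\Theta_2\Theta_1} = \sum_{i=1}^{N} \partial H_{i\Theta_2}(\Theta_1, \Theta_2)/\partial \Theta_1'$ in order to compute the sample analog of $H_{\Theta_2\Theta_1}$. Now,
$$\frac{\partial H_{i\Theta_2}(\Theta_1, \Theta_2)}{\partial \Theta_1'} = -\left[ [u_i'\tilde{V}^{-1} \otimes I]\, \frac{\partial\, \mathrm{vec}(\nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)')}{\partial \Theta_1'} + [u_i' \otimes \nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)']\, \frac{\partial\, \mathrm{vec}(\tilde{V}^{-1})}{\partial \Theta_1'} - \nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)'\, \tilde{V}^{-1}\, \nabla_{\Theta_1} m_i(\Theta_1, \Theta_2) \right].$$
Taking the expectation of the above, we find that the first two terms are zero, since $E(u_i \mid X_i, Z_i) = 0$.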
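Hence we have
$$H_{\Theta_2\Theta_1} = E[H_{i\Theta_2\Theta_1}(\Theta_1, \Theta_2)] = E[\nabla_{\Theta_2} m_i(\Theta_1, \Theta_2)'\, \tilde{V}^{-1}\, \nabla_{\Theta_1} m_i(\Theta_1, \Theta_2)],$$
which can be approximated by
$$\frac{1}{N}\sum_{i=1}^{N} \nabla_{\Theta_2} m_i(\hat{\Theta}_1, \hat{\Theta}_2)'\, \hat{V}^{-1}\, \nabla_{\Theta_1} m_i(\hat{\Theta}_1, \hat{\Theta}_2).$$
The constituents, $\nabla_{\Theta_1} m_{it}(\Theta_1, \Theta_2)$, of $\nabla_{\Theta_1} m_i(\Theta_1, \Theta_2)$ are given by
$$\nabla_{\Theta_1} m_{it}(\Theta_1, \Theta_2) = \phi(\mathbb{X}_{it}'\Theta_2)\, \Theta_2'\, \frac{\partial \mathbb{X}_{it}}{\partial \Theta_1'},$$
which is a row matrix with the dimension of $\Theta_1$, where $\mathbb{X}_{it} = (\boldsymbol{x}_{it}', \hat{\boldsymbol{\alpha}}_i', \hat{\boldsymbol{\epsilon}}_{it}')'$ and
$$\frac{\partial \mathbb{X}_{it}}{\partial \Theta_1'} = \begin{bmatrix} \dfrac{\partial \boldsymbol{x}_{it}}{\partial \boldsymbol{\delta}'} & \dfrac{\partial \boldsymbol{x}_{it}}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} & \dfrac{\partial \boldsymbol{x}_{it}}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} \\[2ex] \dfrac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial \boldsymbol{\delta}'} & \dfrac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} & \dfrac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} \\[2ex] \dfrac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial \boldsymbol{\delta}'} & \dfrac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} & \dfrac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} \end{bmatrix}.$$
Since $\boldsymbol{x}_{it}$ is not a function of $\Theta_1$, $\partial \boldsymbol{x}_{it}/\partial \Theta_1' = 0_x$, where $0_x$ is a null matrix with row dimension that of the column vector $\boldsymbol{x}_{it}$ and column dimension that of the column vector $\Theta_1$. Using the following matrix results:
$$\partial\, \mathrm{vec}(\Omega \boldsymbol{b}) = (\boldsymbol{b}' \otimes I_m)\, \partial\, \mathrm{vec}(\Omega), \quad \partial\, \mathrm{vec}(\Omega^{-1}) = -(\Omega'^{-1} \otimes \Omega^{-1})\, \partial\, \mathrm{vec}(\Omega), \quad\text{and}\quad \frac{\partial\, \mathrm{vec}(\Omega)}{\partial\, \mathrm{vec}(\Omega)'} = I_{mm},$$
where $\boldsymbol{b}$ is a vector of dimension $m$, $\Omega$ is a symmetric $m \times m$ matrix and $I_{mm}$ is the $mm \times mm$ identity matrix, it can be shown that
$$\frac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial \boldsymbol{\delta}'} = \frac{\partial\big(\mathrm{diag}(\bar{\boldsymbol{z}}_i', \ldots, \bar{\boldsymbol{z}}_i')'\, \mathrm{vec}(\bar{\pi}) + \hat{\boldsymbol{a}}_i\big)}{\partial \boldsymbol{\delta}'} = O_{Zi}' - [T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\Sigma_{\epsilon\epsilon}^{-1} Z_{it}',$$
$$\frac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial \boldsymbol{\delta}'} = \frac{\partial(\boldsymbol{x}_{it} - Z_{it}'\boldsymbol{\delta} - \hat{\boldsymbol{a}}_i)}{\partial \boldsymbol{\delta}'} = -Z_{it}' + [T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}\Sigma_{\epsilon\epsilon}^{-1} Z_{it}',$$
$$\frac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} = -\left(\Big(\sum_{t=1}^{T}\boldsymbol{\upsilon}_t'\Big) \otimes I_m\right)\Big[\big(\Sigma_{\epsilon\epsilon}^{-1} \otimes I_m\big)\big(\Sigma' \otimes \Sigma\big)\Big] I_{mm},$$
$$\frac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} = -\left(\Big(\sum_{t=1}^{T}\boldsymbol{\upsilon}_t'\Big) \otimes I_m\right)\Big[\big(I_m \otimes \Sigma\big)\big(\Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1}\big) + \big(\Sigma_{\epsilon\epsilon}^{-1} \otimes I_m\big)\big(\Sigma' \otimes \Sigma\big)\Big]\, T\, I_{mm},$$
$$\frac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} = -\frac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'}, \quad\text{and}\quad \frac{\partial \hat{\boldsymbol{\epsilon}}_{it}}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} = -\frac{\partial \hat{\boldsymbol{\alpha}}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'},$$
where $O_{Zi} = \mathrm{diag}((0_z', \bar{\boldsymbol{z}}_i')', \ldots, (0_z', \bar{\boldsymbol{z}}_i')')$, $0_z$ denoting a vector of zeros with dimension that of $\boldsymbol{z}_{it}$, $\boldsymbol{\upsilon}_t = \boldsymbol{x}_t - \pi \boldsymbol{z}_t$, and $\Sigma = [T\Sigma_{\epsilon\epsilon}^{-1} + \Lambda_{\alpha\alpha}^{-1}]^{-1}$.

Since $H_{i\Theta_2\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2)$ and $H_{i\Theta_2\Theta_1}(\hat{\Theta}_1, \hat{\Theta}_2)$ converge almost surely to $H_{i\Theta_2\Theta_2}(\Theta_1^*, \Theta_2^*)$ and $H_{i\Theta_2\Theta_1}(\Theta_1^*, \Theta_2^*)$ respectively, by the weak LLN $\frac{1}{N}\sum_{i=1}^{N} H_{i\Theta_2\Theta_2}(\hat{\Theta}_1, \hat{\Theta}_2)$ will converge in probability to $E(H_{i\Theta_2\Theta_2}(\Theta_1^*, \Theta_2^*)) = H_{\Theta_2\Theta_2}$ and $\frac{1}{N}\sum_{i=1}^{N} H_{i\Theta_2\Theta_1}(\hat{\Theta}_1, \hat{\Theta}_2)$ will converge in probability to $E(H_{i\Theta_2\Theta_1}(\Theta_1^*, \Theta_2^*)) = H_{\Theta_2\Theta_1}$.

D.1 Hypothesis Testing of Average Partial Effects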
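In section 2 of the paper we discussed the estimation of the average structural function (ASF) and the average partial effect (APE) of a variable $w$. If the support condition in lemma 3 for the point identification of the ASF and the APEs is met, and if $\eta_t$ in (2.9) of the main text is assumed to follow a normal distribution, the estimated APE of $w$ on the probability of $y_{it} = 1$ at $\boldsymbol{x}_{it} = \bar{\boldsymbol{x}}$ is given by
$$\frac{\partial \widehat{\Pr}(y_{it} = 1 \mid \bar{\boldsymbol{x}})}{\partial w} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{\varphi}_w\, \phi\big(\bar{\mathbb{X}}_{it}'\hat{\Theta}_2\big) \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{wit}(\hat{\Theta}_2),$$
where $\bar{\mathbb{X}}_{it} = (\bar{\boldsymbol{x}}', \hat{\hat{\boldsymbol{\alpha}}}_i(\hat{\Theta}_1)', \hat{\hat{\boldsymbol{\epsilon}}}_{it}(\hat{\Theta}_1)')'$ and $\hat{\Theta}_2 = (\hat{\boldsymbol{\varphi}}', \hat{\boldsymbol{\rho}}_\alpha', \hat{\boldsymbol{\rho}}_\epsilon')'$.

To test hypotheses about the APEs we need the standard errors of their estimates. By the linear approximation approach (delta method), the asymptotic variance of $\partial \widehat{\Pr}(y_{it} = 1 \mid \bar{\boldsymbol{x}})/\partial w$ can be estimated by computing
$$\left[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{\partial g_{wit}(\hat{\Theta}_2)}{\partial \hat{\Theta}_2'}\right] \hat{V}_2^* \left[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{\partial g_{wit}(\hat{\Theta}_2)}{\partial \hat{\Theta}_2'}\right]', \tag{D-5}$$
where $\hat{V}_2^*$, the error-adjusted second stage covariance matrix of $\Theta_2$ estimated at $\hat{\Theta}_2$, is given in (D-4). The derivative in (D-5) turns out to be
$$\frac{\partial g_{wit}(\hat{\Theta}_2)}{\partial \hat{\Theta}_2'} = \phi(\bar{\mathbb{X}}_{it}'\hat{\Theta}_2)\,\big[e_w - \hat{\varphi}_w\, (\bar{\mathbb{X}}_{it}'\hat{\Theta}_2)\, \bar{\mathbb{X}}_{it}\big]',$$
where $e_w$ is a column vector having the dimension of $\Theta_2$, with 1 at the position of $\varphi_w$ in $\Theta_2$ and zeros elsewhere.

If $w$ is a dummy variable, then the estimated APE of $w$, when the APE of $w$ is point identified, is given by
$$\begin{aligned} \Delta_w \Pr(y_{it} = 1) &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \Big[\Phi(\bar{\boldsymbol{x}}_{-w}, w = 1, \hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it}) - \Phi(\bar{\boldsymbol{x}}_{-w}, w = 0, \hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it})\Big] \\ &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \big[\Phi_{it}(w = 1) - \Phi_{it}(w = 0)\big] = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \Delta_w \Phi_{it}(\cdot). \end{aligned}$$
The asymptotic variance of the above can again be obtained by the delta method as
$$\left[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{\partial \Delta\Phi_{it}(\cdot)}{\partial \Theta_2'}\right] \hat{V}_2^* \left[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{\partial \Delta\Phi_{it}(\cdot)}{\partial \Theta_2'}\right]', \tag{D-6}$$
where
$$\frac{\partial \Delta\Phi_{it}(\cdot)}{\partial \Theta_2'} = \frac{\partial \Phi_{it}(w = 1)}{\partial \Theta_2'} - \frac{\partial \Phi_{it}(w = 0)}{\partial \Theta_2'} = \phi_{it}(w = 1)\begin{bmatrix} \bar{\mathbb{X}}_{it,-w} \\ 1 \end{bmatrix}' - \phi_{it}(w = 0)\begin{bmatrix} \bar{\mathbb{X}}_{it,-w} \\ 0 \end{bmatrix}'.$$

When the support condition in lemma 3 for the point identification of the ASF and the APEs is not met, we compute the 95% confidence interval ($\mathrm{CI}_{0.95}$) proposed in Imbens and Manski (2004) for the partially identified APEs. In section 2 of the main text, we have shown that when the support condition in lemma 3 is not met, the APE of changing $x_k$ from $\bar{x}_k$ to $\bar{x}_k + \Delta_k$, $\Delta G(\bar{\boldsymbol{x}})/\Delta_k$, lies in the interval
$$\left[\Psi_l = \frac{\tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}) - \tilde{G}(\bar{\boldsymbol{x}}) - P(\bar{\boldsymbol{x}})}{\Delta_k},\;\; \Psi_u = \frac{\tilde{G}(\bar{\boldsymbol{x}}_{\Delta_k}) + P(\bar{\boldsymbol{x}}_{\Delta_k}) - \tilde{G}(\bar{\boldsymbol{x}})}{\Delta_k}\right], \tag{D-7}$$
where $\bar{\boldsymbol{x}}_{\Delta_k} = (\bar{\boldsymbol{x}}_{-k}', \bar{x}_k + \Delta_k)'$.

Let $\sigma_l$ be the standard error of the estimate of the lower bound of the interval and let $\sigma_u$ be the standard error of the estimate of the upper bound. To construct the confidence interval for the partially identified APEs, we first show that $\sigma_u = \sigma_l = \bar{\sigma}$. Let $\hat{\Psi}_l$ be the estimate of the lower bound and let $\hat{\Psi}_u$ be that of the upper bound. Since the estimates of $P(\cdot)$ in (D-7) do not depend on $\Theta_2$,
$$\frac{\partial \hat{\Psi}_l}{\partial \Theta_2'} = \frac{\partial \hat{\Psi}_u}{\partial \Theta_2'} = \frac{1}{\Delta_k}\left[\frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}}_{\Delta_k})}{\partial \Theta_2'} - \frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}})}{\partial \Theta_2'}\right].$$
By applying the delta method, we get
$$\bar{\sigma}^2 = \frac{1}{\Delta_k}\left[\frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}}_{\Delta_k})}{\partial \Theta_2'} - \frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}})}{\partial \Theta_2'}\right] \hat{V}_2^*\, \frac{1}{\Delta_k}\left[\frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}}_{\Delta_k})}{\partial \Theta_2'} - \frac{\partial \hat{\tilde{G}}(\bar{\boldsymbol{x}})}{\partial \Theta_2'}\right]', \tag{D-8}$$
where in (D-8) the derivative of $\hat{\tilde{G}}(\cdot)$ with respect to $\Theta_2$ at $\boldsymbol{x}$ is given by
$$\frac{\partial \hat{\tilde{G}}(\boldsymbol{x})}{\partial \Theta_2'} = \frac{1}{NT}\sum_{i,t} \phi(\boldsymbol{x}'\hat{\boldsymbol{\varphi}} + \hat{\boldsymbol{\rho}}_\alpha \hat{\hat{\boldsymbol{\alpha}}}_i + \hat{\boldsymbol{\rho}}_\epsilon \hat{\hat{\boldsymbol{\epsilon}}}_{it})\; 1\big[(\hat{\hat{\boldsymbol{\alpha}}}_i, \hat{\hat{\boldsymbol{\epsilon}}}_{it}) \in \hat{A}(\boldsymbol{x})\big]\; (\boldsymbol{x}', \hat{\hat{\boldsymbol{\alpha}}}_i', \hat{\hat{\boldsymbol{\epsilon}}}_{it}').$$
According to lemma 4.1 in Imbens and Manski (2004), the confidence interval is
$$\mathrm{CI}_{0.95} = \left[\hat{\Psi}_l - C_{NT}\frac{\bar{\sigma}}{\sqrt{NT}},\;\; \hat{\Psi}_u + C_{NT}\frac{\bar{\sigma}}{\sqrt{NT}}\right],$$
where $C_{NT}$ is a solution to
$$\Phi\!\left(C_{NT} + \sqrt{NT}\,\frac{\hat{\Psi}_u - \hat{\Psi}_l}{\bar{\sigma}}\right) - \Phi(-C_{NT}) = 0.95.$$

E Estimation of the Reduced form Equations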
In this section we briefly describe the stepwise maximum likelihood procedure of Biørn (2004) for estimating the reduced form system of equations
$$\boldsymbol{x}_{it} = Z_{it}'\boldsymbol{\delta} + \boldsymbol{a}_i + \boldsymbol{\epsilon}_{it}, \tag{E-1}$$
where $Z_{it} = \mathrm{diag}((\boldsymbol{z}_{it}', \bar{\boldsymbol{z}}_i')', \ldots, (\boldsymbol{z}_{it}', \bar{\boldsymbol{z}}_i')')$ and $\boldsymbol{\delta} = (\mathrm{vec}(\pi)', \mathrm{vec}(\bar{\pi})')'$. While Biørn (2004) deals with unbalanced panels, here we assume that our panel is balanced. Let $N$ be the total number of individuals and let $n$ be the total number of observations, i.e., $n = NT$. Let $\boldsymbol{x}_{i(T)} = (\boldsymbol{x}_{i1}', \ldots, \boldsymbol{x}_{iT}')'$, $Z_{i(T)} = (Z_{i1}', \ldots, Z_{iT}')'$ and $\boldsymbol{\epsilon}_{i(T)} = (\boldsymbol{\epsilon}_{i1}', \ldots, \boldsymbol{\epsilon}_{iT}')'$, and write the model as
$$\boldsymbol{x}_{i(T)} = Z_{i(T)}'\boldsymbol{\delta} + (e_T \otimes \boldsymbol{a}_i) + \boldsymbol{\epsilon}_{i(T)} = Z_{i(T)}'\boldsymbol{\delta} + \boldsymbol{u}_{i(T)}. \tag{E-2}$$
Now,
$$E(\boldsymbol{u}_{i(T)}\boldsymbol{u}_{i(T)}') = I_T \otimes \Sigma_{\epsilon\epsilon} + E_T \otimes \Lambda_{\alpha\alpha} = K_T \otimes \Sigma_{\epsilon\epsilon} + J_T \otimes \Sigma_{(T)} = \Omega_{u(T)}, \quad\text{where } \Sigma_{(T)} = \Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha},$$
$I_T$ is the $T$ dimensional identity matrix, $e_T$ is the $(T \times 1)$ vector of ones, $E_T = e_T e_T'$, $J_T = (1/T)E_T$, and $K_T = I_T - J_T$. The latter two matrices are symmetric and idempotent and have orthogonal columns ($K_T J_T = 0$), which facilitates inversion of $\Omega_{u(T)}$.

E.1 GLS estimation
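Before addressing the maximum likelihood problem, we consider the GLS problem for $\boldsymbol{\delta}$ when $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$ are known. Define $Q_{i(T)} = \boldsymbol{u}_{i(T)}'\Omega_{u(T)}^{-1}\boldsymbol{u}_{i(T)}$; then GLS estimation is the problem of minimizing $Q = \sum_{i=1}^{N} Q_{i(T)}$ with respect to $\boldsymbol{\delta}$. Since $\Omega_{u(T)}^{-1} = K_T \otimes \Sigma_{\epsilon\epsilon}^{-1} + J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}$, we can rewrite $Q$ as
$$Q = \sum_{i=1}^{N} \boldsymbol{u}_{i(T)}'[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1}]\boldsymbol{u}_{i(T)} + \sum_{i=1}^{N} \boldsymbol{u}_{i(T)}'[J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]\boldsymbol{u}_{i(T)}.$$
The GLS estimator of $\boldsymbol{\delta}$ when $\Lambda_{\alpha\alpha}$ and $\Sigma_{\epsilon\epsilon}$ are known is obtained from $\partial Q/\partial \boldsymbol{\delta} = 0$, and is given by
$$\begin{aligned} \hat{\boldsymbol{\delta}}_{GLS} ={}& \left[\sum_{i=1}^{N} Z_{i(T)}'[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1}]Z_{i(T)} + \sum_{i=1}^{N} Z_{i(T)}'[J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]Z_{i(T)}\right]^{-1} \\ &\times \left[\sum_{i=1}^{N} Z_{i(T)}'[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1}]\boldsymbol{x}_{i(T)} + \sum_{i=1}^{N} Z_{i(T)}'[J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]\boldsymbol{x}_{i(T)}\right]. \end{aligned} \tag{E-3}$$

E.2 Maximum Likelihood Estimation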
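Now consider ML estimation of $\boldsymbol{\delta}$, $\Sigma_{\epsilon\epsilon}$, and $\Lambda_{\alpha\alpha}$. Assuming normality of the individual effects and the disturbances, i.e., $\boldsymbol{a}_i \sim IIN(0, \Lambda_{\alpha\alpha})$ and $\boldsymbol{\epsilon}_{it} \sim IIN(0, \Sigma_{\epsilon\epsilon})$, we have $\boldsymbol{u}_{i(T)} = (e_T \otimes \boldsymbol{a}_i) + \boldsymbol{\epsilon}_{i(T)} \sim IIN(0_{mT,1}, \Omega_{u(T)})$. The log-likelihood functions of all $\boldsymbol{x}$'s conditional on all $Z$'s for an individual and for all individuals in the data set then become, respectively,
$$\mathcal{L}_i = -\frac{mT}{2}\ln(2\pi) - \frac{1}{2}\ln|\Omega_{u(T)}| - \frac{1}{2}Q_{i(T)}(\boldsymbol{\delta}, \Sigma_{\epsilon\epsilon}, \Lambda_{\alpha\alpha}), \tag{E-4}$$
$$\mathcal{L} = \sum_{i=1}^{N}\mathcal{L}_i = -\frac{mNT}{2}\ln(2\pi) - \frac{N}{2}\ln|\Omega_{u(T)}| - \frac{1}{2}\sum_{i=1}^{N} Q_{i(T)}(\boldsymbol{\delta}, \Sigma_{\epsilon\epsilon}, \Lambda_{\alpha\alpha}), \tag{E-5}$$
where
$$Q_{i(T)}(\boldsymbol{\delta}, \Sigma_{\epsilon\epsilon}, \Lambda_{\alpha\alpha}) = [\boldsymbol{x}_{i(T)} - Z_{i(T)}'\boldsymbol{\delta}]'\,[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1} + J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]\,[\boldsymbol{x}_{i(T)} - Z_{i(T)}'\boldsymbol{\delta}],$$
and $|\Omega_{u(T)}| = |\Sigma_{(T)}|\,|\Sigma_{\epsilon\epsilon}|^{T-1}$.

Biørn splits the problem of estimation into: (A) maximization of $\mathcal{L}$ with respect to $\boldsymbol{\delta}$ for given $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$, and (B) maximization of $\mathcal{L}$ with respect to $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$ for given $\boldsymbol{\delta}$. Subproblem (A) is identical with the GLS problem, since maximization of $\mathcal{L}$ with respect to $\boldsymbol{\delta}$ for given $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$ is equivalent to minimization of $\sum_{i=1}^{N} Q_{i(T)}(\boldsymbol{\delta}, \Sigma_{\epsilon\epsilon}, \Lambda_{\alpha\alpha})$, which gives (E-3). To solve subproblem (B), Biørn derives expressions for the derivatives of both $\mathcal{L}_i$ and $\mathcal{L}$ with respect to $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$. The complete stepwise algorithm for jointly solving subproblems (A) and (B) then consists in switching between (E-3) and maximizing (E-5) with respect to $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$, and iterating until convergence.

The first order conditions for the log-likelihood function for an individual $i$ with respect to $\boldsymbol{\delta}$, $\mathrm{vech}(\Sigma_{\epsilon\epsilon})$ and $\mathrm{vech}(\Lambda_{\alpha\alpha})$ are:
$$\frac{\partial \mathcal{L}_i}{\partial \boldsymbol{\delta}} = [\boldsymbol{x}_{i(T)} - Z_{i(T)}'\boldsymbol{\delta}]'\,[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1} + J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]\,Z_{i(T)}',$$
$$\frac{\partial \mathcal{L}_i}{\partial\, \mathrm{vech}(\Sigma_{\epsilon\epsilon})} = -\frac{1}{2}L_m\, \mathrm{vec}\Big[\Sigma_{(T)}^{-1} + (T-1)\Sigma_{\epsilon\epsilon}^{-1} - \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} - \Sigma_{\epsilon\epsilon}^{-1}W_{ui(T)}\Sigma_{\epsilon\epsilon}^{-1}\Big],$$
and
$$\frac{\partial \mathcal{L}_i}{\partial\, \mathrm{vech}(\Lambda_{\alpha\alpha})} = -\frac{1}{2}L_m\, \mathrm{vec}\Big[T\Sigma_{(T)}^{-1} - T\Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1}\Big],$$
where $\mathrm{vech}(\Sigma_{\epsilon\epsilon})$ and $\mathrm{vech}(\Lambda_{\alpha\alpha})$ are column-wise vectorizations of the lower triangles of the symmetric matrices $\Sigma_{\epsilon\epsilon}$ and $\Lambda_{\alpha\alpha}$, and $L_m$ is an elimination matrix. $W_{ui(T)}$ and $B_{ui(T)}$ are defined as
$$W_{ui(T)} = \tilde{E}_{i(T)}K_T\tilde{E}_{i(T)}' \quad\text{and}\quad B_{ui(T)} = \tilde{E}_{i(T)}J_T\tilde{E}_{i(T)}',$$
where $\tilde{E}_{i(T)} = [\boldsymbol{u}_{i1}, \ldots, \boldsymbol{u}_{iT}]$ is an $(m \times T)$ matrix and $\boldsymbol{u}_{i(T)} = \mathrm{vec}(\tilde{E}_{i(T)})$, 'vec' being the vectorization operator. That is, the disturbances defined in (E-2) for an individual $i$ have been arranged in the $(m \times T)$ matrix $\tilde{E}_{i(T)}$.

The second order conditions are:
$$\frac{\partial^2 \mathcal{L}_i}{\partial \boldsymbol{\delta}\, \partial \boldsymbol{\delta}'} = -Z_{i(T)}\,[K_T \otimes \Sigma_{\epsilon\epsilon}^{-1} + J_T \otimes (\Sigma_{\epsilon\epsilon} + T\Lambda_{\alpha\alpha})^{-1}]\,Z_{i(T)}',$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial \boldsymbol{\delta}\, \partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} = -T\,(\boldsymbol{u}_{i(T)}' \otimes Z_{i(T)})(I_T \otimes K_{m,T} \otimes I_m)\big(\mathrm{vec}(J_T) \otimes \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}\big),$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial \boldsymbol{\delta}\, \partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} = -(\boldsymbol{u}_{i(T)}' \otimes Z_{i(T)})(I_T \otimes K_{m,T} \otimes I_m)\big(\mathrm{vec}(K_T) \otimes \Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1} + \mathrm{vec}(J_T) \otimes \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}\big),$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})\, \partial \boldsymbol{\delta}'} = -\frac{T}{2}\big(\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}\big)\big[(\tilde{E}_{i(T)}J_T \otimes I_m) + (I_m \otimes \tilde{E}_{i(T)}J_T)K_{m,T}\big]Z_{i(T)}',$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})\, \partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} = \frac{T^2}{2}\big[(\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}) - \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1} - \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1}\big],$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})\, \partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} = \frac{T}{2}\big[(\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}) - \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1} - \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1}\big],$$
$$\begin{aligned} \frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})\, \partial \boldsymbol{\delta}'} ={}& -\frac{1}{2}\big(\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}\big)\big[(\tilde{E}_{i(T)}J_T \otimes I_m) + (I_m \otimes \tilde{E}_{i(T)}J_T)K_{m,T}\big]Z_{i(T)}' \\ &- \frac{1}{2}\big(\Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1}\big)\big[(\tilde{E}_{i(T)}K_T \otimes I_m) + (I_m \otimes \tilde{E}_{i(T)}K_T)K_{m,T}\big]Z_{i(T)}', \end{aligned}$$
$$\frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})\, \partial\, \mathrm{vec}(\Lambda_{\alpha\alpha})'} = \frac{T}{2}\big[(\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}) - \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1} - \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1}\big],$$
$$\begin{aligned} \frac{\partial^2 \mathcal{L}_i}{\partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})\, \partial\, \mathrm{vec}(\Sigma_{\epsilon\epsilon})'} = \frac{1}{2}\big[&\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1} + (T-1)\,\Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1} - \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1} - \Sigma_{(T)}^{-1} \otimes \Sigma_{(T)}^{-1}B_{ui(T)}\Sigma_{(T)}^{-1} \\ &- \Sigma_{\epsilon\epsilon}^{-1}W_{ui(T)}\Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1} - \Sigma_{\epsilon\epsilon}^{-1} \otimes \Sigma_{\epsilon\epsilon}^{-1}W_{ui(T)}\Sigma_{\epsilon\epsilon}^{-1}\big]. \end{aligned}$$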
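The two algebraic facts about $\Omega_{u(T)}$ that the GLS and ML steps of this appendix rely on, the closed-form inverse and the determinant factorization, can be verified numerically. The sketch below is only an illustration: the helper names `omega_and_inverse` and `random_spd`, the dimensions, and the randomly drawn covariance matrices are assumptions, not values from the paper.

```python
import numpy as np


def omega_and_inverse(T, Sigma_eps, Lambda_a):
    """Build Omega_u(T) and its closed-form inverse from the K_T / J_T split."""
    I_T = np.eye(T)
    E_T = np.ones((T, T))
    J_T = E_T / T           # averaging projector
    K_T = I_T - J_T         # within (demeaning) projector
    Sigma_T = Sigma_eps + T * Lambda_a
    # Observations are stacked period by period, so Sigma_eps sits on the
    # block diagonal and Lambda_a appears in every (t, s) block.
    Omega = np.kron(I_T, Sigma_eps) + np.kron(E_T, Lambda_a)
    Omega_inv = (np.kron(K_T, np.linalg.inv(Sigma_eps))
                 + np.kron(J_T, np.linalg.inv(Sigma_T)))
    return Omega, Omega_inv, Sigma_T


def random_spd(m, rng):
    """Draw a symmetric positive definite m x m matrix (illustrative only)."""
    A = rng.normal(size=(m, m))
    return A @ A.T + np.eye(m)


rng = np.random.default_rng(1)
T, m = 4, 2
Sigma_eps = random_spd(m, rng)
Lambda_a = random_spd(m, rng)
Omega, Omega_inv, Sigma_T = omega_and_inverse(T, Sigma_eps, Lambda_a)

# Check Omega @ Omega^{-1} = I.
inverse_gap = float(np.max(np.abs(Omega @ Omega_inv - np.eye(m * T))))

# Check ln|Omega| = ln|Sigma_(T)| + (T - 1) ln|Sigma_eps|.
logdet_direct = np.linalg.slogdet(Omega)[1]
logdet_formula = (np.linalg.slogdet(Sigma_T)[1]
                  + (T - 1) * np.linalg.slogdet(Sigma_eps)[1])
logdet_gap = float(abs(logdet_direct - logdet_formula))
print(inverse_gap, logdet_gap)
```

Because only $m \times m$ matrices need to be inverted, evaluating the log-likelihood in (E-4) never requires factorizing the full $mT \times mT$ matrix, which is the practical benefit of the $K_T$ and $J_T$ decomposition.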