Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity
SAMUELE CENTORRINO AND MARÍA PÉREZ-URDIALES
Abstract.
We study a closed-form maximum likelihood estimator of stochastic frontier models with endogeneity in cross-section data when both error components may be correlated with inputs and environmental variables. We achieve identification using a control function assumption. We show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution, which reduces to the half-normal distribution when both inputs and environmental variables are independent of the stochastic inefficiency term. Hence, our framework is a natural generalization of the normal half-normal stochastic frontier model with endogeneity. We further provide a Battese-Coelli estimator of technical efficiency in this context. Our estimator is computationally fast and easy to implement. We showcase its finite sample properties in Monte Carlo simulations and an empirical application to farmers in Nepal.
Keywords: Stochastic Frontier; Endogeneity; Control Functions; Maximum Likelihood; Technical efficiency.
JEL Codes: C10; C13; C26; C36.
Date: April 30, 2020.

1. Introduction

We consider a stochastic frontier model which includes environmental variables that affect the inefficiency level, but not the production frontier. The composite error term is split into statistical noise and an inefficiency component. The production frontier can be linear or nonlinear, and the inefficiency term satisfies the scaling property, that is, it can be decomposed into a stochastic inefficiency term and a scaling function that depends on the environmental variables (Alvarez et al., 2006). We allow both inputs and environmental variables to be correlated with the composite error term. These endogenous regressors are further restricted to be continuous. We achieve identification by allowing for a vector of control functions that fully captures the dependence between the composite error term and the regressors, and such that the statistical noise and the inefficiency term are independent given these control functions. We contribute to the literature by providing a closed-form maximum likelihood estimator of the production frontier in this context. This allows us to provide a clear analysis of identification and a simple and computationally fast estimation of the model's parameters. Finally, we also provide a generalization of the Battese and Coelli (1988) estimator of technical efficiency.

Our analysis highlights some interesting facts about identification and estimation in this context. Under our assumptions, we show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution (Leone et al., 1961; Sundberg, 1974). When the correlation between the inefficiency term and the control function is equal to 0, the endogeneity problem disappears, and the folded normal distribution reduces to the positive half-normal distribution. Our framework thus provides a generalization of the normal half-normal model to the case when regressors are endogenous. Because of the properties of the folded normal distribution, only the magnitude of the correlation between the stochastic inefficiency term and the control functions is identified; its sign is not. This implies that the log-likelihood function has two isolated maxima, which are symmetric about a local minimum at zero. When the correlation parameter is equal to 0, the log-likelihood function has a unique maximum. We discuss some of the implications of this identification issue for both estimation and inference.

Although endogeneity in the stochastic frontier framework has received increasing attention in the literature (see Kutlu, 2010; Tran and Tsionas, 2013, 2015; Karakaplan and Kutlu, 2017; Amsler et al., 2016; Lai and Kumbhakar, 2018, among others), models that explicitly allow for correlation between the stochastic inefficiency term, inputs and environmental variables have only been studied by Amsler et al. (2016, 2017), to the best of our knowledge. These authors propose an estimator that allows production inputs and environmental variables to be correlated with both the statistical noise and the stochastic inefficiency term.
They fix the marginal distribution of the statistical noise to be a normal distribution, and the marginal distribution of the inefficiency term to be a half-normal distribution. These authors model the dependence between observables and unobservables using copula functions. Such functions are cleverly constructed from the marginal distributions of the unobservables. They also potentially allow for dependence between the statistical noise and the inefficiency term. However, the likelihood function cannot be written in closed form, and these authors need to resort to simulations to obtain an estimator of the model's parameters. This prevents a clean and straightforward analysis of identification, estimation, and inference in such a context. Moreover, simulated methods can be biased and have a higher variance in finite samples, especially when the number of simulations is not chosen appropriately with the sample size (Gouriéroux and Monfort, 1997). Finally, when both inputs and environmental variables are potentially correlated with the inefficiency term, they cannot obtain an estimator of technical efficiency. Our approach seeks to avoid these potential pitfalls. As we provide the likelihood function in closed form, we can study identification in the usual way, and provide an estimator of technical efficiency that is applicable to any correlation structure. In a simulation study, similar to the one in Amsler et al. (2017), we show that our estimator is computationally faster and performs better in finite samples, especially for the estimation of the variance of the stochastic inefficiency term.

Footnote on the folded normal distribution: The folded normal distribution can be thought of as a normal distribution which is "folded" at zero by taking the absolute value. Suppose we take a mean-zero normal random variable $\eta$, and then generate two standard normal random variables $U_1$ and $U_2$, which have correlation $-0.5$ and $0.5$ with $\eta$, respectively. When we "fold" both $U_1$ and $U_2$ by taking their absolute values, we have that $|U_1|$ has the same distribution as $|U_2|$. Identification of the sign of the correlation is thus not feasible.

Footnote on the two maxima of the log-likelihood: All other parameters held constant, the log-likelihood has to "bend back" between the two local maxima, which generates a local minimum at zero.

The paper is structured as follows. In Section 2 we discuss the statistical model and provide the main steps for the construction of the likelihood function.
In Section 3, we discuss both estimation and inference in such a context, with particular emphasis on the issue of testing the null hypothesis that there is no correlation between the regressors and the inefficiency term. In Section 4, we provide simulation evidence of the finite sample properties of our estimator. In Section 5, we apply our methodology to the agricultural sector in Nepal. We show that accounting for endogeneity substantially changes the conclusions of the empirical analysis. Finally, Section 6 concludes.

2. Statistical Model
We consider a general version of the model usually considered in this literature. The output, $Y$, is determined by the logarithm of some known function, $m(\cdot,\cdot)$, which depends on a vector of $p \ge 1$ inputs, $X$, and a parameter, $\beta$; and by a composite error term $\varepsilon = V - U$, where $V$ represents a stochastic component and $U \ge 0$ is a one-sided inefficiency term. We write
\[ Y = m(X, \beta) + V - U, \tag{1} \]
in a way that the inefficiency term, $U$, captures the producer's shortfall from the production frontier.

Additionally, we fix $U = U_0\, g(Z, \delta)$, where $U_0 \ge 0$ is a stochastic inefficiency term and $g(\cdot,\cdot)$ is a known strictly positive scaling function, which depends on some additional variables $Z \in \mathbb{R}^k$, with $k \ge 0$, through a parameter vector $\delta$ (Simar et al., 1994; Alvarez et al., 2006). $X$ and $Z$ may have some elements in common, but they must have at least one non-overlapping component. We refer to $Z$ as environmental variables.

Thus, we finally have
\[ Y = m(X, \beta) + V - U_0\, g(Z, \delta). \tag{2} \]

A potential maximum likelihood estimator of $(\beta, \delta)$ is based on the assumption that the composite error component $(V, U_0)$ is independent of $(X, Z)$, with $(U_0, V)$ mutually independent; $V$ following a normal distribution with constant variance; and $U_0$ following a normal distribution truncated at 0 (the so-called positive half-normal distribution, see Aigner et al., 1977; Schmidt and Lovell, 1979, 1980; Horrace, 2005).

While consistent estimation of $(\beta, \delta)$ can also be obtained without these strong distributional assumptions (Simar et al., 1994; Tran and Tsionas, 2013), these assumptions are necessary to learn something about the variance of the inefficiency term, $U$. We are often interested in estimating the distance of each producer from the frontier (Battese and Coelli, 1988). This can be easily done when the marginal distributions of $V$ and $U_0$ are taken to be known.

It has long been recognized in the literature that inputs may be simultaneously chosen with the output, and thus potentially correlated with the composite error term (see Mundlak, 1961; Schmidt and Sickles, 1984, for a full description of the statistical issues in this context). Similarly, environmental variables may be decided by the producer depending on characteristics that are observable to her but not to the econometrician.

To deal with endogenous variables, we need a vector of instruments that are correlated with the endogenous components but independent of the composite error term (see Amsler et al., 2016, for the impact of several exogeneity assumptions on identification in SFA). To simplify our presentation, we take all variables in $(X, Z)$ to be endogenous.
The extension to the case with some endogenous and some exogenous components can be handled similarly.

We consider the following auxiliary regression models
\[ X = W\gamma_X + \eta_X, \qquad Z = W\gamma_Z + \eta_Z, \]
where $\eta = (\eta_X, \eta_Z) \in \mathbb{R}^{p+k}$ is a random vector of error components, and $W \in \mathbb{R}^q$ is a vector of instrumental variables, with $q \ge p + k$.

Our approach is based on a control function assumption. That is, we assume that all the dependence between $(X, Z)$ and $(V, U_0)$ is captured by $\eta$ (Newey et al., 1999; Imbens and Newey, 2009; Wooldridge, 2015). Moreover, we assume that the instruments are strongly exogenous, that is, fully independent of the composite error term. Given a triplet of random variables $U_0$, $V$ and $\eta$, we use the notation $U_0 \perp\!\!\!\perp V$ to indicate that $U_0$ is fully independent of $V$; and the notation $U_0 \perp\!\!\!\perp V \mid \eta$ to indicate that $U_0$ is fully independent of $V$ conditional on $\eta$.

Therefore, our main assumptions can be formally stated as follows.

Assumption 2.1. $W \perp\!\!\!\perp (V, U_0, \eta)$ and $(X, Z) \perp\!\!\!\perp (U_0, V) \mid \eta$.

Assumption 2.2. $U_0 \perp\!\!\!\perp V \mid \eta$.

Assumption 2.1 implies strong exogeneity of the instruments, and that the control function $\eta$ captures all the dependence between $(X, Z)$ and $(U_0, V)$.

Assumption 2.2 implies that, if any dependence exists between $V$ and $U_0$, it has to happen through the vector $\eta$. This assumption reduces to the standard assumption of $U_0 \perp\!\!\!\perp V$ when both $X$ and $Z$ are taken to be exogenous (Kumbhakar and Lovell, 2003, Sec. 3.2, p. 64). This assumption excludes any direct correlation between $U_0$, the stochastic inefficiency term, and $V$.

Assumptions 2.1 and 2.2 directly imply that
\[ f_{V,U_0,\eta}(v, u, \eta) = f_{V,\eta}(v, \eta)\, f_{U_0|\eta}(u \mid \eta), \]
where $f$ denotes a probability density function. To construct a maximum likelihood estimator (MLE), we further impose the condition that $\eta \sim N(0, \Sigma_\eta)$, where $\Sigma_\eta$ is a positive definite covariance matrix.

If the stochastic inefficiency term is taken to be independent of all covariates, a full information MLE can be easily constructed by further assuming that
\[ \begin{pmatrix} V \\ \eta \end{pmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_V^2 & \Sigma_{V\eta}' \\ \Sigma_{V\eta} & \Sigma_\eta \end{bmatrix} \right), \]
where $\Sigma_{V\eta}$ is a vector of covariances between $V$ and $\eta$, and $\sigma_V^2$ is the variance of $V$ (Kutlu, 2010).

However, the main difficulty lies in the specification of the joint density of $(U_0, \eta)$ such that its marginal distributions are a truncated normal and a joint normal, respectively, and the dependence between the two can be captured by only one parameter. If one specifies a joint normal distribution for the random vector $(U_0^*, \eta)$, and then takes $U_0 = |U_0^*|$, the marginal distributions of $U_0$ and $\eta$ are the correct marginal distributions. Amsler et al. (2017) claim that this construction creates dependence but does not create correlation between $U_0$ and $\eta$ (see also Schmidt and Lovell, 1980). We contend that any dependence between $U_0$ and $\eta$ cannot naturally be linear, as $U_0$ is a nonlinear transformation of a normal random variable.
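The claim that folding creates dependence without linear correlation can be checked by simulation. The following is a minimal sketch for a scalar $\eta$, assuming unit variances and a hypothetical correlation of 0.5 between $U_0^*$ and $\eta$; the function names `folded_pair` and `pearson` are illustrative and do not come from the paper.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def folded_pair(n, rho, seed=0):
    """Draw (U0, eta), where (U0*, eta) is standard bivariate normal
    with correlation rho and U0 = |U0*| (assumed illustrative design)."""
    rng = random.Random(seed)
    u0, eta = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        u_star = rho * e + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        eta.append(e)
        u0.append(abs(u_star))  # folding at zero
    return u0, eta


u0, eta = folded_pair(50_000, rho=0.5, seed=1)
print("corr(U0, eta)   =", round(pearson(u0, eta), 3))
print("corr(U0, eta^2) =", round(pearson(u0, [e * e for e in eta]), 3))
```

In this sketch the sample correlation between $U_0$ and $\eta$ should be close to zero, while the correlation between $U_0$ and $\eta^2$ should be clearly positive, which is one way to see that the dependence is nonlinear.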
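However, we show that the conditional distribution of $U_0$ given $\eta$ can be written in such a way that this dependence is still captured by only one parameter, which we refer to as the correlation parameter and denote by $\rho_U$.

Footnote on the correlation parameter: We are abusing terminology here. However, we label $\rho_U$ as the correlation parameter, in parallel with the normal case and for lack of a better definition.

To show how one can construct the conditional distribution of $U_0$ given $\eta$, we introduce a fictitious random variable $\eta_0$ such that
\[ \begin{pmatrix} \eta_0 \\ \eta \end{pmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \sigma_U^{-1}\Sigma_{U\eta}' \\ \sigma_U^{-1}\Sigma_{U\eta} & \Sigma_\eta \end{bmatrix} \right), \]
where $\Sigma_{U\eta}$ captures the dependence between $U_0$ and $\eta$, and $\sigma_U$ is the scale parameter of the distribution of $U_0$. Define the new random variable
\[ \kappa = \begin{cases} \eta & \text{if } \eta_0 \ge 0, \\ -\eta & \text{otherwise.} \end{cases} \]
The random variable $\kappa$ follows a skew-normal distribution with parameters $\Sigma_\eta$ and
\[ \alpha = \frac{\Sigma_\eta^{-1}\Sigma_{U\eta}}{\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)^{1/2}} \]
(Azzalini and Dalla Valle, 1996; Azzalini and Capitanio, 1999).

Notice that we can let the conditional distribution of $U_0$ depend on $\eta$ through $\kappa$, in a way that the joint distribution of $(U_0, \kappa)$ can be written as
\[ f_{U_0,\kappa}(u, \kappa) = \frac{2}{(2\pi)^{(p+k+1)/2}}\, |\Sigma_{U\eta}|^{-1/2} \exp\left( -\frac{1}{2} \begin{pmatrix} u \\ \kappa \end{pmatrix}' \Sigma_{U\eta}^{-1} \begin{pmatrix} u \\ \kappa \end{pmatrix} \right), \tag{3} \]
where, with a slight abuse of notation, $\Sigma_{U\eta}$ in (3) denotes the joint covariance matrix of $(U_0, \kappa)$. The correlation between $U_0$ and $\kappa$ can be written as $\rho_U = \sigma_U^{-1}\Sigma_\eta^{-1/2}\Sigma_{U\eta}$.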
This implies that
\[ f_{U_0,\eta}(u, \eta) = \int f_{U_0,\kappa,\eta}(u, \kappa, \eta)\, d\kappa = \int f_{U_0|\kappa,\eta}(u \mid \kappa, \eta)\, f_{\kappa|\eta}(\kappa \mid \eta)\, d\kappa\, f_\eta(\eta) = \int f_{U_0|\kappa}(u \mid \kappa)\, f_{\kappa|\eta}(\kappa \mid \eta)\, d\kappa\, f_\eta(\eta), \]
where the last step follows from the fact that $U_0$ depends on $\eta$ only through the new random variable $\kappa$.

This construction leads to two important conclusions.

1) The conditional density of $U_0$ given $\kappa$ can be written as
\[ f_{U_0|\kappa}(u \mid \kappa) = \frac{1}{\sqrt{2\pi\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)}} \left[ \Phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) \right]^{-1} \exp\left( -\frac{\left(u - \Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right), \]
which is a normal distribution truncated at zero with location parameter $\Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa$.

2) The conditional density of $\kappa$ given $\eta$ can be written as a two-point distribution such that
\[ f_{\kappa|\eta}(\kappa \mid \eta) = \begin{cases} \Phi\left( \dfrac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) & \text{when } \kappa = \eta, \\[2ex] 1 - \Phi\left( \dfrac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) & \text{otherwise.} \end{cases} \]

Using these two facts, straightforward computations imply that the distribution of $U_0$ given $\eta$ can be written as
\[ f_{U_0|\eta}(u \mid \eta) = \frac{1}{\sqrt{2\pi\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)}} \left\{ \exp\left( -\frac{\left(u - \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right) + \exp\left( -\frac{\left(u + \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right) \right\}, \tag{4} \]
which is the pdf of a folded normal distribution (Leone et al., 1961).

Let us denote by $\rho_V$ the vector of correlations between $V$ and $\eta$, which is defined in the usual way. Our problem can be reparametrized in terms of $(\rho_V, \rho_U)$.

Figure 1 depicts the conditional folded normal pdf when $\eta$ is a bivariate random vector, and $\rho_U = ( . , . )'$. When all parameters are fixed, the pdf is symmetric in $\eta$, in the sense that the shape of the density for $\eta = e$ is the same as for $\eta = -e$, for any real-valued vector $e$. For a given $\eta$, our remark above implies that the density is invariant to changes in sign of the correlation parameter $\rho_U$. That is, the conditional density of $U_0$ generated under a certain vector of correlations $\rho_U$ is equal to the conditional density of $U_0$ when the correlation parameter is $-\rho_U$. This is a well-known equivalence property of the folded normal distribution (see Sundberg, 1974, among others).

[Figure 1. Conditional density of $U_0$ given $\eta$. Panels: $\eta_X = \eta_Z = 0$; $\eta_X = \eta_Z = 1$; $\eta_X = 2$, $\eta_Z = 1$; $\eta_X = 3$, $\eta_Z = 1$.]

Therefore, the sign of the parameter $\rho_U$ is not identified. Figure 2 exemplifies this issue in the case when there are two endogenous regressors, one in the inputs and one in the environmental variables, so that $p = k = 1$, and the true value of $\rho_U = ( . , . )'$. The black solid lines are the level curves of the log-likelihood function in the parameter $\rho_U$, when all other parameters are fixed to their true value. The red dots designate the points where the log-likelihood function reaches its maximum. We can observe how both $(- . , - . )'$ and $( . , . )'$ are maxima of the log-likelihood function.

[Figure 2. Example of lack of identification of the parameter $\rho_U$. Panel title: Identification of $\rho_U$; axes: $\rho_{U,X}$ and $\rho_{U,Z}$.]

However, one can still assess whether there is correlation between the regressors and the inefficiency term, although it is not possible to obtain the sign of this correlation. Thus, in practice, this does not appear to be a major issue. We discuss below some potential ways to deal with this lack of identification in estimation and inference.

Another potential issue, relevant for our discussion below, is that, if the likelihood has two isolated maxima, there must also be a point where the likelihood decreases between these two maxima, i.e. a point of local minimum. In particular, the likelihood, as a function of $\rho_U$, has a local minimum at zero. While this does not affect estimation, it is important for testing, as it implies that the value of the score function at $\rho_U = 0$ is zero irrespective of the true value of $\rho_U$.

When $\Sigma_{U\eta}$ is a vector of zeros, that is, when there is no correlation between covariates and the inefficiency term, the conditional distribution in (4) reduces to
\[ f_{U_0}(u) = \frac{2}{\sqrt{2\pi}\,\sigma_U} \exp\left( -\frac{u^2}{2\sigma_U^2} \right), \]
which is the density of a half-normal distribution. In this case, $\rho_U$ is point identified, as we go back to the case in which $U_0$ is independent of both $X$ and $Z$.

One can also easily show that the marginal distributions of $\eta$ and $U_0$ obtained from this construction are a normal and a half-normal distribution, respectively, for any plausible value of the parameter $\rho_U$.

Finally, because of Assumption 2.1 and the strict positivity of the function $g(\cdot)$, the conditional distribution of $U = U_0\, g(Z, \delta)$ given $\eta$ is simply given by
\[ P(U \le u \mid \eta) = P\left( U_0 \le g(Z, \delta)^{-1} u \mid \eta \right), \]
and it is therefore a simple scaled version of the distribution of $U_0$ given $\eta$, as in the standard case.

To summarize, we have shown that one can directly write the conditional density of $U_0$ given $\eta$ in closed form and in a way that the marginal distributions of $U_0$ and $\eta$ are a half-normal and a normal distribution, respectively.
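The properties of the conditional density in (4) can be checked numerically. The sketch below assumes a scalar control function with unit variance, so that the location is $\rho_U \sigma_U \eta$ and the squared scale is $\sigma_U^2(1 - \rho_U^2)$; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def conditional_pdf(u, eta, rho_u, sigma_u):
    """Folded normal density of U0 given a scalar eta, as in (4),
    assuming Var(eta) = 1 so that Sigma_Ueta = rho_u * sigma_u."""
    if u < 0:
        return 0.0
    m = rho_u * sigma_u * eta                    # location term
    s2 = sigma_u ** 2 * (1.0 - rho_u ** 2)       # conditional variance
    c = 1.0 / math.sqrt(2.0 * math.pi * s2)
    return c * (math.exp(-(u - m) ** 2 / (2 * s2))
                + math.exp(-(u + m) ** 2 / (2 * s2)))


def half_normal_pdf(u, sigma_u):
    """Half-normal density with scale sigma_u."""
    return 2.0 * NormalDist(0.0, sigma_u).pdf(u) if u >= 0 else 0.0


def trapezoid(f, lo, hi, n=4000):
    """Plain trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Sign of rho_U does not change the density (hypothetical values).
print(conditional_pdf(0.7, 1.2, 0.4, 1.5), conditional_pdf(0.7, 1.2, -0.4, 1.5))
# With rho_U = 0 the density is half-normal.
print(conditional_pdf(0.7, 1.2, 0.0, 1.5), half_normal_pdf(0.7, 1.5))
# Averaging over eta ~ N(0, 1) recovers the half-normal marginal.
marginal = trapezoid(
    lambda e: conditional_pdf(0.7, e, 0.4, 1.5) * NormalDist().pdf(e), -8.0, 8.0)
print(marginal, half_normal_pdf(0.7, 1.5))
```

Each printed pair should agree up to numerical error, which illustrates the sign invariance, the reduction to the half-normal case, and the half-normal marginal distribution of $U_0$.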
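Also, we have shown that only the magnitude, and not the sign, of the vector of correlations between the inefficiency term, $U_0$, and the control function, $\eta$, is identified.

We now turn to the construction of the likelihood function. We follow the literature on stochastic frontier and define a new random variable $\varepsilon = V - U$ such that
\[ f_{U,\varepsilon|\eta}(u, \varepsilon \mid \eta) = f_{V|\eta}(\varepsilon + u \mid \eta)\, g(Z, \delta)^{-1} f_{U_0|\eta}\left( g(Z, \delta)^{-1} u \mid \eta \right). \]
We can thus write
\[ f_{V|\eta}(\varepsilon + u \mid \eta)\, g(Z, \delta)^{-1} f_{U_0|\eta}\left( g(Z, \delta)^{-1} u \mid \eta \right) = \frac{1}{2\pi\, \tilde\sigma_U(Z)\, \tilde\sigma_V} \left\{ \exp\left( -\frac{\left(u - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_U^2(Z)} - \frac{\left(\varepsilon + u - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_V^2} \right) + \exp\left( -\frac{\left(u + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_U^2(Z)} - \frac{\left(\varepsilon + u - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_V^2} \right) \right\}, \]
where $\tilde\sigma_U^2(Z) = \left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right) g(Z, \delta)^2$, and $\tilde\sigma_V^2 = \sigma_V^2 - \Sigma_{V\eta}'\Sigma_\eta^{-1}\Sigma_{V\eta}$.

By simple but tedious computations, which we detail in the Appendix, and after integrating with respect to $U$, we obtain
\[ f_{\varepsilon|\eta}(\varepsilon \mid \eta) = \int f_{V|\eta}(\varepsilon + u \mid \eta)\, g(Z, \delta)^{-1} f_{U_0|\eta}\left( g(Z, \delta)^{-1} u \mid \eta \right) du \]
\[ = \frac{1}{\sqrt{2\pi}\,\sigma(Z)} \left\{ \Phi\left( \frac{\lambda(Z)\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta}{\sigma(Z)} + \frac{g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\lambda(Z)\sigma(Z)} - \frac{\lambda(Z)\varepsilon}{\sigma(Z)} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right) \right. \]
\[ \left. + \Phi\left( \frac{\lambda(Z)\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta}{\sigma(Z)} - \frac{g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\lambda(Z)\sigma(Z)} - \frac{\lambda(Z)\varepsilon}{\sigma(Z)} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right) \right\}, \]
with $\lambda(Z) = \tilde\sigma_U(Z) / \tilde\sigma_V$, and $\sigma^2(Z) = \tilde\sigma_V^2 + \tilde\sigma_U^2(Z)$.

This distribution is an equal mixture of two conditional skew-normal distributions (see Azzalini and Capitanio, 1999). In the absence of correlation between $\varepsilon$ and the control function $\eta$, the marginal distribution of $\varepsilon$ reduces to a skew-normal distribution, that is, to the standard stochastic frontier model with strongly exogenous regressors.

The full information likelihood function is therefore given by
\[ L(\theta) = f_{\varepsilon|\eta}(\varepsilon \mid \eta)\, f_\eta(\eta), \]
where the parameter $\theta = (\beta, \gamma, \delta, \rho_U, \rho_V, \sigma_U, \sigma_V)$. Let
\[ \theta_0 = \arg\max_{\theta \in \Theta} L(\theta). \]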
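As an illustration, the closed-form density $f_{\varepsilon|\eta}$ can be evaluated with a few lines of code. This is a minimal sketch for a single observation: the argument names `mu_v` (for $\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta$), `mu_u` (for $g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta$), `s_v` (for $\tilde\sigma_V$) and `s_u` (for $\tilde\sigma_U(Z)$) are hypothetical shorthand and not notation from the paper.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cdf


def eps_density(eps, mu_v, mu_u, s_v, s_u):
    """Closed-form density of eps = V - U given eta.

    mu_v, mu_u: conditional location terms of V and of the folded U;
    s_v, s_u:   conditional standard deviations (sigma~_V, sigma~_U(Z)).
    """
    sigma = math.sqrt(s_v ** 2 + s_u ** 2)
    lam = s_u / s_v
    base = lam * mu_v / sigma - lam * eps / sigma
    shift = mu_u / (lam * sigma)
    term_plus = _PHI(base + shift) * math.exp(
        -(eps - mu_v + mu_u) ** 2 / (2 * sigma ** 2))
    term_minus = _PHI(base - shift) * math.exp(
        -(eps - mu_v - mu_u) ** 2 / (2 * sigma ** 2))
    return (term_plus + term_minus) / (math.sqrt(2 * math.pi) * sigma)


def eps_loglik(eps, mu_v, mu_u, s_v, s_u):
    """Log of the density above, i.e. one observation's contribution
    to the conditional part of the log-likelihood."""
    return math.log(eps_density(eps, mu_v, mu_u, s_v, s_u))


# Flipping the sign of mu_u (i.e. of rho_U) leaves the density unchanged.
print(eps_density(-0.3, 0.2, 0.5, 1.0, 0.8), eps_density(-0.3, 0.2, -0.5, 1.0, 0.8))
```

When `mu_v` and `mu_u` are both zero, the function returns $(2/\sigma)\,\phi(\varepsilon/\sigma)\,\Phi(-\lambda\varepsilon/\sigma)$, the density of the standard normal half-normal model. The full log-likelihood contribution would add $\log f_\eta(\eta)$, which is omitted here.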
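We assume that $\theta_0$ exists. However, it is, in general, not unique, because of the identification issue discussed above. We further assume that $\theta_0$ is in the interior of the parameter space $\Theta$.

Following the idea of Sundberg (1974), we can show that $\theta_0$ is a well-separated maximum of the likelihood function in the sense of Newey and McFadden (1994), only when one appropriately restricts the parameter space. Let us assume there is at least one partition of the space $[-1, 1]^{p+k}$, such that there exists a unique maximum of the likelihood function in each element of the partition. Then, $\theta_0$ is locally identified, provided the partition is chosen appropriately.

A further step to complete our framework is to obtain a feasible estimator of technical efficiency, $TE_i = \exp(-U_i)$. Researchers are often interested in obtaining the technical efficiency for each producer. In our case, we obtain an estimator of this quantity from the conditional distribution of $U$ given $\varepsilon$ and $\eta$, following a similar approach as in Amsler et al. (2017).

Let
\[ \sigma_\star^2 = \frac{\tilde\sigma_V^2\, \tilde\sigma_U^2(Z)}{\sigma^2(Z)}, \qquad \mu_{1\star} = -\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right) \frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)}, \qquad \mu_{2\star} = g(Z, \delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\, \frac{\tilde\sigma_V^2}{\sigma^2(Z)}, \]
where we have removed the dependence of $\sigma_\star$, $\mu_{1\star}$ and $\mu_{2\star}$ on the variable $Z$ for simplicity. The conditional density of $U$ given $\varepsilon$ and $\eta$ can be written as
\[ f_{U|\varepsilon,\eta}(u \mid \varepsilon, \eta) = \frac{1}{2\sqrt{2\pi}\,\sigma_\star} \left\{ \left[ \Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right) \right]^{-1} \exp\left( -\frac{(u - \mu_{1\star} - \mu_{2\star})^2}{2\sigma_\star^2} \right) + \left[ \Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right) \right]^{-1} \exp\left( -\frac{(u - \mu_{1\star} + \mu_{2\star})^2}{2\sigma_\star^2} \right) \right\}. \]
We can observe that, when both $U_0$ and $V$ are independent of $\eta$, this conditional density reduces to the one derived in Jondrow et al. (1982).

Hence
\[ E[\exp(-U) \mid \varepsilon, \eta] = \frac{1}{2\sqrt{2\pi}\,\sigma_\star} \left\{ \left[ \Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right) \right]^{-1} \int_0^\infty \exp\left( -u - \frac{(u - \mu_{1\star} - \mu_{2\star})^2}{2\sigma_\star^2} \right) du + \left[ \Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right) \right]^{-1} \int_0^\infty \exp\left( -u - \frac{(u - \mu_{1\star} + \mu_{2\star})^2}{2\sigma_\star^2} \right) du \right\}. \]
Using simple computations, and by the properties of the cdf of the univariate normal distribution, this expression is easily shown to be equal to
\[ E[\exp(-U) \mid \varepsilon, \eta] = 0.5 \exp\left( -\mu_{1\star} - \mu_{2\star} + \frac{\sigma_\star^2}{2} \right) \frac{1 - \Phi\left( -\frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} + \sigma_\star \right)}{\Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right)} + 0.5 \exp\left( -\mu_{1\star} + \mu_{2\star} + \frac{\sigma_\star^2}{2} \right) \frac{1 - \Phi\left( -\frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} + \sigma_\star \right)}{\Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right)}. \tag{5} \]
This formula generalizes the Battese and Coelli (1988) formula for technical efficiencies to the endogenous case. Finally, the mean technical efficiency (Lee and Tyler, 1978) can be obtained as
\[ E[\exp(-U)] = E\left[ E[\exp(-U) \mid \varepsilon, \eta] \right], \]
by the law of iterated expectations.

3. Estimation and Inference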
We consider an iid sample drawn from the joint distribution of $(Y, X, Z, W)$, that we denote $\{(Y_i, X_i, Z_i, W_i),\ i = 1, \dots, n\}$, where each observation follows the model in equation (2).

Estimation of the model is relatively straightforward, and directly follows from the specification of the likelihood function derived above. For all $i = 1, \dots, n$, we can write
\[ L_n(\theta) = \prod_{i=1}^n f_{\varepsilon|\eta}(\varepsilon_i \mid \eta_i)\, f_\eta(\eta_i), \tag{6} \]
with $\eta_i = (\eta_{X,i}, \eta_{Z,i})'$ and
\[ \varepsilon_i = Y_i - X_i\beta, \qquad \eta_{X,i} = X_i - W_i\gamma_X, \qquad \eta_{Z,i} = Z_i - W_i\gamma_Z. \]
Letting $\ell_n(\theta) = \log L_n(\theta)$ be the log-likelihood function, we can obtain
\[ \hat\theta_n = \arg\max_{\theta \in \Theta} \ell_n(\theta). \]

As discussed above, the main issue in the estimation procedure is related to the sign of the correlation parameter $\rho_U$, which is not identified.

Let us denote by $\hat\theta_{n,\rho_U}$ the estimator of $\theta$ obtained when $\rho_U$ is restricted to a partition of the hypercube $[-1, 1]^{p+k}$, such that $\theta_0$ is locally identified, and it is in the interior of the partitioned parameter space. The likelihood function satisfies the condition for consistency (see Newey and McFadden, 1994, Theorem 2.5, p. 2131). We thus have that
\[ \hat\theta_{n,\rho_U} \xrightarrow{p} \theta_{0,\rho_U}. \]
If, moreover, the likelihood function is twice continuously differentiable in a neighborhood of $\theta_{0,\rho_U}$, we have that
\[ \sqrt{n}\left( \hat\theta_{n,\rho_U} - \theta_{0,\rho_U} \right) \xrightarrow{d} N\left( 0, I^{-1}(\theta_{0,\rho_U}) \right), \]
where $I(\theta_{0,\rho_U})$ is Fisher's information matrix. This suggests that one can project out the parameter $\rho_U$ and conduct estimation and inference in the usual way.

In practice, we find that better estimation results are obtained by leaving the parameter $\rho_U$ unconstrained. The numerical optimization algorithm would converge to either of the two maxima of the likelihood function. However, this does not appear to have any effects on the estimation of the other parameters, as we show in simulations. Moreover, it is often not feasible to restrict the parameter space in a meaningful way, especially when the dimension of $\rho_U$ is greater than or equal to 2, as this requires some prior beliefs on the sign of the correlation coefficients. Furthermore, imposing inequality constraints may lead to singularity of the information matrix and further issues related to the fact that the optimum may be at the boundaries of the (restricted) parameter space (Andrews, 1999).
The MLE is not asymptotically normal when the true value is at the boundary, and appropriate testing procedures for this case have been developed (see Lee, 1993; Ketz, 2018, among others). Leaving the parameter space unconstrained avoids these complications.

Furthermore, one may wish to conduct inference on the parameter $\rho_U$. In particular, a simple hypothesis to be tested is whether $X$ and $Z$ are independent of the inefficiency term, i.e. $\rho_U = 0$. The parameter $\rho_U$ in the unrestricted parameter space is not identified under the alternative, and thus standard tests may fail to satisfy their usual asymptotic properties.

One important remark concerns the Score test. Irrespective of its asymptotic properties and the true value of $\rho_U$, the Score test has no power around $\rho_U = 0$. This is because zero is always a local minimum of the likelihood function and thus the score is always equal to zero at that point. We leave a thorough theoretical exploration of the properties of the trinity of tests in this model for future work, but we explore some of their finite sample properties in simulations.

4. Simulations
We replicate the same simulation schemes as in Amsler et al. (2017). We consider the following model
\[ Y_i = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + V_i - U_{0i} \exp(Z_{1i}\delta_1 + Z_{2i}\delta_2), \]
with β = δ = δ = β = β = . The variables $(X_{1i}, Z_{1i})$ are taken to be exogenous (i.e. fully independent of the composite error term), and $(X_{2i}, Z_{2i})$ are instead endogenous. We consider two instruments $(W_{1i}, W_{2i})$, also fully independent of the error term.

The exogenous variables are generated independently from a normal distribution with means equal to 0 and variances equal to 1. These variables are equicorrelated, with correlation parameter equal to 0 . We generate $(V, \eta_X, \eta_Z)$ from the following normal distribution
\[ (V_i, \eta_{X,i}, \eta_{Z,i})' \sim N\left( [\ ],\ [\ .\ \ .\ \ .\ \ .\ \ .\ \ .\ ] \right), \]
so that $\rho_V = ( . , . )'$, and
\[ X_{2i} = \gamma\,(X_{1i} + Z_{1i} + W_{1i} + W_{2i}) + \eta_{X,i}, \qquad Z_{2i} = \gamma\,(X_{1i} + Z_{1i} + W_{1i} + W_{2i}) + \eta_{Z,i}, \]
with γ = . We then generate
\[ \eta_0 \mid \eta \sim N\left( \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta,\ 1 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta} \right), \]
with $\Sigma_\eta = [\ .\ \ .\ ]$, and $\Sigma_{U\eta}' = \rho_U$, as all variances are taken equal to 1. From $\eta_0$ and $\eta$, we generate a skew-normal random variable $\kappa$, such that
\[ \kappa = \eta\, \mathbb{1}(\eta_0 \ge 0) - \eta\, \mathbb{1}(\eta_0 < 0), \]
where $\mathbb{1}(\cdot)$ is the indicator function. Finally,
\[ U_0 = \sigma_U \left| \Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa + \sqrt{1 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}\ \epsilon \right|, \]
where $\epsilon$ is a standard normal random variable.

We consider two simulation schemes that differ because of the value of the parameter $\rho_U$. In Setting 1, we take $U_0$ to be uncorrelated with $\eta$ (the same setting as in Amsler et al., 2017). In Setting 2, we take $\rho_U = ( . , . )'$. We take increasing sample sizes n = { , , }, and run R = replications.

First, a preliminary estimate of $\gamma$ can be obtained by OLS. For a given $\gamma$, one can then maximize the full likelihood with respect to the other parameters. One can use the estimator obtained in this fashion as a starting value for maximization of the full likelihood. Standard errors are obtained by evaluating numerically the Hessian matrix of the full likelihood.
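The data-generating step for $U_0$ described above can be sketched as follows. This is an illustration only: the numerical values used in the call (correlations 0.3, a correlation of 0.5 between the two control functions, and $\sigma_U = 1$) are hypothetical placeholders and are not the values used in the paper, and the function name `simulate_u0` is an assumption.

```python
import math
import random


def simulate_u0(n, rho_u, r_eta, sigma_u, seed=0):
    """Draw (eta_X, eta_Z, U0) for a bivariate control function.

    rho_u:   pair used as Sigma_Ueta (unit variances assumed);
    r_eta:   correlation between eta_X and eta_Z (hypothetical value);
    sigma_u: scale parameter of U0.
    """
    rng = random.Random(seed)
    a, b = rho_u
    det = 1.0 - r_eta ** 2
    # w = Sigma_eta^{-1} Sigma_Ueta for the 2x2 unit-variance case
    w = ((a - r_eta * b) / det, (b - r_eta * a) / det)
    q = a * w[0] + b * w[1]  # Sigma_Ueta' Sigma_eta^{-1} Sigma_Ueta
    if not 0.0 <= q < 1.0:
        raise ValueError("implied conditional variance must be positive")
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eta = (z1, r_eta * z1 + math.sqrt(det) * z2)
        # fictitious variable eta_0 given eta
        eta0 = w[0] * eta[0] + w[1] * eta[1] + math.sqrt(1.0 - q) * rng.gauss(0.0, 1.0)
        sign = 1.0 if eta0 >= 0 else -1.0
        kappa = (sign * eta[0], sign * eta[1])  # skew-normal vector
        eps = rng.gauss(0.0, 1.0)
        u0 = sigma_u * abs(w[0] * kappa[0] + w[1] * kappa[1] + math.sqrt(1.0 - q) * eps)
        draws.append((eta[0], eta[1], u0))
    return draws


sample = simulate_u0(20_000, rho_u=(0.3, 0.3), r_eta=0.5, sigma_u=1.0, seed=42)
mean_u0 = sum(d[2] for d in sample) / len(sample)
print(mean_u0, math.sqrt(2.0 / math.pi))  # sample mean vs. half-normal mean
```

Because the marginal distribution of $U_0$ is half-normal with scale $\sigma_U$, the sample mean should be close to $\sigma_U\sqrt{2/\pi}$ whatever the value of $\rho_U$.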
Bootstrap is also a possibility, but we do not explore it here (Kutlu, 2010).

Second, the choice of the initial condition is crucial, especially for nonlinear, high-dimensional optimization problems like ours. We select the initial parameters by the method of moments. We can write
\[ E[Y_i \mid X_i, Z_i, \eta_i] = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + E[V_i \mid \eta_i] - E[U_{0i} \mid \eta_i] \exp(Z_{1i}\delta_1 + Z_{2i}\delta_2), \]
using the assumption that $(U_0, V)$ is independent of $(X, Z)$ given $\eta$, with
\[ E[V_i \mid \eta_i] = \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta_i, \]
\[ E[U_{0i} \mid \eta_i] = 2\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}\ \phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) + \left( 2\Phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) - 1 \right) \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i. \]

We report results of these simulations in Tables 1 and 2 below. The results in Table 1 should be compared with those in Table 4, p. 138 of Amsler et al. (2017). The mean and the standard deviation for most of the parameters are comparable with theirs. However, we achieve much better precision in estimating the variance of the inefficiency term, which, as indicated by Amsler et al. (2017), is estimated very imprecisely using the copula method. Both the bias and the variance decrease as the sample size $n$ increases, which ought to be expected from our MLE.

[Table 1. Columns: three sample sizes $N$. Rows: $\beta_0$, $\beta_1$, $\beta_2$, $\delta_1$, $\delta_2$, five $\gamma_x$ coefficients, five $\gamma_z$ coefficients, $\sigma_U$, $\sigma_V$, $\rho_{U,\eta_X}$, $\rho_{U,\eta_Z}$, $\rho_{V,\eta_X}$, $\rho_{V,\eta_Z}$.]
Table 1. Mean and Standard Errors of Estimators for Setting 1
The results in
Setting 2, i.e. when $\rho_U = ( . , . )'$, are comparable to the results obtained above. We compute the mean of the parameter $\rho_U$ after taking the absolute value. This is feasible here because we know that $\rho_U$ is well separated from the local minimum at 0. The only remarkable difference between the two tables is that the standard deviation of $\rho_U$ is now much larger, which is to be expected, as the parameter is not point identified in this case. Finally, the standard error of the parameter $\rho_U$ is also approximated very poorly using the inverse of the numerical Hessian matrix. This suggests that a Wald test may tend to over-reject the null hypothesis in finite samples.

Table 2. Mean and Standard Errors of Estimators for Setting 2 (rows: $\beta$, $\delta$, $\gamma_x$, $\gamma_z$, $\sigma_U$, $\sigma_V$, $\rho_{U,\eta_X}$, $\rho_{U,\eta_Z}$, $\rho_{V,\eta_X}$, $\rho_{V,\eta_Z}$; columns: sample sizes $N$)

We thus provide next some simulation evidence about using the trinity of tests in this setting. For all simulation schemes, we test the composite nulls that $\rho_U$ equals zero and that $\rho_U$ equals its value in Setting 2. To construct the covariance of the estimator for the Lagrange multiplier tests, we numerically evaluate the second derivative under the null. The critical values are taken from a $\chi^2$ distribution with 2 degrees of freedom.

In Table 3, we report the size properties of the three tests, with the nominal size being 5%. The columns indicate the true value of $\rho_U$ used in the simulation exercise and the null hypothesis of the test. Both the Wald test and the Lagrange multiplier tests require numerical evaluation of the second derivative of the likelihood function, which may affect their finite sample properties. For $\rho_U = 0$, the Likelihood ratio test is the one that has size most comparable to the nominal one. The Wald test has a much higher rejection probability and its performance does not improve as the sample size increases. As we suggest above, this may be due to the poor approximation of the true standard errors. The score test instead features the opposite issue, as it rarely rejects a true null.

Table 3. Size of the trinity of tests
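As a rough illustration of the testing step described above, the following sketch computes a likelihood ratio statistic, compares it with the 5% critical value of a $\chi^2$ distribution with 2 degrees of freedom, and computes a rejection frequency across Monte-Carlo replications. The function names and the log-likelihood values are hypothetical and are not taken from the paper; the sketch also takes the standard $\chi^2$ approximation at face value, which may not hold exactly in this setting.

```python
import math


def chi2_2df_critical_value(alpha=0.05):
    """Upper-alpha critical value of a chi-square with 2 degrees of freedom.

    With 2 degrees of freedom the survival function is exp(-x / 2), so the
    quantile has a closed form and no statistics library is needed.
    """
    return -2.0 * math.log(alpha)


def likelihood_ratio_test(loglik_unrestricted, loglik_restricted, alpha=0.05):
    """Return (statistic, p_value, reject) for an LR test of 2 restrictions."""
    # The statistic is truncated at zero to guard against numerical noise.
    statistic = max(2.0 * (loglik_unrestricted - loglik_restricted), 0.0)
    p_value = math.exp(-statistic / 2.0)
    reject = statistic > chi2_2df_critical_value(alpha)
    return statistic, p_value, reject


def rejection_frequency(decisions):
    """Share of Monte-Carlo replications in which the null was rejected."""
    return sum(bool(d) for d in decisions) / len(decisions)


# Hypothetical maximized log-likelihoods, for illustration only.
statistic, p_value, reject = likelihood_ratio_test(-512.3, -516.9)
```

Under a true null, `rejection_frequency` applied to the replication-level decisions gives the empirical size; under a false null, it gives the empirical power.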
In Table 4, we instead report their power properties. The columns indicate the true value of $\rho_U$ used in the simulation exercise and the null hypothesis of the test. The tests have in general good power, with two main exceptions. The Wald test does not perform well when $\rho_U = 0$, but its power properties improve as the sample size increases. Similarly, the Score test has little to no power in detecting a false null hypothesis. As zero is a local minimum of the log-likelihood, as indicated above, the Score is close to zero at that point, which explains its poor performance.

Table 4. Power of the trinity of tests
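The behaviour of the Score test can be illustrated with a toy objective. The sketch below is not the likelihood of the paper: `toy_loglik` is a hypothetical function that is symmetric in $\rho$, has a local minimum at zero and maxima at $\pm c$. Its derivative vanishes at zero, so a statistic built on the score at $\rho = 0$ carries no information against that null, whereas the gap in the objective between the maximum and zero is strictly positive.

```python
def toy_loglik(rho, c=0.5):
    """Hypothetical symmetric objective: maxima at rho = +/- c, local minimum at 0."""
    return -((rho ** 2 - c ** 2) ** 2)


def numerical_score(f, x, h=1e-5):
    """Central finite-difference approximation of the first derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Score evaluated at the null value rho = 0 (a stationary point of the toy objective).
score_at_zero = numerical_score(toy_loglik, 0.0)

# Score evaluated away from zero, where the toy objective is increasing.
score_away = numerical_score(toy_loglik, 0.25)

# Likelihood-ratio-type gap between the maximum (rho = c) and the null (rho = 0).
lr_gap = 2.0 * (toy_loglik(0.5) - toy_loglik(0.0))
```

In this toy case the score at zero is null by symmetry while `lr_gap` is positive, which mirrors the contrast between the Score and Likelihood ratio tests reported in the simulations.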
Overall, we can conclude that the Likelihood ratio test has the best finite sample performance in our small-scale simulation exercise. This conclusion has to be taken with caution, as the theoretical properties of the trinity of tests in our setting may not be standard.

Finally, we report summary statistics for our estimators of technical efficiencies using the Battese-Coelli formula provided in equation (5). To give a reference point to the reader, in both simulation schemes the marginal distribution of $U$ is a half-normal distribution with scale parameter $\sigma_U$, so that the true mean technical efficiency is $E[\exp(-U)] = 2\exp(\sigma_U^2/2)\,\Phi(-\sigma_U)$. Our estimator gives a plausible interval for the values of technical efficiencies. The mean technical efficiency also approaches the true value as $N$ increases.

Table 5. Summary measures for the estimator of technical efficiency

Empirical Application
In this section, we consider an application using data on the agricultural sector in Nepal. The data set consists of a cross-section of 600 vegetable-cultivating farmers from Nepal for the crop year 2015, which is sourced from the International Food Policy Research Institute and the Seed Entrepreneurs’ Association of Nepal (2018). For more detail on the data, see Spielman et al. (2017).

The Output variable is total vegetable production measured in rupees. Land is measured as the total area cultivated in square feet. Machinery is the number of hours machinery was used for land preparation, seed and sowing operations, and harvesting. Labor is the sum of hours worked by hired laborers and the hours worked by household members. Pesticides are measured in milligrams. Fertilizers are the sum of organic and inorganic fertilizers, both measured in kilograms. Seeds are measured as the sum of hybrid and pollinated seeds in grams. As environmental variables we consider Experience, which is the number of years the farmer has been growing vegetables; Higher Education, the proportion of household members with higher education or a professional degree; and Risk diversification, which is constructed as follows:

$$
\text{Risk diversification} =
\begin{cases}
\dfrac{\sum_{i=1}^{C} s_i^2 - 1/C}{1 - 1/C} & \text{for } C > 1, \\[2ex]
1 & \text{for } C = 1,
\end{cases}
$$

where $s_i$ is the proportion of land devoted to crop $i$, and $C$ is the total number of crops cultivated by each farmer. This indicator is constructed similarly to a normalized Herfindahl-Hirschman Index and a Simpson Diversity Index, both concentration measures. Our indicator ranges from 0 to 1. A Risk diversification Index equal to 1 indicates a farmer who is cultivating only one crop, and therefore, not diversifying risks; whereas lower values of this index indicate more risk diversification.

After removing missing values, we obtain a final sample of 551 observations. Summary statistics of the variables used in the analysis are provided in Appendix A.

Having this in mind, the model we estimate is the following:

$$
Y = X\beta + V - U\exp(Z\delta),
$$

where:

$$
\begin{aligned}
Y &= \{\log(\text{Output})\}, \\
X &= \{\text{Intercept}, \log(\text{Land}), \log(\text{Labor}), \log(\text{Machinery}), \log(\text{Fertilizers}), \log(\text{Pesticides}), \log(\text{Seeds})\}, \\
Z &= \{\text{Education}, \text{Experience}, \text{Risk diversification}\}.
\end{aligned}
$$

We allow for endogeneity of five inputs (
Labor, Machinery, Fertilizers, Pesticides and Seeds) and one environmental variable (Risk diversification). As instruments, we use two dummies for whether the farmer has suffered any natural or human shocks in the two years prior to the survey (Natural Shocks and Human Shocks, respectively); the average years of experience of nearby farmers, as a measure of spillover effects (Peers Experience); three variables measuring the proportion of seeds that are owned by the farmer (Own Supplier), obtained through formal channels such as an input retailer, a private seed company or representative, a government extension service or a research institute (Formal Supplier), or informal channels such as a family member, a farmer’s cooperative, gifted from a nearby farmer, friend or farmer from other villages, or landlord (Informal Supplier); a set of variables indicating the proportion of seeds that have been obtained using different means of transportation to reach the market (Foot, Bike, Rickshaw, Motorbike, Tempo, Bus and Car); and interaction terms between the type of seed provider and the means of transportation. We test whether the instruments are weak using the first-stage F-statistics (Stock and Yogo, 2005), and we reject the null hypothesis of weak instruments. Results are given in Appendix A.2.

Table 6 reports results for our empirical example. The first pair of columns shows the estimation results assuming exogeneity. Most of the estimated coefficients for inputs are positive, although small in magnitude and not always significant, with Seeds and Land being the most relevant inputs. The coefficient for
Machinery is negative, which seems unreasonable, but it is not significantly different from zero. The estimation results controlling for endogeneity are reported in the second pair of columns. We find that the estimated coefficients for the inputs are all positive, except for Pesticides, which have a negative and significant effect on the value of production. Seeds still have a significant impact on output, along with Labor. Land instead is now not significantly different from zero. As is common with instrumental variables, the standard errors in the model controlling for endogeneity are substantially larger than in the model assuming exogeneity.

                            Exogeneity               Endogeneity
Parameter               Estimate   Std. Err.     Estimate   Std. Err.
β_0
β_Land
β_Labor
β_Machinery              -0.0024     0.0086        0.0074     0.0282
β_Fertilizer
β_Pesticides
β_Seeds
δ_Education              -0.0429     1.606         0.3067     0.5767
δ_Experience
δ_Risk                  -72.4406     0.0146        1.2888     0.5015
ρ_{U,η} Labor
ρ_{U,η} Machinery
ρ_{U,η} Fertilizer                                -0.0007     0.1089
ρ_{U,η} Pesticides                                -0.3242     0.1920
ρ_{U,η} Seeds
ρ_{U,η} Risk                                      -0.4713     0.1404
ρ_{V,η} Labor                                     -0.3066     0.1026
ρ_{V,η} Machinery                                 -0.0598     0.1040
ρ_{V,η} Fertilizer                                -0.0617     0.1167
ρ_{V,η} Pesticides
ρ_{V,η} Seeds                                     -0.3610     0.0780
ρ_{V,η} Risk
σ_U
σ_V

Table 6. Estimation of the efficiency frontier with and without accounting for endogeneity.

Regarding the environmental variables, we find that the only significant coefficient is the one of Risk diversification. The estimated coefficient is negative and remarkably large in magnitude in the model assuming exogeneity. However, this coefficient becomes positive when controlling for endogeneity. This means that higher levels of crop concentration (lower risk diversification) increase the level of inefficiency. This result may seem counter-intuitive, as one may expect that farmers cultivating fewer crops (i.e., with lower risk diversification) can become more specialized. However, it is also true that farmers who diversify risks are less exposed to shocks affecting their production, and our results suggest that they may be more efficient.

When controlling for endogeneity, we have also tested for the absence of correlation between the endogenous variables and the inefficiency term, and for the variance of the inefficiency term being equal to 0. We first test the joint null hypothesis that $\rho_U = 0$. We then test the null hypothesis that $\sigma_U$ is equal to 0 in both models. In the model with endogeneity, this is a composite null, as $\sigma_U = 0$ also implies $\rho_U = 0$. Moreover, we are testing for a parameter at the boundary of the parameter space (Lee, 1993; Ketz, 2018). However, we ignore this issue for simplicity. In both models, the likelihood ratio test rejects the null of $\sigma_U$ being equal to 0. However, the Wald test fails to reject the null in the model with exogeneity.

Figure 3. Estimation of technical inefficiency. Panels: Technical Efficiency (Exogeneity); Technical Efficiency (Endogeneity).

Figure 3 reports the technical efficiency estimates for both models. It is apparent from the distribution of the inefficiency scores that the stochastic frontier model that does not account for endogeneity is unable to capture any skewness in the distribution of the residuals. However, despite the variance of the inefficiency term being smaller in the model with endogeneity, the estimates of technical efficiency are much richer and suggest that many farmers may be very far from the estimated production frontier.

Conclusions
We propose a closed-form maximum likelihood estimator of a stochastic frontier model when both the production inputs and the environmental variables are correlated with the two-sided stochastic error term and the one-sided stochastic inefficiency term. Our identification and estimation strategy is based on control functions that fully capture the dependence between regressors and unobservables. While the joint density of the two-sided stochastic error term and the control function is easily modeled as a normal distribution, one of the main challenges for direct maximum likelihood estimation is to write the joint density of the stochastic inefficiency term and the control function in closed form. To circumvent this issue, Amsler et al. (2017) use copula functions to model the dependence between observable and unobservable components of the model, and employ a simulated maximum likelihood procedure to obtain the parameter estimates. This estimator may not be easy to implement and may be computationally slow. Moreover, instrumental variable methods lead to lower precision in the estimates, and simulated methods can increase this lack of precision even further.

In this work, we provide a simple maximum likelihood estimator that aims at avoiding these potential pitfalls. Under appropriate conditional independence restrictions, we show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution, which reduces to the half-normal when there is no endogeneity. This makes our model a straightforward extension of the normal-half-normal model to include endogenous regressors. We shed light on new identification issues, and we provide Monte-Carlo evidence of the size and power of standard testing procedures in such a context.
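The reduction of the folded normal to the half-normal can be checked numerically with a minimal sketch. The function names below are hypothetical; `mu` and `s` stand for a generic location and scale of the underlying normal (in the model, the location would depend on the control functions, which is an assumption about the mapping rather than a statement of the estimator itself). Setting `mu = 0` corresponds to the case without endogeneity.

```python
import math


def folded_normal_pdf(u, mu, s):
    """Density of |X| at u >= 0, where X is normal with mean mu and std. dev. s."""
    if u < 0:
        return 0.0
    const = 1.0 / (s * math.sqrt(2.0 * math.pi))
    left = math.exp(-((u - mu) ** 2) / (2.0 * s * s))
    right = math.exp(-((u + mu) ** 2) / (2.0 * s * s))
    return const * (left + right)


def half_normal_pdf(u, s):
    """Density of |X| at u >= 0, where X is normal with mean 0 and std. dev. s."""
    if u < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / s * math.exp(-(u * u) / (2.0 * s * s))


# With a zero location parameter the two densities coincide on a grid of points.
grid = [0.0, 0.1, 0.5, 1.0, 2.5]
max_gap = max(abs(folded_normal_pdf(u, 0.0, 0.8) - half_normal_pdf(u, 0.8)) for u in grid)
```

With a nonzero `mu`, the two densities differ, which is the channel through which endogeneity enters the distribution of the inefficiency term.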
Our estimator is easy and fast to implement, and enjoys good finite sample properties.

Additional research on the asymptotic properties of the trinity of tests and on testing the distributional assumptions on the error term is needed. Moreover, extensions of our model to panel data with time-varying endogeneity and true fixed effects could be of interest.

References

Aigner, D., Lovell, C. and Schmidt, P. (1977), ‘Formulation and estimation of stochastic frontier production function models’, Journal of Econometrics (1), 21–37.

Alvarez, A., Amsler, C., Orea, L. and Schmidt, P. (2006), ‘Interpreting and Testing the Scaling Property in Models where Inefficiency Depends on Firm Characteristics’, Journal of Productivity Analysis (3), 201–212.

Amsler, C., Prokhorov, A. and Schmidt, P. (2016), ‘Endogeneity in stochastic frontier models’, Journal of Econometrics (2), 280–288.

Amsler, C., Prokhorov, A. and Schmidt, P. (2017), ‘Endogenous environmental variables in stochastic frontier models’, Journal of Econometrics (2), 131–140.

Andrews, D. W. K. (1999), ‘Estimation when a parameter is on a boundary’, Econometrica (6), 1341–1383.

Azzalini, A. and Capitanio, A. (1999), ‘Statistical applications of the multivariate skew normal distribution’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) (3), 579–602.

Azzalini, A. and Valle, A. D. (1996), ‘The Multivariate Skew-Normal Distribution’, Biometrika (4), 715–726.

Battese, G. E. and Coelli, T. J. (1988), ‘Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data’, Journal of Econometrics (3), 387–399.

Gouriéroux, C. and Monfort, A. (1997), Simulation-based Econometric Methods, OUP/CORE Lecture Series, Oxford University Press.

Horrace, W. C. (2005), ‘Some results on the multivariate truncated normal distribution’, Journal of Multivariate Analysis (1), 209–221.

IFPRI and SEAN (2018), ‘Nepal Vegetable Seed Study: Household Survey’. URL: https://doi.org/10.7910/DVN/9BRU7N

Imbens, G. W. and Newey, W. K. (2009), ‘Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity’, Econometrica (5), 1481–1512.

Jondrow, J., Lovell, C. K., Materov, I. S. and Schmidt, P. (1982), ‘On the estimation of technical inefficiency in the stochastic frontier production function model’, Journal of Econometrics (2), 233–238.

Karakaplan, M. U. and Kutlu, L. (2017), ‘Handling Endogeneity in Stochastic Frontier Analysis’, Economics Bulletin (2).

Ketz, P. (2018), ‘Subvector inference when the true parameter vector may be near or at the boundary’, Journal of Econometrics (2), 285–306.

Kumbhakar, S. and Lovell, C. (2003), Stochastic Frontier Analysis, Cambridge University Press.

Kutlu, L. (2010), ‘Battese-Coelli estimator with endogenous regressors’, Economics Letters (2), 79–81.

Lai, H.-p. and Kumbhakar, S. C. (2018), ‘Endogeneity in panel data stochastic frontier model with determinants of persistent and transient inefficiency’, Economics Letters, 5–9.

Lee, L.-F. (1993), ‘Asymptotic Distribution of the Maximum Likelihood Estimator for a Stochastic Frontier Function Model with a Singular Information Matrix’, Econometric Theory (3), 413–430.

Lee, L.-F. and Tyler, W. G. (1978), ‘The stochastic frontier production function and average efficiency: An empirical analysis’, Journal of Econometrics (3), 385–389.

Leone, F. C., Nelson, L. S. and Nottingham, R. B. (1961), ‘The Folded Normal Distribution’, Technometrics (4), 543–550.

Mundlak, Y. (1961), ‘Empirical Production Function Free of Management Bias’, American Journal of Agricultural Economics (1), 44–56.

Newey, W. K. and McFadden, D. (1994), Large sample estimation and hypothesis testing, Vol. 4 of Handbook of Econometrics, Elsevier, pp. 2111–2245.

Newey, W. K., Powell, J. L. and Vella, F. (1999), ‘Nonparametric Estimation of Triangular Simultaneous Equations Models’, Econometrica (3), 565–603.

Schmidt, P. and Lovell, C. (1980), ‘Estimating stochastic production and cost frontiers when technical and allocative inefficiency are correlated’, Journal of Econometrics (1), 83–100.

Schmidt, P. and Lovell, C. K. (1979), ‘Estimating technical and allocative inefficiency relative to stochastic production and cost frontiers’, Journal of Econometrics (3), 343–366.

Schmidt, P. and Sickles, R. C. (1984), ‘Production frontiers and panel data’, Journal of Business & Economic Statistics (4), 367–374.

Simar, L., Knox Lovell, C. and Vanden Eeckaut, P. (1994), ‘Stochastic frontiers incorporating exogenous influences on efficiency’, STAT Discussion Papers (9403).

Spielman, D. J., Bhandary, P., Bhandari, A., Shrestha, H., Dhakal, L. and Marahatta, B. (2017), Nepali Vegetable Seed Market Study – Household Analysis, Technical report, International Food Policy Research Institute.

Stock, J. H. and Yogo, M. (2005), Testing for Weak Instruments in Linear IV Regression, Cambridge University Press, pp. 80–108.

Sundberg, R. (1974), ‘Maximum likelihood theory for incomplete data from an exponential family’, Scandinavian Journal of Statistics (2), 49–58.

Tran, K. C. and Tsionas, E. G. (2013), ‘GMM estimation of stochastic frontier model with endogenous regressors’, Economics Letters (1), 233–236.

Tran, K. C. and Tsionas, E. G. (2015), ‘Endogeneity in stochastic frontier models: Copula approach without external instruments’, Economics Letters, 85–88.

Wooldridge, J. M. (2015), ‘Control Function Methods in Applied Econometrics’, Journal of Human Resources (2), 420–445.

Appendix
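A.1. Conditional density of the composite error term. In this subsection, we provide the main steps to derive the conditional density of the composite error term, $\varepsilon$, given $\eta$. Recall that

$$
\begin{aligned}
& f_{V\mid\eta}(\varepsilon+u\mid\eta)\,\big(g(Z,\delta)\big)^{-1} f_{U\mid\eta}\big((g(Z,\delta))^{-1}u\mid\eta\big) \\
&\quad= \frac{1}{2\pi\tilde\sigma_U(Z)\tilde\sigma_V}\Bigg\{\exp\Bigg(-\frac{\big(u-g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_U^2(Z)}-\frac{\big(\varepsilon+u-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_V^2}\Bigg) \\
&\qquad\quad+\exp\Bigg(-\frac{\big(u+g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_U^2(Z)}-\frac{\big(\varepsilon+u-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_V^2}\Bigg)\Bigg\},
\end{aligned}
$$

where $\tilde\sigma_U^2(Z) = \big(\sigma_U^2 - \Sigma'_{U\eta}\Sigma_\eta^{-1}\Sigma_{U\eta}\big)\,g^2(Z,\delta)$, and $\tilde\sigma_V^2 = \sigma_V^2 - \Sigma'_{V\eta}\Sigma_\eta^{-1}\Sigma_{V\eta}$.

The terms inside the exponential function can be treated similarly, and for simplicity, we only show the algebra for the first term. We have

$$
\begin{aligned}
\frac{\big(u-g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} &= \frac{1}{\tilde\sigma_U^2(Z)}\Big(u^2 - 2g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,u + \big(g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2\Big), \\
\frac{\big(\varepsilon+u-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_V^2} &= \frac{1}{\tilde\sigma_V^2}\Big(u^2 + \big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2 + 2\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)u\Big).
\end{aligned}
$$

Taking the sum of these two terms gives

$$
\begin{aligned}
&\frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left(u^2 - 2g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,u\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)} + 2\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)u\,\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)}\right) + \frac{\big(g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} + \frac{\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_V^2} \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left(u + \left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)\right)^2 \\
&\quad - \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)^2 + \frac{\big(g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} + \frac{\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_V^2} \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left[u + \left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)\right]^2 + \left(\frac{1}{\tilde\sigma_V^2} - \frac{\tilde\sigma_U^2(Z)}{\tilde\sigma_V^2\,\sigma^2(Z)}\right)\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)^2 \\
&\quad + \left(\frac{1}{\tilde\sigma_U^2(Z)} - \frac{\tilde\sigma_V^2}{\tilde\sigma_U^2(Z)\,\sigma^2(Z)}\right)\big(g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2 + \frac{2}{\sigma^2(Z)}\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\,g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left[u + \left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)\right]^2 + \frac{1}{\sigma^2(Z)}\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2,
\end{aligned}
$$

where the first and last equalities use $\sigma^2(Z) = \tilde\sigma_U^2(Z) + \tilde\sigma_V^2$.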
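The completing-the-square step above can be verified numerically. In the sketch below, `a` and `b` are hypothetical shorthands, introduced only for this check, for $g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta$ and $\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta$ respectively, while `su2` and `sv2` stand for $\tilde\sigma_U^2(Z)$ and $\tilde\sigma_V^2$. The check assumes $\sigma^2(Z) = \tilde\sigma_U^2(Z) + \tilde\sigma_V^2$, which is what the algebra requires.

```python
import math
import random


def sum_of_squares(u, a, b, su2, sv2):
    """Left-hand side: (u - a)^2 / su2 + (b + u)^2 / sv2."""
    return (u - a) ** 2 / su2 + (b + u) ** 2 / sv2


def completed_square(u, a, b, su2, sv2):
    """Right-hand side after completing the square in u."""
    s2 = su2 + sv2                        # sigma^2(Z), assumed to be the sum of the two variances
    shift = b * su2 / s2 - a * sv2 / s2   # term added to u inside the square
    return s2 / (su2 * sv2) * (u + shift) ** 2 + (a + b) ** 2 / s2


# Compare both sides on randomly drawn inputs (fixed seed for reproducibility).
rng = random.Random(0)
identity_holds = True
for _ in range(1000):
    u = rng.uniform(0.0, 5.0)
    a = rng.uniform(-3.0, 3.0)
    b = rng.uniform(-3.0, 3.0)
    su2 = rng.uniform(0.1, 4.0)
    sv2 = rng.uniform(0.1, 4.0)
    lhs = sum_of_squares(u, a, b, su2, sv2)
    rhs = completed_square(u, a, b, su2, sv2)
    if not math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9):
        identity_holds = False
```

Replacing `a` with `-a` gives the corresponding identity for the second exponential term.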
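Then, treating the remaining term similarly, we can write

$$
\begin{aligned}
& f_{V\mid\eta}(\varepsilon+u\mid\eta)\,\big(g(Z,\delta)\big)^{-1} f_{U\mid\eta}\big((g(Z,\delta))^{-1}u\mid\eta\big) \\
&\quad= \frac{1}{2\pi\tilde\sigma_U(Z)\tilde\sigma_V}\,\frac{\sigma(Z)}{\sigma(Z)}\Bigg\{\exp\left(-\frac{\sigma^2(Z)\left[u + \left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)\right]^2}{2\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\right) \\
&\qquad\qquad\times\exp\left(-\frac{\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\sigma^2(Z)}\right) \\
&\qquad+\exp\left(-\frac{\sigma^2(Z)\left[u + \left(\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta\big)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} + g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\right)\right]^2}{2\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\right) \\
&\qquad\qquad\times\exp\left(-\frac{\big(\varepsilon-\Sigma'_{V\eta}\Sigma_\eta^{-1}\eta - g(Z,\delta)\Sigma'_{U\eta}\Sigma_\eta^{-1}\eta\big)^2}{2\sigma^2(Z)}\right)\Bigg\}.
\end{aligned}
$$

After integrating this final expression with respect to $U$ on its support, that is between 0 and $\infty$, we obtain the final result.

A.2. Additional material for empirical application.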
In this section, we provide some additional information about the empirical application.

Table 7 contains descriptive statistics from the main variables used in the analysis. The variables are divided by category for convenience of the reader.

Table 8 contains instead values of the F-statistics from the first stage linear regressions of the endogenous variables on the included exogenous variables and the instruments. The null hypothesis tested is that the instruments are irrelevant, that is, all coefficients are simultaneously equal to 0. We can observe how all F-statistics are above 10, which is the threshold value suggested by Stock and Yogo (2005) below which the instruments should be considered weak.

                          Mean         St.Dev.            Min              Max
Output              986391.007     7413003.065       2466.286    117761500.000

Inputs
Land                 27521.032       32288.538        729.000       273800.000
Labor                  520.127        4182.825          1.000        92881.000
Machinery                2.426           7.222          0.000           70.000
Fertilizers          44539.860      433215.499          0.000      7500000.000
Pesticides              85.309         226.033          0.000         3250.000
Seeds                  279.404         384.670          0.002         3500.000

Environmental variables
Education                0.065           0.138          0.000            0.800
Experience              24.309          16.496          1.000          100.000
Risk Div                 0.393           0.169          0.093            1.000

Instruments
Natural Shock            0.430           0.496          0.000            1.000
Human Shock              0.022           0.146          0.000            1.000
Own Supplier             0.052           0.105          0.000            1.000
Formal supplier          0.251           0.190          0.000            1.000
Informal Supplier        0.011           0.044          0.000            0.500
Peers Experience        24.739          13.366         10.000           44.000
Foot                     0.653           0.425          0.000            1.000
Bike                     0.161           0.340          0.000            1.000
Rickshaw                 0.003           0.047          0.000            1.000
Motorbike                0.020           0.130          0.000            1.000
Tempo                    0.007           0.069          0.000            1.000
Bus                      0.115           0.287          0.000            1.000
Car                      0.004           0.033          0.000            0.500

Table 7. Descriptive Statistics

Variable       F-Statistic
Labor               27.804
Machinery           56.339
Fertilizers         26.760
Pesticides          20.298
Seeds               11.223
Risk                98.596

Table 8. F-Statistics from linear first stage regressions

(S. Centorrino, Corresponding author) Economics Department, State University of New York at Stony Brook, USA.
E-mail address, S. Centorrino: [email protected]

(M. Pérez-Urdiales) Economics Department, State University of New York at Stony Brook, USA.
E-mail address: [email protected]@stonybrook.edu