Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity

Abstract

We propose and study a maximum likelihood estimator of stochastic frontier models with endogeneity in cross-section data when the composite error term may be correlated with inputs and environmental variables. Our framework is a generalization of the normal half-normal stochastic frontier model with endogeneity. We derive the likelihood function in closed form using three fundamental assumptions: the existence of control functions that fully capture the dependence between regressors and unobservables; the conditional independence of the two error components given the control functions; and a folded normal conditional distribution of the stochastic inefficiency term given the control functions. We also provide a Battese-Coelli estimator of technical efficiency. Our estimator is computationally fast and easy to implement. We study some of its asymptotic properties, and we showcase its finite sample behavior in Monte Carlo simulations and an empirical application to farmers in Nepal.



SAMUELE CENTORRINO AND MARÍA PÉREZ-URDIALES

Abstract.

We study a closed-form maximum likelihood estimator of stochastic frontier models with endogeneity in cross-section data when both error components may be correlated with inputs and environmental variables. We achieve identification using a control function assumption. We show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution, which reduces to the half-normal distribution when both inputs and environmental variables are independent of the stochastic inefficiency term. Hence, our framework is a natural generalization of the normal half-normal stochastic frontier model with endogeneity. We further provide a Battese-Coelli estimator of technical efficiency in this context. Our estimator is computationally fast and easy to implement. We showcase its finite sample properties in Monte Carlo simulations and an empirical application to farmers in Nepal.

Keywords: Stochastic Frontier; Endogeneity; Control Functions; Maximum Likelihood; Technical efficiency.

JEL Codes: C10; C13; C26; C36.

1. Introduction

We consider a stochastic frontier model which includes environmental variables that affect the inefficiency level, but not the production frontier. The composite error term is split into statistical noise and an inefficiency component. The production frontier can be linear or nonlinear, and the inefficiency term satisfies the scaling property, that is, it can be decomposed into a basic stochastic inefficiency term and a scaling function that depends on the environmental variables (Alvarez et al., 2006). We allow both inputs and environmental variables to be correlated with the composite error term. These endogenous regressors are further restricted to be continuous. We achieve identification by assuming the existence of a vector of control functions that fully captures the dependence between the composite error term and the regressors, and such that the statistical noise and the inefficiency term are independent given these control functions. We contribute to the literature by providing a closed-form maximum likelihood estimator of the production frontier in this context. This allows us to provide a clear analysis of identification and a simple and computationally fast estimation of the model's parameters. Finally, we also provide a generalization of the Battese and Coelli (1988) estimator of technical efficiency.

Date: April 30, 2020.

Our analysis highlights some interesting facts about identification and estimation in this context. Under our assumptions, we show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution (Leone et al., 1961; Sundberg, 1974).¹ When the correlation between the inefficiency term and the control functions is equal to 0, the endogeneity problem disappears, and the folded normal distribution reduces to the positive half-normal distribution. Our framework thus provides a generalization of the normal half-normal model to the case when regressors are endogenous. Because of the properties of the folded normal distribution, only the magnitude of the correlation between the stochastic inefficiency term and the control functions is identified; its sign is not. As a consequence, the log-likelihood function has two isolated maxima which are symmetric about a local minimum at zero.² When the correlation parameter is equal to 0, the log-likelihood function has a unique maximum. We discuss some of the implications of this identification issue for both estimation and inference.

Although endogeneity in the stochastic frontier framework has received increasing attention in the literature (see Kutlu, 2010; Tran and Tsionas, 2013, 2015; Karakaplan and Kutlu, 2017; Amsler et al., 2016; Lai and Kumbhakar, 2018, among others), models that explicitly allow for correlation between the stochastic inefficiency term, inputs and environmental variables have only been studied by Amsler et al. (2016, 2017), to the best of our knowledge. These authors propose an estimator that allows production inputs and environmental variables to be correlated with both the statistical noise and the stochastic inefficiency term.
They fix the marginal distribution of the statistical noise to be a normal distribution, and the marginal distribution of the inefficiency term to be a half-normal distribution. These authors model the dependence between observables and unobservables using copula functions. Such functions are cleverly constructed from the marginal distributions of the unobservables. They also potentially allow for dependence between the statistical noise and the inefficiency term. However, the likelihood function cannot be written in closed form, and these authors need to resort to simulations to obtain an estimator of the model's parameters. This prevents a clean and straightforward analysis of identification, estimation, and inference in such a context. Moreover, simulated methods can be biased and have a higher variance in finite samples, especially when the number of simulations is not chosen appropriately given the sample size (Gouriéroux and Monfort, 1997). Finally, when both inputs and environmental variables are potentially correlated with the inefficiency term, they cannot obtain an estimator of technical efficiency. Our approach seeks to avoid these potential pitfalls. As we provide the likelihood function in closed form, we can study identification in the usual way, and provide an estimator of technical efficiency that is applicable to any correlation structure. In a simulation study similar to the one in Amsler et al. (2017), we show that our estimator is computationally faster and performs better in finite samples, especially for the estimation of the variance of the stochastic inefficiency term.

¹ The folded normal distribution can be thought of as a normal distribution which is "folded" at zero by taking the absolute value. Suppose we take a mean-zero normal random variable $\eta$, and then generate two standard normal random variables $U_1$ and $U_2$, which have correlation $-0.5$ and $0.5$ with $\eta$, respectively. When we "fold" both $U_1$ and $U_2$ by taking their absolute values, the conditional distribution of $|U_1|$ given $\eta$ is the same as that of $|U_2|$ given $\eta$. Identification of the sign of the correlation is thus not feasible.

² All other parameters held constant, the log-likelihood has to "bend back" between the two local maxima, which generates a local minimum at zero.

The paper is structured as follows. In Section 2 we discuss the statistical model and provide the main steps for the construction of the likelihood function.
In Section 3, we discuss both estimation and inference in such a context, with particular emphasis on testing the null hypothesis that there is no correlation between the regressors and the inefficiency term. In Section 4, we provide simulation evidence on the finite sample properties of our estimator. In Section 5, we apply our methodology to the agricultural sector in Nepal. We show that accounting for endogeneity substantially changes the conclusions of the empirical analysis. Finally, Section 6 concludes.

2. Statistical Model

We consider a general version of the model usually considered in this literature. The output, $Y$, is determined by the logarithm of some known function, $m(\cdot,\cdot)$, which depends on a vector of $p \geq 1$ inputs, $X$, and a parameter, $\beta$; and by a composite error term $\varepsilon = V - U$, where $V$ represents a stochastic noise component and $U \geq 0$ is the inefficiency term, so that

$$ Y = m(X, \beta) + V - U, \qquad (1) $$

in a way that the inefficiency term, $U$, captures the producer's shortfall from the production frontier. Additionally, we fix $U = U_0\, g(Z, \delta)$, where $U_0 \geq 0$ is a basic stochastic inefficiency term and $g(\cdot,\cdot)$ is a known strictly positive scaling function, which depends on some additional variables $Z \in \mathbb{R}^k$, with $k \geq 0$, through a parameter vector $\delta$ (Simar et al., 1994; Alvarez et al., 2006). $X$ and $Z$ may have some elements in common, but they must have at least one non-overlapping component. We refer to $Z$ as environmental variables. Thus, we finally have

$$ Y = m(X, \beta) + V - U_0\, g(Z, \delta). \qquad (2) $$

A potential maximum likelihood estimator of $(\beta, \delta)$ is based on the assumption that the composite error component $(V, U_0)$ is independent of $(X, Z)$, with $V$ and $U_0$ mutually independent; $V$ following a normal distribution with constant variance; and $U_0$ following a normal distribution truncated at 0 (the so-called positive half-normal distribution, see Aigner et al., 1977; Schmidt and Lovell, 1979, 1980; Horrace, 2005).

While a consistent estimator of $(\beta, \delta)$ can also be obtained without these strong distributional assumptions (Simar et al., 1994; Tran and Tsionas, 2013), these assumptions are necessary to learn something about the variance of the inefficiency term, $U_0$. We are often interested in estimating the distance of each producer from the frontier (Battese and Coelli, 1988). This can be easily done when the marginal distributions of $V$ and $U_0$ are taken to be known.

It has long been recognized in the literature that inputs may be chosen simultaneously with the output, and are thus potentially correlated with the composite error term (see Mundlak, 1961; Schmidt and Sickles, 1984, for a full description of the statistical issues in this context). Similarly, environmental variables may be decided by the producer depending on characteristics that are observable to her but not to the econometrician.

To deal with endogenous variables, we need a vector of instruments that are correlated with the endogenous components but independent of the composite error term (see Amsler et al., 2016, for the impact of several exogeneity assumptions on identification in SFA). To simplify our presentation, we take all variables in $(X, Z)$ to be endogenous.
Extension to the case when we have some endogenous and some exogenous components can be handled similarly.

We consider the following auxiliary regression models

$$ X = W\gamma_X + \eta_X, \qquad Z = W\gamma_Z + \eta_Z, $$

where $\eta = (\eta_X, \eta_Z) \in \mathbb{R}^{p+k}$ is a random vector of error components, and $W \in \mathbb{R}^q$ is a vector of instrumental variables, with $q \geq p + k$.

Our approach is based on a control function assumption. That is, we assume that all the dependence between $(X, Z)$ and $(V, U_0)$ is captured by $\eta$ (Newey et al., 1999; Imbens and Newey, 2009; Wooldridge, 2015). Moreover, we assume that the instruments are strongly exogenous, that is, fully independent of the composite error term. Given a triplet of random variables $U$, $V$ and $\eta$, we use the notation $U \perp\!\!\!\perp V$ to indicate that $U$ is fully independent of $V$, and the notation $U \perp\!\!\!\perp V \mid \eta$ to indicate that $U$ is fully independent of $V$ conditional on $\eta$. Therefore, our main assumptions can be formally stated as follows.

Assumption 2.1. $W \perp\!\!\!\perp (V, U_0, \eta)$ and $(X, Z) \perp\!\!\!\perp (U_0, V) \mid \eta$.

Assumption 2.2. $U_0 \perp\!\!\!\perp V \mid \eta$.

Assumption 2.1 implies strong exogeneity of the instruments, and that the control function $\eta$ captures all the dependence between $(X, Z)$ and $(U_0, V)$. Assumption 2.2 implies that, if any dependence exists between $V$ and $U_0$, it has to happen through the vector $\eta$. This assumption reduces to the standard assumption $U_0 \perp\!\!\!\perp V$ when both $X$ and $Z$ are taken to be exogenous (Kumbhakar and Lovell, 2003, Sec. 3.2, p. 64). It excludes any direct correlation between $U_0$, the stochastic inefficiency term, and $V$.

Assumptions 2.1 and 2.2 directly imply that

$$ f_{V,U_0,\eta}(v, u, \eta) = f_{V,\eta}(v, \eta)\, f_{U_0\mid\eta}(u \mid \eta), $$

where $f$ denotes a probability density function. To construct a maximum likelihood estimator (MLE), we further impose the condition that $\eta \sim N(0, \Sigma_\eta)$, where $\Sigma_\eta$ is a positive definite covariance matrix.

If stochastic inefficiency is taken to be independent of all covariates, a full information MLE can be easily constructed by further assuming that

$$ \begin{pmatrix} V \\ \eta \end{pmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_V^2 & \Sigma_{V\eta}' \\ \Sigma_{V\eta} & \Sigma_\eta \end{bmatrix} \right), $$

where $\Sigma_{V\eta}$ is the vector of covariances between $V$ and $\eta$, and $\sigma_V^2$ is the variance of $V$ (Kutlu, 2010). However, the main difficulty lies in the specification of the joint density of $(U_0, \eta)$ such that its marginal distributions are a truncated normal and a joint normal, respectively, and the dependence between the two can be captured by only one parameter. If one specifies a joint normal distribution for the random vector $(U_0^*, \eta)$, and then takes $U_0 = |U_0^*|$, the marginal distributions of $U_0$ and $\eta$ are the correct marginal distributions. Amsler et al. (2017) point out that this construction creates dependence but does not create correlation between $U_0$ and $\eta$ (see also Schmidt and Lovell, 1980). We contend that any dependence between $U_0$ and $\eta$ cannot naturally be linear, as $U_0$ is a nonlinear transformation of a normal random variable.
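However, we show that the conditional distribution of $U_0$ given $\eta$ can be written in such a way that this dependence is still captured by only one parameter, which we refer to as the correlation parameter³ and denote by $\rho_U$.

To show how one can construct the conditional distribution of $U_0$ given $\eta$, we introduce a fictitious random variable $\eta_0$ such that

$$ \begin{pmatrix} \eta_0 \\ \eta \end{pmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \sigma_U^{-1}\Sigma_{U\eta}' \\ \sigma_U^{-1}\Sigma_{U\eta} & \Sigma_\eta \end{bmatrix} \right), $$

where $\Sigma_{U\eta}$ captures the dependence between $U_0$ and $\eta$, and $\sigma_U$ is the scale parameter of the distribution of $U_0$. Define the new random variable

$$ \kappa = \begin{cases} \eta & \text{if } \eta_0 \geq 0, \\ -\eta & \text{otherwise.} \end{cases} $$

³ We are abusing terminology here. However, we label $\rho_U$ as the correlation parameter, in parallel with the normal case and for lack of a better definition.

The random variable $\kappa$ follows a skew-normal distribution with parameters $\Sigma_\eta$ and $\alpha = \Sigma_\eta^{-1}\Sigma_{U\eta}\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)^{-1/2}$ (Azzalini and Dalla Valle, 1996; Azzalini and Capitanio, 1999). Notice that we can let the conditional distribution of $U_0$ depend on $\eta$ through $\kappa$, in such a way that the joint distribution of $(U_0, \kappa)$ can be written as

$$ f_{U_0,\kappa}(u, \kappa) = \frac{2}{(2\pi)^{(p+k+1)/2}\, |\Omega|^{1/2}} \exp\left( -\frac{1}{2} \begin{pmatrix} u \\ \kappa \end{pmatrix}' \Omega^{-1} \begin{pmatrix} u \\ \kappa \end{pmatrix} \right), \qquad u \geq 0, \qquad (3) $$

where $\Omega = \begin{bmatrix} \sigma_U^2 & \Sigma_{U\eta}' \\ \Sigma_{U\eta} & \Sigma_\eta \end{bmatrix}$ is the covariance matrix of $(\sigma_U\eta_0, \eta')'$, and the correlation between $U_0$ and $\kappa$ can be written as $\rho_U = \sigma_U^{-1}\Sigma_\eta^{-1/2}\Sigma_{U\eta}$.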
This implies that

$$ f_{U_0,\eta}(u, \eta) = \int f_{U_0,\kappa,\eta}(u, \kappa, \eta)\, d\kappa = \int f_{U_0\mid\kappa,\eta}(u \mid \kappa, \eta)\, f_{\kappa\mid\eta}(\kappa \mid \eta)\, d\kappa\; f_\eta(\eta) = \int f_{U_0\mid\kappa}(u \mid \kappa)\, f_{\kappa\mid\eta}(\kappa \mid \eta)\, d\kappa\; f_\eta(\eta), $$

where the last step follows from the fact that $U_0$ depends on $\eta$ only through the new random variable $\kappa$.

This construction leads to two important conclusions.

1) The conditional density of $U_0$ given $\kappa$ can be written as

$$ f_{U_0\mid\kappa}(u \mid \kappa) = \frac{1}{\sqrt{2\pi\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)}} \left[ \Phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) \right]^{-1} \exp\left( -\frac{\left(u - \Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right), \qquad u \geq 0, $$

which is a normal distribution truncated at zero with location parameter $\Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa$.

2) The conditional density of $\kappa$ given $\eta$ is a two-point distribution such that

$$ f_{\kappa\mid\eta}(\kappa \mid \eta) = \begin{cases} \Phi\left( \dfrac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) & \text{when } \kappa = \eta, \\[2ex] 1 - \Phi\left( \dfrac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) & \text{otherwise.} \end{cases} $$

Using these two facts, straightforward computations imply that the distribution of $U_0$ given $\eta$ can be written as

$$ f_{U_0\mid\eta}(u \mid \eta) = \frac{1}{\sqrt{2\pi\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)}} \left\{ \exp\left( -\frac{\left(u - \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right) + \exp\left( -\frac{\left(u + \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right)} \right) \right\}, \qquad u \geq 0, \qquad (4) $$

which is the pdf of a folded normal distribution (Leone et al., 1961).

Let us denote by $\rho_V$ the vector of correlations between $V$ and $\eta$, which is defined in the usual way. Our problem can be reparametrized in terms of $(\rho_V, \rho_U)$.

Figure 1 depicts the conditional folded normal pdf when $\eta$ is a bivariate random vector and $\rho_U = (\ldots, \ldots)'$. When all parameters are fixed, the pdf is symmetric in $\eta$, in the sense that the shape of the density for $\eta = e$ is the same as for $\eta = -e$, for any real-valued vector $e$.

[Figure: curves for X = Z = 0; X = Z = 1; X = 2, Z = 1; X = 3, Z = 1]
Figure 1. Conditional density of $U_0$ given $\eta$.

Since (4) depends on $\Sigma_{U\eta}$ and $\eta$ only through the product $\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta$ and the quadratic form $\Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}$, changing the sign of $\eta$ is equivalent to changing the sign of $\Sigma_{U\eta}$. Hence, for a given $\eta$, our remark above implies that the density is invariant to changes in the sign of the correlation parameter $\rho_U$. That is, the conditional density of $U_0$ generated under a certain vector of correlations $\rho_U$ is equal to the conditional density of $U_0$ when the correlation parameter is $-\rho_U$. This is a well-known equivalence property of the folded normal distribution (see Sundberg, 1974, among others). Therefore, the sign of the parameter $\rho_U$ is not identified. Figure 2 exemplifies this issue in the case when there are two endogenous regressors, one in the inputs and one in the environmental variables, so that $p = k = 1$, and the true value is $\rho_U = (\ldots, \ldots)'$. The black solid lines are the level curves of the log-likelihood function in the parameter $\rho_U$, when all other parameters are fixed to their true value. The red dots designate the points where the log-likelihood function reaches its maximum. We can observe that both the true value of $\rho_U$ and its negative are maxima of the log-likelihood function.

[Figure: panel "Identification of $\rho_U$", level curves of the log-likelihood with $\rho_{U,X}$ on the horizontal axis and $\rho_{U,Z}$ on the vertical axis]
Figure 2. Example of lack of identification of the parameter $\rho_U$.

However, one can still assess whether there is correlation between the regressors and the inefficiency term, although it is not possible to obtain the sign of this correlation. Thus, in practice, this does not appear to be a major issue. We discuss below some potential ways to deal with this lack of identification in estimation and inference.

Another potential issue, relevant for our discussion below, is that, if the likelihood has two isolated maxima, there must also be a point where the likelihood decreases between these two maxima, i.e. a point of local minimum. In particular, the likelihood as a function of $\rho_U$ has a local minimum at zero. While this does not affect estimation, it is important for testing, as it implies that the score function with respect to $\rho_U$ is equal to zero at $\rho_U = 0$, whatever the true value of $\rho_U$.

When $\Sigma_{U\eta}$ is a vector of zeros, that is, when there is no correlation between covariates and the inefficiency term, the conditional distribution in (4) reduces to

$$ f_{U_0}(u) = \sqrt{\frac{2}{\pi\sigma_U^2}} \exp\left( -\frac{u^2}{2\sigma_U^2} \right), \qquad u \geq 0, $$

which is the density of a half-normal distribution. In this case, $\rho_U$ is point identified, as we go back to the case in which $U_0$ is independent of both $X$ and $Z$.

One can also easily show that the marginal distributions of $\eta$ and $U_0$ obtained from this construction are a normal and a half-normal distribution, respectively, for any admissible value of the parameter $\rho_U$.

Finally, because of Assumption 2.1 and the strict positivity of the function $g(\cdot)$, the conditional distribution of $U = U_0\, g(Z, \delta)$ given $\eta$ is simply given by

$$ P(U \leq u \mid \eta) = P\left( U_0 \leq (g(Z, \delta))^{-1} u \mid \eta \right), $$

and it is therefore a simple scaled version of the distribution of $U_0$ given $\eta$, as in the standard case.

To summarize, we have shown that one can directly write the conditional density of $U_0$ given $\eta$ in closed form and in a way that the marginal distributions of $U_0$ and $\eta$ are a half-normal and a normal distribution, respectively.
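A minimal Python sketch of equation (4), assuming a single endogenous regressor so that $\eta$, $\Sigma_\eta$ and $\Sigma_{U\eta}$ are scalars; the function and argument names (`u0_given_eta_pdf`, `s_ueta`, `s_eta2`, and so on) are hypothetical and not taken from the authors' code. It lets one check numerically the properties stated above: the density integrates to one, it is unchanged when the sign of $\Sigma_{U\eta}$ flips, and it collapses to the half-normal density when $\Sigma_{U\eta} = 0$. It also evaluates the standard folded normal mean $E[U_0 \mid \eta] = 2s\,\phi(\mu/s) + (2\Phi(\mu/s) - 1)\mu$, with $\mu = \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta$ and $s^2 = \sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}$, which is the conditional moment used for starting values in the simulation section.

```python
import math


def folded_normal_pdf(u, loc, scale):
    """Density of |N(loc, scale^2)| evaluated at u (zero for u < 0)."""
    if u < 0.0:
        return 0.0
    c = 1.0 / (math.sqrt(2.0 * math.pi) * scale)
    return c * (math.exp(-0.5 * ((u - loc) / scale) ** 2)
                + math.exp(-0.5 * ((u + loc) / scale) ** 2))


def u0_given_eta_pdf(u, eta, sigma_u, s_ueta, s_eta2):
    """Equation (4) with a scalar control function eta.

    s_ueta stands for Sigma_Ueta and s_eta2 for Sigma_eta (hypothetical names).
    """
    loc = s_ueta / s_eta2 * eta                               # Sigma_Ueta' Sigma_eta^{-1} eta
    scale = math.sqrt(sigma_u ** 2 - s_ueta ** 2 / s_eta2)    # sqrt(sigma_U^2 - Sigma_Ueta' Sigma_eta^{-1} Sigma_Ueta)
    return folded_normal_pdf(u, loc, scale)


def folded_normal_mean(loc, scale):
    """E|N(loc, scale^2)| = 2 s phi(loc/s) + loc (2 Phi(loc/s) - 1)."""
    z = loc / scale
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * scale * pdf + loc * (2.0 * cdf - 1.0)


def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    sigma_u, s_eta2, eta = 1.0, 1.0, 0.8
    for s_ueta in (0.4, -0.4):
        mass = simpson(lambda u: u0_given_eta_pdf(u, eta, sigma_u, s_ueta, s_eta2), 0.0, 12.0)
        print(f"Sigma_Ueta = {s_ueta:+.1f}: total mass {mass:.8f}, "
              f"f(0.5 | eta) = {u0_given_eta_pdf(0.5, eta, sigma_u, s_ueta, s_eta2):.6f}")
```

Both signs of $\Sigma_{U\eta}$ print the same density value, which is the non-identification of the sign of $\rho_U$ in its simplest form.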
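Also, we have shown that only the magnitude of the vector of correlations between the inefficiency term, $U_0$, and the control function, $\eta$, is identified, and not its sign.

We now turn to the construction of the likelihood function. We follow the stochastic frontier literature and define a new random variable $\varepsilon = V - U$ such that

$$ f_{U,\varepsilon\mid\eta}(u, \varepsilon \mid \eta) = f_{V\mid\eta}(\varepsilon + u \mid \eta)\, (g(Z, \delta))^{-1} f_{U_0\mid\eta}\left( (g(Z, \delta))^{-1} u \mid \eta \right). $$

We can thus write

$$ \begin{aligned} f_{V\mid\eta}(\varepsilon + u \mid \eta)\, (g(Z, \delta))^{-1} f_{U_0\mid\eta}\left( (g(Z, \delta))^{-1} u \mid \eta \right) = \frac{1}{2\pi\,\tilde\sigma_U(Z)\,\tilde\sigma_V} \Bigg\{ & \exp\left( -\frac{\left(u - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_U^2(Z)} - \frac{\left(\varepsilon + u - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_V^2} \right) \\ + & \exp\left( -\frac{\left(u + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_U^2(Z)} - \frac{\left(\varepsilon + u - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\tilde\sigma_V^2} \right) \Bigg\}, \end{aligned} $$

where $\tilde\sigma_U^2(Z) = \left(\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}\right) g^2(Z, \delta)$ and $\tilde\sigma_V^2 = \sigma_V^2 - \Sigma_{V\eta}'\Sigma_\eta^{-1}\Sigma_{V\eta}$.

By simple but tedious computations, which we detail in the Appendix, and after integrating with respect to $U$, we obtain

$$ \begin{aligned} f_{\varepsilon\mid\eta}(\varepsilon \mid \eta) &= \int_0^\infty f_{V\mid\eta}(\varepsilon + u \mid \eta)\, (g(Z, \delta))^{-1} f_{U_0\mid\eta}\left( (g(Z, \delta))^{-1} u \mid \eta \right) du \\ &= \frac{1}{\sqrt{2\pi}\,\sigma(Z)} \Bigg\{ \Phi\left( \frac{\lambda(Z)\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta}{\sigma(Z)} + \frac{g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\lambda(Z)\sigma(Z)} - \frac{\lambda(Z)\varepsilon}{\sigma(Z)} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right) \\ &\qquad + \Phi\left( \frac{\lambda(Z)\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta}{\sigma(Z)} - \frac{g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta}{\lambda(Z)\sigma(Z)} - \frac{\lambda(Z)\varepsilon}{\sigma(Z)} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right) \Bigg\}, \end{aligned} $$

with $\lambda(Z) = \tilde\sigma_U(Z)/\tilde\sigma_V$ and $\sigma^2(Z) = \tilde\sigma_V^2 + \tilde\sigma_U^2(Z)$.

This distribution is a mixture of two conditional (extended) skew-normal distributions (see Azzalini and Capitanio, 1999): the two summands integrate to $\Phi\left(g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta/\tilde\sigma_U(Z)\right)$ and $\Phi\left(-g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta/\tilde\sigma_U(Z)\right)$, respectively, so the mixture is equally weighted only when $\Sigma_{U\eta} = 0$. In the absence of correlation between $\varepsilon$ and the control function $\eta$, the marginal distribution of $\varepsilon$ reduces to a skew-normal distribution, that is, to the standard stochastic frontier model with strongly exogenous regressors.

The full information likelihood function is therefore given by

$$ \mathcal{L}(\theta) = f_{\varepsilon\mid\eta}(\varepsilon \mid \eta)\, f_\eta(\eta), $$

where the parameter $\theta = (\beta, \gamma, \delta, \rho_U, \rho_V, \sigma_U, \sigma_V)$. Let $\theta_0 = \arg\max_{\theta \in \Theta} \mathcal{L}(\theta)$.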
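To make the structure of $f_{\varepsilon\mid\eta}$ concrete, here is a minimal Python sketch, again assuming a single endogenous regressor so that all covariance terms are scalars; the names (`eps_terms`, `s_ueta`, `s_veta`, `g`, and so on) are hypothetical. It returns the two summands separately, which makes it easy to confirm that they integrate to $\Phi(\pm g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta/\tilde\sigma_U(Z))$, that the density is unchanged when $\Sigma_{U\eta}$ changes sign, and that it reduces to the normal half-normal density when $\Sigma_{U\eta} = \Sigma_{V\eta} = 0$. Because the resulting log-likelihood is an even function of $\Sigma_{U\eta}$, its derivative at zero vanishes, which is the source of the score-test problem discussed in Section 3. Only the $\log f_{\varepsilon\mid\eta}$ part of the log-likelihood is coded; the $\log f_\eta$ part does not involve $\Sigma_{U\eta}$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def eps_terms(eps, eta, sigma_u, sigma_v, s_ueta, s_veta, s_eta2, g=1.0):
    """The two summands of f_{eps|eta}, each divided by sqrt(2 pi) sigma(Z).

    Scalar control function; g stands for g(Z, delta) > 0 (hypothetical names).
    """
    a = s_veta / s_eta2 * eta                                  # Sigma_Veta' Sigma_eta^{-1} eta
    b = g * s_ueta / s_eta2 * eta                              # g(Z, delta) Sigma_Ueta' Sigma_eta^{-1} eta
    su = g * math.sqrt(sigma_u ** 2 - s_ueta ** 2 / s_eta2)    # tilde sigma_U(Z)
    sv = math.sqrt(sigma_v ** 2 - s_veta ** 2 / s_eta2)        # tilde sigma_V
    s = math.sqrt(su ** 2 + sv ** 2)                           # sigma(Z)
    lam = su / sv                                              # lambda(Z)
    t1 = norm_cdf(lam * a / s + b / (lam * s) - lam * eps / s) * math.exp(-0.5 * ((eps - a + b) / s) ** 2)
    t2 = norm_cdf(lam * a / s - b / (lam * s) - lam * eps / s) * math.exp(-0.5 * ((eps - a - b) / s) ** 2)
    return t1 / (SQRT_2PI * s), t2 / (SQRT_2PI * s)


def eps_given_eta_pdf(eps, eta, sigma_u, sigma_v, s_ueta, s_veta, s_eta2, g=1.0):
    """f_{eps|eta}(eps | eta): the sum of the two mixture components."""
    t1, t2 = eps_terms(eps, eta, sigma_u, sigma_v, s_ueta, s_veta, s_eta2, g)
    return t1 + t2


def loglik_eps(data, **params):
    """Sum of log f_{eps|eta} over (eps_i, eta_i) pairs (the log f_eta part is omitted)."""
    return sum(math.log(eps_given_eta_pdf(e, h, **params)) for e, h in data)


if __name__ == "__main__":
    params = dict(sigma_u=1.0, sigma_v=0.5, s_veta=0.2, s_eta2=1.0, g=1.3)
    sample = [(-0.3, 0.8), (0.1, -1.2), (-1.0, 0.5)]   # illustrative (eps_i, eta_i) pairs
    for r in (0.4, -0.4, 0.0):
        print(f"Sigma_Ueta = {r:+.1f}: log-likelihood {loglik_eps(sample, s_ueta=r, **params):.6f}")
```

The two nonzero values of $\Sigma_{U\eta}$ give the same log-likelihood, mirroring the two symmetric maxima in Figure 2.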
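We assume that $\theta_0$ exists. However, it is, in general, not unique, because of the identification issue discussed above. We further assume that $\theta_0$ is in the interior of the parameter space $\Theta$. Following the idea of Sundberg (1974), we can show that $\theta_0$ is a well-separated maximum of the likelihood function in the sense of Newey and McFadden (1994) only when one appropriately restricts the parameter space. Let us assume there is at least one partition of the space $[-1, 1]^{p+k}$ such that there exists a unique maximum of the likelihood function in each element of the partition. Then, $\theta_0$ is locally identified, provided the partition is chosen appropriately.

A further step to complete our framework is to obtain a feasible estimator of technical efficiency, $TE_i = \exp(-U_i)$. Researchers are often interested in obtaining the technical efficiency for each producer. In our case, we obtain an estimator of this quantity from the conditional distribution of $U$ given $\varepsilon$ and $\eta$, following a similar approach as in Amsler et al. (2017). Let

$$ \sigma_\star = \frac{\tilde\sigma_V\, \tilde\sigma_U(Z)}{\sigma(Z)}, \qquad \mu_{1\star} = -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\right) \tilde\sigma_U^2(Z)}{\sigma^2(Z)}, \qquad \mu_{2\star} = \frac{g(Z, \delta)\, \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\; \tilde\sigma_V^2}{\sigma^2(Z)}, $$

where we have removed the dependence of $\sigma_\star$, $\mu_{1\star}$ and $\mu_{2\star}$ on the variable $Z$ for simplicity. The conditional density of $U$ given $\varepsilon$ and $\eta$ is a two-component mixture of normal distributions truncated at zero,

$$ f_{U\mid\varepsilon,\eta}(u \mid \varepsilon, \eta) = \frac{1}{\sqrt{2\pi}\,\sigma_\star} \left\{ \frac{\omega}{\Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right)} \exp\left( -\frac{(u - \mu_{1\star} - \mu_{2\star})^2}{2\sigma_\star^2} \right) + \frac{1 - \omega}{\Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right)} \exp\left( -\frac{(u - \mu_{1\star} + \mu_{2\star})^2}{2\sigma_\star^2} \right) \right\}, \qquad u \geq 0, $$

where the mixing weight $\omega$ is the share of the first summand in the braces of $f_{\varepsilon\mid\eta}(\varepsilon \mid \eta)$,

$$ \omega = \frac{\Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right)}{\Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right) + \Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right) \exp\left( -\frac{\left(\varepsilon - \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\right)^2}{2\sigma^2(Z)} \right)}, $$

which equals $1/2$ when $\Sigma_{U\eta} = 0$. We can observe that, when both $U_0$ and $V$ are independent of $\eta$, this conditional density reduces to the one derived in Jondrow et al. (1982). Hence

$$ E[\exp(-U) \mid \varepsilon, \eta] = \frac{\omega}{\sqrt{2\pi}\,\sigma_\star\, \Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right)} \int_0^\infty \exp\left( -u - \frac{(u - \mu_{1\star} - \mu_{2\star})^2}{2\sigma_\star^2} \right) du + \frac{1 - \omega}{\sqrt{2\pi}\,\sigma_\star\, \Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right)} \int_0^\infty \exp\left( -u - \frac{(u - \mu_{1\star} + \mu_{2\star})^2}{2\sigma_\star^2} \right) du. $$

Using simple computations and the properties of the cdf of the univariate normal distribution, this expression is easily shown to be equal to

$$ E[\exp(-U) \mid \varepsilon, \eta] = \omega\, \frac{\exp\left( -\mu_{1\star} - \mu_{2\star} + \frac{\sigma_\star^2}{2} \right) \left[ 1 - \Phi\left( -\frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} + \sigma_\star \right) \right]}{\Phi\left( \frac{\mu_{1\star} + \mu_{2\star}}{\sigma_\star} \right)} + (1 - \omega)\, \frac{\exp\left( -\mu_{1\star} + \mu_{2\star} + \frac{\sigma_\star^2}{2} \right) \left[ 1 - \Phi\left( -\frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} + \sigma_\star \right) \right]}{\Phi\left( \frac{\mu_{1\star} - \mu_{2\star}}{\sigma_\star} \right)}. \qquad (5) $$

This formula generalizes the Battese and Coelli (1988) formula for technical efficiencies to the endogenous case. Finally, the mean technical efficiency (Lee and Tyler, 1978) can be obtained as $E[\exp(-U)] = E\left[ E[\exp(-U) \mid \varepsilon, \eta] \right]$, by the law of iterated expectations.

3. Estimation and Inference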
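Because the technical efficiency formula in equation (5) is evaluated at the estimated parameters, it is worth checking an implementation before plugging in $\hat\theta_n$. The following is a minimal Python sketch, assuming a scalar control function and hypothetical names (`bc_efficiency`, `bc_by_quadrature`, `s_ueta`, and so on). It weights the two truncated-normal Battese-Coelli terms by the posterior probability of each mixture component, $\omega$ and $1 - \omega$, where $\omega$ is the share of the first summand of $f_{\varepsilon\mid\eta}$; these weights equal 1/2 only when $\Sigma_{U\eta} = 0$. As an independent check, it also computes the same conditional expectation by direct quadrature of $\exp(-u)\, f_{U\mid\eta}(u \mid \eta)\, f_{V\mid\eta}(\varepsilon + u \mid \eta)$, which does not rely on equation (5).

```python
import math


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bc_efficiency(eps, eta, sigma_u, sigma_v, s_ueta, s_veta, s_eta2, g=1.0):
    """E[exp(-U) | eps, eta] from equation (5) with a scalar control function."""
    a = s_veta / s_eta2 * eta                                 # E[V | eta]
    b = g * s_ueta / s_eta2 * eta                             # location of U | eta before folding
    su2 = g ** 2 * (sigma_u ** 2 - s_ueta ** 2 / s_eta2)      # tilde sigma_U^2(Z)
    sv2 = sigma_v ** 2 - s_veta ** 2 / s_eta2                 # tilde sigma_V^2
    s2 = su2 + sv2                                            # sigma^2(Z)
    sig = math.sqrt(su2 * sv2 / s2)                           # sigma_star
    mu1 = -(eps - a) * su2 / s2                               # mu_1star
    mu2 = b * sv2 / s2                                        # mu_2star
    weights, scores = [], []
    for m, shift in ((mu1 + mu2, eps - a + b), (mu1 - mu2, eps - a - b)):
        # unnormalized posterior weight of the component (proportional to a summand of f_{eps|eta})
        weights.append(norm_cdf(m / sig) * math.exp(-0.5 * shift ** 2 / s2))
        # Battese-Coelli term for a normal(m, sig^2) truncated at zero
        scores.append(math.exp(-m + 0.5 * sig ** 2) * norm_cdf(m / sig - sig) / norm_cdf(m / sig))
    omega = weights[0] / (weights[0] + weights[1])
    return omega * scores[0] + (1.0 - omega) * scores[1]


def bc_by_quadrature(eps, eta, sigma_u, sigma_v, s_ueta, s_veta, s_eta2, g=1.0, upper=20.0, n=20000):
    """Same quantity, integrating exp(-u) against f_{U|eta}(u) f_{V|eta}(eps + u) with Simpson's rule."""
    a = s_veta / s_eta2 * eta
    b = g * s_ueta / s_eta2 * eta
    su = g * math.sqrt(sigma_u ** 2 - s_ueta ** 2 / s_eta2)
    sv = math.sqrt(sigma_v ** 2 - s_veta ** 2 / s_eta2)

    def joint(u):
        # unnormalized f_{U, eps | eta}(u, eps): folded normal in u times normal in eps + u
        fold = math.exp(-0.5 * ((u - b) / su) ** 2) + math.exp(-0.5 * ((u + b) / su) ** 2)
        return fold * math.exp(-0.5 * ((eps + u - a) / sv) ** 2)

    h = upper / n
    num = den = 0.0
    for i in range(n + 1):
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)   # Simpson weights; h/3 cancels in the ratio
        u = i * h
        j = joint(u)
        num += w * math.exp(-u) * j
        den += w * j
    return num / den


if __name__ == "__main__":
    # eps, eta, sigma_u, sigma_v, Sigma_Ueta, Sigma_Veta, Sigma_eta, g(Z, delta): illustrative values
    args = (-0.6, 1.1, 1.0, 0.5, 0.6, 0.2, 1.0, 1.2)
    print(bc_efficiency(*args), bc_by_quadrature(*args))
```

The two printed numbers should agree up to quadrature error; with $\Sigma_{U\eta} = \Sigma_{V\eta} = 0$ the function returns the usual Battese-Coelli score.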

We consider an iid sample drawn from the joint distribution of $(Y, X, Z, W)$, which we denote $\{(Y_i, X_i, Z_i, W_i), i = 1, \ldots, n\}$, where each observation follows the model in equation (2). Estimation of the model is relatively straightforward, and directly follows from the specification of the likelihood function derived above. We can write

$$ \mathcal{L}_n(\theta) = \prod_{i=1}^n f_{\varepsilon\mid\eta}(\varepsilon_i \mid \eta_i)\, f_\eta(\eta_i), \qquad (6) $$

with $\eta_i = (\eta_{X,i}', \eta_{Z,i}')'$ and

$$ \varepsilon_i = Y_i - m(X_i, \beta), \qquad \eta_{X,i} = X_i - W_i\gamma_X, \qquad \eta_{Z,i} = Z_i - W_i\gamma_Z. $$

Letting $\ell_n(\theta) = \log \mathcal{L}_n(\theta)$ be the log-likelihood function, we obtain

$$ \hat\theta_n = \arg\max_{\theta \in \Theta} \ell_n(\theta). $$

As discussed above, the main issue in the estimation procedure is related to the sign of the correlation parameter $\rho_U$, which is not identified.

Let us denote by $\hat\theta_{n,\rho_U}$ the estimator of $\theta$ obtained when $\rho_U$ is restricted to an element of a partition of the hypercube $[-1, 1]^{p+k}$ such that $\theta_0$ is locally identified and lies in the interior of the restricted parameter space. The likelihood function then satisfies the conditions for consistency (see Newey and McFadden, 1994, Theorem 2.5, p. 2131). We thus have that

$$ \hat\theta_{n,\rho_U} \xrightarrow{p} \theta_{0,\rho_U}. $$

If, moreover, the likelihood function is twice continuously differentiable in a neighborhood of $\theta_{0,\rho_U}$, we have that

$$ \sqrt{n}\left( \hat\theta_{n,\rho_U} - \theta_{0,\rho_U} \right) \xrightarrow{d} N\left( 0, \mathcal{I}^{-1}(\theta_{0,\rho_U}) \right), $$

where $\mathcal{I}(\theta_{0,\rho_U})$ is the Fisher information matrix. This suggests that one can project out the parameter $\rho_U$ and conduct estimation and inference in the usual way.

In practice, we find that better estimation results are obtained by leaving the parameter $\rho_U$ unconstrained. The numerical optimization algorithm then converges to either of the two maxima of the likelihood function. However, this does not appear to have any effect on the estimation of the other parameters, as we show in simulations. Moreover, it is often not feasible to restrict the parameter space in a meaningful way, especially when the dimension of $\rho_U$ is greater than or equal to 2, as this requires some prior beliefs on the sign of the correlation coefficients. Furthermore, imposing inequality constraints may lead to singularity of the information matrix and further issues related to the fact that the optimum may be at the boundary of the (restricted) parameter space (Andrews, 1999).
The MLE is not asymptotically normal when the true value is at the boundary, and appropriate testing procedures for this case have been developed (see Lee, 1993; Ketz, 2018, among others). Leaving the parameter space unconstrained avoids these complications.

Furthermore, one may wish to conduct inference on the parameter $\rho_U$. In particular, a simple hypothesis to be tested is whether $X$ and $Z$ are independent of the inefficiency term, i.e. $\rho_U = 0$. However, $\rho_U$ in the unrestricted parameter space is not identified under the alternative, and thus standard tests may fail to satisfy their usual asymptotic properties.

One important remark concerns the score test. Irrespective of its asymptotic properties and of the true value of $\rho_U$, the score test has no power around $\rho_U = 0$. This is because zero is always a local minimum of the likelihood function, and thus the score is always equal to zero at that point. We leave a thorough theoretical exploration of the properties of the trinity of tests in this model for future work, but we explore some of their finite sample properties in simulations.

4. Simulations

We replicate the same simulation schemes as in Amsler et al. (2017). We consider the following model

$$ Y_i = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + V_i - U_{0,i}\exp(Z_{1i}\delta_1 + Z_{2i}\delta_2), $$

with $\beta_0 = \delta_1 = \delta_2 = \ldots$ and $\beta_1 = \beta_2 = \ldots$. The variables $(X_{1i}, Z_{1i})$ are taken to be exogenous (i.e. fully independent of the composite error term), and $(X_{2i}, Z_{2i})$ are instead endogenous. We consider two instruments $(W_{1i}, W_{2i})$, also fully independent of the error term. The exogenous variables are generated, independently across observations, from a normal distribution with means equal to 0 and variances equal to 1. These variables are equicorrelated, with correlation parameter equal to $0.\ldots$. We generate $(V_i, \eta_{X,i}, \eta_{Z,i})$ from a trivariate normal distribution with zero mean and unit variances, whose off-diagonal elements give $\rho_V = (\ldots, \ldots)'$ and the covariance matrix $\Sigma_\eta$ of $\eta_i = (\eta_{X,i}, \eta_{Z,i})'$, and

$$ X_{2i} = \gamma(X_{1i} + Z_{1i} + W_{1i} + W_{2i}) + \eta_{X,i}, \qquad Z_{2i} = \gamma(X_{1i} + Z_{1i} + W_{1i} + W_{2i}) + \eta_{Z,i}, $$

with $\gamma = \ldots$. The fictitious variable $\eta_{0,i}$ is generated as

$$ \eta_{0,i} \mid \eta_i \sim N\left( \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i,\; 1 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta} \right), $$

with $\Sigma_{U\eta} = \rho_U$, as all variances are taken equal to 1. From $\eta_{0,i}$ and $\eta_i$, we generate a skew-normal random variable $\kappa_i = \eta_i\, \mathbb{1}(\eta_{0,i} \geq 0) - \eta_i\, \mathbb{1}(\eta_{0,i} < 0)$, where $\mathbb{1}(\cdot)$ is the indicator function. Finally,

$$ U_{0,i} = \sigma_U \left| \Sigma_{U\eta}'\Sigma_\eta^{-1}\kappa_i + \sqrt{1 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}\; \epsilon_i \right|, $$

where $\epsilon_i$ is a standard normal random variable.

We consider two simulation schemes that differ in the value of the parameter $\rho_U$. In Setting 1, we take $U_0$ to be uncorrelated with $\eta$ (the same setting as in Amsler et al., 2017). In Setting 2, we take $\rho_U = (\ldots, \ldots)'$. We take three increasing sample sizes $n$ and run $R = \ldots$ replications. First, we estimate $\gamma$ by OLS. For a given $\gamma$, one can then maximize the full likelihood with respect to the other parameters. One can use the estimator obtained in this fashion as a starting value for maximization of the full likelihood. Standard errors are obtained by evaluating numerically the Hessian matrix of the full likelihood.
Bootstrap is also a possibility, but we do not explore it here (Kutlu, 2010).

Second, the choice of the initial condition is crucial, especially for nonlinear, high-dimensional optimization problems like ours. We select the initial parameters by the method of moments. We can write

$$ E[Y_i \mid X_i, Z_i, \eta_i] = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + E[V_i \mid \eta_i] - E[U_{0,i} \mid \eta_i]\exp(Z_{1i}\delta_1 + Z_{2i}\delta_2), $$

using the assumption that $(U_0, V)$ is independent of $(X, Z)$ given $\eta$, with

$$ E[V_i \mid \eta_i] = \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta_i, $$

$$ E[U_{0,i} \mid \eta_i] = 2\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}\; \phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) + \left( 2\Phi\left( \frac{\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i}{\sqrt{\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta}}} \right) - 1 \right) \Sigma_{U\eta}'\Sigma_\eta^{-1}\eta_i. $$

We report results of these simulations in Tables 1 and 2 below. The results in Table 1 should be compared with those in Table 4, p. 138 of Amsler et al. (2017). The mean and the standard deviation for most of the parameters are comparable with theirs. However, we achieve much better precision in estimating the variance of the inefficiency term, which, as indicated by Amsler et al. (2017), is estimated very imprecisely using the copula method. Both the bias and the variance decrease as the sample size $n$ increases, as expected for our MLE.

[Table body: columns for three sample sizes N; rows $\beta_0$, $\beta_1$, $\beta_2$, $\delta_1$, $\delta_2$, five coefficients $\gamma_{x}$, five coefficients $\gamma_{z}$, $\sigma_U$, $\sigma_V$, $\rho_{U,\eta_X}$, $\rho_{U,\eta_Z}$, $\rho_{V,\eta_X}$, $\rho_{V,\eta_Z}$]
Table 1. Mean and Standard Errors of Estimators for Setting 1
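The $\kappa$ construction used above to generate $U_{0,i}$ can be sketched in a few lines of Python. The sketch simplifies to a single control function with unit variance, takes $\Sigma_{U\eta} = \rho$, and uses illustrative values $\sigma_U = 0.8$ and $\rho = 0.6$ chosen for the example, not the design of the tables; `simulate_u0` and `half_normal_mean_te` are hypothetical names. It checks by Monte Carlo that $U_0$ is marginally half-normal (mean $\sigma_U\sqrt{2/\pi}$, and $E[\exp(-U_0)] = 2\exp(\sigma_U^2/2)\Phi(-\sigma_U)$, the reference value for technical efficiency used later in this section), and that $U_0$ is uncorrelated with $\eta$ without being independent of it: its covariance with $\eta^2$ is positive.

```python
import math
import random


def simulate_u0(n, sigma_u, rho, seed=0):
    """Draw (eta_i, U_0i) pairs with the kappa construction and a scalar eta.

    Assumes Var(eta) = 1 and Sigma_Ueta = rho (unit-variance normalization).
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho ** 2)              # sqrt(1 - Sigma_Ueta' Sigma_eta^{-1} Sigma_Ueta)
    draws = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        eta0 = rho * eta + s * rng.gauss(0.0, 1.0)   # eta_0 | eta ~ N(rho * eta, 1 - rho^2)
        kappa = eta if eta0 >= 0.0 else -eta         # skew-normal kappa
        u0 = sigma_u * abs(rho * kappa + s * rng.gauss(0.0, 1.0))
        draws.append((eta, u0))
    return draws


def half_normal_mean_te(sigma_u):
    """E[exp(-U_0)] = 2 exp(sigma_u^2 / 2) Phi(-sigma_u) when U_0 is half-normal."""
    phi_neg = 0.5 * (1.0 + math.erf(-sigma_u / math.sqrt(2.0)))
    return 2.0 * math.exp(0.5 * sigma_u ** 2) * phi_neg


def mean(xs):
    return sum(xs) / len(xs)


if __name__ == "__main__":
    sigma_u, rho = 0.8, 0.6                      # illustrative values only
    draws = simulate_u0(200_000, sigma_u, rho, seed=1)
    etas = [e for e, _ in draws]
    us = [u for _, u in draws]
    print("E[U_0]         ", mean(us), "half-normal:", sigma_u * math.sqrt(2.0 / math.pi))
    print("E[exp(-U_0)]   ", mean([math.exp(-u) for u in us]), "formula:", half_normal_mean_te(sigma_u))
    print("Cov(U_0, eta)  ", mean([e * u for e, u in draws]) - mean(etas) * mean(us))
    print("Cov(U_0, eta^2)", mean([e * e * u for e, u in draws]) - mean([e * e for e in etas]) * mean(us))
```

The near-zero covariance with $\eta$ and the clearly positive covariance with $\eta^2$ illustrate why the dependence between $U_0$ and $\eta$ is not linear.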

The results in Setting 2, i.e. when $\rho_U$ is different from zero, are comparable to the results obtained above. We compute the mean of the estimates of $\rho_U$ after taking their absolute value. This is feasible here because we know that $\rho_U$ is well separated from the local minimum at 0. The only notable difference between the two tables is that the standard deviation of the estimator of $\rho_U$ is now much larger, which should be expected, as the parameter is not point identified in this case. Finally, the standard error of the estimator of $\rho_U$ is approximated very poorly by the inverse of the numerical Hessian matrix. This suggests that a Wald test may tend to over-reject the null hypothesis in finite samples.

Table 2. Mean and Standard Errors of Estimators for Setting 2. Columns: three sample sizes $N$; rows as in Table 1.

We therefore provide some simulation evidence on the trinity of tests (Wald, likelihood ratio and Lagrange multiplier) in this setting. For all simulation schemes, we test the composite nulls that both components of $\rho_U$ are equal to 0 and that both are equal to their nonzero value in Setting 2. To construct the covariance of the estimator for the Lagrange multiplier test, we numerically evaluate the second derivative under the null. The critical values are taken from a $\chi^2$ distribution with 2 degrees of freedom.

In Table 3, we report the size of the three tests, with a nominal size of 5%. The columns indicate the true value of $\rho_U$ used in the simulation exercise and the null hypothesis of the test. Both the Wald test and the Lagrange multiplier test require numerical evaluation of the second derivative of the likelihood function, which may affect their finite sample properties. For $\rho_U = 0$, the likelihood ratio test has the size closest to the nominal one. The Wald test has a much higher rejection probability, and its performance does not improve as the sample size increases. As suggested above, this may be due to the poor approximation of the true standard errors. The score test instead has the opposite problem, as it rarely rejects a true null.

Table 3. Size of the trinity of tests. Columns: true value of $\rho_U$ (0 or the Setting 2 value) and the null hypothesis of the test ($H_0: \rho_U = 0$ or $H_0$: $\rho_U$ equal to the Setting 2 value).
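To make the three statistics concrete, the sketch below computes them for a toy model that is not the paper's frontier model: i.i.d. bivariate normal data with identity covariance, testing that the two-dimensional mean equals zero, with the 5% critical value of a $\chi^2$ with 2 degrees of freedom. As in the simulations described above, the Hessian and the score are obtained numerically (here by central finite differences); all function names are hypothetical. In this quadratic toy model the three statistics coincide, so differences between them in the frontier model come from the shape of its likelihood and from the numerical approximations.

```python
import numpy as np


def num_grad(f, theta, h=1e-5):
    """Central-difference gradient of a scalar function f at theta."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return g


def num_hess(f, theta, h=1e-4):
    """Central-difference Hessian of a scalar function f at theta."""
    k = theta.size
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ei[i] = h
            ej = np.zeros(k)
            ej[j] = h
            hess[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                          - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return hess


def trinity(loglik, theta_hat, theta_0):
    """LR, Wald and score (LM) statistics for H0: theta = theta_0 (all components fixed)."""
    lr = 2.0 * (loglik(theta_hat) - loglik(theta_0))
    d = theta_hat - theta_0
    wald = d @ (-num_hess(loglik, theta_hat)) @ d               # information at the MLE
    s = num_grad(loglik, theta_0)
    score = s @ np.linalg.solve(-num_hess(loglik, theta_0), s)  # information under H0
    return lr, wald, score


# 5% critical value of a chi-squared with 2 degrees of freedom: -2 log(0.05)
CRIT_95_DF2 = -2.0 * np.log(0.05)

# Toy data: bivariate normal with identity covariance (not the frontier model)
rng = np.random.default_rng(1)
x = rng.normal(loc=[0.1, -0.05], scale=1.0, size=(500, 2))


def loglik(mu):
    """Gaussian log-likelihood in the mean, identity covariance, constants dropped."""
    return -0.5 * np.sum((x - mu) ** 2)


lr, wald, score = trinity(loglik, x.mean(axis=0), np.zeros(2))
reject = [stat > CRIT_95_DF2 for stat in (lr, wald, score)]
```

Replacing `loglik` with the frontier log-likelihood (and `x.mean(axis=0)` with its numerical maximizer) would give the kind of comparison reported in Tables 3 and 4, under the same numerical-derivative caveats.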

In Table 4, we instead report the power of the tests. The columns indicate the true value of $\rho_U$ used in the simulation exercise and the null hypothesis of the test. The tests generally have good power, with two main exceptions. The Wald test performs poorly when $\rho_U = 0$, although its power improves as the sample size increases. Moreover, the score test has little to no power to detect a false null hypothesis. As indicated above, zero is a local minimum of the log-likelihood, so the score is close to zero at that point, which explains its poor performance.

Table 4. Power of the trinity of tests. Columns: true value of $\rho_U$ (0 or the Setting 2 value) and the corresponding false null hypothesis.
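The mechanism behind the score test's lack of power can be seen in a one-parameter toy analogue, which is our own illustration and not the paper's frontier model: i.i.d. data from a folded normal $|N(\mu, 1)|$. Its log-likelihood is even in $\mu$, so the score at $\mu = 0$ is exactly zero for any sample, and when the data lie far from zero, $\mu = 0$ is also a local minimum. The likelihood ratio statistic, which compares likelihood levels rather than slopes, still detects the false null. The grid-search maximizer is a simplification chosen for brevity.

```python
import numpy as np


def loglik(mu, x):
    """Log-likelihood of i.i.d. |N(mu, 1)| data, constants dropped; even in mu."""
    return np.sum(np.logaddexp(-0.5 * (x - mu) ** 2, -0.5 * (x + mu) ** 2))


def score(mu, x):
    """Derivative of loglik in mu, simplified to sum(x * tanh(x * mu) - mu)."""
    return np.sum(x * np.tanh(x * mu) - mu)


# Toy data with true mu = 2, so H0: mu = 0 is false
rng = np.random.default_rng(2)
x = np.abs(rng.normal(2.0, 1.0, size=500))

# Crude grid-search MLE over mu >= 0 (the sign of mu is not identified)
grid = np.linspace(0.0, 5.0, 5001)
mu_hat = grid[np.argmax([loglik(m, x) for m in grid])]

lr = 2.0 * (loglik(mu_hat, x) - loglik(0.0, x))  # large: LR rejects the false null
s0 = score(0.0, x)                                 # exactly zero for any data
local_min_at_zero = loglik(0.05, x) > loglik(0.0, x)
```

Since the score statistic is a quadratic form in `s0`, it is zero here regardless of the information estimate used to scale it.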

Overall, we conclude that the likelihood ratio test has the best finite sample performance in our small-scale simulation exercise. This conclusion should be taken with caution, as the theoretical properties of the trinity of tests in our setting may not be standard.

Finally, we report summary statistics for our estimators of technical efficiency using the Battese-Coelli formula provided in equation 5. To give the reader a reference point, in both simulation schemes the marginal distribution of $U$ is a half-normal distribution with scale parameter $\sigma_U$, so that the true mean technical efficiency is

$$E[\exp(-U)] = 2\exp\left(\frac{\sigma_U^2}{2}\right)\Phi(-\sigma_U).$$

Our estimator gives a plausible interval for the values of technical efficiency, and the mean technical efficiency approaches the true value as $N$ increases.

Table 5. Summary measures for the estimator of technical efficiency. Columns: three sample sizes $N$, each with $\rho_U = 0$ and $\rho_U$ equal to the Setting 2 value.

Empirical Application

In this section, we consider an application using data on the agricultural sector in Nepal. The data set consists of a cross-section of 600 vegetable-cultivating farmers from Nepal for the crop year 2015, sourced from the International Food Policy Research Institute and the Seed Entrepreneurs' Association of Nepal (2018). For more detail on the data, see Spielman et al. (2017). The Output variable is total vegetable production measured in rupees. Land is measured as the total area cultivated in square feet. Machinery is the number of hours machinery was used for land preparation, seed and sowing operations, and harvesting. Labor is the sum of hours worked by hired laborers and by household members. Pesticides are measured in milligrams. Fertilizers are the sum of organic and inorganic fertilizers, both measured in kilograms.
Seeds are measured as the sum of hybrid and pollinated seeds in grams.

As environmental variables, we consider Experience, the number of years the farmer has been growing vegetables; Higher Education, the proportion of household members with higher education or a professional degree; and Risk diversification, which is constructed as follows:

$$\text{Risk diversification} = \begin{cases} \dfrac{\sum_{i=1}^{C} s_i^2 - 1/C}{1 - 1/C} & \text{for } C > 1,\\[6pt] 1 & \text{for } C = 1,\end{cases}$$

where $s_i$ is the proportion of land devoted to crop $i$, and $C$ is the total number of crops cultivated by each farmer. This indicator is constructed similarly to a normalized Herfindahl-Hirschman Index and a Simpson Diversity Index, both concentration measures, and ranges from 0 to 1. A Risk diversification Index equal to 1 indicates a farmer who cultivates only one crop and therefore does not diversify risks, whereas lower values of this index indicate more risk diversification.

After removing missing values, we obtain a final sample of 551 observations. Summary statistics of the variables used in the analysis are provided in Appendix A. With these definitions, the model we estimate is

$$Y = X\beta + V - U\exp(Z\delta),$$

where

Y = {log(Output)},
X = {Intercept, log(Land), log(Labor), log(Machinery), log(Fertilizers), log(Pesticides), log(Seeds)},
Z = {Education, Experience, Risk diversification}.

We allow for endogeneity of five inputs (Labor, Machinery, Fertilizers, Pesticides and Seeds) and one environmental variable (Risk diversification). As instruments, we use two dummies for whether the farmer has suffered any natural or human shocks in the two years prior to the survey (Natural Shocks and Human Shocks, respectively); the average years of experience of nearby farmers, as a measure of spillover effects (Peers Experience); three variables measuring the proportion of seeds that are owned by the farmer (Own Supplier), obtained through formal channels such as an input retailer, a private seed company or representative, a government extension service or a research institute (Formal Supplier), or obtained through informal channels such as a family member, a farmers' cooperative, a gift from a nearby farmer, a friend or a farmer from another village, or a landlord (Informal Supplier); a set of variables indicating the proportion of seeds obtained using different means of transportation to reach the market (Foot, Bike, Rickshaw, Motorbike, Tempo, Bus and Car); and interaction terms between the type of seed provider and the means of transportation. We test whether the instruments are weak using the first-stage F-statistics (Stock and Yogo, 2005), and we reject the null hypothesis of weak instruments. Results are given in Appendix A.2.

Table 6 reports results for our empirical example. The first pair of columns shows the estimation results assuming exogeneity. Most of the estimated coefficients for inputs are positive, although small in magnitude and not always significant, with Seeds and Land being the most relevant inputs. The coefficient for Machinery is negative, which seems unreasonable, but it is not significantly different from zero.

Parameter              Exogeneity             Endogeneity
                       Est.       S.E.        Est.       S.E.
β_0                    …          …           …          …
β_Land                 …          …           …          …
β_Labor                …          …           …          …
β_Machinery            -0.0024    0.0086      0.0074     0.0282
β_Fertilizer           …          …           …          …
β_Pesticides           …          …           …          …
β_Seeds                …          …           …          …
δ_Education            -0.0429    1.606       0.3067     0.5767
δ_Experience           …          …           …          …
δ_Risk                 -72.4406   0.0146      1.2888     0.5015
ρ_U,η Labor                                   …          …
ρ_U,η Machinery                               …          …
ρ_U,η Fertilizer                              -0.0007    0.1089
ρ_U,η Pesticides                              -0.3242    0.1920
ρ_U,η Seeds                                   …          …
ρ_U,η Risk                                    -0.4713    0.1404
ρ_V,η Labor                                   -0.3066    0.1026
ρ_V,η Machinery                               -0.0598    0.1040
ρ_V,η Fertilizer                              -0.0617    0.1167
ρ_V,η Pesticides                              …          …
ρ_V,η Seeds                                   -0.3610    0.0780
ρ_V,η Risk                                    …          …
σ_U                    …          …           …          …
σ_V                    …          …           …          …

Table 6. Estimation of the efficiency frontier with and without accounting for endogeneity.

The estimation results controlling for endogeneity are reported in the second pair of columns. We find that the estimated coefficients for the inputs are all positive, except for Pesticides, which have a negative and significant effect on the value of production.
Seeds still have a significant impact on output, along with Labor, whereas Land is no longer significantly different from zero. As is common with instrumental variables, the standard errors in the model controlling for endogeneity are substantially larger than in the model assuming exogeneity.

Regarding the environmental variables, we find that the only significant coefficient is the one of Risk diversification. The estimated coefficient is negative and remarkably large in magnitude in the model assuming exogeneity. However, this coefficient becomes positive when controlling for endogeneity. This means that higher levels of crop concentration (lower risk diversification) increase the level of inefficiency. This result may seem counterintuitive, as one may expect that farmers cultivating fewer crops (i.e., with lower risk diversification) can become more specialized. However, farmers who diversify risks are also less exposed to shocks affecting their production, and our results suggest that they may be more efficient.

When controlling for endogeneity, we have also tested for the absence of correlation between the endogenous variables and the inefficiency term, and for the variance of the inefficiency term being equal to 0. We first test the joint null hypothesis that $\rho_U = 0$, and then the null hypothesis that $\sigma_U$ is equal to 0 in both models. In the model with endogeneity, this is a composite null, as $\sigma_U = 0$ also implies $\rho_U = 0$. Moreover, we are testing for a parameter at the boundary of the parameter space (Lee, 1993; Ketz, 2018); we ignore this issue for simplicity. In both models, the likelihood ratio test rejects the null of $\sigma_U$ being equal to 0. However, the Wald test fails to reject the null in the model with exogeneity.

Figure 3. Estimation of technical inefficiency. Panels: Technical Efficiency (Exogeneity); Technical Efficiency (Endogeneity).

Figure 3 reports the technical efficiency estimates for both models. It is apparent from the distribution of the inefficiency scores that the stochastic frontier model that does not account for endogeneity is unable to capture any skewness in the distribution of the residuals. However, although the variance of the inefficiency term is smaller in the model with endogeneity, the resulting technical efficiency estimates are much richer and suggest that many farmers may be very far from the estimated production frontier.

Conclusions

We propose a closed-form maximum likelihood estimator of a stochastic frontier model when both the production inputs and the environmental variables are correlated with the two-sided stochastic error term and the one-sided stochastic inefficiency term. Our identification and estimation strategy is based on control functions that fully capture the dependence between regressors and unobservables. While the joint density of the two-sided stochastic error term and the control function is easily modeled as a normal distribution, one of the main challenges for direct maximum likelihood estimation is to write the joint density of the stochastic inefficiency term and the control function in closed form. To circumvent this issue, Amsler et al. (2017) use copula functions to model the dependence between the observable and unobservable components of the model, and employ a simulated maximum likelihood procedure to obtain the parameter estimates. This estimator may not be easy to implement and may be computationally slow. Moreover, instrumental variable methods lead to lower precision in the estimates, and simulation-based methods can reduce this precision even further.

In this work, we provide a simple maximum likelihood estimator that aims at avoiding these potential pitfalls. Under appropriate conditional independence restrictions, we show that the conditional distribution of the stochastic inefficiency term given the control functions is a folded normal distribution, which reduces to the half-normal when there is no endogeneity. This makes our model a straightforward extension of the normal-half-normal model to include endogenous regressors. We shed light on new identification issues, and we provide Monte-Carlo evidence on the size and power of standard testing procedures in this context. Our estimator is easy and fast to implement, and enjoys good finite sample properties.

Additional research on the asymptotic properties of the trinity of tests and on testing the distributional assumptions on the error term is needed. Moreover, extensions of our model to panel data with time-varying endogeneity and true fixed effects could be of interest.

References

Aigner, D., Lovell, C. and Schmidt, P. (1977), 'Formulation and estimation of stochastic frontier production function models', Journal of Econometrics (1), 21–37.
Alvarez, A., Amsler, C., Orea, L. and Schmidt, P. (2006), 'Interpreting and Testing the Scaling Property in Models where Inefficiency Depends on Firm Characteristics', Journal of Productivity Analysis (3), 201–212.
Amsler, C., Prokhorov, A. and Schmidt, P. (2016), 'Endogeneity in stochastic frontier models', Journal of Econometrics (2), 280–288.
Amsler, C., Prokhorov, A. and Schmidt, P. (2017), 'Endogenous environmental variables in stochastic frontier models', Journal of Econometrics (2), 131–140.
Andrews, D. W. K. (1999), 'Estimation when a parameter is on a boundary', Econometrica (6), 1341–1383.
Azzalini, A. and Capitanio, A. (1999), 'Statistical applications of the multivariate skew normal distribution', Journal of the Royal Statistical Society: Series B (Statistical Methodology) (3), 579–602.
Azzalini, A. and Valle, A. D. (1996), 'The Multivariate Skew-Normal Distribution', Biometrika (4), 715–726.
Battese, G. E. and Coelli, T. J. (1988), 'Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data', Journal of Econometrics (3), 387–399.
Gouriéroux, C. and Monfort, A. (1997), Simulation-based Econometric Methods, OUP/CORE Lecture Series, Oxford University Press.
Horrace, W. C. (2005), 'Some results on the multivariate truncated normal distribution', Journal of Multivariate Analysis (1), 209–221.
IFPRI and SEAN (2018), 'Nepal Vegetable Seed Study: Household Survey'. URL: https://doi.org/10.7910/DVN/9BRU7N
Imbens, G. W. and Newey, W. K. (2009), 'Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity', Econometrica (5), 1481–1512.
Jondrow, J., Lovell, C. K., Materov, I. S. and Schmidt, P. (1982), 'On the estimation of technical inefficiency in the stochastic frontier production function model', Journal of Econometrics (2), 233–238.
Karakaplan, M. U. and Kutlu, L. (2017), 'Handling Endogeneity in Stochastic Frontier Analysis', Economics Bulletin (2).
Ketz, P. (2018), 'Subvector inference when the true parameter vector may be near or at the boundary', Journal of Econometrics (2), 285–306.
Kumbhakar, S. and Lovell, C. (2003), Stochastic Frontier Analysis, Cambridge University Press.
Kutlu, L. (2010), 'Battese-Coelli estimator with endogenous regressors', Economics Letters (2), 79–81.
Lai, H.-p. and Kumbhakar, S. C. (2018), 'Endogeneity in panel data stochastic frontier model with determinants of persistent and transient inefficiency', Economics Letters, 5–9.
Lee, L.-F. (1993), 'Asymptotic Distribution of the Maximum Likelihood Estimator for a Stochastic Frontier Function Model with a Singular Information Matrix', Econometric Theory (3), 413–430.
Lee, L.-F. and Tyler, W. G. (1978), 'The stochastic frontier production function and average efficiency: An empirical analysis', Journal of Econometrics (3), 385–389.
Leone, F. C., Nelson, L. S. and Nottingham, R. B. (1961), 'The Folded Normal Distribution', Technometrics (4), 543–550.
Mundlak, Y. (1961), 'Empirical Production Function Free of Management Bias', American Journal of Agricultural Economics (1), 44–56.
Newey, W. K. and McFadden, D. (1994), Large sample estimation and hypothesis testing, Vol. 4 of Handbook of Econometrics, Elsevier, pp. 2111–2245.
Newey, W. K., Powell, J. L. and Vella, F. (1999), 'Nonparametric Estimation of Triangular Simultaneous Equations Models', Econometrica (3), 565–603.
Schmidt, P. and Lovell, C. (1980), 'Estimating stochastic production and cost frontiers when technical and allocative inefficiency are correlated', Journal of Econometrics (1), 83–100.
Schmidt, P. and Lovell, C. K. (1979), 'Estimating technical and allocative inefficiency relative to stochastic production and cost frontiers', Journal of Econometrics (3), 343–366.
Schmidt, P. and Sickles, R. C. (1984), 'Production frontiers and panel data', Journal of Business & Economic Statistics (4), 367–374.
Simar, L., Knox Lovell, C. and Vanden Eeckaut, P. (1994), 'Stochastic frontiers incorporating exogenous influences on efficiency', STAT Discussion Papers (9403).
Spielman, D. J., Bhandary, P., Bhandari, A., Shrestha, H., Dhakal, L. and Marahatta, B. (2017), Nepali Vegetable Seed Market Study – Household Analysis, Technical report, International Food Policy Research Institute.
Stock, J. H. and Yogo, M. (2005), Testing for Weak Instruments in Linear IV Regression, Cambridge University Press, pp. 80–108.
Sundberg, R. (1974), 'Maximum likelihood theory for incomplete data from an exponential family', Scandinavian Journal of Statistics (2), 49–58.
Tran, K. C. and Tsionas, E. G. (2013), 'GMM estimation of stochastic frontier model with endogenous regressors', Economics Letters (1), 233–236.
Tran, K. C. and Tsionas, E. G. (2015), 'Endogeneity in stochastic frontier models: Copula approach without external instruments', Economics Letters, 85–88.
Wooldridge, J. M. (2015), 'Control Function Methods in Applied Econometrics', Journal of Human Resources (2), 420–445.

Appendix

A.1. Conditional density of the composite error term.

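In this subsection, we provide the main steps to derive the conditional density of the composite error term, $\varepsilon$, given $\eta$. Recall that

$$
\begin{aligned}
&f_{V\mid\eta}(\varepsilon+u\mid\eta)\,(g(Z,\delta))^{-1} f_{U\mid\eta}\big((g(Z,\delta))^{-1}u\mid\eta\big) = \frac{1}{2\pi\,\tilde\sigma_U(Z)\,\tilde\sigma_V} \\
&\quad\times\Bigg\{\exp\!\left(-\frac{\big(u - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_U^2(Z)} - \frac{\big(\varepsilon+u-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_V^2}\right) \\
&\qquad + \exp\!\left(-\frac{\big(u + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_U^2(Z)} - \frac{\big(\varepsilon+u-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\tilde\sigma_V^2}\right)\Bigg\},
\end{aligned}
$$

where $\tilde\sigma_U^2(Z) = (\sigma_U^2 - \Sigma_{U\eta}'\Sigma_\eta^{-1}\Sigma_{U\eta})\,g^2(Z,\delta)$ and $\tilde\sigma_V^2 = \sigma_V^2 - \Sigma_{V\eta}'\Sigma_\eta^{-1}\Sigma_{V\eta}$. The terms inside the two exponential functions can be treated similarly, so for simplicity we only show the algebra for the first one. We have

$$\frac{\big(u - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} = \frac{1}{\tilde\sigma_U^2(Z)}\Big(u^2 - 2g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,u + \big(g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2\Big),$$

$$\frac{\big(\varepsilon+u-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_V^2} = \frac{1}{\tilde\sigma_V^2}\Big(u^2 + (\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)^2 + 2(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\,u\Big).$$

Taking the sum of these two terms, with $\sigma^2(Z) = \tilde\sigma_U^2(Z) + \tilde\sigma_V^2$, and completing the square in $u$ gives

$$
\begin{aligned}
&\frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\left(u^2 - 2g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,u\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)} + 2(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\,u\,\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)}\right) + \frac{\big(g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} + \frac{(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)^2}{\tilde\sigma_V^2} \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\Big[u + \Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)\Big]^2 - \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)^2 \\
&\qquad + \frac{\big(g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{\tilde\sigma_U^2(Z)} + \frac{(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)^2}{\tilde\sigma_V^2} \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\Big[u + \Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)\Big]^2 + \left(\frac{1}{\tilde\sigma_V^2} - \frac{\tilde\sigma_U^2(Z)}{\tilde\sigma_V^2\sigma^2(Z)}\right)(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)^2 \\
&\qquad + \left(\frac{1}{\tilde\sigma_U^2(Z)} - \frac{\tilde\sigma_V^2}{\tilde\sigma_U^2(Z)\sigma^2(Z)}\right)\big(g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2 + \frac{2}{\sigma^2(Z)}(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\,g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta \\
&= \frac{\sigma^2(Z)}{\tilde\sigma_U^2(Z)\tilde\sigma_V^2}\Big[u + \Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)\Big]^2 + \frac{1}{\sigma^2(Z)}\big(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2.
\end{aligned}
$$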
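Because sign and bookkeeping errors are easy to make when completing the square, a quick numerical check of the last display may be useful. The sketch below is our own and uses only the Python standard library; the names `m_u`, `m_v`, `su2`, `sv2` and the random test ranges are hypothetical choices. It writes $m_U = g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta$ and $m_V = \Sigma_{V\eta}'\Sigma_\eta^{-1}\eta$ as scalars, sets $\sigma^2(Z) = \tilde\sigma_U^2(Z) + \tilde\sigma_V^2$, and compares the sum of the two quadratic terms with the completed-square form at random points.

```python
import random


def lhs(u, eps, m_u, m_v, su2, sv2):
    """Sum of the two quadratic terms before completing the square."""
    return (u - m_u) ** 2 / su2 + (eps + u - m_v) ** 2 / sv2


def rhs(u, eps, m_u, m_v, su2, sv2):
    """Completed-square form: a quadratic in u plus a term free of u."""
    s2 = su2 + sv2                      # sigma^2(Z)
    e = eps - m_v                       # epsilon - Sigma_{V eta}' Sigma_eta^{-1} eta
    shift = e * su2 / s2 - m_u * sv2 / s2
    return s2 / (su2 * sv2) * (u + shift) ** 2 + (e + m_u) ** 2 / s2


rng = random.Random(0)
max_rel_err = 0.0
for _ in range(1000):
    args = (rng.uniform(0.0, 5.0),      # u >= 0
            rng.gauss(0.0, 2.0),        # epsilon
            rng.gauss(0.0, 1.0),        # m_U
            rng.gauss(0.0, 1.0),        # m_V
            rng.uniform(0.1, 3.0),      # tilde sigma_U^2(Z)
            rng.uniform(0.1, 3.0))      # tilde sigma_V^2
    a, b = lhs(*args), rhs(*args)
    max_rel_err = max(max_rel_err, abs(a - b) / max(1.0, abs(a)))
```

Replacing `m_u` with `-m_u` in both functions checks the second exponential term in the same way.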
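Then, treating the remaining term similarly, we can write

$$
\begin{aligned}
&f_{V\mid\eta}(\varepsilon+u\mid\eta)\,(g(Z,\delta))^{-1} f_{U\mid\eta}\big((g(Z,\delta))^{-1}u\mid\eta\big) = \frac{1}{2\pi\,\tilde\sigma_U(Z)\,\tilde\sigma_V}\,\frac{\sigma(Z)}{\sigma(Z)} \\
&\quad\times\Bigg\{\exp\!\left(-\frac{\sigma^2(Z)\Big[u + \Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)\Big]^2}{2\,\tilde\sigma_U^2(Z)\,\tilde\sigma_V^2}\right)\exp\!\left(-\frac{\big(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\sigma^2(Z)}\right) \\
&\qquad + \exp\!\left(-\frac{\sigma^2(Z)\Big[u + \Big((\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta)\frac{\tilde\sigma_U^2(Z)}{\sigma^2(Z)} + g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\,\frac{\tilde\sigma_V^2}{\sigma^2(Z)}\Big)\Big]^2}{2\,\tilde\sigma_U^2(Z)\,\tilde\sigma_V^2}\right)\exp\!\left(-\frac{\big(\varepsilon-\Sigma_{V\eta}'\Sigma_\eta^{-1}\eta - g(Z,\delta)\Sigma_{U\eta}'\Sigma_\eta^{-1}\eta\big)^2}{2\sigma^2(Z)}\right)\Bigg\}.
\end{aligned}
$$

After integrating this final expression with respect to $u$ over the support of $U$, that is, between 0 and $\infty$, we obtain the final result.

A.2. Additional material for empirical application.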

In this section, we provide some additional information about the empirical application. Table 7 contains descriptive statistics for the main variables used in the analysis, grouped by category for the reader's convenience. Table 8 contains the F-statistics from the first-stage linear regressions of the endogenous variables on the included exogenous variables and the instruments. The null hypothesis tested is that the instruments are irrelevant, that is, all their coefficients are simultaneously equal to 0. All F-statistics are above 10, the threshold suggested by Stock and Yogo (2005) below which the instruments should be considered weak.

Variable              Mean           St.Dev.         Min          Max
Output                986391.007     7413003.065     2466.286     117761500.000
Inputs
Land                  27521.032      32288.538       729.000      273800.000
Labor                 520.127        4182.825        1.000        92881.000
Machinery             2.426          7.222           0.000        70.000
Fertilizers           44539.860      433215.499      0.000        7500000.000
Pesticides            85.309         226.033         0.000        3250.000
Seeds                 279.404        384.670         0.002        3500.000
Environmental variables
Education             0.065          0.138           0.000        0.800
Experience            24.309         16.496          1.000        100.000
Risk Div              0.393          0.169           0.093        1.000
Instruments
Natural Shock         0.430          0.496           0.000        1.000
Human Shock           0.022          0.146           0.000        1.000
Own Supplier          0.052          0.105           0.000        1.000
Formal Supplier       0.251          0.190           0.000        1.000
Informal Supplier     0.011          0.044           0.000        0.500
Peers Experience      24.739         13.366          10.000       44.000
Foot                  0.653          0.425           0.000        1.000
Bike                  0.161          0.340           0.000        1.000
Rickshaw              0.003          0.047           0.000        1.000
Motorbike             0.020          0.130           0.000        1.000
Tempo                 0.007          0.069           0.000        1.000
Bus                   0.115          0.287           0.000        1.000
Car                   0.004          0.033           0.000        0.500

Table 7. Descriptive Statistics

Variable              F-Statistic
Labor                 27.804
Machinery             56.339
Fertilizers           26.760
Pesticides            20.298
Seeds                 11.223
Risk                  98.596

Table 8. F-Statistics from linear first stage regressions

(S. Centorrino, Corresponding author)

Economics Department, State University of New York at Stony Brook, USA.

(M. Pérez-Urdiales)

Economics Department, State University of New York at Stony Brook, USA.