Nonparametric Tests of Tail Behavior in Stochastic Frontier Models
William C. Horrace ∗ Yulong Wang † June 16, 2020
Abstract
This article studies tail behavior for the error components in the stochastic frontier model, where one component has bounded support on one side, and the other has unbounded support on both sides. Under weak assumptions on the error components, we derive nonparametric tests that the unbounded component distribution has thin tails and that the component tails are equivalent. The tests are useful diagnostic tools for stochastic frontier analysis. A simulation study and an application to a stochastic cost frontier for 6,100 US banks from 1998 to 2005 are provided. The new tests reject the normal or Laplace distributional assumptions, which are commonly imposed in the existing literature.
Keywords:
Hypothesis tests, Production, Inefficiency, Extreme value theory.
JEL Codes:
C12, C21, D24.
∗ Corresponding author: Department of Economics and Center for Policy Research, Syracuse University, Syracuse, NY, 13244, [email protected].
† Department of Economics and Center for Policy Research, Syracuse University, Syracuse, NY, 13244.
1 Introduction
Stochastic frontier analysis (SFA) has a vast literature, both methodological and applied, and empiricists have applied the methods to myriad industries, most notably agriculture, banking, education, healthcare, and energy. A common practice in SFA is to impose parametric assumptions on the error components, but the set of statistical tools to investigate the validity of these assumptions is still limited. This paper expands this set of tools by drawing on recently developed techniques in Extreme Value (EV) theory and by developing new diagnostic tests.
In particular, the parametric stochastic frontier model for cross-sectional data (Aigner et al. 1977) is a leading case of the error component regression model but with the unique feature that one error component ($U$) is a non-negative random variable (e.g., half-normal, exponential), while the other ($W$) is a random variable of unbounded support (e.g., normal, Laplace, Student-t). A common assumption in the stochastic frontier literature is that $W$ is drawn from a normal or Laplace distribution (both thin-tailed distributions). See Aigner et al. (1977) or Horrace and Parmeter (2018), respectively. However, heavy-tailed distributions are now also being considered. For example, the findings of Wheat et al. (2019) suggest that a cost inefficiency model of highway maintenance costs in England has Student-t errors. These parametric distributions, such as normal and Student-t, display similar patterns in the middle of their supports but exhibit substantially different tail behaviors. This observation motivates and plays an essential role in our diagnostic tests, which we believe are a timely and appropriate contribution to the literature.
Footnote: For other parametric specifications of the model see Li (1996), Carree (2002), Tsionas (2007), Kumbhakar et al. (2013), and Almanidis et al. (2014).
Footnote: There are semi-parametric estimators of the model that relax the distributional assumptions on one component and estimate the density of the other using kernel deconvolution techniques. See Kneip et al. (2015), Horrace and Parmeter (2011), Cai et al. (2020), Simar et al. (2017), Hall and Simar (2002), and Florens et al. (2020).
Our tests rest on the observation that the largest order statistics of the composed error ($Z = W - U$) (approximately) arise from the tails of $W$, because $U$ is one-sided. Also, assuming that $W$ is in the domain of attraction (DOA) of extreme value distributions, the asymptotic distribution of the largest order statistics of $W$ is the EV distribution, which may be fully characterized (after location and scale normalization) by a single parameter that captures its tail heaviness. Then, likelihood ratio statistics for hypotheses on this single parameter can be derived based on the limiting EV distribution.
To be specific, consider the right tail of $W$. If the DOA assumption is satisfied, then tail behavior may be entirely characterized by a tail index, $\xi \in \mathbb{R}$. If $\xi = 0$, then $W$ has thin tails. If $\xi > 0$, then $W$ has thick tails. Otherwise, $W$ has bounded support. Under very weak assumptions on the error components, we derive a test that the tails of $W$ are thin ($\xi = 0$). We prove that this test is valid whether $Z$ is observed or appended to a regression model (as it is in the stochastic frontier model). If we assume that $U$ is also in the DOA of extreme value distributions and that $W$ is symmetric (a common assumption), we also derive a test that the (right) tail of $U$ is thinner than the left tail of $W$. If we further assume that $W$ is a member of the normal family, then we may test the hypothesis that the tails of $U$ and $W$ are both thin. Therefore, our nonparametric tests are useful diagnostic tools to help empiricists make parametric choices on the distributions of both $U$ and $W$. This is particularly important for the stochastic frontier model for cross-sectional data, where distributional assumptions on the components are typically necessary for the identification of the model's parameters.
The paper is organized as follows. The next section presents the tests. Section 3 provides a simulation study of the power and size of the tests. Section 4 applies the tests to a stochastic cost frontier for US banks and finds that the tails of $W$ are not thin. Therefore, a normal or Laplace assumption for $W$ is not justified, and perhaps a Student-t assumption may be appropriate. Section 5 concludes.
Footnote: The assumption that $W$ is in the DOA of extreme value distributions is not restrictive, as we shall see.
To fix ideas, we begin with a review of the DOA assumption and present the test in the case where $Z$ is directly observed in Section 2.1. Then in Section 2.2, we move to the case where $Z$ is appended to a regression model and has to be estimated, which covers the linear regression stochastic frontier model. Additional tests under different sets of weak assumptions are also presented.
Consider a random sample of $Z_i = W_i - U_i$ for $i = 1, \ldots, n$, where $U_i \ge 0$ represents inefficiency, and $W_i \in \mathbb{R}$ is noise with unbounded support.
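To make the sampling setup concrete, the following is a minimal Python sketch that draws a composed error $Z_i = W_i - U_i$. The specific choices here (Student-t noise with 2 degrees of freedom, half-standard-normal inefficiency, and the function name `draw_composed_errors`) are illustrative assumptions, not part of the paper's procedure.

```python
import math
import random


def draw_composed_errors(n, nu=2.0, seed=0):
    """Draw (Z, W, U) with W ~ Student-t(nu) and U ~ half-standard normal.

    Hypothetical helper for illustration only; the distributions are
    assumptions chosen to mimic a heavy-tailed noise term.
    """
    rng = random.Random(seed)
    z, w, u = [], [], []
    for _ in range(n):
        # Student-t(nu): standard normal divided by sqrt(chi-square(nu) / nu).
        # A chi-square(nu) draw is a Gamma draw with shape nu/2 and scale 2.
        chi2 = rng.gammavariate(nu / 2.0, 2.0)
        w_i = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)
        u_i = abs(rng.gauss(0.0, 1.0))  # inefficiency, U_i >= 0
        w.append(w_i)
        u.append(u_i)
        z.append(w_i - u_i)  # composed error Z_i = W_i - U_i
    return z, w, u


z, w, u = draw_composed_errors(500)
# Because U_i >= 0, every Z_i lies weakly below its W_i,
# so the sample maximum of Z cannot exceed the sample maximum of W.
print(max(z) <= max(w))
```

Since $U_i \ge 0$ shifts every observation weakly to the left, the right tail of $Z$ can only be shaped by the right tail of $W$, which is the intuition the tests build on.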
We start with testing the shape of the right tail of $W_i$ in a nonparametric way. The key assumption is that the distribution of $W_i$ is within the domain of attraction of EV distributions. In particular, a cumulative distribution function $F$ is in the domain of attraction of $G_\xi$, denoted as $F \in \mathcal{D}(G_\xi)$, if there exist constants $a_n > 0$ and $b_n$ such that
\[ \lim_{n \to \infty} F^n(a_n v + b_n) = G_\xi(v), \]
where $G_\xi$ is the generalized EV distribution,
\[ G_\xi(v) = \begin{cases} \exp[-(1 + \xi v)^{-1/\xi}], & 1 + \xi v \ge 0, \text{ for } \xi \ne 0 \\ \exp[-e^{-v}], & v \in \mathbb{R}, \ \xi = 0 \end{cases} \tag{1} \]
and $\xi$ is the tail index, measuring the decay rate of the tail.
Footnote: While the analyses that follow are for cross-sectional data, they can easily be applied to panel data, as long as one is willing to assume independence in both the time and cross-sectional dimensions.
The domain of attraction condition is satisfied by a large range of commonly used distributions. If $\xi$ is positive, this condition is equivalent to regular variation at infinity, i.e.,
\[ \lim_{t \to \infty} \frac{1 - F(tv)}{1 - F(t)} = v^{-1/\xi} \quad \text{for } v > 0. \tag{2} \]
This covers Pareto, Student-t, and F distributions, for example. The case with $\xi = 0$ covers the normal family, and the case with $\xi < 0$ corresponds to distributions with bounded support. See de Haan and Ferreira (2007), Ch.1 for a complete review.
Footnote: The tail index of the Student-t distribution with $\nu$ degrees of freedom is $\xi = 1/\nu$.
Footnote: The uniform distribution has $\xi = -1$, and the triangular distribution has $\xi = -1/2$.
Note that the above notation is for the right tail of $W$, which can be easily adapted to the left tail by considering $-W$. For expositional simplicity, we denote $\xi_{W-}$ and $\xi_{W+}$ as the tail indices for the left and right tails of $W$, respectively. The same notation applies to other variables (e.g., $U$ and $Z$) introduced later.
Returning to SFA, a common assumption is that $W_i$ is normal or Laplace, which implies that $\xi_{W+} = 0$. So our hypothesis testing problem is as follows:
\[ H_0 : \xi_{W+} = 0 \quad \text{against} \quad H_1 : \xi_{W+} > 0. \tag{3} \]
If the null hypothesis is rejected, we would then argue that some heavy-tailed distribution should be used to model the noise and maybe the inefficiency as well.
To obtain a feasible test, we argue that, since $U_i$ is bounded from below at zero, the largest order statistics of $Z_i$ approximately stem from the right tail of $W_i$. This is formalized in Proposition 1, which requires the following conditions. Let $Z_{n:n} \ge Z_{n:n-1} \ge \ldots \ge Z_{n:1}$ be the order statistics of $\{Z_i\}_{i=1}^n$ sorted in descending order. Denote $\mathbf{Z}_+ = (Z_{n:n}, \ldots, Z_{n:n-k+1})^{\top}$ as the $k$ largest observations. From now on, we use bold letters to denote vectors. Denote $F_W$ and $Q_W(p) = \inf\{y \in \mathbb{R} : p \le F_W(y)\}$ as the CDF and the quantile function of $W_i$, respectively. Write $Q_W(1)$ as the right end-point of the support of $W_i$. For a generic column vector $\mathbf{X}$ and scalar $c$, the notation $\mathbf{X} - c$ means $\mathbf{X} - (c, \ldots, c)^{\top}$.
Assumption 1
(i) $(U_i, W_i)^{\top}$ is i.i.d.
(ii) $U_i$ and $W_i$ are independent.
(iii) $U_i \ge 0$ with $E[|U_i|] < \infty$ and $W_i \in \mathbb{R}$ with $Q_W(1) = \infty$.
(iv) $F_W \in \mathcal{D}(G_{\xi_{W+}})$ with $\xi_{W+} \ge 0$. In addition, $F_W(\cdot)$ is twice continuously differentiable with bounded derivatives, and the density $f_W(\cdot)$ satisfies that $\partial f_W(t)/\partial t \nearrow 0$ as $t \to \infty$ on $[c, \infty)$ for some constant $c$.
Assumptions 1(i)-(iii) are common in the SFA literature (see Horrace and Parmeter (2018) and the references therein). Assumption 1(iv) requires the tail of $F_W$ to be within the domain of attraction of EV distributions with an infinite upper bound. Moreover, it requires that the density derivative monotonically increases to zero. This is a mild assumption and is satisfied by many commonly used distributions. For example, the normal distribution is covered as seen by
\[ \frac{\partial f_W(t)}{\partial t} \propto -t \exp(-t^2/2) \nearrow 0 \quad \text{as } t \to \infty, \]
and the Pareto distribution is covered as seen by
\[ \frac{\partial f_W(t)}{\partial t} \propto (-\alpha - 1)\, t^{-\alpha - 2} \nearrow 0 \quad \text{as } t \to \infty \text{ for some } \alpha > 0. \]
Under Assumption 1, the following proposition derives the asymptotic distribution of $\mathbf{Z}_+$.
Proposition 1
Suppose Assumption 1 holds. Then, there exist sequences of constants $a_n$ and $b_n$ such that for any fixed $k$
\[ \frac{\mathbf{Z}_+ - b_n}{a_n} \xrightarrow{d} \mathbf{V}_+ = (V_1, \ldots, V_k)^{\top}, \]
where the joint density of $\mathbf{V}_+$ is given by $f_{\mathbf{V}_+ | \xi_{W+}}(v_1, \ldots, v_k) = G_{\xi_{W+}}(v_k) \prod_{i=1}^k g_{\xi_{W+}}(v_i) / G_{\xi_{W+}}(v_i)$ on $v_k \le v_{k-1} \le \ldots \le v_1$, and $g_{\xi_{W+}}(v) = dG_{\xi_{W+}}(v)/dv$.
The proof is in Appendix A. This proposition implies that the distributions of $Z_i$ and $W_i$ share the same (right) tail shape, which is entirely characterized by the tail index $\xi_{W+}$. Such tail equivalence does not hold, however, for the left tails due to the existence of $U$. This is studied in Section 2.3 under the additional assumption that $W$ is symmetric.
If the constants $a_n$ and $b_n$ were known, $(\mathbf{Z}_+ - b_n)/a_n$ would be approximately distributed as $\mathbf{V}_+$, and the limiting problem would reduce to a well-defined finite sample problem: constructing some inference method based on one draw of $\mathbf{V}_+$ whose density $f_{\mathbf{V}_+ | \xi_{W+}}$ is known up to $\xi_{W+}$. However, $a_n$ and $b_n$ depend on $F_W$ and hence are unknown a priori.
To avoid the need for knowledge of $a_n$ and $b_n$, we consider the following self-normalized statistic
\[ \mathbf{Z}^*_+ = \frac{\mathbf{Z}_+ - Z_{n:n-k+1}}{Z_{n:n} - Z_{n:n-k+1}} = \left( 1, \frac{Z_{n:n-1} - Z_{n:n-k+1}}{Z_{n:n} - Z_{n:n-k+1}}, \ldots, \frac{Z_{n:n-k+2} - Z_{n:n-k+1}}{Z_{n:n} - Z_{n:n-k+1}}, 0 \right)^{\top}. \tag{4} \]
It is easy to establish that $\mathbf{Z}^*_+$ is maximally invariant with respect to the group of location and scale transformations (cf., Lehmann and Romano (2005), Ch.6). In words, any statistic constructed as a function of $\mathbf{Z}^*_+$ remains unchanged if data are shifted and multiplied by any positive constant. This makes sense since the tail shape should be preserved no matter how data are linearly transformed. This invariance property allows us to construct nonparametric tests for a stochastic frontier model that is otherwise not identified without parametric assumptions on $U$ and $W$.
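The self-normalization in (4) and its invariance to location and scale can be sketched in a few lines of Python. This is a minimal illustration under our own naming (`self_normalize_top_k` is a hypothetical helper, not the authors' supplemental code), and the data are made up.

```python
def self_normalize_top_k(z, k):
    """Return Z*_+ from equation (4): the k largest values of z, shifted by
    the k-th largest and scaled by (largest - k-th largest)."""
    if not 2 <= k <= len(z):
        raise ValueError("k must satisfy 2 <= k <= len(z)")
    top = sorted(z, reverse=True)[:k]  # Z_{n:n} >= ... >= Z_{n:n-k+1}
    largest, kth = top[0], top[-1]
    if largest == kth:
        raise ValueError("the k largest observations are all equal")
    return [(v - kth) / (largest - kth) for v in top]


z = [3.0, 1.0, 4.0, 1.5, 9.0, 2.5, 6.0]  # illustrative data
z_star = self_normalize_top_k(z, k=4)

# Shift by 5 and scale by 2 (a positive constant): Z*_+ is unchanged.
z_star_transformed = self_normalize_top_k([2.0 * v + 5.0 for v in z], k=4)
print(z_star)
print(z_star_transformed)
```

The first and last entries are always 1 and 0 by construction, so only the $k - 2$ interior entries carry information about the tail shape.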
As such, our tests do not reveal anything about the location or the scale of the error components.
Footnote: In particular the non-zero expectation of $U$ precludes identification of the unknown parameter $\delta$ in the model $Z_i = \delta + W_i - U_i$.
The continuous mapping theorem and Proposition 1 imply that for any fixed $k$, as $n \to \infty$,
\[ \mathbf{Z}^*_+ \xrightarrow{d} \mathbf{V}^*_+ \equiv \frac{\mathbf{V}_+ - V_k}{V_1 - V_k}. \]
The density of $\mathbf{V}^*_+$ can be calculated via change of variables as
\[ f_{\mathbf{V}^*_+ | \xi_{W+}}(\mathbf{v}^*_+) = \Gamma(k) \int_0^{b(\xi_{W+})} t^{k-2} \exp\left( -(1 + 1/\xi_{W+}) \sum_{i=1}^k \log(1 + \xi_{W+} v^*_i t) \right) dt, \tag{5} \]
where $\mathbf{v}^*_+ = (v^*_1, \ldots, v^*_k)$, $b(\xi) = \infty$ if $\xi \ge 0$ and $-1/\xi$ otherwise, and $\Gamma(k)$ is the gamma function. Note that the invariance restriction costs two degrees of freedom since the first and last elements of $\mathbf{V}^*_+$ are always 1 and 0, respectively. We calculate this density by numerical quadrature.
Given $f_{\mathbf{V}^*_+ | \xi_{W+}}$, we can construct the generalized likelihood-ratio test for problem (3). Since the alternative hypothesis is composite, we follow Andrews and Ploberger (1994) and Elliott et al. (2015) to consider the weighted average alternative
\[ \int f_{\mathbf{V}^*_+ | \xi_{W+}}(\cdot)\, w(\xi_{W+})\, d\xi_{W+}, \]
where $w(\cdot)$ is a weighting function that reflects the importance of rejecting different alternative values. Then our test is constructed as
\[ \varphi(\mathbf{v}^*_+) = \mathbf{1}\left[ \frac{\int f_{\mathbf{V}^*_+ | \xi_{W+}}(\mathbf{v}^*_+)\, w(\xi_{W+})\, d\xi_{W+}}{f_{\mathbf{V}^*_+ | 0}(\mathbf{v}^*_+)} > \mathrm{cv}(k, \alpha) \right], \tag{6} \]
where the critical value $\mathrm{cv}(k, \alpha)$ depends on $k$ and the level of significance $\alpha$. We can obtain it by simulation. By Proposition 1 and the continuous mapping theorem, this test controls size asymptotically as $\lim_{n \to \infty} E[\varphi(\mathbf{Z}^*_+)] = \alpha$.
We end this subsection by briefly discussing the choice of $k$, that is, the number of the largest order statistics used to approximate the EV distribution. On the one hand, larger $k$ means including more mid-sample observations, which induces a larger finite sample bias in the EV approximation.
On the other hand, smaller $k$ provides a better asymptotic approximation but uses less sample information, leading to a test with lower power. This trade-off makes it difficult to justify an optimal $k$ theoretically in the standard EV theory literature (cf., Müller and Wang (2017)). It is even more difficult, if at all possible, in our case, since we only observe $Z$, and not $W$. Nonetheless, our asymptotic arguments show that the test (6) controls size for any fixed $k$, as long as $n$ is sufficiently large. Figure 1 depicts the asymptotic power of the test (6), computed with $\mathbf{V}^*_+$ generated from the density (5) based on 10,000 simulation draws. The test controls size for all values of $k$ by construction and has reasonably large power when $k$ exceeds 20.
Footnote: In later sections, we set $w(\cdot)$ to be the standard uniform distribution over $(0, 1)$ for simplicity.
With ideas fixed, we now turn to the regression version of the test, with application to SFA.
Now consider the linear regression
\[ Y_i = X_i^{\top} \beta + Z_i, \]
where $Z_i = -U_i + W_i$ is as in the previous section, and $\beta$ is some pseudo-true parameter in some compact parameter space. This could be a Cobb-Douglas production function (in logarithms), where $Y$ is productive output and $U$ is now called technical inefficiency, which measures distance ($U_i$) from a stochastic frontier ($X_i^{\top} \beta + W_i$). The slopes ($\beta$) are marginal products of the productive inputs, $X_i$. It could also be a stochastic cost function if we multiply $U$ by $-1$. Suppose we have some estimator, $\hat{\beta}$, of $\beta$. The following assumption is imposed to construct our diagnostic test.
Assumption 2
(i) $(X_i, U_i, W_i)^{\top}$ is i.i.d.
(ii) $U_i$ and $W_i$ are independent.
(iii) $U_i \ge 0$ with $E[|U_i|] < \infty$ and $W_i \in \mathbb{R}$ with $Q_W(1) = \infty$.
(iv) $F_W \in \mathcal{D}(G_{\xi_{W+}})$ with $\xi_{W+} \ge 0$. In addition, $F_W(\cdot)$ is twice continuously differentiable with bounded derivatives, and the density $f_W(\cdot)$ satisfies that $\partial f_W(t)/\partial t \nearrow 0$ as $t \to \infty$ on $[c, \infty)$ for some constant $c$.
(v) $\|\hat{\beta} - \beta\| \sup_i \|X_i\| = o_p(n^{\xi_{W+}})$, if $\xi_{W+} > 0$, and $\|\hat{\beta} - \beta\| \sup_i \|X_i\| / f_W(Q_W(1 - 1/n)) = o_p(1)$, otherwise.
Assumption 2 is similar to Assumption 1 with additional restrictions on the covariate $X$. In particular, Assumption 2(v) bounds the norms of $\hat{\beta} - \beta$ and $X_i$. A sufficient condition when $\xi_{W+}$ is positive is that $\|\hat{\beta} - \beta\| = O_p(n^{-1/2})$ and $\sup_i \|X_i\| = o_p(n^{1/2})$, which is easily satisfied in many applications. When $\xi_{W+}$ is zero, we need slightly stronger bounds. Straightforward calculations show that the normal distribution satisfies Assumption 2(v) for the $\xi_{W+} = 0$ case, if $\|\hat{\beta} - \beta\| = O_p(n^{-1/2})$ and $\sup_i \|X_i\| = O_p(n^{1/2 - \varepsilon})$ for some $\varepsilon > 0$, given that $1/f_W(Q_W(1 - 1/n)) \le O(\log(n))$ (cf. Example 1.1.7 in de Haan and Ferreira (2007)).
Denote $\hat{Z}_i$ as the OLS residuals and
\[ \hat{\mathbf{Z}}_+ = \left( \hat{Z}_{n:n}, \ldots, \hat{Z}_{n:n-k+1} \right)^{\top} \]
the largest $k$ order statistics. Then given Assumption 2, the following proposition derives the asymptotic distribution of $\hat{\mathbf{Z}}_+$.
Proposition 2
Suppose Assumption 2 holds. Then, there exist sequences of constants $a_n$ and $b_n$ such that for any fixed $k$
\[ \frac{\hat{\mathbf{Z}}_+ - b_n}{a_n} \xrightarrow{d} \mathbf{V}_+, \]
where the joint density of $\mathbf{V}_+$ is the same as in Proposition 1.
Footnote: Even though $E[|U_i|] \ne 0$, ordinary least squares (OLS) will typically suffice for $\hat{\beta}$, because our test is invariant to relocation.
Proposition 2 shows that the largest order statistics of the residuals share the right tail shape of $W$ asymptotically. This validates the construction of the test (6) by replacing $\mathbf{Z}^*_+$ with $\hat{\mathbf{Z}}^*_+$, where
\[ \hat{\mathbf{Z}}^*_+ = \frac{\hat{\mathbf{Z}}_+ - \hat{Z}_{n:n-k+1}}{\hat{Z}_{n:n} - \hat{Z}_{n:n-k+1}}. \]
By Proposition 2 and the continuous mapping theorem, we similarly have $\lim_{n \to \infty} E[\varphi(\hat{\mathbf{Z}}^*_+)] = \alpha$.
The previous analysis studies the right tail of $W$ (and equivalently $Z$). If we assume that $W$ has a symmetric distribution, then the tail indices of both tails of $W$ become equivalent, and hence we can learn about the tail of $U$ using the left tail index of $Z$. To this end, we make the following additional assumption.
Assumption 3
(i) $W_i$ is symmetric at zero.
(ii) $F_U \in \mathcal{D}(G_{\xi_{U+}})$ with $\xi_{U+} \ge 0$.
Assumption 3(i) implies $\xi_{W-} = \xi_{W+}$, and the condition that $U \ge 0$ means that only the right tail of $U$ is relevant. We therefore write $\xi_U$ and $\xi_W$ as the right tail indices of $U$ and $W$, respectively. Now we can test if $U$ has a thinner or equal right tail than $W$ by specifying the following hypothesis testing problem,
\[ H_0 : \xi_U \le \xi_W \quad \text{against} \quad H_1 : \xi_U > \xi_W. \tag{7} \]
Moreover, if $W$ is in the normal or Laplace family ($\xi_W = 0$), since we limit the tail indices to be non-negative, the null hypothesis then reduces to $\xi_U = \xi_W = 0$.
Under the null hypothesis of (7), $W$ is the leading term in $Z$ in both the left and right tails. Then the DOA assumption for both $W$ and $U$ implies that $\xi_{Z-} = \max\{\xi_U, \xi_W\}$, and Proposition 2 entails $\xi_{Z+} = \xi_W$. Therefore, the above testing problem becomes equivalent to
\[ H_0 : \xi_{Z-} = \xi_{Z+} \quad \text{against} \quad H_1 : \xi_{Z-} > \xi_{Z+}. \tag{8} \]
We now construct a test for (8). Define $\hat{\mathbf{Z}}_-$ as the smallest $k$ order statistics of the estimation residuals, that is,
\[ \hat{\mathbf{Z}}_- = \left( \hat{Z}_{n:1}, \hat{Z}_{n:2}, \ldots, \hat{Z}_{n:k} \right)^{\top} \]
and its self-normalized analogue as
\[ \hat{\mathbf{Z}}^*_- = \frac{\hat{\mathbf{Z}}_- - \hat{Z}_{n:k}}{\hat{Z}_{n:1} - \hat{Z}_{n:k}}. \]
The following proposition establishes that $\hat{\mathbf{Z}}^*_-$ asymptotically has the EV distribution with tail index $\xi_{Z-}$ and is independent from $\hat{\mathbf{Z}}^*_+$.
Proposition 3
Suppose Assumptions 2 and 3 hold. Then, for any fixed $k$,
\[ \begin{pmatrix} \hat{\mathbf{Z}}^*_- \\ \hat{\mathbf{Z}}^*_+ \end{pmatrix} \xrightarrow{d} \begin{pmatrix} \mathbf{V}^*_- \\ \mathbf{V}^*_+ \end{pmatrix} \]
as $n \to \infty$, where $\mathbf{V}^*_-$ and $\mathbf{V}^*_+$ are independent and both EV distributed with density (5) and tail indices $\xi_{Z-}$ and $\xi_{Z+}$, respectively.
The proof is in Appendix A. Given the above proposition, we aim to construct a generalized likelihood ratio test for (8) as follows,
\[ \varphi_{\pm}(\mathbf{v}^*_-, \mathbf{v}^*_+) = \mathbf{1}\left[ \frac{\int_{\{(\xi_-, \xi_+) \in \Xi \times \Xi \,:\, \xi_+ < \xi_-\}} f_{\mathbf{V}^*_- | \xi_-}(\mathbf{v}^*_-)\, f_{\mathbf{V}^*_+ | \xi_+}(\mathbf{v}^*_+)\, w(\xi_-, \xi_+)\, d\xi_-\, d\xi_+}{\int_{\Xi} f_{\mathbf{V}^*_- | \xi}(\mathbf{v}^*_-)\, f_{\mathbf{V}^*_+ | \xi}(\mathbf{v}^*_+)\, d\Lambda(\xi)} > \mathrm{cv}(k, \alpha) \right], \tag{9} \]
where $\Xi$ denotes the parameter space of the tail indices, and $w(\cdot, \cdot)$ is the weighting function for the alternative hypothesis as in (6). We set $\Xi$ to be $[0, 1)$ to cover all distributions with a finite mean and $w(\cdot, \cdot)$ to be uniform over the alternative space. The weight $\Lambda(\cdot)$ can be considered as the least favorable distribution, which we discuss next.
Note that the null hypothesis of (8) is composite. We need to control size uniformly over all $\xi_{Z-} = \xi_{Z+} \in \Xi$. To that end, we can transform the composite null into a simple one by considering the weighted average density with respect to the weight $\Lambda$. Together with a suitably chosen critical value, the test (9) maintains uniform size control. Now the problem reduces to determining an appropriate weight $\Lambda$. Elliott et al. (2015) study the generic hypothesis testing problem where a nuisance parameter exists in the null hypothesis. We tailor their argument for our test (9) and adopt their computational algorithm for implementation. In particular, $\Lambda(\cdot)$ and $\mathrm{cv}(k, \alpha)$ are numerically calculated only once by the authors, not by the empiricists who use our test. Empiricists only need to construct the order statistics $\hat{\mathbf{Z}}^*_-$ and $\hat{\mathbf{Z}}^*_+$ and numerically evaluate the density. We provide more computational details in the Appendix and the corresponding MATLAB code in the supplemental materials. By the continuous mapping theorem and Proposition 3, for any fixed $k$, $\limsup_{n \to \infty} E\left[ \varphi_{\pm}\left( \hat{\mathbf{Z}}^*_-, \hat{\mathbf{Z}}^*_+ \right) \right] \le \alpha$ under the null hypothesis of (8).
As we discussed above, the hypothesis testing problem (8) simplifies to
\[ H_0 : \xi_{Z-} = \xi_{Z+} = 0 \quad \text{against} \quad H_1 : \xi_{Z-} > \xi_{Z+} = 0, \]
if $W$ is assumed to be in the normal family ($\xi_W = 0$). Proposition 3 implies $\hat{\mathbf{Z}}^*_-$ and $\hat{\mathbf{Z}}^*_+$ are asymptotically independent and both of them are EV distributed. Then accordingly, our test (9) reduces to
\[ \mathbf{1}\left[ \frac{\int_{\Xi} f_{\mathbf{V}^*_- | \xi_-}(\mathbf{v}^*_-)\, w(\xi_-)\, d\xi_-}{f_{\mathbf{V}^*_- | 0}(\mathbf{v}^*_-)} > \mathrm{cv}(k, \alpha) \right], \]
which is identical to (6). This suggests that we can simply substitute $\hat{\mathbf{Z}}^*_-$ into (6) for implementation.
We set $w(\cdot)$ to be the uniform weight on $[0, 0.99]$ to include all distributions with a finite mean and the level of significance to be 0.05. In Table 1, we report the small sample rejection probabilities of the test (6). We generate $U_i$ from the right half-standard normal and the right half-Laplace(0,1) distributions and $W_i$ from five distributions: standard normal, Laplace(0,1) (denoted La(0,1)), Student-t(2), Pareto(0.5), and F(4,4). The normal and Laplace distributions correspond to the null hypothesis, and the other three are alternative hypotheses. The results suggest that the test (6) performs very well in both size and power. Note that when $k = 50$ and $n = 100$, we essentially include too many mid-sample observations so that the EV approximation is poor.
Now we consider the linear regression model $Y_i = X_i^{\top} \beta + Z_i$ with $X_i = (1, x_i)^{\top}$ and $\beta = (1, \beta_2)^{\top}$. We assume $x_i \sim N(0, 1)$ and independent from $Z_i$. Table 2 reports the rejection probabilities of our test (6). Findings are similar to those in Table 1.
3.2 Hypothesis testing about inefficiency U and noise W
Consider the hypothesis testing problem (8). We implement the test (9) with the same setup as above. Table 3 reports the rejection probabilities under the null and alternative hypotheses. We make the following observations. First, the test controls size well unless $k$ is too large relative to $n$, as seen in the column with $n = 100$ and $k = 50$. This is again because we are using too many mid-sample observations to approximate the tail so that the EV convergence in Propositions 1-3 provides poor approximations. Second, the test has good power properties as seen from the last five rows. In particular, using only the largest 50 order statistics from 1000 observations leads to a power of 0.94. Finally, the power decreases as the alternative hypothesis becomes closer to the null, as we move down along rows.
Now we consider the special case where $W$ is in the normal family. Then we implement (6) with $\hat{\mathbf{Z}}^*_-$ as the input. Table 4 contains the rejection probabilities under the null and alternative hypotheses. The rows with $F_U$ being half-normal or Laplace correspond to the size under the null hypothesis, while the other rows correspond to the power under the alternative hypothesis. The new test has excellent size and power properties.
We illustrate the new method using the US bank data collected by Feng and Serletis (2009). The data are a sample of US banks covering the period from 1998 to 2005 (inclusive). After deleting banks with negative or zero input prices, we are left with a balanced panel of 6,010 banks observed annually over the 8-year period. A more detailed description of the data may be found in Feng and Serletis (2009). Here we specify a stochastic cost function, letting $Z = W + U$, so $U \ge 0$ increases cost $Y$.
Since our tests are designed for cross-sectional data, we divide the original panel data into cross-sections (one for each year) and regress the logarithm of total bank cost on a constant and the logarithms of six control variables, including the wage rate for labor, the interest rate for borrowed funds, the price of physical capital, and the amounts of consumer loans, non-consumer loans, and securities. Since the object of interest is the cost function, we multiply the OLS residuals by $-1$ and use the largest $k$ order statistics, for each of four values of $k$, to implement the test (6). The p-values are reported in Table 5. Under the assumption that $W$ is symmetric, these small p-values suggest that $W$ has heavy tails on both sides, so a Student-t assumption (e.g., Wheat, Stead, and Greene, 2019) is more appropriate.
Footnote: The symmetry assumption is reasonable here and is imposed in Feng and Serletis (2009).
We derive several nonparametric tests of the tail behavior of the error components in the stochastic frontier model. The tests are easy to implement in MATLAB and are useful diagnostic tools for empiricists.
Often a first-step diagnostic tool for SFA is to calculate the skewness of the OLS residuals to see if they are properly skewed. See Waldman (1982), Simar and Wilson (2010), and Horrace and Wright (2020). If they are positively skewed, the maximum likelihood estimator of the variance of inefficiency is zero, and OLS is the maximum likelihood estimator of $\beta$. If they are negatively skewed, then OLS is not a stationary point in the parameter space of the likelihood, and the stochastic frontier model is well-posed. After calculating negatively skewed OLS residuals, a useful second-step diagnostic tool is to implement our nonparametric tests to understand the tail behaviors of the error component distributions and to guide parametric choices on the distributions of $U$ and $W$.
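The first-step skewness diagnostic described above can be sketched as follows. This is a minimal Python illustration for a production frontier with $Z = W - U$; the function name `sample_skewness` and the residual values are hypothetical, and the moment-based skewness formula is one common convention rather than a prescription from the paper.

```python
def sample_skewness(residuals):
    """Moment-based sample skewness: m3 / m2**1.5, where m2 and m3 are the
    second and third central sample moments."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    if m2 == 0.0:
        raise ValueError("residuals have zero variance")
    return m3 / m2 ** 1.5


# Hypothetical OLS residuals with a long left tail, as expected when a
# non-negative inefficiency term is subtracted from symmetric noise.
ols_residuals = [-3.0, 0.5, 0.5, 1.0, 1.0]
skew = sample_skewness(ols_residuals)
print("negatively skewed" if skew < 0 else "not negatively skewed")
```

Only when the residuals pass this check (negative skewness in the production case) would one move on to the tail tests as the second-step diagnostic.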
References
[1] Andrews, D. W. K. and W. Ploberger (1994). Optimal tests when a nuisance parameter is present only under the alternative, Econometrica, 62, 1383-1414.
[2] Aigner, D., C. A. K. Lovell, and P. Schmidt (1977). Formulation and estimation of stochastic production frontier models, Journal of Econometrics, 6, 21-37.
[3] Almanidis, P., J. Qian, and R. C. Sickles (2014). Stochastic frontier models with bounded inefficiency. In Sickles, R. C. and Horrace, W. C., eds. Festschrift in Honor of Peter Schmidt: Econometric Methods and Applications, New York: Springer, pp. 47-82.
[4] Arnold, B. C., N. Balakrishnan, and H. N. Nagaraja (1992). A First Course in Order Statistics, SIAM.
[5] Cai, J., W. C. Horrace, and C. F. Parmeter (2020). Density deconvolution with Laplace errors and unknown variance, Unpublished Manuscript, Syracuse University, Center for Policy Research.
[6] Carree, M. A. (2002). Technological inefficiency and the skewness of the error component in stochastic frontier analysis, Economics Letters.
[7] de Haan, L., and A. Ferreira (2007). Extreme Value Theory: An Introduction, Springer Science and Business Media, New York.
[8] Elliott, G., U. K. Müller, and M. W. Watson (2015). Nearly optimal tests when a nuisance parameter is present under the null hypothesis, Econometrica, 83, 771-811.
[9] Feng, G., and A. Serletis (2009). Efficiency and productivity of the US banking industry, 1998-2005: Evidence from the Fourier cost function satisfying global regularity conditions, Journal of Applied Econometrics, 24, 105-138.
[10] Florens, J. P., L. Simar, and I. Van Keilegom (2020). Estimation of the boundary of a variable observed with symmetric error, Journal of the American Statistical Association, 115:529, 425-441.
[11] Greene, W. H. (1990). A gamma-distributed stochastic frontier model, Journal of Econometrics, 46, 141-164.
[12] Hall, P. and L. Simar (2002). Estimating a change point, boundary, or frontier in the presence of observation error, Journal of the American Statistical Association, 97, 523-534.
[13] Horrace, W. C. and C. F. Parmeter (2011). Semiparametric deconvolution with unknown error variance, Journal of Productivity Analysis, 35, 129-141.
[14] Horrace, W. C. and C. F. Parmeter (2018). A Laplace stochastic frontier model, Econometric Reviews, 37, 260-280.
[15] Horrace, W. C. and I. A. Wright (2020). Stationary points for parametric stochastic frontier models, Journal of Business and Economic Statistics, forthcoming.
[16] Kneip, A., L. Simar, and I. Van Keilegom (2015). Frontier estimation in the presence of measurement error with unknown variance, Journal of Econometrics, 184, 379-393.
[17] Kumbhakar, S. C., C. F. Parmeter, and E. G. Tsionas (2013). A zero inefficiency stochastic frontier model, Journal of Econometrics, 172, 66-76.
[18] Li, Q. (1996). Estimating a stochastic production frontier when the adjusted error is symmetric, Economics Letters, 52, 221-228.
[19] Müller, U. K. and Y. Wang (2017). Fixed-k asymptotic inference about tail properties, Journal of the American Statistical Association, 112, 1134-1143.
[20] Simar, L., I. Van Keilegom, and V. Zelenyuk (2017). Nonparametric least squares methods for stochastic frontier models, Journal of Productivity Analysis, 47, 189-204.
[21] Simar, L., and P. W. Wilson (2010). Inference from cross-sectional stochastic frontier models, Econometric Reviews, 29, 62-98.
[22] Tsionas, E. G. (2007). Efficiency measurement with the Weibull stochastic frontier, Oxford Bulletin of Economics and Statistics, 69, 693-706.
[23] Waldman, D. M. (1982). A stationary point for the stochastic frontier likelihood, Journal of Econometrics, 18, 275-279.
[24] Wheat, P., A. D. Stead, and W. H. Greene (2019). Robust stochastic frontier analysis: a Student's t-half normal model with application to highway maintenance costs in England, Journal of Productivity Analysis, 51, 21-38.
Appendix
A Proofs
Proof of Proposition 1
Since only the right tail index of $W$ shows up in this proof, we simply denote $\xi = \xi_{W+}$ in this proof.
We prove the case with $k = 1$ first. By Corollary 1.2.4 and Remark 1.2.7 in de Haan and Ferreira (2007), the constants $a_n$ and $b_n$ can be chosen as follows. If $\xi > 0$, we choose $a_n = Q_W(1 - 1/n)$ and $b_n(\xi) = 0$. If $\xi = 0$, we choose $a_n = 1/(n f_W(b_n))$ and $b_n = Q_W(1 - 1/n)$. By construction, these constants satisfy that $1 - F_W(a_n v + b_n) = O(n^{-1})$ for any fixed $v > 0$. Write
\[ P(Z_{n:n} \le a_n v + b_n) = P(Z_i \le a_n v + b_n)^n \equiv A_n(v) \cdot \left( 1 + \frac{B_n(v)}{P(W_i \le a_n v + b_n)} \right)^n, \]
where $A_n(v) = P(W_i \le a_n v + b_n)^n$, and $B_n(v) = P(-U_i + W_i \le a_n v + b_n) - P(W_i \le a_n v + b_n)$.
By Assumption 1-(iv), $A_n(v) \to G_\xi(v)$ for any constant $v > 0$. Then by the facts that $P(W_i \le a_n v + b_n) \to 1$ and $(1 + t/n)^n \to \exp(t)$, it suffices to show that $B_n(v) = o(n^{-1})$.
To this end, we have
\[ \begin{aligned} B_n(v) &=_{(1)} E[F_W(a_n v + b_n + U_i) - F_W(a_n v + b_n)] \\ &\le_{(2)} \sup_{t \in [a_n v + b_n, \infty]} f_W(t) \cdot E[|U_i|] \\ &\le_{(3)} f_W(a_n v + b_n) \cdot E[|U_i|] \\ &=_{(4)} o(n^{-1}), \end{aligned} \]
where eq.(1) is by Assumption 1-(ii) ($U_i$ is independent from $W_i$), ineq.(2) is by the mean value theorem, ineq.(3) follows from Assumption 1-(iv) ($f_W(t)$ is non-increasing when $t > c$ for some constant $c$), and eq.(4) is seen by Assumption 1-(iii) ($E[|U_i|] < \infty$) and Assumption 1-(iv). In particular, the fact that $n f_W(a_n v + b_n) = o(1)$ is implied by the von Mises' condition. See, for example, Corollary 1.1.10 in de Haan and Ferreira (2007) with $t = Q_W(1 - 1/n)$.
Generalization to $k > 1$ is as follows. Consider $v_1 > v_2 > \cdots > v_k$. Chapter 8.4 in Arnold et al. (1992) (p.219) gives that
\[ \begin{aligned} & P(Z_{n:n} \le a_n v_1 + b_n, \ldots, Z_{n:n-k+1} \le a_n v_k + b_n) \\ &= F_Z^{n-k}(a_n v_k + b_n) \prod_{r=1}^k (n - r + 1)\, a_n f_Z(a_n v_r + b_n) \\ &= \left[ F_W^{n-k}(a_n v_k + b_n) \prod_{r=1}^k (n - r + 1)\, a_n f_W(a_n v_r + b_n) \right] \times \left[ \left( \frac{F_Z(a_n v_k + b_n)}{F_W(a_n v_k + b_n)} \right)^{n-k} \prod_{r=1}^k \frac{f_Z(a_n v_r + b_n)}{f_W(a_n v_r + b_n)} \right] \\ &\equiv \tilde{A}_n \times \tilde{B}_n. \end{aligned} \]
The convergence $\tilde{A}_n \to G_\xi(v_k) \prod_{r=1}^k \{ g_\xi(v_r) / G_\xi(v_r) \}$ is established by Theorem 8.4.2 in Arnold et al. (1992). It now remains to show $\tilde{B}_n \to 1$. First, the fact that $(F_Z(a_n v_k + b_n) / F_W(a_n v_k + b_n))^{n-k} \to 1$ follows from the $k = 1$ case. Second, for any $v$,
\[ \begin{aligned} \frac{f_Z(v)}{f_W(v)} &= \frac{\partial E[F_W(v + U_i)] / \partial v}{f_W(v)} \\ &= \frac{\frac{\partial}{\partial v} \int F_W(v + u) f_U(u)\, du}{f_W(v)} \\ &= \frac{\int \frac{\partial}{\partial v} F_W(v + u) f_U(u)\, du}{f_W(v)} \quad \text{(by Leibniz's rule)} \\ &= \frac{E[f_W(v + U_i)]}{f_W(v)}, \end{aligned} \]
where applying Leibniz's rule is permitted by Assumption 1-(iv), which implies that $f_W(v)$ is uniformly continuous in $v$. Then similarly as bounding $B_n$ above, we use the mean value expansion and Assumptions 1(ii)-(iv) to derive that for any $r \in \{1, \ldots, k\}$ and some constant $0 < C < \infty$,
\[ \begin{aligned} \left| \frac{f_Z(a_n v_r + b_n)}{f_W(a_n v_r + b_n)} - 1 \right| &= \left| \frac{E[f_W(a_n v_r + b_n + U_i) - f_W(a_n v_r + b_n)]}{f_W(a_n v_r + b_n)} \right| \\ &\le \sup_{t \in [a_n v_r + b_n, \infty]} \left| \frac{\partial f_W(t) / \partial t}{f_W(a_n v_r + b_n)} \right| E[|U_i|] \\ &\le \left| \frac{\partial f_W(a_n v_r + b_n) / \partial t}{f_W(a_n v_r + b_n)} \right| E[|U_i|] \quad \text{(by } \partial f_W(t)/\partial t \nearrow 0 \text{)} \\ &\le C \left| \frac{f_W(a_n v_r + b_n)}{1 - F_W(a_n v_r + b_n)} \right| E[|U_i|] \\ &= o(1), \end{aligned} \]
where the last inequality follows from $\lim_{t \to \infty} \frac{(\partial f_W(t)/\partial t)(1 - F_W(t))}{f_W(t)^2} = -1 - \xi$, which is implied by the von Mises' condition (cf. Theorem 1.1.8 in de Haan and Ferreira (2007)), and the last equality follows from the facts that $n(1 - F_W(a_n v_r + b_n)) = O(1)$ and $n f_W(a_n v_r + b_n) = o(1)$ (see again Corollary 1.1.10 in de Haan and Ferreira (2007) with $t = Q_W(1 - 1/n)$). The proof is then complete. ∎
Proof of Proposition 2
In this proof, we drop the subscript $W^+$ in $\xi_{W^+}$ since it is the only tail index here. Proposition 1 implies that
\[
\frac{\mathbf{Z}_+ - b_n}{a_n} \xrightarrow{d} \mathbf{V}_+, \tag{10}
\]
where $\mathbf{V}_+$ is jointly EV distributed with tail index $\xi$, and the constants $a_n$ and $b_n$ are chosen in the proof of Proposition 1.

Let $I = (I_1, \ldots, I_k) \in \{1, \ldots, T\}^k$ be the $k$ random indices such that $Z_{n:n-j+1} = Z_{I_j}$, $j = 1, \ldots, k$, and let $\hat{I}$ be the corresponding indices such that $\hat{Z}_{n:n-j+1} = \hat{Z}_{\hat{I}_j}$. Then the convergence of $\hat{\mathbf{Z}}_+$ follows from (10) once we establish $|\hat{Z}_{\hat{I}_j} - Z_{I_j}| = o_p(a_n)$ for $j = 1, \ldots, k$. We consider $k = 1$ for simplicity; the argument for a general $k$ is very similar. Denote $\varepsilon_i \equiv \hat{Z}_i - Z_i$.

Consider the case with $\xi > 0$. The part of Assumption 2(v) for $\xi > 0$ implies that
\[
\sup_i |\varepsilon_i| = \sup_i \left| X_i (\beta - \hat{\beta}) \right| \le \left\| \beta - \hat{\beta} \right\| \sup_i \| X_i \| = o_p(1).
\]
Given this, we have, on the one hand,
\[
\hat{Z}_{\hat{I}} = \max_i \{ Z_i + \varepsilon_i \} \le Z_I + \sup_i |\varepsilon_i| = Z_I + o_p(1);
\]
and on the other hand,
\[
\hat{Z}_{\hat{I}} = \max_i \{ Z_i + \varepsilon_i \} \ge \max_i \{ Z_i + \min_i \{ \varepsilon_i \} \} \ge Z_I + \min_i \{ \varepsilon_i \} \ge Z_I - \sup_i |\varepsilon_i| = Z_I - o_p(1).
\]
Therefore, $|\hat{Z}_{\hat{I}} - Z_I| \le o_p(1) = o_p(a_n)$ since $a_n \to \infty$.

Consider the case with $\xi = 0$. Corollary 1.2.4 in de Haan and Ferreira (2007) implies that $a_n = f_W(Q_W(1 - 1/n))$. Thus, the part in Condition 2.3 for $\xi = 0$ implies that
\[
\frac{1}{a_n} \sup_i |\varepsilon_i| \le \frac{\sup_i \| X_i \| \cdot \left\| \beta - \hat{\beta} \right\|}{f_W(Q_W(1 - 1/n))} = o_p(1).
\]
Then the same argument as above yields that $\left| \hat{Z}_{\hat{I}} - Z_I \right| \le O_p(\sup_i |\varepsilon_i|) = o_p(a_n)$. $\blacksquare$

Proof of Proposition 3
Let $\mathbf{Z}_-$ denote the $k$ smallest order statistics of $\{Z_i\}$. Let $(a_n^+, b_n^+)^\intercal$ and $(a_n^-, b_n^-)^\intercal$ be the sequences of normalizing constants for the right and left tails of $Z$, respectively. Then by the same argument as in Proposition 2, we have $\hat{\mathbf{Z}}_- - \mathbf{Z}_- = o_p(a_n^-)$ and $\hat{\mathbf{Z}}_+ - \mathbf{Z}_+ = o_p(a_n^+)$. Therefore, it suffices to establish that $\mathbf{Z}_+$ and $\mathbf{Z}_-$ jointly converge to $(\mathbf{V}_+^\intercal, \mathbf{V}_-^\intercal)^\intercal$, where $\mathbf{V}_+$ and $\mathbf{V}_-$ are independent and both EV distributed with indices $\xi_{Z^+}$ and $\xi_{Z^-}$, respectively. To this end, note that the case with $k = 1$ is established as Theorem 8.4.3 in Arnold et al. (1992). We now generalize their argument for $k \ge 2$.

The joint density of $Z_{n:n}, \ldots, Z_{n:1}$ is $n! \prod_{i=1}^{n} f_Z(z_i)$ for $z_1 \le z_2 \le \ldots \le z_n$. Then by a change of variables, the joint density of
\[
(Z_{n:n} - b_n^+)/a_n^+, \ldots, (Z_{n:n-k+1} - b_n^+)/a_n^+, \; (Z_{n:k} - b_n^-)/a_n^-, \ldots, (Z_{n:1} - b_n^-)/a_n^-
\]
satisfies, for $v_1^- \le v_2^- \le \ldots \le v_k^- \le v_k^+ \le \ldots \le v_1^+$,
\[
\begin{aligned}
&P\left( Z_{n:n} \le a_n^+ v_1^+ + b_n^+, \ldots, Z_{n:n-k+1} \le a_n^+ v_k^+ + b_n^+, \; Z_{n:1} \ge a_n^- v_1^- + b_n^-, \ldots, Z_{n:k} \le a_n^- v_k^- + b_n^- \right) \\
&\quad= \left( F_Z(a_n^+ v_k^+ + b_n^+) - F_Z(a_n^- v_k^- + b_n^-) \right)^{n-2k} \\
&\qquad\times \prod_{r=1}^{k} (n - r + 1)\, a_n^- f_Z\left( a_n^- v_r^- + b_n^- \right)
\times \prod_{r=1}^{k} (n - r + 1)\, a_n^+ f_Z\left( a_n^+ v_r^+ + b_n^+ \right) \\
&\quad\equiv P_{1n} \times P_{2n} \times P_{3n}.
\end{aligned}
\]
By the DOA assumption for both the left and right tails and equations (8.3.1) and (8.4.9) in Arnold et al. (1992),
\[
P_{1n} \to G_{\xi_{Z^+}}\left( v_k^+ \right) \left( 1 - G_{\xi_{Z^-}}\left( v_k^- \right) \right).
\]
By (8.4.4) in Arnold et al. (1992) and the fact that $k$ is fixed,
\[
P_{2n} \to \prod_{r=1}^{k} \frac{g_{\xi_{Z^-}}(v_r^-)}{G_{\xi_{Z^-}}(v_r^-)}
\quad \text{and} \quad
P_{3n} \to \prod_{r=1}^{k} \frac{g_{\xi_{Z^+}}(v_r^+)}{1 - G_{\xi_{Z^+}}(v_r^+)}.
\]
The proof is then complete by combining $P_{jn}$ for $j = 1, 2, 3$. $\blacksquare$

B Computational details
This section provides more details for constructing the test (9), which is based on the limiting observations $\mathbf{V}^*_-$ and $\mathbf{V}^*_+$. The density is given by (5), which is computed by Gaussian quadrature. To construct the test (9), we specify the weight $w$ to be uniform over the alternative space for expositional simplicity; this choice can be easily changed. It then remains to determine a suitable candidate for the weight $\Lambda$ and the critical value $\mathrm{cv}(k, \alpha)$. We do this by the generic algorithm provided by Elliott et al. (2015) and Müller and Wang (2017).

The idea of identifying a suitable choice of $\Lambda$ and $\mathrm{cv}(k, \alpha)$ is as follows. First, we discretize $\Xi$ into a grid $\Xi_a$ and determine $\Lambda$ accordingly as point masses on that grid. Then we simulate $N$ random draws of $\mathbf{V}^*_-$ and $\mathbf{V}^*_+$ from $\xi \in \Xi_a$ and estimate the rejection probability $P_\xi(\varphi_\pm(\mathbf{V}^*_-, \mathbf{V}^*_+) = 1)$ by sample fractions. The subscript $\xi$ emphasizes that the rejection probability depends on the value of $\xi$ that generates the data. By iteratively increasing or decreasing the point masses according to whether the estimated $P_\xi(\varphi_\pm(\mathbf{V}^*_-, \mathbf{V}^*_+) = 1)$ is larger or smaller than the nominal level, we can always find a candidate $\Lambda$ together with $\mathrm{cv}(k, \alpha)$ that numerically satisfy the uniform size control.

In practice, we can determine the point masses by the following steps. Let $c$ be short for $\mathrm{cv}(k, \alpha)$.

Algorithm:
1. Simulate $N = 10{,}000$ i.i.d. random draws from some proposal density with $\xi$ drawn uniformly from $\Xi_a$, which is an equally spaced grid on $[0, 0.99]$ with 50 points.
2. Start with $\Lambda^{(0)} = \{1/50, 1/50, \ldots, 1/50\}$ and $c = 1$. Calculate the (estimated) rejection probabilities $P_{\xi_j}(\varphi_\pm(\mathbf{V}^*_-, \mathbf{V}^*_+) = 1)$ for every $\xi_j \in \Xi_a$ using importance sampling. Denote them by $P = (P_1, \ldots, P_{50})^\intercal$.
3. Update $\Lambda$ and $c$ by setting $c\Lambda^{(s+1)} = c\Lambda^{(s)} + \kappa (P - 0.05)$ with some step-length constant $\kappa > 0$, so that the $j$-th point mass in $\Lambda$ is increased/decreased if the rejection probability for $\xi_j$ is larger/smaller than the nominal level.
4. Iterate 500 times. Then, the resulting $\Lambda^{(500)}$ and $c$ are a valid candidate.
5. Numerically check whether $\varphi_\pm$ with $\Lambda^{(500)}$ and $c$ indeed controls the size uniformly by simulating the rejection probabilities over a much finer grid on $\Xi$. If not, go back to step 2 with a finer $\Xi_a$.
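The fixed-point iteration in steps 1 to 4 can be sketched on a toy problem. The sketch below does not use the density (5), which is not reproduced here; as a stand-in it takes a hypothetical composite null $X \sim N(\theta, 1)$ with $\theta$ on a grid in $[0, 1]$ against the point alternative $N(2, 1)$. The function names, the Gaussian toy model, the step length, and the clipping of the weights at zero are all assumptions made for illustration, not part of the original algorithm. The vector `mu` plays the role of $c\Lambda$.

```python
import numpy as np


def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def calibrate_weights(grid, alt_mean=2.0, alpha=0.05, n_draws=10_000,
                      kappa=2.0, n_iter=500, seed=0):
    """Toy version of steps 1-4: returns mu = c * Lambda and the estimated
    null rejection probabilities at each grid point."""
    grid = np.asarray(grid, dtype=float)
    rng = np.random.default_rng(seed)
    n_grid = len(grid)

    # Step 1: draw from a proposal that mixes the null densities uniformly
    # over the grid.
    idx = rng.integers(0, n_grid, size=n_draws)
    x = rng.normal(loc=grid[idx], scale=1.0)

    f_null = normal_pdf(x[None, :], grid[:, None])   # shape (n_grid, n_draws)
    proposal = f_null.mean(axis=0)                   # mixture density at each draw
    is_weights = f_null / proposal                   # importance weights per grid point
    f_alt = normal_pdf(x, alt_mean)

    # Step 2: start from uniform point masses and c = 1.
    mu = np.full(n_grid, 1.0 / n_grid)

    # Steps 3-4: raise the mass where the test over-rejects, lower it otherwise.
    for _ in range(n_iter):
        reject = f_alt > mu @ f_null                 # likelihood-ratio-type rejection rule
        p_hat = (is_weights * reject).mean(axis=1)   # importance-sampling estimates
        mu = np.maximum(mu + kappa * (p_hat - alpha), 0.0)  # clipping is an assumption

    reject = f_alt > mu @ f_null
    p_hat = (is_weights * reject).mean(axis=1)
    return mu, p_hat


if __name__ == "__main__":
    mu, p_hat = calibrate_weights(np.linspace(0.0, 1.0, 11))
    c = mu.sum()                                     # implied critical value
    print("c =", c, "Lambda =", mu / c)
    print("largest estimated rejection probability:", p_hat.max())
```

In this toy model the weight concentrates on the grid point closest to the alternative, and the largest estimated rejection probability settles near the nominal level. Step 5 (checking size on a finer grid) is not included in the sketch.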
| $F_W$ | $n = 100$, $k = 10$ | $n = 100$, $k = 20$ | $n = 100$, $k = 50$ | $n = 1000$, $k = 10$ | $n = 1000$, $k = 20$ | $n = 1000$, $k = 50$ |
|---|---|---|---|---|---|---|
| Rejection prob. under half-normal $U_i$ | | | | | | |
| N(0,1) | 0.01 | 0.00 | 0.00 | 0.03 | 0.02 | 0.01 |
| La(0,1) | 0.05 | 0.04 | 0.02 | 0.05 | 0.05 | 0.05 |
| t(2) | 0.31 | 0.45 | 0.35 | 0.30 | 0.49 | 0.76 |
| ±Pa(0.5) | 0.31 | 0.52 | 0.12 | 0.30 | 0.48 | 0.78 |
| ±F(4,4) | 0.32 | 0.50 | 0.50 | 0.29 | 0.50 | 0.80 |
| Rejection prob. under half-Laplace $U_i$ | | | | | | |
| N(0,1) | 0.02 | 0.00 | 0.00 | 0.03 | 0.01 | 0.01 |
| La(0,1) | 0.05 | 0.04 | 0.05 | 0.06 | 0.06 | 0.04 |
| t(2) | 0.32 | 0.46 | 0.29 | 0.32 | 0.50 | 0.80 |
| ±Pa(0.5) | 0.31 | 0.52 | 0.06 | 0.30 | 0.49 | 0.79 |
| ±F(4,4) | 0.31 | 0.52 | 0.49 | 0.28 | 0.48 | 0.78 |

Table 1: Small sample rejection probabilities of test (6) when there is no covariate. $U_i$ is generated from half-standard normal or half-Laplace(0,1) and $W_i$ is generated from standard normal, Laplace(0,1), Student-t(2), Pareto(0.5) and F(4,4). Based on 1000 simulation draws. Significance level is 0.05.
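As a rough illustration of the simulation design behind Table 1, the following sketch draws $Z_i = W_i - U_i$ and extracts the $k$ largest order statistics, which are the inputs to the tail test. It covers only the normal, Laplace and Student-t(2) noise designs; the Pareto and F designs are left out because their exact parameterization is not stated in this section. The test statistic (6) itself is not implemented, and all function names are hypothetical.

```python
import numpy as np


def draw_noise(dist, n, rng):
    """Two-sided noise W_i for three of the designs listed in Table 1."""
    if dist == "normal":
        return rng.standard_normal(n)
    if dist == "laplace":
        return rng.laplace(loc=0.0, scale=1.0, size=n)
    if dist == "t2":
        return rng.standard_t(df=2, size=n)
    raise ValueError(f"unknown noise distribution: {dist}")


def draw_inefficiency(dist, n, rng):
    """One-sided component U_i >= 0 (half-normal or half-Laplace)."""
    if dist == "half-normal":
        return np.abs(rng.standard_normal(n))
    if dist == "half-laplace":
        return np.abs(rng.laplace(loc=0.0, scale=1.0, size=n))
    raise ValueError(f"unknown inefficiency distribution: {dist}")


def largest_order_stats(z, k):
    """Return Z_{n:n} >= ... >= Z_{n:n-k+1}, the k largest values of z."""
    return np.sort(z)[::-1][:k]


def simulate_top_k(noise, ineff, n, k, seed=0):
    """Draw one sample of Z_i = W_i - U_i and return its k largest values."""
    rng = np.random.default_rng(seed)
    z = draw_noise(noise, n, rng) - draw_inefficiency(ineff, n, rng)
    return largest_order_stats(z, k), z


if __name__ == "__main__":
    for noise in ("normal", "laplace", "t2"):
        top, _ = simulate_top_k(noise, "half-normal", n=1000, k=10)
        print(noise, np.round(top[:3], 2))
```

Repeating `simulate_top_k` over many seeds and applying the test to each draw would give rejection frequencies of the kind reported in the table.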
| $F_W$ | $n = 100$, $k = 10$ | $n = 100$, $k = 20$ | $n = 100$, $k = 50$ | $n = 1000$, $k = 10$ | $n = 1000$, $k = 20$ | $n = 1000$, $k = 50$ |
|---|---|---|---|---|---|---|
| Rejection prob. under half-normal $U_i$ | | | | | | |
| N(0,1) | 0.01 | 0.00 | 0.00 | 0.03 | 0.02 | 0.01 |
| La(0,1) | 0.05 | 0.04 | 0.02 | 0.05 | 0.05 | 0.05 |
| t(2) | 0.30 | 0.44 | 0.33 | 0.30 | 0.49 | 0.80 |
| ±Pa(0.5) | 0.32 | 0.51 | 0.10 | 0.30 | 0.48 | 0.80 |
| ±F(4,4) | 0.32 | 0.50 | 0.47 | 0.29 | 0.50 | 0.80 |
| Rejection prob. under half-Laplace $U_i$ | | | | | | |
| N(0,1) | 0.01 | 0.00 | 0.00 | 0.03 | 0.01 | 0.01 |
| La(0,1) | 0.04 | 0.05 | 0.01 | 0.06 | 0.06 | 0.04 |
| t(2) | 0.32 | 0.46 | 0.28 | 0.31 | 0.50 | 0.80 |
| ±Pa(0.5) | 0.32 | 0.50 | 0.07 | 0.30 | 0.49 | 0.80 |
| ±F(4,4) | 0.31 | 0.51 | 0.47 | 0.28 | 0.48 | 0.78 |

Table 2: Small sample rejection probabilities of test (6) when there are covariates. $U_i$ is generated from half-normal and $W_i$ is generated from standard normal, Laplace(0,1), Student-t(2), Pareto(0.5) and F(4,4). Based on 1000 simulation draws. Significance level is 0.05.
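With covariates, the test is applied to residuals $\hat{Z}_i = Z_i + X_i(\beta - \hat{\beta})$, and the proof of Proposition 2 bounds the effect of estimation error on the largest order statistic by $\sup_i |\varepsilon_i| \le \|\beta - \hat{\beta}\| \sup_i \|X_i\|$. The sketch below checks these two inequalities numerically on one simulated sample. The design (two bounded mean-zero regressors, no intercept, the coefficient values, OLS estimation, Student-t(2) noise with half-normal inefficiency) is a hypothetical choice for illustration and is not taken from the simulation study.

```python
import numpy as np


def residual_extremes(n=1000, seed=0):
    """Compare the largest residual with the largest true composite error.

    Returns (gap, sup_eps, bound) where
      gap     = |max_i Z_hat_i - max_i Z_i|,
      sup_eps = sup_i |Z_hat_i - Z_i|,
      bound   = ||beta - beta_hat|| * sup_i ||X_i||.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -0.5])                 # hypothetical slope coefficients
    X = rng.uniform(-1.0, 1.0, size=(n, 2))      # bounded, mean-zero regressors

    W = rng.standard_t(df=2, size=n)             # heavy-tailed noise (xi > 0 case)
    U = np.abs(rng.standard_normal(n))           # half-normal inefficiency
    Z = W - U                                    # composite error
    Y = X @ beta + Z

    # OLS without intercept; the residuals play the role of Z_hat_i.
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Z_hat = Y - X @ beta_hat

    eps = Z_hat - Z                              # equals X (beta - beta_hat)
    gap = abs(Z_hat.max() - Z.max())
    sup_eps = np.abs(eps).max()
    bound = np.linalg.norm(beta - beta_hat) * np.linalg.norm(X, axis=1).max()
    return gap, sup_eps, bound


if __name__ == "__main__":
    gap, sup_eps, bound = residual_extremes()
    print(f"gap = {gap:.4f}, sup|eps| = {sup_eps:.4f}, bound = {bound:.4f}")
```

Both inequalities hold for every sample by construction; what the asymptotic argument adds is that the bound is negligible relative to $a_n$ as $n$ grows.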
| $F_W$ | $F_U$ | $n = 100$, $k = 10$ | $n = 100$, $k = 20$ | $n = 100$, $k = 50$ | $n = 1000$, $k = 10$ | $n = 1000$, $k = 20$ | $n = 1000$, $k = 50$ |
|---|---|---|---|---|---|---|---|
| Rejection prob. under $H_0$ | | | | | | | |
| N(0,1) | half-N(0,1) | 0.06 | 0.06 | 0.03 | 0.05 | 0.05 | 0.05 |
| La(0,1) | half-La(0,1) | 0.04 | 0.05 | 0.07 | 0.05 | 0.04 | 0.04 |
| t(2) | half-t(2) | 0.04 | 0.06 | 0.24 | 0.06 | 0.06 | 0.05 |
| ±Pa(0.5) | −Pa(0.5) | 0.05 | 0.06 | 0.67 | 0.05 | 0.06 | 0.05 |
| ±F(4,4) | −F(4,4) | 0.04 | 0.05 | 0.24 | 0.04 | 0.06 | 0.03 |
| Rejection prob. under $H_1$ | | | | | | | |
| N(0,1) | −Pa(0.75) | 0.28 | 0.68 | 0.99 | 0.25 | 0.55 | 0.94 |
| Laplace(0,1) | −Pa(0.75) | 0.25 | 0.46 | 0.89 | 0.21 | 0.44 | 0.82 |
| t(2) | −Pa(0.75) | 0.13 | 0.21 | 0.62 | 0.09 | 0.09 | 0.19 |
| ±Pa(0.5) | −Pa(0.75) | 0.10 | 0.23 | 0.87 | 0.07 | 0.12 | 0.17 |
| ±F(4,4) | −Pa(0.75) | 0.09 | 0.14 | 0.46 | 0.07 | 0.09 | 0.15 |

Table 3: Small sample rejection probabilities of test (9). $U_i$ is generated from half-normal, half-Laplace, Student-t(2), Pareto(0.5), F(4,4), and Pareto(0.75) and $W_i$ is generated from standard normal, Laplace(0,1), Student-t(2), Pareto(0.5) and F(4,4). Based on 1000 simulation draws. Significance level is 0.05.
| $F_U$ | $n = 100$, $k = 10$ | $n = 100$, $k = 20$ | $n = 100$, $k = 50$ | $n = 1000$, $k = 10$ | $n = 1000$, $k = 20$ | $n = 1000$, $k = 50$ |
|---|---|---|---|---|---|---|
| Rejection prob. under Normal $W_i$ | | | | | | |
| half-N(0,1) | 0.02 | 0.01 | 0.00 | 0.02 | 0.02 | 0.00 |
| half-La(0,1) | 0.05 | 0.03 | 0.00 | 0.06 | 0.05 | 0.04 |
| half-t(2) | 0.31 | 0.45 | 0.46 | 0.34 | 0.54 | 0.88 |
| −Pa(0.5) | 0.34 | 0.52 | 0.48 | 0.33 | 0.56 | 0.89 |
| −F(4,4) | 0.31 | 0.52 | 0.70 | 0.32 | 0.52 | 0.86 |
| Rejection prob. under Laplace $W_i$ | | | | | | |
| half-N(0,1) | 0.05 | 0.03 | 0.01 | 0.05 | 0.05 | 0.05 |
| half-La(0,1) | 0.04 | 0.03 | 0.00 | 0.05 | 0.04 | 0.03 |
| half-t(2) | 0.28 | 0.37 | 0.38 | 0.35 | 0.58 | 0.89 |
| −Pa(0.5) | 0.30 | 0.40 | 0.44 | 0.33 | 0.58 | 0.89 |
| −F(4,4) | 0.30 | 0.49 | 0.59 | 0.33 | 0.52 | 0.86 |

Table 4: Rejection probabilities of test (6). $U_i$ is generated from various distributions and $W_i$ is generated from standard normal or Laplace(0,1). Based on 1000 simulation draws. Significance level is 0.05.

[Table fragment: columns "left tail" and "right tail", each with $k$ = 25, 50, 75, 100; rows by year, starting with 1998. The cell entries are truncated in the source.]

... $\mathbf{V}^*_+$ generated from the joint extreme value distribution (5) and the nominal size of 0..