Testing for Quantile Sample Selection∗

Valentina Corradi† (Surrey University)   Daniel Gutknecht‡ (Goethe University Frankfurt)

April 10, 2020
Abstract
This paper provides a testing approach for detecting sample selection in nonparametric conditional quantile functions. Our testing strategy consists of a two-step procedure: the first test is an omitted predictor test with the propensity score as the omitted variable. As with any omnibus test, in the case of rejection we cannot distinguish between rejection due to genuine selection or to misspecification. Thus, since the differentiation of the two causes has implications for nonparametric (point) identification and estimation of the conditional quantile function(s), we suggest a second test to identify whether the cause for rejection at the first stage was solely due to selection or not. Using only individuals with propensity score close to one, this second test relies on an ‘identification at infinity’ argument, but accommodates cases of irregular identification. Our testing procedure does not require any parametric assumptions on the selection equation, and all our results hold uniformly across quantile ranks in a compact set. We apply our procedure to test for selection in log hourly wages using UK Family Expenditure Survey data.
Key-Words:
Nonparametric Estimation, Conditional Quantile Function, Irregular Identification, Wild Bootstrap, Specification Test.
JEL Classification:
C12, C14, C21.

∗We are grateful to Bertille Antoine, Federico Bugni, Xavier D’Hautefoeuille, Giovanni Mellace, Toru Kitagawa, Peter C.B. Phillips, and Youngki Shin for useful discussions and comments. Moreover, we would like to thank seminar participants at the ISNPS meeting (Salerno, 2018), the ICEEE meeting (Lecce, 2019), the CIREQ Montreal Econometrics Conference (Montreal, 2019), the Warwick Econometrics Workshop (Warwick, 2019), and the Southampton Workshop in Econometrics and Statistics (Southampton, 2019) for helpful comments.

†Department of Economics, University of Surrey, School of Economics, Guildford GU2 7XH, UK. Email:
[email protected]

‡Corresponding Author: Department of Economics and Business, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 4, 60629 Frankfurt, Germany. Email:
[email protected]

Introduction
Empirical studies using non-experimental data are often plagued by the presence of non-random sample selection: individuals typically self-select into employment, training programs etc. on the basis of characteristics which are believed to be non-random and unobservable to the researcher(s) (Gronau, 1974; Heckman, 1974). In fact, it is well known that ignoring selection in conditional mean models induces a bias in the estimation, which can be additive (see e.g. Heckman, 1979; Das et al., 2003) or multiplicative (Jochmans, 2015) depending on the functional form of the model. In both cases, one can deal with the selection bias by adopting a control function approach. On the other hand, until recently very little was known about the identification and estimation of conditional quantile functions in the presence of endogenous selection; see the recent survey by Arellano and Bonhomme (2017b). A notable exception is the case of sample selection in location shift models, where it induces a parallel shift in the quantile function, and hence can be taken into account by simply correcting for the selection bias as in the mean case. In all other cases, including linear quantile regression models, however, the presence of endogenous selection causes a rotation of the quantile function (Arellano and Bonhomme, 2017a) and control function methods can no longer be applied. In fact, as shown by these authors, in the absence of parametric assumptions, conditional quantile functions are point identified ‘at infinity’ or when the joint distribution of outcome and selection error is real analytic, a condition which is difficult to verify and whose practical implementation still requires additional parametric restrictions (e.g. on the copula of the two errors). This highlights the importance of testing for sample selection when estimation of nonparametric conditional quantile functions is the goal, and marks our starting point.
More specifically, this paper provides a testing approach for sample selection in conditional quantile functions, imposing only a minimal set of functional form assumptions on both the outcome and the selection equation(s). In fact, the only additional assumption is that selection (if present) affects the outcome through the propensity score, which is the probability of being in the selected sample, a standard assumption in the selection literature (e.g., Das et al., 2003).

Our objective is then to develop a rule for deciding between sample selection and another possible confounder, namely misspecification of the nonparametric conditional quantile function, and to control the overall classification errors asymptotically. The distinction between non-random selection and misspecification in the form of omitted predictors is particularly relevant for the consistent estimation of and inference about nonparametric conditional quantile functions: selection generally leads to a loss of point identification (cf. Arellano and Bonhomme, 2017a), but consistent estimation and inference may still be carried out on a subset of observations with propensity score close to one. By contrast, omitting relevant predictors impedes consistent estimation and inference altogether, regardless of the presence of endogenous selection.

To understand the heuristics of our testing strategy, note that we have exclusively a problem of sample selection if the conditional quantile error depends on the propensity score when the latter is in the interior of the unit interval, but is independent of the propensity score when the latter is one. This is so because individuals with a propensity score equal to one are selected into the sample almost surely. By contrast, our conditional quantile function is likely to miss out on relevant predictor(s) correlated with the propensity score if it depends on the latter, regardless of whether it takes on values in the interior of the unit interval or close to one.
We formalize this heuristic argument in a decision rule, which we implement in a two-step testing procedure. In the first step, we propose a test for omitted predictors, where the omitted predictor is the (estimated) propensity score. Here, the null hypothesis is that the conditional quantile error does not depend on the propensity score, when the latter is in the interior of its support. Our test statistic resembles that of Volgushev et al. (2013). However, we establish asymptotic normality under the null hypothesis uniformly over all quantile ranks in a compact subset of (0, 1). Importantly, while the second step relies on a so-called ‘identification at infinity’ argument, meaning that the support of the propensity score has to comprise the boundary point one, our test does allow for a thin set of observations close to the boundary, thus accommodating cases of so-called irregular identification (Khan and Tamer, 2010). In fact, the rate of convergence of the second test depends both on the degree of irregularity of the marginal density of the propensity score and on the size of the set of covariate values for which identification at infinity holds. We therefore suggest a studentized version of the test statistic, which is rate adaptive and converges weakly even if numerator and denominator of the statistic diverge individually at the same rate.

To make the testing procedure operational, we establish the first order validity of wild bootstrap critical values for the first and the second test. Moreover, the decision rule is formalized, and corresponding classification errors associated with our decision are obtained. Finally, we apply our testing procedure to test for selection in log hourly wages of females and males in the UK using data from the UK Family Expenditure Survey from 1995 to 2000. The same data was recently also used by Arellano and Bonhomme (2017a) to analyze gender wage inequality in the UK.
We run our testing procedure on two different sub-periods of different economic performance, namely 1995–1997 and 1998–2000. As a preview of the results, we cannot find evidence for selection among females for the 1995–1997 period, but only for the 1998–2000 period. By contrast, while we reject the null of the first test for males with data from 1995 to 1997, our second test strongly suggests that this rejection may actually be due to misspecification of the quantile function, a feature that might have remained undetected without our testing procedure.

Finally, in supplementary material to this paper we also provide an extension of the above testing idea to nonparametric conditional mean functions, which are commonly used in practice. In fact, since we consider both tests to be important (even as ‘standalone’ tests), we derive asymptotic results not only for the first, but also for the second test. More specifically, while the first test builds on a statistic suggested by Delgado and Gonzalez-Manteiga (2001), the second test is again a localized version using only observations with propensity score close to one. Tests for conditional mean selection bias in a local average treatment effects framework have already been suggested by Black et al. (2017). These tests are based on the regression of parametric residuals from the null model on those variables which are assumed to affect selection (but not the outcome). Thus, to obtain power against misspecification in this test we require that the omitted predictor(s) are correlated with the propensity score, even when the latter is at or close to one.
We begin by outlining the data generating process. As is customary in the sample selection literature, we postulate that the continuous outcome variable of interest, $y_i$, is observed if and only if $s_i = 1$, where $s_i$ denotes a binary selection indicator. For every individual $i$, we observe covariate(s) $x_i$. (We only consider the case of continuous outcomes in this paper. For generic inference methods for conditional quantile functions with discrete outcome variables see Chernozhukov et al. (2018).) A leading example is (log) wages: $y_i$ are only observed for individuals who participate in the labor market and who are employed ($s_i = 1$), and different sub-groups (e.g., males and females) may differ in terms of their unobservable labor market attachment. Thus, conventional measures of wage gaps or wage inequality may be biased (Heckman, 1974, 1979). In addition to $x_i$, we also observe instrumental variable(s) $z_i$. Here, $z_i$ is assumed to affect the process of selection into the sample governed by $s_i$, but not $y_i$ directly, an assumption which is testable in the context of the sample selection model (Kitagawa, 2010). Note also that the variables $x_i$ and $z_i$ need not be disjoint, although our testing procedure requires some of the continuous variables in $z_i$ to be excluded from $x_i$ (cf. Assumption A.1 below).

Throughout the paper, the maintained assumptions are that (i) the instrumental variable(s) and (ii) non-random selection (if present) enter the conditional quantile function only through the propensity score $p_i \equiv \Pr(s_i = 1 \mid z_i)$, the probability of being in the selected sample for a given $z_i$. Formally, we can express this as follows:

A.Q
For all $\tau \in \mathcal{T}$,
$$\Pr\left(y_i \le q_\tau(x_i) \,\middle|\, x_i, z_i, s_i = 1\right) \overset{(i)}{=} \Pr\left(y_i \le q_\tau(x_i) \,\middle|\, x_i, p_i, s_i = 1\right) \overset{(ii)}{=} \Pr\left(y_i \le q_\tau(x_i) \,\middle|\, x_i, p_i\right) \quad (1)$$
holds almost surely, where $q_\tau(x_i)$ denotes the conditional $\tau$-quantile of $y_i$ given $x_i$ and selection $s_i = 1$, the probability limit of the conditional local quantile regression estimator defined in (15) below.

Assumption A.Q is implied by standard threshold crossing selection models where $s_i = 1\{p(z_i) > v_i\}$ and the unobservable error terms from the selection and the outcome equation are jointly independent of $x_i$ and $z_i$ (see Remark 1 below). In particular, note that we will only require that the propensity score is a smooth, but not necessarily monotonic, function of $z_i$. In fact, the conditions set out in (1) are the only ‘structure’ we impose on the way in which selection enters the conditional mean or quantile function.

Remark 1: To see that the condition in (1) is implied by the set-up of e.g. Arellano and Bonhomme (2017a), assume that there exists an unobserved outcome $y_i^*$ (e.g., market wages) given by $y_i^* = q(u_i, x_i)$ and a selection indicator (e.g., employment status) $s_i = 1\{p(z_i) > v_i\}$, where $(u_i, v_i)$ are assumed to be jointly statistically independent of $z_i$ given $x_i$. Also, assume that $y_i = y_i^*$ iff $s_i = 1$. Then, if $(u_i, v_i)$ are absolutely continuous w.r.t. Lebesgue measure, have standard uniform marginal distributions, and as $F_{y^*|x}(y_i^* \mid x_i)$ and its inverse are strictly increasing, we obtain that
$$\Pr\left(y_i^* \le q_\tau(x_i) \,\middle|\, x_i, z_i, s_i = 1\right) = \Pr\left(q(u_i, x_i) \le q_\tau(x_i) \,\middle|\, x_i, z_i, v_i < p(z_i)\right) \quad (2)$$
$$= \Pr\left(y_i \le q_\tau(x_i) \,\middle|\, x_i, p_i\right) = \Pr\left(u_i \le \tau \,\middle|\, x_i, p_i\right).$$
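For intuition, the threshold-crossing set-up of Remark 1 is easy to simulate. The following sketch uses hypothetical functional forms (a logistic location model for $q(u, x)$ and a simple mixture copula for $(u_i, v_i)$) that are illustrative choices, not the paper's specification:

```python
# Simulation sketch of the threshold-crossing selection model of Remark 1.
# All functional forms are hypothetical: q(u, x) is a logistic location
# model and (u_i, v_i) follow a mixture copula with uniform marginals.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

x = rng.random(n)                      # covariate x_i
z = rng.random(n)                      # continuous instrument z_i
p = 0.2 + 0.8 * z                      # propensity score p(z_i), smooth in z_i

u = rng.random(n)                      # outcome rank u_i ~ U(0, 1)
w = rng.random(n)
mix = rng.random(n) < 0.5
v = np.where(mix, u, w)                # v_i ~ U(0, 1), dependent on u_i

ystar = 1.0 + 2.0 * x + np.log(u / (1.0 - u))   # y*_i = q(u_i, x_i)
s = (p > v).astype(int)                # selection: s_i = 1{p(z_i) > v_i}
y = np.where(s == 1, ystar, np.nan)    # y_i observed iff s_i = 1
```

Because $v_i$ is positively dependent on $u_i$ here, the selected sample over-represents low outcome ranks, so observed conditional quantiles of $y_i$ differ from those of $y_i^*$ — exactly the situation the tests below are designed to detect.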
Note that in fact $q_\tau(x_i)$, the ‘observed’ $\tau$-quantile of $y_i$ given $x_i$ and $s_i = 1$, coincides with the $\tau$-quantile of $y_i^*$ given $x_i$ when selection is random, i.e. when $F_{y|x,s=1}(y \mid x, s = 1) = F_{y^*|x}(y \mid x)$ almost surely.

Given the existence of a valid (continuous) instrument and under Equation (1), our aim is now to develop a rule for deciding between selection and misspecification (where the latter does not necessarily rule out selection), or none of the two, and to obtain bounds on the classification error. As outlined in the introduction, this decision rule is based on the outcome of a two-step testing procedure. We first outline the two sets of hypotheses tested in the first and the second step.

In the first step, we test the hypothesis that the propensity score is not an omitted predictor, against its negation. In what follows, let $\mathcal{T} = [\underline{\tau}, \overline{\tau}]$ denote the compact set of quantile ranks to be examined, where $0 < \underline{\tau} \le \overline{\tau} < 1$. Also, we use $\mathcal{X}$ to denote a compact set in the interior of the union of the supports of covariates $\mathcal{R}_x$, and $\mathcal{P} = [\underline{p}, \overline{p}] \subset (0, 1)$ to denote a compact subset of the support of $p(z_i)$. In the first step we test $H^{(1)}_{0,q}$ versus $H^{(1)}_{A,q}$ using the subset of selected individuals for which $s_i = 1$, i.e.
$$H^{(1)}_{0,q}: \Pr\left(\Pr\left(y_i \le q_\tau(x_i) \mid x_i = x, p_i = p\right) = \tau\right) = 1 \text{ for all } \tau \in \mathcal{T},\ x \in \mathcal{X}, \text{ and } p \in \mathcal{P} \quad (3)$$
versus
$$H^{(1)}_{A,q}: \Pr\left(\Pr\left(y_i \le q_\tau(x_i) \mid x_i = x, p_i = p\right) = \tau\right) < 1 \text{ for some } \tau \in \mathcal{T},\ x \in \mathcal{X}, \text{ and } p \in \mathcal{P}. \quad (4)$$
The logic behind $H^{(1)}_{0,q}$ vs. $H^{(1)}_{A,q}$ is that, given (1), $\Pr(y_i \le q_\tau(x_i) \mid x_i, p_i) = \Pr(y_i \le q_\tau(x_i) \mid x_i, p_i, s_i = 1) = \tau$ if and only if $\Pr(y_i \le q_\tau(x_i) \mid x_i, p_i, s_i = 1) = \Pr(y_i \le q_\tau(x_i) \mid x_i, s_i = 1)$. (From here onwards, we make the conditioning on values of $x_i$ and $p_i$ explicit whenever required for clarity.) Note that the null hypothesis of no omitted predictor in (3) could also have been stated in terms of conditional distribution functions:
$$F_{y|x,p}(y \mid x_i = x, p_i = p) = F_{y|x}(y \mid x_i = x) \quad (5)$$
for all $x \in \mathcal{X}$, $p \in \mathcal{P}$, and $y \in \mathcal{Y}$ for some set $\mathcal{Y}$, a subset of the support of $y_i$. A test for the null in (5) is based on the difference between CDFs estimated using a larger and a smaller information set. While this circumvents the issue of extreme quantile estimation, such a test would suffer from a dimensionality problem (see Remark 3 in the next section). Moreover, in the context of selection, interest often lies in specific (conditional) quantiles or an interior set of quantile ranks. For instance, we might only be interested in testing for sample selection in the (log) wage distribution of males and females from lower conditional quantiles such as from the 10% to the 25% quantiles, or of individuals that earn below the (conditional) median wage etc. To carry out this type of analysis in the conditional distribution function context would require finding corresponding values, say $\underline{y}$ and $\overline{y}$, to examine all $y$ such that $\underline{y} \le y \le \overline{y}$.
These values are typically unknown and require estimating the conditional quantiles in the first place.

Under (1) and assumptions outlined in the next section, failure to reject $H^{(1)}_{0,q}$ rules out endogenous selection asymptotically, with probability approaching one. Therefore, if we fail to reject the null hypothesis, we stop the testing procedure and decide against selection. By contrast, rejection in this first test can occur either due to genuine selection or due to an omitted variable in the outcome equation which happens to be correlated with the propensity score. This is so since the omitted predictor test, as any omnibus test, does not possess directed power against specific alternatives. We therefore design a test in the second step which has directed power against misspecification. To render this argument more formal, suppose there is an omitted relevant predictor $\pi_i$, which we define as follows:

Definition 1: Let $\widetilde{q}_\tau(x_i, \pi_i)$ and $q_\tau(x_i)$ denote the probability limits of two local polynomial quantile estimators for the selected subsample of $y_i$ on $x_i$ and $\pi_i$ as well as on $x_i$ only, respectively. We say that $\pi_i$ is a relevant predictor if for some $\tau \in \mathcal{T}$ and $\pi \in \mathcal{R}_\pi$, where $\mathcal{R}_\pi$ denotes the support of $\pi_i$, $\widetilde{q}_\tau(x, \pi) \ne q_\tau(x)$ for at least all $x$ in a subset of $\mathcal{X}$ with non-zero Lebesgue measure.

Therefore, if $\pi_i$ is a relevant, omitted predictor which is correlated with $p_i$, we expect indeed that $\Pr\left(y_i \le q_\tau(x_i) \mid x_i = x, p_i = p\right) \ne \tau$ with positive probability for some $\tau \in \mathcal{T}$, $x \in \mathcal{X}$, and $p \in \mathcal{P}$.

Remark 2: Consider again the set-up of Remark 1, but suppose that the true $\tau$-conditional quantile of $y_i^*$ is given by $\widetilde{q}(\tau, x_i, \pi_i)$. Hence, $\pi_i$ is an omitted predictor, which is assumed to be correlated with $p_i$.
Given A.2 below, and letting $y_i^* = \widetilde{q}(\widetilde{u}_i, x_i, \pi_i)$, we can write
$$\Pr(y_i^* \le q_\tau(x_i) \mid x_i, z_i, s_i = 1) = \Pr(\widetilde{u}_i \le \tau \mid x_i, z_i, s_i = 1) = \Pr(\widetilde{u}_i \le \tau \mid x_i, p_i) \ne \tau$$
with positive probability, where the last inequality follows since $\widetilde{u}_i$ is a function of $\pi_i$ (and $x_i$), which is correlated with $p_i$, even in the absence of non-random selection.

Hence, we want to disentangle selection from relevant omitted predictors correlated with the propensity score, which may as well cover the case of endogenous selection. In order to impose no selection as maintained hypothesis, we require the existence of at least one value $z$ in the support of $z_i$ s.t. $p(z) = 1$. This type of condition is typically labelled ‘identification at infinity’ in the nonparametric identification literature (e.g. Chamberlain, 1986) and requires the existence of a continuous instrument exhibiting sufficient independent variation from $x_i$. Note, however, that in Section 4 we will address concerns that the marginal density of $p_i$ may not be bounded away from zero at $p_i = 1$ (so-called irregular identification), resulting in very few observations with (estimated) propensity score close to one.

Since at $p_i = 1$ every individual is selected into the sample with certainty, and so selection is not present, in a second step we test the null hypothesis that the propensity score is not an omitted predictor when $p_i = 1$. That is, we test
$$H^{(2)}_{0,q}: \Pr\left(\Pr\left(y_i \le q_\tau(x_i) \mid x_i = x, p_i = 1\right) = \tau\right) = 1 \quad (6)$$
for all $\tau \in \mathcal{T}$ and $x \in \mathcal{X}$ for which identification at infinity holds (a more precise notion will be given in Section 4), versus
$$H^{(2)}_{A,q}: \Pr\left(\Pr\left(y_i \le q_\tau(x_i) \mid x_i = x, p_i = 1\right) = \tau\right) < 1 \quad (7)$$
for some $\tau \in \mathcal{T}$ and some $x$. Thus, if selection is the sole cause for rejection of $H^{(1)}_{0,q}$, we do not expect to reject $H^{(2)}_{0,q}$ (at least asymptotically).
By contrast, if we reject $H^{(2)}_{0,q}$, we take this as an indication that misspecification was likely the, or at least a major, driver of the rejection at the first stage. We formalize these arguments in Section 5 via a Decision Rule for which we shall establish bounds on the classification error. (As detailed in Section 4, this second test requires the assumption that misspecification, if present, is not independent of the propensity score $p_i$ when the latter is one or close to one. Of course, as we discuss in Section 5, we cannot rule out selection if both misspecification and selection are present and lead to a rejection simultaneously.)
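The two-step logic just described can be summarized schematically. The function below is only a sketch of the Decision Rule formalized in Section 5; the two rejection indicators are supplied externally by the test statistics developed in the next sections:

```python
def decide(reject_test_1, reject_test_2=None):
    """Schematic two-step decision rule.

    reject_test_1: outcome of the omitted-predictor test of H^(1)_0.
    reject_test_2: outcome of the 'identification at infinity' test of
                   H^(2)_0; only consulted when the first test rejects.
    """
    if not reject_test_1:
        # Failure to reject H^(1)_0: no evidence of selection or of an
        # omitted predictor correlated with the propensity score.
        return "no selection"
    if reject_test_2:
        # Rejection at p_i close to one points to misspecification, since
        # selection cannot operate there (everyone is selected).
        return "misspecification (selection not ruled out)"
    return "selection"
```

For instance, `decide(True, False)` returns `"selection"`: the first test rejects, but no rejection occurs among individuals with propensity score close to one.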
We now introduce a statistic for testing $H^{(1)}_{0,q}$ vs. $H^{(1)}_{A,q}$, as defined in (3) and (4). For notational simplicity, from here onwards we assume that all components of $x_i$ and $z_i$ are continuous. The extension to discrete elements in both vectors (as long as both still contain continuous elements which exhibit independent variation of each other) is immediate at the cost of more complicated notation and more lengthy arguments in the proofs. Also, note that one would generally expect the convergence rate of our statistics to depend only on the number of continuous elements in $x_i$ (and $z_i$) (cf. Li and Racine, 2008).

To implement our test, we rely on a statistic very close to that of Volgushev et al. (2013). This statistic has the advantage of requiring an estimate of the conditional quantile function only under the null hypothesis, i.e. where the conditional quantile is a function of $x_i$ only. To estimate the conditional quantile function(s) at some point $x_i = x$, we use an $r$-th order local polynomial estimator, which we denote by $\widehat{q}_\tau(x)$, while its corresponding probability limit is denoted by $q^{\dagger}_\tau(x)$; both are formally defined in Appendix Equations (15) and (16). Moreover, define $\widehat{u}_\tau(x_i) \equiv y_i - \widehat{q}_\tau(x_i)$, $u_\tau(x_i) \equiv y_i - q^{\dagger}_\tau(x_i)$, and let $\underline{x} = (\underline{x}_1, \ldots, \underline{x}_{d_x})$ and $\overline{x} = (\overline{x}_1, \ldots, \overline{x}_{d_x})$, $\underline{x}, \overline{x} \in \mathcal{X}$, where $d_x$ denotes the dimension of $x_i$. The test statistic is given by:
$$Z_{q,n} = \sup_{\tau \in \mathcal{T},\, (\underline{x}, \overline{x}) \in \mathcal{X}^2,\, (\underline{p}, \overline{p}) \in \mathcal{P}^2} \left| Z_{q,n}\left(\tau, \underline{x}, \overline{x}, \underline{p}, \overline{p}\right) \right|,$$
where
$$Z_{q,n}\left(\tau, \underline{x}, \overline{x}, \underline{p}, \overline{p}\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} s_i \left(1\{\widehat{u}_\tau(x_i) \le 0\} - \tau\right) \Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, 1\{\underline{p} < \widehat{p}_i < \overline{p}\}.$$
The statistic $Z_{q,n}(\tau, \underline{x}, \overline{x}, \underline{p}, \overline{p})$ differs from Volgushev et al. (2013) in two aspects. First, the omitted regressor $p_i$ is not observable and is thus replaced by a nonparametric estimator, $\widehat{p}_i$.
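For concreteness, a brute-force evaluation of $Z_{q,n}$ over finite grids (for scalar $x_i$, so $d_x = 1$) could look as follows. The grids and the precomputed indicator matrix `uhat` are illustrative choices, not the paper's implementation:

```python
# Brute-force sketch of Z_{q,n}: sup over quantile ranks and over
# rectangles in (x, p) of the normalized, centered indicator sum.
import numpy as np

def z_stat(s, x, p_hat, uhat, taus, x_grid, p_grid):
    """uhat[t, i] = 1{y_i - qhat_{tau_t}(x_i) <= 0}, precomputed from a
    local polynomial quantile fit under the null."""
    n = len(s)
    best = 0.0
    for t, tau in enumerate(taus):
        resid = s * (uhat[t] - tau)            # s_i (1{u_hat <= 0} - tau)
        for xa in x_grid:
            for xb in x_grid:
                if xb <= xa:
                    continue
                in_x = (x > xa) & (x < xb)
                for pa in p_grid:
                    for pb in p_grid:
                        if pb <= pa:
                            continue
                        in_p = (p_hat > pa) & (p_hat < pb)
                        val = abs(resid[in_x & in_p].sum()) / np.sqrt(n)
                        best = max(best, val)
    return best
```

Under $H^{(1)}_{0,q}$ the summands are centered, so the supremum stays bounded in probability; under the alternative some rectangle accumulates a drift of order $\sqrt{n}$.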
Under regularity and bandwidth conditions outlined below, we show however that the estimation error arising from $\widehat{p}_i$ is asymptotically negligible. This is a well known result for estimates affecting the statistic only through a weight function (cf. Escanciano et al., 2014). Second, and more importantly, our test statistic is constructed taking the supremum also w.r.t. $\tau$ (over $\mathcal{T}$). We therefore can test for selection not only at a given quantile, but across all interior quantile ranks. Heuristically, this is achieved via the use of a local polynomial quantile estimator for which Guerre and Sabbah (2012) established a Bahadur representation uniform over compact sets $\mathcal{X}$ and $\mathcal{T}$. In fact, in the simulations, we do not find the assumption of compactness of $\mathcal{X}$ to be of great importance in finite samples.

In the sequel, we make the following assumptions:

A.1 $(y_i, x_i', z_i', s_i) \subset \mathcal{R}_y \times \mathcal{R}_x \times \mathcal{R}_z \times \{0, 1\}$ are identically and independently distributed. Let $\mathcal{X} \equiv \mathcal{X}_1 \times \ldots \times \mathcal{X}_{d_x}$ denote a compact subset of the interior of $\mathcal{R}_x$. $z_i$ contains at least one variable which is not contained in $x_i$ and which is not $x_i$-measurable. The distributions of $x_i$ and $z_i$ have a probability density function with respect to Lebesgue measure which is strictly positive and continuously differentiable (with bounded derivatives) over the interior of their respective supports. Also, assume that the joint density function of $y_i$, $x_i$ and $p_i$ is uniformly bounded everywhere, and that $\Pr(s_i = 1 \mid x, p) = \Pr(s_i = 1 \mid p) > 0$ for all $x \in \mathcal{X}$ and $p \in \mathcal{P}$.

A.2
The distribution function $F_{y|x,s=1}(\cdot \mid \cdot, \cdot)$ of $y_i$ given $x_i$ and selection $s_i = 1$ has a continuous probability density function $f_{y|x,s=1}(y \mid x, s = 1)$ w.r.t. Lebesgue measure which is strictly positive and bounded for all $y \in \mathcal{R}_y$, $x \in \mathcal{X}$. The partial derivative(s) $\nabla_x F_{y|x,s=1}(y \mid x, s = 1)$ are continuous on $\mathcal{R}_y \times \mathcal{X}$. Moreover, there exists a positive constant $C$ such that:
$$|f_{y|x,s=1}(y \mid x, s = 1) - f_{y|x,s=1}(y' \mid x', s = 1)| \le C \|(y, x) - (y', x')\|$$
for all $(y, x), (y', x') \in \mathcal{R}_y \times \mathcal{X}$. Also assume that $q_\tau(x)$ is $(r+1)$-th times continuously differentiable on $\mathcal{X}$ for all $\tau \in \mathcal{T}$ with $r > d_x$.

A.3
There exists an estimator $\widehat{p}(z_i)$ such that $\sup_{z \in \mathcal{Z}} |\widehat{p}(z) - p(z)| = o_p(n^{-1/4})$ with $\mathcal{Z}$ a compact subset of $\mathcal{R}_z$, and that:
$$\Pr\left(\exists i : z_i \in \mathcal{R}_z \setminus \mathcal{Z},\ p(z_i) \in \mathcal{P}\right) = o(n^{-1/2}).$$

A.4
For some positive constant $C$, it holds that:
$$|F_{p|x,u_\tau,s=1}(p \mid x, 0, s = 1) - F_{p|x,u_\tau,s=1}(p' \mid x', 0, s = 1)| \le C \|(p, x) - (p', x')\|$$
for all $\tau \in \mathcal{T}$, $(p, p') \in \mathcal{P}^2$, and $(x, x') \in \mathcal{X}^2$. (Qu and Yoon (2015) recently presented a uniform (in $x_i$) Bahadur representation for the conditional (re-arranged) quantile estimator on an unbounded set $\mathcal{X}$. While this feature is certainly appealing, their representation does not hold uniformly in $\tau$. We therefore rely on a representation derived by Guerre and Sabbah (2012, see below for details), which holds uniformly on compact sets $\mathcal{X}$ and $\mathcal{T}$.)

A.5 The non-negative kernel function $K(\cdot)$ is a bounded, continuously differentiable function with uniformly bounded derivative and compact support on $[-1, 1]$, $\int K(v)\,dv = 1$ as well as $\int v K(v)\,dv = 0$.

Assumption A.1 imposes the existence of at least one continuous instrumental variable, and ensures the existence of selected observations for all values in $\mathcal{X}$ and $\mathcal{P}$. Assumptions A.2 and
A.4, on the other hand, are rather standard smoothness assumptions, while
A.3 is a high-level condition which ensures that $p(z_i)$ can be estimated at a specific rate uniformly over $\mathcal{Z}$, so that the estimation error in $\widehat{p}(z_i)$ is asymptotically negligible. In fact, in the case where $\widehat{p}(z_i)$ is a local constant kernel estimator, the use of a second order kernel imposes restrictions on the dimensionality of the number of continuous regressors, namely $d_z < 4$. Note also that a sufficient condition for the second part of
A.3 is the existence of sufficient moments. Letting ‘$\Rightarrow$’ denote weak convergence, we establish the asymptotic behavior of $Z_{q,n}$.

Theorem 1:
Let Assumptions A.1–A.5 and A.Q hold. Moreover, let $h_x$ denote a deterministic bandwidth sequence that satisfies $h_x \to 0$ as $n \to \infty$. If as $n \to \infty$, $(n h_x^{d_x}) / \log n \to \infty$ and $n h_x^{2r} \log n \to 0$, then

(i) under $H^{(1)}_{0,q}$, $Z_{q,n} \Rightarrow Z_q$, where $Z_q$ is the supremum of a zero mean Gaussian process whose covariance kernel is defined in the proof of Theorem 1;

(ii) under $H^{(1)}_{A,q}$, there exists $\varepsilon > 0$ such that $\lim_{n \to \infty} \Pr\left(Z_{q,n} > \varepsilon\right) = 1$.

The results of Theorem 1 rely on an appropriate choice of $h_x$. As common in the nonparametric testing literature, our rate conditions require undersmoothing, and thus cross-validation is not directly applicable in our setting. However, to still pick $h_x$ in a data-driven manner while ensuring minimal bias at the same time, one possibility in practice could be to choose $h_x$ on the basis of cross-validation for a local polynomial estimator of order smaller than the one assumed for the test. More specifically, if we estimate $\widehat{p}_i$ using a local constant estimator and select the bandwidth via cross-validation, we may choose $h_z$, the bandwidth of this estimator, to be of order $h_z = O(n^{-1/(4+d_z)})$. In this case, when $d_z <$
4, the bias is of order $n^{-2/(4+d_z)} = o(n^{-1/4})$, while for the standard deviation we obtain $(\sqrt{n h_z^{d_z}})^{-1} = o(n^{-1/4})$. On the contrary, if $d_z \ge$
4, we instead require a local polynomial estimator of order greater than one, as the order of the bandwidth selected by cross-validation is too large for $n h_x^{2r} \log(n) \to 0$ to hold. Taking $r = 3$ as an example, $h_x$ could be chosen by cross-validation for a local linear estimator, i.e. $h_x = O(n^{-1/(4+d_x)})$. This in turn implies that $n h_x^{2r} \log(n) \to 0$ and $n h_x^{d_x} / \log(n) \to \infty$ whenever $d_x < 2$.

Remark 3: While the above test restricts itself to $\mathcal{T}$, a compact subset of $(0, 1)$, and hence excludes extreme quantile ranks. A corresponding test statistic, which might be more suitable for extreme quantiles, could for instance be based on a weighted comparison of two empirical conditional distribution functions:
$$h^{(d_x+1)/2} \sum_{j=1}^{n} \left(\widehat{F}_{y|x,p,s=1}(y_j \mid x_j, \widehat{p}_j, s_j = 1) - \widehat{F}_{y|x,s=1}(y_j \mid x_j, s_j = 1)\right)^2 \omega(x_j, \widehat{p}_j),$$
where $\omega(x_j, \widehat{p}_j)$ is a non-negative weighting function, and $\widehat{F}$ denotes a kernel estimator of a nonparametric conditional distribution function. However, a limitation of this test is that the above statistic converges at a nonparametric rate, which depends on the dimension of the larger information set $(x_j', \widehat{p}_j)'$; see e.g. Corradi et al. (2019). This is not the case with the statistic presented here, which converges at a parametric rate.

Since the limiting distribution $Z_q$ depends on features of the data generating process, we derive a bootstrap approximation for it. In particular, we follow He and Zhu (2003), and use the bootstrap statistic:
$$Z^*_{q,n}\left(\tau, \underline{x}, \overline{x}, \underline{p}, \overline{p}\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} s_i \left(B_{i,\tau} - \tau\right) \Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\} \Big(\big(1\{\widehat{p}_i < \overline{p}\} - 1\{\widehat{p}_i < \underline{p}\}\big) \quad (8)$$
$$- \Big(\widehat{F}_{p|x,u_\tau,s=1}(\overline{p} \mid x_i, 0, s_i = 1) - \widehat{F}_{p|x,u_\tau,s=1}\big(\underline{p} \mid x_i, 0, s_i = 1\big)\Big)\Big),$$
where $B_{i,\tau} = 1\{U_i \le \tau\}$ with $U_i$ i.i.d.
$\sim U(0, 1)$, and $\widehat{F}_{p|x,u_\tau,s=1}(p \mid x_i, 0, s_i = 1)$ denotes a nonparametric kernel estimator with corresponding bandwidth sequence $h_F$ satisfying $h_F \to 0$ as $n \to \infty$ (see Equation (18) in the Appendix for a formal definition). (The restriction to interior quantile ranks seems innocuous: detecting sample selection among, say, the upper 5% or lower 5% of the (conditional) wage distribution does not appear to be of particular relevance for most applied researchers.) The bootstrap test statistic is then
$$Z^*_{q,n} = \sup_{\tau \in \mathcal{T},\, (\underline{x}, \overline{x}) \in \mathcal{X}^2,\, (\underline{p}, \overline{p}) \in \mathcal{P}^2} \left| Z^*_{q,n}\left(\tau, \underline{x}, \overline{x}, \underline{p}, \overline{p}\right) \right|.$$
Let $c^{*(1)}_{(1-\alpha),n,R}$ be the $(1-\alpha)$ percentile of the empirical distribution of $Z^{*}_{q,1,n}, \ldots, Z^{*}_{q,R,n}$, where $R$ is the number of bootstrap replications. The following theorem establishes the first order validity of inference based on the bootstrap critical values, $c^{*(1)}_{(1-\alpha),n,R}$.

Theorem 1*: Let Assumptions
A.1 - A.5 and
A.Q hold. If as $n \to \infty$, $(n h_x^{d_x}) / \log n \to \infty$, $n h_x^{2r} \log n \to 0$, $h_F \to 0$, $n h_F^{d_x+1} \to \infty$, and $R \to \infty$, then

(i) under $H^{(1)}_{0,q}$, $\lim_{n,R \to \infty} \Pr\left(Z_{q,n} \ge c^{*(1)}_{(1-\alpha),n,R}\right) = \alpha$;

(ii) under $H^{(1)}_{A,q}$, $\lim_{n,R \to \infty} \Pr\left(Z_{q,n} \ge c^{*(1)}_{(1-\alpha),n,R}\right) = 1$.

If we fail to reject the null hypothesis $H^{(1)}_{0,q}$, we can conclude that the propensity score is not an omitted predictor and thus that there is no endogenous selection. This is so because the Type II error approaches zero in probability. On the other hand, if we reject the null, this may be either due to genuine non-random selection or instead due to an omitted regressor which is correlated with the propensity score, as the first test does not have directed power against either of the alternatives. We therefore rely on an ‘identification at infinity’ argument, which in turn allows us to discriminate between both alternatives. More specifically, provided $p(z) = 1$ for some $z \in \mathcal{Z}$, under correct specification (i.e., in the absence of omitted predictors), whenever $p \to$
1, the selection bias vanishes and

$$\lim_{p \to 1} \Pr\big( y_i \le q_\tau(x_i) \mid x_i, p \big) = \tau.$$

By contrast, when $\pi_i$ is also a relevant predictor in the sense of Definition 1 of Section 2 for value(s) $x \in \mathcal{X}$ with $p(z)$ close to one, an assumption that we make explicit in condition A.8 below, then $\lim_{p\to 1} \Pr\big( y_i \le q_\tau(x_i) \mid x_i, p \big) \ne \tau$ with positive probability. This heuristic motivates the second statistic, based on observations with (estimated) propensity score close to one, for testing $H^{(2)}_{0,q}$ vs. $H^{(2)}_{A,q}$ as defined in (6) and (7).

A common concern in the context of 'identification at infinity' is so-called irregular identification (Khan and Tamer, 2010), where, although conditional quantiles are point identified, they cannot be estimated at a regular convergence rate, as the marginal density of $p_i$ may not be bounded away from zero at the evaluation point $p(z) = 1$. That is, heuristically, even if 'identification at infinity' holds and, for some value $z \in \mathcal{R}_z$, $p(z)$ can reach one, it is still possible that observations in the neighborhood of one are very sparse in practice ('thin density set'), so that convergence occurs at an irregular rate (Khan and Tamer, 2010). To address this issue, we only use observations from parts of the support where the density of $p_i$ is bounded away from zero. Formally, this is implemented by introducing a trimming sequence, converging to zero at a sufficiently slow rate, so that irregular identification is no longer a concern.
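As an illustration of this trimming device, the following sketch computes kernel weights that localize around propensity scores close to (but bounded away from) one. The rates used for $H$ and $h_p$ are hypothetical choices made only for illustration, picked so that $H \to 0$ and $H/h_p \to \infty$; they are not the paper's recommended tuning rules.

```python
import numpy as np

def trimming_weights(p_hat, C_H=1.0, C_hp=1.0):
    """Epanechnikov kernel weights localizing around p close to one.

    The trimming point is delta = 1 - H; only observations with estimated
    propensity score in (delta - h_p, delta + h_p) receive positive weight.
    The rates below are illustrative (assumptions, not from the paper):
    h_p ~ n^(-1/2) and H ~ n^(-1/4), so H -> 0 while H / h_p -> infinity.
    """
    n = len(p_hat)
    h_p = C_hp * n ** (-0.5)       # window width around the trimming point
    H = C_H * n ** (-0.25)         # trimming sequence, slower than h_p
    delta = 1.0 - H                # evaluation point bounded away from 1
    u = (p_hat - delta) / h_p
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return w, delta, h_p

# Only observations with p_hat in a shrinking window around 1 - H contribute.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.0, 1.0, size=10_000)
w, delta, h_p = trimming_weights(p_hat)
```

In a real implementation the scaling constants and rates would be chosen via the data-driven procedure discussed after Theorem 2 below.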
Thus, let $\delta = 1 - H$ with $H \to 0$ and $H/h_p \to \infty$ as $n \to \infty$, where $H$ governs the speed of the trimming sequence $\delta$, while $h_p$ defines the window width around $\delta$. Then, for some fixed set $(\underline{x}, \overline{x})$, the second test is based on the statistic

$$\frac{Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)}{\sqrt{\widehat{\mathrm{var}}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}} = \frac{\sum_{i=1}^{n} s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, K\big(\frac{\hat p_i - \delta}{h_p}\big)}{\Big(\int K^2(v)\,dv\,\sum_{i=1}^{n} s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)^2\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, K\big(\frac{\hat p_i - \delta}{h_p}\big)\Big)^{1/2}}. \qquad (9)$$

This statistic only uses observations with (estimated) propensity score $\hat p_i \in (1 - H - h_p, 1 - H + h_p)$, and thus overcomes the issue of possible irregular identification, as long as a sufficient number of observations are assumed to exist in this set (see below). Note here that the convergence speed of $H$ is inherently pegged to the tail behavior of the density of $p_i$ in the neighborhood of $p = 1$, which is of course unknown in practice. That is, the thinner the density tail of $p_i$, the slower $H$ has to go to zero. We discuss this issue, and a potential data-driven way to select $H$ and $h_p$ in given finite samples, after Theorem 2 further below.

In what follows, let $\xi_{j,n}$, $j \in \{1, \ldots, d_x\}$, be a deterministic sequence that, for each $j$, may converge to 0 or to some $\xi_j > 0$ as $n \to \infty$. Furthermore, let $\mathcal{C}_{0,n} \equiv \otimes_{j=1}^{d_x} [x_{0,j} - \xi_{j,n},\, x_{0,j} + \xi_{j,n}]$, for some point $x_0 \in \mathcal{X}$ defined in A.6 below. Note that $\mathcal{C}_{0,n} \subseteq \otimes_{j=1}^{d_x} \big[\underline{x}_j, \overline{x}_j\big]$. Finally, define $G_{x_i}(\tau, 1 - H) \equiv \Pr(y_i \le q_\tau(x_i) \mid x_i, 1 - H)$, and note that under $H^{(2)}_{0,q}$ it holds that $\lim_{H\to 0} G_{x_i}(\tau, 1 - H) = \tau$ for every $\tau \in \mathcal{T}$ and $x_i \in \lim_{n\to\infty} \mathcal{C}_{0,n}$, and that $\lim_{H\to 0} \Pr(s_i = 1 \mid 1 - H) = 1$. We make the following additional assumptions:

A.6
Assume there exists at least one point $x_0 \in \mathcal{X}$ such that, for at least one $z \in \mathcal{R}_z$, it holds that $p(z) = 1$. Moreover, there exist strictly positive, continuous, and integrable functions $g_{y,x,p}(y, x, 1)$ and $g_{x,p}(x, 1)$ such that for all $x \in \mathcal{C}_{0,n}$ and $y \in \mathcal{R}_y$:

$$\sup_{x \in \mathcal{C}_{0,n},\, y \in \mathcal{R}_y} \left| \frac{f_{y,x,p}(y, x, 1 - H)}{g_{y,x,p}(y, x, 1)} H^{\eta} - 1 \right| \to 0 \quad \text{and} \quad \sup_{x \in \mathcal{C}_{0,n}} \left| \frac{f_{x,p}(x, 1 - H)}{g_{x,p}(x, 1)} H^{\eta} - 1 \right| \to 0$$

as $n \to \infty$, for some $0 \le \eta < 1$. Moreover, for all $x \notin \mathcal{C}_{0,n}$ and $y \in \mathcal{R}_y$, it holds that $f_{y,x,p}(y(x), x, 1 - H) = 0$ for every $n$.

A.7
The distribution function $F_{y|x,p,s=1}(\cdot \mid \cdot, \cdot, \cdot)$ of $y_i$ given $x_i$, $p_i$, and selection $s_i = 1$ has a continuous probability density function $f_{y|x,p,s=1}(y \mid x, p, s = 1)$ w.r.t. Lebesgue measure. The functions $f_{y|x,p,s=1}(y_i \mid x_i, p_i, s_i = 1)$, $f_{p|x,s=1}(p_i \mid x_i, s_i = 1)$, $\Pr(s_i = 1 \mid p_i)$, and $f_{x,p}(x_i, p_i)$ are continuously differentiable w.r.t. $p_i$ on $(0, 1)$ for all $x_i \in \mathcal{X}$ and $z_i \in \mathcal{Z}$. Moreover, assume that for every $x \in \mathcal{X}$ and $y$, $f_{y,x,p}(y, x, \cdot)$, $f_{x,p}(x, \cdot)$, and $f_p(\cdot)$ are left-continuous at $p = 1$.

A.8
The set of $x \in \mathcal{X}$ for which $\pi_i$ is a relevant predictor is a subset of $\mathcal{C}_{0,n}$ (see Assumption A.6).

A.9
Assume that for all $x \in \mathcal{X}$ and $\tau \in \mathcal{T}$, there exist positive constants $C(x)$ and $C$ such that:

$$\big|G_x(\tau, 1 - H) - G_x(\tau, 1)\big| \le C(x)\, H^{1-\eta} \quad \text{as well as} \quad \big|\Pr(s_i = 1 \mid 1 - H) - 1\big| \le C H^{1-\eta}.$$

Moreover, the partial derivatives $\sup_{y \in \mathcal{R}_y,\, x \in \mathcal{X},\, p \in (0,1)} |\nabla_p f_{y|x,p,s=1}(y \mid x, p, s = 1)|$, $\sup_{x \in \mathcal{X},\, p \in (0,1)} |\nabla_p f_{p|x,s=1}(p \mid x, s = 1)|$, $\sup_{p \in (0,1)} |\nabla_p \Pr(s_i = 1 \mid p)|$, and $\sup_{x \in \mathcal{X},\, p \in (0,1)} |\nabla_p f_{x,p}(x, p)|$ are bounded.

Assumption A.6 requires identification at infinity for at least some values of the covariates. In particular, we allow for both the case of $\xi_{j,n} = \xi_j > 0$ and the case of $\xi_{j,n} \to 0$ as $n \to \infty$. The case of $\xi_{j,n} = \xi_j > 0$ for all $j \in \{1, \ldots, d_x\}$ corresponds to the case of strong support, as we require the propensity score to approach one for all $x$'s in a set of non-zero Lebesgue measure in a compact subset of $\mathbb{R}^{d_x}$. In the case when instead $\xi_{j,n} \to 0$ for some, but not all, $j$, we require identification at infinity over a subset of non-zero Lebesgue measure in a compact subset of $\mathbb{R}^{d_x'}$, with $d_x' < d_x$. Finally, if $\xi_{j,n} \to 0$ as $n \to \infty$ for all $j$, we only search over an interval shrinking to a singleton in $\mathcal{X}$. Furthermore, in all cases we allow for so-called irregular support, in the sense that $f_{x,p}(x, 1)$ is not necessarily bounded away from zero at $p = 1$. In fact, when $\eta = 0$, $\lim_{H\to 0} f_{x,p}(x, 1 - H)$ is bounded away from zero for all $x \in \mathcal{C}_{0,n}$, while this is no longer the case when $\eta > 0$ (with larger $\eta$ representing thinner tails). That is, if $\eta > 0$, we allow for a thin set of observations with a propensity score close to one. Similarly, when $\eta = 0$, the first part of A.9 becomes a standard Lipschitz condition, while as $\eta$ gets closer to one and the tails of the densities in A.6 become thinner, we allow $G_x(\tau, 1 - H)$ and $\Pr(s_i = 1 \mid 1 - H)$ to approach $G_x(\tau, 1)$ and 1, respectively, at a slower rate. Finally, Assumption A.8 is crucial for the test to have directed power against misspecification, since it postulates that omitted predictors $\pi_i$, if present, are correlated with the event $\{p_i = 1\}$ or, more specifically, $\{p_i \in (1 - H - h_p,\, 1 - H + h_p)\}$.

The rate of convergence of the numerator in (9) depends both on $\big(\Pi_{j=1}^{d_x} \xi_{j,n}\big)$, the measure of the set $\mathcal{C}_{0,n}$, and on $H^{\eta}$, the tail behavior of the density $f_{x,p}(x, p)$ around $p = 1$, which are both of course unknown in practice. In fact, we are generally ignorant about the rate of convergence, given by $\sqrt{n h_p \big(\Pi_{j=1}^{d_x} \xi_{j,n}\big) H^{\eta}}$, which may in principle be as fast as $\sqrt{n h_p}$. To address this problem, we use a studentized statistic, which allows the convergence rate to vary depending on both the measure of the set of $x$ for which $p(z) = 1$ and on the sparsity of observations around $p$ close to one. That is, as we cannot infer the appropriate scaling factor, it is crucial that, regardless of the 'strength' of the support and the degree of thinness of the set of observations with propensity score close to one, $Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big/\sqrt{\widehat{\mathrm{var}}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}$ remains self-normalized. We treat $Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big/\sqrt{\widehat{\mathrm{var}}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}$ as an empirical process over $\tau \in \mathcal{T}$, but for fixed values of the interval extremes $\underline{x}$ and $\overline{x}$. The reason is that, for the case of $\xi_n \to 0$, the statistic does not depend on $\underline{x}, \overline{x}$, as it vanishes outside a shrinking interval of $x_0$.

Theorem 2:
Let Assumption
A.1 , A.3 , A.5 , A.6 , A.7 , A.8 , A.9 , and
A.Q hold. If as $n \to \infty$, $(n h_x^{d_x})/\log n \to \infty$, $n h_x^{2r}\log n \to 0$, $H \to 0$, $H/h_p \to \infty$, $n h_p H^{2-\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big) \to 0$, and $n h_p \big(\prod_{j=1}^{d_x} \xi_{j,n}\big) H^{\eta} \to \infty$, then

(i) under $H^{(2)}_{0,q}$ (under which $G_x(\tau, 1) = \tau$ almost surely),

$$\sup_{\tau \in \mathcal{T}} \left| \frac{Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)}{\sqrt{\widehat{\mathrm{var}}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}} \right| \Rightarrow \mathcal{Z}^{(2)}_{q},$$

where $\mathcal{Z}^{(2)}_{q}$ is the supremum of a zero mean Gaussian process with covariance kernel defined in the proof of Theorem 2;

(ii) under $H^{(2)}_{A,q}$, there exists $\varepsilon > 0$ such that

$$\lim_{n \to \infty} \Pr\left( \sup_{\tau \in \mathcal{T}} \left| \frac{Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)}{\sqrt{\widehat{\mathrm{var}}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}} \right| > \varepsilon \right) = 1.$$

Theorem 2 establishes the limiting distribution of the studentized statistic. As the theoretical results crucially hinge on the tuning parameters $H$ and $h_p$, whose rates depend in turn on the unknown $\xi_{j,n}$ and $\eta$, a discussion of their choice in practice is warranted. In fact, a possible data-driven choice of these parameters, without claiming optimality of a specific kind, could be as follows: as shown in the supplementary material, one may re-write $h_p$ and $H$, which is a function of $h_p$ itself, as functions of $\eta$ only, i.e. $h_p(\eta) = C n^{-(1-\eta)-\varepsilon+\varepsilon\eta}\log(n)^{-1}$ and $H(\eta) = h_p(\eta)^{1-\varepsilon}$ for some arbitrary $\varepsilon > 0$, $\eta < \overline{\eta}$, and some scaling constant $C$. Here, $\overline{\eta}$ represents the threshold value with the slowest possible convergence rate still satisfying the rate conditions of Theorem 2. Thus, in order to 'choose' the smallest possible $\eta$ in practice, which in turn corresponds to the fastest possible convergence rate, one could for instance plot

$$\frac{1}{n h_p(\eta)} \sum_{i=1}^{n} K\left( \frac{\hat p_i - (1 - H(\eta))}{h_p(\eta)} \right)$$

for a given $\varepsilon > 0$ (e.g. $\varepsilon = 0.1$) on a grid of different $\eta$ values with $\eta \in [0, \overline{\eta}]$, choosing $\hat\eta$ as the smallest value for which the estimated density is bounded away from zero, e.g. above some small minimum threshold value. In fact, if the set of $\hat p_i$ close to 1 is not 'thin', we would expect to select $\hat\eta = 0$ in large enough samples with this type of procedure.

Finally, note that an alternative test for selection against omitted relevant predictors could in principle be based on the null $q_\tau(x,$
$1) = q_\tau(x, p)$ for all $\tau \in \mathcal{T}$ and for some $x$ for which identification at infinity holds. A statistic for this null could be constructed using the weighted difference of the two corresponding estimators of $q_\tau(x, p)$ and $q_\tau(x, 1)$. Hence, when constructing the wild bootstrap statistic, we do not have to 'subtract' an estimator of the conditional distribution of $p_i$. On the other hand, as the rate of convergence depends on the 'degree' of irregular identification at $p$ close to 1 and on the set of covariates for which identification at infinity holds, we also need an appropriately studentized bootstrap statistic, i.e.

$$Z^{*(2)}_{q,n} = \sup_{\tau \in \mathcal{T}} \left| \frac{Z^{*(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)}{\sqrt{\widehat{\mathrm{var}}^{*}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big)}} \right|, \qquad (10)$$

where

$$Z^{*(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1) = \sum_{i=1}^{n} s_i (B_{i,\tau} - \tau)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, K\left(\frac{\hat p_i - \delta}{h_p}\right)$$

with $B_{i,\tau} = 1\{U_i \le \tau\}$, $U_i$ i.i.d. $\sim U(0,1)$ and independent of the sample, and

$$\widehat{\mathrm{var}}^{*}\big(Z^{(2)}_{q,n}(\tau, \underline{x}, \overline{x}, 1)\big) = \left( \frac{1}{n}\sum_{i=1}^{n} (B_{i,\tau} - \tau)^2 \right) \int K^2(v)\,dv \sum_{i=1}^{n} s_i\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, K\left(\frac{\hat p_i - \delta}{h_p}\right). \qquad (11)$$

By noting that $\frac{1}{n}\sum_{i=1}^{n}(B_{i,\tau} - \tau)^2 = \tau(1-\tau) + o^{*}_p(1)$, given (11), we see that whenever identification at infinity holds at all $x \in \mathcal{X}$ and the number of observations with propensity score in the interval $(1 - H - h_p,\, 1 - H + h_p)$ grows at rate $n h_p$, then both numerator and denominator in (10) are bounded in probability; otherwise, they diverge at the same rate.

Let $c^{*(2)}_{(1-\gamma),n,R}$ be the $(1-\gamma)$ percentile of the empirical distribution of $Z^{*(2)}_{q,1,n}, \ldots, Z^{*(2)}_{q,R,n}$, where $R$ is the number of bootstrap replications. The following Theorem establishes the first order validity of inference based on the bootstrap critical values, $c^{*(2)}_{(1-\gamma),n,R}$.

Theorem 2*: Let Assumption
A.1 , A.3 , A.5 , A.6 , A.7 , A.8 , A.9 , and
A.Q hold. If as $n \to \infty$, $(n h_x^{d_x})/\log n \to \infty$, $n h_x^{2r}\log n \to 0$, $H \to 0$, $H/h_p \to \infty$, $n h_p H^{2-\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big) \to 0$, $n h_p \big(\prod_{j=1}^{d_x} \xi_{j,n}\big) H^{\eta} \to \infty$, and $R \to \infty$, then

(i) under $H^{(2)}_{0,q}$: $\lim_{n,R\to\infty} \Pr\big( Z^{(2)}_{q,n} \ge c^{*(2)}_{(1-\gamma),n,R} \big) = \gamma$;

(ii) under $H^{(2)}_{A,q}$: $\lim_{n,R\to\infty} \Pr\big( Z^{(2)}_{q,n} \ge c^{*(2)}_{(1-\gamma),n,R} \big) = 1$.

Theorem 2* establishes the first order validity of inference based on wild bootstrap critical values. Under $H^{(2)}_{0,q}$, the studentized statistic and its bootstrap counterpart have the same limiting distribution. Under $H^{(2)}_{A,q}$, the statistic diverges, as the numerator is of larger probability order than the denominator, while the bootstrap statistic remains bounded in probability.

As detailed in the introduction, if we fail to reject the null hypothesis of the first test, we decide against endogenous selection, which allows us to rely on nonparametric estimators of the conditional quantiles using all selected individuals in the data. On the other hand, if we reject the first test but fail to reject the second one, one may still estimate the conditional quantile function(s) using only individuals with propensity score close to one, e.g. as in (17) in the Appendix. Finally, if the null hypotheses of both tests are rejected, there is evidence for relevant omitted predictor(s) (and possibly endogenous selection), and neither the estimator using all selected individuals nor the one using only those with propensity score close to one will deliver estimates consistent for the conditional quantile function(s) of interest.

Thus, there are three cases to be distinguished, namely no sample selection ($H_{NS,q}$), sample selection only ($H_{S,q}$), and misspecification possibly with sample selection ($H_{M,q}$). We have that:

$$H_{NS,q} = H^{(1)}_{0,q}, \qquad (12)$$
$$H_{S,q} = H^{(1)}_{A,q} \cap H^{(2)}_{0,q}, \qquad (13)$$

and finally:

$$H_{M,q} = H^{(1)}_{A,q} \cap H^{(2)}_{A,q}. \qquad (14)$$

This means that we decide for no selection if we fail to reject $H^{(1)}_{0,q}$, but decide for selection only if we reject $H^{(1)}_{0,q}$ and fail to reject $H^{(2)}_{0,q}$. By contrast, if we reject both $H^{(1)}_{0,q}$ and $H^{(2)}_{0,q}$, we opt for misspecification (and selection).

We now formalize the rules for differentiating among these three cases. Let $c^{*(1)}_{(1-\alpha),n,R}$ and $c^{*(2)}_{(1-\gamma),n,R}$ be, respectively, the $(1-\alpha)$ and $(1-\gamma)$ bootstrap critical values for the quantile case (as defined in Theorem 1* and Theorem 2*). Based on the outcome of the first and second test, we devise the following decision rule.

Rule RS:
(1) If $Z_{q,n} \le c^{*(1)}_{(1-\alpha),n,R}$, we decide that $H_{NS,q}$ is true. That is, we decide in favor of no selection.
(2) If $Z_{q,n} \ge c^{*(1)}_{(1-\alpha),n,R}$ and $Z^{(2)}_{q,n} \le c^{*(2)}_{(1-\gamma),n,R}$, we decide that $H_{S,q}$ is true. That is, we decide in favor of selection only.
(3) If $Z_{q,n} \ge c^{*(1)}_{(1-\alpha),n,R}$ and $Z^{(2)}_{q,n} \ge c^{*(2)}_{(1-\gamma),n,R}$, we decide that $H_{M,q}$ is true. That is, we decide in favor of misspecification, or misspecification and selection.

The Theorem below establishes the validity of our procedure by showing that the mis-classification probabilities (e.g., deciding for no selection when there is selection and/or misspecification) are asymptotically controlled by our decision rule at pre-specified levels.
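Operationally, the rule above is just a pair of threshold comparisons; the following minimal sketch takes the (already computed) statistics and bootstrap critical values as inputs, with hypothetical variable names:

```python
def rule_rs(z1, c1, z2, c2):
    """Two-step decision rule (sketch of Rule RS).

    z1, c1: first-test statistic and its (1 - alpha) bootstrap critical value.
    z2, c2: second-test (studentized, sup over tau) statistic and its
            (1 - gamma) bootstrap critical value.
    """
    if z1 <= c1:
        return "no selection"        # H_NS: fail to reject the first test
    if z2 <= c2:
        return "selection only"      # H_S: reject first, fail to reject second
    return "misspecification (possibly with selection)"  # H_M: reject both
```

Only when the first test rejects does the second statistic enter the decision; under the "selection only" outcome one would then proceed with the 'identification at infinity' estimator (17) in the Appendix.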
Theorem RS
Let all the Assumptions and the rate conditions in Theorems 1, 1*, 2, and 2* hold. Then,

(i) $\lim_{n\to\infty} \Pr(\text{choose } H_{NS,q} \text{ or } H_{S,q} \mid H_{M,q} \text{ is true}) = 0$,
(ii) $\lim_{n\to\infty} \Pr(\text{choose } H_{S,q} \text{ or } H_{M,q} \mid H_{NS,q} \text{ is true}) \le \alpha$, and
(iii) $\lim_{n\to\infty} \Pr(\text{choose } H_{M,q} \text{ or } H_{NS,q} \mid H_{S,q} \text{ is true}) \le \gamma$.

Since the estimator we use depends on the outcome of a testing procedure, one may be concerned about the size problem arising when one fails to reject a Hausman test of endogeneity and then conducts inference based on OLS estimators, as outlined in Guggenberger (2010a,b). In terms of (2) in Remark 1, suppose that the correlation between $u_i$ and $p_i$ is weak, say of order $n^{-1/2}$. In this case, we may fail to reject the null of no selection, and thus move on to estimate conditional quantiles using all selected individuals. If inference is based on a nonparametric quantile estimator, this estimator converges to its limiting distribution at a rate slower than $n^{1/2}$, and thus pre-testing would not represent a problem. It is only in the case where we decide to conduct inference using an estimator for a parametric conditional quantile function that the issue of size distortion may arise, as in the set-up of Guggenberger (2010a,b). Moreover, while in the Hausman pre-testing case the cost of always using instrumental variables is only in terms of efficiency loss, in our context the cost lies predominantly in a much slower convergence rate.

As pointed out in the previous section, our testing procedure cannot disentangle selection and omitted regressors when the latter are uncorrelated with $p_i$ when $p_i$ takes on values close to one. In this case, we still decide in favor of selection, and so we estimate the quantiles using only observations with propensity score close to one.
However, even in this case we make the 'right' decision, in the sense that for observations with propensity score close to one, omitted predictor bias is not present. Finally, it is also noteworthy that our testing procedure only requires a second stage when we are unsure about the correct specification of the outcome function and the correlation of the unobserved factors with the instrument(s): when the instrumental variable(s) are free of these concerns, e.g. because they have been constructed on the basis of a randomized control trial, a second stage is not required.

Our illustration is based on a subsample of the UK wage data from the Family Expenditure Survey used by Arellano and Bonhomme (2017a). As pointed out by these authors, due to changes in employment rates over time, simply examining wage inequality for females and males at work over time may provide a distorted picture of market-level wage inequality. We will therefore run our selection testing procedure on two different subsets of the data, namely 1995 to 1997, a period of increasing gross domestic product (GDP) growth rates, and 1998 to 2000, a period of high but stable GDP growth rates. Unlike Arellano and Bonhomme (2017a), however, our testing procedure for selection will not rely on a parametric specification of the conditional log-wage quantile functions, but remains completely nonparametric.

The covariates we include in $x_i$ are dummies for marital status, education (end of schooling at 17 or 18, and end of schooling after 18), location (eleven regional dummies), number of kids (split by six age categories), time (year dummies), as well as age in years. This set of covariates is identical to the one used by Arellano and Bonhomme (2017a), except that the latter used cohort dummies instead of age in years. The continuous instrumental variable is given by the measure of potential out-of-work (welfare) income, interacted with marital status.
This variable, which was also used by Arellano and Bonhomme (2017a), builds on Blundell et al. (2003) and is constructed for each individual in the sample (employed and non-employed) using the Institute for Fiscal Studies (IFS) tax and welfare-benefit simulation model. (For the exact construction of the sample, see their paper and references therein.)

The final sample for the years 1995-1997 comprises 21,263 individuals, 11,647 of which are females and 9,616 of which are males. The number of working females (males) with a positive log hourly wage in that sample is 7,761 (7,623). By contrast, for the 1998-2000 period we obtain 16,350 observations, 8,904 females and 7,446 males. The numbers of working females (males) in that sample are 5,931 (6,058).

All estimates are constructed using routines from the np package of Hayfield and Racine (2008). More specifically, we estimate the propensity score $\Pr(s_i = 1 \mid z_i)$ fully nonparametrically using a standard kernel estimator, with bandwidths determined by cross-validation on subsamples ($n = 450$), selecting the median values over 50 replications. The conditional quantile function $q_\tau(x_i)$ is estimated as in Equation (19) of Li and Racine (2008), while the conditional distribution function $F_{p|x,u_\tau,s=1}(\cdot \mid \cdot, \cdot, \cdot)$ is constructed as in Equation (4) of the same paper. The bandwidths are again determined as before. The quantile grid is chosen to be $\mathcal{T} = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$.

To provide the reader with a better illustration of the potential magnitudes of selection into work, we replicate the predictions from the (estimated) parametric conditional quantile functions from Figure 1 of Arellano and Bonhomme (2017a, p.16) for the sub-periods 1995-1997 and 1998-2000; see Figures 1 and 2, respectively. In these pictures, solid lines represent estimated uncorrected (for selection) conditional log-wage quantile functions, while dashed lines are the ones corrected for sample selection. Throughout, female quantile lines lie below the male quantile lines.
The figures, which display selection corrections based on a linear quantile regression model and parametric selection correction, show little difference between the original and the corrected lines for both males and females for the 1995-1997 period (except for males at lower percentile levels), but more pronounced differences for the subsequent 1998-2000 period. In terms of magnitude, the effect of the correction appears to be generally bigger for males than for females in both subperiods. (For the exact specification used, see Arellano and Bonhomme (2017a).)

Turning to the test results in Table 1, we see that while we cannot find any evidence for selection during the 1995-1997 period for females at conventional significance levels, there is some evidence for males at the 10% significance level. In fact, taking a closer look at the results in Table 1, we observe that rejection for males occurs on the basis of the 10th percentile, which is in line with the graphical evidence in Figure 1. Switching over to the right panel of Table 1, however, we obtain a different picture: for females, $H^{(1)}_{0,q}$ is rejected at any conventional level, and rejection is most pronounced at the 20th and the 30th percentile. On the other hand, we cannot reject $H^{(1)}_{0,q}$ for males. This failure to reject $H^{(1)}_{0,q}$ for males is in contrast to the graphical evidence in Figure 2, and highlights the importance of formal testing under a more flexible specification. (Recall that there are no continuous covariates contained in $x_i$, and thus the theoretical rate conditions do not directly apply here.)

Following our testing procedure outlined in Section 2, we perform the second test for males in the 1995-1997 period, and for females in the 1998-2000 period (see Table 2). Turning to the results, we reject the null of correct specification of $q_\tau(x_i)$ for males, but fail to reject that null for females at any conventional levels. Both results appear to be robust to different choices of $\delta$ and $h_p$.
Thus, under the assumption that out-of-work income is a valid instrument and that selection indeed enters the outcome as postulated in Equation (1), our test results suggest that there is evidence for selection among females for the 1998-2000 period, but not for males. In fact, what appears to be selection among males during the 1995-1997 period may actually be attributed to misspecification of the conditional quantile function. (In a related paper, Kitagawa (2010) tested for the validity of the same instrumental variable (but for the interaction with marital status) in a similar data set on the basis of the UK Family Expenditure Survey used by Blundell et al. (2007). Although his test results are not directly informative here, as his test is run on a much coarser set of covariates $x_i$, not including e.g. regional, marital, or family information, his evidence suggested that the conditional independence of the instrument and outcome (given $x_i$ and selection $s_i = 1$) may indeed be violated for some sub-groups (in particular, younger males with moderate levels of education). Thus, rejection in the second test for males could also be related to this feature.)

[Figure 1 with panels (a)-(i) for $\tau = 10\%, 20\%, \ldots, 90\%$ omitted.]

Figure 1: Corrected and Uncorrected Log Hourly Wage Quantiles by Gender 1995-1997 (Arellano and Bonhomme, 2017a).
Note: male quantiles are always at the top, female ones at the bottom (solid lines: uncorrected quantiles; dashed lines: selection corrected quantiles).

[Figure 2 with panels (a)-(i) for $\tau = 10\%, 20\%, \ldots, 90\%$ omitted.]

Figure 2: Corrected and Uncorrected Log Hourly Wage Quantiles by Gender 1998-2000 (Arellano and Bonhomme, 2017a).
Note: male quantiles are always at the top, female ones at the bottom (solid lines: uncorrected quantiles; dashed lines: selection corrected quantiles).
[Table 1 (first test): test results by quantile for females and males over the 1995-1997 (left panel) and 1998-2000 (right panel) periods; the numerical entries are not recoverable from this source.]

Note: Number of Bootstrap Replications is 400.
This paper introduces a novel testing procedure to detect sample selection in conditional quantile functions, without imposing parametric assumptions on either the outcome or the selection equation. This is accomplished via two tests, the first of which is an omitted predictor test, with the estimated propensity score as omitted predictor. As with any omnibus test, rejection in the first step can be due either to selection or to the omission of a predictor which is correlated with the estimated propensity score. Since selection and misspecification have very different implications for the estimation of nonparametric (conditional) quantile functions, we aim at disentangling the two if we reject in the first step. That is, after rejection in the first test, we proceed to the second test, which is a localized version of the first test, using only observations with (estimated) propensity score close to one. A rejection in this case indicates the presence of misspecification, possibly in conjunction with selection. Importantly, the second test, although relying on 'identification at infinity', allows for irregular identification by using observations close, but not too close, to one. We establish the first order validity of bootstrap critical values based on the wild bootstrap.

In our empirical illustration, we test for sample selection in log hourly wages of females and males in the UK using data from the UK Family Expenditure Survey. Using the periods 1995-1997 and 1998-2000 as examples, we find evidence for selection among females for the 1998-2000 period, but not for males. In fact, what appears to be selection among males during the 1995-1997 period may actually be attributed to misspecification of the conditional quantile function.
[Table 2 (second test): studentized test statistics for males (1995-1997) and females (1998-2000) across different choices of $\delta$ and $h_p$; the numerical entries are not recoverable from this source.]

Note: Number of Bootstrap Replications is 1,000.
Nonparametric Estimators: As detailed in the main text, to estimate the conditional quantile function at some point $x_i = x$, we use an $r$-th order local polynomial estimator based on the standard 'check type' objective function $l_\tau(v) = 2v(\tau - 1\{v \le 0\})$. (In the discrete case, we can set up a local constant estimator in the direction of the discrete elements.) The local polynomial estimator is then given by:

$$\hat b_{h_x}(\tau, x) = \arg\min_{b} \frac{1}{n h_x^{d_x}} \sum_{i=1}^{n} l_\tau\Big( y_i - b_0 - \sum_{1 \le |t| \le r} b_t (x_i - x)^t \Big)\, s_i\, K\Big(\frac{x_i - x}{h_x}\Big), \qquad (15)$$

and is an estimator of $b^{\dagger}_{h_x}(\tau, x)$ with

$$b^{\dagger}(\tau, x) = \arg\min_{b} \lim_{n \to \infty} \frac{1}{n h_x^{d_x}} \sum_{i=1}^{n} E\Big[ l_\tau\Big( y_i - b_0 - \sum_{1 \le |t| \le r} b_t (x_i - x)^t \Big)\, s_i\, K\Big(\frac{x_i - x}{h_x}\Big) \Big]. \qquad (16)$$

Here, $K(\cdot)$ denotes a $d_x$ dimensional product kernel, and, borrowing notation from Masry (1996), $t = (t_1, \ldots, t_{d_x})'$, $|t| = \sum_{j=1}^{d_x} t_j$, and $\sum_{0 \le |t| \le r} = \sum_{j=0}^{r} \sum_{t_1=0}^{j} \ldots \sum_{t_{d_x}=0}^{j}$. We use $\hat q_\tau(x) = \hat b_{0,h_x}(\tau, x)$, the first element of $\hat b_{h_x}(\tau, x)$, and set $q^{\dagger}_\tau(x) = b^{\dagger}_0(\tau, x)$. If the quantile functions are estimated using only observations with propensity score close to one, a corresponding estimator can be defined as:

$$\hat{\tilde b}_{h_x}(\tau, x) = \arg\min_{b} \frac{1}{n h_x^{d_x} h_p} \sum_{i=1}^{n} l_\tau\Big( y_i - b_0 - \sum_{1 \le |t| \le r} b_t (x_i - x)^t \Big)\, s_i\, K\Big(\frac{x_i - x}{h_x}\Big) K\Big(\frac{\hat p_i - \delta_n}{h_p}\Big). \qquad (17)$$

Finally, the kernel estimator of $F_{p|x,u_\tau,s=1}(p \mid x_i, s_i = 1)$ used to construct the bootstrap statistic of the first test is given by:

$$\hat F_{p|x,u_\tau,s=1}(p \mid x_i, s_i = 1) = \frac{\frac{1}{n h_F^{d_x+1}} \sum_{j=1}^{n} s_j\, 1\{\hat p_j \le p\}\, K\big(\frac{\hat u_{j,\tau}}{h_F}\big) K\big(\frac{x_j - x_i}{h_F}\big)}{\frac{1}{n h_F^{d_x+1}} \sum_{j=1}^{n} s_j\, K\big(\frac{\hat u_{j,\tau}}{h_F}\big) K\big(\frac{x_j - x_i}{h_F}\big)}. \qquad (18)$$

Auxiliary Lemmas: In the following, let $E_{S_n}[\cdot]$ denote the expectation operator conditional on the actual sample realizations. Moreover, since $1\{\underline{p} \le \hat p_i \le \overline{p}\} = 1\{\hat p_i \le \overline{p}\} - 1\{\hat p_i \le \underline{p}\}$, we will ignore the part of the statistic which involves $1\{\hat p_i \le \underline{p}\}$ in the sequel.

Lemma 1: Let Assumptions
A.1 - A.5 and
A.Q hold. Moreover, let $h_x$ denote a deterministic bandwidth sequence that satisfies $h_x \to 0$ as $n \to \infty$. If as $n \to \infty$, $(n h_x^{d_x})/\log n \to \infty$ and $n h_x^{2r}\log n \to 0$, then uniformly over $\mathcal{T}$, $\mathcal{X}$, and $\mathcal{P}$:

(i) Under $H^{(1)}_{0,q}$:

$$\sqrt{n}\, E_{S_n}\Big[ s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, 1\{\hat p_i \le \overline{p}\} - s_i\big(1\{u_\tau(x_i) \le 0\} - \tau\big)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, 1\{p_i \le \overline{p}\} \Big]$$
$$= -\frac{1}{\sqrt{n}} \sum_{j=1}^{n} F_{p|x,u_\tau,s=1}(\overline{p} \mid x_j, s_j = 1)\, \big(s_j(1\{u_\tau(x_j) \le 0\} - \tau)\big)\,\Pi_{l=1}^{d_x} 1\{\underline{x}_l < x_{l,j} < \overline{x}_l\} + o_p(1).$$

(ii) Under $H^{(1)}_{A,q}$:

$$\sqrt{n}\, E_{S_n}\Big[ s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, 1\{\hat p_i \le \overline{p}\} - s_i\big(1\{u_\tau(x_i) \le 0\} - \tau\big)\,\Pi_{j=1}^{d_x} 1\{\underline{x}_j < x_{j,i} < \overline{x}_j\}\, 1\{p_i \le \overline{p}\} \Big] = O_p\left( \frac{\ln(n)}{\sqrt{h_x^{d_x}}} \right).$$

Lemma 2: Let Assumptions
A.1 , A.3 , A.5 , A.6 , A.7 , A.8 , A.9 , and
A.Q hold. If as $n \to \infty$, $(n h_x^{d_x})/\log n \to \infty$, $n h_x^{2r}\log n \to 0$, $H \to 0$, $H/h_p \to \infty$, $n h_p H^{2-\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big) \to 0$, and $n h_p \big(\prod_{j=1}^{d_x} \xi_{j,n}\big) H^{\eta} \to \infty$, then uniformly over $\mathcal{T}$:

(i)
$$\frac{n}{\sqrt{n h_p H^{\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big)}}\, E_{S_n}\Big[ s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{\hat p_i - \delta}{h_p}\Big) - s_i\big(1\{u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{p_i - \delta}{h_p}\Big) \Big] = o_p(1);$$

(ii)
$$\frac{1}{\sqrt{n h_p H^{\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big)}} \sum_{i=1}^{n} \left( s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{\hat p_i - \delta}{h_p}\Big) - E_{S_n}\Big[ s_i\big(1\{\hat u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{\hat p_i - \delta}{h_p}\Big) \Big] \right)$$
$$- \frac{1}{\sqrt{n h_p H^{\eta}\big(\prod_{j=1}^{d_x} \xi_{j,n}\big)}} \sum_{i=1}^{n} \left( s_i\big(1\{u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{p_i - \delta}{h_p}\Big) - E_{S_n}\Big[ s_i\big(1\{u_\tau(x_i) \le 0\} - \tau\big)\, 1\{x_i \in \mathcal{C}_{0,n}\}\, K\Big(\frac{p_i - \delta}{h_p}\Big) \Big] \right) = o_p(1).$$

Lemma 3: Let Assumptions
A.1 , A.3 , A.5 , A.6 , A.7 , A.8 , A.9 , and
A.Q hold. If as n →∞ , ( nh d x x ) / log n → ∞ , nh rx log n → , H → , H/h p → ∞ , nh p H − η (cid:16)(cid:81) d x j =1 ξ j,n (cid:17) →
0, and nh p (cid:16)(cid:81) dj =1 ξ j,n (cid:17) H η → ∞ , then: 30 i) Under H (2)0 ,q and H (2) A,q , pointwise in τ ∈ T : (cid:80) ni =1 s i (1 { u τ ( x i ) ≤ } − G x ( τ, (cid:81) d x j =1 (cid:8) x j ≤ x i,j ≤ x j (cid:9) K (cid:16) p i − δh p (cid:17)(cid:114)(cid:82) K ( v ) dv (cid:80) ni =1 s i (1 { u τ ( x i ) ≤ } − G x ( τ, Π d x j =1 { x j < x j,i < x j } K (cid:16) p i − δh p (cid:17) d → N (0 , . (ii) Under H (2)0 ,q , uniformly in τ ∈ T :1 (cid:114) nh p H η (cid:16)(cid:81) d x j =1 ξ j,n (cid:17) n (cid:88) i =1 s i ( G x i ( τ, p i ) − τ )1 { x i ∈ C ,n } K (cid:18) p i − δh p (cid:19) = o p (1) Proofs of Theorem 1 and 2 : Proof of Theorem 1 : (i) Start by noting that we can decompose Z ,n ( τ, x, x, p ) as follows:1 √ n n (cid:88) i =1 s i (1 { (cid:98) u τ ( x i ) ≤ } − τ )Π d x j =1 { x j < x j,i < x j } { (cid:98) p i ≤ p } = 1 √ n n (cid:88) i =1 s i (1 { u τ ( x i ) ≤ } − τ )Π d x j =1 { x j < x j,i < x j } { p i ≤ p } + √ nE S n (cid:34) s i (1 { (cid:98) u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { (cid:98) p i ≤ p }− s i (1 { u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { p i ≤ p } (cid:35) − √ n n (cid:88) i =1 (cid:40) s i (1 { u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { p i ≤ p }− E S n (cid:34) s i (1 { u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { p i ≤ p } (cid:35)(cid:41) + 1 √ n n (cid:88) i =1 (cid:40) s i (1 { (cid:98) u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { (cid:98) p i ≤ p }− E S n (cid:34) s i (1 { (cid:98) u τ ( x i ) ≤ } − τ ) Π d x j =1 { x j < x j,i < x j } { (cid:98) p i ≤ p } (cid:35)(cid:41) = I n + II n + III n . From Lemma 1(i), II n = − √ n n (cid:88) j =1 F p | x,u τ ,s =1 ( p | x j , , s j = 1) ( s j (1 { u τ ( x j ) ≤ } − τ )) Π d x l =1 { x l < x l,j < x l } + o p (1) , where the o p (1) term holds uniformly over T , X , and P . As for III n , we first, we apply Lemma A.1of Escanciano et al. 
(2014) to the function classes F ≡ { f ( s, τ, x ) = s (1 { u τ ( x ) ≤ } − τ )Π d x j =1 { x j T × X × P , with covariancekernel cov (cid:16) Z q ,n (cid:0) τ, x, x, p, p (cid:1) , Z q ,n (cid:0) τ (cid:48) , x (cid:48) , x (cid:48) , p (cid:48) , p (cid:48) (cid:1)(cid:17) = E (cid:104) (1 { u τ ( x i ) ≤ } − τ )Π dj =1 { x j < x j,i < x j } (cid:0) s i (1 { p i ≤ p } − { p i ≤ p } ) − ( F p | x,u τ ,s =1 ( p | x i , , s i = 1) − F p | x,u τ ,s =1 ( p | x i , , s i = 1)) Pr( s i = 1 | x i ) (cid:1) (1 { u τ (cid:48) ( x i ) ≤ } − τ (cid:48) )Π dj =1 { x (cid:48) j < x j,i < x (cid:48) j } (cid:0) s i (1 { p i ≤ p (cid:48) } − { p i ≤ p (cid:48) } ) − ( F p | x,u τ ,s =1 ( p (cid:48) | x i , , s i = 1) − F p | x,u τ ,s =1 ( p (cid:48) | x i , , s i = 1)) Pr( s i = 1 | x i ) (cid:1)(cid:3) As an immediate consequence, we also obtain the weak convergence of any continuous functional andso: Z q ,n ⇒ Z q . (ii) Given Lemma 1(ii): Z ,n ( τ, x, x, p )= 1 √ n n (cid:88) i =1 (cid:32) (1 { u τ ( x i ) ≤ } − τ )Π dj =1 { x j < x j,i < x j } s i (cid:32) { p i ≤ p } − F p | x,u τ ,s =1 ( p | x i , , s i = 1) (cid:33) − E (cid:104) (1 { u τ ( x i ) ≤ } − τ )Π dj =1 { x j < x j,i < x j } s i (cid:0) { p i ≤ p } − F p | x,u τ ,s =1 ( p | x i , , s i = 1) (cid:1)(cid:17)(cid:105) + √ n E (cid:104) (1 { u τ ( x i ) ≤ } − τ )Π dj =1 { x j < x j,i < x j } s i (cid:0) { p i ≤ p } − F p | x,u τ ,s =1 ( p | x i , , s i = 1) (cid:1)(cid:105) + O p (cid:32) ln n (cid:112) h d x x (cid:33) (19)with the O p term holding uniformly over T , X , and P . The statement then follows as the first termon the right hand side (RHS) of (19) weakly converges, and ln n √ h dxx diverges at a rate slower than √ n given that nh d x x → ∞ . 
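To fix ideas, the first-step process $Z_{1,n}(\tau,\underline{x},\overline{x},p)$ whose decomposition the proof studies can be computed directly from the selected sample. The sketch below is purely illustrative: the logistic stand-in for the propensity score, the marginal-quantile stand-in for the estimated conditional quantiles, and all variable names are our own simplifications, not the nonparametric first-stage estimators used in the paper.

```python
import numpy as np

def first_step_statistic(x, s, p_hat, u_hat, taus, p_grid, x_low, x_high):
    """Kolmogorov-Smirnov-type functional of the first-step process:
    Z_{1,n}(tau, x_low, x_high, p)
      = n^{-1/2} sum_i s_i (1{u_hat_tau(x_i) <= 0} - tau)
                 * prod_j 1{x_low_j < x_{j,i} < x_high_j} * 1{p_hat_i <= p}.
    u_hat is an (n, len(taus)) array of estimated quantile residuals."""
    n = x.shape[0]
    in_cell = np.all((x > x_low) & (x < x_high), axis=1)   # product of indicators
    stat = 0.0
    for k, tau in enumerate(taus):
        resid = (u_hat[:, k] <= 0).astype(float) - tau     # 1{u_hat <= 0} - tau
        for p in p_grid:
            z = np.sum(s * resid * in_cell * (p_hat <= p)) / np.sqrt(n)
            stat = max(stat, abs(z))
    return stat

# Toy illustration with simulated data and crude parametric stand-ins.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
z = rng.normal(size=n)
p_hat = 1.0 / (1.0 + np.exp(-(0.5 + z)))                   # stand-in propensity score
s = (rng.uniform(size=n) <= p_hat).astype(float)           # selection indicator
y = x @ np.array([1.0, -0.5]) + rng.normal(size=n)
taus = np.array([0.25, 0.5, 0.75])
q_hat = np.quantile(y[s == 1], taus)                       # stand-in for q_hat_tau(x)
u_hat = y[:, None] - q_hat[None, :]                        # residuals y_i - q_hat_tau
stat = first_step_statistic(x, s, p_hat, u_hat, taus,
                            p_grid=np.linspace(0.1, 0.9, 9),
                            x_low=np.array([-1.0, -1.0]),
                            x_high=np.array([1.0, 1.0]))
```

In practice the critical values of this functional would be obtained by the wild bootstrap discussed in the paper; the sketch only illustrates the shape of the statistic, not its calibration.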
Proof of Theorem 2: (i) Given Assumption A.6,
\[
Z_{q_2,n}(\tau,\underline{x},\overline{x},1)=\frac{Z^{N}_{q_2,n}(\tau,\underline{x},\overline{x},1)}{\Big(\big(\int K^{2}(v)\,dv\big)\frac{1}{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}\sum_{i=1}^{n}s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)^{2}1\{x_i\in C_{1,n}\}K\big(\frac{\hat{p}_i-1}{\delta h_p}\big)\Big)^{1/2}}\big(1+o_p(1)\big),\qquad(20)
\]
where $C_{1,n}\subseteq\otimes_{j=1}^{d_x}[\underline{x}_j,\overline{x}_j]$ and
\[
Z^{N}_{q_2,n}(\tau,\underline{x},\overline{x},1)\equiv\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big).
\]
The $o_p(1)$ term, which holds uniformly in $\tau\in\mathcal{T}$, follows from Assumption A.6, given that for all $x$ in the complement of $C_{1,n}$, the set of observations with propensity score close to one is thinner than for all $x\in C_{1,n}$. Hereafter, for brevity, we ignore the $(1+o_p(1))$ term. Now,
\[
\begin{aligned}
Z^{N}_{q_2,n}(\tau,\underline{x},\overline{x},1)&=\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}s_i\big(1\{u_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\\
&\quad+\frac{n}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\,E_{S_n}\Big[s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big)\\
&\qquad\qquad\qquad - s_i\big(1\{u_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\Big]\\
&\quad-\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}\Big(s_i\big(1\{u_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)-E_{S_n}\big[\,\cdot\,\big]\Big)\\
&\quad+\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}\Big(s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big)-E_{S_n}\big[\,\cdot\,\big]\Big)\\
&= I_n + II_n + III_n,
\end{aligned}
\]
where $E_{S_n}[\,\cdot\,]$ denotes the $E_{S_n}$-expectation of the preceding summand and $III_n$ collects the last two (centered) sums. Given Lemma 2(i)-(ii), $II_n$ and $III_n$ are $o_p(1)$ uniformly in $\tau\in\mathcal{T}$. Thus, it suffices to derive the limiting distribution of $I_n$.

Recalling that $G_{x_i}(\tau,p_i)=\Pr(y_i\le q_\tau(x_i)\,|\,x_i,p_i)$, $I_n$ reads as:
\[
\begin{aligned}
I_n&=\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}s_i\big(1\{u_\tau(x_i)\le 0\}-G_{x_i}(\tau,p_i)\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\\
&\quad+\frac{1}{\sqrt{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}}\sum_{i=1}^{n}s_i\big(G_{x_i}(\tau,p_i)-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\qquad(21)\\
&=I_{1,n}+I_{2,n}.
\end{aligned}
\]
The first term drives the limiting distribution, while the second term can be thought of as a bias, since $G_x(\tau,p)\to\tau$ only as $p\to 1$. More specifically, by Lemma 3(i), $I_{1,n}$ satisfies a CLT for triangular arrays pointwise in $\tau\in\mathcal{T}$, while by Lemma 3(ii), $I_{2,n}=o_p(I_{1,n})$ uniformly in $\tau\in\mathcal{T}$ and for all $x_i\in C_{1,n}$. We now need to study the denominator in (20):
\[
\Big(\int K^{2}(v)\,dv\Big)\frac{1}{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}\sum_{i=1}^{n}s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)^{2}1\{x_i\in C_{1,n}\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big)
=\Big(\int_{-1}^{1}K^{2}(v)\,dv\Big)\tau(1-\tau)\frac{\int_{C_{1,n}}g(x,1)\,dx}{\prod_{j=1}^{d_x}\xi_{j,n}}+o_p(1)=O(1)
\]
uniformly over $\mathcal{T}$. The covariance kernel of the statistic is therefore given by:
\[
\begin{aligned}
&\mathrm{cov}\big(Z_{q_2}(\tau,\underline{x},\overline{x},1),Z_{q_2}(\tau',\underline{x},\overline{x},1)\big)\\
&=\lim_{n\to\infty}E\Bigg[\frac{\sum_{i=1}^{n}s_i\big(1\{u_\tau(x_i)\le 0\}-\tau\big)1\{x_i\in C_{1,n}\}K\big(\frac{p_i-1}{\delta h_p}\big)}{\sqrt{\big(\int K^{2}(v)\,dv\big)\sum_{i=1}^{n}s_i\big(1\{u_\tau(x_i)\le 0\}-\tau\big)^{2}1\{x_i\in C_{1,n}\}K\big(\frac{p_i-1}{\delta h_p}\big)}}\\
&\qquad\qquad\times\frac{\sum_{i=1}^{n}s_i\big(1\{u_{\tau'}(x_i)\le 0\}-\tau'\big)1\{x_i\in C_{1,n}\}K\big(\frac{p_i-1}{\delta h_p}\big)}{\sqrt{\big(\int K^{2}(v)\,dv\big)\sum_{i=1}^{n}s_i\big(1\{u_{\tau'}(x_i)\le 0\}-\tau'\big)^{2}1\{x_i\in C_{1,n}\}K\big(\frac{p_i-1}{\delta h_p}\big)}}\Bigg].
\end{aligned}
\]
Finally, by Lemmas A.1 and B.3 of Escanciano et al. (2014), we can also conclude that the numerator and denominator of $Z_{q_2,n}(\tau,\underline{x},\overline{x},1)$ are Donsker, and hence, by Theorem 2.10.6 of Van der Vaart and Wellner (1996), that $Z_{q_2,n}(\tau,\underline{x},\overline{x},1)$ is Donsker as well. Thus, it follows that $Z_{q_2,n}(\tau,\underline{x},\overline{x},1)$, as defined in (20), converges weakly in $\ell^{\infty}(\mathcal{T})$, and by continuous mapping, so does the functional $\sup_{\tau\in\mathcal{T}}\big|Z_{q_2,n}(\tau,\underline{x},\overline{x},1)\big|$, as postulated in the statement of part (i).

(ii) Now, if there is an omitted relevant regressor, given Assumption A.7,
\[
\frac{1}{nh_p\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)H^{\eta}}\sum_{i=1}^{n}s_i\big(G_{x_i}(\tau,p_i)-\tau\big)\Pi_{j=1}^{d_x}1\{\underline{x}_j\le x_{i,j}\le\overline{x}_j\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big)
=\underbrace{\lim_{n\to\infty}\frac{1}{h_p\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)H^{\eta}}E\Big[s_i\big(G_{x_i}(\tau,p_i)-\tau\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\Big]}_{\neq 0}+o_p(1)
\]
and
\[
\begin{aligned}
&\frac{1}{nh_pH^{\eta}\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)}\sum_{i=1}^{n}s_i\big(1\{\hat{u}_\tau(x_i)\le 0\}-\tau\big)^{2}\prod_{j=1}^{d_x}1\{\underline{x}_j\le x_{i,j}\le\overline{x}_j\}K\Big(\frac{\hat{p}_i-1}{\delta h_p}\Big)\\
&\qquad\stackrel{p}{\to}\lim_{n\to\infty}\frac{1}{h_p\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)H^{\eta}}E\Big[s_i\big(G_x(\tau,p_i)+\tau^{2}-2\tau G_x(\tau,p_i)\big)1\{x_i\in C_{1,n}\}K\Big(\frac{p_i-1}{\delta h_p}\Big)\Big]>0.
\end{aligned}
\]
Hence, $Z_{q_2,n}(\tau,\underline{x},\overline{x},1)$ diverges at rate $\sqrt{nh_p\big(\prod_{j=1}^{d_x}\xi_{j,n}\big)H^{\eta}}$, which ensures power against such alternatives.

References

Arellano, M. and S. Bonhomme (2017a). Quantile selection models with an application to understanding changes in wage inequality. Econometrica 85(1), 1–28.

Arellano, M. and S. Bonhomme (2017b). Sample selection in quantile regression: A survey. In R. Koenker, V. Chernozhukov, X. He, and L. Peng (Eds.), Handbook of Quantile Regression (1st ed.), Chapter 13, pp. 209–221. Chapman and Hall/CRC.

Black, D., J. Joo, R. LaLonde, J.
Smith, and E. Taylor (2017). Simple tests for selection bias: Learning more from instrumental variables. IZA DP 9346, Institute for the Study of Labor.

Blundell, R., A. Gosling, H. Ichimura, and C. Meghir (2007). Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75, 323–363.

Blundell, R., H. Reed, and T. Stoker (2003). Interpreting aggregate wage growth. American Economic Review 93(4), 1114–1131.

Breunig, C. (2017). Testing for missing at random using instrumental variables. Journal of Business and Economic Statistics, forthcoming.

Chamberlain, G. (1986). Asymptotic efficiency in semi-parametric models with censoring. Journal of Econometrics 32, 189–218.

Chernozhukov, V., I. Fernandez-Val, B. Melly, and K. Wuethrich (2018). Generic inference on quantile and quantile effect functions for discrete outcomes. Working Paper arXiv:1608.05142v4, arXiv.

Corradi, V., W. Distaso, and M. Fernandes (2019). Testing for jump spillovers without testing for jumps. Journal of the American Statistical Association, forthcoming.

Das, M., W. K. Newey, and F. Vella (2003). Nonparametric estimation of sample selection models. Review of Economic Studies 70(1), 33–58.

Delgado, M. and W. Gonzalez-Manteiga (2001). Significance testing in nonparametric regression based on the bootstrap. The Annals of Statistics 29(5), 1469–1507.

Escanciano, J. C., D. Jacho-Chavez, and A. Lewbel (2014). Uniform convergence of weighted sums of non- and semiparametric residuals for estimation and testing. Journal of Econometrics 178, 426–443.

Gronau, R. (1974). Wage comparisons: A selectivity bias. Journal of Political Economy 82, 1119–1143.

Guerre, E. and C. Sabbah (2012). Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory 28, 87–129.

Guggenberger, P. (2010a). The impact of a Hausman pretest on the asymptotic size of a hypothesis test. Econometric Theory 26, 369–382.

Guggenberger, P. (2010b). The impact of a Hausman pretest on the size of a hypothesis test: the panel data case. Journal of Econometrics 156(2), 337–343.

Hayfield, T. and J. Racine (2008). Nonparametric econometrics: The np package. Journal of Statistical Software 27(5).

He, X. and L. Zhu (2003). A lack-of-fit test for quantile regression. Journal of the American Statistical Association 98(464), 1013–1022.

Heckman, J. (1974). Shadow prices, market wages and labor supply. Econometrica 42, 679–694.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica 47, 153–161.

Huber, M. and B. Melly (2015). A test of the conditional independence assumption in sample selection models. Journal of Applied Econometrics 30, 1144–1168.

Jochmans, K. (2015). Multiplicative-error models with sample selection. Journal of Econometrics 184, 315–327.

Khan, S. and E. Tamer (2010). Irregular identification, support conditions, and inverse weight estimation. Econometrica 78(6), 2021–2042.

Kitagawa, T. (2010). Testing for instrument independence in the selection model. Unpublished manuscript, UCL.

Li, Q. and J. Racine (2008). Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data. Journal of Business and Economic Statistics 26(4), 423–434.

Masry, E. (1996). Multivariate regression estimation: Local polynomial fitting for time series. Stochastic Processes and their Applications 65, 81–101.

Qu, Z. and J. Yoon (2015). Nonparametric estimation and inference on conditional quantile processes. Journal of Econometrics 185, 1–19.

Racine, J. (1993). An efficient cross-validation algorithm for window width selection for nonparametric kernel regression. Communications in Statistics 22(4), 1107–1114.

Van der Vaart, A. and J. Wellner (1996). Weak Convergence and Empirical Processes (1st ed.). Springer Series in Statistics. Springer Verlag.

Volgushev, S., M. Birke, H. Dette, and N. Neumeyer (2013). Significance testing in quantile regression.
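As a numerical complement to the proof of Theorem 2, the self-normalized second-step statistic can be sketched as follows. This is an illustrative stand-in only: the Epanechnikov kernel, the bandwidth choice, and the simplified normalization (which drops the $H^{\eta}\prod_{j=1}^{d_x}\xi_{j,n}$ scaling, since it cancels in the studentized ratio) are our own assumptions, not the paper's implementation.

```python
import numpy as np

def epanechnikov(v):
    # Kernel with support [-1, 1]: int K(v) dv = 1, int K(v)^2 dv = 3/5.
    return 0.75 * np.maximum(1.0 - v ** 2, 0.0)

def second_step_statistic(s, p_hat, u_hat_tau, tau, in_cell, h):
    """Self-normalized second-step statistic in the spirit of Lemma 3(i):
    kernel weights concentrate on observations with propensity score near one,
      num   = sum_i s_i (1{u_hat_i <= 0} - tau) 1{x_i in C} K((p_hat_i - 1)/h),
      denom = sqrt((int K^2) sum_i s_i (1{u_hat_i <= 0} - tau)^2 1{x_i in C}
                   K((p_hat_i - 1)/h))."""
    w = epanechnikov((p_hat - 1.0) / h)        # weight mass only near p = 1
    resid = (u_hat_tau <= 0).astype(float) - tau
    num = np.sum(s * resid * in_cell * w)
    denom = np.sqrt(0.6 * np.sum(s * resid ** 2 * in_cell * w))
    return num / denom if denom > 0 else 0.0

# Toy illustration: deterministic stand-in scores, random stand-in residuals.
n = 500
p_hat = np.linspace(0.01, 0.99, n)
u_hat_tau = np.random.default_rng(1).normal(size=n)
stat = second_step_statistic(np.ones(n), p_hat, u_hat_tau,
                             tau=0.5, in_cell=np.ones(n), h=0.2)
```

Only observations with stand-in score above 0.8 receive positive weight here, mimicking the 'identification at infinity' argument; in the paper the degree of irregular identification governs how fast this effective subsample shrinks.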