Permutation Tests at Nonparametric Rates
Marinho Bertanha ∗ EunYi Chung † February 26th, 2021
Abstract
Classical two-sample permutation tests for equality of distributions have exact size in finite samples, but they fail to control size for testing equality of parameters that summarize each distribution. This paper proposes permutation tests for equality of parameters that are estimated at root-n or slower rates. Our general framework applies to both parametric and nonparametric models, with two samples or one sample split into two subsamples. Our tests have correct size asymptotically while preserving exact size in finite samples when distributions are equal. They have no loss in local-asymptotic power compared to tests that use asymptotic critical values. We propose confidence sets with correct coverage in large samples that also have exact coverage in finite samples if distributions are equal up to a transformation. We apply our theory to four commonly-used hypothesis tests of nonparametric functions evaluated at a point. Lastly, simulations show good finite sample properties of our tests.
Keywords:
Permutation Tests, Confidence Sets, Nonparametrics
JEL Classification:
C12, C14, C15

∗ ∼ mbertanh. † Department of Economics, University of Illinois at Urbana-Champaign. Address: 1407 W Gregory Dr, Urbana, IL 61802. Email: [email protected]. Website: economics.illinois.edu/profile/eunyi

1 Introduction
Applications of permutation tests have gained widespread popularity in empirical analyses in the social and natural sciences. Classical two-sample permutation tests are appealing to applied researchers because they are easy to implement and have exact size in finite samples under the sharp null hypothesis, that is, when the distributions are the same. However, researchers are often interested in testing equality of parameters that summarize the whole distribution. For example, one may want to test equality of average outcomes between treatment and control groups while nonparametrically controlling for age and income. The classical permutation test fails to control size under such nulls, both in finite and large samples.

This paper proposes robust two-sample permutation tests for equality of parameters that are estimated at root-n or slower rates. The tests are robust in the sense that they control size asymptotically while preserving finite-sample exactness under the sharp null. Our general framework covers both parametric and nonparametric models, in cases with two samples from two populations or one sample from a union of two populations. In addition, the paper makes three further contributions. First, we derive the asymptotic distribution of permutation tests by proving the coupling approximation in our framework without assuming the null hypothesis. Second, we provide four examples of tests in widely-used nonparametric models, and we prove that they satisfy the conditions of our framework. Third, we construct robust confidence sets for a discrepancy measure between the two populations. The confidence sets are robust in the sense that they have correct coverage asymptotically and exact coverage in finite samples if the sharp null holds under a class of transformations.

Our framework considers a summary parameter which one consistently estimates using an asymptotically linear statistic. The influence function depends on the data and population distribution as well as the sample size. There may be two iid samples from two populations or one iid sample from a union of two populations. In the case of one sample, there is a variable in the data that identifies the population of each observation, and the sample is split into two. The researcher applies the estimator to each of the two samples, computes the difference, and tests whether the two parameters are equal. The classical permutation test compares the estimated difference to critical values from the permutation distribution, that is, the distribution of estimates over all permutations of observations across the two samples.

We derive the asymptotic permutation distribution of the estimated difference and find it to be generally different from its asymptotic sampling distribution. This leads the classical permutation test to have incorrect size. The derivation has two key technical features. First, we use the coupling approximation, which consists of two parts: (i) replacing the original sample with a new sample from a particular mixture of the two populations; and (ii) proving that such replacement renders no change in the asymptotic permutation distribution of the test statistic. Note that we prove the coupling approximation holds without assuming the null hypothesis. This requires a new bound on the variance of the approximation error. Second, sample sizes are random when the researcher splits one sample into two as a function of the data.
Thus, the derivation of the limiting distribution must be valid conditional on any sequence of sample splits that occurs with probability one. Additionally, the asymptotic expansion of estimators must hold uniformly over convex combinations of the two populations.

Our proposed permutation test uses a studentized test statistic, which is the estimated difference of parameters divided by a consistent estimator of its standard deviation. We then show that both the asymptotic permutation and sampling distributions are standard normal. It follows that our permutation test has correct size in large samples, and its asymptotic power against local alternatives is identical to the test that relies on critical values from a standard normal. Finally, we construct a confidence set by inverting our test, which requires testing null hypotheses that are more general than simple equality of parameters. We propose ways to transform the data in order to test more general hypotheses while preserving finite sample exactness when populations are equal up to the data transformations.

Examples of applications of permutation tests abound. Table 2 in the supplement lists top publications from the last decade in a variety of disciplines that use permutation tests, and this broad applicability motivates our extension of the theory. We illustrate our framework using four nonparametric examples of hypothesis tests that are widely used in empirical studies. The first and second examples test equality at a point of nonparametric conditional mean and quantile functions, respectively. The third and fourth examples test continuity at a point of nonparametric conditional mean or probability density functions (PDFs), respectively. We explain how to implement the permutation test in each case and give sufficient conditions to derive the limiting permutation distribution. Implementation requires a sign change and sample splitting in the third and fourth examples. The derivation of limiting distributions adapts existing works in nonparametrics to demonstrate that kernel estimators are asymptotically linear uniformly, as needed by our framework. We find that asymptotic size control requires studentization except for the fourth example.

Related Literature
The insight of robustness through studentization has been proposed before in specific models: Neuhaus (1993), Janssen (1997), Janssen (2005), Neubert and Brunner (2007), and Pauly et al. (2015). Robustness is also achieved by the prepivoting method (Chung and Romano (2016); Fogarty (2021)). Canay et al. (2017) study randomization tests that have correct size in large samples under an approximate symmetry condition. More related to this paper is the work by Chung and Romano (2013), who propose two-sample permutation tests for equality of parameters that are estimated at root-n by asymptotically linear statistics. It is important to emphasize that this paper is not a straightforward generalization of their work. Our framework allows for nonparametric rates, influence functions that depend on n, and random sample sizes that arise from sample splitting. Moreover, our verification of the coupling approximation differs from that of Chung and Romano (2013) because we do not assume the null hypothesis. All these features make many of our proofs substantially different from theirs.

Previous works have also considered randomization tests for continuity of nonparametric models at a point. Cattaneo et al. (2015) propose local randomization inference procedures for a sharp null hypothesis, while Canay and Kamat (2018) provide permutation tests for continuity of the whole distribution of an outcome variable conditional on a control variable at a point. In contrast, our permutation test applies to testing continuity of summary statistics of the conditional distribution such as the mean, quantiles, variance, etc. Our fourth example is related to Bugni and Canay (2021), who propose a sign-change randomization test for continuity of PDFs at a point, where critical values come from maximizing a function of a binomial distribution. We show how the same null hypothesis fits into our framework and is testable using permutation tests. The last two papers use the insightful idea that non-iid order statistics converge in distribution to iid variables, which is technically distinct from our coupling approximation. Finally, permutation-based confidence sets have only been proposed before in specific settings. For example, the confidence sets of Imbens and Rosenbaum (2005) assume that treatment effects divided by treatment doses are constant across individuals and that the distribution of treatment eligibility is known.

The rest of this paper is outlined as follows. Section 2 presents the general framework, assumptions, and asymptotic distributions of the classical and robust permutation tests. Section 3 studies how our theory applies to four nonparametric examples. Section 4 explains how to invert permutation tests to build robust confidence sets. Section 5 displays a simulation study that confirms our theory and illustrates good finite sample properties of the robust permutation test. The supplement contains all the proofs.

There are two populations P_1 and P_2, and a real-valued parameter θ(P_k) summarizing distribution P_k, k = 1, 2.
The null hypothesis is stated as

H_0 : θ(P_1) = θ(P_2).   (2.1)

For each population k, there are n_k iid observations Z_{k,i} ∈ R^q from distribution P_k, i = 1, . . . , n_k. Observations are independent across k, and the total number of observations is n = n_1 + n_2. We define 𝒫 to be the convex hull of {P_1, P_2}. Throughout this paper, random variables with subscript "k" indicate they have distribution P_k, e.g., Z_k; for any other distribution in 𝒫, the random variable is denoted V ∈ R^q. Operators such as P, E, or V applied to Z_k do not carry the subscript P_k, but operators applied to V carry the subscript P, e.g., E[Z_k] vs E_P[V]. The parameter θ(P_k) is consistently estimated by θ̂_k = θ_{n_k,n}(Z_{k,1}, . . . , Z_{k,n_k}), where the functions θ_{n_1,n} and θ_{n_2,n} satisfy the following assumption.

Assumption 2.1.
Let V_1, . . . , V_m be an iid sample from a distribution P ∈ 𝒫. Let m grow with n such that m/n → γ, for some γ ∈ (0, 1). Use these observations to construct the estimator θ̂ = θ_{m,n}(V_1, . . . , V_m). Assume there exist a sequence of functions ψ_n : R^q × 𝒫 → R, a function ξ : 𝒫 → R, and a non-increasing sequence h_n such that nh_n → ∞ and

∀ε > 0:  sup_{P ∈ 𝒫} P_P{ | √(m h_n) (θ̂ − θ(P)) − (1/√m) Σ_{i=1}^m ψ_n(V_i, P) | > ε } → 0,   (2.2)
E_P[ψ_n(V_i, P)] = 0  ∀P ∈ 𝒫,   (2.3)
sup_{P ∈ 𝒫} | V_P[ψ_n(V_i, P)] − ξ(P) | → 0,   (2.4)
sup_{P ∈ 𝒫} E[ψ_n(Z_k, P)²] < ∞, for k ∈ {1, 2},   (2.5)
∃θ > 0:  n^{−θ/2} sup_{P ∈ 𝒫} E_P | ψ_n(V_i, P) / V_P[ψ_n(V_i, P)]^{1/2} |^{2+θ} → 0,  and   (2.6)
ξ( (m/n) P_1 + ((n − m)/n) P_2 ) → ξ( γP_1 + (1 − γ) P_2 ).   (2.7)

Situations arise where the number of observations n_k is random rather than deterministic. For example, suppose the researcher desires to compare the female and male subpopulations of a country but only has one iid sample with n individuals from that country. The researcher splits the sample into two subsamples based on the gender of each observation, and sample sizes are random. In order to accommodate both deterministic and random sample sizes, we consider a sampling scheme which is dictated by a vector of indicator variables W_n = (W_1, . . . , W_n), W_i ∈ {1, 2} for i = 1, . . . , n, where W_n has distribution Q_n. Conditional on W_n, the sample Z_n = (Z_1, . . . , Z_n) has Z_i drawn from distribution P_1 if W_i = 1 or from distribution P_2 if W_i = 2, with observations independent across i. This accommodates the standard two-population sampling by making W_n non-random with n_1 entries equal to 1, n_2 entries equal to 2, and n_1 + n_2 = n. It also accommodates the example above of male and female subpopulations by making W_i iid and P(W_i = 1) equal to the probability of being female. Conditional on W_n, there are n_k iid observations Z_{k,i} ∈ R^q from distribution P_k, k = 1, 2, and observations are independent across k. As before, θ̂_k = θ_{n_k,n}(Z_{k,1}, . . . , Z_{k,n_k}).
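For concreteness, the following Python sketch draws one dataset (W_n, Z_n) under the two sampling schemes just described, assuming scalar observations and user-supplied samplers for P_1 and P_2. The function name and arguments are ours, purely for illustration; they are not part of the paper.

import numpy as np

def draw_sample(n, draw_p1, draw_p2, scheme="fixed", lam=0.5, rng=None):
    """Draw (W_n, Z_n) under the two sampling schemes described above.

    draw_p1/draw_p2: callables (size, rng) -> iid scalar draws from P_1/P_2.
    scheme="fixed": non-random W_n with floor(lam*n) labels equal to 1
    (two independent samples).  scheme="iid": W_i iid with P(W_i = 1) = lam,
    so the subsample sizes n_1, n_2 are random (one sample split in two).
    """
    rng = np.random.default_rng(rng)
    if scheme == "fixed":
        n1 = int(np.floor(lam * n))
        w = np.r_[np.ones(n1, dtype=int), 2 * np.ones(n - n1, dtype=int)]
    else:
        w = np.where(rng.uniform(size=n) < lam, 1, 2)
    z = np.empty(n)
    z[w == 1] = draw_p1(int(np.sum(w == 1)), rng)
    z[w == 2] = draw_p2(int(np.sum(w == 2)), rng)
    return w, z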
Assumption 2.2.
There exists λ ∈ (0, 1) such that the sequence of distributions Q_n satisfies n_1/n →p λ as n → ∞. Moreover, Assumption 2.1 holds for all sequences of sample sizes m such that m/n → λ or m/n → 1 − λ.

The test statistic T_n is a function of the data (W_n, Z_n) as follows:

T_n(W_n, Z_n) := √(nh) (θ̂_1 − θ̂_2),   (2.8)

where we omit the subscript n from the sequence h_n of Assumption 2.1 to simplify notation. The permutation test is constructed by permuting the order of observations in Z_n, while keeping the indicator variables W_n unchanged, and recomputing the test statistic. A permutation is a one-to-one function π : {1, . . . , n} → {1, . . . , n}, where π(i) = j says that the j-th observation becomes the i-th observation once permutation π is applied. Given permutation π, the permuted sample becomes (W_n, Z_n^π) = (W_1, . . . , W_n, Z_{π(1)}, . . . , Z_{π(n)}), and the re-computed value of the test statistic is T_n^π = T_n(W_n, Z_n^π). In other words, permutations swap individuals across the two samples to which they originally belonged according to W_n, which remains fixed. The set G_n is the set of all possible permutations π. The number of elements in G_n is n!.
The two-sided permutation test with nominal level α ∈ (0, 1) is constructed as follows. First, re-compute the test statistic T_n(W_n, Z_n^π) for every π ∈ G_n and rank the values of T_n^π across π: T_n^(1) ≤ T_n^(2) ≤ · · · ≤ T_n^(n!). Second, let k⁻ = ⌊n!α/2⌋, that is, the largest integer less than or equal to n!α/2, and k⁺ = n! − k⁻. Third, compute the following quantities: (i) M⁺, the number of values T_n^(j), j = 1, . . . , n!, that are strictly greater than T_n^(k⁺); (ii) M⁻, the number of values T_n^(j) that are strictly smaller than T_n^(k⁻); (iii) M⁰, the number of values T_n^(j) that are equal to either T_n^(k⁺) or T_n^(k⁻); and (iv) a = (αn! − M⁺ − M⁻)/M⁰. Finally, the outcome of the test is based on the test function φ:

φ(W_n, Z_n) =  1   if T_n > T_n^(k⁺) or T_n < T_n^(k⁻),
               a   if T_n = T_n^(k⁺) or T_n = T_n^(k⁻),
               0   if T_n^(k⁻) < T_n < T_n^(k⁺).   (2.9)

For a given sample, if φ = 1, you reject the null hypothesis; if φ = a, you randomly reject the null hypothesis with probability a; otherwise, if φ = 0, you fail to reject the null. A classic property of permutation tests is exact size in finite samples under the sharp null, that is, P_1 = P_2.
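The following Python sketch implements the randomized two-sided test φ of (2.9). Because enumerating all n! permutations is infeasible in practice, it uses the standard Monte Carlo device of a random subset of permutations that includes the identity; the function name and arguments are our own illustration, not code from the paper.

import numpy as np

def perm_test_two_sided(z, w, stat, alpha=0.05, n_perm=999, rng=None):
    """Randomized two-sided permutation test, mimicking (2.9).

    z: (n, q) array of observations; w: length-n array of labels in {1, 2};
    stat: callable stat(w, z) returning the scalar statistic T_n(W_n, Z_n).
    Rows of z are permuted while w stays fixed.  Returns phi: 1 (reject),
    0 (do not reject), or the randomization probability a in (0, 1).
    """
    rng = np.random.default_rng(rng)
    z = np.asarray(z)
    t_obs = stat(w, z)
    t_vals = [t_obs]                       # identity permutation included
    for _ in range(n_perm):
        t_vals.append(stat(w, z[rng.permutation(len(z))]))
    t_sorted = np.sort(np.asarray(t_vals))
    m = len(t_sorted)
    k_minus = max(int(np.floor(m * alpha / 2)), 1)   # analogue of floor(n!*alpha/2)
    k_plus = m - k_minus
    t_lo, t_hi = t_sorted[k_minus - 1], t_sorted[k_plus - 1]
    m_plus = np.sum(t_sorted > t_hi)       # M+: values strictly above T^(k+)
    m_minus = np.sum(t_sorted < t_lo)      # M-: values strictly below T^(k-)
    m_zero = np.sum((t_sorted == t_hi) | (t_sorted == t_lo))  # M0: ties
    a = (m * alpha - m_plus - m_minus) / m_zero
    if t_obs > t_hi or t_obs < t_lo:
        return 1.0
    if t_obs == t_hi or t_obs == t_lo:
        return float(np.clip(a, 0.0, 1.0))
    return 0.0

A caller rejects outright when the returned value equals 1, and rejects with the returned probability when the value lies strictly between 0 and 1.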
Lemma 2.1. For any n, Q_n, P_1, and P_2, if P_1 = P_2, then E[φ(W_n, Z_n)] = α.
Remark 2.1. The randomized outcome in case of ties is important for exact size in finite samples if P_1 = P_2. However, it may be undesirable to not always have a deterministic answer to a hypothesis test after observing a sample of data. An easy way to fix that is to set φ = 0 in case of ties, and the test becomes conservative, that is, the size becomes less than or equal to α.

However, the set of distributions that satisfy the null hypothesis θ(P_1) = θ(P_2) is in general larger than the set of distributions that satisfy the sharp null P_1 = P_2. Thus, there is no finite sample size control in general. To investigate the asymptotic properties of the test in (2.9), we derive the probability limit of the permutation distribution,

R̂_{T_n}(t) = (1/n!) Σ_{π ∈ G_n} I{ T_n(W_n, Z_n^π) ≤ t }.   (2.10)

The hypothesis test (2.9) utilizes critical values from R̂_{T_n}. The test has asymptotic size control if, under the null hypothesis, the probability limit of R̂_{T_n} equals the cumulative distribution function (CDF) of the limiting distribution of T_n. In order to study both size and power, we derive these limiting distributions without imposing the null hypothesis in the following theorems.
Theorem 2.1. Under Assumptions 2.1-2.2, the permutation distribution R̂_{T_n} converges uniformly in probability to the CDF of a N(0, τ²), i.e.,

sup_t | R̂_{T_n}(t) − Φ(t/τ) | →p 0,

where τ² := ξ(P̄)/(λ(1 − λ)) and P̄ := λP_1 + (1 − λ)P_2. Moreover,

T_n − √(nh) (θ(P_1) − θ(P_2)) →d N(0, σ²),   (2.12)

where σ² := ξ(P_1)/λ + ξ(P_2)/(1 − λ).

The permutation distribution fails to control size asymptotically because the asymptotic variance of the permutation distribution, τ², generally differs from σ². To resolve this issue, the test statistic T_n must be transformed to become asymptotically pivotal. Thus, we divide T_n by the square root of a consistent estimator of its asymptotic variance. For each population k ∈ {1, 2}, let ξ̂_k = ξ_{n_k,n}(Z_{k,1}, . . . , Z_{k,n_k}) be a consistent estimator of ξ(P_k), and assume the functions ξ_{n_1,n} and ξ_{n_2,n} satisfy the following assumption.

Assumption 2.3.
Let V_1, . . . , V_m be an iid sample from a distribution P ∈ 𝒫. Use these observations to construct the estimator ξ̂ = ξ_{m,n}(V_1, . . . , V_m). Assume that, for any sequence of sample sizes m such that m/n → λ or m/n → 1 − λ, ξ̂ − ξ(P) converges in probability to 0 uniformly over P ∈ 𝒫.

Then, the studentized test statistic S_n is

S_n(W_n, Z_n) = T_n(W_n, Z_n) / σ̂_n,   (2.13)

where σ̂_n is the square root of the consistent estimator of the asymptotic variance of T_n,

σ̂²_n = (n/n_1) ξ̂_1 + (n/n_2) ξ̂_2.   (2.14)
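As a minimal illustration of (2.13)-(2.14), the sketch below (hypothetical names; ξ̂_1 and ξ̂_2 are whatever consistent estimates the researcher has under Assumption 2.3) assembles S_n from the subsample estimates.

import numpy as np

def studentized_stat(theta1, theta2, xi1, xi2, n1, n2, h=1.0):
    """S_n of (2.13) with sigma_hat^2 from (2.14).

    theta1, theta2: estimates of theta(P_1) and theta(P_2);
    xi1, xi2: consistent estimates of xi(P_1) and xi(P_2);
    h: bandwidth h_n (use h = 1 when the parameter is estimated at root-n).
    """
    n = n1 + n2
    t_n = np.sqrt(n * h) * (theta1 - theta2)            # T_n in (2.8)
    sigma2_hat = (n / n1) * xi1 + (n / n2) * xi2        # (2.14)
    return t_n / np.sqrt(sigma2_hat)

When this statistic is recomputed on permuted samples inside the permutation routine sketched above, θ̂_k and ξ̂_k are re-estimated on the permuted subsamples, which is what makes the permutation distribution of S_n asymptotically standard normal.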
Theorem 2.2. Let R̂_{S_n} be the permutation CDF defined in (2.10) with T_n replaced by S_n. Under Assumptions 2.1-2.3, R̂_{S_n} converges uniformly in probability to the CDF of a N(0, 1), i.e., sup_t | R̂_{S_n}(t) − Φ(t) | →p 0. Moreover,

S_n − √(nh) (θ(P_1) − θ(P_2)) / σ̂_n →d N(0, 1).   (2.15)

Note that the standard deviation σ̂_n that divides T_n must be consistent for σ, as opposed to τ. However, when σ̂_n is evaluated using permuted samples, it converges in probability to τ. Under the null hypothesis, both the permutation distribution and the test statistic S_n are asymptotically standard normal. Therefore, our robust permutation test in (2.9) with T_n replaced by S_n has asymptotic size equal to the nominal level α, even if P_1 ≠ P_2. In case P_1 = P_2, this test has exact size in finite samples. Since Theorems 2.1 and 2.2 hold regardless of whether the null hypothesis holds or not, we can now study the power properties of the permutation test.

Corollary 2.1.
Let φ(W_n, Z_n) be the permutation test in (2.9) with T_n replaced by S_n, and suppose Assumptions 2.1-2.3 hold. If the null hypothesis holds, then E[φ(W_n, Z_n)] → α; otherwise, E[φ(W_n, Z_n)] → 1. Moreover, assume S_n has a limiting distribution under a sequence of local alternatives contiguous to the null. Then, the asymptotic power of the robust permutation test against local alternatives is the same as that of the test that uses critical values from the limiting null distribution of S_n.

In this section, we apply our theory to four different nonparametric problems: testing for equality of conditional mean and quantile functions evaluated at a point, and testing for continuity of a conditional mean and of a PDF at a point. For simplicity, we use the Nadaraya-Watson (NW) type of kernel estimators throughout, but the proofs in this section generalize to other types of estimators, e.g., local-polynomial regression, sieves, etc. We obtain the usual rate restrictions on the bandwidth tuning parameter h, which allows researchers to choose among standard options available in the literature. We fit the four examples into the general framework of Section 2, demonstrate the validity of Assumptions 2.1 and 2.2, and show that studentization is generally required for asymptotic size control, except for the PDF continuity test.

Researchers are often interested in comparing conditional mean functions between two different populations. For example, in randomized controlled trials, P_1 and P_2 are the populations of control and treatment individuals, respectively. Interest lies in the average outcome Y after controlling for an individual characteristic X. For instance, outcomes of a professional training program may differ between rich and poor individuals, and we would like to condition on income X.

There are two independent samples of bivariate variables: n_1 observations with Z_{1,i} = (X_{1,i}, Y_{1,i}) iid P_1 and n_2 observations with Z_{2,i} = (X_{2,i}, Y_{2,i}) iid P_2. The vector W_n is non-random with n_1 entries equal to 1 and n_2 entries equal to 2, where n = n_1 + n_2. For a given value of the control variable x, θ(P_k) = E[Y_{k,i} | X_{k,i} = x], k = 1, 2, and the null hypothesis is θ(P_1) = θ(P_2).
A common estimator for conditional mean functions is the NW kernel estimator, which computes a weighted average of Y local to X = x for each population. For a bandwidth h > 0 and kernel K,

θ̂_k = [ Σ_{i=1}^{n_k} K((X_{k,i} − x)/h) Y_{k,i} ] / [ Σ_{i=1}^{n_k} K((X_{k,i} − x)/h) ],  for k = 1, 2.   (3.1)

In case the distribution of (X_1, Y_1) equals that of (X_2, Y_2), the permutation test in (2.9) has exact size in finite samples. For other cases, we rely on asymptotic size control, which depends on Assumptions 2.1 and 2.2. Regularity conditions such as continuous differentiability of conditional moments yield the asymptotic linear representation of the NW estimator.
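As an illustration (not the paper's code), the sketch below computes the NW estimate (3.1) at the point x together with a kernel-weighted sandwich variance, one consistent choice of studentization under conditions like those of Proposition 3.1 below; the studentized difference can be passed directly to the permutation routine of Section 2. All names are ours.

import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_at_point(x, y, x0, h, kernel=gauss_kernel):
    """NW estimate of E[Y | X = x0] as in (3.1), plus a kernel-weighted
    sandwich variance estimate for that estimate (one consistent choice)."""
    k = kernel((x - x0) / h)
    theta = np.sum(k * y) / np.sum(k)
    var = np.sum(k ** 2 * (y - theta) ** 2) / np.sum(k) ** 2
    return theta, var

def cond_mean_stat(w, z, x0, h):
    """Studentized difference of NW estimates at x0 between groups 1 and 2."""
    x, y = z[:, 0], z[:, 1]
    th1, v1 = nw_at_point(x[w == 1], y[w == 1], x0, h)
    th2, v2 = nw_at_point(x[w == 2], y[w == 2], x0, h)
    return (th1 - th2) / np.sqrt(v1 + v2)

# usage with the permutation sketch from Section 2 (names are illustrative):
# phi = perm_test_two_sided(z, w, lambda ww, zz: cond_mean_stat(ww, zz, 0.5, 0.1))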
Proposition 3.1. Assume that: (i) as n → ∞, n_1/n → λ ∈ (0, 1), h → 0, nh → ∞, and √(nh) h² → 0; and (ii) K is a kernel density function that is non-negative, bounded, symmetric, and ∫ K(u)|u| du < ∞. For each k = 1, 2, assume that (iii) the distribution of X_k has PDF f_{X_k} bounded away from zero, with bounded first and second derivatives; (iv) m_{Y_k|X_k}(x_k) := E[Y_k | X_k = x_k] is twice differentiable, and m_{Y_k|X_k} and its derivatives are bounded; (v) v_{Y_k|X_k}(x_k) := V[Y_k | X_k = x_k] is differentiable, v_{Y_k|X_k} and its derivative are bounded, and v_{Y_k|X_k}(x) > 0; and (vi) there exists θ > 0 such that E[|Y_k|^{2+θ} | X_k] is almost surely bounded. Let V = (R, S) be a random variable with distribution P ∈ 𝒫. Then, the NW estimator satisfies Assumptions 2.1 and 2.2 with the following functions:

ψ_n(V, P) = K((R − x)/h_n) (S − m_{S|R}(R; P)) / ( √(h_n) f_R(x; P) ),
ξ(P) = v_{S|R}(x; P) ∫_{−∞}^{∞} K²(u) du / f_R(x; P),

where m_{S|R}(x; P) is the conditional mean of S given R = x, v_{S|R}(x; P) is the conditional variance of S given R = x, and f_R(x; P) is the PDF of R at x, all three assuming distribution V = (R, S) ∼ P ∈ 𝒫.

The proof of Proposition 3.1 adapts conventional arguments for nonparametric asymptotics and is found in Section C.1 of the supplement. Under the null hypothesis, the asymptotic variances of T_n and of the permutation distribution are, respectively,

σ² = ∫_{−∞}^{∞} K²(u) du [ v_{Y_1|X_1}(x)/(λ f_{X_1}(x)) + v_{Y_2|X_2}(x)/((1 − λ) f_{X_2}(x)) ],
τ² = ∫_{−∞}^{∞} K²(u) du [ f_{X_1}(x) v_{Y_1|X_1}(x)/((1 − λ) f_R(x; P̄)²) + f_{X_2}(x) v_{Y_2|X_2}(x)/(λ f_R(x; P̄)²) ],

where f_R(x; P̄)² is the square of the PDF of R evaluated at x under P̄ = λP_1 + (1 − λ)P_2 ∈ 𝒫. These variances are generally different, except in special cases, e.g., when f_{X_1}(x) = f_{X_2}(x) and λ = 1/2, or when f_{X_1}(x) = f_{X_2}(x) and v_{Y_1|X_1}(x) = v_{Y_2|X_2}(x). Thus, in general, the researcher must use the studentized test statistic for the permutation test to have asymptotic size control.

In this subsection, we examine equality of conditional quantile functions for two populations. For example, a researcher may wish to compare not only averages (Section 3.1) but also other features of a conditional distribution between P_1 and P_2, such as the median, tails, interquartile range, etc. The goal is to test the difference of the τ-th quantile of the outcomes Y between the two populations, after controlling for a given value of the variable X. For instance, the immune response Y to a certain treatment conditional on age X may differ for individuals at the bottom, median, or top of the immunity distribution.

As in Section 3.1, there are two independent samples, Z_{1,i} = (X_{1,i}, Y_{1,i}), i = 1, . . . , n_1, and Z_{2,i} = (X_{2,i}, Y_{2,i}), i = 1, . . . , n_2, and the vector W_n is non-random. For a given value of X = x, the parameter of interest is the τ-th conditional quantile, that is,

θ(P_k) = Q_τ[Y_{k,i} | X_{k,i} = x] = argmin_a E[ρ_τ(Y_{k,i} − a) | X_{k,i} = x],

where ρ_τ(u) = (τ − I(u < 0)) u. For a bandwidth h > 0 and kernel K, a consistent estimator of the NW style is

θ̂_k = argmin_a Σ_{i=1}^{n_k} ρ_τ(Y_{k,i} − a) K((X_{k,i} − x)/h),  for k = 1, 2.   (3.2)
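A minimal sketch of the estimator in (3.2): with kernel weights, the check-function minimizer is a kernel-weighted sample quantile, so it can be computed without numerical optimization. The function name is ours; studentization would additionally require a consistent estimate of ξ(P_k) from Proposition 3.2 below (e.g., via conditional density estimation), after which the same permutation machinery of Section 2 applies.

import numpy as np

def local_quantile(x, y, x0, h, tau, kernel=None):
    """Local constant estimate of the tau-th quantile of Y given X = x0,
    i.e. the minimizer in (3.2), computed as a kernel-weighted quantile."""
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    k = kernel((x - x0) / h)
    order = np.argsort(y)
    y_sorted, k_sorted = y[order], k[order]
    cum = np.cumsum(k_sorted)                       # cumulative kernel weight
    idx = int(np.searchsorted(cum, tau * cum[-1]))  # first index with cum >= tau*W
    return y_sorted[min(idx, len(y_sorted) - 1)]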
The next proposition gives conditions for the validity of Assumptions 2.1-2.2 and asymptotic size control.

Proposition 3.2. Assume that: (i) as n → ∞, n_1/n → λ ∈ (0, 1), h → 0, nh → ∞, and √(nh) h² → 0; and (ii) K is a kernel density function that is bounded, symmetric, and has finite second moment. For k = 1, 2, assume that (iii) X_k has PDF f_{X_k}(x_k) that is bounded and bounded away from zero, twice differentiable with bounded derivatives; (iv) the distribution of Y_k conditional on X_k has PDF f_{Y_k|X_k}(y_k|x_k) that is bounded, bounded away from zero over y_k for x_k = x, and differentiable with bounded partial derivatives; and (v) the distribution of Y_k conditional on X_k has CDF F_{Y_k|X_k}(y_k|x_k) that is twice partially differentiable wrt x_k with bounded partial derivatives. Let V = (R, S) be a random variable with distribution P ∈ 𝒫. Then, the quantile estimator satisfies Assumptions 2.1 and 2.2 with the following functions:

ψ_n(V, P) = [ 1/( f_{S|R}(θ(P)|x; P) f_R(x; P) √(h_n) ) ] K((R − x)/h_n) ( I{S < θ(P)} − F_{S|R}(θ(P)|R; P) ),
ξ(P) = ∫_{−∞}^{∞} K²(u) du · τ(1 − τ) / ( f_{S|R}(θ(P)|x; P)² f_R(x; P) ),

where f_{S|R}(s|r; P) is the conditional PDF of S given R = r, f_R(r; P) is the PDF of R, and F_{S|R}(s|r; P) is the conditional CDF of S given R = r, all three assuming distribution V = (R, S) ∼ P ∈ 𝒫.

The proof of Proposition 3.2 adapts arguments by Pollard (1991) and Fan et al. (1994) and is found in Section C.2 of the supplement. The asymptotic variances of T_n and of the permutation distribution are, respectively,

σ² = τ(1 − τ) ∫_{−∞}^{∞} K²(u) du [ 1/( λ f_{Y_1|X_1}(θ(P_1)|x)² f_{X_1}(x) ) + 1/( (1 − λ) f_{Y_2|X_2}(θ(P_2)|x)² f_{X_2}(x) ) ],
τ² = [ τ(1 − τ)/(λ(1 − λ)) ] ∫_{−∞}^{∞} K²(u) du [ 1/( f_{S|R}(θ(P̄)|x; P̄)² f_R(x; P̄) ) ].

These variances are generally different except in special cases. For example, the null hypothesis implies θ(P_1) = θ(P_2) = θ(P̄). If f_{Y_1|X_1}(θ(P̄)|x) = f_{Y_2|X_2}(θ(P̄)|x) and f_{X_1}(x) = f_{X_2}(x), then σ² = τ². Thus, in general, the researcher must use the studentized test statistic for the permutation test to have asymptotic size control.

There have been numerous empirical studies in the social sciences that rely on estimation and inference on the size of a discontinuity in a conditional mean function at a certain point. In the so-called Regression Discontinuity Design (RDD), an individual i receives treatment if and only if a running variable X_i is above a fixed threshold. If individuals do not know the threshold or do not have perfect manipulation over X, untreated individuals who barely missed the cutoff serve as a control group for treated individuals who barely made it across the cutoff. Assume the threshold for treatment is 0 without loss of generality. The difference in side limits E[Y|X = 0⁺] − E[Y|X = 0⁻] identifies the causal effect of treatment on an outcome variable Y. Thus, the null hypothesis of zero causal effect is equivalent to continuity of the conditional mean function E[Y|X = x] at x = 0.

The idea first appeared in psychology (Thistlethwaite and Campbell (1960)), made its way to economics (Hahn et al. (2001)), and has a growing number of applications in the social sciences. Examples of top publications include: economics - Agarwal et al. (2017); education - Valentine et al. (2017); political science - Abou-Chadi and Krause (2020); and sociology - Zoorob (2020).
We focus on the conditional mean function in this subsection, but the whole argument goes through if one desires to use the conditional quantile function. The researcher has a sample of n iid observations (X_i, Y_i), and the NW estimator for the discontinuity is

[ Σ_{i=1}^{n} K(X_i/h) I{X_i ≥ 0} Y_i / Σ_{i=1}^{n} K(X_i/h) I{X_i ≥ 0} ] − [ Σ_{i=1}^{n} K(X_i/h) I{X_i < 0} Y_i / Σ_{i=1}^{n} K(X_i/h) I{X_i < 0} ].

RDD fits in our two-population framework by splitting the sample based on X being above or below the cutoff. Construct W_i = 2 − I{X_i ≥ 0}, so that n_1/n →p λ = P[W_i = 1]. Re-order the sample such that the n_1 observations with W_i = 1 come first, and the n_2 observations with W_i = 2 come second. Define Z_{1,i} = (X_{1,i}, Y_{1,i}) = (X_i, Y_i) for i = 1, . . . , n_1 and Z_{2,i} = (X_{2,i}, Y_{2,i}) = (−X_{n_1+i}, Y_{n_1+i}) for i = 1, . . . , n_2. We have Z_n = (Z_{1,1}, . . . , Z_{1,n_1}, Z_{2,1}, . . . , Z_{2,n_2}). Conditional on W_n, the distribution of Z_{1,i} is P_1, which equals the distribution of (X, Y) conditional on X ≥ 0. Likewise for Z_{2,i}, P_2 is the distribution of (−X, Y) conditional on X < 0. The RDD parameter becomes θ(P_1) − θ(P_2), where θ(P_k) = E[Y_k | X_k = 0⁺] for k = 1, 2. The NW estimator for the RDD parameter is θ̂_1 − θ̂_2, where θ̂_k is defined in Equation 3.1 for k = 1, 2. Permutations re-order observations in Z_n but keep W_n unchanged.

In case the distribution of (Y, X) equals that of (Y, −X), then P_1 = P_2 and the permutation test in (2.9) has exact size in finite samples. Note that this X-symmetry restriction eliminates the impossibility problem in RDD tests (Kamat (2018) and Bertanha and Moreira (2020)) because there is no bias in estimation. For other cases, we rely on asymptotic size control, which depends on Assumptions 2.1 and 2.2. The proposition below gives sufficient conditions for these assumptions to hold. These conditions are stated in terms of the originally sampled variables (X, Y), that is, before they are transformed into (X_k, Y_k), k = 1, 2.
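The sketch below (illustrative names, reusing nw_at_point from the Section 3.1 sketch) carries out this mapping into the two-population framework and the resulting studentized statistic; permutations then shuffle the rows of z while w stays fixed, exactly as in Section 2.

import numpy as np

def rdd_split(x, y):
    """Map one RDD sample into the two-population framework of this section:
    W_i = 2 - 1{X_i >= 0}; group 1 keeps (X, Y), group 2 stores (-X, Y),
    so both conditional means become right limits at the cutoff 0."""
    w = np.where(x >= 0, 1, 2)
    z = np.column_stack([np.where(x >= 0, x, -x), y])
    return w, z

def rdd_stat(w, z, h):
    """Studentized NW estimate of E[Y|X=0+] - E[Y|X=0-], i.e. of
    theta(P_1) - theta(P_2), using boundary NW fits at x0 = 0."""
    x, y = z[:, 0], z[:, 1]
    th1, v1 = nw_at_point(x[w == 1], y[w == 1], 0.0, h)
    th2, v2 = nw_at_point(x[w == 2], y[w == 2], 0.0, h)
    return (th1 - th2) / np.sqrt(v1 + v2)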
Proposition 3.3. Assume that: (i) as n → ∞, h → 0, nh → ∞, and √(nh) h → 0; (ii) K is a kernel density function that is non-negative, bounded, symmetric, and ∫ K(u)|u| du < ∞; (iii) the distribution of X has PDF f_X that is bounded, bounded away from zero, differentiable except at x = 0, and has bounded derivative; (iv) E[Y|X = x] is bounded, differentiable except at x = 0, and has bounded derivative; (v) V[Y|X = x] is bounded, differentiable except at x = 0, has bounded derivative, V[Y|X = 0⁺] > 0, and V[Y|X = 0⁻] > 0, where ⁺ and ⁻ denote side limits; and (vi) there exists θ > 0 such that E[|Y|^{2+θ} | X] is almost surely bounded. Let V = (R, S) be a random variable with distribution P ∈ 𝒫. Then, Assumptions 2.1 and 2.2 are satisfied with

ψ_n(V, P) = K(R/h_n) (S − m_{S|R}(R; P)) / ( √(h_n) f_R(0⁺; P)/2 ),
ξ(P) = v_{S|R}(0⁺; P) ∫_0^∞ K²(u) du / ( f_R(0⁺; P)/4 ),

where m_{S|R}(r; P) is the conditional mean of S given R = r, v_{S|R}(r; P) is the conditional variance of S given R = r, and f_R(r; P) is the PDF of R at r, all three assuming distribution V = (R, S) ∼ P ∈ 𝒫.

The proof is in Section C.3 of the supplement. The conditions and proof of Proposition 3.3 follow the lines of the conditional mean case (Proposition 3.1). Unlike the conditional mean case, the evaluation point x = 0 lies at the boundary of the support of X_k. As a result, h has to converge faster to zero to eliminate the asymptotic bias of T_n (i.e., √(nh) h → 0 instead of √(nh) h² → 0). The asymptotic variances of T_n and of the permutation distribution are, respectively,

σ² = ∫_0^∞ K²(u) du [ v_{Y_1|X_1}(0⁺)/( λ f_{X_1}(0⁺)/4 ) + v_{Y_2|X_2}(0⁺)/( (1 − λ) f_{X_2}(0⁺)/4 ) ],
τ² = ∫_0^∞ K²(u) du [ f_{X_1}(0⁺) v_{Y_1|X_1}(0⁺)/( (1 − λ) f_R(0⁺; P̄)²/4 ) + f_{X_2}(0⁺) v_{Y_2|X_2}(0⁺)/( λ f_R(0⁺; P̄)²/4 ) ].

If agents don't manipulate X to change their treatment status, which is a key assumption in RDD, then the PDF of X should be continuous at the cutoff. This implies that f_{X_1}(0⁺)λ = f_{X_2}(0⁺)(1 − λ) = f_X(0⁺) = f_X(0⁻) = f_X(0), and the variances simplify to

σ² = ∫_0^∞ K²(u) du (4/f_X(0)) ( v_{Y_1|X_1}(0⁺) + v_{Y_2|X_2}(0⁺) ),
τ² = ∫_0^∞ K²(u) du ( 1/(λ(1 − λ) f_X(0)) ) ( v_{Y_1|X_1}(0⁺) + v_{Y_2|X_2}(0⁺) ).

These are generally different, except when λ = 1/2. Thus, in general, the researcher must use the studentized test statistic for the permutation test to have asymptotic size control.

3.4 Discontinuity of Density
In many settings, the distribution of a random variable may exhibit a discontinuity at a given point if a certain phenomenon of interest occurs. For example, estimating agents' responses to incentives is a central objective in the social sciences. A continuous distribution of agents who face a discontinuous schedule of incentives results in a distribution of responses with a discontinuity at a known point. For example, Saez (2010) looks for evidence of a mass point in the distribution of reported income at tax brackets as evidence of agents' responses to tax rates. Caetano (2015) proposes an exogeneity test in nonparametric regression models, where the distribution of the potentially endogenous regressor may have a mass point. Identification of causal effects with RDD highly depends on continuity assumptions, and these imply that the PDF of the control variable is continuous at the cutoff.

Consider a scalar random variable X with PDF f that is continuous, except possibly at the point x = 0. We want to test the null hypothesis of continuity of the PDF at x = 0. For a sample with n iid observations X_i, a symmetric kernel, and bandwidth h > 0, the kernel density estimator for the size of the discontinuity is

(2/(nh)) Σ_{i=1}^{n} K(X_i/h) [ I{X_i ≥ 0} − I{X_i < 0} ].

The problem fits in our two-population framework by randomly splitting the sample as follows. For observations i ≤ ⌊n/2⌋ =: n_1, set W_i = 1 and let Z_{1,i} = X_{1,i} = X_i; for observations n_1 = ⌊n/2⌋ < i ≤ n, set W_i = 2 and let Z_{2,i−n_1} = X_{2,i−n_1} = −X_i. This implies that n_1/n → 1/2. Permutations re-order observations in Z_n = (X_{1,1}, . . . , X_{1,n_1}, X_{2,1}, . . . , X_{2,n_2}) but keep W_n unchanged. Conditional on W_n, the distribution of X_{1,i} is P_1, which equals the distribution of X. Likewise for X_{2,i}, P_2 is the distribution of −X. We cannot split the sample based on X being above or below 0 as we do in Section 3.3. If we split the sample based on X and the distribution of X is asymmetric, it becomes impossible to identify the side limit of f using only data from either sample, as required by Assumption 2.1.

Let V be a scalar variable R ∼ P ∈ 𝒫. The parameter of interest is θ(P) = (1/2)( f_R(0⁺; P) − f_R(0⁻; P) ), where f_R(0⁺; P) and f_R(0⁻; P) are the side limits at zero of the PDF of R under the distribution P. The discontinuity parameter is θ(P_1) − θ(P_2), which equals f_X(0⁺) − f_X(0⁻) in terms of the PDF of X. The kernel estimator for the density discontinuity becomes θ̂_1 − θ̂_2, where θ̂_k is defined as

θ̂_k = (1/(n_k h)) Σ_{i=1}^{n_k} K(X_{k,i}/h) ( I{X_{k,i} ≥ 0} − I{X_{k,i} < 0} ),  for k = 1, 2.

When n is even and split in half, the test statistic T_n = √(nh)(θ̂_1 − θ̂_2) is invariant to the way the original sample is split. In case the distribution of X is symmetric at 0, then P_1 = P_2 and the permutation test in (2.9) has exact size in finite samples. For other cases, we rely on asymptotic size control, and thus need to verify Assumptions 2.1 and 2.2.

Proposition 3.4.
We assume that: (i) n → ∞, h → 0, nh → ∞, and √(nh) h → 0; (ii) K is a kernel density function that is non-negative, bounded, symmetric, and ∫ K(u)|u| du < ∞; and (iii) the distribution of X has a PDF that is bounded and differentiable except at x = 0, f_X has bounded derivative, f_X(0⁺) > 0, and f_X(0⁻) > 0. Let V = R be a random variable with distribution P ∈ 𝒫. Assumptions 2.1 and 2.2 are satisfied with

ψ_n(V, P) = h_n^{−1/2} K(R/h_n) ( I{R ≥ 0} − I{R < 0} ) − h_n^{−1/2} E_P[ K(R/h_n) ( I{R ≥ 0} − I{R < 0} ) ],
ξ(P) = ( f_R(0⁺; P) + f_R(0⁻; P) ) ∫_0^∞ K²(u) du.

The proof of Proposition 3.4 is in Section C.4 of the supplement. The asymptotic variances of T_n and of the permutation distribution are, respectively,

σ² = 2 ∫_0^∞ K²(u) du [ f_{X_1}(0⁺) + f_{X_1}(0⁻) + f_{X_2}(0⁺) + f_{X_2}(0⁻) ],
τ² = 4 ∫_0^∞ K²(u) du [ f_R(0⁺; P̄) + f_R(0⁻; P̄) ] = 2 ∫_0^∞ K²(u) du [ f_{X_1}(0⁺) + f_{X_1}(0⁻) + f_{X_2}(0⁺) + f_{X_2}(0⁻) ].

These are the same regardless of whether the null hypothesis is true or not. Thus, unlike the previous examples, the researcher does not need to studentize the test statistic for asymptotic validity of the permutation test.
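A minimal sketch of this construction (hypothetical names): the sample is split at random into two halves, the second half is sign-flipped, and the non-studentized statistic T_n is the scaled difference of the two subsample estimators θ̂_k defined above; by the result just stated, no studentization is needed before feeding it to the permutation routine of Section 2.

import numpy as np

def density_split(x, rng=None):
    """Random half-split of the sample: the first half keeps X (group 1),
    the second half is sign-flipped to -X (group 2), as described above."""
    rng = np.random.default_rng(rng)
    x = rng.permutation(np.asarray(x, dtype=float))
    n1 = len(x) // 2
    w = np.r_[np.ones(n1, dtype=int), 2 * np.ones(len(x) - n1, dtype=int)]
    z = np.r_[x[:n1], -x[n1:]]
    return w, z

def density_jump_stat(w, z, h, kernel=None):
    """T_n = sqrt(n h) (theta_hat_1 - theta_hat_2) for the PDF jump at 0."""
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    def theta_hat(xk):
        sgn = np.where(xk >= 0, 1.0, -1.0)          # 1{X >= 0} - 1{X < 0}
        return np.sum(kernel(xk / h) * sgn) / (len(xk) * h)
    t1 = theta_hat(z[w == 1])
    t2 = theta_hat(z[w == 2])
    return np.sqrt(len(z) * h) * (t1 - t2)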
This section constructs robust confidence sets for a discrepancy measure Ψ(P_1, P_2) between the two populations by "inverting" the permutation test for the null hypothesis Ψ(P_1, P_2) = δ, δ ∈ R. The discrepancy measure satisfies two requirements. First, there exists a unique δ_0 ∈ R such that Ψ(P_1, P_2) = δ_0 is equivalent to θ(P_1) = θ(P_2). Second, for every δ ≠ δ_0, there exists a known data transformation ψ_δ that applies to observations in sample 1 such that the distribution P̃_1 of ψ_δ(Z_{1,i}) satisfies θ(P̃_1) = θ(P_2).

In terms of the examples of Sections 3.1-3.3, we set Ψ(P_1, P_2) = θ(P_1) − θ(P_2), and it follows that δ_0 = 0 and ψ_δ(Z_{1,i}) = (X_{1,i}, Y_{1,i} − δ) satisfy the two requirements for the discrepancy measure. Note that the null hypothesis Ψ(P_1, P_2) = δ is equivalent to E[Y_1|X_1 = x] − E[Y_2|X_2 = x] = δ in the conditional mean case, to Q_τ[Y_1|X_1 = x] − Q_τ[Y_2|X_2 = x] = δ in the conditional quantile case, and to E[Y|X = 0⁺] − E[Y|X = 0⁻] = δ in the discontinuity of conditional mean case. For the discontinuity of PDF example of Section 3.4, we make Ψ(P_1, P_2) = f_X(0⁺)/f_X(0⁻), and we have that δ_0 = 1 and ψ_δ(Z_{1,i}) = X_{1,i} ( δ I{X_{1,i} ≥ 0} + (1/δ) I{X_{1,i} < 0} ) satisfy the two requirements. In that example, Ψ(P_1, P_2) = δ is equivalent to f_X(0⁺)/f_X(0⁻) = δ.

Define φ_{δ_0}(W_n, Z_n) to be the test described in Equation 2.9 with the studentized test statistic S_n of Equation 2.13 replacing T_n. This test applies to the null hypothesis Ψ(P_1, P_2) = δ_0. Next, for δ ≠ δ_0, we first transform the data Z_n = (Z_{1,1}, . . . , Z_{1,n_1}, Z_{2,1}, . . . , Z_{2,n_2}) to Z̃_n = (Z̃_{1,1}, . . . , Z̃_{1,n_1}, Z_{2,1}, . . . , Z_{2,n_2}), where Z̃_{1,i} = ψ_δ(Z_{1,i}) for i = 1, . . . , n_1. The robust permutation test for the null hypothesis Ψ(P_1, P_2) = δ is defined as φ_δ(W_n, Z_n) = φ_{δ_0}(W_n, Z̃_n).

Let U be a uniform random variable on [0, 1], independent of the data. The confidence set with 1 − α nominal coverage is

C(W_n, Z_n) := { δ : U > φ_δ(W_n, Z_n) }.   (4.1)

The set almost surely includes all values of δ for which the test fails to reject, and it excludes the ones the test rejects. For those values of δ for which the test outcome is randomized with rejection probability a, the inclusion in the confidence set occurs with probability 1 − a. The purpose of a randomized confidence set is to guarantee exact coverage whenever the test φ_δ has exact size. A non-randomized confidence set is C̃(W_n, Z_n) := { δ ∈ R : φ_δ(W_n, Z_n) < 1 }, but its coverage is conservative, especially in small samples.

Lemma 2.1 implies that φ_δ(W_n, Z_n) has exact size α in finite samples for any P_1 and P_2 such that Ψ(P_1, P_2) = δ and P̃_1 = ψ_δ P_1 = P_2. That implies that the confidence set C(W_n, Z_n) has exact coverage in finite samples if distributions are equal up to a transformation ψ_δ. In the examples of Sections 3.1-3.2, exactness occurs when the distributions P_1 and P_2 are such that there exists δ ∈ R for which the distribution of (Y_{1,i} − δ, X_{1,i}) equals that of (Y_{2,i}, X_{2,i}); in Section 3.3, when there exists δ ∈ R for which the distribution of (Y − δ, X) | X ≥ 0 equals that of (Y, −X) | X < 0; and in Section 3.4, when there exists δ ∈ (0, ∞) such that I{x ≥ 0} f_X(x/δ)/δ + I{x < 0} f_X(xδ) δ is symmetric around x = 0. If these restrictions do not apply, then the confidence set has correct coverage asymptotically.
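The sketch below approximates the confidence set (4.1) on a finite grid of δ values (a computational convenience, not part of the paper's definition). Here shift plays the role of the transformation ψ_δ, e.g. (X, Y − δ) for Sections 3.1-3.3, and perm_test_two_sided is the earlier routine returning the randomized test outcome φ_δ; all names are illustrative.

import numpy as np

def confidence_set(z, w, stat, shift, delta_grid, alpha=0.05, n_perm=999, rng=None):
    """Grid approximation of C(W_n, Z_n) = {delta : U > phi_delta(W_n, Z_n)}."""
    rng = np.random.default_rng(rng)
    kept = []
    for d in delta_grid:
        z_d = np.array(z, dtype=float, copy=True)
        z_d[w == 1] = shift(z_d[w == 1], d)        # apply psi_delta to sample 1
        phi = perm_test_two_sided(z_d, w, stat, alpha=alpha, n_perm=n_perm, rng=rng)
        if rng.uniform() > phi:                    # randomized inclusion rule (4.1)
            kept.append(d)
    return np.array(kept)

# example psi_delta for the conditional mean / quantile / RDD cases:
# shift = lambda z1, d: np.column_stack([z1[:, 0], z1[:, 1] - d])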
Corollary 4.1. Consider the discrepancy measure Ψ and the class of data transformations ψ_δ discussed above. For any n, Q_n, P_1, and P_2, if P̃_1 = ψ_δ P_1 = P_2 for δ = Ψ(P_1, P_2), then

P[ Ψ(P_1, P_2) ∈ C(W_n, Z_n) ] = 1 − α.

Assume instead that P̃_1 = ψ_δ P_1 ≠ P_2 and Assumptions 2.1-2.3 hold. Then, as n → ∞,

P[ Ψ(P_1, P_2) ∈ C(W_n, Z_n) ] → 1 − α.
Remark 4.1. The data transformation ψ_δ is used to obtain finite sample exactness when distributions are equal up to a transformation ψ_δ, but it is not necessary for correct asymptotic coverage. In fact, for the null hypothesis θ(P_1) − θ(P_2) = δ, one may construct a permutation test φ*_δ that compares the value of S_n − √(nh) δ/σ̂_n with the critical values from R̂_{S_n}. The test φ*_δ has correct size asymptotically, and the confidence set constructed by inverting φ*_δ has correct coverage in large samples.

We conduct Monte Carlo simulations to compare the finite sample performance of our permutation test to the conventional t-test, that is, the test that rejects if |S_n| > Φ^{−1}(1 − α/2). We do not claim that our test outperforms the t-test in all cases; the goal is to verify the theoretical predictions of Section 2 and explore DGP variations that illustrate pros and cons of our methods. The exercise confirms the theoretical findings of size control in large samples and in finite samples under the sharp null; it also shows similar power curves between permutation and t-tests in large samples. Moreover, we find several cases where the permutation test performs significantly better than the t-test, both in power and in size control outside of the sharp null.

The experiment simulates iid samples from the following designs.

Design 1: for k = 1, 2, X_k ∼ U[0, 1], ε_k ∼ N(0, σ_k²), where X_k is independent of ε_k, and Y_k = m_k(X_k) + ε_k; the conditional mean functions are m_1(x) = 5(x − 0.2)(x − 0.8) I{|x − 0.5| > 0.3} and m_2(x) = −5(x − 0.2)(x − 0.8) I{|x − 0.5| > 0.3}; the sample sizes (n_1, n_2) take six unbalanced combinations with n_1 ∈ {50, 250, 500, 20, 100, 200}, and the error standard deviations (σ_1, σ_2) are either equal (both 1) or unequal (one equal to 5 and the other to 1); the null hypothesis is H_0 : θ(P_1) = m_1(0.5) = m_2(0.5) = θ(P_2).

Design 2: for k = 1, 2, X_k ∼ U[0, 1], ε_{y,k} ∼ N(0, σ_k²), ε_{d,k} is mean-zero normal, (X_k, ε_{y,k}, ε_{d,k}) are mutually independent, Y_k = m_{y,k}(X_k) + ε_{y,k}, and D_k = m_{d,k}(X_k) + ε_{d,k}; the conditional mean functions are m_{y,1}(x) = 1 + 5(x − 0.2)(x − 0.8) I{|x − 0.5| > 0.3}, m_{y,2}(x) = 1 − 5(x − 0.2)(x − 0.8) I{|x − 0.5| > 0.3}, and m_{d,k}(X_k) = µ ∈ {1, 10}; the sample sizes are n_1 = n_2 ∈ {50, 100, 1000}, and the error standard deviations (σ_1, σ_2) are again either equal (both 1) or unequal (values 5 and 1); the null hypothesis is H_0 : θ(P_1) = m_{y,1}(0.5)/m_{d,1}(0.5) = m_{y,2}(0.5)/m_{d,2}(0.5) = θ(P_2).

The designs represent practical situations where the t-test is known to perform poorly in small samples. Design 1 corresponds to cases of sample imbalance, that is, cases where the sample sizes are very different. For example, the researcher may be interested in comparing the average outcome of a professional training program conditional on a test score between men and women, and the sample of women is much larger than that of men. Design 2 refers to scenarios where interest lies in the ratio of conditional mean functions and the denominator may be small. For example, one desires to estimate the efficacy rate of a vaccine conditional on blood pressure and to compare the rate between men and women. The efficacy rate in vaccine trials is the difference between the proportions of infected individuals between treatment and control divided by the proportion in the control group. Both designs have conditional mean functions for Y given X that are equal and flat for |X − 0.5| ≤ 0.3, but different and non-linear otherwise (Figure 1). This specification allows us to experiment with scenarios with or without estimation bias, and inside or outside the sharp null, depending on the bandwidth choice. Both designs fall under the null hypothesis; they fall under the sharp null if we further set σ_1 = σ_2 and restrict the sample to |X − 0.5| ≤ 0.3.

Figure 1 (conditional mean functions of Design 1). Notes: the lines are equal and flat for |X − 0.5| ≤ 0.3. The conditional means of Design 2 equal those of Design 1 shifted upwards by 1.
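For replication-style experimentation, the sketch below draws one dataset from a Design-1-type DGP. The conditional mean functions are passed in as arguments (they should be the functions displayed in Design 1 above); all names are illustrative, and a rejection-rate estimate is obtained by looping such draws through the permutation test of Section 2.

import numpy as np

def simulate_design1(m1, m2, n1, n2, sigma1=1.0, sigma2=1.0, rng=None):
    """One draw from a Design-1-style DGP: X_k ~ U[0,1] independent of
    eps_k ~ N(0, sigma_k^2) and Y_k = m_k(X_k) + eps_k, k = 1, 2."""
    rng = np.random.default_rng(rng)
    x1, x2 = rng.uniform(size=n1), rng.uniform(size=n2)
    y1 = m1(x1) + sigma1 * rng.standard_normal(n1)
    y2 = m2(x2) + sigma2 * rng.standard_normal(n2)
    w = np.r_[np.ones(n1, dtype=int), 2 * np.ones(n2, dtype=int)]
    z = np.column_stack([np.r_[x1, x2], np.r_[y1, y2]])
    return w, z

# e.g. m1 = lambda x: 5 * (x - 0.2) * (x - 0.8) * (np.abs(x - 0.5) > 0.3)
#      m2 = lambda x: -5 * (x - 0.2) * (x - 0.8) * (np.abs(x - 0.5) > 0.3)
# averaging perm_test_two_sided over many such draws estimates the rejection rate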
The test statistic T_n is the difference of consistent estimators θ̂_1 − θ̂_2 multiplied by √(nh) (Equation 2.8). The studentized statistic S_n equals the difference of consistent estimators divided by the standard error of the difference (Equation 2.13). The conditional mean functions at the point 0.5 are estimated with a bandwidth h that shrinks to zero as n increases. A practical choice for h is the estimated MSE-optimal bandwidth for local linear regression, which decreases at rate n^{−1/5}. In particular, we adapt to our setting the algorithm of Imbens and Kalyanaraman (2012). This choice of bandwidth implies that T_n and S_n have asymptotic distributions that are not centered at zero. Thus, we employ local quadratic regressions using the same kernel and bandwidth as before to construct the test statistics and avoid the asymptotic bias. We use White's robust formula for local quadratic regressions to compute standard errors, where the squared residuals are obtained by the nearest-neighbor matching estimator using 3 neighbors (Abadie and Imbens (2006)).

We consider 10,000 simulated samples and 1,000 random permutations for each variation of the two designs. We compare the null rejection probability of three different tests at 5% nominal size: the non-studentized permutation test (NSP), the studentized permutation test (SP), and the t-test (t). All tests use the same bandwidth choice, and we experiment with four possibilities: three fixed choices of h, namely 0.1, 0.3, and 0.5, and the data-driven MSE-optimal bandwidth ĥ_mse.
Table 1 displays the simulated rejection rates under the null hypothesis. DGPs with σ_1 = σ_2 and h = 0.1 effectively satisfy the sharp null, and both permutation tests control size; t fails to do so, most notably in Design 1 with n_1 = 20 and Design 2 with µ = 1. Models with σ_1 = σ_2 and h = 0.5 have a bias in T_n that diverges to infinity as the sample size increases, which explains the increasing size distortion of all tests. In these cases, all tests fail to control size, although the distortions are smaller for SP than for t.

The rows of Table 1 with σ_1 ≠ σ_2 fall outside of the sharp null. Cases with σ_1 ≠ σ_2 and h ≤ 0.3 are such that the bias of T_n does not diverge as in the case of h = 0.5. For Design 1 with h = 0.1 or 0.3, there is a large size distortion of NSP and a small size distortion of SP that decreases with n, as predicted by our theory. The size distortion of SP is much smaller than that of t, especially for smaller samples. For Design 2 with h = 0.1 or 0.3, the size distortions of the permutation test are again smaller than those of t and decrease with n. Finally, the MSE-optimal bandwidth ĥ_mse balances bias and variance and does a good job in keeping the size distortions of SP low in all cases.

We shift the conditional mean function of Y in both designs to allow for θ(P_1) ≠ θ(P_2) and examine the power of the SP and t tests. It is difficult to directly compare power between SP and t because t fails to control size in most of the cases. Thus, we artificially adjust the nominal size of both tests to make sure they have a simulated rejection rate of 5% under the null hypothesis. Figures 2 and 3 display the power curves of SP (solid line) and t (dashed line), respectively for Designs 1 and 2. Each figure corresponds to a design; each panel inside a figure corresponds to one of four variations of a design and displays curves associated with three different sample sizes, with darker colors representing larger samples. The x-axis shows θ(P_1) − θ(P_2) and the y-axis, the simulated probability of rejection.

In Design 1 (Figure 2), the power curves are essentially identical between SP and t. In Design 2 (Figure 3), SP dominates t in cases with µ = 1, but power curves are otherwise very similar. The discrepancies between SP and t converge to zero as n increases, as predicted by our theory. Overall, we conclude that SP has superior size control compared to t, without substantial costs in terms of power.

Table 1 (Simulated Rejection Rates - 5% Nominal Size). Notes: the table displays the simulated rejection rates under the null hypothesis for the non-studentized permutation test (NSP), the studentized permutation test (SP), and the t-test (t). Different rows correspond to different variations of Designs 1 and 2 as explained in the text. The bandwidth choice h equals one of three fixed values (0.1, 0.3, and 0.5) or the data-driven MSE-optimal bandwidth ĥ_mse.

Figure 2 (Design 1 - Simulated Power Curves). Notes: solid lines display the power curves of the studentized permutation test and dashed lines display the power curves of the t-test. Panels vary (σ_1, σ_2) and the sample-size ratio n_1/(n_1 + n_2); darker lines correspond to bigger sample sizes. The x-axis shows θ(P_1) − θ(P_2) and the y-axis, the simulated probability of rejection. The nominal sizes of both tests are artificially adjusted such that the simulated rejection rate under the null is always equal to 5%.

Figure 3 (Design 2 - Simulated Power Curves). Notes: solid lines display the power curves of the studentized permutation test and dashed lines display the power curves of the t-test. Panels vary (σ_1, σ_2) and µ ∈ {1, 10}; darker lines correspond to bigger sample sizes. The x-axis shows θ(P_1) − θ(P_2) and the y-axis, the simulated probability of rejection. The nominal sizes of both tests are artificially adjusted such that the simulated rejection rate under the null is always equal to 5%.

Classical two-sample permutation tests for the sharp null hypothesis of equal distributions are easy to implement and have exact size in finite samples.
However, for testing equality of parameters that summarize distributions, classical permutation tests fail to control size. To fix this problem, we propose robust permutation tests based on studentized test statistics that are asymptotically linear at root-n or slower rates. Our tests have asymptotic size control and local power equal to conventional t-tests, with the added benefit of exact size control in finite samples when the distributions are equal under the null.

Our permutation tests apply to several important hypothesis tests that are widely used in the social and natural sciences, and we present four nonparametric examples. Our framework is general enough to cover both parametric and nonparametric models with two samples or one sample split into two subsamples. The sample splitting feature is particularly helpful in two of our examples where researchers are interested in testing continuity of nonparametric functions. We also propose confidence sets with correct asymptotic coverage that have exact coverage in finite samples if population distributions are the same up to a class of transformations. A simulation study confirms the theoretical findings of size control in large samples, exact size in small samples under the sharp null, and similar power curves between permutation and t-tests in large samples. Moreover, the simulation study reveals cases where the permutation test performs significantly better than the t-test, both in power and in size control outside of the sharp null.

References
References

Abadie, Alberto and Guido W. Imbens (2006) "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, Vol. 74, No. 1, pp. 235–267.

Abou-Chadi, Tarik and Werner Krause (2020) "The Causal Effect of Radical Right Success on Mainstream Parties' Policy Positions: A Regression Discontinuity Approach," British Journal of Political Science, Vol. 50, No. 3, pp. 829–847.

Agarwal, Sumit, Souphala Chomsisengphet, Neale Mahoney, and Johannes Stroebel (2017) "Do Banks Pass Through Credit Expansions to Consumers Who Want to Borrow?" Quarterly Journal of Economics, Vol. 133, No. 1, pp. 129–190.

Agha, G., E. A. Houseman, K. T. Kelsey, C. B. Eaton, S. Buka, and E. B. Loucks (2015) "Adiposity is associated with DNA methylation profile in adipose tissue," International Journal of Epidemiology, Vol. 44, No. 4, pp. 1277–1287.

Alan, S., T. Boneva, and S. Ertac (2019) "Ever Failed, Try Again, Succeed Better: Results from a Randomized Educational Intervention on Grit," The Quarterly Journal of Economics, Vol. 134, No. 3, pp. 1121–1162.

Arnup, S. J., A. B. Forbes, B. C. Kahan, K. E. Morgan, and J. E. McKenzie (2016) "Appropriate statistical methods were infrequently used in cluster-randomized crossover trials," Journal of Clinical Epidemiology, Vol. 74, pp. 40–50.

Bechtel, M. M., D. Hangartner, and L. Schmid (2015) "Does Compulsory Voting Increase Support for Leftist Policy?" American Journal of Political Science, Vol. 60, pp. 752–767.

Bellan, S. E., J. R. Pulliam, C. A. Pearson, D. Champredon, S. J. Fox, L. Skrip, A. P. Galvani, M. Gambhir, B. A. Lopman, T. C. Porco, L. A. Meyers, and J. Dushoff (2015) "Statistical power and validity of Ebola vaccine trials in Sierra Leone: a simulation study of trial design and analysis," Lancet Infectious Diseases, Vol. 15, pp. 703–710.

Bertanha, Marinho and Marcelo J. Moreira (2020) "Impossible Inference in Econometrics: Theory and Applications," Journal of Econometrics, Vol. 218, No. 2, pp. 247–270.

Bick, A., N. Fuchs-Schündeln, and D. Lagakos (2018) "How Do Hours Worked Vary with Income? Cross-Country Evidence and Implications," American Economic Review, Vol. 108, No. 1, pp. 170–199.

Bishara, A. J. and J. B. Hittner (2012) "Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches," Psychological Methods, Vol. 17, No. 3, pp. 399–417.

Bugni, Federico A. and Ivan A. Canay (2021) "Testing Continuity of a Density via g-order Statistics in the Regression Discontinuity Design," Journal of Econometrics, Vol. 221, No. 1, pp. 138–159.

Burrows, N. R., I. Hora, L. S. Geiss, E. W. Gregg, and A. Albright (2017) "Incidence of End-Stage Renal Disease Attributed to Diabetes Among Persons with Diagnosed Diabetes - United States and Puerto Rico, 2000-2014," MMWR Morbidity and Mortality Weekly Report, Vol. 66, No. 43, pp. 1165–1170.

Bursztyn, L., G. Egorov, and R. Jensen (2019) "Cool to be Smart or Smart to be Cool? Understanding Peer Pressure in Education," Review of Economic Studies, Vol. 86, pp. 1487–1526.

Caetano, Carolina (2015) "A Test of Exogeneity without Instrumental Variables in Models with Bunching," Econometrica, Vol. 83, No. 4, pp. 1581–1600.

Canay, Ivan A. and Vishal Kamat (2018) "Approximate Permutation Tests and Induced Order Statistics in the Regression Discontinuity Design," Review of Economic Studies, Vol. 85, No. 3, pp. 1577–1608.

Canay, Ivan A., Joseph P. Romano, and Azeem M. Shaikh (2017) "Randomization Tests under an Approximate Symmetry Assumption," Econometrica, Vol. 85, No. 3, pp. 1013–1030.

Cattaneo, M. D., B. R. Frandsen, and R. Titiunik (2015) "Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate," Journal of Causal Inference, Vol. 3, No. 1, pp. 1–24.

Chung, EunYi and Joseph P. Romano (2013) "Exact and Asymptotically Robust Permutation Tests," The Annals of Statistics, Vol. 41, No. 2, pp. 484–507.

Chung, EunYi and Joseph P. Romano (2016) "Multivariate and Multiple Permutation Tests," Journal of Econometrics, Vol. 193, No. 1, pp. 76–91.

Cunningham, S. and M. Shah (2018) "Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health," Review of Economic Studies, Vol. 85, pp. 1683–1715.

Das, T., S. Borgwardt, D. J. Hauke, F. Harrisberger, U. E. Lang, A. Riecher-Rössler, L. Palaniyappan, and A. Schmidt (2018) "Disorganized Gyrification Network Properties During the Transition to Psychosis," JAMA Psychiatry, Vol. 75, pp. 613–622.

Durrett, Rick (2019) Probability: Theory and Examples, Vol. 49: Cambridge University Press.

Fan, Jianqing, Tien-Chung Hu, and Young K. Truong (1994) "Robust Non-parametric Function Estimation," Scandinavian Journal of Statistics, pp. 433–446.

Fogarty, Colin B. (2021) "Prepivoted Permutation Tests," arXiv preprint arXiv:2102.04423.

Hahn, J., P. Todd, and W. Van der Klaauw (2001) "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design," Econometrica, Vol. 69, No. 1, pp. 201–209.

Hu, Y., S. Boker, M. Neale, and K. L. Klump (2014) "Coupled latent differential equation with moderators: Simulation and application," Psychological Methods, Vol. 19, No. 1, pp. 56–71.

Huang, L., M. Crino, J. HY. Wu, M. Woodward, F. Barzi, M. Land, R. McLean, J. Webster, B. Enkhtungalag, and B. Neal (2016) "Mean population salt intake estimated from 24-h urine samples and spot urine samples: a systematic review and meta-analysis," International Journal of Epidemiology, Vol. 45, No. 1, pp. 239–250.

Imbens, Guido and Karthik Kalyanaraman (2012) "Optimal Bandwidth Choice for the Regression Discontinuity Estimator," Review of Economic Studies, Vol. 79, No. 3, pp. 933–959.

Imbens, Guido W. and Paul R. Rosenbaum (2005) "Robust, Accurate Confidence Intervals with a Weak Instrument: Quarter of Birth and Education," Journal of the Royal Statistical Society: Series A (Statistics in Society), Vol. 168, No. 1, pp. 109–126.

Janssen, Arnold (1997) "Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem," Statistics & Probability Letters, Vol. 36, No. 1, pp. 9–21.

Janssen, Arnold (2005) "Resampling Student's t-type statistics," Annals of the Institute of Statistical Mathematics, Vol. 57, pp. 507–529.

Jorgensen, T. D., B. A. Kite, P. Y. Chen, and S. D. Short (2018) "Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis," Psychological Methods, Vol. 23, No. 4, pp. 708–728.

Kamat, Vishal (2018) "On Nonparametric Inference in the Regression Discontinuity Design," Econometric Theory, Vol. 34, No. 3, pp. 694–703.

Lax, J. R. and K. T. Rader (2010) "Legal Constraints on Supreme Court Decision Making: Do Jurisprudential Regimes Exist?" The Journal of Politics, Vol. 72, No. 2, pp. 273–284.

Lehmann, Erich L. and Joseph P. Romano (2005) Testing Statistical Hypotheses: Springer Science & Business Media.

López-Cuadrado, T., A. Llácer, R. Palmera-Suárez, D. Gómez-Barroso, C. Savulescu, P. González-Yuste, and R. Fernández-Cuenca (2014) "Trends in Infectious Disease Mortality Rates, Spain, 1980-2011," Emerging Infectious Diseases, Vol. 20, No. 5, pp. 782–789.

Neubert, K. and E. Brunner (2007) "A studentized permutation test for the non-parametric Behrens-Fisher problem," Computational Statistics & Data Analysis, Vol. 51, No. 10, pp. 5192–5204.

Neuhaus, G. (1993) "Conditional Rank Tests for the Two-Sample Problem Under Random Censorship," Annals of Statistics, Vol. 21, No. 4, pp. 1760–1779.

Pauly, M., E. Brunner, and F. Konietschke (2015) "Asymptotic permutation tests in general factorial designs," Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 77, No. 2, pp. 461–473.

Phyo, A. P., S. Nkhoma, K. Stepniewska, E. A. Ashley, S. Nair, R. McGready, C. L. Moo, S. Al-Saai, A. M. Dondorp, K. M. Lwin, P. Singhasivanon, N. P. Day, N. J. White, T. J. Anderson, and F. Nosten (2012) "Emergence of artemisinin-resistant malaria on the western border of Thailand: a longitudinal study," The Lancet, Vol. 379, No. 9830, pp. 1960–1966.

Pietschnig, J. and M. Voracek (2015) "One Century of Global IQ Gains: A Formal Meta-Analysis of the Flynn Effect (1909-2013)," Perspectives on Psychological Science, Vol. 10, No. 3, pp. 282–306.

Pollard, David (1991) "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, Vol. 7, No. 2, pp. 186–199.

Rajagopalan, P., D. P. Hibar, and P. M. Thompson (2013) "TREM2 Risk Variant and Loss of Brain Tissue," The New England Journal of Medicine, Vol. 369, No. 16, pp. 1565–1567.

Ramos, R. and C. Sanz (2019) "Backing the Incumbent in Difficult Times: The Electoral Impact of Wildfires," Comparative Political Studies, Vol. 53, pp. 469–499.

Rao, G. (2019) "Familiarity Does Not Breed Contempt: Generosity, Discrimination, and Diversity in Delhi Schools," American Economic Review, Vol. 109, No. 3, pp. 774–809.

Ryan, A. M., S. Krinsky, E. Kontopantelis, and T. Doran (2016) "Long-term evidence for the effect of pay-for-performance in primary care on mortality in the UK: a population study," The Lancet, Vol. 388, No. 10041, pp. 268–274.

Saez, Emmanuel (2010) "Do Taxpayers Bunch at Kink Points?" American Economic Journal: Economic Policy, Vol. 2, No. 3, pp. 180–212.

Schafer, J. and J. B. Holbein (2020) "When Time Is of the Essence: A Natural Experiment on How Time Constraints Influence Elections," The Journal of Politics, Vol. 82, No. 2, pp. 418–432.

Thistlethwaite, Donald L. and Donald T. Campbell (1960) "Regression-discontinuity Analysis: An Alternative to the Ex Post Facto Experiment," Journal of Educational Psychology, Vol. 51, No. 6, p. 309.

Valentine, Jeffrey C., Spyros Konstantopoulos, and Sara Goldrick-Rab (2017) "What Happens to Students Placed into Developmental Education? A Meta-analysis of Regression Discontinuity Studies," Review of Educational Research, Vol. 87, No. 4, pp. 806–833.

Wood, A. K. and C. R. Grose (2020) "Campaign Finance Transparency Affects Legislators' Election Outcomes and Behavior," American Journal of Political Science, forthcoming.

Zoorob, Michael (2020) "Do Police Brutality Stories Reduce 911 Calls? Reassessing an Important Criminological Finding," American Sociological Review, Vol. 85, No. 1, pp. 176–183.
Supplementary Appendix to "Permutation Tests at Nonparametric Rates"
Marinho Bertanha, EunYi Chung
This supplementary appendix contains Table 2 and all proofs from the main text.

A List of Publications
Table 2: Selected Publications in Social and Natural Sciences

Economics
Alan et al. (2019), Quarterly Journal of Economics
Bick et al. (2018), American Economic Review
Bursztyn et al. (2019), Review of Economic Studies
Cunningham and Shah (2018), Review of Economic Studies
Rao (2019), American Economic Review

Epidemiology
Agha et al. (2015), International Journal of Epidemiology
Arnup et al. (2016), Journal of Clinical Epidemiology
Burrows et al. (2017), MMWR Morbidity and Mortality Weekly Report
Huang et al. (2016), International Journal of Epidemiology
López-Cuadrado et al. (2014), Emerging Infectious Diseases

Medicine
Bellan et al. (2015), Lancet Infectious Diseases
Das et al. (2018), JAMA Psychiatry
Phyo et al. (2012), The Lancet
Rajagopalan et al. (2013), New England Journal of Medicine
Ryan et al. (2016), The Lancet

Political Science
Bechtel et al. (2015), American Journal of Political Science
Lax and Rader (2010), The Journal of Politics
Ramos and Sanz (2019), Comparative Political Studies
Schafer and Holbein (2020), The Journal of Politics
Wood and Grose (2020), American Journal of Political Science

Psychology
Bishara and Hittner (2012), Psychological Methods
Hu et al. (2014), Psychological Methods
Jorgensen et al. (2018), Psychological Methods
Pietschnig and Voracek (2015), Perspectives on Psychological Science

Notes: The table lists selected publications from top journals in each discipline that use permutation tests in their analyses. We selected top journals using Google Scholar Metrics and searched for publications in the last decade.

B Proofs of Theorems and Corollaries
B.1 Proof of Lemma 2.1 - Exact Size in Finite Samples
Summing $\phi(W_n, Z_n^{\pi})$ over $\pi \in G_n$ and taking the conditional expectation given $W_n$ yields
$$\alpha\, n! = \sum_{\pi \in G_n} E\bigl[\phi(W_n, Z_n^{\pi}) \mid W_n\bigr] = n!\, E\bigl[\phi(W_n, Z_n) \mid W_n\bigr], \tag{B.1}$$
which implies that $E[\phi(W_n, Z_n)] = E\{E[\phi(W_n, Z_n) \mid W_n]\} = \alpha$. $\Box$

B.2 Proof of Theorem 2.1 - Asymptotic Distributions Without Studentization
B.2.1 Asymptotic Distribution of Test Statistic
Consider a sequence of $W_n$ that satisfies $(n_1/n - \lambda) \to 0$, $\lambda \in (0, 1)$. Conditional on $W_n$,
$$T_n - \sqrt{nh}\,\bigl(\theta(P_1) - \theta(P_2)\bigr) = \sqrt{nh}\left[\bigl(\hat\theta_1 - \theta(P_1)\bigr) - \bigl(\hat\theta_2 - \theta(P_2)\bigr)\right]$$
$$= \frac{\sqrt{n}}{\sqrt{n_1}}\left[\frac{1}{\sqrt{n_1}}\sum_{i=1}^{n_1}\psi_n(Z_{1,i}, P_1)\right] - \frac{\sqrt{n}}{\sqrt{n_2}}\left[\frac{1}{\sqrt{n_2}}\sum_{i=1}^{n_2}\psi_n(Z_{2,i}, P_2)\right] + \frac{\sqrt{n}}{\sqrt{n_1}}\, o_{P_1}(1) - \frac{\sqrt{n}}{\sqrt{n_2}}\, o_{P_2}(1),$$
where $o_{P_k}(1)$ is a term that depends on $Z_{k,1}, \dots, Z_{k,n_k}$ and converges in probability to zero as $n \to \infty$, $k = 1, 2$.

First, since $\sqrt{n}/\sqrt{n_k} = O(1)$ for $k = 1, 2$, we have $(\sqrt{n}/\sqrt{n_1})\, o_{P_1}(1) - (\sqrt{n}/\sqrt{n_2})\, o_{P_2}(1) \xrightarrow{p} 0$.

Second, for each $k = 1, 2$, it suffices to show that the Lindeberg condition holds. Abbreviate $\psi_n(Z_{k,i}, P_k)$ by $\psi_{n,k}$ and $V[\psi_n(Z_{k,i}, P_k)]$ by $\delta_{n,k}^2$. For every $\varepsilon > 0$ and $\theta$ of Assumption 2.1-(2.6), note that
$$\left|\frac{\psi_{n,k}}{\delta_{n,k}}\right|^{2+\theta} \ge \frac{\psi_{n,k}^2}{\delta_{n,k}^2}\,\bigl(\varepsilon^2 n_k\bigr)^{\theta/2}\, I\!\left\{\frac{\psi_{n,k}^2}{\delta_{n,k}^2} > \varepsilon^2 n_k\right\},$$
so that Assumption 2.1-(2.6) implies the Lindeberg condition. With $\xi(P_k) = \lim_{n\to\infty} V[\psi_n(Z_{k,i}, P_k)]$,
$$\frac{1}{\sqrt{n_k}}\sum_{i=1}^{n_k} \psi_n(Z_{k,i}, P_k) \xrightarrow{d} N\bigl(0, \xi(P_k)\bigr).$$
Therefore,
$$\frac{\sqrt{n}}{\sqrt{n_1}}\left[\frac{1}{\sqrt{n_1}}\sum_{i=1}^{n_1}\psi_n(Z_{1,i}, P_1)\right] - \frac{\sqrt{n}}{\sqrt{n_2}}\left[\frac{1}{\sqrt{n_2}}\sum_{i=1}^{n_2}\psi_n(Z_{2,i}, P_2)\right] \xrightarrow{d} N\!\left(0,\ \frac{\xi(P_1)}{\lambda} + \frac{\xi(P_2)}{1-\lambda}\right),$$
which shows convergence in distribution conditional on $W_n$. Lemma D.3 gives convergence in distribution unconditionally.

B.2.2 Asymptotic Linear Representation of Permuted Test Statistic
In this and the following subsections, we make the entire analysis conditional on a sequence of $W_n$ that satisfies $(n_1/n - \lambda) \to 0$, $\lambda \in (0, 1)$. Without loss of generality, re-order observations in the sample such that $Z_n = (Z_1, \dots, Z_{n_1}, Z_{n_1+1}, \dots, Z_n) = (Z_{1,1}, \dots, Z_{1,n_1}, Z_{2,1}, \dots, Z_{2,n_2})$ and $W_n = (W_{1,n}, \dots, W_{n_1,n}, W_{n_1+1,n}, \dots, W_{n,n}) = (1, \dots, 1, 0, \dots, 0)$.

Let $\pi$ be a random permutation that is uniformly distributed over $G_n$ and independent of the data. The goal of this subsection is to show that, for the test statistic $T_n$ defined in Equation 2.8,
$$T_n(W_n, Z_n^{\pi}) = \frac{\sqrt{n}}{\sqrt{n_1}}\left[\frac{1}{\sqrt{n_1}}\sum_{i=1}^{n_1}\psi_n\bigl(Z_{\pi(i)}, \bar P_n\bigr)\right] - \frac{\sqrt{n}}{\sqrt{n_2}}\left[\frac{1}{\sqrt{n_2}}\sum_{i=n_1+1}^{n}\psi_n\bigl(Z_{\pi(i)}, \bar P_n\bigr)\right] + o_p(1),$$
where $\bar P_n = p_{1,n}P_1 + p_{2,n}P_2$, $p_{k,n} = n_k/n$ for $k = 1, 2$, and $o_p(1)$ is a term that depends on the data and the random permutation and converges in probability to zero.

For each $k = 1, 2$, let $V_{1,n}, \dots, V_{n_k,n}$ be an iid sample from the distribution $\bar P_n$. As $\bar P_n \in \mathcal{P}$ for every $n$, the uniform asymptotic linear representation of Assumption 2.1 guarantees that
$$R_{k,n}(V_{1,n}, \dots, V_{n_k,n}) \doteq \sqrt{n_k h}\bigl(\theta_{n_k,n}(V_{1,n}, \dots, V_{n_k,n}) - \theta(\bar P_n)\bigr) - \left(\frac{1}{\sqrt{n_k}}\sum_{i=1}^{n_k}\psi_n(V_{i,n}, \bar P_n)\right) \xrightarrow{p} 0. \tag{B.2}$$
Moreover,
$$R_{1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) = \sqrt{n_1 h}\bigl(\theta_{n_1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) - \theta(\bar P_n)\bigr) - \left(\frac{1}{\sqrt{n_1}}\sum_{i=1}^{n_1}\psi_n(Z_{\pi(i)}, \bar P_n)\right) \xrightarrow{p} 0,$$
$$R_{2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}) = \sqrt{n_2 h}\bigl(\theta_{n_2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}) - \theta(\bar P_n)\bigr) - \left(\frac{1}{\sqrt{n_2}}\sum_{i=n_1+1}^{n}\psi_n(Z_{\pi(i)}, \bar P_n)\right) \xrightarrow{p} 0.$$
Then,
$$T_n(W_n, Z_n^{\pi}) = \frac{\sqrt{n}}{\sqrt{n_1}}\left[\frac{1}{\sqrt{n_1}}\sum_{i=1}^{n_1}\psi_n(Z_{\pi(i)}, \bar P_n)\right] - \frac{\sqrt{n}}{\sqrt{n_2}}\left[\frac{1}{\sqrt{n_2}}\sum_{i=n_1+1}^{n}\psi_n(Z_{\pi(i)}, \bar P_n)\right] + \frac{\sqrt{n}}{\sqrt{n_1}}\, R_{1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) - \frac{\sqrt{n}}{\sqrt{n_2}}\, R_{2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}),$$
and the last two terms are $o_p(1)$, which gives the desired representation.

B.2.3 Coupling Approximation
The goal of this section is to create a data set $Z_n^* = (Z_1^*, \dots, Z_n^*)$ that is iid from $\bar P_n$ and a permutation $\pi_0$ such that, for a random permutation $\pi$,
$$T_n(W_n, Z_n^{*\pi\pi_0}) - T_n(W_n, Z_n^{\pi}) \xrightarrow{p} 0. \tag{B.3}$$
Recall that $Z_n = (Z_1, \dots, Z_{n_1}, Z_{n_1+1}, \dots, Z_n) = (Z_{1,1}, \dots, Z_{1,n_1}, Z_{2,1}, \dots, Z_{2,n_2})$; the distribution of $Z_n$ is $P_1^{n_1} \times P_2^{n_2}$, that is, the independent product of distributions $P_1$ ($n_1$ times) and $P_2$ ($n_2$ times). In what follows, we construct a dataset $Z_n^* = (Z_1^*, \dots, Z_n^*)$ that is iid from $\bar P_n$. For observation $i = 1$, draw an index $k$ out of $\{1, 2\}$ at random with probabilities $(p_{1,n}, p_{2,n})$. Given the resulting index $k$, set $Z_1^* = Z_{k,1}$. Move on to $i = 2$ and draw an index $k'$ as before. If the index $k'$ drawn is the same as $k$, set $Z_2^* = Z_{k,2}$. Otherwise, if $k' \neq k$, set $Z_2^* = Z_{k',1}$. Keep going until you reach a point where you may run out of observations from one of the two samples. For example, for observation $i$, if you randomly pick $k = 1$ but have already exhausted all $n_1$ observations from $P_1$, then randomly draw $Z_i^*$ from $P_1$. We end up with $Z_n^*$ and $Z_n$ having many of the same observations in common. Call $D$ the random number of observations that are different.

Reorder the observations in $Z_n^*$ by a permutation $\pi_0$ so that $Z_{\pi_0(i)}^*$ equals $Z_i$ for most $i$, except for $D$ of them. The re-ordered sample $Z_n^{*\pi_0}$ is constructed as follows. The sample $Z_n$ has the $n_1$ observations from $P_1$ appear first and the $n_2$ observations from $P_2$ appear second. Simply take all the observations from $P_1$ in $Z_n^*$ and place them first, up to spot $n_1$. The observations from $P_1$ in $Z_n^*$ that are equal to observations from $P_1$ in $Z_n$ are placed first and in the same order as in $Z_n$. If there are more than $n_1$ observations from $P_1$ in $Z_n^*$, put the extra observations to the side. If there are fewer than $n_1$, leave the remaining spots blank and start at spot $n_1 + 1$ with the observations from $P_2$ in $Z_n^*$. Repeat the same procedure for the observations in $Z_n^*$ that were drawn from $P_2$, placing them starting at spot $n_1 + 1$ and up to spot $n$. The remaining spots in the newly created sample $Z_n^{*\pi_0}$ are then filled with the observations from $Z_n^*$ that were placed to the side, in any order. Note that the distribution of $Z_n^{*\pi_0}$ is also iid $\bar P_n$.

We first need to show that the (random) number $D$ of observations that differ between $Z_n^*$ and $Z_n$ is small as a fraction of $n$. Let $n_1^*$ denote the number of observations in $Z_n^*$ that are generated from $P_1$. Then $n_1^*$ has the binomial distribution with parameters $(n, p_{1,n})$, so the mean of $n_1^*$ is $n p_{1,n} = n_1$ and $D = |n_1^* - n_1|$. Thus,
$$E[D] = E|n_1^* - n_1| = E|n_1^* - n p_{1,n}| \le \bigl[E\{(n_1^* - n p_{1,n})^2\}\bigr]^{1/2} = \bigl[n p_{1,n}(1 - p_{1,n})\bigr]^{1/2} = O(n^{1/2}),$$
where the inequality follows from Jensen's inequality and $p_{1,n}(1 - p_{1,n}) \le 1/4$. By a similar argument, $E[D^2] = O(n)$ and $V[D] = O(n)$.

Let $\pi$ be a random permutation that is uniformly distributed over $G_n$ and independent of everything else. Define $\Delta_i = 1$ for $i \le n_1$ and $\Delta_i = -n_1/n_2$ for $i > n_1$. From before, we have
$$T_n(W_n, Z_n^{\pi}) \overset{d}{=} \underbrace{\frac{\sqrt{n}}{n_1}\sum_{i=1}^{n}\Delta_{\pi(i)}\,\psi_n(Z_i, \bar P_n)}_{\doteq\, T_n^{\pi}} + o_p(1) = T_n^{\pi} + o_p(1),$$
where $A \overset{d}{=} B$ means $A$ and $B$ have the same distribution. The test statistic that uses $Z_n^{*\pi_0}$ in the place of $Z_n$ as initial sample and then undergoes the permutation $\pi$ is
$$T_n(W_n, Z_n^{*\pi\pi_0}) \overset{d}{=} \underbrace{\frac{\sqrt{n}}{n_1}\sum_{i=1}^{n}\Delta_{\pi(i)}\,\psi_n(Z_{\pi_0(i)}^*, \bar P_n)}_{\doteq\, T_n^{*\pi\pi_0}} + o_p(1) = T_n^{*\pi\pi_0} + o_p(1).$$
The rest of this subsection shows that $T_n^{*\pi\pi_0} - T_n^{\pi} \xrightarrow{p} 0$ as $n \to \infty$.

First, the expected value:
$$E[T_n^{*\pi\pi_0} - T_n^{\pi}] = \frac{\sqrt{n}}{n_1}\sum_{i=1}^{n} E\bigl[\Delta_{\pi(i)}\bigr]\, E\bigl[\psi_n(Z_{\pi_0(i)}^*, \bar P_n) - \psi_n(Z_i, \bar P_n)\bigr] = 0,$$
because $\pi$ is independent of everything else and $E[\Delta_{\pi(i)}] = 0$.

Second, the variance:
$$V(T_n^{*\pi\pi_0} - T_n^{\pi}) = E\bigl[V(T_n^{*\pi\pi_0} - T_n^{\pi} \mid D, \pi, \pi_0)\bigr] \tag{B.4}$$
$$\qquad\qquad\qquad\qquad + V\bigl[E(T_n^{*\pi\pi_0} - T_n^{\pi} \mid D, \pi, \pi_0)\bigr]. \tag{B.5}$$

Part B.4: The elements in $Z_n^{*\pi_0}$ and $Z_n$ are the same except for $D$ of them. This makes all the terms in the difference $T_n^{*\pi\pi_0} - T_n^{\pi}$ zero, except for at most $D$ of them. Conditioning on $D$, $\pi$, and $\pi_0$, the variance is
$$V\bigl[T_n^{*\pi\pi_0} - T_n^{\pi} \mid D, \pi, \pi_0\bigr] = \frac{n}{n_1^2}\, D\, V\bigl[\Delta_{\pi(i)}\bigl(\psi_n(Z_{\pi_0(i)}^*, \bar P_n) - \psi_n(Z_i, \bar P_n)\bigr) \mid D, \pi, \pi_0\bigr]$$
$$\le \frac{n}{n_1^2}\, D\, \max\left\{\left(\frac{n_1}{n_2}\right)^2, 1\right\}\, 2\bigl\{V[\psi_n(Z_1, \bar P_n)] + V[\psi_n(Z_2, \bar P_n)]\bigr\} \tag{B.6}$$
$$= \frac{n}{n_1^2}\, D \cdot O(1), \tag{B.7}$$
because $n/n_1 = O(1)$ and $V[\psi_n(Z_k, \bar P_n)] = O(1)$, $k = 1, 2$, by Assumption 2.1-(2.5). Taking the expectation,
$$E\bigl[V(T_n^{*\pi\pi_0} - T_n^{\pi} \mid D, \pi, \pi_0)\bigr] \le \frac{n}{n_1^2}\, E[D]\, O(1) = o(1). \tag{B.8}$$

Part B.5: Equation B.5 is bounded by
$$\frac{nh}{\min\{n_1, n_2\}^2}\,\bigl[\theta(P_1) - \theta(P_2) + o(1)\bigr]^2\, V\{D\}, \tag{B.9}$$
which converges to 0 under the null. However, when the null is not imposed, an alternative bound is needed. Applying the law of iterated expectations to Equation B.5 and conditioning on $D$ and $\pi_0$, the alternative bound multiplies $\bigl\{E[\psi_n(Z_1, \bar P_n)] - E[\psi_n(Z_2, \bar P_n)]\bigr\}^2$, which is $O(1)$ because of Assumption 2.1-(2.5), by a factor involving $E(D^2)$ and $E(D)^2$ that is $o(1)$. Therefore, this bound converges to 0 and the desired result is obtained.
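For concreteness, the first stage of the coupling construction above can be written as a short routine. The sketch below is illustrative only: the population draws, function names, and use of NumPy are assumptions for this example rather than anything specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative populations P1 and P2 (assumptions for this sketch only).
draw_p1 = lambda: rng.normal(0.0, 1.0)
draw_p2 = lambda: rng.normal(0.5, 2.0)

def couple_sample(z1, z2, draw1, draw2):
    """Build Z*_n, an iid draw from the mixture (n1/n)P1 + (n2/n)P2, that reuses
    the original observations whenever possible; D counts the fresh draws."""
    n1, n2 = len(z1), len(z2)
    n, p1 = n1 + n2, n1 / (n1 + n2)
    next_idx = {1: 0, 2: 0}              # next unused observation in each sample
    z_star, D = np.empty(n), 0
    for i in range(n):
        k = 1 if rng.random() < p1 else 2
        data, draw = (z1, draw1) if k == 1 else (z2, draw2)
        if next_idx[k] < len(data):      # reuse the next unused original observation
            z_star[i] = data[next_idx[k]]
            next_idx[k] += 1
        else:                            # sample k exhausted: take a fresh draw from P_k
            z_star[i] = draw()
            D += 1
    return z_star, D

z1 = np.array([draw_p1() for _ in range(60)])
z2 = np.array([draw_p2() for _ in range(40)])
z_star, D = couple_sample(z1, z2, draw_p1, draw_p2)
print(f"n = {len(z_star)}, fresh draws D = {D}")
```

On average only $O(n^{1/2})$ positions require fresh draws, which matches the bound on $E[D]$ derived above.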
B.2.4 Hoeffding's CLT

Let $\pi$ and $\pi'$ be permutations that are mutually independent, uniformly distributed over $G_n$, and independent of everything else. The goal of this section is to show that
$$(T_n^{\pi}, T_n^{\pi'}) \xrightarrow{d} (T, T'), \tag{B.10}$$
where $T, T'$ are independent normal random variables with the same distribution. To this end, we first show that
$$(T_n^{*\pi}, T_n^{*\pi'}) = \left(\frac{\sqrt{n}}{n_1}\sum_{i=1}^{n}\Delta_{\pi(i)}\,\psi_n(Z_i^*, \bar P_n),\ \frac{\sqrt{n}}{n_1}\sum_{i=1}^{n}\Delta_{\pi'(i)}\,\psi_n(Z_i^*, \bar P_n)\right) \xrightarrow{d} (T, T'). \tag{B.11}$$
By the Cramér-Wold device, we need to verify that
$$\sum_{i=1}^{n}\underbrace{\frac{\sqrt{n}}{n_1}\bigl(a\Delta_{\pi(i)} + b\Delta_{\pi'(i)}\bigr)}_{\doteq\, C_{n,i}}\,\psi_n(Z_i^*, \bar P_n) = \sum_{i=1}^{n} C_{n,i}\,\psi_n(Z_i^*, \bar P_n) \tag{B.12}$$
is asymptotically normal for any choice of constants $a$ and $b$, where $a \neq 0$ or $b \neq 0$. Note that $C_{n,1}, \dots, C_{n,n}$ is a sequence of random variables that is independent of $\psi_n(Z_i^*, \bar P_n)$, $i = 1, \dots, n$.

Call $\delta_n^2 = V[\psi_n(Z_i^*, \bar P_n)]$. In order to apply Lemma D.4 and conclude that
$$\frac{\sum_{i=1}^{n} C_{n,i}\,\psi_n(Z_i^*, \bar P_n)}{\delta_n\sqrt{\sum_{l=1}^{n} C_{n,l}^2}} \xrightarrow{d} N(0, 1),$$
we need to show that there exists $\theta > 0$ such that
$$\left(\frac{\max_{i=1,\dots,n} C_{n,i}^2}{\sum_{l=1}^{n} C_{n,l}^2}\right)^{\theta/2} E\left|\frac{\psi_n(Z_i^*, \bar P_n)}{\delta_n}\right|^{2+\theta} \xrightarrow{p} 0. \tag{B.13}$$
We verify (B.13) in three steps.

First, we show that $\max_{i=1,\dots,n} C_{n,i}^2 = O_p(n^{-1})$:
$$C_{n,i}^2 = \frac{n}{n_1^2}\bigl(a^2\Delta_{\pi(i)}^2 + 2ab\,\Delta_{\pi(i)}\Delta_{\pi'(i)} + b^2\Delta_{\pi'(i)}^2\bigr) = \frac{n}{n_1^2}\,O_p(1), \qquad \max_{i=1,\dots,n} C_{n,i}^2 = \frac{n}{n_1^2}\,O_p(1) = O_p(n^{-1}).$$

Second, we derive the probability limit of $\sum_{l=1}^{n} C_{n,l}^2$. Note that
$$E[\Delta_{\pi(i)}] = E[\Delta_{\pi'(i)}] = 0, \qquad V[\Delta_{\pi(i)}] = V[\Delta_{\pi'(i)}] = \frac{n_1}{n_2}, \qquad C[\Delta_{\pi(i)}, \Delta_{\pi'(i)}] = E[\Delta_{\pi(i)}\Delta_{\pi'(i)}] = 0,$$
$$E\left[\sum_{i=1}^{n} C_{n,i}^2\right] \to \frac{a^2 + b^2}{\lambda(1-\lambda)}, \qquad V\left[\sum_{i=1}^{n}\frac{n}{n_1^2}\bigl(a^2\Delta_{\pi(i)}^2 + 2ab\,\Delta_{\pi(i)}\Delta_{\pi'(i)} + b^2\Delta_{\pi'(i)}^2\bigr)\right] = o(1).$$
Therefore,
$$\sum_{i=1}^{n} C_{n,i}^2 \xrightarrow{p} \frac{a^2 + b^2}{\lambda(1-\lambda)},$$
which is bounded away from zero.

Third, combining the two previous steps,
$$\left(\frac{\max_{i=1,\dots,n} C_{n,i}^2}{\sum_{l=1}^{n} C_{n,l}^2}\right)^{\theta/2} = O_p\bigl(n^{-\theta/2}\bigr),$$
which combined with Assumption 2.1-(2.6) yields (B.13).

Next, we derive the limiting distribution of $\sum_{i=1}^{n} C_{n,i}\,\psi_n(Z_i^*, \bar P_n)$, that is, without the standardization. Since we already know the probability limit of $\sum_{l=1}^{n} C_{n,l}^2$, we need the limit of $\delta_n^2$. Define $\bar P = \lambda P_1 + (1-\lambda)P_2$. Then
$$\delta_n^2 = V[\psi_n(Z_i^*, \bar P_n)] - \xi(\bar P_n) + \xi(\bar P_n) - \xi(\bar P) + \xi(\bar P) \to \xi(\bar P),$$
where we used Assumption 2.1-(2.4) and Assumption 2.1-(2.7). Therefore,
$$\sum_{i=1}^{n} C_{n,i}\,\psi_n(Z_i^*, \bar P_n) \xrightarrow{d} N\!\left(0,\ \xi(\bar P)\,\frac{a^2 + b^2}{\lambda(1-\lambda)}\right).$$
By the Cramér-Wold device, we conclude that $(T_n^{*\pi}, T_n^{*\pi'}) \xrightarrow{d} (T, T')$, where $(T, T')$ is bivariate normal with zero means, equal variances
$$V[T] = V[T'] = \frac{\xi(\bar P)}{\lambda(1-\lambda)} \doteq \tau^2, \tag{B.14}$$
and zero covariance. Thus, $T$ and $T'$ are independent.

So far, we have shown that $(T_n^{*\pi}, T_n^{*\pi'}) \xrightarrow{d} (T, T')$. Consider the permutation $\pi_0$ from Section B.2.3. The conditions on $(\pi, \pi')$ imply that the permutations $(\pi\pi_0, \pi'\pi_0)$ are also mutually independent, uniformly distributed over $G_n$, and independent of everything else. This implies that $(T_n^{*\pi\pi_0}, T_n^{*\pi'\pi_0}) \xrightarrow{d} (T, T')$. Finally,
$$(T_n^{\pi}, T_n^{\pi'}) = (T_n^{\pi} - T_n^{*\pi\pi_0},\ T_n^{\pi'} - T_n^{*\pi'\pi_0}) + (T_n^{*\pi\pi_0}, T_n^{*\pi'\pi_0}) \xrightarrow{d} (T, T'),$$
where we use the coupling argument from Section B.2.3 to obtain $T_n^{\pi} - T_n^{*\pi\pi_0} \xrightarrow{p} 0$ and $T_n^{\pi'} - T_n^{*\pi'\pi_0} \xrightarrow{p} 0$, and the Slutsky theorem.
B.2.5 Unconditional Argument
In this subsection, we apply Lemma D.3 to show that the conclusion of Section B.2.4 also holds unconditionally. The analysis in Sections B.2.3 and B.2.4 is conditional on a sequence $W_\infty$ that satisfies $|n_1/n - \lambda| \to 0$, whereas in general $n_1$ is a random variable. Assumption 2.2 implies that $(n_1/n - \lambda) \xrightarrow{p} 0$, $\lambda \in (0, 1)$, and Lemma D.3 then gives $(T_n^{\pi}, T_n^{\pi'}) \xrightarrow{d} (T, T')$ unconditionally. By Hoeffding's CLT (Lehmann and Romano (2005), Theorem 15.2.3),
$$\hat R_{T_n}(t) \xrightarrow{p} \Phi\!\left(\frac{t}{\tau}\right),$$
where $\Phi$ is the CDF of the standard normal distribution and $\tau^2$ is given by (B.14). Uniform consistency follows from continuity and monotonicity of $\Phi$ (Lemma D.5).
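As a quick check on why the non-studentized permutation critical values can fail, compare the two limiting variances derived above in the balanced case $\lambda = 1/2$ (the numerical choice is only for illustration):
$$\sigma^2 = \frac{\xi(P_1)}{\lambda} + \frac{\xi(P_2)}{1-\lambda} = 2\,\xi(P_1) + 2\,\xi(P_2), \qquad \tau^2 = \frac{\xi(\bar P)}{\lambda(1-\lambda)} = 4\,\xi\!\left(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\right).$$
The two coincide only if $\xi\bigl(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\bigr) = \tfrac{1}{2}\{\xi(P_1) + \xi(P_2)\}$; in general $\tau^2 \neq \sigma^2$, which is why the studentized statistic treated next is needed.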
B.3 Proof of Theorem 2.2 - Asymptotic Distributions With Studentization

B.3.1 Asymptotic Distribution of Test Statistic
Consider a sequence of $W_n$ that satisfies $(n_1/n - \lambda) \to 0$, $\lambda \in (0, 1)$. Conditional on $W_n$ for each $n$,
$$S_n - \frac{\sqrt{nh}\,\bigl(\theta(P_1) - \theta(P_2)\bigr)}{\hat\sigma_n} = \frac{\sqrt{nh}}{\hat\sigma_n}\Bigl[\bigl(\hat\theta_1 - \theta(P_1)\bigr) - \bigl(\hat\theta_2 - \theta(P_2)\bigr)\Bigr] \xrightarrow{d} N(0, 1),$$
because $\hat\sigma_n \xrightarrow{p} \sigma$, $T_n - \sqrt{nh}\,(\theta(P_1) - \theta(P_2)) \xrightarrow{d} N(0, \sigma^2)$ by Theorem 2.1, and the Slutsky theorem. The same is true unconditional on $W_n$ by Lemma D.3.

B.3.2 Asymptotic Permutation Distribution
Again, consider a sequence of $W_n$ that satisfies $(n_1/n - \lambda) \to 0$, $\lambda \in (0, 1)$, and condition on $W_n$ for each $n$. Without loss of generality, re-order observations in the sample such that $Z_n = (Z_1, \dots, Z_{n_1}, Z_{n_1+1}, \dots, Z_n) = (Z_{1,1}, \dots, Z_{1,n_1}, Z_{2,1}, \dots, Z_{2,n_2})$ and $W_n = (W_{1,n}, \dots, W_{n_1,n}, W_{n_1+1,n}, \dots, W_{n,n}) = (1, \dots, 1, 0, \dots, 0)$.

Let $\pi$ be a random permutation that is uniformly distributed over $G_n$ and independent of the data. For each $k = 1, 2$, let $V_{1,n}, \dots, V_{n_k,n}$ be an iid sample from the distribution $\bar P_n$. Assumption 2.3 guarantees that
$$\xi_{n_k,n}(V_{1,n}, \dots, V_{n_k,n}) - \xi(\bar P_n) \xrightarrow{p} 0. \tag{B.15}$$
Lemma 5.3 and Remark A.2 by Chung and Romano (2013) show that
$$\xi_{n_1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) - \xi(\bar P_n) \xrightarrow{p} 0 \quad\text{and}\quad \xi_{n_2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}) - \xi(\bar P_n) \xrightarrow{p} 0.$$
Using Assumption 2.1-(2.7),
$$\xi_{n_1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) \xrightarrow{p} \xi(\bar P) \quad\text{and}\quad \xi_{n_2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}) \xrightarrow{p} \xi(\bar P),$$
so that
$$(\hat\sigma_n^{\pi})^2 \doteq \frac{n}{n_1}\,\xi_{n_1,n}(Z_{\pi(1)}, \dots, Z_{\pi(n_1)}) + \frac{n}{n_2}\,\xi_{n_2,n}(Z_{\pi(n_1+1)}, \dots, Z_{\pi(n)}) \xrightarrow{p} \frac{\xi(\bar P)}{\lambda(1-\lambda)} = \tau^2.$$
By Lemma D.3, $(\hat\sigma_n^{\pi})^2 \xrightarrow{p} \tau^2$ unconditional on $W_n$. Section B.2.5 shows that $(T_n^{\pi}, T_n^{\pi'}) \xrightarrow{d} (T, T')$, where $(T, T')$ are independent normals with variance $\tau^2$. Given these two facts together with Theorem 5.2 by Chung and Romano (2013), the asymptotic permutation distribution of $T_n/\tau$ is the same as that of $S_n = T_n/\hat\sigma_n$. Therefore, $\hat R_{S_n}(t) \xrightarrow{p} \Phi(t)$. Uniform consistency follows from the monotonicity and continuity of $\Phi$ (Lemma D.5).

B.4 Proof of Corollary 2.1 - Asymptotic Size and Power
Proof.
For $a \in (0, 1)$, define $r(a) = \inf\{t : \Phi(t) \ge a\} = \Phi^{-1}(a)$ and $\hat r_n(a) = \inf\{t : \hat R_{S_n}(t) \ge a\}$. Lemma 11.2.1 by Lehmann and Romano (2005) says that $\hat R_{S_n} \xrightarrow{p} \Phi$ implies $\hat r_n(a) \xrightarrow{p} r(a)$. Rewrite the test $\phi$ as
$$\phi(W_n, Z_n) = \begin{cases} 1 & \text{if } S_n > \hat r_n(1-\alpha/2) \text{ or } S_n < \hat r_n(\alpha/2), \\ a & \text{if } S_n = \hat r_n(1-\alpha/2) \text{ or } S_n = \hat r_n(\alpha/2), \\ 0 & \text{if } \hat r_n(\alpha/2) < S_n < \hat r_n(1-\alpha/2). \end{cases}$$

First, assume the null hypothesis $\theta(P_1) - \theta(P_2) = 0$ holds. As $n \to \infty$, $S_n \xrightarrow{d} S$, where $S$ is a standard normal random variable, and
$$S_n - \hat r_n(a) + r(a) \xrightarrow{d} S, \qquad P[S_n < \hat r_n(a)] \to P[S < r(a)] = a,$$
$$P[S_n > \hat r_n(a)] \to P[S > r(a)] = 1 - a, \qquad P[S_n = \hat r_n(a)] \to P[S = r(a)] = 0.$$
Then, the probability of rejection is
$$E[\phi(W_n, Z_n)] = P\bigl[S_n(W_n, Z_n) > \hat r_n(1-\alpha/2)\bigr] + P\bigl[S_n(W_n, Z_n) < \hat r_n(\alpha/2)\bigr] + o(1) \to 1 - (1 - \alpha/2) + \alpha/2 = \alpha.$$

Second, assume that $\theta(P_1) - \theta(P_2) = \eta$ with $\eta > 0$. For $m$ fixed and $n \to \infty$,
$$S_n - \sqrt{nh_n}\,\eta/\hat\sigma_n + \sqrt{mh_m}\,\eta/\hat\sigma_n - \hat r_n(a) + r(a) \xrightarrow{d} S + \sqrt{mh_m}\,\eta/\sigma.$$
For $m$ fixed and $n$ larger than $m$,
$$P[S_n < \hat r_n(a)] \le P\bigl[S_n - \sqrt{nh_n}\,\eta/\hat\sigma_n + \sqrt{mh_m}\,\eta/\hat\sigma_n - \hat r_n(a) + r(a) < r(a)\bigr],$$
$$\lim_{n\to\infty} P[S_n < \hat r_n(a)] \le P\bigl[S + \sqrt{mh_m}\,\eta/\sigma < r(a)\bigr] = \Phi\bigl(r(a) - \sqrt{mh_m}\,\eta/\sigma\bigr),$$
$$P[S_n > \hat r_n(a)] \ge P\bigl[S_n - \sqrt{nh_n}\,\eta/\hat\sigma_n + \sqrt{mh_m}\,\eta/\hat\sigma_n - \hat r_n(a) + r(a) > r(a)\bigr],$$
$$\lim_{n\to\infty} P[S_n > \hat r_n(a)] \ge P\bigl[S + \sqrt{mh_m}\,\eta/\sigma > r(a)\bigr] = 1 - \Phi\bigl(r(a) - \sqrt{mh_m}\,\eta/\sigma\bigr).$$
Taking limits as $m \to \infty$ on both sides,
$$\lim_{n\to\infty} P[S_n < \hat r_n(a)] \le \lim_{m\to\infty} \Phi\bigl(r(a) - \sqrt{mh_m}\,\eta/\sigma\bigr) = 0, \qquad \lim_{n\to\infty} P[S_n > \hat r_n(a)] \ge \lim_{m\to\infty} 1 - \Phi\bigl(r(a) - \sqrt{mh_m}\,\eta/\sigma\bigr) = 1.$$
Then, $E[\phi(W_n, Z_n)] \to 1$.

Moreover, there is no loss in power from using permutation critical values. The asymptotic test rejects when $S_n > r(1-\alpha/2)$ or $S_n < r(\alpha/2)$, where $r(a) = \Phi^{-1}(a)$ is nonrandom. Suppose $S_n(W_n, Z_n) \xrightarrow{d} L_\eta$ for some distribution $L_\eta$ under a sequence of alternatives that are contiguous to some distribution satisfying the null hypothesis and $\theta(P_{1n}) - \theta(P_{2n}) = \eta/\sqrt{nh_n}$. Then the power of the asymptotic test against these local alternatives tends to $1 - L_\eta(\Phi^{-1}(1-\alpha/2)) + L_\eta(\Phi^{-1}(\alpha/2))$. The critical value $\hat r_n$ obtained from the permutation distribution satisfies $\hat r_n(a) \xrightarrow{p} \Phi^{-1}(a)$; the same result follows under the sequence of contiguous alternatives, thus implying that the permutation test has the same limiting local power as the asymptotic test that uses nonrandom critical values.

B.5 Proof of Corollary 4.1 - Confidence Set

Proof.
Fix $n$ and $Q_n$ arbitrary. Pick any pair $(P_1, P_2)$. Call $\delta = \Psi(P_1, P_2)$. By assumption, $\psi_\delta P_1 = P_2$. Lemma 2.1 says that $E[\phi_\delta(W_n, Z_n)] = E[\phi_\delta(W_n, \tilde Z_n)] = \alpha$. Therefore,
$$P\bigl[\Psi(P_1, P_2) \in C_n(W_n, Z_n)\bigr] = P\bigl[U > \phi_\delta(W_n, Z_n)\bigr] = E\bigl[P\bigl(U > \phi_\delta(W_n, Z_n) \mid W_n, Z_n\bigr)\bigr] = 1 - E[\phi_\delta(W_n, Z_n)] = 1 - \alpha.$$
Now, suppose $\psi_\delta P_1 \neq P_2$ and Assumptions 2.1-2.3 hold. Corollary 2.1 says that $E[\phi_\delta(W_n, Z_n)] \to \alpha$. Take the limit as $n \to \infty$ on both sides of the equality above. It follows that the asymptotic coverage of $C_n(W_n, Z_n)$ is $1 - \alpha$.
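For concreteness, the confidence set can be computed by inverting the permutation test over a grid of candidate discrepancies. The sketch below treats the simple case where $\Psi$ is a location shift and $\psi_\delta$ subtracts $\delta$ from the first sample; the function names, the grid, and the Monte Carlo permutation scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def perm_pvalue(z1, z2, n_perm=999, rng=None):
    """Two-sided permutation p-value for the studentized difference in means."""
    rng = np.random.default_rng(0) if rng is None else rng
    def stat(a, b):
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        return (a.mean() - b.mean()) / se
    pooled, n1 = np.concatenate([z1, z2]), len(z1)
    s_obs = abs(stat(z1, z2))
    s_perm = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(pooled)
        s_perm[b] = abs(stat(perm[:n1], perm[n1:]))
    return (1 + np.sum(s_perm >= s_obs)) / (n_perm + 1)

def confidence_set(z1, z2, grid, alpha=0.05):
    """Keep every shift delta whose transformed data do not reject the test."""
    return [d for d in grid if perm_pvalue(z1 - d, z2) > alpha]

rng = np.random.default_rng(1)
y1 = rng.normal(1.0, 1.0, 80)    # sample from P1
y2 = rng.normal(0.0, 1.0, 80)    # sample from P2: P1 is a unit shift of P2
grid = np.linspace(0.0, 2.0, 81)
kept = confidence_set(y1, y2, grid)
print(min(kept), max(kept))      # approximate endpoints of the confidence set
```

Each grid value $\delta$ is retained when the permutation test applied to the transformed data fails to reject, mirroring the inversion argument in the proof above; in this example the true shift makes the transformed distributions equal, the case in which coverage is exact in finite samples.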
C Proofs of Applications

This appendix gathers the proofs of Propositions 3.1-3.4. Key steps in these proofs consist of demonstrating uniform convergence over $\mathcal{P}$. By uniformly over $\mathcal{P}$ we mean over $P \in \mathcal{P}$ both as the distribution of $V$ and as the argument of functions, e.g., $f_R(x; P)$ or $m_{S|R}(x; P)$. For $A_n(P)$ a random function of $P$, with distribution depending on $P$, we use $A_n(P) = o_{\mathcal{P}}(1)$ to denote $\sup_{P \in \mathcal{P}} P_P[|A_n(P)| > \varepsilon] \to 0$ for every $\varepsilon > 0$; we also use $A_n(P) = O_{\mathcal{P}}(1)$ to denote that, for every $\delta > 0$, there exists $M_\delta < \infty$ such that $\sup_{P \in \mathcal{P}} P_P[|A_n(P)| > M_\delta] < \delta$. The same notation is applied when $A_n(P)$ is a deterministic function of $P$. See Definition D.1 and Lemma D.1 in Appendix D.

C.1 Proof of Proposition 3.1 - Controlled Means
The goal of this proof is to use the assumptions listed in Proposition 3.1 to verify As-sumptions 2.1 and 2.2.Let m n be an arbitrary sequence of positive integers that grows with n such that m n /n → γ , for some γ ∈ (0 , n of m n for notational ease. Let V = ( R, S ),draw V , . . . , V m iid P ∈ P . The assumptions in Proposition 3.1 imply the following facts:1. As m → ∞ , h → , mh → ∞ , √ mhh → (cid:82) K ( u ) u du = 0 , (cid:82) K r ( u ) u s g ( u ) du < ∞ for ≤ r < ∞ , ≤ s ≤ , and boundedfunction g ( u ); 14. The distribution of R has PDF f R ( r ; P ) , first and second derivatives wrt r denoted ∇ r f R ( r ; P ) and ∇ r f R ( r ; P ) , respectively; f R ( r ; P ) , ∇ r f R ( r ; P ) , and ∇ r f R ( r ; P ) arebounded as functions of ( r, P ) ; f R ( r ; P ) is bounded away from zero as a function of ( r, P ) ; To see this, note that f R ( r ; P ) is a convex combination of f X ( r ) and f X ( r ), eachbounded, with bounded derivatives, and bounded away from zero.4. m S | R ( r ; P ) = E P [ S | R = r ] has first and second derivatives wrt r denoted ∇ r m S | R ( r ; P ) and ∇ r m S | R ( r ; P ) , respectively; m S | R , ∇ r m S | R , ∇ r m S | R are all bounded as functionsof ( r, P ) ; To see this, take P = αP + (1 − α ) P and note that, m S | R ( r ; P ) = αf X ( r ) f R ( r ; P ) m Y | X ( r ) + (1 − α ) f X ( r ) f R ( r ; P ) m Y | X ( r ) . The expectations m Y k | X k ( r ) are bounded functions of r . The weights ω ( r ; P ) . = α f X ( r ) f R ( r ; P ) and ω ( r ; P ) . = (1 − α ) f X ( r ) f R ( r ; P ) are bounded functions of ( r, P ) because they arepositive and sum to 1. The expectations m Y k | X k ( r ) and the weights ω k ( r ; P ) are twicedifferentiable wrt r . The derivatives of m Y k | X k ( r ) are bounded wrt r . The derivativesof the weights are bounded because the derivatives of the PDFs f X k ( r ) are boundedplus the fact that f R ( r ; P ) is bounded away from zero over ( r, P ).5. v S | R ( r ; P ) = V P [ S | R = r ] has first derivative wrt r denoted ∇ r v S | R ( r ; P ) ; v S | R , ∇ r v S | R are both bounded as functions of ( r, P ) ; v S | R ( x ; P ) is bounded away from zero as afunction of P ; Again, v S | R ( r ; P ) = m S | R ( r ; P ) − m S | R ( r ; P ), where m S | R ( r ; P ) = ω ( r ; P ) m Y | X ( r ) + ω ( r ; P ) m Y | X ( r )= ω ( r ; P ) (cid:2) v Y | X ( r ) + m Y | X ( r ) (cid:3) + ω ( r ; P ) (cid:2) v Y | X ( r ) + m Y | X ( r ) (cid:3) m S | R ( r ; P ) = (cid:2) ω ( r ; P ) m Y | X ( r ) + ω ( r ; P ) m Y | X ( r ) (cid:3) . A similar argument to Fact 4 shows that v S | R and ∇ r v S | R are bounded functionsof ( r, P ). Next, v Y k | X k ( x ) = m Y k | X k ( x ) − m Y k | X k ( x ) is bounded away from zero,so that m Y k | X k ( x ) is bounded away from m Y k | X k ( x ). It follows that m S | R ( x ; P ) = ω ( x ; P ) m Y | X ( x ) + ω ( x ; P ) m Y | X ( x ) is bounded away from ω ( x ; P ) m Y | X ( x ) + ω ( x ; P ) m Y | X ( x ), which is greater than or equal to m S | R ( x ; P ). Thus, v S | R ( x ; P )is bounded away from zero as a function of P .15. Define η ( r ; P ) = E P [ | S − m S | R ( r ; P ) | θ | R = r ] . η ( r ; P ) is a bounded function of ( r, P ) . 
By the c r -inequality, η ( r ; P ) ≤ θ E P [ | S | θ | R = r ] + 2 θ | m S | R ( r ; P ) | θ = ω ( r ; P ) E [ | Y | θ | X = r ] + ω ( r ; P ) E [ | Y | θ | X = r ]+2 θ | m S | R ( r ; P ) | θ which is a bounded function of ( r, P ) because the weights ω k ( r ; P ) are bounded, E [ | Y k | θ | X k = r ] are bounded, and m S | R ( r ; P ) is bounded.We re-write √ mh (cid:16)(cid:98) θ − θ ( P ) (cid:17) to find the asymptotic linear representation. √ mh (cid:16)(cid:98) θ − θ ( P ) (cid:17) = (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:33) f R ( x ; P ) − (C.1)+ (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:33)(cid:32) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:33) − − f R ( x ; P ) − (C.2)+ (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1)(cid:33)(cid:32) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:33) − (C.3)1. Assumption 2.1 - (2.2): asymptotic expansion.Equation C.1 above gives the influence function ψ n .1 √ m m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1) h − / f − R ( x ; P ) (cid:124) (cid:123)(cid:122) (cid:125) . = ψ n ( V i ,P ) = 1 √ m m (cid:88) i =1 ψ n ( V i , P ) . We need to show that Equations C.2 and C.3 converge in probability to zero uniformlyover P . Equation C.2: is o P (1). We show this in 3 steps.16irst, V P (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:33) = (cid:90) K ( u ) v S | R ( x + uh ; P ) f R ( x + uh ; P ) du which is bounded over P because of the kernel properties (Fact 2 above), and v S | R ( r ; P )and f R ( r ; P ) are bounded functions of ( r, P ). Next, E P (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:33) = 0 . Use Lemma D.1, part 2, to conclude that1 √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1) = O P (1) . Second, E P (cid:34) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:35) − f R ( x ; P ) = (cid:90) K ( u ) ∇ r f R ( x ∗ uh ; P ) uh du where x ∗ uh is a point between x + uh . The expression above converges to zero uniformlyover P because of kernel properties (Fact 2), the derivative ∇ r f R ( r ; P ) is a boundedfunction of ( r, P ), and h → m → ∞ . Next, the variance of the same term. V P (cid:34) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:35) = 1 mh V P (cid:20) K (cid:18) R i − xh (cid:19)(cid:21) ≤ mh E P (cid:20) K (cid:18) R i − xh (cid:19)(cid:21) = 1 mh (cid:90) K ( u ) f R ( x + uh ; P ) du which converges to zero uniformly over P because mh → ∞ , kernel properties (Fact2), and f R ( r ; P ) is a bounded function of ( r, P ). Use Lemma D.1, parts 2 and 3 toarrive at: 1 mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) − f R ( x ; P ) = o P (1) (C.4)17 mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:33) − − ( f R ( x ; P )) − = o P (1) . (C.5)Third, combine steps 1 and 2 and use Lemma D.1 - part 1: (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:33)(cid:32) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:33) − − ( f R ( x ; P )) − = O P (1) o P (1) = o P (1) . Equation C.3: is o P (1). 
We show this in 3 steps.First, (cid:0) mh (cid:80) mi =1 K (cid:0) R i − xh (cid:1)(cid:1) − = O P (1) by what was shown above (Equations C.4 andC.5).Second, E P (cid:34) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1)(cid:35) = √ mh (cid:90) K ( u ) [ m S | R ( x + uh ; P ) − m S | R ( x ; P )] f R ( x + uh ; P ) du = √ mh (cid:90) K ( u ) [ ∇ r m S | R ( x ; P ) uh + ∇ r m S | R ( x ∗ uh ; P ) u h / f R ( x ; P ) + ∇ r f R ( x ∗∗ uh ; P ) uh ] du = √ mhh (cid:90) K ( u ) u du ∇ r m S | R ( x ; P ) f R ( x ; P )+ √ mhh (cid:90) K ( u ) u ∇ r f R ( x ∗∗ uh ; P ) du ∇ r m S | R ( x ; P )+ √ mhh / (cid:90) K ( u ) u ∇ r m S | R ( x ∗ uh ; P ) du f R ( x ; P )+ √ mhh / (cid:90) K ( u ) u ∇ r m S | R ( x ∗ uh ; P ) ∇ r f R ( x ∗∗ uh ; P ) du =0 + O P (cid:16) √ mhh (cid:17) + O P (cid:16) √ mhh (cid:17) + O P (cid:16) √ mhh (cid:17) = o P (1) . where we use the following: (i) (cid:82) K ( u ) u du = 0 and other kernel properties (Fact 2);(ii) ∇ r m S | R ( r ; P ), ∇ r m S | R ( r ; P ), f R ( r ; P ), and ∇ r f R ( r ; P ) are bounded functions of18 r, P ); (iii) √ mhh →
0; and (iv) x ∗ uh , x ∗∗ uh are points between x + uh and x . Next, thevariance. V P (cid:34) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1)(cid:35) = 1 h V P (cid:20) K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1)(cid:21) ≤ h E P (cid:20) K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1) (cid:21) = h (cid:90) K ( u ) [ ∇ r m S | R ( x ∗ uh ; P )] u f R ( x + uh ; P ) du = O P (cid:0) h (cid:1) = o P (1)where we use that (i) ∇ r m S | R ( r ; P ) and f R ( r ; P ) are bounded functions of ( r, P ); (ii) h →
0; and (iii) kernel properties (Fact 2).Apply Lemma D.1- part 2 to get1 √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1) = o P (1) . Third, combine the first and second steps and apply Lemma D.1- part 1 to arrive at (cid:32) √ mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19) (cid:0) m S | R ( R i ; P ) − m S | R ( x ; P ) (cid:1)(cid:33) (cid:32) mh m (cid:88) i =1 K (cid:18) R i − xh (cid:19)(cid:33) − = o P (1) O P (1) = o P (1) .
2. Assumption 2.1 - (2.3): zero mean of influence function. E P [ ψ n ( V i , P )] = 0 ∀ P by construction.3. Assumption 2.1 - (2.4): variance of influence function.Define ξ ( P ) = M K v S | R ( x ; P ) /f R ( x ; P ), where M K = (cid:82) ∞−∞ K ( u ) du . V P (cid:18) f R ( x ; P ) √ h K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:19) − ξ ( P )= 1 f R ( x ; P ) (cid:90) K ( u ) v S | R ( x + uh ; P ) f R ( x + uh ; P ) − v S | R ( x ; P ) f R ( x ; P ) (cid:124) (cid:123)(cid:122) (cid:125) . = g ( x ; P ) du hf R ( x ; P ) (cid:90) K ( u ) ∇ x g ( x ∗ uh ; P ) u du = o P (1) . where ∇ x g ( x ; P ) denotes the derivative of g ( x ; P ) wrt x . The expression above con-verges to zero uniformly over P because h →
0, Fact 2 on the kernel, the derivative ∇ x g ( x ; P ) is a bounded function of ( x, P ), and f R ( x ; P ) is bounded away from zero asa function of P . Therefore,sup P ∈P (cid:12)(cid:12) V P [ ψ n ( V i , P )] − ξ ( P ) (cid:12)(cid:12) → .
4. Assumption 2.1 - (2.5): sup P ∈P E [ ψ n ( Z k,i , P )] < ∞ for k = 1 , ψ n ( Z k , P ) = K (cid:18) X k − xh (cid:19) (cid:0) Y k − m S | R ( X k ; P ) (cid:1) h − / f − R ( x ; P ) ψ n ( Z k , P ) = K (cid:0) X k − xh (cid:1) hf R ( x ; P ) (cid:2)(cid:0) Y k − m Y k | X k ( X k ) (cid:1) + (cid:0) m Y k | X k ( X k ) − m S | R ( X k ; P ) (cid:1)(cid:3) = K (cid:0) X k − xh (cid:1) hf R ( x ; P ) (cid:104)(cid:0) Y k − m Y k | X k ( X k ) (cid:1) + (cid:0) m Y k | X k ( X k ) − m S | R ( X k ; P ) (cid:1) +2 (cid:0) Y k − m Y k | X k ( X k ) (cid:1) (cid:0) m Y k | X k ( X k ) − m S | R ( X k ; P ) (cid:1)(cid:3) E (cid:2) ψ n ( Z k , P ) | X k (cid:3) = K (cid:0) X k − xh (cid:1) hf R ( x ; P ) (cid:104) v Y k | X k ( X k ) + (cid:0) m Y k | X k ( X k ) − m S | R ( X k ; P ) (cid:1) (cid:105) E (cid:2) ψ n ( Z k , P ) (cid:3) = 1 f R ( x ; P ) (cid:90) K ( u ) (cid:2) v Y k | X k ( x + uh )+ (cid:0) m Y k | X k ( x + uh ) − m S | R ( x + uh ; P ) (cid:1) (cid:105) f X k ( x + uh ) du = O P (1)because f R ( x ; P ) is bounded away from zero as a function of P , the conditional momentfunctions and f X k inside the integral are bounded, and Fact 2 on the kernel.5. Assumption 2.1 - (2.6): (2 + θ )-th moment condition.We verify it in two steps.First, V P ( ψ n ( V i , P )) = V P (cid:18) f R ( x ; P ) √ h K (cid:18) R i − xh (cid:19) (cid:0) S i − m S | R ( R i ; P ) (cid:1)(cid:19)
20 1 f R ( x ; P ) (cid:90) K ( u ) (cid:8) v S | R ( x + uh ; P ) f R ( x + uh ; P ) (cid:9) du is bounded away from zero uniformly over P and n because (i) f R ( x ; P ) is boundedas a function of P ; and (ii) v S | R ( r ; P ) and f R ( r ; P ) are continuous functions of r andbounded away from zero at x = r and over P .Second, for θ of the moment condition in Proposition 3.1, call η ( r ; P ) = E P [ | S i − m S | R ( R i ; P ) | θ | R i = r ]. n − θ/ E P | ψ n ( V i , P ) | θ = ( m/n ) θ/ f θR ( x ; P )( mh ) θ/ (cid:90) | K ( u ) | θ η ( x + uh ; P ) f R ( x + uh ; P ) du = o P (1)because (i) mh → ∞ , m/n = O (1); (ii) f R ( x ; P ) = O P (1) and f − R ( x ; P ) = O P (1); (iii) η ( x ; P ) = O P (1); and (iv) Fact 2 on the kernel. Combining steps 1 and 2, n − θ/ sup P ∈P E P (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ψ n ( V i , P ) (cid:112) V P ( ψ n ( V i , P )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = o (1) .
6. Assumption 2.1 - (2.7): $\xi\bigl(\tfrac{m}{n}P_1 + \tfrac{n-m}{n}P_2\bigr) \to \xi\bigl(\gamma P_1 + (1-\gamma)P_2\bigr)$.
Let $\bar P_n = \tfrac{m}{n}P_1 + \tfrac{n-m}{n}P_2$ and $\bar P = \gamma P_1 + (1-\gamma)P_2$. We have that $\xi(\bar P_n) = M_K\, v_{S|R}(x; \bar P_n)/f_R(x; \bar P_n)$, so it suffices to show that $v_{S|R}(x; \bar P_n) \to v_{S|R}(x; \bar P)$ and $f_R(x; \bar P_n) \to f_R(x; \bar P)$.
First, convergence of the PDF:
$$f_R(x; \bar P_n) = \frac{m}{n} f_{X_1}(x) + \frac{n-m}{n} f_{X_2}(x) \to \gamma f_{X_1}(x) + (1-\gamma) f_{X_2}(x) = f_R(x; \bar P).$$
Second, convergence of moments. For $g(x) = x$ or $g(x) = x^2$,
$$E_{\bar P_n}[g(S) \mid R = x] = \frac{m f_{X_1}(x)}{n f_R(x; \bar P_n)}\, E[g(Y_1) \mid X_1 = x] + \frac{(n-m) f_{X_2}(x)}{n f_R(x; \bar P_n)}\, E[g(Y_2) \mid X_2 = x]$$
$$\to \frac{\gamma f_{X_1}(x)}{f_R(x; \bar P)}\, E[g(Y_1) \mid X_1 = x] + \frac{(1-\gamma) f_{X_2}(x)}{f_R(x; \bar P)}\, E[g(Y_2) \mid X_2 = x] = E_{\bar P}[g(S) \mid R = x],$$
which implies that $v_{S|R}(x; \bar P_n) \to v_{S|R}(x; \bar P)$. Therefore, $\xi(\bar P_n) \to \xi(\bar P)$.

7. Assumption 2.2: We have that $n_1$ is a deterministic sequence and $(n_1/n - \lambda) \to 0$. The arguments above apply with $m_n$ such that $(m_n/n - \lambda) \to 0$, as well as with $m_n$ such that $(m_n/n - (1-\lambda)) \to 0$. $\Box$

C.2 Proof of Proposition 3.2 - Controlled Quantiles
The goal of this proof is to use the assumptions listed in Proposition 3.2 to verify As-sumptions 2.1 and 2.2. It adapts arguments from Pollard (1991) and Fan et al. (1994).Let m n be an arbitrary sequence of positive integers that grows with n such that m n /n → γ , for some γ ∈ (0 , V = ( R, S ), draw V , . . . , V m iid P ∈ P . Define U = S − θ ( P ).The assumptions in Proposition 3.2 imply the following facts:1. As m → ∞ , h → , mh → ∞ , √ mhh → (cid:82) K ( u ) u du = 0 , (cid:82) K r ( u ) u s g ( u ) du < ∞ for ≤ r < ∞ , ≤ s ≤ , and boundedfunction g ( u );3. The distribution of R has PDF f R ( r ; P ) , first and second derivatives wrt r denoted ∇ r f R ( r ; P ) and ∇ r f R ( r ; P ) , respectively; f R ( r ; P ) , ∇ r f R ( r ; P ) , and ∇ r f R ( r ; P ) arebounded as functions of ( r, P ) ; f R ( r ; P ) is bounded away from zero as a function of ( r, P ) ; To see this, note that f R ( r ; P ) is a convex combination of f X ( r ) and f X ( r ), eachbounded, with bounded derivatives, and bounded away from zero.4. The conditional distribution of U given R has PDF f U | R ( u | r ; P ) that is a boundedfunction of ( u, r, P ) , f U | R (0 | x ; P ) is bounded away from zero over P , f U | R ( u | r ; P ) isdifferentiable as function of ( u, r ) and has bounded partial derivatives; To see this, take P = αP + (1 − α ) P and note that, f U | R ( u | r ; P ) = αf X ( r ) f R ( r ; P ) f Y | X ( θ ( P ) + u | r ) + (1 − α ) f X ( r ) f R ( r ; P ) f Y | X ( θ ( P ) + u | r ) . The PDFs f Y k | X k ( y k | x k ) are bounded functions of ( x k , y k ). The weights ω ( r ; P ) . = α f X ( r ) f R ( r ; P ) and ω ( r ; P ) . = (1 − α ) f X ( r ) f R ( r ; P ) are bounded functions of ( r, P ) because theyare positive and sum to 1. The PDFs f Y k | X k ( y k | x k ) are differentiable and so are theweights. The partial derivatives of f Y k | X k ( y k | x k ) are bounded. The derivatives of the22eights wrt r also are bounded because the derivatives of the PDFs f X k ( r ) are boundedplus the fact that f R ( r ; P ) is bounded away from zero over ( r, P ). Finally, f Y k | X k ( y k | x )are bounded away from zero over y k .5. The conditional distribution of U given R has CDF F U | R ( u | r ; P ) that is twice partiallydifferentiable wrt r and has partial derivatives ∇ r F U | R (0 | r ; P ) and ∇ r F U | R (0 | r ; P ) thatare bounded functions of ( r, P ) ; Again, F U | R (0 | r ; P ) = ω ( r ; P ) F Y | X ( θ ( P ) | r ) + ω ( r ; P ) F Y | X ( θ ( P ) | r ) . We have that and F Y k | X k ( y k | x k ) is twice partially differentiable wrt x k . The weights ω k ( r ; P ) are twice differentiable wrt r because f X k ( x k ) are twice differentiable. The firstand second partial derivatives of F Y k | X k ( y k | x k ) wrt x k are bounded functions of ( x k , y k ).The first two derivatives of ω k ( r ; P ) wrt r are bounded functions of ( r, P ) because thefirst two derivatives of f X k ( x k ) wrt x k are bounded and f R ( r ; P ) is bounded away fromzero over ( r, P ).1. Assumption 2.1 - (2.2): asymptotic expansion.The τ th conditional quantile estimator is given byˆ θ = arg min θ m (cid:88) i =1 ρ τ ( S i − θ ) K (cid:18) R i − xh (cid:19) , where ρ τ ( u ) = ( τ − I ( u ≤ u. Define Z m = √ mh (cid:16) ˆ θ − θ ( P ) (cid:17) . 
We want to study theasymptotic behavior of Z m , where Z m is the value that minimizes the objective function L m ( z ), i.e., Z m = arg min z m (cid:88) i =1 ρ τ (cid:18) S i − θ ( P ) − z √ mh (cid:19) K (cid:18) R i − xh (cid:19)(cid:124) (cid:123)(cid:122) (cid:125) . = L m ( z ) = arg min z L m ( z ) − L m (0) , since L m (0) is not a function of z and hence does not affect the argmin.Let Q m ( z ) = L m ( z ) − L m (0) and U i = S i − θ ( P ). We have, Q m ( z ) = m (cid:88) i =1 (cid:18) ρ τ (cid:18) U i − √ mh z (cid:19) − ρ τ ( U i ) (cid:19) K (cid:18) R i − xh (cid:19) , Z m = √ mh (cid:16) ˆ θ − θ ( P ) (cid:17) . Notice that since ρ τ ( · )s are convex functionsof z, so is Q m ( z ), which is a sum of convex functions.The derivation of the asymptotic linear representation is done in two parts. First, weapproximate Q m ( z ) by a quadratic function Q ∗ m ( z ) whose minimizing value z = η m has anasymptotic linear representation. Second, we show that Z m = √ nh ( (cid:98) θ − θ ( P )) converges to η m in probability (and therefore they share the same asymptotic behavior).Part I: approximating the objective functionLet D i = − τ I ( U i ≥
0) + (1 − τ ) I ( U i <
0) = I ( U i < − τ and V i ( z ) . = ρ τ (cid:18) U i − √ mh z (cid:19) − ρ τ ( U i ) − √ mh zD i . We can rewrite Q m ( z ) in terms of D i and V i ( z ) by adding and subtracting the conditionalexpectation of Q m ( z ) as follows: Q m ( z ) = E P [ Q m ( z ) | R m ] (C.6)+ 1 √ mh m (cid:88) i =1 z ( D i − E P [ D i | R i ]) K (cid:18) R i − xh (cid:19) (C.7)+ m (cid:88) i =1 ( V i ( z ) − E P [ V i ( z ) | R i ]) K (cid:18) R i − xh (cid:19) , (C.8)where R m is the vector ( R , . . . , R m ). In what follows, we show that(C.6) = f U | R (0 | x ; P ) f R ( x ; P ) z + o P (1) and (C.8) = o P (1) . Regarding (C.6), define M ( t | r ; P ) . = E P [ ρ τ ( S − θ ( P ) + t ) | R = r ] = E P [ ρ τ ( U + t ) | R = r ] . Notice that although the check function is not differentiable, the M function is differentiable. ∇ t M ( t | r ; P ) = τ (1 − F U | R ( − t | r ; P )) − (1 − τ ) F U | R ( − t | r ; P ) = τ − F U | R ( − t | r ; P ) , ∇ t M ( t | r ; P ) = −∇ t { F U | R ( − t | r ; P ) } = f U | R ( − t | r ; P ) , ∇ t M ( t | r ; P ) = −∇ u f U | R ( − t | r ; P ) , ∇ tr M ( t | r ; P ) = −∇ r F U | R ( − t | r ; P ) , and ∇ tr M ( t | r ; P ) = −∇ r F U | R ( − t | r ; P ) . where use the Leibniz rule, the existence of the derivatives ∇ u f U | R ( u | r ; P ), ∇ r F U | R ( − t | r ; P ),24 r F U | R ( − t | r ; P ). We can write E [ Q m ( z ) | R m ] in terms of M as follows, E P [ Q m ( z ) | R m ]= m (cid:88) i =1 E P (cid:20) ρ τ (cid:18) U i − √ mh z (cid:12)(cid:12)(cid:12) R i (cid:19) − ρ τ (cid:16) U i (cid:12)(cid:12)(cid:12) R i (cid:17) (cid:21) K (cid:18) R i − xh (cid:19) = m (cid:88) i =1 (cid:20) M (cid:18) − z √ mh (cid:12)(cid:12)(cid:12) R i ; P (cid:19) − M (cid:0) (cid:12)(cid:12) R i ; P (cid:1)(cid:21) K (cid:18) R i − xh (cid:19) Taylor expand M as a function of t around 0, E P [ Q m ( z ) | R m ]= m (cid:88) i =1 (cid:20) ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) − z √ mh + 12 ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) z mh − ∇ t M ( q ∗ | R i ; P ) z ( mh ) / (cid:21) K (cid:18) R i − xh (cid:19) = − z √ mh m (cid:88) i =1 ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) K (cid:18) R i − xh (cid:19) (C.9)+ 12 z mh m (cid:88) i =1 f U | R (0 | R i ; P ) K (cid:18) R i − xh (cid:19) (C.10)+ z √ mh mh m (cid:88) i =1 ∇ u f U | R ( − q ∗ | R i ; P ) K (cid:18) R i − xh (cid:19) (C.11)where q ∗ is a point between 0 and − z/ √ mh . The goal is to show that(C.10) = f U | R (0 | x ; P ) f R ( x ; P ) z + o P (1), and that (C.9) and (C.11) are both o P (1) . Expectation of (C.9). 
We Taylor expand ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) as a function of R i around R i = x and use the fact that ∇ t M (cid:0) (cid:12)(cid:12) x ; P (cid:1) = τ − P P ( S i − θ ( P ) ≤ | R i = x ) = 0: E P (cid:34) − z √ mh m (cid:88) i =1 ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) K (cid:18) R i − xh (cid:19)(cid:35) = E P − z √ mh m (cid:88) i =1 ∇ t M (cid:0) (cid:12)(cid:12) x ; P (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) =0 K (cid:18) R i − xh (cid:19) + E P − z √ mh m (cid:88) i =1 ∇ tr M (cid:0) (cid:12)(cid:12) x ; P (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) = −∇ r F U | R (0 | x ; P ) ( R i − x ) K (cid:18) R i − xh (cid:19) E P − z √ mh m (cid:88) i =1 ∇ tr M (cid:0) (cid:12)(cid:12) x ∗ ; P (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) = −∇ r F U | R (0 | x ∗ ; P ) ( R i − x ) K (cid:18) R i − xh (cid:19) = z √ mh (cid:90) (cid:2) ∇ r F U | R (0 | x ; P ) uh + ∇ r F U | R (0 | x ∗ ; P ) u h (cid:3) K ( u ) f R ( x + uh ; P ) du = z ∇ r F U | R (0 | x ; P ) √ mhh (cid:90) uK ( u ) f R ( x + uh ; P ) du + z √ mhh (cid:90) ∇ r F U | R (0 | x ∗ ; P ) u K ( u ) f R ( x + uh ; P ) du = z ∇ r F U | R (0 | x ; P ) √ mhh (cid:90) uK ( u ) [ f R ( x ; P ) + ∇ r f R ( x ∗∗ ; P ) uh ] du + O P ( √ mhh )= z ∇ r F U | R (0 | x ; P ) f R ( x ; P ) √ mhh (cid:90) uK ( u ) du (cid:124) (cid:123)(cid:122) (cid:125) =0 + √ mhh (cid:90) u K ( u ) ∇ r f R ( x ∗∗ ; P ) du + O P ( √ mhh )= O P ( √ mhh ) = o P (1)where x ∗ is a point between R i and x , x ∗∗ is a point between x + uh and x ; we use thekernel properties, ∇ r F U | R (0 | x ; P ) is a bounded function of P , ∇ r F U | R (0 | r ; P ) is a boundedfunction of ( r, P ), f R ( r ; P ) is a bounded function of ( r, P ), ∇ r f R ( r ; P ) is a bounded functionof ( r, P ), and √ mhh → V P (cid:34) − z √ mh m (cid:88) i =1 ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) K (cid:18) R i − xh (cid:19)(cid:35) = z h V P (cid:20) ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1) K (cid:18) R i − xh (cid:19)(cid:21) ≤ z h E P (cid:20)(cid:8) ∇ t M (cid:0) (cid:12)(cid:12) R i ; P (cid:1)(cid:9) K (cid:18) R i − xh (cid:19)(cid:21) = z h (cid:90) (cid:8) ∇ r F U | R (0 | x ∗ ; P ) (cid:9) u f R ( x + uh ; P ) K ( u ) du = o P (1) . Therefore, (C.9) = o P (1) . 
Moreover, (C.11) = o P (1) because mh → ∞ , mh (cid:80) mi =1 K (cid:0) R i − xh (cid:1) = O P (1), and ∇ u f U | R ( u | r ; P )is a bounded function of ( u, r, P ).It remains to show that 26C.10) = f U | R (0 | x ; P ) f R ( x ; P ) z + o P (1).Expectation of (C.10): E P (cid:34) z mh m (cid:88) i =1 f U | R (0 | R i ; P ) K (cid:18) R i − xh (cid:19)(cid:35) = E P (cid:20) z h (cid:8) f U | R (0 | x ; P ) + ∇ r f U | R (0 | x ∗ ) ( R i − x ) (cid:9) K (cid:18) R i − xh (cid:19)(cid:21) = z (cid:90) (cid:8) f U | R (0 | x ; P ) + ∇ r f U | R (0 | x ∗ ) uh (cid:9) K ( u ) f R ( x + uh ; P ) du = f U | R (0 | x ; P ) z (cid:90) K ( u ) { f R ( x ; P ) + ∇ r f R ( x ∗∗ ; P ) uh } du + z h (cid:90) ∇ r f U | R (0 | x ∗ ; P ) uK ( u ) f R ( x + uh ; P ) du = z f U | R (0 | x ; P ) f R ( x ; P ) + o P (1)where we use that f R ( r ; P ), ∇ r f R ( r ; P ), f U | R (0 | x ; P ), ∇ r f U | R (0 | r ; P ) are bounded over ( r,P ).Variance of (C.10): V P (cid:34) z mh m (cid:88) i =1 f U | R (0 | R i ; P ) K (cid:18) R i − xh (cid:19)(cid:35) = z mh V P (cid:20) f U | R (0 | R i ; P ) K (cid:18) R i − xh (cid:19)(cid:21) ≤ z mh E P (cid:20) f U | R (0 | R i ; P ) K (cid:18) R i − xh (cid:19)(cid:21) = z mh (cid:90) f U | R (0 | x + uh ; P ) K ( u ) f R ( x + uh ; P ) du = o P (1) , because f U | R (0 | r ; P ) and f R ( r ; P ) are bounded functions of ( r, P ) and mh → ∞ .Therefore, we have that (C.10) = f U | R (0 | x ; P ) f R ( x ; P ) z + o P (1).Regarding (C.8), we show that its expectation is zero and its variance converges to zero.The expectation of (C.8) is zero because it equals the expectation of E P (cid:34) n (cid:88) i =1 ( V i ( z ) − E P [ V i ( z ) | R i ]) K (cid:18) R i − xh (cid:19) (cid:12)(cid:12)(cid:12) R m (cid:35) = 0 . V P (cid:34) m (cid:88) i =1 ( V i ( z ) − E P [ V i ( z ) | R i ]) K (cid:18) R i − xh (cid:19)(cid:35) = V P (cid:34) E P (cid:34) m (cid:88) i =1 ( V i ( z ) − E P [ V i ( z ) | R i ]) K (cid:18) R i − xh (cid:19) (cid:12)(cid:12)(cid:12) R m (cid:35)(cid:35) + E P (cid:34) V P (cid:34) m (cid:88) i =1 ( V i ( z ) − E P [ V i ( z ) | R i ]) K (cid:18) R i − xh (cid:19) (cid:12)(cid:12)(cid:12) R m (cid:35)(cid:35) ≤ m (cid:88) i =1 E P (cid:20) K (cid:18) R i − xh (cid:19) E P (cid:104) V i ( z ) (cid:12)(cid:12)(cid:12) R i (cid:105)(cid:21) = m (cid:88) i =1 E P (cid:20) K (cid:18) R i − xh (cid:19) V i ( z ) (cid:21) ≤ z (cid:90) K ( u ) f R ( x + uh ; P ) (cid:26) F U | R (cid:18)(cid:12)(cid:12)(cid:12)(cid:12) z √ mh (cid:12)(cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12) x + uh ; P (cid:19) − F U | R (cid:18) − (cid:12)(cid:12)(cid:12)(cid:12) z √ mh (cid:12)(cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12) x + uh ; P (cid:19)(cid:27) du = 8 z √ mh (cid:90) K ( u ) f R ( x + uh ; P ) ∇ u F U | R (cid:18) u ∗ (cid:12)(cid:12)(cid:12)(cid:12) x + uh ; P (cid:19) du = o P (1) . where we use that f R and ∇ u F U | R are bounded over ( u, r, P ) and | V i ( z ) | = (cid:12)(cid:12)(cid:12)(cid:16) ρ τ (cid:16) U i − z/ √ mh (cid:17) − ρ τ ( U i ) − D i z/ √ mh (cid:17)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) z/ √ mh (cid:12)(cid:12)(cid:12) I (cid:18) | U i | ≤ (cid:12)(cid:12)(cid:12)(cid:12) z √ mh (cid:12)(cid:12)(cid:12)(cid:12)(cid:19) . Combining what we found for (C.6)-(C.8), we now have, for fixed z : Q m ( z ) = 12 f U | R (0 | x ; P ) f R ( x ; P ) z + 1 √ mh m (cid:88) i =1 z ( D i − E P [ D i | R i ]) K (cid:18) R i − xh (cid:19)(cid:124) (cid:123)(cid:122) (cid:125) . 
= Q ∗ m ( z ) + r m ( z ) , where Q ∗ m ( z ) is the probability limit of (C.6) plus (C.7), and r m ( z ) is the difference between(C.6) and its probability limit plus (C.8). Thus, we know that r m ( z ) = o P (1) for fixed z .28ewrite Q ∗ m ( z ) as follows. Q ∗ m ( z ) = 12 f U | R (0 | x ; P ) f R ( x ; P ) z + z √ mh m (cid:88) i =1 ( D i − E P [ D i | R i ]) K (cid:18) R i − xh (cid:19)(cid:124) (cid:123)(cid:122) (cid:125) . = M m = 12 f U | R (0 | x ; P ) f R ( x ; P ) z + zM m = 12 f U | R (0 | x ; P ) f R ( x ; P ) z + 1 f U | R (0 | x ; P ) f R ( x ; P ) M m (cid:124) (cid:123)(cid:122) (cid:125) . = − η m − f U | R (0 | x ; P ) f R ( x ; P ) M m = 12 f U | R (0 | x ; P ) f R ( x ; P ) ( z − η m ) − f U | R (0 | x ; P ) f R ( x ; P ) η m , (C.12)which is minimized at η m = − f U | R (0 | x ; P ) f R ( x ; P ) M m = − f U | R (0 | x ; P ) f R ( x ; P ) 1 √ mh m (cid:88) i =1 ( D i − E P [ D i | R i ]) K (cid:18) R i − xh (cid:19) . We have already shown above that r m ( z ) = o P (1) for fixed z . Now, we show that theconvergence is also uniform over z in a compact set K ⊂ R . To this end, considerΛ m ( z ) . = Q m ( z ) − √ mh (cid:80) mi =1 z ( D i − E P [ D i | R i ]) K (cid:0) R i − xh (cid:1) ,Λ( z ) . = f U | R (0 | r ; P ) f R ( x ; P ) z , and note that Λ m ( z ) is a convex function of z . By theconvexity lemma (Pollard (1991), page 187), sup z ∈K | Λ m ( z ) − Λ( z ) | = o P (1). Since r m ( z ) = Q m ( z ) − Q ∗ m ( z ) = Λ m ( z ) − Λ( z ), we have that, for any compact subset K ⊂ R , sup z ∈K | r m ( z ) | = o P (1).Part II: Z m − η m = o P (1).We want to show that for each (cid:15) > P ∈P P P ( | Z m − η m | ≤ (cid:15) ) → . Consider the closed interval B ( m ) with center η m and radius (cid:15) . Since η m converges indistribution, it is stochastically bounded. The compact set K can be chosen to contain B ( m )with probability arbitrarily close to one, thereby implying sup z ∈ B ( m ) | r m ( z ) | = o P (1) . B ( m ), suppose z = η m + δ with δ > (cid:15) . For the boundarypoint z ∗ = η m + (cid:15) , the convexity of Q m and (C.12) imply (cid:15)δ Q m ( z )+ (cid:16) − (cid:15)δ (cid:17) Q m ( η m ) ≥ Q m (cid:16) (cid:15)δ z + (cid:16) − (cid:15)δ (cid:17) η m (cid:17) = Q m ( z ∗ ) ≥ f U | R (0 | x ; P ) f R ( x ; P )2 ( z ∗ − η m ) − f U | R (0 | r ; P ) f R ( x ; P )2 η m − sup z ∈ B ( m ) | r m ( z ) |≥ f U | R (0 | x ; P ) f R ( x ; P )2 (cid:15) + Q m ( η m ) − z ∈ B ( m ) | r m ( z ) | .Q m ( z ) ≥ Q m ( η m ) + (cid:18) δ(cid:15) (cid:19) (cid:34) f U | R (0 | x ; P ) f R ( x ; P )2 (cid:15) − z ∈ B ( m ) | r m ( z ) | (cid:35) . An analogous argument holds for z = η m − δ . Define the event A m as A m : 2 sup z ∈ B ( m ) | r m ( z ) | < f U | R (0 | x ; P ) f R ( x ; P )4 (cid:15) . We have that inf P ∈P P P [ A m ] → P ∈P P P (cid:2) sup z ∈ B ( m ) | r m ( z ) | > f U | R (0 | x ; P ) f R ( x ; P ) (cid:15) / (cid:3) →
Part II: $Z_m - \eta_m = o_P(1)$.

We want to show that, for each $\epsilon > 0$, $\inf_{P\in\mathcal{P}} P_P\left(|Z_m - \eta_m| \le \epsilon\right) \to 1$. Consider the closed interval $B(m)$ with center $\eta_m$ and radius $\epsilon$. Since $\eta_m$ converges in distribution, it is stochastically bounded. The compact set $\mathcal{K}$ can be chosen to contain $B(m)$ with probability arbitrarily close to one, thereby implying $\sup_{z\in B(m)} |r_m(z)| = o_P(1)$.

For $z$ outside $B(m)$, suppose $z = \eta_m + \delta$ with $\delta > \epsilon$. For the boundary point $z^* = \eta_m + \epsilon$, the convexity of $Q_m$ and (C.12) imply
\[
\frac{\epsilon}{\delta} Q_m(z) + \left(1 - \frac{\epsilon}{\delta}\right) Q_m(\eta_m) \ge Q_m\left(\frac{\epsilon}{\delta} z + \left(1 - \frac{\epsilon}{\delta}\right)\eta_m\right) = Q_m(z^*)
\]
\[
\ge \frac{f_{U|R}(0|x;P) f_R(x;P)}{2}\left(z^* - \eta_m\right)^2 - \frac{f_{U|R}(0|x;P) f_R(x;P)}{2}\eta_m^2 - \sup_{z\in B(m)} |r_m(z)|
\ge \frac{f_{U|R}(0|x;P) f_R(x;P)}{2}\epsilon^2 + Q_m(\eta_m) - 2\sup_{z\in B(m)} |r_m(z)|.
\]
Rearranging,
\[
Q_m(z) \ge Q_m(\eta_m) + \left(\frac{\delta}{\epsilon}\right)\left[\frac{f_{U|R}(0|x;P) f_R(x;P)}{2}\epsilon^2 - 2\sup_{z\in B(m)} |r_m(z)|\right].
\]
An analogous argument holds for $z = \eta_m - \delta$. Define the event $A_m$ as
\[
A_m:\quad 2\sup_{z\in B(m)} |r_m(z)| < \frac{f_{U|R}(0|x;P) f_R(x;P)}{4}\epsilon^2.
\]
We have that $\inf_{P\in\mathcal{P}} P_P[A_m] \to 1$ because $\sup_{P\in\mathcal{P}} P_P\left[2\sup_{z\in B(m)} |r_m(z)| \ge f_{U|R}(0|x;P) f_R(x;P)\,\epsilon^2/4\right] \to 0$. Conditional on $A_m$,
\[
Q_m(z) \ge Q_m(\eta_m) + \frac{f_{U|R}(0|x;P) f_R(x;P)}{4}\epsilon^2, \quad \text{for any } z: |z - \eta_m| > \epsilon, \tag{C.13}
\]
happens with probability one. The event in (C.13) also implies that $|Z_m - \eta_m| \le \epsilon$, because $Z_m$ minimizes $Q_m$, so that $Q_m(Z_m) \le Q_m(\eta_m)$. Thus,
\[
P_P\left[|Z_m - \eta_m| \le \epsilon\right] \ge P_P\left[\inf_{z: |z-\eta_m|>\epsilon} Q_m(z) \ge Q_m(\eta_m) + \frac{f_{U|R}(0|x;P) f_R(x;P)}{4}\epsilon^2\right] \ge P_P[A_m],
\]
so that $\inf_{P\in\mathcal{P}} P_P\left[|Z_m - \eta_m| \le \epsilon\right] \ge \inf_{P\in\mathcal{P}} P_P[A_m] \to 1$.

Therefore, we have the asymptotic linear representation of the estimator as follows:
\[
\sqrt{mh}\left(\hat\theta - \theta(P)\right) = -\frac{1}{\sqrt{mh}}\sum_{i=1}^m \frac{1}{f_{U|R}(0|x;P) f_R(x;P)} K\left(\frac{R_i-x}{h}\right)\left(D_i - E_P[D_i|R_i]\right) + o_P(1)
= \frac{1}{\sqrt{m}}\sum_{i=1}^m \psi_n(V_i,P) + o_P(1),
\]
where $D_i - E_P[D_i|R_i] = I\{S_i < \theta(P)\} - F_{S|R}(\theta(P)|R_i;P)$ and the influence function is given by
\[
\psi_n(V_i,P) \doteq -\frac{1}{f_{U|R}(0|x;P) f_R(x;P)\sqrt{h}} K\left(\frac{R_i-x}{h}\right)\left(D_i - E_P[D_i|R_i]\right)
= -\frac{K\left(\frac{R_i-x}{h}\right)}{f_{U|R}(0|x;P) f_R(x;P)\sqrt{h}}\left(I\{S_i < \theta(P)\} - F_{S|R}(\theta(P)|R_i;P)\right).
\]
(A numerical illustration of this representation appears at the end of this subsection.)
2. Assumption 2.1 - (2.3): zero mean of influence function.

$E_P[\psi_n(V_i,P)] = 0$ $\forall P$ by construction.

3. Assumption 2.1 - (2.4): variance of influence function.

Define
\[
\xi(P) \doteq \frac{M_K}{f_{U|R}(0|x;P)^2 f_R(x;P)}\, V_P[D_i | R_i = x]
= \frac{M_K}{f_{U|R}(0|x;P)^2 f_R(x;P)}\, F_{S|R}(\theta(P)|x;P)\left(1 - F_{S|R}(\theta(P)|x;P)\right)
= \frac{\tau(1-\tau)\, M_K}{f_{U|R}(0|x;P)^2 f_R(x;P)},
\]
where $M_K = \int_{-\infty}^{\infty} K(u)^2\, du$. Then,
\[
V_P\left(\frac{K\left(\frac{R_i-x}{h}\right)}{f_{U|R}(0|x;P) f_R(x;P)\sqrt{h}}\left(D_i - E_P[D_i|R_i]\right)\right) - \xi(P)
= \frac{1}{f_{U|R}(0|x;P)^2 f_R(x;P)^2}\int K(u)^2 \left\{g(x+uh;P) - g(x;P)\right\} du
\]
\[
= \frac{h}{f_{U|R}(0|x;P)^2 f_R(x;P)^2}\int K(u)^2\, \nabla_x g(x^*_{uh};P)\, u\, du = o_P(1),
\]
where $g(r;P) \doteq V_P[D_i|R_i = r]\, f_R(r;P)$, $x^*_{uh}$ is a point between $x+uh$ and $x$, and $\nabla_x g(x;P)$ denotes the derivative of $g$ wrt $x$. The expression above is $o_P(1)$ because $h \to 0$,
the derivative $\nabla_x g(x;P)$ is a bounded function of $(x,P)$, and $f_R(x;P)$ and $f_{U|R}(0|x;P)$ are bounded away from zero over $\mathcal{P}$. The derivative $\nabla_x g(x;P)$ is bounded because $f_R(r;P)$, $\nabla_r f_R(r;P)$, $V_P[D_i|R_i = r] = F_{U|R}(0|r;P)\left(1 - F_{U|R}(0|r;P)\right)$, and $\nabla_r\left\{F_{U|R}(0|r;P)\left(1 - F_{U|R}(0|r;P)\right)\right\}$ are bounded functions of $(r,P)$. Therefore,
\[
\sup_{P\in\mathcal{P}} \left| V_P[\psi_n(V_i,P)] - \xi(P) \right| \to 0.
\]
4. Assumption 2.1 - (2.5): $\sup_P E\left[\psi_n(Z_k,P)^2\right] < \infty$.

For $Z_k = (X_k, Y_k) \sim P_k$, $k = 1, 2$,
define $D_k = I(Y_k - \theta(P_k) < 0) - \tau$, $m_{D_k|X_k}(x_k) = E[D_k|X_k = x_k]$, and $v_{D_k|X_k}(x_k) = V[D_k|X_k = x_k]$. For $V = (R,S) \sim P \in \mathcal{P}$, $D = I(S - \theta(P) < 0) - \tau$ and $m_{D|R}(r;P) = E_P[D|R = r]$. We have
\[
\psi_n(Z_k,P) = -\frac{1}{f_{U|R}(0|x;P) f_R(x;P)\sqrt{h}} K\left(\frac{X_k - x}{h}\right)\left(D_k - m_{D|R}(X_k;P)\right)
\]
\[
E\left[\psi_n(Z_k,P)^2\right] = \frac{1}{f_{U|R}(0|x;P)^2 f_R(x;P)^2}\int K(u)^2\left[v_{D_k|X_k}(x+uh) + \left(m_{D_k|X_k}(x+uh) - m_{D|R}(x+uh;P)\right)^2\right] f_{X_k}(x+uh)\, du = O_P(1),
\]
because $v_{D_k|X_k}(r) = F_{Y_k|X_k}(\theta(P_k)|r)\left(1 - F_{Y_k|X_k}(\theta(P_k)|r)\right)$, $m_{D_k|X_k}(r) = F_{Y_k|X_k}(\theta(P_k)|r) - \tau$, $m_{D|R}(r;P) = F_{S|R}(\theta(P)|r;P) - \tau$, and $f_{X_k}(r)$ are bounded over $(r,P)$; and $f_R(x;P)$ and $f_{U|R}(0|x;P)$ are bounded away from zero over $\mathcal{P}$.

5. Assumption 2.1 - (2.6): $(2+\theta)$-th moment condition.

We verify it in two steps. First, $V_P(\psi_n(V_i,P)) - \xi(P) = o_P(1)$ and $\xi(P)$ is bounded away from zero, uniformly over $\mathcal{P}$. Thus, $V_P^{-1}(\psi_n(V_i,P)) = O_P(1)$. Second, for any $\theta > 0$,
call $\eta(r;P) = E_P\left[\left|D_i - E_P[D_i|R_i]\right|^{2+\theta} \,\big|\, R_i = r\right]$ and note that $|\eta(r;P)| \le 1$. Then,
\[
n^{-\theta/2}\, E_P\left|\psi_n(V_i,P)\right|^{2+\theta}
= \frac{(m/n)^{\theta/2}}{f_{U|R}(0|x;P)^{2+\theta}\, f_R(x;P)^{2+\theta}\, (mh)^{\theta/2}}\int |K(u)|^{2+\theta}\, \eta(x+uh;P)\, f_R(x+uh;P)\, du = o_P(1).
\]
Combining steps 1 and 2,
\[
n^{-\theta/2}\sup_{P\in\mathcal{P}} E_P\left|\frac{\psi_n(V_i,P)}{\sqrt{V_P(\psi_n(V_i,P))}}\right|^{2+\theta} = o(1).
\]

6. Assumption 2.1 - (2.7): $\xi\left(\frac{m}{n}P_1 + \frac{n-m}{n}P_2\right) \to \xi\left(\gamma P_1 + (1-\gamma)P_2\right)$.

Let $\bar P_n = \frac{m}{n}P_1 + \frac{n-m}{n}P_2$ and $\bar P = \gamma P_1 + (1-\gamma)P_2$. Consider the expression for $\xi(P)$ given above. It suffices to show that $f_R(x;\bar P_n) \to f_R(x;\bar P)$ and $f_{U|R}(0|x;\bar P_n) \to f_{U|R}(0|x;\bar P)$. The first is straightforward (see Section C.1). For the second, let $U_k = Y_k - \theta(P_k)$ and note that
\[
f_{U|R}(0|x;\bar P_n) = \frac{m}{n}\frac{f_{X_1}(x)}{f_R(x;\bar P_n)} f_{U_1|X_1}(0|x) + \frac{n-m}{n}\frac{f_{X_2}(x)}{f_R(x;\bar P_n)} f_{U_2|X_2}(0|x)
\to \gamma\frac{f_{X_1}(x)}{f_R(x;\bar P)} f_{U_1|X_1}(0|x) + (1-\gamma)\frac{f_{X_2}(x)}{f_R(x;\bar P)} f_{U_2|X_2}(0|x) = f_{U|R}(0|x;\bar P).
\]
7. Assumption 2.2: $(n_1/n - \lambda) \xrightarrow{p} 0$.

In this example, $n_1$ is deterministic and $(n_1/n - \lambda) \to 0$. Assumption 2.1 has already been verified above for any $m$ such that $(m/n - \lambda) \to 0$ or $(m/n - (1-\lambda)) \to 0$. $\square$
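As a numerical companion to the verification above, the following minimal Monte Carlo sketch (ours, not code from the paper; the design, kernel, and bandwidth are illustrative assumptions) checks that the kernel-weighted conditional quantile estimator tracks its asymptotic linear representation, and evaluates $\xi(P) = \tau(1-\tau) M_K / \left(f_{U|R}(0|x;P)^2 f_R(x;P)\right)$ by quadrature.

```python
# Minimal sketch (illustrative): local-constant conditional quantile at a point x,
# its influence-function approximation, and the asymptotic variance xi(P).
import numpy as np

rng = np.random.default_rng(0)
tau, x, m = 0.5, 0.0, 50_000
h = m ** (-0.4)                          # mh -> infinity and sqrt(mh)*h -> 0

def K(u):                                # Epanechnikov kernel
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

R = rng.uniform(-1, 1, size=m)           # f_R(x) = 1/2
S = rng.standard_normal(size=m)          # S independent of R, so theta(P) = 0 at tau = 0.5
theta = 0.0
f_U = 1.0 / np.sqrt(2.0 * np.pi)         # f_{U|R}(0|x;P) for standard normal errors
f_R = 0.5

# kernel-weighted tau-quantile: minimizes sum_i w_i * rho_tau(S_i - t)
w = K((R - x) / h)
order = np.argsort(S)
cum = np.cumsum(w[order])
theta_hat = S[order][np.searchsorted(cum, tau * cum[-1])]

# influence-function average from the asymptotic linear representation
D = (S < theta).astype(float) - tau      # here E_P[D_i | R_i] = 0
lin = -(w * D).sum() / (np.sqrt(m * h) * f_U * f_R)
print(np.sqrt(m * h) * (theta_hat - theta), lin)   # the two should be close

# xi(P) = tau*(1-tau)*M_K / (f_{U|R}(0|x)^2 * f_R(x)), with M_K = int K(u)^2 du
u = np.linspace(-1.0, 1.0, 200_001)
M_K = (K(u) ** 2).sum() * (u[1] - u[0])  # Riemann sum; equals 3/5 for Epanechnikov
print(tau * (1 - tau) * M_K / (f_U**2 * f_R))
```

The weighted quantile is computed by inverting the cumulative kernel weights, which minimizes the weighted check loss; the comparison of the two printed numbers illustrates the coupling between the estimator and its linear representation.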
C.3 Proof of Proposition 3.3 - Discontinuity of Conditional Mean

The goal of this proof is to use the assumptions listed in Proposition 3.3 to verify Assumptions 2.1 and 2.2. It follows the general lines of the proof of Proposition 3.1, so the reader may refer to Section C.1 for the redundant details that we omit here.

Let $m_n$ be an arbitrary sequence of positive integers that grows with $n$ such that $m_n/n \to \gamma$, for some $\gamma \in (0,1)$. For $V = (R,S)$, draw $V_1, \ldots, V_m$ iid $P \in \mathcal{P}$, where $P_1$ is the distribution of $(X,Y)$ conditional on $X \ge 0$, and $P_2$ is the distribution of $(X,Y)$ conditional on $X < 0$. The assumptions in Proposition 3.3 imply the following facts:

1. As $m \to \infty$, $h \to 0$, $mh \to \infty$, and $\sqrt{mh}\, h \to 0$;
2. The distribution of $R$ has PDF $f_R(r;P)$ with first derivative wrt $r$ denoted $\nabla_r f_R(r;P)$; $f_R(r;P)$ and $\nabla_r f_R(r;P)$ are bounded as functions of $(r,P)$; $f_R(r;P)$ is bounded away from zero as a function of $(r,P)$;
3. $m_{S|R}(r;P) = E_P[S|R=r]$ has first derivative wrt $r$ denoted $\nabla_r m_{S|R}(r;P)$; $m_{S|R}$ and $\nabla_r m_{S|R}$ are bounded as functions of $(r,P)$;
4. $v_{S|R}(r;P) = V_P[S|R=r]$ has first derivative wrt $r$ denoted $\nabla_r v_{S|R}(r;P)$; $v_{S|R}$ and $\nabla_r v_{S|R}$ are both bounded as functions of $(r,P)$; $v_{S|R}(0^+;P)$ is bounded away from zero as a function of $P$;
5. $\eta(r;P) \doteq E_P\left[|S - m_{S|R}(R;P)|^{2+\theta} \,\big|\, R = r\right]$ is a bounded function of $(r,P)$.

We re-write $\sqrt{mh}\left(\hat\theta - \theta(P)\right)$ to find the asymptotic linear representation:
\[
\sqrt{mh}\left(\hat\theta - \theta(P)\right) = \left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right)\right)\left(\frac{f_R(0^+;P)}{2}\right)^{-1} \tag{C.14}
\]
\[
+ \left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right)\right)\left[\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} - \left(\frac{f_R(0^+;P)}{2}\right)^{-1}\right] \tag{C.15}
\]
\[
+ \left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(m_{S|R}(R_i;P) - m_{S|R}(0^+;P)\right)\right)\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} \tag{C.16}
\]
(A numerical sketch of the boundary estimator behind this decomposition appears at the end of this subsection.)

1. Assumption 2.1 - (2.2): asymptotic expansion.

Equation C.14 above gives the influence function $\psi_n$:
\[
\frac{1}{\sqrt{m}}\sum_{i=1}^m \underbrace{K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right) h^{-1/2}\left(\frac{f_R(0^+;P)}{2}\right)^{-1}}_{\doteq\, \psi_n(V_i,P)} = \frac{1}{\sqrt{m}}\sum_{i=1}^m \psi_n(V_i,P).
\]
We need to show that Equations C.15 and C.16 converge in probability to zero uniformly over $\mathcal{P}$.

Equation C.15 is $o_P(1)$. We show this in three steps. First,
\[
V_P\left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right)\right) = \int_0^\infty K(u)^2\, v_{S|R}(uh;P)\, f_R(uh;P)\, du = O_P(1).
\]
The expected value of the expression inside the variance above is zero, so we have that
\[
\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right) = O_P(1).
\]
Second,
\[
E_P\left[\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right] - \frac{f_R(0^+;P)}{2} = h\int_0^\infty K(u)\, \nabla_r f_R(x^*_{uh};P)\, u\, du = o_P(1)
\]
and
\[
V_P\left[\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right] \le \frac{1}{mh}\int K(u)^2\, f_R(uh;P)\, du = o_P(1).
\]
Therefore,
\[
\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right) - \frac{f_R(0^+;P)}{2} = o_P(1) \tag{C.17}
\]
and
\[
\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} - \left(\frac{f_R(0^+;P)}{2}\right)^{-1} = o_P(1). \tag{C.18}
\]
Third, combining steps 1 and 2 gives
\[
\left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right)\right)\left[\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} - \left(\frac{f_R(0^+;P)}{2}\right)^{-1}\right] = O_P(1)\, o_P(1) = o_P(1).
\]
Equation C.16 is $o_P(1)$. We show this in three steps.
First, $\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} = O_P(1)$. Second,
\[
E_P\left[\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(m_{S|R}(R_i;P) - m_{S|R}(0^+;P)\right)\right]
= \sqrt{mh}\int_0^\infty K(u)\left[m_{S|R}(uh;P) - m_{S|R}(0^+;P)\right] f_R(uh;P)\, du
\]
\[
= \sqrt{mh}\, h\int_0^\infty K(u)\left[\nabla_r m_{S|R}(x^*_{uh};P)\, u\right]\left[f_R(0^+;P) + \nabla_r f_R(x^{**}_{uh};P)\, uh\right] du = \sqrt{mh}\, h\, O_P(1) = o_P(1)
\]
and
\[
V_P\left[\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(m_{S|R}(R_i;P) - m_{S|R}(0^+;P)\right)\right]
\le h^2 \int_0^\infty K(u)^2\left[\nabla_r m_{S|R}(x^*_{uh};P)\right]^2 u^2\, f_R(uh;P)\, du = h^2\, O_P(1) = o_P(1).
\]
This gives
\[
\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(m_{S|R}(R_i;P) - m_{S|R}(0^+;P)\right) = o_P(1).
\]
Third, combine the first and second steps:
\[
\left(\frac{1}{\sqrt{mh}}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\left(m_{S|R}(R_i;P) - m_{S|R}(0^+;P)\right)\right)\left(\frac{1}{mh}\sum_{i=1}^m K\left(\frac{R_i}{h}\right)\right)^{-1} = o_P(1)\, O_P(1) = o_P(1).
\]
2. Assumption 2.1 - (2.3): zero mean of influence function.

$E_P[\psi_n(V_i,P)] = 0$ $\forall P$ by construction.

3. Assumption 2.1 - (2.4): variance of influence function.

Call $\xi(P) = 4 M_K\, v_{S|R}(0^+;P)/f_R(0^+;P)$, where $M_K = \int_0^\infty K(u)^2\, du$. Then,
\[
V_P\left(\frac{2}{f_R(0^+;P)\sqrt{h}}\, K\left(\frac{R_i}{h}\right)\left(S_i - m_{S|R}(R_i;P)\right)\right) - \xi(P)
= \frac{4}{f_R(0^+;P)^2}\int_0^\infty K(u)^2\left\{g(uh;P) - g(0^+;P)\right\} du
\]
\[
= \frac{4h}{f_R(0^+;P)^2}\int_0^\infty K(u)^2\, \nabla_x g(x^*_{uh};P)\, u\, du = o_P(1),
\]
where $g(r;P) \doteq v_{S|R}(r;P)\, f_R(r;P)$. Therefore,
\[
\sup_{P\in\mathcal{P}}\left| V_P[\psi_n(V_i,P)] - \xi(P)\right| \to 0.
\]
4. Assumption 2.1 - (2.5): $\sup_P E\left[\psi_n(Z_k,P)^2\right] < \infty$.

We have
\[
\psi_n(Z_k,P) = \frac{2\, K\left(\frac{X_k}{h}\right)}{f_R(0^+;P)\sqrt{h}}\left[\left(Y_k - m_{Y_k|X_k}(X_k)\right) + \left(m_{Y_k|X_k}(X_k) - m_{S|R}(X_k;P)\right)\right]
\]
\[
E\left[\psi_n(Z_k,P)^2\right] = \frac{4}{f_R(0^+;P)^2}\int_0^\infty K(u)^2\left[v_{Y_k|X_k}(uh) + \left(m_{Y_k|X_k}(uh) - m_{S|R}(uh;P)\right)^2\right] f_{X_k}(uh)\, du = O_P(1).
\]
5. Assumption 2.1 - (2.6): $(2+\theta)$-th moment condition.

First, $V_P^{-1}(\psi_n(V_i,P)) = O_P(1)$. Second, $n^{-\theta/2}\, E_P\left|\psi_n(V_i,P)\right|^{2+\theta} = o_P(1)$ because $\eta(r;P) = O_P(1)$. Therefore,
\[
n^{-\theta/2}\sup_{P\in\mathcal{P}} E_P\left|\frac{\psi_n(V_i,P)}{\sqrt{V_P(\psi_n(V_i,P))}}\right|^{2+\theta} = o(1).
\]
6. Assumption 2.1 - (2.7): $\xi\left(\frac{m}{n}P_1 + \frac{n-m}{n}P_2\right) \to \xi\left(\gamma P_1 + (1-\gamma)P_2\right)$.

Let $\bar P_n = \frac{m}{n}P_1 + \frac{n-m}{n}P_2$ and $\bar P = \gamma P_1 + (1-\gamma)P_2$. We have that $\xi(\bar P_n) = 4 M_K\, v_{S|R}(0^+;\bar P_n)/f_R(0^+;\bar P_n)$, and $v_{S|R}(0^+;\bar P_n) \to v_{S|R}(0^+;\bar P)$ and $f_R(0^+;\bar P_n) \to f_R(0^+;\bar P)$ as in Section C.1. Therefore, $\xi(\bar P_n) \to \xi(\bar P)$.

7. Assumption 2.2: $(n_1/n - \lambda) \xrightarrow{p} 0$.

Here $n_1/n = \sum_i I\{X_i \ge 0\}/n$, and $(n_1/n - \lambda) \xrightarrow{p} 0$ by the law of large numbers. Assumption 2.1 has already been verified above for any $m$ such that $(m/n - \lambda) \to 0$ or $(m/n - (1-\lambda)) \to 0$. $\square$
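To fix ideas, here is a minimal simulation sketch (ours, not the authors' code; the design, kernel, and bandwidth are illustrative assumptions) of the boundary estimator behind the decomposition (C.14)-(C.16): a kernel-weighted average of $S_i$ near the cutoff estimates $m_{S|R}(0^+;P)$.

```python
# Minimal sketch (illustrative): one-sided local-constant estimator of the
# conditional mean at the boundary 0+, the building block of the
# discontinuity-of-conditional-mean test.
import numpy as np

rng = np.random.default_rng(1)
m = 100_000
h = m ** (-0.4)                          # mh -> infinity and sqrt(mh)*h -> 0

def K(u):                                # Epanechnikov kernel
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

R = rng.uniform(0, 1, size=m)            # running variable, boundary at 0+
S = 1.0 + 2.0 * R + rng.standard_normal(m)   # m_{S|R}(r) = 1 + 2r, so theta(P) = 1

w = K(R / h)
theta_hat = (w * S).sum() / w.sum()      # ratio form that (C.14)-(C.16) expand
print(theta_hat)                         # close to m_{S|R}(0+) = 1
```

In the two-sample application, the same boundary estimator is computed on each subsample and the test statistic is the difference of the two estimates.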
C.4 Proof of Proposition 3.4 - Discontinuity of Density

The goal of this proof is to use the assumptions listed in Proposition 3.4 to verify Assumptions 2.1 and 2.2. It follows the general lines of the proof of Proposition 3.1, so the reader may refer to Section C.1 for the redundant details that we omit here.

Let $m_n$ be an arbitrary sequence of positive integers that grows with $n$ such that $m_n/n \to \gamma$, for some $\gamma \in (0,1)$. For $V = R$, draw $V_1, \ldots, V_m$ iid $P \in \mathcal{P}$, where $P_1$ is the distribution of $X$ and $P_2$ is the distribution of $-X$. The assumptions in Proposition 3.4 imply the following facts:

1. As $m \to \infty$, $h \to 0$, $mh \to \infty$, and $\sqrt{mh}\, h \to 0$;
2. The distribution of $R$ has PDF $f_R(r;P)$ that is differentiable wrt $r$ except at $r = 0$; $f_R(r;P)$ and the derivative $\nabla_r f_R(r;P)$ are bounded as functions of $(r,P)$; $f_R(0^+;P)$ and $f_R(0^-;P)$ are bounded away from zero as functions of $P$.

We re-write $\sqrt{mh}\left(\hat\theta - \theta(P)\right)$ to find the asymptotic linear representation:
\[
\sqrt{mh}\left(\hat\theta - \theta(P)\right) = \frac{1}{\sqrt{mh}}\sum_{i=1}^m\left\{K\left(\frac{R_i}{h}\right)\left(I\{R_i \ge 0\} - I\{R_i < 0\}\right) - E_P\left[K\left(\frac{R}{h}\right)\left(I\{R \ge 0\} - I\{R < 0\}\right)\right]\right\} \tag{C.19}
\]
\[
+ \frac{1}{\sqrt{mh}}\sum_{i=1}^m\left\{E_P\left[K\left(\frac{R}{h}\right)\left(I\{R \ge 0\} - I\{R < 0\}\right)\right] - \frac{h}{2}\left(f_R(0^+;P) - f_R(0^-;P)\right)\right\} \tag{C.20}
\]
(A numerical sketch of this estimator appears at the end of this subsection.)

1. Assumption 2.1 - (2.2): asymptotic expansion.

Equation C.19 above gives the influence function $\psi_n$, that is, (C.19) $= m^{-1/2}\sum_{i=1}^m \psi_n(V_i,P)$, where
\[
\psi_n(V_i,P) \doteq \frac{1}{\sqrt{h}}\left\{K\left(\frac{R_i}{h}\right)\left(I\{R_i \ge 0\} - I\{R_i < 0\}\right) - E_P\left[K\left(\frac{R}{h}\right)\left(I\{R \ge 0\} - I\{R < 0\}\right)\right]\right\}.
\]
We need to show that Equation C.20 converges to zero uniformly over $\mathcal{P}$. Note that
\[
\frac{1}{\sqrt{mh}}\sum_{i=1}^m\left(E_P\left[K\left(\frac{R}{h}\right)\left(I\{R \ge 0\} - I\{R < 0\}\right)\right] - \frac{h}{2}\left(f_R(0^+;P) - f_R(0^-;P)\right)\right)
\]
\[
= \sqrt{mh}\left(\int_0^\infty K(u)\, f_R(uh;P)\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, du - \frac{1}{2}\left(f_R(0^+;P) - f_R(0^-;P)\right)\right)
\]
\[
= \sqrt{mh}\, h\left(\int_0^\infty K(u)\, \nabla_r f_R(x^*_{uh};P)\, u\, du - \int_{-\infty}^0 K(u)\, \nabla_r f_R(x^{**}_{uh};P)\, u\, du\right) = \sqrt{mh}\, h\, O_P(1) = o_P(1),
\]
where $x^*_{uh}$ and $x^{**}_{uh}$ are points between $uh$ and 0, and we use that $\int_0^\infty K(u)\, du = \int_{-\infty}^0 K(u)\, du = 1/2$.

2. Assumption 2.1 - (2.3): zero mean of influence function.

$E_P[\psi_n(V_i,P)] = 0$ $\forall P$ by construction.

3. Assumption 2.1 - (2.4): variance of influence function.

Call $\xi(P) = \left(f_R(0^+;P) + f_R(0^-;P)\right)\int_0^\infty K(u)^2\, du$. Then,
\[
V_P\left(\frac{1}{\sqrt{h}}\, K\left(\frac{R_i}{h}\right)\left(I\{R_i \ge 0\} - I\{R_i < 0\}\right)\right) - \xi(P)
\]
\[
= \frac{1}{h}\left[\int_{-\infty}^\infty K(u)^2\, f_R(uh;P)\, h\, du - \left(\int_0^\infty K(u)\, f_R(uh;P)\, h\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, h\, du\right)^2\right] - \xi(P)
\]
\[
= \int_0^\infty K(u)^2\left[f_R(0^+;P) + \nabla_r f_R(x^*_{uh};P)\, uh\right] du + \int_{-\infty}^0 K(u)^2\left[f_R(0^-;P) + \nabla_r f_R(x^{**}_{uh};P)\, uh\right] du
\]
\[
- h\left(\int_0^\infty K(u)\, f_R(uh;P)\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, du\right)^2 - \xi(P)
\]
\[
= h\left[\int_0^\infty K(u)^2\, \nabla_r f_R(x^*_{uh};P)\, u\, du + \int_{-\infty}^0 K(u)^2\, \nabla_r f_R(x^{**}_{uh};P)\, u\, du
- \left(\int_0^\infty K(u)\, f_R(uh;P)\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, du\right)^2\right]
\]
\[
= h\, O_P(1) = o_P(1),
\]
where the symmetry of $K$ gives $\int_{-\infty}^0 K(u)^2\, du = \int_0^\infty K(u)^2\, du$, so the leading terms match $\xi(P)$. Therefore,
\[
\sup_{P\in\mathcal{P}}\left| V_P[\psi_n(V_i,P)] - \xi(P)\right| \to 0.
\]
4. Assumption 2.1 - (2.5): $\sup_P E\left[\psi_n(Z_k,P)^2\right] < \infty$ for $k = 1, 2$.

We have
\[
\psi_n(Z_k,P) = h^{-1/2}\, K\left(\frac{X_k}{h}\right)\left(I\{X_k \ge 0\} - I\{X_k < 0\}\right) - h^{-1/2}\, E_P\left[K\left(\frac{V}{h}\right)\left(I\{V \ge 0\} - I\{V < 0\}\right)\right]
\]
\[
\psi_n(Z_k,P)^2 = \frac{1}{h}\, K\left(\frac{X_k}{h}\right)^2 + \frac{1}{h}\left\{E_P\left[K\left(\frac{V}{h}\right)\left(I\{V \ge 0\} - I\{V < 0\}\right)\right]\right\}^2
\]
\[
- \frac{2}{h}\, K\left(\frac{X_k}{h}\right)\left(I\{X_k \ge 0\} - I\{X_k < 0\}\right) E_P\left[K\left(\frac{V}{h}\right)\left(I\{V \ge 0\} - I\{V < 0\}\right)\right],
\]
using $\left(I\{X_k \ge 0\} - I\{X_k < 0\}\right)^2 = 1$. Hence,
\[
E\left[\psi_n(Z_k,P)^2\right] = \int K(u)^2\, f_{X_k}(uh)\, du + h\left\{\int_0^\infty K(u)\, f_R(uh;P)\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, du\right\}^2
\]
\[
- 2\left\{\int_0^\infty K(u)\, f_{X_k}(uh)\, du - \int_{-\infty}^0 K(u)\, f_{X_k}(uh)\, du\right\} \times h\left\{\int_0^\infty K(u)\, f_R(uh;P)\, du - \int_{-\infty}^0 K(u)\, f_R(uh;P)\, du\right\}
\]
\[
= O_P(1) + O_P(h) + O_P(h) = O_P(1).
\]
5. Assumption 2.1 - (2.6): $(2+\theta)$-th moment condition.

We verify it in two steps. First, $V_P(\psi_n(V_i,P)) - \xi(P) = o_P(1)$ and $\xi(P)$ is bounded away from zero, uniformly over $\mathcal{P}$. Thus, $V_P^{-1}(\psi_n(V_i,P)) = O_P(1)$. Second, pick $\theta > 0$. From before, $E_P\left[K\left(\frac{R}{h}\right)\left(I\{R \ge 0\} - I\{R < 0\}\right)\right] = O_P(h)$. Then,
\[
n^{-\theta/2}\, E_P\left|\psi_n(V_i,P)\right|^{2+\theta} = \frac{1}{(nh)^{\theta/2}\, h}\, E_P\left|K\left(\frac{R_i}{h}\right)\left(I\{R_i \ge 0\} - I\{R_i < 0\}\right) - O_P(h)\right|^{2+\theta}
\]
\[
= \frac{1}{(nh)^{\theta/2}}\int_0^\infty \left|K(u) - O_P(h)\right|^{2+\theta} f_R(uh;P)\, du + \frac{1}{(nh)^{\theta/2}}\int_{-\infty}^0 \left|-K(u) - O_P(h)\right|^{2+\theta} f_R(uh;P)\, du
= \frac{1}{(nh)^{\theta/2}}\, O_P(1) = o_P(1).
\]
Combining both steps gives
\[
n^{-\theta/2}\sup_{P\in\mathcal{P}} E_P\left|\frac{\psi_n(V_i,P)}{\sqrt{V_P(\psi_n(V_i,P))}}\right|^{2+\theta} = o(1).
\]
6. Assumption 2.1 - (2.7): $\xi\left(\frac{m}{n}P_1 + \frac{n-m}{n}P_2\right) \to \xi\left(\gamma P_1 + (1-\gamma)P_2\right)$.

Let $\bar P_n = \frac{m}{n}P_1 + \frac{n-m}{n}P_2$ and $\bar P = \gamma P_1 + (1-\gamma)P_2$. Given that $\xi(\bar P_n) = \left(f_R(0^+;\bar P_n) + f_R(0^-;\bar P_n)\right)\int_0^\infty K(u)^2\, du$, the result follows because $m/n \to \gamma$, $f_R(0^+;\bar P_n) \to f_R(0^+;\bar P)$, and $f_R(0^-;\bar P_n) \to f_R(0^-;\bar P)$.

7. Assumption 2.2: $(n_1/n - \lambda) \xrightarrow{p} 0$.

Here $n_1$ is a deterministic sequence, and we have that $n_1/n = \lfloor n/2 \rfloor/n \to 1/2$. Assumption 2.1 has already been verified above for any $m$ such that $m/n \to 1/2$. $\square$
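As with the previous examples, a minimal simulation sketch (ours, not code from the paper; the factor-of-2 normalization and the design are illustrative choices consistent with $\int_0^\infty K(u)\, du = 1/2$) illustrates the kernel estimate of the density jump at the cutoff.

```python
# Minimal sketch (illustrative): kernel estimator of the density jump at 0,
# the building block of the discontinuity-of-density test.
import numpy as np

rng = np.random.default_rng(2)
m = 200_000
h = m ** (-0.4)                          # mh -> infinity and sqrt(mh)*h -> 0

def K(u):                                # symmetric Epanechnikov kernel
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

# density with a jump at 0: f(0-) = 0.25 and f(0+) = 0.75 on [-1, 1]
pos = rng.uniform(size=m) < 0.75
X = np.where(pos, rng.uniform(0, 1, m), rng.uniform(-1, 0, m))

sign = np.where(X >= 0, 1.0, -1.0)
jump_hat = 2.0 / (m * h) * (K(X / h) * sign).sum()
print(jump_hat)                          # close to f(0+) - f(0-) = 0.5
```

The factor of 2 offsets the one-sided kernel mass on each side of the cutoff, so the statistic estimates $f_R(0^+) - f_R(0^-)$ directly.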
D Auxiliary Lemmas

Definition D.1.
Consider a sequence of measurable functions $X_n : \Omega \times \mathcal{P} \to \mathbb{R}^q$ for a probability space $(\Omega, \mathcal{B}, P)$, where $P$ belongs to a set of distributions $\mathcal{P}$. In other words, for any fixed $P \in \mathcal{P}$, $X_n(P)$ is a random variable defined on $(\Omega, \mathcal{B}, P)$ to the Euclidean space, whose probability distribution depends on $n$ and $P \in \mathcal{P}$. We say $X_n$ is uniformly bounded in probability over $\mathcal{P}$ by a deterministic sequence $\alpha_n$ if, for every $\delta > 0$, there exists a deterministic constant $M_\delta \in (0, \infty)$ such that
\[
\sup_{P\in\mathcal{P}} P_P\left[\|X_n(P)\| > M_\delta\, \alpha_n\right] < \delta.
\]
We denote this as $X_n = O_P(\alpha_n)$. Similarly, we say $X_n = o_P(\alpha_n)$ if, for every $\varepsilon, \delta > 0$, there exists $n_{\varepsilon,\delta}$ such that
\[
\sup_{P\in\mathcal{P}} P_P\left[\|X_n(P)\| > \varepsilon\, \alpha_n\right] < \delta \quad \forall\, n \ge n_{\varepsilon,\delta}.
\]
Lemma D.1.

Consider a deterministic sequence $\alpha_n$ and a sequence of bivariate random functions $(X_n, Y_n) : \Omega \times \mathcal{P} \to \mathbb{R}^2$ as in Definition D.1.

1. If $X_n = O_P(\alpha_n)$ and $Y_n = o_P(1)$, then $X_n Y_n = o_P(\alpha_n)$.
2. Assume $X_n(P)$ has expectation $E_P[X_n(P)]$ and variance $V_P[X_n(P)]$ that are bounded over $\mathcal{P}$. Then,
\[
X_n = O_P\left(\sup_{P\in\mathcal{P}}\left|E_P[X_n(P)]\right| + \sup_{P\in\mathcal{P}} V_P[X_n(P)]^{1/2}\right).
\]
3. Suppose there exists a deterministic function $y : \mathcal{P} \to [\underline{y}, \bar{y}]$ with $0 < \underline{y} < \bar{y} < \infty$ such that $Y_n - y = o_P(1)$. Then, $Y_n^{-1} - y^{-1} = o_P(1)$.

Proof. Part 1:
Fix $\delta > 0$ and $\varepsilon > 0$. There exists $M_\delta \in (0,\infty)$ such that $\sup_{P\in\mathcal{P}} P_P\left[|X_n(P)| > M_\delta \alpha_n\right] < \delta$. There exists $n_{\varepsilon,\delta}$ such that $\sup_{P\in\mathcal{P}} P_P\left[|Y_n(P)| > \varepsilon/M_\delta\right] < \delta$ for all $n \ge n_{\varepsilon,\delta}$. Given that $|X_n(P) Y_n(P)| > \varepsilon\alpha_n$ implies $|X_n(P)| > M_\delta\alpha_n$ or $|Y_n(P)| > \varepsilon/M_\delta$, we have
\[
\sup_{P\in\mathcal{P}} P_P\left[|X_n(P) Y_n(P)| > \varepsilon\alpha_n\right] \le \sup_{P\in\mathcal{P}} P_P\left[|X_n(P)| > M_\delta\alpha_n\right] + \sup_{P\in\mathcal{P}} P_P\left[|Y_n(P)| > \varepsilon/M_\delta\right] < 2\delta \quad \forall\, n \ge n_{\varepsilon,\delta}.
\]
Part 2:
Call $A_n = \sup_{P\in\mathcal{P}}\left|E_P[X_n(P)]\right| + \sup_{P\in\mathcal{P}} V_P[X_n(P)]^{1/2}$ and $B_n = \sup_{P\in\mathcal{P}} E_P\left(X_n(P)^2\right)^{1/2}$. Note that $A_n \ge B_n \ge E_P\left(X_n(P)^2\right)^{1/2}$ for every $P \in \mathcal{P}$. For $\delta > 0$ and $P \in \mathcal{P}$,
\[
P_P\left[|X_n(P)| > \delta^{-1/2} A_n\right] \le P_P\left[|X_n(P)| > \delta^{-1/2} B_n\right] \le P_P\left[|X_n(P)| > \delta^{-1/2}\, E_P\left(X_n(P)^2\right)^{1/2}\right] \le \delta,
\]
where the last inequality is the Markov inequality. The result follows by taking the supremum of both sides. Part 3:
Pick $\eta > 0$ such that $\underline{y} - \eta > 0$. Define the set $A = \{y : \underline{y} - \eta \le y \le \bar{y} + \eta\}$. The function $g(y) = 1/y$ is uniformly continuous over $A$. Thus, for every $\varepsilon > 0$, there exists $\gamma_\varepsilon > 0$ such that $|g(y') - g(y)| > \varepsilon \Rightarrow |y' - y| > \gamma_\varepsilon$ for any $y', y \in A$. This implies that, for fixed $P$,
\[
P_P\left[\left|1/Y_n(P) - 1/y(P)\right| > \varepsilon,\; Y_n(P) \in A\right] \le P_P\left[\left|Y_n(P) - y(P)\right| > \gamma_\varepsilon\right] \to 0.
\]
The right-hand side is uniform over $\mathcal{P}$ because $Y_n - y = o_P(1)$.

Next, for fixed $P$, $|Y_n(P) - y(P)| < \eta \Rightarrow \underline{y} - \eta < Y_n(P) < \bar{y} + \eta \Rightarrow Y_n(P) \in A$. This implies that $P_P\left[|Y_n(P) - y(P)| < \eta\right] \le P_P\left[Y_n(P) \in A\right]$ and that $P_P\left[Y_n(P) \in A\right] \to 1$ uniformly over $\mathcal{P}$, because $Y_n - y = o_P(1)$ implies that $P_P\left[|Y_n(P) - y(P)| < \eta\right] \to 1$ uniformly over $\mathcal{P}$. Then,
\[
P_P\left[\left|1/Y_n(P) - 1/y(P)\right| > \varepsilon\right] - P_P\left[\left|1/Y_n(P) - 1/y(P)\right| > \varepsilon,\; Y_n(P) \in A\right] \le 1 - P_P\left[Y_n(P) \in A\right] \to 0
\]
uniformly over $\mathcal{P}$. $\square$
Lemma D.2.

Consider a sequence of random variables $X_n$ in the Euclidean space. $X_n \xrightarrow{p} X$ if, and only if, for every subsequence $X_{n_k}$ there exists a further subsequence $X_{n_{k_j}}$ such that $X_{n_{k_j}} \xrightarrow{as} X$.

Proof. See the proof of Theorem 2.3.2 by Durrett (2019).
Lemma D.3.
Consider a sequence of random variables $(Z_n, X_n)$, $n = 1, 2, \ldots$, and a random variable $Z$, all with domain on the measure space $(\Omega, \mathcal{B}, \mu)$. The images of $Z_n$ and $Z$ are in a Euclidean space, and the image of $X_n$ is $\mathcal{X}_n$. For a measurable function $F_n : \mathcal{X}_n \to \mathbb{R}$, assume $F_n(X_n) \xrightarrow{p} 0$. Moreover, suppose that for any nonrandom sequence $x_n \in \mathcal{X}_n$ such that $F_n(x_n) \to 0$, $Z_n$ conditional on $X_n = x_n$ converges in distribution to $Z$. Then, $Z_n \xrightarrow{d} Z$ unconditionally.

Proof. For an arbitrary subsequence $n_k$ of the sequence $\{F_n(X_n)\}_n$, Lemma D.2 says there is a further subsequence $n_{k_j}$ such that $F_{n_{k_j}}(X_{n_{k_j}}) \xrightarrow{as} 0$ as $j \to \infty$. Define $\mathcal{X}_\infty$ to be the space of all subsequences of the form $\{x_{n_{k_j}}\}_{j=1}^\infty$, for values $x_{n_{k_j}} \in \mathcal{X}_{n_{k_j}}$ that satisfy $F_{n_{k_j}}(x_{n_{k_j}}) \to 0$ as $j \to \infty$. We know that $P\left[\{X_{n_{k_j}}\}_{j=1}^\infty \in \mathcal{X}_\infty\right] = 1$.

Let $G_n$ be the CDF of the distribution of $Z_n$ conditional on $X_n = x_n$, where $x_n$ is an arbitrary sequence that satisfies $F_n(x_n) \to 0$.
By assumption, $G_n \to G$ pointwise (at every continuity point of $G$), where $G$ is the CDF of $Z$. This implies that $G_{n_{k_j}} \to G$ for the subsequence $\{n_{k_j}\}_j$ from above. Next,
\[
P\left[Z_{n_{k_j}} \le z\right] = \int_\Omega P\left[Z_{n_{k_j}} \le z \,\Big|\, X_{n_{k_j}} = X_{n_{k_j}}(\omega)\right] d\mu(\omega)
= \int_\Omega G_{n_{k_j}}(z;\omega)\, I\left\{\omega : \{X_{n_{k_j}}(\omega)\}_{j=1}^\infty \in \mathcal{X}_\infty\right\} d\mu(\omega),
\]
where $G_{n_{k_j}}(z;\omega)$ denotes the CDF of $Z_{n_{k_j}}$ conditional on $X_{n_{k_j}} = X_{n_{k_j}}(\omega)$, and we used that $P\left[\omega : \{X_{n_{k_j}}(\omega)\}_{j=1}^\infty \in \mathcal{X}_\infty\right] = 1$.

For each $\omega$ such that the indicator is 1, the conditional CDF $G_{n_{k_j}}(z;\omega)$ converges to $G(z)$ as $j \to \infty$. So, we have an integral of a measurable function of $\omega$ that changes with $j$, is bounded by 1, and converges pointwise in $\omega$ to $G(z)\, I\left\{\omega : \{X_{n_{k_j}}(\omega)\}_{j=1}^\infty \in \mathcal{X}_\infty\right\}$ as $j \to \infty$. By the dominated convergence theorem, the integral above converges to
\[
G(z)\, P\left[\{X_{n_{k_j}}\}_{j=1}^\infty \in \mathcal{X}_\infty\right] = G(z).
\]
This says that $Z_{n_{k_j}} \xrightarrow{d} Z$.

Finally, call $H_n$ the unconditional CDF of $Z_n$. We have just shown that, for every subsequence $\{n_k\}_k$ there exists a further subsequence $\{n_{k_j}\}_j$ such that $H_{n_{k_j}}(z) \to G(z)$. This implies that $H_n(z) \to G(z)$. Therefore, $Z_n \xrightarrow{d} Z$. $\square$

The next lemma is a generalization of Lemma 11.3.3 of Lehmann and Romano (2005).
Lemma D.4.

For each $n$, let $Y_{n,1}, \ldots, Y_{n,n}$ be independently and identically distributed with mean zero and finite variance $\sigma_n^2$. Let $C_{n,1}, \ldots, C_{n,n}$ be a sequence of random variables, independent of $Y_{n,1}, \ldots, Y_{n,n}$. Assume there exists $\theta > 0$ such that
\[
E\left[\left|Y_{n,i}/\sigma_n\right|^{2+\theta}\right]\left(\frac{\max_{i=1,\ldots,n} C_{n,i}^2}{\sum_{l=1}^n C_{n,l}^2}\right)^{\theta/2} \xrightarrow{p} 0 \quad \text{as } n \to \infty. \tag{D.1}
\]
Then,
\[
\frac{\sum_{i=1}^n C_{n,i} Y_{n,i}}{\sigma_n\sqrt{\sum_{l=1}^n C_{n,l}^2}} \xrightarrow{d} N(0,1). \tag{D.2}
\]
Proof.
We use Lemma D.3 to prove this lemma. Call $\alpha_n = E\left[|Y_{n,i}/\sigma_n|^{2+\theta}\right]$. Mapping the lemma's notation to our case, we have
\[
Z_n = \frac{\sum_{i=1}^n C_{n,i} Y_{n,i}}{\sigma_n\sqrt{\sum_{l=1}^n C_{n,l}^2}}, \qquad
X_n = (C_{n,1}, \ldots, C_{n,n}), \qquad
\mathcal{X}_n = \mathbb{R}^n, \qquad
F_n(X_n) = \alpha_n\left(\frac{\max_{1\le i\le n} C_{n,i}^2}{\sum_{j=1}^n C_{n,j}^2}\right)^{\theta/2}.
\]
Then, it suffices to derive the limiting distribution of the sequence of distributions of $\sum_{i=1}^n C_{n,i} Y_{n,i}/\sqrt{\sum_{l=1}^n C_{n,l}^2}$ conditional on $(C_{n,1},\ldots,C_{n,n}) = (c_{n,1},\ldots,c_{n,n})$, where the values $(c_{n,1},\ldots,c_{n,n})$ come from an arbitrary triangular array with infinitely many rows that satisfies $\alpha_n\left(\max_i c_{n,i}^2/\sum_j c_{n,j}^2\right)^{\theta/2} \to 0$ as $n \to \infty$.

For each $n$, we have a sum of a triangular array of random variables $C_{n,i} Y_{n,i}$ that are independent across $i = 1,\ldots,n$ once we condition on $(C_{n,1},\ldots,C_{n,n})$. By assumption, $C_{n,1},\ldots,C_{n,n}$ is independent of $Y_{n,1},\ldots,Y_{n,n}$ for every $n$. Thus,
\[
E\left[C_{n,i} Y_{n,i} \,\big|\, (C_{n,1},\ldots,C_{n,n}) = (c_{n,1},\ldots,c_{n,n})\right] = 0 \tag{D.3}
\]
\[
V\left[C_{n,i} Y_{n,i} \,\big|\, (C_{n,1},\ldots,C_{n,n}) = (c_{n,1},\ldots,c_{n,n})\right] = c_{n,i}^2\, \sigma_n^2 \tag{D.4}
\]
for $i = 1,\ldots,n$. The sum of the variances is $s_n^2 = \sigma_n^2 \sum_{i=1}^n c_{n,i}^2$. For $n = 1, 2, \ldots$, the sequence of distributions of $\sum_{i=1}^n C_{n,i} Y_{n,i}/\left(\sigma_n\sqrt{\sum_l C_{n,l}^2}\right)$ conditional on $(C_{n,1},\ldots,C_{n,n}) = (c_{n,1},\ldots,c_{n,n})$ is the same as the sequence of distributions of $\tilde Z_n = \sum_{i=1}^n c_{n,i} Y_{n,i}/\left(\sigma_n\sqrt{\sum_l c_{n,l}^2}\right)$.

We apply the Lindeberg CLT to derive the limiting distribution of $\tilde Z_n$. We need to verify the Lindeberg condition, that is, for any $\delta > 0$,
\[
\frac{1}{s_n^2}\sum_{i=1}^n E\left[c_{n,i}^2 Y_{n,i}^2\, I\left\{|c_{n,i} Y_{n,i}| > \delta s_n\right\}\right] \to 0 \quad \text{as } n \to \infty \tag{D.5}
\]
\[
\Leftrightarrow \quad \sum_{i=1}^n \frac{c_{n,i}^2}{\sum_{l=1}^n c_{n,l}^2}\, E\left[\frac{Y_{n,i}^2}{\sigma_n^2}\, I\left\{c_{n,i}^2 Y_{n,i}^2 > \delta^2 \sigma_n^2 \sum_{l=1}^n c_{n,l}^2\right\}\right] \to 0 \tag{D.6}
\]
\[
\Leftarrow \quad \sum_{i=1}^n \frac{c_{n,i}^2}{\sum_{l=1}^n c_{n,l}^2}\, E\left[\frac{Y_{n,i}^2}{\sigma_n^2}\, I\left\{\frac{Y_{n,i}^2}{\sigma_n^2} > \frac{\delta^2 \sum_{l=1}^n c_{n,l}^2}{\max_{1\le i\le n} c_{n,i}^2}\right\}\right] \to 0 \tag{D.7}
\]
\[
\Leftrightarrow \quad E\left[\frac{Y_{n,i}^2}{\sigma_n^2}\, I\left\{\frac{Y_{n,i}^2}{\sigma_n^2} > \frac{\delta^2 \sum_{l=1}^n c_{n,l}^2}{\max_{1\le i\le n} c_{n,i}^2}\right\}\right] \to 0, \tag{D.8}
\]
where we use that $I\left\{c_{n,i}^2 Y_{n,i}^2 > \delta^2 \sigma_n^2 \sum_l c_{n,l}^2\right\} \le I\left\{Y_{n,i}^2/\sigma_n^2 > \delta^2 \sum_l c_{n,l}^2/\max_i c_{n,i}^2\right\}$, and that the expectation in (D.7) does not depend on $i$. Thus, it suffices to verify Equation D.8.

Note that
\[
\left|\frac{Y_{n,i}}{\sigma_n}\right|^{2+\theta} \ge \left|\frac{Y_{n,i}}{\sigma_n}\right|^{2+\theta} I\left\{\frac{Y_{n,i}^2}{\sigma_n^2} > \frac{\delta^2 \sum_l c_{n,l}^2}{\max_i c_{n,i}^2}\right\}
= \frac{Y_{n,i}^2}{\sigma_n^2}\left|\frac{Y_{n,i}}{\sigma_n}\right|^{\theta} I\left\{\cdot\right\}
\ge \frac{Y_{n,i}^2}{\sigma_n^2}\left(\frac{\delta^2 \sum_l c_{n,l}^2}{\max_i c_{n,i}^2}\right)^{\theta/2} I\left\{\cdot\right\}.
\]
Re-arranging,
\[
\delta^{-\theta}\left(\frac{\max_{1\le i\le n} c_{n,i}^2}{\sum_{l=1}^n c_{n,l}^2}\right)^{\theta/2} \alpha_n \ge E\left[\frac{Y_{n,i}^2}{\sigma_n^2}\, I\left\{\frac{Y_{n,i}^2}{\sigma_n^2} > \frac{\delta^2 \sum_l c_{n,l}^2}{\max_i c_{n,i}^2}\right\}\right],
\]
and the left-hand side converges to zero, which implies the right-hand side converges to zero and (D.8) holds. Therefore,
\[
\frac{\sum_{i=1}^n c_{n,i} Y_{n,i}}{\sigma_n\sqrt{\sum_{l=1}^n c_{n,l}^2}} \xrightarrow{d} N(0,1),
\]
and Lemma D.3 then gives
\[
\frac{\sum_{i=1}^n C_{n,i} Y_{n,i}}{\sigma_n\sqrt{\sum_{l=1}^n C_{n,l}^2}} \xrightarrow{d} N(0,1). \qquad \square
\]
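Lemma D.4 is easy to probe numerically. The following minimal Monte Carlo sketch (ours; the weight and error distributions are illustrative assumptions) draws weights independent of the errors, so that condition (D.1) holds with bounded weights, and checks that the self-normalized statistic in (D.2) is approximately standard normal.

```python
# Monte Carlo sketch (illustrative) of Lemma D.4: weighted sums with random
# weights independent of the iid mean-zero errors are asymptotically N(0,1)
# after self-normalization.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 2_000, 5_000
Z = np.empty(reps)
for r in range(reps):
    C = rng.uniform(0.5, 2.0, size=n)            # weights; max C^2 / sum C^2 -> 0
    Y = rng.exponential(size=n) - 1.0            # iid, mean zero, sigma_n = 1
    Z[r] = (C * Y).sum() / np.sqrt((C**2).sum())
print(Z.mean(), Z.std())                         # approximately 0 and 1
```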
Lemma D.5.

Consider a sequence of random CDFs $\{F_n\}_n$ that converges pointwise in probability to a continuous CDF $F$, that is, for every $x \in \mathbb{R}$, $F_n(x) \xrightarrow{p} F(x)$. Then, the convergence is also uniform, namely, $\sup_x |F_n(x) - F(x)| \xrightarrow{p} 0$.

Proof. Fix $\varepsilon > 0$ and pick $m$ such that $\varepsilon > 1/m$. Use continuity of $F$ to pick points $-\infty = x_0 < x_1 < \cdots < x_m = +\infty$ such that $F(x_j) = j/m$ for $j = 0, 1, \ldots, m$. For any $x \in [x_{j-1}, x_j]$, monotonicity gives
\[
F_n(x) - F(x) \le F_n(x_j) - F(x_{j-1}) = F_n(x_j) - F(x_j) + 1/m
\]
and
\[
F_n(x) - F(x) \ge F_n(x_{j-1}) - F(x_j) = F_n(x_{j-1}) - F(x_{j-1}) - 1/m,
\]
so that
\[
\sup_x |F_n(x) - F(x)| \le \max_{0\le j\le m} |F_n(x_j) - F(x_j)| + 1/m.
\]
The maximum is taken over finitely many points, each of which converges to zero in probability by assumption, so $\max_{0\le j\le m} |F_n(x_j) - F(x_j)| \xrightarrow{p} 0$. Since $1/m < \varepsilon$ and $\varepsilon$ is arbitrary, $\sup_x |F_n(x) - F(x)| \xrightarrow{p} 0$. $\square$
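As a quick numerical illustration of Lemma D.5 (ours; the normal design is an assumption of this example), take $F_n$ to be the empirical CDF of $n$ standard normal draws: pointwise convergence in probability holds by the law of large numbers, and the sup-distance to the normal CDF shrinks as the lemma predicts.

```python
# Illustrative check of Lemma D.5 with F_n the empirical CDF of n normal draws:
# the sup-distance to the continuous limit CDF shrinks with n.
from math import erf
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(-4.0, 4.0, 2_001)
F = np.array([0.5 * (1.0 + erf(g / 2**0.5)) for g in grid])  # N(0,1) CDF

for n in (100, 10_000, 1_000_000):
    x = np.sort(rng.standard_normal(n))
    F_n = np.searchsorted(x, grid, side="right") / n         # ECDF evaluated on the grid
    print(n, np.abs(F_n - F).max())                          # grid approximation to the sup-distance
```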