Goodness-of-Fit Tests based on Series Estimators in Nonparametric Instrumental Regression
Christoph Breunig
Humboldt-Universität zu Berlin
March 2, 2015
This paper proposes several tests of restricted specification in nonparametric instrumental regression. Based on series estimators, test statistics are established that allow for tests of the general model against a parametric or nonparametric specification as well as a test of exogeneity of the vector of regressors. The tests' asymptotic distributions under correct specification are derived and their consistency against any alternative model is shown. Under a sequence of local alternative hypotheses, the asymptotic distributions of the tests are derived. Moreover, uniform consistency is established over a class of alternatives whose distance to the null hypothesis shrinks appropriately as the sample size increases. A Monte Carlo study examines the finite sample performance of the test statistics.
Keywords:
Nonparametric regression, instrument, linear operator, orthogonal series estimation, hypothesis testing, local alternative, uniform consistency.
JEL classification:
C12, C14.
1. Introduction
While parametric instrumental variables estimators are widely used in econometrics, their nonparametric extension was not introduced until the last decade. The study of nonparametric instrumental regression models was initiated by Florens [2003] and Newey and Powell [2003]. In these models, given a scalar dependent variable $Y$, a vector of regressors $Z$, and a vector of instrumental variables $W$, the structural function $\varphi$ satisfies

$$Y = \varphi(Z) + U \quad \text{with} \quad E[U \mid W] = 0 \tag{1.1}$$

for an error term $U$. Here, $Z$ contains potentially endogenous entries, that is, $E[U \mid Z]$ may not be zero. Model (1.1) does not involve the a priori assumption that the structural function is known up to finitely many parameters. By considering this nonparametric model, we minimize the likelihood of misspecification. On the other hand, implementing the nonparametric instrumental regression model can be challenging.

∗ This paper derives from my doctoral dissertation, completed under the guidance of Enno Mammen. I would like to thank two anonymous referees for comments and suggestions that greatly improved the paper. I also benefited from helpful comments by Jan Johannes, James Stock, Federico Crudu, and Petyo Bonev. This work was supported by the DFG-SNF research group FOR916. Address: Humboldt-Universität zu Berlin, Spandauer Straße 1, 10178 Berlin, Germany, e-mail: [email protected]

Nonparametric instrumental regression models have attracted increasing attention in the econometric literature. For example, Ai and Chen [2003], Blundell et al. [2007], Chen and Reiß [2011], Newey and Powell [2003] or Johannes and Schwarz [2010] consider sieve minimum distance estimators of $\varphi$, while Darolles et al. [2011], Hall and Horowitz [2005], Gagliardini and Scaillet [2011] or Florens et al. [2011] study penalized least squares estimators. When the methods of analysis are widened to include nonparametric techniques, one must confront two major challenges. First, identification in model (1.1) requires far stronger assumptions about the instrumental variables than in the parametric case (cf. Newey and Powell [2003]). Second, the accuracy of any estimator of $\varphi$ can be low, even for large sample sizes. More precisely, Chen and Reiß [2011] showed that for a large class of joint distributions of $(Z, W)$ only logarithmic rates of convergence can be obtained. The reason for this slow convergence is that model (1.1) leads to an inverse problem which is ill posed in general, that is, the solution does not depend continuously on the data.

In light of the difficulties of estimating the nonparametric function $\varphi$ in model (1.1), the need for statistically justified model simplifications is paramount. We do not face an ill posed inverse problem if a parametric structure of $\varphi$ or exogeneity of $Z$ can be justified.
If these model simplifications are not supported by the data, one might still be interested in whether a smooth solution to model (1.1) exists and whether some regressors could be omitted from the structural function $\varphi$. These model simplifications have important potential since they might increase the accuracy of estimators of $\varphi$ or weaken the conditions imposed on the instrumental variables to ensure identification.

In this work we present a new family of goodness-of-fit statistics which allows for several restricted specification tests of the model (1.1). Our method can be used for testing either a parametric or a nonparametric specification. In addition, we perform a test of exogeneity and of dimension reduction of the vector of regressors $Z$, that is, whether certain regressors can be omitted from the structural function $\varphi$. By dropping regressors which are independent of the instrument, identification in the restricted model might be possible although $\varphi$ is not identified in the original model (1.1).

There is a large literature concerning hypothesis testing of restricted specifications of regression. In the context of conditional moment equations, Donald et al. [2003] and Tripathi and Kitamura [2003] make use of empirical likelihood methods to test parametric restrictions of the structural function. In addition, Santos [2012] allows for different hypothesis tests, such as a test of homogeneity. Based on kernel techniques, Horowitz [2006], Blundell and Horowitz [2007], and Horowitz [2011] propose test statistics in which an additional smoothing step (on the exogenous entries of $Z$) is carried out. Horowitz [2006] considers a parametric specification test. Blundell and Horowitz [2007] establish a consistent test of exogeneity of the vector of regressors $Z$, whereas Horowitz [2011] tests whether the endogenous part of $Z$ can be omitted from $\varphi$. Gagliardini and Scaillet [2007] and Horowitz [2012] develop nonparametric specification tests in an instrumental regression model.
We would like to emphasize that their tests cannot be applied to model (1.1) where some entries of $Z$ might be exogenous.

Our testing procedure is entirely based on series estimation and hence is easy to implement. We use approximating functions to estimate the conditional moment restriction implied by the model (1.1), where $\varphi$ is replaced by an estimator under each conjectured hypothesis. It is worth noting that our methodology lets us omit some assumptions typically found in related literature, such as smoothness conditions on the joint distribution of $(Z, W)$. In addition, a Monte Carlo study indicates that the finite sample power of our tests exceeds that of existing tests.

The paper is organized as follows. In Section 2, we start with a simple hypothesis test, that is, whether $\varphi$ coincides with a known function $\varphi_0$. We obtain the test's asymptotic distribution under the null hypothesis and its consistency against any fixed alternative model. Moreover, we judge its power by considering linear local alternatives and establish uniform consistency over a class of functions. In Sections 3–5 we consider a parametric specification test, a test of exogeneity, and a nonparametric specification test. The goodness-of-fit statistics are obtained by replacing $\varphi_0$ in the statistic of Section 2 by an appropriate estimator. In each case, the asymptotic distribution under correct specification and power statements against alternative models are derived. In Section 6, we investigate the finite sample properties of our tests by Monte Carlo simulations. All proofs can be found in the appendix.
2. A simple hypothesis test
In this section, we propose a goodness-of-fit statistic for testing the hypothesis $H_0: \varphi = \varphi_0$, where $\varphi_0$ is a known function, against the alternative $\varphi \neq \varphi_0$. We develop a test statistic based on the $L^2$ distance. As we will see in the following sections, it is sufficient to replace $\varphi_0$ by an appropriate estimator to allow for tests of the general model against other specifications. We first give basic assumptions, then obtain the asymptotic distribution of the proposed statistic, and further discuss its power and consistency properties.

The model revisited.
The nonparametric instrumental regression model (1.1) leads to a linear operator equation. To be more precise, let us introduce the conditional expectation operator $T\phi := E[\phi(Z) \mid W]$ mapping $L^2_Z = \{\phi : E|\phi(Z)|^2 < \infty\}$ to $L^2_W = \{\psi : E|\psi(W)|^2 < \infty\}$. Consequently, model (1.1) can be written as

$$g = T\varphi \tag{2.1}$$

where the function $g := E[Y \mid W]$ belongs to $L^2_W$. Throughout the paper we assume that an iid $n$-sample of $(Y, Z, W)$ from the model (1.1) is available.
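To experiment with the setting above, one can simulate an iid sample from a model of the form (1.1). The following sketch is only an illustration: the structural function $\sin(\pi z)$, the uniform instrument, the first-stage design, and the names `simulate_npiv` and `rho` are assumptions made here and are not taken from the paper's Monte Carlo design.

```python
import numpy as np

def simulate_npiv(n, rho=0.5, seed=0):
    """Draw an iid sample (Y, Z, W) with E[U | W] = 0 and, for rho != 0, E[U | Z] != 0.

    Hypothetical design: W ~ U[0, 1], structural function phi(z) = sin(pi z).
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, size=n)   # instrument
    u = rng.normal(0.0, 1.0, size=n)    # structural error, independent of W
    v = rng.normal(0.0, 1.0, size=n)    # first-stage noise, independent of (W, U)
    # Z depends on W (instrument relevance) and on U (endogeneity, governed by rho)
    z = w + 0.3 * (rho * u + np.sqrt(1.0 - rho ** 2) * v)
    y = np.sin(np.pi * z) + u           # Y = phi(Z) + U
    return y, z, w

y, z, w = simulate_npiv(2000, rho=0.5, seed=1)
```

Setting `rho=0` in this design makes $Z$ exogenous, which is the situation tested in Section 4.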
Assumptions.
Our test statistic is based on a sequence of approximating functions $\{f_l\}_{l \geq 1}$ in $L^2_W$. Let $\mathcal{W}$ denote the support of $W$ and $p_W$ the marginal density of $W$. Let $\nu$ be a probability density function that is strictly positive on $\mathcal{W}$. We assume throughout the paper that $\{f_l\}_{l \geq 1}$ forms an orthonormal basis in $L^2_\nu(\mathbb{R}^{d_w}) := \{\phi : \int \phi^2(s)\,\nu(s)\,ds < \infty\}$ where $d_w$ denotes the dimension of $W$. For instance, if $\mathcal{W} \subset [a, b]$ then a natural choice of $\nu$ would be $\nu(w) = 1/(b - a)$ for $w \in [a, b]$ and zero otherwise.

Assumption 1.
There exist constants $\eta_f, \eta_p \geq 1$ such that (i) $\sup_{l \geq 1} \int |f_l(s)|^4\, \nu(s)\,ds \leq \eta_f$ and (ii) $\sup_{w \in \mathcal{W}} \{ p_W(w)/\nu(w) \} \leq \eta_p$ with $\nu$ being strictly positive on $\mathcal{W}$.

Assumption 1 (i) restricts the magnitude of the approximating functions $\{f_l\}_{l \geq 1}$, which is necessary for our proof to determine the asymptotic behavior of our test statistic. This assumption holds for sufficiently large $\eta_f$ if the basis $\{f_l\}_{l \geq 1}$ is uniformly bounded, such as trigonometric bases. Moreover, Assumption 1 (i) is satisfied by Hermite polynomials. Assumption 1 (ii) is satisfied if, for instance, $p_W/\nu$ is continuous and $\mathcal{W}$ is compact.

The results derived below involve assumptions on the conditional moments of the random variable $U$ given $W$, gathered in the following assumption.

Assumption 2.
There exists a constant $\sigma > 0$ such that $E[U^4 \mid W] \leq \sigma^4$.

The conditional moment condition on the error term $U$ helps to establish the asymptotic distribution of our test statistics. The following assumption ensures identification of $\varphi$ in the model (2.1).

Assumption 3.
The conditional expectation operator $T$ is nonsingular.

Under Assumption 3, the hypothesis $H_0$ is equivalent to $g = T\varphi_0$, which is used to construct our test statistic below. Note that the asymptotic results under the null hypotheses considered in Sections 2–4 hold true even if $T$ is singular. If Assumption 3 fails, however, our test has no power against alternative models whose structural function satisfies $\varphi = \varphi_0 + \delta$ with $\delta$ belonging to the null space of $T$.

We will see below that the power of our test can be increased by carrying out an additional smoothing step. Therefore, we introduce a smoothing operator $L$ mapping $L^2_W$ to $L^2_W$. In contrast to the unknown conditional expectation operator $T$, which has to be estimated, the operator $L$ can be chosen by the econometrician. Let $L$ have an eigenvalue decomposition given by $\{\tau_j^{1/2}, f_j\}_{j \geq 1}$. We allow in this paper for a wide range of smoothing operators. In particular, $L$ may be the identity operator, that is, no smoothing step is carried out. We only require the following condition on the operator $L$, which is determined by the sequence of eigenvalues $\tau = (\tau_j)_{j \geq 1}$.

Assumption 4.
The weighting sequence $\tau$ is positive, nonincreasing, and satisfies $\tau_1 = 1$.

Assumption 4 ensures that the operator $L$ is nonsingular.

Remark 2.1.
Horowitz [2006], Blundell and Horowitz [2007], and Horowitz [2011] consider as a smoothing operator a Fredholm integral operator, that is, $L\phi(s) = \int_0^1 \ell(s, t)\,\phi(t)\,dt$ for some function $\phi \in L^2[0, 1] = \{\phi : \int_0^1 \phi^2(s)\,ds < \infty\}$ and some kernel function $\ell : [0, 1]^2 \to \mathbb{R}$. In order to ensure $L\phi \in L^2[0, 1]$ it is sufficient to assume $\int_0^1 \int_0^1 |\ell(s, t)|^2\,ds\,dt < \infty$. Let $\{\tau_j^{1/2}, f_j\}_{j \geq 1}$ be the eigenvalue decomposition of $L$. By Parseval's identity

$$\int_0^1 \int_0^1 |\ell(s, t)|^2\,ds\,dt = \int_0^1 \sum_{j=1}^{\infty} \tau_j\, |f_j(s)|^2\,ds = \sum_{j=1}^{\infty} \tau_j$$

where the right hand side is finite only if the sequence $\tau$ decays sufficiently fast. In our case, if we apply a smoothing operator $L$ with $\sum_{j=1}^{\infty} \tau_j < \infty$ then our test statistic also converges to a weighted series of chi-squared random variables. In addition, we allow for a milder degree of smoothing or no smoothing at all and show below that asymptotic normality of our test statistics can then be obtained. □

Notation.
For a matrix $A$ we denote its transpose by $A^t$, its inverse by $A^{-1}$, and its generalized inverse by $A^{-}$. The Euclidean norm is denoted by $\|\cdot\|$, which in the case of a matrix denotes the Frobenius norm, that is, $\|A\| = (\mathrm{trace}(A^t A))^{1/2}$. The norms on $L^2_Z$ and $L^2_W$ are denoted by $\|\phi\|_Z^2 := E|\phi(Z)|^2$ for $\phi \in L^2_Z$ and $\|\psi\|_W^2 := E|\psi(W)|^2$ for $\psi \in L^2_W$. The $k \times k$ identity matrix is denoted by $I_k$. For a vector $V$ we write $\mathrm{diag}(V)$ for the diagonal matrix with diagonal elements being the values of $V$. Moreover, $e_m(Z)$ and $f_m(W)$ denote random vectors with entries $e_j(Z)$ and $f_j(W)$, $1 \leq j \leq m$, respectively. For any weighting sequence $w$ we introduce vectors $e^w_m(Z)$ and $f^w_m(W)$ with entries $e^w_j(Z) = \sqrt{w_j}\, e_j(Z)$ and $f^w_j(W) = \sqrt{w_j}\, f_j(W)$, $1 \leq j \leq m$. We write $a_n \sim b_n$ when there exist constants $c, c' > 0$ such that $c\, b_n \leq a_n \leq c'\, b_n$ for all sufficiently large $n$.
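As an illustration of this notation, the sketch below builds the vectors $f_m(W)$ and $f^\tau_m(W)$ for one concrete choice: the cosine basis on $[0, 1]$ with $\nu$ the uniform density. The basis choice, the weighting sequence, and the function names are assumptions made for illustration only. This basis is uniformly bounded by $\sqrt{2}$, which is the case covered in the discussion of Assumption 1 (i).

```python
import numpy as np

def cosine_basis(w, m):
    """Rows hold f_1(w_i), ..., f_m(w_i) with f_1 = 1 and f_j = sqrt(2) cos((j - 1) pi w)."""
    w = np.asarray(w, dtype=float)
    freq = np.arange(m)
    F = np.sqrt(2.0) * np.cos(np.pi * w[:, None] * freq[None, :])
    F[:, 0] = 1.0  # the constant function replaces sqrt(2) * cos(0)
    return F

def weighted_basis(w, tau):
    """Rows hold f^tau_j(w_i) = sqrt(tau_j) * f_j(w_i)."""
    tau = np.asarray(tau, dtype=float)
    return cosine_basis(w, tau.size) * np.sqrt(tau)[None, :]

# Numerical check of orthonormality with respect to the uniform density on [0, 1]
grid = (np.arange(2000) + 0.5) / 2000   # midpoint rule nodes
F = cosine_basis(grid, 6)
gram = F.T @ F / grid.size              # approximates the integrals of f_j * f_l

# A positive, nonincreasing weighting sequence with tau_1 = 1 (hypothetical choice)
tau = 1.0 / np.arange(1, 7) ** 2
F_tau = weighted_basis(grid, tau)
```

Choosing `tau` identically equal to one corresponds to the identity operator, that is, no smoothing.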
Nonsingularity of the conditional expectation operator $T$ and the smoothing operator $L$ implies that the null hypothesis $H_0$ is equivalent to $L(g - T\varphi_0) = 0$. Note that $\|L(g - T\varphi_0)\|_W = 0$ if and only if $\int \big| L(g - T\varphi_0)(w)\, p_W(w)/\nu(w) \big|^2 \nu(w)\,dw = 0$ since the density $\nu$ is strictly positive on $\mathcal{W}$. Moreover, since $\{f_j\}_{j \geq 1}$ is an orthonormal basis with respect to $\nu$ we obtain by Parseval's identity

$$\int \big| L(g - T\varphi_0)(w)\, p_W(w)/\nu(w) \big|^2 \nu(w)\,dw = \sum_{j=1}^{\infty} E\big[(g - T\varphi_0)(W)\, f^\tau_j(W)\big]^2. \tag{2.2}$$

Now we truncate the infinite sum at some integer $m_n$ which grows with the sample size $n$. This ensures consistency of our testing procedure. Further, replacing the expectation by the sample mean we obtain our test statistic

$$S_n := \sum_{j=1}^{m_n} \tau_j\, \Big| n^{-1} \sum_{i=1}^{n} \big(Y_i - \varphi_0(Z_i)\big) f_j(W_i) \Big|^2. \tag{2.3}$$

We reject the hypothesis $H_0$ if $n S_n$ becomes too large. When no additional smoothing is carried out, that is, $L$ is the identity operator, then $\tau_j = 1$ for all $j \geq 1$. To achieve asymptotic normality we need to standardize our test statistic $S_n$ by an appropriate mean and variance, which we introduce in the following definition.

Definition 2.1.
For all $m \geq 1$ let $\Sigma_m$ be the covariance matrix of the random vector $U f^\tau_m(W)$ with entries $s_{jl} = E\big[U^2 f^\tau_j(W) f^\tau_l(W)\big]$, $1 \leq j, l \leq m$. Then the trace and the Frobenius norm of $\Sigma_m$ are respectively denoted by

$$\mu_m := \sum_{j=1}^{m} s_{jj} \quad \text{and} \quad \varsigma_m := \Big( \sum_{j,l=1}^{m} s_{jl}^2 \Big)^{1/2}.$$

Indeed, the next result shows that $n S_n$ after standardization is asymptotically normally distributed if $m_n$ increases appropriately as the sample size $n$ tends to infinity.

Theorem 2.1.
Let Assumptions 1–4 hold true. If $m_n$ satisfies

$$\varsigma_{m_n}^{-1} = o(1) \quad \text{and} \quad \Big( \sum_{j=1}^{m_n} \tau_j \Big)^3 = o(n) \tag{2.4}$$

then under $H_0$

$$(\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S_n - \mu_{m_n} \big) \xrightarrow{d} N(0, 1).$$

Remark 2.2.
Since $\varsigma_{m_n} \leq \eta_p\, \sigma^2 \big( \sum_{j=1}^{m_n} \tau_j \big)$ (cf. proof of Theorem 2.2), condition $\varsigma_{m_n}^{-1} = o(1)$ implies that $\sum_{j=1}^{m_n} \tau_j$ tends to infinity as $n$ increases. Moreover, from condition (2.4) we see that by choosing a more strongly decaying sequence $\tau$ the parameter $m_n$ may be chosen larger. From the following theorem we see that if $\sum_{j=1}^{m_n} \tau_j = O(1)$ only $m_n^{-1} = o(1)$ is required. □

In the following result, we establish the asymptotic distribution of our test when the sequence of weights $\tau$ may have a stronger decay than in Theorem 2.1, that is, we consider the case where $\tau$ satisfies $\sum_{j=1}^{m_n} \tau_j = O(1)$. This holds, for instance, if the sequence $\tau$ satisfies $\tau_j \sim j^{-(1+\varepsilon)}$ for any $\varepsilon > 0$. In this case, the asymptotic distribution changes and additional definitions have to be made. Let $\Sigma$ be the covariance matrix of the infinite dimensional centered vector $\big( U f^\tau_j(W) \big)_{j \geq 1}$. The ordered eigenvalues of $\Sigma$ are denoted by $(\lambda_j)_{j \geq 1}$. Below, we introduce a sequence $\{\chi^2_{1j}\}_{j \geq 1}$ of independent random variables that are distributed as chi-square with one degree of freedom.

Theorem 2.2.
Let Assumptions 1–4 hold true. If $m_n$ satisfies

$$\sum_{j=1}^{m_n} \tau_j = O(1) \quad \text{and} \quad m_n^{-1} = o(1) \tag{2.5}$$

then under $H_0$

$$n S_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda_j\, \chi^2_{1j}.$$

Remark 2.3 (Estimation of Critical Values).
The asymptotic results of Theorems 2.1 and 2.2 depend on unknown population quantities. As we see in the following, the critical values can be easily estimated. Let $W_m(\tau)$ denote an $n \times m$ matrix with entries $f^\tau_j(W_i)$ for $1 \leq i \leq n$ and $1 \leq j \leq m$. Moreover, let $U_n = \big( Y_1 - \varphi_0(Z_1), \dots, Y_n - \varphi_0(Z_n) \big)^t$. In the setting of Theorem 2.1, we replace $\Sigma_m$ by

$$\widehat{\Sigma}_m := n^{-1}\, W_m(\tau)^t\, \mathrm{diag}(U_n)^2\, W_m(\tau).$$

Now the asymptotic result of Theorem 2.1 continues to hold if we replace $\varsigma_{m_n}$ by the Frobenius norm of $\widehat{\Sigma}_{m_n}$ and $\mu_{m_n}$ by the trace of $\widehat{\Sigma}_{m_n}$. In the setting of Theorem 2.2, the asymptotic distribution is not pivotal and has to be approximated. First, the difference of critical values between $\sum_{j=1}^{\infty} \lambda_j \chi^2_{1j}$ and the truncated sum $\sum_{j=1}^{M_n} \lambda_j \chi^2_{1j}$ converges to zero if the integer $M_n > 0$ tends to infinity (depending on $n$). Second, replace $(\lambda_j)_{1 \leq j \leq M_n}$ by $(\widehat{\lambda}_j)_{1 \leq j \leq M_n}$, which are the ordered eigenvalues of $\widehat{\Sigma}_{M_n}$. Observe that $\max_{1 \leq j \leq M_n} |\widehat{\lambda}_j - \lambda_j| \leq \|\widehat{\Sigma}_{M_n} - \Sigma_{M_n}\| = O(M_n n^{-1/2})$ almost surely. Hence, the critical values of $\sum_{j=1}^{M_n} \widehat{\lambda}_j \chi^2_{1j}$ converge in probability to the ones of the limiting distribution of $n S_n$ if $M_n = o(\sqrt{n})$. □

Let us study the power of the test statistic $S_n$, that is, the probability of rejecting a false hypothesis, against a sequence of linear local alternatives that tends to zero as $n \to \infty$. It is shown that the power of our tests essentially relies on the choice of the weighting sequence $\tau$. Let us start with the case $\varsigma_{m_n}^{-1} = o(1)$.
We consider the following sequence of linear local alternatives

$$Y = \varphi_0(Z) + \varsigma_{m_n}^{1/2}\, n^{-1/2}\, \delta(Z) + U \tag{2.6}$$

for some function $\delta \in L^4_Z := \{\phi : E|\phi(Z)|^4 < \infty\}$. The next result establishes asymptotic normality for the standardized test statistic $S_n$. Let us denote $\delta_j := \sqrt{\tau_j}\, E[\delta(Z) f_j(W)]$.

Proposition 2.3.
Given the conditions of Theorem 2.1, it holds under (2.6) that

$$(\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S_n - \mu_{m_n} \big) \xrightarrow{d} N\Big( 2^{-1/2} \sum_{j=1}^{\infty} \delta_j^2,\; 1 \Big).$$

As we see below, the test statistic $S_n$ has power advantages if $\sum_{j=1}^{m_n} \tau_j = O(1)$. Let us consider the sequence of linear local alternatives

$$Y = \varphi_0(Z) + n^{-1/2}\, \delta(Z) + U \tag{2.7}$$

for some function $\delta \in L^4_Z$. For the next result, the sequence $\{\chi^2_{1j}(\delta_j^2/\lambda_j)\}_{j \geq 1}$ denotes independent random variables that are distributed as non-central chi-square with one degree of freedom and non-centrality parameters $\delta_j^2/\lambda_j$.

Proposition 2.4.
Given the conditions of Theorem 2.2, it holds under (2.7) that

$$n S_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda_j\, \chi^2_{1j}(\delta_j^2/\lambda_j).$$

Remark 2.4.
We see from Proposition 2.3 that our test can detect linear alternatives at a rate $\varsigma_{m_n}^{1/2} n^{-1/2}$. On the other hand, if $\sum_{j=1}^{m_n} \tau_j = O(1)$ then $S_n$ can detect local linear alternatives at the faster rate $n^{-1/2}$. Still, our test with $L = \mathrm{Id}$ can have better power against certain smooth classes of alternatives, as illustrated by Hong and White [1995] and Horowitz and Spokoiny [2001]. Indeed, the next subsection shows that additional smoothing changes the class of alternatives over which uniform consistency can be obtained. □
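The weighted chi-square limit that arises when $\sum_j \tau_j = O(1)$ has no closed-form quantiles, so in practice its critical values are usually simulated. The sketch below follows the plug-in idea of Remark 2.3 under the assumption that the residuals $Y_i - \varphi_0(Z_i)$ and a matrix of weighted basis functions $f^\tau_j(W_i)$ are already available. The function names, the number of draws, and the toy inputs are illustrative choices, not part of the paper.

```python
import numpy as np

def estimated_eigenvalues(resid, F_tau):
    """Ordered eigenvalues of n^{-1} W(tau)^t diag(resid)^2 W(tau)."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    Sigma_hat = F_tau.T @ (resid[:, None] ** 2 * F_tau) / n
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse the order
    return np.linalg.eigvalsh(Sigma_hat)[::-1]

def weighted_chi2_quantile(lam, alpha=0.05, draws=100_000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of sum_j lam_j * chi2_1j (truncated sum)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    chi2 = rng.chisquare(df=1, size=(draws, lam.size))  # independent chi-square(1) draws
    return float(np.quantile(chi2 @ lam, 1.0 - alpha))

# Toy usage: with a single unit eigenvalue the limit is a chi-square(1) variable
q = weighted_chi2_quantile([1.0])
```

One would reject the null hypothesis when $n S_n$ exceeds the simulated quantile computed from the estimated eigenvalues.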
In this subsection, we establish consistency against a fixed alternative and uniform consistency of our test over appropriate function classes. Let us first consider the case of a fixed alternative. We assume that $H_0$ does not hold, that is, $P(\varphi = \varphi_0) < 1$. The following proposition shows that our test rejects a false null hypothesis with probability approaching one as the sample size grows to infinity. The consistency properties require the following additional assumption.
Assumption 5. (i) The function $p_W/\nu$ is uniformly bounded away from zero. (ii) There exists a constant $\sigma_o > 0$ such that $E[U^2 \mid W] \geq \sigma_o^2$.

Assumption 5 (i) implies that $\|LT(\varphi - \varphi_0)\|_W > 0$ for any structural function $\varphi$ in the alternative. Further, Assumption 5 implies that $\sum_{j=1}^{m_n} \tau_j^2 = O(\varsigma_{m_n}^2)$.

Proposition 2.5.
Assume that $H_0$ does not hold. Let $E|Y - \varphi_0(Z)|^4 < \infty$ and let Assumption 5 (i) hold true. Consider a sequence $(\alpha_n)_{n \geq 1}$ satisfying $\alpha_n = o(n\, \varsigma_{m_n}^{-1})$. Under the conditions of Theorem 2.1 we have

$$P\Big( (\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S_n - \mu_{m_n} \big) > \alpha_n \Big) = 1 + o(1).$$

Under the conditions of Theorem 2.2, if $\alpha_n = o(n)$ we have

$$P\big( n S_n > \alpha_n \big) = 1 + o(1).$$
In the following, we specify a class of functions over which our test $S_n$ is uniformly consistent. This essentially implies that there are no alternative functions in this class over which our test has low power. We show that our test is consistent uniformly over the class

$$\mathcal{G}^\rho_n = \Big\{ \varphi \in L^2_Z : \|LT(\varphi - \varphi_0)\|_W^2 \geq \rho\, n^{-1} \varsigma_{m_n} \ \text{and} \ \sup_{z \in \mathcal{Z}} |(\varphi - \varphi_0)(z)| \leq C \Big\}$$

where $C > 0$ is a constant. If $H_0$ is false then $\|LT(\varphi - \varphi_0)\|_W^2 \geq \rho\, \varsigma_{m_n} n^{-1}$ for all sufficiently large $n$ and some $\rho > 0$. By Assumption 4 the sequence $\tau$ is nonincreasing with $\tau_1 = 1$ and hence $\|LT(\varphi - \varphi_0)\|_W \leq \|T(\varphi - \varphi_0)\|_W \leq \|\varphi - \varphi_0\|_Z$ by Jensen's inequality. We conclude that $\mathcal{G}^\rho_n$ contains only alternative functions whose squared $L^2_Z$-distance to the structural function $\varphi_0$ is at least $n^{-1} \varsigma_{m_n}$ up to a constant. If the coefficients $E[(\varphi - \varphi_0)(Z) f_j(W)]$ fluctuate for large $j$ then $\varphi$ does not belong to $\mathcal{G}^\rho_n$ if the decay of $\tau$ is too strong. On the other hand, if $E[(\varphi - \varphi_0)(Z) f_j(W)]$ is sufficiently small for $j$ up to a finite constant then $\varphi$ does not necessarily belong to $\mathcal{G}^\rho_n$ with $\tau$ having a slow decay. For the next result let $q_{\alpha 1}$ and $q_{\alpha 2}$ denote the $1 - \alpha$ quantile of $N(0, 1)$ and $\sum_{j=1}^{\infty} \lambda_j \chi^2_{1j}$, respectively.

Proposition 2.6.
Let Assumption 5 be satisfied. For any $\varepsilon > 0$, any $0 < \alpha < 1$, and any sufficiently large constant $\rho > 0$ we have under the conditions of Theorem 2.1 that

$$\lim_{n \to \infty} \inf_{\varphi \in \mathcal{G}^\rho_n} P\Big( (\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S_n - \mu_{m_n} \big) > q_{\alpha 1} \Big) \geq 1 - \varepsilon,$$

while under the conditions of Theorem 2.2

$$\lim_{n \to \infty} \inf_{\varphi \in \mathcal{G}^\rho_n} P\big( n S_n > q_{\alpha 2} \big) \geq 1 - \varepsilon.$$
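The pieces of this section can be combined into a short computation of the statistic (2.3) and its standardized version from Theorem 2.1. The sketch below rests on several assumptions made only for illustration: a scalar instrument supported on $[0, 1]$, a cosine basis with uniform $\nu$, and plug-in estimates of the trace and Frobenius norm in the spirit of Remark 2.3. The function name `simple_gof_test` and its arguments are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def simple_gof_test(y, z, w, phi0, m, tau=None):
    """Return (S_n, standardized statistic, one-sided normal p-value).

    tau, if given, must contain at least m weights; tau = None means no smoothing.
    """
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    n = y.size
    tau = np.ones(m) if tau is None else np.asarray(tau, dtype=float)[:m]
    # Cosine basis on [0, 1]: f_1 = 1, f_j = sqrt(2) cos((j - 1) pi w)
    F = np.sqrt(2.0) * np.cos(np.pi * w[:, None] * np.arange(m)[None, :])
    F[:, 0] = 1.0
    resid = y - phi0(z)                      # Y_i - phi_0(Z_i)
    moments = F.T @ resid / n                # n^{-1} sum_i resid_i f_j(W_i)
    S_n = float(np.sum(tau * moments ** 2))  # statistic (2.3)
    # Plug-in covariance of U f^tau_m(W), its trace and its Frobenius norm
    F_tau = F * np.sqrt(tau)[None, :]
    Sigma_hat = F_tau.T @ (resid[:, None] ** 2 * F_tau) / n
    mu_hat = np.trace(Sigma_hat)
    varsigma_hat = np.linalg.norm(Sigma_hat, "fro")
    stat = float((n * S_n - mu_hat) / (sqrt(2.0) * varsigma_hat))
    p_value = 0.5 * erfc(stat / sqrt(2.0))   # upper-tail probability of N(0, 1)
    return S_n, stat, p_value

# Toy usage with simulated data (hypothetical design)
rng = np.random.default_rng(0)
w = rng.uniform(size=1000)
z = w + 0.3 * rng.normal(size=1000)
y = np.sin(np.pi * z) + rng.normal(size=1000)
S_null, stat_null, p_null = simple_gof_test(y, z, w, lambda t: np.sin(np.pi * t), m=5)
```

The test is one-sided because the null hypothesis is rejected only for large values of $n S_n$; the choice of `m` plays the role of $m_n$ and is left to the user here.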
3. A parametric specification test
In this section, we present a test of whether the structural function $\varphi$ is known up to a finite dimensional parameter. Let $\Theta$ be a compact subset of $\mathbb{R}^k$; then we consider the null hypothesis $H_p$: there exists some $\vartheta_0 \in \Theta$ such that $\varphi(\cdot) = \phi(\cdot, \vartheta_0)$ for a known function $\phi$. The alternative hypothesis is that there exists no $\vartheta \in \Theta$ such that $\varphi(\cdot) = \phi(\cdot, \vartheta)$ holds true.

Under Assumptions 3 and 4, the null hypothesis $H_p$ is equivalent to $L(g - T\phi(\cdot, \vartheta_0)) = 0$ for some $\vartheta_0 \in \Theta$. Therefore, to verify $H_p$ we make use of the test statistic $S_n$ given in (2.3) where $\varphi_0$ is replaced by $\phi(\cdot, \widehat{\vartheta}_n)$ with $\widehat{\vartheta}_n$ being an estimator of $\vartheta_0$. Hence, our test statistic for a parametric specification is given by

$$S^p_n := \sum_{j=1}^{m_n} \tau_j\, \Big| n^{-1} \sum_{i=1}^{n} \big( Y_i - \phi(Z_i, \widehat{\vartheta}_n) \big) f_j(W_i) \Big|^2.$$

If the test statistic $S^p_n$ becomes too large then $H_p$ has to be rejected. To obtain asymptotic results for the statistic $S^p_n$ we require smoothness conditions on the function $\phi$ with respect to its second argument. Below we denote the vector of partial derivatives of $\phi$ with respect to $\vartheta = (\vartheta_1, \dots, \vartheta_k)^t$ by $\phi_\vartheta = (\phi_{\vartheta_l})_{1 \leq l \leq k}$ and the matrix of second-order partial derivatives by $\phi_{\vartheta\vartheta} = (\phi_{\vartheta_j \vartheta_l})_{1 \leq j, l \leq k}$.

Assumption 6. (i) Let $\widehat{\vartheta}_n$ be an estimator satisfying $\|\widehat{\vartheta}_n - \vartheta_0\| = O_p(n^{-1/2})$ for some $\vartheta_0 \in \mathrm{int}(\Theta)$ with $\varphi(\cdot) = \phi(\cdot, \vartheta_0)$ if $H_p$ holds true. (ii) The function $\phi$ is twice partially differentiable with respect to its second argument. There exists some constant $\eta_\phi \geq 1$ such that $\sup_{1 \leq l \leq k} E|\phi_{\vartheta_l}(Z, \vartheta_0)|^4 \leq \eta_\phi$ and $\sup_{1 \leq j, l \leq k} E \sup_{\theta \in \Theta} |\phi_{\vartheta_j \vartheta_l}(Z, \theta)|^4 \leq \eta_\phi$.

The following theorem establishes asymptotic normality of $S^p_n$ after standardization.

Theorem 3.1.
Let Assumptions 1–4 and 6 hold true. If $m_n$ satisfies (2.4), then under $H_p$

$$(\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S^p_n - \mu_{m_n} \big) \xrightarrow{d} N(0, 1).$$

In the following theorem, we state the asymptotic distribution of $n S^p_n$ when $\sum_{j=1}^{m_n} \tau_j = O(1)$. In this case, we assume that $\widehat{\vartheta}_n$ satisfies under $H_p$

$$\sqrt{n}\, (\widehat{\vartheta}_n - \vartheta_0) = n^{-1/2} \sum_{i=1}^{n} h_k(V_i) + o_p(1) \tag{3.1}$$

where $V_i := (Y_i, Z_i, W_i, \vartheta_0)$ and $h_k(V_i) = (h_1(V_i), \dots, h_k(V_i))^t$ where $h_j$, $1 \leq j \leq k$, are real valued functions. It is well known that this representation holds if $\widehat{\vartheta}_n$ is the generalized method of moments estimator. Let $\Sigma^p$ be the covariance matrix of the infinite dimensional centered vector $\big( U f^\tau_j(W) - E[f^\tau_j(W)\, \phi_\vartheta(Z, \vartheta_0)^t]\, h_k(V) \big)_{j \geq 1}$. The ordered eigenvalues of $\Sigma^p$ are denoted by $(\lambda^p_j)_{j \geq 1}$.

Theorem 3.2.
Let Assumptions 1–4 and 6 hold true. Assume that $H_p$ holds true and $\widehat{\vartheta}_n$ satisfies condition (3.1) with $E\, h_j(V) = 0$ and $E|h_j(V)|^4 < \infty$, $1 \leq j \leq k$. If $m_n$ satisfies (2.5), then

$$n S^p_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^p_j\, \chi^2_{1j}.$$

Remark 3.1 (Estimation of Critical Values).
For the estimation of critical values of Theorems 3.1 and 3.2, let us define $U^p_n = \big( Y_1 - \phi(Z_1, \widehat{\vartheta}_n), \dots, Y_n - \phi(Z_n, \widehat{\vartheta}_n) \big)^t$. We estimate the covariance matrix $\Sigma_m$ by

$$\widehat{\Sigma}_m := n^{-1}\, W_m(\tau)^t\, \mathrm{diag}(U^p_n)^2\, W_m(\tau).$$

Now the asymptotic result of Theorem 3.1 continues to hold if we replace $\varsigma_{m_n}$ by the Frobenius norm of $\widehat{\Sigma}_{m_n}$ and $\mu_{m_n}$ by the trace of $\widehat{\Sigma}_{m_n}$. In the setting of Theorem 3.2, we replace $\Sigma^p$ by a finite dimensional matrix. Let $A_k$ be an $n \times k$ matrix with entries $\phi_{\vartheta_l}(Z_i, \widehat{\vartheta}_n)$ for $1 \leq i \leq n$, $1 \leq l \leq k$, and let $h_k(V) = \big( h_k(V_1), \dots, h_k(V_n) \big)^t$. Then define $V_k := n^{-1}\, h_k(V)\, A_k^t$. Given a sufficiently large integer $M > 0$ we estimate $\Sigma^p$ by

$$\widehat{\Sigma}^p_M := n^{-1}\, W_M(\tau)^t \big( \mathrm{diag}(U^p_n) - V_k \big)^t \big( \mathrm{diag}(U^p_n) - V_k \big) W_M(\tau).$$

Hence, we approximate $\sum_{j=1}^{\infty} \lambda^p_j \chi^2_{1j}$ by the finite sum $\sum_{j=1}^{M_n} \widehat{\lambda}^p_j \chi^2_{1j}$ where $(\widehat{\lambda}^p_j)_{1 \leq j \leq M_n}$ are the ordered eigenvalues of $\widehat{\Sigma}^p_{M_n}$. We have $\max_{1 \leq j \leq M_n} |\widehat{\lambda}^p_j - \lambda^p_j| = o_p(1)$ if $M_n = o(\sqrt{n})$. □

3.2. Limiting behavior under local alternatives and consistency.
In the following, we study the power and consistency properties of the test statistic $S^p_n$. We consider a sequence of linear local alternatives (2.6) or (2.7) with $\varphi_0 = \phi(\cdot, \vartheta_0)$. Further, let $\delta_\perp$ denote the projection of $\delta$ onto the orthogonal complement of $\phi(\cdot, \vartheta_0)$; that is, $E[\phi_\vartheta(Z, \vartheta_0)\, \delta_\perp(Z)] = 0$. Let us denote $\delta_{j\perp} := \sqrt{\tau_j}\, E[\delta_\perp(Z) f_j(W)]$.

Proposition 3.3.
Let the conditions of Theorem 3.1 be satisfied. Then under (2.6) with $\varphi_0 = \phi(\cdot, \vartheta_0)$ it holds that

$$(\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S^p_n - \mu_{m_n} \big) \xrightarrow{d} N\Big( 2^{-1/2} \sum_{j=1}^{\infty} \delta_{j\perp}^2,\; 1 \Big).$$

Let the conditions of Theorem 3.2 be satisfied. Then under (2.7) with $\varphi_0 = \phi(\cdot, \vartheta_0)$ it holds that

$$n S^p_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^p_j\, \chi^2_{1j}(\delta_{j\perp}^2/\lambda^p_j).$$

Remark 3.2.
Under homoscedasticity, that is, $E[U^2 \mid W] = \sigma_o^2$, with $W \sim U[0, 1]$ and $L = \mathrm{Id}$, we see from Proposition 3.3 that our test has the same power properties as the test of Hong and White [1995]. On the other hand, if $\sum_{j=1}^{m_n} \tau_j = O(1)$ then our test can detect local linear alternatives at a rate $n^{-1/2}$ as in Horowitz [2006], which decreases more quickly than the rate obtained by Tripathi and Kitamura [2003]. □

The next proposition establishes consistency of our test against a fixed alternative model. It is assumed that $H_p$ is false, that is, there exists no $\vartheta \in \Theta$ such that $\varphi(\cdot) = \phi(\cdot, \vartheta)$. In this situation, $\vartheta_0$ denotes the probability limit of the estimator $\widehat{\vartheta}_n$.

Proposition 3.4.
Assume that $H_p$ does not hold. Let $E|Y - \phi(Z, \vartheta_0)|^4 < \infty$ and let Assumption 5 (i) hold true. Let $(\alpha_n)_{n \geq 1}$ be as in Proposition 2.5. Under the conditions of Theorem 3.1 we have

$$P\Big( (\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S^p_n - \mu_{m_n} \big) > \alpha_n \Big) = 1 + o(1).$$

Given the conditions of Theorem 3.2 it holds that

$$P\big( n S^p_n > \alpha_n \big) = 1 + o(1).$$

In the following, we show that $S^p_n$ is consistent uniformly over the function class

$$\mathcal{H}^\rho_n = \Big\{ \varphi \in L^2_Z : \|LT(\varphi - \phi(\cdot, \vartheta_0))\|_W^2 \geq \rho\, n^{-1} \varsigma_{m_n} \ \text{and} \ \sup_{z \in \mathcal{Z}} |\varphi(z) - \phi(z, \vartheta_0)| \leq C \Big\}$$

for some constant $C > 0$, where $\vartheta_0$ denotes the probability limit of $\widehat{\vartheta}_n$. Similarly to the previous section, it can be seen that $\mathcal{H}^\rho_n$ only contains functions whose squared $L^2_Z$ distance to $\phi(\cdot, \vartheta_0)$ is at least $n^{-1} \varsigma_{m_n}$ up to a constant. For the next result let $q_{\alpha 1}$ and $q_{\alpha 2}$ denote the $1 - \alpha$ quantile of $N(0, 1)$ and $\sum_{j=1}^{\infty} \lambda^p_j \chi^2_{1j}$, respectively.

Proposition 3.5.
Let Assumption 5 be satisfied. For any $\varepsilon > 0$, any $0 < \alpha < 1$, and any sufficiently large constant $\rho > 0$ we have under the conditions of Theorem 3.1 that

$$\lim_{n \to \infty} \inf_{\varphi \in \mathcal{H}^\rho_n} P\Big( (\sqrt{2}\, \varsigma_{m_n})^{-1} \big( n S^p_n - \mu_{m_n} \big) > q_{\alpha 1} \Big) \geq 1 - \varepsilon,$$

whereas under the conditions of Theorem 3.2 it holds that

$$\lim_{n \to \infty} \inf_{\varphi \in \mathcal{H}^\rho_n} P\big( n S^p_n > q_{\alpha 2} \big) \geq 1 - \varepsilon.$$

4. A nonparametric test of exogeneity

Endogeneity of regressors is a common problem in econometric applications. Falsely assuming exogeneity of the regressors leads to inconsistent estimators. On the other hand, treating exogenous regressors as if they were endogenous can lower the accuracy of estimation dramatically. In this section, we propose a test of whether the vector of regressors $Z$ is exogenous, that is, $E[U \mid Z] = 0$ or equivalently $\varphi(Z) = E[Y \mid Z]$. Throughout this section, let $\varphi_0(Z) = E[Y \mid Z]$; then the hypothesis under consideration is given by $H_e: \varphi = \varphi_0$. The alternative hypothesis is that $\varphi \neq \varphi_0$.

To establish a test of exogeneity, let us first introduce an estimator of the conditional mean of $Y$ given $Z$. This estimator is based on a sequence of approximating functions $\{e_j\}_{j \geq 1}$ belonging to $L^2_Z$. Further, let $Z_k$ denote an $n \times k$ matrix with entries $e_j(Z_i)$ for $1 \leq i \leq n$ and $1 \leq j \leq k$. Moreover, let $Y_n = (Y_1, \dots, Y_n)^t$. Then we define the estimator

$$\varphi_k(\cdot) := e_k(\cdot)^t\, \widehat{\beta}_k \quad \text{where} \quad \widehat{\beta}_k = (Z_k^t Z_k)^{-} Z_k^t Y_n. \tag{4.1}$$

In contrast to the parametric case we need to allow for $k$ tending to infinity as $n \to \infty$ in order to ensure consistency of the estimator $\varphi_k$. Under conditions given below, $Z_{k_n}^t Z_{k_n}$ will be nonsingular with probability approaching one and hence its generalized inverse will be the standard inverse. Note that the asymptotic behavior of the estimator $\varphi_k$ was studied, for example, by Newey [1997].

Under Assumptions 3 and 4, the null hypothesis $H_e$ is equivalent to $L(g - T\varphi_0) = 0$.
Consequently, our test of exogeneity of $Z$ is based on the goodness-of-fit statistic $S_n$ introduced in (2.3) but where $\varphi_0$ is replaced by the series estimator $\varphi_{k_n}$. The proposed test statistic for $H_e$ is now given by
$$S^e_n = \sum_{j=1}^{m_n} \tau_j \Big| n^{-1} \sum_{i=1}^{n} \big( Y_i - \varphi_{k_n}(Z_i) \big) f_j(W_i) \Big|^2$$
where $k_n$ and $m_n$ tend to infinity as $n \to \infty$. The hypothesis of exogeneity of $Z$ has to be rejected if $S^e_n$ becomes too large.

For controlling the bias of the estimator $\varphi_{k_n}$ we specify in the following a rate of approximation (cf. Newey [1997]). Let $\gamma = (\gamma_j)_{j \geq 1}$ be a nondecreasing sequence with $\gamma_1 = 1$. We assume that $\varphi_0$ belongs to
$$\mathcal{F}_\gamma := \Big\{ \phi \in L^2_Z : \sup_{z \in \mathcal{Z}} \big| \phi(z) - e_{\underline{k_n}}(z)^t \beta_{k_n} \big|^2 = O(\gamma_{k_n}^{-1}) \text{ for some } \beta_{k_n} \in \mathbb{R}^{k_n} \Big\}.$$
Here, the sequence of weights $\gamma$ measures the approximation error of $\varphi_0$ with respect to the functions $\{e_j\}_{j \geq 1}$.

Assumption 7. (i) Let $\varphi_0 \in \mathcal{F}_\gamma$ with nondecreasing sequence $\gamma$ satisfying $j = o(\gamma_j)$. (ii) There exists some constant $\eta_e \geq 1$ such that $\sup_{z \in \mathcal{Z}} \| e_{\underline{k_n}}(z) \| \leq \eta_e k_n$. (iii) The smallest eigenvalue of $E[e_{\underline{k}}(Z) e_{\underline{k}}(Z)^t]$ is bounded away from zero uniformly in $k$. (iv) $E[U|Z]$ is bounded.

Assumption 7 (i) determines the required asymptotic behavior of the rate $\gamma$. For splines and power series this assumption is satisfied if the number of continuous derivatives of $\varphi_0$ Z equals two. Assumption 7 (ii) and (iii) restrict the magnitude of the approximating functions $\{e_j\}_{j \geq 1}$ and impose nonsingularity of their second moment matrix.

We are now in the position to prove the following asymptotic result for the standardized test statistic $S^e_n$. Here, a key requirement is that $k_n = o(\varsigma_{m_n})$, implying that $k_n = o(\sum_{j=1}^{m_n} \tau_j)$ and, in particular, $k_n = o(m_n)$ if the smoothing operator $L$ is the identity.

Theorem 4.1.
Let Assumptions 1–4 and 7 be satisfied. If
$$n = o(\gamma_{k_n} \varsigma_{m_n}), \quad k_n = o(\varsigma_{m_n}), \quad\text{and}\quad \Big( \sum_{j=1}^{m_n} \tau_j \Big)^2 = o(n) \tag{4.2}$$
then under $H_e$ it holds
$$(\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^e_n - \mu_{m_n} \big) \xrightarrow{d} N(0, 1).$$

Example 4.1.
Let $Z$ be continuously distributed with $\dim(Z) = r$ and set $L = \mathrm{Id}$. Consider the polynomial case where $\gamma_j \sim j^{2p/r}$ with $p >$ and let $m_n \sim n^{\nu}$ with $0 < \nu < 1/2$. Let Assumption 5 hold true; then $\sqrt{m_n} = O(\varsigma_{m_n})$. Hence, condition (4.2) is satisfied if $k_n \sim n^{\kappa}$ with
$$r(1 - \nu/2)/(2p) < \kappa < \nu/2. \tag{4.3}$$
This ensures that the bias of this estimator in the statistic $S^e_n$ is asymptotically negligible. Note that condition (4.3) requires $p > r(2/\nu - 1)/2$. Hence, with a larger dimension $r$ also the smoothness of $\varphi_0$ has to increase, reflecting the curse of dimensionality. $\square$

The next result states an asymptotic distribution result for the statistic $S^e_n$ if $\sum_{j=1}^{m_n} \tau_j = O(1)$. Let $\Sigma^e$ be the covariance matrix of the infinite dimensional centered vector $\big( U ( f^{\tau}_j(W) - \sum_{l \geq 1} E[f^{\tau}_j(W) e_l(Z)]\, e_l(Z) ) \big)_{j \geq 1}$. The ordered eigenvalues of $\Sigma^e$ are denoted by $(\lambda^e_j)_{j \geq 1}$.

Theorem 4.2.
Let Assumptions 1–4 and 7 be satisfied. If
$$\sum_{j=1}^{m_n} \tau_j = O(1), \quad n = O(\gamma_{k_n}), \quad k_n^3 = o(n), \quad\text{and}\quad m_n^{-1} = o(1) \tag{4.4}$$
then under $H_e$ it holds
$$n S^e_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^e_j \chi^2_{1j}.$$

Example 4.2.
Consider the setting of Example 4.1 but where the eigenvalues of $L$ satisfy $\tau_j \sim j^{-}$. Condition (4.4) is satisfied if $m_n \sim n^{\nu}$ for some $\nu > 0$ and $k_n \sim n^{\kappa}$ with $r/(2p) < \kappa < 1/3$. Here, the required smoothness of $\varphi_0$ is $p > 3r/2$. In contrast to the setting of Theorem 4.1, the estimator of $\varphi_0$ needs to be undersmoothed. This ensures that the bias of this estimator in the statistic $S^e_n$ is asymptotically negligible. $\square$

Remark 4.1.
In contrast to Blundell and Horowitz [2007], no smoothness assumptions on the joint distribution of $(Z, W)$ are required here. In addition, we do not need any assumption that links the smoothness of the regression function $\varphi_0$ to the smoothness of the joint density of $(Z, W)$. $\square$

Remark 4.2 (Estimation of Critical Values). For the estimation of critical values of Theorems 4.1 and 4.2, let us define $\mathbf{U}^e_n = \big( Y_1 - \varphi_{k_n}(Z_1), \dots, Y_n - \varphi_{k_n}(Z_n) \big)^t$. For any $m \geq 1$ we estimate the covariance matrix $\Sigma_m$ by
$$\widehat{\Sigma}_m := n^{-1}\, \mathbf{W}_m(\tau)^t \operatorname{diag}(\mathbf{U}^e_n)^2\, \mathbf{W}_m(\tau).$$
Now the asymptotic result of Theorem 4.1 continues to hold if we replace $\varsigma_{m_n}$ by the Frobenius norm of $\widehat{\Sigma}_{m_n}$ and $\mu_{m_n}$ by the trace of $\widehat{\Sigma}_{m_n}$. This consistency is shown in Lemma 4.3. In the setting of Theorem 4.2, we replace $\Sigma^e$ by a finite dimensional matrix
$$\widehat{\Sigma}^e_M := n^{-1}\, \mathbf{W}_M(\tau)^t \big( I_n - n^{-1} \mathbf{Z}_{k_n} \mathbf{Z}_{k_n}^t \big) \operatorname{diag}(\mathbf{U}^e_n)^2 \big( I_n - n^{-1} \mathbf{Z}_{k_n} \mathbf{Z}_{k_n}^t \big) \mathbf{W}_M(\tau)$$
where $M > 0$ is a sufficiently large integer. Let $(\widehat{\lambda}^e_j)_{1 \leq j \leq M_n}$ denote the ordered eigenvalues of $\widehat{\Sigma}^e_{M_n}$. Hence, we approximate $\sum_{j=1}^{\infty} \lambda^e_j \chi^2_{1j}$ by the finite sum $\sum_{j=1}^{M_n} \widehat{\lambda}^e_j \chi^2_{1j}$ where $\max_{1 \leq j \leq M_n} | \widehat{\lambda}^e_j - \lambda^e_j | = o_p(1)$ if $M_n = o(\sqrt{n})$. $\square$

Lemma 4.3.
Consider $\widehat{\Sigma}_{m_n}$ as defined in Remark 4.2. Under the conditions of Theorem 4.1 or Theorem 4.2, the difference of its Frobenius norm to $\varsigma_{m_n}$ and of its trace to $\mu_{m_n}$ converge in probability to zero.

Similar to the previous sections, we study the power and consistency properties of our test. We first consider the power of our test of exogeneity under linear local alternatives (2.6) or (2.7). In these cases, it holds $E[U|W] = 0$ but $E[U|Z] = -\varsigma_{m_n}^{1/2} n^{-1/2} \delta(Z)$ under (2.6) or $E[U|Z] = -n^{-1/2} \delta(Z)$ under (2.7).

Proposition 4.4.
Given the conditions of Theorem 4.1 and Assumption 5 (ii) it holds under (2.6)
$$(\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^e_n - \mu_{m_n} \big) \xrightarrow{d} N\Big( 2^{-1/2} \sum_{j=1}^{\infty} \delta_j^2,\ 1 \Big).$$
Given the conditions of Theorem 4.2 it holds under (2.7)
$$n S^e_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^e_j \chi^2_{1j}(\delta_j / \lambda^e_j).$$

Let us now establish consistency of our tests when $H_e$ does not hold, that is, $P(\varphi = \varphi_0) < 1$.

Proposition 4.5.
Assume that $H_e$ does not hold. Let $E|Y - \varphi_0(Z)| < \infty$ and Assumption 5 (i) hold true. Let $(\alpha_n)_{n \geq 1}$ be as in Proposition 2.5. Under the conditions of Theorem 4.1 we have
$$P\Big( (\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^e_n - \mu_{m_n} \big) > \alpha_n \Big) = 1 + o(1),$$
whereas in the setting of Theorem 4.2
$$P\big( n S^e_n > \alpha_n \big) = 1 + o(1).$$
In the following we show that our tests are consistent uniformly over the function class
$$\mathcal{I}^{\rho}_n = \Big\{ \varphi \in L^2_Z : \| LT(\varphi - \varphi_0) \|^2_W \geq \rho\, n^{-1} \varsigma_{m_n} \text{ and } \sup_{z \in \mathcal{Z}} | (\varphi - \varphi_0)(z) | \leq C \Big\}$$
for some constant $C > 0$. For the next result let $q_{1\alpha}$ and $q_{2\alpha}$ denote the $1 - \alpha$ quantile of $N(0, 1)$ and $\sum_{j=1}^{\infty} \lambda^e_j \chi^2_{1j}$, respectively.

Proposition 4.6.
Let Assumption 5 be satisfied. Under the conditions of Theorem 4.1 we have for any $\varepsilon > 0$, any $0 < \alpha < 1$, and any sufficiently large constant $\rho > 0$ that
$$\lim_{n\to\infty} \inf_{\varphi \in \mathcal{I}^{\rho}_n} P\Big( (\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^e_n - \mu_{m_n} \big) > q_{1\alpha} \Big) \geq 1 - \varepsilon,$$
whereas under the conditions of Theorem 4.2 it holds
$$\lim_{n\to\infty} \inf_{\varphi \in \mathcal{I}^{\rho}_n} P\big( n S^e_n > q_{2\alpha} \big) \geq 1 - \varepsilon.$$
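As a rough numerical illustration of the exogeneity statistic $S^e_n$ of Section 4, the following Python sketch fits the least-squares series estimator (4.1) and then evaluates the weighted sum of squared empirical moments. The power basis used for $\{e_j\}$, the function names, and the weights passed as `tau` are assumptions made for illustration only; the cosine functions $f_j(t) = \sqrt{2}\cos(\pi j t)$ are likewise an assumed choice of instrument basis.

```python
import numpy as np

def cosine_basis(w, m):
    """Assumed instrument basis f_j(w) = sqrt(2) cos(pi j w), j = 1..m, as an (n, m) matrix."""
    j = np.arange(1, m + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(w, j))

def power_basis(z, k):
    """Hypothetical approximating functions e_1..e_k: 1, z, ..., z^(k-1)."""
    return np.vander(z, N=k, increasing=True)

def exogeneity_statistic(y, z, w, k, m, tau):
    """Sketch of S^e_n = sum_j tau_j | n^{-1} sum_i (Y_i - phi_k(Z_i)) f_j(W_i) |^2."""
    Zk = power_basis(z, k)
    # Least-squares coefficients; lstsq returns the minimum-norm solution,
    # which plays the role of the generalized inverse in (4.1).
    beta, *_ = np.linalg.lstsq(Zk, y, rcond=None)
    resid = y - Zk @ beta
    moments = cosine_basis(w, m).T @ resid / len(y)
    return float(np.sum(tau * moments ** 2))
```

Under these assumptions, a call such as `exogeneity_statistic(y, z, w, k=4, m=20, tau=np.ones(20))` returns the unstandardized statistic; centering and scaling as in Theorem 4.1 would still require estimates of $\mu_{m_n}$ and $\varsigma_{m_n}$.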
5. A nonparametric specification test
A solution to the linear operator equation (2.1) only exists if $g$ belongs to the range of $T$. This might be violated if, for instance, the instrument is not valid, that is, $E[U|W] \neq 0$. In many economic applications a priori smoothness restrictions on the unknown function can be justified, which we capture by a set of functions $\mathcal{F}$. We consider the hypothesis $H_{np}$: there exists a solution $\varphi \in \mathcal{F}$ to (2.1). The alternative hypothesis is that there exists a solution to (2.1) which does not belong to $\mathcal{F}$. Under the alternative only unsmooth functions solve the conditional moment restriction, which can be interpreted as a failure of validity of the instrument $W$. We see in this section that our results also allow for a test of dimension reduction of the vector of regressors $Z$, that is, whether some regressors can be omitted from the structural function $\varphi$.

The nonparametric estimator.
In the following, we derive an estimator of $\varphi$ under the null hypothesis $H_{np}$. For simplicity, assume that $\mathcal{Z} = \mathcal{W}$ and consider a sequence $\{e_j\}_{j \geq 1}$ of approximating functions which are orthonormal on $\mathcal{Z}$ with respect to the Lebesgue measure $\nu$. Under conditions given below, $\varphi$ has the expansion $\varphi(\cdot) = \sum_{l=1}^{\infty} \int \varphi(z) e_l(z) \nu(z)\, dz\; e_l(\cdot)$. Thereby, the conditional moment restriction under $H_{np}$ leads to the following unconditional moment restrictions
$$E[Y e_j(W)] = \sum_{l=1}^{\infty} E[e_j(W) e_l(Z)] \int \varphi(z) e_l(z) \nu(z)\, dz \tag{5.1}$$
for $j \geq 1$. This motivates the following orthogonal series type estimator. Let $\mathbf{Z}_k$ and $\mathbf{Y}_n$ be as in the previous section and let $\mathbf{X}_k$ denote an $n \times k$ matrix with entries $e_j(W_i)$ for $1 \leq i \leq n$ and $1 \leq j \leq k$. Then for each $k \geq 1$ let
$$\widehat{\varphi}_k(\cdot) := e_{\underline{k}}(\cdot)^t \widehat{\beta}_k \quad\text{where}\quad \widehat{\beta}_k = (\mathbf{X}_k^t \mathbf{Z}_k)^{-} \mathbf{X}_k^t \mathbf{Y}_n. \tag{5.2}$$
Under conditions given below $\mathbf{X}_{k_n}^t \mathbf{Z}_{k_n}$ will be nonsingular with probability approaching one and hence its generalized inverse will be the standard inverse. The nonparametric estimator $\widehat{\varphi}_k$ given in (5.2) was studied by Johannes and Schwarz [2010], Horowitz [2011], and Horowitz [2012].

Additional assumptions. In the following, we specify a priori smoothness assumptions captured by the set $\mathcal{F}$. As noted by Horowitz [2012], uniformly consistent testing of $H_{np}$ is only possible if the null is restricted such that any solution to (2.1) is smooth. Here, we assume that under the null hypothesis $\varphi$ belongs to the ellipsoid $\mathcal{F} := \mathcal{F}^{\rho}_{\gamma} := \big\{ \phi \in L^2_Z : \sum_{j=1}^{\infty} \gamma_j E[\phi(Z) e_j(Z)]^2 \leq \rho \big\}$. As in the previous section, $\gamma = (\gamma_j)_{j \geq 1}$ measures the approximation error of $\varphi$ with respect to the basis $\{e_j\}_{j \geq 1}$.

Further, as usual in the context of nonparametric instrumental regression, we specify some mapping properties of the conditional expectation operator $T$. Denote by $\mathcal{T}$ the set of all nonsingular operators mapping $L^2_Z$ to $L^2_W$. Given a sequence of weights $\upsilon := (\upsilon_j)_{j \geq 1}$ and $d \geq 1$ we define the subset $\mathcal{T}^{\upsilon}_d$ of $\mathcal{T}$ by
$$\mathcal{T}^{\upsilon}_d := \Big\{ T \in \mathcal{T} : \int |(T\phi)(w)|^2 \nu(w)\, dw \leq d \sum_{j=1}^{\infty} \upsilon_j \Big( \int \phi(z) e_j(z) \nu(z)\, dz \Big)^2 \text{ for all } \phi \in L^2_Z \Big\}.$$
If $p_Z/\nu$ is bounded from above and $p_W/\nu$ is uniformly bounded away from zero then the conditional expectation operator $T$ belongs to $\mathcal{T}^{\upsilon}_d$ with $\upsilon_j = 1$, $j \geq 1$, due to Jensen's inequality. Notice that for all $T \in \mathcal{T}^{\upsilon}_d$ it follows that $\| T e_j \|^2_W \leq d\, \eta_p\, \upsilon_j$ and thereby, the condition $T \in \mathcal{T}^{\upsilon}_d$ links the operator $T$ to the basis $\{e_j\}_{j \geq 1}$. In the following, we denote $[T]_{\underline{k}} = E[e_{\underline{k}}(W) e_{\underline{k}}(Z)^t]$, which is assumed to be a nonsingular matrix. In what follows, we introduce a stronger condition on the basis $\{e_l\}_{l \geq 1}$. We denote by $\mathcal{T}^{\upsilon}_{d,D}$ for some $D \geq d$ the subset of $\mathcal{T}^{\upsilon}_d$ given by
$$\mathcal{T}^{\upsilon}_{d,D} := \Big\{ T \in \mathcal{T}^{\upsilon}_d : [T]_{\underline{k}} \text{ is nonsingular and } \sup_{k \geq 1} \big\| \operatorname{diag}(\upsilon_1, \dots, \upsilon_k)^{1/2} [T]^{-1}_{\underline{k}} \big\| \leq D \Big\}.$$
The class $\mathcal{T}^{\upsilon}_{d,D}$ only contains operators $T$ whose off-diagonal elements of $[T]^{-1}_{\underline{k}}$ are sufficiently small for all $k \geq 1$. A similar diagonality restriction has been used by Hall and Horowitz [2005] or Breunig and Johannes [2011]. Besides the mapping properties for the operator $T$ we need a stronger assumption for the basis under consideration. The following condition gathers conditions on the sequences $\gamma$ and $\upsilon$.

Assumption 8. (i) Under $H_{np}$, let $\varphi \in \mathcal{F}^{\rho}_{\gamma}$ with nondecreasing sequence $\gamma$ satisfying $j = o(\gamma_j)$. (ii) The sequence $\{e_j\}_{j \geq 1}$ is an orthogonal basis on $\mathcal{Z} = \mathcal{W}$ with respect to $\nu$. (iii) There exists some constant $\eta_e \geq 1$ such that $\sup_{j \geq 1} \sup_{z \in \mathcal{Z}} |e_j(z)| \leq \eta_e$. (iv) Let $T \in \mathcal{T}^{\upsilon}_{d,D}$ with $\upsilon$ being a strictly positive sequence such that $\upsilon$ and $(\upsilon_j/\tau_j)_{j \geq 1}$ are nonincreasing. (v) $p_Z/\nu$ is bounded from above and $p_W/\nu$ is uniformly bounded away from zero.

Due to Assumption 8 (iv) the degree of additional smoothing for our testing procedure must not be stronger than the degree of ill-posedness implied by the conditional expectation operator $T$. Under similar assumptions as above, Johannes and Schwarz [2010] show that the mean integrated squared error loss of $\widehat{\varphi}_{k_n}$ attains the optimal rate of convergence $R_n := \max\big( \gamma_{k_n}^{-1}, \sum_{j=1}^{k_n} (n \upsilon_j)^{-1} \big)$. Due to Assumption 8 (v) we do not require orthonormal bases with respect to the unknown distribution of $(Z, W)$ (cf. Remark 3.2 of Breunig and Johannes [2011]).
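To make the series estimator (5.2) concrete, here is a minimal Python sketch under stated assumptions: the cosine system below is a hypothetical orthonormal choice for $\{e_j\}$ on $[0, 1]$, the function names are illustrative, and the Moore-Penrose pseudoinverse stands in for the generalized inverse.

```python
import numpy as np

def cosine_system(x, k):
    """Hypothetical orthonormal system on [0, 1]: 1, sqrt(2) cos(pi j x), j = 1..k-1."""
    j = np.arange(k)
    basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))
    basis[:, 0] = 1.0  # the constant function replaces sqrt(2) cos(0)
    return basis

def series_iv_coefficients(y, z, w, k):
    """beta_hat = (X_k' Z_k)^- X_k' Y_n, with the pseudoinverse as generalized inverse."""
    Zk = cosine_system(z, k)  # entries e_j(Z_i)
    Xk = cosine_system(w, k)  # entries e_j(W_i)
    return np.linalg.pinv(Xk.T @ Zk) @ (Xk.T @ y)

def series_iv_fit(y, z, w, k):
    """Return a function evaluating the fitted phi_hat_k at new points."""
    beta = series_iv_coefficients(y, z, w, k)
    return lambda z_new: cosine_system(np.atleast_1d(z_new), k) @ beta
```

When $\mathbf{X}_k^t \mathbf{Z}_k$ is nonsingular the pseudoinverse coincides with the ordinary inverse, so in that case the sketch reduces to solving the $k$ empirical moment equations exactly.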
As in the previous sections, our test is based on the observation that the null hypothesis $H_{np}$ is equivalent to $L(g - T\varphi) = 0$. Our goodness-of-fit statistic for testing nonparametric specifications is given by $S_n$ where $\varphi_0$ is replaced by the nonparametric estimator $\widehat{\varphi}_{k_n}$ given in (5.2), that is,
$$S^{np}_n := \sum_{j=1}^{m_n} \tau_j \Big| n^{-1} \sum_{i=1}^{n} \big( Y_i - \widehat{\varphi}_{k_n}(Z_i) \big) f_j(W_i) \Big|^2.$$
If $S^{np}_n$ becomes too large then there exists no function in $\mathcal{F}^{\rho}_{\gamma}$ solving (2.1). The next result establishes asymptotic normality of $S^{np}_n$ after standardization. Again, a key requirement to obtain this asymptotic distribution is that $k_n = o(\varsigma_{m_n})$, implying that $k_n = o(m_n)$ if the smoothing operator $L$ is the identity. This corresponds to the test of overidentification in the parametric framework where more orthogonality restrictions than parameters are required.

Theorem 5.1.
Let Assumptions 1–4 and 8 be satisfied. If
$$n \upsilon_{k_n} = o(\gamma_{k_n} \varsigma_{m_n}), \quad k_n = o(\varsigma_{m_n}), \quad k_n \Big( \sum_{j=1}^{m_n} \tau_j \Big) = O(n \upsilon_{k_n}), \quad\text{and}\quad \Big( \sum_{j=1}^{m_n} \tau_j \Big)^2 = o(n) \tag{5.3}$$
then it holds under $H_{np}$
$$(\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^{np}_n - \mu_{m_n} \big) \xrightarrow{d} N(0, 1).$$

Example 5.1.
Consider the setting of Example 4.1. In the mildly ill-posed case where $\upsilon_j \sim j^{-2a/r}$ for some $a \geq 0$, condition (5.3) holds true if $k_n \sim n^{\kappa}$ with $\kappa < \nu/2$ and
$$r(1 - \nu/2)/(2a + 2p) < \kappa < r(1 - \nu)/(2a + r).$$
In the severely ill-posed case, that is, $\upsilon_j \sim \exp(-j^{2a/r})$ for some $a > 0$, condition (5.3) is satisfied if, for example, $m_n$ satisfies $m_n = o(k_n^{p})$ and $k_n = o(\sqrt{m_n})$ where $k_n \sim \big( \log n - \log(m_n^{1/2}) \big)^{r/(2a)}$. $\square$

The next result states an asymptotic distribution of our test if $\sum_{j=1}^{m_n} \tau_j = O(1)$. Let $\Sigma^{np}$ be the covariance matrix of the infinite dimensional centered vector $\big( U ( f^{\tau}_j(W) - e^{\tau}_j(W) ) \big)_{j \geq 1}$. The ordered eigenvalues of $\Sigma^{np}$ are denoted by $(\lambda^{np}_j)_{j \geq 1}$.

Theorem 5.2.
Let Assumptions 1–4 and 8 be satisfied. If
$$\sum_{j=1}^{m_n} \tau_j = O(1), \quad n \upsilon_{k_n} = o(\gamma_{k_n}), \quad k_n^3 = o(n \upsilon_{k_n}), \quad\text{and}\quad m_n^{-1} = o(1) \tag{5.4}$$
then it holds under $H_{np}$
$$n S^{np}_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^{np}_j \chi^2_{1j}.$$

Example 5.2.
Consider the setting of Example 4.2. In the mildly ill-posed case, that is, $\upsilon_j \sim j^{-2a/r}$ for some $a \geq 0$, condition (5.4) is satisfied if $m_n \sim n^{\nu}$ for some $\nu > 0$ and $k_n \sim n^{\kappa}$ with
$$r/(2a + 2p) < \kappa < r/(2a + 3r).$$
In the severely ill-posed case, that is, $\upsilon_j \sim \exp(-j^{2a/r})$ for some $a > 0$, condition (5.4) is satisfied if $k_n \sim \big( \log(n^{\varepsilon}) \big)^{r/(2a)}$ for any $\varepsilon >$ . In contrast to Theorem 5.1, we require undersmoothing of the estimator $\widehat{\varphi}_{k_n}$. $\square$

Remark 5.1. If the basis $\{e_j\}_{j \geq 1}$ coincides with $\{f_j\}_{j \geq 1}$ then $n S^{np}_n$ is asymptotically degenerate. To avoid this degeneracy problem we choose different basis functions and hence, sample splitting as used by Horowitz [2012] is not necessary here. $\square$

Remark 5.2.
Let $Z'$ be a vector containing only entries of $Z$ with $\dim(Z') < \dim(Z)$. It is easy to generalize our previous result for a test of $H'_{np}$: there exists a solution $\varphi \in \mathcal{F}^{\rho}_{\gamma}$ to (2.1) only depending on $Z'$. To be more precise, consider the test statistic
$$S'^{\,np}_n := \Big\| n^{-1} \sum_{i=1}^{n} \big( Y_i - \widehat{\varphi}_{k_n}(Z'_i) \big) f^{\tau}_{\underline{m_n}}(W_i) \Big\|^2$$
where $\widehat{\varphi}_{k_n}$ is the estimator (5.2) based on an iid. sample $(Y_1, Z'_1, W_1), \dots, (Y_n, Z'_n, W_n)$ of $(Y, Z', W)$. Under $H'_{np}$ we consider the conditional expectation operator $T' : L^2_{Z'} \to L^2_W$ with $T'\phi := E[\phi(Z')|W]$. It is interesting to note that if $T$ is nonsingular then also $T'$ is. Hence, for a test of $H'_{np}$ we may replace Assumption 3 by the weaker condition that $T'$ is nonsingular. Moreover, under $H'_{np}$ the results of Theorems 5.1 and 5.2 still hold true if we replace $Z$ by $Z'$. $\square$

In the mildly ill-posed case, the estimation precision suffers from the curse of dimensionality. Hence, by the test of dimension reduction of $Z$ we can increase the accuracy of estimation of $\varphi$. On the other hand, in the severely ill-posed case the rate of convergence is independent of the dimension of $Z$ (cf. Chen and Reiß [2011]). As the next example illustrates, a dimension reduction test can also weaken the required restrictions on the instrument to obtain identification of $\varphi$ in the restricted model.

Example 5.3.
Let $Z = (Z^{(1)}, Z^{(2)})$ where both $Z^{(1)}$ and $Z^{(2)}$ are endogenous vectors of regressors. But only $Z^{(1)}$ satisfies a sufficiently strong relationship with the instrument $W$ in the sense that for all $\phi \in L^2_{Z^{(1)}}$ condition $E[\phi(Z^{(1)})|W] = 0$ implies $\phi = 0$. In this example, we do not assume that this completeness condition is fulfilled for the joint distribution of $(Z^{(2)}, W)$. Thereby only the operator $T^{(1)} : L^2_{Z^{(1)}} \to L^2_W$ with $T^{(1)}\phi := E[\phi(Z^{(1)})|W]$ is nonsingular but $T$ is singular. If our dimension reduction test of $Z$ indicates that $Z^{(2)}$ can be omitted from the structural function $\varphi$ then we obtain identification in the restricted model. $\square$

Remark 5.3 (Estimation of Critical Values). For the estimation of critical values of Theorems 5.1 and 5.2, let us define $\mathbf{U}^{np}_n = \big( Y_1 - \widehat{\varphi}_{k_n}(Z_1), \dots, Y_n - \widehat{\varphi}_{k_n}(Z_n) \big)^t$. For all $m \geq 1$, we estimate the covariance matrix $\Sigma_m$ by
$$\widehat{\Sigma}_m := n^{-1}\, \mathbf{W}_m(\tau)^t \operatorname{diag}(\mathbf{U}^{np}_n)^2\, \mathbf{W}_m(\tau).$$
Now the asymptotic result of Theorem 5.1 continues to hold if we replace $\varsigma_{m_n}$ by the Frobenius norm of $\widehat{\Sigma}_{m_n}$ and $\mu_{m_n}$ by the trace of $\widehat{\Sigma}_{m_n}$ (this is easily seen from the proof of Lemma 4.3 assuming that $\{f_j\}_{j \geq 1}$ is uniformly bounded). In the setting of Theorem 5.2, we replace $\Sigma^{np}$ by a finite dimensional matrix. Let $\mathbf{V}_k := \mathbf{W}_k \big( \mathbf{Z}_k^t \mathbf{W}_k \big)^{-} \mathbf{Z}_k^t$ for $k \geq 1$. Then for a sufficiently large integer $M > 0$ we estimate $\Sigma^{np}$ by
$$\widehat{\Sigma}^{np}_M := n^{-1}\, \mathbf{W}_M(\tau)^t \big( I_n - \mathbf{V}_{k_n} \big)^t \operatorname{diag}(\mathbf{U}^{np}_n)^2 \big( I_n - \mathbf{V}_{k_n} \big) \mathbf{W}_M(\tau).$$
Hence, we approximate $\sum_{j=1}^{\infty} \lambda^{np}_j \chi^2_{1j}$ by the finite sum $\sum_{j=1}^{M_n} \widehat{\lambda}^{np}_j \chi^2_{1j}$ where $(\widehat{\lambda}^{np}_j)_{1 \leq j \leq M_n}$ are the ordered eigenvalues of $\widehat{\Sigma}^{np}_{M_n}$ and where $\max_{1 \leq j \leq M_n} | \widehat{\lambda}^{np}_j - \lambda^{np}_j | = o_p(1)$ if $M_n = o(\sqrt{n})$. $\square$

5.3. Limiting behavior under local alternatives and consistency.
Similar to the previous sections we study the power and consistency properties of our test. To study the power of the statistic $S^{np}_n$ against local alternatives we use the function $\varphi_{k_n}(\cdot) = e_{\underline{k_n}}(\cdot)^t [T]^{-1}_{\underline{k_n}} E[Y f_{\underline{k_n}}(W)]$ and consider alternative models
$$Y = \varphi_{k_n}(Z) + \varsigma_{m_n}^{1/2} n^{-1/2} \delta(Z) + U \tag{5.5}$$
for some function $\delta \in L^2_Z$ and where $E[U|W] = 0$. Let $\varphi$ be such that $E[Y - \varphi(Z)|W] = 0$. Due to (5.5) $\varphi$ does not belong to $\mathcal{F}^{\rho}_{\gamma}$ and hence $H_{np}$ fails. Indeed, if $\varphi \in \mathcal{F}^{\rho}_{\gamma}$ then we show in the appendix that $\| T(\varphi - \varphi_{k_n}) \|^2_W = O(\upsilon_{k_n} \gamma_{k_n}^{-1}) = o(\varsigma_{m_n} n^{-1})$ due to condition (5.3) (or (5.4)), which is in contrast to (5.5).

Proposition 5.3.
Let Assumption 5 (ii) be satisfied. Given the conditions of Theorem 5.1 it holds under (5.5)
$$(\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^{np}_n - \mu_{m_n} \big) \xrightarrow{d} N\Big( 2^{-1/2} \sum_{j=1}^{\infty} \xi_j,\ 1 \Big).$$
Given the conditions of Theorem 5.2 it holds under (5.5), where $\varsigma_{m_n}$ is replaced by $1$, that
$$n S^{np}_n \xrightarrow{d} \sum_{j=1}^{\infty} \lambda^{np}_j \chi^2_{1j}(\delta_j / \lambda^{np}_j).$$

In the next proposition, we establish consistency of our test when $H_{np}$ does not hold, that is, the solution to (2.1) does not belong to $\mathcal{F}^{\rho}_{\gamma}$ for any sequence $\gamma$ satisfying Assumption 8 and any sufficiently large constant $0 < \rho < \infty$.

Proposition 5.4.
Assume that $H_{np}$ does not hold. Let Assumption 5 (i) hold true. Let $(\alpha_n)_{n \geq 1}$ be as in Proposition 2.5. Under the conditions of Theorem 5.1 and 5.2, respectively, we have
$$P\Big( (\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^{np}_n - \mu_{m_n} \big) > \alpha_n \Big) = 1 + o(1), \qquad P\big( n S^{np}_n > \alpha_n \big) = 1 + o(1).$$

In the following we show that our tests are consistent uniformly over the function class
$$\mathcal{J}^{\rho}_n = \Big\{ \varphi \in L^2_Z : \| LT(\varphi - \varphi_0) \|^2_W \geq \rho\, n^{-1} \varsigma_{m_n} \text{ and } \sup_{z \in \mathcal{Z}} | (\varphi - \varphi_0)(z) | \leq C \Big\}$$
where $\varphi_0 \in \mathcal{F}^{\rho}_{\gamma}$ solves (2.1) and $C > 0$ is some constant. For the next result let $q_{1\alpha}$ and $q_{2\alpha}$ denote the $1 - \alpha$ quantile of $N(0, 1)$ and $\sum_{j=1}^{\infty} \lambda^{np}_j \chi^2_{1j}$, respectively.

Proposition 5.5.
Let Assumption 5 be satisfied. For any $\varepsilon > 0$, any $0 < \alpha < 1$, and any sufficiently large constant $\rho > 0$ we have under the conditions of Theorem 5.1
$$\lim_{n\to\infty} \inf_{\varphi \in \mathcal{J}^{\rho}_n} P\Big( (\sqrt{2}\,\varsigma_{m_n})^{-1} \big( n S^{np}_n - \mu_{m_n} \big) > q_{1\alpha} \Big) \geq 1 - \varepsilon,$$
whereas under the conditions of Theorem 5.2 it holds
$$\lim_{n\to\infty} \inf_{\varphi \in \mathcal{J}^{\rho}_n} P\big( n S^{np}_n > q_{2\alpha} \big) \geq 1 - \varepsilon.$$

6. Monte Carlo simulation

In this section, we study the finite-sample performance of our test by presenting the results of Monte Carlo experiments. There are 1000 Monte Carlo replications in each experiment. Results are presented for the nominal level 0.05. Realizations of $Y$ were generated from
$$Y = \varphi(Z) + c_U U \tag{6.1}$$
for some constant $c_U > 0$. The structural function $\varphi$ and the joint distribution of $(Z, W, U)$ vary in the experiments below. As basis $\{f_j\}_{j \geq 1}$ we choose cosine basis functions given by $f_j(t) = \sqrt{2} \cos(\pi j t)$ for $j = 1, 2, \dots$ throughout this simulation study.

Parametric Specification
Let us investigate the finite sample performance of our tests in the case of parametric specifications. Realizations $(Z, W)$ were generated by $W \sim U[0, 1]$ and $Z = (\xi W + (1 - \xi)\varepsilon)$ where $\xi = 0.$ and $\varepsilon \sim N(0.\,, \,.)$, with $U = \kappa\varepsilon + \sqrt{1 - \kappa^2}\,\epsilon$ where $\kappa = 0.$ and $\epsilon \sim N(0, \,)$. Realizations of $Y$ were generated by (6.1) with $c_U = 0.$ and either the linear function
$$\varphi(z) = z, \tag{6.2}$$
a polynomial of second degree
$$\varphi(z) = z - z, \tag{6.3}$$
or a polynomial of third degree
$$\varphi(z) = z - z + \theta z. \tag{6.4}$$
Given (6.4) is the correct model, then $\theta = 1.$ and $\theta = 3$ if (6.2) is the null model. In Table 1 we depict the empirical rejection probabilities when using $S^p_n$ with additional smoothing where either $\tau_j = j^{-}$ or $\tau_j = j^{-}$, $j \geq 1$, which we denote by $S^p_n$ or $S^p_n$, respectively. When $\tau_j = j^{-}$ then the number of basis functions used is $m = 200$ while in the case of $\tau_j = j^{-}$ a choice of $m = 100$ is sufficient. The critical values are estimated as described in Remark 3.1 where $M = 150$ if $\tau_j = j^{-}$ and $M = 100$ if $\tau_j = j^{-}$. This choice of $M$ ensures that the estimated eigenvalues $\widehat{\lambda}_j$ are sufficiently close to zero for all $j \geq M$. We compare our test statistic with the test of Horowitz [2006]. We follow his implementation using biweight kernels. The bandwidth used to estimate the joint density of $(Z, W)$ was also selected by cross validation.

Table 1 (columns: Sample Size; Null Model; Alt. Model; Empirical Rejection probability for $S^p_n$, $S^p_n$, and the test of Horowitz [2006]; first row: sample size 250, null model (6.2), $H_p$ true).

As Table 1 illustrates, the results for $S^p_n$ and $S^p_n$ are quite similar. In both situations, our test is more powerful than the test of Horowitz [2006] when testing (6.2) against (6.4). In this simulation study, we observed that the estimated coefficients of $T(\varphi - \phi(\vartheta, \cdot))$ have a fast decay. Consequently, the test statistic $S_n$ with no weighting has less power, as we discussed in Subsection 2.4. In contrast, we will demonstrate by the end of this section that using weights can be inappropriate.

Testing Exogeneity
We now turn to the test of exogeneity where the realizations $(Z, W)$ are generated by $W \sim U[0, 1]$ and $Z = \xi W + \sqrt{1 - \xi^2}\,\varepsilon$ with $\xi = 0.7$, and $\varepsilon \sim U[0, \,]$, with $U = \kappa\varepsilon + \sqrt{1 - \kappa^2}\,\epsilon$ where $\epsilon \sim U[0, \,]$. The constant $\kappa$ measures the degree of endogeneity of $Z$ and is varied among the experiments. The null hypothesis $H_0$ holds true if $\kappa = 0$ and is false otherwise. Now realizations of $Y$ were generated by (6.1) with $c_U = 1$ and the nonparametric structural function $\varphi(z) = \sum_{j=1}^{\infty} (-1)^{j+1} j^{-} \sin(j\pi z)$. For computational reasons we truncate the infinite sum at $K = 100$. The resulting function is displayed in Figure 6. We estimate the structural relationship using Lagrange polynomials. Indeed, only a few basis functions are necessary to accurately approximate the true function. If we choose $k_n$ too small or too large then the estimator will be a poor approximation of the true structural function and hence, the test statistic will reject $H_{np}$. In this experiment we set $k_n = 4$ for $n = 250$ and $n = 500$.

Table 2: Empirical Rejection probabilities for testing exogeneity

Sample Size   κ      S^e_n    S^e_n    BH(2007) test
250           0.15   0.209    0.314    0.153
              0.25   0.591    0.716    0.504
500           0.15   0.476    0.543    0.416
              0.25   0.922    0.957    0.885

In Table 2 we depict the empirical rejection probabilities when using $S^e_n$ with additional smoothing where either $\tau_j = j^{-}$ or $\tau_j = j^{-}$, $j \geq 1$, which we denote by $S^e_n$ or $S^e_n$, respectively. The critical values of these statistics are estimated as described in Remark 4.2 with $M = 50$ in case of $\tau_j = j^{-}$ and $M = 40$ in case of $\tau_j = j^{-}$. We compare our results with the test of Blundell and Horowitz [2007]. We follow their approach by choosing the bandwidth of the joint density of $(Z, W)$ by cross validation. The bandwidth of the marginal of $Z$ is $n^{/ - /}$ times the cross-validation bandwidth. As we see from Table 2, $S^e_n$ is slightly more powerful than the test of Blundell and Horowitz [2007]. If we choose a stronger sequence, however, then our test statistic $S^e_n$ becomes considerably more powerful.

Nonparametric Specification
Let us now study the finite sample performance of our test in the case of nonparametric specification. We generate the pair $(Z, W)$ as in the parametric case described above. For the generation of the dependent variable $Y$ we distinguish two cases. Besides the structural function $\varphi_1(z) = \sum_{j=1}^{\infty} (-1)^{j+1} j^{-} \sin(j\pi z)$ we also consider the function $\varphi_2(z) = \sum_{j=1}^{\infty} ((-1)^{j+1} + 1)/\, j^{-} \sin(j\pi z)$. Again, for computational reasons we truncate the infinite sum at $K = 100$. The resulting functions are displayed in Figure 6. Further, $Y$ is generated by (6.1) either with $\varphi_1$ and $c_U = 0.$ or with $\varphi_2$ and $c_U = 0.8$. In both cases, we estimate the structural relationship using Lagrange polynomials with $k_n = 4$ for $n = 500$ and $n = 1000$.

If $H_{np}$ is false then $E[U|W] \neq 0$ and we let $E[U|W] = E[\rho(Z)|W]$ where $\rho$ is defined below. Consequently, when $H_{np}$ is false we generate realizations of $Y$ from $Y = \varphi_l(Z) + \rho_j(Z) + c_U U$ for $l = 1, 2$, where for $j \geq 1$
$$\rho_j(z) = c_j \big( \exp(2jz)\, \mathbb{1}\{z \leq 1/2\} + \exp(2j(1 - z))\, \mathbb{1}\{z > 1/2\} - 1 \big)$$
and $c_j$ is a normalizing constant such that $\int \rho_j(z)\, dz = 0.5$. The functions $\rho_j$ are continuous but not differentiable at $0.5$. Roughly speaking, the degree of roughness of $\rho_j$ is larger for larger $j$.

Figure 1: Graph of $\varphi_1$ and $\varphi_2$

In Table 3, we depict the empirical rejection probabilities when using $S^{np}_n$ with either no smoothing or additional smoothing $\tau_j = j^{-}$, $j \geq 1$, which we denote by $S^{np}_n$ or $S^{np}_n$, respectively. When no additional smoothing is applied then the number of basis functions $f_j$ is given by $m_n = 11$ if $n = 500$ and $m_n = 15$ if $n = 1000$ and hence, the choice of $m_n$ is slightly larger than $n^{1/}$ as suggested by the theoretical results. The critical values of these statistics are estimated as described in Remark 5.3 where in the case of $S^{np}_n$ we choose $M = 100$. We compare our results with the test of Horowitz [2012]. We observe that the statistic $S^{np}_n$ is less powerful than $S^{np}_n$ against the alternatives $\rho$ and $\rho$.

Table 3 (columns: Sample Size; $\rho$; Empirical Rejection probability using $S^{np}_n$, $S^{np}_n$, and the test of Horowitz [2012]; rows for each sample size: $H_{np}$ true, $\rho$, $\rho$, $\rho$): results for $\varphi$ with $c_U = 0.$

In the following, we illustrate that using additional weighting can be inappropriate. Table 4 illustrates the power of our tests when the structural function $\varphi$ is considered and realizations $(Z, W)$ were generated by $W \sim U[0, \,]$ and $Z = (0.\,W + 0.\,\varepsilon)$ where $\varepsilon \sim N(0.\,, \,.)$. We generated $Y$ using (6.1) where $c_U = 0.8$. In this case, the estimates of the generalized coefficients of $T(\varphi - \varphi)$ fluctuate more and using weights is not appropriate here. Indeed, as we can see from Table 4, the test statistic $S^{np}_n$ with no smoothing is more powerful than $S^{np}_n$ where the weighting $\tau_j = j^{-}$, $j \geq 1$, is used. In particular, $S^{np}_n$ is much more powerful than the test of Horowitz [2012].

Table 4 (columns: Sample Size; $\rho$; Empirical Rejection probability using $S^{np}_n$, $S^{np}_n$, and the test of Horowitz [2012]; rows for each sample size: $H_{np}$ true, $\rho$, $\rho$, $\rho$): results for $\varphi$ with $c_U = 0.$
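The critical values used in these experiments rest on the plug-in quantities of Remarks 4.2 and 5.3. The following Python sketch shows one way such quantities could be computed. It is an illustration under assumptions, not the implementation used for the tables: the function names, the number of simulation draws, and the seed are hypothetical, and the simple covariance estimate $\widehat{\Sigma}_m$ is used without the projection correction that appears in $\widehat{\Sigma}^e_M$ and $\widehat{\Sigma}^{np}_M$.

```python
import numpy as np

def covariance_estimate(resid, F, tau):
    """n^{-1} W(tau)' diag(resid)^2 W(tau), where W(tau)[i, j] = sqrt(tau_j) f_j(W_i)."""
    W_tau = F * np.sqrt(tau)
    return (W_tau * resid[:, None] ** 2).T @ W_tau / len(resid)

def normal_centering_and_scale(sigma_hat):
    """Trace and Frobenius norm, the plug-ins for mu_m and varsigma_m."""
    return float(np.trace(sigma_hat)), float(np.linalg.norm(sigma_hat, "fro"))

def weighted_chi2_quantile(eigenvalues, alpha, draws=20000, seed=0):
    """Simulated (1 - alpha) quantile of sum_j eigenvalues[j] * (independent chi-square(1))."""
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(df=1, size=(draws, len(eigenvalues)))
    return float(np.quantile(chi2 @ np.asarray(eigenvalues), 1.0 - alpha))
```

For the normal approximation one would compare $(\sqrt{2}\,\widehat{\varsigma})^{-1}(n S_n - \widehat{\mu})$ with a standard normal quantile. For the weighted chi-square limit, the eigenvalues could be obtained with `np.linalg.eigvalsh` applied to the estimated covariance matrix and passed to `weighted_chi2_quantile`.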
7. Conclusion
Based on the methodology of series estimation, we have developed in this paper a family of goodness-of-fit statistics and derived their asymptotic properties. The implementation of these statistics is straightforward. We have seen that the asymptotic results depend crucially on the choice of the smoothing operator $L$. By choosing a more strongly decaying sequence $\tau$, our test becomes more powerful with respect to local alternatives but might lose desirable consistency properties. We gave heuristic arguments for how to choose the weights in practice. In addition, in a Monte Carlo investigation our tests perform well in finite samples.

A. Appendix

Throughout the Appendix, let $C > 0$ denote a generic constant, and write $\sum_i = \sum_{i=1}^{n}$ and $\sum_{i'} = \sum_{i'=1}^{n}$.

Proof of Theorem 2.1.
Under $H_0$ we have $(Y_i - \varphi_0(Z_i)) f^{\tau}_{\underline{m}}(W_i) = U_i f^{\tau}_{\underline{m}}(W_i)$ for all $m \geq 1$ and thus
$$\varsigma_{m_n}^{-1} \big( n S_n - \mu_{m_n} \big) = \frac{1}{\varsigma_{m_n} n} \sum_{i} \sum_{j=1}^{m_n} \big( |U_i f^{\tau}_j(W_i)|^2 - s_{jj} \big) + \frac{1}{\varsigma_{m_n} n} \sum_{i \neq i'} \sum_{j=1}^{m_n} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'})$$
where the first summand tends in probability to zero as $n \to \infty$. Indeed, since $E|U f^{\tau}_j(W)|^2 - s_{jj} = 0$, $j \geq 1$, it holds for all $m \geq 1$
$$\frac{1}{(\varsigma_m n)^2} E\Big| \sum_{i} \sum_{j=1}^{m} |U_i f^{\tau}_j(W_i)|^2 - s_{jj} \Big|^2 = \frac{1}{n \varsigma_m^2} E\Big| \sum_{j=1}^{m} |U f^{\tau}_j(W)|^2 - s_{jj} \Big|^2 \leq \frac{1}{n \varsigma_m^2} E\| U f^{\tau}_{\underline{m}}(W) \|^4.$$
By using Assumptions 1 and 2, i.e., $\sup_{j \in \mathbb{N}} E|f_j(W)|^4 \leq \eta_f \eta_p$ and $E[U^4|W] \leq \sigma^4$, we conclude
$$E\| U f^{\tau}_{\underline{m}}(W) \|^4 \leq \max_{1 \leq j \leq m} E|U f_j(W)|^4 \Big( \sum_{j=1}^{m} \tau_j \Big)^2 \leq \eta_f \eta_p \sigma^4 \Big( \sum_{j=1}^{m} \tau_j \Big)^2. \tag{A.1}$$
Let $m = m_n$ satisfy condition (2.4); then $E\| U f^{\tau}_{\underline{m_n}}(W) \|^4 = o\big( n \varsigma^2_{m_n} \big)$. Therefore, it is sufficient to prove
$$(\sqrt{2}\,\varsigma_{m_n} n)^{-1} \sum_{i \neq i'} \sum_{j=1}^{m_n} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'}) \xrightarrow{d} N(0, 1). \tag{A.2}$$
Since $\varsigma^{-1}_{m_n} = o(1)$ this follows from Lemma A.2 and thus completes the proof.

Proof of Theorem 2.2.
Similarly to the proof of Theorem 2.1 it is sufficient to study the asymptotic behavior of $n^{-1} \sum_{j=1}^{m_n} \sum_{i \neq i'} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'})$. For any finite $m \geq 1$ we have
$$E\Big| \frac{1}{n} \sum_{j=1}^{m} \sum_{i \neq i'} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'}) - \frac{1}{n} \sum_{j=1}^{\infty} \sum_{i \neq i'} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'}) \Big|^2 \leq E\Big[ E[U_1^2 U_2^2 | W_1, W_2] \Big( \sum_{j > m} f^{\tau}_j(W_1) f^{\tau}_j(W_2) \Big)^2 \Big] \leq \sigma^4 \eta_p \Big( \sum_{j > m} \tau_j \Big)^2,$$
which, since $\sum_{j \geq 1} \tau_j = O(1)$, becomes sufficiently small (depending on $m$). Note that $\big( n^{-1/2} \sum_i U_i f^{\tau}_1(W_i), \dots, n^{-1/2} \sum_i U_i f^{\tau}_m(W_i) \big) \xrightarrow{d} N(0, \Sigma_m)$. Hence, for any finite $m \geq 1$ we have
$$\sum_{j=1}^{m} \Big| \frac{1}{\sqrt{n}} \sum_{i} U_i f^{\tau}_j(W_i) \Big|^2 \xrightarrow{d} \sum_{j=1}^{m} \lambda_j \chi^2_{1j}$$
with $\lambda_j$, $1 \leq j \leq m$, being eigenvalues of $\Sigma_m$. Moreover, we conclude for $m \geq 1$
$$\frac{1}{n} \sum_{j=1}^{m} \sum_{i \neq i'} U_i U_{i'} f^{\tau}_j(W_i) f^{\tau}_j(W_{i'}) = \sum_{j=1}^{m} \Big( \Big| \frac{1}{\sqrt{n}} \sum_{i} U_i f^{\tau}_j(W_i) \Big|^2 - \frac{1}{n} \sum_{i} |U_i f^{\tau}_j(W_i)|^2 \Big) \xrightarrow{d} \sum_{j=1}^{m} \big( \lambda_j \chi^2_{1j} - s_{jj} \big).$$
It is easily seen that $\sum_{j=1}^{m} (\lambda_j \chi^2_{1j} - s_{jj})$ has expectation zero. Hence, following the lines of pages 198-199 of Serfling [1981] we obtain that $\sum_{j > m} \big( \lambda_j \chi^2_{1j} - s_{jj} \big)$ becomes sufficiently small (depending on $m$), which completes the proof.

Proof of Proposition 2.3.
For ease of notation let $\delta_n(\cdot) := \varsigma_{m_n}^{1/2} n^{-1/2}\delta(\cdot)$. Under the sequence of alternatives (2.6) the following decomposition holds true:
\[
S_n = \Big\|n^{-1}\sum_i U_i f_{m_n}^{\tau}(W_i)\Big\|^2 + 2\Big\langle n^{-1}\sum_i U_i f_{m_n}^{\tau}(W_i),\, n^{-1}\sum_i \delta_n(Z_i) f_{m_n}^{\tau}(W_i)\Big\rangle + \Big\|n^{-1}\sum_i \delta_n(Z_i) f_{m_n}^{\tau}(W_i)\Big\|^2 =: I_n + 2\,II_n + III_n .
\]
Due to Theorem 2.1 we have $(\sqrt{2}\,\varsigma_{m_n})^{-1}\big(n I_n - \mu_{m_n}\big) \xrightarrow{d} N(0,1)$. Consider $II_n$. We observe
\[
n\,E|II_n| \leq \sum_{j=1}^{m_n}\tau_j\big(E|U f_j(W)|^2\, E|\delta_n(Z) f_j(W)|^2\big)^{1/2} + \Big(n\,E\Big|\sum_{j=1}^{m_n}\tau_j [T\delta_n]_j\, U f_j(W)\Big|^2\Big)^{1/2}
\leq \sigma\sum_{j=1}^{m_n}\tau_j\big(E|\delta_n(Z) f_j(W)|^2\big)^{1/2} + \sigma\eta_p\sqrt{n}\,\|T\delta_n\|_W .
\]
From the definition of $\delta_n$ and condition (2.4) we infer that $n\,E|II_n| = o(\varsigma_{m_n})$. Consider $III_n$. Employing again the definition of $\delta_n$, it is easily seen that $n\varsigma_{m_n}^{-1} III_n = \sum_{j=1}^{m_n}\tau_j [T\delta]_j^2 + o_p(1)$. We conclude $(\sqrt{2}\,\varsigma_{m_n})^{-1} n\, III_n = 2^{-1/2}\sum_{j\geq 1}\delta_j^2 + o_p(1)$, which completes the proof.

Proof of Proposition 2.4.
Let $\delta_n(\cdot) := n^{-1/2}\delta(\cdot)$. Similarly to the proof of Theorem 2.2 it is straightforward to see that under the sequence of alternatives (2.7) it holds
\[
\frac{1}{n}\sum_{i\neq i'}\sum_{j=1}^{m_n}\big(U_i + \delta_n(Z_i)\big)\big(U_{i'} + \delta_n(Z_{i'})\big) f_j^{\tau}(W_i) f_j^{\tau}(W_{i'})
= \sum_{j=1}^{\infty}\Big(\Big|\frac{1}{\sqrt{n}}\sum_i U_i f_j^{\tau}(W_i) + \frac{1}{n}\sum_i \delta(Z_i) f_j^{\tau}(W_i)\Big|^2 - \frac{1}{n}\sum_i \big|U_i f_j^{\tau}(W_i)\big|^2\Big)
\xrightarrow{d} \sum_{j=1}^{\infty}\lambda_j\chi^2_{1j}(\delta_j/\lambda_j)
\]
similar to the lines of pages 198-199 of Serfling [1981], and hence the assertion follows.

Proof of Proposition 2.5. If $H_0$ fails we observe that
\[
\sum_{j=1}^{\infty}\tau_j [T(\varphi-\varphi_0)]_j^2 = \int_{\mathcal{W}} \big|LT(\varphi-\varphi_0)(w)\, p_W(w)/\nu(w)\big|^2 \nu(w)\,dw \geq C\,\|LT(\varphi-\varphi_0)\|_W^2 > 0
\]
since $p_W/\nu$ is uniformly bounded from zero and $LT$ is nonsingular. Now since $\varsigma_{m_n}\alpha_n + \mu_{m_n} = o(n)$ it is sufficient to show $S_n = \sum_{j=1}^{\infty}\tau_j [T(\varphi-\varphi_0)]_j^2 + o_p(1)$. We make use of the decomposition
\[
S_n = \sum_{j=1}^{m_n}\tau_j\Big|n^{-1}\sum_i (Y_i-\varphi_0(Z_i)) f_j(W_i) - [T(\varphi-\varphi_0)]_j\Big|^2
+ 2\sum_{j=1}^{m_n}\tau_j\Big(n^{-1}\sum_i (Y_i-\varphi_0(Z_i)) f_j(W_i) - [T(\varphi-\varphi_0)]_j\Big)[T(\varphi-\varphi_0)]_j
+ \sum_{j=1}^{m_n}\tau_j [T(\varphi-\varphi_0)]_j^2 = I_n + II_n + III_n .
\]
Due to the moment condition imposed on $Y - \varphi_0(Z)$ it is easily seen that $I_n + II_n = o_p(1)$, which proves the result.

Proof of Proposition 2.6.
We make use of the decomposition P (cid:16) ( √ ς m n ) − (cid:0) n S n − µ m n (cid:1) > q − α (cid:17) (cid:62) P (cid:16)(cid:13)(cid:13) n − / (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) (cid:13)(cid:13) + (cid:13)(cid:13) n − / (cid:88) i U i f τm n ( W i ) (cid:13)(cid:13) − µ m n > √ ς m n q − α + 2 | (cid:10) n − (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) | (cid:17) . Uniformly over all ϕ ∈ G ρn it holds (cid:10) n − (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) = O p (cid:0) max( √ n (cid:107) LT ( ϕ − ϕ ) (cid:107) W , ς m n ) (cid:1) . Indeed, this is easily seen from E (cid:12)(cid:12) m n (cid:88) j =1 τ j E [( ϕ ( Z ) − ϕ ( Z )) f j ( W )] (cid:88) i U i f j ( W i ) (cid:12)(cid:12) (cid:54) σ η p n m n (cid:88) j =1 E [( ϕ ( Z ) − ϕ ( Z )) f τj ( W )] and further, denoting ψ ji = ( ϕ ( Z i ) − ϕ ( Z i )) f j ( W ) − E [( ϕ ( Z ) − ϕ ( Z )) f j ( W )], 1 (cid:54) j (cid:54) m n ,1 (cid:54) i (cid:54) n , from E (cid:12)(cid:12) n − (cid:88) i (cid:54) = i (cid:48) m n (cid:88) j =1 τ j ψ ji U i (cid:48) f j ( W i (cid:48) ) (cid:12)(cid:12) = n − n m n (cid:88) j,j (cid:48) =1 τ j τ j (cid:48) E (cid:2) ψ j ψ j (cid:48) (cid:3) E (cid:2) U f j ( W ) f j (cid:48) ( W ) (cid:3) (cid:54) C m n (cid:88) j,j (cid:48) =1 τ j τ j (cid:48) E (cid:2) U f j ( W ) f j (cid:48) ( W ) (cid:3) (cid:54) Cσ E (cid:12)(cid:12) m n (cid:88) j =1 τ j f j ( W ) (cid:12)(cid:12) = O (cid:16) m n (cid:88) j =1 τ j (cid:17) . Thereby, for all 0 < ε (cid:48) < C > P (cid:16) ( √ ς m n ) − (cid:0) n S n − µ m n (cid:1) > q − α (cid:17) (cid:62) P (cid:16)(cid:13)(cid:13) n − / (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) (cid:13)(cid:13) + (cid:13)(cid:13) n − / (cid:88) i U i f τm n ( W i ) (cid:13)(cid:13) − µ m n > √ ς m n q − α + C max( √ n (cid:107) LT ( ϕ − ϕ ) (cid:107) W , ς m n ) (cid:17) − ε (cid:48) . 
(cid:13)(cid:13) n − / (cid:80) i U i f τm n ( W i ) (cid:13)(cid:13) = µ m n + O p ( ς m n ) due to Theorem 2.1. Moreover, (cid:13)(cid:13) n − / (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) (cid:13)(cid:13) (cid:62) n m n (cid:88) j =1 τ j [ T ( ϕ − ϕ )] j − (cid:12)(cid:12)(cid:10) (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) − n [ LT ( ϕ − ϕ )] m n , [ LT ( ϕ − ϕ )] m n (cid:11) = I n + II n . Consider II n . For 1 (cid:54) j (cid:54) m n let s j = τ j [ T ( ϕ − ϕ )] j / (cid:0) (cid:80) ∞ j =1 τ j [ T ( ϕ − ϕ )] j (cid:1) / then clearly (cid:80) m n j =1 s j = 1 and thus E | (cid:80) m n j =1 s j f j ( W ) | (cid:54) η f η p . Further, since sup z ∈Z | ϕ ( z ) − ϕ ( z ) | (cid:54) C we calculate E | II n | = n E (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:0) ( ϕ ( Z ) − ϕ ( Z )) f j ( W ) − [ T ( ϕ − ϕ )] j (cid:1) [ T ( ϕ − ϕ )] j (cid:12)(cid:12)(cid:12) (cid:54) n m n (cid:88) j =1 τ j [ T ( ϕ − ϕ )] j E (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 s j ( ϕ ( Z ) − ϕ ( Z )) f j ( W ) (cid:12)(cid:12)(cid:12) = O (cid:0) n (cid:107) LT ( ϕ − ϕ ) (cid:107) W (cid:1) and hence II n = O p ( √ n (cid:107) LT ( ϕ − ϕ ) (cid:107) W ). Consider I n . Note that (cid:107) LT ( ϕ − ϕ ) (cid:107) W (cid:54) C forall ϕ ∈ G ρn we have I n (cid:62) Cn (cid:107) LT ( ϕ − ϕ ) (cid:107) W for n sufficiently large. Since on G ρn we have n (cid:107) LT ( ϕ − ϕ ) (cid:107) W (cid:62) ρ ς m n we obtain the result by choosing ρ sufficiently large. A.2. Proofs of Section 3.
For ease of notation, we write in the following $\phi(\cdot)$ for $\phi(\cdot,\vartheta_0)$ and $\phi_{\vartheta_l}(\cdot)$ for $\phi_{\vartheta_l}(\cdot,\vartheta_0)$.

Proof of Theorem 3.1.
The proof is based on the decomposition under H p S p n = (cid:13)(cid:13) n − (cid:88) i U i f τm n ( W i ) (cid:13)(cid:13) +2 (cid:10) n − (cid:88) i U i f τm n ( W i ) , n − (cid:88) i (cid:0) φ ( Z i ) − φ ( Z i , (cid:98) ϑ n ) (cid:1) f τm n ( W i ) (cid:11) + (cid:107) n − (cid:88) i (cid:0) φ ( Z i ) − φ ( Z i , (cid:98) ϑ n ) (cid:1) f τm n ( W i ) (cid:107) = I n + 2 II n + III n . (A.3)Due to Theorem 2.1 it holds ( √ ς m n ) − ( nI n − µ m n ) d → N (0 , III n . It holds φ ( Z i ) − φ ( Z i , (cid:98) ϑ n ) = φ ϑ ( Z i ) t ( ϑ − (cid:98) ϑ n )+( ϑ − (cid:98) ϑ n ) t φ ϑϑ ( Z i , ϑ n ) t ( ϑ − (cid:98) ϑ n ) / ϑ n between (cid:98) ϑ n and ϑ . From the bounds imposed in Assumption 6 ( ii ) we infer nIII n (cid:54) n (cid:107) ϑ − (cid:98) ϑ n (cid:107) (cid:16) k (cid:88) l =1 m n (cid:88) j =1 τ j [ T φ ϑ l ] j + k (cid:88) l =1 m n (cid:88) j =1 τ j (cid:0) n (cid:88) i φ ϑ l ( Z i ) f j ( W i ) − [ T φ ϑ l ] j (cid:1) (cid:17) + o p (1) . For each 1 (cid:54) l (cid:54) k we have m n (cid:88) j =1 [ T φ ϑ l ] j = m n (cid:88) j =1 (cid:16) (cid:90) W ( T φ ϑ l )( w ) f j ( w ) p W ( w ) dw (cid:17) (cid:54) (cid:90) W | ( T φ ϑ l )( w ) p W ( w ) /ν ( w ) | ν ( w ) dw (cid:54) η p (cid:107) T φ ϑ l (cid:107) W (cid:54) η p E | φ ϑ l ( Z, ϑ ) | (cid:54) η p η φ (A.4)by applying Jensen’s inequality. Moreover, we calculate k (cid:88) l =1 m n (cid:88) j =1 E (cid:12)(cid:12) n (cid:88) i φ ϑ l ( Z i ) f j ( W i ) − [ T φ ϑ l ] j (cid:12)(cid:12) (cid:54) km n n sup j,l (cid:62) E | φ ϑ l ( Z ) f j ( W ) | (cid:54) η km n n . (A.5)26hese estimates together with (cid:107) ϑ − (cid:98) ϑ n (cid:107) = O p ( n − / ) imply nIII n = o p ( ς m n ). We are leftwith the proof of nII n = o p ( ς m n ). 
We observe for each 1 (cid:54) l (cid:54) k E (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:16) n − / (cid:88) i U i f j ( W i ) (cid:0) n − (cid:88) i φ ϑ l ( Z i ) f j ( W i ) − [ T φ ϑ l ] j (cid:1)(cid:17)(cid:12)(cid:12)(cid:12) (cid:54) n − / m n (cid:88) j =1 τ j (cid:0) E | U f j ( W ) | (cid:1) / (cid:0) E | φ ϑ l ( Z ) f j ( W ) | (cid:1) / = O (cid:16) n − / m n (cid:88) j =1 τ j (cid:17) = o ( ς m n ) . Now since n / ( ϑ − (cid:98) ϑ n ) = O p (1) we infer nII n = n / ( ϑ − (cid:98) ϑ n ) t m n (cid:88) j =1 τ j (cid:16) ς − m n n − / (cid:88) i U i f j ( W i ) E [ φ ϑ ( Z ) f j ( W )] (cid:17) + o p (1) . We observe for each 1 (cid:54) l (cid:54) kς − m n n − E (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:88) i U i f j ( W i )[ T φ ϑ l ] j (cid:12)(cid:12)(cid:12) (cid:54) ς − m n σ η p m n (cid:88) j =1 [ T φ ϑ l ] j (cid:54) ς − m n σ η p η f which implies nII n = o p ( ς m n ) and thus, in light of decomposition (A.3), completes theproof. Proof of Theorem 3.2.
For 1 (cid:54) j (cid:54) m n we make use of the following decomposition n − / (cid:88) i f j ( W i ) (cid:16) U i + φ ( Z i ) − φ ( Z i , (cid:98) ϑ n ) (cid:17) = n − / (cid:88) i (cid:16) f j ( W i ) U i − k (cid:88) l =1 [ T φ ϑ l ] j h l ( V i ) (cid:17) + k (cid:88) l =1 (cid:16) n − (cid:88) i f j ( W i ) φ ϑ l ( Z i ) − [ T φ ϑ l ] j (cid:17)(cid:16) n − / (cid:88) i h l ( V i ) (cid:17) + k (cid:88) l =1 n − (cid:88) i f j ( W i ) φ ϑ l ( Z i ) r l + o p (1) = A nj + B nj + C nj + o p (1) (A.6)where r k = ( r , . . . , r k ) t is a stochastic vector satisfying r k = o p (1). Consequently, under H p we have nS p n = m n (cid:88) j =1 τ j A nj + 2 m n (cid:88) j =1 τ j A nj ( B nj + C nj ) + m n (cid:88) j =1 τ j ( B nj + C nj ) + o p (1) . Clearly, for all 1 (cid:54) i (cid:54) n the random variables U i f τj ( W i ) + E (cid:2) f τj ( W ) φ ϑ ( Z ) t (cid:3) h k ( V i ), 1 (cid:54) j (cid:54) m n , are centered with bounded second moment. Due to the proof of Theorem 2.2 itis easily seen that (cid:80) m n j =1 τ j A nj d → (cid:80) ∞ j =1 λ p j χ j . Inequality (A.5) yields (cid:80) m n j =1 B nj = o p (1).Since (cid:80) m n j =1 [ T φ ϑ ] j (cid:54) η p η φ we have (cid:107) E [ f m n ( W ) φ ϑ ( Z ) t ] r k (cid:107) (cid:54) k η p η φ (cid:107) r k (cid:107) = o p (1) and hence (cid:80) m n j =1 C nj = o p (1). Finally, the Cauchy-Schwarz inequality implies (cid:80) m n j =1 τ j A nj ( B nj + C nj ) = o p (1), which completes the proof. 27 roof of Proposition 3.3. Without loss of generality we may assume δ = δ ⊥ (otherwisereplace φ ( Z i ) by φ ( Z i ) + φ ϑ ( Z i ) t E [ δ ( Z ) φ ϑ ( Z )]. Consider the case ς − m n = o (1). 
Under thesequence of alternatives (2.6) the following decomposition holds true S p n = (cid:13)(cid:13) n − (cid:88) i ( U i + ς / m n n − / δ ⊥ ( Z i )) f τm n ( W i ) (cid:13)(cid:13) + 2 (cid:10) n − (cid:88) i ( U i + ς / m n n − / δ ⊥ ( Z i )) f τm n ( W i ) , n − (cid:88) i ( φ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) (cid:11) + (cid:13)(cid:13) n − (cid:88) i ( φ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) (cid:13)(cid:13) . Due to Proposition 2.3 and the proof of Theorem 3.1 it is sufficient to show (cid:10) n − (cid:88) i δ ⊥ ( Z i ) f τm n ( W i ) , n − / (cid:88) i ( φ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) (cid:11) = o p ( √ ς m n ) . (A.7)Indeed, since δ j ⊥ = √ τ j E [ δ ⊥ ( Z ) f j ( W )] we have m n (cid:88) j =1 δ j ⊥ n − / (cid:88) i ( φ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f j ( W i ) = √ n ( ϑ − (cid:98) ϑ n ) t m n (cid:88) j =1 δ j ⊥ E [ φ ϑ ( Z ) f j ( W )]+ o p (1) (cid:54) η p η φ √ n (cid:107) ϑ − (cid:98) ϑ n (cid:107) ∞ (cid:88) j =1 δ j ⊥ + o p (1) = O p (1)and hence (A.7) holds true.Consider the case (cid:80) m n j =1 τ j = O (1). We make use of decomposition (A.6) where U i isreplaced by U i + n − / δ ⊥ ( Z i ). Similarly to the proof of Proposition 2.4 it is easily seenthat (cid:80) m n j =1 τ j A nj d → (cid:80) ∞ j =1 λ p j χ j ( δ j ⊥ /λ p j ). Thereby, due to the proof of Theorem 3.2, theassertion follows. Proof of Proposition 3.4.
Consider first the case $\varsigma_{m_n}^{-1} = o(1)$. Similar to the proof of Theorem 3.1 we observe that $\big\|n^{-1}\sum_i (\phi(Z_i,\vartheta_0) - \phi(Z_i,\widehat{\vartheta}_n)) f_{m_n}^{\tau}(W_i)\big\|^2 = o_p(1)$ and $\big\|n^{-1}\sum_i (Y_i - \phi(Z_i,\vartheta_0)) f_{m_n}^{\tau}(W_i)\big\|^2 = \sum_{j=1}^{\infty}\tau_j [T(\varphi - \phi(\cdot,\vartheta_0))]_j^2 + o_p(1)$. Thus, the result follows as in the proof of Proposition 2.5. In case of $\sum_{j=1}^{m_n}\tau_j = O(1)$, we obtain similarly that $S_n^{p} = \sum_{j=1}^{m_n}\tau_j\big|n^{-1}\sum_i (Y_i - \phi(Z_i,\vartheta_0)) f_j(W_i)\big|^2 + o_p(1)$ and hence $S_n^{p} = \sum_{j=1}^{\infty}\tau_j [T(\varphi - \phi(\cdot,\vartheta_0))]_j^2 + o_p(1)$.

Proof of Proposition 3.5.
Consider the case ς − m n = o (1). The basic inequality ( a − b ) (cid:62) a / − b , a, b ∈ R , yields P (cid:16) ( √ ς m n ) − (cid:0) n S p n − µ m n (cid:1) > q − α (cid:17) (cid:62) P (cid:16) / (cid:13)(cid:13) n − / (cid:88) i ( ϕ ( Z i ) − φ ( Z i , ϑ )) f τm n ( W i ) (cid:13)(cid:13) + (cid:13)(cid:13) n − / (cid:88) i U i f τm n ( W i ) (cid:13)(cid:13) − µ m n > √ ς m n q − α + 2 | (cid:10) n − (cid:88) i ( ϕ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) | + (cid:13)(cid:13) n − / (cid:88) i ( φ ( Z i , ϑ ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) (cid:13)(cid:13) (cid:17) . (A.8)28rom the proof of Theorem 3.1 we infer (cid:13)(cid:13) n − / (cid:80) i ( φ ( Z i , (cid:98) ϑ n ) − φ ( Z i , ϑ )) f τm n ( W i ) (cid:13)(cid:13) = o p ( ς m n ) and (cid:10) n − (cid:88) i ( ϕ ( Z i ) − φ ( Z i , (cid:98) ϑ n )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) = (cid:10) n − (cid:88) i ( ϕ ( Z i ) − φ ( Z i , ϑ )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) + o p ( ς m n ) . Thus, following line by line the proof of Proposition 2.6, the assertion follows. In case of (cid:80) m n j =1 τ j = O (1) the assertion follows similarly. A.3. Proofs of Section 4.
In the following, we denote $[\widehat{Q}]_{k_n} = n^{-1}\sum_i e_{k_n}(Z_i) e_{k_n}(Z_i)^t$. By Assumption 7, the eigenvalues of $E[e_{k_n}(Z) e_{k_n}(Z)^t]$ are bounded away from zero and hence, it may be assumed that $E[e_{k_n}(Z) e_{k_n}(Z)^t] = I_{k_n}$ (cf. Newey [1997], p. 161).

Proof of Theorem 4.1.
The proof is based on the decomposition (A.3) where the esti-mator φ ( · , (cid:98) ϑ n ) is replaced by ϕ k n ( · ) given in (4.1). It holds nIII n = o p ( ς m n ), which can beseen as follows. We make use of III n / (cid:54) (cid:13)(cid:13) n (cid:88) i ( E k n ϕ ( Z i ) − ϕ k n ( Z i )) f τm n ( W i ) (cid:13)(cid:13) + (cid:13)(cid:13) n (cid:88) i (cid:0) E ⊥ k n ϕ (cid:1) ( Z i ) f τm n ( W i ) (cid:13)(cid:13) =: A n + A n . Consider A n . We observe A n (cid:54) (cid:107) E [ f τm n ( W ) e k n ( Z ) t ][ (cid:98) Q ] − k n ([ (cid:98) Q ] k n [ ϕ ] k n − n − (cid:88) i Y i e k n ( Z i )) (cid:107) + 2 (cid:107) E k n ϕ − ϕ k n (cid:107) Z m n (cid:88) j =1 τ j k n (cid:88) l =1 | n − (cid:88) i e l ( Z i ) f j ( W i ) − [ T ] jl | =: 2 B n + 2 B n . (A.9)For B n we evaluate due to the relation [ (cid:98) Q ] − k n = I k n − [ (cid:98) Q ] − k n ([ (cid:98) Q ] k n − I k n ) that B n (cid:54) (cid:13)(cid:13) E [ f τm n ( W ) e k n ( Z ) t ] n − (cid:88) i ( E k n ϕ ( Z i ) − Y i ) e k n ( Z i ) (cid:13)(cid:13) +2 (cid:13)(cid:13) E [ f τm n ( W ) e k n ( Z ) t ] (cid:13)(cid:13) (cid:13)(cid:13) [ (cid:98) Q ] k n − I k n (cid:13)(cid:13) (cid:13)(cid:13) [ (cid:98) Q ] − k n (cid:13)(cid:13) (cid:13)(cid:13) n − (cid:88) i ( E k n ϕ ( Z i ) − Y i ) e k n ( Z i ) (cid:13)(cid:13) . Since the spectral norm of a matrix is bounded by its Frobenius norm it holds E (cid:13)(cid:13) [ (cid:98) Q ] k n − I k n (cid:13)(cid:13) (cid:54) n − k n (cid:88) l,l (cid:48) =1 E | e l ( Z ) e l (cid:48) ( Z ) | (cid:54) η e n − k n . 
E [( E k n ϕ ( Z ) − Y ) e k n ( Z )] = 0 we deduce E (cid:13)(cid:13) E [ f τm n ( W ) e k n ( Z ) t ] n − (cid:88) i ( E k n ϕ ( Z i ) − Y i ) e k n ( Z i ) (cid:13)(cid:13) (cid:54) n − m n (cid:88) j =1 E (cid:12)(cid:12) k n (cid:88) j =1 E [ f j ( W ) e l ( Z )]( E k n ϕ ( Z ) − Y ) e l ( Z ) | (cid:54) Cη p n − m n (cid:88) j =1 k n (cid:88) j =1 E [ f j ( W ) e l ( Z )] = O ( n − k n )where we used the definition of F γ and that E [ U | Z ] is bounded. Moreover, since thedifference of eigenvalues of [ (cid:98) Q ] k n and I k n is bounded by (cid:107) [ (cid:98) Q ] k n − I k n (cid:107) , the smallest eigenvalueof [ (cid:98) Q ] k n converges in probability to one and hence, (cid:107) [ (cid:98) Q ] − k n (cid:107) = 1 + o p (1). Further, note that (cid:107) E [ f τm n ( W ) e k n ( Z ) t ] (cid:107) (cid:54) (cid:80) m n j =1 (cid:80) k n j =1 E [ f j ( W ) e l ( Z )] = O ( k n ). Consequently, n (cid:107) E k n ϕ − ϕ k n (cid:107) Z = O p ( k n ) (A.10)and since k n = o ( ς m n ) we proved nB n = o p ( ς m n ). In addition, applying inequality (A.5)together with equation (A.10) yields nB n = o p ( ς m n ). Consequently, nA n = o ( ς m n ). Con-sider A n . Similar to the derivation of (A.4) we obtain E (cid:13)(cid:13) n − (cid:88) i (cid:0) E ⊥ k n ϕ (cid:1) ( Z i ) f τm n ( W i ) (cid:13)(cid:13) (cid:54) η p (cid:107) E ⊥ k n ϕ (cid:107) Z + 2 n − m n (cid:88) j =1 E | E ⊥ k n ϕ ( Z ) f j ( W ) | . We have m n (cid:88) j =1 τ j E | ( E ⊥ k n ϕ )( Z ) f j ( W ) | = O (cid:16) γ − k n m n (cid:88) j =1 τ j (cid:17) = o ( ς m n ) (A.11)and n (cid:107) E ⊥ k n ϕ (cid:107) Z = O ( nγ − k n ) = o ( ς m n ). Hence, nIII n = o p ( ς m n ). Consider II n . 
We calculate nII n (cid:54) (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:88) i U i f j ( W i )([ ϕ ] k n − [ ϕ ] k n ) t (cid:16) n − (cid:88) i e k n ( Z i ) f j ( W i ) − E (cid:2) e k n ( Z ) f j ( W ) (cid:3)(cid:17)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j k n (cid:88) l =1 ([ ϕ ] l − [ ϕ ] l ) (cid:16) (cid:88) i U i f j ( W i )[ T ] jl (cid:17)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:16) (cid:88) i U i f j ( W i ) (cid:17)(cid:16) n − (cid:88) i E ⊥ k n ϕ ( Z i ) f j ( W i ) − E [ E ⊥ k n ϕ ( Z ) f j ( W )] (cid:17)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j (cid:16) (cid:88) i U i f j ( W i ) (cid:17) E [ E ⊥ k n ϕ ( Z ) f j ( W )] (cid:12)(cid:12)(cid:12) = C n + C n + C n + C n . (A.12)Consider C n . Applying the Cauchy-Schwarz inequality twice gives C n (cid:54) (cid:107) E k n ϕ − ϕ k n (cid:107) Z m n (cid:88) j =1 τ j (cid:12)(cid:12) (cid:88) i U i f j ( W i ) (cid:12)(cid:12) (cid:16) k n (cid:88) l =1 | n − (cid:88) i e l ( Z i ) f j ( W i ) − E [ e l ( Z ) f j ( W )] | (cid:17) / . E | (cid:80) i U i f j ( W i ) | (cid:54) n η f σ , relation (A.10), and inequality (A.5) we infer C n = o p ( ς m n ) due to condition (4.2). For C n we evaluate C n (cid:54) (cid:107) E k n ϕ − ϕ k n (cid:107) Z (cid:16) k n (cid:88) l =1 (cid:12)(cid:12) m n (cid:88) j =1 (cid:88) i U i f j ( W i )[ T ] jl (cid:12)(cid:12) (cid:17) / . Now (cid:80) m n j =1 (cid:80) k n l =1 [ T ] jl = O ( k n ) together with (A.10) yields C n = o p (1). Consider C n .Since E [ U | W ] (cid:54) σ we conclude similarly as in inequality (A.11) that E C n (cid:54) m n (cid:88) j =1 τ j (cid:0) E | U f j ( W ) | (cid:1) / (cid:0) E | E ⊥ k n ϕ ( Z ) f j ( W ) | (cid:1) / = O (cid:16) γ − / k n m n (cid:88) j =1 τ j (cid:17) = o ( ς m n ) . Consider C n . 
We calculate E | C n | (cid:54) n η p σ m n (cid:88) j =1 [ T E ⊥ k n ϕ ] j (cid:54) n η p σ (cid:107) T E ⊥ k n ϕ (cid:107) W = O ( nγ − k n ) = o ( ς m n ) . Consequently, in light of decomposition (A.12) we obtain nII n = o ( ς m n ), which completesthe proof. Proof of Theorem 4.2.
Employing the equality [ (cid:98) Q ] − k n = I k n − [ (cid:98) Q ] − k n ([ (cid:98) Q ] k n − I k n ) weobtain for all 1 (cid:54) j (cid:54) m n n − / (cid:88) i f j ( W i ) (cid:16) U i + ϕ ( Z i ) − ϕ k n ( Z i ) (cid:17) = n − / (cid:88) i (cid:16) f j ( W i ) U i + E [ f j ( W ) e k n ( Z ) t ] e k n ( Z i ) (cid:0) ϕ ( Z i ) − Y i (cid:1)(cid:17) + n − / (cid:88) i E (cid:2) f j ( W ) e k n ( Z ) t (cid:3) [ (cid:98) Q ] − k n (cid:0) [ (cid:98) Q ] k n − I k n (cid:1) e k n ( Z i ) (cid:0) E k n ϕ ( Z i ) − Y i (cid:1) + (cid:16) n − (cid:88) i f j ( W i ) e k n ( Z i ) − E (cid:2) f j ( W ) e k n ( Z ) t (cid:3)(cid:17) √ n (cid:0) [ ϕ ] k n − [ ϕ k n ] k n (cid:1) − n − / (cid:88) i E [ f j ( W ) e k n ( Z ) t ] e k n ( Z i ) E ⊥ k n ϕ ( Z i ) = A nj + B nj + C nj + D nj . (A.13)Due to Assumption 7 (ii) we may assume that { e , . . . , e k } forms an orthonormal systemin L Z and hence (cid:80) kl =1 E [ f j ( W ) e l ( Z )] is bounded uniformly in k . Thereby, we conclude (cid:80) m n j =1 τ j (cid:80) l>k n E [ f j ( W ) e l ( Z )] e l ( · ) = o (1). Now following line by line the proof of Theorem2.2 we deduce m n (cid:88) j =1 τ j A nj = m (cid:88) j =1 τ j E (cid:12)(cid:12)(cid:12) n − / (cid:88) i U i (cid:16) f j ( W i )+ (cid:88) l (cid:62) E [ f j ( W ) e l ( Z )] e l ( Z i ) (cid:17)(cid:12)(cid:12)(cid:12) + o p (1) d → ∞ (cid:88) j =1 λ e j χ j . Moreover, we see similarly to the proof of Theorem 4.1 that (cid:80) m n j =1 τ j ( B nj + C nj + D nj ) = o p (1), which completes the proof. 31 roof of Lemma 4.3. 
Note that the squared Frobenius norm of (cid:98) Σ m n − Σ m n is given by m n (cid:88) j,l =1 (cid:12)(cid:12)(cid:12) n − (cid:88) i ( Y i − ϕ k n ( Z i )) f τj ( W i ) f τl ( W i ) − s jl (cid:12)(cid:12) (cid:54) (cid:107) ϕ k n − E k n ϕ (cid:107) Z m n (cid:88) j,l =1 E (cid:2) (cid:107) e k n ( Z ) (cid:107) f τj ( W ) f τl ( W ) (cid:3) + m n (cid:88) j,l =1 E (cid:2) ( E ⊥ k n ϕ ( Z )) f τj ( W ) f τl ( W ) (cid:3) + o p (1) (cid:54) (cid:107) ϕ k n − E k n ϕ (cid:107) Z O (cid:16)(cid:0) m n (cid:88) j =1 τ j (cid:1) (cid:17) + O (cid:16)(cid:0) γ − k n m n (cid:88) j =1 τ j (cid:1) (cid:17) + o p (1) = o p (1)by using relation (A.10). Consequently, the Frobenius norm of (cid:98) Σ m n equals ς m n + o p (1).Consistency of the trace of (cid:98) Σ m n is seen similarly. Proof of Proposition 4.4.
Similar to the proof of Proposition 3.3 it is sufficient to show (cid:10) n − (cid:88) i δ ( Z i ) f τm n ( W i ) , n − / (cid:88) i ( ϕ ( Z i ) − ϕ k n ( Z i )) f τm n ( W i ) (cid:11) = o p ( √ ς m n ) . (A.14)By employing Jensen’s inequality and estimate (A.10) we obtain m n (cid:88) j =1 τ j [ T δ ] j √ n (cid:88) i ( E k n ϕ ( Z i ) − ϕ k n ( Z i )) f j ( W i ) (cid:54) √ n (cid:107) T δ (cid:107) τ (cid:107) T ( E k n ϕ − ϕ k n ) (cid:107) W + o p (1) = o p ( ς m n ) . Similarly to the upper bounds of C n and C n in the proof of Theorem 4.1 it is straight-forward to see that (cid:80) m n j =1 τ j [ T δ ] j n − / (cid:80) i E ⊥ k n ϕ ( Z i ) f j ( W i ) = o p ( ς m n ) and, hence equation(A.14) holds true. Consider the case (cid:80) m n j =1 τ j = O (1). We make use of decomposition (A.13)where U i is replaced by U i + n − / δ ( Z i ). Similarly to the proof of Proposition 2.4 it is easilyseen that (cid:80) m n j =1 τ j A nj d → (cid:80) ∞ j =1 λ e j χ j ( δ j /λ e j ). Thereby, due to the proof of Theorem 4.2,the assertion follows. Proof of Proposition 4.5.
Similar to the proof of Proposition 3.4.
Proof of Proposition 4.6.
We make use of inequality (A.8) where φ ( · , (cid:98) ϑ n ) is replacedby ϕ k n . From the proof of Theorem 4.1 we infer (cid:13)(cid:13) n − / (cid:80) i ( ϕ k n ( Z i ) − ϕ ( Z i )) f τm n ( W i ) (cid:13)(cid:13) = o p ( ς m n ) and (cid:10) n − (cid:88) i ( ϕ ( Z i ) − ϕ k n ( Z i )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) = (cid:10) n − (cid:88) i ( ϕ ( Z i ) − ϕ ( Z i )) f τm n ( W i ) , (cid:88) i U i f τm n ( W i ) (cid:11) + o p ( ς m n )uniformly over all ϕ ∈ I ρn . Thus, following line by line the proof of Proposition 2.6, theassertion follows. 32 .4. Proofs of Section 5. Recall that [ T ] k = E [ e k ( W ) e k ( Z ) t ]. Further, we denote [ (cid:98) T ] k = n − (cid:80) ni =1 e k ( W i ) e k ( Z i ) t and [ (cid:98) g ] k = n − (cid:80) ni =1 Y i e k ( W i ). In the following, we introduce the function ϕ k n ( · ) := e k n ( · ) t [ T ] − k n [ g ] k n which belongs to L Z . For all k (cid:62) k := {(cid:107) [ (cid:98) T ] − k (cid:107) (cid:54) √ n } and (cid:102) k := {(cid:107) R k (cid:107)(cid:107) [ T ] − k (cid:107) (cid:54) / } where R k = [ (cid:98) T ] k − [ T ] k . Note that E Ω ckn = P (Ω ck n ) = o (1) (cf. proof of Proposition 3.1 of Breunig and Johannes [2011]) and, hence Ω kn = 1 + o p (1). For a sequence of weights ω = ( ω j ) j (cid:62) we define the weighted norm (cid:107) φ (cid:107) ω = (cid:0) (cid:80) j (cid:62) ω j ( (cid:82) Z φ ( z ) e j ( z ) ν ( z ) dz ) (cid:1) / . Proof of Theorem 5.1.
For the proof we make use of decomposition (A.3) where theestimator φ ( · , (cid:98) ϑ n ) is replaced by (cid:98) ϕ k n given in (5.2). Consider III n . Observe III n (cid:54) (cid:107) n − (cid:88) i ( ϕ k n ( Z i ) − (cid:98) ϕ k n ( Z i )) f τm n ( W i ) (cid:107) + 2 (cid:107) n − (cid:88) i (cid:0) ϕ k n ( Z i ) − ϕ ( Z i ) (cid:1) f τm n ( W i ) (cid:107) = 2 A n + 2 A n . (A.15)Consider A n . Making use of the relation [ (cid:98) T ] k n [ T ] − k n [ g ] k n − [ (cid:98) g ] k n = n − (cid:80) i f k n ( W i )( ϕ k n ( Z i ) − Y i ) we obtain A n (cid:54) (cid:13)(cid:13) E [ f m n ( W ) e k n ( Z ) t ][ T ] − k n (cid:13)(cid:13) (cid:13)(cid:13) n − (cid:88) i f k n ( W i )( ϕ k n ( Z i ) − Y i ) (cid:13)(cid:13) + 4 (cid:13)(cid:13) E [ f m n ( W ) e k n ( Z ) t ][ T ] − k n R k n [ (cid:98) T ] − k n n − (cid:88) i f k n ( W i )( ϕ k n ( Z i ) − Y i ) (cid:13)(cid:13) + 2 (cid:107) ϕ k n − (cid:98) ϕ k n (cid:107) υ m n (cid:88) j =1 τ j k n (cid:88) l =1 υ − l | n − (cid:88) i e l ( Z i ) f j ( W i ) − [ T ] jl | = 4 B n + 4 B n + 2 B n . From Lemma A.1 of Breunig and Johannes [2011] we deduce (cid:107) n − / (cid:80) i e k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1) (cid:107) = O p ( k n ) and since (cid:107) E [ f m n ( W ) e k n ( Z ) t ][ T ] − k n (cid:107) = O (1) we have nB n = o ( ς m n ).Further, consider B n . By employing (cid:107) [ (cid:98) T ] − k (cid:107) (cid:102) k (cid:54) (cid:107) [ T ] − k (cid:107) and (cid:107) [ (cid:98) T ] − k (cid:107) Ω k (cid:54) n for all k (cid:62) B n Ω kn ( (cid:102) kn + (cid:102) ckn ) = O (cid:16) (cid:107) [ T ] − k n (cid:107) (cid:107) R k n (cid:107) (cid:107) n − (cid:88) i f k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1) (cid:107) + n (cid:107) R k n (cid:107) (cid:107) n − (cid:88) i f k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1) (cid:107) (cid:102) ckn (cid:17) . Further, since n (cid:107) R k n (cid:107) = O p ( k n ) (cf. 
Lemma A.1 of Breunig and Johannes [2011]) and n (cid:107) R k n (cid:107) (cid:107) n − / (cid:80) i e k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1) (cid:107) (cid:102) ckn = o p (1) (cf. proof of Proposition 3.1 ofBreunig and Johannes [2011]) it follows nB n Ω kn = o ( ς m n ). This together with estimate33A.5) implies nA n = o p ( ς m n ). Consider A n . We observe E A n (cid:54) (cid:107) T ( ϕ k n − ϕ ) (cid:107) W + 2 n − E (cid:107) (cid:0) ϕ k n ( Z ) − ϕ ( Z ) (cid:1) f τm n ( W ) (cid:107) (cid:54) η p d (cid:107) ϕ k n − ϕ (cid:107) υ + 2 n (cid:88) l (cid:62) l (cid:16) (cid:90) Z ( ϕ k n − ϕ )( z ) e l ( z ) ν ( z ) dz (cid:17) m n (cid:88) j =1 τ j (cid:88) l (cid:62) l − E | e l ( Z ) f j ( W ) | = O (cid:16) υ k n γ k n (cid:107) ϕ k n − ϕ (cid:107) γ + (cid:107) ϕ k n − ϕ (cid:107) γ k n nγ k n m n (cid:88) j =1 τ j (cid:17) . (A.16)where we used Lemma A.2 of Johannes and Schwarz [2010], i.e., (cid:107) ϕ k n − ϕ (cid:107) w = O ( w k n γ − k n )for a nondecreasing sequence w . Condition (5.3) together with the estimate k n (cid:54) σ (cid:80) m n j =1 τ j for n sufficiently large implies nA n = o p ( ς m n ). Consequently, due to (A.15) we have shown nIII n = o p ( ς m n ). The proof of nII n = o p ( ς m n ) is based on decomposition (A.12) where ϕ k n and E ⊥ k n ϕ are replaced by (cid:98) ϕ k n and ϕ k n − ϕ , respectively. Consider C n . We calculate C n (cid:54) (cid:107) (cid:98) ϕ k n − ϕ k n (cid:107) υ m n (cid:88) j =1 τ j (cid:12)(cid:12) (cid:88) i U i f j ( W i ) (cid:12)(cid:12)(cid:16) k n (cid:88) l =1 υ − l (cid:12)(cid:12) n − (cid:88) i e l ( Z i ) f j ( W i ) − [ T ] jl (cid:12)(cid:12) (cid:17) / Since √ n (cid:107) (cid:98) ϕ k n − ϕ k n (cid:107) υ = o p ( ς / m n ) we obtain, similarly as in the proof of Theorem 4.1, C n = o p ( ς m n ). Consider C n . 
Again similarly to the proof of Theorem 4.1 we observe C n = (cid:12)(cid:12)(cid:12) m n (cid:88) j =1 τ j k n (cid:88) l =1 [ T ] jl (cid:90) Z ( (cid:98) ϕ k n − ϕ k n )( z ) e l ( z ) ν ( z ) dz (cid:16) (cid:88) i U i f j ( W i ) (cid:17)(cid:12)(cid:12)(cid:12) (cid:54) (cid:0) n (cid:107) (cid:98) ϕ k n − ϕ k n (cid:107) υ (cid:1) / (cid:16) σ k n (cid:88) l =1 υ − l m n (cid:88) j =1 [ T ] jl (cid:17) / + o p (1) = o ( ς m n )by exploiting (cid:80) m n j =1 [ T ] jl (cid:54) η p (cid:107) T e l (cid:107) W (cid:54) d η p υ l . Consider C n . Since E [ U | W ] (cid:54) σ weconclude similarly as in inequality (A.11) using Lemma A.2 of Johannes and Schwarz [2010] E C n (cid:54) σ m n (cid:88) j =1 τ j (cid:0) E | ( ϕ k n ( Z ) − ϕ ( Z )) f j ( W ) | (cid:1) / (cid:54) η πσ √ k n √ γ k n (cid:107) ϕ k n − ϕ (cid:107) γ m n (cid:88) j =1 τ j = o ( ς m n ) . Consider C n . Again exploring the link condition T ∈ T υd,D and Lemma A.2 of Johannesand Schwarz [2010] we calculate E | C n | (cid:54) nσ m n (cid:88) j =1 [ T ( ϕ k n − ϕ )] j (cid:54) nσ (cid:107) T ( ϕ k n − ϕ ) (cid:107) W (cid:54) nσd (cid:107) ϕ k n − ϕ (cid:107) υ (cid:54) Ddρσ nυ k n γ k n (cid:107) ϕ k n − ϕ (cid:107) γ = o ( ς m n ) . Consequently, the estimates for C n , C n , C n , and C n imply nII n = o p ( ς m n ), whichcompletes the proof. 34 roof of Theorem 5.2. 
Observe [ (cid:98) T ] k n [ T ] − k n [ g ] k n − [ (cid:98) g ] k n = n − (cid:80) i e k n ( W i )( ϕ k n ( Z i ) − Y i )and hence, for all 1 (cid:54) j (cid:54) m n n − / (cid:88) i f j ( W i ) (cid:0) U i + ϕ ( Z i ) − (cid:98) ϕ k n ( Z i ) (cid:1) = n − / (cid:88) i (cid:16) f j ( W i ) U i + E (cid:2) f j ( W ) e k n ( Z ) t (cid:3) [ T ] − k n e k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1)(cid:17) − n − / (cid:88) i E (cid:2) f j ( W ) e k n ( Z ) t (cid:3) [ T ] − k n R k n [ (cid:98) T ] − k n e k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1) + (cid:16) n − (cid:88) i f j ( W i ) e k n ( Z i ) t − E (cid:2) f j ( W ) e k n ( Z ) t (cid:3)(cid:17) [ (cid:98) T ] − k n (cid:16) n − / (cid:88) i e k n ( W i ) (cid:0) ϕ k n ( Z i ) − Y i (cid:1)(cid:17) + n − / (cid:88) i (cid:0) ϕ ( Z i ) − ϕ k n ( Z i ) (cid:1) f j ( W i ) = A nj + B nj + C nj + D nj . (A.17)Consider A nj . For each j (cid:62)
1, note that (cid:107) E (cid:2) f j ( W ) e k ( Z ) t (cid:3) [ T ] − k (cid:107) is bounded uniformly in k and further that E [ e k n ( W )( ϕ k n ( Z ) − ϕ ( Z ))] = 0. Now similarly to the proof of Theorem4.2 we conclude m n (cid:88) j =1 τ j A nj = m n (cid:88) j =1 τ j (cid:12)(cid:12)(cid:12) n − / (cid:88) i U i (cid:0) f j ( W i ) − e j ( W i ) (cid:1)(cid:12)(cid:12)(cid:12) + o p (1) d → ∞ (cid:88) j =1 λ np j χ j . Moreover, as in the proof of Theorem 5.1 it can be seen that (cid:80) m n j =1 τ j ( B nj + C nj + D nj ) = o p (1), which proves the result Proof of Proposition 5.3.
Consider the case ς − m n = o (1). Further, under (5.5) we ob-serve by following the upper bound for A n in the proof of Theorem 5.1 that m n (cid:88) j =1 (cid:12)(cid:12)(cid:12) n − / (cid:88) i ( (cid:98) ϕ k n ( Z i ) − ϕ ( Z i )) f τj ( W i ) (cid:12)(cid:12)(cid:12) = m n (cid:88) j =1 n (cid:12)(cid:12)(cid:12) (cid:88) i ( (cid:98) ϕ k n ( Z i ) − ϕ k n ( Z i )) f τj ( W i ) (cid:12)(cid:12)(cid:12) + m n (cid:88) j =1 (cid:12)(cid:12)(cid:12) ς m n n (cid:88) i δ ( Z i ) f τj ( W i ) (cid:12)(cid:12)(cid:12) + o p ( ς m n )= ς m n m n (cid:88) j =1 δ j + o p ( ς m n ) . Consequently, the result follows as in the proof of Theorem 5.1. For (cid:80) m n j =1 τ j = O (1) weconclude similarly. Proof of Proposition 5.4.
Following the lines of the proof of Theorem 5.1, it can be seen that $\big\| n^{-1}\sum_i \big(\widehat\varphi_{k_n}(Z_i) - \varphi_{k_n}(Z_i)\big) f_{m_n}^\tau(W_i)\big\|^2 = o_p(1)$ with $\varphi \in \mathcal{F}_\gamma^\rho$. On the other hand, $\big\| n^{-1}\sum_i \big(\varphi_{k_n}(Z_i) - \varphi(Z_i)\big) f_{m_n}^\tau(W_i)\big\|^2 = C\,\|T(\varphi_{k_n} - \varphi)\|_W^2 + o_p(1)$. Further, since $\|T(\varphi_{k_n} - \varphi)\|_W = \|g - T\varphi\|_W + o(1)$, the result follows as in the proof of Proposition 3.4.

Proof of Proposition 5.5.
We make use of inequality (A.8) where $\phi(\cdot, \widehat\vartheta_n)$ is replaced by $\widehat\varphi_{k_n}$. From the proof of Theorem 5.1 we infer $\big\| n^{-1/2}\sum_i \big(\widehat\varphi_{k_n}(Z_i) - \varphi(Z_i)\big) f_{m_n}^\tau(W_i)\big\|^2 = o_p(\varsigma_{m_n})$ and
$$
\Big\langle n^{-1}\sum_i \big(\varphi(Z_i) - \widehat\varphi_{k_n}(Z_i)\big) f_{m_n}^\tau(W_i),\ \sum_i U_i f_{m_n}^\tau(W_i)\Big\rangle
= \Big\langle n^{-1}\sum_i \big(\varphi(Z_i) - \varphi(Z_i)\big) f_{m_n}^\tau(W_i),\ \sum_i U_i f_{m_n}^\tau(W_i)\Big\rangle + o_p(\varsigma_{m_n})
$$
uniformly over all $\varphi \in \mathcal{J}_n^\rho$. Consequently, following line by line the proof of Proposition 2.6, the assertion follows.

A.5. Technical assertions.
Let us introduce $X_{ii'} := \sqrt{2}\,(\varsigma_{m_n} n)^{-1}\sum_{j=1}^{m_n} U_i U_{i'} f_j^\tau(W_i) f_j^\tau(W_{i'})$ and
$$
Q_{ni} := \begin{cases} \sum_{l=1}^{i-1} X_{li}, & \text{for } i = 2, \dots, n, \\ 0, & \text{for } i = 1 \text{ and } i > n. \end{cases}
\tag{A.18}
$$
Then clearly
$$
(\sqrt{2}\,\varsigma_{m_n} n)^{-1}\sum_{i \neq i'}\sum_{j=1}^{m_n} U_i U_{i'} f_j^\tau(W_i) f_j^\tau(W_{i'})
= \sqrt{2}\,(\varsigma_{m_n} n)^{-1}\sum_{i < i'}\sum_{j=1}^{m_n} U_i U_{i'} f_j^\tau(W_i) f_j^\tau(W_{i'})
= \sum_{i=1}^{n} Q_{ni}.
$$
Let $B_{ni}$, $1 \leqslant i \leqslant n$, $n \geqslant 1$, be the $\sigma$-algebra generated by $(Z_1, Y_1, W_1), \dots, (Z_i, Y_i, W_i)$. Since $U_i f_j^\tau(W_i)$, $1 \leqslant i \leqslant n$, are centered random variables, it follows that $\big\{\big(\sum_{i'=1}^{i} Q_{ni'}, B_{ni}\big), i \geqslant 1\big\}$ is a martingale for each $n \geqslant 1$ and hence $\{(Q_{ni}, B_{ni}), i \geqslant 1\}$ is a martingale difference array for each $n \geqslant 1$. Moreover, it satisfies the conditions of Proposition A.1 as shown in the following technical result.
Proposition A.1. If $\{(Q_{ni}, B_{ni}), i \geqslant 1\}$ is a martingale difference array for each $n \geqslant 1$ satisfying conditions
$$
\sum_{i=1}^{\infty} E|Q_{ni}|^2 \leqslant 1 \quad \text{for all } n \geqslant 1, \tag{A.19}
$$
$$
\sum_{i=1}^{\infty} Q_{ni}^2 = 1 + o_p(1), \tag{A.20}
$$
$$
\sup_{i \geqslant 1} |Q_{ni}| = o_p(1), \tag{A.21}
$$
then $\sum_{i=1}^{\infty} Q_{ni} \xrightarrow{d} \mathcal{N}(0, 1)$.

Proof. See Awad [1981].

Note that this result has also been applied by Ghorai [1980] to establish asymptotic normality of an orthogonal series type density estimator.
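The following is a minimal numerical sketch of the array in (A.18), not the paper's Monte Carlo design. It assumes a toy setup chosen only for illustration: $W$ uniform on $[0,1]$, $U$ standard normal and independent of $W$, the cosine basis $f_j(w) = \sqrt{2}\cos(j\pi w)$, and weights $\tau_j = 1$, so that $E[U^2 f_j(W) f_{j'}(W)] = 1$ if $j = j'$ and $0$ otherwise, and hence $\varsigma_{m}^2 = m$. The function names are hypothetical. Under these assumptions $\sum_i Q_{ni}$ has mean zero and variance $1 - 1/n$, which the simulation checks approximately.

```python
import numpy as np


def sum_second_moments(n):
    """Sum over i of E|Q_ni|^2 = 2(i-1)/n^2, which equals 1 - 1/n."""
    i = np.arange(1, n + 1)
    return float(np.sum(2.0 * (i - 1) / n**2))


def simulate_statistic(n, m, reps, seed=0):
    """Draw `reps` realisations of sum_i Q_ni in the toy setup.

    Assumptions (illustrative only): W ~ U[0,1], U ~ N(0,1) independent of W,
    f_j(w) = sqrt(2) cos(j pi w), tau_j = 1, hence varsigma_m = sqrt(m).
    """
    rng = np.random.default_rng(seed)
    j = np.arange(1, m + 1)
    varsigma = np.sqrt(m)
    out = np.empty(reps)
    for r in range(reps):
        w = rng.uniform(size=n)
        u = rng.standard_normal(n)
        # v[i, j] = U_i * f_j(W_i), an (n x m) array
        v = u[:, None] * np.sqrt(2.0) * np.cos(np.pi * j[None, :] * w[:, None])
        total = v.sum(axis=0)
        # sum over pairs i < i' of the inner product <v_i, v_i'>
        off_diag = (total @ total - np.sum(v * v)) / 2.0
        # sum_i Q_ni = sqrt(2) / (varsigma * n) * sum_{i<i'} sum_j v_ij v_i'j
        out[r] = np.sqrt(2.0) * off_diag / (varsigma * n)
    return out


if __name__ == "__main__":
    draws = simulate_statistic(n=200, m=10, reps=2000)
    print("mean:", draws.mean(), "variance:", draws.var())
    print("sum of second moments:", sum_second_moments(200))
```

With $m$ held fixed the limit of $\sum_i Q_{ni}$ is a centered, scaled chi-square rather than a normal law, which is why the normal approximation is tied to the case $\varsigma_{m_n}^{-1} = o(1)$; the sketch only checks the first two moments.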
Lemma A.2. Let $Q_{ni}$ be defined as in (A.18). Let Assumptions 1–4 be satisfied and assume $\big(\sum_{j=1}^{m_n} \tau_j\big)^3 = o(n)$. Then conditions (A.19)–(A.21) hold true.

Proof. Proof of (A.19). Observe that $E[X_{1i}X_{1i'}] = 0$ for $i \neq i'$ and thus, for $i = 2, \dots, n$, we have
$$
\begin{aligned}
E|Q_{ni}|^2 = E|X_{1i} + \dots + X_{i-1,i}|^2 = (i-1)\,E|X_{12}|^2
&= \frac{2(i-1)}{n^2\varsigma_{m_n}^2}\, E\Big|\sum_{j=1}^{m_n} U_1 f_j^\tau(W_1)\, U_2 f_j^\tau(W_2)\Big|^2 \\
&= \frac{2(i-1)}{n^2\varsigma_{m_n}^2} \sum_{j,j'=1}^{m_n} \big(E[U^2 f_j^\tau(W) f_{j'}^\tau(W)]\big)^2 = \frac{2(i-1)}{n^2}
\end{aligned}
$$
by the definition of $\varsigma_{m_n}$. Thereby, we conclude
$$
\sum_{i=1}^{n} E|Q_{ni}|^2 = \frac{2}{n^2}\sum_{i=1}^{n-1} i = \frac{n(n-1)}{n^2} = 1 - \frac{1}{n}
\tag{A.22}
$$
which proves (A.19).

Proof of (A.20). Using relation (A.22) we observe
$$
E\Big|\sum_{i=1}^{n} Q_{ni}^2 - 1\Big|^2 = \sum_{i=1}^{n} E\,Q_{ni}^4 + 2\sum_{i<i'} \cdots
$$
[…] $(i-2) = n(n-1)(n-2)$ we conclude
$$
I_n \leqslant \frac{4}{n^4\varsigma_{m_n}^4}\Big(\sum_{j=1}^{m_n}\tau_j\Big)^3\Big( n(n-1)\sum_{j=1}^{m_n}\tau_j\big(E|Uf_j(W)|^4\big)^2 + n(n-1)(n-2)\sum_{j=1}^{m_n} s_{jj}\,E|Uf_j(W)|^4\Big).
$$
Therefore, since $\max_{j \geqslant 1} E|Uf_j(W)|^4$ is bounded by a constant depending only on $\eta_f$, $\eta_p$ and $\sigma$, and since $\sum_{j=1}^{m_n}\tau_j = o(n^{1/3})$, we obtain $I_n = o(1)$.

Consider $II_n$. We calculate for $i < i'$
$$
\begin{aligned}
Q_{ni}^2 Q_{ni'}^2
&= \Big(\sum_{k=1}^{i-1} X_{ki}^2\Big)\Big(\sum_{k=1}^{i'-1} X_{ki'}^2\Big)
+ \Big(\sum_{k=1}^{i-1} X_{ki}^2\Big)\Big(\sum_{k \neq k'}^{i'-1} X_{ki'}X_{k'i'}\Big) \\
&\quad + \Big(\sum_{k \neq k'}^{i-1} X_{ki}X_{k'i}\Big)\Big(\sum_{k=1}^{i'-1} X_{ki'}^2\Big)
+ \Big(\sum_{k \neq k'}^{i-1} X_{ki}X_{k'i}\Big)\Big(\sum_{k \neq k'}^{i'-1} X_{ki'}X_{k'i'}\Big) \\
&=: A_{ii'} + B_{ii'} + C_{ii'} + D_{ii'}.
\end{aligned}
$$
Consider $A_{ii'}$. Exploiting relation (A.22) and using $\sum_{i<i'}(i-1) = \sum_{i'=1}^{n}(i'-1)(i'-2)/2 = n(n-1)(n-2)/6$ and $\sum_{i<i'}(i-1)(i'-3) = \sum_{i'=1}^{n}(i'-1)(i'-2)(i'-3)/2 = n(n-1)(n-2)(n-3)/8$, we obtain
$$
\begin{aligned}
2\sum_{i<i'} E A_{ii'}
&= 4\,E\big[X_{12}^2X_{13}^2\big]\sum_{i<i'}(i-1) + 2\big(E X_{12}^2\big)^2\sum_{i<i'}(i-1)(i'-3) + o(1) \\
&= \frac{8\,n(n-1)(n-2)}{3\,n^4\varsigma_{m_n}^4}\Big(\sum_{j,j',l,l'=1}^{m_n} s_{jj'}s_{ll'}\,E\big[U^4 f_j^\tau(W)f_{j'}^\tau(W)f_l^\tau(W)f_{l'}^\tau(W)\big]\Big) + \frac{n(n-1)(n-2)(n-3)}{n^4} + o(1).
\end{aligned}
$$
Moreover, applying the Cauchy-Schwarz inequality twice gives
$$
\sum_{j,j',l,l'=1}^{m_n} s_{jj'}s_{ll'}\,E\big[U^4 f_j^\tau(W)f_{j'}^\tau(W)f_l^\tau(W)f_{l'}^\tau(W)\big]
\leqslant \max_{1 \leqslant j \leqslant m_n} E|Uf_j(W)|^4\Big(\sum_{j,j'=1}^{m_n}\sqrt{\tau_j\tau_{j'}}\,s_{jj'}\Big)^2
\leqslant \max_{1 \leqslant j \leqslant m_n} E|Uf_j(W)|^4\ \varsigma_{m_n}^2\Big(\sum_{j=1}^{m_n}\tau_j\Big)^2 .
$$
Thereby, it holds $2\sum_{i<i'}$ […] $(i-2) = \dfrac{2\sigma^4\eta_p^2\,n(n-1)(n-2)(n-3)}{\varsigma_{m_n}^2 n^4} = o(1)$ and hence $2\sum_{i<i'}$ […] $> \varepsilon\big) \leqslant \sum_{i=1}^{n} P\big(Q_{ni}^2 > \varepsilon\big)$ and, hence, the assertion follows from the Markov inequality.

References
C. Ai and X. Chen. Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71:1795–1843, 2003.
A. M. Awad. Conditional central limit theorems for martingales and reversed martingales. The Indian Journal of Statistics, Series A, 43:100–106, 1981.
R. Blundell and J. Horowitz. A nonparametric test of exogeneity. Review of Economic Studies, 74(4):1035–1058, Oct 2007.
R. Blundell, X. Chen, and D. Kristensen. Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75(6):1613–1669, 2007.
C. Breunig and J. Johannes. Adaptive estimation of functionals in nonparametric instrumental regression. Technical report, University of Mannheim (submitted), 2011.
X. Chen and M. Reiß. On rate optimality for ill-posed inverse problems in econometrics. Econometric Theory, 27(03):497–521, 2011.
S. Darolles, Y. Fan, J. P. Florens, and E. M. Renault. Nonparametric instrumental regression. Econometrica, 79(5):1541–1565, 2011.
S. G. Donald, G. Imbens, and W. K. Newey. Empirical likelihood estimation and consistent tests with conditional moment restrictions. Journal of Econometrics, 117(1):55–93, 2003.
J.-P. Florens. Inverse problems and structural econometrics: The example of instrumental variables. In M. Dewatripont, L. P. Hansen, and S. J. Turnovsky, editors, Advances in Economics and Econometrics: Theory and Applications – Eighth World Congress, volume 36 of Econometric Society Monographs. Cambridge University Press, 2003.
J. P. Florens, J. Johannes, and S. Van Bellegem. Identification and estimation by penalization in nonparametric instrumental regression. Econometric Theory, 27(03):472–496, 2011.
P. Gagliardini and O. Scaillet. A specification test for nonparametric instrumental variable regression. Swiss Finance Institute Research Paper No. 07-13, 2007.
P. Gagliardini and O. Scaillet. Tikhonov regularization for nonparametric instrumental variable estimators. Journal of Econometrics, 167:61–75, 2011.
J. Ghorai. Asymptotic normality of a quadratic measure of the orthogonal series type density estimate. Annals of the Institute of Statistical Mathematics, 32:341–350, 1980.
P. Hall and J. L. Horowitz. Nonparametric methods for inference in the presence of instrumental variables. Annals of Statistics, 33:2904–2929, 2005.
Y. Hong and H. White. Consistent specification testing via nonparametric series regression. Econometrica, 63:1133–1159, 1995.
J. L. Horowitz. Testing a parametric model against a nonparametric alternative with identification through instrumental variables. Econometrica, 74(2):521–538, 2006.
J. L. Horowitz. Applied nonparametric instrumental variables estimation. Econometrica, 79(2):347–394, 2011.
J. L. Horowitz. Specification testing in nonparametric instrumental variables estimation. Journal of Econometrics, 167:383–396, 2012.
J. L. Horowitz and V. G. Spokoiny. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica, 69(3):599–631, 2001.
J. Johannes and M. Schwarz. Adaptive nonparametric instrumental regression by model selection. Technical report, Université catholique de Louvain, 2010.
W. K. Newey. Convergence rates and asymptotic normality for series estimators. Journal of Econometrics, 79(1):147–168, 1997.
W. K. Newey and J. L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71:1565–1578, 2003.
A. Santos. Inference in nonparametric instrumental variables with partial identification. Econometrica, 80(1):213–275, 2012.
R. J. Serfling. Approximation theorems of mathematical statistics. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ, 1981.
G. Tripathi and Y. Kitamura. Testing conditional moment restrictions.