Adaptive Estimation of Quadratic Functionals in Nonparametric Instrumental Variable Models
Christoph Breunig†   Xiaohong Chen‡

First version: August 2018; Revised version: January 2021
Abstract. This paper considers adaptive estimation of quadratic functionals in nonparametric instrumental variables (NPIV) models. Minimax estimation of a quadratic functional of a NPIV function is an important problem in optimal estimation of a nonlinear functional of an ill-posed inverse regression with an unknown operator using one random sample. We first show that a leave-one-out, sieve NPIV estimator of the quadratic functional proposed by Breunig and Chen [2020] attains a convergence rate that coincides with the lower bound previously derived by Chen and Christensen [2018]. The minimax rate is achieved by the optimal choice of a key tuning parameter (sieve dimension) that depends on unknown NPIV model features. We next propose a data-driven choice of the tuning parameter based on Lepski's method. The adaptive estimator attains the minimax optimal rate in the severely ill-posed case and in the regular, mildly ill-posed case, but up to a multiplicative √(log n) in the irregular, mildly ill-posed case.

Keywords: nonparametric instrumental variables, ill-posed inverse, quadratic functional, minimax estimation, leave-one-out, adaptation, Lepski's method.

∗ We are grateful to Volodia Spokoiny for his friendship and insightful comments, and admire his creativity and high standard on research. We thank Cristina Butucea, Tim Christensen, Richard Nickl and Sasha Tsybakov for discussions. We acknowledge helpful comments from participants at econometrics workshops at NYU/Stern, NUS, Chicago/Booth, and at the conference "Celebrating Whitney Newey's Contributions to Econometrics" at MIT in May 2019, the North American Summer Meeting of the Econometric Society in Seattle in June 2019, and the conference "Foundations of Modern Statistics" on the occasion of Volodia Spokoiny's 60th birthday at WIAS in November 2019.

† Department of Economics, Emory University, Rich Memorial Building, Atlanta, GA 30322, USA. Email: [email protected]

‡ Cowles Foundation for Research in Economics, Yale University, Box 208281, New Haven, CT 06520, USA. E-mail: [email protected]

1. Introduction

Long before the recent popularity of instrumental variables in modern machine learning, causal inference and biostatistics, the instrumental variables technique has been widely used in economics. For instance, instrumental variables regressions are frequently used to account for omitted variables, mis-measured regressors, endogeneity in simultaneous equations and other complex situations in observational data. In economics and other social sciences, as well as in medical research, it is very difficult to estimate causal effects using observational data sets alone. When treatment assignment is not randomized, it is generally impossible to discern between the causal effect of treatments and spurious correlations that are induced by unobserved factors. Instrumental variables are commonly used to provide exogenous variation that is associated with the treatment status, but not with the outcome variable (beyond its direct effect on the treatments).

To avoid mis-specification of parametric functional forms, nonparametric instrumental variables (NPIV) regressions have gained popularity in econometrics and modern causal inference in statistics and machine learning.
The simplest NPIV model assumes that a random sample {(Y_i, X_i, W_i)}_{i=1}^n is drawn from a joint distribution of (Y, X, W) satisfying

    Y = h(X) + U,   E[U | W] = 0,   (1.1)

where h is an unknown measurable function, X is a d-dimensional vector of endogenous regressors in the sense that E[U | X] ≠ 0, and W is a vector of conditioning variables (instrumental variables) such that E[U | W] = 0. The structural function h can be identified as a solution to an integral equation of the first kind with an unknown operator:

    E[Y | W = w] = (Th)(w) := ∫ h(x) f_{X|W}(x|w) dx,

where the conditional density f_{X|W} (and hence the conditional expectation operator T) is unknown. Under mild conditions, the conditional density f_{X|W} is continuous and the operator T is compact. This makes the estimation of h an ill-posed inverse problem with an unknown operator T. See, for example, Newey and Powell [2003], Carrasco et al. [2007], Blundell et al. [2007] and Horowitz [2011].

This paper considers minimax rate-optimal estimation of a quadratic functional of h in the NPIV model (1.1):

    f(h) := ∫ h²(x) µ(x) dx   (1.2)

for a known non-negative weighting function µ, which is assumed to be uniformly bounded from below on its support. Although Chen and Pouzo [2015] and Chen and Christensen [2018] already studied plug-in sieve NPIV estimation and inference on nonlinear functionals of h, there are no results on the minimax rate-optimal estimation of any nonlinear functional of h yet. Since a quadratic functional is a leading example of a nonlinear functional and is also widely used in goodness-of-fit testing on h, this paper focuses on optimal estimation of the quadratic functional f(h) in the NPIV model.

In this paper, we first analyze a simple leave-one-out, sieve NPIV estimator f̂_J for the quadratic functional f(h) recently proposed by Breunig and Chen [2020]. We establish an upper bound on the convergence rate for this estimator, and show that it coincides with the lower bound previously derived by Chen and Christensen [2018]. Thus, the leave-one-out, sieve NPIV estimator f̂_J is minimax rate-optimal for f(h). Depending on the smoothing properties of the conditional expectation operator T, we distinguish between the mildly and severely ill-posed cases. For the severely ill-posed case, the minimax optimal convergence rate for estimating f(h) is of order (log n)^{−a}, where a > 0 depends on the smoothness of h and the degree of ill-posedness. For the mildly ill-posed case, the optimal convergence rate for estimating f(h) exhibits the so-called elbow phenomenon: the rate is of order n^{−1/2} for the regular, mildly ill-posed case, while it is of order n^{−b}, where 0 < b < 1/2 depends on the smoothness of h, the dimension of X and the degree of ill-posedness, for the irregular, mildly ill-posed case.

As already noticed in Chen and Christensen [2018], the minimax convergence rate for the severely ill-posed case is so slow that even a plug-in sieve NPIV estimator f(ĥ) of f(h) achieves it, where ĥ is the sieve NPIV estimator of h. However, the plug-in estimator f(ĥ) fails to achieve the minimax rate for the mildly ill-posed case. Nevertheless, we show that the simple leave-one-out sieve NPIV estimator f̂_J of f(h) is minimax rate-optimal regardless of the degree of ill-posedness. The minimax optimal rate is achieved by the optimal choice of a key tuning parameter (sieve dimension) J that depends on the unknown smoothness of the NPIV function h and the unknown degree of ill-posedness.
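To fix ideas, the following minimal simulation sketch generates data from (1.1) and evaluates the target (1.2); the particular structural function, the instrument design, and the uniform weighting µ ≡ 1 on [0, 1] are our own illustrative choices, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def h(x):
    # illustrative structural function (any smooth choice would do)
    return np.sin(np.pi * x)

W = rng.uniform(0, 1, n)                 # instrument
V = rng.normal(0, 0.2, n)                # unobserved confounder
U = V + 0.1 * rng.normal(0, 1, n)        # E[U | W] = 0 since (V, noise) independent of W
X = np.clip(W + V, 0, 1)                 # X endogenous: E[U | X] != 0 through V
Y = h(X) + U

# target functional f(h) = int_0^1 h(x)^2 mu(x) dx with mu = 1,
# approximated by a Riemann sum on a fine grid
grid = np.linspace(0, 1, 100_001)
print(np.mean(h(grid) ** 2))             # approx 0.5 for h(x) = sin(pi x)
```

Estimating f(h) from {(Y_i, X_i, W_i)} alone is the ill-posed problem studied below; a naive regression of Y on X would be biased because X and U are dependent.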
We next propose a data-driven choice of the tuning parameter (sieve dimension) J based on Lepski's principle. The adaptive, leave-one-out sieve NPIV estimator f̂_Ĵ of f(h) is shown to attain the minimax optimal rate in the severely ill-posed case and in the regular, mildly ill-posed case, but up to a multiplicative √(log n) in the irregular, mildly ill-posed case. We note that even for adaptive estimation of a quadratic functional of an unknown density or a regression function from a random sample, Efromovich and Low [1996] showed that the extra √(log n) factor is the necessary price to pay for adaptation to the unknown smoothness of the function.

Previously, for the nonparametric estimation of the NPIV function h in the model (1.1), Horowitz [2014] considers adaptive estimation of h in L² norm using a model selection procedure. Breunig and Johannes [2016] consider adaptive estimation of a linear functional of h in root-mean squared error metric using a combined model selection and Lepski method. These papers obtain adaptive rates of convergence up to a factor of √(log n) (of the minimax optimal rate) in both the severely ill-posed and mildly ill-posed cases. Chen and Christensen [2015] propose adaptive estimation of h in L^∞ norm using Lepski's method, and show that their data-driven procedure attains the minimax optimal rate in sup-norm and is thus fully adaptive. Our data-driven choice of the sieve dimension is closest to that of Chen and Christensen [2015], which might explain why we also obtain minimax optimal adaptivity for quadratic functionals of a NPIV function h.

Minimax rate-optimal estimation of a quadratic functional in density and regression settings has a long history in statistics. The elbow phenomenon was perhaps first discovered by Bickel and Ritov [1988] in estimation of the integrated square of a density. An analogous result was established by Donoho and Nussbaum [1990] in estimation of quadratic functionals in Gaussian sequence models. Efromovich and Low [1996], Laurent [1996] and Laurent and Massart [2000] establish that Pinsker-type estimators of quadratic functionals can achieve the minimax optimal rates of convergence. For estimation of an integrated squared density, Giné and Nickl [2008] showed that a leave-one-out estimator attains the optimal rate and provided an adaptive estimator based on Lepski's method. Collier et al. [2017] consider minimax rate-optimal estimation of a quadratic functional under sparsity constraints. Minimax rate-optimal estimation of a quadratic functional has also been analyzed in density deconvolution and inverse regression in Gaussian sequence models. See, for example, Butucea [2007], Butucea and Meziani [2011] and Kroll [2019]. As far as we know, there is no published work on minimax rate-optimal adaptive estimation of a quadratic functional in NPIV models yet.

The rest of the paper is organized as follows. Section 2 presents the leave-one-out sieve NPIV estimator of the quadratic functional f(h). It also derives the minimax optimal convergence rates for our estimator of f(h). Section 3 first presents a simple data-driven procedure for choosing tuning parameters based on Lepski's method. It then establishes the convergence rates of our adaptive estimator of the quadratic functional. Section 4 briefly concludes. All proofs can be found in the Appendices A–C.
2. Minimax Optimal Quadratic Functional Estimation
This section consists of three parts. Subsection 2.1 first recalls a lower bound result in Chen and Christensen [2018] for estimation of a quadratic functional of the NPIV function h. Subsection 2.2 introduces a simple leave-one-out, sieve NPIV estimator of the quadratic functional f(h). Subsection 2.3 establishes the convergence rate of the proposed estimator, and shows that the rate coincides with the lower bound and hence is optimal.

We first introduce notation. For any random vector V with support 𝒱, we let L²(V) = {φ : 𝒱 → R, ‖φ‖_{L²(V)} < ∞} with the norm ‖φ‖_{L²(V)} = √(E[φ²(V)]). If {a_n} and {b_n} are sequences of positive numbers, we use the notation a_n ≲ b_n if limsup_{n→∞} a_n/b_n < ∞, and a_n ∼ b_n if a_n ≲ b_n and b_n ≲ a_n. Denote L²_µ = {φ : [0,1]^d → R, ‖φ‖_µ < ∞} with the norm ‖φ‖_µ = √(∫ φ²(x) µ(x) dx). Let {ψ̃_j}_{j≥1} be an orthonormal basis and ψ̃_{j,k,G} be a tensor-product CDV wavelet basis in L²_µ. For simplicity we assume that the structural function h belongs to the Sobolev ellipsoid

    H(p, L) = { φ ∈ L²_µ : Σ_{j=1}^∞ j^{2p/d} ⟨φ, ψ̃_j⟩²_µ ≤ L },   for 0 < p, L < ∞.

Finally, we let T : L²(X) → L²(W) denote the conditional expectation operator given by (Th)(w) = E[h(X) | W = w].

Condition LB. (i) E[U² | W = w] ≥ σ² > 0 uniformly for w ∈ 𝒲; (ii) there is some decreasing function ν such that ‖Th‖²_{L²(W)} ≲ Σ_{j,G,k} [ν(2^j)]² ⟨h, ψ̃_{j,k,G}⟩²_µ for all h ∈ H(p, L); and (iii) T[h₁ − h₂] = 0 for any h₁, h₂ ∈ L²_µ implies that f(h₁) = f(h₂).

Condition LB (ii) specifies the smoothing properties of the conditional expectation operator relative to ψ̃_{j,k,G}. This assumption is commonly imposed in the related literature; see also Chen and Reiß [2011] for an overview. As in the related literature, we distinguish between a mildly ill-posed case where ν(t) = t^{−ζ} and a severely ill-posed case where ν(t) = exp(−t^ζ) for some ζ > 0.
Condition LB (iii) ensures identification of the nonlinear functional f(h). The next result was established by Chen and Christensen [2018, Theorem C.1] and hence its proof is omitted.

Lemma 2.1.
Let Condition LB be satisfied. Then, we have

    liminf_{n→∞} inf_{f̃} sup_{h∈H(p,L)} P_h( |f̃ − f(h)| > c r_n ) ≥ c′ > 0,

for some constant c > 0, where inf_{f̃} is the infimum over all estimators that depend on the sample of size n, and

1. Mildly ill-posed case:

    r_n = n^{−4p/(4(p+ζ)+d)}, if p ≤ ζ + d/4;   r_n = n^{−1/2}, if p > ζ + d/4.   (2.1)
2. Severely ill-posed case:

    r_n = (log n)^{−2p/ζ}.   (2.2)

According to Lemma 2.1, there is the so-called elbow phenomenon in the mildly ill-posed case (2.1): the regular case with a parametric rate of n^{−1/2} when p > ζ + d/4, and the irregular case with a slower, nonparametric rate when p ≤ ζ + d/4.

Let {(Y_i, X_i, W_i)}_{i=1}^n denote a random sample from the NPIV model (1.1). The sieve NPIV (or series 2SLS) estimator ĥ of h can be written in matrix form as

    ĥ(·) = ψ^J(·)′ [Ψ′B(B′B)^− B′Ψ]^− Ψ′B(B′B)^− B′Y,

where Y = (Y₁, ..., Y_n)′ and

    ψ^J(x) = (ψ₁(x), ..., ψ_J(x))′,   Ψ = (ψ^J(X₁), ..., ψ^J(X_n))′,
    b^K(w) = (b₁(w), ..., b_K(w))′,   B = (b^K(W₁), ..., b^K(W_n))′,

and {ψ₁, ..., ψ_J} and {b₁, ..., b_K} are collections of basis functions of dimension J and K used for approximating h and the instrument space, respectively; we let Ψ_J = clsp{ψ₁, ..., ψ_J} denote the sieve space. Based on many simulation results in Blundell et al. [2007] and the minimax sup-norm rate results in Chen and Christensen [2018], the crucial regularization parameter is the dimension J of the sieve space used to approximate the unknown function h. In this paper, we keep the relationship of J and K fixed, i.e., the function K(·) does not depend on the sample size and satisfies K(J) ≥ J for all J. As pointed out by Chen and Christensen [2018], although one could estimate f(h) by the plug-in sieve NPIV estimator f(ĥ), it fails to achieve the lower bound in Lemma 2.1.

Recently, Breunig and Chen [2020] propose a leave-one-out, sieve NPIV estimator for the quadratic functional f(h) as follows:

    f̂_J = (2/(n(n−1))) Σ_{1≤i<i′≤n} Y_i b^K(W_i)′ Â′ G_µ Â b^K(W_{i′}) Y_{i′},

where Â = [Ŝ′Ĝ_b^−Ŝ]^− Ŝ′Ĝ_b^−, with Ŝ = B′Ψ/n and Ĝ_b = B′B/n denoting the sample analogs of S = E[b^K(W) ψ^J(X)′] and G_b = E[b^K(W) b^K(W)′], and with G_µ = ∫ ψ^J(x) ψ^J(x)′ µ(x) dx. The sieve measure of ill-posedness is given by

    τ_J = sup_{h∈Ψ_J : h≠0} ‖h‖_µ / ‖Th‖_{L²(W)}.

Following the literature, we call the NPIV model (i) mildly ill-posed if τ_j ∼ j^{ζ/d} for some ζ > 0; and
(ii) severely ill-posed if τ_j ∼ exp(j^{ζ/d}) for some ζ > 0.

Assumption 1. (i) h ∈ H(p, L) and ∪_J Ψ_J is dense in (H(p, L), ‖·‖_µ); (ii) Condition LB (iii) is satisfied; and (iii) sup_{w∈𝒲} E[U² | W = w] ≤ σ̄² < ∞ and E[U⁴] < ∞.

We introduce the projections Π_J h(x) = ψ^J(x)′ G_µ^{−1} ⟨ψ^J, h⟩_µ. Let ζ_{ψ,J} = sup_x ‖G_µ^{−1/2} ψ^J(x)‖ and ζ_{b,K} = sup_w ‖G_b^{−1/2} b^K(w)‖. Denote ζ_J = max(ζ_{ψ,J}, ζ_{b,K}) for K = K(J).

Assumption 2. (i) τ_J ζ_J √((log J)/n) = O(1); (ii) ζ_J √(log J) ‖h − Π_J h‖_µ = O(1).

Assumption 3. (i) sup_{h∈Ψ_J} ‖(Π_K T − T)h‖_{L²(W)} / ‖h‖_µ = o(τ_J^{−1}); (ii) there exists a constant C > 0 such that for all j ≥ 1: τ_j ‖T(h − Π_j h)‖_{L²(W)} ≤ C ‖h − Π_j h‖_µ.

In the following, we introduce the vector τ_J = (τ₁, ..., τ_J)′, where τ_j > 0 for all 1 ≤ j ≤ J and τ_j ≤ τ_{j′} for all 1 ≤ j < j′ ≤ J. For an r × c matrix A with r ≤ c and full row rank r, we let A_l^− denote its left pseudoinverse, namely (A′A)^− A′, where ′ denotes transpose and − denotes the generalized inverse. Below, ‖·‖ denotes the vector ℓ² norm when applied to vectors and the operator norm induced by the vector ℓ² norm when applied to matrices.

Assumption 4. ‖diag(τ_J)^{−1} (G_b^{−1/2} S G_µ^{−1/2})_l^−‖ ≤ D for some constant D > 0.

Discussion of Assumptions: Assumption 1 (i) imposes a regularity condition on the structural function h. Assumption 2 imposes bounds on the growth of the basis functions, which are known for commonly used bases. For instance, ζ_{b,K} = O(√K) and ζ_{ψ,J} = O(√J) for (tensor-product) polynomial spline, wavelet and cosine bases, and ζ_{b,K} = O(K) and ζ_{ψ,J} = O(J) for (tensor-product) orthogonal polynomial bases. Assumption 3 (i) is a mild condition on the approximation properties of the basis used for the instrument space. In fact, ‖(Π_K T − T)h‖_{L²(W)} = 0 for all h ∈ Ψ_J when the basis functions for clsp{b₁, ..., b_K} and Ψ_J form either a Riesz basis or an eigenfunction basis for the conditional expectation operator. Assumption 3 (ii) is the usual L² "stability condition" imposed in the NPIV literature (cf. Assumption 6 in Blundell et al. [2007]). Note that Assumption 3 (ii) is also automatically satisfied by Riesz bases. Assumption 4 is a modification of the sieve measure of ill-posedness and was used by Efromovich and Koltchinskii [2001]. Assumption 4 is also related to the extended link condition in Breunig and Johannes [2016] to establish optimal upper bounds in the context of minimax optimal estimation of linear functionals in NPIV.

The next result provides the rate of convergence of our estimator f̂_J.

Theorem 2.1.
Let Assumptions 1–4 be satisfied and assume J ∼ K. Then, we have

    |f̂_J − f(h)| = O_p( τ_J² √J / n + n^{−1/2} ( Σ_{j=1}^J τ_j² ⟨h, ψ̃_j⟩²_µ + τ_J² Σ_{j>J} ⟨h, ψ̃_j⟩²_µ )^{1/2} + J^{−2p/d} ).   (2.3)
1. Mildly ill-posed case: choosing J ∼ n^{2d/(4(p+ζ)+d)} implies

    |f̂_J − f(h)| = O_p( n^{−4p/(4(p+ζ)+d)} ), if p ≤ ζ + d/4;   |f̂_J − f(h)| = O_p( n^{−1/2} ), if p > ζ + d/4.   (2.4)
2. Severely ill-posed case: choosing J ∼ ( (log n)/2 − ((4p+d)/(4ζ)) log log n )^{d/ζ} implies

    |f̂_J − f(h)| = O_p( (log n)^{−2p/ζ} ).   (2.5)

Theorem 2.1 provides an extension of Breunig and Chen [2020, Theorem F.1] under Assumption 4. Theorem 2.1 also provides concrete convergence rates of f̂_J once J is chosen optimally in either the mildly ill-posed case or the severely ill-posed case. Within the mildly ill-posed case, depending on the smoothness of h relative to the dimension of X and the degree of mild ill-posedness ζ, either the first or the second variance term is dominating, which then leads to the so-called elbow phenomenon. In particular, the simple estimator f̂_J of Breunig and Chen [2020] achieves the lower bound rate in Lemma 2.1 and is thus minimax rate-optimal.
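To fix implementation details, here is a minimal sketch of the estimator from Section 2.2: the series 2SLS coefficient matrix and the leave-one-out U-statistic. The cosine basis (so that G_µ is the identity for µ ≡ 1 on [0, 1]), the choice K(J) = 2J, the use of pseudoinverses, and the function names cos_basis and npiv_quadratic are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def cos_basis(t, dim):
    """Cosine basis on [0, 1]: 1, sqrt(2)cos(pi t), sqrt(2)cos(2 pi t), ...
    Orthonormal under mu = 1, so G_mu is the identity matrix."""
    j = np.arange(dim)
    return np.where(j == 0, 1.0, np.sqrt(2) * np.cos(np.pi * j * t[:, None]))

def npiv_quadratic(Y, X, W, J, K=None):
    """Leave-one-out sieve NPIV estimate of f(h) = int h(x)^2 dx."""
    n = len(Y)
    K = K or 2 * J                       # keep K(J) >= J, fixed relative to J
    Psi = cos_basis(X, J)                # n x J matrix with rows psi^J(X_i)'
    B = cos_basis(W, K)                  # n x K matrix with rows b^K(W_i)'
    # sample analogs S_hat = B'Psi/n and Gb_hat = B'B/n, and the 2SLS
    # coefficient matrix A_hat = [S'G_b^- S]^- S'G_b^- (a J x K matrix)
    S_hat, Gb_hat = B.T @ Psi / n, B.T @ B / n
    Gb_pinv = np.linalg.pinv(Gb_hat)
    A_hat = np.linalg.pinv(S_hat.T @ Gb_pinv @ S_hat) @ S_hat.T @ Gb_pinv
    # v_i = A_hat b^K(W_i) Y_i; with G_mu = I the U-statistic kernel is v_i'v_i'
    V = (B @ A_hat.T) * Y[:, None]       # n x J matrix with rows v_i'
    total = V.sum(axis=0)
    # leave-one-out: subtract the diagonal i = i' terms
    return (total @ total - np.einsum('ij,ij->', V, V)) / (n * (n - 1))
```

Note that computing the rate-optimal J of Theorem 2.1 requires the unknown p and ζ; the data-driven rule of Section 3 removes this requirement.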
3. Rate Adaptive Estimation
The minimax rate of convergence depends on the optimal choice of the sieve dimension J, which depends on the unknown smoothness p of the true NPIV function h and the unknown degree of ill-posedness. In this section we propose a data-driven choice of the sieve dimension J based on Lepski's method; see Lepski [1990], Lepski and Spokoiny [1997] and Lepski et al. [1997] for detailed descriptions of the method. The oracle choice of the dimension parameter is given by

    J₀ = min{ J ∈ ℕ : J^{−2p/d} ≤ C₀ V(J) },   where   V(J) = τ_J² √(J log n)/n,   (3.1)

for some constant C₀ > 0.
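To make the balance behind (3.1) concrete, the following worked display solves the oracle choice in the mildly ill-posed case τ_J ∼ J^{ζ/d}; it reproduces the expression for J₀ used later in the proof of Corollary 3.1 (the severely ill-posed case is solved analogously with τ_J ∼ exp(J^{ζ/d})).

```latex
% balancing the bias proxy J^{-2p/d} against the variance proxy V(J)
J^{-2p/d} \asymp \tau_J^2 \,\frac{\sqrt{J \log n}}{n}
          \asymp \frac{J^{2\zeta/d + 1/2}\sqrt{\log n}}{n}
\quad\Longleftrightarrow\quad
J^{\,2(p+\zeta)/d + 1/2} \asymp \frac{n}{\sqrt{\log n}}
\quad\Longleftrightarrow\quad
J_0 \sim \Bigl(\frac{n}{\sqrt{\log n}}\Bigr)^{2d/(4(p+\zeta)+d)}.
```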
The dimension parameter J₀ characterizes the optimal choice of the sieve dimension given a priori knowledge of the smoothness of h and the degree of ill-posedness of our estimation problem. We thus refer to f̂_{J₀} as the oracle estimator. Our data-driven choice Ĵ of the sieve dimension is defined as follows:

    Ĵ = min{ J ∈ Î : |f̂_J − f̂_{J′}| ≤ c₀ (V̂(J) + V̂(J′)) for all J′ ∈ Î with J′ ≥ J }

for some constant c₀ > 0,
for some index set Î to be introduced below, and where

    V̂(J) = τ̂_J² √(J log n)/n   (3.2)

and an estimator of the sieve measure of ill-posedness τ_J is given by

    τ̂_J := [ s_min( (B′B/n)^{−1/2} (B′Ψ/n) G_µ^{−1/2} ) ]^{−1},

where s_min(·) is the minimal singular value. Since our quadratic functional estimator is the simple leave-one-out sieve NPIV estimator, for simplicity we use the same index set as that in Chen and Christensen [2015] for their adaptive sieve NPIV estimation of h:

    Î = { J : J_min ≤ J ≤ Ĵ_max },

where J_min = ⌊log log n⌋ and

    Ĵ_max = min{ J > J_min : τ̂_J [ζ(J)]² √(ℓ(J)(log n)/n) ≥ 1 },

where ℓ(J) = 0.1 J, and ζ(J) = √J for spline, wavelet, or trigonometric sieve bases, and ζ(J) = J for orthogonal polynomial bases.

Since the definition of the feasible index set Î relies on the upper bound Ĵ_max, we follow Chen and Christensen [2015] and use a dimension parameter J̄ to control the complexity of Î, which is allowed to grow slowly with the sample size n. Previously, Chen and Christensen [2015, Theorem 3.2] establish that Ĵ_max ≤ J̄ with probability approaching one, where J̄ satisfies the rate restrictions imposed in the next assumption. We also make use of the notation ζ̄ = ζ_{J̄}.

Assumption 5. τ_{J̄} ζ̄ (log n)/√n = o(1) and J̄^ε = O(n) for some ε > 0.

Assumption 5 is a uniform version of the rate conditions imposed in Assumption 2 (i). A minimal sketch of the resulting data-driven procedure is given below.
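The sketch reuses cos_basis and npiv_quadratic from the previous snippet. The candidate grid, the default calibration constant c0 = 1.0, and the direct singular-value computation of τ̂_J are illustrative assumptions (in particular, the snippet does not implement the truncation rule Ĵ_max and assumes the user supplies a sensible grid).

```python
import numpy as np

def tau_hat(X, W, J, K=None):
    """Estimate tau_J as 1 / s_min((B'B/n)^{-1/2} (B'Psi/n) G_mu^{-1/2});
    G_mu = I for the cosine basis, so the last factor drops out."""
    n = len(X)
    K = K or 2 * J
    Psi, B = cos_basis(X, J), cos_basis(W, K)
    lam, Q = np.linalg.eigh(B.T @ B / n)          # Gb_hat is symmetric
    Gb_inv_half = Q @ np.diag(lam ** -0.5) @ Q.T  # assumes Gb_hat is well conditioned
    s = np.linalg.svd(Gb_inv_half @ (B.T @ Psi / n), compute_uv=False)
    return 1.0 / s[-1]                            # singular values sorted descending

def lepski_J(Y, X, W, J_grid, c0=1.0):
    """Smallest J whose estimate agrees, up to c0*(V_hat(J) + V_hat(J')),
    with the estimates at all larger candidate dimensions J'."""
    n = len(Y)
    V_hat = {J: tau_hat(X, W, J) ** 2 * np.sqrt(J * np.log(n)) / n for J in J_grid}
    f_hat = {J: npiv_quadratic(Y, X, W, J) for J in J_grid}
    for J in sorted(J_grid):
        if all(abs(f_hat[J] - f_hat[Jp]) <= c0 * (V_hat[J] + V_hat[Jp])
               for Jp in J_grid if Jp >= J):
            return J
    return max(J_grid)

# usage, e.g.: J_sel = lepski_J(Y, X, W, J_grid=range(3, 26))
```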
The next result establishes an upper bound for the adaptive estimator f̂_Ĵ.

Theorem 3.1. Let Assumptions 1–5 be satisfied. Then, we have

    |f̂_Ĵ − f(h)| ≤ c₁ τ_{J₀}² √(J₀ log n)/n + |f̂_{J₀} − f(h)|

with probability approaching one.

The proof of Theorem 3.1 is based on a Bernstein-type inequality for canonical U-statistics. Also note that Theorem 2.1 provides an upper bound for |f̂_{J₀} − f(h)|. The following result illustrates the general upper bound for the mildly and severely ill-posed cases.

Corollary 3.1.
Let Assumptions 1–5 be satisfied. Then, we have in the

1. Mildly ill-posed case:

    |f̂_Ĵ − f(h)| = O_p( (√(log n)/n)^{4p/(4(p+ζ)+d)} ), if p ≤ ζ + d/4;   |f̂_Ĵ − f(h)| = O_p( n^{−1/2} ), if p > ζ + d/4.   (3.3)
2. Severely ill-posed case:

    |f̂_Ĵ − f(h)| = O_p( (log n)^{−2p/ζ} ).   (3.4)

Corollary 3.1 shows that our data-driven choice of the tuning parameter can lead to fully adaptive rate-optimal estimation of f(h) for both the severely ill-posed case and the regular, mildly ill-posed case, while it has to pay the price of an extra √(log n) factor for the irregular, mildly ill-posed case. We note that when ζ = 0 in the mildly ill-posed case, the NPIV model (1.1) becomes the regression model with X = W. Thus our result is in agreement with the findings of Efromovich and Low [1996], who showed that one must pay a factor of √(log n) penalty in adaptive estimation of a quadratic functional of unknown density and regression functions when p ≤ d/4.

4. Conclusion

In this paper we first show that the simple leave-one-out, sieve NPIV estimator of the quadratic functional proposed in Breunig and Chen [2020] is minimax rate-optimal. We then propose an adaptive leave-one-out sieve NPIV estimator of the quadratic functional based on Lepski's method. We show that the adaptive estimator achieves the minimax optimal rate for the severely ill-posed case as well as for the regular, mildly ill-posed case, while a multiplicative √(log n) term is the price to pay for the irregular, mildly ill-posed NPIV problem.

In adaptive estimation of a nonparametric regression function E[Y | X = ·] = h(·), it is known that Lepski's method has a tendency to choose a small sieve dimension, and hence may not perform well in empirical work. We wish to point out that, due to the ill-posedness of the NPIV model (1.1), the optimal sieve dimension for estimating f(h) is smaller than the optimal sieve dimension for estimating f(E[Y | X = ·]). Therefore, we suspect that our simple adaptive estimator of a quadratic functional of a NPIV function will perform well in finite samples. Implementation of our data-driven method still relies on the choice of a calibration constant. To improve finite sample performance over the original Lepski method, Spokoiny and Vial [2009] offered a propagation approach, and Spokoiny and Willrich [2019] proposed a bootstrap calibration of Lepski's method. We believe that the bootstrap calibration will be helpful in the NPIV setting as well.

A leading nonlinear ill-posed inverse problem with an unknown operator is the nonparametric quantile instrumental variables (NPQIV) model: Y = h(X) + U, E[1{U ≤ 0} | W] = τ ∈ (0, 1). Recently, Dunker [2020] considers adaptive estimation of the NPQIV function h in L² norm under the stronger assumption that U and W are fully independent, without establishing minimax rate optimality. Previously, Chen and Pouzo [2015] presented the estimation and inference on possibly nonlinear functionals of a NPQIV function h. It will be interesting to study the minimax rate adaptive estimation of a quadratic functional in a NPQIV model.

A. Proofs of Results in Section 2

We define s_J = s_min(G_b^{−1/2} S G_µ^{−1/2}), which satisfies

    s_J^{−1} = sup_{h∈Ψ_J : h≠0} ‖h‖_µ / ‖Π_K T h‖_{L²(W)} ≥ τ_J   (A.1)

for all K = K(J) ≥ J > 0.
Indeed, we note that

    s_J = inf_{h∈Ψ_J} ‖Π_K T h‖_{L²(W)} / ‖h‖_µ
        ≥ inf_{h∈Ψ_J} ‖Th‖_{L²(W)} / ‖h‖_µ − sup_{h∈Ψ_J} ‖(Π_K T − T)h‖_{L²(W)} / ‖h‖_µ
        = (1 − o(1)) τ_J^{−1}   (A.2)

by Assumption 3. In the following, we let Q_J h(x) = ψ^J(x)′ [S′G_b^−S]^− S′G_b^− E[b^{K(J)}(W) h(X)]. For an r × c matrix A with r ≤ c and full row rank r, we let A_l^− denote its left pseudoinverse, namely (A′A)^− A′, where ′ denotes transpose and − denotes the generalized inverse. Thus, we can write

    Q_J h(x) = ψ^J(x)′ (G_b^{−1/2} S)_l^− G_b^{−1/2} E[b^{K(J)}(W) h(X)]
             = ψ^J(x)′ G_µ^{−1/2} (G_b^{−1/2} S G_µ^{−1/2})_l^− G_b^{−1/2} E[b^{K(J)}(W) h(X)].

Proof of Theorem 2.1.
Proof of (2.3). Under Assumptions 1–3, we may apply Breunig and Chen [2020, Theorem F.1], which yields

    |f̂_J − f(h)| = O_p( n^{−1} τ_J² √J + n^{−1/2} ‖⟨Q_J h, ψ^J⟩′_µ (G_b^{−1/2}S)_l^−‖ + J^{−2p/d} ).

In the remainder of this proof we bound the second summand on the right-hand side. Let us define ψ̃^J = G_µ^{−1/2} ψ^J and b̃^K = G_b^{−1/2} b^K. From the definition of Q_J h(x) = ψ^J(x)′ (G_b^{−1/2}S)_l^− E[b̃^{K(J)}(W) h(X)] we infer ⟨Q_J h, ψ̃^J⟩_µ = (G_b^{−1/2} S G_µ^{−1/2})_l^− E[b̃^{K(J)}(W) h(X)]. We thus calculate

    ‖⟨Q_J h, ψ^J⟩′_µ (G_b^{−1/2}S)_l^−‖ = ‖⟨Q_J h, ψ̃^J⟩′_µ (G_b^{−1/2} S G_µ^{−1/2})_l^−‖
      ≤ ‖diag(τ_J) (G_b^{−1/2} S G_µ^{−1/2})_l^− E[b̃^{K(J)}(W) h(X)]‖ × ‖diag(τ_J)^{−1} (G_b^{−1/2} S G_µ^{−1/2})_l^−‖,

where ‖diag(τ_J)^{−1} (G_b^{−1/2} S G_µ^{−1/2})_l^−‖ ≤ D due to Assumption 4. Consequently, the definition of the projection Π_J h(x) = ψ^J(x)′ G_µ^{−1} ⟨ψ^J, h⟩_µ yields

    ‖⟨Q_J h, ψ^J⟩′_µ (G_b^{−1/2}S)_l^−‖
      ≤ D ‖diag(τ_J) ⟨ψ̃^J, h⟩_µ‖ + D ‖diag(τ_J) (G_b^{−1/2} S G_µ^{−1/2})_l^− E[b̃^{K(J)}(W)(h(X) − Π_J h(X))]‖
      ≤ D √( Σ_{j=1}^J τ_j² ⟨h, ψ̃_j⟩²_µ ) + D² τ_J² ‖E[b̃^{K(J)}(W)(h(X) − Π_J h(X))]‖,

where we used again Assumption 4 in the last inequality. Consider the second term on the right-hand side. We obtain

    τ_J² ‖E[b̃^{K(J)}(W)(h(X) − Π_J h(X))]‖ = τ_J² ‖Π_K T(h − Π_J h)‖_{L²(W)} ≤ τ_J² ‖T(h − Π_J h)‖_{L²(W)}
      = O( τ_J ‖h − Π_J h‖_µ ) = O( τ_J √( Σ_{j>J} ⟨h, ψ̃_j⟩²_µ ) )

by making use of Assumption 3 (ii).

Proof of (2.4). The choice of J ∼ n^{2d/(4(p+ζ)+d)} implies

    n^{−1} τ_J² √J ∼ n^{−1} J^{2ζ/d+1/2} ∼ n^{−4p/(4(p+ζ)+d)}

and, for the bias term, J^{−2p/d} ∼ n^{−4p/(4(p+ζ)+d)}. We now distinguish between the two regularity cases of the result. First, consider the case p ≤ ζ + d/4,
where the function j ↦ j^{2(ζ−p)/d+1/2} is increasing and consequently, we observe

    n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ τ_j² ∼ n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ j^{2p/d−1/2} j^{2(ζ−p)/d+1/2}
      ≲ J^{2(ζ−p)/d+1/2} n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ j^{2p/d−1/2}
      ≲ J^{2(ζ−p)/d+1/2} n^{−1} ∼ n^{−8p/(4(p+ζ)+d)}.

Moreover, using h ∈ H(p, L), i.e., Σ_{j≥1} ⟨h, ψ̃_j⟩²_µ j^{2p/d} ≤ L, we obtain

    n^{−1} τ_J² Σ_{j>J} ⟨h, ψ̃_j⟩²_µ ≲ n^{−1} J^{2(ζ−p)/d} Σ_{j>J} ⟨h, ψ̃_j⟩²_µ j^{2p/d} ≲ n^{−1} J^{2(ζ−p)/d} ≲ n^{−8p/(4(p+ζ)+d)}.

Taking square roots, the second variance term in (2.3) is thus of order n^{−4p/(4(p+ζ)+d)}. Finally, it remains to consider the case p > ζ + d/4.
In this case, we have that

    Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ τ_j² ≲ Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ j^{2p/d} = O(1),

and consequently, the second variance term satisfies n^{−1/2} ‖⟨Q_J h, ψ^J⟩′_µ (G_b^{−1/2}S)_l^−‖ = O(n^{−1/2}), which is the dominating rate and thus completes the proof of the result.

Proof of (2.5). The choice of J ∼ ( (log n)/2 − ((4p+d)/(4ζ)) log log n )^{d/ζ} implies

    n^{−1} τ_J² √J ∼ n^{−1} √J exp(2J^{ζ/d})
      ∼ ( (log n)/2 − ((4p+d)/(4ζ)) log log n )^{d/(2ζ)} (log n)^{−(4p+d)/(2ζ)}
      ∼ (log n)^{d/(2ζ)} (log n)^{−(4p+d)/(2ζ)} ∼ (log n)^{−2p/ζ},

and J^{−2p/d} ∼ (log n)^{−2p/ζ}. Moreover, since the function j ↦ j^{−2p/d} exp(2j^{ζ/d}) is increasing we obtain

    n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ τ_j² ∼ n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ j^{2p/d} j^{−2p/d} exp(2j^{ζ/d})
      ≲ J^{−2p/d} exp(2J^{ζ/d}) n^{−1} Σ_{j=1}^J ⟨h, ψ̃_j⟩²_µ j^{2p/d}
      ∼ (log n)^{−2p/ζ} (log n)^{−(4p+d)/(2ζ)} ≲ (log n)^{−4p/ζ},

and finally

    n^{−1} τ_J² Σ_{j>J} ⟨h, ψ̃_j⟩²_µ ≲ n^{−1} exp(2J^{ζ/d}) J^{−2p/d} Σ_{j>J} ⟨h, ψ̃_j⟩²_µ j^{2p/d}
      ≲ n^{−1} J^{−2p/d} exp(2J^{ζ/d}) ≲ (log n)^{−4p/ζ},

so that, taking square roots, the second variance term in (2.3) is of order (log n)^{−2p/ζ}, which shows the result.

B. Proofs of Results in Section 3
Below, we make use of the notation

    𝒥 = { J ∈ ℕ : J^{−2p/d} ≤ C₀ V(J) }   (so that J₀ = min 𝒥)

and

    𝒥̂ = { J ∈ Î : |f̂_J − f̂_{J′}| ≤ c₀ (V̂(J) + V̂(J′)) for all J′ ∈ Î with J′ ≥ J },

and recall the definition Î = {J : J_min ≤ J ≤ Ĵ_max}. Below, we abbreviate "with probability approaching one" to "wpa1". We make use of the notation J̄ = min{ J > J_min : 4 τ_J [ζ(J)]² √(ℓ(J)(log n)/n) ≥ 1 }. We introduce the set

    E*_n = { J₀ ∈ 𝒥̂ } ∩ { |τ̂_J^{−1} − s_J| ≤ η s_J for all J_min ≤ J ≤ J̄ }

for some η ∈ (0, 1 − (2/3)^{1/4}). By Lemmas C.3 and C.5 it holds P(E*_n) = 1 + o(1).

Proof of Theorem 3.1.
Due to Theorem 3.2 of Chen and Christensen [2015] we may assume that J₀ ≤ Ĵ_max ≤ J̄ on E*_n. The definition Ĵ = min_{J∈𝒥̂} J implies Ĵ ≤ J₀
on the set E*_n and hence, we obtain

    |f̂_Ĵ − f(h)| 1{E*_n} ≤ |f̂_Ĵ − f̂_{J₀}| 1{E*_n} + |f̂_{J₀} − f(h)| 1{E*_n}
      ≤ c₀ ( V̂(Ĵ) + V̂(J₀) ) 1{E*_n} + |f̂_{J₀} − f(h)|.

On the set E*_n, we have |τ̂_J^{−1} − s_J| ≤ η s_J, which implies τ̂_J ≤ s_J^{−1}(1 − η)^{−1} and thus, by the definition of V̂(·) in (3.2), we have

    |f̂_Ĵ − f(h)| 1{E*_n} ≤ c₀ (1 − η)^{−2} ( s_Ĵ^{−2} √Ĵ + s_{J₀}^{−2} √J₀ ) 1{E*_n} √(log n)/n + |f̂_{J₀} − f(h)|.

Using inequality (A.2) we have s_J^{−1} ≤ (1 − η)^{−1} τ_J for J ∈ {J₀, Ĵ} and n sufficiently large, using that Ĵ ≥ J_min = ⌊log log n⌋. Consequently, from the definition of V(·) in (3.1) we infer:

    |f̂_Ĵ − f(h)| 1{E*_n} ≤ c₀ (1 − η)^{−4} ( τ_Ĵ² √Ĵ + τ_{J₀}² √J₀ ) 1{E*_n} √(log n)/n + |f̂_{J₀} − f(h)|
      ≤ c₀ (1 − η)^{−4} ( V(Ĵ) + V(J₀) ) 1{E*_n} + |f̂_{J₀} − f(h)|
      ≤ 2 c₀ (1 − η)^{−4} V(J₀) + |f̂_{J₀} − f(h)|

for n sufficiently large, where the last inequality is due to V(Ĵ) 1{E*_n} ≤ V(J₀), since Ĵ ≤ J₀ on E*_n. Since η ∈ (0, 1 − (2/3)^{1/4}) we obtain 2(1 − η)^{−4} ≤ 3, and hence |f̂_Ĵ − f(h)| 1{E*_n} ≤ 3 c₀ V(J₀) + |f̂_{J₀} − f(h)|.
By Lemmas C.3 and C.5 it holds P(E*_n) = 1 + o(1), which completes the proof.

Proof of Corollary 3.1.
Proof of (3.3). The definition of the oracle choice in (3.1) implies J₀ ∼ (n/√(log n))^{2d/(4(p+ζ)+d)} in the mildly ill-posed case. Thus, we obtain

    n^{−1} √(log n) τ_{J₀}² √J₀ ∼ n^{−1} √(log n) J₀^{2ζ/d+1/2} ∼ ( √(log n)/n )^{4p/(4(p+ζ)+d)},

which coincides with the rate for the bias term. We now distinguish between the two cases of the result. First, consider the case p ≤ ζ + d/4.
In this case, the function j ↦ j^{2(ζ−p)/d+1/2} is increasing and consequently, we observe

    n^{−1} Σ_{j=1}^{J₀} τ_j² ⟨h, ψ̃_j⟩²_µ ≲ J₀^{2(ζ−p)/d+1/2} n^{−1} ≲ ( √(log n)/n )^{8p/(4(p+ζ)+d)}.

Using h ∈ H(p, L), i.e., Σ_{j≥1} ⟨h, ψ̃_j⟩²_µ j^{2p/d} ≤ L, we obtain

    n^{−1} τ_{J₀}² Σ_{j>J₀} ⟨h, ψ̃_j⟩²_µ ≲ ( √(log n)/n )^{8p/(4(p+ζ)+d)}.

Finally, it remains to consider the case p > ζ + d/4,
where, as in the proof of Theorem 2.1, we have

    Σ_{j=1}^{J₀} τ_j² ⟨h, ψ̃_j⟩²_µ = O(1),

implying n^{−1/2} ‖⟨Q_{J₀} h, ψ^{J₀}⟩′_µ (G_b^{−1/2}S)_l^−‖ = O(n^{−1/2}), which is the dominating rate and thus completes the proof of the result.

Proof of (3.4). In the severely ill-posed case, the definition of the oracle choice in (3.1) implies

    J₀ ∼ ( (1/2) log(n/√(log n)) − ((4p+d)/(4ζ)) log log(n/√(log n)) )^{d/ζ}

and ensures that the variance and squared bias terms are of the same order. Specifically, we obtain

    n^{−1} √(log n) τ_{J₀}² √J₀ ∼ n^{−1} √(log n) √J₀ exp(2J₀^{ζ/d})
      ∼ ( log(n/√(log n)) )^{d/(2ζ)} ( log(n/√(log n)) )^{−(4p+d)/(2ζ)}
      ∼ ( log n − (1/2) log log n )^{−2p/ζ} ∼ (log n)^{−2p/ζ},

and J₀^{−2p/d} ∼ (log n)^{−2p/ζ}. Moreover, since the function j ↦ j^{−2p/d} exp(2j^{ζ/d}) is increasing we obtain

    n^{−1} Σ_{j=1}^{J₀} τ_j² ⟨h, ψ̃_j⟩²_µ ≲ (log n)^{−4p/ζ},

and finally

    n^{−1} τ_{J₀}² Σ_{j>J₀} ⟨h, ψ̃_j⟩²_µ ≲ (log n)^{−4p/ζ},

which completes the proof.

C. Supplementary Lemmas
We first introduce additional notation. First, we consider the U-statistic

    U_{n,1}(J) = (2/(n(n−1))) Σ_{1≤i<i′≤n} R_J(Z_i, Z_{i′}),

with Z_i = (U_i, W_i) and kernel

    R_J(Z_i, Z_{i′}) = U_i U_{i′} b^{K(J)}(W_i)′ A′ G_µ A b^{K(J)}(W_{i′}),

where A = [S′G_b^−S]^− S′G_b^− and, in the proofs below, the errors U_i are truncated at a level M_n specified there. Throughout we set

    c_n(J) = 2 σ̄² s_J^{−2} √(J log n)/(n − 1).
Lemma C.1 (Houdré and Reynaud-Bouret [2003]). Let U_n be a degenerate U-statistic of order 2 with kernel R based on a simple random sample Z₁, ..., Z_n. Then there exists a generic constant C > 0 such that for all u > 0,

    P( |Σ_{1≤i<i′≤n} R(Z_i, Z_{i′})| ≥ C( Λ₁√u + Λ₂u + Λ₃u^{3/2} + Λ₄u² ) ) ≤ 6 exp(−u),   (C.1)

where Λ₁² = (n(n−1)/2) E[R²(Z₁, Z₂)], Λ₂ = n sup{ E[R(Z₁, Z₂) f(Z₁) g(Z₂)] : E[f²(Z₁)] ≤ 1, E[g²(Z₂)] ≤ 1 }, Λ₃² = n ‖E[R²(Z₁, ·)]‖_∞, and Λ₄ = ‖R‖_∞.

Lemma C.2.
Let Assumption 1 (iii) be satisfied. Given the kernel R = R_J, with the errors U_i truncated at level M_n, it holds:

    Λ₁² ≤ σ̄⁴ n(n−1) J s_J^{−4},   (C.2)
    Λ₂ ≤ σ̄² n s_J^{−2},   (C.3)
    Λ₃ ≤ σ̄ √n M_n ζ_{b,K} s_J^{−2},   (C.4)
    Λ₄ ≤ M_n² ζ_{b,K}² s_J^{−2}.   (C.5)

Proof.
The result is due to Lemma F.1 and Lemma G.2 of Breunig and Chen [2020].
Lemma C.3.
Let Assumption 5 be satisfied. Then, for η ∈ (0, 1) it holds |τ̂_J^{−1} − s_J| ≤ η s_J for all J_min ≤ J ≤ J̄ wpa1.

Proof. The result is due to Lemma E.16 of Chen and Christensen [2015].
Lemma C.4.
Let Assumptions 1–5 be satisfied and let J₀ ≤ Ĵ_max ≤ J̄ hold wpa1. Then, uniformly for all J_min ≤ J ≤ J̄ it holds

    |f̂_J − f(Q_J h)| ≤ 2 σ̄² s_J^{−2} √(J log n)/(n − 1) = c_n(J)   (C.6)

wpa1.

Proof. First, observe that by making use of Assumption 5, i.e., τ_{J̄} ζ̄ (log n)/√n = o(1), it holds c_n(J) = o(1) uniformly in J_min ≤ J ≤ J̄. Following the proof of Breunig and Chen [2020, Theorem F.1] we have the decomposition

    f̂_J − f(Q_J h) = U_{n,1}(J) + U_{n,2}(J) + U_{n,3}(J),

where U_{n,1}(J) is the degenerate U-statistic with kernel R_J introduced above (with errors truncated at level M_n), U_{n,2}(J) is the remainder due to the truncation of the errors, and U_{n,3}(J) is due to replacing the population matrix A by its sample analog Â. Consequently,

    P( max_{J≤J̄} { c_n(J)^{−1} |f̂_J − f(Q_J h)| } > 1 )
      ≤ P( max_{J≤J̄} { c_n(J)^{−1} |U_{n,1}(J)| } > 1/3 ) + P( max_{J≤J̄} { c_n(J)^{−1} |U_{n,2}(J)| } > 1/3 )
        + P( max_{J≤J̄} { c_n(J)^{−1} |U_{n,3}(J)| } > 1/3 )
      =: T₁ + T₂ + T₃.

Consider T₁. We make use of U-statistics deviation results. To do so, consider Λ₁, ..., Λ₄ as given in Lemma C.1. From Lemma C.2 we infer, with u = 2 log J̄ and M_n = ζ̄^{−1} √n/(log J̄), that for all J ≤ J̄ we have

    Λ₁√u + Λ₂u + Λ₃u^{3/2} + Λ₄u² ≤ Λ₁ √(2 log J̄) + 2Λ₂ log J̄ + Λ₃ (2 log J̄)^{3/2} + 4Λ₄ (log J̄)²
      ≤ σ̄² n s_J^{−2} √(2 J log n) + 4 σ̄² n s_J^{−2} log n + 2^{3/2} σ̄ n s_J^{−2} √(log n) + 4 n s_J^{−2}

for n sufficiently large. Hence, we obtain for n sufficiently large

    Λ₁√u + Λ₂u + Λ₃u^{3/2} + Λ₄u² ≤ 2 σ̄² n s_J^{−2} √(J log n) = n(n−1) c_n(J)

by the definition of c_n(J). Consequently, Lemma C.1 with u = 2 log J̄ yields

    T₁ ≤ Σ_{J≤J̄} P( |Σ_{1≤i<i′≤n} R_J(Z_i, Z_{i′})| ≥ C( Λ₁√u + Λ₂u + Λ₃u^{3/2} + Λ₄u² ) ) ≤ 6 J̄ exp(−2 log J̄) = o(1).

Consider T₂. Observe that

    ‖G_µ^{1/2} A G_b^{1/2}‖ = ‖G_µ^{1/2} [S′G_b^−S]^− S′G_b^{−1/2}‖ = ‖[G_µ^{−1/2} S′G_b^{−1} S G_µ^{−1/2}]^{−1} G_µ^{−1/2} S′G_b^{−1/2}‖
      = ‖( G_b^{−1/2} S G_µ^{−1/2} )_l^−‖ = s_J^{−1}.

Consequently, using that M_n = ζ̄^{−1} √n/(log J̄), we obtain by Markov's inequality and the definition of c_n(J) that

    T₂ ≤ 3 M_n^{−2} ( E[U⁴] ) max_{J≤J̄} { ζ_J² s_J² / c_n(J) } = O( n^{−1} (log J̄)² ζ̄² τ_{J̄} √(log n) ) = o(1),

where the last equality is due to the rate condition imposed in Assumption 5.

Consider T₃. We make use of the inequality

    |Σ_{i≠i′} U_i U_{i′} b^{K(J)}(W_i)′ ( A′G_µA − Â′G_µÂ ) b^{K(J)}(W_{i′})|
      ≤ ‖Σ_i U_i b̃^{K(J)}(W_i)‖² ‖G_b^{1/2} ( A′G_µA − Â′G_µÂ ) G_b^{1/2}‖
      = Σ_{i≠i′} U_i U_{i′} b̃^{K(J)}(W_i)′ b̃^{K(J)}(W_{i′}) ‖G_b^{1/2} ( A′G_µA − Â′G_µÂ ) G_b^{1/2}‖
        + Σ_i ‖U_i b̃^{K(J)}(W_i)‖² ‖G_b^{1/2} ( A′G_µA − Â′G_µÂ ) G_b^{1/2}‖.

Note that n^{−1} Σ_i ‖U_i b̃^{K(J)}(W_i)‖² ≤ σ̄² K(J) + o_p(1) uniformly in J. Further, the definition c_n(J) = 2 σ̄² s_J^{−2} √(J log n)/(n − 1)
implies

    T₃ = P( max_{J≤J̄} | ( s_J²/(n √(J log n)) ) Σ_{i≠i′} U_i U_{i′} b^{K(J)}(W_i)′ ( A′G_µA − Â′G_µÂ ) b^{K(J)}(W_{i′}) | > (2/3) σ̄² )
      ≤ P( max_{J≤J̄} | ( s_J²/(n √(J log n)) ) Σ_{i≠i′} U_i U_{i′} b̃^{K(J)}(W_i)′ b̃^{K(J)}(W_{i′}) | > σ̄²/3 )
        + P( max_{J≤J̄} ‖G_b^{1/2} ( A′G_µA − Â′G_µÂ ) G_b^{1/2}‖ > 1 )
        + P( max_{J≤J̄} { K(J) ‖G_b^{1/2} ( A′G_µA − Â′G_µÂ ) G_b^{1/2}‖ } > σ̄²/3 )
        + o(1)
      =: T₃₁ + T₃₂ + T₃₃ + o(1).

Now, T₃₁ = o(1) directly by following the first step in the proof of Breunig and Chen [2020, Theorem 3.3]. Finally, T₃₂ = o(1) and T₃₃ = o(1) by following the proof of Chen and Christensen [2015, Lemma E.16].

Lemma C.5.
Let Assumptions 1–3 be satisfied and let J₀ ≤ Ĵ_max ≤ J̄ hold wpa1. Then, we have P(J₀ ∈ 𝒥̂) = 1 + o(1).

Proof. Let E_n denote the event upon which J₀ ≤ Ĵ_max ≤ J̄, and observe that P(E_n^c) = o(1) by hypothesis. The triangle inequality yields

    |f̂_J − f̂_{J₀}| ≤ |f̂_J − f(Q_J h)| + |f̂_{J₀} − f(Q_{J₀} h)| + |f(Q_J h) − f(h)| + |f(Q_{J₀} h) − f(h)|.

By Lemma C.4, uniformly for all J_min ≤ J ≤ J̄ it holds

    |f̂_J − f(Q_J h)| ≤ 2 σ̄² s_J^{−2} √(J log n)/(n − 1)

on a set E_{n,1} ⊆ E_n with P(E_{n,1}^c) = o(1). By the definition of 𝒥 we infer for all J ∈ 𝒥 that

    |f(Q_J h) − f(h)| ≤ C₀ τ_J² √(J log n)/n ≤ C₀ s_J^{−2} √(J log n)/n,

where the last inequality is due to (A.1). Hence, we conclude

    |f̂_J − f̂_{J₀}| ≤ (C₀ + 2 σ̄²) ( s_J^{−2} √(J log n)/n + s_{J₀}^{−2} √(J₀ log n)/n ).

Due to Lemma C.3 it holds s_J^{−1} ≤ (1 + η) τ̂_J for some η ∈ (0, 1), uniformly for J_min ≤ J ≤ J̄, on some set E_{n,2} with P(E_{n,2}^c) = o(1). Consequently, on E_{n,1} ∩ E_{n,2} it holds

    |f̂_J − f̂_{J₀}| ≤ (C₀ + 2 σ̄²)(1 + η)² ( τ̂_J² √(J log n)/n + τ̂_{J₀}² √(J₀ log n)/n )
      = (C₀ + 2 σ̄²)(1 + η)² ( V̂(J) + V̂(J₀) )

uniformly for all J₀ < J ≤ J̄. We conclude that J₀ ∈ 𝒥̂ on E_{n,1} ∩ E_{n,2} (provided c₀ ≥ (C₀ + 2σ̄²)(1 + η)²), and P(E_{n,1} ∩ E_{n,2}) = 1 − o(1).

References
P. J. Bickel and Y. Ritov. Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā: The Indian Journal of Statistics, Series A, pages 381–393, 1988.

R. Blundell, X. Chen, and D. Kristensen. Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75(6):1613–1669, 2007.

C. Breunig and X. Chen. Adaptive, rate-optimal testing in instrumental variables models. arXiv preprint arXiv:2006.09587, 2020.

C. Breunig and J. Johannes. Adaptive estimation of functionals in nonparametric instrumental regression. Econometric Theory, 32(3):612–654, 2016.

C. Butucea. Goodness-of-fit testing and quadratic functional estimation from indirect observations. The Annals of Statistics, 35(5):1907–1930, 2007.

C. Butucea and K. Meziani. Quadratic functional estimation in inverse problems. Statistical Methodology, 8(1):31–41, 2011.

M. Carrasco, J.-P. Florens, and E. Renault. Linear inverse problems in structural econometrics: Estimation based on spectral decomposition and regularization. In J. Heckman and E. Leamer, editors, Handbook of Econometrics, volume 6B. North Holland, 2007.

X. Chen and T. M. Christensen. Optimal sup-norm rates, adaptivity and inference in nonparametric instrumental variables estimation. arXiv preprint arXiv:1508.03365v1, 2015.

X. Chen and T. M. Christensen. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9(1):39–84, 2018.

X. Chen and D. Pouzo. Sieve quasi likelihood ratio inference on semi/nonparametric conditional moment models. Econometrica, 83(3):1013–1079, 2015.

X. Chen and M. Reiß. On rate optimality for ill-posed inverse problems in econometrics. Econometric Theory, 27(3):497–521, 2011.

O. Collier, L. Comminges, and A. B. Tsybakov. Minimax estimation of linear and quadratic functionals on sparsity classes. The Annals of Statistics, 45(3):923–958, 2017.

D. L. Donoho and M. Nussbaum. Minimax quadratic estimation of a quadratic functional. Journal of Complexity, 6(3):290–323, 1990.

F. Dunker. Nonparametric instrumental variable regression and quantile regression with full independence. arXiv preprint arXiv:1511.03977v2, 2020.

S. Efromovich and V. Koltchinskii. On inverse problems with unknown operators. IEEE Transactions on Information Theory, 47(7):2876–2894, 2001.

S. Efromovich and M. Low. On optimal adaptive estimation of a quadratic functional. The Annals of Statistics, 24(3):1106–1125, 1996.

E. Giné and R. Nickl. A simple adaptive estimator of the integrated square of a density. Bernoulli, 14(1):47–61, 2008.

J. L. Horowitz. Applied nonparametric instrumental variables estimation. Econometrica, 79(2):347–394, 2011.

J. L. Horowitz. Adaptive nonparametric instrumental variables estimation: Empirical choice of the regularization parameter. Journal of Econometrics, 180(2):158–173, 2014.

C. Houdré and P. Reynaud-Bouret. Exponential inequalities, with constants, for U-statistics of order two. In Stochastic inequalities and applications, pages 55–69. Springer, 2003.

M. Kroll. Rate optimal estimation of quadratic functionals in inverse problems with partially unknown operator and application to testing problems. ESAIM: Probability and Statistics, 23:524–551, 2019.

B. Laurent. Efficient estimation of integral functionals of a density. The Annals of Statistics, 24(2):659–681, 1996.

B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, pages 1302–1338, 2000.

O. V. Lepski. On a problem of adaptive estimation in Gaussian white noise. Theory of Probability and its Applications, 35:454–466, 1990.

O. V. Lepski and V. G. Spokoiny. Optimal pointwise adaptive methods in nonparametric estimation. The Annals of Statistics, pages 2512–2546, 1997.

O. V. Lepski, E. Mammen, and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. The Annals of Statistics, pages 929–947, 1997.

W. K. Newey and J. L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71:1565–1578, 2003.

V. Spokoiny and C. Vial. Parameter tuning in pointwise adaptation using a propagation approach. The Annals of Statistics, 37(5B):2783–2807, 2009.

V. Spokoiny and N. Willrich. Bootstrap tuning in Gaussian ordered model selection. The Annals of Statistics, 2019.