Kernel Selection in Nonparametric Regression
HÉLÈNE HALCONRUY* AND NICOLAS MARIE†

Abstract. In the regression model $Y = b(X) + \varepsilon$, where $X$ has a density $f$, this paper deals with an oracle inequality for an estimator of $bf$, involving a kernel in the sense of Lerasle et al. (2016), selected via the PCO method. In addition to the bandwidth selection for kernel-based estimators, already studied in Lacour, Massart and Rivoirard (2017) and Comte and Marie (2020), the dimension selection for anisotropic projection estimators of $f$ and $bf$ is covered.

Contents

1. Introduction
2. Risk bound
3. Kernel selection
4. Basic numerical experiments
Appendix A. Details on kernels sets: proofs of Propositions 2.2, 2.3 and 3.3
A.1. Proof of Proposition 2.2
A.2. Proof of Proposition 2.3
A.3. Proof of Proposition 3.3
Appendix B. Proofs of risk bounds
B.1. Preliminary results
B.2. Proof of Proposition 2.4
B.3. Proof of Theorem 2.5
B.4. Proof of Theorem 3.2
References
1. Introduction
Consider $n\in\mathbb N^*$ independent $\mathbb R^d\times\mathbb R$-valued ($d\in\mathbb N^*$) random variables $(X_1,Y_1),\dots,(X_n,Y_n)$, having the same probability distribution, assumed to be absolutely continuous with respect to Lebesgue's measure, and
$$\widehat s_{K,\ell}(n;x) := \frac{1}{n}\sum_{i=1}^n K(X_i,x)\,\ell(Y_i)\ ;\ x\in\mathbb R^d,$$
where $\ell:\mathbb R\to\mathbb R$ is a Borel function and $K$ is a symmetric continuous map from $\mathbb R^d\times\mathbb R^d$ into $\mathbb R$. This is an estimator of the function $s:\mathbb R^d\to\mathbb R$ defined by
$$s(x) := \mathbb E(\ell(Y_1)\,|\,X_1 = x)\,f(x)\ ;\ \forall x\in\mathbb R^d,$$
where $f$ is a density of $X_1$. For $\ell = 1$, $\widehat s_{K,\ell}(n;\cdot)$ coincides with the estimator of $f$ studied in Lerasle et al. [11], but for $\ell\neq 1$, it covers estimators involved in nonparametric regression. Assume that for every $i\in\{1,\dots,n\}$,

(1)  $Y_i = b(X_i) + \varepsilon_i$,

where $\varepsilon_i$ is a centered random variable, independent of $X_i$, and $b:\mathbb R^d\to\mathbb R$ is a Borel function.

Key words and phrases.
Nonparametric estimators; Projection estimators; Model selection; Regression model.

• If $\ell = \mathrm{Id}_{\mathbb R}$, $k$ is a symmetric kernel and

(2)  $K(x',x) = \displaystyle\prod_{q=1}^d \frac{1}{h_q}\, k\!\left(\frac{x'_q - x_q}{h_q}\right)$ with $h_1,\dots,h_d > 0$,

for every $x,x'\in\mathbb R^d$, then $\widehat s_{K,\ell}(n;\cdot)$ is the numerator of Nadaraya-Watson's estimator of the regression function $b$. Precisely, $\widehat s_{K,\ell}(n;\cdot)$ is an estimator of $s = bf$. If $\ell\neq\mathrm{Id}_{\mathbb R}$, then $\widehat s_{K,\ell}(n;\cdot)$ is the numerator of the estimator studied in Einmahl and Mason [5, 6].

• If $\ell = \mathrm{Id}_{\mathbb R}$, $\mathcal B_{m_q} = \{\varphi_1^{m_q},\dots,\varphi_{m_q}^{m_q}\}$ ($m_q\in\mathbb N^*$ and $q\in\{1,\dots,d\}$) is an orthonormal family of $L^2(\mathbb R)$ and

(3)  $K(x',x) = \displaystyle\prod_{q=1}^d \sum_{j=1}^{m_q} \varphi_j^{m_q}(x_q)\,\varphi_j^{m_q}(x'_q)$

for every $x,x'\in\mathbb R^d$, then $\widehat s_{K,\ell}(n;\cdot)$ is the projection estimator of $s = bf$ on $S = \mathrm{span}(\mathcal B_{m_1}\otimes\cdots\otimes\mathcal B_{m_d})$.

Now, assume that for every $i\in\{1,\dots,n\}$, $Y_i$ is defined by the heteroscedastic model

(4)  $Y_i = \sigma(X_i)\,\varepsilon_i$,

where $\varepsilon_i$ is a centered random variable of variance 1, independent of $X_i$, and $\sigma:\mathbb R^d\to\mathbb R$ is a Borel function. If $\ell(x) = x^2$ for every $x\in\mathbb R$, then $\widehat s_{K,\ell}(n;\cdot)$ is an estimator of $s = \sigma^2 f$.

Over the last ten years, several data-driven procedures have been proposed to select the bandwidth of Parzen-Rosenblatt's estimator ($\ell = 1$ and $K$ defined by (2)): first, the Goldenshluger-Lepski method, introduced in [8], which reaches the adequate bias-variance compromise but is not completely satisfactory on the numerical side (see Comte and Rebafka [4]).
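In dimension $d = 1$, the generic estimator $\widehat s_{K,\ell}(n;\cdot)$ with the two kernel families (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel $k$, the regular histogram basis $\psi_j^m = \sqrt m\,\mathbf 1_{[(j-1)/m,\,j/m[}$, and the bandwidth, dimension and data-generating choices are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 250
X = rng.uniform(0.0, 1.0, n)                          # density f = 1 on [0, 1]
Y = 10.0 * (X - 0.5) ** 2 + rng.normal(0.0, 0.1, n)   # regression model (1)

def s_hat_convolution(x, h, ell):
    """Kernel (2) with a Gaussian k: (1/n) sum_i (1/h) k((X_i - x)/h) ell(Y_i)."""
    k = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    K = k((X[:, None] - x[None, :]) / h) / h          # matrix K(X_i, x_j)
    return (K * ell(Y)[:, None]).mean(axis=0)

def s_hat_projection(x, m, ell):
    """Kernel (3) with the regular histogram basis psi_j^m = sqrt(m) 1_[(j-1)/m, j/m[."""
    jX = np.minimum((X * m).astype(int), m - 1)       # bin index of each X_i
    jx = np.minimum((x * m).astype(int), m - 1)       # bin index of each grid point
    K = m * (jX[:, None] == jx[None, :])              # sum_j psi_j(X_i) psi_j(x)
    return (K * ell(Y)[:, None]).mean(axis=0)

x_grid = np.linspace(0.05, 0.95, 19)
# With ell = 1 the estimator targets the density f; with ell = Id it targets s = b f.
f_hat = s_hat_projection(x_grid, 10, lambda y: np.ones_like(y))
bf_hat = s_hat_convolution(x_grid, 0.1, lambda y: y)
```

Since $f = \mathbf 1_{[0,1]}$ here, `f_hat` should fluctuate around 1, while `bf_hat` approximates $b$ itself.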
More recently, in [10], Lacour, Massart and Rivoirard proposed the PCO (Penalized Comparison to Overfitting) method and proved an oracle inequality for the associated adaptive Parzen-Rosenblatt estimator by using a concentration inequality for U-statistics due to Houdré and Reynaud-Bouret [9]. Together with Varet, they established the numerical efficiency of the PCO method in Varet et al. [13].

Comte and Marie [3] deal with an oracle inequality and numerical experiments for an adaptive Nadaraya-Watson estimator whose numerator and denominator have distinct bandwidths, both selected via the PCO method. Since the output variable in a regression model has no reason to be bounded, there were significant additional difficulties, bypassed in [3], in establishing an oracle inequality for the numerator's adaptive estimator. Via similar arguments, the present article deals with an oracle inequality for $\widehat s_{\widehat K,\ell}(n;\cdot)$, where $\widehat K$ is selected via the PCO method in the spirit of Lerasle et al. [11]. In addition to the bandwidth selection for kernel-based estimators already studied in [10, 3], it covers the dimension selection for anisotropic projection estimators of $f$, of $bf$ (when $Y_1,\dots,Y_n$ are defined by Model (1)) and of $\sigma^2 f$ (when $Y_1,\dots,Y_n$ are defined by Model (4)). As for the bandwidth selection for kernel-based estimators, for $d > 1$, the PCO method allows one to bypass the numerical difficulties generated by the Goldenshluger-Lepski type method involved in anisotropic model selection procedures (see Chagny [1]).

In Section 2, some examples of kernels sets are provided and a risk bound for $\widehat s_{K,\ell}(n;\cdot)$ is established. Section 3 deals with an oracle inequality for $\widehat s_{\widehat K,\ell}(n;\cdot)$, where $\widehat K$ is selected via the PCO method. Finally, Section 4 deals with a basic numerical study.

2. Risk bound
Throughout the paper, $s\in L^2(\mathbb R^d)$. Let $\mathcal K_n$ be a set of symmetric continuous maps from $\mathbb R^d\times\mathbb R^d$ into $\mathbb R$, of cardinality at most $n$, fulfilling the following assumption.

Assumption 2.1.
There exists a deterministic constant $\mathfrak m_{\mathcal K,\ell} > 0$, not depending on $n$, such that:

(1) For every $K\in\mathcal K_n$, $\displaystyle\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2^2 \le \mathfrak m_{\mathcal K,\ell}\, n$.

(2) For every $K\in\mathcal K_n$, $\|s_{K,\ell}\|_2^2 \le \mathfrak m_{\mathcal K,\ell}$ with $s_{K,\ell} := \mathbb E(\widehat s_{K,\ell}(n;\cdot)) = \mathbb E(K(X_1,\cdot)\ell(Y_1))$.

(3) For every $K,K'\in\mathcal K_n$, $\mathbb E(\langle K(X_1,\cdot),\ K'(X_2,\cdot)\ell(Y_2)\rangle^2) \le \mathfrak m_{\mathcal K,\ell}\,\mathfrak s_{K',\ell}$ with $\mathfrak s_{K',\ell} := \mathbb E(\|K'(X_1,\cdot)\ell(Y_1)\|_2^2)$.

(4) For every $K\in\mathcal K_n$ and $\psi\in L^2(\mathbb R^d)$, $\mathbb E(\langle K(X_1,\cdot),\psi\rangle^2) \le \mathfrak m_{\mathcal K,\ell}\,\|\psi\|_2^2$.

The elements of $\mathcal K_n$ are called kernels. Let us provide two natural examples of kernels sets.

Proposition 2.2.
Consider
$$\mathcal K_k(h_{\min}) := \left\{(x',x)\mapsto \prod_{q=1}^d \frac{1}{h_q}\, k\!\left(\frac{x'_q - x_q}{h_q}\right)\ ;\ h_1,\dots,h_d\in\{h_{\min},\dots,1\}\right\},$$
where $k$ is a symmetric kernel (in the usual sense) and $nh_{\min}^d \ge 1$. The kernels set $\mathcal K_k(h_{\min})$ fulfills Assumption 2.1 and, for any $K\in\mathcal K_k(h_{\min})$ such that
$$K(x',x) = \prod_{q=1}^d \frac{1}{h_q}\, k\!\left(\frac{x'_q - x_q}{h_q}\right)\ ;\ \forall x,x'\in\mathbb R^d$$
with $h_1,\dots,h_d\in\{h_{\min},\dots,1\}$,
$$\mathfrak s_{K,\ell} = \|k\|_2^{2d}\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^d \frac{1}{h_q}.$$

Proposition 2.3.
Consider
$$\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max}) := \left\{(x',x)\mapsto \prod_{q=1}^d \sum_{j=1}^{m_q} \varphi_j^{m_q}(x_q)\,\varphi_j^{m_q}(x'_q)\ ;\ m_1,\dots,m_d\in\{1,\dots,m_{\max}\}\right\},$$
where $m_{\max}^d\in\{1,\dots,n\}$ and, for every $m\in\{1,\dots,n\}$, $\mathcal B_m = \{\varphi_1^m,\dots,\varphi_m^m\}$ is an orthonormal family of $L^2(\mathbb R)$ such that
$$\sup_{x'\in\mathbb R}\sum_{j=1}^m \varphi_j^m(x')^2 \le \mathfrak m_{\mathcal B}\, m$$
with $\mathfrak m_{\mathcal B} > 0$ not depending on $m$ and $n$, and

(5)  $\mathcal B_m\subset\mathcal B_{m+1}\ ;\ \forall m\in\{1,\dots,n-1\}$

or

(6)  $\overline{\mathfrak m}_{\mathcal B} := \sup\{|\mathbb E(K(X_1,x))|\ ;\ K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})\ \text{and}\ x\in\mathbb R^d\}$

is finite and doesn't depend on $n$. The kernels set $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 2.1 and, for any $K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ such that
$$K(x',x) = \prod_{q=1}^d \sum_{j=1}^{m_q} \varphi_j^{m_q}(x_q)\,\varphi_j^{m_q}(x'_q)\ ;\ \forall x,x'\in\mathbb R^d$$
with $m_1,\dots,m_d\in\{1,\dots,m_{\max}\}$,
$$\mathfrak s_{K,\ell} \le \mathfrak m_{\mathcal B}^d\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^d m_q.$$

Remark.
Note that Condition (5) (resp. (6)) is close to (resp. the same as) Condition (19) (resp. (20)) of Lerasle et al. [11], Proposition 3.2. See also Massart [12], Chapter 7, on these conditions. For instance, the trigonometric basis and Hermite's basis satisfy Condition (5). The regular histograms basis satisfies Condition (6). Indeed, by taking $\varphi_j^m = \psi_j^m := \sqrt m\,\mathbf 1_{[(j-1)/m,\,j/m[}$ for every $m\in\{1,\dots,n\}$ and $j\in\{1,\dots,m\}$,
$$\left|\mathbb E\left(\prod_{q=1}^d\sum_{j=1}^{m_q}\psi_j^{m_q}(X_{1,q})\,\psi_j^{m_q}(x_q)\right)\right| = \sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\left(\prod_{q=1}^d m_q\,\mathbf 1_{[(j_q-1)/m_q,\,j_q/m_q[}(x_q)\right)\int_{(j_1-1)/m_1}^{j_1/m_1}\cdots\int_{(j_d-1)/m_d}^{j_d/m_d} f(x'_1,\dots,x'_d)\,dx'_1\cdots dx'_d \le \|f\|_\infty\prod_{q=1}^d\sum_{j=1}^{m_q}\mathbf 1_{[(j-1)/m_q,\,j/m_q[}(x_q) \le \|f\|_\infty$$
for every $m_1,\dots,m_d\in\{1,\dots,n\}$ and $x\in\mathbb R^d$.

The following proposition provides a suitable control of the variance of $\widehat s_{K,\ell}(n;\cdot)$.

Proposition 2.4.
Under Assumption 2.1.(1,2,3), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exists a deterministic constant $c_{2.4} > 0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\left|\|\widehat s_{K,\ell}(n;\cdot) - s_{K,\ell}\|_2^2 - \frac{\mathfrak s_{K,\ell}}{n}\right| - \frac{\theta}{n}\,\mathfrak s_{K,\ell}\right\}\right) \le c_{2.4}\,\frac{\log(n)}{\theta n}.$$
Finally, let us state the main result of this section.
Theorem 2.5.
Under Assumption 2.1, if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exist deterministic constants $c_{2.5}, \overline c_{2.5} > 0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\|\widehat s_{K,\ell}(n;\cdot) - s\|_2^2 - (1+\theta)\left(\|s_{K,\ell} - s\|_2^2 + \frac{\mathfrak s_{K,\ell}}{n}\right)\right\}\right) \le c_{2.5}\,\frac{\log(n)}{\theta n}$$
and
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\|s_{K,\ell} - s\|_2^2 + \frac{\mathfrak s_{K,\ell}}{n} - \frac{1}{1-\theta}\,\|\widehat s_{K,\ell}(n;\cdot) - s\|_2^2\right\}\right) \le \overline c_{2.5}\,\frac{\log(n)}{\theta(1-\theta)n}.$$

Remark.
Note that the first inequality in Theorem 2.5 gives a risk bound on the estimator $\widehat s_{K,\ell}(n;\cdot)$:
$$\mathbb E(\|\widehat s_{K,\ell}(n;\cdot) - s\|_2^2) \le (1+\theta)\left(\|s_{K,\ell} - s\|_2^2 + \frac{\mathfrak s_{K,\ell}}{n}\right) + c_{2.5}\,\frac{\log(n)}{\theta n}$$
for every $\theta\in]0,1[$. The second inequality is useful in order to establish a risk bound on the adaptive estimator defined in the next section (see Theorem 3.2).

3. Kernel selection
This section deals with a risk bound on the adaptive estimator $\widehat s_{\widehat K,\ell}(n;\cdot)$, where
$$\widehat K\in\arg\min_{K\in\mathcal K_n}\{\|\widehat s_{K,\ell}(n;\cdot) - \widehat s_{K_0,\ell}(n;\cdot)\|_2^2 + \mathrm{pen}(K)\},$$
$K_0$ is an overfitting proposal for $K$ and

(7)  $\mathrm{pen}(K) := \dfrac{2}{n^2}\displaystyle\sum_{i=1}^n \langle K(\cdot,X_i),\ K_0(\cdot,X_i)\rangle\,\ell(Y_i)^2\ ;\ \forall K\in\mathcal K_n.$
Example.
For $\mathcal K_n = \mathcal K_k(h_{\min})$, one should take
$$K_0(x',x) = \frac{1}{h_{\min}^d}\prod_{q=1}^d k\!\left(\frac{x'_q - x_q}{h_{\min}}\right)\ ;\ \forall x,x'\in\mathbb R^d,$$
and for $\mathcal K_n = \mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$, one should take
$$K_0(x',x) = \prod_{q=1}^d \sum_{j=1}^{m_{\max}} \varphi_j^{m_{\max}}(x_q)\,\varphi_j^{m_{\max}}(x'_q)\ ;\ \forall x,x'\in\mathbb R^d.$$
In the sequel, in addition to Assumption 2.1, the kernels set $\mathcal K_n$ fulfills the following assumption.

Assumption 3.1.
There exists a deterministic constant $\overline{\mathfrak m}_{\mathcal K,\ell} > 0$, not depending on $n$, such that
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\langle K(X_1,\cdot),\ s_{K',\ell}\rangle\right) \le \overline{\mathfrak m}_{\mathcal K,\ell}.$$
The following theorem provides an oracle inequality for the adaptive estimator $\widehat s_{\widehat K,\ell}(n;\cdot)$.

Theorem 3.2.
Under Assumptions 2.1 and 3.1, if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exists a deterministic constant $c_{3.2} > 0$, not depending on $n$, such that for every $\vartheta\in]0,1[$,
$$\mathbb E(\|\widehat s_{\widehat K,\ell}(n;\cdot) - s\|_2^2) \le (1+\vartheta)\min_{K\in\mathcal K_n}\mathbb E(\|\widehat s_{K,\ell}(n;\cdot) - s\|_2^2) + \frac{c_{3.2}}{\vartheta}\left(\|s_{K_0,\ell} - s\|_2^2 + \frac{\log(n)}{n}\right).$$
Finally, let us discuss Assumption 3.1. Note that if $s$ is bounded and
$$\mathfrak m_{\mathcal K} := \sup\{\|K(x',\cdot)\|_1\ ;\ K\in\mathcal K_n\ \text{and}\ x'\in\mathbb R^d\}$$
doesn't depend on $n$, then $\mathcal K_n$ fulfills Assumption 3.1. Indeed,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\langle K(X_1,\cdot),\ s_{K',\ell}\rangle\right) \le \left(\sup_{K'\in\mathcal K_n}\|s_{K',\ell}\|_\infty\right)\mathbb E\left(\sup_{K\in\mathcal K_n}\|K(X_1,\cdot)\|_1\right) \le \mathfrak m_{\mathcal K}\sup\left\{\int_{-\infty}^\infty |K'(x',x)\,s(x)|\,dx\ ;\ K'\in\mathcal K_n\ \text{and}\ x'\in\mathbb R^d\right\} \le \mathfrak m_{\mathcal K}^2\,\|s\|_\infty.$$
In the nonparametric regression framework (see Model (1)), to assume $s$ bounded means that $bf$ is bounded. For instance, this condition is fulfilled by linear regression models with Gaussian inputs. The following examples focus on the condition on $\mathfrak m_{\mathcal K}$.

Examples:

(1) Consider $K\in\mathcal K_k(h_{\min})$. Then, there exist $h_1,\dots,h_d\in\{h_{\min},\dots,1\}$ such that
$$K(x',x) = \prod_{q=1}^d \frac{1}{h_q}\, k\!\left(\frac{x'_q - x_q}{h_q}\right)\ ;\ \forall x,x'\in\mathbb R^d.$$
Clearly, $\|K(x',\cdot)\|_1 = \|k\|_1^d$ for every $x'\in\mathbb R^d$. So, for $\mathcal K_n = \mathcal K_k(h_{\min})$, $\mathfrak m_{\mathcal K} \le \|k\|_1^d$.

(2) For $\mathcal K_n = \mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$, the condition on $\mathfrak m_{\mathcal K}$ seems harder to check in general. Let us show that it is satisfied for the regular histograms basis defined in Section 2. For every $m_1,\dots,m_d\in\{1,\dots,n\}$,
$$\left\|\prod_{q=1}^d\sum_{j=1}^{m_q}\psi_j^{m_q}(x'_q)\,\psi_j^{m_q}(\cdot_q)\right\|_1 \le \prod_{q=1}^d m_q\sum_{j=1}^{m_q}\mathbf 1_{[(j-1)/m_q,\,j/m_q[}(x'_q)\int_{(j-1)/m_q}^{j/m_q} dx \le 1.$$
The following proposition shows that $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 3.1 for the trigonometric basis, even if the condition on $\mathfrak m_{\mathcal K}$ is not satisfied.

Proposition 3.3.
Consider $\chi_1 := \mathbf 1_{[0,1]}$ and, for every $j\in\mathbb N^*$, the functions $\chi_{2j}$ and $\chi_{2j+1}$ defined on $\mathbb R$ by
$$\chi_{2j}(x) := \sqrt 2\cos(2\pi jx)\,\mathbf 1_{[0,1]}(x)\quad\text{and}\quad \chi_{2j+1}(x) := \sqrt 2\sin(2\pi jx)\,\mathbf 1_{[0,1]}(x)\ ;\ \forall x\in\mathbb R.$$
If $s\in C^2(\mathbb R^d)$ and $\mathcal B_m = \{\chi_1,\dots,\chi_m\}$ for every $m\in\{1,\dots,n\}$, then $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 3.1.

4. Basic numerical experiments
Throughout this section, $d = 1$, $\ell\in\{1,\mathrm{Id}_{\mathbb R}\}$ and $Y_1,\dots,Y_n$ are defined by Model (1) with $\varepsilon_1,\dots,\varepsilon_n\rightsquigarrow\mathcal N(0,1)$. Some numerical experiments on $\widehat s_{K,1}(n;\cdot)$ (resp. $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$) for $K\in\mathcal K_k(h_{\min})$ have already been done in Varet et al. [13] (resp. Comte and Marie [3]). So, this section deals with basic numerical experiments on $\widehat s_{K,1}(n;\cdot)$ and $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$ for $K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ and $\mathcal B_m = \{\psi_1^m,\dots,\psi_m^m\}$ for every $m = 1,\dots,n$. In this case, $\widehat K = K_{\widehat m(\ell)}$, where
$$K_m(x',x) := \sum_{j=1}^m \psi_j^m(x')\,\psi_j^m(x)\ ;\ \forall x,x'\in\mathbb R,\ \forall m\in\mathcal M = \{1,\dots,m_{\max}\},$$
$\widehat m(\ell)$ is a solution of the minimization problem
$$\min_{m\in\mathcal M}\{\|\widehat s_{K_m,\ell}(n;\cdot) - \widehat s_{K_{m_{\max}},\ell}(n;\cdot)\|_2^2 + \mathrm{pen}(m)\}$$
and
$$\mathrm{pen}(m) := \frac{2}{n^2}\sum_{i=1}^n \langle K_m(\cdot,X_i),\ K_{m_{\max}}(\cdot,X_i)\rangle\,\ell(Y_i)^2\ ;\ \forall m\in\mathcal M.$$
For $\ell\in\{1,\mathrm{Id}_{\mathbb R}\}$, $n = 250$ and $m_{\max} = 30$, $m$ is selected in $\mathcal M$ for two basic densities and two nonlinear regression functions:
• $f = f_1$, the density of $\mathcal E(5)$.
• $f = f_2$, the density of $\mathcal N(1/2, (1/4)^2)$.
• $b(x) = b_1(x) := 10(x - 1/2)^2$ for every $x\in[0,1]$.
• $b(x) = b_2(x) := \cos(5\pi x)$ for every $x\in[0,1]$.
On the one hand, on the four following figures, one can see the beam of all possible estimations of $f$ and $bf$ (i.e. for each $m\in\mathcal M$) on the left, the PCO criteria for $\widehat s_{K,1}(n;\cdot)$ and $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$ for each $m\in\mathcal M$ in the middle, and the PCO estimations of $f$ and $bf$ (i.e. for $m = \widehat m(1)$ and $m = \widehat m(\mathrm{Id}_{\mathbb R})$) on the right:

Figure 1. $f = f_1$, $b = b_1$, $\widehat m(1) = 10$ and $\widehat m(\mathrm{Id}_{\mathbb R}) = 10$.
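The selection procedure of this section can be sketched numerically as follows, for the regular histogram basis with $\ell = \mathrm{Id}_{\mathbb R}$ and $d = 1$. This is an illustrative reading of criterion (7), not the authors' code: the weight $\ell(Y_i)^2$, the grid-based Riemann approximation of the $L^2$ inner products, and the data-generating choices (density, regression function, noise level) are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m_max = 250, 30
X = rng.beta(2.0, 2.0, n)                             # some density f on [0, 1]
Y = np.cos(5 * np.pi * X) + rng.normal(0.0, 0.5, n)   # model (1) with b = b_2
ell = Y                                               # ell = Id_R

grid = (np.arange(2000) + 0.5) / 2000                 # fine grid on [0, 1]

def K_m(m, x, t):
    """Projection kernel (3): K_m(x', x) = sum_j psi_j^m(x') psi_j^m(x)."""
    bx = np.minimum((np.asarray(x) * m).astype(int), m - 1)
    bt = np.minimum((np.asarray(t) * m).astype(int), m - 1)
    return m * (bx[:, None] == bt[None, :])           # shape (len(x), len(t))

def s_hat(m):
    """Estimator s_hat_{K_m, ell}(n; .) evaluated on the grid."""
    return K_m(m, X, grid).T @ ell / n

s_over = s_hat(m_max)                                 # overfitting estimator
crit = []
for m in range(1, m_max + 1):
    dist2 = np.mean((s_hat(m) - s_over) ** 2)         # ~ ||s_hat_m - s_hat_{m_max}||^2
    # pen(m) ~ (2/n^2) sum_i <K_m(., X_i), K_{m_max}(., X_i)> ell(Y_i)^2,
    # the inner products being approximated by Riemann sums on the grid.
    inner = np.mean(K_m(m, grid, X) * K_m(m_max, grid, X), axis=0)
    pen = 2.0 / n ** 2 * np.sum(inner * ell ** 2)
    crit.append(dist2 + pen)

m_hat = 1 + int(np.argmin(crit))                      # selected dimension
```

The comparison-to-overfitting distance decreases in $m$ while the penalty increases, so the minimizer `m_hat` realizes the bias-variance compromise.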
Figure 2. $f = f_1$, $b = b_2$, $\widehat m(1) = 12$ and $\widehat m(\mathrm{Id}_{\mathbb R}) = 20$.

Figure 3. $f = f_2$, $b = b_1$, $\widehat m(1) = 10$ and $\widehat m(\mathrm{Id}_{\mathbb R}) = 15$.

Figure 4. $f = f_2$, $b = b_2$, $\widehat m(1) = 15$ and $\widehat m(\mathrm{Id}_{\mathbb R}) = 6$.

On the other hand, for $(f,b) = (f_1,b_1)$ and $(f,b) = (f_2,b_2)$, let us generate 10 datasets of $n = 250$ observations of $(X_1,Y_1)$ and, for each of these, select $m\in\mathcal M$ via the PCO criterion introduced previously. On the two following figures, the beam of all PCO estimations of $f$ (resp. $bf$) is plotted on the left (resp. on the right):

Figure 5. $f = f_1$ and $b = b_1$.

Figure 6. $f = f_2$ and $b = b_2$.

Appendix A. Details on kernels sets: proofs of Propositions 2.2, 2.3 and 3.3
A.1.
Proof of Proposition 2.2.
Consider
$K,K'\in\mathcal K_k(h_{\min})$. Then, there exist $h,h'\in\{h_{\min},\dots,1\}^d$ such that
$$K(x',x) = \prod_{q=1}^d \frac{1}{h_q}\,k\!\left(\frac{x'_q-x_q}{h_q}\right)\quad\text{and}\quad K'(x',x) = \prod_{q=1}^d \frac{1}{h'_q}\,k\!\left(\frac{x'_q-x_q}{h'_q}\right)$$
for every $x,x'\in\mathbb R^d$.

(1) For every $x'\in\mathbb R^d$,
$$\|K(x',\cdot)\|_2^2 = \|k\|_2^{2d}\prod_{q=1}^d \frac{1}{h_q} \le \|k\|_2^{2d}\, n.$$

(2) Since $s_{K,\ell} = K * s$, $\|s_{K,\ell}\|_2 \le \|k\|_1^d\,\|s\|_2$.

(3) First,
$$\mathfrak s_{K',\ell} = \|k\|_2^{2d}\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^d \frac{1}{h'_q}.$$
Then,
$$\mathbb E(\langle K(X_1,\cdot),\ K'(X_2,\cdot)\ell(Y_2)\rangle^2) = \mathbb E((K*K')(X_1 - X_2)^2\,\ell(Y_2)^2) \le \|f\|_\infty\,\|K*K'\|_2^2\,\mathbb E(\ell(Y_2)^2) \le \|f\|_\infty\,\|k\|_1^{2d}\,\mathfrak s_{K',\ell}.$$

(4) For every $\psi\in L^2(\mathbb R^d)$,
$$\mathbb E(\langle K(X_1,\cdot),\psi\rangle^2) = \mathbb E((K*\psi)(X_1)^2) \le \|f\|_\infty\,\|K*\psi\|_2^2 \le \|f\|_\infty\,\|k\|_1^{2d}\,\|\psi\|_2^2.$$
Proof of Proposition 2.3.
Consider
$K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$. Then, there exist $m,m'\in\{1,\dots,m_{\max}\}^d$ such that
$$K(x',x) = \prod_{q=1}^d\sum_{j=1}^{m_q}\varphi_j^{m_q}(x_q)\,\varphi_j^{m_q}(x'_q)\quad\text{and}\quad K'(x',x) = \prod_{q=1}^d\sum_{j=1}^{m'_q}\varphi_j^{m'_q}(x_q)\,\varphi_j^{m'_q}(x'_q)$$
for every $x,x'\in\mathbb R^d$.

(1) For every $x'\in\mathbb R^d$,
$$\|K(x',\cdot)\|_2^2 = \prod_{q=1}^d\sum_{j,j'=1}^{m_q}\varphi_{j'}^{m_q}(x'_q)\,\varphi_j^{m_q}(x'_q)\int_{-\infty}^\infty \varphi_{j'}^{m_q}(x)\,\varphi_j^{m_q}(x)\,dx = \prod_{q=1}^d\sum_{j=1}^{m_q}\varphi_j^{m_q}(x'_q)^2 \le \mathfrak m_{\mathcal B}^d\prod_{q=1}^d m_q \le \mathfrak m_{\mathcal B}^d\, n.$$

(2) Since
$$s_{K,\ell}(\cdot) = \sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle s,\ \varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle\,(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(\cdot),$$
by Pythagoras' theorem, $\|s_{K,\ell}\|_2 \le \|s\|_2$.

(3) First,
$$\mathfrak s_{K',\ell} = \mathbb E\left(\ell(Y_1)^2\prod_{q=1}^d\sum_{j=1}^{m'_q}\varphi_j^{m'_q}(X_{1,q})^2\right) \le \mathfrak m_{\mathcal B}^d\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^d m'_q.$$
On the one hand, if $\mathcal B_1,\dots,\mathcal B_n$ satisfy Condition (5), then
$$\mathbb E(\langle K(X_1,\cdot),\ K'(X_2,\cdot)\ell(Y_2)\rangle^2) = \int_{\mathbb R^d}\mathbb E\left(\left(\prod_{q=1}^d\sum_{j=1}^{m_q\wedge m'_q}\varphi_j^{m'_q}(x'_q)\,\varphi_j^{m'_q}(X_{2,q})\right)^2\ell(Y_2)^2\right)f(x')\,\lambda_d(dx') \le \|f\|_\infty\,\mathbb E\left(\ell(Y_2)^2\prod_{q=1}^d\sum_{j,j'=1}^{m_q\wedge m'_q}\varphi_{j'}^{m'_q}(X_{2,q})\,\varphi_j^{m'_q}(X_{2,q})\int_{-\infty}^\infty\varphi_{j'}^{m'_q}(x')\,\varphi_j^{m'_q}(x')\,dx'\right) \le \|f\|_\infty\,\mathfrak s_{K',\ell}.$$
On the other hand, if $\mathcal B_1,\dots,\mathcal B_n$ satisfy Condition (6), then
$$\mathbb E(\langle K(X_1,\cdot),\ K'(X_2,\cdot)\ell(Y_2)\rangle^2) \le \mathbb E(\|K(X_1,\cdot)\|_2^2\,\|K'(X_2,\cdot)\|_2^2\,\ell(Y_2)^2) = \mathbb E(K(X_1,X_1))\,\mathbb E(\|K'(X_2,\cdot)\|_2^2\,\ell(Y_2)^2) \le \overline{\mathfrak m}_{\mathcal B}\,\mathfrak s_{K',\ell}.$$

(4) For every $\psi\in L^2(\mathbb R^d)$,
$$\mathbb E(\langle K(X_1,\cdot),\psi\rangle^2) = \mathbb E\left(\left|\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle\psi,\ \varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle\,(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(X_1)\right|^2\right) \le \|f\|_\infty\left\|\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle\psi,\ \varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle\,(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(\cdot)\right\|_2^2 \le \|f\|_\infty\,\|\psi\|_2^2.$$
Proof of Proposition 3.3.
For the sake of readability, assume that $d = 1$. Consider $K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$. Then, there exist $m,m'\in\{1,\dots,m_{\max}\}$ such that
$$K(x',x) = \sum_{j=1}^m \chi_j(x)\,\chi_j(x')\quad\text{and}\quad K'(x',x) = \sum_{j=1}^{m'}\chi_j(x)\,\chi_j(x')\ ;\ \forall x,x'\in\mathbb R.$$
First, there exist $m(m,m')\in\{1,\dots,n\}$ and $c_1 > 0$, not depending on $n$, $K$ and $K'$, such that for any $x'\in[0,1]$,
$$|\langle K(x',\cdot),\ s_{K',\ell}\rangle| = \left|\sum_{j=1}^{m\wedge m'}\mathbb E(\ell(Y_1)\,\chi_j(X_1))\,\chi_j(x')\right| \le c_1 + 2\left|\sum_{j=1}^{m(m,m')}\mathbb E\big(\ell(Y_1)(\cos(2\pi jX_1)\cos(2\pi jx') + \sin(2\pi jX_1)\sin(2\pi jx'))\,\mathbf 1_{[0,1]}(X_1)\big)\right| = c_1 + 2\left|\sum_{j=1}^{m(m,m')}\mathbb E(\ell(Y_1)\cos(2\pi j(X_1 - x'))\,\mathbf 1_{[0,1]}(X_1))\right|.$$
Moreover, for any $j\in\{1,\dots,m(m,m')\}$,
$$\mathbb E(\ell(Y_1)\cos(2\pi j(X_1 - x'))\,\mathbf 1_{[0,1]}(X_1)) = \int_0^1 \cos(2\pi j(x - x'))\,s(x)\,dx = \frac{1}{j}\left[\frac{\sin(2\pi j(x - x'))}{2\pi}\,s(x)\right]_0^1 + \frac{1}{j^2}\left[\frac{\cos(2\pi j(x - x'))}{4\pi^2}\,s'(x)\right]_0^1 - \frac{1}{j^2}\int_0^1 \frac{\cos(2\pi j(x - x'))}{4\pi^2}\,s''(x)\,dx = \frac{s(0) - s(1)}{2\pi}\cdot\frac{\alpha_j(x')}{j} + \frac{\beta_j(x')}{j^2},$$
where $\alpha_j(x') := \sin(2\pi jx')$ and
$$\beta_j(x') := \frac{1}{4\pi^2}\left((s'(1) - s'(0))\cos(2\pi jx') - \int_0^1 \cos(2\pi j(x - x'))\,s''(x)\,dx\right).$$
Then, there exists a deterministic constant $c_2 > 0$, not depending on $n$, $K$, $K'$ and $x'$, such that

(8)  $\langle K(x',\cdot),\ s_{K',\ell}\rangle \le c_2\left(1 + \left|\displaystyle\sum_{j=1}^{m(m,m')}\frac{\alpha_j(x')}{j}\right| + \left|\displaystyle\sum_{j=1}^{m(m,m')}\frac{\beta_j(x')}{j^2}\right|\right).$

Let us show that each term of the right-hand side of Inequality (8) is uniformly bounded in $x'$, $m$ and $m'$. On the one hand,
$$\left|\sum_{j=1}^{m(m,m')}\frac{\beta_j(x')}{j^2}\right| \le \left(\max_{j\in\{1,\dots,n\}}\|\beta_j\|_\infty\right)\sum_{j=1}^n\frac{1}{j^2} \le \frac{1}{24}\,(2\|s'\|_\infty + \|s''\|_\infty).$$
On the other hand, for every $x\in]0,\pi[$ such that $[\pi/x] + 1 \le m(m,m')$ (without loss of generality),

(9)  $\left|\displaystyle\sum_{j=1}^{m(m,m')}\frac{\sin(jx)}{j}\right| \le \left|\displaystyle\sum_{j=1}^{[\pi/x]}\frac{\sin(jx)}{j}\right| + \left|\displaystyle\sum_{j=[\pi/x]+1}^{m(m,m')}\frac{\sin(jx)}{j}\right| \le x\left[\frac{\pi}{x}\right] + \frac{2}{(1 + [\pi/x])\sin(x/2)} \le \pi + 2.$

Since $x\mapsto\sin(x)$ is continuous, odd and $2\pi$-periodic, Inequality (9) holds true for every $x\in\mathbb R$. So,
$$\left|\sum_{j=1}^{m(m,m')}\frac{\alpha_j(x')}{j}\right| \le \pi + 2.$$
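The uniform bound just obtained can be checked numerically; the following sketch (an illustration, not part of the proof) evaluates the partial sums $\sum_{j\le M}\sin(jx)/j$ on a grid of $x$ values and for a range of $M$, and confirms that they stay below $\pi + 2$.

```python
import numpy as np

# Numerically evaluate sup over x and M of |sum_{j<=M} sin(j x)/j|.
x = np.linspace(0.0, 2 * np.pi, 4001)
worst = 0.0
partial = np.zeros_like(x)
for j in range(1, 501):               # M = 1, ..., 500
    partial += np.sin(j * x) / j      # running partial sum S_M(x)
    worst = max(worst, float(np.abs(partial).max()))

# 'worst' stays around 1.85 (the Gibbs-type limit), far below pi + 2 ~ 5.14.
```

By periodicity and oddness of the sine, restricting $x$ to $[0, 2\pi]$ loses nothing.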
Therefore,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})}\langle K(X_1,\cdot),\ s_{K',\ell}\rangle\right) \le c_2\left(1 + (\pi + 2) + \frac{1}{24}\,(2\|s'\|_\infty + \|s''\|_\infty)\right).$$

Appendix B. Proofs of risk bounds
In this section, the proofs follow the same pattern as in Comte and Marie [2, 3].

B.1.
Preliminary results.
This subsection provides three lemmas used several times in the sequel.
Lemma B.1.
Consider
$$U_{K,K',\ell}(n) := \sum_{i\neq j}\langle K(X_i,\cdot)\ell(Y_i) - s_{K,\ell},\ K'(X_j,\cdot)\ell(Y_j) - s_{K',\ell}\rangle\ ;\ \forall K,K'\in\mathcal K_n.$$
Under Assumption 2.1.(1,2,3), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exists a deterministic constant $c_{B.1} > 0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\left\{\frac{|U_{K,K',\ell}(n)|}{n^2} - \frac{\theta}{n}\,\mathfrak s_{K',\ell}\right\}\right) \le c_{B.1}\,\frac{\log(n)}{\theta n}.$$

Lemma B.2.
Consider
$$V_{K,\ell}(n) := \frac{1}{n}\sum_{i=1}^n\|K(X_i,\cdot)\ell(Y_i) - s_{K,\ell}\|_2^2\ ;\ \forall K\in\mathcal K_n.$$
Under Assumption 2.1.(1,2), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exists a deterministic constant $c_{B.2} > 0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\frac{1}{n}\,|V_{K,\ell}(n) - \mathfrak s_{K,\ell}| - \frac{\theta}{n}\,\mathfrak s_{K,\ell}\right\}\right) \le c_{B.2}\,\frac{\log(n)}{\theta n}.$$

Lemma B.3.
Consider
$$W_{K,K',\ell}(n) := \langle\widehat s_{K,\ell}(n;\cdot) - s_{K,\ell},\ s_{K',\ell} - s\rangle\ ;\ \forall K,K'\in\mathcal K_n.$$
Under Assumption 2.1.(1,2,4), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha > 0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|)) < \infty$, then there exists a deterministic constant $c_{B.3} > 0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\{|W_{K,K',\ell}(n)| - \theta\,\|s_{K',\ell} - s\|_2^2\}\right) \le c_{B.3}\,\frac{\log(n)}{\theta n}.$$

B.1.1.
Proof of Lemma B.1.
Consider $m(n) := 8\log(n)/\alpha$. For any $K,K'\in\mathcal K_n$,
$$U_{K,K',\ell}(n) = U_{K,K',\ell}^1(n) + U_{K,K',\ell}^2(n) + U_{K,K',\ell}^3(n) + U_{K,K',\ell}^4(n),$$
where
$$U_{K,K',\ell}^l(n) := \sum_{i\neq j} g_{K,K',\ell}^l(n; X_i, Y_i, X_j, Y_j)\ ;\ l = 1,2,3,4,$$
with, for every $(x',y),(x'',y')\in E = \mathbb R^d\times\mathbb R$,
$$g_{K,K',\ell}^1(n;x',y,x'',y') := \langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)} - s_{K,\ell}^+(n;\cdot),\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le m(n)} - s_{K',\ell}^+(n;\cdot)\rangle,$$
$$g_{K,K',\ell}^2(n;x',y,x'',y') := \langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)| > m(n)} - s_{K,\ell}^-(n;\cdot),\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le m(n)} - s_{K',\ell}^+(n;\cdot)\rangle,$$
$$g_{K,K',\ell}^3(n;x',y,x'',y') := \langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)} - s_{K,\ell}^+(n;\cdot),\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')| > m(n)} - s_{K',\ell}^-(n;\cdot)\rangle,$$
$$g_{K,K',\ell}^4(n;x',y,x'',y') := \langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)| > m(n)} - s_{K,\ell}^-(n;\cdot),\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')| > m(n)} - s_{K',\ell}^-(n;\cdot)\rangle,$$
and, for every $k\in\mathcal K_n$,
$$s_{k,\ell}^+(n;\cdot) := \mathbb E(k(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le m(n)})\quad\text{and}\quad s_{k,\ell}^-(n;\cdot) := \mathbb E(k(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)| > m(n)}).$$
On the one hand, since $\mathbb E(g_{K,K',\ell}^1(n;x',y,X_1,Y_1)) = 0$ for every $(x',y)\in E$, by Giné and Nickl [7], Theorem 3.4.8, there exists a universal constant $\mathfrak m \ge 1$ such that for any $\lambda > 0$, with probability larger than $1 - 5.54\,e^{-\lambda}$,
$$\frac{|U_{K,K',\ell}^1(n)|}{n^2} \le \frac{\mathfrak m}{n^2}\left(c_{K,K',\ell}(n)\lambda^{1/2} + d_{K,K',\ell}(n)\lambda + b_{K,K',\ell}(n)\lambda^{3/2} + a_{K,K',\ell}(n)\lambda^2\right),$$
where the constants $a_{K,K',\ell}(n)$, $b_{K,K',\ell}(n)$, $c_{K,K',\ell}(n)$ and $d_{K,K',\ell}(n)$ are defined and controlled later. First, note that

(10)  $U_{K,K',\ell}^1(n) = \displaystyle\sum_{i\neq j}\big(\varphi_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j) - \psi_{K,K',\ell}(n;X_i,Y_i) - \psi_{K',K,\ell}(n;X_j,Y_j) + \mathbb E(\varphi_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j))\big),$

where
$$\varphi_{K,K',\ell}(n;x',y,x'',y') := \langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)},\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le m(n)}\rangle$$
and
$$\psi_{k,k',\ell}(n;x',y) := \langle k(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)},\ s_{k',\ell}^+(n;\cdot)\rangle = \mathbb E(\varphi_{k,k',\ell}(n;x',y,X_1,Y_1))$$
for every $k,k'\in\mathcal K_n$ and $(x',y),(x'',y')\in E$. Let us now control $a_{K,K',\ell}(n)$, $b_{K,K',\ell}(n)$, $c_{K,K',\ell}(n)$ and $d_{K,K',\ell}(n)$:

• The constant $a_{K,K',\ell}(n)$.
Consider
$$a_{K,K',\ell}(n) := \sup_{(x',y),(x'',y')\in E}|g_{K,K',\ell}^1(n;x',y,x'',y')|.$$
By (10), Cauchy-Schwarz's inequality and Assumption 2.1.(1),
$$a_{K,K',\ell}(n) \le 4\sup_{(x',y),(x'',y')\in E}|\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)},\ K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le m(n)}\rangle| \le 4\,m(n)^2\left(\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2\right)\left(\sup_{x''\in\mathbb R^d}\|K'(x'',\cdot)\|_2\right) \le 4\,\mathfrak m_{\mathcal K,\ell}\,m(n)^2\,n.$$
So,
$$\frac{\mathfrak m}{n^2}\,a_{K,K',\ell}(n)\lambda^2 \le \frac{4\,\mathfrak m\,\lambda^2}{n}\,\mathfrak m_{\mathcal K,\ell}\,m(n)^2.$$

• The constant $b_{K,K',\ell}(n)$. Consider
$$b_{K,K',\ell}(n)^2 := n\sup_{(x',y)\in E}\mathbb E(g_{K,K',\ell}^1(n;x',y,X_1,Y_1)^2).$$
By (10), Jensen's inequality, Cauchy-Schwarz's inequality and Assumption 2.1.(1),
$$b_{K,K',\ell}(n)^2 \le 4\,n\sup_{(x',y)\in E}\mathbb E(\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le m(n)},\ K'(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le m(n)}\rangle^2) \le 4\,n\,m(n)^2\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2^2\ \mathbb E(\|K'(X_1,\cdot)\ell(Y_1)\|_2^2) \le 4\,\mathfrak m_{\mathcal K,\ell}\,n^2\,m(n)^2\,\mathfrak s_{K',\ell}.$$
So, for any $\theta\in]0,1[$,
$$\frac{\mathfrak m}{n^2}\,b_{K,K',\ell}(n)\lambda^{3/2} \le \frac{\theta}{12\,\mathfrak m}\cdot\frac{\mathfrak s_{K',\ell}}{n} + \frac{12\,\mathfrak m^3\lambda^3}{\theta n}\,\mathfrak m_{\mathcal K,\ell}\,m(n)^2.$$

• The constant $c_{K,K',\ell}(n)$.
Consider $c_{K,K',\ell}(n)^2:=n^2\,\mathbb E(g_{K,K',\ell}(n;X_1,Y_1,X_2,Y_2)^2)$. By (10), Jensen's inequality and Assumption 2.1.(3),
$c_{K,K',\ell}(n)^2\leqslant n^2\,\mathbb E\big(\langle K(X_1,\cdot)\ell(Y_1)\,\mathbf 1_{|\ell(Y_1)|\leqslant m(n)},K'(X_2,\cdot)\ell(Y_2)\,\mathbf 1_{|\ell(Y_2)|\leqslant m(n)}\rangle^2\big)\leqslant n^2\,m(n)^2\,\mathbb E\big(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle^2\big)\leqslant m_{\mathcal K,\ell}\,n^2\,m(n)^2\,s_{K',\ell}.$
So,
$\dfrac{c_{K,K',\ell}(n)\lambda^{1/2}}{n^2}\leqslant\dfrac{\theta}{2mn}s_{K',\ell}+\dfrac{m\lambda}{2\theta n}m_{\mathcal K,\ell}\,m(n)^2.$
• The constant $d_{K,K',\ell}(n)$. Consider $d_{K,K',\ell}(n):=\sup_{(a,b)\in\mathcal A}\mathbb E\sum_i$
$T>0$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant T+\displaystyle\int_T^{\infty}\mathbb P\big(S_{\mathcal K,\ell}^{1}(n,\theta)\geqslant(1+\lambda_{\mathcal K,\ell}(n,\theta,t))\,m_{\mathcal K,\ell}(n,\theta)\big)dt\leqslant T+5.4\,c_1\,|\mathcal K_n|^2\,m_{\mathcal K,\ell}(n,\theta)\exp\Big(-\dfrac{T^{1/2}}{m_{\mathcal K,\ell}(n,\theta)^{1/2}}\Big)$
with $c_1=\int_0^{\infty}e^{-r^{1/2}/2}dr$. Moreover,
$m_{\mathcal K,\ell}(n,\theta)\leqslant c_2\dfrac{\log(n)^2}{\theta n}$ with $c_2=\dfrac{40\,m\,m_{\mathcal K,\ell}}{\alpha^2}$.
So, by taking $T=2c_2\log(n)^2/(\theta n)$, and since $|\mathcal K_n|\leqslant n$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant 2c_2\dfrac{\log(n)^2}{\theta n}+5.4\,c_1\,m_{\mathcal K,\ell}(n,\theta)\dfrac{|\mathcal K_n|^2}{n^2}\leqslant(2+5.4\,c_1)\,c_2\dfrac{\log(n)^2}{\theta n}.$
† On the other hand, by Assumption 2.1.(1), the Cauchy-Schwarz inequality and Markov's inequality,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}|g_{K,K',\ell}^{2}(n;X_1,Y_1,X_2,Y_2)|\Big)\leqslant m(n)\displaystyle\sum_{K,K'\in\mathcal K_n}\mathbb E\big(|\ell(Y_1)|\,\mathbf 1_{|\ell(Y_1)|>m(n)}\,|\langle K(X_1,\cdot),K'(X_2,\cdot)\rangle|\big)\leqslant m(n)\,m_{\mathcal K,\ell}\,n\,|\mathcal K_n|^2\,\mathbb E(\ell(Y_1)^2)^{1/2}\,\mathbb P(|\ell(Y_1)|>m(n))^{1/2}\leqslant c_3\dfrac{\log(n)}{n}$
with $c_3=\dfrac{32}{\alpha}m_{\mathcal K,\ell}\,\mathbb E(\ell(Y_1)^2)^{1/2}\,\mathbb E(\exp(\alpha|\ell(Y_1)|))^{1/2}$. So,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\dfrac{|U_{K,K',\ell}^{2}(n)|}{n^2}\Big)\leqslant c_3\dfrac{\log(n)}{n}$
and, symmetrically,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\dfrac{|U_{K,K',\ell}^{3}(n)|}{n^2}\Big)\leqslant c_3\dfrac{\log(n)}{n}.$
By Assumption 2.1.(1), the Cauchy-Schwarz inequality and Markov's inequality,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}|g_{K,K',\ell}^{4}(n;X_1,Y_1,X_2,Y_2)|\Big)\leqslant\displaystyle\sum_{K,K'\in\mathcal K_n}\mathbb E\big(|\ell(Y_1)\ell(Y_2)|\,\mathbf 1_{|\ell(Y_1)|,|\ell(Y_2)|>m(n)}\,|\langle K(X_1,\cdot),K'(X_2,\cdot)\rangle|\big)\leqslant m_{\mathcal K,\ell}\,n\,|\mathcal K_n|^2\,\mathbb E(\ell(Y_1)^2)\,\mathbb P(|\ell(Y_1)|>m(n))\leqslant\dfrac{c_4}{n}$
with $c_4=4\,m_{\mathcal K,\ell}\,\mathbb E(\ell(Y_1)^2)\,\mathbb E(\exp(\alpha|\ell(Y_1)|))$. So,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\dfrac{|U_{K,K',\ell}^{4}(n)|}{n^2}\Big)\leqslant\dfrac{c_4}{n}.$
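The bounds above rest on a single truncation device, reused throughout the appendix: for a nonnegative variable $S$ and any $T>0$, $\mathbb E(S)=\int_0^\infty\mathbb P(S>t)\,dt\leqslant T+\int_T^\infty\mathbb P(S>t)\,dt$, and a truncation level $T$ of order $\log(n)$ makes the integrated tail negligible. The following toy numerical check (not the paper's code; iid Exp(1) maxima stand in for the suprema over $\mathcal K_n$) illustrates why the supremum costs only a logarithmic factor:

```python
import math

# E(S) <= T + integral_T^inf P(S > t) dt for any T > 0 and S >= 0.
# Toy case: S = max of n iid Exp(1) variables, so P(S > t) <= n e^{-t}
# and the device gives E(S) <= T + n e^{-T}; T = log(n) yields log(n) + 1.

def tail_bound(n: int, T: float) -> float:
    # truncation level plus integrated union tail bound
    return T + n * math.exp(-T)

def exact_mean_max(n: int) -> float:
    # E(max of n iid Exp(1)) is the harmonic number H_n
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    # the bound holds, and both sides grow like log(n)
    assert exact_mean_max(n) <= tail_bound(n, math.log(n))
```

The optimized truncation level $T=\log(n)$ mirrors the choices of $T$ proportional to $\log(n)$ made in the proofs.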
Therefore,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\Big\{\dfrac{|U_{K,K',\ell}(n)|}{n^2}-\dfrac{\theta}{n}s_{K',\ell}\Big\}\Big)\leqslant(2+5.4\,c_1)\,c_2\dfrac{\log(n)^2}{\theta n}+2c_3\dfrac{\log(n)}{n}+\dfrac{c_4}{n}.$
Proof of Lemma B.2.
First, the two following results are used several times in the sequel:
(11) $\|s_{K,\ell}\|^2\leqslant\mathbb E(\ell(Y_1)^2)\displaystyle\int_{\mathbb R^d}f(x')\int_{\mathbb R^d}K(x',x)^2\,\lambda_d(dx)\lambda_d(dx')\leqslant\mathbb E(\ell(Y_1)^2)\,m_{\mathcal K,\ell}\,n$
and
(12) $\mathbb E(V_{K,\ell}(n))=\mathbb E(\|K(X_1,\cdot)\ell(Y_1)-s_{K,\ell}\|^2)=\mathbb E(\|K(X_1,\cdot)\ell(Y_1)\|^2)+\|s_{K,\ell}\|^2-2\displaystyle\int_{\mathbb R^d}s_{K,\ell}(x)\,\mathbb E(K(X_1,x)\ell(Y_1))\,\lambda_d(dx)=s_{K,\ell}-\|s_{K,\ell}\|^2.$
Consider $m(n):=2\log(n)/\alpha$ and
$v_{K,\ell}(n):=V_{K,\ell}(n)-\mathbb E(V_{K,\ell}(n))=v_{K,\ell}^{1}(n)+v_{K,\ell}^{2}(n),$
where
$v_{K,\ell}^{j}(n)=\dfrac1n\displaystyle\sum_{i=1}^{n}\big(g_{K,\ell}^{j}(n;X_i,Y_i)-\mathbb E(g_{K,\ell}^{j}(n;X_i,Y_i))\big)\,;\;j=1,2,$
with, for every $(x',y)\in E$,
$g_{K,\ell}^{1}(n;x',y):=\|K(x',\cdot)\ell(y)-s_{K,\ell}\|^2\,\mathbf 1_{|\ell(y)|\leqslant m(n)}$
and
$g_{K,\ell}^{2}(n;x',y):=\|K(x',\cdot)\ell(y)-s_{K,\ell}\|^2\,\mathbf 1_{|\ell(y)|>m(n)}.$
On the one hand, by Bernstein's inequality, for any $\lambda>0$, with probability larger than $1-2e^{-\lambda}$,
$|v_{K,\ell}^{1}(n)|\leqslant\sqrt{\dfrac{2\lambda}{n}w_{K,\ell}(n)}+\dfrac{\lambda}{n}c_{K,\ell}(n)$
where $c_{K,\ell}(n)=\frac13\|g_{K,\ell}^{1}(n;\cdot)\|_{\infty}$ and $w_{K,\ell}(n)=\mathbb E(g_{K,\ell}^{1}(n;X_1,Y_1)^2)$. Moreover,
$c_{K,\ell}(n)=\dfrac13\sup_{(x',y)\in E}\|K(x',\cdot)\ell(y)-s_{K,\ell}\|^2\,\mathbf 1_{|\ell(y)|\leqslant m(n)}\leqslant\dfrac23\Big(m(n)^2\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|^2+\|s_{K,\ell}\|^2\Big)\leqslant\dfrac23\big(m(n)^2+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,n$
by Inequality (11), and
$w_{K,\ell}(n)\leqslant\|g_{K,\ell}^{1}(n;\cdot)\|_{\infty}\,\mathbb E(V_{K,\ell}(n))\leqslant 2\big(m(n)^2+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,n\,\big(s_{K,\ell}-\|s_{K,\ell}\|^2\big)$
by Inequality (11) and Equality (12). Then, for any $\theta\in]0,1[$,
$|v_{K,\ell}^{1}(n)|\leqslant 2\sqrt{\lambda\big(m(n)^2+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,\big(s_{K,\ell}-\|s_{K,\ell}\|^2\big)}+\dfrac{2\lambda}{3}\big(m(n)^2+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\leqslant\theta\,s_{K,\ell}+\dfrac{5\lambda}{3\theta}\big(1+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,m(n)^2$
with probability larger than $1-2e^{-\lambda}$. So, with probability larger than $1-2|\mathcal K_n|e^{-\lambda}$,
$S_{\mathcal K,\ell}^{1}(n,\theta):=\sup_{K\in\mathcal K_n}\Big\{\dfrac{|v_{K,\ell}^{1}(n)|}{n}-\dfrac{\theta}{n}s_{K,\ell}\Big\}\leqslant\dfrac{5\lambda}{3\theta n}\big(1+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,m(n)^2.$
For every $t\in\mathbb R_+$, consider $\lambda_{\mathcal K,\ell}(n,\theta,t):=\dfrac{t}{m_{\mathcal K,\ell}(n,\theta)}$ with
$m_{\mathcal K,\ell}(n,\theta)=\dfrac{5}{3\theta n}\big(1+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}\,m(n)^2.$
Then, for any $T>0$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant T+\displaystyle\int_T^{\infty}\mathbb P\big(S_{\mathcal K,\ell}^{1}(n,\theta)\geqslant\lambda_{\mathcal K,\ell}(n,\theta,t)\,m_{\mathcal K,\ell}(n,\theta)\big)dt\leqslant T+2c_1\,|\mathcal K_n|\,m_{\mathcal K,\ell}(n,\theta)\exp\Big(-\dfrac{T}{2\,m_{\mathcal K,\ell}(n,\theta)}\Big)$
with $c_1=\int_0^{\infty}e^{-r/2}dr=2$. Moreover,
$m_{\mathcal K,\ell}(n,\theta)\leqslant c_2\dfrac{\log(n)^2}{\theta n}$ with $c_2=\dfrac{20}{3\alpha^2}\big(1+\mathbb E(\ell(Y_1)^2)\big)\,m_{\mathcal K,\ell}.$
So, by taking $T=2c_2\log(n)^2/(\theta n)$, and since $|\mathcal K_n|\leqslant n$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant 2c_2\dfrac{\log(n)^2}{\theta n}+4\,m_{\mathcal K,\ell}(n,\theta)\dfrac{|\mathcal K_n|}{n}\leqslant 6c_2\dfrac{\log(n)^2}{\theta n}.$
† On the other hand, by Inequality (11) and Markov's inequality,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\dfrac{|v_{K,\ell}^{2}(n)|}{n}\Big)\leqslant\dfrac2n\,\mathbb E\Big(\sup_{K\in\mathcal K_n}\|K(X_1,\cdot)\ell(Y_1)-s_{K,\ell}\|^2\,\mathbf 1_{|\ell(Y_1)|>m(n)}\Big)\leqslant\dfrac2n\,\mathbb E\Big[\Big|\ell(Y_1)^2\sup_{K\in\mathcal K_n}\|K(X_1,\cdot)\|^2+\sup_{K\in\mathcal K_n}\|s_{K,\ell}\|^2\Big|^2\Big]^{1/2}\mathbb P(|\ell(Y_1)|>m(n))^{1/2}\leqslant\dfrac{c_3}{n}$
with $c_3=8\,m_{\mathcal K,\ell}\,\mathbb E(\ell(Y_1)^4)^{1/2}\,\mathbb E(\exp(\alpha|\ell(Y_1)|))^{1/2}$. Therefore,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\dfrac{|v_{K,\ell}(n)|}{n}-\dfrac{\theta}{n}s_{K,\ell}\Big\}\Big)\leqslant 6c_2\dfrac{\log(n)^2}{\theta n}+\dfrac{c_3}{n}$
and, by Equality (12), the definition of $v_{K,\ell}(n)$ and Assumption 2.1.(2),
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\dfrac1n\big|V_{K,\ell}(n)-s_{K,\ell}\big|-\dfrac{\theta}{n}s_{K,\ell}\Big\}\Big)\leqslant 6c_2\dfrac{\log(n)^2}{\theta n}+\dfrac{c_3+m_{\mathcal K,\ell}}{n}.$
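Both this lemma and Lemma B.3 start from Bernstein's inequality for bounded centered iid variables. As a standalone sanity check (illustrative only: Rademacher variables play the role of the truncated summands, which is not the paper's setting), the deviation probability can be compared to the $2e^{-\lambda}$ bound by Monte Carlo:

```python
import math
import random

# Bernstein: for iid centered X_i with |X_i| <= c and Var(X_i) <= v,
#   P( |mean| >= sqrt(2 v lam / n) + c lam / (3 n) ) <= 2 exp(-lam).
# Toy check with Rademacher variables, for which v = c = 1.
rng = random.Random(0)
n, lam, reps = 100, 3.0, 2000
threshold = math.sqrt(2 * lam / n) + lam / (3 * n)
exceed = sum(
    abs(sum(rng.choice((-1.0, 1.0)) for _ in range(n))) / n >= threshold
    for _ in range(reps)
)
# empirical exceedance frequency stays below the Bernstein bound
assert exceed / reps <= 2 * math.exp(-lam)
```

Here the empirical exceedance rate is roughly one percent while the bound $2e^{-3}\approx 0.10$, illustrating the slack absorbed in the constants of the lemma.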
Proof of Lemma B.3.
Consider $m(n)=12\log(n)/\alpha$. For any $K,K'\in\mathcal K_n$,
$W_{K,K',\ell}(n)=W_{K,K',\ell}^{1}(n)+W_{K,K',\ell}^{2}(n)$
where
$W_{K,K',\ell}^{j}(n):=\dfrac1n\displaystyle\sum_{i=1}^{n}\big(g_{K,K',\ell}^{j}(n;X_i,Y_i)-\mathbb E(g_{K,K',\ell}^{j}(n;X_i,Y_i))\big)\,;\;j=1,2,$
with, for every $(x',y)\in E$,
$g_{K,K',\ell}^{1}(n;x',y):=\langle K(x',\cdot)\ell(y),s_{K',\ell}-s\rangle\,\mathbf 1_{|\ell(y)|\leqslant m(n)}$
and
$g_{K,K',\ell}^{2}(n;x',y):=\langle K(x',\cdot)\ell(y),s_{K',\ell}-s\rangle\,\mathbf 1_{|\ell(y)|>m(n)}.$
On the one hand, by Bernstein's inequality, for any $\lambda>0$, with probability larger than $1-2e^{-\lambda}$,
$|W_{K,K',\ell}^{1}(n)|\leqslant\sqrt{\dfrac{2\lambda}{n}v_{K,K',\ell}(n)}+\dfrac{\lambda}{n}c_{K,K',\ell}(n)$
where $c_{K,K',\ell}(n)=\frac13\|g_{K,K',\ell}^{1}(n;\cdot)\|_{\infty}$ and $v_{K,K',\ell}(n)=\mathbb E(g_{K,K',\ell}^{1}(n;X_1,Y_1)^2)$. Moreover,
$c_{K,K',\ell}(n)=\dfrac13\sup_{(x',y)\in E}|\langle K(x',\cdot)\ell(y),s_{K',\ell}-s\rangle|\,\mathbf 1_{|\ell(y)|\leqslant m(n)}\leqslant m_{\mathcal K,\ell}^{1/2}\,n^{1/2}\,m(n)\,\|s_{K',\ell}-s\|$
by Assumption 2.1.(1), and
$v_{K,K',\ell}(n)\leqslant\mathbb E\big(\langle K(X_1,\cdot)\ell(Y_1),s_{K',\ell}-s\rangle^2\,\mathbf 1_{|\ell(Y_1)|\leqslant m(n)}\big)\leqslant m(n)^2\,m_{\mathcal K,\ell}\,\|s_{K',\ell}-s\|^2$
by Assumption 2.1.(4).
Then, since $\lambda>0$, for any $\theta\in]0,1[$,
$|W_{K,K',\ell}^{1}(n)|\leqslant\sqrt{\dfrac{2\lambda}{n}m(n)^2\,m_{\mathcal K,\ell}\,\|s_{K',\ell}-s\|^2}+\dfrac{\lambda}{n^{1/2}}m_{\mathcal K,\ell}^{1/2}\,m(n)\,\|s_{K',\ell}-s\|\leqslant\theta\,\|s_{K',\ell}-s\|^2+\dfrac{m_{\mathcal K,\ell}}{\theta n}m(n)^2(1+\lambda)^2$
with probability larger than $1-2e^{-\lambda}$. So, with probability larger than $1-2|\mathcal K_n|^2e^{-\lambda}$,
$S_{\mathcal K,\ell}^{1}(n,\theta):=\sup_{K,K'\in\mathcal K_n}\big\{|W_{K,K',\ell}^{1}(n)|-\theta\,\|s_{K',\ell}-s\|^2\big\}\leqslant\dfrac{m_{\mathcal K,\ell}}{\theta n}m(n)^2(1+\lambda)^2.$
For every $t\in\mathbb R_+$, consider
$\lambda_{\mathcal K,\ell}(n,\theta,t):=-1+\Big(\dfrac{t}{m_{\mathcal K,\ell}(n,\theta)}\Big)^{1/2}$ with $m_{\mathcal K,\ell}(n,\theta)=\dfrac{m_{\mathcal K,\ell}}{\theta n}m(n)^2.$
Then, for any
$T>0$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant T+\displaystyle\int_T^{\infty}\mathbb P\big(S_{\mathcal K,\ell}^{1}(n,\theta)\geqslant(1+\lambda_{\mathcal K,\ell}(n,\theta,t))^2\,m_{\mathcal K,\ell}(n,\theta)\big)dt\leqslant T+2c_1\,|\mathcal K_n|^2\,m_{\mathcal K,\ell}(n,\theta)\exp\Big(-\dfrac{T^{1/2}}{2\,m_{\mathcal K,\ell}(n,\theta)^{1/2}}\Big)$
with $c_1=\int_0^{\infty}e^{-r^{1/2}/2}dr$. Moreover,
$m_{\mathcal K,\ell}(n,\theta)\leqslant c_2\dfrac{\log(n)^2}{\theta n}$ with $c_2=\dfrac{144}{\alpha^2}m_{\mathcal K,\ell}$.
So, by taking $T=2c_2\log(n)^2/(\theta n)$, and since $|\mathcal K_n|\leqslant n$,
$\mathbb E(S_{\mathcal K,\ell}^{1}(n,\theta))\leqslant 2c_2\dfrac{\log(n)^2}{\theta n}+2c_1\,m_{\mathcal K,\ell}(n,\theta)\dfrac{|\mathcal K_n|^2}{n^2}\leqslant(2+2c_1)\,c_2\dfrac{\log(n)^2}{\theta n}.$
On the other hand, by Assumption 2.1.(2,4), the Cauchy-Schwarz inequality and Markov's inequality,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}|W_{K,K',\ell}^{2}(n)|\Big)\leqslant\mathbb E\big(\ell(Y_1)^2\,\mathbf 1_{|\ell(Y_1)|>m(n)}\big)^{1/2}\displaystyle\sum_{K,K'\in\mathcal K_n}\mathbb E\big(\langle K(X_1,\cdot),s_{K',\ell}-s\rangle^2\big)^{1/2}\leqslant m_{\mathcal K,\ell}^{1/2}\,\|s_{K',\ell}-s\|\,\mathbb E(\ell(Y_1)^2)^{1/2}\,|\mathcal K_n|^2\,\mathbb P(|\ell(Y_1)|>m(n))^{1/2}\leqslant\dfrac{c_3}{n}$
with $c_3=2\,m_{\mathcal K,\ell}^{1/2}\big(m_{\mathcal K,\ell}^{1/2}+\|s\|\big)\,\mathbb E(\ell(Y_1)^2)^{1/2}\,\mathbb E(\exp(\alpha|\ell(Y_1)|))^{1/2}$. Therefore,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\big\{|W_{K,K',\ell}(n)|-\theta\,\|s_{K',\ell}-s\|^2\big\}\Big)\leqslant(2+2c_1)\,c_2\dfrac{\log(n)^2}{\theta n}+\dfrac{c_3}{n}\leqslant c_{B.3}\dfrac{\log(n)^2}{\theta n}$
with $c_{B.3}=(2+2c_1)c_2+c_3$.

B.2. Proof of Proposition 2.4.
For any $K\in\mathcal K_n$,
(13) $\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|^2=\dfrac{U_{K,\ell}(n)}{n^2}+\dfrac{V_{K,\ell}(n)}{n}$
with $U_{K,\ell}(n)=U_{K,K,\ell}(n)$ and $V_{K,\ell}(n)=V_{K,K,\ell}(n)$. Then, by Lemmas B.1 and B.2,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\Big|\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|^2-\dfrac{s_{K,\ell}}{n}\Big|-\dfrac{\theta}{n}s_{K,\ell}\Big\}\Big)\leqslant c_{2.4}\dfrac{\log(n)^2}{\theta n}$
with $c_{2.4}=c_{B.1}+c_{B.2}$.

B.3. Proof of Theorem 2.5.
On the one hand, for every $K\in\mathcal K_n$,
$\|\widehat s_{K,\ell}(n;\cdot)-s\|^2-(1+\theta)\Big(\|s_{K,\ell}-s\|^2+\dfrac{s_{K,\ell}}{n}\Big)$
can be written
$\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|^2-(1+\theta)\dfrac{s_{K,\ell}}{n}+2W_{K,K,\ell}(n)-\theta\,\|s_{K,\ell}-s\|^2.$
Then, by Proposition 2.4 and Lemma B.3,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\|\widehat s_{K,\ell}(n;\cdot)-s\|^2-(1+\theta)\Big(\|s_{K,\ell}-s\|^2+\dfrac{s_{K,\ell}}{n}\Big)\Big\}\Big)\leqslant c_1\dfrac{\log(n)^2}{\theta n}$
† with $c_1=c_{2.4}+2c_{B.3}$. On the other hand, for any $K\in\mathcal K_n$,
$\|s_{K,\ell}-s\|^2=\|\widehat s_{K,\ell}(n;\cdot)-s\|^2-\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|^2-2W_{K,\ell}(n).$
Then,
$(1-\theta)\Big(\|s_{K,\ell}-s\|^2+\dfrac{s_{K,\ell}}{n}\Big)-\|\widehat s_{K,\ell}(n;\cdot)-s\|^2\leqslant 2|W_{K,\ell}(n)|-\theta\,\|s_{K,\ell}-s\|^2+\Lambda_{K,\ell}(n)-\dfrac{\theta}{n}s_{K,\ell}$
where
$\Lambda_{K,\ell}(n):=\Big|\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|^2-\dfrac{s_{K,\ell}}{n}\Big|.$
By Equalities (13) and (12),
$\Lambda_{K,\ell}(n)=\Big|\dfrac{U_{K,\ell}(n)}{n^2}+\dfrac{v_{K,\ell}(n)}{n}-\dfrac{\|s_{K,\ell}\|^2}{n}\Big|.$
By Lemmas B.1 and B.2, there exists a deterministic constant $c_2>0$, not depending on $n$ and $\theta$, such that
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\Lambda_{K,\ell}(n)-\dfrac{\theta}{n}s_{K,\ell}\Big\}\Big)\leqslant c_2\dfrac{\log(n)^2}{\theta n}.$
By Lemma B.3,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\big\{2|W_{K,\ell}(n)|-\theta\,\|s_{K,\ell}-s\|^2\big\}\Big)\leqslant 2c_{B.3}\dfrac{\log(n)^2}{\theta n}.$
Therefore,
$\mathbb E\Big(\sup_{K\in\mathcal K_n}\Big\{\|s_{K,\ell}-s\|^2+\dfrac{s_{K,\ell}}{n}-\dfrac{1}{1-\theta}\|\widehat s_{K,\ell}(n;\cdot)-s\|^2\Big\}\Big)\leqslant c_{2.5}\dfrac{\log(n)^2}{\theta(1-\theta)n}$
with $c_{2.5}=2c_{B.3}+c_2$.

B.4. Proof of Theorem 3.2.
The proof of Theorem 3.2 is divided into three steps.
Step 1.
This first step provides a suitable decomposition of $\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2$. First,
$\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2=\|\widehat s_{\widehat K,\ell}(n;\cdot)-\widehat s_{K_0,\ell}(n;\cdot)\|^2+\|\widehat s_{K_0,\ell}(n;\cdot)-s\|^2-2\langle\widehat s_{K_0,\ell}(n;\cdot)-\widehat s_{\widehat K,\ell}(n;\cdot),\widehat s_{K_0,\ell}(n;\cdot)-s\rangle.$
From (7), it follows that for any $K\in\mathcal K_n$,
(14) $\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2\leqslant\|\widehat s_{K,\ell}(n;\cdot)-\widehat s_{K_0,\ell}(n;\cdot)\|^2+{\rm pen}(K)-{\rm pen}(\widehat K)+\|\widehat s_{K_0,\ell}(n;\cdot)-s\|^2-2\langle\widehat s_{K_0,\ell}(n;\cdot)-\widehat s_{\widehat K,\ell}(n;\cdot),\widehat s_{K_0,\ell}(n;\cdot)-s\rangle=\|\widehat s_{K,\ell}(n;\cdot)-s\|^2+\psi_n(\widehat K)-\psi_n(K)$
where $\psi_n(K):=2\langle\widehat s_{K,\ell}(n;\cdot)-s,\widehat s_{K_0,\ell}(n;\cdot)-s\rangle-{\rm pen}(K)$. Let us complete the decomposition of $\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2$ by writing $\psi_n(K)=2(\psi_{1,n}(K)+\psi_{2,n}(K)+\psi_{3,n}(K))$, where
$\psi_{1,n}(K):=\dfrac{U_{K,K_0,\ell}(n)}{n^2},$
$\psi_{2,n}(K):=-\dfrac1n\Big(\dfrac1n\displaystyle\sum_{i=1}^{n}\ell(Y_i)\langle K(X_i,\cdot),s_{K_0,\ell}\rangle+\dfrac1n\displaystyle\sum_{i=1}^{n}\ell(Y_i)\langle K_0(X_i,\cdot),s_{K,\ell}\rangle\Big)+\dfrac1n\langle s_{K_0,\ell},s_{K,\ell}\rangle$
and
$\psi_{3,n}(K):=W_{K,K_0,\ell}(n)+W_{K_0,K,\ell}(n)+\langle s_{K,\ell}-s,s_{K_0,\ell}-s\rangle.$
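The opening expansion of Step 1 is the elementary Hilbert-space identity $\|a-s\|^2=\|a-b\|^2+\|b-s\|^2-2\langle b-a,b-s\rangle$. As a toy numerical check (not the paper's code; random vectors on a discretisation grid stand in for $\widehat s_{\widehat K,\ell}$, $\widehat s_{K_0,\ell}$ and $s$):

```python
import random

# Check ||a - s||^2 = ||a - b||^2 + ||b - s||^2 - 2 <b - a, b - s>
# for vectors in R^50, i.e. the L^2 expansion used at the start of Step 1.

def inner(u, v): return sum(x * y for x, y in zip(u, v))
def norm2(u): return inner(u, u)
def sub(u, v): return [x - y for x, y in zip(u, v)]

rng = random.Random(1)
a, b, s = ([rng.gauss(0, 1) for _ in range(50)] for _ in range(3))
lhs = norm2(sub(a, s))
rhs = norm2(sub(a, b)) + norm2(sub(b, s)) - 2 * inner(sub(b, a), sub(b, s))
assert abs(lhs - rhs) < 1e-9  # identity holds up to float round-off
```

The same identity, applied with $b=\widehat s_{K_0,\ell}(n;\cdot)$, is what lets the comparison-to-overfitting criterion (7) be converted into the bound (14).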
Step 2.
In this step, we control the quantities $\mathbb E(|\psi_{i,n}(K)|)$ and $\mathbb E(|\psi_{i,n}(\widehat K)|)$; $i=1,2,3$.
• By Lemma B.1, for any $\theta\in]0,1[$,
$\mathbb E(|\psi_{1,n}(K)|)\leqslant\dfrac{\theta}{n}s_{K,\ell}+c_{B.1}\dfrac{\log(n)^2}{\theta n}$ and $\mathbb E(|\psi_{1,n}(\widehat K)|)\leqslant\dfrac{\theta}{n}\mathbb E(s_{\widehat K,\ell})+c_{B.1}\dfrac{\log(n)^2}{\theta n}.$
• On the one hand, for any $K,K'\in\mathcal K_n$, consider
$\Psi_{2,n}(K,K'):=\dfrac1n\displaystyle\sum_{i=1}^{n}\ell(Y_i)\langle K(X_i,\cdot),s_{K',\ell}\rangle.$
Then, by Assumption 3.1,
$\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}|\Psi_{2,n}(K,K')|\Big)\leqslant\mathbb E(\ell(Y_1)^2)^{1/2}\,\mathbb E\Big(\sup_{K,K'\in\mathcal K_n}\langle K(X_1,\cdot),s_{K',\ell}\rangle^2\Big)^{1/2}\leqslant m_{\mathcal K,\ell}^{1/2}\,\mathbb E(\ell(Y_1)^2)^{1/2}.$
On the other hand, by Assumption 2.1.(2), $|\langle s_{K,\ell},s_{K_0,\ell}\rangle|\leqslant m_{\mathcal K,\ell}$. Then, there exists a deterministic constant $c_1>0$, not depending on $n$ and $K$, such that
$\mathbb E(|\psi_{2,n}(K)|)\leqslant\dfrac{c_1}{n}$ and $\mathbb E(|\psi_{2,n}(\widehat K)|)\leqslant\dfrac{c_1}{n}.$
• By Lemma B.3,
$\mathbb E(|\psi_{3,n}(K)|)\leqslant\theta\big(\|s_{K,\ell}-s\|^2+\|s_{K_0,\ell}-s\|^2\big)+8c_{B.3}\dfrac{\log(n)^2}{\theta n}+\Big(\dfrac{\theta}{2}\Big)^{1/2}\|s_{K,\ell}-s\|\times\Big(\dfrac{2}{\theta}\Big)^{1/2}\|s_{K_0,\ell}-s\|\leqslant\dfrac{3\theta}{2}\|s_{K,\ell}-s\|^2+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+8c_{B.3}\dfrac{\log(n)^2}{\theta n}$
and
$\mathbb E(|\psi_{3,n}(\widehat K)|)\leqslant\dfrac{3\theta}{2}\,\mathbb E\big(\|s_{\widehat K,\ell}-s\|^2\big)+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+8c_{B.3}\dfrac{\log(n)^2}{\theta n}.$

Step 3.
By the previous step, up to a change of $\theta$, there exists a deterministic constant $c_2>0$, not depending on $n$, $\theta$, $K$ and $K_0$, such that
$\mathbb E(|\psi_n(K)|)\leqslant\theta\Big(\|s_{K,\ell}-s\|^2+\dfrac{s_{K,\ell}}{n}\Big)+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+c_2\dfrac{\log(n)^2}{\theta n}$
and
$\mathbb E(|\psi_n(\widehat K)|)\leqslant\theta\,\mathbb E\Big(\|s_{\widehat K,\ell}-s\|^2+\dfrac{s_{\widehat K,\ell}}{n}\Big)+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+c_2\dfrac{\log(n)^2}{\theta n}.$
Then, by Theorem 2.5,
$\mathbb E(|\psi_n(K)|)\leqslant\dfrac{\theta}{1-\theta}\mathbb E\big(\|\widehat s_{K,\ell}(n;\cdot)-s\|^2\big)+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+\Big(\dfrac{c_2}{\theta}+\dfrac{c_{2.5}}{\theta(1-\theta)}\Big)\dfrac{\log(n)^2}{n}$
and
$\mathbb E(|\psi_n(\widehat K)|)\leqslant\dfrac{\theta}{1-\theta}\mathbb E\big(\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2\big)+\Big(\theta+\dfrac1\theta\Big)\|s_{K_0,\ell}-s\|^2+\Big(\dfrac{c_2}{\theta}+\dfrac{c_{2.5}}{\theta(1-\theta)}\Big)\dfrac{\log(n)^2}{n}.$
† By decomposition (14), there exist two deterministic constants $c_3,c_4>0$, not depending on $n$, $\theta$, $K$ and $K_0$, such that
$\mathbb E\big(\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2\big)\leqslant\mathbb E\big(\|\widehat s_{K,\ell}(n;\cdot)-s\|^2\big)+\mathbb E(|\psi_n(K)|)+\mathbb E(|\psi_n(\widehat K)|)\leqslant\Big(1+\dfrac{\theta}{1-\theta}\Big)\mathbb E\big(\|\widehat s_{K,\ell}(n;\cdot)-s\|^2\big)+\dfrac{\theta}{1-\theta}\mathbb E\big(\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|^2\big)+\dfrac{c_3}{\theta}\|s_{K_0,\ell}-s\|^2+c_4\dfrac{\log(n)^2}{\theta(1-\theta)n}.$
This concludes the proof.
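To fix ideas, the selection rule analysed in this proof can be sketched numerically. The following toy implementation is hypothetical throughout: it takes the simplest density-estimation case ($\ell=1$), a Gaussian bandwidth family in place of the general kernel set $\mathcal K_n$, and an illustrative variance-type penalty proportional to $1/(nh)$; none of it is the paper's code. It selects a bandwidth by penalized comparison to the most overfitting estimator, the mechanism behind criterion (7):

```python
import math
import random

def kde(data, h, grid):
    # Gaussian kernel density estimator f_hat_h(x) = (1/n) sum_i K_h(x - X_i)
    n = len(data)
    return [sum(math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
                for xi in data) / n for x in grid]

def pco_select(data, bandwidths, grid, dx):
    # Penalized Comparison to Overfitting (toy form): compare each estimator
    # to the most overfitting one (smallest h) and add a penalty of order
    # ||K_h||^2 / n; the constant here is illustrative, not the paper's.
    h0 = min(bandwidths)
    f0 = kde(data, h0, grid)
    def crit(h):
        fh = kde(data, h, grid)
        dist = sum((u - v) ** 2 for u, v in zip(fh, f0)) * dx  # L2 distance to f_hat_{h0}
        pen = 1.0 / (len(data) * h * math.sqrt(math.pi))
        return dist + pen
    return min(bandwidths, key=crit)

rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(200)]
grid = [-4 + 8 * i / 200 for i in range(201)]
hs = [0.05, 0.1, 0.2, 0.4, 0.8]
h_star = pco_select(data, hs, grid, dx=8 / 200)
assert h_star in hs  # the rule returns a member of the candidate family
```

The oracle inequality of Theorem 3.2 is precisely the statement that, up to the $(\theta+1/\theta)\|s_{K_0,\ell}-s\|^2$ and $\log(n)^2/n$ terms, such a selected estimator performs as well as the best member of the family.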
Acknowledgments.
Many thanks to Fabienne Comte for her careful reading and advice.
References

[1] G. Chagny. Warped Bases for Conditional Density Estimation. Math. Methods Statist. 22, 253-282, 2013.
[2] F. Comte and N. Marie. Bandwidth Selection for the Wolverton-Wagner Estimator. Journal of Statistical Planning and Inference 207, 198-214, 2020.
[3] F. Comte and N. Marie. On a Nadaraya-Watson Estimator with Two Bandwidths. Submitted, 2020.
[4] F. Comte and T. Rebafka. Nonparametric Weighted Estimators for Biased Data. Journal of Statistical Planning and Inference 174, 104-128, 2016.
[5] U. Einmahl and D.M. Mason. An Empirical Process Approach to the Uniform Consistency of Kernel-Type Function Estimators. Journal of Theoretical Probability 13, 1-37, 2000.
[6] U. Einmahl and D.M. Mason. Uniform in Bandwidth Consistency of Kernel-Type Function Estimators. Annals of Statistics 33, 1380-1403, 2005.
[7] E. Giné and R. Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, 2015.
[8] A. Goldenshluger and O. Lepski. Bandwidth Selection in Kernel Density Estimation: Oracle Inequalities and Adaptive Minimax Optimality. The Annals of Statistics 39, 1608-1632, 2011.
[9] C. Houdré and P. Reynaud-Bouret. Exponential Inequalities, with Constants, for U-Statistics of Order Two. Stochastic Inequalities and Applications, vol. 56 of Progr. Proba., 55-69, Birkhäuser, 2003.
[10] C. Lacour, P. Massart and V. Rivoirard. Estimator Selection: a New Method with Applications to Kernel Density Estimation. Sankhya A 79, 2, 298-335, 2017.
[11] M. Lerasle, N.M. Magalhães and P. Reynaud-Bouret. Optimal Kernel Selection for Density Estimation. High Dimensional Probabilities VII: The Cargèse Volume, vol. 71 of Progr. Proba., 435-460, Birkhäuser, 2016.
[12] P. Massart. Concentration Inequalities and Model Selection. Lecture Notes in Mathematics 1896, Springer, 2007.
[13] S. Varet, C. Lacour, P. Massart and V. Rivoirard. Numerical Performance of Penalized Comparison to Overfitting for Multivariate Density Estimation. Preprint, 2020.

*LTCI, Télécom Paris, Palaiseau, France
E-mail address: [email protected]

† Laboratoire Modal'X, Université Paris Nanterre, Nanterre, France
E-mail address: [email protected]

*,† ESME Sudria, Paris, France
E-mail address: [email protected]