Minimax estimation of norms of a probability density: I. Lower bounds
aa r X i v : . [ m a t h . S T ] A ug Minimax estimation of norms of a probabilitydensity: I. Lower bounds
A. Goldenshluger ∗ and O. V. Lepski † Department of StatisticsUniversity of HaifaMount CarmelHaifa 31905, Israele-mail: [email protected]
Institut de Math´ematique de MarseilleAix-Marseille Universit´e39, rue F. Joliot-Curie13453 Marseille, Francee-mail: [email protected]
Abstract:
The paper deals with the problem of nonparametric estimating the L p –norm, p ∈ (1 , ∞ ), of a probability density on R d , d ≥ p is integer or not. Moreover, we develop a general technique for derivation of lower boundson the minimax risk in the problems of estimating nonlinear functionals. The proposedtechnique is applicable for a broad class of nonlinear functionals, and it is used for derivationof the lower bounds in the L p –norm estimation. AMS 2000 subject classifications:
Keywords and phrases: estimation of nonlinear functionals, minimax estimation, mini-max risk, anisotropic Nikolskii’s class.
1. Introduction
Suppose that we observe i.i.d. vectors X i ∈ R d , i = 1 , . . . , n, with common probability density f .Let p > L p -norm of f , k f k p := (cid:20) Z R d | f ( x ) | p d x (cid:21) /p , using observations X ( n ) = ( X , . . . , X n ). By estimator we mean any X ( n ) -measurable map e F : R n → R , and accuracy of an estimator e F is measured by the quadratic risk R n [ e F , f ] := (cid:16) E f (cid:2) e F − k f k p (cid:3) (cid:17) / , where E f denotes expectation with respect to the probability measure P f of observations X ( n ) =( X , . . . , X n ). ∗ Supported by the ISF grant No. 361/15. † This work has been carried out in the framework of the Labex Archim`ede (ANR-11-LABX-0033) and ofthe A*MIDEX project (ANR-11-IDEX-0001-02), funded by the ”Investissements d’Avenir” French Governmentprogram managed by the French National Research Agency (ANR).1 . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density We adopt the minimax approach to the outlined estimation problem. Let F denote the set ofall probability densities defined on R d . With any estimator e F and any subset F of F we associate the maximal risk of e F on F : R n (cid:2) e F , F (cid:3) := sup f ∈F R n [ e F , f ] . The minimax risk is R n [ F ] := inf ˜ F R n [ e F , F ] , where inf is taken over all possible estimators. An estimator e F ∗ is called optimal in order or rate–optimal if R n [ e F ∗ ; F ] ≍ R n [ F ] , n → ∞ . The rate at which R n [ F ] converges to zero as n tends to infinity is referred to as the minimaxrate of convergence .The problems of minimax nonparametric estimation of density functionals have been ex-tensively studied in the literature. The case of linear functionals is particularly well under-stood: here a complete optimality theory under rather general assumptions has been developed[see, e.g., Ibragimov and Khasminskii (1986), Donoho and Liu (1991), Cai and Low (2004) andJuditsky and Nemirovski (2020)]. As for nonlinear functionals, the situation is completely differ-ent: even in the problem of estimating quadratic functionals of a density rate–optimal estimatorsare known only for very specific functional classes. For representative publications dealing withestimation of quadratic and closely related integral functionals of a probability density we refer toBickel and Ritov (1988), Birg´e and Massart (1995), Kerkyacharian and Picard (1996), Laurent(1996, 1997), Gin´e and Nickl (2008) and Tchetgen et al. (2008). The problems of estimatingnon-linear functionals were also considered in the framework of the Gaussian white noise model;e.g., Ibragimov et al. (1986), Nemirovskii (1990) [see also (Nemirovski 2000, Chapters 7 and 8)],Donoho and Nussbaum (1990), Cai and Low (2005). The contribution of this paper is closelyrelated to the works Lepski et al. (1999), Cai and Low (2011) and Han et al. (2019), where theproblem of estimation of norms of a signal observed in Gaussian white noise was studied. Addi-tional pointers to relevant work and discussion of relations between our results and the existingliterature are provided in Section 4.8.This paper deals with the problem of estimating the L p –norm of a probability density andderives lower bounds on asymptotics of the minimax risk over anisotropic Nikolskii’s classes N ~r,d ( ~β, ~L ) (precise definition of the functional class is given below). In the companion paperGoldenshluger and Lepski (2020) we develop the corresponding rate–optimal estimators demon-strating that the derived lower bounds are tight. 
We also study how boundedness of the un-derlying density f in some integral norm influences the estimation accuracy by considering theminimax risk over the functional class F = N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ), where B q ( Q ) := (cid:8) f : R d → R : k f k q ≤ Q (cid:9) , q > , Q > . The contribution of this paper is two–fold. First, we derive lower bounds on the minimax riskon the class F in the problem of estimating the L p –norm of a probability density. Second, wedevelop general machinery for derivation of lower bounds on the minimax risk in the problemsof estimating nonlinear functionals of the typeΨ( f ) = G (cid:18) Z R d H (cid:0) f ( x ) (cid:1) d x (cid:19) , (1.1)where G : R → R and H : R + → R are fixed functions. The developed machinery is appliedfor the problem of estimating k f k p . In order to demonstrate broad applicability of the proposed . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density technique we also provide lower bounds in problems of estimation of other nonlinear functionalsof interest.The rest of this paper is structured as follows. Section 2 presents lower bounds on the minimaxrisk in the problem of estimating the L p –norms of f . Section 3 develops a general technique forderivation of lower bounds in the problems of estimating nonlinear functionals of type (1.1). Themain results of these two sections are proved in Sections 4 and 5 respectively. Appendix containsproofs of auxiliary results.
2. Lower bounds for estimation of the L p –norm We start with the definition of the anisotropic Nikolskii functional classes. Let ( e , . . . , e d ) denotethe canonical basis of R d . For function G : R d → R and real number u ∈ R the first orderdifference operator with step size u in direction of variable x j is defined by ∆ u,j G ( x ) := G ( x + u e j ) − G ( x ) , j = 1 , . . . , d . By induction, the k -th order difference operator with step size u indirection of x j is ∆ ku,j G ( x ) = ∆ u,j ∆ k − u,j G ( x ) = k X l =1 ( − l + k (cid:0) kl (cid:1) ∆ ul,j G ( x ) . Definition 1.
For given vectors ~β = ( β , . . . , β d ) ∈ (0 , ∞ ) d , ~r = ( r , . . . , r d ) ∈ [1 , ∞ ] d , and ~L = ( L , . . . , L d ) ∈ (0 , ∞ ) d we say that function G : R d → R belongs to anisotropic Nikolskii’sclass N ~r,d (cid:0) ~β, ~L (cid:1) if k G k r j ≤ L j for all j = 1 , . . . , d and there exist natural number k j > β j suchthat (cid:13)(cid:13) ∆ k j u,j G (cid:13)(cid:13) r j ≤ L j | u | β j , ∀ u ∈ R , ∀ j = 1 , . . . , d. In addition to constraint f ∈ N ~r,d ( ~β, ~L ) we also assume that f ∈ B q ( Q ). By definition ofNikolskii’s class, f ∈ N ~r,d ( ~β, ~L ) implies f ∈ B r ∗ (max l =1 ,...,d L l ), where r ∗ := max l =1 ,...,d r l . Sincewe are interested in estimating k f k p , it is necessary to suppose that this norm is bounded.Therefore in all what follows we assume that q ≥ p ∨ r ∗ .Asymptotic behavior of the minimax risks on anisotropic Nikolskii’s classes is convenientlyexpressed in terms of the following parameters:1 β := d X j =1 β j , ω := d X j =1 β j r j , L := d Y j =1 L /β j j ,τ ( s ) := 1 − ω + 1 βs , s ∈ [1 , ∞ ] . It is worth mentioning that quantities τ ( · ) appear in embedding theorems for Nikolskii’s classes;for details see Nikolskii (1977).Now we are ready to state lower bounds on the minimax risk in the problem of estimating the L p –norm k f k p . We consider the cases of integer and non–integer p separately. p ≥ Define θ := τ (1) , τ ( p ) ≥ /p − /q − /q − (1 − /p ) τ ( q ) , τ ( p ) < , τ ( q ) < τ ( p ) τ (1) , τ ( p ) < , τ ( q ) ≥ , . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density and let φ n := L − /pτ (1) n − θ ∗ , θ ∗ := 2 − ∧ θ. Theorem 1.
For any ~β ∈ (0 , ∞ ) d , ~L ∈ (0 , ∞ ) d , ~r ∈ [1 , ∞ ] d , q ≥ p ∨ r ∗ and p ∈ N ∗ , p ≥ thereexists c > independent of ~L such that lim inf n →∞ φ − n R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ c. Remark 1.
In the companion paper Goldenshluger and Lepski (2020) we demonstrate that therates of convergence of the minimax risk established in Theorem 1 are minimax , that is, they areattained by explicitly constructed estimation procedures.
The lower bounds on the minimax rates of convergence of Theorem 1 exhibit rather unusualfeatures as compared to the results on estimating the L p –norm of a signal in the Gaussian whitenoise model [see Lepski et al. (1999) and Han et al. (2017)].1. It is quite surprising that the obtained asymptotics of the minimax risk does not dependon p and q if τ ( p ) ≥
1. Perhaps it is even more surprising that in some cases the L p -norm of aprobability density can be estimated with the parametric rate ! On the other hand, it is easilyseen that θ < / τ ( p ) < , τ ( q ) <
0; therefore, the parametric rate is not achievable in thisregime.2. If r ∗ = max l =1 ,...,d r l ≤ p and q = p then uniformly consistent estimators over anisotropicNikol’skii’s classes do not exist when τ ( p ) ≤
0. This together with Remark 1 implies that condi-tion τ ( p ) > necessary and sufficient for existence of uniformly consistent estimators of the L p -norm.3. Taking together the previous remarks, we see that in the considered estimation problemthe full spectrum of asymptotic behavior for the minimax risk is possible: from parametric rateof convergence to inconsistency. To the best of our knowledge this phenomenon has not beenobserved before. p ≥ Define ϑ := ∧ − /pτ (1) , τ ( p ) ≥ − /p ; /p − /q − /q − τ ( q ) , τ ( p ) < − /p, τ ( q ) < ∧ τ ( p ) τ (1) , τ ( p ) < − /p, τ ( q ) ≥ ,ϑ ∗ := − /p ) − τ ( p ) τ (1) , τ ( p ) ≥ − /p ; ϑ, τ ( p ) < − /p, τ ( q ) < p, τ ( p ) < − /p, τ ( q ) ≥ , (2.1)and let φ n := L − /pτ (1) n − ϑ (cid:2) ln( n ) (cid:3) ϑ ∗ − p . Theorem 2.
For any ~β ∈ (0 , ∞ ) d , ~L ∈ (0 , ∞ ) d , ~r ∈ [1 , ∞ ] d and p / ∈ N ∗ , p > there exists c > independent of ~L such that lim inf n →∞ φ − n R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ c. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density
1. Note that the rates of convergence established in Theorems 1 and 2 are different, exceptthe case τ ( p ) < − /p, τ ( q ) ≥
0. As we can see, the estimation accuracy for integer values of p is much better than for the non–integer ones. For the first time this phenomenon was discoveredby Lepski et al. (1999) in the problem of estimating the L p –norm of a signal in the univariateGaussian white noise model.2. Theorem 2 shows that if q = p and τ ( p ) = 0 then there no uniformly consistent estimatorexists. If q = p and τ ( p ) < n . We conjecture thatin this case there are no uniformly consistent estimators as well. If our conjecture is true, theproof of lower bounds will require some additional considerations.3. It is not difficult to check that the rate of convergence corresponding to the zone τ ( p ) ≤ − /p is slower than the one corresponding to τ ( p ) > − /p independently of the value of q . As it was mentioned above, in this paper we do not discuss estimation procedures; we refer toGoldenshluger and Lepski (2020) for construction of rate–optimal estimators of k f k p for integervalues of p . However, for non–integer values of p in some cases very simple constructions leadto nearly rate–optimal adaptive estimators of L p -norms. Let us discuss one such estimator undercondition that density f is uniformly bounded, i.e., q = ∞ .Let p / ∈ N ∗ , p > q = ∞ then ϑ = ( ∧ − /pτ (1) , τ ( p ) ≥ − /p ; ωp , τ ( p ) < − /p, τ ( ∞ ) < ,φ n = L − /pτ (1) n − ϑ (cid:2) ln( n ) (cid:3) ϑ − p . Consider the following sets of parameters: D = (cid:8)(cid:0) ~β, ~r (cid:1) : τ ( p ) > − /p ) (cid:9) ; D = (cid:8)(cid:0) ~β, ~r (cid:1) : τ ( p ) < − /p, τ ( ∞ ) < (cid:9) . Let ℓ > ϕ n := ( L /n ) − /pτ (1) (cid:2) ln( n ) (cid:3) d − − /pτ (1) , ( ~β, ~r ) ∈ D ;( L ln( n ) /n ) ω/p , ( ~β, ~r ) ∈ D . Let b f ( x ), x ∈ R d be the estimator of f ( x ) built in Theorem 1 of Lepski and Willer (2019) inthe case α = 0 [see also Goldenshluger and Lepski (2014)], and consider the plug–in estimator ofthe L p –norm, b F := k b f k p . Theorem 3.
For any
Q > , L > , ℓ > , ~L ∈ [ L , ∞ ) d and any ~β ∈ (0 , ℓ ] d , ~r ∈ (1 , ∞ ] d belonging to D ∪ D there exists C < ∞ , independent of ~L , such that lim sup n →∞ ϕ − n R n (cid:2) b F ; N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B ∞ ( Q ) (cid:3) ≤ C. The proof of this theorem is trivial. By the triangle inequality | b F − k f k p | ≤ k b f − f k p , so thatthe problem of estimating k f k p can be reduced to the problem of adaptive estimation of f underthe L p -loss. The stated upper bound follows from the results of Theorem 3 in Lepski and Willer(2019) corresponding to what is called in that paper tail zone and sparse zone 1 . Combiningbounds of Theorems 2 and 3 we come to the following statement. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Corollary 1.
For any
Q > , L > , ℓ > , ~L ∈ [ L , ∞ ) d and any ~β ∈ (0 , ℓ ] d , ~r ∈ (1 , ∞ ] d belonging to D ∪ D one has for all n large enough (cid:2) ln( n ) (cid:3) γ − p . n ϑ R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B ∞ ( Q ) (cid:3) . (cid:2) ln( n ) (cid:3) γ , where γ := ( − /p ) − τ ( p ) τ (1) , D ,ω/p, D , γ := ( d − − /pτ (1) , D ,ω/p, D . Thus the estimator b F is nearly rate–optimal adaptive over the scale of Nikolskii’s classes whoseparameters belong to D ∪ D . L p -loss Assume that we are interested in estimating density f under L p -loss from the observation X ( n ) .Measuring accuracy of estimation procedures by the L p -loss leads to the quadratic risk in theform R n [ e f , f ] := (cid:16) E f (cid:2) k e f − f k p (cid:3) (cid:17) / . In view of the triangle inequality we have for any f ∈ FR n [ e f , f ] ≥ R n [ e F , f ] , where e F = k e f k p . Hence, whatever the functional class F is, one has R n [ F ] := inf e f sup f ∈ F R n [ e f , f ] ≥ R n [ F ] . and we assert that any lower bound for R n [ F ] is automatically the lower bound for R n [ F ]. Inparticular, assuming that r ∗ ≤ p and putting q = p we deduce from Theorems 1 and 2 thefollowing result. Corollary 2.
Let either < τ ( p ) < − /p if p / ∈ N ∗ or < τ ( p ) < if p ∈ N ∗ . Then lim inf n →∞ n − τ ( p ) τ (1) R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1)(cid:3) ≥ c > . This asymptotics of the minimax risk is related to the fact that the estimated density maybe unbounded. To the best of our knowledge, this result is new. In the one-dimensional case for p = 2 the obtained rate coincides with the one in Birg´e (2014).
3. Lower bounds for estimation of general non–linear functionals
The results of Theorems 1 and 2 follow from general machinery for derivation of lower bounds onminimax risks in the density model. In this section we develop this technique in full generalityfor a broad class of nonlinear functionals to be estimated.Let G : R → R and H : R + → R be fixed functions. We are interested in estimating thefunctional Ψ( f ) = G (cid:18) Z R d H (cid:0) f ( x ) (cid:1) d x (cid:19) (3.1) . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density from observation X ( n ) = ( X , . . . , X n ). Let F be a class of functions defined on R d and let R n (cid:2) F (cid:3) := inf ˜Ψ sup f ∈ F (cid:16) E f (cid:2) ˜Ψ − Ψ( f ) (cid:3) (cid:17) / , where the infimum is taken over all possible estimators of Ψ. Our goal is to derive an explicitlower bound on the minimax risk under mild condition on functions G and H and functionalclass F .We remark that the class of considered functionals is rather broad and includes many probleminstances of interest. Let us give some examples.1. Let G ( y ) = y /p and H ( y ) = y p for some p ∈ (1 , ∞ ); then Ψ( f ) is the L p -norm of f , andestimation of this functional is the subject of the present paper.2. The choice G ( y ) = ay leads to the estimation of the integral-type functionals . The followingparticular cases have been considered in the literature.(a) if a = 1 and H ( y ) = y p with p ∈ N ∗ , p ≥
2, then the corresponding functional isΨ( f ) = k f k pp ; see for instance, Bickel and Ritov (1988), Kerkyacharian and Picard(1996), Laurent (1996, 1997), Tchetgen et al. (2008);(b) the case a = 1 and H ( y ) = − y ln( y ) corresponds to the differential entropy , Ψ( f ) = − R f ( x ) ln f ( x )d x ; see, e.g., Kozachenko and Leonenko (1987);(c) if a = ( p − − and H ( y ) = y − y p with p = 1 then Ψ( f ) is the Tsallis entropy ,Ψ( f ) = ( p − − (1 − R | f ( x ) | p d x ); see Tsallis (1988), Leonenko et al. (2008).3. Let G ( y ) = (1 − p ) − ln( y ) and H ( y ) = y p with p = 1; then the corresponding functionalis the R´enyi entropy , Ψ( f ) = (1 − p ) − ln (cid:0) R | f ( x ) | p d x (cid:1) ; see R´enyi (1961), Leonenko et al.(2008).The technique for derivation of lower bounds relies upon construction of a parameterizedfamily of functions equipped with a pair prior probability measures on it. Below we discuss theseconstruction ingredients in succession. Let Λ : R d → R + be a function satisfying the following conditions:Λ( x ) = 0 , ∀ x / ∈ [ − , d , Z R d Λ( x )d x = 1 . (3.2)Let | · | ∞ denote the ℓ ∞ –norm on R d , and let M be a given finite set of indices of cardinality M = card( M ). Let (cid:8) x m ∈ R d , m ∈ M (cid:9) be a finite set of points in R d satisfying (cid:12)(cid:12) x k − x m (cid:12)(cid:12) ∞ ≥ , ∀ k = m, k, m ∈ M . Fix vector ~σ = ( σ , . . . , σ d ) ∈ (0 , d and constant A > m ∈ M Λ m ( x ) = A Λ (cid:0) ( x − x m ) /~σ (cid:1) , Π m = n x ∈ R d : (cid:12)(cid:12) ( x − x m ) /~σ (cid:12)(cid:12) ∞ ≤ o , (3.3)where the division is understood in the coordinate–wise sense. In words, Π m is a rectangle in R d centered at x m with edges of half–lengths σ , . . . σ d that are parallel to the coordinate axes. It isobvious that Λ m is supported on Π m for any m ∈ M , and Π m are disjoint:Π m ∩ Π k = ∅ , ∀ k = m, k, m ∈ M . (3.4) . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Let Π := R d \ ∪ m ∈M Π m , σ := Q dl =1 σ l , and ̺ w ( z ) := X m ∈M (cid:0) w m (cid:1) z , w ∈ [0 , M , z > . Let f be a probability density supported on Π . Define the family of functions: f w ( x ) := (cid:2) − A σ ̺ w (1) (cid:3) f ( x ) + A X m ∈M w m Λ m ( x ) , w ∈ [0 , M . (3.5)The family { f w , w ∈ [0 , M } involves tuning parameters A , ~σ and M that will be specified inthe sequel. The most important element of our approach consists in equipping [0 , M with twoproduct probability measures, thus assuming that w is a random vector distributed in accordancewith one of them. Then functions f w become random, and they are not necessarily densityfunctions and/or functions from the functional class F for all realizations of w . With conditionsintroduced below we ensure that f w ∈ F ∩ F with large enough probability. Let P [0 ,
1] be the set of all probability measures with total mass on [0 , π ∈ P [0 , z ≥ e π ( z ) := Z x z π (d x ) . Definition 2.
For a pair of probability measures µ, ν ∈ P [0 , we will write µ ∽ ν if e µ (1) = e ν (1) . Let ζ := ( ζ m , m ∈ M ) be independent identically distributed random variables, and ζ m isdistributed π ∈ P [0 , m ∈ M . The law of ζ and the corresponding expectation will be denotedby P π and E π respectively. Define p ζ ( x ) := d Y i =1 f ζ ( x i ) , x ∈ R dn ;here and from now on we regard x = ( x , . . . , x n ), x i ∈ R d as an element of R dn . Now we introduce general assumptions that relate properties of parameterized family { f w , w ∈ [0 , M } and prior measures on [0 , M . Assumption 1.
There exist ε ∈ (0 , and two probability measures µ, ν ∈ P [0 , , µ ∽ ν suchthat P π (cid:8) f ζ ∈ F (cid:9) ≥ − ε, π ∈ { µ, ν } , where f ζ is defined in (3.5). Assumption 1 stipulates that under prior probability measures µ and ν random function f ζ belongs to the functional class F with probability at least 1 − ε . Note that this assumption doesnot guarantee that f ζ is a probability density; by construction, only assumption R f ζ = 1 isfulfilled for all realizations of ζ .We also need conditions that relate parameters A, ~σ, M of the family of functions with thenumber of observations n . . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Assumption 2.
For sufficiently small κ > and sufficiently large υ > one has A σ √ M ≤ κ n − / , (3.6) M ≥ υ, (3.7) A σ M ≤ / . (3.8)Condition (3.7) guarantees that random function f ζ concentrates properly around its expec-tation, while (3.8) implies that f ζ is a probability density for all realizations of ζ . Indeed, byconstruction R f ζ ( x )d x = 1, and, in view of (3.8), f ζ ≥ ̺ ζ (1) ≤ M for all ζ . In addition,condition (3.6) allows us to construct a product form approximation for the Bayesian likelihoodratio E µ [ p ζ ( · )] / E ν [ p ζ ( · )], which is an essential step in the derivation of lower bounds. To state lower bounds on the minimax risk for estimating functional Ψ( f ) we require notationthat involves functions H and G appearing in (3.1).Define functions S : [0 , → R and S : R + → R by S ( z ) := Z Π H (cid:0) (1 − z ) f ( x ) (cid:1) d x, S ( z ) := Z [ − , d H (cid:0) z Λ( x ) (cid:1) d x. (3.9)For π ∈ P [0 ,
1] let E π ( A ) := Z S ( Ay ) π (d y ) , V π ( A ) := (cid:20) Z S ( Ay ) π (d y ) (cid:21) / . (3.10)We tacitly assume that E π ( A ) and V π ( A ) are finite for all A > π ∈ P [0 , { f w , w ∈ [0 , M } one has Z R d H (cid:0) f ζ ( x ) (cid:1) d x = S ( A σ ̺ ζ (1)) + σ X m ∈M S ( Aζ m ) ≈ S ( A σ M e π (1)) + σ M E π ( A ) , (3.11)where the approximate equality in the second line designates that the sums of independent ran-dom variables ̺ ζ (1) and P m ∈M S ( Aζ m ) concentrate properly around their expectations M e π (1)and M E π ( A ) respectively. In addition, the lower bound derivation requires analysis of discrep-ancy between the values of the functional Ψ( f ζ ) when ζ is distributed according to prior measures µ, ν ∈ P [0 , µ ∽ ν . This fact along with (3.11) motivates the following notation.Let µ, ν ∈ P [0 , µ ∽ ν and π ∈ { µ, ν } . Define H ∗ π := S ( A σ M e π (1)) + σ M E π ( A ) , (3.12) α π := η S (cid:0) A σ M e π (1); A σ √ M (cid:1) + σ √ υM V π ( A ) , (3.13) J π := h inf | α |≤ α π G ( H ∗ π + α ) , sup | α |≤ α π G ( H ∗ π + α ) i , (3.14) . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density where η S ( x ; δ ) stands for the local modulus of continuity of function S , η S ( x ; δ ) := sup y : | y − x |≤ δ | S ( x ) − S ( y ) | , x, y ∈ [0 , , δ > . Define also ∆( µ, ν ) := min (cid:8) | x − x ′ | : x ∈ J µ , x ′ ∈ J ν (cid:9) ;clearly, ∆( µ, ν ) is the Hausdorff distance between the intervals J µ and J ν .Finally we let n m ( x ) := n X i =1 Π m ( x i ) , n ( x ) := n − X m ∈M n m ( x ) , x ∈ R dn , where sets Π m , m ∈ M are defined in (3.3). The quantities n m ( x ) and n ( x ) have evidentprobabilistic interpretation: if X ( n ) = ( X , . . . , X n ) is the sample then n m ( X ( n ) ) and n ( X ( n ) )are numbers of observations in the sets Π m and Π respectively. Furthermore, for a pair ofmeasures µ ∽ ν we defineΥ( x ) := Y m ∈M γ m,µ ( x ) γ m,ν ( x ) , x ∈ R dn ,γ m,π ( x ) := Z y n m ( x ) e − Dn ( x ) y π (d y ) , D := A σ − A σ M e π (1) . (3.15)Observe that D does not depend on π ∈ { µ, ν } because µ ∽ ν .Now we are in a position to formulate the main result of this section. Theorem 4.
Let µ, ν ∈ P [0 , , µ ∽ ν , and suppose that Assumptions 1 and 2 are fulfilled. If ∆( µ, ν ) > then e [∆( µ,ν )] − R n [ F ] ≥ E ν h P f ζ (cid:8) Υ (cid:0) X ( n ) (cid:1) ≥ (cid:9)i − υ − − p υ − + ε ) . (3.16)In order to apply general lower bound (3.16) in concrete problem instances we need to computeor bound from below the quantity ∆( µ, ν ) and to show that the right hand side is strictly positive.The next two corollaries of Theorem 4 derive lower bounds on the right hand side of (3.16) underadditional conditions on prior measures and parameters of the family { f w , w ∈ [0 , M } . Definition 3.
Let r ∈ N ∗ , r ≥ be fixed. For µ, ν ∈ P [0 , we write µ r ∽ ν if e µ ( k ) = e ν ( k ) forall k = 1 , . . . , r . Proposition 3 of Section 4.4 presents a sophisticated construction of probability measures µ , ν satisfying conditions of Definition 3 and possessing some additional properties. Corollary 3.
Let r be a positive integer number, possibly dependent on n , such that r > ln(36 υ ) ,and let µ, ν ∈ P [0 , satisfy µ r ∽ ν . Suppose that Assumptions 1 and 2 are fulfilled. Assume thatfor sufficiently small κ > one has nA σ ≤ κ r, M ≤ e r . (3.17) If ∆( µ, ν ) > then putting C ∗ = (36 e ) − / (cid:2) − − υ − − p ε + 2 /υ ) (cid:3) / one has R n [ F ] ≥ C ∗ ∆( µ, ν ) . . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Definition 4.
Let t, r ∈ N ∗ , r ≥ t > be fixed. For µ, ν ∈ P [0 , we write that µ r,t ∽ ν if thefollowing requirements are fulfilled:1. e µ ( k ) = e ν ( k ) for all k = 1 , . . . , r, k = t ;2. e µ ( t ) = e ν ( t ) . Proposition 2 of Section 4.4 presents a construction of measures µ and ν satisfying µ r,t ∽ ν . Corollary 4.
Fix r ≥ t > , and let µ, ν ∈ P [0 , satisfy µ r,t ∽ ν . Suppose that Assumptions 1–2are fulfilled, and for sufficiently small κ > and t independent of n one has nA σ M /t ≤ κ . (3.18) If ∆( µ, ν ) > then R n [ F ] ≥ C ∗ ∆( µ, ν ) , where C ∗ is defined in Corollary 3. In this section we discuss applicability of Theorem 4 and Corollaries 3 and 4, and main ideasthat underlie the proofs of these results.
More general statistical experiments
The proofs of Theorem 4 and Corollaries 3 and 4 donot use the fact that density f is defined on R d . In fact, after minor changes and modificationsour construction is applicable in an arbitrary density model.Let ( X , B , λ ) be a measurable space, and let X be an X -valued random variable whose lawhas the density f with respect to measure λ . Assume that we observe X ( n ) = ( X , . . . , X n ),where X i , i = 1 , . . . , n , are independent copies of X . The goal is to estimate the functionalΨ( f ) = G (cid:18) Z X H (cid:0) f ( x ) (cid:1) λ (d x ) (cid:19) , where as before G : R → R and H : R + → R are fixed functions.Let M be a finite set of indices with cardinality M , possibly dependent on n . Let f , Λ m : X → R + and Π m ∈ B , m ∈ M be collections of measurable functions and sets satisfying thefollowing conditions.(a) Π m ∩ Π k = ∅ for any m = k, m, k ∈ M ;(b) λ (Π m ) = σ > m ∈ M ;(c) Λ m ( x ) = 0 , x / ∈ Π m for any m ∈ M ;(d) R Π m Λ m ( x ) λ (d x ) = 1 for any m ∈ M ;(e) R X f ( x ) λ (d x ) = 1 and f ( x ) = 0 for any x ∈ ∪ m ∈M Π m .Under these conditions some evident minor modifications in definitions should be made; forinstance, function S should be defined as S ( z ) = M − X m ∈M Z X H (cid:0) z Λ m ( x ) (cid:1) λ (d x ) . With these changes the results of Theorem 4 and its corollaries remain valid. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Method of proof
The following fundamental principles and main ideas lie at the core of theproof of Theorem 4 and its corollaries.The first idea goes back to the paper Lepski et al. (1999). It reduces the original estimationproblem to a problem of testing two composite hypotheses for mixture distributions which areobtained by imposing prior probability measures with intersecting supports on parameters of afunctional family. In Tsybakov (2009) this technique is called the method of two fuzzy hypothe-ses . The choice of the prior measures is based on the moment matching technique ; see, e.g.,Cai and Low (2011) and Han et al. (2019), where further references can be found. We clarify themoment matching technique in Propositions 2 and 3 that can be viewed as slight generalizationand modification of the results in Lepski et al. (1999). Detailed proofs of these statements aregiven in Appendix.The second idea is related to construction of a specific parameterized family of densities onwhich the lower bound on the minimax risks is established. Here we use a construction that issimilar to the one proposed in Goldenshluger and Lepski (2014).The third main idea is related to the analysis of the so-called
Bayesian likelihood ratio . Thisanalysis, being common in estimating nonlinear functionals, depends heavily on the consideredstatistical model. The multivariate density model on R d requires development of an originaltechnique because, in contrast to the Gaussian white noise or regression models, the Bayesianlikelihood ratio is not a product of independent random variables. As a consequence, standardmethods based on computation of the Kullback-Leibler, Hellinger or other divergences betweendistributions are not applicable. That is why the proof of Theorem 4 contains development oftwo–sided product–form bounds on the Bayesian likelihood ratio. In this section we discuss implications of our results and the developed technique for otherproblems of estimating nonlinear functionals. In all examples below we consider functionals Ψ( f )of type (3.1) with G ( y ) ≡ y . Denote also F Ψ := (cid:8) f : | Ψ( f ) | < ∞ (cid:9) . Estimation of k f k pp , p ∈ N ∗ In this example H ( y ) = y p , p ∈ N ∗ . Apparently, the case p = 2 is the most well studied setting.Many authors, starting from the seminal paper of Bickel and Ritov (1988), made fundamentalcontributions to the minimax and minimax adaptive estimation of quadratic functionals of prob-ability density; see, for instance, Birg´e and Massart (1995), Laurent (1996, 1997) among manyothers. Kerkyacharian and Picard (1996) studied the case p = 3 and Tchetgen et al. (2008) con-sidered the setting with arbitrary integer p . It is worth noting that all aforementioned papersconsider either univariate or compactly supported densities belonging to a semi–isotropic func-tional class, that is r l = r for any l = 1 , . . . , d .Let us consider the case p = 2 and recall one of the most well known results. Assume thatthe underlying density f is compactly supported and belongs to the anisotropic H¨older class N ~ ∞ ,d (cid:0) ~β, ~L (cid:1) , that is r l = ∞ for all l = 1 , . . . , d . It is well known that tn this setting the minimaxrate of convergence in estimating k f k is given by(1 /n ) β β +1 ∧ ;see, e.g., Bickel and Ritov (1988) for one–dimensional case. In particular, the parametric regimeis possible if and only if β ≥ /
4. On the other hand, close inspection of the proof of Theorem . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density k f k issimply the squared rate found in this theorem in the ”nonparametric” regime. In particular, if r l = ∞ for all l = 1 , . . . , d then Theorem 1 yields the rate(1 /n ) ββ +1 ∧ . We do not know whether this rate is the minimax rate of convergence, but we can assert thatthe parametric rate is not possible if β ≤ /
3. This shows that problems of estimating k f k forcompactly supported densities, and densities supported on the entire space R d are completelydifferent.Another interesting feature is that if q = p , r ∗ ≤ p and τ ( p ) ≤ k f k over anisotropic Nikolskii’s class. This phenomenon is again due tothe fact that the underlying density is assumed to be supported on the entire space R d . Estimation of the differential entropy
This setting corresponds to H ( y ) = − y ln( y ). Applying the same reasoning as in the proof ofTheorem 2 in conjunction with Proposition 4 we are able to prove the following statement. Theorem 5.
There exists c > such that for any ~β ∈ (0 , ∞ ) d , ~L ∈ (0 , ∞ ) d , ~r ∈ [1 , ∞ ] d lim inf n →∞ [ln n ] inf e F sup F Ψ ∩ N ~r,d (cid:0) ~β,~L (cid:1) (cid:16) E f (cid:2) e F − Ψ( f ) (cid:3) (cid:17) / ≥ c. Estimation of k f k pp , p ∈ (0 , H ( y ) = y p , p ∈ (0 , p ∈ (0 , Theorem 6.
For any p ∈ (0 , there exists c > such that for any ~β ∈ (0 , ∞ ) d . ~L ∈ (0 , ∞ ) d and ~r ∈ [1 , ∞ ] d lim inf n →∞ [ln n ] p inf e F sup F Ψ ∩ N ~r,d ( ~β,~L ) (cid:16) E f (cid:2) e F − Ψ( f ) (cid:3) (cid:17) / ≥ c. The rates of convergence established in Theorems 5 and 6 are very slow and do not depend onthe parameters of the functional class. In particular, these results demonstrate that smoothnessalone is not sufficient in order to guarantee a ”reasonable” accuracy of estimation. However, therecent paper Han et al. (2017) dealing with estimation of the differential entropy shows that ifunderlying density satisfies moment conditions then the minimax risk converges to zero at thepolynomial (in n ) rate. It is also clear that the polynomial rate of convergence in estimatingof considered functionals is possible for smooth compactly supported densities. The proof ofTheorems 5 and 6 coincides with the proof of Theorem 2 up to minor modifications.
4. Proofs of Theorems 1 and 2
We will prove only ”nonparametric rates”. A lower bound corresponding to the parametric rateof convergence n − / can be easily derived by reduction of the considered problem to parameterestimation in a regular statistical model. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density The proofs of Theorems 1 and 2 are based on application of Corollaries 3 and 4, and contain manycommon elements. In particular, in both proofs we consider parameterized family of functions { f w , w ∈ [0 , M } defined in (3.2)–(3.5), and we choose the sets Π and Π m , m ∈ M so thatΠ ⊂ ( −∞ , d and Π m ⊂ [0 , ∞ ) d for all m ∈ M . We equip the parameter set [0 , M with a pairof probability measures µ , ν satisfying conditions of one of Definitions 2–4. Along with conditionsof Definitions 2–4 in the proofs of Theorems 1 and 2 we require that the specified probabilitymeasures µ and ν possess the following property: p e π (2 z ) ≤ e π ( z ) , ∀ z ∈ { r , . . . , r d , q, p } , π ∈ { µ, ν } . (4.1)Recall that r , . . . , r d are the coordinates of the vector ~r used in the definition of the Nikolskiiclass. Once measures satisfying conditions of Definitions 2–4 are constructed, they can be easilymodified to satisfy (4.1); for details we refer to the proofs of Propositions 2 and 3. By convention,here and from now on we put [ e π ( z )] /z = 1 for z = ∞ .The parameters A, ~σ, M of the family { f w , w ∈ [0 , M } are specified to guarantee that underimposed prior measures random functions f ζ satisfy required smoothness conditions with theprobability controlled by parameter υ . Under these circumstances, parameter υ of Assumption 2should be such that constant C ∗ in Corollaries 3 and 4 is strictly positive. For instance, the choice υ = 64( d + 1) is sufficient and assumed throughout the proof.In what follows C , C , . . . , and c , c , . . . , denote constants that may depend on ~β, ~r, q, Q andΛ, but they are independent of ~L and n . Let C := R − e − − z d z , and U ( x ) := C − d e − P dj =1 11 − x i [ − , d ( x ) , x = ( x , . . . , x d ) ∈ R d , For
N > a > f ,N ( x ) := ( N ) − d Z R d U ( y − x )1 [ − N − , − d ( y )d y, f ,N ( x ) := a d ¯ f ,N (cid:0) xa (cid:1) . Lemma 1.
The following statements hold. (a).
For any N and a , f ,N is a probability density. For any ~β, ~L ∈ (0 , ∞ ) d and ~r ∈ (0 , ∞ ] d there exists a > such that f ,N ∈ N ~r,d (cid:0) ~β, ~L (cid:1) , ∀ N > . (b). For any
Q > and q ∈ ( r ∗ , ∞ ] there exists N ( q, Q ) > such that f ,N ∈ N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) , ∀ N ≥ N ( q, Q ) . The proof of the lemma is trivial; it is omitted. Let f := f ,N , N ≥ N ( q, Q ) , (4.2)so that statements (a) and (b) of Lemma 1 hold.For µ, ν ∈ P [0 ,
1] and z > e ∗ ( z ) := max[ e µ ( z ) , e ν ( z )]. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Lemma 2.
Let f be given in (4.2) and f w be defined in (3.5) with Λ ∈ C ∞ ( R d ) satisfying (3.2),and ~σ ∈ (0 , d . Let µ, ν ∈ P [0 , satisfy (4.1) and assume that A k Λ k q (cid:2) σ M e ∗ ( q ) (cid:3) /q ≤ Q/
4; (4.3) Aσ − β l l k Λ k r l (cid:2) σ M e ∗ ( r l ) (cid:3) /r l ≤ c L l , ∀ l = 1 , . . . , d. (4.4) Then Assumption 1 is fulfilled for F = N ~r,d ( ~β, ~L ) ∩ B q ( Q ) with ε = 2 − . The proof is given in Appendix.
The sought lower bound depends on parameters A, σ , M of family { f w , w ∈ [0 , M } , and pa-rameters that characterize properties of the prior measures µ and ν in Definitions 3 and 4.All these parameters should be specified to satisfy conditions of Corollaries 3 and 4. It turnsout that the choice of all parameters can be made in a unified way so that the resulting lowerbound in expressed only in terms of the sample size n and properties of measures µ and ν . Thecorresponding statement is given in Proposition 1 below; it is of independent interest.For r ∈ N ∗ we define ς := (cid:26) /n, p ∈ N ∗ ,r/n, p / ∈ N ∗ , t := (cid:26) p, p ∈ N ∗ , ∞ , p / ∈ N ∗ , and d µ,ν := 64 υ [ e ∗ ( p )] | e µ ( p ) − e ν ( p ) | , n r,n := (cid:26) n p/ ( p − , p ∈ N ∗ ,e r ∧ ( n/r ) , p / ∈ N ∗ . Define also J r,n ( µ, ν ) := (cid:2) d µ,ν , n r,n (cid:3) ∩ n x > x − t ς n ≤ , (cid:0) [ e ∗ ( q )] /q ς (cid:1) τ ( q ) x q − − t ) τ ( q ) ≤ o , and M r,n ( µ, ν ) := sup M ∈ J r,n ( µ,ν ) M (cid:2) p − t +(1 − t )( βp − ω ) (cid:3) τ (1) . Several remarks on the choice of parameters A, σ and M that clarify the above definitionsare in order. In our parameter choice we treat condition (4.4) of Lemma 2, the first conditionin (3.17) of Corollary 3 for p / ∈ N ∗ , or condition (3.18) of Corollary 4 for p ∈ N ∗ as equalities .This allows us to express A and σ l , l = 1 , . . . d (and σ ) as functions of M, r and n . All otherconditions [Assumption 2, condition (4.3) of Lemma 2 and the second bound in (3.17) if p / ∈ N ∗ ]for given r and n determine the set to which parameter M should belong. In particular, the set n x > x − / t ς n ≤ , (cid:0) [ e ∗ ( q )] q ς (cid:1) τ ( q ) τ (1) x /q − − / t ) τ ( q ) τ (1) ≤ o is determined by conditions (3.6) of Assumption 2 and (4.3) of Lemma 2. The quantity n r,n isrelated to condition (3.8) of Assumption 2 and to the second requirement in (3.17) of Corollary3 if p / ∈ N ∗ . It is readily seen that J r,n ( µ, ν ) is the intersection of three intervals and, therefore,the value M r,n ( µ, ν ) is attained at the one of their endpoints. The quantity d µ,ν comes from thecondition ∆( µ, ν ) >
0. We remark that d µ,ν > υ and, therefore, condition (3.7) of Assumption2 is not active in the L p -norm estimation. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Proposition 1.
Let r ∈ N ∗ be an integer number, possibly dependent on n , and let p > befixed. Let µ, ν ∈ P [0 , satisfy (4.1), J r,p ( µ, ν ) = ∅ , and assume that µ p,p ∽ ν if p ∈ N ∗ , and µ r ∽ ν if p / ∈ N ∗ . Then R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (cid:0) ς (cid:2) e ∗ ( q ) (cid:3) q (cid:1) τ ( p ) τ (1) M r,n ( µ, ν ) | e µ ( p ) − e ν ( p ) | [ e ∗ ( q )] q [ e ∗ ( p )] − p ! . (4.5)Proposition 1 relates properties of probability measures µ and ν as determined by their mo-ments and parameter r with the sample sample n . It provides a guideline for the choice of priormeasures µ and ν : they should be specified to maximize the right hand side of (4.5). In view of Proposition 1, the proof of Theorems 1 and 2 is reduced to the construction of a pair ofprior probability measures, µ, ν ∈ P [0 , Proposition 2.
For any t, s ∈ N ∗ , s ≥ t > one can construct a pair of probability measures µ, ν ∈ P [0 , such that e µ ( z ) , e ν ( z ) ≥ / for any z > , µ s,t ∽ ν and e µ ( t ) − e ν ( t ) ≥ C s,t := √ t − t − ( s − t )![( s + t − − . For S ∈ C (0 ,
1) and s ∈ N ∗ let ̟ s ( S ) denote accuracy of the best approximation of S on [0 , s : ̟ s ( S ) := inf t ∈ R s +1 sup x ∈ [0 , (cid:12)(cid:12) S ( x ) − P s,a ( x ) (cid:12)(cid:12) , where P s,a ( x ) := P sj =0 a j x j , a = ( a , . . . , a s ) ∈ R s +1 . Proposition 3.
For any S ∈ C (0 , with ̟ s ( S ) > there exist a pair of probability measures µ, ν ∈ P [0 , such that e µ ( z ) , e ν ( z ) ≥ / for any z > , µ s ∽ ν and Z S ( x ) µ (d x ) − Z S ( x ) ν (d x ) = ̟ s ( S ) . We remark that the measures µ, ν constructed in Propositions 2 and 3 obviously satisfy therequirement (4.1).Lower and upper bounds for the accuracy of best approximation ̟ s ( S ) are known for manycontinuous functions S . In particular, the following results can be found in Timan (1963), § § Proposition 4.
For any p > , p / ∈ N ∗ there exists C p > such that ̟ s ( x x p ) ≥ C p s − p , ∀ s ∈ N ∗ . There exists
C > such that ̟ s ( x x ln x ) ≥ Cs − , ∀ s ∈ N ∗ . . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density n − τ ( p ) τ (1) In this case lower bounds of Theorems 1 and 2 coincide. We derive them in a unified way below.Put for brevity e := [ e ∗ ( q )] /q and assume that τ ( q ) ≥ p ∈ N ∗ (under the premise of Theorem 1) then we pick prior measures µ, ν as in Proposi-tion 2 with s = t = p . If p / ∈ N ∗ (under the premise of Theorem 2), then we choose µ, ν as inProposition 3 with s = 2 s , where s is chosen from the relation s = inf (cid:8) s ∈ N ∗ : max (cid:2) C − p,p , υC − p s p (cid:3) ≤ e s (cid:9) . Here C p,p and C p are the constants from Propositions 2 and 4 respectively.This choice guarantees that 0 < d µ,ν < e s and, therefore, choosing M = e s we can assertthat M ∈ J r,n ( µ, ν ) for sufficiently large n . Indeed, if τ ( q ) > ς → , n ς → , [ e ς ] τ ( q ) τ (1) → , n → ∞ and, therefore J r,n ( µ, ν ) ⊇ [ d µ,ν , e s ] for n large enough. By the same reason if τ ( q ) = 0 J r,n ( µ, ν ) ⊇ (cid:2) d µ,ν , e s (cid:3) ∩ (cid:8) x ≥ x − / t ς n ≤ (cid:9) = (cid:2) d µ,ν , e s (cid:3) for sufficiently large n . It remains to note that µ and ν constructed in Propositions 2 and 3satisfy obviously (4.1) and | e µ ( p ) − e ν ( p ) | ≥ c . Thus, applying Proposition 1 with r = s we get R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (1 /n ) τ ( p ) τ (1) . This completes the proof of Theorems 1 and 2 in the cases τ ( q ) ≥ τ ( p ) ≤ p ∈ N ∗ and τ ( q ) ≥ τ ( p ) ≤ − /p if p / ∈ N ∗ . Let prior measures µ, ν be chosen according to Proposition 2 with s = t = p . Hence | e µ ( p ) − e ν ( p ) | ≥ c , d µ,ν = c and (4.1) is fulfilled. Remembering that ς = n − , t = p ≥ J r,n ( µ, ν ) = h c , n pp − i ∩ n x > e /n ] τ ( q ) τ (1) x /q − − /p ) τ ( q ) τ (1) ≤ o . In addition, we deduce from Proposition 1 that for any M ∈ J r,n ( µ, ν ) R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) ς τ ( p ) τ (1) M (1 − /p )(1 / ( βp ) − /ω ) τ (1) . (4.6)Now we consider two cases.(a). Assume first that 1 / ( βp ) − /ω ≥ τ ( p ) ≥
1. Let us show that M := n − /p ∈ J r,n ( µ, ν ). Indeed, n − τ ( q ) τ (1) M /q − − /p ) τ ( q ) τ (1) = n /q − − /p ) τ (1) ≤ q ≥
1. Thus, we conclude from (4.6) R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (1 /n ) τ (1) . . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density (b). Now let us assume that 1 / ( βp ) − /ω < τ ( q ) <
0. Let us show that M := [ e n ] τ ( q )(1 − /p ) τ ( q ) − (1 − /q ) ∈ J r,n ( µ, ν ) . Indeed, we have τ ( q )(1 − /p ) τ ( q ) − (1 − /q ) − − /p = (1 − /q )(1 − /p ) − (1 − /p ) τ ( q ) − (1 − /q ) < M < n pp − . Moreover M → ∞ , n → ∞ . Thus, M ∈ J r,n ( µ, ν ) and we concludefrom (4.6) that R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (1 /n ) /p − /q − /q − (1 − /p ) τ ( q ) . The theorem is proved.
Let prior measures µ, ν be chosen according to Proposition 3 with s = 2 ⌊ ln n ⌋ + 2. In Proposition1 choose r = ⌊ ln n ⌋ + 1; then by Proposition 4 | e µ ( p ) − e ν ( p ) | ≥ C r − p = c (ln n ) − p , d µ,ν ≤ c (ln n ) p . Also, ς = ln( n ) /n and t = ∞ . Hence, we have for all n large enough J r,n ( µ, ν ) = h c (ln n ) p , n/ ln ( n ) i ∩ n x > e ln( n ) /n ] τ ( q ) τ (1) x /q − τ ( q ) τ (1) ≤ o , where we remind that e = [ e ∗ ( q )] /q . Since µ and ν satisfy (4.1) we derive from Proposition 1that for any M ∈ J r,n ( µ, ν ) R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (ln( n ) /n ) τ ( p ) τ (1) M /p +1 / ( βp ) − /ωτ (1) (ln n ) − p . (4.7)Consider now two cases. First we assume first that 1 /p + 1 / ( βp ) − /ω ≥ τ ( p ) ≥ − /p ) and show that M := n/ ln ( n ) ∈ J r,n ( µ, ν ). Indeed, n − τ ( q ) τ (1) M /q − τ ( q ) τ (1) = (cid:2) n/ ln ( n ) (cid:3) /q − τ (1) [ln( n )] − /qτ (1) ≤ n large enough. Hence, M ∈ J r,n ( µ, ν ) and we conclude from (4.7) that R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (1 /n ) − /pτ (1) (ln n ) − /p ) − τ ( p ) τ (1) − p . Second, let τ ( q ) < M := (cid:0) n/ [ e ln( n )] (cid:1) − τ ( q )1 − /q − τ ( q ) ∈ J r,n ( µ, ν ) . Indeed, − τ ( q )1 − /q − τ ( q ) = 1 − − /q − /q − τ ( q ) ∈ (0 , , and, therefore, M ∈ (cid:2) c (ln n ) p , n/ ln ( n ) (cid:3) for all n large enough. Hence M ∈ J r,n ( µ, ν ) and wededuce from (4.7) R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ C L − /pτ (1) (1 /n ) /p − /q − /q − τ ( q ) [ln( n )] /p − /q − /q − τ ( q ) − p . This completes the proof of the theorem. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density The proof consists of two steps.
Computation of ∆( µ, ν ) In the specific case of Ψ( f ) = k f k p we have G ( x ) = x /p , H ( x ) = x p so that functions S and S in (3.9) and quantities E π ( A ), V π ( A ) in (3.10) take the form S ( z ) =(1 − z ) p k f k pp , S ( z ) = z p k Λ k pp , and, correspondingly, E π ( A ) = A p k Λ k pp e π ( p ) , V π ( A ) = A p k Λ k pp [ e π (2 p )] / ≤ A p k Λ k pp e π ( p ) . The last inequality follows from (4.1). Therefore H ∗ π = [1 − A σ M e π (1)] p k f k pp + A p σ k Λ k pp M e π ( p ) . Since the choice of parameters A , σ and M satisfies (3.8), 1 − A σ M e π (1) ≥ /
2. By definitionof f , k f k pp = c N − d ( p − , and N can be chosen arbitrarily large.In particular, denoting e ∗ ( p ) = min[ e µ ( p ) , e ν ( p )] and choosing N so that N − d ( p − = c A p σ k Λ k pp e ∗ ( p ) √ υM with sufficiently small c > A p σ k Λ k pp M e π ( p ) ≤ H ∗ π ≤ A p σ k Λ k pp M e π ( p ) + A p σ k Λ k pp e ∗ ( p ) √ υM . (4.8)Furthermore, | S ( z ) − S ( z ′ ) | = k f k pp (cid:12)(cid:12) (1 − z ) p − (1 − z ′ ) p | ≤ p k f k pp (cid:12)(cid:12) z − z ′ (cid:12)(cid:12) , ∀ z, z ′ ∈ [0 , , so that η S (cid:0) A σ M e π (1); A σ √ υM (cid:1) ≤ p k f k pp A σ √ υM ≤ c N − d ( p − A σ √ υM . Taking into account that N − d ( p − = c A p σ k Λ k pp e ∗ ( p ) √ υM we obtain α π ≤ c A p +1 σ k Λ k pp e ∗ ( p ) υM + 2 A p σ k Λ k pp e π ( p ) √ υM ≤ A p σ k Λ k pp e π ( p ) √ υM (cid:0) c / A σ √ υM (cid:1) ≤ A p σ k Λ k pp e π ( p ) √ υM , where in the last inequality we have used (3.6), and κ is sufficiently small. Therefore (4.8) and(3.7) imply H ∗ π − α π ≥ A p σ k Λ k pp M e π ( p ) (cid:0) − p υ/M (cid:1) ,H ∗ π + α π ≤ A p σ k Λ k pp M e π ( p ) (cid:0) p υ/M (cid:1) . Since, G ( x ) = x /p we can assert that J π ⊆ h A ( M σ ) /p k Λ k p [ e π ( p )] /p (cid:0) − p υ/M (cid:1) /p , A ( M σ ) /p k Λ k p [ e π ( p )] /p (cid:0) p υ/M (cid:1) /p i and, therefore, denoting ( y ) + = max[0 , y ], we get∆( µ, ν ) ≥ A ( M σ ) /p k Λ k p (cid:16) [ e ∗ ( p )] /p (cid:0) − p υ/M (cid:1) /p − [ e ∗ ( p )] /p (cid:0) p υ/M (cid:1) /p (cid:17) + . Assuming that | e µ ( p ) − e ν ( p ) | > e ∗ ( p ) p υ/M (4.9) . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density we can guarantee that ∆( µ, ν ) > H ∗ µ − α µ , H ∗ µ + α µ ] and [ H ∗ ν − α ν , H ν + α ν ]are disjoint. Additionally, by the elementary inequality p ( a ∨ b ) − /p | a − b | ≤ | a /p − b /p | ∀ a, b > , p ≥ a = e ∗ ( p ) (cid:0) − p υ/M (cid:1) and b = e ∗ ( p ) (cid:0) p υ/M (cid:1) we get∆( µ, ν ) ≥ c A ( M σ ) /p (cid:12)(cid:12) e µ ( p ) − e ν ( p ) (cid:12)(cid:12) [ e ∗ ( p )] /p − . With this lower bound on ∆( µ, ν ) applying either Corollary 3 or Corollary 4 we come to thefollowing lower bound on the minimax risk in terms of parameters A , σ and M of the family { f w } and properties of the probability measures µ and ν : R n (cid:2) N ~r,d (cid:0) ~β, ~L (cid:1) ∩ B q ( Q ) (cid:3) ≥ c A ( σ M ) /p (cid:12)(cid:12) e µ ( p ) − e ν ( p ) (cid:12)(cid:12) [ e ∗ ( p )] /p − . (4.10) Generic choice of parameters
In this part we present the choice of parameters A and ~σ asfunctions of M, r , n and satisfying conditions (3.6)–(3.8) and (4.3)–(4.4).For any t ∈ (1 , ∞ ] and ς < σ l = c /β l L − /β l l L /βl − / ( βlrl ) τ (1) [ e ς ] τ ( rl ) βlτ (1) M /rl − − / t ) τ ( rl ) βlτ (1) ; A = c e − L τ (1) [ e ς ] − /ωτ (1) M − / t +(1 − / t ) /ωτ (1) σ = c /β L − τ (1) [ e ς ] /βτ (1) M /ω − / ( β t ) τ (1) . Simple algebra shows that (4.4) holds if c c / ( βr ∗ ) − ≤ c because [ e ∗ ( r l )] /r l ≤ e for all l =1 , . . . , d , and q ≥ r ∗ . Moreover, it yields A σ = c c /β ς M − / t ; (4.11) A σ M = c c /β ς M − / t ; (4.12) nA σ M = c c /β M − / t ς n ; (4.13) A e [ σ M ] q = c c / ( βq )3 L − /qτ (1) [ e ς ] τ ( q ) τ (1) M /q − − / t ) τ ( q ) τ (1) . (4.14)In addition, A ( σ M ) /p = c e − L − /pτ (1) [ e ς ] τ ( p ) τ (1) M τ (1) (cid:2) p − t +(1 − t )( βp − ω ) (cid:3) . (4.15)Introduce X ς , t = h υ, ς − tt − i ∩ n x > x − / t ς n ≤ , [ e ς ] τ ( q ) τ (1) x /q − − / t ) τ ( q ) τ (1) ≤ o . First of all we assert that in view of (4.12)–(4.14) M ∈ X ς , t implies the verification of Assumption2 and (4.3) if one chooses c sufficiently small.Our goal now is to show that for any l = 1 , . . . , dσ l ≤ , ∀ M ∈ X ς , t . (4.16)Put T l = c /β l L − /β l l L /βl − / ( βlrl ) τ (1) and consider separately two cases. . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Let τ ( q ) ≥
0. In this case τ ( r l ) ≥ l = 1 , . . . , d since q ≥ r ∗ and we have σ l = T l h e ς M − / t i τ ( rl ) βlτ (1) M /rl − βlτ (1) ≤ h ς M − / t i τ ( rl ) βlτ (1) M /rl − βlτ (1) ≤ T l ≤ , if one chooses c sufficiently small. Here we have used also that r l ≥ e ≤ M ≥ M ∈ X ς , t .Let now τ ( q ) <
0. First we note that in this case necessarily e ς ≤ M − /q − − / t ) τ ( q ) τ ( q ) ⇔ e ς M − /t ≤ M − /q − τ ( q ) . On the other hand σ l = T l [ e ς ] τ ( q ) βlτ (1) [ e ς ] τ ( rl ) − τ ( q ) βlτ (1) M /rl − /q +(1 − /t )( τ ( rl ) − τ ( q )) βlτ (1) M /q − − / t ) τ ( q ) βlτ (1) = T l [ e ς ] τ ( q ) βlτ (1) M /q − − / t ) τ ( q ) βlτ (1) h e ς M − /t i τ ( rl ) − τ ( q ) βlτ (1) M /rl − /qβlτ (1) ≤ T l M /rl − /qβlτ (1) (cid:2) − /q − βτ ( q ) (cid:3) = T l M /rl − /qβlτ ( q ) ≤ T l ≤ , if one chooses c sufficiently small. Here we have used once again that q ≥ r l for any l = 1 , . . . , d , M > τ ( q ) <
0. Thus, (4.16) is proved, and all assumptions of Theorem 4 and Lemma 2are verified if M ∈ X ς , t .Note that if t = p N ∗ ( p ) + ∞ ¯ N ∗ ( p ) and ς = n − N ∗ ( p ) + rn − ¯ N ∗ ( p ) then J r,n ( µ, ν ) ⊂ X ς , t . To get the latter inclusion we have used also that (4.9) is equivalent to M ≥ d µ,ν and d µ,ν ≥ υ .Moreover, we deduce from (4.11) that assumptions (3.17) of Corollary 3 and (3.18) of Corollary4 with t = p are verified for sufficiently small c . The assertion of the proposition follows nowfrom (4.10) and (4.15).
5. Proofs of Theorem 4 and Corollaries 3–4
We break the proof into several steps.1 . Product form bounds for p ζ ( x )Our first step is to develop tight bounds on p ζ ( x ) for all x ∈ R dn possessing a product formstructure with respect to the coordinates of ζ . Recall that p ζ ( x ) = Q i =1 f ζ ( x i ) where f ζ isdefined in (3.5).Let Λ ( · ) := f ( · ) (cid:2) − A σ ̺ ζ (1) (cid:3) . As it was mentioned above, (3.8) implies 1 − A σ ̺ ζ (1) > ∩ Π m = ∅ for all m ∈ M , and Π j ∩ Π m = ∅ for all m, j ∈ M , m = j we have thefollowing representation of function f ζ ( · ): for any y ∈ R d f ζ ( y ) = Λ ( y )1 Π ( y ) + A X m ∈M ζ m Λ m ( y )1 Π m ( y )= Λ ( y ) Π0 ( y ) Y m ∈M (cid:2) Aζ m Λ m ( y ) (cid:3) Π m ( y ) . . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Therefore p ζ ( x ) = n Y i =1 f ζ ( x i ) = h n Y i =1 Λ ( x i ) Π0 ( x i ) ih Y m ∈M n Y i =1 (cid:2) Aζ m Λ m ( x i ) (cid:3) Π m ( x i ) i . If we put T ( x ) := n Y i =1 n(cid:2) f ( x i ) (cid:3) Π0 ( x i ) Y m ∈M (cid:2) A Λ m ( x i ) (cid:3) Π m ( x i ) o (5.1)then p ζ ( x ) = T ( x ) (cid:2) − A σ ̺ ζ (1) (cid:3) n ( x ) Y m ∈M (cid:2) ζ m (cid:3) n m ( x ) . (5.2)Our current goal is to derive bounds on 1 − A σ ̺ ζ (1). Recall that E π ρ ζ (1) = M e π (1), and denotefor brevity b := A σ , u := E π ̺ ζ (1) = M e π (1) , D := b/ (1 − bu ) , where D is defined in (3.15). For π ∈ { µ, ν } define the set W π := n w ∈ [0 , M : (cid:12)(cid:12) ̺ w (1) − M e π (1) (cid:12)(cid:12) ≤ √ υM o , (5.3)and suppose that ζ ∈ W π . Then (3.8) implies | ̺ ζ (1) − u | ≤ √ υM ≤ M, b | ̺ ζ (1) − u | ≤ / . (5.4)Moreover, by (3.8) and e π (1) ≤ bu = A σ M e π (1) ≤ /
2, and D ≤ b . We have1 − b̺ ζ (1) = 1 − bu − b (cid:2) ̺ ζ (1) − u (cid:3) = (1 − bu ) (cid:0) − D [ ̺ ζ (1) − u ] (cid:1) . (5.5)Using elementary inequality 1 − t ≤ e − t we obtain from (5.5)1 − b̺ ζ (1) ≤ (1 − bu ) e Du exp {− D̺ ζ (1) } . (5.6)On the other hand, taking into account that D | ̺ ζ (1) − u | ≤ { ζ ∈ W π } andapplying inequality 1 − t ≥ e − t − t , ∀ t ≥ − − D [ ̺ ζ (1) − u ] ≥ e Du exp {− D̺ ζ (1) } (cid:16) − D [ ̺ ζ (1) − u ] e D [ ̺ ζ (1) − u ] (cid:17) ≥ e Du exp {− D̺ ζ (1) } (cid:16) − b [ ̺ ζ (1) − u ] e b [ ̺ ζ (1) − u ] (cid:17) ≥ e Du exp {− D̺ ζ (1) } (cid:0) − eb υM (cid:1) ≥ e Du exp {− D̺ ζ (1) } (cid:0) − /n (cid:1) , where the second inequality follows from D ≤ b , the third inequality is a consequence of (5.4),and the last inequality follows from condition (3.6) with small enough κ satisfying 2 e κ υ ≤ − b̺ ζ (1) ≥ (1 − bu ) e Du exp {− D̺ ζ (1) } (cid:0) − /n (cid:1) . (5.7)Combining (5.6) and (5.7) with (5.2) and n ( x ) ≤ n we get e − p ∗ ζ ( x ) ≤ p ζ ( x ) ≤ p ∗ ζ ( x ), ∀ x ∈ R dn ,where p ∗ ζ ( x ) := T ( x )(1 − bu ) n ( x ) e Dun ( x ) Y m ∈M e − Dn ( x ) ζ m (cid:2) ζ m (cid:3) n m ( x ) . (5.8) . Goldenshluger and O. V. Lepski/Estimation of norms of a probability density Thus we showed that under conditions (3.6) and (3.8) (cid:8) ζ ∈ W π (cid:9) ⊆ (cid:8) e − p ∗ ζ ( x ) ≤ p ζ ( x ) ≤ p ∗ ζ ( x ) (cid:9) , ∀ x ∈ R dn . (5.9)Since ζ m , m ∈ M are independent random variables we get E π (cid:8) p ∗ ζ ( x ) (cid:9) = T ( x )(1 − bu ) n ( x ) e Dun ( x ) Y m ∈M E π n e − Dn ( x ) ζ m (cid:2) ζ m (cid:3) n m ( x ) o , It remains to note that since µ ∽ ν , values of D and u do not depend on π ∈ { µ, ν } ; thereforeΥ( x ) := E µ (cid:8) p ∗ ζ ( x ) (cid:9) E ν (cid:8) p ∗ ζ ( x ) (cid:9) = Y m ∈M γ m,µ ( x ) γ m,ν ( x ) , ∀ x ∈ R dn , where γ m,π is given in (3.15).2 . Derivation of lower bound (3.16) (a). According to (3.11), Z R d H (cid:0) f ζ ( x ) (cid:1) d x = S (cid:0) A σ ̺ ζ (1) (cid:1) + σ X m ∈M S (cid:0) Aζ m (cid:1) . For π ∈ { µ, ν } define W π,S := n w ∈ [0 , M : (cid:12)(cid:12)(cid:12) X m ∈M S ( Aw m ) − M E π ( A ) (cid:12)(cid:12)(cid:12) ≤ √ υM V π ( A ) o . It is worth noting that events { ζ ∈ W π } [see (5.3)] and { ζ ∈ W π,S } control deviations of sums ofindependent random variables from their expectations, where the thresholds on the right handside in the definitions of W π and W π,S are the upper bounds on the standard deviations of thesum inflated by a factor √ υ . This fact allows to assert that by Chebyshev’s inequality P π { ζ
6∈ W π } ≤ /υ, P π { ζ
Assume that ζ ∈ W_π ∩ W_{π,S}; then
\[
\Big|\int_{\mathbb R^d}H\big(f_\zeta(x)\big)\mathrm dx - H^*_\pi\Big| \le \Big|S\big(A_\sigma\varrho_\zeta(1)\big)-S\big(A_\sigma Me_\pi(1)\big)\Big| + \sigma^d\Big|\sum_{m\in\mathcal M}S(A\zeta_m)-ME_\pi(A)\Big|,
\]
where H*_π is defined in (3.12). We have A_σ|ρ_ζ(1) - Me_π(1)| ≤ A_σ√(υM) in view of ζ ∈ W_π and (3.6); therefore
\[
\Big|S\big(A_\sigma\varrho_\zeta(1)\big)-S\big(A_\sigma Me_\pi(1)\big)\Big| \le \eta_S\big(A_\sigma Me_\pi(1);\,A_\sigma\sqrt{\upsilon M}\big).
\]
Thus, if ζ ∈ W_π ∩ W_{π,S} is realized, then
\[
\Big|\int_{\mathbb R^d}H\big(f_\zeta(x)\big)\mathrm dx - H^*_\pi\Big| \le \eta_S\big(A_\sigma Me_\pi(1);\,A_\sigma\sqrt{\upsilon M}\big) + \sigma^d\sqrt{\upsilon M}\,V_\pi(A) = \alpha_\pi,
\]
where α_π is defined in (3.13). Therefore we have shown that
\[
\{\zeta\in\mathcal W_\pi\}\cap\{\zeta\in\mathcal W_{\pi,S}\} \subseteq \{\Psi(f_\zeta)\in J_\pi\}, \tag{5.10}
\]
where J_π is defined in (3.14).

(b). For π ∈ {µ, ν} define C_π := {ζ ∈ W_π} ∩ {ζ ∈ W_{π,S}} ∩ {f_ζ ∈ 𝔽}. For the sake of brevity, in the subsequent proof we write ∆ := ∆(µ, ν). For an arbitrary estimator Ψ̃ of Ψ(f) we have
\[
2\sup_{f\in\mathbb F}\mathbb P_f\big\{|\widetilde\Psi-\Psi(f)|\ge\Delta\big\} \ge \mathbb E_\mu\Big[\mathbf 1\{\mathcal C_\mu\}\,\mathbb P_{f_\zeta}\big\{|\widetilde\Psi-\Psi(f_\zeta)|\ge\Delta\big\}\Big] + \mathbb E_\nu\Big[\mathbf 1\{\mathcal C_\nu\}\,\mathbb P_{f_\zeta}\big\{|\widetilde\Psi-\Psi(f_\zeta)|\ge\Delta\big\}\Big]. \tag{5.11}
\]
Let a_π := inf_{|α|≤α_π} G(H̄_π + α) and b_π := sup_{|α|≤α_π} G(H̄_π + α), so that J_π = [a_π, b_π]. Therefore, letting I_π(∆) := [a_π - ∆, b_π + ∆], we have from (5.10)
\[
\mathcal C_\pi\cap\big\{|\widetilde\Psi-\Psi(f_\zeta)|\ge\Delta\big\} \supseteq \mathcal C_\pi\cap\big\{\widetilde\Psi\notin I_\pi(\Delta)\big\}.
\]
This implies
\[
\begin{aligned}
\mathcal J_\pi &:= \mathbb E_\pi\Big[\mathbf 1\{\mathcal C_\pi\}\,\mathbb P_{f_\zeta}\big\{|\widetilde\Psi-\Psi(f_\zeta)|\ge\Delta\big\}\Big] \ge \mathbb E_\pi\Big[\mathbf 1\{\mathcal C_\pi\}\,\mathbb P_{f_\zeta}\big\{\widetilde\Psi\notin I_\pi(\Delta)\big\}\Big]
\\
&= \mathbb E_\pi\Big[\mathbf 1\{\mathcal C_\pi\}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\pi(\Delta)\big\}\,p_\zeta(x)\,\mathrm dx\Big]
\ge e^{-2}\,\mathbb E_\pi\Big[\mathbf 1\{\mathcal C_\pi\}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\pi(\Delta)\big\}\,p^*_\zeta(x)\,\mathrm dx\Big]
\\
&\ge e^{-2}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\pi(\Delta)\big\}\,\mathbb E_\pi\big[p^*_\zeta(x)\big]\mathrm dx - e^{-2}\,\mathbb E_\pi\big[\mathbf 1\{\bar{\mathcal C}_\pi\}\,p^*_\zeta\big],
\end{aligned}
\]
where C̄_π is the event complementary to C_π, and p*_ζ := ∫_{(R^d)^n} p*_ζ(x)dx. In the second inequality we have used that p_ζ(x) ≥ e^{-2}p*_ζ(x) for all x ∈ (R^d)^n on the event {ζ ∈ W_π}. Note that by Chebyshev's inequality and in view of Assumption 1,
\[
\mathbb P_\pi\{\bar{\mathcal C}_\pi\} \le \mathbb P_\pi\{\zeta\notin\mathcal W_\pi\} + \mathbb P_\pi\{\zeta\notin\mathcal W_{\pi,S}\} + \mathbb P_\pi\{f_\zeta\notin\mathbb F\} \le 2\upsilon^{-1}+\varepsilon.
\]
Then by the Cauchy–Schwarz inequality
\[
\mathbb E_\pi\big[\mathbf 1\{\bar{\mathcal C}_\pi\}\,p^*_\zeta\big] \le \sqrt{2\upsilon^{-1}+\varepsilon}\ \max_{\pi\in\{\mu,\nu\}}\big\{\mathbb E_\pi(p^*_\zeta)^2\big\}^{1/2} =: R, \tag{5.12}
\]
which leads to
\[
\mathcal J_\pi \ge e^{-2}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\pi(\Delta)\big\}\,\mathbb E_\pi\big[p^*_\zeta(x)\big]\mathrm dx - e^{-2}R. \tag{5.13}
\]
Furthermore, we note that
\[
\mathcal J_\mu + e^{-2}R \ge e^{-2}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\mu(\Delta)\big\}\,\Upsilon(x)\,\mathbb E_\nu\big[p^*_\zeta(x)\big]\mathrm dx \ge (2e^2)^{-1}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\widetilde\Psi(x)\notin I_\mu(\Delta)\big\}\,\mathbf 1\big\{\Upsilon(x)\ge\tfrac12\big\}\,\mathbb E_\nu\big[p^*_\zeta(x)\big]\mathrm dx, \tag{5.14}
\]
and that for all x ∈ (R^d)^n
\[
\mathbf 1\big\{\widetilde\Psi(x)\notin I_\mu(\Delta)\big\} + \mathbf 1\big\{\widetilde\Psi(x)\notin I_\nu(\Delta)\big\} \ge 1. \tag{5.15}
\]
The last inequality is an immediate consequence of the definition of I_π(∆) and the fact that ∆ = ∆(µ, ν) > 0.
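Before combining these bounds, we record the standard reduction from the quadratic risk to the testing-type probabilities appearing in (5.11); it is this Chebyshev–Markov step that is invoked to pass to (5.16) below. In the present notation, for any estimator Ψ̃,
\[
\sup_{f\in\mathbb F}\mathbb E_f\big[\widetilde\Psi-\Psi(f)\big]^2 \;\ge\; [\Delta(\mu,\nu)]^2\,\sup_{f\in\mathbb F}\mathbb P_f\big\{|\widetilde\Psi-\Psi(f)|\ge\Delta(\mu,\nu)\big\},
\]
and taking the infimum over Ψ̃ on both sides relates the right-hand side to the minimax risk R_n²[𝔽].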
Therefore, combining (5.15), (5.14), (5.13) and (5.11), we obtain
\[
\begin{aligned}
\mathcal J_\mu+\mathcal J_\nu
&\ge (4e^2)^{-1}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\Upsilon(x)\ge\tfrac12\big\}\,\mathbb E_\nu\big[p^*_\zeta(x)\big]\mathrm dx - 2e^{-2}R
\\
&\ge (4e^2)^{-1}\int_{(\mathbb R^d)^n}\mathbf 1\big\{\Upsilon(x)\ge\tfrac12\big\}\,\mathbb E_\nu\Big[\mathbf 1\{\zeta\in\mathcal W_\nu\}\,p_\zeta(x)\Big]\mathrm dx - 2e^{-2}R
\\
&\ge (4e^2)^{-1}\,\mathbb E_\nu\Big[\mathbf 1\{\zeta\in\mathcal W_\nu\}\,\mathbb P_{f_\zeta}\big\{\Upsilon(X^{(n)})\ge\tfrac12\big\}\Big] - 2e^{-2}R
\\
&\ge (4e^2)^{-1}\,\mathbb E_\nu\Big[\mathbb P_{f_\zeta}\big\{\Upsilon(X^{(n)})\ge\tfrac12\big\}\Big] - (4e^2\upsilon)^{-1} - 2e^{-2}R,
\end{aligned}
\]
where in the second line we have used that p*_ζ(x) ≥ p_ζ(x) for all x ∈ (R^d)^n on the event {ζ ∈ W_ν}, see (5.9). This, together with (5.11) and Chebyshev's inequality, implies that
\[
[\Delta(\mu,\nu)]^{-2}\,\mathcal R_n^2\big[\mathbb F\big] \ge (36e^2)^{-1}\,\mathbb E_\nu\Big[\mathbb P_{f_\zeta}\big\{\Upsilon(X^{(n)})\ge\tfrac12\big\}\Big] - (36e^2\upsilon)^{-1} - (9e^2)^{-1}R. \tag{5.16}
\]

3°. Bounding the remainder term in (5.16). In order to complete the proof of the theorem, in view of (5.12), it remains to show that
\[
\mathbb E_\pi\big(p^*_\zeta\big)^2 \le 2. \tag{5.17}
\]
Indeed, if (5.17) is established, then the theorem statement follows from (5.16) and (5.12).

(a). First we note that, in view of (5.1) and by the definition of n_0(x) and n_m(x),
\[
T(x)\,(1-bu)^{n_0(x)}\,e^{Du\,n_0(x)} = \prod_{i=1}^n\Big\{\big[e^{Du}(1-bu)f(x_i)\big]^{\Pi_0(x_i)}\prod_{m\in\mathcal M}\big[A\Lambda_m(x_i)\big]^{\Pi_m(x_i)}\Big\},
\]
and
\[
\prod_{m\in\mathcal M}e^{-Dn_0(x)\zeta_m}\,\zeta_m^{\,n_m(x)} = \prod_{i=1}^n\Big\{\big[e^{-D\varrho_\zeta(1)}\big]^{\Pi_0(x_i)}\prod_{m\in\mathcal M}\big[\zeta_m\big]^{\Pi_m(x_i)}\Big\}.
\]
Therefore
\[
p^*_\zeta(x) = \prod_{i=1}^n\Big\{\big[f(x_i)(1-bu)\,e^{-D[\varrho_\zeta(1)-u]}\big]^{\Pi_0(x_i)}\prod_{m\in\mathcal M}\big(A\zeta_m\Lambda_m(x_i)\big)^{\Pi_m(x_i)}\Big\}
= \prod_{i=1}^n\Big[f(x_i)(1-bu)\,e^{-D[\varrho_\zeta(1)-u]}\,\Pi_0(x_i) + \sum_{m\in\mathcal M}A\zeta_m\Lambda_m(x_i)\Big],
\]
and, taking into account that ∫Λ_m(x)dx = σ^d and b = Aσ^d, we obtain
\[
p^*_\zeta = \Big[(1-bu)\,e^{-D[\varrho_\zeta(1)-u]} + b\varrho_\zeta(1)\Big]^n = \Big[(1-bu)\exp\Big\{-\tfrac{b[\varrho_\zeta(1)-u]}{1-bu}\Big\} + b\varrho_\zeta(1)\Big]^n.
\]
Denote χ := ρ_ζ(1) - E_π{ρ_ζ(1)}. Since E_π{ρ_ζ(1)} = u, we have
\[
p^*_\zeta = \Big[(1-bu)\,e^{-b\chi/(1-bu)} + bu + b\chi\Big]^n. \tag{5.18}
\]
Note that e_π(1) ≤ 1; hence 0 < bu ≤ 1/2, and, since ρ_ζ(1) is a positive random variable,
\[
\frac{b\big(\varrho_\zeta(1)-u\big)}{1-bu} \ge -\frac{bu}{1-bu} \ge -1.
\]
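The next step rests on the elementary inequality e^{-t} ≤ 1 - t + t², valid for all t ≥ -1. For completeness, here is a short verification: setting
\[
g(t) := (1-t+t^2)\,e^{t}-1,\qquad\text{so that}\qquad g'(t) = t(1+t)\,e^{t},
\]
we see that g decreases on [-1, 0] and increases on [0, ∞); since g(0) = 0, it follows that g(t) ≥ 0, i.e. e^{-t} ≤ 1 - t + t², for every t ≥ -1.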
Since e^{-t} ≤ 1 - t + t² for all t ≥ -1, we have
\[
(1-bu)\,e^{-b\chi/(1-bu)} \le 1-bu-b\chi+2b^2\chi^2,
\]
where we also used 1 - bu ≥ 1/2; together with (5.18) this leads to
\[
p^*_\zeta \le \big[1+2b^2\chi^2\big]^{n}. \tag{5.19}
\]
Now we bound the second moment of the random variable on the right-hand side of (5.19).

(b). We have
\[
\mathbb E_\pi\big(p^*_\zeta\big)^2 \le \mathbb E_\pi\big\{e^{4nb^2\chi^2}\big\} \le 1+\int_1^\infty\mathbb P_\pi\Big(|\chi| \ge (4nb^2)^{-1/2}\sqrt{\ln y}\,\Big)\mathrm dy.
\]
Since χ is the sum of M centered i.i.d. random variables taking values in [0, 1], Hoeffding's inequality yields
\[
\mathbb E_\pi\big(p^*_\zeta\big)^2 \le 1+2\int_1^\infty e^{-(2Mnb^2)^{-1}\ln y}\,\mathrm dy.
\]
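In more detail, Hoeffding's inequality for a sum χ of M independent centered random variables with values in intervals of length 1 gives
\[
\mathbb P_\pi\big(|\chi|\ge z\big) \le 2\exp\{-2z^2/M\},\qquad z>0,
\]
and substituting z = (4nb²)^{-1/2}(ln y)^{1/2} turns the exponent into -(2Mnb²)^{-1} ln y, which is exactly the integrand above.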
Then (3.6) implies 2Mnb² = 2nA²σ^{2d}M ≤ κ², and
\[
\mathbb E_\pi\big(p^*_\zeta\big)^2 \le 1+2\int_1^\infty y^{-1/\kappa^2}\,\mathrm dy = 1+\frac{2\kappa^2}{1-\kappa^2} \le 2,
\]
provided that κ is small enough. Thus, (5.17) is proved.

The following well-known inequality for the tail of a binomial random variable [see, e.g., Boucheron et al. (2013)] will be exploited in the proofs below.
Lemma 3.
Let ξ be a binomial random variable with parameters n and p. Then for pn ≤ z ≤ n one has
\[
\mathbb P(\xi\ge z) \le \Big(\frac{pn}{z}\Big)^{z}e^{\,z-pn}.
\]

Proof of Corollary 3. First of all, let us remark that the first condition in (3.17) implies
\[
Dn \le 2\kappa r. \tag{5.20}
\]
Also, without loss of generality, we will assume that e_µ(r) ≤ e_ν(r), since all our constructions and definitions are symmetric w.r.t. µ and ν. We have for any m ∈ M
\[
\gamma_{m,\pi}(x) = \int y^{\,n_m(x)}e^{-Dn_0(x)y}\,\pi(\mathrm dy) = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!}\big[Dn_0(x)\big]^k\,e_\pi\big(n_m(x)+k\big).
\]
Let
\[
Y_m(x) := \sum_{k=0}^{r}\frac{(-1)^k}{k!}\big[Dn_0(x)\big]^k\Big[e_\mu\big(n_m(x)+k\big)-e_\nu\big(n_m(x)+k\big)\Big].
\]
Taking into account that µ ∽_r ν, that e_π(k₁) ≥ e_π(k₂) for k₁ ≤ k₂, and (5.20), we obtain
\[
\big|\gamma_{m,\mu}(x)-\gamma_{m,\nu}(x)\big| \le \big|Y_m(x)\big| + \big[e_\mu(r)+e_\nu(r)\big]\sum_{k=r+1}^{\infty}\frac{(Dn)^k}{k!} \le \big|Y_m(x)\big| + 2e_\nu(r)\sum_{k=r+1}^{\infty}\Big[\frac{2e\kappa r}{k}\Big]^{k} \le \big|Y_m(x)\big| + 4(2e\kappa)^{r+1}e_\nu(r) \tag{5.21}
\]
for small enough κ. On the other hand, by definition,
\[
\gamma_{m,\pi}(x) \ge e^{-Dn}\,e_\pi\big(n_m(x)\big). \tag{5.22}
\]
Now define the random event A := ∪_{m∈M} A_m, A_m := {n_m(X^{(n)}) ≥ r}, and let Ā be the event complementary to A. Since µ ∽_r ν, on the event Ā we have Y_m(X^{(n)}) = 0 for all m ∈ M, and e_ν(n_m(X^{(n)})) ≥ e_ν(r). Therefore it follows from (5.21) and (5.22) that
\[
\Upsilon\big(X^{(n)}\big) \ge \prod_{m\in\mathcal M}\Big[1-4e^{2\kappa r}(2e\kappa)^{r+1}\Big] \ge \prod_{m\in\mathcal M}\Big[1-e^{c_0 r}\Big], \tag{5.23}
\]
where c₀ = 2κ + 1 + ln(2κ), and the second inequality holds whenever 8eκ ≤ 1. In view of the second condition in (3.17), M ≤ e^r, and (5.23) implies that if Ā is realized, then
\[
\Upsilon\big(X^{(n)}\big) \ge \big(1-e^{c_0 r}\big)^{M} \ge \inf_{z\ge e}\big(1-z^{c_0}\big)^{z} \ge \tfrac12,
\]
where the last inequality holds because c₀ = 2κ + 1 + ln(2κ) can be made negative and arbitrarily large in absolute value by the choice of sufficiently small κ. Thus
\[
\bar A \subseteq \big\{\Upsilon\big(X^{(n)}\big) > \tfrac12\big\}, \tag{5.24}
\]
and we derive from (5.24)
\[
\mathbb P_{f_\zeta}\big\{\Upsilon\big(X^{(n)}\big) > \tfrac12\big\} \ge \mathbb P_{f_\zeta}\big(\bar A\big) \ge 1-\sum_{m\in\mathcal M}\mathbb P_{f_\zeta}\big(A_m\big). \tag{5.25}
\]
Our current goal is to bound from below the expression on the right-hand side of the last formula. Note that for any m ∈ M the random variable n̂_m = n_m(X^{(n)}) = Σ_{i=1}^n Π_m(X_i) has binomial distribution with parameters n and
\[
p_m := \mathbb P_{f_\zeta}\big(X_i\in\Pi_m\big) = \int_{\Pi_m}f_\zeta(y)\,\mathrm dy = A\sigma^d\zeta_m \le A\sigma^d.
\]
By the first condition in (3.17), p_m n ≤ Aσ^d n ≤ κr ≤ r, so that we can apply Lemma 3:
\[
\mathbb P_{f_\zeta}(A_m) = \mathbb P_{f_\zeta}\{\hat n_m\ge r\} \le \big(eA\sigma^d n\,r^{-1}\big)^{r} \le (e\kappa)^{r}.
\]
Therefore, taking into account that M ≤ e^r, we obtain
\[
\mathbb P_{f_\zeta}\big(\bar A\big) \ge 1-M(e\kappa)^{r} \ge 1-(e^2\kappa)^{r} \ge \tfrac12, \tag{5.26}
\]
provided that κ is small enough. Then it follows from (5.25) and (5.26) that
\[
\mathbb E_\nu\Big[\mathbb P_{f_\zeta}\big\{\Upsilon\big(X^{(n)}\big)\ge\tfrac12\big\}\Big] \ge \tfrac12.
\]
The statement of the corollary follows now from Theorem 4.
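The bound of Lemma 3 is of Chernoff type and is easy to check numerically. The following sketch is an illustration only, not part of the argument; the values of n and p are hypothetical, and the check uses scipy, assumed available:

```python
import math
from scipy.stats import binom

# Sanity check of Lemma 3: if xi ~ Binomial(n, p) and pn <= z <= n,
# then P(xi >= z) <= (pn/z)^z * exp(z - pn).
n, p = 200, 0.02                  # hypothetical values; here pn = 4
for z in (5, 10, 20, 50):         # all satisfy pn <= z <= n
    tail = binom.sf(z - 1, n, p)  # survival function: P(xi >= z)
    bound = (p * n / z) ** z * math.exp(z - p * n)
    print(f"z = {z:2d}:  P(xi >= z) = {tail:.3e}  <=  bound = {bound:.3e}")
```

As in the proof above, the bound decays geometrically once z is a constant multiple of pn, which is precisely how it is applied with z = r.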
Let r = t, and let µ, ν ∈ P[0,1] satisfy µ ∽_{t,t} ν. Let us remark that (3.18) implies
\[
Dn \le 2A\sigma^d n \le \kappa M^{-1/t}. \tag{5.27}
\]
Also, without loss of generality, we will assume that e_µ(t) ≤ e_ν(t).

1°. Define the random events
\[
\mathcal D := \bigcap_{m\in\mathcal M}\big\{\hat n_m \le t-1\big\},\qquad
\mathcal E := \bigcap_{k=0}^{t-1}\Big\{\eta_k \le M^{1-k/t}\Big\},
\]
where n̂_m = n_m(X^{(n)}), n̂₀ = n₀(X^{(n)}), and we have put η_k := Σ_{m∈M} 1{n̂_m = k}. If D is realized, then for any m ∈ M
\[
\gamma_{m,\pi}\big(X^{(n)}\big) = \sum_{k=0}^{t-1}\mathbf 1\big(\hat n_m=k\big)\Big[\int y^{k}e^{-\hat n_0 Dy}\,\pi(\mathrm dy)\Big] = \prod_{k=0}^{t-1}\big[T_k(\pi)\big]^{\mathbf 1(\hat n_m=k)},
\]
where, recall, γ_{m,π}(·) is defined in (3.15), and we have denoted
\[
T_k(\pi) := \int y^{k}e^{-\hat n_0 Dy}\,\pi(\mathrm dy).
\]
Hence, if the event D is realized,
\[
\Upsilon\big(X^{(n)}\big) = \prod_{m\in\mathcal M}\frac{\gamma_{m,\mu}(X^{(n)})}{\gamma_{m,\nu}(X^{(n)})} = \prod_{k=0}^{t-1}\Big[\frac{T_k(\mu)}{T_k(\nu)}\Big]^{\eta_k}. \tag{5.28}
\]
Setting for k = 0, 1, …, t - 1
\[
u_k(\pi) := \sum_{j=0}^{t-k-1}\frac{(-1)^j}{j!}\,(D\hat n_0)^j\,e_\pi(j+k),\qquad
U_k(\pi) := \sum_{j=t-k}^{\infty}\frac{(-1)^j}{j!}\,(D\hat n_0)^j\,e_\pi(j+k),
\]
we get, in view of the Taylor expansion, T_k(π) = u_k(π) + U_k(π). Moreover, since µ ∽_{t,t} ν,
\[
u_k(\mu) = u_k(\nu),\qquad \forall k=0,1,\ldots,t-1. \tag{5.29}
\]
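For completeness, the expansion behind the decomposition T_k(π) = u_k(π) + U_k(π) is just the exponential series (a routine step, in the notation above):
\[
T_k(\pi) = \int y^{k}\sum_{j=0}^{\infty}\frac{(-\hat n_0 Dy)^j}{j!}\,\pi(\mathrm dy) = \sum_{j=0}^{\infty}\frac{(-1)^j}{j!}\,(D\hat n_0)^j\,e_\pi(j+k),
\]
split at j = t - k. In particular, u_k(π) involves only the moments e_π(k), …, e_π(t-1), and these coincide for µ and ν under the relation µ ∽_{t,t} ν, which yields (5.29).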
Next, in view of (5.27), Dn ≤ κ < 1, and therefore
\[
\big|U_k(\pi)\big| \le 2(eDn)^{t-k}e_\pi(t),\qquad k=0,1,\ldots,t-1. \tag{5.30}
\]
Also, by (5.27) and by definition, for every k = 0, 1, …, t - 1,
\[
T_k(\pi) \ge e^{-Dn}e_\pi(k) \ge e^{-\kappa}e_\pi(t). \tag{5.31}
\]
We deduce from (5.29), (5.30) and (5.31) that on the event D one has
\[
\frac{T_k(\mu)}{T_k(\nu)} \ge 1-4e^{\kappa}(eDn)^{t-k} \ge 1-4e^{\kappa+1}\kappa M^{(k-t)/t}. \tag{5.32}
\]
To get the last inequality we used (5.27), Dn ≤ κ < 1, and took into account that κ is small enough so that 2eκ < 1.
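The phrase "κ sufficiently small" in the next display can be quantified by the following elementary bound, which we shall apply with c = 4e^{κ+1}κ: by Bernoulli's inequality,
\[
\inf_{z\ge 1}\big(1-cz^{-1}\big)^{z} \ge 1-c,\qquad 0\le c\le 1,
\]
since (1 - c/z)^z ≥ 1 - z·(c/z) = 1 - c. Consequently the product below is at least (1 - 4e^{κ+1}κ)^t, which exceeds 1/2 once κ is small enough, t being fixed.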
Assuming additionally that E is realized, we get from (5.28) and (5.32)
\[
\Upsilon\big(X^{(n)}\big) \ge \prod_{k=0}^{t-1}\Big[1-4e^{\kappa+1}\kappa M^{(k-t)/t}\Big]^{M^{(t-k)/t}} \ge \Big[\inf_{z\ge1}\big(1-4e^{\kappa+1}\kappa z^{-1}\big)^{z}\Big]^{t} > \tfrac12,
\]
provided that κ is sufficiently small; here it is essential that t is independent of n. Thus we have proved that
\[
\mathcal D\cap\mathcal E \subseteq \big\{\Upsilon\big(X^{(n)}\big)\ge\tfrac12\big\},
\]
and to complete the proof it suffices to show that
\[
\mathbb E_\nu\big[\mathbb P_{f_\zeta}\{\mathcal D\cap\mathcal E\}\big] \ge \tfrac12. \tag{5.33}
\]

2°. Let Ē and D̄ be the events complementary to E and D, respectively.

(a). First we show that
\[
\mathbb E_\nu\big[\mathbb P_{f_\zeta}\{\bar{\mathcal E}\}\big] \le e\kappa. \tag{5.34}
\]
By Markov's inequality,
\[
\mathbb P_{f_\zeta}\big\{\bar{\mathcal E}\big\} \le \sum_{k=1}^{t-1}\mathbb P_{f_\zeta}\Big\{\eta_k > M^{1-k/t}\Big\} \le \sum_{k=1}^{t-1}M^{k/t-1}\,\mathbb E_{f_\zeta}(\eta_k),
\]
where in the first inequality we have used that η₀ ≤ M by definition. Noting that
\[
\mathbb E_{f_\zeta}(\eta_k) = \sum_{m\in\mathcal M}\mathbb P_{f_\zeta}\{\hat n_m=k\} = \sum_{m\in\mathcal M}\mathbb P_{f_\zeta}\Big\{\sum_{i=1}^{n}\Pi_m(X_i)=k\Big\},
\]
we obtain
\[
\mathbb E_{f_\zeta}(\eta_k) = \sum_{m\in\mathcal M}\binom{n}{k}\Big[\int_{\Pi_m}f_\zeta(x)\mathrm dx\Big]^{k}\Big[1-\int_{\Pi_m}f_\zeta(x)\mathrm dx\Big]^{n-k} \le \frac{M(nA\sigma^d)^k}{k!}.
\]
In the last inequality we took into account that ζ_m ≤ 1 for all m ∈ M. Using condition (3.18) we obtain E_{f_ζ}(η_k) ≤ κ^k M^{(t-k)/t}(k!)^{-1}, and (5.34) follows since Σ_{k≥1} κ^k/k! = e^κ - 1 ≤ eκ.

(b). Now let us prove that
\[
\mathbb E_\nu\Big[\mathbb P_{f_\zeta}\big\{\bar{\mathcal D}\big\}\Big] \le (e\kappa)^{t}. \tag{5.35}
\]
Indeed,
\[
\mathbb P_{f_\zeta}\{\bar{\mathcal D}\} \le \sum_{m\in\mathcal M}\mathbb P_{f_\zeta}\{\hat n_m\ge t\} = \sum_{m\in\mathcal M}\mathbb P_{f_\zeta}\Big\{\sum_{i=1}^{n}\Pi_m(X_i)\ge t\Big\}.
\]
Under f_ζ the random variable n̂_m is binomial with parameters n and
\[
p_n := \mathbb P_{f_\zeta}\big(X_i\in\Pi_m\big) = \int_{\Pi_m}f_\zeta(y)\mathrm dy = A\sigma^d\zeta_m \le A\sigma^d;
\]
moreover np_n ≤ κM^{-1/t} ≤ t in view of condition (3.18). Applying Lemma 3, given in the proof of Corollary 3, we obtain for any m ∈ M
\[
\mathbb P_{f_\zeta}\big(\hat n_m\ge t\big) \le \big(enA\sigma^d t^{-1}\big)^{t} \le (e\kappa)^{t}M^{-1}.
\]
This implies (5.35), and it follows from (5.34) and (5.35) that
\[
\mathbb E_\nu\big[\mathbb P_{f_\zeta}\{\mathcal D\cap\mathcal E\}\big] \ge 1-\big[e\kappa+(e\kappa)^{t}\big] > \tfrac12,
\]
provided that κ is sufficiently small. The corollary statement follows now from (5.33) and Theorem 4.
6. Appendix
Fix s ∈ ℕ* and let M_s denote the Hilbert matrix, that is, M_s = {(i+j+1)^{-1}}_{i,j=0,…,s}. It is well known that M_s is invertible for all s ∈ ℕ*. Let a = (a₀, …, a_s) ∈ ℝ^{s+1} be such that
\[
\int_0^1 P_{s,a}(x)\,x^{k}\,\mathrm dx = \delta_{k,t},\qquad \forall k=0,\ldots,s;
\]
here P_{s,a}(x) = Σ_{j=0}^s a_j x^j, and δ_{k,t} is the Kronecker symbol. If e_t = (0, …, 0, 1, 0, …, 0) ∈ ℝ^{s+1} is the t-th unit vector of the canonical basis of ℝ^{s+1}, then
\[
\int_0^1 P_{s,a}(x)\,x^{k}\,\mathrm dx = \delta_{k,t},\ \forall k=0,\ldots,s \;\Longleftrightarrow\; M_s a = e_t \;\Longleftrightarrow\; a = M_s^{-1}e_t.
\]
Put for brevity P = P_{s,a}, and note that
\[
0 < \|P\|^2_{L_1(0,1)} \le \|P\|^2_{L_2(0,1)} = a^{\rm T}M_s a = a^{\rm T}e_t = \big[M_s^{-1}\big]_{t,t}. \tag{6.1}
\]
Define P₊(x) := max{P(x), 0} and P₋(x) := max{-P(x), 0}, and remark that, since P(x) = P₊(x) - P₋(x) for all x ∈ [0,1] and ∫₀¹P(x)dx = δ_{0,t} = 0 for t ≥ 1,
\[
0 = \int_0^1 P(x)\,\mathrm dx = \int_0^1 P_+(x)\,\mathrm dx - \int_0^1 P_-(x)\,\mathrm dx.
\]
Moreover, |P(x)| = P₊(x) + P₋(x) for any x ∈ [0,1] and, therefore,
\[
\int_0^1 P_+(x)\,\mathrm dx = \int_0^1 P_-(x)\,\mathrm dx = \tfrac12\|P\|_{L_1(0,1)}.
\]
Setting
\[
\mu'(\mathrm dx) = \frac{2P_+(x)\mathbf 1_{[0,1]}(x)}{\|P\|_{L_1(0,1)}}\,\mathrm dx,\qquad
\nu'(\mathrm dx) = \frac{2P_-(x)\mathbf 1_{[0,1]}(x)}{\|P\|_{L_1(0,1)}}\,\mathrm dx,
\]
we can assert that µ′, ν′ ∈ P[0,1] and that for k = 1, …, s
\[
e_{\mu'}(k) - e_{\nu'}(k) = 2\|P\|^{-1}_{L_1(0,1)}\int_0^1 x^{k}P(x)\,\mathrm dx = 2\|P\|^{-1}_{L_1(0,1)}\,\delta_{k,t}. \tag{6.2}
\]
Finally, let
\[
\mu = \tfrac12\mu'(\mathrm dx) + \tfrac12\delta_1(\mathrm dx),\qquad \nu = \tfrac12\nu'(\mathrm dx) + \tfrac12\delta_1(\mathrm dx),
\]
where δ_z denotes the Dirac mass at the point z. It is obvious that (6.2) remains fulfilled for µ and ν with the constant ‖P‖^{-1}_{L_1(0,1)} when k = t. Additionally, all moments w.r.t. µ and ν are greater than 1/2, and, in view of (6.1), ‖P‖^{-1}_{L_1(0,1)} ≥ ([M_s^{-1}]_{t,t})^{-1/2} =: C^{-1}_{s,t}. This completes the proof.
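The construction above is completely explicit and can be checked numerically; a minimal sketch in Python (the values of s and t are hypothetical):

```python
import numpy as np

# Illustration (not part of the proof): build the polynomial P_{s,a} whose
# moments against x^k on (0,1) equal delta_{k,t}, via the Hilbert matrix.
s, t = 4, 2
M = np.array([[1.0 / (i + j + 1) for j in range(s + 1)] for i in range(s + 1)])
a = np.linalg.solve(M, np.eye(s + 1)[t])   # a = M_s^{-1} e_t
# moments of P_{s,a}: int_0^1 P(x) x^k dx = (M_s a)_k = delta_{k,t}
print(np.round(M @ a, 12))                 # prints the unit vector e_t
# identity (6.1): ||P||^2_{L_2(0,1)} = a^T M_s a = [M_s^{-1}]_{t,t}
print(a @ M @ a, np.linalg.inv(M)[t, t])
```

Note that the Hilbert matrix is notoriously ill conditioned, so for larger s exact rational arithmetic (e.g. the fractions module) would be preferable to floating point.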
Proof of Proposition 3. The first step in the proof is to find a function K: ℝ → ℝ satisfying
\[
\int_0^1 K(x)\,x^{k}\,\mathrm dx = 0,\ k=0,\ldots,s,\qquad \int_0^1 K(x)S(x)\,\mathrm dx = \varpi_s(S). \tag{6.3}
\]
We will seek K in the following form:
\[
K(x) = P_{s,a}(x) - bS(x),
\]
where a = (a₀, …, a_s) ∈ ℝ^{s+1} and b ∈ ℝ are the parameters to be chosen. Let
\[
c_k = \int_0^1 S(x)\,x^{k}\,\mathrm dx,\qquad k=0,\ldots,s,
\]
and let c = (c₀, …, c_s) ∈ ℝ^{s+1}. Note that the condition ∫K(x)x^k dx = 0, k = 0, …, s, is equivalent to M_s a = b c, where, as before, M_s denotes the Hilbert matrix. Since M_s is invertible we get
\[
a = bM_s^{-1}c. \tag{6.4}
\]
With this choice the second condition in (6.3) becomes
\[
a^{\rm T}c - b\|S\|^2_{L_2(0,1)} = b\Big[c^{\rm T}M_s^{-1}c - \|S\|^2_{L_2(0,1)}\Big] = \varpi_s(S).
\]
It remains to note that
\[
\varkappa_s(S) := c^{\rm T}M_s^{-1}c - \|S\|^2_{L_2(0,1)} = -\inf_{u\in\mathbb R^{s+1}}\int_0^1\big|S(x)-P_{s,u}(x)\big|^2\,\mathrm dx,
\]
so that κ_s(S) ≤ 0; moreover, κ_s(S) = 0 would imply that S coincides almost everywhere on [0,1] with a polynomial of degree at most s, which contradicts the assumption ϖ_s(S) > 0. Hence κ_s(S) < 0. Choosing b = ϖ_s(S)/κ_s(S), we conclude that K(x) = P_{s,a}(x) - bS(x) satisfies (6.3).
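For completeness, the variational identity defining κ_s(S) follows by minimizing the quadratic map u ↦ ∫(S - P_{s,u})²:
\[
\int_0^1\big(S(x)-P_{s,u}(x)\big)^2\,\mathrm dx = \|S\|^2_{L_2(0,1)} - 2u^{\rm T}c + u^{\rm T}M_s u,
\]
which is minimized at u* = M_s^{-1}c, with minimal value ‖S‖²_{L₂(0,1)} - cᵀM_s^{-1}c = -κ_s(S).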
2°. Let L be the linear subspace of C(0,1) spanned by the functions {S(x), 1, x, x², …, x^s}. Let K(x) = P_{s,a}(x) - [ϖ_s(S)/κ_s(S)]S(x), where a is defined in (6.4). Define
\[
\Lambda(\ell) = \int_0^1 K(x)\ell(x)\,\mathrm dx,\qquad \ell\in\mathcal L;
\]
Λ is a linear continuous functional on L, and its norm is ‖Λ‖_L := sup_{ℓ∈L}{|Λ(ℓ)|/‖ℓ‖_∞}. Our goal now is to prove that
\[
\|\Lambda\|_{\mathcal L} = 1. \tag{6.5}
\]
For any ε > 0 let P^{(ε)} be a polynomial of degree s such that
\[
\sup_{x\in[0,1]}\big|S(x)-P^{(\varepsilon)}(x)\big| \le \varpi_s(S)(1+\varepsilon).
\]
Putting ℓ_ε(x) = S(x) - P^{(ε)}(x) and noting that ℓ_ε ∈ L, we have for any ε > 0
\[
\|\Lambda\|_{\mathcal L} \ge \big|\Lambda(\ell_\varepsilon)\big|\,\|\ell_\varepsilon\|_\infty^{-1} = \|\ell_\varepsilon\|^{-1}_\infty\Big|\int_0^1 K(x)\ell_\varepsilon(x)\,\mathrm dx\Big| = \|\ell_\varepsilon\|^{-1}_\infty\Big|\int_0^1 K(x)S(x)\,\mathrm dx\Big| = \varpi_s(S)\,\|\ell_\varepsilon\|^{-1}_\infty \ge (1+\varepsilon)^{-1},
\]
where the second equality uses the first relation in (6.3) and the fact that P^{(ε)} is a polynomial of degree s. Since ε can be chosen arbitrarily small, we conclude that necessarily ‖Λ‖_L ≥ 1.

Suppose now that ‖Λ‖_L > 1; then there exists ℓ ∈ L such that |Λ(ℓ)| > ‖ℓ‖_∞. Next we note that Λ(ℓ) = 0 if and only if ℓ is a polynomial. Indeed, by the definition of L, any ℓ ∈ L is represented as follows:
\[
\ell(x) = t_{s+1}S(x) + P_{s,t}(x),\qquad t=(t_0,\ldots,t_s)\in\mathbb R^{s+1},\ t_{s+1}\in\mathbb R.
\]
In view of (6.3), Λ(ℓ) = t_{s+1}ϖ_s(S) and, therefore, Λ(ℓ) = 0 if and only if t_{s+1} = 0. Now we define
\[
l_0(x) := S(x) - \frac{\varpi_s(S)\,\ell(x)}{\Lambda(\ell)},
\]
and note that Λ(l₀) = Λ(S) - ϖ_s(S) = 0, which means that l₀ is a polynomial of degree at most s. On the other hand,
\[
\sup_{x\in[0,1]}\big|S(x)-l_0(x)\big| = \big\|\varpi_s(S)\Lambda^{-1}(\ell)\,\ell\big\|_\infty = \varpi_s(S)\,|\Lambda(\ell)|^{-1}\|\ell\|_\infty < \varpi_s(S),
\]
which is impossible by the definition of ϖ_s(S). Thus, (6.5) is established.
3°. By the Hahn–Banach extension theorem there exists a linear continuous functional Λ* on C(0,1) satisfying
\[
\Lambda^*(\ell) = \Lambda(\ell),\ \forall\ell\in\mathcal L,\qquad \|\Lambda^*\|_{C(0,1)} = \|\Lambda\|_{\mathcal L}. \tag{6.6}
\]
The Riesz representation theorem implies the existence of a unique signed measure λ such that
\[
\Lambda^*(u) = \int u(x)\,\lambda(\mathrm dx),\ \forall u\in C(0,1),\qquad |\lambda|\big([0,1]\big) = \|\Lambda^*\|_{C(0,1)}. \tag{6.7}
\]
Moreover, in view of the Jordan decomposition theorem, λ can be represented uniquely as λ = λ₊ - λ₋, where λ₊, λ₋ are positive measures, and |λ| = λ₊ + λ₋. Therefore we obtain from (6.3), (6.6) and (6.7)
\[
\lambda\big([0,1]\big) = \lambda_+\big([0,1]\big) - \lambda_-\big([0,1]\big) = \Lambda^*(\mathbf 1) = \Lambda(\mathbf 1) = 0.
\]
In addition, (6.5), (6.6) and (6.7) imply that λ₊([0,1]) + λ₋([0,1]) = 1 and, therefore,
\[
\lambda_+\big([0,1]\big) = \lambda_-\big([0,1]\big) = \tfrac12. \tag{6.8}
\]
Note that, in view of (6.3), (6.6) and (6.7), for any k = 0, …, s one has
\[
\int x^{k}\lambda_+(\mathrm dx) - \int x^{k}\lambda_-(\mathrm dx) = \int x^{k}\lambda(\mathrm dx) = \Lambda^*\big(x^{k}\big) = \Lambda\big(x^{k}\big) = 0,
\]
and
\[
\int S(x)\lambda_+(\mathrm dx) - \int S(x)\lambda_-(\mathrm dx) = \int S(x)\lambda(\mathrm dx) = \Lambda^*(S) = \Lambda(S) = \varpi_s(S).
\]
Finally, in view of (6.8), 2λ₊ and 2λ₋ are probability measures on [0,1]; setting
\[
\mu(\mathrm dx) := \lambda_+(\mathrm dx) + \tfrac12\delta_1(\mathrm dx),\qquad \nu(\mathrm dx) := \lambda_-(\mathrm dx) + \tfrac12\delta_1(\mathrm dx),
\]
we complete the proof.

First of all we remark that the construction of the set of functions {f_w, w ∈ {0,1}^M} almost coincides with the construction proposed in Goldenshluger and Lepski (2014) in the proof of Theorem 3 there. Thus, denoting F_w = Σ_{m∈M} w_mΛ_m and repeating the computations done in the cited paper, we can verify that the assumption
\[
A\sigma_l^{-\beta_l}\,\|\Lambda\|_{r_l}\big[\varrho_w(r_l)\,\sigma^d\big]^{1/r_l} \le C L_l,\qquad \forall l=1,\ldots,d, \tag{6.9}
\]
together with ~σ ∈ (0,1]^d, guarantees F_w ∈ N_{~r,d}(~β, ~L) for all w ∈ [0,1]^M. It is important to realize that the only conditions used in the proof of (6.9) are (3.2) and Λ ∈ C^∞(ℝ^d), which are the same as in Goldenshluger and Lepski (2014). Set for any z ≥ 1
\[
\mathcal W_{\pi,z} := \Big\{w\in[0,1]^M:\ \big|\varrho_w(z)-Me_\pi(z)\big| \le \sqrt{\upsilon M e_\pi(2z)}\Big\}.
\]
First we note that if for some z ≥ 1 the event {ζ ∈ W_{π,z}} is realized, then
\[
\varrho_\zeta(z) \le Me_\pi(z) + \sqrt{\upsilon Me_\pi(2z)} \le e_\pi(z)\big[M+\sqrt{2\upsilon M}\big] \le 2Me_\pi(z) \le 2Me_*(z), \tag{6.10}
\]
since e_π(2z) ≤ e_π(z), e_π(z) ≥ 1/2 (the measures µ, ν constructed above put mass 1/2 at the point 1), and M ≥ 2υ in view of (3.7). Also we have used (4.1). Next, we deduce from (6.10) that if {ζ ∈ W_{π,z}}, z ∈ [1, ∞), is realized, then
\[
\|F_\zeta\|_z = \|\Lambda\|_z\big(\sigma^d\varrho_\zeta(z)\big)^{1/z} \le \|\Lambda\|_z\big(2\sigma^d Me_*(z)\big)^{1/z}.
\]
It yields, in particular, that if {ζ ∈ W_{π,q}} is realized,
\[
\|f_\zeta\|_q = \big\|\big[1-A\sigma^d\varrho_\zeta(1)\big]f + AF_\zeta\big\|_q \le \big[1-A\sigma^d\varrho_\zeta(1)\big]\|f\|_q + A\|F_\zeta\|_q \le \tfrac12 Q + 2A\|\Lambda\|_q\big(\sigma^d Me_*(q)\big)^{1/q} \le Q.
\]
To get the last inequality we have used (4.3) and the fact that f ∈ B_q(Q/2) in view of the second assertion of Lemma 1. Thus, f_ζ ∈ B_q(Q).

At last, if {ζ ∈ ∩_{l=1}^d W_{π,r_l}} is realized, we have in view of condition (4.4) and (6.10)
\[
A\|\Lambda\|_{r_l}\big(\sigma^d\varrho_\zeta(r_l)\big)^{1/r_l} \le A\|\Lambda\|_{r_l}\big(2\sigma^d Me_*(r_l)\big)^{1/r_l} \le c\,L_l
\]
for any l with r_l ≠ ∞. Additionally, for all l ∈ {1, …, d} such that r_l = ∞ the latter inequality obviously holds for all realizations of ζ. We assert, in view of (6.9) with C = 2c, that F_ζ ∈ N_{~r,d}(~β, 2^{-1}~L) and, therefore, f_ζ ∈ N_{~r,d}(~β, ~L), since f_ζ = [1 - Aσ^dρ_ζ(1)]f + AF_ζ and f ∈ N_{~r,d}(~β, 2^{-1}~L) in view of the second assertion of Lemma 1. Thus, we have shown that
\[
\bigcap_{z\in\{r_1,\ldots,r_d,q\}}\big\{\zeta\in\mathcal W_{\pi,z}\big\} \subset \Big\{f_\zeta\in N_{\vec r,d}(\vec\beta,\vec L)\cap B_q(Q)\Big\}.
\]
Hence
\[
\mathbb P_\pi\big\{f_\zeta\in N_{\vec r,d}(\vec\beta,\vec L)\cap B_q(Q)\big\} \ge 1-\sum_{z\in\{r_1,\ldots,r_d,q\}}\mathbb P_\pi\{\zeta\notin\mathcal W_{\pi,z}\} \ge 1-(d+1)\upsilon^{-1} = 1-(64)^{-1},
\]
by the choice of υ. Lemma is proved.
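A remark on the computation ‖F_ζ‖_z^z = ‖Λ‖_z^z σ^d ρ_ζ(z) used above: it is a direct consequence of the fact that the bumps Λ_m have pairwise disjoint supports; assuming, as in the construction of Section 3, that each Λ_m is a σ-rescaled translate of Λ,
\[
\|F_\zeta\|_z^z = \int_{\mathbb R^d}\Big|\sum_{m\in\mathcal M}\zeta_m\Lambda_m(x)\Big|^{z}\mathrm dx = \sum_{m\in\mathcal M}\zeta_m^{z}\int\big|\Lambda_m(x)\big|^{z}\mathrm dx = \varrho_\zeta(z)\,\sigma^d\,\|\Lambda\|_z^z,
\]
where the last equality follows by the change of variables in each cell.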
References
Bickel, P. J. and Ritov, Y. (1988). Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā Ser. A 50, 381–393.
Birgé, L. and Massart, P. (1995). Estimation of integral functionals of a density. Ann. Statist. 23, 11–29.
Birgé, L. (2014). Model selection for density estimation with L₂-loss. Probab. Theory Related Fields 158, 533–574.
Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford.
Cai, T. T. and Low, M. G. (2004). Minimax estimation of linear functionals over nonconvex parameter spaces. Ann. Statist. 32, 552–576.
Cai, T. T. and Low, M. G. (2005). Nonquadratic estimators of a quadratic functional. Ann. Statist. 33, 2930–2956.
Cai, T. T. and Low, M. G. (2011). Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. Ann. Statist. 39, 1012–1041.
Donoho, D. L. and Liu, R. C. (1991). Geometrizing rates of convergence. II, III. Ann. Statist. 19, 633–667, 668–701.
Donoho, D. L. and Nussbaum, M. (1990). Minimax quadratic estimation of a quadratic functional. J. Complexity 6, 290–323.
Giné, E. and Nickl, R. (2008). A simple adaptive estimator of the integrated square of a density. Bernoulli 14, 47–61.
Goldenshluger, A. and Lepski, O. (2014). On adaptive minimax density estimation on R^d. Probab. Theory Related Fields 159, 479–543.
Goldenshluger, A. and Lepski, O. (2020). Minimax estimation of norms of a probability density: II. Rate-optimal estimation procedures. Manuscript.
Han, Y., Jiao, J., Weissman, T. and Zinn, J. (2017). Optimal rates of entropy estimation over Lipschitz balls. arXiv:1711.02141 [math.ST].
Han, Y., Jiao, J. and Mukherjee, R. (2019). On estimation of L_r-norms in Gaussian white noise models. arXiv:1710.03863 [math.ST].
Ibragimov, I. A. and Khasminskii, R. Z. (1986). An estimate for the value of a linear functional of the density of a distribution. (Russian) Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) (1986), 45–59; translation in J. Soviet Math. (1989), no. 4, 454–465.
Ibragimov, I. A., Nemirovskii, A. S. and Khasminskii, R. Z. (1986). Some problems of nonparametric estimation in Gaussian white noise. (Russian) Teor. Veroyatnost. i Primenen. 31, 451–466.
Juditsky, A. and Nemirovski, A. (2020). Statistical Inference via Convex Optimization. Princeton Series in Applied Mathematics, Princeton University Press.
Kerkyacharian, G. and Picard, D. (1996). Estimating nonquadratic functionals of a density using Haar wavelets. Ann. Statist. 24, 485–507.
Kozachenko, L. F. and Leonenko, N. N. (1987). Sample estimate of the entropy of a random vector. Probl. Inform. Transm. 23, 95–101.
Laurent, B. (1996). Efficient estimation of integral functionals of a density. Ann. Statist. 24, 659–681.
Laurent, B. (1997). Estimation of integral functionals of a density and its derivatives. Bernoulli 3, 181–211.
Leonenko, N., Pronzato, L. and Savani, V. (2008). A class of Rényi information estimators for multidimensional densities. Ann. Statist. 36, 2153–2182.
Lepski, O. V., Nemirovski, A. and Spokoiny, V. (1999). On estimation of the L_r norm of a regression function. Probab. Theory Related Fields 113, 221–253.
Lepski, O. V. and Willer, T. (2019). Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Statist. 47, 233–287.
Nemirovskii, A. S. (1990). Necessary conditions for efficient estimation of functionals of a nonparametric signal observed in white noise. (Russian) Teor. Veroyatnost. i Primenen. 35, 83–91; translation in Theory Probab. Appl. 35 (1990), 94–103 (1991).
Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1998), 85–277. Lecture Notes in Math. 1738, Springer, Berlin.
Nikol'skii, S. M. (1977). Priblizhenie Funktsii Mnogikh Peremennykh i Teoremy Vlozheniya. 2nd ed. Nauka, Moscow.
Rényi, A. (1961). On measures of entropy and information. In Proc. 4th Berkeley Sympos. Math. Statist. Probab., Vol. I, 547–561. Univ. California Press, Berkeley.
Timan, A. F. (1963). Theory of Approximation of Functions of a Real Variable. Pergamon Press, Oxford.
Tchetgen, E., Li, L., Robins, J. and van der Vaart, A. (2008). Minimax estimation of the integral of a power of a density. Statist. Probab. Lett.
Tsallis, C. (1988). Possible generalization of Boltzmann–Gibbs statistics. J. Statist. Phys. 52, 479–487.
Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.