Approximation and Estimation of s-Concave Densities via Rényi Divergences
Submitted to the Annals of Statistics
By Qiyang Han and Jon A. Wellner*
University of Washington
In this paper, we study the approximation and estimation of s-concave densities via Rényi divergence. We first show that the approximation of a probability measure Q by an s-concave density exists and is unique via the procedure of minimizing a divergence functional proposed by Koenker and Mizera (2010) if and only if Q admits full-dimensional support and a first moment. We also show continuity of the divergence functional in Q: if Q_n → Q in the Wasserstein metric, then the projected densities converge in weighted L_1 metrics and uniformly on closed subsets of the continuity set of the limit. Moreover, directional derivatives of the projected densities also enjoy local uniform convergence. This covers both on-the-model and off-the-model situations, and entails strong consistency of the divergence estimator of an s-concave density under mild conditions. One interesting and important feature of the Rényi divergence estimator of an s-concave density is that the estimator is intrinsically related to the estimation of log-concave densities via maximum likelihood methods. In fact, we show that for d = 1 at least, the Rényi divergence estimators for s-concave densities converge to the maximum likelihood estimator of a log-concave density as s ր 0. The Rényi divergence estimator shares similar characterizations with the MLE for log-concave distributions, which allows us to develop pointwise asymptotic distribution theory assuming that the underlying density is s-concave.

*Supported in part by NSF Grant DMS-1104832 and NIAID Grant R01 AI029168.
MSC 2010 subject classifications:
Primary 62G07, 62H12; secondary 62G05, 62G20
Keywords and phrases: s-concavity, consistency, projection, asymptotic distribution, mode estimation, nonparametric estimation, shape constraints
CONTENTS
1 Introduction
  1.1 Overview
  1.2 Notation
2 Theoretical properties of the divergence estimator
  2.1 Existence and uniqueness
  2.2 Weighted global convergence in ‖·‖_{L_1} and ‖·‖_∞
  2.3 Characterization of the Rényi divergence projection and estimator
  2.4 Continuity of the Rényi divergence estimator in s
3 Limit behavior of s-concave densities
  3.1 Limit characterization via dimensionality condition
  3.2 Modes of convergence
  3.3 Local convergence of directional derivatives
4 Limiting distribution theory of the divergence estimator
  4.1 Limit distribution theory
  4.2 Estimation of the mode
5 Discussion
  5.1 Behavior of Rényi projection for generic measures Q when s < −1/(d+1)
  5.2 Global rates of convergence for Rényi divergence estimators
  5.3 Conjectures about the global rates in higher dimensions
  5.4 Adaptive estimation of concave-transformed class of functions
6 Proofs
  6.1 Proofs for Section 2
  6.2 Proofs for Section 3
  6.3 Proofs for Section 4
7 Appendix
  7.1 Proofs of Lemmas 6.2 and 6.3
  7.2 Proof of Theorem 6.4
  7.3 Auxiliary convex analysis
Acknowledgements
References
Authors' addresses
1. Introduction.
1.1. Overview.
The class of s-concave densities on R^d is defined via the generalized means of order s as follows. Let

M_s(a, b; θ) :=
  ((1 − θ)a^s + θb^s)^{1/s},  s ≠ 0, a, b > 0,
  0,                          s < 0, ab = 0,
  a^{1−θ} b^θ,                s = 0,
  a ∧ b,                      s = −∞.

Then a density p(·) on R^d is called s-concave, i.e. p ∈ P_s, if and only if for all x_0, x_1 ∈ R^d and θ ∈ (0, 1), p((1 − θ)x_0 + θx_1) ≥ M_s(p(x_0), p(x_1); θ). This definition apparently goes back to Avriel (1972), with further studies by Borell (1974, 1975), Das Gupta (1976), Rinott (1976), and Uhrin (1984); see also Dharmadhikari and Joag-Dev (1988) for a nice summary. It is easy to see that the densities p(·) have the form p = ϕ_+^{1/s} for some concave function ϕ if s > 0, p = exp(ϕ) for some concave ϕ if s = 0, and p = ϕ_+^{1/s} for some convex ϕ if s < 0. The function classes P_s are nested in s: for every r > 0 > s, we have P_r ⊂ P_0 ⊂ P_s ⊂ P_{−∞}.

Nonparametric estimation of s-concave densities has been under intense research effort in recent years. In particular, much attention has been paid to estimation in the special case s = 0, which corresponds to the class of all log-concave densities on R^d. The nonparametric maximum likelihood estimator (MLE) of a log-concave density was studied in the univariate setting by Walther (2002), Dümbgen and Rufibach (2009), and Pal, Woodroofe and Meyer (2007), and in the multivariate setting by Cule, Samworth and Stewart (2010) and Cule and Samworth (2010). The limiting distribution theory at fixed points when d = 1 was studied in Balabdaoui, Rufibach and Wellner (2009), and rate results appear in Doss and Wellner (2016) and Kim and Samworth (2015). Dümbgen, Samworth and Schuhmacher (2011) also studied stability properties of the MLE projection of any probability measure onto the class of log-concave densities.

Compared with the well-studied log-concave densities (i.e. s = 0), much remains unknown concerning estimation and inference procedures for the larger classes P_s, s <
0. One important feature of this larger class is that the densities in P_s (s < 0) are allowed to have heavier and heavier tails as s → −∞. In fact, t-distributions with ν degrees of freedom belong to P_{−1/(ν+1)}(R) (and hence also to P_s(R) for any s < −1/(ν+1)). The study of maximum likelihood estimators (MLE's in the following) for general s-concave densities in Seregin and Wellner (2010) shows that the MLE exists and is consistent for s ∈ (−1, ∞). However, there is no known result about uniqueness of the MLE of s-concave densities except for s = 0. The difficulties in the theory of estimation via the MLE lie in the fact that we still have very little knowledge of 'good' characterizations of the MLE in the s-concave setting. This has hindered further development of both theoretical and statistical properties of the estimation procedure.

Some alternative approaches to estimation of s-concave densities have been proposed in the literature, using divergences other than the log-likelihood functional (the Kullback-Leibler divergence in some sense). Koenker and Mizera (2010) proposed an alternative to maximum likelihood based on generalized Rényi entropies. Similar procedures were also proposed in parametric settings by Basu et al. (1998) using a family of discrepancy measures. In our setting of s-concave densities with s <
0, the methods of Koenker and Mizera (2010) can be formulated as follows. Given i.i.d. observations X = (X_1, ..., X_n), consider the primal optimization problem (P):

(1.1)    (P)   min_{g ∈ G(X)} L(g, Q_n) ≡ (1/n) Σ_{i=1}^n g(X_i) + (1/|β|) ∫_{R^d} g(x)^β dx,

where G(X) denotes all non-negative closed convex functions supported on the convex set conv(X), Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} is the empirical measure, and β = 1 + 1/s <
0. As is shown by Koenker and Mizera (2010), the associated dual problem (D) is

(1.2)    (D)   max_f ∫_{R^d} (f(y))^α / α dy,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°,

where G(X)° ≡ {G ∈ C*(X) : ∫ g dG ≤ 0 for all g ∈ G(X)} is the polar cone of G(X), and α is the conjugate index of β, i.e. 1/α + 1/β = 1. Here C*(X), the space of signed Radon measures on conv(X), is the topological dual of C(X), the space of continuous functions on conv(X). We also note that the constraint G ∈ G(X)° in the dual form (1.2) comes from the 'dual' of the primal constraint g ∈ G(X), and the constraint f = d(Q_n − G)/dy can be derived from the dual computation of L(·, Q_n):

(L(·, Q_n))*(G) = sup_g ( ⟨G, g⟩ − (1/n) Σ_{i=1}^n g(X_i) − (1/|β|) ∫_{R^d} g(x)^β dx )
               = sup_g ( ⟨G − Q_n, g⟩ − ∫ ψ_s(g(x)) dx ) = Ψ_s*(G − Q_n).

Here we used the notation ⟨G, g⟩ := ∫ g dG and ψ_s(·) := (·)^β/|β|, and, for clarity, Ψ_s is the functional defined by Ψ_s(g) := ∫ ψ_s(g(x)) dx. The dual form (1.2) now follows from the well-known fact (e.g. Rockafellar (1971), Corollary 4A) that the above dual functional is given by

Ψ_s*(G) = ∫ ψ_s*(dG/dx) dx   if G is absolutely continuous with respect to Lebesgue measure,
Ψ_s*(G) = +∞                  otherwise.

For the primal problem (P) and the dual problem (D), Koenker and Mizera (2010) proved the following results:

Theorem (Koenker and Mizera (2010)). (P) admits a unique solution g_n* if int(conv(X)) ≠ ∅, where g_n* is a polyhedral convex function supported on conv(X).

Theorem (Koenker and Mizera (2010)). Strong duality between (P) and (D) holds. Any dual feasible solution is actually a density on R^d with respect to the canonical Lebesgue measure. The dual optimal solution f_n* exists, and satisfies f_n* = (g_n*)^{1/s}.

We note that the above results are all obtained in the empirical setting.
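As a quick numerical sanity check of the class membership discussed above (a t density with ν degrees of freedom lies in P_{−1/(ν+1)}), recall that for s < 0 a density p is s-concave exactly when p^s is convex. The sketch below (assuming NumPy; the density is left unnormalized, since positive constants do not affect convexity of a power) tests discrete convexity of p^s on a grid:

```python
import numpy as np

def t_density(x, nu):
    # unnormalized Student-t_nu density; constants do not affect convexity checks
    return (1.0 + x**2 / nu) ** (-(nu + 1) / 2.0)

def is_convex_on_grid(vals):
    # discrete convexity: nonnegative second differences (up to rounding)
    return np.all(np.diff(vals, 2) >= -1e-10)

nu = 3.0
s = -1.0 / (nu + 1)          # t_nu should lie in P_s for this s
x = np.linspace(-10, 10, 2001)

print(is_convex_on_grid(t_density(x, nu) ** s))
# True: p^s = (1 + x^2/nu)^{1/2} is convex

print(is_convex_on_grid(t_density(x, nu) ** (s / 2.0)))
# False: t_nu is not s'-concave for s' in (-1/(nu+1), 0)
```

The second check illustrates the nesting P_r ⊂ P_s for r > s: membership fails once the index moves strictly closer to 0 than −1/(ν+1).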
At the population level, given a probability measure Q with suitable regularity conditions, consider

(1.3)    (P_Q)   min_{g ∈ G} L_s(g, Q),   where L(g, Q) ≡ L_s(g, Q) ≡ ∫ g(x) dQ(x) + (1/|β|) ∫_{R^d} g(x)^β dx,

and G denotes the class of all (non-negative) closed convex functions with non-empty interior which are coercive in the sense that g(x) → ∞ as ‖x‖ → ∞. Koenker and Mizera (2010) showed that Fisher consistency holds at the population level: suppose Q(A) := ∫_A f_0 dλ for some f_0 = g_0^{1/s} where g_0 ∈ G; then g_0 is an optimal solution for (P_Q).

Koenker and Mizera (2010) also proposed a general discretization scheme corresponding to the primal form (1.1) and the dual form (1.2) for fast computation, by which the one-dimensional problem can be solved via linear programming and the two-dimensional problem via semi-definite programming. These have been implemented in the R package REBayes by Koenker and Mizera (2014). Koenker's package depends in turn on the
MOSEK
0. As we will see, estima-tion of s -concave distributions via R´enyi divergences is intrinsically relatedwith the estimation of log-concave distributions via maximum likelihoodmethods. In fact we show that in the empirical setting in dimension 1, theR´enyi divergence estimators converge to the maximum likelihood estimatorfor log-concave densities as s ր s -concave densities become possible. In particular, the charac-terizations developed here enable us to overcome some of the difficulties ofmaximum likelihood estimators as proposed by Seregin and Wellner (2010),and to develop limit distribution theory at fixed points assuming that theunderlying model is s -concave. The pointwise rate and limit distributionresults follow a pattern similar to the corresponding results for the MLE’sin the log-concave setting obtained by Balabdaoui, Rufibach and Wellner(2009). This local point of view also underlines the results on global rates ofconvergence considered in Doss and Wellner (2016), showing that the diffi-culty of estimation for such densities with tails light or heavy, comes almostsolely from the shape constraints, namely, the convexity-based constraints.The rest of the paper is organized as follows. In Section 2, we study thebasic theoretical properties of the approximation/projection scheme definedby the procedure (1.3). In Section 3, we study the limit behavior of s -concaveprobability measures in the setting of weak convergence under dimension-ality conditions on the supports of the limiting sequence. In Section 4, wedevelop limiting distribution theory of the divergence estimator in dimen-sion 1 under curvature conditions with tools developed in Sections 2 and 3.Related issues and further problems are discussed in Section 5. Proofs are -CONCAVE ESTIMATION given in Sections 6 and 7.1.2. Notation.
In this paper, we denote the canonical Lebesgue measure on R^d by λ or λ_d, and write ‖·‖_p for the canonical Euclidean p-norm on R^d, with ‖·‖ = ‖·‖_2 unless otherwise specified. B(x, δ) stands for the open ball of radius δ centered at x in R^d, and 1_A for the indicator function of A ⊂ R^d. We use L_p(f) ≡ ‖f‖_{L_p} ≡ ‖f‖_p = (∫ |f|^p dλ_d)^{1/p} to denote the L_p(λ_d) norm of a measurable function f on R^d if no confusion arises.

We write csupp(Q) for the convex support of a measure Q defined on R^d, i.e.

csupp(Q) = ∩ {C : C ⊂ R^d closed and convex, Q(C) = 1}.

We let 𝒬 denote the set of all probability measures on R^d whose convex support has non-void interior, while 𝒬_1 denotes the set of all probability measures Q with finite first moment: ∫ ‖x‖ Q(dx) < ∞.

We write f_n →_d f if P_n converges weakly to P for the corresponding probability measures P_n(A) ≡ ∫_A f_n dλ and P(A) ≡ ∫_A f dλ.

We write α := 1 + s, β := 1 + 1/s, r := −1/s unless otherwise specified.
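The relations among the indices α, β and r defined here are used throughout; a minimal numerical check of the conjugacy 1/α + 1/β = 1 and of the sign pattern for −1 < s < 0:

```python
def indices(s):
    # alpha = 1 + s, beta = 1 + 1/s, r = -1/s; (alpha, beta) are conjugate
    alpha, beta, r = 1 + s, 1 + 1 / s, -1 / s
    assert abs(1 / alpha + 1 / beta - 1) < 1e-12   # 1/alpha + 1/beta = 1
    return alpha, beta, r

for s in (-0.2, -0.5, -0.9):
    # for -1 < s < 0: alpha in (0, 1), beta < 0, r > 1
    print(s, indices(s))
```

For instance, s = −1/2 gives (α, β, r) = (1/2, −1, 2), which is the setting used in several one-dimensional examples below.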
2. Theoretical properties of the divergence estimator.
In this section, we study the basic theoretical properties of the proposed projection scheme via Rényi divergence (1.3). Starting from a given probability measure Q, we first show the existence and uniqueness of such projections via Rényi divergence under assumptions on the index s and on Q. We will call such a projection the Rényi divergence estimator for the given probability measure Q in the following discussions. We next show that the projection scheme is continuous in Q in the following sense: if a sequence of probability measures Q_n, for which the projections onto the class of s-concave densities exist, converges to a limiting probability measure Q in the Wasserstein distance, then the corresponding projected densities converge in weighted L_1 metrics and uniformly on closed subsets of the continuity set of the limit. The directional derivatives of such projected densities also converge uniformly in all directions in a local sense. We then turn our attention to explicit characterizations of the Rényi divergence estimators, especially in dimension 1. This helps in two ways. First, it helps in understanding the continuity of the projection scheme in the index s; i.e. it answers affirmatively the question: for a given probability measure Q, does the Rényi divergence estimator converge to the log-concave projection studied in Dümbgen, Samworth and Schuhmacher (2011) as s ր 0? Second, the explicit characterizations are exploited in the development of the asymptotic distribution theory presented in Section 4.
2.1. Existence and uniqueness.
For a given probability measure Q, let L(Q) = inf_{g ∈ G} L(g, Q).

Lemma 2.1. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬. Then L(Q) < ∞ if and only if Q ∈ 𝒬_1.

Now we state our main theorem for the existence of the Rényi divergence projection corresponding to a general measure Q on R^d.

Theorem 2.2. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1. Then (1.3) achieves its nontrivial minimum for some g̃ ∈ G. Moreover, g̃ is bounded away from zero, and f̃ ≡ g̃^{1/s} is a bounded density with respect to λ_d.

The uniqueness of the solution follows immediately from the strict convexity of the functional L(·, Q).

Lemma 2.3. g̃ is the unique solution for (P_Q) if int(dom(g̃)) ≠ ∅.

Remark 2.4. By the above discussion, we conclude that the map Q ↦ argmin_{g ∈ G} L(g, Q) is well-defined for probability measures Q with suitable regularity conditions: in particular, if Q ∈ 𝒬 and −1/(d+1) < s <
0, it is well-defined if and only if Q ∈ 𝒬_1. From now on we denote the optimal solution by g_s(·|Q), or simply g(·|Q) if no confusion arises, write P^Q for the corresponding s-concave distribution, and say that P^Q is the Rényi projection of Q to P^Q ∈ P_s.

2.2. Weighted global convergence in ‖·‖_{L_1} and ‖·‖_∞.

Theorem 2.5. Assume −1/(d+1) < s < 0. Let {Q_n} ⊂ 𝒬 be a sequence of probability measures converging weakly to Q ∈ 𝒬 ∩ 𝒬_1. Then

(2.1)    ∫ ‖x‖ dQ ≤ liminf_{n→∞} ∫ ‖x‖ dQ_n.

If we further assume that

(2.2)    lim_{n→∞} ∫ ‖x‖ dQ_n = ∫ ‖x‖ dQ,

then

(2.3)    L(Q) = lim_{n→∞} L(Q_n).

Conversely, if (2.3) holds, then (2.2) holds true. In the former case (i.e. (2.2) holds), let g := g(·|Q) and g_n := g(·|Q_n); then f := g^{1/s} and f_n := g_n^{1/s} satisfy

(2.4)    lim_{n→∞, x→y} f_n(x) = f(y)    for all y ∈ R^d \ ∂{f > 0},
         limsup_{n→∞, x→y} f_n(x) ≤ f(y)  for all y ∈ R^d.

For κ < r − d ≡ −1/s − d,

(2.5)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f_n(x) − f(x)| dx = 0.

For any closed set S contained in the continuity points of f and κ < r,

(2.6)    lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f_n(x) − f(x)| = 0.

Furthermore, let D_f := {x ∈ int(dom(f)) : f is differentiable at x}, and let T ⊂ int(D_f) be any compact set. Then

(2.7)    lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f_n(x) − ∇_ξ f(x)| = 0,

where ∇_ξ f(x) := lim_{h ց 0} (f(x + hξ) − f(x))/h denotes the (one-sided) directional derivative along ξ.

Remark 2.6. The one-sided directional derivative of a convex function g is well-defined, with ∇_ξ g(x) = inf_{h>0} (g(x + hξ) − g(x))/h, and hence ∇_ξ f is well-defined for f ≡ g^{1/s}. See Section 23 in Rockafellar (1997) for more details.

As a direct consequence, we have the following result covering both on- and off-the-model cases.

Corollary 2.7. Assume −1/(d+1) < s < 0.
Let Q be a probability measure such that Q ∈ 𝒬 ∩ 𝒬_1, with f_Q := g(·|Q)^{1/s} the density function corresponding to the Rényi projection P^Q (as in Remark 2.4). Let Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} be the empirical measure when X_1, ..., X_n are i.i.d. with distribution Q on R^d. Let ĝ_n := g(·|Q_n) and f̂_n := ĝ_n^{1/s} be the Rényi divergence estimator of Q. Then, almost surely, we have

(2.8)    lim_{n→∞, x→y} f̂_n(x) = f_Q(y)    for all y ∈ R^d \ ∂{f_Q > 0},
         limsup_{n→∞, x→y} f̂_n(x) ≤ f_Q(y)  for all y ∈ R^d.
For κ < r − d ≡ −1/s − d,

(2.9)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f̂_n(x) − f_Q(x)| dx = 0 a.s.

For any closed set S contained in the continuity points of f_Q and κ < r,

(2.10)   lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f̂_n(x) − f_Q(x)| = 0 a.s.

Furthermore, for any compact set T ⊂ int(D_{f_Q}),

(2.11)   lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f̂_n(x) − ∇_ξ f_Q(x)| = 0 a.s.

Now we return to the correctly specified case and relax the previous assumption that s > −1/(d+1), for the case of the empirical measure Q_n and a measure Q with finite mean and bounded density f ∈ P_{s'} ⊂ P_s with s' > s.

Corollary 2.8. Assume −1/d < s < 0. Let Q be a probability measure on R^d with density f ∈ P_s if −1/(d+1) < s, and f ∈ P_{s'} where s' > −1/(d+1) if s ∈ (−1/d, −1/(d+1)]. (Thus f is bounded and f has a finite mean.) Let f̂_n ≡ f̂_{n,s} be defined as in Corollary 2.7. Then (2.8), (2.9), (2.10), and (2.11) hold with f_Q replaced by f.

2.3. Characterization of the Rényi divergence projection and estimator.
We now develop characterizations for the Rényi divergence projection, especially in dimension 1. All proofs for this subsection can be found in Appendix 6.1.

We note that the assumption −1/(d+1) < s < 0 is imposed for general probability measures Q; for the empirical measure Q_n, this condition can be relaxed to s > −1/d. The variational characterization involves perturbation functions h such that g + th ∈ G holds for all t ∈ (0, t_0).

Corollary 2.10. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1, and let h be any closed convex function. Then

∫ h dP ≤ ∫ h dQ,

where P = P^Q is the Rényi projection of Q to P^Q ∈ P_s (as in Remark 2.4).

As a direct consequence, we have
Corollary 2.11. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1. Let μ_Q := E_Q[X]. Then μ_P = μ_Q. Furthermore, if −1/(d+2) < s < 0, we have λ_max(Σ_P) ≤ λ_max(Σ_Q) and λ_min(Σ_P) ≤ λ_min(Σ_Q), where Σ_Q is the covariance matrix defined by Σ_Q := E_Q[(X − μ_Q)(X − μ_Q)^T]. Generally, if −1/(d+k) < s < 0 for some k ∈ N, then E_P[‖X‖^l] ≤ E_Q[‖X‖^l] holds for all l = 1, ..., k.

Now we restrict our attention to d = 1, and in the following we give a full characterization of the Rényi divergence estimator. Suppose we observe X_1, ..., X_n i.i.d. Q on R, and let X_(1) ≤ X_(2) ≤ ... ≤ X_(n) be the order statistics of X_1, ..., X_n. Let 𝔽_n be the empirical distribution function corresponding to the empirical probability measure Q_n := n^{−1} Σ_{i=1}^n δ_{X_i}. Let ĝ_n := g(·|Q_n) and F̂_n(t) := ∫_{−∞}^t ĝ_n^{1/s}(x) dx. From Theorem 4.1 in Koenker and Mizera (2010) it follows that ĝ_n is a convex function supported on [X_(1), X_(n)] and linear on [X_(i), X_(i+1)] for all i = 1, ..., n −
1. For a continuous piecewise linear function h : [X_(1), X_(n)] → R, define the set of knots to be S_n(h) := {t ∈ (X_(1), X_(n)) : h′(t−) ≠ h′(t+)} ∩ {X_1, ..., X_n}.

Theorem 2.12. Let g_n be a convex function taking the value +∞ on R \ [X_(1), X_(n)] and linear on [X_(i), X_(i+1)] for all i = 1, ..., n − 1. Let

F_n(t) := ∫_{−∞}^t g_n^{1/s}(x) dx.

Assume F_n(X_(n)) = 1. Then g_n = ĝ_n if and only if

(2.13)    ∫_{X_(1)}^t (F_n(x) − 𝔽_n(x)) dx  = 0 if t ∈ S_n(g_n),
                                             ≤ 0 otherwise.
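The criterion in (2.13) is easy to evaluate numerically. The following sketch (assuming NumPy; `g_cand` is a hypothetical piecewise-linear convex candidate, not the actual estimator) computes the map t ↦ ∫_{X_(1)}^t (F_n − 𝔽_n) dx on a grid; the candidate equals ĝ_n exactly when this criterion is ≤ 0 everywhere with equality at the knots of the candidate.

```python
import numpy as np

s = -0.5                                   # concavity index; f = g^{1/s} = g^{-2}
rng = np.random.default_rng(0)
X = np.sort(rng.standard_t(df=3, size=20)) # toy data

def g_cand(t):
    # hypothetical candidate: convex, piecewise linear on [X_(1), X_(n)]
    return 1.0 + np.abs(t - np.median(X))

def cumtrap(y, x):
    # cumulative trapezoidal integral, anchored at x[0]
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])

grid = np.linspace(X[0], X[-1], 20001)
F = cumtrap(g_cand(grid) ** (1.0 / s), grid)             # F_n(t) from the candidate
F_emp = np.searchsorted(X, grid, side="right") / len(X)  # empirical CDF on the grid

H = cumtrap(F - F_emp, grid)   # t -> int_{X_(1)}^t (F_n - F_emp) dx
print(H.min(), H.max())        # candidate = estimator iff H <= 0 with = 0 at knots
```

The candidate here is for illustrating the mechanics only; it need not satisfy the normalization F_n(X_(n)) = 1, let alone the characterization itself.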
Corollary 2.13. For x ∈ S_n(ĝ_n), we have 𝔽_n(x) − 1/n ≤ F̂_n(x) ≤ 𝔽_n(x).

Finally, we give a characterization of the Rényi divergence estimator in terms of distribution functions, as in Theorem 2.7 of Dümbgen, Samworth and Schuhmacher (2011).
Theorem 2.14. Assume −1/2 < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1 is a probability measure on R with distribution function G(·). Let g ∈ G be such that f ≡ g^{1/s} is a density on R, with distribution function F(·). Then g = g(·|Q) if and only if

1. ∫_R (F − G)(t) dt = 0;
2. ∫_{−∞}^x (F − G)(t) dt ≤ 0 for all x ∈ R, with equality when x ∈ S̃(g).

Here S̃(g) := {x ∈ R : g(x) < (1/2)(g(x + δ) + g(x − δ)) holds for all δ > 0 small enough}.

The above theorem is useful for understanding the projected s-concave density given an arbitrary probability measure Q ∈ 𝒬 ∩ 𝒬_1. The following example illustrates these projections and also gives some insight concerning the boundary properties of the class of s-concave densities.

Example 2.15. Consider the class of densities Q defined by

Q = { q_τ(x) = ((τ − 1)/(2(τ − 2))) (1 + |x|/(τ − 2))^{−τ} : τ > 2 }.

Note that q_τ is −1/τ-concave and not s-concave for any 0 > s > −1/τ. We start from an arbitrary q_τ ∈ Q with τ >
2, and we will show in the following that the projection of q_τ onto the class of s-concave (0 > s > −1/τ) distributions through L(·, q_τ) is given by q_{−1/s}. Let Q_τ be the distribution function of q_τ(·); then we can calculate

Q_τ(x) = (1/2)(1 − x/(τ − 2))^{−(τ−1)}      if x ≤ 0,
Q_τ(x) = 1 − (1/2)(1 + x/(τ − 2))^{−(τ−1)}  if x > 0.

It is easy to check by direct calculation that ∫_{−∞}^x (Q_r(t) − Q_τ(t)) dt ≤ 0, with equality if and only if x = 0, where r := −1/s. It is clear that S̃(q_r) = {0}, and hence the conditions in Theorem 2.14 are verified. Note that, in Example 2.9 of Dümbgen, Samworth and Schuhmacher (2011), the log-concave approximation of the rescaled t density is the Laplace distribution. It is easy to see from the above calculation that the log-concave projection of the whole class Q is the Laplace distribution q_∞ = (1/2)exp(−|x|). Therefore the log-concave approximation fails to distinguish densities at least amongst the class Q ∪ {q_∞}.

2.4. Continuity of the Rényi divergence estimator in s.

Recall that α = 1 + s, so that (α, β) is a conjugate pair with α^{−1} + β^{−1} = 1, where β = 1 + 1/s. For 1 − 1/d < α <
1, let

F_α(f) = (1/(α − 1)) ∫ f^α(x) dx,    F_1(f) = ∫ f(x) log f(x) dx.

For a given index −1/d < s < 0 and data X = (X_1, ..., X_n) with non-void int(conv(X)), solving the dual problem (1.2) for the primal problem (1.1) is equivalent to solving

(2.14)    (D_α)   min_f F_α(f) = (1/(α − 1)) ∫ f^α(x) dx,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°,

where G(X)° is the polar cone of G(X) and Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} is the empirical measure. The maximum likelihood estimation of a log-concave density has the dual form

(2.15)    (D_1)   min_f F_1(f) = ∫ f(x) log f(x) dx,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°.

Let f_α and f_1 be the solutions of (D_α) and (D_1), respectively. For simplicity we drop the explicit notational dependence of f_α, f_1 on n. Since F_α(f) → F_1(f) as α ր 1 for f smooth enough, it is natural to expect some convergence property of f_α to f_1. The main result is summarized as follows.

Theorem 2.16. Suppose d = 1. For all κ > 0 and p ≥ 1, we have the following weighted convergence:

lim_{α ր 1} ∫ (1 + ‖x‖)^κ |f_α(x) − f_1(x)|^p dx = 0.
Moreover, for any closed set S contained in the continuity points of f_1,

lim_{α ր 1} sup_{x ∈ S} (1 + ‖x‖)^κ |f_α(x) − f_1(x)| = 0    for all κ > 0.
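The heuristic F_α → F_1 can be made concrete after subtracting the constant 1/(α − 1), which does not depend on f and hence does not change the minimizer of (D_α): for smooth f, (∫ f^α dx − 1)/(α − 1) → ∫ f log f as α ր 1. A numerical sketch for the standard normal density (assuming NumPy):

```python
import numpy as np

def centered_renyi(f, dx, alpha):
    # (alpha - 1)^{-1} (int f^alpha dx - 1); the centering constant is
    # independent of f, so the minimizer of (D_alpha) is unchanged
    return (np.sum(f ** alpha) * dx - 1.0) / (alpha - 1.0)

x = np.linspace(-30, 30, 600001)
dx = x[1] - x[0]
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

shannon = np.sum(f * np.log(f)) * dx           # int f log f = -log sqrt(2*pi*e)
for alpha in (0.9, 0.99, 0.999):
    print(alpha, centered_renyi(f, dx, alpha) - shannon)   # gap -> 0 as alpha -> 1
```

For the standard normal the limit is available in closed form (∫ f log f = −(1/2) log(2πe)), which makes the shrinking gap easy to read off.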
3. Limit behavior of s-concave densities. Let {f_n}_{n ∈ N} be a sequence of s-concave densities with corresponding measures dν_n = f_n dλ. Suppose ν_n →_d ν. From Borell (1974, 1975) and Brascamp and Lieb (1976), we know that each ν_n is a t-concave measure with t = s/(1 + sd) if −1/d < s < ∞, t = −∞ if s = −1/d, and t = 1/d if s = ∞. This result is proved via different methods by Rinott (1976). Furthermore, if the dimension of the support of ν is d, then it follows from Borell (1974), Theorem 2.2, that the limit measure ν is t-concave and hence has a Lebesgue density with s = t/(1 − td). Here we pursue this type of result in somewhat more detail. Our key dimensionality condition will be formulated in terms of the set C := {x ∈ R^d : liminf_n f_n(x) > 0}. We will show that if

(D1)  either dim(csupp(ν)) = d or dim(C) = d

holds, then the limiting probability measure ν admits an upper semi-continuous s-concave density on R^d. Furthermore, if a sequence of s-concave densities {f_n} converges weakly to some density f (in the sense that the corresponding probability measures converge weakly), then f is s-concave, and f_n converges to f in weighted L_1 metrics and uniformly on any closed set of continuity points of f. The directional derivatives of f_n also converge uniformly in all directions in a local sense.

In the following sections, we will not fully exploit the strength of the results obtained here. The results are interesting in their own right, and careful readers will find them useful as technical support for Sections 2 and 4.

3.1. Limit characterization via dimensionality condition.
Note that C is a convex set. For a general convex set K, we follow the convention (see Rockafellar (1997)) that dim K = dim(aff(K)), where aff(K) is the affine hull of K. It is well known that the dimension of a convex set K is the maximum of the dimensions of the various simplices included in K (cf. Theorem 2.4, Rockafellar (1997)).

We first extend several results in Kim and Samworth (2015) and Cule and Samworth (2010) from the log-concave setting to our s-concave setting. The proofs are all deferred to Appendix 6.2.

Lemma 3.1. Assume (D1). Then csupp(ν) = cl(C).

Lemma 3.2. Let {ν_n}_{n ∈ N} be probability measures with upper semi-continuous s-concave densities {f_n}_{n ∈ N} such that ν_n → ν weakly as n → ∞, where ν is a probability measure with density f. Then f_n →_{a.e.} f, and f can be taken to be f = cl(lim_n f_n), hence upper semi-continuous and s-concave.

In many situations, uniform boundedness of a sequence of s-concave densities gives rise to good stability and convergence properties.

Lemma 3.3. Assume −1/d < s < 0. Let {f_n}_{n ∈ N} be a sequence of s-concave densities on R^d. If dim C = d, where C = {liminf_n f_n > 0} as above, then sup_{n ∈ N} ‖f_n‖_∞ < ∞.

Now we state one limit characterization theorem.
Theorem 3.4. Assume −1/d < s < 0. Under either condition in (D1), ν is absolutely continuous with respect to λ_d, with a version of the Radon-Nikodym derivative given by cl(lim_n f_n), which is an upper semi-continuous s-concave density on R^d.

3.2. Modes of convergence.
It was shown above that weak convergence of s-concave probability measures implies almost everywhere pointwise convergence at the density level. In many applications, we need different or stronger types of convergence. This subsection is devoted to the study of the following two types of convergence:

1. convergence in the ‖·‖_{L_1} metric;
2. convergence in the ‖·‖_∞ metric.

We start by investigating convergence in the ‖·‖_{L_1} metric.

Lemma 3.5. Assume −1/d < s < 0. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then there exist a, b > 0 such that f_n(x) ∨ f(x) ≤ (a‖x‖ + b)^{1/s}.

Once the existence of a suitable integrable envelope function is established, we conclude naturally by the dominated convergence theorem that
Theorem 3.6. Assume −1/d < s < 0. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then for κ < r − d,

(3.1)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f_n(x) − f(x)| dx = 0.
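The weight restriction κ < r − d is essentially sharp: in d = 1 an s-concave density can decay as slowly as ‖x‖^{−r} with r = −1/s, and then (1 + ‖x‖)^κ is integrable against it exactly when κ < r − 1. A sketch with the −1/3-concave density f(x) = (1 + |x|)^{−3} (so r = 3, and f^s = 1 + |x| is convex), using closed-form truncated integrals:

```python
r = 3.0                        # r = -1/s with s = -1/3
c = (r - 1) / 2.0              # normalizing constant of f(x) = c (1 + |x|)^{-r}

def weighted_mass(kappa, T):
    # int_{|x| <= T} (1 + |x|)^kappa f(x) dx, in closed form
    e = kappa - r + 1.0        # exponent of the antiderivative (e != 0 here)
    return 2.0 * c * ((1.0 + T) ** e - 1.0) / e

for kappa in (0.5, 1.9, 2.5):  # the integrability threshold is kappa < r - 1 = 2
    print(kappa, [weighted_mass(kappa, T) for T in (1e2, 1e4, 1e6)])
# the truncated integrals stay bounded for kappa < 2 and blow up for kappa = 2.5
```

So for densities with the heaviest tails permitted in P_s, the weighted L_1 convergence in (3.1) cannot be pushed beyond the stated range of κ.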
Next we examine convergence of s-concave densities in the ‖·‖_∞ norm. We write g = f^s and g_n = f_n^s unless otherwise specified. Since we have established pointwise convergence in Lemma 3.2, classical convex analysis guarantees that the convergence is uniform over compact sets in int(dom(f)). To establish a global uniform convergence result, we only need to control the tail behavior of the class of s-concave functions and the region near the boundary of f. This is accomplished via Lemmas 6.2 and 6.3.

Theorem 3.7. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then for any closed set S contained in the continuity points of f and κ < r = −1/s,

lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f_n(x) − f(x)| = 0.

We note that no assumption on the index s is required here.

3.3. Local convergence of directional derivatives.
It is known in convex analysis that if a sequence of convex functions g_n converges pointwise to g on an open convex set, then the subdifferentials of g_n also 'converge' to the subdifferential of g. If we further assume smoothness of g_n, then local uniform convergence of the derivatives follows automatically. See Theorems 24.5 and 25.7 in Rockafellar (1997) for precise statements. Here we pursue this issue at the level of the transformed densities.

Theorem 3.8. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Let D_f := {x ∈ int(dom(f)) : f is differentiable at x}, and let T ⊂ int(D_f) be any compact set. Then

lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f_n(x) − ∇_ξ f(x)| = 0.
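The convex-analysis fact quoted above (Theorems 24.5 and 25.7 of Rockafellar (1997)) is easy to see numerically. For the smooth convex functions g_n(x) = (x² + n^{−2})^{1/2}, which converge pointwise to g(x) = |x|, the derivatives converge uniformly on any compact set avoiding the non-differentiability point 0. A sketch (assuming NumPy):

```python
import numpy as np

# derivative of the smooth convex g_n(x) = sqrt(x^2 + 1/n^2), g_n -> |x| pointwise
def g_n_prime(x, n):
    return x / np.sqrt(x**2 + 1.0 / n**2)

T = np.linspace(0.5, 2.0, 1001)   # compact set inside {x : |x| is differentiable}
g_prime = np.sign(T)              # derivative of the limit, = 1 on T

for n in (10, 100, 1000):
    print(n, np.max(np.abs(g_n_prime(T, n) - g_prime)))
# the sup-distance over T shrinks roughly like 1/(2 n^2 x^2) at the left edge of T
```

Theorem 3.8 is the analogue of this phenomenon at the level of the densities f_n = g_n^{1/s}, with the compact set kept inside the differentiability region of the limit.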
4. Limiting distribution theory of the divergence estimator.
In this section we establish local asymptotic distribution theory for the divergence estimator f̂_n at a fixed point x₀ ∈ ℝ. Limit distribution theory in shape-constrained estimation was pioneered for monotone density and regression estimators by Prakasa Rao (1969), Brunk (1970), Wright (1981) and Groeneboom (1985). Groeneboom, Jongbloed and Wellner (2001) established pointwise limit theory for the MLEs and LSEs of a convex decreasing density, and also treated pointwise limit theory for estimation of a convex regression function. Balabdaoui, Rufibach and Wellner (2009) established pointwise limit theorems for the MLEs of log-concave densities on ℝ. On the other hand, for nonparametric estimation of s-concave densities, asymptotic theory beyond the Hellinger consistency results for the MLEs established by Seregin and Wellner (2010) has been non-existent. Doss and Wellner (2016) have shown in the case d = 1 that the MLEs have Hellinger convergence rates of order O_p(n^{−2/5}) for each s ∈ (−1, ∞) (which includes the log-concave case s = 0). However, due at least in part to the lack of explicit characterizations of the MLE for s-concave classes, no results concerning limiting distributions of the MLE at fixed points are currently available. In the remainder of this section we formulate results of this type for the Rényi divergence estimators. These results are comparable to the pointwise limit distribution results for the MLEs of log-concave densities obtained by Balabdaoui, Rufibach and Wellner (2009).

In the following, we will see how the natural and strong characterizations developed in Section 2 help us to understand the limit behavior of the Rényi divergence estimator at a fixed point. For this purpose, we assume the true density f₀ = g₀^{−r} satisfies the following:

(A1).
g₀ ∈ G and f₀ = g₀^{−r} is an s-concave density on ℝ, where −1 < s < 0.
(A2). f₀(x₀) > 0.
(A3). g₀ is locally C^k around x₀, where k := max{ k ∈ ℕ : k ≥ 2, g₀^{(j)}(x₀) = 0 for all 2 ≤ j ≤ k − 1, g₀^{(k)}(x₀) ≠ 0 }, and k = 2 if the above set is empty.
(A4). g₀^{(k)} is continuous around x₀.

4.1. Limit distribution theory.
Before we state the main results concerning the limit distribution theory for the Rényi divergence estimator, let us sketch the route by which the theory is developed. We first denote F̂_n(x) := ∫_{−∞}^x f̂_n(t) dt, Ĥ_n(x) := ∫_{−∞}^x F̂_n(t) dt and H_n(x) := ∫_{−∞}^x F_n(t) dt. We also denote r_n := n^{(k+2)/(2k+1)}, s_n := n^{−1/(2k+1)} and l_{n,x₀} := [x₀, x₀ + s_n t]. Due to the form of the characterizations obtained in Theorem 2.12, we define local processes at the level of integrated distribution functions as follows:

Y_n^loc(t) := r_n ∫_{l_{n,x₀}} ( F_n(v) − F_n(x₀) − ∫_{x₀}^v Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j du ) dv;

H_n^loc(t) := r_n ∫_{l_{n,x₀}} ( F̂_n(v) − F̂_n(x₀) − ∫_{x₀}^v Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j du ) dv + Â_n t + B̂_n,

where Â_n := n^{(k+1)/(2k+1)} ( F̂_n(x₀) − F_n(x₀) ) and B̂_n := n^{(k+2)/(2k+1)} ( Ĥ_n(x₀) − H_n(x₀) ) are defined so that Y_n^loc(·) ≥ H_n^loc(·) by virtue of Theorem 2.12. Since we wish to derive asymptotic theory at the level of the underlying convex function, we modify the processes by

(4.1)  Y_n^locmod(t) := Y_n^loc(t)/f₀(x₀) − r_n ∫_{l_{n,x₀}} ∫_{x₀}^v Ψ̂_{k,n,2}(u) du dv,
       H_n^locmod(t) := H_n^loc(t)/f₀(x₀) − r_n ∫_{l_{n,x₀}} ∫_{x₀}^v Ψ̂_{k,n,2}(u) du dv,

where

(4.2)  Ψ̂_{k,n,2}(u) = (1/f₀(x₀)) ( f̂_n(u) − Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j ) + (r/g₀(x₀)) ( ĝ_n(u) − g₀(x₀) − g₀′(x₀)(u − x₀) ).

A direct calculation reveals that with r = −1/s,

H_n^locmod(t) = −(r/g₀(x₀)) · r_n ∫_{l_{n,x₀}} ∫_{x₀}^v ( ĝ_n(u) − g₀(x₀) − (u − x₀)g₀′(x₀) ) du dv + (Â_n t + B̂_n)/f₀(x₀),

and hence

n^{k/(2k+1)} ( ĝ_n(x₀ + s_n t) − g₀(x₀) − s_n t g₀′(x₀) ) = −(g₀(x₀)/r) (d²/dt²) H_n^locmod(t),
n^{(k−1)/(2k+1)} ( ĝ_n′(x₀ + s_n t) − g₀′(x₀) ) = −(g₀(x₀)/r) (d³/dt³) H_n^locmod(t).
(4.3)

It is clear from (4.1) that the order relationship Y_n^locmod(·) ≥ H_n^locmod(·) is still valid for the modified processes. Now by tightness arguments, the limit process H of H_n^locmod, including its derivatives, exists uniquely, giving us the possibility of taking the limit in (4.3) as n → ∞. Finally we relate H to the canonical process H_k defined in Theorem 4.1 by looking at their respective 'envelope' functions Y and Y_k, where Y denotes the limit process of Y_n^locmod and Y_k(t) = ∫_0^t W(s) ds − t^{k+2}. Careful calculation of the limits of Y_n^loc and Ψ̂_{k,n,2} reveals that

Y_n^locmod(t) →_d (1/√(f₀(x₀))) ∫_0^t W(s) ds − ( r g₀^{(k)}(x₀) / ( g₀(x₀)(k+2)! ) ) t^{k+2}.

Now by the scaling property of Brownian motion, W(at) =_d √a W(t), we get the following theorem.

Theorem. Under assumptions (A1)-(A4), we have

(4.4)  ( n^{k/(2k+1)} ( ĝ_n(x₀) − g₀(x₀) ),  n^{(k−1)/(2k+1)} ( ĝ_n′(x₀) − g₀′(x₀) ) )
  →_d ( −[ g₀(x₀)^{2k} g₀^{(k)}(x₀) / ( r^{2k} f₀(x₀)^k (k+2)! ) ]^{1/(2k+1)} H_k^{(2)}(0),
        −[ g₀(x₀)^{2k−2} (g₀^{(k)}(x₀))³ / ( r^{2k−2} f₀(x₀)^{k−1} ((k+2)!)³ ) ]^{1/(2k+1)} H_k^{(3)}(0) ),

and

(4.5)  ( n^{k/(2k+1)} ( f̂_n(x₀) − f₀(x₀) ),  n^{(k−1)/(2k+1)} ( f̂_n′(x₀) − f₀′(x₀) ) )
  →_d ( [ r f₀(x₀)^{k+1} g₀^{(k)}(x₀) / ( g₀(x₀)(k+2)! ) ]^{1/(2k+1)} H_k^{(2)}(0),
        [ r³ f₀(x₀)^{k+2} (g₀^{(k)}(x₀))³ / ( g₀(x₀)³ ((k+2)!)³ ) ]^{1/(2k+1)} H_k^{(3)}(0) ),

where H_k is the unique lower envelope of the process Y_k satisfying:
1. H_k(t) ≤ Y_k(t) for all t ∈ ℝ;
2. H_k^{(2)} is concave;
3. H_k(t) = Y_k(t) if the slope of H_k^{(2)} decreases strictly at t.

Remark. We note that the minus sign appearing in (4.4) is due to the convexity of ĝ_n, g₀ and the concavity of the limit process H_k^{(2)}. The dependence of the constants appearing in the limit on the local smoothness is optimal in view of Theorem 2.23 in Seregin and Wellner (2010).

Remark.
Assume −1 < s < 0 and k = 2. Let f₀ = exp(ϕ) be a log-concave density, where ϕ : ℝ → ℝ is the underlying concave function. Then f₀ is also s-concave. Let g_s := f₀^{−1/r} = exp(−ϕ/r) be the underlying convex function when f₀ is viewed as an s-concave density. Then direct calculation yields that

g_s^{(2)}(x₀) = (1/r²) g_s(x₀) ( ϕ′(x₀)² − r ϕ″(x₀) ).

Hence the constant before H_k^{(2)}(0) appearing in (4.5) becomes

( ( f₀(x₀)³ ϕ′(x₀)²/r + f₀(x₀)³ |ϕ″(x₀)| ) / 4! )^{1/5}.

Note that the second term in the above display is exactly the constant involved in the limiting distribution when f₀(x₀) is estimated via the log-concave MLE; see (2.2), page 1305 in Balabdaoui, Rufibach and Wellner (2009). The first term is non-negative and hence illustrates the price we need to pay by estimating a true log-concave density via the Rényi divergence estimator over the larger class of s-concave densities. We also note that the additional term vanishes as r → ∞, or equivalently s ր 0.

Estimation of the mode.
We consider the estimation of the mode of an s-concave density f(·), defined by M(f) := inf{ t ∈ ℝ : f(t) = sup_{u∈ℝ} f(u) }.

Theorem. Assume (A1)-(A4) hold. Then

(4.6)  n^{1/(2k+1)} ( m̂_n − m ) →_d ( g₀(m)² ((k+2)!)² / ( r² f₀(m) (g₀^{(k)}(m))² ) )^{1/(2k+1)} M( H_k^{(2)} ),

where m̂_n = M(f̂_n), m = M(f₀).

By Theorem 2.26 in Seregin and Wellner (2010), the dependence of the constant on local smoothness is optimal when k = 2. Here we show that this dependence is also optimal for k > 2. Let P be a class of densities dominated by the canonical Lebesgue measure on ℝ^d, and let T : P → ℝ be any functional. For an increasing convex loss function l(·) on ℝ⁺, we define the minimax risk as

(4.7)  R_l(n; T, P) := inf_{t_n} sup_{p∈P} E_{p^{⊗n}} l( |t_n(X₁, …, X_n) − T(p)| ),

where the infimum is taken over all possible estimators of T(p) based on X₁, …, X_n. Our basic method of deriving a minimax lower bound is based on the following result of Jongbloed (2000).

Theorem 4.5 (Jongbloed (2000)). Let {p_n} be a sequence of densities in P such that lim sup_{n→∞} √n h(p_n, p) ≤ τ for some density p ∈ P. Then

(4.8)  lim inf_{n→∞} R_l(n; T, {p, p_n}) / l( exp(−2τ²)/4 · |T(p_n) − T(p)| ) ≥ 1.

For fixed g ∈ G and f := g^{1/s} = g^{−r}, let m := M(f) be the mode of f (equivalently, the smallest minimizer of the convex function g). Consider a class of local perturbations of g: For every ǫ >
0, define

˜g_ǫ(x) :=
  g(m − ǫc_ǫ) + (x − m + ǫc_ǫ) g′(m − ǫc_ǫ),   x ∈ [m − ǫc_ǫ, m − ǫ),
  g(m + ǫ) + (x − m − ǫ) g′(m + ǫ),            x ∈ [m − ǫ, m + ǫ),
  g(x),                                        otherwise.

Here c_ǫ > 1 is chosen so that ˜g_ǫ is continuous at m − ǫ. This construction of a perturbation class is also seen in Balabdaoui, Rufibach and Wellner (2009) and Groeneboom, Jongbloed and Wellner (2001). By a Taylor expansion at m − ǫ we can easily see that c_ǫ = 3 + o(1) as ǫ →
0. Since ˜ f ǫ := ˜ g − rǫ is not a density, wenormalize it by f ǫ ( x ) := ˜ f ǫ ( x ) R R ˜ f ǫ ( y )d y . Now f ǫ is s -concave for each ǫ > m − ǫ .The following result follows from direct calculation. For a proof, we referto Appendix section 6.3 . Lemma . Assume (A1)-(A4). Then h ( f ǫ , f ) = ζ k r f ( m )( g ( k ) ( m )) g ( m ) ǫ k +1 + o ( ǫ k +1 ) , where ζ k = 1108( k !) ( k + 1)( k + 2)(2 k + 1) (cid:20) − · k +2 (2 k + 1)(3 k +2 + k + k − k + 1)( k + 2) (cid:18) k +1 −
1) + 2 · k (2 k + 1)(2 k (2 k −
9) + 27) (cid:19)(cid:21) + 2 k (2 k + 1)3( k !) ( k + 1)(2 k + 1) .

Theorem. For an s-concave density f₀, let SC_{n,τ}(f₀) be defined by

SC_{n,τ}(f₀) := { f : f is an s-concave density, h²(f, f₀) ≤ τ²/n }.

Let m = M(f₀) be the mode of f₀. Suppose (A1)-(A4) hold. Then

sup_{τ>0} lim inf_{n→∞} n^{1/(2k+1)} inf_{t_n} sup_{f∈SC_{n,τ}(f₀)} E_f |t_n − M(f)| ≥ ρ_k ( g₀(m)² / ( r² f₀(m) (g₀^{(k)}(m))² ) )^{1/(2k+1)},

where ρ_k = (2(2k+1)ζ_k e)^{−1/(2k+1)}/4.

Proof.
Take l(x) = |x|. Let ǫ = cn^{−1/(2k+1)}, and let γ := r² f₀(m) (g₀^{(k)}(m))² / g₀(m)², f_n := f_{cn^{−1/(2k+1)}}. Then lim sup_{n→∞} n h²(f_n, f₀) = ζ_k γ c^{2k+1}. Applying Theorem 4.5, we find that

lim inf_{n→∞} n^{1/(2k+1)} R_l(n; T, {f₀, f_n}) ≥ (c/4) exp( −2 ζ_k γ c^{2k+1} ).

Now we choose c = (2(2k+1)ζ_k γ)^{−1/(2k+1)} to conclude.
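The last optimization step can be checked numerically. In the hypothetical sketch below, A plays the role of the product ζ_kγ (set to an arbitrary positive value), and a grid search over c confirms that c ↦ (c/4)exp(−2Ac^{2k+1}) is maximized at the closed-form choice c = (2(2k+1)A)^{−1/(2k+1)}.

```python
import math

# Grid-check of the choice of c in the proof above: maximize
# c/4 * exp(-2*A*c^(2k+1)) over c > 0 and compare with the closed-form
# maximizer c* = (2*(2k+1)*A)^(-1/(2k+1)). Here A stands for zeta_k * gamma;
# the values A = 0.7 and k = 2 are arbitrary choices for illustration.
k, A = 2, 0.7
bound = lambda c: (c / 4.0) * math.exp(-2.0 * A * c ** (2 * k + 1))

c_grid = max((1e-4 * i for i in range(1, 40000)), key=bound)   # c in (0, 4)
c_star = (2 * (2 * k + 1) * A) ** (-1.0 / (2 * k + 1))
assert abs(c_grid - c_star) < 1e-3
```

Setting the derivative of c·exp(−2Ac^{2k+1}) to zero gives (1 − 2A(2k+1)c^{2k+1})·exp(−2Ac^{2k+1}) = 0, which is exactly the stated c*.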
5. Discussion.
We have shown in this paper that the class of s-concave densities can be approximated and estimated via Rényi divergences in a robust and stable way. We have also developed local asymptotic distribution theory for the divergence estimator, which suggests that the convexity constraint is the main source of complexity within the class of s-concave densities, regardless of heavy tails. In the rest of this section, we sketch some related problems and future research directions.

5.1. Behavior of the Rényi projection for generic measures Q when s < −1/(d+1). We have considered in this paper two regions for the index s: (1) −1/(d+1) < s < 0; (2) −1/d < s ≤ −1/(d+1). In case (1), we showed that, starting from a generic measure Q with the interior of its convex support non-void and a finite first moment, the Rényi projection through (1.3) exists and enjoys nice continuity properties that cover both on- and off-the-model situations. In case (2), we showed that the Rényi projection for the empirical measure still enjoys such continuity properties when Q is a probability measure corresponding to a true s-concave density with a finite first moment.

It remains open to investigate the behavior of the Rényi projection in region (2) for a generic measure Q. If Q does not admit a first moment, i.e. ∫‖x‖ dQ(x) = ∞, then the first term in the functional (1.3) diverges for any candidate convex function. We conjecture that the Rényi divergence projection fails to exist in this case. We do not know if the Rényi projection exists when −1/d < s ≤ −1/(d+1) and Q ∉ P_s but ∫‖x‖ dQ(x) < ∞. It should be mentioned that the MLEs for the classes P_s exist (for an increasingly large sample size n as s ց −1/d), and are Hellinger consistent for −1/d < s < 0. But we do not yet know any continuity properties of the maximum likelihood projection 'off the model'. This leaves the interval −1/d < s ≤ −1/(d+1) presently without a nicely stable nonparametric estimation procedure.
See Koenker and Mizera (2010), pages 3008 and 3016, for some further discussion.

5.2. Global rates of convergence for Rényi divergence estimators.
Classical empirical process theory relates maximum likelihood estimators to Hellinger loss via 'basic inequalities', as coined in van de Geer (2000) and van der Vaart and Wellner (1996). This reduces the problem of global rates of convergence to the study of the modulus of continuity of an empirical process indexed by a suitable transformation of the function class of interest. We expect that similar 'basic inequalities' can be exploited to relate the Rényi divergence estimators to some divergence (not necessarily the Hellinger distance). We also expect some uniformity in the rates of convergence for the Rényi divergence estimators, as observed by Kim and Samworth (2015) in the case of the MLEs for log-concave densities.

5.3. Conjectures about the global rates in higher dimensions.
It is now well understood from the work of Doss and Wellner (2016) that the MLEs for s-concave densities (−1 < s < 0) and log-concave densities in dimension 1 converge at rates no worse than O_p(n^{−2/5}) in Hellinger loss. In higher dimensions, Kim and Samworth (2015) provide an important lower bound on the bracketing entropy for a subclass of log-concave densities, of order ǫ^{−((d/2)∨(d−1))} in Hellinger distance, and a matching upper bound up to logarithmic factors for d ≤ 3. The lack of corresponding results in discrete convex geometry precludes further upper bounds beyond d = 3. If a matching upper bound can be achieved for d ≥ 4, then the conjectured rates r_n in squared Hellinger distance become r_n = O(n^{−2/(d−1)}), d ≥ 4.

Adaptive estimation of concave-transformed classes of functions.
The rates conjectured above are conservative in that they are derived from the global point of view. From a local perspective, adaptive estimation may be possible when the underlying function/density exhibits special structure. In fact, it is shown by Guntuboyina and Sen (2015) that in the univariate convex regression setting, if the underlying convex function is piecewise linear, then the rate of convergence for the global risk in the discrete ℓ₂ norm adapts to the nearly parametric rate n^{−1/2} (up to logarithmic factors). It would be interesting to examine whether the same phenomenon can be observed for the MLEs/Rényi divergence estimators, and more generally for minimum contrast estimators of concave-transformed classes of functions.
6. Proofs.
Proofs for Section 2.
Proof of Lemma 2.1.
Let Q ∈ Q. Then by letting g(x) := ‖x‖ + 1, we have

L(Q) ≤ L(g, Q) = ∫ (1 + ‖x‖) dQ + (1/|β|) ∫ (1 + ‖x‖)^β dx < ∞,

by noting that Q ∈ Q and −β = −1 − 1/s > d. Now assume L(Q) < ∞. If Q ∉ Q, i.e. ∫‖x‖ dQ = ∞, then, since for each g ∈ G we can find some a, b > 0 such that g(x) ≥ a‖x‖ − b, we have

L(g, Q) = ∫ g dQ + (1/|β|) ∫ g^β dx ≥ ∫ (a‖x‖ − b) dQ = ∞,

a contradiction. This implies Q ∈ Q.

Proof of Theorem 2.2.
We note that L(Q) < ∞ by Lemma 2.1. Hence we can take a sequence {g_n}_{n∈ℕ} ⊂ G such that ∞ > M ≥ L(g_n, Q) ց L(Q) as n → ∞ for some M > 0. Now we claim that, for all x₀ ∈ int(csupp(Q)),

(6.1)  sup_{n∈ℕ} g_n(x₀) < ∞.

Denote ǫ_n ≡ inf_{x∈ℝ^d} g_n(x). First note

L(g_n, Q) ≥ ∫ g_n dQ = ∫ g_n 1(g_n ≤ g_n(x₀)) dQ + ∫ g_n 1(g_n > g_n(x₀)) dQ
= ∫ ( g_n − g_n(x₀) + g_n(x₀) ) 1(g_n ≤ g_n(x₀)) dQ + ∫ g_n 1(g_n > g_n(x₀)) dQ
≥ g_n(x₀) − ( g_n(x₀) − ǫ_n ) Q( {g_n(·) ≤ g_n(x₀)} ).

If g_n(x₀) > ǫ_n, then x₀ is not an interior point of the closed convex set {g_n ≤ g_n(x₀)}, which implies Q( {g_n(·) ≤ g_n(x₀)} ) ≤ h(Q, x₀), where h(·,·) is defined in Lemma 7.9. Hence, in this case, the above term is bounded from below by

L(g_n, Q) ≥ g_n(x₀) − ( g_n(x₀) − ǫ_n ) h(Q, x₀) ≥ g_n(x₀) ( 1 − h(Q, x₀) ).

This inequality also holds for g_n(x₀) = ǫ_n, which implies that

g_n(x₀) ≤ L(g_n, Q)/(1 − h(Q, x₀)) ≤ M/(1 − h(Q, x₀))

by the first statement of Lemma 7.9. Thus we have verified (6.1). Now we invoke Lemma 7.14, checking conditions (A1)-(A2) there as follows: (A1) follows by (6.1); (A2) follows by the choice of g_n since sup_{n∈ℕ} L(g_n, Q) ≤ M. By Lemma 7.13 we can find a subsequence {g_{n(k)}}_{k∈ℕ} of {g_n}_{n∈ℕ} and a function ˜g ∈ G such that {x ∈ ℝ^d : sup_{n∈ℕ} g_n(x) < ∞} ⊂ dom(˜g), and

lim_{k→∞, x→y} g_{n(k)}(x) = ˜g(y),  for all y ∈ int(dom(˜g)),
lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ ˜g(y),  for all y ∈ ℝ^d.

Again for simplicity we assume that {g_n} satisfies the above properties. We note that

L(Q) = lim_{n→∞} ( ∫ g_n dQ + (1/|β|) ∫ g_n^β dx )
≥ lim inf_{n→∞} ∫ g_n dQ + (1/|β|) lim inf_{n→∞} ∫ g_n^β dx
≥ ∫ ˜g dQ + (1/|β|) ∫ ˜g^β dx = L(˜g, Q) ≥ L(Q),

where the third line follows from Fatou's lemma for the first term, and from Fatou's lemma together with the fact that the boundary of a convex set has Lebesgue measure zero for the second term (Theorem 1.1, Lang (1986)). This establishes L(˜g, Q) = L(Q), and hence ˜g is the desired minimizer. Since ˜g ∈ G achieves its minimum, we may assume x₀ ∈ Arg min_{x∈ℝ^d} ˜g(x). If ˜g(x₀) = 0, since ˜g has domain with non-empty interior, we can choose x₁, …, x_d ∈ dom(˜g) such that {x₀, x₁, …, x_d} are in general position. Then by Lemma 7.15 we find L(˜g, Q) = ∞, a contradiction. This implies ˜g must be bounded away from zero.

For the last statement: since ˜g is a minimizer of (1.3) and ˜g is bounded away from zero, L(˜g + c, Q) is well-defined for all |c| ≤ δ with δ > 0 small, and we must necessarily have (d/dc) L(˜g + c, Q)|_{c=0} = 0. On the other hand, it is easy to calculate that (d/dc) L(˜g + c, Q) = 1 − ∫ (˜g(x) + c)^{β−1} dx. This yields the desired result by noting β − 1 = 1/s.

Proof of Lemma 2.3.
Let g, h be two minimizers for P_Q. Since ψ_s(x) = x^β/|β| is strictly convex on [0, ∞), L(t·g + (1−t)·h, Q) is strictly convex in t ∈ [0, 1] unless g = h a.e. with respect to the canonical Lebesgue measure. We claim that if two closed convex functions g, h agree a.e. with respect to the canonical Lebesgue measure, then they must agree everywhere, thus closing the argument. It is easy to see that int(dom g) = int(dom h). Since int(dom(g)) ≠ ∅, we have ri(dom g) = int(dom g) = int(dom h) = ri(dom h). Also note that a convex function is continuous in the interior of its domain, and hence almost-everywhere equality implies everywhere equality within the interior of the domain, i.e. g|_{int(dom g)} = h|_{int(dom h)}. Now by Corollary 7.3.4 in Rockafellar (1997) and the closedness of g, h, we find that g = cl g = cl h = h.

Proof of Theorem 2.5.
To show (2.1), we use Skorokhod's theorem: since Q_n →_d Q, there exist random vectors X_n ∼ Q_n and X ∼ Q defined on a common probability space (Ω, B, ℙ) satisfying X_n →_{a.s.} X. Then by Fatou's lemma, we have ∫‖x‖ dQ = E[‖X‖] ≤ lim inf_{n→∞} E[‖X_n‖] = lim inf_{n→∞} ∫‖x‖ dQ_n.

Assume (2.2). We first claim that

(6.2)  lim sup_{n→∞} L(Q_n) ≤ L(g, Q) = L(Q).

Let g_n(·), g(·) be defined as in the statement of the theorem. Note that lim sup_{n→∞} L(g_n, Q_n) ≤ lim_{n→∞} L(g^{(ǫ)}, Q_n) = L(g^{(ǫ)}, Q). Here g^{(ǫ)} is the Lipschitz approximation of g defined in Lemma 7.8, and the last equality follows from the moment convergence condition (2.2) by rewriting g^{(ǫ)}(x) = [g^{(ǫ)}(x)/(1+‖x‖)](1+‖x‖), noting that the Lipschitz condition on g^{(ǫ)} implies boundedness of g^{(ǫ)}(x)/(1+‖x‖). By construction of {g^{(ǫ)}}_{ǫ>0} we know that if x₀ is a minimizer of g, then it is also a minimizer of g^{(ǫ)}. This implies that the function class {g^{(ǫ)}}_{ǫ>0} is bounded away from zero, since g is bounded away from zero by Theorem 2.2: inf_{x∈ℝ^d} g^{(ǫ)}(x) ≥ ǫ₀ holds for some ǫ₀ > 0 and all ǫ > 0. Letting ǫ ց 0, in view of Lemma 7.8, by the monotone convergence theorem applied to g^{(ǫ)} and (g^{(ǫ)})^β, we have verified (6.2).

Next, we claim that, for all x₀ ∈ int(csupp(Q)),

(6.3)  lim sup_{n→∞} g_n(x₀) < ∞.

Denote ǫ_n ≡ inf_{x∈ℝ^d} g_n(x). Note that by essentially the same argument as in the proof of Theorem 2.2, we have g_n(x₀) ≤ L(Q_n)/(1 − h(Q_n, x₀)). By taking lim sup as n → ∞, (6.3) follows by virtue of Lemma 7.9 and (6.2).

Now we proceed to show (2.3) and (2.4). By invoking Lemma 7.14, we can easily check that all conditions are satisfied (note that we also used (6.2) here). Thus we can find a subsequence {g_{n(k)}}_{k∈ℕ} of {g_n}_{n∈ℕ} with g_{n(k)}(x) ≥ a‖x‖ − b for all x ∈ ℝ^d and all k ∈ ℕ, with some a, b > 0. Hence by Lemma 7.13, we can find a function ˜g ∈ G such that {x ∈ ℝ^d : lim sup_{k→∞} g_{n(k)}(x) < ∞} ⊂ dom(˜g), and

lim_{k→∞, x→y} g_{n(k)}(x) = ˜g(y),  for all y ∈ int(dom(˜g)),
lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ ˜g(y),  for all y ∈ ℝ^d.

Again for simplicity we assume that {g_n} admits the above properties. Now define random variables H_n ≡ g_n(X_n) − (a‖X_n‖ − b). Then by the same reasoning as in the proof of Theorem 2.2, we have

lim inf_{n→∞} L(Q_n) = lim inf_{n→∞} ( ∫ g_n dQ_n + (1/|β|) ∫ g_n^β dx )
≥ lim inf_{n→∞} E[ H_n + a‖X_n‖ − b ] + (1/|β|) ∫ ˜g^β dx
≥ E[ lim inf_{n→∞} H_n ] + a lim inf_{n→∞} ∫‖x‖ dQ_n − b + (1/|β|) ∫ ˜g^β dx
= L(˜g, Q) + a ( lim inf_{n→∞} ∫‖x‖ dQ_n − ∫‖x‖ dQ )
≥ L(Q) + a ( lim inf_{n→∞} ∫‖x‖ dQ_n − ∫‖x‖ dQ ).

Note that the expectation is taken with respect to the probability space (Ω, B, ℙ) defined above. This establishes that if (2.2) holds, then

(6.4)  lim inf_{n→∞} L(Q_n) ≥ L(˜g, Q) ≥ L(Q).

Conversely, if (2.2) does not hold, then there exists a subsequence {Q_{n(k)}} such that lim inf_{k→∞} ∫‖x‖ dQ_{n(k)} > ∫‖x‖ dQ. However, this means that lim inf_{k→∞} L(Q_{n(k)}) > L(Q), which contradicts (2.3). Hence if (2.3) holds, then (2.2) holds. Combining (6.4) and (6.2), by virtue of Lemma 2.3 we find ˜g ≡ g. This completes the proof of (2.3) and (2.4).

We now show (2.5). First we claim that {x̂_n ∈ Arg min_{x∈ℝ^d} g_n(x)}_{n∈ℕ} is bounded. If not, then we can find a subsequence such that ‖x̂_{n(k)}‖ → ∞ as k → ∞. However, this means that g_{n(k)}(x) ≥ g_{n(k)}(x̂_{n(k)}) ≥ a‖x̂_{n(k)}‖ − b → ∞ as k → ∞ for any x, a contradiction. Next we claim that there exists ǫ₀ > 0 such that inf_{k∈ℕ} ǫ_{n(k)} ≥ ǫ₀ holds for some subsequence {ǫ_{n(k)}}_{k∈ℕ} of {ǫ_n}_{n∈ℕ}. This can be seen as follows: boundedness of {x̂_n} implies x̂_{n(k)} → x* as k → ∞ for some subsequence {x̂_{n(k)}}_{k∈ℕ} ⊂ {x̂_n}_{n∈ℕ} and some x* ∈ ℝ^d. Hence by (2.4) we have lim sup_{k→∞} f_{n(k)}(x̂_{n(k)}) ≤ f(x*) < ∞, since f(·) is bounded. This implies that sup_{k∈ℕ} ‖f_{n(k)}‖_∞ < ∞, which is equivalent to the claim. As before, we will understand the notation for the whole sequence as a suitable subsequence. Now we have g_n(x) ≥ (a‖x‖ − b) ∨ ǫ₀ for all x ∈ ℝ^d. This gives rise to

(6.5)  f_n(x) ≤ ( (a‖x‖ − b) ∨ ǫ₀ )^{1/s},  for all x ∈ ℝ^d.

Note that −1/(d+1) < s < 0 implies 1/s < −(d+1), whence we get an integrable envelope. Now a simple application of the dominated convergence theorem yields the desired result (2.5), in view of the fact that the boundary of a convex set has Lebesgue measure zero (cf. Theorem 1.1 in Lang (1986)). Finally, (2.6) and (2.7) are direct consequences of Theorems 3.7 and 3.8 by noting that (2.5) entails f_n →_d f (in the sense that the corresponding probability measures converge weakly).

Proof of Corollary 2.7.
It is known by Varadarajan's theorem (cf. Dudley (2002), Theorem 11.4.1) that Q_n converges weakly to Q with probability 1. Further, by the strong law of large numbers (SLLN), we know that ∫‖x‖ dQ_n →_{a.s.} ∫‖x‖ dQ. This verifies all conditions required in Theorem 2.5.

Proof of Corollary 2.8.
The conclusion follows from Corollary 2.7 if −1/(d+1) < s < 0, so suppose −1/d < s ≤ −1/(d+1). Since f ∈ P_{s′}, we may write f = g^{1/s′} where g is convex. If f is unbounded, then g(x₀) = 0 for some x₀ ∈ ℝ^d. By Lemma 7.15 with r′ = −1/s′, it follows that ∫ f = ∞, contradicting the fact that f is a density. Thus f must necessarily be bounded. To see that f has a finite mean, note that by Lemma 3.5, f(x) ≤ (b + a‖x‖)^{1/s′} where a, b > 0 and r′ ≡ −1/s′ > d + 1. Thus ∫_{ℝ^d} ‖x‖ f(x) dx ≤ ∫_{ℝ^d} ‖x‖ (b + a‖x‖)^{−r′} dx < ∞. Now note that (2.8) holds by the existence of the Rényi divergence estimator for the empirical measure (cf. Theorem 4.1 in Koenker and Mizera (2010)) and the same argument as in the proof of Theorem 2.5. Also note that, by the proof of Theorem 3.7, (2.8) is enough to ensure (2.10). Since f is continuous on the interior of its domain, we see that (2.10) implies weak convergence: let Q̂_n be the measures corresponding to f̂_n; then Q̂_n → Q weakly as n → ∞. Now the rest follows immediately from Theorems 3.6 and 3.8.

Proof of Theorem 2.9.
Denote L(·) := L(·, Q). We first claim:

Claim. g = arg min_{g∈G} L(g) if and only if lim_{tց0} (L(g + th) − L(g))/t ≥ 0 holds for all h : ℝ^d → ℝ such that there exists t₀ > 0 with g + th ∈ G for all t ∈ (0, t₀).

To see this, we only have to show sufficiency. Suppose g is not a minimizer of L(·). By Theorem 2.2 we know there exists ĝ ∈ G such that ĝ = g(·|Q). By convexity, we have that for any t > 0,

L( g + t(ĝ − g) ) ≤ (1 − t) L(g) + t L(ĝ).

This implies that, letting h = ĝ − g and t₀ = 1,

( L(g + th) − L(g) )/t ≤ (1/t)( (1 − t) L(g) + t L(ĝ) − L(g) ) = −( L(g) − L(ĝ) ),

and thus lim_{tց0} ( L(g + th) − L(g) )/t ≤ −( L(g) − L(ĝ) ) < 0, where the strict inequality follows from Lemma 2.3. This proves our claim. Now the theorem follows from a simple calculation:

0 ≤ lim_{tց0} (1/t)( L(g + th) − L(g) ) = ∫ h dQ − ∫ h · g^{1/s} dλ,

as desired.

Proof of Corollary 2.10.
Let g₀ ≡ g(·|Q). Then by Theorem 2.2 and Lemma 7.10, we find that there exist some a, b > 0 such that g₀(x) ≥ a‖x‖ + b. Now take v ∈ ∂h(0), i.e. h(x) ≥ h(0) + vᵀx for all x ∈ ℝ^d. Hence for t > 0, we have

g₀(x) + t h(x) ≥ a‖x‖ + b + t( h(0) + vᵀx ) ≥ (a − t‖v‖)‖x‖ + (b + t h(0)),

which implies that g₀ + th ∈ G for t > 0 sufficiently small.

Proof of Theorem 2.12.
We first note that if F is a distribution function for a probability measure supported on [X_(1), X_(n)], and h : [X_(1), X_(n)] → ℝ is an absolutely continuous function, then integration by parts (Fubini's theorem) yields

(6.6)  ∫ h dF = h(X_(n)) − ∫_{X_(1)}^{X_(n)} h′(x) F(x) dx.

First we assume g_n = ĝ_n. For fixed t ∈ [X_(1), X_(n)], let h be a convex function whose derivative is given by h′(x) = −1(x ≤ t). Now by Theorem 2.9 we find that ∫ h dF̂_n ≤ ∫ h dF_n. Plugging into (6.6), we find that ∫_{X_(1)}^t F̂_n(x) dx ≤ ∫_{X_(1)}^t F_n(x) dx. For t ∈ S_n(ĝ_n), let h be the function with derivative h′(x) = 1(x ≤ t). It is easy to see that ĝ_n + th is convex for t in a neighborhood of 0, whence equality holds and ĝ_n satisfies (2.13). Conversely, suppose g_n satisfies (2.13). In view of the proof of Theorem 2.9, we only have to show that (2.12) holds for all functions h : ℝ → ℝ which are linear on [X_(i), X_(i+1)] (i = 1, …, n − 1) and such that g_n + th is convex for small t > 0. Since g_n is a linear function between two consecutive knots, h must be convex between consecutive knots. This implies that the derivative of such an h can be written as h′(x) = Σ_{j=2}^n β_j 1(x ≤ X_(j)), with β₂, …, β_n satisfying β_j ≤ 0 if X_(j) ∉ S_n(g_n). Now again by (6.6) we have

∫ h dF̂_n = h(X_(n)) − Σ_{j=2}^n β_j ∫_{X_(1)}^{X_(j)} F̂_n(x) dx ≤ h(X_(n)) − Σ_{j=2}^n β_j ∫_{X_(1)}^{X_(j)} F_n(x) dx = ∫ h dF_n,

as desired.

Proof of Corollary 2.13.
This follows directly from Theorem 2.12 by noting that for x₀ < x₁ < x₂ we have

(1/(x₂ − x₁)) ∫_{x₁}^{x₂} F̂_n(x) dx ≤ (1/(x₂ − x₁)) ∫_{x₁}^{x₂} F_n(x) dx,

and

(1/(x₁ − x₀)) ∫_{x₀}^{x₁} F̂_n(x) dx ≥ (1/(x₁ − x₀)) ∫_{x₀}^{x₁} F_n(x) dx.

Now letting x₂ ց x₁ and x₀ ր x₁, we find that F̂_n(x₁) ≤ F_n(x₁) by right continuity, and F̂_n(x₁) ≥ F_n(x₁−) = F_n(x₁) − 1/n.

Proof of Theorem 2.14.
The proof closely follows the proof of Theorem 2.7 of Dümbgen, Samworth and Schuhmacher (2011). For the reader's convenience we give a full proof here. Let P denote the probability distribution corresponding to F. We first show necessity by assuming g = g(·|Q). By Corollary 2.10 applied to h(x) = ±x, we find by Fubini's theorem that

0 = ∫_ℝ x d(Q − P)(x) = ∫_ℝ (F − G)(t) dt,

which proves (1). Now we turn to (2). Since the map s ↦ (s − x)₊ is convex, again by Corollary 2.10 we find

0 ≤ ∫_ℝ (s − x)₊ d(Q − P)(s) = −∫_{−∞}^x (F − G)(t) dt,

where in the last equality we used the proved fact that ∫_ℝ (F − G) dλ = 0. Now we assume x ∈ ˜S(g), and discuss two different cases to conclude. If x ∈ ∂(dom(g)), then let h(s) = −(s − x)₊; it is easy to see that g + th ∈ G for small t > 0, and hence

0 ≤ ∫ h(s) d(Q − P)(s) = ∫_{−∞}^x (F − G)(t) dt.

If x ∈ int(dom(g)), then g′(x − δ) < g′(x + δ) for small δ > 0. Define

H′_δ(u) = −( (g′(u) − g′(x − δ)) / (g′(x + δ) − g′(x − δ)) ) 1{u ∈ [x−δ, x+δ]} − 1{u > x+δ},

whose integral H_δ(s) := ∫_{−∞}^s H′_δ(u) du serves as an approximation of −(s − x)₊ as δ ց 0. Note that

(g + tH_δ)(s) = g(s) − ( t/(g′(x + δ) − g′(x − δ)) ) ∫_{s∧(x−δ)}^{s∧(x+δ)} ( g′(u) − g′(x − δ) ) du − t( s − (x + δ) )₊,

implying g + tH_δ ∈ G for small t > 0 (depending on δ). Then by Theorem 2.9,

0 ≤ ∫ H_δ(s) d(Q − P)(s) → −∫ (s − x)₊ d(Q − P)(s) = ∫_{−∞}^x (F − G)(t) dt,

as δ ց 0, where the convergence follows easily from the dominated convergence theorem. This proves (2). Now we show sufficiency by assuming (1)-(2). Consider a Lipschitz continuous function ∆(·) with Lipschitz constant L. Then

∫ ∆ d(Q − P) = ∫ ∆′ (F − G) dλ = −∫ (L − ∆′)(F − G) dλ
= −∫_ℝ ( ∫_{−L}^{L} 1{s > ∆′(t)} ds ) (F − G)(t) dt
= −∫_{−L}^{L} ∫_{A(∆′,s)} (F − G)(t) dt ds,

where the second line follows from (1), and A(∆′, s) := { t ∈ ℝ : ∆′(t)
0, by the monotone convergence theorem we find that ∫ g₀ dQ = ∫ g₀ dP and that ∫ g dQ ≥ ∫ g dP. This yields

L(g₀, Q) ≥ L(g₀, P) ≥ L(g, P) = L(g, Q),

where the second inequality follows from the Fisher consistency of the functional L(·,·) and the fact that P is the distribution corresponding to g.

Before we prove Theorem 2.16, we will need an elementary lemma.

Lemma. Fix a sequence 0 < α_n < 1 with α_n ր 1. Let f_{α_n} be an (α_n − 1)-concave density on ℝ. Let g_{α_n} := f_{α_n}^{α_n − 1} be the underlying convex function. Suppose the {g_{α_n}} are linear on [a, b] with lim_{n→∞} f_{α_n}(a) = γ_a ∈ [0, ∞] and lim_{n→∞} f_{α_n}(b) = γ_b ∈ [0, ∞]. Then for all x ∈ [a, b],

(6.7)  f_{α_n}(x) → exp( ((log γ_b − log γ_a)/(b − a)) (x − a) + log γ_a ),

where exp(−∞) := 0 and exp(∞) := ∞.

Proof of Lemma 6.1.
First assume γ b = γ a and γ a , γ b ∈ (0 , ∞ ). Fornotational convenience we drop explicit dependence on n and the limit istaken as α ր
1. Let γ a,α = f α ( a ) = g α ( a ) / ( α − and γ b,α = f α ( b ) = g α ( b ) / ( α − . For any x ∈ [ a, b ],lim α → log f α ( x ) = lim α → α − (cid:18) γ α − b,α − γ α − a,α b − a ( x − a ) + γ α − a,α (cid:19) = lim α → α − (cid:18) γ α − b − γ α − a b − a ( x − a ) · γ α − b,α − γ α − a,α γ α − b − γ α − a + γ α − a,α (cid:19) ≡ log γ a + lim α → α − (cid:18) ( γ α − b − γ α − a ) ( x − a )( b − a ) · γ α − a,α · r α + 1 (cid:19) . (6.8)Since γ α − a,α →
1, we claim that it suffices to show that r α ≡ γ α − b,α − γ α − a,α γ α − b − γ α − a → α → . (6.9)To see this, assume without loss of generality that γ a > γ b and hence γ α − b − γ α − a >
0. Suppose that (6.9) holds and let ǫ >
0. Then the second term on -CONCAVE ESTIMATION right hand side of (6.8) can be bounded from above bylim α ր α − (cid:18)(cid:0) γ α − b − γ α − a (cid:1) ( x − a )( b − a ) (1 − ǫ ) + 1 (cid:19) = lim α ր (cid:0) log γ b · γ α − b − log γ a · γ α − a (cid:1) ( x − a )( b − a ) (1 − ǫ )= (log γ b − log γ a ) ( x − a )( b − a ) (1 − ǫ )where the second line follows from L’Hospital’s rule. Similarly we can derivea lower bound: (log γ b − log γ a ) ( x − a )( b − a ) (1 + ǫ ) . Thus it remains to show that (6.9) holds. But we can rewrite r α as r α = c α − α − c α − − c α − ( c α /c ) α − − ( c α /c ) α − + ( c α /c ) α − − c α − −
1= ( c α /c ) α − + ( c α /c ) α − − c α − − → α → c α /c ) α − ) = ( α −
1) log(c_α/c) → 0 · log 1 = 0, where the second limit follows from an upper and lower bound argument using c_α/c → 1; here c_α := γ_{b,α}/γ_{a,α} and c = γ_b/γ_a ≠ 1. This shows that (6.9) holds, thereby proving the case γ_a ≠ γ_b ∈ (0, ∞). For the case γ_b = γ_a ∈ (0, ∞), similarly we have

lim_{α→1} log f_α(x) = log γ_a + lim_{α→1} (1/(α − 1)) log( ((c_α^{α−1} − 1)/(b − a)) (x − a) + 1 ).

The second term is 0 by an argument much as above, observing that c_α = γ_{b,α}/γ_{a,α} → γ_b/γ_a = 1. Finally, if γ_a ∧ γ_b = 0, then by the first line of (6.8) we see that log f_α(x) → −∞; if γ_a ∨ γ_b = ∞, then again log f_α(x) → ∞. Proof of Theorem 2.16.
In the following, the notation sup α , inf α , lim α is understood as taking corresponding operation over α close to 1 unlessotherwise specified. We first show almost everywhere convergence by invok-ing Lemma 7.13. To see this, for fixed s ∈ ( − / , g α := f α − α and g ( s ) α := ( f α ) s . Then for α > s , the transformed function g ( s ) α is convex.We need to check two conditions in order to apply Lemma 7.13 as follows: HAN AND WELLNER (C1) The set ( X (1) , X ( n ) ) ⊂ { lim inf α f α ( x ) > } ;(C2) There is a uniform lower bound function ˜ g s ∈ G such that g ( s ) α ≥ ˜ g s holds for α sufficiently close to 1.The first assertion can be checked by using the characterization Theorem2.12. Let F α be the distribution function of f α . Then R tX (1) ( F α − F n )( x ) d x ≤ t ∈ S n ( g α ). For x ∈ ( X (1) , X ( n ) ) closeenough to X ( n ) , we claim that lim inf α f α ( x ) >
0. If not, we may assumewithout loss of generality that lim α f α ( x ) = 0. We first note that there existssome t ∈ { , · · · n − } and some subsequence { α ( β ) } β ∈ N with α ( β ) ր X ( t ) is a knot point for { g α ( β ) } , and (2) X ( u ) is not a knotpoint for any { g α ( β ) } for u ≥ t + 1, i.e. g α ( β ) ’s are linear on [ X ( t ) , X ( n ) ]. Wedrop β for notational simplicity and assume without loss of generality thatboth limits lim α f α ( X ( n ) ) , lim α f α ( X ( t ) ) exist. Now Lemma 6.1 shows thatmin { lim α f α ( X ( n ) ) , lim α f α ( X ( t ) ) } = 0 since we have assumed lim α f α ( x ) = 0for some x ∈ ( X ( t ) , X ( n ) ). This in turn implies that lim α f α ( x ) = 0 forall x ∈ ( X ( t ) , X ( n ) ). Now we consider the following two cases to derive acontradiction with the fact(6.10) Z X ( n ) X ( t ) F α ( x )d x = Z X ( n ) X ( t ) F n ( x )d x that follows from Theorem 2.12, thereby proving lim inf α f α ( x ) > x close enough to X ( n ) . [Case 1.] If lim α f α ( X ( n ) ) = 0, then the left hand side of (6.10) convergesto X ( n ) − X ( t ) while the right hand side is no larger than n − n (cid:0) X ( n ) − X ( t ) (cid:1) . [Case 2.] . If lim α f α ( X ( n ) ) >
0, then we must necessarily have lim α f α ( x ) =0 for all x ∈ [ X (1) , X ( n ) ) by convexity of g α : If lim α f α ( x ) > x ∈ [ X (1) , X ( t ) ], then lim α g α ( x ) ∨ g α ( X ( n ) ) < ∞ while lim α g α ( x ) = ∞ for all x ∈ ( X ( t ) , X ( n ) ), which is absurd. Note that this also forces lim α f α ( X ( n ) ) = ∞ , otherwise the constraint R f α = 1 will be invalid eventually. Now the lefthand side of (6.10) converges to 0 while the right hand side is bounded frombelow by n ( X ( n ) − X ( t ) ).Similarly we can show lim inf α f α ( x ) > x close to X (1) . Now (C1)follows by convexity of f α .(C2) can be seen by first noting M := sup α k f α k ∞ < ∞ . This can beverified by Lemma 3.3 combined with the first assertion proved above. Thisimplies that the class { g ( s ) α } α has a uniform lower bound M s . Now (C2)follows by noting that the domain of all g ( s ) α is conv( X ). Therefore allconditions needed for Lemma 7.13 are valid, and hence we can extract a -CONCAVE ESTIMATION subsequence { g ( s ) α n } n ∈ N such thatlim n →∞ ,x → y g ( s ) α n ( x ) = g ( s ) ( y ) , for all y ∈ int(dom( g ( s ) ));lim n →∞ ,x → y g ( s ) α n ( x ) ≥ g ( s ) ( y ) , for all y ∈ R d , holds for some g ( s ) ∈ G . This implies f α n → a.e. f ( s ) as n → ∞ where f ( s ) := (cid:0) g ( s ) (cid:1) /s . Now repeat the above argument with another s with afurther extracted subsequence { α n ( k ) } , we see that f α n ( k ) → a.e. f ( s ) ( k → ∞ )for some s -concave f ( s ) holds for the subsequence { α n ( k ) } k ∈ N . This impliesthat f ( s ) = a.e. f ( s ) . Since a convex function is continuous in the interior ofthe domain, we can choose a version of upper semi-continuous f such that f = f ( s ) a.e. for all { / < s < } ∩ Q . This implies that f is s -concave forany rational 1 / < s < L convergence: For fixed κ >
0, choose 0 > s > − / ( κ + 1). Since there exists a, b > g ( s ) α n ≥ g ( s ) ≥ a k x k − b holds for all n ∈ N , we have anintegrable envelope function: (cid:0) k x k (cid:1) κ (cid:0) f α n ( x ) ∨ f ( x ) (cid:1) ≤ (cid:0) k x k (cid:1) κ (cid:18)(cid:0) a k x k − b (cid:1) ∨ M (cid:19) /s . Now an application of the dominated convergence theorem yields the desiredweighted L convergence. Similar arguments show weighted convergence isalso valid in arbitrary L p norms ( p ≥ f = f by virtue of Theorem 2.2 in D¨umbgen and Rufibach(2009) and Theorem 2.9. We note that by Lemma 6.1, f must be log-linearbetween consecutive data points. Now since f and f are both log-linear be-tween consecutive data points of { X , . . . , X n } , we only have to consider testfunctions h such that h is piecewise linear on consecutive data points. Recall g α = f α − α and g := − log f are the underlying convex functions for f α and f . For any such h with the property that, g + th ∈ G for t small enough, wewish to argue that such h is also a valid test for f α (i.e. g α + th ∈ G for t > { α k } converging up to 1 as k → ∞ . Thuswe only have to argue that for all X ( i ) ∈ S ( g ), X ( i ) ∈ S ( g α ) for a sequence of { α k } going up to 1 as k → ∞ . Assume the contrary that X ( i ) / ∈ S ( g α ) for all α close enough to 1. Then { g α } ’s are all linear on a closed interval I = [ a, b ]containing X ( i ) for α close to 1. Since f α → f uniformly on I by Theorem3.7, in particular f α ( a ) and f α ( b ) converges, Lemma 6.1 entails that f islog-linear over I , a contradiction to the fact X ( i ) ∈ S ( g ). Hence we can finda subsequence { α k } going up to 1 as k → ∞ such that for all X ( i ) ∈ S ( g ), HAN AND WELLNER X ( i ) ∈ S ( g α k ), i.e. for all feasible test function h of f , being linear on con-secutive data points, is also valid for f α k . Now combining the fact that f α k converges in L metric to f and Theorem 2.2 in D¨umbgen and Rufibach(2009) we conclude f = f .6.2. 
Proofs for Section 3.
Proof of Lemma 3.1.
The proof closely follows the first part of theproof of Proposition 2 Kim and Samworth (2015). Suppose dim (cid:0) csupp( ν ) (cid:1) = d , we show csupp( ν ) ⊂ C . To see this, we take x / ∈ C , then there exists δ > B ( x , δ ) ⊂ C c , and we claim that(6.11) For all x ∗ ∈ B ( x , δ ) ⊂ C c , x ∗ / ∈ int(csupp( ν )) . If (6.11) holds, then x / ∈ csupp( ν ) and hence csupp( ν ) ⊂ C . Now we turnto show (6.11). Since x ∗ / ∈ C = { lim inf n →∞ f n ( x ) > } , we can find asubsequence { f n ( k ) } k ∈ N of { f n } n ∈ N such that f n ( k ) ( x ∗ ) < k holds for all k ∈ N . Hence x ∗ / ∈ Γ k := { x ∈ R d : f n ( k ) ( x ) ≥ k } . Note that Γ k is a closedconvex set, hence by Hyperplane Separation Theorem we can find b k ∈ R d with k b k k = 1 such that { x ∈ R d : h b k , x i ≤ h b k , x ∗ i} ⊂ (Γ k ) c . Without lossof generality we may assume b k → b x ∗ as k → ∞ for some b x ∗ ∈ R d with k b x ∗ k = 1. Now for fixed R > η >
0, define A R,η := { x ∈ R d : h b x ∗ , x i < h b x ∗ , x ∗ i − η, k x k ≤ R } . Choose k ∈ N large enough such that k b k − b x ∗ k ≤ η R holds for all k ≥ k ( x ∗ , η, R ). Now for R > k x ∗ k and x ∈ A R,η , we have h b k , x − x ∗ i = h b x ∗ , x − x ∗ i + h b k − b x ∗ , x − x ∗ i < − η + η R ( k x k + k x ∗ k ) ≤ k ≥ k ( x ∗ , η, R ). This implies for R > k x ∗ k and η > A R,η ⊂ { x ∈ R d : h b k , x i ≤ h b k , x ∗ i} ⊂ (Γ k ) c = { x ∈ R d : f n ( k ) ( x ) < k } . Now note A R,η is open, by Portmanteau Theorem we find that ν ( A R,η ) ≤ lim inf k →∞ ν n ( k ) ( A R,η ) = lim inf k →∞ Z A R,η f n ( k ) ( x ) d x ≤ lim inf k →∞ λ d ( A R,η ) k = 0 . This implies ν (cid:0) { x ∈ R d : h b x ∗ , x i < h b x ∗ , x ∗ i} (cid:1) = ν (cid:18) ∞ [ R =1 A R, /R (cid:19) = lim R →∞ ν ( A R, /R ) = 0 , -CONCAVE ESTIMATION where the second equality follows from the fact { A R, /R } is an increasingfamily as R increases. By the assumption that dim (cid:0) csupp( ν ) (cid:1) = d , we find x ∗ / ∈ int(csupp( ν )), as we claimed in (6.11).Now Suppose dim C = d , we claim C ⊂ csupp( ν ). To see this, we onlyhave to show C ⊂ csupp( ν ) by the closedness of csupp( ν ). Suppose not,then we can find x ∈ C \ csupp( ν ). This implies that there exists δ > B ( x , δ ) ∩ csupp( ν ) = ∅ . By the assumption that dim C = d ,we can find x , . . . , x d ∈ B ( x , δ ) ∩ C such that { x , . . . , x d } are in generalposition. By definition of C we can find ǫ > , n ∈ N such that f n ( x j ) ≥ ǫ for all j = 0 , , . . . , d and n ≥ n . By convexity, we conclude that f n ( x ) ≥ ǫ , for all x ∈ conv( { x , . . . , x d } ) and n ≥ n . This gives ν (cid:0) conv( { x , . . . , x d } ) (cid:1) ≥ lim sup n →∞ ν n (cid:0) conv( { x , . . . , x d } ) (cid:1) ≥ ǫ λ d (cid:0) conv( { x , . . . , x d } ) (cid:1) > , a contradiction with B ( x , δ ) ∩ csupp( ν ) = ∅ , thus completing the proof ofthe claim. To summarize, we have proved1. If dim (cid:0) csupp( ν ) (cid:1) = d , then csupp( ν ) ⊂ C . 
This in turn implies dim C = d, and hence C ⊂ csupp(ν). Now it follows that csupp(ν) = C;
2. If dim C = d, then C ⊂ csupp(ν). This in turn implies dim(csupp(ν)) = d, and hence csupp(ν) ⊂ C. Now it follows that csupp(ν) = C. Proof of Lemma 3.2.
The proof is essentially the same as the proof of Proposition 2 of Cule and Samworth (2010), by exploiting convexity at the level of the underlying basic convex function, so we shall omit it.
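The arguments in this section all work at the level of the underlying convex functions g_n = f_n^s rather than the densities themselves. As a numerical sketch of this transform (not part of the proof; the unnormalized kernel f(x) = (1 + x²)^{−1} and the exponent s = −1 are our own choices), one can check midpoint convexity of g = f^s on a grid:

```python
# For s < 0, f is s-concave iff g = f^s is convex.
# Example kernel (our choice): f(x) = (1 + x^2)^(-1) with s = -1, so g(x) = 1 + x^2.
s = -1.0
f = lambda x: (1.0 + x * x) ** (-1.0)
g = lambda x: f(x) ** s

xs = [i * 0.1 for i in range(-50, 51)]
for x in xs:
    for y in xs:
        mid = 0.5 * (x + y)
        # midpoint convexity of the transformed function
        assert g(mid) <= 0.5 * g(x) + 0.5 * g(y) + 1e-12
```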
Proof of Lemma 3.3.
Set U n,t = { x ∈ R d : f n ( x ) ≥ t } . We first claimthat there exists n ∈ N , ǫ ∈ (0 ,
1) such that λ d ( U n,ǫ ) ≥ ǫ holds for all n ≥ n . If not, then for all k ∈ N , l ∈ N , there exists n k,l ∈ N such that λ d ( U n k,l , /l ) ≤ l . Note that { lim inf n f n > } = ∪ k ∈ N ∪ l ∈ N ∩ n ≥ k U n, /l . Since λ d (cid:0) S l ∈ N T n ≥ k U n, /l (cid:1) = lim l →∞ λ d (cid:0) T n ≥ k U n, /l (cid:1) ≤ lim l →∞ λ d ( U n k,l , /l ) =0, we find that C = { lim inf n f n > } is a countable union of null set andhence λ d ( C ) = 0, a contradiction to the assumption dim C = d . This showsthe claim.Denote M n := sup x ∈ R d f n ( x ) , ǫ n ∈ Arg max f n ( x ) . Without loss of gen-erality we assume M n ≥ ǫ (1+ κ s ) /s where κ s = (1 / s − >
0, and we set λ n := κ s M sn ǫ s − M sn ∈ [0 , x ∈ U n,ǫ , by convexity of f sn we have f sn ( ǫ n + λ n ( x − ǫ n )) ≤ λ n f sn ( x )+(1 − λ n ) f sn ( ǫ n ) ≤ λ n ǫ s +(1 − λ n ) M sn = ( M n / s . HAN AND WELLNER
This implies f n ( x ) ≥ M n / n , for all x ∈ V n,ǫ := { ǫ n + λ n ( x − ǫ n ) : x ∈ U n,ǫ } . Hence V n,ǫ ⊂ U n, Ω n and therefore λ d ( V n,ǫ ) = λ d ( U n,ǫ ) λ dn , thus λ d ( U n, Ω n ) ≥ λ d ( V n,ǫ ) = λ d ( U n,ǫ ) λ dn ≥ ǫ λ dn , holds for all n ≥ n . On the other hand,1 = Z f n ≥ Ω n λ d ( U n, Ω n ) ≥ Ω n ǫ λ dn , and suppose the contrary that M n → ∞ as n → ∞ , then1 ≥ Ω n ǫ λ dn = ǫ κ ds ǫ s − M sn ) d M sdn ≥ cM sdn → ∞ , n → ∞ , since 1 + sd > − /d < s <
0. Here c = ǫ − sd κ ds . This givesa contradiction and the proof is complete. Proof of Theorem 3.4.
We only have to show ν is absolutely contin-uous with respect to λ d . To this end, for given ǫ >
0, choose δ = ε/(2M), where M := sup_n ‖f_n‖_∞ < ∞ by virtue of Lemma 3.3. Now for any Borel set A ⊂ R^d with λ_d(A) ≤ δ, we can take an open set A′ ⊃ A such that λ_d(A′) ≤ 2δ by the regularity of Lebesgue measure. Then

ν(A) ≤ ν(A′) ≤ lim inf_{n→∞} ν_n(A′) = lim inf_{n→∞} ∫_{A′} f_n ≤ 2δM = ε,

as desired.
Let g n = f sn and g = f s . Without loss of gen-erality we assume 0 ∈ int(dom( g )), and choose η > B η := B (0 , η ) ⊂ int(dom( g )). By the Lemma 7.10, we know there ex-ists a > , R > g ( x ) − g (0) k x k ≥ a, holds for all k x k ≥ R . Nowwe claim that there exists n ∈ N such that g n ( x ) − g n (0) k x k ≥ a , holds for all k x k ≥ R and n ≥ n . Note for each n ∈ N , by convexity of g n ( · ), we knowthat for fixed x ∈ R d , the quantity g n ( λx ) − g n (0) k λx k is non-decreasing in λ , sowe only have to show the claim for k x k = R and n ≥ n . Suppose the con-trary, then we can find a subsequence { g n ( k ) } and k x n ( k ) k = R such that g n ( k ) ( x n ( k ) ) − g n ( k ) (0) k x n ( k ) k < a . For simplicity of notation we think of { g n } , { x n } as { g n ( k ) } , { x n ( k ) } . Now define A n := conv( { x n , B η } ); B n := { y ∈ R d : k y − x n k ≤ R/ } ; C n := A n ∩ B n . By reducing η > -CONCAVE ESTIMATION assume B η ∩ B n = ∅ . It is easy to see C n is convex and λ d ( C n ) = λ is aconstant independent of n ∈ N . By Lemma 3.2, we know that g n → a.e. g on B η , and hence sup x ∈ B η | g n ( x ) − g ( x ) | → n → ∞ ) by Theorem 10.8,Rockafellar (1997). By further reducing η > g n ( y ) ≤ g (0) + aR , holds for all y ∈ B η and n ∈ N . Now for any x ∗ ∈ C n ,write x ∗ = λx n + (1 − λ ) y , by noting R/ ≤ k x ∗ k ≤ R and convexity of g n ,we get g n ( x ∗ ) − g n (0) k x ∗ k ≤ λg n ( x n ) + (1 − λ ) g n ( y ) − g n (0) k x ∗ k = λ · g n ( x n ) − g n (0) k x n k · k x n kk x ∗ k + (1 − λ ) g n ( y ) − g n (0) k x ∗ k≤ λ · a RR/ − λ ) aR/ R/ a . This gives rise tolim inf n →∞ Z C n ( f n − f ) ≥ lim inf n →∞ λ (cid:0) ( aR/ g n (0)) /s − ( aR/ g (0)) /s (cid:1) = λ (cid:0) ( aR/ g (0)) /s − ( aR/ g (0)) /s (cid:1) > , which is a contradiction to Lemma 7.16. This establishes our claim. 
Nowby Lemma 3.2, we find that the set { lim inf n f n ( · ) > } is full-dimensional,and hence by Lemma 3.3 we conclude g n ( · ) is uniformly bounded away fromzero. Also note by Lemma 7.15 we find g ( · ) must be bounded away fromzero, which gives the desired assertion.Before the proof of Theorem 3.7, we first state some useful lemmas thatgive good control of tails with local information of the s -concave densities;the proof can be found in Section 7.1. Lemma . Let x , . . . , x d be d + 1 points in R d such that its convexhull ∆ = conv( { x , . . . , x d } ) is non-void. If f ( y ) ≤ min j (cid:0) d P i = j f s ( x i ) (cid:1) /s ,then f ( y ) ≤ f max (cid:18) − dr + dr f min C (1 + k y k ) / (cid:19) − r . Here the constant C = λ d (∆)( d +1) − / σ max ( X ) − where X = (cid:18) x . . . x d . . . (cid:19) and f min := min ≤ j ≤ d f ( x j ) , f max := max ≤ j ≤ d f ( x j ) . HAN AND WELLNER
Lemma . Let ν be a probability measure with s -concave density f .Suppose that B (0 , δ ) ⊂ int(dom( f )) for some δ > . Then for any y ∈ R d , sup x ∈ B ( y,δ t ) f ( x ) ≤ J t (cid:18) ν ( B ( ty, δ t )) J λ d ( B ( ty, δ t )) (cid:19) − /r − (1 − t ) !! − r , where J := inf v ∈ B (0 ,δ ) f ( v ) and δ t = δ − t t . Now we are in position to prove Theorem 3.7.
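The proof of Lemma 6.3 (given in Section 7.1) rests on the geometric inclusion B(ty, δ_t) ⊂ (1 − t)B(0, δ) + tx for every x ∈ B(y, δ_t). A small randomized sanity check of this inclusion can be sketched as follows, where the center y, weight t, and radius δ are our own choices, and δ_t is read as δ(1 − t)/(1 + t):

```python
import random, math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

random.seed(0)
d, delta, t = 2, 1.0, 0.7          # dimension, ball radius, convexity weight (our choices)
y = [0.5, -0.25]                   # arbitrary center
delta_t = delta * (1 - t) / (1 + t)

def sample_ball(center, radius):
    # Rejection-sample a point from the Euclidean ball B(center, radius).
    while True:
        v = [random.uniform(-radius, radius) for _ in range(d)]
        if norm(v) <= radius:
            return [c + vi for c, vi in zip(center, v)]

for _ in range(1000):
    x = sample_ball(y, delta_t)                     # x in B(y, delta_t)
    w = sample_ball([t * yi for yi in y], delta_t)  # w in B(ty, delta_t)
    v = [(wi - t * xi) / (1 - t) for wi, xi in zip(w, x)]
    assert norm(v) <= delta + 1e-12                 # so w = (1-t)v + tx with v in B(0, delta)
```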
Proof of Theorem 3.7.
That the sequence { f n } n ∈ N converges uniformlyon any compact subset in int(dom( f )) follows directly from Lemma 3.2 andTheorem 10.8 Rockafellar (1997). Now we show that if f is continuous at y ∈ R d with f ( y ) = 0, then for any η > δ = δ ( y, η ) such that(6.12) lim sup n →∞ sup x ∈ B ( y,δ ( y,η )) f n ( x ) ≤ η. Assume without loss of generality that B (0 , δ ) ⊂ int(dom( f )) for some δ >
0. Let J := inf x ∈ B (0 ,δ ) f ( x ). Then uniform convergence of { f n } to f over B (0 , δ ) entails thatlim inf n →∞ inf x ∈ B (0 ,δ ) f n ( x ) ≥ J . Hence with δ t = δ − t t , it follows from Lemma 6.3 thatlim sup n →∞ sup x ∈ B ( y,δ t ) f n ( x ) ≤ J (cid:18) t (cid:18) (cid:18) ν ( B ( ty, δ t )) J λ d ( B ( ty, δ t )) (cid:19) − /r − (1 − t ) (cid:19)(cid:19) − r ≤ J (cid:18) J /r (cid:0) sup x ∈ B ( ty,δ t ) f ( x ) (cid:1) − /r − (1 − t ) t (cid:19) − r → t ր
1. This completes the proof of (6.12). So far we have shown that

lim_{n→∞} sup_{x ∈ S ∩ B(0,ρ)} |f_n(x) − f(x)| = 0

holds for every ρ ≥
0, where S is the closed set contained in the con-tinuity points of f . Our goal is to let ρ → ∞ and conclude. Let ∆ =conv( { x , . . . , x d } ) be a non-void simplex with x , . . . , x d ∈ int(dom( f )).Note first by a closer look at the proof of Lemma 3.5, f n ( x ) ∨ f ( x ) ≤ -CONCAVE ESTIMATION (cid:0) ( a k x k − b ) (cid:1) /s + holds for all x ∈ R d with some a, b >
0. Let ρ := inf { ρ ≥ (cid:0) aρ − b ) /s ≤ f min / } where f min := min ≤ j ≤ d f ( x i ) >
0. Then { x ∈ R d : k x k ≥ ρ } ⊂ \ n ≥ { f n ≤ f min / } \ { f ≤ f min / }⊂ \ n ≥ n { f n ≤ ( f n ) min } \ { f ≤ f min }⊂ \ n ≥ n { f n ≤ min j (cid:0) d X i = j f sn ( x i ) (cid:1) /s } \ { f ≤ min j (cid:0) d X i = j f s ( x i ) (cid:1) /s } , where n ∈ N is a large constant. The second inclusion follows from thefact that lim n →∞ f n ( x i ) = f ( x i ) holds for i = 0 , . . . , d . By Lemma 6.2 weconclude thatlim sup n →∞ sup x : k x k≥ ρ ∨ ρ (cid:0) k x k ) κ (cid:0) f n ( x ) ∨ f ( x ) (cid:1) ≤ sup x : k x k≥ ρ ∨ ρ f max (cid:0) k x k ) κ (cid:18) − dr + dr f min C (cid:0) k x k (cid:1) / (cid:19) − r → , as ρ → ∞ . This completes the proof. Proof of Theorem 3.8.
Since ∇ ξ f n ( x ) = − rg n ( x ) /s − ∇ ξ g n ( x ), |∇ ξ f n ( x ) − ∇ ξ f ( x ) | = r (cid:12)(cid:12)(cid:12) g n ( x ) /s ∇ ξ g n ( x ) − g ( x ) /s ∇ ξ g ( x ) (cid:12)(cid:12)(cid:12) ≤ r (cid:18) f n ( x ) |∇ ξ g n ( x ) − ∇ ξ g ( x ) | + | f n ( x ) − f ( x ) | |∇ ξ g ( x ) | (cid:19) ≤ r sup x ∈ T | f ( x ) | |∇ ξ g n ( x ) − ∇ ξ g ( x ) | + r sup x ∈ T | f n ( x ) − f ( x ) | sup x ∈ T k∇ g ( x ) k holds for n large enough by Theorem 3.7. By Theorem 23.4 in Rockafellar(1997), ∇ ξ g n ( x ) = τ Tx ξ for some τ x ∈ ∂g n ( x ) since ∂g n ( x ) is a closed set.Thus the first term above is further bounded by2 r sup x ∈ T | f ( x ) | sup x ∈ T,τ ∈ ∂g n ( x ) k τ − ∇ g ( x ) k , which vanishes as n → ∞ in view of Lemma 3.10 in Seijo and Sen (2011).Note that ∇ g ( · ) is continuous on T by Corollary 25.5.1 in Rockafellar (1997),and hence sup x ∈ T k∇ g ( x ) k < ∞ . Now it is easy to see that the second termalso vanishes as n → ∞ by virtue of Theorem 3.7. HAN AND WELLNER
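The identity ∇_ξ f_n = −r g_n^{−r−1} ∇_ξ g_n used above is the chain rule for f = g^{−r}. A one-dimensional finite-difference sketch, with an arbitrary smooth convex g of our own choosing:

```python
r = 2.0                               # r = -1/s > 0 (our choice)
g  = lambda x: 1.0 + x + x * x        # a smooth convex function (our choice)
dg = lambda x: 1.0 + 2.0 * x          # its derivative
f  = lambda x: g(x) ** (-r)

x, h = 0.3, 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)        # central difference approximation to f'(x)
exact = -r * g(x) ** (-r - 1.0) * dg(x)     # chain rule: f' = -r g^{-r-1} g'
assert abs(fd - exact) < 1e-6
```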
6.3. Proofs for Section 4.
Before we prove Theorem 4.1, we will need the following tightness result.
Theorem . We have the following conclusions.1. For fixed
K > , the modified local process Y locmod n ( · ) converges weaklyto a drifted integrated Gaussian process on C [ − K, K ] : Y locmod n ( t ) → d p f ( x ) Z t W ( s ) d s − rg ( k )0 ( x ) g ( x )( k + 2)! t k +2 , where W ( · ) is the standard two-sided Brownian motion starting from on R .2. The localized processes satisfy Y locmod n ( t ) − H locmod n ( t ) ≥ , with equality attained for all t such that x + tn − / (2 k +1) ∈ S (ˆ g n ) .3. The sequences { ˆ A n } and { ˆ B n } are tight. The above theorem includes everything necessary in order to apply the‘invelope’ argument roughly indicated in Section 4.1. For a proof of thistechnical result, we refer the reader to Section 7.2. Here we will provideproofs for our main results.
Proof of Theorem 4.1.
By the same tightness and uniqueness argu-ment adopted in Groeneboom, Jongbloed and Wellner (2001), Balabdaoui and Wellner(2007), and Balabdaoui, Rufibach and Wellner (2009), we only have to findthe rescaling constants. To this end we denote H ( · ), Y ( · ) the correspond-ing limit of H locmod n ( · ) and Y locmod n ( · ) in the uniform topology on the space C [ − K, K ], and let Y ( t ) = γ Y k ( γ t ) , where by Theorem 6.4, we know that Y ( t ) = 1 p f ( x ) Z t W ( s ) d s − rg ( k )0 ( x ) g ( x )( k + 2)! t k +2 . Let a := (cid:0) f ( x ) (cid:1) − / and b := rg ( k )0 ( x ) g ( x )( k +2)! , then by rescaling property ofBrownian motion, we find that γ γ / = a, γ γ k +22 = b . Solving for γ , γ yields(6.13) γ = a k +42 k +1 b − k +1 , γ = a − k +1 b k +1 . -CONCAVE ESTIMATION On the other hand, by (4.3), let n → ∞ , we find that n k k +1 (cid:0) ˆ g n ( x + s n t ) − g ( x ) − s n tg ′ ( x ) (cid:1) n k − k +1 (cid:0) ˆ g ′ n ( x + s n t ) − g ′ ( x ) (cid:1) ! → d g ( x ) − r d d t H ( t ) g ( x ) − r d d t H ( t ) ! (6.14)It is easy to see that d d t H ( t ) = γ γ
22 d d t H k ( γ t ) nd d d t H ( t ) = γ γ
32 d d t H k ( γ t ).Now by substitution in (6.13) we get the conclusion by direct calculation andthe delta method. Proof of Theorem 4.4.
The proof is essentially the same as that of Theorem 3.6 of Balabdaoui, Rufibach and Wellner (2009).
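The rescaling step in the proof of Theorem 4.1 solves, via the Brownian scaling of the two terms of Y, the pair of equations γ_1 γ_2^{3/2} = a and γ_1 γ_2^{k+2} = b (writing γ_1, γ_2 for the two rescaling constants). A quick numeric check that the solution in (6.13) satisfies both equations, with arbitrary positive a, b and an even k of our own choosing:

```python
a, b, k = 2.0, 3.0, 2          # arbitrary positive constants and an even k (our choices)

# Solution as in (6.13):
#   gamma1 = a^{(2k+4)/(2k+1)} * b^{-3/(2k+1)},  gamma2 = a^{-2/(2k+1)} * b^{2/(2k+1)}
gamma1 = a ** ((2 * k + 4) / (2 * k + 1)) * b ** (-3 / (2 * k + 1))
gamma2 = a ** (-2 / (2 * k + 1)) * b ** (2 / (2 * k + 1))

assert abs(gamma1 * gamma2 ** 1.5 - a) < 1e-9        # gamma1 * gamma2^{3/2} = a
assert abs(gamma1 * gamma2 ** (k + 2) - b) < 1e-9    # gamma1 * gamma2^{k+2} = b
```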
Lemma . Assume (A1)-(A4). Then Z ∞−∞ ˜ f ǫ ( x ) d x = 1 + π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) , where π k = 1( k + 1)! h k − (2 k − k + 3) + 2 k − i . Proof of Lemma 6.5.
This is straightforward calculation by Taylor ex-pansion. Note that Z ∞−∞ ˜ g − rǫ ( x ) d x = Z ∞−∞ (˜ g − rǫ ( x ) − g − r ( x )) d x + 1= Z m − ǫm − c ǫ ǫ (cid:18) ˜ g − rǫ ( x ) − g − r ( x ) (cid:19) d x + Z m + ǫm − ǫ (cid:18) ˜ g − rǫ ( x ) − g − r ( x ) (cid:19) d x + 1:= I + II + 1 . For y > x , we have x − r − y − r = P ∞ n ≥ (cid:0) − rn (cid:1) ( − n ( y − x ) n y − r − n . Now for the HAN AND WELLNER first term above, we continue our calculation of its leading term by noting g ( x ) − ˜ g ǫ ( x )= g ( x ) − g ( m − c ǫ ǫ ) − ( x − m + c ǫ ǫ ) g ′ ( m − c ǫ ǫ )= g ( m ) + g ( k ) ( m ) k ! ( x − m ) k − (cid:20) g ( m ) + g ( k ) ( m ) k ! ( − c ǫ ǫ ) k (cid:21) − ( x − m + c ǫ ǫ ) g ( k ) ( m )( k − − c ǫ ǫ ) k − + higher order terms= g ( k ) ( m ) k ! (cid:20) ( x − m ) k − c kǫ ǫ k + kc k − ǫ ǫ k − ( x − m + c ǫ ǫ ) (cid:21) + higher order terms . (6.15)Here we used the fact k is an even number, as shown in Lemma 7.2. Thuswe haveleading term of I= Z m − ǫm − c ǫ ǫ r (cid:18) g ( x ) − g ( m − c ǫ ǫ ) − ( x − m + c ǫ ǫ ) g ′ ( m − c ǫ ǫ ) (cid:19) g ( x ) − r − d x = rg ( k ) ( m ) k ! g ( m ) r +1 Z m − ǫm − c ǫ ǫ (cid:20) ( x − m ) k − c kǫ ǫ k + kc k − ǫ ǫ k − ( x − m + c ǫ ǫ ) (cid:21) d x + o ( ǫ k +1 )= α k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 )Here α k = 1( k + 1)! h k − (2 k − k + 3) − i . For the second term, g ( x ) − ˜ g ǫ ( x )= g ( x ) − g ( m + ǫ ) − ( x − m − ǫ ) g ′ ( m + ǫ )= g ( k ) ( m ) k ! (cid:20) ( x − m ) k − ǫ k − kǫ k − ( x − m − ǫ ) (cid:21) + higher order terms . (6.16)Now similar calculations yield that the second term = β k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) with β k = 2 k ( k + 1)! . This gives the conclusion. -CONCAVE ESTIMATION Proof of Lemma 4.6.
By definition of the Hellinger metric and Lemma6.5, we have2 h ( f ǫ , f ) = Z ∞−∞ (cid:0)p f ǫ ( x ) − p f ( x ) (cid:1) d x = Z ∞−∞ (cid:18) ˜ g − r/ ǫ ( x ) − π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) ! − g − r/ ( x ) (cid:19) d x ≡ Z ∞−∞ (cid:16) ˜ g − r/ ǫ ( x )(1 + η k ( ǫ )) − g − r/ ( x ) (cid:17) d x since f ǫ ( x ) = ˜ g − rǫ ( x ) (cid:18) π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) (cid:19) − = ˜ g − rǫ ( x ) (cid:18) − π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) (cid:19) . Here η k ( ǫ ) = O ( ǫ k +1 ). Splitting two terms apart in the above integral weget2 h ( f ǫ , f ) = Z ∞−∞ (cid:18) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) + η k ( ǫ )˜ g − r/ ǫ ( x ) (cid:19) d x = Z ∞−∞ (cid:0) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) (cid:1) d x + (cid:0) η k ( ǫ ) (cid:1) Z ∞−∞ ˜ g − rǫ ( x ) d x + 2 η k ( ǫ ) Z ∞−∞ ˜ g − r/ ǫ ( x ) (cid:0) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) (cid:1) d x = I + II + III.
Now for the first term, I = Z m + ǫm − c ǫ ǫ r (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) g ( x ) − r − d x + higher order terms= r g ( m ) r +2 Z m + ǫm − c ǫ ǫ (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x + higher order terms= r g ( m ) r +2 (cid:18) Z m − ǫm − c ǫ ǫ + Z m + ǫm − ǫ (cid:19)(cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x + higher order terms= I + I + higher order terms . HAN AND WELLNER
By (6.15) and (6.16) we see that for i = 1 , I i = r g ( m ) r +2 Z I i (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x = ζ ( i ) k r f ( m ) g ( k ) ( m ) g ( m ) ǫ k +1 + o ( ǫ k +1 ) . Here I = [ m − c ǫ ǫ, m − ǫ ], I = [ m − ǫ, m + ǫ ], and ζ (1) k = 1108( k !) ( k + 1)( k + 2)(2 k + 1) (cid:20) − · k +2 (2 k + 1)(3 k +2 + k + k − k + 1)( k + 2) (cid:18) k +1 −
1) + 2 · k (2 k + 1)(2 k (2 k −
9) + 27) (cid:19)(cid:21) .ζ (2) k = 2 k (2 k + 1)3( k !) ( k + 1)(2 k + 1) . On the other hand, II = O ( ǫ (2 k +2) ) = o ( ǫ k +1 ) and | III | ≤ O ( ǫ k +1 · ǫ (2 k +1) / · ǫ (2 k +2) / ) = o ( ǫ k +1 ) by Cauchy-Schwarz. This completes the proof.
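The proof above starts from the squared Hellinger distance 2h²(f, g) = ∫(√f − √g)². As a standalone numerical illustration of this definition (the unit-variance Gaussian pair and its closed-form affinity exp(−μ²/8) are standard facts, not part of the argument):

```python
import math

def hellinger_sq(f, g, lo=-15.0, hi=15.0, n=200_000):
    # h^2(f, g) = (1/2) * integral of (sqrt(f) - sqrt(g))^2, by midpoint-rule quadrature.
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2 * dx
    return 0.5 * total

phi = lambda x, mu: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
mu = 1.0
h2 = hellinger_sq(lambda x: phi(x, 0.0), lambda x: phi(x, mu))
# Closed form for two unit-variance Gaussians: h^2 = 1 - exp(-mu^2 / 8).
assert abs(h2 - (1 - math.exp(-mu * mu / 8))) < 1e-6
```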
7. Appendix.
7.1. Proofs of Lemmas 6.2 and 6.3.
Lemma . Let ν be a probability measure with s -concave density f ,and x , . . . , x d ∈ R d be d + 1 points such that ∆ := conv( { x , . . . , x d } ) isnon-void. If f ( x ) ≤ (cid:0) d P di =1 f s ( x i ) (cid:1) /s , then f ( x ) ≤ ¯ g − r (cid:18) − dr + dr λ d (∆)¯ g − r ν (∆) (cid:19) − r , where ¯ g := d P dj =1 f s ( x j ) . Proof of Lemma 7.1.
For any point x ∈ ∆, we can find some u =( u , . . . , u d ) ∈ ∆ d = { u : P di =1 u i ≤ } such that x ( u ) = P di =0 u i x i . Here u := 1 − P di =1 u i ≥
0. We use the following representation of integrationon the unit simplex ∆ d : For any measurable function h : ∆ d → [0 , ∞ ), wehave R ∆ d h ( u ) d u = d ! E h ( B , . . . , B d ) , where B i = E i / P dj =0 E j with inde-pendent, standard exponentially distributed random variables E , . . . , E d . ν (∆) λ d (∆) = 1 λ d (∆ d ) Z ∆ d g (cid:0) x ( u ) (cid:1) − r d u = E g (cid:18) d X j =0 B j x j (cid:19) − r ≥ E (cid:18) d X j =0 B j g ( x j ) (cid:19) − r = E (cid:18) B g + (1 − B ) d X i =1 ˜ B i g ( x i ) (cid:19) − r , -CONCAVE ESTIMATION where ˜ B i := E i / P dj =1 E j for 1 ≤ i ≤ d . Following Cule and D¨umbgen(2008), it is known that B and { ˜ B i } di =1 are independent, and E [ ˜ B i ] = 1 /d .Hence it follows from Jensen’s inequality that ν (∆) λ d (∆) ≥ E " E (cid:18) B g + (1 − B ) d X i =1 ˜ B i g ( x i ) (cid:19) − r (cid:12)(cid:12)(cid:12)(cid:12) B ≥ E (cid:18) B g + (1 − B ) 1 d d X i =1 g ( x i ) (cid:19) − r = E ( B g + (1 − B )¯ g ) − r = Z d (1 − t ) d − (cid:0) tg + (1 − t )¯ g (cid:1) − r d t = ¯ g − r Z d (1 − t ) d − (cid:18) − st (cid:18) ( − /s ) (cid:18) g ¯ g − (cid:19)(cid:19)(cid:19) d t = ¯ g − r J d,s (cid:18) − s (cid:18) g ¯ g − (cid:19) (cid:19) , where J d,s ( y ) = Z d (1 − t ) d − (1 − syt ) /s d t. We claim that J d,s ( y ) ≥ Z d (1 − t ) d − (1 − t ) y d t = dd + y , holds for s < , y >
0. To see this, we write (1 − syt ) /s = (1 + yt/r ) − ( r/y ) y . Then we only have to show (1 + yt/r ) − r/y ≥ (1 − t ) for 0 ≤ t ≤
1, or equivalently (1 + bt) ≤ (1 − t)^{−b}, where we let b = y/r. Let g(t) := (1 − t)^{−b} − (1 + bt). It is easy to verify that g(0) = 0, g′(t) = b(1 − t)^{−b−1} − b with g′(0) = 0, and g′′(t) = b(b + 1)(1 − t)^{−b−2} ≥
0. Integrating g′′ twice yields g(t) ≥
0, and hence we have verified the claim. Now we proceed withthe calculation ν (∆) λ d (∆) ≥ ¯ g − r J d,s (cid:18) − s (cid:18) g ¯ g − (cid:19) (cid:19) ≥ ¯ g − r dd − s (cid:0) g ¯ g − (cid:1) . Solving for g and replacing − /s = r proves the desired inequality. Proof of Lemma 6.2.
For fixed j ∈ { , . . . , d } , note | det( x i − x j ) : i = j | = | det X | where X = (cid:18) x . . . x d . . . (cid:19) . Also for each y ∈ R d , since ∆ = HAN AND WELLNER conv( { x , . . . , x d } ) is non-void, y must be in the affine hull of ∆ and hencewe can write y = P di =0 λ i x i with P di =0 λ i = 1 (not necessary non-negative),i.e. λ = X − (cid:0) y (cid:1) . Let ∆ j ( y ) := conv( { x i : i = j } ∪ { y } ). Then λ d (∆ j ( y )) = 1 d ! (cid:12)(cid:12)(cid:12)(cid:12) det (cid:18) x . . . x j − y x j +1 . . . x d . . . . . . (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = 1 d ! | λ j | | det X | = | λ j | λ d (∆) . Hence,max ≤ j ≤ d λ d (∆ j ( y )) ≥ λ d (∆) max j | λ j | = λ d (∆) k X − (cid:18) y (cid:19) k ∞ ≥ λ d (∆)( d + 1) − / k X − (cid:18) y (cid:19) k≥ λ d (∆)( d + 1) − / σ max ( X ) − (1 + k y k ) / = C (1 + k y k ) / . Now the conclusion follows from Lemma 7.1 by noting f ( y ) ≤ ¯ g − rj (cid:18) − dr + dr λ d (∆ j ( y ))¯ g − rj ν (∆ j ( y )) (cid:19) − r ≤ f max (cid:18) − dr + dr f min C (1 + k y k ) / (cid:19) − r , since ¯ g − rj = (cid:0) d P i = j f s ( x i ) (cid:1) /s and hence f min ≤ ¯ g − rj ≤ f max , and the index j is chosen such that λ d (∆ j ( y )) is maximized. Proof of Lemma 6.3.
The key point is that, for any point x ∈ B(y, δ_t),

B(ty, δ_t) ⊂ (1 − t)B(0, δ) + tx,

which can be shown in the same way as in the proof of Lemma 4.2 of Schuhmacher, Hüsler and Dümbgen (2011). Namely, pick any w ∈ B(ty, δ_t) and let v := (1 − t)^{−1}(w − tx). Then since

‖v‖ = (1 − t)^{−1}‖w − tx‖ = (1 − t)^{−1}‖w − ty + t(y − x)‖ ≤ (1 − t)^{−1}(δ_t + tδ_t) = δ,

we have v ∈ B(0, δ). This implies that w = (1 − t)v + tx ∈ (1 − t)B(0, δ) + tx, as desired. By s-concavity of f, we have

f(w) ≥ ((1 − t)f(v)^s + tf(x)^s)^{1/s} ≥ ((1 − t)J^s + tf(x)^s)^{1/s} = J(1 − t + t(f(x)/J)^s)^{1/s}.

Averaging over w ∈ B(ty, δ_t) yields

ν(B(ty, δ_t))/λ_d(B(ty, δ_t)) ≥ J(1 − t + t(f(x)/J)^s)^{1/s}.

Solving for f(x) completes the proof.

7.2. Proof of Theorem 6.4.
We first observe that
Lemma . k is an even integer and g ( k )0 ( x ) > . Proof of Lemma 7.2.
By Taylor expansion of g ′′ around x , we findthat locally for x ≈ x , g ′′ ( x ) = g ( k )0 ( x )( k − x − x ) k − + o (cid:0) ( x − x ) k − (cid:1) . Also note g ′′ ( x ) ≥ k − g ( k )0 ( x ) > k , r n := n k +22 k +1 ; s n := n − k +1 ; x n ( t ) := x + s n t ; l n,x :=[ x , x n ( t )] . Let τ + n := inf { t ∈ S n (ˆ g n ) : t > x } , and τ − n := sup { t ∈ S n (ˆ g n ) : t < x } . The key step in establishing the limit theory, is to establish astochastic bound for the gap τ + n − τ − n as follows. Theorem . Assume (A1)-(A4) hold. Then τ + n − τ − n = O p ( s n ) . Proof.
Define ∆ ( x ) := ( τ − n − x ) [ τ − n , ¯ τ ] ( x ) + ( x − τ + n ) [¯ τ,τ + n ] ( x ), and∆ := ∆ + τ + n − τ − n [ τ − n ,τ + n ] , where ¯ τ =: τ − n + τ + n . Thus we find that Z ∆ d( F n − F ) = Z ∆ d( F n − ˆ F n ) + Z ∆ d( ˆ F n − F ) ≥ − τ + n − τ − n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z τ + n τ − n d( F n − ˆ F n ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + Z ∆ ( ˆ f n − f ) d λ ≥ − τ + n − τ − n n + Z ∆ ( ˆ f n − f ) d λ, where the last line follows from Corollary 2.13. Now let R n := R ∆ ( ˆ f n − f ) d λ, R n := R ∆ d( F n − F ) . The conclusion follows directly from thefollowing lemma. HAN AND WELLNER
Lemma . Suppose (A1)-(A4) hold. Then R n = O p ( τ + n − τ − n ) k +2 and R n = O p ( r − n ) . Proof of Lemma 7.4.
Define p n := ˆ g n /g on [ τ + n , τ − n ]. It is easy to seethat τ + n − τ − n = o p (1), so with large probability, for all n ∈ N large enough,inf x ∈ [ τ + n ,τ − n ] f ( x ) > R n = Z τ + n τ − n ∆ ( x ) (cid:0) ˆ f n ( x ) − f ( x ) (cid:1) d x = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) ˆ f n ( x ) f ( x ) − (cid:19) d x = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) k − X j =1 (cid:18) − rj (cid:19) ( p n ( x ) − j + (cid:18) − rk (cid:19) θ − r − kx,n ( p n ( x ) − k (cid:19) d x, where θ x,n ∈ [1 ∧ ˆ g n ( x ) g ( x ) , ∨ ˆ g n ( x ) g ( x ) ]. Now define S nj = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) − rj (cid:19) ( p n ( x ) − j d x, ≤ j ≤ k − ,S nk = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) − rk (cid:19) θ − r − kx,n ( p n ( x ) − k d x. Expand f around ¯ τ , then we have S nj = k − X l =0 Z τ + n τ − n ∆ ( x ) f ( l )0 (¯ τ ) l ! ( x − ¯ τ ) l (cid:18) − rj (cid:19) ( p n ( x ) − j d x + Z τ + n τ − n ∆ ( x ) f ( l )0 ( η n,x,k ) k ! ( x − ¯ τ ) k (cid:18) − rk (cid:19) ( p n ( x ) − k d x,S nk = k − X l =0 Z τ + n τ − n ∆ ( x ) f ( l )0 (¯ τ ) l ! θ − r − kx,n ( x − ¯ τ ) l (cid:18) − rj (cid:19) ( p n ( x ) − k d x + Z τ + n τ − n ∆ ( x ) f ( l )0 ( η n,x,k ) k ! θ − r − kx,n ( x − ¯ τ ) k (cid:18) − rk (cid:19) ( p n ( x ) − k d x. Now we see the dominating term is the first term in S n since all other termsare of higher orders, and | θ x,n − | = o p (1) uniformly locally in x in view ofTheorem 3.7. We denote this term Q n . 
Note that 1/g_0(x) = 1/g_0(x_0) + o_p(1) uniformly for x in a neighborhood of x_0, and that ĝ_n is piecewise linear, yielding

Q_n / (−r f_0(τ̄)) = ∫_{τ_n^-}^{τ_n^+} ∆(x) (1/g_0(x)) (ĝ_n(x) − g_0(x)) dx
  = (1/g_0(x_0) + o_p(1)) ∫_{τ_n^-}^{τ_n^+} ∆(x) (ĝ_n(x) − g_0(x)) dx
  = (1/g_0(x_0) + o_p(1)) [ (ĝ_n(τ̄) − g_0(τ̄)) ∫_{τ_n^-}^{τ_n^+} ∆(x) dx + (ĝ_n'(τ̄) − g_0'(τ̄)) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄) dx − Σ_{j=2}^{k} (g_0^{(j)}(τ̄)/j!) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx − ∫_{τ_n^-}^{τ_n^+} ε_n(x) ∆(x)(x − τ̄)^k dx ],

where the first two terms in the bracket are zero by construction of ∆. Now note that

∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx = 0, if j = 0 or j is odd;
  = (j / (2^{j+2} (j+1)(j+2))) (τ_n^+ − τ_n^-)^{j+2}, if j > 0 and j is even,

and that g_0^{(j)}(τ̄) = (1/(k−j)!) (g_0^{(k)}(x_0) + o_p(1)) (τ̄ − x_0)^{k−j}. This means that for j ≥ 2, j even,

(g_0^{(j)}(τ̄)/j!) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx = (j (g_0^{(k)}(x_0) + o_p(1)) / ((k−j)! (j+2)! 2^{j+2})) (τ̄ − x_0)^{k−j} (τ_n^+ − τ_n^-)^{j+2}
  = (j (g_0^{(k)}(x_0) + o_p(1)) / ((k−j)! (j+2)! 2^{j+2})) O_p(1) (τ_n^+ − τ_n^-)^{k+2}.

Further noting that ||ε_n||_∞ = o_p(1) as τ_n^+ − τ_n^- →_p 0, we get Q_n = O_p((τ_n^+ − τ_n^-)^{k+2}). This establishes the first claim. The proof for R_{n,2} follows the same line as that of Lemma 4.4 in Balabdaoui, Rufibach and Wellner (2009), pages 1318-1319.

Lemma 7.5. We have the following:

f_0^{(j)}(x_0) = j! binom(−r, j) g_0(x_0)^{−r−j} (g_0'(x_0))^j, 1 ≤ j ≤ k − 1,
f_0^{(k)}(x_0) = k! binom(−r, k) g_0(x_0)^{−r−k} (g_0'(x_0))^k − r g_0(x_0)^{−r−1} g_0^{(k)}(x_0).
Proof.
This follows from direct calculation.
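As a numerical sanity check of Lemma 7.5 (a sketch, not part of the original argument): for k = 2 both displayed identities are exact consequences of the chain rule applied to f_0 = g_0^{−r}. The snippet below compares central finite differences of f_0 with the claimed expressions; the quadratic g_0, the value r = 2, and the point x_0 = 1 are arbitrary test choices, not from the paper.

```python
def binom_neg(r, j):
    # generalized binomial coefficient C(-r, j) = (-r)(-r-1)...(-r-j+1)/j!
    out = 1.0
    for i in range(j):
        out *= (-r - i) / (i + 1)
    return out

r = 2.0
g = lambda x: 2 + 3 * x + x ** 2      # a convex, positive g_0 near x_0
gp = lambda x: 3 + 2 * x              # g_0'
gpp = lambda x: 2.0                   # g_0^{(k)} with k = 2
f = lambda x: g(x) ** (-r)            # f_0 = g_0^{-r}

x0, h = 1.0, 1e-4
f1 = (f(x0 + h) - f(x0 - h)) / (2 * h)                 # numeric f_0'
f2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2      # numeric f_0''

# Lemma 7.5 with k = 2: the j = 1 formula, then the k-th derivative formula
lem1 = 1 * binom_neg(r, 1) * g(x0) ** (-r - 1) * gp(x0)
lem2 = (2 * binom_neg(r, 2) * g(x0) ** (-r - 2) * gp(x0) ** 2
        - r * g(x0) ** (-r - 1) * gpp(x0))

print(abs(f1 - lem1) < 1e-6, abs(f2 - lem2) < 1e-4)  # True True
```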
Lemma 7.6. For any M > 0, we have

sup_{|t| ≤ M} |ĝ_n'(x_0 + s_n t) − g_0'(x_0)| = O_p(s_n^{k−1}),
sup_{|t| ≤ M} |ĝ_n(x_0 + s_n t) − g_0(x_0) − s_n t g_0'(x_0)| = O_p(s_n^k).

The proof is identical to that of Lemma 4.4 in Groeneboom, Jongbloed and Wellner (2001), so we shall omit it.
Lemma 7.7. Let

ê_n(u) := f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j − f_0(x_0) binom(−r, k) (g_0'(x_0)/g_0(x_0))^k (u − x_0)^k.

Then for any M > 0, we have sup_{|t| ≤ M} |ê_n(x_0 + s_n t)| = O_p(s_n^k).

Proof.
Note that

f̂_n(u) − f_0(x_0) = f_0(x_0) [ f̂_n(u)/f_0(x_0) − 1 ] = f_0(x_0) [ (ĝ_n(u)/g_0(x_0))^{−r} − 1 ]
  = f_0(x_0) ( Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j + Ψ̂_{k,n,1}(u) ),   (7.1)

where Ψ̂_{k,n,1}(u) := Σ_{j ≥ k+1} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j = Σ_{j ≥ k+1} binom(−r, j) g_0(x_0)^{−j} (ĝ_n(u) − g_0(x_0))^j. Note that

(ĝ_n(u) − g_0(x_0))^j = ( ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) + (u − x_0) g_0'(x_0) )^j
  = Σ_{l=1}^{j} binom(j, l) [ ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) ]^l (u − x_0)^{j−l} g_0'(x_0)^{j−l} + (u − x_0)^j g_0'(x_0)^j
  = O_p(s_n^{kl} · s_n^{j−l}) + O_p(s_n^j) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}
  = O_p(n^{−j/(2k+1)}),

if j ≥ k + 1. Here the third line follows from Lemma 7.6. This implies Ψ̂_{k,n,1}(u) = o_p(n^{−k/(2k+1)}) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}. Using the same expansion in the first term on the right-hand side of (7.1), we arrive at

f̂_n(u) − f_0(x_0)   [=: (1)]
  = f_0(x_0) Σ_{j=1}^{k} binom(−r, j) g_0(x_0)^{−j} Σ_{l=1}^{j} binom(j, l) [ ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) ]^l (u − x_0)^{j−l} g_0'(x_0)^{j−l}   [=: (2)]
  + f_0(x_0) Σ_{j=1}^{k} binom(−r, j) (g_0'(x_0)/g_0(x_0))^j (u − x_0)^j   [=: (3)]
  + f_0(x_0) Ψ̂_{k,n,1}(u).   [=: (4)]

By Lemma 7.5, we see that ê_n(u) = (1) − (3) = (2) + (4) = O_p(s_n^k) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}. This yields the desired result. We are now ready for the proof of Theorem 6.4.
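The control of the remainder Ψ̂_{k,n,1} above rests on the elementary fact that truncating the series (1 + δ)^{−r} = Σ_{j≥0} binom(−r, j) δ^j after the δ^k term leaves an error of order δ^{k+1}. A quick numeric sketch (r, k and the grid of δ values are arbitrary illustrative choices):

```python
def binom_neg(r, j):
    # generalized binomial coefficient C(-r, j)
    out = 1.0
    for i in range(j):
        out *= (-r - i) / (i + 1)
    return out

r, k = 1.5, 3
for delta in (1e-1, 1e-2, 1e-3):
    partial = sum(binom_neg(r, j) * delta ** j for j in range(k + 1))
    err = abs((1 + delta) ** (-r) - partial)
    # the truncation error behaves like |C(-r, k+1)| * delta^(k+1),
    # so this ratio stabilizes near |C(-1.5, 4)| ≈ 2.46 as delta shrinks
    ratio = err / delta ** (k + 1)
    print(delta, ratio)
```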
Proof of Theorem 6.4.
For the first assertion, note that

[f_0(x_0)]^{-1} ( f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )
  = [f_0(x_0)]^{-1} ( f̂_n(u) − f_0(x_0) − Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )
  = [f_0(x_0)]^{-1} ( f_0(x_0) ( Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j + Ψ̂_{k,n,1}(u) ) − Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )   [by (7.1)]
  = Ψ̂_{k,n,1}(u) + Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j
  = Ψ̂_{k,n,1}(u) + binom(−r, 1)(ĝ_n(u)/g_0(x_0) − 1) − (f_0'(x_0)/f_0(x_0))(u − x_0) + Σ_{j=2}^{k} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=2}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j
  = −(r/g_0(x_0)) ( ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0) ) + Ψ̂_{k,n,2}(u),

where

Ψ̂_{k,n,2}(u) := Ψ̂_{k,n,1}(u) + Σ_{j=2}^{k} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=2}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j.

Now we calculate ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv. First, |∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,1}(u) du dv| ≤ (t²/2) n^{−2/(2k+1)} sup_{u ∈ l_{n,x_0}} |Ψ̂_{k,n,1}(u)| = o_p(r_n^{-1}). By Lemma 7.5, [f_0(x_0)]^{-1} f_0^{(j)}(x_0)/j! = binom(−r, j)(g_0'(x_0)/g_0(x_0))^j for 2 ≤ j ≤ k − 1, so expanding (ĝ_n(u)/g_0(x_0) − 1)^j as in the proof of Lemma 7.7 gives

∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv
  = o_p(r_n^{-1}) + binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ∫_{l_{n,x_0}} ∫_{x_0}^{v} (u − x_0)^k du dv
  + ( Σ_{j=2}^{k} binom(−r, j) g_0(x_0)^{−j} ∫_{l_{n,x_0}} ∫_{x_0}^{v} Σ_{l=1}^{j} binom(j, l)(ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0))^l (u − x_0)^{j−l} [g_0'(x_0)]^{j−l} du dv )
  =: o_p(r_n^{-1}) + (2) + (1).

Consider (1): for each (j, l) satisfying 1 ≤ l ≤ j ≤ k and j ≥ 2, we have

r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} (ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0))^l (u − x_0)^{j−l} [g_0'(x_0)]^{j−l} du dv
  = n^{(k+2)/(2k+1)} · O(n^{−2/(2k+1)}) · O_p(n^{−kl/(2k+1)}) · O_p(n^{−(j−l)/(2k+1)}) = O_p(n^{−(k(l−1)+(j−l))/(2k+1)}) = o_p(1).

Consider (2) as follows:

(2) = binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ∫_{l_{n,x_0}} ∫_{x_0}^{v} (u − x_0)^k du dv = (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k t^{k+2} r_n^{-1}.

Hence we have

r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv = (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k t^{k+2} + o_p(1).

Note by definition we have

(7.2)  Y_n^{locmod}(t) = Y_n^{loc}(t)/f_0(x_0) − r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv.

Letting n → ∞, by the same calculation as in the proof of Theorem 6.2 of Groeneboom, Jongbloed and Wellner (2001), we have

Y_n^{locmod}(t) →_d (1/√(f_0(x_0))) ∫_0^t W(s) ds + [ f_0^{(k)}(x_0)/((k+2)! f_0(x_0)) − (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ] t^{k+2}
  = (1/√(f_0(x_0))) ∫_0^t W(s) ds − (r g_0^{(k)}(x_0)/(g_0(x_0)(k+2)!)) t^{k+2},

where the last line follows from Lemma 7.5. Now we turn to the second assertion. It is easy to check by the definition of Ψ̂_{k,n,2}(·) that

(7.3)  H_n^{locmod}(t) = H_n^{loc}(t)/f_0(x_0) − r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv.

On the other hand, a simple calculation yields that Y_n^{loc}(t) − H_n^{loc}(t) = r_n ( H_n(x_0 + s_n t) − Ĥ_n(x_0 + s_n t) ) ≥ 0, so it suffices to establish tightness of {Â_n} and {B̂_n}. By Theorem 7.3, we can find M > 0 and τ ∈ S(ĝ_n) such that 0 ≤ τ − x_0 ≤ M n^{−1/(2k+1)} with large probability. Now note that

|Â_n| ≤ r_n s_n |(F̂_n(x_0) − F̂_n(τ)) − (F_n(x_0) − F_n(τ))| + r_n s_n / n
  ≤ r_n s_n |∫_{x_0}^{τ} ( f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j ) du| + r_n s_n |∫_{x_0}^{τ} ( Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j − f_0(u) ) du| + r_n s_n |∫_{x_0}^{τ} d(F_n − F_0)| + n^{−k/(2k+1)}
  =: Â_{n,1} + Â_{n,2} + Â_{n,3} + n^{−k/(2k+1)}.

We estimate the three terms respectively:

Â_{n,1} ≤ r_n s_n |∫_{x_0}^{τ} ê_n(u) du| + r_n s_n |∫_{x_0}^{τ} f_0(x_0) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k (u − x_0)^k du| = O_p(r_n s_n · s_n^{k+1}) + O_p(r_n s_n · s_n^{k+1}) = O_p(1),

by Lemma 7.7;

Â_{n,2} ≤ r_n s_n |∫_{x_0}^{τ} (f_0^{(k)}(x_0)/k!)(u − x_0)^k du| + r_n s_n |∫_{x_0}^{τ} (u − x_0)^k ε_n(u) du| = O_p(1),

since ||ε_n||_∞ →_p 0 as τ − x_0 →_p 0. For Â_{n,3}, we follow the lines of Lemma 4.1 of Balabdaoui, Rufibach and Wellner (2009) again to conclude. Fix R > 0, and consider the function class F_{x_0,R} := { 1_{[x_0, y]} : x_0 ≤ y ≤ x_0 + R }. Then F_{x_0,R}(z) := 1_{[x_0, x_0+R]}(z) is an envelope function for F_{x_0,R}, with ∫ F_{x_0,R}(z) dz = ∫_{x_0}^{x_0+R} dz = R. Now taking s = k and d = 1 in Lemma 4.1 of Balabdaoui, Rufibach and Wellner (2009), we have

Â_{n,3} = r_n s_n |∫_{x_0}^{τ} d(F_n − F_0)(z)| ≤ r_n s_n ( |τ − x_0|^{k+1} + O_p(1) n^{−(k+1)/(2k+1)} ) = O_p(1).

This completes the proof of tightness for {Â_n}; {B̂_n} follows from a similar argument, so we omit the details.

7.3. Auxiliary convex analysis.
Lemma 7.8. For any ϕ(·) ∈ G with non-empty domain and any ε > 0, define

ϕ^{(ε)}(x) := sup_{(v,c)} (v^T x + c),

where the supremum is taken over all pairs (v, c) ∈ R^d × R such that
1. ||v|| ≤ 1/ε;
2. ϕ(y) ≥ v^T y + c holds for all y ∈ R^d.
Then ϕ^{(ε)} ∈ G with Lipschitz constant 1/ε. Furthermore, ϕ^{(ε)} ↗ ϕ as ε ↘ 0, where the convergence is pointwise for all x ∈ R^d.

Lemma 7.9. Given Q ∈ Q, a point x ∈ R^d is an interior point of csupp(Q) if and only if

h(Q, x) ≡ sup{ Q(C) : C ⊂ R^d closed and convex, x ∉ int(C) } < 1.

Moreover, if {Q_n} ⊂ Q converges weakly to Q, then lim sup_{n→∞} h(Q_n, x) ≤ h(Q, x) holds for all x ∈ R^d.

Lemma 7.10. If g ∈ G, then there exist a, b > 0 such that for all x ∈ R^d,

g(x) ≥ a||x|| − b.

Proof.
The proof is essentially the same as that of Lemma 1 in Cule and Samworth (2010), so we shall omit it.

Consider the class of functions

G_M := { g ∈ G : ∫ g^β dx ≤ M }.

Lemma 7.11. For a given g ∈ G_M, denote by D_r := D(g, r) := {g ≤ r} the level set of g(·) at level r, and let ε_0 := inf g. Then for r > ε_0, we have

λ(D_r) ≤ M(−s)(r − ε_0)^d / ( (s+1) ∫_0^{r−ε_0} v^d (v + ε_0)^{1/s} dv ),

where β = 1 + 1/s and −1 < s < 0.

Proof.
For u ∈ [ε_0, r], by convexity of g(·), we have

λ(D_u) ≥ ((u − ε_0)/(r − ε_0))^d λ(D_r).

This can be seen as follows. Consider the epigraph Γ_g of g(·), where Γ_g = {(t, x) ∈ R^d × R : x ≥ g(t)}. Let x_0 ∈ R^d be a minimizer of g. Consider the convex set

C_r := conv( Γ_g ∩ (R^d × {r}), (x_0, ε_0) ) ⊂ Γ_g,

where the inclusion follows from the convexity of Γ_g as a subset of R^{d+1}. The claimed inequality follows from

λ_d(D_u) = λ_d( π_d( Γ_g ∩ (R^d × {u}) ) ) ≥ λ_d( π_d( C_r ∩ (R^d × {u}) ) ) = ((u − ε_0)/(r − ε_0))^d λ_d(D_r),

where π_d : R^d × R → R^d is the natural projection onto the first component. Now we calculate as follows:

M ≥ ∫_{D_r} ( g(x)^{1/s+1} − r^{1/s+1} ) dx
  = −(1/s + 1) ∫_{D_r} ( ∫_{ε_0}^{r} 1(u ≥ g(x)) u^{1/s} du ) dx
  = −(1/s + 1) ∫_{ε_0}^{r} u^{1/s} ( ∫_{D_r} 1(u ≥ g(x)) dx ) du
  = −(1/s + 1) ∫_{ε_0}^{r} λ(D_u) u^{1/s} du
  ≥ −(1/s + 1) ∫_{ε_0}^{r} ((u − ε_0)/(r − ε_0))^d λ(D_r) u^{1/s} du
  = λ(D_r) · (s+1) ∫_{ε_0}^{r} (u − ε_0)^d u^{1/s} du / ( (−s)(r − ε_0)^d ).

By the change of variable v = u − ε_0 in the last integral we get the desired inequality.
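The bound of Lemma 7.11 is easy to check numerically in d = 1. The sketch below uses the hypothetical choice g(x) = 1 + |x| with s = −1/4, so that β = 1 + 1/s = −3, ε_0 = inf g = 1, M = ∫_R (1 + |x|)^{−3} dx = 1, and λ(D_r) = 2(r − 1) for r > 1; none of these choices come from the paper.

```python
def lemma_bound(r, s, eps0, M, d=1, n=100000):
    # right-hand side of Lemma 7.11:
    #   M(-s)(r - eps0)^d / ((s + 1) * ∫_0^{r - eps0} v^d (v + eps0)^(1/s) dv),
    # with the integral evaluated by a midpoint Riemann sum
    a = r - eps0
    h = a / n
    integral = sum(((i + 0.5) * h) ** d * ((i + 0.5) * h + eps0) ** (1.0 / s) * h
                   for i in range(n))
    return M * (-s) * a ** d / ((s + 1) * integral)

s, eps0, M = -0.25, 1.0, 1.0   # g(x) = 1 + |x|: beta = -3, ∫ g^beta dx = 1 exactly
for r in (1.5, 2.0, 4.0):
    level_set = 2 * (r - eps0)          # λ(D_r) = λ({1 + |x| <= r}) in d = 1
    assert level_set <= lemma_bound(r, s, eps0, M)
print("level-set bound holds for the test values")
```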
Lemma 7.12. Let G be a convex set in R^d with non-empty interior, and let {y_n}_{n∈N} be a sequence with ||y_n|| → ∞ as n → ∞. Then there exist {x_1, ..., x_d} ⊂ G such that

λ_d( conv(x_1, ..., x_d, y_{n(k)}) ) → ∞, as k → ∞,

where {y_{n(k)}}_{k∈N} is a suitable subsequence of {y_n}_{n∈N}.
Proof.
Without loss of generality we assume 0 ∈ int(G). We first choose a convergent subsequence {y_{n(k)}/||y_{n(k)}||}_{k∈N} from {y_n/||y_n||}_{n∈N}. Now if we let a := lim_{k→∞} y_{n(k)}/||y_{n(k)}||, then ||a|| = 1. Since G has non-empty interior, {a^T x = 0} ∩ G has non-empty relative interior. Thus we can choose x_1, ..., x_d ∈ {a^T x = 0} ∩ G such that λ_{d−1}(K) ≡ λ_{d−1}( conv(x_1, ..., x_d) ) > 0. Note that

dist( y_{n(k)}, aff(K) ) = dist( y_{n(k)}, {a^T x = 0} ) = |⟨y_{n(k)}, a⟩| = ||y_{n(k)}|| · |⟨y_{n(k)}/||y_{n(k)}||, a⟩| → ∞, as k → ∞.

Since

λ_d( conv(x_1, ..., x_d, y_{n(k)}) ) = λ_d( conv(K, y_{n(k)}) ) = c λ_{d−1}(K) · dist( y_{n(k)}, aff(K) )

for some constant c = c(d) > 0, the proof is complete as we let k → ∞.

Lemma 7.13. Let ḡ and {g_n}_{n∈N} be functions in G such that g_n ≥ ḡ for all n ∈ N. Suppose the set C := {x ∈ R^d : lim sup_{n→∞} g_n(x) < ∞} is non-empty. Then there exist a subsequence {g_{n(k)}}_{k∈N} of {g_n}_{n∈N} and a function g ∈ G such that C ⊂ dom(g) and

(7.4)  lim_{k→∞, x→y} g_{n(k)}(x) = g(y), for all y ∈ int(dom(g)),
       lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ g(y), for all y ∈ R^d.

Lemma 7.14. Let {g_n} be a sequence of non-negative convex functions satisfying the following conditions:
(A1). There exists a convex set G with non-empty interior such that for all x ∈ int(G), we have sup_{n∈N} g_n(x) < ∞.
(A2). There exists some M > 0 such that sup_{n∈N} ∫ (g_n(x))^β dx ≤ M < ∞.
Then there exist a, b > 0 such that for all x ∈ R^d and k ∈ N,

g_{n(k)}(x) ≥ a||x|| − b,

where {g_{n(k)}}_{k∈N} is a suitable subsequence of {g_n}_{n∈N}.
Without loss of generality we may assume that G is contained in int(dom(g_n)) for all n. We first note that (A1)-(A2) imply that {x̂_n ∈ Argmin_{x∈R^d} g_n(x)}_{n=1}^∞ is a bounded sequence, i.e.

(7.5)  sup_{n∈N} ||x̂_n|| < ∞.

Suppose not; then without loss of generality we may assume ||x̂_n|| → ∞ as n → ∞. By Lemma 7.12, we can choose {x_1, ..., x_d} ⊂ G such that λ_d( conv(x_1, ..., x_d, x̂_{n(k)}) ) → ∞ as k → ∞ for some subsequence {x̂_{n(k)}} ⊂ {x̂_n}. For simplicity of notation we think of {x̂_n} as such an appropriate subsequence. Denote ε_n := inf_{x∈R^d} g_n(x), and M_0 := sup_{n∈N} ε_n, which is finite since ε_n ≤ g_n(x) and sup_{n∈N} g_n(x) < ∞ for any fixed x ∈ int(G) by (A1). Again by (A1) and convexity we may assume that

sup_{x ∈ conv(x_1, ..., x_d, x̂_n)} g_n(x) ≤ M_1

holds for some M_1 > 0 and all n ∈ N. Since β < 0, this implies that

∫ g_n^β(x) dx ≥ M_1^β λ_d( conv(x_1, ..., x_d, x̂_n) ) → ∞, as n → ∞,

which contradicts (A2). This shows (7.5).

Now define g(·) to be the convex hull of g̃(x) := inf_{n∈N} g_n(x); then g ≤ g_n holds for all n ∈ N. We claim that g(x) → ∞ as ||x|| → ∞. By Lemma 7.11, for fixed η > 1, we have

λ_d( D(g_n, ηM_0) ) ≤ M(−s)(ηM_0 − ε_n)^d / ( (s+1) ∫_0^{ηM_0 − ε_n} v^d (v + ε_n)^{1/s} dv ) ≤ M(−s)(ηM_0)^d / ( (s+1) ∫_0^{(η−1)M_0} v^d (v + M_0)^{1/s} dv ) < ∞,

where D(g_n, ηM_0) := {g_n ≤ ηM_0}. Hence

(7.6)  sup_{n∈N} λ_d( D(g_n, ηM_0) ) < ∞

holds for every η > 1. Now combining (7.5) and (7.6), we claim that, for fixed η large enough, it is possible to find R = R(η) > 0 such that

(7.7)  g_n(x) ≥ ηM_0 holds for all ||x|| ≥ R(η) and n ∈ N.

If this is not true, then for every k ∈ N we can find n(k) ∈ N and x̄_k ∈ R^d with ||x̄_k|| ≥ k such that g_{n(k)}(x̄_k) ≤ ηM_0. We consider two cases to derive a contradiction.

[Case 1.] If for some n ∈ N there exist infinitely many k ∈ N with n(k) = n, then we may assume without loss of generality that we can find a sequence {x̄_k}_{k∈N} with ||x̄_k|| → ∞ as k → ∞ and g_n(x̄_k) ≤ ηM_0. Since the support of g_n has non-empty interior, by Lemma 7.12 we can find x_1, ..., x_d ∈ supp(g_n) such that λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞ as j → ∞ holds for some subsequence {x̄_{k(j)}}_{j∈N} of {x̄_k}_{k∈N}. Let M̄ := max_{1≤i≤d} g_n(x_i); then we find λ_d( D(g_n, M̄ ∨ ηM_0) ) = ∞. This contradicts (7.6).

[Case 2.] If #{k ∈ N : n = n(k)} < ∞ for all n ∈ N, then without loss of generality we may assume that for every k ∈ N we can find x̄_k ∈ R^d with ||x̄_k|| ≥ k such that g_k(x̄_k) ≤ ηM_0. Recall that by assumption (A1) the convex set G has non-empty interior and is contained in the support of g_n for all n ∈ N. Again by Lemma 7.12, we may take x_1, ..., x_d ∈ G such that λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞ as j → ∞ holds for some subsequence {x̄_{k(j)}}_{j∈N} of {x̄_k}_{k∈N}. In view of (A1), we conclude by convexity that M̄ := max_{1≤i≤d} sup_{j∈N} g_{k(j)}(x_i) < ∞. This implies

λ_d( D(g_{k(j)}, M̄ ∨ ηM_0) ) ≥ λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞, as j → ∞,

which gives a contradiction.

Combining these two cases, we have proved (7.7). This implies that g̃(x) → ∞ as ||x|| → ∞, which verifies the claim that g(x) → ∞ as ||x|| → ∞. Hence in view of Lemma 7.10, there exist a, b > 0 such that g_n(x) ≥ g(x) ≥ a||x|| − b holds for all x ∈ R^d and n ∈ N.

Lemma 7.15. Assume x_0, ..., x_d ∈ R^d are in general position. If g(·) is a non-negative convex function with ∆ ≡ conv(x_0, ..., x_d) ⊂ dom(g) and g(x_0) = 0, then for r ≥ d we have ∫_∆ (g(x))^{−r} dx = ∞.

Proof.
We may assume without loss of generality that x_0 = 0 and x_i = e_i ∈ R^d, where e_i is the unit vector with 1 in its i-th coordinate and 0 otherwise. Then ∆ = ∆_0 := {x ∈ R^d : Σ_{i=1}^d x_i ≤ 1, x_i ≥ 0, ∀ i = 1, ..., d}. Denote a_i := g(x_i) ≥ 0. We may assume that at least one a_i > 0, since otherwise g vanishes on ∆_0 by convexity and the claim is trivial. Then by convexity of g we find g(x) ≤ Σ_{i=1}^d a_i x_i for all x ∈ ∆_0. This gives

∫_{∆_0} (g(x))^{−r} dx ≥ ∫_{∆_0} ( Σ_{i=1}^d a_i x_i )^{−r} dx ≥ (max_{i=1,...,d} a_i)^{−r} ∫_{∆_0} ||x||_1^{−r} dx ≥ (max_{i=1,...,d} a_i)^{−r} d^{−r/2} ∫_C ||x||_2^{−r} dx = ∞,

where C := {||x||_2 ≤ 1/√d} ∩ {x_i ≥ 0, i = 1, ..., d}. Note we used the facts that ||x||_1 ≤ √d ||x||_2, that consequently C ⊂ ∆_0, and that ∫_C ||x||_2^{−r} dx = ∞ for r ≥ d.

Lemma 7.16. Let f_n →_d f_0, and let D be the class of all Borel measurable convex subsets of R^d. Then

lim_{n→∞} sup_{D∈D} |∫_D (f_n − f_0) dλ| = 0.

Acknowledgements.
The authors owe thanks to Charles Doss, Roger Koenker and Richard Samworth, as well as two referees and an Associate Editor, for helpful comments, suggestions and minor corrections.
References.
Avriel, M. (1972). r-convex functions. Math. Programming.

Balabdaoui, F., Rufibach, K. and Wellner, J. A. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist.

Balabdaoui, F. and Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann. Statist.

Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika.

Bhattacharya, R. N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York.

Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields.

Borell, C. (1974). Convex measures on locally convex spaces. Ark. Mat.

Borell, C. (1975). Convex set functions in d-space. Period. Math. Hungar.

Brascamp, H. J. and Lieb, E. H. (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis.

Brunk, H. D. (1970). Estimation of isotonic regression. In Nonparametric Techniques in Statistical Inference (Proc. Sympos., Indiana Univ., Bloomington, Ind., 1969). Cambridge Univ. Press.

Cule, M. L. and Dümbgen, L. (2008). On an auxiliary function for log-density estimation. arXiv preprint arXiv:0807.4719.

Cule, M. and Samworth, R. (2010). Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat.

Cule, M., Samworth, R. and Stewart, M. (2010). Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B Stat. Methodol.

Das Gupta, S. (1976). S-unimodal function: related inequalities and statistical applications. Sankhyā Ser. B.

Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity, and Applications. Academic Press, Boston, MA.

Doss, C. and Wellner, J. A. (2016). Global rates of convergence of the MLEs of log-concave and s-concave densities. Ann. Statist., to appear.

Dudley, R. M. (2002). Real Analysis and Probability. Cambridge Univ. Press, Cambridge.

Dümbgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli.

Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011). Approximation by log-concave distributions, with applications to regression. Ann. Statist.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983). Wadsworth.

Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001). Estimation of a convex function: characterizations and asymptotic theory. Ann. Statist.

Guntuboyina, A. and Sen, B. (2015). Global risk bounds and adaptation in univariate convex regression. Probab. Theory Related Fields.

Jongbloed, G. (2000). Minimax lower bounds and moduli of continuity. Statist. Probab. Lett.

Kim, A. K. and Samworth, R. J. (2015). Global rates of convergence in log-concave density estimation. arXiv preprint arXiv:1404.2298v2.

Koenker, R. and Mizera, I. (2010). Quasi-concave density estimation. Ann. Statist.

Koenker, R. and Mizera, I. (2014). Convex optimization in R. Journal of Statistical Software.

Lang, R. (1986). A note on the measurability of convex sets. Arch. Math. (Basel).

MOSEK ApS, Denmark.

Pal, J. K., Woodroofe, M. and Meyer, M. (2007). Estimating a Polya frequency function_2. In Complex Datasets and Inverse Problems. IMS Lecture Notes Monogr. Ser.

Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser. A.

Rinott, Y. (1976). On convexity of measures. Ann. Probability.

Rockafellar, R. T. (1971). Integrals which are convex functionals. II. Pacific J. Math.

Rockafellar, R. T. (1997). Convex Analysis. Princeton Univ. Press, Princeton, NJ.

Schuhmacher, D., Hüsler, A. and Dümbgen, L. (2011). Multivariate log-concave distributions as a nearly parametric model. Stat. Risk Model.

Seijo, E. and Sen, B. (2011). Nonparametric least squares estimation of a multivariate convex regression function. Ann. Statist.

Seregin, A. and Wellner, J. A. (2010). Nonparametric estimation of multivariate convex-transformed densities. Ann. Statist.

Uhrin, B. (1984). Some remarks about the convolution of unimodal functions. Ann. Probab.

van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press, Cambridge.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.

Walther, G. (2002). Detecting the presence of mixing with multiscale maximum likelihood. J. Amer. Statist. Assoc.

Wright, F. T. (1981). The asymptotic behavior of monotone regression estimates. Ann. Statist.

Department of Statistics, Box 354322
University of Washington
Seattle, WA 98195-4322
E-mail: [email protected]

Department of Statistics, Box 354322
University of Washington
Seattle, WA 98195-4322
E-mail: [email protected]