Approximation and Estimation of s-Concave Densities via Rényi Divergences
Submitted to the Annals of Statistics
By Qiyang Han and Jon A. Wellner*
University of Washington
In this paper, we study the approximation and estimation of s-concave densities via Rényi divergence. We first show that the approximation of a probability measure Q by an s-concave density exists and is unique via the procedure of minimizing a divergence functional proposed by Koenker and Mizera (2010) if and only if Q admits full-dimensional support and a first moment. We also show continuity of the divergence functional in Q: if Q_n → Q in the Wasserstein metric, then the projected densities converge in weighted L_1 metrics and uniformly on closed subsets of the continuity set of the limit. Moreover, directional derivatives of the projected densities also enjoy local uniform convergence. This covers both on-the-model and off-the-model situations, and entails strong consistency of the divergence estimator of an s-concave density under mild conditions. One interesting and important feature of the Rényi divergence estimator of an s-concave density is that the estimator is intrinsically related to the estimation of log-concave densities via maximum likelihood methods. In fact, we show that for d = 1 at least, the Rényi divergence estimators for s-concave densities converge to the maximum likelihood estimator of a log-concave density as s ր 0. The Rényi divergence estimator shares similar characterizations with the MLE for log-concave distributions, which allows us to develop pointwise asymptotic distribution theory assuming that the underlying density is s-concave.

*Supported in part by NSF Grant DMS-1104832 and NIAID Grant R01 AI029168.
MSC 2010 subject classifications:
Primary 62G07, 62H12; secondary 62G05, 62G20
Keywords and phrases: s-concavity, consistency, projection, asymptotic distribution, mode estimation, nonparametric estimation, shape constraints
CONTENTS
1 Introduction
  1.1 Overview
  1.2 Notation
2 Theoretical properties of the divergence estimator
  2.1 Existence and uniqueness
  2.2 Weighted global convergence in ‖·‖_{L_1} and ‖·‖_∞
  2.3 Characterization of the Rényi divergence projection and estimator
  2.4 Continuity of the Rényi divergence estimator in s
3 Limit behavior of s-concave densities
  3.1 Limit characterization via dimensionality condition
  3.2 Modes of convergence
  3.3 Local convergence of directional derivatives
4 Limiting distribution theory of the divergence estimator
  4.1 Limit distribution theory
  4.2 Estimation of the mode
5 Discussion
  5.1 Behavior of Rényi projection for generic measures Q when s < −1/(d+1)
  5.2 Global rates of convergence for Rényi divergence estimators
  5.3 Conjectures about the global rates in higher dimensions
  5.4 Adaptive estimation of concave-transformed class of functions
6 Proofs
  6.1 Proofs for Section 2
  6.2 Proofs for Section 3
  6.3 Proofs for Section 4
7 Appendix
  7.1 Proofs of Lemmas 6.2 and 6.3
  7.2 Proof of Theorem 6.4
  7.3 Auxiliary convex analysis
Acknowledgements
References
Authors' addresses
1. Introduction.
1.1. Overview.
The class of s-concave densities on R^d is defined via the generalized means of order s as follows. Let

M_s(a, b; θ) :=
  ((1 − θ)a^s + θb^s)^{1/s},  s ≠ 0, a, b > 0,
  0,                          s < 0, ab = 0,
  a^{1−θ} b^θ,                s = 0,
  a ∧ b,                      s = −∞.

Then a density p(·) on R^d is called s-concave, i.e. p ∈ P_s, if and only if for all x_0, x_1 ∈ R^d and θ ∈ (0, 1), p((1 − θ)x_0 + θx_1) ≥ M_s(p(x_0), p(x_1); θ). This definition apparently goes back to Avriel (1972), with further studies by Borell (1974, 1975), Das Gupta (1976), Rinott (1976), and Uhrin (1984); see also Dharmadhikari and Joag-Dev (1988) for a nice summary. It is easy to see that the densities p(·) have the form p = ϕ_+^{1/s} for some concave function ϕ if s > 0, p = exp(ϕ) for some concave ϕ if s = 0, and p = ϕ_+^{1/s} for some convex ϕ if s < 0. The function classes P_s are nested in s: for every r > 0 > s, we have P_r ⊂ P_0 ⊂ P_s ⊂ P_{−∞}.

Nonparametric estimation of s-concave densities has been under intense research effort in recent years. In particular, much attention has been paid to estimation in the special case s = 0, which corresponds to the class of all log-concave densities on R^d. The nonparametric maximum likelihood estimator (MLE) of a log-concave density was studied in the univariate setting by Walther (2002), Dümbgen and Rufibach (2009), and Pal, Woodroofe and Meyer (2007), and in the multivariate setting by Cule, Samworth and Stewart (2010) and Cule and Samworth (2010). The limiting distribution theory at fixed points when d = 1 was studied in Balabdaoui, Rufibach and Wellner (2009), and rate results appear in Doss and Wellner (2016) and Kim and Samworth (2015). Dümbgen, Samworth and Schuhmacher (2011) also studied stability properties of the MLE projection of any probability measure onto the class of log-concave densities.

Compared with the well-studied log-concave densities (i.e. s = 0), much remains unknown concerning estimation and inference procedures for the larger classes P_s, s <
0. One important feature of this larger class is that the densities in P_s (s < 0) are allowed to have heavier and heavier tails as s → −∞. In fact, t-distributions with ν degrees of freedom belong to P_{−1/(ν+1)}(R) (and hence also to P_s(R) for any s < −1/(ν+1)). The study of maximum likelihood estimators (MLE's in the following) for general s-concave densities in Seregin and Wellner (2010) shows that the MLE exists and is consistent for s ∈ (−1, ∞). However, there is no known result about uniqueness of the MLE of s-concave densities except for s = 0. The difficulties in the theory of estimation via the MLE lie in the fact that we still have very little knowledge of 'good' characterizations of the MLE in the s-concave setting. This has hindered further development of both theoretical and statistical properties of the estimation procedure.

Some alternative approaches to estimation of s-concave densities have been proposed in the literature, using divergences other than the log-likelihood functional (the Kullback-Leibler divergence in some sense). Koenker and Mizera (2010) proposed an alternative to maximum likelihood based on generalized Rényi entropies. Similar procedures were also proposed in parametric settings by Basu et al. (1998) using a family of discrepancy measures. In our setting of s-concave densities with s <
0, the methods of Koenker and Mizera (2010) can be formulated as follows. Given i.i.d. observations X = (X_1, ..., X_n), consider the primal optimization problem (P):

(1.1)    (P)   min_{g ∈ G(X)} L(g, Q_n) ≡ (1/n) Σ_{i=1}^n g(X_i) + (1/|β|) ∫_{R^d} g(x)^β dx,

where G(X) denotes all non-negative closed convex functions supported on the convex set conv(X), Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} is the empirical measure, and β = 1 + 1/s <
0. As is shown by Koenker and Mizera (2010), the associated dual problem (D) is

(1.2)    (D)   max_f ∫_{R^d} (f(y))^α / α dy,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°,

where G(X)° ≡ {G ∈ C*(X) : ∫ g dG ≤ 0 for all g ∈ G(X)} is the polar cone of G(X), and α is the conjugate index of β, i.e. 1/α + 1/β = 1. Here C*(X), the space of signed Radon measures on conv(X), is the topological dual of C(X), the space of continuous functions on conv(X). We also note that the constraint G ∈ G(X)° in the dual form (1.2) comes from the 'dual' of the primal constraint g ∈ G(X), and the constraint f = d(Q_n − G)/dy can be derived from the dual computation of L(·, Q_n):

(L(·, Q_n))*(G) = sup_g ( ⟨G, g⟩ − (1/n) Σ_{i=1}^n g(X_i) − (1/|β|) ∫_{R^d} g(x)^β dx )
               = sup_g ( ⟨G − Q_n, g⟩ − ∫ ψ_s(g(x)) dx ) = Ψ_s*(G − Q_n).

Here we used the notation ⟨G, g⟩ := ∫ g dG and ψ_s(·) := (·)^β/|β|, and, for clarity, Ψ_s is the functional defined by Ψ_s(g) := ∫ ψ_s(g(x)) dx. The dual form (1.2) now follows from the well-known fact (e.g. Rockafellar (1971), Corollary 4A) that the above dual functional is given by

Ψ_s*(G) = ∫ ψ_s*(dG/dx) dx   if G is absolutely continuous with respect to Lebesgue measure,
Ψ_s*(G) = +∞                  otherwise.

For the primal problem (P) and the dual problem (D), Koenker and Mizera (2010) proved the following results:

Theorem (Koenker and Mizera (2010)). (P) admits a unique solution g_n* if int(conv(X)) ≠ ∅, where g_n* is a polyhedral convex function supported on conv(X).

Theorem (Koenker and Mizera (2010)). Strong duality between (P) and (D) holds. Any dual feasible solution is actually a density on R^d with respect to the canonical Lebesgue measure. The dual optimal solution f_n* exists, and satisfies f_n* = (g_n*)^{1/s}.

We note that the above results are all obtained in the empirical setting.
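As a quick numerical sanity check of the class membership discussed above (a t density with ν degrees of freedom lies in P_{−1/(ν+1)}), recall that for s < 0 a density p is s-concave exactly when p^s is convex. The sketch below (assuming NumPy; the density is left unnormalized, since positive constants do not affect convexity of a power) tests discrete convexity of p^s on a grid:

```python
import numpy as np

def t_density(x, nu):
    # unnormalized Student-t_nu density; constants do not affect convexity checks
    return (1.0 + x**2 / nu) ** (-(nu + 1) / 2.0)

def is_convex_on_grid(vals):
    # discrete convexity: nonnegative second differences (up to rounding)
    return np.all(np.diff(vals, 2) >= -1e-10)

nu = 3.0
s = -1.0 / (nu + 1)          # t_nu should lie in P_s for this s
x = np.linspace(-10, 10, 2001)

print(is_convex_on_grid(t_density(x, nu) ** s))
# True: p^s = (1 + x^2/nu)^{1/2} is convex

print(is_convex_on_grid(t_density(x, nu) ** (s / 2.0)))
# False: t_nu is not s'-concave for s' in (-1/(nu+1), 0)
```

The second check illustrates the nesting P_r ⊂ P_s for r > s: membership fails once the index moves strictly closer to 0 than −1/(ν+1).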
At the population level, given a probability measure Q with suitable regularity conditions, consider

(1.3)    (P_Q)   min_{g ∈ G} L_s(g, Q),   where L(g, Q) ≡ L_s(g, Q) ≡ ∫ g(x) dQ(x) + (1/|β|) ∫_{R^d} g(x)^β dx,

and G denotes the class of all (non-negative) closed convex functions with non-empty interior which are coercive in the sense that g(x) → ∞ as ‖x‖ → ∞. Koenker and Mizera (2010) showed that Fisher consistency holds at the population level: suppose Q(A) := ∫_A f_0 dλ for some f_0 = g_0^{1/s} where g_0 ∈ G; then g_0 is an optimal solution for (P_Q).

Koenker and Mizera (2010) also proposed a general discretization scheme corresponding to the primal form (1.1) and the dual form (1.2) for fast computation, by which the one-dimensional problem can be solved via linear programming and the two-dimensional problem via semi-definite programming. These have been implemented in the R package REBayes by Koenker and Mizera (2014). Koenker's package depends in turn on the
MOSEK
0. As we will see, estima-tion of s -concave distributions via R´enyi divergences is intrinsically relatedwith the estimation of log-concave distributions via maximum likelihoodmethods. In fact we show that in the empirical setting in dimension 1, theR´enyi divergence estimators converge to the maximum likelihood estimatorfor log-concave densities as s ր s -concave densities become possible. In particular, the charac-terizations developed here enable us to overcome some of the difficulties ofmaximum likelihood estimators as proposed by Seregin and Wellner (2010),and to develop limit distribution theory at fixed points assuming that theunderlying model is s -concave. The pointwise rate and limit distributionresults follow a pattern similar to the corresponding results for the MLE’sin the log-concave setting obtained by Balabdaoui, Rufibach and Wellner(2009). This local point of view also underlines the results on global rates ofconvergence considered in Doss and Wellner (2016), showing that the diffi-culty of estimation for such densities with tails light or heavy, comes almostsolely from the shape constraints, namely, the convexity-based constraints.The rest of the paper is organized as follows. In Section 2, we study thebasic theoretical properties of the approximation/projection scheme definedby the procedure (1.3). In Section 3, we study the limit behavior of s -concaveprobability measures in the setting of weak convergence under dimension-ality conditions on the supports of the limiting sequence. In Section 4, wedevelop limiting distribution theory of the divergence estimator in dimen-sion 1 under curvature conditions with tools developed in Sections 2 and 3.Related issues and further problems are discussed in Section 5. Proofs are -CONCAVE ESTIMATION given in Sections 6 and 7.1.2. Notation.
In this paper, we denote the canonical Lebesgue measure on R^d by λ or λ_d, and write ‖·‖_p for the canonical Euclidean p-norm on R^d, with ‖·‖ = ‖·‖_2 unless otherwise specified. B(x, δ) stands for the open ball of radius δ centered at x in R^d, and 1_A for the indicator function of A ⊂ R^d. We use L_p(f) ≡ ‖f‖_{L_p} ≡ ‖f‖_p = (∫ |f|^p dλ_d)^{1/p} to denote the L_p(λ_d) norm of a measurable function f on R^d if no confusion arises.

We write csupp(Q) for the convex support of a measure Q defined on R^d, i.e.

csupp(Q) = ∩ {C : C ⊂ R^d closed and convex, Q(C) = 1}.

We let 𝒬 denote the set of all probability measures on R^d whose convex support has non-void interior, while 𝒬_1 denotes the set of all probability measures Q with finite first moment: ∫ ‖x‖ Q(dx) < ∞.

We write f_n →_d f if P_n converges weakly to P for the corresponding probability measures P_n(A) ≡ ∫_A f_n dλ and P(A) ≡ ∫_A f dλ.

We write α := 1 + s, β := 1 + 1/s, r := −1/s unless otherwise specified.
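The relations among the indices α, β and r defined here are used throughout; a minimal numerical check of the conjugacy 1/α + 1/β = 1 and of the sign pattern for −1 < s < 0:

```python
def indices(s):
    # alpha = 1 + s, beta = 1 + 1/s, r = -1/s; (alpha, beta) are conjugate
    alpha, beta, r = 1 + s, 1 + 1 / s, -1 / s
    assert abs(1 / alpha + 1 / beta - 1) < 1e-12   # 1/alpha + 1/beta = 1
    return alpha, beta, r

for s in (-0.2, -0.5, -0.9):
    # for -1 < s < 0: alpha in (0, 1), beta < 0, r > 1
    print(s, indices(s))
```

For instance, s = −1/2 gives (α, β, r) = (1/2, −1, 2), which is the setting used in several one-dimensional examples below.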
2. Theoretical properties of the divergence estimator.
In this section, we study the basic theoretical properties of the proposed projection scheme via Rényi divergence (1.3). Starting from a given probability measure Q, we first show the existence and uniqueness of such projections via Rényi divergence under assumptions on the index s and on Q. We will call such a projection the Rényi divergence estimator for the given probability measure Q in the following discussions. We next show that the projection scheme is continuous in Q in the following sense: if a sequence of probability measures Q_n, for which the projections onto the class of s-concave densities exist, converges to a limiting probability measure Q in the Wasserstein distance, then the corresponding projected densities converge in weighted L_1 metrics and uniformly on closed subsets of the continuity set of the limit. The directional derivatives of such projected densities also converge uniformly in all directions in a local sense. We then turn our attention to explicit characterizations of the Rényi divergence estimators, especially in dimension 1. This helps in two ways. First, it helps in understanding the continuity of the projection scheme in the index s; i.e. it answers affirmatively the question: for a given probability measure Q, does the Rényi divergence estimator converge to the log-concave projection studied in Dümbgen, Samworth and Schuhmacher (2011) as s ր 0? Second, the explicit characterizations are exploited in the development of the asymptotic distribution theory presented in Section 4.
2.1. Existence and uniqueness.
For a given probability measure Q, let L(Q) = inf_{g ∈ G} L(g, Q).

Lemma 2.1. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬. Then L(Q) < ∞ if and only if Q ∈ 𝒬_1.

Now we state our main theorem for the existence of the Rényi divergence projection corresponding to a general measure Q on R^d.

Theorem 2.2. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1. Then (1.3) achieves its nontrivial minimum for some g̃ ∈ G. Moreover, g̃ is bounded away from zero, and f̃ ≡ g̃^{1/s} is a bounded density with respect to λ_d.

The uniqueness of the solution follows immediately from the strict convexity of the functional L(·, Q).

Lemma 2.3. g̃ is the unique solution for (P_Q) if int(dom(g̃)) ≠ ∅.

Remark 2.4. By the above discussion, we conclude that the map Q ↦ argmin_{g ∈ G} L(g, Q) is well-defined for probability measures Q with suitable regularity conditions: in particular, if Q ∈ 𝒬 and −1/(d+1) < s <
0, it is well-defined if and only if Q ∈ 𝒬_1. From now on we denote the optimal solution by g_s(·|Q), or simply g(·|Q) if no confusion arises, write P^Q for the corresponding s-concave distribution, and say that P^Q is the Rényi projection of Q to P^Q ∈ P_s.

2.2. Weighted global convergence in ‖·‖_{L_1} and ‖·‖_∞.

Theorem 2.5. Assume −1/(d+1) < s < 0. Let {Q_n} ⊂ 𝒬 be a sequence of probability measures converging weakly to Q ∈ 𝒬 ∩ 𝒬_1. Then

(2.1)    ∫ ‖x‖ dQ ≤ liminf_{n→∞} ∫ ‖x‖ dQ_n.

If we further assume that

(2.2)    lim_{n→∞} ∫ ‖x‖ dQ_n = ∫ ‖x‖ dQ,

then

(2.3)    L(Q) = lim_{n→∞} L(Q_n).

Conversely, if (2.3) holds, then (2.2) holds true. In the former case (i.e. (2.2) holds), let g := g(·|Q) and g_n := g(·|Q_n); then f := g^{1/s} and f_n := g_n^{1/s} satisfy

(2.4)    lim_{n→∞, x→y} f_n(x) = f(y)    for all y ∈ R^d \ ∂{f > 0},
         limsup_{n→∞, x→y} f_n(x) ≤ f(y)  for all y ∈ R^d.

For κ < r − d ≡ −1/s − d,

(2.5)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f_n(x) − f(x)| dx = 0.

For any closed set S contained in the continuity points of f and κ < r,

(2.6)    lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f_n(x) − f(x)| = 0.

Furthermore, let D_f := {x ∈ int(dom(f)) : f is differentiable at x}, and let T ⊂ int(D_f) be any compact set. Then

(2.7)    lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f_n(x) − ∇_ξ f(x)| = 0,

where ∇_ξ f(x) := lim_{h ց 0} (f(x + hξ) − f(x))/h denotes the (one-sided) directional derivative along ξ.

Remark 2.6. The one-sided directional derivative of a convex function g is well-defined, with ∇_ξ g(x) = inf_{h>0} (g(x + hξ) − g(x))/h, and hence ∇_ξ f is well-defined for f ≡ g^{1/s}. See Section 23 in Rockafellar (1997) for more details.

As a direct consequence, we have the following result covering both on- and off-the-model cases.

Corollary 2.7. Assume −1/(d+1) < s < 0.
Let Q be a probability measure such that Q ∈ 𝒬 ∩ 𝒬_1, with f_Q := g(·|Q)^{1/s} the density function corresponding to the Rényi projection P^Q (as in Remark 2.4). Let Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} be the empirical measure when X_1, ..., X_n are i.i.d. with distribution Q on R^d. Let ĝ_n := g(·|Q_n) and f̂_n := ĝ_n^{1/s} be the Rényi divergence estimator of Q. Then, almost surely, we have

(2.8)    lim_{n→∞, x→y} f̂_n(x) = f_Q(y)    for all y ∈ R^d \ ∂{f_Q > 0},
         limsup_{n→∞, x→y} f̂_n(x) ≤ f_Q(y)  for all y ∈ R^d.
For κ < r − d ≡ −1/s − d,

(2.9)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f̂_n(x) − f_Q(x)| dx = 0 a.s.

For any closed set S contained in the continuity points of f_Q and κ < r,

(2.10)   lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f̂_n(x) − f_Q(x)| = 0 a.s.

Furthermore, for any compact set T ⊂ int(D_{f_Q}),

(2.11)   lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f̂_n(x) − ∇_ξ f_Q(x)| = 0 a.s.

Now we return to the correctly specified case and relax the previous assumption that s > −1/(d+1), for the case of the empirical measure Q_n and a measure Q with finite mean and bounded density f ∈ P_{s'} ⊂ P_s with s' > s.

Corollary 2.8. Assume −1/d < s < 0. Let Q be a probability measure on R^d with density f ∈ P_s if −1/(d+1) < s, and f ∈ P_{s'} where s' > −1/(d+1) if s ∈ (−1/d, −1/(d+1)]. (Thus f is bounded and f has a finite mean.) Let f̂_n ≡ f̂_{n,s} be defined as in Corollary 2.7. Then (2.8), (2.9), (2.10), and (2.11) hold with f_Q replaced by f.

2.3. Characterization of the Rényi divergence projection and estimator.
We now develop characterizations for the Rényi divergence projection, especially in dimension 1. All proofs for this subsection can be found in Appendix 6.1.

We note that the assumption −1/(d+1) < s < 0 is imposed for general probability measures Q; for the empirical measure Q_n, this condition can be relaxed to s > −1/d. The variational characterization involves perturbation functions h such that g + th ∈ G holds for all t ∈ (0, t_0).

Corollary 2.10. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1, and let h be any closed convex function. Then

∫ h dP ≤ ∫ h dQ,

where P = P^Q is the Rényi projection of Q to P^Q ∈ P_s (as in Remark 2.4).

As a direct consequence, we have
Corollary 2.11. Assume −1/(d+1) < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1. Let μ_Q := E_Q[X]. Then μ_P = μ_Q. Furthermore, if −1/(d+2) < s < 0, we have λ_max(Σ_P) ≤ λ_max(Σ_Q) and λ_min(Σ_P) ≤ λ_min(Σ_Q), where Σ_Q is the covariance matrix defined by Σ_Q := E_Q[(X − μ_Q)(X − μ_Q)^T]. Generally, if −1/(d+k) < s < 0 for some k ∈ N, then E_P[‖X‖^l] ≤ E_Q[‖X‖^l] holds for all l = 1, ..., k.

Now we restrict our attention to d = 1, and in the following we give a full characterization of the Rényi divergence estimator. Suppose we observe X_1, ..., X_n i.i.d. Q on R, and let X_(1) ≤ X_(2) ≤ ... ≤ X_(n) be the order statistics of X_1, ..., X_n. Let 𝔽_n be the empirical distribution function corresponding to the empirical probability measure Q_n := n^{−1} Σ_{i=1}^n δ_{X_i}. Let ĝ_n := g(·|Q_n) and F̂_n(t) := ∫_{−∞}^t ĝ_n^{1/s}(x) dx. From Theorem 4.1 in Koenker and Mizera (2010) it follows that ĝ_n is a convex function supported on [X_(1), X_(n)] and linear on [X_(i), X_(i+1)] for all i = 1, ..., n −
1. For a continuous piecewise linear function h : [X_(1), X_(n)] → R, define the set of knots to be S_n(h) := {t ∈ (X_(1), X_(n)) : h′(t−) ≠ h′(t+)} ∩ {X_1, ..., X_n}.

Theorem 2.12. Let g_n be a convex function taking the value +∞ on R \ [X_(1), X_(n)] and linear on [X_(i), X_(i+1)] for all i = 1, ..., n − 1. Let

F_n(t) := ∫_{−∞}^t g_n^{1/s}(x) dx.

Assume F_n(X_(n)) = 1. Then g_n = ĝ_n if and only if

(2.13)    ∫_{X_(1)}^t (F_n(x) − 𝔽_n(x)) dx  = 0 if t ∈ S_n(g_n),
                                             ≤ 0 otherwise.
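The criterion in (2.13) is easy to evaluate numerically. The following sketch (assuming NumPy; `g_cand` is a hypothetical piecewise-linear convex candidate, not the actual estimator) computes the map t ↦ ∫_{X_(1)}^t (F_n − 𝔽_n) dx on a grid; the candidate equals ĝ_n exactly when this criterion is ≤ 0 everywhere with equality at the knots of the candidate.

```python
import numpy as np

s = -0.5                                   # concavity index; f = g^{1/s} = g^{-2}
rng = np.random.default_rng(0)
X = np.sort(rng.standard_t(df=3, size=20)) # toy data

def g_cand(t):
    # hypothetical candidate: convex, piecewise linear on [X_(1), X_(n)]
    return 1.0 + np.abs(t - np.median(X))

def cumtrap(y, x):
    # cumulative trapezoidal integral, anchored at x[0]
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])

grid = np.linspace(X[0], X[-1], 20001)
F = cumtrap(g_cand(grid) ** (1.0 / s), grid)             # F_n(t) from the candidate
F_emp = np.searchsorted(X, grid, side="right") / len(X)  # empirical CDF on the grid

H = cumtrap(F - F_emp, grid)   # t -> int_{X_(1)}^t (F_n - F_emp) dx
print(H.min(), H.max())        # candidate = estimator iff H <= 0 with = 0 at knots
```

The candidate here is for illustrating the mechanics only; it need not satisfy the normalization F_n(X_(n)) = 1, let alone the characterization itself.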
Corollary 2.13. For x ∈ S_n(ĝ_n), we have 𝔽_n(x) − 1/n ≤ F̂_n(x) ≤ 𝔽_n(x).

Finally, we give a characterization of the Rényi divergence estimator in terms of distribution functions, as in Theorem 2.7 of Dümbgen, Samworth and Schuhmacher (2011).
Theorem 2.14. Assume −1/2 < s < 0 and Q ∈ 𝒬 ∩ 𝒬_1 is a probability measure on R with distribution function G(·). Let g ∈ G be such that f ≡ g^{1/s} is a density on R, with distribution function F(·). Then g = g(·|Q) if and only if

1. ∫_R (F − G)(t) dt = 0;
2. ∫_{−∞}^x (F − G)(t) dt ≤ 0 for all x ∈ R, with equality when x ∈ S̃(g).

Here S̃(g) := {x ∈ R : g(x) < (1/2)(g(x + δ) + g(x − δ)) holds for all δ > 0 small enough}.

The above theorem is useful for understanding the projected s-concave density given an arbitrary probability measure Q ∈ 𝒬 ∩ 𝒬_1. The following example illustrates these projections and also gives some insight concerning the boundary properties of the class of s-concave densities.

Example 2.15. Consider the class of densities Q defined by

Q = { q_τ(x) = ((τ − 1)/(2(τ − 2))) (1 + |x|/(τ − 2))^{−τ} : τ > 2 }.

Note that q_τ is −1/τ-concave and not s-concave for any 0 > s > −1/τ. We start from an arbitrary q_τ ∈ Q with τ >
2, and we will show in the following that the projection of q_τ onto the class of s-concave (0 > s > −1/τ) distributions through L(·, q_τ) is given by q_{−1/s}. Let Q_τ be the distribution function of q_τ(·); then we can calculate

Q_τ(x) = (1/2)(1 − x/(τ − 2))^{−(τ−1)}      if x ≤ 0,
Q_τ(x) = 1 − (1/2)(1 + x/(τ − 2))^{−(τ−1)}  if x > 0.

It is easy to check by direct calculation that ∫_{−∞}^x (Q_r(t) − Q_τ(t)) dt ≤ 0, with equality if and only if x = 0, where r := −1/s. It is clear that S̃(q_r) = {0}, and hence the conditions in Theorem 2.14 are verified. Note that, in Example 2.9 of Dümbgen, Samworth and Schuhmacher (2011), the log-concave approximation of the rescaled t density is the Laplace distribution. It is easy to see from the above calculation that the log-concave projection of the whole class Q is the Laplace distribution q_∞ = (1/2)exp(−|x|). Therefore the log-concave approximation fails to distinguish densities at least amongst the class Q ∪ {q_∞}.

2.4. Continuity of the Rényi divergence estimator in s.

Recall that α = 1 + s, so that (α, β) is a conjugate pair with α^{−1} + β^{−1} = 1, where β = 1 + 1/s. For 1 − 1/d < α <
1, let

F_α(f) = (1/(α − 1)) ∫ f^α(x) dx,    F_1(f) = ∫ f(x) log f(x) dx.

For a given index −1/d < s < 0 and data X = (X_1, ..., X_n) with non-void int(conv(X)), solving the dual problem (1.2) for the primal problem (1.1) is equivalent to solving

(2.14)    (D_α)   min_f F_α(f) = (1/(α − 1)) ∫ f^α(x) dx,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°,

where G(X)° is the polar cone of G(X) and Q_n = n^{−1} Σ_{i=1}^n δ_{X_i} is the empirical measure. The maximum likelihood estimation of a log-concave density has the dual form

(2.15)    (D_1)   min_f F_1(f) = ∫ f(x) log f(x) dx,   subject to f = d(Q_n − G)/dy for some G ∈ G(X)°.

Let f_α and f_1 be the solutions of (D_α) and (D_1), respectively. For simplicity we drop the explicit notational dependence of f_α, f_1 on n. Since F_α(f) → F_1(f) as α ր 1 for f smooth enough, it is natural to expect some convergence property of f_α to f_1. The main result is summarized as follows.

Theorem 2.16. Suppose d = 1. For all κ > 0 and p ≥ 1, we have the following weighted convergence:

lim_{α ր 1} ∫ (1 + ‖x‖)^κ |f_α(x) − f_1(x)|^p dx = 0.
Moreover, for any closed set S contained in the continuity points of f_1,

lim_{α ր 1} sup_{x ∈ S} (1 + ‖x‖)^κ |f_α(x) − f_1(x)| = 0    for all κ > 0.
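The heuristic F_α → F_1 can be made concrete after subtracting the constant 1/(α − 1), which does not depend on f and hence does not change the minimizer of (D_α): for smooth f, (∫ f^α dx − 1)/(α − 1) → ∫ f log f as α ր 1. A numerical sketch for the standard normal density (assuming NumPy):

```python
import numpy as np

def centered_renyi(f, dx, alpha):
    # (alpha - 1)^{-1} (int f^alpha dx - 1); the centering constant is
    # independent of f, so the minimizer of (D_alpha) is unchanged
    return (np.sum(f ** alpha) * dx - 1.0) / (alpha - 1.0)

x = np.linspace(-30, 30, 600001)
dx = x[1] - x[0]
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

shannon = np.sum(f * np.log(f)) * dx           # int f log f = -log sqrt(2*pi*e)
for alpha in (0.9, 0.99, 0.999):
    print(alpha, centered_renyi(f, dx, alpha) - shannon)   # gap -> 0 as alpha -> 1
```

For the standard normal the limit is available in closed form (∫ f log f = −(1/2) log(2πe)), which makes the shrinking gap easy to read off.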
3. Limit behavior of s-concave densities. Let {f_n}_{n ∈ N} be a sequence of s-concave densities with corresponding measures dν_n = f_n dλ. Suppose ν_n →_d ν. From Borell (1974, 1975) and Brascamp and Lieb (1976), we know that each ν_n is a t-concave measure with t = s/(1 + sd) if −1/d < s < ∞, t = −∞ if s = −1/d, and t = 1/d if s = ∞. This result is proved via different methods by Rinott (1976). Furthermore, if the dimension of the support of ν is d, then it follows from Borell (1974), Theorem 2.2, that the limit measure ν is t-concave and hence has a Lebesgue density with s = t/(1 − td). Here we pursue this type of result in somewhat more detail. Our key dimensionality condition will be formulated in terms of the set C := {x ∈ R^d : liminf_n f_n(x) > 0}. We will show that if

(D1)  either dim(csupp(ν)) = d or dim(C) = d

holds, then the limiting probability measure ν admits an upper semi-continuous s-concave density on R^d. Furthermore, if a sequence of s-concave densities {f_n} converges weakly to some density f (in the sense that the corresponding probability measures converge weakly), then f is s-concave, and f_n converges to f in weighted L_1 metrics and uniformly on any closed set of continuity points of f. The directional derivatives of f_n also converge uniformly in all directions in a local sense.

In the following sections, we will not fully exploit the strength of the results obtained here. The results are interesting in their own right, and careful readers will find them useful as technical support for Sections 2 and 4.

3.1. Limit characterization via dimensionality condition.
Note that C is a convex set. For a general convex set K, we follow the convention (see Rockafellar (1997)) that dim K = dim(aff(K)), where aff(K) is the affine hull of K. It is well known that the dimension of a convex set K is the maximum of the dimensions of the various simplices included in K (cf. Theorem 2.4, Rockafellar (1997)).

We first extend several results in Kim and Samworth (2015) and Cule and Samworth (2010) from the log-concave setting to our s-concave setting. The proofs are all deferred to Appendix 6.2.

Lemma 3.1. Assume (D1). Then csupp(ν) = cl(C).

Lemma 3.2. Let {ν_n}_{n ∈ N} be probability measures with upper semi-continuous s-concave densities {f_n}_{n ∈ N} such that ν_n → ν weakly as n → ∞, where ν is a probability measure with density f. Then f_n →_{a.e.} f, and f can be taken to be f = cl(lim_n f_n), hence upper semi-continuous and s-concave.

In many situations, uniform boundedness of a sequence of s-concave densities gives rise to good stability and convergence properties.

Lemma 3.3. Assume −1/d < s < 0. Let {f_n}_{n ∈ N} be a sequence of s-concave densities on R^d. If dim C = d, where C = {liminf_n f_n > 0} as above, then sup_{n ∈ N} ‖f_n‖_∞ < ∞.

Now we state one limit characterization theorem.
Theorem 3.4. Assume −1/d < s < 0. Under either condition in (D1), ν is absolutely continuous with respect to λ_d, with a version of the Radon-Nikodym derivative given by cl(lim_n f_n), which is an upper semi-continuous s-concave density on R^d.

3.2. Modes of convergence.
It was shown above that weak convergence of s-concave probability measures implies almost everywhere pointwise convergence at the density level. In many applications, we need different or stronger types of convergence. This subsection is devoted to the study of the following two types of convergence:

1. convergence in the ‖·‖_{L_1} metric;
2. convergence in the ‖·‖_∞ metric.

We start by investigating convergence in the ‖·‖_{L_1} metric.

Lemma 3.5. Assume −1/d < s < 0. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then there exist a, b > 0 such that f_n(x) ∨ f(x) ≤ (a‖x‖ + b)^{1/s}.

Once the existence of a suitable integrable envelope function is established, we conclude naturally by the dominated convergence theorem that
Theorem 3.6. Assume −1/d < s < 0. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then for κ < r − d,

(3.1)    lim_{n→∞} ∫ (1 + ‖x‖)^κ |f_n(x) − f(x)| dx = 0.
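The weight restriction κ < r − d is essentially sharp: in d = 1 an s-concave density can decay as slowly as ‖x‖^{−r} with r = −1/s, and then (1 + ‖x‖)^κ is integrable against it exactly when κ < r − 1. A sketch with the −1/3-concave density f(x) = (1 + |x|)^{−3} (so r = 3, and f^s = 1 + |x| is convex), using closed-form truncated integrals:

```python
r = 3.0                        # r = -1/s with s = -1/3
c = (r - 1) / 2.0              # normalizing constant of f(x) = c (1 + |x|)^{-r}

def weighted_mass(kappa, T):
    # int_{|x| <= T} (1 + |x|)^kappa f(x) dx, in closed form
    e = kappa - r + 1.0        # exponent of the antiderivative (e != 0 here)
    return 2.0 * c * ((1.0 + T) ** e - 1.0) / e

for kappa in (0.5, 1.9, 2.5):  # the integrability threshold is kappa < r - 1 = 2
    print(kappa, [weighted_mass(kappa, T) for T in (1e2, 1e4, 1e6)])
# the truncated integrals stay bounded for kappa < 2 and blow up for kappa = 2.5
```

So for densities with the heaviest tails permitted in P_s, the weighted L_1 convergence in (3.1) cannot be pushed beyond the stated range of κ.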
Next we examine convergence of s-concave densities in the ‖·‖_∞ norm. We write g = f^s and g_n = f_n^s unless otherwise specified. Since we have established pointwise convergence in Lemma 3.2, classical convex analysis guarantees that the convergence is uniform over compact sets in int(dom(f)). To establish a global uniform convergence result, we only need to control the tail behavior of the class of s-concave functions and the region near the boundary of f. This is accomplished via Lemmas 6.2 and 6.3.

Theorem 3.7. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Then for any closed set S contained in the continuity points of f and κ < r = −1/s,

lim_{n→∞} sup_{x ∈ S} (1 + ‖x‖)^κ |f_n(x) − f(x)| = 0.

We note that no assumption on the index s is required here.

3.3. Local convergence of directional derivatives.
It is known in convex analysis that if a sequence of convex functions g_n converges pointwise to g on an open convex set, then the subdifferentials of g_n also 'converge' to the subdifferential of g. If we further assume smoothness of g_n, then local uniform convergence of the derivatives follows automatically. See Theorems 24.5 and 25.7 in Rockafellar (1997) for precise statements. Here we pursue this issue at the level of the transformed densities.

Theorem 3.8. Let ν, ν_1, ..., ν_n, ... be probability measures with upper semi-continuous s-concave densities f, f_1, ..., f_n, ... such that ν_n → ν weakly as n → ∞. Let D_f := {x ∈ int(dom(f)) : f is differentiable at x}, and let T ⊂ int(D_f) be any compact set. Then

lim_{n→∞} sup_{x ∈ T, ‖ξ‖=1} |∇_ξ f_n(x) − ∇_ξ f(x)| = 0.
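The convex-analysis fact quoted above (Theorems 24.5 and 25.7 of Rockafellar (1997)) is easy to see numerically. For the smooth convex functions g_n(x) = (x² + n^{−2})^{1/2}, which converge pointwise to g(x) = |x|, the derivatives converge uniformly on any compact set avoiding the non-differentiability point 0. A sketch (assuming NumPy):

```python
import numpy as np

# derivative of the smooth convex g_n(x) = sqrt(x^2 + 1/n^2), g_n -> |x| pointwise
def g_n_prime(x, n):
    return x / np.sqrt(x**2 + 1.0 / n**2)

T = np.linspace(0.5, 2.0, 1001)   # compact set inside {x : |x| is differentiable}
g_prime = np.sign(T)              # derivative of the limit, = 1 on T

for n in (10, 100, 1000):
    print(n, np.max(np.abs(g_n_prime(T, n) - g_prime)))
# the sup-distance over T shrinks roughly like 1/(2 n^2 x^2) at the left edge of T
```

Theorem 3.8 is the analogue of this phenomenon at the level of the densities f_n = g_n^{1/s}, with the compact set kept inside the differentiability region of the limit.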
4. Limiting distribution theory of the divergence estimator.
In this section we establish local asymptotic distribution theory for the divergence estimator f̂_n at a fixed point x₀ ∈ ℝ. Limit distribution theory in shape-constrained estimation was pioneered for monotone density and regression estimators by Prakasa Rao (1969), Brunk (1970), Wright (1981) and Groeneboom (1985). Groeneboom, Jongbloed and Wellner (2001) established pointwise limit theory for the MLEs and LSEs of a convex decreasing density, and also treated pointwise limit theory for estimation of a convex regression function. Balabdaoui, Rufibach and Wellner (2009) established pointwise limit theorems for the MLEs of log-concave densities on ℝ. On the other hand, for nonparametric estimation of s-concave densities, asymptotic theory beyond the Hellinger consistency results for the MLEs established by Seregin and Wellner (2010) has been non-existent. Doss and Wellner (2016) have shown in the case d = 1 that the MLEs have Hellinger convergence rates of order O_p(n^{−2/5}) for each s ∈ (−1, ∞) (which includes the log-concave case s = 0). However, due at least in part to the lack of explicit characterizations of the MLE for s-concave classes, no results concerning limiting distributions of the MLE at fixed points are currently available. In the remainder of this section we formulate results of this type for the Rényi divergence estimators. These results are comparable to the pointwise limit distribution results for the MLEs of log-concave densities obtained by Balabdaoui, Rufibach and Wellner (2009).

In the following, we will see how the natural and strong characterizations developed in Section 2 help us to understand the limit behavior of the Rényi divergence estimator at a fixed point. For this purpose, we assume the true density f₀ = g₀^{−r} satisfies the following:

(A1).
g₀ ∈ G and f₀ = g₀^{−r} is an s-concave density on ℝ, where −1 < s < 0.
(A2). f₀(x₀) > 0.
(A3). g₀ is locally C^k around x₀, where k := max{ k ∈ ℕ : k ≥ 2, g₀^{(j)}(x₀) = 0 for all 2 ≤ j ≤ k − 1, g₀^{(k)}(x₀) ≠ 0 }, and k = 2 if the above set is empty.
(A4). g₀^{(k)} is continuous around x₀.

4.1. Limit distribution theory.
Before we state the main results concerning the limit distribution theory for the Rényi divergence estimator, let us sketch the route by which the theory is developed. We first denote F̂_n(x) := ∫_{−∞}^x f̂_n(t) dt, Ĥ_n(x) := ∫_{−∞}^x F̂_n(t) dt and H_n(x) := ∫_{−∞}^x F_n(t) dt. We also denote r_n := n^{(k+2)/(2k+1)}, s_n := n^{−1/(2k+1)} and l_{n,x₀} := [x₀, x₀ + s_n t]. Due to the form of the characterizations obtained in Theorem 2.12, we define local processes at the level of integrated distribution functions as follows:

Y_n^loc(t) := r_n ∫_{l_{n,x₀}} ( F_n(v) − F_n(x₀) − ∫_{x₀}^v Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j du ) dv;

H_n^loc(t) := r_n ∫_{l_{n,x₀}} ( F̂_n(v) − F̂_n(x₀) − ∫_{x₀}^v Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j du ) dv + Â_n t + B̂_n,

where Â_n := n^{(k+1)/(2k+1)} ( F̂_n(x₀) − F_n(x₀) ) and B̂_n := n^{(k+2)/(2k+1)} ( Ĥ_n(x₀) − H_n(x₀) ) are defined so that Y_n^loc(·) ≥ H_n^loc(·) by virtue of Theorem 2.12. Since we wish to derive asymptotic theory at the level of the underlying convex function, we modify the processes by

(4.1)  Y_n^locmod(t) := Y_n^loc(t)/f₀(x₀) − r_n ∫_{l_{n,x₀}} ∫_{x₀}^v Ψ̂_{k,n,2}(u) du dv,
       H_n^locmod(t) := H_n^loc(t)/f₀(x₀) − r_n ∫_{l_{n,x₀}} ∫_{x₀}^v Ψ̂_{k,n,2}(u) du dv,

where

(4.2)  Ψ̂_{k,n,2}(u) = (1/f₀(x₀)) ( f̂_n(u) − Σ_{j=0}^{k−1} (f₀^{(j)}(x₀)/j!)(u − x₀)^j ) + (r/g₀(x₀)) ( ĝ_n(u) − g₀(x₀) − g₀′(x₀)(u − x₀) ).

A direct calculation reveals that with r = −1/s,

H_n^locmod(t) = −(r/g₀(x₀)) · r_n ∫_{l_{n,x₀}} ∫_{x₀}^v ( ĝ_n(u) − g₀(x₀) − (u − x₀)g₀′(x₀) ) du dv + (Â_n t + B̂_n)/f₀(x₀),

and hence

n^{k/(2k+1)} ( ĝ_n(x₀ + s_n t) − g₀(x₀) − s_n t g₀′(x₀) ) = −(g₀(x₀)/r) (d²/dt²) H_n^locmod(t),
n^{(k−1)/(2k+1)} ( ĝ_n′(x₀ + s_n t) − g₀′(x₀) ) = −(g₀(x₀)/r) (d³/dt³) H_n^locmod(t).
(4.3)

It is clear from (4.1) that the order relationship Y_n^locmod(·) ≥ H_n^locmod(·) is still valid for the modified processes. Now by tightness arguments, the limit process H of H_n^locmod, including its derivatives, exists uniquely, giving us the possibility of taking the limit in (4.3) as n → ∞. Finally we relate H to the canonical process H_k defined in Theorem 4.1 by looking at their respective 'envelope' functions Y and Y_k, where Y denotes the limit process of Y_n^locmod and Y_k(t) = ∫_0^t W(s) ds − t^{k+2}. Careful calculation of the limits of Y_n^loc and Ψ̂_{k,n,2} reveals that

Y_n^locmod(t) →_d (1/√(f₀(x₀))) ∫_0^t W(s) ds − ( r g₀^{(k)}(x₀) / ( g₀(x₀)(k+2)! ) ) t^{k+2}.

Now by the scaling property of Brownian motion, W(at) =_d √a W(t), we get the following theorem.

Theorem. Under assumptions (A1)-(A4), we have

(4.4)  ( n^{k/(2k+1)} ( ĝ_n(x₀) − g₀(x₀) ),  n^{(k−1)/(2k+1)} ( ĝ_n′(x₀) − g₀′(x₀) ) )
  →_d ( −[ g₀(x₀)^{2k} g₀^{(k)}(x₀) / ( r^{2k} f₀(x₀)^k (k+2)! ) ]^{1/(2k+1)} H_k^{(2)}(0),
        −[ g₀(x₀)^{2k−2} (g₀^{(k)}(x₀))³ / ( r^{2k−2} f₀(x₀)^{k−1} ((k+2)!)³ ) ]^{1/(2k+1)} H_k^{(3)}(0) ),

and

(4.5)  ( n^{k/(2k+1)} ( f̂_n(x₀) − f₀(x₀) ),  n^{(k−1)/(2k+1)} ( f̂_n′(x₀) − f₀′(x₀) ) )
  →_d ( [ r f₀(x₀)^{k+1} g₀^{(k)}(x₀) / ( g₀(x₀)(k+2)! ) ]^{1/(2k+1)} H_k^{(2)}(0),
        [ r³ f₀(x₀)^{k+2} (g₀^{(k)}(x₀))³ / ( g₀(x₀)³ ((k+2)!)³ ) ]^{1/(2k+1)} H_k^{(3)}(0) ),

where H_k is the unique lower envelope of the process Y_k satisfying:
1. H_k(t) ≤ Y_k(t) for all t ∈ ℝ;
2. H_k^{(2)} is concave;
3. H_k(t) = Y_k(t) if the slope of H_k^{(2)} decreases strictly at t.

Remark. We note that the minus sign appearing in (4.4) is due to the convexity of ĝ_n, g₀ and the concavity of the limit process H_k^{(2)}. The dependence of the constants appearing in the limit on the local smoothness is optimal in view of Theorem 2.23 in Seregin and Wellner (2010).

Remark.
Assume −1 < s < 0 and k = 2. Let f₀ = exp(ϕ) be a log-concave density, where ϕ : ℝ → ℝ is the underlying concave function. Then f₀ is also s-concave. Let g_s := f₀^{−1/r} = exp(−ϕ/r) be the underlying convex function when f₀ is viewed as an s-concave density. Then direct calculation yields that

g_s^{(2)}(x₀) = (1/r²) g_s(x₀) ( ϕ′(x₀)² − r ϕ″(x₀) ).

Hence the constant before H_k^{(2)}(0) appearing in (4.5) becomes

( ( f₀(x₀)³ ϕ′(x₀)²/r + f₀(x₀)³ |ϕ″(x₀)| ) / 4! )^{1/5}.

Note that the second term in the above display is exactly the constant involved in the limiting distribution when f₀(x₀) is estimated via the log-concave MLE; see (2.2), page 1305 in Balabdaoui, Rufibach and Wellner (2009). The first term is non-negative and hence illustrates the price we need to pay by estimating a true log-concave density via the Rényi divergence estimator over the larger class of s-concave densities. We also note that the additional term vanishes as r → ∞, or equivalently s ր 0.

Estimation of the mode.
We consider the estimation of the mode of an s-concave density f(·), defined by M(f) := inf{ t ∈ ℝ : f(t) = sup_{u∈ℝ} f(u) }.

Theorem. Assume (A1)-(A4) hold. Then

(4.6)  n^{1/(2k+1)} ( m̂_n − m ) →_d ( g₀(m)² ((k+2)!)² / ( r² f₀(m) (g₀^{(k)}(m))² ) )^{1/(2k+1)} M( H_k^{(2)} ),

where m̂_n = M(f̂_n), m = M(f₀).

By Theorem 2.26 in Seregin and Wellner (2010), the dependence of the constant on local smoothness is optimal when k = 2. Here we show that this dependence is also optimal for k > 2. Let P be a class of densities dominated by the canonical Lebesgue measure on ℝ^d, and let T : P → ℝ be any functional. For an increasing convex loss function l(·) on ℝ⁺, we define the minimax risk as

(4.7)  R_l(n; T, P) := inf_{t_n} sup_{p∈P} E_{p^{⊗n}} l( |t_n(X₁, …, X_n) − T(p)| ),

where the infimum is taken over all possible estimators of T(p) based on X₁, …, X_n. Our basic method of deriving a minimax lower bound is based on the following result of Jongbloed (2000).

Theorem 4.5 (Jongbloed (2000)). Let {p_n} be a sequence of densities in P such that lim sup_{n→∞} √n h(p_n, p) ≤ τ for some density p ∈ P. Then

(4.8)  lim inf_{n→∞} R_l(n; T, {p, p_n}) / l( exp(−2τ²)/4 · |T(p_n) − T(p)| ) ≥ 1.

For fixed g ∈ G and f := g^{1/s} = g^{−r}, let m := M(f) be the mode of f (equivalently, the smallest minimizer of the convex function g). Consider a class of local perturbations of g: For every ǫ >
0, define

˜g_ǫ(x) :=
  g(m − ǫc_ǫ) + (x − m + ǫc_ǫ) g′(m − ǫc_ǫ),   x ∈ [m − ǫc_ǫ, m − ǫ),
  g(m + ǫ) + (x − m − ǫ) g′(m + ǫ),            x ∈ [m − ǫ, m + ǫ),
  g(x),                                        otherwise.

Here c_ǫ > 1 is chosen so that ˜g_ǫ is continuous at m − ǫ. This construction of a perturbation class is also seen in Balabdaoui, Rufibach and Wellner (2009) and Groeneboom, Jongbloed and Wellner (2001). By a Taylor expansion at m − ǫ we can easily see that c_ǫ = 3 + o(1) as ǫ →
0. Since ˜ f ǫ := ˜ g − rǫ is not a density, wenormalize it by f ǫ ( x ) := ˜ f ǫ ( x ) R R ˜ f ǫ ( y )d y . Now f ǫ is s -concave for each ǫ > m − ǫ .The following result follows from direct calculation. For a proof, we referto Appendix section 6.3 . Lemma . Assume (A1)-(A4). Then h ( f ǫ , f ) = ζ k r f ( m )( g ( k ) ( m )) g ( m ) ǫ k +1 + o ( ǫ k +1 ) , where ζ k = 1108( k !) ( k + 1)( k + 2)(2 k + 1) (cid:20) − · k +2 (2 k + 1)(3 k +2 + k + k − k + 1)( k + 2) (cid:18) k +1 −
1) + 2 · k (2 k + 1)(2 k (2 k −
9) + 27) (cid:19)(cid:21) + 2 k (2 k + 1)3( k !) ( k + 1)(2 k + 1) .

Theorem. For an s-concave density f₀, let SC_{n,τ}(f₀) be defined by

SC_{n,τ}(f₀) := { f : f is an s-concave density, h²(f, f₀) ≤ τ²/n }.

Let m = M(f₀) be the mode of f₀. Suppose (A1)-(A4) hold. Then

sup_{τ>0} lim inf_{n→∞} n^{1/(2k+1)} inf_{t_n} sup_{f∈SC_{n,τ}(f₀)} E_f |t_n − M(f)| ≥ ρ_k ( g₀(m)² / ( r² f₀(m) (g₀^{(k)}(m))² ) )^{1/(2k+1)},

where ρ_k = (2(2k+1)ζ_k e)^{−1/(2k+1)}/4.

Proof.
Take l(x) = |x|. Let ǫ = cn^{−1/(2k+1)}, and let γ := r² f₀(m) (g₀^{(k)}(m))² / g₀(m)², f_n := f_{cn^{−1/(2k+1)}}. Then lim sup_{n→∞} n h²(f_n, f₀) = ζ_k γ c^{2k+1}. Applying Theorem 4.5, we find that

lim inf_{n→∞} n^{1/(2k+1)} R_l(n; T, {f₀, f_n}) ≥ (c/4) exp( −2 ζ_k γ c^{2k+1} ).

Now we choose c = (2(2k+1)ζ_k γ)^{−1/(2k+1)} to conclude.
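The last optimization step can be checked numerically. In the hypothetical sketch below, A plays the role of the product ζ_kγ (set to an arbitrary positive value), and a grid search over c confirms that c ↦ (c/4)exp(−2Ac^{2k+1}) is maximized at the closed-form choice c = (2(2k+1)A)^{−1/(2k+1)}.

```python
import math

# Grid-check of the choice of c in the proof above: maximize
# c/4 * exp(-2*A*c^(2k+1)) over c > 0 and compare with the closed-form
# maximizer c* = (2*(2k+1)*A)^(-1/(2k+1)). Here A stands for zeta_k * gamma;
# the values A = 0.7 and k = 2 are arbitrary choices for illustration.
k, A = 2, 0.7
bound = lambda c: (c / 4.0) * math.exp(-2.0 * A * c ** (2 * k + 1))

c_grid = max((1e-4 * i for i in range(1, 40000)), key=bound)   # c in (0, 4)
c_star = (2 * (2 * k + 1) * A) ** (-1.0 / (2 * k + 1))
assert abs(c_grid - c_star) < 1e-3
```

Setting the derivative of c·exp(−2Ac^{2k+1}) to zero gives (1 − 2A(2k+1)c^{2k+1})·exp(−2Ac^{2k+1}) = 0, which is exactly the stated c*.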
5. Discussion.
We have shown in this paper that the class of s-concave densities can be approximated and estimated via Rényi divergences in a robust and stable way. We have also developed local asymptotic distribution theory for the divergence estimator, which suggests that the convexity constraint is the main source of complexity within the class of s-concave densities, regardless of heavy tails. In the rest of this section, we sketch some related problems and future research directions.

5.1. Behavior of the Rényi projection for generic measures Q when s < −1/(d+1). We have considered in this paper two regions for the index s: (1) −1/(d+1) < s < 0; (2) −1/d < s ≤ −1/(d+1). In case (1), we showed that, starting from a generic measure Q with the interior of its convex support non-void and a finite first moment, the Rényi projection through (1.3) exists and enjoys nice continuity properties that cover both on- and off-the-model situations. In case (2), we showed that the Rényi projection for the empirical measure still enjoys such continuity properties when Q is a probability measure corresponding to a true s-concave density with a finite first moment.

It remains open to investigate the behavior of the Rényi projection in region (2) for a generic measure Q. If Q does not admit a first moment, i.e. ∫‖x‖ dQ(x) = ∞, then the first term in the functional (1.3) diverges for any candidate convex function. We conjecture that the Rényi divergence projection fails to exist in this case. We do not know if the Rényi projection exists when −1/d < s ≤ −1/(d+1) and Q ∉ P_s but ∫‖x‖ dQ(x) < ∞. It should be mentioned that the MLEs for the classes P_s exist (for an increasingly large sample size n as s ց −1/d), and are Hellinger consistent for −1/d < s < 0. But we do not yet know any continuity properties of the maximum likelihood projection 'off the model'. This leaves the interval −1/d < s ≤ −1/(d+1) presently without a nicely stable nonparametric estimation procedure.
See Koenker and Mizera (2010), pages 3008 and 3016, for some further discussion.

5.2. Global rates of convergence for Rényi divergence estimators.
Classical empirical process theory relates maximum likelihood estimators to Hellinger loss via 'basic inequalities', as coined in van de Geer (2000) and van der Vaart and Wellner (1996). This reduces the problem of global rates of convergence to the study of the modulus of continuity of an empirical process indexed by a suitable transformation of the function class of interest. We expect that similar 'basic inequalities' can be exploited to relate the Rényi divergence estimators to some divergence (not necessarily the Hellinger distance). We also expect some uniformity in the rates of convergence for the Rényi divergence estimators, as observed by Kim and Samworth (2015) in the case of the MLEs for log-concave densities.

5.3. Conjectures about the global rates in higher dimensions.
It is now well understood from the work of Doss and Wellner (2016) that the MLEs for s-concave densities (−1 < s < 0) and log-concave densities in dimension 1 converge at rates no worse than O_p(n^{−2/5}) in Hellinger loss. In higher dimensions, Kim and Samworth (2015) provide an important lower bound on the bracketing entropy for a subclass of log-concave densities, of order ǫ^{−((d/2)∨(d−1))} in Hellinger distance, and a matching upper bound up to logarithmic factors for d ≤ 3. The lack of corresponding results in discrete convex geometry precludes further upper bounds beyond d = 3. If a matching upper bound can be achieved for d ≥ 4, then the conjectured rates r_n in squared Hellinger distance become r_n = O(n^{−2/(d−1)}), d ≥ 4.

Adaptive estimation of concave-transformed classes of functions.
The rates conjectured above are conservative in that they are derived from the global point of view. From a local perspective, adaptive estimation may be possible when the underlying function/density exhibits special structure. In fact, it is shown by Guntuboyina and Sen (2015) that in the univariate convex regression setting, if the underlying convex function is piecewise linear, then the rate of convergence for the global risk in the discrete ℓ₂ norm adapts to the nearly parametric rate n^{−1/2} (up to logarithmic factors). It would be interesting to examine whether the same phenomenon can be observed for the MLEs/Rényi divergence estimators, and more generally for minimum contrast estimators of concave-transformed classes of functions.
6. Proofs.
Proofs for Section 2.
Proof of Lemma 2.1.
Let Q ∈ Q. Then by letting g(x) := ‖x‖ + 1, we have

L(Q) ≤ L(g, Q) = ∫ (1 + ‖x‖) dQ + (1/|β|) ∫ (1 + ‖x‖)^β dx < ∞,

by noting that Q ∈ Q and −β = −1 − 1/s > d. Now assume L(Q) < ∞. If Q ∉ Q, i.e. ∫‖x‖ dQ = ∞, then, since for each g ∈ G we can find some a, b > 0 such that g(x) ≥ a‖x‖ − b, we have

L(g, Q) = ∫ g dQ + (1/|β|) ∫ g^β dx ≥ ∫ (a‖x‖ − b) dQ = ∞,

a contradiction. This implies Q ∈ Q.

Proof of Theorem 2.2.
We note that L(Q) < ∞ by Lemma 2.1. Hence we can take a sequence {g_n}_{n∈ℕ} ⊂ G such that ∞ > M ≥ L(g_n, Q) ց L(Q) as n → ∞ for some M > 0. Now we claim that, for all x₀ ∈ int(csupp(Q)),

(6.1)  sup_{n∈ℕ} g_n(x₀) < ∞.

Denote ǫ_n ≡ inf_{x∈ℝ^d} g_n(x). First note

L(g_n, Q) ≥ ∫ g_n dQ = ∫ g_n 1(g_n ≤ g_n(x₀)) dQ + ∫ g_n 1(g_n > g_n(x₀)) dQ
= ∫ ( g_n − g_n(x₀) + g_n(x₀) ) 1(g_n ≤ g_n(x₀)) dQ + ∫ g_n 1(g_n > g_n(x₀)) dQ
≥ g_n(x₀) − ( g_n(x₀) − ǫ_n ) Q( {g_n(·) ≤ g_n(x₀)} ).

If g_n(x₀) > ǫ_n, then x₀ is not an interior point of the closed convex set {g_n ≤ g_n(x₀)}, which implies Q( {g_n(·) ≤ g_n(x₀)} ) ≤ h(Q, x₀), where h(·,·) is defined in Lemma 7.9. Hence, in this case, the above term is bounded from below by

L(g_n, Q) ≥ g_n(x₀) − ( g_n(x₀) − ǫ_n ) h(Q, x₀) ≥ g_n(x₀) ( 1 − h(Q, x₀) ).

This inequality also holds for g_n(x₀) = ǫ_n, which implies that

g_n(x₀) ≤ L(g_n, Q)/(1 − h(Q, x₀)) ≤ M/(1 − h(Q, x₀))

by the first statement of Lemma 7.9. Thus we have verified (6.1). Now we invoke Lemma 7.14, checking conditions (A1)-(A2) there as follows: (A1) follows by (6.1); (A2) follows by the choice of g_n since sup_{n∈ℕ} L(g_n, Q) ≤ M. By Lemma 7.13 we can find a subsequence {g_{n(k)}}_{k∈ℕ} of {g_n}_{n∈ℕ} and a function ˜g ∈ G such that {x ∈ ℝ^d : sup_{n∈ℕ} g_n(x) < ∞} ⊂ dom(˜g), and

lim_{k→∞, x→y} g_{n(k)}(x) = ˜g(y),  for all y ∈ int(dom(˜g)),
lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ ˜g(y),  for all y ∈ ℝ^d.

Again for simplicity we assume that {g_n} satisfies the above properties. We note that

L(Q) = lim_{n→∞} ( ∫ g_n dQ + (1/|β|) ∫ g_n^β dx )
≥ lim inf_{n→∞} ∫ g_n dQ + (1/|β|) lim inf_{n→∞} ∫ g_n^β dx
≥ ∫ ˜g dQ + (1/|β|) ∫ ˜g^β dx = L(˜g, Q) ≥ L(Q),

where the third line follows from Fatou's lemma for the first term, and from Fatou's lemma together with the fact that the boundary of a convex set has Lebesgue measure zero for the second term (Theorem 1.1, Lang (1986)). This establishes L(˜g, Q) = L(Q), and hence ˜g is the desired minimizer. Since ˜g ∈ G achieves its minimum, we may assume x₀ ∈ Arg min_{x∈ℝ^d} ˜g(x). If ˜g(x₀) = 0, since ˜g has domain with non-empty interior, we can choose x₁, …, x_d ∈ dom(˜g) such that {x₀, x₁, …, x_d} are in general position. Then by Lemma 7.15 we find L(˜g, Q) = ∞, a contradiction. This implies ˜g must be bounded away from zero.

For the last statement: since ˜g is a minimizer of (1.3) and ˜g is bounded away from zero, L(˜g + c, Q) is well-defined for all |c| ≤ δ with δ > 0 small, and we must necessarily have (d/dc) L(˜g + c, Q)|_{c=0} = 0. On the other hand, it is easy to calculate that (d/dc) L(˜g + c, Q) = 1 − ∫ (˜g(x) + c)^{β−1} dx. This yields the desired result by noting β − 1 = 1/s.

Proof of Lemma 2.3.
Let g, h be two minimizers for P_Q. Since ψ_s(x) = x^β/|β| is strictly convex on [0, ∞), L(t·g + (1−t)·h, Q) is strictly convex in t ∈ [0, 1] unless g = h a.e. with respect to the canonical Lebesgue measure. We claim that if two closed convex functions g, h agree a.e. with respect to the canonical Lebesgue measure, then they must agree everywhere, thus closing the argument. It is easy to see that int(dom g) = int(dom h). Since int(dom(g)) ≠ ∅, we have ri(dom g) = int(dom g) = int(dom h) = ri(dom h). Also note that a convex function is continuous in the interior of its domain, and hence almost-everywhere equality implies everywhere equality within the interior of the domain, i.e. g|_{int(dom g)} = h|_{int(dom h)}. Now by Corollary 7.3.4 in Rockafellar (1997) and the closedness of g, h, we find that g = cl g = cl h = h.

Proof of Theorem 2.5.
To show (2.1), we use Skorokhod's theorem: since Q_n →_d Q, there exist random vectors X_n ∼ Q_n and X ∼ Q defined on a common probability space (Ω, B, ℙ) satisfying X_n →_{a.s.} X. Then by Fatou's lemma, we have ∫‖x‖ dQ = E[‖X‖] ≤ lim inf_{n→∞} E[‖X_n‖] = lim inf_{n→∞} ∫‖x‖ dQ_n.

Assume (2.2). We first claim that

(6.2)  lim sup_{n→∞} L(Q_n) ≤ L(g, Q) = L(Q).

Let g_n(·), g(·) be defined as in the statement of the theorem. Note that lim sup_{n→∞} L(g_n, Q_n) ≤ lim_{n→∞} L(g^{(ǫ)}, Q_n) = L(g^{(ǫ)}, Q). Here g^{(ǫ)} is the Lipschitz approximation of g defined in Lemma 7.8, and the last equality follows from the moment convergence condition (2.2) by rewriting g^{(ǫ)}(x) = [g^{(ǫ)}(x)/(1+‖x‖)](1+‖x‖), noting that the Lipschitz condition on g^{(ǫ)} implies boundedness of g^{(ǫ)}(x)/(1+‖x‖). By construction of {g^{(ǫ)}}_{ǫ>0} we know that if x₀ is a minimizer of g, then it is also a minimizer of g^{(ǫ)}. This implies that the function class {g^{(ǫ)}}_{ǫ>0} is bounded away from zero, since g is bounded away from zero by Theorem 2.2: inf_{x∈ℝ^d} g^{(ǫ)}(x) ≥ ǫ₀ holds for some ǫ₀ > 0 and all ǫ > 0. Letting ǫ ց 0, in view of Lemma 7.8, by the monotone convergence theorem applied to g^{(ǫ)} and (g^{(ǫ)})^β, we have verified (6.2).

Next, we claim that, for all x₀ ∈ int(csupp(Q)),

(6.3)  lim sup_{n→∞} g_n(x₀) < ∞.

Denote ǫ_n ≡ inf_{x∈ℝ^d} g_n(x). Note that by essentially the same argument as in the proof of Theorem 2.2, we have g_n(x₀) ≤ L(Q_n)/(1 − h(Q_n, x₀)). By taking lim sup as n → ∞, (6.3) follows by virtue of Lemma 7.9 and (6.2).

Now we proceed to show (2.3) and (2.4). By invoking Lemma 7.14, we can easily check that all conditions are satisfied (note that we also used (6.2) here). Thus we can find a subsequence {g_{n(k)}}_{k∈ℕ} of {g_n}_{n∈ℕ} with g_{n(k)}(x) ≥ a‖x‖ − b for all x ∈ ℝ^d and all k ∈ ℕ, with some a, b > 0. Hence by Lemma 7.13, we can find a function ˜g ∈ G such that {x ∈ ℝ^d : lim sup_{k→∞} g_{n(k)}(x) < ∞} ⊂ dom(˜g), and

lim_{k→∞, x→y} g_{n(k)}(x) = ˜g(y),  for all y ∈ int(dom(˜g)),
lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ ˜g(y),  for all y ∈ ℝ^d.

Again for simplicity we assume that {g_n} admits the above properties. Now define random variables H_n ≡ g_n(X_n) − (a‖X_n‖ − b). Then by the same reasoning as in the proof of Theorem 2.2, we have

lim inf_{n→∞} L(Q_n) = lim inf_{n→∞} ( ∫ g_n dQ_n + (1/|β|) ∫ g_n^β dx )
≥ lim inf_{n→∞} E[ H_n + a‖X_n‖ − b ] + (1/|β|) ∫ ˜g^β dx
≥ E[ lim inf_{n→∞} H_n ] + a lim inf_{n→∞} ∫‖x‖ dQ_n − b + (1/|β|) ∫ ˜g^β dx
= L(˜g, Q) + a ( lim inf_{n→∞} ∫‖x‖ dQ_n − ∫‖x‖ dQ )
≥ L(Q) + a ( lim inf_{n→∞} ∫‖x‖ dQ_n − ∫‖x‖ dQ ).

Note that the expectation is taken with respect to the probability space (Ω, B, ℙ) defined above. This establishes that if (2.2) holds, then

(6.4)  lim inf_{n→∞} L(Q_n) ≥ L(˜g, Q) ≥ L(Q).

Conversely, if (2.2) does not hold, then there exists a subsequence {Q_{n(k)}} such that lim inf_{k→∞} ∫‖x‖ dQ_{n(k)} > ∫‖x‖ dQ. However, this means that lim inf_{k→∞} L(Q_{n(k)}) > L(Q), which contradicts (2.3). Hence if (2.3) holds, then (2.2) holds. Combining (6.4) and (6.2), by virtue of Lemma 2.3 we find ˜g ≡ g. This completes the proof of (2.3) and (2.4).

We now show (2.5). First we claim that {x̂_n ∈ Arg min_{x∈ℝ^d} g_n(x)}_{n∈ℕ} is bounded. If not, then we can find a subsequence such that ‖x̂_{n(k)}‖ → ∞ as k → ∞. However, this means that g_{n(k)}(x) ≥ g_{n(k)}(x̂_{n(k)}) ≥ a‖x̂_{n(k)}‖ − b → ∞ as k → ∞ for any x, a contradiction. Next we claim that there exists ǫ₀ > 0 such that inf_{k∈ℕ} ǫ_{n(k)} ≥ ǫ₀ holds for some subsequence {ǫ_{n(k)}}_{k∈ℕ} of {ǫ_n}_{n∈ℕ}. This can be seen as follows: boundedness of {x̂_n} implies x̂_{n(k)} → x* as k → ∞ for some subsequence {x̂_{n(k)}}_{k∈ℕ} ⊂ {x̂_n}_{n∈ℕ} and some x* ∈ ℝ^d. Hence by (2.4) we have lim sup_{k→∞} f_{n(k)}(x̂_{n(k)}) ≤ f(x*) < ∞, since f(·) is bounded. This implies that sup_{k∈ℕ} ‖f_{n(k)}‖_∞ < ∞, which is equivalent to the claim. As before, we will understand the notation for the whole sequence as a suitable subsequence. Now we have g_n(x) ≥ (a‖x‖ − b) ∨ ǫ₀ for all x ∈ ℝ^d. This gives rise to

(6.5)  f_n(x) ≤ ( (a‖x‖ − b) ∨ ǫ₀ )^{1/s},  for all x ∈ ℝ^d.

Note that −1/(d+1) < s < 0 implies 1/s < −(d+1), whence we get an integrable envelope. Now a simple application of the dominated convergence theorem yields the desired result (2.5), in view of the fact that the boundary of a convex set has Lebesgue measure zero (cf. Theorem 1.1 in Lang (1986)). Finally, (2.6) and (2.7) are direct consequences of Theorems 3.7 and 3.8 by noting that (2.5) entails f_n →_d f (in the sense that the corresponding probability measures converge weakly).

Proof of Corollary 2.7.
It is known by Varadarajan's theorem (cf. Dudley (2002), Theorem 11.4.1) that Q_n converges weakly to Q with probability 1. Further, by the strong law of large numbers (SLLN), we know that ∫‖x‖ dQ_n →_{a.s.} ∫‖x‖ dQ. This verifies all conditions required in Theorem 2.5.

Proof of Corollary 2.8.
The conclusion follows from Corollary 2.7 if −1/(d+1) < s < 0, so suppose −1/d < s ≤ −1/(d+1). Since f ∈ P_{s′}, we may write f = g^{1/s′} where g is convex. If f is unbounded, then g(x₀) = 0 for some x₀ ∈ ℝ^d. By Lemma 7.15 with r′ = −1/s′, it follows that ∫ f = ∞, contradicting the fact that f is a density. Thus f must necessarily be bounded. To see that f has a finite mean, note that by Lemma 3.5, f(x) ≤ (b + a‖x‖)^{1/s′} where a, b > 0 and r′ ≡ −1/s′ > d + 1. Thus ∫_{ℝ^d} ‖x‖ f(x) dx ≤ ∫_{ℝ^d} ‖x‖ (b + a‖x‖)^{−r′} dx < ∞. Now note that (2.8) holds by the existence of the Rényi divergence estimator for the empirical measure (cf. Theorem 4.1 in Koenker and Mizera (2010)) and the same argument as in the proof of Theorem 2.5. Also note that, by the proof of Theorem 3.7, (2.8) is enough to ensure (2.10). Since f is continuous on the interior of its domain, we see that (2.10) implies weak convergence: let Q̂_n be the measures corresponding to f̂_n; then Q̂_n → Q weakly as n → ∞. Now the rest follows immediately from Theorems 3.6 and 3.8.

Proof of Theorem 2.9.
Denote L(·) := L(·, Q). We first claim:

Claim. g = arg min_{g∈G} L(g) if and only if lim_{tց0} (L(g + th) − L(g))/t ≥ 0 holds for all h : ℝ^d → ℝ such that there exists t₀ > 0 with g + th ∈ G for all t ∈ (0, t₀).

To see this, we only have to show sufficiency. Suppose g is not a minimizer of L(·). By Theorem 2.2 we know there exists ĝ ∈ G such that ĝ = g(·|Q). By convexity, we have that for any t > 0,

L( g + t(ĝ − g) ) ≤ (1 − t) L(g) + t L(ĝ).

This implies that, letting h = ĝ − g and t₀ = 1,

( L(g + th) − L(g) )/t ≤ (1/t)( (1 − t) L(g) + t L(ĝ) − L(g) ) = −( L(g) − L(ĝ) ),

and thus lim_{tց0} ( L(g + th) − L(g) )/t ≤ −( L(g) − L(ĝ) ) < 0, where the strict inequality follows from Lemma 2.3. This proves our claim. Now the theorem follows from a simple calculation:

0 ≤ lim_{tց0} (1/t)( L(g + th) − L(g) ) = ∫ h dQ − ∫ h · g^{1/s} dλ,

as desired.

Proof of Corollary 2.10.
Let g₀ ≡ g(·|Q). Then by Theorem 2.2 and Lemma 7.10, we find that there exist some a, b > 0 such that g₀(x) ≥ a‖x‖ + b. Now take v ∈ ∂h(0), i.e. h(x) ≥ h(0) + vᵀx for all x ∈ ℝ^d. Hence for t > 0, we have

g₀(x) + t h(x) ≥ a‖x‖ + b + t( h(0) + vᵀx ) ≥ (a − t‖v‖)‖x‖ + (b + t h(0)),

which implies that g₀ + th ∈ G for t > 0 sufficiently small.

Proof of Theorem 2.12.
We first note that if F is a distribution function for a probability measure supported on [X_(1), X_(n)], and h : [X_(1), X_(n)] → ℝ is an absolutely continuous function, then integration by parts (Fubini's theorem) yields

(6.6)  ∫ h dF = h(X_(n)) − ∫_{X_(1)}^{X_(n)} h′(x) F(x) dx.

First we assume g_n = ĝ_n. For fixed t ∈ [X_(1), X_(n)], let h be a convex function whose derivative is given by h′(x) = −1(x ≤ t). Now by Theorem 2.9 we find that ∫ h dF̂_n ≤ ∫ h dF_n. Plugging into (6.6), we find that ∫_{X_(1)}^t F̂_n(x) dx ≤ ∫_{X_(1)}^t F_n(x) dx. For t ∈ S_n(ĝ_n), let h be the function with derivative h′(x) = 1(x ≤ t). It is easy to see that ĝ_n + th is convex for t in a neighborhood of 0, whence equality holds and ĝ_n satisfies (2.13). Conversely, suppose g_n satisfies (2.13). In view of the proof of Theorem 2.9, we only have to show that (2.12) holds for all functions h : ℝ → ℝ which are linear on [X_(i), X_(i+1)] (i = 1, …, n − 1) and such that g_n + th is convex for small t > 0. Since g_n is a linear function between two consecutive knots, h must be convex between consecutive knots. This implies that the derivative of such an h can be written as h′(x) = Σ_{j=2}^n β_j 1(x ≤ X_(j)), with β₂, …, β_n satisfying β_j ≤ 0 if X_(j) ∉ S_n(g_n). Now again by (6.6) we have

∫ h dF̂_n = h(X_(n)) − Σ_{j=2}^n β_j ∫_{X_(1)}^{X_(j)} F̂_n(x) dx ≤ h(X_(n)) − Σ_{j=2}^n β_j ∫_{X_(1)}^{X_(j)} F_n(x) dx = ∫ h dF_n,

as desired.

Proof of Corollary 2.13.
This follows directly from Theorem 2.12 by noting that for x₀ < x₁ < x₂ we have

(1/(x₂ − x₁)) ∫_{x₁}^{x₂} F̂_n(x) dx ≤ (1/(x₂ − x₁)) ∫_{x₁}^{x₂} F_n(x) dx,

and

(1/(x₁ − x₀)) ∫_{x₀}^{x₁} F̂_n(x) dx ≥ (1/(x₁ − x₀)) ∫_{x₀}^{x₁} F_n(x) dx.

Now letting x₂ ց x₁ and x₀ ր x₁, we find that F̂_n(x₁) ≤ F_n(x₁) by right continuity, and F̂_n(x₁) ≥ F_n(x₁−) = F_n(x₁) − 1/n.

Proof of Theorem 2.14.
The proof closely follows the proof of Theorem 2.7 of Dümbgen, Samworth and Schuhmacher (2011). For the reader's convenience we give a full proof here. Let P denote the probability distribution corresponding to F. We first show necessity by assuming g = g(·|Q). By Corollary 2.10 applied to h(x) = ±x, we find by Fubini's theorem that

0 = ∫_ℝ x d(Q − P)(x) = ∫_ℝ (F − G)(t) dt,

which proves (1). Now we turn to (2). Since the map s ↦ (s − x)₊ is convex, again by Corollary 2.10 we find

0 ≤ ∫_ℝ (s − x)₊ d(Q − P)(s) = −∫_{−∞}^x (F − G)(t) dt,

where in the last equality we used the proved fact that ∫_ℝ (F − G) dλ = 0. Now we assume x ∈ ˜S(g), and discuss two different cases to conclude. If x ∈ ∂(dom(g)), then let h(s) = −(s − x)₊; it is easy to see that g + th ∈ G for small t > 0, and hence

0 ≤ ∫ h(s) d(Q − P)(s) = ∫_{−∞}^x (F − G)(t) dt.

If x ∈ int(dom(g)), then g′(x − δ) < g′(x + δ) for small δ > 0. Define

H′_δ(u) = −( (g′(u) − g′(x − δ)) / (g′(x + δ) − g′(x − δ)) ) 1{u ∈ [x−δ, x+δ]} − 1{u > x+δ},

whose integral H_δ(s) := ∫_{−∞}^s H′_δ(u) du serves as an approximation of −(s − x)₊ as δ ց 0. Note that

(g + tH_δ)(s) = g(s) − ( t/(g′(x + δ) − g′(x − δ)) ) ∫_{s∧(x−δ)}^{s∧(x+δ)} ( g′(u) − g′(x − δ) ) du − t( s − (x + δ) )₊,

implying g + tH_δ ∈ G for small t > 0 (depending on δ). Then by Theorem 2.9,

0 ≤ ∫ H_δ(s) d(Q − P)(s) → −∫ (s − x)₊ d(Q − P)(s) = ∫_{−∞}^x (F − G)(t) dt,

as δ ց 0, where the convergence follows easily from the dominated convergence theorem. This proves (2). Now we show sufficiency by assuming (1)-(2). Consider a Lipschitz continuous function ∆(·) with Lipschitz constant L. Then

∫ ∆ d(Q − P) = ∫ ∆′ (F − G) dλ = −∫ (L − ∆′)(F − G) dλ
= −∫_ℝ ( ∫_{−L}^{L} 1{s > ∆′(t)} ds ) (F − G)(t) dt
= −∫_{−L}^{L} ∫_{A(∆′,s)} (F − G)(t) dt ds,

where the second line follows from (1), and A(∆′, s) := { t ∈ ℝ : ∆′(t)
0, by the monotone convergence theorem we find that ∫ g₀ dQ = ∫ g₀ dP and that ∫ g dQ ≥ ∫ g dP. This yields

L(g₀, Q) ≥ L(g₀, P) ≥ L(g, P) = L(g, Q),

where the second inequality follows from the Fisher consistency of the functional L(·,·) and the fact that P is the distribution corresponding to g.

Before we prove Theorem 2.16, we will need an elementary lemma.

Lemma. Fix a sequence 0 < α_n < 1 with α_n ր 1. Let f_{α_n} be an (α_n − 1)-concave density on ℝ. Let g_{α_n} := f_{α_n}^{α_n − 1} be the underlying convex function. Suppose the {g_{α_n}} are linear on [a, b] with lim_{n→∞} f_{α_n}(a) = γ_a ∈ [0, ∞] and lim_{n→∞} f_{α_n}(b) = γ_b ∈ [0, ∞]. Then for all x ∈ [a, b],

(6.7)  f_{α_n}(x) → exp( ((log γ_b − log γ_a)/(b − a)) (x − a) + log γ_a ),

where exp(−∞) := 0 and exp(∞) := ∞.

Proof of Lemma 6.1.
First assume γ b = γ a and γ a , γ b ∈ (0 , ∞ ). Fornotational convenience we drop explicit dependence on n and the limit istaken as α ր
1. Let γ a,α = f α ( a ) = g α ( a ) / ( α − and γ b,α = f α ( b ) = g α ( b ) / ( α − . For any x ∈ [ a, b ],lim α → log f α ( x ) = lim α → α − (cid:18) γ α − b,α − γ α − a,α b − a ( x − a ) + γ α − a,α (cid:19) = lim α → α − (cid:18) γ α − b − γ α − a b − a ( x − a ) · γ α − b,α − γ α − a,α γ α − b − γ α − a + γ α − a,α (cid:19) ≡ log γ a + lim α → α − (cid:18) ( γ α − b − γ α − a ) ( x − a )( b − a ) · γ α − a,α · r α + 1 (cid:19) . (6.8)Since γ α − a,α →
1, we claim that it suffices to show that r α ≡ γ α − b,α − γ α − a,α γ α − b − γ α − a → α → . (6.9)To see this, assume without loss of generality that γ a > γ b and hence γ α − b − γ α − a >
0. Suppose that (6.9) holds and let ǫ >
0. Then the second term on -CONCAVE ESTIMATION right hand side of (6.8) can be bounded from above bylim α ր α − (cid:18)(cid:0) γ α − b − γ α − a (cid:1) ( x − a )( b − a ) (1 − ǫ ) + 1 (cid:19) = lim α ր (cid:0) log γ b · γ α − b − log γ a · γ α − a (cid:1) ( x − a )( b − a ) (1 − ǫ )= (log γ b − log γ a ) ( x − a )( b − a ) (1 − ǫ )where the second line follows from L’Hospital’s rule. Similarly we can derivea lower bound: (log γ b − log γ a ) ( x − a )( b − a ) (1 + ǫ ) . Thus it remains to show that (6.9) holds. But we can rewrite r α as r α = c α − α − c α − − c α − ( c α /c ) α − − ( c α /c ) α − + ( c α /c ) α − − c α − −
1= ( c α /c ) α − + ( c α /c ) α − − c α − − → α → c α /c ) α − ) = ( α −
1) log(c_α/c) → 0 · log 1 = 0, where the second limit follows from an upper and lower bound argument using c_α/c → 1; here c_α := γ_{b,α}/γ_{a,α} and c = γ_b/γ_a ≠ 1. This shows that (6.9) holds, thereby proving the case γ_a ≠ γ_b ∈ (0, ∞). For the case γ_b = γ_a ∈ (0, ∞), similarly we have

lim_{α→1} log f_α(x) = log γ_a + lim_{α→1} (1/(α − 1)) log( ((c_α^{α−1} − 1)/(b − a)) (x − a) + 1 ).

The second term is 0 by an argument much as above, observing that c_α = γ_{b,α}/γ_{a,α} → γ_b/γ_a = 1. Finally, if γ_a ∧ γ_b = 0, then by the first line of (6.8) we see that log f_α(x) → −∞; if γ_a ∨ γ_b = ∞, then again log f_α(x) → ∞. Proof of Theorem 2.16.
In the following, the notation sup α , inf α , lim α is understood as taking corresponding operation over α close to 1 unlessotherwise specified. We first show almost everywhere convergence by invok-ing Lemma 7.13. To see this, for fixed s ∈ ( − / , g α := f α − α and g ( s ) α := ( f α ) s . Then for α > s , the transformed function g ( s ) α is convex.We need to check two conditions in order to apply Lemma 7.13 as follows: HAN AND WELLNER (C1) The set ( X (1) , X ( n ) ) ⊂ { lim inf α f α ( x ) > } ;(C2) There is a uniform lower bound function ˜ g s ∈ G such that g ( s ) α ≥ ˜ g s holds for α sufficiently close to 1.The first assertion can be checked by using the characterization Theorem2.12. Let F α be the distribution function of f α . Then R tX (1) ( F α − F n )( x ) d x ≤ t ∈ S n ( g α ). For x ∈ ( X (1) , X ( n ) ) closeenough to X ( n ) , we claim that lim inf α f α ( x ) >
0. If not, we may assumewithout loss of generality that lim α f α ( x ) = 0. We first note that there existssome t ∈ { , · · · n − } and some subsequence { α ( β ) } β ∈ N with α ( β ) ր X ( t ) is a knot point for { g α ( β ) } , and (2) X ( u ) is not a knotpoint for any { g α ( β ) } for u ≥ t + 1, i.e. g α ( β ) ’s are linear on [ X ( t ) , X ( n ) ]. Wedrop β for notational simplicity and assume without loss of generality thatboth limits lim α f α ( X ( n ) ) , lim α f α ( X ( t ) ) exist. Now Lemma 6.1 shows thatmin { lim α f α ( X ( n ) ) , lim α f α ( X ( t ) ) } = 0 since we have assumed lim α f α ( x ) = 0for some x ∈ ( X ( t ) , X ( n ) ). This in turn implies that lim α f α ( x ) = 0 forall x ∈ ( X ( t ) , X ( n ) ). Now we consider the following two cases to derive acontradiction with the fact(6.10) Z X ( n ) X ( t ) F α ( x )d x = Z X ( n ) X ( t ) F n ( x )d x that follows from Theorem 2.12, thereby proving lim inf α f α ( x ) > x close enough to X ( n ) . [Case 1.] If lim α f α ( X ( n ) ) = 0, then the left hand side of (6.10) convergesto X ( n ) − X ( t ) while the right hand side is no larger than n − n (cid:0) X ( n ) − X ( t ) (cid:1) . [Case 2.] . If lim α f α ( X ( n ) ) >
0, then we must necessarily have lim α f α ( x ) =0 for all x ∈ [ X (1) , X ( n ) ) by convexity of g α : If lim α f α ( x ) > x ∈ [ X (1) , X ( t ) ], then lim α g α ( x ) ∨ g α ( X ( n ) ) < ∞ while lim α g α ( x ) = ∞ for all x ∈ ( X ( t ) , X ( n ) ), which is absurd. Note that this also forces lim α f α ( X ( n ) ) = ∞ , otherwise the constraint R f α = 1 will be invalid eventually. Now the lefthand side of (6.10) converges to 0 while the right hand side is bounded frombelow by n ( X ( n ) − X ( t ) ).Similarly we can show lim inf α f α ( x ) > x close to X (1) . Now (C1)follows by convexity of f α .(C2) can be seen by first noting M := sup α k f α k ∞ < ∞ . This can beverified by Lemma 3.3 combined with the first assertion proved above. Thisimplies that the class { g ( s ) α } α has a uniform lower bound M s . Now (C2)follows by noting that the domain of all g ( s ) α is conv( X ). Therefore allconditions needed for Lemma 7.13 are valid, and hence we can extract a -CONCAVE ESTIMATION subsequence { g ( s ) α n } n ∈ N such thatlim n →∞ ,x → y g ( s ) α n ( x ) = g ( s ) ( y ) , for all y ∈ int(dom( g ( s ) ));lim n →∞ ,x → y g ( s ) α n ( x ) ≥ g ( s ) ( y ) , for all y ∈ R d , holds for some g ( s ) ∈ G . This implies f α n → a.e. f ( s ) as n → ∞ where f ( s ) := (cid:0) g ( s ) (cid:1) /s . Now repeat the above argument with another s with afurther extracted subsequence { α n ( k ) } , we see that f α n ( k ) → a.e. f ( s ) ( k → ∞ )for some s -concave f ( s ) holds for the subsequence { α n ( k ) } k ∈ N . This impliesthat f ( s ) = a.e. f ( s ) . Since a convex function is continuous in the interior ofthe domain, we can choose a version of upper semi-continuous f such that f = f ( s ) a.e. for all { / < s < } ∩ Q . This implies that f is s -concave forany rational 1 / < s < L convergence: For fixed κ >
0, choose 0 > s > − / ( κ + 1). Since there exists a, b > g ( s ) α n ≥ g ( s ) ≥ a k x k − b holds for all n ∈ N , we have anintegrable envelope function: (cid:0) k x k (cid:1) κ (cid:0) f α n ( x ) ∨ f ( x ) (cid:1) ≤ (cid:0) k x k (cid:1) κ (cid:18)(cid:0) a k x k − b (cid:1) ∨ M (cid:19) /s . Now an application of the dominated convergence theorem yields the desiredweighted L convergence. Similar arguments show weighted convergence isalso valid in arbitrary L p norms ( p ≥ f = f by virtue of Theorem 2.2 in D¨umbgen and Rufibach(2009) and Theorem 2.9. We note that by Lemma 6.1, f must be log-linearbetween consecutive data points. Now since f and f are both log-linear be-tween consecutive data points of { X , . . . , X n } , we only have to consider testfunctions h such that h is piecewise linear on consecutive data points. Recall g α = f α − α and g := − log f are the underlying convex functions for f α and f . For any such h with the property that, g + th ∈ G for t small enough, wewish to argue that such h is also a valid test for f α (i.e. g α + th ∈ G for t > { α k } converging up to 1 as k → ∞ . Thuswe only have to argue that for all X ( i ) ∈ S ( g ), X ( i ) ∈ S ( g α ) for a sequence of { α k } going up to 1 as k → ∞ . Assume the contrary that X ( i ) / ∈ S ( g α ) for all α close enough to 1. Then { g α } ’s are all linear on a closed interval I = [ a, b ]containing X ( i ) for α close to 1. Since f α → f uniformly on I by Theorem3.7, in particular f α ( a ) and f α ( b ) converges, Lemma 6.1 entails that f islog-linear over I , a contradiction to the fact X ( i ) ∈ S ( g ). Hence we can finda subsequence { α k } going up to 1 as k → ∞ such that for all X ( i ) ∈ S ( g ), HAN AND WELLNER X ( i ) ∈ S ( g α k ), i.e. for all feasible test function h of f , being linear on con-secutive data points, is also valid for f α k . Now combining the fact that f α k converges in L metric to f and Theorem 2.2 in D¨umbgen and Rufibach(2009) we conclude f = f .6.2. 
Proofs for Section 3.
Proof of Lemma 3.1.
The proof closely follows the first part of theproof of Proposition 2 Kim and Samworth (2015). Suppose dim (cid:0) csupp( ν ) (cid:1) = d , we show csupp( ν ) ⊂ C . To see this, we take x / ∈ C , then there exists δ > B ( x , δ ) ⊂ C c , and we claim that(6.11) For all x ∗ ∈ B ( x , δ ) ⊂ C c , x ∗ / ∈ int(csupp( ν )) . If (6.11) holds, then x / ∈ csupp( ν ) and hence csupp( ν ) ⊂ C . Now we turnto show (6.11). Since x ∗ / ∈ C = { lim inf n →∞ f n ( x ) > } , we can find asubsequence { f n ( k ) } k ∈ N of { f n } n ∈ N such that f n ( k ) ( x ∗ ) < k holds for all k ∈ N . Hence x ∗ / ∈ Γ k := { x ∈ R d : f n ( k ) ( x ) ≥ k } . Note that Γ k is a closedconvex set, hence by Hyperplane Separation Theorem we can find b k ∈ R d with k b k k = 1 such that { x ∈ R d : h b k , x i ≤ h b k , x ∗ i} ⊂ (Γ k ) c . Without lossof generality we may assume b k → b x ∗ as k → ∞ for some b x ∗ ∈ R d with k b x ∗ k = 1. Now for fixed R > η >
0, define A R,η := { x ∈ R d : h b x ∗ , x i < h b x ∗ , x ∗ i − η, k x k ≤ R } . Choose k ∈ N large enough such that k b k − b x ∗ k ≤ η R holds for all k ≥ k ( x ∗ , η, R ). Now for R > k x ∗ k and x ∈ A R,η , we have h b k , x − x ∗ i = h b x ∗ , x − x ∗ i + h b k − b x ∗ , x − x ∗ i < − η + η R ( k x k + k x ∗ k ) ≤ k ≥ k ( x ∗ , η, R ). This implies for R > k x ∗ k and η > A R,η ⊂ { x ∈ R d : h b k , x i ≤ h b k , x ∗ i} ⊂ (Γ k ) c = { x ∈ R d : f n ( k ) ( x ) < k } . Now note A R,η is open, by Portmanteau Theorem we find that ν ( A R,η ) ≤ lim inf k →∞ ν n ( k ) ( A R,η ) = lim inf k →∞ Z A R,η f n ( k ) ( x ) d x ≤ lim inf k →∞ λ d ( A R,η ) k = 0 . This implies ν (cid:0) { x ∈ R d : h b x ∗ , x i < h b x ∗ , x ∗ i} (cid:1) = ν (cid:18) ∞ [ R =1 A R, /R (cid:19) = lim R →∞ ν ( A R, /R ) = 0 , -CONCAVE ESTIMATION where the second equality follows from the fact { A R, /R } is an increasingfamily as R increases. By the assumption that dim (cid:0) csupp( ν ) (cid:1) = d , we find x ∗ / ∈ int(csupp( ν )), as we claimed in (6.11).Now Suppose dim C = d , we claim C ⊂ csupp( ν ). To see this, we onlyhave to show C ⊂ csupp( ν ) by the closedness of csupp( ν ). Suppose not,then we can find x ∈ C \ csupp( ν ). This implies that there exists δ > B ( x , δ ) ∩ csupp( ν ) = ∅ . By the assumption that dim C = d ,we can find x , . . . , x d ∈ B ( x , δ ) ∩ C such that { x , . . . , x d } are in generalposition. By definition of C we can find ǫ > , n ∈ N such that f n ( x j ) ≥ ǫ for all j = 0 , , . . . , d and n ≥ n . By convexity, we conclude that f n ( x ) ≥ ǫ , for all x ∈ conv( { x , . . . , x d } ) and n ≥ n . This gives ν (cid:0) conv( { x , . . . , x d } ) (cid:1) ≥ lim sup n →∞ ν n (cid:0) conv( { x , . . . , x d } ) (cid:1) ≥ ǫ λ d (cid:0) conv( { x , . . . , x d } ) (cid:1) > , a contradiction with B ( x , δ ) ∩ csupp( ν ) = ∅ , thus completing the proof ofthe claim. To summarize, we have proved1. If dim (cid:0) csupp( ν ) (cid:1) = d , then csupp( ν ) ⊂ C . 
This in turn implies dim C = d, and hence C ⊂ csupp(ν). Now it follows that csupp(ν) = C;
2. If dim C = d, then C ⊂ csupp(ν). This in turn implies dim(csupp(ν)) = d, and hence csupp(ν) ⊂ C. Now it follows that csupp(ν) = C. Proof of Lemma 3.2.
The proof is essentially the same as the proof of Proposition 2 of Cule and Samworth (2010), by exploiting convexity at the level of the underlying basic convex function, so we shall omit it.
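The arguments in this section all work at the level of the underlying convex functions g_n = f_n^s rather than the densities themselves. As a numerical sketch of this transform (not part of the proof; the unnormalized kernel f(x) = (1 + x²)^{−1} and the exponent s = −1 are our own choices), one can check midpoint convexity of g = f^s on a grid:

```python
# For s < 0, f is s-concave iff g = f^s is convex.
# Example kernel (our choice): f(x) = (1 + x^2)^(-1) with s = -1, so g(x) = 1 + x^2.
s = -1.0
f = lambda x: (1.0 + x * x) ** (-1.0)
g = lambda x: f(x) ** s

xs = [i * 0.1 for i in range(-50, 51)]
for x in xs:
    for y in xs:
        mid = 0.5 * (x + y)
        # midpoint convexity of the transformed function
        assert g(mid) <= 0.5 * g(x) + 0.5 * g(y) + 1e-12
```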
Proof of Lemma 3.3.
Set U n,t = { x ∈ R d : f n ( x ) ≥ t } . We first claimthat there exists n ∈ N , ǫ ∈ (0 ,
1) such that λ d ( U n,ǫ ) ≥ ǫ holds for all n ≥ n . If not, then for all k ∈ N , l ∈ N , there exists n k,l ∈ N such that λ d ( U n k,l , /l ) ≤ l . Note that { lim inf n f n > } = ∪ k ∈ N ∪ l ∈ N ∩ n ≥ k U n, /l . Since λ d (cid:0) S l ∈ N T n ≥ k U n, /l (cid:1) = lim l →∞ λ d (cid:0) T n ≥ k U n, /l (cid:1) ≤ lim l →∞ λ d ( U n k,l , /l ) =0, we find that C = { lim inf n f n > } is a countable union of null set andhence λ d ( C ) = 0, a contradiction to the assumption dim C = d . This showsthe claim.Denote M n := sup x ∈ R d f n ( x ) , ǫ n ∈ Arg max f n ( x ) . Without loss of gen-erality we assume M n ≥ ǫ (1+ κ s ) /s where κ s = (1 / s − >
0, and we set λ n := κ s M sn ǫ s − M sn ∈ [0 , x ∈ U n,ǫ , by convexity of f sn we have f sn ( ǫ n + λ n ( x − ǫ n )) ≤ λ n f sn ( x )+(1 − λ n ) f sn ( ǫ n ) ≤ λ n ǫ s +(1 − λ n ) M sn = ( M n / s . HAN AND WELLNER
This implies f n ( x ) ≥ M n / n , for all x ∈ V n,ǫ := { ǫ n + λ n ( x − ǫ n ) : x ∈ U n,ǫ } . Hence V n,ǫ ⊂ U n, Ω n and therefore λ d ( V n,ǫ ) = λ d ( U n,ǫ ) λ dn , thus λ d ( U n, Ω n ) ≥ λ d ( V n,ǫ ) = λ d ( U n,ǫ ) λ dn ≥ ǫ λ dn , holds for all n ≥ n . On the other hand,1 = Z f n ≥ Ω n λ d ( U n, Ω n ) ≥ Ω n ǫ λ dn , and suppose the contrary that M n → ∞ as n → ∞ , then1 ≥ Ω n ǫ λ dn = ǫ κ ds ǫ s − M sn ) d M sdn ≥ cM sdn → ∞ , n → ∞ , since 1 + sd > − /d < s <
0. Here c = ǫ − sd κ ds . This givesa contradiction and the proof is complete. Proof of Theorem 3.4.
We only have to show ν is absolutely contin-uous with respect to λ d . To this end, for given ǫ >
0, choose δ = ε/(2M), where M := sup_n ‖f_n‖_∞ < ∞ by virtue of Lemma 3.3. Now for any Borel set A ⊂ R^d with λ_d(A) ≤ δ, we can take an open set A′ ⊃ A such that λ_d(A′) ≤ 2δ by the regularity of Lebesgue measure. Then

ν(A) ≤ ν(A′) ≤ lim inf_{n→∞} ν_n(A′) = lim inf_{n→∞} ∫_{A′} f_n ≤ 2δM = ε,

as desired.
Let g n = f sn and g = f s . Without loss of gen-erality we assume 0 ∈ int(dom( g )), and choose η > B η := B (0 , η ) ⊂ int(dom( g )). By the Lemma 7.10, we know there ex-ists a > , R > g ( x ) − g (0) k x k ≥ a, holds for all k x k ≥ R . Nowwe claim that there exists n ∈ N such that g n ( x ) − g n (0) k x k ≥ a , holds for all k x k ≥ R and n ≥ n . Note for each n ∈ N , by convexity of g n ( · ), we knowthat for fixed x ∈ R d , the quantity g n ( λx ) − g n (0) k λx k is non-decreasing in λ , sowe only have to show the claim for k x k = R and n ≥ n . Suppose the con-trary, then we can find a subsequence { g n ( k ) } and k x n ( k ) k = R such that g n ( k ) ( x n ( k ) ) − g n ( k ) (0) k x n ( k ) k < a . For simplicity of notation we think of { g n } , { x n } as { g n ( k ) } , { x n ( k ) } . Now define A n := conv( { x n , B η } ); B n := { y ∈ R d : k y − x n k ≤ R/ } ; C n := A n ∩ B n . By reducing η > -CONCAVE ESTIMATION assume B η ∩ B n = ∅ . It is easy to see C n is convex and λ d ( C n ) = λ is aconstant independent of n ∈ N . By Lemma 3.2, we know that g n → a.e. g on B η , and hence sup x ∈ B η | g n ( x ) − g ( x ) | → n → ∞ ) by Theorem 10.8,Rockafellar (1997). By further reducing η > g n ( y ) ≤ g (0) + aR , holds for all y ∈ B η and n ∈ N . Now for any x ∗ ∈ C n ,write x ∗ = λx n + (1 − λ ) y , by noting R/ ≤ k x ∗ k ≤ R and convexity of g n ,we get g n ( x ∗ ) − g n (0) k x ∗ k ≤ λg n ( x n ) + (1 − λ ) g n ( y ) − g n (0) k x ∗ k = λ · g n ( x n ) − g n (0) k x n k · k x n kk x ∗ k + (1 − λ ) g n ( y ) − g n (0) k x ∗ k≤ λ · a RR/ − λ ) aR/ R/ a . This gives rise tolim inf n →∞ Z C n ( f n − f ) ≥ lim inf n →∞ λ (cid:0) ( aR/ g n (0)) /s − ( aR/ g (0)) /s (cid:1) = λ (cid:0) ( aR/ g (0)) /s − ( aR/ g (0)) /s (cid:1) > , which is a contradiction to Lemma 7.16. This establishes our claim. 
Nowby Lemma 3.2, we find that the set { lim inf n f n ( · ) > } is full-dimensional,and hence by Lemma 3.3 we conclude g n ( · ) is uniformly bounded away fromzero. Also note by Lemma 7.15 we find g ( · ) must be bounded away fromzero, which gives the desired assertion.Before the proof of Theorem 3.7, we first state some useful lemmas thatgive good control of tails with local information of the s -concave densities;the proof can be found in Section 7.1. Lemma . Let x , . . . , x d be d + 1 points in R d such that its convexhull ∆ = conv( { x , . . . , x d } ) is non-void. If f ( y ) ≤ min j (cid:0) d P i = j f s ( x i ) (cid:1) /s ,then f ( y ) ≤ f max (cid:18) − dr + dr f min C (1 + k y k ) / (cid:19) − r . Here the constant C = λ d (∆)( d +1) − / σ max ( X ) − where X = (cid:18) x . . . x d . . . (cid:19) and f min := min ≤ j ≤ d f ( x j ) , f max := max ≤ j ≤ d f ( x j ) . HAN AND WELLNER
Lemma . Let ν be a probability measure with s -concave density f .Suppose that B (0 , δ ) ⊂ int(dom( f )) for some δ > . Then for any y ∈ R d , sup x ∈ B ( y,δ t ) f ( x ) ≤ J t (cid:18) ν ( B ( ty, δ t )) J λ d ( B ( ty, δ t )) (cid:19) − /r − (1 − t ) !! − r , where J := inf v ∈ B (0 ,δ ) f ( v ) and δ t = δ − t t . Now we are in position to prove Theorem 3.7.
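The proof of Lemma 6.3 (given in Section 7.1) rests on the geometric inclusion B(ty, δ_t) ⊂ (1 − t)B(0, δ) + tx for every x ∈ B(y, δ_t). A small randomized sanity check of this inclusion can be sketched as follows, where the center y, weight t, and radius δ are our own choices, and δ_t is read as δ(1 − t)/(1 + t):

```python
import random, math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

random.seed(0)
d, delta, t = 2, 1.0, 0.7          # dimension, ball radius, convexity weight (our choices)
y = [0.5, -0.25]                   # arbitrary center
delta_t = delta * (1 - t) / (1 + t)

def sample_ball(center, radius):
    # Rejection-sample a point from the Euclidean ball B(center, radius).
    while True:
        v = [random.uniform(-radius, radius) for _ in range(d)]
        if norm(v) <= radius:
            return [c + vi for c, vi in zip(center, v)]

for _ in range(1000):
    x = sample_ball(y, delta_t)                     # x in B(y, delta_t)
    w = sample_ball([t * yi for yi in y], delta_t)  # w in B(ty, delta_t)
    v = [(wi - t * xi) / (1 - t) for wi, xi in zip(w, x)]
    assert norm(v) <= delta + 1e-12                 # so w = (1-t)v + tx with v in B(0, delta)
```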
Proof of Theorem 3.7.
That the sequence { f n } n ∈ N converges uniformlyon any compact subset in int(dom( f )) follows directly from Lemma 3.2 andTheorem 10.8 Rockafellar (1997). Now we show that if f is continuous at y ∈ R d with f ( y ) = 0, then for any η > δ = δ ( y, η ) such that(6.12) lim sup n →∞ sup x ∈ B ( y,δ ( y,η )) f n ( x ) ≤ η. Assume without loss of generality that B (0 , δ ) ⊂ int(dom( f )) for some δ >
0. Let J := inf x ∈ B (0 ,δ ) f ( x ). Then uniform convergence of { f n } to f over B (0 , δ ) entails thatlim inf n →∞ inf x ∈ B (0 ,δ ) f n ( x ) ≥ J . Hence with δ t = δ − t t , it follows from Lemma 6.3 thatlim sup n →∞ sup x ∈ B ( y,δ t ) f n ( x ) ≤ J (cid:18) t (cid:18) (cid:18) ν ( B ( ty, δ t )) J λ d ( B ( ty, δ t )) (cid:19) − /r − (1 − t ) (cid:19)(cid:19) − r ≤ J (cid:18) J /r (cid:0) sup x ∈ B ( ty,δ t ) f ( x ) (cid:1) − /r − (1 − t ) t (cid:19) − r → t ր
1. This completes the proof of (6.12). So far we have shown that

lim_{n→∞} sup_{x ∈ S ∩ B(0,ρ)} |f_n(x) − f(x)| = 0

holds for every ρ ≥
0, where S is the closed set contained in the con-tinuity points of f . Our goal is to let ρ → ∞ and conclude. Let ∆ =conv( { x , . . . , x d } ) be a non-void simplex with x , . . . , x d ∈ int(dom( f )).Note first by a closer look at the proof of Lemma 3.5, f n ( x ) ∨ f ( x ) ≤ -CONCAVE ESTIMATION (cid:0) ( a k x k − b ) (cid:1) /s + holds for all x ∈ R d with some a, b >
0. Let ρ := inf { ρ ≥ (cid:0) aρ − b ) /s ≤ f min / } where f min := min ≤ j ≤ d f ( x i ) >
0. Then { x ∈ R d : k x k ≥ ρ } ⊂ \ n ≥ { f n ≤ f min / } \ { f ≤ f min / }⊂ \ n ≥ n { f n ≤ ( f n ) min } \ { f ≤ f min }⊂ \ n ≥ n { f n ≤ min j (cid:0) d X i = j f sn ( x i ) (cid:1) /s } \ { f ≤ min j (cid:0) d X i = j f s ( x i ) (cid:1) /s } , where n ∈ N is a large constant. The second inclusion follows from thefact that lim n →∞ f n ( x i ) = f ( x i ) holds for i = 0 , . . . , d . By Lemma 6.2 weconclude thatlim sup n →∞ sup x : k x k≥ ρ ∨ ρ (cid:0) k x k ) κ (cid:0) f n ( x ) ∨ f ( x ) (cid:1) ≤ sup x : k x k≥ ρ ∨ ρ f max (cid:0) k x k ) κ (cid:18) − dr + dr f min C (cid:0) k x k (cid:1) / (cid:19) − r → , as ρ → ∞ . This completes the proof. Proof of Theorem 3.8.
Since ∇ ξ f n ( x ) = − rg n ( x ) /s − ∇ ξ g n ( x ), |∇ ξ f n ( x ) − ∇ ξ f ( x ) | = r (cid:12)(cid:12)(cid:12) g n ( x ) /s ∇ ξ g n ( x ) − g ( x ) /s ∇ ξ g ( x ) (cid:12)(cid:12)(cid:12) ≤ r (cid:18) f n ( x ) |∇ ξ g n ( x ) − ∇ ξ g ( x ) | + | f n ( x ) − f ( x ) | |∇ ξ g ( x ) | (cid:19) ≤ r sup x ∈ T | f ( x ) | |∇ ξ g n ( x ) − ∇ ξ g ( x ) | + r sup x ∈ T | f n ( x ) − f ( x ) | sup x ∈ T k∇ g ( x ) k holds for n large enough by Theorem 3.7. By Theorem 23.4 in Rockafellar(1997), ∇ ξ g n ( x ) = τ Tx ξ for some τ x ∈ ∂g n ( x ) since ∂g n ( x ) is a closed set.Thus the first term above is further bounded by2 r sup x ∈ T | f ( x ) | sup x ∈ T,τ ∈ ∂g n ( x ) k τ − ∇ g ( x ) k , which vanishes as n → ∞ in view of Lemma 3.10 in Seijo and Sen (2011).Note that ∇ g ( · ) is continuous on T by Corollary 25.5.1 in Rockafellar (1997),and hence sup x ∈ T k∇ g ( x ) k < ∞ . Now it is easy to see that the second termalso vanishes as n → ∞ by virtue of Theorem 3.7. HAN AND WELLNER
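The identity ∇_ξ f_n = −r g_n^{−r−1} ∇_ξ g_n used above is the chain rule for f = g^{−r}. A one-dimensional finite-difference sketch, with an arbitrary smooth convex g of our own choosing:

```python
r = 2.0                               # r = -1/s > 0 (our choice)
g  = lambda x: 1.0 + x + x * x        # a smooth convex function (our choice)
dg = lambda x: 1.0 + 2.0 * x          # its derivative
f  = lambda x: g(x) ** (-r)

x, h = 0.3, 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)        # central difference approximation to f'(x)
exact = -r * g(x) ** (-r - 1.0) * dg(x)     # chain rule: f' = -r g^{-r-1} g'
assert abs(fd - exact) < 1e-6
```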
6.3. Proofs for Section 4.
Before we prove Theorem 4.1, we will need the following tightness result.
Theorem . We have the following conclusions.1. For fixed
K > , the modified local process Y locmod n ( · ) converges weaklyto a drifted integrated Gaussian process on C [ − K, K ] : Y locmod n ( t ) → d p f ( x ) Z t W ( s ) d s − rg ( k )0 ( x ) g ( x )( k + 2)! t k +2 , where W ( · ) is the standard two-sided Brownian motion starting from on R .2. The localized processes satisfy Y locmod n ( t ) − H locmod n ( t ) ≥ , with equality attained for all t such that x + tn − / (2 k +1) ∈ S (ˆ g n ) .3. The sequences { ˆ A n } and { ˆ B n } are tight. The above theorem includes everything necessary in order to apply the‘invelope’ argument roughly indicated in Section 4.1. For a proof of thistechnical result, we refer the reader to Section 7.2. Here we will provideproofs for our main results.
Proof of Theorem 4.1.
By the same tightness and uniqueness argu-ment adopted in Groeneboom, Jongbloed and Wellner (2001), Balabdaoui and Wellner(2007), and Balabdaoui, Rufibach and Wellner (2009), we only have to findthe rescaling constants. To this end we denote H ( · ), Y ( · ) the correspond-ing limit of H locmod n ( · ) and Y locmod n ( · ) in the uniform topology on the space C [ − K, K ], and let Y ( t ) = γ Y k ( γ t ) , where by Theorem 6.4, we know that Y ( t ) = 1 p f ( x ) Z t W ( s ) d s − rg ( k )0 ( x ) g ( x )( k + 2)! t k +2 . Let a := (cid:0) f ( x ) (cid:1) − / and b := rg ( k )0 ( x ) g ( x )( k +2)! , then by rescaling property ofBrownian motion, we find that γ γ / = a, γ γ k +22 = b . Solving for γ , γ yields(6.13) γ = a k +42 k +1 b − k +1 , γ = a − k +1 b k +1 . -CONCAVE ESTIMATION On the other hand, by (4.3), let n → ∞ , we find that n k k +1 (cid:0) ˆ g n ( x + s n t ) − g ( x ) − s n tg ′ ( x ) (cid:1) n k − k +1 (cid:0) ˆ g ′ n ( x + s n t ) − g ′ ( x ) (cid:1) ! → d g ( x ) − r d d t H ( t ) g ( x ) − r d d t H ( t ) ! (6.14)It is easy to see that d d t H ( t ) = γ γ
22 d d t H k ( γ t ) nd d d t H ( t ) = γ γ
32 d d t H k ( γ t ).Now by substitution in (6.13) we get the conclusion by direct calculation andthe delta method. Proof of Theorem 4.4.
The proof is essentially the same as that of Theorem 3.6 of Balabdaoui, Rufibach and Wellner (2009).
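The rescaling step in the proof of Theorem 4.1 solves, via the Brownian scaling of the two terms of Y, the pair of equations γ_1 γ_2^{3/2} = a and γ_1 γ_2^{k+2} = b (writing γ_1, γ_2 for the two rescaling constants). A quick numeric check that the solution in (6.13) satisfies both equations, with arbitrary positive a, b and an even k of our own choosing:

```python
a, b, k = 2.0, 3.0, 2          # arbitrary positive constants and an even k (our choices)

# Solution as in (6.13):
#   gamma1 = a^{(2k+4)/(2k+1)} * b^{-3/(2k+1)},  gamma2 = a^{-2/(2k+1)} * b^{2/(2k+1)}
gamma1 = a ** ((2 * k + 4) / (2 * k + 1)) * b ** (-3 / (2 * k + 1))
gamma2 = a ** (-2 / (2 * k + 1)) * b ** (2 / (2 * k + 1))

assert abs(gamma1 * gamma2 ** 1.5 - a) < 1e-9        # gamma1 * gamma2^{3/2} = a
assert abs(gamma1 * gamma2 ** (k + 2) - b) < 1e-9    # gamma1 * gamma2^{k+2} = b
```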
Lemma . Assume (A1)-(A4). Then Z ∞−∞ ˜ f ǫ ( x ) d x = 1 + π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) , where π k = 1( k + 1)! h k − (2 k − k + 3) + 2 k − i . Proof of Lemma 6.5.
This is straightforward calculation by Taylor ex-pansion. Note that Z ∞−∞ ˜ g − rǫ ( x ) d x = Z ∞−∞ (˜ g − rǫ ( x ) − g − r ( x )) d x + 1= Z m − ǫm − c ǫ ǫ (cid:18) ˜ g − rǫ ( x ) − g − r ( x ) (cid:19) d x + Z m + ǫm − ǫ (cid:18) ˜ g − rǫ ( x ) − g − r ( x ) (cid:19) d x + 1:= I + II + 1 . For y > x , we have x − r − y − r = P ∞ n ≥ (cid:0) − rn (cid:1) ( − n ( y − x ) n y − r − n . Now for the HAN AND WELLNER first term above, we continue our calculation of its leading term by noting g ( x ) − ˜ g ǫ ( x )= g ( x ) − g ( m − c ǫ ǫ ) − ( x − m + c ǫ ǫ ) g ′ ( m − c ǫ ǫ )= g ( m ) + g ( k ) ( m ) k ! ( x − m ) k − (cid:20) g ( m ) + g ( k ) ( m ) k ! ( − c ǫ ǫ ) k (cid:21) − ( x − m + c ǫ ǫ ) g ( k ) ( m )( k − − c ǫ ǫ ) k − + higher order terms= g ( k ) ( m ) k ! (cid:20) ( x − m ) k − c kǫ ǫ k + kc k − ǫ ǫ k − ( x − m + c ǫ ǫ ) (cid:21) + higher order terms . (6.15)Here we used the fact k is an even number, as shown in Lemma 7.2. Thuswe haveleading term of I= Z m − ǫm − c ǫ ǫ r (cid:18) g ( x ) − g ( m − c ǫ ǫ ) − ( x − m + c ǫ ǫ ) g ′ ( m − c ǫ ǫ ) (cid:19) g ( x ) − r − d x = rg ( k ) ( m ) k ! g ( m ) r +1 Z m − ǫm − c ǫ ǫ (cid:20) ( x − m ) k − c kǫ ǫ k + kc k − ǫ ǫ k − ( x − m + c ǫ ǫ ) (cid:21) d x + o ( ǫ k +1 )= α k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 )Here α k = 1( k + 1)! h k − (2 k − k + 3) − i . For the second term, g ( x ) − ˜ g ǫ ( x )= g ( x ) − g ( m + ǫ ) − ( x − m − ǫ ) g ′ ( m + ǫ )= g ( k ) ( m ) k ! (cid:20) ( x − m ) k − ǫ k − kǫ k − ( x − m − ǫ ) (cid:21) + higher order terms . (6.16)Now similar calculations yield that the second term = β k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) with β k = 2 k ( k + 1)! . This gives the conclusion. -CONCAVE ESTIMATION Proof of Lemma 4.6.
By definition of the Hellinger metric and Lemma6.5, we have2 h ( f ǫ , f ) = Z ∞−∞ (cid:0)p f ǫ ( x ) − p f ( x ) (cid:1) d x = Z ∞−∞ (cid:18) ˜ g − r/ ǫ ( x ) − π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) ! − g − r/ ( x ) (cid:19) d x ≡ Z ∞−∞ (cid:16) ˜ g − r/ ǫ ( x )(1 + η k ( ǫ )) − g − r/ ( x ) (cid:17) d x since f ǫ ( x ) = ˜ g − rǫ ( x ) (cid:18) π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) (cid:19) − = ˜ g − rǫ ( x ) (cid:18) − π k rg ( k ) ( m ) g ( m ) r +1 ǫ k +1 + o ( ǫ k +1 ) (cid:19) . Here η k ( ǫ ) = O ( ǫ k +1 ). Splitting two terms apart in the above integral weget2 h ( f ǫ , f ) = Z ∞−∞ (cid:18) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) + η k ( ǫ )˜ g − r/ ǫ ( x ) (cid:19) d x = Z ∞−∞ (cid:0) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) (cid:1) d x + (cid:0) η k ( ǫ ) (cid:1) Z ∞−∞ ˜ g − rǫ ( x ) d x + 2 η k ( ǫ ) Z ∞−∞ ˜ g − r/ ǫ ( x ) (cid:0) ˜ g − r/ ǫ ( x ) − g − r/ ( x ) (cid:1) d x = I + II + III.
Now for the first term, I = Z m + ǫm − c ǫ ǫ r (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) g ( x ) − r − d x + higher order terms= r g ( m ) r +2 Z m + ǫm − c ǫ ǫ (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x + higher order terms= r g ( m ) r +2 (cid:18) Z m − ǫm − c ǫ ǫ + Z m + ǫm − ǫ (cid:19)(cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x + higher order terms= I + I + higher order terms . HAN AND WELLNER
By (6.15) and (6.16) we see that for i = 1 , I i = r g ( m ) r +2 Z I i (cid:2) g ( x ) − ˜ g ǫ ( x ) (cid:3) d x = ζ ( i ) k r f ( m ) g ( k ) ( m ) g ( m ) ǫ k +1 + o ( ǫ k +1 ) . Here I = [ m − c ǫ ǫ, m − ǫ ], I = [ m − ǫ, m + ǫ ], and ζ (1) k = 1108( k !) ( k + 1)( k + 2)(2 k + 1) (cid:20) − · k +2 (2 k + 1)(3 k +2 + k + k − k + 1)( k + 2) (cid:18) k +1 −
1) + 2 · k (2 k + 1)(2 k (2 k −
9) + 27) (cid:19)(cid:21) .ζ (2) k = 2 k (2 k + 1)3( k !) ( k + 1)(2 k + 1) . On the other hand, II = O ( ǫ (2 k +2) ) = o ( ǫ k +1 ) and | III | ≤ O ( ǫ k +1 · ǫ (2 k +1) / · ǫ (2 k +2) / ) = o ( ǫ k +1 ) by Cauchy-Schwarz. This completes the proof.
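The proof above starts from the squared Hellinger distance 2h²(f, g) = ∫(√f − √g)². As a standalone numerical illustration of this definition (the unit-variance Gaussian pair and its closed-form affinity exp(−μ²/8) are standard facts, not part of the argument):

```python
import math

def hellinger_sq(f, g, lo=-15.0, hi=15.0, n=200_000):
    # h^2(f, g) = (1/2) * integral of (sqrt(f) - sqrt(g))^2, by midpoint-rule quadrature.
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2 * dx
    return 0.5 * total

phi = lambda x, mu: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
mu = 1.0
h2 = hellinger_sq(lambda x: phi(x, 0.0), lambda x: phi(x, mu))
# Closed form for two unit-variance Gaussians: h^2 = 1 - exp(-mu^2 / 8).
assert abs(h2 - (1 - math.exp(-mu * mu / 8))) < 1e-6
```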
7. Appendix.
7.1. Proofs of Lemmas 6.2 and 6.3.
Lemma . Let ν be a probability measure with s -concave density f ,and x , . . . , x d ∈ R d be d + 1 points such that ∆ := conv( { x , . . . , x d } ) isnon-void. If f ( x ) ≤ (cid:0) d P di =1 f s ( x i ) (cid:1) /s , then f ( x ) ≤ ¯ g − r (cid:18) − dr + dr λ d (∆)¯ g − r ν (∆) (cid:19) − r , where ¯ g := d P dj =1 f s ( x j ) . Proof of Lemma 7.1.
For any point x ∈ ∆, we can find some u =( u , . . . , u d ) ∈ ∆ d = { u : P di =1 u i ≤ } such that x ( u ) = P di =0 u i x i . Here u := 1 − P di =1 u i ≥
0. We use the following representation of integrationon the unit simplex ∆ d : For any measurable function h : ∆ d → [0 , ∞ ), wehave R ∆ d h ( u ) d u = d ! E h ( B , . . . , B d ) , where B i = E i / P dj =0 E j with inde-pendent, standard exponentially distributed random variables E , . . . , E d . ν (∆) λ d (∆) = 1 λ d (∆ d ) Z ∆ d g (cid:0) x ( u ) (cid:1) − r d u = E g (cid:18) d X j =0 B j x j (cid:19) − r ≥ E (cid:18) d X j =0 B j g ( x j ) (cid:19) − r = E (cid:18) B g + (1 − B ) d X i =1 ˜ B i g ( x i ) (cid:19) − r , -CONCAVE ESTIMATION where ˜ B i := E i / P dj =1 E j for 1 ≤ i ≤ d . Following Cule and D¨umbgen(2008), it is known that B and { ˜ B i } di =1 are independent, and E [ ˜ B i ] = 1 /d .Hence it follows from Jensen’s inequality that ν (∆) λ d (∆) ≥ E " E (cid:18) B g + (1 − B ) d X i =1 ˜ B i g ( x i ) (cid:19) − r (cid:12)(cid:12)(cid:12)(cid:12) B ≥ E (cid:18) B g + (1 − B ) 1 d d X i =1 g ( x i ) (cid:19) − r = E ( B g + (1 − B )¯ g ) − r = Z d (1 − t ) d − (cid:0) tg + (1 − t )¯ g (cid:1) − r d t = ¯ g − r Z d (1 − t ) d − (cid:18) − st (cid:18) ( − /s ) (cid:18) g ¯ g − (cid:19)(cid:19)(cid:19) d t = ¯ g − r J d,s (cid:18) − s (cid:18) g ¯ g − (cid:19) (cid:19) , where J d,s ( y ) = Z d (1 − t ) d − (1 − syt ) /s d t. We claim that J d,s ( y ) ≥ Z d (1 − t ) d − (1 − t ) y d t = dd + y , holds for s < , y >
0. To see this, we write (1 − syt ) /s = (1 + yt/r ) − ( r/y ) y . Then we only have to show (1 + yt/r ) − r/y ≥ (1 − t ) for 0 ≤ t ≤
1, or equivalently (1 + bt) ≤ (1 − t)^{−b}, where we let b = y/r. Let g(t) := (1 − t)^{−b} − (1 + bt). It is easy to verify that g(0) = 0, g′(t) = b(1 − t)^{−b−1} − b with g′(0) = 0, and g′′(t) = b(b + 1)(1 − t)^{−b−2} ≥
0. Integrating g′′ twice yields g(t) ≥
0, and hence we have verified the claim. Now we proceed withthe calculation ν (∆) λ d (∆) ≥ ¯ g − r J d,s (cid:18) − s (cid:18) g ¯ g − (cid:19) (cid:19) ≥ ¯ g − r dd − s (cid:0) g ¯ g − (cid:1) . Solving for g and replacing − /s = r proves the desired inequality. Proof of Lemma 6.2.
For fixed j ∈ { , . . . , d } , note | det( x i − x j ) : i = j | = | det X | where X = (cid:18) x . . . x d . . . (cid:19) . Also for each y ∈ R d , since ∆ = HAN AND WELLNER conv( { x , . . . , x d } ) is non-void, y must be in the affine hull of ∆ and hencewe can write y = P di =0 λ i x i with P di =0 λ i = 1 (not necessary non-negative),i.e. λ = X − (cid:0) y (cid:1) . Let ∆ j ( y ) := conv( { x i : i = j } ∪ { y } ). Then λ d (∆ j ( y )) = 1 d ! (cid:12)(cid:12)(cid:12)(cid:12) det (cid:18) x . . . x j − y x j +1 . . . x d . . . . . . (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) = 1 d ! | λ j | | det X | = | λ j | λ d (∆) . Hence,max ≤ j ≤ d λ d (∆ j ( y )) ≥ λ d (∆) max j | λ j | = λ d (∆) k X − (cid:18) y (cid:19) k ∞ ≥ λ d (∆)( d + 1) − / k X − (cid:18) y (cid:19) k≥ λ d (∆)( d + 1) − / σ max ( X ) − (1 + k y k ) / = C (1 + k y k ) / . Now the conclusion follows from Lemma 7.1 by noting f ( y ) ≤ ¯ g − rj (cid:18) − dr + dr λ d (∆ j ( y ))¯ g − rj ν (∆ j ( y )) (cid:19) − r ≤ f max (cid:18) − dr + dr f min C (1 + k y k ) / (cid:19) − r , since ¯ g − rj = (cid:0) d P i = j f s ( x i ) (cid:1) /s and hence f min ≤ ¯ g − rj ≤ f max , and the index j is chosen such that λ d (∆ j ( y )) is maximized. Proof of Lemma 6.3.
The key point is that, for any point x ∈ B(y, δ_t),

B(ty, δ_t) ⊂ (1 − t)B(0, δ) + tx,

which can be shown in the same way as in the proof of Lemma 4.2 of Schuhmacher, Hüsler and Dümbgen (2011). Namely, pick any w ∈ B(ty, δ_t) and let v := (1 − t)^{−1}(w − tx). Then since

‖v‖ = (1 − t)^{−1}‖w − tx‖ = (1 − t)^{−1}‖w − ty + t(y − x)‖ ≤ (1 − t)^{−1}(δ_t + tδ_t) = δ,

we have v ∈ B(0, δ). This implies that w = (1 − t)v + tx ∈ (1 − t)B(0, δ) + tx, as desired. By s-concavity of f, we have

f(w) ≥ ((1 − t)f(v)^s + tf(x)^s)^{1/s} ≥ ((1 − t)J^s + tf(x)^s)^{1/s} = J(1 − t + t(f(x)/J)^s)^{1/s}.

Averaging over w ∈ B(ty, δ_t) yields

ν(B(ty, δ_t))/λ_d(B(ty, δ_t)) ≥ J(1 − t + t(f(x)/J)^s)^{1/s}.

Solving for f(x) completes the proof.

7.2. Proof of Theorem 6.4.
We first observe that
Lemma . k is an even integer and g ( k )0 ( x ) > . Proof of Lemma 7.2.
By Taylor expansion of g ′′ around x , we findthat locally for x ≈ x , g ′′ ( x ) = g ( k )0 ( x )( k − x − x ) k − + o (cid:0) ( x − x ) k − (cid:1) . Also note g ′′ ( x ) ≥ k − g ( k )0 ( x ) > k , r n := n k +22 k +1 ; s n := n − k +1 ; x n ( t ) := x + s n t ; l n,x :=[ x , x n ( t )] . Let τ + n := inf { t ∈ S n (ˆ g n ) : t > x } , and τ − n := sup { t ∈ S n (ˆ g n ) : t < x } . The key step in establishing the limit theory, is to establish astochastic bound for the gap τ + n − τ − n as follows. Theorem . Assume (A1)-(A4) hold. Then τ + n − τ − n = O p ( s n ) . Proof.
Define ∆ ( x ) := ( τ − n − x ) [ τ − n , ¯ τ ] ( x ) + ( x − τ + n ) [¯ τ,τ + n ] ( x ), and∆ := ∆ + τ + n − τ − n [ τ − n ,τ + n ] , where ¯ τ =: τ − n + τ + n . Thus we find that Z ∆ d( F n − F ) = Z ∆ d( F n − ˆ F n ) + Z ∆ d( ˆ F n − F ) ≥ − τ + n − τ − n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z τ + n τ − n d( F n − ˆ F n ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + Z ∆ ( ˆ f n − f ) d λ ≥ − τ + n − τ − n n + Z ∆ ( ˆ f n − f ) d λ, where the last line follows from Corollary 2.13. Now let R n := R ∆ ( ˆ f n − f ) d λ, R n := R ∆ d( F n − F ) . The conclusion follows directly from thefollowing lemma. HAN AND WELLNER
Lemma . Suppose (A1)-(A4) hold. Then R n = O p ( τ + n − τ − n ) k +2 and R n = O p ( r − n ) . Proof of Lemma 7.4.
Define p n := ˆ g n /g on [ τ + n , τ − n ]. It is easy to seethat τ + n − τ − n = o p (1), so with large probability, for all n ∈ N large enough,inf x ∈ [ τ + n ,τ − n ] f ( x ) > R n = Z τ + n τ − n ∆ ( x ) (cid:0) ˆ f n ( x ) − f ( x ) (cid:1) d x = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) ˆ f n ( x ) f ( x ) − (cid:19) d x = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) k − X j =1 (cid:18) − rj (cid:19) ( p n ( x ) − j + (cid:18) − rk (cid:19) θ − r − kx,n ( p n ( x ) − k (cid:19) d x, where θ x,n ∈ [1 ∧ ˆ g n ( x ) g ( x ) , ∨ ˆ g n ( x ) g ( x ) ]. Now define S nj = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) − rj (cid:19) ( p n ( x ) − j d x, ≤ j ≤ k − ,S nk = Z τ + n τ − n ∆ ( x ) f ( x ) (cid:18) − rk (cid:19) θ − r − kx,n ( p n ( x ) − k d x. Expand f around ¯ τ , then we have S nj = k − X l =0 Z τ + n τ − n ∆ ( x ) f ( l )0 (¯ τ ) l ! ( x − ¯ τ ) l (cid:18) − rj (cid:19) ( p n ( x ) − j d x + Z τ + n τ − n ∆ ( x ) f ( l )0 ( η n,x,k ) k ! ( x − ¯ τ ) k (cid:18) − rk (cid:19) ( p n ( x ) − k d x,S nk = k − X l =0 Z τ + n τ − n ∆ ( x ) f ( l )0 (¯ τ ) l ! θ − r − kx,n ( x − ¯ τ ) l (cid:18) − rj (cid:19) ( p n ( x ) − k d x + Z τ + n τ − n ∆ ( x ) f ( l )0 ( η n,x,k ) k ! θ − r − kx,n ( x − ¯ τ ) k (cid:18) − rk (cid:19) ( p n ( x ) − k d x. Now we see the dominating term is the first term in S n since all other termsare of higher orders, and | θ x,n − | = o p (1) uniformly locally in x in view ofTheorem 3.7. We denote this term Q n . 
Note that 1/g_0(x) = 1/g_0(x_0) + o_p(1) uniformly for x in a neighborhood of x_0, and that ĝ_n is piecewise linear, yielding

Q_n / (−r f_0(τ̄)) = ∫_{τ_n^-}^{τ_n^+} ∆(x) (1/g_0(x)) (ĝ_n(x) − g_0(x)) dx
  = (1/g_0(x_0) + o_p(1)) ∫_{τ_n^-}^{τ_n^+} ∆(x) (ĝ_n(x) − g_0(x)) dx
  = (1/g_0(x_0) + o_p(1)) [ (ĝ_n(τ̄) − g_0(τ̄)) ∫_{τ_n^-}^{τ_n^+} ∆(x) dx + (ĝ_n'(τ̄) − g_0'(τ̄)) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄) dx − Σ_{j=2}^{k} (g_0^{(j)}(τ̄)/j!) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx − ∫_{τ_n^-}^{τ_n^+} ε_n(x) ∆(x)(x − τ̄)^k dx ],

where the first two terms in the bracket are zero by construction of ∆. Now note that

∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx = 0, if j = 0 or j is odd;
  = (j / (2^{j+2} (j+1)(j+2))) (τ_n^+ − τ_n^-)^{j+2}, if j > 0 and j is even,

and that g_0^{(j)}(τ̄) = (1/(k−j)!) (g_0^{(k)}(x_0) + o_p(1)) (τ̄ − x_0)^{k−j}. This means that for j ≥ 2, j even,

(g_0^{(j)}(τ̄)/j!) ∫_{τ_n^-}^{τ_n^+} ∆(x)(x − τ̄)^j dx = (j (g_0^{(k)}(x_0) + o_p(1)) / ((k−j)! (j+2)! 2^{j+2})) (τ̄ − x_0)^{k−j} (τ_n^+ − τ_n^-)^{j+2}
  = (j (g_0^{(k)}(x_0) + o_p(1)) / ((k−j)! (j+2)! 2^{j+2})) O_p(1) (τ_n^+ − τ_n^-)^{k+2}.

Further noting that ||ε_n||_∞ = o_p(1) as τ_n^+ − τ_n^- →_p 0, we get Q_n = O_p((τ_n^+ − τ_n^-)^{k+2}). This establishes the first claim. The proof for R_{n,2} follows the same line as that of Lemma 4.4 in Balabdaoui, Rufibach and Wellner (2009), pages 1318-1319.

Lemma 7.5. We have the following:

f_0^{(j)}(x_0) = j! binom(−r, j) g_0(x_0)^{−r−j} (g_0'(x_0))^j, 1 ≤ j ≤ k − 1,
f_0^{(k)}(x_0) = k! binom(−r, k) g_0(x_0)^{−r−k} (g_0'(x_0))^k − r g_0(x_0)^{−r−1} g_0^{(k)}(x_0).
Proof.
This follows from direct calculation.
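As a numerical sanity check of Lemma 7.5 (a sketch, not part of the original argument): for k = 2 both displayed identities are exact consequences of the chain rule applied to f_0 = g_0^{−r}. The snippet below compares central finite differences of f_0 with the claimed expressions; the quadratic g_0, the value r = 2, and the point x_0 = 1 are arbitrary test choices, not from the paper.

```python
def binom_neg(r, j):
    # generalized binomial coefficient C(-r, j) = (-r)(-r-1)...(-r-j+1)/j!
    out = 1.0
    for i in range(j):
        out *= (-r - i) / (i + 1)
    return out

r = 2.0
g = lambda x: 2 + 3 * x + x ** 2      # a convex, positive g_0 near x_0
gp = lambda x: 3 + 2 * x              # g_0'
gpp = lambda x: 2.0                   # g_0^{(k)} with k = 2
f = lambda x: g(x) ** (-r)            # f_0 = g_0^{-r}

x0, h = 1.0, 1e-4
f1 = (f(x0 + h) - f(x0 - h)) / (2 * h)                 # numeric f_0'
f2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2      # numeric f_0''

# Lemma 7.5 with k = 2: the j = 1 formula, then the k-th derivative formula
lem1 = 1 * binom_neg(r, 1) * g(x0) ** (-r - 1) * gp(x0)
lem2 = (2 * binom_neg(r, 2) * g(x0) ** (-r - 2) * gp(x0) ** 2
        - r * g(x0) ** (-r - 1) * gpp(x0))

print(abs(f1 - lem1) < 1e-6, abs(f2 - lem2) < 1e-4)  # True True
```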
Lemma 7.6. For any M > 0, we have

sup_{|t| ≤ M} |ĝ_n'(x_0 + s_n t) − g_0'(x_0)| = O_p(s_n^{k−1}),
sup_{|t| ≤ M} |ĝ_n(x_0 + s_n t) − g_0(x_0) − s_n t g_0'(x_0)| = O_p(s_n^k).

The proof is identical to that of Lemma 4.4 in Groeneboom, Jongbloed and Wellner (2001), so we shall omit it.
Lemma 7.7. Let

ê_n(u) := f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j − f_0(x_0) binom(−r, k) (g_0'(x_0)/g_0(x_0))^k (u − x_0)^k.

Then for any M > 0, we have sup_{|t| ≤ M} |ê_n(x_0 + s_n t)| = O_p(s_n^k).

Proof.
Note that

f̂_n(u) − f_0(x_0) = f_0(x_0) [ f̂_n(u)/f_0(x_0) − 1 ] = f_0(x_0) [ (ĝ_n(u)/g_0(x_0))^{−r} − 1 ]
  = f_0(x_0) ( Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j + Ψ̂_{k,n,1}(u) ),   (7.1)

where Ψ̂_{k,n,1}(u) := Σ_{j ≥ k+1} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j = Σ_{j ≥ k+1} binom(−r, j) g_0(x_0)^{−j} (ĝ_n(u) − g_0(x_0))^j. Note that

(ĝ_n(u) − g_0(x_0))^j = ( ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) + (u − x_0) g_0'(x_0) )^j
  = Σ_{l=1}^{j} binom(j, l) [ ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) ]^l (u − x_0)^{j−l} g_0'(x_0)^{j−l} + (u − x_0)^j g_0'(x_0)^j
  = O_p(s_n^{kl} · s_n^{j−l}) + O_p(s_n^j) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}
  = O_p(n^{−j/(2k+1)}),

if j ≥ k + 1. Here the third line follows from Lemma 7.6. This implies Ψ̂_{k,n,1}(u) = o_p(n^{−k/(2k+1)}) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}. Using the same expansion in the first term on the right-hand side of (7.1), we arrive at

f̂_n(u) − f_0(x_0)   [=: (1)]
  = f_0(x_0) Σ_{j=1}^{k} binom(−r, j) g_0(x_0)^{−j} Σ_{l=1}^{j} binom(j, l) [ ĝ_n(u) − g_0(x_0) − (u − x_0) g_0'(x_0) ]^l (u − x_0)^{j−l} g_0'(x_0)^{j−l}   [=: (2)]
  + f_0(x_0) Σ_{j=1}^{k} binom(−r, j) (g_0'(x_0)/g_0(x_0))^j (u − x_0)^j   [=: (3)]
  + f_0(x_0) Ψ̂_{k,n,1}(u).   [=: (4)]

By Lemma 7.5, we see that ê_n(u) = (1) − (3) = (2) + (4) = O_p(s_n^k) uniformly on {u : |u − x_0| ≤ M n^{−1/(2k+1)}}. This yields the desired result. We are now ready for the proof of Theorem 6.4.
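The control of the remainder Ψ̂_{k,n,1} above rests on the elementary fact that truncating the series (1 + δ)^{−r} = Σ_{j≥0} binom(−r, j) δ^j after the δ^k term leaves an error of order δ^{k+1}. A quick numeric sketch (r, k and the grid of δ values are arbitrary illustrative choices):

```python
def binom_neg(r, j):
    # generalized binomial coefficient C(-r, j)
    out = 1.0
    for i in range(j):
        out *= (-r - i) / (i + 1)
    return out

r, k = 1.5, 3
for delta in (1e-1, 1e-2, 1e-3):
    partial = sum(binom_neg(r, j) * delta ** j for j in range(k + 1))
    err = abs((1 + delta) ** (-r) - partial)
    # the truncation error behaves like |C(-r, k+1)| * delta^(k+1),
    # so this ratio stabilizes near |C(-1.5, 4)| ≈ 2.46 as delta shrinks
    ratio = err / delta ** (k + 1)
    print(delta, ratio)
```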
Proof of Theorem 6.4.
For the first assertion, note that

[f_0(x_0)]^{-1} ( f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )
  = [f_0(x_0)]^{-1} ( f̂_n(u) − f_0(x_0) − Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )
  = [f_0(x_0)]^{-1} ( f_0(x_0) ( Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j + Ψ̂_{k,n,1}(u) ) − Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j )   [by (7.1)]
  = Ψ̂_{k,n,1}(u) + Σ_{j=1}^{k} binom(−r, j) (ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=1}^{k−1} (f_0^{(j)}(x_0)/j!) (u − x_0)^j
  = Ψ̂_{k,n,1}(u) + binom(−r, 1)(ĝ_n(u)/g_0(x_0) − 1) − (f_0'(x_0)/f_0(x_0))(u − x_0) + Σ_{j=2}^{k} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=2}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j
  = −(r/g_0(x_0)) ( ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0) ) + Ψ̂_{k,n,2}(u),

where

Ψ̂_{k,n,2}(u) := Ψ̂_{k,n,1}(u) + Σ_{j=2}^{k} binom(−r, j)(ĝ_n(u)/g_0(x_0) − 1)^j − [f_0(x_0)]^{-1} Σ_{j=2}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j.

Now we calculate ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv. First, |∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,1}(u) du dv| ≤ (t²/2) n^{−2/(2k+1)} sup_{u ∈ l_{n,x_0}} |Ψ̂_{k,n,1}(u)| = o_p(r_n^{-1}). By Lemma 7.5, [f_0(x_0)]^{-1} f_0^{(j)}(x_0)/j! = binom(−r, j)(g_0'(x_0)/g_0(x_0))^j for 2 ≤ j ≤ k − 1, so expanding (ĝ_n(u)/g_0(x_0) − 1)^j as in the proof of Lemma 7.7 gives

∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv
  = o_p(r_n^{-1}) + binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ∫_{l_{n,x_0}} ∫_{x_0}^{v} (u − x_0)^k du dv
  + ( Σ_{j=2}^{k} binom(−r, j) g_0(x_0)^{−j} ∫_{l_{n,x_0}} ∫_{x_0}^{v} Σ_{l=1}^{j} binom(j, l)(ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0))^l (u − x_0)^{j−l} [g_0'(x_0)]^{j−l} du dv )
  =: o_p(r_n^{-1}) + (2) + (1).

Consider (1): for each (j, l) satisfying 1 ≤ l ≤ j ≤ k and j ≥ 2, we have

r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} (ĝ_n(u) − g_0(x_0) − g_0'(x_0)(u − x_0))^l (u − x_0)^{j−l} [g_0'(x_0)]^{j−l} du dv
  = n^{(k+2)/(2k+1)} · O(n^{−2/(2k+1)}) · O_p(n^{−kl/(2k+1)}) · O_p(n^{−(j−l)/(2k+1)}) = O_p(n^{−(k(l−1)+(j−l))/(2k+1)}) = o_p(1).

Consider (2) as follows:

(2) = binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ∫_{l_{n,x_0}} ∫_{x_0}^{v} (u − x_0)^k du dv = (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k t^{k+2} r_n^{-1}.

Hence we have

r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv = (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k t^{k+2} + o_p(1).

Note by definition we have

(7.2)  Y_n^{locmod}(t) = Y_n^{loc}(t)/f_0(x_0) − r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv.

Letting n → ∞, by the same calculation as in the proof of Theorem 6.2 of Groeneboom, Jongbloed and Wellner (2001), we have

Y_n^{locmod}(t) →_d (1/√(f_0(x_0))) ∫_0^t W(s) ds + [ f_0^{(k)}(x_0)/((k+2)! f_0(x_0)) − (1/((k+1)(k+2))) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k ] t^{k+2}
  = (1/√(f_0(x_0))) ∫_0^t W(s) ds − (r g_0^{(k)}(x_0)/(g_0(x_0)(k+2)!)) t^{k+2},

where the last line follows from Lemma 7.5. Now we turn to the second assertion. It is easy to check by the definition of Ψ̂_{k,n,2}(·) that

(7.3)  H_n^{locmod}(t) = H_n^{loc}(t)/f_0(x_0) − r_n ∫_{l_{n,x_0}} ∫_{x_0}^{v} Ψ̂_{k,n,2}(u) du dv.

On the other hand, a simple calculation yields that Y_n^{loc}(t) − H_n^{loc}(t) = r_n ( H_n(x_0 + s_n t) − Ĥ_n(x_0 + s_n t) ) ≥ 0, so it suffices to establish tightness of {Â_n} and {B̂_n}. By Theorem 7.3, we can find M > 0 and τ ∈ S(ĝ_n) such that 0 ≤ τ − x_0 ≤ M n^{−1/(2k+1)} with large probability. Now note that

|Â_n| ≤ r_n s_n |(F̂_n(x_0) − F̂_n(τ)) − (F_n(x_0) − F_n(τ))| + r_n s_n / n
  ≤ r_n s_n |∫_{x_0}^{τ} ( f̂_n(u) − Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j ) du| + r_n s_n |∫_{x_0}^{τ} ( Σ_{j=0}^{k−1} (f_0^{(j)}(x_0)/j!)(u − x_0)^j − f_0(u) ) du| + r_n s_n |∫_{x_0}^{τ} d(F_n − F_0)| + n^{−k/(2k+1)}
  =: Â_{n,1} + Â_{n,2} + Â_{n,3} + n^{−k/(2k+1)}.

We estimate the three terms respectively:

Â_{n,1} ≤ r_n s_n |∫_{x_0}^{τ} ê_n(u) du| + r_n s_n |∫_{x_0}^{τ} f_0(x_0) binom(−r, k)(g_0'(x_0)/g_0(x_0))^k (u − x_0)^k du| = O_p(r_n s_n · s_n^{k+1}) + O_p(r_n s_n · s_n^{k+1}) = O_p(1),

by Lemma 7.7;

Â_{n,2} ≤ r_n s_n |∫_{x_0}^{τ} (f_0^{(k)}(x_0)/k!)(u − x_0)^k du| + r_n s_n |∫_{x_0}^{τ} (u − x_0)^k ε_n(u) du| = O_p(1),

since ||ε_n||_∞ →_p 0 as τ − x_0 →_p 0. For Â_{n,3}, we follow the lines of Lemma 4.1 of Balabdaoui, Rufibach and Wellner (2009) again to conclude. Fix R > 0, and consider the function class F_{x_0,R} := { 1_{[x_0, y]} : x_0 ≤ y ≤ x_0 + R }. Then F_{x_0,R}(z) := 1_{[x_0, x_0+R]}(z) is an envelope function for F_{x_0,R}, with ∫ F_{x_0,R}(z) dz = ∫_{x_0}^{x_0+R} dz = R. Now taking s = k and d = 1 in Lemma 4.1 of Balabdaoui, Rufibach and Wellner (2009), we have

Â_{n,3} = r_n s_n |∫_{x_0}^{τ} d(F_n − F_0)(z)| ≤ r_n s_n ( |τ − x_0|^{k+1} + O_p(1) n^{−(k+1)/(2k+1)} ) = O_p(1).

This completes the proof of tightness for {Â_n}; {B̂_n} follows from a similar argument, so we omit the details.

7.3. Auxiliary convex analysis.
Lemma 7.8. For any ϕ(·) ∈ G with non-empty domain and any ε > 0, define

ϕ^{(ε)}(x) := sup_{(v,c)} (v^T x + c),

where the supremum is taken over all pairs (v, c) ∈ R^d × R such that
1. ||v|| ≤ 1/ε;
2. ϕ(y) ≥ v^T y + c holds for all y ∈ R^d.
Then ϕ^{(ε)} ∈ G with Lipschitz constant 1/ε. Furthermore, ϕ^{(ε)} ↗ ϕ as ε ↘ 0, where the convergence is pointwise for all x ∈ R^d.

Lemma 7.9. Given Q ∈ Q, a point x ∈ R^d is an interior point of csupp(Q) if and only if

h(Q, x) ≡ sup{ Q(C) : C ⊂ R^d closed and convex, x ∉ int(C) } < 1.

Moreover, if {Q_n} ⊂ Q converges weakly to Q, then lim sup_{n→∞} h(Q_n, x) ≤ h(Q, x) holds for all x ∈ R^d.

Lemma 7.10. If g ∈ G, then there exist a, b > 0 such that for all x ∈ R^d,

g(x) ≥ a||x|| − b.

Proof.
The proof is essentially the same as that of Lemma 1 in Cule and Samworth (2010), so we shall omit it.

Consider the class of functions

G_M := { g ∈ G : ∫ g^β dx ≤ M }.

Lemma 7.11. For a given g ∈ G_M, denote by D_r := D(g, r) := {g ≤ r} the level set of g(·) at level r, and let ε_0 := inf g. Then for r > ε_0, we have

λ(D_r) ≤ M(−s)(r − ε_0)^d / ( (s+1) ∫_0^{r−ε_0} v^d (v + ε_0)^{1/s} dv ),

where β = 1 + 1/s and −1 < s < 0.

Proof.
For u ∈ [ε_0, r], by convexity of g(·), we have

λ(D_u) ≥ ((u − ε_0)/(r − ε_0))^d λ(D_r).

This can be seen as follows. Consider the epigraph Γ_g of g(·), where Γ_g = {(t, x) ∈ R^d × R : x ≥ g(t)}. Let x_0 ∈ R^d be a minimizer of g. Consider the convex set

C_r := conv( Γ_g ∩ (R^d × {r}), (x_0, ε_0) ) ⊂ Γ_g,

where the inclusion follows from the convexity of Γ_g as a subset of R^{d+1}. The claimed inequality follows from

λ_d(D_u) = λ_d( π_d( Γ_g ∩ (R^d × {u}) ) ) ≥ λ_d( π_d( C_r ∩ (R^d × {u}) ) ) = ((u − ε_0)/(r − ε_0))^d λ_d(D_r),

where π_d : R^d × R → R^d is the natural projection onto the first component. Now we calculate as follows:

M ≥ ∫_{D_r} ( g(x)^{1/s+1} − r^{1/s+1} ) dx
  = −(1/s + 1) ∫_{D_r} ( ∫_{ε_0}^{r} 1(u ≥ g(x)) u^{1/s} du ) dx
  = −(1/s + 1) ∫_{ε_0}^{r} u^{1/s} ( ∫_{D_r} 1(u ≥ g(x)) dx ) du
  = −(1/s + 1) ∫_{ε_0}^{r} λ(D_u) u^{1/s} du
  ≥ −(1/s + 1) ∫_{ε_0}^{r} ((u − ε_0)/(r − ε_0))^d λ(D_r) u^{1/s} du
  = λ(D_r) · (s+1) ∫_{ε_0}^{r} (u − ε_0)^d u^{1/s} du / ( (−s)(r − ε_0)^d ).

By the change of variable v = u − ε_0 in the last integral we get the desired inequality.
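The bound of Lemma 7.11 is easy to check numerically in d = 1. The sketch below uses the hypothetical choice g(x) = 1 + |x| with s = −1/4, so that β = 1 + 1/s = −3, ε_0 = inf g = 1, M = ∫_R (1 + |x|)^{−3} dx = 1, and λ(D_r) = 2(r − 1) for r > 1; none of these choices come from the paper.

```python
def lemma_bound(r, s, eps0, M, d=1, n=100000):
    # right-hand side of Lemma 7.11:
    #   M(-s)(r - eps0)^d / ((s + 1) * ∫_0^{r - eps0} v^d (v + eps0)^(1/s) dv),
    # with the integral evaluated by a midpoint Riemann sum
    a = r - eps0
    h = a / n
    integral = sum(((i + 0.5) * h) ** d * ((i + 0.5) * h + eps0) ** (1.0 / s) * h
                   for i in range(n))
    return M * (-s) * a ** d / ((s + 1) * integral)

s, eps0, M = -0.25, 1.0, 1.0   # g(x) = 1 + |x|: beta = -3, ∫ g^beta dx = 1 exactly
for r in (1.5, 2.0, 4.0):
    level_set = 2 * (r - eps0)          # λ(D_r) = λ({1 + |x| <= r}) in d = 1
    assert level_set <= lemma_bound(r, s, eps0, M)
print("level-set bound holds for the test values")
```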
Lemma 7.12. Let G be a convex set in R^d with non-empty interior, and let {y_n}_{n∈N} be a sequence with ||y_n|| → ∞ as n → ∞. Then there exist {x_1, ..., x_d} ⊂ G such that

λ_d( conv(x_1, ..., x_d, y_{n(k)}) ) → ∞, as k → ∞,

where {y_{n(k)}}_{k∈N} is a suitable subsequence of {y_n}_{n∈N}.
Proof.
Without loss of generality we assume 0 ∈ int(G). We first choose a convergent subsequence {y_{n(k)}/||y_{n(k)}||}_{k∈N} from {y_n/||y_n||}_{n∈N}. Now if we let a := lim_{k→∞} y_{n(k)}/||y_{n(k)}||, then ||a|| = 1. Since G has non-empty interior, {a^T x = 0} ∩ G has non-empty relative interior. Thus we can choose x_1, ..., x_d ∈ {a^T x = 0} ∩ G such that λ_{d−1}(K) ≡ λ_{d−1}( conv(x_1, ..., x_d) ) > 0. Note that

dist( y_{n(k)}, aff(K) ) = dist( y_{n(k)}, {a^T x = 0} ) = |⟨y_{n(k)}, a⟩| = ||y_{n(k)}|| · |⟨y_{n(k)}/||y_{n(k)}||, a⟩| → ∞, as k → ∞.

Since

λ_d( conv(x_1, ..., x_d, y_{n(k)}) ) = λ_d( conv(K, y_{n(k)}) ) = c λ_{d−1}(K) · dist( y_{n(k)}, aff(K) )

for some constant c = c(d) > 0, the proof is complete as we let k → ∞.

Lemma 7.13. Let ḡ and {g_n}_{n∈N} be functions in G such that g_n ≥ ḡ for all n ∈ N. Suppose the set C := {x ∈ R^d : lim sup_{n→∞} g_n(x) < ∞} is non-empty. Then there exist a subsequence {g_{n(k)}}_{k∈N} of {g_n}_{n∈N} and a function g ∈ G such that C ⊂ dom(g) and

(7.4)  lim_{k→∞, x→y} g_{n(k)}(x) = g(y), for all y ∈ int(dom(g)),
       lim inf_{k→∞, x→y} g_{n(k)}(x) ≥ g(y), for all y ∈ R^d.

Lemma 7.14. Let {g_n} be a sequence of non-negative convex functions satisfying the following conditions:
(A1). There exists a convex set G with non-empty interior such that for all x ∈ int(G), we have sup_{n∈N} g_n(x) < ∞.
(A2). There exists some M > 0 such that sup_{n∈N} ∫ (g_n(x))^β dx ≤ M < ∞.
Then there exist a, b > 0 such that for all x ∈ R^d and k ∈ N,

g_{n(k)}(x) ≥ a||x|| − b,

where {g_{n(k)}}_{k∈N} is a suitable subsequence of {g_n}_{n∈N}.
Without loss of generality we may assume that G is contained in int(dom(g_n)) for all n. We first note that (A1)-(A2) imply that {x̂_n ∈ Argmin_{x∈R^d} g_n(x)}_{n=1}^∞ is a bounded sequence, i.e.

(7.5)  sup_{n∈N} ||x̂_n|| < ∞.

Suppose not; then without loss of generality we may assume ||x̂_n|| → ∞ as n → ∞. By Lemma 7.12, we can choose {x_1, ..., x_d} ⊂ G such that λ_d( conv(x_1, ..., x_d, x̂_{n(k)}) ) → ∞ as k → ∞ for some subsequence {x̂_{n(k)}} ⊂ {x̂_n}. For simplicity of notation we think of {x̂_n} as such an appropriate subsequence. Denote ε_n := inf_{x∈R^d} g_n(x), and M_0 := sup_{n∈N} ε_n, which is finite since ε_n ≤ g_n(x) and sup_{n∈N} g_n(x) < ∞ for any fixed x ∈ int(G) by (A1). Again by (A1) and convexity we may assume that

sup_{x ∈ conv(x_1, ..., x_d, x̂_n)} g_n(x) ≤ M_1

holds for some M_1 > 0 and all n ∈ N. Since β < 0, this implies that

∫ g_n^β(x) dx ≥ M_1^β λ_d( conv(x_1, ..., x_d, x̂_n) ) → ∞, as n → ∞,

which contradicts (A2). This shows (7.5).

Now define g(·) to be the convex hull of g̃(x) := inf_{n∈N} g_n(x); then g ≤ g_n holds for all n ∈ N. We claim that g(x) → ∞ as ||x|| → ∞. By Lemma 7.11, for fixed η > 1, we have

λ_d( D(g_n, ηM_0) ) ≤ M(−s)(ηM_0 − ε_n)^d / ( (s+1) ∫_0^{ηM_0 − ε_n} v^d (v + ε_n)^{1/s} dv ) ≤ M(−s)(ηM_0)^d / ( (s+1) ∫_0^{(η−1)M_0} v^d (v + M_0)^{1/s} dv ) < ∞,

where D(g_n, ηM_0) := {g_n ≤ ηM_0}. Hence

(7.6)  sup_{n∈N} λ_d( D(g_n, ηM_0) ) < ∞

holds for every η > 1. Now combining (7.5) and (7.6), we claim that, for fixed η large enough, it is possible to find R = R(η) > 0 such that

(7.7)  g_n(x) ≥ ηM_0 holds for all ||x|| ≥ R(η) and n ∈ N.

If this is not true, then for every k ∈ N we can find n(k) ∈ N and x̄_k ∈ R^d with ||x̄_k|| ≥ k such that g_{n(k)}(x̄_k) ≤ ηM_0. We consider two cases to derive a contradiction.

[Case 1.] If for some n ∈ N there exist infinitely many k ∈ N with n(k) = n, then we may assume without loss of generality that we can find a sequence {x̄_k}_{k∈N} with ||x̄_k|| → ∞ as k → ∞ and g_n(x̄_k) ≤ ηM_0. Since the support of g_n has non-empty interior, by Lemma 7.12 we can find x_1, ..., x_d ∈ supp(g_n) such that λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞ as j → ∞ holds for some subsequence {x̄_{k(j)}}_{j∈N} of {x̄_k}_{k∈N}. Let M̄ := max_{1≤i≤d} g_n(x_i); then we find λ_d( D(g_n, M̄ ∨ ηM_0) ) = ∞. This contradicts (7.6).

[Case 2.] If #{k ∈ N : n = n(k)} < ∞ for all n ∈ N, then without loss of generality we may assume that for every k ∈ N we can find x̄_k ∈ R^d with ||x̄_k|| ≥ k such that g_k(x̄_k) ≤ ηM_0. Recall that by assumption (A1) the convex set G has non-empty interior and is contained in the support of g_n for all n ∈ N. Again by Lemma 7.12, we may take x_1, ..., x_d ∈ G such that λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞ as j → ∞ holds for some subsequence {x̄_{k(j)}}_{j∈N} of {x̄_k}_{k∈N}. In view of (A1), we conclude by convexity that M̄ := max_{1≤i≤d} sup_{j∈N} g_{k(j)}(x_i) < ∞. This implies

λ_d( D(g_{k(j)}, M̄ ∨ ηM_0) ) ≥ λ_d( conv(x_1, ..., x_d, x̄_{k(j)}) ) → ∞, as j → ∞,

which gives a contradiction.

Combining these two cases, we have proved (7.7). This implies that g̃(x) → ∞ as ||x|| → ∞, which verifies the claim that g(x) → ∞ as ||x|| → ∞. Hence in view of Lemma 7.10, there exist a, b > 0 such that g_n(x) ≥ g(x) ≥ a||x|| − b holds for all x ∈ R^d and n ∈ N.

Lemma 7.15. Assume x_0, ..., x_d ∈ R^d are in general position. If g(·) is a non-negative convex function with ∆ ≡ conv(x_0, ..., x_d) ⊂ dom(g) and g(x_0) = 0, then for r ≥ d we have ∫_∆ (g(x))^{−r} dx = ∞.

Proof.
We may assume without loss of generality that x_0 = 0 and x_i = e_i ∈ R^d, where e_i is the unit vector with 1 in its i-th coordinate and 0 otherwise. Then ∆ = ∆_0 := {x ∈ R^d : Σ_{i=1}^d x_i ≤ 1, x_i ≥ 0, ∀ i = 1, ..., d}. Denote a_i := g(x_i) ≥ 0. We may assume that at least one a_i > 0, since otherwise g vanishes on ∆_0 by convexity and the claim is trivial. Then by convexity of g we find g(x) ≤ Σ_{i=1}^d a_i x_i for all x ∈ ∆_0. This gives

∫_{∆_0} (g(x))^{−r} dx ≥ ∫_{∆_0} ( Σ_{i=1}^d a_i x_i )^{−r} dx ≥ (max_{i=1,...,d} a_i)^{−r} ∫_{∆_0} ||x||_1^{−r} dx ≥ (max_{i=1,...,d} a_i)^{−r} d^{−r/2} ∫_C ||x||_2^{−r} dx = ∞,

where C := {||x||_2 ≤ 1/√d} ∩ {x_i ≥ 0, i = 1, ..., d}. Note we used the facts that ||x||_1 ≤ √d ||x||_2, that consequently C ⊂ ∆_0, and that ∫_C ||x||_2^{−r} dx = ∞ for r ≥ d.

Lemma 7.16. Let f_n →_d f_0, and let D be the class of all Borel measurable convex subsets of R^d. Then

lim_{n→∞} sup_{D∈D} |∫_D (f_n − f_0) dλ| = 0.

Acknowledgements.
The authors owe thanks to Charles Doss, Roger Koenker and Richard Samworth, as well as two referees and an Associate Editor, for helpful comments, suggestions and minor corrections.
References.
Avriel, M. (1972). r-convex functions. Math. Programming.

Balabdaoui, F., Rufibach, K. and Wellner, J. A. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist.

Balabdaoui, F. and Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann. Statist.

Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika.

Bhattacharya, R. N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York.

Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields.

Borell, C. (1974). Convex measures on locally convex spaces. Ark. Mat.

Borell, C. (1975). Convex set functions in d-space. Period. Math. Hungar.

Brascamp, H. J. and Lieb, E. H. (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis.

Brunk, H. D. (1970). Estimation of isotonic regression. In Nonparametric Techniques in Statistical Inference (Proc. Sympos., Indiana Univ., Bloomington, Ind., 1969). Cambridge Univ. Press.

Cule, M. L. and Dümbgen, L. (2008). On an auxiliary function for log-density estimation. arXiv preprint arXiv:0807.4719.

Cule, M. and Samworth, R. (2010). Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat.

Cule, M., Samworth, R. and Stewart, M. (2010). Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B Stat. Methodol.

Das Gupta, S. (1976). S-unimodal function: related inequalities and statistical applications. Sankhyā Ser. B.

Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity, and Applications. Academic Press, Boston, MA.

Doss, C. and Wellner, J. A. (2016). Global rates of convergence of the MLEs of log-concave and s-concave densities. Ann. Statist., to appear.

Dudley, R. M. (2002). Real Analysis and Probability. Cambridge Univ. Press, Cambridge.

Dümbgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli.

Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011). Approximation by log-concave distributions, with applications to regression. Ann. Statist.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983). Wadsworth.

Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001). Estimation of a convex function: characterizations and asymptotic theory. Ann. Statist.

Guntuboyina, A. and Sen, B. (2015). Global risk bounds and adaptation in univariate convex regression. Probab. Theory Related Fields.

Jongbloed, G. (2000). Minimax lower bounds and moduli of continuity. Statist. Probab. Lett.

Kim, A. K. and Samworth, R. J. (2015). Global rates of convergence in log-concave density estimation. arXiv preprint arXiv:1404.2298v2.

Koenker, R. and Mizera, I. (2010). Quasi-concave density estimation. Ann. Statist.

Koenker, R. and Mizera, I. (2014). Convex optimization in R. Journal of Statistical Software.

Lang, R. (1986). A note on the measurability of convex sets. Arch. Math. (Basel).

MOSEK ApS, Denmark.

Pal, J. K., Woodroofe, M. and Meyer, M. (2007). Estimating a Polya frequency function_2. In Complex Datasets and Inverse Problems. IMS Lecture Notes Monogr. Ser.

Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser. A.

Rinott, Y. (1976). On convexity of measures. Ann. Probability.

Rockafellar, R. T. (1971). Integrals which are convex functionals. II. Pacific J. Math.

Rockafellar, R. T. (1997). Convex Analysis. Princeton Univ. Press, Princeton, NJ.

Schuhmacher, D., Hüsler, A. and Dümbgen, L. (2011). Multivariate log-concave distributions as a nearly parametric model. Stat. Risk Model.

Seijo, E. and Sen, B. (2011). Nonparametric least squares estimation of a multivariate convex regression function. Ann. Statist.

Seregin, A. and Wellner, J. A. (2010). Nonparametric estimation of multivariate convex-transformed densities. Ann. Statist.

Uhrin, B. (1984). Some remarks about the convolution of unimodal functions. Ann. Probab.

van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press, Cambridge.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.

Walther, G. (2002). Detecting the presence of mixing with multiscale maximum likelihood. J. Amer. Statist. Assoc.

Wright, F. T. (1981). The asymptotic behavior of monotone regression estimates. Ann. Statist.

Department of Statistics, Box 354322
University of Washington
Seattle, WA 98195-4322
E-mail: [email protected]

Department of Statistics, Box 354322
University of Washington
Seattle, WA 98195-4322
E-mail: [email protected]