Adaptive estimation of planar convex sets
aa r X i v : . [ m a t h . S T ] A ug Adaptive Estimation of Planar Convex Sets
Tony Cai , Adityanand Guntuboyina , and Yuting Wei August 18, 2015
Abstract
In this paper, we consider adaptive estimation of an unknown planar compact, convex setfrom noisy measurements of its support function on a uniform grid. Both the problem ofestimating the support function at a point and that of estimating the convex set are studied.Data-driven adaptive estimators are proposed and their optimality properties are established.For pointwise estimation, it is shown that the estimator optimally adapts to every compact,convex set instead of a collection of large parameter spaces as in the conventional minimax theoryin nonparametric estimation literature. For set estimation, the estimators adaptively achievethe optimal rate of convergence. In both these problems, our analysis makes no smoothnessassumptions on the unknown sets.
Keywords:
Adaptive estimation, circle convexity, convex set, minimax rate of convergence, sup-port function.
AMS 2000 Subject Classification:
Primary: 62G08; Secondary: 52A20.
We study in this paper the problem of nonparametric estimation of an unknown planar compact,convex set from noisy measurements of its support function. Before describing the details of theproblem, let us first introduce the support function. For a compact, convex set K in R , its supportfunction is defined by h K ( θ ) := max ( x ,x ) ∈ K ( x cos θ + x sin θ ) for θ ∈ R . Note that h K is a periodic function with period 2 π . It is useful to think about θ in terms ofthe direction (cos θ, sin θ ). The line x cos θ + x sin θ = h K ( θ ) is a support line for K (i.e., it Department of Statistics, The Wharton School, University of Pennsylvania. The research of Tony Cai wassupported in part by NSF Grants DMS-1208982 and DMS-1403708, and NIH Grant R01 CA127334. Department of Statistics, University of California at Berkeley. The research of Adityanand Guntuboyina wassupported by NSF Grant DMS-1309356. K and K lies on one side of it). Conversely, every support line of K is of this formfor some θ . The convex set K is completely determined by the its support function h K because K = T θ { ( x , x ) : x cos θ + x sin θ ≤ h K ( θ ) } .The support function h K possesses the circle-convexity property (see, e.g., Vitale (1979)): forevery α > α > α and 0 < α − α < π , h K ( α )sin( α − α ) + h K ( α )sin( α − α ) ≥ sin( α − α )sin( α − α ) sin( α − α ) h K ( α ) . (1)Moreover the above inequality characterizes h K , i.e., any periodic function of period 2 π satisfyingthe above inequality equals h K for a unique compact, convex subset K in R . The circle-convexityproperty (1) is clearly related to the usual convexity property. Indeed, if we replace the sine functionin (1) by the identity function (i.e., if we replace sin α by α in (1)), we obtain the condition forconvexity. In spite of this similarity, (1) is different from convexity as can be seen from the exampleof the function h ( θ ) = | sin θ | which satisfies (1) but is clearly not convex. We are now ready to describe the problem studied in this paper. Let K ∗ be an unknown compact,convex set in R . We study the problem of estimating K ∗ or h K ∗ from noisy measurements of h K ∗ .Specifically, we observe data ( θ , Y ) , . . . , ( θ n , Y n ) drawn according to the model Y i = h K ∗ ( θ i ) + ξ i for i = 1 , . . . , n (2)where θ , . . . , θ n are fixed grid points in ( − π, π ] and ξ , . . . , ξ n are i.i.d Gaussian random variableswith mean zero and known variance σ . We focus on the dual problems of estimating the scalarquantity h K ∗ ( θ i ) for each 1 ≤ i ≤ n as well as the convex set K ∗ . We propose data-driven adaptiveestimators and establish their optimality for both of these problems.The problem considered here has a range of applications in engineering. The regression model (2)was first proposed and studied by Prince and Willsky (1990) who were motivated by an applicationto Computed Tomography. Lele et al. (1992) showed how solutions to this problem can be applied totarget reconstruction from resolved laser-radar measurements in the presence of registration errors.Gregor and Rannou (2002) considered application to Projection Magnetic Resonance Imaging. Itis also a fundamental problem in the field of geometric tomography; see Gardner (2006). Anotherapplication domain where this problem might plausibly arise is robotic tactical sensing as has beensuggested by Prince and Willsky (1990). Finally this is a very natural shape constrained estimationproblem and would fit right into the recent literature on shape constrained estimation. 
See, forexample, Groeneboom and Jongbloed (2014).Most proposed procedures for estimating K ∗ in this setting are based on least squares mini-mization. The least squares estimator ˆ K ls is defined as any minimizer of P ni =1 ( Y i − h K ( θ i )) as K ranges over all compact convex sets. The minimizer in this optimization problem is not unique and2ne can always take it to be a polytope. This estimator was first proposed by Prince and Willsky(1990) who also proposed an algorithm for computing it based on quadratic programming. Furtheralgorithms for computing ˆ K ls were proposed in Gardner and Kiderlen (2009); Lele et al. (1992);Prince and Willsky (1990).The theoretical performance of the least squares estimator was first considered by Gardner et al.(2006) who mainly studied its accuracy for estimating K ∗ under the natural fixed design loss: L f ( K ∗ , ˆ K ls ) := 1 n n X i =1 (cid:16) h K ∗ ( θ i ) − h ˆ K ls ( θ i ) (cid:17) . (3)The key result of Gardner et al. (2006) (specialized to the planar case that we are studying) statesthat L f ( K ∗ , ˆ K ls ) = O ( n − / ) as n → ∞ almost surely provided K ∗ is contained in a ball of boundedradius. This result is complemented by the minimax lower bound in Guntuboyina (2011) whereit was shown that n − / is the minimax rate for this problem. These two results together implyminimax optimality of ˆ K ls under the loss function L f . No other theoretical results for this problemare available outside of those in Gardner et al. (2006) and Guntuboyina (2011).As a result, the following basic questions are still unanswered:1. For a fixed i ∈ { , . . . , n } , how does one optimally and adaptively estimate h K ∗ ( θ i )? Thisis the pointwise estimation problem. In the literature on shape constrained estimation,pointwise estimation has been the most studied problem. Several papers have been writtenon this for monotonicity constrained estimation; prominent examples being Brunk (1970);Carolan and Dykstra (1999); Cator (2011); Groeneboom (1983, 1985); Jankowski (2014);Wright (1981) and convexity constrained estimation; prominent ones being Cai and Low(2015); Groeneboom et al. (2001a,b); Hanson and Pledger (1976); Mammen (1991). For theproblem considered in this paper however, nothing is known about pointwise estimation. Itmay be noted that the result L f ( K ∗ , ˆ K ls ) = O ( n − / ) of Gardner et al. (2006) does not sayanything about the accuracy of h ˆ K ls ( θ i ) as an estimator for h K ∗ ( θ i ).2. How to construct minimax optimal estimators for the set K ∗ that also adapt to polytopes?Polytopes with a small number of extreme points have a much simpler structure than generalconvex sets. In the problem of estimating convex sets under more standard observation modelsdifferent from the one studied here, it is possible to construct estimators that converge atfaster rates for polytopes compared to the overall minimax rate (see Brunel (2014) for a nicesummary of this theory). Similar kinds of adaptation has been recently studied for shapeconstrained estimation problems based on monotonicity and convexity, see Baraud and Birg´e(2015); Chatterjee et al. (2014); Guntuboyina and Sen (2013). Based on these results, it isnatural to expect minimax estimators that adapt to polytopes in this problem. This has notbeen addressed previously. 3 .2 Our Contributions We answer both the above questions in the affirmative in the present paper. The main contributionsof this paper can be summarized in the following:1. 
We study the pointwise adaptive estimation problem in detail in the decision theoretic frame-work where the focus is on the performance at every function, instead of the maximumrisk over a large parameter space. This framework, first introduced in Cai et al. (2013) andCai and Low (2015) for shape constrained regression, provides a much more precise charac-terization of the performance of an estimator than the conventional minimax theory does.In the context of the present problem, the difficulty of estimating h K ∗ ( θ i ) at a given K ∗ and θ i can be expressed by means of a benchmark R n ( K ∗ , θ ) which is defined as follows (below E L denotes expectation taken with respect to the joint distribution of Y , . . . , Y n generatedaccording to the model (2) with K ∗ replaced by L ): R n ( K ∗ , θ ) = sup L inf ˜ h max (cid:16) E K ∗ (˜ h − h K ∗ ( θ )) , E L (˜ h − h L ( θ )) (cid:17) , (4)where the supremum above is taken over all compact, convex sets L while the infimum isover all estimators ˜ h . In our first result for pointwise estimation, we establish, for each i ∈ { , . . . , n } , a lower bound for the performance of every estimator for estimating h K ∗ ( θ i ).Specifically, it is shown that R n ( K ∗ , θ i ) ≥ c · σ k ∗ ( i ) + 1 (5)where k ∗ ( i ) is an integer for which an explicit formula can be given in terms of K ∗ and i ; and c is a universal positive constant. It will turn out that k ∗ ( i ) is related to the smoothness of h K ∗ ( θ ) at θ = θ i .We construct a data-driven estimator, ˆ h i , of h K ∗ ( θ i ) based on local smoothing together withan optimization scheme for automatically choosing a bandwidth, and show that the estimatorˆ h i satisfies E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ C · σ k ∗ ( i ) + 1 (6)for a universal positive constant C . Inequalities (5) and (6) together imply that ˆ h i is, withina universal constant factor, an optimal estimator of h K ∗ ( θ i ) for every compact, convex set K ∗ . This optimality is stronger than the traditional minimax optimality usually employed innonparametric function estimation. The quantity σ / ( k ∗ ( i ) + 1) depends on the unknown set K ∗ in a similar way that the Fisher information bound depends on the unknown parameter ina regular parametric model. In contrast, the optimal rate in the minimax paradigm is givenin terms of the worse case performance over a large parameter space and does not depend onindividual parameter values. 4. Using the optimal adaptive point estimators ˆ h , . . . , ˆ h n , we construct two set estimators ˆ K and ˆ K ′ . The details of this construction are given in Section 2.2. In Theorems 3.6 and 3.8,we prove that ˆ K is minimax optimal for K ∗ under the loss function L f while the estimatorˆ K ′ is minimax optimal under the integral squared loss function defined by L ( ˆ K ′ , K ∗ ) := Z π − π (cid:0) h ˆ K ′ ( θ ) − h K ∗ ( θ ) (cid:1) dθ. (7)Specifically, Theorem 3.6 shows that E K ∗ L f ( K ∗ , ˆ K ) ≤ C σ n + σ √ Rn ! / (8)provided K ∗ is contained in a ball of radius R . This, combined with the minimax lower boundin Guntuboyina (2011), proves the minimax optimality of ˆ K . An analogous result is shownin Theorem 3.8 for E K ∗ L ( K ∗ , ˆ K ′ ). For the pointwise estimation problem where the goal isto estimate h K ∗ ( θ i ), the optimal rate σ / ( k ∗ ( i ) + 1) can be as large as n − / . However thebound (8) shows that the globally the risk is at most n − / . 
The shape constraint given byconvexity of K ∗ ensures that the points where pointwise estimation rate is n − / cannot betoo many. Note that we make no smoothness assumptions for proving (8).3. We show that our set estimators ˆ K and ˆ K ′ adapt to polytopes with bounded number ofextreme points. Already inequality (8) implies that E K ∗ L f ( K ∗ , ˆ K ) is bounded from aboveby the parametric risk Cσ /n provided R = 0 (note that R = 0 means that K ∗ is a sin-gleton). Because σ /n is much smaller than n − / , the bound (8) shows that ˆ K adapts tosingletons. Theorem 3.7 extends this adaptation phenomenon to polytopes and we show that E K ∗ L f ( K ∗ , ˆ K ) is bounded by the parametric rate (up to a logarithmic multiplicative factorof n ) for all polytopes with bounded number of extreme points. An analogous result is alsoproved for E K ∗ L ( K ∗ , ˆ K ′ ) in Theorem 3.8. It should be noted that the construction of ourestimators ˆ K and ˆ K ′ (described in Section 2.2) does not involve any special treatment forpolytopes; yet the estimators automatically achieve faster rates for polytopes.We would like to stress two features of this paper: (a) we do not make any smoothness as-sumptions on the boundary of K ∗ throughout the paper; in particular, note that we obtain the n − / rate for the set estimators ˆ K and ˆ K ′ without any smoothness assumptions, and (b) we gobeyond the traditional minimax paradigm by considering adaptive estimation in both the pointwiseestimation problem and the problem of estimating the entire set K ∗ . The rest of the paper is structured as follows. The proposed estimators are described in detailin Section 2. The theoretical properties of the estimators are analyzed in Section 3; Section 3.15ives results for pointwise estimation while Section 3.2 deals with set estimators. In Section 4,we investigate optimal estimation of some special compact convex sets K ∗ where we explicitlycompute the associated rates of convergence. The proofs of the main results are given in Section 6and additional technical results are relegated to Appendix A. Recall the regression model (2), where we observe noisy measurements ( θ , Y ) , . . . , ( θ n , Y n ) with θ i = 2 πi/n − π, i = 1 , ..., n being fixed grid points in ( − π, π ]. In this section, we first describe indetail our estimate ˆ h i for h K ∗ ( θ i ) for each i . Subsequently, we shall describe how to put togetherthese estimates ˆ h , . . . , ˆ h n to yield set estimators for K ∗ . h K ∗ ( θ i ) for each fixed i Fix 1 ≤ i ≤ n . Our construction of the estimator ˆ h i for h K ∗ ( θ i ) is based on the key circle-convexityproperty (1) of the function h K ∗ ( · ). Let us define, for 0 < φ < π/ θ ∈ ( − π, π ], the followingtwo quantities: l ( θ, φ ) := cos φ ( h K ∗ ( θ + φ ) + h K ∗ ( θ − φ )) − h K ∗ ( θ + 2 φ ) + h K ∗ ( θ − φ )2and u ( θ, φ ) := h K ∗ ( θ + φ ) + h K ∗ ( θ − φ )2 cos φ . The following lemma states that for every θ , the quantity h K ∗ ( θ ) is sandwiched between l ( θ, φ )and u ( θ, φ ) for every φ . This will be used crucially in defining ˆ h . The proof of this lemma is astraightforward consequence of (1) and is given in Appendix A. Lemma 2.1.
For every < φ < π/ and every θ ∈ ( − π, π ] , we have l ( θ, φ ) ≤ h K ∗ ( θ ) ≤ u ( θ, φ ) . For a fixed 1 ≤ i ≤ n , Lemma 2.1 implies that l ( θ i , πjn ) ≤ h K ∗ ( θ i ) ≤ u ( θ i , πjn ) for every0 ≤ j < ⌊ n/ ⌋ . Note that when j = 0, we have l ( θ i ,
0) = h K ∗ ( θ i ) = u ( θ i , . Averaging theseinequalities for j = 0 , , . . . , k where k is a fixed integer with 0 ≤ k < ⌊ n/ ⌋ , we obtain L k ( θ i ) ≤ h K ∗ ( θ i ) ≤ U k ( θ i ) for every 0 ≤ k < ⌊ n/ ⌋ (9)where L k ( θ i ) := 1 k + 1 k X j =0 l (cid:18) θ i , πjn (cid:19) and U k ( θ i ) := 1 k + 1 k X j =0 u (cid:18) θ i , πjn (cid:19) . We are now ready to describe our estimator. Fix 1 ≤ i ≤ n . Inequality (9) says that thequantity of interest, h K ∗ ( θ i ), is sandwiched between L k ( θ i ) and U k ( θ i ) for every k . Both L k ( θ i )and U k ( θ i ) can naturally be estimated by unbiased estimators. Indeed, letˆ l ( θ i , jπ/n ) := cos(2 jπ/n )( Y i + j + Y i − j ) − Y i +2 j + Y i − j u ( θ i , jπ/n ) := Y i + j + Y i − j jπ/n )6nd take ˆ L k ( θ i ) := 1 k + 1 k X j =0 ˆ l ( θ i , jπ/n ) and ˆ U k ( θ i ) := 1 k + 1 k X j =0 ˆ u ( θ i , jπ/n ) . (10)Obviously, in order for the above to be meaningful, we need to define Y i even for i / ∈ { , . . . , n } .This is easily done in the following way: for any i ∈ Z , let s be such that i − sn ∈ { , . . . , n } andtake Y i := Y i − sn .As k increases, one averages more terms in (10) and hence the estimators ˆ L k ( θ i ) and ˆ U k ( θ i )become more accurate. Letˆ∆ k ( θ i ) := ˆ U k ( θ i ) − ˆ L k ( θ i ) = 1 k + 1 k X j =0 (cid:18) Y i +2 j + Y i − j − cos(4 jπ/n )cos(2 jπ/n ) Y i + j + Y i − j (cid:19) . (11)Because of (9), a natural strategy for estimating h K ∗ ( θ i ) is to choose k for which ˆ∆ k ( θ i ) is thesmallest and then use either ˆ L k ( θ i ) or ˆ U k ( θ i ) at that k as the estimator. This is essentially ourestimator with one small difference in that we also take into account the noise present in ˆ∆ k ( θ i ).Formally, our estimator for h K ∗ ( θ i ) is given by:ˆ h i = ˆ U ˆ k ( i ) ( θ i ) , where ˆ k ( i ) := argmin k ∈I (cid:26)(cid:16) ˆ∆ k ( θ i ) (cid:17) + + 2 σ √ k + 1 (cid:27) (12)and I := { } ∪ { j : j ≥ j ≤ ⌊ n/ ⌋} .Our estimator ˆ h i can be viewed as an angle-adjusted local averaging estimator. It is inspired bythe estimator of Cai and Low (2015) for convex regression. The number of terms averaged equalsˆ k ( i ) + 1 and this is analogous to the bandwidth in kernel-based smoothing methods. Our ˆ k ( i ) isdetermined from an optimization scheme. Notice that unlike the least squares estimator h ˆ K ls ( θ i ),the construction of ˆ h i for a fixed i does not depend on the construction of ˆ h j for j = i . K ∗ We next present estimators for the set K ∗ . The point estimators ˆ h , . . . , ˆ h n do not directly givean estimator for K ∗ because (ˆ h , . . . , ˆ h n ) is not necessarily a valid support vector i.e., (ˆ h , . . . , ˆ h n )does not always belong to the following set: H := (cid:8) ( h K ( θ ) , . . . , h K ( θ n )) : K ⊆ R is compact and convex (cid:9) . To get a valid support vector from (ˆ h , . . . , ˆ h n ), we need to project it onto H to obtain:ˆ h P := (ˆ h P , . . . , ˆ h Pn ) := argmin ( h ,...,h n ) ∈H n X i =1 (cid:16) ˆ h i − h i (cid:17) (13)The superscript P here stands for projection. An estimator for the set K ∗ can now be constructedimmediately from ˆ h P , . . . , ˆ h Pn viaˆ K := n ( x , x ) : x cos θ i + x sin θ i ≤ ˆ h Pi for all i = 1 , . . . , n o . (14)7n Theorems 3.6 and 3.7, we prove upper bounds on the accuracy of ˆ K under the loss function L f defined in (3).There is another reasonable way of constructing a set estimator for K ∗ based on the pointestimators ˆ h , . . . , ˆ h n . We first interpolate ˆ h , . . . 
, ˆ h n to define a function ˆ h ′ : ( − π, π ] → R asfollows: ˆ h ′ ( θ ) := sin( θ i +1 − θ )sin( θ i +1 − θ i ) ˆ h i + sin( θ − θ i )sin( θ i +1 − θ i ) ˆ h i +1 for θ i ≤ θ ≤ θ i +1 . (15)Here i ranges over 1 , . . . , n with the convention that θ n +1 = θ + 2 π (and θ n ≤ θ ≤ θ n +1 should beidentified with − π ≤ θ ≤ − π + 2 π/n ). Based on this function ˆ h ′ , we can define our estimator ˆ K ′ of K ∗ by ˆ K ′ := argmin K Z π − π (cid:16) ˆ h ′ ( θ ) − h K ( θ ) (cid:17) dθ. (16)The existence and uniqueness of ˆ K ′ can be justified in the usual way by the Hilbert space projectiontheorem. In Theorem 3.8, we prove bounds on the accuracy of ˆ K ′ as an estimator for K ∗ underthe integral loss L defined in (7). We investigate in this section the accuracy of the proposed point and set estimators. The proofsof these results are given in Section 6.
As mentioned in the introduction, we evaluate the performance of the point estimator ˆ h i at individ-ual functions, not the worst case over a large parameter space. This provides a much more precisecharacterization of the accuracy of the estimator. Let us first recall inequality (9) where h K ∗ ( θ i ) issandwiched between L k ( θ i ) and U k ( θ i ). Define ∆ k ( θ i ) := U k ( θ i ) − L k ( θ i ). Theorem 3.1.
Fix i ∈ { , . . . , n } . There exists a universal positive constant C such that the riskof ˆ h i as an estimator of h K ∗ ( θ i ) satisfies the following inequality: E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ C · σ k ∗ ( i ) + 1 (17) where k ∗ ( i ) := argmin k ∈I (cid:18) ∆ k ( θ i ) + 2 σ √ k + 1 (cid:19) . (18) Remark 3.1.
It turns out that the bound in (17) is linked to the level of smoothness of the function h K ∗ at θ i . However for this interpretation to be correct, one needs to regard h K ∗ as a function on R instead of a subset of R . This is further explained in Remark 4.1.8heorem 3.1 gives an explicit bound on the risk of ˆ h i in terms of the quantity k ∗ ( i ) definedin (18). It is important to keep in mind that k ∗ ( i ) depends on K ∗ even though this is suppressedin the notation. In the next theorem, we show that σ / ( k ∗ ( i ) + 1) also presents a lower boundon the accuracy of every estimator for h K ∗ ( θ i ). This implies, in particular, optimality of ˆ h i as anestimator of h K ∗ ( θ i ).One needs to be careful in formulating the lower bound result in this setting. A first attemptmight perhaps be to prove that, for a universal positive constant c ,inf ˜ h E K ∗ (cid:16) ˜ h − h K ∗ ( θ i ) (cid:17) ≥ c · σ k ∗ ( i ) + 1where the infimum is over all possible estimators ˜ h . This, of course, would not be possible becauseone can take ˜ h = h K ∗ ( θ i ) which would make the left hand side above zero. A formulation ofthe lower bound which avoids this difficulty was proposed by Cai and Low (2015) in the context ofconvex function estimation. Their idea, translated to our setting of estimating the support function h K ∗ at a point θ i , is to consider, instead of the risk at K ∗ , the maximum of the risk at K ∗ and therisk at L ∗ which is most difficult to distinguish from K ∗ in term of estimating h K ∗ ( θ i ). This leadsto the benchmark R n ( K ∗ , θ i ) defined in (4). Theorem 3.2.
For any fixed i ∈ { , . . . , n } , we have R n ( K ∗ , θ i ) ≥ c · σ k ∗ ( i ) + 1 (19) for a universal positive constant c . Theorems 3.1 and 3.2 together imply that σ / ( k ∗ ( i ) + 1) is the optimal rate of estimation of h K ∗ ( θ i ) for a given compact, convex set K ∗ . The results show that our data driven estimator ˆ h i for h K ∗ ( θ i ) performs uniformly within a constant factor of the ideal benchmark R n ( K ∗ , θ i ) for every i . This means that ˆ h i adapts to every unknown set K ∗ instead of a collection of large parameterspaces as in the conventional minimax theory commonly used in nonparametric literature.Given a specific set K ∗ and 1 ≤ i ≤ n , the quantity k ∗ ( i ) is often straightforward to computeup to constant multiplicative factors. Several examples are provided in Section 4. From theseexamples, it will be clear that the size of σ / ( k ∗ ( i ) + 1) is linked to the level of smoothness of thefunction h K ∗ at θ i . However for this interpretation to be correct, one needs to regard h K ∗ as afunction on R instead of a subset of R . This is explained in Remark 4.1.The following corollaries shed more light on the quantity σ / ( k ∗ ( i )+1). The first corollary belowshows that σ / ( k ∗ ( i ) + 1) is at most C ( σ R/n ) − / for every i and K ∗ ( C is a universal constant).This implies, in particular, the consistency of ˆ h i as an estimator for h K ∗ ( θ i ) for every i and K ∗ . InExample 4.3, we provide an explicit choice of i and K ∗ for which σ / ( k ∗ ( i ) + 1) ≥ c ( σ R/n ) − / ( c is a universal constant). This implies that the conclusion of the following corollary cannot ingeneral be improved. 9 orollary 3.3. Suppose K ∗ is contained in some closed ball of radius R . Then for every i ∈{ , . . . , n } , we have σ k ∗ ( i ) + 1 ≤ C (cid:18) σ Rn (cid:19) / (20) and E (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ C (cid:18) σ Rn (cid:19) / . (21) for a universal positive constant C . It is clear from the definition (18) that k ∗ ( i ) ≤ n for all i and K ∗ . In the next corollary, weprove that there exist sets K ∗ and i for which k ∗ ( i ) ≥ cn for a constant c . For these sets, theoptimal rate of estimating h K ∗ ( θ i ) is therefore parametric.For a fixed i and K ∗ , let φ ( i ) and φ ( i ) be such that φ ( i ) ≤ θ i ≤ φ ( i ) and such that thereexists a single point ( x , x ) ∈ K ∗ with h K ∗ ( θ ) = x cos θ + x sin θ for all θ ∈ [ φ ( i ) , φ ( i )] . (22)The following corollary says that if the distance of θ i to its nearest end-point in the interval[ φ ( i ) , φ ( i )] is large (i.e., of constant order), then the optimal rate of estimation of h K ∗ ( θ i ) isparametric. This situation happens usually for polytopes (polytopes are compact, convex sets withfinitely many vertices); see Examples 4.1 and 4.3 for specific instances of this phenomenon. Fornon-polytopes, it can often happen that φ ( i ) = φ ( i ) = θ i in which case the conclusion of the nextcorollary is not useful. Corollary 3.4.
For every i ∈ { , . . . , n } , we have k ∗ ( i ) ≥ c n min ( θ i − φ ( i ) , φ ( i ) − θ i , π ) (23) for a universal positive constant c . Consequently E (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ Cσ n min( θ i − φ ( i ) , φ ( i ) − θ i , π ) (24) for a universal positive constant C. From the above two corollaries, it is clear that the optimal rate of estimation of h K ∗ ( θ i ) can beas large as n − / and as small as the parametric rate n − . The rate n − / is achieved, for example,in the situation demonstrated in Example 4.3 while the parametric rate is achieved, for example,for polytopes.The next corollary argues that in order to bound k ∗ ( i ) in specific examples, one only needs tobound the quantity ∆ k ( θ i ) from above and below. This corollary will be very useful in Section 4while working out k ∗ ( i ) in specific examples. 10 orollary 3.5. Fix ≤ i ≤ n . Let { f k ( θ i ) , k ∈ I} and { g k ( θ i ) , k ∈ I} be two sequences whichsatisfy g k ( θ i ) ≤ ∆ k ( θ i ) ≤ f k ( θ i ) for all k ∈ I . Also let ˘ k ( i ) := max ( k ∈ I : f k ( θ i ) < ( √ − σ √ k + 1 ) (25) and ˜ k ( i ) := min ( k ∈ I : g k ( θ i ) > √ − σ √ k + 1 ) (26) as long as there is some k ∈ I for which g k ( θ i ) > √ − σ/ √ k + 1 ; otherwise take ˜ k ( i ) :=max k ∈I k . We then have ˘ k ( i ) ≤ k ∗ ( i ) ≤ ˜ k ( i ) and E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ C σ ˘ k ( i ) + 1 (27) for a universal positive constant C . We now turn to study the accuracy of the set estimators ˆ K (defined in (14)) and ˆ K ′ (defined in(16)). The accuracy of ˆ K will be investigated under the loss function L f (defined in (3)) while theaccuracy of ˆ K ′ will be studied under the loss function L (defined in (7)).In Theorem 3.6 below, we prove that E K ∗ L f ( K ∗ , ˆ K ) is bounded from above by a constantmultiple of n − / as long as K ∗ is contained in a ball of radius R . The discussions following thetheorem shed more light on its implications. Theorem 3.6. If K ∗ is contained in some closed ball of radius R ≥ , we have E K ∗ L f (cid:16) K ∗ , ˆ K (cid:17) ≤ C σ n + σ √ Rn ! / (28) for a universal positive constant C . Note here that R = 0 is allowed (in which case K ∗ is asingleton). Note that as long as
R >
0, the right hand side in (28) will be dominated by the ( σ √ R/n ) − / term for all large n . This would mean thatsup K ∗ ∈K ( R ) E K ∗ L f ( K ∗ , ˆ K ) ≤ C σ √ Rn ! / (29)where K ( R ) denotes the set of all compact convex sets contained in some fixed closed ball of radius R . 11he minimax rate of estimation over the class K ( R ) was studied in Guntuboyina (2011). InGuntuboyina (2011, Theorems 3.1 and 3.2), it was proved thatinf ˜ K sup K ∗ ∈K ( R ) E K ∗ L f ( K ∗ , ˆ K ) ≍ σ √ Rn ! / (30)where ≍ denotes equality upto constant multiplicative factors. From (29) and (30), it follows thatˆ K is a minimax optimal estimator of K ∗ . We should mention here that an inequality of the form(29) was proved for the least squares estimator ˆ K ls by Gardner et al. (2006) which implies that ˆ K ls is also a minimax optimal estimator of K ∗ .The n − / minimax rate here is quite natural in connection with estimation of smooth functions.Indeed, this is the minimax rate of estimation of twice smooth one-dimensional functions. Althoughwe have not made any smoothness assumptions here, we are working under a convexity-basedconstraint and convexity is associated, in a broad sense, with twice smoothness (see, for example,Alexandrov (1939)). Remark 3.2.
Because of the formula (3) for the loss function L f , the risk E K ∗ L f ( K ∗ , ˆ K ) can beseen as the average of the risk of ˆ K for estimating h K ∗ ( θ i ) over i = 1 , . . . , n . We have seen inSection 3.1 that the optimal rate of estimating h K ∗ ( θ i ) can be as high as n − / . Theorem 3.6, onthe other hand, can be interpreted as saying that, on average over i = 1 , . . . , n , the optimal rateof estimating h K ∗ ( θ i ) is at most n − / . Indeed, the key to proving Theorem 3.6 is to establish thefollowing inequality: σ n n X i =1 k ∗ ( i ) + 1 ≤ C σ n + σ √ Rn ! / . under the assumption that K ∗ is contained in a ball of radius R . Therefore, even though each term σ / ( k ∗ ( i ) + 1) can be as large as n − / , on average, their size is at most n − / . Remark 3.3.
Theorem 3.6 provides different qualitative conclusions when K ∗ is a singleton. Inthis case, one can take R = 0 in (28) to get the parametric bound Cσ /n for E K ∗ L f ( K ∗ , ˆ K ).Because this is smaller than the nonparametric n − / rate, it means that ˆ K adapts to singletons.Singletons are simple examples of polytopes and one naturally wonders here if ˆ K also adapts toother polytopes as well. This is however not implied by inequality (28) which gives the rate n − / for every K ∗ that is not a singleton. It turns out that ˆ K indeed adapts to other polytopes and weprove this in the next theorem. In fact, we prove that ˆ K adapts to any K ∗ that is well-approximatedby a polytope with not too many vertices. It is currently not known if the least squares estimatorˆ K ls has such adaptive estimation properties.In the next theorem, we prove another bound for E K ∗ L f ( K ∗ , ˆ K ). This bound demonstratesadaptive estimation properties of ˆ K as described in the previous remark. Before stating the the-orem, we need some notation. Recall that polytopes are compact, convex sets with finitely manyextreme points (or vertices). The space of all polytopes in R n will be denoted by P . For a polytope12 ∈ P , we denote by v P , the number of extreme points of P . Also recall the notion of Hausdorffdistance between two compact, convex sets K and L defined by ℓ H ( K, L ) := sup θ ∈ R | h K ( θ ) − h L ( θ ) | . (31)This is not the usual way of defining the Hausdorff distance. For an explanation of the connectionbetween this and the usual definition, see, for example, Schneider (1993, Theorem 1.8.11). Theorem 3.7.
There exists a universal positive constant C such that E K ∗ L f ( K ∗ , ˆ K ) ≤ C inf P ∈P (cid:20) σ v P n log (cid:18) env P (cid:19) + ℓ H ( K ∗ , P ) (cid:21) . (32) Remark 3.4 (Near-parametric rates for polytopes) . The bound (32) implies that ˆ h has the para-metric rate (upto a logarithmic factor of n ) for estimating polytopes. Indeed, suppose that K ∗ isa polytope with v vertices. Then using P = K ∗ in the infimum in (32), we have the risk bound E K ∗ L f ( K ∗ , ˆ K ) ≤ Cσ vn log (cid:16) env (cid:17) . (33)This is the parametric rate σ v/n up to logarithmic factors and is smaller than the nonparametricrate n − / given in (28). Remark 3.5.
When v = 1, inequality (33) has a redundant logarithmic factor. Indeed, when v = 1, we can use (28) with R = 0 which gives (33) without the additional logarithmic factor. Wedo not know if the logarithmic factor in (33) can be removed for values of v larger than one as well.We now turn to our second set estimator ˆ K ′ . For this estimator, the next theorem provides anupper bound on its accuracy under the integral loss function L (defined in (7)). Qualitatively, thebounds on E K ∗ L ( K ∗ , ˆ K ′ ) given in the next theorem are similar to the bounds on E K ∗ L f ( K ∗ , ˆ K )proved in Theorems 3.6 and 3.7. Theorem 3.8.
Suppose K ∗ is contained in some closed ball of radius R ≥ . The risk E K ∗ L ( K ∗ , ˆ K ′ ) satisfies both the following inequalities: E K ∗ L ( K ∗ , ˆ K ′ ) ≤ C σ n + σ √ Rn ! / + R n (34) and E K ∗ L ( K ∗ , ˆ K ′ ) ≤ C inf P ∈P (cid:20) σ v P n log (cid:18) env P (cid:19) + ℓ H ( K ∗ , P ) + R n (cid:21) . (35)The only difference between the inequalities (34) and (35) on one hand and (28) and (32) onthe other is the presence of the R /n term. This term is usually very small and does not changethe qualitative behavior of the bounds. However note that inequality (32) did not require anyassumption on K ∗ being in a ball of radius R while this assumption is necessary for (35).13 emark 3.6. The rate ( σ √ R/n ) / is the minimax rate for this problem under the loss function L . Although this has not been proved explicitly anywhere, it can be shown by modifying the proofof Guntuboyina (2011, Theorem 3.2) appropriately. Theorem 3.8 therefore shows that ˆ K ′ is aminimax optimal estimator of K ∗ under the loss function L . We now investigate the conclusions of the theorems of the previous section for specific choices of K ∗ . For calculations in the following examples, it will be useful here to note that the quantity∆ k ( θ i ) = U k ( θ i ) − L k ( θ i ) has the following alternative expression:1 k + 1 k X j =0 (cid:18) h K ∗ ( θ i + 4 jπ/n ) + h K ∗ ( θ i − jπ/n )2 − cos(4 jπ/n )cos(2 jπ/n ) h K ∗ ( θ i + 2 jπ/n ) + h K ∗ ( θ i − jπ/n )2 (cid:19) . (36) Example 4.1 (Single point) . Suppose K ∗ := { ( x , x ) } for a fixed point ( x , x ) ∈ R . In this case h K ∗ ( θ ) = x cos θ + x sin θ for all θ. (37)It can then be directly checked from (36) that ∆ k ( θ i ) = 0 for every k ∈ I and i ∈ { , . . . , n } . As aresult, it follows that k ∗ ( i ) = max k ∈I k ≥ cn for a positive constant c .Theorem 3.1 then says that the point estimator ˆ h i satisfies E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ Cσ n (38)for a universal positive constant C . One therefore gets the parametric rate here.Also, Theorem 3.6 and inequality (34) in Theorem 3.8 can both be used here with R = 0. Thisimplies that the set estimators ˆ K and ˆ K ′ both converge to K ∗ at the parametric rate under theloss functions L f and L respectively. Example 4.2 (Ball) . Suppose K ∗ is a ball centered at ( x , x ) with radius R >
0. It is then easyto verify that h K ∗ ( θ ) = x cos θ + x sin θ + R for all θ. (39)As a result, for every k ∈ I and i ∈ { , . . . , n } , we have∆ k ( θ i ) = Rk + 1 k X j =0 − cos πjn cos πjn ! ≤ R (cid:18) − cos 4 πk/n cos 2 πk/n (cid:19) = R (1 + 2 cos 2 πk/n )cos 2 πk/n (1 − cos 2 πk/n ) . (40)Because k ≤ n/
16 for all k ∈ I , it is easy to verify that ∆ k ( θ i ) ≤ R sin ( πk/n ) ≤ Rπ k /n .Taking f k ( θ i ) = 8 Rπ k /n in Corollary 3.5, we obtain that k ∗ ( i ) ≥ c ( nσ /R ) / for a constant c .14lso since the function 1 − cos(2 x ) / cos( x ) is a strongly convex function on [ − π/ , π/
4] with secondderivative lower bounded by 3, we have∆ k ( θ i ) = Rk + 1 k X j =0 − cos πjn cos πjn ! ≥ Rk + 1 k X j =0 (cid:18) πjn (cid:19) = Rπ k (2 k + 1) n . This gives k ∗ ( i ) ≤ C ( nσ /R ) / as well for a constant C . We thus have k ∗ ( i ) ≍ ( nσ /R ) / forevery i . Theorem 3.1 then gives E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ C σ √ Rn ! / for every i ∈ { , . . . , n } . (41)Theorem 3.6 and inequality (34) prove that the set estimators ˆ K and ˆ K ′ also converge to K ∗ atthe n − / rate.In the preceding examples, we saw that the optimal rate σ / ( k ∗ ( i ) + 1) for estimating h K ∗ ( θ i )did not depend on i . Next, we consider asymmetric examples where the rate changes with i . Example 4.3 (Segment) . Suppose K ∗ is the vertical line segment joining the two points (0 , R )and (0 , − R ) for a fixed R >
0. One then gets h K ∗ ( θ ) = R | sin θ | for all θ . For simplicity, assumethat n is even and consider i = n/ θ n/ = 0. It can then be verified that∆ k ( θ n/ ) = ∆ k (0) = Rk + 1 k X j =0 tan 2 πjn for every k ∈ I . Because j tan(2 πj/n ) is increasing, we get3 πRk n ≤ Rk + 1 ( 3 k πk/ n ) ≤ ∆ k (0) ≤ R tan(2 πk/n ) ≤ R sin(2 πk/n ) ≤ πRkn . Corollary 3.5 then gives σ k ∗ ( n/
2) + 1 ≍ (cid:18) σ Rn (cid:19) / . (42)It was shown in Corollary 3.3 that the right hand side above represents the maximum possiblevalue of σ / ( k ∗ ( i ) + 1) when K ∗ lies in a closed ball of radius R . Therefore this example presentsthe situation where estimation of h K ∗ ( θ i ) is the most difficult. See Remark 4.1 for the connectionto smoothness of h K ∗ ( · ) at θ i .Now suppose that i = 3 n/ n/ θ i = π/ h K ∗ ( θ ) = R sin θ (without the modulus) for θ = θ i ± jπ/n for every 0 ≤ j ≤ k, k ∈ I . Using (36), we have ∆ k ( θ i ) = 0 for every k ∈ I . This immediately gives k ∗ ( i ) = ⌊ n/ ⌋ and hence σ k ∗ (3 n/
4) + 1 ≍ σ n . (43)15n this example, the risk for estimating h K ∗ ( θ i ) changes with i . For i = n/
2, we get the n − / ratewhile for i = 3 n/
4, we get the parametric rate. For other values of i , one gets a range of ratesbetween n − / and n − .Because K ∗ is a polytope with 2 vertices, Theorem 3.7 and inequality (35) imply that the setestimators ˆ K and ˆ K ′ converge at the near parametric rate σ log n/n . It is interesting to note herethat even though for some θ i , the optimal rate of estimation of h K ∗ ( θ i ) is n − / , the entire set canbe estimated at the near parametric rate. Example 4.4 (Half-ball) . Suppose K ∗ := { ( x , x ) : x + x ≤ , x ≤ } . One then has h K ( θ ) = 1for − π ≤ θ ≤ h K ( θ ) = | cos θ | for 0 < θ ≤ π . Assume n is even and take i = n/ θ i = 0. Then∆ k (0) = 1 k + 1 k X j =0 (cid:18) cos 4 πj/n + 12 − cos 4 πj/n cos 2 πj/n cos 2 πj/n + 12 (cid:19) = 12( k + 1) k X j =0 (cid:18) − cos 4 πj/n cos 2 πj/n (cid:19) . This is exactly as in (40) with R = 1 and an additional factor of 1 /
2. Arguing as in Example 4.2,we obtain that σ k ∗ ( n/
2) + 1 ≍ (cid:18) σ n (cid:19) / . Now take i = 3 n/ n/ θ i = π/
2. Observe then that h K ∗ ( θ ) = | cos θ | for θ = θ i ± jπ/n for every 0 ≤ j ≤ k, k ∈ I . The situation is therefore similar to (42) and weobtain σ k ∗ (3 n/
4) + 1 ≍ (cid:18) σ n (cid:19) / . Similar to the previous example, the risk for estimating h K ∗ ( θ i ) changes with i and varies from n − / to n − / . On the other hand, Theorem 3.6 states that the set estimator ˆ K still estimates K ∗ at the rate n − / . Remark 4.1 (Connection between risk and smoothness) . The reader may observe that the supportfunctions (37) and (39) in the two examples above differ only by the constant R . It might thenseem strange that only the addition of a non-zero constant changes the risk of estimating h K ∗ ( θ i )from n − to n − / . It turns out that the function (37) is much more smoother than the function(39); the right way to view smoothness of h K ∗ ( · ) is to regard it as a function on R . This is donein the following way. Define, for each z = ( z , z ) ∈ R , h K ∗ ( z ) = max ( x ,x ) ∈ K ∗ ( x z + x z ) . When z = (cos θ, sin θ ) for some θ ∈ R , this definition coincides with our definition of h K ∗ ( θ ). Astandard result (see for example Corollary 1.7.3 and Theorem 1.7.4 in Schneider (1993)) states thatthe subdifferential of z h K ∗ ( z ) exists at every z = ( z , z ) ∈ R and is given by F ( K ∗ , z ) := { ( x , x ) ∈ K ∗ : h K ∗ ( z ) = x z + x z } .
16n particular, z h K ∗ ( z ) is differentiable at z if and only if F ( K ∗ , z ) is a singleton.This point of view of studying h K ∗ as a function on R sheds qualitative light on the riskbounds obtained in the examples. In the case of Example 4.1 when K ∗ = { ( x , x ) } , it is clear that F ( K ∗ , z ) = { ( x , x ) } for all z . Because this set does not change with z , this provides the case ofmaximum smoothness (because the derivative is constant) and thus we get the n − rate.In Example 4.2 when K ∗ is a ball centered at x = ( x , x ) with radius R , it can be checkedthat F ( K ∗ , z ) = { x + Rz/ k z k} for every z = 0. Since F ( K ∗ , z ) is a singleton for each z = 0, itfollows that z h K ∗ ( z ) is differentiable for every z . For R = 0, the set F ( K ∗ , z ) changes with z and thus here h K ∗ is not as smooth as in Example 4.1. This explains the slower rate in Example4.2 compared to 4.1.Finally in Example 4.3, when K ∗ is the vertical segment joining (0 , R ) and (0 , − R ), it is easyto see that F ( K ∗ , z ) = K ∗ when z = (1 , F ( K ∗ , z ) is not a singleton which implies that h K ∗ ( z ) is non-differentiable at z = (1 , n − / for estimating h K ∗ ( θ n/ ) in Example 4.3. In this paper we study the problems of estimating both the support function at a point, h K ∗ ( θ i ),and the convex set K ∗ . Data-driven adaptive estimators are constructed and their optimalityis established. For pointwise estimation, the quantity k ∗ ( i ), which appears in both the upperbound (17) and the lower bound (19), is related to the smoothness of h K ∗ ( θ ) at θ = θ i . Theconstruction of ˆ h i is based on local smoothing together with an optimization scheme for choosingthe bandwidth. Smoothing methods for estimating the support function have previously beenstudied by Fisher et al. (1997). Specifically, working under certain smoothness assumptions on thetrue support function h K ∗ ( θ ), Fisher et al. (1997) estimated it using periodic versions of standardnonparametric regression techniques such as local regression, kernel smoothing and splines. Theyevade the problem of bandwidth selection however by assuming that the true support function issufficiently smooth. Our estimator comes with a scheme for choosing the bandwidth automaticallyfrom the data and hence we do not need any smoothness assumptions on the true convex set.To avoid complications, we have assumed throughout the paper that the noise level σ is known.In practice, σ is typically unknown and needs to be estimated. Under the setting of the presentpaper, σ is easily estimable by using the median of the consecutive differences. Let δ i = Y i − Y i − , i = 1 , . . . , ⌊ n ⌋ . A simple robust estimator of the noise level σ is the following medianabsolute deviation (MAD) estimator:ˆ σ = median | δ i − median( δ i ) | . . It was noted that the construction of our estimators ˆ K and ˆ K ′ given in Section 2.2 does not17nvolve any special treatment for polytopes; yet we obtain faster rates for polytopes. Such automaticadaptation to polytopes has been observed in other contexts: isotonic regression where one getsautomatic adaptation for piecewise constant monotone functions (see Chatterjee et al. 
(2014)) andconvex regression where one gets automatic adaptation for piecewise affine convex functions (seeGuntuboyina and Sen (2013)).Finally, we note that because σ / ( k ∗ ( i ) + 1) gives the optimal rate in pointwise estimation, itcan potentially be used as a benchmark to evaluate other estimators for h K ∗ ( θ i ) such as the leastsquares estimator h ˆ K ls ( θ i ). This however is beyond the scope of the current paper. We prove the main results in this section. Additional technical results and proofs are given inAppendix A.
We provide the proof of Theorem 3.1 here. The proof uses three simple lemmas: Lemma A.1, A.2and A.3 which are stated and proved in Appendix A.Fix i = 1 , . . . , n . Because ˆ h i = ˆ U ˆ k ( i ) ( θ i ), we write (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) = X k ∈I (cid:16) ˆ U k ( θ i ) − h K ∗ ( θ i ) (cid:17) I n ˆ k ( i ) = k o where I ( · ) denotes the indicator function. Taking expectations on both sides and using Cauchy-Schwartz inequality, we obtain E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ X k ∈I q E ( ˆ U k ( θ i ) − h K ∗ ( θ i )) r P K ∗ n ˆ k ( i ) = k o . The random variable ˆ U k − h K ∗ (0) is normally distributed and we know that E Z ≤ E Z ) forevery gaussian random variable Z . We therefore have E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ √ X k ∈I E ( ˆ U k ( θ i ) − h K ∗ ( θ i )) r P K ∗ n ˆ k ( i ) = k o . Because E K ∗ ˆ U k ( θ i ) = U k ( θ i ) (defined in (9)), we have E K ∗ ( ˆ U k ( θ i ) − h K ∗ ( θ i )) = ( U k ( θ i ) − h K ∗ ( θ i )) + var( ˆ U k ( θ i )) . Because L k ( θ i ) ≤ h K ∗ ( θ i ) ≤ U k ( θ i ), it is clear that U k ( θ i ) − h K ∗ ( θ i ) ≤ U k ( θ ) − L k ( θ i ) = ∆ k ( θ i ).Also, Lemma A.3 states that the variance of ˆ U k is at most σ / ( k + 1). Putting these together, weobtain E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ √ X k ∈I (cid:18) ∆ k ( θ i ) + σ k + 1 (cid:19) r P K ∗ n ˆ k ( i ) = k o . X k ∈I (cid:18) ∆ k ( θ i ) + σ k + 1 (cid:19) r P K ∗ n ˆ k ( i ) = k o ≤ C σ k ∗ ( i ) + 1 (44)for a universal positive constant C .Below, we write ∆ k , ˆ k and k ∗ for ∆ k ( θ i ) , ˆ k ( i ) and k ∗ ( i ) respectively for ease of notation. Wealso write P for P K ∗ .We prove (44) by considering the two cases: k ≤ k ∗ , k ∈ I and k > k ∗ , k ∈ I separately.The first case is k ≤ k ∗ , k ∈ I . By Lemma A.1 and (88), we get∆ k ≤ ∆ k ∗ ≤ √ − σ √ k ∗ + 1 ≤ √ − σ √ k + 1and consequently∆ k + σ k + 1 ≤ σ k + 1 (cid:16) √ − + 1 (cid:17) for all k ≤ k ∗ , k ∈ I . (45)We bound P { ˆ k = k } by writing P { ˆ k = k } ≤ P (cid:26)(cid:16) ˆ∆ k (cid:17) + + 2 σ √ k + 1 ≤ (cid:16) ˆ∆ k ∗ (cid:17) + + 2 σ √ k ∗ + 1 (cid:27) ≤ P (cid:26)(cid:16) ˆ∆ k ∗ (cid:17) + ≥ σ √ k + 1 − σ √ k ∗ + 1 (cid:27) . Because k ≤ k ∗ , the positive part above can be dropped and we obtain P { ˆ k = k } ≤ P (cid:26) ˆ∆ k ∗ ≥ σ √ k + 1 − σ √ k ∗ + 1 (cid:27) . Because ˆ∆ k ∗ is normally distributed with mean ∆ k ∗ , we have P { ˆ k = k } ≤ P Z ≥ σ ( k + 1) − / − σ ( k ∗ + 1) − / − ∆ k ∗ q var( ˆ∆ k ∗ ) , where Z is a standard normal random variable. From (88), we have2 σ √ k + 1 − σ √ k ∗ + 1 − ∆ k ∗ ≥ σ √ k + 1 − r k + 1 k ∗ + 1 (cid:16) √ − (cid:17)! . As a result, P { ˆ k = k } ≤ P Z ≥ σ q ( k + 1)var( ˆ∆ k ∗ ) − r k + 1 k ∗ + 1 (cid:16) √ − (cid:17)! . Suppose ˜ k := ( k ∗ + 1) (cid:16) √ − (cid:17) − − . k < ˜ k , we use the bound given by Lemma A.3 on the variance of ˆ∆ k ∗ to obtain P { ˆ k = k } ≤ P ( Z ≥ r k ∗ + 1 k + 1 − √ !) ≤ exp − "r k ∗ + 1 k + 1 − √ . Using this and (45), we see that the quantity X k< ˜ k,k ∈I (cid:18) ∆ k + σ k + 1 (cid:19) q P { ˆ k = k } is bounded from above by σ k ∗ + 1 (cid:16) √ − + 1 (cid:17) X k< ˜ k,k ∈I k ∗ + 1 k + 1 exp − "r k ∗ + 1 k + 1 − √ . Because I consists of integers of the form 2 j , it follows that for any two successive integers k and k in I , we have 3 / ≤ ( k + 1) / ( k + 1) ≤
2. Using this, it is easily seen that X k< ˜ k,k ∈I k ∗ + 1 k + 1 exp − "r k ∗ + 1 k + 1 − √ is bounded from above by X j ≥ j exp (cid:18) − h (3 / j/ − √ i (cid:19) + X ≤ j ≤ j , which is just a universal positive constant. We have proved therefore that X k< ˜ k,k ∈I (cid:18) ∆ k + σ k + 1 (cid:19) q P { ˆ k = k } ≤ C σ k ∗ + 1 , (46)for a positive constant C .For ˜ k ≤ k ≤ k ∗ , we simply use (45) along with the trivial bound P { ˆ k = k } ≤ X ˜ k ≤ k ≤ k ∗ ,k ∈I (cid:18) ∆ k + σ k + 1 (cid:19) q P { ˆ k = k } ≤ (cid:16) √ − + 1 (cid:17) σ k ∗ + 1 X ˜ k ≤ k
14 ( h K ∗ ( θ i ) − h L ∗ ( θ i )) (1 − k P K ∗ − P L ∗ k T V ) . (53)Here P L ∗ is the product of the Gaussian probability measures with mean h L ∗ ( θ i ) and variance σ for i = 1 , . . . , n . Also k P − Q k T V denotes the total variation distance between P and Q .For ease of notation, we assume, without loss of generality, that θ i = 0. We also write ∆ k for∆ k ( θ i ) and k ∗ for k ∗ ( i ).Suppose first that K ∗ satisfies the following condition: There exists some α ∈ (0 , π/
4) such that h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) > σ √ n α (54)where n α denotes the number of integers i for which − α < iπ/n < α . This condition will not besatisfied, for example, when K ∗ is a singleton. We shall handle such K ∗ later. Observe that n α ≥ < α < π/ i = 0.Let us define, for each α ∈ (0 , π/ a ∗ K ( α ) := (cid:18) h K ∗ ( α ) + h K ∗ ( − α )2 cos α , h K ∗ ( α ) − h K ∗ ( − α )2 sin α (cid:19) . and let L ∗ = L ∗ ( α ) be defined as the smallest convex set that contains both K ∗ and the point a K ∗ ( α ). In other words, L ∗ is the convex hull of K ∗ ∪ { a K ∗ ( α ) } .We now use Le Cam’s inequality (53). To control the total variation distance in the right handside of (53), we use Pinsker’s inequality: || P K ∗ − P L ∗ || T V ≤ r D ( P K ∗ || P L ∗ ) , and the fact that (note that θ i = 2 πi/n − π ) D ( P K ∗ || P L ∗ ) = 12 σ n X i =1 ( h K ∗ (2 iπ/n − π ) − h L ∗ (2 iπ/n − π )) . L ∗ is easily seen to be the maximum of the support functions of K ∗ andthe singleton { a K ∗ ( α ) } . Therefore, h L ∗ ( θ ) := max (cid:18) h K ∗ ( θ ) , h K ∗ ( α ) + h K ∗ ( − α )2 cos α cos θ + h K ∗ ( α ) − h K ∗ ( − α )2 sin α sin θ (cid:19) = max (cid:18) h K ∗ ( θ ) , sin( θ + α )sin 2 α h K ∗ ( α ) + sin( α − θ )sin 2 α h K ∗ ( − α ) (cid:19) . Using (1), it can be shown that h K ∗ ( θ ) ≤ sin( θ + α )sin 2 α h K ∗ ( α ) + sin( α − θ )sin 2 α h K ∗ ( − α ) for − α < θ < α, (55)and h K ∗ ( θ ) ≥ sin( θ + α )sin 2 α h K ∗ ( α ) + sin( α − θ )sin 2 α h K ∗ ( − α ) for θ ∈ [ − π, − α ] ∪ [ α, π ] . (56)To see this, assume that θ > θ ∈ [0 , α ] and θ ∈ [ α, π ]. In the first case, apply (1) with α = α, α = θ and α = − α to get (55).In the second case, apply (1) with α = θ, α = α and α = − α to get (56).As a result of (55) and (56), we get that h L ∗ ( θ ) = sin( θ + α )sin 2 α h K ∗ ( α ) + sin( α − θ )sin 2 α h K ∗ ( − α ) for − α < θ < α, and that h L ∗ ( θ ) equals h K ∗ ( θ ) for every other θ in ( − π, π ].We now give an upper bound on h L ∗ ( θ ) − h K ∗ ( θ ) for 0 ≤ θ < α . Using (1) with α = θ, α = 0and α = − α , we obtain h K ∗ ( θ ) ≥ sin( α + θ )sin α h K ∗ (0) − sin θ sin α h K ∗ ( − α ) . Thus for 0 ≤ θ < α , we obtain the inequality0 ≤ h L ∗ ( θ ) − h K ∗ ( θ ) = sin( θ + α )sin 2 α h K ∗ ( α ) + sin( α − θ )sin 2 α h K ∗ ( − α ) − h K ∗ ( θ ) ≤ sin( θ + α )sin α (cid:18) h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) (cid:19) . Because 0 < α < π/ , ≤ θ ≤ α , we use the fact that the sine function is increasing on (0 , π/
2) todeduce that0 ≤ h L ∗ ( θ ) − h K ∗ ( θ ) ≤ h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) for all 0 ≤ θ < α. One can similarly deduce the same inequality for the case − α < θ ≤ h L ∗ ( θ ) equals h K ∗ ( θ ) for all θ in ( − π, π ] that are not in theinterval ( − α, α ), we obtain D ( P K ∗ || P L ∗ ) = 12 σ n X i =1 ( h K ∗ (2 iπ/n − π ) − h L ∗ (2 iπ/n − π )) ≤ n α σ (cid:18) h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) (cid:19) . h L ∗ (0) = ( h K ∗ ( α ) + h K ∗ ( − α )) / (2 cos α ), we obtain, by (53), that r ≥ (cid:18) h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) (cid:19) (cid:18) − r n α σ (cid:18) h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) (cid:19)(cid:19) (57)for every 0 < α < π/ r := inf ˜ h max (cid:20) E K ∗ (cid:16) ˜ h − h K ∗ ( θ i ) (cid:17) , E L ∗ (cid:16) ˜ h − h L ∗ ( θ i ) (cid:17) (cid:21) (58)where the infimum above is over all estimators ˜ h . Let us now define α ∗ by α ∗ := inf (cid:26) < α < π/ h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) > σ √ n α (cid:27) . Note first that α ∗ > n α ≥ α and thus for α very small while the quantity( h K ∗ ( α ) + h K ∗ ( − α )) / (2 cos α ) − h K ∗ (0) becomes close to 0 for small α (by continuity of h K ∗ ( · )).Also because we have assumed (54), it follows that 0 < α ∗ < π/
4. Now for each ǫ > h K ∗ ( α ∗ − ǫ ) + h K ∗ ( − α ∗ + ǫ )2 cos( α ∗ − ǫ ) − h K ∗ (0) ≤ σ √ n α ∗ − ǫ . Letting ǫ ↓ n α ∗ − ǫ → n α ∗ and the continuity of h K ∗ , wededuce h K ∗ ( α ∗ ) + h K ∗ ( − α ∗ )2 cos α ∗ − h K ∗ (0) ≤ σ √ n α ∗ . (59)Because 0 < α ∗ < π/
4, by the definition of the infimum, there exists a decreasing sequence { α k } ∈ (0 , π/
4) converging to α ∗ such that h K ∗ ( α k ) + h K ∗ ( − α k )2 cos α k − h K ∗ (0) > σ √ n α k for all k. For k large, n α k is either n α ∗ or n α ∗ + 2, and hence letting k → ∞ , we get h K ∗ ( α ∗ ) + h K ∗ ( − α ∗ )2 cos α ∗ − h K ∗ (0) ≥ σ √ n α ∗ + 2 ≥ √ σ √ n α ∗ , where we also used that n α ∗ ≥
1. Combining the above with (59), we conclude that1 √ σ √ n α ∗ ≤ h K ∗ ( α ∗ ) + h K ∗ ( − α ∗ )2 cos α ∗ − h K ∗ (0) ≤ σ √ n α ∗ . Using α = α ∗ in (57), we get r ≥ σ n α ∗ . (60)We shall now show that α ∗ ≤ ˜ α := 8( k ∗ + 1) πn (61)when 8( k ∗ + 1) π/n ≤ π/ α n α isnon-decreasing, that n α ∗ ≤ n ˜ α = n ˜ απ − k ∗ + 7 . r ≥ σ k ∗ + 7) ≥ cσ k ∗ + 1for a positive constant c . This would prove the theorem when assumption (54) is true.To prove (61), we only need to show that h K ∗ ( ˜ α ) + h K ∗ ( − ˜ α )2 cos ˜ α − h K ∗ (0) > σ √ n ˜ α = σ √ k ∗ + 7 . (62)We verify this via Lemma A.4 on a case-by-case basis. When k ∗ = 0, we have ˜ α = 8 π/n so that,by Lemma A.4, the left hand side above is bounded from below by ∆ . Because k ∗ is zero, bydefinition of k ∗ , we have ∆ + 2 σ √ ≥ ∆ + 2 σ = 2 σ. This gives ∆ ≥ σ (1 − (1 / √ σ/ √ k ∗ + 7 = σ/ √ k ∗ = 1, we have ˜ α = 16 π/n so that, by Lemma A.4, the left hand side in (62) is boundedfrom below by ∆ . Because k ∗ = 1, by definition of k ∗ , we have∆ + 2 σ √ ≥ ∆ + 2 σ √ ≥ σ √ ≥ σ ((1 / √ − (1 / √ σ/ √ k ∗ + 7 = σ/ √ k ∗ ≥
2, we again use Lemma A.4 to argue that the left hand side in (62) is bounded frombelow by ∆ k ∗ +1) . Because ∆ k is increasing in k (Lemma A.1), we have ∆ k ∗ +1) ≥ ∆ k ∗ . By thedefinition of k ∗ (and the fact that ∆ k ∗ ≥ k ∗ ≥ σk ∗ + 1 − r k ∗ + 12 k ∗ + 1 ! . Because k ∗ ≥
2, it can be easily checked that ( k ∗ + 1) / (2 k ∗ + 1) ≤ / k ∗ + 7) / ( k ∗ + 1) ≥ / − p / p / >
1, imply (62). This completes the proofof the theorem when assumption (54) holds.We now deal with the simpler case when (54) is violated. When (54) is violated, we first showthat k ∗ > n √ − . (63)To see this, note first that, because (54) is violated, we have h K ∗ ( α ) + h K ∗ ( − α )2 cos α − h K ∗ (0) ≤ σ √ n α ≤ σ (cid:16) nαπ − (cid:17) − / for all α ∈ (0 , π/ ≤ k ≤ n/
16, we get∆ k ≤ h K ∗ (4 kπ/n ) + h K ∗ ( − kπ/n )2 cos 4 kπ/n − h K ∗ (0) ≤ σ √ k − ≤ σ √ k . k ≤ n √ − , (64)we have ∆ k + 2 σ √ k + 1 ≥ σ √ k + 1 ≥ σ p n/
16 + 2 σ p n/ > ∆ n/ + 2 σ p n/
16 + 1 . It follows therefore that any k satisfying (64) cannot be a minimizer of ∆ k + 2 σ ( k + 1) − / , therebyimplying (63).Let L ∗ be defined as the Minkowski sum of K ∗ and the closed ball with center 0 and radius σ (3 n/ − / . In other words, L ∗ := (cid:8) x + σ (3 n/ − / y : x ∈ K and || y || ≤ (cid:9) . The support func-tion L ∗ can be checked to equal: h L ∗ ( θ ) = h K ∗ ( θ ) + σ (3 n/ − / . Le Cam’s bound again gives r ≥
14 ( h K ∗ (0) − h L ∗ (0)) { − || P K ∗ − P L ∗ || T V } (65)where r is as defined in (58). By use of Pinsker’s inequality, we have || P K ∗ − P L ∗ || T V ≤ σ vuut n X i =1 (cid:0) h K (2 iπ/n − π ) − h ˘ K (2 iπ/n − π ) (cid:1) = 12 σ s nσ n/ ≤ . Therefore, from (65) and (63), we get that r ≥ σ n ≥ √ σ k ∗ + 1 . This completes the proof of Theorem 3.2.
Recall the definition of ˜ h P in (13) and the definition of the estimator ˆ K in (14). The first thing tonote is that h ˆ K ( θ i ) = ˆ h Pi for every i = 1 , . . . , n. (66)To see this, observe first that, because ˆ h P = (ˆ h P , . . . , ˆ h Pn ) is a valid support vector, there exists aset ˜ K with h ˜ K ( θ i ) = ˆ h Pi for every i . It is now trivial (from the definition of ˆ K ) to see that ˜ K ⊆ ˆ K which implies that h ˆ K ( θ i ) ≥ h ˜ K ( θ i ) = ˆ h Pi . On the other hand, the definition of ˆ K immediatelygives h ˆ K ( θ i ) ≤ ˆ h Pi .The observation (66) immediately gives E K ∗ L f ( K ∗ , ˆ K ) = E K ∗ n n X i =1 (cid:16) h K ∗ ( θ i ) − ˆ h Pi (cid:17)
27t will be convenient here to introduce the following notation. Let h vecK ∗ denote the vector ( h K ∗ ( θ ) , . . . , h K ∗ ( θ n )).Also, for u, v ∈ R n , let ℓ ( u, v ) denote the scaled Euclidean distance defined by ℓ ( u, v ) := P ni =1 ( u i − v i ) /n . With this notation, we have E K ∗ L f ( K ∗ , ˆ K ) = E K ∗ ℓ ( h vecK ∗ , ˆ h P ) . (67)Recall that ˆ h P is the projection of ˆ h := (ˆ h , . . . , ˆ h n ) onto H . Because H is a closed convex subsetof R n , it follows that (see, for example, Stark and Yang (1998)) ℓ ( h, ˆ h ) ≥ ℓ (ˆ h, ˆ h P ) + ℓ ( h, ˆ h P ) for every h ∈ H . In particular, with h = h vecK ∗ , we obtain ℓ ( h vecK ∗ , ˆ h P ) ≤ ℓ ( h vecK ∗ , ˆ h ). Combining this with (67), weobtain E K ∗ L f ( K ∗ , ˆ K ) ≤ E K ∗ ℓ ( h vecK ∗ , ˆ h ) = 1 n n X i =1 E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) . (68)In Theorem 3.1, we proved that E K ∗ (cid:16) ˆ h i − h K ∗ ( θ i ) (cid:17) ≤ Cσ k ∗ ( i ) + 1 for every i = 1 , . . . , n. This implies that E K ∗ L f ( K ∗ , ˆ K ) ≤ Cσ n n X i =1 k ∗ ( i ) + 1 . For inequality (28), it is therefore enough to prove that n X i =1 k ∗ ( i ) + 1 ≤ C ( (cid:18) R √ nσ (cid:19) / ) . (69)Our following proof of (69) is inspired by an argument due to Zhang (2002, Theorem 2.1) in a verydifferent context.Recall that k ∗ ( i ) takes values in I := { } ∪ { j : j ≥ , j ≤ ⌊ n/ ⌋} . For k ∈ I , let ρ ( k ) := n X i =1 I { k ∗ ( i ) = k } and ℓ ( k ) := n X i =1 I { k ∗ ( i ) < k } Note that ℓ (0) = 0 , ℓ (1) = ρ (0) and ρ ( k ) = ℓ (2 k ) − ℓ ( k ) for k ≥ , k ∈ I . As a result n X i =1 k ∗ ( i ) + 1 = X k ∈I ρ ( k ) k + 1 = ℓ (1) + X k ≥ ,k ∈I ℓ (2 k ) − ℓ ( k ) k + 1 . Let K denote the maximum element of I . Because ℓ (2 K ) = n , we can write n X i =1 k ∗ ( i ) + 1 = nK + 1 + ℓ (1)2 + X k ≥ ,k ∈I kℓ ( k )( k + 1)( k + 2) . n/ ( K + 1) ≤ C and loose bounds for the other terms above, we obtain n X i =1 k ∗ ( i ) + 1 ≤ C + X k ≥ ,k ∈I ℓ ( k ) k . (70)We shall show below that ℓ ( k ) ≤ min n, ARk / σn ! for all k ∈ I (71)for a universal positive constant A . Before that, let us first prove (69) assuming (71). Assuming(71), we can write X k ≥ ,k ∈I ℓ ( k ) k = X k ≥ ,k ∈I ℓ ( k ) k I ( k ≤ (cid:18) σn AR (cid:19) / ) + X k ≥ ,k ∈I ℓ ( k ) k I ( k > (cid:18) σn AR (cid:19) / ) (72)In the first term on the right hand side above, we use the bound ℓ ( k ) ≤ ARk / / ( σn ). We then get X k ≥ ,k ∈I ℓ ( k ) k I ( k ≤ (cid:18) σn AR (cid:19) / ) ≤ ARσn X k ≥ ,k ∈I k / I ( k ≤ (cid:18) σn AR (cid:19) / ) . Because I consists of integers of the form 2 j , the sum in the right hand side above is bounded fromabove by a constant multiple of the last term. This gives X k ≥ ,k ∈I ℓ ( k ) k I ( k ≤ (cid:18) σn AR (cid:19) / ) ≤ CRσn (cid:18) σn AR (cid:19) / = C (cid:18) R √ nσ (cid:19) / (73)For the second term on the right hand side in (72), we use the bound ℓ ( k ) ≤ n which gives X k ≥ ,k ∈I ℓ ( k ) k I ( k > (cid:18) σn AR (cid:19) / ) ≤ n X k ≥ ,k ∈I k − I ( k > (cid:18) σn AR (cid:19) / ) Again, because I consists of integers of the form 2 j , the sum in the right hand side above is boundedfrom above by a constant multiple of the first term. This gives X k ≥ ,k ∈I ℓ ( k ) k I ( k > (cid:18) σn AR (cid:19) / ) ≤ Cn (cid:18) σn AR (cid:19) − / = C (cid:18) R √ nσ (cid:19) / . (74)Inequalities (73) and (74) in conjunction with (70) proves (69) which would complete the proof of(28).We only need to prove (71). For this, observe first that when k ∗ ( i ) < k , Corollary 3.5 gives that∆ k ( θ i ) ≥ ( √ − σ √ k + 1 . 
For this, observe first that when $k^*(i) < k$, we have
$$\Delta_k(\theta_i) \ge \frac{(\sqrt2 - 1)\,\sigma}{\sqrt{k+1}}. \tag{75}$$
This is because if (75) were violated then, exactly as in the proof of Corollary 3.5 (via the first part of inequality (89) in Lemma A.2), we would have $k \le k^*(i)$. Consequently, we have
$$I\{k^*(i) < k\} \le \frac{\Delta_k(\theta_i)\sqrt{k+1}}{(\sqrt2-1)\,\sigma}, \quad \text{so that} \quad \ell(k) \le \frac{\sqrt{k+1}}{(\sqrt2-1)\,\sigma} \sum_{i=1}^n \Delta_k(\theta_i) \quad \text{for every } k \in \mathcal{I}. \tag{76}$$
Now using the expression (36) for $\Delta_k(\theta_i)$, it is easy to see that
$$\sum_{i=1}^n \Delta_k(\theta_i) = \frac{1}{k+1} \sum_{j=0}^k \delta_j, \tag{77}$$
where $\delta_j$ is given by
$$\delta_j = \sum_{i=1}^n \left( \frac{h_{K^*}(\theta_i + 4j\pi/n) + h_{K^*}(\theta_i - 4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(\theta_i + 2j\pi/n) + h_{K^*}(\theta_i - 2j\pi/n)}{2} \right).$$
We will now prove an upper bound for $\delta_j$ under the assumption that $K^*$ is contained in a ball of radius $R \ge 0$.
We may assume without loss of generality that this ball is centered at the origin, because the expression for $\delta_j$ above remains unchanged if $h_{K^*}(\theta)$ is replaced by $h_{K^*}(\theta) - a_1\cos\theta - a_2\sin\theta$ for any $(a_1, a_2) \in \mathbb{R}^2$. Because $\theta_i = 2\pi i/n - \pi$, we can rewrite $\delta_j$ as
$$\delta_j = \sum_{i=1}^n \left( \frac{h_{K^*}(\theta_{i+2j}) + h_{K^*}(\theta_{i-2j})}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(\theta_{i+j}) + h_{K^*}(\theta_{i-j})}{2} \right).$$
Because $\theta \mapsto h_{K^*}(\theta)$ is a periodic function of period $2\pi$, the above expression only depends on $h_{K^*}(\theta_1), \dots, h_{K^*}(\theta_n)$. In fact, it is easy to see that
$$\delta_j = \left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right) \sum_{i=1}^n h_{K^*}(\theta_i).$$
Now because $K^*$ is contained in the ball of radius $R$ centered at the origin, it follows that $|h_{K^*}(\theta_i)| \le R$ for each $i$, which gives
$$\delta_j \le nR \left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right) \le nR \left( 1 - \frac{\cos(4k\pi/n)}{\cos(2k\pi/n)} \right) = nR\, \frac{\bigl(1 + 2\cos(2\pi k/n)\bigr)\bigl(1 - \cos(2\pi k/n)\bigr)}{\cos(2\pi k/n)}$$
for all $0 \le j \le k$.
Because $k \le n/16$ for all $k \in \mathcal{I}$, so that $\cos(2\pi k/n) \ge 1/2$ and $1 - \cos(2\pi k/n) = 2\sin^2(\pi k/n)$, it follows that
$$\delta_j \le C_1 nR \sin^2(\pi k/n) \le \frac{C_1 R \pi^2 k^2}{n} \quad \text{for all } 0 \le j \le k,$$
for a universal positive constant $C_1$. The identity (77) therefore gives $\sum_{i=1}^n \Delta_k(\theta_i) \le C_1 R\pi^2 k^2/n$ for all $k \in \mathcal{I}$. Consequently, from (76) and the trivial fact that $\ell(k) \le n$, we obtain
$$\ell(k) \le \min\left( n, \; \frac{C_1 \pi^2 R\, k^2 \sqrt{k+1}}{(\sqrt2-1)\,\sigma n} \right) \quad \text{for all } k \in \mathcal{I}.$$
Note that $\ell(0) = 0$, so that the above inequality only gives something useful for $k \ge 1$.
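The growth of $\sum_i \Delta_k(\theta_i)$ in $k$ can also be examined empirically. The sketch below (exploratory, not a proof; the polygon and all names are ours) evaluates $\Delta_k(\theta_i)$ from (36) for a random convex polygon contained in the ball of radius $R$ and confirms that the normalized ratio stays bounded, consistent with the $Rk^2/n$ bound just derived.

```python
# Empirical look at sum_i Delta_k(theta_i) <= C * R * k^2 / n for a random
# convex polygon inside the ball of radius R.
import numpy as np

rng = np.random.default_rng(2)
R = 1.0
verts = rng.normal(size=(15, 2))
verts *= R / np.max(np.linalg.norm(verts, axis=1))   # contain in radius-R ball

def h(theta):                                        # support function of hull(verts)
    u = np.stack((np.cos(theta), np.sin(theta)), axis=-1)
    return np.max(u @ verts.T, axis=-1)

n = 512
theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi

def Delta(k):                                        # expression (36), all i at once
    j = np.arange(k + 1)
    a, b = 4 * j * np.pi / n, 2 * j * np.pi / n
    terms = [(h(theta + a[m]) + h(theta - a[m])) / 2
             - np.cos(a[m]) / np.cos(b[m]) * (h(theta + b[m]) + h(theta - b[m])) / 2
             for m in range(k + 1)]
    return np.mean(terms, axis=0)

for k in [1, 2, 4, 8, 16, 32]:
    print(k, np.sum(Delta(k)) * n / (R * k ** 2))    # ratio stays O(1)
```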
Using $k + 1 \le 2k$ for $k \ge 1$ (so that $k^2\sqrt{k+1} \le \sqrt2\, k^{5/2}$), we obtain (71). This completes the proof of Theorem 3.6.

6.4 Proof of Theorem 3.7

The following lemma will be crucially used in our proof of Theorem 3.7. For every compact, convex set $P$ and $i = 1, \dots, n$, let $k^P_*(i)$ denote the quantity $k^*(i)$ with $K^*$ replaced by $P$. More precisely,
$$k^P_*(i) := \operatorname{argmin}_{k \in \mathcal{I}} \left( \Delta^P_k(\theta_i) + \frac{2\sigma}{\sqrt{k+1}} \right),$$
where $\Delta^P_k(\theta_i)$ is given by
$$\frac{1}{k+1} \sum_{j=0}^k \left( \frac{h_P(\theta_i + 4j\pi/n) + h_P(\theta_i - 4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_P(\theta_i + 2j\pi/n) + h_P(\theta_i - 2j\pi/n)}{2} \right).$$
The next lemma states that, for every $i = 1, \dots, n$, the risk $\mathbb{E}_{K^*}(\hat h_i - h_{K^*}(\theta_i))^2$ can be bounded from above by a combination of $k^P_*(i)$ and how well $K^*$ can be approximated by $P$. This result holds for every $P$. The approximation of $K^*$ by $P$ is measured in terms of the Hausdorff distance (defined in (31)).

Lemma 6.1 (Approximation). There exists a universal positive constant $C$ such that for every $i = 1, \dots, n$ and every compact, convex set $P$, we have
$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le C \left( \frac{\sigma^2}{k^P_*(i)+1} + \ell_H^2(K^*, P) \right). \tag{78}$$

Proof of Lemma 6.1.
Fix $i \in \{1, \dots, n\}$ and a compact, convex set $P$. For notational convenience, we write $\Delta_k$, $\Delta^P_k$, $k_*$ and $k^P_*$ for $\Delta_k(\theta_i)$, $\Delta^P_k(\theta_i)$, $k^*(i)$ and $k^P_*(i)$ respectively.

We assume that the following condition holds:
$$k^P_* + 1 \ge C_0 (k_* + 1) \tag{79}$$
for a suitably large universal constant $C_0 > 1$. If this condition does not hold, we have
$$\frac{1}{k_*+1} < \frac{C_0}{k^P_*+1},$$
and then (78) immediately follows from Theorem 3.1.

Note that (79) implies, in particular, that $k^P_* > k_*$. Inequality (89) in Lemma A.2, applied to $k = k^P_*$, therefore implies that
$$\Delta_{k^P_*} \ge \frac{(\sqrt2-1)\,\sigma\sqrt{k^P_*+1}}{2\sqrt2\,(k_*+1)}.$$
Also, inequality (88) applied to the set $P$ instead of $K^*$ gives
$$\Delta^P_{k^P_*} \le \frac{6(\sqrt2-1)\,\sigma}{\sqrt{k^P_*+1}}.$$
Consequently,
$$\Delta_{k^P_*} - \Delta^P_{k^P_*} \ge \frac{(\sqrt2-1)\,\sigma\sqrt{k^P_*+1}}{2\sqrt2\,(k_*+1)} - \frac{6(\sqrt2-1)\,\sigma}{\sqrt{k^P_*+1}}.$$
The right hand side above is non-decreasing in $k^P_*+1$, and so we can replace $k^P_*+1$ by the lower bound in (79) to obtain, after some simplification,
$$\Delta_{k^P_*} - \Delta^P_{k^P_*} \ge \frac{c\,\sigma}{\sqrt{k_*+1}} \tag{80}$$
for a universal positive constant $c$, provided $C_0$ is chosen large enough. The key now is to observe that
$$\left| \Delta_k - \Delta^P_k \right| \le 2\,\ell_H(K^*, P) \quad \text{for all } k. \tag{81}$$
This follows from the definition (31) of the Hausdorff distance, which gives
$$\left| \Delta_k - \Delta^P_k \right| \le \frac{\ell_H(K^*, P)}{k+1} \sum_{j=0}^k \left( 1 + \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right),$$
and this clearly implies (81) because $0 \le \cos(4j\pi/n)/\cos(2j\pi/n) \le 1$ for $0 \le j \le k$.

From (81) and (80), we deduce that
$$\ell_H(K^*, P) \ge \frac{c\,\sigma}{2\sqrt{k_*+1}}.$$
This, together with inequality (17), clearly implies (78), which completes the proof.

We are now ready to prove Theorem 3.7.
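To get a feel for the tradeoff in (78), the following sketch (purely illustrative; the disk/polygon pair and all names are ours) takes $K^*$ to be the unit disk, approximates it by regular $m$-gons $P_m$, and evaluates both terms of the bound as $m$ grows: the variance proxy $\sigma^2/(k^P_*(i)+1)$ grows as $P_m$ acquires more corners, while $\ell_H^2(K^*, P_m)$ shrinks, and the best $m$ balances the two.

```python
# Tradeoff in the bound (78): sigma^2/(k^P_*(i)+1) versus l_H^2(K^*, P),
# with K^* the unit disk and P a regular m-gon inscribed in it.
import numpy as np

n, sigma = 512, 0.1
theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
I = [0] + [2 ** j for j in range(int(np.log2(n // 16)) + 1)]

def kP_star(h, i):
    """argmin_k Delta^P_k(theta_i) + 2*sigma/sqrt(k+1), Delta as in (36)."""
    best, best_k = np.inf, 0
    for k in I:
        j = np.arange(k + 1)
        a, b = 4 * j * np.pi / n, 2 * j * np.pi / n
        d = np.mean((h(theta[i] + a) + h(theta[i] - a)) / 2
                    - np.cos(a) / np.cos(b) * (h(theta[i] + b) + h(theta[i] - b)) / 2)
        val = d + 2 * sigma / np.sqrt(k + 1)
        if val < best:
            best, best_k = val, k
    return best_k

for m in [4, 8, 16, 32, 64]:
    phi = 2 * np.pi * np.arange(m) / m
    V = np.stack((np.cos(phi), np.sin(phi)), axis=1)      # vertices of P_m
    h = lambda t: np.max(np.stack((np.cos(t), np.sin(t)), axis=-1) @ V.T, axis=-1)
    lH = 1 - np.cos(np.pi / m)                            # Hausdorff(disk, P_m)
    print(m, sigma ** 2 / (kP_star(h, 0) + 1), lH ** 2)
```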
Proof of Theorem 3.7. We use inequality (68) from the proof of Theorem 3.6. This inequality, along with (78) for $i = 1, \dots, n$, gives
$$\mathbb{E}_{K^*} L_f(K^*, \hat K) \le \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{K^*}(\hat h_i - h_{K^*}(\theta_i))^2 \le C \left( \frac{\sigma^2}{n} \sum_{i=1}^n \frac{1}{k^P_*(i)+1} + \ell_H^2(K^*, P) \right)$$
for every compact, convex set $P$. By restricting $P$ to be in the class $\mathcal{P}$ of polytopes, we get
$$\mathbb{E}_{K^*} L_f(K^*, \hat K) \le C \inf_{P \in \mathcal{P}} \left( \frac{\sigma^2}{n} \sum_{i=1}^n \frac{1}{k^P_*(i)+1} + \ell_H^2(K^*, P) \right).$$
For the proof of (32), it is therefore enough to show that
$$\sum_{i=1}^n \frac{1}{k^P_*(i)+1} \le C\, v_P \log(en/v_P) \quad \text{for every } P \in \mathcal{P}, \tag{82}$$
where $v_P$ denotes the number of extreme points of $P$ and $C$ is a universal positive constant. Fix a polytope $P$ with $v_P = k$. Let the extreme points of $P$ be $z_1, \dots, z_k$. Let $S_1, \dots, S_k$ denote a partition of $\{\theta_1, \dots, \theta_n\}$ into $k$ nonempty sets such that for each $j = 1, \dots, k$, we have
$$h_P(\theta_i) = z_j(1)\cos\theta_i + z_j(2)\sin\theta_i \quad \text{for all } \theta_i \in S_j,$$
where $z_j = (z_j(1), z_j(2))$. For (82), it is enough to prove that
$$\sum_{i:\, \theta_i \in S_j} \frac{1}{k^P_*(i)+1} \le C \log(e n_j) \quad \text{for every } j = 1, \dots, k, \tag{83}$$
where $n_j$ is the cardinality of $S_j$. This is because we can then write
$$\sum_{i=1}^n \frac{1}{k^P_*(i)+1} = \sum_{j=1}^k \sum_{i:\, \theta_i \in S_j} \frac{1}{k^P_*(i)+1} \le C \sum_{j=1}^k \log(en_j) \le Ck \log(en/k),$$
where we used the concavity of $x \mapsto \log(ex)$. We prove (83) below. Fix $1 \le j \le k$. The inequality is obvious if $S_j$ is a singleton, because $k^P_*(i) \ge 0$.
So suppose that $n_j = m \ge 2$. Without loss of generality assume that $S_j = \{\theta_{u+1}, \dots, \theta_{u+m}\}$ where $0 \le u \le n - m$. The definition of $S_j$ implies that
$$h_P(\theta) = z_j(1)\cos\theta + z_j(2)\sin\theta \quad \text{for all } \theta \in [\theta_{u+1}, \theta_{u+m}].$$
We can therefore apply inequality (23) to claim the existence of a positive constant $c$ such that
$$k^P_*(i) \ge c\, n \min(\theta_i - \theta_{u+1},\, \theta_{u+m} - \theta_i) \quad \text{for all } u+1 \le i \le u+m.$$
The minimum with $\pi$ in (23) is redundant here because $\min(\theta_i - \theta_{u+1}, \theta_{u+m} - \theta_i) \le (\theta_{u+m} - \theta_{u+1})/2 < \pi$. Because $\theta_i = 2\pi i/n - \pi$, we get
$$k^P_*(i) \ge 2\pi c \min(i - u - 1,\, u + m - i) \quad \text{for all } u+1 \le i \le u+m.$$
Therefore, there exists a universal constant $C$ such that
$$\sum_{i:\, \theta_i \in S_j} \frac{1}{k^P_*(i)+1} \le C \sum_{i=1}^m \frac{1}{1 + \min(i-1,\, m-i)} \le C \sum_{i=1}^{\lceil m/2 \rceil} \frac{2}{i} \le C \log(em).$$
This proves (83), thereby completing the proof of Theorem 3.7.
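The final logarithmic bound in the display above is elementary but worth a quick numerical confirmation; the following sketch (ours, illustrative) shows the ratio to $\log(em)$ staying bounded.

```python
# The block sum in (83) grows only logarithmically in the block size m:
#   sum_{i=1}^m 1/(1 + min(i-1, m-i)) <= C log(e*m).
import numpy as np

for m in [2, 10, 100, 1000, 10000]:
    i = np.arange(1, m + 1)
    s = np.sum(1.0 / (1 + np.minimum(i - 1, m - i)))
    print(m, s / np.log(np.e * m))      # ratio stays bounded (around 2)
```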
6.5 Proof of Theorem 3.8

Recall the definition (16) of the estimator $\hat K'$ and that of the interpolating function (15). Following an argument similar to that used at the beginning of the proof of Theorem 3.6, we observe that
$$\mathbb{E}_{K^*} L(K^*, \hat K') \le \int_{-\pi}^{\pi} \mathbb{E}_{K^*} \left( h_{K^*}(\theta) - \hat h'(\theta) \right)^2 d\theta = \sum_{i=1}^n \int_{\theta_i}^{\theta_{i+1}} \mathbb{E}_{K^*} \left( h_{K^*}(\theta) - \hat h'(\theta) \right)^2 d\theta. \tag{84}$$
Now fix $1 \le i \le n$ and $\theta_i \le \theta \le \theta_{i+1}$, and let $u(\theta) := \mathbb{E}_{K^*}(h_{K^*}(\theta) - \hat h'(\theta))^2$. Using the expression (15) for $\hat h'(\theta)$, we get that
$$u(\theta) = \mathbb{E}_{K^*} \left( h_{K^*}(\theta) - \frac{\sin(\theta_{i+1}-\theta)}{\sin(\theta_{i+1}-\theta_i)}\, \hat h_i - \frac{\sin(\theta-\theta_i)}{\sin(\theta_{i+1}-\theta_i)}\, \hat h_{i+1} \right)^2.$$
We now write $\hat h_i = \hat h_i - h_{K^*}(\theta_i) + h_{K^*}(\theta_i)$, and a similar expression for $\hat h_{i+1}$. The elementary inequality $(a+b+c)^2 \le 3(a^2+b^2+c^2)$, along with $\max(\sin(\theta-\theta_i), \sin(\theta_{i+1}-\theta)) \le \sin(\theta_{i+1}-\theta_i)$, then implies that
$$u(\theta) \le 3\,\mathbb{E}_{K^*}\left(\hat h_i - h_{K^*}(\theta_i)\right)^2 + 3\,\mathbb{E}_{K^*}\left(\hat h_{i+1} - h_{K^*}(\theta_{i+1})\right)^2 + 3\,b^2(\theta),$$
where
$$b(\theta) := h_{K^*}(\theta) - \frac{\sin(\theta_{i+1}-\theta)}{\sin(\theta_{i+1}-\theta_i)}\, h_{K^*}(\theta_i) - \frac{\sin(\theta-\theta_i)}{\sin(\theta_{i+1}-\theta_i)}\, h_{K^*}(\theta_{i+1}).$$
Therefore, from (84) (remember that $|\theta_{i+1} - \theta_i| = 2\pi/n$), we deduce
$$\mathbb{E}_{K^*} L(K^*, \hat K') \le \frac{12\pi}{n} \sum_{i=1}^n \mathbb{E}_{K^*}\left(\hat h_i - h_{K^*}(\theta_i)\right)^2 + 3 \int_{-\pi}^{\pi} b^2(\theta)\, d\theta.$$
Now, to bound $\sum_{i=1}^n \mathbb{E}_{K^*}(\hat h_i - h_{K^*}(\theta_i))^2$, we can simply use the arguments from the proofs of Theorems 3.6 and 3.7. Therefore, to complete the proof of Theorem 3.8, we only need to show that
$$|b(\theta)| \le \frac{CR}{n} \quad \text{for every } \theta \in (-\pi, \pi] \tag{85}$$
for some universal constant $C$. For this, we use the hypothesis that $K^*$ is contained in a ball of radius $R$. Suppose that the center of the ball is $(x_1, x_2)$. Define $K' := K^* - \{(x_1, x_2)\} := \{(y_1, y_2) - (x_1, x_2) : (y_1, y_2) \in K^*\}$, and note that $h_{K'}(\theta) = h_{K^*}(\theta) - x_1\cos\theta - x_2\sin\theta$. It is then easy to see that $b(\theta)$ is the same for both $K^*$ and $K'$. It is therefore enough to prove (85) assuming that $(x_1, x_2) = (0, 0)$. In that case, $|h_{K^*}(\theta)| \le R$ for all $\theta$, and $h_{K^*}$ is also Lipschitz with constant $R$. Now, because $\max(\sin(\theta-\theta_i), \sin(\theta_{i+1}-\theta)) \le \sin(\theta_{i+1}-\theta_i)$, it can be checked that
$$|b(\theta)| \le |h_{K^*}(\theta)| \left| 1 - \frac{\sin(\theta_{i+1}-\theta)}{\sin(\theta_{i+1}-\theta_i)} - \frac{\sin(\theta-\theta_i)}{\sin(\theta_{i+1}-\theta_i)} \right| + |h_{K^*}(\theta_i) - h_{K^*}(\theta)| + |h_{K^*}(\theta_{i+1}) - h_{K^*}(\theta)|.$$
Because $h_{K^*}$ is $R$-Lipschitz and bounded by $R$, it is clear that we only need to show
$$\left| 1 - \frac{\sin(\theta_{i+1}-\theta)}{\sin(\theta_{i+1}-\theta_i)} - \frac{\sin(\theta-\theta_i)}{\sin(\theta_{i+1}-\theta_i)} \right| \le \frac{C}{n}$$
in order to prove (85). For this, write $\alpha = \theta_{i+1} - \theta$ and $\beta = \theta - \theta_i$, so that the above expression becomes
$$\left| 1 - \frac{\sin\alpha + \sin\beta}{\sin(\alpha+\beta)} \right| \le |1 - \cos\alpha| + |1 - \cos\beta| \le \alpha^2 + \beta^2 \le \frac{C}{n^2} \le \frac{C}{n}.$$
This completes the proof of Theorem 3.8.
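The interpolation bias $b(\theta)$ can be inspected directly. The following sketch (exploratory; the random polygon and all names are ours) computes $\max_\theta |b(\theta)|$ for a convex polygon inside the ball of radius $R$ and confirms that $n \max|b|/R$ stays bounded, in line with (85).

```python
# The bias term b(theta) from the proof of Theorem 3.8: for a convex body
# inside a ball of radius R, max |b| decays at least like R/n.
import numpy as np

rng = np.random.default_rng(3)
R = 1.0
verts = rng.normal(size=(9, 2))
verts *= R / np.max(np.linalg.norm(verts, axis=1))

def h(t):
    u = np.stack((np.cos(t), np.sin(t)), axis=-1)
    return np.max(u @ verts.T, axis=-1)

for n in [64, 256, 1024]:
    theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
    worst = 0.0
    for i in range(n - 1):
        t = np.linspace(theta[i], theta[i + 1], 20)
        s = np.sin(theta[i + 1] - theta[i])
        b = h(t) - (np.sin(theta[i + 1] - t) * h(theta[i])
                    + np.sin(t - theta[i]) * h(theta[i + 1])) / s
        worst = max(worst, np.max(np.abs(b)))
    print(n, worst * n / R)             # bounded, consistent with (85)
```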
6.6 Proofs of Corollaries in Section 3.1

The proofs of the corollaries stated in Section 3.1 are given here. For these proofs, we need some simple properties of the quantities $\Delta_k(\theta_i)$, which are stated and proved in Appendix A. We start with the proof of Corollary 3.5.

Proof of Corollary 3.5.
Fix $1 \le i \le n$. We will prove that $\breve k(i) \le k^*(i) \le \tilde k(i)$. Inequality (27) would then follow from Theorem 3.1. For simplicity, we write $\Delta_k$ for $\Delta_k(\theta_i)$, $f_k$ for $f_k(\theta_i)$, $g_k$ for $g_k(\theta_i)$, $k_*$ for $k^*(i)$, $\breve k$ for $\breve k(i)$ and $\tilde k$ for $\tilde k(i)$.

Inequality (89) in Lemma A.2 gives
$$\Delta_k \ge \frac{(\sqrt2-1)\,\sigma}{\sqrt{k+1}} \quad \text{for all } k > k_*,\, k \in \mathcal{I}.$$
Thus any $k \in \mathcal{I}$ for which $\Delta_k \le f_k < (\sqrt2-1)\sigma/\sqrt{k+1}$ has to satisfy $k \le k_*$. This proves $\breve k \le k_*$.

For $k_* \le \tilde k$, we first use inequality (88) in Lemma A.2 to obtain $\Delta_{k_*} \le 6(\sqrt2-1)\sigma/\sqrt{k_*+1}$. Also, Lemma A.1 states that $k \mapsto \Delta_k$ is non-decreasing for $k \in \mathcal{I}$. We therefore have
$$g_k \le \Delta_k \le \Delta_{k_*} \le \frac{6(\sqrt2-1)\,\sigma}{\sqrt{k_*+1}} \le \frac{6(\sqrt2-1)\,\sigma}{\sqrt{k+1}} \quad \text{for all } k \le k_*,\, k \in \mathcal{I}.$$
Therefore any $k \in \mathcal{I}$ for which $g_k > 6(\sqrt2-1)\sigma/\sqrt{k+1}$ has to be larger than $k_*$. This proves $\tilde k \ge k_*$. The proof is complete.

We next give the proof of Corollary 3.3.
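The data-driven index $k^*(i) = \operatorname{argmin}_{k \in \mathcal{I}} (\Delta_k(\theta_i) + 2\sigma/\sqrt{k+1})$ that these corollaries bracket is easy to tabulate. The sketch below (ours, illustrative) computes it for a disk and for a square: smooth boundary pieces yield large $k^*(i)$, corners force it small.

```python
# Tabulate k*(i) = argmin_{k in I} Delta_k(theta_i) + 2*sigma/sqrt(k+1):
# k*(i) adapts to the local smoothness of the boundary of K*.
import numpy as np

n, sigma = 256, 0.05
theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
I = [0] + [2 ** j for j in range(int(np.log2(n // 16)) + 1)]

def Delta(h, k):                       # expression (36), vectorised over i
    j = np.arange(k + 1)
    a, b = 4 * j * np.pi / n, 2 * j * np.pi / n
    return np.mean([(h(theta + a[m]) + h(theta - a[m])) / 2
                    - np.cos(a[m]) / np.cos(b[m])
                    * (h(theta + b[m]) + h(theta - b[m])) / 2
                    for m in range(k + 1)], axis=0)

def kstar(h):
    crit = np.array([Delta(h, k) + 2 * sigma / np.sqrt(k + 1) for k in I])
    return np.array(I)[np.argmin(crit, axis=0)]

h_disk = lambda t: np.ones_like(t)                          # unit disk
h_square = lambda t: np.abs(np.cos(t)) + np.abs(np.sin(t))  # square [-1,1]^2

print(np.unique(kstar(h_disk)))        # large everywhere
print(np.unique(kstar(h_square)))      # small near theta = multiples of pi/2
```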
Proof of Corollary 3.3. We only need to prove (20). Inequality (21) would then follow from Theorem 3.1. Fix $i \in \{1, \dots, n\}$ and suppose that $K^*$ is contained in a ball of radius $R$ centered at $(x_1, x_2)$. We shall prove below that $\Delta_k(\theta_i) \le 3\pi Rk/n$ for every $k \in \mathcal{I}$; (20) would then follow from Corollary 3.5. Without loss of generality, assume that $\theta_i = 0$.

As in the proof of Theorem 3.8, we may assume that $K^*$ is contained in the ball of radius $R$ centered at the origin. This implies that $|h_{K^*}(\theta)| \le R$ for all $\theta$ and also that $h_{K^*}$ is Lipschitz with constant $R$. Note then that for every $k \in \mathcal{I}$ and $0 \le j \le k$, the quantity
$$Q_j := \frac{h_{K^*}(4j\pi/n) + h_{K^*}(-4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2}$$
can be bounded as
$$|Q_j| = \left| \frac{h_{K^*}(4j\pi/n) - h_{K^*}(2j\pi/n) + h_{K^*}(-4j\pi/n) - h_{K^*}(-2j\pi/n)}{2} - \left( \frac{\cos(4j\pi/n) - \cos(2j\pi/n)}{\cos(2j\pi/n)} \right) \frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2} \right| \le \frac{6Rj\pi}{n}.$$
Here we also used the facts that $h_{K^*}$ and $\cos(\cdot)$ are Lipschitz (with constants $R$ and $1$ respectively) and that $\cos(2j\pi/n) \ge 1/2$. Averaging over $0 \le j \le k$, the inequality $\Delta_k(0) \le 3\pi Rk/n$ then immediately follows. The proof is complete.

We conclude this section with a proof of Corollary 3.4.
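The key computation in the next proof, namely that $\Delta_k(\theta_i)$ vanishes whenever $h_{K^*}$ has the form $x_1\cos\theta + x_2\sin\theta$ on the relevant arc, can be confirmed directly; the following sketch (ours, illustrative) checks it up to floating-point error.

```python
# If h(theta) = x1*cos(theta) + x2*sin(theta) on the arc used by (36),
# then Delta_k(theta_i) = 0: each term of the sum vanishes identically.
import numpy as np

n, x1, x2 = 200, 0.7, -0.3
h = lambda t: x1 * np.cos(t) + x2 * np.sin(t)

theta_i = 0.0
for k in [1, 2, 4, 8]:
    j = np.arange(k + 1)
    a, b = 4 * j * np.pi / n, 2 * j * np.pi / n
    terms = ((h(theta_i + a) + h(theta_i - a)) / 2
             - np.cos(a) / np.cos(b) * (h(theta_i + b) + h(theta_i - b)) / 2)
    assert np.allclose(terms, 0.0)      # hence Delta_k(theta_i) = 0
```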
Proof of Corollary 3.4. By Theorem 3.1, inequality (24) is a direct consequence of (23). We therefore only need to prove (23). Fix $k \in \mathcal{I}$ with
$$k \le \frac{n}{4\pi} \min(\theta_i - \varphi_1(i),\, \varphi_2(i) - \theta_i). \tag{86}$$
It is then clear that $\theta_i \pm 4j\pi/n \in [\varphi_1(i), \varphi_2(i)]$ for every $0 \le j \le k$. From (22), it follows that
$$h_{K^*}(\theta) = x_1\cos\theta + x_2\sin\theta \quad \text{for all } \theta = \theta_i \pm \frac{2j\pi}{n} \text{ and } \theta = \theta_i \pm \frac{4j\pi}{n}, \quad 0 \le j \le k.$$
We now argue that $\Delta_k(\theta_i) = 0$. To see this, note first that $\Delta_k(\theta_i) = U_k(\theta_i) - L_k(\theta_i)$ has the alternative expression (36). Plugging $h_{K^*}(\theta) = x_1\cos\theta + x_2\sin\theta$ into (36), one can see by direct computation that $\Delta_k(\theta_i) = 0$ for every $k \in \mathcal{I}$ satisfying (86). The definition (18) of $k^*(i)$ now immediately implies that
$$k^*(i) \ge \min\left( \frac{n}{4\pi}\min(\theta_i - \varphi_1(i),\, \varphi_2(i) - \theta_i),\; cn \right)$$
for a small enough universal constant $c$. This proves (23), thereby completing the proof.

References
Alexandrov, A. D. (1939). Almost everywhere existence of the second differential of a convex function and some properties of convex surfaces connected with it. Leningrad State Univ. Annals [Uchenye Zapiski] Math. Ser. 6, 3–35.

Baraud, Y. and L. Birgé (2015). Rates of convergence of rho-estimators for sets of densities satisfying shape constraints. arXiv preprint arXiv:1503.04427.

Brunel, V.-E. (2014). Non-parametric estimation of convex bodies and convex polytopes. Ph.D. thesis, Université Pierre et Marie Curie-Paris VI; University of Haifa.

Brunk, H. D. (1970). Estimation of isotonic regression. In Nonparametric Techniques in Statistical Inference (Proc. Sympos., Indiana Univ., Bloomington, Ind., 1969), pp. 177–197. London: Cambridge Univ. Press.

Cai, T. T. and M. G. Low (2015). A framework for estimation of convex functions. Statistica Sinica 25, 423–456.

Cai, T. T., M. G. Low, and Y. Xia (2013). Adaptive confidence intervals for regression functions under shape constraints. Annals of Statistics 41, 722–750.

Carolan, C. and R. Dykstra (1999). Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist. 27(3), 557–566.

Cator, E. (2011). Adaptivity and optimality of the monotone least-squares estimator. Bernoulli 17, 714–735.

Chatterjee, S., A. Guntuboyina, and B. Sen (2014). On risk bounds in isotonic and other shape restricted regression problems. Annals of Statistics. To appear.

Fisher, N. I., P. Hall, B. A. Turlach, and G. S. Watson (1997). On the estimation of a convex set from noisy data on its support function. Journal of the American Statistical Association 92, 84–91.

Gardner, R. J. (2006). Geometric Tomography (second ed.). Cambridge University Press.

Gardner, R. J. and M. Kiderlen (2009). A new algorithm for 3D reconstruction from support functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 556–562.

Gardner, R. J., M. Kiderlen, and P. Milanfar (2006). Convergence of algorithms for reconstructing convex bodies and directional measures. Annals of Statistics 34, 1331–1374.

Gregor, J. and F. R. Rannou (2002). Three-dimensional support function estimation and application for projection magnetic resonance imaging. International Journal of Imaging Systems and Technology 12, 43–50.

Groeneboom, P. (1983). The concave majorant of Brownian motion. Ann. Probab. 11(4), 1016–1027.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983), Wadsworth Statist./Probab. Ser., pp. 539–555. Belmont, CA: Wadsworth.

Groeneboom, P. and G. Jongbloed (2014). Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics, Volume 38. Cambridge University Press.

Groeneboom, P., G. Jongbloed, and J. A. Wellner (2001a). A canonical process for estimation of convex functions: the "invelope" of integrated Brownian motion + t^4. Annals of Statistics 29, 1620–1652.

Groeneboom, P., G. Jongbloed, and J. A. Wellner (2001b). Estimation of convex functions: characterizations and asymptotic theory. Annals of Statistics 29, 1653–1698.

Guntuboyina, A. (2011). Optimal rates of convergence for the reconstruction of convex bodies from noisy support function measurements. Annals of Statistics. To appear.

Guntuboyina, A. and B. Sen (2013). Global risk bounds and adaptation in univariate convex regression. Probab. Theory Related Fields. To appear, available at http://arxiv.org/abs/1305.1648.

Hanson, D. L. and G. Pledger (1976). Consistency in concave regression. Ann. Statist. 4(6), 1038–1050.

Jankowski, H. (2014). Convergence of linear functionals of the Grenander estimator under misspecification. Ann. Statist. 42(2), 625–653.

Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. New York: Springer-Verlag.

Lele, A. S., S. R. Kulkarni, and A. S. Willsky (1992). Convex-polygon estimation from support-line measurements and applications to target reconstruction from laser-radar data. Journal of the Optical Society of America, Series A 9, 1693–1714.

Mammen, E. (1991). Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19(2), 741–759.

Prince, J. L. and A. S. Willsky (1990). Reconstructing convex sets from support line measurements. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 377–389.

Schneider, R. (1993). Convex Bodies: The Brunn-Minkowski Theory. Cambridge: Cambridge Univ. Press.

Stark, H. and Y. Yang (1998). Vector Space Projections. New York: John Wiley & Sons.

Vitale, R. A. (1979). Support functions of plane convex sets. Technical report, Claremont Graduate School, Claremont, CA.

Wright, F. T. (1981). The asymptotic behavior of monotone regression estimates. Ann. Statist. 9(2), 443–448.

Zhang, C.-H. (2002). Risk bounds in isotonic regression. Ann. Statist. 30(2), 528–555.
A Some additional technical results and proofs
In this appendix, we provide additional technical results and proofs.
Proof of Lemma 2.1. The inequality $h_{K^*}(\theta) \le u(\theta, \phi)$ is obtained by using (1) with $\alpha_1 = \theta + \phi$, $\alpha_2 = \theta$ and $\alpha_3 = \theta - \phi$. For $l(\theta, \phi) \le h_{K^*}(\theta)$, we use (1) with $\alpha_1 = \theta + 2\phi$, $\alpha_2 = \theta + \phi$ and $\alpha_3 = \theta$ to obtain
$$h_{K^*}(\theta) \ge 2h_{K^*}(\theta + \phi)\cos\phi - h_{K^*}(\theta + 2\phi).$$
One similarly has $h_{K^*}(\theta) \ge 2h_{K^*}(\theta - \phi)\cos\phi - h_{K^*}(\theta - 2\phi)$, and $l(\theta, \phi) \le h_{K^*}(\theta)$ is deduced by averaging these two inequalities.
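The bracketing in Lemma 2.1 is easy to verify numerically. In the sketch below (ours, illustrative) we take $u(\theta,\phi) = [h(\theta+\phi) + h(\theta-\phi)]/(2\cos\phi)$ and $l(\theta,\phi) = \cos\phi\,[h(\theta+\phi) + h(\theta-\phi)] - [h(\theta+2\phi) + h(\theta-2\phi)]/2$, the forms that the proof above identifies, and check $l \le h \le u$ for random convex polygons.

```python
# Sanity check of Lemma 2.1: l(theta, phi) <= h(theta) <= u(theta, phi)
# for support functions of random convex polygons.
import numpy as np

rng = np.random.default_rng(4)
verts = rng.normal(size=(10, 2))
h = lambda t: np.max(np.stack((np.cos(t), np.sin(t)), axis=-1) @ verts.T, axis=-1)

theta = np.linspace(-np.pi, np.pi, 200)
for phi in [0.01, 0.1, 0.3]:
    u = (h(theta + phi) + h(theta - phi)) / (2 * np.cos(phi))
    l = np.cos(phi) * (h(theta + phi) + h(theta - phi)) \
        - (h(theta + 2 * phi) + h(theta - 2 * phi)) / 2
    assert np.all(l <= h(theta) + 1e-12) and np.all(h(theta) <= u + 1e-12)
```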
Lemma A.1. Recall the quantity $\Delta_k(\theta_i)$ defined in (36). The inequality $\Delta_{2k}(\theta_i) \ge 1.5\,\Delta_k(\theta_i)$ holds for every $1 \le i \le n$ and $0 \le k \le n/16$. In particular, $k \mapsto \Delta_k(\theta_i)$ is non-negative and non-decreasing on $\mathcal{I}$.

Proof. We may assume without loss of generality that $\theta_i = 0$. We will simply write $\Delta_k$ for $\Delta_k(\theta_i)$ below for notational convenience. Let us define, for $\theta \in \mathbb{R}$,
$$\delta(\theta) := \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2} - \frac{\cos2\theta}{\cos\theta} \cdot \frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2}.$$
Note then that $\Delta_k = \sum_{j=0}^k \delta(2j\pi/n)/(k+1)$. We shall first prove that
$$\delta(y) \ge \frac{\tan y}{\tan x}\,\delta(x) \quad \text{for every } 0 < y \le \pi/4 \text{ and } x < y \le 2x. \tag{87}$$
For this, first apply (1) to $\alpha_1 = 2x$, $\alpha_2 = y$ and $\alpha_3 = x$ to get
$$h_{K^*}(y) \le \frac{\sin(y-x)}{\sin x}\, h_{K^*}(2x) + \frac{\sin(2x-y)}{\sin x}\, h_{K^*}(x).$$
We then apply (1) to $\alpha_1 = 2y$, $\alpha_2 = 2x$ and $\alpha_3 = x$ to get (note that $2y - x \le 2y \le \pi/2$)
$$h_{K^*}(2y) \ge \frac{\sin(2y-x)}{\sin x}\, h_{K^*}(2x) - \frac{\sin(2y-2x)}{\sin x}\, h_{K^*}(x).$$
Combining these two inequalities, we get (note that $2y \le \pi/2$, so that $\cos2y/\cos y \ge 0$)
$$h_{K^*}(2y) - \frac{\cos2y}{\cos y}\, h_{K^*}(y) \ge \alpha\, h_{K^*}(2x) - \beta\, h_{K^*}(x),$$
where
$$\alpha := \frac{\sin(2y-x)}{\sin x} - \frac{\cos2y}{\cos y} \cdot \frac{\sin(y-x)}{\sin x} \quad \text{and} \quad \beta := \frac{\sin(2y-2x)}{\sin x} + \frac{\cos2y}{\cos y} \cdot \frac{\sin(2x-y)}{\sin x}.$$
It can be checked by a straightforward calculation that
$$\alpha = \frac{\tan y}{\tan x} \quad \text{and} \quad \beta = \frac{\tan y}{\tan x} \cdot \frac{\cos2x}{\cos x}.$$
It follows therefore that
$$h_{K^*}(2y) - \frac{\cos2y}{\cos y}\, h_{K^*}(y) \ge \frac{\tan y}{\tan x} \left( h_{K^*}(2x) - \frac{\cos2x}{\cos x}\, h_{K^*}(x) \right).$$
We similarly obtain
$$h_{K^*}(-2y) - \frac{\cos2y}{\cos y}\, h_{K^*}(-y) \ge \frac{\tan y}{\tan x} \left( h_{K^*}(-2x) - \frac{\cos2x}{\cos x}\, h_{K^*}(-x) \right).$$
The required inequality (87) now results by adding the above two inequalities. A trivial consequence of (87) is that $\delta(y) \ge \delta(x)$ for $0 < y \le \pi/4$ and $x < y \le 2x$. Further, applying (87) to $y = 2x$ (assuming that $0 < x < \pi/8$) gives $\delta(2x) \ge 2\,\delta(x)$; indeed, $\tan 2x = 2\tan x/(1 - \tan^2 x) \ge 2\tan x$ for $0 < x < \pi/8$. To prove $\Delta_{2k} \ge 1.5\,\Delta_k$, we fix $1 \le k \le n/16$
(note that the inequality is trivial when $k = 0$) and note that
$$\Delta_{2k} = \frac{1}{2k+1} \sum_{j=0}^{2k} \delta\!\left(\frac{2j\pi}{n}\right) = \frac{1}{2k+1} \sum_{j=1}^{k} \left( \delta\!\left(\frac{(4j-2)\pi}{n}\right) + \delta\!\left(\frac{4j\pi}{n}\right) \right),$$
where we used the fact that $\delta(0) = 0$. Using the bounds proved for $\delta(\theta)$, we have
$$\delta\!\left(\frac{(4j-2)\pi}{n}\right) \ge \delta\!\left(\frac{2j\pi}{n}\right) \quad \text{and} \quad \delta\!\left(\frac{4j\pi}{n}\right) \ge 2\,\delta\!\left(\frac{2j\pi}{n}\right).$$
Therefore
$$\Delta_{2k} \ge \frac{3}{2k+1} \sum_{j=1}^{k} \delta\!\left(\frac{2j\pi}{n}\right) \ge \frac{3}{2(k+1)} \sum_{j=0}^{k} \delta\!\left(\frac{2j\pi}{n}\right) = \frac{3}{2}\,\Delta_k,$$
and this completes the proof.
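Lemma A.1 is also straightforward to confirm numerically; the following sketch (ours, illustrative) checks $\Delta_{2k} \ge 1.5\,\Delta_k$ at $\theta_i = 0$ for a random convex polygon.

```python
# Check of Lemma A.1: Delta_{2k} >= 1.5 * Delta_k for a random convex polygon.
import numpy as np

rng = np.random.default_rng(5)
verts = rng.normal(size=(11, 2))
h = lambda t: np.max(np.stack((np.cos(t), np.sin(t)), axis=-1) @ verts.T, axis=-1)

n = 512
delta = lambda t: (h(2 * t) + h(-2 * t)) / 2 \
    - np.cos(2 * t) / np.cos(t) * (h(t) + h(-t)) / 2
Delta = lambda k: np.mean([delta(2 * j * np.pi / n) for j in range(k + 1)])

for k in [1, 2, 4, 8, 16, 32]:          # k <= n/16
    assert Delta(2 * k) >= 1.5 * Delta(k) - 1e-12
```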
Lemma A.2. Fix $i \in \{1, \dots, n\}$. Consider $\Delta_k(\theta_i)$ (defined in (36)) and $k^*(i)$ (defined in (18)). We then have the following inequalities:
$$\Delta_{k^*(i)}(\theta_i) \le \frac{6(\sqrt2-1)\,\sigma}{\sqrt{k^*(i)+1}} \tag{88}$$
and
$$\Delta_k(\theta_i) \ge \max\left( \frac{(\sqrt2-1)\,\sigma}{\sqrt{k+1}},\; \frac{(\sqrt2-1)\,\sigma\sqrt{k+1}}{2\sqrt2\,(k^*(i)+1)} \right) \quad \text{for all } k > k^*(i),\, k \in \mathcal{I}. \tag{89}$$
Proof. Fix $i \in \{1, \dots, n\}$. Below we simply denote $k^*(i)$ and $\Delta_k(\theta_i)$ by $k_*$ and $\Delta_k$ respectively for notational convenience.

We first prove (88). If $k_* \ge 2$, the definition (18) of $k_*$ gives
$$\Delta_{k_*} + \frac{2\sigma}{\sqrt{k_*+1}} \le \Delta_{k_*/2} + \frac{2\sigma}{\sqrt{k_*/2+1}} = \Delta_{k_*/2} + \frac{2\sqrt2\,\sigma}{\sqrt{k_*+2}} \le \Delta_{k_*/2} + \frac{2\sqrt2\,\sigma}{\sqrt{k_*+1}}.$$
Note that $k_* \in \mathcal{I}$, and hence $k_*/2 \le n/32$, so that Lemma A.1 gives $\Delta_{k_*/2} \le (2/3)\Delta_{k_*}$. We therefore have
$$\Delta_{k_*} + \frac{2\sigma}{\sqrt{k_*+1}} \le \frac{2}{3}\,\Delta_{k_*} + \frac{2\sqrt2\,\sigma}{\sqrt{k_*+1}},$$
and rearranging proves (88). Inequality (88) is trivial when $k_* = 0$. Finally, for $k_* = 1$, we have $\Delta_1 + \sqrt2\,\sigma \le \Delta_0 + 2\sigma = 2\sigma$, so that $\Delta_1 \le (2-\sqrt2)\sigma$, which again implies (88).

We now turn to (89). Let $k'$ denote the smallest $k \in \mathcal{I}$ for which $k > k_*$. We start by proving the first part of (89):
$$\Delta_k \ge \frac{(\sqrt2-1)\,\sigma}{\sqrt{k+1}} \quad \text{for } k > k_*,\, k \in \mathcal{I}. \tag{90}$$
Note first that if (90) holds for $k = k'$, then it holds for all $k \ge k'$, $k \in \mathcal{I}$, as well, because $\Delta_k \ge \Delta_{k'}$ (from Lemma A.1) and $1/\sqrt{k+1} \le 1/\sqrt{k'+1}$. We therefore only need to verify (90) for $k = k'$. If $k_* = 0$, then $k' = 1$, and because
$$\Delta_1 + \frac{2\sigma}{\sqrt2} \ge \Delta_0 + 2\sigma = 2\sigma,$$
we obtain $\Delta_1 \ge (2 - \sqrt2)\sigma$. This implies (90). On the other hand, if $k_* > 0$, then $k' = 2k_*$, and we can write
$$\Delta_{2k_*} + \frac{2\sigma}{\sqrt{2k_*+1}} \ge \Delta_{k_*} + \frac{2\sigma}{\sqrt{k_*+1}} \ge \frac{2\sigma}{\sqrt{k_*+1}}.$$
This gives
$$\Delta_{2k_*} \ge \frac{2\sigma}{\sqrt{2k_*+1}} \left( \sqrt{\frac{2k_*+1}{k_*+1}} - 1 \right),$$
which implies inequality (90) for $k = 2k_*$ because $(2k_*+1)/(k_*+1) \ge 3/2$. The proof of (90) is complete.

For the second part of (89), we use Lemma A.1, which states that $\Delta_{2k} \ge 1.5\,\Delta_k \ge \sqrt2\,\Delta_k$ for all $k \in \mathcal{I}$. By a repeated application of this inequality, we get
$$\Delta_k \ge \sqrt{\frac{k}{k'}}\,\Delta_{k'} \ge \sqrt{\frac{k+1}{2(k'+1)}}\,\Delta_{k'} \quad \text{for all } k \ge k',\, k \in \mathcal{I}.$$
Using (90) for $k = k'$, we get
$$\Delta_k \ge \frac{(\sqrt2-1)\,\sigma\sqrt{k+1}}{\sqrt2\,(k'+1)}.$$
The proof of (89) is now completed by observing that $k' + 1 \le 2(k_*+1)$.
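Both inequalities of Lemma A.2 are deterministic statements about $\Delta_k$ and $k^*(i)$ and can be spot-checked; the sketch below (ours, illustrative) does so for a random polygon at one grid point.

```python
# Check of Lemma A.2 for a random polygon, with
# k* = argmin_k Delta_k + 2*sigma/sqrt(k+1):
#   (88):  Delta_{k*} <= 6(sqrt2 - 1) sigma / sqrt(k*+1)
#   (89):  Delta_k    >= (sqrt2 - 1) sigma / sqrt(k+1)   for k > k*.
import numpy as np

rng = np.random.default_rng(6)
verts = rng.normal(size=(8, 2))
h = lambda t: np.max(np.stack((np.cos(t), np.sin(t)), axis=-1) @ verts.T, axis=-1)

n, sigma, i = 512, 0.05, 17
theta_i = 2 * np.pi * (i + 1) / n - np.pi
I = [0] + [2 ** j for j in range(int(np.log2(n // 16)) + 1)]

def Delta(k):
    j = np.arange(k + 1)
    a, b = 4 * j * np.pi / n, 2 * j * np.pi / n
    return np.mean((h(theta_i + a) + h(theta_i - a)) / 2
                   - np.cos(a) / np.cos(b) * (h(theta_i + b) + h(theta_i - b)) / 2)

D = {k: Delta(k) for k in I}
kstar = min(I, key=lambda k: D[k] + 2 * sigma / np.sqrt(k + 1))
c = (np.sqrt(2) - 1) * sigma
assert D[kstar] <= 6 * c / np.sqrt(kstar + 1) + 1e-12                    # (88)
assert all(D[k] >= c / np.sqrt(k + 1) - 1e-12 for k in I if k > kstar)   # (89)
```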
Lemma A.3. Fix $i \in \{1, \dots, n\}$. For every $0 \le k \le n/8$, the variance of the random variable $\hat U_k(\theta_i)$ (defined in (10)) is at most $\sigma^2/(k+1)$. Also, for every $0 \le k \le n/16$, the variance of the random variable $\hat\Delta_k(\theta_i)$ (defined in (11)) is at most $\sigma^2/(k+1)$.

Proof. Fix $1 \le i \le n$. We shall first prove the bound for the variance of $\hat U_k(\theta_i)$ for a fixed $0 \le k \le n/8$. Note that
$$\hat U_k(\theta_i) = \frac{1}{k+1} \sum_{j=0}^k \frac{Y_{i+j} + Y_{i-j}}{2\cos(2j\pi/n)}.$$
It is therefore straightforward to see that
$$\operatorname{var}(\hat U_k(\theta_i)) = \frac{\sigma^2}{(k+1)^2} \left( 1 + \frac12 \sum_{j=1}^k \sec^2(2j\pi/n) \right).$$
For $1 \le j \le k \le n/8$, we have $\sec(2j\pi/n) \le \sqrt2$ because $2j\pi/n \le \pi/4$. The inequality $\operatorname{var}(\hat U_k(\theta_i)) \le \sigma^2/(k+1)$ then immediately follows.

Let us now turn to the variance of $\hat\Delta_k(\theta_i)$. When $k = 0$, the conclusion is obvious since $\hat\Delta_0(\theta_i) = 0$. Otherwise, the expression (11) for $\hat\Delta_k(\theta_i)$ can be rewritten as $\hat\Delta_k(\theta_i) = S_1 + S_2 + S_3$, where
$$S_1 = -\frac{1}{k+1} \sum_{j=1}^k I\{j \text{ is odd}\}\, \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{Y_{i+j} + Y_{i-j}}{2},$$
$$S_2 = \frac{1}{k+1} \sum_{j=1}^k I\{j \text{ is even}\} \left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right) \frac{Y_{i+j} + Y_{i-j}}{2},$$
and
$$S_3 = \frac{1}{k+1} \sum_{j=k+1}^{2k} I\{j \text{ is even}\}\, \frac{Y_{i+j} + Y_{i-j}}{2}.$$
$S_1$, $S_2$ and $S_3$ are clearly independent. Moreover, the different terms in each $S_i$ are also independent. Thus
$$\operatorname{var}(S_1) = \frac{\sigma^2}{2(k+1)^2} \sum_{j=1}^k I\{j \text{ is odd}\}\, \frac{\cos^2(4j\pi/n)}{\cos^2(2j\pi/n)},$$
$$\operatorname{var}(S_2) = \frac{\sigma^2}{2(k+1)^2} \sum_{j=1}^k I\{j \text{ is even}\} \left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right)^2,$$
and
$$\operatorname{var}(S_3) = \frac{\sigma^2}{2(k+1)^2} \sum_{j=k+1}^{2k} I\{j \text{ is even}\} \le \frac{\sigma^2}{4(k+1)}.$$
Now for $k \le n/16$ and $1 \le j \le k$, we have $0 \le \cos(4j\pi/n)/\cos(2j\pi/n) \le 1$, so that $\operatorname{var}(S_1) + \operatorname{var}(S_2) \le \sigma^2/(2(k+1))$. Thus $\operatorname{var}(\hat\Delta_k(\theta_i)) \le \sigma^2/(k+1)$.
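A quick Monte Carlo check of the first variance bound is given below (illustrative; model (2) is simulated with $h \equiv 0$, which is all the variance computation depends on).

```python
# Monte Carlo check of Lemma A.3's bound var(U_hat_k) <= sigma^2/(k+1).
import numpy as np

rng = np.random.default_rng(7)
n, sigma, i, k = 256, 1.0, 40, 16           # k <= n/8
reps = 20000

j = np.arange(k + 1)
w = 1.0 / (2 * np.cos(2 * j * np.pi / n))
Y = sigma * rng.normal(size=(reps, n))      # pure noise: h == 0
U = (Y[:, (i + j) % n] * w + Y[:, (i - j) % n] * w).sum(axis=1) / (k + 1)
print(U.var(), sigma ** 2 / (k + 1))        # empirical variance below the bound
```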
The following lemma was used in the proof of Theorem 3.2.

Lemma A.4. Let $\Delta_k$ be the quantity (36) with $\theta_i = 0$, i.e.,
$$\Delta_k := \frac{1}{k+1} \sum_{j=0}^k \left( \frac{h_{K^*}(4j\pi/n) + h_{K^*}(-4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2} \right).$$
Then the following inequality holds for every $k \le n/8$:
$$\Delta_k \le \frac{h_{K^*}(4k\pi/n) + h_{K^*}(-4k\pi/n)}{2\cos(4k\pi/n)} - h_{K^*}(0).$$

Proof.
From Lemma A.1 (more precisely, from inequality (87) in its proof), it follows that $\delta(2i\pi/n) \le \delta(2k\pi/n)$ for all $1 \le i \le k$; this follows by repeatedly applying (87) along the chain $2i\pi/n, 4i\pi/n, \dots$ until we hit $2k\pi/n$. As a consequence, we have $\Delta_k \le \delta(2k\pi/n)$. Now, if $\theta = 2k\pi/n$, then $\theta \le \pi/4$ and
$$\delta(\theta) = \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2} - \frac{\cos2\theta}{\cos\theta} \cdot \frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2} = \cos2\theta \left( \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos2\theta} - h_{K^*}(0) \right) - \cos2\theta \left( \frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2\cos\theta} - h_{K^*}(0) \right).$$
Because $h_{K^*}(\theta) + h_{K^*}(-\theta) \ge 2h_{K^*}(0)\cos\theta$ and $\cos2\theta \ge 0$, we have
$$\delta(\theta) \le \cos2\theta \left( \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos2\theta} - h_{K^*}(0) \right) \le \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos2\theta} - h_{K^*}(0),$$
where the last step uses $h_{K^*}(2\theta) + h_{K^*}(-2\theta) \ge 2h_{K^*}(0)\cos2\theta$, so that the bracket is nonnegative. Taking $\theta = 2k\pi/n$ completes the proof.