Chaining, Interpolation, and Convexity

Abstract

We show that classical chaining bounds on the suprema of random processes in terms of entropy numbers can be systematically improved when the underlying set is convex: the entropy numbers need not be computed for the entire set, but only for certain "thin" subsets. This phenomenon arises from the observation that real interpolation can be used as a natural chaining mechanism. Unlike the general form of Talagrand's generic chaining method, which is sharp but often difficult to use, the resulting bounds involve only entropy numbers but are nonetheless sharp in many situations in which classical entropy bounds are suboptimal. Such bounds are readily amenable to explicit computations in specific examples, and we discover some old and new geometric principles for the control of chaining functionals as special cases.


arXiv [math.PR]

RAMON VAN HANDEL

1. Introduction

A remarkable achievement of modern probability theory is the development of sharp connections between the boundedness of random processes and the geometry of the underlying index set. Perhaps the most fundamental result in this direction is the characterization of boundedness of Gaussian processes due to Talagrand.

Theorem 1.1 ([16]). Let $(X_t)_{t\in T}$ be a centered Gaussian process and denote by $d(t,s) = (\mathbf{E}|X_t - X_s|^2)^{1/2}$ the associated natural metric on $T$. Then

$$\mathbf{E}\Big[\sup_{t\in T} X_t\Big] \asymp \gamma_2(T) := \inf \sup_{t\in T} \sum_{n\ge 0} 2^{n/2}\, d(t,T_n),$$

where the infimum is taken over all sequences of sets $T_n$ with cardinality $|T_n| < 2^{2^n}$.

The quantity $\gamma_2(T)$ captures precisely what aspect of the geometry of the metric space $(T,d)$ controls the suprema of Gaussian processes: it quantifies the degree to which $T$ can be approximated by a sequence of increasingly fine nets $T_n$. While we quote this particular result for concreteness, the structure that is expressed by Theorem 1.1, called the generic chaining, extends far beyond the theory of Gaussian processes and has a substantial impact on various problems in probability, functional analysis, statistics, and theoretical computer science. An extensive development of this theory and its implications can be found in [16].

Theorem 1.1 provides a powerful general principle for the study of the suprema of random processes. However, when presented with any specific situation, it often proves to be remarkably difficult to control $\gamma_2(T)$ efficiently. Theorem 1.1 can only give sharp results if one is able to construct a nearly optimal sequence of nets $T_n$, a task that is significantly complicated by the multiscale nature of $\gamma_2(T)$. The aim of this paper is to exhibit some surprisingly elementary principles that make it possible to obtain sharp control of $\gamma_2(T)$ in various interesting examples, and that shed new light on the underlying geometric phenomena.

Key words and phrases. Generic chaining; majorizing measures; entropy numbers; real interpolation; suprema of random processes.

Supported in part by NSF grant CAREER-DMS-1148711 and by the ARO through PECASE award W911NF-14-1-0094.

There are essentially two general approaches that have been used to control $\gamma_2(T)$. The simplest and by far the most useful approach is obtained by bringing the supremum over $t\in T$ inside the sum in the definition of $\gamma_2(T)$. This yields

$$\gamma_2(T) \le \sum_{n\ge 0} 2^{n/2}\, e_n(T),$$

where the entropy number $e_n(T)$ is defined as the smallest $\varepsilon>0$ for which there exists an $\varepsilon$-net in $T$ of cardinality less than $2^{2^n}$. This bound, due to Dudley [7], long predates Theorem 1.1 and has found widespread use. Its utility stems from the fact that controlling entropy numbers only requires us to approximate the set $T$ at a single scale, for which numerous methods are available; see, e.g., [10, 8, 2]. Unfortunately, Dudley's bound can fail to be sharp even in the simplest examples, such as ellipsoids in Hilbert space. In fact, the supremum of a random process on $T$ cannot in general be understood in terms of the entropy numbers of $T$: one can easily construct two such sets with comparable entropy numbers on which a Gaussian process behaves very differently [14]. It is therefore a crucial feature of Theorem 1.1 that the use of entropy numbers is replaced by a genuinely multiscale form of approximation.
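To make the definition of the entropy numbers concrete, the following minimal Python sketch (an illustration added here, not part of the original development) bounds $e_n(T)$ from above for a finite set of points in Euclidean space. It uses a farthest-first traversal as a stand-in for an optimal net, which is purely an assumption of convenience: any net of cardinality less than $2^{2^n}$ gives an upper bound on $e_n(T)$, so the weighted sum below is an upper bound for Dudley's sum and need not equal it. The names `greedy_net` and `dudley_upper_bound` are hypothetical.

```python
import math

def greedy_net(points, k):
    """Farthest-first traversal: choose at most k centers among `points`.

    Returns (centers, covering radius). The radius is only an upper bound
    on the optimal covering radius with k centers.
    """
    centers = [points[0]]
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k and max(dist) > 0.0:
        i = max(range(len(points)), key=dist.__getitem__)  # farthest point so far
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

def dudley_upper_bound(points):
    """Upper bounds r_n >= e_n(T) and the sum of 2^(n/2) * r_n over n >= 0.

    At scale n the net must have cardinality < 2^(2^n), i.e. at most
    2^(2^n) - 1 points. The loop stops once the net covers T exactly.
    """
    radii = []
    n = 0
    while True:
        k = min(2 ** (2 ** n) - 1, len(points))
        _, r = greedy_net(points, k)
        radii.append(r)
        if r == 0.0:
            break
        n += 1
    total = sum(2 ** (m / 2) * r for m, r in enumerate(radii))
    return radii, total

# Toy data (hypothetical): a 5 x 5 integer grid in the plane.
grid = [(i, j) for i in range(5) for j in range(5)]
radii, total = dudley_upper_bound(grid)
print(radii, total)
```

Because each scale is handled separately, this computation never has to coordinate the nets across different values of $n$; that is exactly the single-scale convenience of entropy numbers described above, and also the reason the resulting bound can be loose.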
The construction of such a multiscale approximation in any given situation is however a highly nontrivial task.

The main approach that has been developed for the latter purpose is Talagrand's growth functional machinery [16] that forms the core of the proof of Theorem 1.1. To show that $\gamma_2(T)$ is upper bounded by the expected supremum of the Gaussian process, the proof of Theorem 1.1 constructs nets $T_n$ by means of a greedy partitioning scheme that uses the Gaussian process itself $G(A) := \mathbf{E}[\sup_{t\in A} X_t]$ as an objective function. It turns out that the success of this proof relies on the properties of Gaussian processes only through the validity of a single "growth condition" of the functional $G$. If one can design another functional $F$ that mimics this property of Gaussian processes, then the same proof also yields an upper bound on $\gamma_2(T)$ in terms of $F(T)$. An important example of such a construction is the proof that $\gamma_2(T)$ is strictly smaller than Dudley's bound when $T$ is a $q$-convex body [16, §…]. […] $F$ can be designed, and successful application of this approach requires considerable ingenuity.

In this paper, we develop a new approach that is intermediate between these two extremes. The central insight of this paper is that it is possible to improve systematically on Dudley's bound without giving up the formulation in terms of entropy numbers. Of course, as was noted above, we cannot expect to improve on Dudley's bound in a general setting in terms of the entropy numbers of $T$ itself. Instead, we will show that when $T$ is a convex set, the entropy numbers $e_n(T)$ in Dudley's bound can be replaced by the entropy numbers of certain "thin" subsets that can be substantially smaller than $T$. (The convexity assumption is not essential for our approach, but leads to a cleaner statement of the results.)

To illustrate this idea, let us begin by stating a useful form of such a result. Let $(X,\|\cdot\|)$ be a Banach space, and let $B\subset X$ be a symmetric compact convex set. We denote by $\|\cdot\|_B$ the gauge of $B$, and by $\|\cdot\|_B^*$ and $\|\cdot\|^*$ the dual norms on $X^*$.

In this setting, we will always choose the distance $d$ in the definitions of $\gamma_2(B)$ and $e_n(B)$ to be the one generated by the norm $d(x,y) := \|x-y\|$.

Theorem 1.2. Let $B\subset (X,\|\cdot\|)$ be a symmetric compact convex set, and define

$$B_t := \{y\in B : \exists\, z\in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^*\le 1,\ \|z\|^*\le t\}.$$

Then we have for any $a>0$

$$\gamma_2(B) \lesssim \frac{1}{a} + \sum_{n\ge 0} 2^{n/2}\, e_n(B_{a2^{n/2}}).$$

The bound of Theorem 1.2 proves to be sharp in many situations in which Dudley's bound is suboptimal, and often provides a simple explanation for why this is the case. At the same time, Theorem 1.2 is typically no more difficult to apply than Dudley's bound, as the "thin" subsets $B_t\subseteq B$ that appear in this bound can be found in quite explicit form. For example, if $B$ is a smooth symmetric convex body in $\mathbb{R}^d$, then it is a classical fact that $\nabla\|x\|_B$ is the unique norming functional for the norm $\|\cdot\|_B$ at the point $x$, so that we can simply write

$$B_t = \{y\in B : \|\nabla\|y\|_B\|^* \le t\}.$$

Such expressions are readily amenable to explicit computations.

One of the nice features of Theorem 1.2 is that the phenomenon that it describes arises in a completely elementary fashion. To understand its origin, let us sketch the simple idea behind the proof. The basic challenge in controlling $\gamma_2(B)$ is to approximate the unit ball of the norm $\|\cdot\|_B$ in terms of another norm $\|\cdot\|$. It proves to be useful to connect these two norms using an idea that is inspired by real interpolation of Banach spaces [4]. To this end, define Peetre's $K$-functional

$$K(t,x) := \inf_y \{\|y\|_B + t\|x-y\|\} = \|\pi_t(x)\|_B + t\|x-\pi_t(x)\|,$$

where $\pi_t(x)$ is any minimizer in the definition of $K(t,x)$ (assume for simplicity that we work in a finite-dimensional Banach space to avoid trivial technicalities).
It is easily seen that $\lim_{t\to\infty} K(t,x) = \|x\|_B$, $K(0,x) = 0$, and $\frac{d}{dt}K(t,x) = \|x-\pi_t(x)\|$ (the latter follows by observing that $\|x-\pi_t(x)\|$ is a supergradient of the concave function $t\mapsto K(t,x)$, so it must equal $\frac{d}{dt}K(t,x)$ a.e.; see Proposition 2.3 below). We therefore obtain by the fundamental theorem of calculus

$$\|x\|_B = \int_0^\infty \|x-\pi_t(x)\|\,dt \asymp \sum_{n\ge 0} 2^{n/2}\,\|x-\pi_{2^{n/2}}(x)\|,$$

where the last step follows from a Riemann sum approximation of the integral. This leads immediately to the following observation: if we define the sets $B_t := \{\pi_t(x) : x\in B\}$, then we have shown that

$$\sup_{x\in B}\sum_{n\ge 0} 2^{n/2}\, d(x, B_{2^{n/2}}) \lesssim 1.$$

In other words, we see that a natural chaining mechanism is in fact built into the real interpolation method: we automatically generate a multiscale approximation of $B$ in terms of the sets $B_t$. In order to bound $\gamma_2(B)$, it remains to choose a finite net with the appropriate cardinality inside each of the sets $B_t$. (While it may not be immediately obvious, the definition of $B_t$ given in Theorem 1.2 is none other than the dual formulation of the definition of $B_t$ as a set of minimizers.)

It should be clear at this point that convexity is not essential in the construction using real interpolation: convexity only enters the proof of Theorem 1.2 in order to obtain the convenient formulation of the sets $B_t$. In section 2, we first prove a general form of Theorem 1.2 that is applicable in any metric space; we also formulate the results for more general $\gamma_p$-functionals that appear when the generic chaining method is applied to non-Gaussian processes. We then specialize to the convex setting and derive the dual formulation of $B_t$. In section 3, we illustrate the power of Theorem 1.2 in a number of explicit examples. We also illustrate by means of an example that Theorem 1.2 does not always give sharp results.

Theorem 1.2 improves on Dudley's bound by replacing the entropy numbers of $B$ by the entropy numbers of the smaller sets $B_t$. A rather different improvement arises when $B$ is $q$-convex, for which Talagrand shows that [16, §…]

$$\gamma_2(B) \lesssim \Big[\sum_{n\ge 0}\big(2^{n/2}\, e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q}.$$

This bound involves only the entropy numbers of the set $B$ itself, and appears at first sight to be quite different in nature from Theorem 1.2. Nonetheless, we show in section 4 that this fundamental result is a direct consequence of Theorem 1.2. Roughly speaking, we will see that the $q$-convexity assumption forces the sets $B_t$ to be much smaller than the original set $B$ in the sense that $e_n(B_t) \lesssim t^{1/(q-1)}\, e_n(B)^{q/(q-1)}$. In fact, it turns out there is nothing particularly special about uniform convexity: Talagrand's result is a special case of a more general geometric phenomenon that will be developed in section 4. As another illustration of this phenomenon, we will show that Talagrand's bound for $q$-convex bodies holds verbatim for $\ell_q$-balls in Banach spaces with an unconditional basis for every $1<q<\infty$. Note that such sets are only 2-convex rather than $q$-convex when $1<q<2$, so that the behavior of $\ell_q$-balls is evidently not explained by uniform convexity.

The connection between interpolation and generic chaining appears in hindsight to be entirely natural. Many generic chaining constructions (that appear in [16, 15], for example) have a flavor of interpolation, and even the multiscale notion of approximation that is intrinsic to the definition of $\gamma_2(T)$ has appeared independently in interpolation theory in the study of approximation spaces [13, 6, 12]. To the best of the author's knowledge, however, the results of this paper are the first to explicitly develop this connection. It would be interesting to understand whether broader interactions exist between these areas of probability and analysis.

2. Chaining, Interpolation, and Convexity

The aim of this section is to develop the basic connections between chaining, interpolation, and convexity that lie at the heart of this paper. In section 2.1, we develop an abstract chaining principle that holds in any metric space. In section 2.2, we specialize to the convex setting and complete the proof of Theorem 1.2.

2.1. Chaining and interpolation. In this section, let $(X,d)$ be any metric space. We begin by defining formally the notions of entropy numbers and Talagrand's $\gamma_p$-functionals. The case $p=2$ arises in the context of Gaussian processes together with the associated natural metric, cf. Theorem 1.1; however, other values of $p$ and more general metrics can arise for other random processes [16].

Definition 2.1. For any $A\subseteq X$ and $n\ge 0$, define the entropy number

$$e_n(A) := \inf_{|\tilde A| < 2^{2^n}}\ \sup_{x\in A} d(x,\tilde A),$$

and define for $p>0$ the $\gamma_p$-functional

$$\gamma_p(A) := \inf_{|\tilde A_n| < 2^{2^n}}\ \sup_{x\in A}\sum_{n\ge 0} 2^{n/p}\, d(x,\tilde A_n).$$

(The approximating sets $\tilde A_n\subseteq X$ are not necessarily subsets of $A$.)

Fix a set $A\subseteq X$ for the remainder of this section. To measure the size of $A$, we introduce a penalty function $f: X\to\mathbb{R}_+\cup\{+\infty\}$ that may in principle be chosen arbitrarily. Consider the corresponding optimization problem

$$K(t,x) := \inf_{y\in X}\{f(y) + t\,d(x,y)\}$$

for every $t\ge 0$ and $x\in A$. We will assume for simplicity that the infimum in this optimization problem is attained for every $t\ge 0$ and $x\in A$, and denote by $\pi_t(x)$ any choice of minimizer in the definition of $K(t,x)$. (It is a trivial exercise to extend our results to the setting where $\pi_t(x)$ is a near-minimizer, but such an extension will not be needed in the sequel.) We now define for every $t\ge 0$ the set

$$A_t := \{\pi_t(x) : x\in A\}.$$

Remark 2.2.

In the present formulation, $A_t$ is not necessarily a subset of $A$. However, it is natural to choose a penalty function $f$ such that $A = \{x : f(x)\le 1\}$, in which case evidently $A_t\subseteq A$ (because $f(\pi_t(x)) \le K(t,x) \le f(x)$).

The following result lies at the heart of this paper. In the sequel, we write $a\lesssim b$ if $a\le Cb$ for a universal constant $C$, and $a\asymp b$ if $a\lesssim b$ and $b\lesssim a$. We indicate explicitly when the universal constant depends on some parameter in the problem.

Proposition 2.3. In the setting of this section, we have for every $a>0$

$$\gamma_p(A) \lesssim \frac{1}{a}\sup_{x\in A} f(x) + \sum_{n\ge 0} 2^{n/p}\, e_n(A_{a2^{n/p}}),$$

where the universal constant depends on $p$ only.

Proof. We can assume without loss of generality that $f$ is uniformly bounded on $A$. Thus $0\le K(t,x)\le f(x) < \infty$ for every $x\in A$ and $t\ge 0$. Moreover, $t\mapsto K(t,x)$ is clearly a concave function for every $x\in A$. We now use some basic facts about univariate concave functions [9, Chapter I]. First, we note that

$$K(t,x) - K(s,x) = \inf_{y\in X}\{f(y) + t\,d(x,y)\} - f(\pi_s(x)) - s\,d(x,\pi_s(x)) \le (t-s)\,d(x,\pi_s(x))$$

for all $t,s\ge 0$, so that $d(x,\pi_s(x))$ is a supergradient of $t\mapsto K(t,x)$ at $t=s$. As a bounded concave function is absolutely continuous, we obtain

$$K(T,x) = K(0,x) + \int_0^T d(x,\pi_t(x))\,dt$$

for every $T\ge 0$ and $x\in A$. In particular, we can estimate

$$\int_0^\infty d(x,\pi_t(x))\,dt \le f(x)$$

for every $x\in A$. We also recall that the derivative of a concave function is nonincreasing, so that we can discretize the integral as follows:

$$f(x) \ge \int_0^a d(x,\pi_t(x))\,dt + \sum_{n\ge 1}\int_{a2^{(n-1)/p}}^{a2^{n/p}} d(x,\pi_t(x))\,dt \ge (1-2^{-1/p})\,a\sum_{n\ge 0} 2^{n/p}\, d(x,\pi_{a2^{n/p}}(x)),$$

where we used that $t\mapsto d(x,\pi_t(x))$ is nonincreasing in the last step.

It remains to discretize the minimizers $\pi_t(x)$. By the definition of entropy numbers, we can choose for every $n\ge 0$ a set $\tilde A_n\subseteq X$ such that $|\tilde A_n| < 2^{2^n}$ and

$$\sup_{x\in A_{a2^{n/p}}} d(x,\tilde A_n) \le e_n(A_{a2^{n/p}}).$$

We can therefore estimate

$$\gamma_p(A) \le \sup_{x\in A}\sum_{n\ge 0} 2^{n/p}\, d(x,\tilde A_n) \le \sup_{x\in A}\sum_{n\ge 0} 2^{n/p}\, d(x,\pi_{a2^{n/p}}(x)) + \sum_{n\ge 0} 2^{n/p}\sup_{x\in A} d(\pi_{a2^{n/p}}(x),\tilde A_n) \lesssim \frac{1}{a}\sup_{x\in A} f(x) + \sum_{n\ge 0} 2^{n/p}\, e_n(A_{a2^{n/p}}),$$

which completes the proof. □

Remark 2.4.

Suppose we replace the penalty $f$ by an equivalent penalty $\tilde f\asymp f$. Then the first term in the bound of Proposition 2.3 only changes by a universal constant, but the second term might change substantially as the definition of the sets $A_t$ is highly nonlinear. This highlights the nontrivial nature of the choice of penalty. Similarly, the bound of Theorem 1.2 could potentially give better results if we replace $B$ by an equivalent set $c\tilde B\subseteq B\subseteq C\tilde B$. Note that the same phenomenon arises when applying the growth functional machinery of [16]: the growth condition is not preserved if we choose an equivalent functional. This appears to be an inherent difficulty that arises in the control of chaining functionals.

2.2. Convexity. While Proposition 2.3 provides a very general chaining principle in metric spaces, it is not immediately obvious how to apply this result in any given situation. The problem is that the sets $A_t$ that appear in the previous section are defined implicitly as families of solutions to certain optimization problems; in the absence of a more explicit characterization, the computation of the entropy numbers $e_n(A_{a2^{n/p}})$ can be a challenging problem. To address this problem, we specialize our results from this point onwards to the case where the set of interest is convex and where the penalty function is chosen to be the associated gauge. The convexity assumption makes it possible to obtain a dual formulation of the sets of optimizers that is readily amenable to explicit computations. The advantages of this formulation will be amply illustrated in the following sections.
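The implicit definition of the sets $A_t$ can be made tangible on a finite metric space, where the optimization problem defining $K(t,x)$ can be solved by brute force. The Python sketch below is only an illustration under assumed toy data (the integers from $-5$ to $5$ with $d(x,y)=|x-y|$ and the arbitrary penalty $f(y)=y^2$); the function names `k_functional` and `thin_set` are hypothetical. Ties between minimizers are broken by list order, which is one admissible choice of $\pi_t(x)$.

```python
def k_functional(t, x, space, penalty, dist):
    """Return (K(t, x), pi_t(x)) by exhaustive search over a finite space."""
    best = min(space, key=lambda y: penalty(y) + t * dist(x, y))
    return penalty(best) + t * dist(x, best), best

def thin_set(t, A, space, penalty, dist):
    """A_t = {pi_t(x) : x in A} for the minimizers chosen by k_functional."""
    return {k_functional(t, x, space, penalty, dist)[1] for x in A}

# Toy data (hypothetical): integer points, absolute-value metric, penalty y^2.
space = list(range(-5, 6))
penalty = lambda y: y * y
dist = lambda x, y: abs(x - y)

for t in (0, 2, 4, 8, 100):
    print(t, sorted(thin_set(t, space, space, penalty, dist)))

# The distance from x to its minimizer is a supergradient of t -> K(t, x),
# so it should be nonincreasing in t (checked here for x = 5).
gaps = [dist(5, k_functional(t, 5, space, penalty, dist)[1]) for t in (0, 2, 4, 8, 100)]
print(gaps)
```

In this toy example the sets $A_t$ consist of a single point at $t=0$ and fill out the whole space only for large $t$, which mirrors the way the scale parameter $t = a2^{n/p}$ enters Proposition 2.3.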

We now introduce the setting that will be used throughout the remainder of this paper. Let $(X,\|\cdot\|)$ be a Banach space, and let $B\subset X$ be a symmetric compact convex set. The metric $d$ that appears in the definitions of the entropy numbers $e_n(B)$ and the functionals $\gamma_p(B)$ (cf. Definition 2.1) will always be chosen to be defined by the norm $d(x,y) := \|x-y\|$ on the underlying Banach space. The gauge (Minkowski functional) of $B$ will be denoted $\|\cdot\|_B$, that is,

$$\|x\|_B := \inf\{s\ge 0 : x\in sB\}$$

for $x\in X$. Denote by $\|\cdot\|_B^*$ and $\|\cdot\|^*$ the associated dual gauge and norm, that is,

$$\|z\|_B^* := \sup_{\|x\|_B\le 1}\langle z,x\rangle = \sup_{x\in B}\langle z,x\rangle, \qquad \|z\|^* := \sup_{\|x\|\le 1}\langle z,x\rangle$$

for $z\in X^*$. The key point of this section is the following duality result, which shows that the minimizers of the $K$-functional in the convex setting define a form of projection onto an explicitly defined scale of subsets $B_t\subseteq B$.

Proposition 2.5. For every $t\ge 0$, there is a map $\pi_t: B\to B$ such that:

(i) $\pi_t(x)$ is a minimizer for Peetre's $K$-functional for every $x\in B$:
$$K(t,x) := \inf_{y\in X}\{\|y\|_B + t\|x-y\|\} = \|\pi_t(x)\|_B + t\|x-\pi_t(x)\|.$$

(ii) The set of minimizers $B_t := \{\pi_t(x) : x\in B\}$ can be characterized as
$$B_t = \{y\in B : \exists\, z\in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^*\le 1,\ \|z\|^*\le t\}.$$

(iii) We have $\pi_t(x) = x$ for every $x\in B_t$.

Proof. The result holds trivially for $t=0$, so we fix $t>0$.

Step 1.

Let $B_K := \mathrm{conv}(B\cup t^{-1}B_\sim)$, where $B_\sim$ is the closed unit ball in $(X,\|\cdot\|)$. For completeness, we recall the proof of the elementary fact that $K(t,x) = \|x\|_{B_K}$ for every $x\in X$, where $\|\cdot\|_{B_K}$ denotes the gauge of $B_K$.

Suppose first that $K(t,x) < r$, so there exists $y\in X$ with $\|y\|_B + t\|x-y\| < r$. Then writing $x = \lambda x_1 + \mu x_2$ with $x_1 = y/\|y\|_B$ and $x_2 = (x-y)/t\|x-y\|$ readily implies that $\|x\|_{B_K} < r$. In the converse direction, suppose that $\|x\|_{B_K} < r$, so that $x = \lambda x_1 + \mu x_2$ for some $|\lambda| + |\mu| < r$, $x_1\in B$, $x_2\in t^{-1}B_\sim$. Then choosing $y = \lambda x_1$ in the definition of $K(t,x)$ shows that $K(t,x) < r$.

Step 2. We now establish the existence of a minimizer in the definition of $K(t,x)$ for every $x\in X$. This is a direct consequence of the previous step and the compactness of $B$. Indeed, as $B$ is compact, the set $B_K$ is closed. Thus $K(t,x) = r$ implies $x\in rB_K$, so there exist $|\lambda| + |\mu|\le r$ and $x_1\in B$, $x_2\in t^{-1}B_\sim$ such that $x = \lambda x_1 + \mu x_2$. It follows that $y = \lambda x_1$ is a minimizer for $K(t,x)$, as

$$K(t,x) \le \|\lambda x_1\|_B + t\|\mu x_2\| \le r = K(t,x).$$

Step 3. Define the set

$$B'_t := \{y\in B : K(t,y) = \|y\|_B\}.$$

We can characterize this set by duality. Indeed, note that

$$K(t,y) = \sup\{\langle z,y\rangle : z\in X^*,\ \|z\|_B^*\le 1,\ \|z\|^*\le t\},$$

where we have used the polar identity $B_K^\circ = B^\circ\cap tB_\sim^\circ$. Moreover, the supremum is attained at some point $z\in X^*$ by the Hahn-Banach theorem. Therefore, if $y\in B'_t$, then there exists $z\in X^*$ such that $\langle z,y\rangle = \|y\|_B$, $\|z\|_B^*\le 1$, and $\|z\|^*\le t$. Conversely, if $y\in B$ is such that a point $z$ satisfying the latter properties exists, then $\|y\|_B = \langle z,y\rangle \le K(t,y) \le \|y\|_B$ so that $y\in B'_t$. Thus we have

$$B'_t = \{y\in B : \exists\, z\in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^*\le 1,\ \|z\|^*\le t\}.$$

Step 4. Define the map $\pi_t: B\to B$ as follows. For $x\in B'_t$, we set $\pi_t(x) = x$. For $x\notin B'_t$, we choose $\pi_t(x)$ to be any minimizer in the definition of $K(t,x)$. We are going to verify that each of the claims in the statement of the Proposition holds.

Let us first note that $\pi_t$ does indeed map $B$ into itself. For $x\in B'_t$, this is true by construction. For $x\notin B'_t$, this is true because $\|\pi_t(x)\|_B \le K(t,x) \le \|x\|_B$. Moreover, note that when $x\in B'_t$, by construction $y = x = \pi_t(x)$ is a minimizer in the definition of $K(t,x)$. We have therefore established part (i).

To prove parts (ii) and (iii), it suffices to show that $B_t = B'_t$. That $B'_t\subseteq B_t$ is obvious from the fact that $\pi_t(x) = x$ for $x\in B'_t\subseteq B$. To establish the converse inclusion, we argue as follows. Fix $x\in B$, and choose $z\in X^*$ such that $K(t,x) = \langle z,x\rangle$, $\|z\|_B^*\le 1$, and $\|z\|^*\le t$. By the bipolar theorem, we can write

$$\langle z,\pi_t(x)\rangle \le \|\pi_t(x)\|_B = \langle z,\pi_t(x)\rangle + \langle z,x-\pi_t(x)\rangle - t\|x-\pi_t(x)\| \le \langle z,\pi_t(x)\rangle.$$

This implies that $\pi_t(x)\in B'_t$, and thus $B_t\subseteq B'_t$. □

Remark 2.6.

When $B$ is a symmetric convex body in a finite-dimensional Banach space, the details of the proof of Proposition 2.5 simplify significantly. It is an instructive exercise to give a quick proof in this case using subdifferential calculus.

The proof of Theorem 1.2 in the introduction now follows trivially. For future reference, we formulate the analogous result for $\gamma_p$-functionals.

Corollary 2.7. Let $B\subset (X,\|\cdot\|)$ be a symmetric compact convex set, and define

$$B_t := \{y\in B : \exists\, z\in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^*\le 1,\ \|z\|^*\le t\}.$$

Then we have for any $a>0$

$$\gamma_p(B) \lesssim \frac{1}{a} + \sum_{n\ge 0} 2^{n/p}\, e_n(B_{a2^{n/p}}),$$

where the universal constant depends on $p$ only.

Proof. This is simply the combined statement of Proposition 2.3, where we choose the penalty $f(x) = \|x\|_B$ and distance $d(x,y) = \|x-y\|$, and Proposition 2.5. □

We end this section by emphasizing a remark that was also made in the introduction. Recall that a symmetric convex set $B\subset X$ is called smooth if for every $x\in X$, $x\ne 0$ there is a unique $z\in X^*$ so that $\langle z,x\rangle = \|x\|_B$ and $\|z\|_B^*\le 1$, cf. [3].

Corollary 2.8. Let $B$ be a symmetric convex body in a finite-dimensional Banach space $(X,\|\cdot\|)$, and denote by $\partial\|y\|_B$ the subdifferential of $\|y\|_B$. Then

$$B_t = \Big\{y\in B : \inf_{z\in\partial\|y\|_B}\|z\|^* \le t\Big\}.$$

In particular, if $B$ is smooth, then

$$B_t = \{y\in B : \|\nabla\|y\|_B\|^* \le t\}.$$

Proof. It is a classical fact that $\partial\|y\|_B = \{z\in X^* : \langle z,y\rangle = \|y\|_B,\ \|z\|_B^*\le 1\}$, so that the result follows readily from Proposition 2.5; cf. [9, Chapter VI]. □

The explicit nature of Corollary 2.8 is particularly useful in computations.
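As a concrete illustration of the smooth case of Corollary 2.8, the following Python sketch tests membership in $B_t$ for an $\ell_q$-ellipsoid in Euclidean space, where the dual norm $\|\cdot\|^*$ is again Euclidean. This is a minimal sketch under stated assumptions: it takes the gauge $\|y\|_B = [\sum_i (|y_i|/b_i)^q]^{1/q}$ with $q>1$ and differentiates it directly; the function names `gauge`, `grad_gauge`, and `in_B_t`, the tolerance, and the sample semiaxes are all hypothetical choices made for the example.

```python
import math

def gauge(y, b, q):
    """Gauge of the l_q-ellipsoid with semiaxes b (assumes q > 1, b_i > 0)."""
    return sum((abs(yi) / bi) ** q for yi, bi in zip(y, b)) ** (1.0 / q)

def grad_gauge(y, b, q):
    """Gradient of the gauge at y != 0, obtained by differentiating `gauge`."""
    g = gauge(y, b, q)
    return [math.copysign(abs(yi) ** (q - 1) / (bi ** q * g ** (q - 1)), yi)
            for yi, bi in zip(y, b)]

def in_B_t(y, b, q, t, tol=1e-12):
    """Membership test: y in B and Euclidean norm of the gradient <= t."""
    g = gauge(y, b, q)
    if g > 1.0 + tol:
        return False          # y is not in B
    if g == 0.0:
        return True           # at the origin, z = 0 is a norming functional
    grad_norm = math.sqrt(sum(z * z for z in grad_gauge(y, b, q)))
    return grad_norm <= t + tol

# Hypothetical semiaxes and exponent.
b, q = [2.0, 1.0, 0.5], 3.0
long_axis_point = [2.0, 0.0, 0.0]    # boundary point on the longest semiaxis
short_axis_point = [0.0, 0.0, 0.5]   # boundary point on the shortest semiaxis
print(in_B_t(long_axis_point, b, q, t=0.6))
print(in_B_t(short_axis_point, b, q, t=0.6))
```

At the boundary point $b_k e_k$ the gradient of the gauge is $e_k/b_k$, so for small $t$ the set $B_t$ retains boundary points along long semiaxes and discards those along short ones; this is one way to see in what sense the sets $B_t$ are "thin".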

3. Examples

The aim of this section is to illustrate the utility of Theorem 1.2 in explicit computations by investigating some simple but conceptually interesting examples. As our goal is to develop insight into the phenomenon described by Theorem 1.2, we have avoided unnecessary distractions by restricting attention to situations in which existing entropy estimates can be used.

We write $\|x\|_r := [\sum_i |x_i|^r]^{1/r}$, and denote by $e_1,\ldots,e_d$ the standard basis in $\mathbb{R}^d$. Throughout this section, we work in Euclidean space $(\mathbb{R}^d,\|\cdot\|)$ where $\|\cdot\| := \|\cdot\|_2$. The concrete choice of the Euclidean norm is not important for our theory, but is made in order to enable explicit computations and is natural in the setting of Gaussian processes (as it corresponds to the canonical choice $X_t = \langle t,g\rangle$ in Theorem 1.1, where $g$ is a standard Gaussian vector in $\mathbb{R}^d$). Some of the examples developed here will be revisited in section 4 in a much more general setting.

3.1. $\ell_q$-Ellipsoids. The classical example of a situation where Dudley's bound fails to be sharp is that of ellipsoids in Hilbert space. In this section, we will investigate the following more general situation. Given scalars $1<q<\infty$ and $b_1\ge b_2\ge\cdots\ge b_d>0$, let $B\subset\mathbb{R}^d$ be the $\ell_q$-ellipsoid whose gauge is given by

$$\|x\|_B = \Big[\sum_{i=1}^d \Big(\frac{|x_i|}{b_i}\Big)^q\Big]^{1/q}.$$

We will show that Theorem 1.2 yields the following optimal bound.

Proposition 3.1. In the setting of this section, we have

$$\gamma_2(B) \lesssim \Big[\sum_{i=1}^d b_i^{q/(q-1)}\Big]^{(q-1)/q},$$

where the universal constant depends on $q$ only.

Of course, this result can easily be obtained from Theorem 1.1, but our aim is to provide a geometric proof that explains why the result is true.

In order to apply either Dudley's bound or Theorem 1.2, we will require suitable estimates on the entropy numbers of $\ell_q$-ellipsoids. The behavior of these entropy numbers is investigated in detail in a classic paper by Carl [5] (in the special case of $\ell_2$-ellipsoids a much more elementary approach can be found in [16, §…]). […] $r\ge 1$, the proof extends directly to the case $0<r<1$.

Lemma 3.2 ([5]). Given $0<r<\infty$, $1/s > (1/2 - 1/r)_+$, $0<u<\infty$, and scalars $c_1\ge c_2\ge\cdots\ge c_d>0$, the $\ell_r$-ellipsoid $C = \{x\in\mathbb{R}^d : \|(x_i/c_i)\|_r\le 1\}$ satisfies

$$\sum_{n\ge 0}\big(2^{n(1/s+1/r-1/2)}\, e_n(C)\big)^u \asymp \sum_{k=1}^d \big(k^{1/s-1/u}\, c_k\big)^u,$$

where the universal constant depends on $r,s,u$ only.

Applying this result with $r=q$, $1/s = 1-1/q$, and $u=1$ yields

$$\sum_{n\ge 0} 2^{n/2}\, e_n(B) \asymp \sum_{k=1}^d k^{-1/q}\, b_k.$$

We therefore see immediately that Dudley's bound is suboptimal for $\ell_q$-ellipsoids: Dudley's bound is much larger than $\gamma_2(B)$, say, when $b_k = k^{-(q-1)/q}(\log k)^{-1}$.

To obtain a sharp bound, we will apply Theorem 1.2. The crux of the matter is to control the sets $B_t$. In the present setting, this is exceedingly simple and gives a vivid illustration of where the improvement over Dudley's bound comes from.

Proof of Proposition 3.1.

Note that $B$ is a smooth convex body with

$$\frac{\partial\|y\|_B}{\partial y_k} = \frac{1}{b_k^q}\,\frac{|y_k|^{q-1}}{\|y\|_B^{q-1}}\,\mathrm{sign}(y_k).$$

Thus Corollary 2.8 gives

$$B_t = \{y\in B : \|y\|_C \le t^{1/(q-1)}\|y\|_B\} \subseteq t^{1/(q-1)}\,C,$$

where

$$\|y\|_C = \Big[\sum_{i=1}^d\Big(\frac{|y_i|}{b_i^{q/(q-1)}}\Big)^{2q-2}\Big]^{1/(2q-2)}.$$

Substituting $B_t\subseteq t^{1/(q-1)}C$ into Theorem 1.2 and optimizing over $a>0$ yields

$$\gamma_2(B) \lesssim \Big[\sum_{n\ge 0} 2^{nq/(2q-2)}\, e_n(C)\Big]^{(q-1)/q}.$$

The conclusion follows by applying Lemma 3.2 with $r = 2q-2$ and $s=u=1$. □

The key point of the proof of Proposition 3.1 is that each subset $B_t$ of the $\ell_q$-ellipsoid $B$ is contained in a dilation of the much "thinner" $\ell_{2q-2}$-ellipsoid $C$: the lengths of the semiaxes of $C$ have been raised to the power $q/(q-1)$ as compared to those of $B$. This is precisely why we obtain the correct powers of $b_i$ inside the sum in Proposition 3.1. The author sees no obvious way to explain this miracle other than that it drops out of the trivial explicit computation performed above. However, a deeper understanding of the geometry of the sets $B_t$ for $\ell_q$-ellipsoids will be obtained in a much more general setting in section 4.

Remark 3.3.

There exist two previous geometric proofs of Proposition 3.1 for special values of $q$. The first, in [11, §…], […] $\gamma_2(B)$ for $q=2$. The second, in [16, §…], […] $2\le q<\infty$ from a more general bound for uniformly convex bodies that is proved using the growth functional machinery. We will revisit the latter idea in section 4, where we will also see that uniform convexity fails to explain the behavior of $\ell_q$-ellipsoids for $1<q<2$. That we have obtained a sharp bound for every value of $q$ with the same proof therefore hides the fact that $\ell_q$-ellipsoids can have a very different geometry for different values of $q$.

Remark 3.4.

The universal constant in Proposition 3.1 must necessarily depend on $q$: if this were not the case, then we would obtain $\gamma_2(B)\lesssim b_1$ in the limit $q\downarrow 1$ […] behavior as $q\downarrow 1$. This is not a deficiency of Theorem 1.2, however: the case $q=1$ is of particular interest in its own right and will be investigated in the next section.

3.2. Octahedra.

In this section, we investigate the limiting case $q=1$ of the example developed in the previous section. That is, given scalars $b_1\ge b_2\ge\cdots\ge b_d>0$, we investigate the octahedron $B\subset\mathbb{R}^d$ defined by

$$B = \mathrm{absconv}\{b_i e_i : i=1,\ldots,d\}.$$

It is not difficult to show that Dudley's bound is suboptimal in this setting [16, Exercise 2.2.15]. We will show that Theorem 1.2 yields the following optimal bound.

Proposition 3.5. In the setting of this section, we have

$$\gamma_2(B) \lesssim \Sigma := \max_{i\le d}\, b_i\sqrt{\log(i+1)}.$$

Of course, this result could easily be obtained from Theorem 1.1, and a rather difficult geometric proof using growth functionals can be found in [15, §…]. […] $B_t$.

Lemma 3.6.

For any $t\ge 0$, we have

$$B_t = \Big\{y\in B : \sum_{i=1}^d \frac{1_{y_i\ne 0}}{b_i^2} \le t^2\Big\}.$$

Proof. While $\|\cdot\|_B$ is not smooth, we can easily compute its subdifferential:

$$\partial\|y\|_B = \{z\in\mathbb{R}^d : z_i = \mathrm{sign}(y_i)/b_i \text{ if } y_i\ne 0,\ |z_i|\le 1/b_i \text{ if } y_i = 0\}.$$

We therefore obtain

$$\inf_{z\in\partial\|y\|_B}\|z\|^2 = \sum_{i=1}^d \frac{1_{y_i\ne 0}}{b_i^2},$$

and the result follows from Corollary 2.8. □

Lemma 3.6 shows that the sets $B_t$ are very thin indeed: they consist of sparse vectors. Controlling the entropy numbers of such sets is an easy exercise; for each fixed sparsity pattern we can discretize using a standard volumetric argument, while counting the number of sparsity patterns is a matter of simple combinatorics.

Lemma 3.7.

There is a universal constant $c>0$ such that for all $n\ge 0$

$$e_n(B_{c2^{n/2}/\Sigma}) \lesssim 2^{-n}\, b_1.$$

Proof. Fix $n\ge 0$. As $1/b_i^2 \ge \log(i+1)/\Sigma^2$ by definition, we have

$$B_t \subseteq C_t := \Big\{y\in B : \sum_{i=1}^d \log(i+1)\,1_{y_i\ne 0} \le \Sigma^2 t^2\Big\}.$$

It suffices to control the entropy numbers of the larger set $C_{c2^{n/2}/\Sigma}$.

Let us begin with some counting. Denote by $\mathcal{I}$ the family of all admissible sparsity patterns of $y\in C_{c2^{n/2}/\Sigma}$, that is, $\mathcal{I}$ is the family of all $I\subseteq[d]$ such that

$$\sum_{i\in I}\log(i+1) \le c^2 2^n.$$

Denote by $\mathcal{I}_k\subseteq\mathcal{I}$ the family of all $I\in\mathcal{I}$ with cardinality $|I| = k$. Let us bound the number of such sets. Setting $c := \sqrt{\log 2}/2$, we can estimate

$$|\mathcal{I}_k| = \sum_{|I|=k} 1_{I\in\mathcal{I}} = \sum_{|I|=k} 1_{\prod_{i\in I}(i+1)\le 2^{2^{n-2}}} \le 2^{2^{n-1}}\sum_{|I|=k}\prod_{i\in I}\frac{1}{(i+1)^2}.$$

The right-hand side can be bounded as follows:

$$\sum_{|I|=k}\prod_{i\in I}\frac{1}{(i+1)^2} = \sum_{1\le\ell_1<\ell_2<\cdots<\ell_k\le d}\ \prod_{i=1}^k\frac{1}{(\ell_i+1)^2} \le \prod_{i=1}^k\sum_{\ell\ge i}\frac{1}{(\ell+1)^2} < \frac{1}{k!},$$

where we have used that

$$\sum_{\ell\ge i}\frac{1}{(\ell+1)^2} < \sum_{\ell\ge i}\int_\ell^{\ell+1}\frac{1}{x^2}\,dx = \int_i^\infty\frac{1}{x^2}\,dx = \frac{1}{i}.$$

We have therefore shown that $|\mathcal{I}_k| < 2^{2^{n-1}}/k!$.

Let $\varepsilon\le b_1$ be a constant to be chosen later on. For every $I\in\mathcal{I}$, choose a minimal $\varepsilon$-net $T_I$ for the Euclidean ball in $\mathbb{R}^I$ with radius $b_1$, and denote by $T$ the union of all these sets $T_I$. Evidently $T$ is an $\varepsilon$-net for $C_{c2^{n/2}/\Sigma}$. Let us estimate its cardinality. A standard volumetric argument yields [2, Corollary 4.1.15]

$$|T_I| \le \Big(\frac{3b_1}{\varepsilon}\Big)^{|I|}.$$

We can therefore estimate

$$|T| \le \sum_{k=0}^d\Big(\frac{3b_1}{\varepsilon}\Big)^k|\mathcal{I}_k| < 2^{2^{n-1}}\, e^{3b_1/\varepsilon}.$$

If we choose $\varepsilon = (6/\log 2)\,2^{-n}\, b_1$, we find that $|T| < 2^{2^n}$, which establishes the claim whenever $2^n\ge 6/\log 2$ (as we assumed that $\varepsilon\le b_1$ in the volumetric estimate). For $2^n < 6/\log 2$, simply note the trivial bound $e_n(C_{c2^{n/2}/\Sigma}) \le \mathrm{diam}(B) \le 2b_1$. □

With this entropy estimate in hand, the proof of Proposition 3.5 is an immediate consequence of Lemma 3.7 and Theorem 1.2 with $a = c/\Sigma$.

3.3. A counterexample.

The aim of this section is to show that Theorem 1.2 does not always give sharp results. As the example that we will discuss is a conceptually important one, let us briefly consider this example in a broader context.

A remarkable consequence of Theorem 1.1 is that $\gamma_2(\mathrm{conv}(T)) \asymp \gamma_2(T)$ for any (non-convex) subset $T \subseteq \mathbb{R}^d$ of Euclidean space: as the supremum of a linear function over a convex set is attained at an extreme point, Theorem 1.1 yields
$$\gamma_2(T) \asymp \mathbf{E}\Big[\sup_{x \in T} \langle x, g\rangle\Big] = \mathbf{E}\Big[\sup_{x \in \mathrm{conv}(T)} \langle x, g\rangle\Big] \asymp \gamma_2(\mathrm{conv}(T))$$
(here $g$ denotes a standard Gaussian vector in $\mathbb{R}^d$). It is a long-standing open problem to understand the geometric mechanism behind this fundamental fact; cf. [16, §…] … $x_1, \ldots, x_n \in \mathbb{R}^d$ such that $\|x_1\| \ge \|x_2\| \ge \cdots \ge \|x_n\| > 0$, we have
$$\gamma_2(B) \lesssim \max_{i \le n} \|x_i\| \sqrt{\log(i+1)}, \qquad B = \mathrm{absconv}\{x_i : i = 1, \ldots, n\}.$$

We solved this problem in the previous section under the additional assumption that the vectors $x_i$ are orthogonal. It is not known, however, how this conclusion can be established in the absence of the orthogonality assumption. The results of this paper originated in an attempt by the author to understand this issue. We will presently illustrate that Theorem 1.2 does not directly resolve this problem.

The example that we will consider is defined as follows. Fix $0 < \varepsilon < 1$ and let $u = d^{-1/2}\mathbf{1}$, where $\mathbf{1}$ is the vector of ones (note that $\|u\| = 1$). We consider the set
$$B = \mathrm{absconv}\{x_i : i = 1, \ldots, d\}, \qquad x_i = e_i + \varepsilon u.$$
This is a small perturbation of the example in the previous section where all vertices of the simplex have been shifted along the diagonal. One can show as in [16, Exercise 2.2.15] that $\gamma_2(B) \asymp \sqrt{\log d}$, while Dudley's bound is of order $(\log d)^{3/2}$.

We claim that Theorem 1.2 does not improve on Dudley's bound in the present setting: the sets $B_t$ are not sufficiently small to gain any improvement. This unfortunate conclusion is contained in the following lemma.

Lemma 3.8.

We have $B_t \supseteq \mathrm{conv}\{x_i : i = 1, \ldots, d\}$ for all $t \ge 1/\varepsilon$.

Proof. Let $V = \sum_{i=1}^d x_i \otimes e_i$ be the square matrix whose columns are the vectors $x_i$. Note that $V$ is invertible, and we have $\|x\|_B = \|V^{-1}x\|_1$. Therefore
$$\partial\|x\|_B = (V^*)^{-1}\, \partial\|V^{-1}x\|_1 \ni (V^*)^{-1}\, \mathrm{sign}(V^{-1}x),$$
where $\mathrm{sign}(z)$ operates entrywise on a vector $z$ and we set $\mathrm{sign}(0) := 1$. In particular,
$$B_t \supseteq \{x \in B : \mathrm{sign}(V^{-1}x) \in tV^*\tilde{B}\}$$
by Corollary 2.8, where $\tilde{B}$ denotes the Euclidean unit ball in $\mathbb{R}^d$.

Now note that if $x \in \mathrm{conv}\{x_i : i = 1, \ldots, d\}$, then $V^{-1}x$ has nonnegative entries and thus $\mathrm{sign}(V^{-1}x) = \mathbf{1}$. It therefore suffices to show that $\mathbf{1} \in tV^*\tilde{B}$ whenever $t \ge 1/\varepsilon$. But this is a simple consequence of the definition of $V$, as
$$tV^*v = \mathbf{1} \quad\text{for}\quad v = \frac{u}{t(\varepsilon + d^{-1/2})},$$
and clearly $\|v\| \le 1$ when $t \ge 1/\varepsilon$. This completes the proof. □

Let $\Delta_{d-1}$ be the standard simplex in $\mathbb{R}^d$. Lemma 3.8 shows that $B_t \supseteq \Delta_{d-1} + \varepsilon u$ whenever $t \ge 1/\varepsilon$. Setting $n_{a,\varepsilon} = (2\log_2(1/a\varepsilon))_+$, we can estimate
$$\sum_{n \ge 0} 2^{n/2} e_n(B_{a2^{n/2}}) \ge \sum_{n \ge n_{a,\varepsilon}} 2^{n/2} e_n(\Delta_{d-1}) \gtrsim (\log d)^{3/2} - C n_{a,\varepsilon} \sqrt{\log d}$$
for some constant $C > 0$, where we used that $e_n(\Delta_{d-1}) \gtrsim 2^{-n/2}\sqrt{\log d}$ for $n \lesssim \log d$ [16, Exercise 2.2.15]. We have therefore shown that Theorem 1.2 does not improve on Dudley's bound in this example unless $\varepsilon$ is polynomially small in $d$.

Remark 3.9.

Of course, the example described in this section is sufficiently simple that we can make some manual adjustments to obtain a sharp geometric construction. Indeed, we clearly have $B \subset B_1 + B_2$ where $B_1$ denotes the $\ell_1$-ball in $\mathbb{R}^d$ and $B_2 = \{\alpha u : |\alpha| \le \varepsilon\}$ is one-dimensional. Theorem 1.2 gives a sharp generic chaining construction for $B_1$, while a trivial discretization of $\alpha$ suffices to control $B_2$. We can then glue together the generic chaining constructions for $B_1$ and $B_2$ by summing the corresponding nets. It is not clear, however, how one could construct such a decomposition in the general setting described at the beginning of this section.

4. Geometry and Entropy Contraction

In the previous section, we illustrated the utility of Theorem 1.2 in specific examples. The computations hinge, however, on a sufficiently explicit description of the sets $B_t$, which may not always be available in more general situations. For example, if we consider the examples of the previous section under general norms, it may be nontrivial to control the sets $B_t$ directly. It is therefore of interest to develop more systematic methods to control the geometry of the sets $B_t$.

As a prototype of what one might hope for, let us reconsider the setting of $\ell_q$-ellipsoids in Hilbert space. Theorem 1.2 bounds $\gamma_2(B)$ in terms of the entropy numbers of the sets $B_t$, which we computed explicitly in section 3.1. However, Lemma 3.2 suggests that the correct behavior of $\gamma_2(B)$ in this example can also be expressed in terms of the entropy numbers of $B$ itself: we easily verify that
$$\gamma_2(B) \asymp \bigg[\sum_{n \ge 0} \big(2^{n/2} e_n(B)\big)^{q/(q-1)}\bigg]^{(q-1)/q}.$$
The appearance of such a bound is not a coincidence. Talagrand has shown that an upper bound of this form holds for any $q$-convex set $B$ [16, §…]. As $\ell_q$-ellipsoids are $\max(2,q)$-convex, this provides an alternative explanation for the behavior of $\ell_q$-ellipsoids in the case $2 \le q < \infty$. One of the insights to be developed in this section is that this fundamental property of $q$-convex sets is fully explained by Theorem 1.2. Roughly speaking, we will show that the $q$-convexity assumption forces the sets $B_t$ to be much smaller than $B$ itself in the sense that $e_n(B_t) \lesssim t^{1/(q-1)} e_n(B)^{q/(q-1)}$, from which the above bound is easily deduced. More generally, this phenomenon suggests that the chaining principle for general convex sets given by Theorem 1.2 can be significantly simplified in the presence of additional geometric structure.

It turns out that there is nothing special about $q$-convexity per se, but that the entropy contraction phenomenon illustrated above arises from a much more general geometric mechanism.
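To get a feeling for how much a bound of the displayed form can gain over a Dudley-type sum, the following minimal sketch evaluates both expressions on a model sequence. The sequence $e_n = 2^{-n/2}$ for $n < N$ (and zero afterwards) is a hypothetical choice made purely for illustration and is not taken from any example in the text; the function names are likewise illustrative.

```python
def dudley_sum(entropy, p=2.0):
    """Dudley-type sum: sum over n of 2^(n/p) * e_n."""
    return sum(2.0 ** (n / p) * e for n, e in enumerate(entropy))


def contracted_functional(entropy, q, p=2.0):
    """[sum over n of (2^(n/p) * e_n)^(q/(q-1))]^((q-1)/q)."""
    r = q / (q - 1.0)
    total = sum((2.0 ** (n / p) * e) ** r for n, e in enumerate(entropy))
    return total ** (1.0 / r)


# Hypothetical model sequence (an assumption, not from the paper):
# e_n = 2^(-n/2) for n < N, so that every term 2^(n/2) * e_n equals 1.
N = 16
model_entropy = [2.0 ** (-n / 2.0) for n in range(N)]

dudley = dudley_sum(model_entropy)                        # roughly N
contracted = contracted_functional(model_entropy, q=2.0)  # roughly sqrt(N)
```

Since the second expression is an $\ell_{q/(q-1)}$-norm of the same sequence whose $\ell_1$-norm is the Dudley sum, it can never be larger; on this model sequence with $q = 2$ the gap is the difference between $N$ and $\sqrt{N}$.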
We develop a general formulation of this idea in section 4.1. We then demonstrate how the requisite structure arises in two distinct settings: the case of $q$-convex sets is developed in section 4.2, while the case of $\ell_q$-balls in Banach spaces with an unconditional basis is developed in section 4.3.

4.1. A geometric principle.

Let $(X, \|\cdot\|)$ be a Banach space and let $B \subset X$ be a symmetric compact convex set. The sets $B_t$ are defined as in Theorem 1.2. The following geometric principle is the main result of this section.

Theorem 4.1.

Let $q > 1$ and $K > 0$ be given constants, and suppose that
$$\|y - z\|_B^q \le Kt\,\|y - z\| \quad\text{for every } y, z \in B_t,\ t \ge 0.$$
Then
$$\gamma_p(B) \lesssim \bigg[\sum_{n \ge 0} \big(2^{n/p} e_n(B)\big)^{q/(q-1)}\bigg]^{(q-1)/q},$$
where the universal constant depends on $p$, $q$, and $K$ only.

Like Theorem 1.2, the message of Theorem 4.1 is that the behavior of $\gamma_p(B)$ is strictly better than would be expected from Dudley's bound. Unlike Theorem 1.2, however, the presence of additional geometric structure allows us to bound $\gamma_p(B)$ only in terms of the entropy numbers of $B$ itself. This bound could therefore be applied even without an explicit description of $B_t$. Of course, there is no free lunch: the assumption of Theorem 4.1 requires us to understand the metric structure of the sets $B_t$. Fortunately, we will see in the sequel that there are interesting situations in which this can be accomplished without explicitly computing the sets $B_t$.

Remark 4.2.

Before we turn to the proof of Theorem 4.1, it is instructive to consider the significance of the geometric assumption of Theorem 4.1. Observe that we always have, regardless of any assumptions, the following simple fact:
$$\|y\|_B \le t\,\|y\| \quad\text{for every } y \in B_t,\ t \ge 0.$$
Indeed, if $z \in X^*$ is as in the definition of $B_t$, then
$$\|y\|_B = \langle z, y\rangle \le \|z\|^* \|y\| \le t\,\|y\|.$$
We therefore see that by construction, an element $y \in B_t$ with small norm must be contained in a small dilation $y \in t\|y\| B$ of the original set $B$. The assumption of Theorem 4.1 asks that a weaker form of this property hold not only for norms, but also for distances: that is, if $y, z \in B_t$, then $y - z \in (Kt\|y - z\|)^{1/q} B$. This does not follow automatically from the corresponding property for norms, as it is typically not true that $B_t - B_t \subseteq cB_{ct}$ for some constant $c$. Nonetheless, this intuition proves to be useful as it will help us identify how the requisite geometric structure arises.

The main idea behind the proof of Theorem 4.1 is the following observation.

Lemma 4.3.

Suppose that the assumption of Theorem 4.1 holds. Then
$$e_{n+1}(B_t) \le (Kt\, e_n(B_t))^{1/q}\, e_n(B) \quad\text{for every } n \ge 0,\ t \ge 0.$$

Proof.

Fix $\varepsilon > 0$. By the definition of entropy numbers, we can cover $B_t$ by less than $2^{2^n}$ balls of radius $(1+\varepsilon)e_n(B_t)$. By our assumption, each of these balls (intersected with $B_t$) is contained in a translate of $sB$ with $s \le (1+\varepsilon)^{1/q}(Kt\, e_n(B_t))^{1/q}$. Therefore, each of these balls can be further covered by less than $2^{2^n}$ balls of radius $(1+\varepsilon)s\, e_n(B)$. We have now covered $B_t$ by less than $2^{2^n} \cdot 2^{2^n} = 2^{2^{n+1}}$ balls of radius $\le (1+\varepsilon)^{1+1/q}(Kt\, e_n(B_t))^{1/q} e_n(B)$. Letting $\varepsilon \downarrow 0$ completes the proof. □

An annoying feature of Lemma 4.3 is that the entropy number on the left-hand side is $e_{n+1}(B_t)$ rather than $e_n(B_t)$. If it were the case that $e_n(B_t) \lesssim e_{n+1}(B_t)$ (that is, if we knew a priori that the entropy numbers do not decay too quickly), then we could simplify the conclusion of Lemma 4.3 to
$$e_n(B_t) \lesssim t^{1/(q-1)}\, e_n(B)^{q/(q-1)}.$$
This expression quantifies in the present setting in what sense the sets $B_t$ are much smaller than the original set $B$. From this expression, it would be easy to conclude the result of Theorem 4.1: substituting the above bound into Theorem 1.2 yields
$$\gamma_p(B) \lesssim \frac{1}{a} + a^{1/(q-1)} \sum_{n \ge 0} \big(2^{n/p} e_n(B)\big)^{q/(q-1)},$$
and the conclusion of Theorem 4.1 would follow by optimizing over $a > 0$. The main technical issue in the proof of Theorem 4.1 is to show that its conclusion remains valid even when the regularity assumption $e_n(B_t) \lesssim e_{n+1}(B_t)$ does not hold, which we do by means of a routine dyadic regularization argument.

Proof of Theorem 4.1.

Fix a constant $\lambda > 0$. For any set $C$, we introduce the regularized entropy numbers $d_n(C) \ge e_n(C)$ as
$$d_n(C) := \max_{0 \le k \le n} 2^{\lambda(k-n)} e_k(C).$$
Using Lemma 4.3, we estimate
$$\begin{aligned}
d_n(B_t) &\le \max_{0 \le k \le n+1} 2^{\lambda(k-n)} e_k(B_t) \\
&\le 2^{-\lambda n}\,\mathrm{diam}(B) + 2^{\lambda} \max_{0 \le k \le n} 2^{\lambda(k-n)} e_{k+1}(B_t) \\
&\lesssim 2^{-\lambda n}\,\mathrm{diam}(B) + 2^{\lambda} t^{1/q} \max_{0 \le k \le n} 2^{\lambda(k-n)} e_k(B_t)^{1/q} e_k(B) \\
&\le 2^{-\lambda n}\,\mathrm{diam}(B) + 2^{\lambda} t^{1/q} d_n(B_t)^{1/q} \max_{0 \le k \le n} 2^{\lambda(k-n)(q-1)/q} e_k(B).
\end{aligned}$$
Therefore, using $a^{1/q} b^{(q-1)/q} \le a/q + b(q-1)/q$, we obtain
$$d_n(B_t) \lesssim 2^{-\lambda n}\,\mathrm{diam}(B) + 2^{\lambda q/(q-1)} t^{1/(q-1)} \max_{0 \le k \le n} 2^{\lambda(k-n)} e_k(B)^{q/(q-1)}.$$
In particular, we can crudely bound
$$\sum_{n \ge 0} 2^{n/p} e_n(B_{a2^{n/p}}) \lesssim \mathrm{diam}(B) \sum_{n \ge 0} 2^{n/p} 2^{-\lambda n} + a^{1/(q-1)} 2^{\lambda q/(q-1)} \sum_{n \ge 0} 2^{nq/(q-1)p}\, 2^{-\lambda n} \sum_{0 \le k \le n} 2^{\lambda k} e_k(B)^{q/(q-1)}.$$
In order for the sums to converge we must choose $\lambda > q/(q-1)p$, so we fix for concreteness $\lambda = 2q/(q-1)p$ (the precise value of $\lambda$ does not matter). This yields
$$\begin{aligned}
\sum_{n \ge 0} 2^{n/p} e_n(B_{a2^{n/p}}) &\lesssim \mathrm{diam}(B) + a^{1/(q-1)} \sum_{n \ge 0} 2^{-nq/(q-1)p} \sum_{0 \le k \le n} \big(2^{2k/p} e_k(B)\big)^{q/(q-1)} \\
&= \mathrm{diam}(B) + a^{1/(q-1)} \sum_{k \ge 0} \sum_{n \ge k} 2^{-nq/(q-1)p} \big(2^{2k/p} e_k(B)\big)^{q/(q-1)} \\
&\lesssim \mathrm{diam}(B) + a^{1/(q-1)} \sum_{k \ge 0} \big(2^{k/p} e_k(B)\big)^{q/(q-1)}.
\end{aligned}$$
Applying Corollary 2.7 and optimizing over $a > 0$ yields
$$\gamma_p(B) \lesssim \mathrm{diam}(B) + \bigg[\sum_{n \ge 0} \big(2^{n/p} e_n(B)\big)^{q/(q-1)}\bigg]^{(q-1)/q}.$$
It remains to note that $\mathrm{diam}(B) \le 2e_0(B)$, so that the first term can be absorbed in the second at the expense of the universal constant. □

Remark 4.4.

An inspection of the proof shows that the universal constant in Theorem 4.1 blows up as $q \downarrow 1$. It would be interesting to understand whether there is an analogue of Theorem 4.1 that holds in the limiting case $q = 1$: that is, whether there is a general geometric mechanism that ensures the sharp bound
$$\gamma_p(B) \asymp \sup_{n \ge 0} 2^{n/p} e_n(B)$$
(that the right-hand side is a lower bound on $\gamma_p(B)$ is trivial). This situation is illustrated by the example of section 3.2: in this case both the assumption and the conclusion of Theorem 4.1 hold for $q = 1$ (the assumption holds by Remark 4.2 and $B_t - B_t \subseteq 2B_{\sqrt{2}t}$, while the conclusion can be deduced from [16, Exercise 2.2.15]), but Theorem 4.1 is not sufficiently sharp to capture this example.

4.2. Uniformly convex sets.

In this section, we exhibit an important situation where the assumption of Theorem 4.1 can be verified by imposing additional geometric structure on the set $B$: we show that the assumption holds when $B$ is $q$-convex. This recovers a fundamental result of Talagrand [16, §…].

Let $(X, \|\cdot\|)$ be any Banach space, and let $B \subset X$ be a symmetric convex set. As usual, we denote by $\|\cdot\|_B$ the gauge of $B$. We recall the following definition.

Definition 4.5.

Let $q \ge 2$. A symmetric convex set $B$ is called $q$-convex if
$$\Big\|\frac{x + y}{2}\Big\|_B \le 1 - \eta\,\|x - y\|_B^q$$
for all $x, y \in B$, where $\eta > 0$ is a constant.

Corollary 4.6 ([16]). Let $B$ be a symmetric convex set in a Banach space $(X, \|\cdot\|)$, and assume that $B$ is $q$-convex (with constant $\eta$). Then
$$\gamma_p(B) \lesssim \bigg[\sum_{n \ge 0} \big(2^{n/p} e_n(B)\big)^{q/(q-1)}\bigg]^{(q-1)/q},$$
where the universal constant depends on $p$, $q$, and $\eta$ only.

To connect this result to the explicit computations in section 3.1, we recall that $\ell_q$-ellipsoids are $\max(2,q)$-convex [3]. This shows that the case $2 \le q < \infty$ of Proposition 3.1 is in fact a manifestation of the much more general phenomenon described by Corollary 4.6: we emphasize that the present result requires no assumption of any kind on the norm $\|\cdot\|$. On the other hand, it is impossible for a convex set to be $q$-convex with $q < 2$ … $\ell_q$-ellipsoids for $q < 2$. We will nonetheless see in the next section that the latter case can also be understood as a manifestation of the general geometric principle described by Theorem 4.1.

We prove Corollary 4.6 by verifying the assumption of Theorem 4.1.
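As a concrete instance of Definition 4.5, the Euclidean unit ball is $2$-convex: the parallelogram identity gives $\|(x+y)/2\|^2 \le 1 - \|x-y\|^2/4$ for $x, y$ in the ball, and hence $\|(x+y)/2\| \le 1 - \|x-y\|^2/8$. The following sketch checks this numerically on random pairs. The constant $\eta = 1/8$, the dimension, and the helper names are choices made for this illustration and do not come from the text.

```python
import math
import random


def euclidean_norm(v):
    return math.sqrt(sum(a * a for a in v))


def random_point_in_unit_ball(dim, rng):
    """Uniform direction from a Gaussian vector, radius distributed as U^(1/dim)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = rng.random() ** (1.0 / dim) / euclidean_norm(g)
    return [scale * a for a in g]


def q_convexity_slack(x, y, eta=0.125, q=2.0):
    """Return 1 - eta * ||x - y||^q - ||(x + y) / 2|| for the Euclidean norm."""
    mid = euclidean_norm([(a + b) / 2.0 for a, b in zip(x, y)])
    dist = euclidean_norm([a - b for a, b in zip(x, y)])
    return 1.0 - eta * dist ** q - mid


rng = random.Random(0)  # fixed seed so the check is reproducible
min_slack = min(
    q_convexity_slack(random_point_in_unit_ball(5, rng), random_point_in_unit_ball(5, rng))
    for _ in range(10_000)
)
```

A nonnegative `min_slack` (up to rounding) is consistent with $2$-convexity of the Euclidean ball with $\eta = 1/8$; it is of course only a sanity check, not a proof.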

Lemma 4.7.

Let $B$ be a $q$-convex set and $t \ge 0$. Then
$$\|y - z\|_B^q \lesssim t\,\|y - z\| \quad\text{for every } y, z \in B_t,$$
where the universal constant depends on $q$ and $\eta$ only.

We will give two different proofs of this lemma. The first proof is pedestrian, but perhaps not very intuitive. The second proof is more intuitive, as it is close in spirit to the intuition developed in Remark 4.2; however, this proof requires us to use an alternative (but equivalent) formulation of the $q$-convexity property.

First proof.

By Proposition 2.5, we have $\pi_t(y) = y$ for $y \in B_t$. Thus
$$\|y\|_B = \inf_u \{\|u\|_B + t\,\|y - u\|\} \le \Big\|\frac{y + z}{2}\Big\|_B + t\,\Big\|\frac{y - z}{2}\Big\|$$
for any $y, z \in B_t$. Similarly, exchanging the role of $y$ and $z$, we obtain
$$1 \le \Big\|\frac{y + z}{2\gamma}\Big\|_B + t\,\Big\|\frac{y - z}{2\gamma}\Big\|, \qquad \gamma := \|y\|_B \vee \|z\|_B.$$
But note that $\|y/\gamma\|_B \le 1$ and $\|z/\gamma\|_B \le 1$ by the definition of $\gamma$. Therefore, applying the $q$-convexity assumption to the first term on the right yields
$$\|y - z\|_B^q \le \frac{\gamma^{q-1}}{2\eta}\, t\,\|y - z\|$$
for any $y, z \in B_t$. The proof is completed by noting that $\gamma \le 1$. □

Second proof.

An equivalent characterization of the $q$-convexity property is as follows [17, Corollary 1]: $B$ is $q$-convex if and only if
$$\langle j_y - j_z, y - z\rangle \gtrsim \|y - z\|_B^q$$
for all $j_y \in J_y := \{u \in X^* : \langle u, y\rangle = \|y\|_B^q,\ \|u\|_B^* \le \|y\|_B^{q-1}\}$ and $j_z \in J_z$, where the universal constant depends on $q, \eta$ only. Note that $J_y$ is none other than the subdifferential of the map $y \mapsto \|y\|_B^q/q$ (cf. Corollary 2.8), so this characterization is rather intuitive: $B$ is $q$-convex precisely when the map $y \mapsto \|y\|_B^q$ exhibits a uniform improvement over the usual first-order condition for convexity.

With this formulation in hand, the lemma follows easily. Let $y, z \in B_t$. By definition of $B_t$, we can choose $u_y \in X^*$ with $\langle u_y, y\rangle = \|y\|_B$, $\|u_y\|_B^* \le 1$, $\|u_y\|^* \le t$. Choose $u_z \in X^*$ analogously. Setting $j_y = u_y \|y\|_B^{q-1}$ and $j_z = u_z \|z\|_B^{q-1}$ gives
$$\|y - z\|_B^q \lesssim \langle j_y - j_z, y - z\rangle \le \|j_y - j_z\|^*\, \|y - z\| \le 2t\,\|y - z\|.$$
This completes the proof. □
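As a sanity check on this characterization, consider the simplest case. Assume (for this illustration only) that $X = \mathbb{R}^d$ and that the gauge $\|\cdot\|_B$ is induced by an inner product $(\cdot,\cdot)_B$, so that $q = 2$. Then $J_y$ consists of the single functional $j_y = (y, \cdot)_B$, and the inequality of the characterization holds with equality and constant one:

```latex
\begin{aligned}
\langle j_y, y\rangle &= (y, y)_B = \|y\|_B^2, \qquad \|j_y\|_B^* = \|y\|_B, \\
\langle j_y - j_z,\, y - z\rangle &= (y - z,\, y - z)_B = \|y - z\|_B^2 .
\end{aligned}
```

In this special case the argument of the second proof gives $\|y - z\|_B^2 \le 2t\,\|y - z\|$ for $y, z \in B_t$ with an explicit constant, since the norming functional $u_y = j_y/\|y\|_B$ is unique and therefore satisfies $\|u_y\|^* \le t$ whenever $y \in B_t$.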

It is now trivial to complete the proof of Corollary 4.6.

Proof of Corollary 4.6.

We may as well assume that $B$ is compact: if $B$ is not precompact, the right-hand side of the desired inequality is infinite and there is nothing to prove; if $B$ is precompact, there is no loss of generality in assuming that it is also closed. It remains to apply Theorem 4.1 and Lemma 4.7. □

4.3. $\ell_q$-balls and unconditional bases.

We have seen in the previous section that uniform convexity cannot explain the behavior of $\ell_q$-ellipsoids in Hilbert space that was observed in section 3.1. We will presently show that this behavior is nonetheless a manifestation of the general geometric principle of Theorem 4.1. It will follow immediately that the same behavior persists in a much larger family of Banach spaces (but not in a setting as general as for $q$-convex sets).

To understand what is going on, let us take inspiration from the second proof of Lemma 4.7 (and from Remark 4.2). For any $x \in X$, choose any point $j_x \in X^*$ such that $\langle j_x, x\rangle = \|x\|_B^q$ and $\|j_x\|_B^* \le \|x\|_B^{q-1}$. As
$$\|y - z\|_B^q = \langle j_{y-z}, y - z\rangle \le \|j_{y-z}\|^*\, \|y - z\|,$$
the assumption of Theorem 4.1 would follow if we could show that $\|j_{y-z}\|^* \lesssim t$ whenever $y, z \in B_t$. We can always choose $\|j_x\|^* \le t$ when $x \in B_t$, but this does not in itself yield the desired result: $y, z \in B_t$ does not imply $y - z \in B_t$.

To obtain the desired bound, we must find a relation between $j_{y-z}$ and $j_y, j_z$. The $q$-convexity assumption provides the inequality $\langle j_{y-z}, y - z\rangle \lesssim \langle j_y - j_z, y - z\rangle$, which is particularly convenient for this purpose. However, this is by no means the only way to achieve our goal. In the case of $\ell_q$-ellipsoids, we will use a completely different geometric property: in this case we observe that $|j_{y-z}| \lesssim |j_y| + |j_z|$ coordinatewise. This simple device allows us to reach the same conclusion as in the $q$-convex case as long as the dual norm $\|\cdot\|^*$ respects the coordinatewise ordering.

We proceed to make this idea precise.
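For the $\ell_q$-norm one may take $j_x = |x|^{q-1}\,\mathrm{sign}(x)$ coordinatewise, so the coordinatewise bound just mentioned reduces to the scalar inequality $|y - z|^{q-1} \le 2^{(q-2)_+}(|y|^{q-1} + |z|^{q-1})$. The sketch below checks this inequality on a grid; the grid, the sample exponents, and the function name are arbitrary choices made for this illustration.

```python
import itertools


def coordinate_slack(y, z, q):
    """Return 2^((q-2)_+) * (|y|^(q-1) + |z|^(q-1)) - |y - z|^(q-1) for scalars y, z."""
    constant = 2.0 ** max(q - 2.0, 0.0)
    return constant * (abs(y) ** (q - 1.0) + abs(z) ** (q - 1.0)) - abs(y - z) ** (q - 1.0)


grid = [k / 4.0 for k in range(-8, 9)]   # scalars in [-2, 2]
exponents = [1.25, 1.5, 2.0, 3.0, 4.5]   # sample values of q in (1, infinity)
worst_slack = min(
    coordinate_slack(y, z, q)
    for y, z in itertools.product(grid, repeat=2)
    for q in exponents
)
```

The slack vanishes at $y = -z$ when $q \ge 2$, which indicates that the constant $2^{(q-2)_+}$ cannot be improved in that range.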
We first recall the class of Banach spaces that possess the desired monotonicity properties [1, §…].

Definition 4.8.

Let $(X, \|\cdot\|)$ be a Banach space and let $\{e_n\}$ be a basis for $X$. The basis is said to be unconditional with constant $K$ if
$$\Big\|\sum_{n=1}^N a_n e_n\Big\| \le K\, \Big\|\sum_{n=1}^N b_n e_n\Big\|$$
for all $N \in \mathbb{N}$ and scalars $a_n, b_n \in \mathbb{R}$ such that $|a_n| \le |b_n|$ for all $n$.

We recall for future reference that if $\{e_n\}$ is an unconditional basis in $X$ with constant $K$, then the biorthogonal sequence $\{e_n^*\}$ is an unconditional basic sequence in $X^*$ with the same constant $K$ [1, Proposition 3.2.1].

Remark 4.9.

The notion of a $K$-unconditional basis is often defined in a slightly different way than we have done above: a basis is unconditional with constant $K$ if
$$\Big\|\sum_{n=1}^N \varepsilon_n b_n e_n\Big\| \le K\, \Big\|\sum_{n=1}^N b_n e_n\Big\|$$
for all $N \in \mathbb{N}$, $b_n \in \mathbb{R}$, and $\varepsilon_n \in \{-1, +1\}$, that is, if the norm of $\sum_{n=1}^N b_n e_n$ is approximately invariant to sign changes of the coefficients $b_n$. The more general property of Definition 4.8 is however readily deduced from this alternative definition (for example, by choosing random signs $\varepsilon_n$ such that $a_n = \mathbf{E}[\varepsilon_n b_n]$).

In the following let $(X, \|\cdot\|)$ be a Banach space and let $\{e_n\}$ be an unconditional basis with constant $K$. Fix $1 < q < \infty$, and define the $\ell_q$-ball $B \subset X$ as follows:
$$B = \Big\{\sum_{i=1}^d z_i e_i : \sum_{i=1}^d |z_i|^q \le 1\Big\}$$
(our result will be independent of $d$, and therefore extends readily to infinite dimension). Note that the $\ell_q$-ellipsoids considered in section 3.1 correspond to the special case where $\{e_i\}$ is the standard basis in $\mathbb{R}^d$ and $\|x\|^2 = \sum_i b_i^2 x_i^2$.

Corollary 4.10.

In the setting of this section, we have
$$\gamma_p(B) \lesssim \bigg[\sum_{n \ge 0} \big(2^{n/p} e_n(B)\big)^{q/(q-1)}\bigg]^{(q-1)/q},$$
where the universal constant depends on $p$, $q$, and $K$ only.

Proof. The norm $\|\cdot\|$ on $X$ can be transferred to $\mathbb{R}^d$ by defining $\|z\| := \|\sum_{i=1}^d z_i e_i\|$ for $z \in \mathbb{R}^d$. There is therefore no loss of generality in assuming that $X = \mathbb{R}^d$ with the above norm, that $\{e_i\} = \{e_i^*\}$ is the standard basis, and that $\|x\|_B$ is the $\ell_q$-norm on $\mathbb{R}^d$, as we will do in the sequel for notational simplicity. (We emphasize, however, that $\|\cdot\|$ is not the Euclidean norm, so that the present setting does not reduce to the Euclidean setting considered previously in section 3.1.)

As $\|x\|_B$ is the $\ell_q$-norm, we can compute
$$\frac{\partial \|x\|_B}{\partial x_i} = \frac{|x_i|^{q-1}}{\|x\|_B^{q-1}}\, \mathrm{sign}(x_i).$$
By Corollary 2.8, we can write
$$B_t = \{x \in B : \||x|^{q-1}\mathrm{sign}(x)\|^* \le t\,\|x\|_B^{q-1}\}.$$
Now note that for any vectors $x, y \in \mathbb{R}^d$, we have
$$\|x - y\|_B^q = \langle |x - y|^{q-1}\mathrm{sign}(x - y),\, x - y\rangle \le \||x - y|^{q-1}\mathrm{sign}(x - y)\|^*\, \|x - y\|.$$
Moreover, as $|x - y|^{q-1} \le 2^{(q-2)_+}(|x|^{q-1} + |y|^{q-1})$, we have
$$\||x - y|^{q-1}\mathrm{sign}(x - y)\|^* \le 2^{(q-2)_+} K\, \||x|^{q-1} + |y|^{q-1}\|^* \le 2^{(q-2)_+ + 1} K^2 t$$
for all $x, y \in B_t$ using the unconditional property of the dual basis $\{e_n^*\}$. Thus
$$\|x - y\|_B^q \le 2^{(q-2)_+ + 1} K^2 t\, \|x - y\|$$
whenever $x, y \in B_t$, and it remains to invoke Theorem 4.1. □

We have now given two distinct explanations for the behavior of $\ell_q$-ellipsoids observed in section 3.1. When $q \ge 2$, such sets are $q$-convex and the result follows from the general principle described by Corollary 4.6. In this setting, the result remains valid when $\|\cdot\|$ is an arbitrary norm. When $q < 2$, the observed behavior is described by Corollary 4.10, which exploits a more special geometric property of $\ell_q$-balls. In this setting, the result also remains valid for a large class of norms $\|\cdot\|$, but we require the additional restriction that the underlying basis is unconditional. It appears that these two cases possess a genuinely different geometry, which is completely hidden in the statement of Proposition 3.1.

Acknowledgments.

The author would like to thank the anonymous referees forhelpful comments that improved the presentation of this paper.

References

[1] F. Albiac and N. J. Kalton. Topics in Banach space theory, volume 233 of Graduate Texts in Mathematics. Springer, New York, 2006.
[2] S. Artstein-Avidan, A. Giannopoulos, and V. D. Milman. Asymptotic geometric analysis. Part I, volume 202 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2015.
[3] B. Beauzamy. Introduction to Banach spaces and their geometry, volume 68 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam-New York, 1982.
[4] C. Bennett and R. Sharpley. Interpolation of operators, volume 129 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA, 1988.
[5] B. Carl. Entropy numbers of diagonal operators with an application to eigenvalue problems. J. Approx. Theory, 32(2):135–150, 1981.
[6] R. A. DeVore and V. A. Popov. Interpolation spaces and nonlinear approximation. In Function spaces and applications (Lund, 1986), volume 1302 of Lecture Notes in Math., pages 191–205. Springer, Berlin, 1988.
[7] R. M. Dudley. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis, 1:290–330, 1967.
[8] D. E. Edmunds and H. Triebel. Function spaces, entropy numbers, differential operators, volume 120 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1996.
[9] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms. I, volume 305 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1993.
[10] A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in function spaces. Uspehi Mat. Nauk, 14(2 (86)):3–86, 1959.
[11] M. Ledoux and M. Talagrand. Probability in Banach spaces, volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag, Berlin, 1991.
[12] J. Peetre and G. Sparr. Interpolation of normed abelian groups. Ann. Mat. Pura Appl. (4), 92:217–262, 1972.
[13] A. Pietsch. Approximation spaces. J. Approx. Theory, 32(2):115–134, 1981.
[14] V. N. Sudakov. Gauss and Cauchy measures and ε-entropy. Dokl. Akad. Nauk SSSR, 185:51–53, 1969.
[15] M. Talagrand. Majorizing measures: the generic chaining. Ann. Probab., 24(3):1049–1103, 1996.
[16] M. Talagrand. Upper and lower bounds for stochastic processes, volume 60 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, Heidelberg, 2014.
[17] H. K. Xu. Inequalities in Banach spaces with applications. Nonlinear Anal., 16(12):1127–1138, 1991.

Sherrerd Hall Room 227, Princeton University, Princeton, NJ 08544, USA
