On nearly radial marginals of high-dimensional probability measures
arXiv [math.FA]
Bo'az Klartag∗

Abstract
Suppose that µ is an absolutely continuous probability measure on R^n, for large n. Then µ has low-dimensional marginals that are approximately spherically-symmetric. More precisely, if n ≥ (C/ε)^{Cd}, then there exist d-dimensional marginals of µ that are within ε of being spherically-symmetric, in an appropriate sense. Here C > 0 is a universal constant.

1 Introduction

The purpose of this paper is to clarify a seven-line paragraph by Gromov [11, Section 1.2.F]. We are interested in projections of high-dimensional probability measures. Not all probability measures on R^n, for large n, are truly n-dimensional. For instance, a measure supported on an atom or two should not be considered high-dimensional. Roughly speaking, we think of a probability measure on a linear space as decently high-dimensional if any subspace of bounded dimension contains only a small fraction of the total mass.

Definition 1.1
Let µ be a Borel probability measure on R^n and ε > 0. We say that µ is "decently high-dimensional with parameter ε", or "ε-decent" in short, if for any linear subspace E ⊆ R^n,

  µ(E) ≤ ε dim(E).   (1)

We say that µ is decent if it is ε-decent for ε = 1/n, the minimal possible value of ε.

Clearly, all absolutely continuous probability measures on R^n are decent, as are many discrete measures. Note that a decent measure µ necessarily satisfies µ({0}) = 0; however, this feature should not be taken too seriously. A measure µ is "weakly ε-decent" if (1) holds for all subspaces E ⊆ R^n except E = {0}.

For a measure µ on a measurable space Ω and a measurable map T : Ω → Ω′, we write T_*(µ) for the push-forward of µ under T, i.e., T_*(µ)(A) = µ(T^{-1}(A)) for all measurable sets A ⊆ Ω′. When µ is a probability measure on R^n and T : R^n → R^ℓ is a linear map with ℓ < n, we say that T_*(µ) is a marginal of µ, or a measure projection of µ.

The classical Dvoretzky theorem asserts that appropriate geometric projections of any high-dimensional convex body are approximately Euclidean balls (see Milman [23] and references therein). The analogous statement for probability measures should perhaps be the following (see Gromov [11]): Appropriate measure projections of any decent high-dimensional probability measure are approximately spherically-symmetric. When can we say that a probability measure µ on R^d is approximately radially-symmetric?

We need some notation. Let µ be a finite measure on a measurable space Ω. For a subset A ⊆ Ω with µ(A) > 0 we write µ|_A for the conditioning of µ on A, i.e., µ|_A(B) = µ(A ∩ B)/µ(A) for any measurable set B ⊆ Ω. Write S^{d-1} for the unit sphere centered at the origin in R^d. The uniform probability measure on the sphere S^{d-1} is denoted by σ_{d-1}.

∗Supported in part by the Israel Science Foundation and by a Marie Curie Reintegration Grant from the Commission of the European Communities.
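As a toy illustration of Definition 1.1, the following minimal sketch (pure Python; function name ours) computes the smallest ε for which (1) holds for a discrete measure, with the simplifying assumption that we test only coordinate subspaces rather than all linear subspaces. For the uniform measure on the standard basis vectors, every coordinate subspace E of dimension m carries mass m/n, so the measure sits exactly at the decency threshold ε = 1/n.

```python
from itertools import combinations

def coord_subspace_mass(atoms, weights, coords):
    # Mass the discrete measure gives to span{e_j : j in coords}:
    # an atom lies in that span iff all its other coordinates vanish.
    return sum(w for a, w in zip(atoms, weights)
               if all(abs(a[j]) < 1e-12 for j in range(len(a)) if j not in coords))

n = 6
atoms = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # e_1, ..., e_n
weights = [1.0 / n] * n
# Smallest eps for which (1) holds over coordinate subspaces E:
eps = max(coord_subspace_mass(atoms, weights, set(c)) / len(c)
          for m in range(1, n + 1) for c in combinations(range(n), m))
print(eps)  # 1/n: uniform mass on a basis is decent, and only just
```

This is only a finite check over 2^n − 1 coordinate subspaces; the definition quantifies over the continuum of all linear subspaces.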
For two probability measures µ and ν on the sphere S^{d-1} and 1 ≤ p < ∞ we write W_p(µ, ν) for the L^p Monge-Kantorovich transportation distance between µ and ν in the sphere S^{d-1} endowed with the geodesic distance (see, e.g., [33] or Section 2 below). The metrics W_p are all equivalent (we have W_1 ≤ W_p ≤ πW_1^{1/p}) and they metrize weak convergence of probability measures.

For an interval J ⊂ (0, ∞) we consider the spherical shell

  S(J) = {x ∈ R^d ; |x| ∈ J},

where |·| is the standard Euclidean norm in R^d. The radial projection in R^d is the map R(x) = x/|x|. An interval is either open, closed or half-open and half-closed.

Definition 1.2 (Gromov [12])
Let µ be a Borel probability measure on R^d and let ε > 0. We say that µ is "ε-radial" if for any interval J ⊂ (0, ∞) with µ(S(J)) ≥ ε, we have

  W_1( R_*(µ|_{S(J)}), σ_{d-1} ) ≤ ε.

That is, when we condition µ on any spherical shell that contains at least an ε-fraction of the mass, and then project radially to the sphere, we obtain an approximation to the uniform probability measure on the sphere in the transportation-metric sense. Note that this definition is scale-invariant.

We think of the dimension n from Definition 1.1 as a very large number, tending to infinity. On the other hand, we usually view the dimension d in Definition 1.2 as being fixed, and typically not very large. The case d = 1 of Definition 1.2 corresponds to the measure being approximately even. We are not sure whether Dirac's measure δ_0 is a good example of an ε-radial measure. An ε-radial measure µ is said to be "proper" if µ({0}) = 0. Our main theorem reads as follows:

Theorem 1.3
There exists a universal constant C > 0 for which the following holds: Let 0 < ε < 1 and let d, n be positive integers. Suppose that

  n ≥ (C/ε)^{Cd}.   (2)

Then, for any decent probability measure µ on R^n, there exists a linear map T : R^n → R^d such that T_*(µ) is ε-radial proper.

Furthermore, let η > 0 be such that η^{-1} ≥ (C/ε)^{Cd}. Then, for any η-decent probability measure µ on R^n, there exists a linear map T : R^n → R^d such that T_*(µ) is ε-radial proper.

A stronger conclusion is available in the one-dimensional case, d = 1, of Theorem 1.3, which does not seem to generalize to higher dimensions [12], [22]. Theorem 1.3 is tight, up to the value of the constant C, as demonstrated by the example where µ is distributed uniformly on n linearly independent vectors: In this case µ is decent, but for any linear map T and an interval J, the discrete measure R_*((T_*µ)|_{S(J)}) is composed of at most n atoms. It is not difficult to see that when the support of ν contains no more than ε^{-(d-1)} points, we have the lower bound W_1(ν, σ_{d-1}) ≥ cε, for a certain universal constant c > 0. It is desirable to find the best constant in the exponent in Theorem 1.3, perhaps also with respect to other notions of ε-radial measures.

The conclusion of Theorem 1.3 also holds when the measure µ is assumed to be only "weakly ε-decent", except that T_*(µ) is no longer necessarily proper. Another possibility in this context is to allow affine maps in Theorem 1.3 in place of linear maps, and obtain a measure T_*(µ) which is ε-radial proper. (It is also possible to modify Definition 1.1 slightly, and require that (1) hold for all affine subspaces of dimension at least one. The effect of such a modification is minor, since an ε-decent measure will remain at most 2ε-decent after such a change.)

The conclusion of Theorem 1.3 does not necessarily hold for non-decent measures, even when their support spans the entire R^n: Let e_1, . . .
, e_n be linearly independent vectors in R^n, and consider the probability measure µ = (1 − 2^{-n})^{-1} Σ_{i=1}^n 2^{-i} δ_{e_i}, where δ_x is Dirac's unit mass at x ∈ R^n. Then µ is not decent, and none of the two-dimensional marginals of µ are ε-radial proper, for a certain universal constant ε > 0.

As in Milman's proof of Dvoretzky's theorem (see [23]), Theorem 1.3 will be proved by demonstrating that a random linear map T works with positive probability, once the measure µ is put in the right "position". That is, we first push-forward µ under an appropriate invertible linear map in R^n, which is non-random, and only then do we project the resulting probability measure to a random d-dimensional subspace, distributed uniformly in the Grassmannian. The measure µ is in the correct "position" when the covariance matrix of R_*µ is proportional to the identity matrix. If we assume that the covariance matrix of µ itself is proportional to the identity, then a random projection will not work, in general, with high probability (compare with Sudakov's theorem; see [29] or the presentation in Bobkov [4]).

Here is an outline of the proof of Theorem 1.3 and also of the structure of this manuscript: In Section 5 we use the non-degeneracy conditions from Definition 1.1 in order to guarantee the existence of the initial linear transformation that puts µ in the right "position". Once we know that the covariance matrix of R_*µ is approximately a scalar matrix, we prove that the measure µ may be decomposed into many almost-orthogonal ensembles. Each such ensemble is simply a discrete probability measure, uniform on a collection of approximately-orthogonal vectors in R^n that are not necessarily of the same length. This decomposition, which essentially appeared earlier in the work of Bourgain, Lindenstrauss and Milman [6], is discussed in Section 4. Section 3 is concerned with the analysis of a single ensemble of our decomposition.
As it turns out, a random projection works with high probability, and transforms the discrete measure into an almost-radial one. Section 2 contains a preliminary discussion regarding ε-radial measures and the transportation metric. The proof of Theorem 1.3 is completed in Section 6, in which we also make some related comments and prove the following corollary to Theorem 1.3.

Corollary 1.4
There exists a sequence R_n → ∞ with the following property: Let µ be a decent probability measure on R^n. Then, there exists a non-zero linear functional ϕ : R^n → R such that

  µ({x ; ϕ(x) ≥ tM}) ≥ c exp(−Ct²)  for all 0 ≤ t ≤ R_n

and

  µ({x ; ϕ(x) ≤ −tM}) ≥ c exp(−Ct²)  for all 0 ≤ t ≤ R_n,

where M > 0 is a median, that is,

  µ({x ; |ϕ(x)| ≤ M}) ≥ 1/2 and µ({x ; |ϕ(x)| ≥ M}) ≥ 1/2,   (3)

and c, C > 0 are universal constants. Moreover, one may take R_n = c(log n)^{1/4}.

In other words, any high-dimensional probability measure has super-gaussian marginals. Furthermore, as is evident from the proof, most of the marginals are super-gaussian when the measure is in the right "position". In the case of independent random variables, Corollary 1.4 essentially goes back to Kolmogorov [20]. See also Nagaev [25]. In Section 7 we formulate our results in an infinite-dimensional setting.

Unless stated otherwise, throughout the text the letters c, C, C′, c̃ etc. stand for various positive universal constants, whose value may change from one instance to the next. We usually denote by lower-case c, c̃, c′, c̄ etc. positive universal constants that are assumed sufficiently small, and by upper-case C, C̃, C′, C̄ etc. sufficiently large universal constants. We write x · y for the usual scalar product of x, y ∈ R^n.

Acknowledgments.
I would like to thank Misha Gromov for his interest in this work and for introducing me to the problem, Vitali Milman for encouraging me to write this note, Boris Tsirelson for his explanations regarding measures on infinite-dimensional linear spaces, Sasha Sodin for reading a preliminary version of this text, and Semyon Alesker, Noga Alon and Apostolos Giannopoulos for related discussions.
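Before the formal treatment of transportation distances below, a concrete reference point may help: on the real line, the W_1 distance between two equal-weight empirical measures has a closed form, since the optimal coupling for the cost |x − y| is the monotone (sorted) matching. A minimal sketch, in pure Python; the function name is ours:

```python
def w1_line(xs, ys):
    # W1 between (1/N) sum delta_{x_i} and (1/N) sum delta_{y_i} on R:
    # the optimal transport pairs the i-th smallest x with the i-th smallest y.
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(w1_line([0.0, 1.0], [0.0, 3.0]))  # 1.0: the atom at 1 travels distance 2 carrying mass 1/2
```

On the sphere with the geodesic distance, as used in this paper, no such one-line formula exists, and one works with couplings or with the Kantorovich-Rubinstein duality instead.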
2 The transportation metric

Let (X, ρ) be a metric space and let µ_1, µ_2 be Borel probability measures on X. A coupling of µ_1 and µ_2 is a Borel probability measure γ on X × X whose first marginal is µ_1 and whose second marginal is µ_2, that is, (P_1)_*γ = µ_1 and (P_2)_*γ = µ_2 where P_1(x, y) = x and P_2(x, y) = y. The L^1 Monge-Kantorovich distance is

  W_1(µ_1, µ_2) = inf_γ ∫_{X×X} ρ(x, y) dγ(x, y),

where the infimum runs over all couplings γ of µ_1 and µ_2. Then W_1 is a metric, and it satisfies the convexity relation

  W_1(λµ_1 + (1 − λ)µ_2, ν) ≤ λW_1(µ_1, ν) + (1 − λ)W_1(µ_2, ν)   (4)

for any 0 < λ < 1 and probability measures µ_1, µ_2, ν on X. The Kantorovich-Rubinstein duality theorem (see [33, Theorem 1.14]) states that

  W_1(µ, ν) = sup_ϕ ∫_X ϕ d[µ − ν],   (5)

where the supremum runs over all 1-Lipschitz functions ϕ : X → R (i.e., |ϕ(x) − ϕ(y)| ≤ ρ(x, y) for all x, y ∈ X). We are concerned mostly with the case where the metric space X is the Euclidean sphere S^{d-1} with the metric ρ(x, y) being the geodesic distance in S^{d-1}, i.e., cos ρ(x, y) = x · y.

Denote by M(S^{d-1}) the space of Borel probability measures on S^{d-1}, endowed with the weak* topology and the corresponding Borel σ-algebra. Similarly, M(R^d) is the space of Borel probability measures on R^d, endowed with the weak* topology (convergence of integrals of compactly-supported continuous functions) and σ-algebra. A measure here always means a non-negative measure. The total variation distance between two measures µ and ν on a measurable space Ω is

  d_TV(µ, ν) = sup_{A⊆Ω} |µ(A) − ν(A)|,

where the supremum runs over all measurable sets A ⊆ Ω. Clearly, for µ, ν ∈ M(S^{d-1}),

  W_1(µ, ν) ≤ π d_TV(µ, ν) ≤ π.   (6)

Additionally, d_TV(S_*µ, S_*ν) ≤ d_TV(µ, ν) for any measures µ, ν and a measurable map S. When S is a λ-Lipschitz map between metric spaces, we obtain the inequality W_1(S_*µ, S_*ν) ≤ λW_1(µ, ν). The following lemma is an obvious consequence of (4) and (6), via Jensen's inequality.
Lemma 2.1
Let d be a positive integer, 0 ≤ ε < 1 and µ ∈ M(S^{d-1}). Suppose that we are given a "random probability measure" on S^{d-1}. That is, let λ be a probability measure on a measurable space Ω, and suppose that with any α ∈ Ω we associate a probability measure µ_α ∈ M(S^{d-1}) such that the map Ω ∋ α ↦ µ_α ∈ M(S^{d-1}) is measurable. Assume that

  d_TV( µ, ∫_Ω µ_α dλ(α) ) ≤ ε.

Then,

  W_1(µ, σ_{d-1}) ≤ ∫_Ω W_1(µ_α, σ_{d-1}) dλ(α) + πε ≤ sup_{α∈Ω} W_1(µ_α, σ_{d-1}) + 4ε.

Recall that µ|_X stands for the conditioning of µ to X.

Lemma 2.2
Suppose that µ and ν are finite measures on a measurable space Ω and let ε > 0. Let X ⊆ Ω be such that ν(X) > 0 and µ(X) > 0. Suppose that

  |µ(A) − ν(A)| ≤ ε for all A ⊆ X.

Then d_TV(µ|_X, ν|_X) ≤ 2ε/ν(X).

Proof: For any A ⊆ X,

  | ν|_X(A) − µ|_X(A) | = | (ν(A) − µ(A))/ν(X) + (µ(A)/µ(X)) · (µ(X) − ν(X))/ν(X) | ≤ ε/ν(X) + ε/ν(X) = 2ε/ν(X),

since µ(A) ≤ µ(X). □

Next we describe a few simple properties of ε-radial measures.

Lemma 2.3
Let d be a positive integer and 0 < ε < 1/4. Let µ, ν be Borel probability measures on R^d. Additionally, assume that we are given a "random probability measure" on R^d. That is, let λ be a probability measure on a measurable space Ω. Suppose that with any α ∈ Ω we associate a measure µ_α ∈ M(R^d) such that the map Ω ∋ α ↦ µ_α ∈ M(R^d) is measurable.

(a) Suppose that µ is ε-radial and that d_TV(µ, ν) ≤ ε². Then ν is 5ε-radial.

(b) Suppose that µ_α is ε-radial for any α ∈ Ω. Assume that µ = ∫_Ω µ_α dλ(α). Then µ is 5√ε-radial.

(c) Suppose that A ⊆ Ω satisfies λ(A) ≥ 1 − ε, and µ_α is ε-radial for any α ∈ A. Assume that µ = ∫_Ω µ_α dλ(α). Then µ is 25√ε-radial.

Proof: Begin with (a). Let J ⊂ (0, ∞) be an interval with ν(S(J)) ≥ 5ε, where S(J) = {x ∈ R^d ; |x| ∈ J} is a spherical shell. Denote ν_J = ν|_{S(J)} and µ_J = µ|_{S(J)}. Since d_TV(µ, ν) ≤ ε², we may apply Lemma 2.2 with ε² and X = S(J). We conclude that

  d_TV(µ_J, ν_J) ≤ 2ε²/(5ε) ≤ ε.

Consequently,

  d_TV( R_*(µ_J), R_*(ν_J) ) ≤ ε.   (7)

Since µ is ε-radial and µ(S(J)) ≥ 5ε − ε² ≥ ε, then W_1(R_*(µ_J), σ_{d-1}) ≤ ε according to Definition 1.2. From (6), (7) and the triangle inequality, W_1(R_*(ν_J), σ_{d-1}) ≤ (π + 1)ε ≤ 5ε. This completes the proof of (a).

We move to the proof of (b). Let J ⊂ (0, ∞) be an interval with µ(S(J)) ≥ 5√ε. Let X = {α ∈ Ω ; µ_α(S(J)) ≥ ε}. Denote ν = ∫_X µ_α dλ(α), a finite Borel measure on R^d. Then for any A ⊆ S(J),

  |µ(A) − ν(A)| = ∫_{Ω\X} µ_α(A) dλ(α) ≤ ∫_{Ω\X} µ_α(S(J)) dλ(α) ≤ ελ(Ω\X) ≤ ε.   (8)

Denote µ_J = µ|_{S(J)} and ν_J = ν|_{S(J)}. Since ν(S(J)) ≥ 5√ε − ε ≥ 4√ε, from (8) and Lemma 2.2,

  d_TV(µ_J, ν_J) ≤ 2ε/(4√ε) = √ε/2.   (9)

Note that ν_J = ∫_X µ_α|_{S(J)} dλ′(α) where λ′ is a certain probability measure on X. Since µ_α is ε-radial and µ_α(S(J)) ≥ ε for α ∈ X, then from Definition 1.2,

  W_1( R_*(µ_α|_{S(J)}), σ_{d-1} ) ≤ ε for α ∈ X.
  (10)

We have R_*(ν_J) = ∫_X R_*(µ_α|_{S(J)}) dλ′(α). Thus, (10) and Lemma 2.1 yield that W_1(R_*(ν_J), σ_{d-1}) ≤ ε. Combining the last inequality with (6) and (9), we see that

  W_1(R_*(µ_J), σ_{d-1}) ≤ π d_TV( R_*(µ_J), R_*(ν_J) ) + W_1(R_*(ν_J), σ_{d-1}) ≤ π√ε/2 + ε ≤ 5√ε.

Since µ_J = µ|_{S(J)}, and J ⊂ (0, ∞) is an arbitrary interval with µ(S(J)) ≥ 5√ε, the assertion (b) is proven.

To prove (c), denote ν = ∫_A µ_α dλ|_A(α), a probability measure on R^d. Then ν is 5√ε-radial, according to (b). Furthermore, clearly d_TV(λ|_A, λ) = 1 − λ(A) ≤ ε, and hence d_TV(µ, ν) ≤ 2ε ≤ (5√ε)². According to part (a), the measure µ is 25√ε-radial, and (c) is proven. □

Probability measures are the protagonists of this text. Some of our constructions of probability measures are probabilistic in nature. To avoid confusion, we try to distinguish sharply between the measures themselves, and the randomness used in their construction. Whenever we have objects that are declared random (for instance, random vectors in S^{d-1}), all statements containing probability estimates or using the symbol P refer to these random objects and only to them.

The crude bound in the following lemma is certainly a standard application of the so-called "empirical distribution method" (see, e.g., [5]). We were not able to find it in the literature, so a proof is provided. Recall that δ_x stands for the Dirac unit mass at the point x.

Lemma 2.4
Let d, N be positive integers, and let X_1, . . . , X_N be independent random vectors, distributed uniformly on S^{d-1}. Denote µ = N^{-1} Σ_{i=1}^N δ_{X_i}. Then, with probability greater than 1 − C exp(−c√N) of selecting X_1, . . . , X_N,

  W_1(µ, σ_{d-1}) ≤ CN^{-c/d},

where C > 0 and 0 < c < 1 are universal constants.

Proof: Denote by F the class of all 1-Lipschitz functions ϕ : S^{d-1} → R such that ∫ ϕ dσ_{d-1} = 0. Note that sup |ϕ| ≤ π for any ϕ ∈ F. According to (5),

  W_1(µ, σ_{d-1}) = sup_{ϕ∈F} ∫_{S^{d-1}} ϕ dµ.   (11)

Let ε > 0 be a parameter to be specified later on. A subset N ⊂ S^{d-1} is an ε-net if for any x ∈ S^{d-1} there exists y ∈ N with ρ(x, y) ≤ ε. Let N be an ε-net of cardinality #(N) ≤ (C/ε)^d (see, e.g., [26, Lemma 4.16]). For ϕ ∈ F denote

  ϕ̃(x) = min_{y∈N} ( ε⌈ϕ(y)/ε⌉ + ρ(x, y) ),

where ⌈a⌉ is the minimal integer that is not smaller than a. Then ϕ̃ is a 1-Lipschitz function, as a minimum of 1-Lipschitz functions. It is easily verified that ϕ ≤ ϕ̃ ≤ ϕ + 3ε. Denote ϕ°(x) = ϕ̃(x) − ∫ ϕ̃(y) dσ_{d-1}(y). Then ϕ° ∈ F for any ϕ ∈ F, and sup |ϕ° − ϕ| ≤ 6ε. Hence,

  W_1(µ, σ_{d-1}) = sup_{ϕ∈F} ∫_{S^{d-1}} ϕ dµ ≤ 3ε + sup_{ϕ∈F} ∫_{S^{d-1}} ϕ° dµ = 3ε + sup_{ϕ∈F} (1/N) Σ_{i=1}^N ϕ°(X_i).   (12)

Denote F̃ = {ϕ̃ ; ϕ ∈ F} and F° = {ϕ° ; ϕ ∈ F}. These sets are finite. In fact, as each ϕ ∈ F̃ is determined by the restriction ϕ|_N, we have

  #(F°) ≤ #(F̃) ≤ ( 2π/ε + 1 )^{#(N)} ≤ exp( (C/ε)^d ).   (13)

Fix ϕ° ∈ F°. Then ϕ° is a 1-Lipschitz function on the sphere S^{d-1} with ∫ ϕ° dσ_{d-1} = 0. According to Lévy's lemma (see Milman and Schechtman [24, Section 2 and Appendix V]), for any i = 1, . . . , N,

  P( |ϕ°(X_i)| ≥ t ) ≤ C exp(−ct²d)  for all t ≥ 0,

where P refers, of course, to the probability of choosing the random vectors X_1, . . .
, X_N. From Bernstein's inequality (see, e.g., Bourgain, Lindenstrauss and Milman [6, Proposition 1]),

  P( |(1/N) Σ_{i=1}^N ϕ°(X_i)| ≥ t ) ≤ C′ exp(−c′t²Nd)  for all t ≥ 0.   (14)

Set t = ε in (14). From (13) and (14),

  P( sup_{ϕ°∈F°} |(1/N) Σ_{i=1}^N ϕ°(X_i)| ≥ ε ) ≤ C′ exp( (C/ε)^d − c′ε²Nd ).   (15)

We now select ε = CN^{-1/(2d+2)}, for a sufficiently large universal constant C > 0. Substitute the value of ε in (15) and apply (12), to deduce that

  W_1(µ, σ_{d-1}) ≤ 3ε + sup_{ϕ°∈F°} (1/N) Σ_{i=1}^N ϕ°(X_i) ≤ 4ε ≤ C′N^{-1/(2d+2)},

with probability greater than 1 − C′ exp(−c′N^{d/(d+1)}) of selecting X_1, . . . , X_N. □

Remark.
The discrepancy of µ ∈ M(S^{d-1}) is defined as D(µ) = sup_B |µ(B) − σ_{d-1}(B)|, where the supremum runs over all geodesic balls B ⊆ S^{d-1}. A result analogous to Lemma 2.4 for discrepancy appears in Beck and Chen [3, Section 7.4]. It is possible to adapt our technique to suit the discrepancy metric, and some of its variants, in place of W_1. In fact, the only properties of the metric W_1 that are used in our proof are Lemma 2.4 and (4) and (6) above. Our method, of course, works for the W_p metrics as long as 1 ≤ p < ∞, but it does not seem to apply for the W_∞ metric. The W_∞ metric induces a topology that is much stronger than weak convergence, and it is not even weaker than convergence in norm.

3 Isotropic Gaussians
A centered Gaussian random vector in R^d is a random vector whose density is proportional to x ↦ exp(−Mx · x/2) for a positive definite matrix M. A centered Gaussian random vector is said to be isotropic if M is a scalar matrix. It is called standard if M = Id, where Id is the identity matrix. Recall that R stands for radial projection.

Lemma 3.1
Let d, N be positive integers and let Z_1, . . . , Z_N be independent isotropic Gaussian random vectors in R^d. Denote µ = N^{-1} Σ_{i=1}^N δ_{Z_i}. Then, with probability greater than 1 − C exp(−cN^{1/4}) of selecting the Z_i's, the measure µ is δ-radial, for δ = CN^{-c/d}. Here, C > 0 and 0 < c < 1 are universal constants.

Proof: Set ε = 5/N^{1/4}. We may assume that ε ≤ 1/2, as otherwise the conclusion of the lemma is obvious for a suitable choice of universal constants c, C > 0. The central observation is that the radii |Z_1|, . . . , |Z_N| are independent of the angular parts R(Z_1), . . . , R(Z_N), and that the random vectors R(Z_1), . . . , R(Z_N) are independent random vectors that are distributed uniformly on S^{d-1}.

With probability one, none of the |Z_i|'s are zero, and there are no i ≠ j with |Z_i| = |Z_j|. We condition on the values |Z_1|, . . . , |Z_N|, which are assumed to be distinct and non-zero. For an interval J ⊂ (0, ∞) write Z(J) = {i ; |Z_i| ∈ J} and w(J) = #(Z(J)). Denote k = ⌈1/ε²⌉, and let J_1, . . . , J_k ⊂ (0, ∞) be disjoint intervals such that

  w(J_i) = ⌊Ni/k⌋ − ⌊N(i − 1)/k⌋  for i = 1, . . . , k.
Figure 1
Since ε²N ≥ 2, then ε²N/2 ≤ w(J_i) ≤ 2ε²N for any i. For an interval J ⊂ (0, ∞) with w(J) ≠ 0, denote

  µ_J = (1/w(J)) Σ_{j∈Z(J)} δ_{R(Z_j)}.

Fix i = 1, . . . , k. We abbreviate µ_i = µ_{J_i}. Since {R(Z_j)}_{j∈Z(J_i)} is a collection of w(J_i) independent random vectors, distributed uniformly on the sphere, then Lemma 2.4 applies, and yields,

  P( W_1(µ_i, σ_{d-1}) ≤ Cw(J_i)^{-c/d} ) ≥ 1 − C exp(−c√(w(J_i))).

We now let i vary. Since w(J_i) has the order of magnitude of ε²N, then,

  P( max_{i=1,...,k} W_1(µ_i, σ_{d-1}) ≤ C(ε²N)^{-c/d} ) ≥ 1 − Ck exp(−cε√N).   (16)

Write I for the collection of all non-empty intervals in (0, ∞). Fix an interval J with w(J) ≥ Nε. Let J_{i_1}, . . . , J_{i_ℓ} be all the intervals among the J_i's that are contained in J. Then J_{i_1} ∪ . . . ∪ J_{i_ℓ} covers all but at most 4ε²N of the |Z_i|'s that belong to the interval J. Therefore,

  d_TV( µ_J, Σ_{j=1}^ℓ λ_j µ_{i_j} ) ≤ 4ε²N/(Nε) = 4ε,

where λ_1, . . . , λ_ℓ are appropriate non-negative coefficients that add to one. According to Lemma 2.1,

  W_1(µ_J, σ_{d-1}) ≤ max_{i=1,...,k} W_1(µ_i, σ_{d-1}) + 20ε  for all J ∈ I with w(J) ≥ Nε.
We thus conclude from (16) that with probability at least 1 − Ck exp(−cε√N),

  W_1(µ_J, σ_{d-1}) ≤ C(ε²N)^{-c/d} + 20ε  for all J ∈ I with w(J) ≥ Nε.   (17)

The latter probability bound is valid under the conditioning on |Z_1|, . . . , |Z_N|, and it holds for all possible values of |Z_1|, . . . , |Z_N|, up to measure zero. Hence the aforementioned probability bound for (17) also holds with no conditioning at all. Recall that we write S(J) = {x ∈ R^d ; |x| ∈ J}, and note that µ_J = R_*(µ|_{S(J)}) and w(J) = Nµ(S(J)). Since ε√N ≥ N^{1/4}, then (17) translates as follows: With probability greater than 1 − C exp(−cN^{1/4}) of selecting Z_1, . . . , Z_N,

  W_1( R_*(µ|_{S(J)}), σ_{d-1} ) ≤ CN^{-c/d} + Cε  for all J ∈ I with µ(S(J)) ≥ ε.

This means that µ is C(N^{-c/d} + ε)-radial with probability greater than 1 − C exp(−cN^{1/4}). Since N^{-c/d} + ε ≤ C′N^{-c′/d}, the lemma is proven. □

Lemma 3.2
Let k be a positive integer. For an invertible k × k matrix A, write γ_A for the probability measure on R^k whose density is proportional to x ↦ exp(−|Ax|²/2). Then, for any k × k invertible matrices A and B,

  d_TV(γ_A, γ_B) ≤ Ck ‖BA^{-1} − Id‖,

where Id is the identity matrix, ‖·‖ is the operator norm, and C > 0 is a universal constant.

Proof: Let X be a standard Gaussian random vector in R^k. Then γ_A(U) = P(A^{-1}X ∈ U) for any measurable set U ⊆ R^k. Therefore,

  d_TV(γ_A, γ_B) = sup_{U⊆R^k} | P(X ∈ U) − P(AB^{-1}X ∈ U) | = d_TV(γ_{BA^{-1}}, γ_Id).

Denote M = BA^{-1}, write γ = γ_Id and set ε = ‖M − Id‖ = sup_{|x|=1} |Mx − x|. We write ϕ_M(x) = (det M)(2π)^{-k/2} exp(−|Mx|²/2) for the density of γ_M and similarly ϕ stands for the density of γ. We may assume that ε < 1/8, as otherwise the conclusion of the lemma is trivial. Then | |Mx| − |x| | ≤ ε|x| for any x ∈ R^k, and also (1 + 2ε)^{-k} ≤ det M ≤ (1 + ε)^k. Therefore,

  |ϕ(x) − ϕ_M(x)| = ϕ(x) | 1 − (det M) exp( (|x|² − |Mx|²)/2 ) | ≤ ϕ(x) [ (1 + 2ε)^k exp(3ε|x|²) − 1 ]

for any x ∈ R^k. Consequently,

  d_TV(γ, γ_M) = (1/2) ∫_{R^k} |ϕ(x) − ϕ_M(x)| dx ≤ (1 + 2ε)^k ∫_{R^k} exp(3ε|x|²) ϕ(x) dx − 1.

However,

  ∫_{R^k} exp(3ε|x|²) ϕ(x) dx = (2π)^{-k/2} ∫_{R^k} exp( −|√(1 − 6ε) x|²/2 ) dx = (1 − 6ε)^{-k/2}.

We deduce that

  d_TV(γ, γ_M) ≤ (1 + 2ε)^k (1 − 6ε)^{-k/2} − 1 ≤ Ckε,

under the legitimate assumption that ε < c/k (otherwise, there is nothing to prove). □
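The k = 1 case of the lemma can be checked numerically: there γ_a is the centered Gaussian with density proportional to exp(−a²x²/2), and the proof reduces the general case to comparing the standard Gaussian with γ_m for m = BA^{-1}. A minimal sketch, pure Python with naive numerical integration; the function name and grid parameters are ours:

```python
import math

def tv_gauss_1d(m, lo=-12.0, hi=12.0, n=48000):
    # d_TV between the standard Gaussian density phi(x) and the density
    # m*phi(m*x) of gamma_m, computed as (1/2) * integral of |difference|.
    phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    h = (hi - lo) / n
    s = sum(abs(phi(lo + i * h) - m * phi(m * (lo + i * h))) for i in range(n + 1))
    return 0.5 * s * h

print(tv_gauss_1d(1.1))  # of order |m - 1|, as the bound Ck*eps predicts for k = 1
```

Here ‖m − 1‖ = 0.1 and the computed total variation distance is a few hundredths, consistent with the linear-in-ε bound of the lemma.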
Consider the map E : (R^d)^N → M(R^d) defined by

  (R^d)^N ∋ (x_1, . . . , x_N) ↦ E(x_1, . . . , x_N) = (1/N) Σ_{i=1}^N δ_{x_i}.

A Borel probability measure α on (R^d)^N thus induces the Borel probability measure E_*α on the space M(R^d). The next lemma is a small perturbation of Lemma 3.1.

Lemma 3.3
Let d, N be positive integers, ε > 0. Let X_1, . . . , X_N be independent, standard Gaussian random vectors in R^d. Let (a_{ij})_{1≤j≤i≤N} be real numbers, with a_{ii} ≠ 0 for all i, such that

  |a_{ij}| ≤ ε|a_{ii}|  for j < i.   (18)

Denote Y_i = Σ_{j≤i} a_{ij}X_j and consider the probability measure µ = N^{-1} Σ_{i=1}^N δ_{Y_i}. Then, with probability greater than 1 − C exp(−cN^{1/4}) − CN²d²ε of selecting the random vectors X_1, . . . , X_N, the measure µ is δ-radial for δ = CN^{-c/d}. Here, C > 0 and 0 < c < 1 are universal constants.

Proof: Denote Z_i = a_{ii}X_i. The Z_i's are independent, isotropic, Gaussian random vectors in R^d. Denote by U ⊆ M(R^d) the collection of all δ-radial probability measures, where δ = CN^{-c/d} is the same as in Lemma 3.1. Let α be the probability measure on (R^d)^N that is the joint distribution of Z_1, . . . , Z_N. According to Lemma 3.1,

  (E_*α)(U) ≥ 1 − C exp(−cN^{1/4}).

Let β be the probability measure on (R^d)^N which is the joint distribution of Y_1, . . . , Y_N. To prove the lemma, we need to show that

  (E_*β)(U) ≥ 1 − C exp(−cN^{1/4}) − C′N²d²ε.

This would follow, if we could prove that

  d_TV( E_*α, E_*β ) ≤ d_TV(α, β) ≤ C′N²d²ε.   (19)

Let k = dN. Let A be the k × k matrix that represents the linear operator

  R^k = (R^d)^N ∋ (x_1, . . . , x_N) ↦ (a_{11}x_1, . . . , a_{NN}x_N) ∈ (R^d)^N = R^k.

That is, A is a diagonal matrix, and the diagonal of A contains each number a_{ii} exactly d times. Denoting X = (X_1, . . . , X_N) ∈ (R^d)^N = R^k and Z = (Z_1, . . . , Z_N) ∈ (R^d)^N = R^k, we see that Z = AX. Therefore, in the notation of Lemma 3.2, we have α = γ_{A^{-1}}. Similarly, let B be the k × k matrix that corresponds to the linear operator

  (x_i)_{i=1,...,N} ↦ ( Σ_{j≤i} a_{ij}x_j )_{i=1,...,N},

where x_1, . . . , x_N are vectors in R^d. Denoting Y = (Y_1, . . . , Y_N) ∈ R^k we have Y = BX and consequently β = γ_{B^{-1}}. Condition (18) implies that the off-diagonal elements of A^{-1}B do not exceed ε in absolute value.
The diagonal elements of A^{-1}B are all ones. Hence ‖A^{-1}B − Id‖ ≤ kε, and according to Lemma 3.2,

  d_TV(α, β) = d_TV(γ_{A^{-1}}, γ_{B^{-1}}) ≤ Ck ‖A^{-1}B − Id‖ ≤ Ck²ε = CN²d²ε.

Thus (19) holds and the proof is complete. □
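The central observation in the proof of Lemma 3.1, that the direction R(Z) of an isotropic Gaussian vector Z is uniformly distributed on the sphere and independent of the radius |Z|, is easy to probe empirically. A minimal Monte Carlo sketch, pure Python with a fixed seed; the function name and sample sizes are ours. Each coordinate of a uniform point on S^{d-1} has mean 0 and second moment 1/d:

```python
import math
import random

random.seed(7)

def radial_projection(z):
    # R(z) = z / |z|
    r = math.sqrt(sum(x * x for x in z))
    return [x / r for x in z]

d, N = 3, 20000
# Directions of N independent standard Gaussian vectors in R^d:
us = [radial_projection([random.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(N)]
mean0 = sum(u[0] for u in us) / N          # should be near 0
second0 = sum(u[0] ** 2 for u in us) / N   # should be near 1/d
print(round(mean0, 2), round(second0, 2))
```

This is only a sanity check of the distributional claim, not of the quantitative transportation-distance bounds, which would require computing W_1 on the sphere.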
4 Almost-orthogonal ensembles

This section is concerned with probability measures on R^n that may be decomposed as a mixture, whose components are mostly ensembles of approximately-orthogonal vectors. Later on, we will apply a random projection, and use Lemma 3.3 in order to show that the projection of most elements in the mixture is typically ε-radial for small ε > 0.

Definition 4.1
Let ℓ, n be positive integers and let ε > 0. Let v_1, . . . , v_ℓ ∈ R^n be non-zero vectors, and consider v = (v_1, . . . , v_ℓ), an ℓ-tuple of vectors. We say that v is "ε-orthogonal" if there exist orthonormal vectors w_1, . . . , w_ℓ ∈ R^n and real numbers (a_{ij})_{1≤j≤i≤ℓ} such that

  v_i = Σ_{j=1}^i a_{ij}w_j  for i = 1, . . . , ℓ,  and  |a_{ij}| ≤ ε|a_{ii}| for j < i.

Note that (v_1, . . . , v_ℓ) is ε-orthogonal if and only if (R(v_1), . . . , R(v_ℓ)) is ε-orthogonal. We write O_{ℓ,ε} ⊂ (R^n)^ℓ for the collection of all ε-orthogonal ℓ-tuples v = (v_1, . . . , v_ℓ) ∈ (R^n)^ℓ. For a subspace E ⊂ R^n denote by Proj_E the orthogonal projection operator onto E in R^n.

Lemma 4.2
Let ℓ, n be positive integers. Suppose that µ is a Borel probability measure on the unit sphere S^{n-1} such that for any unit vector θ ∈ S^{n-1},

  ∫_{S^{n-1}} (x · θ)² dµ(x) ≤ 1/ℓ³.

Let X_1, . . . , X_ℓ be independent random vectors in S^{n-1}, all distributed according to µ. Then, with positive probability, (X_1, . . . , X_ℓ) is (2ℓ^{-1/4})-orthogonal.

Proof: We may assume that ℓ ≥ 2, otherwise the lemma is vacuously true. Write W_1, . . . , W_ℓ ∈ R^n for the vectors obtained from X_1, . . . , X_ℓ through the Gram-Schmidt orthogonalization process. (If the X_i's are not linearly independent, then some of the W_i's might be zero). For i ≥ 2 denote by E_i the linear span of X_1, . . . , X_{i-1}. Then, for i ≥ 2,

  E|Proj_{E_i}(X_i)|² = E Σ_{j=1}^{i-1} (X_i · W_j)² = E Σ_{j=1}^{i-1} ∫_{S^{n-1}} (x · W_j)² dµ(x) ≤ (i − 1)/ℓ³ ≤ ℓ^{-2},

as X_i is independent of W_1, . . . , W_{i-1}. By Chebyshev's inequality,

  P{ ∃ 2 ≤ i ≤ ℓ ; |Proj_{E_i}(X_i)| ≥ ℓ^{-1/4} } ≤ (ℓ − 1)ℓ^{-2}ℓ^{1/2} ≤ ℓ^{-1/2} < 1.

Therefore, with positive probability, |Proj_{E_i}(X_i)| < ℓ^{-1/4} for all i ≥ 2. In this event, the vectors X_1, . . . , X_ℓ are linearly independent, and W_1, . . . , W_ℓ are orthonormal vectors. Furthermore, in this event

  a_{ii} := √(1 − |Proj_{E_i}(X_i)|²) ≥ √(1 − ℓ^{-1/2}) ≥ 1/2,

while a_{ij} := X_i · W_j satisfies |a_{ij}| ≤ |Proj_{E_i}(X_i)| ≤ ℓ^{-1/4} for j < i. Thus, with positive probability, X_i = Σ_{j≤i} a_{ij}W_j for all i, with |a_{ij}| ≤ 2ℓ^{-1/4}|a_{ii}| for j < i and with W_1, . . . , W_ℓ being orthonormal vectors. This completes the proof. Note that the "positive probability" is in fact greater than 1 − ℓ^{-1/2}. □

The next lemma is essentially a measure-theoretic variant of a lemma going back to Bourgain, Lindenstrauss and Milman [6, Lemma 12], with the main difference being that the logarithmic dependence is improved upon to a power law. For two Borel measures µ and ν on a compact K we say that µ ≤ ν if

  ∫_K ϕ dµ ≤ ∫_K ϕ dν

for any continuous ϕ : K → [0, ∞).
  (20)

Recall that a point is not in the support of a measure if and only if it has an open neighborhood of measure zero. We abbreviate O_ℓ = O_{ℓ, 2ℓ^{-1/4}}. For v = (v_1, . . . , v_ℓ) ∈ O_ℓ denote

  µ_v = (1/ℓ) Σ_{i=1}^ℓ δ_{v_i},
a Borel probability measure on R^n (in the notation of the previous section, µ_v = E(v)). When K ⊆ R^n, we write O_ℓ(K) ⊆ O_ℓ for the collection of all (v_1, . . . , v_ℓ) ∈ O_ℓ with v_i ∈ K for all i. Then O_ℓ(K) = O_ℓ ∩ K^ℓ ⊆ (R^n)^ℓ, and it is straightforward to verify that O_ℓ(K) is compact whenever K ⊂ R^n is a compact that does not contain the origin.

Lemma 4.3
Let ℓ, n be positive integers and let 0 < ε < 1/2. Let µ be a Borel probability measure on R^n with µ({0}) = 0. Assume that

sup_{θ ∈ S^{n−1}} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) < ε ℓ^{−4}. (21)

Then there exists a Borel probability measure λ on O_ℓ such that

d_TV( µ, ∫_{O_ℓ} µ_v dλ(v) ) < ε. (22)

Proof:
Since µ({0}) = 0, for any δ > 0 we may find a large punctured ball K = {x ∈ R^n ; r ≤ |x| ≤ R} with 0 < r < R such that µ(K) ≥ 1 − δ. We may assume that µ is supported on a compact set that does not contain the origin: otherwise, replace µ with µ|_K for a large punctured ball K ⊂ R^n with µ(K) ≥ 1 − δ as above, and observe that d_TV(µ, µ|_K) ≤ δ, so the effect of the replacement on the inequalities (21) and (22) is bounded by δ, which can be made arbitrarily small. Write K ⊂ R^n for the support of µ, a compact set which does not contain the origin. Denote by F the collection of all Borel measures λ supported on O_ℓ(K) for which

∫_{O_ℓ} µ_v dλ(v) ≤ µ. (23)

The condition (23) defining F is closed in the weak* topology. Furthermore, λ(O_ℓ) ≤ 1 for all λ ∈ F (use (23), and take ϕ ≡ 1 in the definition (20)). Hence F is a weak* closed subset of the unit ball of the Banach space of signed finite Borel measures on the compact set O_ℓ(K). From the Banach–Alaoglu theorem, F is compact in the weak* topology. Therefore the continuous functional λ ↦ λ(O_ℓ) attains its maximum on F at some λ_0 ∈ F. Clearly λ_0(O_ℓ) ≤ 1. To prove the lemma, it suffices to show that

λ_0(O_ℓ) > 1 − ε. (24)

Indeed, if (24) holds, then we may define a probability measure λ = λ_0 + λ̃, where λ̃ is any Borel measure on O_ℓ of total mass 1 − λ_0(O_ℓ). Then clearly

d_TV( µ, ∫_{O_ℓ} µ_v dλ(v) ) ≤ λ̃(O_ℓ) < ε,

and the lemma follows. We thus focus on the proof of (24). Assume by contradiction that (24) fails. Denote ν = µ − ∫_{O_ℓ} µ_v dλ_0(v). Then ν is a non-negative Borel measure on K ⊂ R^n, according to (23), and also ν ≤ µ. Moreover, ν(K) ≥ ε, since we assume that (24) fails. Denote ν̃ = ν/ν(K), a probability measure on K ⊂ R^n. Then ν̃ ≤ ν/ε and hence R_*(ν̃) ≤ R_*(ν)/ε ≤ R_*(µ)/ε.
For any unit vector θ ∈ S^{n−1},

∫_{S^{n−1}} (x·θ)² dR_*ν̃(x) ≤ ε^{−1} ∫_{S^{n−1}} (x·θ)² dR_*ν(x) ≤ ε^{−1} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) ≤ ℓ^{−4},

from our assumption (21). Lemma 4.2 thus asserts the existence of x̃_1, …, x̃_ℓ ∈ S^{n−1} in the support of R_*(ν̃) such that (x̃_1, …, x̃_ℓ) is (2ℓ^{−1/2})-orthogonal. Consequently, there exist non-zero vectors x_1, …, x_ℓ ∈ R^n in the support of ν such that (x_1, …, x_ℓ) is (2ℓ^{−1/2})-orthogonal. Let U_1, …, U_ℓ ⊂ R^n be small open neighborhoods of x_1, …, x_ℓ, respectively, such that (y_1, …, y_ℓ) ∈ O_{ℓ, 2ℓ^{−1/2}} = O_ℓ for all y_1 ∈ U_1, …, y_ℓ ∈ U_ℓ. The U_i's are necessarily disjoint and U_1 × … × U_ℓ ⊆ O_ℓ. Denote η = min_{i=1,…,ℓ} ν(U_i). Then η > 0, since U_i is an open neighborhood of the point x_i, and the point x_i belongs to the support of ν. We set ν_i = ν|_{U_i}, the conditioning of ν to U_i. Then ν_i is a probability measure supported on K ⊂ R^n, and

η ν_i ≤ ν(U_i) ν_i ≤ ν = µ − ∫_{O_ℓ} µ_v dλ_0(v)    for i = 1, …, ℓ.

Therefore, also

η ∫_{U_1×…×U_ℓ} ( (1/ℓ) Σ_{i=1}^{ℓ} δ_{y_i} ) dν_1(y_1) … dν_ℓ(y_ℓ) = η (Σ_{i=1}^{ℓ} ν_i)/ℓ ≤ µ − ∫_{O_ℓ} µ_v dλ_0(v).

Consequently, the non-negative measure λ = λ_0 + η (ν_1 × … × ν_ℓ) on O_ℓ(K) satisfies (23). Hence λ ∈ F, but λ(O_ℓ) = λ_0(O_ℓ) + η > λ_0(O_ℓ), in contradiction to the maximality of λ_0. We thus conclude that (24) must be true, and the lemma is proven. □

A d × n matrix Γ will be called a "standard Gaussian random matrix" if the entries of Γ are independent standard Gaussian random variables (of mean zero and variance one). Suppose that w_1, …, w_ℓ are orthonormal vectors in R^n and that Γ is a d × n standard Gaussian random matrix. Observe that in this case, Γ(w_1), …, Γ(w_ℓ) are independent standard Gaussian random vectors in R^d.

Lemma 4.4
Let d ≤ ℓ ≤ n be positive integers, let 0 < ε < 1 and assume that

ℓ ≥ (C/ε)^{Cd}. (25)

Suppose that λ is a Borel probability measure on O_ℓ, and denote µ = ∫_{O_ℓ} µ_v dλ(v). Let Γ be a d × n standard Gaussian random matrix. Then, with positive probability of selecting the random matrix Γ, the measure Γ_*µ on R^d is ε-radial. Here, C > 0 is a universal constant. (In fact, this probability is at least 1 − ℓ^{−1/2}.)

Proof: Fix v = (v_1, …, v_ℓ) ∈ O_ℓ. Consider the measure µ̃_v := Γ_*(µ_v) on R^d. Then µ̃_v = Γ_*( (1/ℓ) Σ_{i=1}^{ℓ} δ_{v_i} ) = (1/ℓ) Σ_{i=1}^{ℓ} δ_{Γ(v_i)}. Let E(v) be the following event:

The measure µ̃_v is Cℓ^{−c/d}-radial, where C > 0 and 0 < c < 1 are the universal constants from Lemma 3.3.

Let us emphasize that for any v ∈ O_ℓ, the event E(v) might either hold or not, depending on the Gaussian random matrix Γ. We are going to apply Lemma 3.3. Since v ∈ O_ℓ = O_{ℓ, 2ℓ^{−1/2}}, there exist orthonormal vectors w_1, …, w_ℓ ∈ R^n and numbers a_ij such that v_i = Σ_{j≤i} a_ij w_j and |a_ij| ≤ 2ℓ^{−1/2} |a_ii| for j < i, with a_ii ≠ 0 for all i. Denote X_i = Γ(w_i) and Y_i = Σ_{j≤i} a_ij X_j. Then X_1, …, X_ℓ are independent standard Gaussian random vectors in R^d, and µ̃_v = ℓ^{−1} Σ_{i=1}^{ℓ} δ_{Y_i}. We may thus apply Lemma 3.3 (with N = ℓ and ε = 2ℓ^{−1/2}) and conclude that for any v ∈ O_ℓ,

P(E(v)) ≥ 1 − C exp(−c ℓ^{1/2}) − C ℓ^{d} ℓ^{−2d} ≥ 1 − C′ ℓ^{−1}. (26)

Let F ⊆ O_ℓ be the collection of all v ∈ O_ℓ for which the event E(v) holds. Then F is a random subset of O_ℓ (depending on the random matrix Γ). According to (26),

E λ(F) = E ∫_{O_ℓ} 1_F(v) dλ(v) = ∫_{O_ℓ} E 1_F(v) dλ(v) = ∫_{O_ℓ} P(E(v)) dλ(v) ≥ 1 − C′ ℓ^{−1},

where 1_F is the characteristic function of F. Therefore, by Chebyshev's inequality,

P( λ(F) ≤ 1 − 2C′ℓ^{−1/2} ) ≤ E(1 − λ(F)) / (2C′ℓ^{−1/2}) ≤ ℓ^{−1/2}/2 < 1.

We may assume that 2C′ℓ^{−1/2} ≤ 1/2, thanks to (25). We showed that with positive probability λ(F) ≥ 1 − 2C′ℓ^{−1/2}.
Recall that µ̃_v = Γ_*(µ_v) is Cℓ^{−c/d}-radial for any v ∈ F. Hence, according to Lemma 2.3(c), with positive probability of selecting the Gaussian random matrix Γ, the measure

∫_{O_ℓ} Γ_*(µ_v) dλ(v) = Γ_*( ∫_{O_ℓ} µ_v dλ(v) ) = Γ_*(µ)

is C′ℓ^{−c′/d}-radial on R^d. □

The Grassmannian G_{n,k} of all k-dimensional subspaces in R^n carries a unique rotationally invariant probability measure, which will be referred to as the uniform probability measure on G_{n,k}. When Γ is a d × n standard Gaussian random matrix, the kernel of Γ is a random (n−d)-dimensional subspace, distributed uniformly in the Grassmannian G_{n,n−d}. For a subspace E ⊆ R^n we write E⊥ = {x ∈ R^n ; ∀y ∈ E, x·y = 0} for its orthogonal complement.

Lemma 4.5
Let 0 ≤ k ≤ n−1 be integers, and let µ be a Borel probability measure on R^n with µ({0}) = 0. Suppose that E is a random k-dimensional subspace, distributed uniformly in G_{n,k}. Then µ(E) = 0 with probability one of selecting E.

Proof: By induction on k. The case k = 0 holds trivially. Suppose now that k ≥ 1, let n be such that k ≤ n−1, and let µ be a Borel probability measure on R^n with µ({0}) = 0. Since µ({0}) = 0, there are at most countably many one-dimensional subspaces ℓ_0 ⊂ R^n with µ(ℓ_0) > 0. Let ℓ_0 be a random one-dimensional subspace, distributed uniformly in G_{n,1}. Then with probability one, µ(ℓ_0) = 0. Denote ν = (Proj_{ℓ_0⊥})_*µ, a measure supported on an (n−1)-dimensional subspace, with ν({0}) = 0. Let F be a random (k−1)-dimensional subspace in ℓ_0⊥, distributed uniformly. By the induction hypothesis, ν(F) = 0 with probability one. Denoting E = Proj_{ℓ_0⊥}^{−1}(F), we see that µ(E) = ν(F) = 0 with probability one. From the uniqueness of the Haar measure, E is distributed uniformly in G_{n,k}, and the lemma follows. □

Corollary 4.6
Let 1 ≤ d ≤ n be integers and let 0 < ε < 1/2. Let µ be a Borel probability measure on R^n with µ({0}) = 0. Assume that

sup_{θ ∈ S^{n−1}} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) ≤ (c̃ε)^{C̃d}. (27)

Let Γ be a d × n standard Gaussian random matrix. Then, with positive probability of selecting Γ, the measure Γ_*µ on R^d is ε-radial proper. Here, 0 < c̃ < 1 and C̃ > 1 are universal constants. (In fact, we have a lower bound of 1 − (c̃ε)^{C̃d/2} for the aforementioned probability.)

Proof: Throughout this proof, we write C for the universal constant from Lemma 4.4. We define c̃ = (10C)^{−1} and C̃ = 100C. It is elementary to verify that with this choice of universal constants, there exists an integer ℓ such that

ℓ ≥ (5C/ε)^{Cd}    and    (c̃ε)^{C̃d} ≤ (ε/3) ℓ^{−4}.

Note that the left-hand side of (27) is at least 1/n. Indeed,

sup_{θ ∈ S^{n−1}} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) ≥ ∫_{S^{n−1}} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) dσ_{n−1}(θ) = ∫_{S^{n−1}} (|x|²/n) dR_*(µ)(x) = 1/n.

We conclude that d < ℓ ≤ n, and

sup_{θ ∈ S^{n−1}} ∫_{S^{n−1}} (x·θ)² dR_*µ(x) ≤ (ε/3) ℓ^{−4}.

According to Lemma 4.3, there exists a Borel probability measure λ on O_ℓ such that

d_TV( µ, ∫_{O_ℓ} µ_v dλ(v) ) ≤ ε/3. (28)

Denote ν = ∫_{O_ℓ} µ_v dλ(v). From Lemma 4.4 the measure Γ_*(ν) is (ε/3)-radial, with probability at least 1 − ℓ^{−1/2} of selecting Γ, because ℓ ≥ (5C/ε)^{Cd}. Additionally,

d_TV(Γ_*µ, Γ_*ν) ≤ ε/3,

by (28). From Lemma 2.3(a) we thus learn that Γ_*(µ) is ε-radial, with positive probability of selecting Γ. Moreover, Γ_*(µ)({0}) = µ(Γ^{−1}(0)) = 0 with probability one, according to Lemma 4.5. Hence, with positive probability, Γ_*(µ) is ε-radial proper. □

Selecting a position
Our goal in this section is to find an appropriate invertible linear transformation T on R^n such that T_*µ satisfies the requirements of Corollary 4.6. Our analysis is very much related to the results of Barthe [2], Carlen and Cordero-Erausquin [7] and Carlen, Lieb and Loss [8]. For x = (x_1, …, x_n) ∈ R^n we write x ⊗ x for the n × n matrix whose entries are (x_i x_j)_{i,j=1,…,n}. For a probability measure µ on the unit sphere S^{n−1} define

M(µ) = ∫_{S^{n−1}} (x ⊗ x) dµ(x).

Then M(µ) is a positive semi-definite matrix of trace one. Clearly, for any θ ∈ R^n, we have M(µ)θ·θ = ∫_{S^{n−1}} (x·θ)² dµ(x). More generally, for any subspace E ⊆ R^n,

∫_{S^{n−1}} |Proj_E(x)|² dµ(x) = Tr(Proj_E M(µ)) = Tr(M(µ) Proj_E),

the trace of the matrix M(µ) Proj_E. A Borel probability measure µ on S^{n−1} is called isotropic if M(µ) = Id/n, where Id is the identity matrix. Observe that when µ is isotropic, for any subspace E ⊆ R^n,

∫_{S^{n−1}} |Proj_E x|² dµ(x) = dim(E)/n. (29)

In particular, µ(E) ≤ dim(E)/n, and hence an isotropic probability measure is necessarily decent in the sense of Definition 1.1. A Borel probability measure µ on R^n with µ({0}) = 0 is called "potentially isotropic" if there exists an invertible linear map T on R^n such that (R ∘ T)_*µ is isotropic.

Lemma 5.1
Let µ be a Borel probability measure on S^{n−1} such that

µ(H) = 0 (30)

for any hyperplane H ⊂ R^n through the origin. Then µ is potentially isotropic.

Proof: Given an invertible linear map T : R^n → R^n we abbreviate M_µ(T) = M((R ∘ T)_*µ). Then M_µ(T) is a positive semi-definite matrix of trace one, and by the arithmetic-geometric means inequality, det M_µ(T) ≤ n^{−n}. Note that M_µ(T) = M_µ(λT) for any λ > 0. Consider the supremum of the continuous functional

T ↦ det M_µ(T) (31)

over the space of all invertible linear operators T : R^n → R^n of Hilbert–Schmidt norm one. We claim that the supremum is attained. Indeed, let T_1, T_2, … be a maximizing sequence of matrices. By passing to a subsequence if necessary, we may assume that T_i → T, for a certain matrix T of Hilbert–Schmidt norm one. We need to show that T is invertible. Denote by E the image of T, a subspace of R^n. We need to show that E = R^n. For any x ∈ S^{n−1} which is not in the kernel of T, we have T_i x → Tx ∈ E \ {0}, hence

|Proj_{E⊥}(R ∘ T_i)(x)|² → 0    as i → ∞. (32)

The kernel of T is at most (n−1)-dimensional, since the Hilbert–Schmidt norm of T is one. According to (30), the convergence in (32) occurs µ-almost-everywhere in x. Therefore,

Tr(M_µ(T_i) Proj_{E⊥}) = ∫_{S^{n−1}} |Proj_{E⊥}(R ∘ T_i)(x)|² dµ(x) → 0    as i → ∞.

We conclude that if E ≠ R^n, then det M_µ(T_i) → 0, in contradiction to the maximizing property of the sequence (T_i)_{i≥1}. Hence E = R^n and T is invertible. Thus the supremum of the functional (31) is attained at some invertible matrix T of Hilbert–Schmidt norm one. We will show that (R ∘ T)_*µ is isotropic. Without loss of generality we assume that T = Id (otherwise, replace µ with (R ∘ T)_*µ and note that this replacement does not affect the validity of the assumptions and the conclusions of the lemma). The matrix M(µ) = M_µ(Id) is a positive semi-definite matrix of trace one.
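As a numerical aside (not part of the proof), the variational mechanism used in the remainder of the argument, namely that contracting the top eigenspace of a non-scalar M(µ) increases the determinant, can be checked directly. The sketch below is illustrative only: it uses a discrete measure on the circle, which is not absolutely continuous as in (30), but no atom of it lies in E or in E⊥, which is all the computation needs; every helper name is ad hoc.

```python
# Check: if M(mu) is not scalar, then L_delta = Id - delta * Proj_E, with E the
# top eigenspace, increases det M_mu(L_delta) for small delta > 0 (a sketch in R^2).
import math

def moment_matrix(points, weights, T):
    """M((R o T)_* mu): second moments of the normalized images T x / |T x|."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for (x, y), p in zip(points, weights):
        u = T[0][0] * x + T[0][1] * y
        v = T[1][0] * x + T[1][1] * y
        r2 = u * u + v * v
        M[0][0] += p * u * u / r2
        M[0][1] += p * u * v / r2
        M[1][0] += p * u * v / r2
        M[1][1] += p * v * v / r2
    return M

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

angles = [0.0, 0.4, 1.9]
points = [(math.cos(a), math.sin(a)) for a in angles]
weights = [0.45, 0.35, 0.2]

M = moment_matrix(points, weights, [[1.0, 0.0], [0.0, 1.0]])
# top eigenvalue/eigenvector of the symmetric 2x2 matrix M
half_tr = (M[0][0] + M[1][1]) / 2.0
disc = math.sqrt(((M[0][0] - M[1][1]) / 2.0) ** 2 + M[0][1] ** 2)
lam1 = half_tr + disc
ex, ey = M[0][1], lam1 - M[0][0]      # eigenvector for lam1 (M is not scalar here)
norm = math.hypot(ex, ey)
ex, ey = ex / norm, ey / norm

delta = 0.05
L = [[1.0 - delta * ex * ex, -delta * ex * ey],
     [-delta * ex * ey, 1.0 - delta * ey * ey]]    # Id - delta * Proj_E
increased = det2(moment_matrix(points, weights, L)) > det2(M)
print(increased)
```

The determinant det M_µ(L_δ) here moves from an eigenvalue pair close to (0.78, 0.22) toward the balanced pair (1/2, 1/2), matching the strict inequality derived below in (35).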
The matrix M(µ) is non-singular, thanks to (30), and is therefore in fact positive definite. Moreover, for any function u : S^{n−1} → R which is positive µ-almost-everywhere and for any θ ∈ S^{n−1},

∫_{S^{n−1}} u(x) (x·θ)² dµ(x) > 0. (33)

Assume by contradiction that M(µ) is not a scalar matrix. Denote by λ_1 the largest eigenvalue of M(µ) = M_µ(Id), and let E ⊂ R^n be the eigenspace corresponding to the eigenvalue λ_1. Then 1 ≤ dim(E) ≤ n−1. For 0 ≤ δ < 1 consider the linear operator

L_δ(x) = x − δ Proj_E(x)    (x ∈ R^n).

Then
Proj_E L_δ = (1−δ) Proj_E while Proj_{E⊥} L_δ = Proj_{E⊥}. This means that R ∘ L_δ strengthens the E⊥-component of a given point in R^n, at the expense of its E-component. More precisely, for any x ∈ S^{n−1} and 0 ≤ δ < 1 there exists ε_{δ,x} ≥ 0 such that

Proj_{E⊥}(R(L_δ x)) = (1 + ε_{δ,x}) Proj_{E⊥}(x).

Moreover, when x ∉ E ∪ E⊥ we have the inequality ε_{δ,x} ≥ ε(x) δ for some ε(x) > 0 depending only on x. Consequently, for any 0 ≤ δ < 1 and a non-zero vector θ ∈ E⊥,

∫_{S^{n−1}} (R(L_δ x) · θ)² dµ(x) = ∫_{S^{n−1}} (1 + ε_{δ,x})² (x·θ)² dµ(x) ≥ ∫_{S^{n−1}} (x·θ)² dµ(x) + 2δ ∫_{S^{n−1}} ε(x) (x·θ)² dµ(x). (34)

The symmetric matrix M_µ(L_δ) is of trace one, and it depends smoothly on δ. Denote D = dM_µ(L_δ)/dδ |_{δ=0}, a traceless symmetric matrix. According to our assumption (30), the condition x ∉ E ∪ E⊥ holds µ-almost-everywhere, as 1 ≤ dim(E) ≤ n−1. Therefore ε(x) > 0 for µ-almost every x ∈ S^{n−1}. From (33) and (34) we learn that for any 0 ≠ θ ∈ E⊥,

Dθ·θ = d/dδ ( ∫_{S^{n−1}} (R(L_δ x) · θ)² dµ(x) ) |_{δ=0} > 0. (35)

Recall that E is the eigenspace corresponding to the maximal eigenvalue λ_1 of M(µ). Denote by λ_2 the second-largest eigenvalue, which is still positive but is strictly smaller than λ_1. Then Proj_{E⊥} M(µ)^{−1} ≥ λ_2^{−1} Proj_{E⊥} in the sense of symmetric matrices. Using elementary linear algebra, we deduce from (35) that

Tr(Proj_{E⊥} M(µ)^{−1} D) ≥ λ_2^{−1} Tr(Proj_{E⊥} D) > 0.

Since
Tr(D) = 0, we have Tr(Proj_E D) = −Tr(Proj_{E⊥} D), and

d log det M_µ(L_δ)/dδ |_{δ=0} = Tr( M(µ)^{−1} D ) = Tr(Proj_E M(µ)^{−1} D) + Tr(Proj_{E⊥} M(µ)^{−1} D) = Tr(Proj_E D)/λ_1 + Tr(Proj_{E⊥} M(µ)^{−1} D) ≥ ( λ_2^{−1} − λ_1^{−1} ) Tr(Proj_{E⊥} D) > 0,

in contradiction to the maximality of det M(µ). Hence our assumption that M(µ) is not a scalar matrix was absurd. Since Tr(M(µ)) = 1, necessarily M(µ) = Id/n and µ is isotropic. □

For a subspace E ⊂ R^n and δ > 0 we write

N_δ(E) = { rx ; |x| = 1, r ≥ 0, d(x, E ∩ S^{n−1}) ≤ δ } (36)

where d(x, A) = inf_{y∈A} |x − y|. Then N_δ(E) is the projective δ-neighborhood of E. We will need the following auxiliary continuity lemma. It is the only time in this text where the non-degeneracy conditions of Definition 1.1 are used.

Lemma 5.2
Let n ≥ 2 be an integer and let µ be a probability measure on R^n with µ({0}) = 0 such that

µ(E) < dim(E)/n

for any subspace E ⊂ R^n other than R^n and {0}. Suppose there exists a sequence of potentially isotropic probability measures on S^{n−1} that converges to R_*µ in the weak* topology. Then µ is potentially isotropic.

Proof: From the assumptions of the lemma, there exist Borel probability measures µ_1, µ_2, … on S^{n−1} and invertible linear maps T_1, T_2, … for which the following holds:

• µ_i → R_*µ in the weak* topology as i → ∞; and

• (R ∘ T_i)_*µ_i is isotropic for all i.

Without loss of generality we may assume that the T_i's are positive definite operators of trace one: if not, we will replace the operator T_i by rU_iT_i, where U_i is an orthogonal transformation such that U_iT_i is positive definite and r^{−1} is the trace of U_iT_i. Such a replacement does not affect the isotropicity of (R ∘ T_i)_*µ_i. Furthermore, replacing T_i (i = 1, 2, …) with a subsequence, we may assume that T_i → T, where T is a positive semi-definite matrix of trace one.

We claim that T is invertible. Assume by contradiction that T is singular. Denote by E ⊂ R^n the kernel of T, and set k = dim(E). Then 1 ≤ k ≤ n−1 and hence µ(E) < k/n. Since E = ∩_{δ>0} N_δ(E), there exists δ > 0 such that

µ(N_δ(E)) < k/n.

The set N_δ(E) ∩ S^{n−1} is closed in S^{n−1}. Since µ_i → R_*µ in the weak* topology,

limsup_{i→∞} µ_i(N_δ(E)) ≤ R_*µ(N_δ(E)) = µ(N_δ(E)) < k/n. (37)

Recall that T_i → T, that the T_i's are self-adjoint, and that E is the kernel of T, hence E⊥ is the image of T. This entails, roughly speaking, that for any x ∉ E, the sequence T_i x is "approaching E⊥". In more precise terms, we conclude that for any x
∉ N_δ(E),

| Proj_{E⊥}( T_i x / |T_i x| ) | → 1    as i → ∞. (38)

Moreover, the convergence in (38) is uniform over x ∈ R^n \ N_δ(E). Consequently, from (37) and (38),

liminf_{i→∞} ∫_{S^{n−1} \ N_δ(E)} | Proj_{E⊥}( T_i x / |T_i x| ) |² dµ_i(x) = liminf_{i→∞} µ_i(S^{n−1} \ N_δ(E)) > 1 − k/n. (39)

Recall that (R ∘ T_i)_*µ_i is isotropic. According to (29),

∫_{S^{n−1}} | Proj_{E⊥}( T_i x / |T_i x| ) |² dµ_i(x) = dim(E⊥)/n = 1 − k/n,

in contradiction to (39). Thus our assumption that T is singular was absurd, and T is necessarily invertible. Since T_i → T with T being invertible, we know that for any x ∈ S^{n−1},

T_i x / |T_i x| → Tx / |Tx|    as i → ∞,

and the convergence is uniform in S^{n−1}. Therefore, for any θ ∈ S^{n−1},

∫_{S^{n−1}} ( T_i x / |T_i x| · θ )² dµ_i(x) → ∫_{S^{n−1}} ( Tx / |Tx| · θ )² dR_*µ(x)    as i → ∞. (40)

However, the left-hand side of (40) is always 1/n. We see that (R ∘ T ∘ R)_*µ = (R ∘ T)_*µ is isotropic, and therefore µ is potentially isotropic. □

Corollary 5.3  Let n be a positive integer and let µ be a Borel probability measure on R^n with µ({0}) = 0 such that

µ(E) < dim(E)/n (41)

for any subspace E ⊂ R^n other than R^n and {0}. Then µ is potentially isotropic.

Proof: Consider a sequence µ_1, µ_2, … of Borel probability measures on S^{n−1}, absolutely continuous with respect to the Lebesgue measure on S^{n−1}, that converges to R_*µ in the weak* topology. The µ_i's are potentially isotropic by Lemma 5.1. Therefore µ is potentially isotropic according to Lemma 5.2. □

A clever proof of Corollary 5.3 for the case where the measure µ is discrete and has finite support appears in the works of Barthe [2], Carlen and Cordero-Erausquin [7, Lemma 3.5] and Carlen, Lieb and Loss [8].
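For a discrete measure, the isotropic position whose existence Corollary 5.3 guarantees can also be computed by a classical fixed-point scheme, essentially Tyler's M-estimator iteration, which is close in spirit to the works just cited but is not the compactness argument used here. The sketch below (all names ad hoc) iterates the hypothetical update T ← M((R ∘ T)_*µ)^{−1/2} T on an R² example satisfying the strict condition (41), and checks that the moment matrix converges to Id/2:

```python
# Fixed-point computation of the isotropic position for a discrete measure on the
# circle (a sketch; Tyler's M-estimator iteration, not the argument of the text).
# The weights below keep every line of mass < 1/2, the planar case of (41).
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv_sqrt_2x2(A):
    """Inverse square root of a symmetric positive definite 2x2 matrix,
    via A^(1/2) = (A + sqrt(det A) * I) / sqrt(tr A + 2 * sqrt(det A))."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    S = [[(A[0][0] + s) / t, A[0][1] / t], [A[1][0] / t, (A[1][1] + s) / t]]
    d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]

def moment_matrix(points, weights, T):
    """M((R o T)_* mu) for mu = sum_i weights[i] * delta_{points[i]}."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for (x, y), p in zip(points, weights):
        u = T[0][0] * x + T[0][1] * y
        v = T[1][0] * x + T[1][1] * y
        r2 = u * u + v * v
        M[0][0] += p * u * u / r2
        M[0][1] += p * u * v / r2
        M[1][0] += p * u * v / r2
        M[1][1] += p * v * v / r2
    return M

angles = [0.1, 1.0, 2.0]
points = [(math.cos(a), math.sin(a)) for a in angles]
weights = [0.4, 0.35, 0.25]          # each line carries mass < 1/2

T = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(2000):
    T = mat_mul(inv_sqrt_2x2(moment_matrix(points, weights, T)), T)
    fro = math.sqrt(sum(T[i][j] ** 2 for i in range(2) for j in range(2)))
    T = [[T[i][j] / fro for j in range(2)] for i in range(2)]  # M is scale-invariant

M = moment_matrix(points, weights, T)
err = max(abs(M[0][0] - 0.5), abs(M[1][1] - 0.5), abs(M[0][1]))
print(err < 1e-4)
```

At a fixed point M((R ∘ T)_*µ) = Id/2, i.e. (R ∘ T)_*µ is isotropic. Convergence of this scheme under the strict subspace-mass condition is classical (Tyler; Kent and Tyler), which illustrates why the strict inequality in (41) matters: an atom of mass exactly 1/2 on a line is a degenerate case.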
We were not able to generalize the argument of [2], [7] and [8] to the case of a general measure satisfying (41). The proof presented above is unfortunately longer, but perhaps it has the advantage of being geometrically straightforward.

Lemma 5.4
Let n be a positive integer and α > 0. Suppose that µ is an α-decent probability measure on R^n. Then for any 0 < ε < 1, there exists a linear transformation T : R^n → R^n such that ν = T_*(µ) satisfies

∫_{S^{n−1}} (x·θ)² dR_*ν(x) ≤ α + ε    for all θ ∈ S^{n−1}.

Proof:
By induction on the dimension n. The case n = 1 is obvious. Suppose that n ≥ 2. We may assume that µ(H) < 1 for any hyperplane H ⊂ R^n that passes through the origin (otherwise, invoke the induction hypothesis). We may also assume that α = sup_E µ(E)/dim(E), where the supremum runs over all subspaces {0} ≠ E ⊆ R^n. Corollary 5.3 takes care of the case where µ(E) < dim(E)/n for any subspace E ⊂ R^n with 1 ≤ dim(E) ≤ n−1. We may thus focus on the case where there exists a proper subspace E ⊂ R^n with µ(E) ≥ dim(E)/n. Clearly α ≥ 1/n. Consequently, there is a subspace E ⊂ R^n, with 1 ≤ dim(E) ≤ n−1, such that

( α − ε/(3n) ) dim(E) ≤ µ(E) < 1.

Let T : R^n → R^n be the map defined by

T(x) = Proj_{E⊥} x  for x ∉ E,    T(x) = x  for x ∈ E.

The map T may be viewed as a "stratified linear map" as in Furstenberg [9]. Set λ = µ(E) > 0. The probability measure T_*µ on R^n is supported on E ∪ E⊥, and it may be decomposed as

T_*µ = λ µ_E + (1 − λ) µ_{E⊥},

where µ_E = µ|_E is the conditioning of µ to E, and µ_{E⊥} is a certain probability measure supported on E⊥. Clearly, µ_E = µ|_E is (α/λ)-decent. Regarding µ_{E⊥}, let us select a subspace F ⊆ E⊥. Then µ_{E⊥}(F) = 0 if F = {0}, and otherwise

(1−λ) µ_{E⊥}(F) = µ(T^{−1}(F \ {0})) = µ((F ⊕ E) \ E) = µ(F ⊕ E) − µ(E) ≤ α (dim(E) + dim(F)) − (α − ε/(3n)) dim(E) ≤ (α + ε/
3) dim(F), where E ⊕ F = { x + y ; x ∈ E, y ∈ F } is the subspace spanned by E and F. Consequently, µ_{E⊥} is an ((α + ε/3)/(1−λ))-decent measure on E⊥. We may apply the induction hypothesis to µ_E and to µ_{E⊥}. We conclude that there exists a linear transformation S : R^n → R^n, with S(E) ⊆ E and S(E⊥) ⊆ E⊥, such that

∫_{S^{n−1}} (x·θ)² d(R ∘ S ∘ T)_*µ(x) ≤ α + 2ε/3    for any θ ∈ S^{n−1}. (42)

The problem is that S ∘ T is not a linear map. However, it is easy to approximate it by a linear map: for 0 < δ < 1 denote T_δ x = x − δ Proj_E x. Then (R ∘ T)(x) = lim_{δ→1⁻} (R ∘ T_δ)(x) for any 0 ≠ x ∈ R^n. Consequently,

(R ∘ S ∘ T_δ)_*µ = (R ∘ S ∘ R ∘ T_δ)_*µ → (R ∘ S ∘ R ∘ T)_*µ = (R ∘ S ∘ T)_*µ    as δ → 1⁻,

in the weak* topology. We conclude that the matrices M((R ∘ S ∘ T_δ)_*µ) tend to M((R ∘ S ∘ T)_*µ) as δ → 1⁻. Hence, by (42), for some 0 < δ < 1,

∫_{S^{n−1}} (x·θ)² d(R ∘ S ∘ T_δ)_*µ(x) ≤ α + ε    for any θ ∈ S^{n−1}.

The map S ∘ T_δ is the desired linear transformation. This completes the proof. □

Proof of Theorem 1.3:
Suppose that µ is an η-decent probability measure on R^n. According to Lemma 5.4, there exists a linear map S : R^n → R^n such that ν = S_*µ satisfies

∫_{S^{n−1}} (x·θ)² dR_*ν(x) ≤ 2η    for all θ ∈ S^{n−1}.

We invoke Corollary 4.6 for the measure ν. We see that if the positive integer d and 0 < ε < 1/2 are such that 2η ≤ (c̃ε)^{C̃d}, then there exists a d × n matrix Γ for which the measure Γ_*ν on R^d is ε-radial proper. Setting T = ΓS, a d × n matrix, we conclude that T_*µ = Γ_*ν is a measure on R^d which is ε-radial proper. □

Proof of Corollary 1.4:
We may assume that n exceeds a given large universal constant. Denote d = ⌈c̄ √(log n)⌉ and δ = e^{−d/2}, for a small universal constant 0 < c̄ < 1 such that n ≥ (C/δ)^{Cd}, where C is the universal constant from Theorem 1.3. According to Theorem 1.3, we may pass to a d-dimensional marginal and assume that our measure µ is a proper δ-radial measure on R^d. For t ∈ R and L > 0 let χ_{t,L} be the L-Lipschitz function on the real line which equals zero on (−∞, t] and one on [t + 1/L, ∞). Recall the Kantorovich–Rubinstein duality as in (5) above. Then, for any probability measure ν on the unit sphere S^{d−1} and 0 < t < c ≤ 1/2,

ν({x ; x_1 ≥ t}) ≥ ∫_{S^{d−1}} χ_{t,d}(x_1) dν(x) ≥ ∫_{S^{d−1}} χ_{t,d}(x_1) dσ_{d−1}(x) − d · W_1(ν, σ_{d−1}),

where x = (x_1, …, x_d) are the coordinates of x ∈ S^{d−1}. The integral with respect to σ_{d−1} may be estimated directly, and it is bounded from below by c e^{−Ct²d} (note that the marginal of σ_{d−1} on the first coordinate has a density that is proportional to (1−t²)^{(d−3)/2} on [−1, 1]). We conclude that for any 0 < t < c and an interval J = [a, b] ⊂ (0, ∞) with µ(S(J)) ≥ δ,

µ|_{S(J)}({x ; x_1 ≥ at}) ≥ R_*(µ|_{S(J)})({x ; x_1 ≥ t}) ≥ c e^{−Ct²d} − d · W_1(R_*(µ|_{S(J)}), σ_{d−1}) ≥ c e^{−Ct²d} − dδ ≥ c′ e^{−C′t²d}. (43)

Similarly, for any interval J = [a, b] ⊂ (0, ∞) with µ(S(J)) ≥ δ,

µ|_{S(J)}({x ; |x_1| ≥ 20b/√d}) ≤ R_*(µ|_{S(J)})({x ; |x_1| ≥ 20/√d}) ≤ ∫_{S^{d−1}} χ_{20/√d − d^{−1}, d}(|x_1|) dσ_{d−1}(x) + d · W_1(R_*(µ|_{S(J)}), σ_{d−1}) ≤ 1/10, (44)

where the integral with respect to σ_{d−1} is estimated in a straightforward manner. Let M̃ > 0 be a quantile with µ({x ; |x| ≤ M̃}) ≥ 3/4 and µ({x ; |x| ≥ M̃}) ≥ 1/4. Let a > 0 be such that the interval J_1 = [a, M̃] satisfies µ(S(J_1)) ≥ 2/3. We apply (44) for the interval J_1 = [a, M̃] to deduce that

µ({x ; |x_1| ≥ 20M̃/√d}) ≤ 1/3 + (2/3) · µ|_{S(J_1)}({x ; |x_1| ≥ 20M̃/√d}) ≤ 1/3 + (2/3) · (1/10) < 1/2. (45)

Suppose that M > 0 satisfies (3) with the linear functional ϕ(x) = x_1. We learn from (45) that necessarily M ≤ 20M̃/√d. Let b > 0 be such that the interval J_2 = [M̃, b] satisfies µ(S(J_2)) ≥ 1/5. We apply (43) for the interval J_2 = [M̃, b] and conclude that, for any 0 ≤ t ≤ c√d/20,

µ({x ; x_1 ≥ tM}) ≥ (1/5) · µ|_{S(J_2)}({x ; x_1 ≥ 20tM̃/√d}) ≥ c′ exp(−C′t²).

Since c√d/20 ≥ c̃ log^{1/4} n, the proof of the lower bound for µ({x ; x_1 ≥ tM}) is complete. The proof of the lower bound for µ({x ; x_1 ≤ −tM}) is almost entirely identical. The corollary is thus proven. □

Remarks.
1. It is conceivable that a more delicate analysis yields a better bound for R_n in Corollary 1.4. However, note that R_n ≤ C√(log n), as is shown by the example where µ is distributed uniformly on n linearly independent vectors in R^n. Compare the "super-gaussian" tail behavior of Corollary 1.4 with the almost sub-gaussian bounds in the convex case in [13] and in Giannopoulos, Paouris and Pajor [10].

2. The central limit theorem for convex bodies [14, 15] states that any uniform probability measure on a high-dimensional convex set has some low-dimensional marginals that are approximately Gaussian. It is clear that there are perfectly regular probability measures in high dimension (e.g., a mixture of two Gaussians) without any approximately Gaussian marginals. Therefore, a geometric condition such as convexity is indeed relevant when we look for approximately Gaussian marginals. For arbitrary high-dimensional measures without convexity properties, we may still state the more modest conclusion that some of the marginals are approximately spherically-symmetric, according to Theorem 1.3. There is no hope for approximate Gaussians.

Theorem 1.3 bears a strong relation to the proof of the central limit theorem for convex bodies presented in [14, 15] (see [16] for another proof, which at present works only for a subclass of convex bodies). That proof begins by showing that marginals of the uniform measure on a convex body are approximately spherically-symmetric. The approximation in [14, 15] is rather strong compared to Theorem 1.3, but nevertheless, a simple compactness argument enables us to leverage Theorem 1.3 in order to obtain the desired type of approximation.
In principle, this approach yields a slightly different proof of the central limit theorem for convex sets, albeit with weaker estimates.

The Euclidean structure with respect to which a random projection "works" with high probability seems a priori different in Theorem 1.3 and in the central limit theorem for convex bodies. In Theorem 1.3 we use the Euclidean structure with respect to which the covariance matrix of R_*µ is scalar, while in the central limit theorem for convex bodies, the most natural position is to require the covariance matrix of µ itself to be a scalar matrix (compare also with [18], [21]). For convex bodies, these Euclidean structures are close to each other, since most of the mass of a normalized convex body is located very close to a sphere (see [17]).

3. The linear map T in Theorem 1.3 may be assumed to be an orthogonal projection. This follows from the following simple observation we learned from G. Schechtman: any n-dimensional ellipsoid has an ⌈n/2⌉-dimensional projection which is precisely a Euclidean ball. Therefore, in order to show that T may be chosen to be an orthogonal projection, one essentially has to verify that a ⌈d/2⌉-dimensional marginal of an ε-radial measure on R^d is ε^{1/2}-radial. We omit the details.

4. The isoperimetric inequality on the high-dimensional sphere, which is the cornerstone of the concentration of measure phenomenon (see Milman and Schechtman [24]), is not used in the proof of Theorem 1.3. We do apply Levy's lemma, which embodies the isoperimetric inequality, in the proof of Lemma 2.4, but only in d dimensions. The dimension d there is typically not very large.

5. For a positive integer d and ε > 0 denote by N(ε, d) the minimal dimension with the following property: whenever N ≥ N(ε, d), any N-dimensional Banach space has a d-dimensional subspace which is ε-close to a Hilbert space.
The classical Dvoretzky theorem states that N(ε, d) ≤ exp(Cd/ε²), where C > 0 is a universal constant (see Milman [23] and references therein). The power of 1/ε² in the exponent in the bound for N(ε, d) can be made arbitrarily close to one at the expense of increasing the universal constant C (see Schechtman [27]). It is conceivable, however, that these bounds are still far from optimal; perhaps N(ε, d) can be made as small as (C/ε)^{Cd}? See Milman [22] for a discussion of this conjecture. An affirmative answer for the case d = 2 was given by Gromov (see [22]), using a topological argument which does not seem to generalize to higher dimensions. The analogy with the present article suggests to try and use Theorem 1.3, or ideas from its proof, in order to improve the bounds in Dvoretzky's theorem. Furthermore, the operation of taking a marginal is dual, via the Fourier transform, to the operation of restriction to a subspace. So, for instance, suppose a norm ‖·‖ in R^n may be represented as

‖x‖ = ∫_{R^n} |x·θ| dµ(θ) (46)

for a compactly-supported probability measure µ on R^n. In this case, we may consider subspaces E ⊂ R^n for which (Proj_E)_*µ is ε-radial, and expect that the restriction of ‖·‖ to these subspaces is close, in a certain sense, to the Euclidean norm. See Koldobsky [19, Chapter 6] for a comprehensive discussion of norms admitting representations in the spirit of (46).

While this approach may possibly yield some meaningful estimates for some classes of normed spaces, it has limitations. Theorem 1.3 is proven by considering a random marginal with respect to an appropriate Euclidean structure, i.e., a projection of the given measure to a subspace which is distributed uniformly over the Grassmannian of all d-dimensional subspaces in R^n. However, for Banach spaces such as ℓ^N_∞, a random subspace is not sufficiently close to a Hilbert space (see Schechtman [28]), and there are better choices than the random one.
(Indeed, the ℓ^N_∞ norm cannot be represented as in (46) or in a similar way; see Theorem 6.13 in Koldobsky [19], going back to Misiewicz.) A direct application of Theorem 1.3 is thus quite unlikely to provide new information regarding approximately Hilbertian subspaces for all finite-dimensional normed spaces.

6. In principle, the measures T_*(µ) in Theorem 1.3 are not only approximately radial, but are also approximately a mixture of isotropic Gaussians. Indeed, it is well-known that any d-dimensional marginal of the measure σ_{k−1}, for d ≪ k, is approximately an isotropic d-dimensional Gaussian measure. Thus, we may project an approximately-radial measure on R^k to any d-dimensional subspace, and obtain a measure which is approximately, in some sense, a mixture of isotropic Gaussians. We did not rigorously investigate this approximation property on a precise, quantitative level.

This section contains a corollary to Theorem 1.3, pertaining to probability measures supported on infinite-dimensional spaces. We begin with a lemma regarding distributions on finite-dimensional spaces. Let n ≥ 1 be an integer, suppose that µ is a Borel probability measure on R^n and let 0 < a ≤ 1. A subspace E ⊆ R^n is "a-basic for µ" if

(i) µ(E) ≥ a;

(ii) µ(F) < a for any proper subspace F ⊊ E.

Note that any subspace E ⊆ R^n with µ(E) ≥ a contains an a-basic subspace. Also, suppose T : R^n → R^m is a linear map, and let E ⊆ R^n be an a-basic subspace for µ containing the kernel of T. Then T(E) is a-basic for T_*(µ).

Lemma 7.1
Let n ≥ 1 be an integer, let 0 < a ≤ 1, and let µ be a Borel probability measure on R^n. Then there are only finitely many subspaces E ⊆ R^n that are a-basic for µ.

Proof: We will prove by induction on k ≥ 0 the following statement: For any integer n ≥ k, any 0 < a ≤ 1, and any Borel probability measure µ on R^n, there are at most finitely many subspaces E ⊆ R^n, of dimension at most k, that are a-basic for the measure µ. The statement clearly implies the lemma. The case k = 0 is easy, as there is only one 0-dimensional subspace of R^n.

Let k ≥ 1. Suppose that n ≥ k is an integer, 0 < a ≤ 1, and let µ be a Borel probability measure on R^n. Denote by G the family of all subspaces E ⊆ R^n of dimension at most k that are a-basic for the measure µ. We need to show that

  #(G) < ∞.    (47)

First, note that it is sufficient to prove (47) under the additional assumption that µ({0}) = 0. Indeed, denote ε = µ({0}). If a ≤ ε then there is only one a-basic subspace of R^n, namely the subspace {0}, and (47) clearly holds. In the non-trivial case where a > ε, we may replace µ by (µ − εδ₀)/(1 − ε) and a by (a − ε)/(1 − ε). The family of basic subspaces remains exactly the same. From now on, we will thus assume that µ({0}) = 0.

Denote by E₀ ⊆ G the collection of all subspaces E ⊆ R^n that are a-basic for µ, with dim(E) ≤ k, for which µ(F) < a²/2 for any proper subspace F ⊊ E. We will prove that

  #(E₀) ≤ 2/a < ∞.    (48)

To that end, let Ẽ ⊆ E₀ be any finite subset, and denote N = #(Ẽ). For any two distinct subspaces E₁, E₂ ∈ Ẽ we have µ(E₁ ∩ E₂) < a²/2, as E₁ ∩ E₂ is a proper subspace of E₁. According to the inclusion-exclusion principle,

  1 ≥ µ( ⋃_{E ∈ Ẽ} E ) ≥ Σ_{E ∈ Ẽ} µ(E) − Σ_{{E₁,E₂} ⊆ Ẽ, E₁ ≠ E₂} µ(E₁ ∩ E₂) > Na − N(N−1)a²/4,

where we used the fact that µ(E) ≥ a for any E ∈ Ẽ, since E is a-basic. On the other hand, if N = ⌈2/a⌉ then 2 ≤ Na < 2 + a, and hence

  Na − N(N−1)a²/4 = [Na − (Na)²/4] + Na · (a/4) > [1 − a²/4] + a/2 ≥ 1,    (49)

since (Na − 2)² < a² and 0 < a ≤ 1. The two displayed inequalities contradict each other. Thus E₀ cannot contain ⌈2/a⌉ distinct subspaces, hence #(E₀) ≤ ⌈2/a⌉ − 1 ≤ 2/a, and (48) is proven.

Next, denote by G' the family of all subspaces E ⊆ R^n that are a-basic, with dim(E) ≤ k, for which there exists a proper subspace F ⊊ E with µ(F) ≥ a²/2. In view of (48), in order to deduce (47) it suffices to show that

  #(G') < ∞.    (50)

Whenever a subspace E ⊆ R^n contains a proper subspace F ⊊ E with µ(F) ≥ a²/2, it also contains an a²/2-basic proper subspace F₀ ⊊ E with dim(F₀) ≤ k − 1. By the induction hypothesis, there are only finitely many subspaces F₀ ⊆ R^n that are a²/2-basic for µ whose dimension is at most k − 1. Fix such an a²/2-basic subspace F₀. Let F be the collection of all subspaces E ⊆ R^n that are a-basic, contain F₀, and satisfy dim(E) ≤ k. The task of proving (50) and completing the proof of the lemma is reduced to showing that #(F) < ∞.

Note that dim(F₀) ≥ 1, as µ({0}) = 0 < a²/2, and hence {0} is not an a²/2-basic subspace. Denote by P = Proj_{F₀⊥} the orthogonal projection operator onto F₀⊥ in R^n. Then ν = P_*(µ) is a Borel probability measure on F₀⊥. For any E ∈ F, the subspace P(E) is an a-basic subspace for the measure ν, and dim(P(E)) = dim(E) − dim(F₀) ≤ k − 1. From the induction hypothesis, we see that the set {P(E); E ∈ F} is finite. However, P(E₁) ≠ P(E₂) for any two distinct E₁, E₂ ∈ F. Thus #(F) < ∞, as promised. The lemma is proven. □

An alternative proof of Lemma 7.1 was suggested by N. Alon. His idea is to replace the first part of the proof of the induction step with the known fact that there exists a finite set A ⊂ R^n that intersects any subspace of measure at least a (see, e.g., Alon and Spencer [1, Section 13.4]).

We write R^∞ for the linear space of infinite sequences a = (a₁, a₂, ...) with a_i ∈ R for all i ≥ 1.
The space R^∞ is endowed with the standard product topology (also known as the Tychonoff topology) and the corresponding Borel σ-algebra. The projection map P_n : R^∞ → R^n is defined by

  P_n(x) = (x₁, ..., x_n)  for x = (x₁, x₂, ...) ∈ R^∞.

Then P_n is a continuous, linear map. Note that any finite-dimensional subspace E ⊂ R^∞ is a closed set. Also, for any subspace E ⊆ R^∞ we have

  dim(E) = sup_n dim(P_n(E)).    (51)

With a slight abuse of notation, for m ≥ n ≥ 1 we also write P_n : R^m → R^n for the projection operator defined by P_n(x₁, ..., x_m) = (x₁, ..., x_n). We will also use the ridiculous space R⁰ = {0}, and P₀(x) = 0 for any x. Let ε > 0 and let X be a measurable linear space in which all finite-dimensional subspaces are measurable. A probability measure µ on X is called ε-decent if for any finite-dimensional subspace E ⊆ X,

  µ(E) ≤ ε · dim(E).

Lemma 7.2
Let ε > 0 and let µ be a Borel probability measure on R^∞. Suppose that µ is ε-decent. Then there exists N ≥ 1 such that (P_N)_*µ is 2ε-decent.

Proof: For n ≥ 0 denote µ_n = (P_n)_*µ, a Borel probability measure on R^n. We say that a subspace E ⊆ R^n is "thick" if

  µ_n(E) ≥ 2ε · dim(E).

A thick subspace E ⊆ R^n is necessarily of dimension at most (2ε)^{−1}. We say that E is a "primitive, thick subspace" if it is thick and additionally

  µ_n(F) < 2ε · dim(F)

for any proper subspace F ⊊ E. Clearly, any thick subspace E ⊆ R^n contains a primitive, thick subspace. Observe also that a primitive, thick, k-dimensional subspace E ⊆ R^n with k ≥ 1 is necessarily 2εk-basic for the measure µ_n. From Lemma 7.1 we thus learn that for any n, there are only finitely many primitive, thick subspaces E ⊆ R^n.

Denote by V the collection of all pairs (E, n) such that E ⊆ R^n is a primitive, thick subspace. In order to prove the lemma, it suffices to show that V is finite. Indeed, in this case, set N = max{n + 1; ∃E ⊆ R^n, (E, n) ∈ V}. Then there are no primitive, thick subspaces in R^N, and hence there are no thick subspaces in R^N at all. Consequently µ_N = (P_N)_*(µ) is 2ε-decent, and the lemma is proven. The rest of the argument is thus concerned with the proof that V is finite.

Define a directed graph structure on V as follows: There is an edge going from the node (E, n) ∈ V to the node (F, n + 1) ∈ V if and only if E ⊆ P_n(F). Note that for each node (F, n + 1) ∈ V, the subspace P_n(F) ⊆ R^n is clearly thick, hence it contains a primitive, thick subspace Ẽ ⊆ R^n. Therefore each node (F, n + 1) is connected to a certain node (Ẽ, n) ∈ V. We conclude that there is a path from the node ({0}, 0) ∈ V to any node in V. For each n ≥ 0 there are only finitely many nodes of the form (E, n) ∈ V, since there are only finitely many primitive, thick subspaces E ⊆ R^n. Therefore, by König's lemma, V is finite if and only if it does not contain an infinite path.

We deduce that in order to prove the lemma, it suffices to show that there is no sequence of subspaces E_n ⊆ R^n (n = 0, 1, 2, ...) such that for any n ≥ 0,

  E_n ⊆ P_n(E_{n+1}) and (E_n, n) ∈ V.    (52)

Assume by contradiction that a sequence of subspaces satisfying (52) exists. Recall that a subspace of dimension larger than (2ε)^{−1} cannot be thick, hence dim(E_n) is bounded by (2ε)^{−1}. Additionally, dim(E_n) ≤ dim(E_{n+1}) for all n, by (52). Therefore, there exist n₀ ≥ 0 and d ≤ (2ε)^{−1} such that dim(E_n) = d for all n ≥ n₀. Consequently, E_n = P_n(E_{n+1}) for any n ≥ n₀. Consider the inverse limit

  E = {a ∈ R^∞; P_n(a) ∈ E_n for all n ≥ n₀} ⊆ R^∞.

Then E = ∩_{n ≥ n₀} P_n^{−1}(E_n) is a subspace of R^∞ with P_n(E) = E_n for all n ≥ n₀. Furthermore, dim(E) = d according to (51). Note that P_n^{−1}(E_n) ⊇ P_{n+1}^{−1}(E_{n+1}) for any n ≥ n₀. Therefore

  µ(E) = µ( ∩_{n ≥ n₀} P_n^{−1}(E_n) ) = lim_{n→∞} µ(P_n^{−1}(E_n)) = lim_{n→∞} µ_n(E_n) ≥ 2εd,

since each E_n ⊆ R^n with n ≥ n₀ is a d-dimensional thick subspace. Hence µ(E) ≥ 2εd, in contradiction to our assumption that µ is ε-decent. We conclude that there are no infinite paths in V, and hence V is finite. The lemma is proven. □

Suppose X is a topological vector space. We say that X has a countable separating family of continuous, linear functionals if there exist continuous linear functionals f₁, f₂, ... : X → R such that for any x ∈ X,

  x = 0 ⟺ f_n(x) = 0 for all n.

This condition is not too restrictive.
For example, any separable normed space, any separable Fréchet space, and any topological vector space dual to a separable Fréchet space admits a countable, separating family of continuous, linear functionals.
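As a toy illustration of the separation property (illustrative code, not from the paper): on a finite-dimensional stand-in X = R^4, the coordinate functionals form a (finite) separating family, and separation of points is exactly injectivity of the map T(x) = (f₁(x), f₂(x), ...) that appears in the proof of Corollary 7.3:

```python
import numpy as np

# Toy model: X = R^4 stands in for a separable space; the coordinate
# functionals f_k(x) = x_k form a separating family: f_k(x) = 0 for
# every k forces x = 0.  (All names here are illustrative.)
functionals = [lambda x, k=k: float(x[k]) for k in range(4)]

def embed(x):
    """The map T(x) = (f_1(x), f_2(x), ...) pushing X into a sequence space."""
    return np.array([f(x) for f in functionals])

# Separation <=> injectivity of the linear map T: if T(x) = T(y), then
# f_k(x - y) = 0 for every k, hence x - y = 0.
x = np.array([1.0, -2.0, 0.0, 3.0])
y = np.array([1.0, -2.0, 0.0, 3.0])
z = np.array([1.0, -2.0, 0.5, 3.0])

print(np.array_equal(embed(x), embed(y)))  # → True: same point, same image
print(np.array_equal(embed(x), embed(z)))  # → False: distinct points separated
```

In the infinite-dimensional setting, the same injectivity is what makes T : X → R^∞ a continuous linear embedding, so that decency of µ transfers to the push-forward T_*(µ).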
Corollary 7.3
Let ε > 0, let d ≥ 1 be an integer, and let X be a topological vector space with a countable separating family of continuous, linear functionals. Suppose that µ is an ε-decent Borel probability measure on X. Then there exists a continuous linear map T : X → R^d such that T_*(µ) is δ-radial and proper, for δ = cε^{c/d}. Here, c > 0 is a universal constant.

Proof: Let f₁, f₂, ... : X → R be the separating sequence of continuous linear functionals. Then the linear map T : X → R^∞ defined by T(x) = (f₁(x), f₂(x), ...) is a continuous linear embedding. Since µ is ε-decent, then also T_*(µ) is an ε-decent Borel probability measure on R^∞. According to Lemma 7.2, there exist a finite N ≥ 1 and a continuous linear map P : R^∞ → R^N such that (P ∘ T)_*(µ) is a 2ε-decent measure on R^N. The corollary now follows from Theorem 1.3. □

Note that the linear map T in Corollary 7.3 is not only measurable but also continuous. In principle, we could have formulated Corollary 7.3 for a probability measure on a measurable linear space, without having to rely on an ambient topology: All we need is a linear, measurable embedding into R^∞. We refer the reader to Tsirelson [30] for a discussion of measures on infinite-dimensional linear spaces, and for an exposition of Vershik's "de-topologization" program [31, 32]. We conclude this note with an infinite-dimensional analog of Corollary 1.4.

Corollary 7.4
Let X be a topological vector space with a countable separating family of continuous, linear functionals. Suppose that µ is a Borel probability measure on X such that µ(E) = 0 for any finite-dimensional subspace E ⊂ X. Then, for any R > 0, there exists a non-zero, continuous linear functional ϕ : X → R such that

  µ({x; ϕ(x) ≥ tM}) ≥ c · exp(−Ct²)  for all 0 ≤ t ≤ R,

and

  µ({x; ϕ(x) ≤ −tM}) ≥ c · exp(−Ct²)  for all 0 ≤ t ≤ R,

where M > 0 is a median, that is,

  µ({x; |ϕ(x)| ≤ M}) ≥ 1/2  and  µ({x; |ϕ(x)| ≥ M}) ≥ 1/2,

and c, C > 0 are universal constants.

References

[1] Alon, N., Spencer, J. H., The probabilistic method. Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, New York, 2000. [2] Barthe, F.,
On a reverse form of the Brascamp-Lieb inequality.
Invent. Math., 134, no. 2, (1998), 335–361. [3] Beck, J., Chen, W. W. L.,
Irregularities of distribution.
Cambridge Tracts in Mathematics, 89, Cambridge University Press, Cambridge, 1987. [4] Bobkov, S. G.,
On concentration of distributions of random weighted sums.
Ann. Prob., 31, no. 1, (2003), 195–215. [5] Bourgain, J., Lindenstrauss, J., Milman, V. D.,
Approximation of zonoids by zonotopes.
Acta Math., 162, no. 1-2, (1989), 73–141. [6] Bourgain, J., Lindenstrauss, J., Milman, V. D.,
Minkowski sums and symmetrizations.
Geometric aspects of functional analysis, Israel seminar (1986–87), Lecture Notes in Math., 1317, Springer, Berlin, (1988), 44–66. [7] Carlen, E., Cordero-Erausquin, D.,
Subadditivity of the entropy and its relation to Brascamp-Lieb type inequalities. Preprint. Available under http://arxiv.org/abs/0710.0870. [8] Carlen, E., Lieb, E., Loss, M.,
A sharp form of Young’s inequality on S^N and related entropy inequalities. Jour. Geom. Analysis, 14, (2004), 487–520. [9] Furstenberg, H., A note on Borel’s density theorem.
Proc. Amer. Math. Soc., 55, no. 1, (1976), 209–212. [10] Giannopoulos, A., Pajor, A., Paouris, G.,
A note on subgaussian estimates for linear functionals on convex bodies.
Proc. Amer. Math. Soc., 135, (2007), 2599–2606.[11] Gromov, M.,
Dimension, nonlinear spectra and width.
Geometric aspects of functional analysis, Israel seminar (1986–87), Lecture Notes in Math., 1317, Springer, Berlin, (1988), 132–184. [12] Gromov, M., personal communication. [13] Klartag, B.,
Uniform almost sub-gaussian estimates for linear functionals on convex sets. Algebra i Analiz (St. Petersburg Math. Journal), Vol. 19, no. 1, (2007), 109–148. [14] Klartag, B.,
A central limit theorem for convex sets.
Invent. Math., 168, (2007), 91–131.[15] Klartag, B.,
Power-law estimates for the central limit theorem for convex sets.
J. Funct. Anal., 245, (2007), 284–310. [16] Klartag, B.,
A Berry-Esseen type inequality for convex bodies with an unconditional basis.
Probab. Theory Related Fields, 145, no. 1, (2009), 1–33. [17] Klartag, B.,
High-dimensional distributions with convexity properties. Preprint. [18] Klartag, B., Milman, E.,
On volume distribution in 2-convex bodies. Israel J. Math., Vol. 164, (2008), 221–249. [19] Koldobsky, A.,
Fourier analysis in convex geometry.
Mathematical Surveys and Monographs, 116, American Mathematical Society, Providence, RI, 2005. [20] Kolmogoroff, A., Über das Gesetz des iterierten Logarithmus.
Math. Ann., 101, (1929), 126–135. English translation in Selected Works of A. N. Kolmogorov, vol. II, edited by A. N. Shiryayev. Math. and its Applications (Soviet Series), 26, Kluwer, Dordrecht, (1992), 32–42.
[21] Milman, E.,
On Gaussian Marginals of Uniformly Convex Bodies. J. Theoret. Probab., 22, no. 1, (2009), 256–278. [22] Milman, V. D.,
A few observations on the connections between local theory and some otherfields.
Geometric aspects of functional analysis, Israel seminar (1986–87), Lecture Notes in Math., 1317, Springer, Berlin, (1988), 283–289. [23] Milman, V. D.,
Dvoretzky’s theorem—thirty years later.
Geom. Funct. Anal., 2, no. 4, (1992), 455–479. [24] Milman, V. D., Schechtman, G.,
Asymptotic theory of finite-dimensional normed spaces.
Lecture Notes in Math., 1200, Springer-Verlag, Berlin, 1986. [25] Nagaev, S. V.,
Lower bounds on large deviation probabilities for sums of independent random variables.
In Asymptotic methods in probability and statistics with applications (St. Petersburg, 1998), Stat. Ind. Technol., Birkhäuser, Boston, (2001), 277–295. [26] Pisier, G.,
The volume of convex bodies and Banach space geometry.
Cambridge Tracts in Mathematics, 94, Cambridge University Press, Cambridge, 1989. [27] Schechtman, G.,
Two observations regarding embedding subsets of Euclidean spaces in normed spaces.
Adv. Math., 200, no. 1, (2006), 125–135. [28] Schechtman, G.,
The random version of Dvoretzky’s theorem in ℓ^n_∞. Geometric aspects of functional analysis, Israel seminar (2004–05), Lecture Notes in Math., 1910, Springer, Berlin, (2007), 265–270. [29] Sudakov, V. N., Typical distributions of linear functionals in finite-dimensional spaces of high dimension. (Russian) Dokl. Akad. Nauk SSSR, 243, no. 6, (1978), 1402–1405. English translation in Soviet Math. Dokl., 19, (1978), 1578–1582. [30] Tsirelson, B.,
A strange linear space with measure. Manuscript, 1998. [31] Veršik, A. M.,
Duality in the theory of measure in linear spaces.
Dokl. Akad. Nauk SSSR, 170, (1966), 497–500. (Russian) English translation in Soviet Math. Dokl., 7, (1966), 1210–1214. [32] Veršik, A. M.,
The axiomatics of measure theory in linear spaces.
Dokl. Akad. Nauk SSSR, 178, (1968), 278–281. (Russian) English translation in Soviet Math. Dokl., 9, (1968), 68–72. [33] Villani, C.,
Topics in optimal transportation.
Graduate Studies in Mathematics, 58, American Mathematical Society, 2003.

School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
e-mail address: [email protected]@post.tau.ac.il