EXPLORING THE TOOLKIT OF JEAN BOURGAIN
TERENCE TAO
Abstract.
Gian-Carlo Rota asserted in [29] that “every mathematician only has a few tricks”. The sheer breadth and ingenuity in the work of Jean Bourgain may at first glance appear to be a counterexample to this maxim. However, as we hope to illustrate in this article, even Bourgain relied frequently on a core set of tools, which formed the base from which problems in many disparate mathematical fields could then be attacked. We discuss a selected number of these tools here, and then perform a case study of how an argument in one of Bourgain's papers [4] can be interpreted as a sequential application of several of these tools.

1. Introduction
As the other articles in this collection demonstrate, Jean Bourgain achieved breakthroughs in an astounding number of areas across mathematics, many of which would seem at first glance to be quite unrelated to each other. It is almost beyond belief that a single mathematician could have such deep impact on so many different subfields during his or her career. However, if one compares Bourgain's works in different topics, some common threads begin to emerge. In particular, one discovers that Bourgain had a certain “toolkit” of basic techniques that he was extremely skilled at using in a great variety of situations. While this toolkit is far from sufficient to explain the powerful nature of his work, it does show how he could at least get started on making progress in so many different mathematical problems. In this article we present a selection of Bourgain's most commonly used tools: quantification of qualitative estimates, dyadic pigeonholing, random translations, and metric entropy and concentration of measure. These tools often did not originate with Bourgain's work, but he was able to wield them systematically and apply them to a far broader range of problems than had previously been realized. This is far from a complete list of Bourgain's standard tools - for instance, many of his works also feature systematic use of the uncertainty principle, or the method of probabilistic decoupling - but it is an illustrative sample of that basic toolkit.

Of course, in many cases Bourgain's work involved much deeper and more delicate arguments than just the general-purpose techniques presented here. Nevertheless, knowledge of these basic tools helps place Bourgain's arguments in a more systematic framework, in which these preliminary techniques are used to isolate the core features of the problem, which were then attacked by the full force of Bourgain's intellectual firepower. But sometimes these basic methods are already enough to solve non-trivial problems almost on their own.
We illustrate this in the final section (Section 7) by tracing through a single paper [4] of Bourgain's, which gives one of the most general results known on the Erdős similarity problem; as we hope to demonstrate, this non-trivial result can be interpreted as essentially being a sequential application of each of the tools listed here in turn.

2. Notation
We use $1_E$ to denote the indicator function of a set $E$, and use $|E|$ to denote either the cardinality of $E$ (if it is finite) or its Lebesgue measure (if it is infinite). If $E$ lies in a vector space, we use $x + E := \{ x + y : y \in E \}$ for the translate of $E$ by $x$, $x - E := \{ x - y : y \in E \}$ for the reflection of $E$ across $x/2$, and $\lambda \cdot E := \{ \lambda y : y \in E \}$ for the dilation of $E$ by $\lambda$.

We use $\mathbf{P}(E)$ to denote the probability of a random event $E$, and $\mathbf{E} X$ and $\mathrm{Var}(X)$ for the mean and variance of a random variable $X$.

We will use the asymptotic notation $X \lesssim Y$, $Y \gtrsim X$, or $X = O(Y)$ to denote the bound $|X| \leq CY$ for an absolute constant $C$. If the constant $C$ depends on parameters we will indicate this by subscripts; for instance $X \lesssim_{p,d} Y$ denotes the bound $|X| \leq C_{p,d} Y$ for a constant $C_{p,d}$ depending only on $p, d$. It may be noted that in Bourgain's work the implied constant was sometimes omitted completely from the notation. In the words of Heath-Brown, in his Mathematical Reviews summary of Bourgain's paper [7], “Readers are advised not to take some of the assertions made in the course of the proofs too literally, and, in particular, to abandon any preconceived notion as to the meaning of the symbol ‘$\sim$’”.

3. Quantitative formulation of qualitative problems
One can roughly divide analysis into “soft analysis” - the study of qualitative properties (continuity, measurability, integrability, etc.) of infinitary objects - and “hard analysis” - the study of quantitative estimation of finitary objects. Bourgain's toolkit lies almost exclusively in the latter category, so when tackling a “soft analysis” problem, often the first step in one of Bourgain's arguments is to locate a more quantitative “hard analysis” estimate that will imply the desired claim, removing almost all appearances of limits or arbitrarily large and small scales, and instead working with a large but finite number of scales and focusing on estimates that are uniform with respect to several parameters. For instance, consider the following result of Furstenberg, Katznelson, and Weiss [21]:
Theorem 3.1 (Furstenberg-Katznelson-Weiss theorem, qualitative version). Let $A \subset \mathbb{R}^2$ be a measurable set whose upper density $\delta := \limsup_{R \to \infty} \frac{|A \cap B(0,R)|}{|B(0,R)|}$ is positive. Then there exists $l_0$ such that for all $l \geq l_0$, there exist $x, y \in A$ with $|x - y| = l$.

Note that this theorem does not provide any quantitative bound for the length threshold $l_0$ in terms of the upper density $\delta$. Indeed, such a bound is not possible, since if one replaces $A$ by a rescaled version $\lambda \cdot A := \{ \lambda x : x \in A \}$ then the length threshold $l_0$ will be replaced by $\lambda l_0$, while the upper density $\delta$ remains unchanged. As such, one may be tempted to conclude that Theorem 3.1 is irredeemably “qualitative” in nature. Nevertheless, in [2], Bourgain gave a new proof of this theorem (as well as several novel generalisations) by first establishing the following quantitative analogue:

Theorem 3.2 (Furstenberg-Katznelson-Weiss theorem, quantitative version). Let $0 < \varepsilon < 1$, let $B \subset [-1,1]^2$ have measure $|B| \geq \varepsilon$, and let $J = J(\varepsilon)$ be a sufficiently large natural number depending on $\varepsilon$. Suppose that $0 < t_J < \cdots < t_1 \leq 1$ is a sequence of scales with $t_{j+1} \leq t_j/2$ for all $1 \leq j < J$. Then for at least one $1 \leq j \leq J$, one has
$$\int_{\mathbb{R}^2} \int_{S^1} 1_B(x) 1_B(x + t_j \omega) \, d\sigma(\omega) \, dx \gtrsim \varepsilon^2, \quad (3.1)$$
where $d\sigma$ is surface measure on the unit circle $S^1$, normalised to have unit mass.

Note that there is no longer any appearance of limits in Theorem 3.2, and instead of working with an infinity of scales $l$, one now works with a finite number of scales $t_1, \dots, t_J$. Furthermore, there is a uniform bound on the number $J$ of scales involved depending only on $\varepsilon$; the arguments in [2] in fact give an explicit dependence of the shape $J = O(\varepsilon^{-2} \log \frac{1}{\varepsilon})$.

Let us see why Theorem 3.2 implies Theorem 3.1. Assume for contradiction that Theorem 3.1 failed; then one can find a set $A \subset \mathbb{R}^2$ of some positive upper density $\delta > 0$ and a sequence of scales $l_1 < l_2 < \dots$ going to infinity such that for each $l_j$ there are no $x, y \in A$ with $|x - y| = l_j$.
We can sparsify this sequence of scales so that $l_{j+1} \geq 2 l_j$ for all $j$. Let $\varepsilon > 0$ be a sufficiently small quantity (depending on $\delta$), and let $J$ be as in Theorem 3.2. As $A$ has upper density $\delta$, one can find a radius $R > l_J$ such that $|A \cap B(0,R)| \gtrsim \delta R^2$. If we now consider the rescaling $B := \{ x \in [-1,1]^2 : Rx \in A \}$ and define the rescaled scales $0 < t_J < \cdots < t_1 \leq 1$ by $t_j := l_{J+1-j}/R$ for $j = 1, \dots, J$, then $|B| \gtrsim \delta$ and $t_{j+1} \leq t_j/2$ for all $1 \leq j < J$, and for any $1 \leq j \leq J$ there are no points $x, y \in B$ with $|x - y| = t_j$. In particular the left-hand side of (3.1) vanishes for all $1 \leq j \leq J$, and one then contradicts Theorem 3.2 after choosing the parameters appropriately. Note how this argument fails to give any bound on $l_0$ (as it must), despite being “quantitative” in nature; it gives quantitative bounds on the number of genuinely different scales $l_j$ at which the conclusion of Theorem 3.1 fails, but does not bound the magnitude of these scales.

We will sketch the proof of Theorem 3.2 in the next section. For now, we turn to another example of Bourgain's strategy of attacking qualitative results through quantitative methods. We focus on a specific pointwise ergodic theorem established in [5, Theorem 1]:

Theorem 3.3 (Bourgain's pointwise ergodic theorem along squares). Let $(X, \mu)$ be a probability space with a measure-preserving transformation $T : X \to X$. Let $f \in L^r(X, \mu)$ for some $r > 1$. Then the averages
$$A_N f(x) := \frac{1}{N} \sum_{n=1}^{N} f(T^{n^2} x)$$
converge pointwise as $N \to \infty$ for $\mu$-almost every $x \in X$.
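This convergence is easy to witness numerically in a toy model (our own illustration, not taken from [5]): take the circle rotation $Tx = x + \alpha \bmod 1$ on $X = \mathbb{R}/\mathbb{Z}$ with $\alpha$ irrational and $f(x) = \cos(2\pi x)$, so that by Weyl equidistribution of $n^2 \alpha$ the averages along squares tend to $\int f = 0$. The choices of $\alpha$, $f$, and the starting point below are hypothetical test parameters.

```python
import math

# Toy illustration (not from [5]): the ergodic averages along squares
#   A_N f(x) = (1/N) * sum_{n=1..N} f(T^{n^2} x)
# for the circle rotation T x = x + alpha (mod 1) with irrational alpha,
# and f(x) = cos(2*pi*x).  By Weyl equidistribution of n^2*alpha, these
# averages tend to the mean value of f, which is 0.

alpha = math.sqrt(2)   # hypothetical irrational rotation number
x0 = 0.3               # hypothetical starting point

def A(N, x=x0):
    """Average of f along the orbit points x + n^2*alpha, n = 1..N."""
    return sum(math.cos(2 * math.pi * (x + (n * n) * alpha))
               for n in range(1, N + 1)) / N

for N in [10, 100, 1000, 10000]:
    print(N, A(N))  # the averages shrink toward 0 as N grows
```
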
In fact one can replace the squares here by any other polynomial with integer coefficients, but we focus on the squares for sake of concreteness. A standard method to establish almost everywhere convergence results in ergodic theory is to establish an inequality for the associated maximal function
$$M f(x) := \sup_{N > 0} |A_N f(x)|, \quad (3.2)$$
and combine this with an almost everywhere convergence result for functions $f$ in a dense subclass of the original space $L^r(X, \mu)$ of interest (e.g., $L^\infty(X, \mu)$). In [5] a maximal inequality for (3.2) was established, thus reducing matters to consideration of functions $f$ in the dense subclass $L^\infty(X, \mu)$, but new ideas were needed to handle this subclass. Bourgain achieved this through the following variant of a maximal inequality:

Theorem 3.4 (Bourgain's variational estimate). Let $\lambda > 1$, let $\varepsilon > 0$, and let $J$ be sufficiently large depending on $\lambda, \varepsilon$. Let $0 < N_1 < N_2 < \cdots < N_J$, and let $(X, \mu, T)$ be as in Theorem 3.3. Then for any $f \in L^\infty(X, \mu)$, one has
$$\sum_{j=1}^{J-1} \Big\| \sup_{N_j \leq N \leq N_{j+1} : N \in Z_\lambda} |A_N f - A_{N_j} f| \Big\|_{L^2(X,\mu)}^2 \lesssim \varepsilon J \| f \|_{L^2(X,\mu)}^2$$
where $Z_\lambda := \{ \lfloor \lambda^n \rfloor : n \in \mathbb{N} \}$.

Note that $L^2$ boundedness of the maximal function (3.2) would suffice to establish Theorem 3.4 were it not for the additional factor of $\varepsilon$ on the right-hand side; thus Theorem 3.4 can be viewed as a strengthening of such a maximal inequality.

We now sketch how Theorem 3.4 implies Theorem 3.3. By the previous discussion we may assume $f \in L^\infty(X, \mu)$, and may normalise $\| f \|_{L^\infty(X,\mu)} = 1$ and take $f$ to be real-valued. By rounding a natural number $N$ to the nearest element $N'$ of $Z_\lambda$ for some $\lambda > 1$ close to 1, we see that $A_N f(x) = A_{N'} f(x) + O(\lambda - 1)$. From this it is not difficult to see that we only need to establish pointwise almost everywhere convergence of $A_N f(x)$, $N \in Z_\lambda$, for any fixed $\lambda > 1$. Suppose this claim failed; then there would be a set $E$ of positive measure in $X$ and a $\delta > 0$ such that
$$\limsup_{N \in Z_\lambda} A_N f(x) - \liminf_{N \in Z_\lambda} A_N f(x) \geq \delta$$
for all $x \in E$. By standard measure theory arguments one can then recursively construct an infinite sequence $N_1 < N_2 < \dots$ and a subset $E'$ of $E$ of positive measure such that
$$\sup_{N_j \leq N \leq N_{j+1} : N \in Z_\lambda} |A_N f(x) - A_{N_j} f(x)| \gtrsim \delta$$
for all $j \geq 1$ and $x \in E'$. This can then be used to contradict Theorem 3.4 after selecting $\varepsilon$ small enough and $J$ large enough.

The bulk of the paper [5] is occupied with the task of establishing estimates such as Theorem 3.4, which proceeds by transferring the problem to the integers, applying Fourier-analytic decompositions, and establishing some maximal inequalities of a harmonic analysis flavor. These arguments have proven to be quite influential, and a paradigm for establishing many other pointwise ergodic theorems, but we will not survey these developments here.

4. Dyadic pigeonholing
One of the oldest tricks in analysis is that of dyadic decomposition: when faced with a sum or integral over a parameter ranging over a wide range of scales, first control the contribution of an individual dyadic scale (such as when the magnitude of the parameter ranges between two fixed consecutive powers $2^k, 2^{k+1}$ of two), and then sum over all dyadic scales. For instance, we have the Cauchy condensation test: when asked to determine whether a series $\sum_{n=1}^\infty f(n)$ is absolutely convergent, where $f : \mathbb{N} \to \mathbb{R}^+$ is non-negative and non-increasing, one can break up the sum dyadically as
$$\sum_{n=1}^\infty f(n) = \sum_{k=0}^\infty \sum_{2^k \leq n < 2^{k+1}} f(n)$$
and then observe that each dyadic component can be easily bounded above and below:
$$2^k f(2^{k+1}) \leq \sum_{2^k \leq n < 2^{k+1}} f(n) \leq 2^k f(2^k),$$
at which point one easily sees that the original series $\sum_{n=1}^\infty f(n)$ converges if and only if the condensed sum $\sum_{k=0}^\infty 2^k f(2^k)$ converges. While both sums are infinite, in practice the latter sum is significantly more tractable than the former; for instance any polynomial improvement $n^{-\varepsilon}$ in bounds for the original sequence $f(n)$ leads to an exponential improvement $2^{-\varepsilon k}$ in the bounds for the new sequence $2^k f(2^k)$.

A surprisingly useful variant of this method was used repeatedly by Bourgain in many problems, in which dyadic decomposition is combined with the pigeonhole principle to locate a single “good” scale in which to run additional arguments. We refer to this combination of dyadic decomposition and the pigeonhole principle as dyadic pigeonholing. The quantitative result claimed in Theorem 3.2 is already well suited to a dyadic pigeonholing argument, since the scales $t_j$ in that argument are already at least dyadically separated, and we can sketch its proof as follows.
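(As an aside, before turning to that sketch: the two-sided condensation bounds above are easy to check numerically. The following is a small illustrative sketch; the choice $f(n) = 1/n^2$ is our own test example, for which the condensed sum is exactly the geometric series $\sum_k 2^{-k} = 2$.)

```python
# Numerical sanity check of the condensation bounds
#   2^k f(2^{k+1}) <= sum_{2^k <= n < 2^{k+1}} f(n) <= 2^k f(2^k)
# for the hypothetical test sequence f(n) = 1/n^2 (non-negative, non-increasing).

def f(n):
    return 1.0 / (n * n)

for k in range(10):
    block = sum(f(n) for n in range(2**k, 2**(k + 1)))
    lower = 2**k * f(2**(k + 1))
    upper = 2**k * f(2**k)
    assert lower <= block <= upper
    print(k, lower, block, upper)

# Both the original series and the condensed series sum_k 2^k f(2^k) converge;
# here the condensed sum is the geometric series sum_k 2^{-k}, approximately 2.
condensed = sum(2**k * f(2**k) for k in range(60))
print(condensed)
```
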
Using the Fourier transform $\hat{f}(\xi) := \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi} \, dx$, one can rewrite the left-hand side of (3.1) as
$$\int_{\mathbb{R}^2} |\hat{1}_B(\xi)|^2 \hat{\sigma}(t_j \xi) \, d\xi,$$
where $\hat{\sigma}(\xi) := \int_{S^1} e^{-2\pi i \omega \cdot \xi} \, d\sigma(\omega)$ is the Fourier transform of the surface measure $d\sigma$. One can split this integral into the contribution of the “low frequencies”
$$\int_{|\xi| \leq \delta/t_j} |\hat{1}_B(\xi)|^2 \hat{\sigma}(t_j \xi) \, d\xi, \quad (4.1)$$
the “medium frequencies”
$$\int_{\delta/t_j \leq |\xi| \leq 1/(\delta t_j)} |\hat{1}_B(\xi)|^2 \hat{\sigma}(t_j \xi) \, d\xi, \quad (4.2)$$
and the “high frequencies”
$$\int_{|\xi| > 1/(\delta t_j)} |\hat{1}_B(\xi)|^2 \hat{\sigma}(t_j \xi) \, d\xi, \quad (4.3)$$
where $0 < \delta = \delta(\varepsilon) \leq 1/2$ is a small quantity depending on $\varepsilon$ to be chosen later. For the contribution (4.1) of the low frequencies, the factor $\hat{\sigma}(t_j \xi)$ is close to 1, and it is not difficult to obtain a lower bound on this quantity that is $\gtrsim |B|^2 \geq \varepsilon^2$ if $\delta$ is small enough.
For the contribution (4.3) of the high frequencies, the factor $\hat{\sigma}(t_j \xi)$ is quite small, and one can show that the contribution of this term is negligible, again for $\delta$ small enough. The problematic term is the contribution (4.2), which one can of course upper bound by
$$\int_{\delta/t_j \leq |\xi| \leq 1/(\delta t_j)} |\hat{1}_B(\xi)|^2 \, d\xi.$$
By Plancherel's theorem one can upper bound this crudely by $\lesssim \int_{\mathbb{R}^2} |\hat{1}_B(\xi)|^2 \, d\xi = |B|$, but this bound is too weak compared to the lower bound of $\gtrsim |B|^2$ one can obtain for the main term (4.1). However, note that because of the lacunarity hypothesis $t_{j+1} \leq t_j/2$, the annuli $\{ \delta/t_j \leq |\xi| \leq 1/(\delta t_j) \}$ only overlap with multiplicity $O(\log \frac{1}{\delta})$, hence the Plancherel bound actually gives
$$\sum_{j=1}^{J} \int_{\delta/t_j \leq |\xi| \leq 1/(\delta t_j)} |\hat{1}_B(\xi)|^2 \, d\xi \lesssim \log \frac{1}{\delta} \, |B| \quad (4.4)$$
and hence by the pigeonhole principle we can find a scale $t_j$ for which
$$\int_{\delta/t_j \leq |\xi| \leq 1/(\delta t_j)} |\hat{1}_B(\xi)|^2 \, d\xi \lesssim \frac{\log \frac{1}{\delta}}{J} |B|.$$
For $J$ large enough, one can use this “good” scale to make the contribution (4.2) of the medium frequencies small compared to that of the main term (4.1), and one can conclude the proof of Theorem 3.2.

Another typical instance of dyadic pigeonholing occurs in [6, Lemma 2.15], when Bourgain establishes new lower bounds on the Hausdorff dimension of Besicovitch sets $E$ (compact subsets of $\mathbb{R}^d$ that contain a unit line segment $\ell_\omega$ in every direction $\omega$). To lower bound the Hausdorff dimension of such a set by $\alpha$, one would have to establish a lower bound for the Hausdorff content $\sum_i r_i^{\alpha - \varepsilon}$ whenever one covers the set $E$ by small balls $B(x_i, r_i)$. By rounding up each $r_i$, we can assume without loss of generality that each $r_i$ is a power of two (dyadic pigeonholing), and replace balls by cubes; we can then group together the cubes of a given size and obtain a covering
$$E \subset \bigcup_{j \geq j_0} B_j$$
where $j_0$ is large and $B_j$ is a union of cubes of sidelength $2^{-j}$. To get the required lower bound it would then suffice to show that for at least one of the scales $j$, the number of cubes used to form $B_j$ is $\gtrsim 2^{j(\alpha - \varepsilon)}$. But which scale $j$ to use? Since the $B_j$ cover each $\ell_\omega$, we have
$$\sum_{j \geq j_0} \mathcal{H}^1(\ell_\omega \cap B_j) \geq 1,$$
hence on integrating over all directions $\omega \in S^{d-1}$ using Fubini's theorem we have
$$\sum_{j \geq j_0} \int_{S^{d-1}} \mathcal{H}^1(\ell_\omega \cap B_j) \, d\omega \gtrsim 1.$$
By the pigeonhole principle, we can then find a scale $j \geq j_0$ for which
$$\int_{S^{d-1}} \mathcal{H}^1(\ell_\omega \cap B_j) \, d\omega \gtrsim \frac{1}{j^2}$$
(say). The quantity $1/j^2$ is quite “large” compared to the scale $2^{-j}$, and this estimate asserts (roughly speaking) that “many” of the $\ell_\omega$ have “large” intersection with the $B_j$. Having selected such a good scale $\delta = 2^{-j}$, Bourgain was able to proceed to establish new bounds on $\alpha$ by estimation of an expression now known as the Kakeya maximal function associated to this scale; see the companion paper [19] for further discussion of this function and its applications to the Fourier restriction problem.

The dyadic pigeonholing method does not need to explicitly involve powers of two (or other lacunary sequences of scales). One of Bourgain's earlier uses of the method appears in his work [3, §5] on quantitative versions of a Lipschitz embedding theorem of Ribe [27], where at one point in the argument he has a Lipschitz function $F : E \to Y$ from a finite-dimensional normed space $E$ to a Banach space $Y$, and wishes to locate a scale $t$ at which the Poisson integral $F * P_t$ of $F$ has a large directional derivative $\partial_a(F * P_t)(x)$ at some point $x \in E$. This scale $t$ roughly corresponds to the spatial scale $2^{-j}$ in the previous discussion. To locate a good scale $t$, Bourgain first observes from the triangle inequality that
$$\| \partial_a(F * P_t) \| * P_s(x) \geq \| \partial_a(F * P_{t+s}) \|(x) \quad (4.5)$$
and hence the integral quantity $\int_E \| \partial_a(F * P_t) \|(x) \, dx$ is non-increasing in $t$. On the other hand, the specific construction of the function $F$ in [3] provided an upper bound on this integral that is uniform in $t$. Applying the pigeonhole principle, one can then find a scale $t > 0$ for which
$$\int_E \| \partial_a(F * P_t) \|(x) \, dx \approx \int_E \| \partial_a(F * P_{(R+1)t}) \|(x) \, dx$$
for some moderately large parameter $R >$
0, and where we shall be vague about the precise meaning of the symbol $\approx$; comparing this with (4.5) and other properties of $F$ eventually gives the desired lower bound on $\partial_a(F * P_t)(x)$. See the companion paper [1] for further discussion of the impact of Bourgain's “Ribe program”.

Dyadic pigeonholing also plays a small but important role in an important result [9] of Bourgain on the energy-critical nonlinear Schrödinger equation, discussed in more detail in Kenig's article [22]:

Theorem 4.1 (Global regularity for energy-critical NLS for radial data). Let $u_0 \in H^1(\mathbb{R}^3)$ be smooth and spherically symmetric. Then there is a unique smooth finite energy global solution $u : \mathbb{R} \times \mathbb{R}^3 \to \mathbb{C}$ to the nonlinear Schrödinger equation $i \partial_t u + \Delta u = |u|^4 u$ with initial data $u(0,x) = u_0(x)$.

At one stage [9, §
4] in the (rather intricate) argument, a solution $u$ is constructed to exhibit a concentration property at a certain time $t_{j_s}$ and frequency $N_{j_s}$, in that (suppressing a parameter $\eta$ that is not relevant for the current discussion)
$$\| P_{N_{j_s}} u(t_{j_s}) \|_{L^2(\mathbb{R}^3)} \gtrsim N_{j_s}^{-1}, \quad (4.6)$$
(We thank Assaf Naor for this reference.)
where $P_{N_{j_s}}$ is a Fourier projection of Littlewood-Paley type to the frequency region $\{ \xi : |\xi| \sim N_{j_s} \}$. In Bourgain's argument it is necessary to propagate this lower bound from time $t_{j_s}$ to a nearby time $t_{j_r}$. The NLS equation conserves the full $L^2$ mass $\| u(t) \|_{L^2(\mathbb{R}^3)}^2$, but this cannot be directly used here due to the solution $u$ potentially having an extremely large amount of $L^2$ mass at low frequencies. To resolve this, Bourgain applies a Fourier truncation operator $P_{\geq N}$ restricting to frequencies $|\xi| \gtrsim N$ for some $N < N_{j_s}$, and exploits the approximately conserved nature of the high-frequency portion $\| P_{\geq N} u(t) \|_{L^2(\mathbb{R}^3)}^2$ of the mass, by a computation of the time derivative
$$\partial_t \| P_{\geq N} u(t) \|_{L^2(\mathbb{R}^3)}^2.$$
There are several components to this time derivative, but the dominant contribution comes from the portion of the solution $u$ residing at frequency scales $M$ comparable to $N$ (intuitively, this reflects the potential exchange of mass in the frequency domain between frequency modes of magnitude just below $N$, and frequency modes of magnitude just above $N$). For any given $N$, the upper bounds on this time derivative could overwhelm the lower bound in (4.6); however, by obtaining an estimate for the sum over a range of $N$ (in a manner analogous to (4.4)) and applying the pigeonhole principle, one can locate a “good scale” $N$ for which one has satisfactory control on the derivative of the mass. This idea of using the dyadic pigeonholing method to establish approximate conservation laws has wide application; for instance, it was used by Rodgers and myself recently to resolve the Newman conjecture [28] in analytic number theory.

In some cases one can use more sophisticated tools than the pigeonhole principle to locate a good scale. One example of this arises when establishing Bourgain's quantitative refinement [11] of Roth's theorem [31]:

Theorem 4.2 (Bourgain-Roth theorem). Let $N \geq 2$, and let $A \subset \{1, \dots, N\}$ be a set containing no three-term arithmetic progressions. Then
$$|A| \lesssim \frac{(\log \log N)^{1/2}}{\log^{1/2} N} N.$$

A key innovation in this paper was to manipulate
Bohr sets
$$B(S, \rho) := \{ n \in \mathbb{Z} : |n| \leq N; \ \| n\theta \| \leq \rho \ \forall \theta \in S \}$$
where $\rho > 0$, $S$ is a collection of frequencies $\theta \in \mathbb{R}/\mathbb{Z}$, and $\| x \|$ denotes the distance of $x$ to the nearest integer. These sets generalize the long arithmetic progressions that appear prominently in previous work in this area such as [31], and are well adapted to the Fourier-analytic methods used to establish Roth-type theorems. However, a key difficulty arises due to the discontinuous nature of the Bohr sets in $\rho$; in particular, if $\rho'$ is close to $\rho$, there is no a priori reason why the cardinality of $B(S, \rho')$ should be close to that of $B(S, \rho)$. However, the cardinality $|B(S, \rho)|$ is clearly non-decreasing in $\rho$, and a simple covering argument gives a doubling bound
$$|B(S, 2\rho)| \lesssim O(1)^{|S|} |B(S, \rho)|.$$
In particular, the distributional derivative $\frac{d}{d\rho} \log |B(S, \rho)|$ is a measure of total variation $O(|S|)$ on any dyadic interval $[\rho_0, 2\rho_0]$. Combining this with the Hardy-Littlewood maximal inequality, one can conclude that every interval $[\rho_0, 2\rho_0]$ contains a radius $\rho$ where the Bohr set $B(S, \rho)$ is “regular” in the sense that
$$|B(S, \rho')| = \exp\Big( O\Big( |S| \frac{|\rho' - \rho|}{|\rho|} \Big) \Big) |B(S, \rho)|$$
for all $\rho' >$
0; see for instance [35, Lemma 4.25] for a version of this construction. Regular Bohr sets are now a standard tool in modern additive combinatorics.

5. Random translations
Consider the interval $I = [0, 1/N]$ in the unit circle $\mathbb{R}/\mathbb{Z}$ for some large integer $N$. Then $I$ is much smaller than $\mathbb{R}/\mathbb{Z}$, but we can cover $\mathbb{R}/\mathbb{Z}$ by the $1/|I| = N$ translates $I + j/N$, $j = 0, \dots, N-1$, of $I$. Of course, most subsets $E$ of $\mathbb{R}/\mathbb{Z}$ will not have this perfect tiling property. However, by using random translations of $E$, one can achieve something fairly close to a perfect tiling:

Lemma 5.1 (Random translations). Let $G = (G, \cdot)$ be a compact group (not necessarily abelian) with Haar probability measure $\mu$. Let $E$ be a measurable subset of $G$, and let $N$ be a natural number. Then there exist translates $g_1 E, \dots, g_N E$ of $E$ by some shifts $g_1, \dots, g_N \in G$ with
$$\mu(g_1 E \cup \cdots \cup g_N E) \geq 1 - (1 - \mu(E))^N.$$

Proof.
We use the probabilistic method. Let $g_1, \dots, g_N$ be drawn independently at random from $G$ using the Haar measure $\mu$. Then by the Fubini-Tonelli theorem we have
$$\mathbf{E} \, \mu(g_1 E \cup \cdots \cup g_N E) = \int_G \mathbf{E} \, 1_{g_1 E \cup \cdots \cup g_N E}(x) \, d\mu(x)$$
$$= \int_G \mathbf{E} \Big( 1 - \prod_{i=1}^N 1_{g_i (G \setminus E)}(x) \Big) \, d\mu(x)$$
$$= \int_G \Big( 1 - \prod_{i=1}^N \mathbf{E} \, 1_{g_i (G \setminus E)}(x) \Big) \, d\mu(x)$$
$$= \int_G \Big( 1 - \prod_{i=1}^N \mu(G \setminus E) \Big) \, d\mu(x) = 1 - (1 - \mu(E))^N,$$
and the claim follows. □

In particular, if $\mu(E) \sim 1/N$, then we can find $N$ translates $g_1 E, \dots, g_N E$ of $E$ whose union has measure $\sim$
1, thus these translates behave as if they are disjoint “up to constants”. We observe that the same claim also holds for any homogeneous space $G/H$ of a compact group $G$ (with the attendant Haar probability measure), simply by lifting subsets of that homogeneous space back up to $G$.

Lemma 5.1 allows one in many cases to reduce the analysis of “small” subsets of a compact group $G$ (or a homogeneous space $G/H$ of $G$) to the analysis of “large” sets, particularly if the problem in question enjoys some sort of translation symmetry with respect to the group $G$. This idea was for instance famously exploited by Stein [30] in his maximal principle equating almost everywhere convergence results for translation-invariant operators with weak-type $(p,p)$ maximal inequalities. In [6], Bourgain used this idea to establish the following self-improving property of restriction estimates:

Proposition 5.2.
Suppose that $d \geq 2$ and $1 < p < 2$ is such that one has the restriction estimate
$$\| \hat{f} \|_{L^2(S^{d-1}, d\sigma)} \lesssim_{p,d} \| f \|_{L^p(\mathbb{R}^d)} \quad (5.1)$$
for all Schwartz functions $f : \mathbb{R}^d \to \mathbb{C}$, where $\sigma$ is normalized surface measure on the sphere $S^{d-1}$. Then one can automatically improve this to the stronger estimate
$$\| \hat{f} \|_{L^{p,\infty}(S^{d-1}, d\sigma)} \lesssim_{p,d} \| f \|_{L^p(\mathbb{R}^d)}.$$

Proof. (Sketch) We can normalize $\| f \|_{L^p(\mathbb{R}^d)} = 1$. Let $\lambda > 0$, and let $E \subset S^{d-1}$ denote the level set
$$E := \{ \omega \in S^{d-1} : |\hat{f}(\omega)| \geq \lambda \}.$$
Our task is to show that $\sigma(E) \lesssim_{p,d} \lambda^{-p}$. A direct application of (5.1) only gives the estimate $\sigma(E) \lesssim_{p,d} \lambda^{-2}$, which is inferior when $\lambda$ is large. However, if we let $N$ be an integer with $N \sim 1/\sigma(E)$, then by Lemma 5.1 (applied to the homogeneous space $S^{d-1} \equiv SO(d)/SO(d-1)$) one can find rotations $R_1 E, \dots, R_N E$ of $E$ with $\sigma(\bigcup_{i=1}^N R_i E) \sim 1$. If one then considers the random sum
$$F(x) := \sum_{i=1}^N \epsilon_i f(R_i x)$$
where the $\epsilon_i$ are independent random signs $\{-1, +1\}$ (or random gaussian variables), a routine application of Khintchine's inequality reveals that with positive probability, one has
$$\| F \|_{L^p(\mathbb{R}^d)} \lesssim_p N^{1/p} \quad \text{and} \quad \sigma\Big( \Big\{ \omega \in S^{d-1} : |\hat{F}(\omega)| \gtrsim \lambda \Big\} \Big) \gtrsim \sigma\Big( \bigcup_{i=1}^N R_i E \Big) \sim 1.$$
Applying (5.1) to $F$ then yields $\lambda^2 \lesssim_{p,d} N^{2/p}$, which gives the required estimate $\sigma(E) \lesssim_{p,d} \lambda^{-p}$. □

Variations of this argument also appear at several other locations in [6].
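The covering bound in Lemma 5.1 is easy to probe numerically. The following sketch is our own illustration (not from [6]): take $G = \mathbb{R}/\mathbb{Z}$, let $E$ be an arc of measure $1/N$, and compare the measure of a union of $N$ random translates against the prediction $1 - (1 - 1/N)^N \approx 1 - 1/e$.

```python
import random

# Illustrative sketch of Lemma 5.1 on the circle G = R/Z: take E = [0, 1/N),
# an arc of Haar measure 1/N, translate it by N independent uniform rotations,
# and measure the union exactly by interval merging.  The proof of Lemma 5.1
# predicts an expected union measure of 1 - (1 - 1/N)^N, which is roughly
# 1 - 1/e for large N.

def union_measure(shifts, width):
    """Exact measure of the union of the arcs [g, g + width) mod 1."""
    segs = []
    for g in shifts:
        a = g % 1.0
        if a + width <= 1.0:
            segs.append((a, a + width))
        else:  # the arc wraps around the point 0
            segs.append((a, 1.0))
            segs.append((0.0, a + width - 1.0))
    segs.sort()
    total, cur_a, cur_b = 0.0, None, None
    for a, b in segs:
        if cur_a is None or a > cur_b:   # start a new merged run of arcs
            if cur_a is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                            # extend the current merged run
            cur_b = max(cur_b, b)
    if cur_a is not None:
        total += cur_b - cur_a
    return total

random.seed(0)
N, trials = 100, 200
avg = sum(union_measure([random.random() for _ in range(N)], 1.0 / N)
          for _ in range(trials)) / trials
predicted = 1.0 - (1.0 - 1.0 / N) ** N
print(avg, predicted)  # the two values should be close (about 0.63)
```
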
6. Metric entropy and concentration of measure

If $X$ is a random variable (which for sake of discussion we take to be real-valued) with finite second moment, so that the mean $\mathbf{E} X$ and the variance $\mathrm{Var}(X)$ are both finite, then Chebyshev's inequality asserts that
$$\mathbf{P}\big( |X - \mathbf{E} X| \geq \lambda \sqrt{\mathrm{Var}(X)} \big) \leq \frac{1}{\lambda^2}$$
for any $\lambda >$
0; thus the random variable $X$ exhibits some concentration of measure around the interval $[\mathbf{E} X - \sqrt{\mathrm{Var}(X)}, \mathbf{E} X + \sqrt{\mathrm{Var}(X)}]$, in the sense that the probability of lying far outside this interval drops at a polynomial rate in the (normalized) distance to this interval. In many situations (particularly if $X$ is somehow “influenced” by many “independent sources of randomness”), the decay is in fact far stronger than the $1/\lambda^2$ decay; exponential or even Gaussian type decay can often be obtained. For instance, if $X$ is a Gaussian variable, then we have
$$\mathbf{P}\big( |X - \mathbf{E} X| \geq \lambda \sqrt{\mathrm{Var}(X)} \big) \lesssim \exp(-c\lambda^2) \quad (6.1)$$
for all $\lambda > 0$ and some absolute constant $c >$
0. Or, if $X = \sum_{i=1}^n X_i$ is the sum of independent random variables $X_i$ that each lie in some interval $[a_i, b_i]$, then the classical Hoeffding inequality gives
$$\mathbf{P}\Big( |X - \mathbf{E} X| \geq \lambda \Big( \sum_{i=1}^n (b_i - a_i)^2 \Big)^{1/2} \Big) \lesssim \exp(-c\lambda^2).$$
Many further concentration of measure inequalities of this type are available; see for instance the text [23] for a systematic discussion.

One can combine these sorts of concentration of measure inequalities to control the large deviations of suprema $\sup_{t \in T} X_t$ of a random process $(X_t)_{t \in T}$, where $t$ is a parameter ranging in some index set $T$. If for instance $T$ is finite, then from the union bound we have
$$\mathbf{P}\Big( \sup_{t \in T} X_t > \lambda \Big) = \mathbf{P}\Big( \bigvee_{t \in T} (X_t > \lambda) \Big) \leq \sum_{t \in T} \mathbf{P}(X_t > \lambda)$$
for any $\lambda >$
0. For $\lambda$ large, one can hope to use concentration of measure inequalities to obtain exponentially strong bounds on each individual probability $\mathbf{P}(X_t > \lambda)$; if $T$ is not too large (e.g., subexponential in cardinality) then this method can lead to non-trivial large deviation bounds on the supremum $\sup_{t \in T} X_t$. However, in many cases of interest $T$ is too large for this method to be directly used; for instance, $t$ could be a continuous parameter, in which case $T$ is likely to be uncountably infinite. But in many applications $T$ has the structure of a totally bounded metric space $(T, d)$; thus for each scale $\varepsilon > 0$ one can find a finite subset $T_\varepsilon$ of $T$ with the property that every element of $T$ lies within $\varepsilon$ of some element of $T_\varepsilon$. The minimal possible cardinality of $T_\varepsilon$ is known as the metric entropy of $T$ at scale $\varepsilon$, and we will denote it by $N(T, d; \varepsilon)$. Applying these nets for each $\varepsilon = 2^{-n}$, one can assign to each element $t \in T$ a chain $t_0, t_1, \dots$ converging to $t$ with $t_n \in T_{2^{-n}}$ and $d(t_n, t_{n+1}) \leq 2^{-n+1}$ for all $n$. If $X_t$ depends continuously on $t$, the triangle inequality then gives the bound
$$X_t \leq |X_{t_0}| + \sum_{n=0}^\infty |X_{t_n} - X_{t_{n+1}}|$$
and hence
$$\sup_{t \in T} X_t \leq \sup_{t \in T_1} |X_t| + \sum_{n=0}^\infty \sup_{t \in T_{2^{-n}}; \ t' \in T_{2^{-n-1}}; \ d(t,t') \leq 2^{-n+1}} |X_t - X_{t'}|. \quad (6.2)$$
It is then possible to obtain good large deviation bounds on the uncountable supremum $\sup_{t \in T} X_t$ by using the previous strategy to obtain large deviation bounds on the finite suprema $\sup_{t \in T_1} |X_t|$ and $\sup_{t \in T_{2^{-n}}; t' \in T_{2^{-n-1}}; d(t,t') \leq 2^{-n+1}} |X_t - X_{t'}|$, which one then combines using crude tools such as the union bound.
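As a concrete illustration of these notions (our own toy example, not from the references): for Brownian motion $(B_t)_{t \in [0,1]}$, the canonical pseudo-metric is $d(t,t') = \sqrt{\mathrm{Var}(B_t - B_{t'})} = \sqrt{|t - t'|}$, so a $d$-ball of radius $\varepsilon$ is an interval of length $\varepsilon^2$ and $N(T, d; \varepsilon) \sim \varepsilon^{-2}$; the sum over dyadic scales then converges, predicting a finite expected supremum (which for Brownian motion is in fact $\sqrt{2/\pi} \approx 0.80$, by the reflection principle).

```python
import math
import random

# Illustrative sketch: chaining data for Brownian motion on T = [0, 1], whose
# canonical pseudo-metric is d(t, t') = sqrt(|t - t'|).  A d-ball of radius
# eps is an interval of length eps^2, so N(T, d; eps) ~ eps^{-2}, and the
# dyadic entropy sum  sum_n 2^{-n} sqrt(log N(T, d; 2^{-n}))  converges,
# consistent with E sup_t B_t being finite.

def covering_number(eps):
    # number of intervals of length eps^2 needed to cover [0, 1]
    return max(1, math.ceil(1.0 / (eps * eps)))

dudley_sum = sum(2.0 ** (-n) * math.sqrt(math.log(covering_number(2.0 ** (-n))))
                 for n in range(1, 40))
print(dudley_sum)  # a finite constant

# Monte Carlo estimate of E sup_{0 <= t <= 1} B_t on a discrete grid
# (the discretisation slightly undershoots the continuum value sqrt(2/pi)).
random.seed(0)
steps, trials, total = 500, 1000, 0.0
for _ in range(trials):
    b, m = 0.0, 0.0
    for _ in range(steps):
        b += random.gauss(0.0, math.sqrt(1.0 / steps))
        m = max(m, b)
    total += m
print(total / trials)  # roughly 0.8
```
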
We refer to this as the chaining argument; it is particularly effective when one has good bounds on the metric entropies $N(T, d; 2^{-n})$. A prototype application of the chaining argument is Dudley's inequality [20]
$$\mathbf{E} \sup_{t \in T} X_t \lesssim \sum_{n \in \mathbb{Z}} 2^{-n} \sqrt{\log N(T, d; 2^{-n})}$$
whenever $(X_t)_{t \in T}$ is a (mean zero) Gaussian process, with $T$ equipped with the metric (or more precisely, pseudo-metric)
$$d(t, t') := \sqrt{\mathrm{Var}(X_t - X_{t'})}.$$
This inequality can be readily proven by combining (6.2) with (6.1) and the union bound; see for instance [33, §2] for a clear treatment. However, the chaining argument is substantially more general and flexible than this, with many early applications of the chaining method due to Bourgain. Perhaps the most well known is the following result:
Theorem 6.1 (Random sets of orthonormal systems have the $\Lambda(p)$ property). Let $\varphi_1, \dots, \varphi_n$ be a system of bounded orthonormal functions on a probability space $(X, \mu)$, let $2 < p < \infty$, and let $S \subset \{1, \dots, n\}$ be a random set with each $i = 1, \dots, n$ lying in $S$ with an independent probability of $n^{2/p - 1}$. Then with probability $\sim 1$, one has the “$\Lambda(p)$ inequality”
$$\Big\| \sum_{i \in S} a_i \varphi_i \Big\|_{L^p(X,\mu)} \lesssim_p \Big( \sum_{i \in S} |a_i|^2 \Big)^{1/2}$$
for all real or complex numbers $a_i$.

This theorem famously resolved a long-standing problem in harmonic analysis, namely whether it was possible to produce an (infinite) family of plane waves $x \mapsto e^{2\pi i n x}$ on the unit circle $\mathbb{R}/\mathbb{Z}$ which obeyed the $\Lambda(p)$ inequality but not the $\Lambda(q)$ inequality for a given choice of $2 < p < q < \infty$.

The proof of Theorem 6.1 is quite complicated and we only give an extremely oversimplified sketch here. The main difficulty here is that one needs to control a random uncountable supremum
$$K := \sup_{|a| \leq 1} \Big\| \sum_{1 \leq i \leq n : i \in S} a_i \varphi_i \Big\|_{L^p(X,\mu)} \quad (6.3)$$
where $a = (a_1, \dots, a_n)$ ranges over vectors of norm at most 1. Raising this expression to the power $p$, we obtain
$$\sup_{|a| \leq 1} \int_X \Big( \sum_{1 \leq i \leq n : i \in S} a_i \varphi_i \Big) \Big( \sum_{1 \leq j \leq n : j \in S} a_j \varphi_j \Big) \Big| \sum_{1 \leq k \leq n : k \in S} a_k \varphi_k \Big|^{p-2} \, d\mu. \quad (6.4)$$
The set $S$ appears here three times, but by randomly decomposing $S$ into three subsets $S_1, S_2, S_3$, and also judiciously splitting $a$ into three components $a, b, c$, it turns out that one can reduce the task of bounding this expression into that of bounding “decoupled” analogues of (6.4) such as
$$\sup_{|a|, |b|, |c| \leq 1} \int_X \Big( \sum_{1 \leq i \leq n : i \in S_1} a_i \varphi_i \Big) \Big( \sum_{1 \leq j \leq n : j \in S_2} b_j \varphi_j \Big) \Big| \sum_{1 \leq k \leq n : k \in S_3} c_k \varphi_k \Big|^{p-2} \, d\mu.$$
In fact by using some further dyadic decompositions (in the spirit of Section 4) one can make further restrictions on the support and pointwise magnitudes of $a, b, c$; a typical such restriction to keep in mind is that there is some $1 \leq m \leq n$ for which each of the $a, b, c$ are supported on sets of cardinality at most $m$ and are pointwise bounded by $O(m^{-1/2})$. One then applies a variant of Dudley's inequality to control the supremum in $a$, reducing matters to controlling metric entropies of a collection of functions of the form
$$\Big\{ \Big( \sum_{1 \leq j \leq n : j \in S_2} b_j \varphi_j \Big) \Big| \sum_{1 \leq k \leq n : k \in S_3} c_k \varphi_k \Big|^{p-2} : |b|, |c| \leq 1 \Big\}$$
(with additional constraints on the support of $b, c$ that we do not detail here). These are controlled in turn by elementary inequalities (such as Hölder's inequality), as well as a variant of the random variable $K$ defined in (6.3), as well as some metric entropy estimates of Bourgain, Lindenstrauss, and Milman [13].

The chaining method was later streamlined into the generic chaining method of Talagrand, in which the metric balls $\{ t' : d(t,t') \leq \varepsilon \}$ that implicitly appear in the chaining argument are allowed to be weighted by an arbitrary measure on $T$ known as a majorizing measure, leading to estimates that are essentially optimal in many situations, and can be used for instance to give a simplified proof of Theorem 6.1 with a stronger conclusion, and which avoids the use of decoupling methods; see [34] for details.

There are many other works of Bourgain (and coauthors) in which metric entropy and chaining arguments are used to bound large deviations of supremum type quantities. Here is a small sample:

● The paper [13] concerns the approximation theory of zonotopes (finite sums of intervals in a normed vector space), showing that all zonoids (limits of zonotopes) can be efficiently approximated by “low complexity” zonotopes.
A key step is to use a chaining argument to show that an $L^1$ norm $\|x\|_{L^1(X,\mu)}$ can be efficiently approximated by an empirical sample $\frac{1}{N} \sum_{i=1}^N |x(\omega_i)|$ at randomly drawn points $\omega_1, \dots, \omega_N \in X$, uniformly for all $x$ in a certain convex body, relying heavily on metric entropy estimates on convex bodies such as the dual Sudakov inequality [25].

● In [8], a metric entropy and concentration of measure argument is used to show that Montgomery's large values conjecture [24] in analytic number theory, on the distribution of large values of a Dirichlet polynomial $\sum_{n \sim M} a_n n^{it} = \sum_{n \sim M} a_n e^{it \log n}$, is almost surely true if one replaces the frequencies $\log n$ by a suitable random set. (This sort of trick, decoupling probabilistic expressions of dependent random variables into probabilistic expressions of independent random variables, is another useful member of Bourgain's toolkit that is now widely used in probability theory. We will not discuss this decoupling trick in further detail here, but see for instance [18].)

● In [10], a metric entropy and concentration of measure argument (as well as the decoupling trick) is used to show that if $K$ is a symmetric convex body of unit volume whose moment of inertia $\int_K y y^T\, dy$ is normalised to be a constant multiple $LI$ of the identity matrix, then this moment of inertia can be closely approximated by that of a surprisingly small number of randomly chosen points from $K$; this has applications to random matrix theory and statistics, in particular in allowing one to compare a covariance matrix with an empirical sample of that matrix [36].

● In [12], a chaining argument combined with dyadic decomposition is used to show that a randomly selected collection $S$ of rows in a bounded orthonormal system obeys the "restricted isometry property" of being an approximate isometry when restricted to any $m$ columns, as long as the cardinality of $S$ is only slightly larger than $m$, improving quantitatively over previous bounds [16, 32] in the area.

7. Putting it all together: a case study
Let us say that a subset $S$ of the reals has property (E) if every measurable subset $A$ of $\mathbb{R}$ of positive measure contains an affine image $x + tS := \{x + ty : y \in S\}$ of $S$ for some $x \in \mathbb{R}$ and $t \neq 0$. An easy application of the Lebesgue density theorem shows that every finite set of reals has property (E). The following question of Erdős is still unsolved:
Problem 7.1 (Erdős similarity problem). Does there exist an infinite set $S$ with property (E)?

One of the strongest general negative results in this direction is by Bourgain [4]:
Theorem 7.2 (Triple sumsets fail (E), qualitative version). Let $S_1, S_2, S_3$ be infinite subsets of $\mathbb{R}$. Then the sumset $S_1 + S_2 + S_3 := \{ s_1 + s_2 + s_3 : s_1 \in S_1, s_2 \in S_2, s_3 \in S_3 \}$ fails property (E).

The corresponding question for double sumsets $S_1 + S_2$ remains open; it will be clear shortly why it is necessary in Bourgain's arguments to have at least three summands. As we shall see, the proof of this result can largely be described as an application of several of the tools discussed in this paper. The first step is to convert the problem to a quantitative one, as per Section 3. The quantitative formulation is as follows:

Theorem 7.3 (Triple sumsets fail (E), quantitative version). Let $S_1, S_2, S_3$ be bounded infinite subsets of $\mathbb{R}$ containing 0 as an adherent point. Then there does not exist a constant $C$ for which one has the bound
$$\int_{(\mathbb{R}/\mathbb{Z})^J} \inf_{1 \leq t \leq 2} \sup_{x' \in x + tSv} |f(x')|\, dx \leq C \int_{(\mathbb{R}/\mathbb{Z})^J} |f(x)|\, dx$$
for all tori $(\mathbb{R}/\mathbb{Z})^J$, all continuous $f : (\mathbb{R}/\mathbb{Z})^J \to \mathbb{R}$, all vectors $v \in \mathbb{R}^J$, and all finite subsets $S$ of $S_1 + S_2 + S_3$.
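To unpack the quantity on the left-hand side of this inequality, here is a toy discrete evaluation in the one-dimensional case $J = 1$ (so that $v$ is a scalar). The three-point set S, the indicator function f, and all grid parameters below are invented purely to illustrate the definition and have no bearing on the actual counterexample:

```python
import numpy as np

# Toy evaluation of  int inf_{1<=t<=2} sup_{x' in x+tSv} |f(x')| dx
# on a discretized circle R/Z, with J = 1.  The set S, the function f,
# and all parameters are hypothetical choices for illustration only.
rng = np.random.default_rng(2)
n = 2000
f = (rng.random(n) < 0.3).astype(float)   # a toy indicator function on R/Z
S = np.array([0.0, 0.11, 0.23])           # a hypothetical finite set (0 included)
v = 1.0
ts = np.linspace(1.0, 2.0, 21)            # discretization of the dilations t

sups = np.empty((len(ts), n))
for k, t in enumerate(ts):
    shifts = np.rint(t * S * v * n).astype(int)
    # sup over the affine copy x + tSv, for every grid point x at once
    sups[k] = np.max([np.roll(f, -s) for s in shifts], axis=0)

lhs = sups.min(axis=0).mean()   # integral of inf_t sup_{x' in x+tSv} |f(x')|
rhs = f.mean()                  # integral of |f|
print(f"LHS = {lhs:.3f}, RHS = {rhs:.3f}")
```

Since 0 was placed in this toy S, the left-hand side automatically dominates the right-hand side; the theorem asserts that no uniform constant $C$ can bound the left side by $C$ times the right side over all tori, vectors $v$, and finite subsets of $S_1 + S_2 + S_3$ once $f$ is chosen adversarially.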
Let us now sketch why Theorem 7.3 implies Theorem 7.2. (The converse implication is also true; see [4].) By passing to subsets, we may assume that $S_1, S_2, S_3$ are bounded; by Bolzano–Weierstrass and translation, we may assume that each of the $S_i$ contains 0 as an adherent point. Let $\delta > 0$ and $M > 0$. From Theorem 7.3 and rescaling, one can find a torus $(\mathbb{R}/\mathbb{Z})^J$, a vector $v \in \mathbb{R}^J$, a finite subset $S$ of $S_1 + S_2 + S_3$, and a continuous function $f$ (which we can take to be non-negative) such that
$$\int_{(\mathbb{R}/\mathbb{Z})^J} F(x)\, dx > M \int_{(\mathbb{R}/\mathbb{Z})^J} f(x)\, dx$$
where
$$F(x) := \inf_{\delta \leq t \leq 2\delta} \sup_{x' \in x + tSv} f(x').$$
Applying dyadic pigeonholing as per Section 4 (writing $\int_{(\mathbb{R}/\mathbb{Z})^J} F\, dx = \int_0^\infty |\{F \geq \lambda\}|\, d\lambda$ and $\int_{(\mathbb{R}/\mathbb{Z})^J} f\, dx = \int_0^\infty |\{f > \lambda\}|\, d\lambda$), we can then find a threshold $\lambda > 0$ for which
$$|\{F \geq \lambda\}| > M |\{f > \lambda\}|.$$
If we write $A := \{f > \lambda\}$ and
$$A^* := \bigcap_{\delta \leq t \leq 2\delta} \bigcup_{y \in S} (A - tyv)$$
then we have $|A^*| \geq M|A|$. The set $A^*$ could still be small compared with $(\mathbb{R}/\mathbb{Z})^J$; however, by using the random translations trick as per Section 5, we can find an open set $B \subset (\mathbb{R}/\mathbb{Z})^J$ (a union of finitely many translates of $A$) of arbitrarily small measure such that the set
$$B^* := \bigcap_{\delta \leq t \leq 2\delta} \bigcup_{y \in S} (B - tyv)$$
has measure arbitrarily close to 1. By construction, the set $B^* \setminus B$ does not contain any set of the form $x + t(S_1 + S_2 + S_3)v$ with $x \in (\mathbb{R}/\mathbb{Z})^J$ and $\delta \leq t \leq 2\delta$ (here we use the fact that $B$ is open and 0 is an adherent point of $S_1 + S_2 + S_3$). If one then considers the set $\{ y \in [0,1] : x + yv \in B^* \setminus B \}$ for a randomly chosen $x \in (\mathbb{R}/\mathbb{Z})^J$, one can then find a subset of $[0,1]$ of measure arbitrarily close to 1 that does not contain any set of the form $x' + t(S_1 + S_2 + S_3)$ with $x' \in \mathbb{R}$ and $\delta \leq t \leq 2\delta$. Taking intersections over all small dyadic choices of $\delta$ (and then restricting to a small interval to eliminate large scales) we can then establish Theorem 7.2.

Now we sketch the proof of Theorem 7.3. We have to find a continuous function $f : (\mathbb{R}/\mathbb{Z})^J \to \mathbb{R}$ and a vector $v \in \mathbb{R}^J$ for which
$$\int_{(\mathbb{R}/\mathbb{Z})^J} \inf_{1 \leq t \leq 2} \sup_{x' \in x + tSv} |f(x')|\, dx \gg \int_{(\mathbb{R}/\mathbb{Z})^J} |f(x)|\, dx$$
for some finite $S \subset S_1 + S_2 + S_3$, where we informally use $X \gg Y$ to denote the claim that $X$ is much larger than $Y$. The next idea is to take advantage of large deviations as per Section 6.
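The random translations step used above can be illustrated in isolation: a set $A$ of small positive measure has finitely many random translates whose union covers all but an exponentially small fraction of the torus. The following sketch, on a discretized circle with arbitrarily chosen parameters, is a toy model of this phenomenon and not a step from Bourgain's proof:

```python
import numpy as np

# Random translations trick, in isolation: if |A| = a > 0, then each point
# of R/Z is missed by N independent random translates of A with probability
# (1-a)^N, so the union of the translates covers all but an exponentially
# small fraction of the circle.  Grid size, a, and N are arbitrary choices.
rng = np.random.default_rng(1)
n = 10_000                       # grid points on the circle R/Z
a = 0.2                          # measure of A
A = np.zeros(n, dtype=bool)
A[: int(a * n)] = True           # A = the arc [0, 0.2)

N = 60                           # number of random translates
covered = np.zeros(n, dtype=bool)
for _ in range(N):
    covered |= np.roll(A, rng.integers(n))

uncovered = 1.0 - covered.mean()
print(f"uncovered fraction = {uncovered:.2e}  (heuristic: (1-a)^N = {(1-a)**N:.2e})")
```

In the actual argument one needs more than covering: the translates must be chosen so that the union $B$ itself has small measure while the associated set $B^*$ has measure close to 1. The exponential decay exhibited above is the engine behind such constructions.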
To do this one needs to select the vector $v$ so that the collection of dilates $tSv \bmod \mathbb{Z}^J$, $1 \leq t \leq 2$, has low "metric entropy". We will take $J := 3J_0$ for a large $J_0$; then, as $S_1, S_2, S_3$ all have 0 as an adherent point, we can find non-zero real numbers $s_{i,j} \in S_i$ for $i = 1, 2, 3$ and $j = 1, \dots, J_0$ with the relative size relation
$$|s_{1,1}| \gg \dots \gg |s_{1,J_0}| \gg |s_{2,1}| \gg \dots \gg |s_{2,J_0}| \gg |s_{3,1}| \gg \dots \gg |s_{3,J_0}| > 0.$$
We then let $v \in \mathbb{R}^J$ be the vector $v := (s_{i,j})_{i=1,2,3;\ j=1,\dots,J_0}$. If we set
$$S := \{ s_{1,j_1} + s_{2,j_2} + s_{3,j_3} : 1 \leq j_1, j_2, j_3 \leq J_0 \},$$
then $S$ is a finite subset of $S_1 + S_2 + S_3$. A routine calculation shows that for any $1 \leq t \leq 2$, the set $tSv \bmod \mathbb{Z}^J$ consists of $J_0^3$ points separated from each other (in the $\ell^\infty$ metric on $(\mathbb{R}/\mathbb{Z})^J$) by $\gtrsim 1$. If this set had no further structure, one would then expect the "entropy" of such sets to be exponential in $J \times J_0^3 \sim J_0^4$. However, the arithmetic structure gives this set significantly lower entropy. Indeed, observe that each of the $J$ coordinates of each of the $J_0^3$ elements of $tSv \bmod \mathbb{Z}^J$ is a sum of three quantities of the form $t s_{i',j'} s_{i,j} \bmod 1$ for $i, i' = 1, \dots, 3$ and $j, j' = 1, \dots, J_0$. Thus the set $tSv \bmod \mathbb{Z}^J$ is completely described by $O(J_0^2)$ parameters, so the metric entropy of these sets (using the Hausdorff metric associated to the $\ell^\infty$ norm on $(\mathbb{R}/\mathbb{Z})^J$) at any given metric scale $0 < \delta < 1$ is $O(1/\delta)^{O(J_0^2)}$. In particular, this entropy is subexponential compared to the cardinality $J_0^3$ of $S$. (It is at this point that it is essential that we have at least three summands in the set $S_1 + S_2 + S_3$.)

Now we sketch how to construct the function $f$. Let $\varepsilon >$
$0$, and suppose $J_0$ is large depending on $\varepsilon$. Partition $(\mathbb{R}/\mathbb{Z})^J$ into cubes of sidelength (say) $1/J_0$, and let $E$ be the union of a random collection of these cubes, with each cube selected in $E$ with an independent probability of $\varepsilon$. We set $f$ to be the indicator function $1_E$ (we ignore for this sketch the requirement that $f$ be continuous, as this can be addressed by a standard mollification). Then $\int_{(\mathbb{R}/\mathbb{Z})^J} |f(x)|\, dx \sim \varepsilon$ with high probability. On the other hand, for any given $x \in (\mathbb{R}/\mathbb{Z})^J$ and $1 \leq t \leq 2$, we will have
$$\sup_{x' \in x + tSv} |f(x')| = 1$$
with probability $1 - (1-\varepsilon)^{J_0^3} \geq 1 - \exp(-\varepsilon J_0^3)$, since the $J_0^3$ separated points of $x + tSv$ lie in distinct cubes. Using the metric entropy bound, one can then show that
$$\inf_{1 \leq t \leq 2} \sup_{x' \in x + tSv} |f(x')| = 1$$
with probability $1 - O\big( O(J_0)^{O(J_0^2)} \exp(-\varepsilon J_0^3) \big)$, so that the left-hand side $\int_{(\mathbb{R}/\mathbb{Z})^J} \inf_{1 \leq t \leq 2} \sup_{x' \in x + tSv} |f(x')|\, dx$ is comparable to 1 if $J_0$ is large enough depending on $\varepsilon$. This gives the claim.

References

[1] K. Ball,
The Legacy of Jean Bourgain in Geometric Functional Analysis, preprint.
[2] J. Bourgain, A Szemerédi type theorem for sets of positive density in R^k, Israel J. Math. (1986), no. 3, 307–316.
[3] J. Bourgain, Remarks on the extension of Lipschitz maps defined on discrete sets and uniform homeomorphisms, Geometrical aspects of functional analysis (1985/86), 157–167, Lecture Notes in Math., 1267, Springer, Berlin, 1987.
[4] J. Bourgain, Construction of sets of positive measure not containing an affine image of a given infinite structure, Israel J. Math. (1987), no. 3, 333–344.
[5] J. Bourgain, Pointwise ergodic theorems for arithmetic sets, with an appendix by the author, Harry Furstenberg, Yitzhak Katznelson and Donald S. Ornstein, Inst. Hautes Études Sci. Publ. Math. No. 69 (1989), 5–45.
[6] J. Bourgain, Besicovitch type maximal operators and applications to Fourier analysis, Geom. Funct. Anal. 1 (1991), no. 2, 147–187.
[7] J. Bourgain, On the distribution of Dirichlet sums, J. Anal. Math. (1993), 21–32.
[8] J. Bourgain, Remarks on Halász–Montgomery type inequalities, Geometric aspects of functional analysis (Israel, 1992–1994), 25–39, Oper. Theory Adv. Appl., 77, Birkhäuser, Basel, 1995.
[9] J. Bourgain, Global wellposedness of defocusing critical nonlinear Schrödinger equation in the radial case, J. Amer. Math. Soc. (1999), no. 1, 145–171.
[10] J. Bourgain, Random points in isotropic convex sets, Convex geometric analysis (Berkeley, CA, 1996), 53–58, Math. Sci. Res. Inst. Publ., 34, Cambridge Univ. Press, Cambridge, 1999.
[11] J. Bourgain, On triples in arithmetic progression, Geom. Funct. Anal. (1999), no. 5, 968–984.
[12] J. Bourgain, An improved estimate in the restricted isometry problem, Geometric aspects of functional analysis, 65–70, Lecture Notes in Math., 2116, Springer, Cham, 2014.
[13] J. Bourgain, J. Lindenstrauss, V. Milman, Approximation of zonoids by zonotopes, Acta Math. 162 (1989), no. 1-2, 73–141.
[14] J. Bourgain, L. Tzafriri, Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis, Israel J. Math. 57 (1987), no. 2, 137–224.
[15] J. Bourgain, L. Tzafriri, On a problem of Kadison and Singer, J. Reine Angew. Math. 420 (1991), 1–43.
[16] E. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies?, IEEE Trans. Inform. Theory (2006), no. 12, 5406–5425.
[17] A. Cauchy, Anciens Exercises, vol. 2 (1827), p. 221.
[18] V. de la Peña, E. Giné, Decoupling. From dependence to independence. Randomly stopped processes. U-statistics and processes. Martingales and beyond, Probability and its Applications (New York), Springer-Verlag, New York, 1999.
[19] C. Demeter, Bourgain's work in Fourier restriction, preprint.
[20] R. Dudley, The sizes of compact subsets of Hilbert space and continuity of Gaussian processes, Journal of Functional Analysis (1967), 290–330.
[21] H. Furstenberg, Y. Katznelson, B. Weiss, Ergodic theory and configurations in sets of positive density, Mathematics of Ramsey theory, 184–198, Algorithms Combin., 5, Springer, Berlin, 1990.
[22] C. Kenig, On the work of Jean Bourgain in nonlinear dispersive equations, preprint.
[23] M. Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, 89, American Mathematical Society, Providence, RI, 2001.
[24] H. Montgomery, Topics in multiplicative number theory, Lecture Notes in Mathematics, Vol. 227, Springer-Verlag, Berlin-New York, 1971.
[25] A. Pajor, N. Tomczak-Jaegermann, Subspaces of small codimension of finite dimensional Banach spaces, Proc. Amer. Math. Soc. (1986), 634–642.
[26] G. Pisier, Factorization of operators through $L_{p,\infty}$ or $L_{p,1}$ and noncommutative generalizations, Math. Ann. 276 (1986), no. 1, 105–136.
[27] M. Ribe, On uniformly homeomorphic normed spaces, Ark. Mat. (1976), no. 2, 237–244.
[28] B. Rodgers, T. Tao, The de Bruijn–Newman constant is non-negative, Forum Math. Pi (2020), e6, 62 pp.
[29] G.-C. Rota, Ten lessons I wish I had been taught, Notices Amer. Math. Soc. 44 (1997), no. 1, 22–25.
[30] E. M. Stein, On limits of sequences of operators, Ann. of Math. (2) (1961), 140–170.
[31] K. F. Roth, On certain sets of integers, J. London Math. Soc. (1953), 104–109.
[32] M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Comm. Pure Appl. Math. (2008), no. 8, 1025–1045.
[33] M. Talagrand, Majorizing measures: the generic chaining, Ann. Probab. 24 (1996), no. 3, 1049–1103.
[34] M. Talagrand, The generic chaining. Upper and lower bounds of stochastic processes, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2005.
[35] T. Tao, V. Vu, Additive combinatorics, Cambridge Studies in Advanced Mathematics, 105, Cambridge University Press, Cambridge, 2006.
[36] R. Vershynin, How close is the sample covariance matrix to the actual covariance matrix?, J. Theoret. Probab. (2012), no. 3, 655–686.

UCLA Department of Mathematics, Los Angeles, CA 90095-1555.