On self-similar sets with overlaps and inverse theorems for entropy in R^d
Michael Hochman
Abstract
We study self-similar sets and measures on R^d. Assuming that the defining iterated function system Φ does not preserve a proper affine subspace, we show that one of the following holds: (1) the dimension is equal to the trivial bound (the minimum of d and the similarity dimension s); (2) for all large n there are n-fold compositions of maps from Φ which are super-exponentially close in n; (3) there is a non-trivial linear subspace of R^d that is preserved by the linearization of Φ and whose translates typically meet the set or measure in full dimension. In particular, when the linearization of Φ acts irreducibly on R^d, either the dimension is equal to min{s, d} or there are super-exponentially close n-fold compositions. We give a number of applications to algebraic systems, parametrized systems, and to some classical examples.

The main ingredient in the proof is an inverse theorem for the entropy growth of convolutions of measures on R^d, and the growth of entropy for the convolution of a measure on the orthogonal group with a measure on R^d. More generally, this part of the paper applies to smooth actions of Lie groups on manifolds.

Supported by ERC grant 306494, partially supported by ISF grant 1409/11. MSC: 28A80, 11K55, 11B30, 11P70.

Contents

2.5 An inverse theorem for isometries acting on R^d
2.6 Generalizations
3 Entropy, concentration, uniformity and saturation
5.3 Entropy and the G-action on R^d
5.4 Linearization of the G-action
5.5 Proof of the inverse theorem
5.6 Generalizations
6.3 Saturated subspaces of self-similar measures
6.4 Entropy and dimension for self-similar measures
6.5 Proof of Theorem 1.5
6.6 Transversality and the dimension of exceptions
6.7 Applications and further comments

1 Introduction

Self-similar sets and measures are among the simplest fractal objects: their defining property is that the whole is made up of finitely many objects similar to it, i.e. identical to the whole except for scaling, rotation and translation. When these smaller copies are sufficiently separated from each other the small-scale structure is relatively easy to understand, and in particular the Hausdorff dimension can be computed precisely in terms of the defining similitudes. Without separation, however, things are significantly more complicated, and it is an open problem to compute the dimension. Many special cases of this problem have received attention, including the Erdős problem on Bernoulli convolutions, Furstenberg's projection problem for the 1-dimensional Sierpinski gasket (now settled), the Keane-Smorodinsky {0,1,3}-problem, and "fat" Sierpinski gaskets (for more on these, see below).

For self-similar sets and measures in R there is a longstanding conjecture predicting that the dimension will be "as large as possible", subject to the combinatorial constraints, unless there are exact overlaps, i.e. unless some of the (iterated) small-scale copies of the original coincide. In recent work [12] we introduced methods from additive combinatorics to this problem and obtained a partial result towards the conjecture, showing that if the dimension is "too small" then there are super-exponentially close pairs of small-scale copies. In particular, for some important classes of self-similar sets, e.g. those defined by similarities with algebraic coefficients, this resolves the conjecture.

In the present paper we treat the general case of self-similar sets and measures in R^d.
Easy examples show that in the higher-dimensional setting the conjecture above is false as stated (Example 1.2). The main new feature of the problem is that the linear parts of the defining similarities may act reducibly on R^d, and "excess dimension" may accumulate on non-trivial invariant subspaces and produce dimension loss. To correct this we propose here a modified version of the conjecture that takes this possibility into account (Conjecture 1.3), and prove a weak version of it (Theorem 1.5), analogous to the main result of [12]. We give various applications; in particular we show that the modified conjecture holds when the linear action is irreducible and the coefficients of the similarities are algebraic.

As in the 1-dimensional case, a central ingredient in the proof is an inverse theorem about the structure of probability measures on R^d whose convolutions have essentially the same entropy as the original (Theorem 2.8). In fact, what we really need is a result of this type for the convolution of a measure on R^d with a measure on the similarity group, or one on the isometry group (Theorem 2.12). These results are of independent interest, and provide a versatile tool for analyzing smooth images of product measures. We take the opportunity to develop these methods here, in particular stating results for convolutions in Lie groups and their actions (Theorems 2.12, 2.14 and the subsequent corollaries).

Let G denote the group of similarities of R^d, namely maps x ↦ rUx + a for r ∈ (0, ∞), a ∈ R^d and U a d × d orthogonal matrix; we denote the map simply by ϕ = rU + a. In this paper an iterated function system (IFS) means a finite family Φ = {ϕ_i}_{i∈Λ} ⊆ G consisting of contractions, so ϕ_i = r_i U_i + a_i with 0 < r_i < 1. A self-similar set is the attractor of such a system, defined as the unique non-empty compact set X ⊆ R^d satisfying

X = ⋃_{i∈Λ} ϕ_i X.
(1)

The self-similar measure determined by Φ and a positive probability vector (p_i)_{i∈Λ} is the unique Borel probability measure µ on R^d satisfying

µ = Σ_{i∈Λ} p_i · ϕ_i µ.

Here and throughout, ϕµ = µ ∘ ϕ^{-1} denotes the push-forward of µ by ϕ.

It is a classical problem to understand the small-scale structure of self-similar sets and measures, and especially their dimension. We shall write dim A for the Hausdorff dimension of A, and define the dimension of a finite Borel measure θ by

dim θ = inf{dim E : θ(E) > 0}.

The textbook case of self-similar sets and measures occurs when the images ϕ_i X are disjoint, or satisfy some weaker separation assumption (e.g. the open set condition). Then the dimension can be computed exactly: dim X is equal to the similarity dimension s-dim X, i.e. the unique s ≥ 0 solving the equation Σ r_i^s = 1, and dim µ is equal to the similarity dimension of µ, defined by

s-dim µ = (Σ p_i log p_i) / (Σ p_i log r_i).

It is when the images ϕ_i X have more substantial overlap that the problem becomes very challenging. The similarity dimension, and the dimension d of the ambient space R^d, still constitute upper bounds. Thus one always has

dim X ≤ min{d, s-dim X}   (2)
dim µ ≤ min{d, s-dim µ}.  (3)

In general little more is known. In fact, we usually cannot even determine whether or not equality holds in (2) and (3). There is one exception to this, which arises from combinatorial coincidences of cylinder sets. For i = i_1...i_n ∈ Λ^n write

ϕ_i = ϕ_{i_1} ∘ ... ∘ ϕ_{i_n}.

One says that exact overlaps occur if there is an n and distinct i, j ∈ Λ^n such that ϕ_i = ϕ_j (in particular the images ϕ_i X and ϕ_j X coincide). If this occurs then the attractor (or self-similar measure) can be expressed using an IFS Ψ which is a proper subset of {ϕ_i}_{i∈Λ^n}, and a strict inequality in (2) and (3) may follow from the trivial bounds (2) and (3) applied to the IFS Ψ.
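Both similarity dimensions are straightforward to evaluate numerically. A minimal sketch (the middle-thirds Cantor system used below is a hypothetical test case, not one discussed in the text):

```python
import math

def s_dim_set(ratios):
    """Solve sum_i r_i^s = 1 for the similarity dimension s >= 0.
    The map s -> sum r_i^s is strictly decreasing, so bisection applies."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(r ** mid for r in ratios) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def s_dim_measure(probs, ratios):
    """s-dim mu = (sum p_i log p_i) / (sum p_i log r_i)."""
    return (sum(p * math.log(p) for p in probs)
            / sum(p * math.log(r) for p, r in zip(probs, ratios)))

# Hypothetical test case: the middle-thirds Cantor IFS {x/3, x/3 + 2/3}.
print(s_dim_set([1/3, 1/3]))                    # log 2 / log 3 ~ 0.6309
print(s_dim_measure([1/2, 1/2], [1/3, 1/3]))    # same value at uniform weights
```

With uniform weights on an IFS of equal ratios the two notions agree, as the second call illustrates.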
Define the distance between similarities ψ = rU + a and ψ' = r'U' + a' by

d(ψ, ψ') = |log r - log r'| + ||U - U'|| + ||a - a'||.   (4)

Here ||·|| denotes the Euclidean or operator norm as appropriate. Given an IFS Φ = {ϕ_i}_{i∈Λ}, let

∆_n = min{d(ϕ_i, ϕ_j) : i, j ∈ Λ^n, i ≠ j}.   (5)

This is the lower Hausdorff dimension. Many other notions of dimension exist, but since self-similar measures are exact dimensional [8], for them all the major ones coincide.

The similarity dimension depends on the IFS Φ rather than the attractor, but we prefer the shorter notation s-dim X in which Φ is implicit. The meaning should always be clear from the context. A similar comment holds for the similarity dimension of measures.

If i ∈ Λ^k, j ∈ Λ^m and ϕ_i = ϕ_j, then i cannot be a proper prefix of j and vice versa, because the maps are all contractions. Thus ij, ji ∈ Λ^{k+m} are distinct, and ϕ_{ij} = ϕ_{ji}. This shows that our definition is equivalent to one asking for coincidence of compositions of possibly different lengths. Stated differently, exact overlaps means that the semigroup generated by the ϕ_i, i ∈ Λ, is not freely generated by them.

In [12] we used the stronger metric in which the term |log r - log r'| is replaced by the discrete distance δ_{r,r'}. One could do the same here, but the metric above is better suited to some of the generalizations presented in Section 2.6 and is good enough for our applications, so we restrict ourselves to it.

If there are exact overlaps then ∆_n = 0 for all large n, and in any case it is easy to see that ∆_n → 0 at least exponentially fast (this is an easy consequence of contraction). Convergence may or may not be faster than this, but we note that in some cases there is an exponential lower bound ∆_n ≥ c^n > 0.

The main result of [12] was a step towards the folklore conjecture that when d = 1, the occurrence of exact overlaps is the only mechanism which can lead to a strict inequality in (2) and (3). Specifically, we proved the following [12, Corollary 1.2]:

Theorem 1.1.
For a self-similar set X ⊆ R, if dim X < min{1, s-dim X} then ∆_n → 0 super-exponentially, i.e. -(1/n) log ∆_n → ∞. The same conclusion holds if dim µ < min{1, s-dim µ} for a self-similar measure µ on X.

When d ≥ 2, the analogous conjecture and analogous theorem are both false. A trivial class of counterexamples arises when the maps in Φ preserve a non-trivial affine subspace V < R^d, which is equivalent to having X ⊆ V. In this case, if s-dim X > dim V, then the trivial bound gives

dim X ≤ min{dim V, s-dim X} = dim V < min{d, s-dim X},

even though there may be no exact overlaps.

We say that Φ is affinely irreducible if no proper affine subspace of R^d is simultaneously preserved by all ϕ_i ∈ Φ. The following example shows that affine irreducibility is also not enough for an analog of Theorem 1.1 to hold.

Example 1.2.
Begin with the IFS
Φ_0 = {ϕ_±} on R given by ϕ_±(x) = λ^{-1}x ± 1, where λ ≈ 1.52 is the real root of t^3 - t - 2. This example, due to Garsia [11], has the property that ∆_n ≥ c · 2^{-n}, and the attractor is the interval [-λ/(λ-1), λ/(λ-1)]. Let Φ = {ϕ_i}_{i∈{±}^3} denote the IFS consisting of all three-fold compositions of the maps ϕ_+, ϕ_-. Then Φ has the same attractor, and all the maps in Φ contract by the same ratio λ^{-3} < 1/2. Now let Ψ = {ϕ_{---}, ϕ_{+++}}, where ϕ_{+++} = ϕ_+ ∘ ϕ_+ ∘ ϕ_+ and ϕ_{---} = ϕ_- ∘ ϕ_- ∘ ϕ_-. Then Ψ is an IFS with the same contraction ratio λ^{-3} as Φ, but it satisfies the strong separation condition (its attractor Y is the disjoint union of ϕ_{+++}Y and ϕ_{---}Y), and hence dim Y = log 2/log(2+λ) < 1 (note that λ^3 = 2 + λ). Finally, take the product IFS Γ = Φ × Ψ, consisting of all maps of the form (x, y) ↦ (ϕx, ψy) for ϕ ∈ Φ, ψ ∈ Ψ. The attractor Z of Γ is just the product Z = [-λ/(λ-1), λ/(λ-1)] × Y of the attractors of Φ and Ψ, and its dimension is 1 + log 2/log(2+λ). We can compute the similarity dimension of Z using λ^3 = 2 + λ:

s-dim Z = log|Γ| / log λ^3 = log 16 / log(2+λ) < 3.

We therefore have (using λ < 2, so that s-dim Z > log 16/log 4 = 2):

dim Z = 1 + log 2/log(2+λ) < 2 = min{2, s-dim Z}.

On the other hand, since both Φ and Ψ have exponential lower bounds on the distance between cylinders, there is also an exponential lower bound for Γ. Thus, the example shows that a strict inequality in (2) can occur with neither exact overlaps nor super-exponential concentration of cylinders.

Two things stand out about this example. First, the foliation of R^2 by horizontal lines is preserved by all maps in Γ, and, second, the excess similarity dimension is being "absorbed" in the intersection of the attractor of Γ with these lines. Indeed, in these intersections we are seeing essentially the 1-dimensional IFS Φ, and we are not getting all of the potential dimension out of it, since its similarity dimension is > 1 but its attractor is "trapped" in a line.
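The numerology of the example can be checked numerically. The sketch below assumes λ is the real root of t^3 - t - 2, consistent with the identity λ^3 = 2 + λ appearing in the similarity-dimension computation:

```python
import math

def real_root(f, lo, hi, steps=200):
    """Bisection for an increasing function with f(lo) < 0 < f(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = real_root(lambda t: t**3 - t - 2, 1.0, 2.0)   # ~ 1.5214
dim_Y = math.log(2) / math.log(2 + lam)     # dim of the separated factor Y
dim_Z = 1 + dim_Y                           # Z = interval x Y
sdim_Z = math.log(16) / math.log(2 + lam)   # 16 maps, all with ratio lam^{-3}
print(dim_Z, sdim_Z)                        # dim_Z < 2 while s-dim Z > 2
```

So the product attractor falls strictly below the trivial bound min{2, s-dim Z} = 2.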
We do, however, have the maximal possible dimension for the intersection of Z with those horizontal lines that intersect it.

For an IFS Φ = {ϕ_i}_{i∈Λ} on R^d, we say that a linear subspace V < R^d is DΦ-invariant if it is invariant under the orthogonal parts (i.e. differentials) U_i = Dϕ_i of ϕ_i ∈ Φ, and non-trivial if 0 < dim V < d. If every DΦ-invariant subspace is trivial then Φ is said to be linearly irreducible. The discussion above suggests the following:

Conjecture 1.3.
Let X ⊆ R^d be the attractor of an affinely irreducible IFS Φ ⊆ G. Then one of the following must hold:

(i) dim X = min{d, s-dim X}.
(ii) There are exact overlaps.
(iii) There is a non-trivial DΦ-invariant linear subspace V ≤ R^d and x ∈ X such that dim(X ∩ (V + x)) = dim V.

One might even conjecture a stronger form of (iii), e.g. that the set of points x in question is of full dimension in X, or is large in some other sense.

The main result of this paper, Theorem 6.15, confirms a weakened version of Conjecture 1.3:

Theorem 1.4.
Let X ⊆ R^d be the attractor of an affinely irreducible IFS Φ ⊆ G. Then one of the following must hold:

(i') dim X = min{d, s-dim X}.
(ii') ∆_n → 0 super-exponentially.
(iii') There exists a non-trivial DΦ-invariant linear subspace V ≤ R^d and x ∈ X such that dim(X ∩ (V + x)) = dim V.

The alternatives are not exclusive (all three may hold simultaneously).

The theorem follows, as in the one-dimensional case, from a more precise statement about the entropy of the measure at small scales. We require some notation. The level-n dyadic partition D_n of R is the partition into intervals [k/2^n, (k+1)/2^n), k ∈ Z. The level-n dyadic partition of R^d is given by

D_n^d = {I_1 × ... × I_d : I_i ∈ D_n}.

We omit the superscript d when it is clear from the context.

For a probability measure ν and partitions E, F of the underlying probability space we write H(ν, E) = -Σ_{E∈E} ν(E) log ν(E) and H(ν, E|F) = H(ν, E ∨ F) - H(ν, F) for the entropy and conditional entropy of ν with respect to E (conditioned on F, respectively). Here E ∨ F is the common refinement of the partitions E, F. We also write H(ν) for the entropy of an atomic measure ν with respect to the partition into points.

It is convenient to parametrize G as a subset of R × M_d(R) × R^d, with (t, U, a) corresponding to 2^{-t}U + a ∈ G. Then the level-n dyadic partition D_n^G of G is defined as the partition induced from the corresponding level-n partition of R × M_d(R) × R^d ≅ R^{1+d²+d}. We also introduce the partitions E_n^G of G induced by the dyadic partition according to the translation part of the similarities, which in the parametrization G ⊆ R × M_d(R) × R^d is

E_n^G = {(R × M_d(R) × D) ∩ G : D ∈ D_n^d}.

Note that D_n^G refines E_n^G.

Given a self-similar measure µ = Σ_{i∈Λ} p_i · ϕ_i µ and assuming all p_i > 0, let

ν^{(n)} = Σ_{i∈Λ^n} p_i · δ_{ϕ_i}.
This is a probability measure on G, but if we fix x̃ in the attractor of Φ then the push-forward of ν^{(n)} via g ↦ g x̃ is the natural "n-th generation" approximation of µ, given by

ν̃^{(n)} = Σ_{i∈Λ^n} p_i · δ_{ϕ_i(x̃)}

(this measure depends on the choice of x̃, but this is of little consequence). Let

r = ∏_{i∈Λ} r_i^{p_i}

denote the (geometric) average contraction, and for n ∈ N let n' = [n log(1/r)], so that 2^{-n'} ∼ r^n. Now, it is not hard to show that |H(ν^{(n)}, E_{n'}^G) - H(ν̃^{(n)}, D_{n'})| = O(1) (in fact if we take x̃ = 0 then the two entropies are identical), and since it is easily seen that (1/n') H(ν̃^{(n)}, D_{n'}) → dim µ, one concludes

lim_{n→∞} (1/n') H(ν^{(n)}, E_{n'}^G) = dim µ.

Observe that when there are no exact overlaps, ν^{(n)} consists of |Φ|^n atoms whose masses are the products p_{i_1} · ... · p_{i_n}, and hence H(ν^{(n)}) = n · (-Σ p_i log p_i). Thus for fixed n,

(1/n') H(ν^{(n)}, D_k^G) → (Σ p_i log p_i) / log r = s-dim µ   as k → ∞,

and if there is a strict inequality in (3) we would have

(1/n') H(ν^{(n)}, D_k^G | E_{n'}^G) = (1/n') H(ν^{(n)}, D_k^G) - (1/n') H(ν^{(n)}, E_{n'}^G) → s-dim µ - dim µ > 0

as k → ∞ and then n → ∞.

It is also common to approximate µ "at scale ρ" by putting the appropriate mass on the points ϕ_{i_1...i_m}(x), where i_1...i_m ∈ Λ* are the sequences of minimal length such that ϕ_{i_1...i_m} contracts by at least ρ. We could use this approximation instead of ν^{(n)}, but this would lead to messier notation and have little advantage.

Thus, if the inequality (3) is strict, there is a sequence k = k(n) such that the "excess" (1/n') H(ν^{(n)}, D_{k(n)}^G | E_{n'}^G) remains bounded away from 0 as n → ∞. It is natural to ask at what rate this excess entropy emerges, that is, how fast k(n) must grow for this to hold. The following theorem shows that it must grow at least super-linearly.

Theorem 1.5.
Let µ ∈ P(R^d) be a self-similar measure for an affinely irreducible IFS Φ. Then one of the following must hold:

(i'') dim µ = min{d, s-dim µ}.
(ii'') lim_{n→∞} (1/n') H(ν^{(n)}, D_{qn}^G | E_{n'}^G) = 0 for all q > 0.
(iii'') There is a non-trivial DΦ-invariant linear subspace V ≤ R^d such that for µ-a.e. x, the conditional measure µ_{V+x} on V + x satisfies dim µ_{V+x} = dim V.

In fact, the second or third alternative must hold irrespective of the validity of (i''). The usefulness of the theorem, however, lies in the fact that if (i'') fails and (ii'') holds then ∆_n → 0 super-exponentially.

Theorem 1.5 and Theorem 1.4 are usually applied by ruling out (iii') or (iii''), and then working out the implications for the dimension. One trivial way to rule it out is to just assume it:

Corollary 1.6. If Φ is a linearly irreducible IFS, then its attractor X satisfies (i') or (ii'), and every self-similar measure µ for Φ satisfies (i'') or (ii'').

As there are no non-trivial linear subspaces of R, every IFS on R acts linearly irreducibly, and we have recovered the main results of [12] (Theorem 1.1 above).

We say that rU + a ∈ G is algebraic if r and all the coordinates of U and a are algebraic numbers over Q, and we say that an IFS Φ ⊆ G is algebraic if all of its elements are. If Φ is an algebraic IFS without exact overlaps, and we take x̃ = 0, then for each n, ∆_n is a polynomial in the algebraic parameters defining the maps of Φ, of degree n and height at most exponential in n. This implies an exponential lower bound ∆_n ≥ c^n; this is a well-known fact, but we include a proof in Section 6.7. Thus we have ruled out (ii') and (ii''), and obtained the following:

Corollary 1.7.
Let Φ be an algebraic IFS acting linearly irreducibly on R^d and without exact overlaps. Then dim µ = min{d, s-dim µ} for every fully supported self-similar measure µ of Φ, and dim X = min{d, s-dim X}.

Our arguments are purely Euclidean and do not utilize any non-elementary properties of the orthogonal or similarity groups. However, the nature of these groups depends crucially on the dimension d. For d ≤ 2 the orthogonal group of R^d is abelian (and the similarity group is solvable). In particular, the set U_n = {U_i}_{i∈Λ^n} of the orthogonal parts of ϕ_i, i ∈ Λ^n, is of polynomial size in n, and does not contribute to the entropy H(ν^{(n)}, D_{qn}^G | E_{n'}^G) (for the same reason, the contraction ratios do not contribute asymptotically to the entropy). For d ≥ 3 the orthogonal group is a virtually simple Lie group with strong expansion properties, and typically |U_n| is exponential in n. Our methods do not make use of any special properties of the orthogonal group, but concurrently and independently with our work, Lindenstrauss and Varjú utilized the work of Bourgain and Gamburd [3] and of de Saxcé [6] on the spectral gap of random walks on the orthogonal group to prove the following result.

Theorem 1.8 (Lindenstrauss-Varjú, [22]). Let U_1, ..., U_k ∈ SO(d) and let p = (p_1, ..., p_k) be a probability vector. Suppose that the operator f ↦ Σ_{i=1}^k p_i f ∘ U_i on L²(SO(d)) has a spectral gap. Then there is a number r̃ < 1 such that for every choice r̃ < r_1, ..., r_k < 1, and for any a_1, ..., a_k ∈ R^d, the self-similar measure with weights p for the IFS {r_i U_i + a_i}_{i=1}^k is absolutely continuous with respect to Lebesgue measure on R^d.
The spectral gap hypothesis can currently be verified when the entries are algebraic and the U_i generate a dense subgroup of O(d), but it is conjectured to hold much more generally.

Compare this theorem to Corollary 1.7: The former ensures absolute continuity (which is a stronger property than full dimension), but only when the contraction ratios of the IFS are uniformly close enough to 1, while the latter ensures that the dimension is d as soon as there is no dimension obstruction (i.e. as soon as s-dim µ ≥ d), but does not give absolute continuity. It is probable that absolute continuity holds under the same assumptions, but this remains open.

There are other cases in which possibility (iii') of Theorem 1.4, or (iii'') of Theorem 1.5, can be ruled out. A trivial case is when the attractor X of Φ satisfies dim X < k, and all DΦ-invariant subspaces have dimension ≥ k. Another case is when Φ consists of homotheties (i.e. the orthogonal parts U_i of the contractions are identities), and for every line ℓ in R^d we have

Σ_{i : ϕ_i(X) ∩ ℓ ≠ ∅} r_i < 1.

Then elementary covering considerations show that dim(X ∩ ℓ) < 1 for every line ℓ ⊆ R^d, and consequently (iii') (and hence (iii'')) fails for every subspace V. Similarly, if Φ consists of homotheties and µ = Σ p_i · ϕ_i µ is a self-similar measure such that for every line ℓ,

(Σ_{i : ϕ_i(X) ∩ ℓ ≠ ∅} p_i log p_i) / (Σ_{i : ϕ_i(X) ∩ ℓ ≠ ∅} p_i log r_i) < 1,

then one can deduce that dim µ_{ℓ+x} < 1 for µ-a.e. x, which by Marstrand's slice theorem rules out (iii''). Another alternative is to show that the linear images onto (d-1)-planes have dimension greater than dim µ - 1, in which case dimension conservation [10] implies that the conditional measures on a.e. line have dimension < 1.

Unfortunately such arguments do not always apply, and we know of no general method to exclude (iii') and (iii''). See Theorem 1.16 and the discussion surrounding it.
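Returning to the entropy quantities behind Theorem 1.5: the normalization (1/n') H(ν̃^{(n)}, D_{n'}) → dim µ can be observed numerically. A sketch for a hypothetical test system, the middle-thirds Cantor IFS with equal weights (strong separation, so dim µ = log 2/log 3, and no entropy is lost to overlaps):

```python
import itertools, math
from collections import Counter

phis = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]   # hypothetical test IFS
n = 10
r = 1 / 3                                # geometric average contraction
n_prime = int(n * math.log(1 / r, 2))    # so that 2^{-n'} ~ r^n
weights = Counter()
for word in itertools.product(range(2), repeat=n):
    x = 0.0                              # x~ = 0 lies in the attractor
    for i in reversed(word):             # apply phi_{i_1} o ... o phi_{i_n}
        x = phis[i](x)
    weights[math.floor(x * 2 ** n_prime)] += 0.5 ** n   # dyadic cell at level n'
H = -sum(p * math.log2(p) for p in weights.values())
print(H / n_prime)    # ~ 0.67 for n = 10; tends to log 2/log 3 ~ 0.63
```

Here separation keeps all 2^n atoms in distinct level-n' dyadic cells, so H equals n bits and the normalized entropy approaches the dimension only through the integer rounding of n'.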
Suppose that I is a set of parameters and that for t ∈ I we are given an IFS Φ_t = {ϕ_{i,t}}_{i∈Λ}, where ϕ_{i,t}(x) = r_i(t)U_i(t)x + a_i(t) for functions r_i, U_i, a_i defined on I. For i, j ∈ Λ^n let

∆_{i,j}(t) = ϕ_{i,t}(0) - ϕ_{j,t}(0).

Then ||∆_{i,j}(t)|| is the third term in the definition (4) of d(ϕ_{i,t}, ϕ_{j,t}), and hence, writing ∆_n(t) for the quantity defined as in (5) for the system Φ_t, we have

min{||∆_{i,j}(t)|| : i, j ∈ Λ^n distinct} ≤ ∆_n(t).

Theorem 1.9.
Let {Φ_t}_{t∈I} be a parametric family of IFSs on R^d. Let E ⊆ I be the set

E = ⋂_{ε>0} ⋃_{N=1}^∞ ⋂_{n>N} ⋃_{i≠j∈Λ^n} ∆_{i,j}^{-1}((-ε^n, ε^n)^d),

and let F ⊆ I be the set of parameters t for which Φ_t is linearly reducible. Then for t ∈ I \ (E ∪ F), every self-similar measure µ for Φ_t satisfies dim µ = min{d, s-dim µ}, and similarly for the attractor of Φ_t.

The main case of interest is when I ⊆ R^m. Then, under rather mild assumptions, the set E of (potential) exceptions can be shown to be quite small. For infinite sequences i, j ∈ Λ^ℕ let

∆_{i,j}(t) = lim_{n→∞} ∆_{i_1...i_n, j_1...j_n}(t).

Theorem 1.10.
Let I ⊆ R^m be connected and compact, and let {Φ_t}_{t∈I} be a parametrized family of IFSs for which the associated functions r_i(·), U_i(·) and a_i(·) are real-analytic on a neighborhood of I. Suppose that

∀ i, j ∈ Λ^ℕ  (i ≠ j ⟹ ∆_{i,j} ≢ 0).

Then the set E of the previous theorem has Hausdorff and packing dimension ≤ m - 1. In particular, if Φ_t is linearly irreducible for all t ∈ I, then outside a set of parameters t of dimension ≤ m - 1 (and in particular for Lebesgue-a.e. parameter), the attractor and self-similar measures of Φ_t have the expected dimension (i.e. equality holds in equations (2) and (3)).

The condition ∆_{i,j} ≢ 0 rules out trivial cases. For instance the theorem cannot be expected to apply when Φ_t = Φ does not depend on t and the system Φ has exact overlaps, in which case there are indeed distinct i, j ∈ Λ^ℕ with ∆_{i,j} ≡ 0.

If I ⊆ R^m and the IFS is in R^d, and m ≥ d, then we expect that for each i, j ∈ Λ^ℕ there typically will be a sub-manifold I_{i,j} ⊆ I of dimension m - d on which ∆_{i,j} = 0. Thus, the dimension bound on E that one expects is m - d rather than the bound m - 1 appearing in the theorem above. However, the hypothesis ∆_{i,j} ≢ 0 in itself is certainly not enough to guarantee this bound. To see this, begin with any 1-parameter family {Φ_u}_{u∈[0,1]} of linearly irreducible IFSs in R^2, and define a two-parameter family by Φ_{(s,t)} = Φ_{(s-t)²}, (s, t) ∈ [0,1]². One might expect, by the logic above, that dim E = m - d = 0. But, evidently, on the 1-dimensional subspace V = {s = t} we have Φ_{(s,t)} = Φ_0, and if the attractor of Φ_0 happens to satisfy (2) with a strict inequality, then dim E ≥ 1 > 0 = m - d.

It is natural to suggest that, assuming linear irreducibility of the IFS, the "correct" bound for E is

dim E ≤ sup{dim ∆_{i,j}^{-1}(0) : i, j ∈ Λ^ℕ, i ≠ j}.   (6)

For m = d = 1, the bound proved in [12] coincides with this one.
The difficulty in higher dimension is that the zero sets of real-analytic functions, and the behavior of the functions near them, are not so well understood (for real-analytic functions on the line things are simple: the zero set consists of isolated points, away from which the function grows polynomially in a well-understood manner). It seems likely that having effective bounds on the constants in Łojasiewicz's inequality [23] might advance the matter, but this seems to be a difficult question in itself. What we prove here is that the bound (6) holds if one makes an assumption analogous to the classical transversality assumption.

Theorem 1.11.
Let I ⊆ R^m be compact and let {Φ_t}_{t∈I} be a parametrized family of IFSs for which the associated functions r_i(·), U_i(·) and a_i(·) are real-analytic on a neighborhood of I. Suppose that there exists an r ∈ N such that for every distinct pair i, j ∈ Λ^ℕ and t ∈ I,

∆_{i,j}(t) = 0 ⟹ rank(D∆_{i,j}(t)) ≥ r.

Then the set E of Theorem 1.9 has Hausdorff and packing dimension ≤ m - r.

As noted above, it is likely that there is room for improvement in these results.
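For a concrete parametrized family one can tabulate min ||∆_{i,j}(t)|| over distinct words directly. A brute-force sketch for the illustrative family Φ_t = {x ↦ tx, x ↦ tx + 1} (related to the Bernoulli convolutions discussed below), in which each ∆_{i,j}(t) is a polynomial in t with coefficients in {-1, 0, 1}; the parameter value 0.7 is an arbitrary choice:

```python
import itertools

def phi_word_at_zero(word, t):
    """phi_{i_1} o ... o phi_{i_n} (0) for the family {x -> tx, x -> tx + 1}:
    equals sum_k a_{i_k} t^{k-1} with digits a in {0, 1}."""
    x = 0.0
    for i in reversed(word):
        x = t * x + i
    return x

def min_delta(t, n):
    """min |Delta_{i,j}(t)| over distinct words i, j of length n."""
    vals = sorted(phi_word_at_zero(w, t)
                  for w in itertools.product((0, 1), repeat=n))
    return min(b - a for a, b in zip(vals, vals[1:]))

for n in range(2, 9):
    print(n, min_delta(0.7, n))   # positive for every n at this parameter
```

Since 0.7 is rational with denominator 10, an integrality argument gives min |∆_{i,j}(0.7)| ≥ 10^{-(n-1)} here, i.e. the decay is at most exponential, which is the situation the exceptional set E excludes.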
We demonstrate the use of Theorems 1.10 and 1.11 for families of self-similar measures in which one varies the translations, contractions, or the IFS. Proofs are given in Section 6.7. Let X_Φ ⊆ R^d denote the attractor of an IFS Φ.

Theorem 1.12.
For a finite set Λ and d ∈ N let IFS_Λ ⊆ G(d)^Λ denote the set of |Λ|-tuples of contracting similarities, which we identify with the set of IFSs indexed by Λ. Then

dim{Φ ∈ IFS_Λ : dim X_Φ < min{d, s-dim_Φ X_Φ}} ≤ dim IFS_Λ - 1.

In particular, dim X_Φ = min{d, s-dim X_Φ} for a.e. IFS Φ ∈ IFS_Λ.

If one fixes the linear parts of the similarity maps and varies the translation part, one obtains a version of results by Simon and Solomyak [29]:
Theorem 1.13.
Let {U_i}_{i∈Λ} be orthogonal maps acting irreducibly on R^d and fix 0 < r_i < 1, i ∈ Λ, satisfying the condition

i ≠ j ⟹ r_i + r_j < 1.

Then there is a subset A ⊆ (R^d)^Λ with dim((R^d)^Λ \ A) ≤ d|Λ| - d, and such that for a ∈ A the attractor of Φ = {r_i U_i + a_i}_{i∈Λ} satisfies dim X_Φ = min{d, s-dim X_Φ}. In particular this is true for a.e. a ∈ (R^d)^Λ.

The condition on the contraction ratios plays a similar role in [29, Theorem 2.1(c)] and the forthcoming book [30], where it is used in conjunction with the transversality method. It is needed to control the rank of D∆_{i,j}, which in our setting is required in order to apply Theorem 1.11. It is not clear to what extent the restriction on the contractions is necessary, but without the irreducibility condition it certainly is, as follows from [29, Proposition 3.3].

Another variant of these results concerns projections of self-similar measures defined by homotheties. This is a variant of Marstrand's theorem and Furstenberg's projection problem [20, 12]:

Theorem 1.14. Let X ⊆ R^d be a self-similar set defined by an IFS consisting of homotheties and satisfying strong separation. Let k < d and let Π_{d,k} denote the set of orthogonal projections from R^d to k-dimensional subspaces. Then

dim{π ∈ Π_{d,k} : dim πX < min{k, dim X}} ≤ dim Π_{d,k} - k.

A particularly interesting family are the Bernoulli convolutions with non-uniform contraction. Namely, for 0 < β, γ < 1 let λ_{β,γ} denote the self-similar measure of maximal dimension for the IFS {x ↦ βx, x ↦ γx + 1}. Let S ⊆ (0,1)² be the set of (β, γ) for which s-dim λ_{β,γ} > 1; it is expected that λ_{β,γ} is absolutely continuous for a.e. (β, γ) ∈ S, but this has been established only in certain restricted parameter ranges, e.g. [27].

Theorem 1.15. dim λ_{β,γ} = min{1, s-dim λ_{β,γ}} outside a set of parameters (β, γ) ∈ (0,1)² of Hausdorff (and packing) dimension 1. In particular this holds for Lebesgue-a.e. pair (β, γ) ∈ (0,1)².
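The set S can be probed numerically. The sketch below uses the standard fact (not stated above) that the largest similarity dimension over weights for the system {x ↦ βx, x ↦ γx + 1} is the s solving β^s + γ^s = 1, attained at p_i = r_i^s:

```python
def max_s_dim(beta, gamma):
    """Largest similarity dimension over weights for {x -> bx, x -> gx + 1}:
    the s solving beta^s + gamma^s = 1, found by bisection (the left side
    is strictly decreasing in s)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if beta ** mid + gamma ** mid > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(max_s_dim(0.5, 0.5))   # = 1: the boundary case
print(max_s_dim(0.6, 0.7))   # > 1, so (0.6, 0.7) lies in S
print(max_s_dim(0.3, 0.4))   # < 1, outside S
```

Membership in S is thus decided by whether the value exceeds 1.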
Finally, our results can be applied to a higher-dimensional analog of the Bernoulli convolutions problem, namely the "fat Sierpinski gasket", first studied by Simon and Solomyak [29]. For λ ∈ (0,1) consider the system of contractions {ϕ_u}_{u=a,b,c}, where a, b, c are the vertices of an equilateral triangle in R^2 and ϕ_u(x) = λx + u. The classical Sierpinski gasket arises from the choice λ = 1/2, and in general when 0 < λ ≤ 1/2 the open set condition is satisfied and the dimension of the attractor S_λ is equal to the similarity dimension. When λ > 1/√3 it is possible for the attractor to have non-empty interior, and this is indeed true for λ ≥ λ*, where λ* ≈ 0.6478 is the real root of x^3 - x^2 + x = 1/2; see Broomhead-Montaldi-Sidorov [5]. For 1/2 < λ < λ*, however, the dimension is known only for certain special algebraic parameters and for Lebesgue-typical λ in a certain sub-range, and similarly for absolute continuity of the appropriate self-similar measures. See Jordan [15] and Jordan-Pollicott [16].

Theorem 1.16. dim S_λ = min{2, s-dim S_λ} for λ ∈ (0,1) outside a set of Hausdorff (and packing) dimension 0.

The last result is an immediate consequence of Theorem 1.10, using the fact that S_λ can also be written as the attractor of a linearly irreducible IFS (the one given above is reducible). The possibility of such a presentation of S_λ comes from its rotational symmetries. Interestingly, our methods do not give comparable results even for very slight variants of S_λ, e.g. the fat Sierpinski gaskets studied in [16].

A key role in our argument is played by results on the growth of entropy of measures under convolution. This subject is developed in the next three sections: Section 2 introduces the statements and basic definitions, Section 3 contains preliminaries on entropy, saturation, concentration and convolutions, and Section 4 proves the main results on convolutions. In Section 5 we extend the results to convolutions of a measure on R^d with a measure on the isometry group.
Finally, in Section 6 we state and prove our main theorems on self-similar sets and measures and their applications.

Some notation: N = {0, 1, 2, ...}. All logarithms are to base 2. P(X) is the space of probability measures on X, endowed with the weak-* topology if appropriate. We follow standard "big O" notation: O_α(f(n)) is an unspecified function bounded in absolute value by C · f(n) for some constant C = C(α) depending on α. Similarly o(1) is a quantity tending to 0 as the relevant parameter → ∞. We implicitly suppress all dependence of constants on the dimension d of R^d; thus O(1) sometimes means O_d(1). We sometimes write -O(·) instead of +O(·) to indicate that the error may be negative, but formally the two notations are equivalent.

The statement "for all s and t > t(s), ..." should be understood as saying "there exists a function t(·) such that for all s and t > t(s), ...". The function t(·) will change between contexts; when we want a persistent name we will designate the function as t_1(·), t_2(·), t*(·), etc.

For the reader's convenience we summarize our main notation in the table below.

d    Dimension of the ambient Euclidean space
B_r(x)    The open Euclidean ball of radius r around x
||x||, ||A||    Euclidean norm of x ∈ R^d, operator norm of A ∈ M_d(R)
dim    Hausdorff dimension of sets and measures
Φ = {ϕ_i}_{i∈Λ} — iterated function system, Section 1.1
X — attractor of Φ; usually we assume 0 ∈ X ⊆ [0,1)^d, Section 1.1
µ — self-similar measure (usually), Section 1.1
ϕ_{i_1...i_n}, p_{i_1...i_n} — ϕ_{i_1} ∘ ϕ_{i_2} ∘ ... ∘ ϕ_{i_n} and p_{i_1} · p_{i_2} · ... · p_{i_n}
ν^(n) — Σ_{i∈Λ^n} p_i · δ_{ϕ_i(0)}, the n-th approximation of µ
D^k_n — n-th level dyadic partition of R^k (k = d by default), Section 1.2
D^G_n — dyadic partition of G ⊆ R^+ × M_d(R) × R^d, Section 1.2
E^G_n — dyadic partition of G by translation part, Section 1.2
P(X) — space of probability measures on X
µ_{x,n}, µ^{x,n} — component measures (raw and rescaled), Section 2.3
S_t — scaling map, S_t(x) = 2^t x
τ_s — translation map, τ_s(x) = x + s
P_{i∈I}, E_{i∈I} — distribution and expectation over components, Section 2.3
H(µ, B) — Shannon entropy, Section 3.1
H(µ, B|C) — conditional entropy, Section 3.1
H_m(µ) — (1/m) H(µ, D_m), Section 3.1
G, G₀ — the groups of similarities and isometries, respectively
π_V — orthogonal projection to V
V^(ε) — ε-neighborhood of V
d(U, V) — distance between linear subspaces U, V ≤ R^d, Section 3.6
⊑ — subset relation restricted to the unit ball, Section 3.6
∠(U, V) — (modified) angle between linear subspaces, Section 3.6
µ ∗ η — convolution of probability measures on R^d
ν.x, ν.µ — action/convolution of ν ∈ P(G) on x ∈ R^d, µ ∈ P(R^d)
m(µ), Σ(µ) — mean and covariance matrix of a measure µ ∈ P(R^d), Section 4.2
λ_i(µ), λ_i(Σ) — eigenvalues of a measure or covariance matrix, Section 4.2
eigen_{1...r}(Σ) — span of the top r eigenvectors of Σ (for a measure, Σ = Σ(µ)), Section 4.2
sat(η, ε, n, m) — set of (V, ε, m)-saturated subspaces at level n, Section 6.2

Acknowledgments. I am grateful to Pablo Shmerkin and Boris Solomyak for their many helpful comments, and to Ariel Rapaport for his contribution to the argument in Section 6.4. Part of this work was done during a visit to Microsoft Research in Redmond, Washington, and I would like to thank Yuval Peres and the members of the theory group for their hospitality.
A subject of independent interest and central to our work is an analysis of the growth of the entropy of measures under convolution, either with other measures or with measures on the group of isometries (or similarities). This topic will occupy us for a large part of the paper.

We begin with a discussion of convolutions on Euclidean space, leaving generalizations to later. It is convenient to introduce the normalized scale-n entropy

H_n(µ) = (1/n) H(µ, D_n).

This normalization makes H_n(µ) a finite-scale surrogate for the dimension of µ. In particular, for µ ∈ P([0,1)^d) we have

0 ≤ H_n(µ) ≤ d,

with equality on the right holding for all n if and only if µ is Lebesgue measure on [0,1)^d, and in general for measures µ of bounded support,

0 ≤ H_n(µ) ≤ d + O(1/n),

where the constant depends logarithmically on the diameter of the support.

Our aim is to obtain structural information about measures µ, ν for which the entropy growth of µ ∗ ν is small in the sense that

H_n(µ ∗ ν) ≤ H_n(µ) + δ,   (7)

where δ > 0 is small but fixed, and n is large. This problem is a relative of classical ones in additive combinatorics concerning the structure of sets A, B whose sumset A + B = {a + b : a ∈ A, b ∈ B} is appropriately small. The general principle is that when the sum is small, the sets should have some algebraic structure. Results to this effect are known as inverse theorems. For example, the Freiman-Ruzsa theorem asserts that if |A + B| ≤ C|A| then A, B are close, in a manner depending on C, to generalized arithmetic progressions (the converse is immediate); see e.g. [32]. A generalized arithmetic progression is an injective affine image of a box in a higher-dimensional lattice. The entropy analog of the condition |A + A| ≤ C|A| is

H_n(µ ∗ µ) ≤ H_n(µ) + O(1/n).   (8)

An entropy version of Freiman's theorem was recently proved by Tao [31], who showed that if µ satisfies (8) then it is close, in an appropriate sense, to a uniform measure on a (generalized) arithmetic progression.

The condition (7), however, is significantly weaker than (8) even when ν = µ, and it is harder to draw conclusions from it about the global structure of µ. Consider the following example. Start with an arithmetic progression of length n₁ and gap ε₁, and put the uniform measure on it. Now split each atom x into an arithmetic progression of length n₂ and gap ε₂ < ε₁/n₂, starting at x (so the entire new progression fits in the space between x and the next atom). Repeat this procedure N times with parameters n_i, ε_i, and call the resulting measure µ. Let k be such that ε_N is of order 2^{−k}. It is not hard to verify that we can have H_k(µ) = 1/2 but |H_k(µ) − H_k(µ ∗ µ)| arbitrarily small. This example is actually the uniform measure on a (generalized) arithmetic progression, as predicted by Freiman-type theorems, but as we allow the rank N to grow, the entropy growth can be made arbitrarily small. Furthermore, if one conditions µ on an exponentially small subset of its support one gets another example with similar properties that is quite far from a generalized arithmetic progression.

Our main contribution to this matter is Theorem 2.8 below, which shows that constructions like the one above are, in a certain statistical sense, the only way that (7) can occur. We note that there is a substantial existing literature on the growth condition |A + B| ≤ |A|^{1+δ}, which is the sumset analog of (7). Such a condition appears in the sum-product theorems of Bourgain-Katz-Tao [4] and in the work of Katz-Tao [19], and in the Euclidean setting more explicitly in Bourgain's work on the Erdős-Volkmann conjecture [1] and Marstrand-like projection theorems [2].
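For finitely supported measures the normalized scale-n entropy is directly computable, which makes the normalization above concrete. A minimal sketch (my own helper names, not the paper's code): the uniform measure on a fine grid in [0,1)² has H_n ≈ 2 = d, while a measure supported on a horizontal line has H_n ≈ 1.

```python
import math
from collections import defaultdict

def shannon(probs):
    # Shannon entropy (base 2) of a finite probability vector
    return -sum(p * math.log2(p) for p in probs if p > 0)

def Hn(points_weights, n):
    # normalized scale-n entropy (1/n) * H(mu, D_n), where D_n is the
    # partition of R^d into dyadic cells of side 2^-n
    cells = defaultdict(float)
    for x, w in points_weights:
        cells[tuple(math.floor(xi * 2 ** n) for xi in x)] += w
    return shannon(cells.values()) / n

# uniform measure on a 2^-6 grid in [0,1)^2: a discrete proxy for Lebesgue
square = [((i / 64, j / 64), 1 / 64 ** 2) for i in range(64) for j in range(64)]
# uniform measure on a grid along a horizontal line (a translate of the x-axis)
line = [((i / 64, 0.25), 1 / 64) for i in range(64)]

print(Hn(square, 6))  # 2.0, the full dimension d = 2
print(Hn(line, 6))    # 1.0, the dimension of the line
```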
However, we have not found a result in the literature that meets our needs and, in any event, we believe that the formulation given here will find further applications.

We begin by discussing global properties of measures that lead to the inequality (7), and formulate discrete analogs of them. For a linear subspace V ≤ R^d we say that a measure µ is absolutely continuous on a translate V′ of V if it is absolutely continuous with respect to the (dim V)-dimensional volume (Hausdorff measure) λ_{V′} on V′. Suppose that µ ∈ P(R^d) is compactly supported on a translate V₀ of V, and is absolutely continuous there. Then the Lebesgue differentiation theorem implies that µ(B_r(x)) = c_x · r^{dim V + o(1)} as r → 0, and it follows that

H_n(µ) = dim V − o(1) as n → ∞.   (9)

If ν ∈ P(R^d) is compactly supported on another translate V₁ of V, then ν ∗ µ is supported on V₂ = V₀ + V₁, which is a translate of V, and is absolutely continuous there. Thus it also satisfies (9), and consequently H_n(µ ∗ ν) = H_n(µ) + o(1): at small scales there is negligible entropy growth, and (7) is satisfied.

More generally, let W = V^⊥ be the orthogonal complement of V and write π_W for the orthogonal projection to W. Suppose µ ∈ P(R^d) is compactly supported and its conditional measures on the translates of V are absolutely continuous; that is, µ = ∫ µ_w dθ(w), where θ = π_W µ and µ_w is θ-a.s. supported and absolutely continuous on π_W^{−1}(w) = V + w. Then instead of (9), one can show that

H_n(µ) = H_n(π_W µ) + dim V − o(1) as n → ∞,   (10)

and, if ν is compactly supported on a translate of V, then µ ∗ ν again has absolutely continuous conditional measures on translates of V, and it projects to a translate of θ, so it satisfies the same relation (10). Again, we have H_n(ν ∗ µ) = H_n(µ) + o(1), and (7) is satisfied.

This discussion motivates the following finite-scale analogs.
For A ⊆ R^d and ε > 0, denote the ε-neighborhood of A by

A^(ε) = {x ∈ R^d : d(x, A) < ε}.

Definition 2.1.
Let V ≤ R^d be a linear subspace and ε > 0. A measure µ ∈ P(R^d) is (V, ε)-concentrated if there is a translate W of V such that µ(W^(ε)) ≥ 1 − ε.

Note that (V, ε)-concentration does not imply that the measure is supported near V itself, only near a translate of it. Next, discretizing (9), we have:

Definition 2.2.
Let V ≤ R^d be a linear subspace, ε > 0 and m ∈ N. A measure µ ∈ P(R^d) is (V, ε)-uniform at scale m, or (V, ε, m)-uniform, if it is (V, 2^{−m})-concentrated and H_m(µ) > dim V − ε.

Finally, discretizing (10), we have:

Definition 2.3.
Let V ≤ R^d be a linear subspace, W = V^⊥ its orthogonal complement, and ε > 0. A probability measure µ ∈ P(R^d) is (V, ε)-saturated at scale m, or (V, ε, m)-saturated, if

H_m(µ) ≥ H_m(π_W µ) + dim V − ε.

There are obvious relations between the notions above: being nearly uniform implies saturation, and saturation implies being essentially a convex combination of nearly uniform measures. Furthermore, as one would expect from the discussion above, if we convolve a measure which is highly concentrated on a subspace with another measure which is uniform or saturated on that subspace at some scale, there will be little entropy growth at that scale. For precise statements see Sections 3.3 and 3.7.
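For intuition, here is a crude numerical test of (V, ε)-concentration in the plane, with V the x-axis. Testing only the horizontal line through the mean of the y-coordinate is a heuristic of mine (the definition allows any translate), so it can only under-report concentration; illustration only:

```python
def concentrated_x_axis(points_weights, eps):
    # (V, eps)-concentration for V = the x-axis in R^2: all but eps of the
    # mass should lie within eps of some horizontal line y = c.  As a
    # heuristic, we only try the line through the weighted mean y-coordinate.
    c = sum(w * y for (x, y), w in points_weights)
    near = sum(w for (x, y), w in points_weights if abs(y - c) < eps)
    return near >= 1 - eps

line = [((i / 8, 0.3), 1 / 8) for i in range(8)]                # lives on y = 0.3
grid = [((i / 8, j / 8), 1 / 64) for i in range(8) for j in range(8)]

print(concentrated_x_axis(line, 0.1))   # True: all mass on one translate
print(concentrated_x_axis(grid, 0.05))  # False: mass spread out in y
```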
2.3 Component measures

Let D_n(x) ∈ D_n denote the unique level-n dyadic cell containing the point x ∈ R^d. For D ∈ D_n let T_D : R^d → R^d be the unique homothety mapping D to [0,1)^d. Recall that if µ ∈ P(R^d) then T_D µ denotes the push-forward of µ through T_D.

Definition 2.4.
For µ ∈ P(R^d) and a dyadic cell D with µ(D) > 0, the raw D-component of µ is

µ_D = (1/µ(D)) µ|_D,

and the rescaled D-component is

µ^D = (1/µ(D)) T_D(µ|_D).

For x ∈ R^d with µ(D_n(x)) > 0 we write

µ_{x,n} = µ_{D_n(x)} and µ^{x,n} = µ^{D_n(x)}.

These measures, as x ranges over all possible values for which µ(D_n(x)) > 0, are called the level-n components of µ.

Our results on the multi-scale structure of µ ∈ P(R^d) are stated in terms of the behavior of random components of µ, defined as follows.

Definition 2.5.
Let µ ∈ P(R^d).

1. A random level-n component, raw or rescaled, is the random measure µ_D or µ^D, respectively, obtained by choosing D ∈ D_n with probability µ(D); equivalently, this is the random measure µ_{x,n} or µ^{x,n}, respectively, with x chosen according to µ.

2. For a finite set I ⊆ N, a random level-I component, raw or rescaled, is chosen by first choosing n ∈ I uniformly, and then (conditionally independently on the choice of n) choosing a raw or rescaled level-n component.

Notation. When the symbols µ_{x,i} and µ^{x,i} appear inside an expression P(...) or E(...), they will always denote random variables drawn according to the component distributions defined above. The range of i will be specified as needed. When dealing with components of several measures µ, ν, we assume all choices of components are independent unless otherwise stated.

The definition is best understood with some examples. For A, B ⊆ P([0,1)^d), and writing 1_A for the indicator function of A, we have

P_{i=n}(µ^{x,i} ∈ A) = ∫ 1_A(µ^{x,n}) dµ(x),

P_{0≤i≤n}(µ^{x,i} ∈ A) = (1/(n+1)) Σ_{i=0}^{n} ∫ 1_A(µ^{x,i}) dµ(x),

P_{i=n}(µ^{x,i} ∈ A, ν^{y,i} ∈ B) = ∫∫ 1_A(µ^{x,n}) · 1_B(ν^{y,n}) dµ(x) dν(y).

This notation implicitly defines x, i as random variables. Thus if A₀, A₁, ... ⊆ P([0,1)^d) and D ⊆ [0,1)^d we could write

P_{0≤i≤n}(µ^{x,i} ∈ A_i and x ∈ D) = (1/(n+1)) Σ_{i=0}^{n} µ(x : µ^{x,i} ∈ A_i and x ∈ D).

(Definition 2.5 is motivated by Furstenberg's notion of a CP-distribution [9, 10, 13], which arises as a limit as N → ∞ of the distribution of components of levels 1, ..., N. These limits have a useful dynamical interpretation, but in our finitary setting we do not require this technology.)

Similarly, for a function f : P([0,1)^d) → R and I ⊆ N,

E_{i∈I}(f(µ^{x,i})) = (1/|I|) Σ_{i∈I} ∫ f(µ^{x,i}) dµ(x).

We use similar expectation notation to average a sequence a_n, ..., a_{n+k} ∈ R:

E_{n≤i≤n+k}(a_i) = (1/(k+1)) Σ_{i=n}^{n+k} a_i.

We note in particular one trivial identity that will be used repeatedly later on:

µ = E_{i=n}(µ_{x,i}).   (11)

Component distributions have the convenient property that they are almost invariant under repeated sampling, i.e. choosing components of components. More precisely, for µ ∈ P(R^d) and m, n ∈ N, let P^µ_n denote the distribution of components µ_{x,i}, 0 ≤ i ≤ n, as defined above; and let Q^µ_{n,m} denote the distribution on components obtained by first choosing a random component µ_{x,i}, 0 ≤ i ≤ n, as above, and then, conditionally on θ = µ_{x,i}, choosing a component θ_{y,j}, i ≤ j ≤ i + m, with the usual distribution (note that θ_{y,j} = µ_{y,j} is indeed a component of µ).
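The component distributions and the identity (11) are concrete for finitely supported measures: splitting µ by level-n dyadic cell and re-mixing with weights µ(D) recovers µ. A small sketch (illustrative; helper names are my own):

```python
from collections import defaultdict

def raw_components(points_weights, n):
    # group the atoms of mu by level-n dyadic cell D and normalize:
    # returns a list of pairs (mu(D), mu_D) with mu_D = mu|_D / mu(D)
    cells = defaultdict(list)
    for x, w in points_weights:
        cells[tuple(int(xi * 2 ** n) for xi in x)].append((x, w))
    comps = []
    for atoms in cells.values():
        mass = sum(w for _, w in atoms)
        comps.append((mass, [(x, w / mass) for x, w in atoms]))
    return comps

def mixture(comps):
    # reassemble sum_D mu(D) * mu_D, i.e. the right-hand side of (11)
    out = defaultdict(float)
    for mass, comp in comps:
        for x, w in comp:
            out[x] += mass * w
    return dict(out)

mu = [((0.1, 0.2), 0.5), ((0.6, 0.7), 0.25), ((0.61, 0.71), 0.25)]
print(mixture(raw_components(mu, 1)))  # recovers mu exactly
```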
Given µ ∈ P(R^d) and m, n ∈ N, the total variation distance between P^µ_n and Q^µ_{n,m} satisfies

‖P^µ_n − Q^µ_{n,m}‖ = O(m/n).

In particular, if A, B ⊆ P([0,1)^d) and ε, δ > 0 are such that

P_{0≤i≤n}(µ_{x,i} ∈ A) > 1 − ε and P_{i≤j≤i+m}(θ_{y,j} ∈ B) > 1 − δ for every θ ∈ A,   (12)

then

P_{0≤i≤n}(µ_{x,i} ∈ B) > 1 − ε − δ − O(m/n).

Proof.
Observe that both P^µ_n and Q^µ_{n,m} produce a component µ_{z,k} by choosing z according to µ, and independently choosing a level k ∈ N. The difference is that P^µ_n chooses k uniformly in the range 0, ..., n, whereas for Q^µ_{n,m}, an elementary calculation shows that with probability 1 − O(m/n) it chooses k uniformly in the range m, m+1, ..., n, and with probability O(m/n) it chooses k ∈ {0, 1, ..., m−1} ∪ {n+1, ..., n+m} (one can easily determine the distribution in this case, but it is not relevant here). This gives the first statement.

For the second statement, what we want to show is that P^µ_n(B) > 1 − ε − δ − O(m/n). This will follow from the first statement if we show that Q^µ_{n,m}(B) > 1 − ε − δ. Let θ = µ_{x,i} and θ_{y,j} be as in the previous paragraph, so θ_{y,j} is distributed according to Q^µ_{n,m}. By the law of total probability and our hypotheses,

Q^µ_{n,m}(B) = P(θ_{y,j} ∈ B) ≥ P(θ_{y,j} ∈ B | µ_{x,i} ∈ A) · P(µ_{x,i} ∈ A) > (1 − δ)(1 − ε),

and the claim follows.

Similar statements hold for raw components and for components of measures on the similarity group. We omit the proofs, which are the same.

2.4 An inverse theorem for convolutions in R^d

Our main result on entropy growth is that the global obstructions described at the beginning of Section 2.2 are the only local obstructions.
Theorem 2.8.
For every R, ε > 0 and m ∈ N there is a δ = δ(ε, R, m) > 0 such that for every n > n(ε, R, δ, m), the following holds: if µ, ν ∈ P([−R, R]^d) and

H_n(µ ∗ ν) < H_n(µ) + δ,

then there exists a sequence V₁, ..., V_n ≤ R^d of subspaces such that

P_{1≤i≤n}(µ^{x,i} is (V_i, ε, m)-saturated and ν^{y,i} is (V_i, ε)-concentrated) > 1 − ε.   (13)

The proof of the theorem is given in Section 4.6.

Remark 2.9.
1. The dependence of δ on ε, m is effective, but the bounds we obtain are certainly far from optimal, and we do not pursue this topic. Also note that the theorem is not a characterization (this is already the case in dimension 1; see the discussion after [12, Theorem 2.7]).

2. We have assumed that µ, ν ∈ P([−R, R]^d), but the theorem can be extended to measures with unbounded support having finite entropy by an approximation argument; see also [12, Section 5.5].

3. An application of Markov's inequality shows that (up to replacing ε by √ε) equation (13) is equivalent to

P_{1≤i≤n}(µ^{x,i} is (V_i, ε, m)-saturated) > 1 − ε   (14)

P_{1≤i≤n}(ν^{y,i} is (V_i, ε)-concentrated) > 1 − ε.   (15)

4. There is no assumption in the theorem on the entropy of ν, but if H_n(ν) is sufficiently close to 0 the conclusion will automatically hold with V_i = {0} (indeed, a small value of H_n(ν) implies that with high probability ν^{y,i} will be highly concentrated near a point, so (15) holds, and (14) is automatic, since every measure is ({0}, ε, m)-saturated).

5. The version of Theorem 2.8 given in [12] for the case d = 1 had a somewhat different, but equivalent, appearance. The statement there was that for small enough δ > 0, if H_n(µ ∗ ν) ≤ H_n(µ) + δ, then there exist disjoint sets I, J ⊆ {1, ..., n} with |I ∪ J| > (1 − ε)n such that (14) holds for V_i = R when the expectation is conditioned on i ∈ I, and (15) holds for V_i = {0} when the expectation is conditioned on i ∈ J. Indeed, if such I, J ⊆ {1, ..., n} are given, observe that by setting V_i = R for i ∈ I and V_i = {0} for i ∈ J, and defining V_i arbitrarily for the at most εn remaining i, equations (14) and (15) will hold for slightly larger ε also without conditioning on I, J, because every measure is (R, ε)-concentrated and ({0}, ε, m)-saturated. Thus the version in [12] implies the d = 1 case of Theorem 2.8.
Conversely, given subspaces V_i as in Theorem 2.8, we recover the version from [12] by setting I = {i : V_i = R} and J = {j : V_j = {0}} and adjusting ε.

Specializing to self-convolutions and using some of the basic relations between saturation, concentration and uniformity, one deduces a multi-scale Freiman-type result:

Theorem 2.10.
For every ε > 0 and m ∈ N, there is a δ = δ(ε, m) > 0 such that for every n > n(ε, δ, m) and every µ ∈ P([0,1)^d), if

H_n(µ ∗ µ) < H_n(µ) + δ,

then there exists a sequence V₁, ..., V_n ≤ R^d such that

P_{1≤i≤n}(µ^{x,i} is (V_i, ε, m)-uniform) > 1 − ε.

2.5 An inverse theorem for isometries acting on R^d

Recall that G = G(d) denotes the group of similarities of R^d. For g = rU + a we write r_g = r, U_g = U and a_g = a. The dyadic partitions D^G_n and E^G_n of G were defined in Section 1.2 using the identification of G with a subset of R^+ × M_d(R) × R^d. For ν ∈ P(G) and for g ∈ G, n ∈ N, we define the raw component ν_{g,n} in terms of the partition D^G_n:

ν_{g,n} = c · ν|_{D^G_n(g)},

where c is a normalizing constant. We adopt the same notation and conventions for these components as laid out in Section 2.3. It is not natural in this context to define "rescaled" components; when we need to rescale we shall do so explicitly, using the maps S_t ∈ G, S_t x = 2^t x.

For ν ∈ P(G) and µ ∈ P(R^d) we write ν.µ for the push-forward of ν × µ via (ϕ, x) ↦ ϕ(x), and similarly for x ∈ R^d we write ν.x for the push-forward of ν via g ↦ gx. Our aim is to understand when the entropy of ν.µ is large relative to the entropy of µ, for ν ∈ P(G) and µ ∈ P(R^d). While our methods are able to treat this setting, the results are more transparent if we assume that ν is supported on the isometry group G₀ < G, and we shall mostly restrict our attention to this case.

The statement we would like to make is that, if ν ∈ P(G₀) and µ ∈ P(R^d), and if ν is of large entropy, then ν.µ will have substantially more entropy than µ at small enough scales, unless certain specific obstructions occur. In the present setting the obvious global obstruction is that µ may be close to uniform on an orbit of a subgroup H < G₀, and ν supported on H or a left coset of H.
However, locally, this situation is not very different from the one we have already seen, and it is more natural to study the concentration of µ on affine subspaces, as in the Euclidean case. This is because the orbit of a point x ∈ R^d under a closed subgroup H of the isometry group is a finite union of smooth manifolds, and at small scales these look like affine subspaces of R^d (essentially, the tangent hyperplanes of the manifolds). Thus we continue to state our results in terms of the concentration on subspaces of (the components of) µ and (the components of) the image of ν under the action.

Even so, there are several complications related to the phenomenon above. The first is demonstrated by the following example. Let d = 2, let µ be the uniform measure on the circle {x ∈ R² : ‖x‖ = 1}, and let ν be the uniform measure on the group of rotations about the origin. Then ν.µ = µ, so there is no entropy growth. In this case, as predicted in the previous paragraph, the components µ^{x,n} become saturated on lines when n is large, but the line varies according to the point x (the distribution of directions for x ∼ µ is of course uniform). In contrast, recall from Theorem 2.8 that, for convolutions of measures on R^d, at each scale there was a single subspace on which, with high probability, all components of µ at a given level became saturated, irrespective of their spatial positions.

Another complication is the possibility that at small scales µ indeed becomes saturated, and ν concentrated, on subspaces, but that these subspaces are trivial. In the Euclidean setting such an occurrence was possible only if ν had nearly vanishing entropy, since if H_n(ν) is substantial then the components of ν cannot with high probability be highly concentrated on points. In the current setting, however, this cannot be ruled out. To see this, let µ = δ₀ and let ν be normalized Haar measure on the orthogonal group O(d), the stabilizer of 0 in the isometry group. Then ν.µ = µ, so there is no entropy growth, and ν has large entropy at all scales, but the components of µ are not saturated on any non-trivial subspace. Thus the theorem above applies, but with V_i = {0}. This type of situation can be avoided, however, if no part of the measure µ is close to a proper affine subspace. To make this quantitative we introduce the following definition:

Definition 2.11. µ ∈ P(R^d) is (ε, σ)-non-affine if µ(V^(σ)) < ε for every proper affine subspace V ≤ R^d.

We can now state the inverse theorem. Informally, it says that if ν.µ does not have substantially more entropy than µ, then, to most components of µ and ν at a moderately small scale, we can associate a subspace (depending on the components in question) such that the sub-components of the components typically become concentrated or saturated on this subspace. Furthermore, these subspaces will frequently be non-trivial if µ is not too close to being supported on a proper affine subspace of R^d. Here is the precise formulation:

Theorem 2.12.
For every ε > 0, R > 0 and m ∈ N, there exists δ = δ(ε, R, m) > 0 such that for every k > k(ε, R, m) and every n > n(ε, R, m, k), the following holds. For every ν ∈ P(G₀) (G₀ the isometry group) and µ ∈ P([−R, R]^d) that are supported on balls of radius R, either

H_n(ν.µ) > H_n(µ) + δ,

or else, to every pair of level-k components ν̃ of ν and µ̃ of µ we can assign a sequence of subspaces V_i = V_i(ν̃, µ̃) ≤ R^d, 1 ≤ i ≤ n, such that with probability at least 1 − ε over the choice of µ̃, ν̃,

P_{1≤i≤n}(µ̃^{x,i} is (V_i, ε, m)-saturated and S_i U_g^{−1}(ν̃_{g,i}.x) is (V_i, ε)-concentrated) > 1 − ε.

If in addition µ is ((ε/2d)^{d+1}, σ)-non-affine for some σ > 0, and the relation among the parameters takes σ into account, then for those ν̃, µ̃ in the set of good components above,

(1/(n+1)) Σ_{i=0}^{n} dim V_i > (1/(d+1)) H_n(ν̃) − ε,

and

E_{i=k}((1/(n+1)) Σ_{j=0}^{n} dim V_j(ν_{g,i}, µ^{x,i})) > (1/(d+1)) H_n(ν) − ε.   (16)

Remark 2.13.
1. Given ε, the assumption that µ is ((ε/2d)^{d+1}, σ)-non-affine is global, and imposes no restriction on the structure of µ at scales smaller than σ. Indeed, if µ ∈ P(R^d) does not give mass to any proper affine subspace, then for any τ > 0 it is (τ, σ)-non-affine for some σ > 0. Thus, if we fix µ in advance, then for every ε, m the conclusion of the theorem holds automatically for suitable parameters δ, k, n, and all measures ν on the isometry group.

2. The average in (16) is over all pairs of components ν_{g,k}, µ^{x,k}, not only those for which the first part of the conclusion holds. But the total mass of the exceptional components is at most ε, and dim V_i ≤ d, so the exceptional components contribute O(ε) to the average, which is of the same order as the error term. Thus we get an equivalent statement if in (16) we average only over the "good" components from the first part of the theorem.

The proof of the theorem is based on a linearization argument which allows us to apply Theorem 2.8 from the Euclidean setting. See Section 5.5.

2.6 Generalizations

It is possible to apply our methods also to convolutions in Lie groups, actions of Lie groups on manifolds, and more general settings. Let I ⊆ R^{d₁} and J ⊆ R^{d₂} be closed balls and f : I × J → R^d a C¹ map. For z = (x, y) ∈ I × J we can write the differential Df(z) : R^{d₁+d₂} → R^d in matrix form as

Df(z) = [A_z, B_z] : R^{d₁+d₂} → R^d,

where A_z ∈ M_{d×d₁} and B_z ∈ M_{d×d₂}.

Theorem 2.14.
Let f : I × J → R^d be as above. For every ε > 0 and m ∈ N there exists δ = δ(f, ε, m) > 0 such that for every k > k(f, ε, m) and every n > n(f, ε, m, k), the following holds. Let µ ∈ P(I) and ν ∈ P(J). Then either

H_n(f(µ × ν)) > ∫ H_n(f(µ × δ_y)) dν(y) + δ,   (17)

or else, for independently chosen level-k components µ̃, ν̃ of µ, ν, respectively, with probability at least 1 − ε there are subspaces V₁, ..., V_n ≤ R^d such that

P_{1≤i≤n}(A_{x,y} µ̃^{x,i} is (V_i, ε, m)-saturated and B_{x,y} ν̃^{y,i} is (V_i, ε)-concentrated) > 1 − ε

and

(1/(n+1)) Σ_{i=0}^{n} dim V_i > c ∫ H_n(f(δ_x × ν)) dµ̃(x).

Note that since I × J is compact, the norms of A_{x,y} and B_{x,y} are bounded over (x, y) ∈ I × J, and since ε may be small and m large relative to these norms, we have not bothered to rescale the measures A_{x,y} µ̃^{x,i}, B_{x,y} ν̃^{y,i} to compensate for their contraction/expansion (the distortion caused by these matrices is also one reason for the dependence of the parameters on f, the other being the speed of linear approximation). The proof is given in Section 5.6.

We note two important special cases.

Corollary 2.15.
Let G < GL_d(R) ⊆ R^{d²} be a matrix group acting by left multiplication on R^d. Let ν ∈ P(G) and µ ∈ P(R^d) be measures of bounded support. Then for every ε > 0 and m ∈ N there is a δ = δ(ν, µ, ε, m) > 0, such that for k > k(ν, µ, ε, m, δ) and n > n(ν, µ, ε, m, δ, k), either

H_n(ν.µ) > H_n(µ) + δ,

or else, for independently chosen level-k components µ̃, ν̃ of µ, ν, respectively, with probability at least 1 − ε there are subspaces V₁, ..., V_n ≤ R^d such that

P_{1≤i≤n}(y.µ̃^{x,i} is (V_i, ε, m)-saturated and ν̃_{y,i}.x is (V_i, ε)-concentrated) > 1 − ε

(the dependence of δ, k, n on the measures is only through their supports, and is uniform on compact sets). If in addition µ is ((ε/2d)^{d+1}, σ)-non-affine for some σ > 0, then for δ, k, n which also depend on σ, we also have

(1/(n+1)) Σ_{i=0}^{n} dim V_i > c · H_n(ν̃) − ε

for a constant c depending only on d, σ and the support of ν.

Corollary 2.16.
Let G < GL_d(R) ⊆ R^{d²} be a matrix group and µ, ν ∈ P(G) measures of bounded support. Then for every ε > 0 and m ∈ N there is a δ > 0 such that for every large enough k and all suitably large n, either

H_n(µ ∗ ν) > H_n(µ) + δ,

or else, for an independently chosen pair of raw level-k components µ̃, ν̃ of µ, ν, respectively, with probability > 1 − ε, there are subspaces V₁, ..., V_n < R^{d²} such that

P_{1≤i≤n}(y ∗ µ̃_{x,i} is (V_i, ε, m)-saturated and ν̃_{y,i} ∗ x is (V_i, ε)-concentrated) > 1 − ε

and

(1/(n+1)) Σ_{i=0}^{n} dim V_i > c · H_n(ν) − ε

for a suitable constant c.

Both corollaries follow from the previous theorem by taking f(x, y) = yx to be the appropriate action map; for the first corollary an additional argument is needed to produce the constant c. The dependence of the parameters on the measures is only through their supports: if we fix a large ball in advance and assume the measures are supported on it, then the parameters depend only on the ball, not the measures.

Remark 2.17.
1. It is important to note the order of quantifiers in the theorem and corollaries: in the theorem all parameters depend on the function f, and in the corollaries the function f is the action map restricted to the (compact) product of the supports of ν and µ, which are fixed before the other parameters. The reason this works is that once the function and the measures are fixed, and compactly supported, the speed with which f approaches its linearization is uniform; hence, at small enough scales, we are essentially dealing with linear convolutions rather than a non-linear image.

2. In some applications the order of quantifiers above is not sufficient, and it is necessary to obtain statements that are uniform over many functions or independent of the support of the measures. Then a more quantitative analysis is needed. Such an example can be found in [14].

3. One can formulate the corollaries in abstract Lie groups using partitions introduced from local coordinates, or using general theorems on the existence of similar partitions in doubling metric spaces; see e.g. [17].

4. When dealing with more general group actions one would also like to relax the condition that the measures be compactly supported. But in doing so one must take into account how various properties of the action affect the dependence between parameters in the theorem. For example, the parameters are sensitive to the speed at which the action approaches its linearization (which may differ from point to point), to how well an element of the group is determined by its action on k-tuples, and to how sensitive the latter procedure is to changes in the k-tuple. It turns out that the cleanest approach is to choose a left-invariant Riemannian metric on the group and a dyadic partition adapted to it. For a detailed development of this approach in one example we refer the reader to [14].
3 Entropy, concentration, uniformity and saturation

This section presents without proof some standard results about entropy, followed by a more detailed analysis of concentration, saturation and uniformity.
The Shannon entropy of a probability measure µ with respect to a countable partition E is given by

H(µ, E) = − Σ_{E∈E} µ(E) log µ(E),

where the logarithm is in base 2 and 0 log 0 = 0. The conditional entropy with respect to a countable partition F is

H(µ, E | F) = Σ_{F∈F} µ(F) · H(µ_F, E),

where µ_F = (1/µ(F)) µ|_F is the conditional measure on F. For a discrete probability measure µ we write H(µ) for the entropy with respect to the partition into points, and for a probability vector α = (α₁, ..., α_k) we write

H(α) = − Σ α_i log α_i,

and for 0 < ε < 1 we abbreviate H(ε) = H((ε, 1 − ε)). Note that if 0 < ε < 1/2 then H(ε) = O(ε log(1/ε)).

We collect here some standard properties of entropy.

Lemma 3.1.
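These definitions, and the conditional entropy formula in particular, are easy to verify numerically for finitely supported measures. A minimal sketch (partitions encoded as labeling functions; the convention is my own) checking the standard identity H(µ, E ∨ F) = H(µ, F) + H(µ, E | F):

```python
import math

def H(mu, part):
    # H(mu, E): mu is a dict point -> mass, part maps a point to its cell
    cells = {}
    for x, w in mu.items():
        cells[part(x)] = cells.get(part(x), 0.0) + w
    return -sum(p * math.log2(p) for p in cells.values() if p > 0)

def H_cond(mu, E, F):
    # H(mu, E | F) = sum_F mu(F) * H(mu_F, E), mu_F the conditional measure
    by_F = {}
    for x, w in mu.items():
        by_F.setdefault(F(x), {})[x] = w
    total = 0.0
    for atoms in by_F.values():
        mass = sum(atoms.values())
        total += mass * H({x: w / mass for x, w in atoms.items()}, E)
    return total

mu = {0: 0.25, 1: 0.25, 2: 0.5}
E = lambda x: x           # partition into points
F = lambda x: x // 2      # coarser partition: {0, 1}, {2}
join = lambda x: (E(x), F(x))
print(H(mu, join), H(mu, F) + H_cond(mu, E, F))  # 1.5 1.5
```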
Let µ, ν be probability measures on a common space, E, F partitions of the underlying space, and α ∈ [0, 1].

1. H(µ, E) ≥ 0, with equality if and only if µ is supported on a single atom of E.

2. If µ is supported on k atoms of E then H(µ, E) ≤ log k, with equality if and only if each of these atoms has mass 1/k.

3. If F refines E (i.e. ∀F ∈ F ∃E ∈ E s.t. F ⊆ E) then H(µ, F) ≥ H(µ, E).

4. If E ∨ F = {E ∩ F : E ∈ E, F ∈ F} denotes the join of E and F, then

H(µ, E ∨ F) = H(µ, F) + H(µ, E | F);

in particular, H(µ, E ∨ F) ≤ H(µ, E) + H(µ, F).

5. H(·, E) and H(·, E | F) are concave.

6. H(·, E) obeys the "convexity" bound

H(Σ α_i µ_i, E) ≤ Σ α_i H(µ_i, E) + H(α),

and similarly after conditioning on F.

In particular, we note that for µ ∈ P([0,1)^d) we have the bounds H(µ, D_m) ≤ md and H(µ, D_{n+m} | D_n) ≤ md. Although the function (µ, m) ↦ H(µ, D_m) is not continuous in the weak-* topology on measures, the following estimates provide usable substitutes.

Lemma 3.2.
Let µ, ν ∈ P(R^d), let E, F be partitions of R^d, and m, m′ ∈ N.

1. Given a compact K ⊆ R^d and µ ∈ P(K), there is a neighborhood U ⊆ P(K) of µ such that |H(ν, D_m) − H(µ, D_m)| = O_d(1) for ν ∈ U.

2. If each E ∈ E intersects at most k elements of F and vice versa, then |H(µ, E) − H(µ, F)| = O(log k).

3. If f, g : R^d → R^k and ‖f(x) − g(x)‖ ≤ C·2^{−m} for x ∈ R^d, then |H(fµ, D_m) − H(gµ, D_m)| ≤ O_{C,k}(1).

4. If ν(·) = µ(· + x) then |H(µ, D_m) − H(ν, D_m)| = O_d(1).

5. If |m′ − m| ≤ C, then |H(µ, D_m) − H(µ, D_{m′})| ≤ O_{C,d}(1).

We will use some easy corollaries of Lemma 3.1 (5) and (6).
Lemma 3.3.
Let µ, ν ∈ P([−r, r]^d), let δ > 0, and let θ = (1 − δ)µ + δν. Then for partitions A, B of R^d we have

|H(θ, A) − H(µ, A)| ≤ H(δ) + δ|H(µ, A) − H(ν, A)|,
|H(θ, A|B) − H(µ, A|B)| ≤ H(δ) + δ|H(µ, A|B) − H(ν, A|B)|.

Recall that the total variation distance between µ, ν ∈ P(R^d) is

‖µ − ν‖ = sup_A |µ(A) − ν(A)|,

where the supremum is over Borel sets A. This is a complete metric on P(R^d). It follows from standard measure theory that given µ, ν there are probability measures τ, µ′, ν′ such that µ = (1 − δ)τ + δµ′ and ν = (1 − δ)τ + δν′, where δ = ‖µ − ν‖. Combining this with Lemma 3.1 (5) and (6), we have:

Lemma 3.4. If A, B are partitions of R^d, and if µ, ν ∈ P(R^d) are supported on at most k atoms of each partition and ‖µ − ν‖ < ε, then

|H(µ, A) − H(ν, A)| < kε + 2H(ε/2),
|H(µ, A|B) − H(ν, A|B)| < kε + 2H(ε/2).

In particular, if µ, ν ∈ P([0,1)^d), then

|H_m(µ) − H_m(ν)| < dε + 2H(ε)/m.

Recall from Section 2.3 the definition of the raw and rescaled components µ_{x,n}, µ^{x,n}, and note that H(µ^{x,n}, D_m) = H(µ_{x,n}, D_{n+m}). Also,

E_{i=n}(H_m(µ^{x,i})) = ∫ (1/m) H(µ^{x,n}, D_m) dµ(x)
                     = (1/m) ∫ H(µ_{x,n}, D_{n+m}) dµ(x)
                     = (1/m) H(µ, D_{n+m} | D_n).

The following basic lemmas enable us to get bounds on the scale-n entropy of a measure, or a convolution of measures, in terms of the average scale-m entropy of their components or convolutions of their components, when m ≪ n.

Lemma 3.5. For r ≥ 1, µ ∈ P([−r, r]^d) and integers m < n,

H_n(µ) = E_{0≤i≤n}(H_m(µ^{x,i})) + O((m + log r)/n).

Lemma 3.6.
For r ≥ 1, µ, ν ∈ P([−r, r]^d) and integers m < n,

H_n(µ ∗ ν) ≥ E_{0≤i≤n}( (1/m)·H(µ^{x,i} ∗ ν^{y,i}, D_{i+m} | D_i) ) + O((m + log r)/n)
         ≥ E_{0≤i≤n}( H_m(µ_{x,i} ∗ ν_{y,i}) ) + O(1/m + (m + log r)/n).

For proofs see [12, Section 3.2], or the proof of the following variant, which is essentially the same as in the Euclidean case.
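As a toy numerical check of the component formalism (with d = 1 and the helper `H` and example measure ours, not the paper's): the chain rule for the refining partitions D_0, D_1, …, D_n expresses the scale-n entropy as a sum of consecutive conditional entropies, which is the m = 1 case of the mechanism behind Lemma 3.5.

```python
# Toy check (d = 1): for the refining dyadic partitions of [0,1),
#   H(mu, D_n) = sum_{i=0}^{n-1} H(mu, D_{i+1} | D_i),
# since H(mu, D_{i+1} | D_i) = H(mu, D_{i+1}) - H(mu, D_i) telescopes.
from collections import defaultdict
from math import log2, floor

def H(measure, m):  # entropy of a finitely supported measure w.r.t. D_m
    cells = defaultdict(float)
    for x, p in measure.items():
        cells[floor(2**m * x)] += p
    return -sum(p * log2(p) for p in cells.values() if p > 0)

mu = {0.0: 0.5, 1/3: 0.25, 2/3: 0.25}
n = 8
total = H(mu, n)                                          # scale-n entropy
telescoped = sum(H(mu, i + 1) - H(mu, i) for i in range(n))
assert abs(total - telescoped) < 1e-9                     # H(mu, D_0) = 0
```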
Lemma 3.7.
Let ν ∈ P(G) be supported on a set of diameter O(1) and let µ ∈ P(R^d) be supported on a set of diameter r. Then for m < n,

H_n(ν.µ) ≥ E_{0≤i≤n}( H_{i,m}(ν_{g,i}.µ) ) − O(1/m + (m + log r)/n).

Proof.
We can assume that n = n₀·m for an integer n₀, since replacing n by the closest multiple of m changes H_n(ν.µ) by O(m/n), which is absorbed in the error term. Let us also introduce a parameter 0 ≤ k < m. Then

H_n(ν.µ) = (1/n)·H(ν.µ, D_n)
 = (1/n)·H(ν.µ, D_{k+n}) + O(k/n)
 = (1/n)·H(ν.µ, D_k) + (1/n)·H(ν.µ, D_{k+n} | D_k) + O(m/n).

Since ν is supported on a set of diameter O(1) and µ on a set of diameter O(r), the measure ν.µ is supported on a set of diameter O(r), so the trivial entropy bound gives

(1/n)·H(ν.µ, D_k) = O((log r + m)/n).

We next evaluate (1/n)·H(ν.µ, D_{k+n} | D_k). Recalling our assumption n = n₀m and the definition of conditional entropy, we have

(1/n)·H(ν.µ, D_{k+n} | D_k) = (1/n)·Σ_{j=0}^{n₀−1} H(ν.µ, D_{k+(j+1)m} | D_{k+jm}).

For each i we have the identities ν = E_{i=j}(ν_{g,i}) and µ = E_{i=j}(µ_{x,i}), which imply ν.µ = E_{i=j}(ν_{g,i}.µ). By concavity of entropy, we get

(1/n)·Σ_{j=0}^{n₀−1} H(ν.µ, D_{k+(j+1)m} | D_{k+jm}) = (1/n)·Σ_{j=0}^{n₀−1} H( E_{i=k+jm}(ν_{g,i}.µ), D_{k+(j+1)m} | D_{k+jm} )
 ≥ (1/n)·Σ_{j=0}^{n₀−1} E_{i=k+jm}( H(ν_{g,i}.µ, D_{k+(j+1)m} | D_{k+jm}) )
 = (1/n)·Σ_{j=0}^{n₀−1} E_{i=k+jm}( H(ν_{g,i}.µ, D_{k+(j+1)m}) − H(ν_{g,i}.µ, D_{k+jm}) ).

Since ν_{g,i}.µ is supported on a set of diameter O(2^{−i}), for i = k+jm we have H(ν_{g,i}.µ, D_{k+jm}) = O(1). Thus the total contribution of these error terms is O(n₀), which upon dividing by n is O(n₀/n) = O(1/m). The discussion so far shows that

(1/n)·H(ν.µ, D_n) ≥ (1/n)·Σ_{j=0}^{n₀−1} E_{i=k+jm}( H(ν_{g,i}.µ, D_{k+(j+1)m}) ) − O(1/m + (m + log r)/n).

Averaging now over k = 0, …, m−1 gives

(1/n)·H(ν.µ, D_n) = (1/m)·Σ_{k=0}^{m−1} (1/n)·H(ν.µ, D_{k+n}) − O(m/n)
 ≥ (1/m)·Σ_{k=0}^{m−1} (1/n)·Σ_{j=0}^{n₀−1} E_{i=k+jm}( H(ν_{g,i}.µ, D_{k+(j+1)m}) ) − O(1/m + (m + log r)/n)
 = E_{0≤i≤n}( (1/m)·H(ν_{g,i}.µ, D_{i+m}) ) − O(1/m + (m + log r)/n),

and since H(ν_{g,i}.µ, D_i) = O(1), the last expectation differs from E_{0≤i≤n}( H_{i,m}(ν_{g,i}.µ) ) by at most O(1/m), as claimed.

We also need the following variant of Lemma 3.7:

Lemma 3.8.
Let ν ∈ P(G) and µ ∈ P(R^d) be supported on balls of radius R. Then for every k, n ∈ N,

H_n(ν.µ) ≥ E_{i=k}( H_n(ν_{g,i}.µ) ) + O_{R,k}(1/n),

and in particular

E_{i=k}( H_n(ν_{g,i}.µ) − H_n(µ_{x,i}) ) ≤ H_n(ν.µ) − H_n(µ) + O_{R,k}(1/n).

Proof.
Since µ, ν are supported on balls of radius R, so is ν.µ, and hence the scale-k entropies of all these measures are O_{R,k}(1). It follows that

H_n(ν.µ) = (1/n)·H(ν.µ, D_n | D_k) + O_{R,k}(1/n).

By concavity of conditional entropy,

(1/n)·H(ν.µ, D_n | D_k) = (1/n)·H( E_{i=k}(ν_{g,i}.µ), D_n | D_k ) ≥ E_{i=k}( (1/n)·H(ν_{g,i}.µ, D_n | D_k) ).

But ν_{g,i}.µ is supported on a set of diameter O(2^{−i}), so (taking i = k)

(1/n)·H(ν_{g,k}.µ, D_n | D_k) = (1/n)·H(ν_{g,k}.µ, D_n) + O(1/n) = H_n(ν_{g,k}.µ) + O(1/n).

This gives the first inequality. Similarly,

H_n(µ) = (1/n)·H(µ, D_n | D_k) + O_{R,k}(1/n) = (1/n)·E_{i=k}( H(µ^{x,i}, D_n) ) + O_{R,k}(1/n) = E_{i=k}( H_n(µ_{x,i}) ) + O_{R,k}(1/n)

(the first equality again because µ is supported on a set of diameter O(R)). Subtracting this expression for H_n(µ) from the previous one for H_n(ν.µ) gives the claim.

We consider here some basic connections between uniform, concentrated and saturated measures. We make the statements as general as possible, but in some cases, especially when dealing with uniform measures, it is necessary to assume that the support of the measures is bounded, and the constants in the error terms may then depend on the diameter of the support. Since we are interested in the asymptotics as m → ∞, we rarely make this dependence explicit, but it can be read off from the proofs.

Given partitions E and F of sets X, Y, respectively, write
E ⊗ F = { E × F : E ∈ E, F ∈ F }

for the product partition of X × Y. We will also often identify E with the partition E ⊗ {Y} of X × Y, and similarly F with the partition {X} ⊗ F of X × Y.

For a linear subspace V ≤ R^d, we write D^V_n for the level-n dyadic partition of V with respect to some fixed (but arbitrary) orthogonal coordinate system in V, which we usually do not specify.

Let V ≤ R^d be a linear subspace and W = V^⊥, and let D′_m = D^V_m ⊗ D^W_m denote the product partition of R^d ≅ V × W. Each element of D_m intersects at most O(1) elements of D′_m, and vice versa, so by Lemma 3.2 (2),

|H(µ, D_m) − H(µ, D′_m)| = O(1).

The same is true for the induced partitions on W, so, writing π_W for the orthogonal projection to W,

|H(π_W µ, D_m) − H(π_W µ, D′_m)| = O(1),

and also

|H(π_W µ, D_m) − H(π_W µ, D^W_m)| = O(1).

Recall that we identify D^V_m, D^W_m with the partitions π_V^{−1}D^V_m, π_W^{−1}D^W_m of R^d, respectively. With this identification we have D′_m = D^V_m ∨ D^W_m, and

H(π_W µ, D′_m) = H(µ, D^W_m).

From this discussion we have the following immediate consequence:

Lemma 3.9.
With the above notation, a measure µ ∈ P(R^d) is (V, ε + O(1/m), m)-saturated if and only if

(1/m)·H(µ, D^V_m | D^{V⊥}_m) ≥ dim V − (ε + O(1/m)).

From similar considerations we have
Lemma 3.10. If µ ∈ P(R^d) is (V, ε, m)-saturated and g = 2^t·U + a ∈ G is a similarity, then gµ is (UV, ε + O(|t|/m), m)-saturated; and similarly for uniformity.

One way to get saturated measures is from uniform measures:
Lemma 3.11. If µ ∈ P([−r, r]^d) is (V, ε, m)-uniform then it is (V, O_r(ε + 1/m), m)-saturated.

Proof. By uniformity, we can write µ = (1 − ε)µ′ + εµ′′, where µ′ is supported on the 2^{−m}-neighborhood of a translate of V. By concavity of conditional entropy,

H(µ, D_m | D^{V⊥}_m) ≥ (1 − ε)·H(µ′, D_m | D^{V⊥}_m) ≥ H(µ′, D_m | D^{V⊥}_m) − ε·H(µ′, D_m).

Since µ, and hence µ′, is supported on at most O(r^d·2^{dm}) atoms of D_m, we have H(µ′, D_m) = O(m + log r), and the inequality above becomes

H(µ, D_m | D^{V⊥}_m) ≥ H(µ′, D_m | D^{V⊥}_m) − ε·O(m + log r).

Since µ′ is supported on a 2^{−m}-neighborhood of a translate of V, it is supported on O(1) atoms of D^{V⊥}_m, so H(µ′, D^{V⊥}_m) = O(1), hence

H(µ′, D_m | D^{V⊥}_m) ≥ H(µ′, D_m) − H(µ′, D^{V⊥}_m) ≥ H(µ′, D_m) − O(1).

Finally, by Lemma 3.3 applied to µ = (1 − ε)µ′ + εµ′′, using the bound O(m + log r) on the entropies of µ′, µ′′ and the uniformity of µ,

H(µ′, D_m) > H(µ, D_m) − ε·O(m + log r) − H(ε) > m·(dim V − ε) − ε·O(m + log r) − H(ε).

Putting it all together, using H(ε) ≤ 1 and dividing by m, gives the claim.

Another way to get saturated measures is to take convex combinations of saturated measures:

Lemma 3.12.
A convex combination of (V, ε, m)-saturated measures on R^d is (V, ε + O(1/m), m)-saturated.

Proof. Immediate from Lemma 3.9 and concavity of conditional entropy (Lemma 3.1 (5)).

Combining the two lemmas above gives the following:
Corollary 3.13.
A convex combination of ( V, ε, m ) -uniform measures on [ − r, r ] d is ( V, O r ( ε + 1 /m ) , m ) -saturated. Lemma 3.14.
Let µ, ν ∈ P([−r, r]^d). If µ is (V, ε, m)-saturated and ‖µ − ν‖ < δ, then ν is (V, ε′, m)-saturated for ε′ = ε + O(δ·log r + 1/m).

Proof. Take A = D^V_m ∨ D^{V⊥}_m and B = D^{V⊥}_m in Lemma 3.4, and use Lemma 3.9.

Finally, we shall need an entropy bound for concentrated measures.

Lemma 3.15. If µ ∈ P([−r, r]^d) is (V, 2^{−m})-concentrated then H_m(µ) ≤ dim V + O_r(log m / m).

Proof. Write µ = (1 − 2^{−m})µ₁ + 2^{−m}µ₂, where µ₁ ∈ P(W^{(2^{−m})}) for some translate W of V and µ₂ ∈ P([−r, r]^d). Since H_m(µ_i) = O_r(1) for i = 1, 2, by Lemma 3.3 it suffices to show that H_m(µ₁) ≤ dim V + O_r(1/m). This follows from the fact that W^{(2^{−m})} ∩ [−r, r]^d intersects O(r^d·2^{m·dim V}) atoms of D_m, together with the trivial entropy bound.

In this section all measures are supported on [0,1]^d.

Lemma 3.16. If µ ∈ P([0,1]^d) is (V, ε, n)-saturated, then for every 0 ≤ m < n,

P_{0≤i≤n}( µ_{x,i} is (V, ε′, m)-saturated ) > 1 − ε′, where ε′ = √(dε + O(m/n)).

Proof.
Without loss of generality, we may assume that D_n = D^V_n ∨ D^W_n, where W = V^⊥ (Lemma 3.9). Since µ is (V, ε, n)-saturated, Lemma 3.5 gives

dim V + H_n(π_W µ) − ε ≤ H_n(µ)
 = E_{0≤i≤n}( H_m(µ_{x,i}) ) + O(m/n)
 = E_{0≤i≤n}( (1/m)·H(µ_{x,i}, D^W_m) ) + E_{0≤i≤n}( (1/m)·H(µ_{x,i}, D_m | D^W_m) ) + O(m/n)
 = E_{0≤i≤n}( H_m(π_W(µ_{x,i})) ) + E_{0≤i≤n}( (1/m)·H(µ_{x,i}, D_m | D^W_m) ) + O(m/n).

Since (π_W µ)_{y,i} is the convex combination (with the natural weights) of π_W(µ_D) over those D ∈ D_i with D ∩ π_W^{−1}(y) ≠ ∅ (recall that we are assuming D_n = D^V_n ∨ D^W_n), concavity of entropy implies

H_n(π_W µ) = E_{0≤i≤n}( H_m((π_W µ)_{y,i}) ) + O(m/n) ≥ E_{0≤i≤n}( H_m(π_W(µ_{x,i})) ) + O(m/n).

Combining the last two displays,

E_{0≤i≤n}( (1/m)·H(µ_{x,i}, D_m | D^W_m) ) ≥ dim V − (ε + O(m/n)).

But we also have the trivial bound (1/m)·H(µ_{x,i}, D_m | D^W_m) ≤ dim V ≤ d. Combining this with the last inequality, the lemma follows by Markov's inequality.

The analogous statement for concentration is valid at individual scales (rather than for typical scales between 0 and n, as above):

Lemma 3.17. If µ ∈ P([0,1]^d) is (V, ε)-concentrated and 0 ≤ m ≤ log(1/ε), then

P_{i=m}( µ_{x,i} is (V, √(2^m·ε))-concentrated ) > 1 − √(2^{−m}·ε).

Proof.
Let W = V + v be such that µ(W^{(ε)}) > 1 − ε. For a dyadic cube D, write T_D for the surjective homothety D → [0,1)^d, and let W_D = T_D(W). Clearly, for any D ∈ D_m we have T_D(W^{(ε)}) = (W_D)^{(2^m·ε)}. Take δ = √(2^m·ε) ≤ 1 and let E ⊆ D_m denote the family of cells D such that

µ_D( D \ W^{(ε)} ) = (T_D µ_D)( [0,1)^d \ (W_D)^{(2^m·ε)} ) > δ.

It follows that

ε ≥ µ( [0,1]^d \ W^{(ε)} ) ≥ Σ_{D∈E} µ( D \ W^{(ε)} ) > δ·µ(∪E),

so µ(∪E) < ε/δ = √(2^{−m}·ε). Hence µ(∪(D_m \ E)) > 1 − √(2^{−m}·ε), and the conclusion follows.

We often will want to change the scale at which measures are saturated. Clearly, if δ < ε and µ is (V, δ)-concentrated, then it is also (V, ε)-concentrated. However, for δ < ε and k > m it is in general not true that if µ is (V, δ, k)-saturated then µ is also (V, ε, m)-saturated (though of course it certainly is (V, ε, k)-saturated). The issue is that the first few scales do not greatly affect the entropy at a fine scale. In order to allow such changes of parameters we will pass to components, using the lemmas above. We will also need a simple covering argument for intervals in Z:

Lemma 3.18.
Let I ⊆ {0, …, n} and m ∈ N be given. Then there is a subset I′ ⊆ I such that I ⊆ I′ + [0, m] and [i, i+m] ∩ [j, j+m] = ∅ for distinct i, j ∈ I′.

Proof. Define I′ inductively. Begin with I′ = ∅ and, at each successive stage, if I \ ⋃_{i∈I′} [i, i+m] ≠ ∅, add its least element to I′; stop when I ⊆ ⋃_{i∈I′} [i, i+m]. Since each newly added element exceeds i + m for every i already in I′, the intervals [i, i+m], i ∈ I′, are pairwise disjoint.

Proposition 3.19.
For every ε > 0 and m ∈ N, if k > k(ε, m) and 0 < δ < δ(ε, m, k), then for all large enough n > n(ε, m, k, δ) the following holds. Let ν, µ ∈ P(R^d) and let V_0, V_1, …, V_n ≤ R^d be linear subspaces such that

P_{0≤i≤n}( µ_{x,i} is (V_i, δ, k)-saturated and ν_{y,i} is (V_i, δ)-concentrated ) > 1 − δ.  (18)

Then there are linear subspaces V′_0, …, V′_n ≤ R^d such that

P_{0≤i≤n}( µ_{x,i} is (V′_i, ε, m)-saturated and ν_{y,i} is (V′_i, ε)-concentrated ) > 1 − ε.  (19)

Furthermore, if V_i = V is independent of i, then we can take V′_i = V.

Proof. Fix δ, k and suppose that (18) holds for some n. Let I ⊆ {0, …, n} denote the set of indices u such that

P_{i=u}( µ_{x,i} is (V_i, δ, k)-saturated and ν_{y,i} is (V_i, δ)-concentrated ) > 1 − √δ.

By Markov's inequality, |I| ≥ (1 − √δ)(n + 1). Let I′ ⊆ I be chosen as in the previous lemma with parameter k, so that I ⊆ I′ + [0, k] and [i, i+k] ∩ [j, j+k] = ∅ for distinct i, j ∈ I′. If j = i + u for some i ∈ I′ and 0 ≤ u ≤ k, define V′_j = V_i; define V′_j arbitrarily for other j. Note that when V_i = V is independent of i, then also V′_j = V for j as above, in which case we can set V′_i = V for all i and satisfy the last assertion of the statement.

To see that this choice works (assuming the parameters satisfy the proper relations), note that for any pair of components θ = µ_{x,i}, η = ν_{y,i} such that θ is (V_i, δ, k)-saturated and η is (V_i, δ)-concentrated, we have by Lemmas 3.16 and 3.17 that

P_{i≤j≤i+k}( θ_{w,j} is (V′_j, √(dδ + O(m/k)), m)-saturated and η_{z,j} is (V′_j, √(2^k·δ))-concentrated ) > 1 − O(√(δ + m/k)).

Write U = ⋃_{i∈I′} [i, i+k].
The union is disjoint by assumption, so the bounds above combine to give

P_{i∈U}( θ_{w,i} is (V′_i, √(dδ + O(m/k)), m)-saturated and η_{z,i} is (V′_i, √(2^k·δ))-concentrated ) > 1 − O(√(δ + m/k)).

Let V = U ∩ [0, n]. Then we have the trivial inequalities

P_{i∈V}(…) ≥ P_{i∈U}(…) − |U \ V| / |U|,

P_{0≤i≤n}(…) ≥ (|V| / (n + 1))·P_{i∈V}(…).

Since I ⊆ U ⊆ [0, n+k], we have |U \ V| ≤ k and |U| ≥ (1 − √δ)(n + 1), so combining these with the previous inequality we get

P_{0≤i≤n}( θ_{w,i} is (V′_i, √(dδ + O(m/k)), m)-saturated and η_{z,i} is (V′_i, √(2^k·δ))-concentrated ) > 1 − O(√(δ + m/k)) − O(k/n).

Thus if k is large enough relative to ε, m; δ is small enough relative to ε, k; and n is large enough relative to ε, k, δ, we obtain (19).

We remark that the use of Lemma 3.18 and Markov's inequality in the proof is rather crude, and one might want to use Lemma 2.7 instead. This would have shown that one can associate to most components a subspace on which each is suitably concentrated and saturated, but the subspaces would generally depend on the component, and not just on the level it belongs to. The argument above gives the desired uniformity across each level.

Let B_r(x) denote the open Euclidean ball of radius r around x ∈ R^d and, as before, for A ⊆ R^d let A^{(ε)} = { x ∈ R^d : d(x, A) < ε }. Define a metric on the space of linear subspaces V, W ≤ R^d by

d(V, W) = inf{ ε > 0 : V ∩ B_1(0) ⊆ W^{(ε)} and W ∩ B_1(0) ⊆ V^{(ε)} }.  (20)

This is just the Hausdorff metric on the intersections of V, W with the closed unit ball, so the induced topology on the space of linear subspaces of R^d is compact (note that this space decomposes into d + 1 connected components, corresponding to the dimensions of the subspaces).
It is also the same as the distance ‖π_V − π_W‖, where ‖·‖ denotes the operator norm and π_V, π_W the orthogonal projections to V, W.

It will be convenient to write A ⊑ A′ if A ∩ B_1(0) ⊆ A′. This is a transitive, reflexive relation. In this notation, the distance between subspaces V_1, V_2 ≤ R^d defined above is

d(V_1, V_2) = inf{ ε > 0 : V_1 ⊑ V_2^{(ε)} and V_2 ⊑ V_1^{(ε)} }.

Define the "angle" between subspaces V_1, V_2 by ∠(V_1, V_2) = 0 if V_1 ⊆ V_2 or V_2 ⊆ V_1; otherwise, set W = V_1 ∩ V_2 and

∠(V_1, V_2) = inf{ ‖v_1 − v_2‖ : v_1 ∈ V_1 ∩ W^⊥, v_2 ∈ V_2 ∩ W^⊥, ‖v_1‖ = ‖v_2‖ = 1 }.

This is not the usual notion of angle, but it agrees with the usual definition up to a multiplicative constant, and is more convenient to work with. The following properties are elementary and we omit their proofs.
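For intuition, the identification of d(V, W) with the projection distance can be checked numerically in the simplest case of two lines through the origin in R², where ‖π_V − π_W‖ = |sin t| for lines at angle t. The helpers below are our toy sketch, not notation from the paper.

```python
# Toy check (our sketch): for lines V = span{(1,0)} and W at angle t in R^2,
# the operator norm ||pi_V - pi_W|| equals |sin t|, matching the subspace
# metric d(V, W) described above.
from math import cos, sin, sqrt, isclose

def proj_line(u):          # 2x2 orthogonal projection onto span{u}, ||u|| = 1
    a, b = u
    return [[a * a, a * b], [a * b, b * b]]

def op_norm_sym(M):        # operator norm of a symmetric 2x2 matrix
    (a, b), (_, c) = M
    half_tr = (a + c) / 2
    disc = sqrt(max(half_tr**2 - (a * c - b * b), 0.0))
    return max(abs(half_tr + disc), abs(half_tr - disc))

t = 0.3
P = proj_line((1.0, 0.0))
Q = proj_line((cos(t), sin(t)))
D = [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
assert isclose(op_norm_sym(D), abs(sin(t)), abs_tol=1e-12)
```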
Lemma 3.20.
Let V, W ≤ R^d be linear subspaces and ε > 0.

1. d(V, W) ≤ 1, with equality if and only if V ∩ W^⊥ ≠ {0} or W ∩ V^⊥ ≠ {0}. In particular, if dim W > dim V then W ⋢ V^{(ε)} for every ε < 1, and d(V, W) = 1.

2. If 0 < ε < 1 and V ⊑ W^{(ε)}, then π_W : V → W is injective and dim V ≤ dim W; if, in addition, dim V = dim W, then W ⊑ V^{(ε)} and d(V, W) ≤ ε.

3. ∠(V, W) ≤ √2·d(V, W).

4. If V ⋢ W^{(ε)}, then there exists a vector v ∈ V with ∠(Rv, W) ≥ ε.

We collect some elementary implications for concentration, uniformity and saturation:
Lemma 3.21.
Let µ ∈ P([0,1]^d) and V, W ≤ R^d.

1. If µ is (V, ε)-concentrated and d(W, V) < δ, then µ is (W, ε + √d·δ)-concentrated.

2. If µ is (V, ε, m+1)-uniform and d(W, V) < 2^{−(m+1)}/√d, then µ is (W, ε, m)-uniform.

3. If µ is (V, ε, m)-saturated and d(W, V) < 2^{−m}, then µ is (W, ε + O(1/m), m)-saturated.

4. If µ is (V, ε, m)-saturated and W ≤ V is a subspace, then µ is (W, ε + O(1/m), m)-saturated.

5. If µ is both (V_1, ε, m)-saturated and (V_2, ε, m)-saturated, and ∠(V_1, V_2) > δ > 0, then µ is (V_1 + V_2, ε′, m)-saturated, where ε′ = 2ε + O((1/m)·log(1/δ)).

Proof. If d(W, V) < δ then V ∩ B_1(0) ⊆ W^{(δ)}, so V ∩ B_{√d}(0) ⊆ W^{(√d·δ)}. It follows that if (V + v) ∩ [0,1]^d ≠ ∅ then (V + v) ∩ [0,1]^d ⊆ (W + v)^{(√d·δ)} (we use the fact that the diameter of [0,1]^d is √d), so (V^{(ε)} + v) ∩ [0,1]^d ⊆ (W + v)^{(ε + √d·δ)}. The first claim follows.

For (2), observe that if d(W, V) < 2^{−(m+1)}/√d and µ is (V, 2^{−(m+1)})-concentrated, then by the first claim µ is (W, 2^{−m})-concentrated. Since by assumption H_m(µ) > dim V − ε, and d(V, W) < 1 implies dim W = dim V (by (1) of the previous lemma), we have shown that µ is (W, ε, m)-uniform.

For (3), note that d(V, W) < 2^{−m} implies ‖π_{V⊥} − π_{W⊥}‖ < 2^{−m}, so |H(µ, D_m | D^{V⊥}_m) − H(µ, D_m | D^{W⊥}_m)| = O(1), and the claim follows.

For (4), we may assume W ≠ V. Let W′ ≤ V denote the orthogonal complement of W in V, and write R^d as the orthogonal direct sum W ⊕ W′ ⊕ V^⊥. Without loss of generality we may assume D_m = D^W_m ∨ D^{W′}_m ∨ D^{V⊥}_m (Lemma 3.9); in doing so we implicitly increase ε by O(1/m). Since µ is (V, ε, m)-saturated,

(1/m)·H(µ, D_m | D^{V⊥}_m) ≥ dim V − ε.

Since D_m refines D^{W⊥}_m = D^{W′⊕V⊥}_m, which in turn refines D^{V⊥}_m, we have

H(µ, D_m | D^{V⊥}_m) = H(µ, D_m ∨ D^{W′}_m | D^{V⊥}_m) = H(µ, D^{W′}_m | D^{V⊥}_m) + H(µ, D_m | D^{W⊥}_m).

Inserting this into the inequality above gives

(1/m)·H(µ, D_m | D^{W⊥}_m) ≥ dim V − (1/m)·H(µ, D^{W′}_m | D^{V⊥}_m) − ε.

Since (1/m)·H(µ, D^{W′}_m | D^{V⊥}_m) ≤ dim W′ + O(1/m) = dim V − dim W + O(1/m), this is precisely (W, ε + O(1/m), m)-saturation of µ.

We turn to (5). Let V′_2 = V_2 ∩ (V_1 ∩ V_2)^⊥, so that V′_2 ≤ V_2, V_1 ∩ V′_2 = {0}, ∠(V_1, V′_2) = ∠(V_1, V_2) > δ and V_1 + V′_2 = V_1 + V_2. By (4) we can replace V_2 by V′_2 at the cost of increasing ε by O(1/m). Thus we may assume from the start that V_1 ∩ V_2 = {0}.

Write V = V_1 ⊕ V_2 (this is an algebraic, not an orthogonal, sum) and W = V^⊥. We can assume without loss of generality that D_m = D^V_m ∨ D^W_m. Also, let E_m = D^{V_1}_m ∨ D^{V_2}_m ∨ D^W_m be the partition corresponding to the direct sum R^d = V_1 ⊕ V_2 ⊕ W.

By Lemma 3.9, we must show that

(1/m)·H(µ, D^V_m | D^W_m) ≥ dim V − 2ε − O(log(1/δ)/m).

Because of the assumption ∠(V_1, V_2) > δ, the partitions D^{V_1}_m ∨ D^{V_2}_m and D^V_m of V, and also the corresponding partitions of R^d, have the property that each atom of one intersects O(1/δ) atoms of the other. Thus

|H(µ, D^{V_1}_m ∨ D^{V_2}_m | D^W_m) − H(µ, D^V_m | D^W_m)| = O(log(1/δ)),

so it is sufficient for us to prove that

(1/m)·H(µ, D^{V_1}_m ∨ D^{V_2}_m | D^W_m) ≥ dim V − 2ε − O(log(1/δ)/m).  (21)

Now,

(1/m)·H(µ, D^{V_1}_m ∨ D^{V_2}_m | D^W_m) = (1/m)·H(µ, D^{V_1}_m | D^W_m) + (1/m)·H(µ, D^{V_2}_m | D^{V_1}_m ∨ D^W_m).  (22)

Since W ⊆ V_1^⊥, we can assume that the partition D^{V_1^⊥}_m refines D^W_m. Using the fact that µ is (V_1, ε, m)-saturated, we get a bound for the first term on the right-hand side of this identity:

(1/m)·H(µ, D^{V_1}_m | D^W_m) ≥ (1/m)·H(µ, D^{V_1}_m | D^{V_1^⊥}_m) ≥ dim V_1 − ε.

As for the second term, again using the fact that each atom of D^{V_1}_m ∨ D^{V_2}_m ∨ D^W_m intersects O(1/δ) atoms of D^{V_2}_m ∨ D^{V_1^⊥}_m and vice versa, and similarly for D^{V_1}_m ∨ D^W_m and D^{V_1^⊥}_m, we have

(1/m)·H(µ, D^{V_2}_m | D^{V_1}_m ∨ D^W_m) = (1/m)·H(µ, D^{V_2}_m | D^{V_1^⊥}_m) − O(log(1/δ)/m) ≥ dim V_2 − ε − O(log(1/δ)/m).
Combining the last two inequalities with (22) gives the desired inequality (21).
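The saturation criterion of Lemma 3.9 can also be illustrated numerically (our toy helpers, d = 2, V the x-axis): a measure spread uniformly along V has (1/m)·H(µ, D_m | D^{V⊥}_m) equal to dim V = 1, i.e. it is (V, ε, m)-saturated with ε ≈ 0.

```python
# Toy check of the saturation criterion (our helpers, not the paper's):
# since D_m refines D_m^{V-perp}, the conditional entropy is
#   H(mu, D_m | D_m^{V-perp}) = H(mu, D_m) - H(mu, D_m^{V-perp}).
from collections import defaultdict
from math import log2, floor

def H(measure, key):  # entropy of the pushforward of `measure` under `key`
    cells = defaultdict(float)
    for x, p in measure.items():
        cells[key(x)] += p
    return -sum(p * log2(p) for p in cells.values() if p > 0)

m = 6
# Uniform measure on 2^m points along V = x-axis (at height y = 0.25).
mu = {(j / 2**m + 2**-(m + 1), 0.25): 2.0**-m for j in range(2**m)}
atom = lambda x: (floor(2**m * x[0]), floor(2**m * x[1]))   # D_m atom
atom_perp = lambda x: floor(2**m * x[1])                    # D_m^{V-perp} atom
cond = (H(mu, atom) - H(mu, atom_perp)) / m
assert abs(cond - 1.0) < 1e-9   # (1/m) H(mu, D_m | D_m^{V-perp}) = dim V
```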
In this section we develop some methods for understanding unions and intersections of thickened subspaces. We require some elementary linear algebra estimates.
Lemma 3.22.
Let v_1, …, v_k ∈ R^d with ‖v_i‖ ≤ 1, and suppose that

d(v_i, span{v_1, …, v_{i−1}}) > δ for all 1 ≤ i ≤ k.

Then for any v = Σ t_i v_i we have ‖(t_1, …, t_k)‖ ≤ √k·2^k·‖v‖/δ^k.

Proof. We first claim that, for every 1 ≤ i ≤ k,

|t_i| ≤ (1 + 1/δ)^{k−i+1}·‖v‖.

This we show by induction on k. For k = 1 it is trivial. In general, set V_i = span{v_1, …, v_i} and W_i = V_i^⊥. By hypothesis, ‖π_{W_{i−1}}(v_i)‖ > δ for all 1 ≤ i ≤ k. Thus

‖v‖ ≥ ‖π_{W_{k−1}}(v)‖ = |t_k|·‖π_{W_{k−1}}(v_k)‖ > |t_k|·δ,

so |t_k| < ‖v‖/δ, and the claim holds for i = k. Now,

‖Σ_{i=1}^{k−1} t_i v_i‖ = ‖π_{V_{k−1}}(v) − t_k·π_{V_{k−1}}(v_k)‖ ≤ ‖v‖ + |t_k| ≤ (1 + 1/δ)·‖v‖.

Thus, by the induction hypothesis, for 1 ≤ i ≤ k−1,

|t_i| ≤ (1 + 1/δ)^{(k−1)−i+1}·‖Σ_{i=1}^{k−1} t_i v_i‖ ≤ (1 + 1/δ)^{k−i+1}·‖v‖,

as claimed. It remains to note that

‖t‖ ≤ √k·‖t‖_∞ ≤ √k·(1 + 1/δ)^k·‖v‖ ≤ √k·2^k·‖v‖/δ^k.

The claim follows (note that δ < 1). It will be convenient to introduce notation for the constant p_k = k·2^k.

Corollary 3.23.
Let V ≤ R^d be a subspace and v_1, …, v_k ∈ V^{(ε)} with ‖v_i‖ ≤ 1. If d(v_i, span{v_1, …, v_{i−1}}) > δ for all 1 ≤ i ≤ k, then

span{v_1, …, v_k} ⊑ V^{(p_k·ε/δ^k)}.

Proof. Write W = span{v_i} and let w = Σ t_i v_i ∈ W be a unit vector; write t = (t_1, …, t_k). By the last lemma, ‖t‖ ≤ √k·2^k/δ^k. Thus

d(w, V) ≤ Σ |t_i|·d(v_i, V) < ε·‖t‖_1 ≤ ε·√k·‖t‖ ≤ p_k·ε/δ^k,

where we used the hypothesis and the general inequality ‖u‖_1 ≤ √k·‖u‖.

Corollary 3.24.
Suppose that
E, V ≤ R^d are subspaces such that E ⊑ V^{(ε)}, and e ∈ V^{(ε)} ∩ B_1(0) is such that d(e, E) > δ > 0. Then E′ = E ⊕ Re satisfies E′ ⊑ V^{(8ε/δ²)}.

Proof. Every vector in E′ belongs to a subspace of the form Re′ ⊕ Re for some e′ ∈ E, so it is enough to show that Re′ ⊕ Re ⊑ V^{(8ε/δ²)}. But the pair e′, e satisfies the assumptions of the previous corollary with k = 2. Since p_2 = 8, the claim follows.

Corollary 3.25.
Suppose that
E, V, W ≤ R^d are subspaces such that E ⊑ V^{(ε)} ∩ W^{(ε)}, and e ∈ (V^{(ε)} ∩ W^{(ε)}) ∩ B_1(0) is such that d(e, E) > δ > 0. Let E′ = E ⊕ Re. Then E′ ⊑ V^{(8ε/δ²)} ∩ W^{(8ε/δ²)}.

Proof. Immediate from the previous corollary, applied to V and to W separately.

Proposition 3.27 below takes a family W of subspaces and finds an essentially minimal subspace that almost-contains every W ∈ W. The basic step in the proof is to do this for two subspaces, and this is given by the next corollary.

Corollary 3.26.
Given 0 < ε < 1, let ε_k = 4ε^{1/4^k}. Then for any V, W ≤ R^d there are 0 ≤ k ≤ d and a k-dimensional subspace E ⊑ V^{(ε_k)} ∩ W^{(ε_k)} such that V^{(ε_k)} ∩ W^{(ε_k)} ⊑ E^{(ε_{k+1})}.

Proof. Let E be a subspace of maximal dimension satisfying E ⊑ V^{(ε_{dim E})} ∩ W^{(ε_{dim E})} (such subspaces exist, e.g. {0}). Let k = dim E. If V^{(ε_k)} ∩ W^{(ε_k)} ⋢ E^{(ε_{k+1})}, then by the previous corollary we can replace E by E′ = E + Re for some e ∈ ∂B_1(0) ∩ (V^{(ε_k)} ∩ W^{(ε_k)} \ E^{(ε_{k+1})}), and E′ will satisfy

E′ ⊑ V^{(8ε_k/ε_{k+1}²)} ∩ W^{(8ε_k/ε_{k+1}²)} ⊆ V^{(ε_{k+1})} ∩ W^{(ε_{k+1})},

where we have used

8ε_k/ε_{k+1}² = 8·(4ε^{4/4^{k+1}})/(16ε^{2/4^{k+1}}) = 2ε^{2/4^{k+1}} ≤ 2ε^{1/4^{k+1}} = ½·ε_{k+1}.

But dim E′ = dim E + 1, which contradicts the maximality of E.

Proposition 3.27.
Let 0 < ε < 1 and ε_k = 4ε^{1/4^k}. Then for any family W of subspaces of R^d, there is a subspace V ≤ R^d such that W ⊑ V^{(ε_d)} for all W ∈ W, and such that if Ṽ is a subspace with W ⊑ Ṽ^{(ε)} for all W ∈ W, then V ⊑ Ṽ^{(ε_d)}.

Proof. We may assume that ε_d < 1, since otherwise the statement is trivial (any subspace V will do). Let V be a subspace of minimal dimension such that W ⊑ V^{(ε_{d−dim V})} for all W ∈ W (such subspaces exist, e.g. V = R^d). Write k = d − dim V. We can assume k < d, since the case k = d corresponds to V = {0}, and then the conclusion is trivial.

We claim that V is the desired subspace. First, ε_k ≤ ε_d, so we have W ⊑ V^{(ε_k)} ⊑ V^{(ε_d)} for all W ∈ W, which is the first property.

For the second property of V, suppose that there is a subspace Ṽ ≤ R^d such that W ⊑ Ṽ^{(ε)} ⊑ Ṽ^{(ε_k)} for all W ∈ W, but V ⋢ Ṽ^{(ε_d)}. Let E be a subspace of maximal dimension satisfying E ⊑ V^{(ε_{k+1})} ∩ Ṽ^{(ε_{k+1})}. Clearly dim E ≤ dim V (since E ⊑ V^{(ε_{k+1})} and ε_{k+1} ≤ ε_d < 1), and we cannot have dim E = dim V, because then we would have V ⊑ E^{(ε_{k+1})} ⊑ Ṽ^{(2ε_{k+1})} ⊑ Ṽ^{(ε_d)}, contrary to assumption. So dim E < dim V. Thus, by the minimality in the definition of V, there exists W ∈ W with W ⋢ E^{(ε_{k+1})}. Choose a vector e ∈ (B_1(0) ∩ W) \ E^{(ε_{k+1})}, so that d(e, E) ≥ ε_{k+1}, and note that since W ⊑ V^{(ε_k)} ∩ Ṽ^{(ε_k)} we also have e ∈ V^{(ε_k)} ∩ Ṽ^{(ε_k)}. Thus, by Corollary 3.25 (with ε_k and ε_{k+1} in the roles of ε and δ), the subspace E′ = E ⊕ Re satisfies E′ ⊑ V^{(ε_{k+1})} ∩ Ṽ^{(ε_{k+1})}. But dim E′ > dim E, which contradicts the definition of E.
We conclude that V ⊑ Ṽ^{(ε_d)}, as desired.

We note that the proof actually shows W ⊑ V^{(ε_{d−dim V})} for all W ∈ W, and that any Ṽ with this property satisfies V ⊑ Ṽ^{(ε_{d−dim V+1})}.

From the last proposition we can derive a dual version: for any family W of subspaces and any ε > 0, there is a subspace V such that V ⊑ W^{(ε_d)} for all W ∈ W, and any other subspace Ṽ with this property satisfies Ṽ ⊑ V^{(ε_d)}. To see this, observe that U_1 ⊑ U_2^{(ε)} if and only if U_2^⊥ ⊑ (U_1^⊥)^{(ε)}, and apply the previous proposition to W^⊥ = { W^⊥ : W ∈ W }. However, in a later application we will want to present the subspace V as an intersection of a small number of (neighborhoods of) subspaces from W. This is provided for in the following proposition.

Proposition 3.28.
Let ε > 0 and δ = 8^{d−1}·ε^{1/4^{d²}}. Then for any family W of subspaces of R^d, there is a subspace V ≤ R^d such that V ⊑ W^{(δ)} for every W ∈ W, and subspaces W_1, …, W_k ∈ W with k ≤ d − dim V such that ⋂_{i=1}^{k} W_i^{(ε)} ⊑ V^{(δ)}. In particular, if V′ is any other subspace satisfying V′ ⊑ W^{(ε)} for every W ∈ W, then V′ ⊑ V^{(δ)}.

Furthermore, if we are given an increasing sequence W_1 ⊆ W_2 ⊆ … with each W_i a family of subspaces of R^d, then we can assign V_i to W_i as above in such a way that V_{i+1} ⊑ (V_i)^{(δ)}.

Proof. Fix ε, δ, W as in the statement. We shall recursively choose finite sequences of subspaces W_1, W_2, … ∈ W and V_0, V_1, … ≤ R^d, and of real numbers δ_0, δ_1, … > 0, such that ⋂_{j=1}^{i} W_j^{(ε)} ⊑ V_i^{(δ_i)}.

Begin with V_0 = R^d and δ_0 = ε. Now, for j ≥ 1, suppose we have defined V_i, W_i, δ_i for i < j. Let δ*_j = 8·(δ_{j−1})^{1/4^d}. If V_{j−1} ⊑ W^{(δ*_j)} for all W ∈ W, we terminate the construction. Otherwise, choose W_j ∈ W such that V_{j−1} ⋢ W_j^{(δ*_j)}. Apply Corollary 3.26 to the subspaces V_{j−1}, W_j with the parameter δ_{j−1}. We obtain a subspace V_j ≤ R^d and real numbers δ_{j−1} ≤ δ′_j ≤ δ_j ≤ 4·(δ_{j−1})^{1/4^d} satisfying

V_j ⊑ V_{j−1}^{(δ′_j)} ∩ W_j^{(δ′_j)}  (23)

and

V_{j−1}^{(δ′_j)} ∩ W_j^{(δ′_j)} ⊑ V_j^{(δ_j)}

(in the notation of the corollary, δ′_j = ε_k and δ_j = ε_{k+1}, but if k = d we can take δ′_j = δ_j = ε_d). Since ε ≤ δ_{j−1} ≤ δ′_j and, by the induction hypothesis, ⋂_{i=1}^{j−1} W_i^{(ε)} ⊑ V_{j−1}^{(δ_{j−1})}, the last equation implies that ⋂_{i=1}^{j} W_i^{(ε)} ⊑ V_j^{(δ_j)}, and the conditions of the construction are satisfied.

We now claim that dim V_j < dim V_{j−1} as long as both are defined. Indeed, suppose the construction completed the j-th step without terminating, so V_{j−1} ⋢ W_j^{(δ*_j)}. In particular, this means that δ_j ≤ δ*_j < 1. Now, we know that V_j ⊑ V_{j−1}^{(δ_j)}, which together with δ_j < 1 implies dim V_j ≤ dim V_{j−1}. Suppose that equality held.
Then, again using δ_j < 1, we would have the reverse containment V_{j−1} ⊑ V_j^{(δ_j)}. This, together with V_j ⊑ W_j^{(δ′_j)} and δ′_j ≤ δ_j, implies V_{j−1} ⊑ W_j^{(2δ_j)}. Since 2δ_j ≤ δ*_j, this contradicts the assumption V_{j−1} ⋢ W_j^{(δ*_j)}, so we must have dim V_j < dim V_{j−1}.

Since dim V_j is strictly decreasing, the procedure terminates after completing some k ≤ d iterations, which in our numbering means it completed step k − 1 and terminated at step k. This means that V_{k−1} ⊑ W^{(δ*_k)} for all W ∈ W and ⋂_{i=1}^{k−1} W_i^{(ε)} ⊑ V_{k−1}^{(δ_{k−1})}. Observe that δ_{k−1} ≤ δ*_k ≤ δ (since δ_0 = ε and k ≤ d, iterating the bound δ_j ≤ 4·(δ_{j−1})^{1/4^d}). Hence, for V = V_{k−1}, we have V ⊑ W^{(δ)} for all W ∈ W and ⋂_{i=1}^{k−1} W_i^{(ε)} ⊑ V^{(δ)}, as desired.

The statement about V′ is immediate from the first statement of the lemma.

Finally, for the last part, we note that in the construction we may first exhaust the subspaces in W_1, obtaining V_1, then move on to those in W_2, obtaining a possibly different V_2, and so on. The containment relation follows from (23).
Let 0 < ε < 1 and ε_k = 4ε^{1/4^k}, and assume that ε_d < 1/2. Then for any η ∈ P([0,1]^d), there is a subspace V ≤ R^d such that η is (V, √d·ε_d)-concentrated, and if W is any subspace such that η is (W, ε)-concentrated, then V ⊑ W^{(ε_d)}.

Proof. We can assume ε_d < 1. Choose a subspace V ≤ R^d of minimal dimension such that η is (V, √d·ε_{d−dim V})-concentrated (the family of such subspaces is non-empty, e.g. V = R^d). Write k = d − dim V and note that we can assume k < d, since otherwise V = {0} and the claim is trivial.

We claim that V is the desired subspace. Suppose that η is (W, ε)-concentrated (and hence (W, ε_k)-concentrated) but that V ⋢ W^{(ε_d)}. Let E ⊑ V^{(ε_{k+1})} ∩ W^{(ε_{k+1})} be a subspace of maximal dimension. Then dim E < dim V (otherwise, as before, V ⊑ E^{(ε_{k+1})} ⊑ W^{(ε_d)}, contrary to assumption), so by the definition of V the measure η is not (E, √d·ε_{k+1})-concentrated. Now, consider translates V^{(ε_k)} + v and W^{(ε_k)} + w which cover all but ε_k and ε of the mass of η, respectively. Choose u ∈ (V^{(ε_k)} + v) ∩ (W^{(ε_k)} + w) (the intersection is non-empty because it has η-mass at least 1 − 2ε_k > 0), and observe that V^{(ε_k)} + v ⊆ V^{(2ε_k)} + u and W^{(ε_k)} + w ⊆ W^{(2ε_k)} + u. Hence

η( [0,1]^d ∩ ((V^{(2ε_k)} ∩ W^{(2ε_k)}) + u) ) > 1 − 2ε_k.

Next, consider the set [0,1]^d ∩ (E^{(√d·ε_{k+1})} + u). It covers at most 1 − √d·ε_{k+1} of the mass of η which, since 2ε_k < √d·ε_{k+1}, is less than the mass of the previous intersection. Thus, translating back to the origin and scaling by 1/√d (so that [0,1]^d is mapped into the unit ball), we find that there exists a point

e ∈ ( B_1(0) ∩ V^{(2ε_k/√d)} ∩ W^{(2ε_k/√d)} ) \ E^{(ε_{k+1})}.

By Corollary 3.25 the subspace E′ = E + Re satisfies E′ ⊑ V^{(ε_{k+1})} ∩ W^{(ε_{k+1})} (we have used that 16ε_k/(√d·ε_{k+1}²) < ε_{k+1}).
But dim E′ > dim E, contradicting the choice of E. We conclude that V ⊑ W^{(ε_d)}, as desired.

We turn to the analogue of Proposition 3.29 which provides a "maximal" subspace on which a given measure is saturated to a certain degree. The argument is again similar to the measureless case.

Proposition 3.30.
Given m ∈ N and θ ∈ P ([0 , d d ) , there is a subspace V ≤ R d such that θ is ( V, O ( log mm ) , m ) -saturated, and if W is any subspace suchthat θ is ( W, m , m ) -saturated, then W ⊑ V ( O ((log m ) /m )) .Proof. Write δ k = C k k log( m ) /m where C > is large enough to serve asthe implicit a constant in the big- O expressions we invoke below. Note that δ k < δ k +1 and δ d = O d ( log mm ) . Let V be a subspace of maximal dimension suchthat θ is ( V, δ dim V , m ) -saturated (such subspaces exist, e.g. V = { } ). Write k = dim V and suppose θ is ( W, /m, m ) -saturated for some W . If W V ( δ d ) then certainly W V ( δ k ) , so by Lemma 3.20(4), there is a subspace W ′ ⊆ W with ∠ ( V, W ′ ) > δ k . By Lemma 3.21 (3) θ is ( W ′ , (1 + C ) /m, m ) -saturated,and since (1 + C ) /m < δ k it is ( W ′ , δ k , m ) -saturated. By Lemma 3.21 (5), θ is ( V + W ′ , δ k + Cm log( δ k ) , m ) -saturated. Since δ k + Cm log( δ k ) < δ k +1 themeasure θ is ( V + W ′ , δ k +1 , m ) -saturated. Since V ′ = V + W ′ has dimension atleast k and θ is ( V ′ , δ dim V ′ , m ) -saturated, this contradicts the definition of V . When a measure has the property that at each level the components are withhigh probability concentrated on a subspace, one may expect the subspace tovary slowly between levels. This is the content of the following proposition,which may be applied to the conclusion of Theorem 2.8, but is also needed inthe theorem’s proof.
Proposition 3.31.
Let < ε < and set δ = 3 · d − ε / (4 · d ) . Let η ∈P ([0 , d ) and n ∈ N , and suppose that for every n ≤ k ≤ n + log(1 /ε ) thereis given a linear subspace W k ≤ R d satisfying P i = k ( η x,i is ( W k , ε ) -concentrated ) > − ε. (24) Then there are subspaces V k ≤ W k such that for n ≤ k ≤ log(1 /ε ) , P i = k ( η x,i is ( V k , δ ) -concentrated ) > − d √ ε, (25) and V j ⊑ V ( δ ) i for all n ≤ i ≤ j ≤ n + log(1 /ε ) . roof. Write N = [ log(1 /ε )] . For each n ≤ i ≤ n + N set W i = { W j : n ≤ j ≤ i } and apply Proposition 3.28 the with parameter ε / to obtain a subspace V i satisfying V i ⊑ W ( δ/ j and V i ⊑ V ( δ/ j for n ≤ j ≤ i , and r ( i ) ≤ d subspaces W i, , . . . , W i,r ( i ) ∈ W i such that T r ( i ) j =1 W ( ε / ) i,j ⊑ V ( δ/ i .Now, given i and ≤ j ≤ r ( i ) , there is by definition a n ≤ k = k ( i, j ) ≤ i such that W i,j = W k ( i,j ) . For every component θ = η x,k in the event in (24),we can apply Lemma 3.17 (using i − k ( i, j ) ≤ log(1 /ε ) ) to get P u = i ( θ x,u is ( V k , ε / ) -concentrated ) > − √ ε. Thus by (24), P u = i ( η x,u is ( V k , ε / ) -concentrated ) > − √ ε. Hence, P u = i ( η x,u is ( V k ( i,j ) , ε / ) -concentrated for all ≤ j ≤ r ( i )) > − r ( i ) √ ε ≥ − d √ ε. Finally, if θ = η x,i is in the event above then, using T r ( i ) j =1 W ( ε / ) i,j = T r ( i ) j =1 W ( ε / ) k ( i,j ) ⊑ V ( δ/ i we have θ ( V ( δ/ i ) ≥ − r ( i ) X j =1 (1 − θ ( W ( ε / ) k ( i,j ) )) ≥ − r ( i ) · ε / . Since r ( i ) ≤ d and dε / ≤ δ/ , this means that θ is ( V i , δ/ -concentrated.Since this is true for components θ = η x,i with probability > − d √ ε , we haveestablished (25), in fact with δ/ instead of δ .Finally, we show that we can assume V k ≤ W k . If ε is so large that δ ≥ there is nothing to prove since we can take V k = W k from the start, so assume δ < . From this and the relation V i ⊑ W ( δ/ i it follows that π W i is injectiveon V i and satisfies d ( V i , π W i V i ) ≤ δ/ . 
Thus V ( δ/ i ⊑ ( π W i V i ) ( δ ) , so if a measure θ is ( V i , δ/ -concentrated, it is also ( π W i V i , δ ) -concentrated. It follows that ifwe replace V i by π W i V i , we still will have (25), as desired. Also, since V j ⊑ V ( δ/ i for n ≤ i < j before the modification, and each subspace moves by at most δ/ ,after the change we have V j ⊑ V ( δ ) i for n ≤ i < j , as desired. Corollary 3.32.
For every ℓ ∈ N and < ε < the following holds with δ = 3 · d − ε / (4 · d ) . Let η ∈ P ([0 , d ) and N > log(1 /ε ) , and suppose thatfor each ≤ q ≤ N there is given a subspace W q ≤ R d such that P i = q ( η x,i is ( W q , ε ) -concentrated ) > − ε. Then there are subspaces V q ≤ W q such that P i = q ( η x,i is ( V q , δ ) -concentrated ) > − d √ ε, and N + 1 { ≤ i ≤ N : d ( V i , V i − ℓ ) ≤ δ } ≥ − d + 1) ℓ log(1 /ε ) . (Note that the conclusion is of interest only when ℓ is small compared to log(1 /ε ) ). roof. We may assume that δ < , otherwise the statement is trivial.Let m = [ log(1 /ε )] . For each k < [ N/m ] write I k = { mk, mk +1 , . . . , m ( k +1) − } and for k = [ N/m ] write I k = { m [ N/m ] , . . . , N } . For each k ≤ [ N/m ] ,apply the previous proposition with n = km to find subspaces V q ≤ W q , q ∈ I k ,such that V j ⊑ V ( δ ) i for all i < j in I k . This defines V q for all ≤ q ≤ N .Fix k . If i < j are in I k then V j ⊑ V ( δ ) i (since δ < ), hence dim V j ≤ dim V i ,and if dim V i = dim V j then d ( V i , V j ) ≤ δ (since V j ⊑ V ( δ ) i ). Let i = mk and let i u +1 ∈ I k denote the least index such that dim V i u +1 < dim V i u . Thereare at most d such indices. It follows from the above that if j + ℓ ∈ I k and d ( V j , V j − ℓ ) ≥ δ then i u ≤ j < i u + ℓ for some u . There are at most ( d + 1) ℓ suchindices j , so { i : i + ℓ ∈ I k and d ( V i , V i − ℓ ) ≥ δ } ≤ ( d + 1) ℓ. As the sets I , . . . , I [ N/m ] are disjoint and cover { , . . . , N + 1 } , the bound aboveapplies to each of them, so { ≤ i < N : d ( V i , V i − ℓ ) ≥ δ } ≤ ( Nm + 1)( d + 1) ℓ. Dividing by N + 1 and using N > log(1 /ε ) gives the desired bound. R d Our goal in this section is to prove Theorem 2.8.
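The arguments below track how entropy behaves under convolution. As a warm-up, the discrete analogue of the basic monotonicity (Lemma 4.1 below, without the $O(1/m)$ loss) can be checked directly. The following is a minimal numerical sketch, with arbitrary probability vectors on a finite interval of $\mathbb{Z}$ standing in for $\mu$ and $\nu$; the helper `H` and the specific pmfs are ours, not the paper's:

```python
import numpy as np

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
mu = rng.random(16); mu /= mu.sum()   # an arbitrary pmf on {0,...,15}
nu = rng.random(16); nu /= nu.sum()

conv = np.convolve(mu, nu)  # pmf of X + Y for independent X ~ mu, Y ~ nu

# On a discrete group convolution cannot decrease entropy:
# H(mu * nu) >= max(H(mu), H(nu)).  In R^d the analogous statement for the
# scale-m entropies H_m holds only up to an O(1/m) error.
print(H(mu), H(nu), H(conv))
```

The inequality follows from $H(X+Y) \ge H(X+Y \mid Y) = H(X)$; in $\mathbb{R}^d$ the $O(1/m)$ correction arises from translating the dyadic partition, as in the proof of Lemma 4.1 below.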
We begin with the obvious.
Lemma 4.1.
For $m \in \mathbb{N}$ and $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$,
$$H_m(\mu * \nu) \ge H_m(\mu) - O\big(\tfrac{1}{m}\big).$$
Also, if $\mu$ is $(V, \varepsilon, m)$-saturated then $\mu * \nu$ is $(V, \varepsilon', m)$-saturated, where $\varepsilon' = \varepsilon + O(1/m)$.

Proof. Notice that $\mu * \delta_y(A) = \mu(A - y)$, so that $H(\mu * \delta_y, \mathcal{D}_m) = H(\mu, \mathcal{D}_m + y)$, where $\mathcal{D}_m + y = \{[a+y, b+y) : [a,b) \in \mathcal{D}_m\}$. Thus by Lemma 3.2 (4), we have $H_m(\mu * \delta_y) \ge H_m(\mu) - O(\frac{1}{m})$. Since $\mu * \nu = \int \mu * \delta_y \, d\nu(y)$, concavity of entropy implies $H_m(\mu * \nu) \ge H_m(\mu) - O(\frac{1}{m})$. The second part follows using the same relation and Lemma 3.12.

Corollary 4.2.
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$, $m \in \mathbb{N}$, and let $V \le \mathbb{R}^d$ be a linear subspace. Suppose that $\mu$ is not $(V, \varepsilon_1, m)$-saturated, and that $\nu$ is $(V, \varepsilon_2, m)$-saturated. Then
$$H_m(\mu * \nu) > H_m(\mu) + \varepsilon', \qquad \text{where } \varepsilon' = \varepsilon_1 - \varepsilon_2 - O(1/m).$$

Proof. Write $W = V^\perp$. By the previous lemma (with the roles of $\mu, \nu$ reversed),
$$H_m(\mu * \nu) \ge H_m(\pi_W(\mu * \nu)) + \dim V - (\varepsilon_2 + O(1/m)).$$
Since $\pi_W$ is linear, $\pi_W(\mu * \nu) = \pi_W\mu * \pi_W\nu$, so by the previous lemma $H_m(\pi_W(\mu * \nu)) \ge H_m(\pi_W\mu) - O(1/m)$. Inserting this in the last inequality, using the assumption that $H_m(\mu) < H_m(\pi_W\mu) + \dim V - \varepsilon_1$, and absorbing another $O(1/m)$ into the error term, we have
$$H_m(\mu * \nu) \ge H_m(\pi_W\mu) + \dim V - (\varepsilon_2 + O(1/m)) > H_m(\mu) + (\varepsilon_1 - \varepsilon_2 - O(1/m)),$$
as claimed.

Lemma 4.3.
Let $\mu \in \mathcal{P}(\mathbb{R}^d)$ be $(V, \varepsilon)$-concentrated, $0 < \varepsilon < 1$. Then $\mu^{*k}$ is $(V, k\varepsilon)$-concentrated for all $k \in \mathbb{N}$ with $k\varepsilon < 1$.

Proof. Let $\mu = \varepsilon\mu_0 + (1-\varepsilon)\mu_1$ with $\mu_0, \mu_1$ probability measures and $\mu_1$ supported on a translate of $V^{(\varepsilon)}$. Then we can write
$$\mu^{\times k} = (1-\varepsilon)^k \mu_1^{\times k} + \big(1 - (1-\varepsilon)^k\big)\nu_k$$
for some probability measure $\nu_k$, so, writing $\pi_k(x_1, \ldots, x_k) = \sum_{i=1}^k x_i$, we have
$$\mu^{*k} = \pi_k\mu^{\times k} = (1-\varepsilon)^k \pi_k\mu_1^{\times k} + \big(1 - (1-\varepsilon)^k\big)\pi_k\nu_k.$$
Since $\mu_1$ is supported on a translate of $V^{(\varepsilon)}$, the measure $\mu_1^{*k} = \pi_k\mu_1^{\times k}$ is supported on a translate of $\sum_{i=1}^k V^{(\varepsilon)} = V^{(k\varepsilon)}$. So the splitting of $\mu^{*k}$ above shows that $(1-\varepsilon)^k$ of the mass of $\mu^{*k}$ is supported on a $k\varepsilon$-neighborhood of a translate of $V$. Since $(1-\varepsilon)^k \ge 1 - k\varepsilon$, the claim follows.

A rough but convenient way to describe the distribution of a measure is via its mean and covariance matrix. In this section we develop some basic properties of these objects and their relation to concentration.

By a covariance matrix we shall mean a $d \times d$ real symmetric matrix with non-negative eigenvalues (we do not require them to be positive). We denote the eigenvalues of such a matrix $\Sigma$ by $\lambda_1(\Sigma) \ge \lambda_2(\Sigma) \ge \ldots \ge \lambda_d(\Sigma)$, and set $\lambda_k = 0$ for $k > d$, preserving monotonicity. Define $\mathrm{eigen}_{1\ldots r}(\Sigma)$ to be the span in $\mathbb{R}^d$ of the eigenvectors corresponding to eigenvalues $\ge \lambda_r(\Sigma)$. Note that if $\lambda_r(\Sigma) = \lambda_{r+1}(\Sigma)$ then $\dim(\mathrm{eigen}_{1\ldots r}(\Sigma)) > r$.

It is advantageous to think of a covariance matrix as the positive semi-definite bi-linear form which it determines. The correspondence between these objects is not one-to-one: the matrix determines the form, but the form determines the matrix only given the standard basis.
Nevertheless, given the inner product, the form determines the eigenvalues and eigenspaces, and we are primarily interested in these; since the inner product is always fixed in our discussion, we will not lose much by thinking in terms of bi-linear forms, and we use the same notation for both. One advantage of this approach is that a bi-linear form can be restricted to a linear subspace, giving another bi-linear form, which is positive semi-definite if the original one was.

Lemma 4.4. Let $\Sigma$ be a positive semi-definite form on $\mathbb{R}^d$ and $U \le \mathbb{R}^d$ a subspace. Suppose that $u_1, \ldots, u_k$ is an orthonormal basis for $U$ and that $\Sigma(u_i, u_i) < \varepsilon$ for $i = 1, \ldots, k$. Then $\lambda_1(\Sigma|_{U \times U}) \le d\varepsilon$.

Proof. Let $u = \sum a_i u_i \in U$ be a unit vector and write $a = (a_1, \ldots, a_k)$, so that $\|a\| = 1$. Using the Cauchy-Schwarz inequality for the "semi-inner product" $(v, w) \mapsto v^T \Sigma w$ (which may not be positive definite, but satisfies the requirements for the weak inequality), and again to get $\sum |a_i| \le \sqrt{d}\,\|a\|$, we have
$$u^T \Sigma u = \sum_{i,j} a_i a_j u_i^T \Sigma u_j \le \sum_{i,j} |a_i||a_j| \sqrt{(u_i^T \Sigma u_i)(u_j^T \Sigma u_j)} < \sum_{i,j} |a_i||a_j|\,\varepsilon \le \varepsilon \cdot \big(\sum_i |a_i|\big)\big(\sum_j |a_j|\big) \le \varepsilon \cdot d \cdot \|a\|^2 = d\varepsilon.$$
This proves the claim.

For $\mu \in \mathcal{P}(\mathbb{R}^d)$, the mean of $\mu$ is $m = m(\mu) = \int x \, d\mu(x)$, and the covariance matrix of $\mu$ is
$$\Sigma(\mu) = \int (x - m)(x - m)^T \, d\mu(x).$$
In this case we abbreviate $\lambda_i(\mu) = \lambda_i(\Sigma(\mu))$, and similarly $\mathrm{eigen}_{1\ldots r}(\mu)$. We note that scaling a measure by $r$ results in multiplying its covariance matrix by $r^2$, an operation which does not affect the eigenspaces (the eigenvalues are all multiplied by the same factor $r^2$, so their order is preserved).

Lemma 4.5.
Let $\mu \in \mathcal{P}(\mathbb{R}^d)$, write $\Sigma = \Sigma(\mu)$, and let $U \le \mathbb{R}^d$ be a linear subspace. Then $\Sigma|_U = \Sigma(\pi_U\mu)$ (the equality is of bi-linear forms on $U$).

Proof. Write $m = m(\mu)$ and $m_U = \pi_U m = m(\pi_U\mu)$ (the last equality is immediate). For vectors $u, v \in U$, we now have
$$u^T \Sigma v = u^T \Big(\int (x-m)(x-m)^T \, d\mu(x)\Big) v = \int u^T (x-m)(x-m)^T v \, d\mu(x) = \int \langle u, x-m \rangle \langle v, x-m \rangle \, d\mu(x)$$
$$= \int \langle u, \pi_U(x-m) \rangle \langle v, \pi_U(x-m) \rangle \, d\mu(x) = \int \langle u, x - m_U \rangle \langle v, x - m_U \rangle \, d\pi_U\mu(x) = u^T \Sigma(\pi_U\mu) v.$$
This proves the claim.

A measure $\mu$ is supported on an $r$-dimensional affine subspace of $\mathbb{R}^d$ if and only if $\lambda_i(\mu) = 0$ for $i > r$, in which case it is supported on a translate of $\mathrm{eigen}_{1\ldots r}(\mu)$. We will use a quantitative version of this fact:

Lemma 4.6.
Let $\mu \in \mathcal{P}(\mathbb{R}^d)$ and write $\lambda_i = \lambda_i(\mu)$ and $V_r = \mathrm{eigen}_{1\ldots r}(\mu)$.

1. $\mu$ is $(V_r, O(\lambda_{r+1}^{1/3}))$-concentrated.

2. If $\mu \in \mathcal{P}([0,1]^d)$ is $(V, \varepsilon)$-concentrated for some $r$-dimensional subspace $V$ and $\varepsilon > 0$, then $\lambda_{r+1} = O(\varepsilon)$ and $\mu$ is $(V_r, O(\varepsilon^{1/3}))$-concentrated.

3. Let $\mu = \mu_\omega \in \mathcal{P}([0,1]^d)$ be a random measure defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Set $A = \mathbb{E}(\Sigma(\mu))$. If $\lambda_{r+1}(A) < \varepsilon$, then, writing $V = \mathrm{eigen}_{1\ldots r}(A)$,
$$\mathbb{P}\big(\mu \text{ is } (V, O(\varepsilon^{1/6}))\text{-concentrated}\big) > 1 - O(\sqrt{\varepsilon}).$$
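Before turning to the proof, a quick numerical illustration of the link between small tail eigenvalues and concentration near a subspace (our own toy example in $\mathbb{R}^3$, not from the paper): a cloud lying within $10^{-3}$ of a plane has a tiny third covariance eigenvalue, and its top-two eigenspace essentially recovers the plane.

```python
import numpy as np

rng = np.random.default_rng(0)
# Points in [0,1]^3 lying within ~5e-4 of the plane z = 1/2.
pts = rng.random((5000, 3))
pts[:, 2] = 0.5 + 1e-3 * (pts[:, 2] - 0.5)

m = pts.mean(axis=0)
Sigma = np.cov(pts.T, bias=True)        # covariance matrix of the cloud
lam, vecs = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
lam = lam[::-1]; vecs = vecs[:, ::-1]   # sort descending: lam[0] >= lam[1] >= lam[2]

# lambda_3 is of order (1e-3)^2, and eigen_{1..2} is (nearly) the plane:
# distance to the translate m + V_2 is the component along the third eigenvector.
dist = np.abs((pts - m) @ vecs[:, 2])
print(lam[2], dist.max())
```

This matches the qualitative fact stated above: mass within $\delta$ of a translate of a subspace forces the corresponding tail eigenvalues to be $O(\delta^2)$-small, and conversely.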
Let ξ be an R d -valued random variable distributed according to µ andlet m = m ( µ ) and Σ = Σ( µ ) . Identifying column vectors with d × matricesand scalars with × matrices, for u ∈ R d we have E (cid:16) h u, ξ − m i (cid:17) = E (cid:0) u T ( ξ − m )( ξ − m ) T u (cid:1) = u T E (cid:0) ( ξ − m )( ξ − m ) T (cid:1) u = u T Σ u. For a subspace W , let η W denote the rotation-invariant probability measure onthe unit sphere in W . Then there exists a constant c = c ( r ) such that d ( ξ, m + V r ) = c · ˆ h u, ξ − m i dη V ⊥ r ( u ) . Therefore, E (cid:0) d ( ξ, m + V r ) (cid:1) = E (cid:18) c · ˆ h u, ξ − m i dη V ⊥ r ( u ) . (cid:19) = c · ˆ E (cid:16) h u, ξ − m i (cid:17) dη V ⊥ r ( u ) . ≤ c · λ r +1 . u T Σ u ≤ λ r +1 for every unit vector in V ⊥ r . Now (1) follows fromMarkov’s inequality.For (2), fix V as in the statement. Since r + 1 + dim V ⊥ > d , we must have dim (cid:0) eigen ,...,r +1 ∩ V ⊥ (cid:1) ≥ . Fix a unit vector w ∈ eigen ,...,r +1 ∩ V ⊥ . Then E (cid:0) d ( ξ, m + V ) (cid:1) = E sup u ∈ V ⊥ h u, ξ − m i k u k ! ≥ sup u ∈ V ⊥ E h u, ξ − m i k u k ! ≥ E (cid:16) h w, ξ − m i (cid:17) = w T Σ( µ ) w ≥ λ r +1 . (26)On the other hand, since µ ∈ P ([0 , d ) we have k ξ k ≤ √ d µ -a.s., hence, writing δ = ε (1 + 2 √ d ) , E (cid:0) d ( ξ, m + V ) (cid:1) ≤ δ P ( ξ ∈ ( m + V ) ( δ ) ) + d · P ( ξ ∈ [0 , d \ ( m + V ) ( δ ) ) ≤ δ + d · P ( ξ ∈ [0 , d \ ( m + V ) ( δ ) ) . (27)Finally, since µ is ( V, ε ) -concentrated, there is a translate U of V such that µ ( U ( ε ) ) > − ε . Hence m = E ( ξ )= µ ( U ( ε ) ) E ( ξ | ξ ∈ U ( ε ) ) + (1 − µ ( U ( ε ) ) E ( ξ | ξ ∈ R d \ U ( ε ) ) . Since U ( ε ) is convex, E ( ξ | ξ ∈ U ( ε ) ) ∈ U ( ε ) . Also, since k ξ k ≤ √ d , both expec-tations on the right hand side of the last equation have magnitude at most √ d .Thus d ( m, U ( ε ) ) ≤ (cid:13)(cid:13)(cid:13) m − E ( ξ | U ( ε ) ) (cid:13)(cid:13)(cid:13) ≤ ε √ d. 
Therefore U ( ε ) ⊆ m + V ( ε +2 ε √ d ) = m + V ( δ ) , and consequently P ( ξ ∈ [0 , d \ ( m + V ) ( δ ) ) ≤ E ( ξ / ∈ U ( ε ) ) < ε. Combined with (26) and (27) this proves the first part of (2), the second partnow follows from (1).We turn to (3). Let U = (eigen ...r A ) ⊥ ≤ eigen r +1 ,...,d A . Also for brevitywrite Σ µ = Σ( µ ) . For any unit vector u ∈ U , we have ε > u T Au = E ( u Σ µ u T ) Since u T Σ µ u ≥ , by Markov’s inequality, P ( u T Σ µ u > √ ε ) < √ ε Now fix an orthonormal basis u . . . u ℓ of U (so ℓ ≤ d − ( r + 1) ). By the lastinequality, P ( u Ti Σ µ u i ≤ √ ε for all i = 1 , . . . , d ) ≥ − d √ ε
By Lemma 4.4, the condition in the event above implies that $\lambda_1(\Sigma_\mu|_{U \times U}) \le d\sqrt{\varepsilon}$, where $\Sigma_\mu|_{U \times U}$ is the restriction of the quadratic form $\Sigma_\mu$ to $U \times U$, and by Lemma 4.5, $\Sigma_\mu|_{U \times U} = \Sigma(\pi_U\mu)$ (as bi-linear forms on $U$). Combined with the previous probability estimate we get
$$\mathbb{P}\big(\lambda_1(\Sigma(\pi_U\mu)) \le d\sqrt{\varepsilon}\big) \ge 1 - d\sqrt{\varepsilon}. \qquad (28)$$
By the first part of this lemma, for $\mu$ in the event in (28), $\pi_U\mu$ is $(\{0\}, O(\varepsilon^{1/6}))$-concentrated, and this is the same as saying that $\mu$ is $(V, O(\varepsilon^{1/6}))$-concentrated, as claimed.

Recall the definition of the distance between linear subspaces (20). We shall use the following basic fact, which we state without proof.

Lemma 4.7.
The maps $\Sigma \mapsto \lambda_i(\Sigma)$ are continuous on the set of positive semi-definite matrices. Furthermore, given $\tau > \sigma > 0$ and $1 \le r \le d$, the map $\Sigma \mapsto \mathrm{eigen}_{1\ldots r}(\Sigma)$ is continuous on the compact space of positive semi-definite matrices $\Sigma$ satisfying $\lambda_r(\Sigma) \ge \tau$ and $\lambda_{r+1}(\Sigma) \le \sigma$.

The standard $d$-dimensional Gaussian measure $\gamma = \gamma_d$ is given by $\gamma(A) = \int_A \varphi(x)\,dx$, where $\varphi = \varphi_d$ is
$$\varphi(x) = (2\pi)^{-d/2} \exp\big(-\tfrac{1}{2}\|x\|^2\big).$$
The mean and covariance are $0$ and $I$ (the $d \times d$ identity matrix), respectively. Given a $d \times d$ covariance matrix $\Sigma$ and $m \in \mathbb{R}^d$, write $\Sigma = BB^T$. The Gaussian measure with mean $m \in \mathbb{R}^d$ and covariance $\Sigma$ is the push-forward of $\gamma$ by the map $x \mapsto Bx + m$ and is denoted $N(m, \Sigma)$. When $\Sigma$ is non-singular its density with respect to Lebesgue measure is
$$f(x) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma}} \exp\big(-\tfrac{1}{2}(x-m)^T \Sigma^{-1} (x-m)\big).$$
When $\Sigma$ is singular and $r$ is such that $\lambda_r(\Sigma) > 0$, $\lambda_{r+1}(\Sigma) = 0$, one obtains a similar formula for the density on the affine space $V = \mathrm{eigen}_{1\ldots r}(\Sigma) + m$ with respect to the $r$-dimensional Hausdorff measure on $V$. In particular, if $\mu = N(m, \Sigma)$ and $\nu$ is the push-forward of $\mu$ through the map $x \mapsto rx$, then $\nu = N(rm, r^2\Sigma)$.

If $\mu_1, \ldots, \mu_k$ are measures then $\mu = \mu_1 * \ldots * \mu_k$ has mean $m(\mu) = \sum_{i=1}^k m(\mu_i)$ and covariance $\Sigma(\mu) = \sum_{i=1}^k \Sigma(\mu_i)$. If $\mu_i = N(m_i, \Sigma_i)$ then $\mu_1 * \ldots * \mu_k = N(\sum m_i, \sum \Sigma_i)$.

The central limit theorem asserts that, for $\mu_1, \mu_2, \ldots \in \mathcal{P}(\mathbb{R}^d)$ which are not too concentrated on subspaces, the convolutions $\mu_1 * \ldots * \mu_k$ can be re-scaled so that the resulting measure is close to a Gaussian measure. The Berry-Esseen estimate and its variants quantify the rate of this convergence. The following multi-dimensional variant is due to Rotar [28].

Theorem 4.8.
Let $\mu_1, \ldots, \mu_k$ be probability measures on $\mathbb{R}^d$ with finite third moments $\rho_i = \int \|x\|^3 \, d\mu_i(x)$. Let $\mu = \mu_1 * \ldots * \mu_k$ and let $\gamma$ be the Gaussian measure with the same mean and covariance matrix as $\mu$. Then for any convex Borel set $D \subseteq \mathbb{R}^d$,
$$|\mu(D) - \gamma(D)| \le C \cdot \frac{\sum_{i=1}^k \rho_i}{\lambda_d(\mu)^{3/2}},$$
where $C = C(d)$. In particular, if $\rho_i \le C'$ and $\lambda_d(\mu_i) \ge c$ for constants $c, C' > 0$, then $|\mu(D) - \gamma(D)| = O_{c,C'}(k^{-1/2})$.

If $\mu \in \mathcal{P}(\mathbb{R}^d)$ is supported on a subspace $V \le \mathbb{R}^d$ but not on a smaller subspace, and if the support is bounded, then $\mu^{*k}$ becomes increasingly smooth as a measure on $V$, in the sense that, by the central limit theorem, it converges (after suitable re-scaling) to a Gaussian on $V$. In this section we prove a localized version of this statement which applies with high probability to the components of the measure. Specifically, for $\mu \in \mathcal{P}(\mathbb{R}^d)$ of bounded support, for every $\delta > 0$ and integer scale $m$, there are subspaces $V_1, V_2, \ldots$ such that typical level-$i$ components of $\mu$ are concentrated near a translate of $V_i$, and, when $k$ is large, a typical level-$i$ component of $\mu^{*k}$ is $(V_i, \delta, m)$-saturated.

For a linear subspace $V \le \mathbb{R}^d$ let $\pi_V$ denote the orthogonal projection $\mathbb{R}^d \to V$. Recall our convention that $\lambda_i = 0$ for $i > d$, and in what follows define $\lambda_0(\Sigma) = d$, so that when $\mu \in \mathcal{P}([0,1]^d)$ and $\Sigma = \Sigma(\mu)$ the sequence $(\lambda_i(\mu))_{i=0}^\infty$ is monotone.

Proposition 4.9.
Let $\sigma > 0$, $\delta > 0$, $R > 0$ and $m > m(\delta, R)$. Then there exists an integer $p = p(\sigma, \delta, R, m)$ such that for all $k \ge k(\sigma, \delta, R, m)$ and all $0 \le \rho < \rho(\sigma, \delta, R, m, k)$, the following holds:

Let $\mu_1, \ldots, \mu_k \in \mathcal{P}([-R, R]^d)$, let $\mu = \mu_1 * \ldots * \mu_k$ and $V = \mathrm{eigen}_{1\ldots r}(\mu)$ for some $0 \le r \le d$, and suppose that $\lambda_r(\mu) \ge \sigma k$ and $\lambda_{r+1}(\mu) \le \rho$. Then
$$\mathbb{P}_{i = p - [\log\sqrt{k}]}\big(\mu_{x,i} \text{ is } (V, \delta, m)\text{-uniform}\big) > 1 - \delta. \qquad (29)$$

Remark. Instead of $\lambda_{r+1}(\mu) < \rho$ we could require $\mu$ to be $(V, \rho)$-concentrated. This would give a formally equivalent statement (using Lemma 4.6 (2)).

Proof.
It is a general fact that, for an absolutely continuous probability measure γ , for γ -a.e. x , as p → ∞ the components γ x,p converge weak-* to Lebesguemeasure on [0 , d , and in particular E i = p ( H m ( γ x,i )) → d as p → ∞ (30)(this is a consequence of the martingale convergence theorem). There is noguaranteed rate of convergence, but if γ has a continuous density function f ,then convergence holds at every x for which f ( x ) > , and the rate dependsonly on f ( x ) and on the modulus of continuity of f at x . In particular, when f ∈ C has a smooth density f , the convergence rate at x is controlled by f ( x ) and the bounds on k∇ f ( x ) k near x . Thus, for any compact family E ⊆ M d ( R ) of non-singular co-variance matrices and any compact K ⊆ R d , convergence in(30) is uniform as γ ranges over the Gaussians γ with mean 0 and co-variancematrix Σ ∈ E , and x ranges over K . Furthermore, given E we can choose a In the one-dimensional case in [12] there wan no requirement that m be large. The reasonthis is necessary in the multi-dimensional case is that, even when µ ∈ P ([0 , d ) is Lebesguemeasure on V ∩ [0 , d for an affine subspace V , we do not generally have H m ( µ ) = dim V ,but rather only H m ( µ ) = dim V − o (1) . One can change coordinates so that if D m is definedin the new coordinates, H m ( µ ) = dim V , but the coordinate change itself incurs an O (1 /m ) loss for H m ( · ) . K ⊆ R d so that it has arbitrarily large mass uniformly for such γ .Summarizing, given < σ, δ < , there is a p = p ( σ, δ, m ) such that, for anyGaussian γ with σ ≤ λ d ( γ ) ≤ /σ , P i = p (cid:0) H m ( γ x,i ) > d − δ (cid:1) > − δ. (31)In addition, by Lemma 3.2 (1), there is a weakly open neighborhood U δ ⊆ P ( R d ) of these Gaussians such that the inequality continues to be valid for all γ ∈ U δ .Next, let µ = µ ∗ . . . ∗ µ k be as in the statement of the proposition and firstassume r = d , so λ d ( µ ) ≥ σk . 
The third moments of the µ i are bounded by O R (1) , because µ i ∈ P ([ − R, R ] d ) . Thus by Theorem 4.8, if k is large enough ina manner depending only on δ , the scaling µ ′ of µ given by µ ′ ( A ) = µ (2 [log √ k ] · A ) will belong to U δ . Thus we obtain (31) for µ ′ . Scaling everything back by afactor of [log √ k ] we obtain (29).Now consider the case that Σ( µ ) is singular, i.e. λ d ( µ ) = 0 . Fix r, ρ and V = eigen ...r µ as in the statement of the proposition, and let π = π V denotethe orthogonal projection to V . Then the argument in the last paragraph appliesin V to the measure πµ = πµ ∗ . . . ∗ πµ k and ensures that P i = p − [log √ k ] (cid:0) H m ( πµ x,i ) > r − δ/ − O (1 /m ) (cid:1) > − δ/ . The O (1 /m ) term arises because we have transferred the entropy bound fromthe dyadic partition D Vm on V to the dyadic partition D m of R d . But, as we areassuming that m is large relative to δ , we can absorb this term in δ and assumethat P i = p − [log √ k ] (cid:0) H m ( π V µ x,i ) > r − δ/ (cid:1) > − δ/ . (32)Now, the hypothesis λ r +1 ( µ ) ≤ ρ means that µ is ( V, √ ρ ) -concentrated (Lemma4.6) and so for ρ small enough in a manner depending on the other parameters,a (1 − δ/ -fraction of the components µ x,p − [log √ k ] are ( V, − m ) -concentrated(Lemma (4.3)). In fact by taking ρ small we can ensure an arbitrarily highdegree of concentration. Furthermore, if enough of the mass of such a compo-nent µ x,p − [log √ k ] is concentrated on a small enough neighborhood of V , thenon this neighborhood π will be close enough to the identity map (in the supre-mum norm on continuous self-maps of [0 , d ) that Lemma 3.2 (3) will imply | H m ( µ x,p − [log √ k ] ) − H m ( π V µ x,p − [log √ k ] ) | < δ/ . Combined with (32), we obtain(29).We now specialize to convolutions of a single measure. Proposition 4.11.
Let σ, δ > and m > m ( δ ) . Then there exists p = p ( σ, δ, m ) such that for sufficiently large k ≥ k ( σ, δ, m ) and sufficiently small < ρ ≤ ρ ( σ, δ, m, k ) , the following holds.Let µ ∈ P ( R d ) , fix an integer i ≥ , and write A = E i = i (Σ( µ x,i )) . If λ r ( A ) > σ and λ r +1 ( A ) < ρ for some ≤ r ≤ d , then, setting V =eigen ...r ( A ) , ν = µ ∗ k and j = i − [log √ k ] + p , we have P j = j (cid:0) ν x,j is ( V, δ, m ) -saturated (cid:1) > − δ. roof. Fix σ , δ , m , k , ρ , µ , A , V , i as in the statement, we will see thatif the stated relationships hold and p is defined as in the statement, then theconclusion holds.Let e µ denote the k -fold self-product e µ = µ × . . . × µ and π : ( R d ) k → R d themap π ( x , . . . , x k ) = k X i =1 x i . Then ν = π e µ , and, since e µ = E i = i ( e µ x,i ) , we also have by linearity ν = E i = i ( π ( e µ x,i )) . Thus, by Corollary 3.13 and an application of Markov’s in-equality, there is a δ > , depending only on δ and d , such that if m is largeenough as a function of δ then the proposition will follow if we show that withprobability > − δ over the choice of the component e µ x,i of e µ , the measure τ = π ( e µ x,i ) satisfies P j = j (cid:0) τ y,j is ( V, δ , m ) -uniform (cid:1) > − δ . If we manage to define a random subspace W = W ( e µ x,i ) such that P j = j (cid:18) d ( W, V ) < √ d − ( m +1) and τ y,j is ( W, δ , m + 1) -uniform (cid:19) > − δ , then the previous inequality follows by applying Lemma 3.21 to each component η y,i in the last event (we use here the assumption that m is large relative to δ ).We thus aim to define W such that (33) holds.Set η = π ( e µ x,i ) and notice that, with τ as before, the distribution of thecomponents τ y,j is the same as the distribution of the components of η z,j − i .Thus what we really aim to prove is that we P j = j − i (cid:18) d ( W, V ) < √ d − ( m +1) and η y,j is ( W, δ , m + 1) -uniform (cid:19) > − δ . 
(33)A random component e µ x,i is itself a product measure e µ x,i = µ x ,i × . . . × µ x k ,i (here x = ( x , . . . , x k ) ), and the marginal measures µ x j ,i of thisproduct are distributed independently according to the distribution of the re-scaled components of µ at level i . Recall that Σ( π ( µ x ,i × . . . × µ x k ,i )) = k X j =1 Σ( µ x j ,i ) (34)Fixing a parameter δ which will depend on σ, δ , by the weak law of largenumbers, if k is large enough in a manner depending on δ , then with probability > − δ over the choice of e µ x,i we will have (cid:13)(cid:13)(cid:13)(cid:13) k Σ( π e µ x,i ) − A (cid:13)(cid:13)(cid:13)(cid:13) < δ . (35) We use here the fact that we have a uniform bound for the rate of convergence in the weaklaw of large numbers for i.i.d. random variables X , X , . . . . In fact, the rate can be boundedin terms of the mean and variance of X n . Here X n are matrix-valued (they are distributed likethe covariance matrix of the level- i components of µ ), and therefore the mean and varianceof the components of X n can be bounded independently of the measure µ ∈ P ([0 , d ) . µ r ( A ) > σ and λ r +1 ( A ) < ρ , and assumingas we may that ρ < k/ , we can choose δ in a manner depending on σ, δ , insuch a way that (35) implies λ r ( π e µ x,i ) > kσ λ r +1 ( π e µ x,i ) < kσ and such that, if we write W x,i = eigen ...r Σ( π e µ x,i ) = eigen ...r Σ( π e µ x,i ) ,then d ( W x,i , V ) < √ d − ( m +1) Thus, assuming that k is large enough, we have shown P i = i (cid:18) λ r ( π e µ x,i ) > kσ and d ( W x,i , V ) < √ d − ( m +1) (cid:19) > − δ (36)Next, fix such a k . 
By hypothesis λ r +1 ( A ) = λ r +1 ( E i = i (Σ( µ x,i ))) < ρ , so byLemma 4.6 (3), P i = i (cid:16) µ x,i is ( V, O ( ρ / )) -concentrated (cid:17) > − O ( √ ρ ) Using again the fact that e µ x,i is a product of k independent copies of level- i components of µ , the last inequality implies P i = i (cid:16) all marginals of e µ x,i are ( V, O ( ρ / ) -concentrated (cid:17) > − O ( k √ ρ ) (37)If e µ x,i is in the event above, then all its marginals are ( V, O ( ρ / )) -concentrated,so by Lemma 4.3, π ( e µ x,i ) is ( V, O ( kρ / )) -concentrated. By Lemma 4.6 (2), λ r +1 ( π ( e µ x,i )) < O k ( ρ / ) , and we conclude that P i = i (cid:16) λ r +1 ( π ( e µ x,i )) ≤ O k ( ρ / ) (cid:17) > − O ( k √ ρ ) Combining this with (36) and assuming that ρ is sufficiently small relative to δ , we have P i = i λ r ( π e µ x,i ) > kσ λ r +1 ( π ( e µ x,i )) < O k ( ρ / ) d ( W x,i , V ) < √ d − ( m +1) > − δ Let e µ x,i belong to the event above. Let us recall the dependences of the pa-rameters: δ is given and determines δ , then m is large relative to δ , then δ small depending on σ, δ ! , then k is correspondingly large, and ρ correspondinglysmall. So we can assume that k is large enough, and ρ small enough, to applyProposition 4.9 with parameters δ , m + 1 and σ/ , and conclude that there isa p = p ( δ , m + 1 , k ) = p ( δ, m, k ) such that, writing η = π e µ x,i , P j = j − i (cid:0) η y,j is ( W x,i , δ , m + 1) -uniform (cid:1) > − δ . This and the estimate above on the probability that d ( W x,i , V ) < √ d − ( m +1) give (33), which is what we wanted. 52 heorem 4.12. Let δ > and m ∈ N . Then there exists a ≤ k ≤ k ( δ, m ) such that for all sufficiently large n ≥ n ( δ, m, k ) , the following holds: For any µ ∈ P ( R d ) there is a sequence V , . . . , V n of subspaces of R d such that, writing ν = µ ∗ k , P ≤ i ≤ n (cid:0) ν x,i is ( V i , δ, m ) -saturated (cid:1) > − δ and P ≤ i ≤ n (cid:0) µ x,i is ( V i , δ ) -concentrated (cid:1) > − δ. Proof.
It is a formal consequence of Proposition 3.19 that we may assume that m is large in a manner depending on δ . We also may assume that δ < / .Also, since we are free to take R large relative to n , we can assume that µ issupported on [0 , d .Let k ( · ) , p ( · ) , ρ ( · ) be as in Proposition 4.11. We assume, without loss ofgenerality, that these functions are monotone in each of their arguments.Let c > denote a constant good for all previous big- O bounds.The proof will depend on a function e ρ : (0 , d ] → (0 , d ] such that e ρ ( σ ) issmall in a manner depending on σ, δ, m . Specifically, we require that e ρ satisfythe following inequalities, where exp ( y ) = 2 y (for concreteness, on could define e ρ ( σ ) to be one-half the minimum of the right-hand sides): e ρ ( σ ) < σ, (38) e ρ ( σ ) < ρ ( σ, δ/ , m, k ( σ, δ/ , m )) , (39) e ρ ( σ ) < δ c (2 d ) , (40) e ρ ( σ ) < c exp ( − d + 1) · ([log p k ( σ, δ/ , m )] − p ( σ, δ/ , m )) δ/ , (41) e ρ ( σ ) < c · ( δ ( √ d + 1) · · d − ) · d . (42)As before define λ (Σ i ) = d and λ d +1 (Σ i ) = 0 . Fix n and µ , we shall latersee how large an n is desirable. For ≤ q ≤ n write Σ q = E i = q (cid:0) Σ( µ x,i ) (cid:1) . Define a sequence σ > σ > . . . by σ = d and σ i = e ρ ( σ i − ) (the sequence isdecreasing because of (38)). For a covariance matrix Σ and s ∈ N , set N s (Σ) = { ≤ j ≤ d : λ j (Σ) ∈ ( σ s , σ s − ] } . Claim . There is an s ≤ ⌈ d/δ ⌉ satisfying P ≤ q ≤ n ( N s (Σ q ) = 0) > − δ . In [12] the corresponding statement holds for all large enough k . The reason the size of k must be restricted is, roughly, that if µ is concentrated extremely near a subspace V then itwill remain so for a reasonable number of convolutions, but too many convolutions will makeit drift away from V . The corresponding theorem in [12] is stated differently, in terms of disjoint subsets
I, J ⊆{ , . . . , n } . See remark after Theorem 2.8. roof. Note that P ∞ r =1 N r (Σ q ) = d , so ⌈ d/δ ⌉ X s =1 E ≤ q ≤ n ( N i (Σ q )) = E ≤ q ≤ n ( ⌈ d/δ ⌉ X s =1 N i (Σ q )) ≤ d. (43)Thus there must exist an s ≤ ⌈ d/δ ⌉ such that E ≤ q ≤ n ( N s (Σ q )) ≤ d ⌈ d/δ ⌉ < δ . Since N i ( · ) is integer valued, we have P ≤ q ≤ n ( N s (Σ q ) ≥ ≤ E ≤ q ≤ n ( N s (Σ q )) , so this is the desired s .Fix an s that satisfies the conclusion of the lemma, write σ = σ s − ρ = σ s = e ρ ( σ ) , and set k = k ( σ, δ , m ) . Note that k is bounded above by some expression k ( δ, m ) (also dependingimplicitly on the choice of the function e ρ ), as in the statement, since its largestpossible value occurs for s = [1 + 2 d/δ ] , and once the function e ρ is fixed, themagnitude σ , and hence k , is bounded.Let I = { ≤ q ≤ n : N s (Σ q ) = 0 } . By our choice of s , | I | ≥ (1 − δ n + 1) . For q ∈ I let ≤ r q ≤ d denote the smallest integer such that λ r q (Σ q ) ≥ σ and λ r q +1 (Σ q ) < ρ, which exists by definition, and set W q = eigen ,...,r q (Σ q ) . We define W q = R d for q / ∈ I . Finally, write ℓ = [log √ k ] − p ( σ, δ , m ) . Claim . For q ∈ I , P i = q (cid:18) ν x,i − ℓ is ( W i , δ , m ) -saturated (cid:19) > − δ (44) P i = q (cid:16) µ x,i is ( W i , cρ / ) -concentrated (cid:17) > − cρ / . (45)54 roof. The first inequality follows from Proposition 4.11 and our choice ofparameters, specifically the definition of ℓ and assumption (39). The sec-ond follows from Lemma 4.6 (3) applied to the random component µ x,i , since W q = eigen ,...,r q , E i = q (cid:0) λ r q +1 ( µ x,i ) (cid:1) < ρ .This is almost what we want, except that in (44) the level of the componentis shifted by ℓ (that is, ν x,i − ℓ appears instead of ν x,i ). To correct this weapply Corollary 3.32 to (45) with parameter cρ / (we can do this since we areassuming that n is large relative to ρ ). 
Then, writing ρ ′ = 3 · d − c / (4 · d ) ρ / (24 · d ) , we conclude the there are subspaces W ′ i ≤ W i such that for all ≤ q ≤ n , P i = q (cid:0) µ x,i is ( W ′ i , ρ ′ ) -concentrated (cid:1) > − d p cρ / > − δ (46)(the last inequality by (40)), and n + 1 (cid:8) ≤ q ≤ n : d ( W ′ q , W ′ q − ℓ ) ≤ ρ ′ (cid:9) ≥ − d + 1) ℓ log(1 /cρ / ) > − δ (the last inequality in by assumption (41)). Let J = { i ∈ I , d ( W ′ i , W ′ i − ℓ ) ≤ ρ ′ } . Since n +1 | I | > − δ/ , the previous equation implies that n + 1 | J | ≥ − δ. (47)Now, for any ℓ ≤ q ≤ n , applying (46) to q − ℓ we have P i = q (cid:0) ν x,i − ℓ is ( W ′ i − ℓ , ρ ′ ) -concentrated (cid:1) > − δ. Assuming also q ∈ J , we also have d ( W ′ q − ℓ , W ′ q ) ≤ ρ ′ , so by Lemma 3.21 (1)applied to each component ν x,i − ℓ in the event above, P i = q (cid:16) ν x,i − ℓ is ( W ′ i , ( √ d + 1) ρ ′ ) -concentrated (cid:17) > − δ for q ∈ J. Our assumption (42) implies that ( √ d + 1) ρ ′ < δ , and the last inequality yields P i = q (cid:0) ν x,i − ℓ is ( W ′ i , δ ) -concentrated (cid:1) > − δ for q ∈ J. (48)On the other hand for q ∈ J we have q ∈ I and so (44) holds. Since W ′ q ≤ W q ,by Lemma 3.21 (4), we have P i = q (cid:18) µ x,i − ℓ is ( W ′ i , δ O ( 1 m ) , m ) -saturated (cid:19) > − δ for q ∈ J. Since we are assuming m large enough relative to δ , this implies P i = q (cid:0) µ x,i − ℓ is ( W ′ i , δ, m ) -saturated (cid:1) > − δ for q ∈ J. (49)In conclusion, if we define V i = W ′ i + ℓ and replace I by ( J − ℓ ) ∩ [0 , n ] , thenequations (47), (48), and (49) give the desired conclusion, assuming that m, δ have the appropriate relationship to each other and to ε , and that n and is largeenough. 55 .5 The Ka˘ımanovich-Vershik lemma The second ingredient in our proof of Theorem 2.8 is the following entropyanalog of the Plünnecke-Rusza inequality:
Lemma 4.15 (Kaĭmanovich–Vershik, [18]) . Let Γ be a countable abelian group and let µ, ν ∈ P (Γ) be probability measures with H ( µ ) < ∞ , H ( ν ) < ∞ . Let δ k = H ( µ ∗ ( ν ∗ ( k +1) )) − H ( µ ∗ ( ν ∗ k )) . Then δ k is non-increasing in k . In particular, H ( µ ∗ ( ν ∗ k )) ≤ H ( µ ) + k · ( H ( µ ∗ ν ) − H ( µ )) .

This lemma first appears in a study of random walks on groups by Kaĭmanovich and Vershik [18]. It was more recently rediscovered and applied in additive combinatorics by Madiman and co-authors [24, 25], and in a weaker form independently by Tao [31]. For a proof using our notation see [12]. For non-discrete measures in R d we have the following analog:

Corollary 4.16.
Let µ, ν ∈ P ( R d ) with H n ( µ ) , H n ( ν ) < ∞ . Then H n ( µ ∗ ( ν ∗ k )) ≤ H n ( µ ) + k · ( H n ( µ ∗ ν ) − H n ( µ )) + O ( k/n ) .

The error term arises in the same way as in Lemma 4.1. For the proof see [12] (the passage from R to R d requires only notational changes). We now prove Theorem 2.8, which we re-state for convenience.
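As an aside, the discrete Kaĭmanovich–Vershik inequality of Lemma 4.15 is easy to check numerically for finitely supported measures on Z. The following sketch (illustrative only; the helper names are ours, not the paper's) computes H ( µ ∗ ν ∗ k ) for small measures and verifies that the increments δ k are non-increasing, hence H ( µ ∗ ν ∗ k ) ≤ H ( µ ) + k ( H ( µ ∗ ν ) − H ( µ )).

```python
from math import log

def entropy(p):
    """Shannon entropy (in nats) of a dict mapping atoms to probabilities."""
    return -sum(w * log(w) for w in p.values() if w > 0)

def convolve(p, q):
    """Convolution of two finitely supported measures on the integers."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

# Arbitrary finitely supported probability measures on Z.
mu = {0: 0.5, 1: 0.3, 5: 0.2}
nu = {0: 0.6, 2: 0.4}

# H[k] = H(mu * nu^{*k}) for k = 0, ..., 5.
H, conv = [], dict(mu)
for k in range(6):
    H.append(entropy(conv))
    conv = convolve(conv, nu)
deltas = [H[k + 1] - H[k] for k in range(5)]

# The increments delta_k are non-increasing (the lemma, on the group Z) ...
assert all(deltas[k + 1] <= deltas[k] + 1e-12 for k in range(4))
# ... which gives H(mu * nu^{*k}) <= H(mu) + k * (H(mu*nu) - H(mu)).
assert all(H[k] <= H[0] + k * (H[1] - H[0]) + 1e-9 for k in range(6))
```

Any other finitely supported choices of µ, ν work equally well, since Z is abelian; the check is purely a sanity test of the monotonicity statement, not of the continuous Corollary 4.16.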
Theorem 4.17.
For every ε > 0 , R > 0 and m ∈ N , there exists δ = δ ( ε, R, m ) > 0 such that for all n > n ( ε, R, m, δ ) , the following holds: if ν, µ ∈ P ([ − R, R ] d ) and H n ( µ ∗ ν ) < H n ( µ ) + δ, then there exists a sequence V 0 , . . . , V n ≤ R d of subspaces such that P 0 ≤ i ≤ n ( µ x,i is ( V i , ε, m ) -saturated and ν x,i is ( V i , ε ) -concentrated ) > 1 − ε.

Proof.
Fix ε > 0 . It is a formal consequence of Proposition 3.19 that it suffices for us to prove the theorem under the assumption that m is large in a manner depending on ε . We can also assume that ε < 1 / 2 , and that ε is small with respect to d . Also, as we are free to choose n large relative to R , the distribution on components depends negligibly on dyadic scales greater than 0 , and the scale- n entropy of µ and µ ∗ ν differs negligibly from the same entropy conditioned on D 0 . Thus, without loss of generality, we can assume that the measures are supported on [0 , 1] d , and we omit mention of R from now on. Choose k = k ( ε, m ) as in Theorem 4.12. We shall show that the conclusion holds if n is large relative to the previous parameters. Let µ, ν ∈ P ([0 , 1] d ) . Denote τ = ν ∗ k . If n is large enough, Theorem 4.12 provides us with subspaces V 0 , . . . , V n ⊆ R d such that P 0 ≤ i ≤ n ( ν x,i is ( V i , ε ) -concentrated ) ≥ 1 − ε, (50) and P 0 ≤ i ≤ n ( τ x,i is ( V i , ε, m ) -saturated ) > 1 − ε. If it holds that P 0 ≤ i ≤ n ( µ x,i is ( V i , ε, m ) -saturated ) > 1 − ε (51) then we are done, since (50) and (51) together are the second alternative of the theorem we want to prove (with a multiple of ε instead of ε , but this is formally equivalent). Otherwise, by Lemma 4.2 and the above we have P 0 ≤ i ≤ n ( H m ( µ x,i ∗ τ y,i ) > H m ( µ x,i ) + ε − O (1 /m ) ) ≥ P 0 ≤ i ≤ n ( µ x,i is not ( V i , ε, m ) -saturated and τ y,i is ( V i , ε, m ) -saturated ) > P 0 ≤ i ≤ n ( µ x,i is not ( V i , ε, m ) -saturated ) − ( 1 − P 0 ≤ i ≤ n ( τ y,i is ( V i , ε, m ) -saturated ) ) > 2 ε − (1 − (1 − ε )) = ε. Let δ ′ ( µ x,i , τ y,i ) = H m ( µ x,i ∗ τ y,i ) − H m ( µ x,i ) . By the previous calculation, with probability at least ε we have δ ′ ≥ ε − O (1 /m ) , and by Lemma 4.1 we always have δ ′ ≥ − O (1 /m ) . Thus E 0 ≤ i ≤ n ( δ ′ ( µ x,i , τ y,i ) ) ≥ ε 2 − O (1 /m ) . Thus, by Lemmas 3.5 and 3.6, H n ( µ ∗ τ ) > E 0 ≤ i
For every ε > 0 and m ∈ N , there are ε ′ = ε ′ ( ε, m ) → 0 and m ′ = m ′ ( ε, m ) → ∞ as ε → 0 and m → ∞ , such that the following holds. Suppose that θ, η are independent P ( R d ) -valued random variables defined on a probability space (Ω , F , P ) and that V = V ( θ, η ) ≤ R d is a linear subspace determined by the random measures θ, η (hence V is random). If P ( θ is ( V, ε ′ , m ′ ) -saturated and η is ( V, ε ′ ) -concentrated ) > 1 − ε ′ , (52) then there is a deterministic subspace V ∗ such that P 0 ≤ i ≤ m ′ ( θ x,i is ( V ∗ , ε, m ) -saturated and η y,i is ( V ∗ , ε ) -concentrated ) > 1 − ε. (53) (The probability in the last equation is over both the measures θ, η and i, x, y , independently.)

Proof. It is a formal consequence of Proposition 3.19 that it is enough to prove the statement under the assumption that m is large relative to ε . Fix ε . Assume m, m ′ large relative to ε , and ε ′ small relative to the other parameters. We shall show that (52) implies (53). Apply Proposition 3.29 to η with parameter ε ′ . We obtain a random subspace V c = V c ( η ) and a constant C c ≥ 1 (which we will later assume is large compared to another constant D ) such that, for δ c = C c · ( ε ′ ) / d , η is ( V c , δ c ) -concentrated and if η is ( W, ε ′ ) -concentrated then V c ⊑ W ( δ c ) . Apply Proposition 3.30 to θ with parameter m ′ . We obtain a random subspace V s = V s ( θ ) such that, for a constant C s and δ s = C s log( m ′ ) /m ′ , the measure θ is ( V s , δ s , m ′ ) -saturated and if θ is ( W, 1 /m ′ , m ′ ) -saturated, then W ⊑ V ( δ s ) s . We shall assume that ε ′ < 1 /m ′ . Thus, if θ is ( W, ε ′ , m ′ ) -saturated then W ⊑ V ( δ s ) s . The random subspace V satisfies (52), so we have P ( V c ⊑ V ( δ c ) and V ⊑ V ( δ s ) s ) > 1 − ε ′ . Thus, writing δ = δ c + δ s , we have P ( V c ⊑ V ( δ ) s ) > 1 − ε ′ . (54) Let W = { W 1 , . . . , W N } denote a minimal δ c -dense sequence of subspaces with respect to the metric (20).
This metric is bi-Lipschitz equivalent to a smooth metric on the compact manifold of subspaces, so N ≤ D · δ − [ d 2 / 4] c for some universal constant D > 0 (here [ d 2 / 4] is the dimension of the space of subspaces). Let W 0 = { W ∈ W : P ( d ( V c , W ) < δ c ) > δ c /N } . Apply Proposition 3.27 to W 0 with parameter δ to obtain the parameter δ ′ = 4 · / d · δ / d and a non-trivial subspace V ∗ such that a. W ⊑ V ( δ ′ ) ∗ for all W ∈ W 0 , b. If e V ∗ is another subspace such that W ⊑ e V (2 δ ) ∗ for all W ∈ W 0 , then V ∗ ⊑ e V (2 δ ′ ) ∗ . We claim that V ∗ is the desired subspace. Writing W 1 = W \ W 0 , P ( d ( V c , W ) ≥ δ c for all W ∈ W 0 ) = P ( V c / ∈ ∪ W ∈W 0 B δ c ( W )) ≤ P ( V c ∈ ∪ W ∈W 1 B δ c ( W )) ≤ Σ W ∈W 1 P ( V c ∈ B δ c ( W )) ≤ |W 1 | · δ c /N < δ c , where in the second step we used that ∪ W ∈W 0 ∪W 1 B δ c ( W ) covers all subspaces, and in the last line we used |W 1 | ≤ |W| = N . Hence P ( d ( V c , W ) < δ c for some W ∈ W 0 ) > 1 − δ c . Consequently, by property (a) of V ∗ and the fact that δ c ≤ δ ′ , P ( V c ⊑ V (2 δ ′ ) ∗ ) > 1 − δ c . (55) Since V c is a function of η and V s is a function of θ , and since η, θ are independent, also each of the pairs ( V c , θ ) and ( V c , V s ) is independent. Therefore, for a.e. value θ 0 of θ , P ( d ( V c , W ) < δ c | θ = θ 0 ) = P ( d ( V c , W ) < δ c ) > δ c /N for all W ∈ W 0 . (56) Observe that δ c /N ≥ D δ d / c = C d / c D ( ε ′ ) (1+[ d / / d > ( ε ′ ) 1 / 2 , where, to justify the last inequality, we increase the constant C c if necessary to ensure C d / c /D ≥ 1 , and note that (1 + [ d / / d < 1 / 2 .
Thus if a fixed measure θ 0 satisfies P ( V c ⊑ V ( δ ) s | θ = θ 0 ) > 1 − √ ε ′ (57) then, by (56) and (57), for all W ∈ W 0 , P ( W ⊑ V (2 δ ) s | θ = θ 0 ) ≥ P ( d ( V c , W ) < δ c and V c ⊑ V ( δ ) s | θ = θ 0 ) ≥ P ( V c ⊑ V ( δ ) s | θ = θ 0 ) − (1 − P ( d ( V c , W ) < δ c | θ = θ 0 )) > (1 − √ ε ′ ) − (1 − √ ε ′ ) = 0 . Since V s is a function of θ , this says that for θ 0 satisfying (57) we have W ⊑ V (2 δ ) s for each W ∈ W 0 ; consequently, by property (b) of the definition of V ∗ , for such θ 0 we have that V ∗ ⊑ V (2 δ ′ ) s . By Markov’s inequality and (54), the relation (57) holds with probability at least 1 − √ ε ′ over the choice of θ . Thus we conclude P ( V ∗ ⊑ V (2 δ ′ ) s ) > 1 − √ ε ′ . (58) Combining (55) and (58) and using √ ε ′ ≤ δ c , we find that P ( V c ⊑ V (2 δ ′ ) ∗ and V ∗ ⊑ V (2 δ ′ ) s ) > 1 − 2 δ c . Finally, fix η, θ and associated V s , V c belonging to this event; we have that η is ( V c , δ c ) -concentrated. Therefore by Lemma 3.17, P 0 ≤ i ≤ m ′ ( η x,i is ( V c , √ δ c ) -concentrated ) > 1 − √ δ c (the probability here is over the choice of i, x , with η, θ fixed). Since V c ⊑ V (2 δ ′ ) ∗ , and assuming as we may that ε ′ , and hence δ c , is small enough relative to ε, m ′ , this implies P 0 ≤ i ≤ m ′ ( η x,i is ( V ∗ , ε ) -concentrated ) > 1 − ε . (59) Similarly, θ is ( V s , δ s , m ′ ) -saturated, so arguing in the same manner and using Lemma 3.16, P 0 ≤ i ≤ m ′ ( θ y,i is ( V s , √ ( dδ s ) + O ( m/m ′ ) , m ) -saturated ) > 1 − O ( √ δ s + m/m ′ ) . Assuming ε ′ is small enough and m ′ large enough relative to ε, m , the constant δ ′ can be assumed arbitrarily small compared to ε, m . Since V ∗ ⊑ V (2 δ ′ ) s we have d ( V ∗ , π V s V ∗ ) < δ ′ , so by Lemma 3.21 and (4), and assuming the parameters satisfy the appropriate relationship, we have P 0 ≤ i ≤ m ′ ( θ y,i is ( V ∗ , ε, m ) -saturated ) > 1 − ε .
(60) Thus, combining (59) and (60) for η, θ in the event in (58), we have P 0 ≤ i ≤ m ′ ( θ y,i is ( V ∗ , ε, m ) -saturated and η x,i is ( V ∗ , ε ) -concentrated ) > 1 − 2 ε. Using (58) and assuming as we may that √ ε ′ < ε/ 2 , we obtain (53).

Corollary 5.2.
Let ε > 0 and m ∈ N . Then there exist ε ′′ = ε ′′ ( ε, m ) → 0 and m ′′ = m ′′ ( ε, m ) → ∞ as ε → 0 and m → ∞ , such that for all large enough n , the following holds. Suppose that we are given subspaces V ( i,x,y ) for 0 ≤ i ≤ n and x, y ∈ [0 , 1] d and measures θ, η ∈ P ([0 , 1] d ) such that P 0 ≤ i ≤ n ( θ x,i is ( V ( i,x,y ) , ε ′′ , m ′′ ) -saturated and η y,i is ( V ( i,x,y ) , ε ′′ ) -concentrated ) > 1 − ε ′′ . Then there are subspaces V i ≤ R d such that P 0 ≤ i ≤ n ( θ x,i is ( V i , ε, m ) -saturated and η y,i is ( V i , ε ) -concentrated ) > 1 − ε.

Proof.
Apply the previous proposition to obtain ε ′ = ε ′ ( ε, m ) and m ′ = m ′ ( ε, m ) , and set m ′′ = m ′ and ε ′′ = min { ( ε ′ ) 2 , 1 /m ′ } . For 0 ≤ k ≤ n , let p k = P i = k ( θ x,i is ( V ( i,x,y ) , ε ′′ , m ′′ ) -saturated and η y,i is ( V ( i,x,y ) , ε ′′ ) -concentrated ) , and assume as in the hypothesis that (1 / ( n + 1)) Σ n k =0 p k > 1 − ε ′′ . Let I ⊆ { 0 , . . . , n } denote the set of k such that p k > 1 − √ ε ′′ = 1 − ε ′ , so by Markov, | I | > (1 − ε ′ )( n + 1) . For i ∈ I , consider the random and independently chosen components θ x,i , η y,i and the subspace V i = V ( i,x,y ) . Without loss of generality we may assume that V ( i,x,y ) depends only on θ x,i and η y,i , since the only stated property of V ( i,x,y ) involves these components. By our choice of ε ′ , m ′ , we conclude that there exists a subspace V i such that P i ≤ j ≤ i + m ′ ( θ x,j is ( V i , ε, m ) -saturated and η y,j is ( V i , ε ) -concentrated ) > 1 − ε . The remainder of the argument involves choosing one of these subspaces V i ( j ) , for every j ∈ ∪ u ∈ I [ u, u + m ′ ] . The details are identical to the proof of Proposition 3.19.

We will actually need a more general version of the last corollary, but, as the proof is identical to the one above, we only give the statement.

Corollary 5.3.
Let ε > 0 and m ∈ N . Then there exist ε ′′ = ε ′′ ( ε, m ) → 0 and m ′′ = m ′′ ( ε, m ) → ∞ as ε → 0 , such that for all large enough n , the following holds. Suppose that θ ∈ P ( G ) , η ∈ P ( R d ) , and that for 0 ≤ i ≤ n , x ∈ supp η and g ∈ supp θ there are subspaces V ( i,x,g ) such that P 0 ≤ i ≤ n ( η x,i is ( V ( i,x,g ) , ε ′′ , m ′′ ) -saturated and S i U − 1 g ( θ g,i . x ) is ( V ( i,x,g ) , ε ′′ ) -concentrated ) > 1 − ε ′′ . Then there are subspaces V i ≤ R d such that P 0 ≤ i ≤ n ( η x,i is ( V i , ε, m ) -saturated and S i U − 1 g ( θ g,i . x ) is ( V i , ε ) -concentrated ) > 1 − ε.

G -components

We turn our attention to measures η ∈ P ( R d ) of the form η = θ . x for some θ ∈ P ( G ) and x ∈ R d . Our goal is to show that the concentration properties of typical components η y,i translate to similar properties of the “components” θ g,i . x . The issue which we must overcome is that θ g,i . x is supported on D G i ( g ) . x , and this set generally intersects more than one dyadic cell of D d i . Thus even if η is highly concentrated on a translate of a subspace W on each of these cells, taken together all one can say is that θ g,i . x is concentrated on the union of several translates of W . For a linear subspace W ≤ R d we say that a measure η ∈ P ( R d ) is ( W, δ ) m -concentrated if for some m ′ ≤ m there are m ′ translates W 1 , . . . , W m ′ of W such that η ( ∪ m ′ u =1 W ( δ ) u ) ≥ 1 − δ . Thus ( W, δ ) 1 -concentration is the same as ( W, δ ) -concentration.

Lemma 5.4.
Let
R > 0 , let θ ∈ P ( G ) and x ∈ [ − R, R ] d . Suppose that δ > 0 , m ∈ N and that θ . x is ( W, δ ) m -concentrated. Then for n = [ (1 / 2) log(1 /δ )] and δ ′ = O R,m ( log log(1 /δ ) / log(1 /δ ) ) we have P 0 ≤ i ≤ n ( S i ( θ g,i . x ) is ( W, δ ′ ) -concentrated ) > 1 − δ ′ .

Proof.
Although the “rescaled component” θ g,i is not defined, it will be convenient to introduce the notation e θ g,i . x = S i ( θ g,i . x ) . There is a constant C = C ( R ) ≥ 1 such that θ g,i . x is supported on a set of diameter ≤ C 2 − i , and e θ g,i . x is supported on a set of diameter ≤ C . Let W 1 , . . . , W m be affine subspaces parallel to W verifying that θ . x is ( W, δ ) m -concentrated. We may assume the W u are distinct. For u ≠ v let d u,v = d ( W u , W v ) = min { d ( x, y ) : x ∈ W u , y ∈ W v } . Notice that for any 0 ≤ k ≤ n :

• If 2 − k < d u,v /C for some u, v , then for any g the measure θ g,k . x is supported on a set of diameter at most C 2 − k < d u,v , and on the other hand √ δ ≤ 2 − n ≤ 2 − k ≤ d u,v , hence θ g,k . x gives positive mass to at most one of the sets W ( √ δ ) u , W ( √ δ ) v .

• Let I g,k ⊆ { 1 , . . . , m } be the set of indices u such that ( θ g,k . x )( W ( δ ) u ) > 0 . Given ρ > 0 , if all distinct u, v ∈ I g,k satisfy d u,v ≤ ρ 2 − k , then there is a translate W g,k of W such that ∪ u ∈ I g,k W ( δ ) u ∩ supp( θ g,k . x ) ⊆ W ( ρ 2 − k + 2 δ ) g,k , so e θ g,k . x is ( W, ρ + 2 k +1 δ ) -concentrated.

Now, for 0 ≤ k ≤ n we have the identity θ . x = E i = k ( θ g,i . x ) . Using the hypothesis that ( θ . x )( ∪ m u =1 W ( δ ) u ) > 1 − δ and Markov’s inequality we conclude that P i = k ( ( θ g,i . x )( ∪ m u =1 W ( δ ) u ) > 1 − √ δ ) > 1 − √ δ. (61) Fix a small parameter ρ > 0 , and suppose that k satisfies: For each 1 ≤ u < v ≤ m either 2 − k < d u,v /C or d u,v ≤ ρ 2 − k , (62) or, equivalently, that k does not belong to any of the intervals J u,v = [log( ρ/d u,v ) , log( C/d u,v )) . Then, setting σ = σ ( ρ ) = max {√ δ, ρ + 2 k +1 δ } , the two observations above and (61) imply P i = k ( e θ g,k . x is ( W, σ ) -concentrated ) > 1 − √ δ. Note that 2 k δ ≤ 2 n δ ≤ √ δ , so in fact σ ≤ ρ + 2 √ δ . Next, since the length of J u,v is log( C/ρ ) and there are at most m ( m − 1) distinct values of d u,v for 1 ≤ u, v ≤ m , the fraction of 0 ≤ k ≤ n which satisfy (62) is at least 1 − m ( m − 1) log( C/ρ ) /n .
Averaging the last equation over k = 0 , . . . , n , we conclude that P 0 ≤ k ≤ n ( e θ g,k . x is ( W, ρ + 2 √ δ ) -concentrated ) > 1 − √ δ − m ( m − 1) log(4 C/ρ ) /n = 1 − √ δ − O m ( log(1 /ρ ) / log(1 /δ ) ) . Choosing ρ = 1 / log(1 /δ ) gives the desired result.

Proposition 5.5. For every ε > 0 and R > 0 there exists n = n ( ε, R ) (with n ( ε, R ) → ∞ as ε → 0 ) and a δ = δ ( ε, R ) > 0 (with δ ( ε, R ) → 0 as ε → 0 ) such that the following holds. Let ν ∈ P ( G ) and x ∈ [ − R, R ] d , and write η = ν . x . Let V < R d be a linear subspace and k ∈ N such that P j = k ( η x,j is ( V, δ ) -concentrated ) > 1 − δ. Then P k ≤ j ≤ k + n ( S j ( ν g,j . x ) is ( V, ε ) -concentrated ) > 1 − ε.

Proof.
Fix ε, n, δ for the moment and assume that the hypothesis holds. Consider the identities E j = k ( η x,j ) = η = E j = k ( ν g,j . x ) . This means that the measures ν g,k . x are ( ν -almost-surely over the choice of g ) absolutely continuous with respect to the weighted average of the components η x,k . In fact, since each ν g,k . x is supported on a set that intersects m = O R (1) level- k dyadic cells, each “component” ν g,k . x is absolutely continuous with respect to the average of these O (1) components η x,k . Most of these components are ( V, δ ) -concentrated, so a Markov-inequality argument (similar to the one in Lemma 6.5 below) shows that P j = k ( ν g,j . x is ( V, 2 − k δ ′ ) m -concentrated ) > 1 − δ ′ , where δ ′ → 0 as δ → 0 . Equivalently, P j = k ( S j ( ν g,j . x ) is ( V, δ ′ ) m -concentrated ) > 1 − δ ′ . Apply the previous lemma to each component θ = ν g,k in the event above with parameter δ ′ . Taking δ ′′ = O R,m (log log(1 /δ ′ ) / log(1 /δ ′ )) and n = [ (1 / 2) log(1 /δ ′ )] , the conclusion is P k ≤ i ≤ k + n ( S i ( ν g,i . x ) is ( V, δ ′′ ) -concentrated ) > 1 − δ ′′ , and δ ′′ can be made arbitrarily small by taking δ small. This is what was claimed.

Proposition 5.6.
For every δ > 0 , R > 0 and m ∈ N , if m ′ > m ′ ( δ, m, R ) and 0 < δ ′ < δ ′ ( δ, m, R ) , then for all large enough n (depending on the previous parameters), the following holds. Let µ ∈ P ( R d ) , ν ∈ P ( G ) and x ∈ R d , and write η = ν . x . Let V 0 , V 1 , . . . , V n ≤ R d be linear subspaces, and suppose that P 0 ≤ j ≤ n ( µ x,j is ( V j , δ ′ , m ′ ) -saturated and η y,j is ( V j , δ ′ ) -concentrated ) > 1 − δ ′ . Then there are subspaces V ′ 0 , V ′ 1 , . . . , V ′ n such that P 0 ≤ j ≤ n ( µ x,j is ( V ′ j , δ, m ) -saturated and S j ( ν g,j . x ) is ( V ′ j , δ ) -concentrated ) > 1 − δ.

Proof.
Let δ, R, m be given. Fix a small auxiliary parameter δ 0 which we will specify later and let m 0 be the number n ( δ 0 , R ) from the previous proposition; in particular it can be made arbitrarily large by making δ 0 small. Let ε 0 = ε ( δ 0 ) be as in the previous proposition, and let δ 1 = ( ε 0 ) 2 . Then for all small enough δ ′ and large enough m ′ , the hypothesis implies, by Proposition 3.19, that there are subspaces V ′′ j such that P 0 ≤ j ≤ n ( µ x,j is ( V ′′ j , δ 1 , m 0 ) -saturated and η y,j is ( V ′′ j , δ 1 ) -concentrated ) > 1 − δ 1 . By Markov’s inequality, the set I ⊆ { 0 , . . . , n } consisting of k such that P j = k ( µ x,j is ( V ′′ j , δ 1 , m 0 ) -saturated and η y,j is ( V ′′ j , δ 1 ) -concentrated ) > 1 − √ δ 1 has size | I | ≥ (1 − √ δ 1 )( n + 1) . Since δ 1 < ε 0 , by our choice of m 0 and ε 0 , for each k ∈ I we have P k ≤ j ≤ k + m 0 ( S j ( ν g,j . x ) is ( V ′′ k , δ 0 ) -concentrated ) > 1 − δ 0 . Also, applying Lemma 3.16 to each ( V ′′ k , δ 1 , m 0 ) -saturated component µ x,k of µ , we find that P k ≤ j ≤ k + m 0 ( µ x,j is ( V ′′ k , ( d √ δ 1 ) 1 / 2 + O ( m/m 0 ) , m ) -saturated ) > 1 − ( d √ δ 1 ) 1 / 2 − O ( m/m 0 ) . Now, by choosing δ 0 small enough we can ensure that ε 0 and δ 1 are small, and m 0 is large, relative to δ, m . With suitable choices, one now argues as in the proof of Proposition 3.19 to combine the last two equations over all k ∈ I and define V ′ j with the desired properties.

5.3 Entropy and the G -action on R d

For g = U + a and g ′ = U ′ + a ′ in G and x, x ′ ∈ R d , g ′ x ′ − gx = ( U ′ − U ) x + U ′ ( x ′ − x ) + ( a ′ − a ) . (63) In particular ‖ gx − g ′ x ′ ‖ ≤ ‖ U − U ′ ‖ ‖ x ‖ + ‖ U ′ ‖ ‖ x − x ′ ‖ + ‖ a − a ′ ‖ , so if g, g ′ are in a common level- k dyadic cell and x, x ′ ∈ [ − R, R ] d are in a common level- k dyadic cell, then ‖ gx − g ′ x ′ ‖ = O R (2 − k ) . In particular if ν ∈ P ( G ) and µ ∈ P ([ − R, R ] d ) are both supported on level- k dyadic cells, then ν .
µ is supported on a set of diameter O R (2 − k ) . For a probability measure θ on R d or G it will be convenient in this section to write H i,n ( θ ) = (1 /n ) H ( θ, D i + n ) . (This differs from H i + n ( θ ) because we normalize by 1 /n instead of 1 / ( i + n ) .) In particular H n ( θ ) = H 0 ,n ( θ ) . By the previous paragraph, if θ ∈ P ( R d ) and ν ∈ P ( G ) are supported on level- i dyadic cells then ν . θ is supported on O (1) level- i dyadic cells, so H i,n ( ν . θ ) = (1 /n ) H ( ν . θ, D i + n |D i ) + O (1 /n ) . Also observe that for θ as above, H i,n ( θ ) = H n ( S i θ ) + O (1 /n ) (Lemma 3.1 (5)). We now address the issue, described in Section 2.5, of pairs ν ∈ P ( G ) and x ∈ R d such that ν has substantial entropy but ν . x does not (e.g. because ν is supported close to stab G ( x ) ).

Definition 5.7. For σ > 0 we say that x 1 , . . . , x d +1 ∈ R d are σ -independent if each x i is at distance at least σ from the affine subspace spanned by the others. The action of an element g ∈ G is determined by its action on any ( d + 1) -tuple of affinely independent vectors in R d , in particular on any σ -independent ( d + 1) -tuple.

Proposition 5.8.
For every ε, σ, R > 0 , k ∈ Z and m ∈ N , the following holds. For every σ -independent sequence x 1 , . . . , x d +1 ∈ [ − R, R ] d and every ν ∈ P ( G ) that is supported on a level- k dyadic cell, if H k,m ( ν . x i ) < ε for all i = 1 , . . . , d + 1 , then H k,m ( ν ) < ( d + 1) ε + O σ,R (1 /m ) .

Proof.
Since ν is supported on a level- k dyadic cell, each ν . x i is supported on O (1) level- k dyadic cells, and therefore H ( ν . x i , D k ) = O (1) . Thus the hypothesis is (1 /m ) H ( ν . x i , D k + m ) < ε for i = 1 , . . . , d + 1 , and it is enough to prove that (1 /m ) H ( ν, D G k + m ) < ( d + 1) ε + O σ,R (1 /m ) . Define the map f : G → ( R d ) d +1 by f ( g ) = ( gx 1 , . . . , gx d +1 ) . Then f is a diffeomorphism and one may easily verify that f is uniformly bi-Lipschitz onto its image, with the Lipschitz constants of f and f − 1 depending only on σ and R . Thus (e.g. by Lemma 3.2 (2) applied to f − 1 D d ( d +1) k + m and D G k + m ), | (1 /m ) H ( f ν, D d ( d +1) k + m ) − (1 /m ) H ( ν, D G k + m ) | = O σ,R (1 /m ) . Let π i : ( R d ) d +1 → R d denote the projection to the i -th copy of R d . Then ν . x i = π i ( f ν ) . Therefore, if (1 /m ) H ( ν . x i , D k + m ) < ε for all i = 1 , . . . , d + 1 , then (1 /m ) H ( π i f ν, D k + m ) < ε for all i = 1 , . . . , d + 1 , and so (1 /m ) H ( f ν, D d ( d +1) k + m ) ≤ ( d + 1) ε (because D d ( d +1) k + m = ∨ i π − 1 i D d k + m , and using Lemma 3.1 (4)). The claim follows.

Recall the definition of ( ε, σ ) -non-affine measures, Definition 2.11.

Lemma 5.9. If µ ∈ P ( R d ) is ( ε, σ ) -non-affine and A ⊆ R d is a Borel set with µ ( A ) > (( d + 1) ε ) 1 / ( d +1) , then there exists a σ -independent sequence x 1 , . . . , x d +1 ∈ A .

Proof. Let X 1 , . . . , X d +1 be independent R d -valued random variables, each distributed according to µ . Let V i be the (random) affine subspace spanned by the d vectors { X j } j ≠ i . For each i the vector X i is independent of V i , and X i is distributed according to µ , so, since µ is ( ε, σ ) -non-affine, P ( X i / ∈ V ( σ ) i ) = µ ( R d \ V ( σ ) i ) > 1 − ε. (This fact depends of course on the metric with which we endowed G . In general when applying this type of argument to a non-compact group this is one point where the choice of metric must be carefully considered.) Hence P ( X i / ∈ V ( σ ) i for all i = 1 , . . . , d + 1) > 1 − ( d + 1) ε .
Therefore, if µ ( A ) > (( d + 1) ε ) 1 / ( d +1) , P ( X i / ∈ V ( σ ) i and X i ∈ A for all i = 1 , . . . , d + 1) ≥ P ( X i ∈ A for all i ) − ( d + 1) ε ≥ µ ( A ) d +1 − ( d + 1) ε > 0 . Any realization X 1 , . . . , X d +1 from the event above is σ -independent.

Corollary 5.10.
Let k ∈ Z and let ν ∈ P ( G ) be supported on a level- k dyadic cell. Then for every ε, σ, R > 0 , every ( ε, σ ) -non-affine measure µ ∈ P ([ − R, R ] d ) , and for every m ∈ N , µ ( x ∈ R d : H k,m ( ν . x ) > (1 / ( d + 1)) H k,m ( ν ) − O σ,R (1 /m ) ) > 1 − (( d + 1) ε ) 1 / ( d +1) .

Proof.
Let c = c ( σ, R ) denote the constant in the error term of Proposition 5.8. Let A = { x ∈ R d : H k,m ( ν . x ) ≤ (1 / ( d + 1)) H k,m ( ν ) − c/m } ; we claim that µ ( A ) ≤ (( d + 1) ε ) 1 / ( d +1) . Otherwise, by the previous lemma, there is a σ -independent tuple x 1 , . . . , x d +1 ∈ A . By Proposition 5.8 applied to x 1 , . . . , x d +1 and using the definition of A we have H k,m ( ν ) < ( d + 1)( (1 / ( d + 1)) H k,m ( ν ) − c/m ) + c/m < H k,m ( ν ) , which is a contradiction.

5.4 Linearization of the G -action

Next we utilize the differentiability of the action G × R d → R d , which implies that at small scales, a convolution ν . µ of ν ∈ P ( G ) and µ ∈ P ( R d ) can be well approximated by a Euclidean convolution. Since it is easy to give an elementary argument, we do so. Let g = U + a and g 0 = U 0 + a 0 be elements of G and x, x 0 ∈ R d . Then we have the identity g . x = g . x 0 + U 0 ( x − x 0 ) + ( U − U 0 )( x − x 0 ) . (64) Assuming further that g, g 0 belong to a common level- k dyadic cell in G and x, x 0 belong to a common level- k dyadic cell in R d , we have ‖ U − U 0 ‖ , ‖ x − x 0 ‖ = O (2 − k ) , so g . x = g . x 0 + U 0 ( x − x 0 ) + O (2 − 2 k ) . (65) Recall that τ z ( y ) = y + z is the translation map. It follows from the above that if ν ∈ P ( G ) and µ ∈ P ( R d ) are supported on the level- k dyadic cells containing g 0 , x 0 respectively, then for f ∈ Lip( R d ) we have ∫ f d ( ν . µ ) = ∫∫ f ( g . x ) dν ( g ) dµ ( x ) = ∫∫ f ( g . x 0 + U 0 ( x − x 0 )) dν ( g ) dµ ( x ) + O (2 − 2 k · ‖ f ‖ Lip ) = ∫ f ( y ) d (( ν . x 0 ) ∗ ( U 0 τ − x 0 µ ))( y ) + O (2 − 2 k · ‖ f ‖ Lip ) . Let ν ′ , µ ′ and θ be the measures obtained from ν . x 0 , U 0 τ − x 0 µ and ν . µ , respectively, by scaling them by a factor of 2 k and translating them so that they are supported on a closed ball B of radius O (1) at the origin. Define a metric on P ( B ) by d ( α, β ) = sup ‖ f ‖ Lip =1 | ∫ f dα − ∫ f dβ | . It is well known that d ( · , · ) is compatible with the weak-* topology on P ( B ) (see e.g.
[26, Chapter 14]), and the calculation above implies that d ( ν ′ ∗ µ ′ , θ ) = O (2 − k ) . Thus when k is large, by Lemma 3.2 (1), | H m ( ν ′ ∗ µ ′ ) − H m ( θ ) | = O (1 /m ) . Restating this in terms of the original measures, we have shown:

Lemma 5.11.
For every m ∈ N and k > k ( m ) the following holds. If µ ∈ P ( R d ) and ν ∈ P ( G ) are supported on level- k dyadic cubes, and x ∈ supp µ and g ∈ supp ν , then H k,m ( ν . µ ) = H k,m (( ν . x ) ∗ U g µ ) + O (1 /m ) = H k,m ( ( U − 1 g ( ν . x )) ∗ µ ) + O (1 /m ) .

We omitted the translation in the statement because it commutes with convolution, and does not affect entropy by more than the error term. The second line follows from the first by applying U − 1 g to the convolution. Reasoning similarly, let g, g 0 ∈ G belong to a common level- ℓ dyadic cube D , and let x, x 0 ∈ R d belong to a common level- k dyadic cell. Then, using (64), and the fact that ‖ ( U − U 0 )( x − x 0 ) ‖ = O (2 − ℓ − k ) , we have gx = gx 0 + U 0 ( x − x 0 ) + O (2 − ℓ − k ) . Thus, for ν ∈ P ( D ) and f ∈ Lip( R d ) , we have ∫ f d ( ν . x ) = ∫ f ( gx ) dν ( g ) = ∫ f ( gx 0 + U 0 ( x − x 0 ) + O (2 − ℓ − k )) dν ( g ) = ∫ f ( gx 0 + U 0 ( x − x 0 )) dν ( g ) + O (2 − ℓ − k ‖ f ‖ Lip ) = ∫ f ( y ) d ( τ U 0 ( x − x 0 ) ( ν . x 0 ))( y ) + O (2 − ℓ − k ‖ f ‖ Lip ) . Now, ν . x and ν . x 0 are measures supported on sets of diameter O ‖ x ‖ , ‖ x 0 ‖ (2 − ℓ ) (since ν is supported on a level- ℓ dyadic cell), so re-scaling by 2 ℓ turns them into “macroscopic” measures. The equation above says that after this the resulting measures are, up to a translation, O (2 − k ) -close in the weak sense. Therefore,

Lemma 5.12. For every ε > 0 and k > k ( ε ) , if ν ∈ P ( G ) is supported on a level- k dyadic cube, x, y ∈ R d are in the same level- k dyadic cube, and V ≤ R d is a linear subspace, then S k ( ν . x ) is ( V, ε ) -concentrated = ⇒ S k ( ν . y ) is ( V, 2 ε ) -concentrated .

We first prove a version of the inverse theorem 2.12 which assumes that ν, µ are supported on small dyadic cubes. These cubes are introduced to ensure that the supports of the measures are small enough for the linearization machinery to kick in, and the proof focuses on this aspect of the argument.
After the proof we explain how to get the stronger version, in which the measures have larger support, and give bounds on the dimensions of the subspaces produced by the theorem.
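Before stating the theorem, the linearization step can be illustrated numerically. The sketch below (ours, not the paper's) works with planar isometries g = R θ + a and checks that replacing g . x by its linearization g . x 0 + U 0 ( x − x 0 ) around a base point, with g and x ranging over a cell of diameter roughly 2 − k , incurs a second-order error: halving the cell roughly quarters the worst error.

```python
from math import cos, sin

def rot(t):
    """2x2 rotation matrix R_t, stored as a pair of rows."""
    return ((cos(t), -sin(t)), (sin(t), cos(t)))

def apply(U, x):
    """Matrix-vector product U x."""
    return (U[0][0] * x[0] + U[0][1] * x[1], U[1][0] * x[0] + U[1][1] * x[1])

def g_dot(theta, a, x):
    """g.x for g = R_theta + a in the isometry group of R^2."""
    y = apply(rot(theta), x)
    return (y[0] + a[0], y[1] + a[1])

def max_err(k, samples=50):
    """Worst observed linearization error over a cell of diameter ~2^-k:
    || g.x - (g.x0 + U0 (x - x0)) || with g, x within 2^-k of (g0, x0)."""
    h = 2.0 ** (-k)
    theta0, a0, x0 = 0.3, (0.1, -0.2), (0.5, 0.7)   # arbitrary base point
    U0 = rot(theta0)
    worst = 0.0
    for i in range(samples):
        s = i / (samples - 1)                        # sweep the cell
        theta, x = theta0 + s * h, (x0[0] + s * h, x0[1] - s * h)
        gx = g_dot(theta, a0, x)
        gx0 = g_dot(theta, a0, x0)
        y = apply(U0, (x[0] - x0[0], x[1] - x0[1]))
        lin = (gx0[0] + y[0], gx0[1] + y[1])
        err = ((gx[0] - lin[0]) ** 2 + (gx[1] - lin[1]) ** 2) ** 0.5
        worst = max(worst, err)
    return worst

# The error is (U - U0)(x - x0), of size O(2^-k) * O(2^-k) = O(2^-2k):
e6, e7 = max_err(6), max_err(7)
assert e6 < 2.0 * 4.0 ** (-6)   # bounded by a small constant times 4^-k
assert e7 < e6 / 2              # halving the cell quarters the error
```

The quantity measured is exactly the discarded term ( U − U 0 )( x − x 0 ) of the linearization identity, which is a product of two cell diameters; this is what makes the approximation of ν . µ by a Euclidean convolution accurate at small scales.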
Theorem 5.13.
For every ε > 0 , R > 0 and m ∈ N there is a δ = δ ( ε, R, m ) > 0 , such that, for all k > k ( ε, R, m, δ ) and all n > n ( ε, R, m, δ, k ) , the following holds: If ν ∈ P ( G ) and µ ∈ P ([ − R, R ] d ) are supported on level- k dyadic cells, then either H n ( ν . µ ) > H n ( µ ) + δ, or there is a sequence V k , . . . , V n of subspaces of R d such that P k ≤ i ≤ n ( µ x,i is ( V i , ε, m ) -saturated ) > 1 − ε and for all x ∈ supp µ , P k ≤ i ≤ n ( S i ( ν g,i . x ) is ( U g V i , ε ) -concentrated ) > 1 − ε.

Remark .
1. Since k is assumed large relative to ε , by Lemma 5.12 the last condition holds for all x ∈ supp µ if and only if it holds for some x ∈ supp µ , up to a change by a factor of 2 in the degree of concentration.

2. The measures µ , ν and ν . µ are supported on sets of diameter O R (2 − k ) , so when measuring their scale- n entropy it might seem more natural to rescale them by O R (2 k ) . However, the statement of the theorem is formally unchanged if we do so, since we are taking n large relative to k , and the average entropy over n scales is negligibly affected by the first k scales.

Proof.
Let ε, R and m be given.

i. Apply Corollary 5.3 with parameters ε and m to obtain parameters ε ′ , m ′ and n ′ .

ii. Apply Proposition 5.6 with parameters ε ′ , R and m ′ to obtain parameters ε ′′ and m ′′ .

iii. Apply the Euclidean inverse theorem (Theorem 2.8) with parameters ε ′′ , R, m ′′ , obtaining δ ′ and n ′′ . We are free to assume that δ ′ is arbitrarily small in a manner depending on the previous parameters, and that n ′′ is large with respect to previous parameters. In particular we assume n ′′ is large relative to δ = ( δ ′ ) 2 .

iv. Let k be large enough that the conclusions of Lemma 5.12 hold for parameter ε ′′ and Lemma 5.11 holds for parameter n ′ (instead of m there). We also assume that for any D ∈ D G k and g, h ∈ D , the difference ‖ U g − U h ‖ is small enough that if θ ∈ P ([0 , 1] d ) is a ( U g V, ε ′ / 2) -concentrated measure then it is also ( U h V, ε ′ ) -concentrated.

v. Let n be very large in a manner depending on all previous parameters.

Now let ν ∈ P ( G ) , µ ∈ P ( R d ) be supported on level- k dyadic cells, and suppose that H n ( ν . µ ) ≤ H n ( µ ) + δ. (66) By Lemmas 3.5 and 3.7, assuming n is large compared to n ′′ , (66) implies E 0 ≤ i ≤ n ( H i,n ′′ ( ν g,i . µ ) − H i,n ′′ ( µ x,i )) < δ. By Lemma 5.11, our choice of k and the fact that n ′′ is large in a manner depending on δ , E 0 ≤ i ≤ n ( H i,n ′′ (( U − 1 g ( ν g,i . x )) ∗ µ x,i ) − H i,n ′′ ( µ x,i ) ) < δ. Since n ′′ is large enough relative to δ , the difference inside the expectation is essentially non-negative, that is, larger than − δ (Lemma 4.1). Since δ ′ = √ δ , by Markov’s inequality we conclude that P 0 ≤ i ≤ n ( H i,n ′′ (( U − 1 g ( ν g,i . x )) ∗ µ x,i ) ≤ H i,n ′′ ( µ x,i ) + δ ′ ) > 1 − δ ′ . Fix g, x such that ν g,i and µ x,i are in the event above. Write η = U − 1 g ( ν g,i . x ) and θ = µ x,i .
Since H i,n ′′ ( η ∗ θ ) ≤ H i,n ′′ ( θ ) + δ ′ , and η is supported on a set of diameter O ( R · 2 − i ) , we can, after implicitly re-scaling by 2 i , apply the Euclidean inverse theorem (Theorem 2.8) and conclude, by our choice of the parameters n ′′ , δ ′ , that there are subspaces V j = V ( i,g,x ) j for i ≤ j ≤ i + n ′′ , such that P i ≤ j ≤ i + n ′′ ( θ y,j is ( V ( i,g,x ) j , ε ′′ , m ′′ ) -saturated and η z,j is ( V ( i,g,x ) j , ε ′′ ) -concentrated ) > 1 − ε ′′ . Since we can assume n ′′ > n ′ , by Proposition 5.6 and our choice of parameters, writing τ = ν g,i , P i ≤ j ≤ i + n ′′ ( θ y,j is ( V ( i,g,x ) j , ε ′ , m ′ ) -saturated and S j U − 1 g ( τ h,j . x ) is ( V ( i,g,x ) j , ε ′ / 4 ) -concentrated ) > 1 − ε ′ (in the last equation, g, x are fixed, and the randomness is over y, h and j ). Recalling that µ, ν are supported on level- k dyadic cells and the definition of k , we can apply Lemma 5.12 in the event above to replace τ h,j . x by τ h,j . y . As a result the degree of concentration degrades from ε ′ / 4 to ε ′ / 2 . Then, since h, g are in the same level- j (and hence level- k ) component, we can exchange U g with U h in the event above with another degradation of the concentration, from ε ′ / 2 to ε ′ . After these adjustments we have P i ≤ j ≤ i + n ′′ ( θ y,j is ( V ( i,g,x ) j , ε ′ , m ′ ) -saturated and S j U − 1 h ( τ h,j . y ) is ( V ( i,g,x ) j , ε ′ ) -concentrated ) > 1 − ε ′ .
So far we have seen that with high probability (at least 1 − δ ′ ) over the choice of components θ = µ x,i and τ = ν g,i , we can associate subspaces V ( i,g,x ) j to a large fraction (at least 1 − ε ′ ) of the components of θ, τ at levels i, . . . , i + n ′′ . These components are also components of ν, µ , but each component of ν, µ may arise in several ways as a component of a component. So we have not associated a subspace to (most) components of ν, µ ; rather, to (most) components of ν, µ we have associated several subspaces. To correct this we invoke Lemma 2.7, letting us select subspaces V ( i,g,x ) (no longer depending on j ) such that P 0 ≤ i ≤ n ( µ x,i is ( V ( i,g,x ) , ε ′ , m ′ ) -saturated and S i U − 1 g ( ν g,i . x ) is ( V ( i,g,x ) , ε ′ ) -concentrated ) > 1 − ε ′ − δ ′ − O ( n ′′ /n ) . The right hand side is > 1 − 2 ε ′ assuming, as we may, that δ ′ is small compared to ε ′ and n large relative to n ′′ . Applying Corollary 5.3, and by our choice of ε ′ , there are subspaces V i , independent of g, x , such that P 0 ≤ i ≤ n ( µ x,i is ( V i , ε, m ) -saturated and S i U − 1 g ( ν g,i . x ) is ( V i , ε ) -concentrated ) > 1 − ε. This implies the statement.

We now prove Theorem 2.12, which we repeat for convenience:
Theorem 5.15.
For every ε > 0, R > 1 and m ∈ N there exists δ = δ(ε, R, m) > 0 such that for every k > k(ε, R, m) and every n > n(ε, R, m, k) the following holds. For every ν ∈ P(G) and µ ∈ P([−R, R]^d) that are supported on balls of radius R, either

H_n(ν.µ) > H_n(µ) + δ,

or else, to every pair of level-k components ν̃ of ν and µ̃ of µ we can assign a sequence of subspaces V_i = V_i(ν̃, µ̃) ≤ R^d, 0 ≤ i ≤ n, such that with probability at least 1 − ε over the choice of µ̃, ν̃,

P_{0 ≤ i ≤ n} ( µ̃_{x,i} is (V_i, ε, m)-saturated and S_{−i} U_g^{−1}(ν̃_{g,i}.x) is (V_i, ε)-concentrated ) > 1 − ε.

If in addition µ is ((ε/d)^{d+1}, σ)-non-affine for some σ > 0, and the relation among the parameters takes σ into account, then for those ν̃, µ̃ in the set of good components above,

(1/(n+1)) Σ_{i=0}^{n} dim V_i > (1/(d+1)) H_n(ν̃) − ε   (67)

and

E_{i=k} ( (1/(n+1)) Σ_{j=0}^{n} dim V_j(ν_{g,i}, µ_{x,i}) ) > (1/(d+1)) H_n(ν) − ε.   (68)

Proof.
Fix ε, R, m (the error terms below depend on them, but we suppress this in the notation). Let δ, k, n be parameters whose relations we will specify later, and suppose that

H_n(ν.µ) < H_n(µ) + δ.
By Lemma 3.8,

E_{i=k} ( H_n(ν_{g,i}.µ) − H_n(µ_{x,i}) ) < δ + O(1/n) < 2δ.

By Markov's inequality, assuming n large enough,

P_{i=k} ( H_n(ν_{g,i}.µ) − H_n(µ_{x,i}) < √δ ) > 1 − √δ.   (69)

Assuming, as we may, that √δ < ε, the last probability is at least 1 − ε. Now fix a pair of components ν̃, µ̃ from the event in (69). Assuming that √δ is small relative to ε, R, m and that k, n are large enough, we can apply the previous theorem to ν̃ and µ̃ (which are by definition supported on level-k cells) and obtain corresponding subspaces V_i(ν̃, µ̃), k ≤ i ≤ n. This proves the first part of the present theorem.

For the second part (bounding the dimensions of the subspaces), suppose that µ is (σ', σ)-non-concentrated. Fix an auxiliary parameter ε', depending on ε, σ and R in a manner we shall determine later, and run the first part using ε' instead of ε, obtaining associated δ, k, n etc., and a set of pairs of level-k components ν̃, µ̃ of probability at least 1 − ε' to which are associated subspaces V_i(ν̃, µ̃) with the desired properties with respect to ε'. Define V_i(ν̃, µ̃) = R^d for any pair of level-k components ν̃, µ̃ for which it was not yet defined (i.e. pairs that are not in the event in (69)). For i ≥ k and components ν_{g,i} and µ_{x,i}, set

V(ν_{g,i}, µ_{x,i}) = V_i(ν_{g,k}, µ_{x,k}).

This is well defined because a level-i component for i ≥ k determines uniquely the level-k component it belongs to (on the other hand, we are abusing notation slightly since, strictly speaking, ν_{g,i}, µ_{x,i} do not determine g, x, i; but as they are written explicitly, no confusion should occur). Observe now that, by the first part of the proof,

P_{0 ≤ i ≤ n} ( S_{−i} U_g^{−1}(ν_{g,i}.x) is (V(ν_{g,i}, µ_{x,i}), ε')-concentrated ) > 1 − 2ε'.
Indeed, if we write ν̃, µ̃ for the level-k components to which ν_{g,i}, µ_{x,i} belong, respectively, then conditioned on ν̃, µ̃ belonging to the event in (69), the probability of the event above is at least 1 − ε'; while conditioned on the complementary event, the probability is 1, since then V(ν̃, µ̃) = R^d. Thus the unconditional probability above is at least 1 − 2ε'.

Set ℓ = [log(1/ε')]. By Lemma 3.15, the previous inequality gives

P_{0 ≤ i ≤ n} ( H_ℓ(S_{−i} U_g^{−1}(ν_{g,i}.x)) < dim V(ν_{g,i}, µ_{x,i}) + O(log ℓ / ℓ) ) > 1 − 2ε'.

Since by Lemma 3.2 (2),

| H_ℓ(S_{−i} U_g^{−1}(ν_{g,i}.x)) − H_{i,ℓ}(ν_{g,i}.x) | = O(1/ℓ),
we obtain

P_{0 ≤ i ≤ n} ( H_{i,ℓ}(ν_{g,i}.x) < dim V(ν_{g,i}, µ_{x,i}) + O(log ℓ / ℓ) ) > 1 − 2ε' − O(1/ℓ) = 1 − O(1/ℓ).   (70)

We now use the assumption that µ is ((ε/d)^{d+1}, σ)-non-affine. By Corollary 5.10, for every component ν_{g,i} of ν,

µ ( x ∈ R^d : H_{i,ℓ}(ν_{g,i}.x) > (1/(d+1)) H_{i,ℓ}(ν_{g,i}) − O_{σ,R}(1/ℓ) ) > 1 − ε²/8.

Choosing the component ν_{g,i}, k ≤ i ≤ n, at random, and then x independently according to µ, we conclude that H_{i,ℓ}(ν_{g,i}.x) > (1/(d+1)) H_{i,ℓ}(ν_{g,i}) − O_{σ,R}(1/ℓ) with probability at least 1 − ε²/8. Therefore, combined with (70), we have

P_{0 ≤ i ≤ n} ( H_{i,ℓ}(ν_{g,i}.x) < dim V(ν_{g,i}, µ_{x,i}) + O(log ℓ / ℓ) and H_{i,ℓ}(ν_{g,i}.x) > (1/(d+1)) H_{i,ℓ}(ν_{g,i}) − O_{σ,R}(1/ℓ) ) > 1 − O(1/ℓ) − ε²/8.

Recalling that ℓ = [log(1/ε')], by taking ε' small we can assume that the O(1/ℓ) error term does not exceed ε²/8. We obtain

P_{0 ≤ i ≤ n} ( (1/(d+1)) H_{i,ℓ}(ν_{g,i}) < dim V(ν_{g,i}, µ_{x,i}) + O_{σ,R}(log ℓ / ℓ) ) > 1 − ε²/4.

By Markov's inequality, there is a set A ⊆ G × R^d with ν × µ(A) > 1 − ε/2, such that for every (g₀, x₀) ∈ A, setting ν̃ = ν_{g₀,k} and µ̃ = µ_{x₀,k},

P_{0 ≤ i ≤ n} ( (1/(d+1)) H_{i,ℓ}(ν̃_{g,i}) < dim V(ν̃_{g,i}, µ̃_{x,i}) + O_{σ,R}(log ℓ / ℓ) ) > 1 − ε/2.

Outside of the event above we have the trivial bound (1/(d+1)) H_{i,ℓ}(ν̃_{g,i}) ≤ d/(d+1) < 1. On the other hand, by Lemma 3.5 (which holds also in G),

H_n(ν̃) = E_{0 ≤ i ≤ n} ( H_{i,ℓ}(ν̃_{g,i}) ) + O(1/ℓ + ℓ/n).

Finally, since V(ν̃_{g,i}, µ̃_{x,i}) = V_i(ν̃, µ̃), the last two equations give

(1/(d+1)) H_n(ν̃) = E_{0 ≤ i ≤ n} ( (1/(d+1)) H_{i,ℓ}(ν̃_{g,i}) ) + O(1/ℓ + ℓ/n)
< (1/(n+1)) Σ_{i=0}^{n} ( dim V_i(ν̃, µ̃) + O_{σ,R}(log ℓ / ℓ) ) + O(1/ℓ + ℓ/n) + ε/2.   (71)

Taking ε' small (and hence ℓ large) relative to ε, R, σ, and n larger still, and rearranging, we obtain (67). Finally, recall that (71) holds on a set of pairs of level-k components ν̃, µ̃ of probability at least 1 − ε/2, and recall that E_{i=k} ( H_n(ν_{g,i}) ) = H_n(ν)
− O(k/n). The last statement of the theorem, (68), now follows by taking the expectation of both sides of (71) and making ε' small enough and n large enough.

5.6 Generalizations

To derive Theorem 2.14, very few changes to the convolution case are needed. The C¹-assumption on f, and the compactness of its domain, easily imply analogs of Equations (63), (64) and (65) and their consequences (without quantitative control on the error, but one cannot expect this in the general setting). In particular, for large enough k and suitably large n, with µ × ν-probability at least 1 − δ over the choice of (x, y), we have

| H_n(f(µ_{x,k} × δ_y)) − H_n(A_{x,y} µ_{x,k}) | < δ

and

| H_n(f(µ_{x,k} × ν_{y,k})) − H_n(A_{x,y} µ_{x,k} ∗ B_{x,y} ν_{y,k}) | < δ

(note that since n ≫ k, there is no advantage in rescaling A_{x,y}, B_{x,y}, as might seem natural). By concavity and almost-convexity of entropy (Lemma 3.1 (5) and (6)), for n ≫ k we have

| H_n(f(µ × ν)) − ∫ H_n(f(µ_{x,k} × ν_{y,k})) dµ × ν(x, y) | < δ,

and similarly, for every y,

| H_n(f(µ × δ_y)) − ∫ H_n(f(µ_{x,k} × δ_y)) dµ(x) | < δ.

Thus the hypothesis (17) of Theorem 2.14 implies that for any k and n ≫ k,

∫ H_n(f(µ_{x,k} × ν_{y,k})) dµ × ν(x, y) < ∫ H_n(f(µ_{x,k} × δ_y)) dµ × ν(x, y) + (8/10) δ.

By the above, for large k this gives

∫ H_n(A_{x,y} µ_{x,k} ∗ B_{x,y} ν_{y,k}) dµ × ν(x, y) < ∫ H_n(A_{x,y} µ_{x,k}) dµ × ν(x, y) + (6/10) δ.

Since for large n we essentially have the reverse inequality between the integrands, we conclude that with high probability (at least 1 − δ) over the choice of components µ̃ = µ_{x,k} and ν̃ = ν_{y,k}, we have

H_n(A_{x,y} µ̃ ∗ B_{x,y} ν̃) < H_n(A_{x,y} µ̃) + δ',

where δ' tends to zero with δ.
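The normalized scale-n entropy H_n(µ) = (1/n) H(µ, D_n) compared throughout this argument can be sketched numerically. The following is a minimal illustration of ours, not from the paper: a measure on [0,1) is approximated by a finitely supported one, and its entropy with respect to the level-n dyadic partition is computed in bits.

```python
import math
from collections import Counter

def dyadic_entropy(mass, n):
    """(1/n) * H(mu, D_n) in bits, for a finitely supported measure on [0,1).

    `mass` maps points x in [0,1) to their probabilities; D_n is the
    partition into dyadic intervals [j/2^n, (j+1)/2^n).
    """
    cells = Counter()
    for x, p in mass.items():
        cells[math.floor(x * 2 ** n)] += p   # the D_n-cell containing x
    return -sum(p * math.log2(p) for p in cells.values() if p > 0) / n
```

For the uniform measure on the eight points j/8 the unnormalized entropy saturates at 3 bits once the partition separates the atoms, so H_n = 3/n for n ≥ 3; a point mass has H_n = 0 at every scale.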
From here one can apply the Euclidean inverse theorem to the components ν̃, µ̃ as we did in the proof of the convolution case, with very few changes other than notational ones. We omit the details.

In the special case of actions of matrix groups on R^d or on themselves, one has analogs of Corollary 5.10: in the first case essentially by the same lemma (using compactness of the domain of the action function in place of compactness of the orthogonal group); for a matrix group acting on itself the stabilizers are trivial, so the conclusion is automatic.

6 Self-similar sets and measures on R^d

The derivation of our main result, Theorem 1.5, from Theorem 2.12 (the inverse theorem for the G-action) follows lines similar to the argument in [12] for R. One new ingredient is the explicit presence of the isometry group, but this is implicit in the original argument and the main change is notational. More significant is the appearance of invariant subspaces in the third alternative of the theorem. This will require some further analysis, and will occupy us in the first few subsections.

We remark that our analysis so far, and much of the analysis below, is of a finitary nature, involving entropies with respect to fine (but finite) partitions. Certainly we must somewhere connect this to dimension, specifically to the dimension of the conditional measures of a given self-similar measure on the family of translates of a subspace (as in (iii'') of Theorem 1.5). It is an unfortunate reality that such a connection seems to be available only when the subspace is invariant under the linearization of the IFS (see Section 6.4 below). If such results were available without invariance, much of the technical work of the next few sections could be avoided by passing to a limit at an earlier stage. However, understanding these "slice" measures for general self-similar measures remains an open problem.

We will obtain invariant subspaces from almost-invariant ones:
Definition 6.1.
A subspace V ≤ R^d is ε-invariant under a subgroup H < G, or (H, ε)-invariant, if d(hV, V) ≤ ε for every h ∈ H.

Evidently, (H, 0)-invariance is H-invariance in the usual sense. Furthermore,

Lemma 6.2.
Let
H < G be a closed subgroup. For every ε > 0 there is a δ > 0 such that if V is δ-invariant under H, then there is an H-invariant subspace V' with d(V, V') < ε.

Proof. Let S_H denote the space of H-invariant subspaces of R^d. If the statement were false, there would be some ε > 0 and a sequence V_n ≤ R^d of subspaces such that V_n is 1/n-invariant for H, but d(V_n, V') ≥ ε for every V' ∈ S_H. Using compactness of the space of subspaces, we can pass to a subsequence V_{n_k} converging to some V. Since the linear action is continuous, d(V, hV) = lim d(V_{n_k}, hV_{n_k}) = 0 for all h ∈ H, so V ∈ S_H. But by hypothesis d(V_{n_k}, V) ≥ ε for all k, a contradiction.

In fact the choice δ = c · ε^{d+1} works for an appropriate constant c (or c · ε^{k+1} if one fixes the dimension k of the subspace in question), but we will not use this.

Our second tool will be to construct almost-invariant subspaces from almost-invariant sets of vectors.

Lemma 6.3.
Let 0 < ε < 1 and write ε_n = ε^{n!}. Let H < G be a closed subgroup and let E ⊆ B_1(0) ⊆ R^d be a set such that d(hv, E) < ε_d for all v ∈ E and h ∈ H. Let v_1, …, v_k ∈ E be a maximal sequence of vectors satisfying d(v_i, span{v_1, …, v_{i−1}}) > ε_i for 0 < i ≤ k, and set V = span{v_1, …, v_k}. Then V is (H, O(ε))-invariant and E ⊆ V^{(ε_{k+1})}.

Proof. We may assume k < d, since otherwise V = R^d and the statement is trivial. To see that E ⊆ V^{(ε_{k+1})}, note that if v ∈ E \ V^{(ε_{k+1})}, then the vector v_{k+1} = v would extend the given sequence of vectors in a way that contradicts its maximality. For invariance, let h ∈ H and set w_i = hv_i and W = hV = span{w_i}. By assumption, for each i there is a w'_i ∈ E with d(w_i, w'_i) < ε_d ≤ ε_{k+1}, and we saw above that w'_i ∈ V^{(ε_{k+1})}, hence w_i ∈ V^{(2ε_{k+1})}. Also, h is an isometry, so d(w_i, span{w_1, …, w_{i−1}}) > ε_i ≥ ε_k for all 0 < i ≤ k, since the same is true for the v_i. Therefore, by Corollary 3.23, span{w_1, …, w_k} ⊑ V^{(c · ε_{k+1}/ε_k^k)}, and using the fact that dim V = dim W, this implies

d(W, V) = O(2ε_{k+1}/ε_k^k) = O(ε^{(k+1)! − k! · k}) = O(ε^{k!}) = O(ε),

as desired.

We will be interested in the situation where the components of a measure at some scale typically are highly saturated on a subspace. More precisely,
Definition 6.4.
For V ≤ R^d, a measure µ ∈ P(R^d) is (V, ε, m)-saturated at level n if

P_{i=n} ( µ_{x,i} is (V, ε, m)-saturated ) > 1 − ε.

We write

sat(µ, ε, m, n) = { V ≤ R^d : µ is (V, ε, m)-saturated at level n }.

Some technical properties of this notion are summarized in the next lemma. In the formulation we write Σ_A for Σ_{a∈A} a.

Lemma 6.5.
Let ε, R > 0 and V ≤ R^d. Let µ ∈ P([−R, R]^d), and suppose that µ is given as a convex combination of probability measures, µ = Σ_{i=1}^{k} α_i µ_i.

1. If µ is (V, ε, m)-saturated, then Σ{ α_i : µ_i is (V, ε', m)-saturated } > 1 − ε', where ε' = O(√(ε + (log kR)/m)).

2. For n sufficiently large in a manner depending on µ, α_i, µ_i, if V ∈ sat(µ, ε, m, n), then Σ{ α_i : V ∈ sat(µ_i, ε', m, n) } > 1 − ε', where ε' = O(√ε).

3. If for some n we have Σ{ α_i : V ∈ sat(µ_i, ε, m, n) } > 1 − ε, then V ∈ sat(µ, ε', m, n), where ε' = O(√ε).

4. Let g = 2^{−t} U + a ∈ G. If V ∈ sat(µ, ε, m, n), then UV ∈ sat(gµ, ε', m, [n − t]), where ε' → 0 as ε → 0 and m → ∞.

5. Under the same assumptions as in (4), UV ∈ sat(gµ, ε'', m, n), where ε'' → 0 as ε → 0 and m → ∞.

Proof. For (1), by absorbing an O(1/m) error into ε, we can assume that D_m = D_m^V ∨ D_m^{V⊥} (Lemma 3.9). By Lemma 3.1 (6) and the hypothesis, we have

Σ_{i=1}^{k} α_i · (1/m) H(µ_i, D_m | D_m^{V⊥}) ≥ (1/m) H(µ, D_m | D_m^{V⊥}) − (log k)/m > dim V − (ε + (log k)/m).

On the other hand, each µ_i is supported on [−R, R]^d, so each term in the average on the left hand side is bounded above by dim V + O((log R)/m). Now (1) follows by Markov's inequality.

For (2), fix for convenience δ = √ε. By standard differentiation theorems, for µ_i-a.e. x, ‖µ_{x,ℓ} − (µ_i)_{x,ℓ}‖ → 0 as ℓ → ∞. In particular, for large n, for a set of x of µ_i-mass at least 1 − δ, we have µ_{x,n} = (1 − δ)(µ_i)_{x,n} + δθ for some θ ∈ P([0,1)^d) (depending on x, i). For any such n let

A = { x ∈ [0,1)^d : µ_{x,n} is (V, ε, m)-saturated }.

By hypothesis µ(A) > 1 − ε. Since µ = Σ α_i µ_i, by Markov's inequality we have

Σ{ α_i : µ_i(A) > 1 − √ε } > 1 − √ε.   (72)

For i satisfying µ_i(A) > 1 − √ε, for a set of points x of µ_i-mass at least 1 − δ − √ε, we have that µ_{x,n} is (V, ε, m)-saturated and µ_{x,n} = (1 − δ)(µ_i)_{x,n} + δθ for some θ ∈ P([0,1)^d).
Now we can apply part (1) of this lemma to µ_{x,n}, which is written as a combination of two measures (so k = 2) supported on [0,1)^d (so R = 1), and conclude that (µ_i)_{x,n} is (V, O(√ε), m)-saturated. This holds for at least a (1 − δ − √ε)-fraction of the components (µ_i)_{x,n}. Since δ = √ε, we find that µ_i is (V, O(√ε), m)-saturated at level n. This, together with (72), is what we wanted to prove.

For (3), observe that µ_{x,n} is a convex combination of the components (µ_i)_{x,n} (the weights are proportional to α_i µ_i(D_n(x))). By Lemmas 3.12 and 3.14, we will be done if we show that, with µ-probability > 1 − O(√ε) over the choice of x, the components (µ_i)_{x,n} which are (V, ε, m)-saturated constitute a (1 − O(√ε))-fraction of the mass of µ_{x,n}.

To show this, let I = {1, …, k} and let α be the probability measure on I arising from the weights α_i. Consider the space I × R^d with the probability measure θ given by θ({i} × A) = α_i µ_i(A). Define f : I × R^d → R by

f(i, x) = 1 if (µ_i)_{x,n} is (V, ε, m)-saturated, and f(i, x) = 0 otherwise.

Note that f is I × D_n-measurable. Writing I_0 = { i ∈ I : µ_i is (V, ε, m)-saturated at level n }, we have

∫ f dθ = Σ_{i∈I} α_i ∫ f(i, x) dµ_i(x) ≥ Σ_{i∈I_0} α_i ∫ f(i, x) dµ_i(x) = Σ_{i∈I_0} α_i µ_i ( x : (µ_i)_{x,n} is (V, ε, m)-saturated ) > Σ_{i∈I_0} α_i (1 − ε) > (1 − ε)² > 1 − 2ε

(the passage from the third to the fourth expression is by the hypothesis). Let B be the smallest σ-algebra that makes the map I × R^d → R^d, (i, x) ↦ x, measurable. The function g = E(f | B) also satisfies 0 ≤ g ≤ 1 and ∫ g dθ = ∫ f dθ > 1 − 2ε, so by Markov's inequality,

θ ( (i, x) : g(x) > 1 − √(2ε) ) > 1 − √(2ε).

But, writing D = D_n(x), the inequality g(x) > 1 − √(2ε) just means that in the convex combination µ_{x,n} = Σ (α_i µ_i(D) / Σ_j α_j µ_j(D)) (µ_i)_{x,n}, at least 1 − √(2ε) of the mass originates in terms for which (µ_i)_{x,n} is (V, ε, m)-saturated.
Since the distribution on x induced by θ is equal to µ, this completes the proof of (3).

For (4), consider D ∈ D_n such that µ_D is (V, ε, m)-saturated. Let ν = g(µ_D). Then ν' = S_{[n−t]} ν is the image of µ_D under a similarity that contracts by O(1) and rotates by U, and so by Lemma 3.10, ν' is (UV, ε + O(1/m), m)-saturated. Writing ν' = Σ_D ν'(D) · ν'_D over the cells of the appropriate dyadic partition, we can apply (1) and conclude that, for ε small and m large, most of the mass in this convex combination comes from terms that are (UV, ε', m)-saturated. This means precisely that ν' is (UV, ε', m)-saturated at the corresponding level. Now, since gµ is a convex combination of measures ν of which a (1 − ε)-fraction are as above, (4) follows from (2).

(5) is proved in the same manner as (4), using ν' = S_n ν instead of S_{[n−t]} ν.

From here until the end of the paper we again denote by µ a self-similar measure on R^d defined by an IFS Φ = {ϕ_i}_{i∈Λ} and a positive probability vector p = (p_i)_{i∈Λ}. As usual we write ϕ_i = r_i U_i + a_i, and for i ∈ Λ^k we set ϕ_i = ϕ_{i_1} ∘ … ∘ ϕ_{i_k}, p_i = p_{i_1} · … · p_{i_k}, and define r_i, U_i similarly. Denote by G_Φ ⊆ G the smallest closed group containing the orthogonal parts U_i, i ∈ Λ, of the maps ϕ_i ∈ Φ. In the next few results, all dependences between parameters and all implicit constants depend on µ and Φ.

The first lemma says that the set sat(µ, ε, m, n) of subspaces that are saturated at level n is almost invariant under G_Φ:

Lemma 6.6.
For every ε > 0 there is a δ = δ(ε) > 0 such that, if m > m(ε) and n > n(ε, m), the following holds. For any V ∈ sat(µ, δ, m, n) and g ∈ G_Φ, there exists W ∈ sat(µ, ε, m, n) such that d(W, gV) < ε.

Proof. Let Λ^{≤n} = ∪_{i=1}^{n} Λ^i. Then S = { U_i : i ∈ ∪_{n=1}^{∞} Λ^n } is a sub-semigroup of G_Φ, and its closure is a closed subgroup of G_Φ (it is a general fact that a closed sub-semigroup of a compact group is a group). Since {U_i}_{i∈Λ} ⊆ S, in fact the closure of S is all of G_Φ. Since S is dense in G_Φ and S is the increasing union S = ∪_{n=1}^{∞} { U_i : i ∈ Λ^{≤n} }, we can choose k_0 large enough that for every V and every g ∈ G_Φ there is an i ∈ Λ^{≤k_0} with d(U_i^{−1} V, gV) < ε.

Fix 1 ≤ k ≤ k_0. Since µ = Σ_{i∈Λ^k} p_i · ϕ_i µ, we can apply Lemma 6.5 (2) with a small parameter δ. Writing ε' = √δ, it follows that if V ∈ sat(µ, δ, m, n) for some m and n > n_0, then V ∈ sat(ϕ_j µ, ε', m, n) for all j ∈ Λ^k outside a set J ⊆ Λ^k with Σ_{j∈J} p_j < ε'. Choose δ small enough that p_j > ε' for all j ∈ Λ^k (this requires δ small in a manner depending only on k_0, and hence only on ε). Thus we have shown that if V ∈ sat(µ, δ, m, n) for some m and n > n_0, then V ∈ sat(ϕ_j µ, ε', m, n) for all j ∈ Λ^k. By Lemma 6.5 (5), this in turn implies that U_j^{−1} V ∈ sat(µ, ε'', m, n), where ε'' can be made < ε if ε' and k_0/m (and hence k/m) are small enough. This holds if δ is small and m large relative to ε (and hence to k_0), and the claim follows from our choice of k_0.

The next proposition says, roughly, that there is an essentially maximal (ε, m)-saturated subspace at each small enough scale, and that it is (G_Φ, ε)-invariant.

Proposition 6.7.
For every 0 < ε < 1 there exists a δ = δ(ε) > 0 such that, for m > m(ε) and n > n(ε, m), there exists a (G_Φ, ε)-invariant subspace V*_n ∈ sat(µ, ε, m, n) such that every W ∈ sat(µ, δ, m, n) satisfies W ⊑ (V*_n)^{(ε)}.

Proof. Fix ε > 0 and apply the previous lemma to ε' = ε^{d!}/2 to obtain δ', and set δ = δ'/2. Suppose m and n are large enough to satisfy the conclusion of that lemma. Assume that m is also large enough that, for a suitable parameter ε'' < ε, the following holds: if V_1, V_2 ≤ R^d are subspaces with ∠(V_1, V_2) > ε' and µ is (V_i, ε'', m)-saturated for i = 1, 2, then µ is (V_1 + V_2, ε', m)-saturated (such m and ε'' exist by Lemma 3.21 (5)). Also assume that m is large enough that if µ is (V, δ, m)-saturated and V' ≤ V, then µ is (V', δ', m)-saturated (such m exist by Lemma 3.21 (4), using δ' = 2δ).

By the choice of δ', if V ∈ sat(µ, δ', m, n) and g ∈ G_Φ, then there is a subspace W ≤ R^d such that d(W, gV) < ε' and W ∈ sat(µ, ε', m, n). Let W denote the set of all one-dimensional subspaces W that arise in this way, and write

E = { w ∈ R^d : ‖w‖ = 1 and Rw ∈ W }.

Observe that if w ∈ E, then W = Rw ∈ W and there exist V ∈ sat(µ, δ', m, n) and g ∈ G_Φ with d(W, gV) < ε'; hence for every h ∈ G_Φ we have d(hW, hgV) < ε'. By the definition of W there is some W' ∈ W such that d(W', hgV) < ε', so d(hW, W') < 2ε' = ε^{d!}. Thus there is w' ∈ E with W' = Rw' and d(w, w') < ε^{d!}.

It follows that the set E satisfies the hypothesis of Lemma 6.3 for ε and the group G_Φ. Choosing a maximal sequence of unit vectors v_1, …, v_k ∈ E such that d(v_i, span{v_1, …, v_{i−1}}) > ε^{i!}, and setting V = span{v_1, …, v_k}, we conclude that V is (G_Φ, O(ε))-invariant and E ⊆ V^{(ε)}.

Since V = ⊕_{i=1}^{k} Rv_i and ∠(v_i, span{v_1, …, v_{i−1}}) > ε^{i!}, and Rv_i ∈ sat(µ, ε^{d!}/2, m, n) for all i, repeated application of Lemma 3.21 (5), assuming m large enough relative to ε (and hence to ε'), gives that V ∈ sat(µ, O(ε), m, n).

Finally, if W ∈ sat(µ, δ, m, n), then we can choose an orthonormal basis {w_i} for W, so by the choice of m, Rw_i ∈ sat(µ, δ', m, n), so w_i ∈ E. By Lemma 6.3, w_i ∈ V^{(ε)}. The w_i are orthonormal, so d(w_i, span{w_1, …, w_{i−1}}) = 1. Hence by Corollary 3.23, W ⊑ V^{(O(ε))}.

We have proved the claim for V*_n = V, up to some constant factors; to remove them, begin with a small multiple of ε instead of ε.

The next proposition allows us to replace a saturated almost-invariant subspace with a truly invariant one, of somewhat lesser saturation. It also shows that this new subspace is saturated at many levels, even though the original subspace a priori was saturated at a single level.

Proposition 6.8.
For every ε > 0, 0 < δ < δ(ε), m > m(ε, δ) and every n ∈ N, the following holds. If W ∈ sat(µ, δ, m, n) is (G_Φ, δ)-invariant, and W̃ is a G_Φ-invariant subspace with d(W, W̃) < δ, then for m' = [log(2/δ)] and all large enough n' we have W̃ ∈ sat(µ, ε, m', n').

Proof. Fix 0 < δ < ε. Also fix m large relative to δ (we shall see how large later). Let n ∈ N, W ≤ R^d and m', n' be as in the statement, so our assumption is that

P_{i=n} ( µ_{x,i} is (W, δ, m)-saturated ) > 1 − δ.

For each measure θ = µ_{x,n} in the event above, writing δ_1 = √(dδ) + O(m'/m), Lemma 3.16 implies

P_{0 ≤ j ≤ m} ( θ_{y,j} is (W, δ_1, m')-saturated ) > 1 − δ_1.

Assuming m is large relative to δ (and hence to m'), we can arrange δ_1 < 2√(dδ). Combining the two inequalities above, we can find a 0 ≤ k ≤ m such that

P_{i=n+k} ( µ_{x,i} is (W, δ_1, m')-saturated ) > 1 − 2δ_1.

Let µ_{x,n+k} be as in this last event. Since d(W, W̃) < δ < 2^{−m'}, by Lemma 3.21 (3), µ_{x,n+k} is also (W̃, δ_2, m')-saturated, where δ_2 = δ_1 + O(1/m'). Since this holds for a 1 − 2δ_1 > 1 − δ_2 proportion of the components µ_{x,n+k} (because, if δ is small, δ_2 ≥ 2δ_1), we have W̃ ∈ sat(µ, δ_2, m', n+k). Note that δ_2 can be made arbitrarily small by choosing δ small enough.

Finally, let n' > n + k. Let Λ(n') ⊆ ∪_{j=1}^{∞} Λ^j denote the set of sequences i = i_1 … i_ℓ such that r_{i_1} · … · r_{i_ℓ} < 2^{−(n'−k)} ≤ r_{i_1} · … · r_{i_{ℓ−1}}. Then Σ_{i∈Λ(n')} p_i = 1 and µ = Σ_{i∈Λ(n')} p_i ϕ_i µ. By Lemma 6.5 (4), W̃ = U_i W̃ ∈ sat(ϕ_i µ, δ_3, m', n') for all i ∈ Λ(n'), where δ_3 → 0 as δ_2 → 0 and m' → ∞. Since µ is a convex combination of the measures ϕ_i µ, i ∈ Λ(n'), by Lemma 6.5 (3) we have W̃ ∈ sat(µ, δ_4, m', n') for δ_4 which can be made arbitrarily small (and in particular < ε) if δ is small and m' large. This completes the proof.

Finally, we show the existence of a "maximal" invariant subspace which is saturated to all degrees at sufficiently deep levels.
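The quantity d(hV, V) that controls (H, ε)-invariance in this section can be probed numerically. A minimal sketch of ours: we metrize subspaces by the operator norm of the difference of the orthogonal projections, one of the standard metrics compatible with the d(·,·) used above, and measure the invariance defect of a subspace under a finite sample of a rotation group.

```python
import numpy as np

def proj(V):
    """Orthogonal projection onto the column span of V."""
    Q, _ = np.linalg.qr(V)
    return Q @ Q.T

def subspace_dist(V, W):
    """d(V, W) as the operator norm of the projection difference."""
    return np.linalg.norm(proj(V) - proj(W), 2)

def invariance_defect(V, group_sample):
    """max over sampled h in H of d(hV, V): small iff V is nearly H-invariant."""
    return max(subspace_dist(h @ V, V) for h in group_sample)
```

Under rotations about the z-axis in R^3, the xy-plane has defect 0, while a slightly tilted plane has a small positive defect and lies close to the truly invariant plane, mirroring Lemma 6.2.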
Let us say that µ is V-saturated if V ∈ sat(µ, ε, m, n) for all ε > 0, all m ≥ m(ε) and all n > n(ε, m).

Proposition 6.9.
There exists a unique subspace Ṽ ≤ R^d such that:

1. µ is Ṽ-saturated.
2. V ⊆ Ṽ whenever µ is V-saturated.
3. Ṽ is G_Φ-invariant.

Proof. A formal consequence of Lemma 3.21 (5) is that if µ is Ṽ_1-saturated and Ṽ_2-saturated, then µ is (Ṽ_1 + Ṽ_2)-saturated. Thus we can take Ṽ to be the sum of all subspaces V on which µ is saturated. (1) and (2) are then obvious, and (3) is a formal consequence of Lemma 6.2, Proposition 6.7 and Proposition 6.8, because taken together they show that if µ is V-saturated, then µ is V'-saturated for a G_Φ-invariant subspace V' with dim V' ≥ dim V. Applying this to V = Ṽ we conclude V' ⊆ Ṽ and dim V' ≥ dim Ṽ, so Ṽ = V' is G_Φ-invariant.

We now need sufficient conditions for the subspace Ṽ of the last proposition to be of dimension > 0. To this end, we have the following.

Proposition 6.10.
If there exists a sequence V_i ∈ sat(µ, ε_i, m_i, n_i) with ε_i → 0, m_i > m(ε_i) and n_i > n(ε_i, m_i), and if V_i → V, then V ⊆ Ṽ, where Ṽ is as in Proposition 6.9.

Proof. In each of the three previous propositions, a δ = δ(ε) was associated to an ε. We can assume that these functions are increasing (so decreasing ε leads to no increase in δ(ε)).

Let V_i, ε_i, m_i, n_i be given. Assuming that m_i, n_i are large enough relative to ε_i, by Proposition 6.7 there is a sequence ε'_i = ε'_i(ε_i) → 0, depending monotonely on ε_i, such that for each i there is a (G_Φ, ε'_i)-invariant subspace V*_i ∈ sat(µ, ε'_i, m_i, n_i) with ∠(V_i, V*_i) < ε'_i and dim V*_i ≥ dim V_i (if δ(·) is the function in that proposition, then we choose ε'_i = δ^{−1}(ε_i)).

We can henceforth assume that m_i is large enough relative to ε'_i, and n_i relative to ε'_i, m_i (here we use that ε'_i depends on ε_i in a monotone way, so being large with respect to ε'_i is the same as being large with respect to ε_i, which was assumed).

Passing to a subsequence, we may assume that V*_i converge to some subspace V*. Note that V* is G_Φ-invariant, being the limit of (G_Φ, ε'_i)-invariant subspaces.

By increasing ε'_i if needed, we can assume that m'_i = [log(2/ε'_i)] → ∞ more slowly than linearly.

By Proposition 6.8 we can choose ε''_i → 0, depending monotonely on ε'_i, such that if W ∈ sat(µ, ε'_i, m_i, n_i) is a G_Φ-invariant subspace and d(V*_i, W) < ε'_i, then W ∈ sat(µ, ε''_i, m'_i, n') for all n' > n_i + m'_i (recall that m'_i = [log(2/ε'_i)]; if δ(·) is the function in that proposition, choose ε''_i = δ^{−1}(ε'_i)). Note that since we assumed that m'_i → ∞ more slowly than linearly, every large enough integer occurs as m'_i for some i.

Applying the previous paragraph to W = V*, and since we have arranged that {m'_i} includes all large enough integers, we see that µ is V*-saturated.
Thus V* ⊆ Ṽ. Finally, combining ∠(V_i, V*_i) → 0 with V*_i → V* and V_i → V, we conclude that ∠(V*, V) = 0. Since dim V* = lim dim V*_i ≥ lim dim V_i = dim V, we must have V ⊆ V*. Since V* ⊆ Ṽ, we get V ⊆ Ṽ, as claimed.

6.4 Entropy and dimension for self-similar measures

If µ ∈ P([0,1)^d) is exact dimensional, as self-similar measures are, the dimension of µ is given by the so-called entropy dimension:

dim µ = lim_{n→∞} H_n(µ).   (73)

We require a similar expression relating the dimension of conditional measures on affine subspaces to entropy. We parametrize affine subspaces as the set of fibers π^{−1}(y), where π is a linear map π : R^d → R^k and y ranges over R^k. The conditional measure µ_{π^{−1}(y)} of µ on π^{−1}(y) is defined for πµ-a.e. y by the weak-* limit

µ_{π^{−1}(y)} = lim_{ℓ→∞} µ_{π^{−1} D^k_ℓ(y)},

which exists by the measure-valued version of the martingale convergence theorem.

Theorem 6.11.
Let µ ∈ P(R^d) be a self-similar measure for the IFS Φ and let π : R^d → R^k be a linear map such that ker π is D_Φ-invariant. Then the conditional measure µ_{π^{−1}(y)} is exact dimensional for πµ-a.e. y, and the dimension is given by

dim µ_{π^{−1}(y)} = lim_{p→∞} ( liminf_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, D_{i+p} | π^{−1} D^k_{i+p}) ) ) = lim_{p→∞} ( limsup_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, D_{i+p} | π^{−1} D^k_{i+p}) ) ).

We will apply this theorem via the following corollary:
Corollary 6.12. If µ is self-similar and V is a saturated and G_Φ-invariant subspace, then the conditional measures of µ on translates of V are a.s. exact dimensional and of dimension dim V. In particular, this holds for the subspace described in Proposition 6.9.

The proof we present for Theorem 6.11 has two ingredients. The first is exact dimensionality and dimension conservation:
Theorem 6.13.
Let µ ∈ P(R^d) be a self-similar measure for the IFS Φ and let π : R^d → R^k be a linear map such that ker π is D_Φ-invariant. Then πµ is exact dimensional, µ_{π^{−1}(y)} is exact dimensional for πµ-a.e. y, its dimension is πµ-a.s. independent of y, and

dim πµ + dim µ_{π^{−1}(y)} = dim µ for πµ-a.e. y.

This theorem follows from work of Falconer and Jin [7] (which in turn relies on methods of Feng and Hu [8]). Next, we require an expression for dim πµ in terms of entropy of dyadic partitions. A special case of this result appears in [13] for the case that G_Φ is the full orthogonal group.

Theorem 6.14.
Let µ ∈ P(R^d) be a self-similar measure for the IFS Φ and let π : R^d → R^k be a linear map such that ker π is D_Φ-invariant. Then

dim πµ = lim_{p→∞} ( liminf_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) ) = lim_{p→∞} ( limsup_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) ).

Proof. First, note that

E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) = E_{0 ≤ i ≤ n} ( (1/p) H(µ, π^{−1} D^k_{i+p} | D_i) ).

As we have seen, changing the dyadic partition to one adapted to a different coordinate system changes the right hand side of the last equation by O(1/p), and in the statement of the theorem we consider the limit as p → ∞. Thus the statement is unaffected by changes of coordinate system, and we may assume that π is a coordinate projection. Therefore we can apply the local entropy averages lemma for projections [13]. The lemma is usually formulated for the lower pointwise dimension, but exactly the same proof, replacing liminf by limsup, shows that

limsup_{n→∞} −(1/n) log µ((π^{−1} D^k_n)(x)) ≥ limsup_{n→∞} (1/n) Σ_{i=0}^{n−1} (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) − O(1/p) for µ-a.e. x.

Since πµ is exact dimensional, the left hand side is µ-a.s. equal to dim πµ, and we have

dim πµ ≥ limsup_{n→∞} (1/n) Σ_{i=0}^{n−1} (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) − O(1/p) for µ-a.e. x.

Integrating dµ and using Fatou's lemma, for all p,

dim πµ ≥ limsup_{n→∞} ∫ ( (1/n) Σ_{i=0}^{n−1} (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) dµ(x) − O(1/p) = limsup_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) − O(1/p).   (74)

Equation (74) is one half of the inequality we are after, and its proof only used exact dimensionality of πµ.
For the reverse inequality we will use self-similarity. Fix p, and note the identity

E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) = E_{0 ≤ i ≤ n} ( (1/p) H(µ, π^{−1} D^k_{i+p} | D_i) ),

where the expectation on the left is over i and x, and on the right only over i. Let r = min{ r_i : i ∈ Λ }, and for each i let I_i ⊆ Λ* denote the set of sequences j_1 … j_k ∈ Λ* such that r · 2^{−i} < r_{j_1} … r_{j_k} < 2^{−i} ≤ r_{j_1} … r_{j_{k−1}}. It is a standard (and easy) fact that µ = Σ_{j∈I_i} p_j · ϕ_j µ. By concavity of conditional entropy (Lemma 3.1 (5)), for each i,

(1/p) H(µ, π^{−1} D^k_{i+p} | D_i) ≥ (1/p) Σ_{j∈I_i} p_j H(ϕ_j µ, π^{−1} D^k_{i+p} | D_i) + O(1/p) = (1/p) Σ_{j∈I_i} p_j H(ϕ_j µ, π^{−1} D^k_{i+p}) + O(1/p),

where we used the fact that each ϕ_j µ, j ∈ I_i, has diameter O(2^{−i}), and Lemma 3.2 (2). Finally, since ϕ_j contracts by 2^{−i} up to a constant factor, by changing scale, applying Lemma 3.2 (5), and changing the coordinate system, we have

(1/p) Σ_{j∈I_i} p_j H(ϕ_j µ, π^{−1} D^k_{i+p}) = (1/p) H(µ, π^{−1} D^k_p) + O(1/p).

Note that we used here the fact that ker π is invariant under the linear parts of the ϕ_j. Putting this all together, we have shown that for every p,

E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) ≥ (1/p) H(πµ, D^k_p) + O(1/p).

Taking the liminf as n → ∞, we have

liminf_{n→∞} E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) ) ≥ (1/p) H(πµ, D^k_p) + O(1/p).

But, since πµ is exact dimensional, as p → ∞ the right hand side tends to dim πµ. Combined with inequality (74), this proves the statement.

We can now prove Theorem 6.11. Begin with the identity

E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, D_{i+p} | π^{−1} D^k_{i+p}) ) = E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, D_{i+p}) ) − E_{0 ≤ i ≤ n} ( (1/p) H(µ_{x,i}, π^{−1} D^k_{i+p}) )

(this is just Lemma 3.1 (4) and linearity of expectation).
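The identity just invoked is the conditional-entropy chain rule H(A | B) = H(A ∨ B) − H(B), applied with A = D_{i+p} and B = π^{−1} D^k_{i+p} (for a coordinate projection, A essentially refines B). A minimal discrete sketch of ours: for a joint distribution in which y is a function of x, we verify H(X) = H(Y) + H(X | Y).

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a distribution given as {outcome: prob}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def chain_rule_terms(joint):
    """Return (H(X,Y), H(Y), H(X|Y)) for a joint distribution {(x, y): prob},
    with H(X|Y) computed directly as sum_y p(y) H(X | Y = y)."""
    marg = {}
    for (x, y), p in joint.items():
        marg[y] = marg.get(y, 0.0) + p
    cond = 0.0
    for y, py in marg.items():
        fiber = {x: p / py for (x, yy), p in joint.items() if yy == y}
        cond += py * entropy(fiber)
    return entropy(joint), entropy(marg), cond
```

Since y = π(x) is determined by x here, H(X, Y) = H(X), and the chain rule reads exactly like the display above: the conditional entropy on the fibers is the total entropy minus the entropy of the projection.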
Taking $n\to\infty$ and then $p\to\infty$, and using (73) and Theorem 6.14, the right hand side becomes $\dim\mu - \dim\pi\mu$, which by Theorem 6.13 is the a.s. dimension of the fibers.

Recall from the introduction that $\overline{r} = \prod_{i\in\Lambda} r_i^{p_i}$, that $n' = n\log(1/\overline{r})$, and that $\nu^{(n)} = \sum_{\mathbf{i}\in\Lambda^n} p_{\mathbf{i}}\cdot\delta_{\varphi_{\mathbf{i}}}$. Also recall the definition of the dyadic partition $\mathcal{D}_n = \mathcal{D}^G_n$, and of the partition $\mathcal{E}_n = \mathcal{E}^G_n$ of $G$ according to the level-$n$ dyadic partition of the translation part of the maps. In this section we prove the following:

Theorem 6.15.
Let $\Phi = \{\varphi_i\}_{i\in\Lambda}$ be an IFS on $\mathbb{R}^d$ that does not preserve a non-trivial affine subspace, and let $\mu$ be a self-similar measure for $\Phi$. Then either
$$\lim_{n\to\infty} \frac{1}{n'} H(\nu^{(n)}, \mathcal{D}^G_{qn'}\,|\,\mathcal{E}^G_{n'}) = 0 \qquad \text{for all } q > 1,$$
or else there is a $D\Phi$-invariant subspace $V$ such that $\dim \mu_{V+x} = \dim V$ for $\mu$-a.e. $x$.

This implies Theorem 1.5; see the remark after its statement.

We begin the proof. First, note that $\mu(V) = 0$ for every proper affine subspace $V \subseteq \mathbb{R}^d$: if $\mu(V) > 0$ for some $V$, then it is easily shown that $\mu$ is supported on $V$, and hence $\Phi$ preserves $V$, contrary to hypothesis.

We now argue by contradiction: suppose that there are $\delta > 0$ and $q > 1$ such that
$$\limsup_{n\to\infty} \frac{1}{qn'} H(\nu^{(n)}, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{E}_{n'}) > \delta.$$
Let $\mathcal{F}$ denote the partition of $G$ according to the contraction ratio. This is an uncountable partition, but the possible contractions of $\varphi_{\mathbf{i}}$, $\mathbf{i}\in\Lambda^n$, are just the $n$-fold products of the contractions $r_i$, $i\in\Lambda$. Thus only $O(n^{|\Lambda|})$ distinct contraction ratios occur in the support of $\nu^{(n)}$, so
$$\lim_{n\to\infty} \frac{1}{qn'} H(\nu^{(n)}, \mathcal{F}) = \lim_{n\to\infty} \frac{O(\log n)}{n'} = 0.$$
Using the inequalities $H(\cdot, \mathcal{D}\,|\,\mathcal{E}\vee\mathcal{F}) \ge H(\cdot, \mathcal{D}\,|\,\mathcal{E}) - H(\cdot, \mathcal{F}\,|\,\mathcal{E})$ and $H(\cdot, \mathcal{F}\,|\,\mathcal{E}) \le H(\cdot, \mathcal{F})$, the two limits above imply
$$\limsup_{n\to\infty} \frac{1}{qn'} H(\nu^{(n)}, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{E}_{n'}\vee\mathcal{F}) > \delta. \tag{75}$$

Lemma 6.16. $\lim_{n\to\infty} \int \frac{1}{qn'} H(g.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'})\,d\nu^{(n)}(g) = \dim\mu$.

Proof. If $g = 2^{-t}U + a$, then $g.\mu$ is supported on a set of diameter $O(2^{-t})$, hence $H(g.\mu, \mathcal{D}_{n'}) = O(|t - n'| + 1)$. Similarly, by Lemma 3.2(5), $H(g.\mu, \mathcal{D}_{(q+1)n'}) = H(\mu, \mathcal{D}_{qn'}) + O(|t - n'| + 1)$. If we choose $g = 2^{-t}U + a$ randomly according to $\nu^{(n)}$, then $t$ is distributed as the sum of $n$ independent random variables, each of which takes the value $\log(1/r_i)$ with probability $p_i$ for $i\in\Lambda$, so by the law of large numbers, $t - n' = o(n')$ in probability. We also have a worst-case bound of $t \le Cn$ (a.s.
for $g\sim\nu^{(n)}$), because $\varphi_{i_1\ldots i_n}$ contracts by at least $(\min_{i\in\Lambda} r_i)^n$, and $\min_{i\in\Lambda} r_i < 1$. Hence the bound $t - n' = o(n')$ holds also in the mean. It follows from the first paragraph that
$$\frac{1}{qn'}\int H(g.\mu, \mathcal{D}_{n'})\,d\nu^{(n)}(g) = o(1),$$
$$\frac{1}{qn'}\int H(g.\mu, \mathcal{D}_{(q+1)n'})\,d\nu^{(n)}(g) = \frac{1}{qn'} H(\mu, \mathcal{D}_{qn'}) + o(1) = \dim\mu + o(1).$$
Subtracting the first line from the second proves the claim.
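In displayed form, the subtraction step is an instance of the refinement formula for entropy (Lemma 3.1(4) in the paper's numbering), since $\mathcal{D}_{(q+1)n'}$ refines $\mathcal{D}_{n'}$:

```latex
H(g.\mu,\mathcal{D}_{(q+1)n'}\mid\mathcal{D}_{n'})
  \;=\; H(g.\mu,\mathcal{D}_{(q+1)n'}) - H(g.\mu,\mathcal{D}_{n'}),
% so, integrating d nu^{(n)}(g) and normalizing by q n':
\frac{1}{qn'}\int H(g.\mu,\mathcal{D}_{(q+1)n'}\mid\mathcal{D}_{n'})\,d\nu^{(n)}(g)
  \;=\; \big(\dim\mu + o(1)\big) - o(1)
  \;\xrightarrow[n\to\infty]{}\; \dim\mu .
```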
Lemma 6.17. $\lim_{n\to\infty} \frac{1}{qn'} H(\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'}) = \dim\mu$.

Proof. Using the conditional entropy formula,
$$\frac{1}{(q+1)n'} H(\mu, \mathcal{D}_{(q+1)n'}) = \frac{1}{q+1}\cdot\frac{1}{n'} H(\mu, \mathcal{D}_{n'}) + \frac{q}{q+1}\cdot\frac{1}{qn'} H(\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'}).$$
The lemma follows by taking $n\to\infty$ and using the fact that $\frac{1}{n} H(\mu, \mathcal{D}_n) \to \dim\mu$.

Let $\nu^{(n)}_I$ denote, as usual, the conditional measure of $\nu^{(n)}$ on $I$.

Lemma 6.18. $\lim_{n\to\infty} \big( \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot \frac{1}{qn'} H(\nu^{(n)}_I.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'}) \big) = \dim\mu$.

Proof. Write
$$\mu = \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\big(\nu^{(n)}_I.\mu\big),$$
and note that $\nu^{(n)}_I.\mu = \int g.\mu\,d\nu^{(n)}_I(g)$. Combining this with concavity of conditional entropy (Lemma 3.1(5)) and the previous two lemmas,
$$\dim\mu = \lim_{n\to\infty} \frac{1}{qn'} H(\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'}) \ge \limsup_{n\to\infty} \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\frac{1}{qn'} H(\nu^{(n)}_I.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'})$$
$$\ge \liminf_{n\to\infty} \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\frac{1}{qn'} H(\nu^{(n)}_I.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'}) \ge \liminf_{n\to\infty} \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\int \frac{1}{qn'} H(g.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'})\,d\nu^{(n)}_I(g)$$
$$= \lim_{n\to\infty} \int \frac{1}{qn'} H(g.\mu, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{D}_{n'})\,d\nu^{(n)}(g) = \dim\mu,$$
as claimed.

For $I\in\mathcal{E}_{n'}\vee\mathcal{F}$ consisting of similarities with contraction $2^{-t}$, define $\widetilde\nu^{(n)}_I = S_t\nu^{(n)}_I$. This is a measure on the isometry group $G$.

Lemma 6.19.
For every $\delta' > 0$ there are arbitrarily large $n$ for which we can find $I\in\mathcal{E}_{n'}\vee\mathcal{F}$ with $\nu^{(n)}(I) > 0$ and such that
$$\frac{1}{qn'} H(\widetilde\nu^{(n)}_I, \mathcal{D}_{qn'}) > \frac{\delta}{2} \qquad\text{and}\qquad \frac{1}{qn'} H(\widetilde\nu^{(n)}_I.\mu, \mathcal{D}_{qn'}) < \frac{1}{qn'} H(\mu, \mathcal{D}_{qn'}) + \delta'.$$

Proof.
By (75), for infinitely many $n$ we have
$$\frac{1}{qn'}\sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot H(\nu^{(n)}_I, \mathcal{D}_{(q+1)n'}) = \frac{1}{qn'} H(\nu^{(n)}, \mathcal{D}_{(q+1)n'}\,|\,\mathcal{E}_{n'}\vee\mathcal{F}) > \delta. \tag{76}$$
Suppose $I\in\mathcal{E}_{n'}\vee\mathcal{F}$ contains similitudes of contraction $2^{-t}$. Since the action of $S_t$ on $G$ is just ordinary scaling in our coordinates on $G$, we have
$$\big| H(\nu^{(n)}_I, \mathcal{D}_{(q+1)n'}) - H(\widetilde\nu^{(n)}_I, \mathcal{D}_{qn'}) \big| = O(|t - n'| + 1).$$
Since for $g = 2^{-t}U + a \sim \nu^{(n)}$ we have $t - n' = o(n')$ in probability as $n\to\infty$, together with the pointwise bound $t = O(n')$, this and (76) imply that there are infinitely many $n$ such that
$$\sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\frac{1}{qn'} H(\widetilde\nu^{(n)}_I, \mathcal{D}_{qn'}) > \frac{\delta}{2}. \tag{77}$$
Similarly, we have $\widetilde\nu^{(n)}_I.\mu = S_t(\nu^{(n)}_I.\mu)$, so by Lemma 3.1(5),
$$\big| H(\nu^{(n)}_I.\mu, \mathcal{D}_{(q+1)n'}) - H(\widetilde\nu^{(n)}_I.\mu, \mathcal{D}_{qn'}) \big| = O(|t - n'| + 1).$$
Using the previous lemma and again the fact that $|t - n'| = o(n')$ in probability as $g = 2^{-t}U + a \sim \nu^{(n)}$,
$$\lim_{n\to\infty} \frac{1}{qn'}\sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot H(\widetilde\nu^{(n)}_I.\mu, \mathcal{D}_{qn'}) = \dim\mu.$$
On the other hand we know that also
$$\lim_{n\to\infty} \Big(\frac{1}{qn'} H(\mu, \mathcal{D}_{qn'})\Big) = \dim\mu.$$
Therefore (using boundedness of the normalized entropy),
$$\lim_{n\to\infty} \sum_{I\in\mathcal{E}_{n'}\vee\mathcal{F}} \nu^{(n)}(I)\cdot\Big| \frac{1}{qn'} H(\widetilde\nu^{(n)}_I.\mu, \mathcal{D}_{qn'}) - \frac{1}{qn'} H(\mu, \mathcal{D}_{qn'}) \Big| = 0. \tag{78}$$
Combining this with (77), for infinitely many $n$ we can find $I\in\mathcal{E}_{n'}\vee\mathcal{F}$ with the desired properties.

Now fix a parameter $\varepsilon > 0$, and let $\sigma > 0$ be such that $\mu$ is $((\varepsilon/2d)^{d+1}, \sigma)$-non-affine (recall Definition 2.11). Such a $\sigma$ exists because, by assumption, $\mu$ gives zero mass to every proper affine subspace. Choose a large $m\in\mathbb{N}$, and let $\delta > 0$ and $k\in\mathbb{N}$ be as in the conclusion of Theorem 2.12. Apply the theorem to the measures $\widetilde\nu^{(n)}_I$ for the sets $I\in\mathcal{E}_{n'}\vee\mathcal{F}$ found in the previous lemma for the parameter $\delta$.
We have arrived at the following conclusion: for every $\varepsilon > 0$, for arbitrarily large $n$, a $(1-\varepsilon)$-fraction of the level-$k$ components $\theta = \mu_{x,k}$ of $\mu$ have associated to them a sequence of subspaces $V_1,\ldots,V_n$, of which at least a $c\delta$-fraction are of dimension $\ge 1$, and which satisfy
$$\mathbb{P}_{1\le i\le n}\big( \theta_{y,i} \text{ is } (V_i, \varepsilon, m)\text{-saturated} \big) > 1 - \varepsilon. \tag{79}$$
If the last equation held for $\mu$ instead of $\theta$ (possibly for a different sequence of subspaces), we would be in a position to apply Proposition 6.9(4), which would give the second alternative of the present theorem. This "bootstrapping" from the component $\theta$ to $\mu$ is accomplished as follows. Let us say that a probability measure $\eta\in\mathcal{P}(\mathbb{R}^d)$ is fragmented at level $k$ if $\eta(D) > 0$ for at least two distinct $D\in\mathcal{D}_k$; otherwise it is unfragmented. We again abbreviate $\mathbb{P}_{a\in A} = \frac{1}{|A|}\sum_{a\in A}$.

Lemma 6.20. Given $k$, if $s\in\mathbb{N}$ is large enough, then
$$\sum\big\{ p_{\mathbf{i}} : \mathbf{i}\in\Lambda^s \text{ and } \varphi_{\mathbf{i}}\mu \text{ is unfragmented at level } k \big\} > 1 - \varepsilon.$$
Proof. Let $E = \bigcup \partial D$, where the union is over those $D\in\mathcal{D}_k$ with $\operatorname{supp}\mu \cap D \ne \emptyset$. Then $E$ is contained in the union of finitely many proper affine subspaces, so for a small enough $\rho > 0$ we will have $\mu(E^{(\rho)}) < \varepsilon$, where $E^{(\rho)}$ denotes the $\rho$-neighborhood of $E$. Let $s$ be large enough that for $\mathbf{i}\in\Lambda^s$ the measure $\varphi_{\mathbf{i}}\mu$ is supported on a set of diameter $< \rho$. This means that if $\varphi_{\mathbf{i}}\mu$ is fragmented then it is supported on $E^{(\rho)}$. Since $\mu = \sum_{\mathbf{i}\in\Lambda^s} p_{\mathbf{i}}\cdot\varphi_{\mathbf{i}}\mu$, we conclude that
$$\sum\big\{ p_{\mathbf{i}} : \mathbf{i}\in\Lambda^s \text{ and } \varphi_{\mathbf{i}}\mu \text{ is fragmented at level } k \big\} \le \mu(E^{(\rho)}) < \varepsilon,$$
as required.

Let $s$ be as in the lemma for the $k$ we found previously. Assuming $\varepsilon < 1/2$, by the lemma and our previous discussion we can find a level-$k$ component $\theta = \mu_D$, $D\in\mathcal{D}_k$, of $\mu$, for which (79) holds and, furthermore, $1-\varepsilon$ of the mass of $\theta$ comes from components $\varphi_{\mathbf{i}}\mu$, $\mathbf{i}\in\Lambda^s$, supported entirely on $D$. We can now apply Lemma 6.5(2) to conclude that there is an $\mathbf{i}\in\Lambda^s$ such that for arbitrarily large $j$ there is a $V_j \in \operatorname{sat}(\varphi_{\mathbf{i}}\mu, \varepsilon', m, j)$, where $\varepsilon'\to 0$ as $\varepsilon\to 0$. By Lemma 6.5 the same is true of $\mu$ for the subspace $U_{\mathbf{i}}^{-1}V_j$ and some $\varepsilon''$ that also vanishes as $\varepsilon\to 0$. We can now invoke Proposition 6.10, which completes the proof.

In this section we prove Theorems 1.10 and 1.11 on the dimension of exceptional parameters for parametric families of self-similar sets and measures. We adopt the notation from the introduction, so $\varphi_{i,t}(x) = r_i(t)U_i(t)x + a_i(t)$ are contracting similarities for $t$ in a compact connected set $I\subseteq\mathbb{R}^m$; for $\mathbf{i} = i_1\ldots i_n \in \Lambda^n$ we define $\varphi_{\mathbf{i},t} = \varphi_{i_1,t}\circ\ldots\circ\varphi_{i_n,t}$, and similarly $r_{\mathbf{i}}(t)$ and $U_{\mathbf{i}}(t)$. Recall that $\Delta_{\mathbf{i},\mathbf{j}}(t) = \varphi_{\mathbf{i},t}(0) - \varphi_{\mathbf{j},t}(0)$, and define
$$\Delta'_n(t) = \min_{\mathbf{i}\ne\mathbf{j}\in\Lambda^n} \|\Delta_{\mathbf{i},\mathbf{j}}(t)\|.$$
If, as in the introduction, we write $\Delta_n(t)$ for the minimum of $d(\varphi_{\mathbf{i},t}, \varphi_{\mathbf{j},t})$ over distinct $\mathbf{i},\mathbf{j}\in\Lambda^n$, then $\Delta'_n \le \Delta_n$, and hence $(\Delta'_n)^{-1}((-\varepsilon,\varepsilon)^d) \supseteq (\Delta_n)^{-1}((-\varepsilon,\varepsilon)^d)$.
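The comparison between $\Delta'_n$ and $\Delta_n$ can be spelled out as follows, assuming (as is implicit here) that the metric $d$ on similarity maps dominates the displacement of the origin:

```latex
% If d(phi,psi) >= |phi(0) - psi(0)| for similarities phi, psi, then
\Delta'_n(t)
  = \min_{\mathbf{i}\neq\mathbf{j}\in\Lambda^n}
    \|\varphi_{\mathbf{i},t}(0)-\varphi_{\mathbf{j},t}(0)\|
  \;\le\; \min_{\mathbf{i}\neq\mathbf{j}\in\Lambda^n}
    d(\varphi_{\mathbf{i},t},\varphi_{\mathbf{j},t})
  = \Delta_n(t),
% hence the sublevel sets nest:
\{\,t:\Delta_n(t)<\varepsilon\,\}\;\subseteq\;\{\,t:\Delta'_n(t)<\varepsilon\,\}.
```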
In particular, in order to prove Theorem 1.10, one may replace the set $E$ there with the set $E' = \bigcap_{\varepsilon>0} E'_\varepsilon$, where
$$E'_\varepsilon = \bigcup_{N=1}^{\infty}\,\bigcap_{n>N}\,\bigcup_{\mathbf{i}\ne\mathbf{j}\in\Lambda^n} (\Delta_{\mathbf{i},\mathbf{j}})^{-1}\big((-\varepsilon^n, \varepsilon^n)^d\big).$$
Thus we wish to show that, under suitable hypotheses, $\dim_P E'_\varepsilon$ tends to the desired bound as $\varepsilon\to 0$. We begin with the proof of Theorem 1.11. We require an elementary fact whose proof we include for completeness.

Lemma 6.21.
Let $V\subseteq\mathbb{R}^m$ be open and let $F: V\to\mathbb{R}^k$ be a $C^2$ map. Suppose that $K\subseteq V$ is compact and that $\operatorname{rank} DF \ge r$ everywhere in $K$. Then $K\cap F^{-1}((-\delta,\delta)^k)$ can be covered by at most $C\cdot(1/\delta)^{m-r}$ balls of radius $\delta$, where $C$ depends only on the diameter of $K$ and the magnitude of the first and second partial derivatives of $F$ on $K$.

Proof. We first reduce to the case $k = r$. Assume this case is known, and consider the general case $k\ge r$. For each $r$-tuple of distinct coordinates $\mathbf{i} = (i_1,\ldots,i_r)\in\{1,\ldots,k\}^r$, let $\pi_{\mathbf{i}}:\mathbb{R}^k\to\mathbb{R}^r$ denote the projection to these coordinates. Now, if $\operatorname{rank} DF(x)\ge r$ then $\operatorname{rank} D(\pi_{\mathbf{i}}\circ F)(x)\ge r$ for some $r$-tuple $\mathbf{i}$, so we can find an open cover $V = \bigcup V_{\mathbf{i}}$ indexed by tuples as above such that $D(\pi_{\mathbf{i}}\circ F)$ has rank $r$ everywhere in $K\cap V_{\mathbf{i}}$. Choose compact sets $K_{\mathbf{i}}\subseteq K$ such that $K_{\mathbf{i}}\subseteq V_{\mathbf{i}}$ and $K = \bigcup K_{\mathbf{i}}$. By our assumption, for each $\mathbf{i}$ the set $K_{\mathbf{i}}\cap(\pi_{\mathbf{i}}\circ F)^{-1}((-\delta,\delta)^r)$ can be covered by $O(1/\delta^{m-r})$ balls of radius $\delta$. If $x\in F^{-1}((-\delta,\delta)^k)$ then certainly $x\in(\pi_{\mathbf{i}}\circ F)^{-1}((-\delta,\delta)^r)$ for every tuple $\mathbf{i}$, so the union of these $\binom{k}{r}$ covers is a cover of $K\cap F^{-1}((-\delta,\delta)^k)$ containing at most $\binom{k}{r}\cdot O(1/\delta^{m-r})$ balls of radius $\delta$, as required (note that restricting the function and composing with a projection can only decrease its $C^2$ norm, so the constant does not get worse).

Thus we may from the start assume that $k = r$ and that $\operatorname{rank} DF = r$ everywhere in $K$. Let $M$ denote the bound on the first and second derivatives of $F|_K$. Applying the constant rank theorem [21, Theorem 7.8], for each $x\in K$ there is a neighborhood $W_x\subseteq\mathbb{R}^m$ of $x$ and an open set $W'_x\subseteq\mathbb{R}^r$ such that $F|_{W_x}: W_x\to W'_x$ is $C^1$-conjugate to the coordinate projection $\pi_{1,\ldots,r}:\mathbb{R}^m\to\mathbb{R}^r$, with the distortion of the conjugating maps controlled by $M$. Since for $\pi_{1,\ldots,r}$ the statement is clear, the conclusion follows for $F|_{W_x}$.
Finally, the neighborhoods $W_x$ contain balls centered at $x$ whose radius is again bounded below in terms of $M$. Only $O((\operatorname{diam} K)^m)$ of these neighborhoods are needed to cover $K$, and the statement follows.

Returning to our parametrized family of IFSs, assume that $D\Delta_{\mathbf{i},\mathbf{j}}$ has rank at least $r$ at every point of $I$, for every distinct pair $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$.

Lemma 6.22.
For large enough $n$ and all distinct $\mathbf{i},\mathbf{j}\in\Lambda^n$, the rank of $D\Delta_{\mathbf{i},\mathbf{j}}$ is at least $r$ everywhere in $I$.

Proof. It is easy to check that the power series of the functions $\Delta_{\mathbf{i},\mathbf{j}}$ converge on a common neighborhood of $I$ in $\mathbb{C}^m$ (each function being defined by its complex power series), and since $\Delta_{i_1\ldots i_n, j_1\ldots j_n}\to\Delta_{\mathbf{i},\mathbf{j}}$ uniformly on this neighborhood, we find that $D_v\Delta_{i_1\ldots i_n, j_1\ldots j_n}\to D_v\Delta_{\mathbf{i},\mathbf{j}}$ as $n\to\infty$ for all $v$. The lemma now follows by a compactness argument.

Finally, it is again clear that there is a uniform bound $M$ for the first and second derivatives of all the functions $\Delta_{\mathbf{i},\mathbf{j}}$, $\mathbf{i},\mathbf{j}\in\Lambda^n$. The proof of Theorem 1.11 is now concluded as follows. For large enough $n$, for each distinct $\mathbf{i},\mathbf{j}\in\Lambda^n$, the set $(\Delta_{\mathbf{i},\mathbf{j}})^{-1}((-\varepsilon^n,\varepsilon^n)^d)$ can be covered by $O_M(1/\varepsilon^{n(m-r)})$ balls of radius $\varepsilon^n$. Thus the set
$$E'_{n,\varepsilon} = \bigcup_{\mathbf{i}\ne\mathbf{j}\in\Lambda^n} (\Delta_{\mathbf{i},\mathbf{j}})^{-1}\big((-\varepsilon^n,\varepsilon^n)^d\big)$$
satisfies $N(E'_{n,\varepsilon}, \varepsilon^n) \le |\Lambda|^{2n}\cdot O_M(1/\varepsilon^{n(m-r)})$, where $N(X,\delta)$ is the $\delta$-covering number of $X$. Thus for every $\varepsilon$ and $n$,
$$N\Big(\bigcap_{k>n} E'_{k,\varepsilon},\,\varepsilon^n\Big) \le N(E'_{n,\varepsilon}, \varepsilon^n) \le |\Lambda|^{2n}\cdot O_M(1/\varepsilon^{n(m-r)}),$$
so for every $\varepsilon > 0$,
$$\overline{\dim}_B\Big(\bigcap_{k>n} E'_{k,\varepsilon}\Big) = \limsup_{n\to\infty} \frac{\log N\big(\bigcap_{k>n} E'_{k,\varepsilon},\,\varepsilon^n\big)}{\log(1/\varepsilon^n)} \le m - r + \frac{2\log|\Lambda|}{\log(1/\varepsilon)}.$$
It follows that $\dim_P E'_\varepsilon \le m - r + 2\log|\Lambda|/\log(1/\varepsilon)$, and this tends to $m-r$ as $\varepsilon\to 0$, as required.

We now turn to the proof of Theorem 1.10, which is very similar to the one-parameter case from [12]. Let $|\mathbf{i}|$ denote the length of a sequence $\mathbf{i}$, and for sequences $\mathbf{i},\mathbf{j}$ let $\mathbf{i}\wedge\mathbf{j}$ denote their longest common initial segment (which may be empty). Let $B_m = \{e_1,\ldots,e_m\}$ denote the standard basis of $\mathbb{R}^m$ and let $D_v$ denote the directional derivative operator in direction $v$. Thus for $F = (F_1,\ldots,F_d): I\to\mathbb{R}^d$ we have $D_vF = (D_vF_1,\ldots,D_vF_d): I\to\mathbb{R}^d$. We also write $D$ for the differentiation operator for functions $\mathbb{R}^m\to\mathbb{R}^d$. It will be convenient for the rest of this section to use the supremum norm on vectors and matrices.

Definition 6.23.
Let $I\subseteq\mathbb{R}^m$ be a connected compact set. A family $\{\Phi_t\}_{t\in I}$ of IFSs is transverse of order $k$ if the associated functions $r_i(\cdot), a_i(\cdot), U_i(\cdot)$ are $(k+1)$-times continuously differentiable in a neighborhood of $I$, and there is a constant $c > 0$ such that for all $n\in\mathbb{N}$ and all $\mathbf{i},\mathbf{j}\in\Lambda^n$,
$$\forall t\in I\;\exists p\in\{0,\ldots,k\}\;\exists v_1,\ldots,v_p\in B_m \text{ such that } \big\|(D_{v_p}\ldots D_{v_1}\Delta_{\mathbf{i},\mathbf{j}})(t)\big\| > c\cdot|\mathbf{i}\wedge\mathbf{j}|^{-p}\cdot r_{\mathbf{i}\wedge\mathbf{j}}(t).$$

A real-analytic function $F: I\to\mathbb{R}^d$ can be extended to a complex-analytic function on an open complex neighborhood of $I$. Such an $F$ is identically $0$ if and only if at some point $t\in U$ we have $D_{v_1}\ldots D_{v_n}F(t) = 0$ for every $n$ and $v_1\ldots v_n\in B_m$. For $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$ the functions $\Delta_{\mathbf{i},\mathbf{j}}$ are real analytic if $r_i, a_i, U_i$ are, because on the common neighborhood of $I$ in which these functions are analytic, $\Delta_{\mathbf{i},\mathbf{j}}$ is given as an absolutely convergent power series in these functions. Thus the $\Delta_{\mathbf{i},\mathbf{j}}$ extend to complex-analytic functions on a common neighborhood of $I$.

We have the following analog of [12, Proposition 5.7]:

Proposition 6.24.
Let $I\subseteq\mathbb{R}^m$ be a connected compact set and $\{\Phi_t\}_{t\in I}$ a family of IFSs on $\mathbb{R}^d$ whose associated functions $r_i(\cdot), a_i(\cdot), U_i(\cdot)$ are real analytic on $I$. Suppose that for $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$ we have $\Delta_{\mathbf{i},\mathbf{j}}\equiv 0$ on $I$ if and only if $\mathbf{i}=\mathbf{j}$. Then $\{\Phi_t\}$ is transverse of order $k$ for some $k$.

Proof. For $\mathbf{i},\mathbf{j}\in\Lambda^n$, let $\ell = |\mathbf{i}\wedge\mathbf{j}|$ and let $u,v\in\Lambda^{n-\ell}$ denote the sequences obtained by deleting the first $\ell$ symbols of $\mathbf{i},\mathbf{j}$, respectively. Define the function $\widetilde\Delta_{\mathbf{i},\mathbf{j}}$ by $\widetilde\Delta_{\mathbf{i},\mathbf{j}}(t) = \Delta_{u,v}(t)$. We find that
$$\Delta_{\mathbf{i},\mathbf{j}}(t) = r_{\mathbf{i}\wedge\mathbf{j}}(t)\cdot U_{\mathbf{i}\wedge\mathbf{j}}(t)\big(\widetilde\Delta_{\mathbf{i},\mathbf{j}}(t)\big).$$
Let $n(u)$ denote the number of times that the symbol $u\in\Lambda$ appears in $\mathbf{i}\wedge\mathbf{j}$ and let $U^T$ be the transpose of $U$. Then (since $U^T_{\mathbf{i}\wedge\mathbf{j}} = U^{-1}_{\mathbf{i}\wedge\mathbf{j}}$),
$$\widetilde\Delta_{\mathbf{i},\mathbf{j}}(t) = \Big(\prod_{u\in\Lambda} r_u(t)^{-n(u)}\Big)\cdot U^T_{\mathbf{i}\wedge\mathbf{j}}(t)\,\Delta_{\mathbf{i},\mathbf{j}}(t).$$
From here the analysis is entirely analogous to the proof of [12, Proposition 5.7], bounding iterated directional derivatives rather than the higher derivatives $\widetilde\Delta^{(p)}_{\mathbf{i},\mathbf{j}}$ of the original proof. We omit the details.

Our next task is to show that transversality of order $k$ provides efficient coverings of the pre-images $(\Delta_{\mathbf{i},\mathbf{j}})^{-1}((-\varepsilon,\varepsilon)^d)$. The argument is again very similar to the one-dimensional case, but with some additional technicalities. The key part of the argument in dimension $1$ was the fact that if $F: [a,b]\to\mathbb{R}$ satisfies $|F'| > c$, then $F^{-1}((-\rho,\rho))$ is an interval of length $\le 2\rho/c$. We now generalize this to higher dimensions.

Let $U\subseteq\mathbb{R}^{m-1}$, let $f: U\to\mathbb{R}$ be a Lipschitz function with Lipschitz constant $c$, and let $E = \{(x, f(x))\in\mathbb{R}^m : x\in U\}$ be its graph. Then we say that $E$ is a $c$-Lipschitz graph in $\mathbb{R}^m$ with domain $U$. More generally we apply this name to any isometric image of $E$ in $\mathbb{R}^m$.

Lemma 6.25.
Let $E\subseteq\mathbb{R}^m$ be a $c$-Lipschitz graph with domain $U = B_r(x)\subseteq\mathbb{R}^{m-1}$ and let $0 < \varepsilon < r$. Then the $\varepsilon$-neighborhood of $E$ can be covered by $O((r/\varepsilon)^{m-1})$ balls of radius $\varepsilon$ if $c < 1$, and by $O((cr/\varepsilon)^{m-1})$ such balls if $c\ge 1$.

Proof. Assume that $c\le 1$. For a point $y = (u, f(u))$ of the graph, let $y_\pm = (u, f(u)\pm\varepsilon)$, and set $C = C(u) = B_{3\varepsilon}(y_+)\cup B_{3\varepsilon}(y_-)$. Since $f$ is $c$-Lipschitz with $c\le 1$, a direct calculation shows that $C$ contains the $\varepsilon$-neighborhood of the graph over $B_{\varepsilon/2}(u)$. Now cover $B_r(x)$ by $O((r/\varepsilon)^{m-1})$ balls $B_{\varepsilon/2}(u_i)$. Then $\bigcup C(u_i)$ contains the $\varepsilon$-neighborhood of the graph and consists of $O((r/\varepsilon)^{m-1})$ balls of radius $3\varepsilon$; each of these can in turn be covered by $O_m(1)$ balls of radius $\varepsilon$.

If $c\ge 1$, then the same argument shows that $C(u)$ contains the $\varepsilon$-neighborhood of the graph over $B_{\varepsilon/2c}(u)$, and we obtain the desired bound by covering $B_r(x)$ by $O((cr/\varepsilon)^{m-1})$ balls of radius $\varepsilon/2c$.

Lemma 6.26.
Let $I\subseteq\mathbb{R}^m$ be a compact set, let $0 < \delta < 1$, and let $I^{(\delta)}$ denote the $\delta$-neighborhood of $I$. Let $F: I^{(\delta)}\to\mathbb{R}$ be twice continuously differentiable with $0 < c \le \|DF\| \le M$ and $\|D^2F\| \le M$ on $I^{(\delta)}$; we assume $c\le 1$. Then for $0 < \rho < \min\{\delta, c/M\}$, the set $I\cap F^{-1}((-\rho,\rho))$ can be covered by $O_{M, \operatorname{vol}(I^{(\delta)})}((c/\rho)^{m-1})$ balls of radius $\rho/c$.

Proof. Let $t\in I$. Under our hypotheses, there is a ball $B_r(t)\subseteq I^{(\delta)}$, with radius $r$ less than $\min\{\delta, c/M\}$ and of this order, such that $\|DF(t) - DF(t')\| < c/2$ for $t'\in B_r(t)$ (here we use the upper bound on the second derivative of $F$). It is then an easy fact from calculus, essentially the implicit function theorem, that the level set $S = F^{-1}(0)\cap B_r(t)$ is the graph of a $1$-Lipschitz function, and that in the direction transverse to $S$ the function $F$ grows at a rate proportional to $c$ as long as we remain in $B_r(t)$. Thus, given $\rho > 0$, the set $F^{-1}((-\rho,\rho))\cap B_r(t)$ is contained in the $O(\rho/c)$-neighborhood of the graph of a $1$-Lipschitz function with domain a ball of radius $r = O_M(c)$, and by the previous lemma, if $\rho < \min\{\delta, c/M\}$, it can be covered by $O_M(r^{m-1}/(\rho/c)^{m-1})$ balls of radius $\rho/c$. Also, $I$ can be covered by $O(\operatorname{vol}(I^{(\delta)})/r^m)$ balls $B_r(t)$ as above, so it can be covered by $O_{M, \operatorname{vol}(I^{(\delta)})}((c/\rho)^{m-1})$ balls of radius $\rho/c$.

Corollary 6.27. For $F: I^{(\delta)}\to\mathbb{R}^d$ and under the same assumptions as above, the same conclusion holds.

Proof. We can write $I = I_1\cup\ldots\cup I_d$ such that on each of the closed sets $I_i$ the assumption of the previous lemma holds for $F_i$ (the $i$-th component of $F$), with some degradation of $c$. Then $I\cap F^{-1}((-\rho,\rho)^d) \subseteq \bigcup_{i=1}^d I_i\cap F_i^{-1}((-\rho,\rho))$, and the lemma can be applied to each set in the union to obtain the desired result.

Proposition 6.28.
Let $I\subseteq\mathbb{R}^m$ be a compact set, $I^{(\delta)}$ the $\delta$-neighborhood of $I$, and $F: I^{(\delta)}\to\mathbb{R}^d$ a $(k+1)$-times differentiable function. Suppose that there are constants $M > 1$ and $0 < b < 1$ such that:

1. For every $t\in I$, $0\le p\le k+1$ and $v_1,\ldots,v_p\in B_m$ we have $\|D_{v_1}\ldots D_{v_p}F(t)\| \le M$ (for $p = 0$ this means $\|F(t)\| \le M$).
2. For every $t\in I$ there exist $p\in\{0,\ldots,k\}$ and $v_1,\ldots,v_p\in B_m$ such that $\|D_{v_1}\ldots D_{v_p}F(t)\| > b$ (for $p = 0$ this means $\|F(t)\| > b$).

Then there exists $C = C(b, M, \operatorname{vol} I^{(\delta)}) \ge 1$ such that for every $0 < \rho < b^{2^k}$, the set $Z_\rho = I\cap F^{-1}((-\rho,\rho)^d)$ can be covered by $C^k(b/\rho)^{(m-1)/2^k}$ balls of radius $(\rho/b)^{1/2^k}$.

Proof. Take $C$ large enough to play the role of the constant in the bound of the previous corollary, and large enough that $m(C^{k-1} + C) \le C^k$ for $k\ge 1$.

We argue by induction on $k$. The case $k = 0$ is trivial (because $\|F(t)\| > b$ and $\rho < b$ imply $Z_\rho = \emptyset$).

Now fix $k\ge 1$ and suppose we have proved the claim for $k-1$. First, note that we can assume without loss of generality that $I\subseteq Z_b = \{t\in I : \|F(t)\| \le b\}$, since clearly $Z_\rho\subseteq Z_b$, and if we did not have $I\subseteq Z_b$ we could simply replace $I$ by $I\cap Z_b$ to make it hold.

Since $\|F(t)\| \le b$ on $I$, the hypothesis (2) necessarily holds at each point with $p\ge 1$. Thus we can write $I$ as a union of closed sets $I_v$, $v\in B_m$, on each of which the induction hypothesis holds for one of the functions $G_v = D_vF$. Fix $v\in B_m$ and take $\rho' = \sqrt{b\rho}$. Note that $0 < b < 1$ and $0 < \rho < b^{2^k}$, so $0 < \rho' < b^{2^{k-1}}$. Define
$$I'_v = I_v\cap G_v^{-1}((-\rho',\rho')^d), \qquad I''_v = I_v\setminus I'_v.$$
We cover $Z_\rho$ in each of these sets separately.

First, we actually cover the entire set $I'_v$. Indeed, by the induction hypothesis, it can be covered by $C^{k-1}(b/\rho')^{(m-1)/2^{k-1}} = C^{k-1}(b/\rho)^{(m-1)/2^k}$ balls of radius $(\rho'/b)^{1/2^{k-1}} = (\rho/b)^{1/2^k}$.

On the other hand, on $I''_v$ we have $\|DF\| \ge \|G_v\| \ge \rho'$.
Set $\rho'' = \rho'\cdot(\rho/b)^{1/2^k}$; since $\rho < b$ and $k\ge 1$ we have $\rho''\ge\rho$, so $Z_\rho\cap I''_v \subseteq I''_v\cap F^{-1}((-\rho'',\rho'')^d)$. By the previous corollary (applied with $c = \rho'$ and $\rho''$ in place of $\rho$), the latter set can be covered by $C(\rho'/\rho'')^{m-1} = C(b/\rho)^{(m-1)/2^k}$ balls of radius $\rho''/\rho' = (\rho/b)^{1/2^k}$.

Taking the union of the covers we have found for $I'_v$ and for $Z_\rho\cap I''_v$, we obtain a cover of $Z_\rho\cap I_v$ by $(C^{k-1} + C)(b/\rho)^{(m-1)/2^k}$ balls of radius $(\rho/b)^{1/2^k}$. Summing over the $m$ elements $v\in B_m$, we have covered $Z_\rho$ by
$$m(C^{k-1} + C)\Big(\frac{b}{\rho}\Big)^{(m-1)/2^k} \le C^k\Big(\frac{b}{\rho}\Big)^{(m-1)/2^k}$$
balls of radius $(\rho/b)^{1/2^k}$ (using our assumption $m(C^{k-1} + C) \le C^k$). This is the desired cover.

Theorem 1.10 now follows from Proposition 6.24 and the next result:

Theorem 6.29. If $\{\Phi_t\}_{t\in I}$ satisfies transversality of order $k\ge 1$ on the compact set $I\subseteq\mathbb{R}^m$, then the set $E$ of "exceptional" parameters in Theorem 1.9 has packing (and hence Hausdorff) dimension at most $m-1$.

Proof. Let $M$ be a uniform bound for $\|D_{v_1}\ldots D_{v_{k+1}}\Delta_{\mathbf{i},\mathbf{j}}(t)\|$, taken over $v_i\in B_m$, $t\in I$ and $\mathbf{i},\mathbf{j}\in\Lambda^*$. Such an $M$ exists by $(k+1)$-fold continuous differentiability of $r_i(\cdot), a_i(\cdot)$ and the fact that the $|r_i|$ are bounded away from $1$ on $I$. By transversality there is a constant $c > 0$ such that for all $n\in\mathbb{N}$ and all $\mathbf{i},\mathbf{j}\in\Lambda^n$,
$$\forall t\in I\;\exists p\in\{0,\ldots,k\}\;\exists v_1,\ldots,v_p\in B_m \text{ such that } \big\|(D_{v_p}\ldots D_{v_1}\Delta_{\mathbf{i},\mathbf{j}})(t)\big\| > c\cdot|\mathbf{i}\wedge\mathbf{j}|^{-p}\cdot r_{\min}^{|\mathbf{i}\wedge\mathbf{j}|},$$
where $r_{\min} = \min\{r_i(t) : i\in\Lambda,\; t\in I\}$.

We may assume that $c < 1$ and $k\ge 1$. In what follows we suppress the dependence on $k, M, c$ and $I$ in the $O(\cdot)$ notation: $O(\cdot) = O_{k,M,c,|I|}(\cdot)$. Fix $n$ and distinct $\mathbf{i},\mathbf{j}\in\Lambda^n$.
Let $b = b_n = c\,n^{-k}r_{\min}^n$, so that the hypothesis of the previous proposition is satisfied for the function $F = \Delta_{\mathbf{i},\mathbf{j}}$ and this $b$. Therefore, for all $0 < \rho < (b_n)^{2^k}$, the set $\{t\in I : |\Delta_{\mathbf{i},\mathbf{j}}| < \rho\}$ can be covered by at most $O((b/\rho)^{(m-1)/2^k})$ balls of radius $(\rho/b)^{1/2^k}$ each.

Now let $\varepsilon > 0$ be such that $\rho = \varepsilon^n$ satisfies $\rho < (b_n)^{2^k} = (c\,n^{-k}r_{\min}^n)^{2^k}$ for all $n$ (this holds for all sufficiently small $\varepsilon > 0$). Fixing $n$ again, the discussion above applies to $(\Delta_{\mathbf{i},\mathbf{j}})^{-1}((-\varepsilon^n, \varepsilon^n)^d)$ for every distinct pair $\mathbf{i},\mathbf{j}\in\Lambda^n$, so ranging over all such pairs we find that
$$E'_{\varepsilon,n} = \bigcup_{\mathbf{i}\ne\mathbf{j}\in\Lambda^n} (\Delta_{\mathbf{i},\mathbf{j}})^{-1}\big((-\varepsilon^n,\varepsilon^n)^d\big)$$
can be covered by $O(|\Lambda|^{2n}(b_n/\varepsilon^n)^{(m-1)/2^k})$ balls of radius $(\varepsilon^n/b_n)^{1/2^k}$. Now,
$$E \subseteq E'_\varepsilon = \bigcup_{N=1}^{\infty}\bigcap_{n>N} E'_{\varepsilon,n}.$$
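The claim that $\rho = \varepsilon^n < (b_n)^{2^k}$ holds for all $n$ once $\varepsilon$ is small enough can be checked directly:

```latex
\varepsilon^{n} < \big(c\,n^{-k}r_{\min}^{n}\big)^{2^k}
\iff
\varepsilon < c^{2^k/n}\, n^{-k\cdot 2^k/n}\, r_{\min}^{2^k},
% The right-hand side is positive for every n and tends to
% r_min^{2^k} > 0 as n -> infinity, so its infimum over n is positive;
% any epsilon below that infimum works for all n simultaneously.
```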
By the above, for each $\varepsilon$ and $N$ we have
$$\overline{\dim}_B\Big(\bigcap_{n>N} E'_{\varepsilon,n}\Big) \le \limsup_{n\to\infty} \frac{\log\big(|\Lambda|^{2n}(b_n/\varepsilon^n)^{(m-1)/2^k}\big)}{\log\big((b_n/\varepsilon^n)^{1/2^k}\big)} = (m-1) + \frac{2^{k+1}\log|\Lambda|}{\log(r_{\min}/\varepsilon)}.$$
The last expression tends to $m-1$ as $\varepsilon\to 0$, uniformly in $N$. Thus the same is true of $E'_\varepsilon$, and $E\subseteq E'_\varepsilon$ for all $\varepsilon$, so $E$ has packing (and Hausdorff) dimension at most $m-1$.

Proof of Theorem 1.12.
Fix $\Lambda$. For $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$ and an IFS $\Phi = \{\varphi_i\}_{i\in\Lambda} = \{r_iU_i + a_i\}_{i\in\Lambda}$, evidently
$$\Delta_{\mathbf{i},\mathbf{j}}(\Phi) = \sum_{n=1}^{\infty}\big( r_{i_1\ldots i_{n-1}}U_{i_1\ldots i_{n-1}}a_{i_n} - r_{j_1\ldots j_{n-1}}U_{j_1\ldots j_{n-1}}a_{j_n} \big)$$
(with the convention that for $n = 1$ the prefix is empty, $r_\emptyset = 1$ and $U_\emptyset = \operatorname{id}$). As a function of the parameters $(r_u, U_u, a_u)_{u\in\Lambda}$, for $\mathbf{i}\ne\mathbf{j}$ this is clearly a non-constant expression. The parametrization is trivially real-analytic, and the conclusion follows from Theorem 1.10.

Proof of Theorem 1.13.
Fix $\{U_i\}_{i\in\Lambda}\in G_0^\Lambda$ and $\{r_i\}_{i\in\Lambda}\in(0,1/2)^\Lambda$. Given distinct $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$ let $k = k(\mathbf{i},\mathbf{j})$ be the first index at which they differ. For $a = (a_u)_{u\in\Lambda}\in(\mathbb{R}^d)^\Lambda$, let $\Phi^a = \{r_uU_u + a_u\}_{u\in\Lambda}$, so
$$\Delta_{\mathbf{i},\mathbf{j}}(a) = \sum_{n\ge k(\mathbf{i},\mathbf{j})}\big( r_{i_1\ldots i_{n-1}}U_{i_1\ldots i_{n-1}}a_{i_n} - r_{j_1\ldots j_{n-1}}U_{j_1\ldots j_{n-1}}a_{j_n} \big).$$
This is linear in the $a$ variables. Differentiating with respect to the coordinates of $a_{i_k} = (a^1_{i_k},\ldots,a^d_{i_k})$, we obtain a derivative matrix of the form
$$\Big(\frac{\partial\Delta_{\mathbf{i},\mathbf{j}}}{\partial a_{i_k}}\Big) = r_{i_1\ldots i_{k-1}}U_{i_1\ldots i_{k-1}} + \sum_{n\in I} r_{i_1\ldots i_{n-1}}U_{i_1\ldots i_{n-1}} - \sum_{n\in J} r_{j_1\ldots j_{n-1}}U_{j_1\ldots j_{n-1}}, \tag{80}$$
where $I = \{n > k : i_n = i_k\}$ and $J = \{n > k : j_n = i_k\}$. Similarly, setting $I' = \{n > k : i_n = j_k\}$ and $J' = \{n > k : j_n = j_k\}$ and differentiating $\Delta_{\mathbf{i},\mathbf{j}}$ with respect to the $a_{j_k}$ variable (and using $r_{i_1\ldots i_{k-1}} = r_{j_1\ldots j_{k-1}}$ and $U_{i_1\ldots i_{k-1}} = U_{j_1\ldots j_{k-1}}$),
$$\Big(\frac{\partial\Delta_{\mathbf{i},\mathbf{j}}}{\partial a_{j_k}}\Big) = -\Big( r_{j_1\ldots j_{k-1}}U_{j_1\ldots j_{k-1}} + \sum_{n\in J'} r_{j_1\ldots j_{n-1}}U_{j_1\ldots j_{n-1}} - \sum_{n\in I'} r_{i_1\ldots i_{n-1}}U_{i_1\ldots i_{n-1}} \Big). \tag{81}$$
In order for these matrices to be invertible, it is enough that, on the right hand sides of equations (80) and (81), the norm of the sum of the last two terms is less than the norm of the first term. Let
$$R = \sum_{n\in I} r_{i_1\ldots i_{n-1}} + \sum_{n\in J} r_{j_1\ldots j_{n-1}}, \qquad R' = \sum_{n\in I'} r_{i_1\ldots i_{n-1}} + \sum_{n\in J'} r_{j_1\ldots j_{n-1}}.$$
Then
$$R + R' \le \sum_{n>k}\big( r_{i_1\ldots i_{n-1}} + r_{j_1\ldots j_{n-1}} \big) = r_{i_1\ldots i_{k-1}}\sum_{n>k}\big( r_{i_k\ldots i_{n-1}} + r_{j_k\ldots j_{n-1}} \big) < 2\,r_{i_1\ldots i_{k-1}}.$$
(In the first inequality we used that, since $i_k\ne j_k$, each $n > k$ belongs to at most one of $I, I'$ and at most one of $J, J'$; in the equality we used $r_{i_1\ldots i_{k-1}} = r_{j_1\ldots j_{k-1}}$, which holds by the choice of $k$; and the final inequality holds because $r_u < 1/2$ for every $u\in\Lambda$, so $r_{i_k\ldots i_{n-1}} + r_{j_k\ldots j_{n-1}} < 2\cdot 2^{-(n-k)}$.) Now, $R + R' < 2r_{i_1\ldots i_{k-1}}$ implies that either $R < r_{i_1\ldots i_{k-1}}$ or $R' < r_{i_1\ldots i_{k-1}}$.
In the first case, the first term in (80) is $r_{i_1\ldots i_{k-1}}$ times an orthogonal matrix, while the latter two terms together give a matrix whose norm is at most $R < r_{i_1\ldots i_{k-1}}$. Hence the sum is invertible, and $\operatorname{rank} D\Delta_{\mathbf{i},\mathbf{j}} \ge d$. The same argument applies to (81) if $R' < r_{i_1\ldots i_{k-1}}$. The conclusion now follows from Theorem 1.11.

Proof of Theorem 1.14.
Let $(\varphi_i)_{i\in\Lambda}$ be given. For $\mathbf{i}\in\Lambda^{\mathbb{N}}$ write $x_{\mathbf{i}} = \lim_{n\to\infty}\varphi_{i_1\ldots i_n}(0)$. Given distinct $\mathbf{i},\mathbf{j}\in\Lambda^{\mathbb{N}}$ and $\pi\in\Pi_{d,k}$, evidently
$$\Delta_{\mathbf{i},\mathbf{j}}(\pi) = \pi(x_{\mathbf{i}}) - \pi(x_{\mathbf{j}}) = \pi(x_{\mathbf{i}} - x_{\mathbf{j}}).$$
Now, it is easy to verify that for a fixed $0\ne v\in\mathbb{R}^d$ the map $\pi\mapsto\pi(v)$, $\Pi_{d,k}\to\mathbb{R}^k$, has rank $k$ at every point. Taking $v = x_{\mathbf{i}} - x_{\mathbf{j}}$, this shows that $\Delta_{\mathbf{i},\mathbf{j}}$ has rank $k$ at every point. An application of Theorem 1.11 completes the proof.

Proof of Theorem 1.15.
Writing $\Delta_{\mathbf{i},\mathbf{j}}(\beta,\gamma)$ explicitly and noting that it is real-analytic and non-constant, Theorem 1.15 is immediate from Theorem 1.10 (since the IFS is on the line, irreducibility is a non-issue).

Proof of Theorem 1.16.
We would again like to apply Theorem 1.10. Analyticity and non-triviality of $\Delta_{\mathbf{i},\mathbf{j}}$ is again a simple matter, but the usual presentation of the fat Sierpinski gaskets uses an IFS consisting of homotheties, which act reducibly. However, the attractors of the fat Sierpinski gaskets are invariant under rotation by $2\pi/3$ about their center of mass, and hence they can be presented also as attractors of an IFS $x\mapsto\lambda U_ix + a_i$, where the $a_i$ are the vertices of a triangle in $\mathbb{R}^2$ and the $U_i$ are rotations by $2\pi/3$. Unlike the usual presentation, this IFS is irreducible. Theorem 1.10 now does the job.

The argument in the last proof relied heavily on the possibility of presenting the attractor using an irreducible IFS. This is not always possible. For instance, if we take the fat Sierpinski gasket with the usual homothetic presentation and augment it with an additional homothety, then the symmetry breaks down and there is no irreducible presentation. In this case Theorem 1.16 no longer gives information about the set of exceptional parameters, because the set of reducible parameters is large. Some additional argument is needed in this case.

Finally, the proof of Corollary 1.7 is based on the classical fact that a polynomial expression of bounded height in a fixed set of algebraic numbers either vanishes or is bounded below by a quantity exponentially small in the degree of the polynomial. For completeness we include a proof, noting that the version in [12, Lemma 5.10] erroneously omitted the height assumption:

Lemma 6.30. Let
$A\subseteq\mathbb{R}$ be a finite set of algebraic numbers over $\mathbb{Q}$. If $x$ is a polynomial expression of degree $n$ in the elements of $A$, with integer coefficients of magnitude at most $h$, then either $x = 0$ or $|x| > c^n/h^u$, where $c, u > 0$ depend only on $A$.

Proof. Let $A = \{a_1,\ldots,a_k\}$ and let $f(x_1,\ldots,x_k)$ be an integer polynomial of degree $n$ with coefficients bounded by $h$ in absolute value. Assuming $x = f(a_1,\ldots,a_k)$ is not zero, it suffices to show that $|x| > c^n/h^u$ for some $c, u > 0$ depending only on $A$.

Let $F = \mathbb{Q}(a_1,\ldots,a_k)$ be the field over $\mathbb{Q}$ generated by the $a_i$.

We may assume that the $a_i$ are algebraic integers. This is because we can choose positive integers $p_1,\ldots,p_k$ such that $b_i = p_i\cdot a_i$ is an algebraic integer. Let $p = p_1\cdots p_k$ (note that this depends only on the $a_i$). Then $p^n\cdot f(a_1,\ldots,a_k) = g(b_1,\ldots,b_k)$, where $g$ is an integer polynomial of degree $n$ with coefficients bounded by $h\cdot p^n$. So if we have $c = c(b_1,\ldots,b_k) > 0$ such that $|g(b_1,\ldots,b_k)| > c^n/(hp^n)^u$, then $|f(a_1,\ldots,a_k)| > c^n/(h^u\cdot p^{(u+1)n})$, which is what we wanted (using the constant $c/p^{u+1}$ instead of $c$).

Assuming now that the $a_i$ are algebraic integers, let $F'$ be the normal closure of $F = \mathbb{Q}(a_1,\ldots,a_k)$ and $\Gamma = \operatorname{Gal}(F'/\mathbb{Q})$, so the fixed field of $\Gamma$ is $\mathbb{Q}$. Note that $F'$, hence $\Gamma$, depends only on the $a_i$, and $\Gamma$ is finite.

Now we do the usual thing: if $x = f(a_1,\ldots,a_k)$ is not zero then also $\prod_{s\in\Gamma} s(x)$ is non-zero, but it is both an algebraic integer and rational, so its absolute value is at least $1$. Hence
$$1 \le \prod_{s\in\Gamma} |s(x)| = |x|\cdot\prod_{s\in\Gamma\setminus\{\operatorname{id}\}} |s(x)|.$$
The last product has $|\Gamma| - 1$ factors $|s(x)|$, each of size at most $O_A(n^k)\cdot h\cdot M^n$, where $M = \max\{1, |s(a_i)| : s\in\Gamma,\, i\le k\}$ (the factor $O_A(n^k)$ accounts for the number of monomials of $f$). Dividing gives $|x| \ge \big(O_A(n^k)\,h\,M^n\big)^{-(|\Gamma|-1)}$, which is of the required form.

References

[1] J. Bourgain. On the Erdős–Volkmann and Katz–Tao ring conjectures.
Geom. Funct. Anal. , 13(2):334–365, 2003.[2] J. Bourgain. The discretized sum-product and projection theorems.
J.Anal. Math. , 112:193–236, 2010.[3] J. Bourgain and A. Gamburd. On the spectral gap for finitely-generatedsubgroups of
SU(2) . Invent. Math. , 171(1):83–121, 2008.[4] J. Bourgain, N. Katz, and T. Tao. A sum-product estimate in finite fields,and applications.
Geom. Funct. Anal. , 14(1):27–57, 2004.[5] Dave Broomhead, James Montaldi, and Nikita Sidorov. Golden gaskets:variations on the Sierpiński sieve.
Nonlinearity, 17(4):1455–1480, 2004. [6] Nicolas de Saxcé. A product theorem in simple Lie groups. preprint, 2014. http://arxiv.org/abs/1405.2003. [7] Kenneth Falconer and Xiong Jin. Exact dimensionality and projections of random self-similar measures and sets. preprint, 2014. http://arxiv.org/abs/1212.1345. [8] De-Jun Feng and Huyi Hu. Dimension theory of iterated function systems.
Comm. Pure Appl. Math., 62(11):1435–1500, 2009. [9] Harry Furstenberg. Intersections of Cantor sets and transversality of semigroups. In
Problems in analysis (Sympos. Salomon Bochner, Princeton Univ., Princeton, N.J., 1969), pages 41–59. Princeton Univ. Press, Princeton, N.J., 1970. [10] Hillel Furstenberg. Ergodic fractal measures and dimension conservation.
Ergodic Theory Dynam. Systems , 28(2):405–422, 2008.[11] Adriano M. Garsia. Arithmetic properties of Bernoulli convolutions.
Trans.Amer. Math. Soc. , 102:409–432, 1962.[12] Michael Hochman. On self-similar sets with overlaps and inverse theoremsfor entropy.
Ann. of Math. (2) , 180(2):773–822, 2014.[13] Michael Hochman and Pablo Shmerkin. Local entropy averages and pro-jections of fractal measures.
Ann. of Math. (2) , 175(3):1001–1059, 2012.[14] Michael Hochman and Boris Solomyak. On the dimension of the furstenbergmeasure for SL ( R ) -random matrix products. Inventiones Mathematicae ,2016. to appear.[15] Thomas Jordan. Dimension of fat Sierpiński gaskets.
Real Anal. Exchange ,31(1):97–110, 2005/06.[16] Thomas Jordan and Mark Pollicott. Properties of measures supported onfat Sierpinski carpets.
Ergodic Theory Dynam. Systems , 26(3):739–754,2006.[17] Antti Käenmäki, Tapio Rajala, and Ville Suomala. Existence of dou-bling measures via generalised nested cubes.
Proc. Amer. Math. Soc. ,140(9):3275–3281, 2012.[18] V. A. Ka˘ımanovich and A. M. Vershik. Random walks on discrete groups:boundary and entropy.
Ann. Probab. , 11(3):457–490, 1983.[19] Nets Hawk Katz and Terence Tao. Some connections between Falconer’sdistance set conjecture and sets of Furstenburg type.
New York J. Math. ,7:149–187 (electronic), 2001.[20] Richard Kenyon. Projecting the one-dimensional Sierpinski gasket.
IsraelJ. Math. , 97:221–238, 1997.[21] John M. Lee.
Introduction to smooth manifolds , volume 218 of
GraduateTexts in Mathematics . Springer-Verlag, New York, 2003.[22] Elon Lindenstrauss and Péter Varjú. Random walks in the groupof euclidean isometries and self-similar measures. preprint , 2014.http://arxiv.org/find/all/1/all:+AND+varju+lindenstrauss/0/1/0/all/0/1.9723] S. Łojasiewicz. Une propriété topologique des sous-ensembles analytiquesréels. In
Les Équations aux Dérivées Partielles (Paris, 1962) , pages 87–89.Éditions du Centre National de la Recherche Scientifique, Paris, 1963.[24] M. Madiman. On the entropy of sums. In
Information Theory Workshop,2008. ITW ’08. IEEE , pages 303–307, 2008.[25] Mokshay Madiman, Adam W. Marcus, and Prasad Tetali. Entropy and setcardinality inequalities for partition-determined functions.
Random Struc-tures Algorithms , 40(4):399–424, 2012.[26] Pertti Mattila.
Geometry of sets and measures in Euclidean spaces , vol-ume 44 of
Cambridge Studies in Advanced Mathematics . Cambridge Uni-versity Press, Cambridge, 1995. Fractals and rectifiability.[27] J. Neunhäuserer. Properties of some overlapping self-similar and some self-affine measures.
Acta Math. Hungar. , 92(1-2):143–161, 2001.[28] V. I. Rotar ′ . The rate of convergence in the multidimensional central limittheorem. Teor. Verojatnost. i Primenen. , 15:370–372, 1970.[29] Károly Simon and Boris Solomyak. On the dimension of self-similar sets.
Fractals , 10(1):59–65, 2002.[30] Karoly Simon and Boris Solomyak.
Self-similar and self-affine sets andmeasures . 2014. preliminary manuscript.[31] Terence Tao. Sumset and inverse sumset theory for Shannon entropy.
Com-bin. Probab. Comput. , 19(4):603–639, 2010.[32] Terence Tao and Van Vu.
Additive combinatorics , volume 105 of
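The Galois-conjugate product argument in the lemma above can be illustrated numerically in a small case. The following Python sketch is an illustration and not part of the paper: the choice A = {φ}, with φ = (1+√5)/2 the golden mean, is an assumption. Here F′ = ℚ(√5), Γ = Gal(F′/ℚ) has two elements, and the unique non-trivial conjugate φ′ = (1−√5)/2 satisfies |φ′| < 1, so the proof's bound 1/∏_{σ≠id}|σ(x)| ≥ 1/Σᵢ|φ′|^i is in fact uniform in n; this is Garsia's separation property for Pisot numbers, cf. [11]. The code works exactly in ℤ[φ] via integer pairs (a, b) ↦ a + bφ, using φ² = φ + 1.

```python
# Illustration (not from the paper): the Galois-product lower bound for
# A = {phi}, phi = (1 + sqrt 5)/2.  For x = sum eps_i * phi^i with
# eps_i in {-1, 0, 1}, the product x * x' over Gal(Q(sqrt5)/Q) equals the
# integer a^2 + a*b - b^2 (where x = a + b*phi), so |x| >= 1/|x'|.
from itertools import product
from math import sqrt

PHI = (1 + sqrt(5)) / 2
CONJ = (1 - sqrt(5)) / 2  # the non-trivial Galois conjugate of PHI, |CONJ| < 1

def phi_power(n):
    """Return integers (a, b) with phi**n == a + b*phi, using phi^2 = phi + 1."""
    a, b = 1, 0  # phi^0 = 1 + 0*phi
    for _ in range(n):
        a, b = b, a + b  # (a + b*phi) * phi = b + (a + b)*phi
    return a, b

def min_nonzero(n):
    """Smallest |sum_{i<=n} eps_i phi^i| over choices eps_i in {-1,0,1}, x != 0."""
    pows = [phi_power(i) for i in range(n + 1)]
    best = None
    for eps in product((-1, 0, 1), repeat=n + 1):
        a = sum(e * p[0] for e, p in zip(eps, pows))
        b = sum(e * p[1] for e, p in zip(eps, pows))
        if (a, b) == (0, 0):
            continue  # x = 0 exactly: phi is irrational, so a + b*phi = 0 iff a = b = 0
        val = abs(a + b * PHI)
        best = val if best is None else min(best, val)
    return best

def proved_lower_bound(n):
    """The proof's bound: |x| >= 1/|x'| and |x'| <= sum_{i<=n} |CONJ|^i."""
    return 1.0 / sum(abs(CONJ) ** i for i in range(n + 1))

for n in range(2, 9):
    assert min_nonzero(n) >= proved_lower_bound(n)
```

Because |φ′| < 1, proved_lower_bound(n) stays above 1 − |φ′| ≈ 0.38 for all n, rather than decaying like c^n; for a set A containing an algebraic number with a conjugate of modulus greater than 1, the same computation would produce the exponential decay c^n/h^u of the lemma.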