On Bernoulli Decompositions for Random Variables, Concentration Bounds, and Spectral Localization
Michael Aizenman, Francois Germinet, Abel Klein, Simone Warzel
Abstract.
As was noted already by A. N. Kolmogorov, any random variable has a Bernoulli component. This observation provides a tool for the extension of results which are known for Bernoulli random variables to arbitrary distributions. Two applications are provided here: i. an anti-concentration bound for a class of functions of independent random variables, where probabilistic bounds are extracted from combinatorial results, and ii. a proof, based on the Bernoulli case, of spectral localization for random Schrödinger operators with arbitrary probability distributions for the single site coupling constants. For a general random variable, the Bernoulli component may be defined so that its conditional variance is uniformly positive. The natural maximization problem is an optimal transport question which is also addressed here.

Contents
1. Introduction
2. Bernoulli decompositions for random variables
2.1. The decomposition in two variants
2.2. Optimality of the Pac-Man algorithm
3. Concentration Bounds
3.1. Probabilistic Sperner Estimates
3.2. Concentration Bounds for Functions of Independent Random Variables
4. An Application to Random Schrödinger Operators
Acknowledgements
References
Date: June 29, 2007.

1. Introduction
This article has a twofold purpose. As a general observation it is noted that in any random variable one may find a Bernoulli component. A decomposition based on this observation then allows one to extend results which are available for systems of Bernoulli variables by combinatorial methods to systems of random variables of arbitrary distribution.

A Bernoulli decomposition of a real-valued random variable $X$ is a representation of the form
\[ X \overset{D}{=} Y(t) + \delta(t)\,\eta \,, \tag{1.1} \]
where $Y(\cdot)$ and $\delta(\cdot) \ge 0$ are functions, $t$ is uniformly distributed in $(0,1)$, and $\eta$ is a Bernoulli random variable taking the values $\{0,1\}$ with probabilities $\{1-p, p\}$, independently of $t$. The relation in (1.1) is to be understood as expressing equality of the distributions of the corresponding random variables.

Bernoulli decompositions are constructed here for arbitrary random variables of non-degenerate distributions. For certain purposes it is useful to have positive uniform conditional variance of the Bernoulli term, i.e.,
\[ \inf_{t \in (0,1)} \delta(t) > 0 \,. \tag{1.2} \]
We present such a representation below and discuss related issues of optimality.

Two applications are mentioned here: i. anti-concentration bounds for monotone, though not necessarily linear, functions of independent random variables, and ii. a proof, based on the Bernoulli case [BK], of spectral localization for random Schrödinger operators with arbitrary probability distributions for the single site coupling constants.

In the first application, we consider functions $\Phi(X_1, \ldots, X_N)$ of independent non-degenerate random variables $\{X_j\}$ whose distributions are either identical or, in a sense explained below, are of widths greater than some common $b_X > 0$. It is shown here that if for some $\varepsilon > 0$
\[ \Phi(u + v\,e_j) - \Phi(u) > \varepsilon \tag{1.3} \]
for all $v \ge b_X$, all $u \in \mathbb{R}^N$, and $j = 1, \ldots, N$, where $e_j$ is the unit vector in the $j$-direction, then the following concentration bound applies:
\[ \sup_{x \in \mathbb{R}} \mathbb{P}\big( \{ \Phi(X_1, \ldots, X_N) \in [x, x+\varepsilon] \} \big) \;\le\; \frac{C_X}{\sqrt{N}} \,, \tag{1.4} \]
with a constant $C_X < \infty$ which depends on the uniform bounds on the distributions of $\{X_j\}$. The proof employs the Bernoulli representation along with the combinatorial bounds of Sperner [S] and the more general LYM lemma [E].

The use of combinatorial estimates for concentration bounds first appeared in the context of Bernoulli variables in P. Erdős' variant of the Littlewood-Offord Lemma [Er]. The presence of a Bernoulli component in any random variable was noted implicitly in the work of A. N. Kolmogorov [Ko], where it was put to use in an improvement of the earlier concentration bounds of W. Doeblin and P. Lévy [DoL, Do] on linear functions of independent random variables. Initially, Kolmogorov did not extract the maximal benefit from the method by not connecting it with Sperner theory; in particular, the concentration bound in [Ko] includes an unnecessary logarithmic factor. The corresponding improvement was made by B. A. Rogozin [R1].
The bounds were further improved in a series of works, in particular [Es, K, R2], where use was also made of other methods. One may note here that, perhaps quite naturally, a general method like the Bernoulli decomposition is not optimized for specific applications. Nevertheless, it has the benefit of providing a simple perspective on a number of topics.

In our second application, we establish spectral localization for a broad class of continuum, alloy-type random Schrödinger operators (cf. (4.1)), building on a result of J. Bourgain and C. Kenig [BK] for the Bernoulli case. The model and the results are presented more explicitly in Section 4. The main point to be made here is that the understanding of spectral localization for the Bernoulli case can be extended through the Bernoulli decomposition to random operators with single site coupling parameters of arbitrary distribution (cf. Theorem 4.2).

2. Bernoulli decompositions for random variables
Randomness often is in the eyes of the beholder, as probability measures are used to express averages over specified sets of rather varied nature. However, it may be true that the most elementary model underlying the basic popular perception of probability is the simple 'coin toss', with two possible outcomes, heads or tails, which is modeled by a Bernoulli random variable: a binary variable equal to 1 with probability $p$ and equal to 0 with probability $1-p$.

2.1. The decomposition in two variants.
The following statement asserts that any real-valued random variable has a Bernoulli component, which can even be chosen to be of uniformly positive variance.

Given a real random variable $X$, by default we shall denote its probability distribution by $\mu$, and let $G : (0,1) \to (-\infty, \infty)$ be the function defined by
\[ G(t) := \inf \{ u \in \mathbb{R} : \mu((-\infty, u]) \ge t \} \,. \tag{2.1} \]
One may observe that $G$ is the 'inverse' distribution function of $\mu$, which takes values in the essential range of $X$. It can alternatively be described by
\[ G(t) \le u \iff \mu((-\infty, u]) \ge t \,, \tag{2.2} \]
and satisfies $\mu((-\infty, G(t)-\varepsilon]) < t \le \mu((-\infty, G(t)])$ for all $t \in (0,1)$ and $\varepsilon > 0$.

Theorem 2.1.
Let $X$ be a non-degenerate real-valued random variable with a probability distribution $\mu$. Then, for each $p \in (0,1)$, $X$ admits a decomposition of the form
\[ X \overset{D}{=} Y_p(t) + \delta_p^+(t)\,\eta \,, \tag{2.3} \]
in the sense of equality of the corresponding probability distributions, where:

(1) $\eta$ and $t$ are independent random variables, with $\eta$ a binary variable taking the values $\{0,1\}$ with probabilities $\{1-p, p\}$, correspondingly, and $t$ having the uniform distribution in $(0,1)$,
(2) $Y_p : (0,1) \to (-\infty, \infty)$ is the monotone non-decreasing function
\[ Y_p(t) := G((1-p)\,t) \,, \tag{2.4} \]
(3) $\delta_p^+ : (0,1) \to [0, \infty)$ is the function
\[ \delta_p^+(t) := G(1-p+p\,t) - G((1-p)\,t) \,, \tag{2.5} \]
(4) for at least one value of $p \in (0,1)$ we have
\[ \beta^+(p, \mu) := \inf_{t \in (0,1)} \delta_p^+(t) > 0 \,. \tag{2.6} \]

Some explicit expressions for $\beta^+(p,\mu)$ are mentioned in Remark 2.1 below. The Bernoulli component of the measure is not a uniquely defined notion, and other representations similar to (2.3), but with different distributions for the conditional variance of the Bernoulli component, i.e., for $\delta(t)$, can also be obtained. In the following construction its uniform positivity may be lost, but one gains the feature that the range of values which $\delta$ assumes reaches up to the diameter of the support of the measure $\mu$.

Theorem 2.2.
Let $X$ be a non-degenerate real-valued random variable with probability distribution $\mu$. Then, for each $p \in (0,1)$, $X$ admits a decomposition of the form
\[ X \overset{D}{=} Y_p(t) + \delta_p^-(t)\,\eta \tag{2.7} \]
where $t$, $\eta$ and the function $Y_p$ are as in Theorem 2.1, satisfying the above (1) and (2), but instead of (3) and (4) the following holds:

(3') $\delta_p^- : (0,1) \to [0, \infty)$ is the non-increasing function
\[ \delta_p^-(t) := G(1-p\,t) - G((1-p)\,t) \,, \tag{2.8} \]
(4') for any $x_- < x_+$ and $p_\pm > 0$ such that
\[ \mathbb{P}(\{X \le x_-\}) \ge p_- \quad \text{and} \quad \mathbb{P}(\{X > x_+\}) > p_+ \,, \tag{2.9} \]
at the particular value $p = \frac{p_+}{p_- + p_+}$ we have
\[ \mathbb{P}_t\big(\{ \delta_p^-(t) > x_+ - x_- \}\big) \ge p_- + p_+ \,, \tag{2.10} \]
where the probability is with respect to the uniform random variable $t$.

In the proofs we employ two versions of what is called here the Pac-Man algorithm for the construction of a joint distribution $\rho$ of a pair of random variables of the form $\{Y_1(t), Y_2(t)\}$, whose marginal probability measures, $\rho_1$ and $\rho_2$, satisfy
\[ \mu = (1-p)\,\rho_1 + p\,\rho_2 \,. \tag{2.11} \]
The representations (2.3) and (2.7) correspond to letting
\[ Y_p(t) := Y_1(t) \,, \qquad \delta_p^\pm(t) := Y_2(t) - Y_1(t) \,. \tag{2.12} \]
The two Theorems will be proven in reverse order.

Proof of Theorem 2.2:
We start by recalling the known observation that for any continuous function $\varphi \in C(\mathbb{R})$:
\[ \int_0^1 \varphi(G(s))\, ds = \int_{\mathbb{R}} \varphi(x)\, d\mu(x) \,. \tag{2.13} \]
This relation allows one to represent $X$, in terms similar to (2.7), as
\[ X \overset{D}{=} G(t) \,, \tag{2.14} \]
with $t$ a random variable with the uniform distribution in $[0,1)$.
Extending the above representation, we now define a pair of coupled random variables through the following functions of $t \in [0,1)$:
\[ Y_1(t) := G((1-p)\,t) \,, \qquad Y_2(t) := G(1-p\,t) \,. \tag{2.15} \]
As in the case of $G$ in (2.14), the functions $Y_1$ and $Y_2$ are made into random variables by assigning to them the joint probability distribution which is induced by Lebesgue measure on $[0,1)$. For any $\varphi \in C(\mathbb{R})$:
\[ (1-p) \int_0^1 \varphi(Y_1(t))\, dt + p \int_0^1 \varphi(Y_2(t))\, dt = (1-p) \int_0^1 \varphi(G((1-p)t))\, dt + p \int_0^1 \varphi(G(1-pt))\, dt = \left[ \int_0^{1-p} + \int_{1-p}^1 \right] \varphi(G(s))\, ds = \int_{\mathbb{R}} \varphi(x)\, d\mu(x) \,, \tag{2.16} \]
where the last equality is by (2.13).

By (2.16) the random variable seen on the right side of (2.7) has the same distribution as $X$. The statement (2) readily follows from the definitions (2.15) and (2.12).

For a proof of (4') we note that (2.9) is equivalent to
\[ G(p_-) \le x_- \quad \text{and} \quad G(1-p_+) > x_+ \,. \tag{2.17} \]
This implies $\delta_p^-(t) > x_+ - x_-$ for all $t \le p_+ + p_-$, and hence (2.10) holds true. □

In the above proof, one may regard the functions $Y_1(t)$ and $Y_2(t)$ defined by (2.15) as describing the motion of a pair of markers which move along $\mathbb{R}$ consuming the $\mu$-measure at the steady rates of $(1-p)$ and $p$, correspondingly. The markers leap discontinuously over intervals of zero $\mu$-measure and, conversely, linger at points of positive mass. Their motion invokes the image of a linear version of the Pac-Man game, and hence we shall refer to the construction by this name. Whereas in the above construction the Pac-Men move towards each other, we shall next use the Pac-Man algorithm with one marker chasing the other.
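The constructions above are easy to check numerically. The following is a minimal Python sketch, in which the three-atom measure and all parameter values are our own illustrative choices (not from the paper): it implements the inverse distribution function (2.1) for a discrete measure and verifies the 'colliding Pac-Men' property (2.10) on a grid in $t$.

```python
# Inverse distribution function G of (2.1) for an illustrative discrete
# measure mu: atoms 0, 1, 3 with weights 0.3, 0.4, 0.3.
ATOMS = (0.0, 1.0, 3.0)
CUMS = (0.3, 0.7, 1.0)            # mu((-inf, atom]) at each atom

def G(t):
    """G(t) = inf{u : mu((-inf, u]) >= t}, for t in (0, 1]."""
    for atom, cum in zip(ATOMS, CUMS):
        if cum >= t:
            return atom
    return ATOMS[-1]

assert [G(t) for t in (0.2, 0.3, 0.31, 0.7, 0.95)] == [0.0, 0.0, 1.0, 1.0, 3.0]

# Check of (4')/(2.10) for the 'colliding Pac-Men' functions (2.15):
# with x_- = 0, x_+ = 1 we have P(X <= 0) = 0.3 >= p_- and
# P(X > 1) = 0.3 > p_+ for the choices p_- = 0.3, p_+ = 0.25.
p_minus, p_plus = 0.3, 0.25
p = p_plus / (p_minus + p_plus)   # the particular value of p in (4')
gap = 1.0 - 0.0                   # x_+ - x_-

n = 100000
grid = [(k + 0.5) / n for k in range(n)]
# fraction of t for which delta_p^-(t) = G(1 - p t) - G((1-p) t) > gap
frac = sum(1 for t in grid if G(1 - p * t) - G((1 - p) * t) > gap) / n
assert frac >= p_minus + p_plus   # (2.10); here frac is about 0.66
```

The grid of midpoints avoids the (measure-zero) jump points of $G$; for this measure the set $\{t : \delta_p^-(t) > x_+ - x_-\}$ turns out to have measure $0.66$, comfortably above the guaranteed $p_- + p_+ = 0.55$.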
Proof of Theorem 2.1.
For the representation (2.3) we shall employ the following variant of (2.15):
\[ Y_1(t) := G((1-p)\,t) \,, \qquad Y_2(t) := G(1-p+p\,t) \,. \tag{2.18} \]
In this case, both $Y_1$ and $Y_2$ are monotone non-decreasing in $t$, and
\[ Y_1(t) \le G(1-p) \le G(1-p+0) \le Y_2(t) \tag{2.19} \]
for all $t \in (0,1)$, where $G(1-p+0) = \lim_{\varepsilon \downarrow 0} G(1-p+\varepsilon)$. Moreover, for any $T \in (0,1)$ we have the lower bound
\[ \beta^+(p,\mu) \ge \min\{\, G(1-p) - Y_1(T) \,,\; Y_2(T) - G(1-p+0) \,\} \,, \tag{2.20} \]
since
\[ \delta_p^+(t) \ge \begin{cases} G(1-p) - Y_1(T) & \text{if } 0 < t \le T \,, \\ Y_2(T) - G(1-p+0) & \text{if } T \le t < 1 \,. \end{cases} \tag{2.21} \]
For a sufficient condition for the uniform positivity of $\delta_p^+(t) = Y_2(t) - Y_1(t)$, let us consider the arrival/departure times:
\[ T_1 = \inf\{ t \in (0,1) : Y_1(t) = G(1-p) \} \quad \text{(arrival time of } Y_1 \text{)} \,, \]
\[ T_2 = \sup\{ t \in (0,1) : Y_2(t) = G(1-p+0) \} \quad \text{(departure time of } Y_2 \text{)} \,. \]
The times $T_1$, $T_2$ are non-random and depend on $p$ and $\mu$ only. If
\[ T_1 > T_2 \,, \tag{2.22} \]
then for each $T \in (T_2, T_1)$ we have
\[ \beta^+(p,\mu) \ge \min\{\, G(1-p) - Y_1(T) \,,\; Y_2(T) - G(1-p+0) \,\} > 0 \,. \tag{2.23} \]
The collection of $p \in (0,1)$ for which (2.22) holds is not empty whenever the support of the measure includes more than one point. □
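For a discrete measure, the mixture identity (2.11) behind this proof, and the uniform positivity (2.6), can be verified exactly: the mass which each of the markers (2.18) assigns to an atom is just an interval length. A sketch, with the three-atom measure and the value of $p$ being our own illustrative choices:

```python
# Exact check, for an illustrative discrete measure (atoms 0, 1, 3 with
# weights 0.3, 0.4, 0.3), that the 'chasing Pac-Men' marginals combine
# back to mu as in (2.11): (1-p) rho_1 + p rho_2 = mu.
ATOMS = (0.0, 1.0, 3.0)
WEIGHTS = (0.3, 0.4, 0.3)
CUMS = (0.3, 0.7, 1.0)

def G(t):
    for atom, cum in zip(ATOMS, CUMS):
        if cum >= t:
            return atom
    return ATOMS[-1]

p = 0.4
# Y_1 sweeps s = (1-p)t over (0, 1-p); Y_2 sweeps s = 1-p+pt over (1-p, 1).
# The mass a marginal gives an atom is the overlap of its s-range with the
# atom's interval in the cumulative distribution, rescaled by the rate.
lo, m1, m2 = 0.0, [], []
for w in WEIGHTS:
    hi = lo + w
    m1.append((min(hi, 1 - p) - min(lo, 1 - p)) / (1 - p))
    m2.append((max(hi, 1 - p) - max(lo, 1 - p)) / p)
    lo = hi

mix = [(1 - p) * a + p * b for a, b in zip(m1, m2)]
assert all(abs(x - w) < 1e-12 for x, w in zip(mix, WEIGHTS))

# Uniform positivity (2.6) of delta_p^+ at this p, checked on a grid:
beta_grid = min(G(1 - p + p * t) - G((1 - p) * t)
                for t in ((k + 0.5) / 1000 for k in range(1000)))
assert beta_grid == 1.0     # here beta^+(p, mu) = 1 for p = 0.4
```

For this measure and $p = 0.4$ the function $\delta_p^+$ takes only the values $\{1, 2, 3\}$, so the grid minimum recovers $\beta^+(p,\mu) = 1$ exactly.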
Remark 2.1. (i) Explicit lower bounds on $\beta^+$. For the Bernoulli decomposition which is presented in Theorem 2.1 (i.e., based on the 'chasing Pac-Men' algorithm), an expression for the lower bound $\beta^+(p,\mu)$ in terms of the distribution function of $\mu$ is given in (2.34) below. A simple lower bound can be obtained in terms of just the "half-time" points for the two markers, i.e., from (2.20) with $T = \frac{1}{2}$:
\[ \beta^+(p,\mu) \ge \min\left\{ \left[ G(1-p) - G\Big(\tfrac{1-p}{2}\Big) \right] , \left[ G\Big(1-\tfrac{p}{2}\Big) - G(1-p) \right] \right\} \,. \tag{2.24} \]
This shows that for continuous measures $\mu$ one has $\beta^+(p,\mu) > 0$, i.e., (2.6), for any $p \in (0,1)$. If the support of $\mu$ consists of exactly two points, the representation (2.7) is trivially available, though at a unique value of $p \in (0,1)$. If the support of $\mu$ contains more than two points, there exists at least one $\hat{x} \in \mathbb{R}$ such that
\[ \mu((-\infty, x]) < \mu((-\infty, \hat{x}]) \ \text{ if } x < \hat{x} \,, \qquad 0 < \mu((-\infty, \hat{x})) \le \mu((-\infty, \hat{x}]) < 1 \,. \tag{2.25} \]
At the particular value $p = 1 - \mu((-\infty, \hat{x}))$ we then have $G(1-p) = \hat{x}$ and, for every $t$ such that $\mu(\{\hat{x}\})/p < t < 1$,
\[ \delta_p^+(t) \ge \min\{\, \hat{x} - G((1-p)\,t) \,,\; G(1-p+p\,t) - \hat{x} \,\} > 0 \,. \tag{2.26} \]

(ii) An alternative form.
For another form of a Bernoulli decomposition, with a binary random variable $\sigma = \pm 1$, let
\[ \sigma = 2\eta - 1 \,, \qquad W = Y_p + \tfrac{1}{2}\,\delta_p^+ \,. \tag{2.27} \]
When such a substitution is made in (2.3), the two resulting functions $W(t)$ and $\delta_p^+(t)$ are monotone non-decreasing in $t$, and $\delta_p^+(\cdot)$ is constant over each interval of constancy of $W(\cdot)$. It follows that the value of $\delta_p^+(t)$ can be expressed in terms of $W(t)$, and thus one obtains a representation of the form
\[ X \overset{D}{=} W + b(W)\,\sigma \,, \tag{2.28} \]
with $W$ and $\sigma$ independent random variables, and $b(\cdot)$ a measurable function which is determined by $\mu$ and $p$.

(iii)
Some precedents.
As was commented above, a Bernoulli decomposition in the spirit of Theorem 2.2 with $p = 1/2$ appeared already in the work of A. N. Kolmogorov [Ko]. The representation (2.3) of Theorem 2.1 is related to a somewhat similar representation (though with $\delta = 0$ permitted, i.e., without the uniform positivity (2.6)).

2.2. Optimality of the Pac-Man algorithm.
In applications of the decomposition it is desirable to maximize the conditional variance of the binary term. We shall now address related questions from an optimal transport perspective, and in particular establish optimality, in a certain limited sense, of the 'chasing Pac-Men' construction.

In addition to the explicit choices presented in Theorems 2.1 and 2.2 there are other possibilities for a Bernoulli decomposition of the form (2.3). With a change of variables as in (2.12), such representations can alternatively be expressed in terms of joint distributions of the variables $Y_1$, $Y_2$ with the properties listed in the following definition.

Definition 2.1.
A $(1-p, p)$ Bernoulli decomposition of a probability measure $\mu$ on $\mathbb{R}$ is a probability measure $\rho(dY_1\, dY_2)$ on $\mathbb{R}^2$ whose marginals $\rho_1$ and $\rho_2$ satisfy
\[ (1-p)\,\rho_1 + p\,\rho_2 = \mu \,. \tag{2.29} \]

This concept can of course be easily generalized to variables with values in $\mathbb{R}^d$, or $\mathbb{C}$. For real variables the defining condition (2.29) is conveniently expressed in terms of the distribution functions, as
\[ (1-p)\,F_1(x) + p\,F_2(x) = F(x) \,, \tag{2.30} \]
where $F(x) = \mu((-\infty, x])$, and $F_j(x) = \rho_j(\{Y_j \le x\})$ for $j = 1, 2$. For a Bernoulli decomposition $\rho$ of a measure on $\mathbb{R}$ we denote:
\[ \beta(p,\rho) := \operatorname{ess\,sup}_\rho \,(Y_2 - Y_1) \,, \qquad \beta_*(p,\rho) := \operatorname{ess\,inf}_\rho \,(Y_2 - Y_1) \,. \tag{2.31} \]

Theorem 2.3.
For any $(1-p, p)$, among all the Bernoulli decompositions of a given probability measure $\mu$ on $\mathbb{R}$:

(1) The minimal conditional variation $\beta_*(p,\rho)$ is maximized by the 'chasing Pac-Men' algorithm which is presented in the proof of Theorem 2.1, i.e., for any Bernoulli decomposition
\[ \beta_*(p,\rho) \le \beta^+(p,\mu) := \operatorname{ess\,inf}_{t \in (0,1)} \delta_p^+(t) \,, \tag{2.32} \]
where $\operatorname{ess\,inf}_{t \in (0,1)}$ yields the same value as $\inf_{t \in (0,1)}$.

(2) The maximal conditional variation $\beta(p,\rho)$ is maximized by the 'colliding Pac-Men' algorithm of Theorem 2.2, for which $\beta(p,\rho)$ equals the diameter of the essential support of $\mu$.

The equality $\operatorname{ess\,inf}_{t \in (0,1)} \delta_p^+(t) = \inf_{t \in (0,1)} \delta_p^+(t)$ is a simple consequence of the left-continuity property of the chasing Pac-Men algorithm, under which $Y_j(t) = Y_j(t-0)$ and $\delta^+(t) = \delta^+(t-0)$, so that every value of $\delta_p^+$ contributes to $\beta^+(p,\mu)$. Denoting by $F^+_j$ the distribution functions corresponding to $Y_1$ and $Y_2$ of (2.18), we have:
Lemma 2.1.
For the 'chasing Pac-Men' construction of Theorem 2.1:
\[ F^+_1(x) = \frac{1}{1-p} \min\{ F(x) \,,\; 1-p \} \,, \qquad F^+_2(x) = \frac{1}{p} \max\{ F(x) + p - 1 \,,\; 0 \} \,, \tag{2.33} \]
and
\[ \beta^+(p,\mu) = \sup\big\{ b \in \mathbb{R} : F^+_1(x) \ge F^+_2(x+b) \ \text{for all } x \in \mathbb{R} \big\} \,. \tag{2.34} \]

Proof.
The statements (2.33) follow directly from the definition of the Pac-Man process (2.18). In the derivation of (2.34) we shall use the fact that for all $t \in (0,1)$ and $\varepsilon > 0$:
\[ F^+_j(Y^+_j(t) - \varepsilon) < t \le F^+_j(Y^+_j(t)) \,. \tag{2.35} \]
Let $S := \sup\{ b \in \mathbb{R} : F^+_1(x) \ge F^+_2(x+b) \text{ for all } x \in \mathbb{R} \}$. Clearly, for any $u > S$ there is $x \in \mathbb{R}$ such that
\[ F^+_1(x) < F^+_2(x+u) \,. \tag{2.36} \]
It follows that for any $t \in (F^+_1(x), F^+_2(x+u))$:
\[ Y^+_2(t) \le x + u \quad \text{and} \quad Y^+_1(t) > x \,, \tag{2.37} \]
and therefore $\delta^+(t) = Y^+_2(t) - Y^+_1(t) \le u$. Thus $\inf_{t \in (0,1)} \delta_p^+(t) \le S$.

For the converse direction, let us note that due to the monotonicity of $F^+_2$ the condition on $b$ in (2.34) is satisfied by all $u < S$. Thus, if $u < S$, then for all $x \in \mathbb{R}$:
\[ F^+_2(x+u) \le F^+_1(x) \,, \tag{2.38} \]
and hence, for any $t \in (0,1)$ and $\varepsilon > 0$,
\[ F^+_2(Y^+_1(t) + u - \varepsilon) \le F^+_1(Y^+_1(t) - \varepsilon) < t \,, \tag{2.39} \]
which implies that $Y^+_1(t) + u \le Y^+_2(t)$. Therefore
\[ \inf_{t \in (0,1)} \big( Y^+_2(t) - Y^+_1(t) \big) \ge u \,. \tag{2.40} \]
It follows that $\inf_{t \in (0,1)} \delta_p^+(t) \ge S$, which completes the proof of (2.34). □
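Lemma 2.1 lends itself to a direct numerical check; a Python sketch for an illustrative discrete measure (our own choice of atoms and of $p$), verifying the mixture identity (2.30) for the marginals (2.33) and recovering $\beta^+$ from the formula (2.34) on a grid:

```python
# Grid check of Lemma 2.1 for an illustrative discrete measure
# (atoms 0, 1, 3 with weights 0.3, 0.4, 0.3) and p = 0.4.
def F(x):                                      # distribution function of mu
    return sum(w for a, w in zip((0.0, 1.0, 3.0), (0.3, 0.4, 0.3)) if a <= x)

p = 0.4

def F1(x):                                     # (2.33), first marginal
    return min(F(x), 1 - p) / (1 - p)

def F2(x):                                     # (2.33), second marginal
    return max(F(x) + p - 1.0, 0.0) / p

xs = [k * 0.01 for k in range(-100, 500)]
# mixture identity (2.30): (1-p) F_1^+ + p F_2^+ = F
assert all(abs((1 - p) * F1(x) + p * F2(x) - F(x)) < 1e-12 for x in xs)

# (2.34): beta^+ = sup{ b : F_1^+(x) >= F_2^+(x + b) for all x }
def holds(b):
    return all(F1(x) >= F2(x + b) - 1e-12 for x in xs)

beta = max(0.01 * k for k in range(0, 400) if holds(0.01 * k))
assert abs(beta - 1.0) < 1e-9   # matches inf_t delta_p^+(t) = 1 for this mu
```

The grid value agrees with the direct computation of $\inf_t \delta_p^+(t)$ for this measure, illustrating the equality asserted in (2.34).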
Proof of Theorem 2.3.
The second assertion is an elementary consequence of (2.8). To prove (1) we shall show that for any $b > \beta^+(p,\mu)$ it is also true that $b > \beta_*(p,\rho)$.

The condition (2.30) readily implies that $(1-p)F_1(x) \le F(x)$, i.e., $F_1(x) \le \min\{(1-p)^{-1} F(x), 1\}$, and hence
\[ F_1(x) \le F^+_1(x) \,, \qquad F_2(x) \ge F^+_2(x) \,. \tag{2.41} \]
Now, by Lemma 2.1, for any $b > \beta^+(p,\mu)$ there exist some $t, u \in \mathbb{R}$ such that $F^+_1(u) = t < F^+_2(u+b)$, and therefore, due to (2.41), also
\[ F_1(u) \le t < F_2(u+b) \,. \tag{2.42} \]
Eq. (2.42) means that $\rho(\{Y_1 \le u\}) \le t$ and $\rho(\{Y_2 > u+b\}) < 1-t$. Since the probabilities of the two events add up to less than 1, the complement of their union is of positive probability, and this implies
\[ \rho(\{Y_2 - Y_1 \le b\}) > 0 \,, \tag{2.43} \]
and hence $b > \beta_*(p,\rho)$. This concludes the proof of (2.32). □
Remark 2.2.
The idea of seeking optimal joint realizations of random variables with constrained marginals has made it possible to present a wide range of analytical results from a common 'optimal transport' perspective (see, e.g., [V]). The most familiar variants of the problem concern couplings which minimize a distance function between the two coupled variables. As our discussion demonstrates, it may also be of interest to seek couplings which maximize the difference between the two variables with constrained marginals.

3. Concentration Bounds
We shall now demonstrate how the Bernoulli decomposition yields probabilistic bounds from combinatorial results. If there is any novelty in this section, it is in the formulation of the bounds for the non-linear case, as the two main ideas were noted before in the context of linear functions: P. Erdős [Er] observed that concentration bounds for linear functions of Bernoulli variables can be derived from the combinatorial theory of E. Sperner [S], and B. A. Rogozin [R1] used the Bernoulli decomposition of A. N. Kolmogorov [Ko] for the further extension of these bounds to arbitrary random variables.

First, we present some essentially known results of Sperner theory; in the second subsection these results will be combined with the Bernoulli decomposition to yield concentration bounds for functions of independent random variables.

3.1. Probabilistic Sperner Estimates.
The configuration space $\{0,1\}^N$ for a collection of Bernoulli random variables $\eta = \{\eta_1, \ldots, \eta_N\}$ is partially ordered by the relation defined by:
\[ \eta \prec \eta' \iff \text{for all } i \in \{1, \ldots, N\}: \ \eta_i \le \eta'_i \,. \tag{3.1} \]
A set $\mathcal{A} \subset \{0,1\}^N$ is said to be an antichain if it does not contain any pair of distinct configurations which are comparable in the sense of "$\prec$". The original Sperner Lemma states that for any such set: $|\mathcal{A}| \le \binom{N}{[N/2]}$. A more general result is the LYM inequality for antichains (cf. [An]):
\[ \sum_{\eta \in \mathcal{A}} \binom{N}{|\eta|}^{-1} \le 1 \,, \tag{3.2} \]
where $|\eta| = \sum_j \eta_j$.

The LYM inequality has the following probabilistic implication.

Lemma 3.1.
Let $\{\eta_j\}$ be independent copies of a Bernoulli random variable $\eta$ with
\[ \mathbb{P}(\{\eta = 1\}) = p \,, \qquad \mathbb{P}(\{\eta = 0\}) = q := 1 - p \,, \tag{3.3} \]
where $p \in (0,1)$. Then for any antichain $\mathcal{A} \subset \{0,1\}^N$:
\[ \mathbb{P}(\{\eta \in \mathcal{A}\}) \le \frac{\Theta}{\sigma_\eta \sqrt{N}} \,, \tag{3.4} \]
where $\eta = (\eta_1, \ldots, \eta_N)$, $\sigma_\eta = \sqrt{pq}$ is the standard deviation of $\eta$, and $\Theta$ is a universal constant.

Proof. Let $\mathcal{A}_k$ be the subset of $\mathcal{A}$ consisting of configurations with $|\eta| = k$. Then:
\[ \mathbb{P}(\{\eta \in \mathcal{A}\}) = \sum_{k=0}^N p^k q^{N-k}\, |\mathcal{A}_k| = \sum_{k=0}^N b(k; N, p)\, \frac{|\mathcal{A}_k|}{\binom{N}{k}} \le \max_{k = 0,1,\ldots,N} b(k; N, p) \,, \tag{3.5} \]
where $b(k; N, p) := p^k q^{N-k} \binom{N}{k}$ is the binomial distribution, and the inequality is by (3.2). The maximum of $b(k;N,p)$ over $k$, which is known to occur near $k = pN$ (cf. [F, Theorem 1 on p. 140]), yields (3.4). □

The bound (3.4) has the virtue of being valid for all $N$; for $N \to \infty$ it holds with a smaller constant which tends to the asymptotic value $\Theta \to 1/\sqrt{2\pi}$ (implied by (3.5) and Stirling's formula).

Following is an extension of Lemma 3.1 to the case of non-identically distributed random variables.

Lemma 3.2.
Let $\eta = (\eta_1, \ldots, \eta_N)$, where $\{\eta_j\}$ are independent Bernoulli random variables with possibly different values of $p_j$, and set
\[ \alpha := \min_{j = 1,\ldots,N} \min\{ p_j \,,\; 1 - p_j \} \in (0, 1/2] \,. \tag{3.6} \]
Then, for any antichain $\mathcal{A} \subset \{0,1\}^N$:
\[ \mathbb{P}(\{\eta \in \mathcal{A}\}) \le \frac{\widetilde{\Theta}}{\alpha \sqrt{N}} \,, \tag{3.7} \]
where $\widetilde{\Theta}$ is a universal constant.

The proof gives us the chance to introduce the technique of 'double sampling'.
Proof.
We start from the observation that any Bernoulli variable $\eta$ with parameter $p_\eta$ as in (3.3) may be decomposed in terms of two independent Bernoulli variables $\chi$ and $\xi$ as
\[ \eta \overset{D}{=} \xi\, \chi \,, \tag{3.8} \]
with $p_\xi\, p_\chi = p_\eta$.

By the definition of $\alpha$, eq. (3.6), $p_j \in [\alpha, 1-\alpha]$ for all $j = 1, 2, \ldots, N$. Hence the variables $\eta$ may be represented as in (3.8) with independent identically distributed (iid) Bernoulli variables $\{\chi_j\}$ with common $p_\chi := 1 - \alpha$. We abbreviate this representation as $\xi\chi := (\xi_1 \chi_1, \ldots, \xi_N \chi_N)$. Evaluating the probability by first conditioning on the values of $\xi$, one has
\[ \mathbb{P}(\{\eta \in \mathcal{A}\}) = \mathbb{E}\big[ \mathbb{P}(\{\xi\chi \in \mathcal{A}\} \mid \xi) \big] \,. \tag{3.9} \]
For specified values of the variables $\xi$, the event $\mathcal{A}$ depends only on the values of $\chi_j$ with $j$ in the set $J_\xi := \{ j : \xi_j = 1 \}$, and as such it is an antichain in $\{0,1\}^{J_\xi}$. Bounding its conditional probability by Lemma 3.1, we obtain
\[ \mathbb{P}(\{\xi\chi \in \mathcal{A}\} \mid \xi) \le \min\left\{ 1 \,,\; \frac{\Theta}{\sigma_\chi \sqrt{|J_\xi|}} \right\} \,, \tag{3.10} \]
where $\sigma_\chi = \sqrt{\alpha(1-\alpha)}$ is the common standard deviation of the $\chi_j$.

To conclude the proof of (3.7) it remains to estimate the expected value of the right hand side of (3.10), where $|J_\xi| = \sum_{j=1}^N \xi_j$. Noting that $\mathbb{E}(\xi_j) = p_{\xi_j} = p_j/(1-\alpha) \ge \alpha/(1-\alpha)$, we see that the mean satisfies:
\[ \mathbb{E}(|J_\xi|) \ge \frac{\alpha N}{1 - \alpha} \,. \tag{3.11} \]
The event $\{ |J_\xi| \le \alpha N / (2(1-\alpha)) \}$ is of exponentially small probability, as can be seen by a standard large deviation estimate for independent variables. It then readily follows that
\[ \mathbb{E}\left( \min\left\{ 1 \,,\; \frac{\Theta}{\sigma_\chi \sqrt{|J_\xi|}} \right\} \right) \le \frac{\widetilde{\Theta}}{\alpha \sqrt{N}} \,, \tag{3.12} \]
with a constant $\widetilde{\Theta}$ for which elementary estimates yield an explicit numerical bound. □
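The combinatorial estimates of this subsection can be verified exhaustively for small $N$; a Python sketch (the choices $N = 4$ and $p = 0.3$ are our own) checking the LYM inequality (3.2) and the key inequality of (3.5) over every antichain of $\{0,1\}^4$:

```python
from itertools import product
from math import comb, sqrt

# Brute-force check over ALL antichains of {0,1}^N for N = 4:
# LYM (3.2): sum_{eta in A} 1/binom(N, |eta|) <= 1, and the step in (3.5):
# P(eta in A) <= max_k b(k; N, p).
N, p = 4, 0.3
q = 1 - p
cube = list(product((0, 1), repeat=N))

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

# comp[i]: bitmask of indices j != i comparable with i under the order (3.1)
comp = [sum(1 << j for j in range(len(cube))
            if j != i and (leq(cube[i], cube[j]) or leq(cube[j], cube[i])))
        for i in range(len(cube))]

max_binom = max(comb(N, k) * p**k * q**(N - k) for k in range(N + 1))
worst_lym = 0.0
for mask in range(1 << len(cube)):            # every subset of {0,1}^4
    members = [i for i in range(len(cube)) if mask >> i & 1]
    if any(mask & comp[i] for i in members):
        continue                              # contains a comparable pair
    lym = sum(1 / comb(N, sum(cube[i])) for i in members)
    worst_lym = max(worst_lym, lym)
    prob = sum(p**sum(cube[i]) * q**(N - sum(cube[i])) for i in members)
    assert prob <= max_binom + 1e-12          # the inequality in (3.5)

assert abs(worst_lym - 1.0) < 1e-9            # LYM is sharp (full layers)
# and max_k b(k; N, p) has the 1/(sigma sqrt(N)) size appearing in (3.4):
assert max_binom <= 1.0 / (sqrt(p * q) * sqrt(N))
```

The worst LYM sum equals $1$, attained by full layers, which is the sharpness behind Sperner's Lemma.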
The above notions and results have natural extensions to integer-valued independent random variables, $\tau = (\tau_1, \tau_2, \ldots, \tau_N)$, whose configuration space, $\mathbb{Z}^N$, is also partially ordered by the natural extension of the relation (3.1). The Bernoulli decomposition (2.7) can be used for an extension of the probabilistic bound of Lemma 3.2 to this more general case. One way to derive the general statement is through the application of the bound (3.7) to the conditional probability for the Bernoulli component, as in the arguments which appear below. Alternatively, one may note that the statement directly follows from Theorem 3.1, which is presented in the next section.

For completeness it should be added that in addition to the anti-concentration upper bounds it is of interest to know the asymptotic behavior. That is covered by known results, such as is presented in Engel [E, Theorem 7.2.1]:
\[ \lim_{N \to \infty} \sigma_\mu \sqrt{2\pi N} \left\{ \max_{\mathcal{A} \subset \{0,1,\ldots,k\}^N \text{ antichain}} \mathbb{P}(\{\tau \in \mathcal{A}\}) \right\} = 1 \,, \tag{3.13} \]
which amounts to a 'local' central limit theorem (CLT).

3.2. Concentration Bounds for Functions of Independent Random Variables.

We shall now employ the Bernoulli decomposition of Section 2, along with the results presented in the previous subsection, for an upper bound on the concentration probability
\[ Q_Z(\xi) := \sup_{x \in \mathbb{R}} \mathbb{P}(\{ Z \in [x, x+\xi] \}) \tag{3.14} \]
for random variables of the form
\[ Z = \Phi(X_1, X_2, \ldots, X_N) \,, \tag{3.15} \]
where $\{X_j\}$ are independent random variables.

Theorem 3.1.
Let $X = (X_1, \ldots, X_N)$ be a collection of independent random variables whose distributions satisfy, for all $j \in \{1, \ldots, N\}$:
\[ \mathbb{P}(\{X_j \le x_-\}) \ge p_- \quad \text{and} \quad \mathbb{P}(\{X_j > x_+\}) > p_+ \tag{3.16} \]
at some $p_\pm > 0$ and $x_- < x_+$, and let $\Phi : \mathbb{R}^N \to \mathbb{R}$ be a function such that for some $\varepsilon > 0$
\[ \Phi(u + v\,e_j) - \Phi(u) > \varepsilon \tag{3.17} \]
for all $v \ge x_+ - x_-$, all $u \in \mathbb{R}^N$, and $j = 1, \ldots, N$, with $e_j$ the unit vector in the $j$-direction. Then the random variable $Z$ which is defined by (3.15) obeys the concentration bound
\[ Q_Z(\varepsilon) \le \frac{\widetilde{\Theta}}{\sqrt{N}} \sqrt{\frac{1}{p_+} + \frac{1}{p_-}} \,, \tag{3.18} \]
where the constant can be taken proportional to the constant $\widetilde{\Theta}$ of (3.7).

Proof.
We start by selecting $p \in (0,1)$ by the condition $p = \frac{p_+}{p_+ + p_-}$. Next, we represent the variables $\{X_j\}$ using Theorem 2.2:
\[ X \overset{D}{=} Y(t) + \delta(t)\,\eta := \big( Y_{p,1}(t_1) + \delta^-_{p,1}(t_1)\,\eta_1 \,,\; \ldots \,,\; Y_{p,N}(t_N) + \delta^-_{p,N}(t_N)\,\eta_N \big) \,, \tag{3.19} \]
with $\eta = (\eta_1, \ldots, \eta_N)$ a collection of iid Bernoulli variables taking values $\{0,1\}$ with probabilities $\{1-p, p\}$. From (2.10) one may conclude that for all $j \in \{1, \ldots, N\}$:
\[ \mathbb{P}_t\big(\{ \delta^-_{p,j}(t) \ge x_+ - x_- \}\big) \ge p_+ + p_- \,. \tag{3.20} \]
We express the probability of the event $\{Z \in [x, x+\varepsilon]\}$ through first conditioning on the $\{t_j\}$ variables. For all $x \in \mathbb{R}$:
\[ \mathbb{P}(\{Z \in [x, x+\varepsilon]\}) = \mathbb{E}\big[ \mathbb{P}\big( \mathcal{A}_t \mid t \big) \big] \tag{3.21} \]
where
\[ \mathcal{A}_t := \big\{ \eta \in \{0,1\}^N : \Phi(Y(t) + \delta(t)\,\eta) \in [x, x+\varepsilon] \big\} \,. \tag{3.22} \]
By virtue of (3.17), the set $\mathcal{A}_t$ is an antichain in its dependence on $\{\eta_j\}_{j \in J_t}$, with $J_t := \{ j : \delta_j(t_j) \ge x_+ - x_- \}$. Lemma 3.1 thus yields
\[ \mathbb{P}\big( \mathcal{A}_t \mid t, \{\eta_j\}_{j \notin J_t} \big) \le \min\left\{ 1 \,,\; \frac{\Theta}{\sigma_\eta \sqrt{|J_t|}} \right\} \tag{3.23} \]
with $\sigma_\eta = \sqrt{p(1-p)}$. We conclude by the large-deviation argument used in the proof of Lemma 3.2. Using (3.20), the expected value of $|J_t| = \sum_{j=1}^N \mathbb{1}[\delta_j(t_j) \ge x_+ - x_-]$ is bounded below:
\[ \mathbb{E}(|J_t|) \ge (p_+ + p_-)\, N \,. \tag{3.24} \]
Therefore $\{ |J_t| \le (p_+ + p_-) N / 2 \}$ is a large deviation event and its probability is exponentially bounded. Elementary estimates lead to
\[ \mathbb{E}\left( \min\left\{ 1 \,,\; \frac{\Theta}{\sigma_\eta \sqrt{|J_t|}} \right\} \right) \le \frac{\widetilde{\Theta}}{\sqrt{N}} \sqrt{\frac{1}{p_+} + \frac{1}{p_-}} \,, \tag{3.25} \]
with a constant $\widetilde{\Theta}$ of the same nature as in (3.7). □
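The $1/\sqrt{N}$ decay asserted by Theorem 3.1 can be observed in simulation. A hedged Monte Carlo sketch, with the monotone function $\Phi(u) = u_1 + \cdots + u_N$ and iid variables uniform on $\{0,1,2\}$ being our own illustrative choices:

```python
import random
from collections import Counter

# Monte Carlo illustration of the ~1/sqrt(N) decay in (3.18), for the
# monotone function Phi(u) = u_1 + ... + u_N of iid variables uniform
# on {0, 1, 2} (so Z is integer valued and unit windows catch one value).
random.seed(1)
N, samples = 100, 30000
counts = Counter(sum(random.randrange(3) for _ in range(N))
                 for _ in range(samples))
q_hat = max(counts.values()) / samples   # empirical Q_Z at unit scale

# Q_Z is of order 1/(sigma sqrt(N)); here it comes out to about 0.05,
# well below the crude 1/sqrt(N) = 0.1 threshold we test against.
assert q_hat <= 1.0 / N**0.5
```

The empirical maximum window probability is roughly $1/(\sqrt{2\pi \cdot \mathrm{Var}(X_1)\, N})$, in line with the local CLT remark (3.13).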
Remark 3.2. (i)
A simpler proof for iid variables.
For iid non-degenerate random variables $X_1, \ldots, X_N$ the theorem has a simpler proof using the binary decomposition of Theorem 2.1; there is no need for the large deviation argument. The constants in the theorem will then depend on the value of $p$ and its corresponding lower bound in (2.6).

(ii) The linear case.
For linear functions,
\[ Z = \Phi(X_1, \ldots, X_N) = \sum_{j=1}^N X_j \,, \tag{3.26} \]
concentration inequalities as in (3.18) go back to W. Doeblin, P. Lévy [DoL, Do], P. Erdős [Er] (for the Bernoulli case, where it reduces to the Littlewood-Offord problem), A. N. Kolmogorov [Ko], B. A. Rogozin [R1], H. Kesten [K] and C. G. Esseen [Es]. In this case, sharper inequalities than (3.18) are known, e.g. [R3],
\[ Q_Z(\varepsilon) \le \Theta\, \varepsilon \left[ \sum_{j=1}^N \varepsilon_j^2 \big( 1 - Q_{X_j}(\varepsilon_j) \big) \right]^{-1/2} \,, \tag{3.27} \]
where $\Theta$ is some constant. A recent application of the discrete case of the concentration bounds is found in [TV].

(iii)
An extension.
As is already true for (3.27), the statement of Theorem 3.1 has an immediate extension to functions which in some variables are monotone increasing and in some are monotone decreasing, satisfying the natural analog of (3.17). For this extension, one only needs to replace $p_+$ and $p_-$ in (3.18) by $\hat{p} = \min\{p_+, p_-\}$.

(iv) Sperner bounds from concentration inequalities.
In the proof of Theorem 3.1, concentration bounds were deduced from the probabilistic Sperner estimate (3.4). For antichains in the multiset $S = \{0, 1, \ldots, K\}^N$ the implication can also be established in the opposite direction. For that, one may use the fact that in such a multiset, for any antichain $\mathcal{A}$ there is a function $\Phi : S \to \mathbb{R}$ which satisfies the 'representation condition' (in the terminology of [E])
\[ \Phi(u + e_j) - \Phi(u) \ge 1 \,, \quad \text{with } \Phi(u) = 0 \text{ if and only if } u \in \mathcal{A} \,. \]

4. An Application to Random Schrödinger Operators
As a demonstration of possible uses of the elementary observations which are made in this article, let us present the case of spectral localization under a random iid single site potential with an arbitrary probability distribution.

The (continuum) Anderson Hamiltonian is the random Schrödinger operator given by
\[ H_\omega = -\Delta + V_\omega \quad \text{on } L^2(\mathbb{R}^d) \,, \tag{4.1} \]
with
\[ V_\omega(x) = \sum_{\xi \in \mathbb{Z}^d} \omega_\xi\, u(x - \xi) \,, \tag{4.2} \]
where

(1) $u(\cdot)$, the single site potential, is a nonnegative bounded measurable function on $\mathbb{R}^d$ with compact support, uniformly bounded away from zero in a neighborhood of the origin,
(2) $\omega = \{\omega_\xi\}_{\xi \in \mathbb{Z}^d}$ is a family of independent identically distributed random variables, whose common probability distribution $\mu$ satisfies $\{0, M\} \subset \operatorname{supp} \mu \subset [0, M]$, for some $M > 0$.

The random operator $H_\omega$ is a function of $\omega$, and as such it is defined over a probability space which is invariant under the ergodic action of the group of $\mathbb{Z}^d$ translations. The induced maps on this operator-valued function are implemented by unitary translations.

Ergodicity considerations carry the implication that there exist fixed subsets of $\mathbb{R}$ so that the spectrum of the self-adjoint operator $H_\omega$, as well as its pure point (pp), absolutely continuous (ac), and singular continuous (sc) components, are equal to these fixed sets with probability one (cf. [P, KuS, KiM]). In the case of the random potential (4.2), the positivity of $u(\cdot)$ and the support properties of $\mu$ imply that
\[ \sigma(H_\omega) \overset{\text{a.s.}}{=} [0, \infty) \,. \tag{4.3} \]

Although definitions of localization may come in several flavors, they all include (or imply) spectral localization (i.e., pure point spectrum), as given in the following definition.
Definition 4.1.
A self-adjoint operator $H$ on $L^2(\mathbb{R}^d)$ is said to exhibit spectral localization in a closed interval $I \subset \mathbb{R}$ if $\sigma(H) \cap I \ne \emptyset$ and the corresponding spectral projection $P_I(H)$ is given by a countable sum of orthogonal projections on proper eigenspaces.

This property is clearly invariant under translations. The defining condition is equivalent to the requirement that for a spanning set of vectors the spectral measure is pure-point within $I$. The set of $\omega$ for which this holds for the random operator $H_\omega$ is known to be measurable.

In the one-dimensional case the continuous Anderson Hamiltonian has long been known to exhibit spectral localization in the whole real line for any non-degenerate $\mu$, i.e., when the random potential is not constant [GoMP, DSS]. In the multi-dimensional case, localization at the bottom of the spectrum is already known in great, but nevertheless not all-inclusive, generality; cf. [St, Kl, BK] and references therein. The Bernoulli decomposition presented here allows one to prove localization for general non-degenerate single site distributions $\mu$.

More explicitly, the simplest case to deal with, for the different approaches which yield proofs of localization, has been when the single site probability distribution is absolutely continuous with bounded derivative. The absolute continuity condition can be relaxed to Hölder continuity of $\mu$, both in the approach based on the multiscale analysis which was introduced in [FrS] and is discussed in [Kl], and in the one based on the fractional moment method of [AM, AE+]. (The basis in the former case is an improved analysis of the Wegner estimate, which can be found in [St, CHK].) However, techniques relying on the regularity of $\mu$ seem to reach their limit with log-Hölder continuity. In particular, until recently the Bernoulli random potential had been beyond the reach of analysis in more than one dimension.
For that extreme case, i.e., $H_\omega$ with $\mu\{0\} = \mu\{1\} = \tfrac{1}{2}$, localization at the bottom of the spectrum was recently proven by Bourgain and Kenig [BK]. A crucial step in the analysis of [BK] is the estimation of the probabilities of energy resonances using Sperner's Lemma, i.e., the $p = \tfrac{1}{2}$ version of (3.4).

The point which we would like to make here is that the Bernoulli decomposition of random variables enables one to turn this result of Bourgain and Kenig [BK] into a tool for a general proof of localization at the edge of the spectrum for arbitrary non-degenerate $\mu$.

First, the Bourgain--Kenig analysis [BK] needs to be extended to Schrödinger operators which incorporate an additional background potential $U \in L^\infty(\mathbb{R}^d)$, and for which the variances of the Bernoulli terms are uniformly positive, though not necessarily uniform. More explicitly, the class is broadened to include operators of the form
$$H_\eta = -\Delta + U(x) + \sum_{\xi \in \mathbb{Z}^d} \eta_\xi \, b_\xi \, u(x - \xi), \tag{4.4}$$
where $u(\cdot)$ is as in (4.2), satisfying the above condition (1), but instead of (2):
(2') $\eta = \{\eta_\xi\}_{\xi \in \mathbb{Z}^d}$ are iid Bernoulli random variables taking the values $\{0, 1\}$ with probabilities $\{1-p, p\}$, and the coefficients $\{b_\xi\}_{\xi \in \mathbb{Z}^d}$ satisfy
$$0 < b_- \le b_\xi \le b_+ < \infty \quad \text{for all } \xi \in \mathbb{Z}^d, \tag{4.5}$$
and
(3) $U \in L^\infty(\mathbb{R}^d)$ satisfies $0 \le U(x) \le U_+ < \infty$ for all $x \in \mathbb{R}^d$.

Due to the presence of the background potential $U$, the spectrum of $H_\eta$ need not be deterministic, i.e., equal to some fixed set with probability one. For our main purpose it would suffice to restrict attention to $U$ for which the spectrum of $H_\eta$ is almost surely $[0, \infty)$. Such a restriction is not included in the following statement; instead there is a caveat in the conclusion.

The extended BK result, whose proof is presented in [GK], is:

Theorem 4.1.
Given a function $u(\cdot)$ as above, and $p \in (0,1)$, $b_\pm > 0$ and $U_+ < \infty$, there exists $E > 0$ such that any random operator $H_\eta$ of the form (4.4), satisfying conditions (1), (2') and (3), for otherwise arbitrary external potential $U$, with probability one either exhibits spectral localization in $[0, E]$ or satisfies $\sigma(H_\eta) \cap [0, E] = \emptyset$.

Theorem 2.1 now allows us to deduce the following general statement (Theorem 4.2) from the above non-trivial Bernoulli result.
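The deduction from the Bernoulli case rests on a decomposition $X \overset{\mathcal{D}}{=} Y(t) + \delta(t)\,\eta$ as in (1.1). The numerical sketch below is ours, not the paper's: it uses the standard quantile-function construction for $Y$ and $\delta$ and an Exp(1) distribution as an arbitrary illustrative choice of $\mu$; the functions $Y_+$, $\delta^+_p$ of (2.12) are a variant of this construction designed to also ensure the uniform positivity (1.2).

```python
import numpy as np

# Sketch (ours) of a Bernoulli decomposition  X =_d Y(t) + delta(t) * eta,
# with t ~ Uniform(0,1) and eta ~ Bernoulli(p) independent, realized through
# the quantile function Finv of the distribution of X:
#   Y(t) = Finv((1-p) t),   delta(t) = Finv(1 - p + p t) - Finv((1-p) t).

def finv_exp(u):
    # quantile function of Exp(1); an arbitrary illustrative choice of mu
    return -np.log1p(-u)

def bernoulli_decomposition(finv, p):
    Y = lambda t: finv((1.0 - p) * t)
    delta = lambda t: finv(1.0 - p + p * t) - finv((1.0 - p) * t)
    return Y, delta

p = 0.3
Y, delta = bernoulli_decomposition(finv_exp, p)

t_grid = np.linspace(1e-9, 1.0 - 1e-9, 1001)
assert np.all(delta(t_grid) >= 0.0)  # delta >= 0, by monotonicity of Finv

# Equality in distribution: splitting u in (0,1) at 1-p and rescaling,
#   u < 1-p : t = u/(1-p), eta = 0;   u >= 1-p : t = (u-(1-p))/p, eta = 1,
# recovers Finv(u), i.e., Y(t) + delta(t)*eta has the same distribution as X.
u_grid = np.linspace(1e-9, 1.0 - 1e-9, 1001)
eta = (u_grid >= 1.0 - p).astype(float)
t = np.where(eta == 0.0, u_grid / (1.0 - p), (u_grid - (1.0 - p)) / p)
assert np.allclose(Y(t) + delta(t) * eta, finv_exp(u_grid))
```

For this choice of $\mu$ one also has $\inf_t \delta(t) = F^{-1}(1-p) > 0$, i.e., the conditional variance of the Bernoulli term is uniformly positive, which is the property (1.2) needed below.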
Theorem 4.2.
Let $H_\omega = -\Delta + V_\omega$ be a Schrödinger operator with the random potential given by (4.2), satisfying the above conditions (1) and (2). Then for some $E > 0$ the operator $H_\omega$, with probability one, exhibits spectral localization in $[0, E]$.

Proof. The Bernoulli decomposition (2.3) allows us to write the coefficients of the random potential in the form
$$\omega \overset{\mathcal{D}}{=} \big\{ Y_+(t_\xi) + \delta^+_p(t_\xi)\, \eta_\xi \big\}_{\xi \in \mathbb{Z}^d}, \tag{4.6}$$
with $t = \{t_\xi\}_{\xi \in \mathbb{Z}^d}$ a family of independent random variables which are uniformly distributed in $(0,1)$, $Y_+$ and $\delta^+_p$ the functions defined in (2.12) in terms of the distribution function of $\mu$, and $\eta = \{\eta_\xi\}_{\xi \in \mathbb{Z}^d}$ a family of iid Bernoulli variables, independent of $t$, which take values in $\{0,1\}$ with probabilities $\{1-p, p\}$ for some $p \in (0,1)$ such that (2.6) holds.

As a consequence, the random operator can be written as
$$H_\omega \overset{\mathcal{D}}{=} -\Delta + U_t + V_{t,\eta} =: H_{t,\eta}, \tag{4.7}$$
where
$$U_t(x) := \sum_{\xi \in \mathbb{Z}^d} Y_+(t_\xi)\, u(x-\xi) \quad \text{and} \quad V_{t,\eta}(x) := \sum_{\xi \in \mathbb{Z}^d} \delta^+_p(t_\xi)\, \eta_\xi\, u(x-\xi), \tag{4.8}$$
and the following bounds hold:
$$0 \le U_t(x) \le U_+ := M \Big\| \sum_{\xi \in \mathbb{Z}^d} u(\cdot - \xi) \Big\|_\infty < \infty, \qquad 0 < b_- := \inf_{t \in (0,1)} \delta^+_p(t) \le b_+ := M < \infty. \tag{4.9}$$
This implies that, when conditioned on the values of $t$, the operator $H_{t,\eta}$ is of the form (4.4), with $p$, $U_+$ and $b_\pm$ independent of $t$. Thus, by Theorem 4.1, there exists $E > 0$ such that for each $t$, with probability one, $H_{t,\eta}$ either exhibits spectral localization or has no spectrum in $[0, E]$. However, the latter alternative is excluded (almost surely, also with respect to the conditional probability) by (4.3) and Fubini. $\Box$

Remark 4.1.
In addition to spectral localization it is also of interest to establish the existence of a uniform localization length, i.e., to prove that all eigenfunctions $\phi$ of $H_\omega$ with eigenvalue in $[0, E]$ satisfy
$$\int_{|x-y| \le 1} |\phi(y)|^2 \, dy \;\le\; C_\phi \, e^{-|x|/\ell} \quad \text{for all } x \in \mathbb{R}^d. \tag{4.10}$$
This can be accomplished in the following two ways, for which the details are presented in [GK].

To establish a uniform localization length under the hypotheses of Theorem 4.2, one may use the Bernoulli decomposition (4.6) before performing the multiscale analysis which is behind the proof of Theorem 4.1. The multiscale analysis is then executed for the random Schrödinger operator $H_{t,\eta}$ in (4.7), in such a way that all events in the analysis are jointly measurable in $t$ and $\eta$.

An alternative proof of Theorem 4.2, which also yields a uniform localization length, can be based on the concentration bound of Theorem 3.1. Namely, the Bourgain--Kenig proof can be extended to arbitrary single-site probability distributions $\mu$, with the probabilities of energy resonance estimated by the concentration bound instead of by Sperner's Lemma as in [BK] (see [GK]).

Acknowledgements
We thank the Oberwolfach center for hospitality at a meeting where the four-way collaboration started, and the Isaac Newton Institute where some of the work was done. We also thank B. Sudakov for an instructive review of recent results in Sperner theory, S. Molchanov for alerting us to relevant references, and M. Cranston for many helpful discussions of results in probability theory. This work was supported in part by NSF grants DMS-0602360 (MA), DMS-0457474 (AK) and DMS-0701181 (SW), and by a Rothschild Fellowship at INI (MA).
References

[AE+] Aizenman, M., Elgart, A., Naboko, S., Schenker, J., Stolz, G.: Moment analysis for localization in random Schrödinger operators. Invent. Math., 343–413 (2006)
[AM] Aizenman, M., Molchanov, S.: Localization at large disorder and at extreme energies: an elementary derivation. Commun. Math. Phys., 245–278 (1993)
[An] Anderson, I.: Combinatorics of finite sets. Corrected reprint of the 1989 edition. Dover Publications Inc., Mineola, NY, 2002
[BK] Bourgain, J., Kenig, C.: On localization in the continuous Anderson-Bernoulli model in higher dimension. Invent. Math., 389–426 (2005)
[CHK] Combes, J. M., Hislop, P. D., Klopp, F.: An optimal Wegner estimate and its application to the continuity of the integrated density of states for random Schrödinger operators. Preprint math-ph/0605030
[DSS] Damanik, D., Sims, R., Stolz, G.: Localization for one dimensional, continuum, Bernoulli-Anderson models. Duke Math. J., 59–100 (2002)
[Do] Doeblin, W.: Sur les sommes d'un grand nombre de variables aléatoires indépendantes. Bull. Sci. Math.; and: Sur les sommes de variables aléatoires indépendantes à dispersions bornées inférieurement. C.R. Acad. Sci. (Calcul des probabilités)
[En] Engel, K.: Sperner theory. Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1997
[Er] Erdős, P.: On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc., 898–902 (1945)
[Es] Esseen, C. G.: On the concentration function of a sum of independent random variables. Z. Wahrscheinlichkeitstheorie verw. Geb., 290–308 (1968)
[F] Feller, W.: An introduction to probability theory and its applications, Vol. I, 2nd ed. John Wiley and Sons, Inc., New York, 1957
[FrS] Fröhlich, J., Spencer, T.: Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Commun. Math. Phys., 151–184 (1983)
[GK] Germinet, F., Klein, A.: In preparation
[GoMP] Gol'dsheid, Ya., Molchanov, S., Pastur, L.: Pure point spectrum of stochastic one dimensional Schrödinger operators. Funct. Anal. Appl., 1–10 (1977)
[K] Kesten, H.: A sharper form of the Doeblin-Lévy-Kolmogorov-Rogozin inequality for concentration functions. Math. Scand., 133–144 (1969)
[KiM] Kirsch, W., Martinelli, F.: On the ergodic properties of the spectrum of general random operators. J. Reine Angew. Math., 141–156 (1982)
[Kl] Klein, A.: Multiscale analysis and localization of random operators. In: Random Schrödinger operators: methods, results, and perspectives. Panoramas et Synthèses, Société Mathématique de France. To appear
[Ko] Kolmogorov, A. N.: Sur les propriétés des fonctions de concentration de M. P. Lévy. Ann. Inst. H. Poincaré, 27–34 (1958–60)
[KuS] Kunz, H., Souillard, B.: Sur le spectre des opérateurs aux différences finies aléatoires. Commun. Math. Phys., 201–246 (1980)
[M] McDonald, D.: A local limit theorem for large deviations of sums of independent, non-identically distributed random variables. Ann. Probab., 526–531 (1979)
[P] Pastur, L.: Spectral properties of disordered systems in the one-body approximation. Commun. Math. Phys., 179–196 (1980)
[R1] Rogozin, B. A.: An estimate for concentration functions. Theory Probab. Appl., 94–97 (1961); Russian original: Teoriya Veroyatnost. i Prilozhen.
[R2] Rogozin, B. A.: An integral-type estimate for concentration functions of sums of independent random variables. Dokl. Akad. Nauk SSSR, 1067–1070 (1973) [in Russian]
[R3] Rogozin, B. A.: Concentration function of a random variable. In: Encyclopaedia of Mathematics, M. Hazewinkel (ed.), Springer, 2002
[S] Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Zeit., 544–548 (1928)
[St] Stollmann, P.: Caught by disorder: bound states in random media. Birkhäuser, Boston, 2001
[TV] Tao, T., Vu, V.: On random ±1 matrices: singularity and determinant. Random Structures and Algorithms, 1–23 (2006)
[V] Villani, C.: Topics in Optimal Transportation. AMS, 2003

(Aizenman) Princeton University, Departments of Mathematics and Physics, Princeton, NJ 08544, USA
E-mail address: [email protected]

(Germinet) Université de Cergy-Pontoise, Laboratoire AGM, UMR CNRS 8088, Département de Mathématiques, Site de Saint-Martin, 2 avenue Adolphe Chauvin, 95302 Cergy-Pontoise cedex, France

E-mail address: [email protected]

(Klein) University of California, Irvine, Department of Mathematics, Irvine, CA 92697-3875, USA

E-mail address: [email protected]

(Warzel) Princeton University, Department of Mathematics, Princeton, NJ 08544, USA
E-mail address: