Equivalence of concentration inequalities for linear and non-linear functions
T. J. SULLIVAN AND H. OWHADI
Abstract.
We consider a random variable X that takes values in a (possibly infinite-dimensional) topological vector space 𝒳. We show that, with respect to an appropriate “normal distance” on 𝒳, concentration inequalities for linear and non-linear functions of X are equivalent. This normal distance corresponds naturally to the concentration rate in classical concentration results such as Gaussian concentration and concentration on the Euclidean and Hamming cubes. Under suitable assumptions on the roundness of the sets of interest, the concentration inequalities so obtained are asymptotically optimal in the high-dimensional limit.

Key words and phrases: concentration of measure, large deviations, quasiconvexity, normal distance.

The authors acknowledge portions of this work supported by the United States Department of Energy National Nuclear Security Administration under Award Number DE-FC52-08NA28613 through the California Institute of Technology’s ASC/PSAAP Center for the Predictive Modeling and Simulation of High Energy Density Dynamic Response of Materials.

1. Introduction
It is by now almost classical that smooth enough convex functions enjoy good concentration properties; see e.g. [15, 18, 22, 23] for surveys of the literature. It is also known that convexity can be neglected in the Gaussian case and that the smoothness assumptions are not essential and can be replaced, for instance, with bounded martingale differences; see e.g. [20, 21] and also [29].

A common feature of many concentration results is that an appropriate notion of distance is needed, e.g. Talagrand’s convex distance [27]. In this paper, a notion of “normal distance” on a topological vector space 𝒳 is introduced through a technique commonly used in large deviations theory, Chernoff bounding, i.e. estimating the measure of a set by using a containing half-space. Although simple, this method leads to a notion of distance that is in some sense “natural” with respect to the duality structure on 𝒳. Remarkably, with respect to this distance, concentration inequalities on the tails of linear, convex, quasiconvex and non-linear functions on 𝒳 are mutually equivalent.

Concentration of measure is based on a simple but non-trivial observation originally due to Lévy [17]: in a high-dimensional probability space, “nearly all” the probability mass lies close to any set with measure at least 1/2; put another way, functions of many independent variables with small sensitivity to each individual input are very nearly constant. A typical concentration inequality is of the form

P[ |f(X) − m| ≥ r ] ≤ C₁ exp(−C₂ r²),  (1.1)
where f is a suitably well-behaved function, X is a random variable such that the push-forward measure (f ∘ X)∗P has some concentration property, and m is either the mean value E[f(X)] or the median value M[f(X)]; sometimes the control is one-sided, and the absolute value in (1.1) is omitted. A notable feature of this paper is that it provides concentration inequalities with m = f(E[X]).

The key property of the normal distance of this paper is contained in the following portmanteau theorem for the equivalence of various concentration inequalities with respect to normal distance:

Theorem 1.1.
Let 𝒳 be a real topological vector space and 𝒳* its continuous dual space. Let Ψ: 𝒳* → [0, +∞] be positively homogeneous of degree one. Define the Ψ-normal distance from x ∈ 𝒳 to A ⊆ 𝒳 by

d_{⊥,Ψ}(x, A) := sup{ (⟨ν, x − p⟩)₊ / Ψ(ν) | p ∈ 𝒳 and ν ∈ 𝒳* such that, for all a ∈ A, ⟨ν, a⟩ ≤ ⟨ν, p⟩ },

with the convention that 0/0 := 0. Then the following statements about any random variable X that takes values in 𝒳 are equivalent:

(i) for every closed half-space H_{p,ν} := { x ∈ 𝒳 | ⟨ν, x − p⟩ ≤ 0 } ⊆ 𝒳, where p ∈ 𝒳 and ν ∈ 𝒳*,
  P[X ∈ H_{p,ν}] ≤ exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 );
(ii) for every convex set K ⊆ 𝒳,
  P[X ∈ K] ≤ exp( −d_{⊥,Ψ}(E[X], K)² / 2 );
(iii) for every measurable A ⊆ 𝒳,
  P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 );
(iv) for every measurable f: 𝒳 → R ∪ {±∞} and every θ ∈ R ∪ {±∞},
  P[f(X) ≤ θ] ≤ exp( −d_{⊥,Ψ}(E[X], f⁻¹([−∞, θ]))² / 2 );
(v) for every quasiconvex f: 𝒳 → R ∪ {±∞} and every θ ∈ R ∪ {±∞},
  P[f(X) ≤ θ] ≤ exp( −d_{⊥,Ψ}(E[X], f⁻¹([−∞, θ]))² / 2 ).

Note that if f is quasilinear (i.e. both f and −f are quasiconvex), then formulation (v) yields concentration inequalities for both the lower and upper tails of f(X).

The notation and setting of the paper are covered in section 2, along with a review of some definitions and results from the concentration-of-measure literature. Normal distance is defined and its properties (including theorem 1.1) are examined in section 3. In section 4, the normalizing function Ψ is determined explicitly in several cases, thereby connecting theorem 1.1 with classical concentration results. In particular, proposition 4.4 identifies the normal distance that corresponds to the concentration of a vector, the entries of which are the empirical (sampled) means of functions of independent random variables. In section 5, it is shown that the inequality in theorem 1.1(iii) is asymptotically sharp (in the sense used in large deviations theory) in the high-dimensional limit, provided that A is convex and “sufficiently round” at those points of A that are closest to the center of mass E[X]. Finally, for completeness, the method of Chernoff bounds and its consequences for convex sets are reviewed in an appendix (section 6).
Figure 2.1. A convex set K and its outward normal cones at points p, q, r ∈ K: ∂K is smooth at p ∈ ∂K, so N*_p K is a half-line; ∂K has a vertex at q, so N*_q K is a pointed convex cone with non-empty interior; at the interior point r, N*_r K is the trivial cone {0}.

2. Notation and Background
Let 𝒳 be a real topological vector space. Let 𝒳* denote the continuous dual space of 𝒳 and let ⟨ℓ, x⟩ denote the dual pairing between ℓ ∈ 𝒳* and x ∈ 𝒳; ⟨v, ℓ⟩ will also denote the dual pairing between v ∈ 𝒳** and ℓ ∈ 𝒳*. It is not strictly necessary to assume that 𝒳 is locally convex, but the results of this paper may be trivially true if 𝒳* does not contain enough linear functionals.

2.1. Half-Spaces.
Given p ∈ 𝒳 and ν ∈ 𝒳*, H_{p,ν} will denote the closed half-space of 𝒳 that has p in its frontier and outward-pointing normal ν, i.e.

H_{p,ν} := { x ∈ 𝒳 | ⟨ν, x⟩ ≤ ⟨ν, p⟩ }.  (2.1)

Note well the degenerate case H_{p,0} = 𝒳. Every (p, ν) ∈ 𝒳 × 𝒳* defines a unique closed half-space of 𝒳, whereas a given closed half-space can have multiple distinct representations: H_{p,ν} = H_{p′,ν′} if, and only if, ν is a positive multiple of ν′ and ⟨ν, p − p′⟩ = ⟨ν′, p − p′⟩ = 0.

2.2. Convex Sets and Cones.
The closed convex hull of A ⊆ 𝒳 will be denoted by co̅(A). Given a closed convex set K ⊆ 𝒳 and p ∈ K, N*_p K denotes the outward normal cone to K at p, and N*K denotes the outward normal bundle of K:

N*_p K := { ν ∈ 𝒳* | K ⊆ H_{p,ν} },  (2.2)
N*K := { (p, ν) ∈ 𝒳 × 𝒳* | p ∈ K, ν ∈ N*_p K }.  (2.3)

The outward normal cone N*_p K is a pointed convex cone: it contains 0, is convex, and s₁ν₁ + s₂ν₂ ∈ N*_p K for all s₁, s₂ ≥ 0 and ν₁, ν₂ ∈ N*_p K. Also, N*_p K = {0} if p is an interior point of K. Note that N*K ⊆ 𝒳 × 𝒳* is not necessarily a convex set. See figure 2.1 for an illustration.

2.3. Quasiconvexity. If K ⊆ 𝒳 is a convex set, then a function f: K → R ∪ {±∞} is said to be quasiconvex if, for every θ ∈ R ∪ {±∞}, the sublevel set

f⁻¹([−∞, θ]) := { x ∈ K | −∞ ≤ f(x) ≤ θ }  (2.4)

is a convex set; equivalently, f is quasiconvex if, for all x, y ∈ K and t ∈ [0, 1],

f((1 − t)x + ty) ≤ max{ f(x), f(y) }.  (2.5)

f is said to be quasiconcave if −f is quasiconvex, and f is said to be quasilinear if it is both quasiconvex and quasiconcave. Every convex (resp. concave, linear) function is quasiconvex (resp. quasiconcave, quasilinear), but not vice versa. In particular, a function f: R^N → R is quasilinear if, and only if, it is the composition of a monotone function with a linear functional on R^N [5, p. 122].

2.4. Indicator and Characteristic Functions.
Given a set A ⊆ 𝒳, 1_A and χ_A denote its indicator function and characteristic function respectively:

1_A(x) := 1, if x ∈ A; 0, if x ∉ A.  (2.6)
χ_A(x) := 0, if x ∈ A; +∞, if x ∉ A.  (2.7)

Note that, for any convex set K ⊆ 𝒳, χ_K is a convex function.

2.5. Probabilistic Notions.
Let (Ω, ℱ, P) be a probability space and let X: Ω → 𝒳 be an 𝒳-valued random variable. E[·] denotes the expectation operator with respect to the probability measure P: E[X] is defined to be any m ∈ 𝒳 such that

E[⟨ℓ, X − m⟩] ≡ ∫_Ω ⟨ℓ, X(ω) − m⟩ dP(ω) = 0 for all ℓ ∈ 𝒳*;  (2.8)

if 𝒳* separates the points of 𝒳 (e.g. if 𝒳 is a Banach space), then E[X] is unique. For Y: Ω → R, any m ∈ R that satisfies

sup{ v ∈ R | P[Y ≤ v] ≤ 1/2 } ≤ m ≤ inf{ v ∈ R | P[Y ≤ v] ≥ 1/2 }  (2.9)

will be called a median of Y and denoted M[Y]. M_X: 𝒳* → [0, +∞] denotes the moment-generating function defined by

M_X(ℓ) := E[exp⟨ℓ, X⟩] for all ℓ ∈ 𝒳*.  (2.10)

Λ_X: 𝒳* → R ∪ {±∞} denotes the cumulant generating function (or logarithmic moment-generating function) defined by

Λ_X(ℓ) := log M_X(ℓ) = log E[exp⟨ℓ, X⟩] for all ℓ ∈ 𝒳*.  (2.11)

By Hölder’s inequality, Λ_X is a convex function.
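As a quick numerical illustration of these definitions (our sketch, not part of the paper's formal development; the two-point distribution is an arbitrary choice), the following code evaluates the cumulant generating function of a bounded real random variable, checks its convexity, and checks Hoeffding's classical bound Λ_X(ℓ) ≤ ℓE[X] + ℓ²L²/8, which is used again in section 4.

```python
import numpy as np

# Two-point distribution on {a, b} = {0, 1} with P[X = 1] = 0.3:
# a bounded random variable whose range is an interval of length L = 1.
a, b, q = 0.0, 1.0, 0.3
L = b - a
mean = (1 - q) * a + q * b

def cgf(l):
    """Cumulant generating function Lambda_X(l) = log E[exp(l X)]."""
    return np.log((1 - q) * np.exp(l * a) + q * np.exp(l * b))

ls = np.linspace(-10.0, 10.0, 2001)
vals = cgf(ls)

# Convexity of Lambda_X (Hoelder): discrete second differences are >= 0.
assert np.all(np.diff(vals, 2) >= -1e-12)

# Hoeffding's lemma: Lambda_X(l) <= l E[X] + l^2 L^2 / 8.
assert np.all(vals <= ls * mean + ls**2 * L**2 / 8 + 1e-12)
print("Lambda_X is convex and satisfies Hoeffding's bound on [-10, 10].")
```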
2.6. Talagrand’s Inequalities.
It has been known for some time that convex sets and functions enjoy good concentration properties; moreover, to get good concentration results, it is necessary to measure distances in the right way.

For example, a theorem of Talagrand shows that if a convex set K ⊆ R^N occupies a “significant” portion of the Hamming cube {−1, +1}^N and t ≫ 1, then “nearly all” of the points of the Hamming cube lie within Euclidean distance t of K. Define the Euclidean Hausdorff distance from x ∈ R^N to A ⊆ R^N by

d_Haus(x, A) := inf{ ‖x − a‖₂ | a ∈ A }.  (2.12)

Talagrand [26] showed that if X is uniformly distributed in {−1, +1}^N then, for any A ⊆ R^N, E[exp(d_Haus(X, co̅(A))²/16)] ≤ P[X ∈ A]⁻¹; hence, Chebyshev’s inequality implies that, for any t ≥ 0,

P[X ∈ A] P[d_Haus(X, co̅(A)) ≥ t] ≤ exp(−t²/16).  (2.13)

More interesting results can be obtained if one uses not the Euclidean distance but the Hamming distance, or, more accurately, an infimum over weighted Hamming distances. For w = (w₁, ..., w_N) ∈ [0, +∞)^N, define the w-weighted Hamming distance d_w on a product of sets 𝒳 = ∏_{n=1}^N 𝒳_n by

d_w(x, y) := Σ_{n=1}^N w_n (1 − δ_{x_n, y_n});  (2.14)

that is, d_w(x, y) is the w-weighted sum of the number of components in which x, y ∈ 𝒳 differ. For x ∈ 𝒳 and A ⊆ 𝒳, set d_w(x, A) := inf_{a∈A} d_w(x, a). Define Talagrand’s convex distance from x ∈ 𝒳 to A ⊆ 𝒳 by

d_Tal(x, A) := sup{ d_w(x, A) | w ∈ [0, +∞)^N, Σ_{n=1}^N w_n² = 1 },  (2.15)

and, for A, B ⊆ 𝒳, let d_Tal(A, B) := inf_{a∈A} d_Tal(a, B). Talagrand [27] showed that if X = (X₁, ..., X_N) is any 𝒳-valued random variable with independent components, then

P[X ∈ A] P[X ∈ B] ≤ exp(−d_Tal(A, B)²/4).  (2.16)

These bounds on the probabilities of sets lead to deviation inequalities for convex Lipschitz functions. For example (cf. [13, 26]), let X be any random variable in the unit cube in R^N with independent components, and let f: [0, 1]^N → R be convex and Lipschitz with ‖f‖_Lip ≤ 1; then, for any t ≥ 0,

P[f(X) ≥ M[f(X)] + t] ≤ 2 exp(−t²/4).  (2.17)

Note, however, that these results use not only the convexity of the function of interest, but also require Lipschitz continuity. What concentration inequalities can be shown to hold without smoothness assumptions?
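Inequality (2.13) is easy to probe numerically. The following Monte Carlo sketch (ours, not from the paper; the subcube A is an arbitrary choice for which the distance to the closed convex hull has a closed form) checks it on the Hamming cube.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, trials, t = 20, 5, 200_000, 3.0

# A = {x in {-1,+1}^N : x_1 = ... = x_k = +1}.  Its closed convex hull is
# {y : y_1 = ... = y_k = +1, y_n in [-1,+1] otherwise}, so the Euclidean
# distance from x to co(A) is sqrt(sum_{n<=k} (x_n - 1)^2).
X = rng.choice([-1.0, 1.0], size=(trials, N))
d_hull = np.sqrt(np.sum((X[:, :k] - 1.0) ** 2, axis=1))

p_A = np.mean(np.all(X[:, :k] == 1.0, axis=1))   # P[X in A] = 2^{-k}
p_far = np.mean(d_hull >= t)                     # P[d(X, co(A)) >= t]

lhs = p_A * p_far
rhs = np.exp(-t**2 / 16)
print(f"P[X in A] * P[d >= t] = {lhs:.3e} <= exp(-t^2/16) = {rhs:.3e}")
assert lhs <= rhs
```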
2.7. McDiarmid’s Inequality. One smoothness-free concentration inequality is McDiarmid’s inequality [20], also known as the bounded differences inequality, which itself generalizes an earlier inequality of Hoeffding [11]. McDiarmid’s inequality is by no means the strongest concentration-of-measure inequality in the literature, but is useful because of its simple hypotheses and proof.

Define the McDiarmid diameter of f, denoted D[f], by

D[f] := ( Σ_{n=1}^N D_n[f]² )^{1/2},  (2.18)

where the n-th McDiarmid subdiameter D_n[f] is defined by

D_n[f] := sup{ |f(x) − f(y)| | x_j = y_j for j ≠ n }.  (2.19)

When E[|f(X)|] is finite and X₁, ..., X_N are independent, McDiarmid’s inequality bounds the deviations of f(X) from E[f(X)] in terms of the McDiarmid diameter of f: for any r > 0,

P[f(X) − E[f(X)] ≤ −r] ≤ exp(−2r²/D[f]²),  (2.20a)
P[f(X) − E[f(X)] ≥ r] ≤ exp(−2r²/D[f]²).  (2.20b)

McDiarmid’s inequality implies that, for any θ ∈ R ∪ {±∞},

P[f(X) ≤ θ] ≤ exp(−2(E[f(X)] − θ)₊²/D[f]²),  (2.21a)
P[f(X) ≥ θ] ≤ exp(−2(θ − E[f(X)])₊²/D[f]²).  (2.21b)

McDiarmid’s inequality (and similar inequalities such as martingale inequalities) have the advantage that a bound on the tails of f(X) is obtained solely in terms of the mean output E[f(X)] and the McDiarmid diameter D[f]. However, McDiarmid’s inequality cannot take advantage of any other properties of f such as convexity or monotonicity; furthermore, if f has infinite McDiarmid diameter on the essential range of X, then the trivial upper bound 1 is obtained.

There are many other sources of concentration-of-measure inequalities: these include logarithmic Sobolev inequalities and the Herbst argument [2, 10, 12], the entropy method [3, 4, 14], and information-theoretic methods [7, 19]. Of particular interest are those concentration results that apply to infinite-dimensional settings [16].
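As an elementary sanity check (ours), (2.20b) can be compared against the exact binomial tail for the sum f(x) = x₁ + ··· + x_N on {0, 1}^N, for which every subdiameter is D_n[f] = 1 and hence D[f]² = N.

```python
import math

N, p, r = 100, 0.5, 10.0   # arbitrary example: fair coins, deviation r

# f(x) = sum(x); each subdiameter D_n[f] = 1, so D[f]^2 = N.
mcdiarmid = math.exp(-2 * r**2 / N)

# Exact tail P[f(X) - E[f(X)] >= r] for f(X) ~ Binomial(N, p).
mean = N * p
exact = sum(math.comb(N, j) * p**j * (1 - p)**(N - j)
            for j in range(math.ceil(mean + r), N + 1))

print(f"exact tail = {exact:.4e} <= McDiarmid bound = {mcdiarmid:.4e}")
assert exact <= mcdiarmid
```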
3. Normal Distance

As noted above, efficient presentation of many concentration-of-measure inequalities relies on having an appropriate notion of function variation (e.g. the Lipschitz norm or McDiarmid diameter) or distance (e.g. Talagrand’s convex distance). The inequalities that will be established in section 4 can be phrased in terms of transforms of moment-generating functions, but are more transparent if phrased in terms of a normal distance, which will be introduced in this section.

Fix a function Ψ: 𝒳* → [0, +∞] that is positively homogeneous of degree one, i.e. such that Ψ(αℓ) = αΨ(ℓ) for all α ≥ 0 and ℓ ∈ 𝒳*. By analogy with the situation in finite-dimensional Euclidean space, in which Ψ = ‖·‖₂ on (R^N)*, define the distance from a point x ∈ 𝒳 to a half-space H_{p,ν} ⊆ 𝒳 by

d_{⊥,Ψ}(x, H_{p,ν}) := (⟨ν, x − p⟩)₊ / Ψ(ν),  (3.1)

with the convention that 0/0 := 0, since the distance from any x ∈ 𝒳 to the trivial half-space H_{p,0} = 𝒳 ought to be zero. Note that d_{⊥,Ψ}(x, H_{p,ν}) = 0 whenever x ∈ H_{p,ν}; note also that the homogeneity assumption on Ψ ensures that (3.1) is an unambiguous definition. We now generalize (3.1) to more general subsets of 𝒳 than half-spaces. The heuristic is that the distance from x to A ⊆ 𝒳 should be the greatest possible distance (in the sense of (3.1)) from x to any half-space that contains A; the existence of the degenerate half-space H_{p,0} ensures that the normal distance is zero if there are no proper half-spaces that contain A.
Definition 3.1. Let x ∈ 𝒳 and A ⊆ 𝒳. The Ψ-normal distance from x to A, denoted d_{⊥,Ψ}(x, A), is defined (with the same convention that 0/0 := 0) by

d_{⊥,Ψ}(x, A) := sup{ (⟨ν, x − p⟩)₊ / Ψ(ν) | p ∈ 𝒳 and ν ∈ 𝒳* such that A ⊆ H_{p,ν} }.  (3.2)

The Ψ-normal distance from A ⊆ 𝒳 to B ⊆ 𝒳 is defined by d_{⊥,Ψ}(A, B) := inf_{a∈A} d_{⊥,Ψ}(a, B). In the special case 𝒳 = R^N and Ψ = ‖·‖₂ on (R^N)*, we shall simply write d_⊥ for d_{⊥,Ψ}, i.e.

d_⊥(x, A) := sup{ (ν · (x − p))₊ / ‖ν‖₂ | p ∈ R^N and ν ∈ (R^N)* such that A ⊆ H_{p,ν} }.  (3.3)

Note well that the definition of the normal distance d_{⊥,Ψ}(x, A) does not require 𝒳 to be normed; even when 𝒳 is equipped with a norm ‖·‖_𝒳 and Ψ is the corresponding operator norm, the normal distance d_{⊥,Ψ}(x, A) is not the same as the Hausdorff distance from x to A defined by

d_Haus(x, A) := inf{ ‖x − a‖_𝒳 | a ∈ A };  (3.4)

see figure 3.1 for an illustration. Note also that it is not generally true that d_{⊥,Ψ}(A, B) = d_{⊥,Ψ}(B, A): consider e.g. B := {(0, 2)} and A as in figure 3.1, in which case

d_{⊥,Ψ}(A, B) = inf_{a∈A} d_{⊥,Ψ}(a, B) = 1 ≠ 0 = d_{⊥,Ψ}(B, A).

For any x ∈ 𝒳 and A ⊆ B ⊆ 𝒳, it holds that d_{⊥,Ψ}(x, B) ≤ d_{⊥,Ψ}(x, A). Furthermore, since a closed half-space H_{p,ν} contains A if, and only if, it contains the closed convex hull co̅(A) of A, the following equality holds:

d_{⊥,Ψ}(x, A) = d_{⊥,Ψ}(x, co̅(A)) for all x ∈ 𝒳 and all A ⊆ 𝒳.  (3.5)
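Concretely, for bounded A and Ψ = ‖·‖₂, the supremum in (3.3) may be restricted to unit vectors ν with p a supporting point, giving d_⊥(x, A) = sup_{‖ν‖₂=1} (⟨ν, x⟩ − sup_{a∈A}⟨ν, a⟩)₊. The sketch below (our illustration; the particular arc is an arbitrary stand-in for the set A of figure 3.1) evaluates this by brute force over a grid of directions and contrasts it with d_Haus.

```python
import numpy as np

# A: an arc of the circle of radius 2, chosen so that the nearest point of
# its closed convex hull to the origin (on the chord) is at distance 1,
# while the nearest point of A itself (on the arc) is at distance 2.
thetas = np.linspace(np.pi / 6, 5 * np.pi / 6, 2001)
A = 2.0 * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
x = np.zeros(2)

# d_perp(x, A) = sup over unit nu of (<nu, x> - sup_{a in A} <nu, a>)_+ ,
# since the tightest half-space H_{p,nu} containing A has <nu, p> = sup_a <nu, a>.
phis = np.linspace(0.0, 2.0 * np.pi, 4001)
nus = np.stack([np.cos(phis), np.sin(phis)], axis=1)
d_perp = np.max(np.clip(nus @ x - np.max(nus @ A.T, axis=1), 0.0, None))

d_haus = np.min(np.linalg.norm(A - x, axis=1))
print(f"d_perp(0, A) = {d_perp:.3f}, d_Haus(0, A) = {d_haus:.3f}")  # ~1.0 vs 2.0
```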
Remark 3.2. It is natural to ask what, if any, relation there is between the normal distance and Talagrand’s convex distance. The simplest answer is to say that the two distances should be compared only with great caution, since each belongs to a different setting: Talagrand’s distance is defined on a product of sets, whereas the normal distance is defined on a topological vector space. Even on R^N, the two distances measure different quantities: in some sense, d_Tal(x, A) measures how many of the coordinates of x are covered by A, but does not measure the geometric distance between them; on the other hand, d_{⊥,Ψ}(x, A) is a much more geometric measure of how far x is from A in terms of linear functionals on 𝒳, and the “size” of those linear functionals is measured by Ψ. In particular, Talagrand’s convex distance is positively homogeneous of degree zero, whereas the normal distance is positively homogeneous of degree one: for any x ∈ R^N, A ⊆ R^N, and α > 0,

d_Tal(αx, αA) = d_Tal(x, A),
d_{⊥,Ψ}(αx, αA) = α d_{⊥,Ψ}(x, A).

Figure 3.1. An example of a subset A of the Euclidean plane R² for which the normal distance d_⊥(0, A) = 1 unit (cf. the dashed line), as opposed to the Euclidean Hausdorff distance d_Haus(0, A) = 2 units (cf. the dotted arc).

This section concludes with the proof of the portmanteau theorem (theorem 1.1) and some final remarks on its applicability:
Proof of theorem 1.1.
The equivalence will be established by showing that

(i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (v) ⇒ (i).

Suppose that (i) holds. Then

P[X ∈ K] ≤ inf_{H_{p,ν} ⊇ K} P[X ∈ H_{p,ν}]   (by monotonicity of P)
  ≤ inf_{H_{p,ν} ⊇ K} exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 )   (by (i))
  = exp( −(1/2) sup_{H_{p,ν} ⊇ K} d_{⊥,Ψ}(E[X], H_{p,ν})² )
  = exp( −d_{⊥,Ψ}(E[X], K)² / 2 )   (by (3.2)).

Hence, (i) implies (ii).

Suppose that (ii) holds; then

P[X ∈ A] ≤ P[X ∈ co̅(A)]   (since A ⊆ co̅(A))
  ≤ exp( −d_{⊥,Ψ}(E[X], co̅(A))² / 2 )   (by (ii))
  = exp( −d_{⊥,Ψ}(E[X], A)² / 2 )   (by (3.5)),

and so (ii) implies (iii). (iv) follows from (iii) upon setting A := { x ∈ 𝒳 | f(x) ≤ θ }. (v) is clearly a special case of (iv). (i) follows from (v) upon setting f := χ_{H_{p,ν}} and θ := 1. □
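The step (i) ⇒ (ii) is constructive: a bound for a convex set K is obtained by optimizing the half-space bound over the supporting half-spaces of K. A brute-force numerical rendering of this optimization (ours; the Gaussian law and the disc K are arbitrary choices, and the half-space bound is the one supplied by proposition 4.1 below):

```python
import numpy as np

# X ~ N(0, I_2); by proposition 4.1 below (with Psi the Euclidean norm),
# P[X in H_{p,nu}] <= exp(-d_perp(0, H_{p,nu})^2 / 2) for each half-space.
c, r = np.array([3.0, 0.0]), 1.0   # K = disc of radius r about c (arbitrary)

phis = np.linspace(0.0, 2.0 * np.pi, 4001)
nus = np.stack([np.cos(phis), np.sin(phis)], axis=1)  # unit normals

# For each unit nu, the tightest half-space containing K is
# {x : <nu, x> <= <nu, c> + r}, at distance (<nu, 0 - c> - r)_+ from 0.
dists = np.clip(-(nus @ c) - r, 0.0, None)
d_K = dists.max()                       # = d_perp(E[X], K) = |c| - r = 2
bound = np.exp(-d_K**2 / 2)             # optimized half-space bound on P[X in K]
print(f"d_perp = {d_K:.3f}, P[X in K] <= {bound:.4f}")
```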
Remark 3.3. It is important to note that all the bounds in theorem 1.1 may be trivial if the dual space 𝒳* is not rich enough. For example, given a measure space (𝒵, ℱ, µ), for 0 < p < 1, the space

L^p(𝒵, ℱ, µ; R) := { f: 𝒵 → R | ‖f‖_p := ( ∫_𝒵 |f(z)|^p dµ(z) )^{1/p} < +∞ }

is a topological vector space with respect to the quasinorm topology generated by ‖·‖_p. This space is not locally convex and has a trivial dual space: the only continuous linear functional on this space is the zero functional, and so the only closed half-space is the whole space. See e.g. [24] for further discussion of the spaces L^p([0, 1]; R) for 0 < p < 1.

At the other extreme, one could ask whether the topology on 𝒳 can be dispensed with altogether, defining the normal distance using arbitrary linear functionals on 𝒳. This can be done, and most results go through mutatis mutandis; in particular, it is necessary to replace all references to the closed convex hull co̅(A) of A ⊆ 𝒳 with the convex hull co(A); the analogue of (3.5) (with Ψ now defined on the algebraic dual of 𝒳) is

d_{⊥,Ψ}(x, A) = d_{⊥,Ψ}(x, co(A)) for all x ∈ 𝒳 and all A ⊆ 𝒳.

The principal disadvantage of ignoring all topological structure on 𝒳, of course, is that there are no longer notions of interior, closure and frontier, although it still makes sense to discuss the extremal points of convex sets.

4. Normal Distance as a Concentration Rate
The method of Chernoff bounding (reviewed in lemma 6.1) gives bounds on P[X ∈ H_{p,ν}] in terms of the moment-generating function M_X. If these bounds can be formulated in terms of a suitable normal distance, then theorem 1.1 produces equivalent bounds on P[X ∈ K] for convex K, on P[X ∈ A], &c. As noted in [18], a Chernoff bound for P[f(X) ≥ θ] is never better than the best bound using all the moments of f(X): if f takes only non-negative values, then

inf_{k∈N} θ^{−k} E[f(X)^k] ≤ inf_{s≥0} e^{−sθ} E[e^{sf(X)}].  (4.1)

However, Chernoff bounds have the advantage that they are geometrically very easy to handle.

The next result provides the normal distance formulation for an 𝒳-valued Gaussian random variable (in fact, for a family of such variables). In the special case of a single Gaussian random vector X on 𝒳 = R^N with covariance operator C_X = σ²I_N, proposition 4.1 yields the classical Chernoff bound for a multivariate normal random variable.

Proposition 4.1.
Let Γ be a family of Gaussian random vectors in 𝒳. For each X ∈ Γ, let C_X: 𝒳* → 𝒳** be its covariance operator, defined by

⟨C_X ℓ, ν⟩ := E[⟨ℓ, X − E[X]⟩⟨ν, X − E[X]⟩].  (4.2)

Let E := { E[X] | X ∈ Γ }, let

Ψ(ν) := sup_{X∈Γ} √⟨C_X ν, ν⟩,  (4.3)

and let d_{⊥,Ψ} be the corresponding normal distance. Then, for any A ⊆ 𝒳,

sup_{X∈Γ} P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E, A)² / 2 ).  (4.4)
Proof. For each X ∈ Γ, the moment-generating function of X is given by

M_X(ℓ) := E[e^{⟨ℓ,X⟩}] = exp( ⟨ℓ, E[X]⟩ + ⟨C_X ℓ, ℓ⟩/2 ).  (4.5)

Therefore,

P[X ∈ H_{p,ν}] ≤ inf_{s≥0} exp( s⟨ν, p − E[X]⟩ + s²⟨C_X ν, ν⟩/2 )   (by (4.5) and lemma 6.1)
  = exp( −⟨ν, E[X] − p⟩₊² / (2⟨C_X ν, ν⟩) )
  ≤ exp( −⟨ν, E[X] − p⟩₊² / (2Ψ(ν)²) )   (by (4.3))
  = exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 )   (by (3.2)).

Hence, by theorem 1.1,

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 ),

and so

sup_{X∈Γ} P[X ∈ A] ≤ sup_{X∈Γ} exp( −d_{⊥,Ψ}(E[X], A)² / 2 ) = exp( −(1/2) inf_{X∈Γ} d_{⊥,Ψ}(E[X], A)² ) = exp( −d_{⊥,Ψ}(E, A)² / 2 ). □
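For a single Gaussian (Γ = {X}) and a half-space, (4.4) can be compared directly with the exact Gaussian tail, since ⟨ν, X⟩ is a scalar Gaussian. A quick check (our sketch; all parameters are arbitrary):

```python
import numpy as np
from scipy.stats import norm

m = np.array([1.0, 2.0])                  # E[X]
C = np.array([[2.0, 0.5], [0.5, 1.0]])    # covariance operator C_X
p, nu = np.array([4.0, 0.0]), np.array([-1.0, -0.5])   # half-space data

# Exact: <nu, X> ~ N(<nu, m>, <C nu, nu>), so
# P[X in H_{p,nu}] = P[<nu, X> <= <nu, p>] = Phi(<nu, p - m> / sigma).
sigma = np.sqrt(nu @ C @ nu)
exact = norm.cdf((nu @ (p - m)) / sigma)

# Proposition 4.1 with Gamma = {X}: Psi(nu) = sigma, and
# d(E[X], H_{p,nu}) = (<nu, m - p>)_+ / Psi(nu).
d = max(nu @ (m - p), 0.0) / sigma
bound = np.exp(-d**2 / 2)
print(f"exact = {exact:.4e} <= bound = {bound:.4e}")
assert exact <= bound
```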
Lemma 6.1 also has the following consequences for random vectors supported in a cuboid in R^N; this encompasses two standard situations in which concentration is often studied, namely concentration for functions on the Euclidean unit cube and on the Hamming cube.

Proposition 4.2. Let X be a random vector in R^N with independent components such that each component X_n almost surely takes values in a fixed interval of length L_n. Let

Ψ(ν) := (1/2) √( Σ_{n=1}^N L_n² ν_n² )  (4.6)

and let d_{⊥,Ψ} be the corresponding normal distance. Then, for any A ⊆ R^N,

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 ).  (4.7)

A fortiori, if X takes values in (a translate of) the unit cube [0, 1]^N, then

P[X ∈ A] ≤ exp( −2 d_⊥(E[X], A)² ),  (4.8)

and if X takes values in (a translate of) the Hamming cube {−1, +1}^N, then

P[X ∈ A] ≤ exp( −d_⊥(E[X], A)² / 2 ).  (4.9)

Proof.
The proof is similar to the Gaussian case: it is an application of lemma 6.1 and Hoeffding’s lemma [11, lemma 1 and (4.16)], which bounds the moment-generating function of X_n as follows:

M_{X_n}(ℓ_n) := E[exp(ℓ_n X_n)] ≤ exp( ℓ_n E[X_n] + ℓ_n² L_n²/8 ).

Note that the claim can also be proved directly by applying McDiarmid’s inequality to the function ⟨ν, ·⟩, which has mean E[⟨ν, X⟩] = ⟨ν, E[X]⟩ and McDiarmid diameter √(ν₁²L₁² + ··· + ν_N²L_N²). □

Remark 4.3.
Note the similarity between the normal distances of propositions 4.1 and 4.2. In the Gaussian case, the norm on 𝒳* is the one induced by the “largest” covariance operator in the family of random variables Γ. In the bounded-range case, the norm on (R^N)* is the one induced by the “largest” covariance operator for random variables satisfying the range constraint: if X is a real-valued random variable taking values in an interval [a, b], then Ψ(ν) = (b − a)|ν|/2 and Var[X] ≤ (b − a)²/4; this upper bound on the variance is attained by a Bernoulli random variable with law (δ_a + δ_b)/2.

The next result identifies the normal distance that corresponds to the concentration of a vector, the entries of which are the empirical (sampled) means of functions of independent random variables.

Proposition 4.4.
For n = 1, ..., N, let Z_n := f_n(Y_{n,1}, ..., Y_{n,K(n)}) be a real-valued function of independent random variables Y_{n,k}, and suppose that f_n has finite McDiarmid diameter D[f_n]. Let Z = (Z₁, ..., Z_N). Suppose that the random inputs of each f_n are sampled independently M(n) times according to the distribution P and that the empirical average

Ê[Z] := ( (1/M(n)) Σ_{m=1}^{M(n)} f_n( Y^{(m)}_{n,1}, ..., Y^{(m)}_{n,K(n)} ) )_{n=1}^N ∈ R^N  (4.10)

is formed. Then, for any A ⊆ R^N,

P[Ê[Z] ∈ A] ≤ exp( −d_{⊥,Ψ}(E[Z], A)² / 2 ),  (4.11)

where the normalizing function Ψ: (R^N)* → [0, +∞) is given in terms of the McDiarmid diameters of the functions f₁, ..., f_N and the sample sizes M(1), ..., M(N):

Ψ(ν) := (1/2) ( Σ_{n=1}^N ν_n² D[f_n]²/M(n) )^{1/2}.  (4.12)

Proof.
Let H_{p,ν} ⊊ R^N be a half-space. Consider the real-valued random variable ⟨ν, Ê[Z]⟩ as a function of the sampled input random variables Y^{(m)}_{n,k}. Suppose that the McDiarmid subdiameter of f_n with respect to Y_{n,k} is D_{n,k}. Then the McDiarmid subdiameter of ⟨ν, Ê[Z]⟩ with respect to the m-th sample of Y_{n,k} is |ν_n| D_{n,k}/M(n). Hence, the McDiarmid diameter of ⟨ν, Ê[Z]⟩ is

√( Σ_{k,n,m} ν_n² D_{n,k}²/M(n)² ) = √( Σ_{n,m} ν_n² D[f_n]²/M(n)² ) = √( Σ_n ν_n² D[f_n]²/M(n) ).

Therefore, since Ê[Z] is an unbiased estimator for E[Z] (i.e. E[Ê[Z]] = E[Z]), McDiarmid’s inequality (2.21a) implies that

P[Ê[Z] ∈ H_{p,ν}] = P[⟨ν, Ê[Z]⟩ ≤ ⟨ν, p⟩]
  ≤ exp( −2(⟨ν, E[Z]⟩ − ⟨ν, p⟩)₊² / Σ_{n=1}^N ν_n² D[f_n]²/M(n) )
  = exp( −⟨ν, E[Z] − p⟩₊² / (2Ψ(ν)²) )
  = exp( −d_{⊥,Ψ}(E[Z], H_{p,ν})² / 2 ).

The claim now follows from theorem 1.1. □
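In practice, (4.11) with a half-space A is just McDiarmid's inequality applied to ⟨ν, Ê[Z]⟩, so the bound is cheap to evaluate once the diameters D[f_n] and sample sizes M(n) are known. A small helper (ours; all numbers are arbitrary placeholders):

```python
import numpy as np

def halfspace_bound(EZ, D, M, p, nu):
    """Bound (4.11) for A = H_{p,nu}: exp(-d_{perp,Psi}(E[Z], H)^2 / 2),
    with Psi(nu) = 0.5 * sqrt(sum_n nu_n^2 D[f_n]^2 / M(n))."""
    EZ, D, M, p, nu = map(np.asarray, (EZ, D, M, p, nu))
    psi = 0.5 * np.sqrt(np.sum(nu**2 * D**2 / M))
    d = max(nu @ (EZ - p), 0.0) / psi
    return np.exp(-d**2 / 2)

# Example: N = 2 summary statistics with diameters D[f_n] and sample sizes M(n);
# the half-space is the lower-tail event {E_hat[Z_1] + E_hat[Z_2] <= 1.1}.
print(halfspace_bound(EZ=[1.0, 0.5], D=[1.0, 2.0], M=[400, 900],
                      p=[0.8, 0.3], nu=[1.0, 1.0]))
```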
An example of the application of proposition 4.4 is the following:
Example 4.5 (Functions of empirical means). The Chernoff bounding method can be used to provide much-improved confidence levels for quantities derived from many empirical (as opposed to exact) means; see e.g. [25]. Suppose that H: R^N → R is some function of interest: in particular, the quantity of interest is H(E[Z₁], ..., E[Z_N]) for some absolutely integrable real-valued random variables Z₁, ..., Z_N. If, however, the exact means E[Z_n] are unknown, then empirical means Ê[Z_n] may be used in their place if appropriate confidence corrections are made. Suppose that “error” corresponds to concluding, based on the empirical means, that H(E[Z]) is smaller than it actually is. Given α ∈ R^N, set

H_α(z₁, ..., z_N) := H(z₁ + α₁, ..., z_N + α_N).  (4.13)

Therefore, given any ε > 0, we seek an appropriate “margin hit” α = α(ε) ∈ R^N (typically, α_n ≥ 0 for each n ∈ {1, ..., N}) such that

P[ H_α(Ê[Z₁], ..., Ê[Z_N]) ≥ H(E[Z₁], ..., E[Z_N]) ] ≥ 1 − ε.

Dually, given α ∈ R^N, we seek a sharp upper bound on the probability of error, i.e. on

P[ H_α(Ê[Z₁], ..., Ê[Z_N]) ≤ H(E[Z₁], ..., E[Z_N]) ].

If H (and hence H_α) is monotonically non-decreasing in each of its N arguments and Z₁, ..., Z_N are independent, then the probability of non-error can be bounded from below as follows:

P[ H_α(Ê[Z]) ≥ H(E[Z]) ] = P[ H_α(Ê[Z]) ≥ H_α(E[Z] − α) ]
  ≥ Π_{n=1}^N P[ Ê[Z_n] ≥ E[Z_n] − α_n ]
  ≥ Π_{n=1}^N ( 1 − exp(−2M(n)α_n²/D[f_n]²) ).

Unfortunately, when N is large, the last line of this inequality is typically close to zero unless the sample sizes are very large, and so this bound is of limited use. Geometrically, this is analogous to the fact that a high-dimensional orthant (product of half-lines) appears to be very narrow from the perspective of an observer at its vertex. In contrast, half-spaces always fill a half of the observer’s field of view. To bound the probability of sublevel or superlevel sets using half-spaces requires H_α to have some convexity, not monotonicity, properties.

If H_α is quasiconvex, then the bounds using normal distances can be applied to good effect, and yield estimates that actually perform better the larger N is. In particular, if H_α is both quasiconvex and differentiable, then the outward normal to the sublevel set { H_α ≤ t } at a frontier point p is just any positive multiple of the derivative of H_α at p, and this yields the bound

P[ H_α(Ê[Z]) ≤ θ ] ≤ inf_{p: H_α(p) ≤ θ} exp( −2( Σ_{n=1}^N ∂_nH_α(p)(E[Z_n] − p_n) )₊² / Σ_{n=1}^N (∂_nH_α(p))² D[f_n]²/M(n) ).  (4.14)

In particular, taking θ = H(E[Z]) = H_α(E[Z] − α) and evaluating the exponential in (4.14) at p = E[Z] − α ∈ R^N yields

P[ H_α(Ê[Z]) ≤ H(E[Z]) ] ≤ exp( −2( Σ_{n=1}^N ∂_nH_α(p)α_n )² / Σ_{n=1}^N (∂_nH_α(p))² D[f_n]²/M(n) ).  (4.15)

(4.15) is particularly useful since it links the margin hits α_n, the sample sizes M(n), and the maximum probability of error. For example, given a desired level of confidence, margin hits α_n, and a total number of samples M ∈ N, one can choose sample sizes M(1), ..., M(N) that sum to M and minimize the right-hand side of (4.15); this yields an optimal distribution of sampling resources so as to ensure that H_α(Ê[Z]) ≥ H(E[Z]) with the desired level of confidence.
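As a concrete rendering of this last point (our sketch; the gradients, diameters, and margins are hypothetical placeholders), minimizing the right-hand side of (4.15) subject to Σ_n M(n) = M amounts to minimizing Σ_n w_n/M(n) with w_n = (∂_nH_α(p) D[f_n])², a separable convex problem for which a greedy allocation of one sample at a time is exact.

```python
import heapq, math

def allocate(w, M_total):
    """Minimize sum_n w[n]/M[n] over positive integers M[n] summing to
    M_total (equivalently, minimize the right-hand side of (4.15)).
    Greedy is exact because each term w[n]/M[n] is convex in M[n]."""
    N = len(w)
    M = [1] * N
    # Max-heap (via negation) of marginal gains w_n/M_n - w_n/(M_n + 1).
    heap = [(-w[n] / (1 * 2), n) for n in range(N)]
    heapq.heapify(heap)
    for _ in range(M_total - N):
        gain, n = heapq.heappop(heap)
        M[n] += 1
        heapq.heappush(heap, (-w[n] / (M[n] * (M[n] + 1)), n))
    return M

# Hypothetical data: gradients dH_n at p, diameters D[f_n], margins alpha_n.
dH, D, alpha = [1.0, 0.2, 3.0], [2.0, 1.0, 0.5], [0.1, 0.4, 0.05]
w = [(g * d) ** 2 for g, d in zip(dH, D)]        # terms (dH_n D_n)^2 / M(n)
M = allocate(w, M_total=1000)
S = sum(g * a for g, a in zip(dH, alpha))
V = sum(wn / mn for wn, mn in zip(w, M))
print(M, "error probability <=", math.exp(-2 * S**2 / V))
```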
5. High-Dimensional Asymptotics

The topic of this section is the asymptotic sharpness of the bounds introduced above as the dimension of the space 𝒳 becomes large. We begin with a comparison of the McDiarmid and half-space bounds for a simple function: a quadratic form on R^N.

Figure 5.1. For the quadratic form Q_N on R^N given in (5.1), a comparison of the McDiarmid upper bound (squares) and the half-space upper bound (triangles) on P[Q_N(X) ≤ θ] for two values of θ (dotted line and hollow polygons; solid line and filled polygons). The horizontal axis is N; the vertical axis is the logarithm of the upper bound.

Example 5.1 (Comparison with McDiarmid’s inequality). The following example serves to illustrate how the half-space method can produce upper bounds on the measure of suitable sublevel sets that are superior to those offered by McDiarmid’s inequality; it also shows how this effect is more pronounced in higher-dimensional spaces. Consider the following quadratic form Q_N on R^N:

Q_N(x) := ‖x − (½, ..., ½)‖₂².  (5.1)
For any θ > 0, the sublevel set Q_N⁻¹([−∞, θ]) is simply a ball of radius √θ about the point (½, ..., ½). Suppose that a random vector X takes values in [−½, +½]^N with independent components. Each component then ranges over an interval of length 1, so D[Q_N]² = N, and McDiarmid’s inequality (2.21a) implies that

P[Q_N(X) ≤ θ] ≤ exp( −2(E[Q_N(X)] − θ)₊²/N ),

where E[Q_N(X)] = N/4 + Σ_{n=1}^N Var[X_n] ∈ [N/4, N/2] if E[X] = 0. If also E[X] = 0, then proposition 4.2 implies that

P[Q_N(X) ≤ θ] ≤ exp( −2(√N/2 − √θ)₊² ).

For small N and large θ, McDiarmid’s bound is the sharper of the two. However, for small θ (and, notably, as N → ∞ for any fixed θ), the half-space bound is the sharper bound. See figure 5.1 for an illustration.
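The comparison is easy to tabulate. The sketch below (ours) evaluates both log-bounds for a concrete stand-in distribution, components uniform on [−1/2, 1/2], so that E[Q_N(X)] = N/3; this distribution is our choice, not the paper's, and other laws on the cube shift the McDiarmid curve but not the half-space one.

```python
import numpy as np

# X_n independent, uniform on [-1/2, 1/2]: E[Q_N(X)] = N(1/12 + 1/4) = N/3.
for N in [10, 30, 100, 300]:
    for theta in [1.0, 3.0]:
        EQ = N / 3.0
        mcd = -2.0 * max(EQ - theta, 0.0) ** 2 / N                    # log of (2.21a) bound
        half = -2.0 * max(np.sqrt(N) / 2 - np.sqrt(theta), 0.0) ** 2  # log of half-space bound
        winner = "half-space" if half < mcd else "McDiarmid"
        print(f"N={N:4d} theta={theta}: log-bounds {mcd:9.2f} vs {half:9.2f} -> {winner}")
```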
The previous example suggests that bounds constructed using the half-space method may perform very well in high dimension, but also that the sharpness of the bound may depend on “how round” the set whose measure we wish to bound is. To fix ideas, suppose that X = (X₁, ..., X_N): Ω → R^N is a random vector with independent components, where X_n is supported on an interval of length L_n, and let Ψ be as in proposition 4.2. For A ⊆ R^N, how sharp is the bound

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 )?  (5.2)

Figure 5.2. It is not reasonable to expect that (an upper bound for) the measure of the half-space H_{e₁,−e₁} is a sharp upper bound for the measure of the narrow wedge K_ε when ε is small.

First, note that since d_{⊥,Ψ}(E[X], A) = d_{⊥,Ψ}(E[X], co̅(A)), the bound cannot be expected to be sharp if A differs greatly from its closed convex hull, and so it makes sense to restrict investigation to the case that A = K, a closed and convex subset of R^N. Secondly, it is not reasonable to expect the bound (5.2) on P[X ∈ K] to be sharp if K is sharply pointed, e.g. if K is the narrow wedge K_ε of angle ε ≪ 1 about e₁ := (1, 0, ...,
0) in R^N:

K_ε := { x ∈ R^N | (x − e₁)·e₁ ≥ ‖x − e₁‖₂ cos ε };  (5.3)

see figure 5.2. Therefore, we wish to consider the opposite situation in which K has no sharp points, which will be made precise by requiring that K satisfy an interior ball condition.

Suppose that (p, ν) ∈ N*K is such that d_{⊥,Ψ}(E[X], H_{p,ν}) = d_{⊥,Ψ}(E[X], K). Suppose also that B_r(p − rω) ⊆ K, with r > 0 and ω ∈ R^N a unit vector, is an interior ball for K at p ∈ ∂K; cf. figure 5.3.

Figure 5.3. An interior ball of radius r for the closed convex set K at the frontier point p. Necessarily, p is a point at which ∂K is smooth; K admits no interior ball of positive radius at the vertex q. For convenience, the unit vector ω ∈ R^N has been identified with ν ∈ N*_p K ⊆ (R^N)*.

If the law of X on R^N is highly singular, then it cannot be expected that the bound (5.2) is sharp, so suppose that the law of X has a density with respect to Lebesgue measure that is bounded above by some constant C > 0. For the half-space H_{p,ν}, the bound (5.2) reads

P[X ∈ K] ≤ exp( −2⟨ν, E[X] − p⟩₊² / Σ_{n=1}^N ν_n² L_n² ).

In the extreme case in which K is precisely the closed ball B_r(p − rω), the P-measure of K is at most C r^N π^{N/2}/Γ(1 + N/2), i.e. at most C times the Lebesgue measure of the ball. The appropriate notion of sharpness for these bounds in the high-dimensional limit is the one used in large deviations theory [9, §I.1]; see also e.g. [8, 28] for surveys of the large deviations literature. Two sequences (αₙ)_{n∈N} and (βₙ)_{n∈N} are said to be logarithmically equivalent, denoted αₙ ≃ βₙ, if

(1/n) log αₙ − (1/n) log βₙ ≡ log(αₙ/βₙ)^{1/n} → 0 as n → ∞.  (5.4)

Are the half-space bound (5.2) and the measure of B_r(p − rω) logarithmically equivalent? That is, does the conditional probability

P[ X ∈ B_r(p − rω) | X ∈ H_{p,ν} ],
when raised to the power 1/N, converge to 1 as N → ∞? To simplify the asymptotic expansions below, in all lines after the first two, we shall take E[X] = 0 and L₁ = ··· = L_N = 1. Then

(1/N) log P[X ∈ B_r(p − rω)] − (1/N) log(r.h.s. of (5.2))
  ≤ (1/N) log( C r^N π^{N/2} / Γ(1 + N/2) ) + 2⟨ν, E[X] − p⟩₊² / ( N Σ_{n=1}^N ν_n² L_n² )
  = 2⟨ν, p⟩² / (N‖ν‖₂²) + (1/N) log( C r^N π^{N/2} ) − (1/N) log Γ(1 + N/2),

which, by Stirling’s approximation for the Gamma function [1, p. 256, eq. (6.1.37)], is approximately

  ≈ 2⟨ν, p⟩² / (N‖ν‖₂²) + (1/N) log( C r^N π^{N/2} ) − (1/N) log( √(πN) (N/(2e))^{N/2} )
  ∼ 2⟨ν, p⟩² / (N‖ν‖₂²) + (log C)/N + log r + ½ log π − ½ log(N/(2e))
  ∼ 2⟨ν, p⟩² / (N‖ν‖₂²) + log r − log √N + O(1).

Note that ⟨ν, p⟩²/‖ν‖₂² ≤ ‖p‖₂² ≤ d_w(0, p) ≤ N when the components of p have modulus at most 1, where d_w denotes the weighted Hamming distance with weight w = (1, ..., 1); hence the first term above remains bounded as N → ∞, and the difference of the logarithms can vanish in the limit only if log r − log √N remains bounded, i.e. only if r is of the same order as √N. That is, it is necessary that K is sufficiently round that it has an interior ball of radius comparable to √N at those frontier points where the normal distance d_{⊥,Ψ}(E[X], K) is attained.
Now suppose that K = f⁻¹([−∞, θ]) is a convex sublevel set of a twice-differentiable function f. Let η₁, ..., η_{N−1}, ν be a basis of R^N such that ‖η₁‖₂ = ··· = ‖η_{N−1}‖₂ = ‖ν‖₂ = 1 and, for each n ∈ {1, ..., N−1}, η_n is perpendicular to ν. Suppose that, in this system of normal coordinates, near p, the frontier of K can be approximated by a parabola:

∂K = { y₁η₁ + ··· + y_{N−1}η_{N−1} − y_N ν | y_N = Σ_{n=1}^{N−1} λ_n y_n² }

with λ₁ ≥ λ₂ ≥ ··· ≥ λ_{N−1} ≥ 0. Then the condition that K admits an interior ball of radius r at p is the inequality

r − √( r² − Σ_{n=1}^{N−1} y_n² ) ≥ Σ_{n=1}^{N−1} λ_n y_n²  whenever Σ_{n=1}^{N−1} y_n² ≤ r².

This, in turn, leads to the following condition on λ₁: it must hold that λ₁ ≤ 1/(2r). Put another way (taking r comparable to √N, say r = √N, so that 1/(2r) = (4N)^{−1/2}), the half-space method cannot be expected to provide asymptotically sharp bounds for P[f(X) ≤ θ] if, when f is approximated in normal coordinates near the closest point of f⁻¹([−∞, θ]) to E[X] by a non-negative quadratic form, that quadratic form has an eigenvalue greater than (4N)^{−1/2}.
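A one-line numerical confirmation (ours) of the threshold λ₁ ≤ 1/(2r): the smallest ratio of the spherical cap height r − √(r² − ρ²) to ρ² over 0 < ρ ≤ r is attained as ρ → 0 and equals 1/(2r).

```python
import numpy as np

r = 2.5   # arbitrary interior-ball radius
rho = np.linspace(1e-3, r, 100_000)
ratio = (r - np.sqrt(r**2 - rho**2)) / rho**2   # cap height / rho^2
print(f"min ratio = {ratio.min():.8f}, 1/(2r) = {1 / (2 * r):.8f}")
assert abs(ratio.min() - 1 / (2 * r)) < 1e-6
```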
6. Appendix: Chernoff Bounds

The method of Chernoff bounds [5] is a simple one: the probability of a subset of 𝒳 is bounded by that of a containing half-space, and the probability of that half-space is bounded using the moment-generating function of the probability measure. In what follows, F^⋆ denotes the convex conjugate (Legendre–Fenchel transform) of a function F: 𝒳* → R ∪ {±∞}, i.e. F^⋆(x) := sup_{ℓ∈𝒳*} ( ⟨ℓ, x⟩ − F(ℓ) ).

Lemma 6.1 (Chernoff bounds). For any half-space H_{p,ν} ⊆ 𝒳,

P[X ∈ H_{p,ν}] ≤ inf_{s≥0} e^{s⟨ν,p⟩} M_X(−sν).  (6.1)

For any convex set K ⊆ 𝒳,

P[X ∈ K] ≤ inf_{(p,ν)∈N*K} e^{⟨ν,p⟩} M_X(−ν)  (6.2a)
  = exp( −sup_{p∈K} (Λ_X + χ_{−N*_p K})^⋆(p) ).  (6.2b)

In particular, for any x ∈ 𝒳,

P[X = x] ≤ exp( −Λ_X^⋆(x) ).  (6.3)

Proof.
By the definition of the half-space H_{p,ν},

P[X ∈ H_{p,ν}] = P[⟨ν, X⟩ ≤ ⟨ν, p⟩]
  = E[ 1_{⟨ν, p−X⟩ ≥ 0} ]
  ≤ E[ e^{s⟨ν, p−X⟩} ]  for any s ≥ 0
  = e^{s⟨ν,p⟩} E[ e^{⟨−sν, X⟩} ]
  = e^{s⟨ν,p⟩} M_X(−sν).

Since this inequality holds for any s ≥ 0, taking the infimum over all such s yields (6.1). Recall that the outward normal cone to a convex set at any point is closed under multiplication by non-negative scalars; hence, for any convex set K ⊆ 𝒳, taking the infimum of the right-hand side of (6.1) over half-spaces H_{p,ν} that contain K yields (6.2a). Now observe that

inf_{(p,ν)∈N*K} e^{⟨ν,p⟩} M_X(−ν)
  = inf_{(p,ν)∈N*K} exp( ⟨ν, p⟩ + Λ_X(−ν) )
  = exp( inf_{p∈K} inf_{ν∈N*_pK} ( ⟨ν, p⟩ + Λ_X(−ν) ) )
  = exp( −sup_{p∈K} sup_{ν∈−N*_pK} ( ⟨ν, p⟩ − Λ_X(ν) ) )
  = exp( −sup_{p∈K} (Λ_X + χ_{−N*_pK})^⋆(p) ),

which establishes (6.2b); (6.3) follows as a special case. □
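A numerical rendering of (6.1) (our sketch): for a standard exponential random variable, whose moment-generating function is known in closed form, the infimum over s can be computed by a scalar minimization and compared with the exact lower-tail probability.

```python
import math
from scipy.optimize import minimize_scalar

p = 0.2   # half-space {x <= p}, below the mean E[X] = 1 (arbitrary choice)

# X ~ Exponential(1): M_X(t) = 1/(1 - t) for t < 1, so M_X(-s) = 1/(1 + s).
chernoff = minimize_scalar(lambda s: math.exp(s * p) / (1 + s),
                           bounds=(0.0, 1e3), method="bounded").fun
exact = 1 - math.exp(-p)            # exact P[X <= p]
closed_form = p * math.exp(1 - p)   # optimizer s* = 1/p - 1 gives p e^{1-p}
print(f"exact {exact:.4f} <= Chernoff {chernoff:.4f} (closed form {closed_form:.4f})")
assert exact <= chernoff and abs(chernoff - closed_form) < 1e-6
```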
References

1. M. Abramowitz and I. A. Stegun (eds.), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications Inc., New York, 1992, reprint of the 1972 edition. MR 1225604 (94b:00012)
2. D. Bakry and M. Émery, Diffusions hypercontractives, Séminaire de Probabilités, XIX, 1983/84, Lecture Notes in Math., vol. 1123, Springer, Berlin, 1985, http://dx.doi.org/10.1007/BFb0075847, pp. 177–206. MR 889476 (88j:60131)
3. S. G. Bobkov and M. Ledoux, On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures, J. Funct. Anal. (1998), no. 2, 347–365, http://dx.doi.org/10.1006/jfan.1997.3187. MR 1636948 (99e:60051)
4. S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities using the entropy method, Ann. Probab. (2003), no. 3, 1583–1614, http://dx.doi.org/10.1214/aop/1055425791. MR 1989444 (2004i:60023)
5. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004. MR 2061575 (2005d:90002)
6. H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statistics (1952), 493–507, http://dx.doi.org/10.1214/aoms/1177729330. MR 0057518 (15,241c)
7. A. Dembo, Information inequalities and concentration of measure, Ann. Probab. (1997), no. 2, 927–939, http://dx.doi.org/10.1214/aop/1024404424. MR 1434131 (98e:60027)
8. A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, second ed., Applications of Mathematics (New York), vol. 38, Springer-Verlag, New York, 1998. MR 1619036 (99d:60030)
9. F. den Hollander, Large Deviations, Fields Institute Monographs, vol. 14, American Mathematical Society, Providence, RI, 2000. MR 1739680 (2001f:60028)
10. L. Gross, Logarithmic Sobolev inequalities, Amer. J. Math. (1975), no. 4, 1061–1083, http://dx.doi.org/10.2307/2373688. MR 0420249
11. W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. (1963), no. 301, 13–30, http://dx.doi.org/10.2307/2282952. MR 0144363
12. R. Holley and D. Stroock, Logarithmic Sobolev inequalities and stochastic Ising models, J. Statist. Phys. (1987), no. 5-6, 1159–1194, http://dx.doi.org/10.1007/BF01011161. MR 893137 (89e:82013)
13. W. B. Johnson and G. Schechtman, Remarks on Talagrand's deviation inequality for Rademacher functions, Functional Analysis (Austin, TX, 1987/1989), Lecture Notes in Math., vol. 1470, Springer, Berlin, 1991, http://dx.doi.org/10.1007/BFb0090214, pp. 72–77. MR 1126739 (92m:60017)
14. M. Ledoux, On Talagrand's deviation inequalities for product measures, ESAIM Probab. Statist. (1995/97), 63–87 (electronic), http://dx.doi.org/10.1051/ps:1997103. MR 1399224 (97j:60005)
15. M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001. MR 1849347 (2003k:28019)
16. M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, Ergebnisse der Mathematik und ihrer Grenzgebiete (3), vol. 23, Springer-Verlag, Berlin, 1991. MR 1102015 (93c:60001)
17. P. Lévy, Problèmes Concrets d'Analyse Fonctionnelle, avec un complément sur les fonctionnelles analytiques par F. Pellegrino, second ed., Gauthier-Villars, Paris, 1951. MR 0041346 (12,834a)
18. G. Lugosi, Concentration-of-measure inequalities, lecture notes, Pompeu Fabra University, Barcelona, Spain, 25 June 2009.
19. K. Marton, Bounding d̄-distance by informational divergence: a method to prove measure concentration, Ann. Probab. (1996), no. 2, 857–866, http://dx.doi.org/10.1214/aop/1039639365. MR 1404531 (97f:60064)
20. C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, 1989 (Norwich, 1989), London Math. Soc. Lecture Note Ser., vol. 141, Cambridge Univ. Press, Cambridge, 1989, pp. 148–188. MR 1036755 (91e:05077)
21. C. McDiarmid, Centering sequences with bounded differences, Combin. Probab. Comput. (1997), no. 1, 79–86, http://dx.doi.org/10.1017/S0963548396002854. MR 1436721 (98b:60020)
22. C. McDiarmid, Concentration, Probabilistic Methods for Algorithmic Discrete Mathematics, Algorithms Combin., vol. 16, Springer, Berlin, 1998, pp. 195–248. MR 1678578 (2000d:60032)
23. V. D. Milman and G. Schechtman, Asymptotic Theory of Finite-Dimensional Normed Spaces, Lecture Notes in Mathematics, vol. 1200, Springer-Verlag, Berlin, 1986, with an appendix by M. Gromov. MR 856576 (87m:46038)
24. W. Rudin, Functional Analysis, second ed., International Series in Pure and Applied Mathematics, McGraw-Hill Inc., New York, 1991. MR 1157815 (92k:46001)
25. T. J. Sullivan, U. Topcu, M. McKerns, and H. Owhadi, Uncertainty quantification via codimension-one partitioning, Int. J. Numer. Meth. Eng., in press (2010), http://dx.doi.org/10.1002/nme.3030.
26. M. Talagrand, An isoperimetric theorem on the cube and the Kintchine–Kahane inequalities, Proc. Amer. Math. Soc. (1988), no. 3, 905–909, http://dx.doi.org/10.2307/2046814. MR 964871 (90h:60016)
27. M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, Inst. Hautes Études Sci. Publ. Math. (1995), 73–205, http://dx.doi.org/10.1007/BF02699376. MR 1361756 (97h:60016)
28. S. R. S. Varadhan, Large deviations, Ann. Probab. (2008), no. 2, 397–419, http://dx.doi.org/10.1214/07-AOP348. MR 2393987 (2009d:60070)
29. V. H. Vu, Concentration of non-Lipschitz functions and applications, Random Structures Algorithms (2002), no. 3, 262–316, Probabilistic methods in combinatorial optimization. MR 1900610 (2003c:60053)

Graduate Aerospace Laboratories, California Institute of Technology, Mail Code 205-45, 1200 East California Boulevard, Pasadena, CA 91125, United States of America
E-mail address: [email protected]

Applied & Computational Mathematics and Control & Dynamical Systems, California Institute of Technology, Mail Code 217-50, 1200 East California Boulevard, Pasadena, CA 91125, United States of America
E-mail address: [email protected]