Equivalence of concentration inequalities for linear and non-linear functions
T. J. SULLIVAN AND H. OWHADI
Abstract.
We consider a random variable X that takes values in a (possibly infinite-dimensional) topological vector space 𝒳. We show that, with respect to an appropriate “normal distance” on 𝒳, concentration inequalities for linear and non-linear functions of X are equivalent. This normal distance corresponds naturally to the concentration rate in classical concentration results such as Gaussian concentration and concentration on the Euclidean and Hamming cubes. Under suitable assumptions on the roundness of the sets of interest, the concentration inequalities so obtained are asymptotically optimal in the high-dimensional limit.

Key words and phrases: concentration of measure, large deviations, quasiconvexity, normal distance.

The authors acknowledge portions of this work supported by the United States Department of Energy National Nuclear Security Administration under Award Number DE-FC52-08NA28613 through the California Institute of Technology’s ASC/PSAAP Center for the Predictive Modeling and Simulation of High Energy Density Dynamic Response of Materials.

1. Introduction
It is by now almost classical that smooth enough convex functions enjoy good concentration properties; see e.g. [15, 18, 22, 23] for surveys of the literature. It is also known that convexity can be neglected in the Gaussian case and that the smoothness assumptions are not essential and can be replaced, for instance, with bounded martingale differences; see e.g. [20, 21] and also [29].

A common feature of many concentration results is that an appropriate notion of distance is needed, e.g. Talagrand’s convex distance [27]. In this paper, a notion of “normal distance” on a topological vector space 𝒳 is introduced through a technique commonly used in large deviations theory, Chernoff bounding, i.e. estimating the measure of a set by using a containing half-space. Although simple, this method leads to a notion of distance that is in some sense “natural” with respect to the duality structure on 𝒳. Remarkably, with respect to this distance, concentration inequalities on the tails of linear, convex, quasiconvex and non-linear functions on 𝒳 are mutually equivalent.

Concentration of measure is based on a simple but non-trivial observation originally due to Lévy [17]: in a high-dimensional probability space, “nearly all” the probability mass lies close to any set with measure at least 1/2; put another way, functions of many independent variables with small sensitivity to each individual input are very nearly constant. A typical concentration inequality is of the form

P[ |f(X) − m| ≥ r ] ≤ C₁ exp(−C₂ r²),  (1.1)
where f is a suitably well-behaved function, X is a random variable such that the push-forward measure (f ∘ X)∗P has some concentration property, and m is either the mean value E[f(X)] or the median value M[f(X)]; sometimes the control is one-sided, and the absolute value in (1.1) is omitted. A notable feature of this paper is that it provides concentration inequalities with m = f(E[X]).

The key property of the normal distance of this paper is contained in the following portmanteau theorem for the equivalence of various concentration inequalities with respect to normal distance:

Theorem 1.1.
Let 𝒳 be a real topological vector space and 𝒳* its continuous dual space. Let Ψ: 𝒳* → [0, +∞] be positively homogeneous of degree one. Define the Ψ-normal distance from x ∈ 𝒳 to A ⊆ 𝒳 by

d_{⊥,Ψ}(x, A) := sup{ (⟨ν, x − p⟩)₊ / Ψ(ν) | p ∈ 𝒳 and ν ∈ 𝒳* such that, for all a ∈ A, ⟨ν, a⟩ ≤ ⟨ν, p⟩ },

with the convention that 0/0 := 0. Then the following statements about any random variable X that takes values in 𝒳 are equivalent:

(i) for every closed half-space H_{p,ν} := { x ∈ 𝒳 | ⟨ν, x − p⟩ ≤ 0 } ⊆ 𝒳, where p ∈ 𝒳 and ν ∈ 𝒳*,
  P[X ∈ H_{p,ν}] ≤ exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 );
(ii) for every convex set K ⊆ 𝒳,
  P[X ∈ K] ≤ exp( −d_{⊥,Ψ}(E[X], K)² / 2 );
(iii) for every measurable A ⊆ 𝒳,
  P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 );
(iv) for every measurable f: 𝒳 → R ∪ {±∞} and every θ ∈ R ∪ {±∞},
  P[f(X) ≤ θ] ≤ exp( −d_{⊥,Ψ}(E[X], f⁻¹([−∞, θ]))² / 2 );
(v) for every quasiconvex f: 𝒳 → R ∪ {±∞} and every θ ∈ R ∪ {±∞},
  P[f(X) ≤ θ] ≤ exp( −d_{⊥,Ψ}(E[X], f⁻¹([−∞, θ]))² / 2 ).

Note that if f is quasilinear (i.e. both f and −f are quasiconvex), then formulation (v) yields concentration inequalities for both the lower and upper tails of f(X).

The notation and setting of the paper are covered in section 2, along with a review of some definitions and results from the concentration-of-measure literature. Normal distance is defined and its properties (including theorem 1.1) are examined in section 3. In section 4, the normalizing function Ψ is determined explicitly in several cases, thereby connecting theorem 1.1 with classical concentration results. In particular, proposition 4.4 identifies the normal distance that corresponds to the concentration of a vector, the entries of which are the empirical (sampled) means of functions of independent random variables. In section 5, it is shown that the inequality in theorem 1.1(iii) is asymptotically sharp (in the sense used in large deviations theory) in the high-dimensional limit, provided that A is convex and “sufficiently round” at those points of A that are closest to the center of mass E[X]. Finally, for completeness, the method of Chernoff bounds and its consequences for convex sets are reviewed in an appendix (section 6).
Figure 2.1. A convex set K and its outward normal cones at points p, q, r ∈ K: ∂K is smooth at p ∈ ∂K, so N*_p K is a half-line; ∂K has a vertex at q, so N*_q K is a pointed convex cone with non-empty interior; at the interior point r, N*_r K is the trivial cone {0}.

2. Notation and Background
Let 𝒳 be a real topological vector space. Let 𝒳* denote the continuous dual space of 𝒳 and let ⟨ℓ, x⟩ denote the dual pairing between ℓ ∈ 𝒳* and x ∈ 𝒳; ⟨v, ℓ⟩ will also denote the dual pairing between v ∈ 𝒳** and ℓ ∈ 𝒳*. It is not strictly necessary to assume that 𝒳 is locally convex, but the results of this paper may be trivially true if 𝒳* does not contain enough linear functionals.

2.1. Half-Spaces.
Given p ∈ 𝒳 and ν ∈ 𝒳*, H_{p,ν} will denote the closed half-space of 𝒳 that has p in its frontier and outward-pointing normal ν, i.e.

H_{p,ν} := { x ∈ 𝒳 | ⟨ν, x⟩ ≤ ⟨ν, p⟩ }.  (2.1)

Note well the degenerate case H_{p,0} = 𝒳. Every (p, ν) ∈ 𝒳 × 𝒳* defines a unique closed half-space of 𝒳, whereas a given closed half-space can have multiple distinct representations: H_{p,ν} = H_{p′,ν′} if, and only if, ν is a positive multiple of ν′ and ⟨ν, p − p′⟩ = ⟨ν′, p − p′⟩ = 0.

2.2. Convex Sets and Cones.
The closed convex hull of A ⊆ 𝒳 will be denoted by co̅(A). Given a closed convex set K ⊆ 𝒳 and p ∈ K, N*_p K denotes the outward normal cone to K at p, and N*K denotes the outward normal bundle of K:

N*_p K := { ν ∈ 𝒳* | K ⊆ H_{p,ν} },  (2.2)
N*K := { (p, ν) ∈ 𝒳 × 𝒳* | p ∈ K, ν ∈ N*_p K }.  (2.3)

The outward normal cone N*_p K is a pointed convex cone: it contains 0, is convex, and s₁ν₁ + s₂ν₂ ∈ N*_p K for all s₁, s₂ ≥ 0 and ν₁, ν₂ ∈ N*_p K. Also, N*_p K = {0} if p is an interior point of K. Note that N*K ⊆ 𝒳 × 𝒳* is not necessarily a convex set. See figure 2.1 for an illustration.

2.3. Quasiconvexity. If K ⊆ 𝒳 is a convex set, then a function f: K → R ∪ {±∞} is said to be quasiconvex if, for every θ ∈ R ∪ {±∞}, the sublevel set

f⁻¹([−∞, θ]) := { x ∈ K | −∞ ≤ f(x) ≤ θ }  (2.4)

is a convex set; equivalently, f is quasiconvex if, for all x, y ∈ K and t ∈ [0, 1],

f((1 − t)x + ty) ≤ max{ f(x), f(y) }.  (2.5)

f is said to be quasiconcave if −f is quasiconvex, and f is said to be quasilinear if it is both quasiconvex and quasiconcave. Every convex (resp. concave, linear) function is quasiconvex (resp. quasiconcave, quasilinear), but not vice versa. In particular, a function f: R^N → R is quasilinear if, and only if, it is the composition of a monotone function with a linear functional on R^N [5, p. 122].

2.4. Indicator and Characteristic Functions.
Given a set A ⊆ 𝒳, 1_A and χ_A denote its indicator function and characteristic function respectively:

1_A(x) := 1, if x ∈ A; 0, if x ∉ A.  (2.6)
χ_A(x) := 0, if x ∈ A; +∞, if x ∉ A.  (2.7)

Note that, for any convex set K ⊆ 𝒳, χ_K is a convex function.

2.5. Probabilistic Notions.
Let (Ω, ℱ, P) be a probability space and let X: Ω → 𝒳 be an 𝒳-valued random variable. E[·] denotes the expectation operator with respect to the probability measure P: E[X] is defined to be any m ∈ 𝒳 such that

E[⟨ℓ, X − m⟩] ≡ ∫_Ω ⟨ℓ, X(ω) − m⟩ dP(ω) = 0 for all ℓ ∈ 𝒳*;  (2.8)

if 𝒳* separates the points of 𝒳 (e.g. if 𝒳 is a Banach space), then E[X] is unique. For Y: Ω → R, any m ∈ R that satisfies

sup{ v ∈ R | P[Y ≤ v] ≤ 1/2 } ≤ m ≤ inf{ v ∈ R | P[Y ≤ v] ≥ 1/2 }  (2.9)

will be called a median of Y and denoted M[Y]. M_X: 𝒳* → [0, +∞] denotes the moment-generating function defined by

M_X(ℓ) := E[exp⟨ℓ, X⟩] for all ℓ ∈ 𝒳*.  (2.10)

Λ_X: 𝒳* → R ∪ {±∞} denotes the cumulant generating function (or logarithmic moment-generating function) defined by

Λ_X(ℓ) := log M_X(ℓ) = log E[exp⟨ℓ, X⟩] for all ℓ ∈ 𝒳*.  (2.11)

By Hölder’s inequality, Λ_X is a convex function.
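As a quick numerical illustration of these definitions (our sketch, not part of the paper's formal development; the two-point distribution is an arbitrary choice), the following code evaluates the cumulant generating function of a bounded real random variable, checks its convexity, and checks Hoeffding's classical bound Λ_X(ℓ) ≤ ℓE[X] + ℓ²L²/8, which is used again in section 4.

```python
import numpy as np

# Two-point distribution on {a, b} = {0, 1} with P[X = 1] = 0.3:
# a bounded random variable whose range is an interval of length L = 1.
a, b, q = 0.0, 1.0, 0.3
L = b - a
mean = (1 - q) * a + q * b

def cgf(l):
    """Cumulant generating function Lambda_X(l) = log E[exp(l X)]."""
    return np.log((1 - q) * np.exp(l * a) + q * np.exp(l * b))

ls = np.linspace(-10.0, 10.0, 2001)
vals = cgf(ls)

# Convexity of Lambda_X (Hoelder): discrete second differences are >= 0.
assert np.all(np.diff(vals, 2) >= -1e-12)

# Hoeffding's lemma: Lambda_X(l) <= l E[X] + l^2 L^2 / 8.
assert np.all(vals <= ls * mean + ls**2 * L**2 / 8 + 1e-12)
print("Lambda_X is convex and satisfies Hoeffding's bound on [-10, 10].")
```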
2.6. Talagrand’s Inequalities.
It has been known for some time that convex sets and functions enjoy good concentration properties; moreover, to get good concentration results, it is necessary to measure distances in the right way.

For example, a theorem of Talagrand shows that if a convex set K ⊆ R^N occupies a “significant” portion of the Hamming cube {−1, +1}^N and t ≫ 1, then “nearly all” of the points of the Hamming cube lie within Euclidean distance t of K. Define the Euclidean Hausdorff distance from x ∈ R^N to A ⊆ R^N by

d_Haus(x, A) := inf{ ‖x − a‖₂ | a ∈ A }.  (2.12)

Talagrand [26] showed that if X is uniformly distributed in {−1, +1}^N then, for any A ⊆ R^N, E[exp(d_Haus(X, co̅(A))²/16)] ≤ P[X ∈ A]⁻¹; hence, Chebyshev’s inequality implies that, for any t ≥ 0,

P[X ∈ A] P[d_Haus(X, co̅(A)) ≥ t] ≤ exp(−t²/16).  (2.13)

More interesting results can be obtained if one uses not the Euclidean distance but the Hamming distance, or, more accurately, an infimum over weighted Hamming distances. For w = (w₁, ..., w_N) ∈ [0, +∞)^N, define the w-weighted Hamming distance d_w on a product of sets 𝒳 = ∏_{n=1}^N 𝒳_n by

d_w(x, y) := Σ_{n=1}^N w_n (1 − δ_{x_n, y_n});  (2.14)

that is, d_w(x, y) is the w-weighted sum of the number of components in which x, y ∈ 𝒳 differ. For x ∈ 𝒳 and A ⊆ 𝒳, set d_w(x, A) := inf_{a∈A} d_w(x, a). Define Talagrand’s convex distance from x ∈ 𝒳 to A ⊆ 𝒳 by

d_Tal(x, A) := sup{ d_w(x, A) | w ∈ [0, +∞)^N, Σ_{n=1}^N w_n² = 1 },  (2.15)

and, for A, B ⊆ 𝒳, let d_Tal(A, B) := inf_{a∈A} d_Tal(a, B). Talagrand [27] showed that if X = (X₁, ..., X_N) is any 𝒳-valued random variable with independent components, then

P[X ∈ A] P[X ∈ B] ≤ exp(−d_Tal(A, B)²/4).  (2.16)

These bounds on the probabilities of sets lead to deviation inequalities for convex Lipschitz functions. For example (cf. [13, 26]), let X be any random variable in the unit cube in R^N with independent components, and let f: [0, 1]^N → R be convex and Lipschitz with ‖f‖_Lip ≤ 1; then, for any t ≥ 0,

P[f(X) ≥ M[f(X)] + t] ≤ 2 exp(−t²/4).  (2.17)

Note, however, that these results use not only the convexity of the function of interest, but also require Lipschitz continuity. What concentration inequalities can be shown to hold without smoothness assumptions?
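Inequality (2.13) is easy to probe numerically. The following Monte Carlo sketch (ours, not from the paper; the subcube A is an arbitrary choice for which the distance to the closed convex hull has a closed form) checks it on the Hamming cube.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, trials, t = 20, 5, 200_000, 3.0

# A = {x in {-1,+1}^N : x_1 = ... = x_k = +1}.  Its closed convex hull is
# {y : y_1 = ... = y_k = +1, y_n in [-1,+1] otherwise}, so the Euclidean
# distance from x to co(A) is sqrt(sum_{n<=k} (x_n - 1)^2).
X = rng.choice([-1.0, 1.0], size=(trials, N))
d_hull = np.sqrt(np.sum((X[:, :k] - 1.0) ** 2, axis=1))

p_A = np.mean(np.all(X[:, :k] == 1.0, axis=1))   # P[X in A] = 2^{-k}
p_far = np.mean(d_hull >= t)                     # P[d(X, co(A)) >= t]

lhs = p_A * p_far
rhs = np.exp(-t**2 / 16)
print(f"P[X in A] * P[d >= t] = {lhs:.3e} <= exp(-t^2/16) = {rhs:.3e}")
assert lhs <= rhs
```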
2.7. McDiarmid’s Inequality. One smoothness-free concentration inequality is McDiarmid’s inequality [20], also known as the bounded differences inequality, which itself generalizes an earlier inequality of Hoeffding [11]. McDiarmid’s inequality is by no means the strongest concentration-of-measure inequality in the literature, but is useful because of its simple hypotheses and proof.

Define the McDiarmid diameter of f, denoted D[f], by

D[f] := ( Σ_{n=1}^N D_n[f]² )^{1/2},  (2.18)

where the n-th McDiarmid subdiameter D_n[f] is defined by

D_n[f] := sup{ |f(x) − f(y)| | x_j = y_j for j ≠ n }.  (2.19)

When E[|f(X)|] is finite and X₁, ..., X_N are independent, McDiarmid’s inequality bounds the deviations of f(X) from E[f(X)] in terms of the McDiarmid diameter of f: for any r > 0,

P[f(X) − E[f(X)] ≤ −r] ≤ exp(−2r²/D[f]²),  (2.20a)
P[f(X) − E[f(X)] ≥ r] ≤ exp(−2r²/D[f]²).  (2.20b)

McDiarmid’s inequality implies that, for any θ ∈ R ∪ {±∞},

P[f(X) ≤ θ] ≤ exp(−2(E[f(X)] − θ)₊²/D[f]²),  (2.21a)
P[f(X) ≥ θ] ≤ exp(−2(θ − E[f(X)])₊²/D[f]²).  (2.21b)

McDiarmid’s inequality (and similar inequalities such as martingale inequalities) have the advantage that a bound on the tails of f(X) is obtained solely in terms of the mean output E[f(X)] and the McDiarmid diameter D[f]. However, McDiarmid’s inequality cannot take advantage of any other properties of f such as convexity or monotonicity; furthermore, if f has infinite McDiarmid diameter on the essential range of X, then the trivial upper bound 1 is obtained.

There are many other sources of concentration-of-measure inequalities: these include logarithmic Sobolev inequalities and the Herbst argument [2, 10, 12], the entropy method [3, 4, 14], and information-theoretic methods [7, 19]. Of particular interest are those concentration results that apply to infinite-dimensional settings [16].
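As an elementary sanity check (ours), (2.20b) can be compared against the exact binomial tail for the sum f(x) = x₁ + ··· + x_N on {0, 1}^N, for which every subdiameter is D_n[f] = 1 and hence D[f]² = N.

```python
import math

N, p, r = 100, 0.5, 10.0   # arbitrary example: fair coins, deviation r

# f(x) = sum(x); each subdiameter D_n[f] = 1, so D[f]^2 = N.
mcdiarmid = math.exp(-2 * r**2 / N)

# Exact tail P[f(X) - E[f(X)] >= r] for f(X) ~ Binomial(N, p).
mean = N * p
exact = sum(math.comb(N, j) * p**j * (1 - p)**(N - j)
            for j in range(math.ceil(mean + r), N + 1))

print(f"exact tail = {exact:.4e} <= McDiarmid bound = {mcdiarmid:.4e}")
assert exact <= mcdiarmid
```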
3. Normal Distance

As noted above, efficient presentation of many concentration-of-measure inequalities relies on having an appropriate notion of function variation (e.g. the Lipschitz norm or McDiarmid diameter) or distance (e.g. Talagrand’s convex distance). The inequalities that will be established in section 4 can be phrased in terms of transforms of moment-generating functions, but are more transparent if phrased in terms of a normal distance, which will be introduced in this section.

Fix a function Ψ: 𝒳* → [0, +∞] that is positively homogeneous of degree one, i.e. such that Ψ(αℓ) = αΨ(ℓ) for all α ≥ 0 and ℓ ∈ 𝒳*. By analogy with the situation in finite-dimensional Euclidean space, in which Ψ = ‖·‖₂ on (R^N)*, define the distance from a point x ∈ 𝒳 to a half-space H_{p,ν} ⊆ 𝒳 by

d_{⊥,Ψ}(x, H_{p,ν}) := (⟨ν, x − p⟩)₊ / Ψ(ν),  (3.1)

with the convention that 0/0 := 0, since the distance from any x ∈ 𝒳 to the trivial half-space H_{p,0} = 𝒳 ought to be zero. Note that d_{⊥,Ψ}(x, H_{p,ν}) = 0 whenever x ∈ H_{p,ν}; note also that the homogeneity assumption on Ψ ensures that (3.1) is an unambiguous definition. We now generalize (3.1) to more general subsets of 𝒳 than half-spaces. The heuristic is that the distance from x to A ⊆ 𝒳 should be the greatest possible distance (in the sense of (3.1)) from x to any half-space that contains A; the existence of the degenerate half-space H_{p,0} ensures that the normal distance is zero if there are no proper half-spaces that contain A.
Definition 3.1. Let x ∈ 𝒳 and A ⊆ 𝒳. The Ψ-normal distance from x to A, denoted d_{⊥,Ψ}(x, A), is defined (with the same convention that 0/0 := 0) by

d_{⊥,Ψ}(x, A) := sup{ (⟨ν, x − p⟩)₊ / Ψ(ν) | p ∈ 𝒳 and ν ∈ 𝒳* such that A ⊆ H_{p,ν} }.  (3.2)

The Ψ-normal distance from A ⊆ 𝒳 to B ⊆ 𝒳 is defined by d_{⊥,Ψ}(A, B) := inf_{a∈A} d_{⊥,Ψ}(a, B). In the special case 𝒳 = R^N and Ψ = ‖·‖₂ on (R^N)*, we shall simply write d_⊥ for d_{⊥,Ψ}, i.e.

d_⊥(x, A) := sup{ (ν · (x − p))₊ / ‖ν‖₂ | p ∈ R^N and ν ∈ (R^N)* such that A ⊆ H_{p,ν} }.  (3.3)

Note well that the definition of the normal distance d_{⊥,Ψ}(x, A) does not require 𝒳 to be normed; even when 𝒳 is equipped with a norm ‖·‖_𝒳 and Ψ is the corresponding operator norm, the normal distance d_{⊥,Ψ}(x, A) is not the same as the Hausdorff distance from x to A defined by

d_Haus(x, A) := inf{ ‖x − a‖_𝒳 | a ∈ A };  (3.4)

see figure 3.1 for an illustration. Note also that it is not generally true that d_{⊥,Ψ}(A, B) = d_{⊥,Ψ}(B, A): consider e.g. B := {(0, 2)} and A as in figure 3.1, in which case

d_{⊥,Ψ}(A, B) = inf_{a∈A} d_{⊥,Ψ}(a, B) = 1 ≠ 0 = d_{⊥,Ψ}(B, A).

For any x ∈ 𝒳 and A ⊆ B ⊆ 𝒳, it holds that d_{⊥,Ψ}(x, B) ≤ d_{⊥,Ψ}(x, A). Furthermore, since a closed half-space H_{p,ν} contains A if, and only if, it contains the closed convex hull co̅(A) of A, the following equality holds:

d_{⊥,Ψ}(x, A) = d_{⊥,Ψ}(x, co̅(A)) for all x ∈ 𝒳 and all A ⊆ 𝒳.  (3.5)
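Concretely, for bounded A and Ψ = ‖·‖₂, the supremum in (3.3) may be restricted to unit vectors ν with p a supporting point, giving d_⊥(x, A) = sup_{‖ν‖₂=1} (⟨ν, x⟩ − sup_{a∈A}⟨ν, a⟩)₊. The sketch below (our illustration; the particular arc is an arbitrary stand-in for the set A of figure 3.1) evaluates this by brute force over a grid of directions and contrasts it with d_Haus.

```python
import numpy as np

# A: an arc of the circle of radius 2, chosen so that the nearest point of
# its closed convex hull to the origin (on the chord) is at distance 1,
# while the nearest point of A itself (on the arc) is at distance 2.
thetas = np.linspace(np.pi / 6, 5 * np.pi / 6, 2001)
A = 2.0 * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
x = np.zeros(2)

# d_perp(x, A) = sup over unit nu of (<nu, x> - sup_{a in A} <nu, a>)_+ ,
# since the tightest half-space H_{p,nu} containing A has <nu, p> = sup_a <nu, a>.
phis = np.linspace(0.0, 2.0 * np.pi, 4001)
nus = np.stack([np.cos(phis), np.sin(phis)], axis=1)
d_perp = np.max(np.clip(nus @ x - np.max(nus @ A.T, axis=1), 0.0, None))

d_haus = np.min(np.linalg.norm(A - x, axis=1))
print(f"d_perp(0, A) = {d_perp:.3f}, d_Haus(0, A) = {d_haus:.3f}")  # ~1.0 vs 2.0
```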
Remark 3.2. It is natural to ask what, if any, relation there is between the normal distance and Talagrand’s convex distance. The simplest answer is to say that the two distances should be compared only with great caution, since each belongs to a different setting: Talagrand’s distance is defined on a product of sets, whereas the normal distance is defined on a topological vector space. Even on R^N, the two distances measure different quantities: in some sense, d_Tal(x, A) measures how many of the coordinates of x are covered by A, but does not measure the geometric distance between them; on the other hand, d_{⊥,Ψ}(x, A) is a much more geometric measure of how far x is from A in terms of linear functionals on 𝒳, and the “size” of those linear functionals is measured by Ψ. In particular, Talagrand’s convex distance is positively homogeneous of degree zero, whereas the normal distance is positively homogeneous of degree one: for any x ∈ R^N, A ⊆ R^N, and α > 0,

d_Tal(αx, αA) = d_Tal(x, A),
d_{⊥,Ψ}(αx, αA) = α d_{⊥,Ψ}(x, A).

Figure 3.1. An example of a subset A of the Euclidean plane R² for which the normal distance d_⊥(0, A) = 1 unit (cf. the dashed line), as opposed to the Euclidean Hausdorff distance d_Haus(0, A) = 2 units (cf. the dotted arc).

This section concludes with the proof of the portmanteau theorem (theorem 1.1) and some final remarks on its applicability:
Proof of theorem 1.1.
The equivalence will be established by showing that

(i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (v) ⇒ (i).

Suppose that (i) holds. Then

P[X ∈ K] ≤ inf_{H_{p,ν} ⊇ K} P[X ∈ H_{p,ν}]   (by monotonicity of P)
  ≤ inf_{H_{p,ν} ⊇ K} exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 )   (by (i))
  = exp( −(1/2) sup_{H_{p,ν} ⊇ K} d_{⊥,Ψ}(E[X], H_{p,ν})² )
  = exp( −d_{⊥,Ψ}(E[X], K)² / 2 )   (by (3.2)).

Hence, (i) implies (ii).

Suppose that (ii) holds; then

P[X ∈ A] ≤ P[X ∈ co̅(A)]   (since A ⊆ co̅(A))
  ≤ exp( −d_{⊥,Ψ}(E[X], co̅(A))² / 2 )   (by (ii))
  = exp( −d_{⊥,Ψ}(E[X], A)² / 2 )   (by (3.5)),

and so (ii) implies (iii). (iv) follows from (iii) upon setting A := { x ∈ 𝒳 | f(x) ≤ θ }. (v) is clearly a special case of (iv). (i) follows from (v) upon setting f := χ_{H_{p,ν}} and θ := 1. □
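The step (i) ⇒ (ii) is constructive: a bound for a convex set K is obtained by optimizing the half-space bound over the supporting half-spaces of K. A brute-force numerical rendering of this optimization (ours; the Gaussian law and the disc K are arbitrary choices, and the half-space bound is the one supplied by proposition 4.1 below):

```python
import numpy as np

# X ~ N(0, I_2); by proposition 4.1 below (with Psi the Euclidean norm),
# P[X in H_{p,nu}] <= exp(-d_perp(0, H_{p,nu})^2 / 2) for each half-space.
c, r = np.array([3.0, 0.0]), 1.0   # K = disc of radius r about c (arbitrary)

phis = np.linspace(0.0, 2.0 * np.pi, 4001)
nus = np.stack([np.cos(phis), np.sin(phis)], axis=1)  # unit normals

# For each unit nu, the tightest half-space containing K is
# {x : <nu, x> <= <nu, c> + r}, at distance (<nu, 0 - c> - r)_+ from 0.
dists = np.clip(-(nus @ c) - r, 0.0, None)
d_K = dists.max()                       # = d_perp(E[X], K) = |c| - r = 2
bound = np.exp(-d_K**2 / 2)             # optimized half-space bound on P[X in K]
print(f"d_perp = {d_K:.3f}, P[X in K] <= {bound:.4f}")
```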
Remark 3.3. It is important to note that all the bounds in theorem 1.1 may be trivial if the dual space 𝒳* is not rich enough. For example, given a measure space (𝒵, ℱ, µ), for 0 < p < 1, the space

L^p(𝒵, ℱ, µ; R) := { f: 𝒵 → R | ‖f‖_p := ( ∫_𝒵 |f(z)|^p dµ(z) )^{1/p} < +∞ }

is a topological vector space with respect to the quasinorm topology generated by ‖·‖_p. This space is not locally convex and has a trivial dual space: the only continuous linear functional on this space is the zero functional, and so the only closed half-space is the whole space. See e.g. [24] for further discussion of the spaces L^p([0, 1]; R) for 0 < p < 1.

At the other extreme, one could ask whether the topology on 𝒳 can be dispensed with altogether, defining the normal distance using arbitrary linear functionals on 𝒳. This can be done, and most results go through mutatis mutandis; in particular, it is necessary to replace all references to the closed convex hull co̅(A) of A ⊆ 𝒳 with the convex hull co(A); the analogue of (3.5) (with Ψ now defined on the algebraic dual of 𝒳) is

d_{⊥,Ψ}(x, A) = d_{⊥,Ψ}(x, co(A)) for all x ∈ 𝒳 and all A ⊆ 𝒳.

The principal disadvantage of ignoring all topological structure on 𝒳, of course, is that there are no longer notions of interior, closure and frontier, although it still makes sense to discuss the extremal points of convex sets.

4. Normal Distance as a Concentration Rate
The method of Chernoff bounding (reviewed in lemma 6.1) gives bounds on P[X ∈ H_{p,ν}] in terms of the moment-generating function M_X. If these bounds can be formulated in terms of a suitable normal distance, then theorem 1.1 produces equivalent bounds on P[X ∈ K] for convex K, on P[X ∈ A], &c. As noted in [18], a Chernoff bound for P[f(X) ≥ θ] is never better than the best bound using all the moments of f(X): if f takes only non-negative values, then

inf_{k∈N} θ^{−k} E[f(X)^k] ≤ inf_{s≥0} e^{−sθ} E[e^{sf(X)}].  (4.1)

However, Chernoff bounds have the advantage that they are geometrically very easy to handle.

The next result provides the normal distance formulation for an 𝒳-valued Gaussian random variable (in fact, for a family of such variables). In the special case of a single Gaussian random vector X on 𝒳 = R^N with covariance operator C_X = σ²I_N, proposition 4.1 yields the classical Chernoff bound for a multivariate normal random variable.

Proposition 4.1.
Let Γ be a family of Gaussian random vectors in 𝒳. For each X ∈ Γ, let C_X: 𝒳* → 𝒳** be its covariance operator, defined by

⟨C_X ℓ, ν⟩ := E[⟨ℓ, X − E[X]⟩⟨ν, X − E[X]⟩].  (4.2)

Let E := { E[X] | X ∈ Γ }, let

Ψ(ν) := sup_{X∈Γ} √⟨C_X ν, ν⟩,  (4.3)

and let d_{⊥,Ψ} be the corresponding normal distance. Then, for any A ⊆ 𝒳,

sup_{X∈Γ} P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E, A)² / 2 ).  (4.4)
Proof. For each X ∈ Γ, the moment-generating function of X is given by

M_X(ℓ) := E[e^{⟨ℓ,X⟩}] = exp( ⟨ℓ, E[X]⟩ + ⟨C_X ℓ, ℓ⟩/2 ).  (4.5)

Therefore,

P[X ∈ H_{p,ν}] ≤ inf_{s≥0} exp( s⟨ν, p − E[X]⟩ + s²⟨C_X ν, ν⟩/2 )   (by (4.5) and lemma 6.1)
  = exp( −⟨ν, E[X] − p⟩₊² / (2⟨C_X ν, ν⟩) )
  ≤ exp( −⟨ν, E[X] − p⟩₊² / (2Ψ(ν)²) )   (by (4.3))
  = exp( −d_{⊥,Ψ}(E[X], H_{p,ν})² / 2 )   (by (3.2)).

Hence, by theorem 1.1,

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 ),

and so

sup_{X∈Γ} P[X ∈ A] ≤ sup_{X∈Γ} exp( −d_{⊥,Ψ}(E[X], A)² / 2 ) = exp( −(1/2) inf_{X∈Γ} d_{⊥,Ψ}(E[X], A)² ) = exp( −d_{⊥,Ψ}(E, A)² / 2 ). □
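For a single Gaussian (Γ = {X}) and a half-space, (4.4) can be compared directly with the exact Gaussian tail, since ⟨ν, X⟩ is a scalar Gaussian. A quick check (our sketch; all parameters are arbitrary):

```python
import numpy as np
from scipy.stats import norm

m = np.array([1.0, 2.0])                  # E[X]
C = np.array([[2.0, 0.5], [0.5, 1.0]])    # covariance operator C_X
p, nu = np.array([4.0, 0.0]), np.array([-1.0, -0.5])   # half-space data

# Exact: <nu, X> ~ N(<nu, m>, <C nu, nu>), so
# P[X in H_{p,nu}] = P[<nu, X> <= <nu, p>] = Phi(<nu, p - m> / sigma).
sigma = np.sqrt(nu @ C @ nu)
exact = norm.cdf((nu @ (p - m)) / sigma)

# Proposition 4.1 with Gamma = {X}: Psi(nu) = sigma, and
# d(E[X], H_{p,nu}) = (<nu, m - p>)_+ / Psi(nu).
d = max(nu @ (m - p), 0.0) / sigma
bound = np.exp(-d**2 / 2)
print(f"exact = {exact:.4e} <= bound = {bound:.4e}")
assert exact <= bound
```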
Lemma 6.1 also has the following consequences for random vectors supported in a cuboid in R^N; this encompasses two standard situations in which concentration is often studied, namely concentration for functions on the Euclidean unit cube and on the Hamming cube.

Proposition 4.2. Let X be a random vector in R^N with independent components such that each component X_n almost surely takes values in a fixed interval of length L_n. Let

Ψ(ν) := (1/2) √( Σ_{n=1}^N L_n² ν_n² )  (4.6)

and let d_{⊥,Ψ} be the corresponding normal distance. Then, for any A ⊆ R^N,

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 ).  (4.7)

A fortiori, if X takes values in (a translate of) the unit cube [0, 1]^N, then

P[X ∈ A] ≤ exp( −2 d_⊥(E[X], A)² ),  (4.8)

and if X takes values in (a translate of) the Hamming cube {−1, +1}^N, then

P[X ∈ A] ≤ exp( −d_⊥(E[X], A)² / 2 ).  (4.9)

Proof.
The proof is similar to the Gaussian case: it is an application of lemma 6.1 and Hoeffding’s lemma [11, lemma 1 and (4.16)], which bounds the moment-generating function of X_n as follows:

M_{X_n}(ℓ_n) := E[exp(ℓ_n X_n)] ≤ exp( ℓ_n E[X_n] + ℓ_n² L_n²/8 ).

Note that the claim can also be proved directly by applying McDiarmid’s inequality to the function ⟨ν, ·⟩, which has mean E[⟨ν, X⟩] = ⟨ν, E[X]⟩ and McDiarmid diameter √(ν₁²L₁² + ··· + ν_N²L_N²). □

Remark 4.3.
Note the similarity between the normal distances of propositions 4.1 and 4.2. In the Gaussian case, the norm on 𝒳* is the one induced by the “largest” covariance operator in the family of random variables Γ. In the bounded-range case, the norm on (R^N)* is the one induced by the “largest” covariance operator for random variables satisfying the range constraint: if X is a real-valued random variable taking values in an interval [a, b], then Ψ(ν) = (b − a)|ν|/2 and Var[X] ≤ (b − a)²/4; this upper bound on the variance is attained by a Bernoulli random variable with law (δ_a + δ_b)/2.

The next result identifies the normal distance that corresponds to the concentration of a vector, the entries of which are the empirical (sampled) means of functions of independent random variables.

Proposition 4.4.
For n = 1, ..., N, let Z_n := f_n(Y_{n,1}, ..., Y_{n,K(n)}) be a real-valued function of independent random variables Y_{n,k}, and suppose that f_n has finite McDiarmid diameter D[f_n]. Let Z = (Z₁, ..., Z_N). Suppose that the random inputs of each f_n are sampled independently M(n) times according to the distribution P and that the empirical average

Ê[Z] := ( (1/M(n)) Σ_{m=1}^{M(n)} f_n( Y^{(m)}_{n,1}, ..., Y^{(m)}_{n,K(n)} ) )_{n=1}^N ∈ R^N  (4.10)

is formed. Then, for any A ⊆ R^N,

P[Ê[Z] ∈ A] ≤ exp( −d_{⊥,Ψ}(E[Z], A)² / 2 ),  (4.11)

where the normalizing function Ψ: (R^N)* → [0, +∞) is given in terms of the McDiarmid diameters of the functions f₁, ..., f_N and the sample sizes M(1), ..., M(N):

Ψ(ν) := (1/2) ( Σ_{n=1}^N ν_n² D[f_n]²/M(n) )^{1/2}.  (4.12)

Proof.
Let H_{p,ν} ⊊ R^N be a half-space. Consider the real-valued random variable ⟨ν, Ê[Z]⟩ as a function of the sampled input random variables Y^{(m)}_{n,k}. Suppose that the McDiarmid subdiameter of f_n with respect to Y_{n,k} is D_{n,k}. Then the McDiarmid subdiameter of ⟨ν, Ê[Z]⟩ with respect to the m-th sample of Y_{n,k} is |ν_n| D_{n,k}/M(n). Hence, the McDiarmid diameter of ⟨ν, Ê[Z]⟩ is

√( Σ_{k,n,m} ν_n² D_{n,k}²/M(n)² ) = √( Σ_{n,m} ν_n² D[f_n]²/M(n)² ) = √( Σ_n ν_n² D[f_n]²/M(n) ).

Therefore, since Ê[Z] is an unbiased estimator for E[Z] (i.e. E[Ê[Z]] = E[Z]), McDiarmid’s inequality (2.21a) implies that

P[Ê[Z] ∈ H_{p,ν}] = P[⟨ν, Ê[Z]⟩ ≤ ⟨ν, p⟩]
  ≤ exp( −2(⟨ν, E[Z]⟩ − ⟨ν, p⟩)₊² / Σ_{n=1}^N ν_n² D[f_n]²/M(n) )
  = exp( −⟨ν, E[Z] − p⟩₊² / (2Ψ(ν)²) )
  = exp( −d_{⊥,Ψ}(E[Z], H_{p,ν})² / 2 ).

The claim now follows from theorem 1.1. □
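In practice, (4.11) with a half-space A is just McDiarmid's inequality applied to ⟨ν, Ê[Z]⟩, so the bound is cheap to evaluate once the diameters D[f_n] and sample sizes M(n) are known. A small helper (ours; all numbers are arbitrary placeholders):

```python
import numpy as np

def halfspace_bound(EZ, D, M, p, nu):
    """Bound (4.11) for A = H_{p,nu}: exp(-d_{perp,Psi}(E[Z], H)^2 / 2),
    with Psi(nu) = 0.5 * sqrt(sum_n nu_n^2 D[f_n]^2 / M(n))."""
    EZ, D, M, p, nu = map(np.asarray, (EZ, D, M, p, nu))
    psi = 0.5 * np.sqrt(np.sum(nu**2 * D**2 / M))
    d = max(nu @ (EZ - p), 0.0) / psi
    return np.exp(-d**2 / 2)

# Example: N = 2 summary statistics with diameters D[f_n] and sample sizes M(n);
# the half-space is the lower-tail event {E_hat[Z_1] + E_hat[Z_2] <= 1.1}.
print(halfspace_bound(EZ=[1.0, 0.5], D=[1.0, 2.0], M=[400, 900],
                      p=[0.8, 0.3], nu=[1.0, 1.0]))
```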
An example of the application of proposition 4.4 is the following:
Example 4.5 (Functions of empirical means). The Chernoff bounding method can be used to provide much-improved confidence levels for quantities derived from many empirical (as opposed to exact) means; see e.g. [25]. Suppose that H: R^N → R is some function of interest: in particular, the quantity of interest is H(E[Z₁], ..., E[Z_N]) for some absolutely integrable real-valued random variables Z₁, ..., Z_N. If, however, the exact means E[Z_n] are unknown, then empirical means Ê[Z_n] may be used in their place if appropriate confidence corrections are made. Suppose that “error” corresponds to concluding, based on the empirical means, that H(E[Z]) is smaller than it actually is. Given α ∈ R^N, set

H_α(z₁, ..., z_N) := H(z₁ + α₁, ..., z_N + α_N).  (4.13)

Therefore, given any ε > 0, we seek an appropriate “margin hit” α = α(ε) ∈ R^N (typically, α_n ≥ 0 for each n ∈ {1, ..., N}) such that

P[ H_α(Ê[Z₁], ..., Ê[Z_N]) ≥ H(E[Z₁], ..., E[Z_N]) ] ≥ 1 − ε.

Dually, given α ∈ R^N, we seek a sharp upper bound on the probability of error, i.e. on

P[ H_α(Ê[Z₁], ..., Ê[Z_N]) ≤ H(E[Z₁], ..., E[Z_N]) ].

If H (and hence H_α) is monotonically non-decreasing in each of its N arguments and Z₁, ..., Z_N are independent, then the probability of non-error can be bounded from below as follows:

P[ H_α(Ê[Z]) ≥ H(E[Z]) ] = P[ H_α(Ê[Z]) ≥ H_α(E[Z] − α) ]
  ≥ Π_{n=1}^N P[ Ê[Z_n] ≥ E[Z_n] − α_n ]
  ≥ Π_{n=1}^N ( 1 − exp(−2M(n)α_n²/D[f_n]²) ).

Unfortunately, when N is large, the last line of this inequality is typically close to zero unless the sample sizes are very large, and so this bound is of limited use. Geometrically, this is analogous to the fact that a high-dimensional orthant (product of half-lines) appears to be very narrow from the perspective of an observer at its vertex. In contrast, half-spaces always fill a half of the observer’s field of view. To bound the probability of sublevel or superlevel sets using half-spaces requires H_α to have some convexity, not monotonicity, properties.

If H_α is quasiconvex, then the bounds using normal distances can be applied to good effect, and yield estimates that actually perform better the larger N is. In particular, if H_α is both quasiconvex and differentiable, then the outward normal to the sublevel set { H_α ≤ t } at a frontier point p is just any positive multiple of the derivative of H_α at p, and this yields the bound

P[ H_α(Ê[Z]) ≤ θ ] ≤ inf_{p: H_α(p) ≤ θ} exp( −2( Σ_{n=1}^N ∂_nH_α(p)(E[Z_n] − p_n) )₊² / Σ_{n=1}^N (∂_nH_α(p))² D[f_n]²/M(n) ).  (4.14)

In particular, taking θ = H(E[Z]) = H_α(E[Z] − α) and evaluating the exponential in (4.14) at p = E[Z] − α ∈ R^N yields

P[ H_α(Ê[Z]) ≤ H(E[Z]) ] ≤ exp( −2( Σ_{n=1}^N ∂_nH_α(p)α_n )² / Σ_{n=1}^N (∂_nH_α(p))² D[f_n]²/M(n) ).  (4.15)

(4.15) is particularly useful since it links the margin hits α_n, the sample sizes M(n), and the maximum probability of error. For example, given a desired level of confidence, margin hits α_n, and a total number of samples M ∈ N, one can choose sample sizes M(1), ..., M(N) that sum to M and minimize the right-hand side of (4.15); this yields an optimal distribution of sampling resources so as to ensure that H_α(Ê[Z]) ≥ H(E[Z]) with the desired level of confidence.
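As a concrete rendering of this last point (our sketch; the gradients, diameters, and margins are hypothetical placeholders), minimizing the right-hand side of (4.15) subject to Σ_n M(n) = M amounts to minimizing Σ_n w_n/M(n) with w_n = (∂_nH_α(p) D[f_n])², a separable convex problem for which a greedy allocation of one sample at a time is exact.

```python
import heapq, math

def allocate(w, M_total):
    """Minimize sum_n w[n]/M[n] over positive integers M[n] summing to
    M_total (equivalently, minimize the right-hand side of (4.15)).
    Greedy is exact because each term w[n]/M[n] is convex in M[n]."""
    N = len(w)
    M = [1] * N
    # Max-heap (via negation) of marginal gains w_n/M_n - w_n/(M_n + 1).
    heap = [(-w[n] / (1 * 2), n) for n in range(N)]
    heapq.heapify(heap)
    for _ in range(M_total - N):
        gain, n = heapq.heappop(heap)
        M[n] += 1
        heapq.heappush(heap, (-w[n] / (M[n] * (M[n] + 1)), n))
    return M

# Hypothetical data: gradients dH_n at p, diameters D[f_n], margins alpha_n.
dH, D, alpha = [1.0, 0.2, 3.0], [2.0, 1.0, 0.5], [0.1, 0.4, 0.05]
w = [(g * d) ** 2 for g, d in zip(dH, D)]        # terms (dH_n D_n)^2 / M(n)
M = allocate(w, M_total=1000)
S = sum(g * a for g, a in zip(dH, alpha))
V = sum(wn / mn for wn, mn in zip(w, M))
print(M, "error probability <=", math.exp(-2 * S**2 / V))
```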
5. High-Dimensional Asymptotics

The topic of this section is the asymptotic sharpness of the bounds introduced above as the dimension of the space 𝒳 becomes large. We begin with a comparison of the McDiarmid and half-space bounds for a simple function: a quadratic form on R^N.

Figure 5.1. For the quadratic form Q_N on R^N given in (5.1), a comparison of the McDiarmid upper bound (squares) and the half-space upper bound (triangles) on P[Q_N(X) ≤ θ] for two values of θ (dotted line and hollow polygons; solid line and filled polygons). The horizontal axis is N; the vertical axis is the logarithm of the upper bound.

Example 5.1 (Comparison with McDiarmid’s inequality). The following example serves to illustrate how the half-space method can produce upper bounds on the measure of suitable sublevel sets that are superior to those offered by McDiarmid’s inequality; it also shows how this effect is more pronounced in higher-dimensional spaces. Consider the following quadratic form Q_N on R^N:

Q_N(x) := ‖x − (½, ..., ½)‖₂².  (5.1)
For any θ > 0, the sublevel set Q_N⁻¹([−∞, θ]) is simply a ball of radius √θ about the point (½, ..., ½). Suppose that a random vector X takes values in [−½, +½]^N with independent components. Each component then ranges over an interval of length 1, so D[Q_N]² = N, and McDiarmid’s inequality (2.21a) implies that

P[Q_N(X) ≤ θ] ≤ exp( −2(E[Q_N(X)] − θ)₊²/N ),

where E[Q_N(X)] = N/4 + Σ_{n=1}^N Var[X_n] ∈ [N/4, N/2] if E[X] = 0. If also E[X] = 0, then proposition 4.2 implies that

P[Q_N(X) ≤ θ] ≤ exp( −2(√N/2 − √θ)₊² ).

For small N and large θ, McDiarmid’s bound is the sharper of the two. However, for small θ (and, notably, as N → ∞ for any fixed θ), the half-space bound is the sharper bound. See figure 5.1 for an illustration.
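The comparison is easy to tabulate. The sketch below (ours) evaluates both log-bounds for a concrete stand-in distribution, components uniform on [−1/2, 1/2], so that E[Q_N(X)] = N/3; this distribution is our choice, not the paper's, and other laws on the cube shift the McDiarmid curve but not the half-space one.

```python
import numpy as np

# X_n independent, uniform on [-1/2, 1/2]: E[Q_N(X)] = N(1/12 + 1/4) = N/3.
for N in [10, 30, 100, 300]:
    for theta in [1.0, 3.0]:
        EQ = N / 3.0
        mcd = -2.0 * max(EQ - theta, 0.0) ** 2 / N                    # log of (2.21a) bound
        half = -2.0 * max(np.sqrt(N) / 2 - np.sqrt(theta), 0.0) ** 2  # log of half-space bound
        winner = "half-space" if half < mcd else "McDiarmid"
        print(f"N={N:4d} theta={theta}: log-bounds {mcd:9.2f} vs {half:9.2f} -> {winner}")
```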
The previous example suggests that bounds constructed using the half-space method may perform very well in high dimension, but also that the sharpness of the bound may depend on “how round” the set whose measure we wish to bound is. To fix ideas, suppose that X = (X₁, ..., X_N): Ω → R^N is a random vector with independent components, where X_n is supported on an interval of length L_n, and let Ψ be as in proposition 4.2. For A ⊆ R^N, how sharp is the bound

P[X ∈ A] ≤ exp( −d_{⊥,Ψ}(E[X], A)² / 2 )?  (5.2)

Figure 5.2. It is not reasonable to expect that (an upper bound for) the measure of the half-space H_{e₁,−e₁} is a sharp upper bound for the measure of the narrow wedge K_ε when ε is small.

First, note that since d_{⊥,Ψ}(E[X], A) = d_{⊥,Ψ}(E[X], co̅(A)), the bound cannot be expected to be sharp if A differs greatly from its closed convex hull, and so it makes sense to restrict investigation to the case that A = K, a closed and convex subset of R^N. Secondly, it is not reasonable to expect the bound (5.2) on P[X ∈ K] to be sharp if K is sharply pointed, e.g. if K is the narrow wedge K_ε of angle ε ≪ 1 about e₁ := (1, 0, ...,
0) in R^N:

K_ε := { x ∈ R^N | (x − e₁)·e₁ ≥ ‖x − e₁‖₂ cos ε };  (5.3)

see figure 5.2. Therefore, we wish to consider the opposite situation in which K has no sharp points, which will be made precise by requiring that K satisfy an interior ball condition.

Suppose that (p, ν) ∈ N*K is such that d_{⊥,Ψ}(E[X], H_{p,ν}) = d_{⊥,Ψ}(E[X], K). Suppose also that B_r(p − rω) ⊆ K, with r > 0 and ω ∈ R^N a unit vector, is an interior ball for K at p ∈ ∂K; cf. figure 5.3.

Figure 5.3. An interior ball of radius r for the closed convex set K at the frontier point p. Necessarily, p is a point at which ∂K is smooth; K admits no interior ball of positive radius at the vertex q. For convenience, the unit vector ω ∈ R^N has been identified with ν ∈ N*_p K ⊆ (R^N)*.

If the law of X on R^N is highly singular, then it cannot be expected that the bound (5.2) is sharp, so suppose that the law of X has a density with respect to Lebesgue measure that is bounded above by some constant C > 0. For the half-space H_{p,ν}, the bound (5.2) reads

P[X ∈ K] ≤ exp( −2⟨ν, E[X] − p⟩₊² / Σ_{n=1}^N ν_n² L_n² ).

In the extreme case in which K is precisely the closed ball B_r(p − rω), the P-measure of K is at most C r^N π^{N/2}/Γ(1 + N/2), i.e. at most C times the Lebesgue measure of the ball. The appropriate notion of sharpness for these bounds in the high-dimensional limit is the one used in large deviations theory [9, §I.1]; see also e.g. [8, 28] for surveys of the large deviations literature. Two sequences (αₙ)_{n∈N} and (βₙ)_{n∈N} are said to be logarithmically equivalent, denoted αₙ ≃ βₙ, if

(1/n) log αₙ − (1/n) log βₙ ≡ log(αₙ/βₙ)^{1/n} → 0 as n → ∞.  (5.4)

Are the half-space bound (5.2) and the measure of B_r(p − rω) logarithmically equivalent? That is, does the conditional probability

P[ X ∈ B_r(p − rω) | X ∈ H_{p,ν} ],
when raised to the power 1/N, converge to 1 as N → ∞? To simplify the asymptotic expansions below, in all lines after the first two, we shall take E[X] = 0 and L₁ = ··· = L_N = 1. Then

(1/N) log P[X ∈ B_r(p − rω)] − (1/N) log(r.h.s. of (5.2))
  ≤ (1/N) log( C r^N π^{N/2} / Γ(1 + N/2) ) + 2⟨ν, E[X] − p⟩₊² / ( N Σ_{n=1}^N ν_n² L_n² )
  = 2⟨ν, p⟩² / (N‖ν‖₂²) + (1/N) log( C r^N π^{N/2} ) − (1/N) log Γ(1 + N/2),

which, by Stirling’s approximation for the Gamma function [1, p. 256, eq. (6.1.37)], is approximately

  ≈ 2⟨ν, p⟩² / (N‖ν‖₂²) + (1/N) log( C r^N π^{N/2} ) − (1/N) log( √(πN) (N/(2e))^{N/2} )
  ∼ 2⟨ν, p⟩² / (N‖ν‖₂²) + (log C)/N + log r + ½ log π − ½ log(N/(2e))
  ∼ 2⟨ν, p⟩² / (N‖ν‖₂²) + log r − log √N + O(1).

Note that ⟨ν, p⟩²/‖ν‖₂² ≤ ‖p‖₂² ≤ d_w(0, p) ≤ N when the components of p have modulus at most 1, where d_w denotes the weighted Hamming distance with weight w = (1, ..., 1); hence the first term above remains bounded as N → ∞, and the difference of the logarithms can vanish in the limit only if log r − log √N remains bounded, i.e. only if r is of the same order as √N. That is, it is necessary that K is sufficiently round that it has an interior ball of radius comparable to √N at those frontier points where the normal distance d_{⊥,Ψ}(E[X], K) is attained.
Now suppose that K = f⁻¹([−∞, θ]) is a convex sublevel set of a twice-differentiable function f. Let η₁, ..., η_{N−1}, ν be a basis of R^N such that ‖η₁‖₂ = ··· = ‖η_{N−1}‖₂ = ‖ν‖₂ = 1 and, for each n ∈ {1, ..., N−1}, η_n is perpendicular to ν. Suppose that, in this system of normal coordinates, near p, the frontier of K can be approximated by a parabola:

∂K = { y₁η₁ + ··· + y_{N−1}η_{N−1} − y_N ν | y_N = Σ_{n=1}^{N−1} λ_n y_n² }

with λ₁ ≥ λ₂ ≥ ··· ≥ λ_{N−1} ≥ 0. Then the condition that K admits an interior ball of radius r at p is the inequality

r − √( r² − Σ_{n=1}^{N−1} y_n² ) ≥ Σ_{n=1}^{N−1} λ_n y_n²  whenever Σ_{n=1}^{N−1} y_n² ≤ r².

This, in turn, leads to the following condition on λ₁: it must hold that λ₁ ≤ 1/(2r). Put another way (taking r comparable to √N, say r = √N, so that 1/(2r) = (4N)^{−1/2}), the half-space method cannot be expected to provide asymptotically sharp bounds for P[f(X) ≤ θ] if, when f is approximated in normal coordinates near the closest point of f⁻¹([−∞, θ]) to E[X] by a non-negative quadratic form, that quadratic form has an eigenvalue greater than (4N)^{−1/2}.
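A one-line numerical confirmation (ours) of the threshold λ₁ ≤ 1/(2r): the smallest ratio of the spherical cap height r − √(r² − ρ²) to ρ² over 0 < ρ ≤ r is attained as ρ → 0 and equals 1/(2r).

```python
import numpy as np

r = 2.5   # arbitrary interior-ball radius
rho = np.linspace(1e-3, r, 100_000)
ratio = (r - np.sqrt(r**2 - rho**2)) / rho**2   # cap height / rho^2
print(f"min ratio = {ratio.min():.8f}, 1/(2r) = {1 / (2 * r):.8f}")
assert abs(ratio.min() - 1 / (2 * r)) < 1e-6
```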
6. Appendix: Chernoff Bounds

The method of Chernoff bounds [5] is a simple one: the probability of a subset of 𝒳 is bounded by that of a containing half-space, and the probability of that half-space is bounded using the moment-generating function of the probability measure. In what follows, F^⋆ denotes the convex conjugate (Legendre–Fenchel transform) of a function F: 𝒳* → R ∪ {±∞}, i.e. F^⋆(x) := sup_{ℓ∈𝒳*} ( ⟨ℓ, x⟩ − F(ℓ) ).

Lemma 6.1 (Chernoff bounds). For any half-space H_{p,ν} ⊆ 𝒳,

P[X ∈ H_{p,ν}] ≤ inf_{s≥0} e^{s⟨ν,p⟩} M_X(−sν).  (6.1)

For any convex set K ⊆ 𝒳,

P[X ∈ K] ≤ inf_{(p,ν)∈N*K} e^{⟨ν,p⟩} M_X(−ν)  (6.2a)
  = exp( −sup_{p∈K} (Λ_X + χ_{−N*_p K})^⋆(p) ).  (6.2b)

In particular, for any x ∈ 𝒳,

P[X = x] ≤ exp( −Λ_X^⋆(x) ).  (6.3)

Proof.
By the definition of the half-space H_{p,ν},

P[X ∈ H_{p,ν}] = P[⟨ν, X⟩ ≤ ⟨ν, p⟩]
  = E[ 1_{⟨ν, p−X⟩ ≥ 0} ]
  ≤ E[ e^{s⟨ν, p−X⟩} ]  for any s ≥ 0
  = e^{s⟨ν,p⟩} E[ e^{⟨−sν, X⟩} ]
  = e^{s⟨ν,p⟩} M_X(−sν).

Since this inequality holds for any s ≥ 0, taking the infimum over all such s yields (6.1). Recall that the outward normal cone to a convex set at any point is closed under multiplication by non-negative scalars; hence, for any convex set K ⊆ 𝒳, taking the infimum of the right-hand side of (6.1) over half-spaces H_{p,ν} that contain K yields (6.2a). Now observe that

inf_{(p,ν)∈N*K} e^{⟨ν,p⟩} M_X(−ν)
  = inf_{(p,ν)∈N*K} exp( ⟨ν, p⟩ + Λ_X(−ν) )
  = exp( inf_{p∈K} inf_{ν∈N*_pK} ( ⟨ν, p⟩ + Λ_X(−ν) ) )
  = exp( −sup_{p∈K} sup_{ν∈−N*_pK} ( ⟨ν, p⟩ − Λ_X(ν) ) )
  = exp( −sup_{p∈K} (Λ_X + χ_{−N*_pK})^⋆(p) ),

which establishes (6.2b); (6.3) follows as a special case. □
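A numerical rendering of (6.1) (our sketch): for a standard exponential random variable, whose moment-generating function is known in closed form, the infimum over s can be computed by a scalar minimization and compared with the exact lower-tail probability.

```python
import math
from scipy.optimize import minimize_scalar

p = 0.2   # half-space {x <= p}, below the mean E[X] = 1 (arbitrary choice)

# X ~ Exponential(1): M_X(t) = 1/(1 - t) for t < 1, so M_X(-s) = 1/(1 + s).
chernoff = minimize_scalar(lambda s: math.exp(s * p) / (1 + s),
                           bounds=(0.0, 1e3), method="bounded").fun
exact = 1 - math.exp(-p)            # exact P[X <= p]
closed_form = p * math.exp(1 - p)   # optimizer s* = 1/p - 1 gives p e^{1-p}
print(f"exact {exact:.4f} <= Chernoff {chernoff:.4f} (closed form {closed_form:.4f})")
assert exact <= chernoff and abs(chernoff - closed_form) < 1e-6
```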
References

1. M. Abramowitz and I. A. Stegun (eds.), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications Inc., New York, 1992, reprint of the 1972 edition. MR 1225604 (94b:00012)
2. D. Bakry and M. Émery, Diffusions hypercontractives, Séminaire de Probabilités, XIX, 1983/84, Lecture Notes in Math., vol. 1123, Springer, Berlin, 1985, http://dx.doi.org/10.1007/BFb0075847, pp. 177–206. MR 889476 (88j:60131)
3. S. G. Bobkov and M. Ledoux, On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures, J. Funct. Anal. (1998), no. 2, 347–365, http://dx.doi.org/10.1006/jfan.1997.3187. MR 1636948 (99e:60051)
4. S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities using the entropy method, Ann. Probab. (2003), no. 3, 1583–1614, http://dx.doi.org/10.1214/aop/1055425791. MR 1989444 (2004i:60023)
5. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004. MR 2061575 (2005d:90002)
6. H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statistics (1952), 493–507, http://dx.doi.org/10.1214/aoms/1177729330. MR 0057518 (15,241c)
7. A. Dembo, Information inequalities and concentration of measure, Ann. Probab. (1997), no. 2, 927–939, http://dx.doi.org/10.1214/aop/1024404424. MR 1434131 (98e:60027)
8. A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, second ed., Applications of Mathematics (New York), vol. 38, Springer-Verlag, New York, 1998. MR 1619036 (99d:60030)
9. F. den Hollander, Large Deviations, Fields Institute Monographs, vol. 14, American Mathematical Society, Providence, RI, 2000. MR 1739680 (2001f:60028)
10. L. Gross, Logarithmic Sobolev inequalities, Amer. J. Math. (1975), no. 4, 1061–1083, http://dx.doi.org/10.2307/2373688. MR 0420249
11. W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. (1963), no. 301, 13–30, http://dx.doi.org/10.2307/2282952. MR 0144363
12. R. Holley and D. Stroock, Logarithmic Sobolev inequalities and stochastic Ising models, J. Statist. Phys. (1987), no. 5-6, 1159–1194, http://dx.doi.org/10.1007/BF01011161. MR 893137 (89e:82013)
13. W. B. Johnson and G. Schechtman, Remarks on Talagrand's deviation inequality for Rademacher functions, Functional Analysis (Austin, TX, 1987/1989), Lecture Notes in Math., vol. 1470, Springer, Berlin, 1991, http://dx.doi.org/10.1007/BFb0090214, pp. 72–77. MR 1126739 (92m:60017)
14. M. Ledoux, On Talagrand's deviation inequalities for product measures, ESAIM Probab. Statist. (1995/97), 63–87 (electronic), http://dx.doi.org/10.1051/ps:1997103. MR 1399224 (97j:60005)
15. M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001. MR 1849347 (2003k:28019)
16. M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, Ergebnisse der Mathematik und ihrer Grenzgebiete (3), vol. 23, Springer-Verlag, Berlin, 1991. MR 1102015 (93c:60001)
17. P. Lévy, Problèmes Concrets d'Analyse Fonctionnelle, avec un complément sur les fonctionnelles analytiques par F. Pellegrino, second ed., Gauthier-Villars, Paris, 1951. MR 0041346 (12,834a)
18. G. Lugosi, Concentration-of-measure inequalities, lecture notes, Pompeu Fabra University, Barcelona, Spain, 25 June 2009.
19. K. Marton, Bounding d̄-distance by informational divergence: a method to prove measure concentration, Ann. Probab. (1996), no. 2, 857–866, http://dx.doi.org/10.1214/aop/1039639365. MR 1404531 (97f:60064)
20. C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, 1989 (Norwich, 1989), London Math. Soc. Lecture Note Ser., vol. 141, Cambridge Univ. Press, Cambridge, 1989, pp. 148–188. MR 1036755 (91e:05077)
21. C. McDiarmid, Centering sequences with bounded differences, Combin. Probab. Comput. (1997), no. 1, 79–86, http://dx.doi.org/10.1017/S0963548396002854. MR 1436721 (98b:60020)
22. C. McDiarmid, Concentration, Probabilistic Methods for Algorithmic Discrete Mathematics, Algorithms Combin., vol. 16, Springer, Berlin, 1998, pp. 195–248. MR 1678578 (2000d:60032)
23. V. D. Milman and G. Schechtman, Asymptotic Theory of Finite-Dimensional Normed Spaces, Lecture Notes in Mathematics, vol. 1200, Springer-Verlag, Berlin, 1986, with an appendix by M. Gromov. MR 856576 (87m:46038)
24. W. Rudin, Functional Analysis, second ed., International Series in Pure and Applied Mathematics, McGraw-Hill Inc., New York, 1991. MR 1157815 (92k:46001)
25. T. J. Sullivan, U. Topcu, M. McKerns, and H. Owhadi, Uncertainty quantification via codimension-one partitioning, Int. J. Numer. Meth. Eng., in press (2010), http://dx.doi.org/10.1002/nme.3030.
26. M. Talagrand, An isoperimetric theorem on the cube and the Kintchine–Kahane inequalities, Proc. Amer. Math. Soc. (1988), no. 3, 905–909, http://dx.doi.org/10.2307/2046814. MR 964871 (90h:60016)
27. M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, Inst. Hautes Études Sci. Publ. Math. (1995), 73–205, http://dx.doi.org/10.1007/BF02699376. MR 1361756 (97h:60016)
28. S. R. S. Varadhan, Large deviations, Ann. Probab. (2008), no. 2, 397–419, http://dx.doi.org/10.1214/07-AOP348. MR 2393987 (2009d:60070)
29. V. H. Vu, Concentration of non-Lipschitz functions and applications, Random Structures Algorithms (2002), no. 3, 262–316, Probabilistic methods in combinatorial optimization. MR 1900610 (2003c:60053)

Graduate Aerospace Laboratories, California Institute of Technology, Mail Code 205-45, 1200 East California Boulevard, Pasadena, CA 91125, United States of America
E-mail address: [email protected]

Applied & Computational Mathematics and Control & Dynamical Systems, California Institute of Technology, Mail Code 217-50, 1200 East California Boulevard, Pasadena, CA 91125, United States of America
E-mail address: [email protected]