Dimension-free log-Sobolev inequalities for mixture distributions
HONG-BIN CHEN, SINHO CHEWI, AND JONATHAN NILES-WEED
Abstract.
We prove that if $(P_x)_{x\in\mathcal X}$ is a family of probability measures which satisfy the log-Sobolev inequality and whose pairwise chi-squared divergences are uniformly bounded, and $\mu$ is any mixing distribution on $\mathcal X$, then the mixture $\int P_x\,d\mu(x)$ satisfies a log-Sobolev inequality. In various settings of interest, the resulting log-Sobolev constant is dimension-free. In particular, our result implies a conjecture of Zimmermann and Bardet et al. that Gaussian convolutions of measures with bounded support enjoy dimension-free log-Sobolev inequalities.

1. Introduction
Functional inequalities, such as the Poincaré inequality and the log-Sobolev inequality, have played a key role in the study of subjects such as concentration of measure and quantitative convergence analysis of Markov processes [BGL14; Han16] (in particular for spin systems [Mar99; Wei04]), as well as the geometry of metric measure spaces [Led00]. It is therefore of considerable interest to identify situations in which such inequalities hold, and furthermore to identify simple criteria which imply their validity.

We begin with a few motivating examples. Suppose that $\mu$ is a probability measure on $\mathbb R^d$ whose support is contained in the Euclidean ball of radius $R$, and let $\gamma_{0,t}$ denote the centered Gaussian distribution with covariance $tI_d$. What functional inequalities can we expect the convolution measure $\mu * \gamma_{0,t}$ to satisfy? This question, motivated by random matrix theory, was initiated in [Zim13; Zim16], and further investigated in [WW+16; Bar+18]. These works prove that $\mu * \gamma_{0,t}$ satisfies both a Poincaré inequality and a log-Sobolev inequality; moreover, the Poincaré inequality holds with a constant depending only on $R$ and $t$, and not on the dimension $d$. Furthermore, [Bar+18] conjectures that the same holds true for the log-Sobolev constant, and they verify the conjecture in a number of special cases.

Date: February 24, 2021.
The sharp dimension dependence of the log-Sobolev inequality for Gaussian convolutions is of particular interest due to numerous recent applications in non-convex optimization and sampling; we refer to [Cha+19; Blo+20; BMR20].

Another line of work [CM10; Sch19] studies the following question: let $P_0$ and $P_1$ be two probability measures on $\mathbb R^d$, and consider the mixture distribution $(1-p)P_0 + pP_1$ with mixing weight $p \in (0,1)$. If $P_0$ and $P_1$ satisfy log-Sobolev inequalities, when does the mixture satisfy a log-Sobolev inequality too?

Although the two preceding examples may at first glance appear to be different in nature, we can in fact place them in the same framework, as follows. Let $(P_x)_{x\in\mathcal X}$ be a family of probability measures satisfying the log-Sobolev inequality, and let $\mu$ be a mixing distribution on $\mathcal X$; here, $\mathcal X$ may be finite or infinite. When does the mixture $\int P_x\,d\mu(x)$ satisfy a log-Sobolev inequality?
• For the Gaussian convolution example, we take $P_x$ to be the Gaussian distribution with mean $x$ and covariance $tI_d$.
• For the mixture example, we take $\mu$ to be the Bernoulli distribution with parameter $p$.
In this paper, we identify general conditions which ensure that a mixture distribution satisfies a log-Sobolev inequality. Our main contribution can be summarized as follows.

Theorem (informal). Let $(P_x)_{x\in\mathcal X}$ be a family of probability measures satisfying the log-Sobolev inequality with a uniform constant $C_1$. Assume that the pairwise chi-squared divergences $\chi^2(P_x \| P_{x'})$ are uniformly bounded by $C_2$. Then, the mixture $\int P_x\,d\mu(x)$ satisfies a log-Sobolev inequality with a constant depending only on $C_1$ and $C_2$.

In fact, in our main result, we will relax the assumption that the chi-squared divergences are uniformly bounded into a moment condition; see Theorem 1.
In turn, this will allow us to prove log-Sobolev inequalities for Gaussian convolutions of measures with sub-Gaussian tails, provided that the variance of the Gaussians is sufficiently large. Crucially, the log-Sobolev constant has no dependence on the mixing distribution $\mu$. As we show in Section 4, our general theorem yields dimension-free log-Sobolev inequalities in various settings; in particular, our result implies the conjecture of [Zim13; Zim16; Bar+18].

The rest of the paper is organized as follows. In Section 2, we describe the setting of our general investigation and recall the definitions of a Poincaré inequality and a log-Sobolev inequality. We then state and prove our main theorem in Section 3.
In Section 4, we illustrate our general result in a number of applications. Section 4.1 is devoted to the proof of the aforementioned conjecture, and Sections 4.2 and 4.3 generalize the result to Gaussian convolutions of measures with sub-Gaussian tails and to other diffusion semigroups. In Section 4.4, we compare our results to prior work on functional inequalities for mixtures of two distributions. Then, in Section 4.5, we discuss analogues of our result on the Boolean hypercube.
2. Background and notation
To state our results in a form that applies to both discrete and continuous mixture distributions, we adopt the general framework of [BGL14] and let $\Gamma$ be a suitable notion of a gradient operator. More precisely, let $\mathcal Y$ be a Polish space equipped with the Borel $\sigma$-algebra $\mathcal B_{\mathcal Y}$, and let $\mathcal A$ be a subspace of bounded measurable functions on $\mathcal Y$ containing all constant functions. Let $\Gamma : \mathcal A \times \mathcal A \to \mathcal A$ be a symmetric bilinear operator satisfying $\Gamma(f,f) \ge 0$ on $\mathcal Y$ for every $f \in \mathcal A$. In addition, we require $\Gamma$ to satisfy
$$\Gamma(1,1) = 0, \qquad (1)$$
where $1$ is understood as a constant function. For brevity, we write $\Gamma(f) = \Gamma(f,f)$.

Important examples include the usual squared gradient $\Gamma(f) = \|\nabla f\|^2$ on $\mathbb R^d$, and $\Gamma(f) = \sum_{i=1}^d (D_i f)^2$ on a product space $\mathcal Y = \mathcal Y_0^d$, where $D_i f$ is the discrete gradient
$$D_i f(x) := \sup_{x_i' \in \mathcal Y_0} f(x_1,\dots,x_{i-1},x_i',x_{i+1},\dots,x_d) - \inf_{x_i' \in \mathcal Y_0} f(x_1,\dots,x_{i-1},x_i',x_{i+1},\dots,x_d).$$
For any probability measure $\rho$ on $(\mathcal Y, \mathcal B_{\mathcal Y})$, we write $\mathbb E_\rho[f] := \int_{\mathcal Y} f\,d\rho$ for a $\rho$-integrable function $f$. In addition, we define
$$\operatorname{var}_\rho(f) := \mathbb E_\rho[(f - \mathbb E_\rho f)^2], \qquad \operatorname{ent}_\rho(g) := \mathbb E_\rho(g \log g) - \mathbb E_\rho g\,\log \mathbb E_\rho g,$$
for suitable measurable functions $f$ and $g$, with $g$ nonnegative. When there is no confusion, we often omit the brackets and parentheses in these expressions. If $X$ is a random variable with law $\mu$, we also write $\mathbb E f(X) = \mathbb E_\mu f$ and similarly for var and ent.
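For concreteness, both functionals are easy to evaluate on a finite space. The following minimal Python sketch (the measure and test function are arbitrary illustrative values, not taken from the text) computes $\operatorname{var}_\rho(f)$ and $\operatorname{ent}_\rho(g)$ and checks two basic properties used implicitly below: nonnegativity of the entropy and its homogeneity $\operatorname{ent}_\rho(cg) = c\,\operatorname{ent}_\rho(g)$ for $c > 0$.

```python
import numpy as np

# A discrete probability measure rho on a three-point space and a test
# function f; the values are arbitrary and serve only as an illustration.
rho = np.array([0.2, 0.5, 0.3])
f = np.array([1.0, 4.0, 2.0])

def var(rho, f):
    # var_rho(f) = E_rho[(f - E_rho[f])^2]
    m = rho @ f
    return rho @ (f - m) ** 2

def ent(rho, g):
    # ent_rho(g) = E_rho[g log g] - E_rho[g] log E_rho[g], for g >= 0
    m = rho @ g
    return rho @ (g * np.log(g)) - m * np.log(m)

print(var(rho, f))  # ≈ 1.56
# ent is nonnegative (Jensen) and 1-homogeneous: ent(c g) = c ent(g)
print(ent(rho, f) >= 0, np.isclose(ent(rho, 3 * f), 3 * ent(rho, f)))
```

The homogeneity is what allows the normalization $\mathbb E_\pi f = 1$ in the proof of Lemma 1 below.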
We say that $\rho$ satisfies a Poincaré inequality (PI) if there is a constant $C$ such that
$$\operatorname{var}_\rho(f) \le C\,\mathbb E_\rho \Gamma(f), \quad \forall f \in \mathcal A. \qquad \text{(PI)}$$
The optimal constant in this inequality is denoted $C_{\mathrm P}(\rho)$. In addition, $\rho$ is said to satisfy a logarithmic Sobolev inequality (LSI) if there is a constant $C$ such that
$$\operatorname{ent}_\rho(f^2) \le 2C\,\mathbb E_\rho \Gamma(f), \quad \forall f \in \mathcal A. \qquad \text{(LSI)}$$
Similarly, we let $C_{\mathrm{LS}}(\rho)$ denote the optimal constant in this inequality. For probability measures $\rho_0$ and $\rho_1$ on $(\mathcal Y, \mathcal B_{\mathcal Y})$, the Kullback–Leibler (KL) divergence and the chi-squared divergence are defined as
$$D_{\mathrm{KL}}(\rho_1 \| \rho_0) := \operatorname{ent}_{\rho_0}\Bigl(\frac{d\rho_1}{d\rho_0}\Bigr) = \int_{\mathcal Y} \frac{d\rho_1}{d\rho_0}\,\ln\frac{d\rho_1}{d\rho_0}\,d\rho_0 = \int_{\mathcal Y} \Bigl(\ln\frac{d\rho_1}{d\rho_0}\Bigr)\,d\rho_1,$$
$$\chi^2(\rho_1 \| \rho_0) := \operatorname{var}_{\rho_0}\Bigl(\frac{d\rho_1}{d\rho_0}\Bigr) = \int_{\mathcal Y} \Bigl(\frac{d\rho_1}{d\rho_0} - 1\Bigr)^2\,d\rho_0 = \int_{\mathcal Y} \frac{d\rho_1}{d\rho_0}\,d\rho_1 - 1.$$
The expressions above are understood to be $+\infty$ if $\rho_1$ is not absolutely continuous w.r.t. $\rho_0$.

3. Main theorem
In addition to $(\mathcal Y, \mathcal B_{\mathcal Y})$, let $\mathcal X$ be a Polish space with Borel $\sigma$-algebra $\mathcal B_{\mathcal X}$. We consider a Markov kernel $P : \mathcal X \times \mathcal B_{\mathcal Y} \to [0,1]$ satisfying: (1) for each $x \in \mathcal X$, $P(x,\cdot)$ is a probability measure on $(\mathcal Y, \mathcal B_{\mathcal Y})$; and (2) for each $B \in \mathcal B_{\mathcal Y}$, $P(\cdot,B)$ is a $\mathcal B_{\mathcal X}$-measurable function on $\mathcal X$. We also write $P_x := P(x,\cdot)$ for convenience. This kernel naturally induces a transition map which maps bounded measurable functions on $\mathcal Y$ to bounded measurable functions on $\mathcal X$:
$$Pf(x) := \int_{\mathcal Y} f\,dP_x, \quad \forall x \in \mathcal X.$$
For a probability measure $\mu$ on $(\mathcal X, \mathcal B_{\mathcal X})$, we denote by $\mu P$ the probability measure on $(\mathcal Y, \mathcal B_{\mathcal Y})$ defined by the duality
$$\int_{\mathcal Y} f\,d(\mu P) = \int_{\mathcal X} Pf\,d\mu.$$
Lastly, we introduce the following quantities:
$$K_{\mathrm P}(P;\mu) := \operatorname*{ess\,sup}_{\mu\text{-a.e. } x \in \mathcal X} C_{\mathrm P}(P_x), \qquad (2)$$
$$K_{\mathrm{LS}}(P;\mu) := \operatorname*{ess\,sup}_{\mu\text{-a.e. } x \in \mathcal X} C_{\mathrm{LS}}(P_x), \qquad (3)$$
$$K_{p,\chi^2}(P;\mu) := \mathbb E\bigl[\bigl(1 + \chi^2(P_X \| P_{X'})\bigr)^p\bigr]^{1/p}, \qquad (4)$$
for $p \ge 1$, where $X$ and $X'$ are i.i.d. with law $\mu$. Since (LSI) implies (PI) with the same constant, we have $K_{\mathrm P}(P;\mu) \le K_{\mathrm{LS}}(P;\mu)$. Throughout, for $p > 1$, we set $p^* = \frac{p}{p-1}$ to be the dual exponent.

Theorem 1. (1) If $K_{\mathrm P}(P;\mu)$ and $K_{p,\chi^2}(P;\mu)$ are finite for some $p > 1$, then $\mu P$ satisfies (PI) with constant
$$C_{\mathrm P}(\mu P) \le K_{\mathrm P}\,\bigl\{p^* + K_{p,\chi^2}^{p^*}\bigr\},$$
where $K_{\mathrm P} = K_{\mathrm P}(P;\mu)$ and $K_{p,\chi^2} = K_{p,\chi^2}(P;\mu)$.
(2) If $K_{\mathrm{LS}}(P;\mu)$ and $K_{p,\chi^2}(P;\mu)$ are finite for some $p > 1$, then $\mu P$ satisfies (LSI) with constant
$$C_{\mathrm{LS}}(\mu P) \le 2\,K_{\mathrm{LS}}\,\bigl(p^* + K_{p,\chi^2}^{p^*}\bigr)\,\bigl(1 + \log K_{p,\chi^2}^{p^*}\bigr),$$
where $K_{\mathrm{LS}} = K_{\mathrm{LS}}(P;\mu)$ and $K_{p,\chi^2} = K_{p,\chi^2}(P;\mu)$.

Remark 1. Our theorem is stated with a simpler constant for readability. A slightly sharper constant can be read off from the proof. Our results clearly extend to the case $p = \infty$ ($p^* = 1$) with
$$K_{\infty,\chi^2}(P;\mu) := 1 + \operatorname*{ess\,sup}_{\mu\text{-a.e. } x,x' \in \mathcal X} \chi^2(P_x \| P_{x'}).$$

For both parts, our starting point is to apply classical decompositions for the variance and the entropy, which have been used to prove functional inequalities for spin systems (see e.g. the appendix of [Wei04]). If $X$ is a random variable drawn according to $\mu$, then
$$\operatorname{var}_{\mu P} f = \mathbb E\operatorname{var}_{P_X} f + \operatorname{var}(\mathbb E_{P_X} f), \qquad (5)$$
$$\operatorname{ent}_{\mu P} f = \mathbb E\operatorname{ent}_{P_X} f + \operatorname{ent}(\mathbb E_{P_X} f). \qquad (6)$$
In both of these decompositions, the first term is easy to handle because we can apply the PI, resp. LSI, for the family $(P_x)_{x\in\mathcal X}$ inside the expectation. The crux of the proof is therefore the second terms.

Proof of Theorem 1 (1). In the case $p = \infty$ (i.e., the pairwise chi-squared divergences are uniformly bounded), the Poincaré inequality can be proven via a straightforward generalization of [Bar+18]. However, the case $1 < p < \infty$ requires non-trivial modifications, and we present a complete proof.

Let $X$ be a random variable with law $\mu$. As described above, we use the decomposition (5), and we focus on the problematic second term
$$\operatorname{var}(\mathbb E_{P_X} f) = \mathbb E\bigl[\,|\mathbb E_{P_X} f - \mathbb E_{\mu P} f|^2\,\bigr].$$
We can write
$$\mathbb E_{P_X} f - \mathbb E_{\mu P} f = \int f\,\Bigl(1 - \frac{d(\mu P)}{dP_X}\Bigr)\,dP_X = -\int f\,\Bigl(1 - \frac{dP_X}{d(\mu P)}\Bigr)\,d(\mu P).$$
For brevity, we write $\chi^2_{\rho,\rho'} := \chi^2(\rho \| \rho')$. Applying the Cauchy–Schwarz inequality to the above display (the factors in parentheses have mean zero, so $f$ may first be recentered by any constant), we have
$$\operatorname{var}(\mathbb E_{P_X} f) \le \mathbb E\min\bigl\{(\operatorname{var}_{\mu P} f)\,\chi^2_{P_X,\mu P},\ (\operatorname{var}_{P_X} f)\,\chi^2_{\mu P,P_X}\bigr\} \le \mathbb E\bigl[(\operatorname{var}_{\mu P} f)^{1/p}\,(\chi^2_{P_X,\mu P})^{1/p}\,(\operatorname{var}_{P_X} f)^{1/p^*}\,(\chi^2_{\mu P,P_X})^{1/p^*}\bigr].$$
Then, Young's inequality implies that for all $\lambda > 0$,
$$\operatorname{var}(\mathbb E_{P_X} f) \le \frac{\lambda^p}{p}\,(\operatorname{var}_{\mu P} f)\,\mathbb E\bigl[\chi^2_{P_X,\mu P}\,(\chi^2_{\mu P,P_X})^{p-1}\bigr] + \frac{\lambda^{-p^*}}{p^*}\,\mathbb E\operatorname{var}_{P_X} f.$$
Setting
$$\lambda = \mathbb E\bigl[\chi^2_{P_X,\mu P}\,(\chi^2_{\mu P,P_X})^{p-1}\bigr]^{-1/p}$$
and substituting the above into (5) yields
$$\operatorname{var}_{\mu P} f \le \bigl\{p^* + \mathbb E\bigl[\chi^2_{P_X,\mu P}\,(\chi^2_{\mu P,P_X})^{p-1}\bigr]^{p^*/p}\bigr\}\,\mathbb E\operatorname{var}_{P_X} f.$$
Using Hölder's inequality and the convexity of the chi-squared divergence in each of its arguments, we can see that
$$\mathbb E\bigl[\chi^2_{P_X,\mu P}\,(\chi^2_{\mu P,P_X})^{p-1}\bigr] \le \mathbb E\bigl[\bigl(1 + \chi^2(P_X \| P_{X'})\bigr)^p\bigr],$$
where $X'$ is an i.i.d. copy of $X$. The desired result follows from the definitions of $K_{\mathrm P}(P;\mu)$ in (2) and $K_{p,\chi^2}(P;\mu)$ in (4). □

To prove the second assertion in Theorem 1, we derive a so-called defective LSI for $\mu P$, which can be tightened to yield a full LSI. In order to control the second term in (6), we need a lemma.

Lemma 1.
Let $\pi$ and $\rho$ be two probability measures. Then, the following holds for every nonnegative function $f$:
$$\mathbb E_\pi f\,\log\frac{\mathbb E_\pi f}{\mathbb E_\rho f} \le \operatorname{ent}_\pi(f) + \mathbb E_\pi(f)\,\log\bigl(1 + \chi^2(\pi \| \rho)\bigr),$$
where by convention both sides vanish if $\mathbb E_\pi f = 0$.

Proof. Recall the Donsker–Varadhan theorem (see [RS15, Theorem 5.4] or [DZ10, Lemma 6.2.13]): for any probability measures $\mu$ and $\nu$, it holds that
$$D_{\mathrm{KL}}(\mu \| \nu) = \sup_g\,\{\mathbb E_\mu g - \log \mathbb E_\nu \exp(g)\}, \qquad (7)$$
where the supremum is taken over all $g$ for which the expectations on the right side make sense.

We may assume that $\pi$ is absolutely continuous with respect to $\rho$ and that $\mathbb E_\pi(f \log f) < \infty$; otherwise, the expression on the right side is infinite. We may therefore assume that $0 < \mathbb E_\pi f < \infty$, and, since each term in the lemma statement is homogeneous in $f$, we may assume without loss of generality that $\mathbb E_\pi f = 1$.

Define a new probability measure $\pi_f$ by $\frac{d\pi_f}{d\pi} = f$. Then,
$$\mathbb E_\pi\Bigl[f \log\frac{f}{\mathbb E_\rho f}\Bigr] = \mathbb E_{\pi_f}\Bigl[\log\frac{f}{\mathbb E_\rho f}\Bigr] \le D_{\mathrm{KL}}(\pi_f \| \rho) + \log \mathbb E_\rho \exp\Bigl(\log\frac{f}{\mathbb E_\rho f}\Bigr) = D_{\mathrm{KL}}(\pi_f \| \rho),$$
where we have used (7). Since
$$D_{\mathrm{KL}}(\pi_f \| \rho) = \mathbb E_\pi\Bigl[f \log\Bigl(f\,\frac{d\pi}{d\rho}\Bigr)\Bigr],$$
subtracting $\mathbb E_\pi(f \log f)$ from both sides of the inequality above and recalling that we have assumed that $\mathbb E_\pi f = 1$ yields
$$\mathbb E_\pi f\,\log\frac{\mathbb E_\pi f}{\mathbb E_\rho f} \le \mathbb E_\pi\Bigl[f \log\frac{d\pi}{d\rho}\Bigr].$$
Continuing, we have again by (7) that
$$\mathbb E_\pi\Bigl[f \log\frac{d\pi}{d\rho}\Bigr] = \mathbb E_{\pi_f}\Bigl[\log\frac{d\pi}{d\rho}\Bigr] \le D_{\mathrm{KL}}(\pi_f \| \pi) + \log \mathbb E_\pi \exp\Bigl(\log\frac{d\pi}{d\rho}\Bigr) = \mathbb E_\pi(f \log f) + \log\bigl(1 + \chi^2(\pi \| \rho)\bigr),$$
where in the last step we used $D_{\mathrm{KL}}(\pi_f \| \pi) = \mathbb E_\pi(f \log f)$ and $\mathbb E_\pi \frac{d\pi}{d\rho} = \int \bigl(\frac{d\pi}{d\rho}\bigr)^2\,d\rho = 1 + \chi^2(\pi \| \rho)$. Since $\operatorname{ent}_\pi(f) = \mathbb E_\pi(f \log f)$ under the normalization $\mathbb E_\pi f = 1$, this is the claim. □

Proof of Theorem 1 (2). Let
$X, X'$ be i.i.d. copies with law $\mu$. The second term $\operatorname{ent}(\mathbb E_{P_X}(f))$ in (6) can be written as
$$\operatorname{ent}(\mathbb E_{P_X}(f)) = \mathbb E\Bigl[\mathbb E_{P_X}(f)\,\log\frac{\mathbb E_{P_X}(f)}{\mathbb E_{\mu P}(f)}\Bigr].$$
Setting $\pi = P_X$ and $\rho = \mu P$ in Lemma 1, and using the convexity of the chi-squared divergence in its second argument to bound $\chi^2(P_X \| \mu P) \le \mathbb E[\chi^2(P_X \| P_{X'}) \mid X]$, we obtain
$$\operatorname{ent}(\mathbb E_{P_X}(f)) \le \mathbb E\operatorname{ent}_{P_X}(f) + \mathbb E\bigl[\mathbb E_{P_X}(f)\,\log\bigl(1 + \mathbb E[\chi^2(P_X \| P_{X'}) \mid X]\bigr)\bigr]. \qquad (8)$$
The definition of $K_{p,\chi^2}(P;\mu)$ in (4), together with Jensen's inequality, ensures that
$$\mathbb E\exp\Bigl(p \log\bigl(1 + \mathbb E[\chi^2(P_X \| P_{X'}) \mid X]\bigr) - p \log K_{p,\chi^2}(P;\mu)\Bigr) \le 1.$$
Using the variational principle for the entropy [Han16, Lemma 3.15],
$$\operatorname{ent} Y = \sup\,\{\mathbb E(YZ) : Z \text{ is a random variable with } \mathbb E\exp Z \le 1\},$$
we obtain
$$\mathbb E\bigl[\mathbb E_{P_X}(f)\,\log\bigl(1 + \mathbb E[\chi^2(P_X \| P_{X'}) \mid X]\bigr)\bigr] \le \frac1p\,\operatorname{ent}(\mathbb E_{P_X}(f)) + \log K_{p,\chi^2}(P;\mu)\;\mathbb E_{\mu P}(f).$$
Substituting this into (8) and rearranging yields
$$\operatorname{ent}(\mathbb E_{P_X}(f)) \le p^*\,\bigl\{\mathbb E\operatorname{ent}_{P_X}(f) + \log K_{p,\chi^2}(P;\mu)\;\mathbb E_{\mu P}(f)\bigr\}.$$
We insert this into (6) to obtain, for every nonnegative $f$,
$$\operatorname{ent}_{\mu P}(f) \le (p^* + 1)\,\mathbb E\operatorname{ent}_{P_X}(f) + p^* \log K_{p,\chi^2}(P;\mu)\;\mathbb E_{\mu P}(f).$$
Applying this bound with $f$ replaced by $f^2$ and using (LSI) for each $P_x$ gives
$$\operatorname{ent}_{\mu P}(f^2) \le 2\,(p^* + 1)\,K_{\mathrm{LS}}(P;\mu)\,\mathbb E_{\mu P}\Gamma(f) + p^* \log K_{p,\chi^2}(P;\mu)\;\mathbb E_{\mu P}(f^2).$$
This inequality is known as a defective LSI (see [BGL14]). It remains to remove the defect by tightening the LSI, and we refer to Appendix A for details. Together with the PI in the first assertion of Theorem 1, this completes the proof. □

4. Applications
4.1. Gaussian convolutions.
Set $\mathcal Y = \mathbb R^d$, $\mathcal A = C_b^\infty(\mathbb R^d)$ (infinitely differentiable functions with bounded derivatives), and $\Gamma(f) = \|\nabla f\|^2$. Let $\mu$ be a probability measure supported on $B(0,R) := \{x \in \mathbb R^d : \|x\| \le R\}$, and for $x \in \mathbb R^d$ and $t > 0$, let
$$\gamma_{x,t}(y) = \frac{1}{(2\pi t)^{d/2}}\,e^{-\|y-x\|^2/(2t)}$$
be the Gaussian with mean $x$ and covariance $tI_d$. If we take $P_x = \gamma_{x,t}$, then the measure $\mu P$ is the convolution $\mu * \gamma_{0,t}$.

Functional inequalities for the measure $\mu * \gamma_{0,t}$ were studied in [Zim13; Zim16], and further investigated in [WW+16; Bar+18]. In particular, [Bar+18] proves that $C_{\mathrm P}(\mu * \gamma_{0,t})$ is bounded above by a function of $R$ and $t$, and is therefore dimension-free. For the log-Sobolev constant, these works also show that $C_{\mathrm{LS}}(\mu * \gamma_{0,t})$ is finite, but the precise dependence of this constant (in particular on the dimension) was previously unknown. Bardet et al. [Bar+18] verify in several cases that $C_{\mathrm{LS}}(\mu * \gamma_{0,t})$ is dimension-free, and they conjecture that this is true in general. We now show that their conjecture is an immediate consequence of Theorem 1.

It is well-known that $\gamma_{x,t}$ satisfies (LSI) with $C_{\mathrm{LS}}(\gamma_{x,t}) = t$. Also, for $x, x' \in \mathbb R^d$ and $t > 0$, a straightforward computation shows that
$$\chi^2(\gamma_{x,t} \| \gamma_{x',t}) = e^{\|x-x'\|^2/t} - 1.$$
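This closed form is convenient to sanity-check numerically; a one-dimensional quadrature sketch (illustrative parameter values, assuming nothing beyond the display above):

```python
import numpy as np

# Numerical check of chi^2(gamma_{x,t} || gamma_{x',t}) = exp(|x - x'|^2 / t) - 1
# in dimension one, for illustrative values of x, x', t.
x, xp, t = 0.5, -0.3, 1.0

def gauss(y, m):
    # density of N(m, t) at y
    return np.exp(-(y - m) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

dy = 1e-3
y = np.arange(-20.0, 20.0, dy)
# chi^2(P || Q) = (integral of P^2 / Q) - 1
chi2_num = np.sum(gauss(y, x) ** 2 / gauss(y, xp)) * dy - 1.0
chi2_formula = np.exp((x - xp) ** 2 / t) - 1.0
print(chi2_num, chi2_formula)  # both ≈ 0.8965
```

The same identity underlies the bound on $K_{\infty,\chi^2}$ below, since $\|x - x'\| \le 2R$ on the support of $\mu$.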
Hence, $K_{\infty,\chi^2}(P;\mu) \le e^{4R^2/t}$, and we deduce the following result.

Corollary 1.
Let $\mu$ be a probability measure on $\mathbb R^d$ supported on $B(0,R)$ for some $R \ge 0$. Then, for each $t > 0$, $\mu * \gamma_{0,t}$ satisfies (LSI) with
$$C_{\mathrm{LS}}(\mu * \gamma_{0,t}) \le 16\,(R^2 + t)\,e^{4R^2/t}.$$

Bardet et al. also prove that $\mu * \gamma_{0,t}$ satisfies a $T_2$ transport-entropy inequality with a dimension-dependent constant; see [Vil03] for background on transport-entropy inequalities. Since (LSI) implies the $T_2$ inequality with the same constant [OV00], we immediately obtain the following improvement.

Corollary 2.
Let $\mu$ be a probability measure on $\mathbb R^d$ supported on $B(0,R)$ for some $R \ge 0$. Then, for each $t > 0$, $\mu * \gamma_{0,t}$ satisfies a $T_2$ transport-entropy inequality with constant
$$C_{T_2}(\mu * \gamma_{0,t}) \le 16\,(R^2 + t)\,e^{4R^2/t}.$$

Remark 2. These results show that evolving a compactly supported measure for a short time under the heat flow yields dimension-free functional inequalities, which can be interpreted as a strong regularizing effect of the heat flow. This is in line with other results on the smoothing behavior of the heat flow, e.g. [EL18].
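The quantities entering Theorem 1 are also easy to estimate by simulation. As a hedged illustration (the two-point mixing law and all numerical values below are hypothetical), the following Python sketch estimates $K_{p,\chi^2}(P;\mu)$ for the Gaussian family via the identity $1 + \chi^2(\gamma_{x,t} \| \gamma_{x',t}) = e^{\|x-x'\|^2/t}$ and compares the estimate with the exact value:

```python
import numpy as np

rng = np.random.default_rng(0)
R, t, p = 1.0, 4.0, 2.0
n = 200_000

# X, X' i.i.d. from mu = uniform{-R, +R}; for the Gaussian family P_x = N(x, t),
# 1 + chi^2(P_X || P_X') = exp(|X - X'|^2 / t).
X = rng.choice([-R, R], size=n)
Xp = rng.choice([-R, R], size=n)
K_mc = np.mean(np.exp(p * (X - Xp) ** 2 / t)) ** (1 / p)

# Exact value: |X - X'|^2 equals 0 or 4R^2, each with probability 1/2.
K_exact = (0.5 * (1 + np.exp(4 * p * R**2 / t))) ** (1 / p)
print(K_mc, K_exact)  # the two agree to a few decimal places
```

For measures $\mu$ given only through samples, the same Monte Carlo average provides a practical way to check whether the moment condition of Theorem 1 holds for a given $p$.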
Remark 3. As $t \to \infty$, Corollary 1 implies that
$$\limsup_{t\to\infty} \frac{C_{\mathrm{LS}}(\mu * \gamma_{0,t})}{t} \le 16.$$
It is easy to improve this to 1, which is sharp. Indeed, from the subadditivity of the log-Sobolev constant under convolution, for $t \ge R^2$,
$$C_{\mathrm{LS}}(\mu * \gamma_{0,t}) \le C_{\mathrm{LS}}(\mu * \gamma_{0,R^2}) + C_{\mathrm{LS}}(\gamma_{0,t-R^2}) \le t + 130\,R^2.$$

On the other hand, as $t \searrow 0$, the exponential dependence on $R^2/t$ cannot be avoided, as a simple example shows. Indeed, consider the measure $\mu = \frac12 \delta_{-R} + \frac12 \delta_R$ in one dimension and $0 < t \ll R^2$. Define the function $f : \mathbb R \to [-1,1]$ via
$$f(x) := \begin{cases} -1 & \text{for } x < -R/2, \\ +1 & \text{for } x > +R/2, \\ \text{linear interpolation} & \text{in between}. \end{cases}$$
Let $g$ denote a standard Gaussian variable. Then, $\mathbb E_{\mu*\gamma_{0,t}} f = 0$, so
$$\operatorname{var}_{\mu*\gamma_{0,t}} f = \mathbb E_{\mu*\gamma_{0,t}}(f^2) \ge \frac12\,\mathbb P\bigl\{-R + \sqrt t\,g \le -R/2\bigr\} + \frac12\,\mathbb P\bigl\{R + \sqrt t\,g \ge R/2\bigr\} = \mathbb P\Bigl\{g \le \frac{R}{2\sqrt t}\Bigr\} \ge \frac12.$$
On the other hand, $|f'| = 2/R$ on $[-R/2, R/2]$ and vanishes elsewhere, so
$$\mathbb E_{\mu*\gamma_{0,t}}(|f'|^2) \le \frac{4}{R^2}\,\mathbb P\Bigl\{g \ge \frac{R}{2\sqrt t}\Bigr\} \le \frac{2}{R^2}\,\exp\Bigl(-\frac{R^2}{8t}\Bigr),$$
by standard Gaussian tail bounds. This yields the following lower bound on the Poincaré constant of $\mu * \gamma_{0,t}$:
$$C_{\mathrm P}(\mu * \gamma_{0,t}) \ge \frac{R^2}{4}\,\exp\frac{R^2}{8t}.$$
Hence, the exponential dependence on $R^2/t$ is already present in the Poincaré constant. However, it is worth noting that the $\exp(4R^2/t)$ dependence in the log-Sobolev constant enters only via the Poincaré constant through the method of tightening a defective log-Sobolev inequality. In particular, if $\mu$ is known a priori to satisfy a Poincaré inequality with constant $C_{\mathrm P}(\mu)$, then $\mu * \gamma_{0,t}$ satisfies a Poincaré inequality with constant $C_{\mathrm P}(\mu * \gamma_{0,t}) \le C_{\mathrm P}(\mu) + t$, and the log-Sobolev inequality no longer suffers an explicit exponential dependence on $R^2/t$.

4.2. Extension to sub-Gaussian tails.
Consider the setting of the previous section. However, we now relax the assumption that $\mu$ has bounded support, and instead assume that $\mu$ has sub-Gaussian tails. More specifically, assume that there exist constants $\sigma^2, C_{\mathrm{SG}}$ such that
$$\iint \exp\Bigl(\frac{\|x-x'\|^2}{\sigma^2}\Bigr)\,d\mu(x)\,d\mu(x') \le C_{\mathrm{SG}}. \qquad (9)$$
Since a log-Sobolev inequality implies sub-Gaussian tails [BGL14], conditions of the form (9) involving $\sigma^2, C_{\mathrm{SG}}$ are certainly necessary in order for $\mu * \gamma_{0,t}$ to satisfy (LSI). We will show that if $t$ is a large enough multiple of $\sigma^2$, then we indeed obtain a log-Sobolev constant for $\mu * \gamma_{0,t}$, and we will explicitly estimate the constant.

The main point is to estimate, for $X, X'$ i.i.d. from $\mu$,
$$\mathbb E\bigl[\{1 + \chi^2(\gamma_{X,t} \| \gamma_{X',t})\}^p\bigr] = \iint \exp\Bigl(\frac{p\,\|x-x'\|^2}{t}\Bigr)\,d\mu(x)\,d\mu(x') \le C_{\mathrm{SG}},$$
provided that $t/p \ge \sigma^2$; then, $K_{p,\chi^2}(P;\mu)^{p^*} \le C_{\mathrm{SG}}^{p^*/p}$. We therefore take $p = t/\sigma^2$ and we obtain as an immediate consequence of Theorem 1 the following result.
Theorem 2.
Suppose $\mu$ is a probability measure on $\mathbb R^d$ satisfying (9) and that $t > \sigma^2$. Then, $\mu * \gamma_{0,t}$ satisfies both (PI) and (LSI), with
$$C_{\mathrm P}(\mu * \gamma_{0,t}) \le t\,\Bigl\{\frac{t}{t-\sigma^2} + C_{\mathrm{SG}}^{\sigma^2/(t-\sigma^2)}\Bigr\},$$
and
$$C_{\mathrm{LS}}(\mu * \gamma_{0,t}) \le 2t\,\Bigl\{\frac{t}{t-\sigma^2} + C_{\mathrm{SG}}^{\sigma^2/(t-\sigma^2)}\Bigr\}\,\Bigl\{1 + \frac{\sigma^2}{t-\sigma^2}\,\log C_{\mathrm{SG}}\Bigr\}.$$

Remark 4. The result of Theorem 2 recovers the result of Corollary 1, albeit with worse constants. Indeed, if $\mu$ has support contained in the ball $B(0,R)$ and $t > 0$, then we can take $\sigma^2 = t/2$ and
$$C_{\mathrm{SG}} = \iint \exp\Bigl(\frac{2\,\|x-x'\|^2}{t}\Bigr)\,d\mu(x)\,d\mu(x') \le \exp\frac{8R^2}{t}.$$
Then, Theorem 2 yields a log-Sobolev inequality for $\mu * \gamma_{0,t}$ with a similar dependence as Corollary 1.

Remark 5. The sub-Gaussian tail condition (9) is equivalent to $\mu$ satisfying a $T_1$ transportation-cost inequality [BV05]. Hence, our result shows that sufficient Gaussian smoothing upgrades a $T_1$ inequality to a log-Sobolev inequality. Note that the condition $t > \sigma^2$ is similar to the condition in [WW+16, Theorem 1.2].

Remark 6. As in Remark 3, the Poincaré and log-Sobolev constants here can easily be improved as $t \to \infty$ to improve the constant factor in front of $t$ to 1.

4.3. General diffusions.
We now consider a different extension of the setting in Section 4.1. Let $(P_t)_{t\ge0}$ be a Markov semigroup on $(\mathcal Y, \mathcal B_{\mathcal Y})$ with invariant measure $\pi$ and infinitesimal generator $L$. Let $\mathcal A$ be an algebra of bounded measurable functions such that $\mathcal A$ is dense in $L^2(\mathcal Y, \pi)$; $\mathcal A$ is contained in the domain of $L$; and the carré du champ operator $\Gamma : \mathcal A \times \mathcal A \to \mathcal A$ given by
$$\Gamma(f,g) = \frac12\,\bigl(L(fg) - f\,Lg - g\,Lf\bigr)$$
is well defined for $f, g \in \mathcal A$. We assume these objects satisfy the conditions specified in [BGL14]. For $\kappa \in \mathbb R$ and $t \ge 0$, we set
$$C_{\mathrm{loc}}(\kappa, t) := \begin{cases} (1 - e^{-2\kappa t})/\kappa, & \kappa \ne 0, \\ 2t, & \kappa = 0. \end{cases}$$
We recall the following result ([BGL14, Theorem 5.5.2]).
Lemma 2.
For every $\kappa \in \mathbb R$, the following statements are equivalent.
(1) The curvature-dimension condition $\mathrm{CD}(\kappa,\infty)$ holds.
(2) For all $x \in \mathcal Y$ and $t \ge 0$, $C_{\mathrm{LS}}(P_t(x,\cdot)) \le C_{\mathrm{loc}}(\kappa, t)$.

The following result is then a special case of Theorem 1.
Corollary 3.
Suppose that the curvature-dimension condition
$\mathrm{CD}(\kappa,\infty)$ holds for some $\kappa \in \mathbb R$. Let $\mu$ be a probability measure on $(\mathcal Y, \mathcal B_{\mathcal Y})$. Then, for every $t \ge 0$,
$$C_{\mathrm{LS}}(\mu P_t) \le 2\,C_{\mathrm{loc}}(\kappa, t)\,\bigl(1 + K_{\infty,\chi^2}(P_t;\mu)\bigr)\,\bigl(1 + \log K_{\infty,\chi^2}(P_t;\mu)\bigr).$$

Remark 7. If $\mathcal Y$ is a complete connected Riemannian manifold and the diffusion has generator $L = \Delta + \langle\nabla V, \nabla\cdot\rangle$ which satisfies the curvature-dimension condition, then under mild conditions the constant $K_{\infty,\chi^2}(P_t;\mu)$ is finite for any measure $\mu$ with bounded support, as a consequence of heat kernel estimates in [GW01].

4.4. Mixtures of two distributions.
In this section, we consider the case when $\mathcal X = \{0,1\}$ is the two-point space. Then, the mixing distribution $\mu$ is a Bernoulli distribution with a mixing weight $p \in [0,1]$, and $\mu P$ is the convex combination
$$\mu P = (1-p)\,P_0 + p\,P_1. \qquad (10)$$
Functional inequalities for such mixtures were studied in [CM10; Sch19]. One of the interesting findings of these papers is that as the mixing weight $p$ tends to $\{0,1\}$, the Poincaré constant can remain bounded whereas the log-Sobolev constant diverges logarithmically. Specifically, [Sch19] shows that if $P_0$ and $P_1$ satisfy (LSI), $p \in (0,1)$, and either $\chi^2(P_0\|P_1)$ or $\chi^2(P_1\|P_0)$ is finite, then $\mu P$ satisfies (LSI). Note that this last assumption is weaker than ours, which requires both $\chi^2(P_0\|P_1)$ and $\chi^2(P_1\|P_0)$ to be finite. However, even under our stronger assumption, the bound of [Sch19] on the log-Sobolev constant diverges in general as $p \to \{0,1\}$. We now present our results for this setting for comparison.

Corollary 4.
For all $p \in [0,1]$, the mixture (10) satisfies (LSI) with
$$C_{\mathrm{LS}}(\mu P) \le 2\,\max\{C_{\mathrm{LS}}(P_0), C_{\mathrm{LS}}(P_1)\}\,(2 + K_{\chi^2})\,\bigl(1 + \log(1 + K_{\chi^2})\bigr),$$
where $K_{\chi^2} := \max\{\chi^2(P_0\|P_1), \chi^2(P_1\|P_0)\}$.

In particular, our assumption $K_{\chi^2} < \infty$ guarantees that the mixture satisfies (LSI) with a constant independent of $p$, and hence does not exhibit a logarithmic divergence as $p \to \{0,1\}$. We refer to the aforementioned papers for further discussion and examples of mixtures.
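The two-component setting is also a convenient place to sanity-check the variance decomposition (5) behind Theorem 1. A Monte Carlo sketch with hypothetical Gaussian components (illustrative values only):

```python
import numpy as np

# Check var_{mu P}(f) = E[var_{P_X}(f)] + var(E_{P_X} f) for the mixture
# (1 - p) N(m0, 1) + p N(m1, 1) and f(x) = x, in which case the decomposition
# reads var = 1 + p (1 - p) (m1 - m0)^2.  Parameters are illustrative.
rng = np.random.default_rng(1)
p, m0, m1, n = 0.3, 0.0, 1.5, 400_000

labels = rng.random(n) < p                # X ~ Bernoulli(p) picks the component
means = np.where(labels, m1, m0)          # E_{P_X} f is the chosen component mean
samples = means + rng.standard_normal(n)  # draws from the mixture mu P

lhs = samples.var()                       # var_{mu P}(f)
rhs = 1.0 + means.var()                   # E var_{P_X}(f) = 1, plus var(E_{P_X} f)
print(lhs, rhs)                           # both ≈ 1 + p(1-p)(m1-m0)^2 = 1.4725
```

Here $\mathbb E\operatorname{var}_{P_X}(f) = 1$ because both components have unit variance, and $\operatorname{var}(\mathbb E_{P_X} f) = p(1-p)(m_1 - m_0)^2$.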
4.5. Analogues on the hypercube.
We now present another interesting illustration of our results. Here, we take $\mathcal X = \{0,1\}^n$ to be the Boolean hypercube, and we take $\mathcal Y := \mathcal Y_0^n$ to be a product space. We also require the $\Gamma$ operator on $\mathcal Y$ to be consistent with the product structure; for simplicity of presentation, we omit this discussion and instead think of $\Gamma$ as being either the squared gradient operator $\Gamma(f) = \|\nabla f\|^2$ on Euclidean space, or the discrete gradient $\Gamma(f) = \sum_{i=1}^n (D_i f)^2$ as described in Section 2. Let $\pi_0, \pi_1$ be two probability measures on $\mathcal Y_0$ with
$$K_{\mathrm{LS}}(\pi) := \max\{C_{\mathrm{LS}}(\pi_0), C_{\mathrm{LS}}(\pi_1)\} < \infty, \qquad K_{\chi^2}(\pi) := \max\{\chi^2(\pi_0\|\pi_1), \chi^2(\pi_1\|\pi_0)\} < \infty.$$
Given $x \in \{0,1\}^n$, define the measure
$$P_x = \bigotimes_{i=1}^n \pi_{x_i}. \qquad (11)$$
From the tensorization of the chi-squared divergence,
$$\chi^2(P_x\|P_{x'}) = \prod_{i=1}^n \bigl\{1 + \chi^2(\pi_{x_i}\|\pi_{x'_i})\bigr\} - 1 \le \bigl\{1 + K_{\chi^2}(\pi)\bigr\}^{d(x,x')} - 1,$$
where $d(\cdot,\cdot)$ denotes the Hamming metric on $\{0,1\}^n$. Moreover, each $P_x$ satisfies (LSI) with a constant at most $K_{\mathrm{LS}}(\pi)$, due to the classical tensorization of log-Sobolev inequalities. As a consequence, we deduce the following result from Theorem 1.

Corollary 5.
Suppose $\mu$ is a probability measure on $\{0,1\}^n$ which is supported on a set of diameter at most $k$ in the Hamming metric. Then, the mixture distribution $\mu P := \sum_{x\in\{0,1\}^n} \mu(x)\,P_x$, with $P_x$ as in (11), satisfies (LSI) with
$$C_{\mathrm{LS}}(\mu P) \le 4k\,K_{\mathrm{LS}}(\pi)\,\bigl\{1 + K_{\chi^2}(\pi)\bigr\}^k\,\bigl(1 + \log\bigl(1 + K_{\chi^2}(\pi)\bigr)\bigr).$$

Importantly, the log-Sobolev inequality is dimension-free in the sense that it depends only on properties of $\pi_0$ and $\pi_1$ as well as the diameter $k$ of the support of $\mu$. An example of such a measure $\mu$ is any measure which is supported on a Hamming ball of radius $k/2$.

As a concrete example, let $0 < p < 1/2$, and take $\pi_0$ and $\pi_1$ to be the Bernoulli distributions with parameters $p$ and $1-p$ respectively. Also, we take the $\Gamma$ operator to be the square of the discrete gradient. The optimal log-Sobolev inequality for these distributions is given in [Han16]:
$$K_{\mathrm{LS}}(\pi) = \frac{p\,(1-p)}{2\,(1-2p)}\,\log\frac{1-p}{p}, \qquad K_{\chi^2}(\pi) = \frac{1-p}{p} + \frac{p}{1-p} - 2.$$
Note that the mixture $\mu P$ can be interpreted as the result of evolving the initial measure $\mu$ for a short time under the natural semigroup on the hypercube. We obtain the following result.

Corollary 6.
Suppose $\mu$ is a probability measure on $\{0,1\}^n$ which is supported on a set of diameter at most $k$ in the Hamming metric. Then, the mixture distribution $\mu P$ with $0 < p < 1/2$ satisfies (LSI) with
$$C_{\mathrm{LS}}(\mu P) \le \frac{2k\,(1-p)}{(1-2p)\,p^{k-1}}\,\log\frac1p\,\Bigl(1 + \log\frac1p\Bigr).$$

Acknowledgments
Sinho Chewi was supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program. Jonathan Niles-Weed was supported in part by National Science Foundation (NSF) grant DMS-2015291.
Appendix A. Tightening of LSI
The following proposition is a standard result; see [BGL14, Proposition 5.1.3]. It is straightforward to see that bilinearity of $\Gamma$ and our assumption (1) are sufficient for the proof to go through.
Proposition 1. (1) If $\rho$ satisfies (LSI), then $\rho$ satisfies (PI) with $C_{\mathrm P}(\rho) \le C_{\mathrm{LS}}(\rho)$.
(2) If $\rho$ satisfies the following defective LSI,
$$\operatorname{ent}_\rho(f^2) \le 2C\,\mathbb E_\rho \Gamma(f) + D\,\mathbb E_\rho(f^2), \quad \forall f \in \mathcal A,$$
together with (PI), then $\rho$ satisfies (LSI) with
$$C_{\mathrm{LS}}(\rho) \le C + C_{\mathrm P}(\rho)\,\Bigl(\frac{D}{2} + 1\Bigr).$$

References

[Bar+18] J.-B. Bardet et al. “Functional inequalities for Gaussian convolutions of compactly supported measures: explicit bounds and dimension dependence”. In: Bernoulli (2018).
[BGL14] D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry of Markov diffusion operators. Vol. 348. Grundlehren der Mathematischen Wissenschaften. Springer, Cham, 2014, pp. xx+552.
[Blo+20] A. Block et al. “Fast mixing of multi-scale Langevin dynamics under the manifold hypothesis”. In: arXiv:2006.11166 (June 2020).
[BMR20] A. Block, Y. Mroueh, and A. Rakhlin. “Generative modeling with denoising auto-encoders and Langevin sampling”. In: arXiv:2002.00107 (June 2020).
[BV05] F. Bolley and C. Villani. “Weighted Csiszár–Kullback–Pinsker inequalities and applications to transportation inequalities”. In: Ann. Fac. Sci. Toulouse Math. (6) (2005).
[Cha+19] P. Chaudhari et al. “Entropy-SGD: biasing gradient descent into wide valleys”. In: J. Stat. Mech. Theory Exp. 12 (2019), pp. 124018, 24.
[CM10] D. Chafaï and F. Malrieu. “On fine properties of mixtures with respect to concentration of measure and Sobolev type inequalities”. In: Ann. Inst. Henri Poincaré Probab. Stat. (2010).
[DZ10] A. Dembo and O. Zeitouni. Large deviations techniques and applications. Vol. 38. Stochastic Modelling and Applied Probability. Corrected reprint of the second (1998) edition. Springer-Verlag, Berlin, 2010, pp. xvi+396.
[EL18] R. Eldan and J. R. Lee. “Regularization under diffusion and anticoncentration of the information content”. In: Duke Math. J. (2018).
[GW01] F.-Z. Gong and F.-Y. Wang. “Heat kernel estimates with application to compactness of manifolds”. In: Q. J. Math. (2001).
[Han16] R. van Handel. Probability in high dimension. Lecture notes, 2016.
[Led00] M. Ledoux. “The geometry of Markov diffusion generators”. In: Ann. Fac. Sci. Toulouse Math. (6) 9.2 (2000), pp. 305–366.
[Mar99] F. Martinelli. “Lectures on Glauber dynamics for discrete spin models”. In: Lectures on probability theory and statistics (Saint-Flour, 1997). Vol. 1717. Lecture Notes in Math. Springer, Berlin, 1999, pp. 93–191.
[OV00] F. Otto and C. Villani. “Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality”. In: J. Funct. Anal. (2000).
[RS15] F. Rassoul-Agha and T. Seppäläinen. A course on large deviations with an introduction to Gibbs measures. Vol. 162. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2015, pp. xiv+318.
[Sch19] A. Schlichting. “Poincaré and log-Sobolev inequalities for mixtures”. In: Entropy (2019).
[Vil03] C. Villani. Topics in optimal transportation. Vol. 58. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003, pp. xvi+370.
[Wei04] D. Weitz. Mixing in time and space for discrete spin systems. Thesis (Ph.D.)–University of California, Berkeley. ProQuest LLC, Ann Arbor, MI, 2004, p. 175.
[WW+16] F.-Y. Wang, J. Wang, et al. “Functional inequalities for convolution probability measures”. In: Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 52.2 (2016), pp. 898–914.
[Zim13] D. Zimmermann. “Logarithmic Sobolev inequalities for mollified compactly supported measures”. In: J. Funct. Anal. (2013).
[Zim16] D. Zimmermann. “Elementary proof of logarithmic Sobolev inequalities for Gaussian convolutions on $\mathbb R$”. In: Ann. Math. Blaise Pascal (2016).

(Hong-Bin Chen)
Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
Email address: [email protected]

(Sinho Chewi) Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
Email address: [email protected]

(Jonathan Niles-Weed) Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
Email address: [email protected]