A Brief Note on the Convergence of Langevin Monte Carlo in Chi-Square Divergence
Murat A. Erdogdu∗   Rasa Hosseinzadeh†

July 31, 2020
Abstract
We study sampling from a target distribution $\nu^* \propto e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm when the target $\nu^*$ satisfies the Poincaré inequality and the potential $f$ is first-order smooth and dissipative. Under an opaque uniform warmness condition on the LMC iterates, we establish that $\widetilde{O}(\epsilon^{-1})$ steps are sufficient for LMC to reach an $\epsilon$-neighborhood of the target in Chi-square divergence. We hope that this note serves as a step towards establishing a complete convergence analysis of LMC under Chi-square divergence.

∗ Department of Computer Science and Department of Statistical Sciences at the University of Toronto, and Vector Institute, [email protected]
† Department of Computer Science at the University of Toronto, and Vector Institute, [email protected]

1 Introduction

We consider sampling from a target distribution $\nu^* \propto e^{-f}$ using the Langevin Monte Carlo (LMC) algorithm
$$x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\, W_k, \tag{1.1}$$
where $f : \mathbb{R}^d \to \mathbb{R}$ is the potential function, $W_k$ is a $d$-dimensional isotropic Gaussian random vector independent from $\{x_l\}_{l \le k}$, and $\eta$ is the step size. This algorithm is the Euler discretization of the following stochastic differential equation (SDE)
$$dz_t = -\nabla f(z_t)\, dt + \sqrt{2}\, dB_t, \tag{1.2}$$
where $B_t$ denotes the $d$-dimensional Brownian motion. The solution of the above SDE is referred to as the first-order Langevin diffusion, and the convergence behavior of the LMC algorithm (1.1) is intimately related to the properties of the diffusion process (1.2). Denoting the distribution of $z_t$ with $\rho_t$, the following Fokker–Planck equation describes the evolution of the above dynamics [Ris96]
$$\frac{\partial \rho_t(x)}{\partial t} = \nabla \cdot \big( \nabla f(x) \rho_t(x) \big) + \Delta \rho_t(x) = \nabla \cdot \Big( \rho_t(x) \nabla \log \frac{\rho_t(x)}{\nu^*(x)} \Big). \tag{1.3}$$
Convergence to the equilibrium of the above equation has been studied extensively under various assumptions and distance measures. Defining the Chi-square divergence and the KL divergence between two probability distributions $\rho$ and $\nu$ on $\mathbb{R}^d$ as
$$\chi^2(\rho \| \nu) = \int \Big( \frac{\rho(x)}{\nu(x)} \Big)^2 \nu(x)\, dx - 1 \quad \text{and} \quad \mathrm{KL}(\rho \| \nu) = \int \log \Big( \frac{\rho(x)}{\nu(x)} \Big) \rho(x)\, dx, \tag{1.4}$$
we say that $\nu$ satisfies the Poincaré inequality (PI) if the following holds for every probability density $\rho$,
$$\chi^2(\rho \| \nu) \le C_P \int_{\mathbb{R}^d} \Big\| \nabla \frac{\rho(x)}{\nu(x)} \Big\|^2 \nu(x)\, dx, \tag{PI}$$
where $C_P$ is termed the Poincaré constant. If $\nu^*$ satisfies (PI), it is straightforward to show that $d\chi^2(\rho_t \| \nu^*)/dt = -2 \int \| \nabla \frac{\rho_t(x)}{\nu^*(x)} \|^2 \nu^*(x)\, dx$; thus, PI implies the exponential convergence of the diffusion:
$$\frac{d}{dt} \chi^2(\rho_t \| \nu^*) \le -\frac{2}{C_P} \chi^2(\rho_t \| \nu^*) \implies \chi^2(\rho_t \| \nu^*) \le e^{-2t/C_P} \chi^2(\rho_0 \| \nu^*). \tag{1.5}$$
Under additional smoothness assumptions on the potential function, the convergence behavior of (1.3) to equilibrium can be translated to that of the LMC algorithm. In particular, the implications of LSI for the convergence of LMC are relatively well understood [Dal17b, DM17, VW19, EH20]; however, there is no convergence analysis of LMC in Chi-square divergence when the target satisfies PI. In this note, we aim to provide an analysis under a uniform warmness condition, which we verify for a simple Gaussian example. We hope that this serves as a step towards establishing a complete convergence analysis of LMC under Chi-square divergence.
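For concreteness, the recursion (1.1) is only a few lines of code. The following is a minimal sketch of ours (not part of the original note) for a user-supplied gradient `grad_f`; the Gaussian example, step size, and horizon below are illustrative placeholders.

```python
import numpy as np

def lmc(grad_f, x0, eta, n_steps, rng=None):
    """Unadjusted Langevin Monte Carlo, eq. (1.1):
    x_{k+1} = x_k - eta * grad_f(x_k) + sqrt(2 * eta) * W_k,
    with W_k a standard Gaussian vector. Returns the full trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.empty((n_steps + 1, x0.shape[0]))
    xs[0] = x0
    for k in range(n_steps):
        w = rng.standard_normal(x0.shape[0])
        xs[k + 1] = xs[k] - eta * grad_f(xs[k]) + np.sqrt(2.0 * eta) * w
    return xs

# Illustrative use: sample from nu* ∝ exp(-||x||^2 / 2), i.e. N(0, I_5).
traj = lmc(grad_f=lambda x: x, x0=np.zeros(5), eta=0.01, n_steps=10_000)
```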
Related work. Starting with the pioneering works [DM16, Dal17b, DM17], the non-asymptotic analysis of LMC has drawn a lot of interest [Dal17a, CB18, CCAY+18, DM19, DMM19, VW19, DK19, BDMS19, LWME19]. It is known that $\widetilde{O}(d\epsilon^{-1})$ steps of LMC yield an $\epsilon$-accurate sample in KL divergence for strongly convex and first-order smooth potentials [CB18, DMM19]. This is still the best-known rate in this setup, and it recovers the best-known rates in the total variation and 2-Wasserstein metrics [DM17, Dal17b, DM19]. Recently, these global curvature assumptions have been relaxed to growth conditions [CCAY+18, EMS18]. For example, [VW19] established convergence guarantees for LMC when sampling from target distributions that satisfy a log-Sobolev inequality (LSI) and have a smooth potential. This corresponds to potentials with quadratic tails [BÉ85, BG99] up to finite perturbations [HS87]; thus, this result is able to deal with non-convex potentials while achieving the same rate of $\widetilde{O}(d\epsilon^{-1})$ in KL divergence.
Notation. Throughout the note, $\log$ denotes the natural logarithm. For a real number $x \in \mathbb{R}$, we denote its absolute value with $|x|$. We denote the Euclidean norm of a vector $x \in \mathbb{R}^d$ with $\|x\|$. The gradient, divergence, and Laplacian of $f$ are denoted by $\nabla f(x)$, $\nabla \cdot f(x)$, and $\Delta f(x)$, respectively. We use $\mathbb{E}[x]$ to denote the expected value of a random variable or a vector $x$, where expectations are over all the randomness inside the brackets. For probability densities $p, q$ on $\mathbb{R}^d$, we use $\mathrm{KL}(p \| q)$ and $\chi^2(p \| q)$ to denote their KL divergence (or relative entropy) and Chi-square divergence, respectively, which are defined in (1.4). We denote the Borel $\sigma$-field of $\mathbb{R}^d$ with $\mathcal{B}(\mathbb{R}^d)$. The 2-Wasserstein distance and the total variation (TV) metric are defined respectively as
$$W_2(p, q) = \inf_\nu \Big( \int \|x - y\|^2\, d\nu(x, y) \Big)^{1/2} \quad \text{and} \quad \mathrm{TV}(p, q) = \sup_{A \in \mathcal{B}(\mathbb{R}^d)} \Big| \int_A \big( p(x) - q(x) \big)\, dx \Big|,$$
where in the first formula the infimum runs over the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ whose marginals have the densities $p$ and $q$. The multivariate Gaussian distribution with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ is denoted with $\mathcal{N}(\mu, \Sigma)$.
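Both divergences in (1.4) are straightforward to evaluate numerically in one dimension, which can be handy for checking relations such as $\mathrm{KL}(\rho \| \nu) \le \log(1 + \chi^2(\rho \| \nu)) \le \chi^2(\rho \| \nu)$ on toy densities. The sketch below is ours (the grid limits and the Gaussian test case are illustrative assumptions) and uses a plain Riemann sum.

```python
import numpy as np

def divergences(rho, nu, lo=-10.0, hi=10.0, n=200_001):
    """Chi-square and KL divergences of (1.4) for 1-d densities rho, nu,
    approximated by a Riemann sum on [lo, hi] (assumed to carry all mass)."""
    x, dx = np.linspace(lo, hi, n, retstep=True)
    r, v = rho(x), nu(x)
    chi_sq = np.sum(r**2 / v) * dx - 1.0   # integral of (rho/nu)^2 nu, minus 1
    kl = np.sum(r * np.log(r / v)) * dx    # integral of log(rho/nu) rho
    return chi_sq, kl

def gauss(m, s):
    return lambda x: np.exp(-((x - m) ** 2) / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

# For N(m, 1) vs N(0, 1): KL = m^2/2 and chi^2 = exp(m^2) - 1.
print(divergences(gauss(0.5, 1.0), gauss(0.0, 1.0)))  # approx (0.284, 0.125)
```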
2 Convergence Analysis under Uniform Warmness
Our first assumption is on the target: we study target distributions that satisfy the Poincaré inequality.
Assumption 1 (Poincaré inequality). The target $\nu^* \propto e^{-f}$ satisfies (PI) with Poincaré constant $C_P$.

PI can be seen as a linearization of LSI [OV00] and is sufficient to achieve exponential convergence of the diffusion in Chi-square divergence [CGL+20]. Moreover, distances in the TV and $W_2$ metrics can be bounded as
$$\mathrm{TV}(\rho, \nu^*) \le \sqrt{\mathrm{KL}(\rho \| \nu^*)/2} \le \sqrt{\chi^2(\rho \| \nu^*)/2} \quad \text{and} \quad W_2(\rho, \nu^*)^2 \le 2 C_P\, \chi^2(\rho \| \nu^*).$$
Above, the former inequality is due to [Tsy08, Lemma 2.7] together with the Csiszár–Kullback–Pinsker inequality, and the latter is shown to hold in, for example, [Liu20, Theorem 1.1]. Therefore, convergence in Chi-square divergence implies convergence in these measures of distance as well.

A representative analysis of LMC (see e.g. [VW19, EH20]) starts with the interpolation process
$$d\tilde{x}_{k,t} = -\nabla f(x_k)\, dt + \sqrt{2}\, dB_t \quad \text{with} \quad \tilde{x}_{k,0} = x_k. \tag{2.1}$$
We denote by $\tilde{\rho}_{k,t}$ and $\rho_k$ the distributions of $\tilde{x}_{k,t}$ and $x_k$, respectively, and we easily observe that $\tilde{\rho}_{k,\eta} = \rho_{k+1}$. The advantage of analyzing the interpolation process is its continuity in time, which allows one to work with the Fokker–Planck equation. Using this diffusion, in Lemma 1, we show that the time derivative of $\chi^2(\tilde{\rho}_{k,t} \| \nu^*)$ differs from the differential inequality (1.5) by an additive error term.

Lemma 1. If $\nu^*$ satisfies Assumption 1, then the following inequality governs the evolution of the Chi-square divergence of the interpolated process (2.1) from the target:
$$\frac{d}{dt} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) \le -\frac{1}{C_P} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) + 2\, \mathbb{E}\Big[ \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \big\|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\big\|^2 \Big]. \tag{2.2}$$
Proof. The proof follows lines similar to those leading to a differential inequality in KL divergence (see for example [VW19]). Let $\tilde{\rho}_{t|k}$ denote the distribution of $\tilde{x}_{k,t}$ conditioned on $x_k$, which satisfies
$$\frac{\partial \tilde{\rho}_{t|k}(x)}{\partial t} = \nabla \cdot \big( \nabla f(x_k) \tilde{\rho}_{t|k}(x) \big) + \Delta \tilde{\rho}_{t|k}(x).$$
Taking expectation with respect to $x_k$, we get
$$\begin{aligned} \frac{\partial \tilde{\rho}_{k,t}(x)}{\partial t} &= \nabla \cdot \Big( \int \nabla f(x_k)\, \tilde{\rho}_{t,k}(x, x_k)\, dx_k \Big) + \Delta \tilde{\rho}_{k,t}(x) \\ &= \nabla \cdot \Big( \tilde{\rho}_{k,t}(x) \int \nabla f(x_k)\, \tilde{\rho}_{k|t}(x_k \mid \tilde{x}_{k,t} = x)\, dx_k + \nabla \tilde{\rho}_{k,t}(x) \Big) \\ &= \nabla \cdot \Big( \tilde{\rho}_{k,t}(x) \big( \mathbb{E}[\nabla f(x_k) \mid \tilde{x}_{k,t} = x] + \nabla \log \tilde{\rho}_{k,t}(x) \big) \Big) \\ &= \nabla \cdot \Big( \tilde{\rho}_{k,t}(x) \Big( \mathbb{E}[\nabla f(x_k) - \nabla f(x) \mid \tilde{x}_{k,t} = x] + \nabla \log \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \Big) \Big). \end{aligned}$$
Using this, we compute the time derivative of the Chi-square divergence of $\tilde{\rho}_{k,t}$ from the target $\nu^*$:
$$\begin{aligned} \frac{d}{dt} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) &= 2 \int \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \times \frac{\partial \tilde{\rho}_{k,t}(x)}{\partial t}\, dx \\ &= 2 \int \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \times \nabla \cdot \Big( \tilde{\rho}_{k,t}(x) \Big( \mathbb{E}[\nabla f(x_k) - \nabla f(x) \mid \tilde{x}_{k,t} = x] + \nabla \log \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \Big) \Big)\, dx \\ &= -2 \int \Big\langle \nabla \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)},\; \mathbb{E}[\nabla f(x_k) - \nabla f(x) \mid \tilde{x}_{k,t} = x] + \nabla \log \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \Big\rangle \tilde{\rho}_{k,t}(x)\, dx \\ &= -2 \int \Big\| \nabla \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \Big\|^2 \nu^*(x)\, dx + 2 \int \Big\langle \nabla \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)},\; \mathbb{E}[\nabla f(x) - \nabla f(x_k) \mid \tilde{x}_{k,t} = x] \Big\rangle \tilde{\rho}_{k,t}(x)\, dx \\ &\le -\int \Big\| \nabla \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \Big\|^2 \nu^*(x)\, dx + 2 \int \mathbb{E}\Big[ \frac{\tilde{\rho}_{k,t}(x)}{\nu^*(x)} \|\nabla f(x) - \nabla f(x_k)\|^2 \,\Big|\, \tilde{x}_{k,t} = x \Big] \tilde{\rho}_{k,t}(x)\, dx \\ &\le -\frac{1}{C_P} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) + 2\, \mathbb{E}\Big[ \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\|^2 \Big], \end{aligned}$$
where step 1 (the third equality) follows from the divergence theorem, step 2 from the weighted Young inequality $2\langle a, b \rangle \le \frac{1}{2}\|a\|^2 + 2\|b\|^2$ together with Jensen's inequality, and in step 3 we used Assumption 1. $\square$

The above differential inequality (2.2) is the discrete analogue of (1.5), and it will be used to establish a single-step bound that can be iterated to yield the final convergence result. For this, we control the additive error term in (2.2), namely $\|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\|^2$, under a smoothness condition on the potential function. The ratio of densities $\tilde{\rho}_{k,t}(\tilde{x}_{k,t})/\nu^*(\tilde{x}_{k,t})$ is harder to control, for which we proceed under an opaque uniform warmness condition, as stated below.

Assumption 2 (Uniform warmness of LMC iterates). The ratio of densities satisfies
$$\forall k \text{ and } t \in [0, \eta], \quad \mathbb{E}\Big[ \Big( \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \Big)^2 \Big] \le B_d \quad \text{where} \quad \tilde{x}_{k,t} \sim \tilde{\rho}_{k,t}, \tag{2.3}$$
and $B_d$ is a constant that does not depend on $k$.

This assumption states that the LMC iterates stay sufficiently close to the target. The constant $B_d$ may have an exponential dimension dependence, as seen in the next example. We emphasize that the above condition needs to be proven to establish a complete convergence analysis; however, in this note, we only verify it for the Gaussian case, where we know $\tilde{\rho}_{k,t}$ explicitly.
Simple example where Assumption 2 is satisfied. In this toy example, the above assumption is verified for the Gaussian sampling problem with LMC. Assume for simplicity that $f(x) = \frac{1}{2}\|x\|^2$, and $x_0 \sim \mathcal{N}(0, \sigma_0^2 I)$. One can easily verify that
$$\tilde{\rho}_{k,t} = \mathcal{N}(0, \sigma_{k,t}^2 I) \quad \text{where} \quad \sigma_{k,t}^2 := (1 - t)^2 \sigma_k^2 + 2t \quad \text{and} \quad \sigma_k^2 := (1 - \eta)^{2k} \sigma_0^2 + \frac{1 - (1 - \eta)^{2k}}{1 - \eta/2}.$$
Let, for simplicity, $\sigma_0^2 = 0.5$. A direct computation gives
$$\mathbb{E}\Big[ \Big( \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \Big)^2 \Big] = \Big( \frac{3}{\sigma_{k,t}^2} - 2 \Big)^{-d/2} \sigma_{k,t}^{-3d} \quad \text{whenever} \quad \sigma_{k,t}^2 < 3/2.$$
For the above choice of $\sigma_0^2$, and a sufficiently small step size $\eta$, one can verify that $\sigma_k^2$ is monotonically increasing and converges to $1/(1 - \eta/2)$. Moreover, $\sigma_{k,t}^2$ stays in an interval of the form $[1/2,\, 3/2 - c]$ for a constant $c > 0$; plugging these bounds into the formula above yields $B_d = O(2^d)$. A similar analysis can be carried out for the non-isotropic Gaussian case.
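The recursions for $\sigma_k^2$ and $\sigma_{k,t}^2$ and the resulting warmness constant are easy to check numerically. The sketch below is ours (the values of $\eta$, $d$, and the horizon are illustrative); it tracks $(3/\sigma_{k,t}^2 - 2)^{-d/2} \sigma_{k,t}^{-3d}$ along the iterations and returns a valid $B_d$ for the run.

```python
import numpy as np

def warmness_constant(eta=0.05, sigma0_sq=0.5, d=10, n_steps=500):
    """For f(x) = ||x||^2 / 2, track sigma_{k,t}^2 = (1-t)^2 sigma_k^2 + 2t
    over t in [0, eta] and bound E[(rho_{k,t}/nu*)^2] in closed form."""
    t_grid = np.linspace(0.0, eta, 50)
    sigma_sq, worst = sigma0_sq, 0.0
    for _ in range(n_steps):
        s_sq = (1.0 - t_grid) ** 2 * sigma_sq + 2.0 * t_grid
        assert np.all(s_sq < 1.5), "closed form requires sigma_{k,t}^2 < 3/2"
        ratio = (3.0 / s_sq - 2.0) ** (-d / 2) * s_sq ** (-1.5 * d)
        worst = max(worst, ratio.max())
        sigma_sq = (1.0 - eta) ** 2 * sigma_sq + 2.0 * eta  # one LMC step
    return worst

print(warmness_constant())  # finite, but grows exponentially with d
```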
Alternatively, one could assume that the LMC iterates stay uniformly warm in the sense that $\sup_x \rho_k(x)/\nu^*(x) < \infty$ (see [VW19, page 11] for a comment on a similar warmness). In this case, the proof is simpler and does not require dissipativity (Assumption 4); however, unlike the previous condition, this notion of warmness cannot be verified for the Gaussian example.

Next, we assume that the potential is first-order smooth as follows.

Assumption 3 (Smoothness). The gradient of the potential $f$ satisfies
$$\forall x, y, \quad \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|.$$

We will need to bound the moments of the LMC iterates, for which we use the following assumption.
Assumption 4 (2-dissipativity). For some constants $a > 0$ and $b \ge 0$, we have
$$\forall x, \quad \langle \nabla f(x), x \rangle \ge a \|x\|^2 - b.$$

The above condition implies quadratic growth of $f$; thus, we note that LSI may be more appropriate in this setup. Under the above assumption, for a sufficiently small step size, it is known that the LMC iterates have finite moments of all orders. Specifically, for $\eta \le \psi$ (for explicit values of $\psi$ and $\alpha$, refer to [EMS18]), we have
$$\mathbb{E}\big[ \|x_k\|^4 \big] \le \alpha \big( 1 + \mathbb{E}\big[ \|x_0\|^4 \big] \big).$$
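This fourth-moment bound can be sanity-checked by simulation. Below is a sketch of ours with a non-convex double-well potential $f(x) = \frac{1}{4}\|x\|^4 - \frac{1}{2}\|x\|^2$, which satisfies Assumption 4 since $\langle \nabla f(x), x \rangle = \|x\|^4 - \|x\|^2 \ge \|x\|^2 - 1$ (i.e. $a = b = 1$); the dimension, step size, and chain count are illustrative choices, not the constants $\psi, \alpha$ of [EMS18].

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta, n_steps, n_chains = 5, 0.01, 2_000, 200
x = np.zeros((n_chains, d))
fourth_moments = []
for _ in range(n_steps):
    # grad f(x) = (||x||^2 - 1) x for the double-well potential above
    gx = (np.sum(x * x, axis=1, keepdims=True) - 1.0) * x
    x = x - eta * gx + np.sqrt(2.0 * eta) * rng.standard_normal((n_chains, d))
    fourth_moments.append(np.mean(np.sum(x * x, axis=1) ** 2))
print(max(fourth_moments))  # E[||x_k||^4] stays bounded along the run
```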
Lemma 2. If Assumptions 1, 2, 3 and 4 hold and $\eta \le \psi$, then the following inequality is satisfied between consecutive iterates of the LMC algorithm:
$$\chi^2(\rho_{k+1} \| \nu^*) \le e^{-\eta/C_P} \chi^2(\rho_k \| \nu^*) + 2 \gamma L^2 \eta^2,$$
where
$$\gamma^2 \triangleq B_d \times \Big( 64 \psi^2 \big( \alpha L^4\, \mathbb{E}\big[\|x_0\|^4\big] + \alpha L^4 + \|\nabla f(0)\|^4 \big) + 32\, d(d + 2) \Big).$$
Proof. We use Lemma 1 and bound the additive error term. We start by using Assumptions 2 and 3 together with Hölder's inequality to get
$$\begin{aligned} \mathbb{E}\Big[ \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\|^2 \Big] &\le \mathbb{E}\Big[ \Big( \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \Big)^2 \Big]^{1/2} \mathbb{E}\big[ \|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\|^4 \big]^{1/2} \\ &\le L^2 \sqrt{B_d}\, \mathbb{E}\big[ \|\tilde{x}_{k,t} - x_k\|^4 \big]^{1/2} = L^2 \sqrt{B_d}\, \mathbb{E}\big[ \|{-\nabla f(x_k)\, t} + \sqrt{2} B_t\|^4 \big]^{1/2}. \end{aligned}$$
Next, we use the smoothness assumption to derive a growth condition:
$$\|\nabla f(x)\|^4 \le 8 \|\nabla f(x) - \nabla f(0)\|^4 + 8 \|\nabla f(0)\|^4 \le 8 L^4 \|x\|^4 + 8 \|\nabla f(0)\|^4.$$
We use this growth bound and Assumption 4 to bound $\mathbb{E}[\|{-\nabla f(x_k)\, t} + \sqrt{2} B_t\|^4]$. We write
$$\begin{aligned} \mathbb{E}\big[ \|{-\nabla f(x_k)\, t} + \sqrt{2} B_t\|^4 \big] &\le 8 t^4\, \mathbb{E}\big[ \|\nabla f(x_k)\|^4 \big] + 32\, \mathbb{E}\big[ \|B_t\|^4 \big] \\ &\le 64 t^4 \big( L^4\, \mathbb{E}\big[ \|x_k\|^4 \big] + \|\nabla f(0)\|^4 \big) + 32\, t^2 d(d + 2) \\ &\le 64 t^4 \big( \alpha L^4\, \mathbb{E}\big[ \|x_0\|^4 \big] + \alpha L^4 + \|\nabla f(0)\|^4 \big) + 32\, t^2 d(d + 2) \\ &\le \eta^2 \Big( 64 \psi^2 \big( \alpha L^4\, \mathbb{E}\big[ \|x_0\|^4 \big] + \alpha L^4 + \|\nabla f(0)\|^4 \big) + 32\, d(d + 2) \Big) = \eta^2 \gamma^2 / B_d, \end{aligned}$$
where for the last step we used $t \le \eta \le \psi$. Putting all of these together, we get
$$\mathbb{E}\Big[ \frac{\tilde{\rho}_{k,t}(\tilde{x}_{k,t})}{\nu^*(\tilde{x}_{k,t})} \|\nabla f(\tilde{x}_{k,t}) - \nabla f(x_k)\|^2 \Big] \le \gamma L^2 \eta,$$
and consequently, by Lemma 1,
$$\frac{d}{dt} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) \le -\frac{1}{C_P} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) + 2 \gamma L^2 \eta.$$
By rearranging and multiplying with $\exp(t/C_P)$, and using the fact that $t \le \eta$, we obtain
$$\frac{d}{dt} \Big( e^{t/C_P} \chi^2(\tilde{\rho}_{k,t} \| \nu^*) \Big) \le e^{\eta/C_P}\, 2 \eta L^2 \gamma.$$
Integrating both sides over $t \in [0, \eta]$ and rearranging yields the desired inequality. $\square$

By iterating the bound in the previous lemma, we establish the following convergence guarantee.
Proposition 3.
If Assumptions 1, 2, 3 and 4 hold, then for a small $\epsilon$ satisfying
$$\epsilon \le 8 C_P \gamma L^2 (C_P \wedge \psi),$$
where $\gamma$ is defined in Lemma 2, and for some $\Delta$ upper bounding the error at initialization, i.e. $\chi^2(\rho_0 \| \nu^*) \le \Delta$, if we choose the step size as
$$\eta = \frac{\epsilon}{8 C_P \gamma L^2},$$
then LMC reaches $\epsilon$ accuracy (i.e. $\chi^2(\rho_N \| \nu^*) \le \epsilon$) after $N$ steps for any $N$ satisfying
$$N \ge 16 (C_P L)^2 \times \frac{\gamma}{\epsilon} \times \log\Big( \frac{2\Delta}{\epsilon} \Big).$$
Proof. For this choice of $\eta$ and the bound on $\epsilon$, we have $\eta \le \psi \wedge C_P$. This implies that we can use Lemma 2 and that
$$e^{-\eta/C_P} \le 1 - \frac{\eta}{2 C_P}.$$
Combining the above bound with Lemma 2, we easily get
$$\chi^2(\rho_k \| \nu^*) \le \Big( 1 - \frac{\eta}{2 C_P} \Big) \chi^2(\rho_{k-1} \| \nu^*) + 2 \gamma L^2 \eta^2. \tag{2.4}$$
We state the following elementary lemma.
Lemma 4. For a real sequence $\{\theta_k\}_{k \ge 0}$, if we have $\theta_k \le (1 - a) \theta_{k-1} + b$ for some $a \in (0, 1)$ and $b \ge 0$, then
$$\theta_k \le e^{-ak} \theta_0 + b/a.$$
Proof of Lemma 4. Recursion on $\theta_k \le (1 - a) \theta_{k-1} + b$ yields
$$\theta_k \le (1 - a)^k \theta_0 + b \big( 1 + (1 - a) + (1 - a)^2 + \cdots + (1 - a)^{k-1} \big) \le (1 - a)^k \theta_0 + \frac{b}{a}.$$
Using the fact that $1 - a \le e^{-a}$ completes the proof. $\square$

Iterating (2.4) with the help of Lemma 4, we obtain the following bound:
$$\chi^2(\rho_N \| \nu^*) \le e^{-\frac{\eta N}{2 C_P}} \Delta + 4 C_P \gamma L^2 \eta.$$
The conditions on $\eta$ and $N$ imply that each term on the right-hand side is bounded by $\epsilon/2$, which completes the proof. $\square$
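Proposition 3 amounts to an explicit recipe for choosing the step size and iteration count from the problem constants. The helper below is a sketch of ours that simply evaluates those formulas; all inputs, in particular $\gamma$ (which hides the warmness constant $B_d$), are user-supplied assumptions.

```python
import math

def lmc_schedule(eps, C_P, L, gamma, Delta):
    """Step size and iteration count from Proposition 3:
    eta = eps / (8 C_P gamma L^2) and
    N >= 16 (C_P L)^2 (gamma / eps) log(2 Delta / eps)."""
    eta = eps / (8.0 * C_P * gamma * L**2)
    N = math.ceil(16.0 * (C_P * L) ** 2 * (gamma / eps) * math.log(2.0 * Delta / eps))
    return eta, N

# Illustrative constants only.
print(lmc_schedule(eps=0.1, C_P=1.0, L=1.0, gamma=10.0, Delta=100.0))
```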
3 Conclusion

In this note, we provided a simple analysis of the convergence of the LMC algorithm in Chi-square divergence under the condition that the LMC iterates stay uniformly warm across all iterations (Assumption 2). This condition needs to be verified in the general case to establish a complete convergence analysis; however, we only verified it for a simple Gaussian example.

Acknowledgements

The authors would like to thank Andre Wibisono for pointing out an error in the proof of Lemma 1 in an early version of this note.
References

[BDMS19] Nicolas Brosse, Alain Durmus, Éric Moulines, and Sotirios Sabanis, The tamed unadjusted Langevin algorithm, Stochastic Processes and their Applications 129 (2019), no. 10, 3638–3663.

[BÉ85] D. Bakry and M. Émery, Diffusions hypercontractives, Séminaire de Probabilités XIX 1983/84, Springer, Berlin, Heidelberg, 1985, pp. 177–206.

[BG99] Sergej G. Bobkov and Friedrich Götze, Exponential integrability and transportation cost related to logarithmic Sobolev inequalities, Journal of Functional Analysis 163 (1999), no. 1, 1–28.

[CB18] Xiang Cheng and Peter L. Bartlett, Convergence of Langevin MCMC in KL-divergence, PMLR 83 (2018), 186–211.

[CCAY+18] Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, and Michael I. Jordan, Sharp convergence rates for Langevin dynamics in the nonconvex setting, arXiv preprint arXiv:1805.01648 (2018).

[CGL+20] Sinho Chewi, Thibaut Le Gouic, Chen Lu, Tyler Maunu, Philippe Rigollet, and Austin Stromme, Exponential ergodicity of mirror-Langevin diffusions, arXiv preprint arXiv:2005.09669 (2020).

[Dal17a] Arnak Dalalyan, Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, Proceedings of the 2017 Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 65, PMLR, 2017, pp. 678–689.

[Dal17b] Arnak S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79 (2017), no. 3, 651–676.

[DK19] Arnak S. Dalalyan and Avetik Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Processes and their Applications 129 (2019), no. 12, 5278–5311.

[DM16] Alain Durmus and Éric Moulines, Sampling from strongly log-concave distributions with the unadjusted Langevin algorithm, arXiv preprint arXiv:1605.01559 (2016).

[DM17] Alain Durmus and Éric Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, The Annals of Applied Probability 27 (2017), no. 3, 1551–1587.

[DM19] Alain Durmus and Éric Moulines, High-dimensional Bayesian inference via the unadjusted Langevin algorithm, Bernoulli 25 (2019), no. 4A, 2854–2882.

[DMM19] Alain Durmus, Szymon Majewski, and Blazej Miasojedow, Analysis of Langevin Monte Carlo via convex optimization, Journal of Machine Learning Research 20 (2019), no. 73, 1–46.

[EH20] Murat A. Erdogdu and Rasa Hosseinzadeh, On the convergence of Langevin Monte Carlo: The interplay between tail growth and smoothness, arXiv preprint arXiv:2005.13097 (2020).

[EMS18] Murat A. Erdogdu, Lester Mackey, and Ohad Shamir, Global non-convex optimization with discretized diffusions, Advances in Neural Information Processing Systems, 2018, pp. 9671–9680.

[HS87] Richard Holley and Daniel Stroock, Logarithmic Sobolev inequalities and stochastic Ising models, Journal of Statistical Physics 46 (1987), no. 5, 1159–1194.

[Liu20] Yuan Liu, The Poincaré inequality and quadratic transportation-variance inequalities, Electronic Journal of Probability 25 (2020), 16 pp.

[LWME19] Xuechen Li, Yi Wu, Lester Mackey, and Murat A. Erdogdu, Stochastic Runge-Kutta accelerates Langevin Monte Carlo and beyond, Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 7748–7760.

[OV00] Felix Otto and Cédric Villani, Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality, Journal of Functional Analysis 173 (2000), no. 2, 361–400.

[Ris96] Hannes Risken, Fokker-Planck equation, The Fokker-Planck Equation, Springer, 1996, pp. 63–95.

[Tsy08] Alexandre B. Tsybakov, Introduction to nonparametric estimation, Springer Science & Business Media, 2008.

[VW19] Santosh Vempala and Andre Wibisono, Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices, Advances in Neural Information Processing Systems, 2019, pp. 8092–8104.