Hypocoercivity of Piecewise Deterministic Markov Process-Monte Carlo
Christophe Andrieu, Alain Durmus, Nikolas Nüsken, Julien Roussel
School of Mathematics, University of Bristol, UK. CMLA - École normale supérieure Paris-Saclay, CNRS, Université Paris-Saclay, 94235 Cachan, France. Imperial College London, UK. École des ponts ParisTech and INRIA, Paris, France.
March 1, 2019
Abstract
In this work, we establish $L^2$-exponential convergence for a broad class of Piecewise Deterministic Markov Processes recently proposed in the context of Markov Process Monte Carlo methods and covering in particular the Randomized Hamiltonian Monte Carlo [21, 11], the Zig-Zag process [6] and the Bouncy Particle Sampler [51, 12]. The kernel of the symmetric part of the generator of such processes is non-trivial, and we follow the ideas recently introduced in [20, 21] to develop a rigorous framework for hypocoercivity in a fairly general and unifying set-up, while deriving tractable estimates of the constants involved in terms of the parameters of the dynamics. As a by-product we characterize the scaling properties of these algorithms with respect to the dimension of classes of problems, therefore providing some theoretical evidence to support their practical relevance.

Consider a probability distribution $\pi$ defined on the Borel $\sigma$-field $\mathcal{X}$ of some domain $\mathsf{X} = \mathbb{R}^d$ or $\mathsf{X} = \mathbb{T}^d$, where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. Assume that $\pi$ has a density with respect to the Lebesgue measure, also denoted $\pi$, of the form $\pi = \mathrm{e}^{-U} / \int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \, \mathrm{d}y$, where $U : \mathsf{X} \to \mathbb{R}$ is a continuously differentiable function referred to as the potential associated with $\pi$. Sampling from such distributions is of interest in computational statistical mechanics and in Bayesian statistics and allows one, for example, to compute efficiently expectations of functions $f : \mathsf{X} \to \mathbb{R}$ with respect to $\pi$ by invoking empirical process limit theorems, e.g. the law of large numbers. In practical set-ups, sampling exactly from $\pi$ is either impossible or computationally prohibitive. A standard and versatile approach to sampling from such distributions consists of using Markov Chain Monte Carlo (MCMC) algorithms, where the ergodicity of a Markov chain leaving $\pi$ invariant is exploited.
Markov Process Monte Carlo (MPMC) methods are the continuous time counterparts of MCMC, but their exact implementation is most often impossible on computers and requires additional approximation, such as time discretization of the process in the case of the Langevin diffusion. A notable exception, which has recently attracted significant attention, is the class of MPMC relying on Piecewise Deterministic Markov Processes (PDMP) [17], which in addition to being simpler to simulate than earlier MPMC, are nonreversible, offering the promise of better performance. We now briefly introduce a class of processes covering existing algorithms. The generic mathematical notation we use in the introduction is fairly standard and fully defined at the end of the section.

Known PDMP Monte Carlo methods rely on the use of the auxiliary variable trick, that is the introduction of an instrumental variable and probability distribution $\mu$ defined on an extended domain, of which $\pi$ is a marginal distribution, which may facilitate simulation. In the present set-up, one introduces the velocity variable $v \in \mathsf{V} \subset \mathbb{R}^d$ associated with a probability distribution $\nu$ defined on the $\sigma$-field $\mathcal{V}$ of $\mathsf{V}$, where the subset $\mathsf{V}$ is assumed to be closed. Standard choices for $\nu$ include the centered normal distribution with covariance matrix $m_2 \mathrm{I}_d$, where $\mathrm{I}_d$ is the $d$-dimensional identity matrix, the uniform distribution on the unit sphere $\mathbb{S}^{d-1}$, or the uniform distribution on $\mathsf{V} = \{-1, 1\}^d$. Let $\mathsf{E} = \mathsf{X} \times \mathsf{V}$ and define the probability measure $\mu = \pi \otimes \nu$. The aim is now to sample from the probability distribution $\mu$.

We denote by $\mathrm{C}^2_b(\mathsf{E})$ the set of bounded functions of $\mathrm{C}^2(\mathsf{E})$. The PDMP Monte Carlo algorithms we are aware of fall in a class of processes associated with generators of the form, for $f \in \mathrm{C}^2_b(\mathsf{E})$ and $(x, v) \in \mathsf{E}$,
\[ \mathcal{L}_1 f(x, v) = v^\top \nabla_x f(x, v) + \sum_{k=1}^{K} \lambda_k(x, v) \, (\mathcal{B}_k - \mathrm{Id}) f(x, v) + m_2^{1/2} \lambda_{\mathrm{ref}}(x) \, \mathcal{R}_v f(x, v) , \tag{1} \]
where $K \in \mathbb{N}$, $\lambda_k : \mathsf{E} \to \mathbb{R}_+$ for $k \in \{1, \ldots, K\}$, $\lambda_{\mathrm{ref}} : \mathsf{X} \to \mathbb{R}_+$, $(\mathcal{R}_v, \mathrm{D}(\mathcal{R}_v))$ and $(\mathcal{B}_k, \mathrm{D}(\mathcal{B}_k))$ for $k \in \{1, \ldots, K\}$ are operators we specify below, and
\[ m_2 = \int_{\mathsf{V}} v_1^2 \, \mathrm{d}\nu(v) , \tag{2} \]
which is assumed to be finite. For any $k \in \{1, \ldots, K\}$, $\lambda_k$ will be referred to as a jump rate and $\lambda_{\mathrm{ref}}$ as the refreshment rate.

In the case where $\mathsf{V} = \mathbb{R}^d$ and $\nu$ is the zero-mean Gaussian distribution on $\mathbb{R}^d$ with covariance matrix $m_2 \mathrm{I}_d$, we also consider generators of the form, for any $f \in \mathrm{C}^2_b(\mathsf{E})$ and $(x, v) \in \mathsf{E}$,
\[ \mathcal{L}_2 f(x, v) = \mathcal{L}_1 f(x, v) - m_2 F_0(x)^\top \nabla_v f(x, v) , \tag{3} \]
where $F_0 : \mathsf{X} \to \mathbb{R}^d$.

For any $k \in \{1, \ldots, K\}$, the jump operators $\mathcal{B}_k$ we consider are associated with continuous vector fields $F_k : \mathsf{X} \to \mathbb{R}^d$ and are of the form, for any $f : \mathsf{E} \to \mathbb{R}$ and $(x, v) \in \mathsf{E}$,
\[ \mathcal{B}_k f(x, v) = f\big(x, v - 2 (v^\top \mathrm{n}_k(x)) \, \mathrm{n}_k(x)\big) , \qquad \mathrm{n}_k(x) = \begin{cases} F_k(x) / |F_k(x)| & \text{if } F_k(x) \neq 0 , \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]
The operator $\mathcal{B}_k$ reflects the velocity through the hyperplane orthogonal to $F_k(X)$ at the event position $X$, i.e. it flips the component of the velocity in the direction given by $F_k$, inducing an elastic "bounce" of the position trajectory with the hyperplane. As we shall see, the $K + 1$ vector fields $F_k$ are tied to the potential $U$ by the relation $\nabla_x U = \sum_{k=0}^{K} F_k$, required to ensure that $\mu$ is left invariant by the associated semi-group. Informally, assuming for the moment that $\lambda_{\mathrm{ref}} = 0$ and $F_0 = \nabla_x U_0$ for some $U_0 : \mathsf{X} \to \mathbb{R}$, the corresponding process follows the solution of Hamilton's equations $(\dot{x}_t, \dot{v}_t) = (v_t, -\nabla_x U_0(x_t))$ for a random time of distribution governed by an inhomogeneous Poisson process with rate $(x, v) \mapsto \sum_{k=1}^{K} \lambda_k(x, v)$. When an event occurs and the current state of the process is $(X, V)$, one chooses between the $K$ possible updates of the state available, with probability proportional to $\lambda_1(X, V), \ldots, \lambda_K(X, V)$, with the particularity here that the position $X$ is left unchanged.

The vector fields $\{F_k : \mathsf{X} \to \mathbb{R}^d ;\, k \in \{1, \ldots, K\}\}$ and jump rates $\{\lambda_k : \mathsf{E} \to \mathbb{R}_+ ;\, k \in \{1, \ldots, K\}\}$ are linked by the relations $\lambda_k(x, v) - \lambda_k(x, -v) = v^\top F_k(x)$ for $k \in \{1, \ldots, K\}$ and $(x, v) \in \mathsf{E}$, together with other conditions, required to ensure that $\mu$ is an invariant distribution of the associated semi-group. A standard choice, sometimes referred to as canonical, consists of choosing jump rates $\lambda_k(x, v) = [v^\top F_k(x)]_+$ for $k \in \{1, \ldots, K\}$ and $(x, v) \in \mathsf{E}$.

Denote by $\mathrm{L}^2(\mu)$ the set of measurable functions $g : \mathsf{E} \to \mathbb{R}$ such that $\int_{\mathsf{E}} g^2 \, \mathrm{d}\mu < +\infty$. We let $\|\cdot\|_2$ be the norm induced by the scalar product, for all $f, g \in \mathrm{L}^2(\mu)$,
\[ \langle f, g \rangle_2 = \int_{\mathsf{E}} f g \, \mathrm{d}\mu , \tag{5} \]
making $\mathrm{L}^2(\mu)$ a Hilbert space.

The operator $\mathcal{R}_v$ will be referred to as the refreshment operator, a standard example of which is $\mathcal{R}_v = \Pi_v - \mathrm{Id}$, where $\Pi_v$ is the following orthogonal projector in $\mathrm{L}^2(\mu)$: for any $f \in \mathrm{L}^2(\mu)$,
\[ \Pi_v f(x, v) = \int_{\mathsf{V}} f(x, w) \, \mathrm{d}\nu(w) , \tag{6} \]
in which case the velocity is drawn afresh from the marginal invariant distribution, while the position is left unchanged. In this scenario the informal description of the process given above carries on with $\lambda_{\mathrm{ref}} \neq 0$ added to the rate $(x, v) \mapsto \sum_{k=1}^{K} \lambda_k(x, v)$, and $\Pi_v$ an additional possible update to the velocity, chosen with probability proportional to $\lambda_{\mathrm{ref}}$. Another possible choice is the generator of an Ornstein-Uhlenbeck process leaving $\nu$ invariant.

Throughout the paper we assume the following condition to hold for either $\mathcal{L}_1$ or $\mathcal{L}_2$, a condition satisfied by the examples covered in this manuscript.

A1.
(a) The operator $\mathcal{L}$ is closed in $\mathrm{L}^2(\mu)$ and generates a strongly continuous contraction semi-group $(P_t)_{t \geq 0}$ on $\mathrm{L}^2(\mu)$, i.e. $P_0 = \mathrm{Id}$, for any $t, s \in \mathbb{R}_+$, $P_{s+t} = P_s P_t$, and for any $f \in \mathrm{L}^2(\mu)$, $\|P_t f\|_2 \leq \|f\|_2$ and $\lim_{t \to 0} \|P_t f - f\|_2 = 0$.
(b) $\mu$ is a stationary measure for $(P_t)_{t \geq 0}$, i.e. for any $t \in \mathbb{R}_+$, $\mu P_t = \mu$.
(c) There exists a core $\mathrm{C}$ for $\mathcal{L}$ such that $\mathrm{C}$ is dense in $\mathrm{L}^2(\mu)$ and $\mathrm{C} \subset \mathrm{D}(\mathcal{L}) \cap \mathrm{D}(\mathcal{L}^\star)$, where $(\mathcal{L}^\star, \mathrm{D}(\mathcal{L}^\star))$ is the adjoint of $\mathcal{L}$ on $\mathrm{L}^2(\mu)$.
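The bounce operator (4) and the canonical jump rate described above can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function names `bounce` and `canonical_rate` and the numerical values of `v` and `F` are hypothetical, chosen only to check that the reflection preserves the speed, is an involution, flips the sign of $v^\top F_k(x)$, and that the canonical rate satisfies $\lambda_k(x, v) - \lambda_k(x, -v) = v^\top F_k(x)$.

```python
import math

def bounce(v, F):
    """Velocity update of B_k in (4): reflect v through the hyperplane orthogonal to F."""
    norm = math.sqrt(sum(f * f for f in F))
    if norm == 0.0:
        return list(v)  # n_k(x) = 0: the velocity is left unchanged
    n = [f / norm for f in F]
    vn = sum(vi * ni for vi, ni in zip(v, n))
    return [vi - 2.0 * vn * ni for vi, ni in zip(v, n)]

def canonical_rate(v, F):
    """Canonical jump rate lambda_k(x, v) = (v^T F_k(x))_+."""
    return max(0.0, sum(vi * fi for vi, fi in zip(v, F)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical values of the velocity v and of the vector field F_k(x) at some point x.
v = [1.0, -2.0, 0.5]
F = [0.3, 0.4, -1.2]
w = bounce(v, F)
minus_v = [-vi for vi in v]
rate_gap = canonical_rate(v, F) - canonical_rate(minus_v, F)
```

Only the velocity changes in a bounce; the position is untouched, which is why `bounce` takes no position argument beyond the value of the vector field at the event location.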
Note that if $\mathcal{L}$ generates a strongly continuous contraction semi-group then $\mathrm{D}(\mathcal{L})$ is dense by [27, Theorem 2.12], and the adjoint of $\mathcal{L}$ on $\mathrm{L}^2(\mu)$ is therefore well-defined and closed by [49, Theorem 5.1.5], and $\mathrm{D}(\mathcal{L}^\star)$ is dense.

We now describe how various choices of $K$ and $F_k$ lead to known algorithms. For simplicity of exposition, we assume for the moment that $\mathsf{V} = \mathbb{R}^d$, $\nu$ is the zero-mean Gaussian distribution with covariance matrix $m_2 \mathrm{I}_d$ and $\mathcal{R}_v = \Pi_v - \mathrm{Id}$, but as we shall see later our results cover more general scenarios.
• The particular choice $K = 0$ and $F_0 = \nabla_x U$ corresponds to the procedure described in [23] as a motivation for the popular hybrid Monte Carlo method. This process is also known as the Linear Boltzmann/kinetic equation in the statistical physics literature [5] or randomized Hamiltonian Monte Carlo [11]. In this scenario the process follows the isocontours of $\mu$ for random times distributed according to an inhomogeneous Poisson law of parameter $\lambda_{\mathrm{ref}} > 0$, after which the velocity is drawn afresh from $\nu$.
• The scenario where $K = d$, $F_0 = 0$ and, for $k \in \{1, \ldots, d\}$ and $x \in \mathsf{X}$, $F_k(x) = \partial_k U(x) e_k$, where $(e_k)_{k \in \{1, \ldots, d\}}$ is the canonical basis, corresponds to the Zig-Zag (ZZ) process [6], where the $x$ component of the process follows straight lines in the direction $v$, which remains constant between events. In this scenario, the choice of $\mathcal{B}_k$ to update the velocity consists of negating its $k$-th component; see also [29] for related ideas motivated by other applications.
• The standard Bouncy Particle Sampler (BPS) of [51], extended by [12], corresponds to the choice $K = 1$, $F_0 = 0$ and $F_1 = \nabla_x U$.
• More elaborate versions of the ZZ and BPS processes, motivated by computational considerations, take advantage of the possibility to decompose the energy as $U = \sum_{k=0}^{K} U_k$ and correspond to the choice $F_k = \nabla_x U_k$ [43, 12], where in the former the sign flip operation is replaced with a component swap.
• It should be clear that one can consider more general deterministic dynamics with $F_0 \neq 0$, effectively covering the Hamiltonian Bouncy Particle Sampler suggested in [55].
• We remark that the well-known Langevin algorithm corresponds to $K = 0$, $F_0 = \nabla_x U$ and the situation where $\mathcal{R}_v$ is the generator of the Ornstein-Uhlenbeck process.

More general bounces involving randomization (see [55, 58, 44]) can also be considered in our framework, at the cost of additional complexity and reduced tightness of our bounds.

The main aim of the present paper is the study of the long time behaviour of the class of processes described above using hypocoercivity methods popularized by [57]. More precisely, let $(P_t)_{t \geq 0}$ be the semigroup associated with the PDMP with generator $\mathcal{L} \in \{\mathcal{L}_1, \mathcal{L}_2\}$ defined above. We aim to find simple and verifiable conditions on $U$, $F_k$, $\mathcal{R}_v$ and $\lambda_{\mathrm{ref}}$ ensuring the existence of $A \geq 1$ and $\alpha > 0$, and their explicit computation in terms of characteristics of the data of the problem, such that for any $f \in \mathrm{L}^2_0(\mu) := \{ g \in \mathrm{L}^2(\mu) : \int_{\mathsf{E}} g \, \mathrm{d}\mu = 0 \}$ and $t \geq 0$,
\[ \|P_t f\|_2 \leq A \, \mathrm{e}^{-\alpha t} \|f\|_2 . \tag{7} \]
Establishing such a result is of interest to practitioners for multiple reasons. Explicit bounds may provide insights into expected performance properties of the algorithm in various situations or regimes. In particular the above leads to an upper bound on the integrated autocorrelation, which is a performance measure of Monte Carlo estimators of $\int_{\mathsf{E}} f \, \mathrm{d}\mu$, $f \in \mathrm{L}^2_0(\mu)$, defined by
\[ \lim_{T \to \infty} T \, \mathrm{Var}_\mu \Big( T^{-1} \int_0^T f(X_t, V_t) \, \mathrm{d}t \Big) \Big/ \|f\|_2^2 \leq 2 A / \alpha , \]
where $(X_t, V_t)_{t \geq 0}$ is a trajectory of a PDMP process of generator $\mathcal{L}$ with $(X_0, V_0)$ distributed according to $\mu$. For a class of problems of, say, increasing dimension $d \to \infty$, weak dependence of $A$ and $\alpha$ on $d$ indicates scalability of the method. It is worth pointing out that the result above is equivalent to the existence of $A \geq 1$ and $\alpha > 0$ such that, for any probability measure $\rho \ll \mu$ with $\|\mathrm{d}\rho / \mathrm{d}\mu\|_2 < \infty$,
\[ \|\rho P_t - \mu\|_{\mathrm{TV}} = \int_{\mathsf{E}} \big| \mathrm{d}(\rho P_t) / \mathrm{d}\mu - 1 \big| \, \mathrm{d}\mu \leq \big\| \mathrm{d}(\rho P_t) / \mathrm{d}\mu - 1 \big\|_2 \leq A \, \mathrm{e}^{-\alpha t} \big\| \mathrm{d}\rho / \mathrm{d}\mu - 1 \big\|_2 , \]
where the leftmost inequality is standard and a consequence of the Cauchy-Schwarz inequality. Our hypocoercivity result therefore also allows characterization of convergence to equilibrium of PDMPs in various scenarios and regimes, leading in particular to the possibility to compare performance of algorithms started from the same initial distribution. Establishing similar results for different metrics may be a useful complement to our characterization of algorithmic computational complexity and is left for future work.

In [46, 57], convergence of the type (7) is established using an appropriate $\mathrm{H}^1$-norm associated with $\mu$.
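To make the description of the Bouncy Particle Sampler with refreshment concrete, the following is a minimal simulation sketch under simplifying assumptions that are not taken from the paper: the target is the standard Gaussian, $U(x) = |x|^2/2$, the refreshment rate is constant, $\nu$ is the standard Gaussian (so $m_2 = 1$), and the canonical rate $(v^\top \nabla_x U(x))_+$ is used. For this particular potential the bounce time can be obtained by inverting the integrated rate in closed form; the function name `bps_gaussian` and its parameters are hypothetical.

```python
import math
import random

def bps_gaussian(n_events, lam_ref, d=2, seed=0):
    """Time-average estimate of E[x_1^2] under N(0, I_d) along a BPS trajectory."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    total_time, integral = 0.0, 0.0
    for _ in range(n_events):
        a = sum(xi * vi for xi, vi in zip(x, v))  # v^T grad U(x) at time 0
        b = sum(vi * vi for vi in v)              # slope of s -> v^T grad U(x + s v)
        e = rng.expovariate(1.0)
        # Solve int_0^t (a + b s)_+ ds = e for the bounce time t.
        if a >= 0.0:
            t_bounce = (-a + math.sqrt(a * a + 2.0 * b * e)) / b
        else:
            t_bounce = -a / b + math.sqrt(2.0 * e / b)
        t_ref = rng.expovariate(lam_ref)          # constant refreshment rate
        t = min(t_bounce, t_ref)
        # Exact integral of (x_1 + s v_1)^2 over the straight segment [0, t].
        integral += x[0] ** 2 * t + x[0] * v[0] * t ** 2 + v[0] ** 2 * t ** 3 / 3.0
        total_time += t
        x = [xi + t * vi for xi, vi in zip(x, v)]
        if t_ref < t_bounce:
            v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # refreshment: v drawn from nu
        else:
            g = x  # grad U(x) = x for this potential
            c = 2.0 * sum(vi * gi for vi, gi in zip(v, g)) / sum(gi * gi for gi in g)
            v = [vi - c * gi for vi, gi in zip(v, g)]    # bounce: reflection B_1
    return integral / total_time

estimate = bps_gaussian(20000, 1.0)
```

Because the trajectory is piecewise linear, the time average is computed exactly on each segment and no time discretization is involved. For a general potential the event time would typically be simulated by thinning rather than by the closed-form inversion used here.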
The method developed in these papers is closely related to hypoellipticity theory [39, 26, 37] for Partial Differential Equations, and in particular the kinetic Fokker-Planck equation. Convergence for linear Boltzmann equations was first derived in [36, 46]. Since then, several works have extended and completed these results [21, 35, 1, 14, 28, 45].

Notation and conventions
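The canonical basis of $\mathbb{R}^d$ is denoted by $(e_i)_{i \in \{1, \ldots, d\}}$ and the $d$-dimensional identity matrix by $\mathrm{I}_d$. The Euclidean norm on $\mathbb{R}^d$ or $\mathbb{R}^{d \times d}$ is denoted by $|\cdot|$, and is associated with the usual Frobenius inner product $\mathrm{Tr}(\Phi^\top \Gamma)$ for any $\Phi, \Gamma$ in $\mathbb{R}^d$ or $\mathbb{R}^{d \times d}$.

Let $\mathsf{M}$ be a smooth submanifold of $\mathbb{R}^n$, for $n \in \mathbb{N}$. For any $k \in \mathbb{N}$, denote by $\mathrm{C}^k(\mathsf{M}, \mathbb{R}^m)$ the set of $k$-times differentiable functions from $\mathsf{M}$ to $\mathbb{R}^m$; $\mathrm{C}^k_b(\mathsf{M}, \mathbb{R}^m)$ stands for the subset of bounded functions in $\mathrm{C}^k(\mathsf{M}, \mathbb{R}^m)$ with bounded differentials up to order $k$. $\mathrm{C}^k(\mathsf{M})$ and $\mathrm{C}^k_b(\mathsf{M})$ stand for $\mathrm{C}^k(\mathsf{M}, \mathbb{R})$ and $\mathrm{C}^k_b(\mathsf{M}, \mathbb{R})$ respectively.

For $f : \mathsf{X} \to \mathbb{R}$ and $i \in \{1, \ldots, d\}$, $x \mapsto \partial_{x_i} f(x)$ stands for the partial derivative of $f$ with respect to the $i$-th coordinate, if it exists. Similarly, for $f : \mathsf{X} \to \mathbb{R}$ and $i, j \in \{1, \ldots, d\}$, denote $\partial_{x_i, x_j} f = \partial_{x_i} \partial_{x_j} f$ when $\partial_{x_i} \partial_{x_j} f$ exists. For $f = (f_1, \ldots, f_m) \in \mathrm{C}^1(\mathsf{X}, \mathbb{R}^m)$, $\nabla_x f$ stands for the gradient of $f$, defined for any $x \in \mathsf{X}$ by $\nabla_x f(x) = (\partial_{x_j} f_i(x))_{i \in \{1, \ldots, m\},\, j \in \{1, \ldots, d\}} \in \mathbb{R}^{d \times m}$. For ease of notation, we also denote by $(\nabla_x, \mathrm{D}(\nabla_x))$ the densely defined closed extension of $(\nabla_x, \mathrm{C}^1_b(\mathsf{X}))$ on $\mathrm{L}^2(\pi)$, see [40, p. 88]. For any $f \in \mathrm{C}^k(\mathsf{X}, \mathbb{R}^m)$, $k \in \mathbb{N}$ and $p \geq 0$, define
\[ \|f\|_{k,p} = \sup_{x \in \mathsf{X}} \, \sup_{(i_1, \ldots, i_k) \in \{1, \ldots, d\}^k} \Big\{ \big\| \partial_{x_{i_1}, \ldots, x_{i_k}} f(x) \big\| \big/ (1 + \|x\|^p) \Big\} . \]
We set, for $k \geq 0$,
\[ \mathrm{C}^k_{\mathrm{poly}}(\mathsf{X}, \mathbb{R}^m) = \Big\{ f \in \mathrm{C}^k(\mathsf{X}, \mathbb{R}^m) : \inf_{p \geq 0} \|f\|_{k,p} < +\infty \Big\} , \]
and $\mathrm{C}^k_{\mathrm{poly}}(\mathsf{X})$ simply stands for $\mathrm{C}^k_{\mathrm{poly}}(\mathsf{X}, \mathbb{R})$. For any $f \in \mathrm{C}^2(\mathsf{X}, \mathbb{R})$, we let $\Delta_x f$ denote the Laplacian of $f$.

$\mathrm{Id}$ stands for the identity operator. For two self-adjoint operators $(\mathcal{A}, \mathrm{D}(\mathcal{A}))$ and $(\mathcal{B}, \mathrm{D}(\mathcal{B}))$ on a Hilbert space $\mathsf{H}$ equipped with the scalar product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, write $\mathcal{A} \succeq \mathcal{B}$ if $\langle f, \mathcal{A} f \rangle \geq \langle f, \mathcal{B} f \rangle$ for all $f \in \mathrm{D}(\mathcal{A}) \cap \mathrm{D}(\mathcal{B})$. Then, define $(\mathcal{A}\mathcal{B}, \mathrm{D}(\mathcal{A}\mathcal{B}))$ with domain, if not specified, $\mathrm{D}(\mathcal{A}\mathcal{B}) = \mathrm{D}(\mathcal{B}) \cap \{\mathcal{B}^{-1} \mathrm{D}(\mathcal{A})\}$. For a bounded operator $\mathcal{A}$ on $\mathsf{H}$, we let $|||\mathcal{A}||| = \sup_{f \in \mathsf{H}} \|\mathcal{A} f\| / \|f\|$. $\Pi$ is said to be an orthogonal projection if $\Pi$ is a bounded symmetric operator on $\mathsf{H}$ and $\Pi^2 = \Pi$. An unbounded operator $(\mathcal{A}, \mathrm{D}(\mathcal{A}))$ is said to be symmetric (respectively anti-symmetric) if for any $f, g \in \mathrm{D}(\mathcal{A})$, $\langle \mathcal{A} f, g \rangle = \langle f, \mathcal{A} g \rangle$ (respectively $\langle \mathcal{A} f, g \rangle = -\langle f, \mathcal{A} g \rangle$). If $\mathcal{A}$ is densely defined, $\mathcal{A}$ is said to be self-adjoint if $\mathcal{A} = \mathcal{A}^\star$. If in addition $\mathcal{A}$ is closed, $\mathrm{C} \subset \mathrm{D}(\mathcal{A})$ is said to be a core for $\mathcal{A}$ if the closure of $\mathcal{A}|_{\mathrm{C}}$ is $\mathcal{A}$. Denote by $1_{\mathsf{F}}$ the constant function equal to $1$ from a set $\mathsf{F}$ to $\mathbb{R}$. For any unbounded operator $(\mathcal{A}, \mathrm{D}(\mathcal{A}))$, we denote $\mathrm{Ran}(\mathcal{A}) = \{\mathcal{A} f : f \in \mathrm{D}(\mathcal{A})\}$ and $\mathrm{Ker}(\mathcal{A}) = \{f \in \mathrm{D}(\mathcal{A}) : \mathcal{A} f = 0\}$.

For any probability measure $\mathrm{m}$ on a measurable space $(\mathsf{M}, \mathcal{F})$, we denote by $\mathrm{L}^2(\mathrm{m})$ the Hilbert space of measurable functions $f$ satisfying $\int_{\mathsf{M}} f^2 \, \mathrm{d}\mathrm{m} < +\infty$, equipped with the inner product $\langle f, g \rangle_{\mathrm{m}} = \int_{\mathsf{M}} f g \, \mathrm{d}\mathrm{m}$, and $\mathrm{L}^2_0(\mathrm{m}) = \{ f \in \mathrm{L}^2(\mathrm{m}) : \int_{\mathsf{M}} f \, \mathrm{d}\mathrm{m} = 0 \}$. We will use the same notation for vector and matrix fields $\Phi, \Gamma \in (\mathbb{R}^d)^{\mathsf{M}}$ or $(\mathbb{R}^{d \times d})^{\mathsf{M}}$, i.e. $\langle \Phi, \Gamma \rangle_{\mathrm{m}} = \int_{\mathsf{M}} \mathrm{Tr}(\Phi^\top \Gamma) \, \mathrm{d}\mathrm{m}$, and no confusion should be possible. When $\mathrm{m} = \mu$ we replace the subscript $\mathrm{m}$ with $2$ in this notation. For any $x \in \mathsf{M}$ denote by $\delta_x$ the Dirac distribution at $x$. We define the total variation distance between two probability measures $\mathrm{m}_1, \mathrm{m}_2$ on $(\mathsf{M}, \mathcal{F})$ by $\|\mathrm{m}_1 - \mathrm{m}_2\|_{\mathrm{TV}} = \sup_{A \in \mathcal{F}} |\mathrm{m}_1(A) - \mathrm{m}_2(A)|$.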
For a square matrix $A$ we let $\mathrm{diag}(A)$ be its main diagonal, and for a vector $v \in \mathbb{R}^d$ we let $\mathrm{diag}(v)$ be the square matrix of diagonal $v$ and with zeros elsewhere. For $a, b \in \mathbb{R}$ we let $a \wedge b$ denote their minimum. For $a, b \in \mathbb{R}^d$ ($A, B \in \mathbb{R}^{d \times d}$), we denote by $a \odot b \in \mathbb{R}^d$ ($A \odot B \in \mathbb{R}^{d \times d}$) the Hadamard product between $a$ and $b$, defined for any $i \in \{1, \ldots, d\}$ ($i, j \in \{1, \ldots, d\}$) by $(a \odot b)_i = a_i b_i$ ($(A \odot B)_{i,j} = A_{i,j} B_{i,j}$). For any $i, j \in \mathbb{N}$, $\delta_{i,j}$ denotes the Kronecker symbol, which is $1$ if $i = j$ and $0$ otherwise. For any $n_1, n_2 \in \mathbb{N}$ with $n_1 < n_2$, we let $\sum_{n_2}^{n_1} = 0$.

We now state our main results. In the following, for any densely defined operator $(\mathcal{C}, \mathrm{D}(\mathcal{C}))$ we let $(\mathcal{C}^\star, \mathrm{D}(\mathcal{C}^\star))$ denote its $\mathrm{L}^2(\mu)$-adjoint. First we specify conditions imposed on the potential $U$.

H1.
The potential U ∈ C ( X ) and satisfies(a) there exists c ≥ such that, for any x ∈ X , ∇ x U ( x ) (cid:23) − c I d ;(b) lim inf | x |→∞ (cid:8) |∇ x U ( x ) | / − ∆ x U ( x ) (cid:9) > . From [50, 3], H π satisfies a Poincaré inequality on X , thatis the existence of C P > f ∈ C ( X ) satisfying R X f d π = 0, k∇ x f k ≥ C P k f k . (8)Further, H c > ̟ ≥ x ∈ X ,∆ x U ( x ) ≤ c d ̟ + |∇ x U ( x ) | / . (9) H d infront of c will appear natural in the sequel. We have opted for this formulation of the assumptionrequired of the potential to favour intuition and link it to the necessary and sufficient condition forgeometric convergence of Langevin diffusions, but our quantitative bounds below will be given in6erms of the Poincaré constant C P for simplicity (see [4, Section 4.2] for quantitative estimates of C P depending on potentially further conditions on U ). H x ∈ X {|∇ x U ( x ) | / (1 + |∇ x U ( x ) | ) } < ∞ and rephrase our results interms of any finite upper bound of this quantity (see [21, Sections 2 and 3]). Finally the Poincaréinequality (8) implies by [4, Proposition 4.4.2] that there exists s > Z R d e s | x | d π ( x ) < + ∞ . (10) H2.
The family of vector fields { F k : X → R d ; k ∈ { , . . . , K }} satisfies(a) for k ∈ { , . . . , K } , F k ∈ C ( X , R d ) ;(b) for all x ∈ X , ∇ x U ( x ) = P Kk =0 F k ( x ) ;(c) for all k ∈ { , . . . , K } there exists a k ≥ such that for all x ∈ X , | F k | ( x ) ≤ a k { |∇ x U | ( x ) } . (11)This assumption is in particular trivially true for the Zig-Zag and the Bouncy Particle Samplers.In turn we assume the jump rates to be related to the family of vector fields { F k : X → R d ; k ∈{ , . . . , K }} through the following conditions. H3.
There exist a continuous function ϕ : R → R + , C ϕ ≥ and c ϕ ≥ satisfying for any s ∈ R , ϕ ( s ) − ϕ ( − s ) = s , and | s | ≤ ϕ ( s ) + ϕ ( − s ) ≤ c ϕ m / + C ϕ | s | , (12) such that for any k ∈ { , . . . , K } and ( x, v ) ∈ E , λ k ( x, v ) = ϕ (cid:0) v ⊤ F k ( x ) (cid:1) . We note that the canonical choice ϕ ( s ) = ( s ) + satisfies these conditions and that the firstcondition of (12) is equivalent to ϕ ( s ) − ( s ) + = ϕ ( − s ) − ( − s ) + , implying that ϕ ( s ) ≥ ( s ) + forall s ∈ R and therefore that the left hand side inequality in (12) is automatically satisfied. If wefurther assume the existence of C, c ≥ s ∈ R , ϕ ( s ) ≤ cm / + C ( s ) + then thesecond inequality is satisfied with C ϕ = C and c ϕ = 2 c . As remarked in [2], the first condition of(12) holds for rates based on the choice ϕ ( s ) := − log (cid:0) φ (cid:0) exp( − s ) (cid:1)(cid:1) , such that φ : R + → [0 ,
1] satisfies rφ ( r − ) = φ ( r ) for all r ∈ R + \ { } . The canonical choicecorresponds to φ ( r ) = 1 ∧ r , but the (smooth) choice φ ( r ) = r/ (1 + r ) is also possible. H4.
Assume that V and ν satisfy the following conditions.(a) V is stable under bounces, i.e. for all ( x, v ) ∈ E and k ∈ { , . . . , K } , v − v ⊤ n k ( x )) n k ( x ) ∈ V ,where n k ( x ) is defined by (4) .(b) For any A ∈ V , x ∈ X , we have ν (cid:0)(cid:8) Id − k ( x )n k ( x ) ⊤ (cid:9) A (cid:1) = ν ( A ) , for any k ∈ { , . . . , K } .(c) For any bounded and measurable function g : R → R , i, j ∈ { , . . . , d } such that i = j , R V g ( v i , v j ) d ν ( v ) = R V g ( − v , v ) d ν ( v ) ; d) ν has finite fourth order marginal moment m = (1 / (cid:13)(cid:13) v (cid:13)(cid:13) = (1 / Z V v d ν ( v ) < + ∞ , and for any i, j, k, l ∈ { , . . . , d } such that card( { i, j, k, l } ) > , R V v i v j v k v l d ν ( v ) = 0 . Note that in the case where V and ν are rotation invariant, i.e. for any rotation O on R d , O V = V and for any A ∈ V , ν ( O A ) = ν ( A ), then H H R V v v d ν ( x ) = 0 taking g ( v , v ) = v v for any ( v , v ) ∈ R and thereforefor any i, j ∈ { , . . . , d } such that i = j , R V v i v j d ν ( v ) = 0. In addition, under H m , = k v v k = Z V v v d ν ( v ) < ∞ , and note that in the Gaussian case we have the relation m = m , = m . Finally, under H
4, forany f, g ∈ L ( µ ) and k ∈ { , . . . , K } , hB k f, g i = h f, B k g i , that is B k is symmetric on L ( µ ).In this paper we consider operators ( R v , D( R v )) on L ( µ ) satisfying the following conditions. H5. (a) Functions depending only on the position belong to the kernel of R v : L ( π ) ⊂ D( R v ) and for any f ∈ L ( π ) , R v f = 0 ;(b) R v satisfies the detailed balance condition: R v = R ⋆v and C ( E ) ⊂ D( R v ) ;(c) R v admits a spectral gap of size on L ( ν ) : for any g ∈ L ( ν ) ∩ D( R v ) , h−R v g, g i ≥ k g k ; inaddition, for any f ∈ L ( π ) , it holds for any i ∈ { , . . . , d } , v i f ∈ D( R v ) and −R v ( v i f ) = v i f . Typically, R v is of the form Id ⊗ ˜ R v where ( ˜ R v , D( ˜ R v )) is a self-adjoint operator on L ( ν ) withspectral gap equals 1. Then, condition H R v (1 V ) = 0, which implies that forany g ∈ D( ˜ R v ), we have Z V R v g d ν = h E , R v g i = hR ⋆v (1 V ) , g i = hR v (1 V ) , g i = 0 , so that the process associated with ˜ R v preserves the probability measure ν .Note that H R v Π v = 0, whereas H −R v ( v Π v ) = v Π v ,where Π v is defined by (6). Assumption H R v = Π v , or R v = Id ⊗ ˜ R v with ˜ R v the generator of the Ornstein-Uhlenbeck process defined for any g ∈ C ( R d ) by˜ R v g = −∇ v g ⊤ v + ∆ v g . H 6.
The refreshment rate $\lambda_{\mathrm{ref}} : \mathsf{X} \to \mathbb{R}_+$ is bounded from below and from above as follows: there exist $\lambda > 0$ and $c_\lambda \geq 0$ such that for all $x \in \mathsf{X}$, $0 < \lambda \leq \lambda_{\mathrm{ref}}(x) \leq \lambda (1 + c_\lambda |\nabla_x U(x)|)$.

Under the previous assumptions we can prove exponential convergence of the semigroup.

Theorem 1.
Assume that L i , i ∈ { , } given by (1) or (3) satisfies A C = C ( E ) and H H H H H H A > and α > such that, for any f ∈ L ( µ ) ,and t ∈ R + , k P t f k ≤ A e − αt k f k . The constants A and α are given in explicit form in (20) in Theorem 4 (Section 3), in terms ofthe constant appearing in H H H H H
6, where ǫ can be taken to be ǫ given in (22) , λ v = λ , λ x = C P / (1 + C P ) and R = (4 + 2 √ ∨ ( λ/ / ) ∨ R where R = p m , + 3( m − m , ) + m ( / (1 + C ϕ ) κ κ K X k =1 a k + κ ) + λ / (cid:26) c λ κ κ (cid:27) + c ϕ K / , (13) κ = (1 + c / / and κ − = C − (1 + 4 c d ̟ + 16 C ) / .Proof. The proof is postponed to Section 4.1.The following details the expected scaling behaviour with d of A and α . The proof can be foundin Section 4.3. Corollary 2.
Consider the assumptions and notation of Theorem 1. Further suppose that thereexists m b > satisfying m − q m , + 3( m − m , ) + ≤ m b , (14) which together with C P , c , c and k a k ∞ = sup k ∈{ ,...,K } a k are independent of d . Then A ≤ / and there exists C α ( C P , c , c , k a k ∞ , m b ) > , independent of d, λ, c λ and C ϕ , c ϕ , such that for d large enough, α > C α ( C P , c , c , k a k ∞ , m b ) λ m / (cid:2) { c ϕ K } ∨ { (1 + C ϕ ) d (1+ ̟ ) / K + 1 } ∨ { λ (1 + c λ d (1+ ̟ ) / ) } (cid:3) − . (15) Thus, if λ , c λ , C ϕ and c ϕ are fixed, we get that α − is in general at most of order O ( m − / d ̟ K ) if K ≥ . We now discuss the assumptions of the theorem, and application of its conclusion to variousinstances of PDMP-MC and two examples of potentials. Assumption H H H H H A ( E ) is indeed a core for the generator L . As shown in [25], BPS and ZZ arewell defined Markov process whose generators admit C ( E ) as a core and similar arguments can beused to establish that it is also a core for the RHMC. Further, it is not difficult to show that forthe class of processes described earlier, for any f ∈ C ( E ), hL f, i = 0, therefore implying that µ is an invariant distribution and that A m / , since if( X t , V t ) t ≥ is a PDMP with generator of the form (1) or (3) for m = 1, then ( X m / t , m / V m / t ) t ≥ is a PDMP with generator of the same form with m = m . We therefore set m = 1 below, acondition satisfied when ν is the uniform distribution on the sphere √ d S d − or {− , } d , or the d -dimensional zero-mean Gaussian distribution with covariance matrix I d , all of which also satisfy(14). 
More generally, by Lemma 36 in Appendix D, property (14) is satisfied if ν is a sphericallysymmetric distribution on R d corresponding to random variables V = B / W for W uniformly9 α U ( x ) = P di =1 (1 + x i ) β / U ( x ) = (1 + | x | ) β RHMC Θ(1) ω (cid:0) λ ∧ λ − (cid:1) BPS Θ (cid:0) d (1+ ̟ ) / (cid:1) ω (cid:0) d − (1+ ̟ ) / (cid:1) β ≥ ̟ = 0 β ≥ ̟ = 1 − /β ZZ (crude) Θ (cid:0) d (3+ ̟ ) / (cid:1) ω (cid:0) d − (3+ ̟ ) / (cid:1) ZZ (Section 5) Θ(1) ω (1) β ≥ β = 2Table 1: Left hand side: summary of the dependence of α on d for C P , c , c , k a k ∞ constant, m = 1and optimal choice of λ . Right hand side: summary of application to two examples of potentials.distributed on the hypersphere √ d S d − and B a non-negative random variable independent of W and of first and second order moments γ and γ respectively such that γ / /γ is upper boundedby a constant independent of the dimension.By [4, Proposition 5.1.3, Corollary 5.7.2], independence of C P on d is satisfied for strongly convexpotentials U : i.e. whenever there exists m > ∇ x U ( x ) (cid:23) m I d for any x ∈ R d whichimplies that one can take C P = m . This is the case for U ( x ) = P di =1 (cid:0) x i (cid:1) β / U ( x ) = (1+ | x | ) β with β ≥
1, for which (9) is also satisfied with ̟ = 0 and ̟ = 1 − /β respectively (see Lemma 40and Lemma 41 in Appendix F). We note that from the Holley-Stroock perturbation principle [38],uniformly bounded perturbations of a strongly convex potential lead to independence of C P on d .For β ∈ [1 / , C P >
0, but is dependent on d , see [4, Chapter 4]. However recent progress in theprecise quantitative estimation of spectral gaps of certain probability measures [9, 10] allows for thestrong convexity property to be relaxed to simple convexity and beyond, but leads to a dependenceof C P on d which can be characterised.Now further assume that C ϕ , c ϕ and that the refreshment rate are uniformly bounded in the posi-tion x , implying c λ = 0. Then by Corollary 2-(15), there exists C α ( C P , c , c , k a k ∞ , m b , c ϕ , C ϕ ) > d sufficiently large α ≥ C α ( C P , c , c , k a k ∞ , m b , c ϕ , C ϕ ) n(cid:2) λ (cid:0) K d ̟ (cid:1) − / (cid:3) ∧ λ − o , from which we deduce the optimal scaling of the refreshment rate, namely C λ (cid:0) K d ̟ (cid:1) / ≤ λ ≤ C λ (cid:0) K d ̟ (cid:1) / for C λ , C λ > (cid:0) (1 + K d ̟ ) / (cid:1) hereafter to al-leviate notation). Using the description of RHMC, ZZ and BPS provided in the introductionwe deduce the first three lines of Table 1, where α = ω ( s ) is used as a short hand notation for α ≥ C α ( C P , c , c , k a k ∞ , m b , c ϕ , C ϕ ) s for s →
0. The fourth line uses our specialised results of Section 5, showing that the conclusion of Corollary 2 is not optimal for ZZ.

In [7] scaling limits of particular functionals of the ZZ and BPS processes are studied, leading to quantitative estimates of the time required to achieve near independence at equilibrium. More specifically they consider the scenario where the target distribution is a centred normal distribution of covariance matrix $\mathrm{I}_d$ and focus on the angular momentum, the negative log-target density and the first coordinate of the process. Our more general results, obtained using a different argument, are in agreement after noticing that [7] considered the scenario $m_2 = d^{-1}$ and using our earlier remark on the dependence of our estimate of the absolute spectral gap on $m_2^{1/2}$. In [19] it is shown, again using an approach different from ours, that the RHMC has a dimension-free convergence rate in a scenario similar to ours.

While nonreversibility of the processes considered here may be practically beneficial, it is only recently that the tools allowing our work have been developed [56, 57]. Our method of proof relies on the framework proposed recently in [20, 21, 13] to study the solutions of the forward Kolmogorov equation associated with the linear kinetic process, but we study the dual backward Kolmogorov equation for a broader class of processes, as is the case in [31, 32, 33], who provide the first rigorous derivation of the results of [20, 21, 13]. This, combined with the flexibility of the framework of [21, 13], explains the differing inner product used throughout, which we have found to lead to simpler computations while yielding identical conclusions. The estimate (7) (with constant $A = 1$) would follow straightforwardly from a Grönwall argument if the generator $\mathcal{L}$ of the semigroup was coercive, that is if it satisfied $\langle \mathcal{L} f, f \rangle_2 \leq -a \|f\|_2^2$ for some $a > 0$ and any $f$ in a core of $\mathcal{L}$.
Unfortunately, the symmetric part of the generator corresponding to a PDMP is degenerate in general, in the sense that it has a nontrivial null space. Hence, the aforementioned coercivity clearly fails to hold. However, it is possible to equip $\mathrm{L}^2(\mu)$ with an equivalent scalar product derived from $\langle \cdot, \cdot \rangle_2$ with respect to which $\mathcal{L}$ is coercive. The constant $\alpha$ is then given by the coercivity bound, while the constant $A$ can be obtained from estimates relating the two equivalent scalar products.

The paper is organised as follows. In Section 3 we develop our framework for hypocoercivity suited to PDMP-MC processes, based on the ideas of [21]. In addition to providing a rigorous framework we further optimize the constants involved, ultimately leading to Theorem 1. The proofs of Theorem 1 and its corollary are given in Section 4. In Section 5, we specialize our results to the case of the Zig-Zag process, for which better estimates are possible, leading to attractive scaling properties with the dimension $d$. Various intermediate technical results have been moved to Appendices where, for completeness, we have also included classical facts from functional analysis.

As stated above our results rely on the ideas proposed by [20, 21, 13], for which a rigorous framework was subsequently given in [31, 32, 33, 34]. We derive here a novel proof, which borrows elements of [31, 32, 33, 34] but leads to a different set of conditions motivated by our application to PDMP-Monte Carlo methods. We further provide explicit and optimized estimates of the constants involved in terms of accessible characteristics of the process. We first present abstract results which form the core of all of our proofs and then establish more specific ones common to all the processes considered in this paper, implying some of the abstract conditions. More specific results relating to the Zig-Zag process are treated in Section 5.
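The role of an equivalent scalar product can be seen on a two-dimensional toy problem. The sketch below is an illustration only and is not taken from the paper: the matrix `L`, the modified scalar product `G` and the value of `eps` are hypothetical choices. The linear system $\dot f = L f$ has a degenerate symmetric part, so $\langle L f, f \rangle$ is not coercive for the Euclidean scalar product, yet it is coercive for $\langle f, g \rangle_G = f^\top G g$, which yields a bound of the form (7) with explicit $A$ and $\alpha$.

```python
import math

# Toy generator on R^2: rotation (antisymmetric part) plus damping in the
# second coordinate only (degenerate symmetric part).
L = [[0.0, 1.0], [-1.0, -1.0]]

def sym_eigs(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], in increasing order."""
    mean, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mean - rad, mean + rad

# Symmetric part S = (L + L^T) / 2 = diag(0, -1): <L f, f> = -f_2^2 vanishes at f = (1, 0).
S_eigs = sym_eigs(0.0, 0.0, -1.0)

# Equivalent scalar product <f, g>_G = f^T G g with G = [[1, eps], [eps, 1]], |eps| < 1.
eps = 0.5
G_eigs = sym_eigs(1.0, eps, 1.0)

# d/dt <f, f>_G = f^T Q f with Q = G L + L^T G = [[-2 eps, -eps], [-eps, 2 eps - 2]].
Q_eigs = sym_eigs(-2.0 * eps, -eps, 2.0 * eps - 2.0)

# Coercivity in the G-norm: f^T Q f <= Q_max |f|^2 <= (Q_max / G_max) <f, f>_G with Q_max < 0,
# hence |f_t| <= A exp(-alpha t) |f_0| by Gronwall and equivalence of the two norms.
alpha = -0.5 * Q_eigs[1] / G_eigs[1]
A = math.sqrt(G_eigs[1] / G_eigs[0])
```

The off-diagonal term of `G` plays the part of the correction term added to the original scalar product in the hypocoercivity argument: it couples the undamped coordinate to the damped one, and its size `eps` trades off the decay rate against the constant `A`.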
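We let $\mathcal{S}$ and $\mathcal{T}$ be the $\mathrm{L}^2(\mu)$-symmetric and $\mathrm{L}^2(\mu)$-anti-symmetric parts of a generator $\mathcal{L}$ satisfying A1, that is
\[ \mathcal{S} = (\mathcal{L} + \mathcal{L}^\star)/2 , \qquad \mathcal{T} = (\mathcal{L} - \mathcal{L}^\star)/2 , \qquad \text{defined on } \mathrm{D}(\mathcal{S}) = \mathrm{D}(\mathcal{T}) = \mathrm{C} . \tag{16} \]
Consider the following additional assumption to A1.

A2. $\Pi_v \mathrm{C} \subset \mathrm{C}$ and $(\mathcal{T}\Pi_v, \mathrm{C})$ is a closable operator, where $\Pi_v$ is defined by (6) and $\mathrm{C}$ is given in A1.

Note that since $\Pi_v \mathrm{C} \subset \mathrm{C}$, we have $\mathrm{C} \subset \mathrm{D}(\mathcal{T}\Pi_v)$ and the restriction of $\mathcal{T}\Pi_v$ to $\mathrm{C}$ exists. Under A1 and A2, Lemma 28 in Appendix B justifies the definition of the operator $\mathcal{A}$,
\[ \mathcal{A} = \big( m_2 \, \mathrm{Id} + (\mathcal{T}\Pi_v)^\star \, \overline{\mathcal{T}\Pi_v} \big)^{-1} (-\mathcal{T}\Pi_v)^\star , \qquad \mathrm{D}(\mathcal{A}) = \mathrm{D}((\mathcal{T}\Pi_v)^\star) , \tag{17} \]
where $m_2$ is given by (2), and $(\overline{\mathcal{T}\Pi_v}, \mathrm{D}(\overline{\mathcal{T}\Pi_v}))$ and $((\mathcal{T}\Pi_v)^\star, \mathrm{D}((\mathcal{T}\Pi_v)^\star))$ are the closure and the adjoint of $(\mathcal{T}\Pi_v, \mathrm{C})$ respectively. Key properties are that $\mathrm{Ran}(\mathcal{A}) \subset \mathrm{D}(\overline{\mathcal{T}\Pi_v})$, that $\mathcal{A}$ is closable with bounded closure $\overline{\mathcal{A}}$, and that $\overline{\mathcal{T}\Pi_v} \mathcal{A}$ is also closable with bounded closure. To show this result we adapt [31, Lemma 2.4], since their lemma assumes that $(\mathcal{T}, \mathrm{D}(\mathcal{T}))$ is closed whereas, motivated by our applications, we assume $(\mathcal{T}\Pi_v, \mathrm{C})$ to be a densely defined and closable operator instead.

Lemma 3.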
Let ( T , D( T )) be a anti-symmetric densely defined operator on L ( µ ) . Assume thatthere exists D ⊂ D( T Π v ) ∩ D( T ) , such that ( T Π v , D ) is a densely defined closable operator.(a) The closure of ( T Π v , D ) , ( T Π v , D( T Π v )) satisfies D( T Π v ) ⊂ D((Π v T ) ⋆ ) and for any f ∈ D( T Π v ) , (Π v T ) ⋆ f = −T Π v f , where (( T Π v ) ⋆ , D(( T Π v ) ⋆ )) is the adjoint of ( T Π v , D ) .(b) The operator A defined by (17) satisfies Ran( A ) ⊂ D( T Π v ) , is closable and its closure A isa bounded operator on L ( µ ) with (cid:23)(cid:23) A (cid:23)(cid:23) ≤ / (2 m ) / and Π v A = A on L ( µ ) .(c) Assume in addition that for any f ∈ D , Π v T Π v f = 0 . Then, ( T Π v A , D( T Π v A )) is alsoclosable and its closure E is bounded and satisfies for any f ∈ L ( µ ) , kE f k ≤ k (Id − Π v ) f k .Proof. To establish this result, we make use of classical results on unbounded operators in Hilbertspaces which for completeness, are given in Appendix B.(a) Since T is assumed to be anti-symmetric, we have for any f ∈ D( T Π v ), g ∈ D , h Π v T f, g i = − h f, T Π v g i since Π v g ∈ D( T ) as D ⊂ D( T Π v ). By definition of ( T Π v ) ⋆ , we obtain that D ⊂ D((Π v T ) ⋆ ), and for any f ∈ D , T Π v f = − (Π v T ) ⋆ f . Therefore { ( f, T Π v f ) : f ∈ D } ⊂{ ( f, − (Π v T ) ⋆ f ) : f ∈ D((Π v T ) ⋆ ) } , and we obtain the desired result by definition of ( T Π v , D( T Π v ))since − (Π v T ) ⋆ is closed by [49, Theorem 5.1.5].(b) The fact that Ran( A ) ⊂ D( T Π v ), A is closable and the bound follow directly from Lemma 28and Proposition 26-(a)-(d). We turn to the statement Π v A = A . By Lemma 28, the operator C = ( m Id +( T Π v ) ⋆ ( T Π v )) − is well-defined, bounded and Ran( C ) = D(( T Π v ) ⋆ ( T Π v )). Thereforeusing Lemma 30-(a) (since T Π v is densely defined), we have for any f ∈ D( T ), A f = C Π v T f = m − (cid:8) Id − ( T Π v ) ⋆ ( T Π v ) C (cid:9) Π v T f . 
(18)Therefore, by applying Π v to both sides and using Lemma 30-(b), we deduce that for any f ∈ D( T ),Π v A f = A f . The proof is then concluded upon noting that D( T ) is dense and Π v is continuous.(c) For any f ∈ D , since Π v T Π v f = 0, (18) becomes A f = C Π v T (Id − Π v ) f = m − (cid:8) Id − ( T Π v ) ⋆ ( T Π v ) C (cid:9) Π v T (Id − Π v ) f . Therefore, we get for any f ∈ D , kA f k = m − (cid:8) h Π v T (Id − Π v ) f, A f i − (cid:10) ( T Π v ) ⋆ ( T Π v ) C Π v T (Id − Π v ) f, A f (cid:11) (cid:9) = m − (cid:8) h− ( T Π v ) ⋆ (Id − Π v ) f, A f i − (cid:10) ( T Π v ) ⋆ ( T Π v ) C Π v T (Id − Π v ) f, A f (cid:11) (cid:9) = m − n − (cid:10) (Id − Π v ) f, ( T Π v ) A f (cid:11) − (cid:13)(cid:13) ( T Π v ) A f (cid:13)(cid:13) o , − Π v ) f ∈ D( T ) since f ∈ D ⊂ D( T Π v ), Lemma 30 and A f ∈ D( T Π v ).Using the Cauchy-Schwarz inequality we obtain that for any f ∈ D , k ( T Π v ) A f k ≤ k (Id − Π v ) f k .Using that D is dense in L ( µ ) together with the bounded linear transformation extension theorem[53, Theorem I.7] concludes the proof.The main result of [21] can be formulated under the following abstract assumption, which weshall assume to hold from now on, and the proof of our main theorem relies on optimized estimatesof the constants involved. A 3 (DMS abstract conditions) . Let C be as in A
1. Assume further that it satisfies A λ v > satisfying for any f ∈ C − hS f, f i ≥ λ v m / k (Id − Π v ) f k ; (b) there exists λ x ∈ (0 , satisfying for any f ∈ C − (cid:10) AT Π v f, f (cid:11) ≥ λ x k Π v f k ; (19) (c) there exists R ≥ satisfying for any f ∈ C (cid:12)(cid:12)(cid:10) AT (Id − Π v ) f, f (cid:11) + (cid:10) AS f, f (cid:11) (cid:12)(cid:12) ≤ R k (Id − Π v ) f k k Π v f k ; (d) for any f ∈ C , Π v T Π v f = 0 ;(e) finally, Ran(Π v ) ⊂ Ker( S ⋆ ) . Theorem 4.
Assume A A A f ∈ L ( µ ) , t ∈ R + and ǫ ∈ (0 , (2 / λ v ) − ∧ { λ x / (4 λ x + R ) } ) k P t f k ≤ A ( ǫ )e − α ( ǫ ) t k f k , with α ( ǫ ) = λ v m / Λ( ǫ )1 + 2 / λ v ǫ > and A ( ǫ ) = s / λ v ǫ − / λ v ǫ , (20) where Λ( ǫ ) = 1 − ǫ (1 − λ x ) − p [1 − ǫ (1 − λ x )] − ǫλ x (1 − ǫ ) + ǫ R . (21) (b) Further, if / R ≥ λ v then α : (cid:0) , λ x / (4 λ x + R ) (cid:1) → R + has a unique maximum at ǫ ⋆ suchthat α ( ǫ ) < α ( ǫ ⋆ ) < α ( ǫ ) , with ǫ = 1 + λ x − (1 − λ x ) q R R +4 λ x (1 + λ x ) + R ∈ (0 , (2 / λ v ) − ∧ { λ x / (4 λ x + R ) } ) , (22) so that A ( ǫ ) < + ∞ is well defined. In addition, if R ≥ then ǫ < λ x / (4 λ x + R ) . ε ∈ R + (instead of the L ( µ ) norm, which corresponds to ε = 0) H ε ( f ) = (1 / k f k + ε (cid:10) f, A f (cid:11) , for which ( P t ) t ≥ is exponentially contracting. More precisely, [21, Theorem 2] shows that forsome ε ∈ ( − ( m / / , ( m / / ) there exists α ( ε ) > f ∈ L ( µ ), H ε ( P t f ) ≤ e − α ( ε ) t H ε ( f ). Then, the convergence in L ( µ ) follows by Lemma 3-(b) which implies that H ε ( · )defines a norm which is equivalent to k · k : for ε ∈ ( − ( m / / , ( m / / ) and for any f ∈ L ( µ ),it holds (1 − ( m / − / ε ) k f k ≤ H ε ( f ) ≤ (1 + ( m / − / ε ) k f k . (23)Therefore, for a family (cid:8) f t ∈ L ( µ ) (cid:9) t ≥ , exponential decay of t H ε ( f t ) is equivalent to thatof t
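$\mapsto \|f_t\|_2$, a property exploited in the following proof. We first establish the following result, which gives estimates of the functionals $\{F_i : i \in \{1, 2, 3\}\}$ defined for any $g \in \mathrm{D}(\mathcal{L})$ by
$$F_1(g) = \langle \mathcal{L} g, g \rangle_2 , \qquad F_2(g) = \langle \mathcal{L} g, \bar{\mathcal{A}} g \rangle_2 , \qquad F_3(g) = \langle \bar{\mathcal{A}} \mathcal{L} g, g \rangle_2 . \tag{24}$$

Lemma 5. Assume that $\mathcal{L}$ satisfies A1, A2 and A3. Then, for any $g \in \mathrm{D}(\mathcal{L})$, we have
$$F_1(g) \le -\lambda_v m_2^{1/2} \|(\mathrm{Id} - \Pi_v) g\|_2^2 , \qquad F_2(g) \le \|(\mathrm{Id} - \Pi_v) g\|_2^2 , \qquad F_3(g) \le -\lambda_x \|\Pi_v g\|_2^2 + R\, \|(\mathrm{Id} - \Pi_v) g\|_2 \, \|\Pi_v g\|_2 . \tag{25}$$

Proof.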
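Note that since $\mathrm{C}$ is a core for $\mathcal{L}$ and $\bar{\mathcal{A}}$ and $\Pi_v$ are bounded, we only need to show that (25) holds for all $g \in \mathrm{C}$. In addition, since $\bar{\mathcal{A}}$ is an extension of $\mathcal{A}$ by Lemma 3-(b), and $\mathrm{C} \subset \mathrm{D}((\mathcal{T}\Pi_v)^\star) = \mathrm{D}(\mathcal{A})$ from Lemma 30-(a) as $\Pi_v(\mathrm{C}) \subset \mathrm{C} = \mathrm{D}(\mathcal{T})$ by A2, we deduce that for any $g \in \mathrm{C}$,
$$\bar{\mathcal{A}} g = \mathcal{A} g . \tag{26}$$
Using that $\mathcal{S}$ is symmetric, $\mathcal{T}$ is anti-symmetric and $\mathrm{C} \subset \mathrm{D}(\mathcal{L}) \cap \mathrm{D}(\mathcal{L}^\star)$, we get that for any $g \in \mathrm{C}$, $F_1(g) = \langle \mathcal{S} g, g \rangle_2 \le -\lambda_v m_2^{1/2} \|(\mathrm{Id} - \Pi_v) g\|_2^2$ by A3-(a).

Since $\Pi_v \bar{\mathcal{A}} = \bar{\mathcal{A}}$ by Lemma 3-(b) and (26), we have for any $g \in \mathrm{C}$,
$$F_2(g) = \langle \Pi_v \mathcal{A} g, \mathcal{S} g \rangle_2 + \langle \Pi_v \mathcal{A} g, \mathcal{T} g \rangle_2 = \langle \Pi_v \mathcal{A} g, \mathcal{T} g \rangle_2 ,$$
where the last equality follows from $\mathrm{Ran}(\Pi_v) \subset \mathrm{Ker}(\mathcal{S}^\star)$. In addition, since $\Pi_v$ is symmetric, $\Pi_v \mathcal{T} \Pi_v g = 0$, $\mathrm{Ran}(\mathcal{A}) \subset \mathrm{D}(\overline{\mathcal{T}\Pi_v}) \subset \mathrm{D}((\Pi_v \mathcal{T})^\star)$ by Lemma 3-(a)-(b), so that $(\Pi_v \mathcal{T})^\star \mathcal{A} = -\overline{\mathcal{T}\Pi_v}\, \mathcal{A}$ by Lemma 3-(a), and $\|\overline{\mathcal{T}\Pi_v}\, \mathcal{A} g\|_2 \le \|(\mathrm{Id} - \Pi_v) g\|_2$ by Lemma 3-(c), we obtain for any $g \in \mathrm{C}$,
$$F_2(g) = \langle \mathcal{A} g, \Pi_v \mathcal{T} (\mathrm{Id} - \Pi_v) g \rangle_2 = \langle (\Pi_v \mathcal{T})^\star \mathcal{A} g, (\mathrm{Id} - \Pi_v) g \rangle_2 = -\langle \overline{\mathcal{T}\Pi_v}\, \mathcal{A} g, (\mathrm{Id} - \Pi_v) g \rangle_2 \le \|(\mathrm{Id} - \Pi_v) g\|_2^2 .$$

Finally, using A3-(b)-(c) and $g \in \mathrm{C} \subset \mathrm{D}(\mathcal{L}) \cap \mathrm{D}(\mathcal{L}^\star) \cap \mathrm{D}(\mathcal{T}\Pi_v)$,
$$F_3(g) = \langle \bar{\mathcal{A}} \mathcal{T} \Pi_v g, g \rangle_2 + \langle \bar{\mathcal{A}} \mathcal{T} (\mathrm{Id} - \Pi_v) g, g \rangle_2 + \langle \bar{\mathcal{A}} \mathcal{S} g, g \rangle_2 \le -\lambda_x \|\Pi_v g\|_2^2 + R\, \|(\mathrm{Id} - \Pi_v) g\|_2 \, \|\Pi_v g\|_2 .$$

Proof of Theorem 4. The first part of the proof follows along the same lines as [31, Theorem 2.18]. Let $f \in L^2(\mu)$ satisfy $\int_{\mathsf{E}} f \,\mathrm{d}\mu = 0$ and $\varepsilon >$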
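0. For ease of notation, set for any $t \ge 0$, $f_t = P_t f$. From the Dynkin formula [27, Proposition 1.5], for any $t > 0$, $f_t \in \mathrm{D}(\mathcal{L})$ and $\mathrm{d} f_t / \mathrm{d} t = \mathcal{L} f_t$. Therefore, for any $t > 0$,
$$-\frac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}_\varepsilon(f_t) = -\big[ F_1(f_t) + \varepsilon \{ F_2(f_t) + F_3(f_t) \} \big] ,$$
where $\{F_i : i \in \{1, 2, 3\}\}$ are defined in (24). Then by Lemma 5, we obtain that for any $t > 0$,
$$\begin{aligned}
-\frac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}_\varepsilon(f_t)
&\ge \lambda_v m_2^{1/2} \|(\mathrm{Id} - \Pi_v) f_t\|_2^2 + \varepsilon \Big[ \lambda_x \|\Pi_v f_t\|_2^2 - \|(\mathrm{Id} - \Pi_v) f_t\|_2^2 - R\, \|(\mathrm{Id} - \Pi_v) f_t\|_2 \|\Pi_v f_t\|_2 \Big] \\
&= \begin{pmatrix} \|\Pi_v f_t\|_2 \\ \|(\mathrm{Id} - \Pi_v) f_t\|_2 \end{pmatrix}^{\!\top}
\begin{pmatrix} \varepsilon \lambda_x & -\varepsilon R / 2 \\ -\varepsilon R / 2 & \lambda_v m_2^{1/2} - \varepsilon \end{pmatrix}
\begin{pmatrix} \|\Pi_v f_t\|_2 \\ \|(\mathrm{Id} - \Pi_v) f_t\|_2 \end{pmatrix}
\ge \Lambda_0(\varepsilon) \|f_t\|_2^2 ,
\end{aligned}$$
where
$$\Lambda_0(\varepsilon) = \frac{ \lambda_v m_2^{1/2} - \varepsilon (1 - \lambda_x) - \sqrt{ \big( \lambda_v m_2^{1/2} - \varepsilon (1 - \lambda_x) \big)^2 - \big[ 4 \varepsilon \lambda_x ( \lambda_v m_2^{1/2} - \varepsilon ) - \varepsilon^2 R^2 \big] } }{2}$$
is the smallest eigenvalue of the symmetric matrix, positive for $0 < \varepsilon < 4 \lambda_x \lambda_v m_2^{1/2} / (4 \lambda_x + R^2)$ from Lemma 23 in Appendix A (as $\lambda_x \le 1$). By the upper bound in (23), it follows that
$$-\frac{\mathrm{d}}{\mathrm{d}t} \mathcal{H}_\varepsilon(f_t) \ge \frac{2 \Lambda_0(\varepsilon)}{1 + (m_2/2)^{-1/2} \varepsilon}\, \mathcal{H}_\varepsilon(f_t) .$$
From Grönwall's lemma and (23), we obtain for $0 < \varepsilon < (m_2/2)^{1/2} \wedge \{ 4 \lambda_x \lambda_v m_2^{1/2} / (4 \lambda_x + R^2) \}$,
$$\|f_t\|_2 \le A_0(\varepsilon)\, \mathrm{e}^{-\alpha_0(\varepsilon) t} \|f\|_2 , \qquad \text{where } \alpha_0(\varepsilon) = \frac{\Lambda_0(\varepsilon)}{1 + (m_2/2)^{-1/2} \varepsilon} \ \text{ and } \ A_0(\varepsilon) = \sqrt{ \frac{1 + (m_2/2)^{-1/2} \varepsilon}{1 - (m_2/2)^{-1/2} \varepsilon} } .$$
For notational simplicity we let $\epsilon = \varepsilon / (\lambda_v m_2^{1/2})$ and note that with the definitions in (20)-(21), for $\epsilon < 4 \lambda_x / (4 \lambda_x + R^2)$, $\alpha(\epsilon) = \alpha_0(\varepsilon) > 0$, $\lambda_v m_2^{1/2} \Lambda(\epsilon) = \Lambda_0(\varepsilon) >$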
0, and for ǫ ≤ (2 / λ v ) − the twonorms are equivalent and A ( ǫ ) = A ( ε ) is well defined. This concludes the proof of (a).From Proposition 25 and associated notation in Appendix A, ǫ α ( ǫ ) has a unique, butintractable, maximum, ǫ ⋆ ∈ (0 , λ x / (4 λ x + R )). However from Lemma 24-(b) and Proposition 25the unique maximum ǫ ∈ ( ǫ ⋆ , λ x / (4 λ x + R )) of ǫ Λ( ǫ ), defined by (63), provides us with atractable proxy such that α ( ǫ ) < α ( ǫ ⋆ ) < α ( ǫ ). In addition, since λ x ≤ / R ≥ λ v we get ǫ < (1 + λ x )(1 + λ x ) + R ≤ (2 R ) − ≤ (2 / λ v ) − , which implies that A ( ǫ ) is well defined (and the two norms equivalent). The last statement followsfrom Lemma 24-(c) in Appendix A. 15he following lemma provides us with simple estimates of α ( ǫ ) and A ( ǫ ) defined in Theorem 4. Lemma 6.
Let ǫ α ( ǫ ) , A ( ǫ ) and ǫ be as in Theorem 4 and let λ x ∈ (0 , . Then(a) for any R ≥ / , λ x / (1 + R ) ≤ ǫ ≤ / (4 + R ) ≤ / (4 R ) , (27) (b) for any R ≥ (4 + 12 / ) ∨ ( λ v / / ) , A ( ǫ ) ≤ / and λ v λ x m / ǫ / ≤ α ( ǫ ) ≤ λ v λ x m / ǫ . Proof.
The proof is postponed to Section 4.2.
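To make the constants of Theorem 4 concrete, the following Python sketch evaluates the rate and prefactor numerically. It assumes the reading of (20)-(22) that follows from the 2x2 matrix argument in the proof of Theorem 4, namely $\Lambda(\epsilon) = \tfrac12\{1 - \epsilon(1-\lambda_x) - [(1-\epsilon(1-\lambda_x))^2 - 4\epsilon\lambda_x(1-\epsilon) + \epsilon^2R^2]^{1/2}\}$, $\alpha(\epsilon) = \lambda_v m_2^{1/2}\Lambda(\epsilon)/(1+2^{1/2}\lambda_v\epsilon)$ and $A(\epsilon) = [(1+2^{1/2}\lambda_v\epsilon)/(1-2^{1/2}\lambda_v\epsilon)]^{1/2}$. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def Lambda(eps, lam_x, R):
    """Smallest eigenvalue of [[eps*lam_x, -eps*R/2], [-eps*R/2, 1 - eps]]."""
    a = 1.0 - eps * (1.0 - lam_x)  # trace of the matrix
    disc = a * a - 4.0 * eps * lam_x * (1.0 - eps) + (eps * R) ** 2
    return 0.5 * (a - math.sqrt(disc))


def alpha(eps, lam_x, lam_v, R, m2):
    """Convergence rate, under the assumed reading of (20)."""
    return lam_v * math.sqrt(m2) * Lambda(eps, lam_x, R) / (
        1.0 + math.sqrt(2.0) * lam_v * eps
    )


def prefactor(eps, lam_v):
    """Constant A(eps), under the assumed reading of (20)."""
    s = math.sqrt(2.0) * lam_v * eps
    if s >= 1.0:
        raise ValueError("eps too large: the two norms are no longer equivalent")
    return math.sqrt((1.0 + s) / (1.0 - s))


def eps0(lam_x, R):
    """Maximiser of Lambda, under the assumed reading of (22)."""
    num = 1.0 + lam_x - (1.0 - lam_x) * math.sqrt(R * R / (R * R + 4.0 * lam_x))
    return num / ((1.0 + lam_x) ** 2 + R * R)


# Hypothetical parameter values, for illustration only.
lam_x, lam_v, R, m2 = 0.5, 1.0, 5.0, 1.0
e0 = eps0(lam_x, R)
rate = alpha(e0, lam_x, lam_v, R, m2)
const = prefactor(e0, lam_v)
print(f"eps0={e0:.5f}  alpha(eps0)={rate:.5f}  A(eps0)={const:.5f}")
```

Scanning `Lambda` over a grid of values of `eps` gives a quick sanity check that `eps0` is indeed where it peaks for a given pair `(lam_x, R)`.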
Proposition 7.
Assume that L i , i ∈ { , } , defined by (1) or (3) , with B k given in (4) , satisfies A C = C ( E ) together with H H H H H H
6. Then the L ( µ ) -adjoint of L i for i ∈ { , } defined by (1) or (3) is given for any f ∈ C ( E ) by L ⋆i f = − v ⊤ ∇ x f + δ i, m F ⊤ ∇ v f + K X k =1 ϕ (cid:0) − v ⊤ F k (cid:1) [( B k − Id) f ] + m / λ ref R v f . Proof.
We only consider the case i = 2 since the proof for i = 1 follows along the same lines. Inaddition, since R v is self-adjoint by H ( E ) ⊂ D( R v ), we can consider the case λ ref ( x ) = 0for any x ∈ X . Based on (1)-(3), using that for any k ∈ { , . . . , K } , B k is symmetric on L ( µ ), forany ( x, v ) ∈ E , B k λ k ( x, v ) = λ k ( x, − v ) and by integration by part, for any f, g ∈ C ( E ), we obtain h g, L f i = D − v ⊤ ∇ x g + ( v ⊤ ∇ x U ) g + m F ⊤ ∇ v g − ( v ⊤ F ) g + P Kk =1 ( B k − Id)[ λ k ( x, v ) g ] , f E = D − v ⊤ ∇ x g + [ v ⊤ ( ∇ x U − F )] g + m F ⊤ ∇ v g + P Kk =1 { λ k ( x, − v ) B k g − λ k ( x, v ) g } , f E = hL ⋆i g, f i + D [ v ⊤ ( ∇ x U − F )] g + g P Kk =1 { λ k ( x, − v ) − λ k ( x, v ) } , f E . Using that P Kk =0 F k = ∇ x U by H λ k ( x, v ) − λ k ( x, − v ) = v ⊤ F k ( x ) for any k ∈{ , . . . , K } and ( x, v ) ∈ E by H
3, concludes the proof.The following provides expressions for the L ( µ )-symmetric and L ( µ )-anti-symmetric parts of L for all the PDMP processes considered in this paper. Define λ e k : E → R + for any ( x, v ) ∈ E and k ∈ { , . . . , K } by λ e k ( x, v ) := λ k ( x, v ) + λ k ( x, − v ) . (28) Proposition 8.
Assume that L i , i ∈ { , } , defined by (1) or (3) , with B k given in (4) , satisfies A C = C ( E ) together with H H H H H H
6. Let S and T i be the symmetricand anti-symmetric parts of L i respectively, defined by (16) . a) Then for any f ∈ C ( E ) , T i f = ˜ T i f and S i f = ˜ S f where ˜ T i and ˜ S are the operators defined forany g ∈ C ( E ) by ˜ T i g = v ⊤ ∇ x g − δ i, m F ⊤ ∇ v g + 12 K X k =1 ( v ⊤ F k ) ( B k − Id) g , (29)˜ S g = 12 K X k =1 λ e k ( B k − Id) g + m / λ ref R v g . (30) (b) S satisfies A C ( E ) ⊂ D( T ⋆i ) ∩ D( S ⋆ ) and for any f ∈ C ( E ) , T ⋆i f = − ˜ T i f and S ⋆ f = ˜ S f . Note that the symmetric parts of L i for i ∈ { , } are the same and equal to S . Proof. (a) follows from Proposition 7 and the definitions of S and T in (16). (b) is a directconsequence of the first result and the definition of ( S ⋆ , D( S ⋆ )). Simple integration by parts anddefinitions of ( S ⋆ , D( S ⋆ )), ( T ⋆i , D( T ⋆i )) imply (c).We define the directional derivative operatorfor any f ∈ D( D ) = C ( E ) , D f ( x, v ) := v ⊤ ∇ x f ( x, v ) . (31)The operators ( D , C ( E )) and ( D Π v , C ( E )) are densely defined on L ( µ ) and closable. The proofis similar to that for the operator ∇ x and is omitted, see for example [40, p. 88]. Note that by(29), a simple computation gives that for any f ∈ C ( E ) and i ∈ { , } , since Π v f ∈ C ( E ), T i Π v f = D Π v f . (32) Lemma 9.
Assume that L i , i ∈ { , } , defined by (1) or (3) , with B k given in (4) , satisfies A C = C ( E ) together with H H H H H H
6. Then, with T i the anti-symmetricpart of L i defined by (16) and the operator A i defined by (17) relative to T i , it holds:(a) T i satisfies A A C = C ( E ) and (( T i Π v ) , D( T i Π v )) = (( D Π v ) , D( D Π v )) ;(b) C ( E ) ⊂ D(( T i Π v ) ⋆ T i Π v ) and for any f ∈ C ( E ) , ( T i Π v ) ⋆ T i Π v f = m ∇ ⋆x ∇ x Π v f ;(c) { m Id +( T i Π v ) ⋆ T i Π v } − Π v = m − { Id + ∇ ⋆x ∇ x } − Π v on L ( µ ) ;(d) A ⋆i = m − ( D Π v ) { Id + ∇ ⋆x ∇ x } − Π v and for any f ∈ C ( E ) , there exists a unique function u ∈ C ( X ) , such that m − { Id + ∇ ⋆x ∇ x } − Π v f = u and A ⋆i f = − v ⊤ ∇ x u = − m − ( D Π v ) { Id + ∇ ⋆x ∇ x } − Π v . (33) Proof. (a) First note that C ( E ) is a core for ( D Π v , D( D Π v )) since for any f ∈ C ( E ), there existsa sequence of functions ( f n ) n ∈ N such that for any n ∈ N , f n ∈ C ( E ), lim n → + ∞ k f − f n k = 0 andlim n → + ∞ k∇ x f − ∇ x f n k = 0. Then the proof is completed upon using (31) and (32).17b) By (32), we have for any f ∈ C ( E ), that T i Π v f = v ⊤ ∇ x Π v f . It suffices then to verifythat with g : ( x, v ) v ⊤ ∇ x Π v f ( x, v ), then g ∈ D(( T i Π v ) ⋆ ) and ( T i Π v ) ⋆ g = m ∇ ⋆x g , i.e. for any h ∈ D( T i Π v ), hT i Π v h, g i = m h h, ∇ ⋆x g i . But D( T i Π v ) = C = C ( E ) by assumption and definitionsee (16), and then the result is just a straightforward consequence of (31), (32) and an integrationby part.(c) Note that we only need to show that the two operators { m Id +( T i Π v ) ⋆ T i Π v } − and m − { Id + ∇ ⋆x ∇ x } − coincide on a dense subset of L ( π ) since they are bounded. We prove that thisstatement is true choosing the subset m { Id + ∇ ⋆x ∇ x } (C ( X )). First, for any h ∈ C ( E ), we haveusing (a), (b) and the definition (31) that { m Id +( T i Π v ) ⋆ T i Π v } h = { m Id +( T i Π v ) ⋆ T i Π v } h = m { Id + ∇ ⋆x ∇ x Π v } h . 
(34)Second, for any g ∈ C ( X ), there exists a sequence ( g n ) n ∈ N such that for any n ∈ N , g n ∈ C ( X ),( g n ) n ∈ N , ( ∇ x g n ) n ∈ N and ( ∇ x g n ) n ∈ N converge in L ( π ) to g , ∇ x g and ∇ x g respectively, which impliesthat { [ m Id +( T i Π v ) ⋆ T i Π v ] g n } n ∈ N and { m [Id +( ∇ x ) ⋆ ∇ x ] g n } n ∈ N converge in L ( π ) . Therefore,since { m Id +( T i Π v ) ⋆ T i Π v } and m { Id + ∇ ⋆x ∇ x } are closed, we get that C ( X ) is included inthe domain of these two operators and (34) holds for any h ∈ C ( X ). [48, Theorem 2] or [15,Lemma 17] show that for any f ∈ C ( X ), there exists u ∈ C ( X ) such that m { Id + ∇ ⋆x ∇ x } u = f . Therefore, it holds that C ( X ) ⊂ m { Id + ∇ ⋆x ∇ x } (C ( X )) so m { Id + ∇ ⋆x ∇ x } (C ( X )) isdense in L ( π ). In addition, since we have shown that the operators { m Id +( T i Π v ) ⋆ T i Π v } and m { Id + ∇ ⋆x ∇ x } coincide on C ( X ), { m Id +( T i Π v ) ⋆ T i Π v } − and m − { Id + ∇ ⋆x ∇ x } − coincideon m { Id + ∇ ⋆x ∇ x } (C ( X )).(d) As A i is bounded, it is sufficient to show that A ⋆i and m − ( D Π v ) { Id + ∇ ⋆x ∇ x } − Π v coincideon a dense subset of L ( µ ). First, for all f, g ∈ C ( E ), we get that hA i g, f i = h Π v A i g, f i byLemma 3-(b). Now using the definition of A i (17), that Π v and { m Id +( T i Π v ) ⋆ T i Π v } − arebounded and self-adjoint, since Π v is an orthogonal projection and by Proposition 26-(a)-(c), weget for any f ∈ C ( E ), hA i g, f i = m − (cid:10) ( − Π v T i ) ⋆ g, { Id + ∇ ⋆x ∇ x } − Π v f (cid:11) = m − (cid:10) T i Π v g, { Id + ∇ ⋆x ∇ x } − Π v f (cid:11) , where we have used Lemma 30-(a) for the last equality and D( T ) = C ( E ). [48, Theorem 2] or [15,Lemma 17] show that there exists u ∈ C ( X ) satisfying m { Id + ∇ ⋆x ∇ x } u = Π v f and therefore,we get that hA i g, f i = hT i Π v g, u i = − (cid:10) g, v ⊤ ∇ x u (cid:11) , using an integration by part for the last identity. 
This result shows that for any f ∈ C ( E ), wehave that A ⋆i f = − v ⊤ ∇ x u . In addition, for any g ∈ C ( E ), there exists a sequence ( f n ) n ∈ N suchthat f n ∈ C ( E ) and lim n → + ∞ k g − f n k = 0, lim n → + ∞ k∇ x g − ∇ x f n k = 0. Therefore we getthat C ( E ) ⊂ D( D Π v ) and for any g ∈ C ( E ), D Π v g ( x, v ) = v ⊤ ∇ x g ( x, v ) for any ( x, v ) ∈ E .Therefore, we get the desired conclusion that A ⋆i f = − v ⊤ ∇ x u = − m − ( D Π v ) { Id + ∇ ⋆x ∇ x } − Π v f ,which completes the proof.Establishing A Note that the result is stated for functions f ∈ C ( R d ) but the proof can be easily extended to f ∈ C ( X ) roposition 10. Assume that L i , i ∈ { , } given by (1) or (3) , where B k is defined in (4) satisfies A C = C ( E ) . Assume in addition that H H H H H H S be thesymmetric part of L i defined by (16) . Then A λ v = λ and C = C ( E ) .Proof. From H H
6, it holds that for any f ∈ C ( E ), we have − D λ ref m / R v f, f E ≥ λm / h (Id − Π v ) f, f i . (35)In addition, any f ∈ C ( E ) satisfies max k ∈{ ,...,K } (cid:13)(cid:13) v ⊤ F k f (cid:13)(cid:13) < + ∞ by H H k ∈ { , . . . , K } , sup k ∈{ ,...,K } k λ e k f k < + ∞ . Therefore, using the Cauchy-Schwarzinequality, that B k is a symmetric involution on L ( µ ) by H
4, and B k λ e k = λ e k by definition (28),we obtain for any k ∈ { , . . . , K } and f ∈ C ( E ), h λ e k B k f, f i ≤ k ( λ e k ) / f k k ( λ e k ) / B k f k = k ( λ e k ) / f k . As a result, we deduce h λ e k (Id −B k ) f, f i ≥
0. Combining this result and (35) in the expression for S given in (30) in Proposition 8 completes the proof.The following lemma establishes equivalence between A H U only. Proposition 11.
Assume that L i , i ∈ { , } given by (1) or (3) , where B k as in (4) satisfies A C = C ( E ) . Assume in addition that H H H H H H T i be theanti-symmetric part of L i defined by (16) and A i be defined by (17) relative to T i . Then, A (19) , holds with λ x = C P / (1 + C P ) . (36) Proof.
From the assumed Poincaré inequality (8) we have for any f ∈ C ( E ) (cid:13)(cid:13)(cid:13) m − / D Π v f (cid:13)(cid:13)(cid:13) = k∇ x Π v f k ≥ C P k Π v f k . Then, by definition of D Π v this inequality holds also for any f ∈ D( D Π v ) replacing D Π v f by D Π v f .Therefore, we obtain since ( D Π v ) ⋆⋆ = D Π v that for any f ∈ D(( D Π v ) ⋆ D Π v ), (cid:10) f, m − ( D Π v ) ⋆ D Π v f (cid:11) ≥ C P k Π v f k . (37)In addition by [49, Theorem 5.1.9], ( D Π v ) ⋆ D Π v is a self-adjoint operator. These results and (37)imply that Spec( m − ( D Π v ) ⋆ D Π v ) ⊆ [ C P , ∞ ) by [16, Theorem 4.3.1].On the other hand, since by Lemma 9-(a), D Π v = T i Π v , we have ( D Π v ) ⋆ = ( T i Π v ) ⋆ and A i = − (cid:0) m Id +( D Π v ) ⋆ D Π v (cid:1) − ( D Π v ) ⋆ . Therefore, for any f ∈ D(( D Π v ) ⋆ D Π v ), − A i D Π v f = −A i D Π v f = (cid:0) m Id +( D Π v ) ⋆ D Π v (cid:1) − ( D Π v ) ⋆ D Π v f = Φ (cid:0) m − ( D Π v ) ⋆ D Π v (cid:1) f , where Φ( z ) = z/ (1 + z ). Since D(( D Π v ) ⋆ D Π v ) is a core for D Π v by [49, Theorem 5.1.9.], from thespectral mapping theorem [16, Theorem 2.5.1, Corollary 2.5.4], and the fact that Φ : [0 , ∞ ) → [0 , −A i D Π v can be extended on L ( µ ) as a self-adjoint bounded operator E and Spec( E ) ⊆ [Φ( C P ) , v is a projector, we deduce from Lemma 3-(b) that −A i T i Π v f = − Π v A i D Π v Π v f = Π v E Π v f for any f ∈ C ( E ) ⊂ D( D Π v ) and therefore, we get that for any f ∈ C ( E ) − (cid:10) Π v f, AT i Π v f (cid:11) = h Π v f, E Π v f i ≥ C P C P k Π v f k = λ x k Π v f k , which concludes the proof. A f ∈ L ( µ ) denote by u f = m − (Id + ∇ ⋆x ∇ x ) − Π v f . (38)In the scenarios considered here, condition A k u f k , k∇ x u f k and (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) which are obtained by noticing that by definition u f is solution of the following partialdifferential equation m (Id + ∇ ⋆x ∇ x ) u f = Π v f . 
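The spectral argument in the proof of Proposition 11 can be checked in finite dimension. The sketch below is only an analogy under stated assumptions: a random matrix `B` (hypothetical) stands in for $\mathcal{D}\Pi_v$, the scalar `m` for $m_2$, and the smallest eigenvalue of $B^\top B / m$ for the Poincaré constant $C_P$. With $A = -(m\,\mathrm{Id} + B^\top B)^{-1} B^\top$, the matrix $-AB$ equals $\Phi(B^\top B/m)$ with $\Phi(z) = z/(1+z)$, so its spectrum is bounded below by $C_P/(1+C_P)$. The code also checks the elementary bound $\|A\| \le \sup_{s \ge 0} s/(m+s^2) = 1/(2m^{1/2})$.

```python
import numpy as np


def hypocoercive_operator(B, m):
    """Matrix analogue of (17): A = -(m Id + B^T B)^{-1} B^T."""
    n = B.shape[1]
    return -np.linalg.solve(m * np.eye(n) + B.T @ B, B.T)


rng = np.random.default_rng(0)
m = 2.0                              # stand-in for m_2 (hypothetical value)
B = rng.standard_normal((8, 5))      # stand-in for D Pi_v (hypothetical matrix)
A = hypocoercive_operator(B, m)

# Analogue of the Poincare constant: smallest eigenvalue of B^T B / m.
gram_eigs = np.linalg.eigvalsh(B.T @ B / m)   # ascending order
c_p = gram_eigs[0]

# -A B = Phi(B^T B / m) with Phi(z) = z / (1 + z); symmetrise against round-off.
minus_AB = -A @ B
eigs = np.linalg.eigvalsh((minus_AB + minus_AB.T) / 2)

lambda_x = c_p / (1.0 + c_p)         # analogue of (36)
op_norm_A = np.linalg.norm(A, 2)     # largest singular value of A

print("min eigenvalue of -AB:", eigs.min(), " C_P/(1+C_P):", lambda_x)
print("||A||:", op_norm_A, " 1/(2 sqrt(m)):", 1.0 / (2.0 * np.sqrt(m)))
```

Since $\Phi$ is increasing, the ordered eigenvalues of $-AB$ are the images under $\Phi$ of the ordered eigenvalues of $B^\top B/m$, which is the finite-dimensional counterpart of the spectral mapping step used in the proof.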
In the next section, we show how general, but potentially rough, estimates can be obtained, whilein Section 5 we show how tighter bounds can be obtained in specific scenarios where we can takeadvantage of the structure at hand, in particular when interested in the scaling properties of thealgorithm with d . R in the general setting In all this section, we consider u f defined for any f ∈ L ( µ ) by (38). Recall that from Lemma 9-(d),if f ∈ C ( E ) then u f ∈ C ( R d ) and satisfies (33). Lemma 12.
Assume that L i , i ∈ { , } given by (1) or (3) , where B k is given in (4) , satisfies A C = C ( E ) . Assume in addition that H H H H H H S be thesymmetric part of L i defined by (16) and the operator A i defined by (17) relative to T i .(a) For any f ∈ C ( E ) , | (cid:10) A i S (Id − Π v ) f, f (cid:11) | ≤ k (Id − Π v ) f k k (Id − Π v ) ˜ SA ⋆i f k , where ˜ S is given by (30) .(b) For any f ∈ C ( E ) , k (Id − Π v ) ˜ SA ⋆i f k = k G ⊤ ∇ x u f k , (39) with G given for any ( x, v ) ∈ E by G ( x, v ) = K X k =1 λ e k ( x, v ) (cid:0) n ⊤ k ( x ) v (cid:1) n k + m / λ ref ( x ) v , (40) and u f , { λ e k : E → R + : k ∈ { , . . . , K }} are defined by (38) and (28) respectively. In addition G ⊤ ∇ x u f k ≤ m (cid:0) k λ ref ∇ x u f k + c ϕ K k∇ x u f k (cid:1) + C ϕ q m , + 3( m − m , ) + K X k =1 k F ⊤ k ∇ x u f k . (41) Proof.
We only consider the case i = 2 since the case i = 1 is obtained by taking F = 0.(a) By Lemma 3-(b), A i is a bounded operator. Therefore, we have for any f ∈ C ( E ) that hA i S (Id − Π v ) f, f i = hS (Id − Π v ) f, A ⋆i f i . Then, by Lemma 9-(d), we have that A ⋆i f = − v ⊤ ∇ x u f ,with u f ∈ C ( E ). This result, Proposition 8-(c), and the fact that Id − Π v is an orthogonalprojector imply that (cid:10) A i S (Id − Π v ) f, f (cid:11) = (cid:10) (Id − Π v ) f, (Id − Π v ) ˜ SA ⋆i f (cid:11) , with ˜ SA ⋆ f = − K X k =1 λ e k ( B k − Id) + m / λ ref R v ! v ⊤ ∇ x u f = K X k =1 λ e k ( v ⊤ n k )(n ⊤ k ∇ x u f ) + m / λ ref v ⊤ ∇ x u f = G ⊤ ∇ x u f , (42)where we have used H v ˜ SA ⋆ f = 0 completes the proof of (39).We now show (41) for any f ∈ C ( E ). But it is a direct consequence of the triangle inequality,the definition of { λ e k : E → R + ; k ∈ { , . . . , K }} given in (28), H
3, the Cauchy-Schwarz inequality,Lemma 38 and the identity F k = n k | F k | for any k ∈ { , . . . , K } : kSA ⋆ f k ≤ m / k λ ref v ⊤ ∇ x u f k + K X k =1 n C ϕ k ( v ⊤ n k ) F ⊤ k ∇ x u f k + c ϕ m / k ( v ⊤ n k ) n ⊤ k ∇ x u f k o = m k λ ref ∇ x u f k + m c ϕ K k∇ x u f k + C ϕ q m , + 3( m − m , ) + K X k =1 k F ⊤ k ∇ x u f k . Lemma 13.
Assume that L i , i ∈ { , } given by (1) or (3) , where B k is given in (4) , satisfies A C = C ( E ) . Assume in addition that H H H H H H T i be theanti-symmetric part of L i defined by (16) and the operator A i defined by (17) relative to T i . Then,(a) For any f ∈ C ( E ) , we get | (cid:10) A i T i (Id − Π v ) f, f (cid:11) | ≤ k (Id − Π v ) f k k (Id − Π v ) ˜ T i A ⋆i f k , where ˜ T i is given in (29) . b) For any f ∈ C ( E ) k (Id − Π v ) ˜ T A ⋆i f k = 2 m , k M k + 3( m − m , ) k diag( M ) k , (43) with M = ∇ x u f + K X k =1 ( F ⊤ k ∇ x u )n k n ⊤ k , (44) and u f defined by (38) . Remark 14.
A general, but potentially rough, bound on the right hand side of (43) can be obtainedas follows. From the fact that k diag( M ) k ≤ k M k , it holds that k (Id − Π v ) ˜ T i A ⋆i f k ≤ q m , + 3( m − m , ) + k M k where from the triangle inequality and the property | n k ( x )n k ( x ) ⊤ | = 1 k M k ≤ k∇ x u f k + K X k =1 k F ⊤ k ∇ x u f k . Remark 15.
Specific scenarios lead to simplifications of these bounds and the bounds in Lemma19:(a) from Lemma 36 for radial distributions m = m , leading to a simplification of this bound,(b) further if ν is the centred normal distribution of covariance m I d , then m , = m , leadingto further simplifications,(c) if K = 0 , and hence F = ∇ x U , the scenario considered by [21], then one finds that the bounddepends on (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) only.Proof. We proceed as in the proof of Lemma 12. We only consider the case i = 2 since the case i = 1 is obtained by taking F = 0.(a) By Lemma 3-(b), A is a bounded operator. Therefore, we have for any f ∈ C ( E )that hA T (Id − Π v ) f, f i = hT (Id − Π v ) f, A ⋆ f i . Then, by Lemma 9-(d), we have that A ⋆ f = − v ⊤ ∇ x u f , with u f ∈ C ( E ). This result, Proposition 8-(c), the fact that Id − Π v is an orthogo-nal projector and F k = n k | F k | , imply that for any (cid:10) A T (Id − Π v ) f, f (cid:11) = (cid:10) (Id − Π v ) f, (Id − Π v ) ˜ T A ⋆ f (cid:11) , with for any ( x, v ) ∈ E , − ˜ T A ⋆ f ( x, v ) = v ⊤ ∇ x u f ( x ) v − m F ⊤ ( x ) ∇ x u f ( x ) − K X k =1 ( v ⊤ F k ( x )) (cid:0) n k ( x )n k ( x ) ⊤ v (cid:1) ⊤ ∇ x u f ( x )= v ⊤ M ( x ) v − m F ⊤ ( x ) ∇ x u f ( x ) . (45)The proof is completed upon using the Cauchy-Schwarz inequality.22b) By (45), we obtain that for any f ∈ C ( E ), ( x, v ) ∈ E , − (Id − Π v ) ˜ T A ⋆i f ( x, v ) = v ⊤ M ( x ) v − m Tr( M ( x )) . Combining this result and Lemma 38, we deduce k (Id − Π v ) ˜ T A ⋆i f k = 2 m , k M k + 3( m − m , ) k diag( M ) k ≤ (cid:2) m , + 3( m − m , ) + (cid:3) k M k , which completes the proof. Remark 16.
Combining Corollary 29 and Corollary 35 in Appendix C, by definition of u f in (38) and using H
6, we obtain that m k∇ x u f k ≤ − / k Π v f k , K X k =1 k F ⊤ k ∇ x u f k ≤ / κ m κ K X k =1 a k k Π v f k ,m k λ ref ∇ x u f k ≤ λ (cid:26) − / + 2 / c λ κ κ (cid:27) k Π v f k . In this section we prove that A A C to be C ( E ). A A A λ v = λ . A λ x = C P / (1 + C P ). A A m = p m , + 3( m − m , ) + , for any f ∈ C ( E ) that (cid:13)(cid:13) ˜ SA ⋆i f (cid:13)(cid:13) + (cid:13)(cid:13) (Id − Π v ) ˜ T i A ⋆i f (cid:13)(cid:13) ≤ m ( k∇ x u f k + (1 + C ϕ ) K X k =1 k F ⊤ k ∇ x u f k ) + m k λ ref ∇ x u f k + m c ϕ K k∇ x u f k ≤ " mm ( / (1 + C ϕ ) κ κ K X k =1 a k + κ ) + λ / (cid:26) c λ κ κ (cid:27) + c ϕ K / k Π v f k , where we have used that (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) ≤ m − κ k Π v f k by Proposition 33 in Appendix C and Re-mark 16, with κ and κ given in (69) and (72) respectively. The proof of A Proof of Lemma 6.
Fix λ x ∈ (0 , t (1 + t ) / (cid:2) (1 + t ) + R (cid:3) is nondecreasing on R + , we obtain that for any R ≥ √
3, (27) is satisfied. 23b) Since for any a > s ( s + a ) / ( s − a ) for s > a is nonincreasing, we deduce from above thatfor R ≥ (4 + 2 √ ∨ ( λ v / / ), A ( ǫ ) ≤ R + 2 / λ v R − / λ v ≤ / λ v + 2 / λ v / λ v − / λ v < / . For the second part of the statement, first note thatΛ( ǫ ) = 2 − [1 − ǫ (1 − λ x )] (cid:2) − (cid:0) − ǫb Λ ( ǫ ) (cid:1) / (cid:3) , where b Λ ( ǫ ) = (cid:2) λ x (1 − ǫ ) − ǫR (cid:3) / [1 − ǫ (1 − λ x )] ∈ (cid:2) , ǫ − (cid:3) for ǫ ≤ (2 / λ v ) − ∧ { λ x / (4 λ x + R ) } .Using that for any a ∈ [0 , a/ ≤ − (1 − a ) / ≤ a we deduce that for ǫ ≤ (2 / λ v ) − ∧{ λ x / (4 λ x + R ) } , 4 − [1 − ǫ (1 − λ x )] ǫb Λ ( ǫ ) ≤ Λ( ǫ ) ≤ − [1 − ǫ (1 − λ x )] ǫb Λ ( ǫ ) . Further for R ≥ (4 + 2 √ ∨ ( λ v / / ) we have ǫ ≤ (2 / λ v ) − ∧{ λ x / (4 λ x + R ) } from Theorem 4-(b), leading to λ x / [1 − ǫ (1 − λ x )] ≤ b Λ ( ǫ ) ≤ λ x / [1 − ǫ (1 − λ x )] , and consequently, using (27), ǫ λ x / ≤ Λ( ǫ ) ≤ λ x ǫ / [1 − − λ x ) / (4 + R )] ≤ λ x ǫ , where we have used that λ x ≤ ≤
11 + 2 / λ v / (4 + R ) ≤
11 + 2 / λ v ǫ ≤ , where the leftmost inequality follows from the fact that for 2 / R ≥ λ v / λ v R ≤ / λ v − λ v ≤ / . Proof of Theorem 2.
Since λ v = λ and R ≥ (4 + 2 √ ∨ ( λ/ / ) by Theorem 1, from Theorem 4and Lemma 6, A < / while with λ x = C P / (1 + C P ) λλ x m / ǫ / ≤ α ( ǫ ) with λ x / (1 + R ) ≤ ǫ ≤ / (4 + R ) . (46)By (13), if c , c , k a k ∞ , m b are fixed, there exist C R ( C P , c , c , k a k ∞ , m b ) >
0, independent of d, λ , c λ , C ϕ and c ϕ such that R ≤ C R ( C P , c , c , k a k ∞ , m b ) R , where R = c ϕ K + (1 + C ϕ ) d (1+ ̟ ) / K + λ (1 + c λ d (1+ ̟ ) / ). Combining this bound with (46)concludes the proof. 24 The Zig-Zag sampler–optimization
In this section, we specify our results in the case of the Zig-Zag sampler for which better estimatescan be obtained, leading to better scaling properties with respect to d . The Zig-Zag process cor-responds to the instantiation of (1) for which F = 0, K = d , F i ( x ) = ∂ x i U ( x ) e i , n i ( x ) = e i , λ ref ( x ) = λ > , for i ∈ { , . . . , d } and x ∈ X , and R v = Π v − Id. The corresponding generatortakes the simplified form, for f ∈ C ( E ) and any ( x, v ) ∈ E L f ( x, v ) = v ⊤ ∇ x f ( x ) + d X i =1 ϕ (cid:0) v i ∂ x i U ( x ) (cid:1)(cid:2) f (cid:0) x, (Id − e i e ⊤ i ) v (cid:1) − f ( x, v ) (cid:3) + λ ref ( x ) m / R v f ( x, v ) , (47)where ϕ : R → R + is a continuous function and satisfies (12) in H V = {− m / , + m / } d for m > ν is theuniform distribution on V . Theorem 17.
Consider the Zig-Zag process with generator defined by (47) with λ ref = λ , R v =Π v − Id and ϕ : R → R + is a continuous function satisfying (12) in H
3. Assume A C = C ( E ) , H H H H H c ≥ such that for any g ∈ L ( π ) d (cid:10) g, (cid:2) ∇ x U − diag( ∇ x U ) (cid:3) g (cid:11) ≥ − c k g k . (48) Then, Theorem 4 holds with λ x as in (36) , λ v = λ and R = (6 m ) / (2 + C ϕ ) m (cid:16) (1 + c / / + 1 + ( c / / (cid:17) + λ + c ϕ / . (49) Remark 18.
From H g ∈ L ( π ) d (cid:10) g, ∇ x U g (cid:11) ≥ − c k g k and therefore (48) holds if there exist c > such that for any g ∈ L ( π ) d , (cid:10) g, diag( ∇ x U ) g (cid:11) ≤ c k g k , which is itself implied by c Id (cid:23) diag( ∇ x U ( x )) for all x ∈ X , since diag( ∇ x U ( x )) is symmetric.Note that this is the case when | diag( ∇ x U ( x )) | ≤ c or |∇ x U ( x ) | ≤ c for all x ∈ X , for example.Proof. The proof is very similar to the proof Theorem 1 and follows from applying Theorem 4.Checking A A λ v = λ and λ x given by (36). We are left with checking A f ∈ C ( E ), (cid:13)(cid:13) ˜ SA ⋆ f (cid:13)(cid:13) + (cid:13)(cid:13) (Id − Π v ) ˜ T A ⋆ f (cid:13)(cid:13) ≤ (6 m ) / (2 + C ϕ ) (cid:16)(cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + k∇ ∗ x ∇ x u f k + c / k∇ x u f k (cid:17) + (cid:0) λ + c ϕ (cid:1) m k∇ x u f k . which corresponds to c λ = 0 in H f ∈ C ( E ), (cid:13)(cid:13) (Id − Π v ) ˜ SA ⋆ f (cid:13)(cid:13) + (cid:13)(cid:13) (Id − Π v ) ˜ T A ⋆ f (cid:13)(cid:13) ≤ (cid:26) (6 m ) / (2 + C ϕ ) m (cid:16) (1 + c / / + 1 + ( c / / (cid:17) + λ + c ϕ / (cid:27) k Π v f k , The proof is then completed by Lemma 12-(a) and Lemma 13-(a).We discuss in the following the dependence on the dimension of the convergence rate α ( ǫ ) andthe constant A ( ǫ ) given by Theorem 4 based on the constant provided by Theorem 17. Similarlyto the general case, we need to impose some conditions on m , m . Here, we assume that m / /m does not depend on d , which holds in the case where ν is the uniform distribution on V = {− , } d or the d -dimensional zero-mean Gaussian distribution with covariance matrix I d .In the case where π is the i.i.d. product of one-dimensional distributions π i on ( R , B ( R )) as-sociated with potentials U i : R → R satisfying H i.e. for any x ∈ X , U ( x ) = P di =1 U i ( x i ), ∇ x U ( x ) = diag( ∇ x U ( x )) for any x ∈ X and therefore (48) holds with c = 0. 
Then, the conver-gence rate α ( ε ) and the constant A ( ε ) in Theorem 4 do not depend on the dimension but onlyon the constants c , c , λ , c λ and C P associated to each U i .Consider now the case where the potential U is strongly convex and gradient Lipschitz, i.e. thereexist m, L > m I d (cid:22) ∇ x U ( x ) (cid:22) L I d for any x ∈ X . Then, since for any i ∈ { , . . . , d } and x ∈ X , ∂ x i ,x i U ( x ) = e ⊤ i ∇ x U ( x ) e i ≤ L by assumption, Remark 18 implies that (48) holds for c = L − m . In addition, H c = 0 and c = L and by [4, Proposition 5.1.3, Corollary5.7.2], U satisfies (8) with C P = m . Then, the convergence rate α ( ε ) and the constant A ( ε ) inTheorem 4 do not depend on the dimension but only on L , m , λ and λ . In addition, we observethat the larger L − m is, the larger R given in (49) is, which in turn make the convergence rate α ( ε ) worse since it is of order O (1 /R ) as R → + ∞ by Lemma 6. This result is expected in theGaussian case U ( x ) = x ⊤ Σ x for any x ∈ X , since L − m is the diameter of the set of eigenvalues ofΣ which is a characterization of the conditioning of the problem. Lemma 19.
Consider the Zig-Zag process with generator L defined by (47) with λ ref = λ , R v =Π v − Id and ϕ : R → R + is a continuous function satisfying (12) in H
3. Assume A C = C ( E ) , H H H H H (48) hold. Let S and T be the symmetric and anti-symmetric parts of L respectively and A the operator defined by (17) relative to T . Then for any f ∈ C ( E ) , k (Id − Π v ) ˜ SA ⋆ f k ≤ (6 m ) / C ϕ (cid:16)(cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + k∇ ⋆x ∇ x u f k + c / k∇ x u f k (cid:17) + (cid:0) λ + c ϕ (cid:1) m k∇ x u f k , where u f is given by (38) .Proof. We use Lemma 12 and its notation, where K = d , for k ∈ { , . . . , d } , F k = ∂ x k U andn k = sgn( ∂ x k U ) e k . In this setting and by (40), it follows that for any ( x, v ) ∈ E , G ( x, v ) = d X k =1 λ e k ( x, v ) v k e k + λm / v .
26y the triangle inequality and since for i, j ∈ { , . . . , d } , i = j , R V g ( v i ) g ( v j ) v i v j d ν ( v ) = 0 by H g : R → R , we get (cid:13)(cid:13) G ⊤ ∇ x u f (cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13)P dk =1 { ϕ ( v k ∂ x k U ) + ϕ ( − v k ∂ x k U ) } v k ∂ x k u f (cid:13)(cid:13)(cid:13) + λm k∇ x u f k = " d X k =1 k{ ϕ ( v k ∂ x k U ) + ϕ ( − v k ∂ x k U ) } v k ∂ x k u f k / + λm k∇ x u f k . Then by H H ( µ ) d ) and since for any i ∈ { , . . . , d } , R V v i d ν ( v ) =3 m by H (cid:13)(cid:13) G ⊤ ∇ x u f (cid:13)(cid:13) ≤ " d X k =1 (cid:13)(cid:13)(cid:13) ( c ϕ m / + C ϕ | v k ∂ x k U | ) v k ∂ x k u f (cid:13)(cid:13)(cid:13) / + λm k∇ x u f k ≤ c ϕ m / " d X k =1 k v k ∂ x k u f k / + C ϕ " d X k =1 k| v k ∂ x k U | v k ∂ x k u f k / + λm k∇ x u f k ≤ ( c ϕ + λ ) m k∇ x u f k + C ϕ (3 m ) / " d X k =1 k ∂ x k U ∂ x k u f k / . (50)To bound the sum we note that for k ∈ { , . . . , d } ∂ x k U ∂ x k u f = ∂ x k u f + ∂ ∗ x k ∂ x k u f by Lemma 31-(a), which together with the fact ( a + b ) ≤ a + b ) leads to k ∂ x i U ∂ x i u f k ≤ (cid:0) (cid:13)(cid:13) ∂ x i u f (cid:13)(cid:13) + (cid:13)(cid:13) ∂ ∗ x i ∂ x i u f (cid:13)(cid:13) (cid:1) . Then, using that for a, b ≥ √ a + b ≤ √ a + √ b twice and (54), we deduce d X k =1 k ∂ x k U ∂ x k u f k ! / ≤ / ( d X k =1 (cid:16)(cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) + (cid:13)(cid:13) ∂ ∗ x k ∂ x k u f (cid:13)(cid:13) (cid:17)) / ≤ / d X k =1 (cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) ! / + d X k =1 (cid:13)(cid:13) ∂ ∗ x k ∂ x k u f (cid:13)(cid:13) ! / ≤ / (cid:16)(cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + k∇ ∗ x ∇ x u f k + c / k∇ x u f k (cid:17) . (51)Then combining (50) and (51) completes the proof by Lemma 12-(b). Lemma 20.
Consider the Zig-Zag process with generator L defined by (47) with λ ref = λ , R v =Π v − Id and ϕ : R → R + a continuous function satisfying (12) in H
3. Assume A C = C ( E ) , H H H H H (48) hold. Let T be the anti-symmetric part of L and A the operatordefined by (17) relative to T . Then for any f ∈ C ( E ) (cid:13)(cid:13) (Id − Π v ) ˜ T A ⋆ f (cid:13)(cid:13) ≤ [6(4 m − m , )] / (cid:16)(cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + k∇ ∗ x ∇ x u f k + c / k∇ x u f k (cid:17) , where u f is defined by (38) . roof. We use Lemma 13 and its notations, where K = d , for k ∈ { , . . . , d } , F k = ∂ x k U e k andn k = sgn( ∂ x k U ) e k . In this setting and by (44), it follows that M ( x ) = ∇ x u f ( x ) + diag (cid:0) ∇ x u f ⊙ ∇ x U (cid:1) , Since k M k = k diag( M ) k + k M − diag( M ) k , we obtain2 m , k M k + 3( m − m , ) k diag( M ) k = 2 m , k M − diag( M ) k + (3 m − m , ) k diag( M ) k ≤ m , (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + (3 m − m , ) k diag( M ) k . (52)We now bound k diag( M ) k . First, we apply the triangle inequality and use Lemma 31-(a), todeduce that k diag( M ) k ( π ) = d X k =1 (cid:13)(cid:13) ∂ x k u f − ∂ x k u f + ∂ x k U ∂ x k u f (cid:13)(cid:13) ≤ d X k =1 (cid:0) (cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) + (cid:13)(cid:13) − ∂ x k u f + ∂ x k U ∂ x k u f (cid:13)(cid:13) (cid:1) ≤ d X k =1 (cid:16) (cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) + 2 (cid:13)(cid:13) ∂ ∗ x k ∂ x k u f (cid:13)(cid:13) (cid:17) , (53)where we have used for the last inequality that ( a + b ) ≤ a + 2 b for any a, b ∈ R . By Lemma 31-(a), (68), (10) and the fact that U ∈ C ( X ) using H
1, using that same reasoning as to establish(70), it holds for any k ∈ { , . . . , d } , (cid:13)(cid:13) ∂ ⋆x k ∂ x k u f (cid:13)(cid:13) = (cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) + h ∂ x k u f , ∂ x k ,x k U ∂ x k u f i , k∇ ∗ x ∇ x u f k = (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + (cid:10) ∇ x u f , ∇ x U ∇ x u f (cid:11) . These identities and the condition (48) imply d X i =1 (cid:13)(cid:13) ∂ ∗ x i ∂ x i u f (cid:13)(cid:13) = (cid:13)(cid:13) diag (cid:0) ∇ x u f (cid:1)(cid:13)(cid:13) + (cid:10) ∇ x u f , diag (cid:0) ∇ x U (cid:1) ∇ x u f (cid:11) ≤ (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + (cid:10) ∇ x u f , diag (cid:0) ∇ x U (cid:1) ∇ x u f (cid:11) ≤ k∇ ∗ x ∇ x u f k − (cid:10) ∇ x u f , (cid:0) ∇ x U − diag( ∇ x U ) (cid:1) ∇ x u f (cid:11) ≤ k∇ ∗ x ∇ x u f k + c k∇ x u f k . (54)Combining (53) and (54), we obtain k diag( M ) k ≤ d X k =1 (cid:13)(cid:13) ∂ x k u f (cid:13)(cid:13) + 2( k∇ ∗ x ∇ x u f k + c k∇ x u f k ) . From this inequality, (52) and Lemma 13-(b), we deduce (cid:13)(cid:13) (Id − Π v ) ˜ T A ⋆ f (cid:13)(cid:13) ≤ m − m , ) (cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + 2(3 m − m , ) (cid:16) k∇ ∗ x ∇ x u f k + c k∇ x u f k (cid:17) ≤ m − m , ) (cid:16)(cid:13)(cid:13) ∇ x u f (cid:13)(cid:13) + k∇ ∗ x ∇ x u f k + c / k∇ x u f k (cid:17) , since for a, b, c ≥ a + b + c ≤ ( a + b + c ) . 28 .2 d -dimensional Radmacher distribution We now consider the case V = {− m / , + m / } d and ν is the uniform distribution on V whichcorresponds to the original setting of the Zig-Zag process. This process has been proved to beergodic [8] even in the absence of refreshment, that is λ ref = 0. We note that in this scenario m = m / m , = m which leads to simplified expressions for the bounds in Lemma 19 andLemma 20 upon revisiting their proofs. However this has no qualitative impact. 
In this section weshow that hypocoercivity holds with our techniques for λ ref ( x ) = 0 for “most of X ” for a particulartype of partial refreshment update.Consider the scenario where R v is a mixture of the bounces {B k , k = 1 , . . . , d } , for any f ∈ L ( µ ),( x, v ) ∈ E , λ ref R v f ( x, v ) = d X k =1 λ ref ,k ( x ) (cid:2) f (cid:0) x, v − v k e k (cid:1) − f ( x, v ) (cid:3) , (55)with λ ref ,k : X → R + for k ∈ { , . . . , d } satisfying H
6, and $\lambda_{\mathrm{ref}} = \sum_{k=1}^{d} \lambda_{\mathrm{ref},k}$, that is, when the process refreshes, $k \in \{1, \dots, d\}$ is chosen at random with probability proportional to $(\lambda_{\mathrm{ref},1}, \dots, \lambda_{\mathrm{ref},d})$ and the component $v_k$ of $v$ is updated to $-v_k$. Proposition 21.
Consider the Zig-Zag process with generator L and refreshment operator as in (47) and (55) respectively, with ϕ : R → R + is a continuous function satisfying (12) in H
3. Assume A C = C ( E ) , H H H H H (48) hold. Let S be the symmetric part of L definedby (16) .(a) the symmetric part of the generator is given for any f ∈ C ( E ) , ( x, v ) ∈ E by S f ( x, v ) = d X k =1 ( ϕ (cid:0) v k ∂ x k U ( x ) (cid:1) + ϕ (cid:0) − v k ∂ x k U ( x ) (cid:1) m / λ ref ,k ( x ) ) (cid:2) f (cid:0) x, v − v k e k (cid:1) − f ( x, v ) (cid:3) ; (b) the microscopic coercivity condition A f ∈ C ( E ) , ( x, v ) ∈ E − hS f, f i ≥ λ v m / k (Id − Π v ) f k with λ v = min k ∈{ ,...,d } ,x ∈ X (cid:26) | ∂ x k U ( x ) | λ ref ,k ( x ) (cid:27) . (56) Remark 22.
In other words A ε > , for all k ∈ { , . . . , d } , λ ref ,k vanisheseverywhere, except on { x ∈ X : ∃ k ∈ { , . . . , d } | | ∂ x k U | ( x ) < ε } . We also note that a similar resultholds for the case where R v = Π v − Id , that is A λ ref vanishes everywhere,except on { x ∈ X : ∃ k ∈ { , . . . , d } , | | ∂ x k U | ( x ) < ε } for ε > .Proof. The first statement is a direct application of Proposition 8-(a). For the second statement,using that ν is the uniform distribution on V = {− m / , m / } d , from the polarization identity andsince ϕ satisfies H
3, we get for any f ∈ C ( E ), setting ϕ e ( s ) := ϕ ( s ) + ϕ ( − s ), − hS f, f i = 12 Z E d X k =1 (cid:26) ϕ e ( v k ∂ x k U ( x ))2 + m / λ ref ,k ( x ) (cid:27) (cid:2) f ( x, v ) − f (cid:0) x, (Id − e k e ⊤ k ) v (cid:1)(cid:3) d µ ( x, v ) ≥ ( λ v m / / Z E d X k =1 (cid:2) f ( x, v ) − f (cid:0) x, (Id − e k e ⊤ k ) v (cid:1)(cid:3) d µ ( x, v ) , (57)29here λ v is defined in (56). Now by the Poincaré inequality for any g ∈ L ( ν ), see e.g. [47, p. 52],it holds that (1 / Z V d X k =1 (cid:2) g ( v ) − g (cid:0) (Id − e i e ⊤ i ) v (cid:1)(cid:3) d ν ( v ) ≥ Z V d X k =1 g ( v ) d ν ( v ) . (58)Now since for any f ∈ C ( E ), hS f, f i = hS (Id − Π v ) f, (Id − Π v ) f i and for any x ∈ X , v (Id − Π v ) f ( x, v ) ∈ L ( ν ), then combining (57) and (58) and using Fubini’s theorem concludes theproof of (56). As pointed out earlier the scenario K = 0 where F = ∇ x U is considered in [21] where theauthors establish hypercoercivity but also in [11, Theorem 3.9] where the authors establish geometricconvergence, that is the existence of constants A, α > V : E → R + satisfying µ (cid:0) { V = ∞} (cid:1) = 0, such that for any ( x, v ) ∈ E and t ≥ k P t (cid:0) ( x, v ) , · (cid:1) − µ ( · ) k TV ≤ AV ( x, v )e − αt . (59)Similar results have been obtained in [18] and [24] for the Bouncy particle sampler and in [8] forthe Zig-Zag process. All these methods rely on guessing such a suitable Lyapounov function V andestablishing a so-called drift condition for this function, in conjunction with a minorization condition[42]. Here we have established L ( µ )-exponential convergence, or equivalently that there exists anabsolute L ( µ )-absolute spectral gap [22, Proposition 22.3.2] (by considering the skeleton of theprocess) and is therefore µ -a.e. 
uniformly convergent by [22, Proposition 22.3.3 and Proposition22.3.5], that is (59) holds with V = and µ -a.e..An advantage of our approach is that it provides explicit and relatively simple bounds in termsof interpretable quantities which, we show, are informative, and is in contrast with those on mi-norization and drift conditions in most scenarios. One exception is the study of BPS on the toruscarried out in [24] for U = 0, using an appropriate coupling argument, which leads to a rate of con-vergence for the total variation distance with a favourable Θ( d / ) scaling. Although we have shownthat for the Zig-Zag sampler with Rademacher distribution λ ref is not required to be bounded awayfrom zero on X , the results of [8] hold with λ ref = 0. It would be interesting to further investigatewhether our results can be specialized to consider the scenario λ ref = 0.Although we have shown that the theory developed in this paper covers numerous scenariosin a unified set-up, various possible extensions are possible. For example we have restricted thisfirst investigation to deterministic bounces of the type given in (4), but there does not seem tobe any obstacle to the extension of our results to the more general set-ups such as considered in[55, 58, 44]. In the same vein, great parts of our calculations could be used to consider distributionsof the velocity ν that are neither Gaussian, nor the uniform distribution on the hypersphere. For ν of density proportional to exp( − K( v )) with K : R d → R the Liouville operator involved in thedefinition of (3) would take the form ∇ v K( v ) ⊤ ∇ x f ( x, v ) − m F ⊤ ∇ v f ( x, v ), leading to a differentexpression for T . Such modified kinetic energies have been proposed to speed up the computation,introducing the Modified Langevin Dynamics for which convergence to equilibrium has been studiedin [52]. 
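As a small illustration of the remark on modified kinetic energies, the following sketch integrates the characteristics of a Liouville-type operator $f \mapsto \nabla_v K(v)^\top \nabla_x f - \nabla_x U(x)^\top \nabla_v f$ in dimension one and checks that $U + K$ is numerically conserved along the flow, which is the mechanism that keeps the density proportional to $\exp(-U(x) - K(v))$ invariant for the deterministic part. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice $F = \nabla_x U$, the multiplicative constant in front of $F$ set to 1, the potential `U`, the kinetic energy `K`, and the function names.

```python
import math


def U(x):
    # Hypothetical potential, chosen only for illustration.
    return 0.5 * x * x + 0.25 * x ** 4


def grad_U(x):
    return x + x ** 3


def K(v):
    # Hypothetical non-Gaussian kinetic energy.
    return math.sqrt(1.0 + v * v)


def grad_K(v):
    return v / math.sqrt(1.0 + v * v)


def field(x, v):
    # Characteristics of f -> K'(v) d_x f - U'(x) d_v f (assumed form).
    return grad_K(v), -grad_U(x)


def rk4_step(x, v, dt):
    # One classical fourth-order Runge-Kutta step for the flow above.
    k1x, k1v = field(x, v)
    k2x, k2v = field(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = field(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = field(x + dt * k3x, v + dt * k3v)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x_new, v_new


def energy_drift(x0, v0, dt=1e-3, n_steps=5000):
    # Absolute change of U + K between the start and the end of the trajectory.
    x, v = x0, v0
    h0 = U(x) + K(v)
    for _ in range(n_steps):
        x, v = rk4_step(x, v, dt)
    return abs(U(x) + K(v) - h0)
```

Calling `energy_drift(1.0, 0.5)` returns the accumulated drift of $U + K$, which should be at the level of the integrator's discretisation error.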
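A Optimization and estimates of the rate of convergence $\alpha(\epsilon)$

Consider the functions $R, \tilde{\alpha} : \mathbb{R}_+^* \to \mathbb{R}_+^*$ given for any $\epsilon \ge 0$ by

$$R(\epsilon) = [1 - \epsilon(1 - \lambda_x)]^2 - 4\epsilon\lambda_x(1 - \epsilon) + \epsilon^2 R_0^2 = \bar{R}\left(\epsilon - \frac{1 + \lambda_x}{\bar{R}}\right)^2 + 1 - \frac{(1 + \lambda_x)^2}{\bar{R}} > 0, \tag{60}$$

$$\tilde{\alpha}(\epsilon) = \frac{\Lambda(\epsilon)}{1 + 2^{1/2}\lambda_v\epsilon} = \frac{1 - \epsilon(1 - \lambda_x) - R^{1/2}(\epsilon)}{2(1 + 2^{1/2}\lambda_v\epsilon)}, \tag{61}$$

where

$$\bar{R} = (1 + \lambda_x)^2 + R_0^2, \tag{62}$$

and $\Lambda$ is given in (21). We show that optimizing $\epsilon \mapsto \Lambda(\epsilon)$ is a good enough proxy for optimizing $\epsilon \mapsto \tilde{\alpha}(\epsilon)$, whose maximum is unique, but intractable. Since $\epsilon \mapsto \alpha(\epsilon)$ defined by (20) is proportional to $\epsilon \mapsto \tilde{\alpha}(\epsilon)$, the same conclusion holds for this function.

Lemma 23.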
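Let $\Lambda : \mathbb{R}_+ \to \mathbb{R}$ be defined by (21). Then with $\lambda_x \in (0, 1]$ and $R_0 > 0$,

(a) $\Lambda(\epsilon) \ge 0$ for $\epsilon \in [0, 4\lambda_x/(4\lambda_x + R_0^2)]$ and $\Lambda(0) = 0$.

(b) $\Lambda$ has first order derivative
$$\Lambda'(\epsilon) = -(1/2)\left[(1 - \lambda_x)R^{1/2}(\epsilon) + \epsilon\bar{R} - (1 + \lambda_x)\right]R^{-1/2}(\epsilon),$$
and $\Lambda'(0) = \lambda_x > 0$.

(c) $\Lambda : \mathbb{R}_+ \to \mathbb{R}$ has a unique stationary point ($\Lambda'(\epsilon_0) = 0$)
$$\epsilon_0 = \frac{(1 + \lambda_x) - (1 - \lambda_x)\left[R_0^2/(R_0^2 + 4\lambda_x)\right]^{1/2}}{(1 + \lambda_x)^2 + R_0^2} > 0, \tag{63}$$
such that $\Lambda(\epsilon_0) > 0$.

Proof. From (21) we see that $\Lambda(\epsilon) \ge 0$ if and only if
$$0 \le \epsilon \le \frac{1}{1 - \lambda_x} \wedge \frac{4\lambda_x}{4\lambda_x + R_0^2} = \frac{4\lambda_x}{4\lambda_x + R_0^2},$$
where the equality follows from $\lambda_x > 0$, which completes the proof of (a). The proof of (b) is a simple calculation and is omitted. We now show (c). If we set $\Lambda'(\epsilon) = 0$, it implies that $\epsilon > 0$ and
$$(1 + \lambda_x) - \epsilon\bar{R} = R^{1/2}(\epsilon)(1 - \lambda_x), \tag{64}$$
and imposes the condition $(1 + \lambda_x) - \epsilon\bar{R} \ge 0$, that is
$$\epsilon \in \left[0, \frac{1 + \lambda_x}{(1 + \lambda_x)^2 + R_0^2}\right]. \tag{65}$$
Squaring both sides of (64) implies the following sequence of equalities using (60):
$$(1 - \lambda_x)^2 R(\epsilon) = \left[\epsilon\bar{R} - (1 + \lambda_x)\right]^2,$$
$$(1 - \lambda_x)^2\left[\bar{R}\epsilon^2 - 2(1 + \lambda_x)\epsilon + 1\right] = \bar{R}^2\epsilon^2 - 2\bar{R}(1 + \lambda_x)\epsilon + (1 + \lambda_x)^2,$$
which is equivalent by (62) to
$$\bar{R}\epsilon^2\left[(1 - \lambda_x)^2 - \bar{R}\right] - 2\epsilon(1 + \lambda_x)\left[(1 - \lambda_x)^2 - \bar{R}\right] - 4\lambda_x = 0,$$
$$\left((1 + \lambda_x)^2 + R_0^2\right)\epsilon^2\left[-4\lambda_x - R_0^2\right] - 2\epsilon(1 + \lambda_x)\left[-4\lambda_x - R_0^2\right] - 4\lambda_x = 0,$$
$$\left((1 + \lambda_x)^2 + R_0^2\right)\epsilon^2 - 2(1 + \lambda_x)\epsilon + 4\lambda_x/(R_0^2 + 4\lambda_x) = 0.$$
The two strictly positive roots are
$$\epsilon_\pm = \frac{(1 + \lambda_x) \pm \left[(1 + \lambda_x)^2 - 4\lambda_x\{(1 + \lambda_x)^2 + R_0^2\}/(R_0^2 + 4\lambda_x)\right]^{1/2}}{(1 + \lambda_x)^2 + R_0^2} > 0,$$
where the inequality follows from $\lambda_x > 0$ and $R_0 > 0$. Further
$$(1 + \lambda_x)^2\left(R_0^2 + 4\lambda_x\right) - 4\lambda_x\left[(1 + \lambda_x)^2 + R_0^2\right] = R_0^2\left[(1 + \lambda_x)^2 - 4\lambda_x\right] = R_0^2[1 - \lambda_x]^2,$$
and since $\lambda_x \le 1$, this yields the simplified expression for the two roots
$$\epsilon_\pm = \frac{(1 + \lambda_x) \pm (1 - \lambda_x)\left[R_0^2/(R_0^2 + 4\lambda_x)\right]^{1/2}}{(1 + \lambda_x)^2 + R_0^2}.$$
From the conditions on $\epsilon$ given by (a) and (65), and the fact that $\lambda_x \le 1$, we retain $\epsilon_0 = \epsilon_-$ only. The last statement follows from the second statement and the fact that $\Lambda'$ is continuous.

The following lemma establishes in particular that $\epsilon_0$ is a global maximum.

Lemma 24.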
Let
Λ : R ∗ + → R be defined by (21) . Then with λ x ∈ (0 , and R > ,(a) or any ǫ > , Λ ′′ ( ǫ ) < (implying concavity),(b) Λ is maximized at ǫ defined by (63) and < ǫ ≤ (4 λ x ) / (4 λ x + R ) .(c) If in addition R ≥ , ǫ ≤ λ x / (4 λ x + R ) .Proof. (a) We differentiate ǫ
7→ − ǫ ) = − [1 − ǫ (1 − λ x )] + R / ( ǫ ) twice, yielding the first orderderivative ǫ (1 − λ x ) + (1 / R ′ ( ǫ ) R − / ( ǫ )and the second order derivative follows ǫ (1 / (cid:16) R ′′ ( ǫ ) R − / ( ǫ ) − (1 / R ′ ( ǫ )] R − / ( ǫ ) (cid:17) = (1 / R − / ( ǫ ) (cid:0) R ′′ ( ǫ ) R ( ǫ ) − [ R ′ ( ǫ )] (cid:1) . Now from (60), R ( ǫ ) = aψ ( ǫ ) with ψ ( ǫ ) = ( ǫ − b ) + c with all constants b, c non-negative. Further ψ ′ ( ǫ ) = 2( ǫ − b ) and ψ ′′ ( ǫ ) = 2 and therefore2 ψ ′′ ( ǫ ) ψ ( ǫ ) − ψ ′ ( ǫ ) = 4[( ǫ − b ) + c − ( ǫ − b ) ] = 4 c > , which implies that Λ ′′ ( ǫ ) ≤ ǫ ≥
0. 32b) From the concavity we deduce that ǫ is a maximum, and the inequality on ǫ follows from thefact that this is required for Λ( ǫ ) ≥ s ≥
0, (1 + s ) / ≤ s/
2, and 4 λ x ≤ (1 + λ x ) , we get that ǫ = R (1 + λ x )(4 λ x /R + 1) / − (1 − λ x ) (cid:2) (1 + λ x ) + R (cid:3) ( R + 4 λ x ) / ≤ λ x R + 2 λ x (1 + λ x ) /R (cid:2) (1 + λ x ) + R (cid:3) / ( R + 4 λ x ) ≤ λ x + 2 λ x (1 + λ x ) /R R + 4 λ x . The assumption R ≥ Proposition 25.
The function ˜ α : R + → R + , defined by (61) , has a unique maximizer ǫ ⋆ ∈ (0 , ǫ ) ,where ǫ is given in (63) . In addition, if / R ≥ λ v then ˜ α ( ǫ ) ≤ ˜ α ( ǫ ⋆ ) ≤ α ( ǫ ) . (66) Proof.
First note that for any ǫ ≥
0, ˜ α ′ ( ǫ ) = Ψ( ǫ )(1 + 2 / λ v ǫ ) , with Ψ( ǫ ) = Λ ′ ( ǫ )(1 + 2 / λ v ǫ ) − / λ v Λ( ǫ ) . Then from Lemma 24,Ψ( ǫ ) = 2 / λ v Λ( ǫ ) < , and for any ǫ ≥
0, Ψ ′ ( ǫ ) = (1 + 2 / λ v ǫ )Λ ′′ ( ǫ ) < . (67)Together with Ψ(0) = Λ ′ (0) = λ x >
0, and the fact that ǫ → Ψ( ǫ ) is continuous, we deduce theexistence and uniqueness of ǫ ⋆ ∈ (0 , ǫ ) satisfying ˜ α ′ ( ǫ ⋆ ) = 0, and maximizing ˜ α on R + . Furthersince ˜ α ′ ( ǫ ⋆ ) = 0 and ǫ Ψ( ǫ ) is non-increasing, using the first equality of (67) and the definitionof ˜ α given in (61), we deducesup ǫ ∈ [ ǫ ⋆ ,ǫ ] | ˜ α ′ ( ǫ ) | ≤ | Ψ( ǫ ) | (1 + 2 / λ v ǫ ⋆ ) = 2 / λ v / λ v ǫ (1 + 2 / λ v ǫ ⋆ ) ˜ α ( ǫ ) , From a Taylor’s theorem, we obtain˜ α ( ǫ ⋆ ) − ˜ α ( ǫ ) ≤ ( ǫ − ǫ ⋆ )2 / λ v / λ v ǫ (1 + 2 / λ v ǫ ⋆ ) ˜ α ( ǫ ) , from which we conclude that˜ α ( ǫ ) ≤ ˜ α ( ǫ ⋆ ) ≤ (cid:20) ǫ − ǫ ⋆ )2 / λ v / λ v ǫ (1 + 2 / λ v ǫ ⋆ ) (cid:21) ˜ α ( ǫ ) . Now if we use 2 / R ≥ λ v we have by (63) that λ v ǫ < (1 + λ x ) λ v (1 + λ x ) + R ≤ λ v (2 R ) − ≤ − / , ǫ − ǫ ⋆ )2 / λ v / λ v ǫ (1 + 2 / λ v ǫ ⋆ ) ≤ / λ v ǫ (1 + 2 / λ v ǫ ) ≤ , which completes the proof of (66). B Some results on closed operators on Hilbert spaces
In this section we gather classical results concerning densely defined closed operators on a Hilbert space to which we repeatedly refer throughout the manuscript.
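Before stating these results, here is a finite-dimensional sanity check of the kind of resolvent bound gathered in this section (Proposition 26-(b)). It is only a sketch under simplifying assumptions: $\mathcal{B}$ is replaced by a real $2 \times 2$ matrix `b`, so that the adjoint is the transpose and every operator is bounded, and all function names are hypothetical. Writing $u = (\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h$, expanding $\|u + \mathcal{B}^\star\mathcal{B}u\|^2$ gives $\|h\|^2 = \|u\|^2 + 2\|\mathcal{B}u\|^2 + \|\mathcal{B}^\star\mathcal{B}u\|^2$, from which the inequality of part (b) follows; the code returns the four squared norms so that both the identity and the inequality can be inspected.

```python
def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, h):
    return [a[i][0] * h[0] + a[i][1] * h[1] for i in range(2)]


def inverse_2x2(a):
    # Closed-form inverse; Id + B^T B has determinant at least 1.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def sq_norm(h):
    return h[0] ** 2 + h[1] ** 2


def resolvent_terms(b, h):
    """Return (|u|^2, |B u|^2, |B^T B u|^2, |h|^2) with u = (Id + B^T B)^{-1} h."""
    btb = matmul(transpose(b), b)
    shifted = [[1.0 + btb[0][0], btb[0][1]],
               [btb[1][0], 1.0 + btb[1][1]]]
    u = matvec(inverse_2x2(shifted), h)
    return sq_norm(u), sq_norm(matvec(b, u)), sq_norm(matvec(btb, u)), sq_norm(h)
```

For any matrix `b` and vector `h`, the first returned value plus twice the second should not exceed the fourth, up to rounding.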
Proposition 26.
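Let $\mathcal{B}$ be a closed and densely defined operator on a Hilbert space $H$ with inner product $\langle\cdot,\cdot\rangle$, induced norm $\|\cdot\|$ and operator norm $|||\cdot|||$.

(a) $\mathrm{Id} + \mathcal{B}^\star\mathcal{B}$ is a positive self-adjoint operator on $H$, bijective from $\mathrm{D}(\mathcal{B}^\star\mathcal{B})$ to $H$. In addition, $(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$ is a positive self-adjoint bounded operator on $H$ and $\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$ is a bounded operator.

(b) For any $h \in H$,
$$\|(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h\|^2 + 2\|\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h\|^2 \le \|h\|^2.$$

(c) $\mathcal{B}^\star\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$ is a bounded operator on $H$ which satisfies $|||\mathcal{B}^\star\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}||| \le 1$.

(d) The operator $((\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star, \mathrm{D}(\mathcal{B}^\star))$ is closable, its closure is a bounded operator and $|||\overline{(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star}||| \le 1$.

Remark 27. Note that under the condition of Proposition 26, we get that $(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star$ can be extended to a bounded operator and
$$|||(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}||| \le 1, \qquad |||\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}||| \le 1/2^{1/2}.$$

Proof. (a) and (b) follow from [49, Theorem 5.1.9] and inspection of the proof.

We now show (c). First note that $(\mathrm{Id} + \mathcal{B}^\star\mathcal{B} - \mathrm{Id})(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1} = \mathrm{Id} - (\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$, from which we deduce, by the triangle inequality, that it is a self-adjoint and bounded operator with norm less than or equal to 2. To prove the tighter upper bound we use [49, Proposition 3.2.27 p. 99] (twice), the identity, for any $h \in H$,
$$\left|\langle\mathcal{B}^\star\mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h, h\rangle\right| = \max\left\{\|h\|^2 - \langle(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h, h\rangle,\ \langle(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h, h\rangle - \|h\|^2\right\},$$
the fact that $(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$ is positive and $|||(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}||| \le 1$.

For (d), since $\mathcal{B}$ is closed and densely defined, $\mathrm{D}(\mathcal{B}^\star)$ is dense and therefore $\{(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star\}^\star$ is closed and densely defined by [49, Theorem 5.1.5]. By (a), for any $h_1 \in \mathrm{D}(\mathcal{B}^\star)$ and $h_2 \in H$, we have
$$\langle(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star h_1, h_2\rangle = \langle h_1, \mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}h_2\rangle,$$
which implies that $\{(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star\}^\star = \mathcal{B}(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}$. Therefore, $\{(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star\}^{\star\star}$ is a bounded operator on $H$. The proof then follows by [49, Theorem 5.1.5], which implies that $(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star$ is closable and $\overline{(\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star} = ((\mathrm{Id} + \mathcal{B}^\star\mathcal{B})^{-1}\mathcal{B}^\star)^{\star\star}$.

A similar result can be obtained by using that $\mathcal{B}$ is closable only, as a consequence of the following lemma.

Lemma 28.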
Assume that $(\mathcal{B}, \mathrm{D}(\mathcal{B}))$ is a densely defined closable operator. Let $(\overline{\mathcal{B}}, \mathrm{D}(\overline{\mathcal{B}}))$ be the closure of $(\mathcal{B}, \mathrm{D}(\mathcal{B}))$ and $m > 0$. Then, the conclusions of Proposition 26 hold changing $\mathcal{B}$ to $\overline{\mathcal{B}}$.

Proof. This result is just a consequence of [49, Theorem 5.1.5], which implies that $\mathcal{B}^\star$ is densely defined, $\overline{\mathcal{B}} = (\mathcal{B}^\star)^\star$ and $\overline{\mathcal{B}}^\star = \mathcal{B}^\star$.

The densely defined and closed operator $\nabla_x$ on $L^2(\pi)$ can be extended as an operator on $L^2(\pi)^d$ as follows: for any $f = (f_1, \dots, f_d) \in L^2(\pi)^d$ with $f_i \in \mathrm{D}(\nabla_x)$ for all $i$, $\nabla_x f = (\nabla_x f_1, \dots, \nabla_x f_d)$. Therefore a direct consequence of Proposition 26 applied to the operator $m^{-1/2}\nabla_x$ for $m > 0$, on $L^2(\pi)^d$, is the following.

Corollary 29. Let $m > 0$. The operators $\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}$ and $\nabla_x^\star\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}$ are bounded on $L^2(\pi)^d$ with
$$|||\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}|||_{L^2(\pi)} \le 1/(2m)^{1/2}, \qquad |||\nabla_x^\star\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}|||_{L^2(\pi)} \le 1.$$
In addition, for any $f \in L^2(\pi)$,
$$\|(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}f\|^2 + (2/m)\|\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}f\|^2 \le \{\|f\|/m\}^2,$$
and
$$\|\nabla_x^\star\nabla_x(m\,\mathrm{Id} + \nabla_x^\star\nabla_x)^{-1}f\| \le \|f\|.$$

We conclude this section by the following results which can be found in [31].
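Lemma 30 ([31, Lemma 2.2]). Let $(\mathcal{T}, \mathrm{D}(\mathcal{T}))$ be an anti-symmetric operator on $L^2(\mu)$ and $\Pi$ be an orthogonal projection on $L^2(\mu)$. Assume that there exists $D \subset \mathrm{D}(\mathcal{T})$ such that $\Pi(D) \subset \mathrm{D}(\mathcal{T})$ and $D$ is dense in $L^2(\mu)$. Then the following statements hold.

(a) $\mathrm{D}(\mathcal{T}) \subset \mathrm{D}((\mathcal{T}\Pi)^\star)$ and for any $f \in \mathrm{D}(\mathcal{T})$, $(\mathcal{T}\Pi)^\star f = -\Pi\mathcal{T}f$.

(b) For any $f \in \mathrm{D}((\mathcal{T}\Pi)^\star)$, $\Pi(\mathcal{T}\Pi)^\star f = (\mathcal{T}\Pi)^\star f$.

C Elliptic regularity estimates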
We preface this section with some complements on the adjoint of ∇ x seen as an operator on L ( π ) d . Lemma 31.
Assume H
1. Consider the operator ( ∇ x , D( ∇ x )) from the Hilbert space L ( π ) to L ( π ) d endowed with the inner product defined by (5) . Then it holds(a) for any i ∈ { , . . . , d } , the L ( π ) -adjoint of ∂ x i is given for any g ∈ C ( X ) by ∂ ⋆x i g = − ∂ x i g + g∂ x i U ; (b) the L ( π ) -adjoint of ∇ x is given for any G ∈ C ( X , R d ) by ∇ ⋆x G = − div x G + ∇ x U ⊤ G . emark 32. Note that Lemma 31 implies that for any g ∈ C ( X ) and G ∈ C ( X , R d ) , wehave ∇ ⋆x ∇ x g = − ∆ x g + ∇ x U ⊤ ∇ x g and ∇ x ∇ ⋆x G = ∇ ⋆x ∇ x G + ∇ x U G , (68) where we have defined ∇ ⋆x ∇ x G ∈ C poly ( E , R d ) for any ( x, v ) ∈ E and i ∈ { , . . . , d } by {∇ ⋆x ∇ x G ( x, v ) } i = ∇ ⋆x ∂ x i G ( x, v ) = d X j =1 − ∂ x j ,x i G j ( x, v ) + ∂ x j U ( x ) ∂ x i G ( x, v ) . Proof.
The proof follows by integration by parts.
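The adjoint formula of Lemma 31-(a), $\partial_{x_i}^\star g = -\partial_{x_i} g + g\,\partial_{x_i} U$, can be checked numerically in dimension one. The sketch below compares $\int f' g\, e^{-U}$ with $\int f(-g' + gU')\, e^{-U}$ by the trapezoidal rule. The potential `U`, the test functions `f` and `g`, the truncation interval and the helper names are all hypothetical choices made for illustration; the normalising constant of $\pi$ is omitted since it scales both sides equally.

```python
import math


def U(x):
    # Hypothetical smooth, confining potential (illustration only).
    return 0.5 * x * x + 0.25 * x ** 4


def dU(x):
    return x + x ** 3


def trapezoid(func, a=-8.0, b=8.0, n=20000):
    # Composite trapezoidal rule; exp(-U) is negligible outside [a, b].
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


def adjoint_check():
    """Return (<f', g>, <f, -g' + g U'>) for the unnormalised weight exp(-U)."""
    f, df = math.sin, math.cos            # test function f and its derivative
    g = lambda x: x * x                   # test function g
    dg = lambda x: 2.0 * x                # derivative of g
    weight = lambda x: math.exp(-U(x))
    lhs = trapezoid(lambda x: df(x) * g(x) * weight(x))
    rhs = trapezoid(lambda x: f(x) * (-dg(x) + g(x) * dU(x)) * weight(x))
    return lhs, rhs
```

The two values returned by `adjoint_check()` should agree up to quadrature and rounding error.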
Proposition 33.
Let m > and assume H
1. Then for any f ∈ C ( E ) , k∇ x ( m Id + ∇ ⋆x ∇ x ) − Π v f k ≤ κ k Π v f k where κ = (1 + c / (2 m )) / . (69) Proof.
Let f ∈ C ( E ) and consider u = ( m Id + ∇ ⋆x ∇ x ) − Π v f . By [48, Theorem 2], u ∈ C ( X ).Therefore we obtain by (68), (10) and the fact that U ∈ C ( X ) using H k∇ x u k = h∇ x u, ∇ x u i = h∇ x u, ( ∇ ⋆x ∇ x )[ ∇ x u ] i = h∇ x u, ( ∇ x ∇ ⋆x )[ ∇ x u ] − ∇ x U ∇ x u i = k∇ ⋆x ∇ x u k − h∇ x u, ∇ x U ∇ x u i . (70)From the definition of u , using Corollary 29 and H k∇ x u k ≤ k Π v f k + c k∇ x u k ≤ k f k + c k Π v f k / (2 m ) . In order to bound terms of the form k F ⊤ k ∇ x u k in Section 3.3 we need the following Lemmawhich is a quantitative version of [21, Lemma 6]. Consider the function W : R d → R + defined forany x ∈ R d by W ( x ) = n |∇ x U ( x ) | o / . (71) Lemma 34 ([21, Lemma 6]) . Assume H
1. Then for any ϕ ∈ D( ∇ x ) , k∇ x ϕ k ≥ h (cid:0) c d ̟ / (4 C ) (cid:1) / i − k ϕ ∇ x U k , where c and C P are defined in (9) and (8) respectively. As a corollary, it holds for any ϕ ∈ D( ∇ x ) , k∇ x ϕ k ≥ κ k ϕW k , where κ − = (cid:0) C − + 16(1 + c d ̟ / (4 C )) (cid:1) / = C − (cid:0) c d ̟ + 16 C (cid:1) / ≥ C − . (72) Proof.
Note that we only need to consider ϕ ∈ C ∞ c ( X ) since C ∞ c ( X ) is a core for ( ∇ x , D( ∇ x )). Firstsince ∇ x U ∈ L ( µ ), for any ε >
0, we get2 h ϕ ∇ x U, ∇ x ϕ i ≤ ε − k∇ x ϕ k + ε k ϕ ∇ x U k . (73)36e then bound from below the left-hand side. Using the carré du champ identity, i.e. for any f, g ∈ C ( X ), h∇ x f, ∇ x g i = (cid:10) ∇ x U ⊤ ∇ x f − ∆ x f, g (cid:11) , we get using that ∇ x [ ϕ ] = 2 ϕ ∇ x ϕ ,2 h ϕ ∇ x U, ∇ x ϕ i = (cid:10) ∇ x [ ϕ ] , ∇ x U (cid:11) = k ϕ ∇ x U k − (cid:10) ϕ , ∆ x U (cid:11) . By (9) and (8), we obtain2 h ϕ ∇ x U, ∇ x ϕ i ≥ k ϕ ∇ x U k / − c d ̟ k ϕ k ≥ k ϕ ∇ x U k / − ( c d ̟ /C ) k∇ x ϕ k . From this result and (73), it follows that k ϕ ∇ x U k / − ( c d ̟ /C ) k∇ x ϕ k ≤ ε − k∇ x ϕ k + ε k ϕ ∇ x U k . Rearranging terms and setting ε = 1 / W in (71).Putting this with Proposition 33, this implies the following. Corollary 35.
Let m > and assume H H
2. For any f ∈ L ( µ ) and k ∈ { , . . . , K } , wehave (cid:13)(cid:13) F ⊤ k {∇ x ( m Id + ∇ ⋆x ∇ x ) − Π v f } (cid:13)(cid:13) ≤ / a k (cid:13)(cid:13) W {∇ x ( m Id + ∇ ⋆x ∇ x ) − Π v f } (cid:13)(cid:13) ≤ / a k κ κ k Π v f k , where a k , W , κ and κ are defined by (11) , (71) , (69) and (72) respectively.Proof. Note first that since ∇ x ( m Id + ∇ ⋆x ∇ x ) − is a bounded operator by Corollary 29, it is suffi-cient by density to show this result for f ∈ C ( E ). Let f ∈ C ( E ) and u = ( m + ∇ ⋆x ∇ x ) − Π v f . By[48, Theorem 2], u ∈ C ( X ). Second since for any t, s ≥ , s + t ≤ / √ s + t , H x ∈ X , | F k | ( x ) ≤ a k (1 + |∇ x U | ( x )) ≤ / a k W ( x ) . Therefore using Lemma 34 and Proposition 33 successively, we obtain (cid:13)(cid:13) F ⊤ k ∇ x u (cid:13)(cid:13) ≤ k | F k | ∇ x u k ≤ / a k k W ∇ x u k = 2 / a k d X i =1 k W ∂ x i u k ! / ≤ (2 / a k /κ ) d X i =1 k∇ x [ ∂ x i u ] k ! / = (2 / a k /κ ) (cid:13)(cid:13) ∇ x u (cid:13)(cid:13) ≤ (2 / a k κ /κ ) k Π v f k . D Radial distributions
The following gathers standard results on spherically symmetric distributions on $\mathbb{R}^d$ for which we could not find a single reference. In particular we establish that H

Lemma 36. Let $d \ge 2$.

(a) Assume $\nu$ is the uniform distribution on the unit hypersphere $S^{d-1}$, then
(i) for $i, j, k, l \in \{1, \dots, d\}$ such that $\mathrm{card}(\{i, j, k, l\}) > 2$, we have $\int_{S^{d-1}} v_i v_j v_k v_l \,\mathrm{d}\nu(v) = 0$,
(ii) otherwise,
$$m_2 = \frac{1}{d}, \qquad m_{2,2} = \int_{S^{d-1}} v_1^2 v_2^2 \,\mathrm{d}\nu(v) = \frac{1}{d(d+2)} \qquad \text{and} \qquad m_4 = \frac{1}{3}\int_{S^{d-1}} v_1^4 \,\mathrm{d}\nu(v) = \frac{1}{d(d+2)}.$$

(b) For any spherically symmetric distribution $\nu$, i.e. corresponding to random variables $V = B^{1/2}W$ for $W$ uniformly distributed on the unit hypersphere $S^{d-1}$ and $B$ a non-negative random variable independent of $W$ and of first and second order moments $\gamma_1$ and $\gamma_2$ respectively,
(i) for $i, j, k, l \in \{1, \dots, d\}$ such that $\mathrm{card}(\{i, j, k, l\}) > 2$, we have $\int_{\mathbb{R}^d} v_i v_j v_k v_l \,\mathrm{d}\nu(v) = 0$,
(ii) otherwise,
$$m_2 = \frac{\gamma_1}{d}, \qquad m_{2,2} = \frac{\gamma_2}{d(d+2)} \qquad \text{and} \qquad m_4 = \frac{\gamma_2}{d(d+2)}.$$

Remark 37. Naturally the zero-mean $d$-dimensional Gaussian distribution on $\mathbb{R}^d$ with covariance matrix $\mathrm{I}_d$ corresponds to $B$ distributed according to $\chi^2(d)$, in which case $m_2 = m_{2,2} = m_4$.

Proof. We use the polar parametrization of the multivariate normal distribution. Let
$$v(\phi) = \Big(\cos\phi_1,\ \sin\phi_1\cos\phi_2,\ \dots,\ \cos(\phi_k)\prod_{i=1}^{k-1}\sin(\phi_i),\ \dots,\ \prod_{i=1}^{d-1}\sin(\phi_i)\Big),$$
$\phi \in [0, \pi]^{d-2} \times [0, 2\pi]$. The probability distribution for $\phi$ ensuring uniformity of $v(\phi)$ on the surface of the $d$-sphere has density
$$f_S(\phi) \propto \prod_{i=1}^{d-2}\sin^{d-i-1}(\phi_i)\,\mathbb{1}_{[0,\pi]^{d-2}\times[0,2\pi]}(\phi),$$
with respect to the Lebesgue measure on $\mathbb{R}^{d-1}$. Let $\Phi$ be a random variable with distribution $f_S$. Further let $B \sim \chi^2(d)$ be independent of $\Phi$; then it is standard knowledge that $W = B^{1/2}v(\Phi)$ follows the zero-mean $d$-dimensional Gaussian distribution on $\mathbb{R}^d$ with covariance matrix $\mathrm{I}_d$. Therefore, by construction,
$$\mathbb{E}[W_i W_j W_k W_l] = \mathbb{E}\big[B^2 v_i(\Phi)v_j(\Phi)v_k(\Phi)v_l(\Phi)\big] = \mathbb{E}[B^2]\,\mathbb{E}\big[v_i(\Phi)v_j(\Phi)v_k(\Phi)v_l(\Phi)\big] = d(d+2)\,\mathbb{E}\big[v_i(\Phi)v_j(\Phi)v_k(\Phi)v_l(\Phi)\big],$$
and the latter term vanishes when the leftmost term does. We also deduce that
$$\mathbb{E}[W_1^2]\,\mathbb{E}[W_2^2] = \mathbb{E}[W_1^2 W_2^2] = d(d+2)\,\mathbb{E}\big[v_1^2(\Phi)v_2^2(\Phi)\big],$$
from which we obtain $\mathbb{E}[v_1^2(\Phi)v_2^2(\Phi)]$. Similarly, using properties of the moments of the normal distribution,
$$3\,\mathbb{E}[W_1^2]^2 = \mathbb{E}[W_1^4] = d(d+2)\,\mathbb{E}\big[v_1^4(\Phi)\big],$$
leading to the expression for $\mathbb{E}[v_1^4(\Phi)]$. The last statement is straightforward.

E Expectation of quadratic forms of the velocity
This section provides expressions for second order moments of quadratic forms of $v$ for a large class of distributions for which we could not find adequate references. Lemma 38.
Let M ∈ R d × d be a symmetric matrix, c ∈ R and assume the distribution ν of v issuch that(a) for any bounded and measurable function f : R → R , i, j ∈ { , . . . , d } such that i = j , R f ( v i , v j ) d ν ( v ) = R f ( v , v ) d ν ( v ) (b) for i, j, k, l ∈ { , . . . , d } , we have R v i v j v k v l d ν ( v ) = 0 whenever card( { i, j, k, l } ) > .Then (cid:13)(cid:13) v ⊤ M v − c (cid:13)(cid:13) ν = 3( m − m , )Tr( M ⊙ M ) + ( m Tr( M ) − c ) + 2 m , Tr( M ) , where ⊙ denotes the Hadamard product.Proof. Using that M is symmetric, and the expectation symbol for expectations with respect to ν , E d X i,j =1 M ij v i v j − c = d X i,j,k,ℓ =1 M ij M kℓ E [ v i v j v k v ℓ ] − c d X i,j =1 M ij E [ v i v j ] + c where d X i,j,k,ℓ =1 M ij M kℓ E [ v i v j v k v ℓ ] = 3 m d X i =1 M ii + m , X i = j M ii M jj + 2 m , X i = j M ij = (3 m − m , ) d X i =1 M ii + m , d X i,j =1 (cid:0) M ii M jj + 2 M ij (cid:1) = (3 m − m , )Tr( M ⊙ M ) + m , (cid:0) Tr( M ) + 2Tr( M ) (cid:1) . Therefore E d X i,j =1 M ij v i v j − c = (3 m − m , )Tr( M ⊙ M ) + m , Tr( M ) + 2 m , Tr( M ) − cm Tr( M ) + c , which implies the desired result. Corollary 39.
Given a symmetric matrix M ∈ R d × d and a constant c ∈ R , (cid:13)(cid:13) v ⊤ M v − m Tr( M ) (cid:13)(cid:13) ν ≤ q m , + 3( m − m , ) + | M | . Examples of potentials
Lemma 40.
Assume that the potential U is defined for any x ∈ X by U ( x ) = P di =1 (cid:0) x i (cid:1) β / ,for β ≥ . Then U is strongly convex and there exists c > , dependent on β only, such that (9) is satisfied with ̟ = 0 .Proof. We have for i, j ∈ { , . . . , d } and x ∈ X , (cid:2) ∇ x U ( x ) (cid:3) i = βx i (cid:0) x i (cid:1) β − and (cid:2) ∇ x U ( x ) (cid:3) i,j = β [1 + (2 β − x i ] (cid:0) x i (cid:1) β − δ i,j , leading to ∇ x U ( x ) (cid:23) β I d , and the strong convexity follows. Using that β ≥ s ≥ c >
0, (1 + s ) β − s ≤ (1 + c ) β − c [0 ,c ] ( s ) + (1 + s ) β − s / (1 + c ) β ( c, + ∞ ) ( s )and (1 + s ) β − ≤ { ∨ (1 + c ) β − } [0 ,c ] ( s ) + (1 + s ) β − ( s/c ) ( c, + ∞ ) ( s ), we get for any x ∈ X ,∆ x U ( x ) = Tr( ∇ x U ( x )) = β d X i =1 [1 + (2 β − x i ] (cid:0) x i (cid:1) β − ≤ βd (cid:2) { ∨ (1 + c ) β − } + (2 β − c ) β − c (cid:3) + β − |∇ x U ( x ) | (cid:2) c − + (2 β − c ) − β (cid:3) , which with c ≥ (2 β − / ) ∨ /β completes the proof. Lemma 41.
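Assume that the potential $U$ is defined for any $x \in \mathsf{X}$ by $U(x) = (1 + |x|^2)^{\beta}$ with $\beta \geq 2$. Then $U$ is strongly convex and there exists $c > 0$, dependent on $\beta$ only, such that (9) is satisfied with $\varpi = 1 - 1/\beta$.

Proof. First, we have that
$$\nabla_x U(x) = 2\beta (1 + |x|^2)^{\beta - 1} x = 2\beta U(x)^{1 - 1/\beta} x, \tag{74}$$
and
$$\nabla_x^2 U(x) = 2\beta \left[(1 - 1/\beta) U^{-1/\beta}(x) \nabla_x U(x) x^{\top} + U^{1 - 1/\beta}(x) I_d\right]. \tag{75}$$
As a result, and since $1 - \beta^{-1} \geq 0$ and $U \geq 1$,
$$(1 - \beta^{-1}) U^{-1/\beta}(x) \nabla_x U(x) x^{\top} = 2(\beta - 1) U(x)^{1 - 2/\beta} x x^{\top} \succeq 0, \qquad U^{1 - 1/\beta}(x) I_d \succeq I_d,$$
from which we conclude that for any $x \in \mathsf{X}$, $\nabla_x^2 U(x) \succeq 2\beta I_d$. It remains to show that (9) holds. First we have for any $x \in \mathsf{X}$,
$$\operatorname{Tr}(\nabla_x^2 U(x)) = 2(\beta - 1) U^{-1/\beta}(x) x^{\top} \nabla_x U(x) + 2\beta d \, U^{1 - 1/\beta}(x) \leq 2(\beta - 1) \frac{|\nabla_x U(x)| \, |x|}{1 + |x|^2} + 2\beta d \, U^{1 - 1/\beta}(x).$$
Using that for any $s \geq 0$ and $a > 0$, $2s \leq a^{-2} + (as)^2$, that $2|x| \leq 1 + |x|^2$, and that
$$(1 + s)^{\beta - 1} \leq (1 + (2d/\beta)^{1/\beta})^{\beta - 1} \mathbf{1}_{[0, (2d/\beta)^{1/\beta}]}(s) + (2d/\beta)^{-1} s^{\beta} (1 + s)^{\beta - 1} \mathbf{1}_{((2d/\beta)^{1/\beta}, +\infty)}(s) \leq (1 + (2d/\beta)^{1/\beta})^{\beta - 1} \mathbf{1}_{[0, (2d/\beta)^{1/\beta}]}(s) + (2d/\beta)^{-1} s (1 + s)^{2\beta - 2} \mathbf{1}_{((2d/\beta)^{1/\beta}, +\infty)}(s),$$
together with (74)-(75), we get for any $x \in \mathsf{X}$,
$$\operatorname{Tr}(\nabla_x^2 U(x)) \leq 2(\beta - 1) |\nabla_x U(x)| \, |x| \, (1 + |x|^2)^{-1} + 2\beta d \, U^{1 - 1/\beta}(x) \leq (\beta - 1)\left(\beta + |\nabla_x U(x)|^2/(4\beta)\right) + 2\beta d \left[(1 + (2d/\beta)^{1/\beta})^{\beta - 1} + |\nabla_x U(x)|^2/(8 d \beta)\right] \leq (\beta - 1)\beta + 2^{\beta - 1} \beta d \left(1 + (2d/\beta)^{1 - 1/\beta}\right) + |\nabla_x U(x)|^2/2,$$
where in the last step we used that $(a + b)^{\beta - 1} \leq 2^{\beta - 2}(a^{\beta - 1} + b^{\beta - 1})$ for any $a, b \geq 0$, which follows from Hölder's inequality since $\beta \geq 2$. This completes the proof.

Acknowledgments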
JR would like to thank Pierre Monmarché for showing him how ZZ and BPS fall under a general framework. CA acknowledges support from EPSRC “Intractable Likelihood: New Challenges from Modern Applications (ILike)” (EP/K014463/1). All the authors acknowledge the support of the Institute for Statistical Science in Bristol. AD acknowledges support from the Chaire BayeScale “P. Laffitte”.
References
[1] F. Achleitner, A. Arnold, and E. A. Carlen. On linear hypocoercive BGK models. In From particle systems to partial differential equations. III, volume 162 of Springer Proc. Math. Stat., pages 1–37. Springer, [Cham], 2016.
[2] C. Andrieu and S. Livingstone. Peskun-Tierney ordering for (µ, Q)-self-adjoint Markov chain and process Monte Carlo. 2018.
[3] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin. A simple proof of the Poincaré inequality for a large class of probability measures including the log-concave case. Elect. Comm. in Probab., 13:60–66, 2008.
[4] D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry of Markov diffusion operators, volume 348 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Cham, 2014.
[5] P. L. Bhatnagar, E. P. Gross, and M. Krook. A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems. Phys. Rev., 94:511–525, May 1954.
[6] J. Bierkens, P. Fearnhead, and G. Roberts. The zig-zag process and super-efficient sampling for Bayesian analysis of big data. arXiv:1607.03188, 2016.
[7] J. Bierkens, K. Kamatani, and G. O. Roberts. High-dimensional scaling limits of piecewise deterministic sampling algorithms. ArXiv e-prints, July 2018.
[8] J. Bierkens, G. Roberts, and P.-A. Zitt. Ergodicity of the zigzag process. arXiv:1712.09875, 2018.
[9] S. G. Bobkov. Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures, pages 37–43. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
[10] M. Bonnefont, A. Joulin, and Y. Ma. Spectral gap for spherically symmetric log-concave probability measures, and beyond. Journal of Functional Analysis, 270(7):2456–2482, 2016.
[11] N. Bou-Rabee and J. M. Sanz-Serna. Randomized Hamiltonian Monte Carlo. Ann. Appl. Probab., 27(4):2159–2194, 2017.
[12] A. Bouchard-Côté, S. J. Vollmer, and A. Doucet. The Bouncy Particle Sampler: a non-reversible rejection-free Markov Chain Monte Carlo method. ArXiv e-prints, 2015.
[13] E. Bouin, J. Dolbeault, S. Mischler, C. Mouhot, and C. Schmeiser. Hypocoercivity without confinement. ArXiv e-prints, Aug. 2017.
[14] E. Bouin, F. Hoffmann, and C. Mouhot. Exponential decay to equilibrium for a fiber lay-down process on a moving conveyor belt. SIAM J. Math. Anal., 49(4):3233–3251, 2017.
[15] N. Brosse, A. Durmus, E. Moulines, and S. Sabanis. The tamed unadjusted Langevin algorithm. Stochastic Processes and their Applications, 2018.
[16] E. B. Davies. Spectral theory and differential operators, volume 42 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1995.
[17] M. H. A. Davis. Piecewise-deterministic Markov processes: a general class of nondiffusion stochastic models. J. Roy. Statist. Soc. Ser. B, 46(3):353–388, 1984. With discussion.
[18] G. Deligiannidis, A. Bouchard-Côté, and A. Doucet. Exponential Ergodicity of the Bouncy Particle Sampler. ArXiv e-prints, May 2017.
[19] G. Deligiannidis, D. Paulin, and A. Doucet. Randomized Hamiltonian Monte Carlo as scaling limit of the bouncy particle sampler and dimension-free convergence rates. arXiv preprint arXiv:1808.04299, 2018.
[20] J. Dolbeault, C. Mouhot, and C. Schmeiser. Hypocoercivity for kinetic equations with linear relaxation terms. C. R. Math. Acad. Sci. Paris, 347(9-10):511–516, 2009.
[21] J. Dolbeault, C. Mouhot, and C. Schmeiser. Hypocoercivity for linear kinetic equations conserving mass. Trans. AMS, 367:3807–3828, 2015.
[22] R. Douc, É. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer International Publishing, 2019.
[23] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
[24] A. Durmus, A. Guillin, and P. Monmarché. Geometric ergodicity of the bouncy particle sampler. ArXiv e-prints, July 2018.
[25] A. Durmus, A. Guillin, and P. Monmarché. Piecewise Deterministic Markov Processes and their invariant measure. ArXiv e-prints, July 2018.
[26] J.-P. Eckmann and M. Hairer. Spectral properties of hypoelliptic operators. Comm. Math. Phys., 235(2):233–253, 2003.
[27] S. N. Ethier and T. G. Kurtz. Markov processes. Characterization and convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.
[28] J. Evans. Hypocoercivity in phi-entropy for the linear relaxation Boltzmann equation on the torus. arXiv preprint arXiv:1702.04168, 2017.
[29] A. Faggionato, D. Gabrielli, and M. Ribezzi Crivellari. Non-equilibrium thermodynamics of piecewise deterministic Markov processes. Journal of Statistical Physics, 137(2):259, Oct 2009.
[30] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian data analysis. Texts in Statistical Science Series. CRC Press, Boca Raton, FL, third edition, 2014.
[31] M. Grothaus and P. Stilgenbauer. Hypocoercivity for Kolmogorov backward evolution equations and applications. J. Funct. Anal., 267(10):3515–3556, 2014.
[32] M. Grothaus and P. Stilgenbauer. A hypocoercivity related ergodicity method for singularly distorted non-symmetric diffusions. Integral Equations Operator Theory, 83(3):331–379, 2015.
[33] M. Grothaus and P. Stilgenbauer. Hilbert space hypocoercivity for the Langevin dynamics revisited. Methods Funct. Anal. Topology, 22(2):152–168, 2016.
[34] M. Grothaus and F.-Y. Wang. Weak Poincaré inequalities for convergence rate of degenerate diffusion processes. arXiv preprint arXiv:1703.04821, 2017.
[35] D. Han-Kwan and M. Léautaud. Geometric analysis of the linear Boltzmann equation I. Trend to equilibrium. Ann. PDE, 1(1):Art. 3, 84, 2015.
[36] F. Hérau. Hypocoercivity and exponential time decay for the linear inhomogeneous relaxation Boltzmann equation. Asymptot. Anal., 46(3-4):349–359, 2006.
[37] F. Hérau and F. Nier. Isotropic hypoellipticity and trend to equilibrium for the Fokker-Planck equation with a high-degree potential. Arch. Ration. Mech. Anal., 171(2):151–218, 2004.
[38] R. Holley and D. Stroock. Logarithmic Sobolev inequalities and stochastic Ising models. J. Statist. Phys., 46(5-6):1159–1194, 1987.
[39] L. Hörmander. Hypoelliptic second order differential equations. Acta Math., 119:147–171, 1967.
[40] K. Yoshida. Functional analysis. Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen mit besonderer Berücksichtigung der Anwendungsgebiete, Bd. 123. Springer-Verlag, 6th edition, 1980.
[41] J. S. Liu. Monte Carlo strategies in scientific computing. Springer Science & Business Media, 2008.
[42] S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Springer Science & Business Media, 2012.
[43] M. Michel, S. C. Kapfer, and W. Krauth. Generalized event-chain Monte Carlo: Constructing rejection-free global-balance algorithms from infinitesimal steps. J. Chem. Phys., 140(5):054116, 2014.
[44] M. Michel and S. Sénécal. Forward Event-Chain Monte Carlo: a general rejection-free and irreversible Markov chain simulation method. arXiv preprint arXiv:1702.08397, 2017.
[45] P. Monmarché. A note on Fisher information hypocoercive decay for the linear Boltzmann equation. arXiv preprint arXiv:1703.10504, 2017.
[46] C. Mouhot and L. Neumann. Quantitative perturbative study of convergence to equilibrium for collisional kinetic models in the torus. Nonlinearity, 19(4):969–998, 2006.
[47] R. O'Donnell. Analysis of Boolean functions. Cambridge University Press, New York, 2014.
[48] E. Pardoux and Y. Veretennikov. On the Poisson equation and diffusion approximation. I. Ann. Probab., 29(3):1061–1085, July 2001.
[49] G. K. Pedersen. Analysis now, volume 118. Springer Science & Business Media, 1995.
[50] A. Persson. Bounds for the discrete part of the spectrum of a semi-bounded Schrödinger operator. Mathematica Scandinavica, 8(1):143–153, 1960.
[51] E. A. J. F. Peters and G. de With. Rejection-free Monte Carlo sampling for general potentials. Phys. Rev. E, 85:026703, Feb 2012.
[52] S. Redon, G. Stoltz, and Z. Trstanova. Error analysis of modified Langevin dynamics. J. Stat. Phys., 164(4):735–771, 2016.
[53] M. Reed and B. Simon. Methods of Modern Mathematical Physics: Functional Analysis. Academic Press, 1972.
[54] C. Robert and G. Casella. Monte Carlo Statistical Methods. Springer Science & Business Media, 2013.
[55] P. Vanetti, A. Bouchard-Côté, G. Deligiannidis, and A. Doucet. Piecewise Deterministic Markov Chain Monte Carlo. arXiv preprint arXiv:1707.05296, 2017.
[56] C. Villani. Hypocoercive diffusion operators. In International Congress of Mathematicians, volume 3, pages 473–498, 2006.
[57] C. Villani. Hypocoercivity. Mem. Amer. Math. Soc., 202(950), 2009.
[58] C. Wu and C. P. Robert. Generalized bouncy particle sampler. arXiv preprint arXiv:1706.04781.