Approximating the Markov Chain of the Curie-Weiss Model

Abstract

In this paper, we quantify a known approximation to the Curie-Weiss model by applying the Stein method to the Markov chain whose stationary distribution coincides with the Curie-Weiss model.


arXiv [math.PR], Feb 2021. Applied Probability Trust (22 February 2021)


YINGDONG LU, ∗ IBM Research


Keywords:

1. Introduction

For an integer $n > 0$, $\beta \ge 0$ and $h \in \mathbb{R}$, the Curie-Weiss model for $n$ spins at temperature $1/\beta$ is the probability measure on $S_n := \{-1, 1\}^n$ given by
$$\pi_n(x) = \frac{1}{Z_n}\exp[-\beta H_n(x)], \quad \forall x \in S_n, \qquad H_n(x) = -\frac{1}{2n}\sum_{i,j=1}^n x_i x_j - h\sum_{i=1}^n x_i, \quad \forall x \in S_n,$$
where $Z_n$ is the normalization constant. The parameter $h$ is the external magnetic field. This is an important mathematical model for studying the interaction of electron spins in real ferromagnets, and it is sometimes also called the Ising model on the complete graph. For detailed physics background and a thorough analysis of the Curie-Weiss model, see, e.g., [5]. The system has two phases determined by the temperature: $0 \le \beta < 1$ is known as the supercritical phase, and $\beta = 1$ is known as the critical phase.

∗ Postal address: 1101 Kitchawan Rd, Yorktown Heights, NY 10598. Email address: [email protected]

The final results for the two phases are different, but the analyses have much in common, so we treat the phases separately only when necessary. Define the one-dimensional quantity
$$m_n(x) := \frac{1}{n}\sum_{i=1}^n x_i,$$
known as the magnetization of the system. Then $H_n(x)$ can be rewritten as
$$H_n(x) = -n\left[\frac{1}{2}(m_n(x))^2 + h\,m_n(x)\right].$$
The mass concentrates at the minimizer of the function
$$i(m) = -\left(\frac{\beta}{2}m^2 + \beta h m\right) + \frac{1-m}{2}\log(1-m) + \frac{1+m}{2}\log(1+m),$$
which is characterized by
$$\beta m + \beta h = \frac{1}{2}\log\frac{1+m}{1-m}, \tag{1}$$
or, equivalently, $m = \tanh(\beta(m+h))$. Throughout, $m$ denotes this minimizer.

It has been observed (see, e.g., [1]) that, for certain parameter ranges and under proper scaling, the Markov chain that generates this distribution converges to a diffusion process. Hence, the stationary distribution of the diffusion process can be viewed as a good approximation of the Curie-Weiss model. The key to this approximation is the concentration effect, proved (see, e.g., [3, 4]) for the supercritical case and for a special case of the critical system. Concentration in the other cases is known to be more difficult and will be part of our future research.

In this paper, we aim to quantify the discrepancy between the stationary distributions by applying the Stein method. The analysis relies on a detailed study of the infinitesimal generators of the two processes. Process-level approximation can be achieved by a semigroup expansion method, as shown in \cite{princeton}. The Stein method, in contrast, requires a more detailed regularity analysis of the solution to the Poisson equation associated with the generator. More specifically, we develop a Stein method approximation through a Markov chain generated by the Metropolis-Hastings method used in simulation. The stationary distribution of this Markov chain is $\pi_n$. Meanwhile, $\pi_\infty$ denotes the stationary distribution of a diffusion process $Y(t)$ and is also denoted by $Y(\infty)$. To estimate the difference between $\pi_n$ and $Y(\infty)$ in the weak sense, we need to quantify $|\mathbb{E}[h(\pi_n)] - \mathbb{E}[h(Y(\infty))]|$ for an arbitrary bounded function $h$.
The key idea of the Stein method is to reduce the estimation of this quantity to that of $|\mathbb{E}[G_\infty f_h(\pi_n)] - \mathbb{E}[G_n f_h(\pi_n)]|$. Here $G_n$ and $G_\infty$ are the generators of the Markov chain and of the diffusion, and $f_h$ is the solution to the Stein equation $G_\infty f_h = \mathbb{E}[h(Y(\infty))] - h$. The reduction uses the stationarity identity $\mathbb{E}[G_n f_h(\pi_n)] = 0$. The hope is that the structural properties of $f_h$ reveal information that aids the calculation.

Chatterjee obtained sharper results on the distribution function, but with techniques that cannot easily be generalized. The results in this paper instead concern general functions evaluated at the stationary distribution. Similar methods were used by Braverman and Dai [2] and by Gurvich [6, 7] for stochastic processing systems and networks.

In the rest of the paper, we first introduce the Markov chain in Sec. 2. We then discuss the Stein method and present the results on approximating the stationary distributions in Sec. 3.
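Both objects introduced so far can be computed exactly for moderate $n$: the fixed point $m$ of (1), and the measure $\pi_n$, which depends on $x$ only through $m_n(x)$. The sketch below is our own illustration and not part of the paper's method. The helper names `fixed_point` and `magnetization_law` are hypothetical. The fixed-point iteration is a contraction, and converges quickly, only for $\beta < 1$; for $\beta = 1$ and $h = 0$ the solution is simply $m = 0$.

```python
import math


def fixed_point(beta, h, m0=0.5, iters=2000):
    """Solve m = tanh(beta * (m + h)), i.e. equation (1), by fixed-point iteration.

    For 0 <= beta < 1 the map is a contraction, so the iteration converges.
    """
    m = m0
    for _ in range(iters):
        m = math.tanh(beta * (m + h))
    return m


def magnetization_law(n, beta, h):
    """Exact law of m_n(x) for x ~ pi_n, returned as (values, probabilities).

    With k spins equal to +1, m_n = (2k - n)/n and the level set has total weight
    C(n, k) * exp(beta * n * (m_n**2 / 2 + h * m_n)), i.e. C(n, k) * exp(-beta * H_n).
    Log-space arithmetic avoids overflow for large n.
    """
    values, log_w = [], []
    for k in range(n + 1):
        mn = (2 * k - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        values.append(mn)
        log_w.append(log_binom + beta * n * (0.5 * mn * mn + h * mn))
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return values, [v / z for v in w]


beta, h = 0.5, 0.3
m = fixed_point(beta, h)
values, probs = magnetization_law(2000, beta, h)
mean = sum(v * p for v, p in zip(values, probs))
print(f"m = {m:.6f}, E[m_n] under pi_2000 = {mean:.6f}")
```

The mean of $m_n$ under $\pi_n$ should sit close to $m$, which reflects the concentration effect mentioned above.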

2. The Markov Chain

In this section, we present the Markov chain whose stationary distribution is the Curie-Weiss model. We then write out its scaled and centered version and introduce some of its basic properties.

First, let $X^n_t$ denote a Markov chain with state space $S_n$ and transition probabilities
$$P_{x,y} = \begin{cases} \frac{1}{n}, & y \in N(x),\\ 0, & y \notin N(x), \end{cases} \tag{2}$$
where $N(x) = \bigcup_{k=1}^n \{y \in S_n : y \ne x,\ y - x = \pm 2e_k\}$. That is, at each step the chain picks one of its $n$ coordinates uniformly at random (each with probability $1/n$) and flips it.

Let $Y^n_t = \eta_n(X^n_t)$, where
$$m_n(x) := \frac{1}{n}\sum_{k=1}^n x_k, \qquad \eta_n(x) := n^{\gamma}(m_n(x) - m)$$
is the centered and rescaled magnetization. One flip changes $m_n$ by $\pm 2/n$, so $\eta_n$ moves by $\pm 2n^{\gamma-1}$. Its (one-step) transition probabilities are
$$Q_n(\eta_n, \eta_n + 2n^{\gamma-1}) = \frac{1}{2}\left[1 - (m + n^{-\gamma}\eta_n)\right], \qquad Q_n(\eta_n, \eta_n - 2n^{\gamma-1}) = \frac{1}{2}\left[1 + (m + n^{-\gamma}\eta_n)\right].$$
Finally, $Z^n_t$, the Metropolis-Hastings version of this Markov chain, has transition probabilities
$$P_n(\eta_n, \eta_n \pm 2n^{\gamma-1}) = Q_n(\eta_n, \eta_n \pm 2n^{\gamma-1})\left(1 \wedge \exp\left[\beta\left(\Phi_n(\eta_n) - \Phi_n(\eta_n \pm 2n^{\gamma-1})\right)\right]\right),$$
with $\Phi_n(\eta) = -\frac{1}{2}n^{1-2\gamma}\eta^2 - n^{1-\gamma}(m+h)\eta$. Up to an additive constant, $\Phi_n$ is $H_n$ written as a function of $\eta$. Let $\delta := n^{\gamma-1}$, so that $n\delta = n^{\gamma}$, and define
$$p^n_\pm(\eta) := P_n(\eta, \eta \pm 2n^{\gamma-1}) = \frac{1}{2}\left[1 \mp \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\pm\frac{2\eta}{n\delta} \pm 2m \pm 2h + \frac{2}{n}\right)\right]\right\}.$$
Furthermore, the Markov chain is sped up in time by a factor $n^{\alpha}$. The relationship under which a meaningful process limit can be established was identified in [1]; more specifically, $\alpha = 2(1 - \gamma)$.

The concentration of the Curie-Weiss model, a known result of Chatterjee, determines that $\gamma = 1/2$ for $0 \le \beta < 1$ and $\gamma = 1/4$ for $\beta = 1$. More specifically, we have the following.

Lemma [Chatterjee].
1. For all $\beta \ge 0$ and $h \in \mathbb{R}$, and for any $t \ge 0$,
$$\pi_n\left(|m_n - \tanh(\beta(m_n + h))| \ge \frac{\beta}{n} + \frac{t}{\sqrt{n}}\right) \le 2\exp\left(-\frac{t^2}{4(1+\beta)}\right). \tag{3}$$
2. For $h = 0$ and $\beta = 1$, there is a constant $c > 0$ such that, for all $t \ge 0$,
$$\pi_n\left(|m_n| \ge t^{1/4}\right) \le e^{-cnt}. \tag{4}$$

These tail estimates guarantee that the moments are bounded, which will be useful later.

Lemma 2.1.

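The moments are finite. More specifically, let $Z^n$ be distributed according to the stationary distribution of $Z^n_t$, i.e., $Z^n = \eta_n(x)$ with $x \sim \pi_n$. There exists a constant $C_m > 0$ such that, for every integer $k \ge 1$,
$$\mathbb{E}\left[|Z^n|^k\right] \le C_m^k\, k!!, \tag{5}$$
where $k!!$ denotes the double factorial of $k$, i.e., the product of all integers from 1 to $k$ with the same parity as $k$.

Proof. We discuss the critical and supercritical cases separately, writing $\pi[\cdot]$ for probabilities of events involving $\eta = \eta_n(x)$ with $x \sim \pi_n$.

In the critical case, $m = 0$ and $\gamma = 1/4$, so $\eta = n^{1/4}m_n$. For any $a > 0$, inequality (4) with $t = n^{4a-1}$ gives
$$\pi[|\eta| \ge n^{a}] = \pi_n\left[|m_n| \ge n^{a-1/4}\right] \le \exp(-cn^{4a}).$$
For any integer $k$,
$$\pi[|\eta| \ge k] = \pi_n\left[|m_n| \ge \frac{k}{n^{1/4}}\right] = \pi_n\left[|m_n| \ge \left(\frac{k^4}{n}\right)^{1/4}\right] \le e^{-ck^4},$$
where the inequality follows from (4) with $t = k^4/n$. This ensures the moments are finite.

In the supercritical case, $\gamma = 1/2$ and $\beta < 1$. The map $g(x) = x - \tanh(\beta(x+h))$ satisfies $g(m) = 0$ and $g'(x) \ge 1 - \beta$, so $|m_n - m| \le (1-\beta)^{-1}|m_n - \tanh(\beta(m_n+h))|$. Hence, for any integer $k$ with $(1-\beta)k \ge \beta/\sqrt{n}$, inequality (3) gives
$$\pi[|\eta| \ge k] = \pi_n\left(|m_n - m| \ge \frac{k}{\sqrt{n}}\right) \le \pi_n\left(|m_n - \tanh(\beta(m_n+h))| \ge \frac{\beta}{n} + \frac{(1-\beta)k - \beta/\sqrt{n}}{\sqrt{n}}\right) \le 2\exp\left(-\frac{\left((1-\beta)k - \beta/\sqrt{n}\right)^2}{4(1+\beta)}\right).$$
This implies the moments are finite. □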
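As a rough numerical sanity check of Lemma 2.1 in the critical case ($\beta = 1$, $h = 0$, hence $m = 0$ and $\gamma = 1/4$), the absolute moments of $\eta = n^{1/4}m_n$ can be computed exactly from the law of $m_n$; they should stay bounded as $n$ grows. This is a sketch under our own choice of $n$ values and moment orders. `eta_abs_moments` is a hypothetical helper, and the computation illustrates the lemma but does not prove it.

```python
import math


def eta_abs_moments(n, orders, beta=1.0, h=0.0, m=0.0, gamma=0.25):
    """Return {k: E|eta|^k} for eta = n**gamma * (m_n - m) under pi_n (exact sum).

    The defaults are the critical case beta = 1, h = 0, where m = 0 and gamma = 1/4.
    """
    log_w, etas = [], []
    for j in range(n + 1):
        mn = (2 * j - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        log_w.append(log_binom + beta * n * (0.5 * mn * mn + h * mn))
        etas.append(n ** gamma * (mn - m))
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # unnormalized, overflow-safe weights
    z = sum(w)
    return {k: sum(wj * abs(e) ** k for wj, e in zip(w, etas)) / z for k in orders}


for n in (250, 1000, 4000, 16000):
    moments = eta_abs_moments(n, (2, 4, 6))
    print(n, {k: round(v, 3) for k, v in moments.items()})
```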

Remark 2.1.

The bound is not necessarily optimal, but it suffices for our purposes in this paper.

The dynamics of the Markov chain above indicate that the quantities $[p^n_+(\eta) + p^n_-(\eta)]$ and $[p^n_+(\eta) - p^n_-(\eta)]$ play prominent roles in quantifying the process. Up to the time and space scaling, they govern the diffusion coefficient and the drift, respectively. In the following, we analyze these two quantities in detail.
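Both quantities can be evaluated directly from the definition of $p^n_\pm$, writing $m_n = m + \eta/(n\delta)$ with $n\delta = n^\gamma$. The sketch below uses hypothetical helper names (`p_plus_minus`, `detailed_balance_gap`) and parameter values of our own choosing. It also checks numerically that the $\eta$-chain with these transition probabilities satisfies detailed balance with respect to the law of $\eta_n(x)$, $x \sim \pi_n$. This is what one expects from a Metropolis-Hastings chain built on the symmetric single-spin-flip proposal.

```python
import math


def fixed_point(beta, h, iters=2000):
    """Solve m = tanh(beta * (m + h)) (equation (1)); intended for beta < 1."""
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * (m + h))
    return m


def p_plus_minus(eta, n, beta, h, m, gamma):
    """Metropolis-Hastings move probabilities (p_+, p_-) for eta -> eta +/- 2 n**(gamma - 1)."""
    mn = m + eta / n ** gamma  # m_n = m + eta / (n delta)
    up = 0.5 * (1.0 - mn) * min(1.0, math.exp(beta * (2.0 * (mn + h) + 2.0 / n)))
    down = 0.5 * (1.0 + mn) * min(1.0, math.exp(beta * (-2.0 * (mn + h) + 2.0 / n)))
    return up, down


def detailed_balance_gap(n, beta, h, gamma):
    """Max over levels k of |log(pi(k) p_+(k)) - log(pi(k+1) p_-(k+1))|.

    For beta = 1 this assumes h = 0 (critical case), so that m = 0.
    """
    m = fixed_point(beta, h) if beta < 1 else 0.0

    def log_mass(k):
        mn = (2 * k - n) / n
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + beta * n * (0.5 * mn * mn + h * mn))

    def eta_of(k):
        return n ** gamma * ((2 * k - n) / n - m)

    gap = 0.0
    for k in range(n):
        up, _ = p_plus_minus(eta_of(k), n, beta, h, m, gamma)
        _, down = p_plus_minus(eta_of(k + 1), n, beta, h, m, gamma)
        gap = max(gap, abs(log_mass(k) + math.log(up) - log_mass(k + 1) - math.log(down)))
    return gap


print(detailed_balance_gap(200, 0.7, 0.1, 0.5))   # supercritical example
print(detailed_balance_gap(200, 1.0, 0.0, 0.25))  # critical example
```

Both printed gaps should be at the level of floating-point round-off.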

Lemma 2.2.

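The sum $p^n_+(\eta) + p^n_-(\eta)$ is given by
$$p^n_+(\eta) + p^n_-(\eta) = \frac{1}{2}(1-m)\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \tag{6}$$
Proof.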

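As in the proof of Lemma 2.3, assume $m + h > 0$ and $n$ large enough. Then an upward move is accepted with probability 1, and a downward move is accepted with the exponential probability. We have
$$\begin{aligned} p^n_+(\eta) + p^n_-(\eta) &= \frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right] + \frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\\ &= \frac{1}{2}(1-m) + \frac{1}{2}(1+m)\exp[-2\beta(m+h)]\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right] - \frac{\eta}{2n\delta}\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\ &= \frac{1}{2}(1-m)\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\ &= \frac{1}{2}(1-m)\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \end{aligned}$$
The first equality follows from the definition of $p^n_\pm$, and the second is simple algebraic manipulation. The last two use the relation $\exp[2\beta(m+h)] = (1+m)/(1-m)$, which is equivalent to (1). □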
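Identity (6) can be checked numerically against the definition of $p^n_\pm$ in the regime used in the proof: $m + h > 0$, $n$ large, and $|\eta|$ moderate. Below is a minimal sketch with hypothetical helper names and our own choice of $\beta = 0.5$, $h = 0.3$, $\gamma = 1/2$, $n = 10^4$.

```python
import math


def fixed_point(beta, h, iters=2000):
    """Solve equation (1), m = tanh(beta * (m + h)); converges for beta < 1."""
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * (m + h))
    return m


def p_sum_direct(eta, n, beta, h, m, gamma):
    """p_+ + p_- straight from the definition, including the min(1, .) acceptance."""
    mn = m + eta / n ** gamma
    up = 0.5 * (1 - mn) * min(1.0, math.exp(beta * (2 * (mn + h) + 2 / n)))
    down = 0.5 * (1 + mn) * min(1.0, math.exp(beta * (-2 * (mn + h) + 2 / n)))
    return up + down


def p_sum_formula(eta, n, beta, m, gamma):
    """Right-hand side of (6), with x = eta / (n delta) = eta / n**gamma."""
    x = eta / n ** gamma
    e = math.exp(beta * (-2 * x + 2 / n))
    return 0.5 * (1 - m) * (1 + e) - 0.5 * x * (1 - (1 - m) / (1 + m) * e)


beta, h, n, gamma = 0.5, 0.3, 10_000, 0.5
m = fixed_point(beta, h)
worst = max(abs(p_sum_direct(eta, n, beta, h, m, gamma) - p_sum_formula(eta, n, beta, m, gamma))
            for eta in [i / 10 for i in range(-30, 31)])
print(f"max |direct - (6)| = {worst:.2e}")
```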

Expanding the exponential function gives the following order estimates.

Corollary 2.1.

We have the following evaluations of the terms in expression (6).
(i) The zeroth-order term is $(1-m)$.
(ii) The $(n\delta)^{-1}$ term is $-\eta\left[\beta(1-m) + \frac{m}{1+m}\right](n\delta)^{-1}$.
 • In the critical case, $\delta = n^{-3/4}$, so the $(n\delta)^{-1}$ terms are of order $n^{-1/4}$.
 • In the supercritical case, $\delta = n^{-1/2}$, so the $(n\delta)^{-1}$ terms are of order $n^{-1/2}$.
(iii) All remaining terms are of higher order.
In other words,
$$p^n_+(\eta) + p^n_-(\eta) = (1-m) + \mathcal{E}_1(\eta), \qquad \mathcal{E}_1(\eta) = -\eta\left[\beta(1-m) + \frac{m}{1+m}\right](n\delta)^{-1} + O\big((n\delta)^{-2}\big).$$
Proof. Items (i) and (ii) follow from the expansion $\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right] = 1 - \frac{2\beta\eta}{n\delta} + O\big((n\delta)^{-2}\big)$, and (iii) follows from Lemma 2.1. □

Lemma 2.3.

The difference $p^n_+(\eta) - p^n_-(\eta)$ is given by
$$p^n_+(\eta) - p^n_-(\eta) = \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \tag{7}$$
Proof.

Recall that
$$p^n_\pm(\eta) = \frac{1}{2}\left[1 \mp \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\pm\frac{2\eta}{n\delta} \pm 2m \pm 2h + \frac{2}{n}\right)\right]\right\}.$$
Suppose $2m + 2h > 0$. Then, for large enough $n$, we have $\frac{2\eta}{n\delta} + 2m + 2h + \frac{2}{n} \ge 0$ and $-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n} \le 0$, so
$$p^n_+(\eta) - p^n_-(\eta) = \frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right] - \frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right].$$
Meanwhile, by (1), $\exp[2\beta(m+h)] = \frac{1+m}{1-m}$. Hence
$$\begin{aligned} p^n_+(\eta) - p^n_-(\eta) &= \frac{1}{2}(1-m) - \frac{1}{2}(1+m)\exp[-2\beta(m+h)]\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right] - \frac{\eta}{2n\delta}\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\ &= \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\ &= \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \end{aligned}$$
The case $2m + 2h < 0$ is handled similarly. □

As with Corollary 2.1, we have the following.

Corollary 2.2.

The terms of (7) can be evaluated as follows.
 • In the critical case $h = 0$ and $\beta = 1$, we have $m = 0$, $\alpha = 3/2$ and $\gamma = 1/4$, and
$$2n^{3/4}[p^n_+(\eta) - p^n_-(\eta)] = \left[-\frac{2}{3}n^{-3/4}\eta^3 + \frac{4}{3}n^{-1}\eta^4\right]n^{3/4} + O(n^{-1/4}).$$
 • In the supercritical case ($\gamma = 1/2$),
$$p^n_+(\eta) - p^n_-(\eta) = -\eta\left[\frac{1}{1+m} - \beta(1-m)\right]n^{-1/2} - (1-m)\beta\left[(1 + \beta\eta^2) - \frac{\eta^2}{1+m}\right]n^{-1} + O(n^{-3/2}).$$
In other words, in the supercritical case,
$$2n^{1/2}[p^n_+(\eta) - p^n_-(\eta)] = -2\eta\left[\frac{1}{1+m} - \beta(1-m)\right] + \mathcal{E}_2(\eta), \qquad \mathcal{E}_2(\eta) = O(n^{-1/2}).$$

Proof. In the critical case $h = 0$ and $\beta = 1$, we also have $m = 0$, $\alpha = 3/2$ and $\gamma = 1/4$. The following estimate then follows directly from the expansion of the exponential function:
$$\begin{aligned} 2n^{3/4}[p^n_+(\eta) - p^n_-(\eta)] &= \left[(1 - n^{-1/4}\eta) - (1 + n^{-1/4}\eta)\left(1 - 2n^{-1/4}\eta + 2n^{-1/2}\eta^2 - \frac{4}{3}n^{-3/4}\eta^3\right)\right]n^{3/4} + O(n^{-1/4})\\ &= \left[-\frac{2}{3}n^{-3/4}\eta^3 + \frac{4}{3}n^{-1}\eta^4\right]n^{3/4} + O(n^{-1/4}). \end{aligned}$$
The first term, $-\frac{2}{3}\eta^3$, is the desired drift term in the generator, so we only need to estimate the second term, i.e., $\mathbb{E}[\eta^4 n^{-1/4}]$. Its order again follows from Lemma 2.1.

Now consider the supercritical case. With $\gamma = 1/2$,
$$1 - \exp\left[\beta\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right)\right] = -\beta\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right) - \frac{\beta^2}{2}\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right)^2 + O\left(\frac{1}{n^{3/2}}\right) = \frac{2\beta\eta}{\sqrt{n}} - (2\beta + 2\beta^2\eta^2)\frac{1}{n} + O\left(\frac{1}{n^{3/2}}\right).$$
Plugging this into (7), we get
$$p^n_+(\eta) - p^n_-(\eta) = -\eta\left[\frac{1}{1+m} - \beta(1-m)\right]\frac{1}{\sqrt{n}} - (1-m)\beta\left[(1 + \beta\eta^2) - \frac{\eta^2}{1+m}\right]\frac{1}{n} + O\left(\frac{1}{n^{3/2}}\right). \qquad □$$
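A similar check applies to (7) and to the leading terms of Corollaries 2.1 and 2.2. The sketch below uses hypothetical helper names and parameters of our own choosing. It compares $p^n_+ - p^n_-$ computed from the definition with (7). It also compares the scaled drift $2n^{\alpha}\delta[p^n_+ - p^n_-]$ and diffusion $2n^{\alpha}\delta^2[p^n_+ + p^n_-]$ coefficients, for jumps of size $\pm 2\delta$, with their first-order limits obtained by expanding (6) and (7):
- $-2\left[\frac{1}{1+m} - \beta(1-m)\right]\eta$ and $2(1-m)$ in the supercritical case;
- $-\frac{2}{3}\eta^3$ in the critical case.

The tolerances only reflect the orders of the remainders.

```python
import math


def fixed_point(beta, h, iters=2000):
    """Solve equation (1), m = tanh(beta * (m + h)); converges for beta < 1."""
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * (m + h))
    return m


def p_pm(eta, n, beta, h, m, gamma):
    """(p_+, p_-) straight from the definition."""
    mn = m + eta / n ** gamma
    up = 0.5 * (1 - mn) * min(1.0, math.exp(beta * (2 * (mn + h) + 2 / n)))
    down = 0.5 * (1 + mn) * min(1.0, math.exp(beta * (-2 * (mn + h) + 2 / n)))
    return up, down


def p_diff_formula(eta, n, beta, m, gamma):
    """Right-hand side of (7), with x = eta / (n delta)."""
    x = eta / n ** gamma
    e = math.exp(beta * (-2 * x + 2 / n))
    return 0.5 * (1 - m) * (1 - e) - 0.5 * x * (1 + (1 - m) / (1 + m) * e)


# Supercritical example: gamma = 1/2, alpha = 1, so n^alpha delta = sqrt(n), n^alpha delta^2 = 1.
beta, h = 0.5, 0.3
m = fixed_point(beta, h)
gap7 = 0.0
for eta in [i / 10 for i in range(-30, 31)]:
    up, down = p_pm(eta, 10_000, beta, h, m, 0.5)
    gap7 = max(gap7, abs((up - down) - p_diff_formula(eta, 10_000, beta, m, 0.5)))

n_big, eta = 1e8, 1.0
up, down = p_pm(eta, n_big, beta, h, m, 0.5)
drift = 2 * math.sqrt(n_big) * (up - down)
drift_limit = -2 * eta * (1 / (1 + m) - beta * (1 - m))
diffusion = 2 * (up + down)
diffusion_limit = 2 * (1 - m)

# Critical example: beta = 1, h = 0, m = 0, gamma = 1/4, so n^alpha delta = n^(3/4).
n_crit = 1e12
up_c, down_c = p_pm(eta, n_crit, 1.0, 0.0, 0.0, 0.25)
drift_crit = 2 * n_crit ** 0.75 * (up_c - down_c)

print(gap7, drift - drift_limit, diffusion - diffusion_limit, drift_crit + 2 / 3 * eta ** 3)
```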

3. Approximation via the Stein Method

In this section, we present the result on approximating the stationary distribution via the Stein method. First, we give a quantitative characterization of the solution to the Stein equation in Sec. 3.1; then we state and prove the main result in Sec. 3.2.

To study the solution to the Stein equation (??), we examine a more general equation. Let $a(x)$ and $b(x)$ be functions such that $a(x)$ and $b(x)/a(x)$ are absolutely continuous and $\frac{1}{a(y)}\exp\left(\int_0^y \frac{2b(u)}{a(u)}du\right)$ is integrable. Consider the equation
$$\frac{1}{2}a(x)f_h''(x) + b(x)f_h'(x) = \mathbb{E}h(Y) - h(x), \tag{8}$$
with $\lim_{x\to-\infty} f_h(x) = 0$ and $\lim_{x\to-\infty} f_h'(x) = 0$. Here $\frac{C_Y}{a(y)}\exp\left(\int_0^y \frac{2b(u)}{a(u)}du\right)$ is the density of the random variable $Y$, and $C_Y$ is the normalizing constant. To shorten notation, write $P(y) := \int_0^y \frac{2b(u)}{a(u)}du$.

From basic differential equation calculations, the solution to (8) can be written as
$$f_h(x) = \int_{-\infty}^x \int_{-\infty}^y e^{P(z) - P(y)}\,\frac{2[\mathbb{E}h(Y) - h(z)]}{a(z)}\,dz\,dy.$$
Hence,
$$f_h'(x) = e^{-P(x)}\int_{-\infty}^x e^{P(y)}\,\frac{2[\mathbb{E}h(Y) - h(y)]}{a(y)}\,dy, \tag{9}$$
$$f_h''(x) = -\frac{2b(x)}{a(x)}f_h'(x) + \frac{2[\mathbb{E}h(Y) - h(x)]}{a(x)}, \tag{10}$$
$$f_h'''(x) = -\left(\frac{2b(x)}{a(x)}\right)'f_h'(x) - \frac{2b(x)}{a(x)}f_h''(x) - \frac{2h'(x)}{a(x)} - \frac{2a'(x)[\mathbb{E}h(Y) - h(x)]}{a(x)^2}. \tag{11}$$
Re-examining the first derivative $f_h'$, we find
$$f_h'(x) = C_Y e^{-P(x)}\int_{-\infty}^x\int_{-\infty}^{\infty} e^{P(y) + P(z)}\,\frac{2[h(z) - h(y)]}{a(z)a(y)}\,dz\,dy.$$
For $x \ge 0$, rewriting in terms of tail integrals gives
$$f_h'(x) = \frac{2}{C_Y}e^{-P(x)}\left[\bar F^{(h)}(x) - \mathbb{E}h(Y)\bar F(x)\right],$$
where
$$F(x) = C_Y\int_{-\infty}^x \frac{e^{P(z)}}{a(z)}dz, \quad F^{(h)}(x) = C_Y\int_{-\infty}^x \frac{e^{P(z)}}{a(z)}h(z)\,dz, \quad \bar F(x) = 1 - F(x), \quad \bar F^{(h)}(x) = \mathbb{E}h(Y) - F^{(h)}(x).$$
Since
$$\left[\frac{e^{P(x)}}{a(x)}\right]' = \frac{e^{P(x)}}{a(x)}\cdot\frac{2b(x) - a'(x)}{a(x)},$$
integration by parts gives, for $x \ge 0$,
$$\bar F(x) \le -C_Y\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}\frac{e^{P(x)}}{a(x)}.$$
The inequality is due to the monotonicity assumption on $\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}$.

Lemma 3.1.

Under the condition that $\frac{2b(x) - a'(x)}{a(x)}$ is strictly positive and increasing, we have
$$f_h'(x) \le -\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}(\|h\| + \|h\|_\infty), \qquad f_h''(x) \le \frac{2b(x)}{a(x)}\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}(\|h\| + \|h\|_\infty).$$
From the parameters of our problem, together with (10) and (11), we conclude the following.

Corollary 3.1.

There exists a constant $C > 0$ such that $|f_h'''(x)| \le C$ for all $x \in \mathbb{R}$.

Remark 3.1.

Our problem satisfies the condition on the density. Similarly, we can also bound the second derivative.
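Formulas (9) and (10) can be checked numerically on a concrete example. The sketch below takes $a(x) = 4$ and $b(x) = -\frac{2}{3}x^3$ and the test function $h = \cos$. This example is our own choice: it has the form of the critical-case limit generator $-\frac{2}{3}x^3 f' + 2f''$, and it gives $P(y) = \int_0^y \frac{2b(u)}{a(u)}du = -y^4/12$ in closed form.

The procedure is:
1. Evaluate $f_h'$ from (9) with a cumulative trapezoidal rule on a truncated grid.
2. Differentiate it by central differences.
3. Report the largest residual of (8) on $[-2, 2]$.

The grid size, the truncation $L = 6$ and the helper name `stein_residual` are assumptions. Near the right end of the grid, $e^{-P(x)}$ amplifies round-off, which is why the residual is inspected only on an inner interval.

```python
import math


def stein_residual(h_fn, L=6.0, N=6001, x_check=2.0):
    """Solve (1/2) a f'' + b f' = E h(Y) - h via (9); return (E h(Y), max residual of (8)).

    Example coefficients (an assumption for illustration): a(x) = 4, b(x) = -(2/3) x^3,
    so that P(y) = int_0^y 2 b(u) / a(u) du = -y^4 / 12 exactly.
    """
    a = 4.0
    dx = 2 * L / (N - 1)
    xs = [-L + i * dx for i in range(N)]
    P = [-(x ** 4) / 12 for x in xs]
    hs = [h_fn(x) for x in xs]

    def trapezoid(vals):
        return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    # E h(Y) for the density C_Y e^{P(y)} / a(y), truncated to [-L, L].
    dens = [math.exp(p) / a for p in P]
    Eh = trapezoid([d * v for d, v in zip(dens, hs)]) / trapezoid(dens)

    # (9): f'(x) = e^{-P(x)} int_{-L}^x e^{P(y)} 2 [E h(Y) - h(y)] / a(y) dy.
    integrand = [math.exp(p) * 2 * (Eh - v) / a for p, v in zip(P, hs)]
    cum = [0.0]
    for i in range(1, N):
        cum.append(cum[-1] + 0.5 * dx * (integrand[i - 1] + integrand[i]))
    fprime = [math.exp(-p) * c for p, c in zip(P, cum)]

    # Residual of (8), with f'' from central differences, on |x| <= x_check only.
    worst = 0.0
    for i in range(1, N - 1):
        x = xs[i]
        if abs(x) > x_check:
            continue
        fsecond = (fprime[i + 1] - fprime[i - 1]) / (2 * dx)
        b = -(2.0 / 3.0) * x ** 3
        worst = max(worst, abs(0.5 * a * fsecond + b * fprime[i] - (Eh - hs[i])))
    return Eh, worst


Eh, worst = stein_residual(math.cos)
print(f"E cos(Y) = {Eh:.4f}, max residual of (8) on [-2, 2] = {worst:.1e}")
```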

For the Curie-Weiss model, this method leads to the following result.

Theorem 3.1.

For appropriate parameter ranges, we have the following estimates.
 • For the critical case, i.e., $\beta = 1$,
$$|\mathbb{E}[h(X(\infty))] - \mathbb{E}[h(Y(\infty))]| \le \frac{C}{n^{1/4}}. \tag{12}$$
 • For the supercritical case,
$$|\mathbb{E}[h(X(\infty))] - \mathbb{E}[h(Y(\infty))]| \le \frac{C}{n^{1/2}}. \tag{13}$$

Proof.

Denote the generator of the $n$-th system by $G_n$ and that of the limit by $G_\infty$. By the above arguments, we need to estimate
$$\mathbb{E}[G_n f(X(\infty)) - G_\infty f(X(\infty))] = \mathbb{E}[(G_n f - G_\infty f)(X(\infty))].$$
For this purpose, write
$$\begin{aligned} G_n f(\eta) &= n^{\alpha}[P_n f(\eta) - f(\eta)]\\ &= n^{\alpha}\,\frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\frac{2\eta}{n\delta} + 2m + 2h + \frac{2}{n}\right)\right]\right\}[f(\eta + 2\delta) - f(\eta)]\\ &\quad + n^{\alpha}\,\frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}[f(\eta - 2\delta) - f(\eta)]\\ &= n^{\alpha}p^n_+(\eta)[f(\eta + 2\delta) - f(\eta)] + n^{\alpha}p^n_-(\eta)[f(\eta - 2\delta) - f(\eta)]. \end{aligned} \tag{14}$$
Here $\alpha$ is the rate of the transitions, i.e., the time scaling factor, a parameter we can control. The space scaling factor is $\delta = n^{\gamma-1}$, and each transition moves $\eta$ by $\pm 2\delta$.

To estimate (14), we apply Taylor's theorem. For some $\chi \in [\eta, \eta + 2\delta]$ and $\zeta \in [\eta - 2\delta, \eta]$,
$$\begin{aligned} &n^{\alpha}p^n_+(\eta)[f(\eta + 2\delta) - f(\eta)] + n^{\alpha}p^n_-(\eta)[f(\eta - 2\delta) - f(\eta)]\\ &= n^{\alpha}p^n_+(\eta)\left[2\delta f'(\eta) + 2\delta^2 f''(\chi)\right] + n^{\alpha}p^n_-(\eta)\left[-2\delta f'(\eta) + 2\delta^2 f''(\zeta)\right]\\ &= 2n^{\alpha}\delta f'(\eta)[p^n_+(\eta) - p^n_-(\eta)] + 2n^{\alpha}\delta^2 f''(\eta)[p^n_+(\eta) + p^n_-(\eta)]\\ &\quad + 2n^{\alpha}\delta^2 p^n_+(\eta)[f''(\chi) - f''(\eta)] + 2n^{\alpha}\delta^2 p^n_-(\eta)[f''(\zeta) - f''(\eta)]. \end{aligned}$$
The required estimates come from Corollaries 2.1 and 2.2, together with the gradient bounds in Lemma 3.1 and Corollary 3.1. In particular, by Corollary 3.1, each of the last two terms is bounded by $4Cn^{\alpha}\delta^3$.

At the critical temperature, $h = 0$ and $\beta = 1$, we have $\gamma = 1/4$. Hence $\delta = n^{\gamma-1} = n^{-3/4}$ and $\alpha = 3/2$, so $n^{\alpha}\delta = n^{3/4}$ and $n^{\alpha}\delta^2 = 1$. In this case, $G_\infty f(x) = -\frac{2}{3}x^3 f'(x) + 2f''(x)$.

At supercritical temperatures, $\beta \in [0, 1)$, we have $\gamma = 1/2$. Hence $\delta = n^{\gamma-1} = n^{-1/2}$ and $\alpha = 1$, so $n^{\alpha}\delta = n^{1/2}$ and $n^{\alpha}\delta^2 = 1$. In this case, $G_\infty f(x) = -2\left[\frac{1}{1+m} - \beta(1-m)\right]x f'(x) + 2(1-m)f''(x)$.

Thus,
$$\mathbb{E}[(G_n f - G_\infty f)(X(\infty))] \le \mathbb{E}\left[\mathcal{E}_1(X(\infty)) + \mathcal{E}_2(X(\infty)) + \mathcal{E}_3(X(\infty))\right],$$
where $\mathcal{E}_i$ refers to the $i$-th order term in the approximation, for $i = 1, 2, 3$. By Corollaries 2.1 and 2.2, these terms are of the desired order. □
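The supercritical case can be illustrated numerically. Assume the limiting generator is $G_\infty f(x) = -2\left[\frac{1}{1+m} - \beta(1-m)\right]x f'(x) + 2(1-m)f''(x)$, with drift rate $\theta = 2\left[\frac{1}{1+m} - \beta(1-m)\right]$ and diffusion coefficient $4(1-m)$. Its stationary law is then centered Gaussian with variance $\sigma^2 = 4(1-m)/(2\theta) = \frac{1-m^2}{1-\beta(1-m^2)}$.

The sketch below uses the hypothetical helper `cos_gap` and the test function $\cos$, for which $\mathbb{E}\cos(\sigma N) = e^{-\sigma^2/2}$ with $N$ standard normal. It computes $|\mathbb{E}[\cos(\eta)] - e^{-\sigma^2/2}|$ exactly under $\pi_n$ for several $n$. It only shows that the gap shrinks in this example; it does not check the constant or the rate in (13).

```python
import math


def fixed_point(beta, h, iters=2000):
    """Solve equation (1), m = tanh(beta * (m + h)); converges for beta < 1."""
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * (m + h))
    return m


def cos_gap(n, beta, h):
    """|E[cos(eta)] - exp(-sigma^2 / 2)| with eta = sqrt(n) (m_n - m), x ~ pi_n (exact sum)."""
    m = fixed_point(beta, h)
    log_w, etas = [], []
    for k in range(n + 1):
        mn = (2 * k - n) / n
        log_w.append(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + beta * n * (0.5 * mn * mn + h * mn))
        etas.append(math.sqrt(n) * (mn - m))
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    exact = sum(wk * math.cos(e) for wk, e in zip(w, etas)) / sum(w)
    sigma2 = (1 - m * m) / (1 - beta * (1 - m * m))  # stationary variance of the OU limit
    return abs(exact - math.exp(-sigma2 / 2))


for n in (250, 1000, 4000):
    print(n, f"{cos_gap(n, 0.5, 0.0):.2e}", f"{cos_gap(n, 0.5, 0.3):.2e}")
```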

References

[1] Bierkens, J. and Roberts, G. (2017). A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. Ann. Appl. Probab.

[2] Braverman, A. and Dai, J. G. (2017). Stein's method for steady-state diffusion approximations of M/Ph/n+M systems. Ann. Appl. Probab.

[3] Chatterjee, S. and Dey, P. S. (2010). Applications of Stein's method for concentration inequalities. Ann. Probab.

[4] Chatterjee, S. and Shao, Q.-M. (2011). Nonnormal approximation by Stein's method of exchangeable pairs with application to the Curie–Weiss model. Ann. Appl. Probab.

[5] Ellis, R. (2006). Entropy, Large Deviations, and Statistical Mechanics. Classics in Mathematics. Springer.

[6] Gurvich, I. (2014). Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab.

[7] Gurvich, I. (2014). Validity of heavy-traffic steady-state approximations in multiclass queueing networks: The case of queue-ratio disciplines. Mathematics of Operations Research 39.