Applied Probability Trust (22 February 2021)
APPROXIMATING THE MARKOV CHAIN OF THE CURIE-WEISS MODEL
YINGDONG LU, ∗ IBM Research
Abstract
In this paper, we quantify a known approximation to the Curie-Weiss model by applying the Stein method to a Markov chain whose stationary distribution coincides with the Curie-Weiss model.
Keywords:
1. Introduction
For an integer $n > 0$, $\beta \ge 0$ and $h \in \mathbb{R}$, the Curie-Weiss model for $n$ spins at temperature $1/\beta$ is the probability measure on $S_n := \{-1, 1\}^n$ given by
$$\pi(x) = \frac{1}{Z_n}\exp[-\beta H_n(x)], \quad \forall x \in S_n, \qquad \text{with} \qquad H_n(x) = -\frac{1}{2n}\sum_{i,j=1}^n x_i x_j - h\sum_{i=1}^n x_i, \quad \forall x \in S_n,$$
where $Z_n$ is the normalizing constant. The parameter $h$ represents the external magnetic field. This is an important mathematical model for studying the interaction of electron spins in real ferromagnets, and it is sometimes also called the Ising model on the complete graph. For detailed physics background and a thorough analysis of the Curie-Weiss model, see, e.g., [5]. The system has two phases determined by the temperature: $0 \le \beta < 1$ is known as the supercritical phase, and $\beta = 1$ is known as the critical phase.

∗ Postal address: 1101 Kitchawan Rd, Yorktown Heights, NY 10598
∗ Email address: [email protected]
While the final results for these two phases are different, the analysis has much in common, and we discuss the two cases separately only when necessary. Define the one-dimensional quantity $m_n(x) := \frac{1}{n}\sum_{i=1}^n x_i$, also known as the magnetization of the system. Then $H_n(x)$ can be rewritten as
$$H_n(x) = -n\left[\frac{1}{2}(m_n(x))^2 + h\,m_n(x)\right].$$
The mass concentrates at the minimizer of the function
$$i(m) = -\left(\frac{\beta}{2}m^2 + \beta h m\right) + \frac{1-m}{2}\log(1-m) + \frac{1+m}{2}\log(1+m).$$
The minimizer is given by
$$\beta m + \beta h = \frac{1}{2}\log\frac{1+m}{1-m}, \qquad (1)$$
or equivalently, $m = \tanh(\beta(m+h))$.

It has been observed (see, e.g., [1]) that, for certain parameter values and under proper scaling, the Markov chain whose stationary distribution is the Curie-Weiss model converges to a diffusion process. Hence, the stationary distribution of the diffusion process can be viewed as a good approximation of the Curie-Weiss model. The key to this approximation is the concentration of the magnetization, see, e.g., [3, 4]; such concentration has been proved for a special case of the critical system and for the supercritical case. For other cases, concentration is more difficult to establish, and it will be part of our future research.

In this paper, we aim to quantify the discrepancy between the two stationary distributions by applying the Stein method. The analysis relies on a detailed study of the infinitesimal generators of the two processes. While a process-level approximation can be achieved by a semigroup expansion method, as shown in [princeton], the Stein method requires a more detailed regularity analysis of the solution to the Poisson equation associated with the generator. More specifically, we develop a Stein-method approximation for the Markov chain generated by the Metropolis-Hastings algorithm, whose stationary distribution is $\pi_n$. Meanwhile, $\pi_\infty$ denotes the stationary distribution of a diffusion process $Y(t)$, also written $Y(\infty)$. To estimate the difference between $\pi_n$ and $Y(\infty)$ in the weak sense, we need to quantify $|E[h(\pi_n)] - E[h(Y(\infty))]|$ for an arbitrary bounded function $h$.
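As a small numerical illustration of (1), the Python sketch below (not part of the original analysis) solves $m = \tanh(\beta(m+h))$ by bisection and checks that the root satisfies (1) and minimizes $i(m)$ locally. The helper names `magnetization` and `rate_function` are hypothetical, and the sketch assumes $0 \le \beta \le 1$, where $m - \tanh(\beta(m+h))$ is increasing on $(-1, 1)$, so the root is unique.

```python
import math


def magnetization(beta: float, h: float, tol: float = 1e-13) -> float:
    """Solve m = tanh(beta * (m + h)) on (-1, 1) by bisection.

    For 0 <= beta <= 1 the map m -> m - tanh(beta * (m + h)) is increasing,
    negative at m = -1 and positive at m = 1, so the root is unique.
    """
    g = lambda m: m - math.tanh(beta * (m + h))
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_function(m: float, beta: float, h: float) -> float:
    """The function i(m) whose minimiser is the concentration point."""
    return (-(beta * m**2 / 2 + beta * h * m)
            + (1 - m) / 2 * math.log(1 - m)
            + (1 + m) / 2 * math.log(1 + m))


if __name__ == "__main__":
    beta, h = 0.8, 0.3
    m = magnetization(beta, h)
    # Both sides of equation (1) should agree.
    print(m, beta * m + beta * h, 0.5 * math.log((1 + m) / (1 - m)))
```

Bisection is used instead of the fixed-point iteration $m \mapsto \tanh(\beta(m+h))$ because the latter converges very slowly at $\beta = 1$, $h = 0$.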
The key idea of the Stein method is to reduce the estimation of this quantity to that of $|E[G_\infty f_h(\pi_n)] - E[G_n f_h(\pi_n)]|$, where $f_h$ is the solution to the Stein equation with respect to $h$, in the hope that structural properties of the solution reveal information that aids the calculation. While sharper results on the distribution function were obtained by Chatterjee using techniques that cannot easily be generalized, the results in this paper focus on general functions evaluated at the stationary distribution. A similar method was used by Braverman and Dai [2] and Gurvich [6, 7] for stochastic processing systems and networks.

In the rest of the paper, we first introduce the Markov chain in Sec. 2; we then discuss the Stein method and present the results on approximating the stationary distributions in Sec. 3.
2. The Markov Chain
In this section, we present the Markov chain whose stationary distribution is the Curie-Weiss model, then write out its scaled and centered version and introduce some of its basic properties.
First, let $X^n_t$ denote a Markov chain with state space $S_n$ and transition probability
$$P_{x,y} = \begin{cases} \frac{1}{n}, & y \in N(x),\\ 0, & y \notin N(x),\end{cases} \qquad (2)$$
where $N(x) = \cup_{k=1}^n \{y \in S_n : y \ne x,\ y - x = \pm 2e_k\}$. That is, at each step the chain flips one of its coordinates, each chosen with probability $1/n$.

Let $Y^n_t = \eta_n(X_t)$, where
$$m_n(x) := \frac{1}{n}\sum_{k=1}^n x_k, \qquad \eta_n(x) := n^\gamma(m_n(x) - m),$$
is the centered and rescaled magnetization. Its (one-step) transition probabilities are
$$Q_n(\eta_n, \eta_n + 2n^{\gamma-1}) = \frac{1}{2}\left[1 - (m + n^{-\gamma}\eta_n)\right], \qquad Q_n(\eta_n, \eta_n - 2n^{\gamma-1}) = \frac{1}{2}\left[1 + (m + n^{-\gamma}\eta_n)\right].$$
Finally, $Z^n_t$, the Metropolis-Hastings version of this Markov chain, has transition probabilities
$$P_n(\eta_n, \eta_n \pm 2n^{\gamma-1}) = Q_n(\eta_n, \eta_n \pm 2n^{\gamma-1})\left(1 \wedge \exp\left[\beta\left(\Phi_n(\eta) - \Phi_n(\eta \pm 2n^{\gamma-1})\right)\right]\right),$$
with $\Phi_n(\eta) = -\frac{1}{2}n^{1-2\gamma}\eta^2 - n^{1-\gamma}(m+h)\eta$. Writing $\delta := n^{\gamma-1}$, so that $n\delta = n^\gamma$, define
$$p^n_\pm(\eta) := P_n(\eta, \eta \pm 2\delta) = \frac{1}{2}\left[1 \mp \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\pm\frac{2\eta}{n\delta} \pm 2(m+h) + \frac{2}{n}\right)\right]\right\}.$$
Furthermore, the Markov chain is sped up in time, and this time scaling is quantified by the number $\alpha$. The relationship under which a meaningful process limit can be established was identified in [1]; more specifically, $\alpha = 2(1-\gamma)$.

The concentration of the Curie-Weiss model, a known result of Chatterjee, determines that $\gamma = \frac{1}{2}$ for $0 \le \beta < 1$ and $\gamma = \frac{1}{4}$ for $\beta = 1$. More specifically, we know the following.

Lemma [Chatterjee]
1. For all $\beta \ge 0$ and $h \in \mathbb{R}$, we have, for any $t \ge 0$,
$$\pi_n\left(|m_n - \tanh(\beta(m_n + h))| \ge \frac{\beta}{n} + \frac{t}{\sqrt{n}}\right) \le 2\exp\left(-\frac{t^2}{4(1+\beta)}\right). \qquad (3)$$
2. For $h = 0$ and $\beta = 1$, we have
$$\pi_n(|m_n| \ge t^{1/4}) \le e^{-cnt}. \qquad (4)$$

We can use these tail estimates to guarantee the boundedness of the moments, which will be useful later.

Lemma 2.1.
(Finiteness of the moments.) There exists a constant $C_m > 0$ such that, for every integer $k \ge 1$,
$$E[Z_n^k] \le C_m^k\, k!!, \qquad (5)$$
where $Z_n$ denotes $\eta$ under the stationary distribution and $k!!$ denotes the double factorial of $k$, i.e., the product of all the integers from 1 to $k$ that have the same parity as $k$.

Proof. We discuss the critical and supercritical cases separately.

For the critical case, $m = 0$ and $\eta = n^{1/4}m_n$, so for any $\epsilon > 0$,
$$\pi[|\eta| \ge n^\epsilon] = \pi_n[|m_n| \ge n^{\epsilon - 1/4}] \le \exp(-cn^{4\epsilon}).$$
For any integer $k$, we have
$$\pi[|\eta| \ge k] = \pi_n\left[|m_n| \ge \frac{k}{n^{1/4}}\right] = \pi_n\left[|m_n| \ge \left(\frac{k^4}{n}\right)^{1/4}\right] \le e^{-ck^4},$$
where the inequality follows from (4) with $t = k^4/n$. This ensures the finiteness of the moments.

For the supercritical case, again, for any integer $k$, we have
$$\pi[|\eta| \ge k] = \pi_n\left(|m_n - m| \ge \frac{k}{\sqrt{n}}\right) = \pi_n\left(|m_n - m| \ge \frac{\beta}{n} + \frac{k - \beta/\sqrt{n}}{\sqrt{n}}\right) \le 2\exp\left(-\frac{(k - \beta/\sqrt{n})^2}{4(1+\beta)}\right).$$
This implies the finiteness of the moments. ∎
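To illustrate the Metropolis-Hastings chain $Z^n_t$ and the moment bound of Lemma 2.1, the Python sketch below simulates the chain in the critical case $\beta = 1$, $h = 0$ (so $m = 0$ and $\gamma = 1/4$) and estimates $E[\eta^4]$. It is a sketch under stated assumptions: it tracks only the number $k$ of $+1$ spins (the chain depends on the configuration only through $m_n$), it uses NumPy's random generator, and the function name `simulate_eta`, the seed and the run lengths are illustrative choices rather than anything taken from the paper.

```python
import math

import numpy as np


def simulate_eta(n, beta, h, m_star, gamma, steps, burn_in, seed=0):
    """Single-spin-flip Metropolis-Hastings chain for the Curie-Weiss model.

    The state is k = number of +1 spins, so m_n = (2k - n) / n.
    Returns samples of eta = n**gamma * (m_n - m_star) after burn-in.
    """
    rng = np.random.default_rng(seed)

    def energy(k):
        m_n = (2 * k - n) / n
        return -n * (m_n**2 / 2 + h * m_n)   # H_n written in terms of m_n

    k = n // 2                                # start near m_n = 0
    u = rng.random((steps, 2))
    etas = np.empty(steps - burn_in)
    for t in range(steps):
        # Proposal Q_n: a uniformly chosen spin is -1 with probability (n - k) / n.
        k_new = k + 1 if u[t, 0] < (n - k) / n else k - 1
        # Metropolis acceptance probability min(1, exp(-beta * dH)).
        d_energy = energy(k_new) - energy(k)
        if u[t, 1] < math.exp(min(0.0, -beta * d_energy)):
            k = k_new
        if t >= burn_in:
            etas[t - burn_in] = n**gamma * ((2 * k - n) / n - m_star)
    return etas


etas = simulate_eta(n=100, beta=1.0, h=0.0, m_star=0.0, gamma=0.25,
                    steps=500_000, burn_in=20_000)
print("estimated E[eta^4]:", np.mean(etas**4))
```

For comparison, the density proportional to $e^{-x^4/12}$ (the stationary density of the critical limit diffusion) has fourth moment $12\,\Gamma(5/4)/\Gamma(1/4) = 3$; any such comparison must allow for Monte Carlo error and the finite-$n$ correction.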
Remark 2.1.
The bound is not necessarily optimal, but it suffices for our purposes in this paper.

The dynamics of the Markov chain discussed above indicate that the quantities $[p^n_+(\eta) + p^n_-(\eta)]$ and $[p^n_+(\eta) - p^n_-(\eta)]$ play prominent roles in the quantification of the process. In the following, we provide a detailed analysis of these two quantities.
Lemma 2.2.
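The sum $[p^n_+(\eta) + p^n_-(\eta)]$ satisfies
$$p^n_+(\eta) + p^n_-(\eta) = \frac{1}{2}[1-m]\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \qquad (6)$$
Proof.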
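The calculation goes as follows. For $m + h > 0$ and $n$ large enough, the minimum in $p^n_+$ equals 1 and the minimum in $p^n_-$ equals the exponential term, so
$$\begin{aligned}
p^n_+(\eta) + p^n_-(\eta) &= \frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right] + \frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\\
&= \frac{1}{2}[1-m] + \frac{1}{2}[1+m]\exp[-2\beta(m+h)]\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right] - \frac{\eta}{2n\delta}\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\
&= \frac{1}{2}[1-m]\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\
&= \frac{1}{2}[1-m]\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 - \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}.
\end{aligned}$$
The first step is simple algebraic manipulation, and the second and third steps use the relationship (1) in the form $\exp[-2\beta(m+h)] = \frac{1-m}{1+m}$. ∎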
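The identity (6) can be checked numerically. The Python sketch below evaluates $p^n_\pm(\eta)$ directly from their definition (with the minima taken explicitly) and compares the sum with the right-hand side of (6), in the case $m + h > 0$ used in the proof. The parameter values ($\beta = 0.8$, $h = 0.3$, $n = 10^4$, $\gamma = 1/2$, $\eta = 0.7$) and the helper names `solve_m`, `p_pm` and `rhs_6` are illustrative assumptions; $m$ is obtained from (1) by bisection.

```python
import math


def solve_m(beta, h, tol=1e-13):
    """Root of m = tanh(beta * (m + h)) by bisection (unique for 0 <= beta <= 1)."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(beta * (mid + h)) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_pm(eta, n, beta, h, m, gamma):
    """Transition probabilities p^n_+ and p^n_- computed from their definition."""
    nd = n**gamma                      # n * delta, with delta = n**(gamma - 1)
    mag = m + eta / nd                 # current magnetisation m_n
    up = 0.5 * (1 - mag) * min(1.0, math.exp(beta * (2 * eta / nd + 2 * (m + h) + 2 / n)))
    down = 0.5 * (1 + mag) * min(1.0, math.exp(beta * (-2 * eta / nd - 2 * (m + h) + 2 / n)))
    return up, down


def rhs_6(eta, n, beta, m, gamma):
    """Right-hand side of (6)."""
    nd = n**gamma
    e = math.exp(beta * (-2 * eta / nd + 2 / n))
    return 0.5 * (1 - m) * (1 + e) - eta / (2 * nd) * (1 - (1 - m) / (1 + m) * e)


beta, h, n, gamma, eta = 0.8, 0.3, 10_000, 0.5, 0.7
m = solve_m(beta, h)
up, down = p_pm(eta, n, beta, h, m, gamma)
print(up + down, rhs_6(eta, n, beta, m, gamma))
```

The two printed numbers agree up to rounding, because (6) is an exact identity once $m$ solves (1) and the case $m + h > 0$ fixes which minimum is active.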
Expanding the exponential function gives the following order estimates.
Corollary 2.1.
The terms of the expression in (6) can be evaluated as follows.
i. The 0-th order term is $(1-m)$.
ii. The $(n\delta)^{-1}$ term is $-\eta\left[\beta(1-m) + \frac{m}{1+m}\right](n\delta)^{-1}$.
  • Recall that in the critical case, $\delta = n^{-3/4}$, so the $(n\delta)^{-1}$ term is of order $n^{-1/4}$.
  • In the supercritical case, $\delta = n^{-1/2}$, so the $(n\delta)^{-1}$ term is of order $n^{-1/2}$.
iii. Everything else is of higher order.
In other words, $p^n_+(\eta) + p^n_-(\eta) = (1-m) + \mathcal{E}(\eta)$, with $\mathcal{E}(\eta) = -\eta\left[\beta(1-m) + \frac{m}{1+m}\right](n\delta)^{-1} + O((n\delta)^{-2})$.

Proof. Both (i) and (ii) follow directly from expanding the exponential in (6), and (iii) follows from Lemma 2.1. ∎

Lemma 2.3.
The difference $[p^n_+(\eta) - p^n_-(\eta)]$ satisfies
$$p^n_+(\eta) - p^n_-(\eta) = \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}. \qquad (7)$$
Proof.
Recall that
$$p^n_\pm(\eta) = \frac{1}{2}\left[1 \mp \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\pm\frac{2\eta}{n\delta} \pm 2(m+h) + \frac{2}{n}\right)\right]\right\}.$$
When $2m + 2h > 0$, for large enough $n$ we have $\frac{2\eta}{n\delta} + 2m + 2h + \frac{2}{n} \ge 0$ and $-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n} \le 0$, so
$$p^n_+(\eta) - p^n_-(\eta) = \frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right] - \frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right].$$
Meanwhile, by (1), $\exp[2\beta(m+h)] = \frac{1+m}{1-m}$. So,
$$\begin{aligned}
p^n_+(\eta) - p^n_-(\eta) &= \frac{1}{2}(1-m) - \frac{1}{2}(1+m)\exp[-2\beta(m+h)]\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right] - \frac{\eta}{2n\delta}\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\
&= \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}\\
&= \frac{1}{2}(1-m)\left\{1 - \exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\} - \frac{\eta}{2n\delta}\left\{1 + \frac{1-m}{1+m}\exp\left[\beta\left(-\frac{2\eta}{n\delta} + \frac{2}{n}\right)\right]\right\}.
\end{aligned}$$
The case $2m + 2h < 0$ can be treated similarly. ∎

Similarly to Corollary 2.1, we have the following.
Corollary 2.2.
The terms of (7) can be evaluated as follows.
• In the case $h = 0$ and $\beta = 1$, we have $m = 0$, $\alpha = 3/2$ and $\gamma = 1/4$, and
$$2n^{3/4}[p^n_+(\eta) - p^n_-(\eta)] = \left[-\frac{2}{3}n^{-3/4}\eta^3 + \frac{4}{3}n^{-1}\eta^4\right]n^{3/4} + O(n^{-1/4}) = -\frac{2}{3}\eta^3 + \frac{4}{3}n^{-1/4}\eta^4 + O(n^{-1/4}).$$
• In the supercritical case, $\gamma = 1/2$ and $\alpha = 1$, and
$$p^n_+(\eta) - p^n_-(\eta) = -(1-m)\eta\left[\frac{1}{1-m^2} - \beta\right]n^{-1/2} - (1-m)\beta\left[(1 + \beta\eta^2) - \frac{\eta^2}{1+m}\right]n^{-1} + O(n^{-3/2}).$$
In other words, in the supercritical case,
$$2n^{1/2}[p^n_+(\eta) - p^n_-(\eta)] = -2(1-m)\left[\frac{1}{1-m^2} - \beta\right]\eta + \mathcal{E}(\eta), \qquad \mathcal{E}(\eta) = O(n^{-1/2}).$$

Proof. For the critical case, $h = 0$ and $\beta = 1$, we also have $m = 0$, $\alpha = 3/2$ and $\gamma = 1/4$, so $n\delta = n^{1/4}$. The following estimate follows directly from the expansion of the exponential function:
$$\begin{aligned}
2n^{3/4}[p^n_+(\eta) - p^n_-(\eta)] &= \left[(1 - n^{-1/4}\eta) - (1 + n^{-1/4}\eta)\left(1 - 2n^{-1/4}\eta + 2n^{-1/2}\eta^2 - \frac{4}{3}n^{-3/4}\eta^3\right)\right]n^{3/4} + O(n^{-1/4})\\
&= \left[-\frac{2}{3}n^{-3/4}\eta^3 + \frac{4}{3}n^{-1}\eta^4\right]n^{3/4} + O(n^{-1/4}).
\end{aligned}$$
The first term is the desired drift term in the generator, so we only need to estimate the second term, i.e., $E[\eta^4 n^{-1/4}]$. The estimate of the tail order again follows from Lemma 2.1.

Now consider the supercritical case. With $\gamma = \frac{1}{2}$, we have $n\delta = n^{1/2}$ and
$$1 - \exp\left[\beta\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right)\right] = -\beta\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right) - \frac{\beta^2}{2}\left(-\frac{2\eta}{\sqrt{n}} + \frac{2}{n}\right)^2 + O\left(n^{-3/2}\right) = \frac{2\beta\eta}{\sqrt{n}} - (2\beta + 2\beta^2\eta^2)\frac{1}{n} + O\left(n^{-3/2}\right).$$
Plugging this into (7), we get
$$p^n_+(\eta) - p^n_-(\eta) = -(1-m)\eta\left[\frac{1}{1-m^2} - \beta\right]\frac{1}{\sqrt{n}} - (1-m)\beta\left[(1 + \beta\eta^2) - \frac{\eta^2}{1+m}\right]\frac{1}{n} + O\left(n^{-3/2}\right). \quad ∎$$
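The drift expansions in Corollary 2.2 can be sanity-checked numerically by computing $p^n_\pm$ directly from their definition for a fixed $\eta$ and increasing $n$. The Python sketch below does this for the critical case ($\beta = 1$, $h = 0$, $m = 0$, $\gamma = 1/4$, $\alpha = 3/2$), comparing $n^\alpha \cdot 2\delta\,[p^n_+ - p^n_-]$ with $-\frac{2}{3}\eta^3$, and for one supercritical choice ($\beta = 0.5$, $h = 0.2$, $\gamma = 1/2$, $\alpha = 1$, values chosen only for illustration), comparing it with $-2(1-m)\left[\frac{1}{1-m^2} - \beta\right]\eta$. It also reports $|p^n_+ + p^n_- - (1-m)|$, the quantity controlled in Corollary 2.1. The helper names are hypothetical.

```python
import math


def solve_m(beta, h, tol=1e-13):
    """Root of m = tanh(beta * (m + h)) by bisection (unique for 0 <= beta <= 1)."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(beta * (mid + h)) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_pm(eta, n, beta, h, m, gamma):
    """p^n_+ and p^n_- computed from their definition (no expansion)."""
    nd = n**gamma
    mag = m + eta / nd
    up = 0.5 * (1 - mag) * min(1.0, math.exp(beta * (2 * eta / nd + 2 * (m + h) + 2 / n)))
    down = 0.5 * (1 + mag) * min(1.0, math.exp(beta * (-2 * eta / nd - 2 * (m + h) + 2 / n)))
    return up, down


def drift_error(eta, n, beta, h, gamma, alpha):
    """Return (|scaled drift - limiting drift|, |p_+ + p_- - (1 - m)|)."""
    # For h = 0 and beta <= 1 the unique solution of (1) is m = 0.
    m = 0.0 if h == 0.0 else solve_m(beta, h)
    up, down = p_pm(eta, n, beta, h, m, gamma)
    delta = n ** (gamma - 1)
    scaled = n**alpha * 2 * delta * (up - down)    # jumps have size 2 * delta
    if beta == 1.0 and h == 0.0:
        limit = -2.0 / 3.0 * eta**3                 # critical drift
    else:
        limit = -2 * (1 - m) * (1 / (1 - m**2) - beta) * eta   # supercritical drift
    return abs(scaled - limit), abs(up + down - (1 - m))


for n in (10**4, 10**6, 10**8):
    print(n, drift_error(0.8, n, 1.0, 0.0, 0.25, 1.5), drift_error(0.8, n, 0.5, 0.2, 0.5, 1.0))
```

Both errors shrink as $n$ grows, at rates consistent with the $O(n^{-1/4})$ and $O(n^{-1/2})$ remainders above.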
3. Approximation via the Stein Method
In this section, we present the result on the approximation of the stationary distribution via the Stein method. First, we provide a quantitative characterization of the solution to the Stein equation in Sec. 3.1; then we present and prove the main result in Sec. 3.2.

To study the solution to the Stein equation, we examine the following equation of a more general form. For functions $a(x)$ and $b(x)$ such that $a(x)$ and $b(x)/a(x)$ are absolutely continuous and $\frac{1}{a(y)}e^{\int_0^y \frac{2b(u)}{a(u)}du}$ is integrable, consider the equation
$$\frac{1}{2}a(x)f_h''(x) + b(x)f_h'(x) = Eh(Y) - h(x), \qquad (8)$$
with $\lim_{x\to-\infty} f(x) = 0$ and $\lim_{x\to-\infty} f'(x) = 0$, where $\frac{C_Y}{a(y)}e^{\int_0^y \frac{2b(u)}{a(u)}du}$ is the density of the random variable $Y$ and $C_Y$ is the normalizing constant.

From basic differential equation calculations, the solution to (8) can be written as
$$f(x) = \int_0^x\int_{-\infty}^y e^{\int_0^z \frac{2b(u)}{a(u)}du - \int_0^y \frac{2b(u)}{a(u)}du}\,\frac{2[Eh(Y) - h(z)]}{a(z)}\,dz\,dy.$$
Hence,
$$f_h'(x) = e^{-\int_0^x \frac{2b(u)}{a(u)}du}\int_{-\infty}^x e^{\int_0^y \frac{2b(u)}{a(u)}du}\,\frac{2[Eh(Y) - h(y)]}{a(y)}\,dy, \qquad (9)$$
$$f_h''(x) = -\frac{2b(x)}{a(x)}f_h'(x) + \frac{2[Eh(Y) - h(x)]}{a(x)}, \qquad (10)$$
$$f_h'''(x) = -\left(\frac{2b(x)}{a(x)}\right)'f_h'(x) - \frac{2b(x)}{a(x)}f_h''(x) - \frac{2h'(x)}{a(x)} - \frac{2a'(x)[Eh(Y) - h(x)]}{a^2(x)}. \qquad (11)$$
Re-examining the first derivative $f_h'$, we find
$$f_h'(x) = C_Y e^{-\int_0^x \frac{2b(u)}{a(u)}du}\int_{-\infty}^x\int_{-\infty}^{\infty} e^{\int_0^y \frac{2b(u)}{a(u)}du + \int_0^z \frac{2b(u)}{a(u)}du}\,\frac{2[h(z) - h(y)]}{a(z)a(y)}\,dz\,dy.$$
For $x \ge 0$, we can rewrite this as
$$f'(x) = \frac{2}{C_Y}e^{-\int_0^x \frac{2b(u)}{a(u)}du}\left[\bar F^{(h)}(x) - Eh(Y)\bar F(x)\right],$$
where
$$F(x) = C_Y\int_{-\infty}^x \frac{e^{\int_0^z \frac{2b(u)}{a(u)}du}}{a(z)}\,dz, \quad F^{(h)}(x) = C_Y\int_{-\infty}^x \frac{e^{\int_0^z \frac{2b(u)}{a(u)}du}}{a(z)}\,h(z)\,dz, \quad \bar F(x) = 1 - F(x), \quad \bar F^{(h)}(x) = Eh(Y) - F^{(h)}(x).$$
Since
$$\left[\frac{e^{\int_0^x \frac{2b(u)}{a(u)}du}}{a(x)}\right]' = \frac{e^{\int_0^x \frac{2b(u)}{a(u)}du}}{a(x)}\left[\frac{2b(x) - a'(x)}{a(x)}\right],$$
again by integration by parts, we have, for $x \ge 0$,
$$\bar F(x) \le -\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}\frac{C_Y\,e^{\int_0^x \frac{2b(u)}{a(u)}du}}{a(x)}.$$
The inequality is due to the monotonicity assumption on $\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}$.

Lemma 3.1.
Under the condition that $-\frac{2b(x) - a'(x)}{a(x)}$ is strictly positive and increasing for $x \ge 0$, we have
$$f'(x) \le -\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}(\|h\| + \|h\|_\infty), \qquad f''(x) \le \frac{2b(x)}{a(x)}\left[\frac{2b(x) - a'(x)}{a(x)}\right]^{-1}(\|h\| + \|h\|_\infty).$$
From the form of the coefficients in our models, together with (10) and (11), we conclude the following.
Corollary 3.1.
There exist a C > , such that | f ′′′ ( x ) | ≤ C , for all x ∈ R . Remark 3.1.
The limiting diffusions in our problem satisfy the integrability condition on the density. Similarly, we can also bound the second derivative.
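For a concrete instance of (8)-(10), take the coefficients $a(x) = 4$ and $b(x) = -\frac{2}{3}x^3$ of the critical limit generator $G_\infty f = -\frac{2}{3}x^3 f' + 2f''$ (so that $\int_0^x 2b(u)/a(u)\,du = -x^4/12$) and a bounded test function, here $h = \tanh$. The Python sketch below computes $f_h'$ from (9) by cumulative trapezoidal integration, using the left tail for $x < 0$ and the equivalent right-tail form (the $\bar F$ representation above) for $x \ge 0$ to avoid multiplying large exponentials by tiny differences, and then checks that (8) holds on $|x| \le 3$. The grid size, the truncation to $|x| \le 6$ and the helper names are illustrative assumptions, not part of the paper.

```python
import numpy as np


def cumtrapz(y, dx):
    """Cumulative trapezoidal integral of samples y, starting from 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * dx)
    return out


def stein_fprime(h, L=6.0, num=12001):
    """f_h' from (9) for a(x) = 4, b(x) = -(2/3) x^3 on the grid [-L, L]."""
    a = 4.0
    x = np.linspace(-L, L, num)
    dx = x[1] - x[0]
    phi = -x**4 / 12.0                # int_0^x 2 b(u) / a(u) du
    w = np.exp(phi) / a               # unnormalised density of Y
    hx = h(x)
    Eh = cumtrapz(w * hx, dx)[-1] / cumtrapz(w, dx)[-1]
    g = 2.0 * (Eh - hx) * w           # integrand of (9)
    left = cumtrapz(g, dx)                    # int_{-L}^x g
    right = cumtrapz(g[::-1], dx)[::-1]       # int_x^{L} g
    # g integrates to 0, so int_{-inf}^x g = -int_x^inf g; use the small tail.
    fp = np.where(x < 0, np.exp(-phi) * left, -np.exp(-phi) * right)
    return x, dx, fp, Eh, hx


x, dx, fp, Eh, hx = stein_fprime(np.tanh)
fpp = np.gradient(fp, dx)
b = -(2.0 / 3.0) * x**3
residual = 0.5 * 4.0 * fpp + b * fp - (Eh - hx)   # left side minus right side of (8)
inner = np.abs(x) <= 3.0
print("max |f'| =", np.max(np.abs(fp)),
      " max residual on |x|<=3 =", np.max(np.abs(residual[inner])))
```

The computed $f_h'$ stays bounded and decays in the tails, in line with Lemma 3.1, and the residual of (8) is at the level of the discretization error.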
For the Curie-Weiss models, this method leads to the following result.
Theorem 3.1.
For the parameter ranges considered above, there exists a constant $C > 0$ such that:
• For the critical case, i.e. $\beta = 1$ and $h = 0$,
$$|E[h(X(\infty))] - E[h(Y(\infty))]| \le \frac{C}{n^{1/4}}. \qquad (12)$$
• For the supercritical case,
$$|E[h(X(\infty))] - E[h(Y(\infty))]| \le \frac{C}{n^{1/2}}. \qquad (13)$$

Proof.
Let us denote the generator of the $n$-th system by $G_n$, and that of the limit by $G_\infty$. Since $X(\infty)$ is stationary for $G_n$, we have $E[G_n f_h(X(\infty))] = 0$, while the Stein equation gives $G_\infty f_h = Eh(Y(\infty)) - h$. Hence
$$E[h(X(\infty))] - E[h(Y(\infty))] = E[G_n f_h(X(\infty)) - G_\infty f_h(X(\infty))] = E[(G_n f_h - G_\infty f_h)(X(\infty))],$$
and this is what we need to estimate. For this purpose, write $f = f_h$ and
$$\begin{aligned}
G_n f(\eta) &= n^\alpha[P_n f(\eta) - f(\eta)]\\
&= n^\alpha\,\frac{1}{2}\left[1 - \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(\frac{2\eta}{n\delta} + 2m + 2h + \frac{2}{n}\right)\right]\right\}[f(\eta + 2\delta) - f(\eta)]\\
&\quad + n^\alpha\,\frac{1}{2}\left[1 + \left(m + \frac{\eta}{n\delta}\right)\right]\left\{1 \wedge \exp\left[\beta\left(-\frac{2\eta}{n\delta} - 2m - 2h + \frac{2}{n}\right)\right]\right\}[f(\eta - 2\delta) - f(\eta)]\\
&= n^\alpha p^n_+(\eta)[f(\eta + 2\delta) - f(\eta)] + n^\alpha p^n_-(\eta)[f(\eta - 2\delta) - f(\eta)], \qquad (14)
\end{aligned}$$
where $\alpha$ is the rate of the transitions, i.e. the time scaling factor, a parameter that we can control, and $\delta = n^{\gamma-1}$ is the space scaling factor (each transition moves $\eta$ by $\pm 2\delta$).

To estimate (14), we apply Taylor expansion: for some $\chi \in [\eta, \eta + 2\delta]$ and $\zeta \in [\eta - 2\delta, \eta]$,
$$\begin{aligned}
&n^\alpha p^n_+(\eta)[f(\eta + 2\delta) - f(\eta)] + n^\alpha p^n_-(\eta)[f(\eta - 2\delta) - f(\eta)]\\
&= n^\alpha p^n_+(\eta)[2\delta f'(\eta) + 2\delta^2 f''(\chi)] + n^\alpha p^n_-(\eta)[-2\delta f'(\eta) + 2\delta^2 f''(\zeta)]\\
&= 2n^\alpha\delta f'(\eta)[p^n_+(\eta) - p^n_-(\eta)] + 2n^\alpha\delta^2 f''(\eta)[p^n_+(\eta) + p^n_-(\eta)]\\
&\quad + 2n^\alpha\delta^2 p^n_+(\eta)[f''(\chi) - f''(\eta)] + 2n^\alpha\delta^2 p^n_-(\eta)[f''(\zeta) - f''(\eta)].
\end{aligned}$$
The required estimates are provided by Corollaries 2.1 and 2.2, together with the gradient bounds in Lemma 3.1 and Corollary 3.1. At the critical temperature, $h = 0$ and $\beta = 1$, we have $\gamma = 1/4$, hence $\delta = n^{\gamma-1} = n^{-3/4}$, and $\alpha = 3/2$, so $n^\alpha\delta = n^{3/4}$ and $n^\alpha\delta^2 = 1$. In this case, $G_\infty f(x) = -\frac{2}{3}x^3 f'(x) + 2f''(x)$. At supercritical temperature, $\beta \in [0, 1)$, we have $\gamma = 1/2$, hence $\delta = n^{\gamma-1} = n^{-1/2}$, and $\alpha = 1$, so $n^\alpha\delta = n^{1/2}$ and $n^\alpha\delta^2 = 1$. In this case, $G_\infty f(x) = -2(1-m)\left[\frac{1}{1-m^2} - \beta\right]x f'(x) + 2(1-m)f''(x)$. Thus,
$$|E[(G_n f - G_\infty f)(X(\infty))]| \le E\left[|\mathcal{E}_1(X(\infty))| + |\mathcal{E}_2(X(\infty))| + |\mathcal{E}_3(X(\infty))|\right],$$
where $\mathcal{E}_1$, $\mathcal{E}_2$ and $\mathcal{E}_3$ denote, respectively, the errors in the first-order (drift) term, in the second-order (diffusion) term, and the Taylor remainder. By Corollaries 2.1 and 2.2 (with Lemma 2.1 controlling the moments) and Corollary 3.1, they are of the desired order. ∎
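As an illustration of Theorem 3.1 (not a proof of the rates), the Python sketch below computes the expectation of a bounded test function of $\eta$ exactly under $\pi_n$ with zero external field (so $m = 0$), using the fact that the $\binom{n}{k}$ configurations with $k$ plus-spins each have weight $\exp(\beta n m_n^2/2)$, where $m_n = (2k-n)/n$. It compares this with the expectation under the limiting stationary densities: proportional to $e^{-x^4/12}$ in the critical case and $N(0, 1/(1-\beta))$ in the zero-field supercritical case. The test function $\cos$, the values of $n$ and $\beta = 0.5$, and the helper names are illustrative choices.

```python
import math

import numpy as np


def finite_n_expectation(test_fn, n, beta, gamma):
    """E[test_fn(eta)] under pi_n with zero external field (so m = 0).

    A configuration with k plus-spins has m_n = (2k - n) / n; there are C(n, k)
    of them, each with weight exp(beta * n * m_n**2 / 2).
    """
    k = np.arange(n + 1)
    m_n = (2 * k - n) / n
    log_binom = np.array([math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                          for j in range(n + 1)])
    logw = log_binom + beta * n * m_n**2 / 2
    w = np.exp(logw - logw.max())            # subtract the max to avoid overflow
    eta = n**gamma * m_n
    return float(np.sum(w * test_fn(eta)) / np.sum(w))


def limit_expectation(test_fn, log_density, L=12.0, num=200_001):
    """E[test_fn(Y)] for a density proportional to exp(log_density), by a Riemann sum."""
    x = np.linspace(-L, L, num)
    logd = log_density(x)
    w = np.exp(logd - logd.max())
    return float(np.sum(w * test_fn(x)) / np.sum(w))


crit_limit = limit_expectation(np.cos, lambda x: -x**4 / 12)
beta = 0.5
super_limit = limit_expectation(np.cos, lambda x: -(1 - beta) * x**2 / 2)
for n in (100, 10_000, 100_000):
    err_crit = abs(finite_n_expectation(np.cos, n, 1.0, 0.25) - crit_limit)
    err_super = abs(finite_n_expectation(np.cos, n, beta, 0.5) - super_limit)
    print(n, err_crit, err_super)
```

Because the magnetization is a sufficient statistic here, the finite-$n$ expectation is exact (no sampling), so the printed discrepancies isolate the approximation error studied in Theorem 3.1.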
References

[1] Bierkens, J. and Roberts, G. (2017). A piecewise deterministic scaling limit of lifted Metropolis-Hastings in the Curie-Weiss model. Ann. Appl. Probab.
[2] Braverman, A. and Dai, J. G. (2017). Stein's method for steady-state diffusion approximations of M/Ph/n+M systems. Ann. Appl. Probab.
[3] Chatterjee, S. and Dey, P. S. (2010). Applications of Stein's method for concentration inequalities. Ann. Probab.
[4] Chatterjee, S. and Shao, Q.-M. (2011). Nonnormal approximation by Stein's method of exchangeable pairs with application to the Curie-Weiss model. Ann. Appl. Probab.
[5] Ellis, R. (2006). Entropy, Large Deviations, and Statistical Mechanics. Classics in Mathematics. Springer.
[6] Gurvich, I. (2014). Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab.
[7] Gurvich, I. (2014). Validity of heavy-traffic steady-state approximations in multiclass queueing networks: The case of queue-ratio disciplines. Mathematics of Operations Research 39,