[PDF] Effective Mori-Zwanzig equation for the reduced-order modeling of stochastic systems

Abstract

Built upon the hypoelliptic analysis of the effective Mori-Zwanzig (EMZ) equation for observables of stochastic dynamical systems, we show that the obtained semigroup estimates for the EMZ equation can be used to drive prior estimates of the observable statistics for system in the equilibrium and non-equilibrium state. In addition, we introduce both first-principle and data-driven methods to approximate the EMZ memory kernel, and prove the convergence of the data-driven parametrization schemes using the regularity estimate of the memory kernel. The analysis results are validated numerically via the Monte-Carlo simulation of the Langevin dynamics for a Fermi-Pasta-Ulam chain model. With the same example, we also show the effectiveness of the proposed memory kernel approximation methods.

Full PDF

EEﬀective Mori-Zwanzig equation for the reduced-order modeling ofstochastic systems

Yuanran Zhu a, ∗ , Huan Lei b a Department of Applied Mathematics, University of California, MercedMerced (CA) 95343 b Department of Computational Mathematics, Michigan State UniversityEast Lansing (MI) 48824

Abstract

Built upon the hypoelliptic analysis of the eﬀective Mori-Zwanzig (EMZ) equation for observables of stochas-tic dynamical systems, we show that the obtained semigroup estimates for the EMZ equation can be usedto drive prior estimates of the observable statistics for system in the equilibrium and nonequilibrium state.In addition, we introduce both ﬁrst-principle and data-driven methods to approximate the EMZ memorykernel, and prove the convergence of the data-driven parametrization schemes using the regularity estimateof the memory kernel. The analysis results are validated numerically via the Monte-Carlo simulation ofthe Langevin dynamics for a Fermi-Pasta-Ulam chain model. With the same example, we also show theeﬀectiveness of the proposed memory kernel approximation methods.

Keywords:

Mori-Zwanzig equation, Stochastic diﬀerential equation, Reduced-order modeling.

MSC:

1. Introduction

The projection operator method, which is also known as the Mori-Zwanzig (MZ) formulation [23, 36], isa widely used dimension-reduction framework in statistical mechanics. The key features of such formulationis that it allows us to formally derive the generalized Langevin equations (GLEs) [37, 3, 30, 14] for coarse-grained quantities of interest based on microscopic equations of motion. Such GLEs can be found in a varietyof applications, including molecular dynamics [19, 31, 11, 10], ﬂuid mechanics [24, 12], and, more generally,systems described by nonlinear partial diﬀerential equations (PDEs) [29, 27, 22, 21]. Although being usedin the physics and applied mathematics communities for a rather long time, a systematic study of the MZequation within an rigorous analytical framework is still lacking. This is closely related to the well-knowndiﬃculty on the quantiﬁcation of the orthogonal dynamics in the MZ equation. Being a high-dimensionalﬂow which is generated by an integro-diﬀerential operator, the mathematical properties such as the regularityand ergodicity of the orthogonal dynamics are not well understood. Hence from a theoretical point of view,there is no available prior estimates which help to determine the properties of the MZ memory integral andthe ﬂuctuation force. As a result, the numerical approximations of these terms has to be done in an rather ad hoc manner.Some recent works have shed light on this direction. In particular, Kupferman, Givon and Hald proved[5] the existence and uniqueness of the orthogonal dynamics for a classical dynamical system with Mori’sprojection operator. More recently, Zhu and Venturi [32] were able to get the uniform boundedness of theorthogonal dynamics propagator for Hamiltonian systems using semigroup estimates [32]. The theoreticalresult obtained therein was later extended and greatly improved for the analysis of the eﬀective Mori-Zwanzig(EMZ) equation corresponding to stochastic diﬀerential equations (SDEs) [35]. In particular, they developeda thorough mathematical analysis of the EMZ equation using the hypoelliptic technique developed mainlyby H´erau, Nier, Eckmann, Hairer and Helﬀer [16, 8, 15]. The key ﬁnding is that the ergodicity and regularity ∗ Corresponding author

Email address: [email protected] (Yuanran Zhu)

Preprint submitted to arXiv February 3, 2021 a r X i v : . [ m a t h - ph ] F e b f the stochastic ﬂow generated by the Markovian semigroup e − t K , where K is the Kolmogorov operatorcorresponding to the SDE, implies the ergodicity and regularity of the stochastic ﬂow generated by the EMZ orthogonal semigroup e − t QKQ , provided that P = I − Q is a Mori-type projection operator. This connectionenables us to get a clear understanding on the dynamical properties of the orthogonal dynamics generatedby e − t QKQ .In this work, we continue Zhu and Venturi’s hypoelliptic study of the EMZ equation for stochasticdynamical systems. The main objective of the paper is twofold. First, we apply the semigroup estimateobtained in [35] to diﬀerent stochastic systems and show that it enables us to derive useful prior estimatesfor the statistics of observables. In particular, we prove that for some commonly used stochastic modelsin physics, any reduced-order system observable has exponentially decaying time autocorrelation functionand EMZ memory kernel. This fact veriﬁed the frequently used exponentially decaying assumption for thememory kernel from a theoretical point of view. Secondly, we will demonstrate the eﬀectiveness of the seriesexpansion approximation method for the memory kernel reconstruction of the EMZ equation. To this end,we will focus on a generalized ﬁrst-principle parametrization method [34], and various data-driven methods[1, 2, 4, 20] developed over the years. For the numerical example we considered, both the ﬁrst-principle andthe data-driven method are proven to yield accurate simulation result within the range of their applicability.Moreover, we will prove the convergence of the commonly used data-driven method using the regularityestimate for the orthogonal dynamics. For the reduced-order modeling problem of a large-scale stochasticsystem, the proposed analysis for the EMZ equation shows the potential usage of the hypoelliptic methodin understanding the dynamical behavior of the reduced-order model, and the numerical methodology weprovide is shown to be a practical way to solve it.This paper is organized as follows. In Section 2, we brieﬂy review the derivation of the eﬀective MoriZwanzig (EMZ) equation for the stochastic dynamical system driven by white noise. In Section 3, we focus onthe equilibrium and nonequilibrium dynamics of the interacting anharmonic chains and derive prior estimatesfor various observable statistics such as the time autocorrelation function, the nonequilibrium mean, the EMZmemory kernel and the ﬂuctuation force. In Section 4, we introduce diﬀerent parametrization methods toapproximate the EMZ memory kernel and prove their convergence. All these theoretical results are veriﬁednumerically in Section 5 via the simulation of the Langevin dynamics for a Fermi-Pasta-Ulam chain model.The main ﬁndings of this paper are summarized in Section 6.

2. Eﬀective Mori-Zwanzig equation for stochastic system

Let us consider a d -dimensional stochastic diﬀerential equation in R d : d x ( t ) dt = F ( x ( t )) + σ ( x ( t )) ξ ( t ) , x (0) = x ∼ ρ ( x ) , (1)where F : R d → R d and σ : R d → R d × m are smooth functions. ξ ( t ) is a m -dimensional Gaussian whitenoise with independent components, and x is a random initial state characterized in terms of a probabilitydensity function ρ ( x ). It is well known that the system of SDEs (1) induces a d -dimensional Markovianprocess in R d . This allows us to deﬁne a composition operator M ( t,

0) that pushes forward in time theaverage of the observable u ( t ) = u ( x ( t )) over the noise, i.e., E ξ ( t ) [ u ( x ( t )) | x ] = M ( t, u ( x ) = e t K u ( x ) , (2)where M ( t,

0) is a Markovian semigroup generated by the following (backward) Kolmogorov operator [26, 17]: K ( x ) = d (cid:88) k =1 F k ( x ) ∂∂x k + 12 m (cid:88) j =1 d (cid:88) i,k =1 σ ij ( x ) σ kj ( x ) ∂∂x i ∂x k . (3)With the evolution operator M ( t,

0) available, we can now derive the Mori-Zwanzig equation for noise-averaged u ( x ( t )). To this end, we introduce a projection operator P and the complementary projection Q = I − P . By diﬀerentiating Dyson’s identity [34, 33, 6] for the Markovian semigroup M ( t, ∂∂t e t K u (0) = e t K PK u (0) + e t QKQ QK u (0) + (cid:90) t e s K PK e ( t − s ) QKQ QK u (0) ds, (4)2here u (0) = u ( x ). The three terms on the right hand side of (4) are called streaming term, ﬂuctuation(or noise) term, and memory term respectively. It is often more useful to compute the evolution of theobservable u ( t ) within a closed linear space, e.g., the image of the projection operator P . Hence we applythe projection operator P to (4) and get the projected equation: ∂∂t P e t K u (0) = P e t K PK u (0) + (cid:90) t P e s K PK e ( t − s ) QKQ QK u (0) ds. (5)Due to the fact that equation (4) and its projected form (5) only describe the dynamics of noise averageof observable u ( x ), they were called as the eﬀective Mori-Zwanzig (EMZ) equation for stochastic systems.The EMZ equation and the classical MZ equation for deterministic (autonomous) systems [34, 32, 33] share asame structure. The only diﬀerence is that the Liouville operator L is replaced by a Kolmogorov operator K .Depending on the choice of the projection operator, the EMZ equation (4) and (5) yield evolution equationsfor diﬀerent statistical quantities. In this paper, we mainly focus on the EMZ equation correspondingto Mori-type linear projection operator. To drive such equation, we consider the weighted Hilbert space H = L ( R d , ρ ), where ρ is a positive weight function in R d . Let (cid:104) h, g (cid:105) ρ = 1 (cid:82) ρd x (cid:90) h ( x ) g ( x ) ρ ( x ) d x h, g ∈ H (6)be the inner product in H . We introduce the following projection operator P h = M (cid:88) i,j =1 G − ij (cid:104) u i (0) , h (cid:105) ρ u j (0) , h ∈ H. (7)where G ij = (cid:104) u i (0) , u j (0) (cid:105) ρ and u i (0) = u i ( x ) ( i = 1 , ..., M ) are M linearly independent functions withrespect to inner product (cid:104)· , ·(cid:105) ρ . With P deﬁned as (7), we can rewrite the EMZ equation (4), and itsprojected version (5), as d u ( t ) dt = Ω u ( t ) + (cid:90) t K ( t − s ) u ( s ) ds + f ( t ) , (8) ddt P u ( t ) = Ω P u ( t ) + (cid:90) t K ( t − s ) P u ( s ) ds. (9)where u ( t ) = [ u ( t ) , . . . , u M ( t )] T and G ij = (cid:104) u i (0) , u j (0) (cid:105) ρ (Gram matrix) , (10a)Ω ij = M (cid:88) k =1 G − jk (cid:104) u k (0) , K u i (0) (cid:105) ρ (streaming matrix) , (10b) K ij ( t ) = M (cid:88) k =1 G − jk (cid:104) u k (0) , K e t QKQ QK u i (0) (cid:105) ρ (memory kernel) , (10c) f i ( t ) = e t QKQ QK u i (0) (ﬂuctuation term) . (10d)In statistical mechanics, the EMZ equation (8) and (9) are often called as the generalized Langevin equations.We note that the classical second ﬂuctuation-dissipation theorem [34] for the EMZ equation is violated dueto non-skew-adjointness of the Kolmogorov operator K .

3. Applications of the hypoelliptic analysis for the EMZ equation

In the previous section, we demonstrate how the EMZ equation is derived from the evolution operator e t K and the orthogonal e t QKQ . In this section, we focus on the prior estimation of these two semigroupsand apply the established analytical results to various physical models. To be consistent with the literatureon hypoelliptic analysis, we will use the negative of K and QKQ as semigroup generators and write the3emigroups appearing in EMZ equation (4) as e − t K and e − t QKQ . Moreover, all estimates are obtained ﬁrstin an regular Hilbert space L ( R d ) and then transformed back in weighted Hilbert space L ( R d ; ρ ) (seeexplanations in [35]). Throughout the paper, we denote the standard L ( R d ) norm as (cid:107) · (cid:107) . The innerproduct in L ( R d ; ρ ) is deﬁned as (6) with the induced weighted norm (cid:107) · (cid:107) L ρ given by (cid:107) · (cid:107) L ρ = (cid:20) (cid:82) ρd x (cid:90) ( · ) ρd x (cid:21) . Unless otherwise stated we only consider scalar quantities of interest. We begin our discussion by reviewingsome results proved in [35]:

Proposition 1 (Zhu and Venturi [35]) . Assuming a Kolmogorov operator ˜ K of the form (3) satisﬁes somesuitable conditions (see Theorem 2 [35] for details). If the spectrum of ˜ K in L ( R n ) is such that σ ( ˜ K ) ∩ i R = { } , then there exits positive constants α and C = C ( α ) such that (cid:107) e − t ˜ K ˜ u − ˜ π ˜ u (cid:107) ≤ Ce − αt (cid:107) ˜ u (cid:107) , (11) for ˜ u ∈ L ( R n ) and all t > , where ˜ π is the spectral projection onto the kernel of ˜ K . Moreover, the n -thorder derivatives of the semigroup e − t ˜ K satisﬁes (cid:107) e − t ˜ K ˜ K n (cid:107) ≤ (cid:18) B ˜ K (cid:18) tn (cid:19) + (cid:107) ˜ π ˜ K(cid:107) (cid:19) n , B ( t ) = Ce − αt (cid:20) t + · · · + 1 t M (cid:21) , (12) for some positive constant C and M . In [35], it is further shown that the similar semigroup estimates hold for orthogonal semigroup e t ˜ Q ˜ K ˜ Q if˜ P = I − ˜ Q is ﬁnite-rank, symmetric projection operator such as Mori’s projection. In particular, we have Proposition 2 (Zhu and Venturi [35]) . Assume that ˜ K satisﬁes all conditions listed in Proposition 1. If ˜ P : L ( R n ) → L ( R n ) is a symmetric, ﬁnite-rank projection operator, and the spectrum of ˜ Q ˜ K ˜ Q in L ( R n ) is such that σ ( ˜ Q ˜ K ˜ Q ) ∩ i R = { } , then there exits positive constants α ˜ Q and C = C ( α ˜ Q ) such that (cid:107) e − t ˜ Q ˜ K ˜ Q ˜ u − ˜ π ˜ Q ˜ u (cid:107) ≤ Ce − α ˜ Q t (cid:107) ˜ u (cid:107) (13) for all ˜ u ∈ L ( R n ) and t > , where ˜ π ˜ Q is the spectral projection onto the kernel of ˜ Q ˜ K ˜ Q . Moreover, the n -th order derivatives of the semigroup e − t ˜ Q ˜ K ˜ Q satisﬁes (cid:107) e − t ˜ Q ˜ K ˜ Q ( ˜ Q ˜ K ˜ Q ) n (cid:107) ≤ (cid:18) B ˜ Q (cid:18) tn (cid:19) + (cid:107) ˜ π ˜ Q ( ˜ Q ˜ K ˜ Q ) (cid:107) (cid:19) n , B ˜ Q ( t ) = Ce − α Q t (cid:20) t + · · · + 1 t M Q (cid:21) , (14) for some positive constant C and M Q .3.1. Application to Langevin dynamics Consider the Langevin dynamics of an interactive particle system, described by the following system ofSDEs in R d :  d q dt = 1 m p d p dt = −∇ V ( q ) − γm p + σ ξ ( t ) . (15)In eqn (15), m is the mass of each particle, V ( q ) is the interaction potential and ξ ( t ) is a d -dimensionalGaussian white noise process modeling physical Brownian motion. The parameters γ and σ are linked by theﬂuctuation-dissipation relation σ = (2 γ/β ) / , where β is proportional to the inverse of the thermodynamictemperature. The (negative) Kolmogorov operator (3) associated with the SDE (15) is given by K = − p m · ∇ q + ∇ q V ( q ) · ∇ p + γ (cid:18) p m · ∇ p − β ∆ p (cid:19) , (16)4here “ · ” denotes the standard dot product. If the interaction potential V ( q ) is strictly positive at inﬁnityand satisﬁes the weak ellipticity assumption (Hypothesis 1 in [35]), then the Langevin equation (15) admits anunique invariant Gibbs distribution given by ρ eq ( p , q ) = e − β H /Z , where H = (cid:107) p (cid:107) m + V ( q ) is the Hamiltonianand Z is the partition function. In [35], it is further proved that Proposition 1 holds for any ˜ u ∈ L ( R d )and t > π ( · ) = (cid:104) ( · ) , e − β H / (cid:105) e − β H / . Now we choose ρ eq as the weight of Hilbert space L ( R d ; ρ eq ),then the L -estimation (11) can be unitarily transformed [35] into the semigroup estimate in L ( R d ; ρ eq )as: (cid:107) e − t K u − π u (cid:107) L eq ≤ Ce − αt (cid:107) u (cid:107) L eq , (17)where π ( · ) = (cid:104) ( · ) (cid:105) eq = E [( · )]. Similarly, for orthogonal semigroup e t QKQ we have: (cid:107) e − t QKQ u − π Q u (cid:107) L eq ≤ Ce − α Q t (cid:107) u (cid:107) L eq . (18)Diﬀerent from the estimate for e − t K , the explicit expression of the spectral projection operator π Q dependson the projection operator P . For Mori-type projection operator P we considered, if there exists uniqueobservable set { w j } mj =1 such that (cid:104) w j , u i (cid:105) eq = 0 and K w j = K ∗ w j = u j , then π Q admits analytical form π Q ( · ) = π ( · ) + P ( · ) + m (cid:88) i =1 (cid:104) ( · ) , w i (cid:105) eq w i . (19)Otherwise π Q ( · ) = π + P . With semigroup estimates (17) and (18), we can drive prior estimations fordiﬀerent observable statistics. Equilibrium state . The equilibrium Langevin dynamics was studied thoroughly in [35]. Here we onlyreview the key estimation result while the derivation is omitted. If the initial condition of the Langevindynamics is (15) set to be ρ = ρ ( t = 0) = ρ eq , then the system is in a statistical equilibrium state, thecorresponding dynamics is called the equilibirum Langevin dynamics . For equilibrium system, the timeautocorrelation function C ( t ) of a scalar observable u ( x ( t )) = u ( p ( t ) , q ( t )) is stationary quantity satisfying C ( t, s ) = C ( | t − s | , C ( t ) := E x (0) [ E ξ ( t ) [ u ( t ) u (0) | x (0)]] = (cid:104) e t K u , u (cid:105) eq . (20)Using Cauchy-Schwarz inequality and the semigroup estimate (17), for u ∈ L ( R d ; ρ eq ) it is easy to getthe asymptotic estimate for C ( t ): | C ( t ) − (cid:104) u (cid:105) eq | = |(cid:104) e − t K u , u (cid:105) eq − (cid:104) u (cid:105) eq | = |(cid:104) e − t K u − (cid:104) u (cid:105) eq , u (cid:105) eq |≤ (cid:107) e − t K u − (cid:104) u (cid:105) eq (cid:107) L eq (cid:107) u (cid:107) L eq ≤ Ce − αt (cid:107) u (cid:107) L eq . (21)This implies the equilibrium correlation function C ( t ) approaches to the equilibrium value (cid:104) u (cid:105) eq = E [ u ]exponentially fast. To get the EMZ equation for observable u ( t ), we introduce Mori-type projection P = (cid:104)· , u (cid:105) eq u . Substituting this into EMZ equation (8) and (9) yields: ddt ˜ u ( t ) = Ω˜ u ( t ) + (cid:90) t K ( t − s )˜ u ( s ) ds + f ( t ) , (22) ddt C ( t ) = Ω C ( t ) + (cid:90) t K ( t − s ) C ( s ) ds, (23)where ˜ u ( t ) = E ξ ( t ) [ u ( x ( t )) | x ] and Ω = (cid:104) u , K u (cid:105) eq / (cid:104) u (cid:105) eq . If we further have (cid:104) u (cid:105) eq = 1, (cid:104) u (cid:105) eq = (cid:104) u , w (cid:105) eq = 0 and K w = K ∗ w = u , by using Cauchy-Schwarz inequality and the semigroup estimate(18), we can get the exponential convergence estimate for the EMZ memory kernel K ( t ) and the ﬂuctuationforce f ( t ): | K ( t ) − ( (cid:104)K u (cid:105) eq (cid:104)QK ∗ u (cid:105) eq + (cid:104)QK ∗ u , w (cid:105) eq (cid:104)K u , w (cid:105) eq ) | ≤ Ce − α Q t , (24) (cid:13)(cid:13)(cid:13) f ( t ) − (cid:16) (cid:104)QK u (cid:105) eq + (cid:104)QK u , w ) (cid:105) eq (cid:17)(cid:13)(cid:13)(cid:13) L eq ≤ Ce − α Q t (cid:107)QK u (cid:107) L eq . (25)5n most cases, such w does not exists, then estimates (24) and (25) reduce to | K ( t ) − (cid:104)K u (cid:105) eq (cid:104)QK ∗ u (cid:105) eq | ≤ Ce − α Q t , (26) (cid:13)(cid:13)(cid:13) f ( t ) − (cid:104)QK u (cid:105) eq (cid:13)(cid:13)(cid:13) L eq ≤ Ce − α Q t (cid:107)QK u (cid:107) L eq . (27) Nonequilibrium nonsteady state.

Semigroup estimate (17) can also be used to get prior estimates fornonequilibrium Langevin dynamics. If the initial condition of (15) is set to be ρ = ρ ( t = 0) (cid:54) = ρ eq , thesystem evolves from a nonequilibrium nonsteady state. We now study the dynamics of the nonequilibirummean function M ( t ) deﬁned as: M ( t ) := E x (0) [ E ξ ( t ) [ u ( t ) | x (0)]] = (cid:104) e t K u (cid:105) ρ .M ( t ) encodes the statistical moment information for a scalar observable u ( x ( t )). Using the Cauchy-Schwarzinequality, the substitution ρ = ρ √ ρ eq / √ ρ eq and the estimate (17), we obtain the asymptotic estimate for M ( t ): | M ( t ) − (cid:104) u (cid:105) eq | = |(cid:104) e − t K u (cid:105) ρ − (cid:104)(cid:104) u (cid:105) eq (cid:105) ρ | = |(cid:104) e − t K u − (cid:104) u (cid:105) eq (cid:105) ρ |≤ (cid:107) e − t K u − (cid:104) u (0) (cid:105) eq (cid:107) L eq (cid:13)(cid:13) ρ /ρ eq (cid:13)(cid:13) ≤ Ce − αt (cid:107) u (cid:107) L eq (cid:13)(cid:13) ρ /ρ eq (cid:13)(cid:13) . Diﬀerent from the equilibrium case, the convergence of the nonequilibrium mean M ( t ) requires the ﬁnitenessof the L ( R d ) norm (cid:107) ρ /ρ eq (cid:107) , which imposes additional constraint on the initial probability distribution ρ .For instance, if the initial probability density is set to be the Gibbs distribution ρ = e − β H / /Z β/ at hightemperature T ∝ /β , we have (cid:107) ρ /ρ eq (cid:107) = + ∞ therefore the above estimate is not suﬃcient to guaranteethe exponential convergence of M ( t ) towards the equilibrium value (cid:104) u (cid:105) eq . Similar conclusion can be drawnfrom the return to equilibrium estimate for the probability density function ρ ( t, p , q ) (see [15], Section 6.5): (cid:90) R d (cid:12)(cid:12)(cid:12)(cid:12) ρ ( t, p , q ) − Z e − β H (cid:12)(cid:12)(cid:12)(cid:12) e β H d p d q ≤ Ce − α t . The above estimate is a dual of (11) and holds only for ρ = ρ (0 , p , q ) ∈ e − β H / S (cid:48) ( R d ), where S (cid:48) ( R d ) is thespace of tempered distributions. Obviously when ρ = e − β H / /Z β/ / ∈ e − β H / S (cid:48) ( R d ), there is no theoreticalguarantee that the marginal distribution ρ u ( t ) would converges to the equilibrium marginal distribution. Consider a chain of nearest-neighbor interacting anharmonic oscillators coupled to two heat bath at endof the chain. Without adding external forces, the chain dynamics is determined by the system Hamiltonian: H S ( p , q ) = N (cid:88) i =0 (cid:18) p i V ( q i ) (cid:19) + N (cid:88) i =1 V ( q i − q i − ) . Now we attach the boundary oscillators to two thermostats with temperature T L and T R , then the dynamicsof the resulting heat conduction model [9, 7, 8] is described by the system of stochastic diﬀerential equations:  dq i = p i dt, i = 0 , · · · N,dp = − V (cid:48) ( q ) + V (cid:48) ( q − q ) dt + r L dtdp N = − V (cid:48) ( q N ) + V (cid:48) ( q N − q N − ) dt + r R dtdp j = − V (cid:48) ( q j ) dt − V (cid:48) ( q j − q j − ) dt + V (cid:48) ( q j +1 − q j ) dt, j = 1 , · · · N − dr L = − γ L r L dt + λ L γ L q dt − λ L (cid:112) γ L T L ξ L ( t ) dtdr R = − γ R r R dt + λ R γ R q N dt − λ R (cid:112) γ R T R ξ R ( t ) dt (28)6here λ L , λ R are the coupling constants between the boundary oscillators and the heat bath. ξ L ( t ) and ξ R ( t ) are the standard Gaussian white noise. The Kolmogorov backward operator K corresponding to thesystem of SDEs (28) is given by: K = λ L γ L T L ∂ r L + λ R γ R T R ∂ r R − γ L ( r L − λ L q ) ∂ r L − γ R ( r R − λ R q N ) ∂ r R + r L ∂ p + r R ∂ p N + N (cid:88) i =0 ( p i ∂ q i − V (cid:48) ( q i ) ∂ p i ) − N (cid:88) i =1 V (cid:48) ( q i − q i − )( ∂ p i − ∂ p i − ) . (29) Equilibrium state.

When T L = T R = T , the system admits an invariant probability density which isgiven by the extended Gibbs distribution ρ eq = e − β G ( p , q , r ) /Z , where β = 1 /T and G ( p , q , r ) is the eﬀectiveenergy corresponding to the chain+heat bath system, deﬁned as G ( p , q , r ) = H S ( p , q ) + r L λ L + r R λ R − q r L − q N r R . (30)The analysis for the equilibrium heat conduction model is exactly the same as the one for the Langevindynamics. For potential energy V and V satisfying suitable conditions listed in [8], Eckmann and Hairerproved that the spectrum of the transformed Kolomogorov operator ˜ K in L ( R N +4 ) is discrete and has thecusp-shape spectrum S ˜ K . Therefore according to Proposition 1, if ˜ K has no purely imaginary eigenvalue in L ( R N +4 ), then we have the exponentially decay estimate for scalar observable u ( x ( t )): (cid:107) e − t K u − π u (cid:107) L eq ≤ Ce − αt (cid:107) u (cid:107) L eq , (31) (cid:107) e − t QKQ u − π Q u (cid:107) L eq ≤ Ce − α Q t (cid:107) u (cid:107) L eq , (32)where weighted Hilbert space L eq = L ( R d ; ρ eq ). Following the procedure we outlined in Section 3.1, it iseasy to obtain corresponding exponentially decaying estimates for the equilibrium correlation function C ( t ),EMZ memory kernel K ( t ) and the ﬂuctuation force f ( t ). For the sake of brevity, the derivation details areomitted. Nonquilibrium steady state.

When T L (cid:54) = T R , it is proved in [7] that the system admits an unique invariantmeasure µ . Its density ρ S is an smooth function on R N +4 and can be represented as ρ S = ˜ h ( p , q , r ) e − β G ( p , q , r ) . (33)In (33), β < min { β L , β R } , ˜ h ( p , q , r ) ∈ (cid:84) γ> L ( R N +4 ; G γ ( p , q , r )) is a function decays faster than anypolynomial as x → ∞ . ρ S characterises a nonequilibrium steady state of the system. General speaking, it ishard to get an explicit expression of ˜ h ( p , q , r ), hence of the probability density (33). However, we can still useGibbs form equilibrium probability density e − β G /Z as a reference state to derive prior estimates. To this end,we consider a weighted Hilbert space L ( R N +4 ; ρ r ), where ρ r = e − β G /Z and 1 /β = T > max { T L , T R } .For the nonequilibrium case, the spectrum estimate obtained by Eckmann et al in [8] still hold, which impliesthe following exponentially decay estimate for scalar observable u ( x ( t )): (cid:107) e − t K u − π u (cid:107) L r ≤ Ce − αt (cid:107) u (cid:107) L r . (34)At the steady state, the correlation function C ( t ) is stationary which can be deﬁned as (20) if the initialcondition of (28) satisﬁes ρ (0) = ρ S . Using Cauchy-Schwarz inequality and the formal expression of thesteady state density (33), we obtain | C ( t ) − (cid:104) u (cid:105) ρ S | = |(cid:104) e − t K u , u (cid:105) ρ S − (cid:104) u (cid:105) ρ S , u (cid:105) ρ S | = |(cid:104) e − t K u − (cid:104) u (cid:105) ρ S , u (cid:105) ρ S |≤ (cid:107) e − t K u − (cid:104) u (cid:105) ρ S (cid:107) L r (cid:107) ˜ h ( p , q , r ) u (cid:107) ≤ Ce − αt (cid:107) ˜ h ( p , q , x ) u (cid:107)(cid:107) u (cid:107) L r Since ˜ h ( p , q , r ) ∈ (cid:84) γ> L ( R N +4 ; G γ ( p , q , r )), for any observable u ( x ( t )) ∈ S (cid:48) ( R N +4 ), e.g. polynomialfunctions, we have (cid:107) ˜ h ( p , q , x ) u (cid:107)(cid:107) u (cid:107) L r < + ∞ . The above estimate implies the steady state correlationfunction C ( t ) decays to (cid:104) u (cid:105) ρ S exponentially fast. We emphasize that all estimates in Section 3 can be readilygeneralized to the N -dimensional EMZ equation (8) and (9) where the observable u ( x ( t )) is a N -dimensionalvector [35]. 7 . Parametrization of the EMZ memory kernel From the previous discussion, we see that the prior estimation for the EMZ equation memory kernelimplies that K ( t ) is bounded by an exponentially decaying function. However, it does not answer what K ( t ) exactly is, which is an important problem for the application the EMZ equation. In this section,we turn to focus on the numerical approximation of the EMZ memory kernel. The main method we areconcerned with is the series expansion method. Over the years, various basis functions have been used toconstruct approximation schemes of the classical system Mori-Zwanzig memory kernel [1, 2, 19, 20, 18, 4, 33],where the expansion coeﬃcients (parameters) are obtained through ﬁrst-principle or data-driven methods.We will show that for the EMZ equation corresponding to the SDEs, similar approaches can be used toparametrize the memory kernel. In particular, we will prove that many commonly used date-driven methodsare convergent due to the regularity of the orthogonal ﬂow. A ﬁrst-pinciple method to approximate the memory kernel was considered in [33, 34]. It is shown that aseries expansion of the memory kernel can be derived exactly from the semigroup expansion of orthogonalsemigroup e t QKQ . Following the derivation given therein, we consider the series expansion of the orthogonalsemigroup: e t QKQ = ∞ (cid:88) n =0 g n ( t )Φ n ( QKQ ) , (35)where Φ n ( QKQ ) is the n -th order polynomial function of operator QKQ and g n ( t − τ ) the correspondingtemporal basis. The simplest choice is the Taylor expansion where Φ n and g n are Φ n ( QKQ ) = (

QKQ ) n and g n ( t ) = t n /n !. Other possible choice of Φ n ( n = 0 , . . . , N ) can be, e.g. the Faber polynomials [33], and thecorresponding g n ( t ) = e − at J n ( bt ), where J n ( bt ) is the Bessel function of the ﬁrst kind. Semigroup expansion(35) leads to function series expansions of the memory kernel. For a one dimensional EMZ equation, asubstitution of (35) into (10c) yields K ( t ) = ∞ (cid:88) n =0 g n ( t ) (cid:104)K Φ n ( QKQ ) QK u , u (cid:105) ρ (cid:104) u , u (cid:105) ρ = ∞ (cid:88) n =0 k n g n ( t ) , (36)where k n is the n -th expansion coeﬃcient which can be understood as the operator cumulant averagedwith respect to the probability density ρ . Naturally, a truncation of the expansion series (36) yields anapproximation of the exact memory kernel. We emphasize that from a theoretical point of view, it is hardto prove the convergence of expansion (36) for nonlinear SDEs due to unboundedness of the operator QKQ .However, the validity of this approximation method has been veriﬁed numerically for linear and nonlinearHamiltonian system in the statistical equilibrium [33, 34].

First-principle method to calculate k n . The ﬁrst-principle method calculate k n via the evaluation of theoperator cumulants in (36). This can be realized using a recursive scheme and the associated combinatorialalgorithm introduced in [34]. The original method is developed for the MZ equation of deterministic Hamil-tonian systems. However, it can be readily generalized to stochastic system EMZ equation with some slightmodiﬁcations of the derivation. Here we only brieﬂy review the main idea of the algorithm and refer to [34]for detailed explanations. Without loss of generality, it is convenient to consider a one-dimensional Mori’sprojection: P f = (cid:104) f, u (cid:105) ρ (cid:104) u , u (cid:105) ρ , (37)and introduce the following notation µ i = (cid:104)K ( QK ) i − u , u (cid:105) ρ (cid:104) u , u (cid:105) ρ , γ i = (cid:104)K i u , u (cid:105) ρ (cid:104) u (0) , u (0) (cid:105) ρ . (38)Clearly, if we are given { µ , . . . , µ n +2 } , then we can easily compute { k , . . . , k n } in (36), and therefore the n -th order approximation of the memory kernel K ( t ) for any given polynomial function Φ n . For example,8f Φ n ( QKQ ) = (

QKQ ) n then k q = µ q +2 /q ! ( q = 0 , . . . , n ). Directly evaluating µ i is a daunting task since itinvolves taking operator powers and averaging of operator QKQ which is a integral-diﬀerential operator bydeﬁnition. However, the following recursive formula indicates that µ i can be constructed iteratively from γ i : µ = γ , µ = γ − µ γ , · · · , µ n = γ n − n − (cid:88) j =1 µ n − j γ j . (39)The proof of (39) is provided in Appendix A. Recurrence relation (39) shifts the problem of computing { µ , . . . , µ n } to the problem of evaluating the coeﬃcients { γ , . . . , γ n } deﬁned in (38). This can be doneiteratively using the enumerative combinatorial algorithm introduced in [34], with the the Livouille operator L used therein replaced by the Kolomogorov operator K . For the sake of brevity, we omit technical detailswhich can be found in [34]. In Appendix B, we provide the derivation of the combinatrorial algorithm forthe Langevin dynamics (15) of the Fermi-Pasta-Ulam (FPU) chain. Diﬀerent from the ﬁrst-principle method, there are many established data-driven methods which can beused to parametrize the memory kernel. Generally speaking, these methods use data collected by simulatingstochastic dynamics (1) to approximate the expansion coeﬃcients k n . The expansion series can be formulatedin the temporal space as well as the frequency space [18]. In this section, we are only concerned with theﬁrst case and use the following ansatz to approximate K ( t ): K ( t ) ≈ N (cid:88) n − ˆ k n φ n ( t ) , where ˆ k n = (cid:104) K ( t ) , φ n ( t ) (cid:105) ω (cid:104) φ n ( t ) , φ n ( t ) (cid:105) ω . (40)In (40), { φ n ( t ) } is the basis function deﬁned in some open interval I ⊂ R + . The common choice of whichare the orthogonal functions in a weighted Hilbert space L ( I, ω ). Under this setting, (40) becomes ageneralized Fourier series. Hence, we can apply established results in approximation theory, say [13], toprove the convergence of the series expansion (40) as N → ∞ . As a preparation, we ﬁrst use the Cauchy-Schwartz inequality and semigroup estimate (14) to obtain the upper bounds of the n -th order derivative ofthe memory kernel: | K ( n ) ( t ) | = |(cid:104)K ∗ e t QKQ ( QKQ ) n QK u , u (cid:105) ρ | = |(cid:104) e t QKQ ( QKQ ) n QK u , K ∗ ρ u (cid:105) ρ |≤ C (cid:107)QK u (cid:107) L ρ (cid:107)K ∗ ρ u (cid:107) L ρ (cid:18) B Q (cid:18) tn (cid:19) + (cid:107) π Q ( QKQ ) (cid:107) (cid:19) n , t > . (41)According to the deﬁnition of B Q ( t ) in (14), estimate (41) implies that K ( n ) ( t ) is bounded by a continuousfunction of time in domain I = ( T , T ), where 0 ≤ T ≤ T < + ∞ . Hence for suitable weight function ω ,we have K ( t ) | I ∈ ∩ ∞ k =1 H kω ( I ), where K ( t ) | I is the restriction of K ( t ) in the open interval I and H kω ( I ) isthe weighted Sobolev space deﬁned in I . This regularity provides suﬃcient conditions for the convergenceof expansion (40). If { φ n ( t ) } is chosen to be, say the shifted Jacobi-type polynomials deﬁned in I , thenaccording to Theorem 6.2.4 in [13], the following convergence estimate holds for any 0 < m < N : (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) K ( t ) | I − N (cid:88) n =0 ˆ k n φ n ( t ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L ω ( I ) ≤ CN m (cid:13)(cid:13)(cid:13) (1 − t ) m/ K ( m ) ( t ) (cid:13)(cid:13)(cid:13) L ω ( I ) , t ∈ I = ( T , T ) . (42)Since K ( t ) is naturally deﬁned in domain I = (0 , + ∞ ), we can also set { φ n ( t ) } to be the standard Laguerrepolynomial with the weight function ω = e − t/ . For ﬁxed n ∈ N + , using (41) we can getlim t → + ∞ | K ( n ) ( t ) | ≤ C (cid:107)QK u (cid:107) L ρ (cid:107)K ∗ ρ u (cid:107) L ρ (cid:107) π Q ( QKQ ) (cid:107) n , (43)9 = 1 β = 20 Figure 1: Sample path of the tagged oscillator momentum p ( t ). We display the result for the stochastic FPU system (48)with weak ( θ = 0 .

1) and strong nonlinearity ( θ = 1) at high ( β = 1) and low ( β = 20) temperature. which yields | K ( m ) ( t ) | t m/ ∈ L ω ( I ). According to Theorem 6.2.5 in [13], this leads to the following conver-gence estimate for any 0 < m < N : (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) K ( t ) − N (cid:88) n =0 ˆ k n φ n ( t ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L ω ( I ) ≤ C ( √ N ) m (cid:13)(cid:13)(cid:13) t m/ K ( m ) ( t ) (cid:13)(cid:13)(cid:13) L ω ( I ) , t ∈ I = (0 , + ∞ ) . (44)The semigroup estimate (14) we used in the above derivation holds in the uniform topology for any scalarobservable u ∈ L ( R n ; ρ ). As a consequence, the convergence rate we obtained on (42) and (44) are notoptimal. But we already know the convergence is spectral , i.e. faster than any polynomials. As far as we areconcerned, this is the ﬁrst convergence result for data-driven methods used in the Mori-Zwanzig framework.We also note that error estimate (42) only implies the approximation of K ( t ) within ( T , T ) is accurate. Inorder to maintain low extrapolation error, in applications we will use basis functions deﬁned in I = (0 , + ∞ )to approximate the memory kernel. Data-driven method to calculate k n . A substitution of the truncated expansion (40) into the projectedEMZ equation (5) yields ddt P u ( t ) ≈ Ω P u ( t ) + N (cid:88) n =0 k n (cid:90) t φ n ( s ) P u ( t − s ) ds. The projected observable P u ( t ) trajectories within a certain period [0 , T ] can be obtained via numericalsimulation of the SDE (1), then the expansion coeﬃcients k n can be obtained by solving numerically thefollowing regression problem:min { k n } Nn =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ddt P u ( t ) − Ω P u ( t ) − N (cid:88) n =0 k n (cid:90) t φ n ( s ) P u ( t − s ) ds (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L ω ( I ) . (45)In Section 5, we will use the LASSO regression [28] to solve (45) and get the approximated parameter set { k n } Nn =1 . When compared with the ﬁrst-principle method, the data-driven method in general has widerrange of applicability but also demands more computational power.

5. Applications

In this section, we will use the Langevin dynamics of a Fermi-Pasta-Ulam (FPU) chain model to numer-ically verify the theoretical results obtained in previous sections and validate the parametrization method of10 = 1 β = 20 -2 -0.500.51 -5 -4 -3 -2 -1 -2 -0.500.51 -5 -4 -3 -2 -1 Figure 2: Temporal auto-correlation function of the tagged oscillator momentum p j ( t ) for weakly nonlinear FPU system atdiﬀerent temperature T ∝ /β . We compare results we obtained by calculating the EMZ memory from ﬁrst principles using14th order Faber polynomials with results from MC simulation (10 sample paths). In the subplots, we display | C ( t ) /C (0) | and the exponentially decaying upper bound ce − αt with an estimated decaying rate α . the EMZ memory kernel. To this end, we consider the Hamiltonian of the FPU chain: H ( p , q ) = N − (cid:88) j =0 p j m + N − (cid:88) j =0 V ( q j +1 − q j ) , (46)where the potential energy is given by V ( q j +1 − q j ) = α q j +1 − q j ) + θ q j +1 − q j ) . (47) { q j , p j } are, respectively, the generalized coordinate and momentum of the j -th oscillator. In addition, theperiodic boundary condition q = q N and p = p N is imposed, and the total number of oscillators is set tobe N = 100. For such system, it is convenient to work on a new, non-canonical coordinate { r , p } where r j = q j − q j − is the distance between two neighboring oscillators. In the new coordinate, the Langevindynamics (15) for the stochastic FPU model is given by  ddt r j = 1 m ( p i − p i − ) ,ddt p j = ∂V ( r j +1 ) ∂r j +1 − ∂V ( r j ) ∂r j − γm p j + σξ ( t ) . (48)The corresponding Kolmogorov operator is K = L ( p , r ) + S ( p )= N − (cid:88) i =1 (cid:20)(cid:18) ∂V ( r i +1 ) ∂r i +1 − ∂V ( r i ) ∂r i (cid:19) ∂∂p i + 1 m ( p i − p i − ) ∂∂r i (cid:21) − N − (cid:88) i =1 γ (cid:18) p i m ∂∂p i − β ∂ ∂p i (cid:19) , (49)where L ( p , r ) is the Liouville operator in the new coordinate { r , p } , S ( p ) is the advection-diﬀusion operatorinvolving p . The sample paths for the momentum of a tagged oscillator p ( t ) are shown in Figure 1for system with diﬀerent parameters. Stochastic FPU chain with weak nonlinearity

We ﬁrst consider theequilibrium dynamics of the FPU chain with weak nonlinearity. To this end, we set modeling parameter α = 1, γ = 1 and θ = 0 .

1. The initial condition of (15) is set to be x (0) ∼ ρ eq where ρ eq = e − βH is theequilibrium Gibbs distribution. For the weakly nonlinear system, we aim to verify the following claims:i) The observable statistics, in particular the auto-correlation function C ( t ) and the corresponding mem-ory kernel K ( t ) deﬁned in the projected EMZ (23), decays exponentially to its equilibrium value.11 = 1 β = 20 -2 -0.200.20.40.60.8 -20 -10 -2 -0.200.20.40.60.81 -30 -20 -10 Figure 3: Approximated EMZ memory kernel corresponding to the tagged particle momentum correlation function C ( t ). Thesubplots display | K ( t ) /K (0) | and the exponentially decaying upper bound c Q e − α Q t with an estimated decaying rate α Q . Othersetting is same as Figure 2. ii) The ﬁrst-principle method introduced in Section 4.1 yields accurate approximation to the memorykernel K ( t ), therefore of C ( t ).For claim i), we note that the FPU potential energy deﬁned as (47) satisﬁes the weak ellipticity condi-tion (Hypothesis 1) in [35]. Therefore the theoretical results in Section 3.1 hold for any polynomial-typeobservable. Now we choose the momentum p j ( t ) of a tagged oscillator as quantity of interest and use Mori’sprojection P ( · ) = (cid:104) ( · ) , p j (0) (cid:105) eq / (cid:104) p j (0) (cid:105) eq to derive the projected EMZ equation (23). Some simple calculationimplies Ω = −

1, hence the projected EMZ equation for the momentum yields the evolution equation for thetime correlation function: dC ( t ) dt = − C ( t ) + (cid:90) t K ( t − s ) C ( s ) ds, (50)where C ( t ) = (cid:104) p j ( t ) , p j (0) (cid:105) eq . According to estimate (21), the auto-correlation function C ( t ) decays to 0exponentially fast with the rate α . In the non-canonical coordinate { p j , r j } , we have K q j = K ∗ eq q j = p j .Moreover, since the periodic boundary condition is posed q j cannot be written as a function of { p j , r j } (thelinear transformation { q j } Nj =1 → { r j } Nj =1 is not invertible). Hence the kernel projection operator π Q of QKQ admits the explicit form: π Q ( · ) = E [( · )] + P ( · ) , and the memory kernel estimate is given by (26). In Figure 2, we plot the auto-correlation function C ( t )obtained by Monte-Carlo (MC) simulation (10 sample paths) for FPU systems with mild nonlinearities( θ = 0 .

1) at diﬀerent temperatures ( β = 1 and β = 20). The corresponding memory kernel K ( t ) is shownin Figure 3 which is obtained by the ﬁrst-principle parametrization method. In both plots, we can see that C ( t ) and K ( t ) approaches to the predicted asymptotic C ( t = ∞ ) = 0 and K ( t = ∞ ) = 0 exponentially fast.For claim ii), we adopt the MZ-Faber expansion of e − t QKQ [33] to approximate the memory kernel,where e − at J n ( bt ) is the basis function. The simulation reult is displayed in Figure 2. It can be seen thatthe MZ-Faber approximation of the EMZ memory kernel yields relatively accurate results for FPU systemswith mild nonlinearties at both low ( β = 20) and high temperature ( β = 1). Stochastic FPU chain with strong nonlinearity.

When the modeling parameter of the stochastic FPUchain is set to be α = 1 and θ = 1, we get a strong nonlinear FPU chain. We can similarly derive theprojected EMZ equation for the time correlation function of the momentum of a tagged particle. The formof which will be the same as (50).The ﬁrst principle method introduced in Section 4.1 still can be applied here to approximate the EMZmemory kernel. However, large θ will lead to signiﬁcant numerical instabilities at large t when calculating K ( t ) and C ( t ) [34]. Hence, we will adopt the data-driven method to approximate the memory kernel. To12 = 1 β = 20 -2 -1 -0.500.51 -5 -4 -3 -2 -1 -2 -1 -0.500.51 -6 -4 -2 Figure 4: Temporal auto-correlation function of the tagged oscillator momentum p j ( t ) for strongly nonlinear FPU system atdiﬀerent temperature T ∝ /β . The MC simulation results (10 sample paths) of the correlation function are compared withthe one obtained by the data-driven memory kernel using Faber series (20th order) and the standard Laguerre polynomials(20th order). In the subplots, we display | C ( t ) /C (0) | and the exponentially decaying upper bound ce − αt with an estimateddecaying rate α . this end, we use the standard Laguerre polynomial [13] and Faber series [33] to construct the data-drivenapproximation scheme of the EMZ memory kernel. In particular, the LASSO regression is used to solve (45)numerically to get the approximated parameter { k n } Nn =1 . We also note that since the EMZ equation (23) isa Volterra-type integro-diﬀerential equation, constructing the memory kernel K ( t ) from C ( t ) is an inverseproblem which is ill-posed. Hence by solving numerically the regression problem (45), we can only get an eﬀective memory kernel which may yield accurate predication of C ( t ) or ˜ u ( t ) while cannot be used to justifythe exponential decaying estimate of K ( t ). Considering this fact, we only use the data-driven method toverify the following claims:iii) The auto-correlation function C ( t ) deﬁned in the projected EMZ (23) decays exponentially to itsequilibrium value.iv) The data-driven method introduced in Section 4.1 yields eﬀective approximations to the memorykernel K ( t ), therefore of C ( t ).To demonstrate iii), we use MC simulation (10 sample paths) to calculate the momentum auto-correlationfunction C ( t ). It is shown in the subplots of Figure 4 that C ( t ) deﬁned in the projected EMZ (23) decays0 exponentially fast. To validate iv), we adopt the Faber series and the standard Laguerre polynomials asthe basis function to construct the data-driven approximation schemes for K ( t ). These calculation results,along with the one obtained by the established rational approximation method [18], are presented in Figure4. We can see that the data-driven method leads to accurate predication of C ( t ).

6. Summary

In this paper, we mainly focus on the application of the eﬀective Mori-Zwanzig (EMZ) equation onthe reduced-order modeling of stochastic systems. In particular, we showed that the semigroup estimatesfor e − t K and e − t QKQ can be used to derive the exponentially decay upper bounds for various observablestatistics associated with the EMZ equation, including the auto-correlation function C ( t ), the EMZ memorykernel K ( t ) and the ﬂuctuation force. The results are presented for the Langevin dynamics of anharmonicoscillator chain and the heat conduction model in and out of statistical equilibrium. For each system,we have shown how the obtained prior estimates describe the dynamical properties of the reduced-orderequation for observables. In addition, we introduced both the ﬁrst-principle and data-driven methods toparametrize the EMZ memory kernel, and demonstrated that the regularity of K ( t ) enables us to prove theconvergence of frequently used data-driven approximation schemes. As far as we are concerned, this is theﬁrst theoretical convergence result regarding the approximation of the memory kernel. All these theoreticalﬁndings are veriﬁed numerically by simulating the Langevin dynamics for a Fermi-Pasta-Ulam (FPU) chain13odel. With the same example, we also proved the eﬀectiveness of the numerical methods within their rangeof applicability. We conclude by emphasizing that analytical results obtained in this paper can be generalizedand applied to the EMZ equation of other hypoelliptic stochastic systems. The numerical methodology weconsidered can be used to build eﬀective reduced-order models for large-scale stochastic dynamical systems. Acknowledgements

This research was partially supported by the Air Force Oﬃce of Scientiﬁc Research(AFOSR) grant FA9550-16-586-1-0092. The ﬁrst author would like to thank Prof. D. Venturi and A. Iserlesfor stimulating discussions.

Appendix A. Proof of recurrence relation (39)The proof of the recurrence relation (39) for equilibrium Hamiltonian system is given by Chu and Li in[4]. Here we provide a more general proof and show that such relation holds for any linear operator K andﬁnite-rank projection operator P . (39) is a direct consequence of the following operator polynomial identity: PK ( QK ) n = PK ( n +1) − n (cid:88) i =1 PK ( QK ) ( i − PK ( n − i +1) , ≤ n ∈ N (A.1)We can prove (A.1) by induction. For n = 0, we have identity PK = PK . Suppose for n = k , we have PK ( QK ) k = PK ( k +1) − k (cid:88) i =1 PK ( QK ) i − PK ( k − i +1) Then for n = k + 1, we have PK ( QK ) k +1 = PK ( QK ) k K − PK ( QK ) k PK = PK ( k +1) K − k (cid:88) i =1 PK ( QK ) ( i − PK ( k − i +1) K − PK ( QK ) k PK = PK ( k +2) − k +1 (cid:88) i =1 PK ( QK ) ( i − PK ( k − i +2) . By mathematical induction, the statement (A.1) holds for all 0 ≤ n ∈ N . Applying operator identity (A.1) onthe observable u (0) and then using the deﬁnition (37) and (38), we can get the recurrence relation (39). For M -dimensional ﬁnite-rank projection (7), using the same trick we get the following matrix form recurrencerelation: M n = Γ n − n − (cid:88) i =1 Γ n − i M i (A.2)where M n , Γ n are M × M dimensional matrix, deﬁned as PK ( QK ) ( n − u (0) = M n u (0) , PK n u (0) = Γ n u (0) . where u (0) = [ u (0) , u (0) , · · · , u M (0)] T is the (initial) vector of quantities of interest and Ran( P ) = { u i (0) } Mi =1 . Appendix B. First-principle algorithm to calculate γ n for stochastic FPU chain The notaion used in this section follows exactly from [34]. We ﬁrst note that for a system with potentialenergy given by a polynomial function, the action of the n -th operator power K n on a polynomial observable u ( x ) yields a polynomial function. Take u ( x ( t )) = x j as an example, this implies K n x j = (cid:88) b i ∈ B ( n ) a ( n ) b i x m ( i ) k k · · · x m ( i ) kr k r ⇒ K n +1 x j = KK n x j = (cid:88) b i ∈ B ( n +1) a ( n +1) b i x m ( i ) k k · · · x m ( i ) kr k r . (B.1)14here { a ( n ) b i } are polynomial coeﬃcients and { m ( i ) k j } are polynomial exponents. At this point, it is con-venient to deﬁne the set of polynomial exponents B ( n ) = { b , b , · · · } , the set polynomial coeﬃcients A ( n ) = { a ( n ) b , a ( n ) b , · · · } , and the combined index set I ( n ) = { A ( n ) , B ( n ) } . Clearly, I ( n ) identiﬁes uniquelythe polynomial (B.1), i.e., there is a one-to-one correspondence between I ( n ) and K n x j . If we can computethe mapping I ( n ) K −→ I ( n +1) , induced by the action of the Kolmogorov operator K to the polynomial (B.1)(represented by I ( n ) ), then we can compute the exact series expansion of K n x j for arbitrary n . The wholeprocess can be represented as u ( x ) → K u ( x ) → K u ( x ) → · · · → K n u ( x ) ⇐⇒ I (0) K −→ I (1) K −→ I (2) K −→ · · · K −→ I ( n ) where ⇐⇒ represents the translation between the action of K on the observables and its action on theindex set I ( n ) . It is left to determine the updating rule of I ( n ) for the Langevin dynamics of the FPUchain. Suppose we are interested in the distance between the oscillators j and j −

1, i.e., in the polynomialobservable u ( p , r ) = r j . Using the formal deﬁnition of the Kolmogorov operator (49), the action of K n on r j can be explicitly written as K n r j = (cid:88) b i ∈ B ( n ) a ( n ) b i r m ( i ) k k · · · r m ( i ) ku k u p s ( i ) l l · · · p s ( i ) lv l v , (B.2)where { k , . . . , k u } and { l , . . . , l v } are the relevant degrees of freedom for r and p at iteration n . We canexplicitly compute the sets of such relevant degrees of freedom as K r ( n, j ) = (cid:110) j − (cid:106) n (cid:107) , . . . , j + (cid:106) n (cid:107)(cid:111) L p ( n, j ) = (cid:26) j − (cid:22) n + 12 (cid:23) , . . . , j + (cid:22) n − (cid:23)(cid:27) , (B.3)The action of the Kolmogorov operator on each monomial appearing in (B.2) can be written as K r m ( i ) ku k r m ( i ) ku k u p s ( i ) l l · · · p s ( i ) lv l v = (cid:88) v ∈ K r ( n,j ) (cid:88) h ∈ L p ( n,j ) ( L r v + L p h + S p h ) r m ( i ) k k · · · r m ( i ) ku k u p s ( i ) l l · · · p s ( i ) lv l v , (B.4)where L r v = 1 m ( p v − p v − ) ∂∂r v L p h = (cid:2) α ( r h +1 − r h ) + θ (cid:0) r h +1 − r h (cid:1)(cid:3) ∂∂p h S p h = γp h m ∂∂p h − γβ ∂ ∂p h . The action of L r v , L p h and S p h on the monomial r m ( i ) k k · · · r m ( i ) ku k u p s ( i ) l l · · · s ( i ) lv l v can be explicitly computed. Thisyields explicit linear maps of the polynomial exponents b i = [ m ( i ) , s ( i ) ] , m ( i ) = [ m ( i ) k , . . . , m ( i ) k u ] , s ( i ) = [ s ( i ) l , . . . , s ( i ) l v ] , (B.5)and polynomial coeﬃcients a ( n ) b i . With such maps available, we can transform the combined index set I ( n ) (representing K n r j ) to I ( n +1) (representing K n +1 r j ). Speciﬁcally, we obtain I ( n +1) = I ( n +1) L r (cid:93) I ( n +1) L p (cid:93) I ( n +1) S p , I ( n +1) L r = (cid:93) v ∈ K r ( n,j ) B ( n ) (cid:93) i =1 1 (cid:93) k =0 (cid:110) m ( i ) v ( − k a ( n ) b i , [ m ( i ) − e v , s ( i ) + e v − k ] (cid:111) , I ( n +1) S p = (cid:93) h ∈ L p ( n,j ) B ( n ) (cid:93) i =1 (cid:26) { γs ( i ) h a ( n ) b i , − γβ s ( i ) h ( s ( i ) h − a ( n ) b i } , { [ m ( i ) , s ( i ) ] , [ m ( i ) , s ( i ) − e h ] } (cid:27) , I ( n +1) L p = (cid:93) h ∈ L p ( n,j ) B ( n ) (cid:93) i =1 1 (cid:93) k =0 (cid:110) { s ( i ) h ( − k +1 αa ( n ) b i , s ( i ) h ( − k +1 γa ( n ) b i } , { [ m ( i ) + e h + k , s ( i ) − e h ] , [ m ( i ) + 3 e h + k , s ( i ) − e h ] } (cid:111) . (B.6)On the other hand, since K ∗ eq = −L ( p , r ) + S ( p ). It is easy to obtain the updating rule for the correspondingindex set I ∗ ( n ) from the formal expression (B.6). With these results available, we can immediately determinethe coeﬃcients γ j in (38) by averaging over the probability density ρ eq as γ n = (cid:104)K n r j , r j (cid:105) eq (cid:104) r j , r j (cid:105) eq =  (cid:104)K n r j , K ∗ n eq r j (cid:105) eq (cid:104) r j , r j (cid:105) eq , n is even (cid:104)K n +12 r j , K ∗ n − eq r j (cid:105) eq (cid:104) r j , r j (cid:105) eq , n is odd (B.7)Using formula (38), (39) and the exact expression of the polynomial Φ n ( QKQ ), we can get the expansioncoeﬃcient k n in (36). References [1] A. D. Baczewski and S. D. Bond. Numerical integration of the extended variable generalized Langevinequation with a positive Prony representable memory kernel.

J. Chem. Phys , 139(4):044107, 2013.[2] M. Berkowitz, J. D. Morgan, D. J. Kouri, and J. A. McCammon. Memory kernels from moleculardynamics.

J. Chem. Phys , 75(5):2462–2463, 1981.[3] A. J. Chorin, O. H. Hald, and R. Kupferman. Optimal prediction and the Mori-Zwanzig representationof irreversible processes.

Proc. Natl. Acad. Sci. USA , 97(7):2968–2973, 2000.[4] W. Chu and X. Li. The Mori-Zwanzig formalism for the derivation of a ﬂuctuating heat conductionmodel from molecular dynamics. arXiv:1709.05928 , 2017.[5] R. Kupferman D. Givon and O. H. Hald. Existence proof for orthogonal dynamics and the Mori-Zwanzigformalism.

Isr. J. Math. , 145(1):221–241, 2005.[6] J. M. Dominy and D. Venturi. Duality and conditional expectations in the Nakajima-Mori-Zwanzigformulation.

J. Math. Phys , 58(8):082701, 2017.[7] J. P. Eckmann and M. Hairer. Non-equilibrium statistical mechanics of strongly anharmonic chains ofoscillators.

Commun. Math. Phys. , 212(1):105–164, 2000.[8] J. P. Eckmann and M. Hairer. Spectral properties of hypoelliptic operators.

Commun. Math. Phys. ,235(2):233–253, 2003.[9] J. P. Eckmann, C. A. Pillet, and L. Rey-Bellet. Non-equilibrium statistical mechanics of anharmonicchains coupled to two heat baths at diﬀerent temperatures.

Commun. Math. Phys. , 201(3):657–697,1999.[10] P. Espa˜nol. Hydrodynamics from dissipative particle dynamics.

Phys. Rev. E , 52(2):1734, 1995.1611] P. Espa˜nol and P. Warren. Statistical mechanics of dissipative particle dynamics.

EPL , 30(4):191, 1995.[12] S. K. J. Falkena, C. Quinn, J. Sieber, J. Frank, and H. A. Dijkstra. Derivation of delay equation climatemodels using the Mori- Zwanzig formalism.

Proc. R. Soc. A , 475, 2019.[13] D. Funaro.

Polynomial approximation of diﬀerential equations , volume 8. Springer Science & BusinessMedia, 2008.[14] F. Grogan, H. Lei, X. Li, and N. A. Baker. Data-driven molecular modeling with the generalizedLangevin equation.

J. Comput. Phys. , 418:109633–109641, 2020.[15] B. Helﬀer and F. Nier.

Hypoelliptic estimates and spectral theory for Fokker-Planck operators and WittenLaplacians . Springer, 2005.[16] F. H´erau and F. Nier. Isotropic hypoellipticity and trend to equilibrium for the Fokker-Planck equationwith a high-degree potential.

Arch. Ration. Mech. Anal , 171(2):151–218, 2004.[17] P. E. Kloeden and E. Platen.

Numerical solution of stochastic diﬀerential equations , volume 23. SpringerScience & Business Media, 2013.[18] H. Lei, N.A. Baker, and X. Li. Data-driven parameterization of the generalized Langevin equation.

Proc. Natl. Acad. Sci. , 113(50):14183–14188, 2016.[19] Z. Li, , X. Bian, X. Li, and G. E. Karniadakis. Incorporation of memory eﬀects in coarse-grainedmodeling via the Mori-Zwanzig formalism.

J. Chem. Phys , 143:243128, 2015.[20] Z. Li, H. S. Lee, E. Darve, and G. E. Karniadakis. Computing the non-Markovian coarse-grainedinteractions derived from the Mori-Zwanzig formalism in molecular systems: Application to polymermelts.

J. Chem. Phys , 146:014104, 2017.[21] K. K. Lin and F. Lu. Data-driven model reduction, Wiener projections, and the Mori-Zwanzig formalism. arXiv preprint arXiv:1908.07725 , 2019.[22] F. Lu, K. K. Lin, and A. J. Chorin. Data-based stochastic model reduction for the Kuramoto–Sivashinskyequation.

Physica D , 340:46–57, 2017.[23] H. Mori. Transport, collective motion, and Brownian motion.

Prog. Theor. Phys. , 33(3):423–455, 1965.[24] E. J. Parish and K. Duraisamy. Non-Markovian closure models for large eddy simulations using theMori-Zwanzig formalism.

Phys. Rev. Fluids , 2(1):014604, 2017.[25] G. A. Pavliotis.

Stochastic processes and applications: diﬀusion processes, the Fokker-Planck andLangevin equations , volume 60. Springer, 2014.[26] H. Risken.

The Fokker-Planck equation: methods of solution and applications . Springer-Verlag, secondedition, 1989. Mathematics in science and engineering, vol. 60.[27] P. Stinis. Stochastic optimal prediction for the Kuramoto–Sivashinsky equation.

Multiscale Modeling& Simulation , 2(4):580–612, 2004.[28] R. Tibshirani. Regression shrinkage and selection via the Lasso.

J. Royal Stat. Soc , 58(1):267–288,1996.[29] D. Venturi and G. E. Karniadakis. Convolutionless Nakajima-Zwanzig equations for stochastic analysisin nonlinear dynamical systems.

Proc. R. Soc. A , 470(2166):1–20, 2014.[30] D. Venturi, T. P. Sapsis, H. Cho, and G. E. Karniadakis. A computable evolution equation for thejoint response-excitation probability density function of stochastic dynamical systems.

Proc. R. Soc. A ,468(2139):759–783, 2012. 1731] Y. Yoshimoto, I. Kinefuchi, T. Mima, A. Fukushima, T. Tokumasu, and S. Takagi. Bottom-up construc-tion of interaction models of non-Markovian dissipative particle dynamics.

Phys. Rev. E , 88(4):043305,2013.[32] Y. Zhu, J. M. Dominy, and D. Venturi. On the estimation of the Mori-Zwanzig memory integral.

J.Math. Phys , 59(10):103501, 2018.[33] Y. Zhu and D. Venturi. Faber approximation of the Mori-Zwanzig equation.

J. Comp. Phys. , (372):694–718, 2018.[34] Y. Zhu and D. Venturi. Generalized langevin equations for systems with local interactions.

J. Stat.Phys , pages 1–31, 2020.[35] Y. Zhu and D. Venturi. Hypoellipticity and the Mori-Zwanzig formulation of stochastic diﬀerentialequations. arXiv preprint arXiv:2001.04565 , 2020.[36] R. Zwanzig. Memory eﬀects in irreversible thermodynamics.

Phys. Rev , 124(4):983, 1961.[37] R. Zwanzig. Nonlinear generalized Langevin equations.