Path large deviations for stochastic evolutions driven by the square of a Gaussian process

Abstract

Many dynamics are random processes with increments given by a quadratic form of a fast Gaussian process. We find that the rate function which describes path large deviations can be computed from the large interval asymptotic of a certain Fredholm determinant. The latter can be evaluated explicitly using Widom's theorem which generalizes the celebrated Szego-Kac formula to the multi-dimensional case. This provides a large class of dynamics with explicit path large deviation functionals. Inspired by problems in hydrodynamics and atmosphere dynamics, we present the simplest example of the emergence of metastability for such a process.


Freddy Bouchet, Roger Tribe and Oleg Zaboronski

February 19, 2021


Large deviation theory has recently become a key theoretical tool for the statistical mechanics of non-equilibrium systems. Describing path large deviations for the dynamics of effective degrees of freedom leads to a precise understanding of typical and rare trajectories of physical, biological or economic processes. A paradigmatic example of the effective description of complex systems using large deviation theory is the macroscopic fluctuation theory of systems of interacting particles [1]. However, for genuine non-equilibrium processes, without local detailed balance, the class of systems for which the Hamiltonian for path large deviations can be described explicitly is extremely limited.

In this paper, we consider a class of systems for which the effective dynamics has increments given by a quadratic form of a fast Gaussian process. This strong hypothesis is relevant for many applications. Quadratic interactions are common in many physical examples: hydrodynamics, plasma described by the Vlasov equation, magnetohydrodynamics, self-gravitating systems, the KPZ equation, and the physics described by quadratic networks (for instance heat transfer among quadratic networks [2]), to cite just a few. In all these systems with quadratic nonlinearities, in some regime a separation of time scales exists and the effective degrees of freedom are coupled to fast-evolving Gaussian processes. This is the case, for example, for the kinetic theories of plasma [3, 4], self-gravitating systems [5], geostrophic turbulence [6], and wave turbulence [7] for some specific dispersion relations, among many other examples. From a theoretical and mathematical perspective, the hypothesis that effective degrees of freedom are driven by a quadratic form of a fast Gaussian process is a decisive simplification. With this assumption, we will be able to write explicit formulas for the path large deviation Hamiltonian, and proceed to its analysis in many interesting examples.

The study of a slow process coupled to a fast one is a classical paradigm of physics and mathematics. For such fast/slow dynamics, one can study the averaging of the effect of the fast variable on the slow one (law of large numbers), or the typical fluctuations (stochastic averaging [8]), or the rare fluctuations described by large deviation theory [9]. Large deviation theory has been developed for slow/fast Markov processes [10, 11] and for deterministic systems [12, 13]. We do not, however, know many examples of large deviation studies in which the fast process is not a Markov process. We stress that the fast Gaussian processes we use in this paper are not necessarily Markov.

The current paper can be considered as a continuation of the study carried out in [14] of the large deviation principles for systems with two significantly different time scales, when the drift for a slow process is given by a second degree polynomial of the fast process. However, in this new paper we consider a larger class of systems, for which the fast dynamics is not necessarily a Markov process. The main new contribution of the current work is to apply the asymptotic theory of Fredholm determinants to the calculation of the large deviation rate function. The main result is an explicit formula for the Hamiltonian which characterizes path large deviations for the slow process, and which is valid for both Markov and non-Markov fast processes. Using those explicit formulas we can study a simple example of bistability motivated by hydrodynamic applications [15, 16].

We start with the definition of the model in Section 2 and give a heuristic derivation of the corresponding large deviation principle in Section 3. The highlight of Section 3 is the application of Widom's theorem for the asymptotics of Fredholm determinants to the calculation of the rate function.
In Section 4 we show the emergence of metastability for a particular representative of our class of models and study the corresponding 'instanton' trajectories. Brief conclusions are presented in Section 5. Appendices A and B contain some technical derivations for Section 3. Appendix C contains a review of Widom's theorem.

Consider the following stochastic model:

$$
\begin{cases}
\dot X(t) = Y^T\!\left(\tfrac{t}{\epsilon}, X(t)\right) M\, Y\!\left(\tfrac{t}{\epsilon}, X(t)\right) - \nu X(t), \\
X(0) = x_0,
\end{cases}
\tag{1}
$$

where $\{X(t)\}_{t \ge 0}$ is an $\mathbb{R}^n$-valued random process; $\epsilon$ is a parameter which determines the time scale separation between the processes $X$ and $Y$, $0 < \epsilon \ll 1$; for a fixed $x \in \mathbb{R}^n$, $Y(t, x)$ is an $N$-dimensional stationary centred Gaussian process with covariance $C(\tau, x)$,

$$
\mathbb{E}\left[Y_i(t, x)\, Y_j(t + \tau, y)\right] = C_{ij}(\tau, x, y), \qquad \tau \ge 0,\ 1 \le i, j \le N,
\tag{2}
$$

which is assumed to be continuous in all the arguments $\tau, x, y$, so that some of the heuristic arguments below can be made rigorous. As we will see, only $C(\tau, x, x)$ enters the final expression for the large deviation rate function, which justifies our shorthand notation $C(\tau, x) := C(\tau, x, x)$. Finally, $M$ is an $n \times N \times N$ matrix, symmetric with respect to the permutation of the last two indices, and $\nu > 0$. The $(X, Y)$ process need not be Markov.

We assume that $C(\tau, x)$ decays sufficiently fast (e.g. exponentially) with $\tau$, perhaps uniformly with respect to $x$. Then, in the limit $\epsilon \to 0$, the slow random process $X(t)$ stays near the solution to the deterministic equation

$$
\begin{cases}
\dot x(t) = \mathrm{Tr}\left[M\, C(0, x(t))\right] - \nu x(t), \\
x(0) = x_0,
\end{cases}
\tag{3}
$$

where $\mathrm{Tr}$ is the trace over the $N$ 'fast' indices. The typical fluctuations of $X(t)$ around $x(t)$ are Gaussian, with covariance of order $\epsilon$ (more precisely, the distribution of $\lim_{\epsilon \to 0} \frac{X(t) - x(t)}{\sqrt{\epsilon}}$ is centred Gaussian). Here we are interested in the statistics of large deviations of $X(t)$ when $X(t) - x(t) = O(1)$, which are no longer Gaussian.

The following computation is not a proof, just a heuristic argument devised to give an intuitive feel for the final form of the large deviation principle. We feel that its inclusion is necessary, due to the difference between our model and Markov models with time scale separation, for which the large deviation principle is known, see [14] for references: all we require of the fast process $Y$ is to be mixing.

Let us fix a time $t > 0$, choose a large integer $P \in \mathbb{N}$ and define $\Delta t = t/P$. Let $\lambda_1, \lambda_2, \ldots, \lambda_P$ be a sequence of $n$-dimensional vectors with non-negative components. Let $\mathbb{P}_{x_0}$ be the probability measure for the process $X$ started from $x_0$. Let $\mathbb{E}_{x_0}$ be the expectation over the joint distribution of the processes $X, Y$, where $X$ is started from $x_0$. Let $\mathbb{E}$ be the expectation with respect to the Gaussian measure corresponding to $Y$. Then

$$
\epsilon \log \mathbb{P}_{x_0}\left[X(k \Delta t) \in dx_k,\ k = 1, \ldots, P\right]
\le - \sum_{k=1}^{P} \lambda_k^T (x_k - x_{k-1})
+ \epsilon \log \mathbb{E}\left[\prod_{k=1}^{P} e^{\frac{\lambda_k^T}{\epsilon} F_k(Y, x_{k-1})}\right],
\tag{4}
$$

where

$$
F_k(Y, x) = \int_{(k-1)\Delta t}^{k \Delta t} d\tau\, Y^T(\tau/\epsilon, x)\, M\, Y(\tau/\epsilon, x) - \nu x\, \Delta t + O(\Delta t^2),
\tag{5}
$$

see Appendix A for the derivation. Proceeding as informally as above, let us evaluate the expectation in the right hand side of the above bound. Choose $\Delta t = \sqrt{\epsilon}$. Then $P = t/\sqrt{\epsilon} = O(\epsilon^{-1/2})$. Approximating the random process $Y$ by a sequence of bounded random variables with a finite dependency range $\delta > 0$, we find

$$
\begin{aligned}
&\log \mathbb{E} \exp\left[\sum_{k=1}^{P} \frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t}^{k \Delta t} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1}) + O(\Delta t^2)\right)\right] \\
&\quad = \sum_{k=1}^{P} \log \mathbb{E} \exp\left[\lambda_k^T \left(\int_{(k-1)\Delta t/\epsilon}^{k \Delta t/\epsilon} d\tau\, Y^T(\tau, x_{k-1})\, M\, Y(\tau, x_{k-1})\right)\right] + O(\epsilon^{-1/2}),
\end{aligned}
\tag{6}
$$

see Appendix B for the derivation. The remaining expectation is easy to compute using the fact that the process $Y$ is stationary Gaussian. Let us define $m := \lambda^T M$, an $N \times N$ symmetric matrix. Let us present it in the form $m = S^T S$, where $S$ is a possibly complex Cholesky factor of $m$. Let us also rewrite

$$
\exp\left[\lambda^T \left(\int_0^T d\tau\, Y^T(\tau, x)\, M\, Y(\tau, x)\right)\right]
= \int \prod_{\tau = 0}^{T} Dq(\tau)\, e^{-\frac{1}{4}\int_0^T d\tau\, q^T(\tau) q(\tau) + \int_0^T d\tau\, q^T(\tau) S Y(\tau, x)}
$$

(Hubbard-Stratonovich transformation). Then, for sufficiently small components of $\lambda$,

$$
\begin{aligned}
\mathbb{E} \exp\left[\lambda^T \left(\int_0^T d\tau\, Y^T(\tau, x)\, M\, Y(\tau, x)\right)\right]
&= \int \prod_{\tau = 0}^{T} Dq(\tau)\, e^{-\frac{1}{4}\int_0^T d\tau\, q^T(\tau) q(\tau)}\, \mathbb{E}\left(e^{\int_0^T d\tau\, q^T(\tau) S Y(\tau, x)}\right) \\
&= \int \prod_{\tau = 0}^{T} Dq(\tau)\, e^{-\frac{1}{4}\int_0^T d\tau\, q^T(\tau) q(\tau) + \frac{1}{2}\int_0^T d\tau_1 \int_0^T d\tau_2\, q^T(\tau_1) S\, C(\tau_1 - \tau_2, x)\, S^T q(\tau_2)} \\
&= \mathrm{Det}^{-1/2}\left(I - 2 S \hat{C}_T(x) S^T\right)
= \mathrm{Det}^{-1/2}\left(I - 2 m \hat{C}_T(x)\right).
\end{aligned}
\tag{7}
$$

Here $m \hat{C}_T(x)$ is an integral operator acting on (square integrable) $\mathbb{R}^N$-valued functions as follows:

$$
f_\alpha(t) \mapsto \left(m \hat{C}_T(x) f\right)_\alpha(t) = \sum_{\beta, \delta = 1}^{N} \int_0^T d\tau\, m_{\alpha\beta}\, C_{\beta\delta}(t - \tau, x)\, f_\delta(\tau), \qquad \alpha = 1, \ldots, N;\ t \in \mathbb{R}.
\tag{8}
$$

Remark.

In what follows we will use capital Det and Tr to denote the operator determinant and trace, and lowercase det and tr for the determinant and the trace of finite-dimensional matrices.

Thus it turns out that the limit $\epsilon \to 0$ is determined by the large-$T$ asymptotics of the Fredholm determinant entering (7), see the derivation leading to (11) below. Luckily, such an asymptotic can be computed using Widom's theorem, which generalises the celebrated Szegő-Kac formula to the multi-dimensional case: for sufficiently small $m$ (e.g. with respect to the matrix norm),

$$
\log \mathrm{Det}\left(I - 2 m \hat{C}_T(x)\right) = T \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 m \tilde{C}(k, x)\right) + O(T^0),
\tag{9}
$$

where

$$
\tilde{C}(k, x) = \int_{\mathbb{R}} d\tau\, e^{i k \tau}\, C(\tau, x)
\tag{10}
$$

is the Fourier transform of the autocorrelation function $C(\tau, x)$. This remarkable statement is reviewed in Appendix C.

(Footnote on the approximation of $Y$ by bounded random variables with a finite dependency range, used to obtain (6): it is the absence of error estimates associated with this approximation which makes our present discussion non-rigorous.)

Substituting (7), (9) into (4), (6) we find

$$
\begin{aligned}
\epsilon \log \mathbb{P}_{x_0}&\left[X(p \Delta t) \in dx_p,\ p = 1, \ldots, P\right] \\
&\overset{(4,6)}{\le} - \sum_{p=1}^{P} \Delta t\, \lambda_p^T \left(\frac{x_p - x_{p-1}}{\Delta t} + \nu x_{p-1}\right)
+ \sum_{p=1}^{P} \epsilon \log \mathbb{E} \exp\left[\int_0^{\Delta t/\epsilon} d\tau\, Y^T(\tau, x_{p-1})\, m\, Y(\tau, x_{p-1})\right] + O(\sqrt{\epsilon}) \\
&\overset{(7)}{=} - \sum_{p=1}^{P} \Delta t\, \lambda_p^T \left(\frac{x_p - x_{p-1}}{\Delta t} + \nu x_{p-1}\right)
- \sum_{p=1}^{P} \frac{\epsilon}{2} \log \mathrm{Det}\left(I - 2 m \hat{C}_{\Delta t/\epsilon}(x_{p-1})\right) + O(\sqrt{\epsilon}) \\
&\overset{(9)}{=} - \sum_{p=1}^{P} \Delta t\, \lambda_p^T \left(\frac{x_p - x_{p-1}}{\Delta t} + \nu x_{p-1}\right)
- \sum_{p=1}^{P} \frac{\epsilon}{2} \left[\frac{\Delta t}{\epsilon} \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 m \tilde{C}(k, x_{p-1})\right) + O(\epsilon^0)\right] + O(\sqrt{\epsilon}) \\
&= - \sum_{p=1}^{P} \Delta t\, \lambda_p^T \left(\frac{x_p - x_{p-1}}{\Delta t} + \nu x_{p-1}\right)
- \frac{1}{2} \sum_{p=1}^{P} \Delta t \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 \lambda_p^T M \tilde{C}(k, x_{p-1})\right) + O(\epsilon^{1/2}),
\end{aligned}
\tag{11}
$$

where $P = t/\sqrt{\epsilon}$, $\Delta t = \sqrt{\epsilon}$, and $m = \lambda_p^T M$ in the $p$-th summand. Finally, taking the limit $\epsilon \to 0$, we find

$$
\lim_{\epsilon \to 0} \epsilon \log \mathbb{P}_{x_0}\left[X(\tau) \in dx(\tau),\ 0 \le \tau \le t\right]
\le - \int_0^t d\tau \left(\lambda^T(\tau) \dot{x}(\tau) + \nu \lambda^T(\tau) x(\tau)\right)
- \frac{1}{2} \int_0^t d\tau \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 \lambda^T(\tau) M \tilde{C}(k, x(\tau))\right).
$$

Therefore, our informal calculations based on Widom's theorem lead to the following result: the process $X(t)$ satisfies the large deviation principle with rate $\epsilon$ and the rate function

$$
I[x] = \sup_{\lambda(\tau),\ 0 \le \tau \le t} \left[\int_0^t d\tau\, \lambda^T(\tau) \left(\dot{x}(\tau) + \nu x(\tau)\right) + \frac{1}{2} \int_0^t d\tau \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 \lambda^T(\tau) M \tilde{C}(k, x(\tau))\right)\right].
\tag{12}
$$

We would like to stress that from the mathematical point of view (12) is still a conjecture. Even less formally one can write

$$
\mathbb{P}_{x_0}\left[X(\tau) \in dx(\tau),\ 0 \le \tau \le t\right] \sim e^{-\frac{1}{\epsilon} I[x]}.
\tag{13}
$$

A typical application of the rate functional guessed above is to estimate the probability of transitioning between fixed points of the typical evolution (3). If $x_0, x_1$ are two such points, then

$$
\log \mathbb{P}_{x_0}\left[X(t) \in dx_1\right] \sim - \frac{1}{\epsilon} \left[\inf_{\substack{x(\tau),\ 0 \le \tau \le t: \\ x(0) = x_0,\ x(t) = x_1}} \left(\sup_{\lambda(\tau),\ 0 \le \tau \le t} S_{\mathrm{eff}}(\lambda, x)\right)\right],
\tag{14}
$$

where

$$
S_{\mathrm{eff}}(\lambda, x) = \int_0^t d\tau\, \lambda^T(\tau) \left(\dot{x}(\tau) + \nu x(\tau)\right) + \frac{1}{2} \int_0^t d\tau \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - 2 \lambda^T(\tau) M \tilde{C}(k, x(\tau))\right).
\tag{15}
$$

As a self-consistency check, let us verify that the average evolution equation (3) appears as an equation for a typical trajectory for the large deviation principle (14), (15). A typical trajectory $(\lambda_c, x_c)_{0 \le \tau \le t}$ is a solution to the Euler-Lagrange equations associated with $S_{\mathrm{eff}}$ such that $S_{\mathrm{eff}}(\lambda_c, x_c) = 0$. Examining the derivation of the large deviation principle, it is reasonable to expect that $\lambda_c = 0$. Expanding (15) around $\lambda = 0$, and using $\log \det(I - A) = -\mathrm{tr}\, A + O(A^2)$, we find

$$
S_{\mathrm{eff}} = \int_0^t d\tau\, \lambda^T(\tau) \left(\dot{x}(\tau) + \nu x(\tau) - \mathrm{Tr}\left[M C(0, x(\tau))\right]\right) + O(\lambda^2),
\tag{16}
$$

where we used that $\int_{\mathbb{R}} \frac{dk}{2\pi} \tilde{C}(k, x) = C(0, x)$. Therefore, $\lambda = 0$ solves the Euler-Lagrange equations if

$$
\dot{x}(\tau) + \nu x(\tau) - \mathrm{Tr}\left[M C(0, x(\tau))\right] = 0, \qquad x(0) = x_0,
$$

which coincides with (3). In particular, the fixed points of the slow dynamics are solutions to

$$
\nu x = \mathrm{Tr}\left[M C(0, x)\right].
\tag{17}
$$

Remarks.

1. If $N = 1$, and $Y$ solves an Ornstein-Uhlenbeck SDE with $X$-dependent drift, the corresponding large deviation principle was derived in [15] and is consistent with conjecture (12) for all values of $\lambda$. However, in general one has to check that the optimal $\lambda$ belongs to the domain of applicability of Widom's theorem, which is one of the challenges for the rigorous justification of the conjecture. A natural guess is that the minimizer must be small enough to ensure positive definiteness of the quadratic form in the functional integral (7).

2. If $Y$ appears as a solution to an Ornstein-Uhlenbeck system of stochastic differential equations, then (12) can be viewed as a closed form answer for the trace of the asymptotic solution to the time-dependent matrix Riccati equation derived in [14].

3. In the context of modelling two-dimensional turbulent flows, equation (1) can be interpreted as follows: $Y$ is a Gaussian model of a fast small-scale velocity field whose evolution depends on the static background created by $X$; $X$ is a large-scale velocity field slowly evolving under the influence of $Y$. Thus the model can be thought of as a non-linear generalisation of the passive vector advection model. The shape of $C$ reflects the nature of the small-scale turbulent flow (compressibility, isotropy, etc.).

The aim of this section is to present an example of the use of the large deviation principle (15). We are specifically interested in multistability phenomena observed in two-dimensional [15] and geostrophic [16] turbulent flows. In previous works, we have studied multistability for geostrophic dynamics [18], in cases when the turbulent flow is forced by white noises, and the stochastic process is an equilibrium one with detailed balance or generalized detailed balance. The large deviation principle (15) opens the possibility of studying multistability for turbulent flows modelled as a non-equilibrium process. The aim of this section is not to work out multistability for turbulent flows exactly, but rather to devise the simplest possible example, amenable to explicit analysis, with the same properties as the dynamics of turbulent flows.

To formulate the example, it will be easier to use complex notation. The fast variable $Y(\cdot, x) \in \mathbb{C}^N$ will be the analogue of a set of Fourier components that describe the turbulent fluctuations. It is characterised as the stationary solution of the complex Ornstein-Uhlenbeck process,

$$
\begin{cases}
dY(t, x) = -\Gamma(x)\, Y(t, x)\, dt + \sigma\, dW(t), \\
dX(t) = Y(t/\epsilon, X(t))^{*}\, M\, Y(t/\epsilon, X(t))\, dt - \nu X(t)\, dt,
\end{cases}
\tag{18}
$$

where $M$ is a self-adjoint $N \times N$ matrix; $dW$ is the $\mathbb{C}^N$-valued Brownian motion, with the non-trivial covariance

$$
d\bar{W}_i\, dW_j = \delta_{ij}\, dt;
\tag{19}
$$

$\Gamma(x)$ is a complex matrix, whose eigenvalues have positive real parts,

$$
\Gamma(x) = \Gamma^{(0)} + i\, x^T \Gamma^{(1)},
\tag{20}
$$

where $\Gamma^{(0)}$ is a real positive definite $N \times N$ matrix and $\Gamma^{(1)}$ is a real $n \times N \times N$ matrix. The former describes dissipation, whereas the latter corresponds to the 'rotational' advection of $Y$ by the slow field $X$. All the coefficients are polynomials of degree at most one in $x$. This structure of the system of SDEs (18) resembles that of the quasi-linear approximation to the Navier-Stokes equation, see [6] for details: the non-linearity in the right hand side is quadratic, the evolution of the slow variable is driven by the term quadratic in the fast variable, and the drift of the fast variable resembles advection by the slow field $X$.
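To make the structure of (18) concrete, here is a minimal Euler-Maruyama sketch for the scalar case $N = n = 1$. It is an illustration under stated assumptions, not the authors' numerical method: the function name `simulate_slow_variable`, all parameter values, the time step and the choice to integrate the coupled system in the fast time $s = t/\epsilon$ are hypothetical. For $N = 1$ the Lyapunov balance gives $C(0, x) = \sigma^2 / (2\gamma^{(0)})$, so the averaged fixed point (17) is $x^* = M \sigma^2 / (2 \gamma^{(0)} \nu)$, and the simulated slow variable should end up close to it when $\epsilon$ is small.

```python
import math
import random


def simulate_slow_variable(eps=0.01, t_final=5.0, ds=0.01,
                           gamma0=1.0, gamma1=1.0, sigma=1.0,
                           M=1.0, nu=1.0, seed=0):
    """Euler-Maruyama sketch of (18) for N = n = 1, run in fast time s = t/eps.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    y = 0j   # fast complex variable Y
    x = 0.0  # slow real variable X
    n_steps = int(round(t_final / (eps * ds)))
    for _ in range(n_steps):
        # Complex Brownian increment normalised so that E|dW|^2 = ds, cf. (19).
        dw = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * math.sqrt(ds / 2.0)
        gamma = gamma0 + 1j * gamma1 * x       # Gamma(x), cf. (20)
        drift_x = M * abs(y) ** 2 - nu * x     # Y^* M Y - nu X
        y = y - gamma * y * ds + sigma * dw    # fast Ornstein-Uhlenbeck step
        x = x + eps * drift_x * ds             # slow step: dt = eps * ds
    return x


# Averaged fixed point (17) for the default parameters: M sigma^2 / (2 gamma0 nu).
x_star = 1.0 * 1.0 ** 2 / (2.0 * 1.0 * 1.0)
x_end = simulate_slow_variable()
```

With these defaults the fluctuations of `x_end` around `x_star` are of relative size $O(\sqrt{\epsilon})$, in line with the Gaussian typical fluctuations discussed after (3). For $N = 1$ the rotation $\gamma^{(1)}$ only changes the phase of $Y$ and so does not affect the slow drift.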
Let us stress that the model does not have any artificial "built-in" non-linearity, but strictly respects the algebraic structure of the Navier-Stokes or quasi-geostrophic equations, see [6], although the values of the coefficients will be chosen arbitrarily.

Some standard computations lead to formulae for the correlation and auto-correlation functions, $C(0, x) := \mathbb{E}\left(Y(0, x) \otimes Y^{*}(0, x)\right)$, $C(\tau, x) := \mathbb{E}\left(Y(\tau, x) \otimes Y^{*}(0, x)\right)$. $C(0, x)$ solves the Lyapunov equation,

$$
\Gamma(x)\, C(0, x) + C(0, x)\, \Gamma^{*}(x) = \sigma \sigma^{*},
\tag{21}
$$

whereas

$$
C(\tau, x) = e^{-\Gamma(x) \tau}\, C(0, x), \qquad \tau \ge 0.
\tag{22}
$$

If $\tau < 0$, then $C(\tau, x) = C(0, x)\, e^{\Gamma^{*}(x) \tau}$. The effective Hamiltonian re-written in complex terms is

$$
H(\lambda, x) = - \lambda^T \nu x - \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - \lambda^T M \tilde{C}(k, x)\right),
\tag{23}
$$

where $M$ is an $N \times N$ self-adjoint matrix and

$$
\tilde{C}(k, x) := \int_{\mathbb{R}} d\tau\, e^{i k \tau}\, C(\tau, x) = \left(\Gamma(x) - i k\right)^{-1} C(0, x) + C(0, x) \left(\Gamma^{*}(x) + i k\right)^{-1}
\tag{24}
$$

is the Fourier transform of the auto-correlation function.

Keeping matters as simple as possible, let us choose $\Gamma^{(0)}$ and $\Gamma^{(1)}$ to be diagonal matrices with real entries $\{\gamma^{(0)}_p, \gamma^{(1)}_p\}_{1 \le p \le N}$, where the $\gamma^{(0)}$'s are all positive. The fixed point equation (17) takes the form

$$
\sum_{j, k = 1}^{N} \frac{(\sigma \sigma^{*})_{jk}\, (M_\alpha)_{kj}}{\gamma^{(0)}_j + \gamma^{(0)}_k + i \sum_{\beta = 1}^{n} \left[(\gamma^{(1)}_\beta)_j - (\gamma^{(1)}_\beta)_k\right] x_\beta} = \nu x_\alpha, \qquad 1 \le \alpha \le n.
\tag{25}
$$

Notice that if either the noise covariance matrix $\sigma \sigma^{*}$ or the interaction matrix $M_\alpha$ is diagonal, there is a unique solution for the $\alpha$-th component of the fixed point. Indeed, if $(M_\alpha)_{kj} = 0$ for all $k \ne j$, then the left hand side of equation (25) becomes $x$-independent and the equation becomes linear with respect to $x_\alpha$. The same remark applies if $\sigma \sigma^{*}$ is diagonal. Similarly, the fixed point is unique if $(\gamma^{(1)}_\beta)_j - (\gamma^{(1)}_\beta)_k = 0$ for all $j, k, \beta$. However, for general correlated noise, interaction and an inhomogeneous rotation matrix $\gamma^{(1)}$, there are multiple solutions to (25). Moreover, it is easy to choose the coefficients in such a way that there are multiple real solutions, see Fig. 1 for an $n = 1$, $N = 3$ example with two stable and one unstable fixed point.

Figure 1: The effective force for the model (26). Notice a pair of stable fixed points of the averaged dynamics separated by an unstable fixed point.

The chosen model parameters are $n = 1$, $N = 3$, with $\nu$, $\sigma$, $M$, $\gamma^{(0)}$ and $\gamma^{(1)}$ as specified in (26).

[Equation (26): numerical values of $\nu$, the complex $3 \times 3$ matrix $\sigma$, the matrix $M$ and the vectors $\gamma^{(0)}$, $\gamma^{(1)}$ (the latter with a prefactor $\pi$); the individual entries are not recoverable from the extracted text.]

(The appearance of powers of 2 and $\pi$ in the above parameterisation has no special meaning. The choice $M_{ii} = 0$ and $M_{ij} = \mathrm{const}$ for $i \ne j$ reflects some properties of the interaction matrix for the 2-dimensional Navier-Stokes equation, but it is also not essential for the appearance of multiple equilibria.)

The fact that multiple equilibria appear naturally in the model (18), together with its link to quasi-linear hydrodynamics explained above, makes us hope that the large deviation principle (12) might prove useful in studying realistic hydrodynamic phenomena of metastability, such as the zonal-dipole transition discovered in [15]. The Euler-Lagrange equations associated with the effective action functional (15) are Hamiltonian, with the Hamiltonian (23). Therefore, each solution lies on a constant energy surface $H(\lambda, x) = E$. If there is a single slow variable, the trajectories coincide with constant energy surfaces. This allows one to determine a family of the most likely transition paths between the fixed points (the instanton trajectories) by building the contour plot of $H$ numerically, see Fig. 2.

Figure 2: Contour lines of $H_{\mathrm{eff}}$ for the model (26). Contour lines in the upper half plane serve as optimal trajectories for transitions between the stable fixed points in a finite time. The red curve is the infinite-time optimal transition curve. Black arrows mark the typical trajectory connecting the unstable and stable fixed points.

Motivated by hydrodynamic applications, we have considered a model with two time scales, where the slow variable is driven by a quadratic function of a fast Gaussian process with rapidly decaying auto-correlations. The natural question of computing the probabilities of rare events in this model reduces to the computation of large-interval asymptotics for a certain Fredholm determinant. To the leading order, such a computation can be easily carried out using Widom's theorem. To apply the resulting large deviation principle, we considered a special case of the fast field being a complex Ornstein-Uhlenbeck process with the rotational component of the drift given by a linear function of the slow process. As it turns out, the average slow dynamics for such a model exhibits multiple equilibria, the transitions between which can be studied using large deviation theory.

There are many natural further questions to ask. Firstly, it should be a straightforward task to furnish a rigorous proof or provide a counter-example to the statement of the conjecture (12). Secondly, for the cases when the fast process conditional on the value of the slow process is an Ornstein-Uhlenbeck process, it might be interesting to consider finite-$\epsilon$ corrections to the leading order answer. Albeit known, the sub-leading terms in the Widom asymptotic are only characterised as solutions to a certain matrix Wiener-Hopf integral equation. There is however a chance of finding these corrections rather more explicitly as solutions to the time-dependent Riccati equations derived in [14]. Finally, the model considered has the general structure of many equations of hydrodynamics, plasma dynamics, self-gravitating systems, wave turbulence or other physical systems with quadratic couplings or interactions. It would therefore be extremely interesting to analyse metastability for such physical systems, in the presence of time scale separation, using the findings of the present paper.
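As a concrete companion to the multiple-equilibria mechanism of (25), the sketch below evaluates the effective force (left hand side of (25) minus $\nu x$) for diagonal $\Gamma^{(0)}$, $\Gamma^{(1)}$ and $n = 1$, and locates its zeros by a sign-change scan followed by bisection. The parameter set is a hypothetical $N = 2$ toy chosen for this illustration; it is not the $N = 3$ parameter set (26) of the paper. With the values below the fixed point equation reduces to the cubic $x^3 - 3x - 1 = 0$, whose three real roots are $2\cos(140^\circ)$, $2\cos(260^\circ)$ and $2\cos(20^\circ)$.

```python
import math


def effective_force(x, noise_cov, M, gamma0, gamma1, nu):
    """Left hand side of (25) minus nu * x, for n = 1 and diagonal Gamma."""
    N = len(gamma0)
    total = 0j
    for j in range(N):
        for k in range(N):
            denom = gamma0[j] + gamma0[k] + 1j * (gamma1[j] - gamma1[k]) * x
            total += noise_cov[j][k] * M[k][j] / denom
    # For Hermitian noise_cov and M the double sum is real.
    return total.real - nu * x


def find_fixed_points(force, lo, hi, steps=600):
    """Scan [lo, hi] for sign changes of `force` and refine each by bisection."""
    roots = []
    h = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * h, lo + (i + 1) * h
        fa, fb = force(a), force(b)
        if fa * fb < 0:
            for _ in range(60):
                mid = 0.5 * (a + b)
                fm = force(mid)
                if fa * fm <= 0:
                    b, fb = mid, fm
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots


# Hypothetical N = 2 toy parameters (assumptions, not the paper's values).
noise_cov = [[3.0, 0.5 + 2.0j], [0.5 - 2.0j, 3.0]]  # sigma sigma^*, Hermitian, positive
M = [[0.0, 1.0], [1.0, 0.0]]                        # zero diagonal, constant off-diagonal
gamma0 = [0.5, 0.5]                                 # dissipation rates
gamma1 = [1.0, 0.0]                                 # inhomogeneous rotation
nu = 1.0


def force(x):
    return effective_force(x, noise_cov, M, gamma0, gamma1, nu)


fixed_points = find_fixed_points(force, -3.0, 3.0)
# A fixed point is stable when the force decreases through zero.
stability = [force(r + 1e-4) < force(r - 1e-4) for r in fixed_points]
```

In this toy the outer two fixed points are stable and the middle one is unstable, which mirrors the qualitative picture described for Fig. 1. Setting the off-diagonal entries of `noise_cov` to zero, or making the two entries of `gamma1` equal, leaves a single fixed point, consistent with the uniqueness remarks following (25).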

A The derivation of (4)

$$
\begin{aligned}
\mathbb{P}_{x_0}\left[X(k \Delta t) \in dx_k,\ k = 1, 2, \ldots, P\right]
&= \mathbb{E}_{x_0}\left[\prod_{k=1}^{P} \mathbb{1}\left(X(k \Delta t) \in dx_k\right)\right] \\
&\le \mathbb{E}_{x_0}\left[e^{\frac{\lambda_P^T}{\epsilon}\left(X(P \Delta t) - x_P\right)} \prod_{k=1}^{P-1} \mathbb{1}\left(X(k \Delta t) \in dx_k\right)\right] \\
&= e^{-\lambda_P^T \frac{x_P - x_{P-1}}{\epsilon}}\, \mathbb{E}_{x_0}\left[e^{\frac{\lambda_P^T}{\epsilon}\left(X(P \Delta t) - x_{P-1}\right)} \prod_{k=1}^{P-1} \mathbb{1}\left(X(k \Delta t) \in dx_k\right)\right] \\
&= e^{-\lambda_P^T \frac{x_P - x_{P-1}}{\epsilon}}\, \mathbb{E}_{x_0}\left[e^{\frac{\lambda_P^T}{\epsilon} F_P(Y, x_{P-1})} \prod_{k=1}^{P-1} \mathbb{1}\left(X(k \Delta t) \in dx_k\right)\right].
\end{aligned}
$$

The first inequality is due to Chebyshev, the last equality follows from solving (1) over a short time interval. Repeating the above steps $(P - 1)$ times, we find that

$$
\mathbb{P}_{x_0}\left[X(k \Delta t) \in dx_k,\ k = 1, \ldots, P\right]
\le \exp\left[-\frac{1}{\epsilon} \sum_{k=1}^{P} \lambda_k^T (x_k - x_{k-1})\right] \mathbb{E}\left[\prod_{k=1}^{P} e^{\frac{\lambda_k^T}{\epsilon} F_k(Y, x_{k-1})}\right],
$$

which is equivalent to (4).

B The derivation of (6)

Recall that for this derivation the $Y(\tau, x_k)$'s are approximated by a sequence of bounded random variables with a finite dependency length $\delta$.

$$
\begin{aligned}
&\log \mathbb{E} \exp\left[\sum_{k=1}^{P} \frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t}^{k \Delta t} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1}) + O(\Delta t^2)\right)\right] \\
&= \log \mathbb{E} \exp\left[\sum_{k=1}^{P} \frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t + \delta \epsilon}^{k \Delta t - \delta \epsilon} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1}) + O(\epsilon)\right)\right] \\
&= \log \left(\mathbb{E} \exp\left[\sum_{k=1}^{P} \frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t + \delta \epsilon}^{k \Delta t - \delta \epsilon} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1})\right)\right] \exp\left(O(\epsilon^{-1/2})\right)\right) \\
&= \sum_{k=1}^{P} \log \mathbb{E} \exp\left[\frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t + \delta \epsilon}^{k \Delta t - \delta \epsilon} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1})\right)\right] + O(\epsilon^{-1/2}) \\
&= \sum_{k=1}^{P} \log \mathbb{E} \exp\left[\frac{\lambda_k^T}{\epsilon} \left(\int_{(k-1)\Delta t}^{k \Delta t} d\tau\, Y^T(\tau/\epsilon, x_{k-1})\, M\, Y(\tau/\epsilon, x_{k-1})\right)\right] + O(\epsilon^{-1/2}) \\
&= \sum_{k=1}^{P} \log \mathbb{E} \exp\left[\lambda_k^T \left(\int_{(k-1)\Delta t/\epsilon}^{k \Delta t/\epsilon} d\tau\, Y^T(\tau, x_{k-1})\, M\, Y(\tau, x_{k-1})\right)\right] + O(\epsilon^{-1/2}),
\end{aligned}
$$

which is (6).

C Widom's theorem

We will follow the original paper by Kac [19] and state the simplest set of conditionsleading to the formula (9).
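Before the general statement, the leading-order coefficient of the type appearing in (9) can be checked numerically in a scalar special case. The sketch below assumes the hypothetical kernel $K(t) = \frac{1}{2} e^{-|t|}$ (chosen for this illustration only), whose Fourier transform is $\tilde{K}(k) = 1/(1 + k^2)$. For $\lambda < 1$ the integral $\int_{\mathbb{R}} \frac{dk}{2\pi} \log\left(1 - \lambda \tilde{K}(k)\right)$ then has the closed form $\sqrt{1 - \lambda} - 1$, which follows from $\int_{\mathbb{R}} \log\frac{k^2 + a^2}{k^2 + b^2}\, dk = 2\pi (a - b)$. The code compares a midpoint-rule quadrature, after the substitution $k = \tan\theta$, with this closed form.

```python
import math


def widom_leading_coefficient(lam, n_points=2000):
    """Midpoint-rule estimate of (1 / 2 pi) * int log(1 - lam * Kt(k)) dk
    for Kt(k) = 1 / (1 + k^2), using the substitution k = tan(theta)."""
    total = 0.0
    for i in range(n_points):
        theta = -math.pi / 2 + (i + 0.5) * math.pi / n_points
        c2 = math.cos(theta) ** 2
        # dk = d(theta) / cos^2(theta) and Kt(tan(theta)) = cos^2(theta).
        total += math.log1p(-lam * c2) / c2
    return total * (math.pi / n_points) / (2.0 * math.pi)


def closed_form(lam):
    """Closed form sqrt(1 - lam) - 1 of the same integral, valid for lam < 1."""
    return math.sqrt(1.0 - lam) - 1.0


# Compare quadrature and closed form for a few illustrative values of lam.
comparison = [(lam, widom_leading_coefficient(lam), closed_form(lam))
              for lam in (0.5, -0.7, 0.9)]
```

The closed form ceases to be real at $\lambda = 1$, the point where $1 - \lambda \tilde{K}(0)$ vanishes; this gives a simple picture of why the statement is restricted to sufficiently small $|\lambda|$.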

Let $K : \mathbb{R} \to \mathbb{R}^{N \times N}$ be an $N \times N$ matrix-valued function of one variable. Assume that $K$ is even ($K(t) = K(-t)$ for any $t \in \mathbb{R}$) and non-negative ($K_{ij}(t) \ge 0$ for any $t \in \mathbb{R}$ and $1 \le i, j \le N$). Assume in addition that

$$
\int_{\mathbb{R}} |t|\, K(t)\, dt < \infty,
\tag{27}
$$

$$
\int_{\mathbb{R}} \sum_{k=1}^{N} K_{ki}(t)\, dt \le 1, \qquad 1 \le i \le N.
\tag{28}
$$

The function $K$ can be regarded as a kernel of an integral operator $\hat{K}$ acting on square-integrable functions from $\mathbb{R}$ to $\mathbb{R}^N$,

$$
f \mapsto \hat{K} f(t) = \int_{\mathbb{R}} d\tau\, K(t - \tau)\, f(\tau), \qquad t \in \mathbb{R}.
\tag{29}
$$

Then there is $\lambda_{\max} > 0$ such that for any $\lambda$ with $|\lambda| < \lambda_{\max}$ the Fredholm determinant $\mathrm{Det}(I - \lambda \hat{K}_T)$ exists and

$$
\log \mathrm{Det}(I - \lambda \hat{K}_T) = T \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - \lambda \tilde{K}(k)\right) + O(T^0),
\tag{30}
$$

where $\hat{K}_T$ is the restriction of $\hat{K}$ to functions on $[0, T]$ and

$$
\tilde{K}(k) = \int_{\mathbb{R}} dx\, e^{-i k x}\, K(x), \qquad k \in \mathbb{R}.
\tag{31}
$$

Remarks.

1. In [17] Widom presents a stronger version of the above statement which characterises the $O(T^0)$ term fully. For the current paper we only need the leading term.

2. The actual statement of Widom's theorem does not require the positivity of the kernel. In fact, all steps of the proof presented below go through for signed kernels as well, but the probabilistic intuition guiding these steps is lost. See also [19] for similar remarks about the original proof of Szegő's theorem by Marc Kac.

Let us sketch the proof of the theorem using, as we already mentioned, the probabilistic method used in [19] to prove a continuous version of Szegő's formula for the asymptotics of Toeplitz determinants. For a sufficiently small $|\lambda|$ we can calculate the Fredholm determinant using the trace-log formula,

$$
\log \mathrm{Det}(I - \lambda \hat{K}_T) = - \sum_{n=1}^{\infty} \frac{1}{n} \lambda^n\, \mathrm{Tr}\, \hat{K}_T^n,
\tag{32}
$$

where

$$
\mathrm{Tr}\, \hat{K}_T^n = \int_{[0, T]^n} dx_1\, dx_2 \ldots dx_n\, \mathrm{tr}\, K(x_1 - x_2)\, K(x_2 - x_3) \ldots K(x_n - x_1).
$$

Using the cyclic property of the trace and the fact that the function $K$ is even, we find

$$
\frac{d}{dT} \mathrm{Tr}\, \hat{K}_T^n = n \int_{[0, T]^{n-1}} dx_1\, dx_2 \ldots dx_{n-1}\, \mathrm{tr}\, K(0 - x_1)\, K(x_1 - x_2) \ldots K(x_{n-2} - x_{n-1})\, K(x_{n-1} - 0).
\tag{33}
$$

Consider the following discrete time Markov chain $\{X_n, S_n\}_{n \ge 0}$ on the state space $\mathbb{R} \times \{1, 2, \ldots, N\}$:

1. $(X_0, S_0) \sim (\delta_0, U_N)$, where $U_N$ is the uniform distribution on $\{1, 2, \ldots, N\}$.

2. At each time step, the transition $(x, i) \to (y, k)$ happens with probability $K_{ki}(y - x)\, dy$.

Notice that this is a Markov chain with killing; the survival probability when transitioning from state $(x, i)$ is $g_i(x) := \sum_{k=1}^{N} \int_{\mathbb{R}} K_{ki}(y - x)\, dy \le 1$. Examining the expression (33) for the derivative of the trace of the $n$-th power of $\hat{K}_T$, we see that it can be interpreted as the following expectation with respect to the law of the chain $\{X_n, S_n\}_{n \ge 0}$:

$$
\frac{d}{dT} \mathrm{Tr}\, \hat{K}_T^n = N n\, \mathbb{E}\left(\mathbb{1}(X_n \in d0)\, \mathbb{1}(S_n = S_0)\, \mathbb{1}(\tau_T = n)\right),
\tag{34}
$$

where $\tau_T$ is the first exit time of the chain from $(0, T) \times \{1, 2, \ldots, N\}$. To derive the above expression we exploited the identity $\mathbb{1}(X_n \in d0)\, \mathbb{1}(\tau_T \ge n) = \mathbb{1}(X_n \in d0)\, \mathbb{1}(\tau_T = n)$. Substituting (34) into (32), we find that

$$
\begin{aligned}
\frac{d}{dT} \log \mathrm{Det}(I - \lambda \hat{K}_T)
&= - N\, \mathbb{E}\left(\lambda^{\tau_T}\, \mathbb{1}(X_{\tau_T} \in d0)\, \mathbb{1}(S_{\tau_T} = S_0)\right) \\
&= - N\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, \mathbb{1}(M_\tau < T)\, \mathbb{1}(S_\tau = S_0)\right),
\end{aligned}
$$

where $\tau$ is the first exit time from $(0, \infty) \times \{1, 2, \ldots, N\}$ and $M_\tau = \max_{0 \le n < \tau}(X_n)$. As $\log \mathrm{Det}(I - \lambda \hat{K}_0) = 0$, we can integrate the last expression to find

$$
\log \mathrm{Det}(I - \lambda \hat{K}_T) = - N\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, (T - M_\tau)_{+}\, \mathbb{1}(S_\tau = S_0)\right),
$$

where $(x)_{+} := \max(x, 0)$. Using $T - (T - M)_{+} = \min(T, M)$, we can re-arrange the above expression as follows:

$$
\log \mathrm{Det}(I - \lambda \hat{K}_T) = - N T\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, \mathbb{1}(S_\tau = S_0)\right) + N\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, \min(T, M_\tau)\, \mathbb{1}(S_\tau = S_0)\right).
$$

This is an exact expression for the Fredholm determinant as an expectation with respect to the law of the Markov chain we defined. In many cases it allows for an efficient computation of the large-$T$ expansion of the Fredholm determinant using purely probabilistic methods. For us it is sufficient to check that $\lim_{T \to \infty} \min(T, M_\tau) = M_\tau$, which implies that

$$
\log \mathrm{Det}(I - \lambda \hat{K}_T) = - N T\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, \mathbb{1}(S_\tau = S_0)\right) + O(T^0).
\tag{35}
$$

To calculate the expectation entering the leading term we use the following combinatorial lemma (see e.g. [20], volume 2): let $(0, R_1, R_1 + R_2, \ldots, R_1 + R_2 + \ldots + R_{n-1}, 0)$ be the first $n$ $\mathbb{R}$-projections of the states of the chain with $\tau = n$. Then

$$
\sum_{p=0}^{n-1} \prod_{k=1}^{n-1} \mathbb{1}\left(R_{1+p} + R_{2+p} + \ldots + R_{k+p} > 0\right) = 1 \quad \text{a.s.}
\tag{36}
$$

The addition of subscripts in the above formula should be understood modulo $n$. The above statement is very general and relies only on the absence of atoms in the transition probabilities $K(y - x)\, dy$. In this case, for any sequence $(0, R_1, R_1 + R_2, \ldots, R_1 + R_2 + \ldots + R_{n-1}, 0)$, exactly one of its cyclic shifts $(0, R_{1+p}, R_{1+p} + R_{2+p}, \ldots, 0)$, $0 \le p \le n - 1$, stays strictly positive at the intermediate steps $1, \ldots, n - 1$. Then

$$
\begin{aligned}
N\, \mathbb{E}\left(\lambda^{\tau}\, \mathbb{1}(X_\tau \in d0)\, \mathbb{1}(S_\tau = S_0)\right)
&= N \sum_{n=1}^{\infty} \lambda^n\, \mathbb{E}\left(\mathbb{1}(X_\tau \in d0)\, \mathbb{1}(S_\tau = S_0)\, \mathbb{1}(\tau = n)\right) \\
&= \sum_{n=1}^{\infty} \lambda^n \int_{\mathbb{R}^n} dr_1 \ldots dr_n\, \mathrm{tr}\left(K(r_1) \ldots K(r_n)\right) \delta(r_1 + \ldots + r_n) \prod_{k=1}^{n-1} \mathbb{1}(r_1 + \ldots + r_k > 0) \\
&= \sum_{n=1}^{\infty} \frac{\lambda^n}{n} \int_{\mathbb{R}^n} dr_1 \ldots dr_n\, \mathrm{tr}\left(K(r_1) \ldots K(r_n)\right) \delta(r_1 + \ldots + r_n) \sum_{p=0}^{n-1} \prod_{k=1}^{n-1} \mathbb{1}(r_{1+p} + \ldots + r_{k+p} > 0) \\
&= \sum_{n=1}^{\infty} \frac{\lambda^n}{n} \int_{\mathbb{R}^n} dr_1 \ldots dr_n\, \mathrm{tr}\left(K(r_1) \ldots K(r_n)\right) \delta(r_1 + \ldots + r_n) \\
&= \sum_{n=1}^{\infty} \frac{\lambda^n}{n} \int_{\mathbb{R}} \frac{dk}{2\pi} \int_{\mathbb{R}^n} dr_1 \ldots dr_n\, e^{-i k (r_1 + \ldots + r_n)}\, \mathrm{tr}\left(K(r_1) \ldots K(r_n)\right) \\
&= \sum_{n=1}^{\infty} \frac{\lambda^n}{n} \int_{\mathbb{R}} \frac{dk}{2\pi}\, \mathrm{tr}\left(\tilde{K}(k)^n\right)
= - \int_{\mathbb{R}} \frac{dk}{2\pi} \log \det\left(I - \lambda \tilde{K}(k)\right).
\end{aligned}
\tag{37}
$$

The third equality is the symmetrisation of the integrand with respect to all cyclic permutations; the fourth equality is due to the combinatorial lemma (36). Substituting (37) into (35), we arrive at the statement (30) of Widom's theorem.

References

[1] Bertini L, De Sole A, Gabrielli D, Jona-Lasinio G and Landim C 2015

Reviews of Modern Physics

[2] Physical Review E

[3] Physical kinetics (Course of theoretical physics, Oxford: Pergamon Press, 1981)

[4] Nicholson D 1983 Introduction to plasma theory (Wiley, New York)

[5] Binney J and Tremaine S 1987 Galactic dynamics (Princeton, NJ: Princeton University Press, 1987, 747 p.)

[6] Bouchet F, Nardini C and Tangarife T 2013 J. Stat. Phys.

[7] Wave turbulence vol 825 (Springer Science & Business Media)

[8] Pavliotis G and Stuart A 2008 Multiscale methods: averaging and homogenization (Springer Science & Business Media)

[9] Freidlin M I, Szücs J and Wentzell A D 2012 Random perturbations of dynamical systems vol 260 (Springer Science & Business Media)

[10] Freidlin M I 1978 Russian Mathematical Surveys

Stochastic Processes and their Applications

Inventiones Mathematicae

Ergodic Theory and Dynamical Systems

Journal of Statistical Physics

Physical Review Letters

Journal of Functional Analysis

Journal of Statistical Physics

et al.