Stochastic averaging of dynamical systems with multiple time scales forced with α-stable noise
arXiv [math.PR] preprint

William F. Thompson*, Rachel A. Kuske†, Adam H. Monahan‡

Abstract
Stochastic averaging allows for the reduction of the dimension and complexity of stochastic dynamical systems with multiple time scales, replacing fast variables with statistically equivalent stochastic processes in order to analyze variables evolving on the slow time scale. These procedures have been studied extensively for systems driven by Gaussian noise, but very little attention has been given to the case of α-stable noise forcing, which arises naturally from heavy-tailed stochastic perturbations. In this paper, we study nonlinear fast-slow stochastic dynamical systems in which the fast variables are driven by additive α-stable noise perturbations and the slow variables depend linearly on the fast variables. Using a combination of perturbation methods and Fourier analysis, we derive stochastic averaging approximations for the statistical contributions of the fast variables to the slow variables. In the case that the diffusion term of the reduced model depends on the state of the slow variable, we show that this term is interpreted in terms of the Marcus calculus. For the case α = 2, which corresponds to Gaussian noise, the results are consistent with previous results for stochastic averaging in the Gaussian case. Although the main results are derived analytically for 1 < α < 2, we provide evidence of their validity for α < 1.
1 Introduction

Stochastic averaging (also referred to as "stochastic reduction" or, in some cases, "stochastic homogenization") refers to techniques whereby a stochastic dynamical system with multiple time scales is approximated by a reduced dynamical system.

*Dept. of Mathematics, U. of British Columbia, Vancouver, BC, Canada. Corresponding author: [[email protected]]
†Dept. of Mathematics, U. of British Columbia, Vancouver, BC, Canada.
‡School of Earth and Ocean Science, U. of Victoria, Victoria, BC, Canada.

Heavy-tailed fluctuations of this kind are described by an α-stable law, for which the autocovariance is not defined [10] (except for the case α = 2). A stochastic averaging procedure for systems with multiple time scales and α-stable noise forcing is relevant for modelling problems with heavy-tailed statistics, as a shifted and normalized series of heavy-tailed random variables converges in distribution to an α-stable random variable [45]. Hence, α-stable laws form a family of attracting distributions for sums of independent, identically distributed (i.i.d.) heavy-tailed random variables, analogous to the Gaussian distribution as the limiting distribution for i.i.d. sums of random variables with finite variance. An integrated α-stable noise process is characterized by its "superdiffusive" behaviour, whereby the observed variance of the trajectories grows superlinearly in time; such processes are sometimes referred to as "Lévy flights" in the physics literature. The dynamics of atmospheric tracers have been shown to be superdiffusive in several contexts [17, 41]. Similar dynamics have also been predicted to be generated by chaotic dispersion and shear flows [42]. Processes such as these would be considered relatively fast atmospheric processes.
In addition to these results, an analysis by Ditlevsen has suggested that α-stable forcing on longer time scales is present in time series of calcium concentration from Greenland ice cores [8, 15] recording high-latitude climate variability during the most recent glacial period.

An α-stable noise process is one for which the increments are α-stable random variables, whose distribution we denote by S_α(β, σ). The distribution depends on three parameters: the stability index α ∈ (0, 2], which determines the rate of decay of the tails of the distribution; the skewness parameter β ∈ [−1, 1], which describes the asymmetry in the distribution; and the scale parameter σ > 0. The density of an α-stable random variable cannot be expressed in closed form, except for a small number of special cases. Instead, an α-stable random variable, L ~ S_α(β, σ), is typically characterized through its characteristic function (CF), Φ_L(k),

    Φ_L(k) = exp(−σ^α |k|^α Ξ(k; α, β)),    (1.1)

where

    Ξ(s; α, β) = 1 + iβ sgn(s) tan(πα/2).    (1.2)

It is easy to see that α = 2 in (1.1) corresponds to the special case of a Gaussian random variable with variance 2σ² and mean 0. The parameter β is inconsequential in this case, since Ξ(k; 2, β) = 1 for all k, β. If α < 2, the second moment of the distribution does not exist. When α < 1, the first moment does not exist. For α-stable increments in continuous time, denoted dL_t^{(α,β)}, the scale parameter of the increment is nonlinearly related to that of the infinitesimal timestep dt. Formally, this relationship implies that σ^α ∝ dt, and so the noisy increments dL_t^{(α,β)} ~ S_α(β, (dt)^{1/α}). If β = 0, the noise is symmetric; otherwise, it is asymmetric.

The mathematical problem of stochastic averaging of heavy-tailed processes for linear systems has been investigated in [43] in the context of the Forward Kolmogorov equation (FKE). We use the more general FKE terminology throughout the paper, noting that if the stochastic forcing is Gaussian, the equation is usually called the Fokker-Planck equation. More recent work by Chechkin and Pavlyukevich [6] and Li et al. [25] gives Wong-Zakai type results for systems driven by fast compound Poisson jump processes. Results like those of Xu et al. [47] consider strong-sense (i.e., pathwise) stochastic averaging of α-stable forced systems, determining reduced SDE models that approximate the full systems in a pathwise sense. They consider a class of two-time-scale systems in which the fast dynamics corresponds to an explicit time-dependence (on fast time scales) of the slow variable's drift and diffusion. We consider a much broader class of systems, but only in the weaker sense of convergence in distribution.

We derive stochastic averaging approximations in the weak (distributional) sense for fast-slow stochastic dynamical systems with additive α-stable noise forcing, obtaining reduced dynamical approximations for the slow component of the full fast-slow system.
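Concretely, the increments dL_t^{(α,β)} ~ S_α(β, (dt)^{1/α}) described above can be sampled directly. The following sketch (ours, not the paper's numerical scheme; symmetric case β = 0 only) uses the Chambers-Mallows-Stuck method and checks the samples against the CF (1.1):

```python
import numpy as np

def alpha_stable_increments(alpha, dt, n, rng):
    """Symmetric (beta = 0) alpha-stable increments with scale dt**(1/alpha),
    generated by the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, n)   # uniform angle
    w = rng.exponential(1.0, n)                 # unit exponential
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return dt ** (1.0 / alpha) * x

rng = np.random.default_rng(0)
alpha, dt, n = 1.5, 1e-3, 200_000
dL = alpha_stable_increments(alpha, dt, n, rng)

# Check against the CF (1.1) with beta = 0: E[exp(ik L)] = exp(-|k|^alpha * dt)
k = 1.0
emp_cf = np.mean(np.exp(1j * k * dL))
print(abs(emp_cf - np.exp(-abs(k) ** alpha * dt)))
```

With α = 2 the same formula reduces to 2 sin(U)√W, which has variance 2, matching the variance 2σ² of the Gaussian special case noted above.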
As in [43], we analyze the FKE of the fast-slow systems, but in contrast we consider both linear and nonlinear systems under appropriate transformations to allow an analysis of the FKE in Fourier space. This approach allows us to consider a broader class of systems than has been investigated previously in the context of stochastic averaging of α-stable noise processes. These systems are linear in the fast variable, while allowing for some nonlinear interactions between slow and fast variables and fully nonlinear behaviour in the dynamics of the slow variables. The averaging approximations are conceptually similar to existing stochastic averaging methods used for systems with Gaussian white noise [2, 32], but the approach used for the α-stable case to determine the appropriate stochastic forcing for the reduced dynamics is significantly different from that used for the Gaussian case. The analysis here is based on the joint CF of the fast component and a fluctuation about the mean of the slow variable. For the (N+) approximations, Marcus' stochastic calculus [6, 30] for jump processes is required if the approximation has a state-dependent "diffusion" term. Analogous to Stratonovich calculus for Gaussian processes, the Marcus interpretation of stochastic integrals allows the rules of ordinary calculus, like the chain rule, to hold. In Fig. 1, we show an example of stochastic averaging of a linear system for α = 2 and for α < 2. For such systems, the (L) and (N+) approximations are the same, and they compare well with the full system.

In §2, we define the stochastic averaging problem and write the expression for the (A) approximation in the case where the stochastic forcing is α-stable noise, rather than Gaussian.
Figure 1: A comparison of the time series and simulated marginal distributions of the fast-slow system (3.1, 3.2) and the stochastically averaged system (3.4). Top-Left: A sample of the time series for the slow variable of both systems: x_t (solid blue) and x̄_t + ξ_t (dashed green). Top-Right: Normalized histogram-based estimates for the density for the slow variable. Blue line: full system. Green crosses: (L) approximation. Black squares: Numerically evaluated probability density function for ξ from the (L) approximation (3.4), using the MATLAB package STBL as described in §3. Parameters for this simulation are (a = 0., b = 0., c = 1, ε = 0., α = 2, x_0 = 0). The bottom panels are as for the top, but with α-stable noise where α = 1. and β = 0.

In §
3, we apply our approximations to one linear system and three distinct nonlinear systems, comparing the averaging approximations to the full systems. These numerical results show agreement between the approximations and the full systems not only in probability distribution, but also in the autocodifference, which characterizes the memory of the process. Our conclusions are outlined in §4.

2 Stochastic averaging and α-stable noise

We study the canonical problem of stochastic averaging of a fast-slow system with the stochastic forcing given by the increments of an α-stable Lévy process, L_t^{(α,β)},

    dx_t = f(x_t, y_t) dt,    (2.1)
    ε dy_t = g(x_t, y_t) dt + ε^γ b dL_t^{(α,β)},    (2.2)

where γ and b are constants and x_0, y_0 are known. The parameter ε is a small positive number (0 < ε ≪
1) that characterizes the well-separated time scales between the slow variable x_t and the fast variable y_t. The variable y_t is a stationary and ergodic stochastic process forced with an additive α-stable Lévy noise dL_t^{(α,β)} with increments distributed as S_α(β, (dt)^{1/α}). In this study, we restrict our analysis to the case α ∈ (1, 2] with E[dL_t^{(α,β)}] = 0. The system (2.1, 2.2) includes the well-studied case of Gaussian noise forcing, corresponding to α = 2. When α = 2, we replace dL_t^{(2,β)} with √2 dW_t, where W_t is a standard Wiener process and the factor of √2 follows from the α-stable CF (1.1). Since α-stable noise processes are best analyzed through their CFs, we consider forms of f, g that are linear in y_t,

    f(x, y) = f_0(x) + ε^{−γ} f_1(x) y,    g(x, y) = ε^γ g_0(x) + g_1(x) y,    (2.3)

to make analysis in Fourier space straightforward. Specifically, the linear form allows us to give results in terms of closed-form functions of the parameters. Some preliminary work for examples that include polynomials of y indicates the limited availability of explicit analytical expressions. Within the form (2.3), we also assume g_1(x) < 0 for all x to ensure contracting dynamics, so that the distribution of y|x reaches conditionally stationary behaviour on a fast time scale. For the (N+) approximation below, we require that f_1(x) has the same sign for all x and is never equal to zero, in order to define a particular monotonic transformation.

The goal of stochastic averaging is to derive an approximation in the weak sense to the SDE of x_t based on the distributional properties of y_t. We obtain three different stochastic averaging approximations, resulting in averaged deterministic, linear stochastic, and nonlinear stochastic models, as described below.

The (A) approximation is an approximation of x_t in the mean and is therefore deterministic. It is given by x̄_t, where

    dx̄_t = f̄(x̄_t) dt,    x̄_0 = x_0,    (2.4)

and

    f̄(x) = E_{(y|x)}[f(x, y)] = ∫_{−∞}^{∞} f(x, y) p(y|x) dy.    (2.5)

The notation E_{(y|x)}[f(x, y)] indicates the expected value of f(x, y) conditioned on a fixed x, and p(y|x) is the conditional density for y_t for a fixed x_t = x. For this approximation we have implicitly assumed that f̄ can be evaluated for x_t, y_t in (2.1, 2.2). Indeed, by our assumption that α ∈ (1, 2] and the form of f, g given in (2.3), f̄ is evaluated using (2.5) with E_{(y|x)}[y] = −ε^γ g_0(x)/g_1(x), and thus

    f̄(z) = f_0(z) − f_1(z) g_0(z)/g_1(z).    (2.6)

For the Gaussian noise case, the (A) approximation has been studied in detail [2, 11, 21]. In the next section, we show that the (A) approximation holds in the case of α-stable noise forcing if α > 1 and γ > 1 − 1/α in the strict ε → 0 limit. For 0 < ε ≪ 1, a more accurate approximation, called the (L) approximation, emerges that includes corrections to the (A) approximation.
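As an illustration of the conditional statistics underlying (2.5), the conditional mean E_{(y|x)}[y] = −ε^γ g_0(x)/g_1(x) can be checked by direct simulation of the fast equation with x frozen. The sketch below (ours, not the paper's numerical scheme) uses the Gaussian case α = 2 (so dL → √2 dW) and arbitrary parameter values:

```python
import numpy as np

# Estimate E[y|x] for eps*dy = (eps**gam * g0 + g1*y) dt + eps**gam * b * dL,
# with x frozen and Gaussian forcing (alpha = 2). Parameter values below are
# arbitrary choices for this sketch, not taken from the paper.
rng = np.random.default_rng(1)
eps, gam, g0, g1, b = 0.05, 0.5, 1.0, -1.0, 0.5
dt, nsteps = 1e-3, 200_000
noise = rng.standard_normal(nsteps)

y = 0.0
samples = np.empty(nsteps)
amp = eps ** (gam - 1.0) * b * np.sqrt(2.0) * np.sqrt(dt)  # Euler noise amplitude
for i in range(nsteps):
    y += (dt / eps) * (eps ** gam * g0 + g1 * y) + amp * noise[i]
    samples[i] = y

est = samples[nsteps // 2:].mean()    # discard the transient, then time-average
pred = -eps ** gam * g0 / g1          # E[y|x] = -eps^gam g0(x)/g1(x)
print(est, pred)
```

The time average converges because g_1 < 0 makes the frozen-x fast dynamics contracting, as assumed in (2.3).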
The (L) approximation is an Ornstein-Uhlenbeck-Lévy process (OULp) obtained by linearizing about the mean x̄_t as defined in (2.4). We first decompose the dynamics into a mean component and a fluctuation, as in [32]. The properties of α-stable noise require analysis of CFs to determine the appropriate stochastic approximation for the fluctuation. We define the fluctuation about the mean as ξ_t = x_t − x̄_t and derive its dynamics by substituting in (2.1) and using (2.4),

    dξ_t = [f(x_t, y_t) − f̄(x̄_t)] dt    (2.7)
         = [f̄(x_t) − f̄(x̄_t)] dt + f̂(x_t, y_t) dt,    (2.8)

where f̂(z, y) = f(z, y) − f̄(z). Substituting x_t = x̄_t + ξ_t and keeping leading-order contributions for |ξ_t| ≪ 1 yields a linear equation for ξ_t,

    dξ_t ≃ f̄′(x̄_t) ξ_t dt + f̂(x̄_t, y_t) dt.    (2.9)

To complete the (L) approximation, we find a stochastic approximation for f̂ dt that has its distributional properties on the t = O(1) time scale. The variable v_t captures the stochastic forcing in the ξ equation. Its dynamics are

    dv_t = f̂(x̄_t, y_t) dt,    v_0 = 0,    where    f̂(z, y) = f_1(z) (ε^{−γ} y + g_0(z)/g_1(z)),    (2.10)

and we have substituted (2.3) and (2.6) into (2.9) to obtain an explicit expression for f̂. Rewriting (2.9) using (2.10), we see the forcing effect of the increments dv_t on the dynamics of ξ,

    dξ_t = f̄′(x̄_t) ξ_t dt + dv_t,    ξ(0) = 0.    (2.11)

In the case α = 2, the autocovariance C(s; x) can be evaluated, and the fluctuation dv_t in (2.11) is replaced with a Gaussian white noise forcing σ(x) dW_t [2, 32], where

    σ²(x) = ∫_{−∞}^{∞} C(s; x) ds,    C(s; x) = lim_{t→∞} E_{(y|x)}[f̂(x, y_{t+s}) f̂(x, y_t)].    (2.12)

This result is justified for C(s; x) having a δ-function behaviour similar to that of Gaussian white noise in the limit ε → 0. However, the autocovariance C(s; x) of f̂(x, y) is undefined when α <
2. While moments of order greater than or equal to α do not exist for α-stable distributions, the CF is defined for any α < 2. Thus, we analyze the process v_t in terms of its CF in order to determine an appropriate stochastic forcing approximation for dv_t when α < 2. The joint density p(y, v, t) for the system (2.2), (2.10) satisfies the fractional FKE (also known as the Fokker-Planck equation if α = 2),

    ∂p/∂t = −∂/∂v (f̂(x, y) p) − (1/ε) ∂/∂y (g(x, y) p) + b^α ε^{−(1−γ)α} D_y^{(α,β)} p,    p(y, v, 0) = δ(v) δ(y − y_0),    (2.13)

where D_y^{(α,β)} is the Riesz-Feller derivative, a generalization of the Liouville-Riemann fractional derivative [5], acting with respect to y and associated with the α-stable noise process. This operator can be defined in terms of a Fourier transform, F[u] = ∫_R exp(ily) u(y) dy, and its inverse:

    D_y^{(α,β)} u = F^{−1}[ −(sec(πα/2)/2) ( (1 + β)(il)^α + (1 − β)(−il)^α ) F[u] ].    (2.14)

For 0 < ε ≪
1, we initially treat x as a constant relative to the fast variables v and y. Let ψ be the CF for the (y, v) system,

    ψ(l, m, t) = F[p] = ∫∫_{R²} exp(ily + imv) p(y, v, t) dy dv.    (2.15)

Taking the Fourier transform (FT) of (2.13), we note that the noise terms enter the dynamics through y and not through v, so that the operator consists of the usual first-order partial derivatives in y and v, and the fractional derivatives are in terms of y only. Then, using (2.14), we obtain the corresponding partial differential equation for the CF,

    ∂ψ/∂t = im F[f̂(x, y) p] + (il/ε) F[g(x, y) p] − b^α |l|^α ε^{−(1−γ)α} Ξ(l; α, β) ψ,    ψ(l, m, 0) = exp(ily_0).    (2.16)

Substituting g and f̂ as in (2.3) and (2.10) into (2.16) yields

    ∂ψ/∂t = (f_1 ε^{−γ} m + g_1 ε^{−1} l) ∂ψ/∂l + [ i g_0 ε^{γ−1} l + i (g_0 f_1/g_1) m − b^α |l|^α ε^{−(1−γ)α} Ξ(l; α, β) ] ψ,    ψ(l, m, 0) = exp(ily_0),    (2.17)

where we have dropped the argument x from f_0, f_1, g_0, g_1. We derive the solution for ψ using the method of characteristics. The characteristic curves for the variables t, l and m are given in terms of the characteristic variables r, l_0, and m_0,

    t(r) = r,    l(r) = l_0 exp(−g_1 r/ε) − ε^{1−γ} (f_1/g_1) m_0 (1 − exp(−g_1 r/ε)),    m(r) = m_0,    (2.18)

and ψ satisfies

    dψ/dr = ( i (f_1 g_0/g_1) m_0 + i ε^{γ−1} g_0 l(r) − b^α ε^{−(1−γ)α} |l(r)|^α Ξ(l(r); α, β) ) ψ,    ψ(0) = exp(il_0 y_0),    (2.19)

where Ξ is given in (1.2). The solution of (2.19) is given by

    ψ(l, m, t) = exp[ iΛ(t) y_0 − i (ε^γ (g_0/g_1) l + ε (g_0 f_1/g_1²) m)(1 − exp(g_1 t/ε)) − b^α ε^{−(1−γ)α} ∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr ],    (2.20)

where

    Λ(s) = l exp(g_1 s/ε) − ε^{1−γ} m (f_1/g_1)(1 − exp(g_1 s/ε)).    (2.21)

We evaluate the integral term of (2.20) asymptotically for small ε, as shown in Appendix B, giving the result

    ∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr    (2.22)
      = ε |l|^α/(−αg_1) Ξ(l; α, β)(1 − exp(αg_1 t/ε)) + O(ε^{1−γ})    for t = O(ε),
      = ε |f_1 g_1^{−1}|^α |m|^α Ξ(m; α, β*) t + ε |l|^α/(−αg_1) Ξ(l; α, β) + O(ε^{1−γ})    for t = O(1),

where

    β* = β sgn(f_1).    (2.23)

Substituting (2.22) into (2.20) gives the following approximation for ψ,

    ψ ∼ exp[ iΛ(t) y_0 − i (ε^γ (g_0/g_1) l + ε (g_0 f_1/g_1²) m)(1 − exp(g_1 t/ε)) − b^α |l|^α ε^{1−(1−γ)α}/(−αg_1) Ξ(l; α, β)(1 − exp(αg_1 t/ε)) ]    for t = O(ε),
    ψ ∼ exp[ −il ε^γ g_0/g_1 − im (ε g_0 f_1/g_1² + ε^{1−γ} (f_1/g_1) y_0) − b^α |l|^α ε^{1−(1−γ)α}/(−αg_1) Ξ(l; α, β) − b^α ε^{1−(1−γ)α} |f_1/g_1|^α |m|^α Ξ(m; α, β*) t ]    for t = O(1).    (2.24)

Since ψ has no terms that involve products of l and m, we can factor the approximate CF into components ψ ≈ ψ_y ψ_v that correspond to CFs of y_t and v_t.
The CF for y_t given in (2.2), with g given in (2.3) and treating x_t as a constant, is

    ψ_y(l, t) = exp[ ily_0 exp(g_1 t/ε) − il ε^γ (g_0/g_1)(1 − exp(g_1 t/ε)) − b^α |l|^α ε^{1−(1−γ)α}/(−αg_1) Ξ(l; α, β)(1 − exp(αg_1 t/ε)) ].    (2.25)

The asymptotic approximation for t = O(1) is

    ψ_y(l, t) ∼ exp[ −il ε^γ g_0/g_1 − b^α |l|^α ε^{1−(1−γ)α}/(−αg_1) Ξ(l; α, β) ].    (2.26)

We then factor ψ_y, as defined by (2.25) for t = O(ε) and (2.26) for t = O(1), out of the approximate expression for ψ given in (2.24) to get the asymptotic behaviour of ψ_v,

    ψ_v(m, t) ∼ exp[ −im (ε^{1−γ} (f_1/g_1) y_0 + ε g_0 f_1/g_1²)(1 − exp(g_1 t/ε)) ]    for t = O(ε),    (2.27)
    ψ_v(m, t) ∼ exp[ −im (ε^{1−γ} (f_1/g_1) y_0 + ε g_0 f_1/g_1²) − b^α ε^{1−(1−γ)α} |f_1/g_1|^α |m|^α Ξ(m; α, β*) t ]    for t = O(1).    (2.28)

This factorization of the CF allows us to analyze the asymptotic approximation of the CF ψ_v for the process v_t. We compare (2.28) to the CF of an α-stable process z_t, t ≥ 0, with the SDE and associated CF given by

    dz_t = Σ dL_t^{(α,β)},    ψ_z(k) = exp[ ikz_0 − t Σ^α |k|^α Ξ(k; α, β) ],    (2.29)

where Σ is a constant and k is the Fourier variable. The functional structure with respect to the Fourier variables m in (2.28) and k in (2.29) is identical, with the terms (α, β, Σ) in (2.29) corresponding to (α, β*, ε^{γ−1+1/α} b |f_1/g_1|) in (2.28). The coefficient of im in (2.27), (2.28) defines the mean of v_t, which is non-zero in general if y_0 ≠ E_{(y|x)}[y] as given in (2.6). The mean of v_t is constant on the t = O(1) time scale, as is evident by inspection of (2.28), and is thus inconsequential to our analysis, since we are interested in the increments of v_t on this time scale rather than v_t itself.

Since convergence in CF implies convergence in distribution [10], the comparison of (2.28) and (2.29) indicates that on the O(1) time scale the increments dv_t are approximately distributed according to an α-stable law, dv_t ∼ ε^ρ b |f_1/g_1| dL_t^{(α,β*)}, where ρ = γ − 1 + 1/α. We conclude that the (L) approximation for the system (2.1 – 2.3) is given by x̄_t + ξ_t, where x̄_t is given by (2.4) and

    dξ_t = f̄′(x̄_t) ξ_t dt + ε^ρ b |f_1(x̄_t)/g_1(x̄_t)| dL_t^{(α,β*)},    ξ_0 = 0,    (2.30)

where f̄ is given in (2.6). We note that if ρ < 0 (γ < 1 − 1/α), then the scale parameter of the noise diverges and the approximation (2.30) is undefined in the ε → 0 limit. If ρ > 0, then the scaling of the noise approaches zero in this limit and ξ_t → 0 as t → ∞, provided f̄′(x̄_t) < 0 for all x̄_t. In this case, the (L) approximation is asymptotically equivalent to the (A) approximation (2.4) as ε → 0. For non-zero but sufficiently small ε, the (L) approximation is an asymptotic approximation in the weak sense for the dynamics of x_t for ρ > 0. This is also true for ρ < 0 with ε not too small; however, in this case the dynamics of x_t become dominated by the noise, and it becomes questionable whether x_t can be considered a slow process relative to y_t. For ρ = 0, the intensity of the noise in (2.30) does not depend on ε, and the stochastic corrections persist as the time-scale separation is made arbitrarily small.

The (L) approximation is useful not only when the system being approximated is itself linear, but also when the system is near a deterministic attractor to which x̄_t has converged. In this case, the (L) approximation is reasonable on finite time intervals that are shorter than the escape time from the local attractor. We compare simulations of the full system and the (L) approximation for both linear and nonlinear systems in §3.

The (L) approximation is a good approximation for linear systems or for certain local approximations. However, for systems with nonlinear dynamics, the (L) approximation may be inadequate due to its inability to model systems with multimodal stationary distributions or behaviour different from an OULp. In these cases, the (N+) approximation is more appropriate, as it is a better approximation for such nonlinear systems.

Beginning with the original system (2.1, 2.2) with f, g given by (2.3), we express the dynamics of x_t in terms of f̄ and f̂ given in (2.5) and (2.10),

    dx_t = f̄(x_t) dt + f̂(x_t, y_t) dt.    (2.31)

As in the previous section, to complete the (N+) approximation we find a stochastic approximation for f̂(x_t, y_t) dt that has its distributional properties on the t = O(1) time scale. We make a change of coordinates µ_t = T(x_t), where T is a differentiable function satisfying

    T′(x) f_1(x) = g_1(x).    (2.32)

The transformation µ_t = T(x_t) defined in (2.32) is invertible, since g_1(x) < 0 and f_1(x) ≠ 0 has the same sign for all x.
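When no closed form is available, the transformation defined by (2.32) can be constructed by quadrature. In the sketch below, the choices f_1(x) = 1 + x² and g_1(x) = −1 are hypothetical examples satisfying the standing assumptions (g_1 < 0; f_1 one-signed and nonvanishing), for which T(x) = −arctan(x):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Hypothetical coefficients satisfying the assumptions of (2.3) and (2.32).
f1 = lambda x: 1.0 + x ** 2
g1 = lambda x: -1.0

def T(x):
    # T(x) = int_0^x g1(s)/f1(s) ds  (= -arctan(x) for this choice)
    val, _ = quad(lambda s: g1(s) / f1(s), 0.0, x)
    return val

def T_inv(mu):
    # invert T by root-bracketing; T is strictly monotone since g1/f1 < 0
    return brentq(lambda x: T(x) - mu, -50.0, 50.0)

x = 0.8
mu = T(x)
print(mu, -np.arctan(x), T_inv(mu))
```

The monotonicity guaranteed by the sign conditions on f_1 and g_1 is exactly what makes the bracketing inversion well-defined.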
As we show below, this coordinate transformation x → µ allows us to approximate the increments f̂(x_t, y_t) dt in the transformed coordinates as an additive α-stable noise process when we consider the CF in Fourier space. Then there is no ambiguity in the interpretation of these noise terms, particularly when we invert the transformation T to obtain the approximation in the original variables. Our use of such a coordinate transformation (2.32) is similar to that in [6], where a fast-slow system is transformed so that the influence of the fast Ornstein-Uhlenbeck process drives the slow variable additively. In [6], this formulation allows for the unambiguous derivation of a reduced stochastic model with a Stratonovich interpretation.

The dynamics of µ_t can be determined using the change of variables formula [7],

    dµ_t = T′(x_{t−}) dx_t + (1/2) T″(x_{t−}) d[x, x]^c_t + T(x_t) − T(x_{t−}) − T′(x_{t−}) Δx_t,    (2.33)

where x_{t−} = lim_{s→t−} x_s and Δx_t denotes the jump component of x_t. Since the continuous part of the quadratic variation of x satisfies d[x, x]^c_t ∝ (dt)² → 0 and x_t is continuous for all t, we get

    dµ_t = T′(x_t) dx_t    (2.34)
         = T′(T^{−1}(µ_t)) [ f̄(T^{−1}(µ_t)) + f̂(T^{−1}(µ_t), y_t) ] dt    (2.35)
         = T′(T^{−1}(µ_t)) f̄(T^{−1}(µ_t)) dt + [ g_0(T^{−1}(µ_t)) + ε^{−γ} g_1(T^{−1}(µ_t)) y_t ] dt.    (2.36)

We define dV_t as the additional fluctuations about the first forcing term in (2.36), analogous to dv_t in the (L) approximation (2.10),

    dV_t = [ g_0(T^{−1}(µ_t)) + ε^{−γ} g_1(T^{−1}(µ_t)) y_t ] dt,    V_0 = 0.    (2.37)

Note that the dynamics in (2.37) are the same (up to a multiplicative constant) as the drift terms in the equation for y. This correspondence between the V and y equations leads to a straightforward factorization of the joint CF in Fourier space, as we see below.

Invoking the separation of time scales for small ε, we use a multiple-scale approximation for the FKE for the (y, V) system, treating µ_t as effectively constant relative to the fast time scales of y_t and V_t,

    ∂q/∂t = −( ∂/∂V + ε^{γ−1} ∂/∂y )( [ g_0 + ε^{−γ} g_1 y ] q ) + b^α ε^{−(1−γ)α} D_y^{(α,β)} q,    q(y, V, 0) = δ(y − y_0) δ(V),    (2.38)

where q = q(y, V, t) is the joint PDF for y, V, and g_0 and g_1 are shorthand for g_0(T^{−1}(µ_t)) and g_1(T^{−1}(µ_t)), treated as constants. We take the FT of (2.38) and define the CF of (y, V) to be φ = φ(l, m, t), where

    φ(l, m, t) = ∫∫_{R²} exp(ily + imV) q(y, V, t) dV dy.    (2.39)

The resulting linear partial differential equation for φ is similar to (2.17),

    ∂φ/∂t = ( g_1 ε^{−γ} m + g_1 ε^{−1} l ) ∂φ/∂l + [ i ( g_0 m + g_0 l ε^{γ−1} ) − b^α |l|^α ε^{−(1−γ)α} Ξ(l; α, β) ] φ,    φ(l, m, 0) = exp(ily_0),    (2.40)

with Ξ(l; α, β) given in (1.2). Analogous to (2.17), (2.40) can be solved using the method of characteristics, giving

    φ(l, m, t) = exp[ iΓ(t) y_0 − i (g_0/g_1)(ε^γ l + εm)(1 − exp(g_1 t/ε)) − b^α ε^{−(1−γ)α} ∫_0^t |Γ(r)|^α Ξ(Γ(r); α, β) dr ],    (2.41)

where

    Γ(s) = l exp(g_1 s/ε) − ε^{1−γ} m (1 − exp(g_1 s/ε)).    (2.42)

Using the same asymptotic results derived in Appendix B, we can approximate φ on the t = O(ε) and O(1) time scales. Thus, we obtain an approximate form for the CF of the (y, V) system for 0 < ε ≪ 1,

    φ ∼ exp[ iΓ(t) y_0 − i (ε^γ l + εm)(g_0/g_1)(1 − exp(g_1 t/ε)) − b^α |l|^α Ξ(l; α, β) ε^{1−(1−γ)α}/(−αg_1) (1 − exp(αg_1 t/ε)) ]    for t = O(ε),
    φ ∼ exp[ −iε^γ (g_0/g_1) l − i (ε^{1−γ} y_0 + ε g_0/g_1) m − b^α |l|^α Ξ(l; α, β) ε^{1−(1−γ)α}/(−αg_1) − b^α ε^{αρ} |m|^α Ξ(m; α, −β) t ]    for t = O(1).    (2.43)

Since φ has no products of l and m, we decompose the t = O(1) approximation in (2.43) into two factors, φ ∼ φ_y φ_V, where

    φ_V(m, t) = exp( −i (ε^{1−γ} y_0 + ε g_0/g_1) m − ε^{αρ} b^α |m|^α Ξ(m; α, −β) t )    (2.44)

and φ_y = ψ_y is the CF of y given in (2.26). As in the case of the (L) approximation, φ_V is identical in form to the CF of an α-stable process. By the same reasoning leading from (2.28) to (2.30) in the previous section, we conclude that the dynamics of the perturbation V_t can be approximated on the t = O(1) time scale by an α-stable noise process,

    dV_t = g_0(T^{−1}(µ_t)) dt + ε^{−γ} g_1(T^{−1}(µ_t)) y_t dt ≈ ε^ρ b dL_t^{(α,−β)}.    (2.45)

Note that in our transformed system µ_t = T(x_t), the coefficient of the Lévy increment is state-independent, and so the limiting noise is additive. Using (2.37) and (2.45) in (2.36), we obtain an approximate equation for µ_t,

    dµ_t ≈ T′(T^{−1}(µ_t)) f̄(T^{−1}(µ_t)) dt − ε^ρ b dL_t^{(α,β)},    µ_0 = T(x_0).    (2.46)

Here we have chosen to express the stochastic increments ε^ρ b dL_t^{(α,−β)} as −ε^ρ b dL_t^{(α,β)}, which is equivalently distributed. To complete the approximation for x_t on the slow time scale, we invert our initial transformation x_t → µ_t = T(x_t) by taking µ_t → X_t = T^{−1}(µ_t), similarly to [6] in the example of Gaussian noise. The fact that the stochastic forcing on µ_t is additive allows for an unambiguous interpretation of the noise terms prior to performing the inverse transformation, and inverting the transformation then provides the appropriate interpretation of the noise terms for the approximation of x_t. Since T^{−1} is a nonlinear transformation, the change of variables formula for jump processes [7] yields non-trivial contributions when applied to X_t as follows,

    dX_t = (T^{−1})′(µ_{t−}) dµ_t + T^{−1}(µ_t) − T^{−1}(µ_{t−}) − (T^{−1})′(µ_{t−}) Δµ_t,    (2.47)

where X_0 = x_0, µ_{t−} = lim_{s→t−} µ_s, and Δµ_t = µ_t − µ_{t−} = −ε^ρ b dL_t^{(α,β)} is the jump component of µ_t at time t. Using the expression for dµ_t (2.36), (2.47) can be written as

    dX_t = [1/T′(T^{−1}(µ_{t−}))] T′(T^{−1}(µ_t)) f̄(T^{−1}(µ_t)) dt + { T^{−1}(µ_t) − T^{−1}(µ_{t−}) }    (2.48)
         = ( T′(X_t)/T′(X_{t−}) ) f̄(X_t) dt + { T^{−1}(µ_t) − T^{−1}(µ_{t−}) },    (2.49)

where X_{t−} = lim_{s→t−} X_s. With probability one, the ratio T′(X_t)/T′(X_{t−}) is equal to one, as X_t is a càdlàg process and hence the set of times where a jump occurs is a null set with respect to the Lebesgue measure [7]. Hence, we replace T′(X_t)/T′(X_{t−}) with 1 in (2.49). The difference T^{−1}(µ_t) − T^{−1}(µ_{t−}) cannot be neglected, since the time integral of these terms results in a sum that does not scale with the size of the infinitesimal time step dt.
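This jump contribution can be made concrete numerically: under the Marcus interpretation it is obtained by flowing an auxiliary ODE in a fictitious time r ∈ [0, 1] (the Marcus map), and it agrees with simply jumping in the transformed variable µ = T(x). In the sketch below, the transformation T(x) = −arctan(x) is a hypothetical choice consistent with (2.32), and the combined jump size is arbitrary:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical transformation consistent with (2.32): T(x) = -arctan(x),
# so T'(x) = -1/(1 + x**2) and T_inv(mu) = tan(-mu).
Tprime = lambda x: -1.0 / (1.0 + x ** 2)
T = lambda x: -np.arctan(x)
T_inv = lambda mu: np.tan(-mu)

c_dL = 0.5          # eps^rho * b * (jump size), an arbitrary value for the sketch
x_pre = 0.2         # state just before the jump, X_{t-}

# Flow d(theta)/dr = -c_dL / T'(theta) over the fictitious time r in [0, 1].
sol = solve_ivp(lambda r, th: -c_dL / Tprime(th), (0.0, 1.0), [x_pre],
                rtol=1e-10, atol=1e-12)
theta_1 = sol.y[0, -1]

# The Marcus map must agree with jumping additively in the transformed variable:
exact = T_inv(T(x_pre) - c_dL)
print(theta_1, exact)
```

This is why the additive noise obtained in the µ coordinates pins down the interpretation of the state-dependent noise after inverting T.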
To see this, we show that this difference is equal to the Marcus increment −ε^ρ b [T′(X_t)]^{−1} ⋄ dL_t^{(α,β)}, by writing T^{−1}(µ_t) − T^{−1}(µ_{t−}) in terms of the processes Θ(r; ΔL_t, µ_{t−}) = µ_{t−} − ε^ρ b ΔL_t r and θ(r; ΔL_t, T^{−1}(µ_{t−})) = T^{−1}(Θ(r; ΔL_t, µ_{t−})), where

    dθ/dr = (dθ/dΘ)(dΘ/dr) = −ε^ρ b ΔL_t / T′(θ(r)),    θ(0; ΔL_t, T^{−1}(µ_{t−})) = T^{−1}(µ_{t−}) = X_{t−},    (2.50)

and ΔL_t = dL_t^{(α,β)} is the size of the jump in the α-stable noise process L_t^{(α,β)} at time t. The variable θ(r; ΔL_t, µ_{t−}) is referred to as the Marcus map, and it describes the integrated effect of the jump ΔL_t over the infinitesimal time on which the jump occurs, which begins at r = 0 and ends at r = 1 (additional details in Appendix C.1.2). The difference T^{−1}(µ_t) − T^{−1}(µ_{t−}) can then be written in terms of θ(r; ΔL_t, µ_{t−}) as follows:

    µ_t = µ_{t−} − ε^ρ b ΔL_t = Θ(1; ΔL_t, µ_{t−}) = T(θ(1; ΔL_t, T^{−1}(µ_{t−})))    (2.51)
    ⇒ T^{−1}(µ_t) − T^{−1}(µ_{t−}) = T^{−1}(Θ(1; ΔL_t, µ_{t−})) − X_{t−} = θ(1; ΔL_t, X_{t−}) − X_{t−}.    (2.52)

The difference θ(1; ΔL_t, X_{t−}) − X_{t−}, where θ satisfies (2.50), is the definition of the Marcus increment −ε^ρ b [T′(X_t)]^{−1} ⋄ dL_t^{(α,β)}. Then using (2.52) and T′ from (2.32) in (2.49) yields the weak approximation X_t for the dynamics of x_t,

    dX_t = f̄(X_t) dt − ε^ρ b ( f_1(X_t)/g_1(X_t) ) ⋄ dL_t^{(α,β)},    X_0 = x_0.    (2.53)

We note that (2.53), which we refer to as the (N+) approximation, is equivalent to the (L) approximation (2.30) when (2.1, 2.2) is a linear system of equations (i.e., f_0, f_1, g_0, g_1 are constant). Also note that the stochastic increment −ε^ρ b [f_1(X_t)/g_1(X_t)] ⋄ dL_t^{(α,β)} can be written equivalently as ε^ρ b |f_1(X_t)/g_1(X_t)| ⋄ dL_t^{(α,β*)}, where β* is given in (2.23) with f_1 = f_1(X_t), since any negative sign in the scale parameter can be absorbed into the skewness parameter. As in the previous section, if ρ > 0, then the (N+) approximation reduces to the (A) approximation (2.4) in the limit ε → 0, while the scale parameter diverges if ρ < 0. For fixed small ε, (2.53) provides a good approximation to x_t when ρ > 0, but if ρ < 0 the noise dominates the dynamics, as discussed in the previous section. For ρ = 0, the stochastic dynamics persist in the strict limit ε → 0. When α = 2, the Marcus interpretation of (2.53) reduces to the Stratonovich interpretation (see Appendix C.1.2), and thus our approximation is consistent with the previously derived (N+) approximation in the case of Gaussian white noise stochastic forcing [2].

In the next section we show that the (N+) approximation is generally superior to the (L) approximation for nonlinear systems, as would be expected. However, we note that the (N+) approximation requires an additional constraint (i.e., sgn(f_1(x)) is constant and not equal to zero for all x), and its numerical implementation is more complicated, often requiring simulation of the Marcus differential terms (see Appendix C.1.2), for which we do not have an efficient numerical method when the solution for θ from (2.50) cannot be expressed in closed form and β ≠ 0. Depending on the problem under consideration, for practical reasons the (N+) approximation may not be a preferable option over the (L) approximation.

Finally, we note the interesting fact that the drift and diffusion coefficients of the dynamics in the approximations are identical to those in the Gaussian case discussed in [32] up to a power of ε, regardless of the value of α.

3 Computational Results
We apply the (L) and (N+) stochastic averaging approximations to four fast-slow dynamical systems with α-stable noise forcing and numerically simulate the systems to compare the stationary probability density functions and autocodifference functions of the approximations to those of the corresponding full systems. One of the systems we simulate is linear, while the other three include nonlinear functions f_0, f_1, g_0 and g_1. For this section, we set ρ = 0 (i.e. γ = 1 − 1/α). Our numerical scheme is outlined in Appendix C.1, and we have run additional simulations with even smaller simulation time steps to confirm that the simulations are well-resolved. All density plots in this paper were derived from simulations having the same sampling time step Dt and number of data points N. The rationale for our choice of time series length is given in Appendix C.2.

We apply the (L) approximation from §2 to the linear system

dx_t = (−x_t + ǫ^{−γ} a y_t) dt,   x_0 ∈ ℝ,   (3.1)
ǫ dy_t = (ǫ^γ c x_t − y_t) dt + ǫ^γ b dL^(α,β)_t,   y_0 = 0,   (3.2)

where 0 < ǫ ≪ 1 and (1 − ac) > 0. Using (2.4), (2.30), we write the (L) approximation for x_t in (3.1, 3.2). The (A) approximation x̄, as in (2.4), is

dx̄_t = −(1 − ac) x̄_t dt,   x̄_0 = x_0   ⇒   x̄_t = x_0 exp[−(1 − ac) t].   (3.3)

Using (2.30) and (3.3), the (L) approximation for x_t is x̄_t + ξ_t, where x̄_t satisfies (3.3) and ξ_t satisfies

dξ_t = −(1 − ac) ξ_t dt + |a| b dL^(α,β_a)_t,   ξ_0 = 0,   (3.4)

where β_a = sgn(a) β. In the long-time limit t → ∞ (i.e. at stationarity), or in the case that x_0 = 0, the (N+) approximation of x_t is identical to the (L) approximation since (3.1, 3.2) is linear. We set x_0 = 0 for all the numerical simulations of (3.1, 3.2) and (3.4), and hence x̄_t = 0.

In Fig. 2, we display the numerically simulated PDF of x_t and ξ_t with dynamics given by (3.1, 3.2) and (3.4), respectively, for different values of α <
2. We also consider cases with asymmetric noise (i.e. β ≠ 0) in Fig. 3. Comparing the numerically generated densities of ξ_t (3.4) with those of x_t from (3.1, 3.2), we see that the density obtained by the (L) approximation is almost indistinguishable from the density of the full system. In these figures, we also plot the numerically computed distribution as determined using the STBLPDF command in the STBL package for MATLAB [46]. This code computes the PDF by numerically evaluating an integral representation of the α-stable distribution as given in Theorem 1 of [33]. The densities estimated from numerically simulated time series are in excellent agreement with the numerically computed densities as determined from the integral representation.

Figure 2: Normalized histogram-based estimates of the PDF with symmetric forcing (β = 0) of x_t from (3.1, 3.2) (blue line) and the (L) approximation ξ_t of (3.4) (green crosses). Left/Middle: As in Fig. 1 with α = 1. / .4. Right: As in Fig. 1, but with a = 0. , b = 2, α = 1. , ǫ = 0.1. The numerically evaluated density using the MATLAB package STBL for ξ in (3.4) is shown by black squares.

Figure 3: Normalized histogram-based estimates of the PDF with β ≠ 0 of x_t from (3.1, 3.2) (blue line). Also plotted is the histogram for ξ_t of (3.4) (green crosses). For all figures: a = 0. , b = 0. , c = 1, α = 1.7. Left/Middle/Right: ǫ = 0. / 0. / 0. , β = −. / / .75. The numerically evaluated density using the MATLAB package STBL for ξ in (3.4) is shown in black squares.

We also compare the autocodifference functions (AFs) [45] of the simulated trajectories for the full and averaged systems. This quantity is the generalization of the autocovariance function to variables that may not have a finite variance. In the case where α = 2, the autocovariance function is a standard measure of the degree of linear dependence between time-lagged states of a time series. However, when α <
2, the autocovariance function is not defined, and so instead we must use the analogous AF to get a sense of this linear dependence. The AF for a continuous-time stochastic process z_t is defined in terms of the CF of z_t,

A_z(τ) = log[E[exp(i(z_{t+τ} − z_t))]] − log[E[exp(i z_{t+τ})]] − log[E[exp(−i z_t)]],   (3.5)

where τ is the lag time. In the case of ξ_t given by (3.4), the AF can be explicitly evaluated:

A_ξ(τ) = ((ab)^α / (α(1 − ac))) [1 + exp(−α(1 − ac)τ) − |1 − exp(−(1 − ac)τ)|^α]
   − iβ tan(πα/2) ((ab)^α / (α(1 − ac))) [(1 − exp(−α(1 − ac)τ)) − |1 − exp(−(1 − ac)τ)|^α].   (3.6)

We note that when β = 0, the imaginary part of the AF is equal to zero and (3.6) reduces to the autocodifference function given in [45]. If α = 2, then A_ξ is identical to the autocovariance function of ξ. The existence of a nonzero imaginary part of the AF for β ≠ 0 is a consequence of definition (3.5); it has no obvious interpretation in terms of the standard autocovariance function or the dynamics of the corresponding process.

We estimate the AF for the sample trajectories used to generate the density estimates given in Fig. 2 and offer a brief summary of the estimation method in Appendix C.3. The results are displayed in Fig. 4. We see that for ǫ ≪ 1, there is virtually no difference between the estimated AFs of the full and averaged systems, and both are comparable to the analytically derived result (3.6). For ǫ = 0.1, small differences between the AFs of the full and averaged systems are apparent, but these are not unexpected given the relatively large value of ǫ, which is on the edge of the asymptotic regime.

Figure 4: Logarithmic plot of the numerically estimated AF for x_t of (3.1, 3.2) (blue line) and ξ_t of (3.4) (green crosses), where β = 0. The analytically derived autocodifference A_ξ(τ) in (3.6) is also displayed (black squares). The parameters for the left, middle and right panels correspond to those from Fig. 2.

We also consider the case of skewed noise (not shown) for ǫ ≪ 1 and β ≠ 0, finding close agreement for ǫ = 0.01, with some slight differences between the full and averaged systems for ǫ = 0.1. Similar agreement is observed for the real and imaginary components of the AF compared separately. From these results, we see that (3.4) is statistically similar to x_t of (3.1, 3.2) not only in stationary distribution, but also in autocodifference.

3.2 Nonlinear system 1: Bilinear slow dynamics

The first nonlinear system we study is a fast-slow system with a bilinear term in the slow equation:

dx_t = (c − x_t + ǫ^{−γ} x_t y_t) dt,   x_0 > 0,   (3.7)
dy_t = −(y_t / (ǫa)) dt + b ǫ^{−1/α} dL^(α,0)_t,   y_0 = 0,   (3.8)

where c, a >
0. The bilinearity in the slow equation has the effect of restricting the dynamics of x_t to positive values, since the increments satisfy dx_t/dt → c > 0 as x → 0^+. We compute the (N+) approximation, X_t, using (2.53):

dX_t = (c − X_t) dt + ab X_t ⋄ dL^(α,0)_t,   X_0 = x_0.   (3.9)

For the sake of comparison, we also give the (L) approximation using (2.30), x_t ≈ x̄_t + ξ_t, where

dξ_t = −ξ_t dt + ab |x̄_t| dL^(α,0)_t,   ξ_0 = 0,   x̄_t = c + (x_0 − c) exp(−t).   (3.10)

We compare the densities of the full system (3.7, 3.8) and the approximations (3.9) and (3.10) via numerical simulation in Fig. 5, demonstrating the expected superiority of the (N+) approximation over the (L) approximation. The simulation methods are described in Appendix C.1 (Itō for (3.7, 3.8) and (3.10), and Marcus for (3.9)). Particularly noteworthy is that the (N+) approximation, like the full system, does not cross the x = 0 boundary. Given the multiplicative nature of the noise of the (N+) approximation, it is clear that the positivity of the slow process is preserved in the approximation, as the dynamics of X_t (3.9) satisfy dX_t/dt → c > 0 as X_t → 0. The dynamics of x_t in the full system (3.7, 3.8) also satisfy the same positivity constraint, as shown above. In contrast, since the (L) approximation has linear drift with additive noise, it is possible for the (L) approximation to cross zero. However, even though the (L) approximation does cross x = 0, it is a reasonable approximation of the full system near x̄_t (the mean of x_t), and thus may be suitable for local approximations. Although we have only presented results for β = 0, the approximations for β ≠ 0 have similar quality (see Appendix C.1.2 for a discussion of the numerical method).

Figure 6 shows that the AF of the (L) approximation, which exhibits exponential decay, differs from that of the (N+) approximation and the full system. The slope of the AF for the (L) approximation for small time lags is shallower than that of (N+), indicating less rapid "memory loss" over short time intervals. However, for longer times, the (L) approximation appears to have more rapid memory loss than both x_t and X_t of the full and (N+) approximation systems. The autocodifference of the (N+) approximation is very similar to that of x_t of the full system, indicating that X_t is an excellent approximation to x_t in both stationary PDF and AF.

Figure 5: Normalized histogram-based estimates of the PDF of x_t for the full nonlinear system (3.7, 3.8) (blue line), the (L) approximation x̄_t + ξ_t (3.10) (green crosses) and the (N+) approximation X_t (3.9) (red circles). For all figures: a = 1, b = 0. , c = 1, ǫ = 0.01 and x_0 = c. Left/Middle/Right: α = 1. / . / . . The AFs of the full system and the (N+) approximation are similar for small ǫ, while the AF of the (L) approximation differs.

Figure 6: Logarithmic plot of the estimated AF for x_t of the full nonlinear system (3.7, 3.8) (blue line). Also plotted are the real parts of the autocodifferences of the (L) approximation x̄_t + ξ_t (3.10) (green crosses) and the (N+) approximation X_t (3.9) (red circles). The parameters for the three panels correspond to those from Fig. 5.

For the second nonlinear system we consider, there is a nonlinear term in the equation for the fast variable y_t:

dx_t = (−x_t + ǫ^{−γ} y_t) dt,   x_0 ∈ ℝ,   (3.11)
dy_t = −((1 + |x_t|)/ǫ) y_t dt + b ǫ^{−1/α} dL^(α,0)_t,   y_0 = 0.   (3.12)

For larger values of |x|, there is stronger reversion to zero of the fast variable y_t. Computing the (N+) approximation using (2.53) and comparing to the (L) approximation (2.30) gives

(N+): x_t ≈ X_t where dX_t = −X_t dt + (b/(1 + |X_t|)) ⋄ dL^(α,0)_t,   X_0 = x_0,   (3.13)
(L): x_t ≈ x̄_t + ξ_t where x̄_t = x_0 exp(−t),   dξ_t = −ξ_t dt + (b/(1 + |x̄_t|)) dL^(α,0)_t,   ξ_0 = 0.   (3.14)

Figure 7 shows the estimates of the stationary PDFs for (3.11, 3.12) and the approximations (3.13) and (3.14) obtained by simulation. Once again, we see that the (N+) approximation gives an excellent approximation to the density of the full system. By comparison, the (L) approximation is poor, as it does not capture the bimodal nature of the full system in the given parameter regimes and overestimates the probability mass in the tails of the distribution. In failing to capture the bimodal nature of (3.11, 3.12), the (L) approximation does not perform well even as a local approximation of the full system near the mean, x = 0. The (N+) approximation accurately captures both the tails and the bimodal nature of the full system.
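Estimating the AFs shown in Figs. 4 and 6 amounts to replacing the expectations in definition (3.5) with sample averages over a simulated trajectory. Below is a minimal Python sketch of one such estimator; the function name and the Gaussian AR(1) sanity check are our illustrative choices, not the paper's exact procedure (which is summarized in Appendix C.3):

```python
import numpy as np

def autocodifference(z, max_lag):
    """Estimate A_z(tau) of (3.5): each expectation E[exp(i*...)] is
    replaced by a sample average over the stationary path z."""
    z = np.asarray(z, dtype=float)
    out = np.empty(max_lag + 1, dtype=complex)
    for k in range(max_lag + 1):
        zt = z[: len(z) - k] if k > 0 else z
        ztau = z[k:]
        out[k] = (np.log(np.mean(np.exp(1j * (ztau - zt))))
                  - np.log(np.mean(np.exp(1j * ztau)))
                  - np.log(np.mean(np.exp(-1j * zt))))
    return out

# Sanity check on a Gaussian AR(1) surrogate (the alpha = 2 case), where
# the AF reduces to the autocovariance r**k / (1 - r**2):
rng = np.random.default_rng(0)
r, n = 0.5, 100_000
z = np.empty(n)
z[0] = 0.0
for t in range(1, n):
    z[t] = r * z[t - 1] + rng.standard_normal()
A = autocodifference(z, 3)
print(np.round(A.real, 2))  # close to [1.33, 0.67, 0.33, 0.17]
```

For β ≠ 0 the estimator returns complex values whose imaginary parts can be compared against the second line of (3.6).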
Figure 7: Normalized histogram-based estimates of the PDF of x_t for the full system (3.11, 3.12) (blue line), the (L) approximation x̄_t + ξ_t (3.14) (green crosses), and the (N+) approximation X_t (3.13) (red circles). For all plots, b = 2 and x_0 = 0. Left: ǫ = 0. , α = 1.6. Middle: ǫ = 0. , α = 1.9. Right: ǫ = 0. , α = 2. (Insets are on a linear scale.)

In the third nonlinear model, the drift term of the slow variable contains a cubic term in x_t:

dx_t = (−x_t − x_t^3 + ǫ^{−γ} y_t) dt,   x_0 = 0,   (3.15)
dy_t = −(y_t/(ǫa)) dt + b ǫ^{−1/α} dL^(α,0)_t,   y_0 = 0,   (3.16)

where a > 0. The fast variable, which is an OULp, appears linearly in the slow equation. For x_t near 0, the system can be approximated as a multivariate OULp, as the cubic term in (3.15) is small. As before, we compute the (N+) approximation (2.53) and compare it to the (L) approximation (2.30):

(N+): x_t ≈ X_t where dX_t = (−X_t − X_t^3) dt + ab dL^(α,0)_t,   X_0 = x_0,   (3.17)
(L): x_t ≈ x̄_t + ξ_t where x̄_t = x_0 e^{−t} / √(1 + x_0^2 (1 − e^{−2t})),   dξ_t = −ξ_t dt + ab dL^(α,0)_t,   ξ_0 = 0.   (3.18)

From the results in Fig. 8, we see that again the (N+) approximation captures the full nonlinearity of the slow dynamics (3.15), while the (L) approximation effectively gives a linearization of the drift dynamics near the deterministic equilibrium of (3.15) when y_t is at its mean value of 0. Both approximations do well near the peak of the distribution at x = 0, but the (L) approximation deviates from the full system for |x| large, where the OULp approximation is not appropriate. The (N+) approximation, by comparison, is once again a good approximation over the whole domain.

Figure 8: Normalized histogram-based estimates of the PDF of the x_t-variable for (3.15, 3.16) (blue line), x̄_t + ξ_t for the system (3.18) (green crosses), and X_t for (3.17) (red circles). For all figures: σ = 0. , a = 1, ǫ = 0. , x_0 = 0. Left/Middle/Right: α = 1. / . / . .

In this paper, we determine the (L) and (N+) approximations for fast-slow systems forced with α-stable noise. These are reduced systems providing weak approximations for the slow variables, analogous to the established (L) and (N+) approximations for Gaussian noise forcing. Previous results for Gaussian noise have been based on the analysis of autocovariance [2, 32], while other analyses have used the characteristic functions obtained from the Fourier transform of the FKE for linear fast-slow systems [43].
Here we consider nonlinear systems, and since the autocovariance function is not available for systems with α-stable noise, we use the characteristic functions for appropriately transformed variables as the basis for the approximations. We restrict our attention to nonlinear models which are linear in the fast variable, as the Fourier analysis necessary for determining the CF is straightforward in this case.

For the (L) and (N+) approximations that we derive, we find that the drift and diffusion coefficients of the dynamics have the same form, up to a power of ǫ, as the coefficients obtained in the (L) and (N+) approximations in the Gaussian case. The difference is in the treatment of the noise; obviously, for systems with α-stable noise, the (L) and (N+) approximations also involve α-stable noise. In addition, the general result for the (N+) approximation requires a Marcus interpretation for the α-stable case, while in the Gaussian case the interpretation of the noise terms is Stratonovich. For α = 2, corresponding to Gaussian noise with no jumps, the Marcus interpretation reduces to the Stratonovich, demonstrating consistency.

The derivation of the stochastic averaging methods presented here depends on the scaling relationships with regard to the timescale separation parameter ǫ. In particular, the value of the exponent γ in the dynamical equations of the unreduced system,

dx_t = f_0(x_t) dt + ǫ^{−γ} f_1(x_t) y_t dt,
ǫ dy_t = (ǫ^γ g_0(x_t) + g_1(x_t) y_t) dt + ǫ^γ b dL^(α,β)_t,

is critically important. If γ < 1 − 1/α, the stochastic terms of the (L) and (N+) approximations, which have coefficients proportional to ǫ^{γ−(1−1/α)}, diverge in the formal ǫ → 0 limit, while if γ > 1 − 1/α, the (L) and (N+) approximations are asymptotically equivalent to the (A) approximation as ǫ → 0, which does not include any stochastic effects. For non-zero ǫ = o(1) and γ > 1 − 1/α, the (L) and (N+) approximations are asymptotic approximations in the weak sense for the dynamics of x_t. While we also have this approximation for γ < 1 − 1/α, the dynamics of x_t become dominated by the noise in this case, and it becomes questionable that x_t can be considered a slow process relative to y_t. For the critical value γ = 1 − 1/α, the intensity of the noise in the (L) and (N+) approximations does not depend on ǫ, and the stochastic corrections persist as the timescale separation parameter ǫ is made arbitrarily small.

We assumed that g_1(x) < 0 and of O(1) to ensure contracting dynamics, so that the distribution of y conditioned on x reaches stationary behaviour for y on a fast time scale. We expect that other systems with more general contracting fast dynamics also have this property, but we have not considered them here in order to concentrate on developing the approach for deriving the reduced system, rather than on confirming that conditionally stationary behaviour of y is realized on a fast time scale. In addition, we expect that caution is required in the cases where the distribution of y conditioned on x is multimodal (e.g. multistability for y in the absence of noise).

We compare numerical results for several fast-slow systems (both linear and nonlinear) and their corresponding (L) and (N+) approximations for the slow variables, computing both the long-time probability density functions and the autocodifference functions (analogous to the autocovariance functions for α-stable processes). We observe good agreement between the full system and the approximations, with some expected limitations. The (L) approximation gives a good approximation for linear fast-slow systems, as well as a local approximation near the mean when a linear approximation is valid there. The (N+) approximation captures the nonlinear behaviour of both the long-time probability density functions and the autocodifference functions, showing good agreement with the full system over all values of the slow variable. However, the (N+) approximation is more difficult to simulate, primarily because of the Marcus stochastic terms. Considerations of numerical stability of nonlinear systems may also be an issue, requiring higher-order methods for accuracy, as discussed in Appendix C.1.1. The choice to use the (L) or (N+) approximation depends on the balance of accuracy, the range of values where the approximation is needed, and the amount of analytical and computational work required.

Our results suggest a number of valuable areas for future research. In our analysis, we required that α > 1, but numerical results such as those in Fig. 9 (left and middle panels) suggest that the approximations remain useful for α < 1. This suggests that the (L) and (N+) approximations in the case of α < 1 merit further analysis. We also expect that reduced models for systems that are nonlinear in y are similar in character to those from systems that are linear in y. In Fig. 9, right panel, we show an example that is quadratic in y, demonstrating that the numerical approximation for the density is similar in character to that of our nonlinear system 1 with bilinear form. It may then be possible to provide a (semi-)analytical expression for the reduced system, but the nonlinearities would require methods beyond those used in the present study. Furthermore, in order to consider (N+) approximations for systems with asymmetric α-stable noise where the Marcus map cannot be expressed in closed form, a numerical method needs to be developed. A computationally expensive approach would be simply to treat every stochastic increment as a jump and use the numerical method outlined in Appendix C.1.2, but ideally one would prefer to have a more efficient method. Extending the analysis to systems where the fast and slow subsystems are multi-dimensional, rather than scalar, is another natural direction for our research, although this may not be a trivial endeavour. A detailed comparison with strong pathwise approximations, such as [47], could provide an interesting study in which to explore weak convergence rates and the connection between the two distinct types of averaging.

A Table of notation

The following table gives a short list of the most commonly used variables and symbols in this study.
Figure 9: Left & Middle: Normalized histogram-based estimates of the PDF with symmetric α-stable forcing for α < 1, with (a, b, c, ǫ, α) = (0. , . , , − , . ) and (ǫ, α) = (10^− , . ). Right: Normalized histogram-based estimates of the PDF of x_t, where dx_t = (c − x_t + ǫ^{−γ} x_t y_t [1 + ζ y_t]) dt and y_t satisfies (3.8), with (a, b, c, ǫ, α) = (1, . , , − , . ); the values of ζ (ζ = 0, 0.05, 0.1) are indicated in the legend.

Symbol(s) | Representative of
x, y | the slow, fast dynamical variables
x̄ | the process satisfying the (A) approximation to x (2.4)
ξ | the perturbation from x̄ in the (L) approximation to x (2.30)
X | the (N+) approximation to x (2.53)
v, V | continuous processes approximated asymptotically with a white noise (defined in (2.10), (2.37))
p(y, v, t), q(h, V, t) | the marginal probability distribution functions of (y_t, v_t), (h_t, V_t), resp.
F[f] | the Fourier transform of the function f (see (2.15) and (2.39))
l, m, ω | the Fourier variables associated with y, v (and V), h, resp.
ψ, φ | the characteristic functions (CFs) of the (y, v), (h, V) systems, resp.
µ | a variable that weakly approximates T(x), where T satisfies (2.32)

B Evaluation of the characteristic function ψ

In this appendix we show the explicit calculations involving the asymptotic evaluation of the integral (2.22) that contributes to (2.20),

∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr,   (B.1)

where Ξ, Λ are defined in (1.2), (2.21) respectively. We focus on two different temporal regimes, t = O(ǫ) and t = O(1). In the case where t = O(ǫ), we make the change of variable ǫy = r in (B.1):

ǫ ∫_0^t̃ |Λ(ǫy)|^α Ξ(Λ(ǫy); α, β) dy,   t̃ = t/ǫ = O(1).   (B.2)

We use a Taylor series approximation for |Λ(ǫy)|^α to expand the integrand in powers of ǫ for 0 < ǫ ≪ 1:

|Λ(ǫy)|^α = |l exp(g_1 y) − m ǫ^{1−γ} (f_1/g_1)(1 − exp(g_1 y))|^α   (B.3)
   ∼ |l|^α exp(α g_1 y) − ǫ^{1−γ} α |l|^{α−1} sgn(l) m (f_1/g_1) [exp((α − 1) g_1 y) − exp(α g_1 y)].   (B.4)

The function Ξ(s; α, β) = 1 + iβ sgn(s) tan(πα/2) depends on s only through sgn(s) and is therefore constant except when s changes sign. In the t = O(ǫ) limit, Λ(ǫy) = l exp(g_1 y) + O(ǫ^{1−γ}), and so the value of Ξ is determined by the sign of l for ǫ sufficiently small:

Ξ(Λ(ǫy); α, β) = Ξ(l; α, β).   (B.5)

Substituting (B.4), (B.5) into (B.2), we obtain an approximation for (B.1) for t = O(ǫ):

∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr ≈ −ǫ (|l|^α Ξ(l; α, β)/(α g_1)) (1 − exp(α g_1 t/ǫ))   (B.6)
   + ǫ^{2−γ} |l|^{α−1} Ξ(l; α, β) sgn(l) m (f_1/g_1^2) ((α/(α − 1)) [1 − exp((α − 1) g_1 t/ǫ)] − [1 − exp(α g_1 t/ǫ)]) + O(ǫ^{3−2γ}).   (B.7)

Keeping only the leading order term in l, we get the approximation to (B.1) valid for t = O(ǫ) that is written in (2.22).

For t = O(1), (B.1) depends on both global contributions from terms that are significant over the entire domain of integration, and on local contributions that are concentrated on an interval of length O(ǫ) near t = 0. Methods for handling such integrals can be found in [16]. For the global contributions, we neglect terms in Λ(r) that are exponentially small for 0 < ǫ ≪ 1 and r = O(1) to get the leading order behaviours of |Λ(r)|^α and Ξ(Λ(r); α, β):

|Λ(r)|^α ∼ |−ǫ^{1−γ} m (f_1/g_1)|^α = ǫ^{α(1−γ)} |f_1/g_1|^α |m|^α,   Ξ(Λ(r); α, β) = Ξ(f_1 m; α, β) = Ξ(m; α, β*),   (B.8)

where β* is defined in (2.23). Then we evaluate (B.1) to get the global contribution plus a remainder R:

∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr = ǫ^{α(1−γ)} |f_1/g_1|^α |m|^α Ξ(m; α, β*) t + R.
(B.9)

Then R is primarily the local contribution near r = 0 and can be written as

R = ∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr − ǫ^{α(1−γ)} |f_1/g_1|^α |m|^α Ξ(m; α, β*) t = ∫_0^t exp(α g_1 r/ǫ) Υ(r) dr,   (B.10)

Υ(r) = |l − m ǫ^{1−γ} (f_1/g_1)(exp(g_1 r/ǫ) − 1)|^α Ξ(Λ(r); α, β) − ǫ^{α(1−γ)} exp(−α g_1 r/ǫ) |f_1/g_1|^α |m|^α Ξ(m; α, β*).   (B.11)

The remainder is approximated by evaluating the integral asymptotically using Watson's Lemma [16] and keeping the first three terms:

R ∼ −ǫ (|l|^α/(α g_1)) Ξ(l; α, β) + ǫ^{2−γ} (f_1/g_1^2) |l|^{α−1} Ξ(l; α, β) sgn(l) m ((α + 1)/α) + ǫ^{3−2γ} (f_1/g_1)^2 |l|^{α−2} m^2 Ξ(l; α, β) ((α − 1)/α) − (ǫ^{1+α(1−γ)}/(α g_1)) |f_1/g_1|^α |m|^α Ξ(m; α, β*).   (B.12)

Substituting (B.12) into (B.9), we obtain the approximation to (B.1) valid for t = O(1), given in (2.22).

C Numerical Methods
C.1 Simulating the SDEs
In this appendix, we describe the numerical methods used to simulate trajectories of the systems given in §3. Depending on whether the stochastic dynamics of a system are interpreted in the sense of Itō or Marcus, the stochastic forcing term is treated differently; we give details below in Appendices C.1.1 and C.1.2.

For the systems studied in §3, we simulate the full systems of the form (2.1, 2.2) using a time step of size δt (referred to as the simulation time step). Each simulated time series was sampled on a coarser time grid with sampling time step Dt. The simulation time step satisfies δt ≤ Dt, with Dt no longer than the characteristic time scale of the fast process. The choice of time steps was taken to ensure that the simulation time scale and the characteristic time scale of the fast process were well-separated, and that we obtain a sufficiently long time series while keeping the number of data points to a "computationally tractable" number. Different simulation time steps were used for different systems to compromise between issues of numerical stability and computation time, but all systems, including the approximations, have the same sampling time step Dt.

For the (L) and (N+) approximations in §3, we used δt = Dt. This choice samples the dynamics sufficiently well, since the characteristic time scale of the processes is O(1). Thus we ensure that both the full and reduced systems are simulated over the same time intervals with the same sample data sizes. Table 1 gives the parameters used in our numerical simulations.

System | variable | characteristic time scale | δt
(3.1, 3.2) | y_t | ǫ (≥ Dt) | 10^−
(3.4) | ξ_t | (1 − ac)^{−1} | Dt
(3.7, 3.8)* | y_t | ǫa (≥ Dt) | ×10^−
(3.9)** | X_t | O(1) | Dt
(3.10) | ξ_t | O(1) | Dt
(3.11, 3.12) | y_t | ǫ(1 + |x_t|)^{−1} (ǫ ≥ Dt) | 5×10^−
(3.13)*** | X_t | O(1) | Dt
(3.14) | ξ_t | O(1) | Dt
(3.15, 3.16) | y_t | ǫa (≥ Dt) | 10^−
(3.17)* | X_t | O(1) | Dt
(3.18) | ξ_t | O(1) | Dt

Table 1: Simulation parameters for the examples considered in §3. The indicated variable determines the smallest characteristic time scale of the system, which is also indicated. All systems were sampled on the time step Dt. Single asterisks indicate the system was simulated using a higher-order method as described by (C.3). Double/triple asterisks indicate numerical Marcus integration without/with numerical evaluation of the Marcus map θ (C.16).

C.1.1 Numerical simulations for §3: Itō interpretation
For all of the (L) approximations simulated in §3, ξ_t satisfies an SDE of the form

dξ_t = H(ξ_t) dt + κ(x̄_t) dL^(α,β)_t,   ξ_0 = 0,   (C.1)

where x̄_t is treated as constant with respect to ξ_t, so that the noise term has the Itō interpretation. As we are interested in the stationary distribution for x̄_t + ξ_t, we take the initial value for x̄_t to be its stationary value in all numerical simulations of the (L) approximation, and hence there is no need to numerically simulate x̄_t, as it is constant. We denote the constant value κ(x̄_t) as κ. To numerically simulate (C.1), in most cases we use an Euler-type numerical approximation,

ξ̂_{n+1} = ξ̂_n + H(ξ̂_n) δt + κ δL^(α,β)_n,   ξ̂_0 = 0,   n = 0, 1, 2, ...,   (C.2)

where ξ̂_n denotes the numerical approximation to ξ_t at time t = n·δt and the increments δL^(α,β)_n ∼ S_α(β, (δt)^{1/α}). Numerical schemes analogous to (C.2) are used for all systems in this paper, unless otherwise indicated in Table 1. For two systems, the weak order 1.0 predictor-corrector method [23] was required for additional accuracy: i) nonlinear system 1 (3.7, 3.8), to ensure the x = 0 boundary was not crossed, and ii) the (N+) approximation of system 3 (3.17), as the cubic term caused numerical trajectories to diverge when the standard forward Euler method was used. The predictor-corrector numerical scheme is given by

ξ̂_{n+1} = ξ̂_n + (1/2)(H(ξ̂†) + H(ξ̂_n)) δt + κ δL^(α,β)_n,   ξ̂† = ξ̂_n + H(ξ̂_n) δt + κ δL^(α,β)_n,   ξ̂_0 = 0.   (C.3)

In all cases the increments of α-stable noise are given by δL^(α,β)_n, with stability parameter α ∈ (0, 1) ∪ (1, 2], skewness parameter β, and scale parameter σ = (δt)^{1/α}.
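As a concrete illustration of (C.2), the scheme can be written in a few lines of Python, with the increments δL drawn via the Chambers–Mallows–Stuck construction described next. The function names, the drift H(ξ) = −ξ, and the parameter values below are our illustrative choices:

```python
import numpy as np

def stable_increments(alpha, beta, dt, size, rng):
    """delta L ~ S_alpha(beta, dt**(1/alpha)) via Chambers-Mallows-Stuck."""
    if alpha == 2.0:  # Gaussian case: variance 2*dt, cf. Appendix C.1.1
        return np.sqrt(2.0 * dt) * rng.standard_normal(size)
    zeta1 = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    zeta2 = rng.exponential(1.0, size)                # unit exponential
    tb = beta * np.tan(np.pi * alpha / 2)
    C = np.arctan(tb) / alpha
    B = dt ** (1 / alpha) * np.cos(np.arctan(tb)) ** (-1 / alpha)
    return (B * np.sin(alpha * (zeta1 + C)) / np.cos(zeta1) ** (1 / alpha)
            * (np.cos(zeta1 - alpha * (zeta1 + C)) / zeta2) ** ((1 - alpha) / alpha))

def euler_path(H, kappa, alpha, beta, dt, n, rng, xi0=0.0):
    """Euler-type scheme (C.2) for d(xi) = H(xi) dt + kappa dL^(alpha,beta)."""
    dL = stable_increments(alpha, beta, dt, n, rng)
    xi = np.empty(n + 1)
    xi[0] = xi0
    for i in range(n):
        xi[i + 1] = xi[i] + H(xi[i]) * dt + kappa * dL[i]
    return xi

rng = np.random.default_rng(1)
path = euler_path(lambda x: -x, kappa=0.5, alpha=1.7, beta=0.0,
                  dt=1e-3, n=50_000, rng=rng)
print(path.shape, bool(np.isfinite(path).all()))
```

With contracting drift the path settles into an α-stable-distributed stationary regime; the heavy tails mean occasional large excursions are expected and are not a numerical artifact.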
We use the algorithm in [9] to generate these quantities as follows:

δL^(α,β)_n = B_{α,β} (sin(α(ζ_1 + C_{α,β})) / cos(ζ_1)^{1/α}) (cos(ζ_1 − α(ζ_1 + C_{α,β})) / ζ_2)^{(1−α)/α},   (C.4)

where

B_{α,β} = (δt)^{1/α} (cos(arctan(β tan(πα/2))))^{−1/α},   C_{α,β} = (1/α) arctan(β tan(πα/2)),   (C.5)

and ζ_1, ζ_2 are respectively a uniform random variable on the open interval (−π/2, π/2) and an exponential random variable with parameter 1. In the case α = 2, the increments δL^(2,β)_n = δL^(2,0)_n are Gaussian random variables with variance 2δt, and the simulation scheme (C.2) is then an Euler–Maruyama method [23].

C.1.2 Numerical methods for evaluating Marcus integrals
We summarize results of [3, 6, 13, 24, 26, 44, 38] relevant for processes z_t satisfying SDEs with multiplicative noise terms in the sense of Marcus, indicated by the "⋄" symbol, such as (3.9) and (3.13). These systems are of the form

dz_t = H(z_t) dt + κ(z_t) ⋄ dL^(α,β)_t,   z_0 ∈ ℝ,   (C.6)

and can be written in integral form as

z_t = z_0 + ∫_0^t H(z_s) ds + ∫_0^t κ(z_s) ⋄ dL^(α,β)_s.   (C.7)

The first integral on the right-hand side of (C.7) corresponds to the drift terms and is numerically approximated using standard methods, such as Euler or explicit trapezoidal methods. Here L^(α,β)_t is a pure jump process, and so the Marcus increments are expressed in terms of the Marcus map θ(r; ∆L_s, z_{s−}) as follows:

∫_0^t κ(z_s) ⋄ dL^(α,β)_s = Σ_{s≤t} [θ(1; ∆L_s, z_{s−}) − z_{s−}],   (C.8)

where θ(r; ∆L_s, z_{s−}) satisfies

dθ(r; ∆L_s, z_{s−})/dr = ∆L_s κ(θ(r; ∆L_s, z_{s−})),   θ(0; ∆L_s, z_{s−}) = z_{s−}   (C.9)

[1, 6]. In the Marcus map θ(r; ∆L_s, z_{s−}), r is a time-like variable in which the process z_t travels infinitely fast along a curve connecting the initial and end "times" of the jump ∆L_s, given by r = 0 and r = 1, respectively. The jump occurring at time s, denoted ∆L_s = dL^(α,β)_s, is a random variable distributed according to an α-stable law, S_α(β, dt^{1/α}). The resulting effect of the jump on the process z_t is determined by the size of the jump as well as the integrated effect of the noise coefficient over the course of the jump [1]. When an expression for θ can be derived and expressed in closed form, then z_t can be numerically estimated as

ẑ_{n+1} = ẑ_n + H(ẑ_n) δt + [θ(1; δL^(α,β)_n, ẑ_n) − ẑ_n],   (C.10)

where δL^(α,β)_n is as described in Appendix C.1.1.
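When the Marcus map is available in closed form, implementing (C.10) is direct. Below is a sketch for linear multiplicative noise κ(z) = m z, for which (C.9) integrates to θ(1; ΔL, z) = z exp(m ΔL); the drift and numerical values are illustrative, chosen to echo the bilinear example (3.9) where m = ab:

```python
import numpy as np

def marcus_step_linear(z, H, m, dL, dt):
    """One step of scheme (C.10) for kappa(z) = m*z.
    The Marcus map solves d(theta)/dr = dL * m * theta, theta(0) = z,
    so theta(1) = z * exp(m * dL)."""
    return z + H(z) * dt + (z * np.exp(m * dL) - z)

# Deterministic sanity check: with no jump the step is plain Euler;
# a jump dL rescales z by exp(m*dL) on top of the drift increment.
z0, m, dt = 2.0, 0.3, 1e-2
H = lambda z: 1.0 - z  # mean-reverting drift, as in (3.9) with c = 1
no_jump = marcus_step_linear(z0, H, m, 0.0, dt)
with_jump = marcus_step_linear(z0, H, m, 0.5, dt)
print(no_jump, with_jump)
```

Note that the jump update multiplies the state by a positive factor, which is why the (N+) approximation (3.9) preserves positivity and never crosses the x = 0 boundary.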
For example, in the (N+) approximation of nonlinear system 1 (3.9), θ (1; δL ( α,β ) n , ˆ z n ) = ˆ z n exp (cid:16) bτ δL ( α,β ) n (cid:17) .In cases where θ needs to be evaluated numerically, as in the case of the (N+) approximation of nonlinear system2 (3.13), we follow [3, 13] to approximate the α -stable noise increments dL ( α,β ) t as the sum of a (path continuous)Gaussian white noise process and a (pure jump) compound Poisson process. This process effectively decomposes dL s into small jumps and large jumps, approximated respectively with the Gaussian white noise process and the Poissonprocess. We consider the case where the noise is symmetric (i.e. β = 0) as the case of asymmetric noise has not beenstudied explicitly (to our knowledge). It is necessary to define a large jump threshold R > dL ( α, t as dL ( α, t ≈ η dW t + dN t (C.11)where η is a constant satisfying η = (cid:18) α − α (cid:19) C α R − α , C α = − α Γ(2 − α ) cos( πα/ if α = 12 /π if α = 1 , (C.12) W t is a standard Wiener process, and N t = N t ( ν, λ ) is a compound Poisson process with rate λ and measure ν givenby λ = C α /R α , ν ( x ) = αR α | x | − α +1 if x < − R or x ≥ R R = 1 to be our large jump threshold, as is done in [13]. Since we are approximating α -stable noise witha sum of a continuous path process and pure jump process, we must use the general representation for stochastic30ntegrals of L´evy processes that includes contributions from both the continuous and jump components [1, 7], yielding Z t κ ( z s ) ⋄ dL ( α, s ≈ η Z t κ ( z s ) dW s + η Z t κ ( z s ) κ ′ ( z s ) dt + Z t κ ( z s ) ⋄ dN s (C.14)= η Z t κ ( z s ) dW s + η Z t κ ( z s ) κ ′ ( z s ) dt + X s ≤ t [ θ (1; ∆ N s , z s − ) − z s − ] (C.15)where the first integral on the right-hand side is interpreted in the sense of It¯o . The term ∆ N s represents the jump inthe process N t occurring at time s (if any), accounting for large jumps in the process L ( α, t . 
Then ΔN_s is distributed as a compound Poisson increment over the infinitesimal time step dt, with rate and measure as in (C.13). When α = 2, the jump rate λ = 0 (i.e. ΔN_t = 0 for all t, and the sum in (C.15) vanishes). Thus (C.15) reduces, with η² = 2, to the sum of the Itô integral and the Wong–Zakai correction, which is equivalent to the Stratonovich integral [34]. When α = 2 and κ is constant, (C.15) reduces to the Itô integral, as expected.

To simulate (C.6) with β = 0 and time step δt, we use the numerical scheme

    ẑ_{n+1} = ẑ_n + (H(ẑ_n) + (η²/2) κ(ẑ_n) κ′(ẑ_n)) δt + η κ(ẑ_n) δW_n + [θ̂(1; δN_n, ẑ_n) − ẑ_n],   (C.16)

where δW_n ∼ N(0, δt) and the δN_n are discrete increments of a compound Poisson process over the time step δt, with rate and measure given by (C.13), which can be simulated as in [7]. The term θ̂(1; δN_n, ẑ_n) = θ̂_M is the numerical approximation to θ(1; δN_n, ẑ_n), where

    θ̂_{j+1} = θ̂_j + δN_n κ(θ̂_j) δu,   θ̂_0 = ẑ_n,   (j = 0, 1, …, M − 1,   δu = 1/M).   (C.17)

This numerical method for simulating z_t when θ̂ must be evaluated numerically is similar to the tau-leaping methods given in [24] for evaluating Marcus integrals, and a more explicit description can be found therein (see also [26] and [44]). Our numerical method converges weakly; more technical details related to its convergence are discussed in [7, 13]. Here we have written the numerical solutions (C.10), (C.16), and (C.17) using the Euler method for the drift terms, but higher-order methods, such as the explicit trapezoidal method, could be used to integrate the drift terms numerically.

C.2 Estimating the PDFs
To ensure accurate estimation of the tails of the sample PDFs, we estimate a value N for the size of the sample of the time-varying processes that provides reasonable confidence in the estimates for the PDFs. This value of N is based on a calculation of estimated confidence intervals for the tails of the PDFs. Here N is equal to the length of the time interval of the simulation divided by Dt, so that in our calculations we have N discrete observations of the process z_t sampled at intervals of Dt. However, in order to obtain an approximation for the confidence intervals of the tails of the PDF, we use approximations based on independent observations, so we require a subsample Z_j, j = 1, …, N_Z of z_t which has N_Z almost-independent observations. Given the sampling time step Dt and the fact that the characteristic time scale of the slow variable is O(1) (as can be seen by inspection of the AFs in Figs. 4 and 6), we subsample z_t once every 1/Dt points to ensure that observations of Z are sufficiently independent. Then N is determined as N_Z × (1/Dt), with N_Z determined from confidence intervals for the PDF of Z.

To obtain confidence intervals for the tails of the PDFs, we discretize the state space of Z into n disjoint bins, where the true probability of observing Z_j within bin i is π_i. Under these assumptions, the number of observations of Z_j in the n discrete bins is given by a multinomial distribution with N_Z observations and probabilities {π_i}_{i=1}^n (Σ_{i=1}^n π_i = 1). The π_i for the bins that correspond to the tail of the density are very small, yet we want to ensure that N_Z (and thus N) is sufficiently large to obtain a good approximation for the tails of the density in our simulations. Therefore, we derive an expression for the confidence intervals of π_i based on N_Z and n. Specifying that small tail probabilities π_i fall within a narrow 95% confidence interval about π_i, we obtain a minimum value of N_Z.
First, we approximate the multinomial distribution for the number of observations by a multivariate normal distribution, which is justified on the basis of large N_Z. We then apply the confidence interval suggested by Quesenberry and Hurst with Goodman's recommendation [31] and take the limit N_Z → ∞ to obtain the following approximate two-sided 95% confidence interval for π_i:

    P[π̂_i − Ω_i < π_i < π̂_i + Ω_i] ≈ 0.95,   (C.18)

    π̂_i = #{ j : Z_j ∈ bin i } / N_Z,   (C.19)

    Ω_i = sqrt( M_Z π̂_i (1 − π̂_i) / N_Z ),   M_Z = χ²_{1, 1−0.05/n},   (C.20)

where χ²_{m, 1−c} denotes the (1 − c) quantile of the χ² distribution with m degrees of freedom. To get a conservative estimate of the value of Ω_i we might expect for our simulations, we choose n = 100, so that M_Z = χ²_{1, 1−0.05/100} ≈ 12 = O(10). We wish to have narrow confidence intervals in which π̂_i lies close to the small tail probability π_i, which implies that Ω_i = O(π_i) and that N_Z satisfies

    N_Z = M_Z π_i / Ω_i².   (C.21)

Since N = N_Z/Dt, we conclude that N = M_Z π_i/(Ω_i² Dt) points with time step Dt are needed in the full simulation to resolve the characteristic time scale, the sampling of z_t, and the subsampling of Z.

C.3 Estimating the autocodifference function

Given a time-dependent stochastic process z_t, the autocodifference of z_t, A_z(τ), is defined by (3.5) [39, 45]. To numerically estimate A_z(τ) we need to estimate both the CF of z_t, Φ_z(k) = E[exp(ikz_t)], and the joint CF of z_t and a time-shifted version z_{t+τ}, Ψ_z(k, l, τ) = E[exp(ikz_{t+τ} + ilz_t)]. Assume we have N observations of z_t, which we denote by ẑ_n = z_{n·Dt}, n = 0, 1, …, N − 1. The estimates for Φ_z(k), Ψ_z(k, l, n·Dt) are given by

    Φ̂_z^{[J,K]}(k) = (1/(K − J + 1)) Σ_{j=J}^{K} exp(ik ẑ_j),   Ψ̂_z^{[J,K]}(k, l, n) = (1/(K − J + 1)) Σ_{j=J}^{K} exp(ik ẑ_{j+n} + il ẑ_j),   (C.22)

respectively, where J, K are integers satisfying 0 ≤ J ≤ K ≤ N − 1 − n. The estimate for A_z(n·Dt), which we denote by Â_z(n·Dt), can be computed at discrete time displacements n·Dt using (C.22) and is given by

    Â_z(n·Dt) = log( Ψ̂_z^{[0,N−1−n]}(1, −1, n) ) − log( Φ̂_z^{[n,N−1]}(1) ) − log( Φ̂_z^{[0,N−1−n]}(−1) ).   (C.23)

This estimator is reasonably accurate for short times where n ≪ N, as there are N − n pairs of data points with relative time displacement n·Dt, and by the law of large numbers Ψ̂, Φ̂ given in (C.22) are then reasonable estimators. For our study, we consider only n ≪ N, and so A_z(n·Dt) is well estimated.

D Acknowledgements
The authors wish to acknowledge the generous support of NSERC (the Natural Sciences and Engineering Research Council of Canada). The authors thank Dr. I. Pavlyukevich (Univ. Jena) for bringing the Marcus integral to our attention and for helpful advice. WFT also thanks Dr. J. Walsh (UBC) for helpful discussions.

References

[1]
D. Applebaum, Lévy Processes and Stochastic Calculus, Cambridge University Press, 2004. [2]
L. Arnold, P. Imkeller, and Y. Wu, Reduction of deterministic coupled atmosphere-ocean models to stochastic ocean models: a numerical case study of the Lorenz–Maas system, Dyn. Syst., 18 (2003), pp. 295–350. [3]
S. Asmussen and P. W. Glynn, Stochastic Simulation: Algorithms and Analysis, Springer, 2007. [4]
A. N. Borodin, A limit theorem for solutions of differential equations with random right-hand side, Theory Probab. Appl., 22 (1977), pp. 482–497. [5]
A. S. Chaves, A fractional diffusion equation to describe Lévy flights, Phys. Lett. A, 239 (1998), pp. 13–16. [6]
A. V. Chechkin and I. Pavlyukevich , Marcus versus Stratonovich for systems with jump noise , J. Phys. A,47 (2014), p. 342001.[7]
R. Cont and P. Tankov , Financial Modelling with Jump Processes , Taylor & Francis, 2004.[8]
P. Ditlevsen, Observation of α-stable noise induced millennial climate changes from an ice-core record, Geophys. Res. Lett., 26 (1999), pp. 1441–1444. [9]
B. Dybiec, E. Gudowska-Nowak, and P. Hänggi, Lévy–Brownian motion on finite intervals: Mean first passage time analysis, Phys. Rev. E, 73 (2006), p. 046104. [10]
W. Feller , An Introduction to Probability Theory and Its Applications: Vol. II, 2nd Ed. , John Wiley & Sons,1966.[11]
M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, Springer-Verlag, 1984. [12]
D. Givon, R. Kupferman, and A. Stuart , Extracting macroscopic dynamics: model problems and algorithms ,Nonlinearity, 17 (2004), pp. 55–127.[13]
M. Grigoriu, Numerical solution of stochastic differential equations with Poisson and Lévy white noise, Phys. Rev. E, 80 (2009), p. 026704. [14]
K. Hasselmann, Stochastic climate models: Part I. Theory, Tellus, 28 (1976), pp. 473–485. [15]
C. Hein, P. Imkeller, and I. Pavlyukevich, Limit theorems for p-variations of solutions of SDEs driven by additive stable Lévy noise and model selection for paleo-climatic data, in Recent Development in Stochastic Dynamics and Stochastic Analysis, J. Duan, S. Luo, and C. Wang, eds., vol. 8 of Interdiscip. Math. Sci., 2009, pp. 137–150. [16] E. J. Hinch, Perturbation Methods, Cambridge University Press, 1991. [17]
M. Huber, J. C. McWilliams, and M. Ghil, A climatology of turbulent dispersion in the troposphere, J. Atmos. Sci., 58 (2001), pp. 2377–2394. [18]
W. Just, K. Gelfert, N. Baba, A. Riegert, and H. Kantz, Elimination of fast chaotic degrees of freedom: on the accuracy of the Born approximation, J. Stat. Phys., 112 (2003), pp. 277–292. [19]
W. Just, H. Kantz, C. Rodenbeck, and M. Helm , Stochastic modelling: Replacing fast degrees of freedomby noise , J. Phys. A, 34 (2001).[20]
R. Z. Khas’minskii , A limit theorem for the solutions of differential equations with random right-hand sides ,Theory Probab. Appl., 11 (1966), pp. 390–406.[21] ,
On Stochastic Processes defined by differential equations with a small parameter , Theory Probab. Appl.,11 (1966), pp. 211–228.[22]
Y. Kifer, L² diffusion approximation for slow motion in averaging, Stoch. Dyn., 3 (2003), pp. 213–246. [23] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, 1992. [24]
T. Li, B. Min, and Z. Wang, Marcus canonical integral for non-Gaussian processes and its computation: Pathwise simulation and tau-leaping algorithm, J. Chem. Phys., 138 (2013), p. 104118. [25] ,
Adiabatic elimination for systems with inertia driven by compound Poisson colored noise, Phys. Rev. E, 89 (2014), p. 022144. [26] ,
Erratum: “Marcus canonical integral for non-Gaussian processes and its computation: Pathwise simulation and tau-leaping algorithm” [J. Chem. Phys., 138, 104118 (2013)], J. Chem. Phys., 140 (2014). [27]
A. Majda, I. Timofeyev, and E. Vanden-Eijnden , A mathematical framework for stochastic climate models ,Comm. Pure and Appl. Math., 54 (2001), pp. 891–974.[28] ,
A mathematical framework for stochastic climate models , Phys. D, 170 (2002), pp. 206–252.[29] ,
Systematic strategies for stochastic mode reduction in climate , J. Atmos. Sci., 60 (2003), pp. 1705–1722.[30]
S. Marcus , Modeling and analysis of stochastic differential equations driven by point processes , IEEE Trans.Inform. Theory, IT-24 (1978), pp. 164–172.[31]
R. G. Miller , Simultaneous Statistical Inference , Springer-Verlag, 1981.[32]
A. H. Monahan and J. Culina , Stochastic averaging of idealized climate models , J. Clim., 28 (2011), pp. 3068–3088.[33]
J. P. Nolan , Numerical calculation of stable densities and distribution functions , Comm. Statist. Stoch. Models.,13 (1997), pp. 759–774.[34]
B. Øksendal, Stochastic Differential Equations, Springer-Verlag, 2003. [35]
T. N. Palmer, F. J. Doblas-Reyes, A. Weisheimer, and M. J. Rodwell, Toward seamless prediction: Calibration of climate change projections using seasonal forecasts, Bull. Amer. Meteor. Soc., 89 (2008), pp. 459–470. [36]
G. C. Papanicolaou and W. Kohler , Asymptotic theory of mixing stochastic ordinary differential equations ,Comm. Pure and Appl. Math., 27 (1974), pp. 641–668.[37]
G. A. Pavliotis and A. M. Stuart, Multiscale Methods: Averaging and Homogenization, Springer, 2007. [38]
P. Protter , Stochastic Integration and Differential Equations , Springer-Verlag, 1990.[39]
D. Rosadi and M. Deistler, Estimating the codifference function of linear time series models with infinite variance, Metrika, 73 (2009), pp. 395–429. [40]
B. Saltzman , Dynamical Paleoclimatology , Academic Press, 2002.[41]
K.-H. Seo and K. P. Bowman, Lévy flights and anomalous diffusion in the stratosphere, J. Geophys. Res., 105 (2000), pp. 295–302. [42]
T. H. Solomon, E. R. Weeks, and H. L. Swinney, Chaotic advection in a two-dimensional flow: Lévy flights and anomalous diffusion, Phys. D, 76 (1994), pp. 70–84. [43]
T. Srokowski, Correlated Lévy noise in linear dynamical systems, Acta Phys. Polon. B, 42 (2011), pp. 3–19. [44]
X. Sun, J. Duan, and X. Li, An alternative expression for stochastic dynamical systems with parametric Poisson white noise, Probab. Eng. Mech., 32 (2013), pp. 1–4. [45]
G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, CRC Press, 1994. [46]
M. Veillette, STBL: Alpha stable distributions for MATLAB, 2012. Available at [47]