Averaging in the case of multiple invariant measures for the fast system
M. Freidlin*, L. Koralov†

Abstract
We consider the averaging principle for deterministic or stochastic systems with a fast stochastic component (a family of continuous-time Markov chains depending on the state of the system as a parameter). We show that, due to bifurcations in the simplex of invariant probability measures of the chains, the limiting system should be considered on a graph or on an open book with certain gluing conditions at the vertices of the graph (or on the bifurcation surface).
1 Introduction

Consider the $d$-dimensional continuous stochastic process $z^\varepsilon_t$ satisfying the equation
$$dz^\varepsilon_t = v(\xi^\varepsilon_t, z^\varepsilon_t)\,dt + \kappa\,dW_t, \qquad 0 < \varepsilon \ll 1. \qquad (1)$$
We assume that $v$ is sufficiently smooth, and $\xi^\varepsilon_t = \xi_{t/\varepsilon}$, where $\xi_t$ is a stationary process with sufficiently good mixing properties, such as a non-degenerate diffusion on a compact manifold or a continuous-time Markov chain on a finite state space (we consider the latter case in this paper). The Wiener process $W_t$ is independent of $\xi^\varepsilon_t$. The coefficient $\kappa$ is non-negative.

Put $\bar v(z) = \mathrm{E}\,v(\xi_t, z)$. Then (see, for example, [6], Section 7.2)
$$z^\varepsilon_t \to \bar z_t \quad \text{as } \varepsilon \downarrow 0 \qquad (2)$$
(convergence, in distribution, of the processes), where $\bar z_t$ is the solution of the equation
$$d\bar z_t = \bar v(\bar z_t)\,dt + \kappa\,dW_t \qquad (3)$$
with the same initial condition as $z^\varepsilon_t$.

* Dept of Mathematics, University of Maryland, College Park, MD 20742, [email protected]
† Dept of Mathematics, University of Maryland, College Park, MD 20742, [email protected]

The convergence of $z^\varepsilon_t$ to $\bar z_t$ is preserved if the process $\xi_t$ is not stationary but converges with probability one to a stationary ergodic process $\tilde\xi_t$. In this case, $\bar v(z) = \mathrm{E}\,v(z, \tilde\xi_t)$. Moreover, the fast component $\xi^\varepsilon_t$ in (1) can depend on the slow component. In order to illustrate this point, let us focus on the case when the fast motion is governed by a continuous-time Markov chain $\Xi^z_t$ on the finite state space $\{1, \ldots, n\}$. The transition rates for the chain $\Xi^z_t$, which depends on the parameter $z \in \mathbb{R}^d$, will be denoted by $q_{ij}(z) \ge 0$, $1 \le i, j \le n$, $i \ne j$.

Intuitively, the slow motion $z^\varepsilon_t$ is governed, at short time scales, by (1) with $\xi^\varepsilon_t = \xi_{t/\varepsilon}$ replaced by $\Xi^z_{t/\varepsilon}$. Yet, we cannot simply say that $\Xi^z_{t/\varepsilon}$ is the fast component of the process, since $z$ itself evolves (although slowly) in time. The fast-slow system $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$ can be defined constructively (as in Section 2) or by describing its generator. Namely, for $1 \le i \le n$, consider the operators
$$L_i u(z) = \frac{\kappa^2}{2}\,\Delta u(z) + v(i, z)\,\nabla u(z),$$
where $u$ is a function defined on $\mathbb{R}^d$. These operators would govern the evolution of the slow component for the fixed value $i$ of the fast component in the absence of the fast motion. The second-order term, the Laplacian in our case, could also be a more general operator, allowing for more general diffusion in the slow variable. To account for the fast component, we define the operator
$$A^\varepsilon f(i, z) = \frac{1}{\varepsilon}\Big(\sum_{j \ne i} q_{ij}(z)\,(f(j, z) - f(i, z))\Big) + L_i f(i, z),$$
where $f$ is a function on $\{1, \ldots, n\} \times \mathbb{R}^d$. This operator, with a properly specified domain, is the generator of the process $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$.

If $q_{ij}(z) > 0$ for all $i \ne j$, then the process $\Xi^z_t$ has a unique invariant distribution $\mu(z) = (\mu_1(z), \ldots, \mu_n(z))$, and (2) holds with $\bar v(z) = \sum_{i=1}^n \mu_i(z)\,v(i, z)$. Assume now that there is a closed domain $G$ with a smooth boundary such that the chain $\Xi^z_t$ is ergodic for $z \notin G$ and has, say, two ergodic components $R_1 = \{1, \ldots, m\}$ and $R_2 = \{m+1, \ldots, n\}$ for $z \in G$. Thus the transitions between $R_1$ and $R_2$ are impossible while $z^\varepsilon_t \in G$. Then one can expect that, as long as $z^\varepsilon_t$ remains in $G$, it converges, as $\varepsilon \downarrow 0$, to the solution of (3) either with $\bar v(z) = \sum_{i \in R_1} \mu_i(z) v(i, z) / \sum_{i \in R_1} \mu_i(z)$ or with $\bar v(z) = \sum_{i \in R_2} \mu_i(z) v(i, z) / \sum_{i \in R_2} \mu_i(z)$, depending on whether the fast component evolves in $R_1$ or $R_2$. Note that, while the invariant distribution is not determined uniquely for $z \in G$, the above expressions for $\bar v$ are.

The process $z^\varepsilon_t$ can go from $G$ to $\mathbb{R}^d \setminus G$ and vice versa in finite time. Therefore, in order to define the limiting process, one should describe the behavior of the process in an infinitesimal neighborhood of $\partial G$. The novelty of the current work is that, in the presence of multiple invariant measures for the fast process, the limiting motion for the slow component is (and needs to be) considered on a graph (or an open book, if $d > 1$) with appropriate gluing conditions.

Let us also note that solutions of various PDE problems involving the operator $A^\varepsilon$ can be written as expectations of certain functionals of the process $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$. This allows one to calculate the asymptotics of solutions to those PDE problems using the results for the process $X^\varepsilon_t$ and vice versa. One can also apply the probabilistic results to certain non-linear PDE problems related to the process. For example, certain problems for reaction-diffusion systems can be considered in this way (compare with [3], Chapters 5-7).

Finally, we note that the problem considered in this paper can be viewed as a problem concerning the long-time influence of small perturbations: the process $\tilde X^\varepsilon_t = X^\varepsilon_{\varepsilon t}$ starting at $(i, z)$ can be viewed as a small perturbation of the process $\tilde X_t$ whose first component is $\Xi^z_t$ starting at $i$ and whose second component $z \in \mathbb{R}^d$ does not evolve in time. A general approach to the study of the long-time influence of perturbations (see [4], [5]) is to consider the projection of $X^\varepsilon_t = \tilde X^\varepsilon_{t/\varepsilon}$ onto the simplex of invariant probability measures of the unperturbed process.
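The averaging formulas above are easy to illustrate numerically. The following is a minimal sketch (the three-state chain, its rates, and the velocities are hypothetical choices for illustration, not taken from the paper): at a fixed value of the parameter $z$, it computes the invariant distribution $\mu(z)$ of the chain from its generator matrix and then the averaged drift $\bar v(z) = \sum_i \mu_i(z)\,v(i, z)$.

```python
import numpy as np

# Hypothetical 3-state chain at a fixed parameter value z:
# q[i][j] is the transition rate i -> j for i != j.
q = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
v = np.array([0.5, -1.0, 2.0])  # hypothetical velocities v(i, z)

# Generator matrix: off-diagonal entries q_ij, diagonal entries -Q_i.
Q = q - np.diag(q.sum(axis=1))

# The invariant distribution mu solves mu Q = 0 with sum(mu) = 1; replace
# one balance equation by the normalization and solve the linear system.
A = np.vstack([Q.T[:-1], np.ones(3)])
b = np.zeros(3)
b[-1] = 1.0
mu = np.linalg.solve(A, b)

# Averaged drift: \bar v(z) = sum_i mu_i(z) v(i, z).
v_bar = mu @ v
```

Since all rates here are positive, the chain is ergodic and $\mu$ is unique; this is exactly the situation in which (2) holds with $\bar v(z) = \sum_i \mu_i(z) v(i, z)$.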
In the case when the unperturbed process is $\tilde X_t$, the set $M_{\mathrm{erg}}$ of the extreme points of the simplex (ergodic invariant measures) consists of the measures of the form $\mu(z) \times \delta_z$ (where $z \notin G$ and $\mu(z)$ is the invariant measure for $\Xi^z_t$) and of the measures of the form $\mu^1(z) \times \delta_z$ and $\mu^2(z) \times \delta_z$ (where $z \in G$ and $\mu^1(z)$, $\mu^2(z)$ are invariant for $\Xi^z_t$ on $R_1$ and $R_2$, respectively). The projection of a point $(i, z)$, $i \in \{1, \ldots, n\}$, $z \in \mathbb{R}^d$, from the phase space of $X^\varepsilon_t$ onto $M_{\mathrm{erg}}$ is $\mu(z) \times \delta_z$ (if $z \notin G$), or $\mu^1(z) \times \delta_z$ (if $z \in G$, $i \in R_1$), or $\mu^2(z) \times \delta_z$ (if $z \in G$, $i \in R_2$). Note that $M_{\mathrm{erg}}$ can be parametrized by the set of pairs $(l, z)$, $z \in \mathbb{R}^d$, with $l \in \{1, 2\}$ if $z \in G$ and $l = 0$ if $z \notin G$. The main result of the paper is that the projection of $X^\varepsilon_t$ onto $M_{\mathrm{erg}}$ converges to a Markov process on $M_{\mathrm{erg}}$.

2 The fast-slow process

In this section, we'll introduce the fast-slow system $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$. (Sometimes we'll write $X^{x,\varepsilon}_t$ to indicate the dependence on the initial position $x$.) The fast component $\xi^\varepsilon_t$ evolves as a Markov chain whose transition rates depend on the slow variable. The slow component $z^\varepsilon_t$ solves an ODE or an SDE with the right-hand side depending on the fast variable. Namely, let $q_{ij}(z) \ge 0$, $1 \le i, j \le n$, $i \ne j$, be a family of transition rates for a Markov chain $\Xi^z_t$ that depends on the parameter $z \in \mathbb{R}$. Each of the functions $q_{ij}(z)$ is assumed to be continuous.

We assume that $\Xi^z_t$ is ergodic for each $z < 0$, while there are two ergodic classes $R_1 = \{1, \ldots, m\}$ and $R_2 = \{m+1, \ldots, n\}$ for $z > 0$. More precisely, let us assume that $q_{ij}(z) > 0$ for $i \ne j$ when $z < 0$, while, for $z \ge 0$, $q_{ij}(z) > 0$ if and only if $i, j \in R_1$ or $i, j \in R_2$. Moreover, we assume that the differences $q_{ij}(z) - q_{ij}(0)$, $i \ne j$, degenerate at the same rate as $z \uparrow 0$; namely, there are positive constants $\bar q_{ij}$, a function $\varphi : (-\infty, 0) \to (0, \infty)$ with $\lim_{z \uparrow 0} \varphi(z) = 0$, and functions $\beta_{ij} : (-\infty, 0) \to \mathbb{R}$ with $\lim_{z \uparrow 0} \beta_{ij}(z) = 0$ such that
$$q_{ij}(z) - q_{ij}(0) = \bar q_{ij}\,\varphi(z)\,(1 + \beta_{ij}(z)), \qquad z < 0, \ i \ne j.$$

Let $\mu_i(z)$, $1 \le i \le n$, $z \in \mathbb{R}$, be the invariant distribution of the Markov chain $\Xi^z_t$. This is not determined uniquely for $z \ge 0$. Put $\pi_i = \lim_{z \uparrow 0} \mu_i(z)$, and, for $z \ge 0$, select the unique invariant distribution such that the $\mu_i(z)$ are continuous functions on $\mathbb{R}$. Define
$$\pi_1 = \sum_{i \in R_1} \pi_i, \qquad \pi_2 = \sum_{i \in R_2} \pi_i.$$
Let $v(i, z)$, $1 \le i \le n$, $z \in \mathbb{R}$, be Lipschitz-continuous in $z$ for each $i$. Define
$$\bar v(z) = \sum_{i=1}^n v(i, z)\,\mu_i(z), \qquad z < 0,$$
$$\bar v_1(z) = \frac{1}{\pi_1} \sum_{i \in R_1} v(i, z)\,\mu_i(z), \qquad \bar v_2(z) = \frac{1}{\pi_2} \sum_{i \in R_2} v(i, z)\,\mu_i(z), \qquad z \ge 0.$$
We'll assume that $v(i, z) > 0$ for all $(i, z)$ (this assumption is not required if there is diffusion in the slow variable (the case $\kappa = 1$ below)). Let us make a simplifying assumption about the behavior of the coefficients at infinity. Namely, we will assume that there is $C > 0$ such that $q_{ij}(z) = q^l_{ij}$ for $z \le -C$ and $q_{ij}(z) = q^r_{ij}$ for $z \ge C$, where $q^l_{ij}$, $q^r_{ij}$ do not depend on $z$. Moreover, let us assume that $v(i, z) = v_\infty$ for some $v_\infty$ for all $1 \le i \le n$, $|z| \ge C$. These assumptions can be relaxed significantly; however, this will not concern us since we would like to focus on the behavior of the process near $z = 0$.

The slow component $z^\varepsilon_t$ is assumed to be continuous and to satisfy
$$dz^\varepsilon_t = v(\xi^\varepsilon_t, z^\varepsilon_t)\,dt + \kappa\,dW_t$$
at the points of continuity of $\xi^\varepsilon_t$. Here $\kappa = 0$ or $\kappa = 1$ (we'll consider two cases resulting in two different types of limiting behavior). The fast component, intuitively, evolves as the Markov chain $\Xi^z_t$ (with $z = z^\varepsilon_t$), sped up by the factor $1/\varepsilon$. However, since $z$ itself evolves in time, we need a more formal definition of the process $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$. Namely, the process starts at $x = (i, z) \in M$, and moves along the $z$-axis during a random time interval $[0, \sigma)$. For $t \in [0, \sigma)$, $z^{x,\varepsilon}_t$ solves
$$dz_t = v(i, z_t)\,dt + \kappa\,dW_t.$$
At a random time $\sigma$, the process $X^{x,\varepsilon}_t$ jumps to a random location $(j, z_\sigma)$. The distribution of $\sigma$ is determined as follows. Let $Q_i(z) = \sum_{j \ne i} q_{ij}(z)$ and $r(t) = \varepsilon^{-1} \int_0^t Q_i(z_s)\,ds$. Then $\sigma$ is chosen in such a way that $r(\sigma)$ is exponentially distributed with parameter one.
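The constructive definition just given is straightforward to turn into a simulation. Below is a rough Euler-scheme sketch for the case $\kappa = 0$ (the two-state chain, its rates $q$, and the velocities $v$ are hypothetical choices, not from the paper): between jumps, $z$ follows $dz = v(i, z)\,dt$, and a jump occurs when the time-changed clock $r(t) = \varepsilon^{-1}\int_0^t Q_i(z_s)\,ds$ reaches an independent exponential threshold with parameter one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2  # number of states of the (hypothetical) fast chain

def q(i, j, z):
    """Hypothetical transition rate i -> j; positive for i != j."""
    return 1.0 + 0.5 * np.tanh(z) if i != j else 0.0

def v(i, z):
    """Hypothetical velocities v(i, z) of the slow component (kappa = 0)."""
    return 1.0 if i == 0 else 2.0

def simulate(i, z, eps, T, dt=1e-3):
    # Between jumps, z solves dz = v(i, z) dt; the jump time sigma is the
    # moment the clock r(t) = (1/eps) * int_0^t Q_i(z_s) ds reaches an
    # independent Exp(1) threshold, matching the constructive definition.
    t, r, thresh = 0.0, 0.0, rng.exponential(1.0)
    while t < T:
        z += v(i, z) * dt                        # Euler step for the slow motion
        Qi = sum(q(i, j, z) for j in range(n) if j != i)
        r += Qi * dt / eps
        if r >= thresh:                          # jump: pick j with prob q_ij/Q_i
            p = np.array([q(i, j, z) if j != i else 0.0 for j in range(n)])
            i = int(rng.choice(n, p=p / p.sum()))
            r, thresh = 0.0, rng.exponential(1.0)
        t += dt
    return i, z

i_T, z_T = simulate(i=0, z=-1.0, eps=0.01, T=1.0)
```

For small $\varepsilon$, the fast state jumps many times per unit of slow time, and its occupation fractions over short windows approximate the invariant distribution of the frozen chain, which is the mechanism behind the averaging in (2)-(3).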
Given the value of $\sigma$, the probability that $X^{x,\varepsilon}_t$ jumps to $(j, z_\sigma)$ is $q_{ij}(z_\sigma)/Q_i(z_\sigma)$. Having identified the location of the process at time $\sigma$, we treat it as a new starting point and select a new (random) time interval for the jump-free motion of the process independently of the past. The construction then continues inductively. It is clear that the process just described is an RCLL Markov process.

The process $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$ could be defined, equivalently, through its generator, using the Hille-Yosida theorem. We discuss the Hille-Yosida theorem and the generator of $X^{x,\varepsilon}_t$ next since, in any case, a similar construction will be used to define the limiting process when $\kappa = 1$.

Let $M$ be a separable locally compact metric space and $C(M)$ the space of continuous functions on $M$ that tend to zero at infinity (i.e., can be made arbitrarily close to zero outside a sufficiently large compact). The space $C(M)$ is endowed with the supremum norm. Let $P(t, x, B)$ be a Markov transition function (a priori not assumed to be conservative) on $M$. For $f \in C(M)$, let
$$(T_t f)(x) = \int_M f(x')\,P(t, x, dx'), \qquad t \ge 0.$$
We'll say that $P$ satisfies condition C if $T_t f \in C(M)$ for each $f \in C(M)$. Recall that $P$ is said to be stochastically continuous if $\lim_{t \downarrow 0} P(t, x, U) = 1$ for each open neighborhood $U$ of $x$.

Theorem 2.1. [Hille-Yosida] ([7], page 365).
Suppose that a linear operator $A$ on $C(M)$ has the following properties:
(a) The domain $D(A)$ is dense in $C(M)$;
(b) If $f \in D(A)$, $f(x_0) \ge 0$, and $f(x_0) \ge f(x)$ for all $x \in M$, then $Af(x_0) \le 0$;
(c) For every $\psi \in C(M)$ and every $\lambda > 0$, there exists a solution $f \in D(A)$ of the equation $\lambda f - Af = \psi$.
Then the operator $A$ is the infinitesimal generator of a semigroup $T_t$, $t \ge 0$, on $C(M)$ that is defined by a stochastically continuous Markov transition function satisfying condition C. The transition function with such properties is determined uniquely.

The Hille-Yosida theorem can be applied to the space $M = \{1, \ldots, n\} \times \mathbb{R}$. Let us define the linear operator $A^\varepsilon$ in $C(M)$. In the case $\kappa = 0$, the domain of $A^\varepsilon$, denoted by $D(A^\varepsilon)$, consists of all functions $f \in C(M)$ such that $f'(i, \cdot) \in C(\mathbb{R})$ for each $i$. For $f \in D(A^\varepsilon)$, we define
$$A^\varepsilon f(i, z) = \frac{1}{\varepsilon}\Big(\sum_{j \ne i} q_{ij}(z)\,f(j, z) - Q_i(z)\,f(i, z)\Big) + v(i, z)\,f'(i, z).$$
In the case when $\kappa = 1$, the domain of $A^\varepsilon$ consists of all functions $f \in C(M)$ such that $\frac{1}{2} f''(i, \cdot) + v(i, \cdot)\,f'(i, \cdot) \in C(\mathbb{R})$ for each $i$. For $f \in D(A^\varepsilon)$, we define
$$A^\varepsilon f(i, z) = \frac{1}{\varepsilon}\Big(\sum_{j \ne i} q_{ij}(z)\,f(j, z) - Q_i(z)\,f(i, z)\Big) + \frac{1}{2} f''(i, z) + v(i, z)\,f'(i, z).$$
In both cases, it is possible to show that the conditions of the Hille-Yosida theorem are satisfied. (We skip the details since, in any case, the process was already defined constructively.) Let $P^\varepsilon(t, x, dx')$ be the corresponding Markov transition function, and let $T^\varepsilon_t$, $t \ge 0$, be the corresponding semigroup on $C(M)$. Take a sequence of functions $f_n \in D(A^\varepsilon)$ with values in $[0, 1]$ with compact support such that $f_n(i, z) = 1$ for $|z| \le n$ and $\|A^\varepsilon f_n\|_{C(M)} \le 1/n$. The existence of such a sequence is easily justified once we recall that the coefficients of $A^\varepsilon$ are constant for sufficiently large $|z|$.

Since $A^\varepsilon$ is the infinitesimal generator of the semigroup $T^\varepsilon_t$, we have (see Theorem I.1 of [8]), for $f \in D(A^\varepsilon)$,
$$T^\varepsilon_t f - f = \int_0^t T^\varepsilon_s A^\varepsilon f\,ds. \qquad (4)$$
Therefore,
$$T^\varepsilon_t f_n(x) - f_n(x) = \int_0^t T^\varepsilon_s A^\varepsilon f_n(x)\,ds \to 0 \quad \text{as } n \to \infty,$$
which implies that $T^\varepsilon_t f_n(x) \to 1$, and therefore $P^\varepsilon(t, x, \cdot)$ is a probability measure. Let $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$, $x = (i, z) \in M$, be the corresponding Markov family. A modification of $X^{x,\varepsilon}_t$ can be chosen with trajectories that are right-continuous and have left limits ([7], page 348).

Rewrite (4) as
$$\mathrm{E} f(X^{x,\varepsilon}_t) - f(x) = \mathrm{E} \int_0^t (A^\varepsilon f)(X^{x,\varepsilon}_s)\,ds.$$
Since $X^{x,\varepsilon}_t$ is an RCLL Markov process, for each $x \in M$ the process $f(X^{x,\varepsilon}_t) - f(x) - \int_0^t (A^\varepsilon f)(X^{x,\varepsilon}_s)\,ds$ is an RCLL martingale, and, for each stopping time $\tau$ with $\mathrm{E}\tau < \infty$, we get
$$\mathrm{E} f(X^{x,\varepsilon}_\tau) - f(x) = \mathrm{E} \int_0^\tau (A^\varepsilon f)(X^{x,\varepsilon}_s)\,ds. \qquad (5)$$
Recall that we earlier defined the process $X^{x,\varepsilon}_t$ constructively, without referring to the Hille-Yosida theorem. It is easily verified directly that the generator of this process coincides with $A^\varepsilon$ on $D(A^\varepsilon)$. The Markov transition function of the process is stochastically continuous and satisfies condition C. At the same time, by (4), the semigroup is defined uniquely by the values of the generator on a dense set, and thus the generator of the constructively defined process is $A^\varepsilon$ (rather than a non-trivial extension).

3 The limiting process

Let us describe the appropriate space and the limiting process on it for the fast-slow system $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$. Let $I_0 = (-\infty, 0]$, $I_1 = \{1\} \times [0, \infty)$, $I_2 = \{2\} \times [0, \infty)$. These are three half-lines, with $I_1$ and $I_2$ distinguished by a label. We'll identify the ends of $I_0$, $I_1$, and $I_2$, thus obtaining a graph, denoted by $S$, with three semi-infinite edges with a common vertex, which will be denoted $O$. Each point $y = (l, z) \in S$ is determined by the label of the edge $l \in \{0, 1, 2\}$ and the coordinate $z$, where $z \in (-\infty, 0]$ for $l = 0$ and $z \in [0, \infty)$ for $l = 1, 2$.

First, consider the case $\kappa = 0$. The process $Y^y_t$ starting at $y = (l, z) \in S$ will move deterministically with the variable speed $\bar v$ on $I_0$, $\bar v_1$ on $I_1$, and $\bar v_2$ on $I_2$. For $y \in I_0$, we still need to describe the behavior of $Y^y_t$ once the process reaches $O$. The behavior at $O$ is random: the process proceeds to $I_1$ and $I_2$ with probabilities
$$p_1 = \frac{\sum_{i \in R_1} \pi_i v_i}{\sum_{i \in R_1} \pi_i v_i + \sum_{i \in R_2} \pi_i v_i} \qquad \text{and} \qquad p_2 = \frac{\sum_{i \in R_2} \pi_i v_i}{\sum_{i \in R_1} \pi_i v_i + \sum_{i \in R_2} \pi_i v_i},$$
respectively, where $v_i = v(i, 0)$.

Now consider the case $\kappa = 1$. The process $Y^y_t$ is a diffusion inside each of the edges. However, a gluing condition is needed to describe the behavior of the process once it reaches the vertex. Thus, it is most convenient to define the process via its generator. The domain of $A$, denoted by $D(A)$, consists of all functions $f \in C(S)$ such that:
(a) $\frac{1}{2} f''(l, \cdot) + \bar v_l(\cdot)\,f'(l, \cdot) \in C(S)$ (with $\bar v_0 = \bar v$), i.e., the differential operator can be applied to $f$ inside each of the edges, and the resulting function can be extended to the vertex $O$ so that it becomes an element of $C(S)$;
(b) There are one-sided derivatives $f'(l, 0)$ at the vertex, and
$$f'(0, 0) = \pi_1 f'(1, 0) + \pi_2 f'(2, 0). \qquad (6)$$
It is not difficult to verify that the conditions of the Hille-Yosida theorem are satisfied and that the resulting Markov transition function, denoted by $P(t, x, B)$, is a probability measure as a function of $B$. Let $Y^y_t$, $y \in S$, be the corresponding Markov family and $T_t$ the corresponding semigroup. In order to show that a modification with continuous trajectories exists, it is enough to check that $\lim_{t \downarrow 0} P(t, x, B)/t = 0$ for each closed set $B$ that doesn't contain $x$ (Theorem I.5 of [8]; see also [1]). Let $f \in D(A)$ be a non-negative function that is equal to one on $B$ and whose support doesn't contain $x$. Then
$$\lim_{t \downarrow 0} \frac{P(t, x, B)}{t} \le \lim_{t \downarrow 0} \frac{(T_t f)(x) - f(x)}{t} = Af(x) = 0,$$
as required. Thus $Y^y_t$ can be assumed to have continuous trajectories.

4 A convergence lemma

The next lemma can be used to show convergence of families of parameter-dependent processes. We formulate it in a general setting. Consider a metric space $M$ and a Markov family $X^{x,\varepsilon}_t$, $x \in M$, of processes that depend on a parameter $\varepsilon >$
$0$. We also consider a continuous mapping $h : M \to S$ from $M$ to a locally compact separable metric space $S$ and define the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$, $x \in M$, $\varepsilon > 0$. We are interested in the limiting behavior of $X^{x,\varepsilon}_t$ as $\varepsilon \downarrow 0$. However, the space $M$ is too large for our purposes, i.e., the natural state space for the limiting process consists of equivalence classes in $M$ rather than of individual points. Thus, $Y^{x,\varepsilon}_t$ will capture the reduced dynamics, where meaningful limiting behavior can be observed.

Note that, while we prove convergence to Markov processes on $S$ as $\varepsilon \downarrow 0$, the processes $Y^{x,\varepsilon}_t$ need not be Markov for fixed $\varepsilon > 0$. The main point of the lemma is that, in order to demonstrate the convergence of $Y^{x,\varepsilon}_t$ to a limiting process, it is sufficient to check that for small $\varepsilon$ the processes nearly satisfy relation (7), which is similar to the martingale problem but with the ordinary expectation rather than the conditional expectation.

Lemma 4.1.
Let $h : M \to S$ be a continuous mapping from a metric space $M$ to a locally compact separable metric space $S$. Let $X^{x,\varepsilon}_t$, $x \in M$, be a Markov family on $M$ that depends on a parameter $\varepsilon > 0$. Suppose that the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$, $x \in M$, $\varepsilon > 0$, have continuous trajectories. Let $Y^y_t$, $y \in S$, be a Markov family on $S$ with continuous trajectories whose semigroup $T_t$, $t \ge 0$, preserves the space $C(S)$. (This, together with the continuity of trajectories, implies that $T_t$ is a Feller semigroup, i.e., $T_t f$, viewed as a function of $t$, is right-continuous from $[0, \infty)$ to $C(S)$ for each $f$.) Let $A : D(A) \to C(S)$ denote the infinitesimal generator of this family, where $D(A)$ is the domain of the generator. Let $\Psi$ be a dense linear subspace of $C(S)$ and $D$ be a linear subspace of $D(A)$, and suppose that $\Psi$ and $D$ have the following properties:
(1) There is $\lambda > 0$ such that for each $f \in \Psi$ the equation $\lambda F - AF = f$ has a solution $F \in D$.
(2) For each $T > 0$, each $f \in D$, and each compact $K \subseteq S$,
$$\lim_{\varepsilon \downarrow 0} \mathrm{E}\Big( f(Y^{x,\varepsilon}_T) - f(Y^{x,\varepsilon}_0) - \int_0^T Af(Y^{x,\varepsilon}_t)\,dt \Big) = 0, \qquad (7)$$
uniformly in $x \in h^{-1}(K)$. Suppose that the family of measures on $C([0, \infty), S)$ induced by the processes $Y^{x,\varepsilon}_t$, $\varepsilon > 0$, is tight for each $x \in M$.
Then, for each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.

Proof. Fix $x \in M$. Since the family of measures on $C([0, \infty), S)$ induced by the processes $Y^{x,\varepsilon}_t$, $\varepsilon >$
$0$, is tight, we can find a process $Z^x_t$ with continuous trajectories and a sequence $\varepsilon_n \downarrow 0$ such that the processes $Y^{x,\varepsilon_n}_t$ converge to $Z^x_t$ in distribution as $n \to \infty$. The desired result will immediately follow if we demonstrate that the distribution of $Z^x_t$ coincides with the distribution of $Y^{h(x)}_t$ (and thus does not depend on the choice of the sequence $\varepsilon_n$). We will show that $Z^x_t$ is a solution of the martingale problem for $(A|_D, h(x))$, i.e., for each $T > T_0 \ge 0$ and $f \in D$,
$$\mathrm{E}\Big( f(Z^x_T) - f(Z^x_{T_0}) - \int_{T_0}^T Af(Z^x_t)\,dt \,\Big|\, \mathcal{F}^{Z^x}_{T_0} \Big) = 0, \qquad Z^x_0 = h(x). \qquad (8)$$
First, however, let us discuss the uniqueness for solutions of the martingale problem. We claim that:
(a) $D$ is dense in $C(S)$.
(b) $\mathrm{Range}(\lambda - A|_D)$ is dense in $C(S)$.
(c) For each pair of measures $\mu_1$, $\mu_2$ on $S$, the equality $\int_S f\,d\mu_1 = \int_S f\,d\mu_2$ for all $f \in C(S)$ implies that $\mu_1 = \mu_2$.
To demonstrate (a), take an arbitrary $\delta > 0$ and $F \in D(A)$. Let $g = \lambda F - AF$, and take $g' \in \Psi$ such that $\|g' - g\| \le \lambda\delta$. Let $F' \in D$ be such that $\lambda F' - AF' = g'$. Then, since $A$ is the generator of a strongly continuous semigroup on $C(S)$, from the Hille-Yosida theorem it follows that $\|F' - F\| \le \|g' - g\|/\lambda \le \delta$. This implies (a) since $D(A)$ is dense in $C(S)$. Note that (b) follows from the existence of a solution $F \in D$ to $\lambda F - AF = f \in \Psi$ and the density of $\Psi$, while (c) is obvious. The validity of (a)-(c) is enough to conclude that the distribution on $C([0, \infty), S)$ of a process with continuous paths satisfying (8) is uniquely determined (Theorem 4.1, Chapter 4 in [2]).
Note that (8) is satisfied if $Z^x_t$ is replaced by $Y^{h(x)}_t$ since $D \subseteq D(A)$ and $A$ is the generator of the family $Y^y_t$, $y \in S$. Therefore, $Z^x_t$ and $Y^{h(x)}_t$ have the same distribution if (8) holds. It remains to prove (8).
Note that $Z^x_t$ is a solution of the martingale problem for $(A|_D, h(x))$ if and only if
$$\mathrm{E}\Big( \Big(\prod_{i=1}^k g_i(Z^x_{t_i})\Big)\Big( f(Z^x_T) - f(Z^x_{T_0}) - \int_{T_0}^T Af(Z^x_t)\,dt \Big) \Big) = 0, \qquad Z^x_0 = h(x),$$
whenever $f \in D$, $0 \le t_1 < \ldots$
$< t_k \le T_0$, and $g_1, \ldots, g_k \in C(S)$. Since $Y^{x,\varepsilon_n}_t = h(X^{x,\varepsilon_n}_t)$ converge to $Z^x_t$ in distribution, we have
$$\mathrm{E}\Big( \Big(\prod_{i=1}^k g_i(Z^x_{t_i})\Big)\Big( f(Z^x_T) - f(Z^x_{T_0}) - \int_{T_0}^T Af(Z^x_t)\,dt \Big) \Big)$$
$$= \lim_{n \to \infty} \mathrm{E}\Big( \Big(\prod_{i=1}^k g_i(h(X^{x,\varepsilon_n}_{t_i}))\Big)\Big( f(h(X^{x,\varepsilon_n}_T)) - f(h(X^{x,\varepsilon_n}_{T_0})) - \int_{T_0}^T Af(h(X^{x,\varepsilon_n}_t))\,dt \Big) \Big)$$
$$= \lim_{n \to \infty} \mathrm{E}\Big( \Big(\prod_{i=1}^k g_i(h(X^{x,\varepsilon_n}_{t_i}))\Big)\,\mathrm{E}\Big( f(h(X^{x,\varepsilon_n}_T)) - f(h(X^{x,\varepsilon_n}_{T_0})) - \int_{T_0}^T Af(h(X^{x,\varepsilon_n}_t))\,dt \,\Big|\, \mathcal{F}^{X^{x,\varepsilon_n}}_{T_0} \Big) \Big).$$
By the Markov property of the family $X^{x,\varepsilon_n}_t$,
$$\mathrm{E}\Big( f(h(X^{x,\varepsilon_n}_T)) - f(h(X^{x,\varepsilon_n}_{T_0})) - \int_{T_0}^T Af(h(X^{x,\varepsilon_n}_t))\,dt \,\Big|\, \mathcal{F}^{X^{x,\varepsilon_n}}_{T_0} \Big) = \mathrm{E}\Big( f(h(X^{x',\varepsilon_n}_{T - T_0})) - f(h(X^{x',\varepsilon_n}_0)) - \int_0^{T - T_0} Af(h(X^{x',\varepsilon_n}_t))\,dt \Big)\Big|_{x' = X^{x,\varepsilon_n}_{T_0}},$$
which tends to zero in distribution, as follows from (7) and from the tightness of the sequence of random variables $X^{x,\varepsilon_n}_{T_0}$. Therefore, using the boundedness of $f$, $Af$, and $g_1, \ldots, g_k$, we conclude that
$$\mathrm{E}\Big( \Big(\prod_{i=1}^k g_i(Z^x_{t_i})\Big)\Big( f(Z^x_T) - f(Z^x_{T_0}) - \int_{T_0}^T Af(Z^x_t)\,dt \Big) \Big) = 0.$$
Finally, $Z^x_0 = h(x)$ since $Y^{x,\varepsilon_n}_0 = h(X^{x,\varepsilon_n}_0) = h(x)$ for all $n$.

5 Convergence of the fast-slow process
5.1 The case $\kappa = 0$

Consider first a simplified version of the problem: assume that the fast-slow system $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$ is defined as in Section 2, but $q_{ij}(z) > 0$ for $i \ne j$ (and thus $\Xi^z_t$ is ergodic) for each $z \in \mathbb{R}$. In this case, the fast Markov chain has a unique invariant distribution, which will be denoted by $\mu_i(z)$, $1 \le i \le n$, for each $z \in \mathbb{R}$. Define $Y^y_t$, $y \in \mathbb{R}$, to be the deterministic motion on the real line with the velocity $\bar v(y) = \sum_{i=1}^n v(i, y)\,\mu_i(y)$, $y \in \mathbb{R}$. The domain $D(A)$ of its generator $A$ consists of all functions $f \in C(\mathbb{R})$ such that $f' \in C(\mathbb{R})$, while $Af(y) = \bar v(y)\,f'(y)$. Let $h : M \to \mathbb{R}$ be the projection $h(i, z) = z$. The following theorem is a standard averaging result.

Theorem 5.1.
Suppose that $q_{ij}(z) > 0$ for $i \ne j$, $z \in \mathbb{R}$. For each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$ on $\mathbb{R}$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.

Proof. We apply Lemma 4.1 with $S = \mathbb{R}$, $\Psi = D = D(A)$. Thus we need to justify (7) for $f \in D(A)$. Define $\tilde f(i, z) = f(z)$, $1 \le i \le n$. Using (5) (which is still valid in this simplified case) applied to $\tilde f$ with $\tau = T$, we can write
$$\mathrm{E}\Big( f(Y^{x,\varepsilon}_T) - f(Y^{x,\varepsilon}_0) - \int_0^T Af(Y^{x,\varepsilon}_t)\,dt \Big) = \mathrm{E}\Big( f(Y^{x,\varepsilon}_T) - f(Y^{x,\varepsilon}_0) - \int_0^T Af(Y^{x,\varepsilon}_t)\,dt \Big) - \mathrm{E}\Big( \tilde f(X^{x,\varepsilon}_T) - \tilde f(x) - \int_0^T A^\varepsilon \tilde f(X^{x,\varepsilon}_t)\,dt \Big)$$
$$= \mathrm{E} \int_0^T \big( A^\varepsilon \tilde f(X^{x,\varepsilon}_t) - Af(Y^{x,\varepsilon}_t) \big)\,dt = \mathrm{E} \int_0^T \big( v(X^{x,\varepsilon}_t) - \bar v(z^{x,\varepsilon}_t) \big)\,f'(z^{x,\varepsilon}_t)\,dt.$$
It easily follows from the explicit construction of $X^{x,\varepsilon}_t$ (Section 2) that the expression in the right-hand side tends to zero uniformly in $x$.

Now let us consider the original situation with two ergodic classes for the Markov chain when $z \ge$
$0$. Recall that $S$ is now a graph with three semi-infinite edges, $I_0$, $I_1$, and $I_2$, with the common vertex $O$. The process $Y^y_t$ on $S$ has been defined in Section 3 (the case $\kappa = 0$). The motion is deterministic on each of the edges, while the behavior at $O$ is random: the process proceeds to $I_1$ or $I_2$ with the prescribed probabilities $p_1$ and $p_2$, respectively.

Let $h$ be the mapping of $M = \{1, \ldots, n\} \times \mathbb{R}$ to $S$ defined as follows:
$$h(i, z) = \begin{cases} (0, z), & z \le 0, \\ (1, z), & i \in R_1,\ z \ge 0, \\ (2, z), & i \in R_2,\ z \ge 0. \end{cases} \qquad (9)$$

Theorem 5.2. Suppose that $\kappa = 0$ and that the assumptions made in Section 2 are satisfied (in particular, the Markov chain $\Xi^z_t$ has two ergodic classes for each $z \ge 0$). For each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$ on $S$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.

Proof. Lemma 4.1 is not directly applicable now because the semigroup that corresponds to the process $Y^y_t$ does not preserve $C(S)$. However, outside of an arbitrarily small neighborhood of the set $h^{-1}(O)$, the limiting motion of $Y^{x,\varepsilon}_t$ is given by $Y^{h(x)}_t$, as follows from Theorem 5.1. To complete the proof, we need to show that if $X^{x,\varepsilon}_t$ starts slightly to the left of $h^{-1}(O)$, then it quickly moves to the right of $h^{-1}(O)$ and $\xi^{x,\varepsilon}_t$ ends up in the first ergodic class with probability close to $p_1$.

More precisely, let $\tau^{x,\varepsilon}_\delta = \inf\{t \ge 0 : z^{x,\varepsilon}_t = \delta\}$. It is sufficient to show that for each $\eta > 0$ there is $\delta_0 > 0$ such that for each $\delta \in (0, \delta_0]$ there is $\varepsilon_0 > 0$ such that for $\varepsilon \in (0, \varepsilon_0]$ we have
$$\mathrm{E}\tau^{x,\varepsilon}_\delta < \eta, \qquad (10)$$
$$|\mathrm{P}(\xi^{x,\varepsilon}_{\tau^{x,\varepsilon}_\delta} \in R_1) - p_1| < \eta, \qquad (11)$$
whenever $x = (i, -\delta)$. From the explicit construction of $X^{x,\varepsilon}_t$ (Section 2), it is clear that $z^{x,\varepsilon}_t$ increases, while in $[-\delta, \delta]$, with speed bounded from below by $\inf_{i,\, z \in [-\delta, \delta]} v(i, z) >$
$0$. This implies (10). To prove (11), we define $f^\varepsilon(i, z)$, $z \in [-\delta, \delta]$, as the solution of the system of ODEs
$$\frac{df^\varepsilon(i, z)}{dz} = \frac{(v(i, z))^{-1}}{\varepsilon}\Big( Q_i(z)\,f^\varepsilon(i, z) - \sum_{j \ne i} q_{ij}(z)\,f^\varepsilon(j, z) \Big)$$
with the terminal condition
$$f^\varepsilon(i, \delta) = (e_1)_i := \begin{cases} 1, & i \in R_1, \\ 0, & i \in R_2. \end{cases}$$
We extend $f^\varepsilon$ to be defined on $M$ so that $f^\varepsilon \in D(A^\varepsilon)$. Observe that, by construction, $A^\varepsilon f^\varepsilon(i, z) = 0$ when $z \in [-\delta, \delta]$. Therefore, applying (5) with $\tau = \tau^{x,\varepsilon}_\delta$ and $x = (i, -\delta)$, we obtain
$$\mathrm{P}(\xi^{x,\varepsilon}_{\tau^{x,\varepsilon}_\delta} \in R_1) = f^\varepsilon(i, -\delta).$$
Thus it remains to analyze the asymptotics of the solution to the ODE. Let $N(z)$ be the matrix whose diagonal elements are $N_{ii}(z) = -(v(i, z))^{-1} Q_i(z)$ and whose off-diagonal elements are $N_{ij}(z) = (v(i, z))^{-1} q_{ij}(z)$. Let
$$N^\delta = \frac{1}{2\delta} \int_{-\delta}^{\delta} N(z)\,dz.$$
Then
$$f^\varepsilon(\cdot, -\delta) = \exp\Big( \frac{2\delta}{\varepsilon}\,N^\delta \Big)\,e_1.$$
When $\delta$ is small, $N^\delta$ is a small perturbation of the matrix $N(0)$. Namely, let
$$H^\delta = N^\delta - N(0).$$
All the entries of $H^\delta$ tend to zero when $\delta \downarrow$
$0$. Observe that all the off-diagonal entries of $N^\delta$ are positive for each $\delta$, and the sum of the elements in each row is equal to zero. Therefore, zero is a simple eigenvalue of $N^\delta$ with the right eigenvector equal to $e = (1, \ldots, 1)^T$, and the real parts of the other eigenvalues are negative.

Let $\Pi^\delta_e(e_1)$ be the projection of $e_1$ onto $e$ along the space spanned by the remaining eigenvectors (and generalized eigenvectors) of the matrix $N^\delta$. Then
$$\lim_{\varepsilon \downarrow 0} f^\varepsilon(i, -\delta) = (\Pi^\delta_e(e_1))_i$$
for each $i$, and it remains to show that $(\Pi^\delta_e(e_1))_i$ (which does not depend on $i$) is close to $p_1$ for small $\delta$.

Observe that zero is the top eigenvalue of $N(0)$ with two linearly independent right eigenvectors $e_1$ and $e_2 = e - e_1$ and two linearly independent left eigenvectors:
$$\pi^1_i = \begin{cases} \pi_i v_i, & i \in R_1, \\ 0, & i \in R_2, \end{cases} \qquad \pi^2_i = \begin{cases} 0, & i \in R_1, \\ \pi_i v_i, & i \in R_2, \end{cases}$$
where $v_i = v(i, 0)$. Let $\lambda^\delta$ be the eigenvalue of $N^\delta$ with the second-largest real part (the top eigenvalue is zero). It is determined uniquely for small $\delta$. Let $g^\delta$ be the corresponding right eigenvector (determined up to a constant factor).

Lemma 5.3.
The vector $g^\delta$ can be represented as
$$g^\delta = e_1 + \alpha^\delta e_2 + \bar g^\delta, \qquad (12)$$
where $\bar g^\delta$ belongs to the space spanned by the eigenvectors (and generalized eigenvectors) of $N(0)$ other than $e_1$ and $e_2$. The coefficient $\alpha^\delta$ is bounded away from zero, and $\bar g^\delta$ tends to zero when $\delta \downarrow 0$.

Proof. Let $\bar i(\delta)$ be such that $|g^\delta_{\bar i(\delta)}| = \max_{1 \le i \le n} |g^\delta_i|$. Assume, for now, that $\bar i(\delta) \in R_1$ for all sufficiently small $\delta$. Then, since $N^\delta$ is a small perturbation of $N(0)$ and $\lambda^\delta \to 0$ as $\delta \downarrow 0$, the relation $N^\delta g^\delta = \lambda^\delta g^\delta$ easily implies that $g^\delta_i / g^\delta_{\bar i(\delta)} \to 1$ as $\delta \downarrow 0$ for $i \in R_1$.

Let $\tilde\pi^\delta$ be the normalized left eigenvector for $N^\delta$ with eigenvalue zero. From $\tilde\pi^\delta N^\delta = 0$ and $N^\delta g^\delta = \lambda^\delta g^\delta$ it follows that $\langle g^\delta, \tilde\pi^\delta \rangle = 0$. Let $\tilde i(\delta)$ be such that $|g^\delta_{\tilde i(\delta)}| = \max_{i \in R_2} |g^\delta_i|$. Observe that $\tilde\pi^\delta_i \to \pi^1_i$ for $i \in R_1$, and $\tilde\pi^\delta_i \to \pi^2_i$ for $i \in R_2$. Therefore,
$$c_1 |g^\delta_{\bar i(\delta)}| \le |g^\delta_{\tilde i(\delta)}| \le c_2 |g^\delta_{\bar i(\delta)}| \qquad (13)$$
for some positive constants $c_1$ and $c_2$. As above, $g^\delta_i / g^\delta_{\tilde i(\delta)} \to 1$ as $\delta \downarrow 0$ for $i \in R_2$. From the facts that $\langle g^\delta, \tilde\pi^\delta \rangle = 0$, $\tilde\pi^\delta_i \to \pi^1_i$ for $i \in R_1$, and $\tilde\pi^\delta_i \to \pi^2_i$ for $i \in R_2$, it follows that $g^\delta_i$, $i \in R_1$, are of the opposite sign from $g^\delta_i$, $i \in R_2$.

The vector $g^\delta$ can be represented as a sum of three components, $g^\delta = a^\delta + b^\delta + c^\delta$, where $a^\delta$ is a multiple of $e_1$, $b^\delta$ is a multiple of $e_2$, and $c^\delta$ is in the space spanned by the eigenvectors (and generalized eigenvectors) of $N(0)$ other than $e_1$ and $e_2$. Observe that $\|c^\delta\| / \|g^\delta\| \to 0$ as $\delta \downarrow 0$ since $e_1$ and $e_2$ span the eigenspace corresponding to the top eigenvalue of $N(0)$ and $g^\delta$ belongs to a small perturbation of that space. Moreover, from (13) and the fact that $g^\delta_i$, $i \in R_1$, and $g^\delta_i$, $i \in R_2$, are of opposite signs, it follows that $\|a^\delta\| / \|b^\delta\|$ is bounded from above and below.
Therefore, (12) is possible with $\alpha^\delta$ bounded away from zero and infinity. Finally, it remains to note that our assumption that $\bar i(\delta) \in R_1$ does not lead to any loss of generality.

Since $g^\delta$ is an eigenvector of $N^\delta$, we get
$$(N(0) + H^\delta)(e_1 + \alpha^\delta e_2 + \bar g^\delta) = \lambda^\delta (e_1 + \alpha^\delta e_2 + \bar g^\delta).$$
Taking the scalar product with $\pi^1$ and $\pi^2$ on both sides and noting that $H^\delta e = 0$ (so that $H^\delta e_1 = -H^\delta e_2$), we obtain
$$(\alpha^\delta - 1)\langle H^\delta e_2, \pi^1 \rangle + \langle H^\delta \bar g^\delta, \pi^1 \rangle = \lambda^\delta \langle e_1 + \alpha^\delta e_2, \pi^1 \rangle,$$
$$(\alpha^\delta - 1)\langle H^\delta e_2, \pi^2 \rangle + \langle H^\delta \bar g^\delta, \pi^2 \rangle = \lambda^\delta \langle e_1 + \alpha^\delta e_2, \pi^2 \rangle.$$
Therefore,
$$\big( (\alpha^\delta - 1)\langle H^\delta e_2, \pi^1 \rangle + \langle H^\delta \bar g^\delta, \pi^1 \rangle \big)\,\langle e_1 + \alpha^\delta e_2, \pi^2 \rangle = \big( (\alpha^\delta - 1)\langle H^\delta e_2, \pi^2 \rangle + \langle H^\delta \bar g^\delta, \pi^2 \rangle \big)\,\langle e_1 + \alpha^\delta e_2, \pi^1 \rangle.$$
Observe that
$$\langle H^\delta \bar g^\delta, \pi^1 \rangle = o\big( \alpha^\delta \langle H^\delta e_2, \pi^1 \rangle \big), \qquad \langle H^\delta \bar g^\delta, \pi^2 \rangle = o\big( \alpha^\delta \langle H^\delta e_2, \pi^2 \rangle \big)$$
as $\delta \downarrow 0$, and therefore
$$\frac{\langle H^\delta e_2, \pi^1 \rangle}{\langle e_1 + \alpha^\delta e_2, \pi^1 \rangle} \sim \frac{\langle H^\delta e_2, \pi^2 \rangle}{\langle e_1 + \alpha^\delta e_2, \pi^2 \rangle} \quad \text{as } \delta \downarrow 0.$$
Solving for $\alpha^\delta$ gives
$$\lim_{\delta \downarrow 0} \alpha^\delta = -\frac{\big( \sum_{i \in R_2} \sum_{j \in R_1} \bar q_{ij}\,\pi_i \big)\big( \sum_{i \in R_1} \pi_i v_i \big)}{\big( \sum_{i \in R_1} \sum_{j \in R_2} \bar q_{ij}\,\pi_i \big)\big( \sum_{i \in R_2} \pi_i v_i \big)} = -\frac{\sum_{i \in R_1} \pi_i v_i}{\sum_{i \in R_2} \pi_i v_i},$$
where in the last step we used that $\sum_{i \in R_1} \sum_{j \in R_2} \bar q_{ij}\,\pi_i = \sum_{i \in R_2} \sum_{j \in R_1} \bar q_{ij}\,\pi_i$, which follows from the stationarity of $\mu(z)$ (balance of the probability fluxes between $R_1$ and $R_2$) as $z \uparrow 0$. From (12), it follows that
$$\lim_{\delta \downarrow 0} (\Pi^\delta_e(e_1))_i = \Big( 1 - \frac{1}{\lim_{\delta \downarrow 0} \alpha^\delta} \Big)^{-1} = \frac{\sum_{i \in R_1} \pi_i v_i}{\sum_{i \in R_1} \pi_i v_i + \sum_{i \in R_2} \pi_i v_i} = p_1,$$
as required.

5.2 The case with diffusion

Now we consider the fast-slow system $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$ defined in Section 2, with $\kappa = 1$. The filtration generated by the process will be denoted by $\mathcal{F}^{x,\varepsilon}_t$. The process $Y^y_t$ on the graph $S$ is now a diffusion (defined in Section 3 via its generator). The mapping $h$ is the same as in (9).

Theorem 5.4.
Suppose that $\kappa = 1$ and that the assumptions made in Section 2 are satisfied (in particular, the Markov chain $\Xi^z_t$ has two ergodic classes for each $z \ge 0$). For each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$ on $S$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.

Proof. Let
$T > 0$, $f \in D(A)$, and let $K$ be a compact subset of $S$. It is clear that the family of measures on $C([0, \infty), S)$ induced by the processes $Y^{x,\varepsilon}_t$, $\varepsilon >$
$0$, is tight for each $x \in M$. Thus, by Lemma 4.1, it is sufficient to prove that, given $\eta >$
$0$, we have
$$\Big| \mathrm{E}\Big( f(Y^{x,\varepsilon}_T) - f(Y^{x,\varepsilon}_0) - \int_0^T Af(Y^{x,\varepsilon}_t)\,dt \Big) \Big| \le \eta$$
for all $x \in h^{-1}(K)$ and all sufficiently small $\varepsilon$.

Let us define two sequences of stopping times: $\sigma^{x,\varepsilon}_0 = 0$;
$$\tau^{x,\varepsilon}_n = \inf\{t \ge \sigma^{x,\varepsilon}_{n-1} : z^{x,\varepsilon}_t = 0\}, \quad n \ge 1; \qquad \sigma^{x,\varepsilon}_n = \inf\{t \ge \tau^{x,\varepsilon}_n : |z^{x,\varepsilon}_t| = \delta\}, \quad n \ge 1,$$
where $\delta > 0$ will be chosen later. Then
$$\mathrm{E}\Big( f(Y^{x,\varepsilon}_T) - f(Y^{x,\varepsilon}_0) - \int_0^T Af(Y^{x,\varepsilon}_t)\,dt \Big) = \mathrm{E} \sum_{n=1}^\infty \Big( f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_{n-1} \wedge T}) - \int_{\sigma^{x,\varepsilon}_{n-1} \wedge T}^{\tau^{x,\varepsilon}_n \wedge T} Af(Y^{x,\varepsilon}_t)\,dt \Big)$$
$$+ \mathrm{E} \sum_{n=1}^\infty \Big( f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) - \int_{\tau^{x,\varepsilon}_n \wedge T}^{\sigma^{x,\varepsilon}_n \wedge T} Af(Y^{x,\varepsilon}_t)\,dt \Big). \qquad (14)$$
In order to control the number of terms in the sums above, we'll need the following lemma.
Lemma 5.5.
There is $c > 0$ such that, for all sufficiently small $\delta$,
$$\mathrm{P}(\sigma^{x,\varepsilon}_n \le T) \le \exp(-c\delta n), \quad x \in M, \; n \ge 1. \tag{15}$$

Proof.
Let $A_t$ be an auxiliary diffusion process, $dA_t = a\,dt + dW_t$, $A_0 = -\delta$, where $a = \sup_{i,z} |v(i,z)|$. Let $\tilde\tau = \inf\{t: A_t = 0\}$. Then $\mathrm{P}(\tilde\tau \le T) \le \exp(-c\delta)$ for some $c > 0$ and all sufficiently small $\delta$. If $\tilde\tau_k$, $k \ge 1$, is a sequence of independent random variables distributed as $\tilde\tau$, then
$$\mathrm{P}(\tilde\tau_1 + \dots + \tilde\tau_n \le T) \le \exp(-c\delta n). \tag{16}$$
From the definition of the stopping times and the process $X^{x,\varepsilon}_t$ it follows that
$$\mathrm{P}\big(\tau^{x,\varepsilon}_n - \sigma^{x,\varepsilon}_{n-1} > s \,\big|\, \mathcal{F}^{x,\varepsilon}_{\sigma^{x,\varepsilon}_{n-1}}\big) \ge \mathrm{P}(\tilde\tau > s)$$
for each $n \ge 1$, $s \ge 0$. Therefore, estimate (15), with $\tau^{x,\varepsilon}_{n+1}$ instead of $\sigma^{x,\varepsilon}_n$, follows from (16) and the strong Markov property. Thus the original formula (15) also holds, with a different constant $c$.

Let
$$\alpha(x, n) = \mathrm{E}\Big( f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_{n-1} \wedge T}) - \int_{\sigma^{x,\varepsilon}_{n-1} \wedge T}^{\tau^{x,\varepsilon}_n \wedge T} Af(Y^{x,\varepsilon}_t)\,dt \,\Big|\, \mathcal{F}^{x,\varepsilon}_{\sigma^{x,\varepsilon}_{n-1} \wedge T} \Big).$$
Observe that
$$\lim_{\varepsilon \downarrow 0}\, \sup_{x \in h^{-1}(K)}\, \sup_{n \ge 1} |\alpha(x, n)| = 0$$
uniformly in all the realizations of the randomness (which is present since we are taking a conditional expectation). This is a standard averaging result for the fast-slow system in the case of a single invariant measure for the fast motion. It easily follows from the explicit construction of $X^{x,\varepsilon}_t$. Therefore, for the first expectation in (14), by Lemma 5.5 we get
$$\Big| \mathrm{E}\sum_{n=1}^\infty \Big( f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_{n-1} \wedge T}) - \int_{\sigma^{x,\varepsilon}_{n-1} \wedge T}^{\tau^{x,\varepsilon}_n \wedge T} Af(Y^{x,\varepsilon}_t)\,dt \Big) \Big| \le \sum_{n=1}^\infty |\alpha(x, n)|\, \mathrm{P}(\sigma^{x,\varepsilon}_{n-1} \le T) \to 0 \quad \text{as } \varepsilon \downarrow 0,$$
uniformly in $x \in h^{-1}(K)$.

Next, observe that
$$\big| \mathrm{E}\big( \sigma^{x,\varepsilon}_n \wedge T - \tau^{x,\varepsilon}_n \wedge T \,\big|\, \mathcal{F}^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T} \big) \big| \le C\delta^2$$
for some constant $C$ and all $x \in M$, $n \ge 1$. This follows from the fact that the process $z^{x,\varepsilon}_t$ is a Brownian motion with a bounded variable drift, and the expectation of its exit time from the $\delta$-neighborhood of the origin is estimated from above by $C\delta^2$. Therefore,
$$\Big| \mathrm{E}\sum_{n=1}^\infty \int_{\tau^{x,\varepsilon}_n \wedge T}^{\sigma^{x,\varepsilon}_n \wedge T} Af(Y^{x,\varepsilon}_t)\,dt \Big| \le C\delta^2 \sup|Af| \sum_{n=1}^\infty \mathrm{P}(\tau^{x,\varepsilon}_n \le T).$$
By Lemma 5.5, since $\tau^{x,\varepsilon}_n \ge \sigma^{x,\varepsilon}_{n-1}$, the right hand side does not exceed $K\delta$ for some constant $K$, which is smaller than $\eta/4$ for all sufficiently small $\delta$. Thus it remains to show that there is $\delta > 0$ such that
$$\Big| \mathrm{E}\sum_{n=1}^\infty \big( f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) \big) \Big| < \eta/2$$
for all sufficiently small $\varepsilon$. Observe that
$$\Big| \mathrm{E}\sum_{n=1}^\infty \big( f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_n}) - f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_n \wedge T}) \big) \chi_{\{\tau^{x,\varepsilon}_n \le T\}} \Big| \le \sup |f(l_1, z_1) - f(l_2, z_2)| < \eta/4$$
for all sufficiently small $\delta$, where the supremum is taken over all $l_1, l_2$ and $z_1, z_2$ such that $|z_1|, |z_2| \le \delta$. Therefore,
$$\Big| \mathrm{E}\sum_{n=1}^\infty \big( f(Y^{x,\varepsilon}_{\sigma^{x,\varepsilon}_n \wedge T}) - f(Y^{x,\varepsilon}_{\tau^{x,\varepsilon}_n \wedge T}) \big) \Big| \le \eta/4 + \sup_{x:\, h(x) = O} \big| \mathrm{E}\big( f(Y^{x,\varepsilon}_\sigma) - f(Y^{x,\varepsilon}_0) \big) \big| \sum_{n=1}^\infty \mathrm{P}(\tau^{x,\varepsilon}_n \le T),$$
where $\sigma = \sigma^{x,\varepsilon}_1 = \inf\{t \ge 0: |z^{x,\varepsilon}_t| = \delta\}$. By Lemma 5.5, since $\tau^{x,\varepsilon}_n \ge \sigma^{x,\varepsilon}_{n-1}$, the sum in the right hand side can be estimated from above by $K/\delta$ for some $K$, and it remains to show that
$$\sup_{x:\, h(x) = O} \big| \mathrm{E}\big( f(Y^{x,\varepsilon}_\sigma) - f(Y^{x,\varepsilon}_0) \big) \big| / \delta$$
can be made arbitrarily small for some $\delta$ and all sufficiently small $\varepsilon$. Since $f(l, z)$ is differentiable in $z$ at $z = 0$ along each edge (one-sided derivatives exist), and the relation between the derivatives is given by (6), the result follows from the following lemma.

Lemma 5.6.
For each $\eta > 0$, for all sufficiently small $\delta > 0$,
$$\big| \mathrm{P}(\xi^{x,\varepsilon}_\sigma \in R_1,\; z^{x,\varepsilon}_\sigma = \delta) - \pi_1 \big| \le \eta, \qquad \big| \mathrm{P}(\xi^{x,\varepsilon}_\sigma \in R_2,\; z^{x,\varepsilon}_\sigma = \delta) - \pi_2 \big| \le \eta$$
for each $x$ such that $h(x) = O$ and all sufficiently small $\varepsilon$ (depending on $\delta$).

Proof. Consider an auxiliary process $\tilde X^{x,\varepsilon}_t = (\tilde\xi^{x,\varepsilon}_t, \tilde z^{x,\varepsilon}_t)$ that is defined the same way as $X^{x,\varepsilon}_t$, but with $v(i, \cdot) \equiv 0$. The corresponding stopping time will be denoted by $\tilde\sigma$. Let $\tilde\mu_t$ and $\mu_t$ be the measures on the space of RCLL functions from $[0, t]$ to $M$ induced by the processes $\tilde X^{x,\varepsilon}_t$ and $X^{x,\varepsilon}_t$, respectively. By the Girsanov theorem, $\tilde\mu_t$ and $\mu_t$ are mutually absolutely continuous. Moreover, for each $\eta > 0$, for all sufficiently small $t$ and $\varepsilon$, we have $\tilde\mu_t(1 - \eta \le p_t \le 1 + \eta) \ge 1 - \eta$ for each $x = (i, 0)$, where $p_t$ is the density of $\mu_t$ with respect to $\tilde\mu_t$. Since $\mathrm{P}(\tilde\sigma \le t) \to 1$ as $\delta \downarrow 0$, we have, for all sufficiently small $\delta$ and $\varepsilon$ and $x = (i, 0)$,
$$\big| \mathrm{P}(\xi^{x,\varepsilon}_\sigma \in R_1,\; z^{x,\varepsilon}_\sigma = \delta) - \mathrm{P}(\tilde\xi^{x,\varepsilon}_{\tilde\sigma} \in R_1,\; \tilde z^{x,\varepsilon}_{\tilde\sigma} = \delta) \big| \le \eta/2.$$
Similarly,
$$\big| \mathrm{P}(\xi^{x,\varepsilon}_\sigma \in R_2,\; z^{x,\varepsilon}_\sigma = \delta) - \mathrm{P}(\tilde\xi^{x,\varepsilon}_{\tilde\sigma} \in R_2,\; \tilde z^{x,\varepsilon}_{\tilde\sigma} = \delta) \big| \le \eta/2.$$
Thus it is sufficient to prove Lemma 5.6 in the case when there is no drift term.

Next, we need the following observation about time-inhomogeneous Markov processes. Recall that the time-homogeneous Markov chain with transition rates $q_{ij}(z)$, $z <$
0, has a unique invariant distribution $\mu_i(z)$, $1 \le i \le n$, $z < 0$. Moreover, when $t \to \infty$, the distribution of $\Xi^z_t$ is close to the invariant distribution, which, in turn, is close to $\pi_i$, $1 \le i \le n$, if $|z|$ is small. A similar statement can be made about time-inhomogeneous processes. Namely, let $\eta > 0$. It is not difficult to show that there are $\delta_1, \delta_2 > 0$ such that if $|\tilde z(t)| \le \delta_1$ for all $t \ge 0$ and $\lambda(\{t \le s: \tilde z(t) \le -\delta_2\}) \to \infty$ as $s \to \infty$, then
$$\limsup_{t \to \infty} \big| \mathrm{P}(\tilde\Xi^{\tilde z}_t = i) - \pi_i \big| \le \eta,$$
where $\lambda$ is the Lebesgue measure on the real line and $\tilde\Xi^{\tilde z}_t$ is a time-inhomogeneous Markov process with transition rates at time $t$ given by $q_{ij}(\tilde z(t))$.

To complete the proof of Lemma 5.6 in the case when there is no drift term, we condition the evolution of the fast component on the realization of the Brownian motion and obtain that the above argument is applicable for almost every realization of the Brownian motion (after rescaling the time by $1/\varepsilon$).

As we discussed above, Lemma 5.6 completes the proof of the theorem.

Acknowledgments: While working on this article, L. Koralov was supported by the ARO grant W911NF1710419 and by the University of Maryland Research and Scholarship Award.
References

[1] Dynkin E. B., Markov Processes, Springer-Verlag, Berlin, Heidelberg, New York, 1965.
[2] Ethier S. N., Kurtz T. G., Markov processes: characterization and convergence, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley and Sons, Inc., New York, 1986.
[3] Freidlin M. I., Functional Integration and Partial Differential Equations, Princeton University Press, 1985.
[4] Freidlin M. I., Thermostat-like perturbations of an oscillator, J. Stat. Phys. 164 (2016), no. 1, pp. 130–141.
[5] Freidlin M. I., On stochastic perturbations of dynamical systems with a “rough” symmetry. Hierarchy of Markov chains, J. Stat. Phys. 157 (2014), no. 6, pp. 1031–1045.
[6] Freidlin M. I., Wentzell A. D., Random Perturbations of Dynamical Systems, Springer, 2012.
[7] Korolyuk V. S., Portenko N. I., Skorokhod A. V., Turbin A. F., Handbook on probability theory and mathematical statistics, Nauka, 1985 (in Russian).
[8] Mandl P.,