Giant leaps and long excursions: fluctuation mechanisms in systems with long-range memory
Robert L. Jack (1, 2) and Rosemary J. Harris (3)
(1) Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, United Kingdom
(2) Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
(3) School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom
We analyse large deviations of time-averaged quantities in stochastic processes with long-range memory, where the dynamics at time $t$ depends itself on the value $q_t$ of the time-averaged quantity. First we consider the elephant random walk and a Gaussian variant of this model, identifying two mechanisms for unusual fluctuation behaviour, which differ from the Markovian case. In particular, the memory can lead to large-deviation principles with reduced speeds, and to non-analytic rate functions. We then explain how the mechanisms operating in these two models are generic for memory-dependent dynamics and show other examples including a non-Markovian symmetric exclusion process.

I. INTRODUCTION
Memory effects and long-range temporal correlations are important in many physical systems [1–5], and in other scientific fields ranging from biology to telecommunications to finance [6, 7]. It is particularly notable that long-ranged memory can change fluctuation behaviour qualitatively, compared to Markovian (memory-less) cases. Demonstrations of this include non-Markovian random walks [8–13], models of cluster growth [14–16], and agent-based models where decisions depend on past experience [17]. The distinction between Markovian and non-Markovian systems is also important when formulating general theories. For example, a large deviation theory of dynamical fluctuations is now established for Markovian systems [18–23], but memory can lead to new effects which cannot be captured by the standard theory [16, 24–27]. In particular, one finds [24, 25] a breakdown of the standard large-deviation principles (LDPs) that hold quite generically in finite Markovian systems [22].

In this work, we consider non-Markovian systems where the dynamics depend explicitly on a time-averaged current, whose value at time $t$ is denoted by $q_t$. This is a simple type of memory that occurs in a wide range of physical models [8, 12, 16, 17, 24, 25]. Using methods of large-deviation theory [18–22, 28], we show how this long-range memory can lead to anomalous fluctuations of $q_t$. We explain that much of this behaviour can be understood by considering two generic fluctuation mechanisms, where memory plays an intrinsic role. These general mechanisms are useful for classifying previous results for non-Markovian systems, and for identifying new phenomena.

We illustrate these mechanisms by analysing current fluctuations in the elephant random walk (ERW) of [8, 10, 29, 30], and a related process which we call the Gaussian elephant random walk (GERW). A key difference from Markovian systems is that large (rare) fluctuations in these models are associated with currents that are strongly time-dependent [16, 24, 25, 28]: a large current at early times biases the subsequent evolution and can trigger anomalous fluctuations that persist for large times. The two specific mechanisms that we discuss are: (i) a very large initial current flow in a finite time interval, which results in anomalously large deviations (specifically, an LDP for $q_t$ with a speed that is less than $t$ [24, 25]); and (ii) a large initial current that occurs over a sustained time interval, which leads to a breakdown of the central limit theorem (CLT) for $q_t$ [8] and an LDP which generically has a non-analytic rate function [16]. The GERW illustrates mechanism (i), which we refer to as an initial giant leap (IGL); the ERW illustrates mechanism (ii), which we call a long initial excursion (LIE). We also describe several other examples of systems in which these mechanisms occur.

The structure of the paper is as follows. Sec. II introduces the models that we analyse, and some relevant theory. In Sec. III we describe the large-deviation behaviour of the ERW and GERW models. Sec. IV gives a general theory for the IGL and LIE mechanisms and Sec. V describes how this theory plays out in several other models, to illustrate its applicability. Sec. VI gives a summary of the main conclusions and open questions. Additional details of calculations are given in Appendices.

II. MODELS AND METHODS

A. Definitions of ERW and GERW
The ERW is a random walk model, in discrete time [8, 9]. The position of the elephant after step $t$ is $x_t$. In the variant of the model that we consider here, $x_t$ takes values in a finite domain $\{0, 1, \dots, L-1\}$, with periodic boundaries. (The choice of periodic boundaries does not change the physical behaviour but it is useful when comparing the large-deviation behaviour with that of Markovian systems, see Sec. II B. In the following we do not distinguish our periodic variant from the original ERW, except in the rare cases where this is necessary.) We take $x_0 = 0$ and denote the displacement of the elephant on step $t$ by $\Delta x_t$. Hence the time-averaged current is

$$ q_t = \frac{1}{t} \sum_{\tau=1}^{t} \Delta x_\tau \tag{1} $$

with $q_0 = 0$. The dynamical rule is that

$$ \Delta x_t = \pm 1 \quad \text{with probabilities} \quad \frac{1 \pm a q_{t-1}}{2}, \tag{2} $$

where $a \in (-1, 1)$ is a parameter that corresponds to $2p - 1$ [...]

[...] In the GERW, $x_t$ is a real number in $[0, L)$, still with periodic boundaries. The dynamical rule is that $\Delta x_t$ is a Gaussian-distributed real number with mean $a q_{t-1}$ and variance unity. Hence the mean value of $\Delta x_t$ (conditional on $q_{t-1}$) is the same as for the ERW. At first glance, one might also expect fluctuations in the two models to behave similarly, but in fact their memory-induced large-deviation behaviour is very different. Note that although the GERW is rather simple to analyse, it is useful to study in detail as a contrast to the ERW and to illustrate the strong effects of memory.

B. Large deviations for Markovian and non-Markovian dynamics
For the ERW and GERW, we consider the probability density for $q_t$ at large times, which we denote by $p_t(q)$. We will be particularly concerned with the tails of this probability distribution and the associated fluctuation mechanisms, which are characterised by large deviation theory [18–22]. This theory describes rare fluctuations, outside the range of CLTs and their generalisations [10]. In recent years, it has been applied to time-averaged quantities in many physical systems, yielding important new insights [20, 31–34].

For finite Markov chains, there is a well-established large deviation theory due originally to Donsker and Varadhan (DV) [35–38], see for example [23, 39] for recent summaries. Within this theory, time-averaged quantities such as $q_t$ obey LDPs of the form

$$ p_t(q) \simeq \exp[-t I(q)], \tag{3} $$

where $t$ is called the speed of the LDP, and $I$ the rate function. More generally, one may also consider LDPs of the form

$$ p_t(q) \simeq \exp\left[-t^\theta I(q)\right], \tag{4} $$

with $\theta \neq 1$. In some of the non-Markovian models considered here we find $0 < \theta < 1$, so that the speed $t^\theta$ is reduced, compared to the Markovian case. Physically, this means that the memory makes large fluctuations less rare [24, 25]. See [40–42] for some other examples where LDPs with reduced speed are associated with enhanced fluctuations.

For large deviations, an important quantity is the cumulant generating function (CGF) for $q_t$:

$$ G(\lambda, t) = \log \left\langle e^{\lambda t q_t} \right\rangle. \tag{5} $$

To analyse the limit of large $t$, we consider the scaled cumulant generating function (SCGF), which can be defined generally for LDPs with speed $t^\theta$:

$$ \psi_\theta(\lambda) = \lim_{t\to\infty} \frac{1}{t^\theta} \log \left\langle e^{\lambda t^\theta q_t} \right\rangle. \tag{6} $$

For the usual case $\theta = 1$ we omit the subscript and write simply $\psi(\lambda)$. If the limit (6) exists and certain other technical conditions are met, then the Gärtner-Ellis theorem states that the LDP (4) holds with

$$ I(q) = \sup_\lambda \left[\lambda q - \psi_\theta(\lambda)\right]. \tag{7} $$

The classical (DV) theory deals with LDPs of speed $t$.
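The Legendre-Fenchel transform (7) is easy to evaluate numerically when the SCGF is known. The following is a minimal sketch, not part of the original analysis: the function name `legendre_rate`, the grid limits, and the Gaussian test SCGF $\psi(\lambda) = \sigma^2\lambda^2/2$ (for which $I(q) = q^2/2\sigma^2$) are all assumptions chosen for illustration.

```python
import math


def legendre_rate(psi, q, lam_min=-50.0, lam_max=50.0, n=20001):
    """Approximate I(q) = sup_lam [lam*q - psi(lam)] on a uniform grid of lam.

    The grid limits and resolution are illustrative assumptions; the
    supremum must lie inside [lam_min, lam_max] for the result to be valid.
    """
    best = -math.inf
    for i in range(n):
        lam = lam_min + (lam_max - lam_min) * i / (n - 1)
        best = max(best, lam * q - psi(lam))
    return best


# Hypothetical test case: Gaussian SCGF with variance parameter sigma2.
sigma2 = 2.0


def psi_gauss(lam):
    return 0.5 * sigma2 * lam * lam


for q in (0.0, 0.5, 1.0):
    # Second and third columns should agree: numerical vs. q^2 / (2 sigma2).
    print(q, legendre_rate(psi_gauss, q), q * q / (2.0 * sigma2))
```

A grid search is used rather than a root finder so that the same routine also works for non-smooth SCGFs, at the price of a resolution-limited answer.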
Under suitable assumptions, the rate function can be shown to be analytic and strictly convex. For processes on discrete state spaces, it is sufficient that (i) the model is Markovian; (ii) transition rates (or transition probabilities) are independent of time; (iii) the model is finite and irreducible (and, for discrete-time systems, aperiodic); (iv) the contribution of each transition to the sum in (1) is fully determined by its initial and final state. For the ERW, condition (i) is violated, but the other conditions still hold. [The ERW was defined on a finite (periodic) domain so that assumption (iii) is valid.] This enables a clear comparison with the classical theory. The striking result of this comparison is that the memory effect in the ERW leads to a rate function that is (generically) singular at $q = 0$, as we show below. Such behaviour is strictly forbidden in the classical case and is directly attributable to the memory effect, via the breaking of assumption (i).

By contrast, the GERW does not allow such a clear comparison with the classical case. The model is defined on a compact domain, and assumption (iii) can be generalised to account for this, while still ensuring an analytic rate function. However, the GERW allows for jumps with $|\Delta x_t| > L$, in which case the contribution to (1) is not fully determined by the initial and final states (due to the periodic boundaries). In principle, assumption (iv) might be generalised to account for this effect, but the memory effect means that the typical jump size can diverge as $q_t \to \infty$, which would not be allowed in the classical case. In this sense, the GERW violates the classical assumptions more strongly than the ERW. We show below that this strong violation can lead to an LDP with reduced speed, $\theta < 1$.

[...] Markovian models. To see this, note that both the ERW and GERW can be formulated as Markovian models for either the current $q_t$ or (equivalently) the displacement $Q_t = t q_t$.
The case of the displacement is more natural: in this case the dynamical rule of the ERW is that $Q_{t+1} = Q_t \pm 1$ with probabilities $(1 \pm a Q_t/t)/2$. In this formulation, one sees that the transition probabilities depend explicitly on time. The GERW may be formulated in a similar way. Representing the models in this way, the Markovian assumption (i) above is now obeyed, but assumption (ii) is violated. It follows that the behaviour presented here can be viewed as either an extension of the classical theory to a particular class of non-Markovian models, or an extension to a class of Markovian models with explicit time-dependence in the rates.

In terms of methods, it is notable that the SCGF $\psi(\lambda)$ in the classical theory can be characterised as the largest eigenvalue of a matrix, the tilted generator [22]. Hence the rate function is available, via (7). Such a determination of $\psi(\lambda)$ is not possible for the models considered here, and other methods must be used, for example the theory of Dupuis-Ellis [43] as in [28], or arguments based on separation of time scales [24, 25].

We close this section by noting that since the (G)ERW models are defined on finite periodic domains, they cannot be formulated as Markovian processes for $x_t$, even with time-dependent rates. The transition rates (or transition probabilities) would need to depend on the winding number around the periodic boundaries.

C. General models
Although we use the ERW and GERW as motivating examples, we stress that the mechanisms they illuminate have much wider applicability. To this end, we introduce here a general notation for describing a broad class of models with similar memory-induced phenomenology. Several examples are given in Sec. V.

We consider models in which the time $t$ may be continuous or discrete. Let $C_t$ denote the configuration of the (general) model at time $t$; this corresponds to $x_t$ in the (G)ERW. This $C_t$ may come from a finite set as in the periodic ERW, or it may indicate a vector in some compact domain such as $[0, L]^d$. The periodic GERW is in this latter class with $d = 1$.

All models considered are jump processes. We define a time-averaged quantity that generalises (1):

$$ q_t = \frac{1}{t} \sum_{\text{jumps } j} \alpha_j \tag{8} $$

where the sum is over all jumps up to time $t$ and $\alpha_j$ depends on the properties of jump $j$. (In the ERW there is one jump on each time step and $\alpha_j = \pm 1 = \Delta x_t$.) In discrete time the model is specified by the conditional distribution of $C_{t+1}$ given $(C_t, q_t)$, supplemented by a rule specifying the contribution $\alpha_j$ for each jump. In continuous time the model is specified by a set of jump rates [dependent on $(C_t, q_t)$], and a rule specifying the $\alpha_j$. We assume that the dynamical rules do not depend explicitly on time, but only on $(C_t, q_t)$.

All the results that we present can be straightforwardly generalised to the case where $q_t$ is a time-average of a state-dependent quantity (for example $t^{-1} \int_0^t b(C_\tau)\, d\tau$ as in [23]) but we restrict here to the form (8). The models that we consider have scalar $q_t$ but the analysis is easily extended to vectorial $q_t$. For continuous-time models, some regularisation may be required for (8) at short times, see for example Sec. V A.

Consistent with Sec. II B we observe that these general models can be formulated as Markov processes $(C_t, q_t)$ but they are not Markovian for $C_t$.
Independent of this mathematical distinction, the physical role of $q$ is to capture the role of memory: its definition depends on the full history of the process.

III. FLUCTUATIONS IN THE GERW AND ERW
In this section we describe the large-deviation behaviour of (G)ERW models. For the ERW we draw on results of [8, 10, 28] and we characterise the relevant fluctuation mechanisms. For the GERW, $p_t(q)$ can be computed quite straightforwardly and leads to an LDP with reduced speed; we describe the relevant fluctuation mechanisms in this case too.

A. Preliminary results
We summarise here some preliminary results for the ERW and GERW with further detail given in Appendices A and B. For $0 < a < 1/2$, one has at large times

$$ \langle q_t^2 \rangle \simeq \frac{1}{t(1-2a)}. \tag{9} $$

Our work focusses on $a > 1/2$, for which

$$ \langle q_t^2 \rangle \simeq \frac{\chi}{t^{2(1-a)}} \tag{10} $$

where $\chi$ is an $a$-dependent constant which we denote by $\chi_E$ and $\chi_G$ for the two models. For the ERW we have $\chi_E = 1/[(2a-1)\Gamma(2a)]$ from [8].

For the GERW the total displacement is a sum of Gaussian-distributed increments so $p_t(q)$ is Gaussian at all times (although the increments are neither independent nor identically distributed). As shown in Appendix A, for large times one has

$$ p_t(q) \propto \exp\left[ -\frac{q^2 t^{2(1-a)}}{2\chi_G} \right], \tag{11} $$

where $\chi_G$ can be obtained from the limit of a series solution. (The proportionality sign is used because a $t$-dependent normalisation constant has been omitted. We use this notation in cases where the normalisation is clear from the context.) For large times, the corresponding CGF is

$$ G(\lambda, t) \simeq \frac{\lambda^2 t^{2a} \chi_G}{2}. \tag{12} $$

We now turn to the ERW. Since the second derivative of the CGF gives the variance of $t q_t$ (and using that the distribution is symmetric) one obtains from (10) an expansion in powers of $\lambda$ (at fixed $t \gg 1$):

$$ G(\lambda, t) \simeq \frac{\lambda^2 t^{2a} \chi_E}{2} + O(\lambda^4), \tag{13} $$

which is similar to (12). However, there is no CLT for $q_t$ [44, 45], which has consequences for the correction terms in (13) and their scaling with $t$.

Baur and Bertoin [10] considered typical fluctuations of $q$ at large times (that is, fluctuations with probabilities of order unity). Their theorem 3 states that $p_t(q)$ is a scaling function of $q t^{1-a}$, as in the GERW. Hence, $G$ is described by a scaling form at large $t$:

$$ G(\lambda, t) \simeq g(\lambda t^a), \tag{14} $$

which holds as $t \to \infty$, with the argument of $g$ held fixed.

In a recent mathematical study, Franchini [28] considered large deviations in Pólya urn models, which can be mapped onto the ERW [10].
Corollary 12 of [28] establishes that $p_t(q)$ follows an LDP, and that (14) extends into the large-deviation regime: taking $t \to \infty$ with fixed $\lambda \ll 1$, one has

$$ g(\lambda t^a) \simeq c_E\, t\, |\lambda|^{1/a} \tag{15} $$

for some constant $c_E$ (dependent on $a$). This result is discussed in Appendix B and a formula for $c_E$ is given in (B10). Hence for $q \ll 1$ and $t \to \infty$ one has the large-deviation result

$$ p_t(q) \propto \exp\left( -\kappa_E\, t\, |q|^{1/(1-a)} \right), \tag{16} $$

with $\kappa_E = (1-a)(a/c_E)^{a/(1-a)}$. For larger $q$ there are deviations from the scaling form; the full SCGF is given in (B4), as derived in [28]. Eq. (16) is an LDP with speed $t$ and rate function $I(q) \simeq \kappa_E |q|^{1/(1-a)}$. As advertised above, the long-ranged memory has resulted in a rate function that is non-analytic (at $q = 0$), except in exceptional cases where $1/(1-a)$ happens to be an even integer. From a physical perspective, note that if $q_t$ obeyed a CLT then its asymptotic variance would be $1/I''(0)$: here we have $I''(0) = 0$, which shows that the scaling is superdiffusive, consistent with (10).

We emphasise that the distributions (11,16) are sharply peaked as $t \to \infty$. In this sense, both systems are ergodic [11].
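The superdiffusive scaling (10) can be checked without sampling noise by propagating the exact distribution of the displacement $Q_t = t q_t$, using the Markov formulation of Sec. II B. The sketch below is an illustration only: the function names are hypothetical, and $Q_t$ is taken to be the total displacement, which is unaffected by the periodic boundaries.

```python
import math


def erw_displacement_distribution(a, t):
    """Exact distribution of Q_t = t*q_t for the ERW, as a dict {Q: prob}.

    Uses Q_1 = +-1 with probability 1/2 each (since q_0 = 0), followed by
    Q_{s+1} = Q_s +- 1 with probabilities (1 +- a*Q_s/s)/2.
    """
    dist = {1: 0.5, -1: 0.5}
    for s in range(1, t):
        new = {}
        for Q, p in dist.items():
            up = 0.5 * (1.0 + a * Q / s)
            new[Q + 1] = new.get(Q + 1, 0.0) + p * up
            new[Q - 1] = new.get(Q - 1, 0.0) + p * (1.0 - up)
        dist = new
    return dist


def second_moment(dist, t):
    """Second moment of q_t = Q_t / t under the distribution dist."""
    return sum(p * (Q / t) ** 2 for Q, p in dist.items())


a = 0.7
chi_E = 1.0 / ((2 * a - 1) * math.gamma(2 * a))
for t in (100, 400, 1600):
    m2 = second_moment(erw_displacement_distribution(a, t), t)
    # The rescaled variance should drift slowly towards chi_E as t grows.
    print(t, m2 * t ** (2 * (1 - a)), chi_E)
```

The approach to $\chi_E$ is slow because the leading correction to (10) decays only as $t^{-(2a-1)}$, which is in line with the imperfect scaling collapse attributed to finite-$t$ corrections in the numerical data.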
FIG. 1. Numerical data for ERW and GERW with a = 0.[...] $\lambda t^a$. The dotted line is the analytical (large-$t$) GERW result (12) with $\chi_G$ obtained from Appendix A. The dashed line corresponds to (15) and agrees well with the data, given that it is an asymptotic prediction that assumes that both $t$ and $\lambda t^a$ are large. There are no fitting parameters. (b) Distribution of $q$ for the ERW, which collapses to a scaling function of $q t^{1-a}$, as predicted by [10]. (The collapse is not quite perfect, which we attribute to finite-$t$ corrections to scaling.) The solid line is the analytical (large-$t$) Gaussian distribution of the GERW, for comparison; the dashed line is the prediction (16) for the tail of the distribution; the constant $\kappa_E$ is derived using results from Appendix B while the proportionality constant is determined by fitting to the data.

Fig. 1 shows numerical data, which illustrate these preliminary results. For small values of $\lambda t^a$ the CGF for the ERW is proportional to $(\lambda t^a)^2$ and matches the large-$t$ GERW result. For larger $\lambda t^a$, the CGF for the ERW matches the large-deviation form (15) without any fitting [the value of $c_E$ is given in (B10)]. For both ERW and GERW, the distribution $p_t(q)$ is a scaling function of $q t^{1-a}$. The ERW result (16) is shown in Fig. 1(b) with a dashed line: the value for $\kappa_E$ is derived from $c_E$ but the proportionality constant in (16) is used as a fitting parameter.

We close this section with a result for conditional averages. The models have fixed initial conditions and averages over the dynamics are denoted by $\langle \cdot \rangle$. Define also $\langle \cdot \rangle_{q_\tau}$ as an average that is conditioned on the value of $q_\tau$. For both GERW and ERW, averaging over the possibilities in a single step gives

$$ (\tau + 1) \langle q_{\tau+1} \rangle_{q_\tau} = (\tau + a)\, q_\tau. \tag{17} $$

It follows [8] that for $t > \tau$,

$$ \langle q_t \rangle_{q_\tau} = q_\tau \frac{\Gamma(t+a)\,\Gamma(\tau+1)}{\Gamma(t+1)\,\Gamma(\tau+a)}. \tag{18} $$

Hence for large $t$:

$$ \langle q_t \rangle_{q_\tau} \simeq \frac{\eta\, q_\tau}{t^{1-a}}, \tag{19} $$

with $\eta = \Gamma(\tau+1)/\Gamma(\tau+a)$. That is, if the elephant is conditioned to have a non-typical value of $q_\tau$, its subsequent evolution involves regression to the mean (zero) as a power law with exponent $1-a$. This result will be used in the following to rationalise the large-deviation behaviour of these models. As a point of comparison, time-averaged quantities in finite Markovian systems generically have power-law relaxation with exponent 1.

B. IGL mechanism for large deviations in the GERW
We now turn to large deviations, beginning with the GERW. Consider a discrete-time trajectory with $t$ steps which we represent using its sequence of increments: $X = (\Delta x_1, \Delta x_2, \dots, \Delta x_t)$. This trajectory occurs with probability $P(X)$ which is a multivariate Gaussian distribution, so all correlations can be computed exactly (at fixed $t$). Specifically,

$$ P(X) \propto \exp[-\mathcal{S}(X)/2] \tag{20} $$

with

$$ \mathcal{S}(X) = \sum_{\tau=0}^{t-1} \left[ (\tau+1) q_{\tau+1} - q_\tau (\tau + a) \right]^2 \tag{21} $$

where we recall that $q_\tau$ is related to the increments $\Delta x$ by (1), with $q_0 = 0$.

To characterise large deviations, the most likely path that achieves $q_t = q$ can be derived, by conditioning $P(X)$ on this rare event. Collecting terms in (21), one obtains

$$ \mathcal{S}(X) = t^2 q_t^2 + \sum_{\tau=1}^{t-1} q_\tau^2 \left( 2\tau^2 + a^2 + 2a\tau \right) - 2 \sum_{\tau=1}^{t-1} q_\tau q_{\tau+1} (\tau+1)(\tau+a). \tag{22} $$

Conditioning on $q_t$, we arrive at a Gaussian distribution for the $(t-1)$-dimensional vector $\mathbf{q} = (q_1, q_2, \dots, q_{t-1})$. This is

$$ P_{\rm micro}(\mathbf{q} \,|\, q_t) \propto \exp\left( h\, q_t\, q_{t-1} - \frac{1}{2} \mathbf{q}^T M \mathbf{q} \right), \tag{23} $$

where $h = t(t+a-1)$ and $M$ is a matrix whose elements can be read from (22). The subscript "micro" recalls that conditioning on $q_t = q$ is analogous to considering a microcanonical ensemble in thermodynamics. Completing the square in the exponent, one obtains

$$ P_{\rm micro}(\mathbf{q} \,|\, q_t) \propto \exp\left[ -\frac{(\mathbf{q} - h q_t \boldsymbol{\mu})^T M (\mathbf{q} - h q_t \boldsymbol{\mu})}{2} \right] \tag{24} $$

where $\boldsymbol{\mu}$ is given by the $(t-1)$th (last) column of $M^{-1}$. Hence the most likely path with $q_t = q$ is given by

$$ \langle q_\tau \rangle_{\rm micro} = \mu_\tau\, q\, h. \tag{25} $$

This path depends on the value of $q_t$ and on the associated time $t$. For finite $t$, the path can be straightforwardly computed numerically (for all $\tau < t$).

It is also possible to construct an optimally-controlled process (or auxiliary process) whose typical dynamics generate the most likely path to $q_t = q$. This is similar to the Doob-transformed dynamics of [22, 46], see also [47–51] and Appendix C 1 of this work for the general theory. For the GERW, we have derived the optimally-controlled process, see Appendix C 2. Its average path is $\langle q_\tau \rangle_{\rm con} = \langle q_\tau \rangle_{\rm micro}$ (for $\tau = 1, 2, \dots, t$). Fig. 2(a) shows results illustrating the average path under the controlled dynamics, also compared with the ERW (see below).
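The most likely path (25) can be computed directly at finite $t$ by solving the linear system $M\mathbf{q} = h q_t \mathbf{e}_{t-1}$, where $M$ is tridiagonal with diagonal entries $\tau^2 + (\tau+a)^2$ and off-diagonal entries $-(\tau+1)(\tau+a)$, as read off from (22). The sketch below is one possible implementation and not necessarily the numerical method used for the figures: the function names and the choice of the Thomas algorithm are assumptions.

```python
def most_likely_gerw_path(a, t, q_final):
    """Most likely GERW path (q_1, ..., q_t) ending at q_t = q_final.

    Solves M q = h * q_final * e_{t-1} with the Thomas algorithm, where M
    is the tridiagonal matrix read off from (22). Requires t >= 3.
    """
    n = t - 1  # unknowns q_1 .. q_{t-1}
    diag = [tau ** 2 + (tau + a) ** 2 for tau in range(1, n + 1)]
    off = [-(tau + 1) * (tau + a) for tau in range(1, n)]  # couples tau, tau+1
    rhs = [0.0] * n
    rhs[-1] = t * (t + a - 1) * q_final  # h * q_t

    # Forward elimination.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = off[i] / denom
        d_prime[i] = (rhs[i] - off[i - 1] * d_prime[i - 1]) / denom

    # Back substitution.
    q = [0.0] * n
    q[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        q[i] = d_prime[i] - c_prime[i] * q[i + 1]
    return q + [q_final]


def action(path, a):
    """S(X) of (21) for a path (q_1, ..., q_t), with q_0 = 0."""
    total, q_prev = 0.0, 0.0
    for tau, q_next in enumerate(path):  # tau = 0, 1, ..., t-1
        total += ((tau + 1) * q_next - (tau + a) * q_prev) ** 2
        q_prev = q_next
    return total


a, t, q_final = 0.7, 1000, 0.2
path = most_likely_gerw_path(a, t, q_final)
# Increments: Delta x_tau = tau*q_tau - (tau-1)*q_{tau-1}.
hops = [path[0]] + [(k + 1) * path[k] - k * path[k - 1] for k in range(1, t)]
print("first hops:", [round(x, 3) for x in hops[:5]])
print("last hops: ", [round(x, 3) for x in hops[-3:]])
```

Printing the increments rather than $q_\tau$ itself makes it easy to compare the size of the early hops with that of the late ones.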
The mechanism for achieving a rare value of $q_t$ is that the GERW makes very large hops on the first few steps, after which $q_\tau$ decreases towards $q_t$.

The results so far are valid for any finite time but we are interested in large deviations as $t \to \infty$. In this case the problem may be simplified. We characterise the most likely path as the minimum of the exponent in (24). Writing the matrix product as a sum over time steps [similar to (21)], we fix some $K$ and separate the sum into terms with $\tau \le K$ and $\tau > K$. For $\tau > K$ we make the replacement $q_\tau \to \tilde{q}(\tau)$ where $\tilde{q}$ is a smooth function of $\tau$; this allows the sum to be estimated by an integral. Fixing values for $q_K$ and $q_t$, the action can be minimised (exactly) over the function $\tilde{q}$, which is equivalent to solving the instanton equation in [25]. One finds

$$ \tilde{q}(\tau) = C_1 \tau^{-a} + C_2 \tau^{-(1-a)} \tag{26} $$

where $C_1$ and $C_2$ are fixed by the boundary conditions at $\tau = K, t$. Writing $u = K/t$ (so $0 < u < 1$), the contribution to $\mathcal{S}$ from this path is [25]

$$ \mathcal{S}_2 = \frac{u t\, (2a-1)\, (1 - u^{2a-1}) \left( q_t - q_K u^{1-a} \right)^2}{(u^a - u^{1-a})^2}. \tag{27} $$

FIG. 2. (a) Averaged paths $\langle q_\tau \rangle_{\rm con}$ of the controlled dynamics with t = 10^[...], and $\langle q_t \rangle_{\rm con} = 0.2$, for the ERW and GERW with $a = 0.7$. These illustrate the IGL and LIE mechanisms. The dashed line indicates the long-time behaviour (29) which is common to both ERW and GERW. (In the plot, this has been offset by $C = 0.14$, for clarity.) (b) Theoretical estimate for $\psi(\lambda)$ in the ERW, again with $a = 0.7$, derived at t = 10^[...] and compared with numerically exact results for small $\lambda$. The dashed and dotted lines indicate the predicted power laws $\psi \propto \lambda^{1/a}$ and $\psi \propto \lambda$ respectively.

For this optimal path, (21) then reduces to

$$ \mathcal{S} = \sum_{\tau=0}^{K-1} \left[ (\tau+1) q_{\tau+1} - q_\tau (\tau+a) \right]^2 + \mathcal{S}_2 \tag{28} $$

which is to be minimised over $q_1, q_2, \dots, q_K$. This is the procedure used to obtain the GERW paths in Fig. 2(a). We typically take $K = 40$; this choice does not strongly affect the results because replacing the discrete sum by an integral is accurate for $K \gg 1$.

For $a > 1/2$, the behaviour of (26) for large $t, \tau$ gives

$$ \langle q_\tau \rangle_{\rm con} \approx q_t\, (t/\tau)^{1-a}, \tag{29} $$

similar to [25]. Comparing with (19), we see that the long-time behaviour of the optimally-controlled dynamics matches the natural regression to the mean.

Extrapolating (29) back to $\tau = 1$ indicates that for (rare) paths that end at $q_t$, the first hop should have size $q_t t^{1-a}$, which diverges as $t \to \infty$. In fact the early-time behaviour is more complex but the size of the first hop is indeed of this order. The diverging hop is the reason that we call this mechanism an initial giant leap (IGL). It applies in the Gaussian elephant for all fluctuations with $q_t = O(1)$ as $t \to \infty$.

Two comments are in order. First, the analysis here for $\tau > K$ recovers exactly that of [25]; the fact that the distribution of $q_t$ is sharply peaked under the controlled dynamics can be used to justify the so-called temporal additivity assumption in that work, for $\tau \ge K \gg 1$. For $\tau < K$, however, it is important that the model evolves by discrete time steps and that $q_t$ can change significantly in a single step. This means that the temporal additivity assumption is not valid in this regime. For this reason, quantitative results for $p_t(q)$ require a more detailed analysis of early times, without the temporal additivity assumption. We accomplish this here by analysing numerically the sum of terms with $\tau < K$. The second comment is that we use the language of a giant leap, but we note that the GERW makes (on average) very large jumps on several of the early time steps. We explain below that we are using IGL to refer to any divergent displacement $q^*$ in a finite time interval $\tau^*$, see Sec. IV B.

C. LIE mechanism for large deviations in the ERW
As discussed in Sec. III A, large-deviation properties of the ERW are available from [28]. In particular, there is an LDP for $q_t$ with speed $t$ whose rate function behaves for small $q$ as

$$ I(q) \simeq \kappa_E |q|^{1/(1-a)}. \tag{30} $$

Correspondingly,

$$ \psi(\lambda) \simeq c_E |\lambda|^{1/a}, \tag{31} $$

for small $\lambda$. [Recall Eqs. (15,16).]

We characterise here the mechanism responsible for (31), by deriving a controlled process which captures the behaviour of the relevant conditioned path ensemble, see Appendices C 1 and C 3. This controlled process is similar to the original process, but now $\Delta x_\tau = \pm 1$ with probabilities $(1 \pm b_\tau)/2$, where $(b_1, b_2, \dots, b_t)$ are variational parameters that we optimise, to reproduce the large-deviation mechanism.

This analysis yields a controlled process for which $\langle q_\tau \rangle_{\rm con}$ is shown in Fig. 2(a): for early times, typical paths have $q_\tau \approx 1$ [...] long initial excursion (LIE). For larger times, $q_\tau$ decreases. Fig. 2(b) shows our theoretical estimate of $\psi(\lambda)$ obtained by a variational analysis at finite $t$, compared with numerically exact results from direct simulation. The theoretical estimate (i) matches the exact result in the region where numerical results are available; (ii) is consistent with (31) for $t^{-a} \ll \lambda \ll 1$; (iii) recovers $\psi(\lambda) \simeq |\lambda|$ for large $\lambda$, which is the exact result (since $|q| \le 1$). [...] $\psi(\lambda)$.

It can also be shown that the averaged paths in Fig. 2(a) capture the true fluctuation mechanism. We sketch the argument. At the level of large deviations, the true mechanism is the path measure $P_{\rm con}$ that achieves equality in (C4). From [28], the large-deviation event $q_t = q$ is associated with a single path, in the sense that the conditional distribution of $q_{\alpha t}$ is sharply peaked as $t \to \infty$ for all $\alpha \in (0, 1)$.

[...] $q = 0$, for (symmetric) models similar to the ERW. This predicts dominant paths similar to (26). The results presented here show that such an expansion is not generically valid: for all $q_t \neq 0$, large-deviation events involve initial excursions far from $q_t = 0$, and the quadratic expansion breaks down. Nevertheless, if $q_t \ll 1$ [...] $\tau$, that is $\langle q_\tau \rangle_{\rm con} \approx (t/\tau)^{1-a} q_t$ as in (29). [For the ERW, this result is valid for $a > 1/2$, $q_t \ll 1$, and $t, \tau \to \infty$ such that also $\langle q_\tau \rangle_{\rm con} \ll 1$.] [...] $\tau$ is sufficiently large.

In Sec. IV C below, we exploit this fact to show that the scaling $\psi(\lambda) \sim |\lambda|^{1/a}$ of (31) is generic if optimally-controlled processes have (i) $\langle q_t \rangle_{\rm con} \approx 1$ for a time period $\tau^* \sim t$, and (ii) $\langle q_\tau \rangle_{\rm con} \sim \tau^{-(1-a)}$ for long times. This is the sense in which the ERW is a prototype for a general fluctuation mechanism.

IV. GENERIC FLUCTUATION MECHANISMS

A. Overview of method
We have explained that the large-deviation behaviour of the ERW and GERW is different from that expected in Markov chains. Fluctuations in these models occur by mechanisms where the particle makes a large excursion from the origin at early times, which biases all future motion in the same direction, via the memory effect. This leads to a reduced speed in the LDP of the GERW and to a singular rate function in the ERW. The difference between ERW and GERW arises from the different characters of their initial excursions (a giant leap over a finite time for the GERW and a long excursion scaling with trajectory length for the ERW). In this section we explain that such phenomena are relevant for a broad class of non-Markovian models. We provide general conditions under which excursions can occur, and explain their consequences for LDPs.

We consider models where $q_t$ converges to its mean as $t \to \infty$ (to be precise, this is convergence in probability). We denote this mean value by

$$ q_\infty = \lim_{t\to\infty} \langle q_t \rangle. \tag{32} $$

For simplicity we discuss deviations with $q_t > q_\infty$; the opposite case is a straightforward analogue. We consider excursions which extend over a time period between $t = 0$ and some time $\tau^*$. The probability $p_t(q)$ can be bounded from below by restricting to paths where the size of the excursion is at least $q^*$, that is $q_{\tau^*} \ge q^*$. By conditional probability:

$$ \log p_t(q) \ge \log p_t(q \,|\, q_{\tau^*} \ge q^*) + \log P(q_{\tau^*} \ge q^*) \tag{33} $$

where $P(q_{\tau^*} \ge q^*)$ is the probability of the excursion and $p_t(q \,|\, q_{\tau^*} \ge q^*)$ is the corresponding conditional probability density for $q_t$.

The inequality (33) is valid for all $q^*, \tau^*$. Now suppose that $q, t$ are given and we seek a useful bound on $p_t(q)$: this requires that we choose suitable values for $q^*, \tau^*$. To this end, we introduce the notation $\langle \cdot \rangle_{q^*,\tau^*}$ for averages that are conditioned on $q_{\tau^*} \ge q^*$.
Then we choose $q^*, \tau^*$ such that $\langle q_t \rangle_{q^*,\tau^*} = q$ and we further assume that the conditional distribution of $q_t$ is sharply peaked at this value. This means that if we consider trajectories where a suitable excursion has already taken place before $\tau^*$, then following the natural dynamics of the model for $t > \tau^*$ will result in $q_t \approx q$ with a probability close to unity. Under these assumptions, (33) reduces to

$$ \log p_t(q) \gtrsim \log P(q_{\tau^*} \ge q^*). \tag{34} $$

In other words, we now have a more explicit lower bound on $p_t(q)$ which is valid if

$$ \langle q_t \rangle_{q^*,\tau^*} = q. \tag{35} $$

(The additional requirement that the conditional distribution is sharply peaked is always obeyed in the following.)

The strategy in Secs. IV B and IV C below is to characterise situations in which (34) can be used to establish LDPs that differ from those expected in finite Markovian models. In particular, we now establish a sufficient condition for memory to have a strong effect on the large-$t$ behaviour. Physically, the idea is that after the excursion, the time-averaged current relaxes to its steady-state value as a power law with exponent $1-a$, as established in (19) for the (G)ERW. Finite Markovian systems relax generically as $t^{-1}$, so $a$ encodes the effects of memory; this is related to the fixed-point stability analysis of [25]. The condition that we will require is that for $t > \tau^*$,

$$ \langle q_t - q_\infty \rangle_{q^*,\tau^*} \simeq (q^* - q_\infty)\, F(q^*, \tau^*) \left( \frac{\tau^*}{t} \right)^{1-a} \tag{36} $$

for some function $F$, and some number $a \in (0, 1)$.

[...] $p_t(q)$. We must first establish (36) for a particular model, at least for $q^*$ in some range. To bound $p_t(q)$ for specific values of $q, t$, we must then find a combination $q^*, \tau^*$ such that (36) holds, with $q_t = q$. As long as this is possible, the constraint (35) is satisfied and the resulting $q^*, \tau^*$ can be substituted into (34) to obtain a bound on $p_t(q)$.
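As a concrete illustration of this strategy, the bound (34) can be evaluated for the GERW with an excursion on the first step only ($\tau^* = 1$). The sketch below rests on two assumptions that go beyond the text: the conditioning $q_1 \ge q^*$ is approximated by $q_1 = q^*$, so that the exact regression formula (18) with $\tau = 1$ fixes $q^*$; and the first increment is a standard normal variable, which follows from the GERW rule with $q_0 = 0$. The function name is hypothetical.

```python
import math


def gerw_first_step_bound(q, t, a):
    """Evaluate the lower bound (34) for the GERW with tau* = 1.

    Returns (q_star, log_prob): q_star is chosen so that (18) with tau = 1
    gives <q_t> = q, and log_prob = log P(q_1 >= q_star) for a standard
    normal first step. Treating q_1 >= q_star as q_1 = q_star is an
    approximation made for this sketch.
    """
    # log of Gamma(t+a) Gamma(2) / [Gamma(t+1) Gamma(1+a)], from (18).
    log_decay = (math.lgamma(t + a) + math.lgamma(2.0)
                 - math.lgamma(t + 1.0) - math.lgamma(1.0 + a))
    q_star = q * math.exp(-log_decay)
    # Gaussian tail probability via the complementary error function.
    log_prob = math.log(0.5 * math.erfc(q_star / math.sqrt(2.0)))
    return q_star, log_prob


a, q = 0.7, 0.2
for t in (10 ** 2, 10 ** 4, 10 ** 6):
    q_star, log_prob = gerw_first_step_bound(q, t, a)
    # The last column should vary only slowly with t if the bound scales
    # as t^{2(1-a)}, i.e. with a speed smaller than t.
    print(t, q_star, -log_prob / t ** (2 * (1 - a)))
```

The required excursion $q^*$ grows as $t^{1-a}$, so the cost of the excursion grows as $t^{2(1-a)}$, in line with the speed appearing in the exact GERW result (11).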
Note that the combination $q_*, \tau_*$ depends in general on $t$; the final step is to take $t \to \infty$ in order to characterise large deviations that occur in this limit. This strategy is similar to those used in [16].

B. Generic IGL mechanism
We now show how a generic IGL mechanism leads to a useful bound. We achieve this by laying out the properties that a model should have, in order that this mechanism is relevant. A defining feature of the IGL is that it takes place over a finite time period $\tau_*$ and that the size of the excursion diverges in the limit $t \to \infty$.

The first requirement is that the model of interest supports very large excursions. To characterise their probability, we require that there exists some $\tau_*$ such that for $q_* \to \infty$ we have

$$ \log P(q_{\tau_*} \geq q_*) \simeq -\gamma\, |q_* - q_\infty|^{\beta} , \tag{37} $$

with $\gamma, \beta > 0$. Since we consider divergent excursions, we require that (36) remains valid even as $q_* \to \infty$. In the following we take $\tau_*$ to be a fixed parameter; the choice of its value is discussed below. We define

$$ f_*(\tau_*) = \lim_{q_* \to \infty} F(q_*, \tau_*) \tag{38} $$

which we require to be strictly positive. These requirements place strong constraints on the range of models for which the IGL mechanism will determine the large deviations but, as we demonstrate, such models do indeed exist. Then (33,36) with $q = \langle q_t \rangle_{q_*,\tau_*}$ yield

$$ -\log p_t(q) \lesssim t^{\beta(1-a)}\, |q - q_\infty|^{\beta}\, \kappa_{\rm IGL} \tag{39} $$

with $\kappa_{\rm IGL} = \gamma\, f_*(\tau_*)^{-\beta}\, \tau_*^{-\beta(1-a)}$.

Equ. (39) corresponds to an LDP with speed $t^{\beta(1-a)}$. If this speed is less than $t$, fluctuations are qualitatively larger than one finds in generic Markovian systems. In principle the bound (39) can be optimised over $\tau_*$. However, (39) already establishes that the speed of the LDP can be less than $t$, without any requirement for optimisation over $\tau_*$. This is the central result. In this sense, the specific value of $\tau_*$ is not crucial.

The GERW satisfies all the requirements for the IGL mechanism, with $\beta = 2$; one may take $\tau_* = 1$. The applicability of (36) was already shown in (19). The resulting bound is consistent with the exact result (11): it gives the right scaling with $t$ and the correct general mechanism. However, the constant $\kappa_{\rm IGL}$ obtained from this generic argument does not coincide with the prefactor in the exponent of (11): obtaining that result requires the more detailed (model-dependent) calculation of Sec. III B.

We note in passing that some arguments of [25] are similar to those of this section, but the connection between the giant leap and the reduced speed of the LDP was neglected in that work. In particular, the requirement that (36) must hold as $q_* \to \infty$ means that some care is required when applying the arguments of [25] to generic models; they are not valid in the ERW, for example.

C. Generic LIE mechanism
The LIE mechanism is generically associated with excursions that have finite $q_*$ but diverging $\tau_*$ (proportional to $t$). This may be compared with the IGL, which has fixed $\tau_*$ and diverging $q_*$. The LIE mechanism has two central requirements, which must hold for some $q_*$ different from $q_\infty$. First, (36) must hold asymptotically for $1 \ll \tau_* \ll t$. Second,

$$ f_\ddagger(q_*) = \lim_{\tau_* \to \infty} F(q_*, \tau_*) \tag{40} $$

must be strictly positive. Comparing with (38), the roles of $q_*, \tau_*$ are reversed.

Under these conditions, we assume that there is an LDP with speed $t$ as in (3), verify the self-consistency of this assumption, and establish a bound on the rate function $I(q)$ for $|q - q_\infty| \ll 1$. Since $\tau_*$ is proportional to $t$, this means that

$$ P(q_{\tau_*} \geq q_*) \simeq \exp[-\tau_* I(q_*)] \tag{41} $$

which is analogous to (37). Using this with (34,36) yields

$$ -\log p_t(q) \lesssim t\, \kappa_{\rm LIE}\, |q - q_\infty|^{1/(1-a)} \tag{42} $$

with

$$ \kappa_{\rm LIE} = I(q_*) \left( \frac{1}{|q_* - q_\infty|\, f_\ddagger(q_*)} \right)^{1/(1-a)} . \tag{43} $$

The result (42) is consistent with the assumption of an LDP with speed $t$, but it shows (for $a > 1/2$) that the rate function increases from zero more slowly than any quadratic function. As noted above, this means that $I''(0) = 0$, corresponding to superdiffusive scaling.

[FIG. 3 (plot of $\langle q_\tau \rangle_{\rm con}$ against $\tau$; curves: ERW and $\min(1, q_t (t/\tau)^{1-a})$). Comparison between the optimally controlled path for the ERW (similar to Fig. 2) and the corresponding generic LIE path used to derive (42). The generic LIE path has an excursion with $q_* = 1$, after which $q_\tau$ relaxes back towards zero, as the system follows its natural dynamics (19). The generic LIE path does not capture the details of the optimally-controlled (instanton) path, which means that the coefficient $\kappa_{\rm LIE}$ does not match $\kappa_E$ in (16), but the generic LIE argument is sufficient to capture the non-quadratic form of the rate function at $q = 0$.]

In addition, by Varadhan's lemma [a standard result in large deviation theory [18, 21], which amounts to the inverse Legendre transform of (7)], one obtains

$$ \psi(\lambda) \gtrsim \sup_q \left[ \lambda q - |q - q_\infty|^{1/(1-a)}\, \kappa_{\rm LIE} \right] \tag{44} $$

which gives

$$ \psi(\lambda) \gtrsim \lambda q_\infty + |\lambda|^{1/a}\, c_{\rm LIE} \tag{45} $$

with

$$ c_{\rm LIE} = a \left( \frac{1-a}{\kappa_{\rm LIE}} \right)^{(1/a)-1} . \tag{46} $$

All these generic arguments are consistent with the behaviour of the ERW, which has $q_\infty = 0$. In particular, the requirement for (36) to hold asymptotically follows from (19).

Moreover, the results of Sec. III C indicate that the true fluctuation mechanism for the ERW is an LIE with $q_* = 1$. Also $p_t(1) = (1 + a)^{t-1}/2^t$ because all hops have $\Delta x_t = 1$ in this case, so $I(1) = \log[2/(1 + a)]$. The coefficient in (19) is $\eta \simeq \tau^{1-a}$ as $\tau \to \infty$, which means $f_\ddagger(q_*) = 1$. Hence the bound (42) holds with $\kappa_{\rm LIE} = \log[2/(1 + a)]$. The exact result for the ERW can be obtained from $\kappa_E = (1 - a)(a/c_E)^{a/(1-a)}$ as quoted in Sec. III A, together with (B10).

For the representative case $a = 0.7$, we find $\kappa_E = 0.$… and $\kappa_{\rm LIE} = 0.16$. Given that the generic LIE argument is much simpler than the full calculation of $\kappa_E$, this level of agreement is reasonable. The generic LIE argument is based on a simple path (or equivalently a simple controlled process) that includes a long excursion: the path is illustrated in Fig. 3, where it is compared with the optimal LIE path discussed in Sec. III C. The generic LIE path captures the correct qualitative behaviour and matches the optimal path for small and large times. However, the agreement is not quantitative, and the difference between $\kappa_{\rm LIE}$ and $\kappa_E$ reflects this.

D. Discussion of generic IGL and LIE mechanisms
We summarise the difference between the IGL and LIE mechanisms. The IGL makes a giant (divergent) excursion in a finite time and leads to an LDP with reduced speed. The LIE makes a finite excursion over a long (divergent) time period; it leads to an LDP with speed $t$, and to a rate function with $I''(q_\infty) = 0$ which is (generically) non-analytic at $q_\infty$. In all the examples that we have managed to construct, the IGL mechanism relies on microscopic transition rates that diverge as $q_t \to \infty$, in order to satisfy (36).

The IGL mechanism has an interesting analogy with condensation in interacting-particle systems [52, 53]: to achieve $q_t = q$ the system must support an excess current which may be distributed over a macroscopic fraction of the time period (as in the LIE), or condensed into a finite time interval (the IGL). A similar phenomenon is described by the "single-big-jump" principle for sums of random variables (including certain types of correlated process) [54]; the particular history-dependence in our models, with $a > 0$, constrains the condensation to take place at the beginning of the time period.

We close this section by noting that (39,42) are both lower bounds on the probability $p_t(q)$. Physically, this means that fluctuations can take place by IGL and LIE mechanisms, so fluctuations of a given size $q$ are at least as likely as (39,42) predict. We have not ruled out competing mechanisms that might allow fluctuations of the same size to occur in a more likely way. As a simple example, an LIE bound can be obtained for the GERW but does not accurately describe the probability of rare fluctuations, because the IGL mechanism is available and occurs with (much) higher probability. [Indeed it is easy to see that the IGL mechanism, if available, will always dominate the LIE mechanism if $\beta(1 - a) < 1$.]

V. EXAMPLE MODELS EXHIBITING IGLS AND LIES
By considering IGLs and LIEs, we have established simple and generic requirements which enable bounds on the probabilities of large-deviation events. It is straightforward to construct (or identify) other models that exhibit these mechanisms. In this section we give a brief discussion of three such cases. Similar to the ERW in Sec. III C, we establish bounds on the probabilities of large excursions by using arguments based on optimal control theory; these computations then enable us to check conditions for the IGL and LIE mechanisms. Our main purpose here is not to describe the model behaviour in detail, but rather to illustrate the general relevance of the identified mechanisms.
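Before turning to the examples, the prefactors appearing in the generic bounds (39) and (42) of Sec. IV, and the Legendre step from (44) to (45), can be checked numerically. The sketch below is illustrative only: the function names are our own, $q_\infty = 0$ is assumed in the Legendre check, and the supremum is estimated on a simple grid rather than in closed form. The ERW inputs ($q_* = 1$, $f_\ddagger = 1$, $I(1) = \log[2/(1+a)]$) are those quoted in Sec. IV C.

```python
import math


def kappa_igl(gamma, f_star, tau_star, beta, a):
    """Prefactor of the IGL bound (39)."""
    return gamma * f_star ** (-beta) * tau_star ** (-beta * (1.0 - a))


def kappa_lie(rate_at_q_star, q_star, q_inf, f_dagger, a):
    """Prefactor (43) of the LIE bound (42)."""
    return rate_at_q_star * (abs(q_star - q_inf) * f_dagger) ** (-1.0 / (1.0 - a))


def c_lie(kappa, a):
    """Coefficient (46) of |lambda|**(1/a) in the bound (45)."""
    return a * ((1.0 - a) / kappa) ** (1.0 / a - 1.0)


def legendre_sup(lam, kappa, a, q_max=1.0, n=100_000):
    """Grid estimate of sup over 0 <= q <= q_max of lam*q - kappa*q**(1/(1-a)),
    i.e. the right-hand side of (44) with q_inf = 0 and lam > 0."""
    p = 1.0 / (1.0 - a)
    return max(lam * q - kappa * q ** p
               for q in (q_max * i / n for i in range(n + 1)))


a = 0.7
kappa = kappa_lie(math.log(2.0 / (1.0 + a)), q_star=1.0, q_inf=0.0, f_dagger=1.0, a=a)
lam = 0.05  # small enough that the maximising q lies inside [0, 1]
print(kappa)                                   # LIE prefactor for the ERW
print(legendre_sup(lam, kappa, a))             # numerical supremum in (44)
print(c_lie(kappa, a) * lam ** (1.0 / a))      # closed form (45) with q_inf = 0
print(2 * (1.0 - a) < 1.0)                     # GERW: IGL speed exponent beta(1-a) below 1
```

The last line restates the criterion $\beta(1-a) < 1$ under which the IGL bound has a smaller speed than the LIE bound, evaluated for the GERW value $\beta = 2$.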
A. IGL in unidirectional hopping model
As an example in continuous time, we modify the unidirectional walker model of [24]. Similarly to the ERW, we consider a particle with integer-valued position $x_t$ which we identify with the configuration $C_t$. The particle always hops in the same direction so $\Delta x_t = 1$. We define $q_t$ as the total time-averaged displacement which corresponds to (8) with $\alpha_j = 1$ for all jumps. In the variant of the model that we consider, the particle makes its first jump at time $t_0$; subsequent jumps occur with rate

$$ r(q_t) = a q_t , \tag{47} $$

where $0 < a < 1$. The regularisation parameter $t_0$ is important because if one allows jumps to occur at arbitrarily early times then $q_t$ in (8) can become arbitrarily large after just one jump; combined with (47), this can lead to pathological fluctuations.

The results of [24] indicate that large deviations with $q_t > 0$ involve excursions of size $q_* \sim t^{1-a}$, leading to an LDP with speed $t^{1-a}$. However, that work made an assumption of temporal additivity which (strictly speaking) is valid only for $t_0 \gg 1$. Here we discuss the case where $t_0$ takes any positive value; we show that the IGL mechanism operates, and $p_t(q)$ can be bounded as in (39), which is consistent with an LDP with speed $t^{1-a}$.

To analyse the IGL we take $\tau_* = 2 t_0$. In this case we show in Appendix D that

$$ \log P(q_{\tau_*} \geq q_*) \gtrsim -\gamma_{\rm uni}\, q_*\, t_0 , \tag{48} $$

with $\gamma_{\rm uni} = O(1)$ as $q_* \to \infty$. That is, the probability of a large excursion to $q_*$ in a finite time decays at most exponentially in $q_*$. This establishes the requirement (37) for an IGL.

Moreover, after the excursion the average displacement obeys

$$ \tau \frac{\partial}{\partial \tau} \langle q_\tau \rangle_{q_*,\tau_*} = (a - 1) \langle q_\tau \rangle_{q_*,\tau_*} , \tag{49} $$

which follows directly from the master equation of the model. Similar to (19), integrating this equation yields $\langle q_\tau \rangle_{q_*,\tau_*} = q_* (\tau_*/\tau)^{1-a}$ which is exactly the required condition (36) with $q_\infty = 0$ and $F(q_*, \tau_*) = 1$. Note that this holds even as $q_* \to \infty$, which is related to the fact that $r(q_*)$ diverges in this limit. To apply (34) requires that the conditional distribution of $q_t$ after the excursion is sharply peaked: this is easily verified.

Hence, the conditions for an IGL are in place and we have established (39) with $\beta = 1$ and $f_*(\tau_*) = 1$, that is

$$ -\log p_t(q) \lesssim t^{1-a}\, \kappa_{\rm uni}\, q , \tag{50} $$

with $\kappa_{\rm uni} = 2^{a-1} \gamma_{\rm uni}\, t_0^{a}$, using $\tau_* = 2 t_0$, from above. This corresponds to an LDP with speed $t^{1-a}$ as shown in [24, 25] by arguments based on an assumption of temporal additivity. Our analysis avoids any such assumption; it also shows that the unusual speed of the LDP arises because the fluctuation mechanism is an IGL.

The result (50) applies to the unidirectional model with $r(q) = aq$ but, in fact, the main ingredient required in the analysis was $\lim_{q \to \infty} [r(q)/q] = a$ (with $0 < a < 1$).

B. LIE in cluster growth models
We consider a model of a growing cluster as in [14–16].The cluster contains two types of particles (for exam-ple, red and blue) whose numbers at time t are n Rt and n Bt . The cluster evolves in discrete time and a singleparticle is added on each step, so n Rt + n Bt = t . (Thisis the irreversible model of [14], in that particles areadded but never removed.) The configuration is givenby C t = ( n Rt , n Bt ) and we take q t = ( n Rt − n Bt ) /t whichmeans that α t = ± t , the added particle is red (+) or blue ( − )with probability (1 ± tanh Jq t − ) / J > m t = ( n Rt − n Bt ) is similar to that of theERW position x t , but with the nonlinear tanh functionreplacing the linear function in (2). This nonlinearityleads to a symmetry-breaking transition: for J < q t ≈ J > q t ≈ ± ¯ m , which corresponds to spontaneous de-mixing. Here ¯ m is the order parameter for the underlyingphase transition [14]. Large deviations in this model werediscussed previously in [15, 16], it may be also formulatedas an urn model so the results of [28] are applicable.In the mixed (one-phase) regime, the behaviour of thismodel is qualitatively similar to the ERW. It can beanalysed similarly to Sec. III C, using the same (gen-eral) controlled model: red/blue particles are added withprobabilities (1 ± b t ) /
2. The theoretical arguments ofAppendix C 1 can then be applied. Indeed, these ideaswere already applied to the growth model in [16]: for1 / < J < a = J . However, that paper did not come to a definitiveconclusion about the speed of the LDP in this regime.The general results of the present work can be used toresolve this open question, and to understand the rare-event mechanism. We outline the argument below (againfor 1 / < J < t , so one may expect an LIE mechanism,similar to the ERW. Moreover, Ref. [16] showed that (19)holds in this system for relaxation as t → ∞ after aninitial excursion. However, contrary to the ERW, thisresult is now valid only for (cid:104) q t (cid:105) q τ (cid:28)
1. This establishesthat (36) holds, but only for small values of q ∗ .We therefore fix some small value for this parame-ter and construct the LIE, using (36) as in Sec. IV Cto fix τ ∗ = t ( q ∗ f ‡ ( q ∗ ) /q t ) − / (1 − a ) so that the naturaldynamics after the excursion arrives at q t with proba-bility 1. By [28], this excursion has Prob( q τ ∗ ≥ q ∗ ) (cid:39) exp[ − τ ∗ I ( q ∗ )], although the rate function I is not knownexplicitly. These results can be used with (34) to obtain − log p t ( q ) (cid:46) tκ LIE | q | / (1 − a ) , (51)as in (42). Since the validity of (36) is restricted to small q ∗ , this construction is restricted to small q (strictly pos-itive and fixed as t → ∞ ). Still, this generic bound issufficient to establish the non-analytic behaviour of therate function at q = 0. The use of a fixed small value of q ∗ is convenient for this argument but is not expected tobe optimal for the large-deviation mechanism; in fact weanticipate the true large-deviation mechanism to involvean excursion with q ∗ = 1, as for the ERW. This meansthat the prefactor κ LIE is likely to be far from optimal,but the scaling (51) is expected to be robust.The overall picture is that for small values of q t (fixedas t → ∞ ), the cluster growth model with 1 / < J < a = J , exhibitingan LIE fluctuation mechanism and a rate function thatincreases from zero with exponent 1 / (1 − a ). Physically,the similarity can be explained by an argument similarto the fixed-point stability analysis of [25], because theexponent that appears in the LIE bound only depends onthe asymptotic (long-time) dynamics close to the fixedpoint. For models that can be formulated as urns [28],we therefore expect these similarities with the ERW tobe generic, based on an expansion of the urn functionabout the fixed point. C. LIE in a non-Markovian exclusion process
We consider a non-Markovian symmetric exclusion process (SEP) where $N$ particles hop in continuous time on a periodic one-dimensional lattice of $L$ sites, subject to the constraint that each site may contain at most one particle. We define $n_i = 1$ if site $i$ contains a particle, and $n_i = 0$ otherwise. A configuration is specified as $C = (n_1, n_2, \ldots, n_L)$. The time-averaged current is $q_t = (Lt)^{-1} \sum_{{\rm jumps}\ j} \Delta x_j$, as in (8), where the sum is over all particle hops, with $\Delta x_j = \pm 1$. Fluctuations of $q_t$ have been studied extensively in the Markovian case [55, 56]. For non-Markovian models, similar quantities have been studied in [25, 57]. Given the connections between exclusion processes and traffic modelling [58], the generalisation of such models to include memory of previous flow (current) is quite natural [25].

[FIG. 4. Numerical results for a non-Markovian symmetric simple exclusion process with $N = 8$ particles. (a) $\langle q_t^2 \rangle$ against $t$ for three values of $\nu$. For the two larger values, particle motion is superdiffusive so the variance of $q_t$ decays as a power law consistent with (10); dashed lines indicate power-law behaviour with the theoretically-predicted exponent $a = \nu/\nu_c$. For the smallest value, $\langle q_t^2 \rangle \propto t^{-1}$, since $\nu < \nu_c/2$. (b) The distribution $p_t(q)$ in the superdiffusive regime is similar to the ERW in Fig. 1; the dashed black line is a fit to (16) with $a = \nu/\nu_c$.]

We introduce here a memory of mean-field type, so that every particle hops either right ($+$) or left ($-$) with rate $w_\pm = [1 \pm \tanh(\nu q_t)]/2$, as long as the destination site is empty. The non-linearity in this model is similar to that of the cluster growth model, which leads to some similar phenomenology.

It is useful to note that detailed balance is broken in this model (except for $q_t = 0$), but the dynamical rules for any given $q_t$ correspond to an asymmetric simple exclusion process with periodic boundaries, whose stationary state has all particles distributed independently (subject to the exclusion constraint). Assuming that the system is in such a stationary state at time $t$, and its time-averaged current is $q_t$, the (average) rate for accepted particle hops is

$$ \left\langle L \frac{d}{dt} (t q_t) \right\rangle_{q_t} = N\, \frac{L - N}{L - 1}\, \tanh(\nu q_t) . \tag{52} $$

Here the factor of $(L - N)/(L - 1)$ is the probability that a site adjacent to a given particle is vacant. Expanding the tanh about $q_t = 0$ shows that the zero-current state $\langle q_t \rangle = 0$ is stable only if $\nu < \nu_c$ with

$$ \nu_c = \frac{L(L - 1)}{N(L - N)} . \tag{53} $$

We identify $\nu_c$ as a phase-transition point, directly analogous to the cluster-growth model.

For $\nu < \nu_c$, expansion of (52) about $q_t = 0$ yields (36) with $a = \nu/\nu_c$, which is again similar to the growth model and indicates that the LIE scenario is applicable, at least for small $q_*$. As a controlled model, we consider a (Markovian) asymmetric simple exclusion process with a time-dependent asymmetry parameter, so hops in the ($\pm$)-direction have $w_\pm = (1 \pm b_t)/2$. This controlled model also has particles distributed independently at all times. In this case the KL divergence may be computed similarly to (C9). This allows numerical optimisation of the controlled dynamics: the optimal behaviour is similar to the ERW and cluster growth models, showing an LIE mechanism. An explicit LIE bound may also be derived by following exactly the same steps as used for the cluster growth model in Sec. V B.

Contrary to the other models considered here, we do not expect this controlled model to fully capture the large-deviation mechanism, because it neglects interparticle correlations which are important for large deviations in exclusion processes [55]. This effect might be captured by combining the temporal additivity principle [25] with results for large deviations in Markovian exclusion processes [55], but such an analysis is beyond the scope of the present work. However, we expect the general features to be robust: a large excursion at early times and a rate function scaling as (42). Numerical results confirming the similarity between this non-Markovian SEP and other LIE models are shown in Fig. 4. This analysis illustrates that the generic fluctuation mechanisms described in this paper are not limited to simple one-particle systems.

As a final observation, note that since particles do not pass each other in exclusion processes, trajectories with time-averaged current $q_t = c$ at long times must have single-particle currents whose time averages all converge to $c$ also. For this reason, we would expect similar behaviour if each particle had an individual memory of its own individual displacements, in contrast to the simple (mean-field) case considered here, where the motion of each particle is affected by the memory of the whole system.

VI. OUTLOOK
We have presented two mechanisms by which large deviations can occur in non-Markovian processes, leading to generic bounds (39,42) on the probabilities of these rare events. To prove that these bounds give the right scaling in specific cases requires more detailed analysis, as illustrated here for the simple ERW and GERW models. (Such analyses are necessary to rule out competing mechanisms with larger probability than the IGL and LIE.) Our results indicate that the LIE mechanism operates in a non-Markovian exclusion process, and the general mechanistic insights have enabled us to clarify and extend several other results from the literature [16, 24, 25]. This understanding is also relevant in socioeconomic decision models that can be approximated by generalised urn/elephant models [17]; by revealing fluctuation mechanisms in these systems, our analysis may be utilised to predict and control their long-term fluctuations. We look forward to future work exploiting these new insights, in order to elucidate the rich fluctuation behaviour of non-Markovian systems.
ACKNOWLEDGMENTS
We thank Simone Franchini for helpful discussions. R.J.H. gratefully acknowledges an External Fellowship from the London Mathematical Laboratory.
Appendix A: Typical fluctuations in the GERW
We derive (10) for the GERW. Suppose that after $t$ steps $Q_t = t q_t$ has a Gaussian distribution with mean zero and variance $v_t$. Then $Q_{t+1} - Q_t$ is normally distributed with mean $a q_t$ and variance 1 so the distribution of $Q_{t+1}$ is

$$ p(Q_{t+1}) = \frac{1}{z_t} \int \exp\left[ -\frac{[Q_{t+1} - (1 + \frac{a}{t}) Q_t]^2}{2} - \frac{Q_t^2}{2 v_t} \right] dQ_t \tag{A1} $$

with $z_t = 2\pi \sqrt{v_t}$. This distribution is normal with mean zero and variance $v_{t+1} = 1 + v_t (1 + a/t)^2$. From this recursion relation one finds a series solution for $v_t$ in terms of the gamma function:

$$ v_t = \left[ \frac{\Gamma(a + t)}{\Gamma(a + 1)\, \Gamma(t)} \right]^2 \left( \sum_{n=1}^{t-1} \left[ \frac{\Gamma(a + 1)\, \Gamma(n + 1)}{\Gamma(a + n + 1)} \right]^2 + 1 \right) . \tag{A2} $$

The form of the large-$t$ behaviour can be obtained directly from the recursion by writing $v_t = v(t)$ so that $v'(t) \approx 1 + 2a\, v(t)/t$. Hence $v(t) \approx t/(1 - 2a) + c\, t^{2a}$ and so the variance of $q_t$ is

$$ {\rm Var}(q_t) = \frac{v(t)}{t^2} \approx \frac{1}{t(1 - 2a)} + c_a\, t^{-2(1-a)} \tag{A3} $$

where subleading terms at higher order in $t^{-1}$ have been omitted. The second term is dominant for $a > 1/2$ and $c_a$ corresponds to $\chi_G$ in the asymptotic variance; its value can be extracted as a limit from the series solution. For $a = 0.$… one has $\chi_G \approx$ ….

Appendix B: Large deviations in ERW by mapping to urn model
For large deviations of q t in the ERW, the SCGF ψ ( λ )can be obtained exactly by adapting results of [28]. Westate the equations and characterise the behaviour atsmall λ .3The ERW can be interpreted as an urn model [10].If the fraction of + steps before time t is s t then theprobability that ∆ x t +1 = +1 is π ( s t ) where π ( s ) = 1 + a (2 s − a, b ) of Corollary 12 of [28]correspond to ((1 − a ) / , a ) in the notation of this work.Since q t = 2 s t − G ( λ, t ) = log (cid:104) e λt (2 s t − (cid:105) . (B2)Define ˜ ψ ( µ ) = lim t →∞ t − log (cid:104) e µts t (cid:105) as the SCGF of [28],denoted in that work by ψ . Then (6,B2) yield ψ ( λ ) = ˜ ψ (2 λ ) − λ . (B3)Hence by Corollary 12 of [28] one has for λ > ψ ( λ ) = − log (cid:104) − w e − wλ y /a B ( w, − w, y ) (cid:105) − λ , (B4)where we introduced shorthand notation w = − a a and y = 1 − e − λ (used only within this Appendix), and where B ( w, v, y ) = (cid:90) y (1 − t ) w − t v − d t (B5)is a particular case of the incomplete Beta function.We now compute the behaviour of ψ at small λ , ob-serving that y (cid:39) λ in this limit. Our regime of interestis 1 / < a < < w < /
2. In this case B ( w, − w, y ) diverges as y →
0. To extract the nature ofthis divergence, introduce a factor of 1 = t + (1 − t ) intothe integrand of (B5) to yield B ( w, v, y ) = (cid:90) y (cid:2) (1 − t ) w − t v + (1 − t ) w t v − (cid:3) d t = v + wv (cid:90) y (1 − t ) w − t v d t − y v (1 − y ) w v , (B6)where the second line used an integration by parts, withthe assumption that w >
0. There is no such assumptionon v , the case of interest is − < v <
0. The limitingbehaviour at small y can now be extracted: for v > − y → B ( w, v, y ) (cid:39) v + wv B ( w, v ) − y v v + o (1) . (B7)Here B ( x, y ) is the (complete) Beta function which isgiven for x, y > (cid:82) t x − (1 − t ) y − d t , it is fi-nite and positive. Moreover, the relation B ( w, v ) = v + wv B ( w, v ) extends the Beta function to negative arguments. Using these results with (B4) and identify-ing − w = ( a − /a gives ψ ( λ ) = − log (cid:20) − wy /a (cid:18) B ( w, − w ) + y ( a − /a w + o (1) (cid:19)(cid:21) − λ . (B8)Finally, noting that y (cid:39) λ and using that ψ is an evenfunction: ψ ( λ ) = 1 − a a | λ | /a B (cid:18) − a a , a − a (cid:19) [1 + o (1)] . (B9)Hence c E = 2 (1 /a ) − (1 − a ) a B (cid:18) − a a , a − a (cid:19) (B10)in (15,31). Analysing the subleading term shows thatin fact the first correction to (B8) is ψ ( λ ) = c E | λ | /a + O ( λ ).Recall that we assumed here 1 / < a <
1, since this isthe regime of interest for this work. However (B4) alsoapplies for 0 < a < / a < ψ is analytic. Appendix C: Controlled dynamics1. Outline of general theory
As discussed in the main text, one method for analysing fluctuation mechanisms is to construct controlled processes whose typical trajectories reproduce the rare-event behaviour of interest. Such processes can be analysed variationally.

We work in the generic framework where the configuration of the system at time $t$ is $C_t$. A trajectory or sample path is denoted $\mathcal{C}$ and its probability in the original model is denoted by $P(\mathcal{C})$. Throughout our analysis, we fix $t$ as the trajectory length and we use $\tau$ to indicate a generic time within the trajectory. Now let $P_{\rm con}(\mathcal{C})$ be the probability of $\mathcal{C}$ in some controlled model, which has different dynamics. Optimal-control theory provides the following general inequality [43]

$$ G(\lambda, t) \geq \lambda t \langle q_t \rangle_{\rm con} - D(P_{\rm con} \| P) \tag{C1} $$

where $\langle \cdot \rangle_{\rm con}$ indicates an average in the controlled model, and $D(Q \| P)$ is the Kullback-Leibler (KL) divergence between the distributions $Q$ and $P$. To prove (C1) define

$$ P_{\rm cano}(\mathcal{C}) = e^{\lambda t q_t - G(\lambda, t)} P(\mathcal{C}) \tag{C2} $$

which is a normalised probability distribution, by definition of $G$. (The subscript "cano" indicates that this definition is analogous to that of the canonical ensemble in thermodynamics.) Then by definition of the KL divergence, the right-hand side of (C1) can be expressed as

$$ \lambda t \langle q_t \rangle_{\rm con} - D(P_{\rm con} \| P) = G(\lambda, t) - D(P_{\rm con} \| P_{\rm cano}) . \tag{C3} $$

The KL divergence is non-negative so the right-hand side is less than or equal to $G(\lambda, t)$, and (C1) follows. Moreover, there is equality in (C1) if and only if $P_{\rm con} = P_{\rm cano}$.

In addition, setting $\theta = 1$ in the definition (6) we obtain $\psi(\lambda) = \lim_{t \to \infty} t^{-1} G(\lambda, t)$ so (C1) yields

$$ \psi(\lambda) \geq \lim_{t \to \infty} \left[ \lambda \langle q_t \rangle_{\rm con} - \frac{1}{t} D(P_{\rm con} \| P) \right] . \tag{C4} $$

If this bound is saturated then the controlled process gives an accurate representation of the rare event of interest, see also below. We emphasise that for non-Markovian processes as considered here, the limit in (C4) involves controlled processes where the dynamical rule at time $\tau$ depends both on $\tau$ and on the total trajectory length $t$; accurate bounds require controlled processes with time-dependent rates.
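The inequality (C1) can be checked directly on a small system by enumerating all trajectories. The sketch below is an illustration under stated assumptions, not the numerical method used for the figures: it assumes the ERW step rule implied by (C9) (step $+1$ with probability $(1 + a q_{\tau-1})/2$), uses a controlled walk with step biases $(b_1, \ldots, b_t)$, and all function names are hypothetical. Since $t q_t = x_t$, the quantity $\lambda t \langle q_t \rangle_{\rm con}$ is the controlled average of $\lambda x_t$.

```python
import itertools
import math


def erw_path_probability(steps, a):
    """Probability of a sequence of +/-1 steps under the assumed ERW rule."""
    prob, x = 1.0, 0
    for tau, s in enumerate(steps):          # tau = number of steps already taken
        q = x / tau if tau > 0 else 0.0
        prob *= 0.5 * (1.0 + s * a * q)
        x += s
    return prob


def controlled_path_probability(steps, b):
    """Probability of the same steps when step tau is +1 with probability (1 + b_tau)/2."""
    prob = 1.0
    for s, b_tau in zip(steps, b):
        prob *= 0.5 * (1.0 + s * b_tau)
    return prob


def cgf(lam, t, a):
    """G(lambda, t) = log < exp(lambda * t * q_t) >, by exact enumeration."""
    return math.log(sum(erw_path_probability(c, a) * math.exp(lam * sum(c))
                        for c in itertools.product((1, -1), repeat=t)))


def control_bound(lam, a, b):
    """Right-hand side of (C1): lambda*t*<q_t>_con - D(P_con || P)."""
    total = 0.0
    for c in itertools.product((1, -1), repeat=len(b)):
        p_con = controlled_path_probability(c, b)
        if p_con > 0.0:                       # zero-probability paths do not contribute
            total += p_con * (lam * sum(c) - math.log(p_con / erw_path_probability(c, a)))
    return total


lam, t, a = 0.4, 8, 0.7  # illustrative values; 2**t paths are enumerated
G = cgf(lam, t, a)
best_constant_bias = max(control_bound(lam, a, [b0] * t)
                         for b0 in (i / 20 for i in range(-19, 20)))
print(G, best_constant_bias)  # the second number cannot exceed the first
```

A constant bias is generally not optimal here; closing the gap requires time-dependent $b_\tau$, in line with the remark above on time-dependent rates.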
2. GERW
We construct the optimally-controlled process for large deviations of $q_t$ in the GERW. Using (C2) one obtains a distribution for the trajectory $X$, as defined in Sec. III B:

$$ P_{\rm cano}(X) \propto \exp\left[ \lambda t q_t - G(\lambda, t) - \frac{S(X)}{2} \right] , \tag{C5} $$

where $q_t$ also depends on $X$ through (1). This distribution is Gaussian for the increments and for the $q_\tau$, and one has an analogue of (25) which is

$$ \langle q_\tau \rangle_{\rm cano} = \frac{\mu_\tau \langle q_t \rangle_{\rm cano}}{h} \tag{C6} $$

where $h, \mu_\tau$ are the same quantities that appear in (25). That is, choosing $\lambda$ in the canonical ensemble fixes $\langle q_t \rangle_{\rm cano}$. Then the average path in this ensemble coincides with the average path in a corresponding microcanonical ensemble with $q_t = \langle q_t \rangle_{\rm cano}$.

Since $P_{\rm cano}$ in (C5) is Gaussian, it is possible to construct exactly an optimally-controlled process that generates trajectories according to this distribution. This process achieves equality in (C1) and captures the mechanism by which large rare fluctuations occur in the GERW. This is similar to the Doob transform, as discussed in [22], with time-dependent rates as in [16]. Within the controlled system, the displacement on step $\tau$ is Gaussian with mean $a q_{\tau-1} + b_\tau$ and variance unity. This means that $P_{\rm con}(X) \propto \exp(-\tilde{S}(X)/2)$ with

$$ \tilde{S}(X) = \sum_{\tau=0}^{t-1} \left[ (\tau + 1) q_{\tau+1} - q_\tau (\tau + a) - b_{\tau+1} \right]^2 , \tag{C7} $$

analogous to (21). Hence

$$ \tilde{S}(X) = S(X) - 2 t q_t b_t + 2 \sum_{\tau=1}^{t-1} q_\tau \left[ (\tau + a) b_{\tau+1} - \tau b_\tau \right] + \sum_{\tau=1}^{t} b_\tau^2 . \tag{C8} $$

The optimally-controlled process has $P_{\rm con} = P_{\rm cano}$ [recall (C3)], which is achieved by setting $b_t = \lambda$ and using $b_{\tau-1} = b_\tau \left( 1 + \frac{a}{\tau - 1} \right)$ iteratively to fix the $b_\tau$, so that the sum over $q_\tau$ in (C8) vanishes. For the CGF this identification yields $G(\lambda, t) = \frac{1}{2} \sum_{\tau=1}^{t} b_\tau^2$.
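As a consistency check on this construction, the CGF obtained from the optimal biases can be compared with the Gaussian result $G(\lambda, t) = \lambda^2 v_t / 2$, where $v_t$ is the variance of $Q_t = t q_t$ from Appendix A. The sketch below rests on two readings that should be treated as assumptions: the variance recursion is taken as $v_{t+1} = 1 + v_t (1 + a/t)^2$ with $v_1 = 1$, and the backward recursion for the biases as $b_\tau = b_{\tau+1} (1 + a/\tau)$ with $b_t = \lambda$, which is the condition for the $q_\tau$-dependent sum in (C8) to vanish. The function names are illustrative.

```python
def gerw_variance(t, a):
    """Variance v_t of Q_t = t*q_t from the recursion v_{s+1} = 1 + v_s*(1 + a/s)**2."""
    v = 1.0                                   # first step: unit-variance Gaussian
    for s in range(1, t):
        v = 1.0 + v * (1.0 + a / s) ** 2
    return v


def optimal_bias(lam, t, a):
    """Biases (b_1, ..., b_t) with b_t = lam and b_tau = b_{tau+1}*(1 + a/tau)."""
    b = [0.0] * (t + 1)                       # index 0 unused
    b[t] = lam
    for tau in range(t - 1, 0, -1):
        b[tau] = b[tau + 1] * (1.0 + a / tau)
    return b[1:]


def cgf_from_bias(b):
    """G(lambda, t) = (1/2) * sum of b_tau**2 for the optimally-controlled process."""
    return 0.5 * sum(x * x for x in b)


lam, t, a = 0.3, 50, 0.7  # illustrative values
print(cgf_from_bias(optimal_bias(lam, t, a)))
print(0.5 * lam ** 2 * gerw_variance(t, a))   # Gaussian CGF; the two should coincide
```

The biases are largest at early times, which is the signature of the initial giant leap: the control acts mainly on the first few steps and the memory then carries the displacement forward.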
3. ERW
For the ERW, a variational characterisation of $\psi(\lambda)$ is available following [28]. This construction also allows computation of the dominant paths shown in Fig. 2.

We outline the approach, which is to define a controlled process that almost achieves equality in (C1), up to a correction that vanishes on taking the limit in (C4). The typical path of this controlled model captures the mechanism of the (rare) fluctuations that achieve $q_t = q$ in the ERW. (Specifically, for large $t$ and any $u > 0$, the conditional distribution of $q_{ut}$ for paths that achieve $q_t = q$ is sharply peaked at $\langle q_{ut} \rangle_{\rm con}$, see [28].)

We use (C1) with the controlled dynamics described in the main text for which $(b_1, b_2, \ldots, b_t)$ are variational parameters. The KL divergence between $P_{\rm con}$ and $P$ is

$$ D = \frac{1}{2} \sum_{\tau=1}^{t} \left[ (1 + b_\tau) \log(1 + b_\tau) + (1 - b_\tau) \log(1 - b_\tau) \right] - \frac{1}{2} \sum_{\tau=1}^{t} (1 + b_\tau) \langle \log(1 + a q_{\tau-1}) \rangle_{\rm con} - \frac{1}{2} \sum_{\tau=1}^{t} (1 - b_\tau) \langle \log(1 - a q_{\tau-1}) \rangle_{\rm con} , \tag{C9} $$

and we have

$$ \langle q_\tau \rangle_{\rm con} = \frac{1}{\tau} \sum_{k=1}^{\tau} b_k . \tag{C10} $$

Moreover, the variance of $q_\tau$ in this controlled process is at most $1/\tau$ so it is consistent to assume that $q_\tau$ is sharply peaked for almost all terms in the sums in (C9). Hence $D \approx \hat{D}$ with

$$ \hat{D} = \frac{1}{2} \sum_{\tau} \left[ (1 + b_\tau) \log(1 + b_\tau) + (1 - b_\tau) \log(1 - b_\tau) \right] - \frac{1}{2} \sum_{\tau} (1 + b_\tau) \log(1 + \langle a q_{\tau-1} \rangle_{\rm con}) - \frac{1}{2} \sum_{\tau} (1 - b_\tau) \log(1 - \langle a q_{\tau-1} \rangle_{\rm con}) . \tag{C11} $$

Using (C10) this is an explicit function of the $b_\tau$ variables, so the right-hand side of (C1) can be maximised numerically, which yields a numerical estimate of $G(\lambda, t)$ and hence (by considering large but finite $t$) one may estimate $\psi(\lambda)$.

For numerical work we use a similar method to that for the GERW: we split the sums in (C11) into contributions from small $\tau$ and large $\tau$ and we approximate the sum over large-$\tau$ contributions by an integral (which is also estimated numerically). This combination of sum and integral is maximised numerically to obtain estimates of $\psi(\lambda)$ and of the corresponding (average) path (C10). This yields the results of Fig. 2.

Appendix D: IGL mechanism in unidirectional hopping model
This Appendix establishes (48), which means that (37) holds for the model of Sec. V A, with $\beta = 1$. For this condition, it is sufficient to consider a finite time interval between $t_0$ and $\tau^*$ (there is no large-time limit because we are focussing on the excursion that occurs at early times). For a compact notation we work on the interval $(t_0, \tau]$ and we write $k$ for a generic time within this interval.

Consider a controlled process where the first hop is at time $t_0$ (as for the original model), after which hops take place with a time-dependent rate $b(\tau)$. Then $(\tau q_\tau - 1)$ is Poissonian with mean $\int_{t_0}^{\tau} b(k)\,\mathrm{d}k$ and so

$$ \tau \langle q_\tau \rangle_{\rm con} = 1 + \int_{t_0}^{\tau} b(k)\,\mathrm{d}k . \tag{D1} $$

The KL divergence of (C1) is

$$ D = \int_{t_0}^{\tau} \left\{ b(k) \left\langle \log \frac{b(k)}{a q_k} \right\rangle_{\rm con} - b(k) + \langle a q_k \rangle_{\rm con} \right\} \mathrm{d}k , \tag{D2} $$

similar to (C9). In addition to (C1), the KL divergence also allows a bound on the probability distribution of $q_t$. Roughly speaking, if one can construct a controlled process such that the large-deviation event occurs with probability one, $P^{\rm con}(q_\tau \geq q) = 1$, then the probability of this event in the original model can be bounded from below:

$$ -\log P(q_\tau \geq q) \leq D(P^{\rm con} \| P) . \tag{D3} $$

This may be proved by Jensen's inequality; a more precise statement is given (for example) in Eqs. (14,15) of [16]. Hence we seek an upper bound on $D$.

To achieve this, we use $\log(1/x) \leq (1/x) - 1$ with $x = q_k / \langle q_k \rangle_{\rm con}$ to write

$$ D \leq \int_{t_0}^{\tau} \left\{ b(k) \log \frac{b(k)}{a \langle q_k \rangle_{\rm con}} - 2 b(k) + \langle a q_k \rangle_{\rm con} + b(k) \langle q_k \rangle_{\rm con} \left\langle \frac{1}{q_k} \right\rangle_{\rm con} \right\} \mathrm{d}k . \tag{D4} $$

For a Poisson random variable $X$ with mean $x$, one has $\langle 1/(1+X) \rangle = e^{-x} \sum_{n=0}^{\infty} x^n / (n+1)! = (1 - e^{-x})/x$. Since $(k q_k - 1)$ is Poissonian, we obtain
$$ D \leq \int_{t_0}^{\tau} \left\{ b(k) \log \frac{b(k)}{\langle a q_k \rangle_{\rm con}} - 2 b(k) + \langle a q_k \rangle_{\rm con} + b(k) \langle k q_k \rangle_{\rm con} \frac{1 - e^{-\langle k q_k - 1 \rangle_{\rm con}}}{\langle k q_k - 1 \rangle_{\rm con}} \right\} \mathrm{d}k . \tag{D5} $$

To recover the results of [25] one should assume that $k q_k \gg 1$, in which case the final term in the integrand of (D5) reduces to $b(k)$. This is valid for $t_0 \gg 1$. Then one sets $\tau = t$ and minimises the resulting KL divergence over the path $\hat{q}(k) = \langle q_k \rangle_{\rm con}$, using (D1) to replace $b(k) \to (\partial/\partial k)(k \hat{q}(k))$. The optimal path behaves for short times as $k \hat{q}(k) = 1 + A[(k/t_0) - 1]$, where $A$ is proportional to the size of the giant excursion [25].

Our approach here does not require $t_0$ to be large: we retain all terms in (D5), and use (D3) with $\tau = \tau^*$ to establish (37). To obtain a convenient bound we set $\tau^* = 2 t_0$ and choose $b(k)$ such that $\langle k q_k \rangle_{\rm con} = 1 + Ax$ with $x = (k/t_0) - 1$ and $A = 2 q^* t_0 - 1$. This requires $b(k) = A/t_0$ and ensures that $\langle q_{\tau^*} \rangle_{\rm con} = q^*$. [Note, $b(k)$ is only independent of $k$ for $k < \tau^*$ (i.e., during the excursion); the controlled process reverts to the natural dynamics of the model for $k > \tau^*$.] Then (D5) with $\tau = \tau^*$ becomes

$$ D \leq A \int_0^1 \left\{ \log \frac{A(1+x)}{a(Ax+1)} - 2 + \frac{a(Ax+1)}{A(1+x)} + (1 + Ax) \frac{1 - e^{-Ax}}{Ax} \right\} \mathrm{d}x . \tag{D6} $$

We are concerned with the limit $q^* \to \infty$, which corresponds to $A \to \infty$. The integral can be evaluated in this limit and the KL divergence scales as

$$ D \lesssim \gamma_{\rm uni}\, q^* t_0 \tag{D7} $$

with $\gamma_{\rm uni} = 2[\log(4/a) + a(1 - \log 2) - 1]$. Finally, $P^{\rm con}(q_{\tau^*} \geq q^*)$ tends to a nonzero limit as $q^* \to \infty$: this holds because the distribution of $q_{\tau^*}$ is Poissonian with a diverging mean equal to $q^*$, so it is sharply peaked at $q^*$. Hence (D3) is applicable with KL divergence (D7) and the probability of the excursion obeys (48), as required.

[1] N. C. Keim and S. R. Nagel, Phys. Rev. Lett. , 010603 (2011).
[2] C. Scalliet and L. Berthier, Phys. Rev. Lett. , 255502 (2019).
[3] J. Kappler, J. O. Daldrop, F. N. Brünig, M. D. Boehle, and R. R. Netz, J. Chem. Phys. , 014903 (2018).
[4] P. Van Mieghem and R. van de Bovenkamp, Phys. Rev. Lett. , 108701 (2013).
[5] J. Zhang and T. Zhou, Proc. Natl. Acad. Sci. USA , 23542 (2019).
[6] G. Rangarajan and M. Ding, eds., Processes with Long-Range Correlations: Theory and Applications, Lecture Notes in Physics, Vol. 621 (Springer-Verlag, Berlin Heidelberg, 2003).
[7] J. Beran, Y. Feng, S. Ghosh, and R. Kulik,
Long-Memory Processes: Probabilistic Properties and Statistical Methods (Springer-Verlag, Berlin Heidelberg, 2013).
[8] G. M. Schütz and S. Trimper, Phys. Rev. E , 045101 (2004).
[9] S. Hod and U. Keshet, Phys. Rev. E , 015104 (2004).
[10] E. Baur and J. Bertoin, Phys. Rev. E , 052134 (2016).
[11] A. A. Budini, Phys. Rev. E , 022108 (2016).
[12] A. A. Budini, Phys. Rev. E , 052110 (2017).
[13] A. Rebenshtok and E. Barkai, Phys. Rev. Lett. , 210601 (2007).
[14] K. Klymko, J. P. Garrahan, and S. Whitelam, Phys. Rev. E , 042126 (2017).
[15] K. Klymko, P. L. Geissler, J. P. Garrahan, and S. Whitelam, Phys. Rev. E , 032123 (2018).
[16] R. L. Jack, Phys. Rev. E , 012140 (2019).
[17] R. J. Harris, New J. Phys. , 053049 (2015).
[18] F. den Hollander, Large deviations (American Mathematical Society, Providence, RI, 2000).
[19] V. Lecomte, C. Appert-Rolland, and F. van Wijland, J. Stat. Phys. , 51 (2007).
[20] B. Derrida, J. Stat. Mech. , P07023 (2007).
[21] H. Touchette, Phys. Rep. , 1 (2009).
[22] R. Chétrite and H. Touchette, Ann. Henri Poincaré , 2005 (2015).
[23] R. L. Jack, Eur. Phys. J. B , 74 (2020).
[24] R. J. Harris and H. Touchette, J. Phys. A , 342001 (2009).
[25] R. J. Harris, J. Stat. Mech. , P07021 (2015).
[26] C. Maes, K. Netočný, and B. Wynants, J. Phys. A , 365002 (2009).
[27] A. Faggionato, arXiv:1709.05653 (2017).
[28] S. Franchini, Stoch. Process. Appl. , 3372 (2017).
[29] B. Bercu, J. Phys. A , 015201 (2017).
[30] V. M. Kenkre, arXiv:0708.0034.
[31] J. Lebowitz and H. Spohn, J. Stat. Phys. , 333 (1999).
[32] J. P. Garrahan, R. L. Jack, V. Lecomte, E. Pitard, K. van Duijvendijk, and F. van Wijland, Phys. Rev. Lett. , 195702 (2007).
[33] L. O. Hedges, R. L. Jack, J. P. Garrahan, and D. Chandler, Science , 1309 (2009).
[34] T. R. Gingrich, J. M. Horowitz, N. Perunov, and J. L. England, Phys. Rev. Lett. , 120601 (2016).
[35] M. D. Donsker and S. R. S. Varadhan, Comm. Pure Appl. Math , 1 (1975).
[36] M. D. Donsker and S. R. S. Varadhan, Comm. Pure Appl. Math , 279 (1975).
[37] M. D. Donsker and S. R. S. Varadhan, Comm. Pure Appl. Math , 389 (1976).
[38] M. D. Donsker and S. R. S. Varadhan, Comm. Pure Appl. Math , 183 (1983).
[39] H. Touchette, Physica A , 5 (2018).
[40] D. Nickelsen and H. Touchette, Phys. Rev. Lett. , 090602 (2018).
[41] G. Gradenigo and S. N. Majumdar, J. Stat. Mech. , 053206 (2019).
[42] B. Meerson, Phys. Rev. E , 042135 (2019).
[43] P. Dupuis and R. S. Ellis, A weak convergence approach to the theory of large deviations (Wiley, 1997).
[44] F. N. C. Paraan and J. P. Esguerra, Phys. Rev. E , 032101 (2006).
[45] M. A. A. da Silva, J. C. Cressoni, G. M. Schütz, G. M. Viswanathan, and S. Trimper, Phys. Rev. E , 022115 (2013).
[46] R. Chétrite and H. Touchette, J. Stat. Mech. , P12001 (2015).
[47] C. Maes and K. Netočný, EPL , 30003 (2008).
[48] A. Simha, R. M. L. Evans, and A. Baule, Phys. Rev. E , 031117 (2008).
[49] D. Simon, J. Stat. Mech. , P07017 (2009).
[50] R. L. Jack and P. Sollich, Prog. Theor. Phys. Supp. , 304 (2010).
[51] R. L. Jack and P. Sollich, Eur. Phys. J.: Spec. Topics , 2351 (2015).
[52] S. Grosskinsky, G. M. Schütz, and H. Spohn, J. Stat. Phys. , 389 (2003).
[53] M. R. Evans and B. Waclaw, J. Phys. A: Math. Theor. , 095001 (2014).
[54] A. Vezzani, E. Barkai, and R. Burioni, Phys. Rev. E , 012108 (2019).
[55] C. Appert-Rolland, B. Derrida, V. Lecomte, and F. van Wijland, Phys. Rev. E , 021122 (2008).
[56] V. Lecomte, J. P. Garrahan, and F. van Wijland, J. Phys. A , 175001 (2012).
[57] M. Cavallaro and R. J. Harris, J. Phys. A , 47LT02 (2016).
[58] K. Nagel, Phys. Rev. E53