Large Deviations
Satya N. Majumdar and Grégory Schehr
LPTMS, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France
(Dated: November 22, 2017)
Extreme events such as earthquakes, tsunamis, extremely hot or cold days, financial crashes, etc. are rare events. They do not happen every day. But if/when they happen, they can have devastating effects. Hence it is of absolute importance to build models to estimate when such catastrophic events may occur and, if they do, the amount of damage, i.e., the magnitude of such events. A first and basic step towards building such models is to study the existing statistics of such rare events and to construct a 'tool' that describes these extreme statistics well.

For example, suppose we look at the record of the height of the water level of a river. One can easily construct a histogram of the height (an empirical probability distribution) from the available record. Typically it has a bell-shaped form, with a peak around the mean water level. The probability of small 'typical' fluctuations around the mean is often well described by a Gaussian form. This can be understood using standard tools from probability theory, such as the central limit theorem (CLT). However, we are interested in rare events (e.g., floods or droughts) where the height of the water level is much above (or below) the mean level, i.e., with very large fluctuations from the mean. These events are characterized by the tails of the histogram. The probability at these tails can be as small as 10^{-9} (one in a billion events!). How do we describe such tails? The CLT does not hold far away from the peak, and to describe these extremely small probabilities at the tails, one needs a new tool. 'Large deviation theory' provides precisely such a tool.

To illustrate this idea, let us start with a concrete example. Imagine that we have N unbiased coins and we toss them simultaneously. We record the outcome of each coin, which is either head 'H' or tail 'T'. In each trial, we count the number of heads N_H, which can be any number between 0 and N. Indeed, N_H fluctuates from one trial to another – thus it is a random variable.
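This trial-to-trial fluctuation is easy to see in a quick simulation (a minimal sketch in plain Python; the values of N and the number of trials are arbitrary choices):

```python
import random

random.seed(0)  # fixed seed, so the run is reproducible

N = 500       # number of fair coins tossed per trial
trials = 1000  # number of independent trials

# For each trial, count the number of heads N_H among the N coins.
counts = [sum(random.randint(0, 1) for _ in range(N)) for _ in range(trials)]

mean_NH = sum(counts) / trials
print(mean_NH)                   # close to N/2 = 250
print(min(counts), max(counts))  # N_H fluctuates from trial to trial
```

The sample mean sits near N/2, while individual trials scatter around it – the next step is to quantify that scatter.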
Let P(M, N) = Prob.(N_H = M) denote the probability distribution of N_H. Given that each outcome can be 'H' or 'T' with probability 1/2, P(M, N) is given by the binomial distribution

P(M, N) = \frac{1}{2^N} \binom{N}{M} .   (1)

If we plot this distribution as a function of M for a given N (see Fig. 1), the histogram has a bell-shaped form with a peak around the mean ⟨N_H⟩ = N/2. One can also compute trivially the variance of N_H, which is given by

\sigma^2 = \langle N_H^2 \rangle - \langle N_H \rangle^2 = \frac{N}{4} .   (2)
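Both the mean and the variance in Eqs. (1)–(2) can be verified directly from the binomial distribution (a small sketch; the value of N is an arbitrary choice):

```python
from math import comb

N = 200

# Exact binomial distribution of Eq. (1): P(M, N) = C(N, M) / 2^N
P = [comb(N, M) / 2**N for M in range(N + 1)]

mean = sum(M * p for M, p in enumerate(P))
var = sum(M**2 * p for M, p in enumerate(P)) - mean**2

print(sum(P))  # normalization: 1.0
print(mean)    # N/2 = 100.0
print(var)     # N/4 = 50.0
```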
FIG. 1. Plot of ln P(M, N) given in Eq. (1) as a function of M (square symbols), for N = 5000. The solid blue line corresponds to the typical Gaussian fluctuations in Eq. (3), which describes well the exact curve in the vicinity of M = N/2.

This indicates that the typical fluctuations of N_H around its mean are of order σ ∼ √N. Moreover, the shape of the histogram around the peak, on a scale of order √N around the mean, can be very well approximated, for large N, by a Gaussian form (see Fig. 1)

P(M, N) \approx \sqrt{\frac{2}{\pi N}} \, e^{-\frac{2}{N}\left(M - N/2\right)^2} .   (3)

This Gaussian form is a direct consequence of the CLT. To see this, we can write N_H = \sum_{i=1}^{N} \sigma_i, where σ_i = 1 if the i-th coin shows a head and σ_i = 0 otherwise. Since the σ_i's are independent random variables, the CLT says that the sum of a large number of such independent random variables has a Gaussian shape. However, the CLT does not hold when the deviation from the mean is much larger than √N. For example, suppose we consider the extreme event where all the outcomes are heads, i.e., N_H = N. Clearly the probability of such an event, putting M = N in Eq. (1), is exactly

P(M = N, N) = \mathrm{Prob.}(N_H = N) = \frac{1}{2^N} = e^{-N \ln 2} .   (4)

On the other hand, putting M = N in the Gaussian approximation in Eq. (3), one finds P(N, N) ≈ e^{-N/2}, which is much bigger than the exact value e^{-N ln 2} for large N. This clearly demonstrates that the Gaussian form, while being a very good approximation near the peak, is rather poor at the extreme tails. This is exactly where 'large deviation theory' comes to the rescue, as we show now.

Since we would like to describe events such that M − N/2 is of order N, we can set M = cN (where the fraction of heads c is of order 1) in the exact expression in Eq. (1). For large N, we can use Stirling's approximation N! ∼ \sqrt{2\pi N}\, e^{N \ln N - N} to write

P(M = cN, N) = \frac{1}{2^N} \frac{N!}{(cN)! \, ((1-c)N)!}
\approx e^{-N \phi(c)} ,   (5)

where

\phi(c) = c \ln c + (1-c) \ln(1-c) + \ln 2 , \qquad 0 \le c \le 1 .   (6)

Eq. (5) is usually referred to as a "large deviation principle", with speed N and a rate function φ(c). This function φ(c) is convex, with a minimum at c = 1/2. At the extreme end c = 1, we get φ(c = 1) = ln 2. Thus for c = 1, Eq. (5) correctly describes P(N, N) in Eq. (4). Moreover, this large deviation form in Eq. (5) also describes correctly the Gaussian behavior near the peak at c = 1/2. To demonstrate this, we note that φ(c) ≈ 2(c − 1/2)² as c → 1/2. Using this quadratic behavior in Eq. (5), we recover the Gaussian form (3). Thus in this simple example, the large deviation form in Eq. (5) not only describes the extreme events but also the typical events around the mean (see Fig. 1).

While this large deviation theory is quite well developed in the mathematics literature [1–3], physicists are also quite familiar with this concept, though in a slightly different language (for a review see [4]). To connect to the language of physicists, let us again consider this simple coin-tossing experiment. Instead of asking for the probability of the number of heads, let us consider just the number of possible configurations

\mathcal{N}(M) = \binom{N}{M}   (7)

with a fixed number of heads M = cN. For large N, this can also be written, using Stirling's formula, as

\mathcal{N}(M) \approx e^{N S(c = M/N)} ,   (8)

where it follows from Eqs. (5) and (6) that

S(c) = \ln 2 - \phi(c) = -c \ln c - (1-c) \ln(1-c) .   (9)

Hence we see that \mathcal{N}(M) also admits a large deviation principle with a rate function S(c), which is thus simply related to the "mathematician's" rate function φ(c) via Eq. (9). We will now see that this S(c) is nothing but the good old entropy density that physicists are familiar with.

To demonstrate this, let us consider the same coin-tossing experiment in a slightly different language. Let us consider N non-interacting Ising spins s_i = ±1, subjected to a constant external magnetic field h. The energy associated to a particular configuration of the spins is E = -h \sum_{i=1}^{N} s_i. Writing s_i = 2σ_i − 1 (with σ_i = 1 or 0) and setting h = −1/2, one gets, up to an additive constant,

E = \sum_{i=1}^{N} \sigma_i ,   (10)

which is precisely the number of heads N_H in the coin-tossing experiment. If we now consider the statistical mechanics of this spin system in the micro-canonical ensemble (i.e., at fixed energy E), we would like to compute the micro-canonical partition function \mathcal{N}(E), which simply denotes the number of spin configurations with a given energy E. But this is precisely the number of configurations with N_H = E heads in the coin-tossing experiment. Hence, setting N_H = M = E and using Eq. (7), we get \mathcal{N}(E) = \binom{N}{E}. Thus it follows from Eq. (8) that, for large N, \mathcal{N}(E) admits a large deviation form with the associated rate function S(c) given in Eq. (9). In statistical mechanics, S(c) is the well known entropy density at energy E = cN (upon setting the Boltzmann constant k_B = 1). The large deviation principle in this example just reflects that the entropy and the energy are extensive. Even though this interpretation of the rate function S(c) as the entropy density at fixed energy E = cN is demonstrated here in a simple example, it is actually more general. Indeed, for any short-ranged interacting system, thermodynamics tells us that both the energy and the entropy are extensive. Hence, for any such system, we would expect a large deviation principle for the micro-canonical partition function.

One can also connect the rate function S(c) (or equivalently the entropy density) to another quantity very much familiar to physicists, namely the free energy per particle in the "canonical" ensemble (where the temperature T is kept fixed, but the energy E is allowed to fluctuate). In this canonical ensemble, one first defines the so called "canonical partition function" Z = \sum_{\mathcal{C}} e^{-\beta E(\mathcal{C})}, summing over all microscopic configurations \mathcal{C} of the system with an associated Boltzmann weight e^{-\beta E(\mathcal{C})}, where β = 1/(k_B T) is the inverse temperature.
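Before carrying on, the large deviation principle of Eqs. (5)–(6) – and hence the entropy density of Eq. (9) – can be checked numerically: −(1/N) ln P(cN, N) approaches φ(c) as N grows. A sketch (the value of c and the sequence of N are arbitrary choices):

```python
from math import comb, log

def phi(c):
    """Rate function of Eq. (6)."""
    return c * log(c) + (1 - c) * log(1 - c) + log(2)

c = 0.8
for N in (10, 100, 1000, 10000):
    M = int(c * N)
    # Exact probability of Eq. (1), computed in logs to avoid overflow
    logP = log(comb(N, M)) - N * log(2)
    # -(1/N) ln P(cN, N) should approach phi(c) as N grows
    print(N, -logP / N, phi(c))
```

The slow, O(ln N / N) approach to the limit visible in the printed values comes from the subleading (prefactor) terms in Stirling's formula, which the large deviation form (5) discards.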
One can convert this sum into an integral over energy,

Z = \sum_{\mathcal{C}} e^{-\beta E(\mathcal{C})} = \int e^{-\beta E} \, \mathcal{N}(E) \, dE ,   (11)

where \mathcal{N}(E) is the micro-canonical partition function. Assuming extensivity of the energy (which is true for any short-ranged system), we would expect a large deviation principle as in Eq. (8), \mathcal{N}(E) \approx e^{N S(E/N)}, where S(c) is the entropy density at energy E = cN. Using this result in Eq. (11) and making the change of variable E = cN, one obtains

Z \approx N \int dc \, e^{-\beta N \left[ c - \frac{S(c)}{\beta} \right]} .   (12)

For large N, the dominant contribution to the integral comes from the minimum of the argument of the exponential (the so called "saddle point approximation"), leading to

Z \approx e^{-\beta N \min_c \left[ c - \frac{S(c)}{\beta} \right]} .   (13)

In the thermodynamic (i.e., large N) limit, the free energy per particle is defined as f(β) = -\lim_{N \to \infty} \frac{1}{\beta N} \ln Z. This definition is equivalent to saying that the canonical partition function Z in (13) admits a large deviation principle with speed N and rate function β f(β), with

f(\beta) = \min_c \left[ c - \frac{S(c)}{\beta} \right] .   (14)

Hence the free energy per particle f(β) in the canonical ensemble and the entropy density S(c) of the micro-canonical ensemble are related to each other via a so called "Legendre transform".

So far, we have learnt that the large deviation principle and the associated rate function S(c) provide a very useful tool to describe, within a single setting, both typical as well as atypically rare events. What else can we learn from this rate function S(c)? In the coin-tossing example, we see that S(c) in Eq. (9) is a smooth function of c, with no singularity for 0 < c < 1. It turns out, however, that in a system that exhibits a thermodynamic phase transition, the rate function S(c) displays a singularity (non-analytic behavior) at some critical value c*. As a simple example, let us consider the 2d ferromagnetic Ising model. In the canonical ensemble, we know from Onsager's celebrated exact solution [5] that the free energy f(β) has a singularity at a critical point β = β_c (this corresponds to a second order phase transition from a high-temperature paramagnetic phase to a low-temperature ferromagnetic phase). From Eq. (14) connecting f(β) and S(c), one immediately sees that S(c) will also exhibit a singularity at a critical value c = c*. Indeed, it has been shown that the singular part of S(c) behaves as (c − c*)²/ln|c − c*| for c close to c*, so that the second derivative of S(c) vanishes logarithmically at c = c* [6]. The fact that a thermodynamic phase transition manifests itself as a singularity in the rate function S(c) turns out to be quite generic, both in short-ranged and in long-ranged systems [4, 7].

This idea of detecting a phase transition by studying possible singularities of the large deviation function associated to the probability distribution of some observable has recently been extensively used in various disordered systems, most notably in problems related to random matrix theory (RMT). RMT has been a very successful tool in analyzing problems arising in statistics, number theory and combinatorics, all the way to nuclear physics, mesoscopic systems, wireless communications, information theory, etc. The main goal in RMT is to study the statistics of the eigenvalues of a random N × N matrix with entries chosen from a specified ensemble.
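As a concrete check of the Legendre relation in Eq. (14): for the non-interacting spin model above, with E = Σ_i σ_i, the canonical partition function is Z = Σ_M C(N, M) e^{-βM} = (1 + e^{-β})^N by the binomial theorem, so f(β) = -(1/β) ln(1 + e^{-β}) exactly. The sketch below compares this closed form with a brute-force minimization of Eq. (14) (the value of β and the grid resolution are arbitrary choices):

```python
from math import exp, log

def S(c):
    """Entropy density of Eq. (9)."""
    return -c * log(c) - (1 - c) * log(1 - c)

def f_legendre(beta, grid=100000):
    """f(beta) = min_c [ c - S(c)/beta ], Eq. (14), by grid search over (0, 1)."""
    return min(c - S(c) / beta for c in (i / grid for i in range(1, grid)))

beta = 1.3
f_min = f_legendre(beta)
f_exact = -log(1 + exp(-beta)) / beta  # from Z = (1 + e^{-beta})^N
print(f_min, f_exact)                  # the two agree for a fine grid
```

Setting the derivative in Eq. (14) to zero gives the minimizer c* = 1/(1 + e^β), i.e., the equilibrium energy density at inverse temperature β, which is how the grid search locates the saddle point of Eq. (12).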
The simplest example is the Gaussian ensemble of real symmetric matrices, for which all the eigenvalues are real. In this case the joint distribution of the N eigenvalues can be interpreted as the Boltzmann weight of a gas of N charges on a line, in the presence of a harmonic trap, and with long-range pairwise (logarithmic) repulsion between them.

There has been a lot of recent activity on the statistics of the top eigenvalue, i.e., the position of the rightmost charge λ_max. For large N, the typical fluctuations of λ_max around its mean are described by the famous Tracy-Widom (TW) distribution [8]. The TW distribution of λ_max in RMT is the analogue of the Gaussian distribution describing the typical fluctuations of the number of heads around the mean in the simple coin-tossing example discussed before in Eq. (3). However, the large atypical fluctuations of λ_max are not described by the TW law, similar to the coin-tossing example where the central Gaussian distribution fails to describe the extreme tails. The large deviation tails for λ_max have been computed, and it turns out that the tails are rather different on the left and the right of the mean, at variance with the coin-tossing experiment where P(M = cN, N) is symmetric around the mean c = 1/2. Moreover, while the rate function φ(c) is smooth around c = 1/2, in the case of λ_max the associated large deviation function is singular around the mean, and its third derivative is discontinuous there. This is thus an example of a third order phase transition, according to the Ehrenfest classification. One might wonder: this is a phase transition, but what are the two phases across this critical point? It turns out that the left large deviation of λ_max corresponds to a "pushed phase", where all the N charges are pushed to the left – this involves a collective reorganization of the N charges (see Fig. 2). In contrast, the right large deviation of λ_max corresponds to a "pulled phase", where only one single charge splits off the sea of the N − 1 remaining charges. The well known third order phase transition in two-dimensional U(N) lattice gauge theory, from a "strong" coupling phase (analogue of the pushed phase) to a "weak" coupling phase (i.e., the pulled phase), is of a similar nature. In recent times, similar third order phase transitions have been found in a large number of examples [10]. For a less technical discussion of the TW distribution and the associated phase transition, we refer the reader to a popular article in Quanta Magazine by N. Wolchover [11].

FIG. 2. Schematic picture of the probability distribution P(λ_max, N) of the largest eigenvalue λ_max of an N × N Gaussian random matrix. The central blue part indicates the Tracy-Widom distribution, while the red and the green tails correspond respectively to the left and right large deviations. In the inset we show the typical charge configurations in, respectively, the "pushed" (strong coupling) and "pulled" (weak coupling) phases.

So far, we have been discussing the applications of large deviation principles in equilibrium systems, both short and long-ranged. However, in recent years, large deviations have played a major role in open non-equilibrium driven systems.
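Returning briefly to the random-matrix example: the typical scale of λ_max can be probed by direct sampling. The sketch below draws small random real symmetric matrices (off-diagonal entries of unit variance, diagonal entries of variance 2 – a common convention, under which the mean of λ_max approaches 2√N for large N) and extracts λ_max with a shifted power iteration; in practice one would simply call numpy.linalg.eigvalsh. The matrix size, sample count and iteration count are arbitrary choices:

```python
import random
from math import sqrt

random.seed(1)

def goe(n):
    """Random real symmetric matrix: H_ij = H_ji ~ N(0,1) for i<j, H_ii ~ N(0,2)."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = random.gauss(0, sqrt(2))
        for j in range(i + 1, n):
            H[i][j] = H[j][i] = random.gauss(0, 1)
    return H

def lambda_max(H, iters=400):
    """Largest eigenvalue of H, via power iteration on the shifted matrix H + shift*I."""
    n = len(H)
    shift = 3 * sqrt(n)  # all eigenvalues of H lie well inside (-3*sqrt(n), 3*sqrt(n))
    v = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(H[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of H at the (near-)converged unit vector
    Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * hvi for vi, hvi in zip(v, Hv))

n, samples = 40, 10
avg = sum(lambda_max(goe(n)) for _ in range(samples)) / samples
print(avg / (2 * sqrt(n)))  # close to 1: lambda_max sits at the edge 2*sqrt(n)
```

Direct sampling like this only ever sees the typical (Tracy-Widom) fluctuations; the left and right large deviation tails of Fig. 2 are exponentially unlikely to appear in any feasible number of draws, which is precisely why analytical large deviation calculations (or biased sampling schemes) are needed there.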
In many situations, driven systems may reach a non-equilibrium steady state, where the probability distributions of observables become time-independent. However, contrary to equilibrium steady states, there is a priori no notion of free energy or entropy density associated with such non-equilibrium steady states. It turns out that in such steady states, one can instead use large deviation functions of appropriate observables as substitutes for the free energy in equilibrium systems. Let us consider again a simple example. Imagine we have a sample of size L in one dimension, connected to two different heat reservoirs at its two ends: a "hot" reservoir at temperature T_H and a "cold" reservoir at temperature T_C. The temperature gradient sets up a heat current through the system, flowing from the hot to the cold reservoir. Let j(τ) denote the instantaneous heat flux (or current) at time τ at any given point of the sample. Due to thermal fluctuations, j(τ) is a random variable, and at late times its probability distribution becomes time independent, signaling that the system has reached a steady state. One useful observable in the steady state, which has been extensively studied, is the integrated current up to time t, Q(t) = \int_0^t j(\tau) \, d\tau. Its average value grows as ⟨Q(t)⟩ ∼ t for large t, since ⟨j(τ)⟩ is a constant in the steady state. Hence it is natural to expect, and it has been established in several models, that the probability distribution P(Q, t) of Q(t) satisfies a large deviation principle,

P(Q, t) \sim e^{-t \, \Phi\left(\frac{Q}{t}\right)} ,   (15)

where Φ(z) is a rate function, morally similar to the free energy in equilibrium systems. Note that the time t here plays the role of N in the coin-tossing example [see Eq. (5)]. Indeed, Φ(z) satisfies certain additivity properties, like the free energy in equilibrium systems.
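Rate functions such as Φ(z) are hard to access by naive simulation, since the probabilities involved are exponentially small in t (or N). One standard numerical trick, illustrated here on the coin-tossing example of Eq. (1), is exponential tilting – a simple form of importance sampling: sample from a biased distribution under which the rare event is typical, then reweight each sample by its likelihood ratio. A minimal sketch (the values of N, the threshold a and the sample size are arbitrary choices):

```python
import random
from math import comb

random.seed(7)

N, a = 100, 0.8  # target: P(N_H >= a*N) for N fair coins
n_samples = 20000

# Exact answer from the binomial distribution, Eq. (1)
exact = sum(comb(N, M) for M in range(int(a * N), N + 1)) / 2**N

# Importance sampling: toss biased coins with P(head) = q = a, so the rare
# event becomes typical, and reweight each sample by the likelihood ratio
# (0.5/q)^heads * (0.5/(1-q))^tails of fair vs. biased coins.
q = a
total = 0.0
for _ in range(n_samples):
    heads = sum(random.random() < q for _ in range(N))
    if heads >= a * N:
        total += (0.5 / q) ** heads * (0.5 / (1 - q)) ** (N - heads)
estimate = total / n_samples

print(exact, estimate)  # agree closely, although the probability is ~1e-10:
                        # far beyond the reach of direct sampling
```

With 2 × 10^4 tilted samples the estimate lands within a few percent of the exact tail, whereas direct sampling of fair coins would need on the order of 10^10 trials to see the event even once.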
There has been a lot of recent analytical progress in this field, either through exact solutions for Φ(z) in solvable models [12] or by exploiting a macroscopic hydrodynamic theory developed for driven diffusive systems [13]. In addition, large deviation theory has played a very crucial role in the development of so called "fluctuation theorems" in nonequilibrium systems [14] – a subject of great theoretical and experimental interest, but unfortunately beyond the scope of this short article.

To conclude, one sees that large deviation theory, though originally developed in probability theory, is increasingly becoming a very useful tool in several areas of statistical physics. These include the analysis of the extreme statistics of rare events in disordered systems and related problems in random matrix theory, in equilibrium systems with both short and long range interactions, as well as in systems out of equilibrium. Despite several analytical calculations of large deviation functions, mostly in one-dimensional models, these rate functions are in general hard to compute analytically. Hence numerical methods also play an important role. Indeed, in recent years, very powerful numerical algorithms (using "importance sampling" methods) have been developed that can probe extremely small probabilities [15, 16]. Similarly, on the experimental side, large deviation functions have also been measured (see for example [9, 17]). Thus large deviation theory has seen an explosion of applications during the last two decades, bringing together researchers from mathematics, computer science, information theory and physics, both theorists and experimentalists. There is no doubt that the rapidly evolving developments in this subject will continue to excite researchers across disciplines for years to come.

[1] S. R. S. Varadhan, Large Deviations and Applications (Society for Industrial and Applied Mathematics, Philadelphia, 1984).
[2] F. den Hollander, Large Deviations (American Mathematical Society, Providence, 2000).
[3] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications (Springer, Berlin, 2010).
[4] H. Touchette, Phys. Rep. 478, 1 (2009).
[5] L. Onsager, Phys. Rev. 65, 117 (1944).
[6] M. Promberger, A. Hüller, Z. Phys. B 97, 341 (1995).
[7] D. Mukamel, "Statistical mechanics of systems with long range interactions", AIP Conference Proceedings, Eds. Alessandro Campa et al., Vol. 970, No. 1, AIP (2008).
[8] C. Tracy, H. Widom, Commun. Math. Phys. 159, 151 (1994).
[9] M. Fridman, R. Pugatch, M. Nixon, A. A. Friesem, N. Davidson, Phys. Rev. E 85, 020101(R) (2012).
[10] S. N. Majumdar, G. Schehr, J. Stat. Mech. P01012 (2014).
[11] N. Wolchover, Quanta Mag. (2014), https://lc.cx/Z9ao.
[12] B. Derrida, J. Stat. Mech. P01030 (2011).
[13] L. Bertini, A. De Sole, D. Gabrielli, G. Jona-Lasinio, C. Landim, Rev. Mod. Phys. 87, 593 (2015).
[14] U. Seifert, Rep. Prog. Phys. 75, 126001 (2012).
[15] W. Krauth, Statistical Mechanics: Algorithms and Computations, Oxford Master Series in Physics (Oxford University Press, UK, 2006).
[16] A. K. Hartmann, Phys. Rev. E 65