A brief history of long memory: Hurst, Mandelbrot and the road to ARFIMA

Timothy Graves∗, Robert B. Gramacy†, Nicholas W. Watkins‡, Christian L. E. Franzke§

Abstract
Long memory plays an important role in many fields by determining the behaviour and predictability of systems; for instance, climate, hydrology, finance, networks and DNA sequencing. In particular, it is important to test if a process is exhibiting long memory since that impacts the accuracy and confidence with which one may predict future events on the basis of a small amount of historical data. A major force in the development and study of long memory was the late Benoit B. Mandelbrot. Here we discuss the original motivation of the development of long memory and Mandelbrot's influence on this fascinating field. We will also elucidate the sometimes contrasting approaches to long memory in different scientific communities.
Key words: long-range dependence, Hurst effect, fractionally differenced models, Mandelbrot
∗ Arup, London, UK
† The University of Chicago Booth School of Business; 5807 S. Woodlawn Avenue, Chicago, IL 60637
‡ Corresponding author ([email protected]): Centre for Fusion Space and Astrophysics, University of Warwick, Coventry, UK; Department of Engineering and Innovation, Faculty of Mathematics Computing and Technology, Open University, Milton Keynes, UK; Centre for the Analysis of Time Series, London School of Economics and Political Science, London, UK; and Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
§ Meteorological Institute and Centre for Earth System Research and Sustainability (CEN), University of Hamburg, Germany

In many fields, there is strong evidence that a phenomenon called "long memory" plays a significant role, with implications for forecast skill, low frequency variations, and trends. In a stationary time series the term "long memory"—sometimes "long range dependence" (LRD) or "long term persistence"—implies that there is non-negligible dependence between the present and all points in the past. To dispense quickly with some technicalities, we take long memory to mean that the autocorrelation function (ACF) ρ(·) of the process is non-summable:

$$\sum_{k=-\infty}^{\infty} \rho(k) = \infty.$$

This is equivalent to its power spectrum having a pole at zero frequency (Beran, 1994; Beran et al., 2013). In practice, this means the ACF and the power spectrum both follow a power-law, because the underlying process does not have any characteristic decay timescale. This is in striking contrast to many standard (stationary) stochastic processes where the effect of each data point decays so fast that it rapidly becomes indistinguishable from noise. The study of long memory processes is important because they exhibit nonintuitive properties where many familiar mathematical results fail to hold, and because of the numerous datasets (Beran, 1994; Beran et al., 2013) where evidence for long memory has been found.

In this paper we will give a historical account of three key aspects of long memory: 1) the environmetric observations in the 1950s which first sparked interest, namely the anomalous growth of range in hydrological time series, later known as the "Hurst" phenomenon; 2) after more than a decade of controversy, the introduction by Mandelbrot of the first stationary model, fractional Gaussian noise (FGN), which could explain the Hurst phenomenon (this was in itself controversial because it explicitly exhibited LRD, which he dubbed "the Joseph effect"); and 3) the incorporation of LRD, via a fractional differencing parameter d, into the more traditional ARMA(p, q) models, through Hosking and Granger's ARFIMA(p, d, q) model. For specificity, we clarify that our interpretation of the modern-day subject comprises the definitions of long memory given above, together with the ARFIMA(p, d, q) processes defined through the backshift operator B as $\Phi(B)(1-B)^d X_t = \Theta(B)\varepsilon_t$, where Φ and Θ are autoregressive and moving average polynomials, and $\varepsilon_t$ is white noise. For more details, see any modern time series text (e.g., Brockwell and Davis, 1991).

The development of the concept of long memory, both as a physical notion and a formal mathematical construction, should be of significant interest in the light of controversial application areas like the study of bubbles and business cycles in financial markets (Sornette, 2004), and the quantification of climate trends (Franzke, 2012). Yet few articles about long memory cover the history in much detail. Instead most introduce the concept with passing reference to its historical significance; even books on LRD tend to have only a brief history. Notable exceptions include Montanari (2003), the semi-autobiographical Mandelbrot and Hudson (2008), his posthumous autobiography (Mandelbrot, 2013), and the reminiscence of his former student Murad Taqqu (2013).

As we shall see, this evolution took less than three decades across the middle of the twentieth century. During this period there was significant debate about the mathematical, physical, and philosophical interpretations of long memory. It is both the evolution of this concept, and the accompanying debate (from which we shall often directly quote), in which we are mostly interested. The kind of memory that concerns us here was a conceptually new idea in science, and rather different, for example, from that embodied in the laws of motion developed by Newton and Kepler. Rather than Markov processes, where the current state of a system is enough to determine its immediate future, the fractional Gaussian noise model requires information about the complete past history of the system.

As will become evident, the late Benoît B. Mandelbrot was a key figure in the development of long memory. Nowadays most famous for coining the term and concept 'fractal', Mandelbrot's output crossed a wide variety of subjects from hydrology to economics as well as pure and applied mathematics. During the 1960s he worked on the theory of stochastic processes exhibiting heavy tails and long memory, and was the first to distinguish between these effects. Because of the diversity of the communities in which he made contributions, it sometimes seems that Mandelbrot's role in statistical modelling is perhaps underappreciated (in contrast, say, to within the physics and geoscience communities (Aharony and Feder, 1990; Turcotte, 1997)). It certainly seemed this way to him:

Of those three [i.e. economics, engineering, mathematics], nothing beats my impact on finance and mathematics. Physics - which I fear was least affected - rewarded my work most handsomely. (Mandelbrot, 2013)

A significant portion of this paper is devoted to his work. We do not, however, intend to convey in any sense his 'ownership' of the LRD concept, and indeed much of the modern progress concerning long memory in statistics has adopted an approach (ARFIMA) that he did not agree with.

Mandelbrot's motivation in developing an interest in long memory processes stemmed from an intriguing study in hydrology by Harold Hurst (1951). Before we proceed to discuss this important work it is necessary to give a brief history of hydrological modelling, Hurst's contributions, and the reactions to him from other authors in that area in Section 2. Then we discuss Mandelbrot's initial musings, his later refinements, and the reactions from the hydrological community in Section 3. In Section 4 we discuss the development in the 1980s of fractionally differenced models culminating from this sequence of thought. Section 5 offers our conclusions.
Water is essential for society to flourish since it is required for drinking, washing, irrigation and for fuelling industry. For thousands of years, going back to the dawn of settled agricultural communities, humans have sought methods to regulate the natural flow of water. They tried to control nature's randomness by building reservoirs to store water in times of plenty, so that lean times are survivable. The combined factors of the nineteenth century Industrial Revolution, such as fast urban population growth, the requirement of mass agricultural production, and increased energy requirements, led to a need to build large scale reservoirs formed by the damming of river valleys. When determining the capacity of the reservoir, or equivalently the height of the required dam, the natural solution is the 'ideal dam':

[An 'ideal dam' for a given time period is such that] (a) the outflow is uniform, (b) the reservoir ends the period as full as it began, (c) the dam never overflows, and (d) the capacity is the smallest compatible with (a), (b) and (c). (Mandelbrot and Wallis, 1969a)

(The concept of the ideal dam obviously existed long before Mandelbrot; the quotation is simply a succinct mathematical definition. Naturally, this neat mathematical description ignores complications such as margins of error, losses due to evaporation etc., but the principle is clear. Actually, as Hurst (1951) himself pointed out: "increased losses due to storage are disregarded because, unless they are small, the site is not suitable for over-year storage".)

From a civil engineer's perspective, given the parameters of demand (i.e. required outflow) and time horizon, how should one determine the optimal height of the dam? To answer this question we clearly need an input, i.e. river flows. It is not hard to imagine that for a given set of inputs it would, in principle, be possible to mathematically solve this problem. A compelling solution was first considered by Rippl (1883), "whose publication can . . . be identified with the beginning of a rigorous theory of storage reservoirs" (Klemeš, 1987). Despite solving the problem, Rippl's method was clearly compromised by its requirement to know, or at least assume, the future variability of the river flows. A common method was to use the observed history at the site as a proxy; however records were rarely as long as the desired time horizon. Clearly a stochastic approach was required, involving a simulation of the future using a stochastic process known to have similar statistical properties to the observed past. This crucial breakthrough, heralding the birth of stochastic hydrology, was made by Hazen (1914), who used the simplest possible model: an iid Gaussian process.
In practice, just one sample path would be of little use so, in principle, many different sample paths could be generated, all of which could be analysed using Rippl's method to produce a distribution of 'ideal heights'. This idea of generating repeated samples was pursued by Sudler (1927); however the stochastic approach to reservoir design was not generally accepted in the West until the work of Soviet engineers was discovered in the 1950s. The important works by Moran (1959) and Lloyd (1967) are jointly considered to be the foundations of modern reservoir design, and helped establish this approach as best practice.
Harold Edwin Hurst had spent a long career in Egypt (ultimately spanning 1906–68), eventually becoming Director-General of the Physical Department where he was responsible for, amongst other things, the study of the hydrological properties of the Nile basin. For thousands of years the Nile had helped sustain civilisations in an otherwise barren desert, yet its regular floods and irregular flows were a severe impediment to development. Early attempts at controlling the flow by damming at Aswan were only partially successful. Hurst and his department were tasked with devising a method of water control by taking an holistic view of the Nile basin, from its sources in the African Great Lakes and Ethiopian plains, to the grand delta on the Mediterranean.

In his studies of river flows, Hurst (1951) used a method similar to Rippl's in which he analysed a particular statistic of the cumulative flows of rivers over time called the 'adjusted range', R. Let {X_k} be a sequence of random variables, not necessarily independent, with some non-degenerate distribution, and define the n-th partial sum $Y_n := X_1 + \cdots + X_n$ (with $Y_0 = 0$). Feller (1951) then defines the adjusted range, R(n), as:

$$R(n) = \max_{0 \le k \le n}\left\{ Y_k - \tfrac{k}{n} Y_n \right\} - \min_{0 \le k \le n}\left\{ Y_k - \tfrac{k}{n} Y_n \right\}.$$

This is distinct from the simple (unadjusted) range, $R^*(n) = \max_{0 \le k \le n}\{Y_k\} - \min_{0 \le k \le n}\{Y_k\}$. Moreover Hurst normalised the adjusted range by the sample standard deviation to obtain what is now called the Rescaled Adjusted Range statistic, denoted R/S(n):

$$R/S(n) = \frac{\max_{0 \le k \le n}\left\{ Y_k - \tfrac{k}{n} Y_n \right\} - \min_{0 \le k \le n}\left\{ Y_k - \tfrac{k}{n} Y_n \right\}}{\sqrt{\tfrac{1}{n}\sum_{k=1}^{n}\left( X_k - \tfrac{1}{n} Y_n \right)^2}}.$$

Hurst's reasons for this normalisation are unclear but, as we shall see later, proved remarkably fortuitous. The attraction of using R/S is that, for a given time period of say n years, R/S(n) is a proxy for the ideal dam height over that time period.
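The statistic is straightforward to compute directly from the definition above. The following minimal sketch (in Python with NumPy; the function name is our own, hypothetical choice, not anything from the historical literature) is one possible implementation:

```python
import numpy as np

def rescaled_range(x):
    """Rescaled adjusted range R/S(n) of a sample x_1, ..., x_n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = np.cumsum(x)                            # partial sums Y_1, ..., Y_n
    d = y - (np.arange(1, n + 1) / n) * y[-1]   # adjusted sums Y_k - (k/n) Y_n
    # The max/min run over 0 <= k <= n, and the k = 0 term is zero:
    r = max(d.max(), 0.0) - min(d.min(), 0.0)
    s = np.sqrt(np.mean((x - y[-1] / n) ** 2))  # divide-by-n standard deviation
    return r / s
```

A quick check: for any two observations this returns exactly 1, the 'computable value' that, as discussed below, mattered to Hurst's own estimation procedure.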
Hurst (1951) then examined 690 different time series, covering 75 different geophysical phenomena spanning such varied quantities as river levels, rainfall, temperature, atmospheric pressure, tree rings, mud sediment thickness, and sunspots. He found that in each case the statistic behaved as R/S(n) ∝ n^k for some k. He estimated k using a statistic he called K, and found that K was approximately normally distributed with mean 0.72 and standard deviation 0.006. He actually acknowledged that "K does vary slightly with different phenomena", and that the range (0.46–0.96) was large for a Gaussian fit; however to a first approximation it appeared that the mean value of 0.72 might hold some global significance.

At this point it is worth highlighting an aspect of Hurst's work which often gets overlooked. As we shall see, the R/S statistic has enjoyed great use over the past fifty years. However the modern method of estimating the exponent k is not that originally used by Hurst. His estimate K was obtained by assuming a known constant of proportionality: specifically he assumed the asymptotic (i.e. for large n) law that R/S(n) = (n/2)^K, so that a doubly logarithmic plot of values of R/S(n) against n/2 has slope K. (By assuming a known constant of proportionality, Hurst was effectively performing a one-parameter log-regression to obtain his estimate of k.) His reason for choosing this approach was that it implies R/S(2) = 1 exactly, and consequently this 'computable value' could be used in the estimation procedure. This methodology would nowadays be correctly regarded as highly dubious because it involves fitting an asymptotic (large n) relationship while making use of an assumed fixed value at the small n = 2 point. This logical flaw was immediately remarked upon in the same journal issue by Chow (1951). As we will see, Mandelbrot later introduced the now-standard method of estimation by dropping this fixed point and performing a two-parameter log-regression to obtain the slope. Hurst's original method was forgotten and most authors are unaware that it was not the same as the modern method; indeed many cite Hurst's result of 0.72 unaware that it was obtained using an inappropriate analysis.

Notwithstanding these shortcomings, Hurst's key result that estimates of k were about 0.72 would likely not have been either noteworthy or controversial in itself had he not shown that, using contemporary stochastic models, this behaviour could not be explained. In the early 1950s, stochastic modelling of river flows was immature and so the only model that Hurst could consider was the iid Gaussian model of Hazen (1914) and Sudler (1927). Rigorously deriving the distribution of the range under this model was beyond Hurst's mathematical skills, but by considering the asymptotics of a coin tossing game and appealing to the central limit theorem, he did produce an extraordinarily good heuristic solution. His work showed that, under the independent Gaussian assumption, the exponent k should equal 0.5. In other words Hurst had shown that contemporary hydrological models fundamentally did not agree with empirical evidence. This discrepancy between theory and practice became known as the 'Hurst phenomenon'.

Hurst's observation sparked a series of investigations that ultimately led to the formal development of the concept of long memory. (It is worth clarifying a potential ambiguity here: since the phrase was coined, the 'Hurst phenomenon' has been attributed to various aspects of time series and/or stochastic processes. For clarity, we will use the term to mean "the statistic R/S(n) empirically grows faster than n^{1/2}".)
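The difference between the two estimation procedures is easy to state in code. Below is a minimal sketch (our own illustration, not the historical computations) of the modern two-parameter log-log regression alongside Hurst's one-parameter variant, which instead forces the fitted line through the point (n, R/S) = (2, 1). Both reuse the hypothetical rescaled_range function sketched above.

```python
def hurst_rs(x, block_sizes):
    """Modern estimate H: regress log R/S(n) on log n; the slope is H."""
    x = np.asarray(x, dtype=float)
    logn, logrs = [], []
    for n in block_sizes:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        logn.append(np.log(n))
        logrs.append(np.log(np.mean([rescaled_range(b) for b in blocks])))
    slope, _ = np.polyfit(logn, logrs, 1)   # two free parameters: slope, intercept
    return slope

def hurst_k(x, block_sizes):
    """Hurst's original estimate, assuming R/S(n) = (n/2)**K (use n > 2)."""
    x = np.asarray(x, dtype=float)
    ks = []
    for n in block_sizes:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        rs = np.mean([rescaled_range(b) for b in blocks])
        ks.append(np.log(rs) / np.log(n / 2))   # one-parameter log-regression
    return np.mean(ks)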
Hurst's finding took the hydrological community by surprise, not only because of the intrinsic puzzle, but because of its potential importance. As previously mentioned, the R/S(n) statistic is a proxy for the ideal dam height over n years. If Hurst's finding was to be believed, and R/S(n) increased faster than n^{1/2}, there would be potentially major implications for dam design. In other words, dams designed for long time horizons might be too low, with floods as one potential consequence.

Although the debate over Hurst's findings, which subsequently evolved into the debate about long memory, was initially largely confined to the hydrological community, fortuitously it also passed into more mainstream mathematical literature — a fact which undoubtedly helped to lend it credence in later years. Despite Hurst's non-rigorous approach, and an unclear mathematical appeal for what was essentially a niche subject, the eminent probabilist William Feller (1951) contributed greatly by publishing a short paper. By appealing to Brownian motion theory, he proved that Hurst was correct; for sequences of standardised iid random variables with finite variance, the asymptotic distribution of the adjusted range, R(n), should obey the n^{1/2} law:

$$E[R(n)] \sim \left(\frac{\pi}{2}\right)^{1/2} n^{1/2}.$$

It should be emphasised that Feller was studying the distribution of the adjusted range, R(n), not the rescaled adjusted range R/S(n). The importance of dividing by the standard deviation was not appreciated until Mandelbrot; however Feller's results would later be shown to hold for this statistic as well. By proving and expanding (since the Gaussianity assumption could be weakened) Hurst's result, Feller succeeded in both confirming that there was a phenomenon of interest, and also that it should interest mathematicians as well as hydrologists. Over the course of the 1950s more precise results were obtained, although attention was unfortunately deflected to consideration of the simple range (e.g. Anis and Lloyd, 1953) as opposed to R/S. The exact distribution of R(n) was found to be, in general, intractable; a notable exception being the simplest iid Gaussian case, where (Solari and Anis, 1957):

$$E[R(n)] = \left(\frac{\pi}{2}\right)^{1/2} \left( \frac{2}{\pi n} \sum_{k=1}^{n-1} \sqrt{\frac{n-k}{k}} \right) n^{1/2},$$

the bracketed factor tending to 1 as n grows.

Having conclusively shown that Hurst's findings were indeed worthy of investigation, several different possible explanations of the eponymous phenomenon were put forward. It was assumed that the effect was caused by (at least) one of the following properties of the process: a) an 'unusual' marginal distribution, b) non-stationarity, c) transience (i.e. pre-asymptotic behaviour), or d) short-term auto-correlation effects.

For Hurst's original data, the first of these proposed solutions was not relevant because much of his data were clearly Gaussian. Moran (1964) claimed that the effect could be explained by using a sequence of iid random variables with a particular moment condition on the distribution. (Moran used a Gamma distribution, although to achieve the effect the distribution had to be heavily skewed, thus ruling it out as a practical explanation for Hurst's effect.) Although this case had been shown by Feller (1951) to still asymptotically produce the n^{1/2} law, Moran showed that in such cases the transient (or pre-asymptotic) phase exhibiting the Hurst phenomenon could be extended arbitrarily. Furthermore, Moran pointed out that if the finite variance assumption was dropped altogether, and instead a symmetric α-stable distribution was assumed, the Hurst phenomenon could apparently be explained:

$$E[R(n)] \sim \ell\, n^{1/\alpha}, \qquad 1 < \alpha \le 2,$$

for some known (computable) ℓ. However, as Mandelbrot later showed, the division by the standard deviation is indeed crucial. In other words, whilst Moran's arguments were correct, they were irrelevant because the object of real interest was the rescaled adjusted range. Several Monte Carlo studies, notably those by Mandelbrot and Wallis (1969b), confirmed that for iid random variables, regardless of marginal distribution, R/S(n) asymptotically follows the n^{1/2} law. That Mandelbrot (1975) and Mandelbrot and Taqqu (1979) were finally able to prove this result remains one of the principal reasons why the R/S statistic has remained popular to the present day.

The second potential explanation of the Hurst phenomenon, non-stationarity, is harder to discount and is more of a philosophical (and physical) question than a mathematical one.

Is it meaningful to talk of a time-invariant mean over thousands of years? If long enough realizations of such time series were available would they in fact be stationary? (O'Connell, 1971)

Later, Bhattacharya et al. (1983) showed that a short memory process perturbed by a slowly decaying deterministic trend can also exhibit the Hurst phenomenon. Their study shows why it is crucial to distinguish between the 'Hurst phenomenon' and 'long memory': the process described by Bhattacharya et al. does not have long memory, yet it exhibits the Hurst phenomenon (recall our specific definition of this term).

In his influential paper, Klemeš (1974) not only showed that the Hurst phenomenon could be explained by non-stationarity, but argued that assuming stationarity may be mis-founded:

The question of whether natural processes are stationary or not is likely a philosophical one. . . . there is probably not a single historic time series of which mathematics can tell with certainty whether it is stationary or not . . . Traditionally, it has been assumed that, in general, the geophysical, biological, economical, and other natural processes are nonstationary but within relatively short time spans can be well approximated by stationary models. (Klemeš, 1974)

As an example, Klemeš suggested that a major earthquake might drastically affect a river basin so much as to induce a regime-change (i.e. an element of non-stationarity). However on a larger (spatial and temporal) scale, the earthquake and its local deformation of the Earth may be seen as part of an overall stationary 'Earth model'. Thus choosing between the two forms is, to some extent, a matter of personal belief. As we will see in the next section, Mandelbrot did in fact consider (and publish) other models with a particular type of nonstationary switching himself, even while formulating his stationary FGN model, but unfortunately Klemeš was unaware of that work, about which a more fruitful discussion might perhaps have occurred.

If we discount this explanation and assume stationarity, we must turn to the third and fourth possible explanations, namely transience (i.e. pre-asymptotic behaviour) and/or the lack of independence. These two effects are related: short-term auto-correlation effects are likely to introduce significant pre-asymptotic behaviours. As mentioned earlier, Hurst himself suggested some kind of serial dependence might explain the effect, and Feller suggested:

It is conceivable that the [Hurst] phenomenon can be explained probabilistically, starting from the assumption that the variables {X_k} are not independent . . . Mathematically this would require treating the variables {X_k} as a Markov process. (Feller, 1951)

Soon however, Barnard (1956) claimed to have shown that Markovian models still led to the n^{1/2} law, and it would be shown later (Mandelbrot and Wallis, 1968; Mandelbrot, 1975) that any then-known form of auto-correlation must asymptotically lead to the same result. The required condition on the auto-correlation function turned out to be that it is summable, whereby for a stationary process with ACF ρ(·) (Siddiqui, 1976):

$$E[R/S(n)] \sim \left(\frac{\pi}{2}\right)^{1/2} \left( \sum_{k=-\infty}^{\infty} \rho(k) \right)^{1/2} n^{1/2}. \qquad (1)$$

Even before this was formally proved, it was generally known that some complicated auto-correlation structure would be necessary to explain the Hurst phenomenon:

It has been suggested that serial correlation or dependence [could cause the Hurst phenomenon]. This, however, cannot be true unless the serial dependence is of a very peculiar kind, for with all plausible models of serial dependence the series of values is always approximated by a [Brownian motion] when the time-scale is sufficiently large. A more plausible theory is that the experimental series used by Hurst are, as a result of serial correlation, not long enough for the asymptotic formula to become valid. (Moran, 1959)
Thus Moran was arguing that, since no 'reasonable' auto-correlation structure could account for the Hurst phenomenon, it should be assumed that the observed effect was caused by pre-asymptotic behaviour, the extent of which was influenced by some form of local dependence. In other words, he was suggesting that a short-memory process could account for the Hurst phenomenon over observed time scales.

This issue has both a practical and philosophical importance. It would later be argued by some that, regardless of the 'true' model, any process that could exhibit the Hurst phenomenon over the observed (or required) time scales would suffice for practical purposes. Using such processes requires a choice. One might accept the Hurst phenomenon as genuine and acknowledge that, although theoretically incorrect, such a model is good enough for the desired purpose. Alternatively, one might reject the Hurst phenomenon as simply a pre-asymptotic transient effect, and therefore any model which replicates the effect over observed ranges of n is potentially valid. Mandelbrot, for one, was highly critical of those who followed the latter approach:

So far, such a convergence [to the n^{1/2} law] has never been observed in hydrology. Thus, those who consider Hurst's effect to be transient implicitly attach an undeserved importance to the value of [the sample size] . . . These scholars condemn themselves to never witness the full asymptotic development of the models they postulate. (Mandelbrot and Wallis, 1968)

Despite this, the concept of short-memory-induced transience was explored both before and after Mandelbrot's work. Matalas and Huzzen (1967) performed a rigorous Monte Carlo analysis of the AR(1) model and demonstrated that for medium n and heavy lag-one serial correlation, the Hurst phenomenon could be induced (albeit Matalas and Huzzen were actually using Hurst's original erroneous K estimate). Fiering (1967) succeeded in building a more sophisticated model; however he found he needed to use an AR(20) process to induce the effect — an unrealistically large number of lags to be useful for modelling.

To summarise, by the early 1960s, more than a decade on from Hurst's original discoveries, no satisfactory explanation for the Hurst phenomenon had yet been found. To quote Klemeš (1974) again:

Ever since Hurst published his famous plots for some geophysical time series . . . the by now classical Hurst phenomenon has continued to haunt statisticians and hydrologists. To some it has become a puzzle to be explained, to others a feature to be reproduced by their models, and to others still, a ghost to be conjured away.

It was at this point that Benoît Mandelbrot heard of the phenomenon.

In the early 1960s, Mandelbrot had worked intensively on the burgeoning subject of mathematical finance, which is concerned with modelling economic indices such as share prices. Central to this subject was the 'Random Walk Hypothesis' which provided for Brownian motion models. This was first implicitly proposed in the seminal (yet long undiscovered) doctoral thesis by Bachelier (1900). The detailed development of this topic is also interesting but beyond the scope of this paper. It suffices to say here that, although Bachelier's model was recognised as an adequate working model which seemed to conform to both intuition and the data, it could also benefit from refinements.
Various modifications were proposed but one common feature they all shared was the underlying Gaussian assumption. In a ground-breaking paper, Mandelbrot (1963) proposed dropping the Gaussianity assumption and instead assuming a heavy tailed distribution, specifically the symmetric α-stable distribution (e.g., Samorodnitsky and Taqqu, 1994). The α-stable distributions have the attractive property that an appropriately re-weighted sum of such random variables is itself an α-stable random variable. This passion for scaling would remain with Mandelbrot throughout his life, and is epitomised by his famous fractal geometry.

Returning to Hurst's results, Mandelbrot's familiarity with scaling helped him immediately recognise the Hurst phenomenon as symptomatic of this, and, as he later recounted (Mandelbrot, 2002a; Mandelbrot and Hudson, 2008), he assumed that it could be explained by heavy tailed processes. He was therefore surprised when he realised that, not only were Hurst's data essentially Gaussian, but as discussed previously, the rescaled adjusted range is not sensitive to the marginal distribution. Instead, he realised that a new approach would be required. In keeping with the idea of scaling, he introduced the term 'self-similar' and formally introduced the concept in its modern form: a continuous-time stochastic process Y(t) is said to be self-similar, with self-similarity parameter J, if for all positive c, $Y(ct) \stackrel{d}{=} c^J Y(t)$. (Mandelbrot later regretted the term 'self-similar' and came to prefer 'self-affine', because scaling in time and space were not necessarily the same, but the revised terminology never caught on to the same extent.) Using this concept, Mandelbrot (1965) laid the foundations for the processes which would initially become the paradigmatic models in the field of long memory: the self-similar fractional Brownian motion (FBM) model and its increments, the long range dependent fractional Gaussian noise (FGN) model.

At this point it is necessary to informally describe FBM. It is a continuous-time Gaussian process, and is a generalisation of ordinary Brownian motion, with an additional parameter h. This parameter can range between zero and one (non-inclusive, to avoid pathologies), with different values providing qualitatively different types of behaviour; the case h = 1/2 recovers ordinary Brownian motion. FBM is self-similar in the above sense: suitably rescaled in time and magnified in space, its sample paths are statistically identical to the original, i.e. they will have the same finite dimensional distributions. In this sense FBM, like standard Brownian motion (which of course is just a special case of FBM), has no characteristic time-scale, or 'tick'.

(We remark that the naming and notation of this parameter has been a source of immense confusion over the past half-century, with various misleading expressions such as the 'Hurst parameter', the 'Hurst exponent', the 'Hurst coefficient', the 'self-similarity parameter' and the 'long memory parameter'. Moreover, the more traditional notation of an upper-case H does not help since it disobeys the convention of using separate cases for constants (parameters) and random variables (statistics). For clarity in what follows we will reserve the notation h simply to denote the 'fractional Brownian motion parameter'.)

In practical applications it is necessary to use a modification of FBM because it is (like standard Brownian motion) a continuous time process and non-stationary. Thus the increments of FBM are considered; these form a discrete process which can be studied using conventional time series analysis tools. These increments, called fractional Gaussian noise (FGN), can be considered to be the discrete approximation to the 'derivative' of FBM.
When h = 1/2, FGN is simply the increments of standard Brownian motion, i.e. white noise. Mandelbrot and Van Ness (1968, corollary 3.6) showed that this process is stationary, but most importantly (for its relevance here), it exhibits the Hurst phenomenon: for some c > 0, R/S(n) ∼ c n^h. This result was immensely significant; it was the first time, since Hurst had first identified the phenomenon, that anyone had been able to exhibit a stationary, Gaussian stochastic process capable of reproducing the effect. The mystery had been partially solved; there was such a process, and for over a decade it remained the only model known to be able to fully explain the Hurst phenomenon.

Mandelbrot (1965) then proceeded to show that such a process must have a spectral density function that blows up at the origin. By proposing such a model, he realised he was attempting to explain with one parameter h both low- and high-frequency effects, i.e. he was ". . . postulating the same mechanism for the slow variations of climate and for the rapid variations of precipitation". He also recognised that the auto-correlation function of the increments would decay slower than exponentially, and (for 1/2 < h < 1) would not be summable.
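The increment autocovariance in question has a simple closed form, $\gamma(k) = \tfrac{1}{2}\left(|k+1|^{2h} - 2|k|^{2h} + |k-1|^{2h}\right)$ for unit-variance FGN, which decays like $h(2h-1)k^{2h-2}$ for large k. The short sketch below (our own illustration, not code from the literature) makes the non-summability for h > 1/2 easy to see numerically:

```python
import numpy as np

def fgn_acf(k, h):
    """Autocorrelation of unit-variance fractional Gaussian noise at lag k."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + np.abs(k - 1) ** (2 * h))

lags = np.arange(1, 1_000_001)
for h in (0.5, 0.6, 0.72, 0.9):
    # Partial sums keep growing (roughly like n**(2h - 1)) whenever h > 1/2;
    # for h = 1/2 every term is exactly zero (white noise).
    print(h, fgn_acf(lags, h).sum())
```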
This correlation structure, which is now often taken to be the definition of long memory itself, horrified some. Concurrently, the simplicity of FGN, possessing only one parameter h, concerned others. We shall consider these issues in depth later, but the key point was that although Mandelbrot had 'conquered' the problem, to many it was somewhat of a Pyrrhic victory (Klemeš, 1974).

Mandelbrot immediately attempted to expand on the subject although his papers took time to get accepted. He ultimately published a series of five papers in 1968–9 through collaborations with the mathematician John Van Ness and the hydrologist James Wallis. Taken as a whole, these papers offered a comprehensive study of long memory and fractional Brownian motion. They helped publicise the subject within the scientific community and started the debates about the existence of long memory and the practicality of FGN, which have continued until the present day.

The first of these papers (Mandelbrot and Van Ness, 1968) formally introduced FBM and FGN and derived many of their properties and representations. The aim of this paper was simply to introduce these processes and to demonstrate that they could provide an explanation for the Hurst phenomenon; this was succinctly stated:

We believe FBM's do provide useful models for a host of natural time series and wish therefore to present their curious properties to scientists, engineers and statisticians.

Mandelbrot and Van Ness argued that all processes thus far considered have "the property that sufficiently distant samples of these functions are independent, or nearly so", yet in contrast, they pointed out that FGN has the property "that the 'span of interdependence' between their increments can be said to be infinite". This was a qualitative statement of the difference between short and long memory and soon led to the formal definition of long memory. As motivation for their work, they cited various examples of observed time series which appeared to possess this property: in economics (Adelman, 1965; Granger, 1966), '1/f noises' in the fluctuations of solids (Mandelbrot, 1967), and hydrology (Hurst, 1951).

Intriguingly, and undoubtedly linked to Mandelbrot's original interest in heavy-tailed processes, Mandelbrot and Van Ness (1968) also raised the possibility of increments that are [α-]stable:

. . . [α-]stable . . . Such increments necessarily have an infinite variance. 'Fractional Lévy-stable random functions' have moreover an infinite span of interdependence.

In other words the authors postulated a heavy-tailed, long memory process. It would be over a decade before such processes were properly considered, due to difficulties arising from the lack of formal correlation structure in the presence of infinite variance.

One key point which is often overlooked is that Mandelbrot and Van Ness did not claim that FGN is necessary to explain the Hurst phenomenon: ". . . we selected FBM so as to be able to derive the results of practical interest with a minimum of mathematical difficulty". Often Mandelbrot was incorrectly portrayed as insisting that his, and only his, model solved the problem. Indeed Mandelbrot himself took an interest in alternative models, although as we will later see, he essentially rejected Granger and Hosking's ARFIMA which was to become the standard replacement of FGN in the statistics and econometrics literatures. Furthermore, neither did the authors claim that they were the first to discover FBM. They acknowledged that others (e.g. Kolmogorov, 1940) had implicitly studied it; however Mandelbrot and Van Ness were undoubtedly the first to attempt to use it in a practical way.
Having 'solved' Hurst's riddle with his stationary fractional Gaussian model, Mandelbrot determined to get FGN and FBM studied and accepted, in particular by the community which had most interest in the phenomenon: hydrology. Therefore his remaining four important papers were published in the leading hydrological journal, Water Resources Research.
These papers represented a comprehensive study of FBM in an applied setting, and were bold; they called for little short of a revolution in stochastic modelling:

. . . current models of statistical hydrology cannot account for either [Noah or Joseph] effect and must therefore be superseded. As a replacement, the 'self-similar' models that we propose appear very promising. (Mandelbrot and Wallis, 1968)

As its title suggests, Mandelbrot and Wallis (1968) introduced the colourful terms 'Noah Effect' and 'Joseph Effect' for heavy tails and long memory respectively; both labels referencing key events of the Biblical characters' lives. Ironically, as noted above, the river level data which motivated the work were essentially Gaussian. (However a preliminary demonstration of the robustness of R/S as a measure of LRD was given in Mandelbrot and Wallis (1969b), using a heavy tailed modification of FBM which they dubbed "fractional hyperbolic motion".)

Mandelbrot and Wallis distinguished processes whose behaviour is 'nice', lying within the 'Brownian domain of attraction' (BDoA), from those whose behaviour is 'erratic'. (Mandelbrot later preferred the term 'mild' to 'nice', and subdivided 'erratic' into heavy tailed 'wild' and strongly dependent 'slow'. We stick with his original terminology.) This 'erratic' behaviour could be caused by one, or both, of the Joseph and Noah effects. Mandelbrot and Wallis showed that processes lying within the BDoA will, after an initial transient behaviour, obey the n^{1/2} law. They rejected, on philosophical grounds, the idea that the Hurst phenomenon might be caused by transient effects.

Mandelbrot proceeded to provide more evidence in support of his model. Mandelbrot and Wallis (1969a) included several sample graphs of simulated realisations of FGN with varying h. The explicit aim was to "encourage comparison of [the] artificial series with the natural record with which the reader is concerned". These simulations were performed by using one of two methods developed by the authors, which were different types of truncated approximations. (As documented by Mandelbrot (1971), it was soon found that one of these approximations was far from adequate because it failed to accurately reproduce the desired effects. The algorithms were also slow to implement; a significant practical problem when computer time was expensive and processing power limited. Mandelbrot (1971) therefore introduced a more efficient algorithm. Later an exact algorithm would be created by Hipel and McLeod (1978a,b), which forms the basis for modern algorithms (Davies and Harte, 1987).)

Mandelbrot and Wallis also carried out R/S analysis, but they recognised the previously mentioned logical flaw in Hurst's approach. They therefore developed a systematic two-parameter log-regression to obtain an estimate of h, which we will denote H. This approach to R/S analysis has since become the standard method.

The simulated sample paths were subjected to both R/S and spectral analysis, and for both cases it was found that the simulated paths largely agreed with the theory, i.e. the sample paths seemed sufficiently good representations of the theoretical processes. For the R/S analysis, it was found, as expected, that there existed three distinct regions: transient behaviour, 'Hurst' behaviour, and asymptotic '1/2' behaviour. This last region was caused entirely because the simulations were essentially short memory approximations to the long memory processes; infinite moving averages were truncated to finite ones. Thus this third region could be eliminated by careful synthesis, i.e. by making the running averages much longer than the ensemble length. Furthermore the transient region was later shown (Taqqu, 1970) to be largely a feature of a programming error.
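For reference, the exact simulation approach alluded to parenthetically above (Davies and Harte, 1987) rests on embedding the FGN covariance matrix in a circulant matrix, which the FFT diagonalises. The following compact sketch (our own paraphrase of that circulant-embedding idea in Python, not code from the original papers) simulates FGN exactly up to floating-point error:

```python
import numpy as np

def fgn_davies_harte(n, h, rng=None):
    """Exact FGN sample of length n via circulant embedding (Davies-Harte)."""
    rng = np.random.default_rng(rng)
    k = np.arange(n + 1, dtype=float)
    gamma = 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + np.abs(k - 1) ** (2 * h))
    row = np.concatenate([gamma, gamma[-2:0:-1]])   # circulant first row, length 2n
    lam = np.fft.fft(row).real                      # its eigenvalues; >= 0 for FGN
    if lam.min() < 0:
        raise ValueError("embedding failed; increase n")
    m = 2 * n
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    x = np.fft.fft(np.sqrt(lam) * z) / np.sqrt(m)
    return x.real[:n]   # real part has exactly the FGN covariance
```

Estimating H on such samples with the earlier hurst_rs sketch should recover h to within sampling error, without the spurious asymptotic '1/2' region that the truncated approximations produced.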
Mandelbrot and Wallis (1969c) applied their R/S method to many of the same data types as Hurst (1951, 1956a) and Hurst et al. (1965), and similarly found significant evidence in favour of the long memory hypothesis. In a comparison of Hurst's K with their H, Mandelbrot and Wallis pointed out that K will tend to under-estimate h when h > 0.72, but over-estimate it when h < 0.72. So Hurst's celebrated finding of a global average of 0.72 was heavily influenced by his poor method, and his estimated standard deviation about this mean was underestimated. This important point, that the original empirical findings which helped spawn the subject of long memory were systematically flawed, has long been forgotten.

Next, Mandelbrot and Wallis (1969b) undertook a detailed Monte Carlo study of the robustness to non-Gaussianity of their R/S method. As previously mentioned, in general R/S was shown to be very robust. The different distributions studied were Gaussian, lognormal, heavy-tailed (not α-stable, but attracted to that law), and truncated Gaussian (to achieve kurtosis lower than Gaussian). The distribution of the un-normalised adjusted range, R(n), was shown to be highly dependent on the distribution; however the division by S(n) corrected for this. For any sequence of iid random variables, their estimate H was always (close to) 1/2. However, R/S was shown to be susceptible in the presence of strong periodicities; a fact rather optimistically dismissed: "Sharp cyclic components rarely occur in natural records. One is more likely to find mixtures of waves that have slightly different lengths . . . ".

Finally, Mandelbrot and Wallis (1969b) intriguingly replaced the Gaussian variates in their FGN simulator with 'hyperbolic' variates. Although now known to have drawbacks, this was for a long time the only attempt at simulating a heavy-tailed long memory process.
By proposing heavy tailed models to economists, Mandelbrot had had a tough time advocating against orthodoxy (Mandelbrot, 2013). Because his fractional models were similarly unorthodox, he learned from his previous experience, and was more careful about introducing them to hydrologists. By producing several detailed papers covering different aspects of FBM he had covered himself against charges of inadequate exposition. Unsurprisingly however, many hydrologists were unwilling to accept the full implications of his papers. Firstly, Mandelbrot's insistence on self-similar models seemed somewhat implausible and restrictive, and seemed to totally ignore short-term effects. Secondly, Mandelbrot's model was continuous-time which, although necessary to cope with self-similarity, was only useful in a theoretical context because we live in a digital world; data are discrete and so are computers. (Mandelbrot was primarily interested in FBM; he saw the necessary discretisation, FGN, as its derivative, both literally and metaphorically.) As soon as his models were applied to the real world they became compromised:

The theory of fractional noise is complicated by the motivating assumptions being in continuous time and the realizable version being needed in discrete time. (Lawrance and Kottegoda, 1977)

Meanwhile, Box and Jenkins (1970) had developed the ARMA(p, q) family of models, extended via an integer differencing parameter, d, to produce the very flexible class of ARIMA(p, d, q) models. As in other scientific fields, many hydrologists were attracted to these models, and sought to explore the possibility of using them to replicate the Hurst phenomenon. It is important to note that ARIMA models cannot genuinely reproduce the asymptotic Hurst phenomenon, since all ARIMA models either have short memory, or are non-stationary. However, by choosing parameters carefully, it can be shown that it is possible to replicate the observed Hurst phenomenon over a large range of n; a minimal demonstration of this transient effect is sketched below. O'Connell (1971) was an early exponent of this idea; specifically he used an ARMA(1, 1) model which could (roughly) preserve a given first-lag auto-correlation as well as h.
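As a quick illustration of this transient effect (our own sketch, not a reproduction of the historical experiments), one can simulate a strongly autocorrelated AR(1) process and apply the hurst_rs estimator from earlier: over short blocks the fitted R/S slope sits well above 1/2, and it should relax back towards 1/2 only once the block length greatly exceeds the correlation time.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi, rng):
    """Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

x = ar1(500_000, 0.9, rng)                      # correlation time about 10 steps
print(hurst_rs(x, [32, 64, 128, 256]))          # well above 0.5: pseudo-Hurst region
print(hurst_rs(x, [32_768, 65_536, 131_072]))   # drifts back towards 0.5
```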
To summarise, in the early 1970s there were two distinct approaches to modelling hydrological processes. One could use traditional AR processes (or their more advanced ARMA cousins) which, although able to partially replicate the Hurst phenomenon, were essentially short memory models. Alternatively one could use Mandelbrot's FGN process in order to capture the Hurst phenomenon itself. (For completeness, we mention that other modelling approaches were investigated to try and replicate the Hurst phenomenon. One such model was the so-called 'broken-line' process detailed by Rodríguez-Iturbe et al. (1972), Garcia et al. (1972), and Mejia et al. (1972, 1974), which sought to preserve a twice differentiable spectrum. This was criticised by Mandelbrot (1972a) and did not prosper.)

As an aside, the set of papers by McLeod and Hipel were also remarkable for two other reasons. As mentioned previously, they developed an exact FGN simulator (using the Cholesky decomposition method), which, although computationally expensive, was the first time anyone had been able to simulate genuine long memory data. Secondly, the authors derived a maximum likelihood estimator for the FGN parameter h. This was the first proper attempt at parametric modelling of FGN. Mandelbrot and Taqqu (1979) were dismissive of this approach due to the strong assumptions needed; however from a theoretical statistical point of view it was a clear breakthrough.

Not everyone was persuaded by self-similar models, and some objected on physical grounds:

[Using self-similarity (with h ≠ 1/2)] to extrapolate the correlated behaviour from a finite time span to an asymptotically infinite one is physically completely unjustified. Furthermore, using self-similarity to intrapolate [sic] to a very short time span . . . is physically absurd. (Scheidegger, 1970)

Interestingly, in his reply, Mandelbrot (1970) somewhat missed the point:

[The] self-similar model is the only model that predicts for the rescaled range statistic . . . precisely the same behaviour as Harold Edwin Hurst has observed empirically. To achieve the same agreement with other models, large numbers of ad hoc parameters are required. Thus the model's justification is empirical, as is ultimately the case for any model of nature.

Yet another argument used to oppose the use of long memory models arose from a debate about their practical value. By not incorporating long memory into models, at how much of a disadvantage was the modeller? Clearly, this is a context-specific question, but the pertinent question in hydrology is: by how much does incorporating long memory into the stochastic model change the ideal dam height? One view, shared by Mandelbrot:

The preservation within synthetic sequences . . . [of h] is of prime importance to engineers since it characterizes long term storage behaviour. The use of synthetic sequences which fail to preserve this parameter usually leads to underestimation of long term storage requirements. (O'Connell, 1971)

By ignoring the Hurst phenomenon, we would generally expect to underestimate the ideal dam height, but how quantifiable is the effect? Wallis and Matalas (1972) were the first to demonstrate explicitly that the choice of model did indeed affect the outcome, by comparing AR(1) and FGN using the Sequential Peak algorithm — a deterministic method of assessing storage requirements based on the work of Rippl (1883) and further developed in the 1960s. Wallis and Matalas showed that the height depends on both the short and long memory behaviours, and in general, FGN models require larger storage requirements, as expected. Lettenmaier and Burges (1977) went into more detail by looking at the distribution of the ideal dam height (rather than simply the mean value) and found it followed extreme value theory distributions. Lettenmaier and Burges also showed that long memory inputs required slightly more storage, thus confirming the perception that long memory models need to be used to guard against 'failure'.

However Klemeš et al. (1981) argued against this philosophy, instead suggesting that 'failure' is not an absolute term. In the context of hydrology, 'failure' would mean being unable to provide a large enough water supply; yet clearly a minimal deficit over a few days is of a different severity to a substantial drought over many years. Any 'reasonable' economic analysis should take this into account. Klemeš et al. claimed that the incorporation of long memory into models used to derive the optimum storage height is essentially a 'safety factor', increasing the height by a few percent; however ". . . in most practical cases this factor will be much smaller than the accuracy with which the performance reliability can be assessed."

In summary therefore, Mandelbrot's work was controversial because, although it provided an explanation of Hurst's observations, the physical interpretation of the solution was unpalatable.
There was no consensus regarding the whole philosophy of hydrological modelling: should the Hurst phenomenon be accounted for, and if so, implicitly or explicitly? Moreover, the new concept of long memory, borne out of the solution to the riddle, was both non-intuitive and mathematically unappealing at the time.

Much of the debate outlined above was confined to the hydrological community, in particular the pages of Water Resources Research. With the exception of articles appearing in probability journals concerning the distributions of various quantities related to the rescaled adjusted range, little else was known about long memory by statisticians. This was rectified by a major review paper by Lawrance and Kottegoda (1977), which helped bring the Hurst phenomenon to the attention of the wider statistical community.

One of those non-hydrologists who took up the Hurst 'riddle' was the eminent econometrician Clive Granger. In an almost-throwaway comment at the end of a paper, Granger (1978) floated the idea of 'fractionally differencing' a time series whose spectrum has a pole at the origin. Granger's observation was followed up by both himself, and independently by the hydrologist Hosking (1981), who between them laid the foundations for a different class of long memory model. This class of ARFIMA models is the most commonly used class of long memory models of the present day. If the empirical findings of Hurst helped to stimulate the field, and the models of Mandelbrot helped to revolutionise the field, the class of ARFIMA models can be said to have made the field accessible to all.
One of the objections to Mandelbrot's fractional Gaussian noise was that it was a discrete approximation to a continuous process. Hosking (1981) explained how FGN can be roughly thought of as the discrete version of a fractional derivative of Brownian motion. In other words, FGN is obtained by fractionally differentiating, then discretising. Hosking proposed to reverse this order of operations, i.e. discretising first, then fractionally differencing.

The advantage of this approach is that the discrete version of Brownian motion has an intuitive interpretation; it is the simple random walk, or ARIMA(0, 1, 0) model. We may fractionally difference this using the well-defined 'fractional differencing operator of order d' to obtain the ARFIMA(0, d, 0) process, which for 0 < d < 1/2 is stationary with long memory. (It should be pointed out that the ubiquity of 1/f spectra had been a puzzle to physicists since the work of Schottky in 1918. Adenstedt (1974) derived some properties of such processes, but his work went largely unnoticed until the late 1980s, while Barnes and Allan (1966) considered a model of 1/f noise explicitly based on fractional integration.) The key point is that it is perfectly possible to fractionally difference a process and, in order not to over- or under-difference data, it may be desirable to do so. Direct motivation was provided by Granger (1980), who showed that such processes could arise as an aggregation of independent AR(1) processes, where the auto-regressive parameters were distributed according to a Beta distribution (this aggregation of micro-economic variables was a genuine motivation, rather than a contrived example). Furthermore, Granger and Joyeux pointed out that in long-term forecasts it is the low frequency component that is of most importance. (It is worth remarking that forecasting is quite different from the synthesis discussed earlier; the former takes an observed sequence and, based on a statistical examination of its past, attempts to extrapolate its future. This is a deterministic approach and, given the same data and using the same methods, two practitioners will produce the same forecasts. Synthesis on the other hand is a method of producing a representative sample path of a given process and is therefore stochastic in nature. Given the same model and parameters, two practitioners will produce different sample paths (assuming their random number generator seeds are not initiated to the same value). However their sequences will have the same statistical properties.)
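Concretely, the fractional differencing operator expands as a binomial series, $(1-B)^d = \sum_{j \ge 0} \pi_j B^j$, whose weights obey a simple one-term recursion. The following sketch (our own illustration, with a hypothetical truncation scheme) computes the weights and uses the inverted operator to synthesise an approximate ARFIMA(0, d, 0) sample:

```python
import numpy as np

def frac_diff_weights(d, m):
    """Weights pi_j in (1 - B)**d = sum_j pi_j B**j, truncated at lag m."""
    w = np.ones(m + 1)
    for j in range(1, m + 1):
        w[j] = w[j - 1] * (j - 1 - d) / j   # pi_j = pi_{j-1} * (j - 1 - d) / j
    return w

def arfima_0d0(n, d, rng=None, burn=10_000):
    """Approximate ARFIMA(0, d, 0) sample: x_t = (1 - B)**(-d) eps_t."""
    rng = np.random.default_rng(rng)
    psi = frac_diff_weights(-d, n + burn)   # negating d inverts the operator
    eps = rng.standard_normal(n + burn)
    x = np.convolve(eps, psi)[:n + burn]
    return x[burn:]                         # discard the warm-up
```

Truncating the infinite series is exactly the kind of approximation that produced the spurious asymptotic '1/2' region noted earlier, hence the generous warm-up.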
Both Granger and Joyeux (1980) and Hosking (1981) acknowledged that their model was based on different underlying assumptions to Mandelbrot's models. They also recognised the extreme usefulness of introducing long memory to the Box–Jenkins framework. By considering their fractionally differenced model as an ARIMA(0, d, 0) process, it was an obvious leap to include the parameters p, q in order to model short-term effects; thence the full ARFIMA(p, d, q) model. By developing a process which could model both the short and long memory properties, the authors had removed the forced dichotomy between ARMA and FGN models. By being able to model both types of memory simultaneously, ARFIMA models immediately resolved the main practical objection to Mandelbrot's FGN model.

Depending on the individual context and viewpoint, ARFIMA models can either be seen as pure short memory models adjusted to induce long memory behaviour, or pure long memory models adjusted to account for short-term behaviour. ARFIMA models are more often introduced using the former of these interpretations — presumably because most practitioners encounter the elementary Box–Jenkins models before long memory — however it is arguably more useful to consider the latter interpretation.
Although slow to take off, the increased flexibility of ARFIMA models, and their general ease of use compared to Mandelbrot's FGN, meant that they gradually became the long memory model of choice in many areas including hydrology and econometrics, although we have found them still to be less well known in physics than FGN. Apart from their discreteness (which may, or may not, be a disadvantage depending on the point of view), the only disadvantage that ARFIMA models have is that they are no longer completely self-similar. The re-scaled partial sums of a 'pure' ARFIMA(0, d, 0) model converge in distribution to FBM (see e.g. Taqqu, 2003); in this sense it is an asymptotically self-similar process. However any non-trivial short memory component introduces a temporal 'tick' and destroys this self-similarity.

Perhaps inevitably, given his original motivation for introducing self-similarity as an explanation for the Hurst phenomenon, and his further development of the whole concept of scaling into fractal theory, Mandelbrot was not attracted to ARFIMA models. Decades after their introduction, and despite their popularity, Mandelbrot would state:

[Granger] prefers a discrete-time version of FBM that differs a bit from the Type I and Type II algorithm in Mandelbrot and Wallis (1969a). Discretization is usually motivated by unquestionable convenience, but I view it as more than a detail. I favor very heavily the models that possess properties of time-invariance or scaling. In these models, no time interval is privileged by being intrinsic. In discrete-time models, to the contrary, a privileged time interval is imposed nonintrinsically. (Mandelbrot, 2002c)

Convenience would seem to rule the roost in statistics, however, as ARFIMA-based inference is applied in practice far more often than FBM/FGN. Many practitioners would argue that it is not hard to justify use of a "privileged time interval" in a true data analysis context: the interval at which the data are sampled, and/or at which decisions based on such data would typically be made, will always enjoy privilege in modeling and inference.

As we saw above, the introduction of the LRD concept into science came with Mandelbrot's application of the fractional Brownian models of Kolmogorov to an environmetric observation – Hurst's effect in hydrology. Nowadays, an important new environmetric application for LRD is to climate research. Here ARFIMA plays an important role in understanding long-term climate variability and in trend estimation, but remains less well known in some user communities compared to, for example, SRD models of the Box-Jenkins type, of which AR(1) is still the most frequently applied. Conversely, in many branches of physics the fractional α-stable family of models including FBM remains rather better known than ARFIMA. The process of codifying the reasons for the similarities and differences between these models, and also the closely related anomalous diffusion models such as the Continuous Time Random Walk, in a way accessible to users, is underway, but much more remains to be done here, particularly on the "physics of FARIMA".

We have attempted to demonstrate the original motivation behind long memory processes, and to trace the early evolution of the concept. Debates over the nature of such processes, and their applicability or appropriateness to real life, are still ongoing. Importantly, the physical meaning of FBM has been clarified by studies which show how it plays the role of the noise term in the generalised Langevin equation when a particular ("1/f") choice of heat bath spectral density has been made; see for example Kupferman (2004). Rather than draw our own conclusions, we have instead intended to illuminate the story of this fascinating area of science, and in particular the role played by Benoit Mandelbrot, who died in 2010. The facet of Mandelbrot's genius on show here was to use his strongly geometrical mathematical imagination to link some very arcane aspects of the theory of stochastic processes to the needs of operational environmetric statistics.
Rather than draw our own conclusions, we have instead aimed to illuminate the story of this fascinating area of science, and in particular the role played by Benoit Mandelbrot, who died in 2010. The facet of Mandelbrot’s genius on show here was to use his strongly geometrical mathematical imagination to link some very arcane aspects of the theory of stochastic processes to the needs of operational environmetric statistics. Quite how remarkable this was can only be fully appreciated when one reminds oneself of the available data and computational resources of the early 1960s, even at IBM. The wider story (Mandelbrot and Hudson, 2008; Mandelbrot, 2013) in which this paper’s theme is embedded, of how he developed and applied in sequence first the α-stable model in economics, followed by the fractional renewal model in 1/f noise, then FBM and a fractional hyperbolic precursor to the linear fractional stable models, and finally a multifractal model, all in the space of about ten years, shows both mathematical creativity and a real willingness to listen to what the data were telling him. The fact that he (and his critics) were perhaps less willing to listen to each other is a human trait whose effects on this story, we trust, will become less significant over time.

Acknowledgements
This paper is derived from Appendix D of TG’s PhD thesis (Graves, 2013). NWW and CF acknowledge the stimulating environment of the British Antarctic Survey, and TG and RG that of the Cambridge Statslab, during this period. We thank Cosma Shalizi, Holger Kantz, and the participants in the International Space Science Institute programme on “Self-Organized Criticality and Turbulence” for discussions, and David Spiegelhalter for support. CF was supported by the German Science Foundation (DFG) through the cluster of excellence CliSAP, while NWW has recently been supported at the University of Potsdam by ONR NICOP grant N62909-15-1-N143.
References
Adelman, I. (1965). “Long Cycles — Fact or Artifact?” The American Economic Review, 55, 3, 444–463.

Adenstedt, R. K. (1974). “On Large-sample Estimation for the Mean of a Stationary Random Sequence.” The Annals of Statistics, 2, 1095–1107.

Aharony, A. and Feder, J. (1990). Fractals in Physics: Essays in Honour of Benoit B. Mandelbrot. North-Holland; Amsterdam and New York, NY.

Anis, A. A. and Lloyd, E. H. (1953). “On the range of partial sums of a finite number of independent normal variates.” Biometrika, 40, 1/2, 35–42.

Bachelier, L. (1900). “Théorie de la Spéculation.” Annales Scientifiques de l’École Normale Supérieure, 3, 17, 21–86. Doctoral dissertation. Translation: Cootner (1964b).

Barnard, G. A. (1956). “Discussion of Hurst (1956a).” Proceedings of the Institution of Civil Engineers, 5, 5, 552–553.

Barnes, J. A. and Allan, D. W. (1966). “A statistical model of flicker noise.” Proceedings of the IEEE, 54, 2, 176–178.

Beran, J. (1994). Statistics for Long Memory Processes. Chapman & Hall; New York.

Beran, J., Feng, Y., Ghosh, S., and Kulik, R. (2013). Long Memory Processes. Springer; Heidelberg.

Bhattacharya, R. N., Gupta, V. K., and Waymire, E. (1983). “The Hurst Effect under Trends.” J. of Applied Probability, 20, 3, 649–662.

Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis, Forecasting and Control. Holden-Day.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. 2nd ed. Springer; New York.

Chow, V. T. (1951). “Discussion of Hurst (1951).” Transactions of the American Society of Civil Engineers, 116, 800–802.

Cootner, P. H. (1964a). “Comments on Mandelbrot (1963).” In Cootner (1964b), 333–337.

Cootner, P. H., ed. (1964b). The Random Character of Stock Market Prices. M.I.T. Press; Cambridge, MA.

Davies, R. B. and Harte, D. S. (1987). “Tests for Hurst Effect.” Biometrika, 74, 1, 95–102.

Doukhan, P., Oppenheim, G., and Taqqu, M. S., eds. (2003). Theory and Applications of Long-Range Dependence. Birkhäuser; Boston, MA.

Feller, W. (1951). “The Asymptotic Distribution of the Range of Sums of Independent Random Variables.” The Annals of Mathematical Statistics, 22, 3, 427–432.

Fiering, M. B. (1967). Streamflow Synthesis. Harvard University Press; Cambridge, MA.

Franzke, C. (2012). “Nonlinear trends, long-range dependence and climate noise properties of surface air temperature.” J. Climate, 25, 4172–4183.

Garcia, L. E., Dawdy, D. R., and Mejia, J. M. (1972). “Long Memory Monthly Streamflow Simulation by a Broken Line Model.” Water Resources Research, 8, 4, 1100–1105.

Granger, C. W. J. (1966). “The Typical Spectral Shape of an Economic Variable.” Econometrica, 34, 1, 150–161.

— (1978). “New Classes of Time Series Models.” J. of the Royal Statistical Society, Series D, 27, 3–4, 237–253.

— (1980). “Long Memory Relationships and the Aggregation of Dynamic Models.” J. of Econometrics, 14, 2, 227–238.

Granger, C. W. J. and Joyeux, R. (1980). “An Introduction to Long-memory Time Series Models and Fractional Differencing.” J. of Time Series Analysis, 1, 1, 15–29.

Graves, T. (2013). “A systematic approach to Bayesian inference for long memory processes.” Ph.D. thesis, University of Cambridge, UK.

Hazen, A. (1914). “Storage to be provided in impounding reservoirs for municipal water supply.” Transactions of the American Society of Civil Engineers, 77, 1539–1669. With discussion.

Hipel, K. W. and McLeod, A. I. (1978a). “Preservation of the rescaled adjusted range: 2. Simulation studies using Box–Jenkins Models.” Water Resources Research, 14, 3, 509–516.

— (1978b). “Preservation of the rescaled adjusted range: 3. Fractional Gaussian noise algorithms.” Water Resources Research, 14, 3, 517–518.

Hosking, J. R. M. (1981). “Fractional differencing.” Biometrika, 68, 165–176.

Hurst, H. E. (1951). “Long-term storage capacity of reservoirs.” Transactions of the American Society of Civil Engineers, 116, 770–808. With discussion.

— (1956a). “Methods of Using Long-term storage in reservoirs.” Proceedings of the Institution of Civil Engineers, 5, 5, 519–590. With discussion.

— (1956b). “The Problem of Long-Term Storage in Reservoirs.” Hydrological Sciences Journal, 1, 3, 13–27.

— (1957). “A suggested statistical model of some time series which occur in nature.” Nature, 180, 4584, 494.

Hurst, H. E., Black, R. P., and Simaika, Y. M. (1965). Long-Term Storage: An Experimental Study. Constable; London.

Klemeš, V. (1974). “The Hurst Phenomenon: A puzzle?” Water Resources Research, 10, 4, 675–688.

— (1987). “One hundred years of applied storage reservoir theory.” Water Resources Management, 1, 3, 159–175.

Klemeš, V., Srikanthan, R., and McMahon, T. A. (1981). “Long-memory flow models in reservoir analysis: What is their practical value?” Water Resources Research, 17, 3, 737–751.

Kolmogorov, A. N. (1940). “Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum.” Comptes Rendus (Doklady), 26, 115–118. In German. Translation: Tikhomirov (1991).
Kupferman, R. (2004). “Fractional kinetics in Kac–Zwanzig heat bath models.” Journal of Statistical Physics, 114, 291–326.

Lawrance, A. J. and Kottegoda, N. T. (1977). “Stochastic Modelling of Riverflow Time Series.” J. of the Royal Statistical Society, Series A, 140, 1, 1–47. With discussion.
Lettenmaier, D. P. and Burges, S. J. (1977). “Operational assessment of hydrologic models of long-term persistence.” Water Resources Research, 13, 1, 113–124.

Lloyd, E. H. (1967). “Stochastic reservoir theory.” Advances in Hydrosciences, 4, 281.

Mandelbrot, B. B. (1963). “The Variation of Certain Speculative Prices.” The J. of Business, 36, 4, 394–419. Correction: Mandelbrot (1972b).

— (1965). “Une classe de processus stochastiques homothétiques à soi; application à la loi climatologique de H. E. Hurst.” Comptes Rendus (Paris), 260, 3274–3277. In French. Translation: (Mandelbrot, 2002b, §H9).

— (1967). “Some noises with 1/f spectrum, a bridge between direct current and white noise.” IEEE Transactions on Information Theory, 13, 2, 289–298.

— (1970). “Comment on Scheidegger (1970).” Water Resources Research, 6, 6, 1791.

— (1971). “A fast fractional Gaussian noise generator.” Water Resources Research, 7, 3, 543–553.

— (1972a). “Broken line process derived as an approximation to fractional noise.” Water Resources Research, 8, 5, 1354–1356.

— (1972b). “Correction of an Error in Mandelbrot (1963).” The J. of Business, 45, 4, 542–543.

— (1975). “Limit theorems on the self-normalized bridge range.” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 31, 4, 271–285.

— (2002a). “Experimental power-laws suggest that self-affine scaling is ubiquitous in nature.” In Mandelbrot (2002b), 187–203.

— (2002b). Gaussian Self-Affinity and Fractals: Globality, The Earth, 1/f Noise, and R/S, vol. H of Selecta. Springer; New York.

— (2002c). “Global (long-term) dependence in economics and finance.” In Mandelbrot (2002b), 601–610.

— (2013). The Fractalist: Memoir of a Scientific Maverick. Vintage Books; New York, NY.

Mandelbrot, B. B. and Hudson, R. L. (2008). The (mis)Behaviour of Markets: A Fractal View of Risk, Ruin, and Reward. 2nd ed. Profile Books; London.

Mandelbrot, B. B. and Taqqu, M. S. (1979). “Robust R/S analysis of long-run serial correlation.” In Proceedings of the 42nd Session of the International Statistical Institute, Manila 1979, vol. 48(2) of Bulletin of the International Statistical Institute, 69–105. With discussion.

Mandelbrot, B. B. and Van Ness, J. W. (1968). “Fractional Brownian Motions, Fractional Noises and Applications.” SIAM Review, 10, 4, 422–437.

Mandelbrot, B. B. and Wallis, J. R. (1968). “Noah, Joseph and operational hydrology.” Water Resources Research, 4, 5, 909–918.

— (1969a). “Computer experiments with Fractional Gaussian noises.” Water Resources Research, 5, 1, 228–267.

— (1969b). “Robustness of the rescaled range R/S in the measurement of noncyclic long run statistical dependence.” Water Resources Research, 5, 5, 967–988.

— (1969c). “Some long-run properties of geophysical records.” Water Resources Research, 5, 2, 321–340.

Matalas, N. C. and Huzzen, C. S. (1967). “A property of the range of partial sums.” In Proceedings of the International Hydrology Symposium, vol. 1, 252–257. Colorado State University; Fort Collins, CO.

McLeod, A. I. and Hipel, K. W. (1978). “Preservation of the rescaled adjusted range: 1. A reassessment of the Hurst Phenomenon.” Water Resources Research, 14, 3, 491–508.

Mejia, J. M., Dawdy, D. R., and Nordin, C. F. (1974). “Streamflow Simulation 3: The Broken Line Process and Operational Hydrology.” Water Resources Research, 10, 2, 242–245.

Mejia, J. M., Rodríguez-Iturbe, I., and Dawdy, D. R. (1972). “Streamflow Simulation 2: The Broken Line Process as a Potential Model for Hydrologic Simulation.” Water Resources Research, 8, 4, 931–941.

Montanari, A. (2003). “Long-Range Dependence in Hydrology.” In Doukhan et al. (2003), 461–472.

Moran, P. A. P. (1959). The Theory of Storage. Wiley; New York.

— (1964). “On the range of cumulative sums.” Annals of the Institute of Statistical Mathematics, 16, 1, 109–112.

O’Connell, P. E. (1971). “A simple stochastic modelling of Hurst’s law.” In Mathematical Models in Hydrology: Proceedings of the Warsaw Symposium, July 1971, vol. 1, 169–187.

Potter, K. W. (1976). “Evidence for nonstationarity as a physical explanation of the Hurst Phenomenon.” Water Resources Research, 12, 5, 1047–1052.

Rippl, W. (1883). “The Capacity of storage-reservoirs for water-supply.” Minutes of the Proceedings of the Institution of Civil Engineers, 71, 270–278.

Rodríguez-Iturbe, I., Mejia, J. M., and Dawdy, D. R. (1972). “Streamflow Simulation 1: A New Look at Markovian Models, Fractional Gaussian Noise, and Crossing Theory.” Water Resources Research, 8, 4, 921–930.

Samorodnitsky, G. and Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall; New York.

Scheidegger, A. E. (1970). “Stochastic Models in Hydrology.” Water Resources Research, 6, 3, 750–755.

Siddiqui, M. M. (1976). “The asymptotic distribution of the range and other functions of partial sums of stationary processes.” Water Resources Research, 12, 6, 1271–1276.

Solari, M. E. and Anis, A. A. (1957). “The Mean and Variance of the Maximum of the Adjusted Partial Sums of a Finite Number of Independent Normal Variates.” The Annals of Mathematical Statistics, 28, 3, 706–716.

Sornette, D. (2004). Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press; Princeton, NJ.

Sudler, C. (1927). “Storage required for the regulation of streamflow.” Transactions of the American Society of Civil Engineers, 91, 622–660.

Taqqu, M. S. (1970). “Note on evaluations of R/S for fractional noises and geophysical records.” Water Resources Research, 6, 1, 349–350.

— (2003). “Fractional Brownian Motion and Long-Range Dependence.” In Doukhan et al. (2003), 5–38.

— (2013). “Benoit Mandelbrot and Fractional Brownian Motion.” Statistical Science, 28, 131–134.

Tikhomirov, V. M., ed. (1991). Selected Works of A. N. Kolmogorov, vol. 1: Mathematics and Mechanics. Kluwer Academic Publishers; Dordrecht, The Netherlands. Translation from Russian by V. M. Volosov.

Turcotte, D. (1997). Fractals and Chaos in Geology and Geophysics. 2nd ed. Cambridge University Press; Cambridge, UK.

Wallis, J. R. and Matalas, N. C. (1972). “Sensitivity of reservoir design to the generating mechanism of inflows.” Water Resources Research, 8, 3, 634–641.

Wallis, J. R. and O’Connell, P. E. (1973). “Firm Reservoir Yield – How Reliable are Historic Hydrological Records?”