Inferring long memory using extreme events
Dayal Singh and M. S. Santhanam
Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India.
(Dated: November 24, 2020)

Many natural and physical processes display long memory and extreme events. In these systems, the measured time series is invariably contaminated by noise. As the extreme events display large deviations from the mean behaviour, the noise does not affect the extreme events as much as it affects the typical values. Since the extreme events also carry information about correlations in the full time series, they can be used to infer the correlation properties of the latter. In this work, from a given time series, we construct three modified time series using only the extreme events. It is shown that the correlations in the original time series and in the modified time series, as measured by the exponent obtained from the detrended fluctuation analysis technique, are related to each other. Hence, the correlation exponents for a long memory time series can be inferred from its extreme events alone. This approach is demonstrated for several empirical time series.
PACS numbers: 02.50.-r, 89.75.Da, 05.4r05.-a
I. INTRODUCTION
Any event whose magnitude strays far from its typical values can be designated as an extreme event. Such extreme events have significant impact in both nature and society [1, 2]. The consequences of naturally occurring extreme events such as floods, droughts, cyclones and earthquakes are often disastrous. Extremely large solar flares, such as the Halloween storms of 2003 [3], can potentially damage communication satellites, power distribution and communication networks, and might require re-routing of aircraft [4]. Due to our reliance on technology in day-to-day life, we also encounter comparatively less disruptive extreme events, ranging from mobile network congestion to traffic jams. In the economy, market crashes have impacted the entire international financial system [5]. Due to their disproportionately large social and economic costs, it is essential to understand the properties of extreme events and their early warning signals.

It is by now well understood that, irrespective of their physical origins, extreme events display certain generic statistical properties. One such possibility arises because a large number of natural, socio-economic and technological systems display the long memory property. This implies that the autocorrelation function decays sufficiently slowly in a power-law form, $\langle x(t)\, x(t+\tau) \rangle \sim \tau^{-\gamma}$, where $0 < \gamma <$ [...] and fall in this class [11, 12]. Primarily motivated by these examples, considerable research effort has been invested in studying the distribution of the time interval $R$ between successive occurrences of extreme events, called the recurrence time distribution $P(R)$. For a time series with autocorrelation exponent $\gamma$, the approximate form of $P(R)$ has a power-law decay for short recurrence intervals and a stretched exponential decay for long recurrence intervals [10], and the characteristic exponents in these regimes are functions of $\gamma$.
Thus, $\gamma$ carries information about the entire time series as well as about its extreme events.

Often, such characterisation through $\gamma$ becomes ambiguous when the long-range correlated time series is contaminated by noise and/or missing data. For sufficiently strong noise contamination, a time series can lose its long-range character and even become uncorrelated. Similarly, missing data can also lead to uncorrelatedness. Generally, extreme events deviate strongly from typical values and are consequently far less affected by the noise level in the measurements than the regular non-extreme events. Randomly missing data affects non-extreme events more than extreme events, since the non-extremes typically outnumber the extremes. All these arguments imply that it might be possible to study the statistical features of the extreme events alone and infer information about long-range correlations in a time series. Effectively, the information contained in the possibly noisy non-extreme values of the time series can be disregarded. Hence, the main premise of this article is to characterise a long-range correlated time series by using only its extreme events. Further, apart from its application to noisy time series, in the context of the current interest in big data this can be thought of as a method to estimate correlation exponents of very long time series using only the small fraction of the data that constitutes extreme events.

In this work, we use detrended fluctuation analysis (DFA) to quantify long memory, and the results are presented in terms of the DFA exponent defined in Eq. 3. This method has been extensively studied earlier, largely through simulations, and is useful even in the presence of non-stationarities [13, 14] and trends [15] in the time series. Recently, detailed theoretical studies of DFA [16-18] have shown that detrending is implicit if the fluctuation function is to be an unbiased estimator [19].
Probabilistic approaches have been adopted to obtain expected values of the fluctuation function for Gaussian processes [20]. This paper can also be thought of as a study of the response of a long memory series to a specific kind of data loss, and is relevant in the context of research interest in how DFA fluctuation functions behave under similar conditions of data loss [21]. In this work, we pick out the extreme events in a time series and treat their long memory properties separately. This has broad parallels in earlier work in which the sign and magnitude of the fluctuations were singled out for separate treatment of their long-range properties [22]. It must also be mentioned that the DFA technique can be applied to classical and quantum systems at criticality [23], and further that it is equivalent to the $\Delta_3$ statistics widely used in random matrix theory [24].

Let $x(t)$ be a long-range correlated time series and $Q$ be the threshold that defines extreme events. Then, extreme events are those for which $x(t) > Q$. A schematic of designating events as extreme is shown in Fig. 1, in which the solid horizontal line marks the threshold for extreme events. This figure also shows recurrence intervals, the time intervals between two successive extreme events. Suppose we consider a Gaussian distributed time series $x(t)$ with long memory exponent $\gamma = 0.2$; then, upon adding white noise of sufficient strength, the series tends to become uncorrelated. As we show in this work, by isolating only the extreme events from $x(t)$, we can still infer the long memory exponent of $x(t)$.

The rest of the article is structured as follows. In Section II, the methods and measured data used in this work are reviewed. In particular, we review detrended fluctuation analysis and the Fourier filtering method used to generate synthetic long memory data. In Sections III to V, we introduce three different ways of using extreme events to infer the long memory exponent, and a regression-based method is used to estimate the exponent in Section VI. Finally, conclusions are presented in Section VII.

II. METHODS AND DATA

A. Detrended Fluctuation Analysis
Detrended fluctuation analysis is a widely employed technique to quantify correlations in non-stationary time series data [25, 26]. Several variants of this technique have also been studied in the literature; see Ref. [27] for one such variant based on orthogonal polynomials. We briefly review the basic technique here; the details are available in Ref. [25]. Let $x(t')$, $t' = 1, 2, 3, \ldots, N$ denote a time series of length $N$ with mean $\mu$ and variance $\sigma^2$. The integrated version of the time series is given by

$$ y(t) = \sum_{t'=1}^{t} \left( x(t') - \mu \right). \qquad (1) $$

FIG. 1. A schematic time series $x(t)$. For a threshold of $Q = 2$, indicated by a horizontal line, three extreme events can be identified. Two return intervals, $r_1$ and $r_2$, are shown.

Now, $y(t)$ is partitioned into boxes of size $n$, where typically $n \le N/4$. Within each box, a polynomial $y_l(t)$ of order $l$ is fitted to $y(t)$. In practice, order 1 is usually used. The time series is locally detrended by subtracting $y_l(t)$ from the integrated time series. The root-mean-square fluctuation function, as a function of the box size $n$, is given by

$$ F(n) = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \left( y(t) - y_l(t) \right)^2 }. \qquad (2) $$

This process is repeated by varying the box size $n$. For a correlated time series, the fluctuation function $F(n)$ generally scales as

$$ F(n) \sim n^{\alpha}, \qquad (3) $$

where $\alpha$ is the DFA exponent and indicates the degree of correlation. If $\alpha = 0.5$, then the series is uncorrelated. If $\alpha > 0.5$, the series is positively correlated (persistent), and if $\alpha < 0.5$, it is anti-correlated. [...] $\alpha \in [0, 1]$ [11, 12], though it is also known to work for $\alpha >$ [...] $n$ [15]. In this case, $\alpha$ can be reliably estimated by first integrating the anti-correlated series and then applying DFA to it. The local trend is then removed by fitting a second-order polynomial, since the series is integrated twice in this process. The true exponent can be calculated from the estimated exponent $\alpha'$ using the relation $\alpha' = \alpha + 1$ [15]. In Appendix A, it is shown that this method (we call it DFA-int) can estimate the exponent when $\alpha < 0$. For a stationary time series, $\alpha$ is related to the autocorrelation exponent $\gamma$ and the power spectral exponent $\beta$ through the relations

$$ \alpha = \frac{1 + \beta}{2}, \qquad (4) $$

$$ \alpha = 1 - \frac{\gamma}{2}. \qquad (5) $$

These relations [29] are valid for $0 < \alpha <$ [...] $x(t)$ with desired DFA exponent $\alpha_{in}$. Time series of length $2^{[\ldots]}$ were generated, and all the estimates for the various exponents are averaged over 40 realizations. The DFA exponent estimated from $x(t)$ is denoted $\alpha_x$. In practice, $\alpha_{in} \approx \alpha_x$, and hence we use the latter in the rest of this paper. Further, we employ surrogate data analysis [32] by randomly shuffling $x(t)$. If the shuffled series becomes uncorrelated, this implies that the correlations in the original data are non-trivial and not a chance occurrence.

TABLE I. Description of the long-range correlated, empirical data analyzed in this work.

Data set             Years       Length of data   DFA exponent
S&P 500 index        1927-2020   23248            0.865
ED stock data        1962-2020   14739            0.850
IBM stock data       1962-2020   14739            0.883
BK stock data        1973-2020   11910            0.956
Seismic data         1981-2002   91797            0.784
Prague temperature   1775-2019   89484            0.670

B. Applications to observed data
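As a reference point for the analyses that follow, the DFA procedure of Sec. II A (Eqs. 1-3) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch and not the implementation used for the results in this paper: the function name `dfa_exponent`, the choice of about 20 logarithmically spaced box sizes between 8 and $N/4$, and the use of non-overlapping boxes that discard any leftover points are all assumptions made here.

```python
import numpy as np

def dfa_exponent(x, box_sizes=None, order=1):
    """Estimate the DFA exponent alpha of a 1-D series x (sketch of Eqs. 1-3)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    y = np.cumsum(x - x.mean())                      # integrated profile, Eq. (1)
    if box_sizes is None:
        # Assumption: ~20 log-spaced box sizes between 8 and N/4.
        box_sizes = np.unique(
            np.logspace(np.log10(8), np.log10(N // 4), 20).astype(int))
    F = []
    for n in box_sizes:
        n_boxes = N // n
        boxes = y[: n_boxes * n].reshape(n_boxes, n)  # non-overlapping boxes
        t = np.arange(n)
        # Fit a polynomial of the given order in every box at once.
        coeffs = np.polynomial.polynomial.polyfit(t, boxes.T, order)
        trend = np.polynomial.polynomial.polyval(t, coeffs)  # (n_boxes, n)
        # Root-mean-square residual after local detrending, Eq. (2).
        F.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    # Slope of log F(n) against log n gives alpha, Eq. (3).
    alpha, _ = np.polyfit(np.log(box_sizes), np.log(F), 1)
    return alpha

# Usage: white noise should give an exponent near 0.5.
rng = np.random.default_rng(0)
alpha_white = dfa_exponent(rng.standard_normal(2**14))
```

Fitting the slope over the full range of box sizes is the simplest choice; in practice one may prefer to restrict the fit to the range where $\log F(n)$ is visibly linear in $\log n$.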
The results presented in this paper are tested using observed data sets. For this purpose, time series from three different systems are considered for extreme-event based analysis. They are (a) absolute log-returns of daily closing index and stock data, namely, S&P index data and equity data of the ED, IBM and BK stocks [33], (b) seismic records from the Italian Seismicity Catalog CSI 1.1 [34], and (c) observed daily mean temperatures from the Prague observatory [35, 36]. More details about the data are given in Table I.

For the stock market data, if $x_k$ represents the value of the stock/index at time $k$, then the absolute log-return is defined as

$$ \rho_k = \left| \log\left( \frac{x_{k+1}}{x_k} \right) \right|. \qquad (6) $$

The extreme events analysis was performed on $\rho_k$. The Italian seismicity catalog, CSI 1.1, contains magnitudes of earthquakes in the Italian territory during 1981-2002. The data has 91797 earthquake records, of which $N = 39665$ have a magnitude evaluation. The observed daily temperature records from the Prague observatory are also analyzed. Let $T_i$ represent the measured temperature on the $i$-th day of any year. The seasonal trends were removed and we analysed the temperature anomaly $\Delta T$ defined by

$$ \Delta T_i = T_i - \bar{T}_i, \qquad (7) $$

where $\bar{T}_i$ is the average temperature for the $i$-th day taken over all years. The correlation properties of this series have been studied earlier [36].

FIG. 2. (a) Estimated DFA exponent $\alpha_\eta$ of the extreme event series $\eta(t)$ as a function of $\alpha_x$ for synthetic data. The shaded region shows the error bar around the mean trend. The two dashed lines are obtained by regression for the anti-correlated and the long-range correlated ($0.5 < \alpha_x <$ [...]) regimes, with slopes $m = -0.009$ and $m = 0.$[...] $\eta(t)$.

As seen from the last column of Table I, the DFA exponent values indicate that all the time series are long-range correlated. In all cases, the time series were standardised to zero mean and unit variance. To define extreme events in these measured data sets, a cut-off defined by $q = 1$ was chosen (see Eq. 8 below for further explanation).

III. EXTREME EVENTS: MODIFIED TIME SERIES
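Before the modified series is constructed, it may help to see the preprocessing of Sec. II B written out: absolute log-returns (Eq. 6), the temperature anomaly (Eq. 7), and standardisation to zero mean and unit variance. The sketch below is illustrative only. The function names are hypothetical, and the anomaly routine assumes a series with exactly `period` samples per year (leap days already removed), which is a simplification not stated in the text.

```python
import numpy as np

def abs_log_returns(prices):
    """Absolute log-returns rho_k = |log(x_{k+1} / x_k)|, Eq. (6)."""
    prices = np.asarray(prices, dtype=float)
    return np.abs(np.log(prices[1:] / prices[:-1]))

def temperature_anomaly(temps, period=365):
    """Anomaly Delta T_i = T_i - mean over all years of day i, Eq. (7).

    Assumption: `temps` holds whole years of `period` samples each;
    any incomplete final year is discarded.
    """
    temps = np.asarray(temps, dtype=float)
    n_years = temps.size // period
    table = temps[: n_years * period].reshape(n_years, period)
    return (table - table.mean(axis=0)).ravel()

def standardise(x):
    """Rescale a series to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Usage on a toy price series.
rho = standardise(abs_log_returns([100.0, 101.0, 99.5, 102.0, 101.2]))
```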
In this section, we analyse a modified time series in which only the extreme events are retained and all other values are set to zero. A correlated time series $x(t)$ with DFA exponent $\alpha_x$ is considered. It is assumed that $x(t)$ has a well defined mean $\mu$ and standard deviation $\sigma$. An event at any instant of time $t$ is designated as an extreme event if $x(t) > x_{th}$, where the threshold is

$$ x_{th} = \mu + q\,\sigma \qquad (8) $$

and $q \ge 0$. From the given original time series $x(t)$, a modified time series $\eta(t)$ is constructed as follows:

$$ \eta(t) = \begin{cases} x(t), & \text{if } x(t) > x_{th}, \\ 0, & \text{otherwise}. \end{cases} \qquad (9) $$

Throughout this paper, we choose $q = 1$ as the threshold for extreme events. Further, we address the following question: given a long-range correlated time series $x(t)$ with DFA exponent $\alpha_x$, what is the DFA exponent $\alpha_\eta$ of the modified time series $\eta(t)$? Thus, we explore the self-affinity of $\eta(t)$ using the DFA technique for $\alpha_x \in [-$[...] $\alpha_x$ and $\alpha_\eta$, for which evidence will be presented below, can be stated as follows:

$$ \alpha_\eta \begin{cases} \approx 0.5, & -[\ldots] \le \alpha_x \le 0.5, \\ < \alpha_x, & 0.5 < \alpha_x \le [\ldots]. \end{cases} \qquad (10) $$

This result is presented in Fig. 2(a). In this figure, the original long-range correlated time series $x(t)$ of length $2^{[\ldots]}$ is synthetically generated using the Fourier filtering technique. By putting $q = 1$, extreme events are identified and a modified time series $\eta(t)$ is generated. The DFA technique is then applied to $\eta(t)$. Figure 2(a) shows $\alpha_\eta$ plotted as a function of $\alpha_x$. As can be inferred from this figure, the modified series $\eta(t)$ is found to be uncorrelated in the anti-correlated regime of $x(t)$, i.e., $\alpha_x < 0.5$, whereas the correlations increase monotonically in the positively correlated regime, $\alpha_x > 0.5$, with an approximate slope of about 0.907. This systematic behaviour implies that $\alpha_\eta$ can be used to estimate the value of $\alpha_x$ in the persistence regime with $\alpha_x \in [0.5,$ [...] $\alpha_x >$ [...], $\alpha_\eta$ is not reliable owing to its large variance, which is primarily due to the finite size of the time series. Figure 2(a) also displays a similar analysis performed on the observed data sets listed in Table I. It shows that $\alpha_\eta$ for the observed data sets closely follows the trend obtained from synthetic long-range correlated data. This further confirms the systematic relation between $\alpha_\eta$ and $\alpha_x$.

To physically understand these results, we note that Eq. 9 is essentially a process of data loss. For an anti-correlated series, even moderate data loss of about [...] about 84% (for $q = 1$) of the series is removed, and hence we expect the short-range correlations in the rest of the series to be destroyed. This results in an uncorrelated time series. However, remarkably, the long-range correlated series is relatively more robust against the loss of non-extreme events. This is because persistence ensures that extreme events are bunched together, and the construction in Eq. 9 does not destroy these extreme event values; as a result, their correlations are preserved with slight modifications. The effect of noisy data is also shown in Fig. 2. The data $\eta(t)$ is deliberately contaminated with uniformly distributed noise $\xi$ [...]. As this reveals, additive noise in the data does not significantly affect the value of $\alpha_\eta$, since extreme events are less affected by noise than normal events. Finally, Fig. 2(b) shows the DFA exponent obtained after shuffling the time series. As expected, it shows that $\alpha_\eta \approx 0.5$.

IV. CORRELATIONS IN RETURN INTERVALS SERIES
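The construction of the modified series in Eqs. 8 and 9 of the preceding section is simple enough to state in code. The following is a minimal sketch, with the hypothetical function name `extreme_event_series`; it uses the sample mean and standard deviation of the series for $\mu$ and $\sigma$.

```python
import numpy as np

def extreme_event_series(x, q=1.0):
    """Return (eta, x_th): the modified series of Eq. (9) and the threshold of Eq. (8)."""
    x = np.asarray(x, dtype=float)
    x_th = x.mean() + q * x.std()        # threshold, Eq. (8)
    eta = np.where(x > x_th, x, 0.0)     # keep extremes, zero the rest, Eq. (9)
    return eta, x_th

# Usage: for Gaussian data and q = 1, roughly 84% of the entries are set to zero.
rng = np.random.default_rng(1)
eta, x_th = extreme_event_series(rng.standard_normal(100_000), q=1.0)
fraction_removed = np.mean(eta == 0.0)
```

The fraction removed follows from the Gaussian tail probability: about 15.9% of the values exceed $\mu + \sigma$, so about 84% are discarded, which is the data loss quoted in the text for $q = 1$.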
The time-ordered extreme events, exploited in Section III, are one useful piece of information about the time series. Another useful component of the time series is the set of return intervals between successive occurrences of extreme events. This has been extensively studied as recurrence statistics, and it is fairly well understood that the return interval distribution is parameterised by the autocorrelation exponent $\gamma$ of a long-range correlated time series [10]. However, as shown in Appendix B, the return intervals are not correlated with the extreme event values, as measured by the linear Pearson correlation. To a first approximation, these two components of a long-range correlated time series, the extreme event values and the recurrence statistics, can therefore be treated as uncorrelated with one another.

For any $x(t)$, $t = 1, 2, 3, \ldots, N$, and an extreme event threshold $x_{th}(q)$, we consider two successive occurrences of extreme events at times $t = t_m$ and $t_{m+1}$. Then, the return interval series is defined as

$$ r(m) = t_{m+1} - t_m, \quad m = 1, 2, 3, \ldots, \qquad (11) $$

and its DFA exponent is denoted by $\alpha_r$. Figure 3 displays simulation results for how $\alpha_r$ varies as a function of $\alpha_x$. It reveals that the return intervals are uncorrelated in the anti-correlated regime of $\alpha_x$. This is to be expected: since the extreme events are uncorrelated in this regime, the return intervals are uncorrelated as well. In contrast, in the regime $0.5 < \alpha_x <$ [...], $\alpha_r$ increases approximately linearly (till about $\alpha_x = 0.9$) at a rate slower than that for $\alpha_\eta$. In general, we infer that

$$ \alpha_r < \alpha_\eta < \alpha_x. \qquad (12) $$

FIG. 3. (a) Estimated DFA exponent of the return interval series $r(t)$ plotted against $\alpha_x$. The shaded region shows the error bar around the mean trend. The two dashed lines, obtained by regression for the anti-correlated and long-range correlated regimes, have slopes $m = 0.013$ and $m = 0.811$ respectively. The orange symbols are from the observed data sets in Table I. The black symbols are the DFA exponents for synthetic data with uniform noise added. (b) DFA exponent of randomly shuffled $r(t)$.

Upon increasing $\alpha_x$ further beyond $\alpha_x > 1$, the correlations begin to decrease monotonically, and the return intervals become almost uncorrelated. This happens because, as $\alpha_x$ increases, correlations get stronger and more and more extreme events are consecutive in time, with return interval $r = 1$. Then, due to consecutive extreme events, $r(m) = 1$ for long times, and this suppresses fluctuations, leading to a decrease in $\alpha_r$. In fact, this effect appears to take over even as $\alpha_x \to 0.9$. For high values of $\alpha_x$ [...] $\alpha_r = 0.$[...] $m \to \infty$, the return interval series will be slightly correlated in the regime $\alpha_x >$ [...]. Figure 3(a) also displays the values of $\alpha_r$ computed for the observed data sets listed in Table I. Good agreement is observed with the trend shown by the synthetic data (blue symbols). The systematic relationship between $\alpha_r$ and $\alpha_x$ is thus also exhibited by the observed time series. We also point out that additive noise (uniform noise added to $r(t)$) does not significantly affect the value of $\alpha_r$, as seen from the black symbols in Fig. 3(a). If the return interval series $r(t)$ is randomly shuffled, the DFA exponent is approximately 0.5 (Fig. 3(b)), pointing to the existence of non-trivial correlations in $r(t)$.

V. COMPRESSED EXTREME EVENTS
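For concreteness, the return interval series of Eq. 11 in the preceding section can be extracted as follows. This is a minimal sketch with a hypothetical function name; time is measured in units of the sampling step, and the threshold is that of Eq. 8 computed from the sample mean and standard deviation.

```python
import numpy as np

def return_intervals(x, q=1.0):
    """Return r(m) = t_{m+1} - t_m for successive extreme events, Eq. (11)."""
    x = np.asarray(x, dtype=float)
    x_th = x.mean() + q * x.std()          # threshold, Eq. (8)
    t_m = np.flatnonzero(x > x_th)         # occurrence times of extreme events
    return np.diff(t_m)                    # intervals between successive events

# Usage: consecutive extreme events produce return intervals equal to 1.
r = return_intervals([0, 5, 0, 0, 5, 5, 0, 0, 0, 0])
```

In this toy example the extreme events occur at indices 1, 4 and 5, so the consecutive pair at 4 and 5 yields a return interval of 1. Long runs of such unit intervals are what suppress fluctuations in $r(m)$ for strongly persistent series.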
Starting from $x(t)$, we construct a time series $s(m)$ in which only the extreme events are retained, by removing all instances of non-extreme events. We call this the compressed extreme event series, and it is defined as

$$ s(m) = \{ x(t_m) \mid x(t = t_m) > x_{th} \}, \quad m = 1, 2, \ldots, N_e, \qquad (13) $$

where $N_e$ is the number of extreme events in the time series $x(t)$ of length $N$. In most cases, $N \ge N_e$. Note that, in contrast to Eq. 9, the non-extremes in Eq. 13 are entirely removed instead of being set to zero. The DFA exponent associated with the time series $s(m)$ is denoted by $\alpha_s$. The variation of $\alpha_s$ as a function of $\alpha_x$ is shown in Fig. 4. Evidently, the compressed extreme event series $s(m)$ is uncorrelated in the anti-correlated regime, $\alpha_x < 0.5$. This is the result of the extreme event series $x(t)$ being nearly uncorrelated.

FIG. 4. (a) Estimated DFA exponent of the compressed extreme event series $s(t)$ plotted against $\alpha_x$. The shaded region represents the error bar. The two dashed lines, obtained by regression for the regimes $\alpha_x < 0.5$ and $0.5 < \alpha_x < 1$, have slopes $m = 0.$[...] and $m = 0.983$ respectively. The orange symbols represent $\alpha_s$ for the observed data listed in Table I. The black symbols are the DFA exponents for synthetic data with uniform noise added. (b) DFA exponent of the randomly shuffled time series $s(t)$.

In the correlated regime, $\alpha_x > 0.5$, the observations shown in Fig. 4 reveal that $\alpha_s < \alpha_x$. This can be understood as follows. For the series $x(t)$, the fluctuation function is denoted as $F(L) \sim L^{\alpha_x}$. For a time series $x(t)$ of length $L$, the corresponding length of the compressed time series is $L' = L / \langle r \rangle$, where $\langle r \rangle$ represents the mean return interval for extreme events, which depends on the $x_{th}$ used to identify extreme events. Then, using the equation for $F(L)$, the fluctuation function for compressed extreme events can be written as

$$ F(L') \sim \left( \frac{L'}{\langle r \rangle} \right)^{\alpha_x} \sim L'^{\,\alpha_x (1 - \beta)}, \qquad (14) $$

where the DFA exponent of the compressed series $s(m)$ can be identified as $\alpha_s = \alpha_x (1 - \beta)$, and the box-size dependent $\beta$ is

$$ \beta = \log_{L'} \langle r \rangle = \frac{\log \langle r \rangle}{\log L'}. \qquad (15) $$

In the limit of large box size $L'$, i.e., $L' \gg 1$, we have $\beta \ll 1$. Hence, we can infer that $\alpha_x > \alpha_s$, as observed in the simulation results displayed in Fig. 4. The dependence on box size $L'$ is logarithmic and hence sufficiently weak that, for finite sample sizes, the DFA exponent of the compressed extreme event series does not change appreciably with the length of the time series. As predicted by Eq. 15, simulation results (not shown here) verify that $\alpha_s \approx \alpha_x$ as $L' \to \infty$.

Based on Eqs. 14-15 and Fig. 4, the relation between $\alpha_x$ and $\alpha_s$ can be summarised as

$$ \alpha_s \begin{cases} \approx 0.5, & -[\ldots] \le \alpha_x \le 0.5, \\ < \alpha_x, & 0.5 < \alpha_x \le [\ldots]. \end{cases} \qquad (16) $$

For $0.5 < \alpha_x <$ [...], $\alpha_x$ and $\alpha_s$ bear a linear relation. However, for $\alpha_x >$ [...], correlations are far stronger and most extreme events tend to be consecutive, with $\langle r \rangle \approx 1$. Hence, $\beta \approx 0$ and $\alpha_x \approx \alpha_s$. Thus, for a highly correlated time series, $\alpha_s$ provides a reasonable estimate of $\alpha_x$. Figure 4(a) also shows $\alpha_s$ computed for the observed data sets in Table I. Reasonably good agreement is seen between the trends of the synthetic data (blue symbols) and the observed data (orange symbols). Even if additive noise (uniform noise added to $s(t)$) is present in the data, it does not significantly affect the value of $\alpha_s$, as seen from the black symbols in Fig. 4(a). This is due to extreme events being less affected by noise than non-extreme events. Finally, as shown in Fig. 4(b), if the compressed extreme event series $s(t)$ is randomly shuffled, its DFA exponent is nearly 0.5, showing that the compressed extreme events carry non-trivial correlations inherited from the original time series $x(t)$.

Finally, it must be pointed out that the return interval series $r(t)$ and the compressed extreme event series $s(t)$ are, for the most part, uncorrelated processes. The linear correlation between them (not shown here) reveals that they are nearly uncorrelated. From any time series $x(t)$, the derived series $r(t)$ and $s(t)$ are created by considering extreme events defined with, say, $q = 1$. This approach also provides a way of performing the reverse process: assembling a time series with specified return interval correlations and specified extreme event correlations, whose DFA exponents are, respectively, $\alpha_r^{in}$ and $\alpha_s^{in}$. The DFA exponent obtained for such an assembled time series is displayed in Fig. 5. This figure reveals that, for a time series $x(t)$ with DFA exponent $\alpha_x$, the correlations are contributed largely by the return intervals (with exponent $\alpha_r^{in}$) and only weakly by the correlations in the extreme event values (with exponent $\alpha_s^{in}$).

FIG. 5. The heat map shows the DFA exponent of the time series obtained by assembling two components together: the return intervals with DFA exponent $\alpha_r^{in}$, and the extreme event series with DFA exponent $\alpha_s^{in}$. Notice that the composite time series has a DFA exponent close to that of its return intervals. The numbers in the grid give the actual DFA exponent, and the colormap encodes the same information.

VI. INFERRING DFA EXPONENT USING EXTREME EVENTS
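The compressed series of Eq. 13 and the box-size correction of Eqs. 14-15 from the preceding section can be made concrete with a short sketch. The function names below are hypothetical, and `predicted_alpha_s` simply evaluates $\alpha_s = \alpha_x (1 - \beta)$ for a given mean return interval and box size; it is an illustration of the scaling argument, not a fitted model.

```python
import numpy as np

def compressed_extreme_events(x, q=1.0):
    """Return s(m): the extreme values of x in time order, non-extremes removed (Eq. 13)."""
    x = np.asarray(x, dtype=float)
    x_th = x.mean() + q * x.std()      # threshold, Eq. (8)
    return x[x > x_th]

def beta_correction(mean_r, box_size):
    """beta = log<r> / log L', Eq. (15)."""
    return np.log(mean_r) / np.log(box_size)

def predicted_alpha_s(alpha_x, mean_r, box_size):
    """alpha_s = alpha_x * (1 - beta), read off from Eq. (14)."""
    return alpha_x * (1.0 - beta_correction(mean_r, box_size))

# Usage: beta shrinks only logarithmically as the box size grows.
betas = [beta_correction(6.3, L) for L in (10**2, 10**4, 10**6)]
```

Because $\beta$ decays only as $1/\log L'$, the gap between $\alpha_s$ and $\alpha_x$ closes slowly with increasing box size, which is consistent with the weak length dependence described above. When $\langle r \rangle \approx 1$, $\log \langle r \rangle \approx 0$ and the correction vanishes.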
The central premise of this paper is that the information about temporal correlations, or equivalently the DFA exponent, of any time series is also embedded in its extreme events. Hence, using only the values of extreme events, we can infer the DFA exponent of the original time series. In this section, we employ the systematic relation between the DFA exponents that depend only on the extreme events, namely $\alpha_\eta$, $\alpha_r$ and $\alpha_s$, and that of the original time series, $\alpha_x$, depicted in Figs. 2-4, to infer the value of $\alpha_x$. In principle, any of the $\alpha_i$, $i = \eta, r, s$, can be used for this purpose, because $\alpha_i$ is a monotonically increasing function of $\alpha_x$ in the regime $0.5 < \alpha_x <$ [...] $\tilde{\alpha}_x$, we propose a simple regression procedure that uses all the available information, as outlined below.

Let $x(t)$, $t = 1, \ldots, T$ represent a persistent time series for which we need to estimate the DFA exponent, under the condition that the sample is noisy and it can be safely assumed that the extreme events are less affected by noise than the non-extreme events. The first step is to decide on a suitable threshold, $x_{th} = \langle x \rangle + \sigma_x$, where $\langle x \rangle$ and $\sigma_x$ are, respectively, the mean and standard deviation of the time series. Then, based on the extreme events so identified, three different time series $\eta(t)$, $r(t)$ and $s(t)$ are constructed and their respective DFA exponents $\alpha^*_\eta$, $\alpha^*_r$ and $\alpha^*_s$ are computed. For $0.5 < \alpha_x < 1$, it is clear from Figs. 2-4 that the estimated exponents have a systematic dependence on $\alpha_x$, and hence a cost function $U$ that depends on $\alpha_x$ can be defined as

$$ U(\alpha_x) = \sqrt{ (\alpha^*_\eta - \alpha_\eta)^2 + (\alpha^*_r - \alpha_r)^2 + (\alpha^*_s - \alpha_s)^2 }. \qquad (17) $$

Now, we set $dU/d\alpha_x = 0$ and obtain an estimate for the DFA exponent,

$$ \tilde{\alpha}_x = g\left( \alpha^*_\eta, \alpha^*_r, \alpha^*_s \right), \qquad (18) $$

where $g(\cdot)$ must be obtained numerically. In Table II, the results of this procedure are tabulated for the measured time series listed in Table I. It is seen that, within the limitations of finite data length, reasonable agreement is obtained between the actual DFA exponent and the estimated exponent. In principle, this procedure need not use the information of all three estimated exponents $\alpha^*_\eta$, $\alpha^*_r$ and $\alpha^*_s$. However, using all three exponents provides a tighter constraint on $\tilde{\alpha}_x$. In the anti-persistent regime, $0 < \alpha_x < 0.5$, the DFA exponents computed from the derived time series $\eta(t)$, $r(t)$ and $s(t)$ are $\alpha_\eta \approx \alpha_r \approx \alpha_s \approx 0.5$. Hence, the regression procedure would not estimate the correct value of the DFA exponent, but would still yield the qualitative information that the original time series is anti-persistent in nature.

TABLE II. A comparison between the "exact" DFA exponent $\alpha$ and the value estimated using extreme events as described in Sec. VI through Eqs. 17 and 18.

Data set             DFA exponent   Estimated exponent using extreme events
S&P 500 index        0.865          0.972
ED stock data        0.850          0.878
IBM stock data       0.884          0.799
BK stock data        0.956          0.881
Seismic data         0.784          0.675
Prague temperature   0.670          0.679
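One possible numerical realisation of Eqs. 17 and 18 is a grid search: tabulate the three calibration curves $\alpha_\eta(\alpha_x)$, $\alpha_r(\alpha_x)$ and $\alpha_s(\alpha_x)$ from synthetic data, interpolate them, and pick the $\alpha_x$ that minimises $U$. The sketch below assumes exactly that; it is not necessarily how $g(\cdot)$ was evaluated for Table II. The function name, the grid resolution, and in particular the linear calibration curves in the usage example are hypothetical placeholders and not the curves of Figs. 2-4.

```python
import numpy as np

def infer_alpha_x(alpha_obs, calib_alpha_x, calib_curves, n_grid=1001):
    """Minimise U(alpha_x) of Eq. (17) on a grid and return the minimiser.

    alpha_obs     : observed (alpha_eta*, alpha_r*, alpha_s*)
    calib_alpha_x : increasing alpha_x values at which the curves are tabulated
    calib_curves  : array of shape (3, K) with alpha_eta, alpha_r, alpha_s
                    at those alpha_x values (e.g. from synthetic data)
    """
    alpha_obs = np.asarray(alpha_obs, dtype=float)
    calib_alpha_x = np.asarray(calib_alpha_x, dtype=float)
    grid = np.linspace(calib_alpha_x.min(), calib_alpha_x.max(), n_grid)
    # Interpolate each calibration curve onto the search grid.
    curves = np.array([np.interp(grid, calib_alpha_x, c) for c in calib_curves])
    U = np.sqrt(((alpha_obs[:, None] - curves) ** 2).sum(axis=0))   # Eq. (17)
    return grid[np.argmin(U)]                                       # Eq. (18)

# Usage with placeholder linear calibration curves (illustrative values only).
ax = np.linspace(0.5, 1.0, 11)
curves = np.array([0.5 + 0.9 * (ax - 0.5),    # stand-in for alpha_eta(alpha_x)
                   0.5 + 0.8 * (ax - 0.5),    # stand-in for alpha_r(alpha_x)
                   0.5 + 1.0 * (ax - 0.5)])   # stand-in for alpha_s(alpha_x)
estimate = infer_alpha_x([0.77, 0.74, 0.80], ax, curves)
```

A grid search sidesteps the derivative condition $dU/d\alpha_x = 0$ and is robust when the calibration curves are only available as noisy tabulated points. Restricting the grid to the persistent regime reflects the fact that the three exponents carry no quantitative information about $\alpha_x$ below 0.5.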
VII. SUMMARY
Extreme events are those that deviate strongly from typical events. Generally, methods such as detrended fluctuation analysis or its variants are used to determine the correlation exponent of a long memory time series. In any typical time series, though the extreme events are fewer in number than typical events, they have a disproportionate influence on the time series. The measured time series is often contaminated by noise and missing data points. Typically, the extreme events, due to their large magnitudes, are not affected by noise as much as the non-extreme events. Hence, in this work, we exploit this property by attempting to estimate the correlation exponent of a time series from its extreme events alone. We have systematically removed non-extremes from a correlated time series and studied the effect of this removal on the correlation exponent. For this, given a time series $x(t)$, three different time series have been constructed that contain information about extreme events alone. We show that the DFA exponents of the original time series $x(t)$ and of the three derived time series (based on extreme events alone), collectively denoted by $x_{ee}(t)$, have a systematic relation. This result, shown in Figs. 2-4, has been numerically obtained for extreme events defined with $q = 1$ (see Eq. 8). However, our results indicate that qualitatively similar trends are observed for other values of $q$ as well. We use this relation to estimate the DFA exponent of the original time series, effectively using only the information about the extreme events in it. A regression-based method is put forward to estimate the DFA exponent of $x(t)$. This technique has been demonstrated on several measured time series, namely stock market, seismic and temperature data, all of which are known to be long-range correlated and also display extreme events. The exponents estimated from $x_{ee}(t)$ are in good agreement with the exponents obtained from the measured time series. The work presented here is an exploration of the relationship between a time series and its extreme events treated as a derived time series. This raises several interesting questions that require deeper investigation, such as the role played by the extreme event threshold in inferring DFA exponents, and the study of time series in which the extreme event magnitudes and the return intervals are correlated. Some of these will be explored and reported elsewhere.

FIG. 6. Estimated DFA exponent, $\alpha_x$, against the input DFA exponent $\alpha_{in}$ for the two methods. The shaded region shows the error bar. Deviation from the grey line would imply that the method does not work well. DFA-int gives reliable results in the range [...], whereas the standard DFA method is reliable only in [...].

ACKNOWLEDGMENTS
Dayal Singh would like to acknowledge the INSPIRE-SHE programme of the Department of Science & Technology, India. M. S. Santhanam acknowledges the MATRICS grant MTR/2019/001111 from SERB, Govt. of India.
Appendix A: Benchmarking DFA-int in the extended range of correlations
In this appendix, we study the scaling behaviour of correlated series in the extended range of the DFA exponent, i.e., $\alpha_x \in [-$[...] $\beta \in [-$[...] was generated from a Gaussian distribution with zero mean and unit variance using the Fourier filtering method, having DFA exponent $\alpha_x$. The input DFA exponent $\alpha_{in}$ was calculated using Eq. 4. The exponent $\alpha_x$ is estimated by two methods, the standard DFA and DFA-int (as described in Section II), for the entire range, and the trends are shown in Fig. 6. The results in this figure are averaged over 20 realizations.

As seen in Fig. 6, DFA-int gives reliable agreement with $\alpha_{in}$ in a wider range [...] $\alpha_x = 0$ for all values of $\alpha_{in} <$ [...] $\alpha_{in} \approx -1$. Both procedures underestimate the DFA exponent for $\alpha_{in} >$ [...], though DFA-int always provides a better estimate.

FIG. 7. Numerically estimated linear Pearson correlation coefficient between return intervals and extreme events, as a function of the DFA exponent of $x(t)$.
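The synthetic series used in this benchmark can be generated along the following lines. The sketch below uses plain Fourier filtering: Gaussian white noise is transformed to Fourier space, each mode is scaled by $f^{-\beta/2}$ so that the power spectrum follows $f^{-\beta}$, and the result is transformed back. The exponent $\beta$ is obtained from the desired $\alpha_{in}$ by inverting Eq. 4. This is the basic method, not the modified variant of Ref. [30], and it may differ in detail from the generator used for Fig. 6. The function name and the choice to drop the zero-frequency mode are assumptions.

```python
import numpy as np

def fourier_filtered_series(n, alpha_in, seed=None):
    """Gaussian series of length n with target DFA exponent alpha_in (sketch)."""
    rng = np.random.default_rng(seed)
    beta = 2.0 * alpha_in - 1.0               # Eq. (4) inverted: beta = 2*alpha - 1
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)    # amplitude ~ f^(-beta/2); f = 0 dropped
    x = np.fft.irfft(spectrum * scale, n=n)
    return (x - x.mean()) / x.std()           # zero mean, unit variance

# Usage: a persistent series (alpha_in = 0.9) and an anti-correlated one (alpha_in = 0.1).
x_persistent = fourier_filtered_series(2**12, 0.9, seed=1)
x_anti = fourier_filtered_series(2**12, 0.1, seed=1)
```

For the DFA-int variant described in Section II A, such a series would first be integrated once more (a cumulative sum), analysed with second-order detrending, and the resulting exponent $\alpha'$ reduced by one.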
Appendix B: Correlations between return intervals and extreme event magnitudes
Consider a long memory time series $x(t)$. The extreme events in this series are denoted by $y_i = x(t)$, $i = 1, 2, 3, \ldots$, such that $x(t) > x_{th}$, where $x_{th}$ is the threshold defined in Eq. 8. Corresponding to every $y_i$, a return interval $r_i$ can be identified as the time elapsed since the last occurrence of an extreme event. The cross-correlation between $y_i$ and $r_i$ is computed and the result is shown in Fig. 7. As seen in this figure, the return intervals and extreme events display mild anti-correlations for $0 < \alpha_x < 1$, and are nearly uncorrelated outside this regime of $\alpha_x$. Hence, to a first approximation, the extreme events and return intervals can be taken to be uncorrelated.

[1] S. Albeverio, V. Jentsch, and H. Kantz, J. Stat. Phys. , 189-190 (2006).
[2] A. S. Sharma, A. Bunde, V. P. Dimri and D. N. Baker (eds.), Extreme events and natural hazards: The complexity perspective, AGU Monographs , (2012).
[3] Halloween storms, [...] (27-10-2008).
[4] National Research Council, Severe Space Weather Events: Understanding Societal and Economic Impacts: A Workshop Report (The National Academies Press, 2008).
[5] D. Sornette, Why Stock Markets Crash: Critical Events in Complex Financial Systems (Princeton University Press, 2003).
[6] A. Bunde, J. F. Eichner, S. Havlin, and J. W. Kantelhardt, Physica A , 1 (2003); M. S. Santhanam and H. Kantz, Physica A , 713 (2005); A. Bunde, J. F. Eichner, J. W. Kantelhardt and S. Havlin, Phys. Rev. Lett. , 048701 (2005); M. I. Bogachev, J. F. Eichner and A. Bunde, Phys. Rev. Lett. , 240601 (2007).
[7] E. G. Altmann and H. Kantz, Phys. Rev. E , 056106 (2005).
[8] J. F. Eichner, J. W. Kantelhardt, A. Bunde and S. Havlin, Phys. Rev. E , 011128 (2007).
[9] F. Wang, K. Yamasaki, S. Havlin and H. E. Stanley, Phys. Rev. E , 026117 (2006).
[10] M. S. Santhanam and H. Kantz, Phys. Rev. E , 051113 (2008).
[11] A. Bunde, J. Kropp and H.-J. Schellnhuber (eds.), The Science of Disasters - Climate Disruptions, Heart Attacks and Market Crashes (Springer, Berlin, 2002).
[12] D. L. Turcotte, Fractals and Chaos in Geology and Geophysics (Cambridge University Press, 1997).
[13] Z. Chen, P. Ch. Ivanov, K. Hu and H. E. Stanley, Phys. Rev. E , 041107 (2002).
[14] M. Höll, H. Kantz and Y. Zhou, Phys. Rev. E , 042201 (2016).
[15] K. Hu, P. Ch. Ivanov, Z. Chen, P. Carpena and H. E. Stanley, Phys. Rev. E , 011114 (2001).
[16] M. Höll and H. Kantz, Eur. Phys. J. B , 327 (2015).
[17] O. Lovsletten, Phys. Rev. E , 012141 (2017).
[18] K. Ken and Y. Tsujimoto, Physica A , 807 (2016).
[19] M. Höll, K. Kiyono and H. Kantz, Phys. Rev. E , 033305 (2019).
[20] G. Sikora et al., Phys. Rev. E , 032114 (2020).
[21] Qianli D. Y. Ma, R. P. Bartsch, P. Bernaola-Galván, M. Yoneyama, and P. Ch. Ivanov, Phys. Rev. E , 031101 (2010).
[22] M. Gomez-Extremera, P. Carpena, P. Ch. Ivanov and P. A. Bernaola-Galvan, Phys. Rev. E , 042201 (2016); P. Carpena et al., Entropy , 261 (2017); C. Lerma et al., Chaos , 093906 (2017).
[23] E. Landa et al., Phys. Rev. E , 016224 (2011).
[24] M. S. Santhanam, J. N. Bandyopadhyay and D. Angom, Phys. Rev. E , 015201(R) (2006).
[25] C. K. Peng, S. Havlin, H. E. Stanley, and A. Goldberger, Chaos , 82 (1995).
[26] A. Bashan, R. Bartsch, J. W. Kantelhardt and S. Havlin, Physica A , 5080 (2008).
[27] R. B. Govindan, Phys. Rev. E , 010201(R) (2020).
[28] M. Gómez-Extremera, P. Carpena, P. Ch. Ivanov, and P. A. Bernaola-Galván, Phys. Rev. E , 042201 (2016).
[29] M. Höll, H. Kantz, Eur. Phys. J. B , 327 (2015).
[30] H. A. Makse, S. Havlin, M. Schwartz, and H. E. Stanley, Phys. Rev. E , 5445-5449 (1996).
[31] G. Rangarajan, M. Ding, Phys. Rev. E , 4991 (2000).
[32] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian and J. D. Farmer, Physica D , 77 (1992).
[33] Yahoo Finance, https://in.finance.yahoo.com , accessed on 22-07-2020.
[34] Catalog of Italian seismicity CSI 1.1 1981-2002, https://csi.rm.ingv.it/home , accessed on 06-05-2019.
[35] A. Bunde, S. Havlin, E. Koscielny-Bunde and H. Schellnhuber, Physica A , 255 (2001).
[36] R. B. Govindan, A. Bunde, S. Havlin, Physica A318