Inferring long memory using extreme events

Abstract

Many natural and physical processes display long memory and extreme events. In these systems, the measured time series is invariably contaminated by noise. As the extreme events display large deviation from the mean behaviour, the noise does not affect the extreme events as much as it affects the typical values. Since the extreme events also carry the information about correlations in the full time series, they can be used to infer the correlation properties of the latter. In this work, from a given time series, we construct three modified time series using only the extreme events. It is shown that the correlations in the original time series and in the modified time series, as measured by the exponent obtained from detrended fluctuation analysis technique, are related to each other. Hence, the correlation exponents for a long memory time series can be inferred from its extreme events alone. This approach is demonstrated for several empirical time series.


Dayal Singh and M. S. Santhanam

Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India.

(Dated: November 24, 2020)

PACS numbers: 02.50.-r, 89.75.Da, 05.4r05.-a

I. INTRODUCTION

Any event whose magnitude strays far from its typical values can be designated as an extreme event. Such extreme events have significant impact in both nature and society [1, 2]. The consequences of naturally occurring extreme events such as floods, droughts, cyclones and earthquakes are often disastrous. Extremely large solar flares such as the Halloween storms of 2003 [3] can potentially damage communication satellites, power distribution and communication networks, and might require re-routing of aircraft [4]. Due to our reliance on technology in day-to-day life, we encounter comparatively less disruptive extreme events ranging from mobile network congestion to traffic jams. In the economy, market crashes have impacted the entire international financial system [5]. Due to disproportionately large social and economic costs, it is essential to understand the properties of extreme events and their early warning signals.

It is by now well understood that, irrespective of their physical origins, extreme events display certain generic statistical properties. One such possibility arises because a large number of natural, socio-economic and technological systems display the long memory property. This implies that the autocorrelation function decays sufficiently slowly in a power-law form, $\langle x(t)\, x(t+\tau) \rangle \sim \tau^{-\gamma}$, where $0 < \gamma < 1$ [...] fall in this class [11, 12]. Primarily motivated by these examples, considerable research effort has been invested in studying the distribution of the time interval $R$ between successive occurrences of extreme events, called the recurrence time distribution $P(R)$. For a time series with auto-correlation exponent $\gamma$, the approximate form of $P(R)$ has a power-law decay for short recurrence intervals and a stretched exponential decay for long recurrence intervals [10], and the characteristic exponents in these regimes are functions of $\gamma$.
Thus, $\gamma$ carries information about the entire time series as well as about its extreme events.

Often, such characterisation through $\gamma$ becomes ambiguous when the long range correlated time series is contaminated by noise and/or missing data. For sufficiently strong noise contamination, a time series can lose its long range character and even become uncorrelated. Similarly, missing data also leads to uncorrelatedness. Generally, the extreme events strongly deviate from typical values and consequently are far less affected than the regular non-extreme events by the noise level in the measurements. Randomly missing data affects non-extreme events more than the extreme events since typically the non-extremes outnumber the extremes. All these arguments imply that it might be possible to study the statistical features of the extreme events alone and infer information about long-range correlations in a time series. Effectively, the information contained in the, possibly noisy, non-extreme values of the time series can be disregarded. Hence, the main premise of this article is to characterise a long range correlated time series by using only the extreme events. Further, apart from its application to noisy time series, in the context of the current interest in big data, this can be thought of as a method to estimate correlation exponents of very long time series using only the small fraction of the data that displays extreme events.

In this work, we use detrended fluctuation analysis (DFA) to quantify long memory, and the results are presented in terms of the DFA exponent defined in Eq. 3. This method has been extensively studied earlier, largely through simulations, and is useful even in the presence of non-stationarities [13, 14] and trends [15] in the time series. Recently, detailed theoretical studies of DFA [16–18] have shown that detrending is implicit if the fluctuation function is to be an unbiased estimator [19].
Probabilistic approaches have been adopted to obtain expected values of the fluctuation function for Gaussian processes [20]. This paper could also be thought of as a study of the response of a long memory series to a specific kind of data loss, and is relevant in the context of research interest in how DFA fluctuation functions behave under similar conditions of data loss [21]. In this work, we pick out extreme events in a time series for special treatment for their long memory properties. This has some broad parallels in earlier work in which the sign and magnitude of the fluctuations have been singled out for special treatment as far as their long range properties are concerned [22]. It must also be mentioned that the DFA technique can be applied to classical and quantum systems at criticality [23], and further that it is equivalent to performing the $\Delta_3$ statistics widely used in random matrix theory [24].

Let $x(t)$ be a long range correlated time series and $Q$ be the threshold which defines extreme events. Then, extreme events are those for which $x(t) > Q$. A schematic of designating events as extreme events is shown in Fig. 1, in which the solid horizontal line defines the threshold for extreme events. This figure also shows recurrence intervals, the time intervals between two successive extreme events. Suppose we consider a Gaussian distributed time series $x(t)$ with long memory exponent $\gamma = 0.2$; then, upon adding white noise of sufficient strength, the series tends to become uncorrelated. As we show in this work, by isolating only the extreme events from $x(t)$, we can still infer the long memory exponent of $x(t)$.

The rest of the article is structured as follows. In Section II, the various methods and measured data used in this work are reviewed. In particular, we review detrended fluctuation analysis and the Fourier filtering method used to generate synthetic long memory data. In Sections III to V, we introduce three different methods of applying extreme events to infer the long memory exponent, and a regression based method is used to estimate the exponent in Section VI. Finally, conclusions are presented in Section VII.

II. METHODS AND DATA

A. Detrended Fluctuation Analysis

FIG. 1. A schematic time series $x(t)$. For a threshold of $Q = 2$, indicated by a horizontal line, three extreme events can be identified. Two return intervals, $r_1$ and $r_2$, are shown.

Detrended Fluctuation Analysis is a widely employed technique to quantify correlations in non-stationary time series data [25, 26]. Several variants of this technique have also been studied in the literature; see Ref. [27] for one such variant based on orthogonal polynomials. We briefly review the basic technique here; the details are available in Ref. [25]. Let $x(t')$, $t' = 1, 2, 3, \ldots, N$ denote a time series of length $N$ with mean $\mu$ and variance $\sigma^2$. The integrated version of the time series is given by

$$ y(t) = \sum_{t'=0}^{t} \left( x(t') - \mu \right). \tag{1} $$

Now, $y(t)$ is partitioned into boxes of size $n$, where typically $n \le N/4$. Within each box, a polynomial $y_l(t)$ of order $l$ is fitted to $y(t)$. In practice, order 1 is usually used. The time series is locally detrended by subtracting $y_l(t)$ from the integrated time series. The root-mean-square fluctuation as a function of the box size $n$ is given by

$$ F(n) = \sqrt{\frac{1}{N} \sum_{t=0}^{n} \left( y(t) - y_l(t) \right)^2}. \tag{2} $$

This process is repeated by varying the box size $n$. For correlated time series, the fluctuation function $F(n)$ generally scales as

$$ F(n) \sim n^{\alpha}, \tag{3} $$

where $\alpha$ is the DFA exponent and indicates the degree of correlation. If $\alpha = 0.5$, then the series is uncorrelated. If $\alpha > 0.5$ [...] $\alpha < 0.5$ [...] $\alpha \in [0, 1]$ [11, 12], though it is also known to work for $\alpha > 1$ [...] $n$ [15]. In this case, $\alpha$ can be reliably estimated by first integrating the anti-correlated series and then applying DFA to it. The local trend is then removed by fitting a second-order polynomial, as the series is integrated twice in this process. The true exponent can be calculated from the estimated exponent $\alpha'$ using the relation $\alpha' = \alpha + 1$ [15]. In Appendix A, it is shown that this method (we call it DFA-int) can estimate the exponent when $\alpha < 0$. For stationary time series, $\alpha$ is related to the auto-correlation exponent $\gamma$ and the power spectral exponent $\beta$ through the relations

$$ \alpha = \frac{1 + \beta}{2}, \tag{4} $$

$$ \alpha = 1 - \frac{\gamma}{2}. \tag{5} $$

These relations [29] are valid for $0 < \alpha <$ [...] $x(t)$ with desired DFA exponent $\alpha_{in}$. Time series of length 2^[...] were generated and all the estimates for the various exponents are averaged over 40 realizations. Using $x(t)$, the estimated DFA exponent is $\alpha_x$. In practice, $\alpha_{in} \approx \alpha_x$, and hence we use the latter in the rest of this paper. Further, we employ surrogate data analysis [32] by randomly shuffling $x(t)$. If the shuffled series becomes uncorrelated, then it implies that the correlations in the data are non-trivial and not a chance occurrence.

TABLE I. Description of long-range correlated, empirical data analyzed in this work.

| Data set | Years | Length of data | DFA exponent |
|---|---|---|---|
| S&P 500 index | 1927-2020 | 23248 | 0.865 |
| ED stock data | 1962-2020 | 14739 | 0.850 |
| IBM stock data | 1962-2020 | 14739 | 0.883 |
| BK stock data | 1973-2020 | 11910 | 0.956 |
| Seismic data | 1981-2002 | 91797 | 0.784 |
| Prague temperature | 1775-2019 | 89484 | 0.670 |

B. Applications to observed data

The results presented in this paper are tested using observed data sets. For this purpose, time series from three different systems are considered for extreme events based analysis. They are (a) absolute log-returns of daily closing index and stock data, namely, S&P index data and equity data of ED, IBM and BK stocks [33], (b) seismic records from the Italian Seismicity Catalog CSI 1.1 [34], and (c) observed daily mean temperatures from the Prague observatory [35, 36]. More details about the data are given in Table I.

For the stock market data, if $x_k$ represents the value of the stock/index at time $k$, then the absolute log-return is defined as

$$ \rho_k = \left| \log \left( \frac{x_{k+1}}{x_k} \right) \right|. \tag{6} $$

The extreme events analysis was performed on $\rho_k$. The Italian seismicity catalog, CSI 1.1, contains magnitudes of earthquakes in the Italian territory during 1981-2002. The data has 91797 earthquake records, of which $N = 39665$ have a magnitude evaluation. The observed daily temperature records from the Prague observatory are also analyzed. Let $T_i$ represent the measured temperature on the $i$-th day in any year. The seasonal trends were removed and we analysed the temperature anomaly $\Delta T$ defined by

$$ \Delta T_i = T_i - \bar{T}_i, \tag{7} $$

where $\bar{T}_i$ is the average temperature for the $i$-th day taken over all years. The correlation properties of this series have been studied earlier [36].

As seen from the last column of Table I, the DFA exponent values indicate that all the time series are long range correlated. In all the cases, the time series were standardised to zero mean and unit variance. To define extreme events in these measured data sets, a cut-off defined by $q = 1$ was chosen (see Eq. 8 below for further explanation).

FIG. 2. (a) Estimated DFA exponent $\alpha_\eta$ of the extreme event series $\eta(t)$ as a function of $\alpha_x$ for synthetic data. The shaded region shows the error bar around the mean trend. The two dashed lines are obtained by regression for the anti-correlated and the long-range correlated ($0.5 < \alpha_x < 1$) regimes and have slopes $m = -0.009$ and $m = 0.907$ [...] $\eta(t)$.

III. EXTREME EVENTS: MODIFIED TIME SERIES

In this section, we analyse a modified time series in which only the extreme events are retained and all others are set to zero. A correlated time series $x(t)$ with DFA exponent $\alpha_x$ is considered. It is assumed that $x(t)$ has a well defined mean $\mu$ and standard deviation $\sigma$. An event at any instant of time $t$ will be designated as an extreme event if $x(t) > x_{th}$, where the threshold is

$$ x_{th} = \mu + q\sigma \tag{8} $$

and $q \ge 0$. From the given original time series $x(t)$, a modified time series $\eta(t)$ is constructed as follows:

$$ \eta(t) = \begin{cases} x(t), & \text{if } x(t) > x_{th}, \\ 0, & \text{otherwise.} \end{cases} \tag{9} $$

Throughout this paper, we choose $q = 1$ as the threshold for extreme events. Further, we address the following question: given a long range correlated time series $x(t)$ with DFA exponent $\alpha_x$, what is the DFA exponent $\alpha_\eta$ of the modified time series $\eta(t)$? Thus, we explore the self-affinity of $\eta(t)$ using the DFA technique for $\alpha_x \in [-$[...] $\alpha_x$ and $\alpha_\eta$, for which evidence will be presented below, can be stated as follows:

$$ \alpha_\eta \begin{cases} \approx 0.5, & -\text{[...]} \le \alpha_x \le 0.5, \\ < \alpha_x, & 0.5 < \alpha_x \le \text{[...]}. \end{cases} \tag{10} $$

This result is presented in Fig. 2(a). In this figure, the original long range correlated time series $x(t)$ of length 2^[...] is synthetically generated using the Fourier filtering technique. By putting $q = 1$, extreme events are identified and a modified time series $\eta(t)$ is generated. The DFA technique is applied to $\eta(t)$. Figure 2(a) shows $\alpha_\eta$ plotted as a function of $\alpha_x$. As can be inferred from this figure, the modified series $\eta(t)$ is found to be uncorrelated in the anti-correlated regime of $x(t)$, i.e., $\alpha_x < 0.5$, whereas the correlations increase monotonically in the positively correlated regime, $\alpha_x > 0.5$, with a slope of about 0.907. These systematics imply that $\alpha_\eta$ can be used to estimate the value of $\alpha_x$ in the persistence regime with $\alpha_x \in [0.5,$ [...] $\alpha_x >$ [...] $\alpha_\eta$ is not reliable due to large variance, primarily due to the finite size of the time series. Figure 2(a) also displays a similar analysis performed on the observed data sets listed in Table I. It shows that $\alpha_\eta$, for observed data sets, closely follows the trend obtained from synthetic long range correlated data. This further confirms the systematic relation between $\alpha_\eta$ and $\alpha_x$.

To physically understand these results, we realise that Eq. 9 is essentially a process of data loss. For an anti-correlated series, even moderate data loss of about $\sim$ [...] $\sim 84\%$ (for $q = 1$) of the series is removed and hence we expect the short range correlations in the rest of the series to be destroyed. This results in an uncorrelated time series. However, remarkably, the long-range correlated series is relatively more robust against data loss of non-extreme events. This is because persistence ensures that extreme events are bunched together and the construction in Eq. 9 does not destroy these extreme event values; as a result, their correlations are preserved with slight modifications. The effect of noisy data is also shown in Fig. 2. The data $\eta(t)$ is deliberately contaminated with uniformly distributed noise $\xi \in [-$[...] $\eta(t)$. As this reveals, the additive noise in the data does not significantly affect the value of $\alpha_\eta$, since extreme events are less affected by noise than normal events. Finally, Figure 2(b) shows the DFA exponent obtained after shuffling the time series. As expected, it shows that $\alpha_\eta \approx 0.5$.

IV. CORRELATIONS IN RETURN INTERVALS SERIES

The time ordered extreme events, exploited in Section III, are one useful piece of information about the time series. Another useful component of the time series is the return intervals between successive occurrences of extreme events. These have been extensively studied as recurrence statistics, and it is fairly well understood that the return interval distribution is parameterised by the auto-correlation exponent $\gamma$ of a long range correlated time series [10]. However, as shown in Appendix B, the return intervals are not correlated with the extreme event values, as measured by the linear Pearson correlation. These two components of a long range correlated time series, the extreme event values and the recurrence statistics, are independent of one another.

For any $x(t)$, $t = 1, 2, 3, \ldots, N$, with respect to the extreme event threshold $x_{th}(q)$, we consider two successive occurrences of extreme events at times $t = t_m$ and $t_{m+1}$. Then, the return interval series is defined as

$$ r(m) = t_{m+1} - t_m, \quad m = 1, 2, 3, \ldots, \tag{11} $$

and its DFA exponent is denoted by $\alpha_r$. Figure 3 displays simulation results for how $\alpha_r$ varies as a function of $\alpha_x$. It reveals that the return intervals are uncorrelated in the anti-correlated regime of $\alpha_x$. This is to be expected, since the extreme events themselves are uncorrelated in this regime. In contrast, in the regime $0.5 < \alpha_x < 1$, $\alpha_r$ increases approximately linearly (till about $\alpha_x = 0.9$) at a rate slower than for $\alpha_\eta$. In general, we infer that

$$ \alpha_r < \alpha_\eta < \alpha_x. \tag{12} $$

Upon further increasing beyond $\alpha_x > 1$, the correlations begin to decrease monotonically, and the return intervals become almost uncorrelated. This happens because, as $\alpha_x$ increases, correlations get stronger and more and more extreme events are consecutive in time with return interval $r = 1$. Then, due to consecutive extreme events, $r(m) = 1$ for long times, and this suppresses fluctuations, leading to a decrease in $\alpha_r$. In fact, this effect appears to take over even as $\alpha_x \to 0.9$. For high values of $\alpha_x >$ [...] $\alpha_r = 0.$[...] $m \to \infty$, the return interval series will be slightly correlated in the regime $\alpha_x >$ [...]5. Figure 3(a) also displays the values of $\alpha_r$ computed for the observed data sets listed in Table I. A good agreement is observed with the trend shown by the synthetic data (blue symbols). The systematic relationship between $\alpha_r$ and $\alpha_x$ is also exhibited by the observed time series. We might also point out that the additive noise (uniform noise added to $r(t)$) in the data does not significantly affect the value of $\alpha_r$, as seen by the black symbols in Fig. 3(a). If the return interval series $r(t)$ is randomly shuffled, the DFA exponent is approximately 0.5 (Fig. 3(b)), pointing to the existence of non-trivial correlations in $r(t)$.

FIG. 3. (a) Estimated DFA exponent of the return interval series $r(t)$ plotted against $\alpha_x$. The shaded region shows the error bar around the mean trend. The two dashed lines obtained by regression for the anti-correlated and long-range correlated regimes have slopes $m = 0.013$ and $m = 0.811$ respectively. The orange symbols are from the observed data sets in Table I. The black symbols are the DFA exponents for synthetic data with uniform noise added. (b) DFA exponent of randomly shuffled $r(t)$.

V. COMPRESSED EXTREME EVENTS

Starting from $x(t)$, we construct a time series $s(m)$ in which only the extreme events are retained by removing all instances of non-extreme events. We call this the compressed extreme event series; it is defined as

$$ s(m) = \{ x(t_m) \mid x(t = t_m) > x_{th} \}, \quad m = 1, 2, \ldots, N_e, \tag{13} $$

where $N_e$ is the number of extreme events in the time series $x(t)$ of length $N$. In most cases, $N \ge N_e$. Note that, in contrast to Eq. 9, the non-extremes in Eq. 13 are entirely removed instead of being set to zero. The DFA exponent associated with the time series $s(m)$ is denoted by $\alpha_s$. The variation of $\alpha_s$ as a function of $\alpha_x$ is shown in Fig. 4. Evidently, the compressed extreme event series $s(m)$ is uncorrelated in the anti-correlated regime, $\alpha_x < 0.5$. This is the result of extreme event series $x(t)$ being nearly uncorrelated.

FIG. 4. (a) Estimated DFA exponent of the compressed extreme event series $s(t)$ plotted against $\alpha_x$. The shaded region represents the error bar. The two dashed lines obtained by regression for the regimes $\alpha_x < 0.5$ and $0.5 < \alpha_x < 1$ have slopes $m = 0.$[...] and $m = 0.983$ respectively. The orange symbols represent $\alpha_s$ for the observed data listed in Table I. The black symbols are the DFA exponents for synthetic data with uniform noise added. (b) DFA exponent of the randomly shuffled time series $s(t)$.

In the correlated regime, $\alpha_x > 0.5$, the observations shown in Fig. 4 reveal that $\alpha_s < \alpha_x$. This can be understood as follows. For the series $x(t)$, the fluctuation function is denoted as $F(L) \sim L^{\alpha_x}$. For a time series $x(t)$ of length $L$, the corresponding length of the compressed time series is $L' = L / \langle r \rangle$, where $\langle r \rangle$ represents the mean return interval for extreme events, which depends on the $x_{th}$ used to identify extreme events. Then, using the equation for $F(L)$, the fluctuation function for compressed extreme events can be written as

$$ F(L') \sim \left( \frac{L'}{\langle r \rangle} \right)^{\alpha_x} \sim L'^{\,\alpha_x (1 - \beta)}, \tag{14} $$

where the DFA exponent of the compressed series $s(m)$ can be identified as $\alpha_s = \alpha_x (1 - \beta)$, and the box-size dependent $\beta$ is

$$ \beta = \log_{L'} \langle r \rangle = \frac{\log \langle r \rangle}{\log L'}. \tag{15} $$

In the limit of large box size $L'$, i.e., $L' \gg 1$, we have $\beta \ll 1$. Hence, we can infer that $\alpha_x > \alpha_s$, as observed in the simulation results displayed in Fig. 4. The dependence on the box size $L'$ is logarithmic and hence sufficiently weak that, for finite sample sizes, the DFA exponent of the compressed extreme event series does not change appreciably with the length of the time series. As predicted by Eq. 15, simulation results (not shown here) verify that $\alpha_s \approx \alpha_x$ as $L' \to \infty$.

Based on Eqs. 14-15 and Fig. 4, the relation between $\alpha_x$ and $\alpha_s$ can be surmised as

$$ \alpha_s \begin{cases} \approx 0.5, & -\text{[...]} \le \alpha_x \le 0.5, \\ < \alpha_x, & 0.5 < \alpha_x \le \text{[...]}. \end{cases} \tag{16} $$

For $0.5 < \alpha_x < 1$, $\alpha_x$ and $\alpha_s$ bear a linear relation between them. However, for $\alpha_x > 1.5$, correlations are far stronger and most extreme events tend to be consecutive with $\langle r \rangle \approx 1$. Hence, $\beta \approx 0$ and $\alpha_x \approx \alpha_s$. Thus, for highly correlated time series, $\alpha_s$ provides a reasonable estimate of $\alpha_x$. Fig. 4(a) also shows $\alpha_s$ computed for the observed data sets in Table I. A reasonably good agreement is seen between the trends of synthetic data (blue symbols) and the observed data (orange symbols). Even if additive noise (uniform noise added to $s(t)$) is present in the data, it does not significantly affect the value of $\alpha_s$, as seen by the black symbols in Fig. 4(a). This is due to extreme events being less affected by noise than the non-extreme events. Finally, as shown in Fig. 4(b), if the compressed extreme event series $s(t)$ is randomly shuffled, its DFA exponent is nearly 0.5, showing that the compressed extreme events carry non-trivial correlations inherited from the original time series $x(t)$.

Finally, it must be pointed out that the return interval series $r(t)$ and the compressed extreme event series $s(t)$ are, for the most part, uncorrelated processes. The linear correlation between them (not shown here) reveals that they are nearly uncorrelated. From any time series $x(t)$, the derived series $r(t)$ and $s(t)$ are created by considering extreme events defined with, say, $q = 1$. This approach also provides a way of performing the reverse process: assembling a time series with specified return interval correlations and specified extreme event correlations whose DFA exponents are, respectively, $\alpha_r^{in}$ and $\alpha_s^{in}$. If we assemble such a time series, then the DFA exponent obtained is displayed in Fig. 5. This figure reveals that, for a time series $x(t)$ with DFA exponent $\alpha_x$, the correlations are largely contributed by the return intervals (with exponent $\alpha_r^{in}$) and only weakly by the correlations in the extreme event values (with exponent $\alpha_s^{in}$).

FIG. 5. The heat map shows the DFA exponent of the time series obtained by assembling two components together: the return intervals with DFA exponent $\alpha_r^{in}$, and the extreme events series with DFA exponent $\alpha_s^{in}$. Notice that the composite time series has a DFA exponent close to that of its return intervals. The numbers in the grid give the actual DFA exponent, and the colormap encodes the same information.

VI. INFERRING DFA EXPONENT USING EXTREME EVENTS

The central premise of this paper is that the information about temporal correlations, or equivalently the DFA exponent, of any time series is also embedded in its extreme events. Hence, using only the values of extreme events, we can infer the DFA exponent of the original time series. In this section, we employ the systematic relation between the DFA exponents that depend only on the extreme events, namely, $\alpha_\eta$, $\alpha_r$ and $\alpha_s$, and that for the original time series, $\alpha_x$, depicted in Figures 2-4, to infer the value of $\alpha_x$. In principle, any of $\alpha_i$, $i = \eta, r, s$ can be used for this purpose because $\alpha_i$ is a monotonically increasing function of $\alpha_x$ in the regime $0.5 < \alpha_x < 1$. [...] $\tilde{\alpha}_x$, we propose a simple regression procedure that uses all the available information, as outlined below.

Let $x(t)$, $t = 1, \ldots, T$ represent a persistent time series for which we need to estimate the DFA exponent under the condition that the sample is noisy and it can be safely assumed that the extreme events are less affected by noise than non-extreme events. The first step is to decide a suitable threshold, $x_{th} = \langle x \rangle + \sigma_x$, where $\langle x \rangle$ and $\sigma_x$ are, respectively, the mean and standard deviation of the time series. Then, based on the extreme events so identified, three different time series $\eta(t)$, $r(t)$ and $s(t)$ are constructed and their DFA exponents $\alpha^*_\eta$, $\alpha^*_r$ and $\alpha^*_s$, respectively, are computed. For $0.5 < \alpha_x < 1$, it is clear from Figs. 2-4 that the estimated exponents have a systematic dependence on $\alpha_x$, and hence a cost function $U$ that depends on $\alpha_x$ can be defined as

$$ U(\alpha_x) = \sqrt{ (\alpha^*_\eta - \alpha_\eta)^2 + (\alpha^*_r - \alpha_r)^2 + (\alpha^*_s - \alpha_s)^2 }. \tag{17} $$

Now, we set $dU/d\alpha_x = 0$ and obtain an estimate for the DFA exponent,

$$ \tilde{\alpha}_x = g\left( \alpha^*_\eta, \alpha^*_r, \alpha^*_s \right), \tag{18} $$

where $g(\cdot)$ must be obtained numerically. In Table II, the results of this procedure are tabulated for the measured time series listed in Table I. It is seen that, within the limitations of finite data length, a reasonable agreement is obtained between the actual DFA exponent and the estimated exponent. In principle, this procedure need not necessarily use the information of all three estimated exponents $\alpha^*_\eta$, $\alpha^*_r$ and $\alpha^*_s$. However, using all three exponents provides a tighter constraint for $\tilde{\alpha}_x$. In the anti-persistent regime, $0 < \alpha_x < 0.5$, the DFA exponents computed from the derived time series $\eta(t)$, $r(t)$ and $s(t)$ are $\alpha_\eta \approx \alpha_r \approx \alpha_s \approx 0.5$. Hence, the regression procedure would not estimate the correct value of the DFA exponent, but would still yield the qualitative information that the original time series is anti-persistent in nature.

TABLE II. A comparison between the "exact" DFA exponent $\alpha$ and the value estimated using extreme events as described in Sec. VI through Eqs. 17 and 18.

| Data set | DFA exponent | Estimated exponent using extreme events |
|---|---|---|
| S&P 500 index | 0.865 | 0.972 |
| ED stock data | 0.850 | 0.878 |
| IBM stock data | 0.884 | 0.799 |
| BK stock data | 0.956 | 0.881 |
| Seismic data | 0.784 | 0.675 |
| Prague temperature | 0.670 | 0.679 |
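The construction of the three derived series (Eqs. 9, 11 and 13) and the minimisation of the cost function in Eq. 17 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' code. The function names `extreme_event_series` and `infer_alpha_x` are hypothetical, the minimisation is done by a simple grid search instead of solving $dU/d\alpha_x = 0$, and the calibration curves $\alpha_\eta(\alpha_x)$, $\alpha_r(\alpha_x)$ and $\alpha_s(\alpha_x)$ are assumed to be supplied from simulations such as those in Figs. 2-4. In the usage lines, straight lines through (0.5, 0.5) with the slopes quoted in the figure captions serve purely as placeholder calibration curves; the intercepts are an assumption.

```python
import numpy as np

def extreme_event_series(x, q=1.0):
    """Return eta(t), r(m), s(m) for the threshold x_th = mean + q * std."""
    x = np.asarray(x, dtype=float)
    x_th = x.mean() + q * x.std()
    is_extreme = x > x_th
    eta = np.where(is_extreme, x, 0.0)     # Eq. (9): non-extremes set to zero
    times = np.flatnonzero(is_extreme)     # occurrence times t_m
    r = np.diff(times)                     # Eq. (11): t_{m+1} - t_m
    s = x[is_extreme]                      # Eq. (13): non-extremes removed
    return eta, r, s

def infer_alpha_x(alpha_star, grid, curves):
    """Grid-search minimiser of U(alpha_x) in Eq. (17).

    alpha_star : measured (alpha_eta*, alpha_r*, alpha_s*)
    grid       : candidate values of alpha_x
    curves     : array of shape (3, len(grid)) holding the calibrated
                 alpha_eta, alpha_r, alpha_s at each candidate alpha_x
    """
    alpha_star = np.asarray(alpha_star, dtype=float)[:, None]
    cost = np.sqrt(((alpha_star - np.asarray(curves)) ** 2).sum(axis=0))
    return grid[np.argmin(cost)]

# Usage with placeholder (assumed) linear calibration curves.
grid = np.linspace(0.5, 1.0, 501)
slopes = np.array([0.907, 0.811, 0.983])          # slopes quoted for eta, r, s
curves = 0.5 + slopes[:, None] * (grid - 0.5)     # intercepts are an assumption
eta, r, s = extreme_event_series([0, 0, 5, 0, 0, 0, 6, 7, 0, 0])
alpha_tilde = infer_alpha_x(curves[:, 300], grid, curves)
```

In an actual application, the three measured exponents passed as `alpha_star` would come from applying a DFA routine to `eta`, `r` and `s`, and the calibration curves would be rebuilt for the chosen threshold and series length.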

VII. SUMMARY

Extreme events are the ones that deviate strongly from typical events. Generally, methods such as detrended fluctuation analysis or its variants are used to determine the correlation exponent of a long memory time series. In any typical time series, though the extreme events are fewer in number compared to typical events, the former have a disproportionate influence on the time series. The measured time series could often be contaminated by noise and missing data points. Typically, the extreme events, due to their large magnitudes, are not as much affected by noise as the non-extreme events. Hence, in this work, we exploit this property by attempting to estimate the correlation exponent of a time series from its extreme events alone. We have systematically removed non-extremes in a correlated time series and studied the effect of this removal on the correlation exponent. For this, given a time series $x(t)$, three different time series have been constructed that contain information about extreme events alone. We show that the DFA exponents of the original time series $x(t)$ and of the three derived time series (based on extreme events alone), collectively denoted by $x_{ee}(t)$, have a systematic relation. This result, in Figs. 2-4, has been numerically obtained for extreme events defined with $q = 1$ (see Eq. 8). However, our results indicate that qualitatively similar trends are observed for other values of $q$ as well. We use this relation to estimate the DFA exponent of the original time series, effectively using only the information about the extreme events in it. A regression based method is put forward to estimate the DFA exponent of $x(t)$. This technique has been demonstrated on several measured time series from real systems, namely, stock markets, seismic and temperature data, all of which are known to be long range correlated and also display extreme events. The estimated exponents based on $x_{ee}(t)$ are in good agreement with the exponents obtained from the measured time series. The work presented is an exploration of the relationship between a time series and its extreme events treated as a derived time series. This raises several interesting questions that require deeper investigation, such as the role played by the extreme event thresholds in inferring DFA exponents, and the study of time series in which the extreme event magnitudes and the return intervals are correlated. Some of these will be explored and reported elsewhere.

FIG. 6. Estimated DFA exponent, $\alpha_x$, against the input DFA exponent $\alpha_{in}$ for the two methods. The shaded region shows the error bar. Deviation from the grey line would imply that the method does not work well. DFA-int gives reliable results in the range [−[...], [...]5] whereas the standard DFA method is reliable only in [0.[...], [...]

ACKNOWLEDGMENTS

Dayal Singh would like to acknowledge the INSPIRE-SHE programme of Department of Science & Technology,India. M. S. Santhanam acknowledges the MATRICSgrant MTR/2019/001111 from SERB, Govt. of India.

Appendix A: Benchmarking DFA-int in the extended range of correlations

Here, we study the scaling behaviour of correlated series in the extended range of the DFA exponent, i.e., $\alpha_x$ ∈ [− , […] $\beta$ ∈ [− , […] was generated from a Gaussian distribution with zero mean and unit variance using the Fourier filtering method having DFA exponent $\alpha_x$. The input DFA exponent $\alpha_{\rm in}$ was calculated using Eq. 4. The exponent $\alpha_x$ is estimated by two methods, the standard DFA and DFA-int (as described in Section II), for the entire range, and the trends are shown in Fig. 6. The results in this figure are averaged over 20 realizations.

FIG. 7. Numerically estimated linear Pearson correlation coefficient between return intervals and extreme events, as a function of the DFA exponent of $x(t)$.

As seen in Fig. 6, DFA-int gives reliable agreement with $\alpha_{\rm in}$ in a wider range [− . , . […] , $\alpha_x = 0$ for all values of $\alpha_{\rm in}$ < […] $\alpha_{\rm in}$ ≈ −1. Both procedures underestimate the DFA exponent for $\alpha_{\rm in}$ > […].5, though DFA-int always provides a better estimate.
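The benchmarking loop described in this appendix can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes the commonly quoted relation $\alpha = (1+\beta)/2$ between the power-spectrum exponent and the DFA exponent (which may differ in form from Eq. 4), it implements only the standard first-order DFA and not DFA-int, and the function names, series length and scale range are hypothetical choices.

```python
import numpy as np


def fourier_filtered_noise(n, beta, rng):
    """Gaussian white noise reshaped so its power spectrum scales as f**(-beta)."""
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)          # zero-frequency (mean) component is dropped
    scale[1:] = freqs[1:] ** (-beta / 2.0)
    series = np.fft.irfft(spectrum * scale, n=n)
    # Rescale to zero mean and unit variance.
    return (series - series.mean()) / series.std()


def dfa_exponent(x, scales, order=1):
    """Standard DFA: slope of log F(s) versus log s with polynomial detrending."""
    profile = np.cumsum(x - np.mean(x))
    fluctuations = []
    for s in scales:
        n_seg = len(profile) // s
        # One column per non-overlapping window of length s.
        segments = profile[: n_seg * s].reshape(n_seg, s).T
        t = np.arange(s, dtype=float)
        coeffs = np.polyfit(t, segments, order)      # shape (order + 1, n_seg)
        trend = np.vander(t, order + 1) @ coeffs     # shape (s, n_seg)
        fluctuations.append(np.sqrt(np.mean((segments - trend) ** 2)))
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


def benchmark(alpha_in, n=2**14, realizations=5, seed=0):
    """Average estimated DFA exponent for a prescribed input exponent."""
    rng = np.random.default_rng(seed)
    beta = 2.0 * alpha_in - 1.0           # assumed relation alpha = (1 + beta) / 2
    scales = np.unique(
        np.logspace(np.log10(16), np.log10(n // 8), 12).astype(int)
    )
    estimates = [
        dfa_exponent(fourier_filtered_noise(n, beta, rng), scales)
        for _ in range(realizations)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    for alpha_in in (0.5, 0.8, 1.0):
        print(alpha_in, benchmark(alpha_in))
```

Sweeping `alpha_in` over a wider range and plotting the estimate against the input would give a curve of the kind shown in Fig. 6 for the standard DFA. The text uses 20 realizations; the smaller default here only keeps the sketch quick to run.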

Appendix B: Correlations between return intervals and extreme event magnitudes

Consider a long memory time series $x(t)$. The extreme events in this series are denoted by $y_i = x(t)$, $i = 1, 2, 3, \ldots$, such that $x(t) > x_{th}$, where $x_{th}$ is the threshold identified in Eq. 8. Corresponding to every $y_i$, a return interval $r_i$ can be identified as the time elapsed since the last occurrence of an extreme event. The cross-correlation between $y_i$ and $r_i$ is computed and the result is shown in Fig. 7. As seen in this figure, the return intervals and extreme events display mild anti-correlations for $0 < \alpha_x < 1$, and are nearly uncorrelated outside this regime of $\alpha_x$. Hence, to a first approximation, the extreme events and return intervals can be taken to be uncorrelated.
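The computation described in this appendix, pairing each extreme event with the time elapsed since the previous one and taking the Pearson correlation, might look like the following minimal sketch. The function name is hypothetical, and the threshold used in the usage example (mean plus two standard deviations) is an assumption for illustration only, since the threshold of Eq. 8 is not reproduced here.

```python
import numpy as np


def return_interval_correlation(x, threshold):
    """Pearson correlation between extreme event magnitudes and return intervals.

    Events are samples with x(t) > threshold. The return interval of an event
    is the time elapsed since the previous event, so the first event (which
    has no predecessor) is excluded from the pairing.
    """
    x = np.asarray(x, dtype=float)
    event_times = np.flatnonzero(x > threshold)
    intervals = np.diff(event_times)      # r_i for every event after the first
    magnitudes = x[event_times][1:]       # y_i aligned with r_i
    r = np.corrcoef(intervals, magnitudes)[0, 1]
    return r, magnitudes, intervals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.standard_normal(100_000)             # uncorrelated test series
    x_th = series.mean() + 2.0 * series.std()         # assumed threshold, not Eq. 8
    r, y, intervals = return_interval_correlation(series, x_th)
    print(len(y), r)
```

Applying the same function to long-memory series generated with different values of $\alpha_x$ and averaging over realizations would give a curve comparable in spirit to Fig. 7.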