[PDF] Border effect corrections for diagonal line based recurrence quantification analysis measures

Abstract

Recurrence Quantification Analysis (RQA) defines a number of quantifiers, which base upon diagonal line structures in the recurrence plot (RP). Due to the finite size of an RP, these lines can be cut by the borders of the RP and, thus, bias the length distribution of diagonal lines and, consequently, the line based RQA measures. In this letter we investigate the impact of the mentioned border effects and of the thickening of diagonal lines in an RP (caused by tangential motion) on the estimation of the diagonal line length distribution, quantified by its entropy. Although a relation to the Lyapunov spectrum is theoretically expected, the mentioned entropy yields contradictory results in many studies. Here we summarize correction schemes for both, the border effects and the tangential motion and systematically compare them to methods from the literature. We show that these corrections lead to the expected behavior of the diagonal line length entropy, in particular meaning zero values in case of a regular motion and positive values for chaotic motion. Moreover, we test these methods under noisy conditions, in order to supply practical tools for applied statistical research.

Full PDF

HHighlights

Border eﬀect corrections for diagonal line based recurrence quantiﬁcation analysis measures

K.Hauke Kraemer,Norbert Marwan• In recurrence quantiﬁcation analysis, border eﬀects and tangential motion can heavily bias diagonal line basedcharacteristics.• Border eﬀect bias can be minimized by a simple alteration of the diagonal line length histogram.• A parameter free, skeletonization method reduces artifacts due to tangential motion and leaves a RP of thindiagonal lines.• The proposed correction schemes lead to diagonal line entropy values that fall within analytically derived ex-pectation ranges, even in the presence of noise. a r X i v : . [ phy s i c s . d a t a - a n ] S e p order eﬀect corrections for diagonal line based recurrencequantiﬁcation analysis measures K.Hauke Kraemer a,b , ∗ ,1 , Norbert Marwan a a Potsdam Institute for Climate Impact Research, Member of the Leibniz Association, Telegrafenberg A31, 14473 Potsdam, Germany, EU b Institute of Geosciences, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany, EU

A R T I C L E I N F O

Keywords :Recurrence PlotsRecurrence Quantiﬁcation AnalysisShannon EntropyDynamical invariants

A B S T R A C T

Recurrence Quantiﬁcation Analysis (RQA) deﬁnes a number of quantiﬁers, which base upondiagonal line structures in the recurrence plot (RP). Due to the ﬁnite size of an RP, these linescan be cut by the borders of the RP and, thus, bias the length distribution of diagonal linesand, consequently, the line based RQA measures. In this letter we investigate the impact of thementioned border eﬀects and of the thickening of diagonal lines in an RP (caused by tangen-tial motion) on the estimation of the diagonal line length distribution, quantiﬁed by its entropy.Although a relation to the Lyapunov spectrum is theoretically expected, the mentioned entropyyields contradictory results in many studies. Here we summarize correction schemes for both,the border eﬀects and the tangential motion and systematically compare them to methods fromthe literature. We show that these corrections lead to the expected behavior of the diagonal linelength entropy, in particular meaning zero values in case of a regular motion and positive valuesfor chaotic motion. Moreover, we test these methods under noisy conditions, in order to supplypractical tools for applied statistical research.

1. Introduction

Recurrence quantiﬁcation analysis (RQA) is a powerful tool for the identiﬁcation of characteristic dynamics andregime changes [1, 2]. This approach is successfully applied in many scientiﬁc disciplines [3, 4, 5, 6, 7, 8, 9, 10, 11,12, 13]. Several measures of complexity are deﬁned on geometric features (such as diagonal and vertical lines) in therecurrence plot (RP), which represents time points 𝑗 when a state ⃗𝑥 𝑖 at time 𝑖 recurs [14, 3, 1, 2]. These line structuresrepresent typical dynamical behavior and are related to certain properties of the dynamical system, e.g., chaotic orperiodic dynamics. Therefore, their quantitative study by the RQA measures within sliding windows is a frequentlyused task for the detection of regime changes [15, 16, 17, 2]. However, as some RQA measures rely on the probabilitydistribution of the lengths of the diagonal lines in an RP, the artiﬁcial alteration of these lines due to border eﬀects[18, 17], insuﬃcient embedding [1, 19], or a certain sampling setting [20, 21] can have signiﬁcant impact on thesemeasures. A few ideas have been suggested to overcome such problems [18, 22, 23]. Here we review these ideas,propose novel correction schemes, and systematically compare them.

2. Recurrence quantiﬁcation analysis and border eﬀects

A recurrence plot (RP) is a binary, square matrix 𝐑 representing the recurrences of states ⃗𝑥 𝑖 ( 𝑖 = 1 , ..., 𝑁 , with 𝑁 the number of measurement points) in the 𝑑 -dimensional phase space [24, 1] 𝑅 𝑖,𝑗 ( 𝜀 ) = Θ ( 𝜀 − ‖ ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ‖) , ⃗𝑥 ∈ ℝ 𝑑 , (1)with ‖ ⋅ ‖ a norm, 𝜀 a recurrence threshold, and Θ the Heaviside function. The RP consists of small-scale structures,such as single points and diagonal and vertical lines, which characterize important dynamical properties of the system.A diagonal line is a sequence of pairs of time points  ∶= {( 𝑖, 𝑗 ) , ( 𝑖 + 1 , 𝑗 + 1) , … , ( 𝑖 + 𝓁 − 1 , 𝑗 + 𝓁 − 1)} where 𝑅 𝑖,𝑗 ≡ for all index pairs in  . Diagonal lines in the RP represent the temporal duration that two distinct parts of ∗ Principal Corresponding author [email protected] (K.Hauke Kraemer); [email protected] (K.Hauke Kraemer); [email protected] (N.Marwan)

ORCID (s): (K.Hauke Kraemer) https://twitter.com/KH_Kraemer (K.Hauke Kraemer)

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 1 of 22

QA border eﬀect corrections

Time T i m e B

50 60 70 8050556065707580 A Figure 1:

Parallel and close parts of a phase space trajectory (A) correspond to diagonal lines of length 𝓁 in an RP (B).Diagonal lines can be cut by the border of the RP (green circles). the phase space trajectory run parallel (Figs. 1 and 2). The histogram 𝑃 ( 𝓁 ) of the lengths of diagonal lines (Fig. 3)characterizes the dynamics [25, 26, 27] and can be and has been used to quantitatively distinguish between RPs, theunderlying dynamics, or to identify regime transitions [15, 3, 11, 12, 13].For uncorrelated noise, the probability to ﬁnd a line  of exact length 𝓁 decays exponentially [28] (Fig. 3A), i.e., theRP consists only of very short diagonal lines, if there are any lines at all (Fig. 2A). In contrast, for chaotic dynamics,the RP contains diagonal lines of diﬀerent lengths (Fig. 2C), resulting in a broad distribution 𝑃 ( 𝓁 ) (Fig. 3C). TheRP for a periodic system contains continuous, non-interrupted diagonal lines, virtually of inﬁnite length (Fig. 2B). Inprincipal, we would expect a discrete line length distribution with a peak at line length inﬁnity. However, the lines arecut at the begin and end of the RP, such that an uncorrected conventional line length measurement results in a discretedistribution 𝑃 ( 𝓁 ) with uniform characteristics (Fig. 3B).The RP is a discrete matrix. Therefore, the creation of the histogram 𝑃 ( 𝓁 ) appears to be trivial. But it is not assimple as it looks at the ﬁrst glance. Diagonal lines can be quite long and – as already mentioned – can exceed theﬁnite size of the RP. In practice, this is a very common problem, particularly when a sliding window method is applied.How to count such diagonal lines? As we will see later, for some measures, it can be important to have the correctlength of the lines, for other measures it does not play any role. In the original deﬁnition, the lines are also countedeven if they were cut by the RP border [14, 29, 1].Several measures for RP analysis have been introduced which use 𝑃 ( 𝓁 ) . The ﬁrstly introduced measure was thedeterminism [29]. This measure is the fraction of recurrence points that form diagonal lines 𝐷 = ∑ 𝑁 𝓁 = 𝓁 min 𝑃 ( 𝓁 ) ∑ 𝑁 𝓁 =1 𝓁 𝑃 ( 𝓁 ) (2)and considers lines which have at least length 𝓁 min , which in principle is a free parameter, but often set to 2. Never-theless the choice of the minimal line length can be crucial for the correct estimation of some RQA measures and wecome back to that in Sect. 6.1. More details about this can be found in [1]. Since RPs of uncorrelated noise have mainlysingle points and only few and short diagonal lines, for such dynamics 𝐷 has rather low values (although embeddingcan result in artiﬁcially high 𝐷 values, see discussion in [19, 30]). In contrast, RPs for deterministic dynamics containof many diagonal lines, resulting in elevated values of 𝐷 , with the special case of 𝐷 = 1 for periodic and quasi-periodicdynamics. As this measure only quantiﬁes whether a recurrence point is on a diagonal line or not, the actual length ofa diagonal line is not important (i.e., whether the line crosses the RP border or not).Another idea is to look at the average and maximal length of the detected diagonal lines (related to prediction timeand Lyapunov exponent, resp.[1]). The average, of course, depends on the actual line lengths and will be biased whendiagonal lines cross the RP borders.Because the shape of 𝑃 ( 𝓁 ) diﬀers for diﬀerent dynamics, the Shannon entropy of the probability distribution Kraemer and Marwan:

Preprint submitted to Elsevier

Page 2 of 22QA border eﬀect corrections

Gaussian random numbers A

100 200 300

Time T i m e Periodic (monochromatic sine) B

100 200 300

Time T i m e Chaos (Roessler system) C

100 200 300

Time T i m e Figure 2:

RPs of (A) standard normal Gaussian numbers, (B) time-delay embedded sinusoidal with an oscillation period 𝑇 = 100 time units ( 𝑚 = 2 , 𝜏 = 𝑇 ∕4 ), and (C) the Rössler system ( 𝑎 = 0 . , 𝑏 = 0 . , 𝑐 = 10 ) (only subsets shown). RPswere constructed from time series of 2,000 samples (in case of the Rössler system we removed transients) using a constantglobal recurrence rate of 4% with a ﬁxed threshold and Euclidean norm. Gaussian random numbersconventional line length comp. A Length F r equen cy Periodic (monochromatic sine)conventional line length comp. B

500 1000 1500

Length F r equen cy Chaos (Roessler system)conventional line length comp. C Length F r equen cy Figure 3:

Diagonal line length distributions of the diﬀerent systems types described in Fig. 2, gained from the conventionalline counting. 𝑝 ( 𝓁 ) = 𝑃 ( 𝓁 ) ∑ 𝓁 𝑃 ( 𝓁 ) to ﬁnd a diagonal line of exact length 𝓁 was suggested [29] 𝑆 = − 𝑁 ∑ 𝓁 = 𝓁 min 𝑝 ( 𝓁 ) ln 𝑝 ( 𝓁 ) . (3)This measure was introduced in a pragmatic way to quantify the visual line structures in the RP and has been interpretedas the “information content of the trajectories” [31]. Here, the choice of the minimal line length 𝓁 min has a signiﬁcanteﬀect, since it discards parts of the line length histogram and therefore alters its shape. For uncorrelated noise, 𝑆 has low values, because 𝑝 ( 𝓁 ) is exponentially decaying. For chaotic dynamics, 𝑝 ( 𝓁 ) is a broad distribution, resultingin quite large 𝑆 values. However, for periodic signals 𝑝 ( 𝓁 ) has more similarity with a uniform distribution if thementioned border eﬀects are not accounted for. Therefore, 𝑆 is not low for periodic signals, although we would expectit, but rather large, even larger than for chaotic dynamics. Here, the eﬀect of the sliced lines at the RP border has thestrongest and remarkable eﬀect, which is why we focus on this measure only in this letter. Maximal and mean diagonalline length and speciﬁcally determinism and their behavior with respect to border eﬀects, the choice of further RPrelated parameters and its interpretation will be examined carefully in a forthcoming paper. Kraemer and Marwan:

Preprint submitted to Elsevier

Page 3 of 22QA border eﬀect corrections

3. Correction schemes for counting diagonal lines

In this section we show two ways of overcoming the problem of biased diagonal line based measures due to theborder eﬀect. Either we manipulate the histogram of the diagonal lines (Sect. 3.1) or we change the shape of the RPin order to avoid a bias in the ﬁrst place (Sect. 3.2).

Let R be a 𝑁 × 𝑁 recurrence matrix, Eq. (1), and 𝑃 ( 𝓁 ) the histogram of the diagonal lines contained in R . Wenow substantiate the deﬁnition of a diagonal line in an RP from Sect. 2. A diagonal line  of length 𝓁 is a set of 𝓁 index tupels ( ⋅ , ⋅ ) 𝑘 =1 ,.., 𝓁 :  𝓁 ∶= {( 𝑖 + 𝑘, 𝑗 + 𝑘 ) | ∀ 𝑘 = 0 , ..., 𝓁 − 1 ∶ ( 𝑅 𝑖 −1 ,𝑗 −1 ) ( 𝑅 𝑖 + 𝓁 ,𝑗 + 𝓁 ) 𝑅 𝑖 + 𝑘,𝑗 + 𝑘 ≡ . (4)The length of a line 𝓁 , is usually the cardinality of this set |  | .We denote any diagonal line which starts and ends at the border of R as a border diagonal , e.g., in case of the lowertriangle of the RP, when starting at ( 𝑖, in the ﬁrst column and ending at ( 𝑁, 𝑁 − 𝑖 + 1) in the last row:  border ∶={( 𝑖 + 𝑘, 𝑘 ) | ∀ 𝑘 = 0 , … , 𝑁 − 𝑖 ∶ 𝑅 𝑖 + 𝑘, 𝑘 ≡ 𝑘, 𝑗 + 𝑘 ) | ∀ 𝑘 = 0 , … , 𝑁 − 𝑗 ∶ 𝑅 𝑘,𝑗 + 𝑘 ≡ . (5)Any diagonal of length 𝓁 , which starts or ends at the border of R and has an end or start point within the recurrencematrix, we call semi border diagonal :  semi border ∶= {( 𝑖 + 𝑘, 𝑗 + 𝑘 ) | ∀ 𝑘 = 0 , … , 𝓁 − 1 ∧ ( 𝑗 = 1 ∨ 𝑖 + 𝓁 − 1 = 𝑁 ) ∶ 𝑅 𝑖 + 𝑘,𝑗 + 𝑘 ≡ 𝑖 + 𝑘, 𝑗 + 𝑘 ) | ∀ 𝑘 = 0 , … , 𝓁 − 1 ∧ ( 𝑖 = 1 ∨ 𝑗 + 𝓁 − 1 = 𝑁 ) ∶ 𝑅 𝑖 + 𝑘,𝑗 + 𝑘 ≡ . (6) The real length of the border diagonals is unknown. Therefore, we are not able to assign their true length to themand, hence, one option to deal with the missing length regarding the line length histogram is setting their length to zero.That is, we simply discard all (semi-)border diagonals from 𝑃 ( 𝓁 ) and, thus, avoid the broad line length distribution asexemplary shown in Fig. 3B. As desired, this results in a lowered entropy value, but also has some drawbacks. In caseof a perfectly sampled stationary periodic signal (without any noise contamination and without eﬀects due to tangentialmotion, cf. Sect.4) this method would empty the histogram 𝑃 ( 𝓁 ) completely, leaving an undeﬁned entropy (Figs. 8,12, 13) and a mean and maximum line length of zero (cf. Fig. 4B for a result not corrected for tangential motion). Inthe following, we refer to this approach as dibo correction (DIscard BOrder diagonals). To avoid an empty diagonal line histogram, Censi et al.[18] suggested to assign all border diagonals the lengthof the main diagonal of the RP (line of identity). Sticking to the aforementioned example of a perfectly sampled anduncontaminated stationary periodic signal, this modiﬁcation would result in a delta peak in 𝑃 ( 𝓁 ) (cf. Fig. 4C for aresult not corrected for tangential motion), and therefore a sound deﬁned entropy value of zero as well as meaningfulmean and maximal line length estimate (Figs. 8, 12, 13). For deterministic chaotic processes this correction schemecould underestimate the entropy, if the RP is smaller than the average length scale of the diagonal lines. Especially ina running window approach, this eﬀect is assumed to be signiﬁcant. We refer to this approach as Censi correction . In alternative to the correction in Sect. 3.1.1, all (semi-)border diagonals from 𝑃 ( 𝓁 ) are discarded, but the longestone (cf. Fig. 4D). This approach would also avoid the broad line length distribution shown in Figs. 3B and 4A, butwould leave a valid deﬁnition of the entropy, since 𝑃 ( 𝓁 ) is not an empty set (cf. Figs. 8, 12, 13). The resulting entropyfor the aforementioned example would be low. In contrast to the Censi correction, this approach would avoid the biasfor deterministic chaotic processes when a windowing approach is applied. In the following, we refer to this approachas kelo correction (KEeping the LOngest border diagonal). Kraemer and Marwan:

Preprint submitted to Elsevier

Page 4 of 22QA border eﬀect corrections conventional line length comp. A Length F r equen cy dibo correction B Length F r equen cy Censi correction C Length F r equen cy kelo correction D Length F r equen cy window masking E Length F r equen cy F Length F r equen cy Figure 4:

Diagonal line length histograms of the conventional line length computation (A) and of the presented correctionschemes (B-E) for a monochromatic time-delay embedded sinusoidal with an oscillation period 𝑇 = 100 time units ( 𝑚 = 2 , 𝜏 = 𝑇 ∕4 , same as in Figs. 2B and 3B). An enlargement of the histograms from panels A to D, focusing on the shorterline lengths, is presented in panel F. A corresponding enlargement of panel E does qualitatively look the same, but withreduced frequencies, due to the smaller eﬀective window size (see text for details). For a better visibility we enlarged singlebars in panels B to E and limited the view to a frequency range [0 3] in panels A to E (in F the full range is used). The origin of the border diagonals is related to a geometric diﬀerence between the RP and the diagonals. Therefore,a further approach to avoid the length bias of border diagonals is to apply a speciﬁc window to the RP which has thesame geometric orientation as the diagonals. One realization of such a window is a 45° rotated cutout from the originalRP (Fig. 5). Conventionally counting the lines of this rotated RP cutout preserves a delta peak distribution in 𝑃 ( 𝓁 ) incase of a periodic signal (cf. Fig. 4E for a result not corrected for tangential motion). However, with this shape weloose 𝑤 − 2 𝑠 = 𝑤 − 𝑤 = 𝑤 data points with respect to the original RP. Note that 𝑠 and 𝑤 in Fig. 5 implya number of data points, meaning hypotenuse and catheti of an isosceles triangle have the same length ( 𝑤 = 𝑠 ). Weargue that this approach could be rather useful in a running window approach over a global RP, where the size of thealternative shape could be chosen such that it contains as many data points as the classic , non-rotated, window. Werefer to this approach as window masking . An alternative window shape would be a parallelogram with the top andbottom sides having the 45° direction [32].

4. Tangential motion in Recurrence Quantiﬁcation Analysis

Even though the considerations made in the preceding sections are valid and useful, the correction schemes pre-sented in Sect. 3 most likely do not give the expected correction for the entropy of diagonal line lengths for experimentaldata, unless the data has been properly preprocessed. There are three reasons why the correction of the border diago-

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 5 of 22QA border eﬀect corrections ss w

Figure 5:

Blue shaded alternative window shape with edge length 𝑠 of a 𝑤 × 𝑤 recurrence plot. 𝑠 and 𝑤 imply the numberof RP matrix elements covered by the window shapes. nals in the diagonal line histogram 𝑃 ( 𝓁 ) is not suﬃcient enough: ( i ) temporal correlations in the data, especially whenhighly sampled ﬂow data is used, ( ii ) noise, and ( iii ) insuﬃcient embedding of the time series at hand (if needed) com-bined with the eﬀect of discretization and an inadequate choice of parameters needed to construct the RP (recurrencethreshold method, recurrence threshold size, norm).Temporal correlation means that states ⃗𝑥 𝑗 preceding or succeeding a state ⃗𝑥 𝑖 (or a recurring state ⃗𝑥 𝑘 of ⃗𝑥 𝑖 ), are verysimilar to this one and, hence, falling into the neighbourhood of ⃗𝑥 𝑖 (or ⃗𝑥 𝑘 ) and to be considered as recurrences, i.e., 𝑅 𝑖,𝑗 ∶= 1 for 𝑗 = [ 𝑖 − 𝑚, … , 𝑖 + 𝑛 ] or for 𝑗 = [ 𝑘 − 𝑚, … , 𝑘 + 𝑛 ] when 𝑅 𝑖,𝑘 ∶= 1 (Fig. 6A). This results in verticallyextended sequences in the RP, i.e., thickening its diagonal lines. The thickening leads to an artiﬁcially enlarged numberof diagonal lines, thus eﬀecting the distribution 𝑃 ( 𝓁 ) , and is often referred to as tangential motion [33, 1]. Moreover,the thickening is not evenly distributed along a diagonal line (Fig. 6B). For border diagonals, this means that there arenot only additional border diagonals (which could be handled by applying correction schemes as described in Sect. 3),but additional shorter diagonal lines, again leading to a broadening of the line length distribution 𝑃 ( 𝓁 ) (Fig. 4, inparticular panel F) and an elevated entropy 𝑆 .Additive noise causes the already thickened lines in the RP to appear more diﬀus (Fig. 6C,E). Technically speaking,the noise alters the phase space trajectory, causing the pairwise distances to randomly scatter about their true/noisefree values and, thus, the histogram 𝑃 ( 𝓁 ) gets enriched with small line lengths [34]. This eventually biases the RQAmeasures discussed in Sect. 2.

5. Correction schemes for reducing the eﬀects of tangential motion

A straightforward way to reduce the thickening of the diagonal lines from a theoretical perspective is the perpen-dicular RP , suggested by Choi et al. [35] R ⟂ 𝑖,𝑗 ( 𝜀 ) = Θ ( 𝜀 − ‖ ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ‖) ⋅ 𝛿 ( ̇⃗𝑥 𝑖 ⋅ ( ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ) ) , ⃗𝑥 ∈ ℝ 𝑑 . (7)This RP contains only those points ⃗𝑥 𝑗 that fall into the neighbourhood of ⃗𝑥 𝑖 and lie in the ( 𝑑 − 1 )-dimensionalsubspace of ℝ 𝑑 that is perpendicular to the phase space trajectory at ⃗𝑥 𝑖 . Although theoretically there is no needfor an additional parameter in order to construct a perpendicular RP, in practical situations almost no points in ℝ 𝑑 phase space end up on the mentioned ( 𝑑 − 1 )-dimensional subspace of ⃗𝑥 𝑖 (Poincaré section), due to limited resolution(discretization) of the data. Hence, it is reasonable to introduce an additional threshold parameter 𝜑 , which allows Kraemer and Marwan:

Preprint submitted to Elsevier

Page 6 of 22QA border eﬀect corrections i i +1 i −1 x i i −2 i +2

60 65 70 7560657075 Time T i m e

754 702

Time d i s t an c e

65 70

Time

Time d i s t an c e

65 70

Time

A B CD E

Figure 6: (A) Tangential motion, i.e., points of a trajectory preceding and succeeding a (recurring) state (gray), causethickening of diagonal lines in the RP (B), (C). The thickening of diagonal lines can vary, e.g., as in this example of theRössler system (noise free case in B and additive noise in C). The diagonal lines are more thick at the beginning andbecome less thick with time. A diagonal line in an RP (B), (C) denotes a range of distances in the distance matrix fallingunder a the recurrence threshold 𝜀 . Panels (D) and (E) show three “distance ranges” (we call such a range 𝔇 in the text)corresponding to the three lines in (B), (C) respectively. Shown is a colorcoded, thresholded distance matrix with reversed 𝑧 -axis for a better visibility (increasing distances from top to bottom). The colormap encodes zero distance as black andthe distance corresponding to the recurrence threshold as grey. points ⃗𝑥 𝑗 to be considered as perpendicular to ⃗𝑥 𝑖 , if arccos ̇⃗𝑥 𝑖 ⋅ ( ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ) | ̇⃗𝑥 𝑖 | ⋅ | ( ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ) | ∈ [( 𝜋 𝜑 ) , ( 𝜋 𝜑 )] . (8)Thus, Eq. (7) transforms to 𝑅 ⟂ 𝑖,𝑗 ( 𝜀, 𝜑 ) = Θ ( 𝜀 − ‖ ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ‖) ⋅ Θ ( 𝜑 − |||| arccos ̇⃗𝑥 𝑖 ⋅ ( ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ) | ̇⃗𝑥 𝑖 | ⋅ | ( ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ) | |||| − 𝜋 ) , ⃗𝑥 ∈ ℝ 𝑑 . (9)Figure 7B shows a perpendicular RP for a Rössler system (with parameters 𝑎 = 0 . , 𝑏 = 0 . , 𝑐 = 10 , transientsremoved). For the estimation of the tangential at each point in phase space we used the reference point, its predecessorand its successor. We set the angle threshold to 𝜑 = 𝜋 ( = 15 °). Requiring less computational eﬀort, the iso-directional RP suggested by Horai et al. [36] also promises to copewith the tangential motion, but also inherits two additional parameters 𝑇 and 𝜀 (Fig. 7C). In this approach two pointsin phase space are denoted recurrent, if their mutual distance falls within the recurrence threshold 𝜀 and the distanceof their trajectories throughout 𝑇 consecutive time steps falls within a recurrence threshold 𝜀 𝑅 ⇗ 𝑖,𝑗 ( 𝜀, 𝜀 , 𝑇 ) = Θ( 𝜀 − ‖ ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ‖ ) ⋅ Θ( 𝜀 − ‖ ( ⃗𝑥 𝑖 + 𝑇 − ⃗𝑥 𝑖 ) − ( ⃗𝑥 𝑗 + 𝑇 − ⃗𝑥 𝑗 ) ‖ ) , ⃗𝑥 ∈ ℝ 𝑑 . (10)We achieved decent results when choosing 𝑇 in the size of the decorrelation time (e.g., ﬁrst minimum of the mutualinformation) and the second recurrence threshold as half of the size of the recurrence threshold 𝜀 , which determinesthe parent RP. Kraemer and Marwan:

Preprint submitted to Elsevier

Page 7 of 22QA border eﬀect corrections normal RP A Time T i m e perpendicular RP B Time T i m e isodirectional RP C Time T i m e TRP D Time T i m e LM2P E Time T i m e diagonal RP F Time T i m e Figure 7:

Diﬀerent approaches for avoiding the eﬀect of tangential motion in a recurrence plot (RP), exemplary shownfor the Rössler system (with parameters 𝑎 = 0 . , 𝑏 = 0 . , 𝑐 = 10 , sampling time Δ 𝑡 = 0 . ). (A) Normal RP with ﬁxedrecurrence threshold ensuring 4% global recurrence rate as a basis to all other RPs shown in this ﬁgure. (B) PerpendicularRP with angle threshold 𝜑 = 15 °, (C) isodirectional RP with 𝑇 = 5 [sampling units] and 𝜀 = 𝜀 ∕2 , (D) true recurrencepoint RP (TRP) with 𝑇 min = 5 [sampling units], which coincides with the ﬁrst minimum of the mutual information, (E)thresholded local minima approach with two parameters (LM2P) and 𝜏 m = 5 , and (F) diagonal RP. Inspired by the work of Gao [37], Ahlstrom et al. [38] compute a normal RP, Eq. (1), but only accept those pointsto be recurrent, which “ﬁrst” enter the 𝜀 neighbourhood shown in Fig. 6A. To ensure this, they ﬁrst identify all pointswhich fall into an 𝜀 -neighbourhood of a certain point ⃗𝑥 𝑖 𝜁 𝑖 ≡ { ⃗𝑥 𝑗 , ⃗𝑥 𝑗 , ... | 𝑅 𝑖,𝑗 𝑘 ∶= 1} , (11)i.e., all points 𝑗 𝑘 in column 𝑖 of the RP. The time diﬀerence of two consecutive recurrence points ⃗𝑥 𝑗 𝑘 , ⃗𝑥 𝑗 𝑘 +1 is { 𝑇 (1) 𝑘 = 𝑗 𝑘 +1 − 𝑗 𝑘 } 𝑘 ∈ ℕ in units of the sampling time (recurrence times of ﬁrst type [37]) and these correspond to thevertical distances between these points in column 𝑖 of the RP. They now discard all points from the RP, which verticaldistance to its neighbouring point in a column is 1 and leaving all points with recurrence time larger than 1, 𝜁 ∗ 𝑖 ≡ { ⃗𝑥 𝑗 , ⃗𝑥 𝑗 , … | 𝑅 𝑖,𝑗 𝑘 ∶= 1 , 𝑇 (1) 𝑘 > 𝑇 min } , 𝑇 min = 1 . (12)The authors call this modiﬁed RP a true recurrence point recurrence plot (TRP) . This is diﬀerent than simplydiscarding all points from the computations of Eq. (1) which fall within a certain time range 𝑤 Theiler of the referencepoint (

Theiler window [20]) 𝑅 𝑖,𝑗 ( 𝜀 ) = Θ ( 𝜀 − ‖ ⃗𝑥 𝑖 − ⃗𝑥 𝑗 ‖) , | 𝑖 − 𝑗 | > 𝑤 Theiler , ⃗𝑥 ∈ ℝ 𝑑 . (13) Kraemer and Marwan:

Preprint submitted to Elsevier

Page 8 of 22QA border eﬀect corrections

To obtain a TRP, we suggest to discard all recurrence points with recurrence times greater than 𝑤 Theiler , i.e., 𝑇 min = 𝑤 Theiler in Eq. (12). The Theiler window should be set in the order of the decorrelation time or the delay, iftime delay embedding is used for reconstructing the phase space vectors from time series.However, the TRP most often leads to disjoint, deviated diagonal line structures (Fig. 7D), which correspond tothe white embraced lines in Fig. 6D, E.An alternative would be to use the mid-points of the recurrence sequences. This would also correspond to recur-rence times as discussed in [39]. In Subsec. 5.5, we will develop another correction scheme which is motivated bythese mid-point based “true recurrences”.

Another approach for reducing the eﬀect of tangential motion and which shares the basic idea from the TRP ap-proach, was introduced by Schultz et al. [22], who track the local minima of the distance matrix (corresponding tothe maxima in Fig. 6D, E). Wendi & Marwan [23] then extended this idea in order to make the method more robustagainst noise. However, such local-minima based RP can contain bended or disrupted diagonal line structures. Thekey idea is to look for local minima in each column of the distance matrix, illustrated as an orange cross section inFig. 6D, E. If such a local minimum is smaller than the recurrence threshold, then it is a recurrence (LocalMinimaTh-resholded, LMT). In the two-parameter approach (LM2P) [23] shown in Fig. 7E, there is an additional constraint fortwo consecutive local minima to be displaced by at least 𝜏 m time steps. We now propose an additional approach to cope with the tangential motion, which does not need any additionalparameters and leads to an RP of straight, unbended diagonal line structures (Fig. 7F). We call this approach the diagonal RP , since it generates an RP with only diagonal line structures that are just one point thick.A diagonal line in a RP corresponds to a connected region in the distance matrix with distances smaller than therecurrence threshold 𝜀 (Fig. 6D, E, white embraced region). We call such region a “distance range” 𝔇 . Typically, thelarger 𝜀 the larger the 𝔇 𝑖 ’s in the RP. Moreover, tangential motion, noise and insuﬃcient embedding aﬀect the shapeand width of the 𝔇 𝑖 ’s. For the diagonal line based RQA measures we are interested in these ranges to be representedby single, connected diagonal lines in the corresponding RP. We choose the longest line of each 𝔇 𝑖 to be its adequaterepresentative in the RP. We deﬁne the “distance ranges” 𝔇 𝑖 of an RP recursively as a set of adjacent diagonal lines  ( 𝑚 ) 𝓁 𝑚 of length 𝓁 𝑚 (cf. Eq. (4)), initializing with the longest line  ( 𝑘 ) 𝓁 𝑘 , for which 𝓁 𝑘 = max( 𝓁 ∶ 𝑃 ( 𝓁 ) > . 𝔇 𝑖 ∶= {  ( 𝑘 ) 𝓁 𝑘 ,  ( 𝑚 ) 𝓁 𝑚 |  ( 𝑚 ) 𝓁 𝑚 ↶  ( 𝑚 −1) 𝓁 𝑚 −1 ↶  ( 𝑚 −2) 𝓁 𝑚 −2 ↶ ... ↶  ( 𝑘 ) 𝓁 𝑘 ∨  ( 𝑚 ) 𝓁 𝑚 ↷  ( 𝑚 −1) 𝓁 𝑚 −1 ↷  ( 𝑚 −2) 𝓁 𝑚 −2 ↷ ... ↷  ( 𝑘 ) 𝓁 𝑘 } (14)with the line-neighbour-relations ↶ and ↷ deﬁned by ∃ 𝑝 ∈ [1 , ..., 𝓁 𝑚 ] ∃ 𝑞 ∈ [1 , ..., 𝓁 𝑘 ] ∶( 𝑖 𝑚 , 𝑗 𝑚 ) 𝑝 ∶= { ( 𝑖 𝑘 + 1 , 𝑗 𝑘 ) 𝑞 ∨ ( 𝑖 𝑘 , 𝑗 𝑘 + 1) 𝑞 , if  ( 𝑚 ) 𝓁 𝑚 ↶  ( 𝑘 ) 𝓁 𝑘 ( 𝑖 𝑘 − 1 , 𝑗 𝑘 ) 𝑞 ∨ ( 𝑖 𝑘 , 𝑗 𝑘 − 1) 𝑞 , if  ( 𝑚 ) 𝓁 𝑚 ↷  ( 𝑘 ) 𝓁 𝑘 (15)where ( 𝑖 𝑚 , 𝑗 𝑚 ) 𝑝 =[1 ,..., 𝓁 𝑚 ] denote the index tuples corresponding to lines  ( 𝑚 ) 𝓁 𝑚 and ( 𝑖 𝑘 , 𝑗 𝑘 ) 𝑞 =[1 ,..., 𝓁 𝑘 ] denote the indextuples corresponding to the longest line  ( 𝑘 ) 𝓁 𝑘 . We then delete all lines contained in 𝔇 𝑖 from the histogram 𝑃 ( 𝓁 ) anddeﬁne the next distance range 𝔇 𝑖 +1 with a new  ( 𝑘 ′ ) 𝓁 𝑘 ′ from the histogram and so on until 𝑃 ( 𝓁 ) is an empty set.We construct the new RP by keeping the longest line of each 𝔇 𝑖 (all the  ( 𝑘 ) 𝓁 𝑘 ’s). Denote the set of index tuples ( 𝑖, 𝑗 ) corresponding to the set of longest lines gained from the 𝔇 𝑖 ’s as 𝔖 , then 𝑅 ↗ 𝑖,𝑗 = { , if ( 𝑖, 𝑗 ) ∈ 𝔖 , otherwise (16) Kraemer and Marwan:

Preprint submitted to Elsevier

Page 9 of 22QA border eﬀect corrections

Note that this algorithm constricts clusters of adjacent recurrence points to a single diagonal line, representingthis “distance range” 𝔇 (skeletonization). Although this method impresses with the absence of additional parameters,caution in its use is advised concerning the choice of the embedding parameters and the recurrence threshold. A wrongsetup, speciﬁcally a too high recurrence threshold and/or a “wrong” time delay, could lead to an overall connected RP,which in turn would cause a diagonal RP consisting of just one single line in each triangle (if the main diagonal isdiscarded). However, concerning the sensitivity to the choice of the recurrence threshold, our numerical investigationssuggest a rather low risk of this special case and a broad range of threshold values, which do work well (cf. Sect. 6,Sect. A and ﬁgures therein).

6. Results: Eﬃcience of correction schemes

We now apply the correction schemes for counting diagonal lines (Sect. 3) and suppressing tangential motion(Sect. 5) on a time discrete as well as time continuous example, in order to test their ability to give valid estimates forthe entropy of diagonal line lengths, Eq. (3). In case of the former we choose the Logistic map 𝑥 𝑛 +1 = 𝑟𝑥 𝑛 (1 − 𝑥 𝑛 ) with control parameter 𝑟 = 3 . , leading to regular limit cycle behavior, and control parameter 𝑟 = 3 . , where a chaoticregime is obtained. For the latter we show diagonal line length entropies of RPs of the Rössler system [40] ̇𝑥 = − 𝑦 − 𝑧̇𝑦 = 𝑥 + 𝑎𝑦 (17) ̇𝑧 = 𝑏 + 𝑧 ( 𝑥 − 𝑐 ) in two parameter conﬁgurations, also leading to regular limit cycle behavior ( 𝑎 = 0 . , 𝑏 = 𝑐 = 10 ) and chaoticmotion ( 𝑎 = 0 . , 𝑏 = 0 . , 𝑐 = 10 ) [41]. The results shown in this section are based on ensembles of 100 realizationsof each parameter setting for the Rössler system and on ensembles of 1,000 realizations of each parameter setting forthe Logistic map, gained from randomly chosen initial conditions out of a uniformly distributed interval 𝑥 ∈ [0 , . (Logistic map), 𝑥 (0) , 𝑦 (0) , 𝑧 (0) ∈ [0 , (Rössler system). We numerically integrate the Rössler equations using theexplicit Runge-Kutta (4,5) formalism (Dormand-Prince pair) as provided by the ode45-solver in MATLAB [42] witha ﬁxed sampling time of Δ 𝑡 = 0 . . For both systems we discard the ﬁrst 2,500 data points as transients, keeping 1,000(Logistic map) and 2,000 (Rössler) data points as the time series we base our further computations on. For estimatingthe entropy, we use the Maximum-Likelihood-estimator 𝑝 ( 𝓁 ) ̂ = ̂𝑝 ( 𝓁 ) = number of lines of length 𝓁 number of all lines in the RP for the probabilities.Generally, we expect (near-)zero entropy values for the regular regime setups and high(er) values for the chaoticregime setups for both considered examples in the noise free case (cf. Sect. 2). Moreover, we expect the correctionschemes for counting diagonal lines (Sect. 3) to perform well in case of the Logistic map examples, due to the absenceof tangential motion. For the ﬂow data in the Rössler examples, we expect a combination of these correction schemeswith the correction schemes for tangential motion described in Sect. 5 to give reasonable results. In order to validateour results we compute the diagonal line length entropy analytically for the mentioned cases. March et al. [27] gavean expression for this: 𝑆 theoretical = 𝐾 ( 𝛾 − 1 ) − ln 𝛾 , (18)with 𝛾 = (1 − 𝑒 − 𝐾 ) and 𝐾 the correlation entropy. Practically we compute the largest Lyapunov exponent forour experimental settings [43] and use Pesin’s identity to get the Kolmogorov entropy 𝐾 . Because the correlationentropy is a lower bound for the Kolmogorov entropy [44], we expect the reference values computed from Eq. (18) togive underestimated expectation values for the diagonal line length entropy.The results conﬁrm our expectations (Fig. 8). While the conventional way of counting diagonal lines, where bordereﬀects are not taken into consideration, lead to counterintuitive behavior, all the described correction schemes are ableto distinguish chaotic from regular regimes in both exemplary systems. In this laboratory, noise free conditions, theentropy estimates in case of the regular limit cycle regimes are zero (or in case of the 𝑑𝑖𝑏𝑜 -correction scheme notdeﬁned, due to the absence of any diagonal line). For 𝑑𝑖𝑏𝑜 and 𝑘𝑒𝑙𝑜 the estimated values for the chaotic Rössler regimefall within the two standard deviation margin of the theoretical values, whereas Censi’s correction scheme comes veryclose and the windowshape correction scheme misses it by approx. 5%. Again, we have to stress that we expect the Kraemer and Marwan:

Preprint submitted to Elsevier

Page 10 of 22QA border eﬀect corrections

Residuals to entropy expectation value - Diagonal RP c on v en t i ona l C en s i d i bo k e l o w i ndo w m a sk i ng r e s i dua l t o e x pe c t a t i on v a l ue Roessler system regularRoessler system chaoticLogistic Map regularLogistic Map chaotic

Entropy of diagonal line length distr. - Diagonal RP c on v en t i ona l C en s i d i bo k e l o w i ndo w m a sk i ng no r m a li z ed en t r op y Figure 8:

Diagonal line length entropy (left panel) of the proposed diagonal recurrence plot 𝑅 ↗ (cf. Sect. 5.5) of theRössler system (reddish) and the Logistic map (bluish) in a regular limit cycle regime (bright) as well as in a chaoticregime (dark). Shown are medians of the diagonal line length entropies gained from 1,000 realizations of the Logistic mapand 100 realizations of the Rössler example, respectively, for the diﬀerent line counting correction schemes described inSect. 3. Errorbars indicate two standard deviations of these distributions. Black stars show medians of ensembles of 1,000analytically computed values derived from Eq. (18) (its errorbars, as two standard deviations of the ensemble distribution,are barely visible and smaller than markers used). In the right panel the residuals to these underestimated expectationvalues are shown. Firstly, RPs were obtained with a ﬁxed recurrence threshold corresponding to 19% recurrence rate incase of the Rössler examples and a ﬁxed recurrence threshold corresponding to of the range of the underlying timeseries in case of the Logistic map examples (for noise free map data the 𝜀 -adjustment with respect to the global recurrencerate does not work properly). Then our proposed, parameter free correction scheme leading to the diagonal recurrenceplot 𝑅 ↗ was applied. Results for a range of recurrence thresholds and for all tangential motion RP-correction schemes areshown in Fig. 12 and Fig. 13 in the Appendix A. expectation values to be underestimated, i.e. we assume Censi’s method and the window masking do also perform well.Let us stick to the 𝑘𝑒𝑙𝑜 correction scheme for now and look how the diﬀerent correction schemes for tangential motionperform (Fig. 9). First of all we have to mention that we were not able to produce any kind of reasonable estimateswhile using the perpendicular recurrence plot 𝑅 ⟂ , regardless of the angle threshold parameter. This straightforwardapproach is extremely sensitive to any kind of noise and to the sampling time of the system under observation. Itneeds a fairly high density of state space points, in order to yield a non empty RP and, thus, any meaningful diagonalline length entropy estimate. Hence, we skip this approach in our further analysis, especially the dependence of theshown results to the choice of the recurrence threshold and additive noise, but will discuss the performance of theperpendicular RP for a high sampled Rössler setup in the next subsection. For a general use, we cannot recommendthe application of perpendicular RPs. Coming back to the results (Fig. 9), solely the LM2P approach and the diagonalRP perform as expected (zero-values in case of the regular regime setups and higher values for the chaotic regimes,clearly distinguishable). Only the proposed diagonal RP is able to give estimates within the errorbars of the theoreticalvalues (which is why only this approach was selected for Fig. 8). Note that the reference values slightly underestimatethe “true” value and we cannot quantitatively correct for this bias. As in Fig. 7, we set the parameters 𝑇 , 𝑇 min and 𝜏 m to the corresponding ﬁrst minimum of the auto mutual information and the second recurrence threshold for theisodirectional RP was again set to 𝜀 = 𝜀 ∕2 , but we tried many parameter conﬁgurations. For the sake of completeness and in order to investigate the behavior of our proposed methods under more realisticconditions, we now look at the noise corrupted Rössler system in the two dynamical regimes (Sect. 6), but with an

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 11 of 22QA border eﬀect corrections

Residuals to entropy expectation value - kelo correction no r m a l R P i s od i r e c t i ona l R P T R P L M P d i agona l R P -0.1-0.0500.050.10.150.20.250.30.350.4 r e s i dua l t o e x pe c t a t i on v a l ue Roessler system regularRoessler system chaoticLogistic Map regularLogistic Map chaotic

Entropy of diagonal line length distr. - kelo correction no r m a l R P i s od i r e c t i ona l R P T R P L M P d i agona l R P no r m a li z ed en t r op y Figure 9:

Diagonal line length entropy (left panel) based on the proposed line counting correction scheme 𝑘𝑒𝑙𝑜 (cf.Sect. 3.1.3) for the Rössler system (reddish) and the Logistic map (bluish) in a regular limit cycle regime (bright) as wellas in a choatic regime (dark). Shown are medians of the diagonal line length entropies gained from 1,000 realizations ofthe Logistic map and 100 realizations of the Rössler example, respectively, for all the diﬀerent tangential motion correctionschemes described in Sect. 4, but the perpendicular recurrence plot 𝑅 ⟂ . Errorbars indicate two standard deviations ofthese distributions. Black stars show medians of ensembles of 1,000 analytically computed values derived from Eq. (18) (its errorbars, as two standard deviations of the ensemble distribution, are barely visible and smaller than markers used).In the right panel the residuals to these underestimated expectation values are shown. The normal RP with a ﬁxedrecurrence threshold corresponding to 19% recurrence rate in case of the Rössler examples and a ﬁxed recurrence thresholdcorresponding to of the range of the underlying time series in case of the Logistic map examples (for noise free mapdata the 𝜀 -adjustment with respect to the global recurrence rate does not work properly) serves as a basis for the RPcorrection schemes shown here. Results for a range of recurrence thresholds and for all tangential motion RP-correctionschemes are shown in Fig. 12 and Fig. 13 in the Appendix A. increased sampling frequency (sampling time Δ 𝑡 = 0 . ) and with total lengths of the three numerically integratedtime series of 10,000 (transients already removed). In this setup the perpendicular recurrence plot 𝑅 ⟂ (Sect. 5.1)yields meaningful results (Fig. 10), and we compare its utility with respect to the estimation of the diagonal line lengthentropy to the normal RP and the novel diagonal recurrence plot 𝑅 ↗ (Sect. 5.5).Figure 11 illustrates the capability of 𝑅 ↗ to cope with tangential motion, especially under noise. Due to a too highcomputational eﬀort we did not compute an ensemble in this case as we did in the lower sampled cases, so the errorbarsare missing. Here we added an auto regressive (AR) process of second order with an amplitude corresponding to 20%of the mean standard deviation of the multivariate signal. 𝑥 𝑡 = 𝑎 𝑥 𝑡 −1 + 𝑎 𝑥 𝑡 −2 + 𝜀 𝑡 , (19)with parameters 𝑎 = 0 . , 𝑎 = 0 . and 𝜀 𝑡 denotes a white noise process with zero mean and constant variance ofunity. Outcomes for the normal RP and the perpendicular recurrence plot 𝑅 ⟂ can be found in the Appendix (Figs. 16,17). Additive white noise of the same magnitude gave similar results to the ones discussed here.As expected from the examples in the last section, the diagonal RP approach performs well under noise free con-ditions and all, but the conventional line counting algorithms yield zero-value entropy estimates for the regular regime(panel B) and clearly non-zero entropies in case of the chaotic regime (panel A) close to the underestimated referencevalues. The perpendicular RP also performs well in noise free conditions (Fig. 16). Even the conventional line lengthcounting leads to the desired zero entropy estimates in case of regular motion. In the presence of noise, however, 𝑅 ⟂ isnot able to distinguish regular from chaotic behavior (Fig. 17), whereas 𝑅 ↗ still performs well, giving almost the same Kraemer and Marwan:

Preprint submitted to Elsevier

Page 12 of 22QA border eﬀect corrections perpendicular RP, noisefree A

10 15 20 25

Time T i m e normal RP, noisefree B

10 15 20 25

Time T i m e diagonal RP, noisefree C

10 15 20 25

Time T i m e perpendicular RP, noisy D

10 15 20 25

Time T i m e normal RP, noisy E

10 15 20 25

Time T i m e diagonal RP, noisy F

10 15 20 25

Time T i m e Figure 10:

Cut outs of (A, D) the perpendicular recurrence plot 𝑅 ⟂ , (B, E) normal RP, and (C, F) the diagonal recurrenceplot 𝑅 ↗ of the highly sampled Rössler system in chaotic regime (here with a sampling time of Δ 𝑡 = 0 . ). Top panels (A-C)show noise free cases, bottom panels (D-F) show their noise contaminated counterparts. Shown are results of additivewhite noise as 10% of the mean standard deviation of the multivariate signal gained from the numerical integration.Computations have been carried out by using a ﬁxed recurrence threshold corresponding to 35% recurrence rate and anangle threshold 𝜑 = 15 ◦ for 𝑅 ⟂ . results as in the noise free setup. The explanation can be found in considering the RPs (Fig. 10). For this noiselevel ourproposed skeletonization approach (Fig. 10F) leaves small lines of maximum length 4 after its application to the noisynormal recurrence plot (Fig. 10E) as noise-leftovers. The appearance of these lines is not a result of the dynamicsitself. Noise enriches the RP and its corresponding diagonal line histogram with small line lengths depending on thenoiselevel ([34], Fig. 10, Fig. 2A, Fig. 3A). By increasing the minimum line length one gradually discards the majorityof the lines contained in the histogram and, thus, increases the prominence of larger line lengths for the computationof the entropy. For a regular regime, the distribution of lines of intermediate length is broader for all the correctionschemes, but the diagonal RP. Therefore an increasing minimal line length increases the entropy in the presence ofnoise for all the correction schemes, but the diagonal RP (cf. Fig. 17). In case of a chaotic regime the distribution ofdiagonals due to the dynamics is broader anyway (Fig. 2C, Fig. 3C) leading to the same eﬀect.When increasing the minimal line length for the diagonal RP, the entropy estimates stay more or less constant aftera certain, suﬃciently high, minimum line length, which depends on the noiselevel (Fig. 11C, D). The oﬀset to theunderestimated reference value for the chaotic case grows for increasing noiselevels. Note that the eﬀect of additivenoise is harder to tackle for the tangential motion correction schemes for high sampled data like in this case, than itis for lower sampled examples as discussed in Sect. 6. The higher the sampling, the ﬁner the ramiﬁcation of distanceranges 𝔇 𝑖 (thickened diagonal lines). Results for all correction schemes for a wide range of the recurrence thresholdsand under the inﬂuence of white noise for the lower sampled situation can be found in the Appendix (Figs. 14 and 15). Kraemer and Marwan:

Preprint submitted to Elsevier

Page 13 of 22QA border eﬀect corrections considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisefree chaotic) A considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisefree regular) B conventionalCensidibokelowindow maskingreference considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisy chaotic) C considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisy regular) D Figure 11:

Normalized diagonal line length entropy estimates for all described correction schemes for counting diagonallines (Sect. 3) based on the diagonal recurrence plot 𝑅 ↗ (Sect. 5.5) of the high sampled Rössler system as a function ofthe chosen minimal line length 𝓁 min . The top panels (A - chaotic motion , B - regular motion) show the noisefree case andin the bottom panels (C - chaotic motion, D - regular motion) the results for noise corrupted data are shown. We addednoise from an auto-regressive (AR) process of second order as 20% of the mean standard deviation of the multivariatesignal gained from the numerical integration (cf. Eq. (19) ). The underlying RPs for obtaining 𝑅 ↗ were computed usinga ﬁxed recurrence threshold corresponding to 35% recurrence rate. The grey shaded areas show medians of ensembles of1,000 analytically computed reference values for 𝐾 ± two standard deviations of these distributions transformed by usingEq. (18) .

7. Discussion

In this letter we investigated the eﬀect of the ﬁnite size of a recurrence plot on its diagonal line length based quan-tiﬁcation. Speciﬁcally, we showed how these border eﬀects inﬂuence the diagonal line length entropy and proposedthree new line length counting correction schemes, which take these eﬀects into account (cf. Subsects. 3.1.1, 3.1.3,3.2) and systematically compared them to an already proposed correction by Censi et al. [18] (Subsect. 3.1.2). Itturned out that for noise free or slightly noise corrupted map data all these correction methods solve the problem ofthe biased diagonal line length entropy due to lines cut by the borders of the RP. However, for ﬂow data the eﬀect oftangential motion has a much bigger inﬂuence on the entropy bias than the border eﬀects. Therefore, we systematicallycompared already proposed ideas to handle tangential motion and proposed a new, parameter free method, the diag-onal RP (cf. Sect. 5.5). It can properly tackle the tangential motion eﬀects and yield, in combination with the bordereﬀect correction schemes, meaningful estimates for the diagonal line length entropy. We have to emphasize that thismethod, in contrast to other suggested ideas, also works for noise contaminated data, is not sensitive to the particular

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 14 of 22QA border eﬀect corrections choice of the recurrence threshold, does not introduce any additional parameter, and is, therefore, easy to use. In caseof a noise corrupted ﬂow-like signal the diagonal line length entropy approaches its constant expectation value forsuﬃciently high choices of the minimal line length, when the diagonal RP together with Censi’s or our propsed bordereﬀect correction schemes is used. Fairly high recurrence thresholds (>10% recurrence rate) favour the diagonal RP method for intermediate or high noise levels.

Acknowledgments

We thank Johannes Donath for his contribution through his Bachelor’s thesis, regarding the window maskingcorrection scheme. This work has been ﬁnancially supported by the German Research Foundation (DFG projectsMA4759/8 and MA4759/9) and the European Union’s Horizon 2020 Research and Innovation Programme under theMarie Skłodowska-Curie grant agreement 691037 (project QUEST).An implementation of all the discussed correction routines (border eﬀects and tangential motion) is available asMATLAB code in the Zenodo archive [45].

References [1] N. Marwan, M. C. Romano, M. Thiel, J. Kurths, Recurrence Plots for the Analysis of Complex Systems, Physics Reports 438 (5–6) (2007)237–329. doi:10.1016/j.physrep.2006.11.001 .[2] C. L. Webber, Jr., N. Marwan, Recurrence Quantiﬁcation Analysis – Theory and Best Practices, Springer, Cham, 2015. doi:10.1007/978-3-319-07155-8 .[3] N. Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, J. Kurths, Recurrence Plot Based Measures of Complexity and its Application to HeartRate Variability Data, Physical Review E 66 (2) (2002) 026702. doi:10.1103/PhysRevE.66.026702 .[4] Z. O. Guimarães-Filho, I. L. Caldas, R. L. Viana, J. Kurths, I. C. Nascimento, Y. K. Kuznetsov, Recurrence quantiﬁcation analysis of electro-static ﬂuctuations in fusion plasmas, Physics Letters A 372 (7) (2008) 1088–1095. doi:10.1016/j.physleta.2007.07.088 .[5] A. Facchini, F. Rossi, C. Mocenni, Spatial recurrence strategies reveal diﬀerent routes to Turing pattern formation in chemical systems, PhysicsLetters A 373 (46) (2009) 4266–4272. doi:10.1016/j.physleta.2009.09.049 .[6] C. L. Webber, Jr., N. Marwan, A. Facchini, A. Giuliani, Simpler methods do it better: Success of Recurrence Quantiﬁcation Analysis as ageneral purpose data analysis tool, Physics Letters A 373 (2009) 3753–3756. doi:10.1016/j.physleta.2009.08.052 .[7] K. Guhathakurta, B. Bhattacharya, A. R. Chowdhury, Using recurrence plot analysis to distinguish between endogenous and exogenous stockmarket crashes, Physica A 389 (9) (2010) 1874–1882. doi:10.1016/j.physa.2009.12.061 .[8] Y. Hirata, Y. Shimo, H. L. Tanaka, K. Aihara, Chaotic Properties of the Arctic Oscillation Index, SOLA 7 (2011) 33–36. doi:10.2151/sola.2011-009 .[9] N. P. Subramaniyam, J. Hyttinen, Characterization of dynamical systems under noise using recurrence networks: Application to simulatedand EEG data, Physics Letters A 378 (46) (2014) 3464–3474. doi:10.1016/j.physleta.2014.10.005 .[10] M. S. Santos, J. D. Szezech, A. M. Batista, I. L. Caldas, R. L. Viana, S. R. Lopes, Recurrence quantiﬁcation analysis of chimera states, PhysicsLetters A 379 (37) (2015) 2188–2192. doi:10.1016/j.physleta.2015.07.029 .[11] O. Kopáček, V. Karas, J. Kovář, Z. Stuchlík, Transition from regular to chaotic circulation in magnetized coronae near compact objects, TheAstrophysical Journal 722 (2) (2010) 1240. doi:10.1088/0004-637X/722/2/1240 .[12] V. Mitra, A. Sarma, M. S. Janaki, A. N. Sekar Iyengar, B. Sarma, N. Marwan, J. Kurths, P. K. Shaw, D. Saha, S. Ghosh, Order to chaostransition studies in a DC glow discharge plasma by using recurrence quantiﬁcation analysis, Chaos, Solitons & Fractals 69 (2014) 285–293. doi:10.1016/j.chaos.2014.10.005 .[13] V. Nair, G. Thampi, R. I. Sujith, Intermittency route to thermoacoustic instability in turbulent combustors, Journal of Fluid Mechanics 756(2014) 470–487. doi:10.1017/jfm.2014.468 .[14] J. P. Zbilut, C. L. Webber, Jr., Embeddings and delays as derived from quantiﬁcation of recurrence plots, Physics Letters A 171 (3–4) (1992)199–203. doi:10.1016/0375-9601(92)90426-M .[15] L. L. Trulla, A. Giuliani, J. P. Zbilut, C. L. Webber, Jr., Recurrence quantiﬁcation analysis of the logistic equation with transients, PhysicsLetters A 223 (4) (1996) 255–260. doi:10.1016/S0375-9601(96)00741-4 .[16] J. F. Donges, R. V. Donner, K. Rehfeld, N. Marwan, M. H. Trauth, J. Kurths, Identiﬁcation of dynamical transitions in marine palaeoclimaterecords by recurrence network analysis, Nonlinear Processes in Geophysics 18 (2011) 545–562. doi:10.5194/npg-18-545-2011 .[17] D. Eroglu, T. K. D. Peron, N. Marwan, F. A. Rodrigues, L. d. F. Costa, M. Sebek, I. Z. Kiss, J. Kurths, Entropy of weighted recurrence plots,Physical Review E 90 (2014) 042919. doi:10.1103/PhysRevE.90.042919 .[18] F. Censi, G. Calcagnini, S. Cerutti, Proposed corrections for the quantiﬁcation of coupling patterns by recurrence plots, IEEE Transactions onBiomedical Engineering 51 (5) (2004) 856–859.[19] N. Marwan, How to avoid potential pitfalls in recurrence plot based data analysis, International Journal of Bifurcation and Chaos 21 (4) (2011)1003–1017. doi:10.1142/S0218127411029008 .[20] J. Theiler, Spurious dimension from correlation algorithms applied to limited time-series data, Phys. Rev. A 34 (1986) 2427–2432. doi:10.1103/PhysRevA.34.2427 .URL https://link.aps.org/doi/10.1103/PhysRevA.34.2427 [21] A. Facchini, H. Kantz, Curved structures in recurrence plots: The role of the sampling time, Physical Review E 75 (2007) 036215. doi:10.1103/PhysRevE.75.036215 . Kraemer and Marwan:

Preprint submitted to Elsevier

Page 15 of 22QA border eﬀect corrections [22] A. Schultz, Y. Zou, N. Marwan, M. T. Turvey, Local Minima-based Recurrence Plots for Continuous Dynamical Systems, International Journalof Bifurcation and Chaos 21 (4) (2011) 1065–1075. doi:10.1142/S0218127411029045 .[23] D. Wendi, N. Marwan, Extended recurrence plot and quantiﬁcation for noisy continuous dynamical systems, Chaos: An InterdisciplinaryJournal of Nonlinear Science 28 (8) (2018) 085722. arXiv:https://doi.org/10.1063/1.5025485 , doi:10.1063/1.5025485 .URL https://doi.org/10.1063/1.5025485 [24] J.-P. Eckmann, S. Oliﬀson Kamphorst, D. Ruelle, Recurrence Plots of Dynamical Systems, Europhysics Letters 4 (9) (1987) 973–977. doi:10.1209/0295-5075/4/9/004 .[25] P. Faure, H. Korn, A new method to estimate the Kolmogorov entropy from recurrence plots: its application to neuronal signals, Physica D122 (1–4) (1998) 265–279. doi:10.1016/S0167-2789(98)00177-8 .[26] M. Thiel, M. C. Romano, P. L. Read, J. Kurths, Estimation of dynamical invariants without embedding by recurrence plots, Chaos 14 (2)(2004) 234–243. doi:10.1063/1.1667633 .[27] T. K. March, S. C. Chapman, R. O. Dendy, Recurrence plot statistics and the eﬀect of embedding, Physica D 200 (1–2) (2005) 171–184. doi:10.1016/j.physd.2004.11.002 .[28] M. Thiel, M. C. Romano, J. Kurths, Analytical Description of Recurrence Plots of white noise and chaotic processes, Izvestija vyssich ucebnychzavedenij/ Prikladnaja nelinejnaja dinamika – Applied Nonlinear Dynamics 11 (3) (2003) 20–30.[29] C. L. Webber, Jr., J. P. Zbilut, Dynamical assessment of physiological systems and states using recurrence plot strategies, Journal of AppliedPhysiology 76 (2) (1994) 965–973.[30] D. Wendi, N. Marwan, B. Merz, In search of determinism-sensitive region to avoid artefacts in recurrence plots, International Journal ofBifurcation and Chaos 28 (1) (2018) 1850007. doi:10.1142/S0218127418500074 .[31] J. P. Zbilut, A. Giuliani, C. L. Webber, Jr., Recurrence quantiﬁcation analysis and principal components in the detection of short complexsignals, Physics Letters A 237 (3) (1998) 131–135. doi:10.1016/S0375-9601(97)00843-8 .[32] J. Donath, Untersuchung alternativer Zeitfensterformen für die quantitative Rekurrenzanalyse anhand von Modellsystemen und Klimadaten,Bachelor thesis, Humboldt Universität zu Berlin (May 2016).[33] J. B. Gao, H. Q. Cai, On the structures and quantiﬁcation of recurrence plots, Physics Letters A 270 (1–2) (2000) 75–87. doi:10.1016/S0375-9601(00)00304-2 .[34] M. Thiel, M. C. Romano, J. Kurths, R. Meucci, E. Allaria, F. T. Arecchi, Inﬂuence of observational noise on the recurrence quantiﬁcationanalysis, Physica D 171 (3) (2002) 138–152. doi:10.1016/S0167-2789(02)00586-9 .[35] J. M. Choi, B. H. Bae, S. Y. Kim, Divergence in perpendicular recurrence plot; quantiﬁcation of dynamical divergence from short chaotic timeseries, Physics Letters A 263 (4–6) (1999) 299–306. doi:10.1016/S0375-9601(99)00751-3 .[36] S. Horai, T. Yamada, K. Aihara, Determinism Analysis with Iso-Directional Recurrence Plots, IEEE Transactions - Institute of ElectricalEngineers of Japan C 122 (1) (2002) 141–147.[37] J. B. Gao, Recurrence Time Statistics for Chaotic Systems and Their Applications, Physical Review Letters 83 (16) (1999) 3178–3181. doi:10.1103/PhysRevLett.83.3178 .[38] C. Ahlstrom, P. Hult, P. Ask, Thresholding distance plots using true recurrence points, Proceedings of the IEEE Conference on Acoustics,Speech and Signal Processing (ICASSP 2006) 3 (1660747) (2006) III688–III691. doi:10.1109/ICASSP.2006.1660747 .[39] E. J. Ngamga, D. V. Senthilkumar, A. Prasad, P. Parmananda, N. Marwan, J. Kurths, Distinguishing dynamics using recurrence-time statistics,Physical Review E 85 (2) (2012) 026217. doi:10.1103/PhysRevE.85.026217 .[40] O. Roessler, An equation for continuous chaos, Physics Letters A 57 (5) (1976) 397 – 398. doi:https://doi.org/10.1016/0375-9601(76)90101-8 .URL [41] R. Barrio, F. Blesa, S. Serrano, Qualitative analysis of the rÃűssler equations: Bifurcations of limit cycles and chaotic attractors, Physica D:Nonlinear Phenomena 238 (13) (2009) 1087 – 1100. doi:https://doi.org/10.1016/j.physd.2009.03.010 .URL [42] L. F. Shampine, M. W. Reichelt, The matlab ode suite, SIAM Journal on Scientiﬁc Computing 18 (1997) 1–22.[43] G. Datseries, Dynamicalsystems.jl: A julia software library for chaos and nonlinear dynamics., Journal of Open Source Software 3(23) (2018)598. doi:10.21105/joss.00598 .[44] P. Grassberger, I. Procaccia, Estimation of the Kolmogorov entropy from a chaotic signal, Physical Review A 9 (1–2) (1983) 2591–2593. doi:10.1103/PhysRevA.28.2591 .[45] K. H. Kraemer, hkraemer/Border-eﬀect-corrections-for-diagonal-line-based-recurrence-quantiﬁcation-analysis-measures: MATLAB routinesfor RQA border eﬀect and tangential motion correction (Jun. 2019). doi:10.5281/zenodo.3384892 .URL https://doi.org/10.5281/zenodo.3384892 Kraemer and Marwan:

Preprint submitted to Elsevier

Page 16 of 22QA border eﬀect corrections

A. Sensitivity of the results to the recurrence threshold

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 17 of 22QA border eﬀect corrections

Figure 12:

Diagonal line length entropy estimates as a function of the recurrence threshold 𝜀 . Shown are results for alldescribed correction schemes for counting diagonal lines (Sect. 3) and suppressing tangential motion (Sect. 4), exceptthe perpendicular recurrence plot 𝑅 ⟂ . In the top panel (A) median diagonal line length entropy values gained from 100realizations of the noise free regular limit cycle regime of the Rössler system are shown, whereas the bottom panel (B)shows its chaotic regime counterpart, see text in Sect. 6 for details. The grey-shaded surface denotes the theoreticalexpectation value (median) computed from Eq. (18) . Results for the diagonal RP and the kelo correction scheme areshown in the bottom right subplot, which is a cutout of the orange bars in the bottom center subplot, here includingerrorbars as two standard deviations from the computed ensemble.Kraemer and Marwan: Preprint submitted to Elsevier

Page 18 of 22QA border eﬀect corrections

Figure 13:

Diagonal line length entropy estimates as a function of the recurrence threshold 𝜀 . Shown are results for alldescribed correction schemes for counting diagonal lines (Sect. 3) and suppressing tangential motion (Sect. 4), exceptthe perpendicular recurrence plot 𝑅 ⟂ . In the top panel (A) median diagonal line length entropy values gained from 1,000realizations of the noise free regular limit cycle regime of the Logistic map are shown, whereas the bottom panel (B) showsits chaotic regime counterpart, see text in Sect. 6 for details. The grey-shaded surface denotes the theoretical expectationvalue (median) computed from Eq. (18) . Results for the diagonal RP and the kelo correction scheme are shown in thebottom right subplot, which is a cutout of the orange bars in the bottom center subplot, here including errorbars as twostandard deviations from the computed ensemble.Kraemer and Marwan: Preprint submitted to Elsevier

Page 19 of 22QA border eﬀect corrections

Figure 14:

Diagonal line length entropy estimates as a function of the recurrence threshold 𝜀 . Shown are results for alldescribed correction schemes for counting diagonal lines (Sect. 3) and suppressing tangential motion (Sect. 4), exceptthe perpendicular recurrence plot 𝑅 ⟂ . In the top panel (A) median diagonal line length entropy values gained from 100realizations of the additive noise contaminated regular limit cycle regime of the Rössler system are shown, whereas thebottom panel (B) shows its chaotic regime counterpart, see text in Sect. 6.1 for details. Here, we added white noise as10% of the mean standard deviation of the multivariate signal gained from the numerical integration. The grey-shadedsurface denotes the theoretical expectation value (median) computed from Eq. (18) . Results for the diagonal RP and thekelo correction scheme are shown in the bottom right subplot, which is a cutout of the orange bars in the bottom centersubplot, here including errorbars as two standard deviations from the computed ensemble.Kraemer and Marwan: Preprint submitted to Elsevier

Page 20 of 22QA border eﬀect corrections

Figure 15:

Diagonal line length entropy estimates as a function of the recurrence threshold 𝜀 . Shown are results for alldescribed correction schemes for counting diagonal lines (Sect. 3) and suppressing tangential motion (Sect. 4), exceptthe perpendicular recurrence plot 𝑅 ⟂ . In the top panel (A) median diagonal line length entropy values gained from 1,000realizations of the additive noise contaminated regular limit cycle regime of the Logistic map are shown, whereas thebottom panel (B) shows its chaotic regime counterpart, see text in Sect. 6.1 for details. Here, we added white noiseas 10% of the standard deviation of the time series. The grey-shaded surface denotes the theoretical expectation value(median) computed from Eq. (18) . Results for the diagonal RP and the kelo correction scheme are shown in the bottomright subplot, which is a cutout of the orange bars in the bottom center subplot, here including errorbars as two standarddeviations from the computed ensemble.Kraemer and Marwan: Preprint submitted to Elsevier

Page 21 of 22QA border eﬀect corrections considered minimal line length l min no r m a li z ed en t r op y normal RP (noisefree chaotic) A considered minimal line length l min no r m a li z ed en t r op y normal RP (noisefree regular) B considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisefree chaotic) C considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisefree regular) D considered minimal line length l min no r m a li z ed en t r op y perpendicular RP (noisefree chaotic) E considered minimal line length l min no r m a li z ed en t r op y perpendicular RP (noisefree regular) F conventionalCensidibokelowindow maskingreference Figure 16:

Normalized diagonal line length entropy estimates as a function of the minimum line length 𝓁 𝑚𝑖𝑛 for noisefreedata from the high sampled Rössler system (cf. Sect. 6.1). In the left panels (A, C, E) the underlying system exhibitschaotic dynamics, whereas the right panels (B, D, F) show their regular counterparts. The normal RPs (A, B) and theperpendicular RPs (E, F) were constructed using a ﬁxed recurrence threshold corresponding to 35% recurrence rate. Thenormal RPs served as input for obtaining the diagonal RPs 𝑅 ↗ (C, D) and for the computation of the perpendicular RPs 𝑅 ⟂ we used an angle threshold 𝜑 = 15 °. The grey shaded areas show medians of ensembles of 1,000 analytically computedreference values for 𝐾 ± two standard deviations of these distributions transformed by using Eq. (18) . B. Sensitivity of the results to noise

Kraemer and Marwan:

Preprint submitted to Elsevier

Page 22 of 22QA border eﬀect corrections considered minimal line length l min no r m a li z ed en t r op y normal RP (noisy chaotic) A considered minimal line length l min no r m a li z ed en t r op y normal RP (noisy regular) B considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisy chaotic) C considered minimal line length l min no r m a li z ed en t r op y diagonal RP (noisy regular) D considered minimal line length l min no r m a li z ed en t r op y perpendicular RP (noisy chaotic) E considered minimal line length l min no r m a li z ed en t r op y perpendicular RP (noisy regular) F conventionalCensidibokelowindow maskingreference Figure 17: