[PDF] A unified and automated approach to attractor reconstruction

Abstract

We present a fully automated method for the optimal state space reconstruction from univariate and multivariate time series. The proposed methodology generalizes the time delay embedding procedure by unifying two promising ideas in a symbiotic fashion. Using non-uniform delays allows the successful reconstruction of systems inheriting different time scales. In contrast to the established methods, the minimization of an appropriate cost function determines the embedding dimension without using a threshold parameter. Moreover, the method is capable of detecting stochastic time series and, thus, can handle noise contaminated input without adjusting parameters. The superiority of the proposed method is shown on some paradigmatic models and experimental data from chaotic chemical oscillators.

Full PDF

AA uniﬁed and automated approach to attractor reconstruction

K. H. Kraemer,

1, 2, 3, ∗ G. Datseris, J. Kurths,

1, 5

I. Z. Kiss, J. L. Ocampo-Espindola, and N. Marwan

1, 31

Potsdam Institute for Climate Impact Research,Member of the Leibniz Association, Telegraphenberg A 31 14473 Potsdam, EU Institute of Physics and Astronomy, University of Potsdam,Karl-Liebknecht-Str. 24–25, 14476 Potsdam, EU Institute of Geosciences, University of Potsdam,Karl-Liebknecht-Str. 24–25, 14476 Potsdam, EU Max Planck Institute for Meteorology, Bundesstr. 53, 20146 Hamburg, EU Institute of Physics, Humboldt Universität zu Berlin, 12489 Berlin, EU Department of Chemistry, Saint Louis University, St. Louis, USA (Dated: January 21, 2021)

Abstract

We present a fully automated method for the optimal state space reconstruction from univariate and multi-variate time series. The proposed methodology generalizes the time delay embedding procedure by unifyingtwo promising ideas in a symbiotic fashion. Using non-uniform delays allows the successful reconstruc-tion of systems inheriting different time scales. In contrast to the established methods, the minimizationof an appropriate cost function determines the embedding dimension without using a threshold parameter.Moreover, the method is capable of detecting stochastic time series and, thus, can handle noise contaminatedinput without adjusting parameters. The superiority of the proposed method is shown on some paradigmaticmodels and experimental data from chaotic chemical oscillators.

Keywords: nonlinear dynamics, complex systems, embedding, state space reconstruction ∗ a r X i v : . [ phy s i c s . d a t a - a n ] J a n . INTRODUCTION State space reconstruction from observational time series often marks the ﬁrst and basic stepin nonlinear time series analysis. Several methods addressed the reconstruction problem, but noneof them allow for a fully automatized and reliable way of embedding a uni- or multivariate setof observed time series with no, or at least very few, free parameters. The aim of this paper isto provide such a technique. The embedding theorems of Whitney [1], Mañé [2], and Takens [3]among with their extension by Sauer et al. [4] allow several approaches to tackle the reconstruc-tion problem. Among using derivative coordinates [5, 6], PCA [7] or Legendre coordinates [8],uniform- and nonuniform time delay embedding [7] is by far the most commonly used technique,due to its appealing simplicity. However, since Takens’ theorem [3] is based on noise-free andinﬁnitely long data, it does not give any guidance to choose the proper time delay(s) τ in practice.Together with the unknown box-counting dimension D B of the observed, but unknown system,which is needed to fulﬁl the embedding dimension criterion m (cid:62) D B +

1, the majority of thepublished articles propose ideas to infer estimates for τ and the reconstruction dimension m fromdata, usually a univariate time series (univariate embedding). The reconstruction problem startswith the unknown system (cid:126) u ( t ) with a mapping f : R D B → R D B , which is observed via a measure-ment function h and lead to M observational time series { s i ( t ) | i = , . . . , M } (Fig. 1). There can bedifferent measurement functions h (cid:48) forming the multivariate dataset s i ( t ) and the combination ofWhitney’s and Takens’ embedding theorems allow for constructing (cid:126) u ( t ) from more than one timeseries (multivariate embedding) [4, 9]. One then tries to ﬁnd optimal embedding parameters m and τ ’s (the delays can be integer multiple of some constant, uniform time delay embedding UTDE ,or different for each embedding dimension, non-uniform time delay embedding NUTDE ) in orderto build reconstruction vectors (cid:126) v ( t ) and, thus, a mapping F : R m → R m . These can be furthermoretransformed by Ψ into (cid:126) v (cid:48) ( t ) and F (cid:48) , preserving the diffeomorphism to (cid:126) u ( t ) . For a detailed intro-duction into the reconstruction problem we refer to Casdagli et al. [10], Gibson et al. [8], Uzal et al. [11] or Nichkawde [12].In time delay embedding the key idea is to use lagged values of the available time series ascomponents of the reconstruction vector (cid:126) v ( t ) = (cid:0) s i ( t − τ ) , s i ( t − τ ) , . . . , s i m ( t − τ m ) (cid:1) . (1)The delays τ j and the corresponding time series s i j , j = [ , . . . , m ] should be chosen, such that theresulting components of the vectors forming the reconstructed state space (cid:126) v ( t ) are as independent2 nknown dynamical systemTime series Time delayembedding FIG. 1: Schematic representation of the embedding/reconstruction procedure. See the text fordetails. This ﬁgure is inspired by Casdagli et al. [10] and Uzal et al. [11].as possible [4, 7], but at the same time preserve the correlation structure of the system to a certainextent. These two competing objectives are also known as the problems of redundancy (delaysshould not be too small) and irrelevance (delays should not be too large) [10, 11, 13, 14] and itsoptimization is goal to any proposed time delay embedding procedure.Despite of its lack of a sound theoretical foundation for higher dimensional reconstructions( m >

2) [15–17], in a univariate scenario (i.e., s i = . . . = s i m = s ( t ) in Eq. (1)), the approach tochoose τ from the ﬁrst minimum of the auto-mutual information [18, 19] is most common. τ isusually set to zero, i.e. the unlagged time series constitutes the ﬁrst component of the reconstruc-tion vectors. The embedding dimension m is then separately determined, usually by a false nearestneighbour approach [20–24] or some other neighbourhood-preserving-idea [25, 26] and all delaysup to τ m are simply integer multiples of τ (UTDE). Other approaches for an appropriate choiceof τ are possible [5, 13, 27–32]. We refer to this as standard time delay embedding (TDE) in thefollowing. More sophisticated ideas [16, 27, 33–42], some including non-uniform delays and the3xtension to multivariate input data [12, 43–48] have been presented [48, 49], but it seems their useis rather limited and not very popular. This could be due to their occasionally very complex natureand the lack of high quality open source implementation in the most commonly used program-ming languages. Another reason could be the fact that standard TDE performs surprisingly well ina range of examples; but still, its limitations should not be neglected, in particular when it comesto noisy time series, systems exhibiting multi-time scale behaviour, or multivariate input data. Thelatter are becoming of increasing interest in the near future, since acquisition costs for sensors anddata collection decrease rapidly. Moreover, the application of complex systems approaches andnonlinear dynamics in different scientiﬁc disciplines receives increasing popularity.Here we propose a fully automated method for an appropriate state space reconstruction of uni-or multivariate time series input, which utilizes two concepts, the continuity statistic [46] and the L-statistic [11]. We brieﬂy review the basic ideas we will use in our proposed method (Sect. II)in order to illustrate their speciﬁc utilization in the algorithm (Sect. III), before applying it tosimulated and experimental data (Sect. IV).

II. REVIEW OF USED CONCEPTS

In order to ensure comprehensibility of our proposed method in Section III we explain thetwo main concepts of it in the following. In Section II A we review the continuity statistic [46]rather detailed, while the

L-statistic is described only brieﬂy in Section II B and extensively in theAppendix F.

A. Continuity statistic by Pecora et al.

In the continuity statistic , the problem of ﬁnding an optimal state space reconstruction withrespect to redundancy and irrelevance is addressed by a statistical determination of functionalindependence among the components of the reconstruction vector [46]. Let { s i ( t ) | i = , . . . , M } be a multivariate dataset consisting of M time series, equally sampled at time instants t = , . . . , N .Suppose we have already chosen some delays τ k to build our temporal reconstruction vector (cid:126) v ( t ) of dimension d . This is, (cid:126) v ( t ) = ( s j ( t + τ ) , s j ( t + τ ) , . . . , s j d ( t + τ d )) , with j k ∈ { , . . . , M } beingthe choices of the different time series and τ k the according delays, which can be – and most oftenare – different. Then for a new potential component of (cid:126) v ( t ) we ask if this new potential component4 oints in d -dimensionalstate space ( d +1) th component ofthose points(1-dimensional)δ-ball ε-interval FIG. 2: Fiducial point (blue) and its k = d -dimensional δ - ball(left panel). Arrows indicate the mapping f : R d → R (right panel), which is the potential ( d + ) th component of each of the points in the left panel (according to a speciﬁc delay τ d + and time series s j d + ), Eq. (2). To decide whether this ε -interval size accepts or rejects the null hypothesis on asigniﬁcance level α the cumulative binomial distribution for getting at least l = ε -interval with probability p is used (modiﬁed after Pecora et al. [46]).can be expressed as a function of the existing components. Mathematically speaking, the equality s j d + ( t + τ d + ) ? = f (cid:0) s j ( t + τ ) , s j ( t + τ ) , . . . , s j d ( t + τ d ) (cid:1) (2)needs to be tested in an appropriate way, i.e., a sensible choice for f : R d → R has to be made.This choice can be based on the property of continuity [50].The practical implementation of Eq. (2) would start with mapping k nearest neighbours, (cid:126) v k ( t ) ,of a ﬁducial point (cid:126) v ﬁd ( t ) from R d → R , as illustrated in Fig. 2. That is, for each of the ( k + d -dimensional points in the left panel a potential ( d + s j d + ( t + τ d + ) is consideredand drawn onto the number line (right panel). The continuity statistic now asks whether these k + ε -interval size by chance, or due to thefact that there is a functional relationship between the d and the ( d + (cid:126) v ( t ) . The according continuity null hypothesis is that l of the k + ε -interval by chance, with probability p on the basis of the binomial distribution.5hen the number of observed neighbours, which are mapped into the ε -interval, is larger thanthe expected number from the binomial distribution for a selected α , i.e. l points, then the nullcan be rejected and, thus, a functional relationship can be assumed. The number of considerednearest neighbours k (i.e., the size of the δ -ball in Fig. 2) also determines the acceptable numberof k + − l points falling outside the chosen ε -interval for a given probability parameter p of thebinomial distribution and a given α . For each candidate delay τ d + and each time series s j d + foreach k (at a given p and α ) there is a minimal spatial scale ε (cid:63) (cid:48) for which the null hypothesis can berejected, i.e., a minimal size of the ε -interval in the right panel of Fig. 2. For the sake of avoiding redundancy while choosing the right delay, an ε (cid:63) (cid:48) for each potential τ d + has to be found. This issimply the distance from the ﬁducial point to its l th-nearest neighbour. By averaging over a rangeof ﬁducial points we eventually get the ﬁnal continuity statistic (cid:104) ε (cid:63) (cid:105) ( τ ) as a function of considereddelay values τ (Fig. 3).The ﬁnal idea for achieving an optimal embedding is a sequential one. For each embeddingcycle D d , i.e. for trying to ﬁnd an appropriate additional component to build a reconstructionvector (cid:126) v ( t ) of dimension d +

1, initially starting with a 1-dimensional time series (cid:126) v ( t ) = { s i ( t ) | i = , . . . , M } , the (cid:104) ε (cid:63) (cid:105) ( τ ) values for a range of possible delay values τ and for each time series s i ( t ) gets computed. The τ of the highest relative maximum of (cid:104) ε (cid:63) (cid:105) ( τ ) is selected as the optimal delayfor this embedding dimension d . This delay is used to build up the temporal reconstruction vectors (cid:126) v ( t ) with the according time series. From here the next embedding cycle D d + gets executedand the process gets repeated until some break criterion terminates the procedure, i.e., when asufﬁciently high embedding dimension m is reached.Even though the idea of the continuity statistic is indeed promising, in this approach severalunanswered questions remain, making the proposed idea not suitable for a fully automated em-bedding approach. i) The choice of p = . α = .

05 or α = .

01 is standard in science (see Figs. 10, 11, 12 in Appendix C), so we cansafely ﬁx them to these values. What is not so clear, but highly relevant for the method, is how tochoose the optimal delay τ from the continuity statistic. Speciﬁcally, we might ask what “relativemaximum” exactly means and if there is any objective criteria for that choice. Moreover, it isalso not clear how to obtain the continuity statistic in the ﬁrst place with respect to the size of theneighbourhood, i.e. the size of the δ -ball in Fig. 2. We propose to vary k from k = k =

14, for each considered delay τ and take the minimum of all trials ε (cid:63) (cid:48) (and6nally average over all ﬁducial points in order to obtain (cid:104) ε (cid:63) (cid:105) ). This is allowed, because there isno preferred choice of k , but a lower bound (see Table 1 in Pecora et al. [46]), and generally thechoice depends on the amount of available data and its sampling rate. ii) In the original study, it was proposed that the continuity statistic on its own provides abreaking criterion for the method, namely, when “ (cid:104) ε (cid:63) (cid:105) remains small out to large τ , we do not needto add more components.” [46] However, this is no objective criterion and introduces a statistic,which would quantify small , and also a threshold, which determines when small is small enough .Due to folding and stretching of the attractor for high delay values τ , especially in case of chaoticsystems, we expect (cid:104) ε (cid:63) (cid:105) to increase with higher τ , anyway. For these cases a (computationallyintensive) irrelevance measure, the undersampling statistic , has been proposed [46]. Nevertheless,even though the undersampling statistic prevents the choice of irrelevant delays, it does not tellwhich of the local maxima of the continuity statistic we should pick and when to stop adding morecomponents to the reconstruction vectors [51]. As an alternative to assess the irrelevance, the L-statistic has been suggested [11] which will be later used for our automated approach to attractorreconstruction. B. L -statistic by Uzal et al. The

L-statistic is an objective cost function, which quantiﬁes the goodness of a reconstruc-tion, independent of the reconstruction method [11]. It has two free parameters, k and T M . Theapproach uses ideas of noise ampliﬁcation and minimization of the complexity of the reconstruc-tion, which lead to a variable σ , and combines it with an irrelevance measure α . Speciﬁcally, themethod estimates the conditional variance of neighbourhoods consisting of k nearest neighboursas the relative dispersal of these neighbourhood points with respect to the center of mass of thatneighbourhood T time steps ahead. Eventually this conditional variance estimate gets averagedover a range of prediction horizons T up to a maximum value T M and is normalized with respectto the original neighbourhood size, thus deﬁning σ . The irrelevance measure α is basically theaveraged inter-point distance, which depends on the sampling. The ﬁnal statistic is then deﬁned as L k = log ( α k σ k ) , (3)where the index k indicates the dependence on the chosen number of nearest neighbours. A de-tailed description can be found in Appendix F. The authors showed, that the L -statistic converges7or any k ≥

3. Our analysis (Fig. 10 in Appendix C) supports this assumption and, thus, we canﬁx k =

3. However, the second free parameter T M will alter the resulting L -statistic at any value.Particularly in the way we want to utilize the concept of this cost function in our automated embed-ding scheme, we need to tackle this parameter dependency (see Sect. III). It is worth mentioningthat the L -statistic inherits the minimization of a mean squared prediction error (MSE) (here com-puted using a local constant model based on the ﬁrst k neighbours) and the FNN-statistic proposedby Kennel et al. [20] (when k = T M = τ ). III. NEW RECONSTRUCTION METHOD

The L -statistic (Sect. II B, Appendix F) on its own could guide the reconstruction problem onﬁnding the optimal delay values and a sufﬁciently high embedding dimension, when used in abrute-force-approach, i.e., scanning all possible delay values of all available time series in everysingle possible combination. It is not clear a priori how to set the parameter T M for obtainingthe L -statistic, so the described procedure has to be repeated for a range of values for T M . Inmost cases, this is not computationally feasible and, therefore, not suitable for a fully automatedembedding approach. We propose to combine the continuity statistic (Sect. II A) and the L-statistic (Sect. II B). The continuity statistic (cid:104) ε (cid:63) (cid:105) guides the preselection of potential delays τ and timeseries { s i ( t ) | i = , . . . , M } in each embedding cycle D d , whereas the L -statistic decides which oneto pick.This algorithm works recursively. An embedding cycle D d determines the optimal time delayand its corresponding time series and enables us to build the actual reconstruction vectors (cid:126) v act fromthis, having dimension d +

1. Algorithm 1 and Fig. 3 explains the method in detail, which we referto as “PECUZAL” algorithm in honour of its roots.(1) For the actual reconstruction vectors (cid:126) v act , in each embedding cycle D d , (cid:104) ε (cid:63) (cid:105) i ( τ ) is computedfor all available M time series s i ( t ) . We comment on the procedure in the ﬁrst embedding cycle D further below.(2) We consider all those delays τ j for each (cid:104) ε (cid:63) (cid:105) i ( τ ) , which correspond to local maxima ˆ τ j in (cid:104) ε (cid:63) (cid:105) i ( τ ) . These delays τ j (and their corresponding time series s i j ( t ) ) are used to build temporaryreconstruction vectors (cid:126) v temp ( τ j , s i j ) , by concatenating (cid:126) v act with the τ j -lagged time series s i j ( t ) .(3) For each (cid:126) v temp ( τ j , s i j ) and the actual reconstruction vectors (cid:126) v act the L -statistic is simultane-ous computed for many parameters T M (c.f. Fig. 4), and we denote them as L (cid:126) v temp ( τ j , s i j ) ( T M ) B y ( t + ) y(t+10) y(t) C on t i nu i t ys t a t i s t i c ⟨ ε * ⟩ Delay τ st embedding cycle ( τ =10)2 nd embeddingcycle ( τ =5)3 rd embedding cycle (no valid τ ) τ =5 32–4 –3 –2 –1 0 1 Δ L = –0.533 Δ L = –0.427 Δ L = –0.163 Δ L = –0.104 Δ L = –0.325 Δ L = +0.314 Δ L = –0.041 Δ L = +0.032 Δ L = +0.339 τ =10 FIG. 3: Illustration of the proposed embedding procedure for a univariate case, using the y -component of the Lorenz system (Appendix E 1). (A) Blue, yellow, and green lines representthe continuity statistics (cid:104) ε (cid:63) (cid:105) for the three embedding cycles the Algorithm 1 computes. Trianglesidentify the τ values corresponding to local maxima of (cid:104) ε (cid:63) (cid:105) . Then, the local maximum whichcorresponds to the maximum decrease of the L -statistic with respect to the actual reconstructionvectors (cid:126) v act , denoted as ∆ L (orange triangle) is chosen as a delay time in each embedding cycle. Inthe third embedding cycle the cost-function cannot be minimized any further, i.e. all peaks lead toa non-negative ∆ L . In this case no delay value is chosen and the algorithm breaks. (B) We end upwith a 3-dimensional embedding and lags τ = , τ = , τ = L (cid:126) v act ( T M ) . We compute the maximum L -decrease for (cid:126) v temp ( τ j , s i j ) with respect to (cid:126) v act as ∆ L temp ( τ j , s i j ) = min T M [ L (cid:126) v temp ( τ j , s i j ) ( T M ) − L (cid:126) v act ( T M )] . This way T M is not a free parameter anymore.(4) The delay-time series combination ( τ j (cid:48) , s i j (cid:48) ( t )) , which yields the minimum ∆ L value will bepicked for this embedding cycle D d , if ∆ L <

0, and is used to construct the actual reconstructionvectors (cid:126) v act = (cid:126) v temp ( τ j (cid:48) , s i j (cid:48) ) of dimension d + (cid:126) v act is passed into the next embedding cycle D d + .(6) We repeat steps (1) to (5) until we cannot minimize the L -statistic any further, i.e. when eachconsidered potential embedding (cid:126) v temp will yield a positive ∆ L . In this case the reconstructioncannot get any better and we stop. (cid:126) v act constitutes the ﬁnal reconstruction. Thus, the L -statisticprovides a break criterion, without the introduction of any threshold parameter. Each embeddingcycle ensures the maximum possible decrease of the cost-function.9 lgorithm 1 Pecuzal Embedding

Input:

A uni- or multivariate dataset consisting of M time series s i with same length and sampling and arange of possible delay values τ = τ max Normalize all M time series to zero mean and unit variance Set ∆ L min = − inf while ∆ L min < do if D then Compute (cid:104) ε (cid:63) (cid:105) ik ( τ ) for all M pairwise combinations of s i , s k for the given τ ’s for each peak ˆ τ j in each (cid:104) ε (cid:63) (cid:105) ik ( τ ) do Create (cid:126) v temp by appending the time series s i with the τ k -lagged time series s k Compute the L -statistics for (cid:126) v temp and s i for a range of parameter values T M = τ max ,denote them as L temp ( T M ) and L s i ( T M ) Compute ∆ L temp = min T M [ L temp ( T M ) − L s i ( T M )] end for From all ∆ L temp j take the τ j , which corresponds to the peak with minimum ∆ L , ∆ L min Save ∆ L min and (cid:126) v temp else if D d then Compute (cid:104) ε (cid:63) (cid:105) i ( τ ) for (cid:126) v act and all s i for the given τ ’s for each peak ˆ τ j in each (cid:104) ε (cid:63) (cid:105) i ( τ ) do Create (cid:126) v temp by appending (cid:126) v act with the τ i -lagged time series s i Compute the L -statistics for (cid:126) v temp and (cid:126) v act for a range of parameter values T M = τ max ,denote them as L temp ( T M ) and L act ( T M ) Compute ∆ L temp = min T M [ L temp ( T M ) − L act ( T M )] end for From all ∆ L temp j take the τ j , which corresponds to the peak with minimum ∆ L , ∆ L min Save ∆ L min and (cid:126) v temp end if if ∆ L min < then Set (cid:126) v act = (cid:126) v temp end if end while Set (cid:126) v ﬁnal = (cid:126) v act Output:

The ﬁnal reconstruction vectors (cid:126) v ﬁnal We give some remarks on the proposed idea: i) In case of the very ﬁrst embedding cycle the actual reconstruction vectors (cid:126) v act are not yetdeﬁned. Roughly speaking, the algorithm needs to ﬁnd a time series to start with and which canact as the ﬁrst component of the ﬁnal reconstruction vectors. As explained in Algorithm 1, each ofthe available time series serves as (cid:126) v act in the ﬁrst run and consequently M continuity statistics getcomputed in the ﬁrst step, i.e., for each combination ( i , k ) of the given time series s i ( t ) . Note thatwe always use the unlagged available time series s i ( t ) , which will constitute the ﬁrst componentof the reconstruction vectors, i.e. we set τ =

0. The continuity statistic reveals the correlationstructure of its input, meaning that a lagged initial time series would lead to different consecutivedelay values. However, the difference of the ﬁnally chosen delay values as well as the total timewindow of the reconstruction Tw = max ( τ , τ , . . . ) would be identical, because the correlationstructure does not change, at least in case of inﬁnite data. Practically, any τ (cid:54) = τ in the sense of a minimal ∆ L for the ﬁrstembedding cycle D would increase the computation time tremendously. ii) The L -statistic can not serve as an absolute measure, due to its dependence on the parameter T M . Consider a time series and a potential 2-dimensional embedding consisting of this time seriesand a τ -lagged version of it. This corresponds to the ﬁrst embedding cycle D in Algorithm 1.Figure 4 shows the L -statistic for a range of parameters T M for the single time series and for thepotential 2-dimensional reconstruction for two deterministic systems (panels A, B) and in case ofthe time series being uniformly distributed random numbers (panel C).Whenever the L -statistic of the 2-dimensional reconstruction (orange graph) is lower than theone from the single time series (blue graph) an embedding would be beneﬁcial (gray shaded areasin Fig. 4). But this is not always the case throughout the course of T M (see panel B). The conclusionis, that it is not meaningful to judge a reconstruction on a single L -value, gained from a ﬁxed T M .A reconstruction is always related to a function L ( T M ) of the parameter T M and it is sensible to11IG. 4: Illustration of the determination of ∆ L within the embedding procedure for the ﬁrst embed-ding cycle in a univariate case, using (A) the y -component of the Lorenz system (Appendix E 1),(B) the time series of the second node of a N = L -statistics for the single time series (bluegraphs) and a 2-dimensional embedding with a τ -lagged version of itself (orange graphs) for arange of parameter values T M . We set the number of nearest neighbours, which constitute a neigh-bourhood, necessary for computing the L -statistic, to KNN = τ from the ﬁrst min-imum of the corresponding auto-mutual information. The possible decrease of the L -statistic forthis hypothetical embedding cycle with the chosen τ is simply ∆ L ( T M ) = L orange ( T M ) − L blue ( T M ) .When ∆ L ( T M ) < ∆ L ( T M ) > ∀ T M a further embedding is notbeneﬁcial. As expected that is the case for the stochastic signal in panel C. The Algorithm 1 auto-matically picks the ﬁrst minimum ∆ L over all T M , which has also been the global minimum withrespect to all T M throughout all examples we have considered so far.look at relative L -discrepancies between two consecutive embedding cycles, namely ∆ L . It turnsout that some stochastic signals will yield a negative ∆ L for T M =

1. In panel C of Fig. 4 this is notthe case, but the proximity of the two curves only for T M = L -statistic inherits the mean squared prediction error (see Eq. (F5)), and a one-sample-prediction horizon is simply too short, to properly distinguish deterministic from stochasticsources. As a consequence we compute the L -statistics in each embedding cycle for T M -valuesstarting with T M =

2. Thus, for any embedding cycle each peak of the continuity statistic does12ot receive a certain absolute L -value, but rather a maximum possible decrease of L , ∆ L , withrespect to the actual embedding (Fig. 3). Then one simply picks the peak, which yields the largestdecrease. We can not rule out the possibility that we could obtain a lower overall decrease in L forall embedding cycles by taking a different “path”, i.e. not go for the maximum decrease of L ineach embedding cycle. This would correspond to achieving a local minimum of the cost functionin the parameter and embedding cycle space. iii) We propose to stop the embedding process, when ∆ L > (cid:126) v temp . One could think about incorporating a threshold, a small negativenumber, e.g. ∆ L > − .

05, to avoid only tiny decreases of the cost function encountered in anembedding cycle. Throughout all our computations this has not been necessary and, therefore, wedispense on such an additional parameter. iv)

The continuity statistic (cid:104) ε (cid:63) (cid:105) itself contains information about the correlation structure of thedata (cf. Sect. II A), which makes it valuable in the context of an automated embedding procedureas proposed here, especially for multivariate input. Not only that the ﬁrst maximum most oftencoincides with the value we would obtain from the ﬁrst minimum of the mutual information,but the continuity statistic of two time series “levels off” at a certain value range. The absolutevalue of (cid:104) ε (cid:63) (cid:105) represents the correlation structure of the data we are looking at and quantiﬁes theindependence from each other for a speciﬁc time lag. This fact allows our method to pick onlytime series from a multivariate set, which belong together, and, consequently, in combination withthe corresponding decrease of the L -statistic, ∆ L , avoid stochastic signals (cf. Fig 4C, Table I). v) The time complexity of the proposed method is O ( N log N ) as illustrated in Fig. 9 (AppendixA). IV. APPLICATION, COMPARISON & RESULTS

We apply the proposed method to a range of different systems, artiﬁcial and experimental data,exemplifying its advantage over established methods. Speciﬁcally, we compare our method tothe standard time delay embedding (TDE). We estimate the uniform delay τ by means of the ﬁrstminimum of the auto-mutual information [18] and estimate the appropriate embedding dimensionby using Cao’s method [22]. Speciﬁcally, we automatically select the appropriate embeddingdimension, when the E -statistic reaches the saturation value within a given threshold of the slope(we picked a slope of < . d E -statistic in case of G&A’s method is set to 10, as suggestedby the authors. We estimate the decorrelation time by using the ﬁrst minimum of the auto-mutualinformation and use it as the Theiler window [53] in all approaches. In the multivariate inputcases, we pick the maximum from all ﬁrst minima of all auto-mutual information statistics. Fordistance computation, the Euclidean norm is used. A. Reconstruction evaluation statistics

In order to compare our approach with the established methods we need to quantify the good-ness of the embedding. For this, we will consider six statistics. In addition to the overall decreaseof the L -statistic, that is ∆ L = ∑ m − i = ∆ L i , where m is the embedding dimension and ∆ L i the cor-responding L -decreases in the encountered embedding cycles, we use two other statistics, whichalso reﬂect the neighbourhood relations of the reconstruction compared to the reference. One is themutual false nearest neighbour statistic [54]. Instead of estimating the coupling strength/ degreeof synchrony of two coupled oscillators, we use the statistic for assessing the similarity betweenthe reference (time series gained from numerical integrations) and the reconstruction:MFNN = N N ∑ i = ∑ Kk = | (cid:126) v i − (cid:126) v i ref k | ∑ Kk = | (cid:126) u i − (cid:126) u i ref k | ∑ Kk = | (cid:126) u i − (cid:126) u i rec k | ∑ Kk = | (cid:126) v i − (cid:126) v i rec k | , (4)where (cid:126) u i are the vectors of the reference/original system, (cid:126) v i , i = , . . . , N the reconstruction vectors, i ref k the indices of the K -nearest neighbours of index i in the original system and i rec k , k = , . . . , K the corresponding indices measured in the reconstruction. We propose the comparison of K near-est neighbours instead of just focusing on the ﬁrst nearest neighbour, in order to receive morerobust results in the presence of noise. By sampling the data sufﬁciently high we allow the precisedetermination of quite large neighbourhoods of a ﬁducial point, so we set K =

10 in our computa-14ions. The results vary in their absolute values with different choices of K , but the order of rank forthe different test methods and their relative difference remain approximately constant. MFNN = R ref , and the RP of the reconstruction, R rec ).JRRF = ∑ Ni , j JR i , j ∑ Ni , j R ref i , j , JRRF ∈ [ ] (5) JR = R ref ◦ R rec . (6)We compute both, R ref and R rec , by ﬁxing the recurrence threshold corresponding to a globalrecurrence rate of 8%. This is also to ensure comparability of the recurrence quantiﬁers describedbelow [58]. Results a fairly similar for a wide range of choices of the recurrence rate we triedand the particular choice (in our case 8%) is not so important, since we apply them to all RP’s wecompare. It is, of course, also possible to compare different recurrence plot quantiﬁers gained from R rec to the ones derived from R ref [57]. We here choose the determinism (DET), the diagonal linelength entropy (ENTR) and the recurrence time entropy (RTE) (Appendix D). The latter two arerelated to the Kolmogorov-Sinai-Entropy [59, 60], but do not serve as straight forward estimators,when necessary corrections on the RP and its quantiﬁers are ignored [61]. B. Paradigmatic examples

We investigate three paradigmatic chaotic systems, the Rössler system in the funnel regime(Appendix E 3), a driven Dufﬁng oscillator in regular motion (Appendix E 3) and the Mackey-Glass delay equation (Appendix E 5). For all systems we compare the mean values of the eval-uation statistics from ensembles of 1,000 trajectories with different initial conditions. Table II inAppendix B summarizes all results, also including uncertainties and results for 10% additive mea-surement noise. In order to easily compare the evaluation statistics, we use the relative deviationfrom the reference (e.g., | DET rec − DET ref | DET ref ), except for the MFNN and the ∆ L -statistic, where we usethe relative deviation from the best score (i.e., | MFNN rec − MFNN best | MFNN best ). For a concise visual presen-15ation we use spider plots in the following and plot , i.e. the closer to unity thevalue gets, the better the accordance to the reference or the best achieved value is. (i)

For the chaotic Rössler system, we reconstruct the state space for the univariate case (usingthe y -component, in order to allow TDE having the best chances [62]) and for the multivariatecase (using the x - and y -component). An overview over the results are shown in Figure 5. ForTDE in the multivariate case we take the results from the univariate example, because TDE cannothandle multivariate input. Note that this leads to different relative values for MFNN and ∆ L , sincewe plot the deviation to the best score in these cases. The PECUZAL method performs best inthe univariate as well as in the multivariate scenario, with improved outcomes for the multivariateone, as expected. This also holds in case of applied measurement noise, where our method evenexpands its lead for most measures (Table II). Surprisingly, TDE also provides very good resultsin the univariate case and in the multivariate setup Garcia & Almeida’s method yields an overalllarger deacrease of the L -statistic than PECUZAL, but only in the noise free case (see Table II).This lower ∆ L is not reﬂected in better performance of the other evaluation statistics. Speciﬁcally,the diagonal line length entropy values differ from the true reference value in the double-digitpercentage range for G&A’s method. (ii) The overall rating also holds for the driven Dufﬁng oscillator (Fig. 6), but there are severedifferences. In contrast to the chaotic Rössler system here it seems that the additional given timeseries for the multivariate scenario do not improve the reconstructions signiﬁcantly. The

JRRF donot get better and remain on a quite high level of 80% (G&A) up to 84% (PECUZAL) accordancewith the reference recurrence plot R ref . This is also reﬂected in the ∆ L statistic, which also does notimprove in the multivariate case for G&A and MDOP, and only slightly decreases for PECUZAL.The same story is told by looking at the other evaluation statistics summarized in Table II. It isimportant to note that under noise our proposed method outperforms the others in almost all cases,but in principle we notice that the signal to noise ratio seems to be very low and biases the resultsof all methods more than in case of the Rössler example. Speciﬁcally in case of RTE deviations tothe true reference value increase from the low single digit percentage range up to 62% in the noisycase. The reason is that in regular motion we expect a near zero value of

RTE and noise easilyblurrs the diagonal lines in the recurrence plot, leading to a broader distribution of recurrencetimes and, thus, to randomly elevated

RTE values. The very same problem make the

ENTR valuesdeteriorate for all methods, here already apparent in the noise free case. In a regular motion system

ENTR is biased the most when no correction of the diagonal lines is performed, which has been16 össler (y-component)

JRRF MFNNΔLDETENTRRTE

Rössler (x- & y-component)

JRRF MFNNΔLDETENTRRTE

A B standard TDE Garcia & Almeida MDOP PECUZAL

FIG. 5: (Relative) deviation of reconstruction by standard time delay embedding (TDE), the Garcia& Almeida (G&A), Nichkawde’s (MDOP), and PECUZAL methods, comparing the accordanceof the RPs of the reconstructed attractor and the reference attractor (JRRF), mutual false nearestneighbors (MFNN), ∆ L -statistic, as well as the recurrence quantiﬁers determinism (DET), diago-nal line length entropy (ENTR), and the recurrence time entropy (RTE). (A) Univariate case usingthe y -component of the numerically integrated Rössler system (Appendix E 3) and (B) multivariatecase using the x - and y -values of the Rössler system. Since TDE cannot handle multivariate inputwe take the values from the univariate case here for illustrative reasons, which result in differentrelative values in case of MFNN and the ∆ L -statistic. For these measures we plot the

1- relativedeviations to the best score , which increases in the multivariate case. For the other statistics weplot

1- relative deviations to the reference score , i.e. the closer to unity the value gets, the betterthe accordance to the reference or the best achieved value is.shown by Kraemer and Marwan [61]. The proposed skeletonizaton of the recurrence plots for thepurpose of reducing the bias in the diagonal line based recurrence plot quantiﬁers, such as

ENTR and

RTE , lead to way better results in the one digit percentage range even for noise, as expected.Due to the computational complexity of the skeletonization algorithm [61] we did not apply thiscorrection scheme to all of the 1,000 runs in this experiment, but rather tried it on small samplesnot shown here, in order to understand the bad performances for all methods in case of

ENTR and

RTE . 17 tandard TDE Garcia & Almeida MDOP PECUZAL

A B

Duffing (x-component)

JRRF MFNNΔLDETENTRRTE

Duffing (x- & y-component)

JRRF MFNNΔLDETENTRRTE

FIG. 6: Same as Fig. 5, here for the Dufﬁng system (Appendix E 4). (iii)

In contrast to the above systems, the Mackey-Glass system (Appendix E 5) is inﬁnite di-mensional and we have do deal with a univariate time series from the numerical integration, whichis why we do not have a “reference” value we could base our computations on. Therefore, we canonly use the ∆ L -statistic (Table II). The proposed PECUZAL method performs signiﬁcantly betterthan the other methods with Garcia & Almeida’s method coming second, especially in the pres-ence of noise. Here all methods but the proposed one yield reconstructions with correspondingpositive ∆ L -values, i.e. the suggested embeddings do not decrease the overall L -statistic. Ad-mittedly, PECUZAL also comes with a very small negative ∆ L value, with 0 falling within the1 σ -interval. This ﬁnding indicates a too low signal to noise ratio, which we comment on below.Farmer [63] conjectured a linear relation between the delay chosen in the Mackey-Glass equationand the corresponding dimensionality of the attractor (c.f. Table I in [63]). A linear ﬁt to the dataof that table suggests an attractor-dimension of d A ≈ . ≈ . ≈ . ± ( . ± . ) (TDE), 3 ± ( . ± ) (G&A), 4 ± ( ± ) (MDOP) and 7 ± ( . ± . ) (PECUZAL). The bracketed values corre-spond to the case of 10% additive noise. While all methods meet Farmer’s conjecture in the noisefree scenario, this does not hold for the MDOP method an the proposed method in the noisy case.Both methods suggest too low embedding dimensions. This is due to the fact that the signal tonoise ratio is apparently too low and PECUZAL treats the signal as a stochastic source for some18ealizations where it does not embed the data at all, while it did not do it in case of the Rössler andDufﬁng system, despite the same variance of the white Gaussian noise. Results from G&A andTDE do fall in the 95% conﬁdence interval, which is large, because of the weak data basis given inRef.[63]) and the resulting uncertain ﬁt. We ﬁnd the time window of the embedding, i.e., the totaltime span covered by a reconstruction vector, decreasing with increasing noise level throughoutour experiments. This is very much in line with the ﬁndings of [64].We ﬁnally look at two made up, ill-deﬁned multivariate datasets, in order to see how the G&A,MDOP, and the PECUZAL method cope with redundant data and with stochastic signals. (i) First we construct a dataset consisting of six time series (

Fooling dataset I ). The ﬁrst twotime series are the x - and y -component of the Rössler system (Appendix E 3), the third and fourthtime series are the x - and y -component of the Lorenz system (Appendix E 1), the ﬁfth time seriescorresponds to x + y Rössler , whereas the sixth time series is set to x Lorenz · y Lorenz + y Lorenz . Ourproposed method does not mix time series from both systems and sticks to one system (Lorenz inthis case) (Tab. I). It suggests a 3-dimensional embedding and also does not need the redundantinformation stored in the ﬁfth and the sixth time series of the input dataset. In contrast, G&Aand MDOP fail here, suggesting a 3-dimensional and a 6-dimensional embedding, respectively,mixing up the different systems yielding a useless reconstruction (Fig. 7). (ii)

The second made-up dataset (

Fooling dataset II ) consists of three time series of length5,000, with the ﬁrst one being an auto-regressive process of order 1 with parameters (0, 0.9) andinitial condition u = .

2, to mimic coloured noise. The second and third time series are Gaussianand uniform random numbers. While G&A and MDOP embed the non-deterministic time series,our proposed algorithm suggests no embedding and throws an error (Tab. I). The reason is, thatthe L -statistic is a monotonically increasing function of the embedding dimension for stochasticdata for any prediction horizon parameter T M , i.e., the algorithm cannot minimize L already inthe ﬁrst embedding cycle. For the sake of completeness we have to stress that this particularexample should not be read as a claim of a generalizable behaviour of our proposed method todeal with auto-regressive processes of arbitrary order p . In the case of higher-order AR processes,PECUZAL often suggests an embedding with a dimension that corresponds to the order of theAR process, as we would expect it theoretically. We have noticed, however, that the embeddingdepends heavily on the length of the time series used, which is in line with the ﬁndings of Holsteinand Kantz [43], but also of the choice of the particular AR-parameters and the order of the ARprocess under study. A systematic consideration of PECUZAL’s embedding suggestions for this19lass of processes is beyond the scope of this paper. Garcia & Almeidadelays: [0 1 1]ts: [1 4 1] MDOPdelays: [0 20 16 20 0 23]ts: [6 2 1 3 3 4] PECUZALdelays: [0 3 0]ts: [3 4 4]

A B C

FIG. 7: Reconstructions of the

Fooling dataset I (see text for details). In case of the MDOPmethod (panel B), we plot the ﬁrst three components of the 6-dimensional trajectory. While G&Aand MDOP methods mix timeseries from different systems, leading to nonsensical reconstructions(panels A and B), our proposed method (panel B) sticks to time series from one system (Lorenz inthis case).TABLE I: Embedding dimension, the accordingly chosen time series and corresponding time lagsfor the fooling datasets, mimicking mixed deterministic data from different systems and redun-dant time series as well as stochastic time series. We compare Garcia & Almeida’s (G&A),Nichkawde’s (MDOP), and our proposed PECUZAL method. m chosen time series Delay’s

System

G&A MDOP PECUZAL G&A MDOP PECUZAL G&A MDOP PECUZAL

Fooling dataset I

Fooling dataset II

C. Experimental data

We will now utilize the PECUZAL embedding method on experimental data. Speciﬁcally, welook at a chaotic time series from electrochemical oscillations. The experiment was performedwith the chaotic electrodissolution of nickel in sulfuric acid [65]. A standard three-electrode20lectrochemical cell was used with a 1-mm diameter nickel wire as working, a Pt counter, anda Hg/Hg SO / sat. K SO reference electrode. The electrolyte was 4.5 M H SO at 10°C. Thenickel wire was connected to the working point of the potentiostat through an individual resistance( R ind ), and a potentiostat (Gill AC, ACM Instruments) applied a constant circuit potential ( V , withrespect to the reference electrode). At a given circuit potential, the rate of the metal dissolution,measured as the current, can exhibit chaotic oscillations due the hidden negative differential re-sistance and additional nonlinear processes related to the passivation kinetics [66]. About 500current oscillations were recorded at a data acquisition rate of 200Hz, which corresponds to about200 points per cycle and a time series length of N = , R ind = 1.5 k Ω and V = 1,360 mV.FIG. 8: (A) Entire z-standardized and downsampled time series from electrochemical oscillationsof length N = ,

000 and (B) a sub-sample of length ˆ N = , T obtained from RPsof 1 ,

000 sub-samples to their reference values obtained from RPs of the entire time series for thefour reconstruction methods.We demonstrate the ability of the proposed embedding method to cope with rather small tointermediate sized experimental datasets. We ﬁrst downsample the time series to N = , T ), see Appendix D. We denote each of these values for each of the reconstruction methodas its reference values. We then repeat the described procedure for 1 ,

000 sub-samples of lengthˆ N = ,

500 drawn from the time series at random (shown exemplary in Fig. 8B), i.e. for each of the1 ,

000 sub-samples we compute the reconstruction for each of the four methods, its correspondingRP and the RQA-quantiﬁers. This will result in distributions for ENTR, LAM, RTE and T foreach reconstruction method. Finally we compare the medians of these distributions to their refer-ence values and plot the relative deviation | RQA median(distr.) − RQA re f | RQA re f in Fig. 8C. The capability of thefour methods to allow for satisfying estimates from short time series samples differs strongly forthe different RQA-quantiﬁers. The largest discrepancies to the reference can be noted for TDE incase of T and RTE. For LAM all methods estimate the reference very well from the subsamples.While our proposed method slightly comes last in case of ENTR, it yields the best results for T and LAM and is performing well in case of RTE. The example shown here can not be generalized,but it underpins our claim that PECUZAL provides robust state space reconstructions for a verybroad range of processes under different conditions, which are often better, but always at leastequally well than the ones obtained from the established methods. V. CONCLUSIONS

A fully automated approach for state space reconstruction from uni- or even multivariate timeseries is proposed and compared to established methods. The algorithm works iteratively and ap-pends the reconstruction vectors in each embedding cycle with an appropriate time delay and anaccording time series until a cost function cannot be minimized further. Its core functionality isbased on identifying potential time delays and its corresponding time series in each embeddingcycle by using the continuity statistic. For each of those delays, temporary reconstruction vectorsare build and the cost function is computed. The delay value, which yields the maximum decreaseof the cost function is selected. If none of the considered delay values yields a decrease of thecost function the reconstruction can not get any better and the ﬁnal embedding is obtained withoutthe need of setting any threshold parameter. Usually the time delays chosen that way are not sim-ply multiples of each other, but rather reﬂect even complex correlation structures within the data.22his is why the algorithm is also able to detect time series stemming from a stochastic source,which it will not embed. Except from computing the decorrelation time of the data for providinga valid Theiler window for the nearest neighbour search, and providing a range of possible de-lay values the algorithm shall encounter, there are neither any data preprocessing steps necessary,nor any free parameters need to be adjusted before using the proposed routine. The approach isdemonstrated on a variety of exemplary systems as well as on experimental data stemming fromchaotic chemical oscillators. We ﬁnd that it provides often better, but always at least equally wellreconstructions than the established methods. It is furthermore capable of providing meaningfulreconstructions for rather short time series, which particularly holds for the case of multivariateinput. The additional computational effort in comparison to standard time delay embedding ismanageable and justiﬁed. Since the proposed method works automatically, is basically parameterfree, and can handle multivariate input without mixing data originating from different systems, wecan think of a wide range of potential applications. This is especially true for scenarios, wheremultiple sensors or channels of a detector monitor real world processes, which are not isolated ob-servables, i.e., in engineering, earth- and life science contexts. The provided software (AppendixA) in three common coding languages will facilitate the use of the presented method.

ACKNOWLEDGMENTS

K.H.K thanks Dr. Paul Schultz and Maximilian Gelbrecht for fruitful discussions and importanthints and Dr. Jaqueline Lekscha for the idea to understand and implement the continuity statistic inthe ﬁrst place. I.Z.K. acknowledges support from the National Science Foundation (NSF) (GrantNo. CHE-1900011). This work has been ﬁnancially supported by the German Research Founda-tion (DFG projects MA4759/8 and MA4759/9).23 ppendix A: Implementation and code availability

The study that we present here is available as a fully reproducible code base [67]. In ad-dition, we have implemented performant versions of the embedding algorithms, as well asthe automated pipeline for optimal embedding. A single package is provided for Python [68](https://pypi.org/project/pecuzal-embedding/) and a toolbox for MATLAB ® [69],(https://de.mathworks.com/matlabcentral/ﬁleexchange/86004-pecuzal-embedding-algorithm-for-matlab) while for the Julia language we have integrated the PECUZAL algorithm into the libraryDynamicalSystems.jl [70] (the other methods “TDE, Garcia&Almedia, MDOP” were already partof the library). The automated pipeline for optimal embedding has been further reﬁned for betteruser experience and is also part of DynamicalSystems.jl.The Julia versions of the algorithms have been heavily optimized and are the most performantones by orders of magnitude in parts, when compared to the Python (and MATLAB ® ) imple- t i m e [ s ] Median computation time of reconstruction methods

TDEG&AMDOPPECUZALN log(N)

FIG. 9: Median time complexity for the state space reconstruction from the y -component of theRössler system (Appendix E 3) for TDE, Garcia & Almeida’s, MDOP, and PECUZAL method.Ensembles of function calls for each considered time series length N are computed and the medianvalues are shown (implemented in BenchmarkTools.jl [71]). A Theiler window is set as the ﬁrstminimum of the auto-mutual information and the maximum encountered time delay is set to fourtimes the Theiler window for TDE, G&A, and PECUZAL.24entation. Due to the nearest neighbour tree search, all compared methods have O ( N log N ) timecomplexity (Fig. 9). Our proposed method is slightly (but consistently) slower than the establishedmethods, with standard time delay embedding performing best (Cao’s method). Worth mention-ing is the fact that in the MDOP case the most computational effort is put on the estimation ofthe maximum considered delay value and not on the computation of their β -statistic. Consideringthat an optimal embedding is a one-time operation, we believe that all methods shown here arepractically useful with respect to their computational complexity. Appendix B: Numerical Results

Chaotic Rössler system, driven Dufﬁng oscillator in regular motion, and nonlinear time-delayMackey-Glass equation are used to evaluate the performance of the different embedding ap-proaches (TDE, G&A, MDOP, and PECUZAL). We consider univariate and multivariate em-bedding (Rössler and Dufﬁng). The y -component of the Rössler system and the x -componentof the Dufﬁng system are used for the univariate embedding, wheres the x - and y -values of thecorresponding systems are used for the multivariate embedding. For the TDE only univariateembedding is possible and, thus, there are no results for the multivariate case.Ensembles of 1,000 trajectories using different initial conditions are used. Additionally, weconsider additive noise with an amplitude of 10% of the standard deviation of the correspondingsignal.All RPs are computed using a ﬁxed recurrence rate of 8% and a minimum line length of 2.25 A B LE II : R e s u lt s o f t h ee v a l u a ti on m ea s u r e s : acc o r d a n ce o f t h e r ec u rr e n ce p l o t ( J RR F ) , m u t u a l f a l s e n ea r e s t n e i ghbo r s ( M F NN ) , ∆ L - s t a ti s ti c , a s w e ll a s t h e r ec u rr e n ce qu a n ti ﬁ e r s d e t e r m i n i s m ( D ET ) , d i a gon a lli n e l e ng t h e n t r opy ( E N T R ) , a nd r ec u rr e n ce ti m ee n t r opy ( R TE ) f o r t h e s t a nd a r d ti m e d e l a y e m b e dd i ng ( T D E ) , G a r c i a & A l m e i d a ( G & A ) , N i c hk a w d e ’ s ( M DO P ) , a nd P E C U Z A L - m e t hod ( s ee t e x t f o r d e t a il s ) , u s i ng c h a o ti c R ö ss l e r s y s t e m , d r i v e n D u f ﬁ ngo s c ill a t o r i n r e gu l a r m o ti on ( bo t h i nun i v a r i a t e ( u ) a nd m u lti v a r i a t e ( m ) ca s e ) , a nd non li n ea r ti m e - d e l a y M ac k e y - G l a ss e qu a ti on . M ea nv a l u e s a nd s t a nd a r dd e v i a ti ono f e n s e m b l e s c on s i s ti ngo f , i n t e g r a t e d t r a j ec t o r i e s a r e s ho w n . B r ac k e t e dv a l u e s p r e s e n t r e s u lt s f o r a dd iti v e no i s e . T h e r e s u lt s o fr ec u rr e n ce qu a n ti ﬁ e r s a r e p r e s e n t e d a s r e l a ti v e d e v i a ti on s fr o m t h e r e f e r e n ce . T h e b e s t r e s u lt s f o r eac h c on s i d e r e d s y s t e m a r e h i gh li gh t e d i nb l ac k . J RR F M F NN ∆ L S y s t e m T D E G & A M DO PP E C U Z A LT D E G & A M DO PP E C U Z A LT D E G & A M DO PP E C U Z A L R ö ss l er ( u ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . − . ± . − . ± . − . ± . − . ± . ( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( − . ± . )( − . ± . )( − . ± . )( − . ± . ) R ö ss l er ( m ) — . ± . . ± . . ± . — . ± . . ± . . ± . — − . ± . − . ± . − . ± . — ( . ± . )( . ± . )( . ± . ) — ( . ± . )( . ± . )( . ± . ) — ( − . ± . )( − . ± . )( − . ± . ) M a c k e y - G ———————— − . ± . − . ± . − . ± . − . ± . ———————— ( . ± . )( . ± . )( × ± . )( − . ± . ) D u f ﬁn g ( u ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . − . ± . − . ± . − . ± . − . ± . ( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( − . ± . )( − . ± . )( − . ± . )( − . ± . ) D u f ﬁn g ( m ) — . ± . . ± . . ± . — . ± . . ± . . ± . — − . ± . − . ± . − . ± . — ( . ± . )( . ± . )( . ± . ) — ( . ± . )( . ± . )( . ± . ) — ( − . ± . )( − . ± . )( − . ± . ) D ET ( × ) E N T RR TE S y s t e m T D E G & A M DO PP E C U Z A LT D E G & A M DO PP E C U Z A LT D E G & A M DO PP E C U Z A L R ö ss l er ( u ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . ( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . ) R ö ss l er ( m ) — . ± . . ± . . ± . — . ± . . ± . . ± . — . ± . . ± . . ± . — ( . ± . )( . ± . )( . ± . ) — ( . ± . )( . ± . )( . ± . ) — ( . ± . )( . ± . )( . ± . ) D u f ﬁn g ( u ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . ( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . )( . ± . ) D u f ﬁn g ( m ) — . ± . ± . . ± . — . ± . . ± . . ± . — . ± . . ± . . ± . — ( ± )( ± )( ± ) — ( . ± . )( . ± . )( . ± . ) — ( . ± . )( . ± . )( . ± . ) ppendix C: Dependency on parameters We investigate the impact of different parameter settings on the resulting reconstruction of ourproposed method for the x -component of the Lorenz system (Appendix E 1). In particular, Fig-ure 10) shows the sensitivity of the L -statistic value and the chosen delays with respect to thenumber of nearest neighbours kNN for a ﬁxed paramter T M =

20 (panels A, B) and also the de-pendence on the continuity statistic parameters δ -Neighborhoodsize (panels C, D), the (binomial)probability p (panels E, F) and signiﬁcance-level α (panels G, H). The critical dependence of the L -statistic on the parameter T M is discussed in Sect. III and Fig. 4. The results show very little im-pact on the reconstruction quality and, thus, conﬁrm our choice of ﬁxed parameter values for thealgorithm. Speciﬁcally, the crucial qualitative course of the continuity statistic, i.e., the positionof the local maxima, remains unchanged for relevant choices of α and in the vicinity of p = . Appendix D: The recurrence plot and its quantiﬁcation measures

A recurrence plot (RP) is a binary, square matrix R representing the recurrences of states (cid:126) x i ( i = , . . . , N , with N the number of measurement points) in the d -dimensional state space [56, 57] R i , j ( ε ) = Θ (cid:0) ε − (cid:107) (cid:126) x i − (cid:126) x j (cid:107) (cid:1) , (cid:126) x ∈ R d , (D1)with (cid:107) · (cid:107) a norm, ε a recurrence threshold, and Θ the Heaviside function. There are several waysto quantify the structures and dynamics encoded therein, but here we look at the determinism [72],which is the fraction of recurrence points that form diagonal lines of length (cid:96) DET = ∑ N (cid:96) = (cid:96) min P ( (cid:96) ) ∑ N (cid:96) = (cid:96) P ( (cid:96) ) , (D2)with P ( (cid:96) ) being the histogram of all diagonal line lengths in the RP and (cid:96) min the considered min-imal line length (here set to 2). Another quantiﬁer is the Shannon entropy of the probabilitydistribution p ( (cid:96) ) = P ( (cid:96) ) ∑ (cid:96) P ( (cid:96) ) to ﬁnd a diagonal line of exact length (cid:96) ENTR = − N ∑ (cid:96) = (cid:96) min p ( (cid:96) ) ln p ( (cid:96) ) . (D3)It is also possible to look at the Shannon entropy of the length distribution of the white vertical27 -statistics Chosen delays k-NN k-NNδ-Neighbourhoodsize δ-Neighbourhoodsize(binomial-)p (binomial-)p(binomial-)α (binomial-)α st embedding cycle 2 nd embedding cycle τ , 1 st embedding cycle τ , 2 nd embedding cycle τ , 3 rd embedding cycle FIG. 10: L -statistic (left panels) and chosen delays (right panels) for a variety of choices of param-eters for embedding the Lorenz system, relevant for the PECUZAL method. See text for details.When not varied, the other parameters were ﬁxed to δ -Neighbourhoodsize = p = . α = . kNN =

3, as we propose.lines (cid:96) w , which correspond to recurrence timesRTE = − N ∑ (cid:96) w = p ( (cid:96) w ) ln p ( (cid:96) w ) . (D4)Analogue to the deﬁnition of the determinism, the laminarity is the fraction of recurrence pointsthat form vertical lines LAM = ∑ N (cid:96) v = (cid:96) v , min P ( (cid:96) v v ) ∑ N (cid:96) v = (cid:96) v vP ( (cid:96) v v ) , (D5)with (cid:96) v the length of a vertical line, P ( (cid:96) v v ) the histogram of all vertical lines in the RP and (cid:96) v , min the considered minimal line length (here set to 2).Finally, the RP can be considered as the adjacency matrix of an ε -recurrence network [73], A = R − and deﬁne the transitivity [74] as T = ∑ Ni , j , k = A jk A i j A ik ∑ Ni , j , k = A i j A ik ( − δ jk ) , (D6)28

23 123

FIG. 11: Impact of different signiﬁcance-level choices α on the continuity statistic. The otherparameters for obtaining the continuity statistic were ﬁxed to δ -Neighbourhoodsize =

14 and p = . Appendix E: Exemplary models

The models described in the following are all integrated using DynamicalSystems.jl and Dif-ferentialEquations.jl [70, 76] in the Julia Language.

1. Lorenz system

The Lorenz system [77] is deﬁned as ˙ x = σ ( y − x ) ˙ y = x ( ρ − z ) − y ˙ z = xy − β z . We set the initial condition to u = [ . , . , . ] , use a sampling time of ∆ t = .

01 and discardthe ﬁrst 1,000 points of the integration as transients. For producing Fig. 3 we set the parameters29

21 123 123123123123

FIG. 12: Impact of different choices of the binomial probability parameter p on the conti-nuity statistic. The other parameters for obtaining the continuity statistic were ﬁxed to δ -Neighbourhoodsize =

14 and α = . σ = , β = / , ρ =

60 and use a time series consisting of 5,000 samples. In Figs. 4, 10,11, 12 and the redundancy fooling-dataset in section IV B we use the standard parameter values σ = , β = / , ρ =

28 and also a time series consisting of 5,000 samples.

2. Lorenz 96 system

The Lorenz 96 system [78] is deﬁned as dx i dt = ( x i + − x i − ) x i − − x i + F with x i the state of the system for nodes i = , . . . , N and it is assumed that the total number ofnodes is N ≥

4. The forcing constant F serves as the control parameter. Here we set N = F = .

472 and the initial condition to u = [ . . . . . . . . ] ,use a sampling time of ∆ t = . . Rössler system The Rössler system [79] is deﬁned as ˙ x = − y − z ˙ y = x + ay ˙ z = b + z ( x − c ) . We randomly choose the initial conditions uniformly from an interval [ , ] and discard the ﬁrst2,000 points of the integration as transients working with time series of length N = ,

000 (forFig. 9 we set N = , a = . , b = . , c = . ∆ t = .

03 and a = . , b = . , c = . ∆ t = .

02 elsewhere.

4. Driven Dufﬁng oscillator

The driven Dufﬁng/Van der Pol oscillator [80, 81] is deﬁned as˙ x = y ˙ y = µ ( − x ) y − α x − β x + z ˙ z = γ · ω · cos ( ω · t ) , with µ = . , α = , β = , γ = . , and ω = x , y , z uniformly from the interval [ , . ] ,use a sampling time of ∆ t = . N = ,

5. Mackey-Glass equation

The Mackey-Glass equation [83] is the nonlinear time delay differential equation˙ x = β x τ + x n − γ x , with the lag τ =

44, and the parameters n = β = .

2, and γ = . x τ represents the valueof x at time t - τ . We randomly choose the initial conditions uniformly from an interval [ , . ] ,use a sampling time of ∆ t = . N = , ppendix F: Details of the L -statistic The concept of quantifying noise ampliﬁcation in the context of the validation of an attractorreconstruction has been proposed in [10]. The reconstruction process is considered in the presenceof noise and the ﬁnite data availability as a modeling problem, introducing a noise ampliﬁcationand estimation/prediction error for any measures on the reconstructed attractor. The variance ofthe conditional probability density function is a “natural criterion for assessing predictability”[10] and is used as the noise ampliﬁcation for a ﬁducial point (cid:126) v ﬁd ( t ) at a given noise level ε on thereconstruted attractor σ ε (cid:0) T ,(cid:126) v ﬁd ( t ) (cid:1) = ε (cid:114) Var (cid:16) (cid:126) v (cid:0) t + T (cid:1) | B ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1)(cid:17) , (F1)with Var (cid:16) (cid:126) v (cid:0) t + T (cid:1) | B ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1)(cid:17) being the conditional variance of (cid:126) v ( t + T ) for (cid:126) v ﬁd ( t ) in a radius ε ball B ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1) for a prediction horizon T . Finally, the noise ampliﬁcation σσ (cid:0) T ,(cid:126) v ﬁd ( t ) (cid:1) = lim ε → σ ε (cid:0) T ,(cid:126) v ﬁd ( t ) (cid:1) (F2)averaged over all ﬁducial points on the attractor (and squared), (cid:104) σ ( T ) (cid:105) , serves as a measure ofthe predictive power, with respect to the time horizon T , the reconstruction vectors (cid:126) v allow for (fordetails see [10]). Broadly speaking a low conditional variance, and thus, a low value of (cid:104) σ ( T ) (cid:105) is achieved for sufﬁciently unfolded attractors, because in this case noise distortions of the truetrajectory are not likely to result in mixing states, which are far away from each other in true statespace and consequently preserve the neighbourhood relations on the reconstructed attractor.Uzal et al. [11] reinterpret Eqs. (F1), (F2) and give an approximation-recipe for the conditionalvariance. The authors redeﬁne the mentioned equations as σ ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1) = T M (cid:90) T M σ ε (cid:0) T ,(cid:126) v ﬁd ( t ) (cid:1) dT (F3)and consider the limit σ (cid:0) (cid:126) v ﬁd ( t ) (cid:1) = lim ε → σ ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1) (F4)where ε is not related to any observational noise level anymore. The ε -ball B ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1) is simplya tool for determining certain neighbourhood relations of a ﬁducial point (cid:126) v ﬁd ( t ) and their changeswhen mapped to future states by the reconstruction function F / F (cid:48) (Fig. 1). It is possible toapproximate the conditional variance in Eq. (F1) by E k ( T ,(cid:126) v ﬁd ) ≡ k + ∑ (cid:126) v (cid:48) ∈ B k ( (cid:126) v ﬁd ) (cid:0) (cid:126) v (cid:48) ( t + T ) − u k ( T ,(cid:126) v ﬁd ) (cid:1) (F5)32here B k ( (cid:126) v ﬁd ) estimates B ε (cid:0) (cid:126) v ﬁd ( t ) (cid:1) by the ﬁducial point itself and its k nearest neighbours, re-specting a Theiler window (i.e., avoid temporal correlations in the neighbour-searching) [53]. Thecenter of mass with respect to the chosen time horizon T and the ﬁducial point (cid:126) v ﬁd is deﬁned as u k ( T ,(cid:126) v ﬁd ) ≡ k + ∑ (cid:126) v (cid:48) ∈ B k ( (cid:126) v ﬁd ) (cid:126) v (cid:48) ( t + T ) . (F6)The size of the k -neighbourhood of (cid:126) v ﬁd , B k ( (cid:126) v ﬁd ) is estimated as ε k ( (cid:126) v ﬁd ) ≡ k ( k + ) ∑ (cid:126) v (cid:48) ,(cid:126) v (cid:48)(cid:48)∈ Bk ( (cid:126) v ﬁd ) (cid:126) v (cid:48)(cid:54) = (cid:126) v (cid:48)(cid:48) (cid:107) (cid:126) v (cid:48) − (cid:126) v (cid:48)(cid:48) (cid:107) , (F7)where (cid:107) · (cid:107) is a norm used for the distance computation. Finally, E k ( T ,(cid:126) v ﬁd ) (Eq. (F5)) is averagedover a range of T ’s in [ , T M ] and the noise ampliﬁcation estimated from k nearest neighbours is σ k ( (cid:126) v ﬁd ) ≡ E k ( (cid:126) v ﬁd ) ε k ( (cid:126) v ﬁd ) , (F8)which needs to averaged over all considered ﬁducial points N (cid:48) on the reconstructed attractor toobtain σ k = N (cid:48) ∑ i ∈{ (cid:126) v ﬁd } σ k ( (cid:126) v i ) . (F9)Since the reinterpretation of ε with the related k now acts as a neighbourhood size parameter(Eqs. F3,F4,F5,F6,F7,F8), σ k can be normalized with respect to the averaged inter-point distance,which depends on the sampling rate and the scale of the input data [11]. The normalization factoris α k = N (cid:48) ∑ i ∈{ (cid:126) v ﬁd } ε − k ( (cid:126) v i ) (F10)with ε k from Eq. (F7). In this way, the ﬁnal statistic will be able to compare attractor reconstruc-tions stemming from different input data and also serves as an irrelevance measure, because largedelays will result in large ε k ’s.Finally, Eqs. (F9),(F10) deﬁne the L -statistic L k = log ( α k σ k ) , (F11)which has a free parameter k and another implicit parameter T M (Eq. (F3)). [1] H. Whitney, Annals of Mathematics , 645 (1936).

2] R. Mañé, in

Dynamical Systems and Turbulence, Warwick 1980 , edited by D. Rand and L.-S. Young(Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) pp. 230–242.[3] F. Takens, in

Dynamical Systems and Turbulence, Warwick 1980 , edited by D. Rand and L.-S. Young(Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) pp. 366–381.[4] T. Sauer, J. A. Yorke, and M. Casdagli, Journal of Statistical Physics , 579 (1991).[5] D. Broomhead and G. P. King, Physica D: Nonlinear Phenomena , 217 (1986).[6] B. Mann, F. Khasawneh, and R. Fales, Communications in Nonlinear Science and Numerical Simu-lation , 2999 (2011).[7] N. H. Packard, J. P. Crutchﬁeld, J. D. Farmer, and R. S. Shaw, Phys. Rev. Lett. , 712 (1980).[8] J. F. Gibson, J. Doyne Farmer, M. Casdagli, and S. Eubank, Physica D: Nonlinear Phenomena , 1(1992).[9] E. R. Deyle and G. Sugihara, PloS one , e18295 (2011).[10] M. Casdagli, S. Eubank, J. Farmer, and J. Gibson, Physica D: Nonlinear Phenomena , 52 (1991).[11] L. C. Uzal, G. L. Grinblat, and P. F. Verdes, Phys. Rev. E , 016223 (2011).[12] C. Nichkawde, Phys. Rev. E , 022905 (2013).[13] M. T. Rosenstein, J. J. Collins, and C. J. D. Luca], Physica D: Nonlinear Phenomena , 82 (1994).[14] A. Eftekhari, H. L. Yap, M. B. Wakin, and C. J. Rozell, Phys. Rev. E , 022222 (2018).[15] P. Grassberger, T. Schreiber, and C. Schaffrath, International Journal of Bifurcation and Chaos ,521 (1991), https://doi.org/10.1142/S0218127491000403.[16] I. Vlachos and D. Kugiumtzis, Phys. Rev. E , 016207 (2010).[17] A. M. Fraser, Physica D: Nonlinear Phenomena , 391 (1989).[18] A. M. Fraser and H. L. Swinney, Phys. Rev. A , 1134 (1986).[19] W. Liebert and H. Schuster, Physics Letters A , 107 (1989).[20] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, Phys. Rev. A , 3403 (1992).[21] M. B. Kennel and H. D. I. Abarbanel, Phys. Rev. E , 026209 (2002).[22] L. Cao, Physica D: Nonlinear Phenomena , 43 (1997).[23] R. Hegger and H. Kantz, Phys. Rev. E , 4970 (1999).[24] Y. Tang, A. Krakovská, K. Mezeiová, and H. Budáˇcová, Journal of Complex Systems , 932750(2015).[25] Z. Aleksi´c, Physica D: Nonlinear Phenomena , 362 (1991).[26] A. ˇCenys and K. Pyragas, Physics Letters A , 227 (1988).

27] G. Kember and A. Fowler, Physics Letters A , 72 (1993).[28] L. Cao, A. Mees, and K. Judd, Physica D: Nonlinear Phenomena , 75 (1998).[29] D. Kugiumtzis, Physica D: Nonlinear Phenomena , 13 (1996).[30] A. M. Albano, J. Muench, C. Schwartz, A. I. Mees, and P. E. Rapp, Phys. Rev. A , 3017 (1988).[31] A. Albano, A. Passamante, and M. E. Farrell, Physica D: Nonlinear Phenomena , 85 (1991).[32] L. A. Aguirre, Physics Letters A , 88 (1995).[33] W. Liebert, K. Pawelzik, and H. G. Schuster, Europhysics Letters (EPL) , 521 (1991).[34] T. Buzug, T. Reimers, and G. Pﬁster, Europhysics Letters (EPL) , 605 (1990).[35] T. Buzug and G. Pﬁster, Physica D: Nonlinear Phenomena , 127 (1992).[36] T. Buzug and G. Pﬁster, Phys. Rev. A , 7073 (1992).[37] J. Gao and Z. Zheng, Physics Letters A , 153 (1993).[38] J. Gao and Z. Zheng, Phys. Rev. E , 3807 (1994).[39] K. Judd and A. Mees, Physica D: Nonlinear Phenomena , 273 (1998).[40] H. Kim, R. Eykholt, and J. Salas, Physica D: Nonlinear Phenomena , 48 (1999).[41] M. Small and C. Tse, Physica D: Nonlinear Phenomena , 283 (2004).[42] A. Perinelli and L. Ricci, Phys. Rev. E , 052226 (2018).[43] D. Holstein and H. Kantz, Phys. Rev. E , 056202 (2009).[44] S. P. Garcia and J. S. Almeida, Phys. Rev. E , 037204 (2005).[45] S. P. Garcia and J. S. Almeida, Phys. Rev. E , 027205 (2005).[46] L. M. Pecora, L. Moniz, J. Nichols, and T. L. Carroll, Chaos: An Interdisciplinary Journal of Nonlin-ear Science , 013110 (2007), https://doi.org/10.1063/1.2430294.[47] Y. Hirata, H. Suzuki, and K. Aihara, Phys. Rev. E , 026202 (2006).[48] M. Han, W. Ren, M. Xu, and T. Qiu, IEEE Transactions on Cybernetics , 1885 (2019).[49] Y. Hirata and K. Aihara, Phys. Rev. E , 032219 (2017).[50] L. M. Pecora, T. L. Carroll, and J. F. Heagy, Phys. Rev. E , 3420 (1995).[51] Implicitly the undersampling statistic could be used as a break criterion, when all of the considereddelays (local maxima in the continuity statistic) are above the chosen signiﬁcance level β in the cor-responding undersampling statistic. But this will not hold for non-chaotic systems and leads to veryhigh embedding dimensions in these cases. ().[52] Note that we were not able to reproduce the results shown in Garcia and Almeida [44] and Garciaand Almeida [45]. Two of the autors, K.H.K and N.M as well as another experienced researcher ndependently implemented this method and got the exact same results. An email to Prof. Garciaexplaining this issues and seeking for help remains unanswered. We improved the method to be atleast capable of producing acceptable results. First we implemented a Theiler window, which has notbeen discussed by the authors. Second we introduced the forward time step in order to produce the d E statistic as a free parameter. The authors only discuss the case of a forward time step of 1. Throughoutall our computations we set this parameter to the same value as the Theiler window, i.e. the valueof the ﬁrst minimum of the auto-mutualinformation. In the multivariate input cases, we picked themaximum from all ﬁrst minima of all auto-mutualinformation. The reader is welcome to follow theimplementation process on https://github.com/JuliaDynamics/DelayEmbeddings.jl/pull/38. ().[53] J. Theiler, Phys. Rev. A , 2427 (1986).[54] N. F. Rulkov, M. M. Sushchik, L. S. Tsimring, and H. D. I. Abarbanel, Phys. Rev. E , 980 (1995).[55] V. S. Afraimovich, N. N. Verichev, and M. I. Rabinovich, Radiophysics and Quantum Electronics ,795 (1986).[56] J.-P. Eckmann, S. Oliffson Kamphorst, and D. Ruelle, Europhysics Letters , 973 (1987).[57] N. Marwan, M. C. Romano, M. Thiel, and J. Kurths, Physics Reports , 237 (2007).[58] K. H. Kraemer, R. V. Donner, J. Heitzig, and N. Marwan, Chaos: An Interdisciplinary Journal ofNonlinear Science , 085720 (2018), https://doi.org/10.1063/1.5024914.[59] T. K. March, S. C. Chapman, and R. O. Dendy, Physica D , 171 (2005).[60] M. S. Baptista, E. J. Ngamga, P. R. F. Pinto, M. Brito, and J. Kurths, Physics Letters A , 1135(2010).[61] K. H. Kraemer and N. Marwan, Physics Letters A , 125977 (2019).[62] C. Letellier and L. A. Aguirre, Chaos: An Interdisciplinary Journal of Nonlinear Science , 549(2002), https://doi.org/10.1063/1.1487570.[63] J. D. Farmer, Physica D: Nonlinear Phenomena , 366 (1982).[64] M. Ragwitz and H. Kantz, Phys. Rev. E , 056201 (2002).[65] I. Z. Kiss, W. Wang, and J. L. Hudson, Phys. Chem. Chem. Phys. , 3847 (2000).[66] M. Wickramasinghe and I. Z. Kiss, Chaos , 023125 (2010).[67] K. H. Kraemer and G. Datseris, “hkraemer/PECUZAL Julia: Reproducible code base for PECUZALembedding,” (2021).[68] K. H. Kraemer and N. Marwan, “hkraemer/PECUZAL python: PECUZAL embedding algorithm,”(2021).

69] K. H. Kraemer and N. Marwan, “hkraemer/PECUZAL Matlab: PECUZAL embedding algorithm,”(2021).[70] G. Datseris, Journal of Open Source Software , 598 (2018).[71] J. Chen and J. Revels, “Robust benchmarking in noisy environments,” (2016), arXiv:1608.04295[cs.PF].[72] C. L. Webber, Jr. and J. P. Zbilut, Journal of Applied Physiology , 965 (1994).[73] Y. Zou, R. V. Donner, N. Marwan, J. F. Donges, and J. Kurths, Physics Reports , 1 (2019).[74] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics Reports , 175 (2006).[75] R. V. Donner, J. Heitzig, J. F. Donges, Y. Zou, N. Marwan, and J. Kurths, The European PhysicalJournal B , 653 (2011).[76] C. Rackauckas and Q. Nie, Journal of Open Research Software (2017).[77] E. N. Lorenz, Journal of the Atmospheric Sciences , 130 (1963).[78] E. Lorenz, in Seminar on Predictability, 4-8 September 1995 , Vol. 1, ECMWF (ECMWF, ShinﬁeldPark, Reading, 1995) pp. 1–18.[79] O. Rössler, Physics Letters A , 397 (1976).[80] G. Dufﬁng, Erzwungene Schwingungen bei veraenderlicher Eigenfrequenz und ihre technische Be-deutung. (F. Vieweg & Sohn, 1918).[81] E. Appleton and B. van der Pol junr., The London, Edinburgh, and Dublin Philosophical Magazineand Journal of Science , 177 (1922), https://doi.org/10.1080/14786442208633861.[82] J. Cui, W. Zhang, Z. Liu, and J. Sun, Numerical Algorithms , 1217 (2018).[83] M. Mackey and L. Glass, Science , 287 (1977)., 287 (1977).