[PDF] Delay Parameter Selection in Permutation Entropy Using Topological Data Analysis

Abstract

Full PDF

DDelay Parameter Selection in Permutation Entropy UsingTopological Data Analysis

A Preprint

Audun D. Myers

Department of Mechanical EngineeringMichigan State UniversityEast Lansing, MI [email protected]

Firas A. Khasawneh

Department of Mechanical EngineeringMichigan State UniversityEast Lansing, MI [email protected]

May 14, 2019

Abstract

Permutation Entropy (PE) is a powerful tool for quantifying the predictability of a sequence whichincludes measuring the regularity of a time series. Despite its successful application in a variety ofscientiﬁc domains, PE requires a judicious choice of the delay parameter τ . While another parameterof interest in PE is the motif dimension n , Typically n is selected between 4 and 8 with 5 or 6 givingoptimal results for the majority of systems. Therefore, in this work we focus solely on choosing thedelay parameter. Selecting τ is often accomplished using trial and error guided by the expertise ofdomain scientists. However, in this paper, we show that persistent homology, the ﬂag ship tool fromTopological Data Analysis (TDA) toolset, provides an approach for the automatic selection of τ . Weevaluate the successful identiﬁcation of a suitable τ from our TDA-based approach by comparing ourresults to a variety of examples in published literature. Shannon entropy, which was introduced in 1948 [30], is ameasurement of how much uncertainty there is in futuredata given the current dataset. Since the ﬁrst introductionof information entropy, several new forms of entropy havebeen popularized. Some examples include approximateentropy [23], sample entropy [26], and permutation en-tropy (PE) [5]. While all of these methods measure thepredictability of a sequence, PE also considers the orderin which the data was received, which is critical for timeseries analysis. PE was ﬁrst introduced by Bandt andPompe [5] in 2002. Similar to Shannon Entropy, PE isquantiﬁed as the summation of the probabilities of a datatype (see Eq. 1), where the data types for PE are motifs(see Fig. 1), which we represent as π . The PE parameters n and τ are used when selecting the motif size and spacing,respectively. More speciﬁcally, τ is the embedding delaylag applied to the series and n is a natural number thatdescribes the dimension of the motif. In this study we focuson selecting τ since the dimension is typically chosen inthe range 3 < n ≤ τ = n = τ = n ∈ [ , ] for logisticmaps, and Frank et al. [9] suggest τ = n ∈ [ , ] for heart rate applications. One main disadvantage ofusing suggested parameter values for an application is thehigh dependence of PE on the sampling frequency. Asan example, Popov et al. [24] showed the importance ofconsidering the sampling frequency when selecting τ for anEEG signal. Another limitation is the need for applicationexpertise in order to determine the needed parameters. Thiscan hinder using PE in new applications that have not beensuﬃciently explored. Consequently, there is a need foran automatic, application-independent parameter selectionalgorithm for PE.To some extent, some studies have attempted to ﬁnd robustand application-independent methods for determining τ .Many of these methods are based on phase space recon-struction using Takens embedding [31], which has corre-sponding parameters. Some of the common approachesfor determining τ are mutual information [10], autocorre-lation [12], and phase space methods [6]. However, due tothe origin of these methods in phase space reconstructionand not PE, they may result in a poor estimate of τ . a r X i v : . [ phy s i c s . d a t a - a n ] M a y preprint - May 14, 2019In this paper, we present two novel approaches for ﬁnding τ based on tools from Topological Data Analysis (TDA).Speciﬁcally in the ﬁrst approach we compute the 0-D sub-level set persistence of the Fourier spectrum to obtain asummary of how the diﬀerent components in the spectrumare connected. We then use the modiﬁed z -score fromstatistics to acquire a threshold for a cutoﬀ frequency whichcaptures the most dominant frequencies in the spectrumin an automatic way. We then utilize Nyquist’s samplingtheorem to ﬁnd an appropriate τ value. Our second ap-proach utilizes sliding window and 1-D persistence scoring(SW1PerS) [21] to measure how periodic or signiﬁcantthe circular shape is for the embedded point cloud. UsingSW1Pers, we search for the window size that maximizes aperiodicity score, and then use the resulting window sizeto suggest an embedding delay for PE. To determine the vi-ability of our two methods, PE parameters generated basedon them are compared to expert suggested parameters.The paper is organized as follows. In Section 1.1 wedescribe PE and show its computation using a simpleexample. Section 2 provides an overview of the toolsthat we use from TDA. Section 3.2 describes our ﬁrstapproach for ﬁnding τ which is a frequency based approachthat applies 0-D sublevel set persistence and the modiﬁed z -score to the Fourier spectrum. Section 3.3 details thesecond TDA approach for ﬁnding τ which is a time-domainapproach that combines sliding window embedding of thetime series with 1-D persistence. The results are thenpresented in Section 4, and the concluding remarks aremade in Section 5. Permutation entropy H ( n ) for motif dimension n is calcu-lated according to [5] H ( n ) = − (cid:213) p ( π i ) log p ( π i ) , (1)where p ( π i ) is the probability of a permutation π i , and H ( n ) has units of bits when the logarithm is of base 2.The permutation entropy parameters τ and n are usedwhen selecting the motif size: τ is the number of timesteps between two consecutive points in a uniformly sub-sampled time series, and n is the permutation length or motifdimension. Using a set X and an element of the set x i ∈ X ,we can deﬁne the vector v i = [ x i , x i + τ , x i + τ , . . . , x i + ( n − ) τ ] ,which has the permutation π i . To better understand thepossible permutations, consider an example with thirddegree ( n =

3) permutations. This results in six possiblemotifs as shown in Fig. 1.Next, to further demonstrate PE with an example, considerthe sequence X = [ , , , , , , ] with third order per-mutations n = τ =

1. The sequence can bebroken down into the following permutations: two ( , , ) permutations, one ( , , ) permutation, and two ( , , ) permutations for a total of 5 permutations. Applying Eq. (1)yields H ( ) = −

25 log 25 −

15 log 15 = .

522 bits . (0,1,2) (0,2,1) (1,0,2)(2,0,1) (1,2,0) (2,1,0) Figure 1: All possible permutation conﬁgurations (motifs)for n = 3, where [ π . . . π ] = [( , , ) . . . ( , , )] .The permutation distribution can be visually understood byillustrating the probabilities of each permutation as separatebins. To accomplish this, Fig. 2 was created by taking thesame series X (Fig. 2 a) and placing the abundance of eachpermutation into its respective bin (Fig. 2 b). PE is at aFigure 2: Abundance of each permutation from exampledataset X .maximum when all n ! possible permutations are evenlydistributed or, equivalently, when the permutations areequiprobable with p ( π i ) = p ( π ) , p ( π ) , . . . , p ( π n ! ) = n ! .From this, the maximum permutation entropy H max isquantiﬁed as H max ( n ) = − (cid:213) p ( π i ) log p ( π i ) = − log 1 n ! = log n ! . (2)Applying Eq. (2) for n = n !, the normalized permutation entropy iscalculated as h n = − n ! (cid:213) p ( π i ) log p ( π i ) . (3)2 preprint - May 14, 2019Applying Eq. (3) to the example series X results in h ≈ . Our TDA-based approaches for ﬁnding the delay dimensionemploy two types of persistence applied to two diﬀerenttypes of data. Speciﬁcally, in the ﬁrst approach we combinethe 0-D sublevel persistence of one-dimensional time serieswith the z -score, while in the second approach we utilize1-D persistent homology on embedded time series as partof the SW1PerS framework in Section 3.3. This sectionprovides a basic background of the topics needed to atleast intuitively understand the subsequent analysis. Morespeciﬁcs can be found in [7, 8, 11, 20]. An abstract k -simplex σ is deﬁned as a set of k + ( σ ) = k . If we apply a geometric interpretationto a k -simplex, we can think of it as a set V of k + K is a set of simplices σ ⊆ V such that for every σ ∈ K , all the faces of σ , i.e., allthe lower dimensional component simplices σ (cid:48) ⊂ σ arealso in K . For example, if a triangle (2-simplex) is in asimplicial complex K , then so are the edges of the triangle(1-simplices) as well as all the nodes in the triangle (0-simplices). The dimension of the resulting simplicialcomplex is given by the largest dimension of its simplicesaccording to dim ( K ) = max σ ∈ K dim ( σ ) . The n -skeletonof a simplicial complex K ( n ) is the restriction of the latterto its simplices of degree at most n , i.e., K ( n ) = { σ ∈ K | dim ( σ ) ≤ n } .Given an undirected graph G = ( V , E ) where V are thevertices and E are the edges, we can construct the clique(or ﬂag) complex K ( G ) = { σ ⊆ V | u v ∈ E for all u (cid:44) v ∈ σ } . If we ﬁx a simplicial complex K , then homology groupscan be used to quantify the holes of the structure in diﬀerentdimensions. For example, in dimension 0, the rank of the0 dimensional homology group H ( K ) is the number ofconnected components. The rank of the 1-dimensional ho-mology group H ( K ) is the number of loops, while the rankof H ( K ) is the number of voids, and so on. The homologygroups are constructed using linear transformations termedboundary operators.To describe boundary operators, we ﬁrst deﬁne the orientedsimplicial complex as the simplicial complex obtained byan ordered set of vertices σ = [ v , . . . , v k ] . Permutingtwo indices in σ gives the opposite simplex according to [ v , . . . , v i , . . . , v j , . . . , v k ] = −[ v , . . . , v j , . . . , v i , . . . , v k ] .Now let { α σ } be coeﬃcients in a ﬁeld F (in this paper wechoose F = Z ). Then K ( n ) , the n -skeleton of K , can beused as a generating set of the F -vector space ∆ n ( K ) . Inthis representation, any element of ∆ n ( K ) can be writtenas a linear combination (cid:205) σ ∈ K ( n ) α σ σ . Further, elementsin ∆ n ( K ) are added by adding their coeﬃcients. A ﬁniteformal sum of the n -simplices in K is called an n -chain,and the group of all n -chains is the n th chain group ∆ n ( K ) ,which is a vector space.Given a simplicial complex K , the boundary map ∂ n : ∆ n ( K ) → ∆ n − ( K ) is deﬁned by ∂ n ([ v , . . . , v n ]) = n (cid:213) i = (− ) i [ v , . . . , ˆ v i , . . . , v n ] , where ˆ v i denotes the absence of element v i from the set.This linear transformation maps any n -simplex to the sumof its codimension 1 (codim-1) faces. The geometricinterpretation of the boundary operator is that it yields theorientation-preserved boundary of a chain.By combining boundary operators, we obtain the chaincomplex . . . ∂ n + −−−→ ∆ n ( K ) ∂ n −−→ . . . ∂ −−→ ∆ ( K ) ∂ −−→ , where the composition of any two subsequent boundaryoperators is zero, i.e., ∂ n ◦ ∂ n + =

0. A n -chain α ∈ ∆ n ( K ) is a cycle if ∂ n ( α ) =

0; it is a boundary if there is an n + β such that ∂ n + ( β ) = α . Deﬁne the kernel ofthe boundary map ∂ n using Z n ( K ) = { c ∈ ∆ n ( K ) : ∂ n c = } , and the image of ∂ n + B n ( K ) = { c ∈ ∆ n ( K ) : c = δ n + c (cid:48) , c (cid:48) ∈ ∆ n + ( K )} . Consequently, we have B k ( K ) ⊆ Z k ( K ) . Therefore, we deﬁne the n th homology group of K as the quotient group H n ( K ) = Z n ( K )/ B n ( K ) . In this paper,we only need 0- and 1-dimensional persistent homology,and we always assume homology with Z coeﬃcients whichremoves the need to keep track of orientation. In the case of0-dimensional homology, there is a unique class in H ( K ) for each connected component of K . For 1-dimensionalhomology, there is one homology class in H ( K ) for each hole in the complex. Now, we are interested in studying the structure of achanging simplicial complex. We introduce a real-valuedﬁltration function on the simplicies of K such that f ( τ ) ≤ f ( σ ) for all τ ≤ σ simplices in K . If we let { y < y <. . . < y (cid:96) } be the set of the sorted range of f for any y ∈ R ,then the ﬁltration of K with respect to f is the orderedsequences of its subcomplexes ∅ ⊆ K ( y ) ⊆ K ( y ) ⊆ · · · ⊆ K ( y (cid:96) ) = K . The sublevel set of K corresponding to y is deﬁned as K ( y ) = { σ ∈ K | f ( σ ) ≤ y } , (4)where each of the resulting K ( y ) is a simplicial complex,and for any y ≤ y , we have K ( y ) ⊆ K ( y ) .3 preprint - May 14, 2019The ﬁltration of K enables the investigation of the topolog-ical space under multiple scales of the output value of theﬁltration function f . In this paper we consider two diﬀerentﬁltration functions where each of these functions is appliedto a diﬀerent type of data: 0-D persistence applied to 1-Dtime series, and 1-D persistence applied to point cloudsembedding in R n .0 -D persistence applied to -D time series: Let χ bethe time-ordered set of the critical values of a time series.Here, we can think of the simplicial complex K = G ( V , E ) containing a number of vertices | V | equal to the number ofcritical values in the time series and only the edges E thatconnect adjacent vertices. i.e., vertices { v i | ≤ i ≤ n } ,and edges { v i v i + | ≤ i ≤ n − } . Therefore, we have aone-to-one correspondence between the critical values in χ and the vertices of the simplicial complex V . We deﬁnethe ﬁltration function for every face σ in K according to f ( σ i ) = (cid:26) χ i if σ i ∈ V , max ( u , v ) if σ i = u v ∈ E . Using this ﬁltration function in Eq. (4), we can de-ﬁne an ordered sequence of subcomplexes where y ∈[ min ( χ ) , max ( χ )] .1 -D persistence applied to point clouds in R m : Thesecond set of data points we work with is a point cloud P ⊂ R m . There are several methods to embed a timeseries; however, here we embed the time series using delayreconstruction. For each point p ∈ P , let B ( p , r ) be theball centered at p and of radius r . Now, for some radius r ,we can build a simplicial complex where, for example, theintersection of any two balls adds an edge, the intersectionof three balls adds a triangle, and higher dimensionalanalogs are added similarly, see the construction for r through r in Fig. 4. The result of the union of all theballs ∪ p ∈ P B ( p , r ) is called the Čech complex; althoughin practice we work with Rips complexes which are theclique complexes of the function f and they are easier toobtain computationally while still capturing the interestingfeatures of the space. The ﬁltration function in this case isgiven by f ( σ ) = max { u , v }⊆ σ d ( u , v ) , where d ( u , v ) is the distance between vertices u and v , andthe distance between a vertex and itself is zero. Persistent homology is a tool from topological data analysiswhich can be used to quantify the shape of data. Themain idea behind persistent homology is to watch how thehomology changes over the course of a given ﬁltration.Fix a dimension n , then for a given ﬁltration K ⊆ K ⊆ · · · ⊆ K N we have a sequence of maps on the homology H n ( K ) → H n ( K ) → · · · → H n ( K N ) . We say that a class [ α ] ∈ H n ( K i ) is born at i if it is not inthe image of the map H n ( K i − ) → H n ( K i ) . The same classdies at j if [ α ] (cid:44) H n ( K j − ) but [ α ] = H n ( K j ) .This information can be used to construct a persistencediagram as follows. A class that is born at i and dies at j is represented by a point in R at ( i , j ) . The collectionof the points in the persistence diagram, therefore, give asummary of the topological features that persists over thedeﬁned ﬁltration. See the example of Fig. 3 for n = n = Delay embedding is a uniform subsampling of the originaltime series according to the embedding parameter τ . Forexample, the subsampled sequence X with elements { x i : i ∈ N ∪ } subject to the delay τ is deﬁned as X ( τ ) = [ x , x τ , x τ , . . . ] . Riedl et al. [27] showed that PE is sensitiveto the time delay, which prompts the need for a robustmethod for determining an appropriate value for τ . Forestimating the optimal τ , we will be investigating thefollowing methods in the subsequent sections: MutualInformation (MI) in Section 3.1, combining Fourier analysiswith 0-D persistence and the modiﬁed z -score (Section3.2), and Sliding Window for One-Dimensional PersistenceScoring (SW1PerS) (Section 3.3). We recognize, but donot investigate, some other commonly used methods forﬁnding τ . These include the autocorrelation function [12]and the phase space expansion [6]. Mutual information is a measurement of how much infor-mation is shared between two sequences, and it was ﬁrstrealized by Shannon et al. [29] as I ( X ; Y ) = (cid:213) x ∈ X (cid:213) y ∈ Y p ( x , y ) log p ( x , y ) p ( x ) p ( y ) , (5)where X and Y are separate sequences, p ( x ) and p ( y ) arethe probability of the element x and y separately, and p ( x , y ) is the joint probability of x and y . Fraser andSwinney [10] showed that for a chaotic time series themutual information between the sequence x ( t ) and x ( t + τ ) r and dies at r , while the second complex is born at r and dies at a much largerradius r .will decrease as τ increases until reaching a minimum. Atthis delay τ , the individual data points share a minimumamount of information, thus indicating that the data pointsare suﬃciently separated. While this delay value wasspeciﬁcally developed for phase space reconstruction froma single time series, it is also commonly used for the PEparameter τ . We would like to point out that in general thereis no guarantee that minima exist for the mutual informationfunction, which is a serious limitation for computing τ using this method. -D Persistenceand Modiﬁed z -score In this section we present a novel topological data anal-ysis (TDA) based approach for ﬁnding the noise ﬂoorin the Fourier spectrum. Speciﬁcally, we show how the0-dimensional sublevel persistence, a tool from TDA dis-cussed in Section 2, can be used to ﬁnd the lifetime of themaxima and minima in a signal. We describe a fast andsimple algorithm (Algorithm 1) for computing the lifetimesusing the peaks and valleys of the signal. We then separatethe noise lifetimes from signiﬁcant lifetimes through theuse of the modiﬁed z -score, which allows us to ﬁnd thenoise ﬂoor and frequency cutoﬀ. This frequency cutoﬀwill later be used to ﬁnd an embedding delay τ for PE. Theprocess for ﬁnding the cutoﬀ frequency is summarized inFig. 5. The following paragraphs give an overview of themodiﬁed z -score and the threshold and cutoﬀ analysis. Modiﬁed z -score The modiﬁed z -score z m is essentialto understanding the techniques used for isolating noisefrom a signal [28]. The standard score, commonly knownas the z -score, uses the mean and the standard deviation ofa dataset to ﬁnd an associated z -score for each data pointand is deﬁned as z = x − µσ , (6)where x is the dataset, µ is the mean, and σ is the standarddeviation of the dataset, respectively. The z -score valueis commonly used to identify outliers in the dataset byrejecting points that are above a set threshold, which isset in terms of how many standard deviations away from the mean are acceptable. Unfortunately, the z -score issusceptible to outliers itself because both the mean andthe standard deviation are not robust against outliers [16].This led Hampel [13] to develop the modiﬁed z -score asan outlier detection method that is insensitive to outliers.The logic behind the modiﬁed z -score or median absolutedeviation (MAD) method is grounded on the use of themedian for the mean. The MAD is calculated asMAD = median (| x − ˜ x |) , (7)where x is the dataset and ˜ x is the median of the dataset. TheMAD is substituted for the standard deviation in Eq. (6). Tocomplete the modiﬁed z -score, Iglewicz and Hoaglin [14]suggested substituting the mean with the median. Theresulting equation for the modiﬁed z -score is z m = . x − ˜ x MAD . (8)Using z m allows for outlier detection with a breakdownpoint of 50% outliers. A threshold for separating noisein the persistence domain is discussed in the followingparagraph. Threshold and Cutoﬀ Analysis

To determine the noiseﬂoor in the normalized Fast Fourier Transform (FFT) spec-trum, we compute the 0-dimensional persistence of the FFT.This provides relatively short lifetimes for the GWN, whilethe prominent peaks, which represent the actual signal,have comparatively long lifetimes or high persistence. Toseparate the noise from the outliers, we apply the modiﬁed z -score for each lifetime in the persistence diagram. Wedetermine the modiﬁed z -score threshold based on the needto account for approximately all of the noise in the FFT. Todo this, we use a signal of pure GWN with a standard devi-ation of 1.0 and length of 10 . Next, we calculate the FFTof the GWN signal and its 0-D sublevel set persistence toget a set of lifetimes from the persistence diagram. We thenapply modiﬁed z -score thresholds ranging from 0 to 5 onthe lifetimes from the persistence diagram to determine thepercent of noise as a function of the threshold. By settinga threshold requirement that accounts for approximatelyall persistence diagram noise, while not being excessivelylarge, a threshold of approximately 4.8 was found as shownin Fig. 6. This threshold was rounded up to 5 for simplicity.5 preprint - May 14, 2019 TimesSeries Fourier Transform 0-DPersistence Modified-score Cutoff/Max. Freq. Embedding Delay

Cutoff

Max Frequency

Birth D e a t h Birth D e a t h Frequency Frequency

Figure 5: Overview of procedure for ﬁnding maximum signiﬁcant frequency using 0-dimensional sublevel set persistenceand the modiﬁed z -score for a signal contaminated with GWN.With a suitable modifed z -score threshold, our next goalin ﬁnding a the noise ﬂoor cutoﬀ begins by investigatingthe distribution of the FFT of GWN in the persistencedomain. To do this, the same GWN signal is analyzed.Again, the FFT and the 0-D persistence were calculatedto determine the lifetimes as well as the births and deathsof the persistence diagram. Figure 7 shows the histogramobtained from the resulting persistence diagram.Figure 6: Percent of the persistence points from 0-Dsublevel set persistence of the FFT of GWN using themodiﬁed z -score with the provided threshold ranging from0 to 5.Although there have been studies on pushing forward proba-bility distributions into the persistence domain [1, 2, 15], itis diﬃcult to obtain a theoretical cutoﬀ value in persistencespace. Therefore, we proceed by computing the histogramsfor the 0-D sublevel set and establishing a heuristic cutoﬀvalue that seems to perform well. The theoretical distri-bution of GWN in the Fourier spectrum has a Rayleighdistribution [25]. Approximately the same distributionappears in the histogram of noise for the birth times ofthe persistence diagram with the following diﬀerences:the death times tend to show a slightly shifted normaldistribution, while the lifetimes show approximately afolded-normal distribution. Without doing an in depth sta-tistical analysis of the distributions, we calculate a suitablecutoﬀ using the median of the births and maximum of thelifetimes asCutoﬀ noise = max ( lifetime noise ) + median ( birth noise ) , (9)where Cutoﬀ noise is the suggested noise cutoﬀ and birth noise and lifetime noise is the set of the births and lifetimes thatwere considered noise from the modiﬁed z -scores. Figure 7: Distributions of the lifetimes, deaths, and birthsof 0-D sublevel set persistence for the FFT of GWN witha length of 10 , mean µ = .

0, and standard deviation σ = . f max as thehighest frequency in the Fourier spectrum with an amplitudegreater than the speciﬁed cutoﬀ from Eq. (9). Using f max ,we suggest a delay following Nyquist’s sampling rate as τ = f s f max , where f s is the sampling frequency. τ using SW1Pers Perea and Harer [22] developed Sliding Window for One-Dimensional Persistence Scoring (SW1PerS) as a TDAmethod for determining the most signiﬁcant period of atime series. SW1PerS is also useful for ﬁnding the windowin which a chaotic time series has the highest periodicity.In this section we develop an algorithm that combinesthe period of the main oscillation derived from SW1PerSwith frequency criterion developed by [18] to determine asuitable delay.SW1PerS uses 1-D persistent homology from Section 2 tomeasure how periodic or signiﬁcant the circular shape inan embedded dimension (point cloud) is as the embeddingwindow size increases. The sliding window SW is deﬁnedas SW m ,τ s f ( t ) = [ f ( t ) , f ( t + τ s ) , . . . , f ( t + m τ s )] , (10)where f ( t ) is the time series being analyzed and τ s and m are, respectively, SW1PerS embedding delay and dimen-sion. Applying Eq. (10) across the range of the time series n T times results in a collection of vectors known as a setof point clouds, which live in an m -dimensional Euclideanspace. The number of sliding windows n T was set to 200 tobe suﬃciently high for forming 1-D topological shapes or6 preprint - May 14, 2019simplices. Speciﬁcally, we ﬁrst linearly map the entire timeseries onto the domain [ , π ] . Next, we set the desiredwindow size as w = π mL ( m + ) , (11)where L is the division parameter that segments the entiretime series into seperate windows of size w . For ouranalysis, we varied the L parameter from Eq. (11) to ﬁndthe window size which yielded the highest periodicity.Figure 8 shows an example sliding window using theLorenz system described in Section A.2 with ρ =

28. The n n+1 n+2 ...... n n+1 n+2 x yz Figure 8: SW1PerS applied on Lorenz example deﬁnedin Section A.2 showing sliding window for time serieslinearly scaled to 2 π and embedded in dimension m = m =

3. Normally, m is determinedbased on the theory developed by Perea et al. [22], whichshowed the necessary value of m for reconstruction isbounded by m ≥ N , where N is the number of Fourierterms necessary for reconstructing the signal. For ourapplication, we will also set m = N . Perea et al. [21]did not choose N based on the data, but rather picked alarge number assuming it would be suﬃcient to faithfullyrepresent the data. In contrast, in this work we automatechoosing N by approximating the Fourier series using thediscrete Fourier transform, and then computing the (cid:96) normbetween the approximation and the signal to obtain thevalue of N that yields an error within a desired threshold,see Eq. (13). Speciﬁcally, if we let the time series X be adiscrete time sampling of a piecewise smooth signal x ( t ) ,then the N -partial sum of the Fourier series of x ( t ) can be approximated according to f N ( t ) = | X | N (cid:213) k = (cid:16) | X | (cid:213) j = X ( j ) e − π ijk / T (cid:17) e π ikt / T , (12)where | X | is the length of the time series X . As a rule ofthumb, N ≈ | X |/ x ( t ) [4]. The relative (cid:96) norm that measures the error betweentime series X and its reconstruction f N ( t ) is given by (cid:96) ( N ) = (cid:16) (cid:205) | X | j = (cid:2) X ( j ) − f N ( j ) (cid:3) (cid:17) / (cid:16) (cid:205) | X | j = X ( j ) (cid:17) / . (13)For our application, we cosider f N ( t ) as suﬃciently closeto x ( t ) when we ﬁnd a value of N for which (cid:96) ( N ) < . m determined as m = N , we then calculate thesliding window delay τ s based on the relationship τ s = w / m .To apply this desired time delay, we interpolate the timeseries using a cubic spline ﬁt. We can then determine thewindow size that provides the greatest periodicity usingthe periodicity score s = − r B − r D , (14)Equation (15) normalizes the scores so that s = s = P of the time series us-ing the L parameter corresponding to the lowest periodicityscore s to calculate P as P = mT ( m + ) L P , (15)where T is the time span of the original time series and L P is the L value corresponding to the the lowest periodicityscore s . To ﬁnd L P , L was incremented from 1 to L max ,where L max is the maximum number of periods to considerin the time series. Figure 9 demonstrates how the shape ofthe data changes with the time delay τ s , which changes thewindow size w . Additionally, Fig. 9 shows how a minimumperiodicity score occurs when the embedded data forms a1-D simplex with the longest lifetime. Using the periodof the time series P , we compute a suggested embeddingdelay according to τ = f s P α , (16)where f s is the original sampling frequency and α ∈ [ , ] is based on the frequency criteria from [18]. For this paperwe set α = To verify our TDA-based methods for determining τ , Ta-ble 1 compares our results to the values suggested by expertsusing a variety of systems including the ones listed by Riedlet al. [27]. The table highlights the automatically computed τ which best matches the expert-identiﬁed values for τ .7 preprint - May 14, 2019Figure 9: Example showing the shape of the embedded time series as a function of the window size: (left) w Small istoo small revealing elliptical shape and high periodicity score s , (middle) w Medium is properly sized and results in aminimum periodicity score s , and (right) w Medium is too large and results in a high periodicity score s .The tables shows that for GWN, all the considered methodsaccurately calculate an embedding delay of 1. However,none of the methods gave reliable results for both diﬀerenceequations (Henon and the logistic map). Speciﬁcally, thefrequency approach based on 0-D persistence provided thebest results with τ =

1, while the MI method yielded thebest result for the Logistic map. While the 0-D persistenceapproach slightly underestimated τ for these systems, wedo not suggest using SW1PerS for diﬀerence equationssince it results in considerably inaccurate suggestions forthe delay parameter.Although Table 1 shows that there is no one method thatperforms well for all types of signals, in the followingwe provide recommendations for which method to usewith chaotic signals, periodic signals, and EEG/ECG data.For chaotic diﬀerential equations, we suggest using thedelay calculated using the frequency approach with a 0-D-persistence noise ﬁlter as this method consistently providedthe most accurate time delays. For periodic systems, wesuggest using the τ found using either SW1PerS or the 0-Dpersistence approach, but to avoid using the MI results.For EEG/ECG data, we suggest using either SW1PerS orthe 0-D persistence approach as they consistently providemore accurate values of τ in comparison to the MI method. In this paper we describe two novel TDA-based methodsfor automatically determining the PE delay parameter τ given a suﬃciently sampled/oversampled time series. Oneof the methods is a frequency domain approach based onthe modiﬁed z -score and the 0-D sublevel set persistence.The other methods is based on SW1PerS, and it operateson a sliding window embedding of the time-domain signal. Both of these methods were compared to the standard MIapproach of Fraser and Swinney [10].For the frequency approach, we developed an automaticalgorithm for ﬁnding the maximum signiﬁcant frequencyusing a cutoﬀ greater than the additive Gaussian noise ﬂoor.We ﬁnd the noise ﬂoor using 0-D sublevel set persistencewith noise separated using the modiﬁed z -score.We applied these methods to various categories includingdiﬀerence equation, chaotic diﬀerential equations, periodicsystems, EEG/ECG data, and Gaussian noise. We thencompared the generated parameters to values suggestedby experts to determine which methods consistently foundaccurate values for τ , see Table 1. We showed that no singlemethod provides accurate values of τ for every application,but each method showed better performance depending onthe type of the studied system.Speciﬁcally, for pure white noise, all the methods produceda viable estimate for τ . However for chaotic systems,the 0-D persistence approach yielded the most consistentresults. For periodic time series, both TDA-based methodoutperforms their MI counterparts.For EEG/ECG data, the lowest bound for either SW1PerSor the 0-D gave better estimates for τ in comparison to theMI approach. However, for diﬀerence equations, no onemethod consistently provided accurate values for τ with theSW1PerS approach giving large overestimates. For thesesystems, although the frequency approach based on 0-Dpersistence underestimated τ , it came closest to providingthe best results for Hénon map. On the other hand, theMI method yielded the best result for the Logistic map.Therefore, if TDA-methods are used for ﬁnding the delayparameter for diﬀerence equations, we recommend avoidingthe SW1PerS approach in favor of the 0-D persistenceapproach.8 preprint - May 14, 2019Table 1: A comparison between the calculated and suggested values for the delay parameter τ . The shaded cellshighlight the methods that yielded the closest match to the suggested delay. MethodCatagory System MI SW1PerS 0-D Persistenceof FFT SuggestedDelay τ Ref.

Noise White Noise 1 1 1 1 [27]Lorenz 10 9 8 10 [27]Rossler 13 22 7 9 [32]Bi-directionalRossler 11 7 12 15 [27]ChaoticDiﬀerentialEquation Mackey-Glass 7 10 3 1-to 700 [27]PeriodicSystem Sine Wave 2 22 21 20 [32]Logistic Map 4 15 1 1 to 5 [33]NonlinearDiﬀerence Eq. Henon Map 9 90 1 1 to 2 [27]ECG 5 11 12 1 to 4 [27]MedicalData EEG 6 2 5 1 to 3 [27]

Acknowledgment

FAK acknowledges the support of the National Sci-ence Foundation under grants CMMI-1759823 and DMS-1759824.

References [1] Robert J. Adler, Omer Bobrowski, Matthew S. Bor-man, Eliran Subag, and Shmuel Weinberger. Persis-tent homology for random ﬁelds and complexes. In

Institute of Mathematical Statistics Collections , pages124–143. Institute of Mathematical Statistics, 2010.[2] Robert J. Adler, Omer Bobrowski, and Shmuel Wein-berger. Crackle: The homology of noise.

Discrete &Computational Geometry , 52(4):680–704, aug 2014.[3] Ralph G Andrzejak, Klaus Lehnertz, Florian Mor-mann, Christoph Rieke, Peter David, and Christian EElger. Indications of nonlinear deterministic andﬁnite-dimensional structures in time series of brainelectrical activity: Dependence on recording regionand brain state.

Physical Review E , 64(6):061907,2001.[4] Nakhlé H Asmar.

Partial diﬀerential equations withFourier series and boundary value problems . CourierDover Publications, 2016.[5] Christoph Bandt and Bernd Pompe. Permutationentropy: a natural complexity measure for time series.

Physical review letters , 88(17):174102, 2002.[6] Th Buzug and G Pﬁster. Optimal delay time andembedding dimension for delay-time coordinates byanalysis of the global static and local dynamicalbehavior of strange attractors.

Physical review A ,45(10):7073, 1992.[7] Gunnar Carlsson. Topology and data.

Bulletin ofthe American Mathematical Society , 46(2):255–308,January 2009. Survey.[8] Herbert Edelsbrunner and John L. Harer.

Computa-tional topology: an introduction . American Mathe-matical Society, 2009. [9] Birgit Frank, Bernd Pompe, Uwe Schneider, andDirk Hoyer. Permutation entropy improves fetal be-havioural state classiﬁcation based on heart rate anal-ysis from biomagnetic recordings in near term fetuses.

Medical and Biological Engineering and Computing ,44(3):179, 2006.[10] Andrew M Fraser and Harry L Swinney. Indepen-dent coordinates for strange attractors from mutualinformation.

Physical review A , 33(2):1134, 1986.[11] Robert Ghrist. Barcodes: The persistent topology ofdata.

Builletin of the American Mathematical Society ,45:61–75, 2008. Survey.[12] Peter Grassberger and Itamar Procaccia. Measuringthe strangeness of strange attractors.

Physica D:Nonlinear Phenomena , 9(1-2):189–208, 1983.[13] Frank R Hampel. The inﬂuence curve and its role inrobust estimation.

Journal of the american statisticalassociation , 69(346):383–393, 1974.[14] Boris Iglewicz and David Hoaglin.

Volume 16: how todetect and handle outliers, The ASQC basic referencesin quality control: statistical techniques, Edward F.Mykytka . PhD thesis, Ph. D., Editor, 1993.[15] Matthew Kahle and Elizabeth Meckes. Limit theo-rems for betti numbers of random simplicial com-plexes.

Homology, Homotopy and Applications ,15(1):343–374, 2013.[16] Christophe Leys, Christophe Ley, Olivier Klein,Philippe Bernard, and Laurent Licata. Detecting out-liers: Do not use standard deviation around the mean,use absolute deviation around the median.

Journalof Experimental Social Psychology , 49(4):764–766,2013.[17] Duan Li, Zhenhu Liang, Yinghua Wang, SatoshiHagihira, Jamie W Sleigh, and Xiaoli Li. Parameterselection in permutation entropy for an electroen-cephalographic measure of isoﬂurane anesthetic drugeﬀect.

Journal of clinical monitoring and computing ,27(2):113–123, 2013.[18] Michał Melosik and W Marszalek. On the 0/1 test forchaos in continuous systems.

Bulletin of the Polish

Academy of Sciences Technical Sciences , 64(3):521–528, 2016.[19] George B Moody and Roger G Mark. The impact ofthe mit-bih arrhythmia database.

IEEE Engineering inMedicine and Biology Magazine , 20(3):45–50, 2001.[20] S. Y. Oudot.

Persistence theory: from quiver rep-resentations to data analysis , volume 209 of

AMSMathematical Surveys and Monographs . AmericanMathematical Society, 2015.[21] Jose A Perea, Anastasia Deckard, Steve B Haase,and John Harer. Sw1pers: Sliding windows and 1-persistence scoring; discovering periodicity in geneexpression time series data.

BMC bioinformatics ,16(1):257, 2015.[22] Jose A Perea and John Harer. Sliding windows andpersistence: An application of topological methodsto signal analysis.

Foundations of ComputationalMathematics , 15(3):799–838, 2015.[23] Steven M Pincus. Approximate entropy as a measureof system complexity.

Proceedings of the NationalAcademy of Sciences , 88(6):2297–2301, 1991.[24] Anton Popov, Oleksii Avilov, and Oleksii Kanaykin.Permutation entropy of eeg signals for diﬀerent sam-pling rate and time lag combinations. In

Signal Pro-cessing Symposium (SPS), 2013 , pages 1–4. IEEE,2013.[25] Mark A Richards. The discrete-time fourier transformand discrete fourier transform of windowed stationarywhite noise.

Georgia Institute of Technology, Tech.Rep , 2013.[26] Joshua S Richman and J Randall Moorman. Physio-logical time-series analysis using approximate entropyand sample entropy.

American Journal of Physiology-Heart and Circulatory Physiology , 278(6):H2039–H2049, 2000.[27] Müller Riedl, A Müller, and N Wessel. Practicalconsiderations of permutation entropy.

The EuropeanPhysical Journal Special Topics , 222(2):249–262,2013.[28] Songwon Seo.

A review and comparison of methodsfor detecting outliers in univariate data sets . PhDthesis, University of Pittsburgh, 2006.[29] Claude E Shannon, Warren Weaver, and Arthur WBurks. The mathematical theory of communication.1951.[30] Claude Elwood Shannon. A mathematical theory ofcommunication.

ACM SIGMOBILE mobile comput-ing and communications review , 5(1):3–55, 2001.[31] Floris Takens. Detecting strange attractors in turbu-lence. In

Dynamical systems and turbulence, Warwick1980 , pages 366–381. Springer, 1981.[32] Mei Tao, Kristina Poskuviene, Nizar Alkayem,Maosen Cao, and Minvydas Ragulskis. Permutationentropy based on non-uniform embedding.

Entropy ,20(8):612, 2018.[33] Hong Zhang and Xuncheng Liu. Analysis of pa-rameter selection for permutation entropy in logisticchaotic series. In

Intelligent Transportation, Big Data& Smart City (ICITBS), 2018 International Confer-ence on , pages 398–402. IEEE, 2018.

A Summary of the used data and models

A.1 Gaussian White Noise

The white noise was generated using NumPy’s ran-dom.normal function with a standard deviation of 1 . A.2 Lorenz System

The Lorenz system used is deﬁned as dxdt = σ ( y − x ) , d y dt = x ( ρ − z ) − y , dzdt = x y − β z . (17)The Lorenz system had a sampling rate of 100 Hz withparameters σ = . β = . / .

0, and ρ =

95. Thissystem was solved for 100 seconds and the last 24 secondswere used.

A.3 Rössler System

The Rössler system used was deﬁned as dxdt = − y − z , d y dt = x + a y , dzdt = b + z ( x − c ) , (18)with parameters of a = . b = . c =

14, which wassolved over 400 seconds with a sampling rate of 100 Hz.Only the last 2000 data points of the x-solution were usedin the analysis.

A.4 Bi-Directional Coupled Rössler System

The Bi-directional Rössler system is deﬁned as dx dt = − w y − z + k ( x − x ) , d y dt = w x + . y , dz dt = . + z ( x − ) , dx dt = − w y − z + k ( x − x ) , d y dt = w x + . y , dz dt = . + z ( x − ) , (19)with w = . w = .

95, and k = .

05. This was solvedfor 4000 seconds with a sampling rate of 10 Hz. Only thelast 400 seconds of the x-solution were used in the analysis.

A.5 Mackey-Glass Delayed Diﬀerential Equation

The Mackey-Glass Delayed Diﬀerential Equation is deﬁnedas x ( t ) = − γ x ( t ) + β x ( t − τ ) + x ( t − τ ) n (20)with τ = β = γ =

1, and n = .

65. This was solvedfor 400 seconds with a sampling rate of 100 Hz. Only thelast 300 seconds of the x-solution were used in the analysis.10 preprint - May 14, 2019

A.6 EEG Data

The EEG signal was taken from andrzejak et al. [3]. Specif-ically, the ﬁrst 2000 data points from the EEG data of ahealthy patient from set A, ﬁle Z-093 was used.

A.7 ECG Data

The Electrocardoagram (ECG) data was taken from SciPy’smisc.electrocardiogram data set. This ECG data was origi-nally provided by the MIT-BIH Arrhythmia Database [19].We used data points 3000 to 4500 during normal sinusrhythm.

A.8 Logistic Map

The logistic map was generated as x n + = r x n ( − x n ) , (21)with x = . r = .

95. Equation 21 was solved forthe ﬁrst 500 data points.

A.9 Hénon Map

The Hénon map was solved as x n + = − ax n + y n , y n + = bx n , (22)where b = . x = . y = .

3, and a = .

4. Thissystem was solved for the ﬁrst 500 data points of thex-solution.

B Algorithms

B.1 Fast 0D persistence algorithm for 1D dataB.2 SW1PerS Algorithm for the Embedding Delay τ The procedure we described in Section 3.3 is summarizedin algorithm 2. This algorithm can be used to automaticallydetermine a suitable delay τ using SW1PerS. Result:

Persistence Diagram: Birth and Deaths Times begin initialize time series;ﬁnd locations and values of minima and maxima;store locations and values in matrix ( M minmax ); while M minmax has values do increment i;diﬀerence = maxima values - minima values;ﬁnd index of smallest diﬀerence ( I min );peak value ( P v ) = M minmax of I min for max valuecolumn;peak index ( P i ) = M minmax of I min for max indexcolumn;valley value ( V v ) = M minmax of I min for min valuecolumn;valley index ( V i ) = M minmax of I min for min indexcolumn;persistence diagram point[i] = [valley value, peakvalue];remove P v , P i , V v , V i from M minmax ; end return persistence diagram points; endAlgorithm 1: Zero Dimensional Persistence Algorithm.

Result: τ begin set n T = 200;calculate L max using max signiﬁcant frequency;calculate m using FFT reconstruction to 10% errorthreshold;linearly map time series onto the domain [ , π ] ;set L = while L < L max do calculate window size as w = π mL ( m + ) ;calculate τ s as τ s = w / m ;generate interpolated function for time seriesusing cubic spline at τ s intervals;create point cloud using n T sliding windows as SW m ,τ f ( t ) ;apply persistence homology on SW point cloud;calculate periodicity score as s ( L ) = − r B − r D ;increment L ; end calculate L at minimum score as L P ;Find length of series as T ;calculate main period of time series as P = mT ( m + ) L P ;set α between 2 and 4;calculate τ = f s P α ;return τ ; end Algorithm 2: SW1PerS algorithm for ττ