Delay Parameter Selection in Permutation Entropy Using Topological Data Analysis
DDelay Parameter Selection in Permutation Entropy UsingTopological Data Analysis
A Preprint
Audun D. Myers
Department of Mechanical EngineeringMichigan State UniversityEast Lansing, MI [email protected]
Firas A. Khasawneh
Department of Mechanical EngineeringMichigan State UniversityEast Lansing, MI [email protected]
May 14, 2019
Abstract
Permutation Entropy (PE) is a powerful tool for quantifying the predictability of a sequence whichincludes measuring the regularity of a time series. Despite its successful application in a variety ofscientific domains, PE requires a judicious choice of the delay parameter τ . While another parameterof interest in PE is the motif dimension n , Typically n is selected between 4 and 8 with 5 or 6 givingoptimal results for the majority of systems. Therefore, in this work we focus solely on choosing thedelay parameter. Selecting τ is often accomplished using trial and error guided by the expertise ofdomain scientists. However, in this paper, we show that persistent homology, the flag ship tool fromTopological Data Analysis (TDA) toolset, provides an approach for the automatic selection of τ . Weevaluate the successful identification of a suitable τ from our TDA-based approach by comparing ourresults to a variety of examples in published literature. Shannon entropy, which was introduced in 1948 [30], is ameasurement of how much uncertainty there is in futuredata given the current dataset. Since the first introductionof information entropy, several new forms of entropy havebeen popularized. Some examples include approximateentropy [23], sample entropy [26], and permutation en-tropy (PE) [5]. While all of these methods measure thepredictability of a sequence, PE also considers the orderin which the data was received, which is critical for timeseries analysis. PE was first introduced by Bandt andPompe [5] in 2002. Similar to Shannon Entropy, PE isquantified as the summation of the probabilities of a datatype (see Eq. 1), where the data types for PE are motifs(see Fig. 1), which we represent as π . The PE parameters n and τ are used when selecting the motif size and spacing,respectively. More specifically, τ is the embedding delaylag applied to the series and n is a natural number thatdescribes the dimension of the motif. In this study we focuson selecting τ since the dimension is typically chosen inthe range 3 < n ≤ τ = n = τ = n ∈ [ , ] for logisticmaps, and Frank et al. [9] suggest τ = n ∈ [ , ] for heart rate applications. One main disadvantage ofusing suggested parameter values for an application is thehigh dependence of PE on the sampling frequency. Asan example, Popov et al. [24] showed the importance ofconsidering the sampling frequency when selecting τ for anEEG signal. Another limitation is the need for applicationexpertise in order to determine the needed parameters. Thiscan hinder using PE in new applications that have not beensufficiently explored. Consequently, there is a need foran automatic, application-independent parameter selectionalgorithm for PE.To some extent, some studies have attempted to find robustand application-independent methods for determining τ .Many of these methods are based on phase space recon-struction using Takens embedding [31], which has corre-sponding parameters. Some of the common approachesfor determining τ are mutual information [10], autocorre-lation [12], and phase space methods [6]. However, due tothe origin of these methods in phase space reconstructionand not PE, they may result in a poor estimate of τ . a r X i v : . [ phy s i c s . d a t a - a n ] M a y preprint - May 14, 2019In this paper, we present two novel approaches for finding τ based on tools from Topological Data Analysis (TDA).Specifically in the first approach we compute the 0-D sub-level set persistence of the Fourier spectrum to obtain asummary of how the different components in the spectrumare connected. We then use the modified z -score fromstatistics to acquire a threshold for a cutoff frequency whichcaptures the most dominant frequencies in the spectrumin an automatic way. We then utilize Nyquist’s samplingtheorem to find an appropriate τ value. Our second ap-proach utilizes sliding window and 1-D persistence scoring(SW1PerS) [21] to measure how periodic or significantthe circular shape is for the embedded point cloud. UsingSW1Pers, we search for the window size that maximizes aperiodicity score, and then use the resulting window sizeto suggest an embedding delay for PE. To determine the vi-ability of our two methods, PE parameters generated basedon them are compared to expert suggested parameters.The paper is organized as follows. In Section 1.1 wedescribe PE and show its computation using a simpleexample. Section 2 provides an overview of the toolsthat we use from TDA. Section 3.2 describes our firstapproach for finding τ which is a frequency based approachthat applies 0-D sublevel set persistence and the modified z -score to the Fourier spectrum. Section 3.3 details thesecond TDA approach for finding τ which is a time-domainapproach that combines sliding window embedding of thetime series with 1-D persistence. The results are thenpresented in Section 4, and the concluding remarks aremade in Section 5. Permutation entropy H ( n ) for motif dimension n is calcu-lated according to [5] H ( n ) = − (cid:213) p ( π i ) log p ( π i ) , (1)where p ( π i ) is the probability of a permutation π i , and H ( n ) has units of bits when the logarithm is of base 2.The permutation entropy parameters τ and n are usedwhen selecting the motif size: τ is the number of timesteps between two consecutive points in a uniformly sub-sampled time series, and n is the permutation length or motifdimension. Using a set X and an element of the set x i ∈ X ,we can define the vector v i = [ x i , x i + τ , x i + τ , . . . , x i + ( n − ) τ ] ,which has the permutation π i . To better understand thepossible permutations, consider an example with thirddegree ( n =
3) permutations. This results in six possiblemotifs as shown in Fig. 1.Next, to further demonstrate PE with an example, considerthe sequence X = [ , , , , , , ] with third order per-mutations n = τ =
1. The sequence can bebroken down into the following permutations: two ( , , ) permutations, one ( , , ) permutation, and two ( , , ) permutations for a total of 5 permutations. Applying Eq. (1)yields H ( ) = −
25 log 25 −
25 log 25 −
15 log 15 = .
522 bits . (0,1,2) (0,2,1) (1,0,2)(2,0,1) (1,2,0) (2,1,0) Figure 1: All possible permutation configurations (motifs)for n = 3, where [ π . . . π ] = [( , , ) . . . ( , , )] .The permutation distribution can be visually understood byillustrating the probabilities of each permutation as separatebins. To accomplish this, Fig. 2 was created by taking thesame series X (Fig. 2 a) and placing the abundance of eachpermutation into its respective bin (Fig. 2 b). PE is at aFigure 2: Abundance of each permutation from exampledataset X .maximum when all n ! possible permutations are evenlydistributed or, equivalently, when the permutations areequiprobable with p ( π i ) = p ( π ) , p ( π ) , . . . , p ( π n ! ) = n ! .From this, the maximum permutation entropy H max isquantified as H max ( n ) = − (cid:213) p ( π i ) log p ( π i ) = − log 1 n ! = log n ! . (2)Applying Eq. (2) for n = n !, the normalized permutation entropy iscalculated as h n = − n ! (cid:213) p ( π i ) log p ( π i ) . (3)2 preprint - May 14, 2019Applying Eq. (3) to the example series X results in h ≈ . Our TDA-based approaches for finding the delay dimensionemploy two types of persistence applied to two differenttypes of data. Specifically, in the first approach we combinethe 0-D sublevel persistence of one-dimensional time serieswith the z -score, while in the second approach we utilize1-D persistent homology on embedded time series as partof the SW1PerS framework in Section 3.3. This sectionprovides a basic background of the topics needed to atleast intuitively understand the subsequent analysis. Morespecifics can be found in [7, 8, 11, 20]. An abstract k -simplex σ is defined as a set of k + ( σ ) = k . If we apply a geometric interpretationto a k -simplex, we can think of it as a set V of k + K is a set of simplices σ ⊆ V such that for every σ ∈ K , all the faces of σ , i.e., allthe lower dimensional component simplices σ (cid:48) ⊂ σ arealso in K . For example, if a triangle (2-simplex) is in asimplicial complex K , then so are the edges of the triangle(1-simplices) as well as all the nodes in the triangle (0-simplices). The dimension of the resulting simplicialcomplex is given by the largest dimension of its simplicesaccording to dim ( K ) = max σ ∈ K dim ( σ ) . The n -skeletonof a simplicial complex K ( n ) is the restriction of the latterto its simplices of degree at most n , i.e., K ( n ) = { σ ∈ K | dim ( σ ) ≤ n } .Given an undirected graph G = ( V , E ) where V are thevertices and E are the edges, we can construct the clique(or flag) complex K ( G ) = { σ ⊆ V | u v ∈ E for all u (cid:44) v ∈ σ } . If we fix a simplicial complex K , then homology groupscan be used to quantify the holes of the structure in differentdimensions. For example, in dimension 0, the rank of the0 dimensional homology group H ( K ) is the number ofconnected components. The rank of the 1-dimensional ho-mology group H ( K ) is the number of loops, while the rankof H ( K ) is the number of voids, and so on. The homologygroups are constructed using linear transformations termedboundary operators.To describe boundary operators, we first define the orientedsimplicial complex as the simplicial complex obtained byan ordered set of vertices σ = [ v , . . . , v k ] . Permutingtwo indices in σ gives the opposite simplex according to [ v , . . . , v i , . . . , v j , . . . , v k ] = −[ v , . . . , v j , . . . , v i , . . . , v k ] .Now let { α σ } be coefficients in a field F (in this paper wechoose F = Z ). Then K ( n ) , the n -skeleton of K , can beused as a generating set of the F -vector space ∆ n ( K ) . Inthis representation, any element of ∆ n ( K ) can be writtenas a linear combination (cid:205) σ ∈ K ( n ) α σ σ . Further, elementsin ∆ n ( K ) are added by adding their coefficients. A finiteformal sum of the n -simplices in K is called an n -chain,and the group of all n -chains is the n th chain group ∆ n ( K ) ,which is a vector space.Given a simplicial complex K , the boundary map ∂ n : ∆ n ( K ) → ∆ n − ( K ) is defined by ∂ n ([ v , . . . , v n ]) = n (cid:213) i = (− ) i [ v , . . . , ˆ v i , . . . , v n ] , where ˆ v i denotes the absence of element v i from the set.This linear transformation maps any n -simplex to the sumof its codimension 1 (codim-1) faces. The geometricinterpretation of the boundary operator is that it yields theorientation-preserved boundary of a chain.By combining boundary operators, we obtain the chaincomplex . . . ∂ n + −−−→ ∆ n ( K ) ∂ n −−→ . . . ∂ −−→ ∆ ( K ) ∂ −−→ , where the composition of any two subsequent boundaryoperators is zero, i.e., ∂ n ◦ ∂ n + =
0. A n -chain α ∈ ∆ n ( K ) is a cycle if ∂ n ( α ) =
0; it is a boundary if there is an n + β such that ∂ n + ( β ) = α . Define the kernel ofthe boundary map ∂ n using Z n ( K ) = { c ∈ ∆ n ( K ) : ∂ n c = } , and the image of ∂ n + B n ( K ) = { c ∈ ∆ n ( K ) : c = δ n + c (cid:48) , c (cid:48) ∈ ∆ n + ( K )} . Consequently, we have B k ( K ) ⊆ Z k ( K ) . Therefore, we define the n th homology group of K as the quotient group H n ( K ) = Z n ( K )/ B n ( K ) . In this paper,we only need 0- and 1-dimensional persistent homology,and we always assume homology with Z coefficients whichremoves the need to keep track of orientation. In the case of0-dimensional homology, there is a unique class in H ( K ) for each connected component of K . For 1-dimensionalhomology, there is one homology class in H ( K ) for each hole in the complex. Now, we are interested in studying the structure of achanging simplicial complex. We introduce a real-valuedfiltration function on the simplicies of K such that f ( τ ) ≤ f ( σ ) for all τ ≤ σ simplices in K . If we let { y < y <. . . < y (cid:96) } be the set of the sorted range of f for any y ∈ R ,then the filtration of K with respect to f is the orderedsequences of its subcomplexes ∅ ⊆ K ( y ) ⊆ K ( y ) ⊆ · · · ⊆ K ( y (cid:96) ) = K . The sublevel set of K corresponding to y is defined as K ( y ) = { σ ∈ K | f ( σ ) ≤ y } , (4)where each of the resulting K ( y ) is a simplicial complex,and for any y ≤ y , we have K ( y ) ⊆ K ( y ) .3 preprint - May 14, 2019The filtration of K enables the investigation of the topolog-ical space under multiple scales of the output value of thefiltration function f . In this paper we consider two differentfiltration functions where each of these functions is appliedto a different type of data: 0-D persistence applied to 1-Dtime series, and 1-D persistence applied to point cloudsembedding in R n .0 -D persistence applied to -D time series: Let χ bethe time-ordered set of the critical values of a time series.Here, we can think of the simplicial complex K = G ( V , E ) containing a number of vertices | V | equal to the number ofcritical values in the time series and only the edges E thatconnect adjacent vertices. i.e., vertices { v i | ≤ i ≤ n } ,and edges { v i v i + | ≤ i ≤ n − } . Therefore, we have aone-to-one correspondence between the critical values in χ and the vertices of the simplicial complex V . We definethe filtration function for every face σ in K according to f ( σ i ) = (cid:26) χ i if σ i ∈ V , max ( u , v ) if σ i = u v ∈ E . Using this filtration function in Eq. (4), we can de-fine an ordered sequence of subcomplexes where y ∈[ min ( χ ) , max ( χ )] .1 -D persistence applied to point clouds in R m : Thesecond set of data points we work with is a point cloud P ⊂ R m . There are several methods to embed a timeseries; however, here we embed the time series using delayreconstruction. For each point p ∈ P , let B ( p , r ) be theball centered at p and of radius r . Now, for some radius r ,we can build a simplicial complex where, for example, theintersection of any two balls adds an edge, the intersectionof three balls adds a triangle, and higher dimensionalanalogs are added similarly, see the construction for r through r in Fig. 4. The result of the union of all theballs ∪ p ∈ P B ( p , r ) is called the Čech complex; althoughin practice we work with Rips complexes which are theclique complexes of the function f and they are easier toobtain computationally while still capturing the interestingfeatures of the space. The filtration function in this case isgiven by f ( σ ) = max { u , v }⊆ σ d ( u , v ) , where d ( u , v ) is the distance between vertices u and v , andthe distance between a vertex and itself is zero. Persistent homology is a tool from topological data analysiswhich can be used to quantify the shape of data. Themain idea behind persistent homology is to watch how thehomology changes over the course of a given filtration.Fix a dimension n , then for a given filtration K ⊆ K ⊆ · · · ⊆ K N we have a sequence of maps on the homology H n ( K ) → H n ( K ) → · · · → H n ( K N ) . We say that a class [ α ] ∈ H n ( K i ) is born at i if it is not inthe image of the map H n ( K i − ) → H n ( K i ) . The same classdies at j if [ α ] (cid:44) H n ( K j − ) but [ α ] = H n ( K j ) .This information can be used to construct a persistencediagram as follows. A class that is born at i and dies at j is represented by a point in R at ( i , j ) . The collectionof the points in the persistence diagram, therefore, give asummary of the topological features that persists over thedefined filtration. See the example of Fig. 3 for n = n = Delay embedding is a uniform subsampling of the originaltime series according to the embedding parameter τ . Forexample, the subsampled sequence X with elements { x i : i ∈ N ∪ } subject to the delay τ is defined as X ( τ ) = [ x , x τ , x τ , . . . ] . Riedl et al. [27] showed that PE is sensitiveto the time delay, which prompts the need for a robustmethod for determining an appropriate value for τ . Forestimating the optimal τ , we will be investigating thefollowing methods in the subsequent sections: MutualInformation (MI) in Section 3.1, combining Fourier analysiswith 0-D persistence and the modified z -score (Section3.2), and Sliding Window for One-Dimensional PersistenceScoring (SW1PerS) (Section 3.3). We recognize, but donot investigate, some other commonly used methods forfinding τ . These include the autocorrelation function [12]and the phase space expansion [6]. Mutual information is a measurement of how much infor-mation is shared between two sequences, and it was firstrealized by Shannon et al. [29] as I ( X ; Y ) = (cid:213) x ∈ X (cid:213) y ∈ Y p ( x , y ) log p ( x , y ) p ( x ) p ( y ) , (5)where X and Y are separate sequences, p ( x ) and p ( y ) arethe probability of the element x and y separately, and p ( x , y ) is the joint probability of x and y . Fraser andSwinney [10] showed that for a chaotic time series themutual information between the sequence x ( t ) and x ( t + τ ) r and dies at r , while the second complex is born at r and dies at a much largerradius r .will decrease as τ increases until reaching a minimum. Atthis delay τ , the individual data points share a minimumamount of information, thus indicating that the data pointsare sufficiently separated. While this delay value wasspecifically developed for phase space reconstruction froma single time series, it is also commonly used for the PEparameter τ . We would like to point out that in general thereis no guarantee that minima exist for the mutual informationfunction, which is a serious limitation for computing τ using this method. -D Persistenceand Modified z -score In this section we present a novel topological data anal-ysis (TDA) based approach for finding the noise floorin the Fourier spectrum. Specifically, we show how the0-dimensional sublevel persistence, a tool from TDA dis-cussed in Section 2, can be used to find the lifetime of themaxima and minima in a signal. We describe a fast andsimple algorithm (Algorithm 1) for computing the lifetimesusing the peaks and valleys of the signal. We then separatethe noise lifetimes from significant lifetimes through theuse of the modified z -score, which allows us to find thenoise floor and frequency cutoff. This frequency cutoffwill later be used to find an embedding delay τ for PE. Theprocess for finding the cutoff frequency is summarized inFig. 5. The following paragraphs give an overview of themodified z -score and the threshold and cutoff analysis. Modified z -score The modified z -score z m is essentialto understanding the techniques used for isolating noisefrom a signal [28]. The standard score, commonly knownas the z -score, uses the mean and the standard deviation ofa dataset to find an associated z -score for each data pointand is defined as z = x − µσ , (6)where x is the dataset, µ is the mean, and σ is the standarddeviation of the dataset, respectively. The z -score valueis commonly used to identify outliers in the dataset byrejecting points that are above a set threshold, which isset in terms of how many standard deviations away from the mean are acceptable. Unfortunately, the z -score issusceptible to outliers itself because both the mean andthe standard deviation are not robust against outliers [16].This led Hampel [13] to develop the modified z -score asan outlier detection method that is insensitive to outliers.The logic behind the modified z -score or median absolutedeviation (MAD) method is grounded on the use of themedian for the mean. The MAD is calculated asMAD = median (| x − ˜ x |) , (7)where x is the dataset and ˜ x is the median of the dataset. TheMAD is substituted for the standard deviation in Eq. (6). Tocomplete the modified z -score, Iglewicz and Hoaglin [14]suggested substituting the mean with the median. Theresulting equation for the modified z -score is z m = . x − ˜ x MAD . (8)Using z m allows for outlier detection with a breakdownpoint of 50% outliers. A threshold for separating noisein the persistence domain is discussed in the followingparagraph. Threshold and Cutoff Analysis
To determine the noisefloor in the normalized Fast Fourier Transform (FFT) spec-trum, we compute the 0-dimensional persistence of the FFT.This provides relatively short lifetimes for the GWN, whilethe prominent peaks, which represent the actual signal,have comparatively long lifetimes or high persistence. Toseparate the noise from the outliers, we apply the modified z -score for each lifetime in the persistence diagram. Wedetermine the modified z -score threshold based on the needto account for approximately all of the noise in the FFT. Todo this, we use a signal of pure GWN with a standard devi-ation of 1.0 and length of 10 . Next, we calculate the FFTof the GWN signal and its 0-D sublevel set persistence toget a set of lifetimes from the persistence diagram. We thenapply modified z -score thresholds ranging from 0 to 5 onthe lifetimes from the persistence diagram to determine thepercent of noise as a function of the threshold. By settinga threshold requirement that accounts for approximatelyall persistence diagram noise, while not being excessivelylarge, a threshold of approximately 4.8 was found as shownin Fig. 6. This threshold was rounded up to 5 for simplicity.5 preprint - May 14, 2019 TimesSeries Fourier Transform 0-DPersistence Modified-score Cutoff/Max. Freq. Embedding Delay
Cutoff
Max Frequency
Birth D e a t h Birth D e a t h Frequency Frequency
Figure 5: Overview of procedure for finding maximum significant frequency using 0-dimensional sublevel set persistenceand the modified z -score for a signal contaminated with GWN.With a suitable modifed z -score threshold, our next goalin finding a the noise floor cutoff begins by investigatingthe distribution of the FFT of GWN in the persistencedomain. To do this, the same GWN signal is analyzed.Again, the FFT and the 0-D persistence were calculatedto determine the lifetimes as well as the births and deathsof the persistence diagram. Figure 7 shows the histogramobtained from the resulting persistence diagram.Figure 6: Percent of the persistence points from 0-Dsublevel set persistence of the FFT of GWN using themodified z -score with the provided threshold ranging from0 to 5.Although there have been studies on pushing forward proba-bility distributions into the persistence domain [1, 2, 15], itis difficult to obtain a theoretical cutoff value in persistencespace. Therefore, we proceed by computing the histogramsfor the 0-D sublevel set and establishing a heuristic cutoffvalue that seems to perform well. The theoretical distri-bution of GWN in the Fourier spectrum has a Rayleighdistribution [25]. Approximately the same distributionappears in the histogram of noise for the birth times ofthe persistence diagram with the following differences:the death times tend to show a slightly shifted normaldistribution, while the lifetimes show approximately afolded-normal distribution. Without doing an in depth sta-tistical analysis of the distributions, we calculate a suitablecutoff using the median of the births and maximum of thelifetimes asCutoff noise = max ( lifetime noise ) + median ( birth noise ) , (9)where Cutoff noise is the suggested noise cutoff and birth noise and lifetime noise is the set of the births and lifetimes thatwere considered noise from the modified z -scores. Figure 7: Distributions of the lifetimes, deaths, and birthsof 0-D sublevel set persistence for the FFT of GWN witha length of 10 , mean µ = .
0, and standard deviation σ = . f max as thehighest frequency in the Fourier spectrum with an amplitudegreater than the specified cutoff from Eq. (9). Using f max ,we suggest a delay following Nyquist’s sampling rate as τ = f s f max , where f s is the sampling frequency. τ using SW1Pers Perea and Harer [22] developed Sliding Window for One-Dimensional Persistence Scoring (SW1PerS) as a TDAmethod for determining the most significant period of atime series. SW1PerS is also useful for finding the windowin which a chaotic time series has the highest periodicity.In this section we develop an algorithm that combinesthe period of the main oscillation derived from SW1PerSwith frequency criterion developed by [18] to determine asuitable delay.SW1PerS uses 1-D persistent homology from Section 2 tomeasure how periodic or significant the circular shape inan embedded dimension (point cloud) is as the embeddingwindow size increases. The sliding window SW is definedas SW m ,τ s f ( t ) = [ f ( t ) , f ( t + τ s ) , . . . , f ( t + m τ s )] , (10)where f ( t ) is the time series being analyzed and τ s and m are, respectively, SW1PerS embedding delay and dimen-sion. Applying Eq. (10) across the range of the time series n T times results in a collection of vectors known as a setof point clouds, which live in an m -dimensional Euclideanspace. The number of sliding windows n T was set to 200 tobe sufficiently high for forming 1-D topological shapes or6 preprint - May 14, 2019simplices. Specifically, we first linearly map the entire timeseries onto the domain [ , π ] . Next, we set the desiredwindow size as w = π mL ( m + ) , (11)where L is the division parameter that segments the entiretime series into seperate windows of size w . For ouranalysis, we varied the L parameter from Eq. (11) to findthe window size which yielded the highest periodicity.Figure 8 shows an example sliding window using theLorenz system described in Section A.2 with ρ =
28. The n n+1 n+2 ...... n n+1 n+2 x yz Figure 8: SW1PerS applied on Lorenz example definedin Section A.2 showing sliding window for time serieslinearly scaled to 2 π and embedded in dimension m = m =
3. Normally, m is determinedbased on the theory developed by Perea et al. [22], whichshowed the necessary value of m for reconstruction isbounded by m ≥ N , where N is the number of Fourierterms necessary for reconstructing the signal. For ourapplication, we will also set m = N . Perea et al. [21]did not choose N based on the data, but rather picked alarge number assuming it would be sufficient to faithfullyrepresent the data. In contrast, in this work we automatechoosing N by approximating the Fourier series using thediscrete Fourier transform, and then computing the (cid:96) normbetween the approximation and the signal to obtain thevalue of N that yields an error within a desired threshold,see Eq. (13). Specifically, if we let the time series X be adiscrete time sampling of a piecewise smooth signal x ( t ) ,then the N -partial sum of the Fourier series of x ( t ) can be approximated according to f N ( t ) = | X | N (cid:213) k = (cid:16) | X | (cid:213) j = X ( j ) e − π ijk / T (cid:17) e π ikt / T , (12)where | X | is the length of the time series X . As a rule ofthumb, N ≈ | X |/ x ( t ) [4]. The relative (cid:96) norm that measures the error betweentime series X and its reconstruction f N ( t ) is given by (cid:96) ( N ) = (cid:16) (cid:205) | X | j = (cid:2) X ( j ) − f N ( j ) (cid:3) (cid:17) / (cid:16) (cid:205) | X | j = X ( j ) (cid:17) / . (13)For our application, we cosider f N ( t ) as sufficiently closeto x ( t ) when we find a value of N for which (cid:96) ( N ) < . m determined as m = N , we then calculate thesliding window delay τ s based on the relationship τ s = w / m .To apply this desired time delay, we interpolate the timeseries using a cubic spline fit. We can then determine thewindow size that provides the greatest periodicity usingthe periodicity score s = − r B − r D , (14)Equation (15) normalizes the scores so that s = s = P of the time series us-ing the L parameter corresponding to the lowest periodicityscore s to calculate P as P = mT ( m + ) L P , (15)where T is the time span of the original time series and L P is the L value corresponding to the the lowest periodicityscore s . To find L P , L was incremented from 1 to L max ,where L max is the maximum number of periods to considerin the time series. Figure 9 demonstrates how the shape ofthe data changes with the time delay τ s , which changes thewindow size w . Additionally, Fig. 9 shows how a minimumperiodicity score occurs when the embedded data forms a1-D simplex with the longest lifetime. Using the periodof the time series P , we compute a suggested embeddingdelay according to τ = f s P α , (16)where f s is the original sampling frequency and α ∈ [ , ] is based on the frequency criteria from [18]. For this paperwe set α = To verify our TDA-based methods for determining τ , Ta-ble 1 compares our results to the values suggested by expertsusing a variety of systems including the ones listed by Riedlet al. [27]. The table highlights the automatically computed τ which best matches the expert-identified values for τ .7 preprint - May 14, 2019Figure 9: Example showing the shape of the embedded time series as a function of the window size: (left) w Small istoo small revealing elliptical shape and high periodicity score s , (middle) w Medium is properly sized and results in aminimum periodicity score s , and (right) w Medium is too large and results in a high periodicity score s .The tables shows that for GWN, all the considered methodsaccurately calculate an embedding delay of 1. However,none of the methods gave reliable results for both differenceequations (Henon and the logistic map). Specifically, thefrequency approach based on 0-D persistence provided thebest results with τ =
1, while the MI method yielded thebest result for the Logistic map. While the 0-D persistenceapproach slightly underestimated τ for these systems, wedo not suggest using SW1PerS for difference equationssince it results in considerably inaccurate suggestions forthe delay parameter.Although Table 1 shows that there is no one method thatperforms well for all types of signals, in the followingwe provide recommendations for which method to usewith chaotic signals, periodic signals, and EEG/ECG data.For chaotic differential equations, we suggest using thedelay calculated using the frequency approach with a 0-D-persistence noise filter as this method consistently providedthe most accurate time delays. For periodic systems, wesuggest using the τ found using either SW1PerS or the 0-Dpersistence approach, but to avoid using the MI results.For EEG/ECG data, we suggest using either SW1PerS orthe 0-D persistence approach as they consistently providemore accurate values of τ in comparison to the MI method. In this paper we describe two novel TDA-based methodsfor automatically determining the PE delay parameter τ given a sufficiently sampled/oversampled time series. Oneof the methods is a frequency domain approach based onthe modified z -score and the 0-D sublevel set persistence.The other methods is based on SW1PerS, and it operateson a sliding window embedding of the time-domain signal. Both of these methods were compared to the standard MIapproach of Fraser and Swinney [10].For the frequency approach, we developed an automaticalgorithm for finding the maximum significant frequencyusing a cutoff greater than the additive Gaussian noise floor.We find the noise floor using 0-D sublevel set persistencewith noise separated using the modified z -score.We applied these methods to various categories includingdifference equation, chaotic differential equations, periodicsystems, EEG/ECG data, and Gaussian noise. We thencompared the generated parameters to values suggestedby experts to determine which methods consistently foundaccurate values for τ , see Table 1. We showed that no singlemethod provides accurate values of τ for every application,but each method showed better performance depending onthe type of the studied system.Specifically, for pure white noise, all the methods produceda viable estimate for τ . However for chaotic systems,the 0-D persistence approach yielded the most consistentresults. For periodic time series, both TDA-based methodoutperforms their MI counterparts.For EEG/ECG data, the lowest bound for either SW1PerSor the 0-D gave better estimates for τ in comparison to theMI approach. However, for difference equations, no onemethod consistently provided accurate values for τ with theSW1PerS approach giving large overestimates. For thesesystems, although the frequency approach based on 0-Dpersistence underestimated τ , it came closest to providingthe best results for Hénon map. On the other hand, theMI method yielded the best result for the Logistic map.Therefore, if TDA-methods are used for finding the delayparameter for difference equations, we recommend avoidingthe SW1PerS approach in favor of the 0-D persistenceapproach.8 preprint - May 14, 2019Table 1: A comparison between the calculated and suggested values for the delay parameter τ . The shaded cellshighlight the methods that yielded the closest match to the suggested delay. MethodCatagory System MI SW1PerS 0-D Persistenceof FFT SuggestedDelay τ Ref.
Noise White Noise 1 1 1 1 [27]Lorenz 10 9 8 10 [27]Rossler 13 22 7 9 [32]Bi-directionalRossler 11 7 12 15 [27]ChaoticDifferentialEquation Mackey-Glass 7 10 3 1-to 700 [27]PeriodicSystem Sine Wave 2 22 21 20 [32]Logistic Map 4 15 1 1 to 5 [33]NonlinearDifference Eq. Henon Map 9 90 1 1 to 2 [27]ECG 5 11 12 1 to 4 [27]MedicalData EEG 6 2 5 1 to 3 [27]
Acknowledgment
FAK acknowledges the support of the National Sci-ence Foundation under grants CMMI-1759823 and DMS-1759824.
References [1] Robert J. Adler, Omer Bobrowski, Matthew S. Bor-man, Eliran Subag, and Shmuel Weinberger. Persis-tent homology for random fields and complexes. In
Institute of Mathematical Statistics Collections , pages124–143. Institute of Mathematical Statistics, 2010.[2] Robert J. Adler, Omer Bobrowski, and Shmuel Wein-berger. Crackle: The homology of noise.
Discrete &Computational Geometry , 52(4):680–704, aug 2014.[3] Ralph G Andrzejak, Klaus Lehnertz, Florian Mor-mann, Christoph Rieke, Peter David, and Christian EElger. Indications of nonlinear deterministic andfinite-dimensional structures in time series of brainelectrical activity: Dependence on recording regionand brain state.
Physical Review E , 64(6):061907,2001.[4] Nakhlé H Asmar.
Partial differential equations withFourier series and boundary value problems . CourierDover Publications, 2016.[5] Christoph Bandt and Bernd Pompe. Permutationentropy: a natural complexity measure for time series.
Physical review letters , 88(17):174102, 2002.[6] Th Buzug and G Pfister. Optimal delay time andembedding dimension for delay-time coordinates byanalysis of the global static and local dynamicalbehavior of strange attractors.
Physical review A ,45(10):7073, 1992.[7] Gunnar Carlsson. Topology and data.
Bulletin ofthe American Mathematical Society , 46(2):255–308,January 2009. Survey.[8] Herbert Edelsbrunner and John L. Harer.
Computa-tional topology: an introduction . American Mathe-matical Society, 2009. [9] Birgit Frank, Bernd Pompe, Uwe Schneider, andDirk Hoyer. Permutation entropy improves fetal be-havioural state classification based on heart rate anal-ysis from biomagnetic recordings in near term fetuses.
Medical and Biological Engineering and Computing ,44(3):179, 2006.[10] Andrew M Fraser and Harry L Swinney. Indepen-dent coordinates for strange attractors from mutualinformation.
Physical review A , 33(2):1134, 1986.[11] Robert Ghrist. Barcodes: The persistent topology ofdata.
Builletin of the American Mathematical Society ,45:61–75, 2008. Survey.[12] Peter Grassberger and Itamar Procaccia. Measuringthe strangeness of strange attractors.
Physica D:Nonlinear Phenomena , 9(1-2):189–208, 1983.[13] Frank R Hampel. The influence curve and its role inrobust estimation.
Journal of the american statisticalassociation , 69(346):383–393, 1974.[14] Boris Iglewicz and David Hoaglin.
Volume 16: how todetect and handle outliers, The ASQC basic referencesin quality control: statistical techniques, Edward F.Mykytka . PhD thesis, Ph. D., Editor, 1993.[15] Matthew Kahle and Elizabeth Meckes. Limit theo-rems for betti numbers of random simplicial com-plexes.
Homology, Homotopy and Applications ,15(1):343–374, 2013.[16] Christophe Leys, Christophe Ley, Olivier Klein,Philippe Bernard, and Laurent Licata. Detecting out-liers: Do not use standard deviation around the mean,use absolute deviation around the median.
Journalof Experimental Social Psychology , 49(4):764–766,2013.[17] Duan Li, Zhenhu Liang, Yinghua Wang, SatoshiHagihira, Jamie W Sleigh, and Xiaoli Li. Parameterselection in permutation entropy for an electroen-cephalographic measure of isoflurane anesthetic drugeffect.
Journal of clinical monitoring and computing ,27(2):113–123, 2013.[18] Michał Melosik and W Marszalek. On the 0/1 test forchaos in continuous systems.
Bulletin of the Polish
Academy of Sciences Technical Sciences , 64(3):521–528, 2016.[19] George B Moody and Roger G Mark. The impact ofthe mit-bih arrhythmia database.
IEEE Engineering inMedicine and Biology Magazine , 20(3):45–50, 2001.[20] S. Y. Oudot.
Persistence theory: from quiver rep-resentations to data analysis , volume 209 of
AMSMathematical Surveys and Monographs . AmericanMathematical Society, 2015.[21] Jose A Perea, Anastasia Deckard, Steve B Haase,and John Harer. Sw1pers: Sliding windows and 1-persistence scoring; discovering periodicity in geneexpression time series data.
BMC bioinformatics ,16(1):257, 2015.[22] Jose A Perea and John Harer. Sliding windows andpersistence: An application of topological methodsto signal analysis.
Foundations of ComputationalMathematics , 15(3):799–838, 2015.[23] Steven M Pincus. Approximate entropy as a measureof system complexity.
Proceedings of the NationalAcademy of Sciences , 88(6):2297–2301, 1991.[24] Anton Popov, Oleksii Avilov, and Oleksii Kanaykin.Permutation entropy of eeg signals for different sam-pling rate and time lag combinations. In
Signal Pro-cessing Symposium (SPS), 2013 , pages 1–4. IEEE,2013.[25] Mark A Richards. The discrete-time fourier transformand discrete fourier transform of windowed stationarywhite noise.
Georgia Institute of Technology, Tech.Rep , 2013.[26] Joshua S Richman and J Randall Moorman. Physio-logical time-series analysis using approximate entropyand sample entropy.
American Journal of Physiology-Heart and Circulatory Physiology , 278(6):H2039–H2049, 2000.[27] Müller Riedl, A Müller, and N Wessel. Practicalconsiderations of permutation entropy.
The EuropeanPhysical Journal Special Topics , 222(2):249–262,2013.[28] Songwon Seo.
A review and comparison of methodsfor detecting outliers in univariate data sets . PhDthesis, University of Pittsburgh, 2006.[29] Claude E Shannon, Warren Weaver, and Arthur WBurks. The mathematical theory of communication.1951.[30] Claude Elwood Shannon. A mathematical theory ofcommunication.
ACM SIGMOBILE mobile comput-ing and communications review , 5(1):3–55, 2001.[31] Floris Takens. Detecting strange attractors in turbu-lence. In
Dynamical systems and turbulence, Warwick1980 , pages 366–381. Springer, 1981.[32] Mei Tao, Kristina Poskuviene, Nizar Alkayem,Maosen Cao, and Minvydas Ragulskis. Permutationentropy based on non-uniform embedding.
Entropy ,20(8):612, 2018.[33] Hong Zhang and Xuncheng Liu. Analysis of pa-rameter selection for permutation entropy in logisticchaotic series. In
Intelligent Transportation, Big Data& Smart City (ICITBS), 2018 International Confer-ence on , pages 398–402. IEEE, 2018.
A Summary of the used data and models
A.1 Gaussian White Noise
The white noise was generated using NumPy’s ran-dom.normal function with a standard deviation of 1 . A.2 Lorenz System
The Lorenz system used is defined as dxdt = σ ( y − x ) , d y dt = x ( ρ − z ) − y , dzdt = x y − β z . (17)The Lorenz system had a sampling rate of 100 Hz withparameters σ = . β = . / .
0, and ρ =
95. Thissystem was solved for 100 seconds and the last 24 secondswere used.
A.3 Rössler System
The Rössler system used was defined as dxdt = − y − z , d y dt = x + a y , dzdt = b + z ( x − c ) , (18)with parameters of a = . b = . c =
14, which wassolved over 400 seconds with a sampling rate of 100 Hz.Only the last 2000 data points of the x-solution were usedin the analysis.
A.4 Bi-Directional Coupled Rössler System
The Bi-directional Rössler system is defined as dx dt = − w y − z + k ( x − x ) , d y dt = w x + . y , dz dt = . + z ( x − ) , dx dt = − w y − z + k ( x − x ) , d y dt = w x + . y , dz dt = . + z ( x − ) , (19)with w = . w = .
95, and k = .
05. This was solvedfor 4000 seconds with a sampling rate of 10 Hz. Only thelast 400 seconds of the x-solution were used in the analysis.
A.5 Mackey-Glass Delayed Differential Equation
The Mackey-Glass Delayed Differential Equation is definedas x ( t ) = − γ x ( t ) + β x ( t − τ ) + x ( t − τ ) n (20)with τ = β = γ =
1, and n = .
65. This was solvedfor 400 seconds with a sampling rate of 100 Hz. Only thelast 300 seconds of the x-solution were used in the analysis.10 preprint - May 14, 2019
A.6 EEG Data
The EEG signal was taken from andrzejak et al. [3]. Specif-ically, the first 2000 data points from the EEG data of ahealthy patient from set A, file Z-093 was used.
A.7 ECG Data
The Electrocardoagram (ECG) data was taken from SciPy’smisc.electrocardiogram data set. This ECG data was origi-nally provided by the MIT-BIH Arrhythmia Database [19].We used data points 3000 to 4500 during normal sinusrhythm.
A.8 Logistic Map
The logistic map was generated as x n + = r x n ( − x n ) , (21)with x = . r = .
95. Equation 21 was solved forthe first 500 data points.
A.9 Hénon Map
The Hénon map was solved as x n + = − ax n + y n , y n + = bx n , (22)where b = . x = . y = .
3, and a = .
4. Thissystem was solved for the first 500 data points of thex-solution.
B Algorithms
B.1 Fast 0D persistence algorithm for 1D dataB.2 SW1PerS Algorithm for the Embedding Delay τ The procedure we described in Section 3.3 is summarizedin algorithm 2. This algorithm can be used to automaticallydetermine a suitable delay τ using SW1PerS. Result:
Persistence Diagram: Birth and Deaths Times begin initialize time series;find locations and values of minima and maxima;store locations and values in matrix ( M minmax ); while M minmax has values do increment i;difference = maxima values - minima values;find index of smallest difference ( I min );peak value ( P v ) = M minmax of I min for max valuecolumn;peak index ( P i ) = M minmax of I min for max indexcolumn;valley value ( V v ) = M minmax of I min for min valuecolumn;valley index ( V i ) = M minmax of I min for min indexcolumn;persistence diagram point[i] = [valley value, peakvalue];remove P v , P i , V v , V i from M minmax ; end return persistence diagram points; endAlgorithm 1: Zero Dimensional Persistence Algorithm.
Result: τ begin set n T = 200;calculate L max using max significant frequency;calculate m using FFT reconstruction to 10% errorthreshold;linearly map time series onto the domain [ , π ] ;set L = while L < L max do calculate window size as w = π mL ( m + ) ;calculate τ s as τ s = w / m ;generate interpolated function for time seriesusing cubic spline at τ s intervals;create point cloud using n T sliding windows as SW m ,τ f ( t ) ;apply persistence homology on SW point cloud;calculate periodicity score as s ( L ) = − r B − r D ;increment L ; end calculate L at minimum score as L P ;Find length of series as T ;calculate main period of time series as P = mT ( m + ) L P ;set α between 2 and 4;calculate τ = f s P α ;return τ ; end Algorithm 2: SW1PerS algorithm for ττ