Calculating permutation entropy without permutations
From permutation to arithmetic entropy
A. K. Vidybida ∗
Bogolyubov Institute for Theoretical Physics
14-b Metrolohichna str., Kyiv, 03143, Ukraine
April 24, 2020
Abstract
A method for analyzing sequential data sets, similar to the permutation entropy one, is discussed. The characteristic features of this method are as follows: it preserves information about equal values in the embedding vectors; it is free of combinatorics; it delivers the same entropy value as the permutation method does, provided the embedding vectors have no equal components. If they do, the obtained entropy is greater than the permutation one.
Keywords: permutation entropy, equal values, symbolization
Due to technical progress in the areas of sensors and storage devices, a huge amount of raw data about the time course of different processes, such as ECG, EEG, climate data recordings, and stock market data, has become available. These data are redundant. Data processing and classification, aimed at extracting characteristics meaningful for a nonspecialist, is based on reducing the excess of redundancy. As a result, new data are obtained, small in size and digestible by a human being. Examples of such reduced data for time series are the mean value, variance, Lyapunov exponents, correlation dimension, attractor dimension and others.

A remarkable method suitable for reducing the excess of redundancy in time series was proposed by Ch. Bandt and B. Pompe in [1] and is known as permutation entropy. This method is simple and transparent, is robust with respect to monotone distortions of the raw data, and is suitable for estimating the dynamical complexity of the underlying process. Many interesting results, e.g. [2, 3, 4], have been obtained with straightforward application of the permutation entropy methodology in its initial form, as it is described in ...

∗ http://vidybida.kiev.ua

... $D$, where $D$ is the embedding dimension. In the case of no ties (no equal components in the embedding vectors) the technique is equivalent to the standard permutation entropy methodology.

Imagine that while observing a process one obtains a finite sequence

$$X = \{X_0, X_1, \dots, X_{N-1}\}, \quad X_i \in \mathbb{R}, \quad i = 0, 1, 2, \dots, N-1, \tag{1}$$

of measurements. By choosing the embedding dimension $D < N$, the data (1) can be embedded into a $D$-dimensional space by picking out consecutive $D$-tuples from $X$. As a result, a sequence of $D$-dimensional embedding vectors is obtained:

$$V = \{V_0, V_1, \dots, V_{N-D}\}, \quad V_i \in \mathbb{R}^D, \quad i = 0, 1, 2, \dots, N-D, \tag{2}$$

where each vector has the following form:

$$
\begin{aligned}
V_0 &= \{X_0, X_1, X_2, \dots, X_{D-1}\}, \\
&\dots, \\
V_i &= \{X_i, X_{i+1}, X_{i+2}, \dots, X_{i+D-1}\}, \\
&\dots, \\
V_{N-D} &= \{X_{N-D}, X_{N+1-D}, \dots, X_{N-1}\}.
\end{aligned} \tag{3}
$$

An additional parameter of the embedding procedure is the delay $\tau = 1, 2, \dots$. In the above definition, we put $\tau = 1$ for simplicity. With $\tau \neq 1$ one would have $V_i = \{X_i, X_{i+\tau}, X_{i+2\tau}, \dots, X_{i+(D-1)\tau}\}$ instead of (3).

The data represented in (2) and/or (3) are even more redundant than those represented in (1) since, for $D \ll N$, most data values from (1) are represented in (3) $D$ times. In the permutation entropy technique [1], each embedding vector from (2) and/or (3) is replaced with a permutation $\pi$ of the $D$ integers $\{0, 1, \dots, D-1\}$, which is defined by the order pattern of the values composing the vector. For any embedding vector $V = \{x_0, x_1, \dots, x_{D-1}\}$ the permutation $\pi$ which symbolizes it is calculated as follows. Arrange all components of $V$ either in descending order [11]:

$$V = \{x_0, x_1, \dots, x_{D-1}\} \to V_\pi = \{x_{r_0}, x_{r_1}, \dots, x_{r_{D-1}}\}, \quad x_{r_0} > x_{r_1} > \dots > x_{r_{D-1}}, \tag{4}$$

or in ascending order [12]:

$$V = \{x_0, x_1, \dots, x_{D-1}\} \to V_\pi = \{x_{r_0}, x_{r_1}, \dots, x_{r_{D-1}}\}, \quad x_{r_0} < x_{r_1} < \dots < x_{r_{D-1}}. \tag{5}$$

(See footnote 1.) The permutation $\pi$ which corresponds to $V$ is obtained as the row of indexes in the rearranged vector $V_\pi$ from either (4) or (5):

$$\pi \equiv \pi(V) = \{r_0, r_1, \dots, r_{D-1}\}. \tag{6}$$

From the sequence of embedding vectors $V$, calculate a new sequence $\Pi$ of order patterns by replacing each vector in (2) by the corresponding permutation:

$$\Pi = \{\pi_0, \pi_1, \dots, \pi_{N-D}\}. \tag{7}$$

Now, the empirical probability of each permutation, $p(\pi_i)$, can be obtained by dividing the number of occurrences of $\pi_i$ in $\Pi$ by the total number of elements in $\Pi$. The permutation entropy of $V$ is the Shannon entropy of the probability distribution $p(\pi_i)$:

$$H(V) \equiv H(\Pi) = -\sum_{i=0}^{K-1} p(\pi_i) \log(p(\pi_i)), \tag{8}$$

where $K$ is the number of different permutations in $\Pi$.

Equal values in an embedding vector are, to an extent, inconvenient.
Indeed, if $x_r = x_s$ for some $0 \le r, s < D$ in a vector $V = \{x_0, x_1, \dots, x_{D-1}\}$, then $r$ and $s$ should be placed side by side in the permutation (6), but which one should go first? Due to the sameness of the values, it is impossible to uniquely determine a corresponding permutation without introducing additional rules. In some cases the possibility of equal values can be ignored due to their low probability. This is reasonable when the embedding dimension is low and/or chaotic process data are recorded with high precision, see [1, 13, 14]. If equal values are inevitable, the following rule is applied:

$$\text{if } x_s = x_r \text{ and } s > r, \text{ then } s \text{ goes first.} \tag{9}$$

The rule (9) has a different meaning depending on whether convention (4) or (5) is used. Namely, in the case of (4), an embedding vector with all components equal will be equivalent to a vector with monotonically ascending components. If (5) is adopted, then that same vector will be equivalent to a vector with monotonically descending components, see Fig. 1.

Figure 1: In the standard symbolization, a sequence of same values (left) is equivalent either to an ascending (center) or a descending (right) sequence, if either the (4) or the (5) convention is used.

Footnote 1: Actually, in (4), (5) equal values (ties) are admitted as well. Here, we exclude such a possibility for the sake of clarity. The equal values are discussed in the next section.

Without knowing the real system, it is not clear which case is better and whether it is good or bad to label a sequence of same values as being decreasing or increasing. Actually, the permutation symbolization technique is aimed at reducing redundancy. Discrimination between constant and either increasing or decreasing sequences of data may appear to be excessive in some cases, especially when equal values are not observed.
On the other hand, when a system generating data has only a few possible outputs, or the data were subjected to a crude quantization, or the embedding dimension is large, it may be useful if the presence of equal values in the embedding vector results in an order pattern preserving this fact. One possible approach to do this is discussed in the next section.

The following symbolization is aimed at keeping information about equal values in embedding vectors. Having a vector $V = \{x_0, x_1, \dots, x_{D-1}\}$, construct a sequence of integers $\alpha$:

$$V = \{x_0, x_1, \dots, x_{D-1}\} \to \alpha = \alpha(V) = \{a_0, a_1, \dots, a_{D-1}\} \tag{10}$$

by the following rule. Find the smallest component, $c_0$, in $V$. If $c_0$ is found at places $r_0, r_1, \dots$, put the number 0 at those places in $\alpha(V)$. Find the next smallest component $c_1$, $c_1 > c_0$, in $V$. If $c_1$ is found at places $s_0, s_1, \dots$, put the number 1 at those places in $\alpha$. Proceed this way until all components of $V$ are processed; then all $D$ components of $\alpha$ will be determined. The $\alpha$ obtained this way is used as a symbol of the embedding vector $V$.

For example, consider V = { , , , , }. The corresponding symbol, or the order pattern, is α = { , , , , }. Here, information about equal values and their positions is preserved.

If $V$ has no equal components, it can be shown that $\alpha = \pi^{-1}$. This means that $\alpha$ is the inverse of the permutation obtained for $V$ if convention (5) is used. Since the correspondence between permutations and their inverses is one-to-one, it does not matter which one, $\pi$ or $\alpha$, is used for calculating entropy. This further means that for a data set and embedding method which do not deliver equal values in the embedding vectors, the symbolization used here is equivalent to the permutation one for the purpose of calculating entropy.

Suppose that the embedding vector $V$ in (10) has exactly $d$ unequal components, where $d \le D$. In this case, the corresponding symbol $\alpha(V)$ will be a sequence of $D$ numbers chosen from the set $\mathbf{d} = \{0, 1, \dots, d-1\}$ in such a way that no element of $\mathbf{d}$ is missed.
The latter can be formulated as the following condition:

$$\bigwedge_{b \in \mathbf{d}} b \in \alpha(V). \tag{11}$$

The sequence $\alpha(V)$ can be considered as a single integer $A(V)$ in a base-$D$ positional numeral system (see footnote 3), with digits $a_{D-1} a_{D-2} \dots a_0$:

$$A(V) \equiv A = a_0 + a_1 D + a_2 D^2 + \dots + a_{D-1} D^{D-1}. \tag{12}$$

It is clear that there is a one-to-one correspondence between order patterns $\alpha$ and integers obtained as shown in (12). Therefore, a sequence of order patterns, constructed as described in Sec. 3.1, can be replaced with a sequence $\mathcal{A}$ of integers obtained as shown in (12):

$$\mathcal{A} = \{A_0, A_1, \dots, A_{N-D}\}, \quad \text{where } A_i \equiv A(V_i). \tag{13}$$

The empirical probabilities $p(A_i)$ to find an integer $A_i$ among those in $\mathcal{A}$ can be calculated as usual, and we have for the arithmetic entropy:

$$H_a(V) \equiv H_a(\mathcal{A}) = -\sum_{i=0}^{L-1} p(A_i) \log(p(A_i)), \tag{14}$$

where $L$ is the number of different integers in $\mathcal{A}$.

Footnote 2: It seems that the symbolization method described here is used in paper [4]. But, as may be concluded from [4, Eq. (6)], the issue of equal values is not addressed there. A similar approach is used in [10], again without considering equal values.

Footnote 3: For a single embedding vector, $d$ might be chosen as the radix instead of $D$. But $d$ may be different for different vectors, and the same integer may have different representations in different bases with (11) satisfied. E.g., $0112_3 = 1110_2$.

For a data set and embedding method which do not deliver equal values in the embedding vectors, all $d_i = D$ and the integers $A_i$ will represent the corresponding permutation order patterns unambiguously. In this case, $A_{min} \le A_i \le A_{max}$, where $A_{min}$ corresponds to the permutation $\alpha_{min} = \{D-1, D-2, \dots, 1, 0\}$:

$$A_{min} = D - 1 + (D-2) D + (D-3) D^2 + \dots + D^{D-2},$$

and $A_{max}$ corresponds to the permutation $\alpha_{max} = \{0, 1, \dots, D-1\}$:

$$A_{max} = D + 2 D^2 + 3 D^3 + \dots + (D-1) D^{D-1}.$$

Only $D!$ integers will be used from $[A_{min}; A_{max}]$ due to condition (11).
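The chain from rule (10) through Eq. (12) to Eq. (14) can be illustrated with a minimal C++ sketch. It is not the author's implementation: the function names `arithmetic_pattern`, `pattern_to_integer` and `arithmetic_entropy` are hypothetical, the raw data are assumed to be `double`, the natural logarithm is assumed in (14), and a 64-bit unsigned integer is assumed in place of a multiple precision type, which restricts the sketch to $D \le 15$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Order pattern alpha(V) of rule (10): the smallest value gets 0, the next
// smallest distinct value gets 1, and so on; equal values share one number.
std::vector<int> arithmetic_pattern(const std::vector<double>& v) {
    std::vector<double> distinct(v);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
    std::vector<int> alpha;
    alpha.reserve(v.size());
    for (double x : v) {
        // The number assigned to x is its index among the sorted distinct values.
        auto it = std::lower_bound(distinct.begin(), distinct.end(), x);
        alpha.push_back(static_cast<int>(it - distinct.begin()));
    }
    return alpha;
}

// Integer A(V) of Eq. (12): alpha read as base-D digits, a_0 least significant.
// Assumption: 64-bit arithmetic, so only D <= 15 is supported here.
std::uint64_t pattern_to_integer(const std::vector<int>& alpha) {
    const std::uint64_t D = alpha.size();
    assert(D >= 1 && D <= 15);
    std::uint64_t A = 0;
    for (std::size_t k = alpha.size(); k-- > 0;) {
        A = A * D + static_cast<std::uint64_t>(alpha[k]);  // Horner scheme
    }
    return A;
}

// Arithmetic entropy of Eq. (14) for raw data X, embedding dimension D and
// delay tau (natural logarithm assumed).
double arithmetic_entropy(const std::vector<double>& X, std::size_t D,
                          std::size_t tau = 1) {
    assert(D >= 1 && tau >= 1 && X.size() > (D - 1) * tau);
    const std::size_t n_vectors = X.size() - (D - 1) * tau;
    std::map<std::uint64_t, std::size_t> counts;  // occurrences of each A_i
    std::vector<double> v(D);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        for (std::size_t k = 0; k < D; ++k) v[k] = X[i + k * tau];
        ++counts[pattern_to_integer(arithmetic_pattern(v))];
    }
    double H = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) /
                         static_cast<double>(n_vectors);
        H -= p * std::log(p);
    }
    return H;
}
```

For the illustrative vector {5, 3, 3, 7, 3} (chosen here, not taken from the paper), `arithmetic_pattern` gives {1, 0, 0, 2, 0}, and `pattern_to_integer` gives $1 + 2 \cdot 5^3 = 251$. No permutation is ever constructed: patterns are compared and counted as plain integers.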
If it is decided to treat order patterns generated from embedding $D$-vectors with some components equal as not equivalent to those from vectors with all components different, then the number of all possible patterns will be greater than $D!$. Here we attempt to estimate how many new patterns can be obtained. Any new pattern arises from an embedding $D$-vector with $d$ different components, where $d \in \{1, 2, \dots, D-1\}$. So, with $d$ fixed, the number of corresponding new patterns is equal to the number $N(D, d)$ of base-$D$ $D$-digit integers constructed from the digits $\{0, 1, \dots, d-1\}$ in such a way that each of the $d$ digits is used at least once. This number can be calculated as

$$N(D, d) = d! \left\{ {D \atop d} \right\},$$

where $\left\{ {D \atop d} \right\}$ is the Stirling number of the second kind [15, Part 5, § ...]. Summing over $d$, we have for the total number of possible new patterns:

$$N(D) = \sum_{d=1}^{D-1} N(D, d)$$

... X[N]. For calculating the numerical order pattern of the vector $V_i$ shown in (3), it is necessary to pass a pointer to X[i] to the function get_numerical_pattern, below, as its third argument: data_point = X + i.

In the example below, X[i] is declared as double, but it can be of any type with appropriate sorting defined. The return value is declared as mpz_class, which represents a GNU multiple precision integer (https://gmplib.org/). This is used because for embedding dimensions D > 15 the returned number representing an order pattern may exceed 64 bits in size. For smaller D, mpz_class can be replaced with int or long everywhere in the code.

    /** Function calculates numerical representation of order pattern
        of an embedding vector V_i = {X_i, X_{i + tau}, ...}.
        Here D is the embedding dimension, tau is the delay.
        data_point points to the first component of the Vi
        in the array of raw data. */
    mpz_class get_numerical_pattern(int D, int tau, double *data_point)
    {
        int k;
        std::forward_list ...

... 5) is written instead. This introduces a non-zero correlation between consecutive values in S2. E.g., in S2 any two consecutive values are always different. Examples of S1, S2 are as follows:

S1 = { , , , , , , , , , , , , , , , , , , , },
S2 = { , , , , , , , , , , , , , , , , , , , }.

... $\tau = 1$ it can be seen that arithmetic entropy discriminates better between S1 and S2. However, the case with delay $\tau = 2$ shown in Table 2 is not similarly conclusive. This might be due to the construction method of the S2 sequence.
Namely, by pulling embedding vectors from S2 with delay 2, we may get vectors with equal adjacent components, similarly to the S1 case. This reduces the difference between S1 and S2. For $\tau = 1$, embedding vectors for S2 do not have equal adjacent components.

In this paper, we have discussed a method for calculating entropy in a sequence of data, which is similar to the permutation entropy one. The characteristic features of this method are as follows:

(i) it treats equal components in the embedding vectors as being equal instead of ordering them artificially;
(ii) it is entirely free of combinatorics, labeling order patterns by integers instead of permutations;
(iii) if embedding vectors do not have equal components, this method delivers exactly the same value for the entropy as the standard permutation entropy one does.

In the symbolization procedure discussed in Sec. 3.1, new order patterns may appear as compared to the standard permutation method, see Sec. 3.3. Those new patterns arise from embedding vectors with some components being equal to each other. In the standard permutation method, the embedding vectors characterized by those new patterns are labeled by permutations. This is made possible through ordering equal values in accordance with the rule (9).

Mathematically, replacing embedding vectors with their order patterns means constructing a quotient set from the set of all embedding vectors with respect to some equivalence relation [17, 9, 18]. In the case of permutation entropy, the corresponding equivalence relation is defined by (9) and either (4) or (5). Denote it by $\sim_P$. For arithmetic entropy, the corresponding equivalence relation is defined by the algorithm described in the first paragraph of Sec. 3.1. Denote it by $\sim_A$. It is clear that for two embedding vectors $U$, $V$, if $U \sim_A V$, then $U \sim_P V$. Namely, if $U$, $V$ have the same arithmetic order pattern, then they have the same permutation order pattern. That means that $\sim_P$ is a coarser relation than $\sim_A$.
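That $\sim_P$ is strictly coarser than $\sim_A$ can be checked on a small case. The following C++ sketch is an illustration under stated assumptions, not the author's code: the names `permutation_pattern` and `arithmetic_pattern` are hypothetical, the ascending convention (5) with tie rule (9) is assumed for the permutation pattern, and the data are assumed to be `double` without NaN values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Permutation pattern under the ascending convention (5) with tie rule (9):
// indexes are sorted by value, and among equal values the larger index
// goes first.
std::vector<std::size_t> permutation_pattern(const std::vector<double>& v) {
    assert(!v.empty());
    std::vector<std::size_t> pi(v.size());
    std::iota(pi.begin(), pi.end(), std::size_t{0});
    std::sort(pi.begin(), pi.end(), [&v](std::size_t r, std::size_t s) {
        if (v[r] != v[s]) return v[r] < v[s];
        return r > s;  // rule (9)
    });
    return pi;
}

// Arithmetic pattern of rule (10): each component is replaced by the index
// of its value among the sorted distinct values, so ties stay visible.
std::vector<int> arithmetic_pattern(const std::vector<double>& v) {
    std::vector<double> distinct(v);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
    std::vector<int> alpha;
    alpha.reserve(v.size());
    for (double x : v) {
        auto it = std::lower_bound(distinct.begin(), distinct.end(), x);
        alpha.push_back(static_cast<int>(it - distinct.begin()));
    }
    return alpha;
}
```

With these definitions, the constant vector {1, 1, 1} and the descending vector {3, 2, 1} both receive the permutation pattern {2, 1, 0}, so they are equivalent with respect to $\sim_P$. Their arithmetic patterns are {0, 0, 0} and {2, 1, 0}, respectively, so they are not equivalent with respect to $\sim_A$.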
Other equivalence relations could be offered, which are coarser than $\sim_P$, or finer than $\sim_A$, or lying in between, or incomparable with both, see e.g. [10]. Which one is better depends on the data sequence and on which kind of redundancy is intended to be stripped.

Acknowledgments. In this paper the following free software has been used: (i) Linux operating system (https://getfedora.org/); (ii) GNU Scientific Library [19]; (iii) GNU Multiple Precision Arithmetic Library (https://gmplib.org/); (iv) Maxima, a free Computer Algebra System (http://maxima.sourceforge.net/); (v) RefDB, a free Reference Manager (http://refdb.sourceforge.net/). The present work was partially supported by the Program of Fundamental Research of the Department of Physics and Astronomy of the National Academy of Sciences of Ukraine "Mathematical models of nonequilibrium processes in open systems" No. 0120U100857.

References

[1] Ch. Bandt and B. Pompe. Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88(17):174102, 2002.
[2] A. Porta, S. Guzzetti, N. Montano, R. Furlan, M. Pagani, A. Malliani, and S. Cerutti. Entropy, entropy rate, and pattern classification as tools to typify complexity in short heart period variability series. IEEE Transactions on Biomedical Engineering, 48(11):1282–1291, 2001.
[3] A. F. Bariviera, M. B. Guercio, L. B. Martinez, and O. A. Rosso. A permutation information theory tour through different interest rate maturities: the libor case. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 373(2056):20150119, 2015.
[4] L. Tylová, J. Kukal, V. Hubata-Vacek, and O. Vyšata. Unbiased estimation of permutation entropy in EEG analysis for Alzheimer's disease classification. Biomedical Signal Processing and Control, 39:424–430, 2018.
[5] L. Zunino, F. Olivares, F. Scholkmann, and O. A. Rosso. Permutation entropy based time series analysis: Equalities in the input signal can lead to false conclusions. Physics Letters A, 381(22):1883–1892, 2017.
[6] D. Cuesta-Frau, M. Varela-Entrecanales, A. Molina-Picó, and B. Vargas. Patterns with equal values in permutation entropy: Do they really matter for biosignal classification? Complexity, 2018:1324696, 2018.
[7] H. Azami and J. Escudero. Amplitude-aware permutation entropy: Illustration in spike detection and signal segmentation. Computer Methods and Programs in Biomedicine, 128:40–51, 2016.
[8] Zhe Chen, Yaan Li, Hongtao Liang, and Jing Yu. Improved permutation entropy for measuring complexity of time series under noisy condition. Complexity, 2019:1403829, 2019.
[9] Chunhua Bian, Chang Qin, Qianli D. Y. Ma, and Qinghong Shen. Modified permutation-entropy analysis of heartbeat dynamics. Physical Review E, 85(2):021906, 2012.
[10] S. Berger, G. Schneider, F. E. Kochs, and D. Jordan. Permutation entropy: Too complex a measure for EEG time series? Entropy, 19(12), 2017.
[11] K. Keller, A. M. Unakafov, and V. A. Unakafova. Ordinal patterns, entropy, and EEG. Entropy, 16:6212–6239, 2014.
[12] T. Gutjahr and K. Keller. Ordinal pattern based entropies and the Kolmogorov–Sinai entropy: An update. Entropy, 22(1):63, 2020.
[13] C. Bandt. Ordinal time series analysis. Ecol. Model., 182:229–238, 2005.
[14] W. Aziz and M. Arif. Multiscale permutation entropy of physiological time series. In ..., pages 1–6, 2005.
[15] J. Riordan. An Introduction to Combinatorial Analysis. John Wiley, 1958.
[16] N. Pippenger. The hypercube of resistors, asymptotic expansions, and preferential arrangements. Mathematics Magazine, 83(5):331–346, 2010.
[17] K. Keller, M. Sinn, and J. Emonds. Time series from the ordinal viewpoint. Stochastics and Dynamics, 07(02):247–272, 2007.
[18] A. B. Piek, I. Stolz, and K. Keller. Algorithmics, possibilities and limits of ordinal pattern based entropies. Entropy, 21(6):547, 2019.
[19] M. Galassi et al. GNU Scientific Library Reference Manual. 2009.