Change-point detection using the conditional entropy of ordinal patterns
Anton M. Unakafov∗ and Karsten Keller
Institute of Mathematics, University of Lübeck
Graduate School for Computing in Medicine and Life Sciences, University of Lübeck
July 17, 2018
Abstract
This paper is devoted to change-point detection using only the ordinal structure of a time series. A statistic based on the conditional entropy of ordinal patterns characterizing the local up and down in a time series is introduced and investigated. The statistic requires only minimal a priori information on given data and shows good performance in numerical experiments.
Keywords: Change-point detection; Conditional entropy; Ordinal pattern.
Most real-world time series are non-stationary, that is, some of their properties change over time. A model for some non-stationary time series is provided by a piecewise stationary stochastic process: its properties are locally constant except for certain time-points called change-points, where some properties change abruptly [8]. Detecting change-points is a classical problem and there are many methods for tackling it [8–10, 12, 27]. However, most of the existing methods have a common drawback: they require certain a priori information about the time series. It is necessary to know either a family of stochastic processes providing a model for the time series (see for instance [13], where AR processes are considered) or at least to know which characteristics (mean, standard deviation, etc.) of the time series reflect the change.

∗ Corresponding address: Institute of Mathematics, University of Lübeck, Ratzeburger Allee 160, Building 64, 23562 Lübeck, Germany. e-mail: [email protected] (A.M. Unakafov)

An ordinal pattern of order d ∈ N describes order relations between (d + 1) successive points of a time series. The main step of ordinal pattern analysis is the transformation of an original time series into a sequence of ordinal patterns. A result of this transformation is demonstrated in Figure 1 for order d = 1.

Figure 1: A part of a piecewise stationary time series with a change-point at t = L/2 (marked by a vertical line) and corresponding ordinal patterns of order d = 1 (below the plot)

For detecting a change-point t∗ ∈ N in a time series x = (x(t))_{t=0}^{L} with values in R, one generally considers x as a realization of a stochastic process X and computes for x a statistic S(t; x) that should reach its maximum at t = t∗. Here we suggest a statistic on the basis of the conditional entropy of ordinal patterns introduced in [46]. Let us provide an 'obvious' example only to motivate our approach and to illustrate its idea.

Example 1. Consider a time series (x(t))_{t=0}^{L}; its central part is shown in Figure 1. The time series is periodic before and after L/2, but at L/2 there occurs a change (marked by a vertical line): the "oscillations" become faster. Figure 1 also presents the ordinal patterns π(t) of order d = 1 at times t underlying the time series. Note that there are only two ordinal patterns of order 1: the increasing (coded by 0) and the decreasing (coded by 1). Both ordinal patterns occur with the same frequency before and after the change-point.

However, the transitions between successive ordinal patterns are changed at L/2. Indeed, before the change-point L/2 both ordinal patterns have two possible successors (for instance, the ordinal pattern π(L/2 − 5) = 0 is succeeded by the ordinal pattern π(L/2 − 4) = 0, which in turn is succeeded by the ordinal pattern π(L/2 − 3) = 1), whereas after the change-point the ordinal patterns 0 and 1 are alternating. A measure of the diversity of transitions between ordinal patterns is provided by their conditional entropy. For the sequence (π(k))_{k=1}^{L} of ordinal patterns of order 1 the (empirical) conditional entropy for t = 2, 3, . . . , L is defined as follows:

eCE((π(k))_{k=1}^{t}) = − Σ_{i=0}^{1} Σ_{j=0}^{1} (n_{i,j}(t)/(t − 1)) ln(n_{i,j}(t)/n_i(t))

with

n_{i,j}(t) = #{l = 1, 2, . . . , t − 1 | π(l) = i, π(l + 1) = j},
n_i(t) = #{l = 1, 2, . . . , t − 1 | π(l) = i}

(throughout the paper, 0 ln 0 := 0 and 0/0 := 0, and #A denotes the number of elements of a set A).

To detect change-points we use a test statistic for d = 1 defined as follows:

CEofOP(θL) = (L − 2) eCE((π(k))_{k=1}^{L}) − (θL − 1) eCE((π(k))_{k=1}^{θL}) − (L − (θL + 1)) eCE((π(k))_{k=θL+1}^{L}),

for θ ∈ (0, 1) with θL ∈ N. According to the properties of conditional entropy (see Section 3 for details), CEofOP(θL) attains its maximum when θL coincides with a change-point. Figure 2 demonstrates this for the time series from Figure 1.

Figure 2: Statistic CEofOP(θL) for the sequence of ordinal patterns of order 1 for the time series from Figure 1

For simplicity and in view of real applications, in Example 1 we defined ordinal patterns and the CEofOP statistic immediately for a concrete time series. However, for theoretical considerations it is necessary to define the CEofOP statistic for stochastic processes; for this we refer to Section 3. To illustrate the applicability of the CEofOP statistic let us discuss a real-world data example:
Example 2.
Here we consider EEG recording 14 from the sleep EEG dataset kindly provided by Vasil Kolev (details and other results for this dataset are provided in [45, Subsection 5.3.2]). We employ the following procedure for an automatic discrimination between sleep stages from the EEG time series: first we split the time series into pseudo-stationary intervals by finding change-points with the CEofOP statistic (change-points are detected in each EEG channel separately), then we cluster all the obtained intervals. Figure 3 illustrates the outcome of the proposed discrimination for a single EEG channel in comparison with the manual scoring by an expert; the automated identification of a sleep type (waking, REM, light sleep, deep sleep) is correct for 79.6% of 30-second epochs. Note that the borders of the segments (that is, the detected change-points) in most cases correspond to the changes of sleep stage.
Figure 3: Hypnogram (bold curve) and the results of ordinal-patterns-based discrimination of sleep EEG. Here the ordinate axis represents the results of the expert classification: W stands for waking, stages S1, S2 and S3, S4 indicate light and deep sleep, respectively, REM stands for REM sleep and Error for unclassified samples. Results of ordinal-patterns-based discrimination are represented by the background colour: white colour indicates epochs classified as waking state, light gray as light sleep, gray as deep sleep, dark gray as REM, red colour indicates unclassified segments

The CEofOP statistic was first introduced in [23], where we employed it as a component of a method for sleep EEG discrimination. However, no theoretical details of the method for change-point detection were provided there. This paper aims to fill this gap and provides a justification of the CEofOP statistic.

This paper is organized as follows. In Section 2 we provide a brief introduction to ordinal pattern analysis. In particular, we define the conditional entropy of ordinal patterns and discuss its properties. In Section 3 we investigate the properties of the CEofOP statistic. There we also suggest a method for detecting multiple change-points via the CEofOP statistic. Section 4 is devoted to a comparison of this method with two other (classical and ordinal-patterns-based) methods for change-point detection by performing experiments on realizations of piecewise stationary stochastic processes. In Section 5 we summarize the results and state open problems. Finally, in the supplementary Section 6 we investigate the asymptotic properties of the CEofOP statistic.
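For readers who wish to reproduce Example 1 numerically, the following sketch (Python with numpy; the function names, the toy series and the 0/1 coding of the two patterns are our illustrative choices, not taken from the paper) computes order-1 ordinal patterns, the empirical conditional entropy and the CEofOP statistic for d = 1:

```python
import numpy as np

def patterns_order1(x):
    """Order-1 ordinal patterns: one pattern per pair of successive values;
    here 0 codes an increase and 1 a decrease (an assumed, fixed coding)."""
    return (np.diff(x) <= 0).astype(int)

def ece_d1(pi):
    """Empirical conditional entropy of a 0/1 pattern sequence of length t:
    -sum_{i,j} n_ij(t)/(t-1) * ln(n_ij(t)/n_i(t)), with 0 ln 0 := 0."""
    t = len(pi)
    h = 0.0
    for i in (0, 1):
        n_i = np.sum(pi[:-1] == i)
        for j in (0, 1):
            n_ij = np.sum((pi[:-1] == i) & (pi[1:] == j))
            if n_ij > 0:
                h -= n_ij / (t - 1) * np.log(n_ij / n_i)
    return h

def ceofop_d1(pi, t):
    """CEofOP statistic for a candidate change-point t (order d = 1)."""
    L = len(pi)
    return ((L - 2) * ece_d1(pi)
            - (t - 1) * ece_d1(pi[:t])
            - (L - (t + 1)) * ece_d1(pi[t:]))

# A toy series in the spirit of Example 1: slow oscillations, then faster
# (alternating) oscillations after the change at sample 200.
x = np.concatenate([np.tile([0.0, 1.0, 2.0, 1.0], 50),
                    np.tile([0.0, 1.0], 100)])
pi = patterns_order1(x)
ts = range(8, len(pi) - 8)
t_hat = max(ts, key=lambda t: ceofop_d1(pi, t))
```

The maximizer t_hat then lies near the change-point, mirroring the peak of CEofOP(θL) shown in Figure 2.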
Central objects of the following are stochastic processes X = (X(t))_{t=n}^{m} on a probability space (Ω, A, P) with values in R. Here n ∈ N and m ∈ N ∪ {∞}, allowing both finite and infinite lengths of processes. We consider only univariate stochastic processes to keep notation simple, though there are no principal restrictions on the dimension of a process. X = (X(t))_{t=n}^{m} is stationary if for all t_1, t_2, . . . , t_k, s with t_1, t_2, . . . , t_k, t_1 + s, t_2 + s, . . . , t_k + s ∈ {n, n + 1, . . . , m} the distributions of (X_{t_i})_{i=1}^{k} and (X_{t_i+s})_{i=1}^{k} coincide.

Throughout this paper we discuss detection of change-points in a piecewise stationary stochastic process. Simply speaking, a piecewise stationary stochastic process is obtained by "gluing" several pieces of stationary stochastic processes (for a formal definition of piecewise stationarity see, for instance, [42, Section 3.1]).

In this section we recall the basic facts from ordinal pattern analysis (Subsection 2.1), present the idea of ordinal-patterns-based change-point detection (Subsection 2.2), and define the conditional entropy of ordinal patterns (Subsection 2.3).

Let us recall the definition of an ordinal pattern [22, 23, 47]. For d ∈ N denote the set of permutations of {0, 1, . . . , d} by S_d.

Definition 1.
We say that a real vector (x_0, x_1, . . . , x_d) has the ordinal pattern

OP(x_0, x_1, . . . , x_d) = (r_0, r_1, . . . , r_d) ∈ S_d

of order d ∈ N if x_{r_0} ≥ x_{r_1} ≥ . . . ≥ x_{r_d} and r_{l−1} > r_l for x_{r_{l−1}} = x_{r_l}.

As one can see, there are (d + 1)! different ordinal patterns of order d.

Definition 2.
Given a stochastic process X = (X(t))_{t=0}^{L} for L ∈ N ∪ {∞}, the sequence Π_d = (Π(t))_{t=d}^{L} with

Π(t) = OP(X(t − d), X(t − d + 1), . . . , X(t))

is called the abstract sequence of ordinal patterns of order d ∈ N of the process X. Similarly, given a realization x = (x(t))_{t=0}^{L} of X, the sequence of ordinal patterns of order d of x is defined as π_{d,L} = (π(t))_{t=d}^{L} with

π(t) = OP(x(t − d), x(t − d + 1), . . . , x(t)).

For simplicity, we say that L in the case L ∈ N is the length of the sequence π_{d,L} though, in fact, it consists of (L − d + 1) elements.

Definition 3.
A stochastic process X = (X(t))_{t=0}^{L} for L ∈ N ∪ {∞} is said to be ordinal-d-stationary if for all i ∈ S_d the probability P(Π(t) = i) does not depend on t for d ≤ t ≤ L. In this case we call

p_i = P(Π(t) = i)     (1)

the probability of the ordinal pattern i ∈ S_d in X.

The idea of ordinal pattern analysis is to consider the sequence of ordinal patterns and the ordinal pattern distribution obtained from it instead of the original time series. Though implying the loss of all the metric information, this often allows one to extract some relevant information from a time series, in particular, when it comes from a complex system. For example, ordinal pattern analysis provides estimators of the Kolmogorov-Sinai entropy [5, 21, 46] of dynamical systems, measures of time series complexity [6, 23, 32], measures of coupling between time series [20, 33] and estimators of parameters of stochastic processes [7, 40]; see also [1, 2] for a review of applications to real-world time series. Methods of ordinal pattern analysis are invariant with respect to strictly-monotone distortions of time series [22], do not need information about the range of measurements, and are computationally simple [47]. This qualifies them for application in the case that not much is known about the system behind a time series, possibly as a first exploration step.

For a discussion of the properties of a sequence of ordinal patterns we refer to [4, 7, 16, 39, 40]. For the following we need the two results stated below.

Lemma 1 (Corollary 2 from [39]). Each process X = (X(t))_{t∈N} with associated stationary increment process (X(t) − X(t − 1))_{t∈N} is ordinal-d-stationary for each d ∈ N.

The next result is a direct consequence of [46, Lemma 4] and generalizes the main result of [30].

Theorem 2.
If X is a Markov chain with values in a two-element set (contained in R), then the corresponding abstract sequence Π_d of ordinal patterns also forms a Markov chain for every order d ∈ N.

In relation to this theorem, note that in general one does not need values in R but in a totally ordered set. It is shown in [30] and [45, Subsection 3.3.3] that if X is a Markov chain in a finite set of more than two elements, Π_d does not generally form a Markov chain.

Probability distributions of ordinal patterns are known only for some special cases of stochastic processes [7, 16, 39]. In general one estimates probabilities of ordinal patterns by their empirical probabilities. Consider a sequence π_{d,L} of ordinal patterns. For any t ∈ {d + 1, d + 2, . . . , L} the frequency of occurrence of an ordinal pattern i ∈ S_d among the first (t − d) ordinal patterns of the sequence is given by

n_i(t) = #{l ∈ {d, d + 1, . . . , t − 1} | π(l) = i}.     (2)

A natural estimator of the probability of an ordinal pattern i in the ordinal-d-stationary case is provided by its relative frequency in the sequence π_{d,L}:

p̂_i = n_i(L) / (L − d).

Sequences of ordinal patterns are invariant to certain changes in the original stochastic process X, such as shifts (adding a constant to the process) [1, Subsection 3.4.3] and scaling (multiplying the process by a positive constant) [22]. However, in many cases changes in the original process X also affect the corresponding abstract sequences of ordinal patterns and ordinal pattern distributions. On the one hand, this impedes application of ordinal pattern analysis to non-stationary time series, since most ordinal-patterns-based quantities require ordinal-d-stationarity of a time series [1, 6, 33] and may be unreliable when this condition fails.
On the other hand, one can detect change-points in the original process by detecting changes in the sequence of ordinal patterns.

First ideas of using ordinal patterns for detecting change-points were formulated in [4, 35, 38, 41, 49]. The advantage of the ordinal-patterns-based methods is that they require less information than most of the existing methods for change-point detection: it is assumed neither that the stochastic process is from a specific family nor that the change affects a specific characteristic of the process. Instead, we further consider change-points with the following property.

Definition 4. Let (X(t))_{t=0}^{L} with L ∈ N ∪ {∞} be a piecewise stationary stochastic process with a change-point t∗ ∈ N. We say that t∗ is an ordinal change-point if there exist some m, n ∈ N with m < t∗ < n ≤ L and some d ∈ N such that (X(t))_{t=m}^{t∗} and (X(t))_{t=t∗+1}^{n} are ordinal-d-stationary but (X(t))_{t=m}^{n} is not.

This approach seems to be natural for many stochastic processes and real-world time series.

Remark.
Note that a change-point where a change in mean occurs need not be ordinal, since the mean is irrelevant for the distribution of ordinal patterns [1, Subsection 3.4.3]. However, there are many methods that effectively detect changes in mean; the method proposed here is intended for use in a more complex case, when there is no classical method, or it is not clear which of them to apply.

We illustrate Definition 4 by two examples. The piecewise stationary autoregressive processes considered in Example 3 are classical and provide models for linear time series. Since many real-world time series are non-linear, we further introduce in Example 4 a process originating from non-linear dynamical systems. These two types of processes are used throughout the paper for the empirical investigation of change-point detection methods.
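Before turning to the examples, the map from a time series to its sequence of ordinal patterns (Definitions 1 and 2) can be sketched in a few lines of Python (function names are our choice; the tie-breaking follows the condition r_{l−1} > r_l for equal values):

```python
def ordinal_pattern(window):
    """Ordinal pattern of a vector (x_0, ..., x_d) (Definition 1): the
    permutation (r_0, ..., r_d) with x_{r_0} >= ... >= x_{r_d}; among
    equal values the larger index comes first."""
    return tuple(sorted(range(len(window)), key=lambda i: (-window[i], -i)))

def pattern_sequence(x, d):
    """Sequence of ordinal patterns pi_{d,L} of a realization x
    (Definition 2): one pattern for each t = d, ..., L."""
    return [ordinal_pattern(x[t - d:t + 1]) for t in range(d, len(x))]
```

For instance, ordinal_pattern([1.0, 3.0, 2.0]) is (1, 2, 0), and a strictly increasing series yields the constant pattern (d, d − 1, . . . , 0); at most (d + 1)! distinct patterns can occur.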
Example 3.
A first-order piecewise stationary autoregressive (AR) process with change-points t∗_1, t∗_2, . . . , t∗_{N_st−1} is defined as

AR((φ_1, φ_2, . . . , φ_{N_st}), (t∗_1, t∗_2, . . . , t∗_{N_st−1})) = (AR(t))_{t=0}^{L},

where φ_1, φ_2, . . . , φ_{N_st} ∈ [0, 1) are the parameters of the autoregressive model and

AR(t) = φ_k AR(t − 1) + ǫ(t)

for all t ∈ {t∗_{k−1} + 1, t∗_{k−1} + 2, . . . , t∗_k} for k = 1, 2, . . . , N_st, where t∗_0 := 0 and t∗_{N_st} := L, with ǫ being standard white Gaussian noise, and AR(0) := ǫ(0).

AR processes are often used for the investigation of methods for change-point detection (see, for instance, [38, 41]), since they provide models for a wide range of real-world time series. Figure 4a illustrates a realization of a 'two-piece' AR process with a change-point at L/2. By [7, Proposition 5.3] the distributions of ordinal patterns of order d ≥ 2 reflect change-points for piecewise stationary AR processes. Figure 4c illustrates this for the realization from Figure 4a: the empirical probability distributions of ordinal patterns of order d = 2 before and after the change-point L/2 differ significantly.

Example 4.
A classical example of a non-linear system is provided by a logistic map on the unit interval:

x(t) = r x(t − 1)(1 − x(t − 1)),     (3)

with t ∈ N, for x(0) ∈ [0, 1] and r ∈ [1, 4]. The behaviour of this map varies significantly for different values of r; we are especially interested in r ∈ [3.5, 4] with chaotic behaviour. In the latter situation there exists an invariant ergodic measure absolutely continuous with respect to the Lebesgue measure [29, 44]; therefore (3) defines a stationary stochastic process NL°:

NL°(t) = r (1 − NL°(t − 1)) NL°(t − 1),

with NL°(0) ∈ [0, 1] being a uniformly distributed random number. Note that for almost all r ∈ [3.5, 4] (more generally, for all r ∈ [0, 4]), the map is either chaotic or hyperbolic, the latter roughly meaning that an attractive periodic orbit dominates it. This is a deep result in one-dimensional dynamics (see [29] for details). In the hyperbolic case, after some transient behaviour one numerically sees only some periodic orbit, which has long periods in the interval r ∈ [3.5, 4]. From the practical viewpoint, i.e. when considering short orbits, dynamics for that interval can be considered as chaotic, since already small changes of r provide chaotic behaviour also in the theoretical sense.

Let us include some observational noise by adding standard white Gaussian noise ǫ to an orbit:

NL(t) = NL°(t) + σǫ(t),

where σ > 0 is the level of noise.

Orbits of logistic maps, particularly with observational noise, are often used as a study and illustration tool in non-linear time series analysis (see [15, 28]). This justifies as a natural object for study a piecewise stationary noisy logistic (NL) process with change-points t∗_1, t∗_2, . . . , t∗_{N_st−1}, defined as

NL((r_1, . . . , r_{N_st}), (σ_1, . . . , σ_{N_st}), (t∗_1, t∗_2, . . . , t∗_{N_st−1})) = (NL(t))_{t=0}^{L},

where r_1, . . . , r_{N_st} ∈ [3.5, 4] are the values of the control parameter, σ_1, . . . , σ_{N_st} > 0 are the levels of noise, and

NL(t) = NL°(t) + σ_k ǫ(t), with NL°(t) = r_k (1 − NL°(t − 1)) NL°(t − 1),

for all t ∈ {t∗_{k−1} + 1, t∗_{k−1} + 2, . . . , t∗_k} for k = 1, 2, . . . , N_st, with t∗_0 := 0, t∗_{N_st} := L and NL(0) := ǫ(0).

Figure 4b shows a realization of a 'two-piece' NL process with a change-point at L/2; as one can see in Figure 4d, the empirical distributions of ordinal patterns of order d = 2 before the change-point and after the change-point do not coincide. In general, the distributions of ordinal patterns of order d ≥ 2 reflect change-points for the NL processes (which can be easily checked).

Figure 4: Upper row: parts of realizations of an AR (a) and an NL (b) process with change-points marked by vertical lines, L = 20000. Lower row: empirical probability distributions of ordinal patterns of order d = 2 before and after the change-point for the realizations of the AR (c) and NL (d) process

The NL and AR processes considered have rather different ordinal pattern distributions, which is the reason for using them for the empirical investigation of change-point detection methods in Section 4.

We now consider the classical problem of detecting a change-point t∗ on the basis of a realization x of a stochastic process X having at most one change-point, that is, either N_st = 1 or N_st = 2 holds (compare [12]). To solve this problem one estimates a tentative change-point t̂∗ as the time-point that maximizes a test statistic S(t; x). Then the value of S(t̂∗; x) is compared to a given threshold in order to decide whether t̂∗ is a change-point.

The idea of ordinal change-point detection is to find change-points in a stochastic process X by detecting changes in the sequence π_{d,L} of ordinal patterns for a realization of X.
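The two process families of Examples 3 and 4 are easy to simulate; the following sketch uses numpy (function names, the seed and the demo parameters are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def piecewise_ar(phis, cps, L):
    """Piecewise stationary AR process of Example 3:
    AR(t) = phi_k AR(t-1) + eps(t) on the k-th segment, AR(0) := eps(0)."""
    eps = rng.standard_normal(L + 1)
    bounds = [0] + list(cps) + [L]
    x = np.empty(L + 1)
    x[0] = eps[0]
    for k, phi in enumerate(phis):
        for t in range(bounds[k] + 1, bounds[k + 1] + 1):
            x[t] = phi * x[t - 1] + eps[t]
    return x

def piecewise_nl(rs, sigmas, cps, L):
    """Piecewise stationary noisy logistic process of Example 4: a logistic
    orbit with segment-wise parameter r_k plus observational noise sigma_k."""
    eps = rng.standard_normal(L + 1)
    bounds = [0] + list(cps) + [L]
    orbit = np.empty(L + 1)
    sigma = np.empty(L + 1)
    orbit[0] = rng.uniform()      # start uniformly distributed in [0, 1)
    sigma[0] = sigmas[0]
    for k, (r, s) in enumerate(zip(rs, sigmas)):
        for t in range(bounds[k] + 1, bounds[k + 1] + 1):
            orbit[t] = r * (1.0 - orbit[t - 1]) * orbit[t - 1]
            sigma[t] = s
    return orbit + sigma * eps

# 'Two-piece' demo realizations with a change-point at t* = 1000:
x_ar = piecewise_ar((0.1, 0.9), (1000,), 2000)
x_nl = piecewise_nl((3.6, 4.0), (0.0, 0.0), (1000,), 2000)
```

With σ_k = 0 the NL realization is a pure (noise-free) logistic orbit staying in [0, 1]; for the AR realization the change of φ from 0.1 to 0.9 is visible as a jump in local variance.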
Given at most one ordinal change-point t∗ in X, one estimates its position t̂∗ by using the fact that

• π(d), π(d + 1), . . . , π(t∗) characterize the process before the change;
• π(t∗ + 1), π(t∗ + 2), . . . , π(t∗ + d − 1) correspond to the transitional state;
• π(t∗ + d), π(t∗ + d + 1), . . . , π(L) characterize the process after the change.

Therefore, the position of a change-point can be estimated by an ordinal-patterns-based statistic S(t; π_{d,L}) that, roughly speaking, measures the dissimilarity between the distributions of ordinal patterns for (π(k))_{k=d}^{t} and for (π(k))_{k=t+d}^{L}. Then an estimate of the change-point t∗ is given by

t̂∗ = arg max_{t=d,d+1,...,L} S(t; π_{d,L}).

A method for detecting one change-point can be extended to an arbitrary number of change-points using binary segmentation [48]: one applies a single change-point detection procedure to the realization x; if a change-point is detected, then it splits x into two segments, in each of which one looks for a change-point. This procedure is repeated iteratively for the obtained segments until all of them either do not contain change-points or are too short.

The key problem in this paper is the selection of an appropriate ordinal-patterns-based test statistic S(t; π_{d,L}) for detecting changes. We suggest such a test statistic in Section 3, but first we introduce in Subsection 2.3 the conditional entropy of ordinal patterns, the cornerstone of this statistic.

Let us call a process X = (X(t))_{t=0}^{L} for L ∈ N ∪ {∞} ordinal-d⁺-stationary if for all i, j ∈ S_d the probability of a pair of ordinal patterns

p_{i,j} = P(Π(t) = i, Π(t + 1) = j)

does not depend on t for d ≤ t ≤ L − 1 (compare with Definition 3).
Obviously, ordinal-(d + 1)-stationarity implies ordinal-d⁺-stationarity. For an ordinal-d⁺-stationary stochastic process, consider the probability of an ordinal pattern j ∈ S_d to occur after an ordinal pattern i ∈ S_d; similarly to (1), it is given by

p_{j|i} = P(Π(t + 1) = j | Π(t) = i) = p_{i,j} / p_i for p_i ≠ 0.

If p_i = 0, let p_{j|i} = 0.

Definition 5.
The conditional entropy of ordinal patterns of order d ∈ N of an ordinal-d⁺-stationary stochastic process X is defined by

CE(X, d) = − Σ_{i∈S_d} Σ_{j∈S_d} p_i p_{j|i} ln(p_i p_{j|i}) + Σ_{i∈S_d} p_i ln p_i = − Σ_{i∈S_d} Σ_{j∈S_d} p_i p_{j|i} ln p_{j|i}.     (4)

For brevity, we refer to CE(X, d) as the "conditional entropy" when no confusion can arise. The conditional entropy characterizes the mean diversity of successors j ∈ S_d of a given ordinal pattern i ∈ S_d. This quantity often provides a good practical estimate of the Kolmogorov-Sinai entropy of dynamical systems; for a discussion of this and other theoretical properties of the conditional entropy we refer to [46].

One can estimate the conditional entropy from a time series by using the empirical conditional entropy of ordinal patterns [23]. Consider a sequence π_{d,L} of ordinal patterns of order d ∈ N with length L ∈ N. Similarly to (2), the frequency of occurrence of an ordinal pattern pair i, j ∈ S_d is given by

n_{i,j}(t) = #{l ∈ {d, d + 1, . . . , t − 1} | π(l) = i, π(l + 1) = j}     (5)

for t ∈ {d + 1, d + 2, . . . , L}. The empirical conditional entropy of ordinal patterns for π_{d,L} is defined by

eCE(π_{d,L}) = − (1/(L − d)) Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(L) ln n_{i,j}(L) + (1/(L − d)) Σ_{i∈S_d} n_i(L) ln n_i(L)
            = − (1/(L − d)) Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(L) ln(n_{i,j}(L) / n_i(L)).     (6)

As a direct consequence of Lemma 1, the empirical conditional entropy approaches the conditional entropy under certain assumptions; namely, the following holds.

Corollary 3.
For the sequence π_{d,∞} of ordinal patterns of order d ∈ N of a realization of an ergodic stochastic process X = (X(t))_{t∈N} with associated stationary increment process (X(t) − X(t − 1))_{t∈N}, it holds almost surely that

lim_{L→∞} eCE((π(k))_{k=d}^{L}) = CE(X, d).     (7)

We suggest to use the following statistic for detecting ordinal change-points of a process on the basis of a sequence π_{d,L} of ordinal patterns for d, L ∈ N of a realization of the process:

CEofOP(t; π_{d,L}) = (L − 2d) eCE((π(k))_{k=d}^{L}) − (t − d) eCE((π(k))_{k=d}^{t}) − (L − (t + d)) eCE((π(k))_{k=t+d}^{L})     (8)

for t ∈ N with d < t < L − d. The intuition behind this statistic comes from the concavity of conditional entropy (not only for ordinal patterns but in general, see [19, Subsection 2.1.3]). It holds

eCE((π(k))_{k=d}^{L}) ≥ ((t − d)/(L − 2d)) eCE((π(k))_{k=d}^{t}) + ((L − (t + d))/(L − 2d)) eCE((π(k))_{k=t+d}^{L}).     (9)

Therefore, if the probabilities of ordinal patterns change at some point t∗, but do not change before and after t∗, then CEofOP(t; π_{d,L}) tends to attain its maximum at t = t∗. If the probabilities do not change at all, then for sufficiently large L, (9) tends to hold with equality (see Corollary 6 in Section 6).

Let n_i(t) and n_{i,j}(t) be the frequencies of occurrence of an ordinal pattern i ∈ S_d and of an ordinal pattern pair i, j ∈ S_d (given by (2) and (5), respectively).
Set m_i(t) = n_i(L) − n_i(t + d) and m_{i,j}(t) = n_{i,j}(L) − n_{i,j}(t + d); then using (6) we rewrite (8) in a straightforward form:

CEofOP(t; π_{d,L}) = − ((L − 2d)/(L − d)) Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(L) ln(n_{i,j}(L)/n_i(L))
                   + Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(t) ln(n_{i,j}(t)/n_i(t)) + Σ_{i∈S_d} Σ_{j∈S_d} m_{i,j}(t) ln(m_{i,j}(t)/m_i(t)).     (10)

This statistic was first introduced and applied to the segmentation of sleep EEG time series in [23].

In the rest of this section, we present a motivating example (Subsection 3.1), show the relation of the CEofOP statistic to the likelihood ratio statistic for a piecewise stationary Markov chain (Subsection 3.2), and formulate an algorithm for detecting multiple change-points by means of the CEofOP statistic in Subsection 3.3.
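The statistic (8) (equivalently, the count form (10)) can be computed directly from a pattern sequence for general d; a sketch in Python (function names are ours; the patterns may be any hashable values, e.g. the permutation tuples of Definition 1):

```python
from collections import Counter
from math import log

def ece(patterns):
    """Empirical conditional entropy (6): -1/(L-d) sum_ij n_ij ln(n_ij/n_i);
    `patterns` holds pi(d), ..., pi(L), i.e. it has L - d transitions."""
    pairs = Counter(zip(patterns[:-1], patterns[1:]))
    firsts = Counter(patterns[:-1])
    n_trans = len(patterns) - 1
    return -sum(c * log(c / firsts[i]) for (i, j), c in pairs.items()) / n_trans

def ceofop(patterns, t, d):
    """CEofOP statistic (8) for a candidate change-point t, d < t < L - d;
    patterns[k] corresponds to pi(d + k)."""
    L = len(patterns) + d - 1
    first = patterns[:t - d + 1]        # pi(d), ..., pi(t)
    second = patterns[t:]               # pi(t + d), ..., pi(L)
    return ((L - 2 * d) * ece(patterns)
            - (t - d) * ece(first)
            - (L - (t + d)) * ece(second))

# A synthetic pattern sequence whose transition structure changes in the
# middle: strictly alternating, then all four transitions roughly equally
# likely (both halves have the same pattern frequencies).
seq = [0, 1] * 150 + [0, 0, 1, 1] * 75
d = 1
t_hat = max(range(5, len(seq) - 5), key=lambda t: ceofop(seq, t, d))
```

Here t_hat lands near index 300, where the transition probabilities (but not the pattern frequencies) change, illustrating why the conditional entropy rather than the pattern distribution itself is used.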
We demonstrate the "non-linear" nature of the CEofOP statistic by means of Example 5, concerning the transition from a time series to its surrogate. Although in a sense tailor-made, this example shows that CEofOP discerns changes that cannot be detected by conventional "linear" methods.
Remark.
The question whether a time series is linear or non-linear often arises in data analysis. For instance, linearity should be verified before using such powerful methods as Fourier analysis. For this one usually employs a procedure known as surrogate data testing [36, 37, 43]. It utilises the fact that a linear time series is statistically indistinguishable from any time series sharing some of its properties (for instance, second moments and amplitude spectrum). Therefore one can generate surrogates having certain properties of the original time series without preserving other properties irrelevant for a linear system. If such surrogates are significantly different from the original series, then non-linearity is assumed.
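The experiments below use the iterative AAFT algorithm of [37]; as a self-contained illustration of the principle only, here is a simpler phase-randomized (FT) surrogate in Python with numpy, which preserves the amplitude spectrum (and hence the autocorrelation) while destroying non-linear structure:

```python
import numpy as np

def ft_surrogate(x, rng):
    """Phase-randomized surrogate: keep |FFT|, randomize the phases.
    (A simplified stand-in for the iterative AAFT algorithm.)"""
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    phases[0] = 0.0                    # keep the mean (DC component)
    if len(x) % 2 == 0:
        phases[-1] = 0.0               # keep the Nyquist coefficient real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(42)
x = np.sin(np.linspace(0.0, 16.0 * np.pi, 256)) ** 3   # a non-linear signal
s = ft_surrogate(x, rng)
```

The surrogate s has exactly the same amplitude spectrum and mean as x, yet its higher-order (non-linear) structure is scrambled; the full AAFT procedure additionally preserves the amplitude distribution of the original series.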
Example 5.
Consider a time series consisting of a realisation of a noisy logistic process NL(r, σ) of length L/2 without changes, glued with its surrogate of the same length (to generate surrogates we use the iterative AAFT algorithm suggested by [37] and implemented by [17]). This compound time series has a change-point at t∗ = L/2, which conventional methods may fail to detect, since the surrogate has the same autocorrelation function as the original process (for instance, this is the case for the Brodsky-Darkhovsky method considered further in Section 4). However, the ordinal pattern distributions for the original time series and its surrogate generally are significantly different. Therefore the CEofOP statistic detects the change-point, which is illustrated by Figure 5.

Figure 5: Statistic CEofOP(θL) for a time series obtained by "gluing" a realization of a noisy logistic stochastic process NL(r, σ) with its surrogate at t∗ = 2000

Remark.
Although the idea that the ordinal structure is a relevant indicator of time series linearity/non-linearity is not new [1, 6], to our knowledge it has not been rigorously proved that the distribution of ordinal patterns is altered by surrogates. This is clearly beyond the scope of this paper and will be discussed elsewhere as a separate study; here it is sufficient for us to provide empirical evidence for this.

3.2 CEofOP statistic for a sequence of ordinal patterns forming a Markov chain
In this subsection we show that there is a connection between the CEofOP statistic and the classical likelihood ratio statistic. Though holding only in a particular case, this connection reveals the nature of the CEofOP statistic.

First we set up the necessary notation. Consider a sequence π_{d,L} of ordinal patterns for which the transition probabilities of ordinal patterns may change at some t ∈ {d, d + 1, . . . , L}. The basic statistic for testing whether there is a change in the transition probabilities is the likelihood ratio statistic [8, Subsection 2.2.3]:

LR(t; π_{d,L}) = −2 ln Lkl(H_0 | π_{d,L}) + 2 ln Lkl(H_A | π_{d,L}),     (11)

where Lkl(H | π_{d,L}) is the likelihood of the hypothesis H given a sequence π_{d,L} of ordinal patterns, and the hypotheses are given by

H_0: (p_{j|i}(t))_{i,j∈S_d} = (q_{j|i}(t))_{i,j∈S_d},
H_A: (p_{j|i}(t))_{i,j∈S_d} ≠ (q_{j|i}(t))_{i,j∈S_d},

where p_{j|i}(t), q_{j|i}(t) are the transition probabilities of ordinal patterns before and after t, respectively.

Proposition 4.
If an abstract sequence Π_d of ordinal patterns of order d ∈ N forms a Markov chain with at most one change-point, then for a sequence π_{d,L} = (π(k))_{k=d}^{L} of ordinal patterns being a realization of Π_d of length L ∈ N, it holds

LR(t; π_{d,L}) = 2 CEofOP(t; π_{d,L}) + 2d · eCE(π_{d,L}).

Proof.
First we estimate the probabilities and the transition probabilities before (p) and after (q) the change [3, Section 2]:

p̂_i(t) = n_i(t)/(t − d),   p̂_{j|i}(t) = n_{i,j}(t)/n_i(t),
q̂_i(t) = m_i(t)/(L − (t + d)),   q̂_{j|i}(t) = m_{i,j}(t)/m_i(t).

The likelihoods are given by

Lkl(H_0 | π_{d,L}) = p̂_{π(d)}(L) Π_{l=d}^{L−1} p̂_{π(l+1)|π(l)}(L) = p̂_{π(d)}(L) Π_{i∈S_d} Π_{j∈S_d} (p̂_{j|i}(L))^{n_{i,j}(L)},

Lkl(H_A | π_{d,L}) = p̂_{π(d)}(t) Π_{l=d}^{t−1} p̂_{π(l+1)|π(l)}(t) Π_{l=t+d}^{L−1} q̂_{π(l+1)|π(l)}(t) = p̂_{π(d)}(t) Π_{i∈S_d} Π_{j∈S_d} (p̂_{j|i}(t))^{n_{i,j}(t)} Π_{i∈S_d} Π_{j∈S_d} (q̂_{j|i}(t))^{m_{i,j}(t)}.

Assume that the first ordinal pattern π(d) is fixed in order to simplify the computations. Then p̂_{π(d)}(L) = p̂_{π(d)}(t) and it holds:

LR(t; π_{d,L}) = −2 Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(L) (ln n_{i,j}(L) − ln n_i(L))
              + 2 Σ_{i∈S_d} Σ_{j∈S_d} n_{i,j}(t) (ln n_{i,j}(t) − ln n_i(t))
              + 2 Σ_{i∈S_d} Σ_{j∈S_d} m_{i,j}(t) (ln m_{i,j}(t) − ln m_i(t)).

Since Σ_{j∈S_d} n_{i,j}(t) = n_i(t), one finally obtains:

LR(t; π_{d,L}) = 2(L − d) eCE((π(k))_{k=d}^{L}) − 2(t − d) eCE((π(k))_{k=d}^{t}) − 2(L − (t + d)) eCE((π(k))_{k=t+d}^{L})
              = 2 CEofOP(t; π_{d,L}) + 2d · eCE(π_{d,L}).

Consider a sequence π_{d,L} of ordinal patterns of order d ∈ N with length L ∈ N, corresponding to a realization of some piecewise stationary stochastic process.
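Since the computation in the proof is purely count-based, the identity of Proposition 4 can be checked numerically for an arbitrary pattern sequence; a sketch (helper names are ours):

```python
import random
from collections import Counter
from math import log, isclose

def nll(patterns):
    """-sum_ij n_ij ln(n_ij / n_i): the number of transitions times the
    empirical conditional entropy of the pattern sequence."""
    pairs = Counter(zip(patterns[:-1], patterns[1:]))
    firsts = Counter(patterns[:-1])
    return -sum(c * log(c / firsts[i]) for (i, j), c in pairs.items())

def lr(patterns, t, d):
    """Likelihood ratio statistic (11) with the first pattern fixed,
    written in terms of transition counts as in the proof."""
    return 2.0 * (nll(patterns) - nll(patterns[:t - d + 1]) - nll(patterns[t:]))

def ceofop(patterns, t, d):
    """CEofOP statistic (8); patterns[k] corresponds to pi(d + k)."""
    L = len(patterns) + d - 1
    full = (L - 2 * d) * nll(patterns) / (L - d)   # (L - 2d) * eCE(full)
    return full - nll(patterns[:t - d + 1]) - nll(patterns[t:])

random.seed(7)
d, t = 2, 60
seq = [random.randrange(6) for _ in range(121)]    # 6 = (d + 1)! "patterns"
lhs = lr(seq, t, d)
rhs = 2.0 * ceofop(seq, t, d) + 2.0 * d * nll(seq) / (len(seq) - 1)
```

Here lhs and rhs agree up to floating-point error, confirming LR = 2·CEofOP + 2d·eCE for this sequence.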
To detect a single change-point via the CEofOP statistic we first estimate its possible position by
$$\hat t^* = \operatorname*{arg\,max}_{t = T_{\min}+d, \ldots, L-T_{\min}} \mathrm{CEofOP}\big(t; \pi^{d,L}\big),$$
where $T_{\min}$ is the minimal length of a sequence of ordinal patterns that is sufficient for a reliable estimation of the empirical conditional entropy.

Remark. From the representation (8) it follows that for a reasonable computation of the
CEofOP statistic, a reliable estimation of eCE before and after the assumed change-point is required. To satisfy this requirement the stationary parts of a process are assumed to be sufficiently long. We take $T_{\min} = (d+1)!\,(d+1)$, which is equal to the number of all possible pairs of successive ordinal patterns of order $d$ (see [23] for details). Consequently, the length $L$ of a time series should satisfy
$$L > 2T_{\min} = 2(d+1)!\,(d+1). \tag{12}$$
Note that this does not impose serious limitations on the suggested method, since condition (12) is not too restrictive for $d \le 3$. However, it implies using either $d = 2$ or $d = 3$, since $d = 1$ does not provide effective change-point detection (see Examples 6 and 3), while $d > 3$ in most applications demands too large sample sizes.

Then, in order to check whether $\hat t^*$ is an actual change-point, we test between the hypotheses:

$H$: the parts $\pi(d), \pi(d+1), \ldots, \pi(\hat t^*)$ and $\pi(\hat t^* + d), \ldots, \pi(L)$ of the sequence $\pi^{d,L}$ come from the same distribution;

$H_A$: the parts $\pi(d), \pi(d+1), \ldots, \pi(\hat t^*)$ and $\pi(\hat t^* + d), \ldots, \pi(L)$ of the sequence $\pi^{d,L}$ come from different distributions.

This test is performed by comparing $\mathrm{CEofOP}\big(\hat t^*; \pi^{d,L}\big)$ to a threshold $h$, such that if the value of CEofOP is above the threshold then one rejects $H$ in favour of $H_A$; otherwise $H$ is accepted.
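For concreteness, the empirical conditional entropy eCE and the CEofOP statistic may be sketched in Python as follows; the segment weights $(L-2d)$, $(t-d)$ and $(L-t-d)$ follow the representation obtained in the proof above, while the normalisation of eCE by the number of successive pattern pairs and all names are illustrative assumptions:

```python
import math
from collections import Counter

def eCE(patterns):
    """Plug-in empirical conditional entropy of a pattern sequence:
    -(1/M) * sum_{i,j} n_ij * (ln n_ij - ln n_i), M = number of pairs."""
    m = len(patterns) - 1
    if m <= 0:
        return 0.0
    n_i = Counter(patterns[:-1])
    n_ij = Counter(zip(patterns[:-1], patterns[1:]))
    return -sum(c * (math.log(c) - math.log(n_i[i]))
                for (i, _), c in n_ij.items()) / m

def ceofop(t, patterns, d):
    """CEofOP statistic for a candidate change-point t, with patterns[k]
    standing for pi(k + d), so that the list represents pi(d), ..., pi(L)."""
    L = len(patterns) - 1 + d
    return ((L - 2 * d) * eCE(patterns)            # whole sequence
            - (t - d) * eCE(patterns[:t - d + 1])  # pi(d), ..., pi(t)
            - (L - t - d) * eCE(patterns[t:]))     # pi(t+d), ..., pi(L)

# Two deterministic regimes glued together: the statistic peaks at the seam.
seq = [0, 1] * 30 + [2, 3] * 30
values = {t: ceofop(t, seq, 1) for t in (30, 60, 90)}
```

On this toy sequence `ceofop` is largest at $t = 60$, the position where the transition structure of the symbols changes.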
The choice of the threshold is ambiguous: the higher $h$ is, the higher the probability of falsely accepting the hypothesis $H$; on the contrary, the lower $h$ is, the higher the probability of falsely rejecting $H$ in favour of $H_A$ (a false alarm).

As is usually done, we consider the threshold $h$ as a function of the desired probability $\alpha$ of a false alarm; for computing the threshold $h(\alpha)$ we use block bootstrapping from the sequence $\pi^{d,L}$ of ordinal patterns (bootstrapping is often used in change-point detection for computing a threshold; see [14, 25] for a theoretical discussion and [24, 31] for applications of bootstrapping with detailed and clear explanations). Algorithm 1 describes the detection of at most one change-point via the CEofOP statistic.

To detect multiple change-points we use an algorithm that consists of two steps:
Step 1: preliminary estimation of the boundaries of the stationary segments with a threshold $h(2\alpha)$ computed for the doubled nominal probability of false alarm (that is, with a higher risk of detecting false change-points).

Step 2: verification of the boundaries and exclusion of false change-points: a change-point is searched for in the merging of every two adjacent segments.

Algorithm 1
Detecting at most one change-point
Input: sequence $\pi = \big(\pi(k)\big)_{k=t_{\mathrm{start}}}^{t_{\mathrm{end}}}$ of ordinal patterns of order $d$, nominal probability $\alpha$ of false alarm
Output: estimate $\hat t^*$ of a change-point if a change-point is detected, otherwise return 0

function DetectSingleCP($\pi$, $\alpha$)
  $T_{\min} \leftarrow (d+1)!\,(d+1)$
  if $t_{\mathrm{end}} - t_{\mathrm{start}} < 2T_{\min}$ then
    return 0   ⊲ sequence is too short, no change-point can be detected
  end if
  $\hat t^* \leftarrow \operatorname{arg\,max}_{t = t_{\mathrm{start}}+T_{\min}, \ldots, t_{\mathrm{end}}-T_{\min}} \mathrm{CEofOP}(t; \pi)$
  $N_{\mathrm{boot}} \leftarrow$ number of bootstrap samples for computing the threshold
  for $l = 1, 2, \ldots, N_{\mathrm{boot}}$ do   ⊲ computing the threshold by bootstrapping
    $\xi \leftarrow$ randomly shuffled blocks of length $(d+1)$ from $\pi$
    $c_l \leftarrow \max_{t = t_{\mathrm{start}}+T_{\min}, \ldots, t_{\mathrm{end}}-T_{\min}} \mathrm{CEofOP}(t; \xi)$
  end for
  $(c_l) \leftarrow$ Sort$\big((c_l)\big)$   ⊲ sort the maximal values of CEofOP for bootstrap samples in decreasing order
  $h \leftarrow c_{\lfloor \alpha N_{\mathrm{boot}} \rfloor}$
  if $\mathrm{CEofOP}(\hat t^*; \pi) < h$ then
    return 0
  else
    return $\hat t^*$
  end if
end function

Details of these two steps are displayed in Algorithm 2. While Step 1 is the usual binary segmentation procedure as suggested in [48], Step 2 improves the obtained solution following the idea suggested in [11].
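Algorithms 1 and 2 might be rendered in Python roughly as follows; the eCE/CEofOP helpers are restated so that the sketch is self-contained, the number of bootstrap samples is left as a free parameter, and the single-point detector is passed into the segmentation routine as a callback (all naming and bookkeeping details are assumptions, not the authors' implementation):

```python
import math
import random
from collections import Counter

def eCE(patterns):
    """Empirical conditional entropy: -(1/M) sum n_ij (ln n_ij - ln n_i)."""
    m = len(patterns) - 1
    if m <= 0:
        return 0.0
    n_i = Counter(patterns[:-1])
    n_ij = Counter(zip(patterns[:-1], patterns[1:]))
    return -sum(c * (math.log(c) - math.log(n_i[i]))
                for (i, _), c in n_ij.items()) / m

def ceofop(t, patterns, d):
    """CEofOP statistic for a candidate change-point at list position t."""
    L = len(patterns) - 1 + d
    return ((L - 2 * d) * eCE(patterns)
            - (t - d) * eCE(patterns[:t - d + 1])
            - (L - t - d) * eCE(patterns[t:]))

def detect_single_cp(patterns, d, alpha, n_boot=20, rng=random):
    """Algorithm 1 (sketch): return the change-point estimate, or None
    if its CEofOP value stays below the bootstrap threshold h(alpha)."""
    t_min = math.factorial(d + 1) * (d + 1)
    candidates = range(t_min, len(patterns) - t_min)
    if not candidates:
        return None  # sequence too short for detection
    t_hat = max(candidates, key=lambda t: ceofop(t, patterns, d))
    # Block bootstrap: shuffling blocks of length d+1 destroys a change
    # while roughly preserving short-range dependence between patterns.
    blocks = [patterns[i:i + d + 1] for i in range(0, len(patterns), d + 1)]
    maxima = []
    for _ in range(n_boot):
        xi = [p for b in rng.sample(blocks, len(blocks)) for p in b]
        maxima.append(max(ceofop(t, xi, d) for t in candidates))
    maxima.sort(reverse=True)
    h = maxima[int(alpha * n_boot)]
    return t_hat if ceofop(t_hat, patterns, d) > h else None

def detect_all_cp(patterns, detect_single):
    """Algorithm 2 (sketch): binary segmentation (Step 1), then re-checking
    each boundary on its two merged neighbour segments (Step 2).
    `detect_single(segment)` returns a local change-point index or None."""
    bounds = [0, len(patterns)]
    k = 0  # Step 1
    while k < len(bounds) - 1:
        cp = detect_single(patterns[bounds[k]:bounds[k + 1]])
        if cp is not None and 0 < cp < bounds[k + 1] - bounds[k]:
            bounds.insert(k + 1, bounds[k] + cp)
        else:
            k += 1
    k = 0  # Step 2
    while k < len(bounds) - 2:
        cp = detect_single(patterns[bounds[k]:bounds[k + 2]])
        if cp is not None and 0 < cp < bounds[k + 2] - bounds[k]:
            bounds[k + 1] = bounds[k] + cp
            k += 1
        else:
            del bounds[k + 1]  # drop the false change-point
    return bounds
```

For example, a symbol sequence whose transition structure changes from random to strictly alternating is flagged by `detect_single_cp`, while `detect_all_cp` can be exercised with any single-point detector in place of the bootstrap-based one.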
In this section we empirically investigate the performance of the method for change-point detection via the
CEofOP statistic. We apply it to the noisy logistic processes and to the autoregressive processes (see Subsection 2.2) and compare the performance of change-point detection by the suggested method with that of the following methods.

• The ordinal-patterns-based method for detecting change-points via the
CMMD statistic [38, 41]: A time series is split into windows of equal

Algorithm 2
Detecting multiple change-points
Input: sequence $\pi = \big(\pi(k)\big)_{k=d}^{L}$ of ordinal patterns of order $d$, nominal probability $\alpha$ of false alarm
Output: estimates of the number $\hat N_{\mathrm{st}}$ of stationary segments and of their boundaries $\big(\hat t^*_k\big)_{k=0}^{\hat N_{\mathrm{st}}}$

function DetectAllCP($\pi$, $\alpha$)
  $\hat N_{\mathrm{st}} \leftarrow 1$; $\hat t^*_0 \leftarrow 0$; $\hat t^*_1 \leftarrow L$; $k \leftarrow 0$   ⊲ Step 1
  repeat
    $\hat t^* \leftarrow$ DetectSingleCP$\Big(\big(\pi(m)\big)_{m=\hat t^*_k+d}^{\hat t^*_{k+1}}, 2\alpha\Big)$
    if $\hat t^* > 0$ then
      insert $\hat t^*$ into the list of change-points after $\hat t^*_k$; $\hat N_{\mathrm{st}} \leftarrow \hat N_{\mathrm{st}} + 1$
    else
      $k \leftarrow k + 1$
    end if
  until $k \ge \hat N_{\mathrm{st}}$
  $k \leftarrow 0$   ⊲ Step 2
  repeat
    $\hat t^* \leftarrow$ DetectSingleCP$\Big(\big(\pi(m)\big)_{m=\hat t^*_k+d}^{\hat t^*_{k+2}}, \alpha\Big)$
    if $\hat t^* > 0$ then
      $\hat t^*_{k+1} \leftarrow \hat t^*$; $k \leftarrow k + 1$
    else
      delete $\hat t^*_{k+1}$ from the change-points list; $\hat N_{\mathrm{st}} \leftarrow \hat N_{\mathrm{st}} - 1$
    end if
  until $k \ge \hat N_{\mathrm{st}} - 1$
  return $\hat N_{\mathrm{st}}$, $\big(\hat t^*_k\big)_{k=0}^{\hat N_{\mathrm{st}}}$
end function

lengths $W \in \mathbb{N}$, and the empirical probabilities of ordinal patterns are estimated in every window. If there is an ordinal change-point in the time series, then the empirical probabilities of ordinal patterns should be approximately constant before the change-point and after the change-point, but they change at the window containing the change-point. To detect this change the authors have introduced the CMMD statistic. (Note that the definition of the CMMD statistic in [38] contains a mistake, which is corrected in [41]. The results of numerical experiments reported in [38] also do not comply with the actual definition of the CMMD statistic; see [45, Subsections 4.2.1.1, 4.5.1.1] for details.)

• Two versions of the classical Brodsky-Darkhovsky method [11]: The Brodsky-Darkhovsky method can be used for detecting changes in various characteristics of a time series $x = \big(x(t)\big)_{t=1}^{L}$, but the characteristic of interest should be selected in advance.
In this paper we consider detecting changes in mean, which is just the basic characteristic, and in the correlation function $\mathrm{corr}\big(x(t), x(t+1)\big)$, which reflects relations between the future and the past of a time series and seems to be a natural choice for detecting ordinal change-points. Changes in mean are detected by the generalized version of the Kolmogorov-Smirnov statistic [11]:
$$\mathrm{BD}_{\exp}(t; x, \delta) = \left( \frac{t(L-t)}{L^2} \right)^{\delta} \left| \frac{1}{t} \sum_{l=1}^{t} x(l) - \frac{1}{L-t} \sum_{l=t+1}^{L} x(l) \right|,$$
where the parameter $\delta \in [0, 1]$ regulates properties of the statistic; $\delta = 0$ is basically used (see [11] for details). Changes in the correlation function are detected by the following statistic:
$$\mathrm{BD}_{\mathrm{corr}}(t; x, \delta) = \mathrm{BD}_{\exp}\Big(t; \big(y(t)\big)_{t=1}^{L-1}, \delta\Big) \quad \text{with } y(t) = x(t)\,x(t+1).$$

Remark.
Note that we consider the statistic $\mathrm{BD}_{\exp}$, which is intended to detect changes in mean, though ordinal-patterns-based statistics do not detect these changes. This is motivated by the fact that changes in the noisy logistic processes are, on the one hand, changes in mean and, on the other hand, ordinal changes in the sense of Definition 4. Therefore, they can be detected both by $\mathrm{BD}_{\exp}$ and by ordinal-patterns-based statistics. In general, by the nature of ordinal time series analysis, changes in mean and in the ordinal structure are in some sense complementary.

We carry out experiments for order $d = 3$ of ordinal patterns (lower orders may provide worse results because of reduced sensitivity, while higher orders are applicable only to rather long time series due to condition (12)). For the CMMD statistic, we take the window size $W = 256$. (There are no special reasons for this choice except the fact that $W = 256$ is sufficient for estimating probabilities of ordinal patterns of order $d = 3$ inside the windows, since $256 > 120 = 5(d+1)!$ [1, Section 9.3]. Results of the experiments remain almost the same for a broad range of values of $W$.)

In Subsection 4.1 we study how well the statistics for change-point detection estimate the position of a single change-point. Since we expect that the performance of the statistics for change-point detection may strongly depend on the length of realization, we check this in Subsection 4.2. Finally, we investigate the performance of various statistics for detecting multiple change-points in Subsection 4.3.

Consider $N = 10000$ realizations $x_j = \big(x_j(t)\big)_{t=0}^{L}$ with $j = 1, \ldots, N$ for each of the processes listed in Table 1. A single change occurs at a random time $t^*$ uniformly distributed in $\{L/2 - W, L/2 - W + 1, \ldots, L/2 + W\}$. For all processes, the length $L = 80W$ of sequences of ordinal patterns is taken.

[Table 1: Processes used for the investigation of change-point detection: short names and complete designations of three noisy logistic (NL) processes and three autoregressive (AR) processes, each with a single change of parameters at $t^*$.]

To measure the overall accuracy of change-point detection via some statistic $S$ as applied to the process $X$ we use three quantities. Let us first determine the error of the change-point estimation provided by the statistic $S$ for the $j$-th realization of a process $X$:
$$\mathrm{err}_j(S, X) = \hat t^*(S; x_j) - t^*,$$
where $t^*$ is the actual position of the change-point and $\hat t^*(S; x_j)$ is its estimate obtained by using $S$. Then the fraction of satisfactorily estimated change-points sE (averaged over $N$ realizations) is defined by:
$$\mathrm{sE}(S, X) = \frac{\#\big\{j \in \{1, 2, \ldots, N\} : |\mathrm{err}_j(S, X)| \le \mathrm{MaxErr}\big\}}{N},$$
where MaxErr is the maximal satisfactory error; we take $\mathrm{MaxErr} = W = 256$. The bias and the root mean squared error are respectively given by
$$\mathrm{B}(S, X) = \frac{1}{N} \sum_{j=1}^{N} \mathrm{err}_j(S, X), \qquad \mathrm{RMSE}(S, X) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \big(\mathrm{err}_j(S, X)\big)^2}.$$
The higher sE is, and the nearer to zero the bias and the RMSE are, the more accurate the estimation of a change-point is.

Results of the experiments are presented in Tables 2 and 3 for the NL and AR processes, respectively. For every process the best values of the performance measures are shown in bold.

[Table 2: Performance of different statistics (CMMD, CEofOP, $\mathrm{BD}_{\exp}$, $\mathrm{BD}_{\mathrm{corr}}$) for estimating the change-point in NL processes.]

[Table 3: Performance of different statistics for estimating the change-point in AR processes.]

Let us summarize: For the considered processes the CEofOP statistic estimates the change-point more accurately than the CMMD statistic. For the NL processes the CEofOP statistic has almost the same performance as the Brodsky-Darkhovsky method; for the AR processes the performance of the classical method is better, though CEofOP has a lower bias. In contrast to the ordinal-patterns-based methods, the Brodsky-Darkhovsky method is unreliable when there is a lack of a priori information about the time series. For instance, changes in the NL processes only slightly influence the correlation function, and $\mathrm{BD}_{\mathrm{corr}}$ does not provide a good indication of changes (cf. the performance of $\mathrm{BD}_{\mathrm{corr}}$ and CEofOP in Table 2). Meanwhile, changes in the AR processes do not influence the expected value (see Example 3), which does not allow detecting them using $\mathrm{BD}_{\exp}$ (see Table 3). Therefore we do not consider the $\mathrm{BD}_{\exp}$ statistic in further experiments.

4.2 Estimating position of a single change-point for different lengths of time series

Here we study how the accuracy of change-point estimation for the three considered statistics depends on the length $L$ of a time series. We take $N = 50000$ realizations of an NL process and an AR process from Table 1 for realization lengths between $L = 24W$ and $L = 120W$. Again, we consider a single change at a random time $t^* \in \{L/2 - W, L/2 - W + 1, \ldots, L/2 + W\}$. Results of the experiment are presented in Figure 6.
[Figure 6: Measures of change-point detection performance (fraction sE of satisfactorily estimated change-points and bias) for NL (a, b) and AR (c, d) processes with different lengths, where $L$ is the product of the window numbers given on the abscissa axis with the window length $W = 256$; the compared statistics are CMMD, CEofOP and $\mathrm{BD}_{\mathrm{corr}}$.]

Summarizing, the performance of the CEofOP statistic strongly depends on the length of the time series but is generally better than for the CMMD statistic. In comparison with the classical Brodsky-Darkhovsky method, CEofOP has better performance for the NL processes (see Figures 6a, b) and a lower bias for the AR processes (see Figure 6d).

4.3 Detecting multiple change-points
Here we investigate how well the considered statistics detect multiple change-points. The methods for change-point detection via the CEofOP and the CMMD statistics are implemented according to Subsection 3.3 and [45, Subsection 4.5.1], respectively. The Brodsky-Darkhovsky method is implemented according to [11] with only one exception: to compute a threshold for it we use bootstrapping, which in our case provided better results than the technique described in [11]. The same nominal probability of a false alarm $\alpha$ has been taken for all methods (in the case of the CMMD statistic we have used the equivalent value; see [45, Subsection 4.3.2]).

We consider here two processes with three change-points each, an AR process with four coefficient regimes and an NL process with four parameter regimes (cf. Table 1), with the change-points $t^*_k$ being independent and uniformly distributed in $\{\tau_k - W, \tau_k - W + 1, \ldots, \tau_k + W\}$ for $k = 1, 2, 3$, where the nominal positions $\tau_1 < \tau_2 < \tau_3$ are fixed fractions of $L$, and $L = 100W$. For both processes we generate $N = 10000$ realizations $x_j$ with $j = 1, \ldots, N$. We consider unequal lengths of the stationary segments to study the methods for change-point detection in more realistic conditions.

As we apply change-point detection via a statistic $S$ to a realization $x_j$, we obtain estimates of the number $\hat N_{\mathrm{st}}(S; x_j)$ of stationary segments and of the change-point positions $\hat t^*_l(S; x_j)$ for $l = 1, 2, \ldots, \hat N_{\mathrm{st}}(S; x_j) - 1$. Since the number of estimated change-points may be different from the actual number of changes, we suppose that the estimate for $t^*_k$ is provided by the nearest $\hat t^*_l(S; x_j)$. Therefore the error of estimation of the $k$-th change-point provided by $S$ is given by
$$\mathrm{err}_{jk}(S, X) = \min_{l = 1, 2, \ldots, \hat N_{\mathrm{st}}(S; x_j) - 1} \big| \hat t^*_l(S; x_j) - t^*_k \big|.$$
To assess the overall accuracy of change-point detection, we compute two quantities. The fraction $\mathrm{sE}_k$ of satisfactory estimates of a change-point $t^*_k$, $k = 1, 2, 3$, is given by
$$\mathrm{sE}_k(S, X) = \frac{\#\big\{j \in \{1, 2, \ldots, N\} : \mathrm{err}_{jk}(S, X) \le \mathrm{MaxErr}\big\}}{N},$$
where MaxErr is the maximal satisfactory error; we take $\mathrm{MaxErr} = W = 256$. The average number of false change-points is defined by:
$$\mathrm{fCP}(S, X) = \frac{1}{N} \sum_{j=1}^{N} \Big( \hat N_{\mathrm{st}}(S; x_j) - 1 - \#\big\{k \in \{1, 2, 3\} : \mathrm{err}_{jk}(S, X) \le \mathrm{MaxErr}\big\} \Big).$$
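The error and quality measures just defined can be computed as in the following illustrative Python sketch (function names are assumptions):

```python
def errors(estimated, true_cps):
    """err_k: distance from each true change-point to the nearest estimate."""
    return [min(abs(e - t) for e in estimated) if estimated else float("inf")
            for t in true_cps]

def evaluate(runs, true_cps, max_err):
    """runs: one list of estimated change-points per realization.
    Returns the fractions sE_k of satisfactory estimates and the average
    number fCP of false change-points."""
    n = len(runs)
    se = [0] * len(true_cps)
    fcp = 0.0
    for est in runs:
        err = errors(est, true_cps)
        for k, e in enumerate(err):
            if e <= max_err:
                se[k] += 1
        # false change-points: estimates minus satisfactorily matched changes
        fcp += len(est) - sum(1 for e in err if e <= max_err)
    return [s / n for s in se], fcp / n

# Two toy realizations with true change-points at 50, 100, 200:
se, fcp = evaluate([[48, 105, 300], [52]], [50, 100, 200], max_err=10)
# se == [1.0, 0.5, 0.0], fcp == 0.5
```

The first realization matches the first two change-points and produces one false detection; the second matches only the first change-point and produces none.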
Results of the experiment are presented in Tables 4 and 5; the best values are shown in bold.

[Table 4: Performance of the change-point detection methods (CMMD, CEofOP, $\mathrm{BD}_{\mathrm{corr}}$): fCP and the fractions $\mathrm{sE}_k$ of satisfactory estimates of the three change-points for the NL process with three change-points.]

[Table 5: Performance of the change-point detection methods for the AR process with three change-points.]

In summary, since the distributions of ordinal patterns for the NL and AR processes have different properties, the results for them differ significantly. The CEofOP statistic provides good results for the NL processes. However, for the AR processes its performance is much worse: only the most prominent change is detected rather well. The weak results for the two other change-points are caused by the fact that the CEofOP statistic is rather sensitive to the lengths of the stationary segments (we have already seen this in Subsection 4.2), and in this case they are not very long.
In this paper we have introduced a method for change-point detection via the CEofOP statistic and have tested it for time series coming from two classes of models having quite different distributions, namely piecewise stationary noisy logistic and autoregressive processes.

The empirical investigations suggest that the proposed method provides better detection of ordinal change-points than the ordinal-patterns-based method introduced in [38, 41]. The performance of our method for the two model classes considered is particularly comparable to that of the classical Brodsky-Darkhovsky method, but in contrast to it, ordinal-patterns-based methods require less a priori knowledge about the time series. This can be especially useful in the case of non-linear models, where the autocorrelation function does not describe the distributions completely. Here the point is that, with the exception of the mean, much of the distribution is captured by its ordinal structure. So (together with methods finding changes in mean) the CEofOP statistic can be used at least for a first exploration step.

Although numerical experiments and tests on real-world data cannot replace rigorous theoretical studies, the results of the current study show the potential of change-point detection via the CEofOP statistic. However, there are some open points, listed below.

1. A method for computing a threshold $h$ for the CEofOP statistic without using bootstrapping is of interest, since the bootstrapping procedure is rather time consuming. One possible solution is to utilize Theorem 5 (Section 6) and to precompute thresholds using the values of $\Delta^d_{\gamma,\theta}(P, Q)$. However, this approach requires further investigation.

2. The binary segmentation procedure [48] is not the only possible method for detecting multiple change-points. In [26, 27] an alternative approach is suggested: the number of stationary segments $\hat N_{\mathrm{st}}$ is estimated by optimizing a contrast function, then the positions of the change-points are adjusted.
Likewise, one can consider a method for multiple change-point detection based on maximizing the following generalization of the CEofOP statistic:
$$\mathrm{CEofOP}\big(\hat t^*_1, \ldots, \hat t^*_{\hat N_{\mathrm{st}}-1}\big) = \big(L - d\hat N_{\mathrm{st}}\big)\, \mathrm{eCE}\Big(\big(\pi(k)\big)_{k=d}^{L}\Big) - \sum_{l=1}^{\hat N_{\mathrm{st}}} \big(\hat t^*_l - \hat t^*_{l-1} - d\big)\, \mathrm{eCE}\Big(\pi\big(\hat t^*_{l-1} + d\big), \ldots, \pi\big(\hat t^*_l\big)\Big),$$
where $\hat N_{\mathrm{st}} \in \mathbb{N}$ is an estimate of the number of stationary segments, $\hat t^*_0 = 0$, $\hat t^*_{\hat N_{\mathrm{st}}} = L$, and $\hat t^*_1, \hat t^*_2, \ldots, \hat t^*_{\hat N_{\mathrm{st}}-1} \in \mathbb{N}$ are estimates of the change-points. Further investigation in this direction could be of interest.

3. As we have seen in Subsection 4.2, the CEofOP statistic requires rather large sample sizes to provide reliable change-point detection. This is due to the necessity of estimating the empirical conditional entropy (see Subsection 3.3). In order to reduce the required sample size, one may consider more effective estimates of the conditional entropy, for instance, the Grassberger estimate (see [18] and [45, Subsection 3.4.1]). However, elaboration of this idea is beyond the scope of this paper.

4. In this paper only one-dimensional time series are considered, though there is no principal limitation for applying ordinal-patterns-based methods to multivariate data (see [21]). Discussion of using ordinal-patterns-based methods for detecting change-points in multivariate data (for instance, in multichannel EEG) is therefore of interest.

5. We have considered here only the "off-line" detection of changes, which is used when the acquisition of a time series is completed. Meanwhile, in many applications it is necessary to detect change-points "on-line", based on a small number of observations after the change [8]. Development of on-line versions of ordinal-patterns-based methods for change-point detection may be an interesting direction of future work.

Here we consider the values of
CEofOP for the case when the segments of a stochastic process before and after the change-point have infinite length. Let us first introduce some notation. Given a $(d+1)$-ordinal-stationary stochastic process $X$ for $d \in \mathbb{N}$, the distribution of pairs of ordinal patterns is denoted by $P = (p_{i,j})_{i,j \in S_d}$, with
$$p_{i,j} = \mathbb{P}\big(\Pi(t) = i, \Pi(t+1) = j\big) = p_{j|i}\, p_i \quad \text{for all } i, j \in S_d.$$
One easily sees the following: The conditional entropy of ordinal patterns is represented as $\mathrm{CE}(X, d) = H(P)$, where
$$H(P) = -\sum_{i \in S_d} \sum_{j \in S_d} p_{i,j} \ln p_{i,j} + \sum_{i \in S_d} \Big( \sum_{j \in S_d} p_{i,j} \Big) \ln \sum_{j \in S_d} p_{i,j}.$$
Here recall that $p_i = \sum_{j \in S_d} p_{i,j}$.

Theorem 5.
Let $Y = (Y_t)_{t \in \mathbb{N}}$ and $Z = (Z_t)_{t \in \mathbb{N}}$ be ergodic $(d+1)$-ordinal-stationary stochastic processes on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with probabilities of pairs of ordinal patterns of order $d \in \mathbb{N}$ given by $P = (p_{i,j})_{i,j \in S_d}$ and $Q = (q_{i,j})_{i,j \in S_d}$, respectively. For $L \in \mathbb{N}$ and $\gamma \in (0, 1)$, let $\Pi_{L,\gamma}$ be the abstract sequence of ordinal patterns of order $d$ of
$$\big(Y_0, \ldots, Y_{\lfloor \gamma L \rfloor}, Z_{\lfloor \gamma L \rfloor + 1}, Z_{\lfloor \gamma L \rfloor + 2}, \ldots, Z_L\big). \tag{13}$$
Then, for all $\theta \in (0, 1)$ it holds
$$\lim_{L \to \infty} \frac{1}{L}\, \mathrm{CEofOP}\big(\lfloor \theta L \rfloor; \Pi_{L,\gamma}\big) = \Delta^d_{\gamma,\theta}(P, Q)$$
$\mathbb{P}$-almost surely, where
$$\Delta^d_{\gamma,\theta}(P, Q) = \begin{cases} H\big(\gamma P + (1-\gamma) Q\big) - \theta H(P) - (1-\theta) H\Big(\tfrac{\gamma-\theta}{1-\theta} P + \tfrac{1-\gamma}{1-\theta} Q\Big), & \theta < \gamma, \\[1ex] H\big(\gamma P + (1-\gamma) Q\big) - \theta H\Big(\tfrac{\gamma}{\theta} P + \tfrac{\theta-\gamma}{\theta} Q\Big) - (1-\theta) H(Q), & \theta \ge \gamma. \end{cases}$$

By definition, (13) is a stochastic process of length $L + 1$ with a potential ordinal change-point $t^* = \lfloor \theta L \rfloor$, i.e. the position of $t^*$ relative to $L$ is principally the same for all $L$, and the statistics considered are stabilizing for increasing $L$. (13) can particularly be interpreted as a part of a stochastic process including exactly one ordinal change-point. We omit the proof of Theorem 5 since it is a simple computation.

Due to the properties of the conditional entropy, it holds
$$\max_{\theta \in (0,1)} \Delta^d_{\gamma,\theta}(P, Q) = \Delta^d_{\gamma,\gamma}(P, Q) = H\big(\gamma P + (1-\gamma) Q\big) - \gamma H(P) - (1-\gamma) H(Q).$$
Values of $\Delta^d_{\gamma,\theta}(P, Q)$ can be computed for a piecewise stationary stochastic process with known probabilities of ordinal patterns before and after the change-point. In [7] the authors compute probabilities of ordinal patterns of orders $d = 2$ (Proposition 5.3) and $d = 3$ (Theorem 5.5) for stationary Gaussian processes (in particular, for autoregressive processes). Here we use these results to provide Example 6, illustrating Theorem 5.

Example 6.
Consider an autoregressive process $\mathrm{AR}\big((\phi_1, \phi_2), t^*\big)$ with a single change-point $t^* = L/2$ for $L \in \mathbb{N}$. Using the results from [7] we compute the distributions $P_{\phi_1}$, $P_{\phi_2}$ of ordinal pattern pairs for orders $d = 1, 2$, and on this basis we calculate the values of $\Delta^d_{0.5, 0.5}\big(P_{\phi_1}, P_{\phi_2}\big)$ for different values of $\phi_1$ and $\phi_2$. The results are presented in Tables 6 and 7.
[Table 6: $100 \cdot \Delta^1_{0.5,0.5}\big(P_{\phi_1}, P_{\phi_2}\big)$ for an autoregressive process (the coefficient 100 here is only for the sake of readability).]

According to Theorem 5, for $\pi^{d,L} = \big(\pi(k)\big)_{k=d}^{L}$ being a sequence of ordinal patterns of order $d$ for a realization of $\mathrm{AR}\big((\phi_1, \phi_2), L/2\big)$ it holds almost surely that
$$\frac{1}{L} \max_{\theta \in (0,1)} \mathrm{CEofOP}\big(\lfloor \theta L \rfloor; \pi^{d,L}\big) \xrightarrow[L \to \infty]{} \Delta^d_{0.5,0.5}\big(P_{\phi_1}, P_{\phi_2}\big).$$
(To apply Theorem 5 one needs probabilities of pairs of ordinal patterns of order $d$. They can be calculated from the probabilities of ordinal patterns of order $d+1$: as one can easily verify, the probability of any pair $(i, j)$ of ordinal patterns $i, j \in S_d$ is equal either to the probability of a certain ordinal pattern of order $d+1$ or to the sum of two such probabilities.)

[Table 7: $100 \cdot \Delta^2_{0.5,0.5}\big(P_{\phi_1}, P_{\phi_2}\big)$ for an autoregressive process.]

Figure 7 shows how fast this convergence is. Note that the CEofOP statistic for orders $d = 1, 2$ allows one to distinguish between change and no change in the considered processes already for moderate lengths. For $L = 10 \cdot 10^3$ the values of the CEofOP statistic for order $d = 2$ are already very close to the theoretical values, whereas for $d = 1$ this length does not seem sufficient.

[Figure 7: Convergence of $\frac{1}{L} \max_{\theta \in (0,1)} \mathrm{CEofOP}\big(\lfloor \theta L \rfloor; \pi^{d,L}\big)$ to the theoretical values $\Delta^d_{0.5,0.5}$ as $L$ increases for autoregressive processes, for $d = 1$ (a) and $d = 2$ (b). Here $\mathrm{AR}, \phi_1 \to \phi_2$ stands for the process $\mathrm{AR}\big((\phi_1, \phi_2), L/2\big)$. The provided empirical values are obtained either as the 5th percentile (for $\phi_1 = \phi_2$) or as the 95th percentile (for $\phi_1 \ne \phi_2$) from 1000 trials.]

The following result is a simple consequence of Theorem 5.

Corollary 6.
Let $X = (X_t)_{t \in \mathbb{N}}$ be an ergodic $(d+1)$-ordinal-stationary stochastic process on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. For $L \in \mathbb{N}$ let $\Pi^{d,L}$ be the abstract sequence of ordinal patterns of order $d$ of $(X_0, X_1, \ldots, X_L)$. Then for any $\theta \in (0, 1)$ it holds
$$\lim_{L \to \infty} \frac{1}{L}\, \mathrm{CEofOP}\big(\lfloor \theta L \rfloor; \Pi^{d,L}\big) = 0 \tag{14}$$
$\mathbb{P}$-almost surely.

Acknowledgement
This work was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/1].
References

[1] J. M. Amigó. Permutation Complexity in Dynamical Systems. Ordinal Patterns, Permutation Entropy and All That. Berlin Heidelberg: Springer, 2010.
[2] J. M. Amigó and K. Keller. Permutation entropy: One concept, two approaches. European Physical Journal Special Topics, 222(2):263–273, 2013.
[3] T. W. Anderson and L. A. Goodman. Statistical inference about Markov chains. Annals of Mathematical Statistics, 28:89–110, 1957.
[4] C. Bandt. Autocorrelation type functions for big and dirty data series. arXiv preprint arXiv:1411.3904, 2014.
[5] C. Bandt, G. Keller, and B. Pompe. Entropy of interval maps via permutations. Nonlinearity, 15(5):1595–1602, 2002.
[6] C. Bandt and B. Pompe. Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88(174102), 2002.
[7] C. Bandt and F. Shiha. Order patterns in time series. Journal of Time Series Analysis, 28(5):646–665, 2007.
[8] M. Basseville and I. V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Upper Saddle River, NJ: Prentice-Hall, Inc., 1993.
[9] B. E. Brodsky and B. S. Darkhovsky. Nonparametric Methods in Change-Point Problems. Dordrecht: Kluwer Academic Publishers, 1993.
[10] B. E. Brodsky and B. S. Darkhovsky. Non-Parametric Statistical Diagnosis. Problems and Methods. Dordrecht: Kluwer Academic Publishers, 2000.
[11] B. E. Brodsky, B. S. Darkhovsky, A. Ya. Kaplan, and S. L. Shishkin. A nonparametric method for the segmentation of the EEG. Computer Methods and Programs in Biomedicine, 60(2):93–106, 1999.
[12] E. Carlstein, H. G. Muller, and D. Siegmund. Change-Point Problems. Hayward, CA: Institute of Mathematical Statistics, 1994.
[13] R. A. Davis, T. C. M. Lee, and G. A. Rodriguez-Yam. Structural break estimation for nonstationary time series models. Journal of the American Statistical Association, 101(473):223–239, 2006.
[14] A. C. Davison and D. V. Hinkley. Bootstrap Methods and their Applications. Cambridge: Cambridge University Press, 1997.
[15] C. Diks. Nonlinear Time Series Analysis: Methods and Applications. World Scientific, 1999.
[16] S. Elizalde and M. Martinez. The frequency of pattern occurrence in random walks. arXiv preprint arXiv:1412.0692, 2014.
[18] P. Grassberger. Entropy estimates from insufficient samplings. arXiv preprint physics/0307138, 2003.
[19] T. S. Han and K. Kobayashi. Mathematics of Information and Coding. Transl. from the Japanese by J. Suzuki. Providence: American Mathematical Society, 2002.
[20] T. Haruna and K. Nakajima. Permutation complexity and coupling measures in hidden Markov models. Entropy, 15(9):3910–3930, 2013.
[21] K. Keller. Permutations and the Kolmogorov-Sinai entropy. Discrete and Continuous Dynamical Systems, 32(3):891–900, 2012.
[22] K. Keller, M. Sinn, and J. Emonds. Time series from the ordinal viewpoint. Stochastics and Dynamics, 7(2):247–272, 2007.
[23] K. Keller, A. M. Unakafov, and V. A. Unakafova. Ordinal patterns, entropy, and EEG. Entropy, 16(12):6212–6239, 2014.
[24] A. Y. Kim, C. Marzban, D. B. Percival, and W. Stuetzle. Using labeled data to evaluate change detectors in a multivariate streaming environment. Signal Processing, 89(12):2529–2536, 2009.
[25] S. N. Lahiri. Resampling Methods for Dependent Data. New York: Springer, 2003.
[26] M. Lavielle. Detection of multiple changes in a sequence of dependent variables. Stochastic Processes and their Applications, 83(1):79–102, 1999.
[27] M. Lavielle and G. Teyssière. Adaptive detection of multiple change-points in asset price volatility. In G. Teyssière and A. P. Kirman, editors, Long Memory in Economics, pages 129–156. Berlin Heidelberg: Springer, 2007.
[28] S. J. Linz and M. Lücke. Effect of additive and multiplicative noise on the first bifurcations of the logistic model. Physical Review A, 33(4):2694, 1986.
[29] M. Lyubich. Forty years of unimodal dynamics: on the occasion of Artur Avila winning the Brin prize. Journal of Modern Dynamics, 6(2):183–203, 2012.
[30] H. N. Nagaraja. On the non-Markovian structure of discrete order statistics. Journal of Statistical Planning and Inference, 7:29–33, 1982.
[31] A. M. Polansky. Detecting change-points in Markov chains. Computational Statistics & Data Analysis, 51(12):6013–6026, 2007.
[32] B. Pompe. The LE-statistic. The European Physical Journal Special Topics, 222(2):333–351, 2013.
[33] B. Pompe and J. Runge. Momentary information transfer as a coupling measure of time series. Physical Review E, 83:051122, 2011.
[34] P. Preuss, R. Puchstein, and H. Dette. Detection of multiple structural breaks in multivariate time series. Journal of the American Statistical Association, 110(510):654–668, 2015.
[35] A. Schnurr and H. Dehling. Testing for structural breaks via ordinal pattern dependence. arXiv preprint arXiv:1501.07858, 2015.
[36] T. Schreiber and A. Schmitz. Surrogate time series. Physica D: Nonlinear Phenomena, 142(3):346–382, 2000.
[37] T. Schreiber and A. Schmitz. Improved surrogate data for nonlinearity tests. Physical Review Letters, 77(4):635, 1996.
[38] M. Sinn, A. Ghodsi, and K. Keller. Detecting change-points in time series by maximum mean discrepancy of ordinal pattern distributions. In Uncertainty in Artificial Intelligence, Proceedings of the 28th Conference, pages 786–794, 2012.
[39] M. Sinn and K. Keller. Estimation of ordinal pattern probabilities in fractional Brownian motion. arXiv preprint arXiv:0801.1598, 2008.
[40] M. Sinn and K. Keller. Estimation of ordinal pattern probabilities in Gaussian processes with stationary increments. Computational Statistics & Data Analysis, 55(4):1781–1790, 2011.
[41] M. Sinn, K. Keller, and B. Chen. Segmentation and classification of time series using ordinal pattern distributions. European Physical Journal Special Topics, 222(2):587–598, 2013.
[42] D. S. Stoffer. Frequency domain techniques in the analysis of DNA sequences. In T. S. Rao, S. S. Rao, and C. R. Rao, editors, Handbook of Statistics: Time Series Analysis: Methods and Applications, pages 261–296. Elsevier, 2012.
[43] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer. Testing for nonlinearity in time series: the method of surrogate data. Physica D: Nonlinear Phenomena, 58(1-4):77–94, 1992.
[44] H. Thunberg. Periodicity versus chaos in one-dimensional dynamics. SIAM Review, 43(1):3–30, 2001.
[45] A. M. Unakafov. Ordinal-Patterns-Based Segmentation and Discrimination of Time Series with Applications to EEG Data. PhD thesis, University of Lübeck, 2015.
[46] A. M. Unakafov and K. Keller. Conditional entropy of ordinal patterns. Physica D, 269:94–102, 2014.
[47] V. A. Unakafova and K. Keller. Efficiently measuring complexity on the basis of real-world data. Entropy, 15(10):4392–4415, 2013.
[48] L. Yu. Vostrikova. Detecting disorder in multidimensional random processes. Soviet Mathematics Doklady, 24(1):55–59, 1981.
[49] Y.-J. Yuan, X. Wang, Z.-T. Huang, and Z.-C. Sha. Detection of Radio Transient Signal Based on Permutation Entropy and GLRT.