Segmented Pairwise Distance for Time Series with Large Discontinuities
Jiabo He, Sarah Erfani, Sudanthi Wijewickrema, Stephen O'Leary, Kotagiri Ramamohanarao
School of Computing and Information Systems, The University of Melbourne, Australia
Department of Otolaryngology, The University of Melbourne, Australia
Email: {jiaboh@student., sarah.erfani@, sudanthi.wijewickrema@, sjoleary@, kotagiri@}unimelb.edu.au

Abstract: Time series with large discontinuities are common in many scenarios. However, existing distance-based algorithms (e.g., DTW and its derivative algorithms) may perform poorly in measuring distances between these time series pairs. In this paper, we propose the segmented pairwise distance (SPD) algorithm to measure distances between time series with large discontinuities. SPD is orthogonal to distance-based algorithms and can be embedded in them. We validate advantages of SPD-embedded algorithms over corresponding distance-based ones on both open datasets and a proprietary dataset of surgical time series (of surgeons performing a temporal bone surgery in a virtual reality surgery simulator). Experimental results demonstrate that SPD-embedded algorithms outperform corresponding distance-based ones in distance measurement between time series with large discontinuities, measured by the Silhouette index (SI).
Index Terms: segmented pairwise distance, distance-based algorithms, time series, large discontinuities
I. INTRODUCTION
Time series are a ubiquitous form of data in scientific disciplines. A time series may contain value gaps, which are jumps in value orthogonal to the time axis. Time series with large discontinuities (value gaps) are common in a variety of scenarios, such as surgical procedures and human activity. Measuring distances between such time series is challenging, because large discontinuities can keep local characteristics out of the spotlight. Since distance measurement is the core of similarity analysis, classification and clustering of time series, the issue of measuring distances between time series with large discontinuities needs to be addressed.

A large number of algorithms measure distances between time series, among which Euclidean distance and dynamic time warping (DTW), along with their derivative algorithms, are the most widely used. Many classification and clustering algorithms are based on Euclidean distance when all elements have the same dimension or length [1]–[4]. However, Euclidean distance can perform poorly when time series are distorted along the time axis [5]. DTW is also widely used for global distance measurement of time series, in a diverse range of domains including gesture recognition [6], [7], time series classification [8], trajectory clustering [9] and disease detection [10]. DTW can address distortion in time series to a certain extent by aligning two time series with indices in monotonically increasing order. However, DTW is a global distance-based algorithm that cannot fully extract local characteristics of time series. As a result, DTW is unsuitable for data where local similarity is more significant than global similarity.

Fig. 1. Example of segmentation and alignment for time series with large discontinuities using (a) DTW and (b) SPD-embedded DTW (SDTW).

To address this issue, we propose the segmented pairwise distance (SPD) algorithm, which can be embedded in distance-based algorithms. Although both Euclidean distance and DTW have difficulty in measuring distances between time series with large discontinuities, we use DTW and its derivative algorithms as our baselines, since DTW performs comparatively better than Euclidean distance. Fig. 1 illustrates the difference between DTW and SPD-embedded DTW (SDTW) on time series with large discontinuities. DTW aligns two time series with monotonically increasing indices, while SDTW segments the time series at large discontinuities and sums the distances between the most similar segment pairs. As a result, SDTW can detect time series that share similar segment pairs, because it assigns them small overall distances.

The main contribution of this paper is a new algorithm, SPD, to measure distances between time series with large discontinuities. SPD has two technical merits: (1) SPD is orthogonal to distance measurement and can be embedded in all distance-based algorithms; (2) SPD decides a unique segmentation threshold for every time series, so it can be applied to a variety of datasets. We validate advantages of SPD-embedded algorithms over corresponding distance-based ones on both open datasets and the surgical dataset that we collect, where surgeries are performed by expert surgeons on the same patient's temporal bone in the virtual reality surgery simulator. This dataset provides a new challenging benchmark for distance measurement of time series with large discontinuities. In addition, all techniques for speeding up distance-based algorithms can also be applied to SPD-embedded algorithms, which is beyond the scope of this paper.

II. RELATED WORK
A. Segmentation algorithms for time series
Many segmentation algorithms for time series are based on regression subroutines [11]–[13]. There are three main categories of regression-based segmentation algorithms: Sliding-Windows, Top-Down, and Bottom-Up algorithms. They are used to address several segmentation problems: generating piecewise representations (1) using only K segments, (2) minimizing the piecewise error, and (3) minimizing the total error in various scenarios [14]. [11] caught segmentation points by combining the Lasso penalty with dynamic programming. [15] first learned a sequence of local relationship models that could best fit time series data, and then combined changes of local relationships to identify operational behavior switching at the system level. Regression-based segmentation algorithms have two main limitations: (1) regression subroutines are not efficient if one just needs segments of time series without regression; (2) linear representations for univariate segments are not applicable to high-dimensional multivariate time series.

There are also segmentation algorithms for time series that do not employ regression. [16] successfully segmented multivariate time series with differential evolution. Later, the fast low-cost online semantic segmentation (FLOSS) algorithm segmented time series at a high level by detecting regime changes [17]. In addition, Matrix Profile distance (MPdist) was proposed to detect the similarity of two time series when they share multiple similar subsequences based on Euclidean distance [18]. However, these algorithms still have limitations: (1) they are not computationally efficient because they traverse all possible subsequences; (2) some hyperparameters (e.g., the length of subsequences in time series and the quantile threshold in MPdist) must be decided before measurement, and these are domain dependent.
B. Euclidean distance, DTW and their derivatives
Both Euclidean distance and DTW have difficulty in measuring distances between time series with large discontinuities. Euclidean distance regards each time series as a high-dimensional point and is extensively applied as the distance function in time series classification [1], clustering [2], [3], and other scenarios. DTW and its derivative algorithms can also measure distances between time series. Complexity invariance was proposed to measure distances between time series with varying complexities, and it can be embedded in DTW as the complexity-invariant DTW (CIDTW) algorithm [19]. The shape of time series is also a significant feature. [20] proposed the derivative DTW (DDTW) algorithm to align time series using high level features of shape. Moreover, the phase difference is also a potential problem because DTW provides non-linear alignments. [21] proposed the weighted DTW (WDTW) algorithm to penalize points with higher phase difference, in order to achieve minimum distance distortion caused by outliers. The weighted version of DDTW (WDDTW) was then proposed in [21]. These derivative algorithms of DTW all perform competitively in their specific scenarios and exhibit limitations in others. They are all whole time series distance-based algorithms, which measure distances between time series using all elements in them. We do not consider shapelet-based, interval-based, or dictionary-based algorithms. Those measure distances between subsequences of whole time series with different feature selection methods, and were compared and analyzed comprehensively in [22].

Some research has also embedded segmentation techniques in DTW for distance measurement. [23] implemented peak identification and pairing for time series before DTW analysis. Its limitation is that the two time series must have the same number of segments, which does not hold in many scenarios. In contrast, our proposed segmented pairwise distance (SPD) algorithm can be embedded in any distance-based algorithm, and the numbers of segments of the two time series are not required to be the same. Another segment-based DTW (SBDTW) algorithm was proposed for similarity measurement in urban transportation systems [24]. Point-segment, prediction and segment-segment distances were defined in SBDTW, and the minimal distance of a time series pair was computed by accumulating the minimum of the three distances. Instead, SPD only segments time series at their large discontinuities. The original distance-based algorithm is then applied to every segment pair from the two time series to obtain pairwise distances, which are used to calculate the overall distance between the time series.

Footnote: The code and dataset are available at https://github.com/Jacobi93/Segmented-Pairwise-Distance.
C. Local similarity
Some researchers have also noticed the significance of local similarity for time series. Local descriptors for recognizing motion patterns in videos were presented to classify human actions [25]. Internal self-similarities were captured by a local self-similarity descriptor [26], which provided matching capabilities for complex visual data. In addition, all-pairs-similarity-search algorithms were proposed to evaluate similarity joins for time series subsequences [27]. To extend this idea, our work measures local similarity between time series with large discontinuities using SPD-embedded algorithms.

III. SEGMENTED PAIRWISE DISTANCE ALGORITHM
This paper proposes the segmented pairwise distance (SPD) algorithm to measure distances between time series with large discontinuities. The SPD algorithm can be embedded in all distance-based ones. Since DTW performs comparatively better than Euclidean distance, we use DTW and its derivative algorithms in our experiments. We embed SPD in DTW to build the SDTW algorithm as an example, with details in Algorithm 1.

Algorithm 1 SDTW
Input: Time series $A$ and $B$ of lengths $n_1$ and $n_2$, quantile $q$
Output: $SDTW(A, B)$
1: Calculate consecutive distances for $A$ and $B$, and obtain segmentation thresholds based on $q$ and the sorted distance distributions of $A$ and $B$, respectively
2: Segment $A$ and $B$ into $s_1$ segments $(a_1, a_2, \ldots, a_{s_1})$ and $s_2$ segments $(b_1, b_2, \ldots, b_{s_2})$ based on the thresholds
3: Calculate $DTW(a_i, b_j)$ for all $i$ in $[1, s_1]$ and $j$ in $[1, s_2]$ to obtain the DTW matrix $D$ of size $(s_1, s_2)$
4: $Dis_1 = \sum_{i=1}^{s_1} \min(row(i))$
5: Record the column numbers of $\min(row(i))$ in step 4 and delete the recorded columns, with $s'$ columns remaining in $D'$
6: $Dis_1 = Dis_1 + \sum_{j=1}^{s'} \min(col(j))$
7: $D = D^T$, repeat steps 4-6 to obtain $Dis_2$
8: $SDTW(A, B) = \dfrac{\min(Dis_1, Dis_2)}{n_1 + n_2}$

A. Dynamic Time Warping
Before describing the proposed SPD in detail, we first introduce DTW. We select DTW and its derivative algorithms as distance measurement baselines because DTW performs comparatively better than Euclidean distance in measuring distances between time series with large discontinuities. DTW can flexibly align time series of different lengths. Equations (1) and (2) are the recursive functions of DTW [5]. $A$ and $B$ are two time series with sequences $(a_1, a_2, \ldots, a_{n_1})$ and $(b_1, b_2, \ldots, b_{n_2})$. $n_1$ and $n_2$ are the numbers of elements in $A$ and $B$. $d(a_i, b_j)$ is the distance (usually Euclidean distance) defined between the $i$th element in $A$ and the $j$th element in $B$.

$$D(1{:}n_1, 1{:}n_2) = d(a_{n_1}, b_{n_2}) + \min\left[ D(1{:}(n_1 - 1), 1{:}(n_2 - 1)),\; D(1{:}(n_1 - 1), 1{:}n_2),\; D(1{:}n_1, 1{:}(n_2 - 1)) \right] \tag{1}$$

$$D(1{:}1, 1{:}1) = d(a_1, b_1) \tag{2}$$

B. SPD-embedded Dynamic Time Warping
We are now in a position to introduce the core contribution of our work. This paper proposes SPD to help distance-based algorithms measure distances between time series with large discontinuities. We embed SPD in DTW to build SDTW as an example (Algorithm 1). To measure the SDTW distance between two time series, we first calculate the distances between consecutive elements of each time series. The quantile q of the sorted distance distribution decides the segmentation threshold for each time series (step 1). The quantile q is not completely domain agnostic. Knowledge of the dataset can help set a reasonable q for distance measurement between time series, although our experiments show that the results are insensitive to q in range of [0 . , . in most scenarios. q = 0 . represents that the time series will be segmented where the distance between two consecutive elements is larger than of all in the time series. For example, if there are about one thousand elements in the time series, it will be segmented into subsequences approximately. Once q is determined, there is a unique segmentation threshold for every time series.

Fig. 2. Example of distance measurement between time series with large discontinuities using DTW and SPD-embedded DTW (SDTW): (a) DTW = 22, (b) SDTW = 2.

After segmenting the two time series based on their thresholds (step 2), we calculate the DTW distance for every segment pair from the two time series to obtain the DTW matrix $D$ (step 3). Afterwards, the minimal distances from every segment in $A$ to any segment in $B$ are found and accumulated (step 4). Some segments in $B$ may never be paired with a segment of $A$ in step 4. The minimal distances from those remaining segments in $B$ to any segment in $A$ are then found and accumulated to obtain $Dis_1$ (steps 5-6). $Dis_2$ is obtained by transposing $D$ and repeating steps 4-6 (step 7).
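The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes univariate series, an absolute-difference element distance `d`, a strict "greater than threshold" cut rule, a simple index-based quantile convention, and first-index tie-breaking for row minima. The helper names (`quantile_threshold`, `segment`, `one_sided`, `sdtw`) are hypothetical.

```python
import math

def dtw(a, b):
    """DTW distance via the recursion of Equations (1)-(2), with d(x, y) = |x - y| (assumed)."""
    n1, n2 = len(a), len(b)
    D = [[math.inf] * (n2 + 1) for _ in range(n1 + 1)]
    D[0][0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n1][n2]

def quantile_threshold(series, q):
    """Step 1: threshold at quantile q of the sorted consecutive distances (convention assumed)."""
    gaps = sorted(abs(y - x) for x, y in zip(series, series[1:]))
    idx = min(len(gaps) - 1, max(0, math.ceil(q * len(gaps)) - 1))
    return gaps[idx]

def segment(series, threshold):
    """Step 2: cut wherever the consecutive distance exceeds the threshold."""
    segments, current = [], [series[0]]
    for prev, cur in zip(series, series[1:]):
        if abs(cur - prev) > threshold:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def one_sided(D):
    """Steps 4-6: sum the row minima, then add column minima of columns never chosen."""
    total, used = 0.0, set()
    for row in D:
        m = min(row)
        total += m
        used.add(row.index(m))  # first minimal column; tie-breaking is an assumption
    for j in range(len(D[0])):
        if j not in used:
            total += min(row[j] for row in D)
    return total

def sdtw(a, b, thr_a, thr_b):
    """Steps 2-8 for given per-series thresholds."""
    segs_a, segs_b = segment(a, thr_a), segment(b, thr_b)
    D = [[dtw(sa, sb) for sb in segs_b] for sa in segs_a]  # step 3
    dis1 = one_sided(D)                                    # steps 4-6
    dis2 = one_sided([list(col) for col in zip(*D)])       # step 7 (transpose)
    return min(dis1, dis2) / (len(a) + len(b))             # step 8

# The two nine-element sequences of Fig. 2, both segmented with threshold 2.
A = [4, 5, 6, 1, 2, 3, 7, 8, 9]
B = [1, 2, 3, 7, 8, 9, 4, 6, 5]
plain = dtw(A, B)
segmented = sdtw(A, B, 2, 2)
```

Under these assumptions `plain` evaluates to 22, matching Fig. 2(a). The accumulated segment distance is 2, so `segmented` is 2/18 after the normalization of step 8; the value 2 reported in Fig. 2(b) appears to be the sum before normalization.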
Finally, we can obtain the SDTW distance between two time series $A$ and $B$ in step 8.

Fig. 2 compares DTW and SDTW when calculating the distance between two time series with large discontinuities. Sequence $A$ is (4, 5, 6, 1, 2, 3, 7, 8, 9). Sequence $B$ is (1, 2, 3, 7, 8, 9, 4, 6, 5). We set the segmentation threshold to 2 for both, so that each time series is segmented into 3 subsequences. From the local point of view, the two time series are very similar to each other but not exactly the same. Both $A$ and $B$ contain the subsequences (1, 2, 3) and (7, 8, 9), while $A$ contains (4, 5, 6) and $B$ contains (4, 6, 5). SDTW successfully detects the similar subsequence pairs from the two time series, and therefore outperforms DTW when analyzing local similarity between them.

The time complexity of DTW is $O(n_1 n_2)$. The time complexity of SDTW is $O(n_1) + O(n_2) + O(\sum_{i=1}^{s_1} \sum_{j=1}^{s_2} l(a_i) l(b_j)) + O(s_1 s_2)$, where $O(n_1) + O(n_2)$ is the time complexity of segmenting $A$ and $B$ (steps 1-2), $O(\sum_{i=1}^{s_1} \sum_{j=1}^{s_2} l(a_i) l(b_j))$ is that of calculating the DTW matrix (step 3), and $O(s_1 s_2)$ is that of calculating the SDTW distance from the DTW matrix (steps 4-8). $l(a_i)$ is the length of the $i$th subsequence in $A$ and $l(b_j)$ is the length of the $j$th subsequence in $B$. Since $\sum_{i=1}^{s_1} l(a_i) = n_1$ and $\sum_{j=1}^{s_2} l(b_j) = n_2$, the double sum factors into the product of the two sums, so $O(\sum_{i=1}^{s_1} \sum_{j=1}^{s_2} l(a_i) l(b_j))$ is equal to $O(n_1 n_2)$. Because $s_1 \le n_1$ and $s_2 \le n_2$, the $O(s_1 s_2)$ term is dominated by $O(n_1 n_2)$. The time complexity of SDTW is therefore $O(n_1 n_2)$, the same as that of DTW, although SDTW takes slightly longer to run than DTW in experiments. In addition, all techniques for speeding up DTW can also be applied to SDTW, and SDTW has lower space complexity than DTW. Both topics are beyond the scope of this paper.

IV. DATASETS
A. The Cortical Mastoidectomy dataset
The Cortical Mastoidectomy (CM) dataset is collected with the help of expert ear surgeons performing this operation on the Virtual Reality Temporal Bone Surgery (VRTBS) simulator [28]. The VRTBS simulator was developed as a platform for temporal bone surgery training, including CM. Expert surgeons can record their surgeries in the simulator so that trainees can learn from them. Trainees can also practise surgeries repeatedly in the simulator until they achieve expertise. Long time series of the surgical drilling bit are recorded as voxel positions in 3D space. Fig. 3 is an example of a temporal bone after the completion of a CM surgery on the simulator [29]. The vacant region in the center is the drilled part of the mastoid from a temporal bone. Data preprocessing is applied for better distance measurement because surgical time series collected in the simulator are usually contaminated with noise. First, all consecutive duplicated elements are deleted so that no repeated positions are recorded. Second, we also delete positions at which no mastoid is removed, so that all changes of position in the time series are effective actions. The remaining time series, which have varying lengths, are saved for distance measurement. [30] proposed processing methods to deal with varying-length time series, such as uniform scaling and prefix and suffix padding. We do not include them because (1) there is no noticeable improvement and (2) DTW and its derivatives are able to measure distances between time series with varying lengths.

Surgeons remove the mastoid part by part, so time series with large discontinuities are recorded. Surgeons remove mastoid air cells in a variety of ways and styles. From the global point of view, the time series of the surgical drilling bit differs in every surgery. However, some parts of them are analogous from the local point of view, because surgeons tend to remove parts of the mastoid in their own way when they perform CM surgeries. As the surgery goes on, there are more and more stochastic actions of removing the mastoid from the temporal bone, which are unavoidably recorded as stochastic elements in the time series. These stochastic elements can severely impede the discovery of surgeries from the same surgeon. To alleviate this problem, we truncate the first / of all time series to build the CM dataset. 21 surgeries are collected from 7 surgeons on the same temporal bone, with each surgeon performing 3 surgeries in their unique style. Therefore, there are 21 time series pairs from the same surgeon and 189 pairs from different surgeons in total. Surgical time series from the same surgeon tend to have smaller distances between each other than those from different surgeons.

Fig. 3. Processed temporal bone image after the cortical mastoidectomy (CM) surgery (bottom right is one surgeon performing a CM surgery in the VRTBS simulator).

To our knowledge, the CM dataset is a very challenging benchmark dataset. Surgeries from the same surgeon can still differ from each other from a global point of view (Fig. 4). Our goal is not to propose a novel algorithm that identifies every surgery from every surgeon without any mistake (we believe that no algorithm can do this right now). We provide this benchmark dataset to help researchers explore the possibility of distance measurement for time series with large discontinuities. The CM dataset has several significant characteristics:
• All surgical time series are collected from expert surgeons in the VRTBS simulator. They differ from time series collected from sensors or other professional tools because they contain not only noise but also unavoidable stochastic actions, which makes it more challenging to measure distances between these time series pairs.
• Apart from the noise and unavoidable stochastic actions in the CM dataset, there are large discontinuities in every surgical time series. The detection of consecutive elements with large discontinuities is crucial for measuring distances between them.
• All surgical time series collected in the CM dataset are 3D time series, while most open datasets are composed of 1D (e.g., stock price, ECG) or 2D (e.g., GPS trajectories) time series.
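The first preprocessing step and the pair counts given in this subsection can be checked with a short Python sketch. It assumes voxel positions are stored as tuples; the function name `drop_consecutive_duplicates` and the toy positions are hypothetical. The second preprocessing step (deleting positions where no mastoid is removed) needs simulator data and is not shown.

```python
from itertools import combinations

def drop_consecutive_duplicates(positions):
    """Delete consecutive duplicated elements so no position is repeated back to back."""
    out = []
    for p in positions:
        if not out or out[-1] != p:
            out.append(p)
    return out

# Toy 3D voxel positions (illustrative values only).
raw = [(1, 1, 1), (1, 1, 1), (1, 1, 2), (1, 1, 1)]
cleaned = drop_consecutive_duplicates(raw)

# 7 surgeons with 3 surgeries each give 21 time series.
surgeon_of = [s for s in range(7) for _ in range(3)]
pairs = list(combinations(range(len(surgeon_of)), 2))
same = sum(1 for i, j in pairs if surgeon_of[i] == surgeon_of[j])
different = len(pairs) - same
```

The counts follow from 7 x C(3, 2) = 21 same-surgeon pairs out of C(21, 2) = 210 pairs in total, which leaves 189 pairs from different surgeons.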
B. Open datasets
There are not many open datasets in which time series have large discontinuities. This may be one reason why little research has addressed the issue of measuring distances between them. However, the issue is still important because such time series are common in some scenarios, such as surgical procedures and human activity. We will release the CM dataset as a supplement to existing datasets. It will benefit researchers who want to do further research on measuring distances between time series with large discontinuities.

To validate the segmented pairwise distance (SPD) algorithm, we need other open datasets, modified where necessary. After a thorough (though possibly incomplete) search of the UCR Archive and other online data resources, two datasets are used for our experiments after limited modification.

Fig. 4. Cortical mastoidectomy surgery time series performed by seven surgeons with each performing three surgeries.
1) Activity Recognition dataset:
The first dataset is the Activity Recognition (AR) dataset [31]. AR is collected from wearable accelerometers mounted on the chests of 15 participants performing 7 activities, such as standing, walking, and going up and down stairs. It provides challenges for the identification and authentication of people using motion patterns.

Although there are 7 different activities in the recorded time series of every participant, there are still no large discontinuities in the time series. To build time series with large discontinuities as the benchmark dataset, we concatenate different activities of the same participant to build new time series for everyone. The activities standing and walking are selected because their patterns are quite different from each other in fluctuation amplitude and frequency. For every participant, two new time series are built as $(S, W, S, W)$ and $(W, S, W, S)$, where $S$ stands for standing and $W$ stands for walking. They are randomly selected from subsequences of time series representing the same participant with the same length (500 elements in each subsequence and 2000 in total). We partition the 15 participants into 5 sub-datasets so that there are 3 participants in each experiment in AR (from AR1 to AR5 in Table I), with each participant having two newly-built time series.
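The construction of the two concatenated series per participant can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the recordings are assumed to be plain Python lists, every 500-element window is assumed to be drawn independently and uniformly at random, and the names `random_window` and `build_pair` are hypothetical.

```python
import random

def random_window(recording, length, rng):
    """Pick a random contiguous subsequence of the given length."""
    start = rng.randrange(len(recording) - length + 1)
    return recording[start:start + length]

def build_pair(standing, walking, length=500, rng=None):
    """Build (S, W, S, W) and (W, S, W, S) from one participant's recordings."""
    rng = rng or random.Random(0)
    swsw, wsws = [], []
    for k in range(4):
        # Even positions take standing first, odd positions take walking first.
        first, second = (standing, walking) if k % 2 == 0 else (walking, standing)
        swsw.extend(random_window(first, length, rng))
        wsws.extend(random_window(second, length, rng))
    return swsw, wsws

# Toy recordings: constant values stand in for real accelerometer signals.
standing = [0.0] * 600
walking = [1.0] * 600
swsw, wsws = build_pair(standing, walking)
```

Each resulting series has 4 x 500 = 2000 elements, and the boundaries between standing and walking blocks are where the large discontinuities appear.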
2) Indoor User Movement dataset:
The second dataset is the Indoor User Movement (IUM) dataset [32]. IUM contains patterns of user movements in real-world office environments from time series generated by a Wireless Sensor Network (WSN), comprising 5 sensors: 4 anchors deployed in the environment and 1 mote worn by the user. Target data in IUM consist of a class label indicating whether the user's time series will lead to a room change or not. In particular, target class 1 is associated with location-changing movements (156 sequences), while target class -1 is associated with location-preserving movements (158 sequences).

For the same reason mentioned above, we need to build new time series to evaluate the performance of SPD. Every movement is recorded by 4 anchors in IUM. We concatenate the four recordings and add constant values to the second and fourth subsequences to create large discontinuities. We select 26 time series from IUM without repetition each time, with 13 from class 1 and the other 13 from class -1. As a result, the IUM dataset is partitioned into 12 sub-datasets (from IUM1 to IUM12 in Table I).

V. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, we compare and analyze the performance of SPD-embedded algorithms against their corresponding distance-based ones on open datasets and the cortical mastoidectomy (CM) dataset collected by expert surgeons.
A. Silhouette index
To evaluate the performance of SPD-embedded algorithms along with corresponding distance-based ones, we should test whether they measure distances between time series reasonably. Internal cluster criteria are built to evaluate the performance of clustering algorithms; they compare the average within-cluster distances and between-cluster distances obtained by different algorithms [21]. There is a group of validity indices to interpret and validate distance measurement of clustering algorithms, including the Silhouette index, the Dunn index and the Davies-Bouldin index. In this paper, the Silhouette index (SI) is used as the evaluation metric for our task [33]. Equations (3)-(7) define SI, where $t$ are time series in the dataset $D_t$. We can calculate $SI(t_i)$ for every time series $t_i$, and the overall $SI(D_t)$ estimates the performance of distance-based algorithms on the dataset. $a(t_i)$ is the average pairwise distance between $t_i$ and any other time series in the same cluster, while $b(t_i)$ is the average pairwise distance between $t_i$ and any time series in the neighbouring cluster. SI is in the range of $[-1, 1]$, and a high SI means appropriate distance measurement for clustering the dataset.

$$a(t_i) = \frac{1}{|C_{t_i}| - 1} \sum_{t_j \in C_{t_i},\, i \neq j} D(t_i, t_j) \tag{3}$$

$$b(t_i) = \min_{k \neq i} \frac{1}{|C_{t_k}|} \sum_{t_j \in C_{t_k}} D(t_i, t_j) \tag{4}$$

$$SI(t_i) = \frac{b(t_i) - a(t_i)}{\max(a(t_i), b(t_i))}, \quad \text{if } |C_{t_i}| > 1 \tag{5}$$

$$SI(t_i) = 0, \quad \text{if } |C_{t_i}| = 1 \tag{6}$$

$$SI(D_t) = \frac{1}{|D_t|} \sum_{t_i \in D_t} SI(t_i) \tag{7}$$

B. Experimental algorithms
We select 5 distance-based algorithms, namely DTW, CIDTW, DDTW, WDTW and WDDTW, mentioned in Section II. We chose DTW as an example in Section III to show how SPD can be embedded in distance-based algorithms. The other four algorithms compared in the following experiments are described here.
1) CIDTW:
Complexity invariance was proposed as a supplement to the invariance family, which includes amplitude invariance, local scaling invariance, uniform scaling invariance, phase invariance, occlusion invariance, and their combinations [19]. Complexity invariance is achieved by introducing a correction factor $CF$ for existing distance measures, obtained from the complexity estimate $CE$. It can be embedded in DTW as the complexity-invariant DTW (CIDTW) algorithm defined by (8)-(10).

$$CIDTW(A, B) = DTW(A, B) \times CF(A, B) \tag{8}$$

$$CF(A, B) = \frac{\max(CE(A), CE(B))}{\min(CE(A), CE(B))} \tag{9}$$

$$CE(A) = \sqrt{\sum_{i=1}^{n-1} (a_i - a_{i+1})^2} \tag{10}$$
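Equations (8)-(10) translate directly into code. The sketch below assumes univariate series and an absolute-difference element distance inside DTW; the function names are hypothetical. Note that the correction factor is undefined when one of the series is constant (its complexity estimate is zero), a case this sketch does not handle.

```python
import math

def dtw(a, b):
    """Plain DTW with d(x, y) = |x - y| (an assumed element distance)."""
    n1, n2 = len(a), len(b)
    D = [[math.inf] * (n2 + 1) for _ in range(n1 + 1)]
    D[0][0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n1][n2]

def complexity_estimate(a):
    """Equation (10): root of the summed squared consecutive differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, a[1:])))

def cidtw(a, b):
    """Equations (8)-(9): DTW scaled by the ratio of the larger to the smaller CE."""
    ce_a, ce_b = complexity_estimate(a), complexity_estimate(b)
    cf = max(ce_a, ce_b) / min(ce_a, ce_b)  # raises ZeroDivisionError for a constant series
    return dtw(a, b) * cf

value = cidtw([1, 2, 3], [1, 3, 5])
```

For these two toy series the DTW distance is 3 and the complexity estimates are sqrt(2) and sqrt(8), so the correction factor is 2 and `value` is 6 up to floating-point rounding.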
2) DDTW:
The derivative DTW (DDTW) algorithm aligns time series considering high level features of shape. It obtains information about shapes by considering the first derivative of time series [20]. DDTW preprocesses time series using (11), where the undefined elements $a^D_1$ and $a^D_n$ are obtained by $a^D_1 = a^D_2$ and $a^D_n = a^D_{n-1}$. Distances between preprocessed time series are then calculated by DTW.

$$a^D_i = \frac{(a_i - a_{i-1}) + (a_{i+1} - a_{i-1})/2}{2}, \quad 1 < i < n \tag{11}$$
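The preprocessing of Equation (11) can be sketched as follows, assuming a univariate series with at least three elements; the name `derivative_transform` is hypothetical. The transformed series would then be passed to an ordinary DTW routine.

```python
def derivative_transform(a):
    """Estimate the first derivative of a series as in Equation (11)."""
    n = len(a)
    if n < 3:
        raise ValueError("need at least three elements")
    d = [0.0] * n
    for i in range(1, n - 1):
        # Average of the left slope and half the two-step slope around a[i].
        d[i] = ((a[i] - a[i - 1]) + (a[i + 1] - a[i - 1]) / 2) / 2
    # Boundary elements copy their nearest defined neighbour.
    d[0], d[-1] = d[1], d[-2]
    return d

shape = derivative_transform([1, 2, 4, 7])
```

For the toy input the two interior estimates are 1.25 and 2.25, and the boundaries copy them, so `shape` is `[1.25, 1.25, 2.25, 2.25]`.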
3) WDTW:
The phase difference is also a common problem because DTW provides non-linear alignments, which is regarded as the phase invariance problem in the invariance family. [21] proposed the weighted DTW (WDTW) algorithm to penalize elements with larger phase difference using (12)-(13), in order to achieve minimum distance distortion caused by outliers. Equation (14) is the modified logistic weight function (MLWF), proposed to systematically assign weights as a function of phase difference. $g$ is the penalty coefficient for phase difference. There is no guarantee that SPD can improve the performance of WDTW for any $g$ on any dataset, but we found positive results in most experiments. g = 0 . is selected as the penalty for WDTW shown in Table I.

$$D(1{:}n_1, 1{:}n_2) = w_{|n_1 - n_2|}\, d(a_{n_1}, b_{n_2}) + \min\left[ D(1{:}(n_1 - 1), 1{:}(n_2 - 1)),\; D(1{:}(n_1 - 1), 1{:}n_2),\; D(1{:}n_1, 1{:}(n_2 - 1)) \right] \tag{12}$$

$$D(1{:}1, 1{:}1) = w_{|1 - 1|}\, d(a_1, b_1) \tag{13}$$

$$w_i = \frac{w_{max}}{1 + \exp(-g\,(i - n_c))} \tag{14}$$
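A minimal sketch of Equations (12)-(14) is given below. Several choices are assumptions made for illustration: $n_c$ is taken as half the length of the longer series, weights are indexed by the phase difference $|i - j|$ starting from zero, the element distance is the absolute difference, and $w_{max}$ defaults to 1. The value of $g$ is left as a required argument and the function names are hypothetical.

```python
import math

def mlwf_weights(n, g, w_max=1.0):
    """Equation (14): logistic weights for phase differences 0 .. n-1 (n_c = n / 2 assumed)."""
    n_c = n / 2
    return [w_max / (1.0 + math.exp(-g * (i - n_c))) for i in range(n)]

def wdtw(a, b, g, w_max=1.0):
    """Equations (12)-(13): DTW whose local cost is scaled by the weight of |i - j|."""
    n1, n2 = len(a), len(b)
    w = mlwf_weights(max(n1, n2), g, w_max)
    D = [[math.inf] * (n2 + 1) for _ in range(n1 + 1)]
    D[0][0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            cost = w[abs(i - j)] * abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n1][n2]

weights = mlwf_weights(10, 0.5)
flat = wdtw([1, 2, 3], [1, 3, 5], g=0.0)
```

With $g = 0$ every weight equals $w_{max}/2$, so WDTW reduces to half the plain DTW distance (1.5 for the toy pair, whose DTW distance is 3). For $g > 0$ the weights grow with the phase difference, which is what penalizes alignments far from the diagonal.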
4) WDDTW:
The penalty for phase difference can be extended to variants of DTW, including DDTW. The weighted version of DDTW (WDDTW) was proposed in [21]. We use the same $g$ for WDDTW in all experiments.

C. Results and analysis
We validate advantages of SPD by selecting 5 distance-based algorithms with their corresponding SPD-embedded versions to measure pairwise distances on the CM, AR and IUM datasets, respectively. In Table I, all odd columns are results of existing distance-based algorithms and all even columns are those of corresponding SPD-embedded algorithms. The quantile of sorted distance distribution is insensitive when it is in range of [0 . , . in most scenarios. We set the quantile to be . in all experiments based on a priori knowledge of datasets.

TABLE I. Experimental results on all datasets measured by SI (SI in the range of [-1, 1]; CM: cortical mastoidectomy dataset; AR: activity recognition dataset; IUM: indoor user movement dataset). Columns: DTW, SDTW, CIDTW, SCIDTW, DDTW, SDDTW, WDTW, SWDTW, WDDTW, SWDDTW. Rows: CM; AR1 to AR5 and their overall result; IUM1 to IUM12 and their overall result.

On the CM dataset, all SPD-embedded algorithms perform better than corresponding distance-based ones, with overall improvement of in average measured by SI. DDTW, WDTW and WDDTW all perform poorly on the CM dataset. Although their corresponding SPD-embedded algorithms improve considerably on this poor performance, they are still worse than DTW. The poor performance of DDTW implies that high level features of shape extracted by the first derivative of time series impede the measurement of their pairwise distances in the CM dataset. The poor performance of WDTW and WDDTW implies that we should tolerate the phase difference between time series when measuring their pairwise distances in the CM dataset. SCIDTW performs the best on the CM dataset, with CIDTW the second and SDTW the third. SDTW does not beat CIDTW, but CIDTW is further improved by our proposed SPD, which makes SCIDTW the best algorithm on the CM dataset. SDTW improves the performance of DTW by and SCIDTW improves the performance of CIDTW by , respectively.

On the AR dataset, most SPD-embedded algorithms perform better than corresponding distance-based ones on sub-datasets, with overall improvement of in average measured by SI. DDTW performs slightly worse than DTW, while WDTW and WDDTW perform poorly on the AR dataset. Although their corresponding SPD-embedded algorithms improve considerably on this poor performance, they are still worse than DTW. The weaker performance of DDTW implies that it is not necessary to extract high level features of shape by the first derivative of time series when measuring their pairwise distances in the AR dataset. The poor performance of WDTW and WDDTW implies that we should also tolerate the phase difference between time series in the AR dataset. SCIDTW performs the best on the AR dataset, with SDTW the second. SDTW beats CIDTW on the AR dataset. Both SCIDTW and SDTW perform much better than all other algorithms. SDTW improves the performance of DTW by and SCIDTW improves the performance of CIDTW by , respectively.

On the IUM dataset, most SPD-embedded algorithms also perform better than corresponding distance-based ones on sub-datasets, with overall improvement of in average measured by SI. However, almost all distance-based algorithms perform badly on the IUM dataset. SPD-embedded algorithms are therefore needed to improve the performance of corresponding distance-based ones in measuring pairwise distances of time series in the IUM dataset.
SWDTW performs the best on the IUM dataset, with SCIDTW second and SDTW third. SDTW, SCIDTW, and SWDTW improve the performance of DTW, CIDTW, and WDTW by , , and , respectively. The top 3 algorithms perform very similarly, which is one main reason why we run experiments on randomly selected sub-datasets: when the overall results are close to each other, the distribution of best performance across these sub-datasets can be seen clearly.

In conclusion, the algorithms perform quite differently from each other on different datasets. SPD-embedded algorithms predominantly improve the performance of the corresponding distance-based algorithms on every dataset, even when the distance-based ones perform very badly. DTW is a widely used algorithm that is hard for its derivative algorithms (CIDTW, DDTW, WDTW and WDDTW) to beat. CIDTW performs the best among the 5 distance-based algorithms, which shows the importance of achieving complexity invariance when measuring distances between time series with large discontinuities. Moreover, none of these distance-based algorithms performs well universally; each performs well only in its specific scenarios. The poor performance of SWDDTW on all datasets implies that a more complicated algorithm does not guarantee good performance. It may not always be necessary to learn shape features of time series or to consider phase invariance when measuring their pairwise distances. A priori knowledge of the scenario is always needed in order to select suitable algorithms. In the scenario considered here, where all time series have large discontinuities, SCIDTW performs the best, followed by SDTW.

VI. CONCLUSIONS AND FUTURE WORK
This paper proposes a new algorithm, the segmented pairwise distance (SPD), to measure distances between time series with large discontinuities, which are common in many scenarios. SPD is orthogonal to distance-based algorithms and can be embedded in them. We validate the advantages of SPD-embedded algorithms over the corresponding distance-based ones on both open datasets and our collected cortical mastoidectomy (CM) dataset. We show the potential of distance measurement with SPD-embedded algorithms in more challenging scenarios. In the near future, we plan to (1) find an intelligent method to decide the segmentation threshold for SPD on different datasets and (2) consider the extension of SPD to surgical time series identification, human activity recognition and other challenging tasks.

ACKNOWLEDGMENT
We truly appreciate the great cooperation of the anonymous surgeons at the Royal Victorian Eye and Ear Hospital, who helped us perform and validate CM surgeries in the VRTBS simulator. This research was supported by the Melbourne Research Scholarship.

REFERENCES

[1] P. Cunningham and S. J. Delany, “k-nearest neighbour classifiers,” Multiple Classifier Systems, vol. 34, no. 8, pp. 1–17, 2007.
[2] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, no. 34, 1996, pp. 226–231.
[3] D. Arthur and S. Vassilvitskii, “k-means++: The advantages of careful seeding,” in Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007, pp. 1027–1035.
[4] Z. Wang, R. Zhang, J. Qi, and B. Yuan, “Dbsvec: Density-based clustering using support vector expansion,” in ICDE. IEEE, 2019, pp. 280–291.
[5] A. Mueen and E. Keogh, “Extracting optimal performance from dynamic time warping,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 2129–2130.
[6] K. Barczewska and A. Drozd, “Comparison of methods for hand gesture recognition based on dynamic time warping algorithm,” in . IEEE, 2013, pp. 207–210.
[7] G. Plouffe and A.-M. Cretu, “Static and dynamic hand gesture recognition in depth data using dynamic time warping,” IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 2, pp. 305–316, 2016.
[8] V. Wegner Maus, G. Câmara, M. Appel, and E. Pebesma, “dtwsat: Time-weighted dynamic time warping for satellite image time series analysis in R,” Journal of Statistical Software, vol. 88, no. 5, pp. 1–31, 2019.
[9] S. Atev, G. Miller, and N. P. Papanikolopoulos, “Clustering of vehicle trajectories,” IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 3, pp. 647–657, 2010.
[10] R. Varatharajan, G. Manogaran, M. Priyan, and R. Sundarasekar, “Wearable sensor devices for early detection of Alzheimer disease using dynamic time warping algorithm,” Cluster Computing, pp. 1–10, 2017.
[11] C. Levy-leduc and Z. Harchaoui, “Catching change-points with lasso,” in Advances in Neural Information Processing Systems, 2008, pp. 617–624.
[12] M. Lovrić, M. Milanović, and M. Stamenković, “Algoritmic methods for segmentation of time series: An overview,” Journal of Contemporary Economic and Business Issues, vol. 1, no. 1, pp. 31–53, 2014.
[13] J. Wu, Y. Wang, P. Wang, J. Pei, and W. Wang, “Finding maximal significant linear representation between long time series,” in . IEEE, 2018, pp. 1320–1325.
[14] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “Segmenting time series: A survey and novel approach,” in Data mining in time series databases. World Scientific, 2004, pp. 1–21.
[15] Z. Han, H. Chen, T. Yan, and G. Jiang, “Time series segmentation to discover behavior switching in complex physical systems,” in . IEEE, 2015, pp. 161–170.
[16] D. Graves and W. Pedrycz, “Multivariate segmentation of time series with differential evolution,” in IFSA/EUSFLAT Conf. Citeseer, 2009, pp. 1108–1113.
[17] S. Gharghabi, Y. Ding, C.-C. M. Yeh, K. Kamgar, L. Ulanova, and E. Keogh, “Matrix profile viii: domain agnostic online semantic segmentation at superhuman performance levels,” in . IEEE, 2017, pp. 117–126.
[18] S. Gharghabi, S. Imani, A. Bagnall, A. Darvishzadeh, and E. Keogh, “An ultra-fast time series distance measure to allow data mining in more complex real-world deployments,” in , 2018.
[19] G. E. Batista, X. Wang, and E. J. Keogh, “A complexity-invariant distance measure for time series,” in Proceedings of the 2011 SIAM international conference on data mining. SIAM, 2011, pp. 699–710.
[20] E. J. Keogh and M. J. Pazzani, “Derivative dynamic time warping,” in Proceedings of the 2001 SIAM international conference on data mining. SIAM, 2001, pp. 1–11.
[21] Y.-S. Jeong, M. K. Jeong, and O. A. Omitaomu, “Weighted dynamic time warping for time series classification,” Pattern Recognition, vol. 44, no. 9, pp. 2231–2240, 2011.
[22] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh, “The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances,” Data Mining and Knowledge Discovery, vol. 31, no. 3, pp. 606–660, 2017.
[23] R. Ma, A. Ahmadzadeh, S. F. Boubrahimi, and R. A. Angryk, “Segmentation of time series in improving dynamic time warping,” in . IEEE, 2018, pp. 3756–3761.
[24] Y. Mao, H. Zhong, X. Xiao, and X. Li, “A segment-based trajectory similarity measure in the urban transportation systems,” Sensors, vol. 17, no. 3, p. 524, 2017.
[25] I. Laptev and T. Lindeberg, “Local descriptors for spatio-temporal recognition,” in International Workshop on Spatial Coherence for Visual Motion Analysis. Springer, 2004, pp. 91–103.
[26] E. Shechtman and M. Irani, “Matching local self-similarities across images and videos,” in CVPR, vol. 2. Minneapolis, MN, 2007, p. 3.
[27] C.-C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. A. Dau, D. F. Silva, A. Mueen, and E. Keogh, “Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets,” in . IEEE, 2016, pp. 1317–1322.
[28] X. Ma, S. Wijewickrema, S. Zhou, Y. Zhou, Z. Mhammedi, S. O’Leary, and J. Bailey, “Adversarial generation of real-time feedback with neural networks for simulation-based training,” arXiv preprint arXiv:1703.01460, 2017.
[29] P. A. Yushkevich, J. Piven, H. Cody Hazlett, R. Gimpel Smith, S. Ho, J. C. Gee, and G. Gerig, “User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability,” Neuroimage, vol. 31, no. 3, pp. 1116–1128, 2006.
[30] C. W. Tan, F. Petitjean, E. Keogh, and G. I. Webb, “Time series classification for varying length series,” arXiv preprint arXiv:1910.04341, 2019.
[31] P. Casale, O. Pujol, and P. Radeva, “Personalization and user verification in wearable systems using biometric walking patterns,” Personal and Ubiquitous Computing, vol. 16, no. 5, pp. 563–580, 2012.
[32] D. Bacciu, P. Barsocchi, S. Chessa, C. Gallicchio, and A. Micheli, “An experimental characterization of reservoir computing in ambient assisted living applications,” Neural Computing and Applications, vol. 24, no. 6, pp. 1451–1464, 2014.
[33] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,”