A Formally Robust Time Series Distance Metric
Maximilian Toller
Know-Center GmbH, Graz, Austria
[email protected]
Bernhard C. Geiger
Know-Center GmbH, Graz, Austria
[email protected], [email protected]
Roman Kern
Graz University of Technology, Graz, Austria
[email protected]
ABSTRACT
Distance-based classification is among the most competitive classification methods for time series data. The most critical component of distance-based classification is the selected distance function. Past research has proposed various distance metrics or measures dedicated to particular aspects of real-world time series data, yet there is an important aspect that has not been considered so far:
robustness against arbitrary data contamination. In this work, we propose a novel distance metric that is robust against arbitrarily "bad" contamination and has a worst-case computational complexity of O(n log n). We formally argue why our proposed metric is robust, and demonstrate in an empirical evaluation that the metric yields competitive classification accuracy when applied in k-Nearest Neighbor time series classification.

CCS CONCEPTS
• Mathematics of computing → Time series analysis; • Computing methodologies → Classification and regression trees; • Information systems → Clustering.
KEYWORDS
time series, distance metric, robustness, classification, clustering
ACM Reference Format:
Maximilian Toller, Bernhard C. Geiger, and Roman Kern. 2019. A Formally Robust Time Series Distance Metric. In MileTS '19: 5th KDD Workshop on Mining and Learning from Time Series, August 5th, 2019, Anchorage, Alaska, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/1122445.1122456
1 INTRODUCTION

Time series data classification is an important task in many domains such as data mining, machine learning and econometrics. Extensive past evaluations [6] have shown that k-Nearest Neighbor (k-NN) classification is among the most competitive classification approaches for time series data. In simple terms, k-NN classification assigns a query time series instance the class based on its k nearest neighbors in a labeled training set. As such, the k-NN classifier is a distance-based classifier, since its distance function is the only
component that discriminates between classes. The same applies to distance-based clustering algorithms [15, 21].

The data mining and machine learning communities have proposed numerous different distance functions for improving classification and clustering accuracies on benchmark datasets [1, 6] and for accelerating the practical computation [14, 16, 17, 20, 23]. However, there is an important aspect of distance measures that, to the best of our knowledge, has not been considered so far:
robustness against arbitrary data contamination. While previous research has proposed distance measures that are "robust" against additive white Gaussian noise [13] or against temporal misalignment [22], we follow the definition used in the field of robust statistics. A crucial tool for determining the robustness of a distance function is breakdown point (BP) analysis [11]. The asymptotic BP describes the amount of contamination in the data that an estimator (in this case a distance function) can tolerate before it is, in the worst case, fully biased by the contamination. For example, the Euclidean distance has an asymptotic BP of zero: if a single observation in one of the time series instances it compares is contaminated to (plus or minus) infinity, then the Euclidean distance becomes infinite as well, regardless of the remaining observations.

To address this issue, one may propose to use the raw Edit distance [18], as it is robust against arbitrary contamination at a few observations. However, the Edit distance is susceptible to a different type of contamination that is routinely overlooked, as it is trivially handled by most distance measures: if the time series data are subject to a tiny contamination at every single data point, then the Edit distance becomes very large. One may be tempted to address this issue by defining a small tolerance interval, as suggested by Chen et al. [3], yet this is difficult if the variance of the data is large or time-dependent, which is a well-known behavior of many econometric time series [8, 19]. Further, the time series classification accuracy of the raw Edit distance is poor on real time series data. If one extends the Edit distance to an elastic (non-lockstep) variant thereof, such as Edit distance with real penalty [2], then the classification accuracy may increase, yet the asymptotic BP immediately drops to 0.

Since all existing distance measures either have a low asymptotic BP or else yield a low classification accuracy, we aim to fill this gap. In this work, we propose a novel distance metric which is formally robust according to Huber's definition [11] against a small percentage of contaminated observations, and robust against tiny deviations at many observations. Additionally, we show that its classification accuracy is not significantly different from that of other distance metrics and that our metric has a worst-case computational complexity of O(n log n). The source code of our implementation and a script that reproduces all results can be found online: https://github.com/mtoller/robust-distance-metric

2 DEFINITIONS

Let x = {x_t, t ∈ {1, ..., n}} and y = {y_t, t ∈ {1, ..., n}} be two time series instances and d : ℝ^n × ℝ^n → ℝ a distance function for comparing them. For efficient distance-based classification, it is advantageous if d(·) is a metric, as this allows a variety of run-time acceleration techniques [9, 10]. To be a metric, d(·) has to fulfill the following properties for all x, y, z ∈ ℝ^n:

• d(x, y) ≥ 0 (Non-negativity)
• d(x, y) = 0 ⇔ x = y (Identity of Indiscernibles)
• d(x, y) = d(y, x) (Symmetry)
• d(x, z) ≤ d(x, y) + d(y, z) (Triangle Inequality)
A typical example of a distance metric is the Euclidean distance

e(x, y) = √( Σ_{t=1}^{n} (x_t − y_t)² ).   (1)

If a distance function d(·) fulfills all properties except the identity of indiscernibles, then it is called a pseudometric.

To evaluate the robustness of a distance function, we adapt the definition of the breakdown point given in [11]. Specifically, let d_sup = sup_{x,y ∈ ℝ^n} d(x, y) be the largest possible value the distance function can theoretically obtain. Then, the breakdown point β*_d(n) is given by

β*_d(n) = min{ k/n | sup d(x, x + K) = d_sup },   (2)

where the supremum is over all x ∈ ℝ^n and over all contamination processes K = {K_t, t ∈ {1, ..., n}} that assume arbitrary non-zero values in at most k positions and are zero otherwise. In simple terms, the breakdown point describes the highest percentage of contaminated observations that the function d(·) can tolerate. For example, it is evident that for the Euclidean distance contaminating a single time point suffices: if K_1 = ∞ and K_t = 0 for t = 2, ..., n, then e(x, x + K) = d_sup = ∞ for every x ∈ ℝ^n; thus, β*_e(n) = 1/n. For clarity, the asymptotic BP is obtained by evaluating β*_d(n) as n tends to infinity.

To link the theoretical concept of breakdown points with practical classification, we formulate two classification-specific notions of robustness. To this end, let C = {C_1, ..., C_r} denote a set of time series classes and d(·) a candidate distance function.

Definition 2.1 (Contamination Tolerance). A distance function d(·) tolerates k̂ contaminated observations w.r.t. C if

∀ i, j ∈ {1, ..., r} with j ≠ i : ∀ x ∈ C_i : ∀ y ∈ C_j : d(x, x + K) < d(x, y)   (3)

holds for every contamination process K = {K_t, t ∈ {1, ..., n}} that assumes arbitrary non-zero values in at most k̂ positions.

Intuitively, assume that a distance function d(·) ideally separates class C_i from the other classes C_j, j ≠ i. Function d(·) tolerates up to k̂ contaminated observations if the distance between an uncontaminated time series instance x and a contaminated variant x + K thereof is smaller than the distance between x and an instance y from a different class.
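For concreteness, a minimal sketch (ours, not part of the paper's evaluation) of how a single contaminated observation affects the Euclidean distance versus the raw Edit distance from Section 1 (formalized in Equation (6) in Section 3); the large constant stands in for ±∞:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
contaminated = x.copy()
contaminated[0] += 1e12  # one observation contaminated "to infinity"

euclidean = np.sqrt(np.sum((x - contaminated) ** 2))
raw_edit = np.sum(x != contaminated)

print(euclidean)  # ~1e12: the single bad point dominates (beta*_e(n) = 1/n)
print(raw_edit)   # 1: only one position differs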
To specify imprecision invariance as mentioned in Section 1, i.e., invariance to tiny changes, we introduce an imprecision process {ε_t} that is negligibly small at all t. Specifically, we assume that, for all t, |ε_t| ≤ ε_max, where ε_max is much smaller than the standard deviation (or some norm) of the time series x.

Definition 2.2 (Imprecision Invariance). A distance function d(·) is invariant to an imprecision of ε_max w.r.t. C if

∀ i, j ∈ {1, ..., r} with j ≠ i : ∀ x ∈ C_i : ∀ y ∈ C_j : d(x, x + ε) < d(x, y)   (4)

holds for every imprecision process ε = {ε_t, t ∈ {1, ..., n}} that satisfies |ε_t| ≤ ε_max for every t.

In other words, assume distance function d(·) perfectly discriminates class C_i from C_j, j ≠ i. Function d(·) is invariant to an imprecision of ε_max if the distance between an instance x and the almost identical instance x + ε is smaller than the distance between x and an instance y from another class.

Contamination tolerance and imprecision invariance are very different properties, and there are not many metrics that fulfill both simultaneously: for example, no metric induced by an L_p norm with a finite p ≥ 1 tolerates even a single arbitrarily contaminated observation (k̂ = 1). Also, while many popular metrics are imprecision invariant, some metrics such as the Edit distance are susceptible to imprecision.

3 A ROBUST ENSEMBLE METRIC

In this section, we present a novel metric which can tolerate considerable contamination and is invariant to imprecision. The metric is obtained by aggregating an ensemble of metrics and pseudometrics in a way that preserves their discriminatory power while guaranteeing robust results.
The ensemble consists of three metrics and three pseudometrics. The distances measured by these members are combined via a scaling function and an arbitrary L_p norm, with p ≥ 1, to obtain the metric E(x, y). A summary of the ensemble members can be seen in Table 1.

Table 1: The components of the proposed ensemble metric E. The top three members are metrics, while the bottom three are pseudometrics.

Member name                  Definition
Euclidean distance           e(x, y)
Log-distance                 ℓ(x, y)
Raw Edit distance            Edit(x, y)
Robust Euclidean distance    e(→m(x), →m(y))
Robust Log-distance          ℓ(→m(x), →m(y))
Robust Raw Edit distance     Edit(→m(x), →m(y))

Definition 3.1 (Log-distance). Let x, y ∈ ℝ^n be two real-valued n-dimensional observations. Then, the Log-distance ℓ(·) between x and y is given by

ℓ(x, y) = Σ_{t=1}^{n} log(1 + |x_t − y_t|).   (5)
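Equation (5) translates directly into code; a minimal sketch (ours, not the released implementation linked in Section 1):

import numpy as np

def log_distance(x, y):
    # Log-distance of Equation (5): sum over t of log(1 + |x_t - y_t|).
    # log1p(z) computes log(1 + z) and is numerically accurate for tiny z.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.log1p(np.abs(x - y))))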
Proposition 3.2. The Log-distance ℓ(·) is a metric.

Proof. Since log(x) : ℝ_+ → ℝ is a strictly monotonic subadditive function, log(1 + x) is also a strictly monotonic subadditive function, and it is zero iff x = 0. Consequently, log(1 + |x − y|) also fulfills these properties and is a metric by Kelley's theorem [12, p. 131]. That the sum of metrics is a metric [7] completes the proof. □
For 1-dimensional data, the Log-distance is asymptotically smaller than any L_p metric with p ≥ 1, since the logarithm grows slower than an arbitrary polynomial, i.e., lim_{z→∞} log(z)/P(z) = 0 for every polynomial P(z), z ∈ ℝ. This property is beneficial when one expects a small number of large outliers in time series data. L_p metrics such as the Euclidean distance are influenced much more by a single large difference than by several small deviations that sum up to the same value; due to the subadditivity of the logarithm, the Log-distance weights several small changes higher than one large change.

The remaining two metrics of the ensemble are the Euclidean distance e(·) as defined in Equation (1) and the raw Edit distance

Edit(x, y) = Σ_{t=1}^{n} φ_t,  where φ_t = 0 if x_t = y_t and φ_t = 1 if x_t ≠ y_t,   (6)

which is equivalent to the number of observations where x_t and y_t differ. While the Edit distance tolerates up to n − 1 contaminated observations but is sensitive to imprecision, e(·) is invariant to imprecision but sensitive to contamination. The Log-distance ℓ(·) aims to present a middle ground between the two: compared to the Euclidean distance it is "more" sensitive to imprecision and "less" sensitive to contamination, while the inverse holds when it is compared against the Edit distance. However, in terms of robustness, the Euclidean distance and the Log-distance are asymptotically equivalent, since they have the same BP β*_e(n) = β*_ℓ(n) = 1/n. Hence, when confronted with arbitrary contamination, both metrics become equally useless in the worst case.

To raise the BP of the metrics in the ensemble E, one can introduce a function composition with a function that has a high BP while preserving metric properties. Let m(x) be the median of x. As a measure of central tendency, the median has a BP of β*_m(n) = 1/2 + 1/(2n) according to Huber's definition [11]. However, computing the median of time series data is meaningless on its own, since it disregards the temporal structure by treating the series like an unordered data set. To exploit the asymptotic robustness of the median in the context of time series, one can instead apply the median via a sliding window:
Definition 3.3. Let x be a time series instance and let w, an odd integer in [n], be the size of a sliding window. The sliding median →m : ℝ^n → ℝ^{n−w+1} of x is then defined as

→m(x) = {m(x_1, ..., x_w), m(x_2, ..., x_{w+1}), ..., m(x_{n−w+1}, ..., x_n)}.   (7)

If one computes the Euclidean, Log- and Edit distance of →m, then the result is no longer a metric: the identity of indiscernibles is violated since the median is not an injective function. However, the remaining metric properties are preserved:

Proposition 3.4. Let d(·) be a metric. Then the sliding median distance M_d(x, y) = d(→m(x), →m(y)) is a pseudometric.

Proof. Non-negativity, symmetry and the triangle inequality follow trivially from the application of d(·). □

The sliding median distance M_d(·) has a BP of (w + 1)/(2n), since, if all contamination occurred at (w + 1)/2 consecutive observations, the median of every window containing these observations could be contaminated to an arbitrary value.
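Equation (7) admits a direct implementation; a minimal sketch (ours) that recomputes each window's median from scratch. An asymptotically faster incremental variant is discussed in Appendix A.

import numpy as np

def sliding_median(x, w):
    # Sliding median of Equation (7); w should be odd (Algorithm 1 in
    # Appendix A increments an even w by one in the same way).
    x = np.asarray(x, dtype=float)
    if w % 2 == 0:
        w += 1
    return np.array([np.median(x[i:i + w]) for i in range(len(x) - w + 1)])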
Since all six ensemble members operate on different scales, it is desirable to convert them to a common scale without loss of generality. Therefore, we propose the following scaling function S(·), which is applied after distance computation and preserves all metric properties:

Definition 3.5. Let d(·) be an arbitrary distance function. The metric-preserving scaling S : ℝ_+ → [0, 1] of this metric is then defined as

S(d(x, y)) = 1 − 1/(1 + d(x, y)).   (8)

Lemma 3.6. Metrics are closed under scaling with S(·).

Proof. S(·) is a concave, monotonically increasing function with S(d(x, y)) = 0 ⇔ d(x, y) = 0. Hence, the scaled function is a metric by Kelley's theorem [12]. □
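Lemma 3.6 is also easy to check numerically; a randomized spot check (ours) of the scaled Euclidean distance on random data:

import numpy as np

def scale(d):
    # Metric-preserving scaling of Equation (8).
    return 1.0 - 1.0 / (1.0 + d)

rng = np.random.default_rng(1)
for _ in range(10_000):
    x, y, z = rng.normal(size=(3, 50))
    e_xy, e_yz, e_xz = (np.linalg.norm(a - b) for a, b in ((x, y), (y, z), (x, z)))
    # The triangle inequality must survive the scaling (Lemma 3.6).
    assert scale(e_xz) <= scale(e_xy) + scale(e_yz) + 1e-12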
E(·) via an arbitrary L_p norm with p ≥ 1; specifically, we suggest the L_2 norm. The resulting function is a metric, since the sum of a pseudometric and a metric is a metric. In particular, we propose the following ensemble:

E(x, y) := √( S(e(x, y))² + S(ℓ(x, y))² + S(Edit(x, y))² + S(e(→m(x), →m(y)))² + S(ℓ(→m(x), →m(y)))² + S(Edit(→m(x), →m(y)))² ).   (9)

The ensemble E : ℝ^n × ℝ^n → [0, √6] has a BP of β*_E = 1. This follows from the fact that the measurements of the non-robust metrics e(·) and ℓ(·) are mapped onto the interval [0, √2] and thus their influence on the ensemble is restricted. The remaining members have a BP of (w + 1)/(2n) or higher, and the inclusion of the Edit distance raises the total BP to 1.
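Putting the pieces together, our illustrative reading of Equation (9) as code; log_distance, sliding_median and scale refer to the sketches above, and this is not the authors' released implementation:

import numpy as np

def raw_edit(x, y):
    # Raw Edit distance of Equation (6): number of differing positions.
    return float(np.sum(np.asarray(x) != np.asarray(y)))

def ensemble_distance(x, y, w):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = sliding_median(x, w), sliding_median(y, w)
    members = [
        np.linalg.norm(x - y),    # e(x, y)
        log_distance(x, y),       # l(x, y)
        raw_edit(x, y),           # Edit(x, y)
        np.linalg.norm(mx - my),  # robust (sliding-median) variants
        log_distance(mx, my),
        raw_edit(mx, my),
    ]
    # Scale each member with Equation (8) and combine via the L2 norm.
    return float(np.sqrt(sum(scale(d) ** 2 for d in members)))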
The ensemble has a worst-case computational complexity of O(n log n) under the assumption that w = O(n). This arises from the ensemble's most expensive step, the computation of the sliding median →m. A more detailed explanation can be found in Appendix A.

4 EVALUATION

In this section, we describe the experiments we conducted to show that the proposed ensemble metric E has competitive classification accuracy. Further, we validate that E tolerates contamination and is imprecision invariant. We compared E with the Euclidean distance (Euc), Dynamic Time Warping [22] (DTW) with a constrained window size, the Log-distance (Log), the raw Edit distance (ED), and Edit Distance on Real sequences (EDR) [3].

4.1 Experimental Setup

In our practical evaluation, we conducted three experiments to assess the following properties of the ensemble E:
• classification accuracy,
• contamination tolerance (cf. Equation (3)),
• imprecision invariance (cf. Equation (4)).

For all three experiments we used 83 selected benchmark datasets from the UCR Time Series Classification Archive [4]. All datasets which contained non-real data such as missing values were omitted, which was necessary since otherwise the behavior of the ensemble would be undefined. Further, we were forced to omit all datasets in which either the training or the test set contained more than 1000 instances due to our limited computational resources.

For the classification accuracy experiment we computed the raw accuracy of a 1-NN classifier based on the ensemble metric and subtracted the resulting value from 1 to obtain the error rate. To determine statistical significance, we then performed a Friedman rank test [5].

The dataset-dependent contamination tolerance was computed with the following procedure (a code sketch follows Table 2):
i) Assume d(·) perfectly separates all classes in the dataset.
ii) For every instance x ∈ C_i, count for how many y ∈ C_j, j ≠ i, Equation (3) holds when k̂ = 0.05 × n, i.e., 5% of the observations are contaminated to ±∞.
iii) Compute the ratio of this count and the number of instances in C_j, j ≠ i.
iv) Compute the mean over all instance-based ratios.

The window size of the ensemble E was set just above the contaminated fraction of observations. For the imprecision invariance experiment, k̂ was set to n and each ε_t was drawn uniformly from a tiny symmetric interval around zero. Theoretically, ε_t should be as close to zero as possible, yet the results below suggest that this interval is sufficiently small for asserting the imprecision invariance property of the distance functions under consideration.

4.2 Results

In this subsection we present a summary of the results of the three experiments we conducted. The complete results can be found in Appendix C.

Our first experiment showed that, in terms of classification error rate, there is no significant difference between the ensemble E and Euc, DTW or Log, but that ED and EDR are significantly worse than E. A visual representation of this result is depicted in Figure 1. The second and third experiments revealed that the only distance functions which are both contamination tolerant and imprecision invariant are the ensemble E and EDR. An overview of these results can be found in Table 2.

Figure 1: Critical distance plot for the classification error rate. Average ranks are depicted in order, where lower is better and the horizontal bars highlight no significant difference.
The ensemble E is not significantly different from either Euc, DTW or Log, but significantly better than ED and EDR. Functions labeled with an asterisk (*) are not metrics.

Table 2: Summary of the second and third experiments. The ensemble E and EDR tolerate contamination on 56 and 79 data sets, respectively, while the other distance measures never tolerate contamination. In terms of imprecision invariance, the first four distances are perfectly invariant, while ED is susceptible and EDR is almost perfectly invariant.

                               E    Euc  DTW  Log  ED   EDR
Is a metric?                   ✓    ✓    ×    ✓    ✓    ×
Contam. tol. on (data sets)    56   0    0    0    0    79
Imprecision invariant?         ✓    ✓    ✓    ✓    ×    almost
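Our reading of steps i)–iv) from Section 4.1 as code; dist is any distance function with signature dist(x, y), classes is a list of lists of instances, and the large constant stands in for the ±∞ contamination (all names are ours):

import numpy as np

def contamination_tolerance(classes, dist, frac=0.05, big=1e12, seed=0):
    # Mean fraction of out-of-class instances y with dist(x, x + K) < dist(x, y),
    # where K contaminates frac of the observations (steps i-iv, Section 4.1).
    rng = np.random.default_rng(seed)
    ratios = []
    for i, c_i in enumerate(classes):
        others = [np.asarray(y, dtype=float)
                  for j, c_j in enumerate(classes) if j != i for y in c_j]
        for x in c_i:
            x = np.asarray(x, dtype=float)
            k = np.zeros(len(x))
            pos = rng.choice(len(x), size=max(1, int(frac * len(x))), replace=False)
            k[pos] = big * rng.choice([-1.0, 1.0], size=len(pos))
            d_contam = dist(x, x + k)
            ratios.append(np.mean([d_contam < dist(x, y) for y in others]))
    return float(np.mean(ratios))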
5 CONCLUSION

The goal of this work was to propose a distance function that
• is robust against arbitrary contamination,
• is invariant to imprecision,
• fulfills all metric properties,
• has a competitive classification accuracy, and
• is computationally efficient.

The combined results of our theoretical analysis and of the practical evaluation suggest that the ensemble E has all of these properties. One might argue that the ensemble E is no improvement over EDR, since both methods depend on one parameter which influences their classification accuracy. However, correctly choosing an appropriate tolerance interval for time series with large or time-dependent variance is difficult, while tuning the ensemble's window size is simpler: it should be just larger than the expected amount of contamination. Additionally, E has a significantly better classification accuracy than EDR. The same holds for the Log-distance, which we believe should be seen as a natural alternative to the Euclidean distance.

Future work might consider less extreme cases of contamination and determine precisely how they affect classification accuracy. Further, evaluating robust distance functions on a clustering task with arbitrarily contaminated data seems a promising avenue for the future.
ACKNOWLEDGMENTS

Our work was funded by the iDev40 project. The iDev40 project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 783163. The JU receives support from the European Union's Horizon 2020 research and innovation programme. It is co-funded by the consortium members, and by grants from Austria, Germany, Belgium, Italy, Spain and Romania.
REFERENCES
[1] Amaia Abanda, Usue Mori, and Jose A Lozano. 2019. A review on distance based time series classification. Data Mining and Knowledge Discovery 33, 2 (2019), 378–412.
[2] Lei Chen and Raymond Ng. 2004. On the marriage of Lp-norms and edit distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30. VLDB Endowment, 792–803.
[3] Lei Chen, M Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, 491–502.
[4] Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping Chen, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. 2018. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
[5] Janez Demšar. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, Jan (2006), 1–30.
[6] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment 1, 2 (2008), 1542–1552.
[7] Jozef Doboš. 1998. Metric Preserving Functions. Štroffek, Košice.
[8] Robert F Engle. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society (1982), 987–1007.
[9] Christos Faloutsos, Mudumbai Ranganathan, and Yannis Manolopoulos. 1994. Fast Subsequence Matching in Time-Series Databases. Vol. 23. ACM.
[10] Gisli R Hjaltason and Hanan Samet. 2003. Index-driven similarity search in metric spaces (survey article). ACM Transactions on Database Systems (TODS) 28, 4 (2003), 517–580.
[11] Peter J Huber. 2011. Robust statistics. In International Encyclopedia of Statistical Science. Springer, 1248–1251.
[12] John L Kelley. 2017. General Topology. Courier Dover Publications.
[13] Eamonn Keogh. 1997. A fast and robust method for pattern matching in time series databases. Proceedings of WUSS 97, 1 (1997), 99.
[14] Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, and Sharad Mehrotra. 2001. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems 3, 3 (2001), 263–286.
[15] Eamonn Keogh and Shruti Kasetty. 2003. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge Discovery 7, 4 (2003), 349–371.
[16] Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 285–289.
[17] Eamonn J Keogh and Padhraic Smyth. 1997. A probabilistic approach to fast pattern matching in time series databases. In KDD, Vol. 1997. 24–30.
[18] Karen Kukich. 1992. Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR) 24, 4 (1992), 377–439.
[19] Benoit Mandelbrot. 1967. The variation of some other speculative prices. The Journal of Business 40, 4 (1967), 393–413.
[20] Abdullah Mueen and Eamonn Keogh. 2016. Extracting optimal performance from dynamic time warping. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2129–2130.
[21] Sangeeta Rani and Geeta Sikka. 2012. Recent techniques of clustering of time series data: a survey. International Journal of Computer Applications 52, 15 (2012).
[22] Hiroaki Sakoe, Seibi Chiba, A Waibel, and KF Lee. 1990. Dynamic programming algorithm optimization for spoken word recognition. Readings in Speech Recognition 159 (1990), 224.
[23] Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, and Chotirat Ann Ratanamahatana. 2006. Fast time series classification using numerosity reduction. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 1033–1040.
A DETAILED TIME COMPLEXITY ANALYSIS
A procedural description of the ensemble E is listed in Algorithm 1. Since there are no directly listed loops, the ensemble is linear in the complexity of the functions it applies. Hence, its worst-case time complexity is equal to that of the function that requires the most expensive computation. When looking at the distance functions, it quickly becomes evident that these can be computed in O(n), since these lockstep methods look at each observation only once. The scaling function can be computed in O(1), so the only non-trivial step is the computation of the sliding median.

Algorithm 1 Ensemble Metric E
Require: x, y, w
  if w ≡ 0 (mod 2) then
    w ← w + 1
  end if
  med_x ← →m(x)
  med_y ← →m(y)
  dist_1 ← S(e(x, y))
  dist_2 ← S(ℓ(x, y))
  dist_3 ← S(Edit(x, y))
  dist_4 ← S(e(med_x, med_y))
  dist_5 ← S(ℓ(med_x, med_y))
  dist_6 ← S(Edit(med_x, med_y))
  return √(Σ_i dist_i²)

Directly computing the sliding median requires one to sort the observations in all windows. Since the most efficient comparison-based sorting algorithms require O(w log w) steps, sorting all n − w + 1 windows requires (n − w + 1) × O(w log w) = O(n w log w) steps. We assume that a certain small percentage of the n observations is contaminated, so we must conclude that w = O(n), which would result in a total complexity of O(n² log n) for this naive approach.

However, there exist more efficient algorithms for computing the sliding median, and several implementations are available in C libraries. If one starts by sorting the initial window, one consumes O(n log n) time as argued above. After this, if one keeps the sorted window and an index list in memory, the next window can be computed by removing the oldest observation in O(1) time and sorting in the new observation in O(log n) time. Over all windows, this algorithm requires O(n log n) + (n − w) × O(log n) = O(n log n) steps.
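A sketch (ours) of the incremental bookkeeping described above. Note that with a plain Python list, removal and insertion each shift O(w) elements, so this illustrates the scheme rather than attaining the O(n log n) bound; a balanced tree or indexable skip list would give the stated O(log n) per-step cost.

import bisect

def sliding_median_incremental(x, w):
    # Sort the first window once, then per step drop the oldest value and
    # insert the newest, keeping the window sorted (w assumed odd).
    window = sorted(x[:w])
    mid = w // 2
    medians = [window[mid]]
    for i in range(w, len(x)):
        window.pop(bisect.bisect_left(window, x[i - w]))  # remove oldest
        bisect.insort(window, x[i])                       # insert newest
        medians.append(window[mid])
    return medians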
B PROOFS

This section contains alternative proofs of the propositions and lemmas presented in the main article. These proofs do not rely on Kelley's theorem and are easy to verify.
B.1 Proof of Proposition 3.2
Proof. We show that each summand log(1 + |x − y|) of Equation (5) satisfies the metric properties for scalars x, y, z ∈ ℝ; summing over t then yields the claim, since the sum of metrics is a metric.

Non-negativity: ℓ(x, y) ≥ 0, since log(1 + |x − y|) ≥ log(1) = 0.

Identity of Indiscernibles: ℓ(x, y) = 0 ⇔ x = y. This is trivial, since z ↦ log(1 + z) has exactly one zero, at z = 0: log(1 + |x − x|) = log(1) = 0.

Symmetry: Trivial, due to the absolute value.

Triangle Inequality: ℓ(x, z) ≤ ℓ(x, y) + ℓ(y, z):

log(1 + |x − z|) ≤ log(1 + |x − y|) + log(1 + |y − z|)
⇔ log(1 + |x − z|) ≤ log((1 + |x − y|)(1 + |y − z|))
⇔ 1 + |x − z| ≤ (1 + |x − y|)(1 + |y − z|)
⇔ |x − z| ≤ |x − y| + |y − z| + |x − y||y − z|,

which holds since |x − z| ≤ |x − y| + |y − z|. □

B.2 Proof of Proposition 3.4
Proof.

Non-negativity: M_d(x, y) ≥ 0 follows from d(·) ≥ 0.

Symmetry: M_d(x, y) = M_d(y, x) follows trivially from the symmetry of d(·).

Triangle Inequality: M_d(x, z) ≤ M_d(x, y) + M_d(y, z). With a := →m(x), b := →m(y), c := →m(z), the claim d(→m(x), →m(z)) ≤ d(→m(x), →m(y)) + d(→m(y), →m(z)) reads d(a, c) ≤ d(a, b) + d(b, c), which is the triangle inequality of d(·). □

B.3 Proof of Lemma 3.6
Proof. Write S(x, y) for S(d(x, y)).

Non-negativity: S(x, y) ≥ 0:
1 − 1/(1 + d(x, y)) ≥ 0 ⇔ 1/(1 + d(x, y)) ≤ 1 ⇔ 1 + d(x, y) ≥ 1 ⇔ d(x, y) ≥ 0.

Identity of Indiscernibles: S(x, y) = 0 ⇔ x = y. First we show S(x, y) = 0 ⇒ x = y:
1 − 1/(1 + d(x, y)) = 0 ⇒ 1/(1 + d(x, y)) = 1 ⇒ d(x, y) = 0 ⇒ x = y.
Conversely, x = y implies S(x, x) = 1 − 1/(1 + d(x, x)) = 1 − 1 = 0.

Symmetry: S(x, y) = S(y, x). This is trivial, since d(x, y) is symmetric.

Triangle Inequality: S(x, z) ≤ S(x, y) + S(y, z):

1 − 1/(1 + d(x, z)) ≤ 1 − 1/(1 + d(x, y)) + 1 − 1/(1 + d(y, z))
⇔ 1/(1 + d(x, y)) + 1/(1 + d(y, z)) − 1/(1 + d(x, z)) ≤ 1.

Multiplying both sides by (1 + d(x, y))(1 + d(y, z))(1 + d(x, z)) and simplifying yields

d(x, z) − d(x, y) − d(y, z) ≤ 2 d(x, y) d(y, z) + d(x, y) d(y, z) d(x, z),

which holds because the left-hand side is at most 0 by the triangle inequality of d(·), while the right-hand side is non-negative. □
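The same propositions can be spot-checked empirically; a randomized test (ours) that reuses the sketches from Section 3:

import numpy as np

def m_d(a, b, w=7):
    # Scaled sliding-median Euclidean distance (Proposition 3.4, Lemma 3.6).
    return scale(np.linalg.norm(sliding_median(a, w) - sliding_median(b, w)))

rng = np.random.default_rng(42)
for _ in range(1_000):
    x, y, z = rng.normal(size=(3, 64))
    # Proposition 3.2: triangle inequality of the Log-distance.
    assert log_distance(x, z) <= log_distance(x, y) + log_distance(y, z) + 1e-9
    assert m_d(x, z) <= m_d(x, y) + m_d(y, z) + 1e-9
    assert abs(m_d(x, y) - m_d(y, x)) < 1e-12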
C FULL EMPIRICAL RESULTS

This section contains tables with the complete empirical results. The classification accuracy of Dynamic Time Warping was taken from the results published in the UCR archive [4]. The names of the used datasets and the complete results can be found below. The first two tables list the classification error rate per distance function; the last two tables compare the contamination tolerance and the imprecision invariance per distance function.
Dataset (error rate)           E    Euc  DTW  Log  ED   EDR
ACSF1                          0.28 0.46 0.36 0.17 0.90 0.48
Adiac                          0.42 0.39 0.40 0.40 0.97 0.54
ArrowHead                      0.21 0.20 0.30 0.20 0.61 0.31
Beef                           0.33 0.33 0.37 0.40 0.80 0.50
BeetleFly                      0.25 0.25 0.30 0.35 0.50 0.40
BirdChicken                    0.45 0.45 0.25 0.35 0.50 0.35
BME                            0.25 0.17 0.10 0.20 0.65 0.59
Car                            0.27 0.27 0.27 0.28 0.77 0.30
CBF                            0.06 0.15 0.00 0.11 0.67 0.38
Chinatown                      0.06 0.05 0.04 0.05 0.45 0.03
Coffee                         0.04 0.00 0.00 0.07 0.46 0.11
Computers                      0.47 0.42 0.30 0.42 0.50 0.42
CricketX                       0.37 0.42 0.25 0.37 0.92 0.71
CricketY                       0.38 0.43 0.26 0.34 0.92 0.70
CricketZ                       0.38 0.41 0.25 0.36 0.93 0.72
DiatomSizeReduction            0.07 0.07 0.03 0.08 0.70 0.08
DistalPhalanxOutlineAgeGroup   0.35 0.37 0.23 0.33 0.58 0.26
DistalPhalanxOutlineCorrect    0.29 0.28 0.28 0.26 0.42 0.30
DistalPhalanxTW                0.37 0.37 0.41 0.37 0.70 0.34
Earthquakes                    0.35 0.29 0.28 0.33 0.75 0.75
ECG200                         0.12 0.12 0.23 0.11 0.64 0.20
ECGFiveDays                    0.18 0.20 0.23 0.21 0.50 0.36
EOGHorizontalSignal            0.60 0.58 0.50 0.67 0.83 0.83
EOGVerticalSignal              0.68 0.56 0.55 0.73 0.86 0.86
EthanolLevel                   0.72 0.73 0.72 0.69 0.75 0.69
FaceFour                       0.22 0.22 0.17 0.15 0.70 0.26
FiftyWords                     0.34 0.37 0.31 0.31 0.97 0.61
Fish                           0.22 0.22 0.18 0.23 0.88 0.30
Fungi                          0.12 0.18 0.16 0.08 0.96 0.51
GunPoint                       0.07 0.09 0.09 0.05 0.51 0.24
GunPointAgeSpan                0.03 0.10 0.08 0.00 0.49 0.17
GunPointMaleVersusFemale       0.01 0.03 0.00 0.01 0.47 0.21
GunPointOldVersusYoung         0.00 0.05 0.16 0.00 0.52 0.01
Ham                            0.42 0.40 0.53 0.50 0.51 0.43
HandOutlines                   0.14 0.14 0.12 0.14 0.46 0.18
Haptics                        0.63 0.63 0.62 0.64 0.79 0.64
Herring                        0.47 0.48 0.47 0.41 0.41 0.55
HouseTwenty                    0.14 0.34 0.08 0.18 0.32 0.37
InlineSkate                    0.66 0.66 0.62 0.64 0.83 0.69
InsectEPGRegularTrain          0.00 0.32 0.13 0.00 0.00 0.00
InsectEPGSmallTrain            0.00 0.34 0.27 0.00 0.00 0.00
LargeKitchenAppliances         0.47 0.51 0.21 0.42 0.67 0.67
Lightning2                     0.18 0.25 0.13 0.18 0.46 0.51
Lightning7                     0.33 0.42 0.27 0.26 0.74 0.71
Meat                           0.07 0.07 0.07 0.07 0.65 0.07
Table 3: Classification error per distance function, Part 1. The ensemble E is not significantly different from DTW, Log or Euc. ED and EDR frequently have a higher classification error than the remaining functions. Unsurprisingly, DTW has the lowest overall error rate; its elastic nature likely gives it superior classification accuracy over lockstep distance functions.
Dataset (error rate)           E    Euc  DTW  Log  ED   EDR
MedicalImages                  0.31 0.32 0.26 0.29 0.49 0.53
MiddlePhalanxOutlineAgeGroup   0.45 0.48 0.50 0.47 0.44 0.49
MiddlePhalanxOutlineCorrect    0.25 0.23 0.30 0.25 0.43 0.24
MiddlePhalanxTW                0.49 0.49 0.49 0.45 0.73 0.47
OliveOil                       0.13 0.13 0.17 0.17 0.83 0.60
OSULeaf                        0.48 0.48 0.41 0.45 0.90 0.55
PigAirwayPressure              0.88 0.94 0.89 0.90 0.90 0.94
PigArtPressure                 0.73 0.88 0.75 0.72 0.93 0.86
PigCVP                         0.87 0.92 0.85 0.87 0.91 0.88
Plane                          0.04 0.04 0.00 0.04 0.85 0.01
PowerCons                      0.03 0.07 0.12 0.04 0.20 0.32
ProximalPhalanxOutlineAgeGroup 0.23 0.21 0.20 0.22 0.57 0.22
ProximalPhalanxOutlineCorrect  0.24 0.19 0.22 0.22 0.32 0.24
ProximalPhalanxTW              0.28 0.29 0.24 0.30 0.98 0.27
RefrigerationDevices           0.57 0.61 0.54 0.52 0.67 0.69
Rock                           0.44 0.16 0.40 0.34 0.62 0.46
ScreenType                     0.65 0.64 0.60 0.62 0.67 0.72
SemgHandGenderCh2              0.13 0.24 0.20 0.22 0.35 0.36
SemgHandMovementCh2            0.24 0.63 0.42 0.55 0.83 0.82
SemgHandSubjectCh2             0.16 0.60 0.27 0.42 0.80 0.77
ShapeletSim                    0.52 0.46 0.35 0.49 0.50 0.50
ShapesAll                      0.24 0.25 0.23 0.24 0.98 0.37
SmallKitchenAppliances         0.46 0.66 0.36 0.51 0.66 0.74
SmoothSubspace                 0.02 0.09 0.17 0.00 0.67 0.15
SonyAIBORobotSurface1          0.24 0.30 0.27 0.31 0.57 0.37
SonyAIBORobotSurface2          0.13 0.14 0.17 0.12 0.38 0.21
Strawberry                     0.06 0.05 0.06 0.05 0.36 0.05
SwedishLeaf                    0.22 0.21 0.21 0.22 0.93 0.32
Symbols                        0.10 0.10 0.05 0.10 0.84 0.20
SyntheticControl               0.07 0.12 0.01 0.13 0.83 0.35
ToeSegmentation1               0.29 0.32 0.23 0.27 0.46 0.36
ToeSegmentation2               0.15 0.19 0.16 0.12 0.18 0.21
Trace                          0.31 0.24 0.00 0.21 0.76 0.32
UMD                            0.25 0.24 0.01 0.24 0.48 0.51
Wine                           0.33 0.39 0.43 0.35 0.50 0.48
WordSynonyms                   0.38 0.38 0.35 0.34 0.91 0.63
Worms                          0.48 0.55 0.42 0.56 0.44 0.64
WormsTwoClass                  0.36 0.39 0.38 0.42 0.44 0.43
Table 4: Classification error per distance function, Part 2.
Columns: Contamination Tolerance (E Euc DTW Log ED EDR), then Imprecision Invariance (E Euc DTW Log ED EDR).

Dataset                        E    Euc  DTW  Log  ED   EDR  E    Euc  DTW  Log  ED   EDR
ACSF1                          0.68 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.98
Adiac                          0.49 0.00 0.00 0.00 1.00 0.94 1.00 1.00 1.00 1.00 0.00 0.98
ArrowHead                      0.98 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Beef                           0.85 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
BeetleFly                      1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
BirdChicken                    1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
BME                            1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Car                            0.99 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
CBF                            1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Chinatown                      1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Coffee                         0.30 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Computers                      1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.99
CricketX                       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
CricketY                       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
CricketZ                       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
DiatomSizeReduction            0.95 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
DistalPhalanxOutlineAgeGroup   0.95 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
DistalPhalanxOutlineCorrect    0.66 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
DistalPhalanxTW                0.94 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Earthquakes                    1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
ECG200                         1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
ECGFiveDays                    1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
EOGHorizontalSignal            1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.16 1.00
EOGVerticalSignal              1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.15 1.00
EthanolLevel                   0.85 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
FaceFour                       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
FiftyWords                     1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Fish                           0.97 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Fungi                          1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
GunPoint                       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
GunPointAgeSpan                1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.05 1.00
GunPointMaleVersusFemale       1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.07 1.00
GunPointOldVersusYoung         1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.04 1.00
Ham                            1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
HandOutlines                   0.57 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Haptics                        1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Herring                        0.62 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
HouseTwenty                    1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.18 1.00
InlineSkate                    1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
InsectEPGRegularTrain          1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
InsectEPGSmallTrain            1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
LargeKitchenAppliances         0.96 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Lightning2                     1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Lightning7                     1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Meat                           0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1.00
Table 5: Contamination tolerance and imprecision invariance per distance function, Part 1. The numbers indicate the percentage of non-class members which have a larger distance to the considered in-class time series, averaged over all classes. ED perfectly tolerates contamination, while E and EDR commonly, but not always, achieve perfect scores. In terms of imprecision invariance, all measures besides ED appear to fulfill this property. Altogether, EDR with a median absolute deviation-based tolerance interval appears to have the highest combined robustness.