Binary Dynamic Time Warping in Linear Time
William Kuszmaul
MIT CSAIL
[email protected]
Abstract
Dynamic time warping distance (DTW) is a widely used distance measure between time series x, y ∈ Σ^n. It was shown by Abboud, Backurs, and Williams that in the binary case, where |Σ| = 2, DTW can be computed in time O(n^1.87). We improve this running time to O(n). Moreover, if x and y are run-length encoded, then there is an algorithm running in time Õ(k + ℓ), where k and ℓ are the number of runs in x and y, respectively. This improves on the previous best bound of O(kℓ) due to Dupont and Marteau.

Introduction
Dynamic time warping distance (DTW) is a widely used distance measure between time series [26]. DTW is particularly flexible in dealing with temporal sequences that vary in speed. To measure the distance between two sequences, portions of each sequence are allowed to be warped (meaning that a character may be replaced with multiple consecutive copies of itself), and then the warped sequences are compared by summing the distances between corresponding pairs of characters. DTW's many applications include phone authentication [10], signature verification [27], speech recognition [24], bioinformatics [1], cardiac medicine [7], and song identification [34].

The textbook dynamic-programming algorithm for DTW runs in time O(n²), which can be prohibitively slow for large inputs. Moreover, conditional lower bounds [6, 2, 23] prohibit the existence of a strongly subquadratic-time algorithm, unless the Strong Exponential Time Hypothesis is false.

On the practical side, the difficulty of computing DTW directly has motivated the development of fast heuristics [31, 19, 20, 18, 4, 29], which typically lack provable guarantees.

On the theoretical side, the difficulty of computing DTW directly has led researchers to focus on certain important special cases. Hwang and Gelfand [16] show how to compute DTW(x, y) in time O((s + t)n), where |x| = |y| = n and where s and t are the number of non-zero values in x and y, respectively. Kuszmaul [23] showed how to compute DTW(x, y) in time O(n · DTW(x, y)), and also gave an O(n^ε)-approximation algorithm with running time Õ(n^{2−ε}). Recently, Froese et al. [13] gave an algorithm parameterized by the run-length-encoding lengths of x and y, running in time O((k + ℓ)n), where k and ℓ are the number of repeated-letter runs in x and y, respectively. In the case where k, ℓ ∈ O(√n), the algorithm achieves a faster time of O(k²ℓ + ℓ²k).

Binary DTW.
One case that is of special interest is that where x and y are binary time series, that is, x, y ∈ {0, 1}^n. In this case, the conditional lower bounds [6, 2, 23] do not apply. Abboud, Backurs, and Williams [2] gave an algorithm for computing binary DTW in time O(n^1.87), building on an algorithm given by [8] for the Bounded Monotone Convolution Problem. Other work has given algorithms running in time O(st) [17, 25], where s and t are the number of 1s in x and y, respectively, and in time O(kℓ), where k and ℓ are the number of repeated-letter runs in x and y, respectively [11].

The binary DTW problem has also received attention from practitioners. For example, several of the CASAS human activity data sets [9] that have been examined in the context of DTW [25, 32] consist of binary data points (e.g., sensor data indicating when a door is open/closed).

Binary DTW has also been studied in the context of a large number r of time series x^(1), x^(2), ..., x^(r) being considered simultaneously. In this case, researchers have focused on the Binary Mean Problem [32], in which the goal is to find a single time series x* that minimizes the sum Σ_i DTW(x*, x^(i)) of dynamic time warping distances. Leveraging the binary DTW algorithm of [2], Schaar, Froese, and Niedermeier [32] gave an O(rn^1.87)-time algorithm for the Binary Mean Problem. The algorithm was not included in the subsequent empirical evaluation [32], however, due to the impracticality of the n^1.87 term.

(Footnotes. An algorithm is said to run in strongly subquadratic time if it runs in time O(n^{2−ε}) for some constant ε > 0. Researchers have also studied related problems that are not constrained by the aforementioned conditional lower bounds; see, for example, work by Braverman et al. [5] on communication complexity and by Kuszmaul [23] on approximation algorithms. For a full discussion of the O(n^1.87)-time algorithm, see the extended version [3] of [2].)

Binary DTW in linear time.
In this note, we show that binary DTW can be computed in linear time O(n), substantially improving on the previous state of the art of O(n^1.87). Our algorithm is very simple, and hinges on the relationship between binary DTW and minimum-weight bipartite matching. Our algorithm can also be modified for the case where x and y are run-length encoded. If x and y consist of k and ℓ repeated-letter runs, respectively, then our algorithm runs in time O((k + ℓ) log(k + ℓ)).

Preliminaries

In this paper we treat time series as strings. The runs of a string are the maximal substrings consisting of a single repeated letter. For example, the runs of aabbbccd are aa, bbb, cc, and d. Given a string x, we can extend a run in x by further duplicating the letter which populates the run. For example, the second run in aabbbccd can be extended to obtain aabbbbccd. Any string obtained from x by extending x's runs is an expansion of x. For example, aaaabbbbccccdddd is an expansion of aabbbccd.

Consider two strings x and y with characters from a metric space (Σ, d). A correspondence between x and y is a pair (x̄, ȳ) of equal-length expansions of x and y. The value of a correspondence is the sum

Σ_i d(x̄_i, ȳ_i)

of the distances between corresponding pairs of characters in the two expansions. A correspondence between x and y is said to be optimal if it has the minimum attainable value, and the resulting value is called the dynamic time warping distance DTW(x, y) between x and y.

In this section we show that binary DTW can be computed in linear time:
Theorem 3.1.
Let x ∈ {0, 1}^n and y ∈ {0, 1}^m be binary strings. Then DTW(x, y) can be computed in time O(n + m).

We also consider the case where x and y are run-length encoded. That is, x (and similarly y) is given as a sequence of pairs (i_1, a_1), (i_2, a_2), ..., indicating that the j-th run consists of i_j copies of the letter a_j.

Theorem 3.2.
Let x ∈ {0, 1}^n and y ∈ {0, 1}^m be binary strings. Suppose that x and y are run-length encoded, and that the total number of runs in x and y is ℓ. Then DTW(x, y) can be computed in time O(ℓ log ℓ).

A useful reduction. We begin by employing a result of Abboud, Backurs, and Williams [2].
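As a concrete baseline for what is being reduced, the correspondence-based definition of DTW from the preliminaries can be evaluated directly by the textbook quadratic dynamic program. The sketch below is only an illustration (the function name and the choice of the absolute-difference metric are ours, not the paper's); the reduction stated next is how the paper avoids this quadratic cost in the binary case.

```python
def dtw(x, y, d=lambda a, b: abs(a - b)):
    """Textbook O(n*m) dynamic program for DTW.

    dp[i][j] = DTW distance between x[:i] and y[:j]; the three-way min
    reflects which string (if either) repeats its current character in
    the pair of equal-length expansions."""
    n, m = len(x), len(y)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = d(x[i - 1], y[j - 1]) + min(
                dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
            )
    return dp[n][m]
```

For example, dtw([0, 0, 1, 1], [0, 0, 0, 1, 1, 1]) is 0, since each string is an expansion of 01 padded with matching characters.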
Lemma 3.3 (Theorem 8 of [2]). Computing DTW(x, y) of two strings x ∈ {0, 1}^n and y ∈ {0, 1}^m can be reduced in time O(m + n) to (a constant number of instances of) the following problem: given a sequence w_1, w_2, ..., w_s of s ≤ max(m, n) positive integers, and an integer r ≤ s, find a subsequence w_{i_1}, w_{i_2}, ..., w_{i_r} of length r that does not use any neighboring integers (i.e., i_{j+1} − i_j ≥ 2 for all j) and such that the sum Σ_j w_{i_j} of integers is minimized. The integers w_1, w_2, ..., w_s sum to at most max(m, n).

When x and y are run-length encoded (meaning each run is encoded by its length), the following extension of Lemma 3.3 is also useful.

Corollary 3.4.
Suppose that x and y are run-length encoded, and that k and ℓ are the number of runs in x and y, respectively. Then the reduction in Lemma 3.3 takes time O(k + ℓ) and results in sequences w_1, w_2, ..., w_s of length s ≤ max(k, ℓ).

Although we will not re-prove Lemma 3.3 here, we do give a brief intuition. Suppose for simplicity that both x and y begin and end with 0, and suppose that x has more runs than y. If x has k runs and y has ℓ runs, then the optimal correspondence (x̄, ȳ) will select k − ℓ runs R in x, and the correspondence will contain mismatches x̄_i ≠ ȳ_i only for x̄_i's from those runs R. In particular, the expansion ȳ of y "covers up" the runs in R by expanding runs in y to engulf the runs in R. A run in y can only "cover up" a run in x if the two runs have different values (one run is of 0s and the other is of 1s). Consequently, the runs R in x that are covered up cannot be adjacent to one another; that is, no two runs in R can appear adjacently in x. This turns out to be the only constraint on R, however, and subject to this constraint, the cost of the correspondence (x̄, ȳ) is minimized by selecting the runs in R to have the minimum possible total length. Thus the reduction from Lemma 3.3 can be thought of as follows: let w_1, w_2, ..., w_k be the lengths of the runs in x. Then the dynamic time warping distance is given by

DTW(x, y) = min_{i_1, i_2, ..., i_{k−ℓ}} Σ_{j=1}^{k−ℓ} w_{i_j},

where i_1 < i_2 < · · · < i_{k−ℓ} and where i_j + 1 ≠ i_{j+1} for all j. In order to handle cases where x and y disagree in their first or last letters, a small amount of additional casework is necessary, resulting in a reduction to O(1) instances of the subsequence problem, rather than just a single instance [2].

Relationship to bipartite matching. The problem given by Lemma 3.3 can be reformulated as a problem of minimum-weight bipartite matching. Consider the line graph G with vertices V = {v_0, v_1, ..., v_s}, with edges E = {e_1 = (v_0, v_1), ..., e_s = (v_{s−1}, v_s)}, and with edge weights wt(e_i) = w_i. Then the problem described in Lemma 3.3 becomes: find the minimum-weight matching M ⊆ E such that |M| = r.

Our algorithm for computing DTW(x, y) hinges on this relationship to minimum-weight bipartite matching. In order to efficiently compute DTW(x, y), we will construct the minimum-weight matching M of size |M| = r by simply performing iterative path augmentation.

The Hungarian Algorithm for weighted bipartite matching. One of the simplest algorithms for weighted bipartite matching is the so-called Hungarian Algorithm [21, 22, 28, 33, 12]. Although the Hungarian Algorithm applies to arbitrary weighted bipartite graphs, we will be discussing the algorithm and its properties exclusively in the context of our line graph G. In order to describe the algorithm in the context of a line graph, we first introduce several useful notations.

Formally, a matching M in the line graph G is a subset M ⊆ E such that |M ∩ {e_i, e_{i+1}}| ≤ 1 for all i. The weight wt(M) is given by Σ_{e_i ∈ M} wt(e_i). A chain C in a matching M is a set of the form C = {e_i, e_{i+2}, e_{i+4}, ..., e_{i+2c}} ⊆ M for some c ∈ ℕ. The chain C is maximal if e_{i−2}, e_{i+2c+2} ∉ M. The augmentation of a chain C = {e_i, e_{i+2}, e_{i+4}, ..., e_{i+2c}} is the new chain Aug(C) = {e_{i−1}, e_{i+1}, e_{i+3}, ..., e_{i+2c+1}} ∩ E. A matching M′ is said to be an augmentation of a matching M if M′ = M \ C ∪ Aug(C) for some maximal chain C in M, and if |M′| = |M| + 1. (Note that M \ C ∪ Aug(C) is guaranteed to be a matching for any maximal chain C.)

In order to simplify discussion, we also introduce the notion of an empty chain. For i ∈ [s], a matching M contains the i-th empty chain ∅_i if M does not contain any of e_{i−1}, e_i, e_{i+1}. In this case the empty chain ∅_i is considered to be maximal, and the augmentation Aug(∅_i) is defined to be {e_i}. Thus, if a matching M′ equals M ∪ {e_i} for some edge e_i ∉ M, then the matching M′ can be thought of as M \ ∅_i ∪ Aug(∅_i), making M′ an augmentation of M.

The Hungarian Algorithm constructs a matching M of size r as follows. The algorithm begins with the empty matching M_0. The algorithm then iteratively constructs M_1, M_2, ..., M_r, where each M_i is a minimum-weight augmentation of M_{i−1}. That is, M_i is permitted to be any augmentation of M_{i−1} that achieves the minimum attainable value for wt(M_i) (over all augmentations of M_{i−1}).
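On the line graph, one way to realize this iterative scheme is a classical merging trick: each round selects a minimum-key item, and selecting an interior item i is "undone later" by replacing i and its two neighbors with a single pseudo-item of weight w_{i−1} + w_{i+1} − w_i, so that selecting the pseudo-item implements exactly a chain augmentation. The sketch below is a hedged O(s log s) illustration in this spirit (the function name and data-structure choices are ours, not the paper's; it computes only the weight wt(M_r)).

```python
import heapq

def min_weight_matching(w, r):
    """Weight of a minimum-weight r-edge matching in the line graph with
    edge weights w (equivalently: a minimum-weight size-r subset of w
    with no two chosen indices adjacent).  Each heap pop realizes one
    minimum-weight chain augmentation."""
    s = len(w)
    assert 0 <= r <= (s + 1) // 2
    weight = list(w)
    prev = list(range(-1, s - 1))      # doubly linked list over live items
    nxt = list(range(1, s + 1))
    alive = [True] * s
    ver = [0] * s                      # versions invalidate stale heap entries
    heap = [(weight[i], i, 0) for i in range(s)]
    heapq.heapify(heap)

    def unlink(i):
        alive[i] = False
        if prev[i] >= 0:
            nxt[prev[i]] = nxt[i]
        if nxt[i] < s:
            prev[nxt[i]] = prev[i]

    total = 0
    for _ in range(r):
        while True:                    # lazy deletion: skip stale entries
            wi, i, v = heapq.heappop(heap)
            if alive[i] and v == ver[i]:
                break
        total += wi
        a, b = prev[i], nxt[i]
        if a >= 0 and b < s:
            # Interior pick: replace (a, i, b) by one pseudo-item whose
            # weight w[a] + w[b] - w[i] is the cost of the augmentation
            # that swaps i out for both of its neighbors.
            weight[i] = weight[a] + weight[b] - wi
            unlink(a)
            unlink(b)
            ver[i] += 1
            heapq.heappush(heap, (weight[i], i, ver[i]))
        else:
            # Boundary pick: i can never be swapped out, so i and its
            # (at most one) neighbor simply leave the list.
            if a >= 0:
                unlink(a)
            if b < s:
                unlink(b)
            unlink(i)
    return total
```

For instance, on w = (3, 1, 3) with r = 2 the first round takes the middle 1, and the second round takes the pseudo-item of weight 3 + 3 − 1 = 5, yielding the optimal matching {e_1, e_3} of weight 6.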
The final matching M_r consists of r edges and is given as the output matching M. Tarjan and Ramshaw (Proposition 3-8 of [30]) showed that the Hungarian Algorithm outputs a matching M_r with the minimum possible weight (out of all r-edge matchings). Note that we focus only on r ≤ ⌈s/2⌉, since ⌈s/2⌉ is the size of the largest matching in our line graph G.

Lemma 3.5. For r ≤ ⌈s/2⌉, the matching M_r has the minimum weight out of all r-edge matchings.

Whereas Tarjan and Ramshaw extend Lemma 3.5 to arbitrary bipartite graphs, we are only interested in the line graph. This allows for an especially simple proof of the lemma.

Proof of Lemma 3.5.
Let r ≥ 1, suppose inductively that M_{r−1} is minimum-weight out of (r − 1)-edge matchings, and let M*_r be a minimum-weight matching of size r. The edges E can be decomposed as the disjoint union

E = ⋃_{maximal chain C ⊆ M_{r−1}} C ∪ Aug(C).    (1)

Since |M*_r| = |M_{r−1}| + 1, and since E decomposes as in (1), there must be a maximal chain C in M_{r−1} for which

|M*_r ∩ (C ∪ Aug(C))| > |M_{r−1} ∩ (C ∪ Aug(C))| = |C|.

Recalling that M*_r is a matching, it follows that M*_r contains the chain Aug(C) of size |C| + 1.

Now we turn our attention to M_r, the minimum-weight augmentation of M_{r−1}. Using the definition of M_r, we know that wt(M_r) ≤ wt(M_{r−1} \ C ∪ Aug(C)). To prove that M_r is optimal out of r-edge matchings, it therefore suffices to show that

wt(M_{r−1} \ C ∪ Aug(C)) ≤ wt(M*_r).    (2)

By the assumption that M_{r−1} is a minimum-weight matching, we know that

wt(M_{r−1}) ≤ wt(M*_r \ Aug(C) ∪ C).    (3)

If we remove C from the matchings on both sides of (3), and then insert Aug(C) into both matchings, then we arrive at (2), as desired. (Note that M \ C ∪ Aug(C) evaluates as (M \ C) ∪ Aug(C) by order of operations.)

Efficiently constructing the matchings. Again using the fact that G is a line graph on s + 1 vertices, the matchings M_1, M_2, M_3, ... can easily be computed in time O(s log s).

Lemma 3.6. For any r ≤ ⌈s/2⌉, the matching M_r can be computed in time O(s log s).

Proof.
We build M_1, M_2, ..., M_r using the Hungarian Algorithm. When going from M_i to M_{i+1}, we maintain two data structures: (1) a balanced binary tree B consisting of the maximal chains C ⊆ M_i for which |Aug(C)| = |C| + 1, sorted by the key wt(Aug(C)) − wt(C); and (2) an array A of s ones and zeros, where the ones correspond to the positions in which the maximal chains C ⊆ M_i begin and end.

To go from M_i to M_{i+1}, the minimum element of B is used to determine which chain C to augment. This means that M_{i+1} = M_i \ C ∪ Aug(C). The array A is updated to reflect the update from M_i to M_{i+1}, and is used to determine whether the new augmented chain Aug(C) combines with another chain C′ ⊆ M_i in order to form a larger maximal chain in M_{i+1}. The tree B is then updated appropriately to reflect the transition from M_i to M_{i+1}. (The subtle case here is that, if Aug(C) combines with another chain C′, then both C and C′ are removed from B and replaced with a single node for the new chain Aug(C) ∪ C′.)

The tree B takes time O(s log s) to initialize and the array A takes time O(s) to initialize (as all zeros). Constructing M_r then takes time O(r log s).

Corollary 3.4 and Lemma 3.6 combine to imply Theorem 3.2.

In order to prove Theorem 3.1, we will need to prove several additional properties of the matchings M_1, M_2, M_3, .... The next lemma shows that M_{i+2} can always be reached from M_i via two disjoint chain augmentations.

Lemma 3.7.
Consider M_i and M_{i+2} for some i (satisfying 0 ≤ i ≤ ⌈s/2⌉ − 2). There exist maximal chains C_1 and C_2 in M_i such that

M_{i+2} = M_i \ (C_1 ∪ C_2) ∪ (Aug(C_1) ∪ Aug(C_2)).

Proof.
Let D_1 be the maximal chain augmented between M_i and M_{i+1}, and let D_2 be the maximal chain augmented between M_{i+1} and M_{i+2}. If D_2 is a maximal chain in M_i, then we can simply set C_1 = D_1 and C_2 = D_2 in order to complete the lemma. On the other hand, if D_2 is not a maximal chain in M_i, then D_2 must be of the form D′ ∪ Aug(D_1) for some maximal chain D′ in M_i. It follows that

M_{i+2} = M_i \ (D_1 ∪ D′) ∪ Aug(Aug(D_1)) ∪ Aug(D′).

Observe that Aug(Aug(D_1)) overlaps Aug(D′) in one edge, and otherwise consists of D_1 and some other new edge e_j. That is, Aug(Aug(D_1)) \ (D_1 ∪ Aug(D′)) consists of a single edge e_j. It follows that

M_{i+2} = M_i \ (∅_j ∪ D′) ∪ (Aug(∅_j) ∪ Aug(D′)),

where ∅_j is treated as the empty set. Setting C_1 = ∅_j and C_2 = D′ completes the proof.

Using Lemma 3.7, we can prove a monotonicity property for ∆_1, ∆_2, ..., ∆_r, where ∆_i = wt(M_i) − wt(M_{i−1}).

Lemma 3.8. Let r ≤ ⌈s/2⌉. Define ∆_1, ∆_2, ..., ∆_r, where ∆_i = wt(M_i) − wt(M_{i−1}). Then ∆_1 ≤ ∆_2 ≤ · · · ≤ ∆_r.

Proof.
To compare ∆_i and ∆_{i+1}, we apply Lemma 3.7 to deduce that

M_{i+1} = M_{i−1} \ (C_1 ∪ C_2) ∪ (Aug(C_1) ∪ Aug(C_2)),

for some two maximal chains C_1, C_2 ⊆ M_{i−1} (such that |Aug(C_j)| = |C_j| + 1 for both chains j ∈ {1, 2}). It follows that

∆_i + ∆_{i+1} = wt(Aug(C_1)) − wt(C_1) + wt(Aug(C_2)) − wt(C_2).

This means that for some j ∈ {1, 2}, we have

∆_i + ∆_{i+1} ≥ 2 (wt(Aug(C_j)) − wt(C_j)).

By the definition of ∆_i, and the fact that M_i is the minimum-weight augmentation of M_{i−1},

wt(M_{i−1}) + ∆_i = wt(M_i) ≤ wt(M_{i−1}) − wt(C_j) + wt(Aug(C_j)) ≤ wt(M_{i−1}) + (∆_i + ∆_{i+1})/2.

It follows that ∆_i ≤ (∆_i + ∆_{i+1})/2, which implies ∆_i ≤ ∆_{i+1}.

By exploiting the monotonicity of the ∆_i's, we can compute the matching M_r in time O(n + m).

Lemma 3.9.
For any r ≤ ⌈s/2⌉, the matching M_r can be computed in time O(m + n).

Proof. We modify the approach from Lemma 3.6 as follows. Rather than maintaining B as a balanced binary tree, we maintain B using what is essentially a dynamic bucket sort.

At any given moment, B consists of max(n, m) buckets, where each bucket i contains a linked list of the maximal chains C whose key wt(Aug(C)) − wt(C) equals i. (We also modify A to contain a pointer from the one-entries that represent the ends of a chain C to the linked-list element for C in B.) Additionally, B maintains a counter t indicating ∆_i for the most recent M_i computed. In order to find the smallest element of B, one simply repeatedly increments the counter t until reaching a non-empty bucket, and then uses a chain C from that bucket. By Lemma 3.8, this always results in us finding the chain in B with the smallest key (i.e., there are never any non-empty buckets with indices smaller than our counter t).

The initial state of B can be constructed in time O(m + n), since we are inserting s ≤ m + n elements into buckets. The counter t can only be incremented a total of max(m, n) times, and besides those increments, each operation on B takes constant time (making O(1) modifications to linked lists). It follows that the total running time of the algorithm is now O(m + n), as desired.

Lemma 3.3 and Lemma 3.9 combine to imply Theorem 3.1.

Conclusion
This note gives a very simple linear-time algorithm that computes DTW(x, y) for two binary time series x, y. The algorithm makes use of a simple connection between dynamic time warping and minimum-weight bipartite matching. Although both the algorithm and the analysis are extremely simple, the linear running time significantly improves on the previous state of the art of O(n^1.87) [2].

An open question. Many applications of dynamic time warping use a constrained version of DTW, in which the expansions x̄ and ȳ are only allowed to pair up letters x_i and y_j if |i − j| ≤ k for some width parameter k. This heuristic is known as the Sakoe-Chiba Band heuristic [31] and is employed, for example, in the commonly used library of Giorgino [14]. One of the main reasons that the k-width constraint is added is that it allows for a simple O(nk)-time algorithm (which is much faster than O(n²) for small k). On the other hand, in the case of binary DTW, the k-width constraint may also make DTW a richer similarity measure. In particular, without the width constraint, DTW(x, y) depends only on the number of runs in x and y, and on the properties of the string with more runs.

Thus we conclude with the following open question. What is the fastest that binary DTW can be computed subject to the k-width constraint? And, in particular, do O(m + n)-time algorithms exist for all k?

Acknowledgments

William Kuszmaul was supported in part by an NSF Graduate Fellowship and a Hertz Foundation Fellowship. This research was sponsored in part by National Science Foundation Grant 1533644 and in part by the United States Air Force Research Laboratory and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
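As a point of reference for the open question above, the O(nk)-time dynamic program for k-width-constrained DTW can be sketched as follows. This is a hedged illustration (the function name and the convention of returning infinity when the endpoints themselves violate the band are our choices), not an algorithm from this paper.

```python
def banded_dtw(x, y, k, d=lambda a, b: abs(a - b)):
    """DTW restricted to the Sakoe-Chiba band: x[i] may only be paired
    with y[j] when |i - j| <= k.  Only O(k) cells are filled per row,
    for O(n*k) total time."""
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return float("inf")   # the final pair (n, m) already violates the band
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(1, n + 1):
        # only columns within the band are reachable
        for j in range(max(1, i - k), min(m, i + k) + 1):
            dp[i][j] = d(x[i - 1], y[j - 1]) + min(
                dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
            )
    return dp[n][m]
```

With k ≥ max(n, m) the band is vacuous and the program reduces to the unconstrained quadratic DP; with k = 0 it forces the purely diagonal alignment.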
References

[1] John Aach and George M. Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17(6):495–508, 2001.

[2] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proceedings of the 56th IEEE Symposium on Foundations of Computer Science (FOCS), pages 59–78, 2015.

[3] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. http://people.csail.mit.edu/virgi/LCS.pdf, 2015.

[4] Nurjahan Begum, Liudmila Ulanova, Jun Wang, and Eamonn J. Keogh. Accelerating dynamic time warping clustering with a novel admissible pruning strategy. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 49–58, 2015.

[5] Vladimir Braverman, Moses Charikar, William Kuszmaul, David P. Woodruff, and Lin F. Yang. The one-way communication complexity of dynamic time warping distance. In 35th International Symposium on Computational Geometry (SoCG), volume 129 of LIPIcs, article 16. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.

[6] Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Proceedings of the 56th IEEE Symposium on Foundations of Computer Science (FOCS), pages 79–97, 2015.

[7] E. G. Caiani, A. Porta, G. Baselli, M. Turiel, S. Muzzupappa, F. Pieruzzi, C. Crema, A. Malliani, and S. Cerutti. Warped-average template technique to track on a cycle-by-cycle basis the cardiac filling phases on left ventricular volume. In Computers in Cardiology 1998, pages 73–76, 1998.

[8] Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combinatorics. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 31–40, 2015.

[9] Diane J. Cook, Aaron S. Crandall, Brian L. Thomas, and Narayanan C. Krishnan. CASAS: A smart home in a box. Computer, 46(7):62–69, 2012.

[10] Alexander De Luca, Alina Hang, Frederik Brudy, Christian Lindner, and Heinrich Hussmann. Touch me once and I know it's you!: Implicit authentication based on touch screen patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 987–996, 2012.

[11] Marc Dupont and Pierre-François Marteau. Coarse-DTW for sparse time series alignment. In International Workshop on Advanced Analysis and Learning on Temporal Data, pages 157–172. Springer, 2015.

[12] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34(3):596–615, 1987.

[13] Vincent Froese, Brijnesh Jain, and Maciej Rymar. Fast exact dynamic time warping on run-length encoded time series. arXiv preprint arXiv:1903.03003, 2019.

[14] Toni Giorgino et al. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31(7):1–24, 2009.

[15] Omer Gold and Micha Sharir. Dynamic time warping and geometric edit distance: Breaking the quadratic barrier. In 44th International Colloquium on Automata, Languages, and Programming (ICALP), pages 25:1–25:14, 2017.

[16] Youngha Hwang and Saul B. Gelfand. Sparse dynamic time warping. In International Conference on Machine Learning and Data Mining in Pattern Recognition, pages 163–175. Springer, 2017.

[17] Youngha Hwang and Saul B. Gelfand. Binary sparse dynamic time warping. In MLDM (2), pages 748–759, 2019.

[18] Eamonn J. Keogh. Exact indexing of dynamic time warping. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pages 406–417, 2002.

[19] Eamonn J. Keogh and Michael J. Pazzani. Scaling up dynamic time warping to massive datasets. In Principles of Data Mining and Knowledge Discovery, Third European Conference (PKDD), pages 1–11, 1999.

[20] Eamonn J. Keogh and Michael J. Pazzani. Scaling up dynamic time warping for datamining applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 285–289, 2000.

[21] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.

[22] Harold W. Kuhn. Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3(4):253–258, 1956.

[23] William Kuszmaul. Dynamic time warping in strongly subquadratic time: Algorithms for the low-distance regime and approximate evaluation. In 46th International Colloquium on Automata, Languages, and Programming (ICALP). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.

[24] Lindasalwa Muda, Mumtaj Begam, and Irraivan Elamvazuthi. Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. arXiv preprint arXiv:1003.4083, 2010.

[25] Abdullah Mueen, Nikan Chavoshi, Noor Abu-El-Rub, Hossein Hamooni, and Amanda Minnich. AWarp: Fast warping distance for sparse time series. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 350–359. IEEE, 2016.

[26] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, pages 69–84, 2007.

[27] Mario E. Munich and Pietro Perona. Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification. In Proceedings of the 7th International Conference on Computer Vision, volume 1, pages 108–115, 1999.

[28] James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.

[29] François Petitjean, Germain Forestier, Geoffrey I. Webb, Ann E. Nicholson, Yanping Chen, and Eamonn J. Keogh. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm. Knowledge and Information Systems, 47(1):1–26, 2016.

[30] Lyle Ramshaw and Robert E. Tarjan. On minimum-cost assignments in unbalanced bipartite graphs. HP Labs, Palo Alto, CA, USA, Tech. Rep. HPL-2012-40R1, 2012.

[31] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, 1978.

[32] Nathan Schaar, Vincent Froese, and Rolf Niedermeier. Faster binary mean computation under dynamic time warping. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020.

[33] Mikkel Thorup. Integer priority queues with decrease key in constant time and the single source shortest paths problem. Journal of Computer and System Sciences, 69(3):330–353, 2004.

[34] Yunyue Zhu and Dennis Shasha. Warping indexes with envelope transforms for query by humming. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003.