Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning
Zheng Wang, Cheng Long, Gao Cong, Yiding Liu
School of Computer Science and Engineering, Nanyang Technological University, Singapore
{wang zheng, c.long, gaocong, ydliu}@ntu.edu.sg

ABSTRACT
Similar trajectory search is a fundamental problem and has been well studied over the past two decades. However, the similar subtrajectory search (SimSub) problem, aiming to return a portion of a trajectory (i.e., a subtrajectory) which is the most similar to a query trajectory, has been mostly disregarded despite the fact that it could capture trajectory similarity in a finer-grained way and that many applications take subtrajectories as basic units for analysis. In this paper, we study the SimSub problem and develop a suite of algorithms including both exact and approximate ones. Among the approximate algorithms, two that are based on deep reinforcement learning stand out and outperform the non-learning based algorithms in terms of both effectiveness and efficiency. We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.
PVLDB Reference Format:
Zheng Wang, Cheng Long, Gao Cong, Yiding Liu. Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning.
PVLDB, 12(xxx): xxxx-yyyy, 2020.
DOI: https://doi.org/10.14778/xxxxxxx.xxxxxxx
1. INTRODUCTION
Trajectory data, which corresponds to a type of data for capturing the traces of moving objects, is ubiquitous. It has been used for various types of analysis such as clustering [1, 16, 5] and similarity search [6, 7, 44, 29, 18, 46]. The majority of existing studies take a trajectory as a whole for analysis [6, 7, 44, 29, 18, 46]. Motivated by the phenomenon that two trajectories could be dissimilar to each other if each is considered as a whole but similar if only some portion of each is considered, there have been a few studies which take a portion of a trajectory as a basic entity for analysis [1, 16, 5, 34, 35]. Some examples include subtrajectory clustering [1, 16, 5] and subtrajectory join [34, 35]. For example, the subtrajectory clustering method in [16] first partitions raw
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment,
Vol. 12, No. xxx
ISSN 2150-8097.
DOI: https://doi.org/10.14778/xxxxxxx.xxxxxxx

trajectories into different subtrajectories using some principle and then groups those subtrajectories that are similar to one another into clusters.

In this paper, we study a query with its goal to search for a portion of a trajectory from a database storing many trajectories called data trajectories, which is the most similar to a given trajectory called query trajectory. In this query, a portion of a trajectory, called a subtrajectory, is considered as a basic entity and a query trajectory is taken as a whole for analysis. Therefore, it captures trajectory similarity in a finer-grained way than conventional similar trajectory search. For instance, consider a data trajectory and a query trajectory. When considered as a whole, the data trajectory is not similar to the query trajectory based on some trajectory similarity measurement, but some portion of it is very similar to the query trajectory. With the conventional similar trajectory search query, this data trajectory would be ruled out, though a portion of it is very similar to the query trajectory, which is interesting.

Moreover, in quite a few real-life applications, subtrajectories are naturally considered as basic units for analysis, e.g., subtrajectory search [32], subtrajectory join [34], subtrajectory clustering [5], etc. One application is the subtrajectory search query on sports play data. In sports such as soccer and basketball, a common practice nowadays is to track the movements of players and/or the ball using some special-purpose camera and/or GPS devices [41]. The resulting trajectory data is used to capture the semantics of the plays and for different types of data analyses. One typical task on such sports play data is to search for a portion/segment of a play from a database of plays, with its trajectories of players and/or its trajectory of the ball similar to those and/or that of a given query play [32].
This task is essentially one of searching for similar subtrajectories. Another potential application is detour route detection. It first collects those routes that have been reported by passengers to be detour routes and then searches for those subtrajectories of taxis' routes which are similar to a detour route. The found subtrajectories are probably detour routes as well.

A key problem that is involved in answering the query mentioned above is to find a subtrajectory of a data trajectory which is the most similar to a given query trajectory. We call this problem the similar subtrajectory search (SimSub) problem. While there are many existing studies on the similar trajectory search problem with each trajectory considered as a whole, there are very few studies on the SimSub problem. Let T be a data trajectory involving n points and T_q be a query trajectory involving m points. We design an exact algorithm, which enumerates all possible subtrajectories of T, computes the similarity between each subtrajectory and the query trajectory, and returns the one with the greatest similarity. We further adopt an incremental strategy for computing the similarities that are involved in the exact algorithm, which helps to improve the time complexity by a factor of O(n). We also follow some existing studies on subsequence matching [15, 49] and design an algorithm which considers only those subtrajectories with their sizes similar to that of the query trajectory and controlled by a user parameter. This provides a controllable trade-off between efficiency and effectiveness.

To push the efficiency further up, we propose several algorithms, which share the idea of splitting a data trajectory into some subtrajectories to be candidate solutions to the problem and differ in using different methods for splitting the data trajectory.
Specifically, the process is to scan the points of a data trajectory one by one sequentially and, for each one, decide whether to split the data trajectory at that point. Some of the algorithms use pre-defined heuristics, e.g., a greedy one. Others model the process as a Markov decision process (MDP) [26] and use deep reinforcement learning to learn an optimal policy for the MDP, which is then used for splitting the data trajectory. These splitting-based algorithms have time complexities much lower than the exact algorithm in general, e.g., for measurements such as t2vec, each splitting-based algorithm runs in O(n) time while the exact algorithm runs in O(n²) time.

The major contributions of this paper are as follows.

• We propose the SimSub problem, and this, to the best of our knowledge, corresponds to the first systematic study on searching subtrajectories that are similar to a query trajectory. The SimSub problem relies on a trajectory similarity measurement, and in this paper, we assume an abstract one, which could be instantiated with any existing measurement.

• We develop a suite of algorithms for the SimSub problem: (1) one exact algorithm, (2) one approximate algorithm, which provides a controllable trade-off between efficiency and effectiveness, and (3) several splitting-based algorithms including both heuristics-based ones and deep reinforcement learning based ones. These algorithms should cover a wide spectrum of application scenarios in terms of efficiency and effectiveness requirements.

• We conducted extensive experiments, which verified that splitting-based algorithms in general have good efficiency and that, among them, the algorithms based on deep reinforcement learning achieve the best effectiveness and efficiency.
Organization. We review the related work in Section 2 and provide the problem definition and some preliminaries in Section 3. Section 4 presents all non-learning based algorithms and Section 5 presents the deep reinforcement learning based algorithms. We report our experimental results in Section 6 and conclude this paper and discuss some future work in Section 7.
2. RELATED WORK

(1) Trajectory Similarity Measurements.
Measuring the similarity between trajectories is a fundamental problem and has been studied extensively. Some classical solutions focus on indexing trajectories and performing similarity computation by the alignment of matching sample points. For example, DTW [46] is the first attempt at solving the local time shift issue for computing trajectory similarity. Frechet distance [2] is a classical similarity measure that treats each trajectory as a spatial curve and takes into account the location and order of the sampling points. Further, ERP [6] and EDR [7] are proposed to improve the ability to capture the spatial semantics in trajectories. However, these point-matching methods are inherently sensitive to noise and suffer from quadratic time complexity. EDS [44] and EDwP [29] are two segment-matching methods, which operate on segments for matching two trajectories. In recent years, some learning-based algorithms were proposed to speed up the similarity computation. Li et al. [18] propose to learn representations of trajectories in the form of vectors and then measure the similarity between two trajectories as the Euclidean distance between their corresponding vectors. Some other studies [39, 47, 38] define similarity measurements on trajectories based on the road segments to which the trajectories are matched. Yao et al. [45] employ deep metric learning to approximate and accelerate trajectory similarity computation. In addition, Ma et al. [20] propose a similarity measurement called p-distance for uncertain trajectories and study the problem of searching for the top-k similar trajectories to a given query trajectory based on p-distance. Different specialized index techniques have been developed for these similarity measures, such as DTW distance [46, 13], LCSS [37], ERP [6], EDR [7], and EDwP [29]. However, these index techniques do not generalize to other similarity measures or to subtrajectory similarity search.
In this paper, we assume an abstract trajectory similarity measurement, which could be instantiated with any of these existing similarity measurements, and our techniques still apply.

(2) Subtrajectory Similarity Related Problems.
Measuring subtrajectory similarity is also a fundamental functionality in many tasks such as clustering [1, 16, 5] and similarity join [34]. Lee et al. [16] propose a general partition-and-group framework for subtrajectory clustering. Further, Buchin et al. [5] show the hardness of subtrajectory clustering based on Frechet distance, and Agarwal et al. [1] apply the trajectory simplification technique to approximate discrete Frechet distance to reduce the time cost of subtrajectory clustering. Recently, Tampakis et al. [34, 35] proposed a distributed solution for subtrajectory join and clustering by utilizing the MapReduce programming model. Although these algorithms need to consider subtrajectory similarity, similarity computation is not their focus, and they usually first segment a trajectory into subtrajectories and then employ an existing measure, such as Frechet distance.

(3) Subsequence (Substring) Matching.
Subsequence matching is a related but different problem. It aims to find a subsequence that has the same length as the query in a given candidate sequence, which usually contains millions or even trillions [27, 28] of elements. Efficient pruning algorithms [10, 28, 27, 3, 11, 23, 9] have been proposed for the matching, and these pruning algorithms are generally designed for a specific similarity measure, such as DTW [10, 28, 27, 3, 11, 25, 14] and Euclidean distance [9, 23], and cannot generalize to other measures. On the other hand, substring matching [15, 49] often focuses on approximate matching based on the Edit distance. It aims to find a substring in a string that best matches the query. Our problem differs from the substring matching problem mainly in two aspects. First, characters in a string have exact matches (0 or 1) in the alphabet; however, the points of a trajectory are different. Second, substring matching techniques are usually designed based on the characteristics of strings, e.g., grammar structure patterns or word concurrence patterns; however, a trajectory does not have such patterns.

(4) Reinforcement Learning.
The goal of reinforcement learning is to guide agents on how to take actions to maximize a cumulative reward [33] in an environment, and the environment is generally modeled as a Markov decision process (MDP) [26]. Recently, RL models have been utilized successfully to solve some database related problems. For example, Zhang et al. [48] and Li et al. [17] use RL models for automatic DBMS tuning. Trummer et al. [36] use RL to learn optimal join orders in the SkinnerDB system. Wang et al. [40] design an effective RL-based algorithm for bipartite graph matching. Overall, there are two types of popular reinforcement learning methods: (1) model-based methods [4, 12], which require to understand the environment and learn the parameters of the MDP in advance, and (2) model-free methods [43, 21], which make no effort to learn a model and get feedback from the environment step by step. In this paper, we follow the model-free methods because they are more efficient. Specifically, we make use of a popular reinforcement learning method, namely Deep Q Network (DQN) [21], for splitting a trajectory into subtrajectories to be candidate solutions for similar subtrajectory search.
3. PROBLEM DEFINITION AND PRELIMINARIES
The trace of a moving object such as a vehicle or a mobile user is usually captured by a trajectory. Specifically, a trajectory T has the form of a sequence of time-stamped locations (called points), i.e., T = <p_1, p_2, ..., p_n>, where point p_i = (x_i, y_i, t_i) means that the location is (x_i, y_i) at time t_i. The size of trajectory T, denoted by |T|, corresponds to the number of points of T.

Given a trajectory T = <p_1, p_2, ..., p_n> and 1 ≤ i ≤ j ≤ n, we denote by T[i, j] the portion of T that starts from the i-th point and ends at the j-th point, i.e., T[i, j] = <p_i, p_{i+1}, ..., p_j>. Besides, we say that T[i, j] for any 1 ≤ i ≤ j ≤ n is a subtrajectory of T. There are in total n(n+1)/2 subtrajectories of T. Note that any subtrajectory of a trajectory T is a trajectory itself.

Suppose we have a database of many trajectories, which we call data trajectories. As discussed in Section 1, one common application scenario would be that a user has a trajectory at hand, which we call a query trajectory, and would like to check which portion of the data trajectories is the most similar to the one at his/her hand. Note that in some cases, when each data trajectory is viewed as a whole, none is similar enough to the query trajectory, e.g., all data trajectories are relatively long while the query trajectory is relatively short.

We note that a more general query is to find the top-k similar subtrajectories to a query trajectory, which reduces to the user's query as described above when k = 1. In this paper, we stick to the setting of k = 1 since extending the techniques for the setting of k = 1 to general settings of k is straightforward. Specifically, the techniques for the setting of k = 1 in this paper are all based on a search process, which maintains the most similar subtrajectory found so far and updates it when a more similar subtrajectory is found during the process. These techniques could be adapted to general settings of k by simply maintaining the k most similar subtrajectories and updating them whenever a subtrajectory that is more similar than the k-th most similar one is found.

An intuitive solution to answer the user's query is to scan the data trajectories, and for each one, compute its subtrajectory that is the most similar to the query one based on some similarity measurement and update the most similar subtrajectory found so far if necessary.
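This intuitive scan can be sketched in a few lines of Python; the similarity function below is an illustrative stand-in for the abstract Θ (any of the measurements reviewed later in this section could be plugged in), and points are modeled as plain tuples:

```python
def most_similar_subtrajectory(T, Tq, sim):
    # Scan all n(n+1)/2 subtrajectories T[i, j] (0-based, inclusive) and
    # keep the one most similar to the query trajectory Tq.
    n = len(T)
    best, best_sim = None, float("-inf")
    for i in range(n):
        for j in range(i, n):
            s = sim(T[i:j + 1], Tq)
            if s > best_sim:
                best, best_sim = (i, j), s
    return best, best_sim

# Toy similarity, an illustrative stand-in for Theta: larger when the sizes
# and the first/last coordinates of the two trajectories agree.
def toy_sim(A, B):
    return -(abs(len(A) - len(B)) + abs(A[0][0] - B[0][0]) + abs(A[-1][0] - B[-1][0]))
```

For example, with T = [(9,), (0,), (1,), (9,)] and Tq = [(0,), (1,)], the scan returns the subtrajectory spanning the second and third points.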
This solution could be further enhanced by employing indexing techniques such as the R-tree based index and the inverted-file based index for pruning [45, 39], e.g., the data trajectories that do not have any overlap with the query trajectory could usually be pruned. The key component of this solution (no matter whether indexing structures are used or not) is to compute, for a given data trajectory, its subtrajectory that is the most similar to a query trajectory. We formally define the problem corresponding to this procedure as follows.

Problem 1 (Similar Subtrajectory Search).
Given a data trajectory T = <p_1, p_2, ..., p_n> and a query trajectory T_q = <q_1, q_2, ..., q_m>, the similar subtrajectory search (SimSub) problem is to find a subtrajectory of T, denoted by T[i*, j*] (1 ≤ i* ≤ j* ≤ n), which is the most similar to T_q according to a trajectory similarity measurement Θ(·, ·), i.e., [i*, j*] = arg max_{1 ≤ i ≤ j ≤ n} Θ(T[i, j], T_q).

The SimSub problem relies on a similarity measurement Θ(T, T′), which captures the extent to which two trajectories T and T′ are similar to each other. The larger the similarity Θ(T, T′) is, the more similar T and T′ are. In the literature, several "dissimilarity measurements" have been proposed for Θ(·, ·) such as DTW [46], Frechet [2], LCSS [37], ERP [6], EDR [7], EDS [44], EDwP [29], and t2vec [18]. Different measurements have different merits and suit different application scenarios. In this paper, we assume an abstract similarity measurement Θ(·, ·), which could be instantiated with any of these existing measurements by applying some inverse operation such as taking the ratio between 1 and a distance.

The SimSub problem assumes an abstract similarity measurement and the techniques developed could be applied to any existing measurement. Since the time complexity analysis of the algorithms proposed in this paper relies on the time complexities of computing a specific measurement in several different cases, in this part, we review three existing measurements, namely t2vec [18], DTW [46], and Frechet [2], and discuss their time complexities in different cases as background knowledge. DTW and Frechet are the most widely used measurements, and t2vec is the most recently proposed one, which is a data-driven measurement.

We denote by Φ the time complexity of computing the similarity between a general subtrajectory of T and T_q from scratch, by Φ_inc the time complexity of computing Θ(T[i, j], T_q) (1 ≤ i < j ≤ n) incrementally assuming that Θ(T[i, j−1], T_q) has been computed already, and by Φ_ini the time complexity of computing Θ(T[i, i], T_q) (1 ≤ i ≤ n) from scratch since it cannot be computed incrementally. As will be discussed later, Φ_inc and Φ_ini are usually much smaller than Φ across different similarity measurements.

t2vec [18]. t2vec is a data-driven similarity measure based on deep representation learning.
It adapts a sequence-to-sequence framework based on RNN [8] and takes the final hidden vector of the encoder [30] to represent a trajectory. It computes the similarity between two trajectories based on the Euclidean distance between their representations as vectors.

Given T and T_q, it takes O(n) and O(m) time to compute their hidden vectors, respectively, and O(1) time to compute the Euclidean distance between the two vectors [18]. Therefore, we know Φ = O(n + m + 1) = O(n + m). Since, in the context studied in this paper, we need to compute the similarities between many subtrajectories and a query trajectory T_q, we assume that the representation of T_q under t2vec is computed once and re-used many times, i.e., the cost of computing the representation of T_q, which is O(m), could be amortized among all computations of similarity, and then that for each one could be neglected. Because of the sequence-to-sequence nature of t2vec, given the representation of T[i, j−1], it takes O(1) time to compute that of T[i, j] (1 ≤ i < j ≤ n). Therefore, we know Φ_inc = O(1). Besides, we know Φ_ini = O(1) since the subtrajectory involved in the computation of similarity, i.e., T[i, i] (1 ≤ i ≤ n), has its size equal to 1.

DTW [46].
Given a data trajectory T = <p_1, p_2, ..., p_n> and a query trajectory T_q = <q_1, q_2, ..., q_m>, the DTW distance is defined as below:

D_{i,j} = sum_{h=1}^{i} d(p_h, q_1), if j = 1;
D_{i,j} = sum_{k=1}^{j} d(p_1, q_k), if i = 1;
D_{i,j} = d(p_i, q_j) + min(D_{i−1,j−1}, D_{i−1,j}, D_{i,j−1}), otherwise; (1)

where D_{i,j} denotes the DTW distance between T[1, i] and T_q[1, j] and d(p_i, q_j) is the distance between p_i and q_j (typically the Euclidean distance, which could be computed in O(1) time).

Consider Φ. It is clear that Φ = O(n · m) since it needs to compute all pairwise distances between a point in a subtrajectory of T and a point in T_q, and in general, the subtrajectory has its size of O(n) and T_q has its size of m. Consider Φ_inc. This should be the same as the time complexity of computing D_{i,m} given that D_{i−1,m} has been computed. Since D_{i−1,m} has been computed, we can safely assume that D_{i−1,1}, D_{i−1,2}, ..., D_{i−1,m} have been computed also according to Equation 1 (note that we can always make this hold by enforcing that we compute D_{i−1,m} or any other DTW distance in this way). Therefore, in order to compute D_{i,m}, we compute D_{i,1}, D_{i,2}, ..., D_{i,m} sequentially, each of which would take O(1) time with the information of D_{i−1,k} (1 ≤ k ≤ m) all available. That is, it takes O(m) time to compute D_{i,m}, and thus we know that Φ_inc = O(m). Consider Φ_ini. We know Φ_ini = O(m) since T[i, i] (1 ≤ i ≤ n) has its size always equal to 1 and T_q has its size of m.

Frechet [2].
Given a data trajectory T = <p_1, p_2, ..., p_n> and a query trajectory T_q = <q_1, q_2, ..., q_m>, the Frechet distance is defined as below:

F_{i,j} = max_{h=1}^{i} d(p_h, q_1), if j = 1;
F_{i,j} = max_{k=1}^{j} d(p_1, q_k), if i = 1;
F_{i,j} = max(d(p_i, q_j), min(F_{i−1,j−1}, F_{i−1,j}, F_{i,j−1})), otherwise; (2)

where F_{i,j} denotes the Frechet distance between T[1, i] and T_q[1, j] and d(p_i, q_j) is the distance between p_i and q_j (typically the Euclidean distance, which could be computed in O(1) time). When the Frechet distance is used, we have Φ = O(n · m), Φ_inc = O(m), and Φ_ini = O(m), based on similar analysis as for the DTW distance.

The summary of Φ, Φ_inc and Φ_ini for the similarity measurements corresponding to the distance measurements DTW, Frechet and t2vec is presented in Table 1.

Table 1: Time complexities of computing the similarity between a subtrajectory of T and T_q in three cases
Time complexities   | t2vec    | DTW     | Frechet
Φ (general)         | O(n + m) | O(n · m) | O(n · m)
Φ_inc (incremental) | O(1)     | O(m)    | O(m)
Φ_ini (initial)     | O(1)     | O(m)    | O(m)
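To make the two recurrences concrete, here is a minimal Python sketch of both distances. The point format, the Euclidean ground distance, and the single-rolling-row layout are illustrative assumptions, not the paper's implementation; the row-by-row layout is what makes the O(m) incremental step Φ_inc possible:

```python
import math

def d(p, q):
    # Ground distance between two points (Euclidean, computable in O(1)).
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dtw(T, Tq):
    # DTW via Equation (1), kept as one rolling row: new[j] holds D_{i,j};
    # producing row i from row i-1 costs O(m), matching Phi_inc.
    row = None
    for i, p in enumerate(T):
        new = [0.0] * len(Tq)
        for j, q in enumerate(Tq):
            cost = d(p, q)
            if i == 0 and j == 0:
                new[j] = cost
            elif i == 0:
                new[j] = new[j - 1] + cost            # boundary i = 1
            elif j == 0:
                new[j] = row[j] + cost                # boundary j = 1
            else:
                new[j] = cost + min(row[j - 1], row[j], new[j - 1])
        row = new
    return row[-1]

def frechet(T, Tq):
    # Discrete Frechet via Equation (2): the same structure with the sums
    # replaced by max, and the same O(m)-per-row incremental behavior.
    row = None
    for i, p in enumerate(T):
        new = [0.0] * len(Tq)
        for j, q in enumerate(Tq):
            cost = d(p, q)
            if i == 0 and j == 0:
                new[j] = cost
            elif i == 0:
                new[j] = max(new[j - 1], cost)
            elif j == 0:
                new[j] = max(row[j], cost)
            else:
                new[j] = max(cost, min(row[j - 1], row[j], new[j - 1]))
        row = new
    return row[-1]
```

Since only the previous row is retained, extending a subtrajectory by one point reuses all earlier work, which is exactly the incremental computation the complexity analysis above relies on.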
4. NON-LEARNING BASED ALGORITHMS
In this part, we introduce three types of algorithms, namely an exact algorithm ExactS, an approximate algorithm SizeS, and splitting-based algorithms including PSS, POS and POS-D. The ExactS algorithm is based on an exhaustive search with some careful implementation and has the highest complexity; the SizeS algorithm is inspired by existing studies on subsequence matching [15, 49] and provides a tunable parameter for controlling the trade-off between efficiency and effectiveness; and the splitting-based algorithms are based on the idea of splitting the data trajectory for constructing subtrajectories as candidates of the solution and run the fastest. A summary of the time complexities of these algorithms is presented in Table 2.
Let T be a data trajectory and T_q be a query trajectory. The ExactS algorithm enumerates all possible subtrajectories T[i, j] (1 ≤ i ≤ j ≤ n) of the data trajectory T, computes the similarity between each T[i, j] and T_q, i.e., Θ(T[i, j], T_q), and then returns the one with the greatest similarity. For better efficiency, ExactS computes the similarities between the subtrajectories and T_q incrementally as much as possible, as follows. It involves n iterations, and in the i-th iteration, it computes the similarity between each subtrajectory starting from the i-th point and the query trajectory in an ascending order of the ending points, i.e., it computes Θ(T[i, i], T_q) (from scratch) first and then computes Θ(T[i, i+1], T_q), ..., Θ(T[i, n], T_q) sequentially and incrementally. During the process, it maintains the subtrajectory that is the most similar to the query one, among those that have been traversed so far. As could be verified, it would traverse all possible subtrajectories after n iterations. The ExactS algorithm with this implementation is presented in Algorithm 1.

Table 2: Time complexities of algorithms (n̂ denotes the number of points at which splits are done; n̂ << n)
Algorithms | abstract similarity measurement | t2vec | DTW | Frechet
ExactS | O(n · (Φ_ini + n · Φ_inc)) | O(n²) | O(n² · m) | O(n² · m)
SizeS | O(n · (Φ_ini + (m + ξ) · Φ_inc)) | O((ξ + m) · n) | O((ξ + m) · n · m) | O((ξ + m) · n · m)
PSS, POS, POS-D | O(n̂ · Φ_ini + n · Φ_inc) | O(n) | O(n · m) | O(n · m)
RLS, RLS-Skip (learning-based) | O(n̂ · Φ_ini + n · Φ_inc) | O(n) | O(n · m) | O(n · m)

Algorithm 1: ExactS
Input: A data trajectory T and a query trajectory T_q
Output: A subtrajectory of T that is the most similar to T_q
T_best ← ∅; Θ_best ← 0
for all 1 ≤ i ≤ |T| do
    compute Θ(T[i, i], T_q)
    if Θ(T[i, i], T_q) > Θ_best then
        T_best ← T[i, i]; Θ_best ← Θ(T[i, i], T_q)
    for all i + 1 ≤ j ≤ |T| do
        compute Θ(T[i, j], T_q) based on Θ(T[i, j−1], T_q)
        if Θ(T[i, j], T_q) > Θ_best then
            T_best ← T[i, j]; Θ_best ← Θ(T[i, j], T_q)
return T_best

Consider the time complexity of ExactS. Since there are n iterations, and in each iteration, the time complexity of computing Θ(T[i, i], T_q) is Φ_ini and the time complexity of computing Θ(T[i, i+1], T_q), ..., and Θ(T[i, n], T_q) is O(n · Φ_inc), we know that the overall time complexity is O(n · (Φ_ini + n · Φ_inc)).

We note that for some specific similarity measurements, there may exist algorithms that have better time complexity than ExactS. For example, the Spring algorithm [31], which finds the most similar subsequence of a data time series to a query one, is applicable to the SimSub problem and has the time complexity of O(nm). The major idea of Spring is a dynamic programming process for computing the DTW distance between the data time series and the query one, where the latter is padded with a fictitious point that could be aligned with any point of the data time series with distance equal to 0 (so as to cover all possible suffixes of the data time series). Nevertheless, Spring is designed for the specific similarity measurement DTW, while ExactS works for an abstract one that could be instantiated with any similarity measurement.

ExactS explores all possible n(n+1)/2 subtrajectories, many of which might be quite dissimilar from the query trajectory and could be ignored. For example, by following some existing studies on subsequence matching [15, 49], we could restrict our attention to only those subtrajectories which have similar sizes as the query one, for better efficiency.
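Algorithm 1's doubly incremental enumeration can be sketched in Python as follows. The similarity is instantiated here as 1/(1 + DTW distance), an illustrative choice (the paper's Θ is abstract), and the inner loop reuses one DTW row per end-point extension, which is exactly the Φ_inc step:

```python
import math

def d(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def exact_s(T, Tq):
    # ExactS sketch: for each start point i, extend the end point j one
    # position at a time, updating a single DTW row per extension.
    n, m = len(T), len(Tq)
    best, best_sim = None, -1.0
    for i in range(n):                    # start point of the subtrajectory
        row = None                        # DTW row for T[i..j] vs Tq
        for j in range(i, n):             # extend the end point
            new = [0.0] * m
            for k in range(m):
                cost = d(T[j], Tq[k])
                if row is None and k == 0:
                    new[k] = cost
                elif row is None:
                    new[k] = new[k - 1] + cost
                elif k == 0:
                    new[k] = row[0] + cost
                else:
                    new[k] = cost + min(row[k - 1], row[k], new[k - 1])
            row = new                     # O(m) work: the Phi_inc step
            sim = 1.0 / (1.0 + row[-1])   # illustrative instantiation of Theta
            if sim > best_sim:
                best, best_sim = (i, j), sim
    return best, best_sim
```

Note that 0-based indices are used here, so the returned pair (i, j) corresponds to T[i+1, j+1] in the paper's notation.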
The resulting algorithm, which we call SizeS, enumerates all subtrajectories that have their sizes within the range [m − ξ, m + ξ], where ξ ∈ [0, n − m] is a pre-defined parameter that controls the trade-off between the efficiency and effectiveness of the algorithm. Again, we adopt the strategy of incremental computation for the similarities between those subtrajectories starting from the same point and the query trajectory. We analyze the time complexity of SizeS as follows. The time complexity of computing the similarities between all subtrajectories starting from a specific point and having their sizes within the range [m − ξ, m + ξ] is O(Φ_ini + (m − ξ − 1) · Φ_inc + 2ξ · Φ_inc), where Φ_ini + (m − ξ − 1) · Φ_inc is the cost of computing Θ(T[i, i + m − ξ − 1], T_q) and 2ξ · Φ_inc is the cost of computing Θ(T[i, j], T_q) for j ∈ [i + m − ξ, i + m + ξ − 1], i.e., it is O(Φ_ini + (m + ξ) · Φ_inc). Therefore, the overall time complexity of SizeS is O(n · (Φ_ini + (m + ξ) · Φ_inc)). For example, when DTW or Frechet is used, it is O(n · (m + (m + ξ) · m)) = O((ξ + m) · n · m), and when t2vec is used, it is O(n · (1 + (ξ + m) · 1)) = O((ξ + m) · n).

In summary, SizeS achieves better efficiency than ExactS at the cost of its effectiveness. Besides, SizeS still needs to explore O(ξ · n) subtrajectories, which restricts its application to small and moderate datasets only. Unfortunately, SizeS may return a solution which is arbitrarily worse than the best one. We illustrate this in the technical report version [42] due to the page limit.

The ExactS algorithm is costly since it explores O(n²) subtrajectories. The SizeS algorithm runs faster than ExactS since it explores about O(ξ · n) subtrajectories (ξ << n). Thus, an intuitive idea to push the efficiency further up is to explore fewer subtrajectories.
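As a concrete illustration of the SizeS enumeration just described, here is a minimal sketch. The similarity function is a pluggable stand-in for the abstract Θ, and for clarity the sketch recomputes it from scratch per candidate, whereas the paper's SizeS computes the similarities incrementally per start point:

```python
def size_s(T, Tq, xi, sim):
    # SizeS sketch: only subtrajectories whose sizes fall within
    # [m - xi, m + xi] are considered (0-based, inclusive slices).
    n, m = len(T), len(Tq)
    best, best_sim = None, float("-inf")
    for i in range(n):
        for j in range(i, n):
            size = j - i + 1
            if size < m - xi:
                continue            # too short: keep extending the end point
            if size > m + xi:
                break               # too long: move to the next start point
            s = sim(T[i:j + 1], Tq)
            if s > best_sim:
                best, best_sim = (i, j), s
    return best, best_sim
```

Setting ξ = 0 keeps only subtrajectories of exactly the query's size; larger ξ trades efficiency for effectiveness, as in the analysis above.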
In the following, we design a series of three approximate algorithms, which all share the idea of splitting a data trajectory into several subtrajectories and returning the one that is the most similar to the query trajectory. These algorithms differ from each other in the heuristics they use for deciding where to split the data trajectory. With this splitting strategy, the number of subtrajectories that would be explored is bounded by n and is, in practice, much smaller than n. We describe these algorithms as follows.

(1) Prefix-Suffix Search (PSS). The PSS algorithm is a greedy one, which maintains a variable T_best storing the subtrajectory that is the most similar to the query trajectory found so far. Specifically, it scans the points of the data trajectory T in the order of p_1, p_2, ..., p_n. When it scans p_i, it computes the similarities between the query trajectory T_q and the two subtrajectories that would be formed if it split T at p_i, i.e., T[h, i] and T[i, n], where p_h is the point following the one at which the last split was done, if any, and the first point p_1 otherwise. In particular, we replace the computation of the similarity between the suffix T[i, n] and the query trajectory with that between their reversed versions, denoted by T[i, n]^R and T_q^R, respectively. This is because (1) Θ(T[i, n]^R, T_q^R) can be computed incrementally based on Θ(T[i+1, n]^R, T_q^R), and (2) Θ(T[i, n]^R, T_q^R) and Θ(T[i, n], T_q) are equal for some similarity measurements such as DTW and Frechet, and positively correlated for others such as t2vec, as we found via experiments. If either of these two similarities is larger than the best-known similarity, PSS performs a split operation at p_i and updates T_best accordingly; otherwise, it continues to scan the next point p_{i+1}. At the end, it returns T_best. The procedure of PSS is presented in Algorithm 2.

Figure 1: A problem input.

Table 3: Illustration of PSS with the DTW distance. Initially, h = 1, T_best = ∅ and Θ_best = 0.

Point | Prefix                 | Suffix                     | Split | h | Θ_best | T_best
p_1   | Θ(T[1,1], T_q) = 0.124 | Θ(T[1,5]^R, T_q^R) = 0.150 | Yes   | 2 | 0.150  | T[1,5]
p_2   | Θ(T[2,2], T_q) = 0.236 | Θ(T[2,5]^R, T_q^R) = 0.227 | Yes   | 3 | 0.236  | T[2,2]
p_3   | Θ(T[3,3], T_q) = 0.183 | Θ(T[3,5]^R, T_q^R) = 0.215 | No    | 3 | 0.236  | T[2,2]
p_4   | Θ(T[3,4], T_q) = 0.236 | Θ(T[4,5]^R, T_q^R) = 0.215 | No    | 3 | 0.236  | T[2,2]
p_5   | Θ(T[3,5], T_q) = 0.215 | Θ(T[5,5]^R, T_q^R) = 0.152 | No    | 3 | 0.236  | T[2,2]
Result: T_best = T[2,2] with Θ_best = 0.236.

To illustrate, consider the example shown in Figure 1, where T is a data trajectory with 5 points p_1, ..., p_5 and T_q is a query trajectory with 3 points q_1, q_2, q_3. Suppose that we measure the similarity between two trajectories as the ratio of 1 over the DTW distance between them. The process of the PSS algorithm is depicted in Table 3. When it scans p_1, it considers two subtrajectories, namely T[1,1] and T[1,5]. It performs a split at p_1 and updates h, Θ_best and T_best accordingly, as shown in the row for p_1 of the table. It continues to scan p_2, considers the two subtrajectories T[2,2] and T[2,5], performs a split at p_2, and updates h, Θ_best and T_best. It then scans p_3, p_4, and p_5 and does not perform a split at any of them; therefore, h, Θ_best and T_best are kept unchanged. Finally, it returns T_best, i.e., T[2,2], whose similarity to T_q (the ratio of 1 over their DTW distance) is 0.236.

We analyze the time complexity of PSS as follows. When PSS scans a point p_i, the time costs include that of computing Θ(T[h, i], T_q) and that of computing Θ(T[i, n]^R, T_q^R). Consider the former part. If i = h, it is Φ_ini. If i ≥ h + 1, it is Φ_inc, since Θ(T[h, i], T_q) can be computed based on Θ(T[h, i−1], T_q) incrementally. Consider the latter part. It is simply O(Φ_inc). In conclusion, the time complexity of PSS is O(n_s · (Φ_ini + Φ_inc) + (n − n_s) · Φ_inc) = O(n_s · Φ_ini + n · Φ_inc), where n_s is the number of points at which splits are done. For example, when DTW or Frechet is used, the time complexity of PSS is O(n_s · m + n · m) = O(n · m), and when t2vec is used, it is O(n_s · 1 + n · 1) = O(n).
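The greedy prefix/suffix scan can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: it uses a plain quadratic DTW and recomputes similarities from scratch (the paper computes them incrementally to obtain the complexity stated above), with similarity taken as the ratio of 1 over the DTW distance, as in the running example; all function names are ours.

```python
import math

def dtw(a, b):
    """Plain O(|a|*|b|) dynamic-time-warping distance between 2D point lists."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def sim(a, b):
    """Similarity as the ratio of 1 over the DTW distance."""
    dist = dtw(a, b)
    return float("inf") if dist == 0 else 1.0 / dist

def pss(T, Tq):
    """Prefix-Suffix Search: scan T, greedily split, keep the best subtrajectory."""
    n = len(T)
    h = 0                        # 0-based index of the point after the last split
    best_sim, best = 0.0, None
    for i in range(n):
        pre = sim(T[h:i + 1], Tq)    # prefix T[h, i]
        suf = sim(T[i:], Tq)         # suffix T[i, n]; for DTW this equals the
                                     # similarity of the reversed versions
        if max(pre, suf) > best_sim:
            best_sim = max(pre, suf)
            best = T[h:i + 1] if pre > suf else T[i:]
            h = i + 1                # split at p_i
    return best, best_sim
```

Note that the greedy splits can lock in a suboptimal answer, which is exactly the weakness discussed at the end of this section: splitting one point too early can prevent the best subtrajectory from ever being formed as a prefix.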
(2) Prefix-Only Search (POS). In PSS, when it scans a point p_i, it considers two subtrajectories, namely T[h, i] and T[i, n]. An alternative is to consider the prefix T[h, i] only; one argument is that the suffix T[i, n] might be destroyed when further splits are conducted. A consequent benefit is that the time cost of computing Θ(T[i, n], T_q) is saved. We call this algorithm the POS algorithm. As can be verified, POS has the same time complexity as PSS, though the former runs faster in practice.

Algorithm 2: Prefix-Suffix Search (PSS)
Input: A data trajectory T and a query trajectory T_q;
Output: A subtrajectory of T that is similar to T_q;
  T_best ← ∅; Θ_best ← 0;
  compute Θ(T[n, n]^R, T_q^R);
  compute Θ(T[n−1, n]^R, T_q^R), Θ(T[n−2, n]^R, T_q^R), ..., Θ(T[1, n]^R, T_q^R) incrementally;
  h ← 1;
  forall 1 ≤ i ≤ |T| do
    compute Θ(T[h, i], T_q) incrementally if possible;
    if max{Θ(T[h, i], T_q), Θ(T[i, n]^R, T_q^R)} > Θ_best then
      Θ_best ← max{Θ(T[h, i], T_q), Θ(T[i, n]^R, T_q^R)};
      if Θ(T[h, i], T_q) > Θ(T[i, n]^R, T_q^R) then T_best ← T[h, i];
      else T_best ← T[i, n];
      h ← i + 1;
    end
  end
  return T_best;

(3) Prefix-Only Search with Delay (POS-D). POS performs a split operation whenever a prefix that is better than the best subtrajectory known so far is found. This looks a bit hasty and may prevent a better subtrajectory from being formed by extending the prefix with a few more points. Thus, we design a variant of POS, called Prefix-Only Search with Delay (POS-D). Whenever a prefix is found to be more similar to the query trajectory than the best subtrajectory known so far, POS-D continues to scan D more points and splits at whichever of these D + 1 points has the corresponding prefix the most similar to the query trajectory. It can be verified that with this delay mechanism, the time complexity of the algorithm does not change, though in practice the cost is slightly higher.

While these splitting-based algorithms, including PSS, POS and POS-D, return reasonably good solutions in practice, they may return solutions that are arbitrarily worse than the best one in theory. We illustrate this in the technical report version [42] due to the page limit.
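The delay mechanism can be sketched as follows; this is a hypothetical rendering (our names) that takes the similarity function as a parameter, so any of the measurements discussed in the paper could be plugged in.

```python
def pos_d(T, Tq, D, sim):
    """Prefix-Only Search with Delay: when the current prefix beats the
    best-known similarity, look ahead up to D more points and split where
    the prefix similarity is highest."""
    n = len(T)
    h = 0                                  # index of the point after the last split
    best_sim, best = 0.0, None
    i = 0
    while i < n:
        if sim(T[h:i + 1], Tq) > best_sim:
            # Delay: evaluate the prefixes ending at p_i, ..., p_{i+D}.
            cand = [(sim(T[h:j + 1], Tq), j)
                    for j in range(i, min(i + D + 1, n))]
            best_sim, j = max(cand)
            best = T[h:j + 1]
            h = i = j + 1                  # split after the winning prefix
        else:
            i += 1
    return best, best_sim
```

With D = 0, this degenerates to POS.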
5. REINFORCEMENT LEARNING BASED ALGORITHM
A splitting-based algorithm has its effectiveness rely on the quality of the process of splitting a data trajectory. To find a solution of high quality, split operations need to be performed at appropriate points, such that some subtrajectories that are similar to a query trajectory are formed and then explored. The three splitting-based algorithms, namely PSS, POS and POS-D, mainly use hand-crafted heuristics for deciding whether to perform a split operation at a specific point. This process of splitting a trajectory into subtrajectories is a typical sequential decision making process: the points are scanned sequentially and, for each point, a decision is made on whether or not to perform a split operation there. In this paper, we propose to model this process as a Markov decision process (MDP) [26] (Section 5.1), adopt a deep-Q-network (DQN) [21] for learning an optimal policy for the MDP (Section 5.2), develop an algorithm called reinforcement learning based search (RLS), which corresponds to a splitting-based algorithm that uses the learned policy for the process of splitting a data trajectory (Section 5.3), and finally present an augmented version of RLS, called RLS-Skip, with better efficiency (Section 5.4).

An MDP consists of four components, namely states, actions, transitions, and rewards, where (1) a state captures the environment that is taken into account for decision making by an agent; (2) an action is a possible decision that could be made by the agent; (3) a transition means that the state changes from one to another once an action is taken; and (4) a reward, which is associated with a transition, corresponds to some feedback indicating the quality of the action that causes the transition. We model the process of splitting a data trajectory as an MDP as follows.

(1) States. We denote a state by s. Suppose the point currently being scanned is p_t, and let p_h denote the point following the one at which the last split operation happened, if any, and p_1 otherwise. We define the state of the current environment as a triplet (Θ_best, Θ_pre, Θ_suf), where Θ_best is the largest similarity between a subtrajectory found so far and the query trajectory T_q, Θ_pre is Θ(T[h, t], T_q) and Θ_suf is Θ(T[t, n]^R, T_q^R). As can be noticed, a state captures information about the query trajectory, the data trajectory, the point at which the last split happened, and the point that is being scanned. Note that the state space is a three-dimensional continuous one.

(2) Actions. We denote an action by a. We define two actions, namely a = 1 and a = 0. The former means to perform a split operation at the point that is being scanned, and the latter means to move on to scan the next point.

(3) Transitions.
In the process of splitting a trajectory, given a current state s and an action a to take, the probability that we would observe a specific state s′ is unknown. We note that the method we use for solving the MDP in this paper is a model-free one and can solve the MDP problem even with its transition information unknown.

(4) Rewards. We denote a reward by r. We define the reward associated with the transition from state s to state s′ after action a is taken as (s′.Θ_best − s.Θ_best), where s′.Θ_best is the first component of state s′ and s.Θ_best is the first component of state s. With this reward definition, the goal of the MDP problem, which is to maximize the accumulative rewards, is consistent with that of the process of splitting a data trajectory, which is to form a subtrajectory with the greatest possible similarity to the query trajectory. To see this, consider that the process goes through a sequence of states s_1, s_2, ..., s_N and ends at s_N. Let r_1, r_2, ..., r_{N−1} denote the rewards received at these states, except for the termination state s_N. Then, when the future rewards are not discounted, we have

Σ_{t=1}^{N−1} r_t = Σ_{t=1}^{N−1} (s_{t+1}.Θ_best − s_t.Θ_best) = s_N.Θ_best − s_1.Θ_best,

where s_N.Θ_best corresponds to the similarity between the best subtrajectory found and the query trajectory T_q, and s_1.Θ_best corresponds to the best-known similarity at the beginning, i.e., 0. Therefore, maximizing the accumulative rewards is equivalent to maximizing the similarity between the subtrajectory to be found and T_q.

5.2 Deep-Q-Network (DQN) Learning

The core problem of an MDP is to find an optimal policy for the agent, i.e., a function π that specifies the action the agent should choose at a specific state so as to maximize the accumulative rewards. One type of commonly used methods is value-based methods [33, 21]. The major idea is as follows.
First, it defines an optimal action-value function Q*(s, a) (or Q function), which represents the maximum expected accumulative rewards one would receive by following any policy after seeing the state s and taking the action a. Second, it estimates Q*(s, a) using some method such as Q-learning [43] or deep-Q-network (DQN) [21]. Third, it returns the policy which, for a given state s, always chooses the action a that maximizes Q*(s, a).

In our MDP, the state space is a three-dimensional continuous one, and thus we adopt the DQN method. Specifically, we use deep Q learning with replay memory [22] for learning the Q function. This method maintains two neural networks. One is called the main network Q(s, a; θ), which is used to estimate the Q function. The other is called the target network Q̂(s, a; θ−), which is used to compute some form of loss for training the main network. Besides, it maintains a fixed-size pool called the replay memory, which contains the latest transitions; transitions are sampled from it uniformly and used for training the main network. The intuition is to avoid the correlation among consecutive transitions. The detailed procedure of DQN for our MDP is presented in Algorithm 3, which we go through as follows. We maintain a database D of data trajectories and a set D_q of query trajectories. It first initializes the replay memory M with some capacity, the main network Q(s, a; θ) with random weights, and the target network Q̂(s, a; θ−) by copying Q(s, a; θ) (Lines 1 - 3). Then, it involves a sequence of many episodes. For each episode, it samples a data trajectory T from D and a query trajectory T_q from D_q, both uniformly (Lines 4 - 5). It initializes a variable h such that p_h corresponds to the point following the one at which the last split operation is performed, if any, and p_1 otherwise (Line 6). It also initializes the state s_1 (Lines 7 - 8). Then, it proceeds with |T| time steps.
At the t-th time step, it scans point p_t and selects an action using the ε-greedy strategy based on the main network, i.e., it performs a random action a_t with probability ε (0 < ε < 1) and a_t = argmax_a Q(s_t, a; θ) with probability (1 − ε) (Lines 9 - 10). If a_t = 1, it splits the trajectory at point p_t and updates h to be t + 1 (Lines 11 - 13). It then updates Θ_best if possible (Line 14). If the current point being scanned is the last point p_n, it terminates (Lines 15 - 17). Otherwise, it observes a new state s_{t+1} and the reward r_t (Lines 18 - 20). It then stores the experience (s_t, a_t, r_t, s_{t+1}) in the replay memory, samples a minibatch of experiences, and uses it to perform a gradient descent step for updating θ wrt a loss function (Lines 21 - 23). The loss function for one experience (s, a, r, s′) is as follows.

L(θ) = (y − Q(s, a; θ))²    (3)

where y is equal to r if s′ is a termination state and r + γ · max_{a′} Q̂(s′, a′; θ−) otherwise. Finally, it updates the target network Q̂(s, a; θ−) with the main network Q(s, a; θ) at the end of each episode (Line 25). A graphical illustration of the method is shown in Figure 2.

Algorithm 3: Deep-Q-Network (DQN) Learning with Experience Replay
Input: A database D of data trajectories and a set D_q of query trajectories;
Output: Learned action-value function Q(s, a; θ);
 1: initialize the replay memory M;
 2: initialize the main network Q(s, a; θ) with random weights θ;
 3: initialize the target network Q̂(s, a; θ−) with weights θ− = θ;
 4: for episode = 1, 2, 3, ... do
 5:   sample a data and query trajectory T, T_q;
 6:   h ← 1;
 7:   Θ_best ← 0; Θ_pre ← Θ(T[h, h], T_q); Θ_suf ← Θ(T[h, n]^R, T_q^R);
 8:   observe the first state s_1 = (Θ_best, Θ_pre, Θ_suf);
 9:   for each step 1 ≤ t ≤ |T| do
10:     select a random action a_t with probability ε and select action a_t = argmax_a Q(s_t, a; θ) with probability (1 − ε);
11:     if a_t = 1 then
12:       h ← t + 1;
13:     end
14:     Θ_best ← max{s_t.Θ_best, s_t.Θ_pre, s_t.Θ_suf};
15:     if t = |T| then
16:       break;
17:     end
18:     Θ_pre ← Θ(T[h, t + 1], T_q); Θ_suf ← Θ(T[t + 1, n]^R, T_q^R);
19:     observe the next state s_{t+1} = (Θ_best, Θ_pre, Θ_suf);
20:     observe the reward r_t = s_{t+1}.Θ_best − s_t.Θ_best;
21:     store the experience (s_t, a_t, r_t, s_{t+1}) in the replay memory M;
22:     sample a random minibatch of experiences from M uniformly;
23:     perform a gradient descent step on the loss as computed by Equation (3) wrt θ;
24:   end
25:   copy the main network Q(s, a; θ) to Q̂(s, a; θ−);
26: end

Once we have estimated the Q function Q(s, a; θ) via deep Q learning with experience replay, we use the policy which, for a given state s, always takes the action that maximizes Q(s, a; θ) for the process of splitting a data trajectory. Among all subtrajectories that are formed as a result of the process, we return the one with the greatest similarity to the query trajectory T_q. We call this algorithm reinforcement learning based search (RLS).
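The machinery of Algorithm 3 (replay memory, ε-greedy selection, TD targets from a periodically copied target network, gradient steps on the squared loss of Equation (3)) can be sketched compactly. This is a hypothetical, minimal rendering with our own names: a linear model stands in for the paper's small feedforward network, and the sketch is agnostic to the environment, which is supplied as reset/step callables.

```python
import random
import numpy as np

class ReplayMemory:
    """Fixed-size pool of transitions, sampled uniformly to break the
    correlation among consecutive transitions."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, []
    def store(self, transition):
        self.data.append(transition)
        if len(self.data) > self.capacity:
            self.data.pop(0)
    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

class LinearQ:
    """Linear stand-in for the main/target networks: Q(s, .; theta) = W s + b."""
    def __init__(self, state_dim, n_actions, lr=0.05):
        self.W = np.zeros((n_actions, state_dim))
        self.b = np.zeros(n_actions)
        self.lr = lr
    def q(self, s):
        return self.W @ s + self.b
    def update(self, s, a, y):
        # One SGD step on the squared TD loss (y - Q(s, a; theta))^2.
        err = y - self.q(s)[a]
        self.W[a] += self.lr * err * s
        self.b[a] += self.lr * err

def dqn_train(env_reset, env_step, episodes=300, gamma=0.95, eps=0.2, batch=16):
    main = LinearQ(state_dim=3, n_actions=2)
    target = LinearQ(state_dim=3, n_actions=2)
    memory = ReplayMemory(2000)
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            # Epsilon-greedy action selection based on the main network.
            a = random.randrange(2) if random.random() < eps \
                else int(np.argmax(main.q(s)))
            s2, r, done = env_step(s, a)
            memory.store((s, a, r, s2, done))
            for (si, ai, ri, s2i, di) in memory.sample(batch):
                y = ri if di else ri + gamma * float(np.max(target.q(s2i)))
                main.update(si, ai, y)
            s = s2
        # Copy the main network to the target network after each episode.
        target.W, target.b = main.W.copy(), main.b.copy()
    return main
```

On a toy one-step task whose reward is the chosen component of the state, the learned Q values approach those components, so the greedy policy picks the larger one.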
Essentially, RLS is the same as PSS except that it uses a policy learned via DQN instead of hand-crafted heuristics for making decisions on how to split a data trajectory.

Figure 2: Deep Q learning with experience replay.

RLS has the same time complexity as PSS, since both RLS and PSS make decisions based on the similarities of the subtrajectories being considered when scanning a point and the best-known similarity: (1) RLS constructs a state involving them and feeds the state through the main network of DQN, which is O(1) given that the network is small (e.g., a few layers); and (2) PSS simply conducts some comparisons among the similarities, which is also O(1). In terms of effectiveness, RLS provides consistently better solutions than PSS as well as POS and POS-D, as will be shown in the empirical studies. The reason is possibly that RLS is based on a learned policy, which makes decisions more intelligently than simple human-crafted heuristics.

In the RLS algorithm, each point is considered as a candidate for performing a split operation. While this helps to attain a reasonably large space of subtrajectories for exploration and hence achieve good effectiveness, it is somewhat conservative and incurs the cost of decision making at every point. An alternative is to be a bit more optimistic and skip some points from being considered as places for split operations. The benefit is immediate, i.e., the cost of making decisions at these points is saved. Motivated by this, we propose to augment the MDP that is used by RLS by introducing k more actions (apart from the two existing ones: scanning the next point and performing a split operation), namely skipping 1 point, skipping 2 points, ..., skipping k points. Here, k is a hyperparameter, and skipping j points (j = 1, 2, ..., k) means to skip points p_{i+1}, p_{i+2}, ..., p_{i+j} and scan point p_{i+j+1} next, where p_i is the point that is being scanned. All other components of the MDP are kept the same as those for RLS. Note that when k = 0, this MDP reduces to the original one for RLS. We call the algorithm based on this augmented MDP RLS-Skip.

To illustrate, consider again the example shown in Figure 1. Suppose that a policy has been learned using the DQN method, captured by the main network Q(s, a; θ). The process of RLS-Skip is depicted in Table 4. Suppose the parameter k is equal to 1, which implies that there are three possible actions: 0 (no split), 1 (split), and 2 (no split and skip of 1 point). In addition, we write Θ_pre = Θ(T[i, j], T_q) as Θ_pre = T[i, j] and Θ_suf = Θ(T[i, j]^R, T_q^R) as Θ_suf = T[i, j] for simplicity.

Table 4: Illustration of RLS-Skip with the DTW distance. Initially, h = 1, T_best = ∅ and Θ_best = 0.

Point | State                                                              | Action                                  | h | Θ_best | T_best
p_1   | s_1 = (Θ_best = 0, Θ_pre = T[1,1] = 0.124, Θ_suf = T[1,5] = 0.150) | a_1 = argmax_a Q(s_1, a; θ) = 1: split  | 2 | 0.150  | T[1,5]
p_2   | s_2 = (Θ_best = 0.150, Θ_pre = T[2,2] = 0.236, Θ_suf = T[2,5] = 0.227) | a_2 = argmax_a Q(s_2, a; θ) = 2: skip | 2 | 0.236  | T[2,2]
p_3   | (skipped)                                                          | –                                       | – | –      | –
p_4   | s_3 = (Θ_best = 0.236, Θ_pre = T[2,4] = …, Θ_suf = T[4,5] = 0.215) | a_3 = argmax_a Q(s_3, a; θ) = 1: split  | 5 | …      | T[2,4]
p_5   | s_4 = (Θ_best = …, Θ_pre = T[5,5] = 0.152, Θ_suf = T[5,5] = 0.152) | a_4 = argmax_a Q(s_4, a; θ) = 0: no-split | 5 | …    | T[2,4]
Result: T_best = T[2,4] with Θ_best = ….

At the very beginning, it initializes h, Θ_best and T_best. It then scans point p_1, observes the first state s_1 as (Θ_best = 0, Θ_pre = T[1,1] = 0.124, Θ_suf = T[1,5] = 0.150), and takes the action a_1 = argmax_a Q(s_1, a; θ) = 1, meaning to perform a split operation at p_1. It then updates h, Θ_best, and T_best as shown in the row for p_1 of the table. It continues to scan point p_2, observes the second state s_2, and takes the action a_2 = argmax_a Q(s_2, a; θ) = 2, meaning to skip the next 1 point, i.e., p_3. It keeps h unchanged (since no split is done) but updates Θ_best and T_best to be 0.236 and T[2,2], since T[2,2] is the subtrajectory with the largest similarity among all subtrajectories that have been considered. As a result of the skipping, it scans point p_4 next and proceeds similarly. It performs a split operation when scanning point p_4 and terminates after scanning point p_5. At the end, it returns T[2,4].

The similarities involved in the state at a point are computed incrementally based on those computed at the points before it. Thus, applying the skipping strategy alone would not help much in reducing the time cost, since the cost of maintaining the states dominates that of making decisions. To fully unleash the power of the skipping strategy, we propose to ignore the skipped points when maintaining the states. That is, to maintain the state (Θ_best, Θ_pre, Θ_suf) at a point p_i, we compute Θ_best and Θ_suf in the same way as in RLS, and Θ_pre as the similarity between the query trajectory and the subtrajectory consisting of those points that are before p_i and have not been skipped. Here, the prefix subtrajectory corresponds to a simplification of that used in RLS [19]. While RLS-Skip has the same worst-case time complexity as RLS (e.g., it reduces to RLS when no skipping operations happen), the cost of maintaining the states for RLS-Skip is much smaller. As shown in our empirical studies, RLS-Skip runs significantly faster than RLS as well as PSS, POS and POS-D. In addition, RLS-Skip and RLS do not provide theoretical guarantees on the approximation quality due to their learning nature; the proofs can be found in the appendix of the technical report [42]. Nevertheless, they work quite well in practice (e.g., RLS has an approximation ratio smaller than 1.1 for all similarity measurements and on all datasets (Figure 3)). In addition, the problem instances that we constructed for proving the negative results rarely happen in practice, as confirmed by the effectiveness results on real datasets.
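To make the augmented MDP concrete, the following toy environment (a hypothetical sketch with our own names, taking the similarity function as a parameter) implements the state triplet, the split/no-split/skip actions, and the reward s′.Θ_best − s.Θ_best; by the telescoping argument of Section 5.1, the undiscounted return of an episode equals the final Θ_best.

```python
class SplitSkipEnv:
    """Toy MDP for trajectory splitting with skips.
    State: (best, pre, suf).  Actions: 0 = no split, 1 = split at the
    current point, 1 + j (1 <= j <= k) = skip the next j points."""

    def __init__(self, T, Tq, sim, k=1):
        self.T, self.Tq, self.sim, self.k = T, Tq, sim, k

    def reset(self):
        self.h, self.t, self.best = 0, 0, 0.0
        return self._state()

    def _state(self):
        pre = self.sim(self.T[self.h:self.t + 1], self.Tq)   # prefix T[h, t]
        suf = self.sim(self.T[self.t:], self.Tq)             # suffix T[t, n]
        return (self.best, pre, suf)

    def step(self, action):
        old = self.best
        _, pre, suf = self._state()
        self.best = max(self.best, pre, suf)
        if action == 1:                 # split: the prefix restarts after p_t
            self.h = self.t + 1
            self.t += 1
        elif action >= 2:               # skip (action - 1) points
            self.t += action
        else:                           # no split: scan the next point
            self.t += 1
        done = self.t >= len(self.T)
        reward = self.best - old        # r = s'.best - s.best
        return (None if done else self._state()), reward, done
```

With k = 0, the environment reduces to the MDP used by RLS.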
6. EXPERIMENTS
We present the experimental set-up in Section 6.1 and then the experimental results in Section 6.2.
Datasets.
Our experiments are conducted on three real-world trajectory datasets. The first dataset, denoted by Porto, is collected from the city of Porto, Portugal; it consists of around 1.7 million taxi trajectories collected over 18 months, with a sampling interval of 15 seconds and a mean length of around 60. The second dataset, denoted by Harbin, involves around 1.2 million taxi trajectories collected from 13,000 taxis over 8 months in Harbin, China, with non-uniform sampling rates and a mean length of around 120. The third dataset, denoted by Sports, involves around 0.2 million soccer player and ball trajectories collected from STATS Sports, with a uniform sampling rate of 10 times per second and a mean length of around 170. Parameter Setting.
For training the t2vec model, we follow the original paper [18] by excluding trajectories that are short and using their parameter settings. For SizeS, we use the setting ξ = 5 (with the results of its effect shown later on). For POS-D, we vary the parameter D from 4 to 7, and since the results are similar, we use the setting D = 5. For the neural networks involved in the RL-based algorithms, i.e., RLS and RLS-Skip, we use a feedforward neural network with 2 layers. In the first layer, we use the ReLU function with 20 neurons, and in the second layer, we use the sigmoid function with 2 + k neurons as the output corresponding to the different actions, where for RLS we use k = 0 and for RLS-Skip we use k = 3 by default. In the training process, the size of the replay memory M is set at 2,000. We train our model on 25k random trajectory pairs, using Adam stochastic gradient descent with an initial learning rate of 0.001. The minimal ε is set at 0.05 with decay 0.99 for the ε-greedy strategy, and the reward discount rate γ is set at 0.95. Compared Methods.
We compare RL-based Search (RLS), RL-based Search with skipping (RLS-Skip) and the proposed non-learning based algorithms (Section 4), namely ExactS, SizeS, PSS, POS, and POS-D. For RLS and RLS-Skip, when t2vec is adopted, we ignore the Θ_suf component of a state based on empirical findings.

In addition, we consider three competitor methods, namely UCR [24, 27, 28], Spring [31], and Random-S. UCR was originally developed for searching the subsequences of a time series that are the most similar to a query time series, with the similarity based on the DTW distance. UCR enumerates all subsequences that are of the same length as the query time series and employs a rich set of techniques for pruning many of the subsequences. We adapt UCR for our similar subtrajectory search problem (details of the adaptations are provided in the appendix of the technical report version [42]). We note that UCR only works for DTW, but not for Frechet or t2vec. Spring is an existing algorithm for searching the subsequence of a time series that is the most similar to a query time series; it is designed for DTW. Random-S randomly samples a certain number of subtrajectories of a data trajectory and, among them, returns the one with the highest similarity to the query trajectory. Since these methods are either not general to all similarity measurements (e.g., UCR) or involve some parameter that is difficult to set (e.g., Random-S with its sample size parameter), we compare these competitor methods with our RLS-Skip algorithm only, in terms of effectiveness and efficiency.

Figure 3: Effectiveness for t2vec (a)-(c), DTW (d)-(f) and Frechet (g)-(i).

Figure 4: Efficiency without index (a)-(c) and with R-tree index (d)-(f) on Porto.
Evaluation Metrics.
We use three metrics to evaluate the effectiveness of an approximate algorithm. (1) Approximation Ratio (AR): the ratio between the dissimilarity wrt a query trajectory of the solution returned by an approximate algorithm and that of the solution returned by an exact algorithm. A smaller AR indicates a better algorithm. (2) Mean Rank (MR): we sort all the subtrajectories of a data trajectory in ascending order of their dissimilarities wrt a query trajectory; MR is the rank of the solution returned by an approximate algorithm. (3) Relative Rank (RR): a version of MR normalized by the total number of subtrajectories of a data trajectory. A smaller MR or RR indicates a better algorithm.
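Concretely, for a single query these metrics can be computed from the list of dissimilarities of all subtrajectories; a small sketch with our own helper names, assuming strictly positive dissimilarities and that the exact algorithm returns the rank-1 subtrajectory:

```python
def effectiveness_metrics(approx_dissim, all_dissims):
    """AR, rank (averaged over queries to obtain MR), and RR for one query.
    approx_dissim: dissimilarity of the approximate solution wrt the query;
    all_dissims:   dissimilarities of all subtrajectories of the data trajectory."""
    ranked = sorted(all_dissims)
    exact = ranked[0]                       # dissimilarity of the exact solution
    ar = approx_dissim / exact              # Approximation Ratio
    rank = ranked.index(approx_dissim) + 1  # 1-based rank of the returned solution
    rr = rank / len(ranked)                 # Relative Rank: rank / #subtrajectories
    return ar, rank, rr
```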
Evaluation Platform.
All the methods are implemented in Python 3.6. The implementation of RLS is based on Keras 2.2.0. The experiments are conducted on a server with 32 cores of Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz, 768GB RAM, and one Nvidia Tesla V100-SXM2 GPU. (1) Effectiveness results.
We randomly sample 10,000 trajectory pairs from a dataset, and for each pair we use one trajectory as the query trajectory to search for the most similar subtrajectory of the other one. Figure 3 shows the results. The results clearly show that RLS and RLS-Skip consistently outperform all other non-learning based approximate algorithms in terms of all three metrics, on all datasets, and under all three trajectory similarity measurements. For example, RLS outperforms POS-D, the best non-learning based algorithm when using t2vec, by 70% (resp. 83%) in terms of RR on Porto (resp. Harbin); RLS outperforms PSS, the best non-learning based algorithm when using DTW, by 25% (resp. 20%) in terms of MR on Porto (resp. Harbin); and RLS outperforms PSS, the best non-learning based algorithm when using Frechet, by 25% (resp. 20%) in terms of MR on Porto (resp. Harbin). Among PSS, POS, and POS-D, PSS performs the best for DTW and Frechet; however, for t2vec, PSS provides accuracy similar to POS and POS-D on Porto, but performs much worse on Harbin. The reason is that for DTW and Frechet, PSS computes exact similarity values for suffix subtrajectories, while for t2vec, it computes only approximate ones; therefore, PSS has relatively worse accuracy when used for t2vec. We also observe that SizeS is not competitive compared with the other approximate algorithms. In addition, RLS-Skip has its effectiveness a bit worse than RLS, but still better than the non-learning based algorithms, owing to the fact that it is based on a learned policy for decision making. (2) Efficiency results.
We prepare different databases of data trajectories by including different amounts of trajectories from a dataset, thereby varying the total number of points in a database. For each database, we randomly sample 10 query trajectories from the dataset, run for each query trajectory a query for finding the top-50 similar subtrajectories, and then collect the average running time over the 10 queries. The results of running time on the Porto dataset are shown in Figure 4, and those on the other datasets can be found in the technical report [42]. RLS-Skip runs the fastest since, on those points that have been skipped, the cost of maintaining the states and making decisions is saved; in contrast, none of the other algorithms skip points. ExactS has the longest running time, e.g., ExactS is usually around 7-15 times slower than PSS, POS, POS-D and RLS, and 20-30 times slower than RLS-Skip. RLS is slightly slower than PSS, POS and POS-D. This is because RLS makes the splitting decision via a learning model while the other three use a simple similarity comparison.

Figure 5: Effectiveness with varying query lengths.

Figure 6: Efficiency with varying query lengths.

(3) Scalability.
We investigate the scalability of all the algorithms based on the results reported in Figure 4. All the splitting-based algorithms, including PSS, POS, POS-D, RLS and RLS-Skip, scale well. (4) Working with indexes.
Following two recent studies [45, 39] on trajectory similarity search, we employ the Bounding Box R-tree Index for boosting the efficiency. It indexes the MBRs of data trajectories and prunes all those data trajectories whose MBRs do not intersect with the MBR of a given query trajectory. We note that, in theory, exact solutions might be filtered out by the index (e.g., the most similar subtrajectory may be part of a trajectory whose MBR does not intersect with that of the query), but in practice, this rarely happens. For example, as found in our experiments on the Porto dataset, when DTW and Frechet are used, the results returned when using the index and those when using no index are exactly the same, i.e., no results are missed; when t2vec is used, at most 20% of results are missed. Furthermore, for cases of finding approximate solutions, as most of the proposed algorithms do, missing some potential solutions for better efficiency is acceptable. Compared with the results without indexes in Figure 4(a)-(c), the running times using the R-tree index, as shown in Figure 4(d)-(f), are lower by around 20-30%.
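The index-based filtering reduces to an MBR intersection test; the sketch below (our own names, with a linear scan standing in for an actual R-tree) illustrates the pruning rule.

```python
def mbr(traj):
    """Minimum bounding rectangle (x1, y1, x2, y2) of a list of (x, y) points."""
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """Whether two MBRs overlap (touching counts as overlapping)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def prune(database, query):
    """Keep only the data trajectories whose MBR intersects the query's MBR."""
    q = mbr(query)
    return [T for T in database if intersects(mbr(T), q)]
```

As noted above, this filter can in theory discard the trajectory containing the most similar subtrajectory, which is why it is a pragmatic speed-up rather than an exact step.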
(5) The effect of query trajectory length. We prepare four groups of query trajectories from a dataset, namely G_1, G_2, G_3, and G_4, each with 10,000 trajectories, such that the lengths of the trajectories fall in four ranges that increase from G_1 to G_4. The effectiveness results on Porto are shown in Figure 5. As before, SizeS is not competitive: the most similar subtrajectory may not have a length similar to that of the query trajectory, and thus SizeS may miss high-quality results when its search space is constrained by the parameter ξ. The results of running time on Porto are shown in Figure 6. We notice that for t2vec, the running times of all the algorithms are almost unaffected by the query length; this is because for t2vec, the time complexity of computing a similarity is constant once the vector of the query trajectory is learned. For DTW and Frechet, the time complexity of computing a similarity increases with the query length, as shown in Table 2.

(6) The effect of skipping steps k. According to the results, the general trend is that with larger settings of k, RLS-Skip has its effectiveness drop but its efficiency grow, because it tends to skip more points. We present in Table 5 the results on Porto for DTW only, due to the page limit; we also report the portion of skipped points in the Porto dataset with 10,000 trajectories. Note that when k is set to 0, RLS-Skip degrades to RLS. For the other experiments, we choose k = 3 as a reasonable trade-off between effectiveness and efficiency.

Table 5: The effect of k for RLS-Skip.

Metrics   | k = 0  | k = 1  | k = 2  | k = 3  | k = 4  | k = 5
AR        | 1.028  | 1.039  | 1.042  | 1.044  | 1.055  | 1.069
MR        | 41.138 | 56.633 | 58.077 | 64.741 | 70.281 | 94.356
RR        | 3.5%   | 5.4%   | 5.6%   | 5.8%   | 6.3%   | 8.9%
Time (ms) | 55.2   | 39.8   | 38.5   | 35.8   | 31.8   | 22.9
Skip Pts  | 0%     | 3.1%   | 13.1%  | 17.7%  | 29.5%  | 47.6%

(7) The effect of parameter ξ. Figure 7 shows SizeS's RR and running time averaged over 10,000 trajectory pairs from the Porto dataset. As expected, as ξ grows, the RR of SizeS becomes better, but the running time increases and approaches that of ExactS.

Figure 7: The effect of soft margin ξ for SizeS.

(8) Comparison with similar trajectory search (SimTra).
The solution of similar trajectory search (SimTra) can be regarded as an approximate solution of the SimSub problem, because a data trajectory is by itself a subtrajectory. We compare the approximate solution by SimTra with that by the RLS algorithm, reporting the average results over 10,000 trajectory pairs. The results are shown in Table 6. The MR and RR of SimTra are around 10 times larger than those of SimSub for t2vec and 20 times larger for DTW and Frechet, which shows that SimTra is not a good approximation for SimSub, though SimTra runs faster than SimSub. (9) Comparison with algorithms for specific measurements (UCR and Spring).
In UCR and Spring, a point q_i from the query trajectory can be aligned only with those points p_j from the data trajectory T with j ∈ [i − R · |T|, i + R · |T|]. We vary the parameter R in this experiment. When R = 1, it reduces to the unconstrained DTW that is used in this paper. Essentially, R controls how accurately the DTW distance is computed: the higher R is, the more accurate (but also more costly) the computation is. We note that even when R = 1, UCR does not return exact solutions, since it considers only subtrajectories of the same size as the query one. For this part of the experiment, we drop the component Θ_suf when defining the MDP of RLS-Skip for better efficiency and call the resulting algorithm RLS-Skip+. The results are shown in Figure 8, where we vary the parameter R from 0 to 1. We notice that (1) RLS-Skip+ dominates UCR in terms of both efficiency and effectiveness; (2) the RR of UCR changes slightly from 60.1% (when R = 0) to 59.7% (when R = 1), which shows that the performance of UCR is insensitive to the parameter R; (3) for settings of R up to 0.3, RLS-Skip+ dominates Spring in terms of both effectiveness and efficiency; and (4) under other settings, RLS-Skip+ and Spring provide different trade-offs between effectiveness and efficiency.

Table 6: Comparison between trajectory similarity computation (SimTra) and subtrajectory similarity computation (SimSub).

Dataset | Problem | t2vec: AR / MR / RR / Time (ms) | DTW: AR / MR / RR / Time (ms)    | Frechet: AR / MR / RR / Time (ms)
Porto   | SimTra  | 1.313 / 156.153 / 23.3% / 28.5  | 2.100 / 752.831 / 70.7% / 18.1   | 1.883 / 559.462 / 56.5% / 19.2
Porto   | SimSub  | 1.098 / 18.323 / 3.0% / 39.6    | 1.028 / 41.138 / 3.5% / 55.2     | 1.034 / 34.162 / 3.6% / 69.6
Harbin  | SimTra  | 1.293 / 678.311 / 46.9% / 31.7  | 2.326 / 1218.908 / 72.2% / 27.1  | 1.891 / 854.042 / 53.9% / 28.6
Harbin  | SimSub  | 1.025 / 14.945 / 1.3% / 62.6    | 1.081 / 75.324 / 4.1% / 114.4    | 1.045 / 64.729 / 4.4% / 130.6
Sports  | SimTra  | 1.221 / 345.488 / 43.4% / 46.1  | 1.659 / 4291.666 / 59.8% / 107.5 | 1.403 / 3272.743 / 48.2% / 133.3
Sports  | SimSub  | 1.045 / 28.761 / 3.8% / 210.3   | 1.005 / 126.334 / 2.1% / 254.7   | 1.002 / 95.280 / 1.7% / 302.3

Figure 8: Comparison with UCR and Spring.

(10) Comparison with Random-S.
The results are shown in Figure 9, where we vary the sample size from 10 to 100; for each sample size, we run the algorithm 100 times and collect the average and standard deviation of the RR and running time. We notice that even for a relatively small sample size, e.g., 100, the running time of Random-S is almost the same as that of ExactS and significantly larger than that of RLS-Skip (around 25 times higher). This is because the subtrajectories sampled by Random-S could be quite different from one another, so their similarities cannot be computed incrementally as is done in ExactS. On the other hand, when the sample size is small, e.g., below 20, the effectiveness of Random-S degrades significantly and becomes clearly worse than that of RLS-Skip.

(11) Training time.
The training times of the RLS and RLS-Skip models on the different datasets are shown in Table 7. It normally takes a couple of hours to train a reinforcement learning model for RLS or RLS-Skip. Training RLS-Skip takes less time than training RLS, since we use the same number of trajectory pairs and epochs for both algorithms and RLS-Skip runs faster.

[Figure 9 panels: (a) Relative Rank (DTW); (b) Time Cost (DTW).]
Figure 9: Comparison with Random-S.

Table 7: Training time (hours).

             t2vec              DTW                Frechet
Dataset      RLS   RLS-Skip    RLS   RLS-Skip     RLS    RLS-Skip
Porto        7.2   4.4         10.1  4.8          10.6   5.5
Harbin       9.7   5.4         13.9  5.7          14.2   8.3
Sports       12.5  7.6         83.3  46.6         104.1  52.4
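The band constraint R used in the comparison with UCR and Spring in experiment (9) restricts which point pairs DTW may align. The sketch below is our own minimal illustration of such a Sakoe-Chiba-style band over 2D trajectory points, not the actual UCR or Spring implementation; the function name `banded_dtw` and the rule that widens the band to |m − n| so a warping path always exists are our choices.

```python
import math

def banded_dtw(q, t, R=1.0):
    """DTW between query q and trajectory t, where q[i] may only be
    aligned with t[j] for |i - j| <= w, a band of half-width w = R * |t|
    as in the paper's setup; R = 1 recovers unconstrained DTW."""
    m, n = len(q), len(t)
    # widening the band to at least |m - n| (our choice) keeps a path feasible
    w = max(abs(m - n), int(R * n))
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = math.dist(q[i - 1], t[j - 1])  # Euclidean point distance
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]
```

A smaller R prunes alignments and can only increase the returned distance, which is the accuracy/cost trade-off discussed above.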
7. CONCLUSION
In this paper, we study the similar subtrajectory search (SimSub) problem and develop a suite of algorithms, including an exact algorithm, an approximate algorithm providing a controllable trade-off between efficiency and effectiveness, and a few splitting-based algorithms, among which some are based on pre-defined heuristics and some, called RLS and RLS-Skip, are based on deep reinforcement learning. We conducted extensive experiments on real datasets, which verified that among the approximate algorithms, the learning-based ones achieve the best effectiveness and efficiency. In the future, we plan to explore more similarity measurements for the SimSub problem, e.g., the constrained DTW distance and the other similarity measurements reviewed in Section 2.
Acknowledgments.
This research is supported by the Nanyang Technological University Start-Up Grant from the College of Engineering under Grant M4082302 and by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 (RG20/19 (S)). Gao Cong acknowledges the support by Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU), which is a collaboration between Singapore Telecommunications Limited (Singtel) and Nanyang Technological University (NTU) that is funded by the Singapore Government through the Industry Alignment Fund - Industry Collaboration Projects Grant, and a Tier-1 project RG114/19. The authors would like to thank Eamonn Keogh for pointing out some references to the time series literature and also the anonymous reviewers for their constructive comments.

8. REFERENCES
[1] P. K. Agarwal, K. Fox, K. Munagala, A. Nath, J. Pan, and E. Taylor. Subtrajectory clustering: Models and algorithms. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 75-87. ACM, 2018.
[2] H. Alt and M. Godau. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications, 5(01n02):75-91, 1995.
[3] V. Athitsos, P. Papapetrou, M. Potamias, G. Kollios, and D. Gunopulos. Approximate embedding-based subsequence matching of time series. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 365-378. ACM, 2008.
[4] R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3(Oct):213-231, 2002.
[5] K. Buchin, M. Buchin, J. Gudmundsson, M. Löffler, and J. Luo. Detecting commuting patterns by clustering subtrajectories. International Journal of Computational Geometry & Applications, 21(03):253-282, 2011.
[6] L. Chen and R. Ng. On the marriage of lp-norms and edit distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, pages 792-803. VLDB Endowment, 2004.
[7] L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 491-502. ACM, 2005.
[8] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[9] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases, volume 23. ACM, 1994.
[10] X. Gong, S. Fong, and Y.-W. Si. Fast fuzzy subsequence matching algorithms on time-series. Expert Systems with Applications, 116:275-284, 2019.
[11] W.-S. Han, J. Lee, Y.-S. Moon, and H. Jiang. Ranked subsequence matching in time-series databases. In Proceedings of the 33rd International Conference on Very Large Data Bases, pages 423-434. VLDB Endowment, 2007.
[12] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
[13] E. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358-386, 2005.
[14] S.-W. Kim, S. Park, and W. W. Chu. An index-based approach for similarity search supporting time warping in large sequence databases. In Proceedings 17th International Conference on Data Engineering, pages 607-614. IEEE, 2001.
[15] Y. Kim and K. Shim. Efficient top-k algorithms for approximate substring matching. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 385-396. ACM, 2013.
[16] J.-G. Lee, J. Han, and K.-Y. Whang. Trajectory clustering: a partition-and-group framework. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 593-604. ACM, 2007.
[17] G. Li, X. Zhou, S. Li, and B. Gao. QTune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment, 12(12):2118-2130, 2019.
[18] X. Li, K. Zhao, G. Cong, C. S. Jensen, and W. Wei. Deep representation learning for trajectory similarity computation. In ICDE, pages 617-628. IEEE, 2018.
[19] C. Long, R. C.-W. Wong, and H. Jagadish. Direction-preserving trajectory simplification. Proceedings of the VLDB Endowment, 6(10):949-960, 2013.
[20] C. Ma, H. Lu, L. Shou, and G. Chen. KSQ: Top-k similarity query on uncertain trajectories. IEEE Transactions on Knowledge and Data Engineering, 25(9):2049-2062, 2012.
[21] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[22] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
[23] Y.-S. Moon, K.-Y. Whang, and W.-K. Loh. Duality-based subsequence matching in time-series databases. In Proceedings 17th International Conference on Data Engineering, pages 263-272. IEEE, 2001.
[24] A. Mueen and E. Keogh. Extracting optimal performance from dynamic time warping. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2129-2130, 2016.
[25] S. Park, W. W. Chu, J. Yoon, and C. Hsu. Efficient searches for similar subsequences of different lengths in sequence databases. In Proceedings of 16th International Conference on Data Engineering (Cat. No. 00CB37073), pages 23-32. IEEE, 2000.
[26] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
[27] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 262-270. ACM, 2012.
[28] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(3):10, 2013.
[29] S. Ranu, P. Deepak, A. D. Telang, P. Deshpande, and S. Raghavan. Indexing and matching trajectories under inconsistent sampling rates. In ICDE, pages 999-1010. IEEE, 2015.
[30] D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al. Learning representations by back-propagating errors. Cognitive Modeling, 5(3):1, 1988.
[31] Y. Sakurai, C. Faloutsos, and M. Yamamuro. Stream monitoring under the time warping distance. In ICDE, pages 1046-1055. IEEE, 2007.
[32] L. Sha, P. Lucey, Y. Yue, P. Carr, C. Rohlf, and I. Matthews. Chalkboarding: A new spatiotemporal query paradigm for sports play retrieval. In Proceedings of the 21st International Conference on Intelligent User Interfaces, pages 336-347, 2016.
[33] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[34] P. Tampakis, C. Doulkeridis, N. Pelekis, and Y. Theodoridis. Distributed subtrajectory join on massive datasets. arXiv preprint arXiv:1903.07748, 2019.
[35] P. Tampakis, N. Pelekis, C. Doulkeridis, and Y. Theodoridis. Scalable distributed subtrajectory clustering. arXiv preprint arXiv:1906.06956, 2019.
[36] I. Trummer, S. Moseley, D. Maram, S. Jo, and J. Antonakakis. SkinnerDB: regret-bounded query evaluation via reinforcement learning. Proceedings of the VLDB Endowment, 11(12):2074-2077, 2018.
[37] M. Vlachos, G. Kollios, and D. Gunopulos. Discovering similar multidimensional trajectories. In Proceedings 18th International Conference on Data Engineering, pages 673-684. IEEE, 2002.
[38] S. Wang, Z. Bao, J. S. Culpepper, T. Sellis, and X. Qin. Fast large-scale trajectory clustering. Proceedings of the VLDB Endowment, 13(1):29-42, 2019.
[39] S. Wang, Z. Bao, J. S. Culpepper, Z. Xie, Q. Liu, and X. Qin. Torch: A search engine for trajectory data. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 535-544. ACM, 2018.
[40] Y. Wang, Y. Tong, C. Long, P. Xu, K. Xu, and W. Lv. Adaptive dynamic bipartite graph matching: A reinforcement learning approach. In ICDE, pages 1478-1489. IEEE, 2019.
[41] Z. Wang, C. Long, G. Cong, and C. Ju. Effective and efficient sports play retrieval with deep representation learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 499-509, 2019.
[42] Z. Wang, C. Long, G. Cong, and Y. Liu. Efficient and effective similar subtrajectory search with deep reinforcement learning (technical report).
[43] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.
[44] M. Xie. EDS: a segment-based distance measure for sub-trajectory similarity search. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 1609-1610. ACM, 2014.
[45] D. Yao, G. Cong, C. Zhang, and J. Bi. Computing trajectory similarity in linear time: A generic seed-guided neural metric learning approach. In ICDE, pages 1358-1369. IEEE, 2019.
[46] B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In Proceedings 14th International Conference on Data Engineering, pages 201-208. IEEE, 1998.
[47] H. Yuan and G. Li. Distributed in-memory trajectory similarity search and join on road network. In ICDE, pages 1262-1273. IEEE, 2019.
[48] J. Zhang, Y. Liu, K. Zhou, G. Li, Z. Xiao, B. Cheng, J. Xing, Y. Wang, T. Cheng, L. Liu, et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data, pages 415-432. ACM, 2019.
[49] Z. Zhang, M. Hadjieleftheriou, B. C. Ooi, and D. Srivastava. Bed-tree: an all-purpose index structure for string similarity search based on edit distance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 915-926. ACM, 2010.
APPENDIX
A. QUALITY ANALYSIS OF THE SIZES ALGORITHM
There usually exists a space within which the objects move. Therefore, we assume the points of trajectories are all located in a d_max × d_max rectangle, where d_max is a large number that captures the extent of the space, and we use a coordinate system whose origin is at the middle of this rectangle.

Case 1: DTW. Consider a problem input with a query trajectory of m points T_q = <p'_1, p'_2, ..., p'_m> and a data trajectory of n = m^2 points T = <p_{1,1}, ..., p_{1,m}, p_{2,1}, ..., p_{m,m}>. We assume that m is an even number and let l = m/2. In addition, we let d = d_max/m. The locations of the points in these trajectories are provided as follows.
• p'_i = (−(l − i + 1/2)·d, 0) for i = 1, 2, ..., l;
• p'_i = ((i − l − 1/2)·d, 0) for i = l+1, l+2, ..., m;
• the points p_{i,j} (1 ≤ j ≤ m) are evenly located on the circle with its center at p'_i and its radius equal to ε, where ε is a very small real number, for i = 1, 2, ..., m.

Consider the optimal solution. Its DTW distance, denoted by D_o, is at most the DTW distance between T and T_q, which equals m^2 · ε (each point p_{i,j} for 1 ≤ j ≤ m is aligned with the point p'_i for 1 ≤ i ≤ m). That is, we have D_o ≤ m^2 · ε.

Consider the approximate solution returned by SizeS. Suppose that ξ = 0, i.e., we only consider subtrajectories with length exactly equal to m. We further know that each such subtrajectory consists of points that are either all located on the circle centered at one point of T_q, or partly on the circle centered at a point p'_i and partly on the circle centered at the point p'_{i+1} (1 ≤ i < m). It can be verified that among these subtrajectories, the one with some points on the circle centered at p'_l and the others on the circle centered at p'_{l+1} has the smallest DTW distance to T_q and would be returned. We denote its DTW distance to T_q by D_a. It can be verified that D_a > 2 · (Σ_{i=1}^{l−1} ((l − i)·d − ε) + ε), where the lower bound of the Euclidean distance from the i-th point of T_q to its aligned point of the returned subtrajectory is (1) (l − i)·d − ε for 1 ≤ i ≤ l − 1; (2) ε for i = l; and (3) that of the point symmetric to p'_i (w.r.t. the origin) for l + 1 ≤ i ≤ m, due to symmetry. Therefore, we have

  D_a / D_o ≥ 2 · (Σ_{i=1}^{l−1} ((l − i)·d − ε) + ε) / (m^2 · ε)
            = ((m^2/4)·d − (m/2)·d − m·ε + 4·ε) / (m^2 · ε)
            = ((1/4)·d − (1/(2m))·d − ε/m + (4/m^2)·ε) / ε,

which approaches infinity when m approaches infinity and ε approaches 0. In summary, the solution returned by SizeS could be arbitrarily worse than the optimal one when the DTW distance is used.
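The Case-1 construction above can be checked numerically. The sketch below is our own illustration, not part of the proof: `dtw` is a plain textbook DTW over 2D points, and the concrete values m = 4, d = 1, ε = 10^-3 are arbitrary. It builds the instance and compares the DTW distance of the whole trajectory T (an upper bound on the optimum, since T is itself a subtrajectory) against the best length-m subtrajectory available to SizeS with ξ = 0.

```python
import math

def dtw(a, b):
    """Plain O(|a||b|) DTW with Euclidean point distances."""
    INF = float("inf")
    dp = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = math.dist(a[i - 1], b[j - 1]) + \
                min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[-1][-1]

m, l, d, eps = 4, 2, 1.0, 1e-3   # a small instance of the Case-1 construction
Tq = [(-(l - i + 0.5) * d, 0.0) for i in range(1, l + 1)] + \
     [((i - l - 0.5) * d, 0.0) for i in range(l + 1, m + 1)]
T = [(cx + eps * math.cos(2 * math.pi * j / m),
      cy + eps * math.sin(2 * math.pi * j / m))
     for (cx, cy) in Tq for j in range(m)]  # m points per circle, m^2 in total

D_o = dtw(T, Tq)  # whole-T distance: m^2 * eps, an upper bound on the optimum
D_a = min(dtw(T[s:s + m], Tq)                # SizeS with xi = 0 considers only
          for s in range(len(T) - m + 1))    # the length-m subtrajectories
# D_o is about m^2 * eps = 0.016, while D_a is about 2 * (l - 1) * d = 2,
# so the ratio D_a / D_o grows without bound as eps shrinks.
```

Shrinking `eps` (or growing `m`) inflates the ratio D_a / D_o, matching the unbounded-approximation argument above.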
Case 2: Frechet and t2vec. Consider the case when the Frechet distance is used. The optimal solution and the approximate solution returned would be the same as in the case of the DTW distance, but the distances would be different: D_o would be equal to ε and D_a would be at least (l − 1)·d − ε. Therefore, we have

  D_a / D_o ≥ ((l − 1)·d − ε) / ε,

which approaches infinity when ε approaches 0.
Consider the case when t2vec is used. Since it is a learning-based distance, in theory it may reduce to any possible distance metric, such as DTW and Frechet. Thus, the analysis for DTW or Frechet carries over to t2vec.

B. QUALITY ANALYSIS OF SPLITTING-BASED ALGORITHMS
Case 1: PSS with DTW. Consider a problem input with a data trajectory of n + 3 points T = <p'_1, p'_2, p_1, p_2, ..., p_n, p'_3> (n is a positive integer) and a query trajectory T_q = <p'_0> consisting of a single point. Let d = d_max/2. The locations of the points in these trajectories are provided as follows.
• p'_1 = (−d/2, 0) and p'_2 = (−d, 0);
• p_i = (0, 0) for i = 1, 2, ..., n;
• p'_3 = (d, 0);
• p'_0 = (0, ε), where ε is a very small non-negative real number.

Consider the optimal solution. It could be any subtrajectory <p_i> (1 ≤ i ≤ n), and the corresponding DTW distance, which we denote by D_o, is equal to ε.

Consider the approximate solution returned by PSS. It is the subtrajectory <p'_1>, which is explained as follows. When PSS scans the first point p'_1, it splits the trajectory at p'_1 and updates the best-known subtrajectory to <p'_1>, whose DTW distance equals sqrt(d^2/4 + ε^2). It then continues to scan the following points p'_2, p_1, ..., p_n, p'_3 and performs no further split operations at these points, because p'_2 and p'_3 are farther away from p'_0 than p'_1. As a result, <p'_1> would be returned as the solution. We denote the DTW distance of this solution by D_a, i.e., D_a = sqrt(d^2/4 + ε^2).

Consider the approximation ratio (AR). We have

  D_a / D_o = sqrt(d^2/4 + ε^2) / ε > d / (2ε),

which approaches infinity when ε approaches zero.

Consider the mean rank (MR). We know that the rank of the approximate solution <p'_1> is at least n(n+1)/2 + 1, since any subtrajectory of <p_1, p_2, ..., p_n> has a smaller DTW distance than <p'_1> (assuming ε = 0). Therefore, the mean rank approaches infinity when n approaches infinity.

Consider the relative rank (RR). Based on the analysis of the mean rank, we know that the relative rank of the approximate solution <p'_1> is at least (n(n+1)/2 + 1) / ((n+3)(n+4)/2), which approaches 1 when n approaches infinity.

In conclusion, the solution returned by PSS could be arbitrarily worse than the optimal one in terms of AR, MR, and RR when the DTW distance is used.

Case 2: Other Algorithms and Similarity Measurements.
Consider the other algorithms, i.e., POS and POS-D. It can be verified that they run in exactly the same way as PSS on the problem input provided in Case 1; therefore, the conclusion carries over. Consider the other measurements, namely Frechet and t2vec. For Frechet, it can be verified that the optimal solution and the approximate solution returned by the algorithms are the same as those in Case 1, and that their Frechet distances and DTW distances to T_q are equal, since both solutions and T_q each involve a single point. As a result, the conclusion for the DTW distance carries over to the Frechet distance. For t2vec, since it is a learning-based distance, in theory it may reduce to any possible distance metric, such as DTW and Frechet; thus, the analysis for DTW in Case 1 carries over to t2vec.

C. ADAPTATION OF UCR
UCR [27] was originally developed for searching the subsequences of a time series that are the most similar to a query time series, where similarity is based on the DTW distance. Specifically, UCR enumerates all subsequences of the same length as the query time series and employs a rich set of techniques for pruning many of them. We use UCR for our similar subtrajectory search problem by adapting its pruning techniques to trajectories. UCR involves seven pruning techniques, organized in two groups. Let T = <p_1, p_2, ..., p_n> be a data trajectory and T_q = <q_1, q_2, ..., q_m> be a query trajectory, and suppose we are considering a subtrajectory of T, denoted by T' = <p_1, p_2, ..., p_m> without loss of generality. We describe the adaptations of the techniques involved in UCR, which are used to prune T' from consideration where possible, as follows.

Group 1: Known Optimizations.
• Early Abandoning of LB_Keogh. This is to compute a lower bound of the DTW distance between T' and T_q, called LB_Keogh and denoted by LB_Keogh(T', T_q), and to prune T' if the lower bound is larger than the best-known DTW distance. Specifically, let (1) R ∈ [0, 1] be a real number, (2) q_{i−⌊R·m⌋ : i+⌊R·m⌋} be the set involving the ⌊R·m⌋ points before q_i, the ⌊R·m⌋ points after q_i, and the point q_i itself, and (3) MBR(·) be the minimum bounding rectangle of a set of points. We compute LB_Keogh(T', T_q) as

  LB_Keogh(T', T_q) = Σ_{i=1}^{m} c_i, where
  c_i = d(p_i, MBR(q_{i−⌊R·m⌋ : i+⌊R·m⌋})) if p_i is outside MBR(q_{i−⌊R·m⌋ : i+⌊R·m⌋}), and c_i = 0 otherwise,

and d(p_i, MBR(·)) denotes the shortest distance between p_i and MBR(·).
• Early Abandoning of DTW. During the process of computing the DTW distance between T' and T_q, when the accumulated DTW distance exceeds the best-known DTW distance, we abandon the computation.
• Earlier Early Abandoning of DTW using LB_Keogh. For i = 1, 2, ..., m, if the sum of (a) the DTW distance between T_q[1:i] and any prefix of T'[1:i] (or between T'[1:i] and any prefix of T_q[1:i]) and (b) the LB_Keogh bound between T_q[i:m] and T'[i:m] is larger than the best-known DTW distance, we prune T'. Note that the DTW distances used in this pruning are maintained in the process of computing the DTW distance between T' and T_q.

Group 2: Novel Optimizations in the UCR Suite.
• Just-in-time Z-normalization. We do not adapt the Z-normalization technique, since it is designed for one-dimensional data and cannot be used for trajectory data, which is two-dimensional.
• Reordering Early Abandoning. We consider the points of T_q in descending order of their distances to the y-axis when computing the LB_Keogh bound.
• Reversing LB_Keogh. We reverse the roles of T' and T_q and compute another LB_Keogh bound. We then use the larger of the two LB_Keogh bounds as a tighter bound when necessary.
• Cascading Lower Bounds. We first compute the LB_Kim-FL bound, which is another simple lower bound of the DTW distance between T' and T_q, denoted by LB_Kim-FL(T', T_q) and defined as d(T_q[1], T'[1]) + d(T_q[m], T'[m]); the time complexity of this step is simply O(1). If this bound does not help to prune T', we cascade the techniques of early abandoning of LB_Keogh, early abandoning of DTW, and earlier early abandoning of DTW using LB_Keogh. Finally, if T' has still not been pruned, we compute the DTW distance between T' and T_q.

We note that UCR is designed specifically for the DTW distance and cannot be used for the problem when the Frechet or t2vec distance is used.

D. ADDITIONAL EXPERIMENTAL RESULTS
[Figure 10 plots omitted: each panel shows running time against data size (million) for ExactS, PSS, POS, POS-D, SizeS, RLS, and RLS-Skip. Panels (a) Harbin (t2vec), (b) Harbin (DTW), (c) Harbin (Frechet), (d) Sports (t2vec), (e) Sports (DTW), and (f) Sports (Frechet) are without an index; panels (g)-(l) repeat these settings with the R-tree index.]
Figure 10: Efficiency without index (a)-(f) and with R-tree index (g)-(l) on Harbin and Sports.
[Figure 11 plots omitted: each panel shows Approximation Ratio, Mean Rank, or Relative Rank against the groupings G1-G4 for PSS, POS, POS-D, SizeS, RLS, and RLS-Skip, on Porto (panels a-c, g-i, m-o) and Harbin (panels d-f, j-l, p-r).]
Figure 11: Results of grouping evaluation for t2vec (a)-(f), DTW (g)-(l) and Frechet (m)-(r).

[Figure 12 panels: (a) Approximation Ratio (DTW); (b) Mean Rank (DTW); (c) Relative Rank (DTW); (d) Time Cost (DTW).]
Figure 12: The effect of soft margin ξ for SizeS.

[Figure 13 panels: (a)-(c) Approximation Ratio on Porto, Harbin, and Sports; (d)-(f) Time Cost on Porto, Harbin, and Sports.]
Figure 13: Comparison with UCR and Spring.