Efficient Trajectory Compression and Queries


Hongbo Yin, Hong Gao, Binghao Wang, Sirui Li, and Jianzhong Li
Department of Computer Science, Harbin Institute of Technology, Harbin, China
{hongboyin, honggao, [email protected], [email protected], lijzh}@hit.edu.cn

Abstract: GPS sensors are now ubiquitous in devices that collect, store and transmit tremendous amounts of trajectory data. This unprecedented scale of GPS data creates an urgent demand for both an effective storage mechanism and an efficient query mechanism. Line simplification in online mode, a trajectory compression method commonly used in practice, plays an important role in meeting this demand. However, existing algorithms either have an extremely high time cost or lose too much accuracy after compression. To address this, we propose $\epsilon$-Region based Online trajectory Compression with Error bounded (ROCE for short), which achieves the best balance among the accuracy loss, the time cost and the compression rate. In most previous work, each trajectory is treated as a sequence of discrete points for various queries. This is unsuitable when the queried trajectories have been compressed, because hundreds of points may have been discarded between two adjacent retained points, so the points in each compressed trajectory are quite sparse. In this paper, each compressed trajectory is therefore regarded as a sequence of continuous line segments rather than discrete points. Based on this, we propose a new trajectory similarity metric AL, an efficient index ASP-tree, and two algorithms for processing range queries and top-$k$ similarity queries on compressed trajectories. Extensive experiments on real datasets demonstrate the superior performance of our methods.

I. INTRODUCTION

The last decade has witnessed unprecedented growth in mobile devices, such as smartphones, vehicles, and wearable smart devices. Nearly all of them are equipped with a location-tracking function and are widely used to collect massive trajectory data of moving objects at a certain sampling rate (e.g. every 5 seconds) for location-based services, trajectory mining, wildlife tracking and many other useful applications. However, the amount of trajectory data collected is often very large, and in many application scenarios it is too difficult to store and query such massive trajectories. For example, Fitbit, one of the most popular manufacturers of wearable fitness monitors and activity trackers, had 28 million active users as of November 1st, 2019 (https://expandedramblings.com/index.php/fitbit-statistics/). If each wearable device records its latest position every 5 seconds, over 14 trillion trajectory points in total will be generated in just one month. Transmitting such large amounts of data to the cloud server consumes too much network bandwidth, and storing and querying such big data is also very difficult.

Trajectory compression is a suitable and effective solution to this problem. Line simplification, which compresses each trajectory into a set of continuous line segments, is a mainstream compression method and has drawn wide attention. It is a lossy compression, in which a high compression rate can be obtained with a tolerable error bound. Existing line simplification methods fall into two categories: batch mode and online mode.
For each trajectory, batch-mode algorithms require all points of the trajectory to be loaded into the local buffer before compression, which means the local buffer must be large enough to hold the entire trajectory. Thus, the space complexities of these algorithms are at least $O(N)$, or even $O(N^2)$, which limits their application in resource-constrained environments. Therefore, more work focuses on online-mode algorithms, which need only a limited local buffer, rather than a very large one, to compress trajectories in an online processing manner. Online-mode algorithms thus have many more application scenarios than batch-mode ones, e.g. compressing streaming data.

For all online-mode algorithms, the execution time, the accuracy loss and the compression rate are the three indicators used to measure performance. There is a tradeoff among them, and the key issue is how to balance them. Some algorithms have a small accuracy loss but a quite high time cost, such as BQS [1], [2] and FBQS [1], [2] in Table I, which is taken from the experimental results of Zhang et al. [3]. Other algorithms are fast, but at the expense of extremely high accuracy loss, such as Angular [4], Interval [5] and OPERB [6] in Table I. So it is still a big challenge to compress trajectories into much smaller forms with less time and less accuracy loss. To address this, we propose a new online line simplification method, $\epsilon$-Region based Online trajectory Compression with Error bounded (ROCE for short), with only $O(N)$ time complexity and $O(1)$ space complexity. When the compression rate is fixed, ROCE is one of the fastest algorithms, and its accuracy loss is the smallest among the fastest algorithms.
ROCE achieves the best balance among the accuracy loss, the time cost and the compression rate.

Compressing trajectories reduces not only the cost of transmission and storage, but also the computing cost of queries, so how to process queries on compressed trajectories is very important. Range queries and top-$k$ similarity queries are two of the most fundamental kinds of queries for various applications in trajectory data analysis, and they have attracted wide attention. In most previous work, each trajectory is seen as a sequence of discrete points, and a trajectory is usually regarded as overlapping the query region $R$ iff at least one of its points falls in $R$. However, this condition is incomplete when the points in a queried trajectory are sparse. This is especially true for compressed trajectories, where hundreds of points may have been discarded between two adjacent retained points. If some points of a trajectory fall in the query region but are discarded by the compression, as in the situation shown in Figure 1, then the trajectory is missing from the result set. To solve this kind of problem, in this paper each compressed trajectory is regarded as a sequence of continuous line segments rather than discrete points. Based on this, we propose two algorithms for processing range queries and top-$k$ similarity queries on compressed trajectories.

arXiv [cs.DB]

TABLE I. The time cost and accuracy loss of some state-of-the-art algorithms in online mode; the compression rates are all fixed at the same value. Algorithms compared: BQS, FBQS, Angular, Interval, OPERB. Reported measures include the execution time per point (µs).

Fig. 1. An example in which neither endpoint of a line segment is in region $R$, but the discarded points between the two endpoints fall in $R$.

The main contributions of this paper are summarized as follows:

• Point-to-Segment Euclidean Distance (PSED), an accuracy loss metric, is defined to measure the degree of accuracy loss after a trajectory is compressed. Based on PSED, we propose a new online line simplification algorithm, ROCE, with bounded error. With only $O(N)$ time complexity and $O(1)$ space complexity, ROCE achieves the best balance among the accuracy loss, the time cost and the compression rate.
• To improve the accuracy of range query results on compressed trajectories, we propose a new range query processing algorithm, RQC, based on line segments. $F_1$, the synthesized indicator of the precision rate and the recall rate, can be improved by up to …
• A new trajectory similarity metric, Area clamped by the Line segments (AL for short), is defined to measure the similarity between each pair of compressed trajectories. Based on AL, we propose a new top-$k$ similarity query processing algorithm, SQC.
• An efficient index, ASP-tree, and a set of novel techniques are also presented to greatly accelerate the processing of both range queries and top-$k$ similarity queries.
• We conduct extensive comparison experiments on real-life trajectory datasets, and the results demonstrate the superior performance of our methods.

The rest of this paper is organized as follows. Section II introduces the accuracy loss metric PSED and the new compression algorithm ROCE. Section III introduces the efficient index ASP-tree and the range query processing algorithm RQC. Section IV gives the new trajectory similarity metric AL and the top-$k$ similarity query processing algorithm SQC. Section V presents the experimental results and analysis. Section VI reviews related work, and Section VII concludes our work.

II. ROCE COMPRESSION ALGORITHM

In this section, we first propose an accuracy loss metric, PSED. Then, based on PSED, a new compression algorithm, ROCE, is introduced in detail.

A. Basic Concepts and Notations

A trajectory $T$ can be expressed as a sequence of points $\{p_1, p_2, ..., p_N\}$, where $T[i] = p_i(x_i, y_i)$ represents the coordinate of the moving object. For any two points $p_i$ and $p_j$, if $i < j$, then $p_i$ was generated before $p_j$. Given a trajectory $T = \{p_1, p_2, ..., p_N\}$, $\forall i, j$ ($1 \le i$ …

Definition 2. (Compression Rate): Given a raw trajectory $T = \{p_1, p_2, ..., p_N\}$ with $N$ points and its compressed trajectory $T' = \{p_{i_1}, p_{i_2}, ..., p_{i_n}\}$ ($p_{i_1} = p_1$, $p_{i_n} = p_N$) with $n - 1$ consecutive line segments, the compression rate is $r = N/n$.
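As a quick illustration of Definition 2, the following Python sketch computes the compression rate from a raw trajectory and the points retained by a compressor. The function name `compression_rate` and the list-of-tuples representation are assumptions made for this example only.

```python
def compression_rate(raw_points, kept_points):
    """Compression rate r = N / n from Definition 2.

    raw_points  -- the N points of the raw trajectory
    kept_points -- the n points retained in the compressed trajectory
    """
    if not kept_points:
        raise ValueError("a compressed trajectory keeps at least one point")
    return len(raw_points) / len(kept_points)


raw = [(float(i), 0.0) for i in range(1000)]   # N = 1000 raw points
kept = [raw[0], raw[-1]]                       # n = 2 retained points
print(compression_rate(raw, kept))             # 500.0
```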

B. Accuracy Loss Metric

After compression, a set of consecutive line segments is used to approximately represent a raw trajectory. When the compression rate is fixed, the smaller the accuracy loss of a compression algorithm, the better. Measuring the accuracy loss calls for a reasonable metric. Usually, the accuracy loss is calculated based on the deviation between each discarded point and its corresponding line segment.

Perpendicular Euclidean Distance (PED for short), an accuracy loss metric adopted by most existing line simplification methods, e.g. [1], [2], [6]–[9], is formally defined as:

Definition 3. (PED): Given a trajectory segment $T[s:e]$ ($s$ …

To solve the problem, we define a new accuracy loss metric, point-to-segment Euclidean distance (PSED for short), which is a revised version of PED. The main difference between PSED and PED is that PSED adopts the shortest Euclidean distance from a point to its corresponding line segment, rather than to the straight line through that segment. PSED is formally defined as follows:

Definition 4. (PSED): Given a trajectory segment $T[s:e]$ ($s < e$) and its compressed form, the line segment $p_s p_e$, for any discarded point $p_m$ ($s < m < e$) in $T[s:e]$, the PSED of $p_m$ is calculated according to the following cases:

$$PSED(p_m) = \begin{cases} \dfrac{\|\overrightarrow{p_s p_m} \times \overrightarrow{p_s p_e}\|}{\|\overrightarrow{p_s p_e}\|} & \text{if } \overrightarrow{p_s p_m} \cdot \overrightarrow{p_s p_e} \ge 0 \text{ and } \overrightarrow{p_m p_e} \cdot \overrightarrow{p_s p_e} \ge 0 \\ \min\{\|\overrightarrow{p_s p_m}\|, \|\overrightarrow{p_m p_e}\|\} & \text{otherwise} \end{cases}$$

where $\times$ and $\cdot$ denote the cross product and the dot product of vectors, respectively.

In Definition 4, if $\overrightarrow{p_s p_m} \cdot \overrightarrow{p_s p_e} \ge 0$ and $\overrightarrow{p_m p_e} \cdot \overrightarrow{p_s p_e} \ge 0$, i.e. the perpendicular foot of $p_m$ falls on the line segment $p_s p_e$, then $PSED(p_m)$ is the perpendicular distance from $p_m$ to $p_s p_e$, the same as $PED(p_m)$. Otherwise, $PSED(p_m)$ is the shorter of $|p_s p_m|$ and $|p_m p_e|$.

In Figure 2, the perpendicular feet of two of the discarded points fall on the extension of the compressed line segment, so the PSED of each of these points is its distance to the nearer endpoint of the segment. The perpendicular feet of the other two discarded points fall on the line segment itself, so for them PSED equals PED: the perpendicular distance for one point, and 0 for the point that lies on the segment.

With the definition of PSED, the $\epsilon$-error-bounded trajectory is defined as follows:

Definition 5. ($\epsilon$-Error-bounded Trajectory): Given a value $\epsilon$, a raw trajectory $T = \{p_1, p_2, ..., p_N\}$ and its compressed trajectory $T' = \{p_{i_1}, p_{i_2}, ..., p_{i_n}\}$ ($p_{i_1} = p_1$, $p_{i_n} = p_N$), if $PSED(p_m) \le \epsilon$ for each discarded point $p_m$ in $T$, then we say $T'$ is $\epsilon$-error-bounded and $\epsilon$ is the upper bound of the deviation.

C. Algorithm ROCE

In this part, we present a new trajectory compression algorithm, ROCE, which achieves the best balance among the accuracy loss, the time cost and the compression rate.
Given a raw trajectory $T = \{p_1, p_2, ..., p_N\}$ and the upper bound of PSED $\epsilon$, ROCE compresses $T$ into an $\epsilon$-error-bounded compressed trajectory $T'$, which is a set of consecutive line segments. To determine more conveniently whether a compressed trajectory is $\epsilon$-error-bounded, we define a new concept, the $\epsilon$-Region:

Definition 6. ($\epsilon$-Region): Given the upper bound of PSED $\epsilon$ and a raw trajectory point $p_i$, we can get a circle whose center is $p_i$ and whose radius is $\epsilon$. This circle is called the $\epsilon$-Region of $p_i$.

For convenience, $E_i$ denotes the $\epsilon$-Region of $p_i$ in the following. We have the following property:

Lemma 1. Given a trajectory segment $T[s:e]$ ($s < e$) and the upper bound of PSED $\epsilon$, suppose $T[s:e]$ is compressed into a line segment $p_s p_e$. For any discarded point $p_m$ ($s < m < e$) in $T[s:e]$, $PSED(p_m) \le \epsilon$ iff $p_s p_e$ intersects $E_m$. Hence $p_s p_e$ is $\epsilon$-error-bounded iff $p_s p_e$ intersects the $\epsilon$-Regions of all discarded points, i.e. $E_{s+1}, E_{s+2}, ..., E_{e-1}$.

In Figure 3, the trajectory segment $T[1:3]$ is compressed into the line segment $p_1 p_3$. The line segment $p_1 p_3$ doesn't intersect $E_2$, so $PSED(p_2) > \epsilon$ and $p_1 p_3$ isn't $\epsilon$-error-bounded. Another trajectory segment, $T[11:16]$, is compressed into the line segment $p_{11} p_{16}$. For every discarded point, $p_{11} p_{16}$ intersects the corresponding $\epsilon$-Region, so $p_{11} p_{16}$ is $\epsilon$-error-bounded.

Fig. 3. $T[1:3]$ and $T[11:16]$ are two trajectory segments, compressed into the line segments $p_1 p_3$ and $p_{11} p_{16}$ respectively. Each circle $E_i$ has radius $\epsilon$.

Given a raw trajectory $T = \{p_1, p_2, ..., p_N\}$ and the upper bound of PSED $\epsilon$, an optimal compression compresses $T$ into an $\epsilon$-error-bounded trajectory $T'$ that consists of the smallest number of consecutive line segments. Since every interior point of $T$ can be either kept or discarded, $T$ can be split into exponentially many different sets of consecutive trajectory segments, so the search space of compression strategies is exponential. By adopting a greedy strategy and some effective tricks, ROCE handles the trajectory compression in an online processing manner. First, ROCE anchors the start point $p_s$ of the trajectory segment to be compressed. $p_f$, where $f$ is a variable initially set to $s + 2$, is selected as the current float point, which defines a trajectory segment $T[s:f]$. If $PSED(p_m) \le \epsilon$ for every point $p_m$ ($s < m < f$), then $p_{f+1}$ becomes the new float point, i.e. $f$ is incremented by one.
Otherwise, $p_{f-1}$ becomes the end point of the current trajectory segment, and the current trajectory segment $T[s:f-1]$ is compressed into the line segment $p_s p_{f-1}$. $p_{f-1}$ then becomes the anchor point of the next trajectory segment to be compressed.

Every time ROCE checks whether the last float point $p_{f-1}$ is the final end point of the current trajectory segment, each point $p_m$ ($s < m < f$) needs to be scanned once to calculate its PSED and verify whether $p_s p_f$ is $\epsilon$-error-bounded. So each point would be scanned multiple times during the compression. To deal with this problem, ROCE adopts a candidate region, so that each point is scanned once and only once. The $(p_s, p_f)$-CandidateRegion and the $T[s:f]$-CandidateRegion are formally defined as follows:

Definition 7. ($(p_s, p_f)$-CandidateRegion): Given the upper bound of PSED $\epsilon$ and a trajectory segment $T[s:f]$ with $s < f$ and $|p_s p_f| > \epsilon$. Since $p_s$ is outside the $\epsilon$-Region $E_f$ of $p_f$, we can get two tangent rays of $E_f$ starting from $p_s$, named $tr_f$ and $tr'_f$. The minor sector enclosed by $tr_f$ and $tr'_f$, minus the overlap of this minor sector with the circular region whose center is $p_s$ and whose radius is $|p_s p_f|$, is the $(p_s, p_f)$-CandidateRegion.

Definition 8. ($T[s:f]$-CandidateRegion): Given the upper bound of PSED $\epsilon$ and a trajectory segment $T[s:f]$ ($s < f$), the $T[s:f]$-CandidateRegion is $(p_s, p_{s+1})$-CandidateRegion $\cap$ $(p_s, p_{s+2})$-CandidateRegion $\cap$ ... $\cap$ $(p_s, p_f)$-CandidateRegion, i.e. $T[s:f]$-CandidateRegion = $T[s:f-1]$-CandidateRegion $\cap$ $(p_s, p_f)$-CandidateRegion if $s < f - 1$.

During the procedure of finding the final end point of the current trajectory segment starting from $p_s$, if the float point $p_{f+1}$ falls in the $T[s:f]$-CandidateRegion, then for any point $p_m$ ($s < m < f + 1$), $PSED(p_m)$ must be less than $\epsilon$.
So by using the candidate region, PSED no longer needs to be calculated, and each point needs to be scanned only once.

Figure 4 gives an example of the processing procedure of ROCE. Since $p_1$ is outside the $\epsilon$-Region $E_2$ of $p_2$, we can get two tangent rays $tr_2$ and $tr'_2$ of $E_2$ starting from $p_1$. The region in blue is the $(p_1, p_2)$-CandidateRegion, which is also the $T[1:2]$-CandidateRegion. $p_3$ falls in the $T[1:2]$-CandidateRegion. Similarly, we can get the $(p_1, p_3)$-CandidateRegion, which is the region in red. The overlap of the $T[1:2]$-CandidateRegion and the $(p_1, p_3)$-CandidateRegion is the $T[1:3]$-CandidateRegion. If the next point $p_4$ falls in the $T[1:3]$-CandidateRegion, the line segment $p_1 p_4$ must intersect the $\epsilon$-Regions of all discarded points, i.e. $E_2$ and $E_3$, so $p_1 p_4$ must be $\epsilon$-error-bounded according to Lemma 1. Otherwise, the trajectory segment $T[1:3]$ is compressed into $p_1 p_3$, and ROCE repeats the same procedure starting from $p_3$ until the last point is compressed.

Fig. 4. An example of ROCE
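The PSED metric of Definition 4 and the greedy candidate-region procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `psed` and `roce` are hypothetical, points are plain `(x, y)` tuples, and the candidate region is represented here as an interval of directions (relative to the first constraining point) plus a minimum distance from the anchor, which is one possible constant-size encoding of the intersection in Definition 8.

```python
import math


def psed(p, a, b):
    """Point-to-segment Euclidean distance from p to segment ab (Definition 4)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:                      # degenerate segment: a == b
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # 0 <= t <= 1 is equivalent to both dot-product conditions being >= 0.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    if 0.0 <= t <= 1.0:
        cross = (p[0] - a[0]) * dy - (p[1] - a[1]) * dx
        return abs(cross) / math.sqrt(seg2)
    return min(math.hypot(p[0] - a[0], p[1] - a[1]),
               math.hypot(p[0] - b[0], p[1] - b[1]))


def roce(points, eps):
    """Indices of the retained points; each point is examined once."""
    n = len(points)
    if n <= 2:
        return list(range(n))
    kept, s = [0], 0
    while s < n - 1:
        sx, sy = points[s]
        i = s + 1
        # Points whose eps-Region contains the anchor impose no constraint.
        while i < n and math.hypot(points[i][0] - sx, points[i][1] - sy) <= eps:
            i += 1
        ref = None                       # reference direction for relative angles
        lo, hi = -math.inf, math.inf     # angular bounds of the candidate region
        min_dist = 0.0                   # radius of the excluded circular region
        while i < n:
            dx, dy = points[i][0] - sx, points[i][1] - sy
            d = math.hypot(dx, dy)
            ang = math.atan2(dy, dx)
            if ref is None:
                ref = ang
            delta = (ang - ref + math.pi) % (2 * math.pi) - math.pi
            if d < min_dist or not (lo <= delta <= hi):
                break                    # float point is outside the candidate region
            half = math.asin(eps / d)    # half-angle between the two tangent rays
            lo, hi = max(lo, delta - half), min(hi, delta + half)
            min_dist = d
            i += 1
        s = i - 1                        # last float point that was still valid
        kept.append(s)
    return kept


print(roce([(0, 0), (1, 0), (2, 0), (3, 0)], 0.1))          # [0, 3]
print(roce([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))  # [0, 2, 4]
```

The brute-force `psed` function is used here only to check the output: for every retained segment, each discarded point should lie within `eps` of that segment, which is the $\epsilon$-error-bounded property of Definition 5.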

ROCE is formally described in Algorithm 1. Starting from the first point, ROCE scans the points in the trajectory one by one. In each iteration, ROCE finds the final end point of the current trajectory segment, and this trajectory segment is compressed into a line segment (Lines 4-15). For the points following the start point, if their $\epsilon$-Regions all contain the start point, any line segment starting from the start point must intersect their $\epsilon$-Regions. Thus, according to Lemma 1, their restrictions no longer need to be considered (Lines 7-9).

By using the candidate region, each point is scanned once and only once. So ROCE is a one-pass error-bounded trajectory compression algorithm and its time complexity is $O(N)$. Only constant space is needed by ROCE, no matter how many points are compressed into a line segment. So the space complexity of ROCE is only $O(1)$.

III. RANGE QUERY PROCESSING

In most previous work, a trajectory is considered to overlap the query region $R$ iff at least one point of the trajectory falls in $R$. However, this condition is incomplete, especially for compressed trajectories, because hundreds of points may have been discarded between two adjacent retained points, so the points in each queried trajectory are extremely sparse. If some points of a trajectory fall in the query region but are discarded by the compression, then the trajectory is missing from the result set. To address this, each compressed trajectory is regarded as a set of continuous line segments rather than discrete points. Based on this, we propose a new Range Query processing algorithm on Compressed trajectories (RQC for short). Range queries are redefined as:

Definition 9. (Range Query): Given a compressed trajectory dataset $\mathcal{T}$ and a query region $R$, the range query result $Q_r(R, \mathcal{T})$ consists of all compressed trajectories in $\mathcal{T}$ at least one of whose line segments intersects $R$, i.e.,

$$Q_r(R, \mathcal{T}) = \{T \in \mathcal{T} \mid \exists\, p_{i_k} p_{i_{k+1}} \in T, \text{ s.t. } p_{i_k} p_{i_{k+1}} \text{ intersects } R\}$$

Algorithm 1 The ROCE Algorithm
Input: Raw trajectory $T = \{p_1, p_2, ..., p_N\}$, the upper bound of PSED $\epsilon$
Output: $\epsilon$-Error-bounded compressed trajectory $T' = \{p_{i_1}, p_{i_2}, ..., p_{i_n}\}$ ($p_{i_1} = p_1$, $p_{i_n} = p_N$) of $T$
 1: i = 1
 2: T' = [T[1]]
 3: while i < N do
 4:     StartPoint = T[i]
 5:     Initialize(CandidateRegion, StartPoint)
 6:     i = i + 1
 7:     while (i ≤ N) and (StartPoint in EpsilonRegion(T[i], ε)) do
 8:         i = i + 1
 9:     end while
10:     while (i ≤ N) and (T[i] in CandidateRegion) do
11:         UpdateCandidateRegion(CandidateRegion, T[i], ε)
12:         i = i + 1
13:     end while
14:     i = i − 1
15:     T'.Append(T[i])
16: end while
17: return T'

For simplicity, we consider query regions to be two-dimensional rectangles, but our approach can be adapted to handle regions of arbitrary shapes.

Given a range query, in order to get the query result $Q_r(R, \mathcal{T})$, a key issue is to determine whether a line segment intersects the query region $R$. There is a lot of work on how to determine the relationship between a line segment and a rectangle, so we don't discuss that in this paper. However, each compressed trajectory consists of multiple consecutive line segments, and it is costly to judge the relationship between the query region $R$ and every line segment in all compressed trajectories. To address this, some accelerating strategies are proposed in Section III-A to reduce the size of the search space. We also propose a highly efficient index to further speed up range queries in Section III-B.

A. Accelerating Strategies

To reduce the size of the search space and accelerate range queries, the Minimal Bounding Rectangle (MBR for short) of a compressed trajectory is used here. $MBR(T')$ is the smallest rectangle that contains the entire compressed trajectory $T'$. We can easily make the following observation:

Observation 1. Given a query region $R$, a compressed trajectory $T'$ and its MBR $MBR(T')$, if $MBR(T')$ and $R$ don't overlap, $T'$ cannot overlap $R$.

If $MBR(T')$ and $R$ overlap, it is hard to determine whether $T'$ overlaps $R$, as in the situation shown in Figure 5. Under this circumstance, we have another observation: there are still 3 cases where $T'$ must overlap $R$, as shown in Figure 6.

Fig. 5. The rectangle in gray represents the query region $R$ and the rectangle in orange represents $MBR(T')$.
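Observation 1 amounts to a cheap pre-filter: any trajectory whose MBR is disjoint from $R$ can be skipped without looking at its line segments. The Python sketch below illustrates this under assumed conventions: rectangles are `(xmin, ymin, xmax, ymax)` tuples with closed boundaries, and the helper names `mbr`, `rects_overlap` and `prune_by_mbr` are hypothetical.

```python
def mbr(traj):
    """Minimal bounding rectangle (xmin, ymin, xmax, ymax) of a trajectory."""
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_overlap(a, b):
    """True if two closed axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_by_mbr(trajectories, region):
    """Ids of trajectories that survive the Observation 1 filter.

    A surviving trajectory is only a candidate: its segments still have to
    be checked, because an overlapping MBR does not imply an overlap.
    """
    return [tid for tid, traj in trajectories.items()
            if rects_overlap(mbr(traj), region)]


trajectories = {
    "a": [(0, 0), (1, 1)],      # MBR far from the query region: pruned
    "b": [(5, 5), (9, 9)],      # has a point inside the query region
    "c": [(0, 10), (10, 0)],    # no point inside, but its MBR overlaps
}
print(prune_by_mbr(trajectories, (4, 4, 6, 6)))   # ['b', 'c']
```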

Fig. 6. Each gray rectangle represents the query region $R$ and each orange rectangle represents the $MBR(T')$ of a compressed trajectory $T'$ (cases ①, ② and ③).

Observation 2. Given a query region $R$ and a compressed trajectory $T'$:
(1) If $MBR(T')$ is contained in $R$, $T'$ must overlap $R$.
(2) If one edge of $MBR(T')$ is completely contained in $R$, then at least one point of $T'$ is in $R$ and $T'$ must overlap $R$. The reason is that, by the definition of the MBR, there must be at least one point of $T'$ on each edge of $MBR(T')$.
(3) Suppose only two parallel edges of $MBR(T')$ intersect $R$, and the other two parallel edges are outside $R$. From the analysis in Case (2), there is at least one point of $T'$ on each of the two parallel edges that are outside $R$. Since $T'$ consists of multiple continuous line segments, these two points must be connected by continuous line segments. So there must be at least one line segment in $T'$ intersecting $R$, and $T'$ must overlap $R$.

B. Trajectory Index ASP-tree

To further speed up range queries, we propose a highly efficient index, the Adaptive Spatial Partition quadtree-like index (ASP-tree for short). To reduce the space overhead, only leaf nodes directly store trajectory information in ASP-tree. Each non-leaf node in ASP-tree contains entries of the form (ChildRegion-ChildPointer), which refer to the regions and addresses of its 4 child nodes. Each leaf node in ASP-tree stores information of the form (ID-LineSegments), which refers to the consecutive line segments of the compressed trajectory whose identifier is ID; these line segments all intersect the region of this leaf node.

There is a single root node, which represents the whole region and all compressed trajectories. For each node in ASP-tree, if more than $\xi$ line segments intersect the region of this node, then this node is a non-leaf node with 4 child nodes. Otherwise, this node is a leaf node. So $\xi$, a threshold value estimated through experiments, controls the height of ASP-tree.

Once the region of a non-leaf node has been divided, the line segments intersecting the region of this node are split among the 4 child nodes. The strategy is to determine which of the 4 child regions each line segment intersects, and then assign this line segment to the corresponding child nodes. So a line segment may be stored in several leaf nodes. In the example shown in Figure 7, 4 consecutive line segments of a compressed trajectory $T'_i$ all intersect the region of the parent node. The line segment $T'_i[k+2]T'_i[k+3]$ intersects three of the child regions, so it is assigned to those 3 child nodes. Finally, the 3 line segments $T'_i[k+1]T'_i[k+2]$, $T'_i[k+2]T'_i[k+3]$ and $T'_i[k+4]T'_i[k+5]$ are assigned to the same child node.

Fig. 7. An example showing how to divide the region of a parent node (cases ① and ②) and how to split the line segments intersecting the region of the parent node among its 4 child nodes. In the figure, the four child nodes store $\{T'_i[k+1:k+3], T'_i[k+4:k+5]\}$, $\{T'_i[k+2:k+3]\}$, $\{T'_i[k+2:k+4]\}$ and $\{T'_i[k+3:k+5]\}$.
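The assignment of line segments to child nodes described above relies on a segment-versus-rectangle intersection test, which the paper leaves to existing work. The Python sketch below uses a standard slab (Liang-Barsky style) clipping test as a stand-in; it is one common choice and not necessarily the one used by the authors. Rectangles are assumed to be closed `(xmin, ymin, xmax, ymax)` tuples, and the names `segment_intersects_rect` and `assign_to_children` are hypothetical.

```python
def segment_intersects_rect(p, q, rect):
    """True if the closed segment pq touches the closed axis-aligned rect."""
    xmin, ymin, xmax, ymax = rect
    t0, t1 = 0.0, 1.0                    # parameter range of pq still inside
    for start, d, lo, hi in ((p[0], q[0] - p[0], xmin, xmax),
                             (p[1], q[1] - p[1], ymin, ymax)):
        if d == 0:
            if start < lo or start > hi:
                return False             # parallel to this slab and outside it
        else:
            ta, tb = (lo - start) / d, (hi - start) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False             # the slabs are never entered together
    return True


def assign_to_children(segments, child_regions):
    """Give every child region the list of segments that intersect it.

    A segment crossing several child regions is stored in each of them.
    """
    buckets = [[] for _ in child_regions]
    for seg in segments:
        for k, region in enumerate(child_regions):
            if segment_intersects_rect(seg[0], seg[1], region):
                buckets[k].append(seg)
    return buckets


children = [(0, 0, 5, 5), (5, 0, 10, 5), (0, 5, 5, 10), (5, 5, 10, 10)]
segments = [((1, 1), (2, 2)), ((1, 1), (9, 1)), ((1, 9), (9, 9))]
for region, bucket in zip(children, assign_to_children(segments, children)):
    print(region, bucket)
```

The same test also covers the motivating case of Figure 1: `segment_intersects_rect((0, 0), (10, 10), (4, 4, 6, 6))` is true even though neither endpoint lies in the rectangle.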

In a traditional quadtree, if a node has 4 child nodes, the region represented by the parent node is evenly divided into four regions. This is unsuitable here, because trajectories are usually not uniformly distributed; an even division may leave the index badly unbalanced, which hurts the efficiency of range queries. To handle this, a data-adaptive strategy is adopted in ASP-tree. There are two ways to divide the region of a parent node among its 4 child nodes, as shown in Figure 7. For all line segments intersecting the region of the parent node, we first collect all endpoints of these line segments that fall in this region, and then calculate the median of their $x$ coordinates ($y$ coordinates). This median is used to draw a vertical (horizontal) line, which divides the region of the parent node into two smaller regions, $R_1$ and $R_2$. After that, the medians of the $y$ coordinates ($x$ coordinates) of the endpoints falling in $R_1$ and in $R_2$ are used to further divide each of these two parts, giving 4 regions. Of these two ways, the one that duplicates fewer line segments across the 4 child nodes is chosen. The purpose is to make ASP-tree as balanced as possible.

A special case may arise when a region is to be divided into 2 parts: none of the endpoints of the line segments intersecting this region fall in the region. In such a case, we divide the region evenly into 2 equal regions.

We now describe how to get the range query result $Q_r(R, \mathcal{T})$ on compressed trajectories with ASP-tree. First, traverse ASP-tree from top to bottom with the query region $R$ to find all leaf nodes whose regions overlap $R$, and put all consecutive line segments stored in these leaf nodes into the candidate result set. For a non-leaf node, if its region doesn't overlap $R$, then the regions of its descendants cannot overlap $R$ either, and its descendants no longer need to be traversed. If the region of a leaf node is completely contained in $R$, the identifiers stored in this node are put directly into the final result set. Second, for the consecutive line segments in the candidate result set, we use their MBRs to determine their relationships with $R$ one by one. Third, if the relationship still can't be judged, we check whether any of these consecutive line segments intersects $R$. After these three steps, we get the final result set of $Q_r(R, \mathcal{T})$.

IV. SIMILARITY QUERY PROCESSING

In this section, a new trajectory similarity measure, Area clamped by the Line segments (AL for short), is introduced first. Then, based on AL, we propose a top-$k$ Similarity Query algorithm on Compressed trajectories (SQC for short).

A. Trajectory Similarity Metric AL

The trajectory similarity measure is a fundamental operation that can be used in many applications, e.g. similarity search, clustering, and classification. Consider two compressed trajectories $T'_r = \{p_{i_1}, p_{i_2}, ..., p_{i_m}\}$ and $T'_s = \{p_{j_1}, p_{j_2}, ..., p_{j_n}\}$. The most widely used trajectory similarity metrics are based on the distance between matched point pairs of $T'_r$ and $T'_s$. This is suitable for raw trajectories, where the distance between any two adjacent points in a trajectory does not vary much. But in a compressed trajectory, anywhere from a few to hundreds of raw points are approximately represented by one line segment, so the lengths of the line segments vary greatly and these similarity metrics are inapplicable.

As shown in Figure 8, there are two pairs of matched line segments in $T'_r$ and $T'_s$. There is no doubt that the pair with longer lengths should have a greater impact on the degree of similarity between $T'_r$ and $T'_s$ than the pair with shorter lengths. So the lengths of line segments should also be considered when we define the similarity between two compressed trajectories. When we measure the similarity between $T'_r$ and $T'_s$ with different numbers of line segments, there must be unmatched line segments in the compressed trajectory with more line segments, e.g. one line segment of $T'_r$ in Figure 8. The existence of such a line segment has a negative effect on the similarity between $T'_r$ and $T'_s$, since there is no line segment in $T'_s$ matched with it. So such a line segment should be punished when we measure the similarity between $T'_r$ and $T'_s$.

Fig. 8. The compressed trajectories $T'_r$ and $T'_s$ consist of 3 and 2 consecutive line segments respectively

For a pair of line segments $p_{i_k}p_{i_{k+1}}$ and $p_{j_h}p_{j_{h+1}}$, each line segment is divided into at most two parts, the matched part and the unmatched part. The matched part of a line segment is the subline segment such that, for any point on it, the minimum Euclidean distance to the other line segment is no more than $\sigma$, a given threshold value. As shown in Figure 9, the matched part of $p_{i_k}p_{i_{k+1}}$ is the subline segment $p_a p_b$, and the unmatched part of $p_{i_k}p_{i_{k+1}}$ is the remaining two subline segments, i.e. $p_{i_k}p_a$ and $p_b p_{i_{k+1}}$. Let $Dist(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}})$ denote the dissimilarity between $p_{i_k}p_{i_{k+1}}$ and $p_{j_h}p_{j_{h+1}}$, and then

$$Dist(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) + P(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}),$$

where $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}})$ is the area clamped by the two matched parts, and $P(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}})$ is the total penalty area of the two unmatched parts:

$$P(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = \sum_{p_{u_1}p_{u_2} \in UP} Punish(p_{u_1}p_{u_2}).$$

Here $UP$ denotes the two unmatched parts of $p_{i_k}p_{i_{k+1}}$ and $p_{j_h}p_{j_{h+1}}$, $p_{u_1}p_{u_2}$ is an unmatched line segment or subline segment in $UP$, and $Punish(p_{u_1}p_{u_2})$ is the length $\|p_{u_1}p_{u_2}\|$ multiplied by a fixed fraction of $\sigma$. Next, we introduce how to calculate $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}})$. Let $S_{abc}$ represent the area of the triangle whose 3 vertices are $p_a$, $p_b$ and $p_c$.
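The building blocks defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, points are plain `(x, y)` tuples, and the factor that multiplies the segment length in $Punish$ (a fraction of $\sigma$) is passed in as the parameter `width` rather than hard-coded. Computing the matched and unmatched parts themselves is not covered here.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """S_abc: area of the triangle with vertices a, b, c (cross-product formula)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) / 2.0


def punish(p: Point, q: Point, width: float) -> float:
    """Penalty area of an unmatched (sub)segment pq: its length times `width`.

    `width` stands for the sigma-dependent factor of the definition; its exact
    value is an input here (assumption of this sketch).
    """
    return math.hypot(q[0] - p[0], q[1] - p[1]) * width


def segment_dist(clamped_area: float,
                 unmatched_parts: Iterable[Segment],
                 width: float) -> float:
    """Dist = C + P, given the clamped area C and the unmatched parts UP."""
    return clamped_area + sum(punish(p, q, width) for p, q in unmatched_parts)
```

For example, with `width = 0.5`, an unmatched subsegment from `(0, 0)` to `(3, 4)` has length 5 and contributes a penalty of 2.5 to `segment_dist`.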
Assuming the subline segments $p_a p_b$ and $p_c p_d$ are the matched parts of $p_{i_k}p_{i_{k+1}}$ and $p_{j_h}p_{j_{h+1}}$ respectively, $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}})$, the area clamped by $p_a p_b$ and $p_c p_d$, is calculated in four situations as shown in Figure 10:

(1) If $p_a p_b p_c p_d$ is a convex quadrilateral, then $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = S_{abd} + S_{bcd}$.
(2) If the two points $p_a$ and $p_b$ are in different regions divided by the straight line on which $p_c p_d$ lies, then $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = S_{acd} + S_{bcd}$.
(3) If one point $p_c$ is on the straight line on which $p_a p_b$ lies, then $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = S_{abd}$.
(4) If $p_a p_b$ and $p_c p_d$ are collinear, then $C(p_{i_k}p_{i_{k+1}}, p_{j_h}p_{j_{h+1}}) = 0$.

Fig. 9. Based on $p_{j_h}p_{j_{h+1}}$, the line segment $p_{i_k}p_{i_{k+1}}$ is divided into 2 parts

Fig. 10. The gray region is the region clamped by $p_a p_b$ and $p_c p_d$ (panels (1) to (4) correspond to the four situations)

Given two compressed trajectories $T'_r = \{p_{i_1}, p_{i_2}, ..., p_{i_m}\}$ and $T'_s = \{p_{j_1}, p_{j_2}, ..., p_{j_n}\}$, let $\Theta(T'_r, T'_s)$ represent the dissimilarity between $T'_r$ and $T'_s$, and then $\Theta(T'_r, T'_s)$ can be recursively calculated as:

$$\Theta(T'_r, T'_s) = \begin{cases} Punish(T'_r) & \text{if } n = 1 \\ Punish(T'_s) & \text{if } m = 1 \\ \min \begin{cases} Punish(p_{i_1}p_{i_2}) + \Theta(Rest(T'_r), T'_s), \\ Punish(p_{j_1}p_{j_2}) + \Theta(T'_r, Rest(T'_s)), \\ Dist(r', s') + \Theta(Rest(T'_r), Rest(T'_s)) \end{cases} & \text{otherwise} \end{cases}$$

where $Punish(T'_r) = \sum_{k=1}^{m-1} Punish(p_{i_k}p_{i_{k+1}})$, $Punish(T'_s) = \sum_{h=1}^{n-1} Punish(p_{j_h}p_{j_{h+1}})$, $Rest(T'_r) = \{p_{i_2}, p_{i_3}, ..., p_{i_m}\}$ and $Rest(T'_s) = \{p_{j_2}, p_{j_3}, ..., p_{j_n}\}$, and $r'$ and $s'$ are the first line segments $p_{i_1}p_{i_2}$ and $p_{j_1}p_{j_2}$ of $T'_r$ and $T'_s$.
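The recursion above can be evaluated bottom-up with a table over suffixes of the two trajectories, which makes the quadratic number of $Dist$ evaluations visible. The following Python sketch is illustrative only: `theta`, `unit_width_penalty` and `toy_dist` are hypothetical names, $Dist$ and $Punish$ are supplied by the caller, and the toy $Dist$ used in the example (zero for identical segments, full penalty otherwise) is a stand-in, not the clamped-area computation of the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def theta(tr: Sequence[Point], ts: Sequence[Point],
          dist: Callable[[Segment, Segment], float],
          punish: Callable[[Segment], float]) -> float:
    """Bottom-up evaluation; table[a][b] holds Theta(tr[a:], ts[b:])."""
    m, n = len(tr), len(ts)
    if m == 0 or n == 0:
        raise ValueError("each trajectory needs at least one point")

    def suffix_penalties(t: Sequence[Point]) -> List[float]:
        # out[k] = Punish of all segments of t[k:]
        out = [0.0] * len(t)
        for k in range(len(t) - 2, -1, -1):
            out[k] = out[k + 1] + punish((t[k], t[k + 1]))
        return out

    pun_r, pun_s = suffix_penalties(tr), suffix_penalties(ts)
    table = [[0.0] * n for _ in range(m)]
    for a in range(m - 1, -1, -1):
        for b in range(n - 1, -1, -1):
            if b == n - 1:            # ts suffix is a single point
                table[a][b] = pun_r[a]
            elif a == m - 1:          # tr suffix is a single point
                table[a][b] = pun_s[b]
            else:
                r, s = (tr[a], tr[a + 1]), (ts[b], ts[b + 1])
                table[a][b] = min(
                    punish(r) + table[a + 1][b],
                    punish(s) + table[a][b + 1],
                    dist(r, s) + table[a + 1][b + 1],
                )
    return table[0][0]


def unit_width_penalty(seg: Segment) -> float:
    """Toy Punish: segment length times a width of 1 (assumption)."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def toy_dist(r: Segment, s: Segment) -> float:
    """Toy Dist: 0 for identical segments, otherwise both fully punished."""
    return 0.0 if r == s else unit_width_penalty(r) + unit_width_penalty(s)
```

With these toy functions, `theta` of a trajectory with itself is 0, and `theta` of two trajectories with no equal segments equals the sum of their total penalties, matching the two boundary behaviours of $\Theta$.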
$\Theta(T'_r, T'_s) = 0$ iff $T'_r$ and $T'_s$ are identical. If, for all $p_{i_k}p_{i_{k+1}} \in T'_r$ and all $p_{j_h}p_{j_{h+1}} \in T'_s$, there is no matched subline segment in $p_{i_k}p_{i_{k+1}}$ and $p_{j_h}p_{j_{h+1}}$, i.e. $T'_r$ and $T'_s$ are too far away from each other, then $\Theta(T'_r, T'_s) = Punish(T'_r) + Punish(T'_s)$. In general, $0 \le \Theta(T'_r, T'_s) \le Punish(T'_r) + Punish(T'_s)$. The similarity between $T'_r$ and $T'_s$ is represented by $AL(T'_r, T'_s)$, which is obtained by normalizing $\Theta(T'_r, T'_s)$ into $[0, 1]$ and is formally defined as:

Definition 10. (AL): Given two compressed trajectories $T'_r$ and $T'_s$,

$$AL(T'_r, T'_s) = 1 - \frac{\Theta(T'_r, T'_s)}{Punish(T'_r) + Punish(T'_s)}$$

The more similar $T'_r$ and $T'_s$ are, the larger $AL(T'_r, T'_s)$ is.

B. AL Based Top-k Similarity Query

Definition 11. (Top-k Similarity Query): Given a query compressed trajectory $T'_q$, a compressed trajectory dataset $\mathcal{T}$, a distance threshold $\sigma$ and an integer $k$, the top-k similarity query result $Q_s(T'_q, k, \mathcal{T}, \sigma)$ consists of the $k$ trajectories in $\mathcal{T}$ which are most similar to $T'_q$, satisfying: $\forall T'_1 \in Q_s(T'_q, k, \mathcal{T}, \sigma)$ and $\forall T'_2 \in (\mathcal{T} - Q_s(T'_q, k, \mathcal{T}, \sigma))$, $AL(T'_1, T'_q) \ge AL(T'_2, T'_q)$.

Like most widely used trajectory similarity measures, computing AL has quadratic cost, so performing sequential scans across the entire dataset is not scalable. In addition, AL is non-metric because it violates the triangular inequality.

Theorem 1. AL does not satisfy the triangular inequality.

Proof. Given three compressed trajectories, T' = { (0 , , (4 , } , T' = { (0 , , (4 , , (4 , , (7 , } and T' = { (0 , , (4 , , (4 , − , (5 , − } , and σ , which is set to . Then, AL ( T' , T' ) = , AL ( T' , T' ) = , AL ( T' , T' ) = and AL ( T' , T' ) + AL ( T' , T' ) < AL ( T' , T' ) . Therefore, AL does not satisfy the triangular inequality.

Because of Theorem 1, generic indexing techniques, which rely on pruning based on the triangular inequality, cannot be applied here. In order to reduce the search space, we construct a candidate set to avoid many unnecessary calculations of AL. Before introducing the candidate set, we define the $\sigma$-Bounding Rectangle ($\sigma$-$BR$ for short) of a compressed trajectory.

Definition 12. ($\sigma$-$BR(T')$): Given a distance threshold $\sigma$ and the MBR of the compressed trajectory $T'$, whose coordinates are represented by $[x'_{min}, x'_{max}] \times [y'_{min}, y'_{max}]$, $\sigma$-$BR(T')$ is the rectangular region whose coordinates are $[x'_{min} - \sigma, x'_{max} + \sigma] \times [y'_{min} - \sigma, y'_{max} + \sigma]$.

Based on Definition 12, the similarity candidate set is formally defined as:

Definition 13. (Similarity Candidate Set $SC(T'_q, \mathcal{T}, \sigma)$): Given a query compressed trajectory $T'_q$, a compressed trajectory dataset $\mathcal{T}$ and a distance threshold $\sigma$, the similarity candidate set $SC(T'_q, \mathcal{T}, \sigma)$ consists of all compressed trajectories in $\mathcal{T}$ which overlap $\sigma$-$BR(T'_q)$.

By using our range query processing algorithm RQC, we can get the similarity candidate set $SC(T'_q, \mathcal{T}, \sigma)$. For any trajectory $T' \in (\mathcal{T} - SC(T'_q, \mathcal{T}, \sigma))$, the similarity between $T'$ and $T'_q$ must be 0 in AL. To get the final result set $Q_s(T'_q, k, \mathcal{T}, \sigma)$, the AL between $T'_q$ and each compressed trajectory in $SC(T'_q, \mathcal{T}, \sigma)$ needs to be computed one by one. During this procedure, a min-heap of size $k$ is maintained and updated continuously.

By using the similarity candidate set, the search space is reduced greatly, but the cost of calculating the AL between $T'_q$ and each compressed trajectory in $SC(T'_q, \mathcal{T}, \sigma)$ is still somewhat high. To address this, we propose an efficient Pre-Punishment strategy to reduce the cost. For a compressed trajectory $T'$ in $SC(T'_q, \mathcal{T}, \sigma)$, there may be some line segments in $T'$ not intersecting $\sigma$-$BR(T'_q)$, such as three of the line segments of $T'$ shown in Figure 11. According to the definition of $\sigma$-$BR(T'_q)$, the distances of these line segments to any line segment in $T'_q$ must be larger than $\sigma$, thus these line segments must be punished in the calculation of AL. So for each of these line segments, only its $Punish()$ needs to be calculated, and the calculation of $Dist()$ between it and each line segment in $T'_q$ is no longer needed. In the same way, there may also be some line segments in $T'_q$ not intersecting $\sigma$-$BR(T')$, and these line segments are sure to be punished in the calculation of AL. For each of these line segments, only its corresponding $Punish()$ needs to be calculated, and the calculation of $Dist()$ between it and each line segment in $T'$ is also no longer needed. The experiment in Section V-D3 shows that by using the Pre-Punishment strategy, the execution time of similarity queries can be reduced to as low as 17.8%.

Fig. 11. An example shows how to accelerate the calculation of AL
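The filtering steps of this subsection can be sketched as follows. The code is a minimal illustration under assumptions, not the SQC implementation: all function names are hypothetical, trajectories are lists of `(x, y)` tuples, the segment/rectangle test uses a standard slab (parametric clipping) check, and the similarity function passed to `top_k` stands in for AL.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def sigma_br(traj: Sequence[Point], sigma: float) -> Rect:
    """MBR of the trajectory enlarged by sigma on every side (Definition 12)."""
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    return (min(xs) - sigma, max(xs) + sigma, min(ys) - sigma, max(ys) + sigma)


def segment_intersects_rect(p: Point, q: Point, rect: Rect) -> bool:
    """Slab test: does the closed segment pq touch the closed rectangle?"""
    x_min, x_max, y_min, y_max = rect
    t_enter, t_exit = 0.0, 1.0
    for start, delta, lo, hi in ((p[0], q[0] - p[0], x_min, x_max),
                                 (p[1], q[1] - p[1], y_min, y_max)):
        if delta == 0:
            if start < lo or start > hi:
                return False  # parallel to this slab and outside it
            continue
        t_a, t_b = (lo - start) / delta, (hi - start) / delta
        if t_a > t_b:
            t_a, t_b = t_b, t_a
        t_enter, t_exit = max(t_enter, t_a), min(t_exit, t_b)
        if t_enter > t_exit:
            return False
    return True


def split_for_pre_punishment(traj: Sequence[Point], rect: Rect
                             ) -> Tuple[List[Segment], List[Segment]]:
    """Return (segments that still need Dist, segments that are only punished)."""
    kept: List[Segment] = []
    pre_punished: List[Segment] = []
    for p, q in zip(traj, traj[1:]):
        (kept if segment_intersects_rect(p, q, rect) else pre_punished).append((p, q))
    return kept, pre_punished


def top_k(query, candidates: Sequence, k: int,
          similarity: Callable[[object, object], float]) -> List[int]:
    """Indices of the k most similar candidates, kept in a min-heap of size k."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # smallest retained similarity on top
    for idx, cand in enumerate(candidates):
        score = similarity(query, cand)
        if len(heap) < k:
            heapq.heappush(heap, (score, idx))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, idx))
    return [idx for _, idx in sorted(heap, reverse=True)]
```

In use, `split_for_pre_punishment(candidate, sigma_br(query, sigma))` separates the candidate's segments, and the same call with the roles of query and candidate swapped handles the segments of $T'_q$ that lie outside $\sigma$-$BR(T')$.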

V. EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of the compression algorithm ROCE, and our two query processing algorithms RQC and SQC on compressed trajectories.

A. Experiment Setup

1) Datasets:

The experiments were conducted on 3 types of real-life datasets. The dataset Animal [10] records the migration of 8 young white storks originating from 8 populations, and the sampling rates of these trajectories are relatively low. The dataset Indoor [11] records the trajectories of visitors in a shopping center, and the points were sampled quite frequently. With a quite large size, the dataset Planet consists of quite a few trajectories distributed all over the globe, and the movement modes of these trajectories are very rich. These trajectories are sparsely distributed on the Earth, but mainly distributed in a large rectangular region, which is . ∗ km in area. All trajectories completely contained in this large region were selected as the raw dataset called Planet I. Since this dataset was to be compressed, the trajectories with fewer than 1000 points were removed. With 0.3 billion points in total, Planet I consists of 96279 raw trajectories.

When comparing the execution time of different compression algorithms, we found that some baseline algorithms were too time-consuming to run on the entire datasets. So we randomly sampled some long trajectories from Animal, Indoor and Planet I. This gave 3 subsets with 120, 90 and 47 long trajectories, called Animal II, Indoor II and Planet II respectively. These three subsets all contain about 2 million points.

2) Experimental Environment:

Experiments were all conducted on a Linux machine with a 64-bit, 8-core, 3.6GHz Intel(R) Core(TM) i9-9900K CPU and 32GB memory. Our algorithms were all implemented in C++ on Ubuntu 18.04. Each experiment was repeated over 3 times and the average was reported here.

B. Performance Evaluation for Trajectory Compression Algorithms

We compared our compression algorithm ROCE with 4 existing trajectory compression algorithms in online mode which use PED as their error metric, i.e. OPW(BOPW) [7], [8], BQS [1], [2], FBQS [1], [2] and OPERB [6]. Though the error metric of DOTS [12] is not PED but LISSED, it was still compared with ROCE, because it has demonstrated stable superiority against other online compression algorithms on some indicators [3]. The performance of these compression algorithms was measured by the execution time and the accuracy loss.

1) Execution Time:

In the first experiment, we evaluate the execution time of the 6 algorithms w.r.t. varying the compression rate, and the results are shown in Figure 12. The results show that ROCE is obviously faster than BQS, FBQS and DOTS. DOTS needs much more memory and time to handle situations in which the tracked object stays at the same place for a long time. Because there are such trajectories in Planet II, the execution time of DOTS is too long when the compression rate is close to 100, so we chose to stop the experiment. On the dataset Planet I, the execution times of OPERB and ROCE are nearly the same, and ROCE is faster than OPW when the compression rate is more than 100, which is consistent with their time complexity analyses.

Dataset sources (footnotes): http://dx.doi.org/10.5441/001/1.78152p3q, https://irc.atr.jp/crest2010_HRI/ATC_dataset/, https://wiki.openstreetmap.org/wiki/Planet.gpx

Fig. 12. Efficiency evaluation: varying the compression rate. Panels: (a) Animal II, (b) Indoor II, (c) Planet II, (d) Planet I. x-axis: Compression Rate; y-axis: Execution Time (s).

To evaluate the impact of the trajectory size (i.e. the number of points in a trajectory) on the execution time of compression, we chose the 20 largest trajectories from Animal, Indoor and Planet respectively, and varied the size of each trajectory from 5000 to 20000 while fixing the compression rate at 50. The results are reported in Figure 13, where the y-coordinates are the ratios of the execution time of compressing the trajectories to the execution time of compressing trajectories whose sizes are all 5000. Only ROCE and OPERB always scale well with the increase of the size of each trajectory on all datasets and show linear running time. The other algorithms do not: they need much more time to compress trajectories with more points, especially DOTS.

Fig. 13. Efficiency evaluation: varying the size of trajectories. Panels: (a) Animal II, (b) Indoor II, (c) Planet II. x-axis: The Size of each Trajectory (×1000 Points); y-axis: Time Rate.

2) Accuracy Loss:

In order to compare the accuracy loss of the compressed trajectories generated by these 6 algorithms, we evaluate the maximum PSED and the average PSED of the compressed trajectories w.r.t. varying the compression rate, and the results are shown in Figure 14. For each algorithm, the maximum PSED and the average PSED both increase with the compression rate. The maximum PSED of ROCE is always much smaller than those of OPERB and OPW. ROCE, FBQS and BQS always perform similarly in the maximum PSED. On the average PSED, ROCE always performs much better than most other algorithms. So the compressed trajectories generated by ROCE incur much less accuracy loss than the ones generated by most other algorithms, including the other fastest algorithms, i.e. OPW and OPERB. In summary, ROCE makes the best balance among the accuracy loss, the time cost and the compression rate.

Fig. 14. Evaluation of the maximum PSED and the average PSED: varying the compression rate. Panels (a) Animal II, (b) Indoor II, (c) Planet II show the Maximum PSED; panels (d) Animal II, (e) Indoor II, (f) Planet II show the Average PSED. x-axis: Compression Rate.

C. Performance Evaluation for Range Query Processing Algorithm RQC

For the RQC algorithm, our experimental goals include: 1) evaluate the deviation between the range query results on the dataset consisting of raw trajectories and the corresponding compressed dataset, 2) evaluate the execution time of range queries based on points or segments on the compressed trajectories, 3) evaluate the impacts of the area of each query region, 4) evaluate the impacts of Observation 2 on the execution time of range queries, 5) evaluate the impacts of ASP-tree on the execution time of range queries.

Planet I was compressed by ROCE into multiple compressed datasets with different compression rates. The experiments in Sections V-C and V-D were all performed on Planet I and its corresponding compressed datasets.

1) Range Query Based on Points or Segments:

We investigate the deviation between the range query results on the raw trajectories and the ones on the corresponding compressed trajectories. In order to evaluate this deviation, we define 3 evaluation metrics. When a range query is executed on the raw trajectories, each trajectory is seen as a sequence of discrete points, and $Q_R$ denotes the range query result set. When a range query is executed on the compressed trajectories, $Q_C$ denotes the range query result set. The precision rate $Pre$ and the recall rate $Rec$ of a range query are respectively defined as $Pre = |Q_R \cap Q_C| / |Q_C|$ and $Rec = |Q_R \cap Q_C| / |Q_R|$. For comprehensive comparison, the $F$-Measure is defined as a constant multiple of $(Pre + Rec)$.

We evaluate the average precision rate $Pre$, the average recall rate $Rec$ and the average $F$ of 100000 randomly generated range queries on the compressed datasets w.r.t. varying the compression rate, and the results are shown in Figure 15. The areas of these query regions are all 16 km². For range queries based on points, $Pre$ is always 1 since the points in each compressed trajectory must be a subset of the corresponding raw trajectory points. But as the compression rate increases, $Rec$ declines sharply, which means that range queries based on points leave up to 25% of the trajectories overlapped with the query regions undiscovered. For range queries based on segments, when the compression rate increases, though at most 3.5% of the trajectories in the result set do not overlap the query regions, many more trajectories overlapped with the query regions can be found. The $F$ of range queries based on segments is up to 10.3% higher than the $F$ of range queries based on points, so it is more accurate to use range queries based on segments on the compressed trajectories.

Fig. 15. Evaluation of $Pre$, $Rec$ and $F$ of range queries on the compressed trajectories: varying the compression rate. Three panels, each comparing range queries based on points with range queries based on segments. x-axis: Compression Rate.

Figure 16 shows the execution time of 10000 randomly generated range queries on the raw dataset and its corresponding compressed datasets with different compression rates. For fairness, we do not use any index to accelerate range queries based on points or segments. On the raw dataset, the execution time of 10000 range queries based on points is 148 s. Range queries based on points and on segments need similar execution time on the compressed trajectories, and both are obviously faster than range queries on the raw trajectories. In summary, it is more efficient and practical to execute range queries based on segments on the compressed trajectories.
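The precision and recall defined earlier in this subsection reduce to set operations on trajectory identifiers. The sketch below is illustrative only: `precision_recall` is a hypothetical name, results are modelled as Python sets of IDs, and returning 1.0 when a denominator is empty is an assumption of this sketch that the text does not specify.

```python
from typing import Hashable, Set, Tuple


def precision_recall(q_raw: Set[Hashable], q_comp: Set[Hashable]) -> Tuple[float, float]:
    """Pre = |Q_R ∩ Q_C| / |Q_C| and Rec = |Q_R ∩ Q_C| / |Q_R| for one range query."""
    hits = len(q_raw & q_comp)
    # Empty-result convention (assumption): treat an empty denominator as a perfect score.
    pre = hits / len(q_comp) if q_comp else 1.0
    rec = hits / len(q_raw) if q_raw else 1.0
    return pre, rec
```

For instance, a point-based query on compressed data that returns `{1, 2}` when the raw data yields `{1, 2, 3, 4}` has precision 1.0 and recall 0.5, which is the pattern described above for range queries based on points.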

2) Impacts of the Areas of Query Regions:

For each range query, the area of the query region has an impact on the number of trajectories in the result set and on the execution time. 100000 randomly generated range queries were executed on the compressed dataset whose compression rate is 100, and the area of each query region was varied from 5 km² to 30 km². The results are reported in Table II. Both the average number of trajectories in each result set and the execution time grow approximately linearly with the area of each query region. Our RQC algorithm is quite efficient and able to support about

Fig. 16. Efficiency evaluation: varying the compression rate. Curves: range query based on points, range query based on segments. x-axis: Compression Rate; y-axis: Execution Time (s).

TABLE II
EFFICIENCY EVALUATION: VARYING THE AREA OF EACH QUERY REGION

| The Area of Each Query Region (km²) | 5 | 10 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|
| Average Trajectory Number in the Result | 20.78 | 29.39 | 36.11 | 41.90 | 47.10 | 51.87 |
| Execution Time (s) | 2.82 | 3.18 | 3.44 | 3.67 | 3.91 | 4.08 |
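The claim of approximately linear growth can be checked directly against the numbers in Table II. The following sketch fits an ordinary least-squares line to each row and reports the coefficient of determination; the helper `linear_fit` is a hypothetical name, and only the values printed in the table are used.

```python
# Values copied from Table II.
areas = [5, 10, 15, 20, 25, 30]
avg_results = [20.78, 29.39, 36.11, 41.90, 47.10, 51.87]
exec_times = [2.82, 3.18, 3.44, 3.67, 3.91, 4.08]


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


for name, ys in (("average result size", avg_results),
                 ("execution time (s)", exec_times)):
    slope, intercept, r2 = linear_fit(areas, ys)
    print(f"{name}: slope={slope:.4f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

An R² value close to 1 for both rows supports reading the growth as approximately linear over the tested range, although the increments in both rows shrink slightly as the area grows.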

3) Impacts of Observation 2:

To study the impacts of Observation 2, we evaluate the execution time of 10000 randomly generated range queries w.r.t. varying the compression rate, and the results are shown in Figure 17. By using Observation 2, the acceleration rate is up to 19.5%, and the acceleration gets more obvious when the compression rate gets lower.

Fig. 17. Efficiency evaluation: Observation 2 when varying the compression rate. (Axes: Compression Rate vs. Execution Time (s); series: Without Using Observation 2, With Using Observation 2.)

4) Impacts of ASP-tree:

First, we study how much the ASP-tree index can accelerate range query processing. We evaluate the execution time of 10000 randomly generated range queries w.r.t. varying the compression rate, and the results are shown in Figure 18. It's obvious that ASP-tree accelerates range query processing greatly. By using ASP-tree, the execution time can be reduced to less than 1% of the time needed without the index.

Fig. 18. Efficiency evaluation: ASP-tree index when varying the compression rate. (Axes: Compression Rate vs. Execution Time (s); series: Without Using ASP-tree Index, With Using ASP-tree Index.)

ξ controls the average height of ASP-tree. The average height of ASP-tree is defined as the average height of all leaf nodes, where the height of the root node is 1. In this experiment, we evaluate the average height of ASP-tree and the execution time of 100000 randomly generated range queries on the compressed dataset whose compression rate is 100 w.r.t. varying ξ, and the results are shown in Table III. The average height in turn has an impact on the execution time of range queries: the greater the average height of ASP-tree, the less execution time these range queries need. We also find that the heights of all leaf nodes are almost the same. In particular, when ξ = 32000 or ξ = 64000, the heights of all leaf nodes are exactly the same, which shows that ASP-tree is not skewed and that our region partitioning strategy does work.
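The average-height measure defined above can be sketched for a generic tree as follows. The `Node` class is a hypothetical stand-in that only records children; it is an assumption made for illustration and does not reproduce the actual ASP-tree node layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type: a leaf is a node without children.
    children: List["Node"] = field(default_factory=list)


def leaf_heights(root: Node) -> List[int]:
    """Heights of all leaf nodes, counting the root as height 1."""
    heights = []
    stack = [(root, 1)]
    while stack:
        node, height = stack.pop()
        if not node.children:
            heights.append(height)
        else:
            stack.extend((child, height + 1) for child in node.children)
    return heights


def average_height(root: Node) -> float:
    """Average height of all leaf nodes, as defined in the text."""
    heights = leaf_heights(root)
    return sum(heights) / len(heights)


def is_balanced(root: Node) -> bool:
    """True if every leaf has the same height (the tree is not skewed)."""
    return len(set(leaf_heights(root))) == 1
```

A tree whose leaves all share one height, as reported for ξ = 32000 and ξ = 64000, makes `is_balanced` return `True`.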

TABLE III
EFFICIENCY EVALUATION: VARYING THE THRESHOLD VALUE ξ

Execution Time (s) | 3.48 | 3.49 | 5.38 | 5.33 | 9.92 | 10.02 | 21.89

D. Performance Evaluation for Similarity Query Processing Algorithm SQC

For the SQC algorithm, our experimental goals include: 1) compare the precision rate (the proportion of true trajectories in the top-k similarity result) of different trajectory similarity metrics, 2) evaluate the impacts of σ on the execution time of top-k similarity queries, 3) evaluate the impacts of the Pre-Punishment strategy on the execution time of top-k similarity queries.

1) Trajectory Similarity Metric AL:

We compare AL with two other trajectory similarity metrics, namely EDR [13] and EDwP [14]. EDwP is the state-of-the-art metric for measuring the similarity of trajectories with non-uniform and low sampling rates. EDR is a point-based trajectory similarity metric, and it's more robust and accurate than other distance functions [13], [15].

It's challenging to evaluate the accuracy of trajectory similarity because of the lack of a ground-truth dataset. To contend with this, we followed the methodology used in previous work [14], [16]. We first randomly chose 100 trajectories from Planet I as top-k similarity query trajectories. Next, we applied each trajectory similarity metric to find the top-k similarity result of each query trajectory from Planet I and took it as the ground truth. Finally, on each corresponding compressed dataset of Planet I, the 100 corresponding compressed trajectories were chosen as top-k similarity query trajectories. For each query, we found the top-k similarity result from the corresponding compressed dataset using each trajectory similarity metric, and then compared the result with the corresponding ground truth. The rationale behind this methodology is that a robust trajectory similarity metric should adapt to non-uniform and relatively low sampling rates and yield results close to those for relatively high sampling rate counterparts. Figure 19 shows the precision rates of different trajectory similarity metrics when the compression rate of the compressed dataset is varied. The precision of all methods decreases when the compression rate increases.

The precision rates of AL are all much higher than those of EDR, which means AL is clearly more suitable for similarity queries on compressed trajectories.

Fig. 19. The average precision rates of EDR and AL. (Axes: Compression Rate vs. Precision Rate; series: EDR, AL.)
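The precision rate used in this comparison can be written down as a short sketch. It assumes that each top-k result is given as a list of trajectory identifiers and that the ground truth for a query is the top-k result obtained on the raw dataset with the same metric; the function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def precision_rate(result_ids: Sequence[Hashable],
                   truth_ids: Sequence[Hashable]) -> float:
    """Proportion of ground-truth trajectories that appear in the top-k result."""
    k = len(truth_ids)
    return len(set(result_ids) & set(truth_ids)) / k


def average_precision_rate(results: List[Sequence[Hashable]],
                           truths: List[Sequence[Hashable]]) -> float:
    """Mean precision rate over all query trajectories (100 in the experiment)."""
    rates = [precision_rate(r, t) for r, t in zip(results, truths)]
    return sum(rates) / len(rates)
```

For example, if the top-4 result on a compressed dataset is `["a", "b", "x", "y"]` and the ground truth is `["a", "b", "c", "d"]`, the precision rate of that query is 0.5.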

2) Impacts of the Threshold Value σ:

σ is a given threshold value: when the minimum Euclidean distance between two compressed trajectories is greater than σ, the similarity between these two compressed trajectories must be 0 in AL. To evaluate the impacts of σ on the filtering rate of the similarity candidate set and on the execution time of 10 randomly generated top-k similarity queries on the compressed trajectories, whose compression rate is 100, we varied σ from 0.002 to 0.02 in longitude and latitude, and the results are reported in Table IV. When σ increases, there are more trajectories whose similarity to the queried one is greater than 0 in AL. The filtering rate of the similarity candidate set varies from 99.8% to 99.9%, which means that at most 0.2% of the trajectories in the queried dataset are left. So by using the similarity candidate set, most of the unnecessary calculations of AL can be avoided. The execution time increases with σ, since more trajectories are left in the similarity candidate set and more calculations of AL are needed.

TABLE IV
EFFICIENCY EVALUATION: VARYING THE THRESHOLD VALUE σ
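The role of σ as a filter can be illustrated with a brute-force sketch. It assumes 2-D compressed trajectories with at least two points each, computes the minimum Euclidean distance between two trajectories over all pairs of line segments, and keeps only the trajectories within σ of the query. All function names are hypothetical, and the exhaustive scan stands in for whatever index-based construction the similarity candidate set actually uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_seg_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def seg_seg_dist(a: Point, b: Point, c: Point, d: Point) -> float:
    """Minimum distance between segments ab and cd."""
    # Proper crossing: the distance is 0. Touching or overlapping segments
    # are covered by the endpoint distances below, which are then 0 as well.
    if orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0:
        return 0.0
    return min(point_seg_dist(a, c, d), point_seg_dist(b, c, d),
               point_seg_dist(c, a, b), point_seg_dist(d, a, b))


def min_traj_dist(t1: List[Point], t2: List[Point]) -> float:
    """Minimum Euclidean distance between two trajectories seen as segment chains."""
    return min(seg_seg_dist(a, b, c, d)
               for a, b in zip(t1, t1[1:])
               for c, d in zip(t2, t2[1:]))


def candidate_set(query: List[Point], dataset: List[List[Point]], sigma: float) -> List[int]:
    """Indices of trajectories whose similarity to the query can be non-zero."""
    return [i for i, t in enumerate(dataset) if min_traj_dist(query, t) <= sigma]


def filtering_rate(num_candidates: int, dataset_size: int) -> float:
    """Fraction of the dataset that is discarded before computing the similarity."""
    return 1.0 - num_candidates / dataset_size
```

Only the trajectories returned by `candidate_set` need a full similarity computation; a filtering rate of 99.8% corresponds to 0.2% of the dataset remaining.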

3) Impacts of Pre-Punishment strategy:

To study the impacts of the Pre-Punishment strategy, we evaluate the execution time of 10 randomly generated similarity queries w.r.t. varying the compression rate, and the results are shown in Figure 20. By using the Pre-Punishment strategy, the execution time can be reduced to as low as 17.8% of the time needed without it. It can also be observed that by querying on compressed trajectories with a higher compression rate, the execution time of similarity queries can be reduced greatly.

Fig. 20. Efficiency evaluation: the Pre-Punishment strategy when varying the compression rate. (Axes: Compression Rate vs. Execution Time (s); series: Without Using the Pre-Punishment strategy, With Using the Pre-Punishment strategy.)

VI. RELATED WORK

Compression algorithms in online mode. Because they are applied in many application scenarios, trajectory compression algorithms in online mode have attracted attention, and several algorithms have been proposed based on different error criteria. There are mainly 4 frequently used error criteria, i.e. PED, SED, DAD and LISSED, which are adopted to measure the degree of the accuracy loss after a trajectory is compressed. DAD defines the accuracy loss based on the greatest angular difference between two directions. Because DAD doesn't take into account the Euclidean distance between each discarded point and its corresponding line segment, a main weakness of DAD is that a discarded point may be too far away from its corresponding line segment, so that this line segment can't represent the discarded point well. For each discarded point, SED and LISSED use the attribute of time to find its corresponding synchronized point on its corresponding line segment. But the application of SED and LISSED is limited, because the attribute of time in each trajectory point isn't always available because of privacy or other reasons. Introduced in Section II-B, PED can be used in many more application scenarios, and there are mainly 4 trajectory compression algorithms in online mode using PED as their error metric, i.e. OPW (BOPW) [7], [8], BQS [1], [2], FBQS [1], [2] and OPERB [6]. OPW was proposed very early, and it compresses as long a trajectory segment as possible into a line segment. BQS builds a virtual coordinate system centered at the starting point. In each of the 4 quadrants, BQS establishes a rectangular bounding box as well as two bounding lines so that in most cases, a point can be quickly decided for removal or preservation without expensive error calculation. In FBQS, a fast version of BQS, error calculation is no longer needed, and a raw trajectory point is directly retained if it would need error calculation in BQS. FBQS is a little faster than BQS at the expense of compression rate. Based on a novel distance checking method, OPERB uses a directed line segment to approximate the buffered points. These algorithms have been compared in detail in Section V-B.
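The difference between the PED and SED criteria can be made concrete with a small sketch. Section II-B is not reproduced here, so the code assumes the common convention that PED is the perpendicular distance from a discarded point to the line through the two endpoints of its line segment, and that SED is the distance to the time-synchronized point obtained by linear interpolation. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

XY = Tuple[float, float]
XYT = Tuple[float, float, float]  # (x, y, timestamp)


def ped(p: XY, a: XY, b: XY) -> float:
    """Perpendicular Euclidean distance from p to the line through a and b (assumed definition)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def sed(p: XYT, a: XYT, b: XYT) -> float:
    """Synchronized Euclidean distance: distance from p to where the object
    would be on segment ab at p's timestamp under linear interpolation."""
    if b[2] == a[2]:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    ratio = (p[2] - a[2]) / (b[2] - a[2])
    sx = a[0] + ratio * (b[0] - a[0])
    sy = a[1] + ratio * (b[1] - a[1])
    return math.hypot(p[0] - sx, p[1] - sy)


def within_ped_bound(discarded: Sequence[XY], a: XY, b: XY, eps: float) -> bool:
    """True if every discarded point stays within eps of segment ab under PED."""
    return all(ped(p, a, b) <= eps for p in discarded)
```

PED needs only coordinates, which is why it remains applicable when timestamps are withheld, whereas `sed` cannot be evaluated without the time attribute.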

Trajectory Similarity Metric. Most widely used trajectory similarity metrics are based only on the distances between matched point pairs of two trajectories, such as ED [17], [18], DTW [19], LCSS [20], EDR [13], Swale [21], MA [22] and so on. Though they are widely used, none of them is suitable for compressed trajectories, as discussed in Section IV-A. From the different viewpoint of integration, DISSIM [23] defines the spatiotemporal dissimilarity between two trajectories during a definite time interval by integrating their Euclidean distance over time. When measuring the dissimilarity between two trajectories, DISSIM gives great weight to the time dimension, which means two trajectories are determined to be similar only if they have not only similar shapes but also similar speeds. In practical applications, however, this is not a must. EDwP [14] is designed to measure the similarity between a pair of raw trajectories under inconsistent and variable sampling rates. AL has been compared with EDR and EDwP in Section V-D.

VII. CONCLUSIONS

In this paper, each compressed trajectory is regarded as a sequence of continuous line segments rather than discrete points. These continuous line segments approximately describe the movement of the moving object. Based on this, we propose a complete set of solutions for efficiently compressing trajectories and querying compressed trajectories, including a new compression algorithm ROCE, a range query processing algorithm RQC and a similarity query processing algorithm SQC. An efficient index ASP-tree and many novel techniques are also presented to significantly accelerate trajectory compression, range queries and similarity queries. Extensive experiments are conducted on real-life trajectory datasets. The results demonstrate the superior performance of our methods.

REFERENCES

[1] J. Liu, K. Zhao, P. Sommer, S. Shang, B. Kusy, and R. Jurdak, “Bounded quadrant system: Error-bounded trajectory compression on the go,” in . IEEE, 2015, pp. 987–998.
[2] J. Liu, K. Zhao, P. Sommer, S. Shang, B. Kusy, J.-G. Lee, and R. Jurdak, “A novel framework for online amnesic trajectory compression in resource-constrained environments,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 11, pp. 2827–2841, 2016.
[3] D. Zhang, M. Ding, D. Yang, Y. Liu, J. Fan, and H. T. Shen, “Trajectory simplification: An experimental study and quality analysis,” Proc. VLDB Endow., vol. 11, no. 9, pp. 934–946, May 2018. [Online]. Available: https://doi.org/10.14778/3213880.3213885
[4] B. Ke, J. Shao, Y. Zhang, D. Zhang, and Y. Yang, “An online approach for direction-based trajectory compression with error bound guarantee,” in Asia-Pacific Web Conference. Springer, 2016, pp. 79–91.
[5] B. Ke, J. Shao, and D. Zhang, “An efficient online approach for direction-preserving trajectory simplification with interval bounds,” in . IEEE, 2017, pp. 50–55.
[6] X. Lin, S. Ma, H. Zhang, T. Wo, and J. Huai, “One-pass error bounded trajectory simplification,” Proceedings of the VLDB Endowment, vol. 10, no. 7, pp. 841–852, 2017.
[7] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in Proceedings 2001 IEEE International Conference on Data Mining. IEEE, 2001, pp. 289–296.
[8] N. Meratnia and A. Rolf, “Spatiotemporal compression techniques for moving point objects,” in International Conference on Extending Database Technology. Springer, 2004, pp. 765–782.
[9] J. E. Hershberger and J. Snoeyink, Speeding up the Douglas-Peucker line-simplification algorithm. University of British Columbia, Department of Computer Science, 1992.
[10] A. Flack, W. Fiedler, J. Blas, I. Pokrovski, B. Mitropolsky, M. Kaatz, K. Aghababyan, A. Khachatryan, I. Fakriadis, E. Makrigianni, L. Jerzak, M. Shamin, C. Shamina, H. Azafzaf, C. Feltrup-Azafzaf, T. Mokotjomela, and M. Wikelski, “Data from: Costs of migratory decisions: a comparison across eight white stork populations,” 2015. [Online]. Available: http://dx.doi.org/10.5441/001/1.78152p3q
[11] D. Brščić, T. Kanda, T. Ikeda, and T. Miyashita, “Person tracking in large public spaces using 3-d range sensors,” IEEE Transactions on Human-Machine Systems, vol. 43, no. 6, pp. 522–534, 2013.
[12] W. Cao and Y. Li, “Dots: An online and near-optimal trajectory simplification algorithm,” Journal of Systems and Software, vol. 126, pp. 34–44, 2017.
[13] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005, pp. 491–502.
[14] S. Ranu, P. Deepak, A. D. Telang, P. Deshpande, and S. Raghavan, “Indexing and matching trajectories under inconsistent sampling rates,” in . IEEE, 2015, pp. 999–1010.
[15] B. Zhang, Y. Shen, Y. Zhu, and J. Yu, “A gpu-accelerated framework for processing trajectory queries,” in . IEEE, 2018, pp. 1037–1048.
[16] H. Su, K. Zheng, H. Wang, J. Huang, and X. Zhou, “Calibrating trajectory data for similarity-based analysis,” in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 833–844. [Online]. Available: https://doi.org/10.1145/2463676.2465303
[17] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’94. New York, NY, USA: Association for Computing Machinery, 1994, p. 419–429. [Online]. Available: https://doi.org/10.1145/191839.191925
[18] B.-K. Yi and C. Faloutsos, “Fast time sequence indexing for arbitrary lp norms,” in VLDB, vol. 385, no. 394, 2000, p. 99.
[19] D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series.” in KDD workshop, vol. 10, no. 16. Seattle, WA, 1994, pp. 359–370.
[20] M. Vlachos, G. Kollios, and D. Gunopulos, “Discovering similar multidimensional trajectories,” in International conference on data engineering, 2002, pp. 673–684.
[21] M. D. Morse and J. M. Patel, “An efficient and accurate method for evaluating time series similarity,” in Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, 2007, pp. 569–580.
[22] S. Sankararaman, P. K. Agarwal, T. Mølhave, J. Pan, and A. P. Boedihardjo, “Model-driven matching and segmentation of trajectories,” in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2013, pp. 234–243.
[23] E. Frentzos, K. Gratsias, and Y. Theodoridis, “Index-based most similar trajectory search,” in