Faster and Smaller Two-Level Index for Network-based Trajectories

Abstract

Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top-level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the use of a compact data structure on the bottom level of these indexes. Our experimental evaluation shows that our approach is both faster and smaller than existing solutions.



Rodrigo Rivera, Andrea Rodríguez, and Diego Seco

Department of Computer Science, University of Concepción, Chile
{rodrivera,andrea,dseco}@udec.cl

Keywords:

Space-efficient data structures · Moving-objects · Indexing.

Spatio-temporal information has gained popularity in decision-making systems, such as the optimization of transportation systems, urban planning, and so on. The proliferation of different types of sensors to capture or generate this kind of data has made these applications possible but, at the same time, it has also made the storage and processing of spatio-temporal data challenging. The work in this paper focuses on a subcategory of spatio-temporal data, namely trajectories of moving objects, which can be reconstructed by the GPS devices of smartphones or, at a different granularity, by smart transportation cards.

Trajectories can be classified as free trajectories, in which movement is not constrained, and network-based trajectories, in which movement is constrained to a network and cannot exist outside such network. Hurricanes and animal migrations are examples of the former, whereas public transportation is an example of the latter. Useful queries that can be answered by handling trajectories are: count the number of vessels inside a region during a time period (e.g. fishing closed season) or find the shortest path between two stops of a transportation system during a time period.

Funded in part by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 690941, by CONICYT-PFCHA/Magíster Nacional/2016 - 22161080 (R.R.), and by the Millennium Institute for Foundational Research on Data and Fondecyt-Conicyt grant number 1170497.

arXiv [cs.DS]

Several spatio-temporal indexes have been proposed to handle both free and network-based trajectories. However, classical solutions to deal with moving-object data are inefficient when facing the data volume collected through new sensor technology and the increasing interest in data analysis. On the other hand, space-efficient data structures have proved successful for handling large volumes of data in many different domains, such as the Web, biological sequences, documents and code repositories, to name some examples.

In this work, we focus on two-level indexes for network-based trajectories and propose a new solution that uses compact data structures on the bottom level. This approach turns out to be smaller and faster than existing solutions.

A data structure for trajectories must provide access methods that allow the processing of spatio-temporal queries. These queries can be classified into coordinate-based and trajectory-based queries [25]. Coordinate-based queries include time-slice queries, which determine the position of objects at a given time instant, time-interval queries, which extend time-slice queries to a time range, and queries about nearby neighbors. Trajectory-based queries include topological queries, which involve information regarding the movement of an object, and queries related to navigation, which involve information derived from the movement, such as speed or direction. There also exist combinations such as "Where was object X at a given time instant?".

Various data structures have been proposed to efficiently support queries on trajectories. These structures can be broadly classified into two categories: i) Data structures to support free movements on a space, such as the 3D R-tree (a three-dimensional extension of the R-tree [17]), the TB-tree [25] (which preserves the trajectories while allowing typical range queries on an R-tree) and the MV3R-tree [31] (which uses a multi-version R-tree, called MVR-tree, along with an auxiliary 3D R-tree). ii) Data structures to support movements on networks, such as the FNR-tree [14] (which uses a combination of a 2D R-tree with a forest of 1D R-trees), the MON-tree [1] (using 2 levels of 2D R-trees) and PARINET [27] (based on graph partitioning and the use of B+-trees). Among the previous structures, the FNR-tree and the MON-tree have in common the separation of the spatial and temporal dimensions, using a (two-dimensional) spatial structure and a forest of (one-dimensional) temporal structures to tackle each of these sub-problems separately.

Like the FNR-tree and the MON-tree, we focus on these two-level indexes.
To solve the spatial problem, that is, the representation of the network in space (a two-dimensional plane), the aforementioned structures use a 2D R-tree, storing the segments of the network as lines. With the spatial problem solved, time has to be associated with the segments of the network. More precisely, it is necessary to look for all the time intervals (times in which some objects pass through a segment) that intersect a given query interval. This problem is known in the literature as interval intersection, an extension of the interval stabbing problem [29]. Classical structures to solve this problem are Interval trees and Priority trees [5].

As for the subproblem in the temporal dimension, the FNR-tree makes use of a one-dimensional R-tree for each segment. These 1D R-trees index the objects whose trajectories pass through the segments of the network, storing the instants at which they enter and leave the segment in the form of a time interval $(t_{entry}, t_{exit})$. Since only these intervals are stored, the structure assumes that objects do not stop or change speed or direction in the middle of a segment; they can only do so at nodes. The MON-tree eliminates this restriction by replacing the one-dimensional R-trees with two-dimensional R-trees, which store the relative movement within the segments as rectangles of the form $(p_1, p_2, t_1, t_2)$, with $(p_1, p_2)$ a range of relative positions and $(t_1, t_2)$ a temporal interval.

While some of the aforementioned structures support queries efficiently on large datasets, they are incapable of handling the increasing data volume of current applications. This has forced the use of compression techniques for data storage and transmission.
Some techniques reduce the number of points in a curve [22], while others use features at each point, such as speed and orientation [26]. Both techniques work in free spaces and, when the movement is restricted to networks, it is possible to achieve even better compression, as shown in [18,19,20,28].

These compression techniques improve the storage requirements and transmission time of large datasets. However, the compression can be directly exploited by data structures that maintain a compact representation of the data while allowing for indexed search capabilities. These structures have been called self-indexes and have been successfully implemented in other domains, such as information retrieval [23].

Recently, compact data structures have also been used for the representation of trajectories. GraCT [9], for free paths, uses a $k^2$-tree [10] to store the absolute position of the objects at regular time intervals (snapshots) plus compressed logs for the representation of the movements between snapshots. ContaCT [8] improves GraCT with more efficient logs. Both structures answer spatio-temporal queries where space and time are the main filters, such as "find the trajectories that went through a specific region at a given time instant". On the other hand, CTR [7] supports trajectories restricted to networks by combining compressed suffix arrays (CSA), to represent the nodes of the network an object passes through, and a balanced Wavelet matrix for the temporal component of the movement. In CTR, trajectories (or trips) are defined as sequences of labels, which represent the nodes of the network. Hence, it solves other types of queries in which the space is represented with such labels, such as "find the number of trajectories that started at X and ended at Y". This is a fundamental difference with our proposal, in which the spatial dimension consists of coordinates in a two-dimensional space, and not labels.
This is also the main difference with CiNCT [20], which improves on CTR in terms of memory usage and query time.

Another difference with previous solutions is that our approach uncouples the network from the trajectories. This model, known as Network-Matched, has been successfully used [12,21], but without using compact data structures in its implementation. Our approach has the advantage that mapping trajectories to a network facilitates finding similar trajectories and, in consequence, allows a better use of space.

Similarly to the FNR-tree and MON-tree, we propose an index with two levels: spatial (top level) and temporal (bottom level). In a preliminary experimental evaluation, we observed that the spatial level requires negligible space compared with the temporal level. For example, for the Oldenburg network (see Section 4), in the baseline structure, the temporal level uses about 89% of the total memory with 1,000 objects circulating and about 94% with 2,000 objects. The more data are stored, the more negligible the spatial level becomes (due to the almost-static nature of the transportation network in comparison with the moving objects). Hence, we focus on optimizing the temporal level and process the spatial level with a two-dimensional R-tree, as the FNR-tree and MON-tree do. Recall that the R-tree is a balanced tree in which each leaf stores an entry of the form (id, MBB), where id is a reference to the data (in this case to a temporal index) and MBB is the Minimum Bounding Box that covers the spatial object (a line segment, in this domain). The R-tree does not provide worst-case guarantees, as it may be forced to examine the entire tree in $O(n)$ time even when the output is empty. However, it performs well in practice and is ubiquitous in spatial databases.

Each leaf of the R-tree contains a reference to a temporal index. These indexes solve the Interval Intersection problem. Before presenting alternatives to solve this problem, we give an overview of a query algorithm for spatio-temporal range queries, which are the most general coordinate-based queries. First, a spatial query, a 2D window, is solved on the 2D R-tree, which returns a set of leaves whose segments may intersect the window. As in most spatial indexes, a refinement step is then executed to eliminate false positives, i.e. network segments whose MBBs intersect the window but which do not themselves intersect it. After this refinement, the interval intersection query is executed in each temporal index referenced by the remaining leaves of the R-tree. Results from all these temporal indexes are then combined using an implementation of a set.
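The query algorithm just outlined can be illustrated with a minimal sketch. It is not the authors' implementation: a linear scan over the segments stands in for the 2D R-tree, a plain list of visits stands in for each temporal index, and the names `Segment`, `mbb_intersects`, `segment_intersects_window` and `range_query` are hypothetical. Closed time intervals are assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A network edge plus the visits recorded on it (hypothetical layout)."""
    p: tuple                                     # (x, y) of one endpoint
    q: tuple                                     # (x, y) of the other endpoint
    visits: list = field(default_factory=list)   # (t_entry, t_exit, object_id)


def mbb_intersects(seg, window):
    """Filter step: does the segment's bounding box overlap the window?"""
    xmin, ymin, xmax, ymax = window
    sx0, sx1 = sorted((seg.p[0], seg.q[0]))
    sy0, sy1 = sorted((seg.p[1], seg.q[1]))
    return sx0 <= xmax and xmin <= sx1 and sy0 <= ymax and ymin <= sy1


def segment_intersects_window(seg, window):
    """Refinement step: exact segment/rectangle test (Liang-Barsky clipping)."""
    (x0, y0), (x1, y1) = seg.p, seg.q
    xmin, ymin, xmax, ymax = window
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return False          # parallel to this border and outside it
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)       # segment enters the window at t
            else:
                t1 = min(t1, t)       # segment leaves the window at t
            if t0 > t1:
                return False
    return True


def range_query(segments, window, t_lo, t_hi):
    """Objects seen inside `window` at some time in [t_lo, t_hi]."""
    result = set()                    # the set merges results across segments
    for seg in segments:              # stand-in for the R-tree search
        if not mbb_intersects(seg, window):
            continue
        if not segment_intersects_window(seg, window):
            continue                  # false positive removed by refinement
        for t_entry, t_exit, obj in seg.visits:   # stand-in temporal index
            if t_entry <= t_hi and t_lo <= t_exit:
                result.add(obj)
    return result
```

In this sketch a diagonal segment whose bounding box touches the window but which never enters it is discarded by the refinement step before any temporal work is done, which mirrors the order of operations described above.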

Unlike the FNR-tree and MON-tree, which use variants of an R-tree, we explore the use of specialized data structures for the interval-intersection problem.
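For reference, the interval-intersection problem itself can be stated in a few lines. The brute-force version below is only a correctness oracle against which any specialized structure can be checked; it assumes closed intervals given as `(lo, hi)` pairs, and the function names are hypothetical.

```python
def intersects(interval, query):
    """Two closed intervals intersect when each starts before the other ends."""
    lo, hi = interval
    q_lo, q_hi = query
    return lo <= q_hi and q_lo <= hi


def brute_force_intersection(intervals, query):
    """Report every interval that intersects the query, in O(n) time."""
    return [iv for iv in intervals if intersects(iv, query)]


def brute_force_stabbing(intervals, point):
    """Interval stabbing is the special case of a degenerate query."""
    return brute_force_intersection(intervals, (point, point))
```

The specialized structures aim to replace the linear scan with a cost that depends mostly on the number of reported intervals.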

Interval-tree [5].

This is a binary tree that is constructed recursively in the following way: i) The median $x_{med}$ of all the interval endpoints is computed. ii) Intervals are classified into three sets, $I_{med}$, $I_{left}$ and $I_{right}$, which contain the intervals stabbed by $x_{med}$, the intervals to the left of $x_{med}$ and the intervals to the right of $x_{med}$, respectively. iii) $I_{med}$ is stored in a structure composed of two arrays sorted by left and right endpoints, and associated with the root, whereas $I_{left}$ and $I_{right}$ are recursively processed and assigned as left and right child, respectively.

A search for the intervals that intersect the query interval $(l_q, r_q)$ is solved recursively starting from the root. The intervals within the visited node that intersect the query interval are returned, and the search continues in the left child if $l_q$ is less than $x_{med}$ and/or in the right child if $r_q$ is greater than $x_{med}$. This data structure requires linear space and $O(\log n + k)$ query time, where $k$ is the number of reported results.

Schmidt.

The structure presented in [29] to solve the Interval Stabbing problem can be extended to also solve the Interval Intersection problem [11]. It defines the father of an interval as the rightmost interval among those that cover it completely. This relation forms a tree where siblings are ordered from left to right, and the root of the tree is a special node that acts as the father of all the intervals that are not covered by any other. In addition, for each possible endpoint of an interval, the structure stores an array called start, with a pointer to the node representing the rightmost-starting interval that intersects such point, and an array start2, storing a pointer to the node representing the rightmost interval starting up to such point (which may not be stabbed by it).

To solve an interval intersection query $q$, the algorithm first reports the rightmost interval that intersects $q$, which is $\max(start[l_q], start[r_q])$, if it exists. Then, the algorithm recursively reports the siblings to the left of the node while their right endpoint is greater than or equal to $l_q$, also searching among the right children of the reported nodes. This structure requires linear space and optimal $O(1 + k)$ query time. Note, however, that this solution works only for small integer ranges. In order to work with intervals whose endpoints are floats, these endpoints are stored in sorted arrays and two binary searches are used to translate the query to rank space [11], which results in a total complexity of $O(\log n + k)$.

Compact data structure based on Independent Interval Sets (IIS).

A set of intervals $I = \{i_1, i_2, \ldots, i_n\}$ is called an Independent Interval Set if no interval $i_j \in I$ is contained in any other interval $i_k \in I$.

Reporting the $k$ intervals of an IIS that intersect a query interval $Q = [l_q, r_q]$ is easy if the intervals are kept in order. Note that, by definition of IIS, the order of the left endpoints of the intervals is the same as that of the right endpoints. If the first and the last interval intersected by the query are located, it is enough to iterate between them to return all the intersected intervals (see Figure 1).

Fig. 1: An Independent Interval Set (IIS) and its representation with two bitvectors. In red, the last interval stabbed in start and the first one in end.

In order to locate these two intervals, we could store the left and right coordinates of the intervals in two sorted arrays and use binary search to locate them, which is similar to what we did in the previous solution. However, for this domain, we propose a simple solution that facilitates the use of compact data structures. Recall that the endpoints of our intervals are timestamps represented as float numbers. We multiply these timestamps by a scale factor to convert them to integers. For example, if we work with timestamps with up to 6 decimals, it is enough to multiply each one of them by $10^6$ to discretize the space. With this procedure, we obtain integer endpoints in a universe $U$ and, the larger the scale factor, the larger the universe.

After this discretization, we use two bitvectors, one for the left endpoints, start, and another for the right endpoints, end, of each interval in the set (see Figure 1). A 1-bit in these bitvectors indicates that an interval starts (or ends, respectively) at such position. Then, for a query $Q = [l_q, r_q]$ also discretized to this universe, two rank operations on these bitmaps are used to locate the first and last intervals intersected by the query: $rank(end, l_q)$ and $rank(start, r_q)$, respectively. As we mentioned above, the larger the scale factor, the larger the size of the universe $u$, which is the number of bits in these bitmaps. However, the number of set bits in them is $n$, which is the number of intervals (independently of the scale factor). Hence, we use the Elias-Fano representation [24] for these bitmaps, which takes $2n + n \log \frac{u}{n}$ bits of space. Note that, for a constant $c$ and $u = O(n^c)$, it uses linear space, as the previous structures do. The query time of rank operations on these bitmaps is $O(\log \frac{u}{n})$; thus, this structure can report the $k$ intervals intersecting the query in $O(\log \frac{u}{n} + k)$ time.

Although this solution only works for an IIS, a general set of intervals can be decomposed into $m$ independent sets in $O(n \log m)$ time, for example with Fredman's algorithm [13] to find the optimal number of shuffled upsequences in a permutation (by considering the right endpoints of the intervals as the permuted values). This leads to a solution that requires $O(m \log \frac{u}{n} + k)$ time to report the $k$ solutions.
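The decomposition and the per-set query can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: binary search over two sorted Python lists stands in for the rank operations on Elias-Fano bitvectors, a greedy patience-sorting-style split stands in for Fredman's algorithm [13], endpoints are assumed to be already discretized integers, and the names `decompose`, `IndependentIntervalSet` and `intersect_all` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def decompose(intervals):
    """Greedily split closed intervals (lo, hi) into independent interval sets.

    Intervals are visited by increasing left endpoint (ties: longer first), and
    each one joins the set whose last right endpoint is the largest one still
    smaller than its own, so both endpoints increase strictly inside a set.
    """
    sets, neg_tails = [], []          # neg_tails[i] = -(last right end of set i)
    for iv in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        i = bisect_right(neg_tails, -iv[1])
        if i == len(sets):
            sets.append([iv])         # no compatible set: open a new one
            neg_tails.append(-iv[1])
        else:
            sets[i].append(iv)
            neg_tails[i] = -iv[1]
    return sets


class IndependentIntervalSet:
    """One IIS stored as two sorted endpoint arrays (bitvector stand-in)."""

    def __init__(self, intervals):
        self.intervals = list(intervals)          # sorted by both endpoints
        self.starts = [lo for lo, _ in self.intervals]
        self.ends = [hi for _, hi in self.intervals]

    def query(self, q_lo, q_hi):
        first = bisect_left(self.ends, q_lo)      # skip intervals ending before q_lo
        last = bisect_right(self.starts, q_hi)    # stop at intervals starting after q_hi
        return self.intervals[first:last]         # empty when first >= last


def intersect_all(index, q_lo, q_hi):
    """Query every independent set and concatenate the reported intervals."""
    out = []
    for iis in index:
        out.extend(iis.query(q_lo, q_hi))
    return out
```

A possible usage is `index = [IndependentIntervalSet(s) for s in decompose(intervals)]` followed by `intersect_all(index, l_q, r_q)`. Each query performs two searches per independent set, so the cost grows with the number of sets even when nothing is reported.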
The decomposition-based solution does not provide worst-case guarantees, as $m$ can be as large as $n$; however, this adaptive analysis shows that it is an efficient solution for domains in which $m$ is small. The empirical evaluation in the next section shows that this is precisely the case in our domain.

All the implementations evaluated in this paper were coded in C++11. For the baselines, we use some available implementations: R-tree [2], Interval-tree [15] and Schmidt [30]. (Footnote: This implementation uses sequential search in each node, which is not optimal in theory, but performs well in practice.) We also make use of some succinct data structures from the SDSL library [16]. The experiments were run on a computer with a 3.00 GHz Intel Xeon E3-1220 v5 CPU and 64 GB of RAM, and the implementations were compiled with g++ 5.4.0 on Ubuntu 16.04 (64 bits).

We first evaluate the performance of all the implementations for interval intersection on synthetic datasets, and then the best candidates are evaluated in the complete solution for network-based trajectories.

We evaluated the performance in three scenarios with different types of intervals: i) fixed size (Figure 2), ii) random size (Figure 3), and iii) intervals of trajectories extracted from a trajectories dataset generated with Brinkhoff's generator [6] over San Francisco's network (Figure 4). For each of these scenarios, we created a dataset with 800,000 intervals and a queryset with 500 random queries. Reported query time is the total time to solve all the queries.

Fig. 2: Fixed size intervals. Panels: a) Query time (s), b) Space, memory usage in MB (log scale), c) Construction time (s), each plotted against the number of intervals for R-tree, Interval-tree, Schmidt and IIS.

Figure 2 shows the performance of the structures using fixed size intervals. The compact data structure shows the best performance among the four structures, with a considerable advantage in both query time and memory usage. In this scenario, intervals do not fully cover each other (except for precision issues), which produces a low number of independent sets in the IIS structure (only 6 for 800,000 intervals). This explains the outstanding performance of IIS.

Fig. 3: Random size intervals. Panels: a) Query time (s), b) Space, memory usage in MB (log scale), c) Construction time (s), each plotted against the number of intervals for R-tree, Interval-tree, Schmidt and IIS.

Figure 3 shows the performance of the structures for random size intervals. The compact data structure keeps the best results in query time and memory usage (although in a tie with the Interval-tree), while the building time increases drastically (up to 900 times the building time of the Interval-tree). This is explained by the high number of independent sets (3,273 for 800,000 intervals), which is caused by the frequency with which intervals fully cover each other.

Fig. 4: Intervals from trajectories. Panels: a) Query time (s), b) Space, memory usage in MB (log scale), c) Construction time (s), each plotted against the number of intervals for R-tree, Interval-tree, Schmidt and IIS.

Figure 4 shows the performance of the structures using time intervals extracted from synthetic trajectories obtained with Brinkhoff's generator. The compact data structure shows a performance in between the two previous cases, but more similar to the first one. This shows the sensitivity of the structure to the number of independent sets. In this dataset, intervals of trajectories often have similar length, producing a relatively low number of independent sets (29 for 800,000 intervals). Each temporal index is associated with a segment of the network, and moving objects usually traverse the same segment at a similar speed.

We also evaluated the sensitivity of the structures to the scale used to transform the original float-number times to integers. In this procedure, each time is multiplied by a scale factor and then truncated. In our datasets, original times use up to 8 digits to the right of the decimal point. Hence, a scale factor of $10^8$ guarantees a lossless transformation, whereas lower scale factors may produce a lossy transformation. In these experiments, we used the same 800,000 intervals of trajectories of the previous evaluation, and results are shown in Figure 5.

Query time shows an almost constant behavior, except for the increase suffered by Schmidt, which is caused by the high number of duplicates when only 2 or fewer digits are used for the fractional part. In terms of space and construction time, the compact data structure is more sensitive than the other structures, which is caused by the scaling process. As we explained in the previous section, the larger the scale factor, the larger the size of the bitmaps in this structure. Even so, this structure obtains the best results in both query time and memory usage, also giving the possibility to improve the performance in applications where the user can afford losing some precision.
Note, however, that in all the other experiments we consider all the decimals, which is the worst case for our proposal.

Fig. 5: Performance according to the scale of the intervals. Panels: a) Query time (s), b) Space, memory usage in MB (log scale), c) Construction time (s), each plotted against the scale for R-tree, Interval-tree, Schmidt and IIS. The last point of IIS in the last graph was omitted, because it is about 40 times larger than the others.
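The scaling step evaluated above (multiply by a power of ten, then truncate) can be sketched in a few lines. This is a hedged illustration, not the paper's code: the function name `discretize` is hypothetical, and timestamps are passed as decimal strings so that `decimal.Decimal` keeps every digit exactly, which a binary float would not guarantee.

```python
from decimal import Decimal


def discretize(timestamp, digits):
    """Scale a decimal timestamp by 10**digits and truncate to an integer."""
    return int(Decimal(timestamp) * (10 ** digits))


def discretize_interval(t_entry, t_exit, digits):
    """Apply the same scale to both endpoints of a time interval."""
    return discretize(t_entry, digits), discretize(t_exit, digits)
```

With 8 fractional digits and `digits=8` the mapping is lossless, whereas a small value such as `digits=2` sends nearby timestamps (for example "12.341" and "12.349") to the same integer, which is the source of the duplicates mentioned above.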

From the experiments in the previous section, we conclude that Schmidt's structure is always outperformed by the others, and thus it is not considered in the implementation of data structures for trajectories. In the following experiments we compare our proposal, based on compact data structures, with two baselines: the original FNR-tree and an ad-hoc baseline in which 1D R-trees are replaced by interval trees. Note that in these experiments we are comparing three two-level indexes, all of them using a 2D R-tree on the top level.

The datasets of trajectories were created using Brinkhoff's generator [6] over the real road networks of Oldenburg and San Francisco. The former consists of 6,105 nodes and 7,305 edges, whereas the latter consists of 175,343 nodes and 223,343 edges. We created trajectories for 1,000, 2,000, 3,000, 4,000 and 5,000 objects during 100 units of time for both networks.

Memory usage.

Figure 6 shows the space required by each of the structures. The proposed space-efficient solution (labeled as IIS in the graphs) obtained the best results in all the experiments. In addition, the larger the number of objects moving over the network, the larger the advantage of this structure over the baselines. For a small number of moving objects, the total space used by the data structures is dominated by the spatial level; however, as this number increases, the temporal level dominates, and our proposal takes more advantage.

Fig. 6: Total memory usage. Two panels (Oldenburg and San Francisco) showing memory usage (MB) for FNR-tree, baseline and IIS.

Structure   Old.   S.F.
FNR-tree    5      32
baseline    4      26
IIS         1.5    11

Table 1: Memory usage per object (KB / object) [5,000 objects and 100 time units].

The approximate memory usage per object is shown in Table 1, which shows that our approach requires about 70% less memory than the FNR-tree, and about 60% less memory than the baseline, when there are 5,000 objects moving over the networks. The difference between the two datasets is explained by the size of the network, the San Francisco network being much larger. First, part of the space charged to each object is due to the spatial index. Second, the size of the network also has an impact on the distribution of objects per edge of the network. As this distribution is very skewed, the larger the network, the larger the number of nodes with few objects, which means an overhead.
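The savings quoted above follow directly from Table 1, as the short calculation below shows. The dictionary simply transcribes the table (KB per object), and the helper name `saving` is hypothetical.

```python
# Memory usage per object in KB, transcribed from Table 1.
TABLE_1 = {
    "FNR-tree": {"Old.": 5.0, "S.F.": 32.0},
    "baseline": {"Old.": 4.0, "S.F.": 26.0},
    "IIS": {"Old.": 1.5, "S.F.": 11.0},
}


def saving(structure, network):
    """Fraction of memory saved by IIS relative to another structure."""
    return 1.0 - TABLE_1["IIS"][network] / TABLE_1[structure][network]


for network in ("Old.", "S.F."):
    for structure in ("FNR-tree", "baseline"):
        print(f"{network} vs {structure}: {saving(structure, network):.1%}")
```

For Oldenburg this gives 1 - 1.5/5 = 70% against the FNR-tree and 1 - 1.5/4 = 62.5% against the baseline; the San Francisco figures are somewhat lower, consistent with the "about 70%" and "about 60%" stated in the text.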

Query Time.

The time performance of the structures was evaluated for three types of queries, which are the same as those used in the original evaluation of the FNR-tree [14]: i) Range Queries with Equal Spatial and Temporal Extent, such as "find all objects within a given area during a given time interval"; ii) Range Queries with Larger Temporal Extent, which query for very large time intervals, including intervals spanning the whole temporal dimension, such as "find all the objects having ever passed through a given area"; and iii) Time Slice Queries, which only consider a time instant, such as "find all the objects that were in a given area at a given time instant". For each of these scenarios, we created three query-sets with 500 random queries for each network.

Figure 7 shows the results for the first type of queries. The first row shows results for Oldenburg and the second row for San Francisco. For both datasets, we show the results of random queries of different sizes, 1%, 10% and 20% in each dimension. Similar frameworks will be used to evaluate the other two types of queries. This is the same experimental setting used in [14].

In all the experiments our proposal outperforms both baselines. Only for small queries, 1% of the dimensions, does the FNR-tree show results competitive with our proposal. This is more evident in the largest network. The reason is the relative importance of the spatial part of the query with respect to the temporal part, which depends on the size of the network. Also important, our proposal shows better scalability with the number of objects moving through the network.

Figure 8 shows the results for range queries with larger temporal extent. In these experiments the temporal extent is always larger than the spatial extent, spanning the whole temporal dimension in the second and third columns.

Results in this scenario are similar to the previous one, but the advantage of our proposal is even more obvious. Recall that our structure performs two rank operations in each independent set and then it just iterates over the results, which is very efficient. Finally, Figure 9 shows the results for time slice queries.

The analysis of these experiments is quite different from the previous ones, as the FNR-tree usually outperforms all the other approaches. There are two main reasons for this. First, large spatial queries lead to querying many temporal indexes (all of them for the experiments in the last column).
Second, most of these queries to temporal indexes produce empty results or very few results, which is expensive in our proposal. Each of these queries needs to perform the two rank operations in each independent set just to detect that there are no results to iterate through. Hence, this scenario represents the worst case for our proposal.

Fig. 7: Range Queries. First row for Oldenburg and second row for San Francisco. Each column contains queries of different size from 1% to 20%. Each panel plots time (µs) against the number of objects for FNR-tree, baseline and IIS.

Fig. 8: Range Queries with Larger Temporal Extent. First row for Oldenburg and second row for San Francisco. Each column indicates x% - y%, x being the size of each spatial dimension (1% or 10%) and y the size of the time intervals (10% or 100%). Each panel plots time (µs) against the number of objects for FNR-tree, baseline and IIS.

Fig. 9: Time Slice Queries. First row for Oldenburg and second row for San Francisco. Each column contains queries of different spatial extent (1% to 100%). Each panel plots time (µs) against the number of objects for FNR-tree, baseline and IIS.

We have proposed a new data structure for trajectories of moving objects whose movement is constrained to a network. Our proposal is inspired by two-level indexes, such as the FNR-tree and MON-tree and, indeed, we use the same two-dimensional R-tree for the spatial dimension. Hence, the difference from previous solutions is in the temporal dimension. This is justified by our experimental evaluation showing that the spatial dimension requires negligible space compared with the temporal dimension. For this dimension, we propose a structure based on a decomposition into independent sets of intervals and the use of succinct data structures. Our experimental evaluation shows that the resulting structure is smaller than previous solutions, and also faster for a broad set of queries.

The interval intersection problem can be reduced to 2-sided range reporting [11], a problem for which efficient data structures have been successfully applied in LZ-indexes [3,4]. As these structures are not adaptive to the number of independent interval sets, a combination of both approaches would be interesting as future work. Second, to handle larger datasets, it is necessary to improve construction time. Note, however, that we used larger datasets than those used in the evaluation of the FNR-tree. Third, some parts of the structure could be further optimized. We have observed that the distribution of the moving objects through the network is very skewed, which produces few temporal indexes storing many intervals and many indexes storing very few intervals. Hence, in order to use this index in practice, it is necessary to determine a threshold under which the intervals are just stored in an array and sequentially searched. Finally, bitmaps supporting append operations should be used to support dynamism.

References

1. de Almeida, V.T., Güting, R.H.: Indexing the trajectories of moving objects in networks*. GeoInformatica (1), 33–60 (2005)
2. Barkan, Y.: RTree (2011), GitHub repository, https://github.com/nushoin/RTree
3. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: CPM. pp. 26–39 (2015)
4. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Flexible indexing of repetitive collections. In: CiE. pp. 162–174 (2017)
5. Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 3rd edn. (2008)
6. Brinkhoff, T.: A framework for generating network-based moving objects. GeoInformatica (2), 153–180 (2002)
7. Brisaboa, N.R., Fariña, A., Galaktionov, D., Rodríguez, M.A.: Compact trip representation over networks. In: SPIRE. pp. 240–253 (2016)
8. Brisaboa, N.R., Gagie, T., Gómez-Brandón, A., Navarro, G., Paramá, J.R.: Efficient compression and indexing of trajectories. In: SPIRE. pp. 103–115 (2017)
9. Brisaboa, N.R., Gómez-Brandón, A., Navarro, G., Paramá, J.R.: GraCT: A grammar based compressed representation of trajectories. In: SPIRE. pp. 218–230 (2016)
10. Brisaboa, N.R., Ladra, S., Navarro, G.: k2-Trees for Compact Web Graph Representation. In: SPIRE. pp. 18–30 (2009)
11. Brisaboa, N.R., Luaces, M.R., Navarro, G., Seco, D.: Space-efficient representations of rectangle datasets supporting orthogonal range querying. Inf. Syst. (5), 635–655 (2013)
12. Ding, Z., Yang, B., Güting, R.H., Li, Y.: Network-matched trajectory-based moving-object database: Models and applications. IEEE Transactions on Intelligent Transportation Systems (4), 1918–1928 (2015)
13. Fredman, M.L.: On computing the length of longest increasing subsequences. Discrete Math. (1), 29–35 (1975)
14. Frentzos, E.: Indexing objects moving on fixed networks. In: SSTD. pp. 289–305 (2003)
15. Garrison, E.: intervaltree (2011), GitHub repository, https://github.com/ekg/intervaltree
16. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: Plug and play with succinct data structures. In: SEA. pp. 326–337 (2014)
17. Guttman, A.: R-trees: A dynamic index structure for spatial searching. SIGMOD Rec. (2), 47–57 (1984)
18. Han, Y., Sun, W., Zheng, B.: Compress: A comprehensive framework of trajectory compression in road networks. ACM Trans. Database Syst. (2), 11:1–11:49 (2017)
19. Kellaris, G., Pelekis, N., Theodoridis, Y.: Map-matched trajectory compression. J. Syst. Softw. (6), 1566–1579 (2013)
20. Koide, S., Tadokoro, Y., Xiao, C., Ishikawa, Y.: CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling. In: ICDE. pp. 1097–1108 (2018)
21. Krogh, B., Pelekis, N., Theodoridis, Y., Torp, K.: Path-based queries on trajectory data. In: SIGSPATIAL. pp. 341–350 (2014)
22. Meratnia, N., de By, R.A.: Spatiotemporal compression techniques for moving point objects. In: EDBT. pp. 765–782 (2004)
23. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. (1) (2007)
24. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: ALENEX. pp. 60–70 (2007), http://dl.acm.org/citation.cfm?id=2791188.2791194
25. Pfoser, D., Jensen, C.S., Theodoridis, Y.: Novel approaches in query processing for moving object trajectories. In: VLDB. pp. 395–406 (2000)
26. Potamias, M., Patroumpas, K., Sellis, T.: Sampling trajectory streams with spatiotemporal criteria. In: SSDBM. pp. 275–284 (2006)
27. Sandu Popa, I., Zeitouni, K., Oria, V., Barth, D., Vial, S.: Indexing in-network trajectory flows. The VLDB Journal (5), 643 (2011)
28. Schmid, F., Richter, K.F., Laube, P.: Semantic trajectory compression. In: SSTD. pp. 411–416 (2009)
29. Schmidt, J.M.: Interval stabbing problems in small integer ranges. In: ISAAC. pp. 163–172 (2009)
30. Schmidt, J.M.: Publications by Jens M. Schmidt.