Faster and Smaller Two-Level Index for Network-based Trajectories
Rodrigo Rivera, Andrea Rodríguez, and Diego Seco
Department of Computer Science, University of Concepción, Chile {rodrivera,andrea,dseco}@udec.cl
Abstract.
Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the use of a compact data structure on the bottom level of these indexes. Our experimental evaluation shows that our approach is both faster and smaller than existing solutions.
Keywords: Space-efficient data structures · Moving objects · Indexing.
Spatio-temporal information has gained popularity in decision-making systems, such as the optimization of transportation systems, urban planning, and so on. The proliferation of different types of sensors that capture or generate this kind of data has made these applications possible but, at the same time, it has also made the storage and processing of spatio-temporal data challenging. The work in this paper focuses on a subcategory of spatio-temporal data, namely trajectories of moving objects, which can be reconstructed from the GPS devices of smartphones or, at a different granularity, from smart transportation cards.

Trajectories can be classified as free trajectories, in which movement is not constrained, and network-based trajectories, in which movement is constrained to a network and cannot occur outside it. Hurricanes and animal migrations are examples of the former, whereas public transportation is an example of the latter. Useful queries that can be answered by handling trajectories include: count the number of vessels inside a region during a time period (e.g., a fishing closed season), or find the shortest path between two stops of a transportation system during a time period.
Funded in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 690941, by CONICYT-PFCHA/Magíster Nacional/2016 - 22161080 (R.R.), and by the Millennium Institute for Foundational Research on Data and Fondecyt-Conicyt grant number 1170497.
arXiv [cs.DS]
Several spatio-temporal indexes have been proposed to handle both free and network-based trajectories. However, classical solutions for moving-object data are inefficient when facing the data volume collected through new sensor technology and the increasing interest in data analysis. On the other hand, space-efficient data structures have proved successful for handling large volumes of data in many different domains, such as the Web, biological sequences, documents and code repositories, to name a few.

In this work, we focus on two-level indexes for network-based trajectories and propose a new solution that uses compact data structures on the bottom level. This approach turns out to be both smaller and faster than existing solutions.
A data structure for trajectories must provide access methods that allow the processing of spatio-temporal queries. These queries can be classified into coordinate-based and trajectory-based queries [25]. Coordinate-based queries include time-slice queries, which determine the position of objects at a given time instant; time-interval queries, which extend time-slice queries to a time range; and nearest-neighbor queries. Trajectory-based queries include topological queries, which involve information regarding the movement of an object, and navigational queries, which involve information derived from the movement, such as speed or direction. There also exist combinations, such as “Where was object X at a given time instant?”.

Various data structures have been proposed to efficiently support queries on trajectories. These structures can be broadly classified into two categories: i) data structures that support free movement in space, such as the 3D R-tree (a three-dimensional extension of the R-tree [17]), the TB-tree [25] (which preserves trajectories while allowing typical R-tree range queries) and the MV3R-tree [31] (which uses a multi-version R-tree, called MVR-tree, along with an auxiliary 3D R-tree); and ii) data structures that support movement on networks, such as the FNR-tree [14] (which combines a 2D R-tree with a forest of 1D R-trees), the MON-tree [1] (which uses two levels of 2D R-trees) and PARINET [27] (based on graph partitioning and the use of B+-trees). Among these structures, the FNR-tree and MON-tree share the separation of the spatial and temporal dimensions: they use a (two-dimensional) spatial structure and a forest of (one-dimensional) temporal structures to tackle each subproblem separately. Like the FNR-tree and MON-tree, we focus on these two-level indexes.
To solve the spatial problem, that is, the representation of the network in space (a two-dimensional plane), the aforementioned structures use a 2D R-tree that stores the segments of the network as lines. With the spatial problem solved, time has to be associated with the segments of the network. More precisely, it is necessary to find all the time intervals (the times at which some object passes through a segment) that intersect a given query interval. This problem is known in the literature as interval intersection, an extension of the interval stabbing problem [29]. Classical structures to solve this problem are interval trees and priority search trees [5].

As for the subproblem in the temporal dimension, the FNR-tree uses a one-dimensional R-tree for each segment. These 1D R-trees index the objects whose trajectories pass through the segments of the network, storing the instants at which they enter and leave the segment as a time interval $(t_{entry}, t_{exit})$. Since only these intervals are stored, the structure assumes that objects do not stop or change speed or direction in the middle of a segment; they can only do so at nodes. The MON-tree eliminates this restriction by replacing the one-dimensional R-trees with two-dimensional R-trees, which store the relative movement within the segments as rectangles of the form $(p_1, p_2, t_1, t_2)$, with $(p_1, p_2)$ a range of relative positions and $(t_1, t_2)$ a temporal interval.

While some of the aforementioned structures support queries efficiently on large datasets, they cannot handle the increasing data volume of current applications. This has forced the use of compression techniques for data storage and transmission.
Some techniques reduce the number of points in a curve [22] or use features at each point, such as speed and orientation [26]. Both techniques work in free space and, when the movement is restricted to networks, even better compression is possible, as shown in [18,19,20,28].

These compression techniques improve the storage requirements and transmission time of large datasets. However, compression can also be exploited directly by data structures that maintain a compact representation of the data while providing indexed search capabilities. These structures have been called self-indexes and have been successfully implemented in other domains, such as information retrieval [23].

Recently, compact data structures have also been used for the representation of trajectories. GraCT [9], for free paths, uses a k²-tree [10] to store the absolute positions of the objects at regular time intervals (snapshots), plus compressed logs to represent the movements between snapshots. ContaCT [8] improves GraCT with more efficient logs. Both structures answer spatio-temporal queries in which space and time are the main filters, such as “find the trajectories that went through a specific region at a given time instant”. On the other hand, CTR [7] supports trajectories restricted to networks by combining compressed suffix arrays (CSA), to represent the nodes of the network an object passes through, with a balanced wavelet matrix for the temporal component of the movement. In CTR, trajectories (or trips) are defined as sequences of labels, which represent the nodes of the network. Hence, it solves other types of queries in which space is represented with such labels, such as “find the number of trajectories that started at X and ended at Y”. This is a fundamental difference with our proposal, in which the spatial dimension is given by coordinates in a two-dimensional space, not by labels.
This is also the main difference with CiNCT [20], which improves on CTR in terms of memory usage and query time.

Another difference with previous solutions is that our approach decouples the network from the trajectories. This model, known as Network-Matched, has been successfully used before [12,21], but without compact data structures in its implementation. Our approach has the advantage that mapping trajectories to a network facilitates finding similar trajectories and, consequently, allows a better use of space.
Similarly to the FNR-tree and MON-tree, we propose an index with two levels: spatial (top level) and temporal (bottom level). In a preliminary experimental evaluation, we observed that the spatial level requires negligible space compared with the temporal level. For example, for the Oldenburg network (see Section 4), in the baseline structure the temporal level uses about 89% of the total memory with 1,000 objects circulating and about 94% with 2,000 objects. The more data are stored, the more negligible the spatial level becomes, because the transportation network is almost static in comparison with the moving objects. Hence, we focus on optimizing the temporal level and handle the spatial level with a two-dimensional R-tree, as the FNR-tree and MON-tree do. Recall that the R-tree is a balanced tree in which each leaf stores an entry of the form (id, MBB), where id is a reference to the data (in this case, to a temporal index) and MBB is the Minimum Bounding Box that covers the spatial object (a line segment, in this domain). The R-tree does not provide worst-case guarantees, as it may be forced to examine the entire tree in $O(n)$ time even when the output is empty. However, it performs well in practice and is ubiquitous in spatial databases.

Each leaf of the R-tree contains a reference to a temporal index. These indexes solve the interval intersection problem. Before presenting alternatives to solve this problem, we give an overview of the query algorithm for spatio-temporal range queries, which are the most general coordinate-based queries. First, a spatial query (a 2D window) is solved on the 2D R-tree, which returns a set of leaves whose segments may intersect the window. As in most spatial indexes, a refinement step then eliminates false positives, i.e., network segments whose MBBs intersect the window but that do not actually intersect it.
After this refinement, the interval intersection query is executed on each temporal index referenced by the remaining leaves of the R-tree. The results from all these temporal indexes are then combined using a set implementation, so that an object that traverses several qualifying segments is reported only once.
Unlike the FNR-tree and MON-tree, which use variants of the R-tree, we explore the use of specialized data structures for the interval-intersection problem.
Interval-tree [5].
This is a binary tree that is constructed recursively as follows: i) the median $x_{med}$ of all the interval endpoints is computed; ii) the intervals are classified into three sets, $I_{med}$, $I_{left}$ and $I_{right}$, which contain the intervals stabbed by $x_{med}$, the intervals to the left of $x_{med}$ and the intervals to the right of $x_{med}$, respectively; iii) $I_{med}$ is stored in a structure composed of two arrays, sorted by left and right endpoints, and associated with the root, whereas $I_{left}$ and $I_{right}$ are recursively processed and assigned as the left and right child, respectively.

A search for the intervals that intersect the query interval $[l_q, r_q]$ is solved recursively starting from the root. The intervals within the visited node that intersect the query interval are returned, and the search continues in the left child if $l_q$ is less than $x_{med}$ and/or in the right child if $r_q$ is greater than $x_{med}$. This data structure requires linear space and $O(\log n + k)$ query time, where $k$ is the number of reported results.

Schmidt.
The structure presented in [29] to solve the interval stabbing problem can be extended to solve the interval intersection problem as well [11]. It defines the father of an interval as the rightmost interval among those that cover it completely. This relation forms a tree where siblings are ordered from left to right, and the root of the tree is a special node that acts as the father of all the intervals that are not covered by any other. In addition, for each possible endpoint of an interval, the structure stores an array called $start$, with a pointer to the node representing the rightmost-starting interval that contains that point, and an array $start2$, storing a pointer to the node representing the rightmost interval starting up to that point (which may not be stabbed by it).

To solve an interval intersection query $q = [l_q, r_q]$, the algorithm first reports the rightmost interval that intersects $q$, which is $\max(start[l_q], start[r_q])$, if it exists. Then, the algorithm recursively reports the siblings to the left of that node while their right endpoint is greater than or equal to $l_q$, also searching among the right children of the reported nodes. This structure requires linear space and optimal $O(1 + k)$ query time. Note, however, that this solution works only for small integer ranges. To handle intervals whose endpoints are floats, these endpoints are stored in sorted arrays and two binary searches are used to translate the query to rank space [11], which results in a total complexity of $O(\log n + k)$.

Compact data structure based on Independent Interval Sets (IIS).
A set of intervals $I = \{i_1, i_2, \ldots, i_n\}$ is called an Independent Interval Set (IIS) if no interval $i_j \in I$ is contained in any other interval $i_k \in I$.

Reporting the $k$ intervals of an IIS that intersect a query interval $Q = [l_q, r_q]$ is easy if the intervals are sorted. Note that, by the definition of IIS, the order of the left endpoints of the intervals is the same as that of the right endpoints. Once the first and the last intervals intersected by the query are located, it is enough to iterate between them to report all the intersected intervals (see Figure 1).

To locate these two intervals, we could store the left and right coordinates of the intervals in two sorted arrays and use binary search, which is similar to what we did in the previous solution. However, for this domain, we propose a simple solution that facilitates the use of compact data structures. Recall that the endpoints of our intervals are timestamps represented as floating-point numbers. We multiply these timestamps by a scale factor to convert them into integers. For example, if we work with timestamps with up to 6 decimals, it is enough to multiply each of them by $10^6$ to discretize the space. With this procedure, we obtain integer endpoints in a universe $U$ of size $u$; the larger the scale factor, the larger the universe.

Fig. 1: An Independent Interval Set (IIS) and its representation with two bitvectors. In red, the last interval stabbed in start and the first one in end.

After this discretization, we use two bitvectors, one for the left endpoints, $start$, and another for the right endpoints, $end$, of the intervals in the set (see Figure 1). A 1-bit in these bitvectors indicates that an interval starts (or ends, respectively) at that position. Then, for a query $Q = [l_q, r_q]$, also discretized to this universe, two rank operations on these bitvectors locate the first and last intervals intersected by the query: $rank(end, l_q)$ and $rank(start, r_q)$, respectively. As mentioned above, the larger the scale factor, the larger the universe size $u$, which is the number of bits in these bitvectors. However, the number of set bits in them is $n$, the number of intervals, regardless of the scale factor. Hence, we use the Elias-Fano representation [24] for these bitvectors, which takes $2n + n\log\frac{u}{n}$ bits of space. Note that, for a constant $c$ and $u = O(n^c)$, this is $O(n \log n)$ bits, that is, linear space in words, as for the previous structures. The query time of rank operations on these bitvectors is $O(\log\frac{u}{n})$; thus, this structure can report the $k$ intervals intersecting the query in $O(\log\frac{u}{n} + k)$ time.

Although this solution only works for an IIS, a general set of intervals can be decomposed into $m$ independent sets in $O(n \log m)$ time, for example with Fredman's algorithm [13] to find the optimal number of shuffled upsequences in a permutation (by considering the right endpoints of the intervals as the permuted values). This leads to a solution that requires $O(m \log\frac{u}{n} + k)$ time to report the $k$ solutions.
This does not provide worst-case guarantees, as $m$ can be as large as $n$; however, this adaptive analysis shows that it is an efficient solution for domains in which $m$ is small. The empirical evaluation in the next section shows that this is precisely the case in our domain.

All the implementations evaluated in this paper were coded in C++11. For the baselines, we use available implementations: R-tree [2], Interval-tree [15] (this implementation uses sequential search in each node, which is not optimal in theory but performs well in practice) and Schmidt [30]. We also use some succinct data structures from the SDSL library [16]. The experiments were run on a computer with an Intel Xeon E3-1220 v5 CPU at 3.00 GHz and 64 GB of RAM, and the implementations were compiled with g++ 5.4.0 on Ubuntu 16.04 (64 bits).

We first evaluate the performance of all the implementations for interval intersection on synthetic datasets; then, the best candidates are evaluated in the complete solution for network-based trajectories.
We evaluated the performance in three scenarios with different types of intervals: i) fixed-size intervals (Figure 2), ii) random-size intervals (Figure 3), and iii) intervals extracted from a trajectory dataset generated with Brinkhoff's generator [6] over the San Francisco network (Figure 4). For each scenario, we created a dataset with 800,000 intervals and a query set with 500 random queries. The reported query time is the total time to solve all the queries.

Fig. 2: Fixed size intervals. (a) Query time (s), (b) space (MB, log scale) and (c) construction time (s), versus the number of intervals, for R-tree, Interval-tree, Schmidt and IIS.

Figure 2 shows the performance of the structures on fixed-size intervals. The compact data structure performs best among the four structures, with a considerable advantage in both query time and memory usage. In this scenario, intervals do not fully cover each other (except for precision issues), which yields a low number of independent sets in the IIS structure (only 6 for 800,000 intervals). This explains the outstanding performance of IIS.

Figure 3 shows the performance of the structures on random-size intervals. The compact data structure keeps the best results in query time and memory usage (although tied with the Interval-tree), while its construction time increases drastically (up to 900 times that of the Interval-tree). This is explained by the high number of independent sets (3,273 for 800,000 intervals), caused by how frequently intervals fully cover each other.

Fig. 3: Random size intervals. (a) Query time (s), (b) space (MB, log scale) and (c) construction time (s), versus the number of intervals, for R-tree, Interval-tree, Schmidt and IIS.

Fig. 4: Intervals from trajectories. (a) Query time (s), (b) space (MB, log scale) and (c) construction time (s), versus the number of intervals, for R-tree, Interval-tree, Schmidt and IIS.

Figure 4 shows the performance of the structures on time intervals extracted from synthetic trajectories obtained with Brinkhoff's generator. The compact data structure shows a performance between the two previous cases, but closer to the first one. This shows the sensitivity of the structure to the number of independent sets. In this dataset, intervals of trajectories often have similar lengths, producing a relatively low number of independent sets (29 for 800,000 intervals): each temporal index is associated with a segment of the network, and moving objects usually traverse the same segment at a similar speed.

We also evaluated the sensitivity of the structures to the scale used to transform the original floating-point times into integers. In this procedure, each time is multiplied by a scale factor and then truncated. In our datasets, original times use up to 8 digits to the right of the decimal point. Hence, a scale factor of $10^8$ guarantees a lossless transformation, whereas lower scale factors may produce a lossy one. In these experiments, we used the same 800,000 trajectory intervals as in the previous evaluation; results are shown in Figure 5.

Query time is almost constant, except for the increase suffered by Schmidt, which is caused by the high number of duplicates when two or fewer digits are used for the fractional part. In terms of space and construction time, the compact data structure is more sensitive than the other structures because of the scaling process: as explained in the previous section, the larger the scale factor, the larger the bitvectors in this structure. Even so, this structure obtains the best results in both query time and memory usage, and it offers the possibility of improving performance in applications where the user can afford to lose some precision.
Note, however, that in all the other experiments we consider all the decimals, which is the worst case for our proposal.

Fig. 5: Performance according to the scale of the intervals. (a) Query time (s), (b) space (MB, log scale) and (c) construction time (s), versus the scale, for R-tree, Interval-tree, Schmidt and IIS. The last point of IIS in the last graph was omitted because it is about 40 times larger than the others.
From the experiments in the previous section, we conclude that Schmidt's structure is always outperformed by the others, so it is not considered in the implementation of the data structures for trajectories. In the following experiments, we compare our proposal, based on compact data structures, with two baselines: the original FNR-tree and an ad-hoc baseline in which the 1D R-trees are replaced by interval trees. Note that in these experiments we are comparing three two-level indexes, all of them using a 2D R-tree on the top level.

The trajectory datasets were created using Brinkhoff's generator [6] over the real road networks of Oldenburg and San Francisco. The former consists of 6,105 nodes and 7,305 edges, whereas the latter consists of 175,343 nodes and 223,343 edges. We created trajectories for 1,000, 2,000, 3,000, 4,000 and 5,000 objects during 100 time units for both networks.
Memory usage.
Figure 6 shows the space required by each of the structures. The proposed space-efficient solution (labeled IIS in the graphs) obtained the best results in all the experiments. In addition, the larger the number of objects moving over the network, the larger the advantage of this structure over the baselines. For a small number of moving objects, the total space used by the data structures is dominated by the spatial level; however, as this number increases, the temporal level dominates and our proposal gains a larger advantage.

Fig. 6: Total memory usage (MB) versus the number of objects, for FNR-tree, baseline and IIS, on the Oldenburg and San Francisco networks.
| Structure | Oldenburg | San Francisco |
|-----------|-----------|---------------|
| FNR-tree  | 5         | 32            |
| baseline  | 4         | 26            |
| IIS       | 1.5       | 11            |

Table 1: Memory usage per object (KB/object) [5,000 objects and 100 time units].
Table 1 shows the approximate memory usage per object: our approach requires about 70% less memory than the FNR-tree and about 60% less memory than the baseline when 5,000 objects move over the networks. The difference between the two datasets is explained by the size of the networks, the San Francisco network being much larger. First, part of the space charged to each object is due to the spatial index. Second, the size of the network also affects the distribution of objects per edge of the network. As this distribution is very skewed, the larger the network, the larger the number of edges (each with its own temporal index) holding few objects, which introduces overhead.
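The percentages above are rounded summaries of Table 1; computing the reductions per network from the per-object figures gives:

$$
\begin{aligned}
\text{IIS vs. FNR-tree:}\quad & 1 - \tfrac{1.5}{5} = 0.70 \ \text{(Oldenburg)}, && 1 - \tfrac{11}{32} \approx 0.66 \ \text{(San Francisco)},\\
\text{IIS vs. baseline:}\quad & 1 - \tfrac{1.5}{4} = 0.625 \ \text{(Oldenburg)}, && 1 - \tfrac{11}{26} \approx 0.58 \ \text{(San Francisco)}.
\end{aligned}
$$

The savings are therefore slightly smaller on the larger San Francisco network, consistent with the per-edge overhead discussed above.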
Query Time.
The time performance of the structures was evaluated for three types of queries, which are the same ones used in the original evaluation of the FNR-tree [14]: i) Range Queries with Equal Spatial and Temporal Extent, such as “find all objects within a given area during a given time interval”; ii) Range Queries with Larger Temporal Extent, which query very large time intervals, including intervals spanning the whole temporal dimension, such as “find all the objects that have ever passed through a given area”; and iii) Time Slice Queries, which only consider a time instant, such as “find all the objects that were in a given area at a given time instant”. For each of these scenarios, we created three query sets with 500 random queries for each network.

Figure 7 shows the results for the first type of queries. The first row shows results for Oldenburg and the second row for San Francisco. For both datasets, we show the results of random queries of different sizes: 1%, 10% and 20% of each dimension. A similar framework is used to evaluate the other two types of queries. This is the same experimental setting used in [14].

In all the experiments, our proposal outperforms both baselines. Only for small queries (1% of each dimension) does the FNR-tree show results competitive with our proposal, and this is more evident in the largest network. The reason is the relative importance of the spatial part of the query with respect to the temporal part, which depends on the size of the network. Also important, our proposal scales better with the number of objects moving through the network.

Figure 8 shows the results for range queries with larger temporal extent. In these experiments, the temporal extent is always larger than the spatial extent, spanning the whole temporal dimension in the second and third columns. Results in this scenario are similar to the previous one, but the advantage of our proposal is even more evident. Recall that our structure performs two rank operations in each independent set and then just iterates over the results, which is very efficient.

Finally, Figure 9 shows the results for time slice queries. The analysis of these experiments is quite different from the previous ones, as the FNR-tree usually outperforms the other approaches. There are two main reasons for this. First, large spatial queries lead to querying many temporal indexes (all of them in the experiments of the last column).
Second, most of these queries to temporal indexes produce empty or very small results, which is expensive in our proposal: each of these queries needs to perform the two rank operations in each independent set just to detect that there are no results to iterate through. Hence, this scenario represents the worst case for our proposal.

Fig. 7: Range Queries. Query time (µs) versus the number of objects, for FNR-tree, baseline and IIS. First row for Oldenburg and second row for San Francisco. Each column contains queries of a different size, from 1% to 20%.

Fig. 8: Range Queries with Larger Temporal Extent. Query time (µs) versus the number of objects, for FNR-tree, baseline and IIS. First row for Oldenburg and second row for San Francisco. Each column is labeled x% - y%, where x is the size of each spatial dimension (1% or 10%) and y is the size of the time intervals (10% or 100%).

Fig. 9: Time Slice Queries. Query time (µs) versus the number of objects, for FNR-tree, baseline and IIS. First row for Oldenburg and second row for San Francisco. Each column contains queries of a different spatial extent (1% to 100%).
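The worst-case argument for time slice queries can be made explicit with the bound given for the IIS structure. Assume (notation ours) that a query touches a set $E_q$ of temporal indexes, where index $e$ holds $n_e$ intervals split into $m_e$ independent sets over a universe of size $u_e$ and reports $k_e$ results, with $k = \sum_{e \in E_q} k_e$. The temporal part of the query then costs

$$
\sum_{e \in E_q} O\!\left(m_e \log\frac{u_e}{n_e} + k_e\right) \;=\; O\!\left(\sum_{e \in E_q} m_e \log\frac{u_e}{n_e} \;+\; k\right).
$$

When almost every $k_e$ is zero, the whole cost is the first term: two rank operations per independent set in every touched index, with no reported results to amortize them against.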
We have proposed a new data structure for trajectories of moving objects whose movement is constrained to a network. Our proposal is inspired by two-level indexes, such as the FNR-tree and MON-tree, and, indeed, we use the same two-dimensional R-tree for the spatial dimension. Hence, the difference from previous solutions lies in the temporal dimension. This is justified by our experimental evaluation, which shows that the spatial dimension requires negligible space compared with the temporal dimension. For the temporal dimension, we propose a structure based on a decomposition into independent sets of intervals and the use of succinct data structures. Our experimental evaluation shows that the resulting structure is smaller than previous solutions, and also faster for a broad set of queries.

Several lines of future work remain. First, the interval intersection problem can be reduced to 2-sided range reporting [11], a problem for which efficient data structures have been successfully applied in LZ-indexes [3,4]. As these structures are not adaptive to the number of independent interval sets, a combination of both approaches would be interesting. Second, to handle larger datasets, it is necessary to improve construction time; note, however, that we used larger datasets than those used in the evaluation of the FNR-tree. Third, some parts of the structure could be further optimized. We have observed that the distribution of the moving objects through the network is very skewed, which produces a few temporal indexes storing many intervals and many indexes storing very few intervals. Hence, to use this index in practice, it is necessary to determine a threshold under which the intervals are simply stored in an array and searched sequentially. Finally, bitvectors supporting append operations could be used to support dynamism.
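The reduction to 2-sided range reporting mentioned above can be seen by mapping each interval $[a, b]$ to the point $(a, b)$ in the plane (a standard argument, spelled out here for illustration):

$$
[a, b] \cap [l_q, r_q] \neq \emptyset \iff a \le r_q \,\wedge\, b \ge l_q \iff (a, b) \in (-\infty, r_q] \times [l_q, +\infty).
$$

The query region is a quadrant bounded on only two sides, which is what makes the resulting range reporting problem 2-sided.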
References
1. de Almeida, V.T., Güting, R.H.: Indexing the trajectories of moving objects in networks*. GeoInformatica (1), 33–60 (2005)
2. Barkan, Y.: RTree (2011), GitHub repository, https://github.com/nushoin/RTree
3. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: CPM. pp. 26–39 (2015)
4. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Flexible indexing of repetitive collections. In: CiE. pp. 162–174 (2017)
5. Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 3rd ed. edn. (2008)
6. Brinkhoff, T.: A framework for generating network-based moving objects. GeoInformatica (2), 153–180 (2002)
7. Brisaboa, N.R., Fariña, A., Galaktionov, D., Rodríguez, M.A.: Compact trip representation over networks. In: SPIRE. pp. 240–253 (2016)
8. Brisaboa, N.R., Gagie, T., Gómez-Brandón, A., Navarro, G., Paramá, J.R.: Efficient compression and indexing of trajectories. In: SPIRE. pp. 103–115 (2017)
9. Brisaboa, N.R., Gómez-Brandón, A., Navarro, G., Paramá, J.R.: GraCT: A grammar based compressed representation of trajectories. In: SPIRE. pp. 218–230 (2016)
10. Brisaboa, N.R., Ladra, S., Navarro, G.: k²-Trees for Compact Web Graph Representation. In: SPIRE. pp. 18–30 (2009)
11. Brisaboa, N.R., Luaces, M.R., Navarro, G., Seco, D.: Space-efficient representations of rectangle datasets supporting orthogonal range querying. Inf. Syst. (5), 635–655 (2013)
12. Ding, Z., Yang, B., Güting, R.H., Li, Y.: Network-matched trajectory-based moving-object database: Models and applications. IEEE Transactions on Intelligent Transportation Systems (4), 1918–1928 (2015)
13. Fredman, M.L.: On computing the length of longest increasing subsequences. Discrete Math. (1), 29–35 (1975)
14. Frentzos, E.: Indexing objects moving on fixed networks. In: SSTD. pp. 289–305 (2003)
15. Garrison, E.: intervaltree (2011), GitHub repository, https://github.com/ekg/intervaltree
16. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: Plug and play with succinct data structures. In: SEA. pp. 326–337 (2014)
17. Guttman, A.: R-trees: A dynamic index structure for spatial searching. SIGMOD Rec. (2), 47–57 (1984)
18. Han, Y., Sun, W., Zheng, B.: Compress: A comprehensive framework of trajectory compression in road networks. ACM Trans. Database Syst. (2), 11:1–11:49 (2017)
19. Kellaris, G., Pelekis, N., Theodoridis, Y.: Map-matched trajectory compression. J. Syst. Softw. (6), 1566–1579 (2013)
20. Koide, S., Tadokoro, Y., Xiao, C., Ishikawa, Y.: CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling. In: ICDE. pp. 1097–1108 (2018)
21. Krogh, B., Pelekis, N., Theodoridis, Y., Torp, K.: Path-based queries on trajectory data. In: SIGSPATIAL. pp. 341–350 (2014)
22. Meratnia, N., de By, R.A.: Spatiotemporal compression techniques for moving point objects. In: EDBT. pp. 765–782 (2004)
23. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. (1) (2007)
24. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: ALENEX. pp. 60–70 (2007), http://dl.acm.org/citation.cfm?id=2791188.2791194
25. Pfoser, D., Jensen, C.S., Theodoridis, Y.: Novel approaches in query processing for moving object trajectories. In: VLDB. pp. 395–406 (2000)
26. Potamias, M., Patroumpas, K., Sellis, T.: Sampling trajectory streams with spatiotemporal criteria. In: SSDBM. pp. 275–284 (2006)
27. Sandu Popa, I., Zeitouni, K., Oria, V., Barth, D., Vial, S.: Indexing in-network trajectory flows. The VLDB Journal (5), 643 (2011)
28. Schmid, F., Richter, K.F., Laube, P.: Semantic trajectory compression. In: SSTD. pp. 411–416 (2009)
29. Schmidt, J.M.: Interval stabbing problems in small integer ranges. In: ISAAC. pp. 163–172 (2009)
30. Schmidt, J.M.: Publications by Jens M. Schmidt.