A Survey on Trajectory Data Management, Analytics, and Learning
SHENG WANG,
New York University, United States
ZHIFENG BAO and J. SHANE CULPEPPER,
RMIT University, Australia
GAO CONG,
Nanyang Technological University, Singapore

Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing and storage to common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory data management system should possess in order to maximize flexibility.

CCS Concepts: • Information systems → Data management systems.

Additional Key Words and Phrases: Trajectory, storage system, similarity search, urban analytics, deep learning
ACM Reference Format:
Sheng Wang, Zhifeng Bao, J. Shane Culpepper, and Gao Cong. 2020. A Survey on Trajectory Data Management, Analytics, and Learning. ACM Comput. Surv. 1, 1 (December 2020), 36 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Just over twenty years ago, Global Positioning System (GPS) satellites began transmitting two additional signals to be used for civilian (non-military) applications to improve aircraft safety. Demand for GPS-based navigation has grown steadily ever since, and privately owned vehicles are now the dominant consumers of this technology. An increased reliance on GPS-equipped smartphones has also led to a rise in the use of location-based services (LBS), such as ride-sharing and social-network check-ins.

In addition to GPS data, other sensor devices such as traffic surveillance cameras, unmanned aerial vehicles (UAV), and radio-frequency identification (RFID) can also collect location data without deploying battery-dependent receiving devices to track objects, as shown in Fig. 1. Such devices can accurately report precise locations and track objects proactively and continuously.
(a) Human trajectories in surveillance [208]   (b) Monitoring cars by UAV [23]   (c) Car localization by RFID [3]
Fig. 1. New localization techniques that can capture trajectory data more proactively than using GPS.
Devices that track moving objects often generate a series of ordered points representing a trajectory. More formally, a trajectory $T$ is composed of two or more spatial-only points and is defined as follows.

Definition 1. (Point) A point $p = \{x, y, t\}$ records the latitude $x$ and the longitude $y$ at timestamp $t$.

Definition 2. (Trajectory) A trajectory $T$ is a sequence of points $\{(p_1, \gamma_1), (p_2, \gamma_2), \ldots, (p_m, \gamma_m)\}$.

The number of points in a trajectory $T$ is denoted as the length of the trajectory, and the sampling rate is the number of samples per second (or other time unit). Additional information $\gamma$ can be included with each point $p$ in a trajectory $T$ generated from a location-based service. For example, textual data can be integrated into a trajectory from social network check-in data or travel blogs; such data has been referred to as spatial-textual trajectories [19, 87], semantic trajectories [193], or symbolic trajectories [73]. This survey will focus primarily on these two closely related forms of trajectory data (spatial-only and spatial-textual), as solutions and applications in recent research work commonly overlap.
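Definitions 1 and 2 can be mirrored by a small data structure. The following is a minimal sketch, not taken from any of the surveyed systems: the class names `Point` and `Trajectory`, the use of `None` for an empty $\gamma$, and the estimate of the sampling rate as the number of sampling intervals divided by the elapsed time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Point:
    """A sampled location: latitude x, longitude y, timestamp t (in seconds)."""
    x: float
    y: float
    t: float


@dataclass
class Trajectory:
    """A sequence of (point, gamma) pairs; gamma is None for spatial-only data."""
    points: List[Point]
    gammas: List[Optional[Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.points) < 2:
            raise ValueError("a trajectory needs two or more points")
        if not self.gammas:
            # Spatial-only trajectory: no extra information attached to any point.
            self.gammas = [None] * len(self.points)
        if len(self.gammas) != len(self.points):
            raise ValueError("one gamma entry is required per point")

    @property
    def length(self) -> int:
        """Length of the trajectory, i.e. its number of points."""
        return len(self.points)

    @property
    def sampling_rate(self) -> float:
        """Average samples per second (assumed: intervals / elapsed time)."""
        duration = self.points[-1].t - self.points[0].t
        if duration <= 0:
            raise ValueError("timestamps must span a positive duration")
        return (self.length - 1) / duration


traj = Trajectory([Point(0.0, 0.0, 0.0), Point(0.0, 1.0, 10.0), Point(1.0, 1.0, 20.0)])
print(traj.length, traj.sampling_rate)
```

A spatial-textual trajectory would pass a list of keyword sets (or similar) as `gammas`, one entry per point.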
Survey Scope.
We focus primarily on urban trajectories. Other studies of domain-specific data such as aircraft [16] are not covered in this work. Note that the main difference between a moving object [135] and a trajectory is the distinction between a pointwise sequence and a continuous projection, and each of them often employs fundamentally different storage and algorithmic solutions.
Public Trajectory Datasets.
Table 1 shows several existing urban trajectory datasets, which can be broadly divided into three groups: humans, vehicles (car, truck, train, bus, tram, etc.), and others (animals and hurricanes). Human-derived trajectory data (including vehicles) is a common, and currently the largest, source of trajectory data. For example, 1 .

Fig. 2 provides an overview of trajectory data management, analytics, and learning. Briefly, a trajectory data management and analytic system has several fundamental components:
• Cleaning: Common techniques to correct sensor errors in raw trajectories and to normalize sampling rates.
• Storage: Open-source, commercial, and other commonly deployed storage formats that are used to represent trajectory data.
• Similarity Measures: The most commonly used measures for top-$k$ similarity search, similarity joins, and clustering.

Footnote: For spatial-only trajectories, we can set $\gamma = \emptyset$. Note that $\gamma$ can also be attached to the whole trajectory directly.

ACM Comput. Surv., Vol. 1, No. 1, Article . Publication date: December 2020.
Table 1. Publicly available trajectory datasets.

| Categorization | Type | Exemplary Datasets | #Trajectories | Representative Applications |
|---|---|---|---|---|
| Human (urban area) | GPS tracking | GeoLife [206] | 17,621 | Human mobility [194] |
| | Check-in | Foursquare [19], Gowalla [39] | 104,478 | Tourism planning [165], Mobility prediction [58] |
| | Online sharing | OpenStreetMap [9] | 8.7 million | Commuting analytics |
| | Video surveillance | Grand central station [208] | 20,000 | Crowd behavior |
| | COVID-19 | KCDC [5], JHU [8] | <100 | Disease control |
| Vehicle (urban area) | Taxi trips | T-drive [192], Porto [2] | 1.7 million | Traffic monitoring |
| | Taxi trip-requests | NYC [10], Chicago [10] | 1.1 billion | Ride sharing |
| | Traffic cameras | NGSIM [4] | - | Traffic simulation |
| | Drones | HighD [90], inD [23] | 110,000 | Traffic prediction |
| | Self-driving | Argoverse [28], ApolloScape [118] | 300,000 | Trajectory forecasting |
| | Trucks | Greece Trucks [6] | 1100 | Pattern mining |
| Others | Hurricanes | Atlantic hurricanes [7] | 1740 | Disaster detection |
| | Animals | Zebranet [7], Movebank [1] | 33 | Animal behavior [92] |
[Figure 2 panels: Data Collection; Pre-processing (cleaning, map-matching, compression); Storage; Indexing; Query & Analytics; Deep Learning Models; Representative Applications (road traffic, site selection, tourism planning, green transport).]

Fig. 2. Overview of trajectory data collection, management, representative applications, and widely used deep learning models (indicated by the green labels, and the full names of models can be found in Section 7).

Table 2. Survey contributions relative to prior trajectory surveys.
Survey | Datasets | Commercial systems | Similarity measures | Pruning mechanisms | Indexing | Processing pipeline decomposition | Deep learning
✗ ✗ spatial spatial spatial ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ spatial ✗ spatial mining ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ spatial ✗ ✗
This survey | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓

• Indexing: State-of-the-art solutions for indexing spatial and spatial-textual trajectory data, as well as effective pruning solutions to improve performance.
• Query and Analytics: The most common trajectory queries, including range queries, top-$k$ similarity search, and trajectory joins.
• Upstream Urban Applications: The four most common offline urban applications of trajectory data with problem formulations and solutions.
Managing large-scale trajectory data poses many important challenges in efficiency and scalability due to its size and diversity. A survey covering the state of the art and open problems in each stage, from data pre-processing all the way to upstream applications, is valuable to both research and industrial communities. While several surveys on trajectory data exist, none review trajectory data management and applications as outlined in Fig. 2. Prior surveys have covered trajectory data mining [204], semantic trajectory modeling [127], and related applications [133]. Table 2 compares and contrasts prior surveys with our own, highlighting the new areas covered. A few surveys from closely related areas partially overlap with the content of this survey, covering trajectory similarity measures [146], trajectory privacy [72], and trajectory classification [22]. In these instances, this survey includes more recent work from these important areas. The closest related review work is the textbook “Computing with spatial trajectories” [207], published in 2011. Current techniques have advanced significantly in the last ten years, including the quickly growing area of spatial-textual trajectories, which were not covered previously. The key contributions of this survey can be summarized as:
• Scalability and Storage. This survey comprehensively reviews trajectory storage, which is commonly required in scalable offline analytics.
• Processing Pipeline Decomposition. We provide insights into the connections among all components in the entire data processing pipeline. For the backend, we compare trajectory storage systems (academic and commercial); for middleware, we compare similarity measures, queries, and index structures; for upstream trajectory applications, we propose a taxonomy for key operators used across multiple tasks.
• Representative Applications. Based on a comprehensive review of recent research in SIGMOD, PVLDB, ICDE, KDD, SIGSPATIAL, VLDBJ, TKDE, IJGIS, and TITS, we focus on urban applications requiring real-time responses or timely decision making, including: 1) road traffic; 2) green transport; 3) tourism planning; 4) site selection. These applications depend heavily on downstream analytics tools. For example, clustering is commonly used to generate candidate routes and to generate a fixed number of results for traffic monitoring.
• Deep Trajectory Learning. New trends in using deep learning for various trajectory data analytics tasks, including trajectory recovery and generation, trajectory representation and similarity search, clustering, classification, anomaly detection, and trajectory prediction.
• Perspectives on Future Trajectory Data Management Challenges. We conclude with a list of key challenges and associated open problems for researchers working in the area.
The remainder of the survey is organized as follows: Section 2 presents techniques for trajectory data storage and pre-processing. Section 3 compares common trajectory similarity measures. Section 4 illustrates queries and indexing techniques for spatial-only trajectory data and spatial-textual trajectory data, respectively. Section 5 reviews trajectory clustering from a data management perspective. Section 6 presents a taxonomy of four application types commonly encountered in trajectory data management. Section 7 introduces the latest progress in deep trajectory learning. Section 8 discusses the future of trajectory data system management. Section 9 concludes the paper.
A pair of coordinates denoted by two floats (e.g., “-37.807302, 144.963242”) is the traditional representation of a spatial point location. Two or more such points can be used to represent a trajectory for storage, search, and analytics, which is the most common method in both research and industry. Table 3 shows several representative research systems specifically designed for trajectory data, as well as several commercial systems which were not necessarily designed for only trajectory data, but can store and manipulate trajectory data through module extensions. Since commercial systems are designed to handle many different types of data, they can incur additional cost overheads.

Table 3. An overview of existing trajectory database systems.

| System | Data | Indexing | Range | $k$NN | Similarity Measure |
|---|---|---|---|---|---|
| TrajStore [44] | Trajectory | Quad-tree | ✓ | ✗ | ✗ |
| SharkDB [157] | Trajectory | Frame structure | ✓ | ✓ | Point-to-trajectory |
| UITraMan [53] | Trajectory | R-tree | ✓ | ✓ | Point-to-trajectory |
| TrajMesa [97] | Trajectory | Z-order curve | ✓ | ✓ | Point-to-trajectory |
| DITA [141] | Trajectory | R-tree | ✓ | ✓ | Trajectory-to-trajectory |
| Torch [166] | Trajectory | Grid-index | ✓ | ✓ | Trajectory-to-trajectory |
| Oracle | LineString | R-tree | ✓ | ✓ | Point-to-point |
| PostGIS | LineString | R-tree | ✓ | ✓ | Point-to-point |
| SQL Server | LineString | Grid-index | ✓ | ✓ | Point-to-point |
| GeoMesa | LineString | Z-order curve | ✓ | ✓ | Point-to-point |
| Tile38 | LineString | R-tree | ✓ | ✓ | Point-to-point |
| GeoSpark [189] | LineString | R-tree | ✓ | ✓ | Point-to-point |
| SpatialSpark [188] | LineString | R-tree | ✓ | ✗ | ✗ |

In the academic field, several trajectory storage systems have been built to manage trajectory data [44, 53, 141, 157, 166, 183], and store trajectories as a set of points. These trajectory management systems [44, 157, 183] only support a single storage format, and often only a single query type such as a range query (finding trajectories in a rectangle). More recently, distributed systems [53, 97, 141] have been developed to store point-based trajectory datasets scalably, and can perform more advanced queries such as $k$NN search over trajectories.

Existing commercial or open-source systems (e.g., a general database such as Oracle Spatial, SQL Server, and MySQL, or geospatial databases such as ArcGIS, Tile38, PostGIS, and GeoMesa) can also be extended for trajectory management. For example, GeoJSON, a portable data-interchange format, defines LineString, a geometry represented as an array of point coordinates. A trajectory can be stored as a LineString using this format.
GPX (the GPS Exchange Format) [61] is another format used by OpenStreetMap for routing and tracking. Figures 10 and 11 in Appendix [162] show an example of these two formats. However, operator support for LineString and GPX data is limited, e.g., returning the number of points, the starting or ending point, or the $i$-th point. Range and $k$NN queries are supported over all points, but not over the trajectories themselves. SpatialSpark [188] and GeoSpark [189] are two representative open-source systems from the research field of spatial databases, which also support the storage and querying of LineString. Range and $k$NN queries are supported for the point data, but not trajectories. Nevertheless, these spatial database systems could be extended to support any number of trajectory search operators in the future since trajectory data is an extension of point data. In Section 4.2, we will review indexing techniques for point data in more detail.
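To make the LineString representation concrete, the sketch below builds a GeoJSON LineString from a list of (latitude, longitude) pairs using only the Python standard library. The helper name `trajectory_to_linestring` and the second sample point are hypothetical; the first point reuses the coordinate pair quoted earlier in this section. The only GeoJSON-specific fact relied upon is that positions are ordered longitude first, then latitude.

```python
import json
from typing import Dict, List, Tuple


def trajectory_to_linestring(points: List[Tuple[float, float]]) -> Dict:
    """Build a GeoJSON LineString from (latitude, longitude) pairs.

    GeoJSON positions are ordered [longitude, latitude], so each pair is swapped.
    """
    if len(points) < 2:
        raise ValueError("a LineString needs at least two positions")
    return {
        "type": "LineString",
        "coordinates": [[lon, lat] for lat, lon in points],
    }


# Hypothetical two-point trip; the first point is the example coordinate above.
trip = [(-37.807302, 144.963242), (-37.808100, 144.964000)]
geometry = trajectory_to_linestring(trip)

print(json.dumps(geometry))
# The limited operators mentioned above reduce to simple list accesses:
print(len(geometry["coordinates"]))      # number of points
print(geometry["coordinates"][0])        # starting point
print(geometry["coordinates"][-1])       # ending point
```

Note that timestamps and any textual information attached to points are not part of the LineString geometry itself and would have to be stored separately.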
Trajectory Cleaning.
Since trajectory data is often collected with GPS devices that have an average user range error (URE) of 7.8 m (25.6 ft), with 95% probability, trajectory point data is noisy [204]. Moreover, the sampling rate of GPS devices varies by application. For example, two vehicles following an identical path can produce trajectories that contain a different number of sampled points. These factors can directly affect distance or similarity computations, and degrade result quality. Hence, when processing raw trajectory datasets, data cleaning is not only beneficial, but in certain circumstances a requirement. Data segmentation [26], calibration [147], and enrichment [13] are the three most common cleaning techniques for trajectory data. Specifically, segmentation splits a long trajectory into several short trajectories. For example, analysis of a single taxi “trip” makes more sense than analyzing the movements of the taxi for an entire day. Buchin et al. [26] addressed the problem of segmenting a trajectory based on spatiotemporal criteria, including location, heading, speed, velocity, curvature, sinuosity, curviness, and shape.

Su et al. [147] proposed a calibration method that transforms a heterogeneous trajectory dataset to one with (almost) unified sampling strategies, such that the similarity between trajectories can be computed more accurately. For noisy points, calibration identifies outliers and adds statistical correction instead of filtering out these points directly. Such calibration also plays an important role in precise similarity computation [132], which we will introduce later. Liu et al. [109] further considered the constraints of road network topology, geometry information, and historical information to re-calibrate noisy data points. For trajectories with sparse data points, Alvares et al. [13] proposed that the data can be enriched with additional points and semantic information about the types of visited places. Conversely, trajectory simplification [196] can be applied to remove redundant points.

Footnotes: SQL Server: https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-data-sql-server; MySQL: https://dev.mysql.com/doc/refman/5.7/en/gis-linestring-property-functions.html; Tile38: https://github.com/tidwall/tile38; PostGIS: https://postgis.net/; GeoJSON: https://en.wikipedia.org/wiki/GeoJSON
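As a simple illustration of segmentation, the sketch below splits a day-long trace into trips wherever two consecutive samples are separated by a long pause. This is a deliberately naive time-gap heuristic assumed here for illustration; it is not the criteria-based method of Buchin et al. [26], and the function name, the sample layout, and the 300-second threshold are all hypothetical.

```python
from typing import List, Tuple

# (latitude, longitude, timestamp in seconds); an assumed sample layout.
Sample = Tuple[float, float, float]


def segment_by_time_gap(samples: List[Sample], max_gap: float) -> List[List[Sample]]:
    """Split a time-ordered trace wherever consecutive samples are
    more than max_gap seconds apart."""
    segments: List[List[Sample]] = []
    current: List[Sample] = []
    for sample in samples:
        # A long silence between two samples closes the current segment.
        if current and sample[2] - current[-1][2] > max_gap:
            segments.append(current)
            current = []
        current.append(sample)
    if current:
        segments.append(current)
    return segments


# Two bursts of samples separated by more than an hour.
day = [(0.0, 0.0, 0.0), (0.0, 0.1, 30.0), (0.0, 0.2, 60.0),
       (1.0, 1.0, 4000.0), (1.0, 1.1, 4030.0)]
trips = segment_by_time_gap(day, max_gap=300.0)
print([len(trip) for trip in trips])
```

Richer criteria such as heading or speed would replace the single time-gap test with a predicate over a window of samples.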
Trajectory Compression.
Existing trajectory compression techniques can be divided into two groups: simplification-based and road network-based. Excluding extra points is a common space reduction method when the sampling rate is high; it is commonly referred to as trajectory simplification [196], and is also used in trajectory cleaning as discussed above. Removing points can reduce size, but must be used carefully as it can also reduce the resolution when analyzing the data. To alleviate this problem, an error ratio can be applied to bound the loss for specific computations and data processing operations that are predefined before data simplification [113].

Alternatively, a road network can be used to enable better compression [144] with little or no reduction in quality, for certain types of data. Each trajectory is projected onto a road network as a sequence of road segments. Next, each road segment is uniquely encoded using Huffman coding [43]. Each trajectory is succinctly represented as a concatenation of the codewords, which is significantly more effective than attempting to compress the raw floating point value pairs (latitude and longitude). Further, string compression techniques [184] can be used directly on the trajectory data, and any temporal information can also be succinctly stored using these techniques.
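The road network-based scheme described above can be sketched in a few lines: count how often each road segment occurs across the map-matched trajectories, build Huffman codewords from those counts, and store each trajectory as the concatenation of its codewords. This is a textbook Huffman construction written for illustration, not the implementation of [144]; the function names and the integer edge IDs are assumptions.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, List


def build_huffman_codes(paths: List[List[Hashable]]) -> Dict[Hashable, str]:
    """Assign a prefix-free bit string to every edge ID; frequent edges get shorter codes."""
    freq = Counter(edge for path in paths for edge in path)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct edge still needs a one-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dictionaries
    heap = [(f, next(tie), {edge: ""}) for edge, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {e: "0" + c for e, c in left.items()}
        merged.update({e: "1" + c for e, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(path: List[Hashable], codes: Dict[Hashable, str]) -> str:
    """Concatenate the codewords of the edges along a path."""
    return "".join(codes[edge] for edge in path)


def decode(bits: str, codes: Dict[Hashable, str]) -> List[Hashable]:
    """Recover the edge sequence; prefix-freeness makes greedy matching safe."""
    inverse = {code: edge for edge, code in codes.items()}
    path, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            path.append(inverse[buffer])
            buffer = ""
    return path


# Hypothetical map-matched trajectories given as sequences of edge IDs.
paths = [[1, 2, 3], [1, 2, 4], [1, 2, 3, 5]]
codes = build_huffman_codes(paths)
print(codes, encode(paths[0], codes))
```

The bit strings are kept as Python strings for readability; a real system would pack them into bytes.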
Map-matching.
Using a road network, map-matching [114, 124] projects a raw trajectory onto a real path, and supports cleaning and compression. A road network is modeled as a weighted graph [43], where a road segment is an edge, and its weight represents the length of this road. Consequently, a path mapped from a raw trajectory is a set of connected edges in a road network.

Definition 3. (Road Network) A road network is a directed graph $G = (V, E)$, where $V$ is a set of vertices $v$ representing the intersections and terminal points of the road segments, $E$ is a set of edges $e$ representing road segments, and each vertex has a unique id allocated from 1 to $|V|$.

Definition 4. (Mapped Trajectory)
Given a raw trajectory $T$ and a road network $G$, we map $T$ to a road network path $P$ which is composed of a set of connected edges in $G$, such that $T: e_1 \to e_2 \to \ldots \to e_m$, and denoted as $T$ containing a series of edge IDs.

Map-matching techniques should consider both effectiveness and efficiency. Efficient algorithms enable us to find the nearest road segments for each point in the trajectory, and are required in order to connect the candidate road segment to the best path. Uncertainty is often the biggest hurdle, and is a reflection of parameter selection, which would reduce the number of potential candidate paths during shortest path search in $G$. For effectiveness, mapping a trajectory to the true traversal path of a vehicle when errors in the data can be substantial is a key limiting factor. To measure effectiveness, a ground truth is required, and is a problem in its own right. There are three common ways to generate a ground truth for the map-matching problem: 1) using real vehicles equipped with GPS devices on the road [124]; 2) human adjudication [114]; and 3) simulated GPS sampling [83], which is the least costly approach.

Fig. 3. Two types of similarity measures: (a) points-to-trajectory, where a query is a set of points; (b) trajectory-to-trajectory, where a query is a trajectory.

This section will introduce the most widely used similarity measures to compare two trajectories. Fig. 3 presents two different types of query-to-trajectory similarity measures. Labels on each dashed line denote the distance from a query point to a point in the trajectory.

A $k$ Best Connected Trajectories search ($k$BCT) [38, 131, 149] is arguably the most commonly employed formulation of point-to-trajectory similarity search. To compute the result, each query point $q \in Q$ is paired with the closest point $p \in T$, and the distances of all pairs are summed to generate the final distance (similarity) score: $d_{kBCT}(Q, T) = \sum_{q \in Q} \min_{p \in T} d(q, p)$. For example, applying $d_{kBCT}(Q, T)$ to Fig. 3(a) sums the distances between every query point in $Q$ and its nearest neighbor in $T$ (the three dotted lines). When there is only a single query point, the problem reduces to a $k$NN query [53, 157]. Since this solution requires every point in every trajectory in a collection to be considered in order to locate the nearest point for all $q \in Q$, the complexity of a brute force solution is quadratic. Note that this distance measure can only be computed when the query $Q$ is a set of points, and does not obey the symmetry rule, i.e., $d(Q, T) \neq d(T, Q)$.

An alternative to summing the distances between all nearest neighbor pairs is another well-known measure called the closest-pair distance (CPD) [204], which minimizes distance as follows: $d_{CPD}(Q, T) = \min_{q \in Q} \min_{p \in T} d(q, p)$. Compared with $d_{kBCT}$, $d_{CPD}$ is more robust when erroneous points exist in a trajectory.
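The two point-to-trajectory measures translate almost literally into code. The following brute-force sketch assumes planar coordinates and Euclidean distance for $d(q, p)$; the function names and the sample points are illustrative only. It also makes the asymmetry of $d_{kBCT}$ visible by evaluating it in both directions.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def euclidean(a: Pt, b: Pt) -> float:
    """Point distance d(q, p); Euclidean distance is an assumption of this sketch."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def d_kbct(Q: List[Pt], T: List[Pt]) -> float:
    """Sum over query points of the distance to the nearest trajectory point."""
    return sum(min(euclidean(q, p) for p in T) for q in Q)


def d_cpd(Q: List[Pt], T: List[Pt]) -> float:
    """Closest-pair distance: the smallest distance over all (q, p) pairs."""
    return min(euclidean(q, p) for q in Q for p in T)


Q = [(0.0, 0.0), (2.0, 0.0)]
T = [(0.0, 1.0), (1.0, 1.0), (5.0, 1.0)]
print(d_kbct(Q, T))  # query points matched to their nearest points in T
print(d_kbct(T, Q))  # swapping the roles generally gives a different value
print(d_cpd(Q, T))
```

Each call scans every point of $T$ for every point of $Q$, which is the quadratic brute-force cost mentioned above; the indexing techniques reviewed later exist to avoid this scan.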
One weakness in point-to-trajectory measures is that they do not capture the true ordering ofpoints. However, many applications of trajectory search expect an ordering constraint to hold. Thatis, the similarity measure should satisfy a local time shifting [34] constraint.
We now provide a taxonomy of similarity measures that can be used to compare and contrast each of them. As shown in Table 4, we broadly divide the most commonly used similarity measures into five categories: curve-based, real-distance, edit-distance, temporal-aware, and segment-based. Each of these will now be described in more detail, and the example shown in Fig. 3(b) will be used to more concretely illustrate important properties for each measure.

Table 4. An overview of existing trajectory-to-trajectory similarity measures.

| Type | Measures | Complexity | Metric | No Parameter | Robust to GPS Error | Robust to Sampling Rate | Robust to Point Shift |
|---|---|---|---|---|---|---|---|
| Curve | Hausdorff [134] | $O(n^2)$ | ✓ | ✓ | ✗ | ✗ | ✗ |
| | DFD [54] | $O(n^2)$ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Real | DTW [187] | $O(n^2)$ | ✗ | ✓ | ✗ | ✗ | ✗ |
| | ERP [33] | $O(n^2)$ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Edit | EDR [34] | $O(n^2)$ | ✗ | ✗ | ✓ | ✗ | ✗ |
| | LCSS [154] | $O(n^2)$ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Temporal | DISSIM [62] | $O(n^2)$ | ✗ | ✗ | ✗ | ✓ | ✗ |
| | EDwP [132] | $O(n^2)$ | ✗ | ✓ | ✗ | ✓ | ✗ |
| Segment | LORS [166], LCRS [190] | $O(n^2)$ | ✗ | ✓ | ✓ | ✓ | ✓ |
| | EBD [164] | $O(n \log n)$ | ✓ | ✓ | ✓ | ✓ | ✓ |

Since a trajectory can also be viewed as a geometric curve, Hausdorff distance [134] measures the separation between two subsets of a metric space adversarially, without imposing an ordering constraint. Informally, it is the maximum of the distances between each point in a set $Q$ to the nearest point in the reference set (trajectory $T$): $d_{Hau}(Q, T) = \max_{q \in Q} \min_{p \in T} d(p, q)$. Discrete Fréchet Distance [54] (DFD) extends the Hausdorff distance to account for location and ordering of the points along the curve, as shown in Equation 1, where $p$ and $q$ are the head (first) points of $T$ and $Q$, respectively, and $T_h$ and $Q_h$ are the sub-trajectories excluding the heads $p$ and $q$. Accordingly, using the example in Fig. 3(b), $d_{Hau}$ and $d_{DFD}$ will return the same value 3.1, since the matching shown in the figure not only finds the nearest neighbor for each point but also obeys local time shifting (the order constraint discussed above). If any two points in $T$ swap locations, $d_{Hau}$ does not change while $d_{DFD}$ could.

$$
d_{DFD}(Q, T) =
\begin{cases}
d(p, q), & \text{if } Q = \{q\} \text{ and } T = \{p\} \\
\infty, & \text{if } Q = \emptyset \text{ or } T = \emptyset \\
\max\{d(q, p), \min\{d_{DFD}(Q_h, T_h), d_{DFD}(Q, T_h), d_{DFD}(Q_h, T)\}\}, & \text{otherwise}
\end{cases}
\tag{1}
$$

Since a trajectory can also be viewed as a time series, many similarity measures originally designed for time series search can be leveraged in trajectory related problems [204]. Time-series measures which have been applied to trajectory data include Dynamic Time Warping (DTW) [187], Longest Common Subsequence (LCSS) [154], Edit Distance for Real sequences (EDR) [34], and Edit distance with Real Penalty (ERP) [33].
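The two curve-based measures can be sketched as follows. The directed Hausdorff distance is a direct max-min over point pairs, and the discrete Fréchet distance follows the head/rest recursion of Equation 1, evaluated bottom-up over suffixes of $Q$ and $T$ to avoid deep recursion. Euclidean point distance, the function names, and the sample points are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def euclidean(a: Pt, b: Pt) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def d_hausdorff(Q: List[Pt], T: List[Pt]) -> float:
    """Directed Hausdorff distance from Q to T: max over q of min over p."""
    return max(min(euclidean(q, p) for p in T) for q in Q)


def d_dfd(Q: List[Pt], T: List[Pt]) -> float:
    """Discrete Frechet distance via the recursion of Equation 1."""
    n, m = len(Q), len(T)
    if n == 0 or m == 0:
        return math.inf
    # dp[i][j] is the DFD between the suffixes Q[i:] and T[j:];
    # entries with an empty suffix stay at infinity.
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            d = euclidean(Q[i], T[j])
            if i == n - 1 and j == m - 1:
                dp[i][j] = d  # base case: single point against single point
            else:
                dp[i][j] = max(d, min(dp[i + 1][j + 1], dp[i][j + 1], dp[i + 1][j]))
    return dp[0][0]


Q = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
T = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
T_swapped = [T[2], T[1], T[0]]  # same point set, first and last points swapped

print(d_hausdorff(Q, T), d_dfd(Q, T))
print(d_hausdorff(Q, T_swapped), d_dfd(Q, T_swapped))
```

In this toy example the Hausdorff value is unaffected by the swap because it only looks at the point set, whereas the Fréchet value grows because the first and last points must now be paired across the full length of the curve.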
DTW computes the distance based on the sum of minimumdistances, instead of choosing the maximum as in Discrete Fréchet Distance, as shown in Equation 2.To compute
DTW for Fig. 3(b), every possible pairing of points in 𝑇 and 𝑄 are compared in order,and the sum of the minimum of each pairing 𝑑 DTW ( 𝑄 ,𝑇 ) = . + + + . + . = . . 𝑑 DTW ( 𝑄,𝑇 ) = 𝑑 ( 𝑝, 𝑞 ) , if 𝑄 = { 𝑞 } 𝑎𝑛𝑑 𝑇 = { 𝑝 }∞ , if 𝑄 = ∅ 𝑜𝑟 𝑇 = ∅ 𝑑 ( 𝑞, 𝑝 ) + min { 𝑑 DTW ( 𝑄 ℎ ,𝑇 ℎ ) , 𝑑 DTW ( 𝑄,𝑇 ℎ ) , 𝑑 DTW ( 𝑄 ℎ ,𝑇 )} , otherwise (2) Instead of computing the real distance, computing an edit distance (0 or 1) can be more robust fornoisy data as outliers can heavily impact
DTW -based comparisons. To judge whether two points arematched, edit-distance based measures set a range threshold 𝜏 , i.e., match ( 𝑝, 𝑞 ) = true , if | 𝑝 𝑥 − 𝑞 𝑥 | ≤ 𝜏 and | 𝑝 𝑦 − 𝑞 𝑦 | ≤ 𝜏 , where || denotes the absolute value, and 𝑝 𝑥 and 𝑝 𝑦 denote the latitude andlongitude of point 𝑝 , respectively. Now LCSS and
EDR can be defined as shown in Equation 3 andFig. 4. The main difference between
LCSS and
EDR is that
LCSS measures the similarity between twotrajectories, while
EDR measures the dissimilarity . For example, consider 𝜏 = . 𝑑 LCSS ( 𝑄 ,𝑇 ) = 𝑑 EDR ( 𝑄 ,𝑇 ) =
4. If 𝜏 =
5, then Note that 𝑑 Hau ( 𝑄,𝑇 ) is the directed distance from 𝑄 to 𝑇 . A more general definition which obeys the symmetry andtriangle inequality would be: max { 𝑑 Hau ( 𝑄,𝑇 ) , 𝑑 Hau ( 𝑇, 𝑄 ) } , and is metric as shown in Table 4.ACM Comput. Surv., Vol. 1, No. 1, Article . Publication date: December 2020.
Survey on Trajectory Data Management, Analytics, and Learning 9 every point can be matched, and 𝑑 LCSS ( 𝑄 ,𝑇 ) = 𝑑 EDR ( 𝑄 ,𝑇 ) =
0. Hence, both measures aresensitive to the hyperparameter 𝜏 . 𝑑 LCSS ( 𝑄,𝑇 ) = , if 𝑄 = ∅ or 𝑇 = ∅ + 𝑑 LCSS ( 𝑄 ℎ ,𝑇 ℎ ) , if match ( 𝑝, 𝑞 ) = true max { 𝑑 LCSS ( 𝑄,𝑇 ℎ ) , 𝑑 LCSS ( 𝑄 ℎ ,𝑇 )} , otherwise (3) 𝑑 EDR ( 𝑄,𝑇 ) = | 𝑄 | 𝑜𝑟 | 𝑇 | , if 𝑇 = ∅ 𝑜𝑟 𝑄 = ∅ min { 𝑑 EDR ( 𝑄 ℎ ,𝑇 ℎ ) , 𝑑 EDR ( 𝑄,𝑇 ℎ ) + , 𝑑 EDR ( 𝑄 ℎ ,𝑇 ) + } , if match ( 𝑝, 𝑞 ) = false min { + 𝑑 EDR ( 𝑄 ℎ ,𝑇 ℎ ) , 𝑑 EDR ( 𝑄,𝑇 ℎ ) + , 𝑑 EDR ( 𝑄 ℎ ,𝑇 ) + } , otherwise (4)Since these measures ( DTW , LCSS , and
EDR ) do not obey the triangle inequality, which is anessential requirement for metric space pruning, all of the time-series similarity measures are non-metric except for
ERP [33]. In
ERP , a point 𝑔 can be any fixed point in a metric space, and is used asthe reference origin point 𝑔 = { , } . Changing 𝑔 can also change distance scores similar to LCSS ,which can result in 𝑘 NN -based algorithms being non-deterministic. 𝑑 ERP ( 𝑄,𝑇 ) = (cid:205) 𝑝 ∈ 𝑇 𝑑 ( 𝑔, 𝑝 ) , if 𝑄 = ∅ (cid:205) 𝑞 ∈ 𝑄 𝑑 ( 𝑞, 𝑔 ) , if 𝑇 = ∅ min { 𝑑 ERP ( 𝑄 ℎ ,𝑇 ℎ ) + 𝑑 ( 𝑞, 𝑝 ) , 𝑑 ERP ( 𝑄,𝑇 ℎ ) + 𝑑 ( 𝑔, 𝑝 ) , 𝑑 ERP ( 𝑄 ℎ ,𝑇 ) + 𝑑 ( 𝑞, 𝑔 )} , otherwise (5) The aforementioned six measures are the representativeof commonly deployed similarity measures for spatial-only trajectory applications. In additionto spatial information, temporal information is also an important factor in accurate sample ratecalibration. Frentzos et al. [62] proposed a measure called
DISSIM to compute the dissimilarity:
\[
d_{\mathrm{DISSIM}}(Q,T) = \int_{t_1}^{t_n} d(Q_t, T_t)\, dt .
\]
Here, $Q_t$ and $T_t$ denote the points of $Q$ and $T$ at timestamp $t$ in the range $[t_1, t_n]$. The core idea is to integrate the Euclidean distance over time for trajectories occurring in the same period. Through integration, DISSIM can resolve various commonly encountered sampling rate problems, but it can also be computationally expensive. To alleviate this issue, EDwP [132] calibrates trajectories by adding or removing points to align the sampling rate between any two trajectories, and computes similarity as described for EDR.

Temporal-aware pointwise methods can compute similarity in much richer trajectory data, but results often have precision issues unless sample-rate calibration is applied. Another alternative is to convert trajectories to segments. This approach has been shown to reduce sample mismatch effects as well as the complexity of the similarity computations applied. Tiakas et al. [151] were among the first to propose this solution for road networks. The approach uses the sum of the distances between nodes in the road network that contribute to the final distance, but the trajectory pairs must have the same length in order to guarantee point-to-point matching. To circumvent the length constraint, Mao et al. [120] proposed a related solution which used DTW after converting pointwise trajectories to paths on a road network. In both approaches, the similarity is still computed based on the end-points of road segments. Map matching is therefore only applied once to clean trajectory data and to align sampling rates, but the Euclidean distance between nodes must still be computed, which can be computationally expensive even in small data sets.

Actually, all of the aforementioned spatial-only measures can be easily extended to handle the spatio-temporal case by performing two separate distance calculations on the spatial and temporal features, and then combining the scores using a balancing factor $\alpha$.

Fig. 4. A similarity measure that is sensitive to point shift, error, and sampling rate. Panels: (a) Two similar original trajectories; (b) Ideal sampling; (c) Point shift; (d) GPS error; (e) Inconsistent sampling rate.

Fig. 5. Three kinds of spatial-only trajectory queries.

To reduce the cost of computing the distances, Wang et al. [166] proposed the use of longest overlapped road segments (LORS). This measure was inspired by LCSS and adapted to leverage the properties inherent in map-matched data.
LORS does not compute the Euclidean distance between points to determine if they adhere to a threshold distance constraint. Instead, overlapping segments between trajectories are identified first, and then reused to compute the similarities.
\[
d_{\mathrm{LORS}}(Q,T) =
\begin{cases}
0, & \text{if } Q \text{ or } T \text{ is empty} \\
|e_m| + d_{\mathrm{LORS}}(Q_h, T_h), & \text{if } e_m = e_x \\
\max( d_{\mathrm{LORS}}(Q_h, T),\ d_{\mathrm{LORS}}(Q, T_h) ), & \text{otherwise}
\end{cases}
\tag{6}
\]
Here, the inputs are $Q = (e_1, e_2, \ldots, e_m)$ and $T = (e_1, e_2, \ldots, e_x)$, where $e_m$ is the end edge of trajectory $Q$, and $|e_m|$ is the travel length of the graph edge $e_m$. Yuan and Li [190] further extended and normalized LORS to account for trajectory length effects in $|Q|$ and $|T|$. The resulting measure LCRS can be defined as:
\[
d_{\mathrm{LCRS}}(Q,T) = \frac{d_{\mathrm{LORS}}(Q,T)}{|Q| + |T| - d_{\mathrm{LORS}}(Q,T)} .
\]
By modeling network-constrained trajectories as strings in a similar manner, Koide et al. [89] proposed a generalized metric called weighted edit distance (WED), which supports user-defined cost functions that can be used with several important similarity functions such as ERP and EDR.

LORS, LCRS, and WED all satisfy the local time shift constraint, and have quadratic complexity using dynamic programming. Wang et al. [164] later reduced the computational cost of LORS further. The key insight was to exploit ordered integer intersection (through finger search), where the ordered integers represent the unique IDs assigned to road segments during map-matching. The resulting measure, called edge-based distance (EBD), is defined as
\[
d_{\mathrm{EBD}}(Q,T) = \max(|Q|, |T|) - |Q \cap T| ,
\]
and was shown to have comparable precision to LORS while also being highly scalable in practice.
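The three segment-based measures above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: trajectories are assumed to be lists of integer edge IDs, `length` is a hypothetical dictionary of edge travel lengths, $|Q|$ in LCRS is assumed to mean the total travel length of $Q$, and EBD uses a plain set intersection instead of the finger search described in [164].

```python
def lors(q, t, length):
    """Longest overlapped road segments (Eq. 6) by dynamic programming.

    q, t   : sequences of edge IDs (map-matched trajectories)
    length : dict mapping an edge ID to its travel length (assumed input)
    """
    m, x = len(q), len(t)
    # dp[i][j] holds the LORS value of the prefixes q[:i] and t[:j].
    dp = [[0.0] * (x + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, x + 1):
            if q[i - 1] == t[j - 1]:
                dp[i][j] = length[q[i - 1]] + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][x]


def lcrs(q, t, length):
    """LORS normalized by trajectory lengths (|Q| taken as travel length)."""
    shared = lors(q, t, length)
    total = sum(length[e] for e in q) + sum(length[e] for e in t)
    denom = total - shared
    return shared / denom if denom else 0.0


def ebd(q, t):
    """Edge-based distance: max(|Q|, |T|) - |Q intersect T| over edge sets."""
    q, t = set(q), set(t)
    return max(len(q), len(t)) - len(q & t)


if __name__ == "__main__":
    lengths = {1: 100, 2: 50, 3: 70, 4: 30, 5: 20}
    Q, T = [1, 2, 3, 4], [2, 3, 5]
    print(lors(Q, T, lengths), lcrs(Q, T, lengths), ebd(Q, T))
```

The dynamic program fills an $m \times x$ table, which makes the quadratic cost of LORS and LCRS visible, whereas the EBD sketch only touches each edge once.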
Table 4 compares the characteristics of similarity measures based on four different properties: complexity, metricity, parameter independence, and robustness. Since parameters such as $\tau$ in LCSS and EDR can be easily observed from their definitions, we elaborate on the other three characteristics next.
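To make the dependence on $\tau$ concrete, the following sketch implements LCSS and EDR with a threshold-based `match` and evaluates them on a small synthetic pair of trajectories. The matching rule (both coordinate differences within $\tau$) and the example points are assumptions made for illustration only.

```python
def match(p, q, tau):
    """Assumed matching rule: both coordinates differ by at most tau."""
    return abs(p[0] - q[0]) <= tau and abs(p[1] - q[1]) <= tau


def lcss(Q, T, tau):
    """Number of matched point pairs in the longest common subsequence."""
    dp = [[0] * (len(T) + 1) for _ in range(len(Q) + 1)]
    for i in range(1, len(Q) + 1):
        for j in range(1, len(T) + 1):
            if match(Q[i - 1], T[j - 1], tau):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def edr(Q, T, tau):
    """Edit distance on real sequences: number of edits to align Q and T."""
    n, m = len(Q), len(T)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i          # delete all remaining points of Q
    for j in range(m + 1):
        dp[0][j] = j          # insert all remaining points of T
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if match(Q[i - 1], T[j - 1], tau) else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1)
    return dp[n][m]


if __name__ == "__main__":
    # Two parallel four-point trajectories that are two units apart.
    Q = [(0, 0), (1, 0), (2, 0), (3, 0)]
    T = [(0, 2), (1, 2), (2, 2), (3, 2)]
    for tau in (1, 5):
        print(tau, lcss(Q, T, tau), edr(Q, T, tau))
```

With a small threshold no points match, and with a large one every point matches, so the same pair of trajectories moves from one extreme of each measure to the other purely through the choice of $\tau$.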
Complexity.
Many trajectory-based similarity measures were designed to allow search to be more resilient to variance produced by local time shifting. However, computing optimal distances with ordering constraints generally requires dynamic programming solutions with quadratic complexity ($O(n^2)$) as well as the associated space overheads.

Metricity.
Full metric compliance is often crucial for a similarity measure, as obeying the triangle inequality provides both theoretical and practical advantages. Here, we use an example to illustrate the concept. Consider the computation of the well-known metric similarity measure, Euclidean distance. Given a query point $q$ with $p_1$ and $p_2$ as the two next candidates for a $k$NN query, $d(q, p_1)$ is computed. Before the distance is computed for $p_2$, the lower and upper bounds of $d(q, p_2)$ can be estimated based on $d(q, p_1)$ and $d(p_1, p_2)$ as: $|d(q, p_1) - d(p_1, p_2)| \le d(q, p_2) \le d(q, p_1) + d(p_1, p_2)$, where $d(p_1, p_2)$ can be precomputed off-line for any query. The lower bound can be compared with $d(q, p')$, where $p'$ is currently the best nearest neighbor candidate. If the bound is greater than the current best result, we can discard $p_2$ directly as it can never replace $p'$.

Table 5. An overview of common operations applied to trajectory and point data.

Data type        Query type    Query                       Input             Output        Measure         Index
Spatial-only     Basic         Range [91, 136]             range             trajectories  N/A             R-tree
                               Path [88, 91]               path              trajectories  N/A             R-tree
                 kNN           kNN [20, 74]                point             points        Euclidean       R-tree
                               kNNT [34, 141, 154]         trajectory        trajectories  LCSS, DTW       R-tree
                               kBCT [38, 149]              points            trajectories  Aggregate       R-tree
                 Reverse kNN   RkNN [150]                  point             points        Euclidean       R-tree
                               RkNNT [163]                 trajectory        trajectories  CPD             N/A
                 Clustering    Density [95]                thresholds        paths         N/A             N/A
                               Partition [164]             k                 trajectory    EBD             Inverted index
                 Join          similarity join [148, 190]  threshold         pairs         Normalized CPD  signature
                               kNN join [57]               k                 pairs         CPD             Grid-index
Spatial-textual                KS [25, 52]                 keywords          points        TF-IDF          Inverted list
                 Top-k         TkSK [105, 195]             point, keywords   points        Aggregate       IR-tree
                               TkSTT [140, 202]            points, keywords  trajectories  Aggregate       Grid-index, Inverted list

Robustness.
In addition to complexity and metricity, many similarity measures are sensitive to noise, sampling rate, and point shifting (points are shifted on the path where the trajectory lies). Noise due to GPS errors or other common sensor data collection approaches results in more (or fewer) sampled points, point shifting, and other quality issues. Wang et al. [156] systematically compared the robustness of six common similarity measures in a real-world trajectory dataset, and a complete version [146] was published later with additional point-based measures. However, the most recent segment-based measures [166, 190] were not covered.

Fig. 4 illustrates several common robustness problems encountered in trajectory data. $T_1$ and $T_2$ are two vehicle trajectories in a road network that overlap. After sampling, a set of points represents each one, with perfect data resulting in (b). In reality, sampling rates can vary, resulting in a point shift (c). GPS errors (d) can also produce sampled points that are not on the same road, and inconsistent sampling rates (e) lead to a different number of points being produced for the two trajectories. Each of the above issues can lead to distance scores greater than those found in error-free data, and can even result in incorrect solutions for certain query types. Robust similarity measures are therefore highly desirable.

Fig. 5 shows three exemplar queries $Q_1$, $Q_2$, $Q_3$ for the spatial-only trajectories $T_1$ to $T_6$. Given three points ($q_1$, $q_2$, $q_3$), 𝑄 finds the best connected trajectory. Among all six trajectories, 𝑇 and 𝑇 are both possible solutions. For a pointwise similarity search as shown in 𝑄 , the similarity measure used should ideally be able to reliably distinguish between the two candidates. Given a target range (red box), 𝑄 finds all trajectories covered (partially or fully). Valid results are 𝑇 and 𝑇 , and cannot be computed using any of the similarity measures shown.
In comparison, if 𝑄 is the query for the top-$k$ search, a similarity measure must be defined that can capture all of the properties of 𝑄 , including points and ordering.

Table 5 compares and contrasts several common trajectory queries by input, output, similarity measure, and most appropriate index. The “Index” column shows the preferred indexing approach, although many other alternatives exist. A more comprehensive analysis and discussion of indexing solutions is in Section 4.2. Query types range from basic trajectory search and top-$k$ trajectory similarity search (spatial-only and spatial-textual) to more complex operations such as reverse $k$ nearest neighbors search and trajectory clustering. We now discuss each of these query operations in detail along with several indexing techniques which have been proposed to accelerate performance.

Basic trajectory search includes three classic query formulations. The most basic is a Range Query (RQ) [91, 134, 136, 144], which finds all (sub-)trajectories located in a spatial or temporal region. Range queries have many applications in traffic monitoring, such as returning all vehicles at a road intersection. The second is a Path Query (PQ), which retrieves the trajectories that contain any edge of the given path query. The third is a Strict Path Query (SPQ) [88, 91, 136], which finds all trajectories that traverse an entire path from beginning to end. Interestingly, path queries and strict path queries share many common properties with disjunctive (OR) and conjunctive (AND) Boolean queries [119].

Definition 5. (Range Query) Given a trajectory database $D = \{T_1, \ldots, T_{|D|}\}$ and a rectangular query region $Q_r$, a range query retrieves the trajectories: $\mathrm{RQ}(Q_r) = \{T \in D \mid \exists p_i \in T\ (p_i \in Q_r)\}$.

Definition 6. (Path Query) Given a query $Q_p$ which is a path in $G$, a path query retrieves the trajectories $T$ that pass through at least one edge $e_j$ of $Q_p$: $\mathrm{PQ}(Q_p) = \{T \in D \mid \exists e_i \in T, e_j \in Q_p\ (e_i = e_j)\}$.

Definition 7. (Strict Path Query) Given a query path $Q_p$, a strict path query retrieves the trajectories that contain $Q_p$ as a sub-trajectory: $\mathrm{SPQ}(Q_p) = \{T \in D \mid \exists i, j\ (T_{ij} = Q_p)\}$, where $T_{ij} = \{e_i, e_{i+1}, \cdots, e_j\}$ is the sub-trajectory of $T$.

𝑘 Nearest Neighbors Query.
To find a relevant subset from a large set of objects for a given query object, $k$ nearest neighbor queries have been widely applied in spatial databases. For trajectories, the query can be either a trajectory or a set of points.

Definition 8. ($k$ Nearest Neighbors Query over Trajectories) Given a trajectory database $D = \{T_1, \ldots, T_{|D|}\}$ and query $Q = \{q_1, q_2, \cdots, q_{|Q|}\}$, a $k$ Nearest Neighbors Query ($k\mathrm{NN}(Q)$) retrieves a set $D_s \subseteq D$ with $k$ trajectories such that: $\forall T \in D_s, \forall T' \in D - D_s,\ d(Q,T) \le d(Q,T')$.

Search by Trajectory.
Given a query trajectory $Q$, a $k$ Nearest Neighbor Trajectories Query aims to find the $k$ most similar/nearest trajectories to $Q$, based on a given trajectory similarity measure [166, 182]. Such a query can be used to find the most similar trips in traffic flow analysis. The special case in which the trajectory is a single point is known as $k$NN search (we use kNNT to denote trajectory search), with Euclidean distance as the default similarity measure.

Search by Points.
There has also been previous work targeting search on spatial-only trajectory data where the input query is a set of points [38, 131, 149]. Chen et al. [38] initially proposed the problem of querying spatial-only trajectories with a set of points. They proposed incremental expansion using an R-tree to prune candidate points from consideration, and referred to the approach as Incremental K Nearest Neighbors (IKNN). To optimize the IKNN algorithm, Tang et al. [149] devised a qualifier expectation measure that ranks partially matched candidate trajectories. The approach accelerates query processing significantly when non-uniform trajectory distributions and/or outlier query locations exist in the solution space.

Reverse $k$ Nearest Neighbor Query.
Instead of searching for the most relevant objects to a query object, Reverse $k$ Nearest Neighbors (R$k$NN) queries attempt to locate objects which will take the query as one of their $k$ nearest neighbors. Formally, R$k$NN is defined as below:

Definition 9. (Reverse $k$ Nearest Neighbor) Given a set of points (or trajectories) $D$ and a query point (or trajectory) $Q$, $\mathrm{R}k\mathrm{NN}(Q)$ retrieves all objects $T \in D$ that take $Q$ as one of their $k$ nearest neighbors, i.e., $\mathrm{R}k\mathrm{NN}(Q) = \{T \in D \mid Q \in k\mathrm{NN}(T)\}$.

R$k$NN queries for spatial point data have attracted considerable attention in the research community [150], as they can be used to solve a wide variety of industry-relevant problems. An R$k$NN query aims to identify all (spatial) objects that have a query location as a $k$ nearest neighbor. Important applications of the R$k$NN query include resource allocation [163] and profile-based marketing [199]. For example, R$k$NN queries can be used to estimate the number of customers a new restaurant would attract based on location. This variant was initially referred to as a bichromatic R$k$NN, based on the dual-indexing solution proposed to solve the problem [150].

Instead of querying a set of static points, Cheema et al. [29] proposed the continuous reverse nearest neighbors query to monitor a moving object. The goal is to find all static points that contain the moving object as a $k$ nearest neighbor. This approach targets a single point rather than a commuter trajectory of multiple points, a case which was later solved in R$k$NNT [163].
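The restaurant example can be written down directly as a brute-force bichromatic R$k$NN over points. The sketch below is only meant to illustrate the definition: the function names, the customer and restaurant coordinates, and the linear scan are all assumptions for this example, and none of the index-based pruning from [150] is included.

```python
import math


def knn(query, candidates, k):
    """Return the k candidates closest to `query` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(query, c))[:k]


def rknn(query, users, facilities, k):
    """Brute-force bichromatic reverse kNN.

    Returns the users for which `query` would be among their k nearest
    facilities once it is added to the existing `facilities`.
    """
    result = []
    for u in users:
        if query in knn(u, facilities + [query], k):
            result.append(u)
    return result


if __name__ == "__main__":
    restaurants = [(0, 0), (10, 0)]               # existing facilities
    customers = [(1, 0), (4, 0), (6, 0), (9, 0)]  # static user locations
    new_site = (5, 0)                             # candidate location
    print(rknn(new_site, customers, restaurants, k=1))
```

Every user triggers a full $k$NN computation here, so the cost grows with the product of the two set sizes, which is the cost that pruning techniques aim to avoid.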
Reverse Trajectory Search.
The “Reverse $k$ Nearest Neighbor Query over Trajectories” (R$k$NNT) [163] can be defined as follows. Given a trajectory dataset $\mathcal{D}_T$, a set of routes $\mathcal{D}_R$ (also defined as trajectories), and a candidate point set $Q = (o_1, o_2, \ldots, o_m)$ as a query, R$k$NNT returns all the trajectories in $\mathcal{D}_T$ that will take $Q$ as one of their $k$ nearest routes using a point-to-trajectory similarity measure. The main application of R$k$NNT is to estimate the capacity of a new bus route. It can be further used to plan a route with a maximum capacity between a source and destination, which was defined as MaxR$k$NNT in [163]. The main challenge in solving R$k$NNT is how to prune the trajectories which cannot be in the results without explicitly accessing the whole dataset $\mathcal{D}_T$. Not all of the pruning methods commonly used for R$k$NN work for trajectories. For example, half-space pruning [150], which can prune an area by drawing a perpendicular bisector between a query point and a data point, will not work. Wang et al. [163] proposed the use of an R-tree of the trajectory routes (an example can be seen in Fig. 6). By drawing bisectors between the route points and the query, an area can be found in which no trajectory point can take the query as its nearest route.

Trajectory indexing is commonly applied to improve efficiency in trajectory related search problems. Trajectory data is processed off-line to improve scalability and prune the search space. Existing approaches to trajectory indexing have two components: indexable point data and mapping tables.
Point Indexing.
Space-efficient index representations and processing frameworks are crucial for trajectory data, and the majority of trajectory search solutions [38] rely on an R-tree [74], which stores all points from the raw trajectories and is historically the dominant approach deployed for spatial computing applications. As shown in Fig. 6, trajectories are first decomposed into points, and an R-tree is then used to index each point. A mapping table identifies the trajectory that contains each point. Since trajectory datasets such as T-drive [192] can easily contain millions of points, the R-tree must manage an enormous number of minimum bounding rectangles (MBRs), which has prohibitive memory costs in practice. So, traditional methods employed for spatial pruning can be infeasible on raw trajectory data. Simpler Grid-index solutions can sometimes be more appropriate in such scenarios [168, 202]. Nevertheless, the core problem is compounded by the fact that many of the similarity measures proposed for trajectory search are non-metric.
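To see why non-metric measures make pruning harder, it helps to look at what a metric distance allows. The sketch below is a hypothetical example that uses Euclidean distance and a single pivot point: distances from the pivot to all data points are computed ahead of time, and the triangle inequality then gives a lower bound that lets the search skip candidates without computing their distance to the query. The function name and the choice of the first point as pivot are illustrative assumptions.

```python
import math


def nn_with_pivot_pruning(q, points, pivot_index=0):
    """Nearest neighbor search with triangle-inequality pruning.

    Returns (index of nearest point, its distance, number of distance
    computations between q and data points that were actually needed).
    """
    pivot = points[pivot_index]
    # Off-line step: distances from the pivot to every data point.
    d_pivot = [math.dist(pivot, p) for p in points]

    # On-line step: one distance to the pivot, then bound every candidate.
    d_q_pivot = math.dist(q, pivot)
    best, best_d, computed = pivot_index, d_q_pivot, 1
    for i, p in enumerate(points):
        if i == pivot_index:
            continue
        lower = abs(d_q_pivot - d_pivot[i])   # |d(q,pivot) - d(pivot,p)| <= d(q,p)
        if lower >= best_d:
            continue                          # p can never beat the current best
        d = math.dist(q, p)
        computed += 1
        if d < best_d:
            best, best_d = i, d
    return best, best_d, computed


if __name__ == "__main__":
    data = [(0, 0), (10, 0), (0.5, 0), (20, 0)]
    print(nn_with_pivot_pruning((1, 0), data))
```

The skip in the loop is only safe because Euclidean distance obeys the triangle inequality. With a non-metric measure such as DTW or LCSS the same bound can be violated, so a candidate discarded this way might have been the true nearest neighbor.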
Mapping Table.
A mapping table is used to map points to trajectories [38, 202]. After searching for a point, the mapping table identifies the trajectory containing the point. To ensure that the mapping is unambiguous, every point has a unique identifier in a range bounded by $|\mathcal{D}.P|$ and each trajectory has a unique identifier in a range bounded by $|\mathcal{D}|$, where $|\mathcal{D}.P|$ and $|\mathcal{D}|$ are the numbers of points and trajectories. With the mapping table in Fig. 6(c), we can know that 𝑝 is a point of 𝑇 and 𝑝 belongs to 𝑇 .

Fig. 6. Indexing spatial-only trajectories using an R-tree and a mapping table. Panels: (a) Indexing points with R-tree; (b) Structure of R-tree; (c) Mapping table.
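The two components can be illustrated with a small sketch. To keep the example short, a uniform grid is swapped in for the R-tree (the grid alternative is mentioned above, but this is not the structure shown in Fig. 6), point identifiers are 0-based list positions, and the names `build_index` and `range_query` are hypothetical.

```python
from collections import defaultdict


def build_index(trajectories, cell):
    """Decompose trajectories into points and index them.

    trajectories : dict mapping a trajectory ID to a list of (x, y) points
    cell         : side length of a grid cell
    Returns (points, point_to_traj, grid), where `point_to_traj` is the
    mapping table from point ID to trajectory ID.
    """
    points = []                # point ID -> (x, y)
    point_to_traj = []         # mapping table: point ID -> trajectory ID
    grid = defaultdict(list)   # (cell x, cell y) -> list of point IDs
    for tid, pts in trajectories.items():
        for x, y in pts:
            pid = len(points)
            points.append((x, y))
            point_to_traj.append(tid)
            grid[(int(x // cell), int(y // cell))].append(pid)
    return points, point_to_traj, grid


def range_query(index, cell, xmin, ymin, xmax, ymax):
    """Trajectories with at least one point inside the query rectangle."""
    points, point_to_traj, grid = index
    hits = set()
    for cx in range(int(xmin // cell), int(xmax // cell) + 1):
        for cy in range(int(ymin // cell), int(ymax // cell) + 1):
            for pid in grid.get((cx, cy), ()):
                x, y = points[pid]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.add(point_to_traj[pid])   # point -> trajectory lookup
    return hits


if __name__ == "__main__":
    data = {
        "T1": [(0.5, 0.5), (1.5, 1.5)],
        "T2": [(5.0, 5.0), (6.0, 6.0)],
        "T3": [(1.2, 0.2), (9.0, 9.0)],
    }
    idx = build_index(data, cell=1.0)
    print(range_query(idx, 1.0, 1, 0, 2, 2))
```

The spatial structure only ever returns point identifiers, and the mapping table is the single place where a point is turned back into its trajectory, which is why the mapping must be unambiguous.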
Pruning Mechanism.
For range queries, most existing approaches employ MBR-based pruning, as MBR intersection with a query range can be used to effectively and efficiently prune the search space. If a node does not intersect with the query range, the objects it covers can be removed from consideration. For example, the nodes 𝑁 and 𝑁 in Fig. 6(a) do not intersect with 𝑁 , so they can be eliminated from consideration, leaving only 𝑁 . For $k$NN trajectory search, there are two common pruning techniques: early termination and early abandoning. Early termination was initially proposed in the threshold algorithm [56]. Given an index, the algorithm scans a sorted list, and terminates early when all remaining items exceed some precomputed upper bound. It is similar to pruning in $k$NN search with an R-tree, where the minimum distance between the query and a node in the index can be used to determine if the distance for the items contained must be computed. In contrast to early termination, which can prune many items in a single operation, early abandoning must consider each item but exploits a bound to minimize the number of expensive computations applied during processing. This technique is commonly used to accelerate time series similarity search and $k$NN search [154]. By estimating the upper bound of the similarity score and comparing it to the $k$-th item in a max-heap (for top-$k$ search for example), an algorithm can determine if a full similarity computation must be made for the item under consideration.

Definition 10. (Trajectory Join)
Given two sets of trajectories $S_1$ and $S_2$, and a similarity threshold $\epsilon$ (or $k$ in $k$NN search), a trajectory join operation will return all pairs of trajectories $T_i$ and $T_j$ from $S_1$ and $S_2$ with a similarity that exceeds $\epsilon$ (or $T_i \in k\mathrm{NN}(T_j)$).

The main applications of trajectory joins are in trajectory data cleaning, near-duplicate detection, and carpooling. For example, given a driver trajectory database and a rider trajectory database, a trajectory join will match riders with similar trajectories to drivers. Trajectory join operations can be divided into two categories: trajectory similarity joins [148] and trajectory $k$ Nearest Neighbors joins [57]. A simple baseline is to conduct a similarity search or $k$NN search for each trajectory. Given the quadratic complexity of this solution, scaling can be problematic.

To reduce the complexity, many different indexing techniques can be used. Bakalov and Tsotras [17] considered the moving object trajectory similarity join, which is a pointwise join at a specific timestamp, where similarity is computed between points and not trajectories. Ta et al. [148] proposed a signature-based solution to filter trajectory pairs before distance computation, where a signature is built for each trajectory from the grid cells it crosses and their neighbors to help prune. Shang et al. [139] explored the spatial-temporal trajectory similarity join problem for road networks. A two-step algorithm is applied to each trajectory: expansion and verification, similar to trajectory similarity search in Section 4.
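The filter-then-verify pattern behind these join techniques can be sketched as follows. This is an illustrative stand-in rather than any of the cited methods: trajectories are assumed to be lists of road-segment IDs, Jaccard similarity over edge sets is used as a placeholder measure, and the filter is an inverted index that only keeps pairs sharing at least one edge. That filter is only safe when the similarity of edge-disjoint pairs is zero and the threshold is non-negative.

```python
from collections import defaultdict


def jaccard(a, b):
    """Placeholder similarity: overlap of the two edge sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_join(S1, S2, eps, sim=jaccard):
    """Return all (i, j) with sim(S1[i], S2[j]) > eps.

    S1, S2 : dicts mapping trajectory IDs to lists of edge IDs
    """
    # Filter step: inverted index from edge ID to trajectories of S2.
    inverted = defaultdict(set)
    for j, t in S2.items():
        for e in t:
            inverted[e].add(j)

    pairs = []
    for i, q in S1.items():
        candidates = set()
        for e in q:
            candidates |= inverted.get(e, set())
        # Verification step: full similarity only for surviving pairs.
        for j in sorted(candidates):
            if sim(q, S2[j]) > eps:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    drivers = {"d1": [1, 2, 3, 4], "d2": [7, 8, 9]}
    riders = {"r1": [2, 3, 4], "r2": [8, 10], "r3": [20, 21]}
    print(similarity_join(drivers, riders, eps=0.5))
```

In the carpooling example, rider `r3` shares no edge with any driver and is never compared, while `r2` survives the filter but fails verification. The nested-loop baseline would have computed all six similarities.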
Distributed Trajectory Similarity Join.
Distributed computing is often applied to solve the trajectory join problem due to the high computational costs of the distance computation. Fang et al. [57] proposed a distributed trajectory $k$NN join method using MapReduce. Their solution used Hadoop, and the algorithm can also run on a single machine. Shang et al. [141] computed distributed trajectory similarity joins using Spark, where a global index is deployed over multiple computer nodes and used to prune the search space. Yuan and Li [190] also explored the distributed trajectory similarity join problem using similar techniques to Shang et al. [141], but applied the similarity measure LCRS described in Section 3.2.3 to compute similarity.

Spatial-textual trajectory search incorporates text and keywords into the data, and relevant trajectories have more than one dimension of similarity that must be considered. For example, the keywords “seafood”, “coffee”, and “swimming” might be contained in a query and a trajectory.

Top-$k$ Spatial-Textual Trajectory Search.
Similarity Measures.
In contrast to spatial-only similarity measures, spatial-textual measures for an activity trajectory depend on both spatial and textual components. A linear combination of two distinct similarity measures is a common solution for spatial-textual data [41, 195]. The spatial distance is combined with TF·IDF similarity for the text, with a user-defined weight between the two to balance importance based on the target application. Such a measure was also extended by Shang et al. [140], where the spatial distance is computed in a manner similar to the $k$ Best Connected Trajectories query ($k$BCT). The textual similarity was then computed using simple keyword intersection. Since the text relevance is computed as a part of the final score, this approach is more flexible in real applications. Conversely, a conjunctive approach [42, 202] tightens the constraint, requiring that the result trajectory should contain all of the query keywords, and results are ordered by spatial distance.
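The difference between the weighted and the conjunctive formulations can be shown with a short sketch. Everything below is an assumption made for illustration: the spatial part sums, for each query point, the distance to the nearest trajectory point, the distance is mapped to a similarity with $1/(1+d)$, the textual part is the fraction of query keywords covered (a stand-in for TF·IDF), and `alpha` plays the role of the user-defined weight.

```python
import math


def spatial_distance(query_points, trajectory_points):
    """Sum over query points of the distance to the nearest trajectory point."""
    return sum(min(math.dist(q, p) for p in trajectory_points)
               for q in query_points)


def text_similarity(query_keywords, trajectory_keywords):
    """Fraction of query keywords that the trajectory contains."""
    query_keywords = set(query_keywords)
    if not query_keywords:
        return 0.0
    return len(query_keywords & set(trajectory_keywords)) / len(query_keywords)


def combined_score(query_points, query_keywords, traj, alpha=0.5):
    """Linear combination of spatial and textual similarity."""
    d = spatial_distance(query_points, traj["points"])
    spatial_sim = 1.0 / (1.0 + d)   # assumed mapping from distance to (0, 1]
    return (alpha * spatial_sim
            + (1 - alpha) * text_similarity(query_keywords, traj["keywords"]))


def conjunctive_search(query_points, query_keywords, trajs, k):
    """Keep only trajectories with all keywords, then rank by distance."""
    eligible = [(spatial_distance(query_points, t["points"]), tid)
                for tid, t in trajs.items()
                if set(query_keywords) <= set(t["keywords"])]
    return [tid for _, tid in sorted(eligible)[:k]]


if __name__ == "__main__":
    trajs = {
        "A": {"points": [(0, 0), (1, 0)], "keywords": {"coffee"}},
        "B": {"points": [(5, 5)], "keywords": {"coffee", "seafood"}},
    }
    q_pts, q_kw = [(0, 0)], {"coffee", "seafood"}
    for tid, t in trajs.items():
        print(tid, round(combined_score(q_pts, q_kw, t), 3))
    print(conjunctive_search(q_pts, q_kw, trajs, k=1))
```

With equal weights, the nearby trajectory `A` ranks first under the linear combination even though it lacks one keyword, while the conjunctive search returns only `B`.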
Pointwise Search with Keywords.
Several recent papers have explored the problem of spatial-textual trajectory search in a range of different scenarios. Cong et al. [42] proposed a sub-trajectory based solution to find the minimum travel distance for a single query point using a Boolean keyword match. Shang et al. [140] presented a disjunctive solution for multiple points, where the distance is measured between points, but keywords are assigned to an entire trajectory instead of each point. Zheng et al. [202] attached the keywords to a specific point in the trajectory to allow finer-grained matching of textual information. However, their work only supported conjunctive text matching, so results must contain all query keywords, simplifying the more general keyword search problem. To handle cases where users do not have preferred locations or exact text matching is not required, Zheng et al. [201] proposed an approximate query solution over trajectories that uses a set of keywords, with similarity measured as the travel distance, just as in Cong et al. [42]. A special case of this problem arises when the query contains only keywords, which is known as a keyword-only search (KS) [25]. Another special case arises when the trajectory data and query are a single point, which is known as Top-$k$ Spatial Keyword Search (T$k$SK) [195].

𝑘 Query.
Hariharan et al. [78] utilized Boolean range queries to find all objects containing the query keywords, all of which must be located within a bounded range. Boolean top-$k$ queries [50] find the $k$ nearest objects using conjunctive Boolean constraints on the text, i.e., they conduct a $k$NN search over all the points that contain all of the query keywords. Han et al. [77] investigated Boolean range queries on trajectories, considered spatial, temporal, and textual information in the solution, and proposed an index based on an octree and an inverted index to answer the query.
Inverted indexes [210] are widely used to efficiently manage text information for spatial-textual problems. Related algorithms and data structures for inverted indexes have benefited from many years of practical improvements to support web search and related applications, making them an obvious candidate to manage textual components in heterogeneous data collections. Every keyword is mapped to an inverted list of the unique identifiers of all objects that contain it. A list entry can represent existence only, or contain auxiliary data such as the frequency of the keyword in the object text. The auxiliary data can also be a precomputed similarity score or even offsets for phrase reconstruction. Most prior work [195] attaches inverted lists to nodes in an existing tree-based spatial index (a Grid-index for example) in order to support spatial-textual similarity computations during tree traversal.
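A minimal in-memory inverted index makes the structure concrete. The sketch assumes each object is described by a list of keywords, stores the keyword frequency as the auxiliary value, and answers conjunctive (AND) and disjunctive (OR) keyword queries. The function names are hypothetical, and none of the compression or spatial integration used in practice is shown.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each keyword to {object ID: frequency of the keyword}.

    docs : dict mapping an object ID to its list of keywords
    """
    index = defaultdict(dict)
    for oid, words in docs.items():
        for word, tf in Counter(words).items():
            index[word][oid] = tf
    return index


def conjunctive(index, keywords):
    """Objects that contain all of the query keywords (AND)."""
    lists = [set(index.get(w, {})) for w in keywords]
    return set.intersection(*lists) if lists else set()


def disjunctive(index, keywords):
    """Objects that contain at least one query keyword (OR)."""
    result = set()
    for w in keywords:
        result |= set(index.get(w, {}))
    return result


if __name__ == "__main__":
    docs = {
        1: ["coffee", "seafood", "coffee"],
        2: ["coffee"],
        3: ["swimming"],
    }
    idx = build_inverted_index(docs)
    print(idx["coffee"])
    print(conjunctive(idx, ["coffee", "seafood"]))
    print(disjunctive(idx, ["seafood", "swimming"]))
```

Attaching one such index to each node of a spatial structure is what lets a traversal discard a whole subtree when none of its objects contain the query keywords.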
Clustering similar trajectories to produce representative exemplars can be a powerful visualization tool to track the mobility of vehicles and humans. It has been investigated in many different applications, such as spatial databases [12, 82, 95], data mining [129], transportation [181], and visualization [60]. Clustering is most often applied to spatial-only trajectories, with prior work on spatial-textual trajectory clustering being relatively rare. Trajectory clustering can be broadly divided into two categories: partition-based [27, 60, 68, 82, 129, 186] and density-based [12, 15, 75, 95, 104]. Both partition-based and density-based trajectory clustering require extensive similarity computations, with the only distinction being whether similarity is computed for whole trajectories or only for sub-trajectories.
Given a set of trajectories, partition-based clustering divides trajectories into a limited number ($k$) of groups (clusters). The similarity measure (Section 3) and parameter $k$ are selected a priori depending on the use case.

Definition 11. (Partition-based Trajectory Clustering) Given a set of trajectories $\{T_1, T_2, \cdots, T_n\}$, partition-based clustering aims to partition the $n$ trajectories into $k$ ($k \le n$) clusters $S = \{S_1, S_2, \cdots, S_k\}$ by minimizing the objective function:
\[
O = \arg\min_{S} \sum_{j=1}^{k} \sum_{T_i \in S_j} d(T_i, \mu_j) .
\]
Here, each cluster $S_j$ has a centroid trajectory or path $\mu_j$, and $d(T_i, \mu_j)$ is a similarity measure.

Many different similarity measures such as the ones introduced in Section 3 have been used for trajectory clustering. For partition-based clustering, the most appropriate similarity measure varies widely with the application scenario (e.g., vehicle [27, 60, 82, 129], soccer player [68], cellular [181], large vessels [186]). Parameterization can also be an important hurdle, as several approaches need to optimize multiple parameters in addition to $k$ in order to produce high quality results. Partition-based trajectory clustering is an extension of $k$-means, which is NP-hard when computing an exact solution, so it can also be computationally intractable for large collections when certain similarity measures are required. So, often the only solution is to find ways to reduce the number of trajectories compared through similarity thresholding [129], or to rely on algorithms with approximation ratios, since an exact solution cannot be achieved in polynomial time [27].

One possible approach is to use $k$-paths clustering [164], which is an extension of $k$-means for trajectories on a road network. The key idea in $k$-paths is to use a quasilinear similarity measure, EBD, to prune the search space with an inverted index, and to exploit the metric properties of EBD. As we described in Section 3, EBD computes similarity by performing a set intersection over edges, which substantially improves the performance of pairwise similarity computation, and also maintains comparable precision to common measures that have a quadratic computational complexity. Then, an indexing technique based on an edge inverted index and a tree structure for metric similarity measures is used to further reduce the number of similarity computations in the assignment and refinement phases.
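The assignment and refinement phases can be illustrated with a deliberately simple sketch. It is not the $k$-paths algorithm of [164]: the refinement step just picks the member trajectory with the smallest total EBD to its cluster (a medoid, which is an assumption made here), the initial centroids are supplied by the caller, and no inverted index or metric pruning is used.

```python
def ebd(q, t):
    """Edge-based distance over sets of edge IDs."""
    q, t = set(q), set(t)
    return max(len(q), len(t)) - len(q & t)


def assign(trajs, centroids):
    """Assignment phase: put each trajectory in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for t in trajs:
        j = min(range(len(centroids)), key=lambda c: ebd(t, centroids[c]))
        clusters[j].append(t)
    return clusters


def refine(clusters, centroids):
    """Refinement phase: the new centroid is the member with the least total distance."""
    updated = []
    for members, old in zip(clusters, centroids):
        if not members:
            updated.append(old)   # keep the old centroid for an empty cluster
        else:
            updated.append(min(members,
                               key=lambda m: sum(ebd(m, t) for t in members)))
    return updated


def objective(clusters, centroids):
    """Value of the objective in Definition 11 for the given clustering."""
    return sum(ebd(t, mu) for members, mu in zip(clusters, centroids)
               for t in members)


def cluster(trajs, centroids, max_iter=10):
    """Alternate assignment and refinement until the centroids stop changing."""
    for _ in range(max_iter):
        updated = refine(assign(trajs, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return assign(trajs, centroids), centroids


if __name__ == "__main__":
    data = [(1, 2, 3), (1, 2, 3, 4), (2, 3, 4), (10, 11), (10, 11, 12)]
    groups, mus = cluster(data, [(1, 2, 3), (10, 11)])
    print(groups, mus, objective(groups, mus))
```

Each iteration evaluates the distance from every trajectory to every centroid, which is the cost that the inverted index and metric pruning in $k$-paths are designed to reduce.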
Density-based trajectory clustering first finds dense “segments”, and then connects these segments to produce representative routes. However, the most appropriate approach to identifying dense segments is highly dependent on properties of the data. TRACLUS [95] is the most cited trajectory clustering algorithm, and solves this problem in two steps: each trajectory is partitioned into a set of line segments (sub-trajectories), and then similar line segments are grouped together into clusters. A distance threshold $\tau$, which specifies the neighborhood radius, is the most common parameter used in density-based clustering solutions.

Based on TRACLUS [95], Li et al. [104] proposed an incremental clustering framework to allow new trajectories to be added to a database, and introduced a new parameter for an additional step: generating micro-clusters before the final clusters. Similarly, Agarwal et al. [12] reduced the clustering problem to finding a frequent path in a graph, which allowed standard graph traversal techniques to be applied to identify pathlets that could be used to represent each cluster with common sub-trajectories. Each cluster’s pathlet is defined as a sequence of points that is not necessarily a subsequence of an input trajectory. The main difference from TRACLUS is that the pathlets are the result of optimizing a single objective function to best represent the trajectories, while TRACLUS generates a set of common segments through density-based clustering, then connects them to form several common paths to represent the trajectories. Unfortunately, finding pathlets has been proved to be an NP-hard problem, so exact solutions are intractable.
For urban data, there are mainly two types of trajectory classification problems: 1) similarity-based classification [94]; 2) transportation modes and activities classification [45, 205] which wasreviewed recently by [204]. Next, we will mainly focus on the similarity-based classification, andalso introduce a new classification task for travel-based inferences [30, 67, 123, 161].
Several studies have investigated trajectory classification problems which also require extensive similarity computations [94, 128, 142]. The main distinction from trajectory clustering is that classification assigns a label to an individual trajectory based on its features, while clustering is conducted over all items in a dataset. Specifically, Lee et al. [94] proposed a feature generation framework for trajectory classification, where two types of clustering, region-based and trajectory-based [95], are used to generate features for traditional classifiers such as a decision tree or an SVM. Sharma et al. [142] proposed a nearest neighbor classification for trajectory data, which computes distances over sampled trajectories and then assigns the label of the nearest trajectory. Andres et al. [14] found relevant sub-trajectories as features for robust classification, where the distance is computed between two equal-length sub-trajectories.
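Nearest neighbor classification is the simplest of these ideas to write down. The sketch below is a generic 1-NN classifier and not the method of [142]: DTW with Euclidean point distance is assumed as the trajectory distance, no sampling is applied, and the labeled trajectories and labels are made up for the example.

```python
import math


def dtw(Q, T):
    """Dynamic time warping distance with Euclidean point distance."""
    n, m = len(Q), len(T)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(Q[i - 1], T[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def classify(query, labeled, distance=dtw):
    """Return the label of the labeled trajectory nearest to `query`.

    labeled : list of (trajectory, label) pairs
    """
    nearest = min(labeled, key=lambda item: distance(query, item[0]))
    return nearest[1]


if __name__ == "__main__":
    training = [
        ([(0, 0), (1, 0), (2, 0)], "east"),
        ([(0, 0), (0, 1), (0, 2)], "north"),
    ]
    trip = [(0, 0.1), (1, 0.1), (2, 0.2), (3, 0.1)]
    print(classify(trip, training))
```

The `distance` argument is a parameter so that any of the measures from Section 3 could be substituted. Classifying one trajectory costs one full distance computation per labeled trajectory, which is why the cost of the similarity measure dominates.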
Inferring the purpose of a trip has potential applications for improving urban planning and governance [161], but it is normally conducted through manual surveys. With the proliferation of trajectory data, trip purposes can also be classified automatically. Gong et al. [67] categorized the spatiotemporal characteristics of nine daily activity types based on inference results, including their temporal regularities, spatial dynamics, and distributions of trip lengths and directions. Wang et al. [161] proposed a probabilistic framework for inducing trip ordering in massive collections of taxi GPS trajectories. The key idea is to augment the origin and destination with neighboring POIs and identify POI links based on time periods, so that the trip intents can be explained semantically. Nair et al. [123] learned a model to automatically infer the purpose of a cycling trip using personal data from cyclists, GPS trajectories, and a variety of built-in and social environment features extracted from open datasets characterizing the streets.

Large-scale trajectory data is primarily collected in urban areas, making it a valuable tool for applications in real-time smart cities and timely decision making. We will focus on four categories of applications which can benefit from trajectory data. Table 6 provides a summary of representative work in each of these broad areas. To intuitively illustrate the connections between each of these, we also provide several examples in Fig. 12. Other examples can be found in the Appendix of [162].

Table 6. A summary of trajectory applications, where “P2P” denotes the point-to-point distance, and “P2T” denotes the point-to-trajectory distance.

| Type | Work | Map matching | Storage | Measure | Representative query | Clustering | Indexing |
|---|---|---|---|---|---|---|---|
| Road Traffic | Monitoring [166, 170, 182] | ✓ | segment | - | Range query | ✓ | ✓ |
| Road Traffic | Jam & Flow [48, 85, 203] | ✓ | segment | EBD | ✗ | ✓ | ✓ |
| Road Traffic | Anomaly [32, 172] | ✗ | point | EDR | ✗ | ✓ | ✓ |
| Green Transport | Network design [31, 111, 130] | ✗ | point | P2P | Constrained path search | ✓ | ✓ |
| Green Transport | Navigation [47, 70] | ✓ | segment | ✗ | Shortest path query | ✗ | ✓ |
| Green Transport | Carpooling [79, 81] | ✗ | segment | P2P | Join | ✓ | ✓ |
| Tourism Planning | Trip search [165, 167] | ✗ | point | P2T | TkSTT | ✓ | ✓ |
| Tourism Planning | Customized trip [37, 169] | ✗ | point | P2P | Frequent path | ✓ | ✓ |
| Tourism Planning | Interest Discovery [126, 169] | ✗ | point | P2P, P2T | Multiple queries | ✓ | ✓ |
| Site Selection | Billboard [199, 200] | ✗ | point | P2T | Set cover | ✗ | ✓ |
| Site Selection | Charging station [102, 106] | ✗ | point | ✗ | ✗ | ✗ | ✓ |
| Site Selection | Facility route [18, 96, 163] | ✓ | segment | P2T | Shortest path query | ✓ | ✓ |
Cities and government agencies are often a valuable source of trajectory data. There has been an increased emphasis on collecting data from traffic signals and other related sensors for internal auditing purposes, and regulatory requirements often require the data collected to be made publicly available since it was gathered using tax income. Such data can be used for traffic monitoring [170], anomaly detection [172], and traffic jam and flow analysis [175].
An Overview.
Fig. 7 shows the pipeline of three representative trajectory analytical tasks for road traffic. As a common operator, mapping trajectories onto road networks not only cleans the data, but also enables traffic jam and flow analysis in real time.
Since trajectories record the trace of vehicles on the road, many visualization systems [35] were developed to observe and monitor past or real-time traffic trends and movement patterns based on trajectory data. Users can interactively explore the traffic conditions in a specific area or road, and further control the traffic if necessary. As visualizing an entire city-wide trajectory dataset on a single screen can be unreadable, selecting an area and identifying the trajectories covered is a useful application of a range query, which was described in Section 4. In addition to a range query, a wide variety of trajectory search queries [34, 49, 88, 91, 134, 136] have been proposed over the years to support various applications. To monitor all vehicles that use Main Street, a path query [91] can be issued. Further, a strict path query identifies every vehicle that traverses all of Main Street. Wang et al. [170] integrated all the above queries into a real-time traffic monitoring system, where users can interactively conduct queries and get results in real time.

Fig. 7. Pipeline decomposition of three types of work on road traffic: traffic monitoring [170], anomaly detection [32], and flow analysis [164].
Live traffic monitoring enables analysts to track usage of specific roads or areas in real time, and can be used to identify traffic jams in order to notify commuters through digital traffic signs or GPS applications. After mapping trajectory data to a road network, Wang et al. [175] visually inspected traffic data and verified problem areas based on speed limit information of the streets under consideration.

Instead of inspecting individual road segments to identify traffic jams, traffic flow analysis does not rely on domain-specific features such as speed limits. The goal is to discover important movement patterns of drivers, such as repeatedly using a common route composed of several different road segments. Several recent papers have proposed concept definitions that capture essential properties of traffic flow analysis based on trajectories. Gudmundsson and Van Kreveld [69] defined the concept of a flock, which is a common sub-trajectory covered by a set of trajectories with at least 𝑘 circles of the same radius. Jeung et al. [85] extended this to the concept of a convoy, where the circles can have various radii. Li et al. [103] proposed the concept of a swarm to relax the requirement that a sub-trajectory has to be a continuous set of points in the corresponding trajectory. Rather than a single sub-trajectory, Zheng et al. [203] proposed the concept of a gathering, which returns multiple sub-trajectories. To further avoid specifying multiple hyper-parameters, e.g., the number of circles, 𝑘-paths trajectory clustering returns the 𝑘 most representative trajectories (real paths on the road network) [164], which can be used in traffic flow analysis.

An anomalous or outlier trajectory in a dataset can be defined as a trajectory that falls outside of a predefined confidence interval w.r.t. the distribution of the entire collection. This definition has been applied in applications such as identifying criminal behavior in a population [143] and taxi driver fraud detection [172].
A combination of similarity and clustering can be used to solve this problem. Computing similarity between trajectories can be used to identify the trajectories that are the most dissimilar in a dataset, and clustering can be used to aggregate common trends that might not be easily identifiable otherwise. For example, given an incoming trajectory 𝑇 between a source 𝑠 and a destination 𝑑, anomaly detection was defined by Chen et al. [32] using: 1) an anomaly score of 𝑇 based on the proportion of existing trajectories in 𝐷 that pass through 𝑠 and 𝑑 and cover 𝑇 completely; 2) a predefined threshold 𝜃 to determine whether 𝑇 is anomalous. The similarity measures are based on two types of information: 1) sub-trajectories [93]; 2) entire trajectories [32, 172]. Lee et al. [93] detected outliers using line segments, i.e., a trajectory was partitioned into a set of line segments, and outlying line segments were then used to find trajectory outliers. Wang et al. [172] utilized an edit-based similarity measure to detect anomalies in a hierarchical cluster arrangement.
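The proportion-based score and threshold described above can be sketched as follows. This is a simplified reading rather than the scoring function of [32]: trajectories are assumed to be sequences of discrete cell or road-segment identifiers, "passing through 𝑠 and 𝑑" is taken to mean visiting 𝑠 before 𝑑, and "covering 𝑇 completely" is taken to mean containing 𝑇 as an ordered subsequence. The names `support` and `is_anomalous` are hypothetical.

```python
def is_subsequence(needle, haystack):
    """True if every item of needle appears in haystack in the same order."""
    remaining = iter(haystack)
    return all(item in remaining for item in needle)

def support(t, dataset, s, d):
    """Fraction of historical s-to-d trajectories that cover t completely."""
    candidates = [h for h in dataset if is_subsequence([s, d], h)]
    if not candidates:
        return 0.0
    covering = sum(1 for h in candidates if is_subsequence(t, h))
    return covering / len(candidates)

def is_anomalous(t, dataset, s, d, theta):
    """Flag t when its support falls below the threshold theta (assumed rule)."""
    return support(t, dataset, s, d) < theta

history = [["a", "b", "c", "d"]] * 3 + [["a", "x", "y", "d"]]
detour = ["a", "x", "y", "d"]
usual = ["a", "b", "c", "d"]
detour_flag = is_anomalous(detour, history, "a", "d", theta=0.3)
usual_flag = is_anomalous(usual, history, "a", "d", theta=0.3)
```

Here the detour is supported by one of four historical trips (0.25) and is flagged, while the usual route (0.75) is not. The choice of 𝜃 directly controls how rare a route must be before it is reported.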
Fig. 8. Pipeline decomposition of work on green transport: network planning [31], carpooling [79], and navigation [46].
Public transportation networks such as subways, bus routes, and gas stations provide more choices for green commuting. Trajectory data is commonly applied in green transport (also known as “sustainable transport” [137]) applications to optimize network design, conduct personalized and adaptive navigation, and support carpooling. Note that this survey focuses on algorithms and complexity; for a more comprehensive view from the perspective of road transportation agencies, we refer interested readers to a review that covers this problem in detail [121].
An Overview.
Based on these examples, we observe that a critical task in road networks is trip planning or finding common routes in a graph for a passenger or a driver, as shown in Fig. 8. Complex route planning problems typically maximize a capacity or minimize a travel time. When viewed this way, many of the problems encountered are NP-hard (typically shown by reduction from the traveling salesman problem) and are solved using greedy algorithms.
By mining taxi data, Chen et al. [31] approximated night-time bus route planning by first clustering all points in taxi trajectories to determine “hot spots” that could serve as bus stops, and then creating bus routes based on the connectivity between stops. Toole et al. [152] used census records and mobile phone location information to estimate demand in transit routes. Historical traffic data [11], smart cards [198], sensors [117], and cellular data [130] can provide more comprehensive demand data and be used to further improve data-driven transit network design. Pinelli et al. [130] derived frequent patterns of movement from trajectories by computing distance and flow gain, and then merged them to generate a network for the whole city. A common shortcoming of these methods is that they build a new network, and are therefore not applicable to most cities, which already have bus networks. Based on human mobility patterns extracted from taxi data and smart card data, Liu et al. [111] proposed a localized transportation choice model, which predicted travel demand for different bus routes by taking into account both the existing bus network and taxi routes.
Travel distance-based shortest path search between a source and destination is widely used in navigation services. It can be applied adaptively using historical trajectory data from drivers. With precise map matching techniques, discovering trajectory paths in data can enrich models for a road network, e.g., with the dynamic time cost of each road [47] or driver preferences [70]. Using a dynamic travel time cost for each road segment, travel time estimation [47] and fastest path search [171] are more realistic, capturing various traffic conditions and potentially leading to alternative shortest paths when all aspects are considered. Guo et al. [70] proposed learning to route with sparse trajectory sets, by constructing a region graph of transfer routing preferences. In the graph, the nodes are equal-size grid cells, and an edge connects two cells that are crossed by more than a fixed number of trajectories. Discovering preferences from trajectories can also be used to support personalized route recommendation for green vehicle routing [21]. Dai et al. [46] developed techniques to support the efficient selection of a subset of trajectories to follow, using driver preferences (travel distance, travel time, and consumed fuel) and the source, destination, and departure time specified by the driver. Delling et al. [51] proposed a framework to generate personalized driving directions from trajectories, where the path preferences were composed of several features, such as avoiding tolls, U-turns, and highway versus surface streets.
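The region graph described above (grid cells as nodes, edges between cells crossed by more than a fixed number of trajectories) can be sketched as follows. This is a literal reading of that one-sentence description, not the construction used in [70]: the cell size, the support threshold `min_support`, and the rule that any two cells visited by the same trajectory count as "crossed" together are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cell_of(point, cell_size):
    """Map an (x, y) point to the index of its equal-size grid cell."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))

def build_region_graph(trajectories, cell_size, min_support):
    """Return the set of cell pairs crossed by more than min_support trajectories."""
    pair_counts = Counter()
    for trajectory in trajectories:
        # Each trajectory contributes at most once to every pair of cells it visits.
        cells = sorted({cell_of(p, cell_size) for p in trajectory})
        pair_counts.update(combinations(cells, 2))
    return {pair for pair, count in pair_counts.items() if count > min_support}

trips = [
    [(0.5, 0.5), (1.5, 0.5)],
    [(0.2, 0.8), (1.1, 0.3)],
    [(0.9, 0.1), (1.9, 0.9)],
    [(0.5, 0.5), (0.5, 1.5)],
]
edges = build_region_graph(trips, cell_size=1.0, min_support=2)
```

Three of the four toy trips cross both cell (0, 0) and cell (1, 0), so only that pair becomes an edge. Counting all co-visited pairs is quadratic in the number of cells per trajectory; restricting to consecutive cells would be a cheaper alternative.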
There are two common approaches to ridesharing problems which rely heavily on trajectory data. The first one is grouping passengers and drivers based on their historical trajectory data. The underlying goal is to keep all seats in all vehicles occupied, which is essentially a type of bin packing problem. For example, Hsieh [81] devised a carpooling system to match passengers and drivers based on their travel trajectories, where the similarity measure was defined as the sum of the distances from the pickup and drop-off points to their nearest points on a driver trajectory. The other approach is to detect frequent routes using only rider trajectories. He et al. [79] proposed a carpooling system that generates an efficient route for dynamic ridesharing by mining the frequent routes taken by participating riders. Hong et al. [80] performed rider matching by clustering trajectories, and generated a simulation model of ridesharing behaviors to illustrate the potential impact. Similar to other road traffic applications, similarity computations are performed between passenger and driver trajectories, and clustering is then used to detect the frequent routes shared by multiple passengers in trajectory-driven carpooling.
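The pickup and drop-off measure attributed to Hsieh [81] above can be written down directly. The sketch below assumes planar coordinates and measures distance to the nearest sampled point of the driver trajectory rather than to the nearest position along a segment; the function names and the driver identifiers are hypothetical.

```python
import math

def nearest_point_distance(point, trajectory):
    """Distance from a point to the closest sampled point of a trajectory."""
    return min(math.dist(point, q) for q in trajectory)

def carpool_cost(pickup, dropoff, driver_trajectory):
    """Sum of the pickup and drop-off distances to the driver trajectory."""
    return (nearest_point_distance(pickup, driver_trajectory)
            + nearest_point_distance(dropoff, driver_trajectory))

def best_driver(pickup, dropoff, drivers):
    """Return the driver id whose trajectory gives the smallest cost."""
    return min(drivers, key=lambda name: carpool_cost(pickup, dropoff, drivers[name]))

drivers = {
    "driver_a": [(0, 0), (1, 0), (2, 0)],
    "driver_b": [(0, 5), (1, 5), (2, 5)],
}
match = best_driver((0, 1), (2, 1), drivers)
```

A smaller cost means the rider would walk less to meet the driver's route, so `driver_a` is preferred in this toy example. The measure ignores direction of travel and timing, which a practical matcher would also need to check.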
The tourism market is an enormously valuable commodity in many countries. For example, this market in Britain is expected to be worth more than £257 billion by 2025, and represents 11% of the total GDP of the UK. From a consumer perspective, historical trip data can also improve overall satisfaction, and provide tools to discover more personalized opportunities that align with individual preferences.
Searching for similar trips related to a set of given points with keywords [165, 201, 202], or a range [77], can enable more effective zero-knowledge querying capabilities, as we described in Section 4.4. In addition to spatial proximity and text relevance, other factors can be considered, such as photos [115]. Lu et al. [115] leveraged existing travel clues recovered from 20 million geo-tagged photos to suggest customized travel route plans according to user preferences. Wei et al. [176] developed a framework to retrieve the top-𝑘 trajectories that pass through popular regions. Rather than deriving search results using only existing trips, generating new trip plans without keywords or locations is also possible. Chen et al. [37] discovered popular routes from spatial-only trajectories by constructing a transfer network from trajectory data in two steps. First, hot areas are detected as nodes in the network; then, edges are added based on connectivity properties in the entire trajectory collection. A network flow algorithm was then proposed to discover the most popular route from the transfer network. Wang et al. [169] developed an interactive trip planning system called TISP, which enabled interesting attractions to be discovered and used to dynamically modify recommendations based on click-based feedback from POIs displayed on the map.

When a trajectory is enriched with text, it is possible to discover more useful patterns by leveraging semantic information as well as human interaction. Interest discovery is a valuable tool to improve the effectiveness of recommendations for restaurants, attractions, or public events for tourists. The three most common categories of interest discovery using trajectory data for tourism are: 1) point of interest (POI) [66]; 2) region of interest (ROI) [84, 153]; and 3) interactive discovery [24, 167]. Palma et al. [126] discovered interesting locations from trajectories using a spatiotemporal clustering method, based on the speed of single trajectories.
Uddin et al. [153] presented a generalized ROI definition for trajectories which is parameter independent, and an efficient index over the segments by speed was proposed to find the ROIs without scanning the whole dataset. Brilhante et al. [24] utilized Wikipedia content, trajectories over georeferenced Flickr photos, and human feedback to discover a “budget-constrained sightseeing tour” using a tourist's preferences and available time. Wang et al. [167] proposed a unified index, composed of an inverted index and a grid index, to find POIs, including attractions, hotels, and restaurants, to achieve fast performance for real-time response.

Fig. 9. A common pipeline decomposition of trajectory-driven site selection (stages: trajectory dataset, input facility data, input constraint, define an objective function, reduce to an NP problem, greedy or exact or approximate algorithm, return site locations).
Generally, semantic pattern mining aims to mine frequent movements together with descriptive text information, which is more comprehensive than a spatial-only route. Zhang et al. [193] discovered fine-grained sequential patterns which simultaneously satisfy spatial compactness, semantic consistency, and temporal continuity from semantic trajectories. The algorithm first groups all the places by category and retrieves a set of coarse patterns from the database, then splits each coarse pattern in a top-down manner. Instead of the two-step approach in [193], Kim et al. [87] presented a latent topic-based clustering algorithm to discover semantic patterns in the trajectories of geo-tagged text messages. However, the above method can only work over trajectories with text. Choi et al. [40] studied a regional semantic trajectory pattern mining problem, aiming at identifying all the regional sequential patterns in semantic trajectories. Semantic pattern mining was not covered in previous surveys [127, 201]. Trajectory pattern mining was grouped into four categories by Zheng et al. [201]: 1) co-movement patterns, which have been described under traffic flow analytics, e.g., the convoy, flock, group, and gathering; 2) trajectory clustering (see Section 5); 3) sequential patterns, which indicate that a certain number of moving objects travel a common sequence of locations in a similar time interval; 4) periodical patterns, which indicate periodic behaviors for future movement prediction.
As a core decision-making tool, trajectory-driven site selection has been a crucial factor in increasing business profit and public service quality. Using collected trajectory data to estimate the influence of selected sites for drivers or passengers can be applied to problems such as charging station placement [102], billboard placement [71], and facility route design [163].

Definition 12. (Constrained Site Selection) Given a set of trajectories 𝐷, a set of facilities 𝐹, a constraint value 𝐶, and an influence model (𝐼𝑀), the aim of constrained site selection is to find a subset 𝑆 ⊂ 𝐹 that satisfies the objective function: $O = \arg\max_{cost(S) < C} IM(S, D)$.

Different objective functions can be defined based on the exact scenario. In Table 7, we compare recent work in this area in terms of constraints, input data, map matchability, objective function, NP-hardness reduction, acceleration strategies, and approximation guarantees.
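To make Definition 12 concrete, the sketch below instantiates it with one possible influence model and one possible heuristic. Both are assumptions for illustration only: the influence of a facility set is taken to be the number of distinct trajectories that pass at least one selected facility, and the selection rule is a simple greedy choice by marginal gain per unit cost under the strict constraint cost(𝑆) < 𝐶. No approximation guarantee is claimed for this particular rule, and the names `influence` and `greedy_site_selection` are hypothetical.

```python
def influence(selected, coverage):
    """Assumed influence model: number of distinct trajectories covered."""
    covered = set()
    for facility in selected:
        covered |= coverage[facility]
    return len(covered)

def greedy_site_selection(coverage, cost, budget):
    """Greedily add the facility with the best marginal gain per unit cost.

    coverage maps a facility id to the set of trajectory ids it influences;
    cost maps a facility id to a positive cost; the total must stay below budget.
    """
    selected, covered, spent = [], set(), 0.0
    remaining = set(coverage)
    while remaining:
        best, best_ratio = None, 0.0
        for facility in sorted(remaining):  # sorted for deterministic ties
            if spent + cost[facility] >= budget:
                continue  # would violate cost(S) < C
            gain = len(coverage[facility] - covered)
            ratio = gain / cost[facility]
            if ratio > best_ratio:
                best, best_ratio = facility, ratio
        if best is None:
            break  # nothing affordable adds new coverage
        selected.append(best)
        covered |= coverage[best]
        spent += cost[best]
        remaining.remove(best)
    return selected

coverage = {"f1": {1, 2, 3}, "f2": {3, 4}, "f3": {5}}
cost = {"f1": 2.0, "f2": 1.0, "f3": 1.0}
chosen = greedy_site_selection(coverage, cost, budget=3.5)
total_influence = influence(chosen, coverage)
```

With a budget of 3.5, the sketch picks `f2` first (two trajectories for a cost of 1) and then `f1`, after which `f3` no longer fits. Swapping in a different `influence` function is the main extension point, since the scenarios compared in Table 7 differ mostly in how influence and cost are defined.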
An Overview.
Fig. 9 illustrates common connections between various site selection problems. Site selection definitions can change based on a specific problem scenario and may cover a single dataset or multiple datasets. The general problem is NP-hard by reduction from the set cover problem, and even approximate solutions tend to scale poorly. Bounded or greedy algorithms can be used to accelerate the processing along with expansion-based methods, such as estimating the bound for all remaining candidate sets by incrementally adding a single facility and comparing it against the current result to determine whether expansion can be safely terminated.

Table 7. Recent work on trajectory-driven site selection.

| Scenario | Work | Constraint | Extra Input Data besides Trajectory | Road Network | The Objective Function | The Reduced NP Problem | Accelerating Strategies | Guarantee |
|---|---|---|---|---|---|---|---|---|
| Charging station | [102] | 𝑘 stations | existing stations | ✓ | travel time, waiting time | integer programming | LP-rounding | N/A |
| Charging station | [76] | 𝑘 stations | + battery performance | ✓ | + installing cost, charging cost, waiting time | - | evolution algorithm | - |
| Charging station | [106] | for the whole city | petroleum stations, POIs, real-estate | ✓ | revenue, queueing time | bilevel optimization | alternating framework | local minima |
| Billboard | [71] | 𝑘 billboards | bus advertisement, bus station audiences | ✗ | influence | set-cover | index, expansion, upper bound | 1 − 1/𝑒 |
| Billboard | [107] | budget or 𝑘 location | traffic volume, speed | ✗ | coverage value | maximum coverage | inverted index, greedy heuristic | 1 − 1/𝑒 |
| Billboard | [199] | budget | billboards' price | ✗ | one-time impression | set cover problem | bound | 1 − 1/𝑒 |
| Billboard | [200] | budget | billboards' price | ✗ | logistic influence | biclique detection | branch-and-bound | 𝜃(1 − 1/𝑒) |
| Billboard | [159] | 𝑘 | advertisement topics, traffic conditions, mobility transition | ✓ | influence spread | weighted maximum coverage | divide-and-conquer | N/A |
| Facility route | [18] | budget, construction cost, utilization | bike trajectories | ✓ | beneficial score | 0-1 knapsack problem | greedy network expansion, inverted index | N/A |
| Facility route | [163] | travel distance | node capacity | ✓ | route capacity | constrained shortest path search | divide-and-conquer | exact |
| Facility route | [96] | source, destination | parking edges | ✓ | on-road travel time | shortest path search | incremental expansion | exact |
| General | [160] | 𝑘 = 1 | | ✗ | cumulative influence probability | - | filter-and-validate | exact |
| General | [101] | 𝑘 | - | ✓ | | 𝑘-cover | group pruning | 1 − 1/𝑒 |
| General | [122] | 𝑘 | existing facilities | ✓ | user inconvenience | 𝑘-center | best-first | 1 − 1/𝑒 |

Given the increasing popularity of electric vehicles, building more charging stations has become a crucial problem. Trajectory-driven charging station deployment aims to reduce the detour distance required for charging. Li et al.
[102] developed an optimal charging station deployment framework that uses historical electric taxi trajectory data, road map data, and existing charging station information as input. It then performs charging station placement and charging point assignment (each charging station has multiple charging points). The objective function is designed to minimize the average time to the nearest charging station and the average waiting time for an available charging point, and the problem can be formulated as an integer linear programming (ILP) problem [138], which is NP-hard. Liu et al. [106] aimed to achieve two goals: (i) the overall revenue generated by the EVCs is maximized, subject to (ii) the overall driver discomfort (e.g., queueing time) for EV charging is minimized. Hence, the charging station deployment is defined as a multi-objective optimization problem.
Billboard placement aims to find a limited number of billboards which maximize the influence on passengers and further increase profits. Recently, several studies [71, 107, 159, 199, 200] have investigated trajectory-driven billboard placement. Guo et al. [71] proposed the top-𝑘 trajectory influence maximization problem, which aims to find 𝑘 trajectories for deploying billboards on buses to maximize the expected influence based on the audience. The model of Liu et al. [107] was based on traffic volume. More specifically, the algorithm counts how many trajectories traverse an edge or vertex containing a billboard, and then identifies the 𝑘 most frequent for billboard placement. Zhang et al. [199] applied a model based on range and one-time impressions, constrained by the total budget available for billboard placement. Zhang et al. [200] proposed a logistic influence model which addresses a key shortcoming of approaches that depend on one-time impressions [199], namely that they do not consider the relationship between the influence effect and the impression counts for a single user. Wang et al. [159] placed billboards in a road network, and applied a divide-and-conquer strategy to accelerate the processing. As summarized in Table 7, billboard placement problem definitions generally include a budget, which also holds for other site selection problems in practice, as public resource allocation likewise needs to stay within a budget.

Instead of pointwise candidate set selection, a facility set can also be routes covering multiple candidates, such as planning bike lanes and route search. Bao et al. [18] designed bike lanes based on bike trajectories, where the constraint is a budget and the number of connected components. A greedy network expansion algorithm was proposed to iteratively construct new lanes to reduce the number of connected components until the budget is met. Wang et al.
[163] proposed the use of MaxRkNNT for planning bus routes between a given source and destination using a route capacity estimation query called R𝑘NNT. The key constraint is travel distance, and the objective function maximizes the estimated capacity of the routes. Using transit trajectories, Wu et al. [179] defined various objective functions for passenger preferences, and evaluated new transit routes based on multiple factors on a real-world transport network, including monetary cost, time cost, number of transfers, number of choices, and transit mode.
There are several general site selection studies that are not limited to any one scenario. Instead of setting a cost budget, it may be more desirable to set a parameter 𝑘 to choose a set of facilities from the candidate set. Specifically, Wang et al. [160] aimed to identify the optimal location (𝑘 = 1) that can influence the maximum number of moving objects under a probabilistic influence model. An exact algorithm based on filtering and verification was proposed to prune many inferior candidate locations prior to computing the influence. Li et al. [101] identified the 𝑘 most influential locations traversed by the maximum number of unique trajectories in a given spatial region, using an efficient greedy heuristic to find the location set in cases where 𝑘 and the spatial areas are large. Mitra et al. [122] solved the trajectory-aware inconvenience-minimizing placement of 𝑘 facility services, which was proved to be NP-hard [27]. The inconvenience is defined as the extra distance traveled to access the nearest facility location.

Deep learning has been successfully used for trajectory data analytics and applications over the last several years, and is attracting increasing interest. We group these methods into several categories, and discuss the associated research problems in detail in this section. We then cover the emerging trends in deep learning for trajectory data.
Trajectory Data Generation.
Generating synthetic yet realistic location trajectories plays a fundamental role in privacy-aware analysis on trajectory data. A generative model was proposed for generating trajectories [125] based on a set of training trajectories. The proposed solution uses generative adversarial networks (GAN) to produce data points in a metric space, which are then transformed to a sequential trajectory. Experimental results show that the model is able to capture the correlation between visited locations in a trajectory and learn the common semantic/geographic mobility patterns from the training trajectories. The idea of using a GAN to generate synthetic trajectories that can preserve the summary properties of real trajectory data for data privacy was introduced by Liu et al. [110].
Trajectory Data Recovery.
As many trajectories are recorded at a low sampling rate, the low-sampled trajectories cannot capture the correct routes of the objects. This problem has been studied in two settings previously, either with road networks using map matching [114], or without road networks [147], as discussed in Section 2.2. Wang et al. [158] aimed to recover the trajectory between two consecutive sampled points of “low-sampled” trajectory data. Using a seq2seq model, the proposed method uses spatial and temporal attention to capture spatiotemporal correlations and integrates a calibration component using a Kalman filter (KF) in order to reduce the uncertainty.
Representation learning of trajectories aims to represent trajectories as vectors of a fixed dimension. The similarity of two trajectories can then be computed based on the Euclidean distance of their vector representations, which reduces the complexity of similarity computation from $O(n^2)$ to $O(n)$, where 𝑛 is the trajectory length. Li et al. [100] proposed a seq2seq-based model to learn trajectory representations, where the spatial proximity is taken into account in the design of the loss function. The trajectory similarity based on the learned representations is robust to non-uniform and low sampling rates and to noisy sample points. Yao et al. [185] proposed the use of a deep metric learning framework to approximate any existing trajectory measure, which is capable of computing similarity for a given trajectory pair in linear time. The basic idea is to sample a number of seed trajectories from the given database and then use their pairwise similarities as a guide to approximate the similarity function with a neural metric learning framework. The proposed solution adopts a new spatial attention memory module that augments an existing RNN for trajectory encoding. Fu and Lee [63] proposed exploiting road networks to learn the trajectory representation based on an encoder-decoder model. Deep trajectory representation was also extended to multi-trajectory scenarios, such as trajectories encountered in sporting events with several football players [173]. Wang et al. [174] applied deep reinforcement learning to enable sub-trajectory similarity search, by splitting every trajectory into a set of sub-trajectories that can be candidate solutions for a query trajectory. By learning an optimal splitting policy, the sub-trajectory similarity search is more efficient than heuristics-based methods.

Yao et al. [186] transformed trajectories into feature sequences that capture object movements, and then applied an autoencoder framework to learn fixed-length deep representations of trajectories for clustering. Song et al. [145] proposed a model named DeepTransport to predict the transportation mode, such as walking, taking trains, or taking buses, from a set of individuals' GPS trajectories. DeepTransport is constructed using LSTM to predict a user's transportation mode. Endo et al. [55] adopted a stacked denoising autoencoder (SDA) to automatically extract features for the transportation mode classification problem. Gao et al. [65] considered a different trajectory classification problem, namely Trajectory-User Linking (TUL), which aims to link trajectories to the users who generate them in location-based social networks. An RNN-based semi-supervised model was proposed to address the TUL problem. Liu et al. [112] proposed a deep generative model, namely a Gaussian Mixture Variational Sequence AutoEncoder (GM-VSAE), for online anomalous trajectory detection based on variational autoencoding.
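The complexity argument made above for fixed-dimension representations can be illustrated with a small contrast between a quadratic pairwise measure and a vector distance. The sketch is only a stand-in: `toy_encode` is a hypothetical, hand-written encoder (index-based resampling followed by flattening), not a learned model such as those in [100, 185], and dynamic time warping is used here simply as a familiar example of a measure that fills an n-by-m table.

```python
import math

def dtw(a, b):
    """Dynamic time warping: fills an (n+1) x (m+1) table, so quadratic time."""
    n, m = len(a), len(b)
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            table[i][j] = step + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[n][m]

def toy_encode(trajectory, num_points=4):
    """Hypothetical stand-in for a learned encoder (requires num_points >= 2).

    Resamples the trajectory to num_points positions by linear interpolation
    over the point index and flattens the result into one fixed-length vector.
    """
    n = len(trajectory)
    vector = []
    for k in range(num_points):
        position = k * (n - 1) / (num_points - 1)
        low = int(math.floor(position))
        high = min(low + 1, n - 1)
        weight = position - low
        for axis in (0, 1):
            vector.append(trajectory[low][axis] * (1 - weight) + trajectory[high][axis] * weight)
    return vector

def embedding_distance(u, v):
    """Euclidean distance between two fixed-length vectors."""
    return math.dist(u, v)

route_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
route_b = [(0, 1), (1, 1), (2, 1), (3, 1)]
pairwise = dtw(route_a, route_b)
vector_gap = embedding_distance(toy_encode(route_a), toy_encode(route_b))
```

Once every trajectory has been encoded (a single pass over its points), each comparison costs time proportional to the vector dimension and no longer depends on the lengths of both trajectories, which is also what makes standard vector indexes applicable.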
Recurrent neural networks (RNN) are widely used for trajectory prediction, and most of the work reviewed also learns user representations from trajectories to capture user preferences in addition to the spatio-temporal contexts. Liu et al. [108] proposed the Spatial-Temporal RNN (ST-RNN) for predicting the next location of a given user. ST-RNN models local temporal contexts, periodical temporal contexts, and geographical contexts to learn the representation of users under specific contexts. Zhou et al. [209] adopted a seq2seq encoder-decoder framework consisting of two encoders and two decoders to predict future trajectories. This method does not explicitly learn user representations. Wu et al. [180] designed two RNN models for trajectories that consider road network constraints, and that can be used to predict the destination of a trajectory. DeepMove [58] is based on a multi-modal embedding RNN that captures complex sequential transitions by jointly embedding multiple factors such as time, user, and location. DeepMove also applies an attention mechanism to capture the periodical effect of mobility. Chen et al. [36] proposed a context-aware deep model called DeepJMT for jointly performing mobility prediction (to know where) and time prediction (to know when). DeepJMT captures the user's mobility regularities and temporal patterns using an RNN, captures spatial context, periodicity context, and social and temporal context using various mechanisms (e.g., a co-attention mechanism), and makes time predictions using a temporal point process.

A convolutional neural network (CNN) can also be used for trajectory prediction. Karatzoglou et al. [86] proposed a CNN-based approach for representing semantic trajectories and predicting future locations. The semantic trajectories are represented as a matrix of semantic meanings and trajectory IDs. A CNN is applied to the matrix to learn the latent features for predicting the next visited semantic location. Gao et al.
[64] developed a deep generative model called Variational Attentionbased Next (VANext) POI prediction, which simultaneously learns implicit relationships betweenusers and POIs, and captures sequential patterns of user check-in behavior for next POI prediction.A CNN is used to capture long term and structural dependency among user check-ins, achievingcomparable learning ability with the state-of-the-art RNN based methods, while significantlyimproving the learning efficiency. Lv et al. [116] proposed to model trajectories as two-dimensionalimages, and employed CNN for trajectory destination prediction.
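The idea of treating a trajectory as a two-dimensional image can be made concrete with a small sketch. The Python snippet below rasterizes a sequence of (x, y) points onto a fixed grid by counting the points that fall into each cell. The function name `rasterize`, the bounding box, and the grid size are illustrative assumptions; this is not the encoding used by Lv et al. [116], only a minimal example of the general idea.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. longitude and latitude


def rasterize(points: Sequence[Point],
              bbox: Tuple[float, float, float, float],
              rows: int, cols: int) -> List[List[int]]:
    """Count how many trajectory points fall into each cell of a rows x cols grid.

    bbox is (min_x, min_y, max_x, max_y); points outside it are ignored.
    Row 0 holds the smallest y values, column 0 the smallest x values.
    """
    min_x, min_y, max_x, max_y = bbox
    image = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Map coordinates to cell indices; clamp so that points lying exactly
        # on the upper boundary land in the last cell.
        c = min(int((x - min_x) / (max_x - min_x) * cols), cols - 1)
        r = min(int((y - min_y) / (max_y - min_y) * rows), rows - 1)
        image[r][c] += 1
    return image


# Hypothetical trajectory in a unit square, rasterized onto a 2 x 2 grid.
trajectory = [(0.1, 0.1), (0.4, 0.2), (0.6, 0.6), (0.9, 0.9), (1.0, 1.0)]
image = rasterize(trajectory, bbox=(0.0, 0.0, 1.0, 1.0), rows=2, cols=2)
```

A matrix of this kind could then be passed to a CNN as a single-channel image; the grid resolution controls the trade-off between spatial detail and input size.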
Wang et al. [155] aimed to estimate the travel time of a path from mobility trajectory data. Their approach used a CNN and an RNN to capture the features of a path, and employed a multi-task learning component to estimate the travel time. Zhang et al. [197] developed a bidirectional LSTM-based deep model, called DeepTravel, to estimate the travel time of a path from historical trajectories. Li et al. [99] proposed a deep generative model, DeepGTT, to learn the travel time distribution for a route by conditioning on the real-time traffic, which is captured by the trajectory data. Li et al. [98] proposed a deep probabilistic model, DeepST, which unifies three key explanatory factors (the past traveled trajectories, the impact of the destination, and real-time traffic) for the route decision between a given source and destination. Yuan et al. [191] estimated the travel time of an origin-destination (OD) pair at a certain departure time, and proposed a neural network based prediction model. This model exploits the fact that the travel time of a past OD trip is usually associated with the trajectory it traveled along, whereas no such trajectory is available during prediction.
Trajectory data has inspired many important research problems and applications in both academia and industry. However, open-source and commercial systems that can support the entire pipeline of trajectory data management and analytics are still non-existent. To better cater to future applications, a unified trajectory data management system would be an important contribution to the community. Desiderata of such a system include:
• Data Cleaning: Map matching, re-sampling, and calibration are fundamental building blocks in efficient and effective analytics. However, current solutions require users to manually supply a road network dataset as well as several data-dependent parameters. An automated data cleaning pipeline for trajectory data is highly desirable, as manual data cleaning is time-consuming and often not reproducible.
• Trajectory Data Repositories: Cleaning trajectory data is computationally expensive. Publicly available data repositories of raw and cleaned data would not only improve reproducibility, but also stimulate new research in the area. A standard data format for trajectory data is also not currently defined. Even though LineString and GPX can be used, their use is limited. Labeled datasets for similarity search and classification problems are also non-existent, hampering progress on these important problems.
• Data Integration and Operator Support: In more advanced data analytics applications, trajectory data is often heterogeneous. Data integration between trajectory databases and other databases, including spatial databases (point data) and graph databases (road networks), would be a valuable contribution. Trajectory data from multiple devices, including cameras, UAVs, and RFID, can also be integrated to build more comprehensive profiles of a city. Unstructured metadata from various sources also plays a vital role when applying trajectory data in real-world applications. This data might include speed limits, transit timetables, and public service opening hours. While this information is commonly used in commercial applications, the cost of using proprietary data sources is often too high for the research community.
• Performance Benchmarks: A trajectory performance benchmark similar to the TPC benchmark would greatly improve the quality of empirical comparisons of new algorithms in research papers. Many different distance measures and indexing data structures are being applied to trajectory search with little understanding of their true performance characteristics. For efficiency, pruning power and I/O are important factors in total running time; for effectiveness, the lack of ground truth means that search quality cannot truly be measured.
• Parameterization: Parameter selection plays a critical role in the system performance of trajectory analytics. Parameter-free algorithms are highly desirable, but not always possible. Automated parameter selection is an essential requirement for automatic databases and for intelligent trajectory analytics. It would also be interesting to investigate how bounded approximation ratios can be achieved automatically during algorithm design.
• Deep Trajectory Learning: Deep learning has made some progress in synthetic data generation, representation learning, and mobility prediction. More advanced tasks, including query optimization and learned indexes, can be investigated to improve query performance. Another promising topic is online deep trajectory learning, which would meet the demand for timely decision making over dynamic trajectory data.
• Self-Driving Trajectory: Self-driving [28, 118] will be one of the primary sources of trajectories in the near future. Unlike the typical long-term trajectories currently analyzed on road networks, which are the focus of this survey, self-driving trajectories will be short-term lane-level traces with high sampling rates, e.g., each trajectory lasts only five seconds [28]. Due to the requirement of real-time response for safe driving, managing and analyzing such trajectories will be crucial and challenging, especially in a streaming scenario.
• Trajectory Analysis for Public Health: Human movement data recorded by mobile phones has been analyzed to control malaria in Africa [177] and dengue epidemics in Pakistan [178]. With mass outbreaks of diseases such as COVID-19 [5, 8] in dense urban areas, collecting and analyzing the trajectories of infected people and building real-time warning mechanisms will play an important role in disease traceability, disease control, and emergency response.
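As a small illustration of the re-sampling step named under Data Cleaning above, the sketch below re-samples a timestamped trajectory at a fixed interval using linear interpolation between consecutive points. The function name `resample`, the `(t, x, y)` tuple layout, and the `interval` parameter are assumptions made for this example and do not come from any surveyed system. Note that `interval` is exactly the kind of data-dependent parameter that an automated pipeline would have to choose.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (timestamp in seconds, x, y)


def resample(traj: Sequence[Sample], interval: float) -> List[Sample]:
    """Re-sample a time-ordered trajectory every `interval` seconds.

    Positions between two recorded samples are linearly interpolated.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if len(traj) < 2:
        return list(traj)
    start, end = traj[0][0], traj[-1][0]
    out: List[Sample] = []
    i = 0  # index of the segment [traj[i], traj[i + 1]] currently in use
    k = 0  # number of output samples emitted so far
    while start + k * interval <= end:
        t = start + k * interval
        # Advance to the segment that contains time t. This cannot run past
        # the last segment because t never exceeds the final timestamp.
        while traj[i + 1][0] < t:
            i += 1
        (t0, x0, y0), (t1, x1, y1) = traj[i], traj[i + 1]
        w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        k += 1
    return out


# Hypothetical trajectory with irregular sampling (gaps of 10 s and 20 s).
raw = [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0), (30.0, 10.0, 20.0)]
resampled = resample(raw, interval=5.0)
```

Linear interpolation is only one possible choice; on a road network, interpolating along the map-matched path would usually be more faithful than a straight line between samples.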
In this survey, we have reviewed recent progress in trajectory data management and learning. We first presented an overview of trajectory data management and urban applications. Then, we categorized the problems based on components and operators shared across the tasks. Similarity measures, top-𝑘 similarity search, and fast clustering are all widely used for many different problems and scenarios, and were the focus of this study. Finally, we reviewed important research advances on four common analytics applications of trajectories for real-time smart cities. New applications that leverage trajectory data, such as deep trajectory learning, emerge daily, and we hope this survey provides readers with an overview of the landscape of trajectory data management and applications. Perspectives on how to choose the most appropriate solution for pre-processing, storage, search, and advanced analytics to achieve high efficiency and effectiveness can be applied in many different problem domains beyond road networks.
ACKNOWLEDGMENTS
Zhifeng Bao is supported in part by ARC DP200102611, DP180102050, and a Google Faculty Award. J. Shane Culpepper is supported in part by ARC DP190101113. Gao Cong is supported in part by a MOE Tier-2 grant MOE2019-T2-2-181, and a MOE Tier-1 grant RG114/19.
REFERENCES
IEEE Transactions on Intelligent Transportation Systems
16, 2 (2015), 653–662.[12] Pankaj K Agarwal, Kyle Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, and Erin Taylor. 2018. Subtrajectoryclustering: Models and algorithms. In
PODS . 75–87.[13] Luis Otavio Alvares, Vania Bogorny, Bart Kuijpers, Jose Antonio Fernandes de Macedo, Bart Moelans, and AlejandroVaisman. 2007. A model for enriching trajectories with semantic geographical information. In
GIS .[14] Carlos Andres, Luis Otávio Alvares, Willian Zalewski, Vania Bogorny, and Luis Otavio Alvares. 2018. MOVELETS:Exploring relevant subtrajectories for robust trajectory classification. In
SAC . 849–856.
[15] Gennady Andrienko, Natalia Andrienko, Salvatore Rinzivillo, Mirco Nanni, Dino Pedreschi, and Fosca Giannotti. 2009. Interactive visual clustering of large collections of trajectories. In
VAST . 273–274.[16] Samet Ayhan and Hanan Samet. 2016. Aircraft trajectory prediction made easy with predictive analytics. In
KDD .21–30.[17] Petko Bakalov and Vassilis J. Tsotras. 2008. Continuous spatiotemporal trajectory joins. In
GSN . 109–128.[18] Jie Bao, Tianfu He, Sijie Ruan, and Yu Zheng. 2017. Planning bike lanes based on sharing-bikes’ trajectories. In
KDD .1377–1386.[19] Jie Bao, Yu Zheng, and Mohamed F Mokbel. 2012. Location-based and preference-aware recommendation usingsparse geo-social networking data. In
SIGSPATIAL . 199–208.[20] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The R*-tree: an efficient androbust access method for points and rectangles. In
SIGMOD . 322–331.[21] Tolga Bektaş, Emrah Demir, and Gilbert Laporte. 2016. Green vehicle routing. In
Green transportation logistics .Springer, 243–265.[22] Jiang Bian, Dayong Tian, Yuanyan Tang, and Dacheng Tao. 2019. Trajectory data classification: A review.
ACMTransactions on Intelligent Systems and Technology
10, 4 (2019), 1–34.[23] Julian Bock, Robert Krajewski, Tobias Moers, Steffen Runde, Lennart Vater, and Lutz Eckstein. 2019.
The inD Dataset:A Drone Dataset of Naturalistic Road User Trajectories at German Intersections . Technical Report. arXiv:1911.07602[24] Igo Ramalho Brilhante, Jose Antonio Macedo, Franco Maria Nardini, Raffaele Perego, and Chiara Renso. 2015. Onplanning sightseeing tours with TripBuilder.
Information Processing and Management
51, 2 (2015), 1–15.[25] Andrei Z Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. 2003. Efficient query evaluationusing a two-level retrieval process. In
CIKM . 426–434.[26] Maike Buchin, Anne Driemel, Marc Van Kreveld, and Vera Sacristan. 2011. Segmenting trajectories: A frameworkand algorithms using spatiotemporal criteria.
Journal of Spatial Information Science
3, 3 (2011), 33–63.[27] T-h Hubert Chan, Arnaud Guerquin, and Mauro Sozio. 2018. Fully dynamic k-center clustering. In
WWW . 579–587.[28] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, PeterCarr, Simon Lucey, Deva Ramanan, and James Hays. 2019. Argoverse: 3D Tracking and Forecasting with Rich Maps.In
CVPR . 8748–8757.[29] Muhammad Aamir Cheema, Wenjie Zhang, Xuemin Lin, Ying Zhang, and Xuefei Li. 2012. Continuous reverse knearest neighbors queries in Euclidean space and in spatial networks.
VLDB Journal
21, 1 (2012), 69–95.[30] Chao Chen, Chengwu Liao, Xuefeng Xie, Yasha Wang, and Junfeng Zhao. 2019. Trip2Vec: a deep embedding approachfor clustering and profiling taxi trip purposes.
Personal and Ubiquitous Computing
23, 1 (2019), 53–66.[31] Chao Chen, Daqing Zhang, Nan Li, and Zhi Hua Zhou. 2014. B-planner: Planning bidirectional night bus routes usinglarge-scale taxi GPS traces.
IEEE Transactions on Intelligent Transportation Systems
15, 4 (2014), 1451–1465.[32] Chao Chen, Daqing Zhang, Pablo Samuel Castro, Nan Li, Lin Sun, and Shijian Li. 2011. Real-time detection ofanomalous taxi trajectories from GPS traces. In
MobiQuitous . 63–74.[33] Lei Chen and Raymond Ng. 2004. On the marriage of Lp-norms and edit distance. In
VLDB . 792–803.[34] Lei Chen, M. Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In
SIGMOD . 491–502.[35] Siming Chen, Xiaoru Yuan, Zhenhuang Wang, Cong Guo, Jie Liang, Zuchao Wang, and Jiawan Zhang. 2015. InteractiveVisual Discovering of Movement Patterns from Sparsely.
IEEE Transactions on Visualization and Computer Graphics
22, 1 (2015), 1–1.[36] Yile Chen, Cheng Long, Gao Cong, and Chengliang Li. 2020. Context-aware deep model for joint mobility and timeprediction. In
WSDM . 106–114.[37] Zaiben Chen, Heng Tao Shen, and Xiaofang Zhou. 2011. Discovering popular routes from trajectories. In
ICDE .900–911.[38] Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, and Xing Xie. 2010. Searching trajectories by locations-anefficiency study. In
SIGMOD . 255–266.[39] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. 2011. Friendship and mobility: User movement in location-basedsocial networks. In
KDD . 1082–1090.[40] Dong-wan Choi, Jian Pei, and Thomas Heinis. 2017. Efficient mining of regional movement patterns in semantictrajectories.
PVLDB
10, 13 (2017), 2073–2084.[41] Gao Cong, Christian S Jensen, and Dingming Wu. 2009. Efficient retrieval of the top-k most relevant spatial webobjects.
PVLDB
2, 1 (2009), 337–348.[42] Gao Cong, Hua Lu, Beng Chin Ooi, Dongxiang Zhang, and Meihui Zhang. 2012. Efficient spatial keyword search intrajectory databases. In
ArXiv:1205.2880 . 12.[43] Thomas H Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009.
Introduction to algorithms . MIT Press. [44] Philippe Cudre-Mauroux, Eugene Wu, and Samuel Madden. 2010. TrajStore: an adaptive storage system for very large trajectory data sets. In
ICDE . 109–120.[45] Sina Dabiri, Chang-Tien Lu, Kevin Heaslip, and Chandan K Reddy. 2020. Semi-supervised deep learning approach fortransportation mode identification using GPS trajectory data.
IEEE Transactions on Knowledge and Data Engineering
32, 5 (2020), 1010–1023.[46] Jian Dai, Bin Yang, Chenjuan Guo, and Zhiming Ding. 2015. Personalized route recommendation using big trajectorydata. In
ICDE . 543–554.[47] Jian Dai, Bin Yang, and Christian S Jensen. 2016. Path cost distribution estimation using trajectory data.
PVLDB
10, 3(2016), 85–96.[48] Eleonora D’Andrea and Francesco Marcelloni. 2017. Detection of traffic congestion and incidents from GPS traceanalysis.
Expert Systems with Applications
73 (2017), 43–56.[49] Victor Teixeira de Almeida and Ralf Hartmut Güting. 2005. Indexing the trajectories of moving objects in networks.
GeoInformatica
9, 1 (2005), 33–60.[50] Ian De Felipe, Vagelis Hristidis, and Naphtali Rishe. 2008. Keyword search on spatial databases. In
ICDE . 656–665.[51] Daniel Delling, Andrew V. Goldberg, Moises Goldszmidt, John Krumm, Kunal Talwar, and Renato F. Werneck. 2016.Navigation made personal: Inferring Driving Preferences from GPS Traces. In
SIGSPATIAL . 1–9.[52] Shuai Ding and Torsten Suel. 2011. Faster top-k document retrieval using block-max indexes. In
SIGIR . 993–1002.[53] Xin Ding, Lu Chen, Yunjun Gao, Christian S Jensen, and Hujun Bao. 2018. UlTraMan : A unified platform for bigtrajectory data management and analytics.
PVLDB
11, 7 (2018), 787–799.[54] Thomas Eiter and Heikki Mannila. 1994.
Computing discrete Fréchet distance . Technical Report. CD–TR 94/64 pages.[55] Yuki Endo, Hiroyuki Toda, Kyosuke Nishida, and Jotaro Ikedo. 2016. Classifying spatial trajectories using representation learning.
Int J Data Sci Anal
J. Comput.System Sci.
66, 4 (2003), 614–656.[57] Yixiang Fang, Reynold Cheng, Wenbin Tang, Silviu Maniu, and Xuan Yang. 2016. Scalable algorithms for nearest-neighbor joins on big trajectory data.
IEEE Transactions on Knowledge and Data Engineering
28, 3 (2016), 785–800.[58] Jie Feng, Yong Li, Chao Zhang, Funing Sun, Fanchao Meng, Ang Guo, and Depeng Jin. 2018. Deepmove: Predictinghuman mobility with attentional recurrent networks. In
WWW . 1459–1468.[59] Zhenni Feng and Yanmin Zhu. 2016. A survey on trajectory data mining: Techniques and applications.
IEEE Access
Computer Graphics Forum
32, 3 (2013), 201–210.[61] Dan Foster. 2004. GPX The GPS Exchange Format. (2004).[62] Elias Frentzos, Kostas Gratsias, and Yannis Theodoridis. 2007. Index-based most similar trajectory search. In
ICDE .816–825.[63] Tao Yang Fu and Wang Chien Lee. 2020. TremBR: Exploring road networks for trajectory representation learning.
ACM Transactions on Intelligent Systems and Technology
11, 1 (2020), 1–25.[64] Qiang Gao, Fan Zhou, Goce Trajcevski, Kunpeng Zhang, Ting Zhong, and Fengli Zhang. 2019. Predicting humanmobility via variational attention. In
WWW . 2750–2756.[65] Qiang Gao, Fan Zhou, Kunpeng Zhang, Goce Trajcevski, Xucheng Luo, and Fengli Zhang. 2017. Identifying humanmobility via trajectory embeddings. In
IJCAI . 1689–1695.[66] Fosca Giannotti, Mirco Nanni, Fabio Pinelli, and Dino Pedreschi. 2007. Trajectory pattern mining. In
KDD . 330–339.[67] Li Gong, Xi Liu, Lun Wu, and Yu Liu. 2016. Inferring trip purposes and uncovering travel patterns from taxi trajectorydata.
Cartography and Geographic Information Science
43, 2 (2016), 103–114.[68] Joachim Gudmundsson and Nacho Valladares. 2012. A GPU approach to subtrajectory clustering using the Fréchetdistance. In
SIGSPATIAL . 259–268.[69] Joachim Gudmundsson and Marc Van Kreveld. 2006. Computing longest duration flocks in trajectory data. In
GIS .35–42.[70] Chenjuan Guo, Bin Yang, Jilin Hu, and Christian Jensen. 2018. Learning to route with sparse trajectory sets. In
ICDE .IEEE, 1085–1096.[71] Long Guo, Dongxiang Zhang, Gao Cong, Wei Wu, and Kian Lee Tan. 2017. Influence maximization in trajectorydatabases.
IEEE Transactions on Knowledge and Data Engineering
29, 3 (2017), 627–641.[72] Mingming Guo, Xinyu Jin, Niki Pissinou, Sebastian Zanlongo, Bogdan Carbunar, and S S Iyengar. 2015. In-networktrajectory privacy preservation.
ACM Comput. Surv.
48, 23 (2015).[73] Ralf Hartmut Güting, Fabio Valdés, and Maria Luisa Damiani. 2015. Symbolic trajectories.
ACM Trans. Spatial Algorithms and Systems
1, 2 (2015), 7:1–7:51.
[74] Antonin Guttman. 1984. R-trees: a dynamic index structure for spatial searching.
ACM SIGMOD Record
14, 2 (1984),47–57.[75] Binh Han, Ling Liu, and Edward Omiecinski. 2015. Road-network aware trajectory clustering: Integrating locality,flow, and density.
IEEE Transactions on Mobile Computing
14, 2 (2015), 416–429.[76] Daehee Han, Yongjun Ahn, Sunkyu Park, and Hwasoo Yeo. 2016. Trajectory-interception based method for electricvehicle taxi charging station problem with real taxi data.
International Journal of Sustainable Transportation
10, 8(2016), 671–682.[77] Yuxing Han, Liping Wang, Ying Zhang, Wenjie Zhang, and Xuemin Lin. 2015. Spatial keyword range search ontrajectories. In
DASFAA . 223–240.[78] Ramaswamy Hariharan, Bijit Hore, Li Chen, and Sharad Mehrotra. 2007. Processing spatial-keyword (SK) queries inGeographic Information Retrieval (GIR) systems. In
SSDBM . 16–25.[79] Wen He, Kai Hwang, and Deyi Li. 2014. Intelligent carpool routing for urban ridesharing by mining GPS trajectories.
IEEE Transactions on Intelligent Transportation Systems
15, 5 (2014), 2286–2296.[80] Zihan Hong, Ying Chen, Hani S. Mahmassani, and Shuang Xu. 2017. Commuter ride-sharing using topology-basedvehicle trajectory clustering: Methodology, application and impact evaluation.
Transportation Research Part C:Emerging Technologies
85, October (2017), 573–590.[81] Fu Shiung Hsieh. 2017. Car pooling based on trajectories of drivers and requirements of passengers. In
AINA . IEEE,972–978.[82] Chih Chieh Hung, Wen Chih Peng, and Wang Chien Lee. 2015. Clustering and aggregating clues of trajectories formining trajectory patterns and routes.
VLDB Journal
24, 2 (2015), 169–192.[83] G R Jagadeesh, T Srikanthan, and X D Zhang. 2004. A map matching method for GPS based real-time vehicle location.
The Journal of Navigation
57, 3 (2004), 429–440.[84] Priit Jarv, Tanel Tammet, and Marten Tall. 2018. Hierarchical regions of interest. In
MDM . 86–95.[85] Hoyoung Jeung, Man Lung Yiu, Xiaofang Zhou, Christian S Jensen, and Heng Tao Shen. 2008. Discovery of convoysin trajectory databases.
PVLDB
1, 1 (2008), 1068–1080.[86] Antonios Karatzoglou, Nikolai Schnell, and Michael Beigl. 2018. A Convolutional Neural Network Approach for Modeling Semantic Trajectories and Predicting Future Locations. In
ICANN . 61–72.[87] Younghoon Kim, Jiawei Han, and Cangzhou Yuan. 2015. TOPTRAC: Topical trajectory pattern mining. In
KDD .587–596.[88] Satoshi Koide, Yukihiro Tadokoro, and Takayoshi Yoshimura. 2015. SNT-index: Spatio-temporal index for vehiculartrajectories on a road network based on substring matching. In
UrbanGIS@SIGSPATIAL . 1–8.[89] Satoshi Koide, Chuan Xiao, and Yoshiharu Ishikawa. 2020. Fast subtrajectory similarity search in road networksunder weighted edit distance constraints.
PVLDB
13, 11 (2020), 2188–2201.[90] Robert Krajewski, Julian Bock, Laurent Kloeker, and Lutz Eckstein. 2018. The highd dataset: A drone dataset ofnaturalistic vehicle trajectories on german highways for validation of highly automated driving systems. In
ITSC .[91] Benjamin Krogh and Christian S Jensen. 2016. Efficient in-memory indexing of network-constrained trajectories. In
SIGSPATIAL . 17:1–17:10.[92] Scott LaPoint, Paul Gallery, Martin Wikelski, and Roland Kays. 2013. Animal behavior, cost-based corridor models,and real corridors.
Landscape Ecology
28, 8 (2013), 1615–1630.[93] Jae Gil Lee, Jiawei Han, and Xiaolei Li. 2008. Trajectory outlier detection: A partition-and-detect framework. In
ICDE .140–149.[94] Jae-Gil Lee, Jiawei Han, Xiaolei Li, and Hector Gonzalez. 2008. TraClass: Trajectory classification using hierarchicalregion-based and trajectory-based clustering.
PVLDB
1, 1 (2008), 1081–1094.[95] Jae-gil Lee, Jiawei Han, and Kyu-Young Whang. 2007. Trajectory clustering: A partition-and-group framework. In
SIGMOD . 593–604.[96] Lei Li, Kai Zheng, Sibo Wang, Wen Hua, and Xiaofang Zhou. 2018. Go slow to go fast: minimal on-road time routescheduling with parking facilities using historical trajectory.
VLDB Journal
27, 3 (2018), 321–345.[97] Ruiyuan Li, Huajun He, Rubin Wang, Sijie Ruan, Yuan Sui, Jie Bao, and Yu Zheng. 2020. TrajMesa: A distributednosql storage engine for big trajectory data. In
ICDE . 2002–2005.[98] Xiucheng Li, Gao Cong, and Yun Cheng. 2020. Spatial transition learning on road networks with deep probabilisticmodels. In
ICDE . 349–360.[99] Xiucheng Li, Aixin Sun, Gao Cong, and Yun Cheng. 2019. Learning travel time distributions with deep generativemodel. In
WWW . 1017–1027.[100] Xiucheng Li, Kaiqi Zhao, Gao Cong, Christian S Jensen, and Wei Wei. 2018. Deep representation learning for trajectorysimilarity computation. In
ICDE . 617–628.[101] Yuhong Li, Jie Bao, Yanhua Li, Yingcai Wu, Zhiguo Gong, and Yu Zheng. 2016. Mining the most influential k -locationset from massive trajectories.
IEEE TRANSACTIONS ON BIG DATA
4, 4 (2016), 556–570. [102] Yanhua Li, Jun Luo, Chi Yin Chow, Kam Lam Chan, Ye Ding, and Fan Zhang. 2015. Growing the charging station network for electric vehicles with trajectory data analytics. In
ICDE . 1376–1387.[103] Zhenhui Li, Bolin Ding, Jiawei Han, and Roland Kays. 2010. Swarm: Mining relaxed temporal moving object clusters.
PVLDB
3, 1 (2010), 723–734.[104] Zhenhui Li, Jae-Gil Lee, Xiaolei Li, and Jiawei Han. 2010. Incremental clustering for trajectories. In
DASFAA . 32–46.[105] Zhisheng Li, Ken C K Lee, Baihua Zheng, Wang Chien Lee, Dik Lee, and Xufa Wang. 2011. IR-tree: An efficient indexfor geographic document search.
IEEE Transactions on Knowledge and Data Engineering
23, 4 (2011), 585–599.[106] Chen Liu, Ke Deng, Chaojie Li, Jianxin Li, Yanhua Li, and Jun Luo. 2016. The optimal distribution of electric-vehiclechargers across a city. In
ICDM . 261–270.[107] Dongyu Liu, Di Weng, Yuhong Li, Jie Bao, Yu Zheng, Huamin Qu, and Yingcai Wu. 2017. SmartAdP: Visual analyticsof large-scale taxi trajectories for selecting billboard locations.
IEEE Transactions on Visualization and ComputerGraphics
23, 1 (2017), 1–10.[108] Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. Predicting the next location: A recurrent model with spatialand temporal contexts. In
AAAI . 194–200.[109] Siyuan Liu, Ce Liu, Qiong Luo, Lionel M. Ni, and Ramayya Krishnan. 2012. Calibrating large scale vehicle trajectorydata. In
MDM . 222–231.[110] Xi Liu, Hanzhou Chen, and Clio Andris. 2018. trajGANs: Using generative adversarial networks for geo-privacyprotection of trajectory data (Vision paper). In
Location Privacy and Security Workshop . 1–7.[111] Yanchi Liu, Chuanren Liu, Nicholas Jing Yuan, Lian Duan, Yanjie Fu, Hui Xiong, Songhua Xu, and Junjie Wu. 2016.Intelligent bus routing with heterogeneous human mobility patterns.
Knowledge and Information Systems
50, 2 (2016),383–415.[112] Yiding Liu, Kaiqi Zhao, Gao Cong, and Zhifeng Bao. 2020. Online anomalous trajectory detection with deep generativesequence modeling. In
ICDE . 949–960.[113] Cheng Long, Raymond Chi-Wing Wong, and H. V. Jagadish. 2013. Direction-preserving trajectory simplification.
PVLDB
6, 10 (2013), 949–960.[114] Yin Lou, Chengyang Zhang, Yu Zheng, Xing Xie, Wei Wang, and Yan Huang. 2009. Map-matching for low-sampling-rate GPS trajectories. In
GIS . 352–361.[115] Xin Lu, Changhu Wang, Jiang-Ming Yang, Yanwei Pang, and Lei Zhang. 2010. Photo2Trip: generating travel routesfrom geo-tagged photos for trip planning. In MM . 143–152.[116] Jianming Lv, Qinghui Sun, Qing Li, and Luis Moreira-Matias. 2020. Multi-scale and multi-scope convolutional neuralnetworks for destination prediction of trajectories. IEEE Transactions on Intelligent Transportation Systems
21, 8 (2020),3184–3195.[117] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei Yue Wang. 2015. Traffic flow prediction with big data: adeep learning approach.
IEEE Transactions on Intelligent Transportation Systems
16, 2 (2015), 865–873.[118] Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. 2019. TrafficPredict:Trajectory Prediction for Heterogeneous Traffic-Agents. In
AAAI . 6120–6127. arXiv:1811.02146[119] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008.
An Introduction to Information Retrieval .Cambridge University Press.[120] Yingchi Mao, Haishi Zhong, Xianjian Xiao, and Xiaofang Li. 2017. A segment-based trajectory similarity measure inthe urban transportation systems.
Sensors
17, 3 (2017).[121] Nikola Markovic, Przemylaw Sekula, Zachary Vander Laan, Gennady Andrienko, and Natalia Andrienko. 2019.Applications of trajectory data from the perspective of a road transportation agency: Literature review and marylandcase study.
IEEE Transactions on Intelligent Transportation Systems
20, 5 (2019), 1858–1869.[122] Shubhadip Mitra, Priya Saraf, and Arnab Bhattacharya. 2019. TIPS: Mining Top-K Locations to Minimize User-Inconvenience for Trajectory-Aware Services.
IEEE Transactions on Knowledge and Data Engineering (2019).[123] Suraj Nair, Kiran Javkar, Jiahui Wu, and Vanessa Frias-Martinez. 2019. Understanding cycling trip purpose androute choice using GPS traces and open data.
Proceedings of the ACM on Interactive, Mobile, Wearable and UbiquitousTechnologies
3, 1 (2019), 1–26.[124] Paul Newson and John Krumm. 2009. Hidden Markov map matching through noise and sparseness. In
SIGSPATIAL .336–343.[125] Kun Ouyang, Reza Shokri, David S. Rosenblum, and Wenzhuo Yang. 2018. A non-parametric generative model forhuman trajectories. In
IJCAI . 3812–3817.[126] Andrey Tietbohl Palma, Vania Bogorny, Bart Kuijpers, and Luis Otavio Alvares. 2008. A Clustering-based Approachfor Discovering Interesting Places in Trajectories. In
SAC . 863–868.[127] Christine Parent, Nikos Pelekis, Yannis Theodoridis, Zhixian Yan, Stefano Spaccapietra, Chiara Renso, GennadyAndrienko, Natalia Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, and Jose Macedo. 2013.Semantic trajectories modeling and analysis.
Comput. Surveys
45, 4 (2013), 1–32.
[128] Dhaval Patel, Chang Sheng, Wynne Hsu, and Mong Li Lee. 2012. Incorporating duration information for trajectory classification. In
ICDE . 1132–1143.[129] Nikos Pelekis, Ioannis Kopanakis, Evangelos E. Kotsifakos, Elias Frentzos, and Yannis Theodoridis. 2009. Clusteringtrajectories of moving objects in an uncertain world. In
ICDM . 417–427.[130] Fabio Pinelli, Rahul Nair, Francesco Calabrese, Michele Berlingerio, Giusy Di Lorenzo, and Marco Luca Sbodio. 2016.Data-driven transit network design from mobile phone trajectories.
IEEE Transactions on Intelligent TransportationSystems
17, 6 (2016), 1724–1733.[131] Shuyao Qi, Panagiotis Bouros, Dimitris Sacharidis, and Nikos Mamoulis. 2015. Efficient point-based trajectory search.In
SSTD . 179–196.[132] Sayan Ranu, P Deepak, Aditya D Telang, Prasad Deshpande, and Sriram Raghavan. 2015. Indexing and matchingtrajectories under inconsistent sampling rates. In
ICDE . 999–1010.[133] Keven Richly. 2018. A survey on trajectory data management for hybrid transactional and analytical workloads. In
BigData . 562–569.[134] Gook-pil Roh, Jong-won Roh, Seung-won Hwang, and Byoung-kee Yi. 2011. Supporting pattern-matching queriesover trajectories on road networks.
IEEE Transactions on Knowledge and Data Engineering
23, 11 (2011), 1753–1758.[135] Simonas Šaltenis, Christian S. Jensen, Scott T. Leutenegger, and Mario A. Lopez. 2000. Indexing the positions ofcontinuously moving objects. In
SIGMOD . 331–342.[136] Iulian Sandu Popa, Karine Zeitouni, Vincent Oria, Dominique Barth, and Sandrine Vial. 2011. Indexing in-networktrajectory flows.
VLDB Journal
20, 5 (2011), 643–669.[137] Preston L Schiller and Jeffrey R Kenworthy. 2017.
An introduction to sustainable transportation: Policy, planning and implementation. Routledge.
[138] Alexander Schrijver. 1998. Theory of linear and integer programming. John Wiley & Sons.
[139] Shuo Shang, Lisi Chen, Zhewei Wei, Christian S Jensen, Kai Zheng, and Panos Kalnis. 2017. Trajectory similarity join in spatial networks. PVLDB 10, 11 (2017), 1178–1189.
[140] Shuo Shang, Ruogu Ding, Bo Yuan, Kexin Xie, Kai Zheng, and Panos Kalnis. 2012. User oriented trajectory search for trip recommendation. In EDBT. 156–167.
[141] Zeyuan Shang, Guoliang Li, and Zhifeng Bao. 2018. DITA: Distributed In-Memory Trajectory Analytics. In SIGMOD. 725–740.
[142] Lokesh K. Sharma, Om Prakash Vyas, Simon Schieder, and Ajaya K. Akasapu. 2010. Nearest neighbour classification for trajectory data. In ICT. 180–185.
[143] Minxin Shen, Duen Ren Liu, and Shi Han Shann. 2015. Outlier detection from vehicle trajectories to discover roaming events. Information Sciences 294 (2015), 242–254.
[144] Renchu Song, Weiwei Sun, Baihua Zheng, and Yu Zheng. 2014. PRESS: A novel framework of trajectory compression in road networks. PVLDB 7, 9 (2014), 661–672.
[145] Xuan Song, Hiroshi Kanasugi, and Ryosuke Shibasaki. 2016. Deeptransport: Prediction and simulation of human mobility and transportation mode at a citywide level. In IJCAI. 2618–2624.
[146] Han Su, Shuncheng Liu, Bolong Zheng, Xiaofang Zhou, and Kai Zheng. 2020. A survey of trajectory distance measures and performance evaluation. VLDB Journal 29 (2020), 3–32.
[147] Han Su, Kai Zheng, Haozhou Wang, Jiamin Huang, and Xiaofang Zhou. 2013. Calibrating trajectory data for similarity-based analysis. In SIGMOD. 833–844.
[148] Na Ta, Guoliang Li, Yongqing Xie, Changqi Li, Shuang Hao, and Jianhua Feng. 2017. Signature-based trajectory similarity join. IEEE Transactions on Knowledge and Data Engineering 29, 4 (2017), 870–883.
[149] Lu An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, and Jiawei Han. 2011. Retrieving k-nearest neighboring trajectories by a set of point locations. In SSTD. 223–241.
[150] Yufei Tao, D Papadias, and Xiang Lian. 2004. Reverse kNN search in arbitrary dimensionality. In VLDB. 744–755.
[151] Eleftherios Tiakas, Apostolos Papadopoulos, Alexandros Nanopoulos, Yannis Manolopoulos, Dragan Stojanovic, and Slobodanka Djordjevic-Kajan. 2009. Searching for similar trajectories in spatial networks. The Journal of Systems & Software 82, 5 (2009), 772–788.
[152] Jameson L Toole, Serdar Colak, Bradley Sturt, Lauren P Alexander, Alexandre Evsukoff, and Marta C González. 2015. The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies 58 (2015), 162–177.
[153] Md Reaz Uddin, Chinya Ravishankar, and Vassilis J. Tsotras. 2011. Finding regions of interest from trajectory data. In MDM. 39–48.
[154] Michail Vlachos, George Kollios, and Dimitrios Gunopulos. 2002. Discovering similar multidimensional trajectories. In ICDE. 673–684.
[155] Dong Wang, Junbo Zhang, Wei Cao, Jian Li, and Yu Zheng. 2018. When will you arrive? Estimating travel time based on deep neural networks. In AAAI. 2500–2507.
[156] Haozhou Wang, Han Su, Kai Zheng, Shazia Sadiq, and Xiaofang Zhou. 2013. An effectiveness study on trajectory similarity measures. In ADC. 13–22.
[157] Haozhou Wang, Kai Zheng, Jiajie Xu, Bolong Zheng, and Xiaofang Zhou. 2014. SharkDB: an in-memory column-oriented trajectory storage. In CIKM. 1409–1418.
[158] Jingyuan Wang, Ning Wu, Xinxi Lu, Xin Zhao, and Kai Feng. 2019. Deep trajectory recovery with fine-grained calibration using kalman filter. IEEE Transactions on Knowledge and Data Engineering (2019).
[159] Liang Wang, Zhiwen Yu, Dingqi Yang, Huadong Ma, and Hao Sheng. 2019. Efficiently targeted billboard advertising using crowdsensing vehicle trajectory data. IEEE Transactions on Industrial Informatics 16, 2 (2019), 1058–1066.
[160] Meng Wang, Hui Li, Jiangtao Cui, Ke Deng, Sourav S Bhowmick, and Zhenhua Dong. 2016. Pinocchio: Probabilistic influence-based location selection over moving objects. IEEE Transactions on Knowledge and Data Engineering 28, 11 (2016), 3068–3082.
[161] Pengfei Wang, Guannan Liu, Yanjie Fu, Yuanchun Zhou, and Jianhui Li. 2017. Spotting trip purposes from taxi trajectories: A general probabilistic model. ACM Transactions on Intelligent Systems and Technology 9, 3 (2017).
[162] Sheng Wang, Zhifeng Bao, J. Shane Culpepper, and Gao Cong. 2020. A Survey on Trajectory Data Management, Analytics, and Learning. (mar 2020). arXiv:2003.11547 http://arxiv.org/abs/2003.11547
[163] Sheng Wang, Zhifeng Bao, J Shane Culpepper, Timos Sellis, and Gao Cong. 2018. Reverse k nearest neighbor search over trajectories. IEEE Transactions on Knowledge and Data Engineering 30, 4 (2018), 757–771.
[164] Sheng Wang, Zhifeng Bao, J. Shane Culpepper, Timos Sellis, and Xiaolin Qin. 2019. Fast large-scale trajectory clustering. PVLDB 13, 1 (2019), 29–42.
[165] Sheng Wang, Zhifeng Bao, J Shane Culpepper, Timos Sellis, Mark Sanderson, and Xiaolin Qin. 2017. Answering top-k exemplar trajectory queries. In ICDE. 597–608.
[166] Sheng Wang, Zhifeng Bao, J. Shane Culpepper, Zizhe Xie, Qizhi Liu, and Xiaolin Qin. 2018. Torch: A search engine for trajectory data. In SIGIR. 535–544.
[167] Sheng Wang, Zhifeng Bao, Shixun Huang, and Rui Zhang. 2018. A unified processing paradigm for interactive location-based web search. In WSDM. 601–609.
[168] Shuang Wang and Hakan Ferhatosmanoglu. 2021. PPQ-trajectory: spatio-temporal quantization for querying in large trajectory repositories. PVLDB 14, 2 (2021), 215–227.
[169] Sheng Wang, Mingzhao Li, Yipeng Zhang, Zhifeng Bao, David Alexander Tedjopurnomo, and Xiaolin Qin. 2018. Trip planning by an integrated search paradigm. In SIGMOD. 1673–1676.
[170] Sheng Wang, Yunzhuang Shen, Zhifeng Bao, and Xiaolin Qin. 2019. Intelligent traffic analytics: from monitoring to controlling. In WSDM. 778–781.
[171] Yong Wang, Guoliang Li, and Nan Tang. 2019. Querying shortest paths on time dependent road networks. PVLDB
[172] ISPRS Int. J. Geo-Inf 7, 1 (2018), 25.
[173] Zheng Wang, Cheng Long, Gao Cong, and Ce Ju. 2019. Effective and efficient sports play retrieval with deep representation learning. In KDD. 499–509.
[174] Zheng Wang, Cheng Long, Gao Cong, and Yiding Liu. 2020. Efficient and effective similar subtrajectory search with deep reinforcement learning. PVLDB 13, 11 (2020), 2312–2325.
[175] Zuchao Wang, Min Lu, Xiaoru Yuan, Junping Zhang, and Huub Van De Wetering. 2013. Visual traffic jam analysis based on trajectory data. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2159–2168.
[176] Ling-Yin Wei, Wen-Chih Peng, and Wang-Chien Lee. 2013. Exploring pattern-aware travel routes for trajectory search. ACM Transactions on Intelligent Systems and Technology 4, 3 (2013), 1.
[177] Amy Wesolowski, Nathan Eagle, Andrew J. Tatem, David L. Smith, Abdisalan M. Noor, Robert W. Snow, and Caroline O. Buckee. 2012. Quantifying the impact of human mobility on Malaria. Science
[178] Proceedings of the National Academy of Sciences of the United States of America
[179] ICDM. 547–556.
[180] Hao Wu, Ziyang Chen, Weiwei Sun, Baihua Zheng, and Wei Wang. 2017. Modeling trajectories with recurrent neural networks. In IJCAI. 3083–3090.
[181] Yanbo Wu, Hong Shen, and Quan Z. Sheng. 2015. A cloud-friendly RFID trajectory clustering algorithm in uncertain environments. IEEE Transactions on Parallel and Distributed Systems 26, 8 (2015), 2075–2088.
[182] Dong Xie, Feifei Li, and Jeff M Phillips. 2017. Distributed trajectory similarity search. PVLDB 10, 11 (2017), 1478–1489.
[183] Xike Xie, Benjin Mei, Jinchuan Chen, Xiaoyong Du, and Christian S. Jensen. 2016. Elite: an elastic infrastructure for big spatiotemporal trajectories. VLDB Journal 25, 4 (2016), 1–21.
[184] Xiaochun Yang, Bin Wang, Kai Yang, Chengfei Liu, and Baihua Zheng. 2018. A novel representation and compression for queries on trajectories in road networks. IEEE Transactions on Knowledge and Data Engineering 30, 4 (2018), 613–629.
[185] Di Yao, Gao Cong, Chao Zhang, and Jingping Bi. 2019. Computing trajectory similarity in linear time: A generic seed-guided neural metric learning approach. In ICDE. 1358–1369.
[186] Di Yao, Chao Zhang, Zhihua Zhu, Jianhui Huang, and Jingping Bi. 2017. Trajectory clustering via deep representation learning. In IJCNN. 3880–3887.
[187] Byoung-Kee Yi, H V Jagadish, and Christos Faloutsos. 1998. Efficient retrieval of similar time sequences under time warping. In ICDE. 201–208.
[188] Simin You, Jianting Zhang, and Le Gruenwald. 2015. Large-scale spatial join query processing in Cloud. In ICDE. 34–41.
[189] Jia Yu, Jinxuan Wu, and Mohamed Sarwat. 2015. Geospark: A cluster computing framework for processing large-scale spatial data. In SIGSPATIAL. 4–7.
[190] Haitao Yuan and Guoliang Li. 2019. Distributed in-memory trajectory similarity search and join on road network. In ICDE. 1262–1273.
[191] Haitao Yuan, Guoliang Li, Zhifeng Bao, and Ling Feng. 2020. Effective travel time estimation: When historical trajectories over road networks matter. In SIGMOD. 2135–2149.
[192] Jing Yuan, Yu Zheng, Chengyang Zhang, Wenlei Xie, Xing Xie, Guangzhong Sun, and Yan Huang. 2010. T-drive: Driving directions based on taxi trajectories. In GIS. 99–108.
[193] Chao Zhang, Jiawei Han, Lidan Shou, Jiajun Lu, and Thomas La Porta. 2014. Splitter: Mining fine-grained sequential patterns in semantic trajectories. PVLDB 7, 3 (2014), 769–780.
[194] Chao Zhang, Keyang Zhang, Quan Yuan, Luming Zhang, Tim Hanratty, and Jiawei Han. 2016. Gmove: Group-level mobility modeling using geo-tagged social media. In KDD. 1305–1314.
[195] Dongxiang Zhang, Chee-Yong Chan, and Kian-Lee Tan. 2014. Processing spatial keyword query as a top-k aggregation query. In SIGIR. 355–364.
[196] Dongxiang Zhang, Mengting Ding, Dingyu Yang, Yi Liu, Ju Fan, and Heng Tao Shen. 2018. Trajectory simplification: an experimental study and quality analysis. PVLDB 11, 9 (2018), 934–946.
[197] Hanyuan Zhang, Hao Wu, Weiwei Sun, and Baihua Zheng. 2018. DEEPTRAVEL: A neural network based travel time estimation model with auxiliary supervision. In IJCAI. 3655–3661.
[198] Jun Zhang, Dayong Shen, Lai Tu, Fan Zhang, Chengzhong Xu, Yi Wang, Chen Tian, Xiangyang Li, Benxiong Huang, and Zhengxi Li. 2017. A real-time passenger flow estimation and prediction method for urban bus transit systems. IEEE Transactions on Intelligent Transportation Systems 18, 11 (2017), 3168–3178.
[199] Ping Zhang, Zhifeng Bao, Yuchen Li, Guoliang Li, Yipeng Zhang, and Zhiyong Peng. 2018. Trajectory-driven influential billboard placement. In KDD. 2748–2757.
[200] Yipeng Zhang, Yuchen Li, Zhifeng Bao, Songsong Mo, and Ping Zhang. 2019. Optimizing impression counts for outdoor advertising. In KDD. 1205–1215.
[201] Bolong Zheng, Nicholas Jing Yuan, Kai Zheng, Xing Xie, Shazia Sadiq, and Xiaofang Zhou. 2015. Approximate keyword search in semantic trajectory database. In ICDE. 975–986.
[202] Kai Zheng, Shuo Shang, Nicholas Jing Yuan, and Yi Yang. 2013. Towards efficient search for activity trajectories. In ICDE. 230–241.
[203] Kai Zheng, Yu Zheng, Nicholas J. Yuan, S. Shang, and Xiaofang Zhou. 2014. Online discovery of gathering patterns over trajectories. IEEE Transactions on Knowledge and Data Engineering 26, 8 (2014), 1974–1988.
[204] Yu Zheng. 2015. Trajectory data mining: An overview. ACM Trans. Intell. Syst. Technol. 6, 3 (2015), 1–41.
[205] Yu Zheng, Like Liu, Longhao Wang, and Xing Xie. 2008. Learning transportation modes from raw GPS data. In WWW. 247–256.
[206] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. Mining interesting locations and travel sequences from GPS trajectories. In WWW. 791–800.
[207] Yu Zheng and Xiaofang Zhou. 2011. Computing with Spatial Trajectories. Springer.
[208] Bolei Zhou, Xiaogang Wang, and Xiaoou Tang. 2012. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In CVPR. 2871–2878.
[209] Fan Zhou, Xiaoli Yue, Goce Trajcevski, Ting Zhong, and Kunpeng Zhang. 2019. Context-aware trajectory embedding and human mobility inference. In WWW. 3469–3475.
[210] Justin Zobel and Alistair Moffat. 2006. Inverted files for text search engines. Comput. Surveys 38, 2 (2006), 1–56.
A SUPPLEMENTAL FIGURES
RFC 7946 GeoJSON August 2016

1.4. Definitions

o JavaScript Object Notation (JSON), and the terms object, member, name, value, array, number, true, false, and null, are to be interpreted as defined in [RFC7159].
o Inside this document, the term "geometry type" refers to seven case-sensitive strings: "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", and "GeometryCollection".
o As another shorthand notation, the term "GeoJSON types" refers to nine case-sensitive strings: "Feature", "FeatureCollection", and the geometry types listed above.
o The word "Collection" in "FeatureCollection" and "GeometryCollection" does not have any significance for the semantics of array members. The "features" and "geometries" members, respectively, of these objects are standard ordered JSON arrays, not unordered sets.

1.5. Example

A GeoJSON FeatureCollection:

{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "coordinates": [102.0, 0.5]
    },
    "properties": {
      "prop0": "value0"
    }
  }, {
    "type": "Feature",
    "geometry": {
      "type": "LineString",
      "coordinates": [
        [102.0, 0.0],
        [103.0, 1.0],
        [104.0, 0.0],
        [105.0, 1.0]
      ]
    },
    "properties": {

Butler, et al. Standards Track [Page 5]

Fig. 10. An example of LineString with four points in the GeoJSON format.
Fig. 11. An example of GPX format [61] widely used in OpenStreetMap Public GPS trace [9].
(a) Road traffic monitoring with range queries on the Porto collection [170]
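As a small illustration of the GeoJSON representation in Fig. 10, the following Python sketch builds the four-point LineString as a Feature, wraps it in a FeatureCollection, and round-trips it through the standard json module. The helper name linestring_feature is hypothetical and not part of the survey or of any GeoJSON library, and the properties member is left empty here because its content is cut off in the figure.

```python
import json


def linestring_feature(coords, properties=None):
    """Build a GeoJSON Feature whose geometry is a LineString.

    Hypothetical helper for illustration. `coords` is a sequence of
    (longitude, latitude) pairs, the position order used by RFC 7946.
    """
    if len(coords) < 2:
        # RFC 7946 requires a LineString to have two or more positions.
        raise ValueError("a LineString needs at least two positions")
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[float(x), float(y)] for x, y in coords],
        },
        # Assumption: an empty properties object, since the figure is truncated here.
        "properties": properties or {},
    }


# The four positions of the LineString shown in the RFC 7946 example.
points = [(102.0, 0.0), (103.0, 1.0), (104.0, 0.0), (105.0, 1.0)]
collection = {
    "type": "FeatureCollection",
    "features": [linestring_feature(points)],
}

# Serialize to text and parse it back to confirm the structure survives.
text = json.dumps(collection, indent=2)
parsed = json.loads(text)
geometry = parsed["features"][0]["geometry"]
print(geometry["type"], len(geometry["coordinates"]))
```

A trajectory stored this way keeps only the ordered positions; timestamps or other per-point attributes would have to be carried separately, for example in the properties member.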