Intra-City Urban Network and Traffic Flow Analysis from GPS Mobility Trace

Abstract

We analyse two large-scale intra-city urban networks and the traffic flows therein, measured by GPS traces of taxis in San Francisco and Shanghai. Our results agree with previous findings that purely topological measures are often insufficient to characterise traffic flow. Traditional shortest-path betweenness analysis, where shortest paths are calculated between each pair of nodes, carries an unrealistic implicit assumption that each node or junction in the urban network generates and attracts an equal amount of traffic. We also argue that weighting edges based only on Euclidean distance is inadequate, as primary roads are commonly favoured over secondary roads due to the perceived and actual travel time required. We show that betweenness traffic analysis can be improved by a simple extended framework which incorporates both node weights and fastest-path betweenness. We demonstrate that the framework is superior to traditional methods based solely on simple topological perspectives.


arXiv [physics.soc-ph], May

Ian X. Y. Leung, Shu-Yan Chan, Pan Hui and Pietro Liò
Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, U.K.
Deutsche Telekom Laboratories, Ernst-Reuter-Platz 7, 10709 Berlin, Germany
E-mail: [email protected]


1. Introduction

Complex networks provide a natural abstraction of the structure of, and relationships between, entities in real-world systems. From social relationships to biological systems, Network Science has provided abundant tools and research opportunities for understanding and modelling such complex systems. Urban networks, those pertaining to city infrastructures, have traditionally been investigated in the fields of Urban Planning, Economic Geography, Economics and Engineering. In the common node-link nomenclature, road junctions are considered as nodes, while roads are treated as edges connecting the nodes in the network. Typical real-world networks from the realms of social and biological systems are famously known to exhibit properties such as scale-free degree distributions and the small-world phenomenon, where, despite their large size, the average distance between any pair of nodes remains relatively small. Urban networks, on the other hand, differ from the aforementioned networks because of spatial and geographical constraints: they are known to exhibit a smaller average degree and a longer diameter, because they are almost planar [15, 7].

Studies from both Urban Planning and Network Science have revealed interesting correlations between the topological properties of urban networks and human-related activities in a wide variety of city topologies [6, 30]. We are interested in further understanding the relationship between topological properties and traffic flow with the support of large-scale mobility traces. Previous works have shed light on topology-based traffic prediction, but they either relied on unrealistic or incomplete traffic estimates or were carried out on a small, coarse scale. The development of cheap and portable Global Positioning System (GPS) devices has greatly advanced the state of the art in understanding human mobility. By mining large-scale, real-time tracking datasets, one is able to empirically evaluate important hypotheses.
Such tools have, for instance, eased communication network deployment: network operators can choose to deploy more network facilities, such as Wi-Fi access points or relay nodes for mobile communication, in areas with high traffic flow [18].

The purpose of this paper is to analyse various network-based metrics and their ability to predict traffic flow based on GPS mobility trace data in two major cities. We make use of two sets of GPS mobility trace data, gathered from more than 500 taxis in the San Francisco Bay area over 24 days and from more than 4,000 taxis in Shanghai over 28 days. We propose an effective centrality-based methodology, combined with minimal locale information, in an attempt to predict the traffic flow in the road networks. We discuss four centrality metrics, namely geodesic shortest-path betweenness, Euclidean shortest-path betweenness, fastest-path betweenness and node-weighted fastest-path betweenness, and their relationships to traffic flow. We argue, in line with previous work, that relying solely on the topological properties of urban networks provides limited power to predict traffic flow in a city. The accuracy can be enhanced given minimal information on location liveliness, for example by incorporating …

2. Related Work

Typical topologies and patterns in spatial networks [15], such as road, Internet and flight networks, are unlike those of their well-studied non-spatial counterparts (e.g. social, biological and technological networks), which disregard the physical geography of the nodes and exhibit various important properties [1]. Urban spatial networks are typically planar, i.e., they can be drawn on a 2D plane without any edges crossing, owing to geographical and, presumably, construction constraints. Planar graphs can be shown to have an average degree strictly less than 6 and a diameter that grows faster than that of non-spatial real-world networks, which lack a well-defined low dimension ($\sqrt{N}$ as opposed to $\log N$ for a network of size $N$; see [15]).

Lämmer et al. [27] reported effective dimensions ranging from 2 to 2.5 in 20 German city road networks, as well as a power-law distribution of betweenness centrality over the nodes. They then inferred that the majority of the traffic volume is concentrated on a minority of roads, but this inference was not supported by empirical evidence. Cardillo et al. [7] analysed one-square-mile samples of 20 different cities and measured their meshedness, an alternative to the clustering coefficient used for non-planar graphs, defined as the number of faces of a planar graph with N nodes and M edges divided by the maximum possible number of faces. Road networks were also reported to be globally efficient [7], meaning that the actual distance required to travel between any two points in the city does not deviate much from their straight-line Euclidean distance. These local and global properties, specifically designed for spatial graphs, were used to characterise and compare different cities.

Researchers have studied human mobility to understand the nature of human movement, which is vital to aspects such as epidemic spreading and communication network design [10, 16, 19].
For instance, analysis of human mobility traces has demonstrated power-law inter-contact time distributions with cut-offs [8, 23], Lévy-flight patterns consisting of many short moves punctuated by long jumps [32], etc. Vehicular mobility is also known to be important for resource allocation and communication network optimisation in a city [3]. Krings et al. [25] used the … integration (equivalent to closeness with a restriction on the number of hops from the node). Traffic data were collected from an official traffic census using inductive-loop and pneumatic-tube counters, but were restricted to only hundreds of roads. Correlations as high as 0.8 were achieved, but only when small sample areas of the entire map were selected. Attempts were also made in [22, 20] to use Google's PageRank [28] on junctions to predict the respective traffic flow. Their results indicated that PageRank offered minimal advantage over simple measures based on degree or closeness. PageRank assumes a random-walk model over a network, which is intended to model human web-browsing behaviour. However, we believe that, despite incomplete information on the road network structure, humans tend to know their destination before setting off, and hence the two processes are fundamentally different.

De Montis et al. [12] studied a weighted inter-municipal commuting network in the Sardinia region, Italy. Official statistics of traffic between major cities in the region were incorporated into the network. A high correlation between the connectivity of the municipalities and commuter traffic was found.

Kazerani and Winter [24] argued that shortest-path betweenness centrality contradicts human way-finding behaviour, as most people find their way based on incomplete and inaccurate network knowledge.
It was also argued that sources and sinks of traffic are irregularly distributed in the networks, a fact that is not captured by betweenness centrality, which measures shortest paths between all pairs of nodes in the network. A …
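For a connected planar graph, the meshedness used by Cardillo et al. [7] has a closed form that follows from Euler's formula: a graph with N nodes and M edges has M - N + 1 bounded faces, and a maximal planar graph (M = 3N - 6) has 2N - 5 of them. The minimal Python sketch below computes the ratio from node and edge counts alone; the function name is our own, and the code assumes (without checking) that the input graph is connected and planar.

```python
def meshedness(n_nodes: int, n_edges: int) -> float:
    """Meshedness coefficient of a connected planar graph with N nodes and M edges.

    Euler's formula gives F = M - N + 1 bounded faces; a maximal planar graph
    (M = 3N - 6) has 2N - 5 bounded faces, which serves as the maximum.
    """
    if n_nodes < 3:
        raise ValueError("meshedness is defined here for N >= 3")
    faces = n_edges - n_nodes + 1
    max_faces = 2 * n_nodes - 5
    return faces / max_faces


# A tree has no faces; a triangle is as meshed as a planar graph can be.
print(meshedness(5, 4))   # 0.0
print(meshedness(3, 3))   # 1.0
print(meshedness(9, 12))  # 3x3 square grid: 4 faces out of 13 possible
```

Values near 0 indicate tree-like layouts, while dense street grids score higher.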

3. Methodology

Since the San Francisco taxi traces were collected in 2008 [29], we obtained the 2008 US Government Census road shapefiles of the San Francisco Bay area [5]. The shapefile contains information on every road in the city as polylines defined by their corresponding GPS coordinates. Where the GPS coordinates of any segment of two different polylines match, the coordinate is interpreted as an intersection between the two roads. Each road is also conveniently classified into one of ten possible road feature classes according to the US Bureau of the Census MAF/TIGER Feature Classification Code (MTFCC), of which we keep only three: S1100 (Interstate), S1200 (Major Highways) and S1400 (Local Neighbourhood Road, Rural Road, City Street). The road network contains 26,049 nodes and 33,079 edges, spanning a total estimated distance of 1,980 km.

For the analysis of Shanghai, we obtained the network shapefiles from OpenStreetMap, a public mapping service that allows users to update and edit the map freely. We also note with caution that discrepancies between the map snapshot of Shanghai taken at the time of writing and that of 2007, when the traces were collected, are likely to introduce errors in the shortest-path calculations, as well as potentially invalid snapping of GPS traces to roads. The road network of Shanghai contains 54,151 nodes and 61,834 edges, spanning a total estimated distance of 7,402 km.

We only take the largest connected component of each resulting network. The networks are undirected owing to data constraints. While knowing the exact flow restrictions on or between individual roads would benefit our analysis, we leave this as important future work. For our analysis, we reduce the complexity and size of the networks while retaining as much information as possible, using the following scheme.
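The degree-2 contraction scheme detailed in the next paragraph can be sketched in pure Python as follows. This is an illustration, not the authors' implementation: the symmetric adjacency-dict representation and the function name `contract_degree_two` are our assumptions, and edge weights stand for whatever length or travel-time measure is in use.

```python
from typing import Dict, Hashable

# Undirected weighted graph: node -> {neighbour: edge weight}, stored symmetrically.
Graph = Dict[Hashable, Dict[Hashable, float]]


def contract_degree_two(graph: Graph) -> Graph:
    """Repeatedly replace every degree-2 junction B (neighbours A, C) by one edge (A, C).

    The new edge weighs w(A,B) + w(B,C); if (A, C) already exists, the smaller
    of the old and new weights is kept. The input graph is not modified.
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for b in list(g):
            # Skip nodes removed earlier in this pass, non-degree-2 nodes and self-loops.
            if b not in g or len(g[b]) != 2 or b in g[b]:
                continue
            a, c = g[b]  # the two neighbours of B
            merged = g[b][a] + g[b][c]
            del g[a][b]
            del g[c][b]
            del g[b]
            new_w = min(merged, g[a].get(c, float("inf")))
            g[a][c] = new_w
            g[c][a] = new_w
            changed = True
    return g


# Usage: the chain X-A-B-C-Y collapses into a single X-Y edge of weight 1 + 2 + 3 + 1.
edges = [("X", "A", 1), ("A", "B", 2), ("B", "C", 3), ("C", "Y", 1),
         ("X", "E", 1), ("X", "F", 1), ("Y", "G", 1), ("Y", "H", 1)]
g: Graph = {}
for u, v, w in edges:
    g.setdefault(u, {})[v] = w
    g.setdefault(v, {})[u] = w
print(contract_degree_two(g)["X"])  # {'E': 1, 'F': 1, 'Y': 7}
```

Contraction repeats until no degree-2 node remains, because merging parallel routes can lower a neighbour's degree to 2.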
A node (junction) of degree two in the original network lies either along a single road or at a corner where two roads meet, and serves no purpose other than maintaining the road's correct shape on the map. Hence, for every such node B with neighbours A and C, we remove B and its two edges (A,B) and (B,C), and establish a new link between A and C. We assign (A,C) a weight equal to the sum of the weights of edges (A,B) and (B,C). If a link already exists between A and C, we set its weight to the minimum of the old and new weights. We simplify the network for the following reasons:

(i) It greatly speeds up the centrality analysis.
…

Network | N | K | $\langle l\rangle$ | W | d | $\langle k\rangle$
San Francisco | 26,049 | 33,079 | 59.9 m | 1,980 km | 6.67 km | 2.54
San Francisco (simplified) | 9,791 | 16,129 | 116.4 m | 1,877 km | 6.60 km | 3.29
Shanghai | 54,151 | 61,834 | 119.7 m | 7,402 km | 20.87 km | 2.28
Shanghai (simplified) | 12,979 | 20,585 | 354.1 m | 7,239 km | 18.9 km | 3.17

Table 1.

Key statistics of the two road networks studied. All centrality-based analyses are carried out on the simplified versions of the networks, as described in Section 3.1. Here, N denotes the number of nodes, K the number of edges, $\langle l\rangle$ the average edge length, W the total edge length, d the characteristic path length (average length of all shortest paths) and $\langle k\rangle$ the average degree.

The trace for San Francisco consists of roughly 10 million GPS readings (latitude, longitude) from more than 500 taxis in the Bay area over a period of 24 days, from May 17, 2008 to June 10, 2008 [29]. The trace for Shanghai contains more than 100 million GPS readings from more than 4,000 taxis over the entire month of February 2007 [17].

We first remove all traces that were recorded when the taxis were unoccupied. We believe this provides a fairer measure of aggregated traffic flow, driven by the actual travel needs of passengers. Unoccupied taxis are believed to be highly incentive-oriented and tend to remain in areas where passengers are easier to find (such as commercial areas, stations and airports). Hence, including those traces in the traffic flow estimate is likely to introduce bias. Secondly, by removing traces recorded when taxis are unoccupied, we also remove meaningless data, e.g. readings taken while taxis wait at taxi stops.

Figure 1.

The original network representation of the Shanghai road network (top) and the simplified network representation obtained by removing all junctions of degree 2 (bottom).

We define a trip to be a sequence of GPS traces of a particular taxi in which each consecutive trace pair is less than 90 seconds apart and implies a speed of at most 120 km/h. Then, for every valid trace in each trip, we locate its closest edge and increase the traffic count of the two corresponding junctions by one. By definition, a trip can visit each junction only once, and hence the traffic count of each junction can be increased at most once per trip.

Due to the time resolution of the data, consecutive traces are often several junctions apart. If we followed the simple scheme above, we would risk losing valuable information about the entire trajectory taken by each taxi during the trip. We therefore employ two methods to interpolate the trajectory when consecutive traces are too far apart.

We first attempt to estimate the actual edges traversed based on the fastest path between the two nodes. Where two nodes are more than one hop apart, we run Dijkstra's algorithm from one end to find a fastest path between them. The final path profile is a sequence of connected junctions that approximates the actual path of the taxi during that trip. Similarly, we increase the traffic count of each junction on the interpolated path by one. This estimation technique assumes that taxis always take the shortest/fastest route from source to destination. Unfortunately, this interpolation is biased towards our betweenness-based prediction (Section 3.3.2), as betweenness is itself based on the shortest/fastest-path assumption. Since the average speed and time interval imply an average distance of 500 m between consecutive traces, this estimation is highly vulnerable to this bias, especially when the trajectory is not straight.
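The trip definition and the fastest-path interpolation above can be sketched as follows. This is a hypothetical pure-Python illustration, not the authors' pipeline: the reading layout `(time_s, lat, lon)`, the use of the haversine formula for distances, and all function names are our assumptions. Readings are assumed to be already filtered to occupied taxis, sorted by time and, for interpolation, snapped to junction identifiers (the snapping step itself is omitted).

```python
import heapq
import itertools
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Fix = Tuple[float, float, float]               # hypothetical reading: (time_s, lat, lon)
Graph = Dict[Hashable, Dict[Hashable, float]]  # junction -> {neighbour: travel-time weight}

MAX_GAP_S = 90.0       # consecutive readings must be less than 90 s apart...
MAX_SPEED_KMH = 120.0  # ...and imply a speed of at most 120 km/h


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def split_trips(fixes: Sequence[Fix]) -> List[List[Fix]]:
    """Split one taxi's time-ordered, occupied-only readings into trips."""
    trips: List[List[Fix]] = []
    current: List[Fix] = []
    for fix in fixes:
        if current:
            (t0, lat0, lon0), (t1, lat1, lon1) = current[-1], fix
            dt = t1 - t0
            valid = 0 < dt < MAX_GAP_S and (
                haversine_km(lat0, lon0, lat1, lon1) / (dt / 3600.0) <= MAX_SPEED_KMH)
            if not valid:  # gap too long or implied speed implausible: close the trip
                trips.append(current)
                current = []
        current.append(fix)
    if current:
        trips.append(current)
    return trips


def fastest_path(graph: Graph, src: Hashable, dst: Hashable) -> List[Hashable]:
    """Dijkstra from src to dst; returns the junction sequence, or [] if unreachable."""
    dist = {src: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    tie = itertools.count()  # tie-breaker so the heap never compares junction ids
    heap = [(0.0, next(tie), src)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, next(tie), v))
    if dst not in dist:
        return []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def trip_junctions(graph: Graph, snapped: Sequence[Hashable]) -> Set[Hashable]:
    """Distinct junctions of one trip; non-adjacent consecutive junctions get a fastest path."""
    visited: Set[Hashable] = set(snapped[:1])
    for a, b in zip(snapped, snapped[1:]):
        if a == b or b in graph[a]:
            visited.update((a, b))
        else:
            visited.update(fastest_path(graph, a, b))
    return visited


road = {"A": {"B": 1.0, "D": 5.0}, "B": {"A": 1.0, "C": 1.0},
        "C": {"B": 1.0, "D": 1.0}, "D": {"A": 5.0, "C": 1.0}}
counts: Counter = Counter()
for snapped in (["A", "C"], ["A", "C", "A"]):  # two hypothetical snapped trips
    counts.update(trip_junctions(road, snapped))
print(sorted(counts.items()))  # [('A', 2), ('B', 2), ('C', 2)]
```

Collecting each trip's junctions into a set is what enforces the one-increment-per-junction-per-trip rule; `Counter.update` then adds exactly one per distinct junction.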
Our second technique involves interpolating the trajectory of the entire trip by submitting each consecutive trace pair to the MapQuest Directions API service§ with the default “Fastest Route” option enabled. Given the comprehensive information on each individual road that online routing services such as MapQuest possess, we believe that its fastest-path predictions keep errors to a minimum. Admittedly, it is implicitly assumed that the route returned is the one picked by the taxi driver, which again is not always true. However, we believe that the resolution of the traces and the criteria we employ to merge traces into individual trips keep the potential errors to a minimum while retaining as much traffic information as possible.

The interpolated trajectory of each trip is subjected to further filtering, as some routes returned by MapQuest were affected by GPS snapping errors. In some cases, a suggested route would imply that the taxi had travelled the entire length of a road and come back to a nearby point, simply because the next GPS trace was snapped to the same road in the opposite direction. Again, a threshold is set such that the interpolated trip cannot imply a travel speed of more than 120 km/h.

Finally, for each valid trip, the traffic count of each of the distinct interpolated junctions is increased by one. This is similarly based on the assumption that, in a single trip, a taxi does not visit the same junction twice. Figure 2 depicts the overall traffic flow in the key commercial districts of the two cities based on this interpolation technique.

Figure 3 shows the distribution of traffic flow count per junction in the two cities. A power-law decay with a cut-off (which can be explained by the finite nature of traffic flow counts) is observed.
Also, it indicates that only a tiny fraction of junctions carry a relatively substantial traffic load in both cities.

Both degree and closeness centralities have been used extensively in the study of Space Syntax, and in many cases have been found to correlate significantly with traffic flow, safety from crime, commercial activity, activity separation and pollution [11]. We present experimental results including these centrality measures in the next section. Here, we focus our discussion on betweenness centrality and some of its extensions for traffic networks, which will be introduced in due course.

Geodesic shortest-path betweenness refers to the simplest definition of betweenness, i.e., the fraction of shortest paths between pairs of nodes in the network that go through a particular node:

$$C_B(v) = \sum_{u,t \in V} \frac{\sigma_{ut}(v)}{\sigma_{ut}}, \qquad (1)$$

where $V$ is the set of nodes in the network, $\sigma_{ut}$ is the number of shortest paths between nodes $u$ and $t$, and $\sigma_{ut}(v)$ is the number of those going through node $v$. Here the network is treated as a non-spatial network, where each edge corresponds to a single hop from one node to another with equal weight. To compute it, we employ the original Brandes algorithm [4], which carries out a breadth-first search from each node in the network and accumulates the shortest-path counts on each node that lies on a shortest path.

§ http://developer.mapquest.com/web/products/open/directions-service

Figure 2. Traffic flow (interpolated by the MapQuest routing service) in the commercial districts of San Francisco (top) and Shanghai (bottom). For visualisation purposes, the traffic count of each edge is given by the average of the counts at its two ends. Red edges correspond to high traffic counts and blue edges to traffic counts close to zero.

[Figure 3 plot: probability density against traffic flow count (×100) for San Francisco and Shanghai.]

Figure 3.

Traffic flow distribution per junction in San Francisco and Shanghai, with trajectories interpolated by the MapQuest routing service.
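Geodesic betweenness as in Eq. (1) can be evaluated with Brandes' accumulation scheme [4]. The minimal pure-Python sketch below is our illustration, not the authors' code; the adjacency-list input and the function name `geodesic_betweenness` are assumptions. It runs one breadth-first search per source and, as in the usual formulation, excludes the endpoints u and t from the count for v.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # undirected adjacency lists, one hop per edge


def geodesic_betweenness(graph: Graph) -> Dict[Hashable, float]:
    """Eq. (1) via Brandes' algorithm: one BFS per source, then dependency back-propagation.

    Sums over ordered pairs (u, t) with u != v != t, so every undirected path is
    counted twice; halve the result for the unordered-pair convention.
    """
    cb = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack: List[Hashable] = []      # nodes in order of non-decreasing distance
        pred = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = dict.fromkeys(graph, 0) # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:             # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # edge (v, w) lies on a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(graph, 0.0)   # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


print(geodesic_betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
# {'a': 0.0, 'b': 2.0, 'c': 0.0}
```

Each source costs one BFS, so the whole computation is O(NK) on an unweighted graph, which is what makes it practical on the simplified networks of Table 1.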

The drawback of geodesic shortest-path betweenness is that it does not take the physical distances of the spatial network into account when determining shortest paths. It is highly sensitive to short edges, which are common in spatial networks because of the large number of road intersections; traversing such networks generally requires a relatively large number of hops compared with non-planar networks. A simple way to circumvent this problem is to take the Euclidean length of each edge into account when computing the shortest paths between all pairs of nodes. We denote this as the Euclidean shortest-path betweenness. The distributions of this betweenness over the junctions of the two cities are shown in Figure 4.

[Figure 4 plot: probability density against betweenness for the two cities.]

Figure 4. Distributions of Euclidean shortest-path betweenness centrality in the two cities.

One can further improve the resemblance to real route decisions by taking the speeds of different roads in the city into account. As with most route decisions made by humans or computers, the fastest route is often favoured over the shortest route. We therefore experiment with replacing the weight of each edge by an estimate of the travel time required, based on its pre-labelled feature class. In most modern cities, the speed limit on typical residential or city streets is 50 km/h, while that on highways is normally double. For simplicity, we halve the weights of all edges labelled as interstate/highway/primary/trunk roads in the network to account for an average driver's preference for faster routes. Understandably, route decisions also depend greatly on the time of day, the width and permeability [31] of roads, and potential charges for using them. A full analysis would therefore require a better understanding of the route decisions made by every individual. We call the measure obtained with these halved weights the Euclidean fastest-path betweenness. Similar attempts to capture betweenness where the shortest-path assumption does not always hold have been investigated in the realm of packet routing in communication networks [13].

For both of the aforementioned measures, we employ the weighted version of the Brandes algorithm, with all nodes set to the same weight (see Algorithm 1).
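The weighted variant replaces the breadth-first search with Dijkstra's algorithm. The sketch below is a hypothetical illustration, not Algorithm 1 itself. It builds one graph whose weights are Euclidean edge lengths (Euclidean shortest-path betweenness) and one in which edges of assumed fast road classes are halved (Euclidean fastest-path betweenness). The class labels in `FAST_CLASSES` and all function names are our assumptions.

```python
import heapq
import itertools
from typing import Dict, Hashable, Iterable, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]

# Assumed labels for "fast" roads: MTFCC codes S1100/S1200 (San Francisco)
# and OpenStreetMap highway=motorway/trunk/primary (Shanghai).
FAST_CLASSES = {"S1100", "S1200", "motorway", "trunk", "primary"}


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float, str]], fastest: bool) -> Graph:
    """Undirected graph from (u, v, length_m, road_class); halve fast roads if fastest=True."""
    g: Graph = {}
    for u, v, length, road_class in edges:
        w = length / 2 if fastest and road_class in FAST_CLASSES else length
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def weighted_betweenness(graph: Graph) -> Dict[Hashable, float]:
    """Brandes with Dijkstra (positive weights); ordered pairs, endpoints excluded."""
    cb = dict.fromkeys(graph, 0.0)
    tie = itertools.count()  # heap tie-breaker so node ids are never compared
    for s in graph:
        dist = {s: 0.0}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        pred = {v: [] for v in graph}
        order = []  # nodes in order of non-decreasing distance from s
        settled = set()
        heap = [(0.0, next(tie), s)]
        while heap:
            d, _, v = heapq.heappop(heap)
            if v in settled:
                continue
            settled.add(v)
            order.append(v)
            for w, weight in graph[v].items():
                nd = d + weight
                if w not in dist or nd < dist[w]:  # strictly shorter path found
                    dist[w], sigma[w], pred[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, next(tie), w))
                elif nd == dist[w]:                # equally short alternative
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Toy network: a local route A-B-C (200 m) and a highway route A-D-C (240 m).
edges = [("A", "B", 100, "S1400"), ("B", "C", 100, "S1400"),
         ("A", "D", 120, "S1100"), ("D", "C", 120, "S1100")]
print(weighted_betweenness(build_graph(edges, fastest=False)))
# {'A': 1.0, 'B': 2.0, 'C': 1.0, 'D': 0.0}
print(weighted_betweenness(build_graph(edges, fastest=True)))
# {'A': 1.0, 'B': 0.0, 'C': 1.0, 'D': 2.0}
```

In the toy network, halving the highway weights moves the A-C traffic from the local junction B to the highway junction D, which is the effect the fastest-path variant is meant to capture.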

It soon came to our attention that purely topological measures, which ignore locale information, would fail to capture the actual traffic flow, simply because places or nodes that are more densely populated or commercially active inevitably generate and attract more traffic. Without resorting to a full-blown population or commercial density map, and in the spirit of using as little accessible information as possible, we chose restaurant density as the measure used to assign each node a weight. Restaurants and their exact locations are readily available in many online and offline directories. While there is a potential bias towards restaurants that publish themselves online, we believe restaurant density provides a fair estimate of traffic generation and attraction, as restaurants are designed to serve places where people are around.

For each restaurant, we pre-assign to each junction within a 150 m radius a weight inversely proportional to the number of junctions in that radius. This assumes that every restaurant carries the same traffic-attracting and traffic-generating factor, which is spread … The node-weighted fastest-path betweenness framework closely follows the algorithm proposed in [9], which modifies the original Brandes algorithm to take node weights into account. We present the full algorithm in Appendix A. The key observation is on line a of the algorithm, where we adjust the shortest-path counts based on the node weights during the back-propagation step from destination to source. For simplicity, we assume that the overall traffic between two nodes can be estimated by the product of their respective weights.

We located 4,066 and 53,748 restaurants within the areas of interest in San Francisco and Shanghai, respectively. Figure 5 shows the density of restaurants in the San Francisco Bay area.
Note again that the restaurant data are current and therefore may not truly reflect the distribution at the time the traces were taken.
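The restaurant-based node weights and the node-weighted fastest-path betweenness can be sketched as below. This is an illustration under stated assumptions, not the authors' Appendix A algorithm: junction and restaurant positions are assumed to be in a projected metric coordinate system, and the function names are hypothetical. Following the product assumption above, each ordered pair (s, t) contributes W(s)W(t)σ_st(v)/σ_st to node v. In the back-propagation, a target w therefore contributes its weight W(w) in place of the 1 used in plain Brandes accumulation, and each source's dependencies are scaled by W(s).

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, float]                    # assumed projected (x, y) in metres
Graph = Dict[Hashable, Dict[Hashable, float]]  # junction -> {neighbour: travel-time weight}

RADIUS_M = 150.0


def restaurant_weights(junctions: Dict[Hashable, Point],
                       restaurants: Iterable[Point]) -> Dict[Hashable, float]:
    """Each restaurant spreads one unit of weight evenly over the junctions within 150 m."""
    weight = dict.fromkeys(junctions, 0.0)
    for rx, ry in restaurants:
        near = [j for j, (x, y) in junctions.items() if math.hypot(x - rx, y - ry) <= RADIUS_M]
        for j in near:
            weight[j] += 1.0 / len(near)
    return weight


def node_weighted_betweenness(graph: Graph, weight: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Sum over ordered pairs (s, t) of W(s) * W(t) * sigma_st(v) / sigma_st, endpoints excluded."""
    cb = dict.fromkeys(graph, 0.0)
    tie = itertools.count()
    for s in graph:
        if weight.get(s, 0.0) == 0.0:
            continue  # a source with zero weight generates no traffic
        dist = {s: 0.0}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        pred: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        order: List[Hashable] = []
        settled = set()
        heap = [(0.0, next(tie), s)]
        while heap:  # Dijkstra, counting fastest paths
            d, _, v = heapq.heappop(heap)
            if v in settled:
                continue
            settled.add(v)
            order.append(v)
            for w, length in graph[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], pred[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, next(tie), w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                # Target w contributes its weight W(w) instead of the 1 used in plain Brandes.
                delta[v] += sigma[v] / sigma[w] * (weight.get(w, 0.0) + delta[w])
            if w != s:
                cb[w] += weight[s] * delta[w]
    return cb


junctions = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (1000.0, 0.0)}
w = restaurant_weights(junctions, [(50.0, 0.0), (1000.0, 100.0)])
print(w)  # {'a': 0.5, 'b': 0.5, 'c': 1.0}
road = {"a": {"b": 100.0}, "b": {"a": 100.0, "c": 900.0}, "c": {"b": 900.0}}
print(node_weighted_betweenness(road, w))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

With all weights equal to 1, the function reduces to the unweighted fastest-path betweenness.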

We evaluate the quality of predictions by computing the Pearson correlation coefficient between the traffic count and the predicted magnitude. The Pearson correlation coefficient, or Pearson's r, is a value ranging from -1.0 to 1.0 which measures the linear dependence between two variables. A value close to zero means that the two variables are not linearly correlated, i.e., knowing one does not tell much about the other. Conversely, a value close to -1.0 or 1.0 indicates a nearly perfect linear relationship between the two variables. For each junction, we therefore calculate 7 different measures for comparison: its degree, closeness‖, the number of restaurants within its 150 m radius, and the four betweenness measures.

While node-based correlation is simple and intuitive, it has several drawbacks. Firstly, it is susceptible to noise caused by routing inaccuracies due to insufficient information on the road structure. Consider a multi-lane highway, which would typically be depicted as a number of parallel edges in the network. Disregarding directional or turning restrictions, the shortest-path-based estimation would often pick one and only one of these parallel edges to increase the traffic count. Also, the routing errors mentioned in the last section, caused by the insufficient time resolution of the GPS traces, are bound to lead to noise and inaccuracies in node-based correlations. Finally, despite having removed all junctions of degree 2 from the network, certain remaining junctions which are physically close to each other in the network might exhibit similar behaviours in terms of traffic flow as well as centrality. Therefore, blindly carrying out node-based correlations may be biased towards nodes that are located close together.

‖ The notion of closeness used in our analysis differs from space syntax's global integration in that distance is calculated based on the fastest-path weight between nodes.

Figure 5. Density of restaurants in the San Francisco Bay area. For visualisation purposes, the restaurant count of each edge is given by the average of the numbers of restaurants within a 150 m radius of its two ends. Red edges correspond to a high number of restaurants in their surroundings and blue edges to an insignificant number of restaurants.

To address these issues, we follow a scheme similar to that used in [30] and carry out spatial smoothing on the predictions and traffic counts prior to correlation, using a common technique known as Kernel Density Estimation (KDE). In essence, for any point on the map, a measure is obtained by summing all the events that occurred on the whole 2D plane, weighted inversely by their Euclidean distances from that point. To achieve this, a kernel function κ, which is symmetric and integrates to one, is added to an overall sum over the plane wherever an event occurs. The sum therefore forms a surface over the plane with a volume equal to the number of events. It is typically normalised such that it becomes a probability density function over the 2D plane. For a given point on a 1-dimensional line with coordinate $x$, the KDE measure is commonly given by

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} \kappa\left(\frac{x - x_i}{h}\right), \qquad (2)$$

where $n$ is the total number of events in the entire space, $h$ is the bandwidth, and $x_i$ is the coordinate of event $i$. $h$ acts as the smoothing factor: the larger $h$ is, the more likely a distant event is to affect the point concerned.

Since the coordinates are real-valued, it is common to discretise the 2D plane into pixels or squares prior to running KDE. We closely follow the settings used in [30], setting each pixel to be 10 m by 10 m and $h = 300$ m. We employ the standard Gaussian kernel, given by

$$\kappa(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \qquad (3)$$

To ease the calculation, we pre-calculate the values of the six predictors, the traffic count and the restaurant count for each 10 m by 10 m pixel. Where there is more than one node in a pixel, we take the sum of the node-based measures as an aggregated estimate for the pixel. For the restaurant count, the number of restaurants within the pixel is used instead. To further speed up the process, pixels beyond the bandwidth $h$ from the pixel concerned are not considered in the summation.

To summarise, we first pre-calculate the aggregated estimates for every pixel on the map. Then, for each pixel with coordinates $\mathbf{x}$ that contains any junction, we compute the following metric:

$$\tilde{f}(\mathbf{x}) = \sum_{\{\mathbf{x}_i \,\mid\, d(\mathbf{x}, \mathbf{x}_i) \le h\}} w_i\, \kappa\left(\frac{d(\mathbf{x}, \mathbf{x}_i)}{h}\right), \qquad (4)$$

where $d(\mathbf{x}, \mathbf{x}_i)$ is the Euclidean distance between the pixel concerned, with coordinates $\mathbf{x}$, and another pixel $i$ with coordinates $\mathbf{x}_i$, and $w_i$ is the pre-calculated value (e.g. the aggregated centrality or traffic count) of pixel $i$.

The correlation analysis only takes into account pixels that contain junctions. We do not correlate pixels that contain only edges (roads), since our knowledge of the edges, i.e., their traffic flows and centralities, is no more than that of their corresponding junction ends. Hence, within this framework, correlating every pixel that contains an edge could bias the overall Pearson correlation.

Note that the above measure transforms the effect of an event on the 2D space using a 1-dimensional Gaussian, as we assume that an event's effect is symmetric in every direction on the 2D plane. We do not normalise $\tilde{f}(\mathbf{x})$, as this does not affect the correlations.

4. Results and Discussion

We present the results of our correlation analyses for San Francisco and Shanghai in Tables 2 and 3, respectively. For each entry, both the node-based and the KDE pixel-based Pearson's r are given.

San Francisco: Pearson's r (node-based/KDE)
Predictor | Non-rush hour | Rush hour | Overall
No interpolation
Degree | 0.272/0.392 | 0.246/0.388 | 0.268/0.394
Closeness | 0.243/0.413 | 0.19/0.404 | 0.231/0.413
Geodesic betweenness | 0.236/0.189 | 0.191/0.142 | 0.226/0.177
Shortest-path | 0.273/0.333 | 0.201/0.271 | 0.256/0.318
Fastest-path | 0.264/0.315 | 0.203/0.249 | 0.249/0.298
Node-weighted fastest-path | 0.591/0.83 | 0.526/0.767 | 0.579/0.818
Restaurant | 0.445/0.717 | 0.458/0.709 | 0.453/0.72
Fastest-path interpolation
Degree | 0.268/0.521 | 0.267/0.538 | 0.269/0.526
Closeness | 0.256/0.529 | 0.275/0.545 | 0.261/0.534
Geodesic betweenness | 0.204/0.266 | 0.222/0.314 | 0.208/0.277
Shortest-path | 0.252/0.417 | 0.274/0.469 | 0.257/0.429
Fastest-path | 0.386/0.456 | 0.437/0.531 | 0.398/0.473
Node-weighted fastest-path | 0.565/0.753 | 0.549/0.725 | 0.564/0.75
Restaurant | 0.419/0.622 | 0.381/0.568 | 0.412/0.613
Routing-service interpolation
Degree | 0.218/0.486 | 0.231/0.46 | 0.223/0.48
Closeness | 0.239/0.501 | 0.273/0.484 | 0.25/0.498
Geodesic betweenness | 0.159/0.237 | 0.179/0.23 | 0.166/0.236
Shortest-path | 0.224/0.392 | 0.259/0.397 | 0.235/0.395
Fastest-path | 0.223/0.383 | 0.25/0.381 | 0.232/0.383
Node-weighted fastest-path | 0.528/0.764 | 0.573/0.812 | 0.544/0.779
Restaurant | 0.465/0.65 | 0.463/0.66 | 0.467/0.654

Table 2.

Pearson product-moment correlation coefficients of 6 centrality predictors and restaurant count against traffic flow in San Francisco. Traffic flow is further separated into rush hour (6AM-10AM and 4PM-7PM) and non-rush hour for comparison.

A first glance at the results reveals that the predictors perform differently in the two cities and across the different interpolation techniques. This is understandable given that the two cities differ fundamentally in design and planning: San Francisco has a well-defined grid-like structure, while Shanghai's road structure is more irregular and ad hoc (self-organised). As explained earlier, the fact that fastest-path interpolation by definition favours the fastest-path betweenness predictor is also evident across both tables. KDE pixel-based correlation generally yields much higher correlation coefficients than its node-based counterpart. As discussed, owing to imperfect information on the road networks and in the raw traces, carrying out spatial smoothing on the observed traffic flow and predictions removes some of the noise from these errors.

Shanghai: Pearson's r (node-based/KDE)
Predictor | Non-rush hour | Rush hour | Overall
No interpolation
Degree | 0.253/0.404 | 0.273/0.413 | 0.261/0.408
Closeness | 0.339/0.466 | 0.367/0.474 | 0.35/0.47
Geodesic betweenness | 0.153/0.505 | 0.173/0.516 | 0.16/0.51
Shortest-path | 0.322/0.642 | 0.35/0.637 | 0.333/0.641
Fastest-path | 0.262/0.684 | 0.29/0.676 | 0.272/0.683
Node-weighted fastest-path | 0.289/0.634 | 0.311/0.622 | 0.297/0.631
Restaurant | 0.256/0.44 | 0.276/0.436 | 0.263/0.44
Fastest-path interpolation
Degree | 0.265/0.439 | 0.269/0.438 | 0.267/0.439
Closeness | 0.418/0.501 | 0.414/0.496 | 0.418/0.5
Geodesic betweenness | 0.167/0.568 | 0.179/0.577 | 0.171/0.572
Shortest-path | 0.387/0.646 | 0.388/0.631 | 0.389/0.642
Fastest-path | 0.481/0.744 | 0.476/0.724 | 0.48/0.738
Node-weighted fastest-path | 0.491/0.674 | 0.472/0.641 | 0.486/0.664
Restaurant | 0.329/0.39 | 0.321/0.377 | 0.328/0.386
Routing-service interpolation
Degree | 0.313/0.505 | 0.309/0.507 | 0.312/0.506
Closeness | 0.552/0.583 | 0.55/0.586 | 0.552/0.584
Geodesic betweenness | 0.086/0.429 | 0.089/0.425 | 0.087/0.428
Shortest-path | 0.484/0.751 | 0.478/0.746 | 0.483/0.75
Fastest-path | 0.43/0.814 | 0.427/0.807 | 0.43/0.812
Node-weighted fastest-path | 0.528/0.832 | 0.515/0.817 | 0.525/0.828
Restaurant | 0.438/0.444 | 0.433/0.438 | 0.438/0.443

Table 3. Pearson product-moment correlation coefficients of 6 centrality predictors and restaurant count against traffic flow in Shanghai. Traffic flow is further separated into rush hour (6AM-10AM and 4PM-7PM) and non-rush hour for comparison.

Nonetheless, we are able to draw several important conclusions from our experiments. Degree and geodesic betweenness offer minimal prediction power. This is expected, as the variation of degree centrality in spatial networks is so small that its discriminating power is low. Geodesic betweenness does not take edge weights into account and hence is inapplicable to flow prediction in spatial networks, where the number of hops between source and destination carries little meaning. These observations agree with numerous previous findings that simple topology-based measures seem to have an inherent limit in characterising traffic flow. Closeness centrality has a marginal …

[Figure 6 panels: San Francisco (node-based), San Francisco (KDE-based), Shanghai (node-based), Shanghai (KDE-based); each plots node-weighted fastest-path betweenness against traffic count.]

Figure 6.

Scatter plots of Node-weighted betweenness against the observed andinterpolated traﬃc count in the two cities. The plots are in log-log scale and clearpositive trends can be observed in all cases. The Pearson’s r from the correlationstudies are reported in Tables 2 and 3.
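The contrast between node-based and KDE-pixel-based correlation can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the procedure used to produce Tables 2 and 3: the Gaussian kernel, the square pixel grid, the cell size and bandwidth, and the names `pearson_r`, `kde_grid` and `node_and_kde_r` are all hypothetical. Node-based r correlates predictor and traffic values junction by junction; the KDE variant first spreads both quantities over a common pixel grid and then correlates the pixel values.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def kde_grid(points, values, width, height, cell, bandwidth):
    """Spread per-node values onto a width x height pixel grid with a Gaussian kernel.

    points: (x, y) node coordinates, in the same length unit as `cell` and `bandwidth`.
    values: the quantity at each node (observed traffic count or a centrality score).
    The kernel's normalising constant is omitted: it scales every pixel equally and
    therefore does not change Pearson's r. Returns the grid flattened row by row.
    """
    grid = [0.0] * (width * height)
    two_h2 = 2.0 * bandwidth ** 2
    for (px, py), val in zip(points, values):
        for row in range(height):
            cy = (row + 0.5) * cell  # y coordinate of the pixel centre
            for col in range(width):
                cx = (col + 0.5) * cell  # x coordinate of the pixel centre
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                grid[row * width + col] += val * math.exp(-d2 / two_h2)
    return grid


def node_and_kde_r(points, predictor, traffic, width, height, cell, bandwidth):
    """Return (node-based r, KDE-pixel-based r) for one predictor."""
    node_r = pearson_r(predictor, traffic)
    kde_r = pearson_r(
        kde_grid(points, predictor, width, height, cell, bandwidth),
        kde_grid(points, traffic, width, height, cell, bandwidth),
    )
    return node_r, kde_r
```

Smoothing averages each pixel over nearby junctions, so a traffic count attributed to the wrong junction by a map-matching error still lands in roughly the right place; this is the noise-reduction effect the discussion attributes to KDE. The grid resolution and bandwidth here are free parameters chosen for illustration.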

5. Conclusion and Future Work

In this paper, more than a hundred million GPS taxi traces from the intra-city urban networks of San Francisco and Shanghai have been analysed from a weighted-network perspective. We have reviewed the relevant literature on spatial network analysis, applications of large-scale human mobility analysis, and existing work on topology-based traffic flow analysis. We have discussed methodologies to effectively simplify the urban networks and to estimate traffic flow from raw data. We have argued that relying on purely topological features, as much previous work has done, is insufficient to predict traffic flow consistently, because of the inherently flawed assumption that some centrality measures carry. A novel framework that allows node weights and travel speed to be incorporated into traditional betweenness analysis has been shown to greatly improve traffic prediction performance. Based on this framework, we have also identified driving behaviours that potentially differ between rush-hour and non-rush-hour traffic.

Acknowledgments

The Shanghai taxi data was obtained from the Wireless and Sensor networks Lab (WnSN), Shanghai Jiao Tong University, for which we are grateful. The city graphs used in this manuscript were generated using Tulip 3.5.0 [2].

Appendix A. Node-weighted Brandes Algorithm

The detailed implementation of the modified Brandes algorithm, which allows weighted edges and nodes, is given below in Algorithm 1.
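For readers who wish to experiment with the procedure in Algorithm 1, a minimal Python sketch follows. It is an illustration under stated assumptions, not the authors' implementation. The graph is assumed to be a dict-of-dicts adjacency map with positive edge weights, and node labels must be mutually comparable because they break ties in the heap. The decreaseKey operation is replaced by the common lazy-deletion idiom with Python's `heapq`, which is a swapped-in technique. The function name `node_weighted_betweenness` is hypothetical. As in Brandes' algorithm, C_B[w] is incremented once per settled node w ≠ s, after all of w's predecessors have been updated.

```python
import heapq


def node_weighted_betweenness(graph, demand):
    """Node-weighted betweenness following the accumulation rule of Algorithm 1.

    graph:  dict mapping every node to a dict {neighbour: positive edge weight};
            for an undirected road network, list each edge in both directions.
    demand: dict mapping every node to its node weight.
    Each ordered pair (s, t) contributes demand[s] * demand[t], split evenly
    over all least-weight s-t paths. Ties are detected by exact equality, so
    float weights that should tie but differ by rounding are treated as distinct.
    """
    cb = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []                          # nodes in order of settlement
        preds = {w: [] for w in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        tentative = {s: 0.0}                # best known distance from s
        settled = set()
        heap = [(0.0, s)]
        # Dijkstra with lazy deletion: stale heap entries are skipped instead
        # of calling decreaseKey as Algorithm 1 does.
        while heap:
            d, v = heapq.heappop(heap)
            if v in settled:
                continue
            settled.add(v)
            stack.append(v)
            for w, wt in graph[v].items():
                nd = d + wt
                if w not in tentative or nd < tentative[w]:
                    tentative[w] = nd       # shorter path to w via v
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == tentative[w]:
                    sigma[w] += sigma[v]    # another shortest path via v
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (demand[s] * demand[w] + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

With uniform demand of 1 the result reduces to ordinary weighted betweenness. Using segment lengths as edge weights gives a shortest-path variant, while travel times (for example, segment length divided by an assumed typical speed) give a fastest-path variant. No halving is applied, so in an undirected network each pair is counted once in each direction.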

Input: G = (V, E); demand[] (node weights); weight(v, v') (weight of edge (v, v'))
Output: C_B[] (the approximation of the node-weighted betweenness of each node)
begin
    C_B[v] ← 0, v ∈ V
    foreach s ∈ V do    /* s is the source node in each iteration */
        S ← empty stack
        /* P[w] is the list of predecessors of w on shortest paths from s */
        P[w] ← empty list, w ∈ V
        /* counters for the number of shortest paths from s */
        σ[t] ← 0, t ∈ V;  σ[s] ← 1
        /* distances from the source */
        d[t] ← -1, t ∈ V;  d[s] ← 0
        /* a priority queue of nodes ordered by increasing d[node] */
        Q ← empty queue;  enqueue s → Q
        /* Dijkstra's algorithm, also counting the number of equal-length shortest paths to each node */
        while Q not empty do
            dequeue Q → v;  push v → S
            foreach neighbour w of v do
                /* w found for the first time? */
                if d[w] < 0 then
                    d[w] ← d[v] + weight(v, w);  enqueue w → Q
                end
                /* shorter path to w via v? */
                if d[w] > d[v] + weight(v, w) then
                    d[w] ← d[v] + weight(v, w)
                    σ[w] ← σ[v]
                    P[w] ← new list();  append v → P[w]
                    /* reorder w in Q with the new value of d[w] */
                    Q.decreaseKey(w)
                else if d[w] = d[v] + weight(v, w) then
                    /* v lies on one of the shortest paths to w */
                    σ[w] ← σ[w] + σ[v];  append v → P[w]
                end
            end
        end
        /* dependency of s on each node, accumulated along shortest paths */
        δ[v] ← 0, v ∈ V
        /* S returns vertices in order of non-increasing distance from s */
        while S not empty do
            pop S → w
            foreach v ∈ P[w] do
                δ[v] ← δ[v] + (σ[v] / σ[w]) · (demand[s] · demand[w] + δ[w])
            end
            /* add δ[w] once per node w ≠ s, after the predecessor loop */
            if w ≠ s then
                C_B[w] ← C_B[w] + δ[w]
            end
        end
    end
end
Algorithm 1: Full implementation of the node-weighted betweenness centrality algorithm.

References

[1] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, Jan 2002.
[2] D. Auber. Tulip - a huge graph visualization framework, 2003.
[3] Fan Bai and Bhaskar Krishnamachari. Spatio-temporal variations of vehicle traffic in VANETs: facts and implications. In Proceedings of the sixth ACM international workshop on VehiculAr InterNETworking, VANET '09, pages 43–52, New York, NY, USA, 2009. ACM.
[4] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163–177, 2001.
[5] U.S. Census Bureau. TIGER mapping service: San Francisco County, California, 2008.
[6] F. Calabrese, C. Ratti, M. Colonna, P. Lovisolo, and D. Parata. Real-time urban monitoring using cell phones: a case study in Rome. IEEE Transactions on Intelligent Transportation Systems, 2010.
[7] A. Cardillo, S. Scellato, V. Latora, and S. Porta. Structural properties of planar graphs of urban street patterns. Physical Review E, 73(6):066107, 2006.
[8] Augustin Chaintreau, Pan Hui, Jon Crowcroft, Christophe Diot, Richard Gass, and James Scott. Impact of human mobility on opportunistic forwarding algorithms. IEEE Transactions on Mobile Computing, 6(6):606–620, 2007.
[9] S.Y. Chan, I.X.Y. Leung, and P. Liò. Fast centrality approximation in modular networks. In Proceeding of the 1st ACM international workshop on Complex networks meet information & knowledge management, pages 31–38. ACM, 2009.
[10] Vittoria Colizza and Alessandro Vespignani. Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: theory and simulations. Journal of Theoretical Biology, 251:450, 2008.
[11] P. Crucitti, V. Latora, and S. Porta. Centrality in networks of urban streets. Chaos: An Interdisciplinary Journal of Nonlinear Science, 16:015113, 2006.
[12] A. De Montis, M. Barthélemy, A. Chessa, and A. Vespignani. The structure of inter-urban traffic: A weighted network analysis. Environment and Planning B: Planning and Design, 34:905–927, 2007.
[13] S. Dolev, Y. Elovici, and R. Puzis. Routing betweenness centrality. Journal of the ACM (JACM), 57(4):1–27, 2010.
[14] M. Garavello and B. Piccoli. Traffic flow on networks. American Institute of Mathematical Sciences, 2006.
[15] M.T. Gastner and M.E.J. Newman. The spatial structure of networks. The European Physical Journal B - Condensed Matter and Complex Systems, 49(2):247–252, 2006.
[16] Marta C. Gonzalez, Cesar A. Hidalgo, and Albert-Laszlo Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, 2008.
[17] H.Y. Huang, P.E. Luo, M. Li, X. Li, W. Shu, and M.Y. Wu. Performance evaluation of SUVnet with real-time traffic data. IEEE Transactions on Vehicular Technology, 56(6):3381–3396, 2007.
[18] P. Hui, R. Mortier, M. Piorkowski, T. Henderson, and J. Crowcroft. Planet-scale human mobility measurement. In Proceedings of the 2nd ACM International Workshop on Hot Topics in Planet-scale Measurement, pages 1–5. ACM, 2010.
[19] Pan Hui and Jon Crowcroft. Human mobility models and opportunistic communications system design. Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 366:2005–2016, 2008.
[20] B. Jiang. Ranking spaces for predicting human movement in an urban environment. International Journal of Geographical Information Science, 23(7):823–837, 2009.
[21] B. Jiang and C. Liu. Street-based topological representations and analyses for predicting traffic flow in GIS. International Journal of Geographical Information Science, 23(9):1119–1137, 2009.
[22] B. Jiang, S. Zhao, and J. Yin. Self-organized natural roads for predicting traffic flow: a sensitivity study. Journal of Statistical Mechanics: Theory and Experiment, 2008:P07008, 2008.
[23] Thomas Karagiannis, Jean-Yves Le Boudec, and Milan Vojnović. Power law and exponential decay of inter contact times between mobile devices. In MobiCom 2007, pages 183–194, 2007.
[24] A. Kazerani and S. Winter. Can betweenness centrality explain traffic flow. In , 2009.
[25] Gautier Krings, Francesco Calabrese, Carlo Ratti, and Vincent D. Blondel. Scaling behaviors in the communication network between cities. In Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04, pages 936–939, Washington, DC, USA, 2009. IEEE Computer Society.
[26] M. Kurant and P. Thiran. Layered complex network. Physical Review Letters, 96(13):138701, 2006.
[27] S. Lämmer, B. Gehlsen, and D. Helbing. Scaling laws in the spatial structure of urban road networks. Physica A: Statistical Mechanics and its Applications, 363(1):89–95, 2006.
[28] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120.
[29] Michal Piorkowski, Natasa Sarafijanovic-Djukic, and Matthias Grossglauser. CRAWDAD data set epfl/mobility (v. 2009-02-24). Downloaded from http://crawdad.cs.dartmouth.edu/epfl/mobility, February 2009.
[30] S. Porta, V. Latora, F. Wang, E. Strano, A. Cardillo, S. Scellato, V. Iacoviello, and R. Messora. Street centrality and densities of retail and services in Bologna, Italy. Environment and Planning B: Planning and Design, 36(3):450–465, 2009.
[31] C. Ratti. Urban texture and space syntax: some inconsistencies. Environment and Planning B: Planning and Design, 31, 2004.
[32] I. Rhee, M. Shin, S. Hong, K. Lee, and S. Chong. On the levy-walk nature of human mobility. In