Relaxing door-to-door matching reduces passenger waiting times: a workflow for the analysis of driver GPS traces in a stochastic carpooling service

Abstract

Carpooling has the potential to transform itself into a mass transportation mode by abandoning its adherence to deterministic passenger-driver matching for door-to-door journeys, and by adopting instead stochastic matching on a network of fixed meeting points. In stochastic matching, a passenger sends out a carpooling request at a meeting point, and then waits for the arrival of a self-selected driver who is already travelling to the requested meeting point. Crucially there is no centrally dispatched driver. Moreover, the carpooling is assured only between the meeting points, so the onus is on the passengers to travel to/from them by their own means. Thus the success of a stochastic carpooling service relies on passengers and drivers converging, with minimal perturbation to their existing travel patterns, on meeting points which are highly frequented by both. Due to the innovative nature of stochastic carpooling, existing off-the-shelf workflows are largely insufficient for this purpose. To fill the gap in the market, we introduce a novel workflow, comprising a combination of data science and GIS (Geographic Information Systems), to analyse driver GPS traces. We implement it for an operational stochastic carpooling service in south-eastern France, and we demonstrate that relaxing door-to-door matching reduces passenger waiting times. Our workflow provides additional key operational indicators, namely the driver flow maps, the driver flow temporal profiles and the driver participation rates.



Panayotis Papoutsis*1,2,3, Safa Fennia, Constant Bridon, and Tarn Duong
1 Department of Computing and Mathematics, Nantes Central Engineering School, F-44300, France
2 Jean Leray Mathematics Laboratory, University of Nantes, F-44300, France
3 Department of Data Science-GIS, Ecov, F-44200, France

Keywords: data science, stochastic matching, GIS, meeting point, network

Carpooling has seen an explosion of utilisation in recent years [Furuhata et al., 2013]. There are many underlying reasons for this, with concerns ranging from greenhouse gas emissions and air pollution to road congestion and land use, as well as economic costs [Shaheen et al., 2016]. It also attracts intense interest since carpooling is a crucial element of almost all development plans for smart cities [Ghoseiri, 2012]. A broad definition of carpooling involves a driver sharing their journey with passengers. In this paper we employ a narrower definition. We additionally require that a non-professional driver would have undertaken their journey for their own reasons, regardless of whether the passengers would have been present or not. The driver may receive payment to offset the costs of the use of their vehicle, but the profit motive is non-existent or at least not their primary motivation [Zhu, 2020]. Hence we do not consider taxi-like services (such as Uber, Lyft and Kapten) to be carpooling services, as they employ professional drivers who create a journey in response to a passenger request and are then paid the market rate for the service rendered.

* Corresponding author. E-mail: [email protected]. Preprint: arXiv [stat.AP].

Due to the altruistic nature of carpooling, service providers tend to be small, local, non-profit organisations. This does not preclude viable business models arising from carpooling with non-professional drivers: the market leader BlaBlaCar levies a commission fee for facilitating the matching of drivers and passengers [Shaheen et al., 2017]. This matching is managed by a centralised platform, which we call deterministic matching since a known driver is assigned in advance to collect the passenger. This deterministic matching is highly successful for infrequent, long distance, pre-reserved carpooling journeys, as witnessed by BlaBlaCar's status as a unicorn start-up company (a valuation of at least 1000 million USD).
Despite the success of deterministic passenger-driver matching in this market, attempts to export it to other carpooling markets have not resulted in the same level of market penetration. This is most notable for frequent, short distance journeys (roughly 10 to 40 km), which comprise the bulk of daily home-work commutes, and so carpooling remains a marginal practice in this market.

This paper focuses on short-distance, non-reserved carpooling, and it is what we refer to when employing 'carpooling' without any qualifiers. The advent of mass carpooling depends crucially on incentivising drivers and passengers to converge onto highly frequented meeting points (hotspots) along their door-to-door journeys [Stiglic et al., 2015]. This type of incentivisation is well-established for a bus network where passengers embark/disembark only at the fixed bus stops. Thus mass carpooling requires a paradigm shift from considering carpooling as an exclusively private means of transport to a closer alignment to public transport models [Cooper, 2007].

Continuing with the public transport model, the meeting points are not defined informally between passengers and drivers, but are decided in consultation with local government authorities so that they respond to the mobility requirements in the local area, taking into account various factors such as aggregated traffic flow, socioeconomic characteristics, pedestrian accessibility, local government regulations, etc. For our purposes, we consider that the identification of the meeting hotspots has been carried out beforehand.
These meeting points are then connected to each other to define carpooling lines, which have massification potential, like traditional bus lines [Stiglic et al., 2015, Li et al., 2018].

Like a bus service, no pre-reservations are required, as a passenger makes an ad hoc carpooling request at a meeting point, and this request for the desired destination is communicated to all passing drivers in real-time via an electronic sign on the side of a highly frequented main road. Unlike for deterministic passenger-driver matching mentioned above, a specific driver is not assigned to the passenger by a centralised platform, but the decision to collect a passenger at the meeting point is made spontaneously by a self-selected driver. Since the actual driver who collects the passenger is not known deterministically in advance, but is only known to be drawn from the population of drivers, this is known as stochastic matching. Due to the inherent variability of these driver arrivals, stochastic matching is only feasible when employed in conjunction with a network of highly frequented meeting points.

The effects of the double innovations of fixed meeting points and stochastic matching are only sparsely covered by the recent comprehensive review of general carpooling and taxi-like services over the past two decades [Wang and Yang, 2019]. So there are few off-the-shelf workflows which are suitable for the analysis of the data arising from a stochastic carpooling service. We introduce a data science-GIS workflow which fills this gap in the market. Its main data source is the GPS traces, and its secondary sources are the meeting point locations, the origin-destination matrices, the route finder API and the base maps. Data wrangling/geoprocessing are then applied to these data sources, with the critical geoprocessing step being the topological simplification of the GPS traces onto the carpooling network.
This topological simplification is essential to be able to mutualise GPS traces which share common arrival times at the meeting points. From these simplified GPS traces, we can produce the waiting times. The latter allow us to assert that stochastic matching at meeting points leads to reduced passenger waiting times in comparison to door-to-door matching. In addition to the waiting times, other outputs from this workflow are the driver flow maps, the driver flow profiles, and the driver participation rates. These additional outputs are obtained at low marginal cost, yet they are important elements for evidence-based decision making in a stochastic carpooling service.

In Section 2 we present the theoretical reasons why door-to-door matching is insufficient to ensure a regular carpooling service. In Section 3 we detail our data science-GIS workflow for the analysis of GPS traces. In Section 4, we apply this workflow to an operational stochastic carpooling service to produce the passenger waiting times and the other outputs. We end with some concluding remarks.

As alluded to in the introduction, door-to-door matching of complete trajectories from the origin to the destination is a structural obstacle to the transformation of carpooling into a mass transit service. To illustrate the difficulties of passenger-driver matching in space and in time for door-to-door trajectories, we can represent it with a partition of a 3D cube into smaller sub-cubes, where the x-axis is the longitude, the y-axis the latitude and the z-axis the time, as shown in Figure 1. On the left, there are 9 sub-cubes, where each sub-cube represents the origin/destination of a door-to-door trajectory. The blue sub-cube in the lower left represents all the trajectories whose origins are, say, within a 5 km radius around a residential neighbourhood between 07:00 and 09:00 on Tuesday, and the green sub-cube the trajectories whose destinations are within a 5 km radius of the workplace between 08:00 and 10:00 on Tuesday.
So for two trajectories to match spatio-temporally in a door-to-door sense, they must share the same sub-cube for the origin, and similarly for the destination: this condition is met by the 1 pair of green and blue sub-cubes among all possible 27 pairs of sub-cubes. On the right, the conditions for a door-to-door matching are stricter, say the origin is within 1 km of the residential neighbourhood during 07:00 to 07:30, and the destination is within 1 km of the workplace during 08:30 to 09:00. This represents 1 pair out of 125 pairs of sub-cubes. Thus stricter door-to-door matching leads to fewer drivers being available to share their trajectories with passengers.

Figure 1: Spatio-temporal door-to-door matching fragments the population of mutualisable trajectories. (Left) Relaxed matching conditions. (Right) Restricted matching conditions. Blue sub-cube represents the origin (residential neighbourhood), green the destination (workplace), and trajectories which share the same origin and destination sub-cubes are considered to be door-to-door matches.

To supplement the heuristic observations for door-to-door matching in Figure 1, we demonstrate that the probability that two users (i.e. a driver and a passenger) share the same origin and destination at the same time decreases rapidly as the spatio-temporal matching conditions become more stringent. For the sake of simplicity, we suppose that the origin and destination for a driver and a passenger are represented by independent random variables which are uniform over all sub-cubes in Figure 1. Let $U^d_O$ and $U^d_D$ be the origin and destination of a driver, and likewise $U^p_O$, $U^p_D$ for a passenger. These quantities are all uniform random variables $U(\{1, \dots, n\})$ where $n$ is the number of sub-cubes in Figure 1. Then the probability of a door-to-door match between the driver and passenger is $p(n) = P(U^p_O = U^d_O, U^p_D = U^d_D)$.
Since an exact formula for this probability is difficult to obtain, we approximate it by a Monte Carlo re-sampling method. That is, we generate 1000 samples of $U^p_O, U^d_O, U^p_D, U^d_D$ and the probability of a door-to-door match is approximated as

$$\hat{p}(n) = \frac{1}{1000} \sum_{i=1}^{1000} \mathbf{1}\{U^p_{O,i} = U^d_{O,i}, U^p_{D,i} = U^d_{D,i}\}$$

where $\mathbf{1}\{\cdot\}$ is the indicator function. Figure 2 is the graph of the number of sub-cubes $n$ versus the approximate probability of a door-to-door match $\hat{p}(n)$. If there is only 1 sub-cube (i.e. no spatio-temporal constraints) the probability of a match is 1. This probabilistic certainty decreases rapidly as the spatio-temporal constraints are added: for 27 sub-cubes, this probability is 0.6, and for 125 sub-cubes, it falls to 0.2.

Figure 2: Probability of door-to-door matches for uniformly distributed drivers and passengers, as a function of the number of sub-cube partition classes. A higher number of sub-cubes represents more stringent spatio-temporal matching conditions.

The previous analysis was based on synthetic uniformly distributed origins and destinations. For a more realistic example, we analyse some data generated by an operational carpooling service. Our example is the 'Lane' carpooling service (lanemove.com) operated by Ecov, in conjunction with Instant System (instant-system.com), since May 2018 in the peri-urban regions around Lyon in south-eastern France. Our main data source is the GPS traces of drivers, which can be considered to be a form of crowd-sourced data collection [Lee and Liang, 2011]. Passenger GPS traces are more difficult to obtain, and as we are not able to replicate exactly the synthetic example of passenger-driver matching above, we use door-to-door matching of driver GPS traces to illustrate the diminishing probabilities. Since these GPS traces provide highly detailed spatio-temporal information, we are able to determine the number of strict door-to-door matches which also pass by two meeting points, as well as the number of matches when door-to-door matching is relaxed. For an illustrative example in Figure 3, we analyse 121 GPS traces of drivers who travelled from the Bourgoin meeting point (solid black circle labelled B) to the St-Priest meeting point (solid black circle labelled S) in the Lane carpooling service during the morning operating hours (06:30 to 09:00) for the work week 2019-11-25 to 2019-12-01. A hierarchical clustering with complete linkage was carried out on the spatial locations of these origins and destinations. The dissimilarity matrix used for this hierarchical clustering is composed of the Euclidean distances between the 4-vectors comprising the (origin longitude, origin latitude, destination longitude, destination latitude) of each trajectory. This dissimilarity takes into account both the origin and the destination, but not the intermediate GPS points, as the actual route taken is not critical for our purposes. We cut the dendrogram at $h = 6000$ to yield 9 spatial clusters. These clusters are represented with the different colours. So GPS traces with the same colour can be considered as door-to-door matches with each other.

The number of GPS traces per cluster is given in Table 1: as cluster 1 contains 63% of the mutualisable traces (76 of 121), this leaves the other 37% spread sparsely over the other 8 clusters, fragmenting the supply of the carpooling trajectories to passengers.

Door-to-door cluster     1   2   3   4   5   6   7   8   9   Total
Number of GPS traces    76  15   7   9   4   1   7   1   1     121

Table 1: Spatio-temporal door-to-door matching fragments the number of mutualisable trajectories in the Bourgoin > St-Priest carpooling line, during its morning operating hours 06:30–09:00, from 2019-11-25 to 2019-12-01. The first line is the door-to-door cluster label and the second line is the number of traces in each cluster.

To quantify the augmentation of the carpooling potential by relaxing door-to-door matching, we compare the door-to-door cluster with the largest cardinality (76 traces) from Table 1 to the number of the trajectories (121 traces) which coincide with this carpooling line regardless of their true origin and destination. These counts are an empirical equivalent of Figure 1: the left corresponds to the 121 meeting point (i.e. relaxed door-to-door) matches, whereas the right corresponds to the 76 door-to-door matches. Thus meeting point matching represents an increase of 45 traces or 59% of the carpooling driver potential due to relaxing door-to-door matching.

Figure 3: Spatio-temporal door-to-door matching fragments the number of mutualisable trajectories in an operational stochastic carpooling service. The clusters of GPS traces of door-to-door matches are colour coded, with the GPS points as the solid circles, and the origins/destinations as the solid diamonds. The meeting points are the solid black circles, denoted B = Bourgoin, S = St-Priest.

Furthermore, Stiglic et al. [2015] and Li et al. [2018] provide more complex synthetic models to affirm that meeting points are essential to the feasibility of mass carpooling services, and assert that it is almost impossible for a carpooling service to be based on door-to-door spatio-temporal matching.

Whilst these examples demonstrate that incentivising drivers to converge to meeting points, rather than relying on door-to-door matching, increases the potential pool of mutualisable journeys, we have not yet demonstrated that this leads to reduced waiting times. This would be straightforward for the synthetic examples, but this is not the case for empirically observed driver and passenger journeys. In the next section we introduce a general workflow which indeed allows us to confirm these reduced waiting times for empirical GPS traces.
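The clustering step that produces the door-to-door clusters can be sketched in a few lines. This is a minimal illustration rather than the implementation used for Table 1: it assumes that the origin and destination coordinates have already been projected to metres (so that a cut height such as h = 6000 is a distance), and the function name `complete_linkage_clusters` and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def complete_linkage_clusters(points, h):
    """Agglomerative clustering with complete linkage, cut at height h.

    points: list of 4-tuples (origin_x, origin_y, dest_x, dest_y), assumed
            to be in projected coordinates (metres).
    Returns the clusters as lists of indices, largest cluster first.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (ia, ib), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > h:
            # Cutting the dendrogram at h: no merge above this height is made.
            break
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters, key=len, reverse=True)


# Toy example: three journeys with nearby origins and destinations,
# and one journey starting about 20 km away.
toy_journeys = [
    (0, 0, 10000, 0),
    (500, 0, 10500, 0),
    (300, 400, 10000, 300),
    (20000, 0, 10000, 0),
]
toy_clusters = complete_linkage_clusters(toy_journeys, h=6000)
largest_cluster = toy_clusters[0]  # plays the role of the door-to-door matches
```

Because complete-linkage merge heights never decrease, stopping at the first merge above h gives the same partition as building the full dendrogram and cutting it at h. For more than a few hundred traces a library implementation would be preferable to this cubic-time loop.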
The GPS traces analysis workflow is illustrated in Figure 4. The left rectangle of Figure 4 contains the main data sources: the GPS traces, the meeting point locations, the origin-destination matrices, the route finder API and the base maps. The first two are supplied in-house by the carpooling service provider, the origin-destination matrices are usually supplied by a third party which has carried out a mobility survey (e.g. a national statistical agency INSEE [2018]), the route finder API is provided by a GPS navigation operator (e.g. TomTom [2019]), and the base maps are accessed from a cartography provider (e.g. OpenStreetMap contributors [2019]). There are specialised data wrangling techniques specific to spatial databases, known collectively as geoprocessing, and these are carried out, in conjunction with traditional data wrangling, in the central rectangle. The critical geoprocessing task concerns the topological simplification of the GPS traces onto the carpooling network. Whilst GPS traces are a rich source of information on driver behaviour, they are voluminous and complex. Our approach is based on network analysis tools [Guidotti et al., 2017] and complexity reduction/harmonisation algorithms [Douglas and Peucker, 2011]. This topological simplification is essential to be able to mutualise GPS traces which share common arrival times at the carpooling meeting points. Once these GPS traces are in a suitable format, we are able to produce the required outputs in the right rectangle, namely the predicted waiting times, the driver flow maps, the driver flow temporal profiles and the driver participation rates.

Figure 4: Data science-GIS workflow for the analysis of driver GPS traces in a stochastic carpooling service. (Left) Spatio-temporal input data sources. (Centre) Data wrangling and geoprocessing tasks. (Right) Generated outputs.

Our primary data source is the driver GPS traces. A GPS trace is represented by an $\ell$-sequence of triplets $\mathbf{X} = \{(X_i, Y_i, T_i)\}_{i=1}^{\ell}$ where $(X_i, Y_i)$ are the longitude, latitude coordinates of the GPS sensor at the $i$th timestamp $T_i$. We have $n$ GPS traces $\mathbf{X}_1, \dots, \mathbf{X}_n$ in the data collection period. The $m$ meeting point locations are represented by their GPS coordinates $M_1, \dots, M_m$. The origin-destination matrix is such that its $(j, k)$th entry is the number of journeys from the $j$th origin to the $k$th destination. In addition to the origin-destination matrix, we have the GPS coordinates of the origins and the destinations. Whilst it is common that they coincide, this is not required for our workflow. The base maps are graphics files of maps of the study area, which facilitate fast and accurate map rendering at any desired scale.

From the $m$ meeting points $M_1, \dots, M_m$, a directed graph is constructed where the meeting points are the nodes, and an edge is drawn between two nodes if carpooling between these two meeting points is guaranteed by the service provider. Thus a carpooling line is represented by an acyclic sub-graph with at least two nodes.

The crucial data wrangling/geoprocessing process applied to the GPS traces is the topological simplification of GPS traces on a carpooling line. Around each of the $m$ meeting points, a buffer zone of 1 km radius is drawn to obtain $B(M_1), \dots, B(M_m)$. The intersections of the buffer zones and the GPS trace, $\mathbf{X} \cap B(M_1), \dots, \mathbf{X} \cap B(M_m)$, are $m$ sub-sequences of the GPS points of $\mathbf{X}$. For those meeting points with non-empty intersections, we consider that the driver is able to collect a passenger at these points without onerous detours.

This spatial intersection only considers the spatial proximity of the driver to a passenger at a meeting point. For the carpooling to succeed, they also need to be in temporal proximity. Among the spatial intersections $\mathbf{X} \cap B(M_1), \dots, \mathbf{X} \cap B(M_m)$, we examine the corresponding timestamps and retain only those in a suitably restrained time interval. If this reduced set of spatio-temporal intersections is non-empty then we proceed to the last data wrangling/geoprocessing step.

We compute the closest GPS points in $\mathbf{X}$ to the meeting points $M_j$, as defined by $\mathbf{X}_{M_j} = \{(X_k, Y_k, T_k) : k = \operatorname{argmin}_{1 \le i \le \ell} \|(X_i, Y_i) - M_j\|\}$, $j = 1, \dots, m$. From this closest point $\mathbf{X}_{M_j}$, we can extract the corresponding timestamp to be an estimate of the arrival time at the meeting point $M_j$. As an example, if the meeting points $M_1, M_2$ form the carpooling line $\{M_1 > M_2\}$, and if the GPS trace $\mathbf{X}$ has well-defined estimated arrival times at $M_1$ and $M_2$, then we are able to reduce the complexity of the GPS trace. That is, the $\ell$ points of $\mathbf{X}$ can be reduced to the sequence of 4 points

$$\tilde{\mathbf{X}}(M_1, M_2) = \{(X_1, Y_1, T_1) > \mathbf{X}_{M_1} > \mathbf{X}_{M_2} > (X_\ell, Y_\ell, T_\ell)\}$$

where $(X_1, Y_1, T_1)$ is the driver origin and $(X_\ell, Y_\ell, T_\ell)$ is the driver destination. With this simplified trace $\tilde{\mathbf{X}}(M_1, M_2)$, we are still able to determine if the driver can fulfil a passenger request at a given time on the carpooling line $\{M_1 > M_2\}$. The complex topology of $\mathbf{X}$ is thus simplified by retaining a small number of key derived indicators [Lee and Liang, 2011].

We repeat these data wrangling/geoprocessing steps for all $n$ GPS traces. The result is a reduced set of $\tilde{n}$ GPS traces which correspond to the driver journeys which closely resemble the spatio-temporal characteristics of the likely passenger requests along the carpooling line.
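As a rough illustration, the simplification steps above (buffer test, temporal filter, closest point, reduced sequence) can be sketched as follows. This is a sketch under stated assumptions, not the production geoprocessing code: it uses the haversine great-circle distance for the 1 km buffer, treats timestamps as plain numbers, applies the time window to the first meeting point only, and adds a travel-order check that is an assumption of this sketch. The names `haversine_m` and `simplify_trace` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, adequate for a 1 km buffer test


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def simplify_trace(trace, line, buffer_m=1000.0, window=None):
    """Reduce a GPS trace to origin > meeting points > destination.

    trace:  list of (lon, lat, t) triplets in time order.
    line:   list of (lon, lat) meeting points in travel order.
    window: optional (t_min, t_max) for the arrival at the first meeting point.
    Returns the simplified list of triplets, or None if the trace does not
    match the carpooling line.
    """
    nodes = []
    for m_lon, m_lat in line:
        # Closest GPS point to the meeting point (the argmin in the text).
        closest = min(trace, key=lambda p: haversine_m(p[0], p[1], m_lon, m_lat))
        if haversine_m(closest[0], closest[1], m_lon, m_lat) > buffer_m:
            return None  # empty intersection with the buffer zone
        nodes.append(closest)
    arrivals = [p[2] for p in nodes]
    if arrivals != sorted(arrivals):
        return None  # assumption of this sketch: respect the line direction
    if window is not None and not (window[0] <= arrivals[0] <= window[1]):
        return None  # outside the restrained time interval
    return [trace[0], *nodes, trace[-1]]


# Toy trace heading west, with timestamps in seconds, and a two-node line.
toy_trace = [
    (5.30, 45.60, 0), (5.25, 45.60, 300), (5.10, 45.60, 900),
    (5.00, 45.60, 1500), (4.95, 45.60, 1800),
]
toy_line = [(5.251, 45.601), (5.001, 45.599)]
toy_simplified = simplify_trace(toy_trace, toy_line)
```

Testing whether the closest point lies within the buffer radius is equivalent to testing whether the intersection of the trace with the buffer zone is non-empty, so the buffer and argmin steps share a single distance computation.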

For the first output in the workflow in Figure 4, if we visualise the GPS traces of the reduced set of $\tilde{n}$ meeting point matches with the base maps, then we obtain a map of the driver flow that matches the passengers in the carpooling line, as in Figure 3. For the second output in the workflow, suppose that the initial time interval of interest is divided into $n_T$ sub-intervals $\tau_j$, $j = 1, \dots, n_T$, since we wish to quantify the driver flow at a higher temporal resolution. Computing the driver flows $f(\tau_j)$, $j = 1, \dots, n_T$, is straightforward as it only requires an enumeration of the simplified GPS traces whose estimated arrival times fall within each sub-interval $\tau_j$. That is, the driver flow for the carpooling line $\{M_1 > M_2\}$ during the time interval $\tau_j$ is

$$f(\tau_j, M_1, M_2) = \#\{i : \tilde{\mathbf{X}}_i(M_1, M_2) \in \tau_j, \ i = 1, \dots, \tilde{n}\}. \qquad (1)$$

For the third output in the workflow, let $W(t)$ be the waiting time until the driver arrival for a carpool request made at time $t$. For stochastic carpooling, since a specific driver is not dispatched to the given passenger, the problem is equivalent to the arrival time of the first driver from the population of available drivers. Assuming a Poissonian driver arrival process, the waiting time and the driver flow are inversely proportional to each other, $W(t) \propto \mathrm{len}(\tau_j)/f(\tau_j)$ where $t \in \tau_j$ and $\mathrm{len}(\tau_j)$ is the length of the time interval $\tau_j$. For simplicity, we set the constant of proportionality to 1 as this corresponds to the assumption that all geolocated drivers are willing to respond to a carpooling request. It is a reasonable assumption that the vast majority of geolocated drivers are willing to pick up a passenger, according to unpublished evidence supplied by Ecov. Thus for the carpooling line $\{M_1 > M_2\}$, the passenger waiting times for the time interval $\tau_j$, $j = 1, \dots, n_T$, are

$$W(\tau_j, M_1, M_2) = \mathrm{len}(\tau_j)/f(\tau_j, M_1, M_2). \qquad (2)$$

For the fourth output in the workflow, the driver participation rate is $P = n_1/n_0$ where $n_1$ is the total number of the drivers who are motivated to carpool in response to a passenger request, and $n_0$ is the total number of drivers who undertake journeys in the same geographical region as the carpooling service. Both $n_1$ and $n_0$ are difficult to define and to estimate precisely. We propose $\tilde{n}$, calculated above as the number of drivers who share their geolocation, as our proxy for $n_1$, as the vast majority of carpooling journeys are assured by drivers who are willing to share their geolocation.

To enumerate all $n_0$ drivers in the same geographical region as the carpooling network is difficult since the GPS traces for all drivers are not available. Our proxy ($\tilde{n}_0$) is derived from inferring likely trajectories from the reference origin-destination matrix. Usually this origin-destination matrix is provided at the county-level, but this is insufficiently detailed to decide if the drivers match with the meeting points on the carpooling lines. So we infer likely trajectories. These inferred likely trajectories are determined as the fastest route from the origins (county centroids) to the destinations (county centroids) by a route finder API. We employ a route finder API rather than an explicit model-based methodology, e.g. Tang et al. [2016], to infer these most likely routes. Model-based methods are the product of extensive theoretical and empirical work, and these tend to be difficult to access due to their proprietary nature. They also tend to be limited to dense urban regions, which are not the target regions for stochastic carpooling. Thus $\tilde{n}_0$ is the sum of the driver flow from all origin-destination pairs whose likely trajectories coincide with the carpooling lines.
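The driver flow of Equation (1), the waiting time of Equation (2) and the participation rate as the ratio of the two proxies can be sketched in code. The following is a minimal sketch under assumptions that are not stated in the text: arrival times are taken at the first meeting point and expressed in minutes since midnight, sub-intervals are half-open, and a sub-interval with no driver is given an infinite predicted waiting time. All function names and the toy numbers are hypothetical.

```python
def driver_flow(arrival_times, interval_edges):
    """Count the simplified traces arriving in each sub-interval (Equation 1).

    arrival_times:  estimated arrival times at the first meeting point,
                    in minutes since midnight (an assumption of this sketch).
    interval_edges: increasing edges e_0 < e_1 < ... defining the half-open
                    sub-intervals [e_j, e_{j+1}).
    """
    flows = [0] * (len(interval_edges) - 1)
    for t in arrival_times:
        for j in range(len(flows)):
            if interval_edges[j] <= t < interval_edges[j + 1]:
                flows[j] += 1
                break
    return flows


def waiting_times(flows, interval_edges):
    """Predicted waiting time len(tau_j) / f(tau_j) per sub-interval (Equation 2).

    The constant of proportionality is set to 1, as in the text. A sub-interval
    without any driver gets an infinite waiting time (assumption of this sketch).
    """
    waits = []
    for j, flow in enumerate(flows):
        length = interval_edges[j + 1] - interval_edges[j]
        waits.append(length / flow if flow > 0 else float("inf"))
    return waits


def participation_rate(n_tilde, n0_tilde):
    """Ratio of the geolocated matching drivers to the inferred total flow."""
    return n_tilde / n0_tilde


# Toy example: 06:30 to 09:00 split into 30-minute sub-intervals.
toy_edges = [390, 420, 450, 480, 510, 540]
toy_arrivals = [395, 400, 455, 481, 485, 490, 500]
toy_flows = driver_flow(toy_arrivals, toy_edges)
toy_waits = waiting_times(toy_flows, toy_edges)
toy_rate = participation_rate(sum(toy_flows), 70)  # 70 is an invented total flow
```

In this toy example the busiest sub-interval (08:00 to 08:30) has four drivers, so its predicted waiting time is 30/4 = 7.5 minutes, whereas a sub-interval with a single driver has a predicted waiting time equal to its full length.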
The driver participation rate for a carpooling line $\{M_1 > M_2\}$ is

$$\tilde{P} = \tilde{n}/\tilde{n}_0 \qquad (3)$$

where $\tilde{n} = \sum_{j=1}^{n_T} f(\tau_j, M_1, M_2)$ from Equation (1).

Since there is no comparable door-to-door carpooling service operating concurrently with the meeting-point stochastic carpooling service, a direct comparison of empirical passenger waiting times is not possible. Instead, we propose an indirect comparison in three stages: (i) extract all driver GPS traces which connect two meeting points in a restrained time interval, as the meeting point matches, (ii) extract the largest hierarchical cluster of these GPS traces to serve as the door-to-door matches, and (iii) compute the driver flows using Equation (1) for both sets of matches, and then convert them using Equation (2) to passenger waiting time predictions.

In addition to the waiting times as an output, there are also the driver flow maps, the driver flow temporal profiles, and the driver participation rates. All these outputs are useful in understanding the transport mix of the local area as well as the market penetration of carpooling into the transport mix.

Our case study focuses on the Lane carpooling service introduced earlier. Before we progress further into the data analysis of the driver GPS traces, we describe the operational details of this stochastic carpooling service. The physical meeting points require an integrated infrastructure to facilitate this real-time stochastic matching, as illustrated in Figure 5. The orange structure on the right functions like a bus shelter to provide protection from inclement weather whilst the passenger waits, and serves as a prominent visual point of reference for drivers on the road. The passenger makes a carpooling request on the console (the green device with a horizontal yellow stripe). This request is displayed on the electronic sign on the roadside.
In this configuration, the electronic sign is located close to the meeting point, but this can vary considerably according to the local geographical characteristics. A driver who wishes to embark the passenger in response to their request is able to do so safely in the reserved parking place.

Figure 5: Configuration of a physical meeting point for the 'Lane' carpooling service. The orange structure is like a bus shelter. A passenger notifies potential drivers of their carpooling request using the console, which is then displayed on the roadside electronic sign. A driver can safely embark the passenger in the reserved parking place. Reproduced with permission from Ecov.

The schematic diagram of the carpooling lines in the Lane network is shown on the left of Figure 6. The visual similarity of this schematic to those of bus or train services is deliberate: it is designed to induce the perception of carpooling as a form of public transport. There are 5 physical meeting points (Lyon Mermoz, St-Priest Parc Techno, Aéroport Lyon-St Exupéry, Villefontaine The Village, and Bourgoin La Grive Sortie 7), denoted by the circles with the stylised L, which function analogously to bus stops. The coloured lines connect the meeting points that, according to mobility studies in this territory, have a sufficient driver flow between them to maintain a carpooling service with stochastic matching. These connected meeting points form a carpooling line, analogous to a bus line, where carpooling is only available between these meeting points.

This carpooling network is represented as a directed graph, as shown on the right of Figure 6, where each node is a meeting point and an edge connects two nodes if they form a segment of a carpooling line. For brevity, the node labels are abbreviated to the first letter, i.e. L = Lyon Mermoz, S = St-Priest Parc Techno, A = Aéroport Lyon-St Exupéry, V = Villefontaine The Village, and B = Bourgoin La Grive Sortie 7. We focus on the most frequented carpooling line, that is, the Bourgoin > St-Priest line (green line in Figure 6). The topology of the road network ensures that most of the journeys from Bourgoin to St-Priest also pass by the Villefontaine meeting point, that is, the B > S carpooling line includes both sub-graphs B > S and B > V > S. The period of data collection is 2019-07-25 (service launch date) to 2020-02-17 (last date for which consistent driver GPS traces are available), during the most frequented time period (the morning operating hours 06:30–09:00).

Figure 6: (Left) Schematic diagram of the Lane carpooling network, which resembles the geographically restrained trajectories of a public transport service. Reproduced with permission from Ecov. (Right) Carpooling network represented as a directed graph. Nodes are the meeting points; edges connect meeting points whenever a carpooling service between them is assured.

A complete GPS trace X is displayed as the sequence of 530 blue circles in Figure 7. This GPS trace passes within 1 km of the B, V and S meeting point nodes, so its simplified topology consists of the 5-node sequence {origin > B > V > S > destination}, shown as the orange arrows. This simplified GPS trace represents a data compression rate of over 99%, yet it retains the information needed to decide the matching potential of this driver GPS trace with a passenger request on the Bourgoin > St-Priest carpooling line. Table 2 reports the average data compression for the ñ = 121 GPS traces on the Bourgoin > St-Priest line. The first column is the average number of GPS points in the complete driver traces X, the second is the average number of GPS points in their simplified topologies, and the last column is the average data compression rate, i.e. one minus the ratio of the number of points in the simplified topology to the number of points in the complete trace.

Figure 7: Topological simplification of a GPS trace in the Bourgoin > St-Priest carpooling line, during its morning operating hours 06:30–09:00, on 2019-11-28. The complete GPS trace is shown as the 530 blue circles; the sequence of five nodes, its simplified topology, is shown as the orange arrows; and the orange diamonds are the origin, carpooling meeting point and destination nodes. The meeting points are S = St-Priest, V = Villefontaine, and B = Bourgoin.

Of the ñ = 121 GPS traces that follow the Bourgoin > St-Priest carpooling line, 31 have an arrival time at Bourgoin within 08:00 to 08:30, i.e., they are a close spatio-temporal match for a passenger request for a departure at the Bourgoin meeting point between 08:00 and 08:30, with a destination at the St-Priest meeting point. Of these 31 GPS traces, 17 are door-to-door matches (defined as belonging to the largest door-to-door cluster of GPS traces in Table 1). These 17 traces are both door-to-door and meeting point matches, and their simplified traces are displayed in Figure 8 as the orange diamonds/arrows. The simplified traces of the remaining 14 traces, which are meeting point but not door-to-door matches, are the blue diamonds/arrows. The latter represent an 82% increase in the number of drivers (from 17 to 31) who can potentially respond to a passenger carpooling request on the Bourgoin > St-Priest line.

Table 3 summarises the weekly evolution of the impact of meeting point matching over door-to-door matching. The first set of three columns focuses on the entire morning operating hours (06:30–09:00), whereas the second set focuses on the single 30 minute period (08:00–08:30), as this restricted period is a more realistic time frame that potential passengers are willing to wait for a driver to arrive. Within each set, the first column contains the weekly aggregate number of door-to-door matches, the second the number of meeting point matches, and the third the percentage increase due to meeting point matches, i.e. (meeting point matches − door-to-door matches) / door-to-door matches. The number of door-to-door matches is enumerated from a similar hierarchical clustering to that in Table 1, and the number of meeting point matches is computed from Equation (1).
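The topological simplification and the matching gain described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coordinates are invented for illustration (they are not the real meeting point locations), the trace is synthetic, and the function names `simplify_trace`, `follows_line`, `compression_rate` and `percentage_increase` are hypothetical. Only the 1 km radius, the 530-point and 5-node counts, and the 17 and 31 match counts come from the text.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def simplify_trace(trace, meeting_points, radius_km=1.0):
    """Reduce a GPS trace to: origin > meeting points passed > destination."""
    nodes = ["origin"]
    for point in trace:
        for name, location in meeting_points.items():
            # Record a meeting point the first time the trace comes within radius_km of it.
            if name not in nodes and haversine_km(point, location) <= radius_km:
                nodes.append(name)
    nodes.append("destination")
    return nodes

def follows_line(nodes, start, end):
    """True if the simplified trace visits `start` and, later, `end`."""
    return start in nodes and end in nodes and nodes.index(start) < nodes.index(end)

def compression_rate(n_complete, n_simplified):
    """One minus the ratio of simplified to complete point counts."""
    return 1 - n_simplified / n_complete

def percentage_increase(door_to_door, meeting_point):
    """(meeting point matches - door-to-door matches) / door-to-door matches, in %."""
    return 100 * (meeting_point - door_to_door) / door_to_door

# Illustrative coordinates only (NOT the real meeting point locations).
meeting_points = {"B": (45.60, 5.25), "V": (45.62, 5.15), "S": (45.70, 4.95)}
waypoints = [(45.58, 5.35), meeting_points["B"], meeting_points["V"],
             meeting_points["S"], (45.74, 4.85)]

# Synthetic trace: straight-line interpolation between consecutive waypoints.
trace = []
for (lat1, lon1), (lat2, lon2) in zip(waypoints, waypoints[1:]):
    for k in range(50):
        t = k / 50
        trace.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
trace.append(waypoints[-1])

nodes = simplify_trace(trace, meeting_points)
print(nodes)                                   # ['origin', 'B', 'V', 'S', 'destination']
print(follows_line(nodes, "B", "S"))           # True
print(round(compression_rate(530, 5), 4))      # 0.9906, i.e. over 99%
print(round(percentage_increase(17, 31), 1))   # 82.4
```

A trace whose simplified topology passes `follows_line(nodes, "B", "S")` is a spatial candidate for the Bourgoin > St-Priest line; the temporal part of the match (arrival at Bourgoin within the requested window) would be checked separately on the timestamps.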
This table demonstrates that the increase in the driver flow due to meeting point matching is maintained over the entire data collection period.

Figure 8: Matching on meeting points increases the number of driver spatio-temporal matches in comparison to door-to-door matching, during a single 30 minute period (08:00–08:30), on 2019-11-28. The orange arrows are the 17 GPS traces which are both meeting point and door-to-door matches, and the blue arrows are the 14 GPS traces which are meeting point matches but not door-to-door matches. The diamonds are the origin/destination points. The solid black circles are the meeting points: S = St-Priest, V = Villefontaine, and B = Bourgoin.

The simplified GPS traces also allow us to compute the driver flows for narrower time intervals than the 2.5 hour and 0.5 hour intervals in Table 3. Following Smith and Demetsky [1997] and McShane and Roess [1990], we take 15 minute intervals as a suitable choice because driver flows are less stable over shorter intervals. Table 4 displays the average driver flow for 15 minute intervals from 06:30 to 09:00. For robustness, we aggregate these driver flows over all weeks in the data collection period when applying Equation (1), since we have increased the intra-day temporal resolution.

It is straightforward to convert these average driver flows in Table 4 into predicted waiting times using Equation (2). Suppose that a passenger makes a carpool request at 08:10 at the Bourgoin meeting point to travel to St-Priest. The expected waiting time is the length of the interval divided by the average driver flow in the interval 08:00–08:15, i.e. 7.5 minutes.
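The conversion from driver flow to waiting time in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the flow of 2 drivers for 08:00–08:15 is inferred from the 7.5 minute figure quoted in the text (15 / 7.5 = 2), and the other flow value is invented, since Table 4 is not reproduced here.

```python
def interval_start(request_time, interval_minutes=15):
    """Map an 'HH:MM' request time to the start of its interval.

    Assumes interval_minutes divides 60, so intervals align with the hour.
    """
    hours, minutes = map(int, request_time.split(":"))
    minutes -= minutes % interval_minutes
    return f"{hours:02d}:{minutes:02d}"

def predicted_waiting_time(average_driver_flow, interval_minutes=15):
    """Interval length divided by the average driver flow in that interval."""
    if average_driver_flow <= 0:
        raise ValueError("average driver flow must be positive")
    return interval_minutes / average_driver_flow

# Hypothetical excerpt of average driver flows per 15 minute interval.
# 2.0 is inferred from the 7.5 minute example; 3.0 is made up.
average_driver_flow = {"08:00": 2.0, "08:15": 3.0}

slot = interval_start("08:10")
wait = predicted_waiting_time(average_driver_flow[slot])
print(slot, wait)  # 08:00 7.5
```

Guarding against a zero flow matters in practice: an interval with no observed drivers has no finite predicted waiting time under this formula.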
Given that we have already established a highly detailed spatio-temporal profile of the average driver flow for a carpooling line in Table 4, the predicted waiting times at the same temporal resolution are shown in Table 5.

For the Bourgoin > St-Priest carpooling line from 2019-07-25 to 2020-02-17, we observed roughly 1500 carpooling requests with a reliably recorded waiting time. Each box plot in Figure 9 displays the observed waiting times for each 15 minute interval during the morning operating hours with at least one observed waiting time.

The accuracy of these predicted waiting times with respect to the observed ones is illustrated in Figure 10. Each box plot displays the RMSE (Root Mean Squared Error) between the observed and the predicted waiting times (from Table 5) for each 15 minute interval. For all 15 minute intervals, the median RMSE is around 2 to 4 minutes, which implies that the predicted waiting times track the observed waiting times fairly closely. This gives us confidence in our method for predicting waiting times in a stochastic carpooling service.

Table 3 column headers: Week | Driver flow (06:30–09:00) | Driver flow (08:00–08:30)
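The RMSE comparison between observed and predicted waiting times can be illustrated as follows. This is only a sketch: the observed waiting times are invented, the function name `rmse` is hypothetical, and the exact grouping used to produce the box plots in Figure 10 is not specified in the text, so the example simply compares a handful of observations in one interval against a single predicted value.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed values and one predicted value."""
    if not observed:
        raise ValueError("at least one observed waiting time is required")
    squared_errors = [(value - predicted) ** 2 for value in observed]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented observed waiting times (minutes) for one 15 minute interval,
# compared against the 7.5 minute prediction used as an example in the text.
observed_waits = [5.0, 9.0, 10.0]
error = rmse(observed_waits, 7.5)
print(round(error, 2))  # 2.22
```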

A key question for the service provider is: what driver participation rate leads to passenger waiting times of around 5 to 10 minutes, as observed in Figure 9? To answer this question, we first need to enumerate the population of all drivers on a carpooling line. The county-level origin-destination matrix of home-work trajectories from the French official statistical agency [INSEE, 2018] is insufficiently detailed to decide whether the drivers with these origins and destinations travel on the carpooling lines. So we infer likely trajectories, taken to be the fastest routes returned by the TomTom route finder API [TomTom, 2019] starting on a Tuesday at 8am from the origins (county centroids) to the destinations (county centroids). A spatial intersection, similar to that carried out for the driver GPS traces, is computed to determine which trajectories pass within 1 km of the carpooling meeting points. These trajectories are shown in Figure 11. Note that there is no temporal information attached to this origin-destination matrix, but since these are home-work trajectories, we suppose that they are made in the morning peak hours, which matches the time interval of the driver GPS traces.

Figure 11: Likely driver itineraries from the TomTom route finder API in the same geographical region as the Bourgoin > St-Priest carpooling line. The origins and destinations (county centroids) are the orange diamonds. The solid black circles are the meeting points: S = St-Priest, B = Bourgoin.

If we then aggregate the corresponding driver flows in the origin-destination matrix, we obtain 3821 drivers whose likely trajectories match the Bourgoin > St-Priest carpooling line. From Table 4, there is a daily average of 20 driver GPS traces between 06:30 and 09:00. This yields an estimated driver participation rate of P̃ = 20/3821 ≈ 0.005, i.e. about 0.5%, from Equation (3).
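The participation rate estimate above is a simple ratio, which the following Python sketch reproduces. The second function is an added illustration, not the paper's method: it assumes that the driver flow scales proportionally with the participation rate, so that the predicted waiting time scales inversely with it. Both function names are hypothetical, and the baseline waiting time used in the example is the 7.5 minute figure from the earlier worked example, not a value taken from Figure 12.

```python
def participation_rate(observed_drivers, potential_drivers):
    """Share of potential drivers on the line who are observed participating."""
    return observed_drivers / potential_drivers

def scaled_waiting_time(baseline_wait, baseline_rate, target_rate):
    """Waiting time at a new participation rate.

    ASSUMPTION (not from the paper): driver flow is proportional to the
    participation rate, so waiting time is inversely proportional to it.
    """
    if target_rate <= 0:
        raise ValueError("target participation rate must be positive")
    return baseline_wait * baseline_rate / target_rate

rate = participation_rate(20, 3821)
print(round(100 * rate, 2))  # 0.52 (percent)

# Illustrative projection from a 7.5 minute baseline wait.
for target in (0.01, 0.05):
    print(target, round(scaled_waiting_time(7.5, rate, target), 2))
```

Under this proportionality assumption, doubling the participation rate halves the predicted waiting time, which gives a rough sense of why modest increases in participation matter to the service provider.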
Even with this low driver participation rate, average passenger waiting times of 5–10 minutes are observed in Figure 9. If we were able to increase this low driver participation rate even modestly (to 1% or 5%), then the predicted passenger waiting times would fall substantially, as illustrated in Figure 12. In this case, these waiting times would be lower than those of bus lines and approach those of high frequency metro/subway train lines. The methods for increasing driver participation, as they lie largely outside of data science, are out of the scope of this paper but are of intense interest to the service provider [Zhu, 2017, 2018, 2020].

Figure 12: Evolution of the predicted passenger waiting time as a function of the driver participation rate in the Bourgoin > St-Priest carpooling line, during the morning operating hours 06:30–09:00, from 2019-07-25 to 2020-02-17.

Stochastic real-time carpooling services differ from competing services which offer deterministic door-to-door matching for complete trajectories. Whilst the latter offer a high level of personal convenience in highly urbanised regions, door-to-door matching structurally inhibits mass adoption of carpooling, especially in peri-urban regions. Relaxing the strict door-to-door matching, and subsequently implementing stochastic meeting point matching, allows for the mutualisation of high throughput road segments, and thus removes this obstacle. We introduced a novel data science-GIS workflow for a stochastic carpooling service. The crucial mathematical abstraction in this workflow is to reduce the complexity of driver GPS traces to a graph-based topology of the carpooling network. We illustrated this workflow on an operational stochastic carpooling service in a peri-urban region in south-eastern France.
We provided quantitative justifications that the physical meeting points, by facilitating a critical mass of drivers and passengers drawn from a much larger geographical area, lead to passenger waiting times which are lower than those achieved by door-to-door matching. Our workflow is a novel combination of two closely related, but historically separate, disciplines: data science and GIS. In addition to predicting the passenger waiting times, it delivers driver flow maps, driver flow spatio-temporal profiles, and driver participation rates. This workflow forms a solid prototype for other workflows to accompany the future expansion of stochastic carpooling services to address the mobility requirements of neglected peri-urban regions.

The authors thank Ecov for providing the data sets of driver GPS traces and passenger waiting times. The authors also thank Bertrand Michel from the Central Engineering School of Nantes and Gérard Biau from Sorbonne University for their feedback.

References

Carol Cooper. Successfully changing individual travel behavior: Applying community-based social marketing to travel choice. Transportation Research Record, 2021:89–99, 2007.

David H. Douglas and Thomas K. Peucker. Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Classics in Cartography: Reflections on Influential Articles from Cartographica, 10:15–28, 2011. doi: 10.1002/9780470669488.ch2.

Masabumi Furuhata, Maged Dessouky, Fernando Ordóñez, Marc-Etienne Brunet, Xiaoqing Wang, and Sven Koenig. Ridesharing: The state-of-the-art and future directions. Transportation Research Part B: Methodological, 57:28–46, 2013.

Keivan Ghoseiri. Dynamic Rideshare Optimized Matching Problem. PhD thesis, University of Maryland, 2012.

Riccardo Guidotti, Mirco Nanni, Salvatore Rinzivillo, Dino Pedreschi, and Fosca Giannotti. Never drive alone: boosting carpooling with network analysis. Information Systems, 64:237–257, 2017.

INSEE. Mobilités professionnelles en 2015: déplacements domicile–lieu de travail. National Institute of Statistics and Economic Studies [INSEE], France, 2018. In French.

Dong Woo Lee and Steve H. L. Liang. Crowd-sourced carpool recommendation based on simple and efficient trajectory grouping. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on Computational Transportation Science, pages 12–17, 2011. doi: 10.1145/2068984.2068987.

Xin Li, Sangen Hu, Wenbo Fan, and Kai Deng. Modeling an enhanced ridesharing system with meet points and time windows. PLOS ONE, 13:1–19, 2018.

William R McShane and Roger P Roess. Traffic Engineering. Prentice-Hall, 1990.

OpenStreetMap contributors. OpenStreetMap, 2019.

Susan A. Shaheen, Nelson D. Chan, and Teresa Gaynor. Casual carpooling in the San Francisco Bay Area: Understanding user characteristics, behaviors, and motivations. Transport Policy, 51:165–173, 2016.

Susan A. Shaheen, Adam Stocker, and Marie Mundler. Online and app-based carpooling in France: Analyzing users and practices – a study of BlaBlaCar. In Gereon Meyer and Susan Shaheen, editors, Disrupting Mobility: Impacts of Sharing Economy and Innovative Transportation on Cities, pages 181–196. Springer International Publishing, 2017.

Brian L Smith and Michael J Demetsky. Traffic flow forecasting: comparison of modeling approaches. Journal of Transportation Engineering, 123:261–266, 1997.

Mitja Stiglic, Niels Agatz, Martin Savelsbergh, and Mirko Gradisar. The benefits of meeting points in ride-sharing systems. Transportation Research Part B: Methodological, 82:36–53, 2015.

Jinjin Tang, Ying Song, Harvey J. Miller, and Xuesong Zhou. Estimating the most likely space–time paths, dwell times and path uncertainties from vehicle trajectory data: A time geographic method. Transportation Research Part C: Emerging Technologies, 66:176–194, 2016.

TomTom. Routing API and Extended Routing API, 2019. https://developer.tomtom.com/routing-api .

Hai Wang and Hai Yang. Ridesourcing systems: A framework and review. Transportation Research Part B: Methodological, 129:122–155, 2019.

Dianzhuo Zhu. More generous for small favour? Exploring the role of monetary and pro-social incentives of daily ride sharing using a field experiment in rural Île-de-France. DigiWorld Economic Journal, 108:77–97, 2017.

Dianzhuo Zhu. The limit of money in daily ridesharing: Evidence from a field experiment. Technical report, PSL, University of Paris-Dauphine, 2018.

Dianzhuo Zhu.