[PDF] A spatial algorithm for the analysis of transportation systems using statistical model checking

Abstract

We present an automated methodology for using Automatic Vehicle Location measurements of public transportation vehicles to construct a probabilistic model. The model not only allows for accurate evaluation of service performance, but also makes it possible to study the effects of system modifications a priori. The methodology is almost entirely agnostic to otherwise important details of the service -- in particular its route and the location of stops. Instead, it infers this from the data using automated map generation techniques. The behaviour of vehicles in the model is analysed using computer simulation combined with statistical model checking. We present two case studies involving the Airlink service in Edinburgh and the Bellevue Express in Seattle. To demonstrate the usefulness of the approach, we analyse the impact of the scheduling strategies of bus holding and speed modification on the Airlink's performance. The data and code used to create the figures are publicly available online.

Full PDF

AA spatial algorithm for the analysis of transportation systems using statisticalmodel checking

DANIËL REIJSBERGEN,

Singapore University of Technology and Design

STEPHEN GILMORE,

University of Edinburgh

We present an automated methodology for using Automatic Vehicle Location measurements of public transportation vehicles toconstruct a probabilistic model. The model not only allows for accurate evaluation of service performance, but also makes it possibleto study the effects of system modifications a priori . The methodology is almost entirely agnostic to otherwise important details ofthe service — in particular its route and the location of stops. Instead, it infers this from the data using automated map generationtechniques. The behaviour of vehicles in the model is analysed using computer simulation combined with statistical model checking.We present two case studies involving the Airlink service in Edinburgh and the Bellevue Express in Seattle. To demonstrate theusefulness of the approach, we analyse the impact of the scheduling strategies of bus holding and speed modification on the Airlink’sperformance. The data and code used to create the figures are publicly available online.

The wealth of data produced by the increasing digitisation of urban transportation systems has made it considerablyeasier to evaluate their performance. Modern GPS and communication technologies allow for a level of monitoring ofsystem components — particularly vehicles — that was previously unimaginable. At the same time, improvements informal verification techniques have enabled increasingly realistic system models to be analysed in a rigorous manner.There is a synergy between these two developments in the sense that the ability to handle more complex models islargely futile if there is insufficient data for realistic parameterisation. Hence, there is a need for a methodology thatcan translate the vast datasets generated by ‘smart’ transportation systems into properly parameterised formal models,which can then be analysed by modern analysis tools.In this paper, we present a novel technique that uses Automatic Vehicle Location (AVL) data to build and parameterisea formal model for vehicle movements along a route. This model can be used to determine whether a service satisfies agiven performance requirement, e.g. , one set by regulators. Additionally, by making alterations to the model, planners canstudy the impact of changes to the network on its performance a priori . Property evaluation is done using model checking [7, 15, 16]. Model checking is a formal verification technique that essentially consists of two steps: (1) specification of aformal model representing the system and (2) evaluation of whether the model satisfies a formally specified property ,expressed as a formula in a suitable logic. Since transportation systems are strongly subject to random effects ( e.g. , trafficconditions or passenger numbers at bus stops), we use stochastic models. Our models are spatial and patch-based : theservice under consideration is partitioned into a discrete set of patches, and vehicles move from one patch to the nextsuch that the time spent in each patch is modelled using a formally defined probability distribution. Our methodologyis almost entirely agnostic to route information, as the AVL data is used both to determine the patch structure and toparameterise the probability distributions of the times spent within them. The properties of interest are expressed usingthe language MultiQuaTeX, which allows us to use the powerful and versatile statistical model-checking algorithms ofthe MultiVeStA tool [36].We illustrate our methodology using case studies that focus on performance measures for frequent services, definedas services for which more than six bus departures are timetabled each hour. Unlike infrequent services, for whichpunctuality ( i.e. , timetable adherence) is the main performance criterion, the main metrics for frequent services involve a r X i v : . [ ee ss . SP ] S e p he regularity of the headways , i.e. , the time between subsequent vehicle arrivals at a stop. We focus on the commonnotion of Excess Waiting Time (EWT), but to demonstrate the broad applicability of our method we also consider twomeasures specified by the Scottish government [35]. We perform two case studies involving frequent services, one inEdinburgh (Scotland, UK) and one in Seattle (Washington, USA). In addition to evaluating the current performanceof the services, we study the impact of performance improvement strategies — bus holding and speed modification .Both strategies depend on parameters set by the operator, and these greatly determine the strategy’s success [28]. Theability to discover a good choice of control parameters is a powerful contribution of our framework. Both the code,which is open source, and the data used to conduct the experiments presented in this paper are available online viahttp://dx.doi.org/10.7488/ds/1472 and http://dx.doi.org/10.7488/ds/1470 respectively.The use of our approach has following key advantages over other techniques known in the literature: • The numerical efficiency of our approach allows for quick evaluation of the service requirements. This isparticularly helpful for operators who seek to optimise timetables or control measures using a wide-rangingparameter sweep. • The methodology is almost completely agnostic to information about the route, and learns it from the datainstead. This makes the methodology applicable even when this information is outdated, wrong ( e.g. , if due toroad works buses take a different route than prescribed), or unavailable. • The spatial nature of the patch identification step makes it easier for planners to make spatially-informed changesto the model ( e.g. , the imposition of speed limits in specific areas — see also [33]), or bus control measures thatcan only be imposed in certain areas ( e.g. , lowering bus speeds only in areas that are not heavily congested). • Informative maps that identify regions of different vehicle movement behaviour are created as a by-product ofthe parameter fitting procedure. • Our methodology can be used to evaluate any performance property that can be expressed in the highlyexpressive formal language MultiQuaTeX. Although we focus on EWT and the two measures used by the Scottishgovernment, extensions to other examples, e,g. , on-time performance for infrequent services, or coefficients ofvariation of headway ratios [27], are straightforward.In particular, the patch identification technique and the use of model checking to evaluate performance propertiesare the main contributions of this paper. As part of the case studies, we present a detailed analysis of the appropriatestochastic model for the time spent by vehicles in each of the patches. We find that if the data is sufficiently regular( e.g. , in patches outside the city centre and during non-peak periods), simple Erlang distributions give a good fit, butthat in other cases we need distributions with a heavier tail ( e.g. , hyper-Erlang).The structure of this paper is as follows. In Section 2 we give an overview of the datasets and the related scientificliterature. In Section 3 we present the theoretical preliminaries, in particular a discussion of phase-type modelling andthe specification of performance properties. In Section 4, we present the main algorithm of the paper. In Section 5 weprovide the numerical results, and Section 6 concludes the paper.

Since our approach is data-driven, we start with a discussion of the two datasets (Edinburgh and Seattle) in Sec-tions 2.1 and 2.2. We provide an overview of related literature in Section 2.3. .1 Edinburgh Dataset The main dataset used in this paper was collected and provided to us by the Lothian Buses company, which operates anextensive bus network in Edinburgh. The provided dataset consists of bus GPS measurements obtained for the fullfleet of 745 buses between 28th January 2014 at 11:31:14 and 30th January 2014 at 12:38:31. The measurements werecollected using the AVL system built into each bus for the purpose of monitoring the buses and providing live arrivaltime predictions at bus stops. The AVL technology reports to a Real Time Passenger Information (RTPI) system calledBustracker, operated by the French company Cofely Ineo. The measurements are collected through centralised pullrequests made by a central server in France, with measurements coming in roughly every 35 seconds. This type of datacan be publicly accessed live via the MyBustracker API. The bus GPS measurements in the dataset have already gone through a data pre-processing step. As a result, themeasurements are very accurate, although some quirks can be observed, e.g. , buses ‘flying’ from one point in the city toanother (due to interpolation), or buses seemingly going over 50mph on Princes Street, a busy shopping street in thecity centre. The measurements contain bus vehicle identifiers (fleet numbers). However, since buses can serve differentroutes on different days (sometimes even on a single day if the vehicle has been assigned to a different service duringpeak hours), we do not always know for a specific measurement which route the bus is serving at the moment thatthe measurement was taken. In this paper, we avoid this complication by focusing on a specific route in Edinburgh,namely the Airlink service which connects the city centre to the airport. Since the buses on this route are coated ina distinct livery, they cannot be assigned to other routes. For the Airlink service, there are unfortunately two largeGPS “shadows” ( i.e. , areas where the GPS antennas have limited connectivity) in which the buses are not observed.This has consequences for our algorithm, as we discuss in more detail in Section 4.1. To cover the gaps, we performlinear interpolations between measurements, which is possible because we have measurement timestamps (additionally,the measurements are chronologically sorted in the dataset). Since the average times needed by buses to clear routesegments depend on the time of day ( e.g. , crossing the city centre takes much more time during rush hours), we focuson a specific period of the day, namely between 10AM and 3PM. Although it is possible to incorporate time-dependentaverage speeds in our model (using time-inhomogeneous Markov chains [42]), this is left to future work.In principle, our approach can be applied to any GPS dataset that is in the format of the one provided, i.e. , consistingof four data fields (bus ID, latitude, longitude, time) per measurement. The spatial data in the Edinburgh dataset is notgiven by default in longitude/latitude, but in the Eastings/Northings coordinates used for the Ordnance Survey NationalGrid in the UK. They can be converted into longitude/latitude using the Jcoord library in Java. The date/time formatneeds to be converted to UNIX timestamps — a custom format can be set in the supplementary programming code(specifically in the AVLDataProcesser class).

In addition to the Edinburgh dataset, we also use a publicly available dataset involving buses in Seattle [23]. This datasetspans roughly one month of data (November 2001). One advantage of the Seattle dataset is that each measurementincludes both a vehicle and service identifier, meaning that it is trivial to isolate routes. However, the Seattle dataset isconsiderably noisier than the Edinburgh dataset, as we discuss in more detail in Section 5.2. Although entries in the Note that since buses are never observed in the gaps, we cannot interpolate using averaged crossing behaviour from earlier observations as is done in, e.g. , [26]. eattle dataset are chronologically sorted per bus, the dataset as a whole is not, meaning that for a pair of successivemeasurements the earlier measurement does not necessarily appear first if the measurements involve different buses. Thelocation measurements are given in a bespoke x , y -coordinate system that does not trivially map to latitude/longitudecoordinates. To aid visual inspection of the data, our code has support for approximate translation between the twocoordinate systems, but the algorithm of Section 4 is shift- and scale-invariant to the measurements — i.e. , if an offset orzoom is applied to all measurements simultaneously, then the results from our algorithm remain the same.We focus on the Bellevue Express (Route 550), which is operated by SoundTransit and connects downtown Seattleto the Bellevue area in the east across Lake Washington. We have chosen the Bellevue Express for two reasons. First,the Bellevue express is a frequent service during the evening rush hour period between 4PM and 6PM on workingdays. Second, the heterogeneity of the route — two densely populated areas connected by a long stretch of highway —makes it very suitable to demonstrate the differences between the patches, and the traffic conditions inside the denselypopulated areas during the rush hour period make the parameter fitting within the patches more difficult than for themore predictable Airlink service. The use of AVL data to evaluate or improve public transport performance is an active research field: see [28] for a recentliterature overview. In particular, a number of probabilistic models have been proposed for the purpose of vehicle traveltime prediction. Examples include models based on ARIMA time series [43], Kalman filters [38], and basic Markovmodels [24]. In particular, the latter model represents the delay at each stop in the system using a Markov chain with astate for each minute of delay. Models to predict travel times can be used to infer models for vehicle headways at stops,which allows for the calculation of headway-based performance metrics. The model of this paper explicitly uses spatialinformation — hence it is considerably easier in our setting to make spatially-informed alterations to the system, suchas the introduction of a speed limit or space-based bus strategies. Furthermore, our approach is agnostic to stop data.Other examples of the use of AVL data to evaluate system performance include [13] and [14], in which Gaussiandistributions are used to evaluate and optimise timetable performance of a bus system in Miami. In [44], an approachwas presented for minimising service run times as a function of stop locations, where the run times were computedusing AVL and Automatic Passenger Counting (APC) data from Montréal, Canada. See also [21] for an approach toevaluate travel times and passenger numbers using a micro-level simulation which takes into account, for example,lanes and intersections. In [27], AVL data is combined with a machine learning procedure to predict travel times andheadways. This is then used to predict bus bunching — i.e. , the phenomenon where a delayed bus keeps accruing moredelay due to the greater number of passengers at subsequent stops, to the point where it is in very close proximityto the next bus serving the same route. Such a prediction algorithm for bus bunching can inform a control strategy.It would be interesting to see whether these strategies can be modelled using a formalism that is compatible withmodel-checking tools, so that service requirements involving a broad variety of performance measures can be checkedefficiently using high-performance evaluation algorithms.The current approach of dividing a bus route into patches and fitting Erlang or more general phase-type distributions, e.g. , hyper-Erlang to the time spent by buses in the patches can also be found in [32] and [25]. A formal discussion ofthe performance metrics for frequent services used in this paper is given in [31]. Here, a time series model was fittedto the bus arrivals, meaning that the movement of buses serving the route became lost in the abstraction. Finally, in[33] different probability distributions for the time spent in patches were compared, and the resulting model was used o evaluate the impact of a planned speed limit reduction in Edinburgh. One of the aims of this paper is to unify themethodology developed over the course of these papers.The patch identification technique presented in this paper incorporates automated map generation techniques. Fora recent overview of these techniques, see [1]. In particular, we make heavy use of the work of Biagioni et al. onthe EasyTracker project [9, 10]. The method proposed in [10] is able to perform route map generation, stop locationidentification, timetable construction and arrival time prediction at stops using only AVL data. This paper expands ontheir approach by using it to inform the construction of a simulation and performance evaluation model.In general, the question of how best to model the time spent by vehicles on route segments has been studied for manydecades [8], and is still being actively researched [12, 22, 40]. Among these papers, [22] and [12] are particularly relevantas the multimodal distributions reported in those papers resemble some of the distributions encountered in our casestudies ( e.g. , one can compare our Figure 6 to Figure 6 of [22], or to Figure 5 of [12]). Although the modelling choices inthose papers ( e.g. , the use of truncated distributions of [12]) are interesting, the use of the Erlang and hyper-Erlangdistribution gives a sufficiently good fit in our setting, as we discuss further in Section 5. In this section, we briefly discuss the formal principles needed in the later sections. We first discuss the basic stochasticmodelling framework that we will use for the patch crossing times in Section 3.1. In Section 3.2, we discuss the modelchecking techniques that we will use to evaluate performance properties.

In our paper, we represent the time spent by vehicles inside the patches using a phase-type distributions, which havea long history [3, 4] of being used to succinctly represent general distributions in a wide range of applications ( e.g. ,[11, 19, 29]). In our context, this means that the process of a bus moving across a patch can be modelled using a sequenceof phases such that the time to complete each phase is exponentially distributed . The time T spent in a single phase isexponentially distributed with rate λ > Probability Density Function (PDF) is given by f T ( t ; λ ) = λe − λt (1)for t ≥ λ — hence, large values ofthe rate λ mean that the time spent in a phase is small on average. The PDF f T ( t ) of a random variable T determinesthe probability of observing values from a small interval around t , and can also be used to measure how well a fittedprobability distribution corresponds to the data. In the appendix, we use the related function F T ( t ) = ∫ t f T ( τ ) dτ , calledthe Cumulative Density Function (CDF) to compare fitted to empirical distributions.The exponential distribution has several characteristics that often make it unsuitable for modelling patch crossingtimes. It has a high standard deviation relative to its mean (called the coefficient of variation ), its mode at 0, and it is memoryless , meaning that the amount of time already spent in a patch gives no information on the probability distributionof the remaining time. A straightforward generalisation, called the

Erlang distribution is often more appropriate. Given asequence T , . . . , T k of exponentially distributed random variables with rate λ > S = T + . . . + T k is Erlang-distributedwith rate λ and shape k ∈ N . Its PDF is given by f S ( s ; λ , k ) = λ k s k − e − λs ( k − ) ! (2) or s ≥ mixture of two or more Erlang branches.The PDF of a random variable Z with an m -branch hyper-Erlang distribution with rates (cid:174) λ = ( λ , . . . , λ m ) , shapes (cid:174) k = ( k , . . . , k m ) , and (cid:174) α = ( α , . . . , α m ) , is given by f Z ( z ; (cid:174) λ , (cid:174) k , (cid:174) α ) = m (cid:213) i = α i λ k i i z k i − i e − λ i z ( k i − ) ! (3)for z ≥ λ > m i ∈ N , and α i > i ∈ { , . . . , m } , and (cid:205) mi = α i = e.g. ,lognormal, the distributions from [22] and [12]) when required.In general, any probabilistic model described by a system state transitioning between phases (possibly skippingphases or going back) such that the time spent in each phase is exponentially distributed is called a Continuous-TimeMarkov chain (CTMC) . The probability distribution of the time until the system reaches a specific phase in the CTMC iscalled a phase-type distribution, of which the exponential and Erlang distributions are rudimentary examples.

Having specified the model, properties of interest can be expressed using a formal specification language. The choice ofspecification language depends on the modelling formalism used. For example, for CTMCs the most commonly usedproperty specification language is Continuous Stochastic Logic (CSL) [5, 6]. In this paper, we use the more generallanguage MultiQuaTEx [36, 37], which we extend with a notion of steady-state properties. We do not aim to describeMultiQuaTEx in full, but only those language features used in Section 4.3. In the following, we will assume that weare given a model simulator s , which can be queried using statements of the form s.rval("Y") , where Y is either alocation label (in which case the returned value is 1 if the vehicle is in that location, otherwise 0), or the name of a clockor integer counter (in which case the value of the clock or counter is returned). Furthermore, specification of functionsand if-then-else statements is allowed. For example, the query isYAboveThreshold() = if {s.rval("Y") > 5 } then 1 else 0 fi; returns 1 if the value returned by the query s.rval("Y") is above the threshold value 5 and 0 otherwise. As a specialcase, s.rval("time") returns the value of the global system clock. For model checking, we are interested in assertionsof the form S [ F(), "C"] < p; (4)In words, this asserts that the steady-state value of the random variable defined through the function

F() , with timegiven by the clock C , is smaller than a given value p between 0 and 1. To make this formal, let x ( t ) represent the globalstate (recall that this is, for each automaton, its current location, combined with the values of all local clocks andcounters) as a function of the global clock t . Then (4) asserts that the valuelim T →∞ T ∫ T F ( x ( t )) dC ( t ) (5) s smaller than p . The evolution of our system can be represented by a discrete-event process ( t , x ) , ( t , x ) , . . . , ( t N , x N ) such that the global state x ( t ) equals x i for all t ∈ [ t i , t i + ) , (5) can be rewritten to1 t N − t t N (cid:213) i = ( C ( t i ) − C ( t i − )) F ( x i − ) where t = t N = T . The latter equation is what we have implemented in the simulation engine BusSimulatordiscussed in Section 4.Assertions written in the form of (4) can be model-checked using statistical model checking [45], which uses computersimulation to produce a statistically justified statement about whether the properties hold. Exact numerical computationof steady-state probabilities as mentioned above is often infeasible, so we use a purpose-built simulator combined withthe model-checking tool MultiVeStA [36] (and in particular its steady-state model checking functionality [20]) to obtainthe results of Section 5. In particular, the steady-state properties are evaluated using the batch means method [2], asimplemented in the ASAP3 algorithm [41]. In this section, we introduce the algorithm to generate a fully parameterised stochastic model from the data. Thealgorithm consists of three main steps: map generation, patch identification, and model checking. Each step consists of anumber of subroutines as displayed in Algorithm 1. The reasoning behind the subroutines is to make the code modular,so that individual techniques can be replaced should the need arise. In the following we present a brief overview of theprogram code — note that an implementation written in Java is available online (see Section 1).

Algorithm 1

Map generation: File heatMapFile ← createObservationHeatMap(dataFile, δ , b ); File blurredMapFile ← gaussianBlur(heatMapFile, σ ); File skeletonMapFile ← skeletonise(blurredMapFile, τ , η ); File crossingsMapFile ← detectCrossings(skeletonMapFile); File graphFile ← createPrunedGraphMap(crossingsMapFile, ϵ , m ); File endpGraphFile ← identifyEndPoints(graphFile); Patch Identification: int[ ] a ← obtainObservationCounts(dataFile, endpGraphFile, γ ); int[ ] j ← jenksCluster(a, n ); Model Checking & Simulation List < List < Integer >> o ← obtainObservations(dataFile, j); double[][] erlangPars ← fitErlangDistribution(o); BusSimulator simulator ← new BusSimulator(erlangPars, β ); Most of the steps in Algorithm 1 require manually-specified parameters as input, as we discuss in more detail inthe next two sections. These parameter choices primarily concern the boundary between valid measurements andnoise, which are best judged by a human user. For example, the degree of noise and the existence of GPS “shadows”influence the choice of the parameters used in the first four steps. Although some level of input from a human planneris still required, the results of the subroutines provide a convenient way to provide feedback after each step has been ompleted. Also, for a single dataset (e.g., Seattle) a single value for each parameter will typically have good performancethroughout the dataset, so parameter selection is not something that necessarily has to be repeated for every service.In sections 4.1 to 4.3, we discuss the main steps of the algorithm in detail. However, before we continue, we first addressthe question of how to determine, for an AVL measurement, the corresponding patch, as this question arises in multiplesteps ( e.g. , in the obtainObservationCounts and obtainObservations routines in lines 7 and 9 of Algorithm 1,respectively). Since the bus GPS measurements are given as coordinates in a 2-dimensional continuous space, the moststraightforward characterisation of patches is as polygons on the city map. This was indeed the approach taken inother papers such as [32] and [33]. In this setting, an observation of a patch crossing time is collected from the databy recording the time of each measurement where a bus leaves a patch, and subtracting the time when it entered.To increase the accuracy of the crossing time, a linear interpolation between the measurements can be used. In thispaper, we take a different approach: we assume that during the period of the day that we are interested in, buses do notdeviate from their route. (This is not always valid, but we discuss approaches to mitigate resulting errors later on.)Consequently, given a bus location measurement, it is always possible to determine what fraction of the route hasbeen completed and what fraction has yet to be done. This allows us to assign to each measurement a route completionpercentage , represented by a value in [ , ) , where a value of 0 represents the bus being at the beginning of the route,and values close to 1 that the full route is about to be completed. The patches are then intervals that are subsets of [ , ) .Calculating route completion percentages is not straightforward in general, but amenable to automation as we willargue below. When a route is largely linear ( e.g. , the Airlink route in Edinburgh, see Fig. 2a), one possibility is to usethe distance between the current location and the most recently visited terminus ( i.e. , start or end point). However,this is clearly inadequate when the route shape is more complex; see, for example, the Bellevue Express in Seattle (seeFig. 4). To avoid users having to manually identify shape-defining corners in the route, we use the automatic procedureproposed by Biagioni et al. [9] to parse the data into a representation of the route in terms of a graph, i.e. , a collection ofnodes and edges. We then determine which of these edges are most likely to contain route termini, and then determinethe main route segments using the most frequently occurring sequences of edges going from one terminus to another.If we know the edge sequences forming the routes, we project each measurement onto the nearest edge and calculatebased on this information and the last terminus visited what the current route completion is. The first step of the algorithm is informed by the procedure from Biagioni et al. [9], which we briefly summarise in thissection. The first subroutine creates a heat map of GPS observations as displayed in Fig. 2a. A potential obstacle, whichwe for example observed near the airport and Waverley Station termini in Edinburgh, is that there can be areas on theroute in which buses are never observed. Two possible explanations for this are GPS “shadows” — where tall buildingsor other structures block the GPS signal — and overly aggressive data pre-processing. The gap near the Waverley Stationend point is displayed in Fig. 1a. We compensate for the gap by interpolating between subsequent measurements — seeFig. 1b. This procedure can be tuned using the parameter δ , which determines the weight of interpolations compared toactual observations (setting δ = b informs apost-processing step that accentuates certain noise — the only difference with δ is that its effect is non-linear.The next subroutine applies a Gaussian blur, as displayed in Fig. 1c - the intensity of the blur is determined by itsstandard deviation σ . This is followed by a filter in which pixels that are lighter than a certain threshold τ are madecompletely white. We then find a skeleton representation of the remaining non-white pixels, as displayed in Fig. 1d. Theprocedure that has been implemented in the code is slightly different from the procedure described by Biagioni et al. - a) (b) (c)(d) (e) (f) Fig. 1. Map thumbnail images for the portion of the Airlink route that intersects Princes Street. Specifically, these are: (a) uninterpolated,(b) interpolated, (c) blurred, (d) skeletonised, (e) skeletonised with the ‘crossings’ identified (note that three pixels have been madered), and (f) the final graph representation with red pixels representing nodes and black pixels representing edges. instead of setting pixels at the end to white at each step of the skeletonisation procedure, we “eat” away at the edges byreducing the darkness of edge pixels by a certain degree η . This means that infrequently-occurring interpolations that“cut corners” are removed quicker. Low values of η means that the algorithm takes longer to complete, but that it maybe better able to get rid of certain types of noise (namely the “hairs” of the type also reported in [17]).Having obtained the skeleton map, we turn this into a graph as follows. First, we determine the locations of all theends and “crossings” (specifically, all non-white pixels that do not have 0 or 2 non-white pixels in their 8-neighbourVon Neumann neighbourhood [39]). These are the red pixels in Fig. 1e. We then determine the sequences of non-whitepixels connecting these ends/crossings. For each such sequence, an initial graph representation is one where each pixelhas a vertex and there is an edge for each pair of pixels that are Von Neumann neighbours. This graph is then prunedusing the Ramer-Douglas-Peucker algorithm [18] with tolerance ϵ . This transforms a sequence of pixels into a muchsmaller set of edges of different lengths, such that the rough 2-dimensional shape of the sequence is preserved. Inorder to remove extreme disparities between the edge lengths, we partition all edges that are longer than twice themedian edge length into chunks roughly the size of the median divided by m (this aids the process of finding terminias described below). The final graph is then constructed by merging the vertices corresponding to the pixels whichrepresent crossings, completing the algorithm described in [10]. The result for the Airlink route in Edinburgh’s citycentre is displayed in Fig. 1f.In the resulting graph, the route can be represented by a sequence of directed edges. To determine the route, wefirst identify the edges containing the termini — we assume that those are the edges in which a large amount of timeis spent. We calculate for each (directed) edge how much time is spent on it on average, divided by the length of the dge. If this value is very high, then this indicates the presence of a terminus. In particular, we assume that the servicehas two termini (this can be generalised if the need exists) and hence select the two edges with the highest values. Analternative approach is to select the two extreme edges if the graph consists of a sequence of edges. Next, we determinethe edge sequences that occur most often for buses travelling between the termini — these are the route segments. Forexample, the Airlink has two segments, namely the West-East direction and the East-West direction. The length of the total route starting from a specific start point (it does not matter which one is chosen) is then the sum of the edgesleading back to the start point, if we take into account the direction of travel on the edges.Given the graph map and route (segment) information, the route completion can be computed as follows: givena GPS measurement, we find the edge that it is closest to and which direction this bus is going (for this we use thepreviously visited terminus). This then maps uniquely to a part of the route — buses are assumed to cross each directededge at most once during the route. The distance to the start point can be found through the edge distances, and whenthis is divided by the total route length, this results in the required route completion measurement. Note that the lengthof the route can be different in the two directions: going by the edge lengths in the graph, the lengths of the west-eastand east-west directions are roughly 12.4 kilometres and 12.3 kilometres respectively, whereas for the Bellevue Expressin Seattle the route is roughly 20.2 kilometres long in both directions. In the first subroutine of the second step, we compute for each bus GPS measurement the degree to which the bus hascompleted the route at the time of measurement, as a number in [0, 1). We then construct an initial patch structureby dividing the interval [0,1) into γ evenly-sized portions. We want to group together patches that are next to eachother and that are “similar” in terms of the number of times buses were observed in the patches, since this is a goodindication of the average bus speed in those patches. There are several ways to do this – we have investigated twoin more detail. The first method is to look for the pair of adjacent patches such that the merger of the two has thelowest number of bus observations. Merge these, and continue carrying out this step until the resulting patch has anobservation count larger than the highest number of observations amongst the initial patches. As an advantage, thisapproach does not require that the final number of patches is specified a priori .For the second approach, we create a sequence of absolute differences between the number of observations in eachpatch and the next. We then use K -means clustering in one dimension (this is called Jenks natural breaks optimisation)— note that the K in ‘ K -means clustering’ is part of its commonly used name, we instead denote the final number ofpatches by n . This is visualised in Fig 3, where the height of each bar represents the number of observations and thecolours denote the final patch structure. In the remainder of this paper, we use this second approach.The final patch structure for the Airlink service is displayed in Figures 2b and 2c for the west-east and east-westdirections respectively. Here, the initial granularity γ was set to 50 and the final number of patches n to 10. The twoend points are part of the red patch on the left and the orange patch on the right respectively. Note that there is anasymmetry between the two directions: there are five patches between the two end patches on the west-east directionand only three on the east-west direction. Alternative Approaches.

Several other patch identification approaches have been considered, but were ultimately deemedless suitable than the aforementioned approach. We briefly discuss two of them in the following.The first alternative approach is to do away with route completion percentages altogether, and take the full datasetof GPS measurements expressed only in terms of latitude/longitude coordinates. We then use K -means clustering to dentify K centre points of the clusters. A patch structure is then obtained using the Voronoi tessellation resultingfrom these K centre points. An advantage of this approach is that it is easy to apply — many tools support K -meansclustering ( e.g. , the statistical package R). However, this approach is relatively crude and cannot handle “complex” routeshapes, i.e. , those that do not resemble a straight line.The second is use the bus stops to demarcate the patches. To do this, we require a list of bus stops and theirlatitude/longitude coordinates. These locations can be inferred via the method of Biagioni et al. [10], or obtained viasome other data source, e.g. , the Lothian Buses data API for the Airlink route. Given the bus stops, we can then constructpatches either using Voronoi tessellation or route completion percentages. However, the automated inference methodof [10] is not always able to distinguish bus stops from busy junctions, and a succession of busy junctions in a denselypopulated area ( e.g. , in area around the eastern terminus of the Bellevue Express in Seattle) could lead to the creation ofa multitude of spurious small patches. Whilst these could be merged, the question would then be what advantage therestill is over the approach described above despite the additional effort. Once the patch structure has been obtained, the next step is to identify the probability distribution of the time spentin each of the n patches. We focus on the Erlang and the hyper-Erlang distributions, which we validate through acomparison to other distributions in Section 5. We assume that we are only interested in measurements during givenperiods of interest during which the behaviour of vehicles is assumed to be similar. In our experiments we have foundthat, as expected, patch crossing times are noticeably different during rush hours, midday, and the night. The sameis true for, e.g. , weekends and weekdays. This is not only true for patch crossing times, but also for the schedule andmodel parameters that are input by the user such as β , the number of buses assigned to the service. Note that thisdoes not affect the generality of the method: for example, if one wants to know whether a bus service has adequateperformance throughout the day, one should partition the dataset and check whether the performance requirementsare satisfied in each of the portions. In fact, the parameter fitting procedures of this paper are helpful to inform such apartitioning: if the parameters are comparable between time segments then they can justifiably be merged.The parameters of the Erlang and hyper-Erlang distributions are found by matching them to the patch crossing timesobserved in the data. Patch crossing time observations are obtained in the following way: we calculate route completionpercentages for each measurement in the dataset that occurs during a period of interest, for all buses that service theselected route. To increase the granularity of the data, we perform a linear interpolation between the measurements foreach bus. That is, for every measurement we also calculate its UNIX time, i.e. , the number of seconds since 1 January1970. For each measurement i , let t i be its Unix time, r i its route completion percentage, and t i + and r i + the Unix timeand route completion percentage of the next measurement of the same bus . We then create interpolated measurementsas follows: let ∆ t = t i + − t i and ∆ r = r i + − r i , then we create new measurements with times t i + , t i + . . . , t i + ∆ t − r i + ∆ t ∆ r , r i + ∆ t ∆ r , . . . , r i + ∆ t − ∆ t ∆ r . We interpolate in terms of route completionpercentage instead of latitude/longitude to increase measurement accuracy — e.g. , interpolation in space can lead tocorners being cut short, whereas interpolation in route completion compels the buses to adhere strictly to the route.For each bus, we keep track of the number of seconds since it crossed into its current patch. If the bus crosses theboundary between patches j and j +

1, or between between patches n − j as a crossing time observation for patch j . Due to measurement rrors, it may happen that a bus appears to move backwards, i.e., cross from patch j into j − n − This may lead to spurious small crossing time observations, which may have a large distorting effect on the parameterfitting procedure. As such, if a bus crosses into the previous patch, the number of seconds since crossing into it is setto − Once a full dataset of observations (cid:174) x = ( x j , x j , . . . , x jN ( j ) ) has been created, where N ( j ) is the number of obser-vations for patch j , we need to obtain the optimal Erlang and hyper-Erlang distribution parameters k j and λ j . In ourimplementation, the procedure for the Erlang distribution is as follows: starting with k j =

1, we set λ j = k j / ¯ x j with¯ x j = (cid:205) N ( j ) i = x ji / N ( j ) , meaning that we set the mean of the Erlang distribution to equal the sample average (in statistics,this is called the ‘method of moments’). We then calculate the corresponding log-likelihood value l ((cid:174) x ; k j , λ j ) = N ( j ) (cid:213) i = log ( f ( x ji ; k j , λ j )) with f equal to the function f S of (3). We then increase n j by one, and repeat this procedure - we continue until wereach the value k j such that its log-likelihood is lower than in the previous step. We then choose the λ j and k j of theprevious step as the distribution parameters. To obtain the hyper-Erlang results in Section 5, we have used the toolHyperStar [34], although it should be noted that this tool uses a randomised algorithm and that the found parameterstypically vary between tries.The Erlang/hyper-Erlang distributions describe the movement of the buses through the patches when they areservicing the route. During simulation, we initialise the buses as either starting at a single terminus, or uniformly alongthe route (for long-run simulations, the difference is negligible). We keep track of a single global clock, and for eachbus its current patch and the number of completed phases within that patch. We then draw phase completion timesfor each bus using the rate in their current patches (and branches in case of the hyper-Erlang distribution). We thenidentify the bus with the next phase completion, increment its phase, potentially change its patch (and possibly selecta hyper-Erlang branch) and draw its next phase completion time. We continue until the statistical model checkingfront-end ( i.e. , MultiVeStA) has found that enough samples have been drawn for the simulation to be completed.In practice, the buses are regularly observed to wait at the termini to increase the regularity of the service. To enforcean implicit schedule on the buses, we enforce that buses cannot leave end patches unless t > rd i + h ij . Here, t is theglobal time, d i is an integer that is incremented by 1 every time the route is completed ( i.e. , the first patch is entered), r is the timetabled amount of time that is assigned to buses to complete the entire route, and h ij is a constant thatensures that different buses leave patch j at different time points. A typical choice for r is the total duration of the route(including time spent at the end points). Let the cumulative means c j be given as c j = (cid:205) j − ι = µ ι where the µ ι are themean patch completion times. Then a typical choice for h ij equals r ( i − )/ β + c j , with β the total number of buses, sothat the times between bus departures are timetabled to all be the same.To illustrate these choices of r and h ij for the Airlink case study, first note that the total route duration (includingtermini) for midday Airlink buses equals 5 259 seconds or 87 .

65 minutes (this can be seen by summing the values of µ For example, this appears to happens for bus 5019 in the Seattle dataset, between 16:14:30 and 16:21:53 on 1 November 2001. For example, this appears to happens for bus 5203 in the Seattle dataset, between 17:45:55 and 18:07:26 on 31 October 2001. uaTEx expression Property ewt() = 0.5 * (s.rval("y_j") - mu_tot) S [ ewt(), "c_j" ] < 75* (s.rval("y_j") - mu_tot) / mu_tot;evwt() = if {s.rval("y_j") > 900} S [ evwt(), "c_j" ] < 0.05then 1 else 0 fi;bph() = if {s.rval("H_j") < 6} S [ bph(), "time" ] < 0.05then 1 else 0 fi; Table 1. The QuaTEx expressions for the EWT, EVWT, and BPH. in Table 2 in Section 5). In the dataset, 11 buses typically seem to be servicing the route in the midday. This means that r = h ij = / ( i − ) + c j = . ( i − ) + c j , with c = c = .

09 seconds. This is very close to the value of 477 . Performance Metrics.

To measure system performance, we use the metrics used by the Scottish government (see also[31]). These measures are only relevant for frequent services, defined as routes for which six or more bus arrivalsare scheduled per hour. For frequent services, exact timetable performance is not as relevant as the regularity of the headway , the time between subsequent bus departures (see, e.g. ,[28]). In particular, we consider the following metrics:(1) the excess waiting time (EWT),(2) the extreme-value waiting time performance (EVWT), and(3) the buses-per-hour performance (BPH).The EWT is the average experienced waiting time minus the “timetabled” waiting time, where by the “timetabled”waiting time we mean the waiting time experienced by passengers arriving uniformly to the bus stop when there is zero headway variance. In particular, with σ denoting the headway variance and µ the headway mean, the excesswaiting time can be shown to equal σ / µ . The EVWT is the probability that the amount of time between subsequentbus departures is more than 15 minutes. The BPH performance is the steady-state probability that fewer than six buseshave departed in the previous hour. The Scottish government’s requirement on the EVWT and BPH is that they shouldbe at most 5% at the starting point of a journey. The EWT should not exceed 75 seconds at a set of important bus stopscalled “timing points”.To evaluate the aforementioned metrics using statistical model checking, we introduce several additional variablesduring simulation. For each bus i ∈ { , . . . , β } and patch j ∈ { , . . . , n } , we maintain a clock z ij that represents the timesince bus i entered patch j . For the BPH, we use the counter H j which for each patch j denotes the number of busesthat departed in the past hour. Hence, H j = (cid:205) βi = ( z ij < ) , where ( A ) equals 1 if the boolean expression A is trueand 0 otherwise. The clock y j , which is used by both the EWT and EVWT, represents the time since the last visit by abus to patch j , and equals y j = min i ∈{ ,..., β } z ij . Finally, the counter c j is incremented every time a bus leaves patch j .Given these variables, the three system performance metrics can be expressed using the language MultiQuaTEx (seeSection 4.3) as given in Table 1. For the EWT, we use the value µ tot which denotes the timetabled headway. Note thatfor the BPH the notion of time is given by the global clock, denoted by "time" , whereas for the EWT and EVWT weuse c j to denote time. In Section 5, we will give numerical results involving these properties. RESULTS

In this section, we illustrate the applicability of the proposed method by means of two case studies. We begin with theAirlink service in Edinburgh in Section 5.1. We confirm the experiments done in [31] that show that the Airlink hasexcellent performance. In Section 5.2, we consider one of the services in Seattle, namely the Bellevue express. Its routecovers downtown Seattle and its performance in 2001 was noticeably below that of the Airlink service. In Section 5.3,we have another look at the Airlink service — our simulations, using a set-up with altered parameters, suggest thatthe average waiting time of passengers can still be improved by 1.5 minutes by using a strategy that involves busesadapting to each other’s behaviour.

Patch Structure.

The full heat map of the Airlink service, including interpolations, that is created by running the createObservationHeatMap function in line 1 of Algorithm 1, is displayed in Figure 2a. For Figure 2a, we have decidedto display the heat map for the full dataset , whereas we only consider the time period between 10AM and 3PM forall other Airlink figures and tables. We have done this because the full heat map shows the buses still reportingmeasurements whilst stationed at Lothian Buses’ depot on Annandale Street, at the “cloud” in the top right corner ofFigure 2a. It also visualises data quirks: the vertical lines going downward from the Annandale depot correspond tointerpolations between bus depot measurements and unlikely measurements in other parts of UK, sometimes as far asWales. The regularly appearing (and hence quite dark-coloured) straight line between the Annandale depot and theWaverley station terminus on the Easternmost part of the route is presumably due to the measurement device being offeither after completion or before the start of the route. After the completion of steps 1-6 of Algorithm 1, we partition the route into 50 initial patches, and determine foreach patch how many times a bus was observed in this patch. The resulting bar chart is displayed in Figure 3. AfterK-means clustering with 10 means, we obtain the patch structure as indicated by the colouring of the bars in Figure 3.A visual representation of the patches in given in Figures 2b and 2c. Because the two route segments are not of equallength, and because entering the city centre is more time-consuming than leaving it, two patches are created for the citycentre area, and only one in the opposite direction (excluding the terminus patch). In general, it can be observed that,as expected, buses spend more time in the densely populated areas in the eastern part of the route than on GlasgowRoad towards the west.

Parameter Fitting.

In Table 2, we have displayed the Erlang parameter values λ j and k j for each patch j = , . . . ,

10. Wealso display for each patch its mean patch crossing time µ j (in seconds), the standard deviation σ j , and the coefficient ofvariation c v , j . The coefficient of variation is defined as the ratio of the standard deviation to the mean — it is noticeablyhigher for inner-city patches than for the patches in the city outskirts. In particular, the patch near Haymarket station(patch 5) has a very high coefficient of variation.Empirical CDF plots for the patch crossing time observations are displayed in Figure 7 in the Appendix. As we cansee, the Erlang distributions have a good fit. To evaluate the goodness-of-fit numerically, we use the Anderson-Darlingtest to test the null hypothesis that a sample is drawn from the fitted distribution. Of course, the test can only be usedto disprove the null hypothesis, whereas ideally we would like to prove it. Moreover, the test is biased against disprovalin our setting, because the test assumes that the sample is drawn independently from the null hypothesis distribution— this is not the case, as the parameters of the distribution where calculated using the same sample. However, if the As discussed in Section 4.3, such measurements are not considered when we determine patch crossing time observations. a)(b)(c)Fig. 2. Top: heat map of the full Airlink route, including entries outside the 10AM-3PM window. Centre and bottom: automaticallyobtained patch structures for the west-east and east-west direction respectively. atch 1 2 3 4 5 6 7 8 9 10 l k

44 106 68 73 17 37 40 30 78 101 λ µ v σ c v Table 2. Patch parameters for the Airlink service, midday (10AM - 3PM). For each patch, we display the length l of the patch inkilometres, Erlang shape and scale parameters k and λ , the mean µ = k / λ in seconds, the average bus speed v in the patch inkilometres per hour, the standard deviation σ = √ k / λ , and the coefficient of variation c v = /√ k . Anderson-Darling test is still able to reject the null hypothesis, despite the bias against doing so, then we can take thisas evidence that the fitted distribution is inappropriate. The test statistics and p -values have been calculated using the goftest package of R.Patch 1 2 3 4 5 6 7 8 9 10N. obs. 71 80 80 79 79 76 75 80 79 75AD 0.777 1.1076 0.5353 0.4338 0.6865 0.108 0.843 0.6958 0.8432 0.3954 p -value 0.4974 0.3053 0.7105 0.8143 0.5697 0.9999 0.4506 0.5618 0.4505 0.8529 Table 3. For each of the 10 patches for the Airlink service, the number of patch crossing time observations, Anderson-Darling teststatistics, and corresponding p -values. The p -values are relatively high in all cases. We have displayed the test statistics and p -values in Table 3. The p -values are very high, which also suggests thatthe Erlang distribution has a good fit for the Airlink’s patch crossing time distributions. The choice for a very specificobservation period ( i.e. , between 10AM and 3PM, on a Tuesday, Wednesday and Thursday) may contribute to theregularity of the measurements. In Section 5.2, we will see that the Erlang distribution can have a less good fit whenconditions are more challenging ( i.e. , during rush hour). nu m be r o f ob s e r v a t i on s Fig. 3. In this bar chart, the height of each bar represents the number of bus observations (without interpolation) in the correspondinginitial patch. The colouring indicates the patch structure resulting from Jenks natural breaks optimisation on the differences betweensubsequent bar heights. The colouring used here is identical to that used in Figures 2b and 2c. atch 1 2 3 4 5 6 7 8 9 10EWT 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . EVWT — — — — 0 . ± . . ± . — — 0 . ± . . ± . BPH — — — — — 0 . ± . — — — — Table 4. For each of the 10 patches for the Airlink service, its performance expressed in the form of the three metrics discussed inSection 4.3. The dashes mean that the corresponding event ( e.g. , more than 15 minutes between subsequent bus arrivals) was notobserved during the simulation, resulting in an estimate of 0. The entries saying “0.000” mean that the event of interest was observed,but that the resulting probability was still rounded down to zero.

Performance Evaluation.

To demonstrate the usefulness of our method, we can use it to reproduce the results from [31].We use the three performance metrics for frequent services used in [31] and discussed in Section 4.3: namely the EWT,the EVWT, and the BPH. In [31], these metrics were computed using an elementary (ARMA) time series model that didnot consider the movement of buses through space. Before we present a comparison of headway correction strategies,we will first determine that the results coincide.In Table 4, we display estimates of the current EWT, EVWT, and BPH for each of the 10 patches in the Airlink model.We drew as many simulations as needed (using sequential hypothesis testing) to reach a conclusion on whether therequirement was met ( i.e. , 75 seconds for the EWT etc.), and additionally that the relative confidence interval half-widthhad to be at most 10% — this was to ensure that we would get reasonably accurate confidence intervals even whenthe requirements are comfortably met. We left the entry blank if we had not yet observed the event of interest after300 seconds of simulation. A complication in many settings is that the Airlink service is so reliable that observingserious headway deviations is very unlikely, in particular at the beginning of the route. If we do not observe the event ofinterest, we cannot estimate its probability — the so-called rare-event problem. Since this means that the requirementsare satisfied in our setting, this is not a complication.Note that in Table 4, the service typically starts to perform worse when the buses get further from the termini. Thisagrees with the results presented in [31]. The EWT values are somewhat lower than were reported in [31]. For examplethe EWT for patch 1 in Table 4 is 0.25 seconds, whereas the value of 7.9773 seconds is reported in Table 3 of [31] asthe EWT at the airport bus stop. This is to be expected: in the model a bus will leave an end stop the very secondthe timetable says that it should, whereas human drivers will not be similarly precise. This effect wanes as the routeprogresses: at the end of Patch 3, the difference is 5.84 versus 17.5301 seconds. At the end of patch 6, the difference is36.39 versus 35.5752 seconds. Do note that the results in the tables are not entirely comparable since different timeintervals were used: 9AM-5PM in [31] and 10AM-3PM in this paper. The EVWT at the end of Patch 6 is roughly 1.2%,which is inside the confidence interval reported in Table 4 of [31]. The BPH is close to zero across the route, which isalso consistent with Table 6 of [31].It is clear from Table 4 that the Airlink service meets the requirements set by government regulators. The EWTremains below 75 seconds at each of the patches; similarly, the EVWT and BPH remain below 5% even at parts in themiddle of the route where the requirements no longer apply. However, there is still room for improvement, as we willsee in Section 5.3. However, before we move on to bus performance improvement strategies for the Airlink, we presentthe modelling results for the Seattle dataset. Note that [31] considered two versions of the BPH: namely the Steady-State Buses-per-Hour Requirement (SSBHR) and the Day-Long Buses-per-HourRequirement (DLBHR). The reason was that the requirement specified in the Scottish governments BPIPS document [35] could be interpreted in differentways. The BPH here uses the underlying metric of the SSBHR as we feel that this is the more natural interpretation of the requirement. .2 The Bellevue Express in Seattle Patch Structure.

One of the main strengths of the approach introduced in this paper is its generality: under mildassumptions, any transport service for which AVL measurements have been collected over a prolonged period can beanalysed in the same manner. To demonstrate this generality, we apply our method to the publicly dataset involvingbuses in Seattle [23], and in particular on the Bellevue Express, which is/was a frequent service during evening rushhours ( i.e. , between 4PM and 6PM on weekdays). The route map of the Bellevue express as of July 2016 is displayed in

Fig. 4. Screenshot from displaying the current (as per July 2016)route of SoundTransit Route 550, the Bellevue Express.

Figure 4. As we can see in Figure 5a, the route has only slightly been changed in the more than 15 years since 2001,when the data was recorded – the only major difference is in the North-East part of the route. There are no gaps inthe measurements similar to what we observed for the Airlink data, which means that it is not necessary to put muchweight on interpolations between subsequent measurements.As measured by the graph map produced in line 5 of Algorithm 1, the route is roughly 20.16km in both directions.We use 12 patches instead of the 10 for the Airlink as this led to better fitting results. During rush hour, the BellevueExpress buses spend little time waiting at the western terminus, sometimes passing it completely in the time betweensubsequent measurements. (This is also evident from the CDF plot for patch 1 in Figure 8 in the Appendix, wich includesvery short and very long crossing times.) This complicates the procedure for determining the termini, however in thiscase the two edges on the far ends of the graph are obvious candidates.

Parameter Fitting.

The parameters characterising the patches are displayed in Table 5. The fact that buses spend aconsiderably larger amount of time in the terminus in the east (Patch 7) than in the west (Patch 1) is evident from thecorresponding values for µ . The average speed in Patch 3, which corresponds to the highway crossing Mercer Island,is 60.48 km/h, which is much higher than any patch in the Airlink dataset. However, the patches for the west-eastdirection have higher average speeds than those for the east-west direction. Another interesting feature is Patch 8,corresponding to an area in Bellevue where an average speed (including time spent waiting at stops and junctions) ofonly about 6.68 km/h is observed. In general, the coefficients of variation in Table 5 are much higher than in Table 2, itscounterpart for the Airlink service. a)(b) (c) Fig. 5. Heat map, and patch structure in both directions for the Bellevue Express in Seattle.

Patch 1 2 3 4 5 6 7 8 9 10 11 12 l k λ µ v σ c v Table 5. Same as Table 2, but for the 12 patches of the Bellevue Express.

The CDFs of the patch crossing time observations, including fitted Erlang distributions, are displayed in Figure 8 inthe Appendix. The Anderson-Darling test statistics and p -values for the Bellevue Express are displayed in Table 6. Wecan see that the observation counts differ starkly from patch to patch, especially near the ends, which means that manyobservations were rejected due to appearing to move “backwards” into a patch. The p -values and visual inspection ofthe CDF plots both suggest that the Erlang distribution has a good fit for some patches, but worse for others. The fit forthe terminus patches (patches 1 and 7) is especially lacking, but for some of the patches in the east-west direction (inparticular patches 8, 10, and 11) the p -value is below 5%.In Figure 6, we take a closer look at the goodness-of-fit for the patch crossing time observations for Patch 10, whichin Table 6 can be seen to have the worst fitting results apart from the terminus patches. In Figure 6a, we have displayedboth a kernel density plot for the data and the pdf of the fitted Erlang distribution. It can be seen that the data has a atch 1 2 3 4 5 6 7 8 9 10 11 12N. obs. 30 166 193 201 183 174 60 103 111 105 100 21AD 2.6455 1.6557 0.759 0.853 0.86 1.4794 7.5402 2.8301 1.2526 4.3316 3.3847 0.6046 p -value 0.042 0.1434 0.5114 0.4442 0.4395 0.1815 2e-04 0.0335 0.2483 0.006 0.0176 0.6416 Table 6. Same as Table 3, but for the 12 patches of the Bellevue Express. considerably heavier tail on the right than the data. This is confirmed by the Cullen and Frey graph of Figure 6b, fromwhich we observe that the skewness of the data is much higher what we would expect from a Gamma distribution(which includes the Erlang) with the same kurtosis. It also tells us that other common heavy-tailed distributions, suchas the lognormal and Weibull distributions, also do not have a good fit in terms of skewness and kurtosis. In Figure 6c,we have compared the data to a 2-branch hyper-Erlang distributions. The parameters were obtained using HyperStar,and equal (cid:174) k = ( , ) , (cid:174) λ ≈ ( . , . ) , and (cid:174) α ≈ ( . , . ) . It can be seen to have a much better fit to the data:after drawing a random sample of size 10 000 from this hyper-Erlang distribution using the mapfit library of R, weconducted a 2-sample Anderson-Darling test to compare this sample to the observation data. The resulting p -value(0.5551) suggests that this distribution has a much better fit to the data. However, there may be a risk of overfitting dueto the high number of parameters (namely 6) in a 2-branch hyper-Erlang distribution. . . . kernel density plot, Erlang N = 105 Bandwidth = 34.16 D en s i t y (a) l Cullen and Frey graph square of skewness k u r t o s i s l Observation Theoretical distributionsnormaluniformexponentiallogisticbetalognormalgamma (Weibull is close to gamma and lognormal) (b) . . . kernel density plot, 2−branch hyper−Erlang N = 105 Bandwidth = 34.16 D en s i t y (c) Fig. 6. Comparisons between a kernel density plot of the patch crossing time observations of Patch 10 in black with a fitted Erlangdistribution (Figure 6a) and a fitted 2-branch hyper-Erlang distribution (Figure 6c) in red. Figure 6b displays a Cullen and Frey graph,which compares the skewness and kurtosis of the observations to several commonly-used distributions.

Performance Evaluation.

The statistical model checking results for the Bellevue Express are displayed in Table 7. Table 6apresents the results for the standard setting, i.e. , with an implicit timetable. The timetable is derived as follows: we firstcompute average completion time of the complete route (namely 6323 seconds, which can be obtained by summing allthe µ -values in Table 5). We then divide this value by 12, as around 12 buses are typically observed to do the routeduring the rush hour period, leading to scheduled inter-departure time of 527 seconds. However, in our experimentswe found that the timetabled setting is overly optimistic. For example, by computing empirical EWTs by recordingthe observed patch departure times in our dataset (which can be noisy because dropped observations result in a biastowards high values) and applying EWT = µ /( σ ) , we found that the empirical EWTs for patches 2, 3, 4, 5, and 6 were a) With a timetable at the end of the patches Patch 1 2 3 4 5 6 7 8 9 10 11 12EWT 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . EVWT — 0 . ± . . ± . . ± . . ± . . ± . — 0 . ± . . ± . . ± . . ± . . ± . BPH 0 . ± . — 0 . ± . . ± . . ± . . ± . — 0 . ± . . ± . . ± . . ± . . ± . (b) Without a timetable at the end of the patches Patch 1 2 3 4 5 6 7 8 9 10 11 12EWT 152 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . EVWT 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . BPH 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Table 7. Same as Table 2, but for the 12 patches of the Bellevue Express in two settings: with a timetable at both termini (as describedin Section 4.3), and without. approximately equal to 204 .

6, 183 .

0, 172 .

8, 188 .

0, and 216 . As mentioned previously, the simulation model can be used to investigate the impact of hypothetical changes to aservice on its performance. In Table 8, we have presented an overview of the simulation results for different headwaycorrection strategies. We first include the EWT as computed from the data. Next is the situation where the services waitat the termini using the procedure described in Section 4.3. Note that the difference between the empirical EWT and itssimulated counterpart is greater in the east-west direction (as evidenced by the EWT in Patch 10) than in the reverse.The next row corresponds to the situation where there is no headway correction even at the termini. With buses nolonger standing still at the termini, the headway variance is much larger than in the standard setting. However, themean observed headway ( ˆ µ Y ) is 150 seconds lower, which is a substantial improvement. Because the EWT is defined asthe observed average waiting minus the timetabled waiting time, we subtract the originally timetabled waiting time.This leads to a net effect of -15 seconds in all patches, compared to between 3 and 32 seconds in the original setting.This means that the increased headway variance is outweighed by the gains in average headway.To improve system performance further, [28] discusses four methods to reduce headway variance: bus holding (increasing dwell times at stops), speed modification (decreasing maximum cruise speed), stop-skipping , and short-turning (switching directions before the end stop). Since the system state in our model does not distinguish whether buses aremoving or standing still, only the fourth of these can be made explicit. However, we can try to capture the behaviour ofspeed modification (and potentially stop-skipping) in an abstract way by lowering the rate at which buses complete the atches. Also, we can mimic bus holding by lower-bounding the exit times from patches based on the headway withthe previous bus. The procedure that we have chosen is as follows. For bus holding, we impose that buses cannot leavepatch j , j ∈ { , . . . , n } , unless the clock y j , which represents the time since the previous bus departure from patch j , isgreater than or equal to some threshold value θ . For speed modification, at each time step, we compute for each bus itsroute progression using the procedure discussed at the end of Section 4.1. With 11 buses on the road, the difference interms of route progression between subsequent buses should be roughly 9 .

09% (this can vary quite a bit near the endpoints). We then introduce the following mechanism: if the route progression difference between bus i and the onefollowing it is greater than 100% · θ , we slow bus i down (via its phase completion rate) by 10%. After all, in order tomaintain headway regularity it may be preferable to use the possibility to go slightly slower than specified by the speedlimit to avoid getting too close to the bus in front. This can be combined with timetables at the termini.The results displayed in Table 8 indicate that bus holding combined with the timetable does have a stabilising effect,with a slightly higher EWT in the patches directly after the termini (presumably due to a higher probability of a bustaking so long to complete the route that it has run out of “slack” time at the terminus), but a lower EWT in the middle ofthe route. If the threshold θ is set too high, then route completion times are such that buses tend to arrive at the terminiafter their timetabled departure time, leading to higher average headways and higher EWTs. By contrast, the impact ofspeed modification with the chosen parameters is very small. The best performance is achieved by a combination ofbus holding with a lack of timetables at the termini, leading to an excess waiting time of between -80 and -100 seconds.Note that reducing the EWT may result in other performance metrics being affected: e.g. , the Averaged In-VehicleTimes (AIVT) as discussed in [27], which concerns the time spent by passengers in vehicles. Passengers are currentlynot part of the model, so we leave this as subject for further research. In this paper, we have presented a novel, fully automated approach for using AVL data to build and parameterisestochastic models of transport services. The model can be used to obtain reliable estimates of the performance of theservice, which can help operators determine whether they are meeting the requirements set by regulators. Furthermore,it can be used to analyse the impact of changes to the system. The data and the code used to conduct the experimentsare available for public use.One direction for future work is to expand the model to incorporate other real-world phenomena. For example, thecurrent model is time-homogeneous and focuses on a specific part of the day (between 10AM to 3PM). An extensionwould be a fully time-dependent model including the rush hours and the night. Time-inhomogeneity is straightforwardlyadded to the model by assigning to each patch multiple sets of Erlang parameters, such that only one set of parameterscan be chosen based on the time of the day. This can be further extended to include weekday/weekend effects andperhaps even seasonal effects. Another addition would be to incorporate weather effects, as bus systems are used moreintensively during periods of heavy precipitation. To realistically parameterise such a model, the AVL dataset wouldneed to be matched with a dataset of weather measurements.A powerful addition to the model would be to include passengers boarding or alighting from the vehicles. However,this requires the AVL measurements to be matched with Automatic Passenger Counting (APC) data, which can bechallenging to obtain, either because this data is not being collected or because operators are understandably hesitantto share it. From the point of view of system reliability, another interesting addition would be to include vehiclebreak-downs, as these can have a big impact on certain performance measures (particularly the EVWT). Extending themodel to include performance measures for non-frequent services, particularly the On-Time Performance (OTP), is traightforward. Finally, instead of modelling a single service, one could create a model of the full network and includecontention between buses for access to bus stops. ACKNOWLEDGEMENTS

This work has been supported by the EU project QUANTICOL, 600708. The authors thank Bill Johnston and PhilipLock of Lothian Buses for providing access to the data and for their helpful feedback.

REFERENCES [1] M. Ahmed, S. Karagiorgou, D. Pfoser, and C. Wenk. A comparison and evaluation of map construction algorithms using vehicle tracking data.

GeoInformatica , 19(3):601–632, 2015.[2] Christos Alexopoulos and Andrew F Seila. Implementing the batch means method in simulation experiments. In

Winter Simulation Conference ,pages 214–221. Citeseer, 1996.[3] T. Altiok. On the phase-type approximations of general distributions.

IIE Transactions , 17(2):110–116, 1985.[4] S. Asmussen, O. Nerman, and M. Olsson. Fitting phase-type distributions via the em algorithm.

Scandinavian Journal of Statistics , pages 419–441,1996.[5] A. Aziz, K. Sanwal, V. Singhal, and R. Brayton. Model-checking continuous-time Markov chains.

ACM Transactions on Computational Logic (TOCL) ,1(1):162–170, 2000.[6] C. Baier, B. Haverkort, H. Hermanns, and J.-P. Katoen. Model-checking algorithms for continuous-time Markov chains.

IEEE Transactions on softwareengineering , 29(6):524–541, 2003.[7] C. Baier and J.-P. Katoen.

Principles of model checking . MIT press, 2008.[8] D.S. Berry and D.M. Belmont. Distribution of vehicle speeds and travel times. In

Proceedings of the Second Berkeley Symposium on MathematicalStatistics and Probability , 1951.[9] J. Biagioni and J. Eriksson. Map inference in the face of noise and disparity. In

Proceedings of the 20th International Conference on Advances inGeographic Information Systems , pages 79–88. ACM, 2012.

EWT per patch ˆ µ Y per patch2 5 7 10 2 5 7 10 Observed from the data θ Timetable at end points only 4 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . No headway correction at all − . ± . − . ± . − . ± . − . ± . . ± . . ± . . ± . . ± . Bus holding in all patches 120 . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Bus holding only (all patches) 60 . − . ± . − . ± . − . ± . − . ± . . ± . . ± . . ± . . ± . . − . ± . − . ± . − . ± . − . ± . . ± . . ± . . ± . . ± . . − . ± . − . ± . − . ± . − . ± . . ± . . ± . . ± . . ± . . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Speed modification 0 .

05 4 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . .

01 4 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . .

15 5 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Table 8. Comparison of the impact of several performance improvement strategies in terms of their EWT and observed averageheadway ˆ µ Y .

10] J. Biagioni, T. Gerlich, T. Merrifield, and J. Eriksson. Easytracker: automatic transit tracking, mapping, and arrival time prediction using smartphones.In

Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems , pages 68–81. ACM, 2011.[11] M. Bladt. A review on phase-type distributions and their use in risk theory.

ASTIN Bulletin: The Journal of the IAA , 35(1):145–161, 2005.[12] P. Cao, T. Miwa, and T. Morikawa. Modeling distribution of travel time in signalized road section using truncated distribution.

Procedia-Social andBehavioral Sciences , 138:137–147, 2014.[13] F. Cevallos, X. Wang, Z. Chen, and A. Gan. Using AVL data to improve transit on-time performance.

Journal of Public Transportation , 14(3):21–40,2011.[14] F. Cevallos, X. Wang, and A. Gan. Using a web-service to monitor transit on-time performance. In

Presented at the 19th ITS World Congress, Vienna ,2012.[15] E. Clarke, E. Emerson, and A. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications.

ACM Transactionson Programming Languages and Systems (TOPLAS) , 8(2):244–263, 1986.[16] E.M. Clarke, O. Grumberg, and D. Peled.

Model checking . MIT press, 1999.[17] J.J. Davies, A.R. Beresford, and A. Hopper. Scalable, distributed, real-time map generation.

IEEE Pervasive Computing , 5(4):47–54, 2006.[18] D.H. Douglas and T.K. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature.

Cartographica: The International Journal for Geographic Information and Geovisualization , 10(2):112–122, 1973.[19] M.J. Faddy and S.I. McClean. Analysing data on lengths of stay of hospital patients using phase-type distributions.

Applied Stochastic Models inBusiness and Industry , 15(4):311–317, 1999.[20] Stephen Gilmore, Daniël Reijsbergen, and Andrea Vandin. Transient and steady-state statistical analysis for discrete event simulators. In

InternationalConference on Integrated Formal Methods , pages 145–160. Springer, 2017.[21] Y.E. Hawas. Simulation-based regression models to estimate bus routes and network travel times.

Journal of Public Transportation , 16(4), 2013.[22] A. Hofleitner, R. Herring, and A. Bayen. Probability distributions of travel times on arterial networks: a traffic flow and horizontal queuing theoryapproach. In , number 12-0798, 2012.[23] J.G. Jetcheva, Y.-C. Hu, S. PalChaudhuri, A.K. Saha, and D.B. Johnson. CRAWDAD dataset rice/ad_hoc_city (v. 2003-09-11). Downloaded fromhttp://crawdad.org/rice/ad_hoc_city/20030911 on 9 June 2016.[24] W.H. Lin and R.L. Bertini. Modeling schedule recovery processes in transit operations for bus arrival time prediction.

Journal of AdvancedTransportation , 38(3):347–365, 2004.[25] L. Luisa Vissat, A. Clark, and S. Gilmore. Finding optimal timetables for Edinburgh bus routes.

Electronic Notes in Theoretical Computer Science ,310:179–199, 2015.[26] J. Mendes-Moreira, L. Moreira-Matias, J. Gama, and J.F. de Sousa. Validating the coverage of bus schedules: a machine learning approach.

InformationSciences , 293:299–313, 2015.[27] L. Moreira-Matias, O. Cats, J. Gama, J. Mendes-Moreira, and J.F. de Sousa. An online learning approach to eliminate bus bunching in real-time.

Applied Soft Computing , 47:460–482, 2016.[28] L. Moreira-Matias, J. Mendes-Moreira, J.F. de Sousa, and J. Gama. Improving mass transit operations by using AVL-based systems: a survey.

IEEETransactions on Intelligent Transportation Systems , 16(4):1636–1653, 2015.[29] M.F. Neuts and K.S. Meier. On the use of phase type distributions in reliability modelling of systems with two components.

Operations-Research-Spektrum , 2(4):227–234, 1981.[30] J.L. Pline.

Traffic engineering handbook . Prentice-Hall, 1992.[31] D. Reijsbergen and S. Gilmore. Formal punctuality analysis of frequent bus services using headway data. In

Computer Performance Engineering ,pages 164–178. Springer, 2014.[32] D. Reijsbergen, S. Gilmore, and J. Hillston. Patch-based modelling of city-centre bus movement with phase-type distributions.

Electronic Notes inTheoretical Computer Science , 310:157–177, 2015.[33] D. Reijsbergen and R. Ratan. Probabilistic modelling of the impact on bus punctuality of a speed limit proposal in Edinburgh. In

Proceedings of the9th EAI International Conference on Performance Evaluation Methodologies and Tools (ValueTools) , 2015.[34] P. Reinecke, T. Krauß, and K. Wolter. HyperStar: Phase-type fitting made easy. In

Proceedings of the Ninth International Conference on the QuantitativeEvaluation of Systems (QEST) , pages 201–202, 2012.[35] The Scottish government. Bus Punctuality Improvement Partnerships (BPIPs) guidance document. 2009.[36] S. Sebastio and A. Vandin. MultiVeStA: Statistical model checking for discrete event simulators. In

Proceedings of the 7th International Conference onPerformance Evaluation Methodologies and Tools , pages 310–315. ICST (Institute for Computer Sciences, Social-Informatics and TelecommunicationsEngineering), 2013.[37] K. Sen, M. Viswanathan, and G.A. Agha. Vesta: A statistical model-checker and analyzer for probabilistic systems. In

Proceedings of QEST , 2005.[38] A. Shalaby and A. Farhan. Bus travel time prediction model for dynamic operations control and passenger information systems.

TransportationResearch Board Annual Meeting , 2, 2003.[39] W. Shi, S. Shen, and Y. Liu. Automatic generation of road network map from massive GPS, vehicle trajectories. In , pages 1–6. IEEE, 2009.[40] A. Skabardonis and N. Geroliminis. Real-time estimation of travel times on signalized arterials. Technical report, 2005.

41] Natalie M Steiger, Emily K Lada, James R Wilson, Jeffrey A Joines, Christos Alexopoulos, and David Goldsman. ASAP3: A batch means procedurefor steady-state simulation analysis.

ACM Transactions on Modeling and Computer Simulation (TOMACS) , 15(1):39–73, 2005.[42] D. Stroock.

An introduction to Markov processes , volume 230. Springer Science & Business Media, 2013.[43] W. Suwardo, M. Napiah, and I. Kamaruddin. ARIMA models for bus travel time prediction.

J. Institut. Eng. , 71(2):49–58, 2010.[44] P.R. Tétreault and A.M. El-Geneidy. Estimating bus run times for new limited-stop service using archived AVL and APC data.

TransportationResearch Part A: Policy and Practice , 44(6):390–402, 2010.[45] H.L.S. Younes and R.G. Simmons. Probabilistic verification of discrete event systems using acceptance sampling. In

International Conference onComputer Aided Verification , pages 223–235. Springer, 2002.

A APPENDIX

200 400 600 800 1000 1200 . . . . . . empirical vs fitted cdf, patch 1 (red) seconds c u m u l a t i v e p r obab ili t y llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . . . . empirical vs fitted cdf, patch 2 (green) seconds c u m u l a t i v e p r obab ili t y lllllllllllllllllllllllllllllllllllllllllllllllllll l ll . . . . . . empirical vs fitted cdf, patch 3 (blue) seconds c u m u l a t i v e p r obab ili t y llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . . . . empirical vs fitted cdf, patch 4 (yellow) seconds c u m u l a t i v e p r obab ili t y l lllllllllllllllllllllllllllllllllllllllllllllllllllll lll ll . . . . . . empirical vs fitted cdf, patch 5 (magenta) seconds c u m u l a t i v e p r obab ili t y l llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lll . . . . . . empirical vs fitted cdf, patch 6 (cyan) seconds c u m u l a t i v e p r obab ili t y l llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll ll . . . . . . empirical vs fitted cdf, patch 7 (orange) seconds c u m u l a t i v e p r obab ili t y lll l llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . . . . empirical vs fitted cdf, patch 8 (olive green) seconds c u m u l a t i v e p r obab ili t y l llllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllll l . . . . . . empirical vs fitted cdf, patch 9 (salmon) seconds c u m u l a t i v e p r obab ili t y llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l . . . . . . empirical vs fitted cdf, patch 10 (light blue) seconds c u m u l a t i v e p r obab ili t y l lllllllllllllllllllllllllllllllllllllllllllllllllllllllll lll Fig. 7. Empirical (black dots) vs fitted (smooth lines) CDF plots for all of the 10 final patches in the Edinburgh dataset.