Large-scale Analysis and Simulation of Traffic Flow using Markov Models
Renátó Besenczi, Norbert Bátfai, Péter Jeszenszky, Roland Major, Fanny Monori, Márton Ispány
Department of Information Technology, Faculty of Informatics, University of Debrecen, 4002 Debrecen, PO Box 400, Hungary
Abstract
Modeling and simulating the movement of vehicles in established transportation infrastructures, especially in large urban road networks, is an important task. It helps with understanding and handling traffic problems, optimizing traffic regulations and adapting traffic management in real time for unexpected disaster events. A mathematically rigorous stochastic model that can be used for traffic analysis was proposed earlier by other researchers; it is based on an interplay between graph and Markov chain theories. This model provides a transition probability matrix which describes the traffic dynamics, with a unique stationary distribution of the vehicles on the road network. In this paper, a new parametrization is presented for this model by introducing the concept of the two-dimensional stationary distribution, which can handle the traffic dynamics together with the vehicles' distribution. In addition, the weighted least squares estimation method is applied for estimating this new parameter matrix using trajectory data. In a case study, we apply our method to the Taxi Trajectory Prediction dataset and road network data from the OpenStreetMap project, both publicly available. To test our approach, we have implemented the proposed model in software. We have run simulations at medium and large scales, and both the model and the estimation procedure, based on artificial and real datasets, have proved satisfactory. In a real application, we have unfolded a stationary distribution on the map graph of Porto based on the dataset. The approach described here combines techniques that, to our knowledge, have not previously been used together to analyze traffic on large road networks.
Keywords:
Road network, Traffic simulation, Discrete time Markov chain, Stationary distribution, Weighted least squares estimation
1. Introduction
In the past decade, research and development of Smart City applications have become an active topic. These applications provide services to inhabitants which can make everyday life easier [1]. In addition, these services contain solutions such as intelligent city planning, crowd-sourcing, as well as crisis and disaster management [2]. These applications will also both generate and make use of big data analytics, which will arise from the wide availability of cloud computing. In addition, complex sensor systems, called the Internet of Things (IoT), are being implemented that could aid the operation of these applications. By the year 2050, 70% of Earth's population is expected to live in urban areas [3]. City infrastructures will face new challenges from many factors; one such factor is urban traffic. In recent years, the research and development of smart city applications focusing on urban transportation

* Corresponding author.
Email addresses: [email protected] (Renátó Besenczi), [email protected] (Norbert Bátfai), [email protected] (Péter Jeszenszky), [email protected] (Roland Major), [email protected] (Fanny Monori), [email protected] (Márton Ispány)

and traffic have become a vivid topic. Since more and more people live in urban areas, a solution to the problem of dense traffic is in high demand. Moreover, in the past few years, many developments have occurred in the automobile industry that fundamentally changed the way we think about personal mobility. Autonomous or driverless cars are being introduced by several tech giants and car companies; some of them are on the market already. From driving-aid systems (e.g., pedestrian observers, lane support systems), which have become ubiquitous by the end of the 2010s, we can envision the widespread use of driverless cars in the 2020s. In addition, pure electric cars have been on the market for a few years and will certainly continue to expand their market share. In a previous paper [4], we introduced a smart city application that can support the operation of self-driving and electric cars in urban environments and may provide a common research platform for investigating the connection between autonomous cars and smart cities. Our main question was: what can a city-controlled IT solution offer to these cars to allow them to be operated more efficiently? The application was a prototype and some of its features were not defined or developed satisfactorily. One such feature was the traffic simulation algorithm included in the application. In a previous paper [5], we raised the question whether we can develop an algorithm for controlling the simulated cars that maintains the real distribution of cars. An additional question is how we can fit this algorithm by estimating its parameters from real data, e.g., data in the form of trajectories.
Our results presented in this paper are an answer to these questions. In [6], see also [7] and [8], a stochastic model is proposed which can handle the traffic in an urban network by a mathematically rigorous method. This model is based on a discrete time Markov chain on the road graph, which plays the role of the state space. In the traffic interpretation, the transition probability matrix describes the dynamics of the traffic while its unique stationary distribution corresponds to the traffic equilibrium (or steady) state on the road network. In this steady state, the distribution of vehicles remains invariant locally in time under the transport dynamics. Thus, this stationary distribution of the Markov chain can be interpreted as the momentary true distribution of the vehicles on the road network. Our contributions in this paper are the following. Based on [6], we introduce the concepts of a Markov random walk, which describes the motion of an individual vehicle, and Markov traffic, which describes the entire traffic on the road network. We determine the stationary distribution of the Markov traffic as a multinomial distribution, see formula (5). We present how the ergodic theory of finite Markov chains implies that a complex traffic event can be approximated well by the stationary distribution of a Markov chain on the road network; see Theorem 2. This theorem also provides the theoretical ground of our simulation algorithm. We reparametrize the model by introducing the concept of the two-dimensional stationary distribution, whose equi-distributed marginals are the unique stationary distribution of the transition probability matrix. To estimate this parameter matrix, the weighted least squares (WLS) estimation, a kind of composite (quasi-)likelihood method, is applied, see [9]. In Theorem 3, we show that the WLS estimator of the two-dimensional stationary distribution can be expressed explicitly.
Moreover, this estimation method provides a computationally effective technique at a large scale since the MapReduce paradigm can easily be applied to it. Finally, we present how a city-controlled IT solution can be developed which is able to simulate the traffic on a road network in a way that fits the real data. Note that the joint application of Markov chains and large graphs to analyze the behavior of complex systems is well known in several fields, e.g., distributed systems [10], geophysics [11] and biology [12]. This paper is structured as follows: the remainder of this chapter provides a short introduction to our software system and a brief outlook on other simulation tools and approaches. In section 2, we describe our proposed traffic simulation model in detail. In section 3, we discuss our findings, and in section 4, we conclude the paper. The Appendix provides the proofs of the theoretical results and a toy example to demonstrate the proposed estimation method. This research follows our development of a traffic simulation platform initiative called rObOCar World Championship (or OOCWC for short) [4], [5], [13], [14], [15]. The OOCWC is a multiagent-oriented environment for creating urban traffic simulations. The goal is to offer a research and educational platform to investigate urban traffic control algorithms and the relationship between smart cities and self-driving cars. In our vision, a software system that is controlled by a city administration can support these cars well, because a city possesses all the necessary information (congestion, accidents, detours, etc.) for effective route planning. The traffic simulations are performed by one of its components called
Robocar City Emulator (RCE). The first rapid prototype of RCE was called Justine, an open source project released under the GNU GPL v3 that can be downloaded from GitHub. It has three main components: the RCE itself and two visualization tools, rcwin and rclog. This prototype uses the OpenStreetMap (OSM) database and processes it with the Osmium Library. The result of this processing is a routing map graph and a Boost Graph Library graph. The routing map graph is then placed into a shared memory segment. This prototype was also called "Police Edition", because it could simulate police pursuit (macroscopic, with multiple police agents). In addition, it was somewhat capable of simulating traffic, but only as a technical demonstration, because every traffic unit moved randomly. In the following, we focus only on the traffic simulation algorithm of the RCE. The traffic simulation model of the RCE is based on the Nagel-Schreckenberg (NaSch) model [16], because it uses a cell-based approach as well. Moreover, it can be considered a standalone multi-agent system. The simulation takes place on a rectangular part of the OSM. We slice all edges into parts 3 meters long. Based on NaSch terminology, these parts are called cells; therefore, the cell length is equal to 3 meters. In contrast with the NaSch model, where a cell may contain many cars, in our simulation one cell can contain only one car, so an edge can contain only a given number of cars (calculated as the edge length divided by the 3-meter cell length). In the original implementation, the simulation algorithm moves the cars by random walk. So, when a car arrives at a graph vertex (i.e., intersection), it selects the next edge (i.e., next street) according to a uniform distribution. For a detailed description of the operation of RCE, see paper [4]. A simulation can be initialized based on measured data or some prescribed distribution, e.g., a uniform one, on the road network.
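The capacity bookkeeping and the uniform random-walk step described above can be sketched as follows. This is a toy Python illustration under our own naming conventions, not the C++ code of Justine; the edge lengths are made-up values, and edge traversal is compressed into a single tick.

```python
import random

# Toy road graph from Fig. 1: adjacency lists of directed edges.
GRAPH = {1: [2], 2: [1, 3, 4], 3: [4], 4: [2, 5], 5: [2]}

# Edge capacity is the edge length divided by the 3-metre cell length;
# the 30 m lengths are illustration values, not OSM data.
CELL_LENGTH_M = 3.0
EDGE_LENGTH_M = {(u, v): 30.0 for u in GRAPH for v in GRAPH[u]}

def capacity(edge):
    """Number of cells (and hence cars) an edge can hold."""
    return int(EDGE_LENGTH_M[edge] // CELL_LENGTH_M)

def step(position, occupancy):
    """Move a car waiting at an intersection: pick the next street
    uniformly at random, as the original RCE prototype does, but
    only enter it if a free cell is available."""
    nxt = random.choice(GRAPH[position])
    edge = (position, nxt)
    if occupancy.get(edge, 0) < capacity(edge):
        occupancy[edge] = occupancy.get(edge, 0) + 1
        return nxt
    return position  # edge full: the car stays put this tick

random.seed(1)
occ = {}
car = 2
for _ in range(5):
    car = step(car, occ)
assert car in GRAPH  # the car never leaves the road network
```

Because the next street is chosen uniformly, this sketch reproduces the "every traffic unit moved randomly" behavior of the prototype; the Markov traffic model of section 2 replaces the uniform choice with estimated transition probabilities.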
During the simulation, we can observe how the distribution of the cars changes. One of the reasonable
(https://github.com/nbatfai/robocar-emulator)
… is a modeling software environment that supports macroscopic and microscopic traffic simulations. PTV Vissim is a microscopic, multi-modal traffic simulation package. It supports simulation of traffic patterns and can display all road users and their interactions. Although the above-mentioned applications are widely used in traffic analysis and planning, the main focus of their simulation algorithms is on microscopic traffic events. In contrast, our software system focuses only on the traffic flow of the whole city or, to be more precise, on the traffic graph. Several approaches exist for short-term traffic flow prediction. These models are based on, for example, Box-Jenkins time-series analysis with ARIMA models [21], [22], [23], [24], Kalman filter theory [25], [26], non-parametric methods (k-NN, kernel, local regression) [27], [28], [29], [30], exponential smoothing [31], [32], spectral analysis [33] or wavelets [34], [35], [36]. In addition, several approaches use machine learning and data mining techniques, such as support vector regression [37], artificial neural networks [38], [39], [40], Bayesian networks [41] or deep learning [42]. Some applications are based on computational intelligence techniques, e.g., linear genetic programming [43] or fuzzy logic [44], [45], [46], but seldom can we find approaches based on Markov models [47] and [6], mentioned previously.
2. Modeling traffic flow by Markov chains on graphs
(PTV Vissim: http://vision-traffic.ptvgroup.com/en-us/products/ptv-vissim/)

In this section, we overview a traffic simulation model that uses the tools of graph theory and Markov chains. First,
Figure 1: A simple road network.

in sections 2.1 and 2.2 we outline the basic concepts in the fields of graph theory and finite Markov chains, respectively. Then, in 2.3, we describe the proposed model called "Markov traffic" in detail. As a case study, we use a publicly available trajectory dataset, namely, the Taxi Trajectory Prediction dataset (kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i); in section 2.4, we outline the key points of how we selected and processed the trajectory data. In section 2.5, we describe how we process OSM data and build a traffic graph from this data. Section 2.6 is devoted to the statistical inference for Markov traffic.

In this section, we outline the concepts of graph theory that are necessary for modeling traffic flow. A standard textbook on graph theory is [48].
Road network, line digraph and degree distri-butions.
Let G = (V, E) be a directed graph (digraph) where V and E denote the set of vertices or nodes and the set of directed edges or arcs of the graph, respectively. In the sequel, vertices are denoted by u, v, w and edges by e, f, g. For a directed edge e = (v, w) ∈ E we also use the notation v → w. We suppose that G is a simple digraph in the sense that it does not contain multiple arrows, i.e., two or more edges that connect the same two vertices in the same direction. The digraph G, called a road network in this paper, represents the road system of a city. More precisely, we have the following definition, see [49].

Definition 1. A road network G is a simple directed graph, G = (V, E), where V is a set of nodes representing the terminal points of road segments, and E is a set of directed edges denoting road segments.

A road segment e = (v, w) ∈ E is a directed edge in the road network graph with two terminal points v and w. The vehicle flow on this edge is from v to w. Fig. 1 presents a simple example of a road network.

Let S denote the diagonal set of V, i.e., S := {(v, v) | v ∈ V}. If v → v, i.e., (v, v) ∈ E for a v ∈ V, then it is called a loop which connects the vertex v to itself. If there were a loop in a road network then there would be a physical loop v → v (v ∈ V) in the road network. From a practical point of view, we rule out this possibility because in this case a vehicle could move in an infinite cycle. Thus, we suppose that E ∩ S = ∅. Later, however, when we define the stochastic traffic on a road network, we allow loops in order to ensure that a vehicle may remain at the same node or edge of the road network after a time step. For v ∈ V define v⁻ := {e ∈ E | ∃ u ∈ V : e = (u, v)} and v⁺ := {e ∈ E | ∃ w ∈ V : e = (v, w)}, i.e., v⁻ and v⁺ are the sets of arrows into and out of the node v, respectively.
Note that deg⁻(v) = |v⁻| and deg⁺(v) = |v⁺| are the indegree and outdegree of v, respectively, where |·| denotes the cardinality of a set.

For a digraph G = (V, E) another digraph can be associated in the following way. Let the set V′ of vertices of this new digraph be the set of directed edges E of G, and let the set E′ of its directed edges consist of the ordered pairs (e, f), e, f ∈ E, such that there exist u, v, w ∈ V with e = (u, v) and f = (v, w), i.e., u → v → w is a path (or dipath) in G of length 2. This associated digraph is called a directed line graph, see Section 4.5 in [48], shortly line digraph, or line road network (network line graph, see [50]), and it is denoted by L(G) = (V′, E′). The elements of E′ can be described by triplets (u, v, w), where u, v, w ∈ V, (u, v), (v, w) ∈ E, and for a directed edge in L(G) we use the notation (u, v) → (v, w) too.

The basic difference between the digraph and line digraph views of a road network is that the former assigns the vehicles moving in a city to the vertices while the latter assigns them to the edges. One can refer to the former as the first-order (primal) network and to the latter as the second-order (dual) network, see [51] ([52]). These two kinds of graphs are both useful because in a road network certain measurements are associated with the crossings (vertices) and certain measurements are associated with the road segments (directed edges). When we are concerned with comparing measurements associated with crossings, we will be concerned with the adjacency relationships of crossings, and so with the road network. However, when we are concerned with measurements associated with road segments, we will be concerned with the adjacency relationships of road segments, and so our analyses will involve the line road network.

The degree distributions of the digraphs G and L(G), respectively, are given in the following way. For all i = 0, 1, 2, . . .
define n⁺ᵢ := |{v ∈ V | deg⁺(v) = i}|. Then the pairs (i, n⁺ᵢ), i = 0, 1, 2, . . ., form the frequency histogram for the outdegree distribution of G. The indegree frequency histogram is defined similarly as (i, n⁻ᵢ), i = 0, 1, 2, . . ., where n⁻ᵢ := |{v ∈ V | deg⁻(v) = i}|. On the other hand, for all i = 0, 1, 2, . . ., define m⁺ᵢ := Σ_{v∈G⁺ᵢ} deg⁻(v) where G⁺ᵢ := {v ∈ V | deg⁺(v) = i}. Then the pairs (i, m⁺ᵢ), i = 0, 1, 2, . . ., form the frequency histogram for the outdegree distribution of L(G). Note that n⁺ᵢ = |G⁺ᵢ| for all i. Similarly, the pairs (i, m⁻ᵢ), i = 0, 1, 2, . . ., form the frequency histogram for the indegree distribution of L(G), where m⁻ᵢ := Σ_{v∈G⁻ᵢ} deg⁺(v) and G⁻ᵢ := {v ∈ V | deg⁻(v) = i}. (Note that n⁻ᵢ = |G⁻ᵢ| for all i.) One can easily see that the supports of the two indegree (outdegree) histograms are the same. For the Porto example (described later in this paper), the above-mentioned degree distributions can be seen in Fig. 2. These histograms corroborate the fact that Porto's road network is a sparse graph, since there is no node with in- or outdegree higher than 6 and the ratio of the number of edges to the number of nodes is less than 2, see Fig. 7.

Connectedness, periodicity and adjacency matrix.
Recall that a sequence v₀, . . . , vℓ ∈ V, ℓ ∈ N (N := {1, 2, . . .}), is called a walk of length ℓ if v₀ → v₁ → · · · → vℓ. A walk is called a path if its elements are different vertices. For a pair u, v ∈ V, u ≠ v, it is said that v is reachable from u if there exists a walk v₀, v₁, . . . , vℓ such that u = v₀ and v = vℓ. Clearly, if v is reachable from u then there is a path from u to v. A digraph G is said to be strongly connected (diconnected) if every vertex is reachable from every other vertex.

We can define vectors (functions) and matrices (operators or kernels) on V in the following way. Let α : V → R be a real function on V and let F(V, R) denote the set of real functions on V. We shall also use the notations α(v) = α_v for all v ∈ V and α = (α_v)_{v∈V}. Then F(V, R) is a finite dimensional vector space with the usual inner product ⟨α, β⟩ = α⊤β := Σ_{v∈V} α_v β_v, α, β ∈ F(V, R). Let T : V × V → R be a real function. We shall also use the notations T(u, v) = t_{uv}, u, v ∈ V, and T = (t_{uv})_{u,v∈V}. Then T is called a matrix, operator or kernel on V and induces a linear operator on F(V, R) in the following way: for each α ∈ F(V, R), T(α) ∈ F(V, R) is defined as T(α)_u := Σ_{v∈V} t_{uv} α_v, u ∈ V. For the sake of simplicity, we write T(α) = Tα as a matrix-vector product. If the support of T (the set {(u, v) | u, v ∈ V : t_{uv} ≠ 0} in V × V) is a subset of E (E ∪ S) then T is called G-subordinated in the strong (weak) sense.

Let A = (a_{uv})_{u,v∈V} denote the adjacency matrix of the digraph G, i.e., A is a matrix on V with a_{uv} = 1 if and only if (u, v) ∈ E, and 0 otherwise. Clearly, the support of A is E, i.e., A is a G-subordinated matrix in the strong sense (a_{vv} = 0 for all v ∈ V). Note that A is not necessarily symmetric.
The number of directed walks from vertex u to vertex v of length k is the entry in the u-th row and v-th column of the matrix Aᵏ. For example, in Fig. 1, the number of directed walks of length 4 from vertex 2 to vertex 4 is 4, see Appendix A. One can easily check that G is strongly connected if and only if there is a positive integer k such that the matrix I + A + · · · + Aᵏ is positive, i.e., all the entries of this matrix are positive. The indegree and outdegree of a vertex v can be expressed by the adjacency matrix as deg⁻(v) = Σ_{u∈V} a_{uv} and deg⁺(v) = Σ_{u∈V} a_{vu}. Introduce the vectors d⁻ := (deg⁻(v))_{v∈V}
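Both statements can be checked numerically on the Fig. 1 network. The sketch below is our own numpy illustration, not code from the paper's software:

```python
import numpy as np

# Adjacency matrix of the road network in Fig. 1 (vertices 1..5).
A = np.zeros((5, 5), dtype=int)
for u, v in [(1, 2), (2, 1), (2, 3), (2, 4), (3, 4), (4, 2), (4, 5), (5, 2)]:
    A[u - 1, v - 1] = 1

# (A^k)[u, v] counts the directed walks of length k from u to v.
A4 = np.linalg.matrix_power(A, 4)
assert A4[2 - 1, 4 - 1] == 4  # four directed walks of length 4 from 2 to 4

# Strong connectivity: I + A + ... + A^k is entrywise positive for some k.
S, P = np.eye(5, dtype=int), np.eye(5, dtype=int)
for _ in range(5):
    P = P @ A
    S += P
assert (S > 0).all()  # the road network is strongly connected
```

The four walks of length 4 from vertex 2 to 4 are exactly those counted by the matrix power, e.g. 2→1→2→3→4.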
and d⁺ := (deg⁺(v))_{v∈V}. Then we have d⁻ = A⊤𝟙 and d⁺ = A𝟙, where 𝟙 := (1)_{v∈V} is the constant unit function.

Figure 2: The degree distribution histograms of the Porto map traffic graph: (a) indegree and (b) outdegree distributions of the vertices; (c) indegree and (d) outdegree distributions of the edges.

Clearly, the line digraph of a strongly connected digraph is also strongly connected. Namely, if e = (u, v) ∈ V′ (= E) and f = (w, z) ∈ V′ (= E) are arbitrary such that e ≠ f, then, since G is strongly connected, there exists a walk (or a path) of length ℓ in G such that v = v₀ → v₁ → . . . → vℓ = w, where v₀, . . . , vℓ ∈ V (ℓ ∈ N), and thus we have e = (u, v) → (v₀, v₁) → . . . → (vℓ₋₁, vℓ) → (w, z) = f, i.e., there exists a walk (or a path) in L(G) between the vertices e, f ∈ V′. If u → v → u for a pair u, v ∈ V then we have (u, v) → (v, u) → (u, v) in the line digraph, i.e., vehicles can turn back at vertex u into v. Sometimes the traffic regulations do not allow this kind of reversal, i.e., the edge set E′ in L(G) must not contain some triplets (u, v, u), while some of these triplets are needed for L(G) to be strongly connected. By deleting all of the unnecessary triplets (u, v, u), u, v ∈ V, such that the remaining line digraph is still strongly connected, we get the minimal strongly connected line digraph of G. This line digraph is denoted by ML(G). For example, the vertices of ML(G) for G defined in Fig. 1 are given in Table 1.

Recall that a cycle C ⊂ V in a digraph G is a path v₀ → v₁ → . . . → vℓ → v₀. Here ℓ(C) = ℓ is called the length of C. A digraph G is said to be aperiodic if the greatest common divisor of the lengths of its cycles is one. Formally, the period of G is defined as per(G) := gcd{ℓ > 0 | ∃ C ⊂ V cycle such that ℓ(C) = ℓ}. Then G
is called aperiodic if per(G) = 1. Clearly, if a digraph G is aperiodic then its line digraph L(G) is also aperiodic. This statement follows from the following fact: if v₀ → v₁ → . . . → vℓ → v₀ is a cycle in G then (v₀, v₁) → (v₁, v₂) → . . . → (vℓ, v₀) → (v₀, v₁) is a cycle in L(G). Thus, if there exists a cycle C ⊂ V such that ℓ(C) = ℓ for some ℓ > 0, then there exists a cycle C′ ⊂ V′ such that ℓ(C′) = ℓ. It is well known that the adjacency matrix A of an aperiodic, strongly connected graph G is primitive, i.e., irreducible with only one eigenvalue of maximum modulus. Primitivity is equivalent to the following quasi-positivity: there exists k ∈ N such that the matrix Aᵏ > 0, see Section 8.5 in [53].
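The line digraph construction and the inheritance of aperiodicity can be illustrated on the Fig. 1 network. The following is our own Python sketch: it computes the period as the gcd of closed-walk lengths, which for a strongly connected digraph equals the gcd of cycle lengths:

```python
import numpy as np
from math import gcd
from functools import reduce

# Road network of Fig. 1 and its line digraph L(G):
# vertices of L(G) are the edges of G, with (u,v) -> (v,w) iff u -> v -> w.
EDGES = [(1, 2), (2, 1), (2, 3), (2, 4), (3, 4), (4, 2), (4, 5), (5, 2)]
L_ARCS = [(e, f) for e in EDGES for f in EDGES if e[1] == f[0]]

def period(nodes, arcs):
    """gcd of the closed-walk lengths of a strongly connected digraph,
    i.e., per(G); trace(A^k) > 0 detects a closed walk of length k."""
    idx = {v: i for i, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)), dtype=np.int64)
    for u, v in arcs:
        A[idx[u], idx[v]] = 1
    lengths, P = [], np.eye(len(nodes), dtype=np.int64)
    for k in range(1, 2 * len(nodes) + 1):
        P = P @ A
        if np.trace(P) > 0:
            lengths.append(k)
    return reduce(gcd, lengths)

# G has cycles of lengths 2 (2->1->2) and 3 (2->3->4->2), so per(G) = 1,
# and the corresponding cycles in L(G) give per(L(G)) = 1 as well.
assert period([1, 2, 3, 4, 5], EDGES) == 1
assert period(EDGES, L_ARCS) == 1
```

Since the network is also strongly connected, per(G) = 1 implies that its adjacency matrix is primitive, so some power Aᵏ is entrywise positive.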
Open road networks and their closures.
In general, there are vehicles which leave or enter the city. To model these two possibilities, V is augmented by a new ideal vertex 0 which denotes the world outside of the city. This approach is similar to that applied for public transport in [54]. Let V̄ := V ∪ {0}. Then, additional directed edges which contain vertex 0 are also added to E. In this case, (v, 0) denotes that vehicles can leave the city at vertex v, and (0, v) denotes that new vehicles can enter the city at vertex v, where v ∈ V. Let Ē denote the augmentation of E by directed edges for getting into and out of the city. Note that Ē is not allowed to contain the loop (0, 0). The augmented digraph is denoted by Ḡ = (V̄, Ē) and it is called the closure of a road network G. For e = (v, w) ∈ Ē we also use the notation v → w. In what follows, we suppose that there exist u, v ∈ V such that u → 0 → v.

In the sequel, each definition (strong connectedness, periodicity, line digraph) given for G can be extended to Ḡ in a natural way. Note that in the augmented line digraph L(Ḡ) = (V̄′, Ē′) the elements of the edge set Ē′ can be described by triplets (u, v, w), where u, v, w ∈ V̄; if v = 0 then u, w ≠ 0, and if u or w is 0 then v ≠ 0, because the triplets (0, 0, v), (v, 0, 0) and (0, 0, 0) are excluded from Ē′. One can easily see that if G is strongly connected then its closure Ḡ is also strongly connected. Moreover, the strongly connected components of G, if there exist more than one, can be connected through the ideal vertex 0, resulting in a strongly connected Ḡ. Thus, in this case, the augmented line digraph will also be strongly connected. Clearly, if G is aperiodic then Ḡ is aperiodic too. In the rest of this paper, it is assumed that the road network is closed, i.e., the vehicles cannot get into or out of the road system of the city augmented with the ideal vertex 0.

In this section, we summarize some basic concepts and results of the theory of finite Markov chains with their interpretations and consequences for traffic flow modeling. Some textbooks on this field are [55] and [56]. We can define two kinds of probability distributions on a road network G, considering the set V or E as the state space, respectively. In fact, we can handle both cases together by defining probability distributions on the set of vertices of the digraph G or the line digraph L(G), accordingly. However, we should note that the Markov kernels defined on the line digraph must be defined with particular attention.

A probability distribution (p.d.) on V is the vector π := (π_v)_{v∈V} where π_v ≥ 0, v ∈ V, and Σ_{v∈V} π_v = 1. That is, a p.d. π is a normalized V → R₊ function. We can think of π_v as the proportion of the number of vehicles which drive through the crossing v with respect to the whole number of vehicles in the city at a fixed time period. A Markov kernel or transition probability matrix on V is defined as a real kernel P := (p_{uv})_{u,v∈V} such that p_{uv} ≥ 0, u, v ∈ V, and Σ_{v∈V} p_{uv} = 1 for all u ∈ V, i.e., p_u := (p_{uv})_{v∈V} is a p.d. on V for all u ∈ V. The quantity p_{uv} ∈ [0,
1] is called the transition probability from vertex u to vertex v. The kernel P is said to be G-subordinated if p_{uv} > 0, u, v ∈ V, implies (u, v) ∈ E or u = v, i.e., P as a matrix on V is G-subordinated in the weak sense. It is well known, see [57], that for a Markov kernel P on V, an associated graph G_P = (V, E_P) can be introduced in the following way: for a pair u, v ∈ V (where the case u = v is also allowed), (u, v) ∈ E_P if and only if p_{uv} > 0. Thus, P is G-subordinated if and only if E_P ⊆ E ∪ S, i.e., G_P is a subgraph of the graph G extended with its diagonal S. In this case, we also use the notation P = (p_e)_{e∈E∪S}. In other words, a G-subordinated Markov kernel P is a stochastic matrix on V with support E ∪ S. Then, the sum condition for the Markov kernel P can be rewritten as:

Σ_{w: v→w} p_{vw} + p_{vv} = 1,  v ∈ V.  (1)

A p.d. π on V is a stationary distribution (s.d.) of the kernel P if Σ_{u∈V} π_u p_{uv} = π_v for all v ∈ V. For a G-subordinated Markov kernel P this formula, the so-called global balance equation, can be expressed as:

Σ_{u: u→v} π_u p_{uv} + π_v p_{vv} = π_v,  v ∈ V.  (2)

Fig. 3 presents a Markov kernel with its s.d. on the road network in Fig. 1.

Figure 3: A Markov kernel (on edges) with its stationary distribution (on vertices) on the road network in Fig. 1 (π₁ = 1/7, π₂ = 2/7, π₃ = 1/7, π₄ = 2/7, π₅ = 1/7).

Since the vehicles are moving along the road segments of the road network G, it is natural to choose the set E for the role of the state space. In this case, we can define probability distributions on the set of vertices again; however, we have to consider the line digraph L(G) (or ML(G)). Formally, a probability distribution (p.d.) on L(G) is the vector π′ := (π′_e)_{e∈E} where π′_e ≥ 0, e ∈ E, and Σ_{e∈E} π′_e = 1. Similarly to p.d.'s on V, the p.d. π′ can also be considered as a normalized E → R₊ function or a normalized V′ → R₊ function. If we want to emphasize the vertices of the original road network G instead of the edges, then the notation π′_e = π′_{uv} is also used, where e = (u, v) ∈ E. We can think of π′_e as the proportion of the number of vehicles at the road segment e with respect to the whole number of vehicles in the city at a fixed time point.
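A stationary distribution on V can be computed as the normalized left Perron eigenvector of the kernel. The sketch below is our own illustration: it uses the uniform random-walk kernel on the Fig. 1 network (each crossing distributes vehicles equally among its outgoing roads), which is an assumption for the example and not the kernel shown in Fig. 3:

```python
import numpy as np

# A G-subordinated Markov kernel on the Fig. 1 road network: from each
# crossing a vehicle picks an outgoing road segment uniformly at random.
edges = [(1, 2), (2, 1), (2, 3), (2, 4), (3, 4), (4, 2), (4, 5), (5, 2)]
n = 5
outdeg = {u: sum(1 for a, _ in edges if a == u) for u in range(1, n + 1)}
P = np.zeros((n, n))
for u, v in edges:
    P[u - 1, v - 1] = 1.0 / outdeg[u]
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a p.d.

# The s.d. solves pi P = pi with sum(pi) = 1, i.e., it is the left
# eigenvector of P for the Perron eigenvalue 1.
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)                     # global balance (2)
assert np.allclose(pi, [1/8, 3/8, 1/8, 1/4, 1/8])  # closed form by hand
```

For this uniform kernel the balance equations can also be solved by hand, giving π = (1/8, 3/8, 1/8, 1/4, 1/8); vertex 2, the busiest crossing, carries the largest stationary mass.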
Note that G endowed with π′ is a weighted digraph, which is often called a network in itself as well.

A transition probability matrix (or Markov kernel) on E, i.e., on the line digraph L(G), can be defined as a real kernel P′ := (p′_{ef})_{e,f∈E} such that p′_{ef} ≥ 0, e, f ∈ E, and Σ_{f∈E} p′_{ef} = 1 for all e ∈ E. A p.d. π′ on E is an s.d. of the kernel P′ if Σ_{e∈E} π′_e p′_{ef} = π′_f for all f ∈ E. Since G represents a road system, we may suppose that if e ≠ f then p′_{ef} > 0 implies (e, f) ∈ E′, i.e., there exist u, v, w ∈ V such that e = (u, v) and f = (v, w), and hence u → v → w is a walk of length 2. In other words, we may suppose that the Markov kernel P′ is L(G)-subordinated. Similarly to Markov kernels on V, P′ also induces an associated graph G_{P′} = (V′, E′_{P′}). Thus, P′ is L(G)-subordinated if and only if E′_{P′} ⊆ E′ ∪ S′, where S′ denotes here the diagonal {(e, e) | e ∈ E} in L(G). In this case, we use the notation p′_{ef} = p′_{uvw} as well. In fact, p′_{uvw} denotes the probability that a vehicle on the road segment (u, v) will go further to the road segment (v, w) at the next time point. Moreover, in the case of e = f = (u, v), let p′_{ee} = p′_{uv} be the probability that a vehicle remains on the same road segment at the next time point, which can be non-zero as well. Thus, since P′ is a Markov kernel, we have that, for all u → v,

Σ_{w: v→w} p′_{uvw} + p′_{uv} = 1  (3)

and the global balance equation is given as:

Σ_{u: u→v} π′_{uv} p′_{uvw} + π′_{vw} p′_{vw} = π′_{vw}  (4)

for all v → w.
Figure 4: The stationary distribution of the Markov kernel in Table 1.
An example of the Markov kernel P′ on the minimal line digraph ML(G) of the road network G in Fig. 1 is shown in Table 1. Fig. 4 shows the unique stationary distribution π′ of the Markov kernel P′.

Probability distributions and Markov kernels on open road networks.
        (1,2)  (2,3)  (3,4)  (4,2)  (2,1)  (2,4)  (4,5)  (5,2)
(1,2)    1/2    1/4     0      0      0     1/4     0      0
(2,3)     0     1/2    1/2     0      0      0      0      0
(3,4)     0      0     1/2    1/4     0      0     1/4     0
(4,2)     0     1/4     0     1/2    1/4     0      0      0
(2,1)    1/2     0      0      0     1/2     0      0      0
(2,4)     0      0      0      0      0     1/2    1/2     0
(4,5)     0      0      0      0      0      0     1/2    1/2
(5,2)     0     1/4     0      0     1/8    1/8     0     1/2

Table 1: An example of a Markov kernel on the minimal line digraph of the road network in Fig. 1.

Probability distributions and Markov kernels on the closure $\bar{G}$ of an open road network $G$ can be defined similarly by considering the set $\bar{V}$ or $\bar{E}$ as the state space, respectively. A p.d. on $\bar{V}$ is the vector $\bar\pi := (\bar\pi_v)_{v \in \bar{V}}$ where $\bar\pi_v \ge 0$ for all $v \in \bar{V}$ and $\sum_{v \in \bar{V}} \bar\pi_v = 1$. Note that $\bar\pi_0$ denotes the proportion of the vehicles which drive into or out of the city's roads at a time point. A Markov kernel $\bar{P}$ on $\bar{V}$ is said to be $\bar{G}$-subordinated if $\bar{p}_{uv} > 0$, $u, v \in \bar{V}$, implies $(u, v) \in \bar{E}$ or $u = v \in V$. Note that $\bar{G}$-subordination implies that $\bar{p}_{00} = 0$, i.e., the vehicles cannot move from 0 to 0; thus they either enter the road network or leave it. Equations (1) and (2) remain valid too. Equation (1) can be rewritten as
\[ \sum_{w \in V : v \to w} \bar{p}_{vw} + \bar{p}_{v0} + \bar{p}_{vv} = 1, \quad v \in V, \qquad \sum_{w \in V : 0 \to w} \bar{p}_{0w} = 1. \]
The global balance equation (2) for the s.d. can be rewritten as
\[ \sum_{u \in V : u \to v} \bar\pi_u \bar{p}_{uv} + \bar\pi_0 \bar{p}_{0v} + \bar\pi_v \bar{p}_{vv} = \bar\pi_v, \quad v \in V,\ 0 \to v, \]
\[ \sum_{u \in V : u \to v} \bar\pi_u \bar{p}_{uv} + \bar\pi_v \bar{p}_{vv} = \bar\pi_v, \quad v \in V,\ 0 \nrightarrow v, \]
\[ \sum_{u \in V : u \to 0} \bar\pi_u \bar{p}_{u0} = \bar\pi_0. \]
We can define Markov kernels on the line digraph $L(\bar{G})$ of the augmented road network $\bar{G}$, and thus on the augmented edge set $\bar{E}$, similarly to the case of $L(G)$. A Markov kernel $\bar{P}'$ on $L(\bar{G})$ is called $L(\bar{G})$-subordinated if $\bar{p}'_{ef} > 0$, $e, f \in \bar{E}$, implies $(e, f) \in \bar{E}'$ or $e = f$. Note that $(e, f) \in \bar{E}'$ implies that $e = (u, v)$ and $f = (v, w)$ where $u, v, w \in \bar{V}$, excluding the triplets containing the non-existent edge $(0,0)$. As before, we write $\bar{p}'_{uvw} = \bar{p}'_{ef}$ if $e = (u, v)$ and $f = (v, w)$, and $\bar{p}'_{uv} = \bar{p}'_{ee}$ if $e = (u, v)$. However, three additional conditions should be added. The first one is that $\bar{p}'_{u0u} = 0$ for all $u \in V$ such that $u \to 0 \to u$. This means that if a vehicle is on the edge $(u, 0)$ then it cannot be on the edge $(0, u)$ at the next time point, i.e., it cannot immediately re-enter the road network at vertex $u$. The second one is that $\bar{p}'_{0v0} = 0$ for all $v \in V$ such that $0 \to v \to 0$, i.e., at nodes $v$ where vehicles can both enter and leave the city. This means that if a vehicle enters the city then it cannot leave the city at the next time point. Finally, the third one is that $\bar{p}'_{u0} = \bar{p}'_{0v} = 0$ for all $u, v \in V$ such that $u \to 0 \to v$. That is, a vehicle cannot remain on the edge $(u, 0)$ (or on the edge $(0, v)$): if it is on such an edge at one time point then it does not remain there at the next time point but moves on immediately in the road network. Under these conditions equations (3) and (4) remain valid. Equation (3) can be rewritten as
\[ \sum_{w \in V : v \to w} \bar{p}'_{uvw} + \bar{p}'_{uv0} + \bar{p}'_{uv} = 1, \quad u, v \in V,\ u \to v, \]
\[ \sum_{w \in V : v \to w} \bar{p}'_{0vw} = 1, \quad v \in V,\ 0 \to v, \]
\[ \sum_{v \in V \setminus \{u\} : 0 \to v} \bar{p}'_{u0v} = 1, \quad u \in V,\ u \to 0. \]
Equation (4) can be rewritten as
\[ \sum_{u \in V : u \to v} \bar\pi'_{uv} \bar{p}'_{uvw} + \bar\pi'_{0v} \bar{p}'_{0vw} + \bar\pi'_{vw} \bar{p}'_{vw} = \bar\pi'_{vw}, \quad v, w \in V,\ v \to w, \]
\[ \sum_{u \in V : u \to v} \bar\pi'_{uv} \bar{p}'_{uv0} + \bar\pi'_{0v} \bar{p}'_{0v0} = \bar\pi'_{v0}, \quad v \in V,\ v \to 0, \]
\[ \sum_{u \in V \setminus \{w\} : u \to 0} \bar\pi'_{u0} \bar{p}'_{u0w} + \bar\pi'_{w0} \bar{p}'_{w0w} = \bar\pi'_{0w}, \quad w \in V,\ 0 \to w. \]
The stationary distribution in all cases, i.e., for Markov kernels on road networks, line road networks and their closures, can be derived by solving the appropriate linear equations above numerically. Since the state space (the road network) is finite, there exists at least one stationary distribution. However, in some cases, the stationary distribution is not uniquely determined by these equations.
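The numerical solution mentioned above amounts to solving the global balance equation together with the normalization constraint as a linear system. A minimal sketch with NumPy, using the Markov kernel of Table 1 (edge order as in the table); the solver itself is generic:

```python
import numpy as np

# Markov kernel P' of Table 1 on the edges of the toy road network,
# in the edge order (1,2), (2,3), (3,4), (4,2), (2,1), (2,4), (4,5), (5,2).
edges = ["(1,2)", "(2,3)", "(3,4)", "(4,2)", "(2,1)", "(2,4)", "(4,5)", "(5,2)"]
P = np.array([
    [1/2, 1/4, 0,   0,   0,   1/4, 0,   0  ],
    [0,   1/2, 1/2, 0,   0,   0,   0,   0  ],
    [0,   0,   1/2, 1/4, 0,   0,   1/4, 0  ],
    [0,   1/4, 0,   1/2, 1/4, 0,   0,   0  ],
    [1/2, 0,   0,   0,   1/2, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   1/2, 1/2, 0  ],
    [0,   0,   0,   0,   0,   0,   1/2, 1/2],
    [0,   1/4, 0,   0,   1/8, 1/8, 0,   1/2],
])

def stationary_distribution(P):
    """Solve pi^T P = pi^T together with sum(pi) = 1 as a stacked linear system."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi = stationary_distribution(P)
assert np.allclose(pi @ P, pi)    # global balance holds
assert np.isclose(pi.sum(), 1.0)  # normalization holds
print(dict(zip(edges, pi.round(4))))
```

Since this kernel is irreducible, the least-squares solution of the (consistent) stacked system is the unique stationary distribution; for a reducible kernel the same system would be underdetermined, which is exactly the non-uniqueness noted above.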
Uniqueness of stationary distribution.
We show that there is a direct connection between the existence of a s.d. of the Markov kernels $P$ and $P'$ and the strong connectivity of the physical road network $G$ when the Markov and graph structures are compatible with each other. The Markov kernel $P$ on $V$ is called $G$-compatible if, for any $u, v \in V$ such that $u \ne v$, $p_{uv} > 0$ if and only if $(u, v) \in E$. Note that $G$-compatibility implies weak $G$-subordination for a Markov kernel $P$; however, the converse is not true. Similarly, the Markov kernel $P'$ on $E$ is called $G$-compatible if it is an $L(G)$-compatible Markov kernel on $L(G)$, i.e., for any $e, f \in E$ such that $e \ne f$, $p'_{ef} > 0$ if and only if $(e, f) \in E'$. This is equivalent to the statement that $p'_{uvw} > 0$, $u, v, w \in V$, if and only if $(u, v), (v, w) \in E$. Since $(e, f) \in E'$ if and only if there exist $u, v, w \in V$ such that $e = (u, v)$ and $f = (v, w)$, we can define the $G$-compatibility of a Markov kernel $P'$ as: for any $e, f \in E$ such that $e \ne f$, $p'_{ef} > 0$ if and only if there exist $u, v, w \in V$ such that $e = (u, v)$ and $f = (v, w)$. Clearly, if $P$ is $G$-compatible then the strong connectivity of $G$ implies that the graph $G_P$ associated with the Markov kernel $P$ is also strongly connected. In this case, the Markov kernel (the transition matrix) $P$ is called irreducible. Thus, by Theorem 1 in [57], see also Theorems 3.1 and 3.3 in Chapter 3 of [56], the following theorem holds.

Theorem 1.
If a road network $G$ is strongly connected then there is a unique stationary distribution $\pi$ ($\pi'$) for any $G$-compatible Markov kernel $P$ ($P'$). Moreover, this distribution satisfies $\pi_v > 0$ for all $v \in V$ ($\pi'_{uv} > 0$ for all $(u, v) \in E$).

The main consequence of this theorem is that, in the case of any physical road network augmented by the ideal vertex 0, every Markov kernel defined on the road network that has positive transition probability on all roads has a unique stationary distribution. Thus, it is reasonable to suppose that real traffic which follows a Markovian dynamic has a unique local stationary distribution over a short time period, which can be explored by observing the traffic.

Let $(\Omega, \mathcal{A}, \mathsf{P})$ be a probability space. Then a $V$-valued ($E$-valued) random variable (r.v.) is a measurable function $X : \Omega \to V$ ($Y : \Omega \to E$), i.e., $X^{-1}(v) \in \mathcal{A}$ for all $v \in V$ ($Y^{-1}(e) \in \mathcal{A}$ for all $e \in E$). In this case, $X$ ($Y$) is a random function on the set $V$ of vertices (on the set $E$ of edges). For example, $X$ ($Y$) can be the random position of a vehicle on the road network $G$, where the position refers to the actual vertex (edge) at which the vehicle is located. Then $\mathsf{P}(X^{-1}(v)) = \mathsf{P}(X = v)$ ($\mathsf{P}(Y^{-1}(e)) = \mathsf{P}(Y = e)$) denotes the probability that a vehicle is at the vertex $v \in V$ (on the edge $e \in E$). Clearly, by $\pi_X(v) := \mathsf{P}(X = v)$, $v \in V$, a r.v. $X$ induces a p.d. $\pi_X$ on $V$. Similarly, by $\pi'_Y(e) := \mathsf{P}(Y = e)$, $e \in E$, a r.v. $Y$ induces a p.d. $\pi'_Y$ on $E$. A sequence $\{X_t\}_{t \in \mathbb{Z}_+}$ of $V$-valued r.v.'s is a Markov chain on the state space $V$ if the Markov property holds:
\[ \mathsf{P}(X_t = v_t \mid X_{t-1} = v_{t-1}, \ldots, X_0 = v_0) = \mathsf{P}(X_t = v_t \mid X_{t-1} = v_{t-1}) \]
for all $t \in \mathbb{N}$, $v_0, \ldots, v_t \in V$. If $X, X'$ are $V$-valued r.v.'s then for the conditional distribution $P = (p_{vv'})_{v, v' \in V}$, $p_{vv'} := \mathsf{P}(X = v \mid X' = v')$, $v, v' \in V$, we shall also use the notation $X \mid X'$.
Clearly, $X \mid X'$ is a Markov kernel on $V$. Similarly, a Markov chain $\{Y_t\}_{t \in \mathbb{Z}_+}$ of $E$-valued r.v.'s can also be defined through the Markov kernel $Y \mid Y'$ on the state space $E$. The main concepts of this paper, the Markov random walk and the Markov traffic, are defined in the following way.

Definition 2.
Let the road network $G$ be strongly connected and let $P$ be a $G$-compatible Markov kernel on $V$ with unique s.d. $\pi$. Moreover, let $\{X_t\}_{t \in \mathbb{Z}_+}$ be a Markov chain on $V$ such that $\pi_{X_0} = \pi$ and $X_t \mid X_{t-1} \sim P$ for all $t \in \mathbb{N}$. Then $\{X_t\}_{t \in \mathbb{Z}_+}$ is called a Markov random walk on the road network $G$ with Markov kernel $P$. The set of $k$ ($k \in \mathbb{N}$) mutually independent Markov random walks on $G$ with Markov kernel $P$ is called a Markov traffic of size $k$, denoted by the quadruple $(G, P, \pi, k)$. Similarly, $\{Y_t\}_{t \in \mathbb{Z}_+}$ is a Markov random walk on the line road network if it is a Markov chain on the state space $E$ such that $\pi'_{Y_0} = \pi'$ and $Y_t \mid Y_{t-1} \sim P'$ for all $t \in \mathbb{N}$.

Note that if $P$ is the uniform Markov kernel on $G$ then we obtain the standard random walk of graph theory; see the survey [58]. A Markov random walk is an individual Markov traffic with $k = 1$ in the sense that it describes the movement of a random vehicle which follows the stochastic rules defined by the Markov kernel. For a pair $u, v \in V$ the notation $u \Rightarrow v$ means that $(u, v) \in E \cup S$, i.e., either $u \to v$ or $u = v$. If $X_0, X_1$ are random functions on $V$ then $X_0 \Rightarrow X_1$ means that the two-dimensional distribution of $(X_0, X_1)$ is concentrated on $E \cup S$. Clearly, for any Markov random walk $\{X_t\}_{t \in \mathbb{Z}_+}$ we have $X_t \Rightarrow X_{t+1} \Rightarrow \ldots \Rightarrow X_{t+n}$ for all $t$ and $n \in \mathbb{N}$. One can also call $\{X_t\}_{t \in \mathbb{Z}_+}$ a first-order random walk on the road network, where a vehicle moves from vertex $u$ to vertex $v$ with probability $p_{uv}$. On the other hand, $\{Y_t\}_{t \in \mathbb{Z}_+}$ can be referred to as a second-order random walk, where the vehicles move from edge to edge, i.e., we have to consider where the vehicle came from, that is, the vertex visited before the current vertex. The second-order random walk has also been considered in graph analysis, see [51]. The state space of a first-order Markov traffic can be described by the function space $F(V, \mathbb{N})$ where $\mathbb{N} := \{0, 1, 2, \ldots\}$.
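A Markov random walk, and hence a Markov traffic as a set of independent walks, is straightforward to simulate. The sketch below uses a toy three-vertex kernel with illustrative values (not an example from the paper) and, for simplicity, fixes the starting vertex instead of drawing $X_0$ from $\pi$:

```python
import random

# Toy G-compatible kernel: p[u][v] > 0 only if u -> v is an edge or u == v
# ("u => v"). Values are illustrative, not taken from the paper.
P = {1: {1: 0.5, 2: 0.5},
     2: {2: 0.5, 3: 0.25, 1: 0.25},
     3: {3: 0.5, 1: 0.5}}

def markov_random_walk(P, start, steps, rng=random.Random(42)):
    """Simulate X_0 = start and X_t | X_{t-1} ~ P; return the whole path.

    (A simplification: Definition 2 draws X_0 from the s.d. pi.)
    """
    path = [start]
    for _ in range(steps):
        succ, probs = zip(*P[path[-1]].items())
        path.append(rng.choices(succ, weights=probs)[0])
    return path

# A Markov traffic of size k is k mutually independent walks.
k = 3
traffic = [markov_random_walk(P, start=1, steps=10) for _ in range(k)]
for path in traffic:
    # Every consecutive pair satisfies u => v: an edge, or staying put.
    assert all(v in P[u] for u, v in zip(path, path[1:]))
print(traffic[0])
```

The same loop simulates the second-order walk $\{Y_t\}$ if the states are taken to be edges and the kernel is $P'$.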
A function $f = (f_v)_{v \in V}$ in $F(V, \mathbb{N})$ is called a traffic configuration or counting function, and $f_v$ measures the number of vehicles at vertex $v \in V$. Let $|f|$ denote the size of the traffic configuration $f$, defined by $|f| := \sum_{v \in V} f_v$. The size of a traffic configuration counts the number of vehicles on the road network at a time. Let $F_k$ ($k \in \mathbb{N}$) denote the subset of traffic configurations of size $k$. A p.d. $\varrho$ on $F(V, \mathbb{N})$ is a function $\varrho : F(V, \mathbb{N}) \to [0, 1]$ such that $\sum_f \varrho(f) = 1$. For a p.d. $\pi$ on the road network $G$, define the induced p.d. $\varrho$ on $F_k$ as a multinomial distribution with parameters $k$ and $\pi$, see Chapter 35 in [59], given by the formula
\[ \varrho(f) := k! \prod_{v \in V} \frac{\pi_v^{f_v}}{f_v!} \tag{5} \]
for all $f \in F_k$. By this formula the probabilities of complex events of the traffic can be computed.

Then we define the concept of a Markov kernel on $F_k$. A Markov kernel $R$ on $F_k$ is a function $F_k \times F_k \to [0, 1]$ such that $\sum_{g \in F_k} R(f, g) = 1$ for all $f \in F_k$. We demonstrate that every Markov kernel $P$ induces a natural Markov kernel on $F_k$. The matrix $K = (k_{uv})_{u,v \in V}$ is called a transport matrix from traffic configuration $f$ to $g$ on the road network $G$ if $K : V \times V \to \mathbb{N}$ is weakly $G$-subordinated, $\sum_{v \in V} k_{uv} = f_u$ for all $u \in V$, and $\sum_{u \in V} k_{uv} = g_v$ for all $v \in V$. In fact, $K$ has row and column marginals $f$ and $g$, respectively, and, heuristically, $K$ defines a way of transporting the vehicles from configuration $f$ into $g$ on the road network. An example of a transport matrix can be seen in Fig. 5. For a pair $f, g \in F(V, \mathbb{N})$ let $\mathcal{M}(f, g)$ denote the set of all transport matrices from $f$ to $g$. Define the Markov kernel $R$ on $F_k$ in the following way:
\[ R(f, g) := \prod_{u \in V} f_u! \sum_{K \in \mathcal{M}(f, g)} \prod_{u,v : u \Rightarrow v} \frac{p_{uv}^{k_{uv}}}{k_{uv}!} \tag{6} \]
where $f, g \in F_k$. Then $R$ maps a p.d. $\varrho$ into the p.d. $R\varrho$ on the state space $F_k$ in the following way:
\[ (R\varrho)(g) := \sum_{f \in F_k} \varrho(f) R(f, g) \tag{7} \]
for all $g \in F_k$. To check that $R$ is indeed a Markov kernel we note that, by the multinomial theorem,
\[ \sum_{g \in F_k} R(f, g) = \prod_{u \in V} f_u! \sum_{\sum_{v \in V} k_{uv} = f_u} \prod_{u,v : u \Rightarrow v} \frac{p_{uv}^{k_{uv}}}{k_{uv}!} = \prod_{u \in V} \Bigl( \sum_{v \in V} p_{uv} \Bigr)^{f_u} = 1. \tag{8} \]
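Formulas (5)-(8) can be made concrete by brute-force enumeration on a small network. The sketch below uses a toy three-vertex kernel in exact rational arithmetic (illustrative, not the paper's example); it checks that each row of the induced kernel $R$ sums to one, as in (8), and that the multinomial p.d. (5) induced by a stationary distribution of $P$ is left invariant by $R$:

```python
import itertools
import math
from fractions import Fraction

V = [0, 1, 2]
# Toy kernel on a directed cycle with loops; each column also sums to 1,
# so the uniform distribution is stationary.
P = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
     1: {1: Fraction(1, 2), 2: Fraction(1, 2)},
     2: {2: Fraction(1, 2), 0: Fraction(1, 2)}}
k = 2  # traffic size

def configurations(k):
    """All traffic configurations f with |f| = k (counting functions on V)."""
    for f in itertools.product(range(k + 1), repeat=len(V)):
        if sum(f) == k:
            yield f

def transport_rows(fu, pu):
    """All ways to split the f_u vehicles at u among the successors of u."""
    succ = list(pu)
    for split in itertools.product(range(fu + 1), repeat=len(succ)):
        if sum(split) == fu:
            yield dict(zip(succ, split))

def R(f, g):
    """Induced kernel (6): sum over all transport matrices K from f to g."""
    total = Fraction(0)
    rows = [list(transport_rows(f[u], P[u])) for u in V]
    for K in itertools.product(*rows):
        col = tuple(sum(K[u].get(v, 0) for u in V) for v in V)
        if col == g:
            term = Fraction(1)
            for u in V:
                term *= math.factorial(f[u])
                for v, kuv in K[u].items():
                    term *= P[u][v] ** kuv
                    term /= math.factorial(kuv)
            total += term
    return total

# Each row of R sums to 1 exactly, cf. (8).
for f in configurations(k):
    assert sum(R(f, g) for g in configurations(k)) == 1

# The multinomial p.d. (5) induced by the s.d. pi is left invariant by R.
pi = [Fraction(1, 3)] * 3
def rho(f):
    out = Fraction(math.factorial(k))
    for v in V:
        out *= pi[v] ** f[v]
        out /= math.factorial(f[v])
    return out

for g in configurations(k):
    assert sum(rho(f) * R(f, g) for f in configurations(k)) == rho(g)
print("row sums and invariance verified exactly")
```

Enumeration is exponential in $k$ and $|V|$, so this only serves as a correctness check on toy instances, not as a simulation method.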
Figure 5: A transport matrix (on edges) on the road network in Fig. 1 from a configuration $f$ (left, in vertices) to a configuration $g$ (right, in vertices) with $k = 10$.

Moreover, one can easily see, similarly to (8) by the multinomial theorem, that if $\pi$ is a s.d. of the Markov kernel $P$, then the p.d. $\varrho$ defined by (5) is the s.d. of the induced Markov kernel $R$ defined by (6). Namely, we have the global balance equation
\[ \sum_{f \in F_k} \varrho(f) R(f, g) = \varrho(g) \tag{9} \]
for all $g \in F_k$. (For the proof see Appendix B.) Note that the concepts of traffic configuration and induced Markov kernels on them can be extended to the case of second-order Markov traffic by using the state space $F(E, \mathbb{N})$.

Ergodicity of Markov traffic.
Let $\pi_0$ be a p.d. on $V$. Using $\pi_0$ as an initial distribution, define the $n$-th absolute p.d. $\pi_n$ by the recursion $\pi_n^\top = \pi_{n-1}^\top P$, $n \in \mathbb{N}$. Clearly, $\pi_n^\top = \pi_0^\top P^n$, where the product of two Markov kernels $P$ and $Q$ on $V$ is defined as $(PQ)_{uw} := \sum_{v \in V} p_{uv} q_{vw}$, $u, w \in V$. If $G$ is strongly connected and the initial distribution is $\pi_0 = \pi$, where $\pi$ is the unique stationary solution for $P$ (see Theorem 1), then $\pi_n^\top = \pi^\top P^n = \pi^\top$ for all $n \in \mathbb{N}$; thus, in this case, $\pi_n \to \pi$ as $n \to \infty$. However, in general, the sequence $(\pi_n)_{n \in \mathbb{N}}$ does not converge to the stationary distribution for every initial distribution $\pi_0$, even if the stationary distribution is unique. If, however, we consider the time average of the $n$-th absolute p.d.'s, we do have convergence to the unique stationary distribution. The Markov kernel $P$ is called ergodic if it satisfies this property. The following result on $G$-compatible Markov kernels is based on Theorem 4.1 in Chapter 3 of [56]. If a road network $G$ is strongly connected then any $G$-compatible Markov kernel $P$ is ergodic and the average Markov kernel $A_n$ converges, i.e.,
\[ A_n := (n+1)^{-1} (I + P + \ldots + P^n) \to \Pi := \mathbf{1}\pi^\top \]
as $n \to \infty$, where $\pi$ is the unique stationary distribution of $P$. Moreover, the limiting probabilities of the time averages of the absolute p.d.'s satisfy
\[ (n+1)^{-1} (\pi_0 + \pi_1 + \ldots + \pi_n) \to \pi \tag{10} \]
as $n \to \infty$ for all initial p.d. $\pi_0$. In applications, along with absolute p.d.'s, we may also be interested in some functionals of these distributions, e.g., the number of vehicles in a region of the road network. Define the functional
\[ F(\pi) := \sum_{v \in V} f_v \pi_v \]
of a p.d. $\pi$, where $f \in F(V, \mathbb{R})$.
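The convergence of the average kernel $A_n$ to $\Pi = \mathbf{1}\pi^\top$ and of the Cesàro averages in (10) can be checked numerically. A minimal sketch with NumPy on an illustrative three-vertex kernel (toy data, not from the paper):

```python
import numpy as np

# Toy G-compatible kernel on three vertices: strongly connected and
# aperiodic, so it has a unique stationary distribution pi.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.50, 0.00, 0.50]])

# Unique stationary distribution from the balance equations.
m = P.shape[0]
A_sys = np.vstack([P.T - np.eye(m), np.ones(m)])
b = np.zeros(m + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A_sys, b, rcond=None)

# Average kernel A_n = (n+1)^{-1} (I + P + ... + P^n)  ->  Pi = 1 pi^T.
n = 2000
powers = [np.eye(m)]
for _ in range(n):
    powers.append(powers[-1] @ P)
A_n = sum(powers) / (n + 1)
Pi = np.outer(np.ones(m), pi)
assert np.allclose(A_n, Pi, atol=1e-2)

# Time averages of the absolute p.d.'s converge to pi for any initial p.d.
pi0 = np.array([1.0, 0.0, 0.0])
avg, cur = np.zeros(m), pi0
for _ in range(n + 1):
    avg += cur
    cur = cur @ P
avg /= (n + 1)
assert np.allclose(avg, pi, atol=1e-2)
print(pi.round(4))
```

Because the Cesàro averages converge at rate $O(1/n)$, the tolerance is loose; for an aperiodic kernel such as this one, $P^n$ itself converges to $\Pi$ geometrically fast.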
Then (10) can be extended: $n^{-1} \sum_{k=1}^{n} F(\pi_k) \to F(\pi)$ as $n \to \infty$, see Theorem 4.1 of [56]. Instead of time averages, in order to achieve the convergence of the $n$-th absolute p.d.'s themselves we need the additional assumption of aperiodicity for $G$, see Theorem 2.1 in Chapter 4 of [56]. If $G$ is an aperiodic, strongly connected road network and $P$ is a $G$-compatible Markov kernel on it, then the sequence of Markov kernels $P^n$, $n \in \mathbb{N}$, converges to the limiting Markov kernel $\Pi$. Moreover, the limit of the sequence of $n$-th absolute p.d.'s $\pi_n$ is the unique stationary distribution $\pi$ of the Markov kernel $P$, which is independent of the initial p.d. $\pi_0$. For any functional $F$ we also have that $F(\pi_n)$ converges to $F(\pi)$ as $n \to \infty$ on an aperiodic, strongly connected road network. In [56] stronger convergence concepts, e.g., convergence in variation, are also investigated. Note that similar results hold for any $G$-compatible Markov kernel $P'$ on $V' = E$.

The above results on Markov kernels on $V$ (or $E$) can be extended to Markov kernels on the state space of traffic configurations. Let $\varrho_0$ be an initial p.d. on $F_k$ and define the $n$-th absolute p.d. $\varrho_n$ on $F_k$ by the recursion $\varrho_n := R\varrho_{n-1}$, $n \in \mathbb{N}$, where $R$ is the Markov kernel on $F_k$ induced by a $G$-compatible Markov kernel $P$ on $G$, see formula (6). One can prove that the irreducibility and aperiodicity of $P$ imply the same properties for $R$, respectively. Our main result on the ergodicity of Markov traffic is the following theorem. Note that the $n$-th power of $R$ is defined recursively as $R^n \varrho := R(R^{n-1} \varrho)$, $n = 2, 3, \ldots$, by formula (7).

Theorem 2.
Let $G$ be a strongly connected and aperiodic road network and $P$ a $G$-compatible Markov kernel. Then there is a unique stationary distribution $\varrho$ of the Markov traffic described by the Markov kernel $R$ on $F_k$ induced by $P$, and it has the form (5). Moreover, the Markov traffic is ergodic in the sense that
\[ R^n(f, g) \to \varrho(g) \]
as $n \to \infty$ for all $f, g \in F_k$ and, for all initial p.d. $\varrho_0$ on $F_k$,
\[ \varrho_n(f) \to \varrho(f) \]
as $n \to \infty$ for all $f \in F_k$. Furthermore, the p.d. $\pi$ on $G$ can be unfolded as the limit of state space averages in time:
\[ \frac{1}{k} \sum_{f \in F_k} f_v \varrho_n(f) \to \pi_v \tag{11} \]
as $n \to \infty$ for all $v \in V$.

Note that formula (11) follows from the well-known fact that the expectation vector of a multinomial distribution with parameters $k$ and $\pi$ is equal to $k\pi$, see formula (35.6) in [59]. These convergence results guarantee that the unique stationary distribution of a $G$-compatible Markov kernel can be approximated, and thus explored, through the long-run behavior of the absolute p.d.'s on the traffic configurations of the road network.

Two-dimensional stationary distribution.
Using two-dimensional distributions, a Markov traffic can be re-parametrized in the following way. Introduce the two-dimensional distribution $Q = (q_{uv})$ on $V \times V$ as $q_{uv} := \pi_u p_{uv}$, $u, v \in V$. Then $Q$ is a two-dimensional stationary distribution on $G$ in the following sense:

Definition 3.
A matrix $Q = (q_{uv})_{u,v \in V}$ is called a two-dimensional stationary distribution on $G$ if (i) $q_{uv} \ge 0$ for all $u, v \in V$ and $q_{uv} = 0$ for all $u, v \in V$ such that $(u, v) \notin E \cup S$ (i.e., $Q$ is weakly $G$-subordinated); (ii) $\sum_{u,v \in V} q_{uv} = 1$ (i.e., $Q$ is a normalized matrix on $V$); and (iii) $\sum_{v \in V} q_{uv} = \sum_{v \in V} q_{vu}$ for all $u \in V$ (i.e., $Q$ has equidistributed marginals). A two-dimensional stationary distribution $Q$ on $G$ is called (strictly) positive if $q_{uv} > 0$ for all $u, v \in V$ such that $u \Rightarrow v$ ($u \to v$).

Property (iii) states that the two (row-wise and column-wise) marginal distributions of a two-dimensional stationary distribution on $G$ coincide. Clearly, for a Markov traffic, the matrix $Q$ defined above is a positive two-dimensional stationary distribution on $G$. $Q$ can also be considered a p.d. on the state space $E \cup S$, i.e., on the line digraph if we extend its vertex set to $V' = E \cup S$. Thus, $Q$ can be interpreted as the distribution of the vehicles on the edges of the road network, i.e., on the line digraph, see formula (11) in [6]. The distribution $Q$ can also be visualized on the edges, see Fig. 6 for the toy example and Fig. 11 for the Porto example discussed later. However, the converse of this statement is not true, because there are p.d.'s on the line digraph which do not satisfy (iii). If $\{X_t\}_{t \in \mathbb{Z}_+}$ is a Markov random walk then the two-dimensional distribution of any consecutive pair $(X_t, X_{t+1})$, $t \in \mathbb{Z}_+$, coincides with $Q$.

Denote by $\mathcal{Q}$ the set of two-dimensional stationary distributions on $G$. One can easily see that $\mathcal{Q}$ is closed with respect to affine combination: if $Q_1, Q_2 \in \mathcal{Q}$ then $\lambda Q_1 + (1 - \lambda) Q_2 \in \mathcal{Q}$ for all $\lambda \in [0, 1]$. Conversely, for $Q \in \mathcal{Q}$, let us define
\[ \pi_u := \sum_{v \in V} q_{uv} = \sum_{v \in V} q_{vu}, \quad u \in V, \qquad p_{uv} := \frac{q_{uv}}{\pi_u}, \quad u, v \in V. \tag{12} \]
Then $P = (p_{uv})$ defines a $G$-compatible Markov kernel with stationary distribution $\pi$ on $G$.
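The re-parametrization (12) is a one-line computation in both directions. A sketch on an illustrative matrix $Q$ (toy values, chosen only to satisfy properties (i)-(iii)):

```python
import numpy as np

# Illustrative two-dimensional stationary distribution Q on a toy network:
# non-negative, normalized, with equal row and column marginals.
Q = np.array([[0.10, 0.15, 0.00],
              [0.15, 0.20, 0.10],
              [0.00, 0.10, 0.20]])

# Properties (i)-(iii) of Definition 3.
assert (Q >= 0).all()
assert np.isclose(Q.sum(), 1.0)
assert np.allclose(Q.sum(axis=1), Q.sum(axis=0))  # equidistributed marginals

# Formula (12): recover pi and the Markov kernel P from Q.
pi = Q.sum(axis=1)
P = Q / pi[:, None]

assert np.allclose(P.sum(axis=1), 1.0)  # P is a Markov kernel
assert np.allclose(pi @ P, pi)          # pi is stationary for P
assert np.allclose(pi[:, None] * P, Q)  # q_uv = pi_u p_uv round-trips
print(pi)
```

Stationarity of $\pi$ here is exactly property (iii): $(\pi^\top P)_v = \sum_u q_{uv}$ is the column marginal, which equals the row marginal $\pi_v$.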
Thus, a Markov traffic defined by the quadruple (G, P, π, k) can be introduced in an equivalent way through the triplet (G, Q, k). However, it will turn out later that, from a statistical point of view, the parameter matrix $Q$ can be estimated in a computationally more efficient way than the pair of the transition matrix $P$ and its stationary distribution $\pi$.

With the help of the two-dimensional stationary distribution, we can assign a p.d. to any Markov traffic on the space of traffic configurations defined as counting functions on the edges of the road network. Namely, define the traffic configuration $h = (h_{uv})_{u \Rightarrow v}$ as a function in $F(E \cup S, \mathbb{N})$. Then $h_{uv}$ denotes the number of vehicles on the edge $(u, v)$, where $u, v \in V$ such that $u \Rightarrow v$. Similarly to (5), the two-dimensional distribution $\sigma$ on the set of traffic configurations $h$ of size $k$ ($k \in \mathbb{N}$), i.e., $\sum_{u \Rightarrow v} h_{uv} = k$, induced by the distribution $Q$ on $G$ can be expressed as a multinomial distribution with parameters $k$ and $Q$:
\[ \sigma(h) := k! \prod_{u \Rightarrow v} \frac{q_{uv}^{h_{uv}}}{h_{uv}!} \]
for all $h \in F(E \cup S, \mathbb{N})$. Finally, one can easily see that the concept of two-dimensional stationary distribution can be extended to the second-order Markov traffic as well.

For our experiments, we needed a dataset of real-life traffic trajectory data. In our terminology, a trajectory is a sequence of data that provides information about the path of a vehicle moving from a start to an end point, associating geographic coordinates with timestamps. We required a dataset that satisfies the following criteria:

1. Contains complete trajectories, i.e., the availability of only the start and end points is not sufficient; intermediate trajectory points must also be available.
2.
The trajectory points must be sampled at a high enough frequency, so that the distance between consecutive points is not too large (e.g., an average distance on the order of 10 meters is acceptable, but an average distance on the order of 100 meters is definitely not).
3. The dataset is sufficiently large. It should cover a long enough period of time, preferably uniformly. The number of trajectories per day should be on the order of thousands.
Figure 6: The two-dimensional stationary distribution (on edges) with its equidistributed marginals (on vertices) on the road network in Fig. 1 for the Markov kernel in Fig. 3. One can easily check that the sums of the probabilities written on the edges into and out of each vertex are equal, respectively.
4. Trajectories should cover a relatively small geographic area, e.g., a city or a district.
5. Vehicles should not follow a fixed route; e.g., public transport bus trajectories are not suitable.
6. Publicly available for research purposes.

These requirements were satisfied by the Taxi Trajectory Prediction (TTP) dataset from Kaggle. The dataset covers a period of one year, from July 1, 2013 to June 30, 2014. It is split into a training and a test set; the former contains 1,710,670 trajectories, and the latter contains 320. The trajectories were collected in the city of Porto, Portugal, with a sampling rate of 15 seconds. First, we created a subset of the dataset, filtered to coordinates between W8.6518, W8.5771, N41.1129, N41.1756, see Fig. 7. The features of the data samples that were not relevant to the research, such as the origin of the call, identifiers of individual taxis or customers, and the type of day (i.e., workday, weekend, holiday), were omitted. The processed format included the time of departure, both as a timestamp and as distinct date attributes, the length of the trajectory, and the points of the trajectory, represented as a list of GPS coordinates. Some data samples contained incomplete trajectories; these were discarded. Because of the properties of the proposed simulation model, the data was filtered to include only those samples that had a time of departure between 8 and 9 am. As a result, 82,345 trajectories remained. Although the lengths of the trajectories had a wide range (the longest has 2,324 sample points), long trips were rare. Fig. 8a shows the distribution of the lengths of the trajectories. Most routes were around 41 sample points long, and routes with over 150 points made up less than 1% of the dataset, see Fig. 8b. The distribution of the trajectory points (all points, the difference of start and end points,
and the histogram of the difference) is shown in Fig. 9. The descriptive statistics of the dataset are shown in Table 2.

Figure 7: The map of the observed area. The graph created from the OSM data has 33,961 nodes, 53,126 edges, and covers a total of 857.26 km of road. The size of the area is about 43.68 km². (Image source: openstreetmap.org)

Figure 8: Histograms of the lengths of trajectories and of sample points. (a) Histogram of trajectory lengths; the rightmost bar represents trajectories longer than 12,500 meters. The average trajectory length is 3,628.93 meters. (b) Histogram of the number of sample points per trajectory; the rightmost bar represents trajectories with more than 200 sample points. On average, a trajectory consists of 40 sample points and takes 10 minutes.

Figure 9: Distribution of trajectory points of the filtered dataset. (a) Distribution of all trajectory points shown in a 2D histogram (number of bins: 80 × …). (b) Difference of trajectory starting and end points shown in a 2D histogram (number of bins: 80 × …). (c) Histogram of the difference of trajectory starting and end points.

OpenStreetMap (OSM) is a community project to build a free map of the world, to which anyone can contribute. Data is available under the Open Data Commons Open Database License (ODbL). The representation and storage of map data is based on a simple but powerful model that uses only three modeling primitives, namely nodes, ways, and relations: 1. A node represents a geographical entity with GPS coordinates. 2. A way is an ordered list of at least two nodes. 3. A relation is an ordered list of nodes, ways, and/or relations.
All of these modeling elements can have associated key-value pairs called tags that describe and refine the meaning of the element to which they belong (see http://wiki.openstreetmap.org/wiki/Elements). Users can export map data at the OSM web site manually, by selecting a rectangular region of the map. Alternatively, map data can be extracted via web services. OSM uses two formats for exporting map data, namely OSM XML and PBF. Software libraries for parsing and working with OSM data are available for several programming languages.

Statistic            Dist in points   Dist in meters
Mean                 39.53            3,628.93
Median               35               3,176.786
Mode                 34               0
Standard deviation   31.64            2,408.93
Kurtosis             473.28           9.9
Skewness             12.11            1.78
Minimum              2                0
Maximum              2,324            61,055.58

Table 2: Descriptive statistics of the lengths of trajectories (82,345 total trajectories).

We started our processing by building a graph from the OSM map of Porto, with the same bounding box as the filtered dataset. Specific nodes of the OSM file become the nodes of the graph. Because we only need those nodes that can be reached by vehicles, we had to filter the OSM file and collect only specific types of way nodes. In the OSM file, a way is a sequence of OSM nodes, so naturally the nodes of ways become nodes in the graph. For every node we store the node's OSM ID and its coordinates. We also insert an edge into the graph between every pair of consecutive nodes in a way. The weight of an edge is given by the squared distance between the nodes, which we calculate from the OSM file's data. We used the pyosmium library for processing the OSM files and the NetworkX Python library for building the graph.

After building the graph, we process the list of trajectories. Because the trajectories are given in GPS coordinates, we first have to translate those coordinates into OSM node IDs. For every coordinate in a trajectory, we search for the closest way node's coordinates in the built graph, so the resulting nodes have the same domain as the built graph's nodes.
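The graph-building step can be sketched as follows. The paper uses pyosmium for parsing and NetworkX for the graph; both are swapped out here for plain dictionaries to keep the example dependency-free, and the input is hard-coded toy data in the shape of parsed OSM ways, i.e., lists of (osm_node_id, lat, lon) triples:

```python
# Toy "parsed OSM ways": each way is a list of (osm_node_id, lat, lon)
# triples. In the real pipeline these come from a filtered OSM extract.
ways = [
    [(1, 41.150, -8.610), (2, 41.151, -8.609), (3, 41.152, -8.607)],
    [(3, 41.152, -8.607), (4, 41.153, -8.606), (1, 41.150, -8.610)],
]

nodes = {}  # osm_id -> (lat, lon): store the OSM ID and coordinates
edges = {}  # (u, v) -> weight: squared coordinate distance, as in the text

for way in ways:
    for nid, lat, lon in way:
        nodes[nid] = (lat, lon)
    # An edge between every pair of consecutive nodes of the way.
    for (u, ulat, ulon), (v, vlat, vlon) in zip(way, way[1:]):
        edges[(u, v)] = (ulat - vlat) ** 2 + (ulon - vlon) ** 2

print(len(nodes), len(edges))
```

With NetworkX the two dictionaries would become `G.add_node(nid, lat=lat, lon=lon)` and `G.add_edge(u, v, weight=...)` calls on a `DiGraph`; the structure is otherwise identical.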
Obviously, the original trajectories made up of GPS coordinates do not have the same scaling as the OSM map: the coordinates in a trajectory are sampled at regular but larger intervals than the spacing of the OSM nodes, so they are not aligned. In order to match a trajectory to a path in our graph, we had to perform an interpolation on the resulting list of node IDs, so we ran Dijkstra's shortest path algorithm on our graph between every pair of consecutive node IDs for every trajectory. Because the OSM database contains errors, it can happen that in real life a route exists between two given places, but in the OSM database there is no route between the nodes representing those places. In this case, we cut the faulty trajectories into pieces. (See also http://wiki.openstreetmap.org/wiki/API and https://wiki.openstreetmap.org/wiki/Frameworks.) The result of this process is an aperiodic, strongly connected road network augmented by the ideal vertex 0.

The statistical analysis of a traffic system described by the Markov traffic model means the estimation of the quadruple (
G, P, π, k) or the triplet (G, Q, k) using observed data. (In this subsection only the first-order Markov traffic is considered.) To estimate $G$ we have to explore the road system under study by identifying the set $V$ of vertices and the set $E$ of directed road segments. Fortunately, this exploration has already been done by a few organizations, see, e.g., Google Maps and OpenStreetMap. However, we should note that, in the case of GPS-based trajectory data, we have to fit the data to the applied map system, which is not an evident task at all, see Section 2.5. In the present paper, we propose a method for estimating the two-dimensional stationary distribution $Q$ directly, instead of the pair $(P, \pi)$ of a transition matrix and its stationary distribution, using mobile sensor data which may be gathered by vehicles, passengers, etc. In this case, we have trajectory data consisting of sequences of consecutive vertices, as in the TTP dataset. By (12), estimators for $P$ and $\pi$ can easily be derived from an estimator of $Q$. Finally, it is supposed that the size $k$ of the traffic is known.

Suppose that, for a Markov traffic, we observed a random sample of trajectories $\{X^i\}$, $i = 1, \ldots, k$, of size $k$, defined by
\[ X^i_1 \Rightarrow X^i_2 \Rightarrow \ldots \Rightarrow X^i_{n_i}, \quad i = 1, \ldots, k, \]
where $n_i$ denotes the length of the $i$-th trajectory. Let $n := n_1 + \ldots + n_k$ be the total sample size. Define the total two-dimensional consecutive empirical frequencies as
\[ n_{uv} := \sum_{i=1}^{k} n^i_{uv}, \tag{13} \]
$u, v \in V$, where the trajectory-wise two-dimensional consecutive empirical frequencies, $i = 1, \ldots, k$, are defined as
\[ n^i_{uv} := \sum_{j=1}^{n_i - 1} I(X^i_j = u, X^i_{j+1} = v), \]
$u, v \in V$. Plainly, $n^i_{uv}$ denotes the number of consecutive $(u, v)$ ($u, v \in V$) pairs in the $i$-th trajectory. One can see that, since $\{X^i\}$ is a proper Markov random walk, we have $n^i_{uv} = 0$ for all $(u, v) \notin E \cup S$.
Thus, the support of the two-dimensional frequency matrices $N := (n_{uv})_{u,v \in V}$ and $N^i := (n^i_{uv})_{u,v \in V}$, $i = 1, \ldots, k$, is a subset of $E \cup S$, i.e., they are weakly $G$-subordinated matrices. Clearly, $N = \sum_{i=1}^{k} N^i$ and we have
\[ \sum_{u,v : u \Rightarrow v} n_{uv} = n - k, \tag{14} \]
where $n - k$ is the corrected sample size. Introduce
\[ s_v := \sum_{i=1}^{k} I(X^i_1 = v), \qquad e_v := \sum_{i=1}^{k} I(X^i_{n_i} = v), \]
$v \in V$, i.e., $s_v$ denotes the number of trajectories which start at vertex $v$ and $e_v$ denotes the number of trajectories which terminate at vertex $v$, respectively. Denote the one-dimensional marginal frequencies of $N$ by $n_{v+} := \sum_{u \in V} n_{vu}$ and $n_{+v} := \sum_{u \in V} n_{uv}$, $v \in V$. We obtain that
\[ n_{v+} + e_v = n_{+v} + s_v = n_v := \sum_{i=1}^{k} \sum_{j=1}^{n_i} I(X^i_j = v) \tag{15} \]
for all $v \in V$, where $n_v$ denotes the number of occurrences of vertex $v$ in all trajectories. Finally,
\[ \sum_{v \in V} s_v = \sum_{v \in V} e_v = k. \tag{16} \]
Define the vectors $s$ and $e$ on $V$ as $s := (s_v)_{v \in V}$ and $e := (e_v)_{v \in V}$, respectively. Then (16) implies that $\mathbf{1}^\top (e - s) = 0$, i.e., the vectors $e - s$ and $\mathbf{1}$ are orthogonal.

The traditional maximum likelihood (ML) estimator $\widehat{P}^{\mathrm{ML}}$ of the transition matrix $P$ is given by the maximization of the conditional log-likelihood
\[ \log L = \sum_{u \Rightarrow v} n_{uv} \log p_{uv} \]
in the parameters $p_{uv}$, $u, v \in V$ such that $u \Rightarrow v$, under the constraints $p_{u+} = 1$ for all $u \in V$. The solution of this constrained optimization problem is $\widehat{p}^{\mathrm{ML}}_{uv} = n_{uv} / n_{u+}$ for all $u, v \in V$ if $n_{u+} > 0$, and $\widehat{p}^{\mathrm{ML}}_{uv} = \delta_{uv}$ if $n_{u+} = 0$, where $\delta$ denotes the Kronecker delta. The maximum likelihood estimator $\widehat{\pi}^{\mathrm{ML}}$ of the stationary distribution $\pi$ is derived as the solution of the global balance equation $\pi^\top = \pi^\top \widehat{P}^{\mathrm{ML}}$ in $\pi$. Thus, the maximum likelihood estimator $\widehat{Q}^{\mathrm{ML}} = (\widehat{q}^{\mathrm{ML}}_{uv})$ of $Q$ is given by $\widehat{q}^{\mathrm{ML}}_{uv} = \widehat{\pi}^{\mathrm{ML}}_u \widehat{p}^{\mathrm{ML}}_{uv}$, $u, v \in V$.
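The frequency counts (13)-(14) and the ML estimators can be sketched in a few lines of NumPy. The trajectories below are illustrative toy data on three vertices, not drawn from the TTP dataset:

```python
import numpy as np

# Toy trajectories on V = {0, 1, 2}: sequences of visited vertices.
V = 3
trajectories = [[0, 1, 1, 2, 0], [1, 2, 2, 0], [2, 0, 1]]
k = len(trajectories)
n = sum(len(t) for t in trajectories)  # total sample size

# Total consecutive empirical frequencies (13): N = sum_i N^i.
N = np.zeros((V, V))
for traj in trajectories:
    for u, v in zip(traj, traj[1:]):
        N[u, v] += 1
assert N.sum() == n - k  # corrected sample size, formula (14)

# ML estimator of P: p_uv = n_uv / n_{u+} (Kronecker-delta row if n_{u+} = 0).
row = N.sum(axis=1)[:, None]
P_ml = np.where(row > 0, N / np.where(row > 0, row, 1), np.eye(V))
assert np.allclose(P_ml.sum(axis=1), 1.0)

# ML estimator of pi: solve pi^T = pi^T P_ml with the normalization sum = 1.
A = np.vstack([P_ml.T - np.eye(V), np.ones(V)])
b = np.zeros(V + 1); b[-1] = 1.0
pi_ml, *_ = np.linalg.lstsq(A, b, rcond=None)

# ML estimator of Q via (12): q_uv = pi_u p_uv.
Q_ml = pi_ml[:, None] * P_ml
assert np.isclose(Q_ml.sum(), 1.0)
print(np.round(Q_ml, 3))
```

Note that the second step, solving the global balance equation for $\widehat{\pi}^{\mathrm{ML}}$, is exactly the computational detour that the direct estimation of $Q$ proposed below avoids.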
In the sequel, a direct method is proposed for estimating the two-dimensional stationary distribution $Q$. A naïve estimator for $Q$ based on the two-dimensional consecutive empirical frequency matrix $N$ is $\widehat{Q}^{\text{naïve}} := (n - k)^{-1} N$. Clearly, $\widehat{Q}^{\text{naïve}}$, as a non-negative matrix on $V$, satisfies properties (i) and (ii) of Definition 3. However, the problem with this naïve estimator is that its row and column marginals are not necessarily equal, i.e., in general, it does not satisfy assumption (iii) of Definition 3. Hence, we have to introduce a new estimator $\widehat{Q}$ which belongs to $\mathcal{Q}$ and is optimal in some sense.

The optimality of the proposed estimator is defined by means of the least squares distance between matrices over $G$. Let $A = (a_{uv})_{u,v \in V}$ and $B = (b_{uv})_{u,v \in V}$ be such that $a_{uv} = b_{uv} = 0$ for all $u, v \in V$ where $u \not\Rightarrow v$, i.e., let $A$ and $B$ be weakly $G$-subordinated matrices. The distance between $A$ and $B$ is defined as
\[ \| A - B \|_G := \Bigl( \sum_{u,v : u \Rightarrow v} | a_{uv} - b_{uv} |^2 \Bigr)^{1/2}. \]
In fact, $\| \cdot \|_G$ is the Frobenius norm of matrices of dimension $|V| \times |V|$ which vanish on the entries outside of $E \cup S$.

To formulate our objective function for estimating the two-dimensional stationary distribution, it is convenient to weaken the assumptions of Definition 3 by dropping the normalizing assumption (ii). In the sequel, let $M = (m_{uv})$ denote a non-negative parameter matrix on $G$ which satisfies assumptions (i) and (iii) of Definition 3, i.e., $M$ is weakly $G$-subordinated and $\sum_{v \in V} m_{uv} = \sum_{v \in V} m_{vu}$ for all $u \in V$.
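The failure of assumption (iii) for the naïve estimator, and the norm $\| \cdot \|_G$, can be illustrated on toy frequency data (illustrative values, not estimates from the paper):

```python
import numpy as np

# Toy consecutive frequency matrix N on a three-vertex network.
N = np.array([[0.0, 3.0, 0.0],
              [0.0, 1.0, 2.0],
              [3.0, 0.0, 0.0]])
n_minus_k = N.sum()          # corrected sample size, cf. (14)
Q_naive = N / n_minus_k      # the naive estimator (n - k)^{-1} N

# (ii) holds by construction, but the marginals differ, so (iii) fails.
assert np.isclose(Q_naive.sum(), 1.0)
print(Q_naive.sum(axis=1), Q_naive.sum(axis=0))  # unequal marginals

def dist_G(A, B):
    """Frobenius norm restricted to the support E ∪ S (here: all entries,
    since both arguments are weakly G-subordinated by construction)."""
    return np.sqrt(((A - B) ** 2).sum())

print(dist_G(Q_naive, Q_naive.T))
```

Trajectories start and end somewhere, so the row marginal (out-transitions) and column marginal (in-transitions) of $N$ generically differ by the start/end counts $s$ and $e$, cf. (15); the WLS estimator below repairs exactly this defect.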
Then, one can easily derive a two-dimensional stationary distribution $Q$ from $M$ by normalization, defining $Q := (\mathbf{1}^\top M \mathbf{1})^{-1} M$.

Based on a sample of $k$ trajectories, using the Frobenius norm, the optimality criterion is defined as the weighted sum of squared errors (SSE):

$$\mathrm{SSE}(M, w \mid \mathcal{N}) := \sum_{i=1}^k w_i^{-1} \|N_i - w_i M\|_G^2, \qquad (17)$$

where $M$ is a non-negative parameter matrix satisfying assumptions (i) and (iii) of Definition 3, $w = (w_i)_{i=1,\dots,k}$ are non-negative unknown weights with $\sum_{i=1}^k w_i = 1$, and $\mathcal{N} := (N_i)_{i=1,\dots,k}$ denotes the data, where $N_i$ is the two-dimensional consecutive empirical frequency matrix of the $i$th trajectory, see (13). The statistical inference for a Markov traffic means the minimization of the objective function SSE in its parameters $M$ and $w$, yielding the weighted least squares (WLS) estimators $\widehat{M}_{\mathrm{WLS}}$ and $\widehat{w}_{\mathrm{WLS}}$. Then, the WLS estimator of $Q$ is defined as $\widehat{Q}_{\mathrm{WLS}} := n_{\mathrm{eff}}^{-1} \widehat{M}_{\mathrm{WLS}}$, where $n_{\mathrm{eff}} := \mathbf{1}^\top \widehat{M}_{\mathrm{WLS}} \mathbf{1}$ is the so-called effective sample size. Here, $\widehat{Q}_{\mathrm{WLS}}$ can be interpreted as the estimated two-dimensional stationary distribution which describes the individual Markov traffic in time. On the other hand, $n_{\mathrm{eff}}$ gives the equivalent sample size related to the independent case, which may be thought of as the information content of the observed data. Note that $n_{\mathrm{eff}}$ is not necessarily an integer and is different from both $n$ and $n - k$. Finally, $w_i$ gives the importance of the $i$th trajectory in the sample. One can see that a longer trajectory implies a higher weight.

To formulate our result on the WLS estimation of Markov traffic we need some basic facts from the spectral theory of directed graphs, see [60] for details. The symmetric unnormalized graph Laplacian matrix $L$ of a digraph $G$ is defined as

$$L := D - A - A^\top,$$

where $A$ denotes the adjacency matrix of $G$ and $D := \mathrm{diag}\{d^+ + d^-\}$.
Note that for the road graph $G$, since there is no loop, we have $l_{vv} = d_{vv} = \deg^+(v) + \deg^-(v)$ for all $v \in V$. The main theorem of this paper is the following.

Theorem 3.
There is a unique pair $(\widehat{M}_{\mathrm{WLS}}, \widehat{w}_{\mathrm{WLS}})$ which minimizes the weighted sum of squared errors SSE defined in (17). These WLS estimators are given by

$$\widehat{w}^{\,\mathrm{WLS}}_i := \frac{\|N_i\|_G}{\sum_{j=1}^k \|N_j\|_G}, \qquad i = 1, \dots, k,$$

and

$$\widehat{M}_{\mathrm{WLS}} := N + (\mathbf{1}\lambda^\top - \lambda\mathbf{1}^\top) \circ A,$$

where $\lambda \in F(V, \mathbb{R})$ is called the Lagrange vector and is defined as the unique solution to the linear equation $L\lambda = s - e$, and $\circ$ denotes the entrywise (Hadamard) product of matrices.

Based on the previous theorem, by (14), the effective sample size is given as

$$n_{\mathrm{eff}} := (n - k) + (d^- - d^+)^\top \lambda, \qquad (18)$$

i.e., $n_{\mathrm{eff}}$ depends only on the graph structure of the road network (which is independent of the data), on the traffic direction vector $s - e$, and on the corrected sample size. However, it does not depend on the data which are inside the trajectories.

The WLS estimators proposed above can be considered as a kind of composite (or quasi-) likelihood estimators for Markov chains, see [9]. The composite likelihood method is widely applied in complex statistical models when the full ML method is difficult to apply or may not be robust enough. In our method, the objective function is based on pairwise marginal distributions; however, instead of formula (2) in [9], the quasi-likelihood function is a square function, the logarithm of the normal probability density with heteroscedastic variance which depends on the length of the trajectories. The latter becomes clearer by introducing the mean squared error (MSE) as

$$\mathrm{MSE} := n_{\mathrm{eff}}^{-1}\,\mathrm{SSE} = \sum_{i=1}^k n^i_{\mathrm{eff}} \left\| (n^i_{\mathrm{eff}})^{-1} N_i - Q \right\|_G^2, \qquad (19)$$

where $n^i_{\mathrm{eff}} := w_i n_{\mathrm{eff}}$ denotes the effective sample size of the $i$th trajectory, $i = 1, \dots, k$. The parameters of the objective function MSE are the effective sample sizes $\{n^i_{\mathrm{eff}}\}$ and the two-dimensional stationary distribution $Q$. The heuristic explanation of the need for the weights in formulas (17) and (19) is the following.
By the Central Limit Theorem, for large $n_i$, the trajectory-wise two-dimensional consecutive empirical frequency matrix $N_i$ can be approximated as $N_i \approx n_i Q + n_i^{1/2} \xi_i$, where $\xi_i$ is a normally distributed random matrix on $V$ for all $i = 1, \dots, k$; this is a heteroscedastic equation between the observed $N_i$ and the parameter matrix $Q$. Hence, $\|N_i - n_i Q\|_G^2 \approx n_i \|\xi_i\|_G^2$, where $\|\xi_i\|_G^2$, $i = 1, \dots, k$, are independent identically distributed random variables. Thus, we have to normalize the trajectory-wise squared errors proportionally to the trajectory lengths in order to get balanced error terms.

The estimation theory of finite Markov chains goes back a long time, see [61]. In the traditional ML approach the estimators of the transition and stationary probabilities are given by the corresponding relative frequencies. However, these estimators have a few problems which imply that they can be applied with only limited success for estimating the Markov traffic on a road network. Firstly, they are based on only one long trajectory (realization). However, in a real traffic dataset there is a large number of relatively short trajectories, i.e., the set $\{n_i,\ i = 1, \dots, k\}$ is bounded, where $k$ is large or tends to infinity. In our example, for the TTP dataset, the number of trajectories is above 80K with mean length 40 and maximum length 2K, see Table 2. Secondly, they are asymptotic estimators in the sense that, for finite sample size, the estimated stationary distribution does not satisfy the global balance equation given by the estimated transition probability matrix. The global balance equation holds only asymptotically, i.e., when the sample size tends to infinity. In fact, the inaccuracy in the global balance equation is not too large; however, this little bias can cause a significant discrepancy from the "true" stationary distribution in the simulation.
Thirdly, the trajectories are biased during a short time period in the sense that they start from some parts of the road network and end at other parts. For example, in the morning period the vehicles move from the residential districts to the business districts of the city, and they move back in the afternoon period. In other words, the traffic has a definite direction on the road network. To demonstrate this behavior in the case of the TTP dataset, Fig. 9c shows the distribution of the elements of the traffic direction vector $s - e$, while Fig. 9b shows their spatial distribution. Neither distribution is concentrated around zero. The known improvements of the ML estimators, e.g., by using the bootstrap, see [62], do not solve these problems. However, the WLS estimator of the two-dimensional stationary distribution proposed in this paper is able to handle all of these problems. The estimator $\widehat{Q}_{\mathrm{WLS}}$ takes account of more than one trajectory together with their lengths. It determines uniquely both the transition probability matrix and its stationary distribution by (12), which obviously satisfy the global balance equation. Finally, by taking account of the traffic direction vector in the estimator, it can correct the bias due to the unbalanced sampling of trajectories on the road network.

The fundamental statement of Theorem 3, as one of the main results of this paper, is that the estimator $\widehat{Q}_{\mathrm{WLS}}$ (or $\widehat{M}_{\mathrm{WLS}}$) consists of two parts: the first part is the naïve estimator for the distribution of the consecutive pairs in trajectories based on the empirical frequencies, while the second part is a correction term ensuring that $\widehat{Q}_{\mathrm{WLS}}$ (or $\widehat{M}_{\mathrm{WLS}}$) has equidistributed marginals. The second part itself depends on two components. The first one is the Laplacian matrix of the road graph, which depends only on the graph structure of the road network and is independent of the trajectory data.
The second one is the traffic direction vector, which depends only on the trajectory data. Note that all sufficient statistics, namely the total two-dimensional consecutive empirical frequencies and the starting and ending empirical frequencies, can be computed by counting, which is numerically very efficient and can be executed even for big data.

The following propositions and remarks are useful for computing the Lagrange vector $\lambda$ in a numerically efficient manner. The first proposition sums up the basic properties of the symmetric unnormalized graph Laplacian matrix. Note that statements 1)–4) are based on Proposition 1 in [60].

Proposition 1.
The matrix $L$ satisfies the following properties:

1. For all $\alpha \in F(V, \mathbb{R})$ we have
$$\alpha^\top L \alpha = \frac{1}{2} \sum_{u,v \in V} (a_{uv} + a_{vu})(\alpha_u - \alpha_v)^2. \qquad (20)$$
2. $L$ is symmetric and positive semi-definite.
3. The smallest eigenvalue of $L$ is 0; the corresponding eigenvector is the constant one vector $\mathbf{1}$.
4. $L$ has $|V|$ non-negative, real-valued eigenvalues $0 = \tau_1 \le \tau_2 \le \dots \le \tau_{|V|}$ and corresponding orthonormal eigenvectors $\mathbf{1} = \alpha_1, \alpha_2, \dots, \alpha_{|V|}$ in $F(V, \mathbb{R})$.
5. If $G$ is strongly connected then $\tau_2 > 0$, i.e., the multiplicity of the eigenvalue $\tau_1 = 0$ is 1, and $L$ is invertible on the $(|V| - 1)$-dimensional subspace $S := \{\alpha \in F(V, \mathbb{R}) \mid \mathbf{1}^\top \alpha = 0\}$, and the inverse can be expressed as $L_S^{-1} = \sum_{j=2}^{|V|} \tau_j^{-1} \alpha_j \alpha_j^\top$.

By Proposition 1 the Lagrange vector can be expressed by the eigenvalues and eigenvectors of $L$ as

$$\lambda = L_S^{-1}(s - e) = \sum_{j=2}^{|V|} \tau_j^{-1} \left( \alpha_j^\top (s - e) \right) \alpha_j.$$

In a large road network, this formula can be applied for approximating $\lambda$ by leaving out of the above finite sum the too large eigenvalues, whose inverses are negligible. Thus, if $|V|$ is too large, it is enough to store the first few significant eigenvalues and eigenvectors. Remark that these eigenvalues and eigenvectors, being independent of the data, can be computed and stored in advance for a simulation program.

If a sharp bound is needed for the eigenvalues then the application of the symmetric normalized Laplacian matrix is more useful. The symmetric normalized adjacency matrix $\widetilde{A}$ and the symmetric normalized graph Laplacian matrix $\widetilde{L}$ of a digraph $G$ are defined as

$$\widetilde{A} := D^{-1/2}(A + A^\top)D^{-1/2}, \qquad \widetilde{L} := I - \widetilde{A}.$$

For a road graph, we have $\widetilde{l}_{vv} = 1$ for all $v \in V$. One can see that

$$L = D^{1/2} \widetilde{L} D^{1/2}, \qquad \widetilde{L} = D^{-1/2} L D^{-1/2}.$$

The next proposition is based on Proposition 3 in [60]; statements 5)–7) are new as far as we know.
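The full recipe of Theorem 3 (solve $L\lambda = s - e$, then add the correction to $N$) can be sketched as follows, with $\lambda$ computed from the spectral form above. This is our own minimal implementation on a hypothetical 3-vertex graph carrying a single trajectory 0→1→2→0→1→2; the function and variable names are ours:

```python
import numpy as np

def wls_M(N, s, e, A):
    """Sketch of Theorem 3: M = N + (1 lam^T - lam 1^T) ∘ A with L lam = s - e,
    where lam = sum_{j>=2} tau_j^{-1} (alpha_j^T (s - e)) alpha_j."""
    d = A.sum(axis=0) + A.sum(axis=1)            # d^- + d^+
    L = np.diag(d) - A - A.T                     # symmetric unnormalized Laplacian
    tau, alpha = np.linalg.eigh(L)               # ascending eigenvalues; tau_1 = 0 dropped below
    lam = alpha[:, 1:] @ ((alpha[:, 1:].T @ (s - e)) / tau[1:])
    one = np.ones(len(d))
    # correction entry (u, v) is (lam_v - lam_u) * a_uv
    M = N + (np.outer(one, lam) - np.outer(lam, one)) * A
    return M, lam

A = np.array([[0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
N = np.array([[0., 2., 0.], [0., 0., 2.], [1., 0., 0.]])  # pairs of 0->1->2->0->1->2
s = np.array([1., 0., 0.])    # the trajectory starts at vertex 0
e = np.array([0., 0., 1.])    # and ends at vertex 2
M, lam = wls_M(N, s, e, A)
n_eff = M.sum()               # matches (n - k) + (d^- - d^+)^T lam, here 5.2
Q_wls = M / n_eff             # estimated two-dimensional stationary distribution
```

The correction makes the row and column marginals of M equal, so Q_wls satisfies assumption (iii) of Definition 3; truncating the eigenpair sum to the first few indices gives the approximation suggested in the text.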
Proposition 2.
The matrices $\widetilde{A}$ and $\widetilde{L}$ satisfy the following properties:

1. For all $\alpha \in F(V, \mathbb{R})$ we have
$$\alpha^\top \widetilde{L} \alpha = \frac{1}{2} \sum_{u,v \in V} (a_{uv} + a_{vu}) \left( \frac{\alpha_u}{\sqrt{d_u}} - \frac{\alpha_v}{\sqrt{d_v}} \right)^2. \qquad (21)$$
2. $\widetilde{L}$ and $\widetilde{A}$ are symmetric and $\widetilde{L}$ is positive semi-definite.
3. The smallest eigenvalue of $\widetilde{L}$ is 0; the corresponding eigenvector is the vector $D^{1/2}\mathbf{1}$.
4. $\widetilde{L}$ has $|V|$ non-negative, real-valued eigenvalues $0 = \widetilde{\tau}_1 \le \widetilde{\tau}_2 \le \dots \le \widetilde{\tau}_{|V|} \le 2$ and corresponding orthonormal eigenvectors $D^{1/2}\mathbf{1} = \widetilde{\alpha}_1, \widetilde{\alpha}_2, \dots, \widetilde{\alpha}_{|V|}$ in $F(V, \mathbb{R})$.
5. If $G$ is strongly connected then $\widetilde{\tau}_2 > 0$, i.e., the multiplicity of the eigenvalue $\widetilde{\tau}_1 = 0$ is 1, and $\widetilde{L}$ is invertible on the $(|V| - 1)$-dimensional subspace $\widetilde{S} := \{\widetilde{\alpha} \in F(V, \mathbb{R}) \mid \widetilde{\alpha}^\top D^{1/2}\mathbf{1} = 0\}$, and the inverse can be expressed as $\widetilde{L}_{\widetilde{S}}^{-1} = \sum_{j=2}^{|V|} \widetilde{\tau}_j^{-1} \widetilde{\alpha}_j \widetilde{\alpha}_j^\top$.
6. $\widetilde{A}$ has $|V|$ real-valued eigenvalues $1 = \widetilde{\varrho}_1 \ge \widetilde{\varrho}_2 \ge \dots \ge \widetilde{\varrho}_{|V|} \ge -1$, where $\widetilde{\varrho}_j = 1 - \widetilde{\tau}_j$, $j = 1, \dots, |V|$, and corresponding orthonormal eigenvectors $D^{1/2}\mathbf{1} = \widetilde{\alpha}_1, \widetilde{\alpha}_2, \dots, \widetilde{\alpha}_{|V|}$ in $F(V, \mathbb{R})$.
7. If $G$ is strongly connected then $-1 < \widetilde{\varrho}_{|V|} \le \widetilde{\varrho}_2 < 1$ and
$$\widetilde{L}_{\widetilde{S}}^{-1} \widetilde{\alpha} = \sum_{k=0}^{\infty} \widetilde{A}^k \widetilde{\alpha}$$
for all $\widetilde{\alpha} \in \widetilde{S}$, where the infinite series is absolutely convergent. Moreover, the convergence is exponentially fast since
$$\left\| \widetilde{L}_{\widetilde{S}}^{-1} \widetilde{\alpha} - \sum_{k=0}^{n} \widetilde{A}^k \widetilde{\alpha} \right\| \le \frac{\kappa^{n+1}}{1 - \kappa} \|\widetilde{\alpha}\|$$
for all $\widetilde{\alpha} \in \widetilde{S}$, where $\kappa := \max\{|\widetilde{\varrho}_2|, |\widetilde{\varrho}_{|V|}|\} < 1$. Hence, for $\widetilde{\alpha} \in \widetilde{S}$, $\widetilde{L}_{\widetilde{S}}^{-1} \widetilde{\alpha}$ can be obtained as the limit of the iteration
$$\widetilde{\beta}_n = \widetilde{A} \widetilde{\beta}_{n-1} + \widetilde{\alpha}, \quad n \in \mathbb{N}, \qquad \widetilde{\beta}_0 = 0,$$
where $\widetilde{\beta}_n \in \widetilde{S}$ for all $n \in \mathbb{N}$.

Finally, we remark that $\lambda$ can be computed by the symmetric normalized graph Laplacian matrix as follows. By the equation in Appendix B we have $D^{1/2} \widetilde{L} D^{1/2} \lambda = s - e$ and thus, setting $\widetilde{\lambda} := D^{1/2} \lambda$, we have $\widetilde{L} \widetilde{\lambda} = D^{-1/2}(s - e)$, where $D^{-1/2}(s - e) \in \widetilde{S}$. Thus $\widetilde{\lambda} = \widetilde{L}_{\widetilde{S}}^{-1} D^{-1/2}(s - e)$ and $\lambda = D^{-1/2} \widetilde{\lambda}$. By Proposition 2 we have

$$\lambda = D^{-1/2} \widetilde{L}_{\widetilde{S}}^{-1} D^{-1/2}(s - e) = \sum_{j=2}^{|V|} \widetilde{\tau}_j^{-1} \left( \widetilde{\alpha}_j^\top D^{-1/2}(s - e) \right) D^{-1/2} \widetilde{\alpha}_j,$$

and $\widetilde{\lambda}$ is the limit of the iteration

$$\widetilde{\lambda}_n = \widetilde{A} \widetilde{\lambda}_{n-1} + D^{-1/2}(s - e), \quad n \in \mathbb{N}, \qquad \widetilde{\lambda}_0 = 0,$$

where $\widetilde{\lambda}_n \in \widetilde{S}$ for all $n \in \mathbb{N}$.
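The fixed-point iteration above can be sketched as follows (our own implementation; note that the returned $\lambda$ may differ from the canonical one by a multiple of the constant vector, which leaves the correction term of Theorem 3 unchanged, since only differences $\lambda_v - \lambda_u$ enter it):

```python
import numpy as np

def lagrange_iterative(A, s, e, n_iter=200):
    """lam_tilde_n = A_tilde @ lam_tilde_{n-1} + D^{-1/2}(s - e); lam = D^{-1/2} lam_tilde."""
    d = A.sum(axis=0) + A.sum(axis=1)                 # d^- + d^+
    dinv = 1.0 / np.sqrt(d)
    A_t = dinv[:, None] * (A + A.T) * dinv[None, :]   # normalized adjacency A_tilde
    b = dinv * (s - e)                                # right-hand side in the tilde space
    lt = np.zeros_like(b)
    for _ in range(n_iter):                           # geometric convergence at rate kappa < 1
        lt = A_t @ lt + b
    return dinv * lt

# same hypothetical 3-vertex example as before: starts at vertex 0, ends at vertex 2
A = np.array([[0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
lam = lagrange_iterative(A, np.array([1., 0., 0.]), np.array([0., 0., 1.]))
```

After centering (subtracting the mean), the result agrees with the solution of $L\lambda = s - e$ that is orthogonal to the constant vector.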
3. Results
To evaluate the performance of the proposed WLS estimation method by comparing it to the traditional ML one discussed in Section 2.6, a simple simulation study was conducted at different sample sizes for a small and a medium-sized road network. In the simulations, in order to mimic real traffic, we tried to keep the length of the trajectories low and the number of trajectories high compared to the size of the road network, similarly to the Porto example. The absolute bias of an investigated estimator $\widehat{Q}$ for the two-dimensional stationary distribution $Q$ as a parameter is defined by $\|\widehat{Q} - Q\|_G$. The empirical absolute bias and its standard error (SE) correspond to the mean and standard deviation of the absolute biases in 100 replications, respectively. All simulations were carried out in Python using the PyDTMC library developed for analysing discrete time Markov chains. The codes and datasets of our simulation are available upon request.

Table 3 displays the simulation results for the small road network in Fig. 1 using the Markov kernel of Fig. 3, see the toy example in Appendix A. The simulation parameters were $k = 100, 200, 500$, and $1000$ trajectories of length $n = 3, 5$, and $10$. The absolute bias and its standard error do not depend on the length $n$, and they decrease as $k$ increases for both estimation methods. The latter is an expected result. Moreover, while for a relatively small $k$ the performance of the WLS and ML methods is similar, in the case of a relatively large number of trajectories the ML estimator outperforms the WLS one a little bit. This phenomenon could be due to the asymptotic optimality of the ML estimator, because the parameter $k$ is large enough (1000) compared to the size of the road graph (5).

In the second simulation scenario, a strongly connected subgraph of Porto's road network containing 1000 vertices was chosen (exported from the OSM as well; GPS coordinates W8.6137, W8.5991, N41.1573, N41.1437).
The entries of the Markov kernel were generated randomly. The simulation parameters were $k = 1000, 3000$, and $5000$ with $n = 3, 5$, and $10$. In this scenario, the absolute bias and its standard error are also independent of the length $n$. However, there are significant differences between the performances of the two estimation methods (ML and WLS) with respect to the parameter $k$. On the one hand, the absolute bias of the ML estimator decreases as $k$ increases, while it is constant for the WLS estimator. On the other hand, the WLS estimator is better than the ML one in the case of $k = 1000$ but worse in the case of $k = 5000$. Since the former parameter setting is closer to real traffic, this simulation corroborates the superiority of the WLS method based on the two-dimensional stationary distribution over the traditional maximum likelihood.

(PyDTMC: https://pypi.org/project/PyDTMC/)

Table 3: Simulation results, absolute bias and SE (in parentheses), for the Markov kernel in Fig. 3 on the road network in Fig. 1. (k – number, n – length of trajectories)

k     n   ML              WLS
100   3   0.034 (0.0103)  0.034 (0.0109)
100   5   0.035 (0.0106)  0.034 (0.0103)
100   10  0.033 (0.0095)  0.033 (0.0094)
200   3   0.024 (0.0066)  0.026 (0.0066)
200   5   0.023 (0.0064)  0.024 (0.0067)
200   10  0.024 (0.0071)  0.025 (0.0070)
500   3   0.015 (0.0046)  0.017 (0.0049)
500   5   0.015 (0.0041)  0.017 (0.0049)
500   10  0.016 (0.0047)  0.017 (0.0049)
1000  3   0.010 (0.0032)  0.013 (0.0041)
1000  5   0.011 (0.0034)  0.015 (0.0044)
1000  10  0.010 (0.0030)  0.014 (0.0040)

Table 4: Simulation results, absolute bias and SE (in parentheses), for a part of Porto's map with 1000 vertices. (k – number, n – length of trajectories)

k     n   ML              WLS
1000  3   0.166 (0.0559)  0.025 (0.0007)
1000  5   0.184 (0.1214)  0.025 (0.0007)
1000  10  0.169 (0.0938)  0.025 (0.0008)
3000  3   0.064 (0.1725)  0.023 (0.0005)
3000  5   0.070 (0.1665)  0.023 (0.0005)
3000  10  0.063 (0.1705)  0.023 (0.0005)
5000  3   0.016 (0.0150)  0.023 (0.0004)
5000  5   0.014 (0.0055)  0.023 (0.0004)
5000  10  0.014 (0.0126)  0.023 (0.0003)
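The bias measurement used in Tables 3 and 4 can be reproduced in outline as follows (our own sketch with a hypothetical 3-vertex kernel and its stationary vector, not one of the kernels used in the paper; shown here for the naïve estimate of $Q$):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectories(P, pi, k, n):
    """Draw k trajectories of length n: initial vertex from pi, steps from kernel P."""
    trajs = []
    for _ in range(k):
        x = rng.choice(len(pi), p=pi)
        traj = [x]
        for _ in range(n - 1):
            x = rng.choice(len(pi), p=P[x])
            traj.append(x)
        trajs.append(traj)
    return trajs

def absolute_bias(Q_hat, Q):
    """Frobenius distance ||Q_hat - Q||_G reported in the tables."""
    return float(np.sqrt(((Q_hat - Q) ** 2).sum()))

# hypothetical kernel with stationary distribution pi = (0.4, 0.4, 0.2)
P = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]])
pi = np.array([0.4, 0.4, 0.2])
Q = pi[:, None] * P                     # two-dimensional stationary distribution

N = np.zeros((3, 3))
for traj in simulate_trajectories(P, pi, k=500, n=5):
    for u, v in zip(traj[:-1], traj[1:]):
        N[u, v] += 1
bias = absolute_bias(N / N.sum(), Q)    # naive estimate (n - k)^{-1} N vs. Q
```

Averaging such biases over 100 independent replications gives the empirical absolute bias and SE of the tables.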
Finally, in this scenario, the scale of the SEs indicates that the WLS estimator is more stable than the ML one.

We have also implemented the model in the OOCWC system. First, we filtered the TTP dataset as described in Section 2.4. Then, we created the Markov kernel from the filtered dataset following the method described in Section 2.6 (see also Appendix A for a toy example). Regarding the RCE, we performed several modifications. First, we extended the operation of the RCE to be able to handle kernel files. This kernel file can be loaded.

Table 5: Transitions of intersection 1110673569. (TP – transition probability)
Figure 10: A visual explanation of the transitions of intersection 1110673569. TP means transition probability, yellow arrows indicate directions, red dots indicate nodes. (Source of the map: openstreetmap.org, annotated by the authors.)

From the filtered dataset we computed the $\widehat{Q}_{\mathrm{WLS}}$ estimator and then the TP matrix using the method described in Section 2.6. For matrices of this size (34,000 x 34,000), we cannot always solve the linear equation of the Lagrange vector in Theorem 3 numerically, thus we could only use the least squares solution for a numerically stable calculation. In some cases, this causes impossible numbers to be present in the TP matrix; e.g., for a node, the TP vector is [1.17489, -0.174894], which is obviously impossible. It is interesting to note that the sum of such a "malfunctioning" TP vector is 1 all the time, and this mostly occurs if the node has a low number of transitions (less than 20). In such cases, we use the frequency-based TP. Another numerical problem can occur when we calculate the stationary distribution $\pi$, namely, negative values may be present in the results. We need to handle this problem when we calculate the Pearson's chi-squared test. We chose to shift every value of $\pi$ until we get a sum of 1 for $\pi$.

Figure 11: The two-dimensional stationary distribution of cars in Porto based on the TTP dataset.

Figure 12: Chi-square test results (chi-square test value against the number of traffic units: 5,000, 10,000, 20,000, 30,000, 50,000).

Some minor issues can occur with the map database and the differences between the Porto dataset and the OSM data. In some cases, we could not calculate a route between two consecutive trajectory points using OSM data. This can happen because of imperfections in the OSM data or false GPS measurements. We handle this case by splitting the trajectory into pieces.

Another minor issue arises in the calculation of the Pearson's chi-squared test. Since the OSM Porto map and the trajectory dataset do not cover each other perfectly, we only know the stationary distribution $\pi$ for a subgraph of the whole map. During the simulation the units can traverse the whole map graph, so it can happen that a traffic unit reaches an edge which is not part of the subgraph where we know the stationary distribution $\pi$. During the calculation of the Pearson's chi-squared test, we consider only those cars that are present on the part of the road network where the stationary distribution $\pi$ is known.
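The restricted chi-squared computation described above can be sketched as follows (our own helper; the counts and $\pi$ values are made up for illustration):

```python
import numpy as np

def chi_square_on_subgraph(counts, pi, known):
    """Pearson chi-squared statistic over the vertices where pi is known;
    cars outside this subgraph are ignored, and pi is renormalized on it."""
    obs = counts[known].astype(float)
    p = pi[known] / pi[known].sum()     # renormalize pi over the known subgraph
    exp = obs.sum() * p                 # expected counts under the stationary distribution
    return float(((obs - exp) ** 2 / exp).sum())

counts = np.array([10, 20, 30, 5])                # observed cars per vertex
pi = np.array([0.2, 0.3, 0.5, 0.0])               # pi unknown on the last vertex
known = np.array([True, True, True, False])
stat = chi_square_on_subgraph(counts, pi, known)
```

Here only the 60 cars on the known subgraph enter the statistic; the 5 cars outside it are ignored.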
4. Conclusions
In this paper, we have described our traffic simulation model, called "Markov traffic", based on tools from graph theory and Markov modelling. The aim was to provide a simulation method that is able to keep the distribution of the cars on the map in a steady state on a large-scale road network. We have proven that, under general assumptions, the stationary distribution is unique for any Markov transition mechanism on a wide class of road networks. An explicit formula has also been derived for the stationary distribution.

We have shown that the stationary distribution, with the related transition mechanism, can be explored from observed data based on sample trajectories. We have provided a statistical method, and proved its optimality, with which we can create the Markov kernel necessary to obtain a Markov traffic on a given road graph. Using this kernel, we can initiate traffic simulations that provide a stationary distribution of the cars on the map.

To provide an example of creating this kernel file, we have used a publicly available dataset, namely the Taxi Trajectory Prediction dataset. Our simulation uses OpenStreetMap, a free map database.

To test our theories, we have implemented the proposed model in our simulation program (RCE). We have run simulations, and the model has been proved to provide a stationary distribution on the map graph of Porto, Portugal. The whole project (including the RCE) is available for download. Some simulation videos are available on the YouTube channel of the first author.

Future work will focus on the possible applications of our simulation approach, e.g., modelling the pollution or energy consumption in a city due to multi-modal traffic with gasoline, diesel, electric and plug-in hybrid vehicles.
Acknowledgement
This publication is supported by the EFOP-3.6.1-16-2016-00022 project. The project is co-financed by the European Union and the European Social Fund. The authors would like to thank all current and former members of the Smart City group of the University of Debrecen. Special thanks to Louis Mattia for a close reading of the manuscript. We are especially grateful to all of the participants of the OOCWC competitions and the students of the BSc courses of "High Level Programming Languages" at the University of Debrecen. Finally, we would also like to give special thanks to Márton Vona and Balázs Kóti for designing the graphical elements of the OOCWC.
References

[1] E. Ismagilova, L. Hughes, Y. K. Dwivedi, K. R. Raman, Smart cities: Advances in research—an information systems perspective, International Journal of Information Management 47 (2019) 88–100.
[2] Y. Zheng, L. Capra, O. Wolfson, H. Yang, Urban computing: Concepts, methodologies, and applications, ACM Transactions on Intelligent Systems and Technology 5 (3) (2014) 38:1–38:55.
[3] P. Bocquier, World Urbanization Prospects: An alternative to the UN model of projection compatible with the mobility transition theory, Demographic Research 12 (2005) 197–236.
[4] N. Bátfai, R. Besenczi, A. Mamenyák, M. Ispány, OOCWC: The robocar world championship initiative, in: T. Planck (Ed.), Telecommunications (ConTEL), IEEE 13th International Conference on, 2015, pp. 1–6.
[5] N. Bátfai, R. Besenczi, A. Mamenyák, M. Ispány, Traffic simulation based on the robocar world championship initiative, Infocommunications Journal 7 (3) (2015) 50–58. See https://github.com/rbesenczi/Crowd-sourced-Traffic-Simulator/blob/master/justine/install.txt; see http://bit.ly/2FRpPxL
[6] E. Crisostomi, S. Kirkland, R. Shorten, A Google-like model of road network dynamics and its application to regulation and control, International Journal of Control 84 (3) (2011) 633–651.
[7] M. Faizrahnemoona, A. Schlote, L. Maggi, E. Crisostomi, R. Shorten, A big-data model for multi-modal public transportation with application to macroscopic control and optimisation, International Journal of Control 88 (11) (2015) 2354–2368.
[8] M. Faizrahnemoon, Real-data modelling of transportation networks, Ph.D. thesis, Hamilton Institute, National University of Ireland Maynooth (2016).
[9] N. L. Hjort, C. Varin, ML, PL, QL in Markov chain models, Scandinavian Journal of Statistics 35 (1) (2008) 64–82.
[10] C. Dabrowski, F. Hunt, Using Markov chain and graph theory concepts to analyze behavior in complex distributed systems, Tech. rep., U.S. National Institute of Standards and Technology (2011).
[11] M. Cavers, K. Vasudevan, Spatio-temporal complex Markov chain (SCMC) model using directed graphs: Earthquake sequencing, Pure and Applied Geophysics 172 (2) (2015) 225–241.
[12] A. Lesne, Complex networks: from graph theory to biology, Letters in Mathematical Physics 78 (2006) 235–262.
[13] R. Besenczi, M. Szilágyi, N. Bátfai, A. Mamenyák, I. Oniga, M. Ispány, Using crowdsensed information for traffic simulation in the Robocar World Championship framework, in: Cognitive Infocommunications (CogInfoCom), 6th IEEE International Conference on, 2015, pp. 333–337.
[14] R. Besenczi, T. Katona, M. Szilágyi, A fork implementation of the police edition of the OOCWC system, in: Cognitive Infocommunications (CogInfoCom), 6th IEEE International Conference on, 2015, pp. 163–164.
[15] N. Bátfai, R. Besenczi, M. Ispány, P. Jeszenszky, R. S. Major, F. Monori, Markov modeling and simulation of traffic flow, in: Data Science, Statistics & Visualisation, DSSV 2018, 2018, p. 61. URL http://cstat.tuwien.ac.at/filz/BoA.pdf
[16] K. Nagel, M. Schreckenberg, A cellular automaton model for freeway traffic, Journal de Physique I France 2 (12) (1992) 2221–2229.
[17] A. Horni, K. Nagel, K. W. Axhausen, The Multi-Agent Transport Simulation MATSim, Ubiquity Press London, 2016.
[18] D. Charypar, K. W. Axhausen, K. Nagel, Event-driven queue-based traffic flow microsimulation, Transportation Research Record 2003 (1) (2007) 35–40. doi:10.3141/2003-05.
[19] E. Popovici, A. Bucci, R. P. Wiegand, E. D. De Jong, Coevolutionary principles, in: G. Rozenberg, T. Bäck, J. Kok (Eds.), Handbook of Natural Computing, Springer, Berlin, Heidelberg, 2012, pp. 987–1033.
[20] D. Krajzewicz, J. Erdmann, M. Behrisch, L. Bieker, Recent development and applications of SUMO - Simulation of Urban MObility, International Journal On Advances in Systems and Measurements 5 (3&4) (2012) 128–138. URL http://elib.dlr.de/80483/
[21] S. Lee, D. B. Fambro, Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting, Transportation Research Record 1678 (1) (1999) 179–188.
[22] A. Stathopoulos, M. G. Karlaftis, A multivariate state space approach for urban traffic flow modeling and prediction, Transportation Research Part C: Emerging Technologies 11 (2) (2003) 121–135.
[23] B. Ghosh, B. Basu, M. O'Mahony, Multivariate short-term traffic flow forecasting using time-series analysis, IEEE Transactions on Intelligent Transportation Systems 10 (2) (2009) 246.
[24] J. Xue, Z. Shi, Short-time traffic flow prediction based on chaos time series theory, Journal of Transportation Systems Engineering and Information Technology 8 (5) (2008) 68–72.
[25] Y. Wang, M. Papageorgiou, Real-time freeway traffic state estimation based on extended Kalman filter: A general approach, Transportation Research Part B: Methodological 39 (2) (2005) 141–167.
[26] D. Ngoduy, Low-rank unscented Kalman filter for freeway traffic estimation problems, Transportation Research Record 2260 (1) (2011) 113–122.
[27] G. A. Davis, N. L. Nihan, Nonparametric regression and short-term freeway traffic forecasting, Journal of Transportation Engineering 117 (2) (1991) 178–188.
[28] B. L. Smith, B. M. Williams, R. K. Oswald, Comparison of parametric and nonparametric models for traffic flow forecasting, Transportation Research Part C: Emerging Technologies 10 (4) (2002) 303–321.
[29] R. E. Turochy, B. D. Pierce, Relating short-term traffic forecasting to current system state using nonparametric regression, in: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, 2004, pp. 239–244.
[30] B. L. Smith, M. J. Demetsky, Traffic flow forecasting: Comparison of modeling approaches, Journal of Transportation Engineering 123 (4) (1997) 261–266.
[31] C. J. Messer, Advanced freeway system ramp metering strategies for Texas, Tech. rep., Texas Transportation Institute, College Station, TX (1993).
[32] M. Castro-Neto, Y.-S. Jeong, M.-K. Jeong, L. D. Han, Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions, Expert Systems with Applications 36 (3) (2009) 6164–6173.
[33] H. Nicholson, C. D. Swann, The prediction of traffic flow volumes based on spectral analysis, Transportation Research 8 (6) (1974) 533–538.
[34] X. Jiang, H. Adeli, Dynamic wavelet neural network model for traffic flow forecasting, Journal of Transportation Engineering 131 (10) (2005) 771–779.
[35] Y. Xie, Y. Zhang, A wavelet network model for short-term traffic volume forecasting, Journal of Intelligent Transportation Systems 10 (3) (2006) 141–150.
[36] Y. Cheng, Y. Zhang, J. Hu, L. Li, Mining for similarities in urban traffic flow using wavelets, in: 2007 IEEE Intelligent Transportation Systems Conference, 2007, pp. 119–124.
[37] Y.-S. Jeong, Y.-J. Byon, M. M. Castro-Neto, S. M. Easa, Supervised weighting-online learning algorithm for short-term traffic flow prediction, IEEE Transactions on Intelligent Transportation Systems 14 (4) (2013) 1700–1707.
[38] K. Y. Chan, T. S. Dillon, J. Singh, E. Chang, Neural-network-based models for short-term traffic flow forecasting using a hybrid exponential smoothing and Levenberg-Marquardt algorithm, IEEE Transactions on Intelligent Transportation Systems 13 (2) (2012) 644–654.
[39] B. Park, C. J. Messer, T. Urbanik, Short-term freeway traffic volume forecasting using radial basis function neural network, Transportation Research Record 1651 (1) (1998) 39–47.
[40] H. Dia, An object-oriented neural network approach to short-term traffic forecasting, European Journal of Operational Research 131 (2) (2001) 253–261.
[41] S. Sun, C. Zhang, G. Yu, A Bayesian network approach to traffic flow forecasting, IEEE Transactions on Intelligent Transportation Systems 7 (1) (2006) 124–132.
[42] Y. Lv, Y. Duan, W. Kang, Z. Li, F.-Y. Wang, Traffic flow prediction with big data: A deep learning approach, IEEE Transactions on Intelligent Transportation Systems 16 (2) (2015) 865–873.
[43] M. F. Brameier, W. Banzhaf, Basic concepts of linear genetic programming, Linear Genetic Programming (2007) 13–34.
[44] T. Iokibe, N. Mochizuki, T. Kimura, Traffic prediction method by fuzzy logic, in: Second IEEE International Conference on Fuzzy Systems, 1993, pp. 673–678.
[45] L. Li, W.-H. Lin, H. Liu, Type-2 fuzzy logic approach for short-term traffic forecasting, in: IEEE Proceedings-Intelligent Transport Systems, Vol. 153, IET, 2006, pp. 33–40.
[46] Y. Zhang, Z. Ye, Short-term traffic flow forecasting using fuzzy logic system methods, Journal of Intelligent Transportation Systems 12 (3) (2008) 102–112.
[47] E. Necula, Dynamic traffic flow prediction based on GPS data, in: IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 922–929.
[48] J. Bang-Jensen, G. Z. Gutin, Digraphs: Theory, Algorithms and Applications, Springer Science & Business Media, Berlin Heidelberg New York, 2008.
[49] B. Pan, Y. Zheng, D. Wilkie, C. Shahabi, Crowd sensing of traffic anomalies based on human mobility and social media, in: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2013, pp. 344–353.
[50] M. Crovella, E. Kolaczyk, Graph wavelets for spatial traffic analysis, in: IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428), Vol. 3, 2003, pp. 1848–1857.
[51] Y. Wu, X. Zhang, Y. Bian, Z. Cai, X. Lian, X. Liao, F. Zhao, Second-order random walk-based proximity measures in graph analysis: Formulations and algorithms, The VLDB Journal 27 (1) (2018) 127–152.
[52] S. Porta, P. Crucitti, V. Latora, The network analysis of urban streets: A dual approach, Physica A 369 (2006) 853–866.
[53] R. A. Horn, C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 2012.
[54] M. Faizrahnemoona, A. Schlote, E. Crisostomi, R. Shorten, A Google-like model for public transport, in: International Conference on Connected Vehicles and Expo (ICCVE), 2013, pp. 612–613.
[55] S. Asmussen, Applied Probability and Queues, Vol. 51 of Applications of Mathematics (New York), Springer-Verlag, New York, 2003.
[56] P. Brémaud, Markov chains: Gibbs fields, Monte Carlo simulation, and queues, Vol. 31 of Texts in Applied Mathematics, Springer-Verlag, New York, 1999.
[57] J. P. Jarvis, D. R. Shier, Graph-theoretic analysis of finite Markov chains, in: D. R. Shier, K. T. Wallenius (Eds.), Applied mathematical modeling: A multidisciplinary approach, CRC Press, 1996, pp. 85–102.
[58] L. Lovász, Random walks on graphs: A survey, in: D. Miklós, V. T. Sós, T. Szőnyi (Eds.), Combinatorics, Paul Erdős is eighty: Papers from the International Conference on Combinatorics, János Bolyai Mathematical Society, Budapest, Magyarország, 1996, pp. 353–397.
[59] N. L. Johnson, S. Kotz, N. Balakrishnan, Discrete Multivariate Distributions, Wiley, New York, 1997.
[60] U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (4) (2007) 395–416.
[61] P. Billingsley, Statistical Inference for Markov Processes, The University of Chicago Press, Chicago, 1961.
[62] I. Teodorescu, Maximum likelihood estimation for Markov chains, Tech. rep., arXiv, https://arxiv.org/abs/0905.4131 (2009).
[63] A. N. Langville, C. D. Meyer, Google's PageRank and Beyond – The Science of Search Engine Rankings, Princeton University Press, Princeton, NJ, 2006.
[64] M. Matsumoto, T. Nishimura, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation 8 (1) (1998) 3–30.
[65] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer-Verlag, New York, 2006.

Appendix A. A Toy Example
In order to demonstrate the main concepts and methods of this paper we present a simple toy example. Consider the road network G = (V, E) on Fig. 1, where V := {1, 2, 3, 4, 5} and E := {(1, 2), (2, 1), (2, 3), (2, 4), (3, 4), (4, 2), (4, 5), (5, 2)}. (We implemented this example in Python, see https://github.com/rbesenczi/Crowd-sourced-Traffic-Simulator/blob/master/model-sources/Markovkernel/example_graph.py.) Then |V| = 5 and |E| = 8. The adjacency matrix A_G of G, where we denote the vertices as well, can be derived as

    A_G :=
          1  2  3  4  5
       1  0  1  0  0  0
       2  1  0  1  1  0
       3  0  0  0  1  0
       4  0  1  0  0  1
       5  0  1  0  0  0

Clearly, G is a strongly connected digraph. Since 1 → 2 → 1 and 2 → 3 → 4 → 2 are directed cycles of coprime lengths 2 and 3, per(G) = 1 and thus G is aperiodic. The first power k for which A_G^k > 0 is k = 4, and

    A_G^4 =
        2  2  2  2  1
        2  5  2  4  2
        1  2  1  2  1
        2  4  2  3  2
        2  2  2  2  1

The entries of this matrix are the numbers of directed walks of length 4 between the pairs of vertices. One can see that the in- and outdegrees of the vertices are given as d^- = d^+ = (1, 3, 1, 2, 1)^T. Define the Markov kernel P on the road network G as the lazy random walk

    P :=
           1    2    3    4    5
       1  1/2  1/2   0    0    0
       2  1/6  1/2  1/6  1/6   0
       3   0    0   1/2  1/2   0
       4   0   1/4   0   1/2  1/4
       5   0   1/2   0    0   1/2

Clearly, P is weakly G-subordinated. Fig. 3 displays the Markov kernel P, denoting the transition probabilities on the edges, and its stationary distribution π on the vertices. Note that π = (1/8, 3/8, 1/8, 1/4, 1/8)^T. The symmetric unnormalized graph Laplacian matrix L and the symmetric normalized graph Laplacian matrix L̃ of the road network G are given as

    L =
        2  -2   0   0   0
       -2   6  -1  -2  -1
        0  -1   2  -1   0
        0  -2  -1   4  -1
        0  -1   0  -1   2

and

    L̃ =
        1      -0.577   0       0       0
       -0.577   1      -0.289  -0.408  -0.289
        0      -0.289   1      -0.354   0
        0      -0.408  -0.354   1      -0.354
        0      -0.289   0      -0.354   1
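The claims so far (the first all-positive power of A_G, the balanced degrees, and the stationary distribution of P) can be checked numerically. A minimal sketch with numpy, taking P to be the lazy random walk on G as in the matrix above:

```python
import numpy as np

# Adjacency matrix of the toy road network G (vertices 1..5).
A = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
])

# k = 4 is the first power with all entries positive, i.e. G is
# primitive (strongly connected and aperiodic).
A4 = np.linalg.matrix_power(A, 4)
assert (np.linalg.matrix_power(A, 3) == 0).any() and (A4 > 0).all()

# In- and outdegrees coincide: d- = d+ = (1, 3, 1, 2, 1).
assert (A.sum(axis=0) == A.sum(axis=1)).all()

# Lazy random walk kernel P and its stationary distribution: pi P = pi.
P = (np.eye(5) + A / A.sum(axis=1, keepdims=True)) / 2
pi = np.array([1/8, 3/8, 1/8, 1/4, 1/8])
assert np.allclose(pi @ P, pi)
```

Because the graph has equal in- and outdegrees, the stationary distribution is simply the degree vector normalized by |E| = 8, which is why π comes out as (1/8, 3/8, 1/8, 2/8, 1/8).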
The eigenvalues and eigenvectors of the symmetric unnormalized graph Laplacian matrix L are given as

    S = diag(0, 1.72, 2, 4.46, 7.82)

and

    O =
        0.447  -0.82   0      -0.18   0.29
        0.447  -0.11   0       0.23  -0.86
        0.447   0.36   0.71   -0.40   0.08
        0.447   0.22   0       0.76   0.41
        0.447   0.36  -0.71   -0.40   0.08

where S contains the eigenvalues in its diagonal and O is the orthonormal matrix of eigenvectors in its columns. The multiplicity of the smallest eigenvalue 0 is 1, which shows that the road network is strongly connected. The inverse of L on the subspace S, which is the generalized inverse or Moore-Penrose inverse of L, can be derived as

    L_S^- = O S^- O^T =
         0.41   0.01  -0.15  -0.12  -0.15
         0.01   0.11  -0.05  -0.02  -0.05
        -0.15  -0.05   0.36  -0.02  -0.14
        -0.12  -0.02  -0.02   0.18  -0.02
        -0.15  -0.05  -0.14  -0.02   0.36

where S^- is the generalized inverse of S.
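The spectral quantities above can be reproduced with numpy; this is a sketch, assuming L = D - A - A^T with D the diagonal matrix of total (in plus out) degrees. `numpy.linalg.eigh` returns the eigenvalues in increasing order with orthonormal eigenvector columns, and `numpy.linalg.pinv` gives exactly the inverse of L on the subspace orthogonal to the constant vector:

```python
import numpy as np

A = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
])

# Symmetric unnormalized Laplacian L = D - A - A^T.
D = np.diag(A.sum(axis=0) + A.sum(axis=1))
L = D - A - A.T

tau, O = np.linalg.eigh(L)   # ascending eigenvalues, orthonormal columns
assert np.allclose(tau, [0, 1.72, 2, 4.46, 7.82], atol=0.01)

# Multiplicity of eigenvalue 0 is 1 <=> strong connectivity.
assert np.isclose(tau[0], 0, atol=1e-8) and tau[1] > 1e-9

# Moore-Penrose inverse equals the spectral formula O S^- O^T restricted
# to the nonzero eigenvalues.
L_pinv = np.linalg.pinv(L)
assert np.allclose(L_pinv, O[:, 1:] @ np.diag(1 / tau[1:]) @ O[:, 1:].T)
```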
The eigenvalues and eigenvectors of the symmetric normalized graph Laplacian matrix L̃ are given as

    S̃ = diag(0, 0.77, 1, 1.5, 1.73)

and

    Õ =
        0.35   0.73   0      0     -0.58
        0.61   0.29   0      0      0.74
        0.35  -0.31   0.71  -0.50  -0.17
        0.50  -0.44   0      0.71  -0.24
        0.35  -0.31  -0.71  -0.50  -0.17

Clearly, the eigenvalues of L̃ are in [0, 2] and the smallest one is 0 with multiplicity 1. In this example, the speed of convergence in Proposition 2 is κ = max{0.23, 0.73} = 0.73.

We simulated k = 1000 trajectories on G, containing n = 3350 vertex observations in total. The two-dimensional consecutive frequency matrix N of the trajectories is given by

    N =
         0   250    0     0     0
       450     0  200   150     0
         0     0    0   450     0
         0   200    0     0   300
         0   350    0     0     0

As for the starting and ending points of the trajectories, node 1, for example, is the starting point of 250 and the ending point of 450 trajectories; the differences s_v - e_v over the nodes 1, ..., 5 are -200, 0, 250, -100 and 50, respectively. Thus

    λ = L_S^-(s - e) = (-116.66, -16.66, 116.66, 0, 16.66)^T.

Thus, the correction matrix R is given by

    R =
          0     100      0        0       0
       -100       0    133.33    16.66    0
          0       0      0     -116.66    0
          0     -16.66   0        0      16.66
          0     -33.33   0        0       0
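The correction step of Theorem 3 can be replayed numerically. The sketch below assumes the start/end differences s - e and the frequency matrix N quoted above; `numpy.linalg.pinv` plays the role of L_S^-, which is valid here because s - e has zero entry sum and therefore lies in the subspace on which L is invertible:

```python
import numpy as np

A = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
])
L = np.diag(A.sum(axis=0) + A.sum(axis=1)) - A - A.T

# Start/end imbalance s - e of the 1000 trajectories
# (node 1: 250 starts vs 450 ends, and so on).
s_minus_e = np.array([-200, 0, 250, -100, 50])

# lambda = L_S^-(s - e).
lam = np.linalg.pinv(L) @ s_minus_e

# Correction matrix R with r_uv = lambda_v - lambda_u on the edges of G.
R = (lam[None, :] - lam[:, None]) * A

# Adding R to the frequency matrix N balances it: equal row and column
# sums, i.e. an unnormalized two-dimensional stationary distribution.
N = np.array([
    [0, 250, 0, 0, 0],
    [450, 0, 200, 150, 0],
    [0, 0, 0, 450, 0],
    [0, 200, 0, 0, 300],
    [0, 350, 0, 0, 0],
])
M = N + R
assert np.allclose(M.sum(axis=0), M.sum(axis=1))
assert np.isclose(M.sum(), 2350)   # n_eff = n - k
```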
Since d^- = d^+ we have n_eff = n - k = 2350, and

    N + R =
         0     350      0        0       0
       350       0    333.33   166.66    0
         0       0      0      333.33    0
         0     183.33   0        0     316.66
         0     316.66   0        0       0

Thus the weighted least squares estimate of the two-dimensional stationary distribution is

    Q̂_WLS = (N + R)/n_eff =
        0      0.149  0      0      0
        0.149  0      0.142  0.071  0
        0      0      0      0.142  0
        0      0.078  0      0      0.135
        0      0.135  0      0      0

The stationary distribution is given as π̂_WLS = (0.149, 0.362, 0.142, 0.213, 0.135)^T and one can easily check that π̂_WLS is indeed the stationary distribution of the estimated Markov kernel

    P̂_WLS =
        0     1     0     0     0
        0.41  0     0.39  0.20  0
        0     0     0     1     0
        0     0.37  0     0     0.63
        0     1     0     0     0
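The two estimators can be compared directly: the sketch below row-normalizes N + R and N to get the WLS- and ML-estimated kernels and recovers both stationary distributions. The helper `stationary` (a generic left-Perron-eigenvector routine) is our own illustration, not part of the paper's method; the matrices are those of this appendix, with R written in exact thirds:

```python
import numpy as np

N = np.array([
    [0, 250, 0, 0, 0],
    [450, 0, 200, 150, 0],
    [0, 0, 0, 450, 0],
    [0, 200, 0, 0, 300],
    [0, 350, 0, 0, 0],
])
R = np.array([
    [0, 100, 0, 0, 0],
    [-100, 0, 400/3, 50/3, 0],
    [0, 0, 0, -350/3, 0],
    [0, -50/3, 0, 0, 50/3],
    [0, -100/3, 0, 0, 0],
])

def stationary(P):
    """Left Perron eigenvector of a stochastic matrix, as a distribution."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    return v / v.sum()

M = N + R
P_wls = M / M.sum(axis=1, keepdims=True)   # WLS-estimated kernel
P_ml = N / N.sum(axis=1, keepdims=True)    # ML-estimated kernel

# pi_WLS is just the normalized row sums of N + R; it is stationary for
# P_wls because N + R has equal row and column sums.
pi_wls = M.sum(axis=1) / M.sum()
assert np.allclose(pi_wls @ P_wls, pi_wls)
assert np.allclose(stationary(P_ml),
                   [0.224, 0.398, 0.100, 0.174, 0.104], atol=0.001)
```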
Note that the estimated Markov kernel using the standard maximum likelihood estimator is given as

    P̂_ML =
        0     1     0     0     0
        0.56  0     0.25  0.19  0
        0     0     0     1     0
        0     0.40  0     0     0.60
        0     1     0     0     0

which has stationary distribution π̂_ML = (0.224, 0.398, 0.100, 0.174, 0.104)^T.

Appendix B. Proofs
Proof of formula (9). For all g ∈ F_k we have, by formulas (5) and (6) and the multinomial theorem, that

    ∑_{f∈F_k} ϱ(f) R(f, g)
      = k! ∑_{f∈F_k} ∏_{u∈V} π_u^{f_u} ∑_{K∈M(f,g)} ∏_{u,v: u⇒v} p_uv^{k_uv} / k_uv!
      = k! ∑_{f∈F_k} ∑_{K∈M(f,g)} ∏_{u,v: u⇒v} (π_u p_uv)^{k_uv} / k_uv!
      = k! ∑_{K: ∑_u k_uv = g_v} ∏_{u,v: u⇒v} (π_u p_uv)^{k_uv} / k_uv!
      = k! ∏_{v∈V} (g_v!)^{-1} (∑_{u∈V} π_u p_uv)^{g_v}
      = k! ∏_{v∈V} π_v^{g_v} / g_v! = ϱ(g),

where the second equality uses that f_u = ∑_v k_uv for K ∈ M(f, g), and the last step uses that π is the stationary distribution of P, i.e. ∑_u π_u p_uv = π_v.

Proof of Theorem 3.
By the bias-variance decomposition, see Section 3.2 in [65], we have

    SSE(M, w | N) = ∑_{i=1}^k w_i^{-1} ‖N_i − w_i M‖²_G
                  = ∑_{i=1}^k w_i^{-1} ‖N_i − w_i N‖²_G + ‖N − M‖²_G.   (B.1)

Clearly, if ⟨·,·⟩_G denotes the inner product induced by the norm ‖·‖_G,

    ‖N_i − w_i M‖²_G = ‖N_i − w_i N + w_i N − w_i M‖²_G
      = ‖N_i − w_i N‖²_G + ‖w_i(N − M)‖²_G + 2⟨N_i − w_i N, w_i(N − M)⟩_G.

Thus, we have (B.1) since

    ∑_{i=1}^k w_i^{-1} ‖w_i(N − M)‖²_G = ∑_{i=1}^k w_i ‖N − M‖²_G = ‖N − M‖²_G,

    ∑_{i=1}^k w_i^{-1} ⟨N_i − w_i N, w_i(N − M)⟩_G = ∑_{i=1}^k ⟨N_i − w_i N, N − M⟩_G
      = ⟨∑_{i=1}^k N_i − ∑_{i=1}^k w_i N, N − M⟩_G = 0,

where we applied that ∑_{i=1}^k N_i = N and ∑_{i=1}^k w_i = 1. Minimizing SSE in its parameters M and w is equivalent to minimizing the right-hand side of (B.1) partially: minimizing the first term with respect to w, and minimizing the second term with respect to M. Since

    ‖N_i − w_i N‖²_G = ‖N_i‖²_G + w_i² ‖N‖²_G − 2 w_i ⟨N_i, N⟩_G

we have to minimize

    ∑_{i=1}^k w_i^{-1} ‖N_i − w_i N‖²_G
      = ∑_{i=1}^k w_i^{-1} ‖N_i‖²_G + ∑_{i=1}^k w_i ‖N‖²_G − 2 ∑_{i=1}^k ⟨N_i, N⟩_G
      = ∑_{i=1}^k w_i^{-1} ‖N_i‖²_G − ‖N‖²_G

under the constraint ∑_{i=1}^k w_i = 1. The Lagrange function is given as

    ℓ = ∑_{i=1}^k w_i^{-1} ‖N_i‖²_G − ‖N‖²_G + µ(∑_{i=1}^k w_i − 1)

where µ is the Lagrange multiplier. For the first-order condition we have, for all i = 1, …, k,

    ∂ℓ/∂w_i = −w_i^{-2} ‖N_i‖²_G + µ = 0.

Thus, w_i = µ^{-1/2} ‖N_i‖_G, i = 1, …, k, and hence µ^{1/2} = ∑_{i=1}^k ‖N_i‖_G and ŵ_i = ‖N_i‖_G / ∑_{j=1}^k ‖N_j‖_G, i = 1, …, k, minimizes the first term of (B.1).

To minimize the second term in (B.1) define the estimator M̂ = (m̂_uv) as a correction to the two-dimensional consecutive frequency matrix N in the following way: M̂ := N + R, where R := (r_uv)_{u,v∈V} is a real matrix on V such that (i) r_uv = 0 if (u, v) ∉ E ∪ S (i.e., u ⇏ v); (ii) m̂_{v+} = m̂_{+v} for all v ∈ V; and (iii) ‖N − M̂‖²_G = ∑_{u⇒v} r_uv² is minimal. Property (i) means that there is correction on the set E ∪ S only. Property (ii) means that M̂ is an unnormalized two-dimensional stationary distribution on G. Finally, property (iii) implies that M̂ is optimal in the least squares sense. By equation (15) one can see that (ii) implies

    r_{v+} − r_{+v} − e_v + s_v = 0,   v ∈ V.   (B.2)

Thus, the correction matrix R is given as a solution to the constrained optimization problem defined by

    (1/2) ∑_{u,v: u⇒v} r_uv² → min

under the constraints (B.2). The Lagrange function of this constrained optimization problem is given as

    ℓ := (1/2) ∑_{u,v: u⇒v} r_uv² + ∑_{v∈V} λ_v (r_{v+} − r_{+v} − e_v + s_v)

where λ_v, v ∈ V, are the Lagrange multipliers. Taking the gradient of the Lagrange function we have the first-order condition for extrema as

    ∂ℓ/∂r_uv = r_uv + λ_u − λ_v = 0,   for all u, v ∈ V: u ⇒ v.

Thus, we have r_uv = λ_v − λ_u for all u ⇒ v, which implies r_vv = 0 for all v ∈ V and, for all u, v ∈ V,

    r_{u+} = ∑_{v: u→v} (λ_v − λ_u) = ∑_{v∈V} a_uv λ_v − deg⁺(u) λ_u,
    r_{+v} = ∑_{u: u→v} (λ_v − λ_u) = deg⁻(v) λ_v − ∑_{u∈V} a_uv λ_u.

Thus, by (B.2), we have the linear equation for the vector λ ∈ F(V, ℝ) defined by λ := (λ_v)_{v∈V}:

    s − e = (r_{+v} − r_{v+})_{v∈V} = (D − A − A^T) λ = L λ.

By equation (16), s − e ∈ S, which implies that λ = L_S^-(s − e) + c·1, where c ∈ ℝ is arbitrary. One can easily see that R does not depend on c, thus we can suppose that c = 0, and we have R = (1λ^T − λ1^T) ∘ A, which finishes the proof.

The formula (18) for the effective sample size follows from

    n_eff := 1^T M̂ 1 = ∑_{u,v: u⇒v} m̂_uv = ∑_{u,v: u⇒v} (n_uv + r_uv) = n − k + (d⁻ − d⁺)^T λ

since

    ∑_{u,v: u⇒v} r_uv = ∑_{u,v: u→v} (λ_v − λ_u)
      = ∑_{v∈V} deg⁻(v) λ_v − ∑_{u∈V} deg⁺(u) λ_u = (d⁻ − d⁺)^T λ.

Proof of Proposition 1.
The entries of L can be expressed as

    l_uv = ∑_{w: w≠u} (a_uw + a_wu)   if u = v,
           −(a_uv + a_vu)            if u ≠ v.

Thus, we have

    ∑_{u,v∈V} l_uv α_u α_v = ∑_{u,w: u≠w} (a_uw + a_wu) α_u² − ∑_{u,v: u≠v} (a_uv + a_vu) α_u α_v
      = (1/2) ∑_{u,v: u≠v} (a_uv + a_vu)(α_u² − 2 α_u α_v + α_v²)
      = (1/2) ∑_{u,v: u≠v} (a_uv + a_vu)(α_u − α_v)²

which implies (20). The parts 2), 3), and 4) can be proved similarly to Proposition 1 in [60]. The strict positivity of the second eigenvalue in 5) follows from the strong connectedness of G by Proposition 2 in [60]. Finally, the remaining part of 5) is a direct consequence of the spectral decomposition of L, which implies L = ∑_{j=1}^{|V|} τ_j α_j α_j^T, and the fact that α_2, …, α_{|V|} form a basis of S.

Proof of Proposition 2.
The entries of L̃ can be expressed as

    l̃_uv = 1 − 2a_uu/d_u               if u = v,
           −(a_uv + a_vu)/√(d_u d_v)   if u ≠ v.

Thus, we have

    ∑_{u,v∈V} l̃_uv α_u α_v = ∑_{u∈V} (1 − 2a_uu/d_u) α_u² − ∑_{u,v: u≠v} (a_uv + a_vu) α_u α_v / √(d_u d_v)
      = (1/2) ∑_{u,v: u≠v} (a_uv + a_vu) (α_u²/d_u − 2 α_u α_v/√(d_u d_v) + α_v²/d_v)
      = (1/2) ∑_{u,v: u≠v} (a_uv + a_vu) (α_u/√d_u − α_v/√d_v)²

which implies (21). The proof of 2), 3), 4) and 5) is similar to that of Proposition 1. We have to prove only that τ̃_j ≤ 2. Since (a − b)² ≤ 2(a² + b²) for all a, b ∈ ℝ, with equality if and only if a = −b, this follows from

    α^T L̃ α ≤ ∑_{u,v∈V} (a_uv + a_vu)(α_u²/d_u + α_v²/d_v) = 2 ∑_{v∈V} α_v² = 2,

where equality holds in the first inequality if and only if α_u/√d_u = −α_v/√d_v for all u, v ∈ V. However, this is impossible if |V| ≥ 3.

Figure 13: The change of the distribution of cars during the simulation (5,000 (a-c), 10,000 (d-f), 20,000 (g-i) and 50,000 (j-l) cars; initial state, after 30 mins, and after 60 mins, respectively). The thickness of a street is proportional to the number of cars on it.