Public transport networks: empirical analysis and modeling

Abstract

We use complex network concepts to analyze statistical properties of urban public transport networks (PTN). To this end, we present a comprehensive survey of the statistical properties of PTNs based on the data of fourteen cities of so far unexplored network size. Especially helpful in our analysis are different network representations. Within a comprehensive approach we calculate PTN characteristics in all of these representations and perform a comparative analysis. The standard network characteristics obtained in this way often correspond to features that are of practical importance to a passenger using public traffic in a given city. Specific features are addressed that are unique to PTNs and networks with similar transport functions (such as networks of neurons, cables, pipes, vessels embedded in 2D or 3D space). Based on the empirical survey, we propose a model that albeit being simple enough is capable of reproducing many of the identified PTN properties. A central ingredient of this model is a growth dynamics in terms of routes represented by self-avoiding walks.



C. von Ferber,1,2,∗ T. Holovatch,3,1,† Yu. Holovatch,4,5,‡ and V. Palchykov§

1 Applied Mathematics Research Centre, Coventry University, Coventry CV1 5FB, UK
2 Theoretische Polymerphysik, Universität Freiburg, D-79104 Freiburg, Germany
3 Laboratoire de Physique des Matériaux, Université Henri Poincaré, Nancy 1, 54506 Vandœuvre les Nancy Cedex, France
4 Institute for Condensed Matter Physics, National Academy of Sciences of Ukraine, UA–79011 Lviv, Ukraine
5 Institut für Theoretische Physik, Johannes Kepler Universität Linz, A-4040, Linz, Austria

(Dated: October 22, 2018)

PACS numbers: 02.50.-r, 07.05.Rm, 89.75.Hc

I. INTRODUCTION

The general interest in networks of man-made and natural systems has led to a careful analysis of various network instances using empirical, simulational, and theoretical tools. The emergence of this field is sometimes referred to as the birth of network science [1, 2, 3, 4, 5]. In this paper, we use complex network concepts to analyze the statistical properties of public transport networks (PTN) of large cities. These constitute an example of transportation networks [3] and share general features of these systems: evolutionary dynamics, optimization, and embedding in two-dimensional (2D) space. Other examples of transportation networks are given by the airport [6, 7, 8, 9, 10, 11, 12, 13], railway [14], or power grid networks [6, 15, 16].

While the evolution of the PTN of a given city is closely related to the growth of the city itself and is therefore influenced by numerous factors of geographical, historical, and social origin, there is ample evidence that PTNs of different cities share common statistical properties that arise due to their functional purposes [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. Some of these properties have been analyzed in former studies; the objective of the present study, however, is to present a comprehensive survey of characteristics of PTNs and to provide a comparative analysis. Based on this empirical survey we are in the position to propose a growth model that captures many of the main (statistical) features of PTNs.

FIG. 1: One of the networks we analyze in this study. The Los Angeles PTN consists of $R = 1881$ routes and $N = 44629$ stations; some of them are shown in this map (color online).

A further distinct feature of our study is that the PTNs we consider are networks of all means of public transport of a city (buses, trams, subway, etc.), regardless of the specific means of transport.
A number of studies have analyzed specific sub-networks of PTNs [17, 18, 19, 20, 23, 24, 27]. Examples are the Boston [17, 18, 19, 20] and Vienna [20] subway networks and the bus networks of several cities in China [24, 27]. However, each particular traffic system (e.g. the network of buses or trams, or the subway network) is not a closed system: it is a subgraph of a wider transportation system of a city, or as we call it here, of a PTN. Therefore, to understand and describe the properties of public transport in a city as a whole, one should analyze the complete network, without restriction to specific parts. Indeed, for the case of Boston it has been shown that the network properties change drastically when passing from the subway system to the network "subway + bus" [18, 19].

Urban public transport networks of general type have so far been analyzed mainly in two previous studies [21, 22]. In the first one, Ref. [21], the PTNs of Berlin, Düsseldorf, and Paris were examined, whereas the subject of Ref. [22] was the public transport systems of 22 Polish cities. Ref. [21] concentrated on the scale-free properties. For the cities considered, the node degree distribution was shown to follow a power law. Moreover, power laws were found for a number of other specific features describing the traffic load on the PTN. However, the statistics in this study were too small for definite conclusions. In Ref. [22] it was found that the node degree distribution may follow a power law or be described by an exponential function, depending on the assumed network representation. Besides, a number of other network characteristics (clustering, betweenness, assortativity) were extensively analyzed.

In the present paper, we analyze PTNs of a number of major cities of the world (see Table I) [30, 31]. Our choice for this database was motivated by the requirement to collect network samples from cities of different geographical, cultural, and economic background.
Our current analysis extends former studies [21, 22] by considering cities with larger public transport systems (the typical number of stops in the systems considered in Ref. [22] was several hundred) as well as by systematically analyzing different representations. The idea of different network representations naturally arises in network science [1, 2, 3, 4, 5]. For the PTN the primary network topology is given by the set of routes, each servicing an ordered series of given stations (see Fig. 1 as an example). For the transportation networks studied so far, mainly two different neighborhood relations were used. In the first one, two stations are defined as neighbors only if one station is the successor of the other in the series serviced by a route [18, 19]. In the second one, two stations are neighbors whenever they are serviced by a common route [14]. We will exploit both representations in our study. Moreover, we introduce further natural representations (described in detail in Section II) which make the description of the PTNs of Table I comprehensive. In particular, this includes a bipartite graph representation of a transportation network that reflects its intrinsic features [20, 24, 26].

There is another reason to seek scale-free properties of PTNs considering a larger database of more cities with larger public transport communications involved. A currently well accepted mechanism to explain the abundant occurrence of power laws is that of preferential attachment or "rich gets richer" [32, 33, 34]. Since PTNs obviously are evolving networks, their evolution may be expected to follow similar underlying mechanisms. However, scale-free networks have also been shown to arise when minimizing both the effort for communication and the cost for maintaining connections [35, 36]. Moreover, this kind of optimization was shown to lead to small-world properties [37] and to explain the appearance of power laws in a general context [38]. Therefore, scale-free behavior of PTNs may also be related to obvious objectives to optimize their operation.

City          A      P      N       R      S      Type
Berlin        892    3.7    2992    211    29.4   BSTU
Dallas        887    1.2    5366    117    59.9   B
Düsseldorf    217    0.6    1494    124    28.5   BST
Hamburg       755    1.8    8084    708    25.5   BFSTU
Hong Kong     1052   7.0    2024    321    39.6   B
Istanbul      1538   11.1   4043    414    31.7   BST
London        1577   8.3    10937   922    34.2   BST
Los Angeles   1214   3.8    44629   1881   52.9   B
Moscow        1081   10.5   3569    679    22.2   BEST
Paris         2732   10.0   3728    251    38.2   BS
Rome          5352   4.0    3961    681    26.8   BT
São Paolo     1523   10.9   7215    997    58.3   B
Sydney        1687   3.6    1978    596    16.3   B
Taipei        2457   6.8    5311    389    70.5   B

TABLE I: Cities analyzed in this study. $A$: urban area (km$^2$); $P$: population (million inhabitants); $N$: number of PTN stations; $R$: number of PTN routes; $S$: mean route length. Type of transport which is taken into account in the PTN database: Bus, Electric trolleybus, Ferry, Subway, Tram, Urban train.

This paper is organized as follows. In the next section (II) we define different representations in which the PTN will be analyzed; sections III-V explore the network properties in these representations. We separately analyze local characteristics, such as node degrees and clustering coefficients (section III), and global characteristics, such as path length distributions and centralities (section IV). Special attention is paid to characteristics that are unique to PTNs and networks with similar construction principles.
An example is given by the analysis of sequences of routes which go in parallel along a given sequence of stations, a feature we call the 'harness' effect. A description of correlations between the properties of neighboring nodes in terms of generalized assortativities is performed in section V. Our findings for the statistics of real-world PTNs are supported by simulations of an evolutionary model of PTNs as displayed in section VI. Conclusions and an outlook are given in section VII. Some of our results have been preliminarily announced in Ref. [25].

FIG. 2: a: a piece of a public transport map. Stations A-F are serviced by the tram lines No 1 (solid line), No 2 (dashed line), No 3 (dotted line). Taking all the lines to be indistinguishable, we call such a representation $L'$-space. b: $L$-space. c: $P'$-space. All stations that belong to the same route are connected. d: $P$-space. e: $C'$-space. Each route is represented by a node, each link corresponds to a common station shared by the route nodes it connects. f: $C$-space.

II. PT NETWORK TOPOLOGY

Although everyone has an intuitive idea about what a PTN is, it appears that there are numerous ways to define its topology. Let us describe some of them, defining different 'spaces' in which public transport networks will be analyzed. A straightforward representation of a PT map in the form of a graph represents every station by a node, while any two nodes that are successively serviced by at least one route are linked by an edge, as shown in Fig. 2a. Note that the full information about the network of $N$ stations and $R$ routes is given by the set of ordered lists, each corresponding to one route or to one of the two directions of a given route. These simply list all stations serviced by that route in the order of service between two terminal stations or in the course of a round trip. Multiple entries of a given station in such a list are possible and do occur. Let us first introduce a simple graph to represent this situation. In the following we will refer to this graph as $L$-space [22]. This graph represents each station by a node; a link between two nodes indicates that there is at least one route that services the corresponding stations consecutively. No multiple links are allowed (see Fig. 2b). The neighbors of a given node in $L$-space represent all stations that are within reach of a single-station trip. For analyzing PTNs, the $L$-space representation has been used in Refs. [18, 21, 22, 23, 27]. Extending the notion of $L$-space one may either introduce multiple links between nodes depending on the number of services between them or associate a corresponding weight to a single link. We will refer to such a representation as $L'$-space (cf. Fig. 2a).

A particularly useful concept for the description of connectivity in transport networks, which we refer to as $P$-space [22], was introduced in Ref. [14] and used in PTN analysis in Refs. [20, 22, 27].
In this representation the network is a graph where stations are represented by nodes that are linked if they are serviced by at least one common route. In the $P$-space representation the neighborhood of a given node represents all stations that can be reached without changing means of transport. The $P$-space concept may be extended to include multiple or weighted links. Such a representation we refer to as $P'$-space (cf. Figs. 2c and 2d, correspondingly).

A somewhat different concept is that of a bipartite space, which is useful in the analysis of cooperation networks [3, 57]. In this representation, which we call $B$-space, both routes and stations are represented by nodes [24, 25, 26]. Each route node is linked to all station nodes that it services. No direct links between nodes of the same type occur (see Fig. 3). Obviously, in $B$-space the neighbors of a given route node are all stations that it services, while the neighbors of a given station node are all routes that service it.
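As an illustration of the $L$-, $P$-, and $B$-space definitions given above, the following minimal Python sketch builds the three link sets from ordered route lists. It is not taken from the original analysis: the function names and the three example routes are hypothetical (they are not the tram lines of Fig. 2), and links are stored as plain sets so that multiple links are discarded, as in the unprimed spaces.

```python
from itertools import combinations


def l_space(routes):
    """L-space: two stations are linked if at least one route serves them consecutively."""
    edges = set()
    for stops in routes.values():
        for a, b in zip(stops, stops[1:]):
            if a != b:  # ignore a repeated entry of the same station
                edges.add(frozenset((a, b)))
    return edges


def p_space(routes):
    """P-space: two stations are linked if they are served by at least one common route."""
    edges = set()
    for stops in routes.values():
        for a, b in combinations(sorted(set(stops)), 2):
            edges.add(frozenset((a, b)))
    return edges


def b_space(routes):
    """B-space: bipartite links between each route node and the stations it serves."""
    return {(route, station) for route, stops in routes.items() for station in set(stops)}


# Hypothetical example: three routes given as ordered lists of stations.
routes = {
    1: ["A", "B", "C", "D"],
    2: ["A", "B", "E"],
    3: ["E", "C", "F"],
}

print(len(l_space(routes)), len(p_space(routes)), len(b_space(routes)))
```

In this toy example stations A and D are neighbors in $P$-space (both are served by route 1) but not in $L$-space, since no route serves them consecutively.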

FIG. 3: A bipartite graph of tram lines (filled circles) and stations (circles) which corresponds to the public transport map of Fig. 2a. For the sake of illustration, lines corresponding to different tram routes are shown in a differing way. However, neither line type nor the order of the stations matter in this graph. Note that Figs. 2c-2f are the one-mode projections of this bipartite graph.

We note that the one-mode projection of the bipartite graph of $B$-space to the set of station nodes results in $P$-space, or in $P'$-space if we retain multiple links. The complementary projection to route nodes leads to a graph which we call $C$-space ($C'$-space if multiple links are retained). In this space all nodes represent routes and the neighbors of any route node are those routes with which it shares a common station, see Figs. 2e, 2f.

Below, we will study different features of the PT networks as they appear when represented in the above defined spaces. It is worthwhile to mention here that standard network characteristics, when represented in different spaces, turn out to be natural characteristics one is interested in when judging the public transport of a given city. To give an example, the average length of a shortest path in $L$-space, $\langle \ell_L \rangle$, gives the number of stops one has to pass on average to travel between any two stations. When represented in $P$-space, $\langle \ell_P \rangle$ indicates how many changes one has to make to travel between any two stations. And, finally, $\langle \ell_C \rangle$ indicates the number of changes one has to make to pass between any two routes. Another example is given by the node degree $k$: $k_{L'}$ tells in how many directions a passenger can travel at a given station; $k_L$ is the number of stops in the direct neighborhood; $k_P$ is the number of other stations reachable without changing a line; whereas $k_C$ tells how many routes are directly accessible from the given one.

Table II lists some of the PTN characteristics we obtained for the cities under consideration using publicly available data from the web pages of local transport organizations [30, 31]. A detailed analysis and discussion is given in the following sections III-V.

TABLE II: Columns: City, $\langle k_L \rangle$, $z_L$, $\ell_L^{\max}$, $\langle \ell_L \rangle$, $\langle \mathcal{C}_b^L \rangle$, $c_L$, $\langle k_P \rangle$, $z_P$, $\ell_P^{\max}$, $\langle \ell_P \rangle$, $\langle \mathcal{C}_b^P \rangle$, $c_P$, $\langle k_C \rangle$, $z_C$, $\ell_C^{\max}$, $\langle \ell_C \rangle$, $\langle \mathcal{C}_b^C \rangle$, $c_C$. Only the first entries of the Berlin row survive ($\langle k_L \rangle = 2.58$, $z_L = 1.96$, $\ell_L^{\max} = 68$, $\langle \ell_L \rangle = 18.5$); the remaining entries and the beginning of the caption are missing. Indices refer to $L$, $P$, and $C$-spaces, correspondingly. $k$: node degree (nearest neighbors number $z^{(1)}$); $z = \langle z^{(2)} \rangle / \langle z^{(1)} \rangle$ ($z^{(2)}$ being the next nearest neighbors number); $\ell^{\max}$, $\langle \ell \rangle$: maximal and mean shortest path length (10); $\mathcal{C}_b$: betweenness centrality (23); $c$: ratio of the mean clustering coefficient to that of the classical random graph of equal size (8). Averaging has been performed with respect to the corresponding network; only the mean shortest path $\langle \ell \rangle$ is calculated with respect to the largest connected component.

III. LOCAL NETWORK CHARACTERISTICS

Let us first examine local properties of the PTNs under discussion. Instead of looking for characteristics of individual nodes we will be interested in their mean values and statistical distributions. This approach allows us to derive conclusions that are significant for the global behavior of the given network. The simplest but highly important properties are those concerning the node degrees of a network and in particular their distribution. Early attempts to model complex networks were performed by mathematicians using the concept of random networks [40, 41], in which correlations are absent. A wealth of insight was gained by elaborating the theory on rigorous grounds, developing many concepts which remain at the core of network analysis. A random graph is given by a set of $N$ nodes and $M$ links. The nodes to which the two ends of each link are connected are chosen with constant probability $2M/N^2$. If multiple links are excluded, the average number of neighbors $z$ is equal to the average node degree $k$, which is:

$$\langle z \rangle = \langle k \rangle = 2M/N. \tag{1}$$

For the node degree $k$ and its moments $k^m$ the average (1) can also be considered as an average with respect to the node degree distribution $p(k)$:

$$\langle k^m \rangle = \sum_{k=1}^{k_{\max}} p(k)\, k^m, \tag{2}$$

with the obvious notation $k_{\max}$ for the maximal node degree. In (2), $\langle \dots \rangle$ stands for an ensemble average over different network configurations. In the following, when analyzing empirical data, we will often use the same notation for an average over a large network instance. For classical random graphs of finite size the node degree distribution $p(k)$ is binomial; in the infinite case it becomes a Poisson distribution.

In Figs. 4, 5 we show node degree distributions for PTNs of several cities in $L$, $P$, and $C$-spaces. Note that to get smoother curves we plot in the case of $P$ and $C$-spaces the cumulative distributions defined as:

$$P(k) = \sum_{q=k}^{k_{\max}} p(q). \tag{3}$$

FIG. 4: a: Node degree distributions of the PTNs of several cities in $L$-space. b: Cumulative node degree distribution in $P$-space. c: Cumulative node degree distribution in $C$-space. Berlin (circles, $\hat{k}_L = 1.24$, $\hat{k}_P = 39.7$), Düsseldorf ($\hat{k}_L = 1.43$, $\hat{k}_P = 58.8$), Hong Kong ($\hat{k}_L = 2.50$, $\hat{k}_P = 125.1$).

FIG. 5: a: Node degree distributions of the PTNs of several cities in $L$-space. b: Cumulative node degree distributions in $P$-space. c: Cumulative node degree distribution in $C$-space. London (circles, $\gamma_L = 4.48$, $\gamma_P = 4.39$), Los Angeles ($\gamma_L = 4.85$, $\gamma_P = 3.92$), Paris ($\gamma_L = 2.62$, $\gamma_P = 3.70$).

In Fig. 4 the data is shown in a log-linear plot together with fits (for $L$ and $P$-spaces) to an exponential decay:

$$p(k) \sim \exp(-k/\hat{k}), \tag{4}$$

where $\hat{k}$ is of the order of the mean node degree. Within the accuracy of the data both $L$ and $P$-space distributions for the cities analyzed in Fig. 4 are nicely fitted by an exponential decay. As far as the $L$-space data is concerned, we find evidence for an exponential decay for about half of the cities analyzed, while the other part rather demonstrates a power law decay of the form:

$$p(k) \sim 1/k^{\gamma}. \tag{5}$$

Figs. 5a, 5b show the corresponding plots for three other cities on a log-log scale. Numerical values of the fit parameters $\hat{k}$ and $\gamma$ of (4), (5) for different cities are given in Table III. There, bracketed values indicate a less reliable fit. Note that for $L$-space the fit was done directly for the node degree distribution $p(k)$, whereas due to a substantial scattering of data in $P$-space the cumulative distribution (3) was fitted and the corresponding values for the fit parameters $\gamma_P$, $\hat{k}_P$ were extracted from those for the cumulative distributions.

While the node degree distributions of almost half of the cities in the $L$-space representation display a power law decay (5), this is not the case for the $P$-space. So far, the analysis of PTNs of smaller cities never showed any power-law behavior in $P$-space [22, 27]. The data for the three cities shown in Fig. 5b gives first evidence of power law behavior of $P(k)$ in the $P$-space representation. Previous results concerning node-degree distributions of PTNs in $L$ and $P$-spaces seemed to indicate that in general the degree distribution is power-law like in $L$-space and exponential in P-space.
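The quantities entering Eqs. (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fitting procedure used for Table III: the function names are hypothetical, the degree list is a toy example, and the scale $\hat{k}$ is estimated by an ordinary least-squares line through $\ln p(k)$ versus $k$.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical node degree distribution p(k) from a list of node degrees."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}


def cumulative_distribution(p):
    """Cumulative distribution P(k) = sum over q >= k of p(q), cf. Eq. (3)."""
    total = 0.0
    cumulative = {}
    for k in sorted(p, reverse=True):
        total += p[k]
        cumulative[k] = total
    return dict(sorted(cumulative.items()))


def fit_exponential_scale(dist):
    """Estimate k_hat in dist(k) ~ exp(-k / k_hat), cf. Eq. (4), by a
    least-squares straight line through ln dist(k) versus k."""
    ks = [k for k, v in dist.items() if v > 0]
    ys = [math.log(dist[k]) for k in ks]
    mean_k = sum(ks) / len(ks)
    mean_y = sum(ys) / len(ys)
    slope = (sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
             / sum((k - mean_k) ** 2 for k in ks))
    return -1.0 / slope


# Toy example: four nodes with degrees 1, 1, 2, 3.
p = degree_distribution([1, 1, 2, 3])
print(p)
print(cumulative_distribution(p))
```

For an exponential $p(k)$ the cumulative distribution decays with the same scale $\hat{k}$, whereas for a power law $p(k) \sim k^{-\gamma}$ the cumulative distribution decays approximately as $k^{-(\gamma - 1)}$, which has to be kept in mind when exponents are extracted from $P(k)$.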
This pattern was interpreted [22] as indicating strong correlations in $L$-space and random connections between the routes explaining $P$-space behavior. Our present study, which includes a much less homogeneous selection of cities (Ref. [22] was based exclusively on Polish cities), shows that almost any combination of different distributions in $L$ and $P$-spaces may occur. However, the three cities that show a power law distribution in $P$-space also exhibit power law behavior in $L$-space, as one can see comparing Figs. 5a and 5b.

City          γ_L      k̂_L      γ_P      k̂_P
Berlin        (4.30)   1.24     (5.85)   39.7
Dallas        5.49     (0.78)   (4.67)   76.9
Düsseldorf    (3.76)   1.43     (4.62)   58.8
Hamburg       (4.74)   1.46     (4.38)   60.7
Hong Kong     (2.99)   2.50     (4.40)   125.1
Istanbul      4.04     (1.13)   (2.70)   71.4
London        4.48     (1.44)   4.39     (143.3)
Los Angeles   4.85     (1.52)   3.92     (201.0)
Moscow        (3.22)   2.15     (2.91)   50.0
Paris         2.62     (3.30)   3.70     (100.0)
Rome          3.95     (1.71)   (5.02)   54.8
São Paolo     2.72     (4.20)   (4.06)   225.0
Sydney        4.03     (1.88)   (5.66)   38.7
Taipei        (3.74)   1.75     (5.16)   201.0

TABLE III: Parameters of the fits of the PTN node degree distributions to an exponential (4) and power law (5) behavior. Bracketed values indicate less reliable fits. Subscripts refer to $L$ and $P$-spaces [31].

In $C$-space the decay of the node degree distribution is exponential or faster, as one can see from the plots in Figs. 4c and 5c. From the cities presented there, only the PTNs of Berlin, London, and Los Angeles are governed by an exponential decay and their node degree distributions can be approximated by a straight line in the figures.

For most cities that show a power law degree distribution in $L$-space the corresponding exponent $\gamma_L$ is $\gamma_L \sim$ […] $N$ also lie in this region: $\gamma_L = 3.$[…] ($N = 940$), $\gamma_L = 3.$[…] ($N = 1023$), $\gamma_L = 3.44$ for Warsaw ($N = 1530$) [22]. According to the general classification of scale-free networks [2] this indicates that in many respects these networks are expected to behave similarly to those with exponential node degree distribution. Prominent exceptions to this rule are provided by the PTNs of Paris ($\gamma_L = 2.62$) and São Paolo ($\gamma_L = 2.72$). […] $\gamma_L$ in the range $2.$[…] $\div$ […] ($N = 3938$), Shanghai ($N = 2063$), and Nanjing ($N = 1150$) [27].

The connectivity within the closest neighborhood of a given node is described by the clustering coefficient defined as

$$C_i = \frac{2 y_i}{k_i (k_i - 1)}, \tag{6}$$

where $y_i$ is the number of links between the $k_i$ nearest neighbors of the node $i$. The clustering coefficient of a node may also be defined as the probability of any two of its randomly chosen neighbors to be connected. For the mean value of the clustering coefficient of a random graph one finds

$$\langle C \rangle_R = \frac{\langle k \rangle_R}{N} = \frac{2M}{N^2}. \tag{7}$$

FIG. 6: Mean clustering coefficient $\langle C_P(k) \rangle$ of several PTNs in $P$-space. Berlin (stars), London (triangles), Taipei (circles).

In Table II we give the values of the mean clustering coefficient in $L$, $P$, and $C$-spaces. The highest absolute values of the clustering coefficient are found in $P$-space, where their range is given by $\langle C_P \rangle = 0.$[…] $\div$ […] $\langle C_L \rangle = 0.$[…] $\div$ […] $\langle C \rangle$ by the mean clustering coefficient (7) of a random graph of the same size:

$$c = N^2 \langle C \rangle / (2M). \tag{8}$$

In $L$ and $P$-representations we find the mean clustering coefficient to be larger by orders of magnitude relative to the random graph. This difference is less pronounced in $C$-space, indicating a lower degree of organization in these networks. Furthermore, we find these values to vary strongly within the sample of the 14 cities. This suggests that the concepts according to which various PTNs are structured lead to a measurable difference in their organization.

In $P$-space the clustering coefficient of a node is strongly correlated with the node degree. In Fig. 6 we show the mean clustering coefficient of nodes of degree $k$, $\langle C_P(k) \rangle$, as a function of $k$ for several PTNs. Its behavior can be understood as follows. Recall that in $P$-space the degree of a node (station) equals the number of stations that can be reached from a given one without changing lines. Each route enters the network as a complete graph, within which every node has a clustering coefficient of one. A small number $k$ of neighbors of a given station indicates that the station belongs to a single route (i.e. $\langle C_P(k) \rangle$ is most probably equal to one). For nodes with higher degrees $k$ it is more probable that they belong to more than one route. Consequently, $\langle C_P(k) \rangle$ decreases with $k$. The change in the behavior of $\langle C_P(k) \rangle$ should occur at some value of $k$ which is of the order of the mean number of stops of the routes. The prominent feature of the function $\langle C_P(k) \rangle$ in $P$-space is that it decays following a power law

$$\langle C_P(k) \rangle \sim k^{-\beta}. \tag{9}$$

Within a simple model of networks with star-like topology this exponent is found to be of value $\beta = 1$ [22]. In transport networks, this behavior was first observed for the Indian railway network [14] and then for the Polish PTNs [22]. In our case, the values of the exponent $\beta$ for the networks studied lie in the range from $0.65$ (São Paolo) to $0.96$ (Los Angeles) with a mean value of $0.$[…].

FIG. 7: Mean shortest path length distribution $P(\ell)$ for the PTNs of Berlin (stars), Hong Kong (circles), Paris (boxes) and Rome (triangles). The solid line shows a fit to the function (12). a: $L$-space; b: $P$-space; c: $C$-space.

IV. GLOBAL CHARACTERISTICS

A. Path length distribution

Let $\ell_{ij}$ be the length of a shortest path between sites $i$ and $j$ in a given space. The mean shortest path is defined as

$$\langle \ell \rangle = \frac{2}{N(N-1)} \sum_{i>j=1}^{N} \ell_{ij}. \tag{10}$$

Note that $\langle \ell \rangle$ is well-defined only if nodes $i$ and $j$ belong to the same connected component of the network. In the following any expression as given in Eq. (10) will be restricted to this case. Furthermore, related network characteristics will be calculated for the largest (or giant) connected component, GCC. Correspondingly, $N$ denotes the number of nodes of this component. Denoting the path length distribution as $P(\ell)$, the average (10) reads

$$\langle \ell \rangle = \sum_{\ell=1}^{\ell_{\max}} P(\ell)\, \ell, \tag{11}$$

where $\ell_{\max}$ is the maximal shortest path length found on the connected component. In Fig. 7 we plot the mean shortest path length distributions obtained in different spaces for several selected cities. Together with the data we plot a fit to the asymmetric unimodal distribution [22]:

$$P(\ell) = A \ell \exp(-B\ell^2 + C\ell), \tag{12}$$

where $A$, $B$, $C$ are fit parameters. As can be seen from the figures, the data is generally nicely reproduced by this ansatz. However, in certain networks additional features may lead to a deviation from this behavior, as can be seen from Fig. 8, which shows the mean shortest path length distribution in $L$-space, $P_L(\ell)$, for Los Angeles. One observes a second local maximum on the right shoulder of the distribution. Qualitatively this behavior may be explained by assuming that the PTN consists of more than one community. For the simple case of one large community and a second smaller one at some distance, this situation will result in short intra-community paths, which give rise to a global maximum, and a set of longer paths that connect the larger to the smaller community, resulting in additional local maxima.
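The path length distribution $P(\ell)$ and the averages (10), (11) can be obtained by a breadth-first search from every node. The following Python sketch is a minimal illustration under the assumption that the graph is given as an adjacency dictionary and is connected (i.e. it is already restricted to the GCC); the function names and the four-station example are hypothetical.

```python
from collections import Counter, deque


def bfs_lengths(adj, source):
    """Shortest path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_distribution(adj):
    """Normalized distribution P(l) of shortest path lengths over all node pairs.

    Every unordered pair is counted twice (once from each end), which does not
    change the normalized distribution.
    """
    counts = Counter()
    for s in adj:
        for d in bfs_lengths(adj, s).values():
            if d > 0:
                counts[d] += 1
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


def mean_shortest_path(distribution):
    """Mean shortest path length as the first moment of P(l), cf. Eq. (11)."""
    return sum(length * prob for length, prob in distribution.items())


# Hypothetical example: four stations on a single line A-B-C-D.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
dist = path_length_distribution(line)
print(dist, mean_shortest_path(dist))
```

For the four-station line the six station pairs have lengths 1, 1, 1, 2, 2, 3, so both Eq. (10) and Eq. (11) give $\langle \ell \rangle = 10/6$.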
Such asituation deﬁnitely appears to be present in the case ofthe Los Angeles PTN, see Fig. 1.Let us introduce a characteristic that informs how re-mote a given node is from the other nodes of the net-works. For the node i this may be characterized by thevalue: (cid:96) i = 1 N − (cid:88) j (cid:54) = i (cid:96) ij . (13)Now, the mean shortest path (10) can be deﬁned in terms FIG. 8: Mean shortest path length distribution in L -space, P L ( (cid:96) ), for the PTN of Los Angeles. of (cid:96) i as: (cid:104) (cid:96) (cid:105) = 1 N (cid:88) i (cid:96) i . (14)In order to look for correlations between (cid:96) i and the nodedegree k i let us introduce the value: (cid:96) ( k ) = 1 N k N (cid:88) i =1 (cid:96) i δ k,k i , (15)where N k is number of nodes of degree k and δ k,k i is theKronecker delta. Consequently, (cid:96) ( k ) is the mean short-est path length between any node of degree k and othernodes of the network. For the majority of the analyzedcities the dependence of the mean path (cid:96) L ( k ) (15) onthe node degree k in L -space can be approximated by apower law (cid:96) L ( k ) ∼ k − α L . (16)The value of the exponent varies in the range α L = 0 . ÷ .

27. We show this dependence for several cities in Fig.9. A particular relation between path lengths and nodedegrees can be shown to hold relating the mean pathlength between two nodes to the product of their nodedegrees. To this end let us deﬁne (cid:96) ( k, q ) = N (cid:88) i,j =1 (cid:96) i,j δ k i k j ,kq . (17)As has been shown in [42], this relation can be approxi-mated by (cid:96) ( k, q ) = A − B log( kq ) . (18) a bc d FIG. 9: Mean path (cid:96) L ( k ) (15) in the L -space as a functionof the node degree k with a ﬁt to the power law decay (16). a : Berlin, α L = 0 . b : Hong Kong, α L = 0 . c : Paris, α L = 0 . d : Taipei, α L = 0 . For random networks the coeﬃcients A and B can becalculated exactly [43]. The validity of Eq. (18) waschecked on the base of PTNs of some Polish cities anda rather good agreement for the majority of the citieswas found in L -space. In our analysis which concernsPTNs of much larger size, we do not observe the samegood agreement for all cities. The suggested logarithmicdependence (18) was found by us in L -space also for thelarger cities, however with much more pronounced scatterof data for large values of the product kq . In Fig. 10 weplot the mean path (cid:96) L ( k, q ) in the L -space for the PTNof Berlin, Hong Kong, Rome, and Taipei. Note, however,that due to the scatter of data a logarithmic dependencefrequently is indistinguishable from a power law with asmall exponent.In P -space, the shortest path length (cid:96) ij gives the min-imal number of routes required to be used in order toreach site j starting from the site i . In turn, (cid:96) i , Eq. (13),deﬁnes the number of routes one uses on average travel-ing from the site i to any node of the network. The higherthe node degree, the easier it is to access other routes inthe network. Therefore, also in P -space one expects adecrease of (cid:96) P ( k ) when k increases. This is shown forseveral cities in Fig. 11. 
Besides the expected decrease of $\ell_P(k)$, one can see a tendency to a power-law decay

$$\ell_P(k) \sim k^{-\alpha_P}. \tag{19}$$

The value of the exponent $\alpha_P$ varies in the interval from $\alpha_P = 0.09$ (for Sydney) to $\alpha_P = 0.17$ (for Dallas) and is centered around $\alpha_P = 0.[\ldots] \div 0.13$, as shown for the cities in Fig. 11. The mean path $\ell_P(k,q)$ as a function of $kq$ for several cities is given in $P$-space in Fig. 12. The scattering of data is much more pronounced than in $L$-space. However, one distinguishes a tendency of $\ell_P(k,q)$ to decrease with an increase of $kq$. The red lines in Fig. 12 are guides to the eye characterizing the decay.

FIG. 10: Mean path $\ell_L(k,q)$ (17) in $L$-space as a function of $kq$ for the PTN of Berlin (stars), Hong Kong (circles), Rome (triangles), and Taipei (squares).

FIG. 11: Mean path $\ell_P(k)$ in $P$-space as a function of the node degree $k$ and its fit to the power law decay (19). a: Berlin, $\alpha_P = 0.[\ldots]$; b: Hong Kong, $\alpha_P = 0.[\ldots]$; c: Paris, $\alpha_P = 0.[\ldots]$; d: Taipei, $\alpha_P = 0.[\ldots]$.

B. Centralities

To measure the importance of a given node with respect to different properties of a graph, a number of so-called centrality measures have been introduced [44, 45, 46, 47, 48]. Most of them are based either on measuring path lengths to other nodes or on counting the number of paths between other nodes mediated by this node.

FIG. 12: Mean path $\ell_P(k,q)$ in $P$-space for the PTN of Berlin (a) and Paris (b) as a function of $kq$.

The closeness $\mathcal{C}_c(i)$ [45] and graph $\mathcal{C}_g(i)$ [46] centralities of a node $i$ are based on the shortest path lengths $\ell_{ij}$ to other nodes $j$:

$$\mathcal{C}_c(i) = \frac{1}{\sum_{j \neq i} \ell_{i,j}}, \tag{20}$$

$$\mathcal{C}_g(i) = \frac{1}{\max_{j \neq i} \ell_{i,j}}. \tag{21}$$

Only nodes $j$ that belong to the same connected component as $i$ contribute to (20), (21). For a given node these properties obviously depend on the size of the connected component to which the node belongs. The importance of the node $i$ with respect to the connectivity within the graph may be measured in terms of the number of shortest paths $\sigma_{jk}(i)$ between nodes $j$ and $k$ that go via node $i$. Denoting by $\sigma_{jk}$ the overall number of shortest paths between nodes $j$ and $k$, one defines the stress $\mathcal{C}_s(i)$ [47] and betweenness $\mathcal{C}_b(i)$ [48] centralities by:

$$\mathcal{C}_s(i) = \sum_{j \neq i \neq k} \sigma_{jk}(i), \tag{22}$$

$$\mathcal{C}_b(i) = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}. \tag{23}$$

Numerical values of the betweenness centrality (23) are given in Table I in $L$, $P$ and $C$-spaces.

Averaging the two centralities that are based on path length (20), (21) one obtains values that are closely related to the average shortest path length on the GCC. Since this relation is independent of the representation of the PTN, we find a very similar correspondence between $\langle \ell \rangle$ and the mean centralities $\langle \mathcal{C}_c \rangle$, $\langle \mathcal{C}_g \rangle$ in all spaces considered, as shown in Fig. 13. The fact that these centralities are based on the inverse path length is reflected by the negative slope of the curves shown in the figures.

The betweenness centrality (23) and the related stress centrality (22) of a given node measure the share of the shortest paths between nodes that are mediated by that node.
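The four centralities of Eqs. (20)-(23) can all be obtained from breadth-first searches that record both distances and the number of shortest paths. The sketch below is a brute-force illustration for small graphs, not the procedure used for the city data. It assumes a connected, undirected graph with at least two nodes, and it sums over unordered pairs $\{j, k\}$ (an assumption; summing over ordered pairs would double the stress and betweenness values). All names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(adj):
    info = {s: bfs_counts(adj, s) for s in adj}
    closeness = {i: 1 / sum(info[i][0].values()) for i in adj}  # Eq. (20)
    graph = {i: 1 / max(info[i][0].values()) for i in adj}      # Eq. (21)
    stress = dict.fromkeys(adj, 0)                              # Eq. (22)
    betweenness = dict.fromkeys(adj, 0.0)                       # Eq. (23)
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:
            continue  # j and k lie in different components
        for i in adj:
            if i in (j, k) or i not in dist_j or i not in dist_k:
                continue
            # i lies on a shortest j-k path iff the distances add up.
            if dist_j[i] + dist_k[i] == dist_j[k]:
                through = sigma_j[i] * sigma_k[i]
                stress[i] += through
                betweenness[i] += through / sigma_j[k]
    return closeness, graph, stress, betweenness


# Toy graph (hypothetical): four stations on a ring 0-1-2-3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
closeness, graph, stress, betweenness = centralities(ring)
```

On the ring, each pair of opposite stations is joined by two shortest paths, so every station carries one of them: its stress is 1 while its betweenness is 1/2, which illustrates the normalization by $\sigma_{jk}$ in Eq. (23).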
It is obvious that a node with a high degree has a higher probability to be part of any path connecting other nodes. This relation between $\mathcal{C}_b$ and the node degree may be quantified by observing their correlation. In Fig. 14 we plot the mean betweenness centrality $\langle \mathcal{C}_b(k) \rangle$ of all nodes that have a given degree $k$. There, we present results for the PTN of Paris in $L$, $C$, $P$, and $B$-spaces. The betweenness-degree correlation is especially well expressed in $L$-space (Fig. 14a) and, with somewhat less precision, in $C$-space (Fig. 14b). In both cases there is a clear tendency to a power law $\langle \mathcal{C}_b(k) \rangle \sim k^{\eta}$ with an exponent $\eta = 2 \div 3$. Let us note here that this power law, together with the scale free behavior of the degree distribution, implies that the betweenness distribution should also follow a power law with an exponent $\delta$. This behavior is clearly identified in Fig. 15 for the $L$-space betweenness distribution of the Paris PTN, for which we find an exponent $\delta \approx [\ldots].5$. The resulting scaling relation [49]

$$\eta = (\gamma - 1)/(\delta - 1) \tag{24}$$

is fulfilled within the accuracy for these exponents.

FIG. 13: Correspondence between the mean shortest path $\langle \ell \rangle$ and mean centralities $\langle \mathcal{C}_c \rangle$ (open circles), $\langle \mathcal{C}_g \rangle$ (filled circles) for all fourteen PTN listed in Table II in (a) $L$, (b) $P$, and (c) $C$-spaces.

In the plots for both $B$ and $P$-spaces we observe the occurrence of two regimes which correspond to small and large degrees $k$. This separation however has a different origin in each of these cases. In the $B$-space representation, the network consists of nodes of two types, route nodes and station nodes. Typically, station nodes are connected only to a low number of routes while there is a minimal number of stations per route. One may thus identify the low degree behavior as describing the betweenness of station nodes, while the high degree behavior corresponds to that of route nodes. In the overlap region of the two regimes one may observe that, at equal degree, station nodes have a higher betweenness than route nodes. The occurrence of two regimes in the $P$-space representation has a similar origin as the change of behavior observed for the mean clustering coefficient $\langle C_P(k) \rangle$, see Fig. 6. Namely, stations with low degrees in general belong only to a single route and thus are of low importance for the connectivity within the network, resulting in a low betweenness centrality. Comparing our results with those of Ref. [22] we do not however find a saturation for the low $k$ region, as observed there. Betweenness-degree relations $\langle \mathcal{C}_b(k) \rangle$ similar to those observed in Fig. 14 for the PTN of Paris are also found for most of the other cities, however with different quality of expression.

FIG. 14: Mean betweenness centrality $\langle \mathcal{C}_b(k) \rangle$ - degree $k$ correlations for the PTN of Paris in (a) $L$, (b) $C$, (c) $P$, and (d) $B$-spaces.

FIG. 15: Betweenness centrality $\mathcal{C}_b$ distribution for the PTN of Paris in $L$-space.

FIG. 16: Cumulative harness distributions for the Istanbul (a) and Moscow (b) PTN.

C. Harness

Besides the local and global properties of networks described above, which can be defined in any type of network, there are some characteristics that are unique to PTNs and networks with similar construction principles. A particularly striking example is the fact that, since the routes share the same grid of streets and tracks, a number of routes will often proceed in parallel along shorter or longer sequences of stations. Similar phenomena are observed in networks built with real space consuming links such as cables, pipes, neurons, etc. In the present case this behavior may be easily worked out on the basis of the sequences of stations serviced by each route. To quantify this behavior the notion of network harness has recently been introduced [25]. It is described by the harness distribution $P(r,s)$: the number of sequences of $s$ consecutive stations that are serviced by $r$ parallel routes. Similarly to the node-degree distributions, we observe that the harness distribution for some cities (Hong Kong, Istanbul, Paris, Rome, São Paulo, Sydney) may be fitted by a power law:

$$P(r,s) \sim r^{-\gamma_s}, \quad \text{for fixed } s, \tag{25}$$

whereas the PTNs of other cities (Berlin, Dallas, Düsseldorf, London, Moscow) are better fitted by an exponential decay:

$$P(r,s) \sim \exp(-r/\hat{r}_s), \quad \text{for fixed } s. \tag{26}$$

As examples we show the harness distribution for Istanbul (Fig. 16a) and for Moscow (Fig. 16b). Moreover, sometimes (we observe this for Los Angeles and Taipei) the regime (25) changes to (26) for larger $s$. We show this for the PTN of Los Angeles in Fig. 17. There, one can see that for small values of $s$ the curves are better fitted by a power law dependence (25). With increasing $s$ a tendency to an exponential decay (26) appears, see Fig. 17b.

As one can observe from Figs. 16, 17 the slope of the harness distribution $P(r,s)$ as a function of the number of routes $r$ increases with an increase of the sequence length $s$.
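The harness distribution $P(r,s)$ can be counted directly from the station sequences of the routes. The sketch below is one possible reading of the definition, not necessarily the exact counting procedure of [25]: it slides a window of $s$ consecutive stations along every route, treats a sequence and its reverse as the same stretch (an assumption), and counts each route at most once per sequence. The function name and the toy routes are illustrative.

```python
from collections import Counter


def harness_distribution(routes, s):
    """Return {r: number of distinct s-station sequences served by exactly r routes}."""
    routes_per_sequence = Counter()
    for route in routes:
        windows = set()
        for start in range(len(route) - s + 1):
            window = tuple(route[start:start + s])
            # Assumption: direction of travel is ignored, so a sequence and
            # its reverse are mapped to the same canonical key.
            windows.add(min(window, window[::-1]))
        # Using a set means each route counts once per sequence.
        routes_per_sequence.update(windows)
    return dict(Counter(routes_per_sequence.values()))


# Toy data (hypothetical): three routes sharing the stretch B-C-D.
routes = [
    ["A", "B", "C", "D"],
    ["X", "B", "C", "D"],
    ["D", "C", "B", "Y"],
]
p_s2 = harness_distribution(routes, 2)
p_s3 = harness_distribution(routes, 3)
```

For the toy routes, the pairs B-C and C-D are each served by three routes, while three further pairs are served by a single route; for $s = 3$ only the triple B-C-D remains shared. Fitting the resulting counts against (25) or (26) for fixed $s$ is a separate step not shown here.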
For PTNs for which the harness distribution follows the power law (25), the corresponding exponents $\gamma_s$ are found in the range $\gamma_s = 2 \div 4$. For those distributions with an exponential decay the scale $\hat{r}_s$ (26) varies in the range $\hat{r}_s = 1.[\ldots] \div 4$. The power laws observed for the behavior of $P(r,s)$ indicate a certain level of organization and planning, which may be driven firstly by the need to minimize the costs of infrastructure and secondly by the fact that points of interest tend to be clustered in certain locations of a city. Note that this effect may be seen as a result of the strong interdependence of the evolutions of both the city and its PTN.

FIG. 17: Cumulative harness distributions for Los Angeles. a: log-log scale; b: log-linear scale.

We want to emphasize that the harness effect is a feature of the network given in terms of its routes, but it is invisible in any of the graph representations presented so far. In particular, PTN representations in terms of a simple graph, which do not contain multiple links (such as $L$, $P$, $C$ and $B$-spaces), cannot be used to extract harness behavior. Furthermore, the multi-graph representations (such as $L'$, $P'$, and $C'$-spaces) would need to be extended to account for the continuity of routes. As noted above, the notion of harness may also be useful for the description of other networks with similar properties. On the one hand, the harness distribution is closely related to distributions of flow and load on the network. On the other hand, in the situation of space-consuming links (such as tracks, cables, neurons, pipes, vessels) the information about the harness behavior may be important with respect to the spatial optimization of networks. A generalization may be readily formulated to account for real-world networks in which links (such as cables) are organized in parallel over a certain spatial distance. While for the PTN this distance is simply measured by the length of a sequence of stations, a more general measure would be the length of the contour along which these links proceed in parallel.

V. GENERALIZED ASSORTATIVITIES

To describe correlations between the properties of neighboring nodes in a network the notion of assortativity was introduced, measuring the correlation between the node degrees of neighboring nodes in terms of the mean Pearson correlation coefficient [50, 51]. Here, we propose to generalize this concept to also measure correlations between the values of other node characteristics (other observables). For any link $i$ let $X_i$ and $Y_i$ be the values of the observable at the two nodes connected by this link. Then the correlation coefficient is given by:

$$r = \frac{M^{-1} \sum_i X_i Y_i - \left[ M^{-1} \sum_i \tfrac{1}{2}(X_i + Y_i) \right]^2}{M^{-1} \sum_i \tfrac{1}{2}(X_i^2 + Y_i^2) - \left[ M^{-1} \sum_i \tfrac{1}{2}(X_i + Y_i) \right]^2}, \tag{27}$$

where summation is performed with respect to the $M$ links of the network. Taking $X_i$ and $Y_i$ to be the node degrees, Eq. (27) is equivalent to the usual formula for the assortativity of a network [50]. Here we will call this special case the degree assortativity $r^{(1)}$. In the following we will investigate correlations between other network characteristics such as the observables considered above, $z$, $C_i$ (6), $\mathcal{C}_c$ (20), $\mathcal{C}_g$ (21), $\mathcal{C}_s$ (22), $\mathcal{C}_b$ (23). Consequently, this results in generalized assortativities of next nearest neighbors ($r^{(2)}$), clustering coefficients ($r^{cl}$), closeness ($r^{c}$), graph ($r^{g}$), stress ($r^{s}$), and betweenness ($r^{b}$) centralities.

The numerical values of the above introduced assortativities $r^{(1)}$ and $r^{(2)}$ for the PTN under discussion are listed in Table IV in $L$, $P$ and $C$-spaces. With respect to the values of the standard node degree assortativity $r^{(1)}_L$ in $L$-space, we find two groups of cities. The first is characterized by values $r^{(1)}_L = 0.[\ldots] \div 0.3$. Although these values are still small they signal a finite preference for assortative mixing. That is, links tend to connect nodes of similar degree. In the second group of cities these values are very small, $r^{(1)}_L = -0.[\ldots] \div 0.08$, showing no preference in linkage between nodes with respect to node degrees. PTNs of both large and medium sizes are present in each of the groups. This indicates the absence of correlations between network size and degree assortativity $r^{(1)}_L$ in $L$-space. Measuring the same quantity in the $P$ and $C$-spaces, we observe different behavior. In $P$-space almost all cities are characterized by very small (positive or negative) values of $r^{(1)}_P$, with the exception of the PTNs of Istanbul ($r^{(1)}_P = -0.12$) and Los Angeles ($r^{(1)}_P = 0.12$). In $C$-space, PTNs demonstrate clear assortative mixing with $r^{(1)}_C = 0.[\ldots] \div 0.5$. An exception is the PTN of Paris with $r^{(1)}_C = 0.06$. […] ($r^{(1)} > 0$) or neutral ($r^{(1)} \sim 0$) mixing with respect to the node degree (number of first nearest neighbors) $k$. Calculating the assortativity $r^{(2)}$ with respect to the number of second nearest neighbors, we explore the correlation of a wider environment of adjacent nodes. Since in this case the two connected nodes share at least part of this environment (the first nearest neighbors of a node form part of the second nearest neighbors of the adjacent node), one may expect the assortativity $r^{(2)}$ to be non-negative. The results for $r^{(2)}$ shown in Table IV appear to confirm this assumption. In all the spaces considered, we find that all PTNs that belong to the group of neutral mixing with respect to $k$ also belong to the same group with respect to the second nearest neighbors. For those PTNs that display significant nearest neighbors assortativity $r^{(1)}$ we find that the second nearest neighbor assortativity $r^{(2)}$ is in general even stronger, in line with the above reasoning.

City          r(1)_L   r(2)_L   r(1)_P   r(2)_P   r(1)_C   r(2)_C
Berlin         0.158    0.616    0.065    0.441    0.086    0.318
Dallas         0.150    0.712    0.154    0.728    0.290    0.550
Düsseldorf     0.083    0.650    0.041    0.494    0.244    0.180
Hamburg        0.297    0.697    0.087    0.551    0.246    0.605
Hong Kong      0.205    0.632   -0.067    0.238    0.131    0.087
Istanbul       0.176    0.726   -0.124    0.378    0.282    0.505
London         0.221    0.589    0.090    0.470    0.395    0.620
Los Angeles    0.240    0.728    0.124    0.500    0.465    0.753
Moscow         0.002    0.312   -0.041    0.296    0.208    0.011
Paris          0.064    0.344   -0.010    0.258    0.060   -0.008
Rome           0.237    0.719    0.044    0.525    0.384    0.619
São Paulo     -0.018    0.437   -0.047    0.266    0.211    0.418
Sydney         0.154    0.642    0.077    0.608    0.458    0.424
Taipei         0.270    0.721    0.009    0.328    0.100    0.041

TABLE IV: Nearest neighbors and next nearest neighbors assortativities $r^{(1)}$ and $r^{(2)}$ in different spaces for the whole PTN.

Recall that both closeness and graph centralities $\mathcal{C}_c$ and $\mathcal{C}_g$ are measured in terms of path lengths, Eqs. (20), (21).
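The correlation coefficient of Eq. (27) is straightforward to evaluate once a value of the observable is attached to every node. The sketch below assumes the standard symmetrized Pearson form over the $M$ links, with factors $\tfrac{1}{2}$ and squared terms in the denominator, which is what we take Eq. (27) to denote; the function name and the toy graphs are illustrative. The denominator vanishes when the observable takes the same value on every node, a case the sketch does not handle.

```python
def assortativity(edges, value):
    """Pearson correlation of `value` between the two end nodes of every link."""
    m = len(edges)
    mean_product = sum(value[u] * value[v] for u, v in edges) / m
    mean_value = sum(0.5 * (value[u] + value[v]) for u, v in edges) / m
    mean_square = sum(0.5 * (value[u] ** 2 + value[v] ** 2) for u, v in edges) / m
    return (mean_product - mean_value ** 2) / (mean_square - mean_value ** 2)


def degrees(edges):
    """Node degrees of an undirected simple graph given as an edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


# Toy graphs (hypothetical): a star and a chain of four nodes.
star = [(0, 1), (0, 2), (0, 3)]
chain = [(0, 1), (1, 2), (2, 3)]
r_star = assortativity(star, degrees(star))     # hub linked only to leaves
r_chain = assortativity(chain, degrees(chain))
```

Passing the node degrees gives the degree assortativity $r^{(1)}$; passing any other per-node observable (clustering coefficient, a centrality, the number of second nearest neighbors) gives the corresponding generalized assortativity. The star is perfectly disassortative, $r = -1$, and the chain gives $r = -1/2$.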
It is natural to expect that adjacent nodes will have very similar (or almost identical) centralities $\mathcal{C}_c$ and $\mathcal{C}_g$. In turn this will lead to strong assortative mixing with high assortativities $r^{c}$ and $r^{g}$. This assumption holds only if the average path length in the network is sufficiently large. The latter is certainly the case for PTNs in $L$-space but it does not hold in $P$ and even less in $C$-spaces. Indeed, in $L$-space, where most PTNs display a mean path length $\langle \ell_L \rangle > 10$ (see Table II), we find values of $r^{c}_L$ in the range $r^{c}_L = 0.[\ldots] \div 0.998$ ($r^{g}_L = 0.[\ldots] \div 0.[\ldots]$) […] $r^{c}_L = 0.[\ldots]$, $r^{g}_L = 0.[\ldots]$, $\langle \ell_L \rangle = 7.[\ldots]$ […] $r^{c}_L = 0.[\ldots]$, $r^{g}_L = 0.[\ldots]$, $\langle \ell_L \rangle = 6.[\ldots]$ […] $P$ and $C$-spaces, where the mean path lengths are much shorter (of the order of three in $P$ and of the order of two in $C$-space), the one-step difference in path length between adjacent nodes leads to much reduced assortative mixing. Numerically this is reflected in much lower (however positive) values of the corresponding assortativities for PTNs where $\langle \ell \rangle$ is especially small. Indeed, for all PTNs that display in $P$-space a mean path length $\langle \ell_P \rangle < [\ldots]$ […] $r^{c}_P < 0.[\ldots]$, $r^{g}_P < 0.[\ldots]$ […] $\langle \ell_P \rangle$ may display larger assortativities even in $P$-space. The extreme example is Los Angeles with $\langle \ell_P \rangle = 4.[\ldots]$, $r^{c}_P = 0.[\ldots]$, $r^{g}_P = 0.[\ldots]$ […] $C$-space, where vertices are routes, the mean path length is even smaller and a further reduction of the closeness and graph centrality assortativities is observed. For five PTNs we find in $C$-space $\langle \ell_C \rangle < [\ldots]$ […] $r^{c}_C < 0.[\ldots]$, $r^{g}_C < 0.3$. Again the largest values are attained in the Los Angeles PTN with $\langle \ell_C \rangle = 3.[\ldots]$, $r^{c}_C = 0.[\ldots]$, $r^{g}_C = 0.[\ldots]$ […] $r^{s}$ and $r^{b}$ and the clustering coefficient assortativity $r^{cl}$ we in general find no evidence for any (positive or negative) correlation in any of the spaces considered. The only exceptions are the stress and betweenness centrality assortativities in $L$-space, $r^{s}_L$ and $r^{b}_L$. There, small but significantly positive values of $r^{s}_L$ and $r^{b}_L$ are found. The latter is explained by the relatively large mean path length in this space in conjunction with relatively small node degree values. Let us recall that stress and betweenness centralities essentially count the number of shortest paths mediated by a given node. If a selected node is a part of many such long paths while having low degree, there is a high probability that any of its neighbors will also be a part of these paths. Consequently, a positive value of $r^{s}$ ($r^{b}$) will arise. The analogous conclusion can be drawn for nodes with low betweenness (or stress) centralities. For most PTNs the values of the assortativities under consideration vary in the range $r^{s}_L = 0.[\ldots] \div 0.[\ldots]$, $r^{b}_L = 0.[\ldots] \div 0.61$. Exceptions are the PTNs which in $L$-space have a mean path length $\langle \ell_L \rangle < 10$, namely Moscow, Paris and São Paulo. There we find $r^{s}_L = 0.[\ldots] \div 0.[\ldots]$, $r^{b}_L = 0.[\ldots] \div 0.[\ldots]$.

VI. MODELING PTNS

A. Motivation and description of the model

Having at hand the above described wealth of empirical data and analysis with respect to typical scenarios found in a variety of real-world PTNs, we feel in the position to propose a model that, albeit being simple, may capture the characteristic features of these networks. Nonetheless it should be capable of discriminating between some of the various scenarios observed.

If we were only to reproduce the degree distribution of the network, standard models such as random networks [4, 52] or preferential attachment type models [6, 34, 53, 54, 55, 56] would suffice. The evolution of such networks however is based on the attachment of nodes. For the description of PTNs the concept of routes as finite sequences of stations is essential [5, 23, 25, 28] and allows for the representation with respect to the spaces defined above. Moreover, taking a route as the essential element of PTN growth allows one to account for the essential bipartite structure of this network [20, 24, 26, 57]. Therefore, the growth dynamics in terms of routes will be a central ingredient of our model. Another obvious requirement is the embedding of this model in two-dimensional space. To simplify matters we will restrict the model to a two-dimensional grid, in particular to a square lattice. Both the observations of power law degree distributions and the occurrence of the corresponding harness distributions described above indicate a preference of routes to service common stations (i.e. an attraction between routes).

Let us describe our model in more detail. As noticed above, a route will be modeled as a sequence of stations that are adjacent nodes on a two-dimensional square lattice. Noting that in general loops in PTN routes are almost absent, the simplest choice to model a PTN route is a self-avoiding walk (SAW). It may sound less obvious that a route, apart from being non self-intersecting, proceeds randomly. However, the analysis of geographical data [25] has shown that the fractal dimension of PTN routes closely coincides with that of a two-dimensional SAW, $d_f = 4/3$. The model PTN then consists of $R$ routes of $S$ stations each, constructed on a possibly periodic $X \times X$ square lattice. The dynamics of the route generation adheres to the following rules:

1. Construct the first route as a SAW of $S$ lattice sites.
2. Construct the $R - 1$ subsequent routes as SAWs:
   a) choose a terminal station at $\vec{x}$ with probability
      $$p \sim k_{\vec{x}} + a/X; \tag{28}$$
   b) choose any subsequent station $\vec{x}$ of the route with probability
      $$p \sim k_{\vec{x}} + b. \tag{29}$$

In (28), (29) $k_{\vec{x}}$ is the number of times the lattice site $\vec{x}$ has been visited before (the number of routes that pass through $\vec{x}$). Note that to ensure the SAW property any route that intersects itself is discarded and its construction is restarted with step 2a).

B. Global topology of model PTN
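The growth rules (28), (29) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the simulation code behind the figures: the first route starts at a uniformly random site, each subsequent station is drawn among the four lattice neighbors of the current end of the route, the lattice is periodic, the normalization $a/X$ is taken as Eq. (28) reads here, and $b > 0$ is required so that the weights never all vanish. All function and variable names are illustrative.

```python
import random


def neighbours(site, X):
    """Four nearest neighbours of a site on a periodic X x X square lattice."""
    x, y = site
    return [((x + 1) % X, y), ((x - 1) % X, y), (x, (y + 1) % X), (x, (y - 1) % X)]


def grow_route(visits, X, S, a, b, rng, first=False):
    """Try to grow one route of S sites; return None if the walk hits itself."""
    sites = [(x, y) for x in range(X) for y in range(X)]
    if first:
        current = rng.choice(sites)  # assumption: uniform start for route 1
    else:
        # Rule 2a, Eq. (28): terminal station with weight k_x + a/X.
        weights = [visits.get(site, 0) + a / X for site in sites]
        current = rng.choices(sites, weights=weights)[0]
    route = [current]
    while len(route) < S:
        options = neighbours(current, X)
        # Rule 2b, Eq. (29): next station with weight k_x + b (needs b > 0).
        weights = [visits.get(site, 0) + b for site in options]
        current = rng.choices(options, weights=weights)[0]
        if current in route:
            return None  # self-intersection: discard the whole route
        route.append(current)
    return route


def simulate_ptn(R, S, X, a, b, seed=0):
    """Generate R self-avoiding routes of S stations each; return (routes, visits)."""
    rng = random.Random(seed)
    visits = {}  # k_x: number of routes passing through each lattice site
    routes = []
    while len(routes) < R:
        route = grow_route(visits, X, S, a, b, rng, first=not routes)
        if route is None:
            continue  # restart the construction of this route
        routes.append(route)
        for site in route:
            visits[site] = visits.get(site, 0) + 1
    return routes, visits


# Small illustrative run (parameters far below those used in the text).
routes, visits = simulate_ptn(R=5, S=6, X=20, a=0.0, b=0.5, seed=1)
```

With $a = 0$ every new route necessarily starts on a station that is already served, so the simulated network stays connected, in line with the discussion of the parameter $a$ below. Rejecting a route at the first self-intersection is simple but becomes slow for long routes; a production simulation would likely need a more efficient SAW sampler.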

Let us first investigate the global topology of this model as a function of its parameters. We first fix both the number of routes $R$ and the number of stations $S$ per route as well as the size of the lattice $X$. This leaves us with essentially two parameters $a$ and $b$, Eqs. (28), (29). Dependencies on $R$, $S$, and $X$ will be studied below.

For the real-world PTNs as studied in the previous sections, almost all stations belong to a single component, the GCC, with the possible exception of a very small number of routes. Within the network however we often observe what we called above the harness effect of several routes proceeding in parallel for a sequence of stations. Let us first investigate from a global point of view which parameters $a$ and $b$ reproduce realistic maps of PTNs. In Fig. 18 we show simulated PTNs on lattices $300 \times 300$ for $R = 1024$, $S = 64$ and different values of the parameters $a$ and $b$. Each route is represented by a continuous line tracing the path along its sequence of stations. For representation purposes, parallel routes are shown slightly shifted. Thus, the line thickness and intensity of colors indicate the density of the routes.

FIG. 18: PTN maps of different simulated cities of size $300 \times 300$ with $R = 1024$ routes of $S = 64$ stations each (color online). First row: $a = 0$, $b = 0.[\ldots] \div 0.5$. Second row: $b = 0.[\ldots]$, $a = 15 \div 500$. With an increase of $b$, routes cover more and more area. An increase of $a$ leads to clusterisation of the network.

Columns: $R$, $S$, $b$, $\langle k_L \rangle$, $z_L$, $\ell^{\max}_L$, $\langle \ell_L \rangle$, $\langle \mathcal{C}^L_b \rangle$, $\langle k_P \rangle$, $z_P$, $\ell^{\max}_P$, $\langle \ell_P \rangle$, $\langle \mathcal{C}^P_b \rangle$, $c_P$, $\langle k_C \rangle$, $z_C$, $\ell^{\max}_C$, $\langle \ell_C \rangle$, $\langle \mathcal{C}^C_b \rangle$, $c_C$.
256 16 0.5 2.92 1.66 61 20.8 4.7 × […]

TABLE V: […] $X = 300$, $a = 0$ for different parameters $R$, $S$, and $b$. The rest of the notations as in Table II.

Parameter $b$ governs the evolution of each single subsequent route. If $b = 0$ each subsequent route is restricted to follow the previous one. The change of simulated PTNs with $b$ for fixed $a = 0$ is shown in the first row of Fig. 18. For small values of $b = 0 \div 0.[\ldots]$ […] $b$ from $b = 0.[\ldots]$ […] $b = 0.[\ldots]$ […] $b$ to $b = 0.[\ldots]$ […]

The parameter $a$ quantifies the possibility to start a new route outside the existing network. For vanishing $a = 0$ the resulting network always consists of a single connected component, while for finite values of $a$ a few or many disconnected components may occur. The results found for $a = 0$ and varying $b$ are completely independent of the lattice size $X$ provided $X$ is sufficiently large. When introducing a finite $a$, however, new routes may be started anywhere on the lattice, which results in a strong lattice size dependency. To partly compensate for this, the impact of $a$ has been normalized by $X$ in (28). The dependence of the simulated PTN maps on $a$ for fixed $b = 0.[\ldots]$ […] $a < 15$ one observes the formation of a single large cluster with only a few individual routes occurring outside this cluster. Slightly increasing $a$ beyond $a = 15$ one finds a sharp transition to a situation with several (two or more) clusters. For much larger values of $a$ the number of clusters further increases and the situation becomes more and more homogeneous: the routes tend to cover all available lattice space.

C. Statistical characteristics of model PTN

From the above qualitative investigation we conclude that realistic PTN maps are obtained for small or vanishing $a$ and $b \geq 0.5$. To quantitatively investigate the dependence of the simulated networks on their parameters, including $R$ and $S$, let us now compare their statistical characteristics with those we have empirically obtained for the real-world networks. In Table V we have chosen to list the same characteristics of the simulated PTN as are displayed for the real-world networks in Table II. To provide additional checks of the correlations between simulated and real-world networks, we present the characteristics in all of $L$, $P$, and $C$-spaces. Let us note that our choice of a square lattice as the underlying grid limits the number of nearest neighbors of a given station in $L$-space to $k_L \leq 4$. Moreover, since no direct links between these neighbors occur, the clustering coefficient in $L$-space vanishes, $c_L = 0$. Nonetheless, as we discuss below, both characteristics display nontrivial behavior similar to real-world networks when displayed in $P$ and $C$-spaces.

For reasons explained above we choose a vanishing parameter $a = 0$ and $b = 0.[\ldots]$, $b = 5.[\ldots]$ […] $R = 256, [\ldots], [\ldots]$, $S = 16, [\ldots], 64$. In the range of parameters covered in the Table we observe only weak changes of the various characteristics. Natural trends are that with the increase of the number of routes $R$ the maximal and mean shortest path lengths increase in all spaces. This is most pronounced in $L$-space, while it is weakest in $C$-space. A similar increase is observed in $L$-space when increasing the number of stations $S$ per route. Choosing the values of $R$ in the range $R = 256 \div [\ldots]$ […] $S = 16$, $S = 32$, the average and maximal values of the characteristics studied here are found within the ranges seen for real-world PTNs, see Table II. More detailed information is contained in the distributions of these characteristics and their correlations.

We restrict the further discussion to simulated PTNs described by $R = 256, [\ldots]$, $S = 16, 32$, and $a = 0$, $b = 0.5$, which appear to reproduce many of the characteristics of real-world PTNs. In Fig. 19 we display the mean shortest path length distribution for these selected PTNs in $L$, $P$, and $C$-spaces. In $L$-space we observe two groups of distributions which correspond to the two route lengths $S = 16$ and $S = 32$, the most probable values for the path length $\hat{\ell}_L$ being of the order of the corresponding $S$. In $P$ and $C$-spaces the distributions are very similar, with most probable path lengths $\hat{\ell}_P \sim 3$, $\hat{\ell}_C \sim 2$. In all cases the distributions are well fitted by the asymmetric unimodal distribution (12) and resemble those of the real-world networks shown in Fig. 7. Varying $b = 0.[\ldots] \div [\ldots]$ […] $L$-space degrees are restricted by the geometry of the underlying square lattice. Thus, of the representations discussed here, one may observe non-trivial distributions only in $P$, $C$, and $B$-spaces. Fig. 20a shows the cumulative node degree distribution in $P$-space in semi-logarithmic scale. Recall that for the majority of real-world PTNs studied in section III as well as in other works [22, 27], the $P$-space node degree distribution was found to decay exponentially. All distributions shown in Fig. 20a display two regions, each governed by an exponential decay with a separate scale. Note that increasing both $S$ and $R$ leads to an increase of the ranges over which these regions extend. Comparing these results with those of Fig. 4b for real-world PTNs we find that all ranges observed there are also reproduced here. Within the parameter ranges chosen here the current model does not seem to attain a power law node degree distribution in $P$-space.

Comparing the $C$-space node degree distributions for real-world and simulated PTNs (Figs. 4c and 20b, correspondingly) one finds a definite tendency to an exponential behavior with two different scales in both cases. Note however that for the simulated PTNs the scales increase with the number of routes $R$ while they decrease with the number of stations per route $S$.

The simulated results discussed so far concerned data obtained for individual instances of modeled PTNs. One of the reasons for this was to reduce the computational effort required for the calculation of path lengths, betweennesses, and related global characteristics.
Furthermore,in particular for the simulations involving high numberof routes some self averaging may be expected to occur.The latter assumption was tested and veriﬁed by (i) sim-ulating a reasonable set of PTNs with the same choice ofparameters and (ii) by performing large-scale simulationscalculating local characteristics. A result of the latterprocedure involving averages over up to 3 · instancesof simulated networks is shown in Fig. 21 a . There weshow the node degree distribution of the station nodesin B -space, i.e. the bipartite network of routes and sta-tions with the inherent neighborhood relation (see Fig.3). As can be seen in the double logarithmic plot shownin Fig. 21 a a power-law like behavior of this distribu-6 a b c FIG. 19: Mean shortest path length distribution P ( (cid:96) ) for several simulated PTNs. a : L -space; b : P -space; c : C -space. Symbolscorrespond to simulation results, curves to ﬁts of unimodal distributions. a b FIG. 20: Cumulative node degree distributions P cum ( k ) (3) for several simulated PTNs in ( a ) P and ( b ) C -spaces. tion that extends over a wider range is found for smallvalues of the parameter b . This corresponds to a situa-tion where one ﬁnds many routes to proceed in parallel(compare with the maps shown in Fig. 18). For the morerealistic choices of the b parameter the overall behaviorof this distribution is described by an exponential decay.The scale of this decay strongly depends on b . Fig. 21 b shows that similar distributions for the real cities haveoscillating character, which is caused by the fact thatnon-cumulative distributions are plotted. Similarly, in-dividual distributions for simulated PTNs are in generalnon-monotonous, however the large number average ofthe distribution appears to be monotonously decreasing.Nevertheless, comparing plots in Figs. 
21a and 21b one sees that in general the model is capable of reproducing the global decay properties of the station node degree distributions in B-space.

In Fig. 22 we show the betweenness-degree correlation for the simulated PTN with $X = 300$, $R = 500$, $S = 50$, $a = 0$, and $b = 0.5$. There, we present the mean betweenness centrality $\langle \mathcal{C}_b(k) \rangle$ in C-, P-, and B-spaces. Corresponding plots for a real-world network are shown in Fig. 14. The plots for the simulated networks in Figs. 22a - 22c qualitatively reproduce the behavior of $\langle \mathcal{C}_b(k) \rangle$ observed for the real-world networks in C-, P-, and B-spaces. The L-space behavior cannot be reproduced because of the restrictions imposed by the geometry of the underlying square lattice.

In Figs. 23a and 23b we plot the cumulative harness distributions $P(r, s)$ for two simulated networks with $R = 256$, $S = 32$, $a = 0$ and different values of the parameter $b$: $b = 0.$[…] (a) and $b = 1.0$ (b). Similar plots for real-world networks are given in Figs. 16 and 17. The plots of Fig. 23 reproduce the two regimes empirically observed for the real-world PTNs. In the first, the harness distribution is governed by a power-law decay (25), Fig. 23a, whereas in the other there is a tendency to an exponential decay (26), Fig. 23b. A prominent feature demonstrated by Fig. 23 is that one can tune the decay regime by changing the parameter $b$. For small values of $b$ the probability that a route proceeds in parallel with other routes is high, cf. Eq. (29). Therefore, the number of "hubs" in the $P(r, s)$ distribution, i.e. lines of several routes that go in parallel, is large for small $b$. This is reflected by a power-law decay of the distribution. Conversely, an increase of $b$ leads to a decrease in the number of such hubs, as shown by the exponential decay of their distribution.

FIG. 21: (a) Averaged (over 3000 ÷ […]) node degree distributions of simulated PTNs in B-space. $R = 1024$, $S = 64$, $a = 0$. Parameter $b$ changes in the region $b = 0.$[…] ÷ […]. (b) Corresponding node degree distributions for Hamburg (circles), Hong Kong (squares), Los Angeles (triangles), and Istanbul (stars).

FIG. 22: Mean betweenness centrality $\langle \mathcal{C}_b(k) \rangle$ for the simulated city of 300 × 300 sites with $R = 500$, $S = 50$, $a = 0$, and $b = 0.5$ in C (a), P (b), and B (c) spaces.

FIG. 23: Cumulative harness distributions $P(r, s)$ for the simulated PTN with $R = 256$, $S = 32$. (a) $a = 0$, $b = 0.$[…]; (b) $a = 0$, $b = 1.0$. Compare with plots in Figs. 16, 17 for the real-world networks.

Summarizing the comparison of the statistical characteristics of real-world networks with those of simulated ones, one can state that the model proposed above captures many essential features of real-world PTNs. This is especially evident if one includes in the comparison different network representations (different spaces), as performed above.
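
To make the harness distributions compared above more concrete, the following Python sketch counts, for a fixed sequence length $s$, how many distinct sequences of $s$ consecutive stations are served by exactly $r$ routes, and then accumulates these counts over $r$. It is only an illustration under stated assumptions: the function names `harness_counts` and `cumulative_harness`, the toy `routes` data, and the choice to treat a sequence and its reverse as the same piece of track are ours, and the normalization used for the plotted $P(r, s)$ may differ.

```python
from collections import Counter, defaultdict


def harness_counts(routes, s):
    """Return {r: number of distinct s-station sequences served by exactly r routes}."""
    routes_per_segment = defaultdict(set)
    for route_id, stops in enumerate(routes):
        for i in range(len(stops) - s + 1):
            segment = tuple(stops[i:i + s])
            # Assumption: a sequence and its reverse describe the same track.
            key = min(segment, segment[::-1])
            routes_per_segment[key].add(route_id)
    return dict(Counter(len(ids) for ids in routes_per_segment.values()))


def cumulative_harness(counts):
    """Return {r: number of sequences served by at least r routes}."""
    total = 0
    cumulative = {}
    for r in sorted(counts, reverse=True):
        total += counts[r]
        cumulative[r] = total
    return dict(sorted(cumulative.items()))


# Hypothetical toy network: three routes share the stretch B-C-D.
routes = [
    ["A", "B", "C", "D"],
    ["E", "B", "C", "D"],
    ["D", "C", "B", "F"],
    ["G", "H"],
]

print(harness_counts(routes, 2))                      # {1: 4, 3: 2}
print(cumulative_harness(harness_counts(routes, 2)))  # {1: 6, 3: 2}
```

In this toy example the two segments B-C and C-D are each shared by three routes, which is the kind of "hub" that, when frequent, produces the slowly decaying tail discussed above.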

VII. CONCLUSIONS

This paper was driven by two main objectives in the analysis of urban public transport networks. First, we wanted to present a comprehensive survey of statistical properties of PTNs based on the data for cities of so far unexplored network size (see Table I). Based on this survey, the second objective was to present a model that, albeit simple, is capable of reproducing a majority of these properties.

Especially helpful in our analysis was the use of different network representations (different spaces, introduced in section II). Whereas former PTN studies used some of these representations, here within a comprehensive approach we calculate PTN characteristics as they show up in L-, P-, C-, and B-spaces. It is the comparative analysis of empirical data in different spaces that enabled, in particular, the adequate PTN modeling presented in section VI.

The networks under consideration appear to be strongly correlated small-world structures with high values of clustering coefficients (especially in L- and less in C-spaces) and comparatively low mean shortest path values, as listed in Table II. Standard network characteristics listed there correspond to the features a passenger is interested in when using public traffic in a given city. To give several examples, any two stops in Paris are on average separated by $\langle \ell_L \rangle -$ […] $\ell^{\max}_L -$ […] $\langle \ell_P \rangle -$ […] L and for some in P-space (see Table IV). Currently, we find no explanation why some of the networks of our survey are governed by power-law node degree distributions whereas others follow an exponential decay. In the analysis of urban street networks a classification has been found [39, 59] that allows one to discriminate between properties of different classes of city organization. Let us note, however, that as a rule the latter analysis is performed for restricted regions of street networks, i.e. either the historical or the suburban part.
In the case of a PTN, however, one usually deals with a structure that spreads over the whole city, covering both the inner and outer regions.

Besides looking at traditional network characteristics (as described in sections III - V) we addressed here a specific feature which is unique to PTNs and networks with similar construction principles. Namely, we analyzed statistical distributions of public transport routes that go in parallel for a sequence of stations. As we have shown, such distributions (we call them harness distributions) are well defined for the networks under consideration and may also be used for a quantitative description of similar networks embedded in 2D or 3D space, such as cables, pipes, neurons, or (blood) vessels.

The common statistical features of the networks considered emerge due to their common functional purposes and construction principles, also reflected in the underlying bipartite structure [57]. It is this structure that explains parts of the correlations present in PTNs [20]. The network growth model we present in section VI captures this structure, describing network evolution in terms of adding public transport routes, each of them being a complete graph in P-space. Our choice to use a self-avoiding walk (SAW) as a route model in lattice simulations was motivated by geographical observations and other reasons, as argued in section VI. In support of the scaling argument given there, one may note that the fractal dimension of a SAW on a lattice does not change if a weak uncorrelated disorder is present, i.e. when some lattice sites cannot be visited [60]. In turn, this tells us that the model is robust with respect to weak disturbances of the underlying lattice structure.
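
As an illustration of the route model mentioned above, the sketch below grows a single self-avoiding walk on a square lattice, choosing at each step uniformly among the unvisited neighbouring sites and restarting if the walk traps itself. It rests on several assumptions: the names `self_avoiding_walk`, `size`, `n_stations`, and `max_restarts` are hypothetical, we read $S$ as the number of stations per route, and the sketch deliberately omits the dependence on the parameters $a$ and $b$ of Eq. (29). This simple growth rule also does not sample all walks of a given length with equal probability.

```python
import random


def self_avoiding_walk(size, n_stations, rng, max_restarts=1000):
    """Grow a walk of n_stations distinct sites on a size x size square lattice."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(max_restarts):
        start = (rng.randrange(size), rng.randrange(size))
        walk = [start]
        visited = {start}
        while len(walk) < n_stations:
            x, y = walk[-1]
            # Neighbouring sites that lie on the lattice and are still unvisited.
            free = [
                (x + dx, y + dy)
                for dx, dy in moves
                if 0 <= x + dx < size
                and 0 <= y + dy < size
                and (x + dx, y + dy) not in visited
            ]
            if not free:
                break  # trapped: discard this attempt and start again
            step = rng.choice(free)
            walk.append(step)
            visited.add(step)
        if len(walk) == n_stations:
            return walk
    raise RuntimeError("no self-avoiding walk found; increase max_restarts")


rng = random.Random(0)  # fixed seed so the example is reproducible
route = self_avoiding_walk(size=30, n_stations=50, rng=rng)
print(len(route), len(set(route)))
```

Excluding a few lattice sites from `free` would be one way to experiment with the weak disorder mentioned above, although the sketch does not attempt to measure the fractal dimension.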
Further analysis of simulated PTNs performed in section VI established strong similarities in the statistical characteristics of simulated and real-world networks.

Obviously, the two objectives in the PTN study we have so far achieved in this paper (the empirical analysis and the modeling) naturally call for an analytic approach. In particular, such an approach may be used in parallel with numerical simulations to derive statistical properties of the model proposed in section VI. This will be a task for forthcoming studies. Another natural continuation of this work will be to analyze different possibly dynamic phenomena that may occur on and with PTNs. A particular task will be to study the robustness of PTNs to targeted attacks and random failures [29].

Yu.H. acknowledges support of the Austrian FWF project 19583-PHY.

[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. , 47 (2002).
[2] S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. , 1079 (2002).
[3] M. E. J. Newman, SIAM Review , 167 (2003).
[4] S. N. Dorogovtsev and S. N. Mendes, Evolution of Networks (Oxford University Press, Oxford, 2003).
[5] Yu. Holovatch, O. Olemskoi, C. von Ferber, T. Holovatch, O. Mryglod, I. Olemskoi, and V. Palchykov, J. Phys. Stud. , 247 (2006).
[6] L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley, Proc. Natl. Acad. Sci. USA , 11149 (2000).
[7] R. Guimera and L. A. N. Amaral, Eur. Phys. J. B , 381 (2004).
[8] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, Proc. Nat. Acad. Sci. USA , 7794 (2005).
[9] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, Proc. Nat. Acad. Sci. USA , 3747 (2004).
[10] L.-P. Chi, R. Wang, H. Su, X.-P. Xu, J.-S. Zhao, W. Li, and X. Cai, Chin. Phys. Lett. , 1393 (2003).
[11] Y. He, X. Zhu, and D.-R. He, Int. J. Mod. Phys. B , 2595 (2004).
[12] W. Li and X. Cai, Phys. Rev. E , 046106 (2004).
[13] W. Li, Q. A. Wang, L. Nivanen, and A. Le Méhauté, Physica A , 262 (2006).
[14] P. Sen, S. Dasgupta, A. Chatterjee, P. A. Sreeram, G. Mukherjee, and S. S. Manna, Phys. Rev. E , 036106 (2003).
[15] P. Crucitti, V. Latora, and M. Marchiori, Physica A , 92 (2004).
[16] R. Albert, I. Albert, and G. L. Nakarado, Phys. Rev. E , 025103 (2004).
[17] M. Marchiori and V. Latora, Physica A , 539 (2000).
[18] V. Latora and M. Marchiori, Phys. Rev. Lett. , 198701 (2001).
[19] V. Latora and M. Marchiori, Physica A , 109 (2002).
[20] K. A. Seaton and L. M. Hackett, Physica A , 635 (2004).
[21] C. von Ferber, Yu. Holovatch, and V. Palchykov, Condens. Matter Phys. , 225 (2005), e-print cond-mat/0501296.
[22] J. Sienkiewicz and J. A. Holyst, Phys. Rev. E , 046127 (2005), e-print physics/0506074; J. Sienkiewicz and J. A. Holyst, Acta Phys. Polonica B , 1771 (2005).
[23] P. Angeloudis and D. Fisk, Physica A , 553 (2006).
[24] P.-P. Zhang, K. Chen, Y. He, T. Zhou, B.-B. Su, Y. Jin, H. Chang, Y.-P. Zhou, L.-C. Sun, B.-H. Wang, and D.-R. He, Physica A , 599 (2006).
[25] C. von Ferber, T. Holovatch, Yu. Holovatch, and V. Palchykov, Physica A , 585 (2007).
[26] H. Chang, B.-B. Su, Y.-P. Zhou, and D.-R. He, Physica A , 687 (2007).
[27] X. Xu, J. Hu, F. Liu, and L. Liu, Physica A , 441 (2007).
[28] C. von Ferber, T. Holovatch, Yu. Holovatch, and V. Palchykov, arXiv:0709.3203. In Traffic and Granular Flow ’07. Springer (2008) (to appear).
[29] C. von Ferber, T. Holovatch, and Yu. Holovatch, arXiv:0709.3206. In Traffic and Granular Flow ’07 […]
[…] , 425 (1955).
[33] D. de S. Price, J. Amer. Soc. Inform. Sci. , 292 (1976).
[34] A.-L. Barabási and R. Albert, Science , 509 (1999); A.-L. Barabási, R. Albert, and H. Jeong, Physica A , 173 (1999).
[35] R. Ferrer i Cancho and R. V. Solé, e-print cond-mat/0111222; S. Valverde, R. Ferrer i Cancho, and R. V. Solé, Europhys. Lett. , 512 (2002); R. Ferrer i Cancho and R. V. Solé, in Statistical mechanics of Complex Networks, edited by R. Pastor-Satorras, M. Rubi, and A. Diaz-Guilera (Lecture Notes in Physics Vol 625, Springer, Berlin, 2003), p. 114.
[36] M. T. Gastner and M. E. J. Newman, Eur. Phys. J. B , 247 (2006).
[37] N. Mathias and V. Gopal, Phys. Rev. E , 021117 (2001).
[38] R. Ferrer i Cancho and R. V. Solé, Proc. Natl. Acad. Sci. USA , 788 (2003); R. Ferrer i Cancho, Physica A , 275 (2005).
[39] A. Cardillo, S. Scellato, V. Latora, and S. Porta, Phys. Rev. E , 066107 (2006).
[40] P. Erdös and A. Rényi, Publ. Math. (Debrecen) , 290 (1959); Publ. Math. Inst. Hung. Acad. Sci. , 17 (1960); Bull. Inst. Int. Stat. , 343 (1961).
[41] B. Bollobás, Random Graphs (Academic Press, London, 1985).
[42] J. A. Holyst, J. Sienkiewicz, A. Fronczak, P. Fronczak, and K. Suchecki, Phys. Rev. E , 026108 (2005).
[43] A. Fronczak, P. Fronczak, and J. A. Holyst, Phys. Rev. E , 046126 (2003).
[44] U. Brandes, J. Math. Sociology , 163 (2001).
[45] G. Sabidussi, Psychometrika , 581 (1966).
[46] P. Hage and F. Harary, Social Networks , 57 (1995).
[47] A. Shimbel, Bull. Math. Biophys. , 501 (1953).
[48] L. C. Freeman, Sociometry , 35 (1977).
[49] K.-I. Goh, B. Kahng, and D. Kim, Phys. Rev. Lett. , 278701 (2001).
[50] M. E. J. Newman, Phys. Rev. Lett. , 208701 (2002).
[51] M. E. J. Newman, Phys. Rev. E , 026126 (2003).
[52] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E , 026118 (2001).
[53] Z. Liu, Y.-C. Lai, N. Ye, and P. Dasgupta, Phys. Lett. A , 337 (2002).
[54] M. E. J. Newman, Phys. Rev. E , 016131 (2001).
[55] X. Li and G. Chen, Physica A , 274 (2003).
[56] J. J. Ramasco, S. N. Dorogovtsev, and R. Pastor-Satorras, Phys. Rev. E , 036106 (2004).
[57] J.-L. Guillaume and M. Latapy, Physica A , 795 (2006).
[58] B. Nienhuis, Phys. Rev. Lett. , 1062 (1982).
[59] D. Volchenkov and Ph. Blanchard, Phys. Rev. E , 026104 (2007); D. Volchenkov, Condens. Matter Phys. (2008), to appear.
[60] A. B. Harris, Z. Phys. B , 347 (1983); Y. Kim, J. Phys. C , 1345 (1983); V. Blavats’ka, C. von Ferber, and Yu. Holovatch, Phys. Rev. E , 041102 (2001); C. von Ferber, V. Blavats’ka, R. Folk, and Yu. Holovatch, Phys. Rev. E 70