[PDF] Complex delay dynamics on railway networks: from universal laws to realistic modelling

Abstract

Railways are a key infrastructure for any modern country. The reliability and resilience of this peculiar transportation system may be challenged by different shocks such as disruptions, strikes and adverse weather conditions. These events compromise the correct functioning of the system and trigger the spreading of delays into the railway network on a daily basis. Despite their importance, a general theoretical understanding of the underlying causes of these disruptions is still lacking. In this work, we analyse the Italian and German railway networks by leveraging on the train schedules and actual delay data retrieved during the year 2015. We use {these} data to infer simple statistical laws ruling the emergence of localized delays in different areas of the network and we model the spreading of these delays throughout the network by exploiting a framework inspired by epidemic spreading models. Our model offers a fast and easy tool for the preliminary assessment of the {effectiveness of} traffic handling policies, and of the railway {network} criticalities.

Full PDF

EEPJ Data Science manuscript No. (will be inserted by the editor)

Complex delay dynamics on railway networks from universal laws to realistic modelling

Bernardo Monechi · Pietro Gravino · Riccardo Di Clemente · Vito D. P. Servedio

Received: date / Accepted: date

Abstract

Railways are a key infrastructure for any modern country. The relia-bility and resilience of this peculiar transportation system may be challenged bydiﬀerent shocks such as disruptions, strikes and adverse weather conditions. Theseevents compromise the correct functioning of the system and trigger the spreading of delays into the railway network on a daily basis. Despite their importance, a general theoretical understanding of the underlying causes of these disruptions isstill lacking. In this work, we analyse the Italian and German railway networksby leveraging on the train schedules and actual delay data retrieved during theyear 2015. We use these data to infer simple statistical laws ruling the emergenceof localized delays in diﬀerent areas of the network and we model the spreadingof these delays throughout the network by exploiting a framework inspired byepidemic spreading models. Our model oﬀers a fast and easy tool for the prelimi-nary assessment of the eﬀectiveness of traﬃc handling policies, and of the railwaynetwork criticalities.

Keywords

Complex Systems · Networks · Delay Dynamics · Modelling · Spreading

B. MonechiInstitute for Scientiﬁc Interchange Foundation, Via Alassio 11/c, 10126, Turin, Italyand SONY Computer Science Lab, 6 Rue Amyot, Paris, France, 75005, France E-mail:[email protected]. Gravino”Sapienza” University of Rome, Piazzale Aldo Moro 5, 00185, Romaand SONY Computer Science Lab, 6 Rue Amyot, Paris, 75005, France E-mail:[email protected]. Di ClementeUniversity College London, The Bartlett Centre for Advanced Spatial Analysis, London, WC1E6BT, United KingdomMassachusetts Institute of Technology, 77 Massachusetts Ave, MA 02139 - Cambridge, USAE-mail: [email protected]. D. P. ServedioComplexity Science Hub Vienna, Josefst¨adter Str. 39, 1080 Vienna, AustriaE-mail: [email protected] a r X i v : . [ phy s i c s . s o c - ph ] J un Bernardo Monechi et al.

Transportation networks are a critical infrastructure with an enormous impacton local, national, and international economies [1,2]. At the micro-level, the com-muting time impacts the economy and the shape of cities [3], directly inﬂuencingour life style and choices [4]. At the macro-level the travel time regulates tradeand stimulates economic activities [5]. Railroads, when correctly developed, fosterinterregional trades within the country, and increase income levels within the cityboundaries [6]. Whether we are traveling in crowded coaches of the commutersrail during rush hours or in the long-medium high speed train trips, the relia-bility of the transportation system is a key factor in determining travel behavior[7]. The travel time unreliability in rail service can have substantial consequencesfor its users [8] and the growth of cities [9]. Delays are one of the causes thatcorrupt the travel time reliability of the transportation systems. To address de-lay propagation, traditional approaches apply either stochastic models [10] or usepropagation algorithms on a timed event graph representation of a scheduled rail-way system [11]. Complex equations describing delay were obtained and solved bymeans of iterative reﬁnement algorithms to predict positive delays in urban trains[12], while traﬃc control models were proposed to manage the safety of timetablesafter perturbations occur [13].The development of models addressing the dynamics of trains over railway net-works are usually focused either on delimited aspects, e.g., single lane dynamics[14] or the issuing of schedule [15,16], or are comprehensively including a widevariety of diﬀerent microscopic ingredients that can be hardly validated by realdata [17,18]. In the last years a new perspective bloomed to provide a diﬀerentunderstanding of delay dynamics through the lens of Complex Systems. Railwaynetworks have largely been studied and exhibit small-world scaling properties [19],their topology have geo-spatial restriction [20] and structural redundancy [21]. Onthe other end, a railway delay dynamics framework within a complex systemsapproach is still lacking. Our aim is to develop a model capable of reproducingthe actual delay structure and the emergence of congested areas, by exploitingsome essential ingredients without relying on the detailed knowledge of the micro-scopic mechanisms underlying delay generation. A similar approach has alreadybeen successfully applied in the context of the Air Transport System of USA andEurope [22,23], where interaction models were used to characterize and forecastthe spreading of delays among ﬂights. These modelling schemes are driven by thenecessity of simple and novel frameworks to study transportation systems, allow-ing fast scenario simulations for schedule testing, which could be easily appliedto resilience studies that are nowadays performed independently of the dynamicstaking place over the network [21].Following a similar approach, we develop a new data driven model of delay prop-agation among trains diﬀusing over a railway network. We start by inferring uni-versal laws governing the emergence of delays over the railway network as a con-sequence of the occurrence of adverse conditions that are not depending on the (possible) interactions between trains. Hence, inspired by models for epidemicspreading [24], we introduce the

Delay Propagation Model as a novel frameworkto asses and test how such emerging delays spawn and spread over the network.We name the ﬁrst kind of delays as “exogenous delays”, while the second onesstemming from the interaction between trains as “endogenous delays”. While the omplex delay dynamics on railway networks 3 occurrence of exogenous delays can be detected and analysed from our data, theendogenous delays, on the contrary, since the (supposed) direct interaction betweentrains cannot be inferred from the dataset we collected, will be modeled with aone parameter interaction. The eﬀects and consistency of such endogenous delaymodelling scheme will be checked a posteriori by means of numerical simulations.Our model is close to an SIS model of epidemic spreading [24], since trains caneither get infected by delay, recover from it and then get infected again. We showthat this model is capable of reproducing the empirical distribution of delays mea-sured in the data as well as the emergence of large congested areas. Moreover, weshow that the removal of the delay propagation mechanism prevents the modeledsystem from generating large disruptions, hence strongly suggesting the existenceof this kind of interaction in the real system. Finally, we propose an application ofour model in studying a scenario where the propagation of a localized delay leadsto the emergence of a vast non-functioning area in the Italian Railway Network.The paper is structured as follows: in the ﬁrst section we describe the dataset andsome empirical ﬁndings that inspired the assumptions used in the development ofthe model; in the second section we discuss the model in detail and will show itscapability of reproducing empirical ﬁndings such as the distribution of delays andthe size of congested areas; we will also demonstrate the usefulness of the modelas a scenario simulation tool; in the last section we will summarize our ﬁndingsand discuss possible future developments. . .

46 km per km of tracks, respectively [27]). Moreover, these two countriesshare also a crucial characteristic: in both railway systems traﬃc is handled mainlyby a single national company. In other countries with similar networks, like in theUnited Kingdom, the railway network is managed by diﬀerent companies resultingin a more complex system where trains are additionally subject to the commercialpolicies of diﬀerent operators. Fig. 1 shows a set of topological analyses of the two railway networks, together with a preliminary analysis of the traﬃc load and de-lays in the systems. Following existing literature, we refer to a railway network asthe network whose nodes represent stations that are connected by a link wheneverthere is a train connecting them with two consecutive stops in its schedule. Thenetwork is directed since it is possible that a connection between two stations is Bernardo Monechi et al. travelled just in one direction. In this paper we refer to the nodes in a railway net-work either as “nodes” or as “stations”. Moreover, the action performed by a traintravelling from station A to station B will be referred as “travelling over the link”connecting A and B. In both networks the distribution of the degree k is peakedon k = 4 and for larger degrees has exponential-like decrease [19,28] proportionalto e − k/k with k (cid:39) . ± . .

18 and 0 .

24 respectively. This indicates that while there is a slight preference forstations with the same degree to be connected each other, yet the various degreesare mostly mixing, i.e., smaller and larger stations are typically connected. The lo-cal clustering coeﬃcient (Fig. 1D), deﬁned as the fraction of pairs of neighbours ofa given station that are connected over all pairs of neighbours of that station [31],can be used to infer the redundancy of the network. When a disruption occurs,a station with an high clustering coeﬃcient can be bypassed easily. Complemen-tarily, betweenness centrality (Fig. 1E) is a measure of the centrality of a node ina network based on the number of shortest paths that pass through it [31]. Thismeasure highlights how strategic a given station is at a global level.The clustering and the betweenness outline the typical small word topologyof transportation networks [19]. German stations show a distribution between 10and 100 trains per day, narrower that the train distribution of the Italian stations,which instead display a broader distribution, suggesting a more heterogeneous han-dling of train traﬃc load (Fig. 1F). Finally, Fig. 1G shows the histogram of theaverage delays of trains aggregated by stations for the two nations. Both distribu-tions show a peak and heavy tail, whereas compared to the German distribution,the Italian distribution is clearly shifted towards higher delays. The topologicalsimilarities between the two networks suggest that the diﬀerences in delays andtraﬃc load might be the result of diﬀerences in the trains dynamics.2.2 Train interaction on railways networksTo focus on the analysis of delays, their outbreak and evolution, we choose trainsinstead of stations as our reference systems. The intermediate delays for a train i travelling from station A to station B on the link e is deﬁned as ∆ i t ( e ), Thedeparture delays at the initial station have been subtracted to analyse only thedelays that have been generated during the travel. Hence ∆ i t ( e ) can be negativewhenever the train is in advance, resulting in the train waiting at the station for thecorrect time of departure. Fig. 2A shows the delay distributions for both nationalsystems, considering the delays at intermediate stations along the path of a train,or just at the ﬁnal station. We observe similar shapes of these distributions, withthe Italian one exhibiting broader tails than the German. More than 10% of thetrain stops are on-time. The distributions of both countries exhibit an asymmetricpattern: the right tail (labeled Delay) shows a power-law like behaviour compatiblewith a q-exponential distribution [32], while the left tail (labelled Advance) has an exponential steeper slope. We expect this distribution to be the result of theinterplay between the occurrence of adverse conditions and the interaction betweentrains, inﬂuencing their dynamics. Despite the fact that the microscopic details inour data do not allow for a precise investigation of possible interactions betweentrains, we can highlight how the possibility of interaction might aﬀect the delays in omplex delay dynamics on railway networks 5 Fig. 1

Italian and German Railway Networks and Traﬃc. (A) A simple comparison ofdatasets and basic networks properties. (B,C) Railway Networks shown on a map. Nodesof the network represent stations, which are linked by a straight line whenever there is a traingoing from one to another without intermediate stops. (D) A scatter plot of the local clus-tering coeﬃcient against the degree of nodes. Each dot represents a station and is collocateddepending on its properties. Lines represent the averaged clustering as a function of the degreeof the nodes in the two networks. (E) Similar to (D), but with betweenness centrality in placeof clustering. (F) Comparison of the Italian and German distributions of the average numberof trains per day travelling through a station. (G) Comparison between the Italian and Ger-man distributions of the average delay at stations. All the analyses are made over the whole2015. The Italian distribution has been made transparent in order to see the overlap with theGerman one. railway networks. Hence, we study the relation between the ﬁrst order co-activity ofa link, i.e., the probability that at a certain time t (with time steps of 30 minutes)a link which is active has at least one active neighbouring link, to the average delayof the link itself. In Fig. 2B we show this quantity as a function of the average delayfor both the Italian and German cases. We notice a slight increase in average delayas the co-activity increases, conﬁrmed also by the Spearman’s coeﬃcients, 0 .

43 forItaly and 0 .

56 for Germany. Hence, the possibility of interaction between trains ina certain part of the railway network seems to increase the delay localized in thatarea. Note that we have deﬁned the ﬁrst order co-activity between neighbouringlinks, i.e. links that have at least one node in common. We can deﬁne a k -orderco-activity considering links connecting at least two nodes that are less than ( k −

1) links apart following the shortest-path connecting them. We report the samemeasurements for k = 2 and k = 3 showing that in general, the same relation withthe average delay is conﬁrmed even though the curves are shifted towards lowervalues. This indicates that considering the interaction between non-neighbouringlinks is relevant but might include less important contributions to the delay. Hence, in the following we will limit our analysis to the ﬁrst order neighbour links andassume that interactions between trains are possible only when they are in nearbylinks. Fig. 2B supports the thesis of a propagation eﬀect but an important featurehas yet to be determined. In fact the direction of the propagation still has to bedetermined. Bernardo Monechi et al.

B) C)A)

Fig. 2

Delays and train interactions. (A) Distributions of the delay at the ﬁnal and at theintermediate stations trains in both Italian and German datasets. Departure delays at theinitial station has been subtracted, hence these delays originated during the travel. As a guidefor the eye, a power-law has been added in the positive delay area and an exponential in theadvance area (negative delay). (B) Average delay of a link as a function of the link k -order co-activities, with Spearman’s correlation coeﬃcients showing the existence of a relation betweenthe co-activity and the average delay. The curves have been obtained by binning the dataon the x-axis at steps of 0 . C ( dt ) from equation (1) as a function of dt for the average delay time series ofpairs of nearby links in the railway networks in the “Forward” and “Backward” conﬁgurations(shown in Fig. 3), and in the “Connection” conﬁguration . i travelling between two stations A and B (i.e., on thelink e = AB , see Fig. 3) with some delay. We can argue that the propagation ofthis delay to other trains can occur only if they travel on the neighbour links of e . Due to the fact that the railway networks are directed, there are four diﬀerentconﬁgurations of the links with respect to e :(i) links entering A , i.e. trains moving towards the last station crossed by i ;(ii) links entering B , i.e. trains travelling towards the same station i is currentlytravelling to;(iii) links exiting from A , i.e. trains departed from the last station crossed by i ;(iv) links exiting from B , i.e. trains leaving the station i is currently travelling to.We can exclude the last two case: (iii) given that all the trains in such conﬁgurationwill have no interaction with i ; (iv) is less important because it describes scheduledconnections. In the latter case, schedules foresee extra-time between the two trainsexactly to avoid delay propagation. We checked whether the propagation occursin the case (i) of backward propagation , in the case (ii) of forward propagation , whosedeﬁnitions are depicted in the graphic Fig. 3. To discriminate which mechanism isat play, we measured the average delay time sequence ∆t ( e ) of each link e of thenetwork, deﬁned as the average delay of all the trains that are currently travellingon e . Successively we measured the cross-correlation functions of the average delaytime series of all the pairs of links, i.e., CC e,e (cid:48) ( dt ) = (cid:80) t ∆t e ( t ) ∆t e (cid:48) ( t + dt ) /σ e σ e (cid:48) being e and e (cid:48) generic neighbours links of the network, σ e and σ e (cid:48) the standard de-viation of the whole time series ∆t e ( t ) and ∆t e (cid:48) ( t ). Then we averaged, aggregatingthe pairs of neighbours links according to their conﬁguration (forward, backward,etc). In this way, for each of the four conﬁguration, we obtained an average cross-correlation function. In the backward propagation conﬁguration can be deﬁned omplex delay dynamics on railway networks 7 A B CD

BackwardPropagation ForwardPropagation

Fig. 3

Arrows represent main delay propagation mechanisms.

Backward propagation : the(green) train travelling from station A to station B has got a delay and propagates it tofollowing (blue) trains travelling towards station A;

Forward propagation : the train on ABpropagates its delay to (red) trains that are travelling from the station C toward B. : C B ( dt ) = (cid:104) CC e,e (cid:48) ( dt ) (cid:105) ( e,e (cid:48) ) ∈B (1)with B as the ensemble of links pairs. For both networks the Backward mechanismis dominating while the Forward can be neglected (Fig. 2D ). The high-speed layerof the railway network shows the similar backward mechanism, while there is nocross-correlations between the delays of high-speed vs. low-speed (see Supplemen-tary Material Fig. S1) acting as two independent layers.2.4 Exogenous generation of delayWe deﬁne two kinds of delays: endogenous and exogenous. By “endogenous” wemean that its origin is inside the railway system dynamics, i.e. it has been caused byanother train. Conversely, by “exogenous” we mean that its cause is of a diﬀerentnature: strikes, malfunctioning, bad weather or anything else which is not the resultof the interaction with another train. We measure directly this types of delay inour datasets. Let us consider a train i travelling from a station A to a station B onthe link e and further to a station C on the link e (cid:48) . It will travel ﬁrst on the link e and then on the link e (cid:48) . The delay are indicate respectively as ∆t i ( e ) and ∆t i ( e (cid:48) ). If there is a increase in the delay ∆t i ( e (cid:48) ) > ∆t i ( e ) it might be “endogenous” or“exogenous”. The exogenous delay is deﬁned as δt = ∆t i ( e (cid:48) ) − ∆t i ( e ), the variationof the train delay traveling on links whose neighbouring links were empty or hostedtrains perfectly on time. It is worth noticing that δt might also be negative, forexample, if the train managed to make up for lateness. Results are reported inFig. 4, showing the distribution of positive exogenous delays as well as negativeones for the Italian and German cases.In order to model these distributions, we adopted the same approach used in[32] for departure delays. We ﬁtted both the positive and negative parts of thedistributions with q -exponential functions, where the parameter q modulates from an exponential distribution q → q ∈ (1 , e q,b ( δt ) ∝ (1 + b ( q − δt ) / (1 − q ) with q ∈ [1 , , b > . (2)It has been shown that such distribution can be obtained starting from a poisso-nian process p ( δt | α ) = αe − αt , where α is a random variable extracted from of n Bernardo Monechi et al.

A) B)

Fig. 4

Exogenous delays: positive exogenous delays (continuous lines) and negative delays(dashed lines) for the Italian (A) and German (B) railway networks. The black dashed linesare obtained by ﬁtting the q -exponential distribution of Eq. (4) to the data. The best ﬁttingparameters are shown in the legend. independent gaussian random variables X i with (cid:104) X i (cid:105) = 0 and (cid:104) X i (cid:105) (cid:54) = 0, so that α = (cid:80) ni =1 X i [32]. In this way it can be proven that n = 2 / ( q − −

2, i.e. theparameter q , is related to the number of random variables composing α . The pa-rameter b is proportional to the average value of α , so that large values of b at ﬁxed q result in a distribution biased toward shorter delays. The parameter q quantiﬁeshow much equation (4) deviates from being exponential, which is the case q = 1.This model has already been applied to the departure delays in the UK railwaysystem, showing that the value of q where so that 4 ≤ n ≤

11 and thus estimatingthe number of independent occurrences contributing to the delay. For the positiveexogenous delays in the Italian and German case respectively, we found q = 1 . q = 1 .

32, corresponding to n (cid:39) n (cid:39)

4. The negative part of the dis-tribution is exponential-like for the Italian railway network and broader for theGerman, this outlines the delay recovery strategies in the second case. To charac-terize the eﬀect of the spatial distance on the delay distribution, we subdividedthe links e of the railway networks in classes according to the geodesic distance d ( e ). Fig. 5 shows the behavior of the q and b parameters of the q -exponential ﬁtas functions of d ( e ). The parameter q remains constant in every case, while on theother hand the parameter b decreases as b ∼ d − a . Fig. F-I of Appendix show thediﬀerent best ﬁt for equation (4) as d ( e ) varies. This result suggest that while thecauses of the delay remain the same, the distribution of disturbances gets closerto a power-law as the length of the links increases, this outlines a relation betweenlink length and delay. Finally, we can assume that the occurrence of positive ornegative exogenous delays on links, P ( δt > | d ( e )) and P ( δt < | d ( e )), are notroughly constant with d ( e ) and hence are not depending on the length of the links(Fig. J of Appendix). omplex delay dynamics on railway networks 9 A)C) B)D)

Fig. 5

Best ﬁts of the q -distribution parameters for the exogenous positive and negative delaydistributions as function of the length of the links d . Panels A and B show the parameters forthe positive and negative exogenous delays in the Italian railway network, while panels C andD show the same results for the German case. The function expressing b was sought of theform b = b ( d/d ) − a , while the parameter q of the constant form q = c . physical train (e.g., the same convoy travelling back and forth along the same pathon the railway network – this is denoted as “rotation” –), the delay at departuremight suﬀer from the inﬂuence of the traﬃc. However, railway administrators en-visage suitable time buﬀers at the endpoints of the paths of each train so that itis reasonable to assume, at least as crude approximation, that departure delay isexogenous in character. It has already been shown that this kind of delay can bedescribed by a q -exponential distribution in [32]. However, the dependence on theparameters of the obtained distributions with respect to the topological proper-ties of the network has not been investigated yet. Following the same spirit of theprevious paragraph for the exogenous delay on links, we divide the nodes in thenetwork (the train stations), with respect to their out-degree. The out-degree k out represents roughly the number of diﬀerent railway lines starting from a certainstations and hence can be considered as a proxy for the complexity of the stationitself. Once the nodes of the networks have been divided according to k out , weﬁtted these distributions using a q -exponential following the procedure deﬁned in[32] (see Fig. L, Fig. M and Fig. N of Appendix). Fig. 6 shows the behaviour of the parameters q and b of the q -exponential distri-bution as functions of k out for the positive and negative departure delays in thetwo considered railway networks. Negative departure delays were never reportedin the German dataset and hence we assume they are not present. Despite thefact that better proxies for station complexity than k out might exist (weightingeach link with the actual number of railway lines on it is a valuable alternative A) B) C)

Fig. 6

Best ﬁts of the q -distribution parameters of the departure delay distribution as afunction of the out-degree of the nodes k out . Panels A and B show the parameters for thepositive and negative departure delays in the Italian railway network, while panel C the positivedeparture delay distribution for the German case. No negative departure delays were reportedin the German dataset. The function expressing b was sought of the form b = b e − k out /k ,while the parameter q of the constant form q = c . example), it is possible to see that we have again a constant parameter q indicat-ing that the sources of delays can be assumed to be the same independently fromthe station, while on the other hand the parameter b decreases exponentially with k out . The small value of R in the case of Italy might reﬂect the above mentionedpossibility of having a non negligible endogenous contribution to the departuredelays because of train rotations.In Germany the departure delay can be considered constant and independent onthe size of the station. In Italy the P ( δt > | k out ) rows linearly with k out meaningthat stations with high degree are generating larger disruptions in the network. The analyses proposed so far suggest the existence of two sources of delays: ex-ogenous delays spontaneously occurring due to external adverse conditions at de-parture and during travel, depending on the topological properties of the RailwayNetwork; endogenous delays resulting from the interaction between trains. Whilein the ﬁrst case we were able to characterize the statistical laws governing theemergence of delays, we cannot directly investigate the mechanisms of interactionbetween trains. Following the ideas exploited to model real world epidemics [34,35] which have already been proven eﬀective on air traﬃc delay modeling [22], wewill deﬁne a propagation process for delays i.e. we will suppose that trains canspread their delays from one another according to some ﬁxed probability and incertain conditions. Such probability will be derived by comparing the results fromthe simulations with the model and the empirical data. omplex delay dynamics on railway networks 11 window” to travel over a certain link along its path. Each train departure is at aﬁxed time and at a certain station and each intermediate station is going to bevisited at a given time. However, our interest is in the deviance from the expectedschedule. If some disruptions occurs a train might go out of the window of use fora certain link and overlap the window of another train for the same link. In thiscase, the second train will be forced to wait for its path to be cleared. Hence, themodel adopts three diﬀerent sources of delay as depicted in Fig. 7:

Departure delay

This delay is assigned at the beginning of the path (originat-ing station) of each train and is considered exogenous and unrelated with thecurrent traﬃc conditions at the departing stations or in the nearby links. Weassign to a train either a positive or negative delay according to the empir-ically found law of the corresponding probabilities p +dep = P ( δt dep > | k out )and p − dep = P ( δt dep < | k out ) respectively (and no delay with complementaryprobability 1 − p +dep − p − dep ) (see Fig. O of Appendix). Once the sign of thedelay has been decided, we assign a positive or negative delay value so that | δt dep | ∼ e q ( k out ) ,b ( k out ) ( δt ), i.e. distributed according to a q -exponential distri-bution with the parameters q and b depending on k out and on the sign of thedelay itself as in Fig. 6. Exogenous Link Delay

This delay is assigned whenever a train starts travellingon a link for the ﬁrst time (Fig. 7 reports an exempliﬁcation of the model). Con-sidering a train i passing from the link AB to the link BC , we adopt the samemodelling scheme as in the departure delay case assigning to the train a positivedelay with probability p +exo = P ( δt exo > | d ( BC )), a negative one with proba-bility p − exo = P ( δt exo < | d ( BC )) and no delay with 1 − p +exo − p − exo , where d ( BC )is the geodesic length of the links BC (see Fig. J of Appendix for the corre-sponding law of probability). Having decided the sign of the delay, we ﬁx the pa-rameter of a q -exponential distribution q and b according to d ( BC ) (Fig. 5) andwe extract the magnitude of the delay so that | δt exo | ∼ e q ( d ( BC )) ,b ( d ( BC )) ( δt ). Delay Propagation

We model the interaction between trains with a mechanismof propagation of delay from other trains to train i . The investigations per-formed on the datasets suggest that such interaction can occur only whennearby trains are in the Backward propagation conﬁguration of Fig. 3.Following the picture of the model in Fig. 7, when the train i starts travellingon the link BC leading to the C station is susceptible of propagation from thetrain j , which is travelling from C towards D . Inspired by the SIS models ofepidemic spreading [24], in which agents can get infected and recover from theillness with a ﬁxed probability, we introduce the delay propagation parameter β ∈ [0 ,

1] describing the probability that the delay of the train j propagates to i . We assume that the propagation occurs at the time i starts travelling onto BC . When train i travels on BC from time t to time t , we check whether de-layed trains are travelling on links in the Backward propagation conﬁguration with respect to BC in [ t , t ]. Thus, we randomly pick one of those trains, say j , and with probability β its delay δt j is added to the delay of i .The model simulates the theoretic schedule applying all these delay-generatingmechanisms. More in detail, we start the dynamics of each train i by adding adeparture delay δt dep according to the Departure delay mechanism. After the

A B C D j i i t i + t exo + t’ j P( t exo | d(BC)) P( t’ j ) = t i P( t dep. | k(A)) { for t’ j = t j for t’ j = 0 Fig. 7

A schematic explanation of the core mechanism of the Delay Propagation Model.In this picture, a train i departs from the station A and acquires a departure delay from adistribution depending on the out-degree of A in the railway network. When the train i passesthrough station B , its delay δt i gets two contributes: i) a stochastic exogenous delay δt exo depending on the length of the link d the train i is going to travel; ii) a propagated delay δt j depending on the presence of a delayed train j travelling on a nearby link. If there is such atrain j with a positive delay δt j >

0, this delay is propagated to train i with probability β ,which is the only free parameter of the model. departure, each time the train starts traveling over a link in its route, we updateits delay according to the following rule: δt i → δt i + δt exo + δt j (3) where δt exo is sample following Exogenous Link Delay mechanism, so that thedelay will be positive with probability p +exo and negative with p − exo and the param-eters of the sampling distributions are depending on the length of the link as inFig. 5. The term δt j the contribution of the Delay Propagation which will diﬀerentfrom 0 with probability β and just if there is at least one delayed train in the rightconﬁguration. This mechanism for intermediate stations reproduces the generalnoise associated with external causes, but does not account for long correlationssuch those present in case of large scale adverse conditions, e.g., bad weather, ornational strikes. Finally, when the train reaches its ﬁnal destination it is simplyremoved from the simulation. We used the described model to reproduce the delayspreading dynamics starting from a theoretic timetable and local exogenous delaysdistribution. Results of the simulation for both the Italian and German nationalrailways systems are reported in the next section. The model is based on the exogenous sources of delay that represent the spon-taneous emergence of disruptions due to the aggregation of a ﬁnite number ofexternal causes (trains malfunctions, accidents, bad weather, etc.). These sources are modelled according to a universal probabilistic law, whose parameters are in-ferred from the data at our disposal. The propagation parameter β modulates themutual interaction between trains. For simplicity, we chose this parameter to beuniform all over the network, which is a strong assumption since the propagationmight depend on the considered part of the railway system. omplex delay dynamics on railway networks 13 β bytrying to reproduce the average delay of each station. These values have been esti-mated as β = 0 .

15 for Italy and β = 0 . β the system is capable of reproducing the em-pirical distributions of arrival delays. However, in order to disentangle the eﬀect ofthe β parameter and the two other sources of exogenous delay, we also report thesame result for the case in which β = 0 and β is larger than the optimal value. Inthe latter, we clearly see that there is an increase in the number of extremely largedelays. On the other hand, turning oﬀ the interaction by setting β = 0 completelyremoves delays larger than 2 hours. Hence according to our model, the presence ofexogenous delays by themselves would not be able to predict the presence of largedisruptions. Since β represents the probability of delay propagation between twotrains, these estimations give a quantitative account of the diﬀerence between thetwo countries. These results suggest that the Italian trains propagate each otherdelay more often than German trains. The microscopic reasons of this diﬀerencecould be connected to diﬀerent properties of the railway networks, to the peculiargeographical structure of the territory, to diﬀerent delay handling policies, etc.,and it is out of the scope of the present paper. As it happens in other transporta-tion systems, we expect that disruptions occur clustered in certain areas of thenetwork [22]. For this purpose we provide a deﬁnition of “cluster of congestedstations” and how to discriminate whether a station is “congested”, i.e. when itsfunctioning is ineﬃcient because of the hoarding of delays. For each station we candeﬁne a threshold between the “functioning” and the “congested” status based onthe value of incoming train delays. We calculated such threshold for each stationas the average value of the delays of all the trains arriving at the considered sta-tion in the dataset. Considering a period of two months (March and April 2015),we examined each station every 5 minutes during the day, checking whether theaverage delay of all the trains moving towards that station during that lapse oftime was above the average. We consider the station as “congested”, while if theaverage delay is below that threshold we consider it as “functioning”.We identify, at each time step, the “clusters of congested stations” as the connectedcomponents of the railway network left once we removed all the functioning sta-tions from the network. (i.e. the stations whose congestion might be correlatedwith one another because of the propagation of delay).In order to check whether our model is able to reproduce the emergence of theseareas, we focused on two measures: (i) the size of the clusters in terms of numberof stations and (ii) the relation between the size and the diameter of the clus-ters. This latter measure is capable of giving insights about the topology of thecongested clusters. Bottom panels of Fig. 8A show real and simulated cumulativedistributions of cluster sizes for both countries. We ﬁnd a strong accordance be-tween model and reality when β is set to the optimal value, pointing out how themodel is also able to reproduce this aspect of delays dynamics. Similarly to the arrival delay distribution, here β aﬀects the tail of the distribution so that β = 0(i.e. no propagation of endogenous delays) implies the absence of large congestedareas. We study the diameter of the cluster deﬁned as the distance between thefarthest couple of nodes in the cluster. We compute the diameter both in the casewhere just the topological distance between nodes is considered and the case where Fig. 8 (A) Comparison between the empirical delay distribution (left) and the empirical clus-ter size distribution (right), with the results coming from the simulations for the Italian (Red)and German (Blue) Railway Networks. Diﬀerent curves correspond to diﬀerent parameters of β . The optimal β value is 0 .

15 for Italy and 0 . each link is weighted with the geodesic distance between the stations it connects.To panels of Fig. 8B shows the dependence of the cluster diameter computed withboth distances on the cluster size. In the cases where the diameter is computed byusing the geodesic distance, the dotted line represents the diameter of a cluster ofsize n , assuming that all the links of the clusters are as long as the average linklength of the network. Such pattern corresponds to the case where all the clustersare randomly sampled from the whole network without any constraint. From thelower panels of Fig. 8B we can see that while clusters do have a path-like struc-ture, they cannot be considered randomly distributed over the networks. Instead,they seem to be deployed in areas where the links are larger than the averagelinks length of the corresponding network, resulting in the same dependence ofthe dotted lines but shifted upwards. The fact that the geographical diameter oflarge clusters grows up to hundreds of kilometers indicates that disturbances canpropagate between far away parts of the network.The topological measure in the top panels of Fig. 8B exhibits strong deviationsfrom the path-like behaviour especially when large clusters are involved. The mis-match between the two ways of characterizing clusters could be due to the factthat the geographical distance may hide, by stretching them, non path-like struc-tures that are actually present in the delay propagation patterns. The appearance of these clusters, and their features, are typical outcomes of a non-trivial complexdynamics arising from train interactions. For this reason it is a remarkable resultthat, as can be neatly be seen in Fig. 8, the model proposed, despite its simplicity,captures this behaviour correctly. The relations between cluster size and clusterdiameter is show very good agreements with the empirical measures for both na- omplex delay dynamics on railway networks 15 tions and, perhaps more importantly, for both clusters diameter deﬁnition: thetopological and the spatial one. This clear adherence with reality of the resultsfrom a model with only one parameter trained on a so small training set (only oneweek) is a strong proof of the fact that the model hypothesis are very likely tobe correct. Our model is capable of reproducing some global patterns of the delaydynamics in railway systems. However, some discrepancies at a more microscopiclevel can be observed. In particular our model fails to correctly describe the be-haviour of stations with low traﬃc. For these stations, the hypothesis of a constantin time and homogeneous coupling parameter β may not be well justiﬁed. We referto Section 5 of Appendix for a thorough discussion of this point.4.2 Scenario Simulation: Prediction of the Eﬀects of a Strong LocalizedDisturbanceAccording to our model, the emergence of large clusters of congested stations isdue to the propagation of delay between trains. The source of this delay is how-ever exogenous, in the sense that comes from some adverse conditions which areexternal with respect to the interaction of the trains. In order to investigate howthe emergence of a real large cluster is linked to exogenous eﬀects, we study thecase of the cluster emerged in the eastern part of the Italian Railway Network onthe 28th of February 2015. As can be seen in the on-line visualization , a largecongestion starts to emerge in the early afternoon around 12am and propagatesto a large part of the network until night. We empirically identiﬁed the inter-ested part of the network as the shortest-path connecting the station of “NAPOLICENTRALE” to the station of “ROMA TERMINI”, indicated in Fig. 9A. Theshortest-path has been computed weighting each link with the geographical dis-tance between the stations it connects. Fig. 9B shows the fraction of stations inthis path that are congested in diﬀerent times of the day, clearly indicating thatsuch fraction starts growing until a peak is reached in the afternoon. We have beenable to identify the beginning of this adverse occurrence as a disruption on the“NAPOLI CENTRALE-AVERSA” link (highlighted in Fig. 9A) in the ﬁrst partof the afternoon from 11 am to 14 pm, resulting in a large delay acquired by thetrains travelling on that links. We argued that this disruption was the spark thatlightened the emergence of the congested cluster.In order to check this hypothesis, we ran a simulation in which a large delay of 100minutes is assigned with probability 1 to each of the trains crossing the “NAPOLICENTRALE-AVERSA” link in that period of time. Due to the non-deterministicnature of the model, we performed the simulation of 200 diﬀerent realizations inorder to have a set of scenarios to be compared with the real data. This compar-ison is shown in Fig. 9B. We can see that with this simple modiﬁcation and notconsidering other possible correlations between the occurrences of external adverseconditions in nearby links, we are able to reproduce a pattern which is qualitatively similar to the one observed in the data, clearly pointing out that the “NAPOLICENTRALE-AVERSA” link played a major role in the congestion of the network.Moreover, our scenarios indicates the probability of having a minor congestion inthe line also in the early morning, probably due to usual minor disruption occur- The visualization can be found at ring on other links. The proposed scenario simulation could be easily extended toless localized adverse occurrences, which might comprehend spatially correlateddisruptions due to natural events or strikes. Moreover, the same approach couldbe used to study more dramatic eﬀect of node, link removal or the positive eﬀectsdue to the introduction of more resilient and optimized schedules. In this cases, itis suﬃcient to use the same structure of the model and modify the input scheduleand/or the Railway Network itself.

A B

Fig. 9 (A) Representation of the “NAPOLI CENTRALE - ROMA TERMINI” route. Thehighlighted blue link is the “NAPOLI CENTRALE-AVERSA” one, i.e. the spark of the pertur-bation. (B) Fraction of congested stations in the “NAPOLI CENTRALE - ROMA TERMINI”route at diﬀerent times of the day. The solid black line represents the empirical measure per-formed on the dataset, the red area represents the results of the simulations between the 5thand 95th percentile of all the realizations. The dashed red line is the average fraction of con-gested stations obtained by averaging over the realizations. The vertical line represents thebeginning of the perturbation on the “NAPOLI CENTRALE-AVERSA” link.

Railways and the railway transport systems have been historically of utmost im-portance for the development of modern countries. Nowadays their importance ison the rise again due to their relevance in the reduction of CO emissions and theircompetitiveness in short and middle range movements, so that huge investmenthave been made in the European Union to improve their eﬃciency. However, theemergence of large disruptions and the intrinsic ineﬃciencies seem to be endemic inthis kind of transportation. The understanding of the universal features governingthe dynamics of the system from a theoretical point of view could give an impor-tant contribution to the solution of such problems. The development of a universalmodel explaining the emergence of delays in Railway Network would allow for a quantiﬁcation of the causes behind the occurrence of large disruptions and for amore direct link between the eﬀects that localized interventions might have on theglobal system. In this work, we developed a novel model of delay emergence andpropagation between trains moving on Railway Networks, partly based on empiri-cal laws inferred from the data governing the accidental emergence of delays from omplex delay dynamics on railway networks 17 the inﬂuence of adverse occurrences. Remarkably, the statistical models used forthe description of this “exogenous delays” showed how these adverse occurrencesare the result of the same ﬁnite number of causes (e.g., bad weather conditions andmalfunctions) independently from the topology, as opposed to the scale of thesedisruptions, which are largely inﬂuenced by the part of the network the trains aretravelling on. We ﬁnd, in fact, that both the complexity of a station as measuredby the number of diﬀerent routes originating from it and the length of a route, areconnected to the magnitude of the microscopic delays composing the overall delayby simple statistical laws with two parameters, whose value is the same all over thenetwork and can be inferred and ﬁxed by data. This kind of universality is some-how surprising and opens the possibility of easily simulating the behaviour of any railway system once the complete schedules are known. These emergent delays canthen spread from one train to another by means of a delay propagation probability β , which is in fact the only free parameter of our model. Despite the high level ofabstraction used in our approach, our model is capable of reproducing the empir-ical patterns found in data when the delay-spreading probability β is ﬁne tunedto an optimal value. The model reproduces correctly the ﬁnal delay distributionand the distribution of the sizes of congested areas. Moreover, the model graspsthe topological properties of such congested areas, which appears to be spreadingin some cases for many kilometres through the network. According to our model,the emergence of large disruptions is the result of the interplay between the oc-currence of localized exogenous delays and the propagation of such delays betweentrains. In other words, turning oﬀ the delay propagation mechanism prevents thesystem from generating extremely large delays and congested areas, pointing outthat interactions are the driving force behind the emergence of major spontaneousadverse occurrences. The modelling scheme seems quite promising so far, but itstill suﬀers from some major assumptions that might limit its predictive power.In fact, the interactions between trains could occur in longer ranges and not justbetween neighboring links and the propagation probability could not be uniformall over the Railway Network but instead depending on speciﬁc operational condi-tions. Moreover, the modelization of exogenous delays could be reﬁned by takinginto account more static topological feature of the Railway Networks or by intro-ducing dynamical variations due to diﬀerent traﬃc conditions. Another interestingpossibility would be to increase the number of Railway Systems under study (e.g.,the French SNFC sharing similar characteristics with respect to the Italian andGerman systems whose data is publicly available [36]), in order to understand thereasons behind the quantitative diﬀerences observed, e.g. larger delays in the Ital-ian case. In the ﬁnal part of the paper, we show how the model can be applied tostudy the capability of functioning of the system after a localized large disruptionoccurring in a single node of the network. This approach can be easily extendedto more complex case studies of distributed disruptions, the occurrence of strikes,the removal of trains or parts of the network, also including the possibility to in-troduce delay management strategies to increase the resilience of the system. Themodel is also useful in order to have a fast test of changes in the overall schedule of trains so to have a more precise assessment of their global eﬀects. Finally, de-spite its simplicity the model is open an increase of complexity that might leadto a better adherence to the empirical ﬁndings such as long-range interactions be-tween trains and a more detailed model of exogenous delays including correlationsbetween nearby link and the dependence on traﬃc conditions. Funding

This work has been supported by the KREYON Project, funded by the John Tem-pleton Foundation under contract n. 51663; VDPS acknowledges ﬁnancial supportfrom the Austrian Research Promotion Agency FFG under grant

Acknowledgements

The authors acknowledge Fabio Lamanna for the initial discussion about thedatasets to be used for the work.

AppendixA Datasets Information

Novel information technologies enabled real-time monitoring and sharing of any kind of traﬃcdata. Impressive instances are the visualised datasets about marine traﬃc that can be easilyfound on the Internet . Also, several websites display live air traﬃc, by gathering and visualis-ing oﬃcial data from various sources . These sources of information were found to be crucial inorder to improve the understanding of the related transportation systems [37,38,39]. However,it is still a hard task to aggregate and analyse global or continental datasets about railwaysystems. In fact, due to historical reasons and to the typical usage scale, each nation has anetwork with few international connections and the available datasets are not homogeneous incoverage and format. Thus, tailoring the analysis on national systems seems a natural choice,whereas the most interesting characteristic behaviours appear in all systems, suggesting somekind of universality in the dynamics. We focused our analysis on the European continent bothfor the historical importance of railways and for the recent institutional eﬀorts to raise theiradoption. The actual railway system is composed by three distinct layers: high-speed passen-ger trains, normal-speed trains (mostly of regional type) and freight/military trains. Whilethe freight trains use diﬀerent stations and traﬃc handling rules (e.g., they operate mainlyduring night time) and can be discarded from our analysis, the other two layers can possiblyinteract each other. In the high-speed layer, correlations are identical to those in the regularlayer, while no appreciable correlations can be spotted across the two layers so that consideringthem as independent is a fairly good approximation. Since no additional information can begained by studying the whole system, we focus on the regular-speed layer only, both for Italyand Germany A.1 The Italian Dataset

The dataset regarding the Italian Railways has been collected by means of the “ViaggiaTreno”website . The purpose of this website is to provide real-time information to travellers regardingthe position of a certain train on the network, its delay and possible adverse occurrenceslike cancellations or strikes. Despite the fact that the information is in real-time, i.e., the E.g.: , , E.g.: , https://flightaware.com , https://planefinder.net omplex delay dynamics on railway networks 19instantaneous delay of a train can be checked at any time during the day, whenever the trainarrives at its ﬁnal destination its record is not deleted from the site. Instead, it is possible tocheck its route and its delay at each intermediate stop from the departing station to the arrivalone until the end of the day, at 23:59. Hence, we downloaded all the information displayedon the website each day at 23:30 in order to be sure that each train would have arrived atdestination. Starting from the 1st of January 2015 and for the whole 2015, we collected 12months of historical data about the dynamics of regular and high-speed trains in Italy. Foreach train we get an identiﬁer, the ordered list of stations the train has to cross, the scheduledarrival time at each station and its delay. The resulting dataset comprehends the traﬃc runningon 2253 stations, with a daily average schedule pertaining 8112 trains on 7062 links. Note that“ViaggiaTreno” does not collect information about the geographical position of the stations.Such information has been integrated by means of Wikipedia and Google Maps, allowing us torepresent the geo-localized network of Italian railways. Each dot corresponds to a station andeach link corresponds to a route between two stations. In other words, a line between Romeand Naples means that there is a direct train route linking them without intermediate stops.Lacking real point-wise tracks data, the route has been simply represented with a straight line. A.2 The German Dataset

The data about German Railways have been collected through the OpenDataCity website.This site gathers diﬀerent datasets collected by a variety of on-going or terminated projectsdealing with open data. In particular, the data we analysed come from the “Zugmonitor”project, which aimed at providing a web-app and an API to German travellers in order tohave real time information on the position of the trains on the German Railway Network andtheir delay. The project is no-longer running and the API is not accessible anymore. However,some dumps of historical data collected during the project are still available. In particular, wedownloaded all the data regarding year 2015, covering the same period of the Italian dataset.This dumps collected not only the delay at each station like in the Italian case, but also thedelay at intermediate points between two stations. All the points are also geo-localized so thatit is possible to reconstruct a quite accurate trajectory of the trains. In order to be consistentwith the Italian dataset we used the geo-localization only to identify the position of the stationsin the map. Scheduled arrival times at each station were also stored in the dump, so that inthe end we managed to reconstruct a dataset with a structure identical to the Italian one.The resulting dataset includes data for 5979 stations with a daily average schedule containing11,975 trains on 16,277. B High-speed layer

The structure Italian and the German Railway Networks is the overlap of two distinct layers,the normal-speed and the high-speed one. These two layers are diﬀerent from the structuralpoint of view. The high-speed layer in fact has to allow for fast travelling trains and have adiﬀerent kind of rails connecting stations and, in general, it is reasonable to assume that high-speed trains and normal-speed ones do not interact when travelling from a station to another.However, the nodes of the network (i.e. stations) are shared between the layers, making thenetwork a “multiplex” [40,41] and allowing for interactions of the two diﬀerent kind of trains.Our datasets contain information about high-speed and normal-speed trains, for both theItalian and German case and in principle it could be possible to study the dynamics of thehigh-speed layer and its interaction with the normal one. In the main text we decided thoughto focus on the normal layer cutting-out the high-speed part. This choice was made for sake ofsimplicity, since the rules of interaction between the layers might have been hard to understandor derive with data analysis. Here, we will show that this approximation is reasonable due tothe smaller numbers of the high-speed trains travelling and their poor eﬀects on the dynamicsof the normal-speed one.In our datasets it is possible to identify high-speed trains thanks a speciﬁc identiﬁer (“ES*”for the Italian Network and “EC”, “IC”, “ICE” for the German Network) and use them to Fig. A

Degree distributions for the Italian (left) and German (right) Railway Networks.Continuous lines correspond to the normal-speed layers, while dotted lines correspond to thehigh-speed layers.build the High-speed layer of the Railway Network in a similar way that has been done forthe normal layer in the main text. The number of travelling high-speed trains per day isconsiderably smaller with respect to the normal-speed trains, being of ∼

210 and ∼ .

28 6 . .

06 0 . Table A

Network Metrics of the High-Speed Layers of the Italian and German RailwayNetworks.the approximation of neglecting this layer by looking at the cross-correlations between thetime-series of average delays on the links in the networks, similarly to what we have done forthe normal-speed layer in the main text. In this case though we checked for correlations notjust between the links of the same layer, but also between couples of links from diﬀerent layersin order to see whether we can spot a signal of a possible inter-layer interaction. Fig. D showsthe cross-correlations in the “Forward”, “Backward” conﬁgurations, between the couples oflinks of the high-speed layer and the couples of links made by a link in the high-speed layerand one in the normal-speed layer. As for the normal-speed layer, decaying correlations existfor non inter-layer couples of links in the Backward conﬁguration, while in all the other casesthe signal of correlation is very close to 0. Hence, it is possible to considered the high-speedand normal-speed layer as independent and non-interacting. It is worth noticing that thismeasure of correlation might hide possible local interaction eﬀects due to the fact that it isan aggregation of all the couples of links in the network. Such approximation will be thenvalid when considering global or aggregated metrics (e.g. the delay distributions), but it is notunlikely that more “ﬁne-grained” observations (e.g. the distribution of delays on a single linkor station) might be inﬂuenced by our choice.omplex delay dynamics on railway networks 21

Fig. B

Links Length distributions for the Italian (left) and German (right) Railway Networks.Continuous lines correspond to the normal-speed layers, while dotted lines correspond to thehigh-speed layers.

Fig. C

Final delay distributions for the Italian (left) and German (right) Railway Networks.Continuous lines correspond to the normal-speed layers, while dotted lines correspond to thehigh-speed layers.

Fig. D

Cross-Correlations between the average delay time series of pairs of nearby edges inthe high-speed layers of the railway networks and between the time series of pairs of edgescoming from diﬀerent layers. Decaying correlations are observed only in the “Backward” casefor pairs of edges both in the high-speed layer. No signal of inter-layer correlations can beobserved.2 Bernardo Monechi et al.

C Exogenous Delay Distributions

The most trivial way to group the links of the railway networks is according to the geodesicdistance between the stations they connect, behind this a rough estimate of the length of therailway between them. Fig. Eshow the distribution of these distances d ( e ) for all the edges e in the Italian and German Railway Networks. From these distributions we can see that thedistances are distributed around a typical value of ∼ ∼

100 km. In order to characterize correctly the exogenous delay on the links, we measured

Fig. E

Distributions of the links length d ( e ) for the Italian (left) and German (right) RailwayNetworks.the positive and negative exogenous delay distributions aggregating the links according to d ( e )as can be seen from Figures F, G, H, and G. In all these cases, we modelled the distributionusing a q -exponential functional form[32,33]: e q,b ( δt ) ∝ (1 + b ( q − δt ) / (1 − q ) , q ∈ [1 , , b > , (4)so that in these cases the parameter q and b are depending on d ( e ). The behavior of theparameters with respect to d ( e ) are shown in the main text. We ﬁnd that in general: q ( d ) = const , b ( d ) = Ad − a . (5)The parameters for equation (5) can be found in table B: Since these distributions are allq A aITA positive 1 .

15 0 .

66 5 . .

03 0 .

22 1 . .

28 0 .

99 37 . .

15 0 .

57 0 . Table B

Parameters for the equation (5), governing the behavior of the parameters q and b of the q -exponential distribution as the links length d ( e ) varies.conditioned on the fact that the acquired exogenous delay is either positive or negative, wecan check whether the probability of these conditions are inﬂuenced or not by the length of thelink the train is travelling on. Fig. J shows these dependencies for both the considered RailwayNetworks. Despite the fact that a small dependence can be observed in the probability ofhaving positive delays (i.e. it is slightly increasing with d ), assuming that such probabilitiesare constant is a good zero-order approximation that we have used in the main text.omplex delay dynamics on railway networks 23 δt 5 4 3 2 1 P ( δ t | δ t > ) d ( e ) = 5 . km q =1 . ,b =1 . χ red =17 . δt P ( δ t | δ t > ) d ( e ) = 10 . km q =1 . ,b =1 . χ red =7 . δt P ( δ t | δ t > ) d ( e ) = 15 . km q =1 . ,b =1 . χ red =7 . δt P ( δ t | δ t > ) d ( e ) = 20 . km q =1 . ,b =0 . χ red =3 . δt 5 4 3 2 1 P ( δ t | δ t > ) d ( e ) = 25 . km q =1 . ,b =0 . χ red =5 . δt P ( δ t | δ t > ) d ( e ) = 30 . km q =1 . ,b =0 . χ red =4 . δt P ( δ t | δ t > ) d ( e ) = 35 . km q =1 . ,b =0 . χ red =3 . δt P ( δ t | δ t > ) d ( e ) = 40 . km q =1 . ,b =0 . χ red =1 . Fig. F

Distributions of the positive exogenous delays according to the length d ( e ) of the linksin the Italian Railway Network. Dotted lines represents the q -exponential ﬁt of the distribution.The parameters obtained with the ﬁts are shown in the legend. δt 7 6 5 4 3 2 1 P ( δ t | δ t < ) d ( e ) = 5 . km q =1 . ,b =0 . χ red =296 . δt P ( δ t | δ t < ) d ( e ) = 10 . km q =1 . ,b =0 . χ red =49 . δt P ( δ t | δ t < ) d ( e ) = 15 . km q =1 . ,b =0 . χ red =10 . δt P ( δ t | δ t < ) d ( e ) = 20 . km q =1 . ,b =0 . χ red =12 . δt 7 6 5 4 3 2 1 P ( δ t | δ t < ) d ( e ) = 25 . km q =1 . ,b =0 . χ red =18 . δt P ( δ t | δ t < ) d ( e ) = 30 . km q =1 . ,b =0 . χ red =11 . δt P ( δ t | δ t < ) d ( e ) = 35 . km q =1 . ,b =0 . χ red =2 . δt P ( δ t | δ t < ) d ( e ) = 40 . km q =1 . ,b =0 . χ red =20 . Fig. G

Distributions of the negative exogenous delays according to the length d ( e ) of the linksin the Italian Railway Network. Dotted lines represents the q -exponential ﬁt of the distribution.The parameters obtained with the ﬁts are shown in the legend.4 Bernardo Monechi et al. δt 5 4 3 2 1 P ( δ t | δ t > ) length = 5 . km q =1 . ,b =8 . χ red =148 . δt P ( δ t | δ t > ) length = 10 . km q =1 . ,b =3 . χ red =92 . δt P ( δ t | δ t > ) length = 15 . km q =1 . ,b =2 . χ red =41 . δt P ( δ t | δ t > ) length = 20 . km q =1 . ,b =1 . χ red =23 . δt 5 4 3 2 1 P ( δ t | δ t > ) length = 25 . km q =1 . ,b =1 . χ red =9 . δt P ( δ t | δ t > ) length = 30 . km q =1 . ,b =1 . χ red =5 . δt P ( δ t | δ t > ) length = 35 . km q =1 . ,b =1 . χ red =2 . δt P ( δ t | δ t > ) length = 40 . km q =1 . ,b =0 . χ red =2 . Fig. H

Distributions of the positive exogenous delays according to the length d ( e ) of thelinks in the German Railway Network. Dotted lines represents the q -exponential ﬁt of thedistribution. The parameters obtained with the ﬁts are shown in the legend. δt 7 6 5 4 3 2 1 P ( δ t | δ t < ) d ( e ) = 5 . km q =1 . ,b =4 . χ red =23 . δt P ( δ t | δ t < ) d ( e ) = 10 . km q =1 . ,b =2 . χ red =60 . δt P ( δ t | δ t < ) d ( e ) = 15 . km q =1 . ,b =1 . χ red =42 . δt P ( δ t | δ t < ) d ( e ) = 20 . km q =1 . ,b =1 . χ red =13 . δt 7 6 5 4 3 2 1 P ( δ t | δ t < ) d ( e ) = 25 . km q =1 . ,b =1 . χ red =10 . δt P ( δ t | δ t < ) d ( e ) = 30 . km q =1 . ,b =1 . χ red =1 . δt P ( δ t | δ t < ) d ( e ) = 35 . km q =1 . ,b =2 . χ red =3 . δt P ( δ t | δ t < ) d ( e ) = 40 . km q =1 . ,b =0 . χ red =2 . Fig. I

Distributions of the negative exogenous delays according to the length d ( e ) of thelinks in the German Railway Network. Dotted lines represents the q -exponential ﬁt of thedistribution. The parameters obtained with the ﬁts are shown in the legend.omplex delay dynamics on railway networks 25 d P P ( δt< c =0 . P ( δt> c =0 . d P P ( δt< c =0 . P ( δt> c =0 . Fig. J

Probability of having a positive (dashed line) and negative (continuous line) exogenousdelay as function of the length of the links in the Italian (left) and German (right) RailwayNetworks.

D Departure Delay Distributions

An approach similar to the one adopted for the exogenous delays on the links of the networkscan also be adopted for the departure delays at the stations. In this case we categorized thedeparting stations (i.e. a subset of the nodes in the network) according to their out-degree k out whose distributions are shown in Fig. K. Having divided the nodes of the network accordingto k out , we can ﬁt the aggregated departure delay distributions as k out varies as shown inFig. M, L and N. Note that negative departure delays are not present in the German dataset.These distribution have been ﬁtted using a q -exponential functional form as in equation 4. Thebehavior of the q and b parameters for these distributions according to k out can be summarizedby the equations: q ( k out ) = conts , b ( k out ) = Ae − ak out . (6)The parameters for equations (6) are obtained by ﬁtting the empirical data as shown in themain text. Table C shows the values obtained with the ﬁt: To conclude the investigation aboutq A aITA positive 1 .

28 0 .

014 1 . .

01 0 .

004 0 . .

20 0 .

026 1 . Table C

Parameters for the equation (6), governing the behavior of the parameters q and b of the q -exponential distribution as out-degree k out of the nodes varies. k out P ( k o u t ) k out P ( k o u t ) Fig. K

Out-degree distributions for the Italian (left) and German (right) Railway Networks.6 Bernardo Monechi et al. δt 5 4 3 2 1 P ( δ t | δ t > ) k out = 2 . q =1 . ,b =2 . χ red =7 . δt P ( δ t | δ t > ) k out = 3 . q =1 . ,b =1 . χ red =2 . δt P ( δ t | δ t > ) k out = 4 . q =1 . ,b =1 . χ red =4 . δt P ( δ t | δ t > ) k out = 5 . q =1 . ,b =1 . χ red =7 . δt P ( δ t | δ t > ) k out = 6 . q =1 . ,b =0 . χ red =5 . δt 5 4 3 2 1 P ( δ t | δ t > ) k out = 7 . q =1 . ,b =1 . χ red =7 . δt P ( δ t | δ t > ) k out = 8 . q =1 . ,b =1 . χ red =2 . δt P ( δ t | δ t > ) k out = 9 . q =1 . ,b =1 . χ red =5 . δt P ( δ t | δ t > ) k out = 10 . q =1 . ,b =0 . χ red =3 . δt P ( δ t | δ t > ) k out = 11 . q =1 . ,b =1 . χ red =8 . Fig. L

Distributions of the positive departure delays according to k out of the links in theItalian Railway Network. Dotted lines represents the q -exponential ﬁt of the distribution. Theparameters obtained with the ﬁts are shown in the legend. δt 3 2 1 P ( δ t | δ t < ) k out = 2 . q =1 . ,b =1 . χ red =37 . δt P ( δ t | δ t < ) k out = 3 . q =1 . ,b =1 . χ red =6 . δt P ( δ t | δ t < ) k out = 4 . q =1 . ,b =1 . χ red =22 . δt P ( δ t | δ t < ) k out = 5 . q =1 . ,b =1 . χ red =6 . δt P ( δ t | δ t < ) k out = 6 . q =1 . ,b =1 . χ red =121 . δt 3 2 1 P ( δ t | δ t < ) k out = 7 . q =1 . ,b =1 . χ red =63 . δt P ( δ t | δ t < ) k out = 8 . q =1 . ,b =0 . χ red =8 . δt P ( δ t | δ t < ) k out = 9 . q =1 . ,b =0 . χ red =303 . δt P ( δ t | δ t < ) k out = 10 . q =1 . ,b =0 . χ red =37 . δt P ( δ t | δ t < ) k out = 11 . q =1 . ,b =0 . χ red =66 . Fig. M

Distributions of the negative departure delays according to k out of the links in theItalian Railway Network. Dotted lines represents the q -exponential ﬁt of the distribution. Theparameters obtained with the ﬁts are shown in the legend.omplex delay dynamics on railway networks 27 δt 5 4 3 2 1 P ( δ t | δ t > ) k out = 2 . q = 1 . , b = 1 . χ red = 19 . δt P ( δ t | δ t > ) k out = 4 . q = 1 . , b = 0 . χ red = 8 . δt P ( δ t | δ t > ) k out = 6 . q = 1 . , b = 0 . χ red = 12 . δt P ( δ t | δ t > ) k out = 8 . q = 1 . , b = 0 . χ red = 13 . δt P ( δ t | δ t > ) k out = 10 . q = 1 . , b = 0 . χ red = 5 . δt 5 4 3 2 1 P ( δ t | δ t > ) k out = 12 . q = 1 . , b = 0 . χ red = 10 . δt P ( δ t | δ t > ) k out = 14 . q = 1 . , b = 0 . χ red = 13 . δt P ( δ t | δ t > ) k out = 16 . q = 1 . , b = 0 . χ red = 16 . δt P ( δ t | δ t > ) k out = 18 . q = 1 . , b = 0 . χ red = 3 . δt P ( δ t | δ t > ) k out = 20 . q = 1 . , b = 0 . χ red = 14 . Fig. N

Distributions of the positive departure delays according to k out of the links in theGerman Railway Network. Dotted lines represents the q -exponential ﬁt of the distribution.The parameters obtained with the ﬁts are shown in the legend.departure delays, it is necessary to study the occurrences of positive and negative ones as k out varies. Fig. O shows these dependencies for the Italian and German Railway Networks.Similarly to what we have found for the dependency of the exogenous delay with the length ofthe links, in the German Railway Network no dependence can be observed and the probabilitiesof having a positive or negative departure delay can be considered constant in every station.However, this is not true for the Italian Railway Network where just the probability of havinga negative delay can be considered constant. On the contrary, the probability of having apositive departure delay increases linearly with k out .8 Bernardo Monechi et al. k out P P ( δt< c =0 . P ( δt> a =0 . ,b =0 . ,R =0 . k out P P ( δt< c =0 . P ( δt> c =0 . Fig. O

Probability of having a positive (dashed line) and negative (continuous line) departuredelay as function of the out-degree of the nodes in the Italian (left) and German (right) RailwayNetworks. Dashed and continuous lines corresponds to linear or constant ﬁts of the data. Weassume that there is no dependency with k out in every case but the positive departure delaysin the Italian data where we found the linear relation: P = ak out + b (parameters shown inthe legend). E Optimal Choice of β In order to determine the optimal value of the β , we tune our model to reproduce with thehighest probability the delay that each train gets whenever it crosses a station during its path.Considering a train i arriving at a station n on a given day, we call δt emp i,n its measured ar-rival delay at that station as recorded in the dataset. Hence, we perform 200 simulations of theschedule of the considered day in order to compute the distribution P ( δt i,n ) of the correspond-ing δt i,n . Hence, considering the null hypothesis that δt emp i,n is produced by our model (i.e. itis extracted by P ( δt i,n )), we calculate the double tailed p -value for such hypothesis for everypair ( i, n ), i.e. for every train and for every station. For each day we average the p -values of allthe train-station pairs to obtain the performance metrics for β . Fig. P shows the dependenceof the average p -value as a function of β . The curves have been computed from the simulationof a week of daily schedules. The values of β = 0 .

15 for Italy and β = 0 .

10 for Germany allowsthe model to maximize the probability of reproducing the correct arrival delay for each train ata particular station. These values will be used in all the numerical simulations in the following,if not otherwise speciﬁed.omplex delay dynamics on railway networks 29

Fig. P

Average p -value statistics for the arrival delay for each train at each station in theirroute as a function of the diﬀusion parameter β . The curves have been obtained as the averageof the single curve of diﬀerent days (from the 1 st to the 6 th of March 2015). The highlightedregions correspond to the range of values of β where the average maximum p -value has beenobserved; F Predictive limits of the model

By means the deﬁnition of the p -value of the previous paragraph, we can explore a bit whichare the predictive limits of the model, i.e. in which part of the Railway Network it is morelikely to not reproduce correctly the delays. In other words, we computed for each station n the average p -values, by averaging all the p -values assigned to the couple ( i, n ). Fig. Q showsthe distribution of these average p -values for the stations in the networks, obtained with theoptimal value of β . The largest parts of the stations have p -values centred around a typicalvalue of ∼ . ∼ . ∼

11% for theItalian and ∼

20% for the German case) with a p -value smaller than 0 .

05. In these stations thepredictive power of the model is particularly unsatisfactory and it is interesting to understandsomething about their features.Fig. R shows the distribution of the average weight of the links (in terms of the number oftrains that have travelled in the links in our whole datasets) and the distribution of the lengthof the links connected to a station, in the case of stations with p -value larger and smaller than0 .

05. We can see that in the latter, the distribution of the weight is considerably narrower,indicating that in these stations the traﬃc is usually low. Hence, the disagreement might bethe result of a poor sampling of the exogenous disturbances around these stations leading topoor predictions or to a dependence of the transmission parameter β on the traﬃc conditionsthat have been ruled out when we assumed them to be constant in time and uniform all overthe network.For the Italian case, these stations also are usually connected to links with a shorter distance.This fact can be interpreted as the eﬀect of discrepancies in the ﬁtted exogenous delay modelwhen the links are not suﬃciently long.0 Bernardo Monechi et al. A) B)

Fig. Q

Average p -values distribution for the stations of the Italian (left) and German (right)railway networks with the optimal choice of the diﬀusion parameter β . Vertical dashed linescorrespond to the value p = 0 . Fig. R

Distributions of the average weight of the in-going (A-E) and out-going (B-F) of thelinks of the Railway Networks. Weights are computed as the total number of trains that havetravelled over a link during the whole 2015. Distributions of the average length of the of thein-going (C-G) and out-going (D-H) of the links of the Railway Networks. Red and Blue linescorrespond to the Italian and German railway networks, dotted lines are the distributions forthe stations with a p -value smaller than 0 . References

1. D. Banister, S. Watson, C. Wood, Environment and Planning B: planning and design (1), 125 (1997)2. N. Limao, A.J. Venables, The World Bank Economic Review (3), 451 (2001)3. P. Newman, J. Kenworthy, Sustainability and cities: overcoming automobile dependence (Island press, 1999)4. K.E. Train,

Discrete choice methods with simulation (Cambridge university press, 2009)5. A. Nelson, (2008)6. D. Donaldson, American Economic Review (4-5), 899 (2018)7. C. Carrion, D. Levinson, Transportation research part A: policy and practice (4), 720(2012)8. R.B. Noland, J.W. Polak, Transport reviews (1), 39 (2002)9. D. Banister, Transport policy (2), 73 (2008)10. L.E. Meester, S. Muns, Transportation Research Part B: Methodological (2), 218 (2007)11. R.M. Goverde, Transportation Research Part C: Emerging Technologies (3), 269 (2010)12. A. Higgins, E. Kozan, Transportation Science (4), 346 (1998)13. A. D’Ariano, M. Pranzo, I.A. Hansen, IEEE Transactions on Intelligent TransportationSystems (2), 208 (2007)14. K. Li, Z. Gao, B. Ning, Journal of Computational Physics (1), 179 (2005)15. V. Cacchiani, A. Caprara, P. Toth, Transportation Research Part B: Methodological (2),215 (2010)16. ˙I. S¸ahin, Transportation Research Part B: Methodological (7), 511 (1999)17. A. Giua, C. Seatzu, IEEE Transactions on automation science and engineering (3), 431(2008)18. D. Middelkoop, M. Bouwman, in Proceedings of the 33nd conference on Winter simulation (IEEE Computer Society, 2001), pp. 1042–104719. P. Sen, S. Dasgupta, A. Chatterjee, P. Sreeram, G. Mukherjee, S. Manna, Physical ReviewE (3), 036106 (2003)20. A. Erath, M. L¨ochl, K.W. Axhausen, Networks and Spatial Economics (3), 379 (2009)21. U. Bhatia, D. Kumar, E. Kodra, A.R. Ganguly, PloS one (11), e0141890 (2015)22. P. Fleurquin, J.J. Ramasco, V.M. Eguiluz, Scientiﬁc reports (2013)23. B. Campanelli, P. Fleurquin, V. Egu´ıluz, J. Ramasco, A. Arranz, I. Etxebarria, C. Ciruelos,Proceedings of the Fourth SESAR Innovation Days, Schaefer D (Ed.), Madrid (2014)24. R. Pastor-Satorras, A. Vespignani, Physical review letters (14), 3200 (2001)25. Viaggiatreno. . Accessed: 2018-05-2926. OpenDataCity. . Accessed: 2017-05-1027. The CIA World Factbook. . Accessed: 2018-05-2928. M. Kurant, P. Thiran, Physical Review E (3), 036114 (2006)29. M. Barth´elemy, Physics Reports (1), 1 (2011)30. M.E. Newman, Physical review letters (20), 208701 (2002)31. M.E. Newman, SIAM review (2), 167 (2003)32. K. Briggs, C. Beck, Physica A: Statistical Mechanics and its Applications (2), 498(2007)33. S. Picoli Jr, R. Mendes, L. Malacarne, R. Santos, Brazilian Journal of Physics (2A),468 (2009)34. M.F. Gomes, A.P. y Piontti, L. Rossi, D. Chao, I. Longini, M.E. Halloran, A. Vespignani,PLOS Currents Outbreaks (2014)35. Q. Zhang, K. Sun, M. Chinazzi, A. Pastore-Piontti, N.E. Dean, D.P. Rojas, S. Merler,D. Mistry, P. Poletti, L. Rossi, et al., bioRxiv p. 066456 (2016)36. SNCF Open Data. https://ressources.data.sncf.com/explore/dataset/api-sncf/ .Accessed: 2018-05-2937. R. Guimera, L.A.N. Amaral, The European Physical Journal B-Condensed Matter andComplex Systems (2), 381 (2004)38. R. Guimer`a, S. Mossa, A. Turtschi, L.A.N. Amaral, Proceedings of the National Academyof Sciences (22), 7794 (2005)39. P. Kaluza, A. K¨olzsch, M.T. Gastner, B. Blasius, Journal of the Royal Society Interface (48), 1093 (2010)40. F. Battiston, V. Nicosia, V. Latora, Physical Review E (3), 032804 (2014)41. G. Menichetti, D. Remondini, P. Panzarasa, R.J. Mondrag´on, G. Bianconi, PloS one9