[PDF] Modeling growth of urban firm networks

Abstract

The emergence of interconnected urban networks is a crucial feature of globalisation processes. Understanding the drivers behind the growth of such networks - in particular urban firm networks -, is essential for the economic resilience of urban systems. We introduce in this paper a generative network model for firm networks at the urban area level including several complementary processes: the economic size of urban areas at origin and destination, industrial sector proximity between firms, the strength of links from the past, as well as the geographical and socio-cultural distance. An empirical network analysis on European firm ownership data confirms the relevance of each of these factors. We then simulate network growth for synthetic systems of cities, unveiling stylized facts such as a transition from a local to a global regime or a maximal integration achieved at an intermediate interaction range. We calibrate the model on the European network, outperforming statistical models and showing a strong role of path-dependency. Potential applications of the model include the study of mitigation policies to deal with exogenous shocks such as economic crisis or potential lockdowns of countries, which we illustrate with an application on stylized scenarios.

Full PDF

MModeling growth of urban firm networks

Juste Raimbault , Natalia Zdanowska , Elsa Arcaute Centre for Advanced Spatial Analysis, UCL, London, United Kingdom UPS CNRS 3611 ISC-PIF, Paris, France UMR CNRS 8504 G´eographie-cit´es, Paris, France* [email protected]

Abstract

The emergence of interconnected urban networks is a crucial feature of globalisation processes. Understandingthe drivers behind the growth of such networks – in particular urban firm networks –, is essential for the economicresilience of urban systems. We introduce in this paper a generative network model for firm networks at the urbanarea level including several complementary processes: the economic size of urban areas at origin and destination,industrial sector proximity between firms, the strength of links from the past, as well as the geographical andsocio-cultural distance. An empirical network analysis on European firm ownership data confirms the relevance ofeach of these factors. We then simulate network growth for synthetic systems of cities, unveiling stylized facts suchas a transition from a local to a global regime or a maximal integration achieved at an intermediate interactionrange. We calibrate the model on the European network, outperforming statistical models and showing a strongrole of path-dependency. Potential applications of the model include the study of mitigation policies to dealwith exogenous shocks such as economic crisis or potential lockdowns of countries, which we illustrate with anapplication on stylized scenarios.

Introduction

Networks constitute the new social morphology of our societies, since the intensification of globalisation of theeconomy in the late 20th century [1]. A world-city system has since then emerged [2], characterized by highlyinterconnected urban centres at the global level [3]. These interactions can be of different nature and determinedby economic, socio-cultural, or geopolitical proximity inducing spatial and non-spatial patterns [4]. Interactionsof economic nature between transnational firms [5], are together with international trade patterns among themost representative of the current economic trends [2]. Understanding the drivers of growth and geographicalproperties of such an urban network, can be used to foster innovation in specific cities and to shape public policiesfor local industrial clusters [6]. In addition, straightening these city-interactions can be crucial for revitalisingcertain geographical areas [7], by developing strategic industries for global integration of regions [8], cities [9] andcountries [10]. Exploring city/firm interactions [11] and the position of cities within these networks [9] can alsopermit to infer the consequences of future exogenous shocks such as an economic crisis, a post-Brexit scenario forthe European Union or a single market collapse in the post-COVID19 future.The use of network models to understand processes underlying the emergence and dynamics of such networkshas already been tackled in the literature, but still not extensively and never explicitly at city level [2]. In factgenerative urban network models combining geographical factors with network topological properties exist at theregional scale [12]. Trade networks have been extensively studied from the complex networks perspective, but theseare generally driven at the country-level [13] because of a lack of city-level data. Input-output models, mostly usedin regional science to represent flows between geographical areas, are considered in some cases as a type of urbangenerative model [14]. In this stream of research, spatial interaction models [15] can form the basis of urban networkmodels for different types of flows [16] or physical infrastructure [17].In this paper we choose to focus on urban networks of firms to question and capture geographical and economicprocesses as internationalisation, metropolisation, regionalisation and specialisation of cities within a generativemodel. The specific question we tackle is the complementarity of different processes to generate such urban networks.The originality of the present paper is to examine these issues within the system of cities framework [18]. In thisapproach the position and dynamics of cities in the socio-economic world-wide space can be considered by their1 a r X i v : . [ phy s i c s . s o c - ph ] S e p nteractions with other cities [19]. We examine the interactions of European cities within firm linkages defined bycorporate ownership links. Since transnational firms structure is one of the determinant of the global economicspace, it provides one accurate proxy to unveil geo-economic structures [20].More precisely, our contribution consists in the following: (i) we propose an empirical analysis of the firmownership urban network in the European Union, based on the Bureau Van Dijk’s AMADEUS database [21]; (ii) weintroduce a generative network model to simulate the growth of such linkages at urban areas scale, which combinesmultiple factors influencing link formation, such as the economic intensity of the urban areas corresponding to theorigin and destination nodes, the industrial sector proximity, the strength of previous links, and the geographicaland socio-cultural distance; (iii) the model allows us to compare the effect of different factors on the final networkstructure, which is extensively studied through model sensitivity analysis and exploration; and (iv) we calibrate themodel on the empirical network.The rest of this paper is organized as follows: we firstly describe the generative model for urban networks andproceed to an empirical analysis of the network used as a case study. We then proceed to a sensitivity analysis andexploration of the model on synthetic data. The model is calibrated on real European firm ownership data. Wefinally discuss implications of our results and possible extensions.

Materials and Methods

Generative urban network model

Rationale

The model is expected to capture a single macroscopic urban scale, i.e. the level of the urban system where basicentities are cities. Links are induced by underlying firms but these are not explicit. Cities are defined by theireconomic size, but also by an industrial sector profile. As the authors of [4] point out, a combination of severaldistances may play a role in establishing linkages. Geographical distance, industrial proximity and complementaritywill typically be equally significant [22]. Therefore, we combine several processes driving network growth: (i)geographical proximity (in our case the crow-fly distance, but it can be generalized to any type of accessibility); (ii)geopolitical proximity, which captures the higher propensity to establish links within existing submarkets within theEuropean common market (confirmed relevant by the fixed effect regression described below); (iii) city economicsize, which presents agglomeration effects; (iv) industrial proximity in terms of firm sector composition; and (v)previous linkages (influence of already existing links in the past). This last point is crucial to capture accumulatedeffects and include path-dependencies, justifying further the use of a simulation model instead of a more simplestatistical model.

Model description

Let 1 ≤ i ≤ N cities be defined at time t by their economic structure such that E i ( t ) is the total economic volume(GDP) for city i and S ik ( t ) is a matrix giving economic volume proportions within each economic sector k , assumingthere are K economic sectors. Formally, the model creates links between cities which are characterized by theireconomic size E i (GDP) and economic structure S ik in terms of activity sectors (probability distribution of firmswithin K sectors). Starting with an existing network with no links, the model iteratively adds links, following aprobability given by a generalized Cobb-Douglas function [23] as p ij ∝ (cid:18) E i E (cid:19) γ O · (cid:18) E j E (cid:19) γ D · (cid:16) w ij W (cid:17) γ W · s ( S ik , S jk ) γ S · exp ( − d ij /d ) · exp ( − c ij /c ) (1)where E = (cid:80) k E k , W = (cid:80) i,j w ij weights of previous links, s ( S ik , S jk ) is a similarity function between activitysectors S ik and S jk taken as a cosine similarity, d ij is the Euclidian distance, and c ij is a socio-cultural distance. Inthe case of an already existing link between two areas, the weight of the latter is incremented by a unit volume w .The model is stopped either when a final time is reached or when the cumulated volume of links reaches a targetquantity. This formulation can be considered both as an economic utility model and a generalisation of spatialinteraction models.Note that: (i) we consider an asymmetric influence of sizes, assuming that link directions are important (similarlythe similarity function s may be taken as asymmetric); (ii) the influence of previous links is similar to a preferentialattachment process; (iii) we do not renormalise the exponents to 1 to ensure to include convex functions. The“geopolitical/sociocultural” distance remains abstract and should be parametrised (see setup below) and estimated in2he data-driven setting. In this simple version of the model we assume an absence of evolution of economic size,which means that the adjusted parameters are valid on a restricted time period. Model setup

When setting the model, it is important to consider that different initial conditions (or parametrisation on real data),given by the spatial distribution, the sizes and industrial sector compositions, are going to affect the behaviour ofthe model. One of the authors [24] has already shown that the initial spatial configuration can have significantimpacts on model outcomes. We therefore consider two versions: one with a synthetic system of cities which can bestochastically repeated, and one other based on real data.

Synthetic setup

The synthetic system of cities is generated to be at the scale of a continent with propertiessimilar to Europe: (i) N = 700 cities with an empirical power-law from the Global Human Settlements database foreconomic sizes ( α = 1 . d max = 3000 km ;(ii) clustered into C = 20 countries using a k-means algorithm (this remains consistent with non-correlated city sizesat the scale of the countries as our cities are randomly distributed in space [25]); (iii) with synthetic distribution ofsectors following log-normal distributions adjusted in a way that large cities have more high-value activities and aremore diverse.More precisely for this last point, assuming sectors are distributed continuously along a one-dimensional axis,cities are assumed to have a sector distribution with a log-normal density function. The largest city is constrainedto have a standard deviation and a mode of 1 / /K , with a linear interpolation between the twofor other cities. Parameters of the log-normal law are obtained by solving the corresponding two equations systemfor each city, and each distribution is discretised and renormalised to be a vector of probabilities. See S2 Text for alldetails of this synthetic setup.In synthetic experiments, the network is generated from an initial complete network with dummy links of weight1 until reaching t f = 1500, with a unit link volume of w = 1. Real setup

The real setup is done using the empirical network of ownership links. Urban areas are distributedaccording to their positions in the actual geographical space, and economic size and sector composition attributesare used. The socio-economic distance is constructed using information from the statistical modeling, by consideringfixed-effect coefficients estimated for each pair of countries. See S2 Text for more details on the construction of thesocio-economic distance matrix.The initial network is the same than in the synthetic setup, and the real links are taken as targets for thegenerated network. We constrain the total volume of flows by setting w = (cid:80) ij w ( obs ) ij /t f when the number of timesteps t f is given as a parameter. Model parameters and indicators

The model parameters are the Cobb-Douglas weights γ O , γ D , γ W , γ S and distance decay parameters d , c . As themodel also includes path-dependencies by integrating the influence of previous links, it cannot be reduced to asimple statistical model. Finally, the role of space introduces even further complexity, yielding a generative model,which has to be simulated.We consider two main macroscopic indicators to quantify the generated network. Firstly, geographical structuresare captured by internationalisation (modularity of countries in the network). The situation of a minimised modularitycorresponds to dense connections between cities of different countries in Europe and sparse connections betweencities of the same European country. In practice, modularity is computed as follows I = 12 | E | · (cid:88) i,j (cid:18) w ij − d i d j | E | (cid:19) c i = c j (2)where d i is the weighted degree of city i in country c i .Secondly, we consider the metropolisation indicator as the correlation between the weighted degree and theGDP of the city. With this indicator we capture the effect of large cities or metropolises of regional importanceattracting the biggest amount of links and involved in high interactions with other cities. It is computed with aPearson correlation estimator as M = ˆ ρ [ d i , E i ] (3)3igure 1: Distributions of network properties. (Left) Cumulated distribution for edge weight, with power law fit(red) and log-normal fit (green); (Right)

Cumulated distribution for weighted degree.In this work, we do not consider the relationship between countries given their economic complexity, such asthe framework developed by [26], where the diversity of products produced plays an essential role in determiningco-dependencies and resilience. What we investigate instead, is the role of linkages given by geography and history,and hence, the sector structure of cities does not evolve within the model.

Empirical data

Companies from

AMADEUS are georeferenced using the Geonames database (using postcode or address dependingon the information available). We then join this data with the Global Human Settlement dataset GHS-FUA forFunctional Urban Areas [27] – which delineate the commuting areas for urban centres. Starting from a firm-level dataset of 1,562,862 nodes and 1,866,936 links, we obtain a directed network of 573 urban areas and 9323ownership links. The weight of links is obtained by computing the owned share of turnover at destination, i.e. w ij = (cid:80) k ∈ i,l ∈ j p kl · T l where p kl is the ownership share of company l by company k and T l is the turnover ofcompany l . Node attributes include the economic size of urban areas, that we compute as the sum of turnoversof companies within the area. This quantity is highly correlated with the GDP of the area-level included in theGHS database (Spearman correlation ρ = 0 .

71) and also with population ( ρ = 0 . s ij = (cid:80) Kk =1 S ik · S jk where S ik are sector proportions.The urban network has heavy tail weighted degree and edge weight distributions, as shown in Fig. 1. We fit powerlaw and log-normal distributions, including minimal value cutoff following [29], using the powerlaw R package [30].For both, log-normal distributions appear to be a better fit ( µ = 18 . σ = 2 . µ = 13 . σ = 2 . Map of network communities obtained by modularity maximisation.

The area of circles givesthe turnover within the Functional Urban Area, and color the community it belongs to. Urban Areas in communitiesof size smaller than 3 were not colored for readability.(resp. 15). In comparison, when taking into account 29 countries, the classification of urban areas by country has amodularity of 0.32. This indicates that the structure of flows present certain regional patterns. Indeed, a null modelassigning random labels with 30 communities, bootstrap 1000 times, yields an average modularity of 0 . ± . tatistical analysis In order to understand the most determining geo-economic factors influencing the creation of links between cities,we run a statistical analysis as applied in the case of unconstrained spatial interaction models [36]. We test fivedifferent statistical models, which progressively include the processes for the generative network model. The mostgeneral statistical one is: log( w ij ) = log( d ij ) + log( T i ) + log( T j ) + log( s ij ) + α c i ,c j + ε ij (4)where α c i ,c j is a fixed effect term for the coupling of countries c i , c j , and T i , T j correspond to the turnovers of cities i and j respectively.This multiplicative form linearly transformed through logarithms is common practice for spatial interactionmodels. Note that no constraints are considered, since links include information about the turnover and a shareof ownership at the same time. Estimation results are given in Table 1. We also fit a Poisson generalized linearmodel with a logarithmic link on the rounded weights [37]. The findings show that when distance only is takeninto account (model 1), it has a significant negative effect on link creation, but with an explained variance close tozero. Model 2 adds fixed effects by pair of countries, for which around a third are significant, and with a better R of 0.17. Adding the economic size at origin and destination (turnovers T i and T , model 3) slightly improvesthe explanatory power (model 3). The best model among the linear ones is the model 4, with all factors and fixedeffects, both in terms of explained variance ( R = 0 .

31) and Akaike information criterion. Model 4 also indicatesthe absence of overfitting. Model 5, which is a Poisson model with all variables is much a better fit regardingpseudo- R and provides much more significant estimates (all fixed effects significant and much smaller standarddeviations). Across all models, consistent qualitative stylized facts regarding link creation between cities are revealed:(i) distance has a negative influence; (ii) economic sizes have a positive effect, with the size of the origin city beingmore important (consistent in the ownership links context, as strongly driven by decision at the headquarter/ownerlevel); (iii) industrial proximity has a positive influence, i.e. similarity is more important than complementarity forthis dimension; (iv) a large part of country fixed-effects are statistically significant; and (v) the order of magnitudeof the influence of each factor is roughly the same, which means that no process totally dominates the others. Thisconfirms the relevance of including all these different dimensions in the generative network model, since they have acomparable explanatory role in statistical models.Table 1: Statistical models.

Ordinary Least Square estimations of statistical models explaining log( w ij ) (standarderrors in parenthesis). ***, ** and * respectively indicate 0.01, 0.05 and 0.1 levels of significance. For the fixedeffect by origin-destination pairs of countries, the proportion of significant coefficients is attributed (where p < . R , Mean Square Error, and Akaike Information Criterion.For the Poisson model (model 5), AIC is not given as it is not comparable.Model (1) (2) (3) (4) (5)log( d ij ) -0.06** (0.03) -0.11** (0.05) -0.41*** (0.02) -0.35*** (0.04) -0.26*** (5 · − )log( T i ) 0.56*** (0.01) 0.56*** (0.01) 0.79*** (2 · − )log( T j ) 0.39*** (0.01) 0.39*** (0.01) 0.66*** (1 . · − )log( s ij ) 0.19*** (0.07) 0.53*** (9 · − )Countries 28.3% 26.5% 100% R Model exploration and calibration

The model is implemented in NetLogo, which provides a good compromise between performance and interactiveprototyping and exploration [38]. Numerical experiments are done by integrating the model into the OpenMOLEsoftware for model exploration [39], which provides sensitivity analysis and calibration methods and a transparentaccess to high-performance computing infrastructures. The model source code, the aggregated network data (the6aw initial firm network cannot be made available for data ownership reasons), and the results are available on theopen Git repository of the project at https://github.com/JusteRaimbault/ABMCitiesFirms . Large simulationresult files are available on the dataverse repository at https://doi.org/10.7910/DVN/UPX23S . We show in Fig. 3examples of generated networks, with very contrasted patterns obtained with different spatial interaction ranges.Low values of the gravity decay parameter yield local networks around main cities, while long interaction rangesyield more integrated networks. We also show networks simulated in the real setting.

Synthetic city systems

We first study extensively the behavior of the model on synthetic systems of cities, to isolate its intrinsic behaviorindependently of a contingent geographical situation.

Sensitivity analysis

A global sensitivity analysis first unveils an aggregated influence of parameters on modelindicators. We apply Saltelli’s Global Sensitivity method introduced by [40]. This produces values for each parameterand each output of a first order sensitivity index, defined as

V ar [ E X ∼ i ( Y j | X i )] /V ar [ Y j ], for parameter X i andoutput Y j , X ∼ i being all other parameters. This captures the total influence of X i all other things being equal.The total order index is given by E X ∼ i [ V ar ( Y j | X ∼ i )] /V ar [ Y j ] and captures non-homogeneity in behavior andinteraction between parameters. Indices were estimated with a Sobol sequence sampling of 20,000 model runs [41].We give in S2 Text the full estimated results for sensitivity indices. To summarize, we find that indicators related tomodularity (internationalisation and optimal modularity) are mostly influenced by geographical parameters. Onthe contrary, indicators of network structure are driven by origin and destination sizes. The influence of sectorsand historical links has a secondary but not negligible role. Altogether, the analysis witnesses of strong interactioneffects between parameters, since for several indicators the total order index is one order of magnitude larger thanthe first order index. This means that the generative model captures some complex behavior of the system. One factor sampling

A first numerical experiment of one-factor sampling on all Cobb-Douglas parameters and100 stochastic repetitions confirms a good statistical convergence (average Sharpe ratios for indicators all larger than5). We use thus 20 repetitions in following larger experiments (in the case of a normal distribution, a 95% confidenceinterval on the average is of size 2 σ · . / √ n , what gives n = 16 repetitions for a CI of width of the standarddeviation). Behavior of some indicators are shown in Fig. 4. For example, internationalisation and metropolisationboth show a transition as a function of d o , from a local to a global regime. The sector proximity parameter γ S influences the internationalization linearly, but induces a maximum for the correlation between size and degree,which can be interpreted as a setting where size and the total volume in a city are the most correlated. Grid exploration

The model behaviour is then studied with a grid sampling for parameters and 20 repetitions(the computations are run on the European Grid Infrastructure, and are equivalent to 2.5 years of CPU time).Distance decay and sector parameters are varied with 10 steps, other parameters with three. The influence of gravitydecay parameters is confirmed when plotting the internationalization index against the distance, which exhibits anexponential decay as shown in Fig. 5. It interacts with the role of sectors and the size of the origin, with a moresignificant effect of sector proximity, when size has the largest influence (third panel).Other indicators exhibit non-trivial patterns. For example, when considering average community size of the finalnetwork as shown in Fig. 6, we obtain a maximal integration in terms of communities at an intermediate value ofthe gravity decay. This can be interpreted as the emergence of a regional regime. The size of communities is largelyinfluenced by the value of the elasticities for the similarity function and the ratio of the economic output of the area.In particular, we observe a qualitative inversion of the role of γ S when introducing an effect of the origin (switching γ O from 0 to 1, first panel to second panel). The maximal community size disappears when γ O = 2, implying aregime with small local communities where large cities mostly connect with their hinterland. These experimentsreveal a complex interplay between processes and how the model produces diverse stylised facts. Impact of urban hierarchy

We run a specific experiment to study the impact of urban hierarchy on firmsdynamics. Rank-size law in cities is a central stylized fact within urban theories, despite the discussions regardingthe estimation of the power-law exponent, its actual values and its variation depending on the definition of citiesconsidered [42, 43]. In this particular context, understanding how this parameter can impact urban dynamics withinartificial systems of cities – in which it can be modified to extreme values non observable in real systems–, shouldbring evidence on underlying processes [24]. We change the initial hierarchy α from 0.5 to 2.0 with a step of 0.1,and also vary the interaction range d . The behavior of internationalisation and metropolisation indicators are7igure 3: Example of simulated networks at t = 1500, for a synthetic system of cities (Top row) and the Europeanurban network (Bottom row), with a high (resp. low) gravity decay parameter for the left column (resp. right).8igure 4: (Top Left) Internationalisation index decreases exponentially with gravity decay; (Top Right) Correlationbetween city weighted degree and size. Both plots show a transition from a local to a global regime. (Bottom Left)Internationalisation varies linearly with sector proximity γ S ; (Bottom Right) Correlation between degree and sizeexhibits a maximum, witnessing an intermediate regime where size is the most important.9igure 5: The transition as a function of interaction range depends on the influence of origin size γ O ; sector proximity γ S plays a role only for a large influence of the origin.Figure 6: Maximal integration in terms of community size is achieved at an intermediate value of d o : emergenceof a regional regime; maximal size depends on the role of sectors γ S in a decreasing way when the origin size isdeactivated, and in an increasing way when γ o = 1; this regime disappears when the influence of the origin sizebecomes too large. 10igure 7: Impact of urban hierarchy. (Left) Internationalisation as a function of the initial synthetic hierarchy α , for different values of interaction range d (color); (Right) Metropolisation as a function of α , for different valuesof interaction range d (color).shown in Fig. 7. As expected, the hierarchy of the urban system has an important impact on model outcomes.We find that more hierarchical urban systems tend to produce more international networks (internationalisationindicator is a modularity which is minimized when networks span more accross different countries), what is consistentwith the existence of globalized metropolis [44]. This effect is mitigated by lower interaction ranges. Regardingmetropolisation, we observe a maximum values as a function of initial hierarchy consistent across different interactionranges d . This means that a too high hierarchy is detrimental to the global metropolisation when considering allcities, what may be due to the fact that in such a case very large cities are capturing all flows, leaving nothing tointermediate and medium-sized cities. Interestingly, the optimal value α (cid:39) . Model calibration on the European urban network

We then apply the model to a real system of cities by calibrating it on the European ownership network. Model setupis done following the real setup described above. The number of time steps t f is left as an additional parameter,which in a sense determines the granularity of the cumulative network generation process. The objective functions forcalibration are the average mean squared error on logarithms of weights given by ε L = N (cid:80) i,j (log w ij − log ˆ w ij ) if ˆ w ij are the simulated weights, and the logarithm of the mean squared error ε M = log (cid:16) N (cid:80) i,j ( w ij − ˆ w ij ) (cid:17) . [17]showed that these two are complementary to calibrate urban systems models. Calibration is done using a geneticNSGA2 algorithm [46], which is particularly suited for such a bi-objective calibration of a simulation model.Stochasticity is taken into account by adding the estimation confidence as an additional objective, and we filterthe final population to have at least 20 samples. We use a population of 200 individuals and run the algorithm for50,000 generations, on 1000 islands in parallel.We show in Fig. 8 the calibration result, as a Pareto front of compromise points between the two objectives. Asthe mean squared error on logarithm is the same than the one for statistical models, it can be directly compared.The other objective would correspond to statistical models directly on variables: therefore, our model covers in asense a set of intermediate models which make a compromise between the two objectives, and is therefore muchmore flexible. The values for ε L are for a large proportion of points lower than for statistical models, our modeloutperforms these in this predictive sense. As shown in Fig. 8, increasing values for γ O lead to better performanceregarding ε L : increasing the importance of the origin will provide a better fit on all order of magnitudes of link11igure 8: Model calibration results.

Pareto front obtained for the bi-objective model calibration for ε L and ε M objectives. ε L can be compared to the MSE of statistical models fitted above, and the generative model outperformthese for several parametrizations (lowest MSE 5.33 for statistical models). Point size gives the number of stochasticsamples and color the value of γ O . 12eights (while ε M favors mostly the largest links given the fat-tail distribution).When considering optimal points with ε L < .

0, the adjusted parameters are: long range gravity interactionsand socio-economic interactions given by d = 5972 ± c = 113 ±

33 (the maximal c ij is 11.8); a strongerinfluence of destination than origin as γ O = 1 . ± .

10 and γ D = 2 . ± .

05; a comparable influence of sectorsas γ S = 1 . ± .

21; a fine time granularity as t f = 4951 ± γ W = 6 . ± .

11. This last point furthermore shows in particular a role of the path-dependency parameter,confirming the relevance of this complex generative model including path-dependency. The interaction of this processwith others implies an inversion of the relative importance of origin and destination (on the contrary to statisticalmodels). The fact that calibrated interaction distances are large is consistent with long range economic interactionswithin Europe. Besides these stylized facts, these calibration results altogether demonstrate the feasibility of modelapplication to real data.

Application: impact of economic shock

We finally design scenarios to evaluate the impact of economic shocks. These can be the consequence of diversefactors, including socio-economic, geopolitical factors (example of Brexit), or other unpredicted human disasters suchas the COVID-19 pandemic. We simulate border closures by rescaling c – the range parameter for socio-economicinteractions –, and intra-country lock-downs by rescaling d – the range parameter for direct interactions – during thesimulation, with other parameters being fixed to their default values and the model setup in the real configuration.More precisely, with t f = 5000, the model is run for t f / κ C ∈ [0; 1]and κ G ∈ [0; 1] defining the shock, model parameters are updated as d = κ G · d and c = κ C · c . The model isthen run for the remaining time steps until t f . We study the temporal trajectory of indicators to see the impactof the shock in time. The experiment is run for ( κ G , κ C ) ranging between 0.2 (strong restrictions) to 1 (referencetrajectory), with 100 model repetitions.Results are shown in Fig. 9. We find an important impact of the shock on internationalisation, but less onthe average community size. The impact of changing d is more important than c , but a combination of bothyield the stronger effect. For minimal values of κ G and κ C , internationalisation is more than doubled, confirminga reconfiguration of urban networks within countries. We validate the statistical significance of effects observedin Fig. 9 by comparing distributions with Kolmogorov-Smirnov (KS) tests. More precisely, we proceed for eachindicator and value of ( κ G , κ C ) to a two-sided KS test between the statistical distribution of the indicator overrepetitions at final time with the distribution in the reference configuration. For internationalisation, p-values of thetest are all smaller than 0.002 for κ D < . .

15 for κ D ≥ .

6. This means that the shock has astatistically significant impact only when interaction range is more than halved. For community size, significance(p-value smaller than 0.01) is reached only for the smallest value κ D = 0 .

2. Thus, the trajectory of the urban networkis rather resilient to moderate shocks. This application shows how the model could be applied to practical policyissues.

Discussion

This paper aimed at presenting a generative model for urban networks defined by interactions between firms basedon synthetic data, simulated via the OpenMole platform and calibrated on real data on ownership linkages of firmsfor Europe from the Amadeus dataset. The simulation on a synthetic system of cities provides a broad knowledge onmodel behavior in its parameter space. Even in such a simple model (close to directly tractable stationary state) thebehaviour is highly non-linear in many dimensions. Our study shows how crucial model exploration is to overcomehidden parameters (deactivated mechanisms or default parameter values). This paper emphasis on the fact thatexploration of intrinsic dynamics on synthetic data is of great importance and should be driven before an applicationon real data so as to put aside the geographical effects from model dynamics.The calibration of model suggests its potential application to policy issues and can therefore have practicalapplications in the future. For example the effect of exogenous shocks in the socio-economic structure can be testedas we did in the stylized application example above. Such shocks can for example be the upcoming impact of Brexiton the European market or the consequence of a long-term closure of the European Union members’ borders due tofurther pandemic outbreaks on the single market. In this sense the UK economic sectors mostly dependent on foreigncapital and generating the highest turnovers in 2018, that would be the most affected and worth examining in detailsare: extraction of crude petroleum, manufacture of tobacco products and of basic pharmaceutical goods (in terms ofout-going links from the UK) ; activities of head offices and holding companies, sale of cars and manufacture of13igure 9:

Application of the calibrated model to an economic shock scenario.

All plots show the trajectoryof indicators (internationalisation for the top row and average community size for the bottom row) in time, fordifferent values of interaction decay rescaling κ G (color) and socio-economic distance rescaling κ C (panels). Thevertical dotted red line shows the moment the economic shock is introduced and d (respectively c ) is set to κ G · d (resp. κ C · c ). 14otor vehicles (in terms of in-going links to the UK). We showed in our experiment that although the system isresilient to moderate shocks, strong restrictions have an important impact on internationalization and thus can beexpected to have a detrimental effect on UK economy due to these foreign ownership links (see also S1 Text).Several possible extensions of the current work can be undertaken in the future as a co-evolution model with anevolution of city sizes or a model with firm agents leading to a multi-scale agent-based model. A more precise studyof the role of path dependency can also be undertaken in order to understand the creation of the future links. Otherformulations of the model might be taken into consideration as other formulation of the combination of factors ormulti-objective optimisation depending on sectors using Pareto fronts. Acknowledgments

References

1. Castells M. The Rise of the Network Society. Blackwell Publishers, Cambridge MA. 2000;.2. Taylor PJ. Specification of the world city network. Geographical analysis. 2001;33(2):181–194.3. Sassen S. The Global City: New York, London, Tokyo. Princeton University Press, New Jersey. 1991; p. 480.4. Martinus K, Sigler TJ. Global city clusters: theorizing spatial and non-spatial proximity in inter-urban firmnetworks. Regional studies. 2018;52(8):1041–1052.5. Derudder B, Taylor PJ. Central flow theory: Comparative connectivities in the world-city network. RegionalStudies. 2018;52(8):1029–1040.6. Turkina E, Van Assche A, Kali R. Structure and evolution of global cluster networks: evidence from theaerospace industry. Journal of Economic Geography. 2016;16(6):1211–1234.7. Clark J, Harrison J, Miguelez E. Connecting cities, revitalizing regions: the centrality of cities to regionaldevelopment. Regional Studies. 2018;52(8):1025–1028.8. Dawley S, Mackinnon D, Pollock R. Creating strategic couplings in global production networks: regionalinstitutions and lead firm investment in the Humber region, UK. Journal of Economic Geography. 2019;.9. Gl¨uckler J, Panitz R. Relational upgrading in global value networks. Journal of Economic Geography.2016;16(6):1161–1185.10. Martinus K, Sigler T, Iacopini I, Derudder B. The brokerage role of small states and territories in globalcorporate networks. Growth and Change. 2019;.11. Storme T, Derudder B, D¨orry S. Introducing cluster heatmaps to explore city/firm interactions in world cities.Computers, Environment and Urban Systems. 2019;76:57–68.12. Dai L, Derudder B, Liu X, Ye L, Duan X. Simulating infrastructure networks in the Yangtze River Delta(China) using generative urban network models. Belgeo Revue belge de g´eographie. 2016;(2).13. Garlaschelli D, Loffredo MI. Structure and evolution of the world trade network. Physica A: StatisticalMechanics and its Applications. 2005;355(1):138–144.14. Jin Yx, Wilson A. Generation of integrated multispatial input-output models of cities (GIMIMoC) I: Initialstage. Papers in Regional Science. 1993;72(4):351–367.15. Dennett A, Wilson A. A multilevel spatial interaction modelling framework for estimating interregionalmigration in Europe. Environment and Planning A. 2013;45(6):1491–1507.156. Dai L, Derudder B, Liu X. Generative network models for simulating urban networks, the case of inter-citytransport network in Southeast Asia. Cybergeo: European Journal of Geography. 2016;.17. Raimbault J. Indirect evidence of network effects in a system of cities. Environment and Planning B: UrbanAnalytics and City Science. 2020;47(1):138–155.18. Berry BJ. Cities as systems within systems of cities. Papers in regional science. 1964;13(1):146–163.19. Pumain D. An Evolutionary Theory of Urban Systems. In: International and Transnational Perspectives onUrban Systems. Springer; 2018. p. 3–18.20. Zdanowska N. Exploring cities of Central and Eastern Europe within transnational company networks: thecore-periphery effect. arXiv e-prints. 2019;.21. BVD. Bureau Van Dijk Amadeus database: extraction for Europe. University College London Library ServiceDatabases. 2018;.22. Cottineau C, Arcaute E. The nested structure of urban business clusters. Applied Network Science. 2020;5(1):1–20.23. Vˆılcu GE. A geometric perspective on the generalized Cobb–Douglas production functions. Applied Mathe-matics Letters. 2011;24(5):777–783.24. Raimbault J, Cottineau C, Le Texier M, Le Nechet F, Reuillon R. Space Matters: Extending SensitivityAnalysis to Initial Spatial Conditions in Geosimulation Models. Journal of Artificial Societies and SocialSimulation. 2019;22(4).25. Simini F, James C. Testing Heaps law for cities using administrative and gridded population data sets. EPJData Science. 2019;8(1):1–13.26. Hidalgo CA, Klinger B, Barab´asi AL, Hausmann R. The product space conditions the development of nations.Science. 2007;317(5837):482–487.27. Florczyk AJ, Corbane C, Schiavina M, Pesaresi M, Maffenini L, Melchiorri M, et al. GHS Urban CentreDatabase 2015, multitemporal and multidimensional attributes, R2019A. European Commission, JointResearch Centre (JRC); 2019.28. EUROSTAT. NACE rev.2 Statistical classification of economic activities in the European community. 2019; p.369p.29. Clauset A, Shalizi CR, Newman ME. Power-law distributions in empirical data. SIAM review. 2009;51(4):661–703.30. Gillespie CS. Fitting Heavy Tailed Distributions: The poweRlaw Package. Journal of Statistical Software.2015;64(2):1–16.31. Nicosia V, Mangioni G, Carchiolo V, Malgeri M. Extending the definition of modularity to directed graphs withoverlapping communities. Journal of Statistical Mechanics: Theory and Experiment. 2009;2009(03):P03024.32. Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Physical review E.2004;70(6):066111.33. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journalof statistical mechanics: theory and experiment. 2008;2008(10):P10008.34. Decoville A, Durand F. Exploring cross-border integration in Europe: How do populations cross borders andperceive their neighbours? European Urban and Regional Studies. 2016;26(2):134–157.35. Zdanowska N. Distribution of Foreign Direct Investment Across the National Urban Systems in Countries ofCentral and Eastern Europe in 2013. Geographia Polonica. 2016;90(2):5–24.36. Wilson AG. Some new forms of spatial interaction model: A review. Transportation Research. 1975;9(2-3):167–179. 167. Flowerdew R, Lovett A. Fitting constrained Poisson regression models to interurban migration flows.Geographical Analysis. 1988;20(4):297–307.38. Railsback S, Ayll´on D, Berger U, Grimm V, Lytinen S, Sheppard C, et al. Improving execution speed ofmodels implemented in NetLogo. Journal of Artificial Societies and Social Simulation. 2017;20(1).39. Reuillon R, Leclaire M, Rey-Coyrehourcq S. OpenMOLE, a workflow engine specifically tailored for thedistributed exploration of simulation models. Future Generation Computer Systems. 2013;29(8):1981–1990.40. Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, et al. Global sensitivity analysis: theprimer. John Wiley & Sons; 2008.41. Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S. Variance based sensitivity analysisof model output. Design and estimator for the total sensitivity index. Computer Physics Communications.2010;181(2):259–270.42. Cottineau C. MetaZipf. A dynamic meta-analysis of city size distributions. PloS one. 2017;12(8):e0183919.43. Corral ´A, Udina F, Arcaute E. Truncated lognormal distributions and scaling in the size of naturally definedpopulation clusters. Physical Review E. 2020;101(4):042312.44. Sassen S. The global city. New York. 1991;.45. Raimbault J, Denis E, Pumain D. Empowering Urban Governance through Urban Science: Multi-ScaleDynamics of Urban Systems Worldwide. Sustainability. 2020;12(15):5954. doi:10.3390/su12155954.46. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II.IEEE transactions on evolutionary computation. 2002;6(2):182–197.

Supporting information: S1 Text

UK’s economic sectors dependent on international links.

Summary of links between UK and the rest of theEU, by economic sector.Based on

AMADEUS dataset, the most vulnerable economic sectors in the United Kingdom have been highlighted.The later are the ones due to be mostly affected by eventual future lock-downs of the international economy(international ownership links).When considering the UK sectors generating the highest turnover in 2018 mostly dependent on foreign capitalthey own (out-going links from the UK - see below Table 2) these are extraction of crude petroleum, manufactureof tobacco products and of basic pharmaceutical goods. In terms of in-going links to the UK (see below Table 3),activities of head offices and holding companies, sale of cars and manufacture of motor vehicles are sectors mostlydependent on foreign ownership coming from abroad.Table 2: Top 10 UK industries depending on foreign ownership expressed in total turnover (out-going links from theUK)Classification NACE Industries Total turnover1 Extraction of crude petroleum 96,458,258,6422 Manufacture of tobacco products 5,477,272,4643 Manufacture of basic pharmaceutical products 4,825,881,9304 Retail sale in non-specialised stores with food, beverages or tobacco predominating 3,681,373,3745 Wireless telecommunications activities 3,379,297,2186 Event catering activities 2,871,593,7547 Other mining and quarrying n.e.c. 1,590,548,8738 Business and other management consultancy activities 1,394,373,0149 Activities of head offices 1,322,692,36510 Manufacture of air and spacecraft and related machinery 1,292,065,497When expressed in total number of links (Table 4 and Table 5), activities mostly dependent on out-going linksfrom the UK are activities of holding companies, of head offices and fund management activities. In terms of in-goinglinks these are business and consultancy activities and renting and operating of own and leased real estate.17able 3: Top 10 UK industries depending on foreign ownership expressed in total turnover (in-going links to the UK)Classification NACE Industries Total turnover1 Activities of head offices 358,099,8702 Activities of holding companies 259,008,1233 Other business support service activities n.e.c. 183,098,1704 Security and commodity contracts brokerage 72,482,1675 Manufacture of motor vehicles 65,162,7576 Sale of cars and light motor vehicles 61,019,0267 Other financial service activities, except insurance and pension funding 47,003,9378 Other personal service activities n.e.c. 37,926,6689 Wholesale of solid, liquid and gaseous fuels and related products 36,693,36210 Other information technology and computer service activities 35,668,779Table 4: Top 10 UK industries depending on foreign ownership expressed in total links (out-going links from the UK)Classification NACE Industries Total links1 Activities of holding companies 6862 Other business support service activities n.e.c. 5673 Activities of head offices 5034 Fund management activities 4075 Business and other management consultancy activities 3986 Extraction of crude petroleum 3667 Other activities auxiliary to financial services, except insurance and pension funding 2408 Development of building projects 2219 Manufacture of basic pharmaceutical products 21210 Other information technology and computer service activities 181Table 5: Top 10 UK industries depending on foreign ownership expressed in total links (in-going links to the UK)Classification NACE Industries Total links1 Other business support service activities n.e.c. 16,1402 Renting and operating of own or leased real estate 6,3203 Business and other management consultancy activities 6,0944 Activities of head offices 5,6285 Computer consultancy activities 5,0476 Activities of holding companies 4,7287 Other personal service activities n.e.c. 3,9968 Development of building projects 3,9469 Other information technology and computer service activities 3,75710 Other financial service activities, except insurance and pension funding 3,321

Supporting information: S2 Text

Model setup and sensitivity analysis.

Details on the simulation model: summary statistics for the socio-economic distance matrix, procedure for the synthetic model setup and detailed Global Sensitivity Analysis.

Socio-economic distance matrix

In the real model setup, socio-economic distance between countries c ij is constructed using fixed effects coefficientsfrom the statistical model. In Table 1 in main text, we estimated several statistical models, including some withfixed effects by pairs of countries. The socio-cultural distance between two countries is then taken as the opposite ofthe fixed effect coefficient for this pair, for the full statistical model. Summary statistics for the 29x29 matrix (28EU countries and Jersey which is in the database a distinct country from UK but has an important total turnover18ince many British companies are located there for fiscal advantage reasons) are shown below in Table 6. We observethat distances are rather localized but with large outliers, and an important proportion are not defined (339 out of841), corresponding to couple of countries having no exchange at all in the dataset.Table 6: Summary statistics of the socio-cultural distance estimated with a fixed effects model.

Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s-2.327 1.440 2.451 2.538 3.447 11.797 339

Synthetic model setup

The procedure for the synthetic sector setup is the following: • Sectors distributions follow log-normal densities with most mass in [0; 1] • Large cities are more innovative and more diverse. Assuming that sectors are ordered by innovativity (thelarger the sector index the larger the innovativity), this assumption is translated by taking a mode and varianceof 0.5 for the largest city and a mode and variance of 1 /K for the smallest, and a linear interpolation betweenthe two. For each city, we definelog( e i ) = (log( E i ) − log( E imin ))(log( E imax ) − log( E imin )) ∗ (1 / − /K ) + 1 /K • Log-normal parameters ( µ i , σ i ) for each city are then fixed by µ i − σ i = log( e i ) and − σ i − σ i ) −

1) = log ( e i ) • σ i is the unique positive root of f ( X ) = 0 with f ( X ) = − X − X ) − − log( e i ) and µ i = log( e i ) + σ i . Sensitivity analysis

The table 7 gives the full results for the Global Sensitivity Analysis, for all model indicators and free parameters.We give the first order indices and the total indices.Table 7: Saltelli sensitivity indices, for indicators in rows and parameters in columns. We give for each pair the firstorder index (F) and the total order index (T). γ G γ D γ S γ W γ O γ D F T F T F T F T F T F TInternationalisation 0.2 0.3 0.7 0.7 0.001 0.009 4 · − · −4