A diffusion-based approach to obtaining the borders of urban areas
Cesar Henrique Comin, Filipi Nascimento Silva, Luciano da Fontoura Costa
AA diffusion-based approach to obtaining the borders of urban areas
Cesar Henrique Comin , ∗ Filipi Nascimento Silva , and Luciano da Fontoura Costa Institute of Physics at S˜ao Carlos, University of S˜ao Paulo, S˜ao Carlos, S˜ao Paulo, Brazil
The access to an ever increasing amount of information in the modern world gave rise to thedevelopment of many quantitative indicators about urban regions in the globe. Therefore, thereis a growing need for a precise definition of how to delimit urban regions, so as to allow properrespective characterization and modeling. Here we present a straightforward methodology to auto-matically detect urban region borders around a single seed point. The method is based on a diffusionprocess having street crossings and terminations as source points. We exemplify the potential ofthe methodology by characterizing the geometry and topology of 21 urban regions obtained from8 distinct countries. The geometry is studied by employing the lacunarity measurement, which isassociated to the regularity of holes contained in a pattern. The topology is analyzed by associatingthe betweenness centrality of the streets with their respective class, such as motorway or residential,obtained from a database.
I. INTRODUCTION
The characterization and modeling of urban areas [1]stands as an important aspect of improving, in an ef-ficient manner, their infrastructure, transportation andgrowth planning. Among many distinct properties of ur-ban areas that can be studied, universal scaling laws ofurban area sizes and population have received particu-lar interest in the literature [2, 3]. Special attention isgiven to the long standing problem of the validity andexplanation of Zipf’s law [4–6], and its conciliation withGibrat’s law of proportionate growth [5, 7, 8]. Strikingly,most studies about urban areas are not concerned withthe proper definition of their borders, a problem whichis related to the very definition of an urban area.It is indeed a difficult task to provide a general defini-tion of urban area borders [2, 5, 9]. A common approachin network theory is to use the main administrative regionof the cities constituting the urban area of interest, whichare compiled in great detail in the Global AdministrativeAreas dataset [10]. However, administrative regions havea number of problems. First, in many cases their delimi-tation does not have a clear motivation [11, 12]. Second,administrative areas can have different meanings for dis-tinct countries, in the sense that the territories may bedivided at different scales (e.g. province, state, countyor municipality) for administration purposes. Third, inmany cases they are unrelated to the population density,that is, cities with low population tend to have admin-istrative regions composed by a small inhabited centerand a large, mostly uninhabited, area. Such uninhabitedareas commonly contain a sparse distribution of infras-tructure, which is a potential source of bias when charac-terizing the regions. In Figure 1 we show an example ofsuch a problem. The official administrative region of thecity, shown in Figure 1(a), is much larger than the actualinhabited region of the city. Using official subdivisions ofsuch an administrative area (shown in Figure 1(b)) is also ∗ Email: [email protected]
FIG. 1. Example of administrative regions visualized on themap of a city. The administrative regions at the (a) city and(b) district levels are shown in red. The main inhabited regionof the city can be seen at the lower part of the map. a source of problems, since the city can become dividedinto regions that should be considered as one.Other definitions can be used to delimit urban ar-eas [13, 14]. An overview of the methodologies adoptedby a number of countries can be found in [15]. Unfortu-nately, such methodologies are usually aimed at captur-ing specific characteristics observed in cities belonging tocountries where the criteria was developed. Besides, theyusually rely on manual interventions to set the main re-gions of interest or to avoid outlier cases. Furthermore,since the majority of the methods are based on popula-tion density, they are influenced by the census method-ology adopted at each region. For example, a city mightbe divided into large districts for census data organiza-tion purposes, and a single value of population density beassociated to each district. The respective border foundusing such data will be influenced by the district borders.In this work we present a general procedure that can a r X i v : . [ phy s i c s . s o c - ph ] J u l be applied to any urban region, regardless of the actualdefinition of border used for each specific country. Theprocedure is based on the structure of the urban areastreets, which composes the majority of the transporta-tion network of such systems. The use of an urban areastreet configuration has a clear motivation, since the den-sity of streets in an urban area is known to be correlatedwith many other indicators, the most often consideredones being urbanization and population density [16–18].Some methodologies to define borders using streets havebeen described in the literature [6, 19, 20]. Our approachdiffers from the other studies by considering a distincttype of clustering process for the street crossings. Also,the three parameters employed in the methodology canbe adjusted to follow distinct qualitative and quantitativerules setting the criteria for urban area borders.Two applications of the methodology are also provided.We obtained 21 urban regions from 8 countries and an-alyzed their street patterns geometrically and topolog-ically. The geometry of the urban areas is analyzedthrough the lacunarity measurement [21, 22], a tradi-tional approach to characterize fractal structures. Theedge betweenness centrality [23] is used to infer the ex-pected traffic flow in the topology of each street network. II. DEFINING THE BORDERS OF URBANAREAS
The first step of the method is to define a referencepoint inside the urban area, for geolocation purposes. Itis best, but not strictly necessary, that this point be closeto the center of the main city of the urban area. Thecity centers required for this step can be obtained fromthe Wikipedia [24] page for each urban region being ana-lyzed. Alternatively, one can obtain the centers from theGeoNames dataset [25]. All street intersections or end-ings that are in a radius R from the reference point areretrieved. In Figure 2(a) we show an example of the ini-tial streets one might obtain in this step. The urban areais then described by a graph, where street intersectionsand terminations are represented by nodes and two nodesare connected by an edge whenever there is a street be-tween them. Next, a probability density, P , is estimatedby propagating street intersections and endings inside theregion. This propagation is done by considering the inter-sections and endings as sources of a diffusion process tak-ing place inside the initial region. This diffusion schemeallows to estimate what we call the structural density ofthe city. Therefore, larger probabilities are associatedwith city regions having more street terminations andintersections. Computationally, this physical process isequivalent to applying a kernel density estimation [26]to the initial set of street intersections and terminations.For such a task, we use a Gaussian kernel with standarddeviation σ . The density map for the streets displayedin Figure 2(a) is shown in Figure 2(b). We note that pa-rameter σ is related to the time allowed for the diffusion (a) (b)(c) (d) FIG. 2. Steps of the proposed methodology to identify ur-ban area borders. (a) Streets retrieved in a circle of radius R around a seed point. (b) Probability density distribution ofstreet crossings and terminations. (c) Candidate regions ob-tained after applying a threshold to the probability density.The skeleton of the region is also shown. Each skeleton pointcontains the distance to the closest border point. (d) Urbanregion obtained after applying a threshold to the skeleton,and retrieving the region with the largest area. to take place. This parameter sets the scale at which weconsider that nearby streets belong to the same urbanarea.Candidate urban regions are then defined by applyinga threshold, T , to the structural density and selectingregions where the density is larger than T . The thresh-old is defined as a fraction of the maximum value of thestructural density, that is, T = f P max . Then, the skele-tons [27], or medial axis, of the candidate regions arecalculated. The shortest distance between each skeletonpoint, s , and the candidate region border is stored as D ( s ). In Figure 2(c) we show in white the regions foundafter applying the threshold to the density map, togetherwith the distances D ( s ) associated to each skeleton point.The skeleton is used as a means to separate nearby citiesfrom the main urban area. If a city is close to the re-gion we are interested in, but there are few streets be-tween the city and the region, D ( s ) will have a smallvalue. On the other hand, if the nearby city is so closeto the region of interest that their streets do not presenta clear separation, we consider that its street networkbelongs to the main urban area. Therefore, a skeletonpruning procedure is done by applying a morphologicaldilation [27] to the obtained skeletons using a disk withradius D ( s ) as a structuring element, but only skeleton FIG. 3. The influence of the diffusion time scale σ on thepredicted structural density of a city. Each panel shows theresulting density when setting (a) σ = 0 .
19 km, (b) σ = 0 . σ = 1 .
11 km, (d) σ = 1 .
86 km. points s where D ( s ) > M w are dilated. The parameter M w sets the minimum width of the desired urban regionat the interface between different cities. More than oneregion might be created by the previous step, the regionwith the largest physical area becomes the relevant urbanarea. The resulting region for our illustrative example isshown in Figure 2(d). We note that a hard boundary canbe added to the methodology in order to separate urbanregions belonging to distinct countries.The parameters σ , f and M w should be the same forall considered urban areas. This is because their valueprovide the actual definition of an urban area. That is,we consider that urban areas should be formed by streetsthat are, roughly, closer than σ , have a density of inter-sections larger than f and do not form patches thinnerthan M w . The main parameter of the method is the diffu-sion time σ , which sets the scale of the structural density.In Figure 3 we show the structural density of the candi-date region displayed in Figure 2(a) for distinct values of σ . When σ is small, the diffusion dynamics only reachesthe immediate neighborhood of street intersections andterminations, as shown in Figure 3(a). Such scale can beuseful for city planning purposes, as the structural den-sity can be related to the expected traffic flow at eachrespective region, due to its relationship with the densityof street crossings. At intermediate scales such as thoseshown in Figures 3(b) and (c), the structural density canbe used to define subdivisions of a city for administrationor characterization purposes. Nevertheless, our interest lies on a sufficiently long time of the diffusion process,where the influence of structural variations inside the citybecomes negligible, as shown in Figure 3(d).In order to illustrate the potential of using the struc-tural density to establish urban area borders, we comparethe borders defined according to administrative regionswith those obtained from the presented methodology. InFigure 4 we show maps of the street networks of threecities, and the respective administrative regions. We alsoshow the urban borders obtained by setting different val-ues for the parameters of the methodology. The distinctborders shown in each map correspond to different re-spective values of the parameter f , which are indicatedin the legend of Figure 4(a). Parameter σ was also varied,the considered values being indicated above each figure.Parameter M w was kept fixed at M w = 100 m for allcases.Regarding the city of S˜ao Carlos, since its streets areconcentrated in a specific region, with few streets out-side it, the parameters of the method have little impacton the obtained border, as can be seen in Figures 4(a),(b) and (c). It is also clear that the obtained bordersare more intuitive than the administrative region definedfor the city, which is much larger than the urban partof this city. For the city of Luton, we observe in Fig-ure 4(d) that setting a small σ and large f provides aborder that seems too small to accommodate the city’sstreet network, while larger values of these parametersgives a more intuitive border for the urban area of thecity, as shown in Figures 4(e) and (f). A particularly in-teresting result is obtained for the city of Mulhouse. Theborders shown in Figure 4(g) indicate that, by setting asufficiently small value of σ , parameter f can be changedto define different core regions of the street network ofthe city. That is, large values of f define core regionshaving a disproportionately high density of streets. Thissituation is not observed when setting larger values of σ ,as shown in Figures 4(h) and (i).In the following, we provide examples of studies thatcan be applied to urban regions identified by the method-ology. We consider 21 urban regions belonging to 8different countries, obtained using the OpenStreetMapOverpass API [28]. The name and location of each re-gion is presented in Table S1 of the supplementary ma-terial. The radius used for data retrieval was r = 15km, while the standard deviation of the Gaussian kernelwas σ = 2 . f = 0 . M w = 100 m. We note that, forcomparison purposes, we consider urban regions having asimilar population, roughly in the range [1 × , × ]. III. GEOMETRY CHARACTERIZATION
In this section we characterize the geometry of theurban regions by calculating their lacunarity [21, 22].Usually attributed to Mandelbrot [29], the lacunarity
FIG. 4. Borders identified by the proposed methodology, compared with administrative regions defined for three cities. Twoparameters of the methodology were varied. The values used for parameter f are indicated in panel (a). Parameter σ was setto the values indicated above each figure. The considered cities are S˜ao Carlos (Figures (a), (b) and (c)), Luton (Figures (d),(e) and (f)) and Mulhouse (Figures (g), (h) and (i)). has traditionally been used to characterize fractal struc-tures [22, 30]. Nevertheless, the measurement has farreaching applications in different fields [31–33]. The lacu-narity is commonly used to measure the spatial regular-ity of holes in objects represented in a binary image. Forexample, one can define a matrix I containing the repre-sentation of a regular grid as an image, that is, I ij = 1if a line of the grid passes through the point ( i, j ), and I ij = 0 otherwise. This image can then be used as inputto the lacunarity calculation. Since a regular grid can beregarded as a structure containing holes with exactly thesame size, its lacunarity is close to the minimum value of the measurement, which is defined in the range [1 , ∞ ].A finite grid never reaches the minimum value due toborder effects on the lacunarity calculation. This influ-ence is lessened by applying a self-referred version of thelacunarity [34], which we use in the following.The lacunarity is a multiscale property, where the scaleis set by the radius r of the neighborhood used to cal-culate hole size variations around each image pixel. Inthe case of urban regions, this means that we can lookfor regularity differences from the city block up to theentire city scale. For each urban region, we define itsrespective binary image as a drawing of the streets be- Radius (km) L ac un a r i t y Amiens
Radius (km) L ac un a r i t y LafayetteHalleLe MansVictoriaPosadasSão CarlosHalifax
Springfield
Santiago del EsteroMulhouseFormosaDarwinPlymouthWollongongSaskatoonNorthamptonStockton UberabaLutonCascavel r = 0.25km FIG. 5. Lacunarity of urban regions for different consideredscales. The vertical dotted line represents the radius at whichwe rank the regions according to their lacunarity, which arethen displayed in Figure 6. longing to the region. The lacunarity is then calculatedover the set of obtained images. In Figure 5 we show thelacunarity of the urban regions as a function of the ra-dius used in the calculation. It is clear that at very smallscales ( r <
100 meters), the urban regions have similarlacunarity, which indicates similar regularity. This hap-pens because, at this scale, we are only considering thestructure of the roads. Around r ≈
200 meters there isalready a defined regularity ranking between the regions.This is a scale that spans a few city blocks. For a verylarge scale ( r ≈ r ≈
250 meters seems to represent agood balance between statistical significance and absenceof border effects, we consider the regularity hierarchy ob-tained at this scale. In Figure 5 we indicate by a verticaldotted line the value r = 250 meters. In Figure 6 we showeach urban region ordered according to increasing val-ues of lacunarity. The obtained ordering seems to reflectthe notion that some cities are structured in a grid-likefashion [35], thus displaying low lacunarity values, whileothers have a seemingly disordered structure, having cityblocks of many different sizes. IV. TOPOLOGY CHARACTERIZATION
Another aspect that can be studied about the ur-ban regions obtained with the presented methodology isthe topological properties of their street networks. TheOpenStreetMaps dataset contains information about thestreets hierarchy, in the sense of expected traffic flow orprojected traffic speed. Therefore, it is interesting to an-alyze how the edge betweenness [23] values of the streetsare associated with their respective hierarchy. For sucha task, we calculate the edge betweenness for three mainclasses of streets.We intuitively expect that the “main” streets, thatis, streets planned to sustain larger traffic flows andspeeds, will present larger betweenness values. In Fig-ure 7 we show the edge betwenness distribution for six ur-ban regions. The OpenStreetMaps tags associated to thestreets, as well as the respective three-class division con-sidered are shown in Figure 7(c). The histograms werenormalized individually for each street class, since thenumber of streets in classes B and C is much larger thanin class A. We note that the results shown in the figureare consistent for the other urban regions obtained usingthe methodology. It is clear that main streets, which areassociated to class A, do indeed tend to have larger edgebetweenness.The results also indicate that streets having low edgebetweenness values do not necessarily belong to classes Bor C. This is mostly caused by main streets having exitsor intersections with B and C-class streets. The topolog-ical representation of intersections as nodes do not con-sider the preferential traffic flow associated with the mainstreets, and therefore the role of some main streets getsoverlooked in the topological analysis. Nevertheless, it isinteresting that many main streets do have large betwen-ness. This means that this measurement can be used toidentify main streets when the metadata do not containinformation about the streets hierarchy.
V. CONCLUSION
The very definition of what constitutes a city or urbanregion is strongly associated to its border demarcation.Therefore, it is expected that studies trying to deriveuniversal laws for cities should give special attention tothe actual area associated to the city. We believe thatthis stands true specially for studies in network theory,which commonly use administrative areas to delimit thenetworks.We presented a simple approach to detect urban areaborders based on an important structural aspect of cities,their road transportation network. The main advantageof the method is in its simplicity and intuitive behaviorwhen changing each of its parameters. This is becauseeach parameter involves a precise rule defining the re-quested border. The standard deviation of the Gaussiankernel, σ , sets the maximum typical distance between FIG. 6. Visualization of the urban regions structures, which are ordered according to increasing values of lacunarity.
Motorway AStreet tag Associated classMotorway link ATrunk ATrunk link APrimary BPrimary link BSecondary BSecondary link BTertiary BTertiary link BUnclassified CResidential CLiving street C
FIG. 7. Comparison of the edge betweenness and planned hierarchy of streets. The plots show the edge betweenness histogramscalculated for six urban regions of the dataset. The values are separated into three classes, and their frequency is normalizedindividually for each class. The OpenStreetMaps tags and their respective associated class are indicated in panel (c). two sets of streets. The probability density fraction, f ,used to threshold the density map of street intersections,establish the minimum density of street crossings. Theparameter can be related to the admissible complexityof the streets configuration inside the urban area. Theminimum width of the urban region, M w , eliminates thinpatches of streets that provide communication betweentwo distinct urban regions.Two possible applications allowed by the proposedmethodology were discussed. The lacunarity measure-ment provided an interesting ranking of the geometricalregularity of the considered cities. It remains as an inter-esting study to identify the difference in lacunarity be-tween planned cities and unplanned ones. Also, it wouldbe interesting to verify the influence of the regions’ relief(e.g. mountains, lakes and rivers) or urban architecture(e.g. parks, open spaces and large buildings) on the lacu-narity. The topological analysis showed that the edge be-tweenness does indeed correlates with street importance,even when one considers a simple network construction defined by street crossings and terminations. Therefore,the betweenness could be used, for example, as a plan-ning tool for constructing new streets.As mentioned before, the presented methodology canbe applied to any urban region in the world, providedthere is data for such. It remains as a challenge toapply it in a world-wide scale in order to, amongother interesting studies, search for classes of lacu-narity, measure the topological regularity and test thelegitimacy of Zipf’s and Gibrat’s laws for the urban areas. ACKNOWLEDGMENTS
CHC thanks FAPESP (Grant no. 11/22639-8) forfinancial support. FNS acknowledges CAPES. LdFCthanks CNPq (Grant no. 307333/2013-2), NAP-PRP-USP and FAPESP (Grant no. 11/50761-2) for support. [1] We note that, in this study, the terms urban area and metropolitan area are considered interchangeable. Theymay have strong distinctions for some countries.[2] Xavier Gabaix and Yannis M Ioannides. The evolutionof city size distributions.
Handbook of regional and urbaneconomics , 4:2341–2378, 2004. [3] Alexander I Saichev, Yannick Malevergne, et al.
Theoryof Zipf’s law and beyond , volume 632. Springer Science& Business Media, 2009.[4] George Kingsley Zipf.
Human behavior and the principleof least effort.
Addison-Wesley Press, 1949.[5] Xavier Gabaix. Zipf’s law for cities: an explanation.
Q.J. Econ. , pages 739–767, 1999. [6] Bin Jiang and Tao Jia. Zipf’s law for all the naturalcities in the united states: a geospatial perspective.
Int.J. Geogr. Inf. Sci. , 25(8):1269–1281, 2011.[7] Jan Eeckhout. Gibrat’s law for (all) cities.
Am. Econ.Rev. , pages 1429–1451, 2004.[8] Moshe Levy. Gibrat’s law for (all) cities: Comment.
Am.Econ. Rev. , 99(4):1672–1675, 2009.[9] Paul R Krugman and Paul Krugman.
The self-organizingeconomy
J. Regional Sci. ,49(2):243–261, 2009.[12] Alberto Alesina, Reza Baqir, and Caroline Hoxby. Politi-cal jurisdictions in heterogeneous communities. Technicalreport, National Bureau of Economic Research, 2000.[13] Thomas J Holmes and Sanghoon Lee. Cities as six-by-six-mile squares: Zipf’s law? In
Agglomeration Economics ,pages 105–131. University of Chicago Press, 2010.[14] Hern´an D Rozenfeld, Diego Rybski, Jos´e S Andrade,Michael Batty, H Eugene Stanley, and Hern´an A Makse.Laws of population growth.
Proc. Natl. Acad. Sci. ,105(48):18702–18707, 2008.[15] United Nations.
Demographic Yearbook , volume Table 6.United Nations, 2013.[16] Hern´an A Makse, Shlomo Havlin, and H EugeneStanley. Modelling urban growth patterns.
Nature ,377(6550):608–612, 1995.[17] Hern´an D Rozenfeld, Diego Rybski, Xavier Gabaix, andHern´an A Makse. The area and population of cities: Newinsights from a different perspective on cities.
Am. Econ.Rev. , 101:2205–2225, 2009.[18] Roberto Murcio, Antonio Sosa-Herrera, and SuemiRodriguez-Romo. Second-order metropolitan urbanphase transitions.
Chaos, Solitons & Fractals , 48:22–31,2013.[19] A. Paolo Masucci, Elsa Arcaute, Erez Hatna, KirilStanilov, and Michael Batty. On the problem of bound-aries and scaling for urban street networks.
J. R. Soc.Interface , 12:20150763, 2015.[20] Bin Jiang, Junjun Yin, and Qingling Liu. Zipfs law forall the natural cities around the world.
Int. J. Geogr. Inf.Sci. , (ahead-of-print):1–25, 2015.[21] Yuval Gefen, YigaL Meir, Benoit B Mandelbrot, and Am-non Aharony. Geometric implementation of hypercubiclattices with noninteger dimensionality by use of low la-cunarity fractal lattices.
Phys. Rev. Lett. , 50(3):145–148,1983.[22] C Allain and M Cloitre. Characterizing the lacunarityof random and deterministic fractal sets.
Phys. Rev. A ,44(6):3552, 1991.[23] Ulrik Brandes. On variants of shortest-path betweennesscentrality and their generic computation.
Soc. Networks
Kernel smoothing . CrcPress, 1994.[27] Costa, L da F and Roberto Marcond Cesar Jr.
Shapeclassification and analysis: theory and practice . CRCPress, 2009.[28] http://wiki.openstreetmap.org/wiki/Overpass API. [29] Benoit B Mandelbrot.
The fractal geometry of nature ,volume 173. Macmillan, 1983.[30] TG Smith, GD Lange, and WB Marks. Fractal methodsand results in cellular morphologydimensions, lacunar-ity and multifractals.
Journal of neuroscience methods ,69(2):123–136, 1996.[31] Roy E Plotnick, Robert H Gardner, William W Hargrove,Karen Prestegaard, and Martin Perlmutter. Lacunarityanalysis: a general technique for the analysis of spatialpatterns.
Physical review E , 53(5):5461, 1996.[32] Charles R Tolle, Timothy R McJunkin, David TRohrbaugh, and Randall A LaViolette. Lacunarity def-inition for ramified data sets based on optimal cover.
Physica D: Nonlinear Phenomena , 179(3):129–152, 2003.[33] MN Barros Filho and FJA Sobreira. Accuracy of lacu-narity algorithms in texture classification of high spatialresolution images from urban areas. In
XXI congressof international society of photogrammetry and remotesensing , 2008.[34] Erbe P Rodrigues, Marconi S Barbosa, and Costa, L daF. Self-referred approach to lacunarity.
Physical ReviewE , 72(1):016707, 2005.[35] Alessio Cardillo, Salvatore Scellato, Vito Latora, and Ser-gio Porta. Structural properties of planar graphs of urbanstreet patterns.
Phys. Rev. E , 73(6):066107, 2006.