Comparing and modeling land use organization in cities

Abstract

The advent of geolocated ICT technologies opens the possibility of exploring how people use space in cities, bringing an important new tool for urban scientists and planners, especially for regions where data is scarce or not available. Here we apply a functional network approach to determine land use patterns from mobile phone records. The versatility of the method allows us to run a systematic comparison between Spanish cities of various sizes. The method detects four major land use types that correspond to different temporal patterns. The proportion of these types, their spatial organization and scaling show a strong similarity between all cities that breaks down at a very local scale, where land use mixing is specific to each urban area. Finally, we introduce a model inspired by Schelling's segregation, able to explain and reproduce these results with simple interaction rules between different land uses.



Maxime Lenormand, Miguel Picornell, Oliva G. Cantú-Ros, Thomas Louail, Ricardo Herranz, Marc Barthelemy, Enrique Frías-Martínez, Maxi San Miguel, and José J. Ramasco

Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
Nommon Solutions and Technologies, calle Cañas 8, 28043 Madrid, Spain
Institut de Physique Théorique, CEA-CNRS (URA 2306), F-91191, Gif-sur-Yvette, France
Géographie-Cités, CNRS-Paris 1-Paris 7 (UMR 8504), 13 rue du four, FR-75006 Paris, France
Centre d'Analyse et de Mathématique Sociales, EHESS-CNRS (UMR 8557), 190-198 avenue de France, FR-75013 Paris, France
Telefónica Research, 28050 Madrid, Spain


INTRODUCTION

Land use patterns appear as a natural result of citizens' and planners' interaction with the urban space. However, in a feedback loop, they also play a major role in the experience that residents and visitors have of a city [1]. Land use patterns have an effect on the liveability of neighborhoods and even on the health of the local residents [2]. On the other hand, land use and transportation display a well-established relation [3–6]. Transport demand depends on the location of residence and business areas, while the presence of new transport lines or facilities such as metro stations can substantially modify the land use mixing in a given area of the city. These ideas lie behind the development of the so-called Land Use Transport Interaction (LUTI) models [7, 8], which are commonly employed in transport planning around the globe [9].

An important issue regarding land use refers to the methods employed to estimate it. City Hall registers, surveys or satellite images have been used in the past to this end [10–16]. The emergence of geolocated ICT technologies introduces extra capabilities to directly measure the use that citizens make of each urban space. The information is exhaustive in terms of spatial and temporal resolution, allowing for the detection of concentrations of people second by second along days, weeks and months. Information from mobile phone call records [17–30], geolocated tweets [15, 31–34], credit card use [35] or FourSquare [22] has been considered in the literature. Different data sources have been compared, finding a consistent agreement among the estimations on human concentrations and mobility obtained from different ICT data [27], as well as between ICT data and more traditional techniques [21–24, 27, 29, 36].

Such wealth of information together with the ability to process massive data brought by the Internet era allows the systematic comparison of features across cities. This analysis can lead to the discovery and confirmation of properties that have been hypothesized to be common to all cities, and also to laws providing insights into the way a property scales with city size. Some examples of these properties include number of patents filed, unemployment rates, GDP per capita, business diversity, consumption of resources, length of road networks, or even crime density [37–43]. The finding of these laws raises the hope of the existence of a coherent framework for city science [38, 39, 41, 42, 44–46].

In this work, we explore land use patterns in the five most populous urban areas of Spain. Land use information is obtained from mobile phone records using a new framework based on network theory, and systematic comparisons of land use distribution across the five cities are performed at different scales. Our results reveal common features in the land use types' spatial distributions, which can be understood with a model also introduced here. The similarities break down when the land use type mixing is studied at very short spatial scales, exposing patterns characteristic to each city.

MATERIALS AND METHODS

A network approach to detect land use

Our database is composed of aggregated and anonymized call records during 55 days between September and November 2009 in Spain. Every time a user receives or makes a call, the event is registered together with the tower (BTS) providing the service. The positions of the BTSs are geo-referenced and so the activity levels of each spatial area can be tracked in time. For this work, we select the five most populated metropolitan areas of Spain: Madrid (with a population over 5 [...] × [...] m to which the activity is mapped. This should prevent spurious effects due to the heterogeneity of the Voronoi areas (see the Appendix for a detailed description of the cities and the division process).

Figure 1. Steps of the method to detect land use. (A-B) The urban area is divided in cells of equal area. (C) For each cell, we calculate an activity profile in terms of phone calls along time during the days of the week. (D) A Pearson correlation matrix between cell activities is computed. Then the matrix formed by correlations over a threshold value δ is used to define an undirected weighted network (E), which is clustered using community detection techniques and the results plotted again on the city map (F).

The activity (number of users) in each cell is monitored in time and then processed as illustrated in Figure 1. Average activity profiles are estimated over each day of the week hour by hour in every cell. These profiles are normalized by the total hourly activity to subtract the trends introduced by the circadian rhythms. A Pearson correlation coefficient is then calculated between the activities of every pair of cells, obtaining a correlation matrix describing the level of similarity between activity profiles. The correlations can take positive and negative values. Distributions of these values are shown in Figure S4 of the Appendix. In order to remove non-significant and negative correlations we only consider Pearson correlation coefficients higher than a threshold δ. As a result, we obtain one weighted network per urban area. We first note that variations of the threshold do not produce significant changes in the properties of the resulting network. The results in the main text refer to a value of δ equal to the correlation distribution dispersion.

Once the networks are built, their mesoscopic structure is analyzed using clustering techniques.
The main advantage of community detection algorithms in networks compared to more classical clustering techniques based on a dissimilarity matrix is that the number of clusters does not need to be fixed a priori. However, it is important to note that different clustering methods can lead to distinct partitions of the networks. We report next results obtained with Infomap [48], while a systematic comparison with results obtained with other clustering tools is provided in the Appendix. As mentioned previously, Infomap does not require the input of a predetermined number of clusters. Therefore, it is interesting to find that in the five cities, between 98 and 100% of the cells are covered with only 4 groups. Figure 2 shows what the activity looks like for each of these four clusters in Madrid (similar plots for Barcelona, Valencia, Seville and Bilbao are included as Figures S9 and S10).

Each of the clusters can be associated with a main land use:

1. Residential (red), which is characterized by low activity from 8am to 5 [...];
2. Business (blue), where the activity is significantly higher during the weekdays than during the weekends. Furthermore, it concentrates from 9am to 6 [...];
3. Logistics/Industry (cyan), where, as for Business, the activity is higher during the weekdays. We observe a large peak between 5am and 7am followed by a smaller peak around 3pm. This cluster can be related to transport and distribution of goods: for example, "Mercamadrid" (the largest distribution area of Madrid) belongs to this cluster;
4. Nightlife (orange), which is characterized by high activity during the night hours (1am-4am), especially during the weekends. During the weekdays, these areas show higher activity between 9am and 6pm, as for the Business cluster, which may hint at a certain level of mixing in the land use. Some examples of this category are the "Gran Via" in Madrid and the "Ramblas" of Barcelona, where theatres, restaurants and pubs abound, mixed with offices and shops. This is typically the smallest cluster of the four in number of cells.

Figure 2. Temporal patterns associated with the four clusters for the metropolitan area of Madrid. In red: Residential cluster; in blue: Business; in cyan: Logistics/Industry; and in orange: Nightlife. Axes: fraction of mobile phone users against the hour of the day, from Monday to Sunday.
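The network construction described in this section can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: each cell's profile is a plain list of hourly activity values, the normalization divides each value by the total activity of all cells in the same hour, and the function names (`pearson`, `normalize_profiles`, `correlation_network`) are purely illustrative. The community detection step (Infomap in the text) is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_x * var_y)


def normalize_profiles(profiles):
    """Divide each cell's hourly activity by the total activity in that hour.

    Assumption: this is one possible reading of "normalized by the total
    hourly activity"; the exact procedure is not given in the text.
    """
    cells = list(profiles)
    n_hours = len(profiles[cells[0]])
    totals = [sum(profiles[c][h] for c in cells) for h in range(n_hours)]
    return {
        c: [profiles[c][h] / totals[h] if totals[h] else 0.0 for h in range(n_hours)]
        for c in cells
    }


def correlation_network(profiles, delta):
    """Return {(cell_a, cell_b): weight} for pairs with correlation > delta."""
    norm = normalize_profiles(profiles)
    edges = {}
    for a, b in combinations(sorted(norm), 2):
        rho = pearson(norm[a], norm[b])
        if rho > delta:  # drops negative and weak correlations
            edges[(a, b)] = rho
    return edges


# Toy example with hypothetical cells and four "hours" of activity.
toy = {
    "home1": [10, 2, 2, 10],
    "home2": [20, 4, 4, 20],
    "office": [2, 10, 10, 2],
}
print(correlation_network(toy, delta=0.5))
```

In this toy case only the two cells with the same temporal shape remain linked; the resulting weighted edge list is the kind of input a community detection tool would then partition.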

More systematically, we compared the land use patterns obtained with our algorithm with cadastral data. The dataset contains information about land use for each cadastral parcel of the metropolitan areas of Madrid and Barcelona (about 650,000 parcels). In particular, we have for each cadastral parcel the net internal area devoted to Residential, Business and Industrial uses. This information can be used to identify the dominant cadastral land use in each grid cell classified as Residential, Business or Industrial by the community detection algorithm. To do so we need to define a rule to determine what the dominant land use in a cell is. Intuitively, one would tend to identify the dominant land use in a cell as the land use class with the largest area. However, the nature of both land use assignations is very different: the cadastral data is based on the net internal area officially devoted to each activity and not necessarily on the number of people performing it. Therefore, Residential use is the land use class with the largest area in most of the cells, leading to an over-representation of Residential cells in the metropolitan area. To circumvent this limitation we introduce two thresholds to identify Business and Logistics cells with cadastral data in order to obtain a distribution of the fraction of cells according to the land use type similar to the one obtained with our algorithm (see the Appendix for details). The overall agreement is high: we find a percentage of correct predictions equal to 65% for Madrid and 60% for Barcelona, which is consistent with values obtained in other studies, 54% in [21] and 58% in [23]. Furthermore, for both case studies, almost all land use types have a percentage of correct predictions higher than 50% (see the Appendix for details).
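The percentage of correct predictions quoted above can be computed along the following lines. This is only a sketch under stated assumptions: both classifications are given as dictionaries mapping a cell identifier to a land use label, the per-type score is grouped by the label assigned by the algorithm, and the function name `agreement` and the toy labels are hypothetical. The threshold rule used to derive the cadastral labels is described in the Appendix and is not reproduced here.

```python
from collections import Counter


def agreement(predicted, reference):
    """Overall and per-type fraction of cells where both labelings coincide.

    predicted: {cell_id: label} from the clustering (assumed input format)
    reference: {cell_id: label} from cadastral data (assumed input format)
    """
    cells = [c for c in predicted if c in reference]
    hits, totals = Counter(), Counter()
    for c in cells:
        totals[predicted[c]] += 1
        if predicted[c] == reference[c]:
            hits[predicted[c]] += 1
    overall = sum(hits.values()) / len(cells)
    per_type = {t: hits[t] / totals[t] for t in totals}
    return overall, per_type


# Hypothetical labels for four cells.
predicted = {1: "Residential", 2: "Residential", 3: "Business", 4: "Logistics"}
reference = {1: "Residential", 2: "Business", 3: "Business", 4: "Logistics"}
overall, per_type = agreement(predicted, reference)
print(overall, per_type)
```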

RESULTS

Comparison of cities

Once the clusters are defined, we can compare the proportion of each land use type over the five case studies. Figure 3 shows the fraction of cells and the fraction of mobile phone activity averaged over the time period according to the land use patterns for each metropolitan area. We find similar results for the five case studies: the Residential land use type represents on average about 40% of the cells and of the mobile phone activity, against 30% for the Business category and less than 15% for the Logistics and Nightlife clusters.

Figure 3. Fraction of cells (A) and mobile phone users (B) according to the type of land use (Residential, Business, Logistics, Nightlife) for each case study (MAD, BCN, VAL, SEV, BIL). The fraction of mobile phone users is averaged over the values of the time period.

We can also study how the cells in each cluster are organized in the city's space. For the sake of comparison, we arbitrarily consider as city center the location of the City Hall and build a histogram with the number of cells at a certain distance from it. Since each city has a different spatial extension, distances are normalized by dividing by the maximal distance in each city so as to produce comparable results. The distributions are shown in Figure 4, where average curves over all cities have been superimposed. It is interesting to note a certain similarity in the distribution of cells for all urban areas. City size acts as a natural cutoff in the distributions, although no simple functional shape is found in any of the clusters. Residential cells are well distributed across the cities but with a maximum not very far from the center. Business cells appear at a similar distance as Residential but peaking a little further. Logistics and Industry are preferentially located in the periphery, while the Nightlife cells are well distributed along the urban areas but slightly more concentrated in the central areas.

Figure 4. Distribution of the distance between the cells and the City Hall according to the type of land use (panels: Residential, Business, Logistics/Industry and Nightlife; curves for Madrid, Barcelona, Valencia, Seville, Bilbao and their average). The distance has been normalized by the maximum distance in each city.

In order to quantify land use distribution patterns, we use Ripley's K [49], defined as

$$K(r) = \frac{A}{n^2} \sum_{i=1}^{n} N_i(r), \tag{1}$$

where $A$ is the city area, $r$ the search radius (a geographical scale), the index $i$ runs over the cells in the urban area and $n$ is the total number of cells. $N_i(r)$ stands for the number of cells of a given type within a distance $r$ from the cell $i$. This indicator measures the spatial heterogeneity of a given type of cells. The baseline for homogeneous random systems is a growth $K(r) = \pi r^2$ until reaching $A$. If the value of $K(r)$ is over the random curve for a certain $r$, it implies that the system is clustered at that scale. Since cities have different sizes, both $K(r)$ and the radius must be normalized by their maximum values ($A$ for $K(r)$ and the maximum distance for $r$). Curves for the normalized Ripley's K for each city and land use type are displayed in Figure 5A as a function of the normalized radius. The $K(r)/A$ for each city are always above the green curve corresponding to a random distribution of land use types, indicating coarsening of land use. We find a scaling-like curve for all the land use types, with most of the cities following the general trend well and some small deviations for Nightlife in Seville.

Deepening the analysis, we can also define an entropy index to characterize the land use spatial organization.
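As an aside on the Ripley's K of Eq. (1), the following Python sketch shows one way the quantity could be evaluated for a single land use type. The assumptions are ours and not taken from the text: cells are represented by their centroid coordinates, $n$ is the number of cells of the type considered, a cell is not counted as its own neighbor, and no edge correction is applied. The function name `ripley_k` and the toy coordinates are hypothetical.

```python
from math import dist, pi


def ripley_k(points, area, r):
    """K(r) = A / n^2 * sum_i N_i(r) for the centroids of one land use type.

    Assumptions: N_i(r) counts the other cells within distance r of cell i
    (the cell itself is excluded) and boundary effects are ignored.
    """
    n = len(points)
    total = 0
    for i, p in enumerate(points):
        total += sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= r)
    return area * total / n**2


# Three cells grouped together and one isolated cell in a 10 x 10 area.
cells = [(0, 0), (1, 0), (0, 1), (10, 10)]
area = 100.0
for radius in (0.5, 1.5):
    observed = ripley_k(cells, area, radius) / area   # normalized K(r)/A
    baseline = min(pi * radius**2, area) / area       # homogeneous random case
    print(radius, observed, baseline)
```

At the larger radius the normalized value lies well above the random baseline, which is the signature of clustering discussed in the text.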
Let us consider a frame containing the full urban area, which is, in turn, sub-divided in a certain number $D$ of equal divisions. Each of these subdivisions, $B_i$, intersects the elementary cells so a certain fraction of area falls in each of the land use types: $f_i^R$ in the Residential cluster, $f_i^B$ in Business, $f_i^L$ in Logistics, and $f_i^N$ in Nightlife. An entropy index, $E_i$, can be defined for $B_i$ as

$$E_i = -\sum_{\alpha} f_i^{\alpha} \ln(f_i^{\alpha}), \tag{2}$$

where $\alpha$ runs over the four clusters. The entropy $E_i$ is then averaged over all the divisions to obtain a global metric for the city at a given scale, $E(D)$. $E(D)$ tends to zero if the land use within the divisions becomes unique, as occurs for instance at large $D$ (small spatial scales). On the other extreme, when $D \to 1$, $E(D)$ converges to a fixed value describing the full city. Figure 5B shows how the average entropy behaves with $D$. The curves are similar across cities, recalling the shape of scaling functions. This is not surprising if the concept of a fractal-like distribution of the city activity applies, as has been previously discussed in the literature [37–42].

Figure 5. Comparison of the observed and the simulated Ripley's K and average entropy index. (A) Ripley's K divided by the city area as a function of the search radius (panels: Residential, Business, Logistics/Industry and Nightlife). The radius has been normalized with the maximum value in each urban area. (B) Average entropy index as a function of the lateral number of divisions (inverse scale) $D$. The color and symbols of the curves represent different cities. The red curve corresponds to our model results and the green curve is the outcome of a random null model. Results for our model were obtained with a calibrated value of $\gamma = 0.08$. The red and green curves display the average over realizations.

Modeling land use

Urban land use models in the literature are typically built with relatively elaborate mechanisms [50, 51]. When their rules are basic, the models typically refer to characteristics of cities such as the population or activity distributions [44–46]. The shape of $E(D)$ can be explained, however, by a simple model inspired by Schelling's segregation [52]. It is important to stress that this model is not intended to reproduce all the processes leading to the land use formation, but to explain the scaling of its spatial distribution patterns.

The basic framework is a lattice in 2D representing the urban space. Initially, a variable $t_i$ with a land use type is assigned to every cell $i$ at random (Residential $R$, Business $B$, Logistics $L$ or Nightlife $N$). The global fraction of cells of each type respects the proportions found in the empirical data in such a way that $E(D = 1)$ coincides with the observations. A satisfaction index, $S_i$, is then defined per cell taking into account its type and those of its neighbors. Similarly to Schelling's model, we assume that the satisfaction increases when a cell is surrounded by cells of its own type. Otherwise, $S_i$ depends on the particular combinations of types. Some land uses attract each other as, for instance, Residential and Business, while others repel, as Residential and Logistics. The existence of rules of attraction and repulsion between different types of land use has already been considered in the past, for example in [53, 54]. To be specific, if $p_i^t$ is the fraction of neighbors of $i$ of type $t$, then $S_i$ is calculated as

$$
S_i =
\begin{cases}
\delta_{p_i^L,1} & \text{if } t_i = L, \\
p_i^N \, \delta_{p_i^L,0} & \text{if } t_i = N, \\
\delta_{p_i^L,0} & \text{if } t_i = R, B, \text{ with probability } \gamma, \\
p_i^{R,B} \, \delta_{p_i^L,0} & \text{if } t_i = R, B, \text{ with probability } 1-\gamma,
\end{cases}
$$

where $\delta_{p,x}$ is the Kronecker delta (equal to one if $p = x$, zero otherwise) and $\gamma$ is the only model parameter. Note that the first condition implies that for Logistic cells $S_i = 1$ only if they are surrounded by cells of the same type, and that cells of other types have zero satisfaction if surrounded by any Logistic one. With this rule, we introduce a tendency to locate Industry and Logistics out of the core areas of the cities. Residential and Business cells have a certain tolerance to the $R$, $B$ and $N$ types with $\gamma$ acting as a mixing control parameter: if $\gamma = 0$, mixing is not favored. A global satisfaction measure is defined as the sum over all the cell satisfaction indices, $S = \sum_i S_i$.
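The satisfaction rule defined above can be sketched in Python as follows. Several choices here are assumptions made only for illustration, since the text does not specify them: the neighborhood is taken as the up to eight surrounding cells of a non-periodic lattice of at least 2 x 2 cells, $p_i^{R,B}$ is read as the fraction of neighbors sharing the cell's own type, and the names `neighbor_fractions`, `satisfaction` and `global_satisfaction` are hypothetical.

```python
import random

TYPES = ("R", "B", "L", "N")


def neighbor_fractions(grid, row, col):
    """Fraction of each land use type among the neighbors of a cell.

    Assumption: Moore neighborhood (up to 8 cells), no periodic boundaries.
    """
    counts = dict.fromkeys(TYPES, 0)
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr, dc) != (0, 0) and 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                counts[grid[r][c]] += 1
                total += 1
    return {t: counts[t] / total for t in TYPES}


def satisfaction(grid, row, col, gamma, rng=random):
    """Satisfaction S_i of one cell, following the piecewise rule in the text."""
    p = neighbor_fractions(grid, row, col)
    own = grid[row][col]
    if own == "L":
        return 1.0 if p["L"] == 1.0 else 0.0  # only happy among Logistics
    if p["L"] != 0.0:
        return 0.0                            # any Logistics neighbor repels
    if own == "N":
        return p["N"]
    # Residential or Business: tolerant with probability gamma
    if rng.random() < gamma:
        return 1.0
    return p[own]


def global_satisfaction(grid, gamma, rng=random):
    """S: sum of the satisfaction indices over all cells."""
    return sum(
        satisfaction(grid, r, c, gamma, rng)
        for r in range(len(grid))
        for c in range(len(grid[0]))
    )


toy_grid = [["R", "R"], ["B", "R"]]
print(global_satisfaction(toy_grid, gamma=0.0))
print(global_satisfaction(toy_grid, gamma=1.0))
```

Because the rule for Residential and Business cells is stochastic, the global satisfaction is itself a random quantity for intermediate values of `gamma`; the two calls above use the deterministic extremes.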

Figure 6. Land use mixing. (A) Distribution of the Pearson correlations between cell activity and the average cluster profiles (assigned, second, third and fourth closest cluster). (B) Map of Barcelona displaying the four clusters with the colors varying from white to the baseline according to the intensity of the relation with the assigned cluster of each cell. The color code is red for Residential, blue for Business, cyan for Logistics/Industry and orange for Nightlife. (C) Fraction of mixed cells as a function of the city population. (D) Fraction of cells classified by the type of land use mixing among those with two types of land use.

The model is updated by choosing random pairs of cells and interchanging their land use if the exchange increases $S$. This process is repeated until the satisfaction reaches a stationary state.

Calibrating the single parameter $\gamma$, we can reproduce the observed $K(r)/A$ and $E(D)$ scaling in the real urban areas (see red curves in Figure 5). The value of the mixing parameter at which the best average results are obtained is $\gamma = 0.08$ (Figure S8). For comparison's sake, we have included a null model in which the land use types are distributed at random, keeping the real proportions (green curves in Figure 5). Unsurprisingly, the null model fails at reproducing the curves obtained with the data, mainly because, generally, areas of a given land use type tend to cluster together to form land use zones. More interestingly, assuming that the satisfaction of a cell increases when it is surrounded by cells of its own type and that Logistic cells and cells of other types repel each other (i.e. $\gamma = 0$) is also insufficient to reproduce the properties observed in the data (for more details see Figure S8 in the Appendix). Therefore, taking into account the attraction between Residential and Business areas seems to be crucial for the reproduction of land use spatial organization in cities.

Mixing of land use types

So far, we have considered that each elementary cell has a unique land use type associated. This condition can be easily relaxed. If an average activity profile is defined for each of the four clusters, a Pearson correlation coefficient between the activity profile in each cell and the clusters' averages can be calculated. The distribution of correlation values is shown in Figure 6A. The highest correlation value corresponds typically to the cluster to which the cell is assigned. Still, in some cases, positive correlation values are found for one or even two other clusters. For every cell, we can quantify the intensity of its relation with each cluster by summing over these positive correlations and normalizing by the total. A map of the Barcelona metropolitan area with the intensity of each cell's relation with its assigned cluster is shown in Figure 6B. The colors represent the four main types of cells and the color saturation is related to the correlation: darker if the correlation is high, paler otherwise. Most cells match well with their original assigned cluster, keeping darker colors, while some are brighter, implying a higher level of land use mixing.

We arbitrarily define a cell as mixed when the normalized correlations fall within the interval 0 [...]

DISCUSSION

In summary, we introduce a method to automatically detect land use from electronic records and apply it to the five largest urban areas of Spain in order to perform a systematic comparison across them on the land use distribution. The urban space is divided in a regular grid to prevent geographic heterogeneity and to maintain the spatial scale under control. The user activity profiles are monitored in each unit cell along time, and then a correlation matrix is established between the profiles of every pair of cells. This correlation matrix encodes the functional network of each city. We analyze these networks using network clustering techniques, which ensure that cells showing similar use profiles are grouped together. This method has been applied to the five most populated Spanish cities: Madrid, Barcelona, Valencia, Seville and Bilbao. Since the delimitation of urban areas could affect the results, the definition of the municipal transport offices is employed in each case. Interestingly, given that the method is unsupervised, four groups consistently appear as dominant in all cities. They correspond to activity profiles compatible with main land uses in Residence, Business, Logistics/Industry and Nightlife. Not only are the types the same across cities, but the proportions of cells and area devoted to each type are also similar.

We also study the distribution of the four land use types at different spatial scales. We define the Ripley's K and the entropy index for each land use type, and the behavior of both metrics is explored as the spatial scale varies from the full city (macroscopic scale) to a single cell (microscopic). The five cities show similar scaling curves for the metrics, implying comparable structures regarding how the four types amalgamate at the urban level. The shape of the scaling curves can be explained by a simple model that has been proposed in this work. The model is based on a Schelling-like segregation in which the different land use types interact to generate a spatial distribution in the city. Cells in a given land use type tend to maximize the number of neighbors undergoing equivalent uses. This rule induces a tendency to coarsening in land use types. The different land uses interact by attracting each other, such as services and residential areas, or by repelling, like industry and almost any other type. The calibration of a single parameter regulating the intensity of the attraction between services, residential uses and nightlife is enough to reproduce the scaling curves observed in the real cities. Moreover, we also demonstrate that a model without land use type interactions cannot recreate the empirical scaling.

Similarities across cities break down when one focuses on how the land use types mix microscopically within each unit cell. A characteristic mixing profile is detected for every urban area, providing an individual city fingerprint. Further data on other cities could help to elucidate whether different typologies exist at this microscopic mixing level. In conclusion, although further data from other countries and sources could be important to confirm our results, we find that a coherent picture emerges in the land use organization of major urban areas and that its origin can be explained with a basic model.
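The entropy index of Eq. (2), whose behavior across scales is summarized above, could be evaluated along the following lines. This is a sketch under simplifying assumptions of our own: the city is a square grid of cells that each carry a single land use label and equal area (so area fractions reduce to count fractions), the lateral number of divisions `d` divides the grid side exactly, and the names `division_entropy` and `average_entropy` are hypothetical.

```python
from math import log


def division_entropy(fractions):
    """E_i = -sum_alpha f ln f, using the convention 0 ln 0 = 0."""
    return -sum(f * log(f) for f in fractions if f > 0)


def average_entropy(cell_types, d):
    """E(D): mean entropy over a d x d partition of a square grid of labels.

    Assumption: the grid side is a multiple of d and all cells have equal area.
    """
    size = len(cell_types)
    if size % d:
        raise ValueError("grid side must be a multiple of d")
    step = size // d
    entropies = []
    for block_row in range(d):
        for block_col in range(d):
            counts = {}
            for r in range(block_row * step, (block_row + 1) * step):
                for c in range(block_col * step, (block_col + 1) * step):
                    label = cell_types[r][c]
                    counts[label] = counts.get(label, 0) + 1
            total = step * step
            entropies.append(division_entropy(k / total for k in counts.values()))
    return sum(entropies) / len(entropies)


# A fully segregated toy city: left half Residential, right half Business.
toy_city = [["R", "R", "B", "B"] for _ in range(4)]
for d in (1, 2, 4):
    print(d, average_entropy(toy_city, d))
```

In this segregated toy city the entropy is positive only when a single division covers the whole grid and vanishes as soon as the divisions stop straddling the boundary between the two uses, in line with the limits of $E(D)$ described in the Results.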

ACKNOWLEDGEMENTS

Partial financial support has been received from the Spanish Ministry of Economy (MINECO) and FEDER (EU) under projects MODASS (FIS2011-24785) and INTENSE@COSYP (FIS2012-30634), and from the EU Commission through projects EUNOIA, LASAGNE and INSIGHT. The work of ML has been funded under the PD/004/2013 project, from the Conselleria de Educación, Cultura y Universidades of the Government of the Balearic Islands and from the European Social Fund through the Balearic Islands ESF operational program for 2013-2017. JJR acknowledges funding from the Ramón y Cajal program of MINECO.

[1] C. Humphries. The science of cities: Life in the concrete jungle. Nature, 491:514–515, 2012.
[2] J. F. Sallis, B. E. Saelens, L. D. Frank, T. L. Conway, D. J. Slymen, K. L. Cain, J. E. Chapman, and J. Kerr. Neighborhood built environment and income: examining multiple health outcomes. Social Science & Medicine, 68(7):1285–1293, 2009.
[3] D. A. Badoe and E. J. Miller. Transportation–land-use interactions: Empirical findings and implications for modeling. Transportation Research D: Transport & Environment, 5D(4):235–263, 2000.
[4] R. Cervero. Built environments and mode choice: toward a normative framework. Transportation Research Part D: Transport and Environment, 7(4):265–284, 2002.
[5] S. Krygsman, M. Dijst, and T. Arentze. Multimodal public transport: an analysis of travel time elements and the interconnectivity ratio. Transport Policy, 11(3):265–275, 2004.
[6] Y.-H. Tsai. Quantifying urban form: Compactness versus 'sprawl'. Urban Studies, 42:141–161, 2005.
[7] R. Cervero and K. Kockelman. Travel demand and the 3Ds: Density, diversity, and design. Transportation Research Part D: Transport and Environment, 2(3):199–219, 1997.
[8] P. Waddell, G. F. Ulfarsson, J. P. Franklin, and J. Lobb. Incorporating land use in metropolitan transportation planning. Transportation Research Part A: Policy and Practice, 41(5):382–410, 2007.
[9] K. Bartholomew. Land use-transportation scenario planning: promise and reality. Transportation, 34(4):397–412, 2007.
[10] For surveys in Spain see, for instance, the central office for cadastral information (Sede Electrónica de la Dirección General del Catastro (SEC) in Spanish). Accessible via the URL: .
[11] J.-P. Donnay and D. Unwin. Modelling geographical distributions in urban areas. In J.-P. Donnay, M. J. Barnsley, and P. A. Longley, editors, Remote Sensing and Urban Analysis, pages 205–224. Taylor & Francis, 2001.
[12] X. Yang and C. P. Lo. Using a time series of satellite imagery to detect land use and land cover changes in the Atlanta, Georgia metropolitan area. International Journal of Remote Sensing, 23(9):1775–1798, 2002.
[13] K. T. Geurs and B. van Wee. Accessibility evaluation of land-use and transport strategies: review and research directions. Journal of Transport Geography, 12(2):127–140, June 2004.
[14] S. Wu, B. Xu, and L. Wang. Urban land use classification using variogram-based analysis with aerial photographs. Photogrammetric Engineering and Remote Sensing, 72:812–822, 2006.
[15] V. Frías-Martínez, V. Soto, H. Hohwald, and E. Frías-Martínez. Characterizing urban landscapes using geolocated tweets. In SocialCom/PASSAT, pages 239–248. IEEE, 2012.
[16] S. Hu and L. Wang. Automated urban land-use classification with remote sensing. International Journal of Remote Sensing, 34:790–803, 2013.
[17] J. Reades, F. Calabrese, A. Sevtsuk, and C. Ratti. Cellular census: Explorations in urban data collection. Pervasive Computing, IEEE, 6(3):30–38, 2007.
[18] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.
[19] J. Reades, F. Calabrese, and C. Ratti. Eigenplaces: analysing cities using the space time structure of the mobile phone network. Environment and Planning B: Planning and Design, 36(5):824–836, 2009.
[20] V. Soto and E. Frías-Martínez. Automated land use identification using cell-phone records. In Proceedings of the 3rd ACM international workshop on MobiArch, HotPlanet '11, pages 17–22, New York, NY, USA, 2011. ACM.
[21] J. L. Toole, M. Ulm, M. C. González, and D. Bauer. Inferring land use from mobile phone activity. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, UrbComp '12, pages 1–8, 2012.
[22] A. Noulas, C. Mascolo, and E. Frías-Martínez. Exploiting foursquare and cellular data to infer user activity in urban environments. In Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management - Volume 01, MDM '13, pages 167–176, 2013.
[23] T. Pei, S. Sobolevsky, C. Ratti, S. L. Shaw, and C. Zhou. A new insight into land use classification based on aggregated mobile phone data. International Journal of Geographical Information Science, 28:1988–2007, 2014.
[24] S. Grauwin, S. Sobolevsky, S. Moritz, I. Gódor, and C. Ratti. Towards a comparative science of cities: using mobile traffic records in New York, London and Hong Kong. ArXiv e-print, arXiv:1406.4400, 2014.
[25] V. Frías-Martínez, V. Soto, A. Sánchez, and E. Frías-Martínez. Consensus clustering for urban land use analysis using cell phone network data. Int. J. Ad Hoc Ubiquitous Comput., 17(1):39–58, 2014.
[26] V. Frías-Martínez and E. Frías-Martínez. Spectral clustering for sensing urban land use using twitter activity. Eng. Appl. Artif. Intell., 35:237–245, 2014.
[27] M. Lenormand, M. Picornell, O. Garcia Cantú, A. Tugores, T. Louail, R. Herranz, M. Barthélemy, E. Frías-Martínez, and J. J. Ramasco. Cross-checking different source of mobility information. PLoS ONE, 9(8):e105184, 2014.
[28] T. Louail, M. Lenormand, O. Garcia Cantú, M. Picornell, R. Herranz, E. Frías-Martínez, J. J. Ramasco, and M. Barthelemy. From mobile phone data to the spatial structure of cities. Scientific Reports, 4(5276), 2014.
[29] P. Deville, C. Linard, S. Martin, M. Gilbert, F. R. Stevens, A. E. Gaughan, V. D. Blondel, and A. J. Tatem. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111:15888–15893, 2014.
[30] T. Louail, M. Lenormand, M. Picornell, O. Garcia Cantú, R. Herranz, E. Frías-Martínez, J. J. Ramasco, and M. Barthelemy. Uncovering the spatial structure of mobility networks. Nature Communications, 6:6007, 2015.
[31] B. Gonçalves and D. Sánchez. Crowdsourcing dialect characterization through twitter. PLoS ONE, 9:e112074, 2014.
[32] B. Hawelka, I. Sitko, E. Beinat, S. Sobolevsky, P. Kazakopoulos, and C. Ratti. Geo-located twitter as a proxy for global mobility patterns. Cartography and Geographic Information Science, 41:260–271, 2014.
[33] M. Lenormand, A. Tugores, P. Colet, and J. J. Ramasco. Tweets on the road. PLoS ONE, 9(8):e105407, 2014.
[34] M. Lenormand, B. Gonçalves, A. Tugores, and J. J. Ramasco. Human diffusion and city influence. arXiv preprint arXiv:1501.07788, 2015.
[35] M. Lenormand, T. Louail, O. Garcia Cantú, M. Picornell, R. Herranz, M. Barthelemy, M. San Miguel, and J. J. Ramasco. Influence of sociodemographic characteristics on human mobility.

Scientiﬁc Reports ,2015. [36] M. Tizzoni, P. Bajardi, A. Decuyper, G. KonKam King, C.M. Schneider, V. Blondel, Z. Smoreda,M.C. Gonzalez, and V. Colizza. On the use of hu-man mobility proxies for modeling epidemics.

PLoSComput Biol , 10:e1003716, 2014.[37] L.M.A. Bettencourt, J. Lobo, D. Helbing, C. Kuhn-ert, and G.B. West. Growth, innovation, scaling, andthe pace of life in cities.

Proceedings of the NationalAcademy of Sciences , 104(17):7301–7306, 2007.[38] M. Batty. The Size, Scale, and Shape of Cities.

Sci-ence , 319(5864):769–771, February 2008.[39] L. Bettencourt and G. West. A uniﬁed theory of urbanliving.

Nature , 467(7318):912–913, 2010.[40] L. M. A. Bettencourt. The Origins of Scaling in Cities.

Science , 340(6139):1438–1441, June 2013.[41] M. Batty.

The New Science of Cities . The MIT Press,2013.[42] E. Arcaute, E. Hatna, P. Ferguson, H. Youn, A. Jo-hansson, and M. Batty. Constructing cities, decon-structing scaling laws.

Journal of The Royal SocietyInterface , 12:20140745, 2015.[43] L. G. A. Alves, H. V. Ribeiro, E. K. Lenzi, and R. S.Mendes. Distance to the scaling law: A useful ap-proach for unveiling relationships between crime andurban metrics.

PLoS ONE , 8(8):e69580, 2013.[44] H. A. Makse, J. S. Andrade, M.l Batty, S. Havlin, andH. E. Stanley. Modeling urban growth patterns withcorrelated percolation.

Phys. Rev. E , 58:7054–7062,1998.[45] R. Louf and M. Barthelemy. Modeling the polycentrictransition of cities.

Phys. Rev. Lett. , 111:198702, 2013.[46] R. Louf and M. Barthelemy. How congestion shapescities: from mobility patterns to scaling.

ScientiﬁcReports , 4:5561, 2014.[47] R. Louf and M. Barthelemy. Scaling: lost in the smog.

Environment and Planning B: Planning and Design ,41(5):767–769, 2014.[48] M. Rosvall and C. T. Bergstrom. Maps of randomwalks on complex networks reveal community struc-ture.

Proceedings of the National Academy of Sci-ences , 105(4):1118–1123, 2008.[49] B.D. Ripley. The second-order analysis of station-ary point processes.

Journal of Applied Probability ,13:255–266, 1976.[50] A. G. Wilson. The Use of Entropy Maximising Mod-els, in the Theory of Trip Distribution, Mode Splitand Route Split.

Journal of Transport Economics andPolicy , 3(1):108–126, 1969.[51] J. Decraene, C. Monterola, G. K. K. Lee, T. G. G.Hung, and M. Batty. The emergence of urban land usepatterns driven by dispersion and aggregation mech-anisms.

PLoS ONE , 8(12):e80309, 12 2013.[52] T. Schelling. Dynamic models of segregation.

Journalof Mathematical Sociology , 1, 1971.[53] P. Krugman.

The Self-Organizing Economy . Black-well, 1996.[54] X. Feng and D. Levinson.

Evolving transportation net-works . Springer Science & Business Media, 2011.[55] C. Roth, S. M. Kang, M. Batty, and M. Barthelemy.Structure of urban movements: Polycentric activ-ity and entangled hierarchical ﬂows.

PLoS ONE ,6(1):e15923, 2011.[56] S. Maslov and K. Sneppen. Speciﬁcity and stability intopology of protein networks.

Science , 296(5569):910–913, 2002. [57] A. Lancichinetti, F. Radicchi, and S. Fortunato. Com-munity detection algorithms: a comparative analysis.

Physical Review E , 80(5):056117, 2009.[58] A. Lancichinetti, F. Radicchi, and J. J. Ramasco.Statistical signiﬁcance of communities in networks .

Physical Review E , 81(4):046110, 2010.[59] A. Lancichinetti, F. Radicchi, J. J. Ramasco, andS. Fortunato. Finding Statistically Signiﬁcant Com-munities in Networks.

PLoS ONE , 6(4):e18961+,April 2011.[60] V.D. Blondel, J.L. Guillaume, R. Lambiotte, andE.L.J.S. Mech. Fast unfolding of communities in largenetworks.

J. Stat. Mech , page P10008, 2008.[61] . APPENDIXCase studies

In this study, we focused on the five biggest metropolitan areas of Spain: Madrid, Barcelona, Valencia, Seville and Bilbao (Figure S1). These metropolitan areas differ widely in size and population (Table S1). For every city, we defined the urban area as the one served by public transportation (bus and metro) rather than using the official definition, which in the case of Seville covers a much larger and relatively depopulated territory.

Data pre-processing

Mobile phone records of anonymized users collected over 55 days (hereafter denoted $T$) within the period September-November 2009 were aggregated in two different ways. The aggregated data correspond to the number of users per hour and per base transceiver station (BTS), each BTS being identified by its UTM (WGS84) coordinates. A user may appear connected to more than one BTS within a period of one hour. To avoid overcounting people, the following criterion was used when aggregating the data: each person counts only once per hour. If a user is detected in $k$ different positions within a given 1-hour period, each registered position counts as $1/k$ "units of activity". From these aggregated data, the activity per BTS and per hour is calculated for each day. In order to compute the number of mobile phone users $N_{g,d}(h)$ in a grid cell $g$ (dimension 0 . × . ) for a day $d \in T$ between $h$ and $h+1$, where $h \in |[0, 23]|$, we first computed the Voronoi cells associated with each BTS.

Figure S1. Map of the metropolitan areas. Panels: Madrid, Barcelona, Valencia, Sevilla, Bilbao.

Table S1. Summary statistics on the metropolitan areas.

Metropolitan area   Number of municipalities   Number of inhabitants   Area (km²)
Madrid              27                         5,512,495               1,935.97
Barcelona           36                         3,218,071               634
Valencia            43                         1,549,855               628.81
Sevilla             8                          983,852                 352
Bilbao              34                         908,916                 500.2

Voronoi cells

First, we remove the BTSs with zero mobile phone users and compute the Voronoi cell associated with each BTS of the metropolitan area (hereafter called MA). Figure S2A shows that there are four types of Voronoi cells:

1. The Voronoi cells contained in MA.
2. The Voronoi cells between MA and the territory outside the metropolitan area.
3. The Voronoi cells between MA and the sea (noted S).
4. The Voronoi cells between MA, the territory outside the metropolitan area and the sea.

To compute the number of users associated with the intersections between the Voronoi cells and MA, we have to take these different types of Voronoi cells into account. Let $m$ be the number of Voronoi cells (i.e. BTSs), $N_{v,d}(h)$ the number of users in a Voronoi cell $v$ (on day $d$ at time $h$) and $A_v$ the area of $v$, $v \in |[1, m]|$. The number of users $N_{v \cap MA,d}(h)$ in the intersection between $v$ and MA is given by the following equation:

$$N_{v \cap MA,d}(h) = N_{v,d}(h) \left( \frac{A_{v \cap MA}}{A_v - A_{v \cap S}} \right) \qquad (S1)$$

In Equation S1 we have removed the intersection of the Voronoi cell with the sea, since we assume that the number of users calling from the sea is negligible. In what follows, $N_{v,d}(h)$ and $A_v$ denote the number of mobile phone users and the associated area of the Voronoi cells intersecting MA (Figure S2B).

Grid cells

Let $n$ be the number of grid cells. The number of mobile phone users $N_{g,d}(h)$ in grid cell $g$ (on day $d$ at time $h$) is given, $\forall g \in |[1, n]|$, by the following equation:

$$N_{g,d}(h) = \sum_{v=1}^{m} N_{v,d}(h) \, \frac{A_{v \cap g}}{A_v} \qquad (S2)$$

Then the set of days $T$ is divided into subsets $T_w \subset T$, one per day of the week $w$, and the average number of mobile phone users is computed for each day of the week (Equation S3):

$$N_{g,w}(h) = \frac{\sum_{d \in T_w} N_{g,d}(h)}{|T_w|} \qquad (S3)$$

The average number of mobile phone users in the metropolitan areas according to the time and the day of the week is plotted in Figure S3. The profile shows two peaks, one around noon (12 PM) and another around 7 PM. It also shows that the number of mobile phone users is higher on weekdays than during the weekend.

$N$ is then normalized so that the total number of users at a given time on a given day of the week is equal to 1 (Equation S4):

$$\hat{N}_{g_0,w}(h) = \frac{N_{g_0,w}(h)}{\sum_{g=1}^{n} N_{g,w}(h)} \qquad (S4)$$

This normalization allows for a direct comparison between sources with different absolute levels of user activity. For a given grid cell $g = g_0$, we define the temporal distribution of users $\hat{N}_{g_0}$ as the concatenation of the temporal distributions of users associated with each day of the week. Each grid cell is thus characterized by a temporal distribution of users (also called signal), represented by a vector of length 24 × 7. Some grid cells may have exactly the same signal because a single Voronoi cell may contain several grid cells; in this case the grid cells have been aggregated (Figure S2C).
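As an illustration of the pre-processing chain described above, the following Python sketch implements the $1/k$ counting rule and Equations S2 to S4 on plain lists. It is a minimal sketch under stated assumptions: the record layout `(user, day, hour, bts)`, the function names and the precomputed overlap areas `overlap[v][g]` (which would come from a GIS intersection of Voronoi and grid cells) are hypothetical and are not taken from the original pipeline.

```python
from collections import defaultdict


def activity_units(records):
    """Apply the 1/k rule to (user, day, hour, bts) records (assumed layout).

    A user seen at k distinct BTSs during the same hour contributes 1/k
    units of activity to each of them, so each user counts once per hour.
    Returns {(bts, day, hour): activity}.
    """
    seen = defaultdict(set)
    for user, day, hour, bts in records:
        seen[(user, day, hour)].add(bts)
    activity = defaultdict(float)
    for (_user, day, hour), stations in seen.items():
        for bts in stations:
            activity[(bts, day, hour)] += 1.0 / len(stations)
    return dict(activity)


def users_per_grid_cell(n_v, area_v, overlap):
    """Equation S2: spread Voronoi counts over grid cells by shared area.

    n_v[v] and area_v[v] are the users and area of Voronoi cell v;
    overlap[v][g] is the area of the intersection of v with grid cell g.
    """
    n_cells = len(overlap[0])
    return [
        sum(n_v[v] * overlap[v][g] / area_v[v] for v in range(len(n_v)))
        for g in range(n_cells)
    ]


def normalized_weekday_profile(n_gd, days):
    """Equations S3 and S4 for one day of the week and one hour.

    n_gd[d][g] is the number of users in grid cell g on day d (fixed hour);
    days lists the indices d that fall on the chosen day of the week.
    """
    n_cells = len(n_gd[days[0]])
    mean = [sum(n_gd[d][g] for d in days) / len(days) for g in range(n_cells)]  # S3
    total = sum(mean)
    return [x / total for x in mean]  # S4: shares sum to 1 over grid cells
```

For example, `activity_units([("u1", 0, 9, "A"), ("u1", 0, 9, "B"), ("u2", 0, 9, "A")])` assigns 1.5 units to BTS A and 0.5 to BTS B for that hour. Repeating `normalized_weekday_profile` for the 24 hours of the 7 days of the week and concatenating the values of one grid cell would give its signal of length 24 × 7.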

Figure S2. Map of the metropolitan area of Barcelona. The white area represents the metropolitan area, the dark gray area represents the territory surrounding the metropolitan area and the light gray area represents the sea. (A) Voronoi cells of the mobile phone antennas point pattern. (B) Intersection between the Voronoi cells and the metropolitan area. (C) Recording sites composed of grid cells of dimension . × . km².

Figure S3. Average number of mobile phone users per hour according to the day of the week for the five metropolitan areas. Panels: Madrid, Barcelona, Valencia, Sevilla, Bilbao. Horizontal axis: hours; vertical axes: number of users and percentage of the population; one curve per day of the week (Monday to Sunday).

Functional network

Choice of δ

In the method used to extract the functional network from the mobile phone data (presented in the main text), we apply a threshold δ to the correlation matrix in order to remove noise and negative correlations. The value of δ must therefore be high enough to remove the noise, but not so high that it destroys the structure and the properties of the network. Figure S4 displays the distribution of the weights (i.e. correlation coefficients) for the five case studies. These distributions can be approximated by a Gaussian distribution. We have therefore decided to keep only the edges with a weight higher than the standard deviation of the weight distribution. Figure S4 also shows that for δ lower than this standard deviation (around 0.4, see details in Table S2) the number of connected components is equal to 1.

Figure S4. Number of connected components as a function of δ (left) and weight distribution (right) for the five case studies. From top to bottom: Madrid, Barcelona, Valencia, Sevilla and Bilbao.

Table S2. Statistical properties of the functional networks.

City        SD     N       E         <k>     <k>/N   C      L      C_r    L_r
Madrid      0.42   1,381   222,227   321.8   0.233   0.69   2.04   0.31   1.77
Barcelona   0.38   652     46,573    142.9   0.219   0.62   2.02   0.29   1.79
Valencia    0.35   351     13,847    78.9    0.225   0.66   2.06   0.31   1.84
Sevilla     0.38   188     3,700     39.2    0.209   0.62   2.15   0.26   1.81
Bilbao      0.35   267     8,915     66.8    0.25    0.67   2.03   0.39   1.76

Table S2 summarizes the statistical properties of the functional networks obtained for the five case studies. The table reports the threshold (SD), the number of nodes, i.e. the number of cells (N), the number of edges (E), the average degree (<k>), the average degree divided by the number of nodes (<k>/N), the average clustering coefficient (C) and the average shortest path length (L). The average clustering coefficient C_r and the average shortest path length L_r have been obtained with a randomly rewired network that preserves the degrees of the original network, built by permuting links (4 × (number of edges) times) [56]. The five networks are very similar: all are characterized by a high clustering coefficient and a low average shortest path length.

Figure S5. Contingency tables between the partitions obtained with Infomap and Oslom for each case study. (A) Madrid. (B) Barcelona. (C) Valencia. (D) Sevilla. (E) Bilbao. Each row represents a cluster obtained with Infomap and each column represents a cluster obtained with Oslom. The matrices have been normalized so that the sum of each column is equal to one.
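The thresholding and rewiring steps of this section can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for the paper: the use of the Pearson coefficient, the function names and the fact that the 4 × E swaps are counted as attempts (rejected swaps included) are assumptions. Signals are assumed to be non-constant, otherwise the correlation is undefined.

```python
import math
import random
from statistics import pstdev


def pearson(x, y):
    """Pearson correlation between two equally long, non-constant signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def functional_network(signals):
    """Keep edges whose weight exceeds delta = SD of all pairwise weights."""
    n = len(signals)
    weights = {(i, j): pearson(signals[i], signals[j])
               for i in range(n) for j in range(i + 1, n)}
    delta = pstdev(list(weights.values()))
    edges = {pair: w for pair, w in weights.items() if w > delta}
    return edges, delta


def connected_components(n, edges):
    """Count connected components with a union-find over the kept edges."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


def rewire(edges, swaps_per_edge=4, seed=0):
    """Degree-preserving randomization by repeated edge swaps (cf. [56])."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible re-pairings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue  # would create a self-loop or a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list
```

Computing the clustering coefficient and the average shortest path length on the output of `rewire` would give quantities comparable to the C_r and L_r columns of Table S2.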

Community detection

Community detection in complex networks has recently been the subject of an abundant literature, and a large number of algorithms have been proposed in the last few years. The purpose of these algorithms is to identify closely connected groups of nodes within a network. To do so, several techniques are used, such as maximizing the modularity, measuring probability flows of random walks or optimizing the local statistical significance of communities.

In this paper, we have decided to use the Infomap method proposed in [48]. Infomap finds communities by using the probability flow of random walks on the network as a proxy for information flow in the real system, and then decomposes the graph into groups of nodes among which information flows easily. As shown in [57], this method gives good results. However, to evaluate the robustness of the results, the analysis has also been performed with two other clustering methods, Oslom [58, 59] and Louvain [60]. Oslom is based on a topological approach to detect statistically significant clusters, whereas Louvain is based on modularity optimization, which means finding the partition that maximizes the density of links within clusters and minimizes the density of links between clusters.

In order to compare the partitions obtained with the different methods, we have plotted in Figures S5 and S6 the contingency tables between the partitions obtained with Infomap and Oslom, and with Infomap and Louvain, respectively, for each case study. In these figures, each plot represents a contingency table $C$ in which each element $C_{ij}$ is the number of nodes that belong both to cluster $i$ detected with Infomap and to cluster $j$ detected with Oslom or Louvain. The matrices have been normalized so that the sum of each column is equal to one. This normalization allows us to study how the nodes belonging to the groups obtained with Oslom or Louvain are distributed among the clusters found with Infomap.
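The column-normalized contingency table can be computed directly from two lists of cluster labels, one per method. The sketch below assumes that both lists are aligned node by node; the function names are hypothetical and the labels in the usage note are purely illustrative.

```python
from collections import Counter


def contingency_table(infomap_labels, other_labels):
    """Column-normalized contingency table between two partitions.

    table[i][j] is the share of the nodes of cluster j (Oslom or Louvain)
    that fall in Infomap cluster i, so that every column sums to one.
    """
    counts = Counter(zip(infomap_labels, other_labels))
    col_totals = Counter(other_labels)
    rows = sorted(set(infomap_labels))
    cols = sorted(set(other_labels))
    table = [[counts[(i, j)] / col_totals[j] for j in cols] for i in rows]
    return rows, cols, table


def dominant_shares(table):
    """Largest entry of each column: share held by the main Infomap cluster."""
    return [max(column) for column in zip(*table)]
```

With `infomap = ["R", "R", "R", "B", "B", "B"]` and `other = [0, 0, 1, 1, 2, 2]`, clusters 0 and 2 of the second partition are entirely contained in one Infomap cluster (dominant share 1.0), whereas cluster 1 is split evenly (dominant share 0.5).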
First, we can observe that the number of communities detected with Louvain or Oslom is always greater than or equal to the number obtained with Infomap. Louvain detects a similar number of clusters, whereas the number of communities detected with Oslom increases with the size of the metropolitan area, from 5 clusters for Bilbao to 12 for Madrid. However, it is worth noting that, in most cases, more than 80% of the nodes belonging to an Oslom or Louvain cluster are gathered in a single Infomap cluster. This means that, even if the sizes of the partitions differ, the clusters obtained with Louvain and Oslom are essentially sub-clusters of the clusters identified with Infomap.

Comparison with cadastral data

In order to validate the results, we compared the land use patterns obtained with our algorithm with cadastral data available on the Spanish Cadastral Electronic Site [61]. The dataset contains information about land use for each cadastral parcel of the metropolitan areas of Madrid and Barcelona (about 650,000 parcels). In particular, we have for each cadastral parcel the net internal area devoted to Residential, Business and Industrial uses (the latter being matched to the Logistics class of the algorithm). We can use these data to identify the dominant cadastral land use in each grid cell classified as Residential, Business or Logistics by the community detection algorithm. To do so, we need to define a rule to determine the dominant land use in a cell.

Figure S6. Contingency tables between the partitions obtained with Infomap and Louvain for each case study. (A) Madrid. (B) Barcelona. (C) Valencia. (D) Sevilla. (E) Bilbao. Each row represents a cluster obtained with Infomap and each column represents a cluster obtained with Louvain. The matrices have been normalized so that the sum of each column is equal to one.

Intuitively, one would identify the dominant land use in a cell as the land use class with the largest area. However, Residential use is the class with the largest area in most cells, which leads to an over-representation of Residential cells in the metropolitan area. To circumvent this limitation, we introduce two thresholds, $\delta_{Bus}$ and $\delta_{Log}$, to identify Business and Logistics cells with cadastral data. If the fraction of area devoted to Business in a grid cell is higher than $\delta_{Bus}$, the grid cell is classified as Business. Otherwise, if the fraction of area devoted to Logistics is higher than $\delta_{Log}$, the grid cell is classified as Logistics. Finally, if the fractions of area devoted to Business and Logistics are lower than $\delta_{Bus}$ and $\delta_{Log}$, respectively, the grid cell is classified as Residential.

Hence, we can adjust the values of these two thresholds in order to obtain a distribution of the fraction of cells per land use type similar to the one obtained with our algorithm.
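The decision rule above amounts to two ordered threshold tests, Business being checked first. A minimal Python sketch follows; the function name is hypothetical, and taking the sum of the three cadastral areas as the denominator of the fractions is an assumption, since the text does not specify it.

```python
def dominant_land_use(area_res, area_bus, area_log, delta_bus, delta_log):
    """Cadastral land use of a grid cell from the areas devoted to each use.

    The fractions are taken relative to the sum of the three areas (an
    assumption). Returns None for a cell without any cadastral area.
    """
    total = area_res + area_bus + area_log
    if total == 0:
        return None
    if area_bus / total > delta_bus:
        return "Business"
    if area_log / total > delta_log:
        return "Logistics"
    return "Residential"
```

For instance, with illustrative thresholds of 0.2 for both uses, a cell with areas (60, 30, 10) is labelled Business even though Residential covers the largest area, which is precisely the over-representation the thresholds are meant to correct.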
To this end, we have calibrated these parameters by minimizing the L distance between the distribution of the fraction of cells per land use type obtained with the cadastral data and the one obtained with our algorithm for the municipality of Barcelona, which represents 20% of the metropolitan area of Barcelona. In Figure S7, we can observe that the minimum is reached for $\delta_{Bus} = 0.$ and $\delta_{Log} = 0.2$. We can now use these values to identify the dominant cadastral land use in each grid cell of the metropolitan areas of Barcelona and Madrid.

Figure S7. L distance between the distribution of the fraction of cells according to the land use type (Residential, Business and Logistics) obtained with our algorithm and with the cadastral data, as a function of $\delta_{Bus}$ and $\delta_{Log}$, for the municipality of Barcelona.

We find a percentage of correct predictions equal to 65% for Madrid and 60% for Barcelona, which is consistent with the values obtained in other studies: 54% in [21] and 58% in [23]. Furthermore, for both case studies, almost all land use types have a percentage of correct predictions higher than 50% (Table S3).

Table S3. Confusion matrix of the classification for Madrid and Barcelona. For the Residential, Business and Logistics rows and columns, the value in the $i$-th row and the $j$-th column gives the percentage of grid cells classified as use $i$ by the cadastral classification which are classified as belonging to class $j$ by the algorithm. The Total is the distribution of the percentage of cells according to the land use type obtained with our algorithm (row) and the cadastral data (column) with the threshold values $\delta_{Bus} = 0.$ and $\delta_{Log} = 0.2$.

Madrid
Cadastral \ Algorithm   Residential   Business   Logistics   Total
Residential             71.23
Business
Logistics
Total

Barcelona
Cadastral \ Algorithm   Residential   Business   Logistics   Total
Residential             68
Business
Logistics
Total
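The calibration of the two thresholds and the percentage of correct predictions can be illustrated with a small grid search. The sketch below is hedged in several ways: the function names are hypothetical, the candidate grid is illustrative, and the distance between the two distributions is taken to be the sum of absolute differences, which is an assumption (another norm can be swapped in on the marked line).

```python
from collections import Counter
from itertools import product

LAND_USES = ("Residential", "Business", "Logistics")


def classify(frac_bus, frac_log, delta_bus, delta_log):
    """Threshold rule: Business first, then Logistics, else Residential."""
    if frac_bus > delta_bus:
        return "Business"
    if frac_log > delta_log:
        return "Logistics"
    return "Residential"


def distribution(labels):
    """Fraction of cells per land use type, in the order of LAND_USES."""
    counts = Counter(labels)
    return [counts[use] / len(labels) for use in LAND_USES]


def calibrate(cells, algorithm_labels, grid):
    """Grid search over (delta_bus, delta_log).

    cells holds one (frac_bus, frac_log) pair per grid cell. Returns the
    tuple (distance, delta_bus, delta_log) with the smallest distance.
    """
    target = distribution(algorithm_labels)
    best = None
    for delta_bus, delta_log in product(grid, repeat=2):
        labels = [classify(b, l, delta_bus, delta_log) for b, l in cells]
        # Assumed distance: sum of absolute differences between distributions.
        dist = sum(abs(p - q) for p, q in zip(distribution(labels), target))
        if best is None or dist < best[0]:
            best = (dist, delta_bus, delta_log)
    return best


def percent_correct(cadastral_labels, algorithm_labels):
    """Percentage of grid cells on which both classifications agree."""
    hits = sum(c == a for c, a in zip(cadastral_labels, algorithm_labels))
    return 100.0 * hits / len(cadastral_labels)
```

Note that the calibration only matches the overall proportions of land use types; the cell-by-cell agreement measured by `percent_correct` is evaluated afterwards and is not part of the objective.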

Calibration of γ

The value of γ was calibrated in order to reproduce the evolution of the entropy index as a function of the number of divisions by side obtained with the data (red line in Figure 4 in the main text). We chose the value of γ minimizing the Euclidean distance between the observed values and the average values obtained with the model over 100 replications. The best results have been obtained with the value γ = 0 .
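The calibration procedure can be sketched as a search over candidate values of γ. In the Python sketch below, `simulate(gamma, seed)` is a hypothetical stand-in for one run of the model returning the entropy index for each number of divisions by side; the model itself is not reproduced here, and the function name and signature are assumptions.

```python
import math
from statistics import mean


def calibrate_gamma(observed, simulate, gammas, replications=100):
    """Return (gamma, distance) minimizing the Euclidean distance to the data.

    observed: entropy index measured for each number of divisions by side.
    simulate: hypothetical callable (gamma, seed) -> curve of the same length.
    """
    best_gamma, best_dist = None, math.inf
    for gamma in gammas:
        runs = [simulate(gamma, seed) for seed in range(replications)]
        average = [mean(values) for values in zip(*runs)]  # mean curve
        dist = math.dist(observed, average)
        if dist < best_dist:
            best_gamma, best_dist = gamma, dist
    return best_gamma, best_dist
```

Averaging over replications before computing the distance, as described in the text, keeps the comparison from being dominated by the noise of a single run.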

Figure S8. Results obtained with different values of γ (γ = 0, γ = 0 . and γ = 1) and T = 500, . Panels A to D; vertical axis: entropy index; legend: data and the three values of γ. The 2D lattice used to represent the urban space is composed of 50 × 50 = 2,500 cells. The model seems to converge after , iterations but, to ensure convergence, all the results shown in the paper were obtained with , iterations.

Figure S9. (A-B) Geographical location of the clusters for Madrid and Barcelona. (C-D) Temporal patterns associated with the four clusters for both metropolitan areas (horizontal axis: time of day; vertical axis: proportion of mobile phone users). The Residential cluster is shown in red, Business in blue, Logistics in cyan and Nightlife in orange.

Figure S10. Panels A to F (horizontal axis: time of day; vertical axis: proportion of mobile phone users).