Information theoretic network approach to socioeconomic correlations
Alec Kirkley Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
Due to its wide-reaching implications for everything from identifying hotspots of income inequality to political redistricting, there is a rich body of literature across the sciences quantifying spatial patterns in socioeconomic data. In particular, the variability of indicators relevant to social and economic well-being between localized populations is of great interest, as it pertains to the spatial manifestations of inequality and segregation. However, heterogeneity in population density, sensitivity of statistical analyses to spatial aggregation, and the importance of pre-drawn political boundaries for policy intervention may decrease the efficacy and relevance of existing methods for analyzing spatial socioeconomic data. Additionally, these measures commonly lack either a framework for comparing results for qualitative and quantitative data on the same scale, or a mechanism for generalization to multi-region correlations. To mitigate these issues associated with traditional spatial measures, here we view local deviations in socioeconomic variables through a topological lens rather than a spatial one, and use a novel information theoretic network approach based on the Generalized Jensen-Shannon Divergence to distinguish distributional quantities across adjacent regions. We apply our methodology in a series of experiments to study the network of neighboring census tracts in the continental US, quantifying the decay in two-point distributional correlations across the network, examining the county-level socioeconomic disparities induced from the aggregation of tracts, and constructing an algorithm for the division of a city into homogeneous clusters. These results provide a new framework for analyzing the variation of attributes across regional populations, and shed light on new, universal patterns in socioeconomic attributes.
I. INTRODUCTION
Analysis of spatial data is crucial for understanding a wide variety of human systems, and with our increasing capacity to handle high resolution data sets, has found applications across the sciences [1]. From analyzing demographic polling behavior [2], to epidemic vulnerability of populations [3], to disparities in access to nutritious food [4], spatial data on social and economic attributes of populations is central to many problems in modern data-driven science. In particular, assessing the extent to which socioeconomic properties differ across regions of space is an important topic for understanding the spatial dynamics of production and consumption [5], the manifestation of segregation [6], and the spatial decomposition of inequality [7]. There has thus been extensive research to understand how socioeconomic indicators fluctuate across space, which has involved the development of many sophisticated mathematical techniques to quantify variation in spatial data.

A major challenge with these methods is determining the scale at which to probe spatial variations, noting that populations tend to disperse heterogeneously across space [8]. Extreme spatial inhomogeneity in population density causes inconsistencies in interpretability of results based solely on distance, as the pace of economic activity is largely determined by population [9], and space is primarily relevant insofar as it relates to the number of “intervening opportunities” it provides economic agents with [10]. As a result, various methods have been designed to account for heterogeneous population in the analysis of spatial data, some of which include density-equalizing maps [11] and methods based on dasymetric mapping [12]. As an additional complication, there is no a priori way to aggregate regions of space for statistical analysis (an issue that is more precisely quantified by the Modifiable Areal Unit Problem, or MAUP for short) [13].
Consequently, finding suitable scales for various problems in spatial analysis is an open problem that has received extensive interest due to the sensitivity of results to the chosen scale [14]. To make matters worse, policy interventions take place at the level of artificial government-designated boundaries, and so analysis that ignores these boundaries may be irrelevant for certain implementation-driven studies [15]. Here, we assess relationships between official boundaries (census tracts) in a network-based manner, circumventing the aforementioned spatial issues by considering topological distance rather than geographic distance, which also allows for the development of insights at the scale of regions designated for policy intervention.

Another important avenue of research investigates what measures to use to quantify spatial disparities in population data. For nominal distributions, such as race or religious affiliation, there are a wide variety of measures of qualitative variation that take the form of segregation indices between disjoint groups [16]. Some of these indices can also be used with ordinal or interval data such as population counts over income brackets or education levels [6], but few of these measures have the flexibility to accommodate all types of data on the same scale or generalize to more than two regions. Measures based on information theory can also be effectively applied to distributional socioeconomic data [17, 18], having the additional benefit of being founded in fundamental statistical principles, and allowing in some cases an extension to multiple distributions.
We develop here a novel approach based on the Generalized Jensen-Shannon Divergence (GJSD) [19] to compare distributional data, which has a number of advantages over other approaches, including flexibility for all distributional data types and an intuitive theoretical interpretation.

We note other approaches proposed to analyze spatial data using networks or information theoretic principles, as there has been similar research in regional science, economic and political geography, urban planning, and spatial analysis. There has been extensive work on using spatial network methods in urban science [20, 21], with a focus oriented mostly towards the structure of urban form and the dynamics of urban growth. Numerous methods based on spatial aggregation of neighboring regions within the context of multiscalar diversity indices have been developed to assess the spatial manifestation of diversity [22, 23], but rarely do these accommodate distributions or multiple data types. Additionally, there is a body of research constructing spatial correlation and aggregation methods based on information theoretic measures [24, 25]. However, these analyses thus far have been limited primarily to racial segregation and ecological diversity, and focus on relationships between individual regional entities within larger clusters rather than multi-distribution correlations as in this study. Additionally, these measures may not be as easily interpreted in terms of a simple statistical process, as is the case with our method, and many of these measures are not adaptable to all data types, which limits their capability in comparative analysis.

In this study, we develop a novel approach for studying spatial variation in distributional socioeconomic data based on regional adjacency networks and information theory, and apply our methodology to the network of adjacent census tracts in the continental US through a few experiments.
We first examine two-point correlations in our distributional distance measure with respect to path length across the adjacency network, finding a universal decay pattern with similar scaling exponents and finite size cutoffs across a variety of socioeconomic attributes. We also utilize this methodology to assess disparity with respect to various socioeconomic attributes across US counties by generalizing our measure to the comparison of more than two regions, finding high regional dependence and correlations in our measure for multiple variables. Finally, we discuss a new means for spatial aggregation of regions through community detection with our measures at multiple resolutions, clustering the census tract network for the city of Chicago into meaningful regions of homogeneous socioeconomic characteristics at different cluster size scales. Our methods provide a new means for analyzing spatial variation in all types of distributional data within a universal framework that circumvents limitations in traditional spatial measures. The results from our experiments point to new ways of thinking about how socioeconomic characteristics manifest across space, and can be applied to a wide range of problems across the social sciences.

II. METHODOLOGY

A. Census Tract Data and Network Construction
In order to study a wide variety of socioeconomic attributes at high spatial resolution, we utilize US Census data at the tract level from the American Community Survey in 2018 [26, 27]. The American Community Survey continuously samples US households to collect data on various socioeconomic and demographic characteristics of the population, and it is the largest survey at the household level that is conducted by the Census Bureau. We choose to analyze data at the level of census tracts because they encapsulate highly localized populations, represent officially designated regions relevant for policy intervention [28, 29], and have roughly equal populations (the 25th and 75th percentiles in terms of population are 2971 and 5572 for the set of tracts used in the analysis). We aggregate distributional data on educational attainment, house price, income, industry of occupation, and race in order to assess spatial variability across a range of different variables. The techniques we develop can be adapted for continuous distributional data, but here we use the available binned data for housing prices and incomes, leaving to future work the estimation of the full corresponding continuous distributions, as this is a difficult problem on its own [30, 31].

In order to quantify variation in the discrete distribution of a variable $X$ across tracts, we encode its possible values as a vector $q_X$, which may be nominal, ordinal, or interval in nature depending on the variable $X$ being analyzed. For census tract $i$, we denote its distributional vector of values for the variable $X$ as $q^{(i)}_X$, with the particular value for an entry $x$ denoted $q^{(i)}_X(x)$. These tract distributional vectors are normalized, and satisfy $\sum_x q^{(i)}_X(x) = 1$ for all tracts $i$ and variables $X$, making $q^{(i)}_X(x)$ a probability mass function over realizations $x$ of $X$. For example, if in census tract 5 there are 300 persons classified as Asian out of 1000 total persons, then $q^{(5)}_{\mathrm{race}}(\mathrm{Asian}) = 0.30$. Details on the variables analyzed are given in Table I.

The nearest-neighbor network representation for census tracts is constructed utilizing the TIGER shapefile data [32], and two tracts are neighbors in the network if they share a common length of border or a corner. Only tracts in the continental US were considered for this analysis in order to ensure a single connected component for two-point correlation analyses. After removing tracts with corrupted or incomplete data, the final network had 70,201 nodes and 197,841 edges (for an average degree of 5.64). The distribution of tract land areas is right-skewed, with the tracts in the 10th percentile, median, and 90th percentile having areas of […], 6.[…], and 269.[…], respectively. If we consider the set of tracts kept in the filtered dataset, and construct their (potentially incomplete) associated counties, the distribution of land areas is also right-skewed, with the counties in the 10th percentile, median, and 90th percentile having areas of 953.[…], 1911.[…], and 4863.[…], respectively. The high level of heterogeneity we see in the land area statistics at both the tract and county level further illustrates the utility of an approach to socioeconomic correlations that is independent of spatial scale, as administratively equivalent regions clearly can have drastically different sizes.

B. Generalized Jensen-Shannon Divergence
Due to its desirable properties as a distributional distance measure, which we discuss in more detail below, the Generalized Jensen-Shannon Divergence (GJSD) has gained popularity for applications across disciplines, from quantum physics [33], to genomics [34], and even to history [35]. For our purposes, the GJSD will allow us to distinguish distributional data across census tracts in a meaningful way, which can be understood in terms of the following process.

Suppose we have two spatial regions, region 1 and region 2, and we want to determine how similar these regions are with respect to a socioeconomic variable $X$. We assume that their respective populations $n_1$ and $n_2$ are known, as well as the distributions $q^{(1)}_X(x)$ and $q^{(2)}_X(x)$ defined in Section II A. One way to think about how the populations in regions 1 and 2 differ in their composition of the attribute $X$ is to consider the situation where there was no artificial line drawn between regions 1 and 2, and instead we had just decided to consider them one single “super-region”. We can then ask the question: How different is the distribution of $X$ across the population in this super-region than in its individual sub-regions? Rather than naively comparing the distributions $q^{(1)}$ and $q^{(2)}$ directly, this perspective accounts for the population difference between the regions, and will also allow us to address in a natural way the increase in regional homogeneity we get by separating these regions.

From an information theoretic perspective, we can quantify the homogeneity of attribute $X$ within a population by its average information content (or surprisal), in other words how unpredictable it is. For instance, if a population has relatively equal fractions of people from each race, it is difficult to guess what any given person’s race is, and the amount of “information” we gain by finding out each person’s race is relatively high on average. However, if nearly
However, if nearlyeveryone is of a single race, it is very easy to guess an individual’s race, and we are on average very “unsurprised”upon each discovery of the race of a randomly selected individual in this population. For our thought experiment, wecan determine the homogeneity gain we achieve by separating regions 1 and 2 by computing how much the averageinformation content of attribute X in the population is reduced after the split of the super-region.The average information content of a random variable with probability distribution q ( x ) is given by its entropy, H [ q ( x )], where H is the Shannon entropy functional H [ q ( x )] = − (cid:88) x q ( x ) log q ( x ) (1)and log q ( x ) is the information content of an observation of x [36]. Thus, the average information content of attribute X in the super-region population is given by H (cid:104) Q (12) X ( x ) (cid:105) = − (cid:88) x Q (12) X ( x ) log (cid:16) Q (12) X ( x ) (cid:17) , (2)where Q (12) X ( x ) = n n + n q (1) X ( x ) + n n + n q (2) X ( x ) (3)is the empirical probability mass function of X in the super-region. Now, if regions 1 and 2 are split, then wecan associate to any individual in the super-region a label i = 1 , X on average. Then, in a random experiment to survey thesame population about X , we would know the region i that each person we sample is from, and thus the informationcontent associated with each observation we make is log q ( i ) X ( x ) rather than log Q (12) X ( x ). The average informationcontent H (cid:48) (cid:104) Q (12) X ( x ) (cid:105) of X after the regional split is then given by the weighted average H (cid:48) (cid:104) Q (12) X ( x ) (cid:105) = n n + n H [ q (1) X ( x )] + n n + n H [ q (2) X ( x )] , (4)and the reduction in average information content from splitting the regions is given by the difference J (12) X = H (cid:104) Q (12) X ( x ) (cid:105) − H (cid:48) (cid:104) Q (12) X ( x ) (cid:105) . 
(5)Generalizing our argument to the merging of m ≥ m adjacent regions is given by J (1 ,...,m ) X = H (cid:104) Q (1 ,...,m ) X ( x ) (cid:105) − m (cid:88) k =1 π k H [ q ( k ) X ( x )] , (6)where π k = n k (cid:80) mk (cid:48) =1 n k (cid:48) (7)(with n k the population of region k ) and Q (1 ,...,m ) X ( x ) = m (cid:88) k =1 π k q ( k ) X ( x ) . (8)We can recognize now that Eq. 6 is equivalent to the Generalized Jensen-Shannon Divergence (GJSD), which issometimes referred to as just the
Jensen-Shannon Divergence for m = 2 [19].Intuitively, if the distributions { q ( k ) } are all very similar, knowing which region that a person is from does notreduce our uncertainty about their value of X by much, and J (1 ,..,m ) will be close to 0. On the other hand, if the { q ( k ) } are relatively different, then we can reduce our uncertainty about a person’s value of X by a lot by knowingwhich region k they are from, and J (1 ,...,m ) will be higher.We know that Eq. 6 is bounded below by 0 due to the concavity of entropy, and this minimum is achieved when q ( k ) = q ( k (cid:48) ) for all k, k (cid:48) , as merging the regions does not change our uncertainty about a person’s value of X at all.On the other hand, the maximum value J (1 ,..,m ) max that Eq. 6 can take is J (1 ,...,m ) max = − m (cid:88) k =1 π k log π k , (9)which happens when the { q ( k ) } are entirely non-overlapping in their regions of non-zero probability. We can see thatthis is the upper bound by rewriting Eq. 6 in a more illuminating manner as J (1 ,...,m ) X = (cid:88) x,k π k q ( k ) ( x ) log (cid:20) q ( k ) ( x ) (cid:80) l π l q ( l ) ( x ) (cid:21) , (10)and noting that log (cid:104) q ( k ) ( x ) (cid:80) l π l q ( l ) ( x ) (cid:105) ≤ log (cid:104) π k (cid:105) , with the equality holding when q ( l ) ( x ) = 0 for all l (cid:54) = k , which isequivalent to the q ’s having disjoint nonzero support. Eq. 9 is just the average uncertainty we have about whichsmaller region k a randomly chosen person from the super-region will come from.We normalize Eq. 6 by the upper bound in Eq. 9 to enforce values to lie in [0 , n k . The final expression we use for distributional comparison acrossregions is then L (1 ,...,m ) X = J (1 ,...,m ) X J (1 ,...,m ) max . (11)This measure is easily adapted to any discrete variable X , which can be nominal, ordinal, or interval in nature, allowingfor the application of Eq. 11 to a wide variety of problems. 
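As a concrete illustration of Eqs. 6 to 11, the following minimal sketch computes the normalized divergence for a set of regions from their probability mass functions and populations. The function names `entropy` and `normalized_gjsd` are hypothetical (they do not come from the paper), natural logarithms are assumed (the base cancels in the ratio of Eq. 11), and returning 0 when the upper bound of Eq. 9 vanishes is a convention chosen here for the degenerate single-region case.

```python
import math


def entropy(p):
    """Shannon entropy (Eq. 1) of a probability mass function, natural log."""
    return -sum(v * math.log(v) for v in p if v > 0)


def normalized_gjsd(dists, pops):
    """Normalized GJSD (Eq. 11) for regions with PMFs `dists` and populations `pops`.

    `dists` is a list of equal-length lists that each sum to 1; `pops` holds
    the corresponding region populations n_k.
    """
    total = sum(pops)
    weights = [n / total for n in pops]  # pi_k, Eq. 7
    n_bins = len(dists[0])
    # Population-weighted mixture distribution of the merged region, Eq. 8.
    mixture = [sum(w * q[x] for w, q in zip(weights, dists)) for x in range(n_bins)]
    # Entropy of the mixture minus weighted entropies of the parts, Eq. 6.
    j = entropy(mixture) - sum(w * entropy(q) for w, q in zip(weights, dists))
    j_max = entropy(weights)  # upper bound, Eq. 9
    # Convention assumed here: a single region (j_max == 0) has zero divergence.
    return j / j_max if j_max > 0 else 0.0
```

Under these assumptions, two regions with identical distributions give a value of approximately 0, and two regions with disjoint support give a value of approximately 1, regardless of their populations.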
It can also be adapted to continuous distributions through approximations of the differential entropy. We note that for ordered data, Eq. 11 is only sensitive to how much the probability mass changes between distributions of interest, not to where it moves. In this sense, there are other appealing measures for comparing ordered data, such as variants of the earth-mover’s distance [37]. However, Eq. 11 has a major advantage over such previous measures in that it can be used to compare results across all types of distributional data on the same scale, and can also accommodate the inclusion of more than two distributions for comparison. In the following section, we perform multiple experiments on the tract adjacency network using Eq. 11, demonstrating new insights on spatial socioeconomic variability that can be gained through our methodology.

III. RESULTS

A. Two-point Correlations in $L_X$

Two-point correlation functions (a term used to refer generically to functions that measure some type of average correlation between points in a system as a function of the distance between them) are an invaluable tool for describing spatial data for systems as diverse as galaxy clusters [38], turbulent fluids [39], and earthquakes [40]. In more recent work, the concept of the two-point correlation function has been extended to networks [41–43], where it refers to computing correlations between the properties (in most cases, degree) of two nodes as a function of the shortest path distance between them.

Here, in order to assess the “scale” at which socioeconomic properties vary across the US, we compute a two-point correlation function for $L_X$ (Eq. 11) between census tracts as a function of the number of network hops between them. The effective distance we are concerned with is then consistent with policy-relevant boundaries and roughly accounts for the heterogeneous population density across space (as tracts have relatively similar populations as discussed earlier).
In other words, the total population of neighbors at path distance $l$ or less from a focal tract is roughly the same for all tracts, as the degree distribution of the analyzed network is highly homogeneous, as is characteristic of spatial networks in general.

In our case, the two-point correlation function $C_X(l)$ for socioeconomic attribute $X$ as a function of (unweighted) network geodesic distance $l$ is given by $C_X(l) = \frac{1}{n(l)} \sum_{i}$ […] 000 uniformly sampled focal tracts $i$, then computing the sum in Eq. 12 over sampled focal tracts $i$ and traversed nodes $j$. A network distance of $l = 20$ corresponds to a spatial distance of roughly 200 km, varying depending on the location of the central tract, and so captures spatial regions roughly of size 160,000 km$^2$ (or about 2% of the land area of the continental US). Using this distance cutoff thus restricts our analysis to relatively concentrated regions, which may be more relevant for spatially targeted policy interventions.

In order to examine the scale over which correlations in each attribute decay, we analyze how quickly $C_X(l)$ approaches its asymptotic value $C_X(\infty)$ from its initial value $C_X(1)$ as we increase $l$. $C_X(\infty)$ is estimated as the average value of $L_X$ over 10,000 tract pairs selected uniformly at random (which should draw primarily from node pairs at distances much greater than $l = 20$ based on the network structure). Taking inspiration from the form of two-point correlations in spin systems, we can then fit the resulting data to the truncated power-law form

$$\tilde{C}_X(l) = l^{-\alpha} e^{-(l-1)/\beta}, \tag{13}$$

where

$$\tilde{C}_X(l) = \frac{C_X(\infty) - C_X(l)}{C_X(\infty) - C_X(1)}, \tag{14}$$

and we have rescaled $C_X \to \tilde{C}_X$ to account for the intercepts at $l = 1$ and $l = \infty$.

The scaling exponent $\alpha$ in Eq. 13 quantifies the rate of decay in correlation in the system as a function of distance (network hops), and the cutoff exponent $\beta$ determines the distance scale (in terms of hops) over which correlation persists.
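The sampling procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: it assumes $C_X(l)$ is the average of a pairwise quantity over (focal tract, reached tract) pairs at hop distance exactly $l$, found by breadth-first search. The names `hop_distances`, `two_point_correlation`, `rescale`, and the `pair_value` callback (standing in for $L^{(ij)}_X$) are hypothetical.

```python
from collections import deque


def hop_distances(adj, source, max_hops):
    """Breadth-first search returning {node: hop distance} up to max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the distance cutoff
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def two_point_correlation(adj, focal_nodes, pair_value, max_hops):
    """Average pair_value(i, j) over pairs at each hop distance l = 1..max_hops."""
    sums = [0.0] * (max_hops + 1)
    counts = [0] * (max_hops + 1)
    for i in focal_nodes:
        for j, l in hop_distances(adj, i, max_hops).items():
            if l >= 1:
                sums[l] += pair_value(i, j)
                counts[l] += 1
    return {l: sums[l] / counts[l] for l in range(1, max_hops + 1) if counts[l]}


def rescale(c, c_inf):
    """Rescaled correlation of Eq. 14, given C(l) and an estimate of C(infinity)."""
    return {l: (c_inf - v) / (c_inf - c[1]) for l, v in c.items()}
```

Here `adj` is an adjacency mapping from each tract to its neighbors, and `c_inf` would be estimated separately, for instance from randomly chosen tract pairs as described in the text.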
A higher (more positive) value of $\alpha$ indicates a faster decay in correlations as we move away from a given tract, and a higher value of $\beta$ indicates a longer distance over which tracts have correlated distributions with this reference tract. To extract the exponents $\alpha$ and $\beta$, the following OLS fit is performed:

$$\log \tilde{C}_X(l) = -\alpha \log l - \frac{l - 1}{\beta} + \epsilon_l, \tag{15}$$

with $\epsilon_l$ a white noise process.

We plot the results of the fit in Fig. 1A, where we show the coefficient of determination $r^2$, the scaling exponent $\alpha$, and the cutoff exponent $\beta$ for the fit in Eq. 15 for each attribute. We can see that the curves for all attributes (apart from “industry”, which, due to autocorrelated residuals, is suspected to follow a different decay form that we will not explore here) collapse quite well onto each other. This collapse is not only an indication of a good fit, but can possibly lead us to consider a more fundamental, attribute-independent mechanism behind the variation of different attributes $X$ across regions, which we will discuss at the section’s closing.

To investigate a potential consequence of the striking similarity in the decay of $\tilde{C}_X(l)$ across attributes $X$ studied in Fig. 1A, we examine the correlations between the losses $L^{(ij)}_X$ and $L^{(ij)}_{X'}$ across edges $(i, j)$ for all pairs of attributes $(X, X')$. We analyze the monotonic dependence between losses using Spearman correlation, which relaxes the linearity assumption of Pearson correlation while still allowing us to test for the significance of observed correlations [44]. Specifically, we compute

$$\rho\left(\{L^{(ij)}_X : (i, j) \in E\},\ \{L^{(ij)}_{X'} : (i, j) \in E\}\right), \tag{16}$$

where $E$ is the set of edges in the adjacency network, $\rho$ is the Spearman correlation coefficient, and the arguments to $\rho$ describe the vectors of measurements being correlated.
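The regression in Eq. 15 has no intercept and is linear in $\alpha$ and $1/\beta$, so it can be solved directly from the 2×2 normal equations. The sketch below is one way to do this in plain Python; the function name `fit_truncated_power_law` is hypothetical, and skipping non-positive values of $\tilde{C}_X(l)$ (whose logarithm is undefined) is an assumption made here rather than a step stated in the text.

```python
import math


def fit_truncated_power_law(c_tilde):
    """OLS fit of log C~(l) = -alpha*log(l) - (l-1)/beta, with no intercept.

    `c_tilde` maps hop distance l to the rescaled correlation (Eq. 14).
    Returns the pair (alpha, beta).
    """
    # Regressors x1 = -log(l), x2 = -(l - 1); response y = log C~(l).
    rows = [(-math.log(l), -(l - 1.0), math.log(v))
            for l, v in c_tilde.items() if v > 0]
    # Normal equations for y = a*x1 + b*x2 with a = alpha and b = 1/beta.
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    alpha = (s1y * s22 - s2y * s12) / det
    inv_beta = (s11 * s2y - s12 * s1y) / det
    return alpha, 1.0 / inv_beta
```

On noiseless synthetic data generated from Eq. 13, this fit recovers the generating exponents up to floating-point error; on real data, confidence intervals and a residual analysis would still be needed.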
We plot the results as a correlation matrix in Fig. 1B, where we can see relatively high correlations between most of the variables analyzed. The high correlations we see are consistent with associations seen in a multitude of previous economic and sociological studies [45–48], although our framework has the added benefit of utilizing a single unified formalism to analyze all these associations. However, to get at underlying universal mechanisms behind observed socioeconomic data, we must go beyond solely demonstrating statistical associations between variables. The correlations seen in Fig. 1B may actually just be an artifact of a more fundamental process determining the decays in Fig. 1A, and we can make some headway in uncovering this process (or processes) using techniques inspired by urban scaling.

Traditional urban scaling posits that a wide variety of characteristics $Y$ in a city can be predicted solely by the city’s population $P$ through relations of the form $Y \sim P^{\eta}$ for some exponent $\eta > 0$, which in practice holds up for a large number of cities and variables of interest [9]. The success of the urban scaling theory relies on a few key factors that are associated with a growing city population: denser organization of facilities and infrastructure, an accelerated pace of life, and increased interaction between agents and activities leading to specialization and innovation [49]. In practice, the data $Y$ for some city-level characteristic (such as new patents or number of gas stations) is fit versus city population $P$ for many different cities, yielding an estimate for the exponent $\eta$ which we can interpret to gain an understanding of the fundamental processes contributing to the scaling behavior of $Y$. For instance, if $\eta > 1$, $Y$ grows superlinearly with $P$, which should be the case for quantities $Y$ that show increasing returns with population (e.g. indicators of innovation such as new patents).
On the other hand, $\eta < 1$ corresponds to sublinear scaling, which should be the case for quantities $Y$ that decrease in unit cost as we increase the city’s population (e.g. mobility-related infrastructure such as number of gas stations). Perhaps the most important takeaway from traditional urban scaling analysis is that when we can collapse the behavior of a wide range of seemingly different socioeconomic systems into a single framework with few parameters, these parameters can help us understand basic universal processes underlying these superficially distinct variables.

We can use a similar process to interpret the results of Fig. 1A, except rather than the absolute quantity of a socioeconomic indicator, we are analyzing correlations between distribution-valued quantities, and the fundamental covariate here is network distance $l$ instead of population $P$. Based on their homogeneous populations and degrees, the total population in tracts at path distance $l$ or less from a focal tract is very similar across tracts, and so $l$ reflects the total population included as we encircle a focal tract at greater and greater radii. As space is a factor for socioeconomic processes mainly in that it provides a medium for interaction among people [50], this distance $l$ may be a more fundamental quantity than standard spatial distance in how it determines socioeconomic activity, and so we may be able to explain the spatial variation in a wide variety of socioeconomic variables using simple functions of $l$ such as Eq. 13. An alternative quantity to $l$ could be derived from literally transforming space based on population to homogenize the population density, a concept which has inspired numerous interesting and informative mapping methods [11].
However, we are ultimately constrained by the basic spatial units designated for data aggregation (e.g. census tracts), and so here we treat these regions, hence $l$, as fundamental.

In the present case, we can see that the exponents $\beta$ determining the network correlation cutoff scale are very similar for education, housing, income, and race, indicating that correlations in these regional distributions are non-negligible over a universal distance scale of ∼30 hops. However, we see higher variation in the scaling exponents $\alpha$, with race and housing decaying at a slower rate across the network than education and income. This suggests that perhaps the mechanisms that drive spatial differences in racial composition and local real estate values operate over larger distances than the mechanisms behind income or educational variability, at least in the US.

The association between the spatial distributions of housing values and racial groups has been noted in numerous studies that address “redlining” and other processes that result in lower property values in neighborhoods with high minority populations [51]. The analyses in Fig. 1A may point to additional, more subtle mechanisms behind this inequality due to a significant difference in the scaling exponents for housing and race, as this observed discrepancy indicates that the scales over which housing and racial regional similarity decay are quite different. It is known that home values are also tied to local incomes, which in turn can result in high variability in housing prices due to the relative flexibility of wages and mobility of workers compared to supply-regulated housing [52]. Therefore, perhaps the interplay between the long-range correlated racial composition of the population and the comparatively short-range correlated income distributions plays a role in determining the moderate decay exponent $\alpha$ we see in the data. However,
However,more definite conclusions and practical intervention strategies require a more contextualized analysis in conjunctionwith domain expertise. B. County-level Heterogeneity To examine the regional diversity of a given socioeconomic variable, we employ Eq. 11, this time to all the tractscomprising each county within our dataset. More specifically, for each county we examine, we compute L ( t ,t ,... ) X with t i the census tracts within the county subdivision and X the variable of interest. For notational convenience,we will use the notation L ( county ) X from now on for this quantity. In order to compare counties with varying numbersof constituent tracts on the same scale, we normalize for potential biases from the number of included tracts byusing a bootstrapping procedure to compute z-scores for each county-level value L ( county ) X . To do this, for all countysizes (number of constituent tracts) k we compute the vectors µ k and σ k , which are the sample mean and standard e du c a t i o n h o u s i n g i ndu s tr y i n c o m e r a cee du c a t i o nh o u s i n g i ndu s tr y i n c o m e r a ce BA FIG. 1: Universal patterns in tract similarity across attributes. (A) Fit results for the two-point correlation functions C X ( l ) for attributes X in Table I, with 95% confidence intervals around the scaling and cutoff exponents α and β . The line y = x is plotted for reference, as a perfect scaling collapse maps all points onto this line. Eq. 15 is deemed a poor fit for˜ C industry after a residual analysis, and so this result is omitted. (B) Spearman correlation matrix with respect to losses L ( ij ) X across all edges ( i, j ), for all pairs of socioeconomic attributes utilized in our study. All correlations are highly statisticallysignificant at the 1% significance level, with standard errors of ∼ . deviation of L ( t ,...,t k ) X over 100 random samples of k tracts t , ..., t k . 
Then, we calculate a standardized version of Eq. 11, $\tilde{L}$, for each county using

$$\tilde{L}^{(\mathrm{county})}_X = \frac{L^{(\mathrm{county})}_X - \mu_{|\mathrm{county}|}}{\sigma_{|\mathrm{county}|}}, \tag{17}$$

where $|\mathrm{county}|$ is the number of tracts within the county. We will refer to Eq. 17 as a “disparity” measure, as higher values of $\tilde{L}^{(\mathrm{county})}_X$ indicate higher dissimilarity in a county’s tract-level distributions of $q^{(i)}_X$ relative to what is expected in a randomized null model where the county’s tracts are chosen at random. In practice, we will see that $\tilde{L}$ tends to be negative for most counties, and in this case we should note that values of greater magnitude indicate higher similarity in the county-aggregated tracts than expected by chance.

As a first step in understanding county-level disparities across the US, we plot the distribution of $\tilde{L}^{(\mathrm{county})}_X$ over all counties for each socioeconomic attribute $X$ in Fig. 2A as box-and-whisker plots. We can see that the distributions of all quantities tend strongly towards negative values, indicating that most counties have greater similarity in their tract-level distributions $q^{(i)}_X$ than expected in the null model. This is consistent with the spatial autocorrelation at short scales we see in socioeconomic variables in Fig. 1A, although these analyses in some sense provide a complementary viewpoint. Here, rather than assessing the scales over which distributions of socioeconomic characteristics remain similar as in Fig. 1, we are examining whether artificially drawn administrative boundaries are effective at capturing the homogeneity in these attributes. As counties have size scales much smaller than the area covered up to the typical correlation cutoff scale $l \sim 30$ from any reference tract, we expect that correlations between tract-level distributions will be relatively high within counties, and so in this sense these results should be unsurprising.

Looking at Fig.
2A, we do see something perhaps unexpected, though: many counties are only slightly more (and sometimes less) homogeneous in their tract-level distributional data than we would expect by chance. In particular, most of the values of $\tilde{L}^{(\mathrm{county})}_{\mathrm{race}}$ are in the interval [ − , … $\tilde{L}^{(\mathrm{county})}_X$ across counties, we plot the corresponding Spearman correlation matrix using the results from all counties studied. Similarly to Eq. 16, we compute

$$\rho\left( \{ \tilde{L}^{(c)}_X : c \in \mathrm{counties} \},\ \{ \tilde{L}^{(c)}_{X'} : c \in \mathrm{counties} \} \right). \tag{18}$$

The Spearman correlation matrix in Eq. 18 is shown in Fig. 2B, where we can see very high correlations between the within-county disparities, even higher than those for the values of $L^{(ij)}_X$ shown in Fig. 1B. These correlations are similar in sign and relative magnitude (between attributes $X$) to those in Fig. 1B, but by aggregating tracts at the county level rather than just assessing correlations over edges, we are effectively reducing noise by smoothing out local fluctuations, and so we see a major increase in the values of $\rho$. In other words, some individual edges $(i, j)$ may have very different divergences $L^{(ij)}_X$ and $L^{(ij)}_{X'}$, but the effect of these outlier pairwise relationships is reduced when looking at distributions between tracts at the county level. This noise reduction is only possible because, as discussed, the scale at which we are analyzing $\tilde{L}^{(\mathrm{county})}_X$ is smaller than the area associated with the correlation cutoff scales $\beta$ found in Fig. 1A.

Finally, as a case study to visualize the geographic manifestation of these county-level disparities, we plot a heatmap of $\tilde{L}^{(\mathrm{county})}_{\mathrm{housing}}$ across all counties studied in Fig. 2C. Here we can immediately see an interesting pattern: the county-level disparity in housing prices, when compared to the same number of randomly selected regions, is actually much lower along the coasts and in metropolitan areas than it is elsewhere.
Housing markets in coastal and metropolitan regions are typically seen as having high inequality due to the large variation in home and land values often seen in these areas [53, 54]. However, when assessed on a relative scale using distributions at the granularity of census tracts, we see a different story: these coastal and metropolitan counties actually have quite similar distributions $q^{(i)}_{\mathrm{housing}}$ across their constituent tracts $i$ relative to more inland and rural counties. The primary reason for this may be that the heterogeneity in housing prices in dense, urban counties is primarily manifested at scales below our measurement precision: tracts themselves have house price distributions with high variance, but tracts in a given county tend to have relatively similar distributions. This is consistent with the low rate of spatial decay in housing correlations seen in Fig. 1A, as most tracts are urban [55], and if most of the fluctuations persist at scales smaller than census tracts, we will see a relatively smooth correlation trend at larger scales. Due to the coarse binning of housing values, however, there is also a potential confounding factor here in that many of the home prices in expensive metropolitan and coastal regions fall into the highest bin in the corresponding census data (> \$1,000,000 …

FIG. 2: County-level distributional disparity. (A) Distribution of $\tilde{L}_X$ for all measures $X$, showing trends towards negative values indicating lower within-county disparity than expected by chance in a randomized null model. Whiskers extend to the 5th and 95th percentiles. (B) Spearman correlation between county-level disparity measures $\tilde{L}^{(\mathrm{county})}_X$ for all pairs of attributes, showing a high positive association between all pairs of variables. All correlations are highly statistically significant at the 1% significance level, with standard errors of ∼ . (C) Housing price disparities $\tilde{L}^{(\mathrm{county})}_{\mathrm{housing}}$ for all counties in the continental United States. Values of $\tilde{L}$ indicate the degree to which counties’ within-county distributional similarity in housing values differs from expectation (with more negative values associated with very high within-county similarity compared to expected).

C. Regional Clustering

As a final experiment using our measures, we detect communities at multiple size scales in the census tract subnetwork within the city of Chicago (a frequently used case study in socioeconomic diversity due to its rich history and abundance of available data [56, 57]), with the goal of constructing clusters that are relatively homogeneous with respect to each socioeconomic attribute. Optimal aggregation of spatial regions according to various criteria has been a longstanding problem of interest, and numerous approaches have been proposed to tackle this using networks with edges weighted by an attribute representing regional similarity. This approach has the added benefit that, since community detection algorithms look for connected clusters of nodes, the clusters detected naturally tend to be contiguous, and thus relevant for spatially localized policy. Attributes used in previous studies include phone calls between regions [58], commuting flows [59], taxi trips [60], and similarity between individual within-region features, as in our own method [61].

In order to group the tract network into clusters that exhibit homogeneity with respect to attribute $X$, we use $L^{(ij)}_X$ to construct edge weights $w_{ij}$ for the network prior to performing community detection. However, we cannot use $L^{(ij)}_X$ for edge weights directly, as community detection algorithms typically associate higher edge weight with higher node similarity, and $L^{(ij)}_X$ is structured so that lower values indicate greater similarity across an edge $(i, j)$.
We thus employ a common transformation from the machine learning literature [62], which is to use an exponential kernel to map the values $L^{(ij)}_X$ to their associated edge weights $w_{ij}$ in the network. The weight transformation can be written as

$$w_{ij} = e^{-\omega L^{(ij)}_X}, \tag{19}$$

where $\omega > 0$ is a tunable parameter. When $\omega \approx 0$, $w_{ij} \approx 1$ for all edges, whereas when $\omega \gg 1$ there is a large gap in weight between edges with low $L^{(ij)}_X$ and edges with higher $L^{(ij)}_X$. Any kernel mapping the unit interval to decreasing non-negative reals would suffice to construct the weights $w_{ij}$, but we opt for the exponential function here because it is particularly simple and has only one tunable parameter. For the experiments shown, we find a middle ground between the two extremes presented for $\omega$, for each attribute-based clustering choosing a value of $\omega$ that results in a relatively uniform distribution of edge weights across $[0, 1]$: for each attribute $X$ we numerically approximate the $\omega$ that maximizes the entropy of the associated distribution of edge weights $e^{-\omega L^{(ij)}_X}$. A more principled method for choosing $\omega$ based on the application of interest is left to future work, but here we use this simple statistical procedure to avoid falling into one of the two cases presented, where there is either no differentiation in the edge weights or only a handful of edges matter.

In order to detect communities in the Chicago subnetwork, we aim to find the configuration of node communities $\vec{c} = \{c_i\}$ in the subnetwork such that the weighted modularity $Q_\gamma(\vec{c})$ is approximately optimized. The modularity $Q_\gamma(\vec{c})$ that we use here is defined by

$$Q_\gamma(\vec{c}) = \frac{1}{W} \sum_{ij} \left[ \gamma w_{ij} - \frac{s_i s_j}{W} \right] \delta_{c_i, c_j}, \tag{20}$$

where $W$ is the sum of edge weights in the network, $s_i = \sum_k w_{ik}$ is the sum of weights of edges attached to node $i$, and $\gamma$ is a resolution parameter [63]. When $\gamma = 1$, Eq.
20 reduces to the traditional notion of weighted modularity, but varying $\gamma \neq 1$ allows us to choose the importance given to $w_{ij}$ relative to $s_i s_j / W$ (which is the approximate expected weight of $w_{ij}$ under random rewiring). In particular, the larger we make $\gamma$, the more importance is given to the observed edge weights relative to the expected weights, and the communities in the configurations $\vec{c}$ that maximize Eq. 20 will be larger. Thus, by varying $\omega$ we can tune how much influence differences in $L^{(ij)}_X$ across edges have, and by varying $\gamma$ we can determine the characteristic cluster size. We use the Louvain algorithm [64], a greedy optimization method, to find the configuration $\vec{c}$ that approximately maximizes Eq. 20. There are numerous viable alternative methods, but here we opt for the Louvain algorithm as it is fast and straightforward to implement. It is also important to note that we can perform regional aggregation with $L^{(ij)}$ in a manner where clusters are not likely to be contiguous, for instance by constructing a matrix from all pairwise values of $L^{(ij)}$ and applying one of various matrix-clustering techniques [65]. However, here we are interested in constructing contiguous clusters of tracts in order to coarse-grain the city into zones relevant for spatially targeted policy intervention, and so we use community detection to encourage contiguity of the clusters.

In Fig. 3 we show the results of our community detection analysis for the Chicago census tract subnetwork. In Figs. 3A-3C, we show the clusters obtained for edge weights constructed using $L^{(ij)}_{\mathrm{income}}$, at various resolutions $\gamma$. We can observe that increasing $\gamma$ gives a coarser view of the socioeconomic clusters present in the city, and can allow for delineation of super-regions at a desired scale.
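The edge weighting of Eq. 19, the entropy-based choice of $\omega$, and the objective of Eq. 20 can be sketched as follows. This is a hedged illustration rather than the author's code: all function names are hypothetical, the 10-bin histogram and the grid search over candidate $\omega$ values are assumptions about how "maximizing the entropy of the edge weight distribution" might be approximated, and $W$ is taken here as the sum of $w_{ij}$ over ordered pairs $(i, j)$ so that $\gamma = 1$ recovers the usual weighted modularity. The Louvain optimization itself is not reproduced; the sketch only evaluates Eq. 20 for a given partition.

```python
import math
from collections import Counter

def edge_weights(losses, omega):
    """Eq. 19: map each loss L_ij in [0, 1] to a weight exp(-omega * L_ij)."""
    return {edge: math.exp(-omega * loss) for edge, loss in losses.items()}

def weight_entropy(weights, n_bins=10):
    """Shannon entropy of a histogram of the weights over [0, 1].

    The number of bins is an assumption made for this sketch.
    """
    counts = Counter(min(int(w * n_bins), n_bins - 1) for w in weights)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def choose_omega(losses, grid):
    """Return the candidate omega whose weight histogram has the highest entropy."""
    return max(grid, key=lambda om: weight_entropy(edge_weights(losses, om).values()))

def modularity(weights, community, gamma=1.0):
    """Eq. 20 for an undirected graph whose edges (i, j) are each listed once.

    Assumption: W is the sum of w_ij over ordered pairs, i.e. twice the
    sum of the listed edge weights.
    """
    strength = Counter()
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    W = sum(strength.values())
    # Observed term: every within-community edge contributes as (i, j) and (j, i).
    observed = sum(2 * w for (i, j), w in weights.items()
                   if community[i] == community[j])
    # Expected term: sum of s_i * s_j over ordered pairs in the same community.
    per_community = Counter()
    for node, s in strength.items():
        per_community[community[node]] += s
    expected = sum(s * s for s in per_community.values()) / W
    return (gamma * observed - expected) / W

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
             (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}
```

On this toy graph the two-triangle partition scores higher than a single cluster at $\gamma = 1$, while at $\gamma = 5$ the single cluster scores higher, which illustrates the statement that larger $\gamma$ favors larger communities under this form of the objective.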
We also show the officially designated neighborhood boundaries (thick black lines) for Chicago ( https://data.cityofchicago.org/ ) in order to visualize the consistencies and inconsistencies between our clusters and these officially delineated regions. We can see that in the intermediate regime $\gamma \sim 0.1$, our clusters are consistent with some neighborhood boundaries, but deviate significantly from others. This suggests that the officially designated regions are somewhat consistent with homogeneous socioeconomic clusters, but there is room for improvement to these boundaries if the goal is to delineate socioeconomically homogeneous zones within the city (at least regarding income). Of course, there are numerous other factors, both socioeconomic and geographic, that would need to be accounted for in addition to the factors we analyze in order to draw effective policy-relevant boundaries in practice.

We also compute the Adjusted Mutual Information (AMI) between clusters obtained using different attributes $X$, as well as the official neighborhood clusters, in order to assess the consistency in the groups we obtain when considering these different factors. The Mutual Information $MI(\vec{c}_1, \vec{c}_2)$ is the amount of shared information (in an information theoretic sense) between the clusterings $\vec{c}_1$ and $\vec{c}_2$, or more intuitively, the statistical uncertainty in each independent clustering minus the statistical uncertainty when combined. More specifically, we have that

$$MI(\vec{c}_1, \vec{c}_2) = H[\vec{q}_1] + H[\vec{q}_2] - H[\vec{q}_{12}] = \sum_{s,t} \vec{q}_{12}(s, t) \log \left[ \frac{\vec{q}_{12}(s, t)}{\vec{q}_1(s)\, \vec{q}_2(t)} \right], \tag{21}$$

where $\vec{q}_1(s)$ is the fraction of nodes put into cluster $s$ under configuration $\vec{c}_1$ (and similarly for $\vec{q}_2$), and $\vec{q}_{12}(s, t)$ is the fraction of nodes put into group $s$ under configuration $\vec{c}_1$ and $t$ under configuration $\vec{c}_2$.
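As a small illustration of Eq. 21, the mutual information between two clusterings can be computed directly from label counts. This is a minimal sketch with hypothetical function names; it assumes each clustering is given as a list of cluster labels, one per node and in the same node order, and it uses the natural logarithm.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy H[q] of the cluster-size fractions of a clustering."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(c1, c2):
    """Eq. 21: sum over (s, t) of q12(s, t) * log[q12(s, t) / (q1(s) * q2(t))]."""
    n = len(c1)
    n1 = Counter(c1)             # nodes per cluster in the first clustering
    n2 = Counter(c2)             # nodes per cluster in the second clustering
    n12 = Counter(zip(c1, c2))   # nodes per (s, t) pair of clusters
    return sum(
        (n_st / n) * math.log(n_st * n / (n1[s] * n2[t]))
        for (s, t), n_st in n12.items()
    )

# Toy clusterings (hypothetical) of six nodes.
first = [0, 0, 0, 1, 1, 1]
second = [0, 0, 1, 1, 2, 2]
shared = mutual_information(first, second)
```

The entropy form of Eq. 21 can be checked on the same data by comparing `shared` with `label_entropy(first) + label_entropy(second) - label_entropy(list(zip(first, second)))`; the two expressions agree up to floating-point error.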
One drawback to using MI, however, is that it gives systematically higher values as we increase the number of clusters, even for completely random cluster configurations [66]. One proposed correction (of many) to this is to use the AMI, given by

$$AMI(\vec{c}_1, \vec{c}_2) = \frac{MI(\vec{c}_1, \vec{c}_2) - \langle MI(\vec{c}_1, \vec{c}_2) \rangle}{\mathrm{Max}(H[\vec{q}_1], H[\vec{q}_2]) - \langle MI(\vec{c}_1, \vec{c}_2) \rangle}, \tag{22}$$

where $\langle MI(\vec{c}_1, \vec{c}_2) \rangle$ is the expectation value of MI in the null model where the number of items in each cluster is fixed and groups are generated randomly through permutations of labels. The AMI is equal to 0 if the clusters $\vec{c}_1$ and $\vec{c}_2$ share the amount of information we expect from random chance purely based on their cluster sizes, and 1 if the clusters are identical.

In Fig. 3D, we plot the average AMI over all pairs of partitions using the five socioeconomic attributes, as a function of the resolution parameter $\gamma$. We can see that there is a clear peak value of $\gamma$ at which the five attributes share highly overlapping clusters. In practice, this could be used as a heuristic to tune $\gamma$ for selecting the size scale of the clusters, if the goal is to select clusters that are highly homogeneous with respect to multiple socioeconomic attributes. It is interesting to note the clear scale sensitivity in this analysis: at certain scales, we can divide the city into zones that are relatively socioeconomically homogeneous in all variables studied, but at other scales, the city decomposes into regions with less overlap.

Fig. 3E shows the AMI matrix for the clusters obtained at $\gamma \approx 0.1$, the peak in Fig. 3D. We can see from this plot that all socioeconomic attributes are spatially clustered in quite similar patterns at this scale, and that all have high correlation with the official neighborhood boundaries as well.
This is perhaps an endorsement of the neighborhood boundaries, as these results suggest that the scale at which the neighborhoods are drawn corresponds to the scale at which the socioeconomic clusters in the city are most similar. Taken together, the results from Figs. 3D and 3E may point to a new method for subdividing a city into different neighborhoods, which can be constructed easily based on any socioeconomic attribute and at any size scale.

IV. CONCLUSION

In this study, we propose a new measure for analyzing socioeconomic data across spatial regions using concepts from network theory and information theory, which accommodates all forms of distributional data, has a natural extension to the comparison of more than two regions, and allows for policy-relevant analysis by considering officially delineated regions as fundamental spatial units. By analyzing spatial data through a topological lens, we can approach regional analysis issues from a relational perspective that avoids the longstanding issue of identifying appropriate spatial scales. We apply our framework in a series of experiments on the adjacency network of US census tracts to demonstrate the new insights we can gain with our methodology. We first find a universal decay pattern in various socioeconomic correlations as a function of path distance, as well as high statistical association between distributional similarities in adjacent tracts. We then aggregate tract-level distributions at the county level, finding again that distributional disparity measures are highly correlated, and also that there are relatively low levels of within-county inequality compared to what one would expect by aggregation of random tracts. Finally, we propose a clustering algorithm for regional aggregation into homogeneous socioeconomic clusters, finding that in practice the clusters obtained by our methodology have high overlap with accepted neighborhood delineations, as well as with each other across attributes.
These applications illustrate the versatility of our methods, as well as the universality present in socioeconomic data when analyzed with a unified framework.

There are numerous improvements that can be made to our methodology in future work to increase its effectiveness in practical applications. Firstly, important limitations arise from the quality and resolution of census data, which we do not attempt to address as they are outside the scope of this work. In particular, the coarse binning of interval distributional datasets (here, income and housing) can result in poor estimation of entropy and other uncertainty measures, because long tails, which may account for a large portion of the variability in the distributions, are not captured [67]. One improvement to our methodology to obtain more accurate results would thus be to estimate these full distributions based on the predefined bins and other summary statistics such as mean, median, and Gini coefficient [31, 68], then apply our measures using approximations of differential entropy. Additionally, some census data have large margins of error due to various statistical sampling issues [69, 70], and so correcting for this noise in our analyses would also improve the efficacy of our techniques. However, we leave these and further improvements to future work.

FIG. 3: Attribute-based regional clustering at multiple scales. (A)-(C) Clusters obtained through weighted community detection for census tracts (thin black lines) in Chicago with respect to income for various resolution parameters $\gamma$ (panels: $\gamma \approx 0.01$, $\gamma \approx 0.1$, $\gamma \approx 1$), displaying varying characteristic size and association with neighborhood boundaries (thick black lines). (D) Adjusted Mutual Information (AMI) between the clusters obtained by our community detection algorithm, as a function of $\gamma$ and averaged over all pairwise combinations of the five studied socioeconomic variables. We see a clear peak value of $\gamma \approx 0.1$.
(E) AMI matrix between clusters computed at this peak $\gamma$ value, including the clusters obtained by grouping tracts by the neighborhood they most overlap with, indicating a high correlation between all of these partitions compared to what one expects by random chance based on their cluster sizes.

Acknowledgments

The author thanks Shihui Feng and Mark Newman for helpful discussions. A.K. was supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship (NDSEG) program.

[1] T. C. Bailey and A. C. Gatrell, Interactive Spatial Data Analysis, Longman Scientific & Technical, Essex (1995).
[2] R. J. Johnston, Political, Electoral, and Spatial Systems: An Essay in Political Geography, Clarendon Press, Oxford (1979).
[3] P. Elliot, J. C. Wakefield, N. G. Best, D. J. Briggs, et al., Spatial Epidemiology: Methods and Applications, Oxford University Press, Oxford (2000).
[4] R. E. Walker, C. R. Keane, and J. G. Burke, Disparities and access to healthy food in the United States: A review of food deserts literature. Health & Place, 876–884 (2010).
[5] M. J. Beckmann and T. Puu, Spatial Economics: Density, Potential, and Flow, North Holland Publishing, Amsterdam (1985).
[6] S. F. Reardon and D. O’Sullivan, Measures of spatial segregation. Sociological Methodology, 121–162 (2004).
[7] S. J. Rey, Spatial analysis of regional income inequality. Spatially Integrated Social Science, 280–299 (2004).
[8] J. H. Brown, D. W. Mehlman, and G. C. Stevens, Spatial variation in abundance. Ecology, 2028–2043 (1995).
[9] L. M. Bettencourt, J. Lobo, D. Helbing, C. Kühnert, and G. B. West, Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences, 7301–7306 (2007).
[10] S. A. Stouffer, Intervening opportunities and competing migrants. Journal of Regional Science, 1–26 (1960).
[11] M. T. Gastner and M. Newman, Diffusion-based method for producing density-equalizing maps. Proceedings of the National Academy of Sciences, 7499–7504 (2004).
[12] J. B. Holt, C. Lo, and T. W. Hodler, Dasymetric estimation of population density and areal interpolation of census data. Cartography and Geographic Information Science, 103–121 (2004).
[13] C. E. Gehlke and K. Biehl, Certain effects of grouping upon the size of the correlation coefficient in census tract material. Journal of the American Statistical Association, 169–170 (1934).
[14] M. G. Turner, R. V. O’Neill, R. H. Gardner, and B. T. Milne, Effects of changing spatial scale on the analysis of landscape pattern. Landscape Ecology, 153–162 (1989).
[15] C. Flint and P. J. Taylor, Political Geography: World-economy, Nation-state, and Locality, Pearson Education, London (2007).
[16] O. D. Duncan and B. Duncan, A methodological analysis of segregation indexes. American Sociological Review, 210–217 (1955).
[17] J. Walsh and M. E. O’Kelly, An information theoretic approach to measurement of spatial inequality. Economic and Social Review, 267–286 (1979).
[18] P. S. Chodrow, Structure and information in spatial segregation. Proceedings of the National Academy of Sciences, 11591–11596 (2017).
[19] J. Lin, Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 145–151 (1991).
[20] M. Barthélemy, Spatial networks. Physics Reports, 1–101 (2011).
[21] A. Kirkley, H. Barbosa, M. Barthelemy, and G. Ghoshal, From the betweenness centrality in street networks to structural invariants in random planar graphs. Nature Communications, 1–12 (2018).
[22] W. A. Clark, E. Anderson, J. Östh, and B. Malmberg, A multiscalar analysis of neighborhood composition in Los Angeles, 2000–2010: A location-based approach to segregation and diversity. Annals of the Association of American Geographers, 1260–1284 (2015).
[23] M. Olteanu, J. Randon-Furling, and W. A. Clark, Segregation through the multiscalar lens. Proceedings of the National Academy of Sciences, 12250–12254 (2019).
[24] M. Batty, Spatial entropy. Geographical Analysis, 1–31 (1974).
[25] E. Vaz, D. Bandur, Merging Entropy in Self-Organisation: A Geographical Approach. In: Resilience and Regional Dynamics. Advances in Spatial Science (The Regional Science Series), pp. 171–186, Springer, Cham (2018).
[26] U. C. Bureau, American Factfinder, US Department of Commerce, Economics and Statistics Administration (2004).
[27] American community survey 5-year data (2009-2018).
[28] L. O. Houstoun Jr, Neighborhood change and city policy. Urban Land, 159–170 (1976).
[29] N. Krieger, A century of census tracts: Health & the body politic (1906–2006). Journal of Urban Health, 355–361 (2006).
[30] J. B. McDonald and M. Ransom, The generalized beta distribution as a model for the distribution of income: Estimation of related measures of inequality. In Modeling Income Distributions and Lorenz Curves, pp. 147–166, Springer, New York (2008).
[31] P. T. Von Hippel, D. J. Hunter, and M. Drown, Better estimates from binned income data: Interpolated CDFs and mean-matching. Sociological Science, 641–655 (2017).
[32] Tiger/line shapefiles. United States Census Bureau (2019).
[33] J. Briët and P. Harremoës, Properties of classical and quantum Jensen-Shannon divergence. Physical Review A, 052311 (2009).
[34] S. Itzkovitz, E. Hodis, and E. Segal, Overlapping codes within protein-coding sequences. Genome Research, 1582–1589 (2010).
[35] S. Klingenstein, T. Hitchcock, and S. DeDeo, The civilizing process in London’s Old Bailey. Proceedings of the National Academy of Sciences, 9419–9424 (2014).
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken (2012).
[37] Y. Rubner, C. Tomasi, and L. J. Guibas, A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision, pp. 59–66, IEEE, Bombay (1998).
[38] M. Davis and P. Peebles, A survey of galaxy redshifts. V. The two-point position and velocity correlations. The Astrophysical Journal, 465–482 (1983).
[39] B. Ganapathisubramani, N. Hutchins, W. Hambleton, E. K. Longmire, and I. Marusic, Investigation of large-scale coherence in a turbulent boundary layer using two-point correlations. Journal of Fluid Mechanics, 57–80 (2005).
[40] Y. Kagan and L. Knopoff, Spatial distribution of earthquakes: the two-point correlation function. Geophysical Journal International, 303–320 (1980).
[41] D. Rybski, H. D. Rozenfeld, and J. P. Kropp, Quantifying long-range correlations in complex networks beyond nearest neighbors. Europhysics Letters, 28002 (2010).
[42] M. Mayo, A. Abdelzaher, and P. Ghosh, Long-range degree correlations in complex networks. Computational Social Networks, 4 (2015).
[43] Y. Fujiki, T. Takaguchi, and K. Yakubo, General formulation of long-range degree correlations in complex networks. Physical Review E, 062308 (2018).
[44] T. D. Gautheir, Detecting trends using Spearman’s rank correlation coefficient. Environmental Forensics, 359–362 (2001).
[45] J. A. Kahl and J. A. Davis, A comparison of indexes of socioeconomic status. American Sociological Review, 317–325 (1955).
[46] E. D. Lawson and W. E. Boek, Correlations of indexes of families’ socioeconomic status. Social Forces, 149–152 (1960).
[47] J. Ludwig, H. F. Ladd, G. J. Duncan, J. Kling, and K. M. O’Regan, Urban poverty and educational outcomes. In Brookings-Wharton Papers on Urban Affairs, pp. 147–201, Brookings Institution Press, Washington D.C. (2001).
[48] S. Moller, A. S. Alderson, and F. Nielsen, Changing patterns of income inequality in US counties, 1970–2000. American Journal of Sociology, 1037–1101 (2009).
[49] L. Bettencourt and G. West, A unified theory of urban living. Nature, 912–913 (2010).
[50] S. J. Redding and E. Rossi-Hansberg, Quantitative spatial economics. Annual Review of Economics, 21–58 (2017).
[51] Y. Zenou and N. Boccard, Racial discrimination and redlining in cities. Journal of Urban Economics, 260–285 (2000).
[52] S. Van Nieuwerburgh and P.-O. Weill, Why has house price dispersion gone up? The Review of Economic Studies, 1567–1606 (2010).
[53] R. E. Dwyer, Expanding homes and increasing inequalities: US housing development and the residential segregation of the affluent. Social Problems, 23–46 (2007).
[54] M. A. Davis and M. G. Palumbo, The price of residential land in large US cities. Journal of Urban Economics, 352–384 (2008).
[55] F. Wang, M. Wen, and Y. Xu, Population-adjusted street connectivity, urbanicity and risk of obesity in the US. Applied Geography, 1–14 (2013).
[56] M. Doussard, J. Peck, and N. Theodore, After deindustrialization: Uneven growth and economic inequality in “postindustrial” Chicago. Economic Geography, 183–207 (2009).
[57] M. Kassen, A promising phenomenon of open data: A case study of the Chicago open data project. Government Information Quarterly, 508–513 (2013).
[58] S. Sobolevsky, R. Campari, A. Belyi, and C. Ratti, General optimization technique for high-quality community detection in complex networks. Physical Review E, 012811 (2014).
[59] G. D. Nelson and A. Rae, An economic geography of the United States: From commutes to megaregions. PLOS One, e0166083 (2016).
[60] Y. Liu, X. Liu, S. Gao, L. Gong, C. Kang, Y. Zhi, G. Chi, and L. Shi, Social sensing: A new approach to understanding our socioeconomic environments. Annals of the Association of American Geographers, 512–530 (2015).
[61] R. M. Assunção, M. C. Neves, G. Câmara, and C. da Costa Freitas, Efficient regionalization techniques for socioeconomic geographical units using minimum spanning trees. International Journal of Geographical Information Science, 797–811 (2006).
[62] R. I. Kondor and J. Lafferty, Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning, pp. 315–22, Morgan Kaufmann Publishers, San Francisco (2002).
[63] J. Reichardt and S. Bornholdt, Statistical mechanics of community detection. Physical Review E, 016110 (2006).
[64] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008 (2008).
[65] A. K. Jain, M. N. Murty, and P. J. Flynn, Data clustering: a review. ACM Computing Surveys (CSUR), 264–323 (1999).
[66] N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 2837–2854 (2010).
[67] S. Ferson, V. Kreinovich, J. Hajagos, W. Oberkampf, and L. Ginzburg, Experimental uncertainty estimation and statistics for data having interval uncertainty. Sandia National Laboratories, Report SAND2007-0939 (2007).
[68] P. T. von Hippel, S. V. Scarpino, and I. Holas, Robust estimation of inequality from binned incomes. Sociological Methodology, 212–251 (2016).
[69] N. R. Council, Using the American Community Survey: Benefits and Challenges, National Academies Press, Washington D.C. (2007).
[70] S. E. Spielman, D. Folch, and N. Nagle, Patterns and causes of uncertainty in the American Community Survey. Applied Geography, 147–157 (2014).
TABLE I: Information on ACS distributional variables. For each variable $X$, we show its support as well as the associated ACS variable codes from https://api.census.gov/data/2018/acs/acs5/profile/variables.html

variable X | support {x} of q_X (categories separated by semicolons) | ACS codes
race | White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Other | DP05, 0059PE - 0064PE
income | Less than 10,000; 10,000 - 15,000; 15,000 - 25,000; 25,000 - 35,000; 35,000 - 50,000; 50,000 - 75,000; 75,000 - 100,000; 100,000 - 150,000; 150,000 - 200,000; Greater than 200,000 | DP03, 0052PE - 0061PE
industry | Agriculture, forestry, fishing and hunting, and mining; Construction; Manufacturing; Wholesale trade; Retail trade; Transportation and warehousing, and utilities; Information; Finance and insurance, and real estate and rental and leasing; Professional, scientific, management, and administrative services; Educational services, and health care and social assistance; Arts, entertainment, recreation, accommodation, and food services; Other services, except public administration; Public administration | DP03, 0033PE - 0045PE
housing (value) | Less than 50,000; 50,000 - 100,000; 100,000 - 150,000; 150,000 - 200,000; 200,000 - 300,000; 300,000 - 500,000; 500,000 - 1,000,000; Greater than 1,000,000 | DP04, 0081PE - 0088PE
education | Less than 9th grade; 9th to 12th grade, no diploma; High school graduate (includes equivalency); Some college, no degree; Associate’s degree; Bachelor’s degree; Graduate or professional degree | DP02, 0059PE - 0065PE