Network connectivity optimization: An evaluation of heuristics applied to complex networks and a transportation case study

Abstract

Network optimization has generally been focused on solving network flow problems, but recently there have been investigations into optimizing network characteristics. Optimizing network connectivity to maximize the number of nodes within a given distance to a focal node and then minimizing the number and length of additional connections has not been as thoroughly explored, yet is important in several domains including transportation planning, telecommunications networks, and geospatial analysis. We compare several heuristics to explore this network connectivity optimization problem with the use of random networks, including the introduction of two planar random networks that are useful for spatial network simulation research, and a real-world case study from urban planning and public health. We observe significant variation between nodal characteristics and optimal connections across network types. This result along with the computational costs of the search for optimal solutions highlights the difficulty of finding effective heuristics. A novel genetic algorithm is proposed and we find this optimization heuristic outperforms existing techniques and describe how it can be applied to other combinatorial and dynamic problems.

arXiv [physics.soc-ph], Jul

Jeremy Auerbach and Hyun Kim
Colorado State University, Department of Environmental and Radiological Health Sciences, Fort Collins, CO 80523, USA
University of Tennessee, Department of Geography, Knoxville, TN 37996, USA

Corresponding author: Jeremy Auerbach
Email address: [email protected]


INTRODUCTION

Network optimization has generally focused on the following classes of problems: (i) finding the shortest path between nodes, (ii) maximizing the flow of information across a network, (iii) minimizing the cost of the flow of information across a network, and (iv) problems dealing with multiple types of information flows across the network (Schrijver, 2002; Wu et al., 2004). One problem, optimizing network connectivity around a specific node through the introduction of new edges, has not been thoroughly explored and yet is important in several domains. This optimization attempts to maximize the number of nodes within a given distance of a focal node while minimizing the number and length of additional connections. It is essential in network layout planning for telecommunications and computer systems (Resende and Pardalos, 2006; Donoso and Fabregat, 2007), in the spread of information or diseases in social networks (Eubank et al., 2004; Gavrilets et al., 2016), and in the development of neural networks (Whitley et al., 1990). This network connectivity problem is particularly important in transportation planning in urban environments, where the weights of the network edges can be physical distances or riderships and future street connections or transportation lines can impact flow to established facilities. For example, residential developers could optimize thoroughfare connectivity around existing schools to foster student active commuting and reduce busing costs when planning new developments (Linehan et al., 1995; Auerbach, 2018; Auerbach and Zaviska, 2020), and planners could evaluate accessibility and patient travel time to health care facilities (Branas et al., 2005).

Table 1. List of symbols and their definitions.

Symbol            Definition
$\nu$             Network node
$e$               Network edge
$N$               Number of nodes in a given network, $N = \sum_i \nu_i$
$A$               Network adjacency matrix
$a_{ij}$          Adjacency matrix element $ij$
$F$               Focal node
$d(i,j)$          Network distance between nodes $i$ and $j$
$D$               Threshold distance from focal node
$N_C$             Set of close nodes, $N_C \subset N$
$N_D$             Set of distant nodes, $N_D \subset N$
$N'_C$            Set of nodes that are now close after a new connection
$L_F$             Average path length to the focal node
$C(i,j)$          Cost of the new connection
$B(i,j)$          Benefit of the new connection
$\alpha$          Cost weight
$\beta$           Benefit weight
$t$               Optimization iteration
$O_t$             Optimal solution for iteration $t$
$O^*$             Optimal solution
$M$               Set of long-term memory solutions
$N^C_i$           Set of neighboring close nodes for node $i$
$N^D_j$           Set of neighboring distant nodes for node $j$
$C^D_i$           Degree centrality of node $i$
$C^C_i$           Closeness centrality of node $i$
$\sigma_{ij}$     Shortest path between nodes $i$ and $j$
$\sigma_{jk}(i)$  Shortest path between nodes $j$ and $k$ that includes node $i$
$C^B_i$           Betweenness centrality of node $i$
$\lambda$         Eigenvalue
$x_i$             Eigenvector
$C^E_i$           Eigenvector centrality of node $i$
$\alpha_P$        Attenuation factor
$C^P_i$           Pagerank centrality of node $i$
$\eta$            Variable neighborhood size
$\mu$             Genetic algorithm mutation rate
$s$               Genetic algorithm selection coefficient
$P$               Population of solutions for the genetic algorithm
$f(i,j)$          Genetic algorithm fitness function
$\varepsilon_B$   Benefit error from heuristic
$\varepsilon_C$   Cost deviation from heuristic
$p$               Connection probability (Erdős-Rényi graphs and Klemm and Eguíluz graphs)
$p_W$             Rewiring probability (Watts-Strogatz graphs)
$k_L$             Initial node degree (Watts-Strogatz graphs)
$m_0$             Initial network size (Barabási and Albert graphs and Klemm and Eguíluz graphs)
$m$               Degree of new nodes (Barabási and Albert graphs)
$p_S$             Node selection probability (Klemm and Eguíluz graphs)
$p_R$             Edge removal probability (Delaunay and Voronoi random graphs)
$C^D$             Mean degree of a network
$L$               Average path length of a network
$c^w_i$           Weighted clustering coefficient for node $i$
$w_{ij}$          Weight of connection between nodes $i$ and $j$
$C$               Weighted clustering coefficient of a network
$C_r$             Weighted clustering coefficient of a completely random network
$L_r$             Average path length of a completely random network
$\gamma$          Power law exponent
$P(n)$            Degree distribution
$E$               Efficiency of a network
$E_r$             Efficiency of a completely random network
$E_G$             Global efficiency of a network
$K$               Number of clusters

Optimization approaches have been applied to several network problems: the search for new edges that minimize the average shortest path distance in a network (Meyerson and Tagiku, 2009); the minimization of the diameter of the network, i.e. minimizing the maximal distance between a pair of nodes (Demaine and Zadimoghaddam, 2010); and the maximization of betweenness centrality (Jiang et al., 2011). However, the search for new edges, or shortcuts, that maximize connectivity to a focal node and minimize the length of these new edges is less understood and, as with the above-mentioned graph optimization problems, the search for optimal solutions can become costly when networks are large and complex. This work compares a set of heuristics for this task drawn from a review of combinatorial heuristics (Mladenović et al., 2007) and from methods used for location models, as this problem has many applications where space is an essential component (Brimberg and Hodgson, 2011). In this study, we show that optimization heuristics are preferred for analysis and practice due to the nonlinearity of the solution space and the optimal solution's dependence on nodal characteristics, such as distance to the focal node.

In connectivity optimization, network nodes are first segmented and assigned to 'close' and 'distant' sets by a specified weighted network distance from the network's focal node.
An exhaustive search, in which all possible edges from distant to close nodes of a network are evaluated, is used to identify the optimal connections and as a benchmark for the time needed to find these solutions. This approach ensures that the optimal edges are found. However, as the number of nodes increases, and with it the number of possible connections between close and distant nodes, the search can become computationally expensive and time-consuming. When the exhaustive search routine was applied to random networks and real-world street networks, we also discovered that the optimal solution is nonlinearly related to nodal characteristics. To counter this, several heuristics are explored that use nodal characteristics to find the optimal connection, possibly more quickly and at lower computational cost: hill climbing with random restart (Russell and Norvig, 2004); stochastic hill climbing (Greiner, 1992); hill climbing with a variable neighborhood search (Mladenović and Hansen, 1997); simulated annealing, which has a history of applications in graph problems (Kirkpatrick et al., 1983; Johnson et al., 1989; Kirkpatrick, 1984); and genetic algorithms, which have been successfully used for combinatorial optimization (Anderson and Ferris, 1994; Jaramillo et al., 2002). A Tabu heuristic was not employed as it has been observed not to be an effective method for multi-objective optimization problems, whereas simulated annealing and genetic algorithms have been shown to be effective (Golden and Skiscim, 1986; Kim et al., 2016). Among these methods, the genetic algorithm presented here introduces a novel chromosome formulation where the genes are not properties of a specific variable but weights for the probability of moving in a given direction in the solution space.
This allows the method to dynamically change which solution characteristics to explore while possibly reducing the size of the local neighborhood search.

These optimization heuristics are then applied to randomly generated networks that vary in complexity and size to evaluate their efficacy in finding the optimal connection. Several types of random graph networks were generated to analyze the efficacy of the optimization heuristics for systems with different topologies that are generally representative of naturally occurring and built systems: (1) Erdős-Rényi networks, (2) Watts-Strogatz networks, (3) Barabási and Albert networks, (4) Klemm and Eguíluz networks, (5) Delaunay triangulation networks, and (6) Voronoi diagrams. Erdős-Rényi random networks are constructed by randomly creating a connection between each pair of nodes with a given probability (Erdős and Rényi, 1959). These networks, even though they have random connections, consistently have short average path lengths and irregular connections, both of which are commonly found in natural systems. The Watts-Strogatz networks also have random connections but additionally form clusters, another feature commonly found in real-world networks (Watts and Strogatz, 1998). The Barabási-Albert model produces random structures with a small number of highly connected nodes, 'hubs', which are observed in numerous types of networks (Barabási and Albert, 1999; Albert and Barabási, 2002). Klemm and Eguíluz networks have random connections, clusters, and hubs (Klemm and Eguíluz, 2002). We also introduce two novel types of random planar networks, versions of Voronoi diagrams and Delaunay triangulations.
These were considered because planarity is particularly important in many fields, and networks generated from Voronoi diagrams and Delaunay triangulations have been used in spatial health epidemiology (Johnson, 2007), transportation flow problems (Steffen and Seyfried, 2010; Pablo-Martí and Sánchez, 2017), terrain surface modeling (Floriani et al., 1985), telecommunications (Meguerdichian et al., 2001), computer network design (Liebeherr and Nahas, 2001), and hazard avoidance systems in autonomous vehicles (Anderson et al., 2012). Delaunay triangulation maximizes the minimum angles between three nodes to generate planar graphs with consistent network characteristics (Delaunay, 1934). Voronoi diagrams, the dual of a Delaunay triangulation, are composed of points and cells such that every location in a cell is closer to its point than to any other point. When edges are randomly removed from the connected Delaunay or Voronoi network, with weights given by node distance from a focal node, we show that these networks display some of the properties found in the networks mentioned above, such as complexity and randomness, but with the added component of being planar and having edge weights that can be framed as physical distances.

To complement the random network analysis, the network connectivity optimization methods are applied to a study of urban transportation planning. We use these methods to evaluate the potential costs and benefits of increased thoroughfare connectivity for student active commuting to school. It is assumed that expanding this connectivity around a school would allow more households, and students, to be included within walking distance of the school. If more students actively commute to school, this reduces the busing costs for the school system and increases the health and academic achievement of the students (Centers for Disease Control and Prevention, 2010).
The combinatorial optimization techniques employed here to identify and evaluate new street connections can complement the optimization approaches used for other transportation planning problems, such as greenway planning (Linehan et al., 1995), bus stop locations (Ibeas et al., 2010; Delmelle et al., 2012), and health care accessibility (Gu et al., 2010).

The following section describes the formulation of the connectivity problem in more detail, the local search methodology, and the optimization heuristics (see Appendix A for the specific pseudocode of the optimization algorithms). The section also details the data used for the study, including descriptions of the random networks and the street networks around schools that are used for the transportation study. Results of the heuristics applied to both the random networks and the case study data are then presented. This is followed by a detailed discussion of the heuristic results, the further implications of these techniques for urban transportation planning, and future work for this avenue of research.

METHODS AND DATA

Formulation of the network connectivity problem

For the description of the optimization methodology the following nomenclature will be used. The number of nodes is $N$, and nodes are separated into two sets based on their shortest network path, $d(i,j)$, where $i$ and $j$ are nodes, to the focal node $F$. The nodes that are outside the path distance under consideration, $D$, are assigned to the 'distant' set, $N_D \subset N$, i.e. $i \in N_D$ if $d(i,F) > D$. The nodes that are within this distance are assigned to the 'close' set, $N_C \subset N$, i.e. $i \in N_C$ if $d(i,F) \le D$, and $F \in N_C$ (see Figure 1 (A)). Node neighborhoods are assigned to the sets $N^D_i$ and $N^C_j$ for the distant node $i$ and the close node $j$, respectively, by the nodal characteristics described in the following subsection. For a network of size $N$ the number of new connections to evaluate is $\le N^2/4$, since the product $|N_C|\,|N_D|$ is largest when the two sets are of equal size. Distant nodes that come within the distance $D$ of the focal node through a new connection are assigned to the new set $N'_C$. For example, if a new connection is established between distant node $i$ and close node $j$, then $k \in N'_C$ if $k \in N_D$ and $d(k,i) + d(i,j) + d(j,F) \le D$ (see Figure 1 (C)). The number of nodes in $N'_C$ is considered the benefit of this new connection, $B(i,j) = |N'_C|$. The cost of the new connection is denoted by $C(i,j)$, and for simplicity in this analysis $C(i,j) = d(i,j)$. The optimal solution is the solution with the greatest benefit, i.e. the greatest number of new nodes now within the distance to the focal node, which can be expressed as the bi-objective function

$$O^* = \max_{(i,j)} \left( \alpha\, C(i,j) + \beta\, B(i,j) \right), \tag{1}$$

where $\alpha$ and $\beta$ are the weights for the costs and benefits, respectively, and for this study $\alpha = \infty$ and $\beta = 1$. If $\alpha = \infty$, then the objective is only to minimize costs for the same benefit. Some of the heuristics also depend on the number of iterations ($t$) and terminate when the solutions converge, $O_t = O_{t-1}$, or the solution does not improve, $O_{t-1} > O_t$.

Figure 1. Diagram of the network connectivity optimization problem. The close nodes that are within a threshold network distance (orange dashed circle) from the focal node (black square) are colored green, distant nodes that could be within the threshold network distance with additional edges are colored red, and distant nodes that could not be within this distance regardless of additional edges are gray. Figure (A) is an example graph, (B) shows the same graph with the optimal new connection that maximizes the number of additional nodes within the threshold network distance and minimizes the length of the new connection, and the inset (C) highlights this optimal connection, between nodes $i$ and $j$, with the methodological terminology presented in Section 2.

Local search methodology

The selection of neighboring nodes to improve solutions begins with several evident nodal characteristics (see Supplementary Table S.2 and Figure 2). These nodal characteristics are explored to find the critical network properties for connectivity optimization and their impact on the performance in finding the optimal solution. Nodes are ranked by these characteristics, which creates a multidimensional solution space. A two-level selection process is used, with the following nodal level characteristics: (i) distance to the focal node, (ii) degree centrality, (iii) closeness centrality, (iv) betweenness centrality, (v) eigenvector centrality, (vi) pagerank centrality, and (vii) weighted clustering coefficient; and the following clusterings of the characteristics: (i) hierarchical clusters, (ii) network-constrained clusters, and (iii) network modularity. Multiple nodes in a network can have the same degree or be assigned to the same cluster; therefore the local searches include a random shuffling routine to evaluate nodes with the same values.
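As an illustration of the ranking step described above, the following minimal Python sketch ranks nodes by a single nodal characteristic (degree) and orders tied nodes at random. The adjacency-dictionary representation, the helper names `degree` and `rank_with_random_ties`, and the toy graph are assumptions made for this example; they are not taken from the authors' implementation.

```python
import random

def degree(adj):
    """Degree of every node in an undirected graph stored as {node: {neighbor: weight}}."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def rank_with_random_ties(values, seed=None):
    """Rank nodes from highest to lowest value, ordering tied nodes at random."""
    rng = random.Random(seed)
    nodes = list(values)
    rng.shuffle(nodes)  # random order first; the stable sort below keeps it among ties
    return sorted(nodes, key=lambda node: values[node], reverse=True)

# Hypothetical toy graph: nodes a and c both have degree 3, b and e both have degree 1.
toy = {
    "a": {"b": 1.0, "c": 1.0, "d": 1.0},
    "b": {"a": 1.0},
    "c": {"a": 1.0, "d": 1.0, "e": 1.0},
    "d": {"a": 1.0, "c": 1.0},
    "e": {"c": 1.0},
}
ranking = rank_with_random_ties(degree(toy), seed=1)
```

The same helper could be applied to any of the other characteristics (for example distance to the focal node), since it only needs a mapping from node to value.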

Figure 2. Diagram of the local search methodology. Figure (A) shows a generated network with the focal node represented by a black square, the close nodes colored green and the distant nodes red. Figure (B) shows part of the search space neighborhood for distant node $i$ by increasing and decreasing degree and distance to the focal node, $d(i,F)$ (the other nodal characteristics and clusters are not shown for simplicity). Figure (C) similarly represents the space for the close node $m$. The number of neighbors selected at each iteration of the optimization routine is heuristic dependent. Table (D) gives the costs and benefits for selected connections from the local search, and the optimal connection $(j,m)$ for this iteration is shown in (E) as an orange edge.

Nodes are ranked by their distance to the focal node. Moving in this solution dimension may result in lower connectivity length costs but may not maximize the number of nodes ultimately connected to the focal node. Ranking and selecting nodes by their centrality, i.e. the importance of the node, could maximize the number of nodes within the specified distance to the focal node, but with the possibility of higher connectivity length costs compared to selecting nodes by other characteristics. Several commonly used measures of centrality are explored: degree, the number of edges incident to a node; closeness centrality, the average length of the shortest path between the node and all other nodes in the network (Bavelas, 1950); betweenness centrality, the frequency with which a node is included in the shortest paths between all other node pairs (Freeman, 1977); eigenvector centrality, which is a relative ranking of nodes such that nodes with high values are connected to other nodes with high values (Newman, 2008); and pagerank centrality, a variant of eigenvector centrality that ranks nodes based on their probability of being connected to a randomly selected node and which is commonly used in web-page rankings (Brin and Page, 1998).

Nodes are clustered using weighted clustering coefficients, hierarchical clustering, network-constrained clustering, and network modularity. The weighted clustering coefficient of a node is the count of the triplets in the neighborhood of the node, weighted by the edge weights, relative to the maximum possible number of triplets that could occur (Barrat et al., 2004). Nodes are also clustered by their characteristics with hierarchical clustering utilizing Ward's method and the gap criterion. Hierarchical clustering with Ward's method attempts to minimize variance within clusters and maximize variance between clusters (Ward, 1963). The gap criterion is used to identify the optimal number of hierarchical clusters by maximizing the distance between the within-cluster variation and the expected within-cluster variation found from bootstrapping (Tibshirani et al., 2001). The network-constrained clustering method utilizes the shortest paths between nodes and thereby captures the network neighbors of each node (Yamada and Thill, 2006). Network modularity attempts to cluster nodes by maximizing the number of connections within a cluster and minimizing the number of connections between the clusters. It accomplishes this by comparing the probability that an edge is in a cluster with the probability that a random edge is in the module, i.e. that an edge is present in a random graph with the same node degree distribution (Newman, 2006).

Network connectivity optimization heuristics

The following techniques were selected for the network connectivity optimization study because of their extensive use in optimization (see the Supplementary Material for the algorithms). Parameter selection was simplified for easy comparison of the methods. Random restart, i.e. randomly selecting initial nodes to avoid local optima and running the routine until the optimal solution is found, was used for each method to ensure that the methods did not converge on suboptimal solutions due to the initial starting values. The six heuristics employed are described below.
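The random restart idea described above can be sketched generically. The following Python example is a minimal illustration under stated assumptions: the function names `hill_climb` and `random_restart`, the fixed number of restarts, and the one-dimensional toy landscape are hypothetical and stand in for the paper's solution space of (distant, close) node pairs.

```python
import random

def hill_climb(score, neighbors, start):
    """Greedy ascent: move to the best neighbor until no neighbor improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # local optimum reached
        current = best

def random_restart(score, neighbors, candidates, restarts=50, seed=0):
    """Run hill climbing from several random starting points and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        found = hill_climb(score, neighbors, rng.choice(candidates))
        if best is None or score(found) > score(best):
            best = found
    return best

# Hypothetical 'hilly' landscape: a local optimum at index 1, the global optimum at index 5.
landscape = [1, 3, 2, 0, 5, 9, 4, 0, 2, 1]

def score(x):
    return landscape[x]

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]
```

Starting from index 0, a single climb stops at the local optimum (index 1); restarting from random indices makes it very likely that at least one climb starts in the basin of the global optimum.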

Exhaustive search (ES).

The exhaustive search optimization routine creates an edge for every combination of distant and close nodes (see Algorithm 1 in the Supplementary Material). Because the ES result is optimal, its solution times and objective values are used to benchmark the solutions found by the other methods.
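A minimal Python sketch of such an exhaustive search is given here, using the close/distant split and the benefit and cost definitions from the problem formulation. It is not the authors' Algorithm 1. It assumes an adjacency dictionary, a user-supplied `length(i, j)` function for the length of a candidate connection, and a lexicographic reading of the objective (greatest benefit first, shortest connection among ties). The toy street network and its numbers are hypothetical.

```python
import heapq
from math import inf

def shortest_paths(adj, source):
    """Dijkstra: weighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def exhaustive_search(adj, focal, threshold, length):
    """Evaluate every (distant, close) pair and return (i, j, benefit, cost) of the best one."""
    d_focal = shortest_paths(adj, focal)
    close = [n for n in adj if d_focal.get(n, inf) <= threshold]
    distant = [n for n in adj if d_focal.get(n, inf) > threshold]
    best = None
    for i in distant:
        d_i = shortest_paths(adj, i)
        for j in close:
            cost = length(i, j)  # C(i, j) = d(i, j), length of the new connection
            # B(i, j): distant nodes k with d(k, i) + d(i, j) + d(j, F) <= D
            benefit = sum(
                1 for k in distant
                if d_i.get(k, inf) + cost + d_focal[j] <= threshold
            )
            # Assumed tie-breaking: larger benefit first, then smaller cost.
            if best is None or (benefit, -cost) > (best[2], -best[3]):
                best = (i, j, benefit, cost)
    return best

# Hypothetical street network: a path F - a - b - c - d with integer edge lengths.
streets = {
    "F": {"a": 5},
    "a": {"F": 5, "b": 4},
    "b": {"a": 4, "c": 5},
    "c": {"b": 5, "d": 3},
    "d": {"c": 3},
}
# Hypothetical lengths of candidate connections between distant and close nodes.
new_lengths = {("c", "F"): 6, ("c", "a"): 4, ("c", "b"): 5,
               ("d", "F"): 8, ("d", "a"): 6, ("d", "b"): 7}
best = exhaustive_search(streets, "F", 10, lambda i, j: new_lengths[(i, j)])
```

With a threshold of 10, nodes c and d are distant; connecting c directly to F with a length of 6 brings both within the threshold, so that pair is returned with a benefit of 2.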

Hill climbing (HC).

The solution space was observed to be hilly from the exhaustive search results, so several modifications were introduced to the hill climbing technique to address this (Algorithm 2). A stochastic hill climbing (HCS) routine, an advanced search method based on HC, is also explored, where the nodes for the next iteration are randomly picked with

$$\text{probability}(i,j) = \frac{\alpha\, C(i,j) + \beta\, B(i,j)}{\sum_{(m,n)} \left( \alpha\, C(m,n) + \beta\, B(m,n) \right)}, \tag{2}$$

and which terminates when a better solution is no longer found (Algorithm 3). A hill climbing algorithm is also coupled with a variable neighborhood (HCVN), where the size of the neighborhood starts with the nearest neighbors ($\eta = 1$) and is updated as follows:

$$\eta = \begin{cases} 1 & O_t > O_{t-1} \\ \eta + 1 & O_t \le O_{t-1}, \end{cases} \tag{3}$$

and the HCVN method terminates after $n_{\max}$ is reached (Algorithm 4).

Simulated annealing (SA).

As a meta-heuristic approach, the simulated annealing method randomly selects an initial solution from the solution space to avoid entrapment in a local optimum. At each iteration, the heuristic evaluates the neighboring solutions and, if it does not find an improved solution, it moves to a new solution with the following probability

$$\text{probability}(i,j) = \exp\left( -\frac{O_{t-1} - O(i,j)}{t} \right), \tag{4}$$

to obtain an improvement of the solution. The distance of the move decreases with the number of iterations until a better solution is no longer found (Algorithm 6).

Genetic algorithm (GA).

The genetic algorithm begins with a population of $P$ randomly selected solutions with a set of chromosomes composed of genes, which represent the weights of selecting a neighbor and are all initialized to unity (Algorithm 7). During each iteration of the method, solution scores (fitnesses) are computed by

$$f(i,j) = \frac{O(i,j)}{\sum_{(m,n)} O(m,n)}, \tag{5}$$

and a new generation of solutions is selected based on the following probability condition

$$\text{probability}(i,j) = \frac{s\, f(i,j) + (1 - s)}{\sum_{(m,n)} \left( s\, f(m,n) + (1 - s) \right)}, \tag{6}$$

where $s$ is the selection coefficient. Weak selection, $s \ll 1$, is used to ensure that random mutations impact solution frequency. Crossover is conducted by alternating the weights for the offspring from each parent, also known as cycle crossover (Oliver et al., 1987). Mutations are introduced at a low rate $\mu \ll 1$. The probability that characteristic or cluster $m$ is used to find a neighbor for node $i$ is given by

$$\text{probability(characteristic or cluster)} = \frac{\text{gene}(i,m)}{\sum_k \text{gene}(i,k) / K}, \tag{7}$$

where $K$ is the total number of nodal characteristics and clusters. This formulation ensures that the nodal characteristics or clusters that improve the solution increase in weight, which results in a greater probability that they will be selected for neighborhood exploration and reduces the size of the neighborhood search.

Simulated data: Complex random networks

Complex random networks.

Several types of random networks were used to evaluate the effectiveness of each heuristic in identifying the optimal new connection. The following types of random undirected networks were generated: (1) Erdős-Rényi (ER) networks, (2) Watts-Strogatz (WS) networks, (3) Barabási and Albert (BA) networks, and (4) Klemm and Eguíluz (KE) networks (see Supplementary Figure S.1 (A)-(D); for the algorithms used to generate the networks see Prettejohn et al. (2011)).

Complex planar networks.

Two novel types of random networks are created here, random Delaunay triangulations (DT) (Supplementary Figure S.1 (E)) and random Voronoi diagrams (VD) (Supplementary Figure S.1 (F)). These networks are inherently planar, and edges are removed randomly based on the distance of their nodes from the focal node with probability

$$p_R \cdot \frac{\max\left( d(i,F),\, d(j,F) \right)}{\max_k d(k,F)}, \tag{8}$$

where $p_R$ is the removal probability, weighted by the normalized edge distance from the focal node.

Parameter selection.

To compare the efficacy of different optimization methods for different network topologies, identifying the best set of parameters is critical. Parameter values were selected for each type of random network to ensure network complexity (Supplementary Table S.6 summarizes the parameters that were used in the analysis). Variation in network size was also explored, and the most connected node in each network was selected as the focal node. Uniformly randomly generated edge weights in [0,1] were used for the network distances, and the threshold distance was set to ensure that half of the nodes were initially within the distance to the focal node. The costs and benefits were normalized using the ranges from the exhaustive search routine as a benchmark to compare the results from the different optimization methods.
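The distance-weighted edge removal of Eq. (8) can be illustrated with a short Python sketch. It is not the authors' generator: a small hand-made planar grid stands in for a Delaunay triangulation or Voronoi diagram, $d(\cdot,F)$ is assumed to be the shortest-path distance on the unthinned network, and the names `distances_from`, `removal_probability` and `thin_edges` are hypothetical.

```python
import heapq
import random
from math import inf

def distances_from(edges, source):
    """Dijkstra distances from source on an undirected graph given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def removal_probability(d_i, d_j, d_max, p_r):
    """Eq. (8): p_R * max(d(i,F), d(j,F)) / max_k d(k,F)."""
    return p_r * max(d_i, d_j) / d_max

def thin_edges(edges, focal, p_r, rng):
    """Return the edges that survive random removal (assumes a connected input graph)."""
    dist = distances_from(edges, focal)
    d_max = max(dist.values())
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if rng.random() >= removal_probability(dist[u], dist[v], d_max, p_r)
    }

# Hypothetical 2 x 3 planar grid with unit edge weights; node 1 is taken as the focal node.
grid = {(0, 1): 1.0, (1, 2): 1.0, (0, 3): 1.0, (1, 4): 1.0,
        (2, 5): 1.0, (3, 4): 1.0, (4, 5): 1.0}
thinned = thin_edges(grid, 1, 0.3, random.Random(42))
```

Edges far from the focal node are removed more often than edges next to it, so with $p_R = 1$ every edge touching one of the farthest nodes is removed, while with $p_R = 0$ the network is unchanged.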

Empirical data: Street networks around schools

Networks composed of street edges and residence nodes around several schools from a US school system were used for the analysis. Ten suburban and rural schools from Knox County, TN, seven elementary and three middle, were selected as the schools that would benefit the most from increased thoroughfare connectivity, i.e. those that had the most students within the Euclidean walking distance but not the network distance to the school. Urban schools were not used since the street connectivity around these schools was already high and the benefit from additional thoroughfares would be low. Residential nodes were placed on the street networks. The residences within 1 mile and 1.5 miles, for the elementary schools and the middle schools respectively, are considered close nodes, while the nodes outside of these distances were classified as distant nodes (see Figure 3). The school networks do not generally display the characteristics of complex networks: they had low average degree, large path lengths, and were not efficient, yet had a few intersections (nodes) with a large number of street connections (see Supplementary Table S.4). The networks were evaluated with each optimization method to maximize the number of close residences connected to the school and minimize the distance of the new thoroughfares. The costs and benefits of these street connections were normalized using the ranges from the exhaustive search routine as a benchmark to compare the results from the different optimization methods.

RESULTS

Several findings are worth noting. First, there were consistent nonlinear relationships between the nodal characteristics and the quality of the solutions for each type of random network and the school networks (see Figure 4). There was also significant variation in which nodal characteristics were correlated with the quality of the solution across networks (see Table 2). Among those, the distance between the close node and the focal node and the distance between the distant node and the close node were most often highly correlated with the quality of the solution across networks. The centrality measures were inconsistently related to the solution quality for the random networks. The clustering methods were consistently unrelated to the quality of the solutions for the random networks, while the network modularity for the distant node was correlated with solution quality for the spatial networks and the school networks.

The termination times and the deviations from the optimal solutions for the optimization heuristics applied to the random networks are summarized in Figure 5 and Supplementary Figures S.1 and S.2. The hill climbing method was consistently the fastest for all of the networks, yet had the largest cost and benefit deviations. Simulated annealing and the genetic algorithm had similar termination times, but the genetic algorithm was consistently superior to all of the other methods in approaching the optimal solution. The results from the application of the optimization heuristics to the ten school networks are shown in Figure 5. The times to termination for each heuristic according to network size consistently followed the pattern ES > SA > HCVN > GA > HCS > HC. The genetic algorithm clearly outperformed the other heuristics, followed by simulated annealing, in terms of cost and benefit deviations (see Figure 5 (B), (D), and (F)).

Figure 3. Examples of the street networks used for the analysis: (A) a suburban elementary school and (B) a rural elementary school. The blue nodes represent the distant residences, i.e. the residences within the 1-mile Euclidean walking distance but not the 1-mile network walking distance, the green nodes are the close residences within the network walking distance, and the black square represents the school. The orange line denotes the optimal new walking connection that maximizes the number of additional residences (orange nodes) and minimizes the length of the new connection.

Figure 4. The relationship between the distance from the close node to the focal node and the costs and benefits of each solution for different networks. Panel (A) shows the relationship for a Watts-Strogatz network with N = [...]

Table 2. Average correlation coefficients for the random networks (1000 graphs for each network type with N = [...])

Network: ER, WS, BA, KS, DT, VD, Schools

Close node
d(j,F): -0.08, -0.52, -0.37, -0.48, -0.28, -0.41, -0.30
C^D_j, C^C_j, C^B_j, C^E_j, C^P_j: [entries largely missing in the source; surviving values: -0.03, -0.01, -0.00]
c^w_j, HC_j, NM_j: [entries largely missing in the source; surviving values: -0.01, 0.03, 0.07, 0.06, -0.06, 0.02]

Distant node
d(i,F): -0.46, -0.18, -0.12, -0.09, -0.07, -0.08, -0.04
C^D_i, C^C_i, C^B_i, C^E_i, C^P_i, c^w_i, HC_i, NM_i: [entries largely missing in the source; surviving values: -0.30, -0.19, 0.17]

Figure 5. The termination times for the heuristics applied to the random networks and school networks: (A) and (B) Erdős-Rényi networks, (C) and (D) Delaunay networks, and (E) and (F) the ten school networks. These are average times for 1000 random restarts of the optimization applied to 1000 random networks of each type and size. The times are scaled by the exhaustive search time and log transformed for easier interpretation. The benefits were scaled by the results from the exhaustive search, where a longer connection length is a positive cost deviation and a shorter connection is a negative cost deviation.
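The scaling used for Figure 5 can be illustrated with a minimal Python sketch. It runs an exhaustive search on a toy six-node street network and compares it with a naive rule, then reports deviations relative to the exhaustive result. The toy network, all function names, the nearest-node rule (a stand-in, not one of the heuristics evaluated here), and the exact deviation and time-scaling formulas are assumptions made for illustration; the text only states that results are scaled by the exhaustive search and that times are log transformed.

```python
import heapq
import math

def network_distances(adj, source):
    """Dijkstra shortest-path lengths; adj maps node -> {neighbor: edge length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def candidates(adj, focal, radius):
    """All new edges (distant node i, close node j) that do not already exist."""
    base = network_distances(adj, focal)
    close = [u for u in adj if base.get(u, math.inf) <= radius]
    distant = [u for u in adj if base.get(u, math.inf) > radius]
    return [(i, j) for i in distant for j in close if j not in adj[i]]

def evaluate(adj, pos, focal, radius, edge):
    """Benefit = nodes newly within radius; cost = straight-line length of edge."""
    i, j = edge
    cost = math.dist(pos[i], pos[j])
    before = network_distances(adj, focal)
    trial = {u: dict(nbrs) for u, nbrs in adj.items()}
    trial[i][j] = trial[j][i] = cost
    after = network_distances(trial, focal)
    benefit = sum(1 for u in adj
                  if before.get(u, math.inf) > radius
                  and after.get(u, math.inf) <= radius)
    return benefit, cost

def exhaustive_search(adj, pos, focal, radius):
    """Check every candidate: most nodes gained first, shortest edge second."""
    scored = [(evaluate(adj, pos, focal, radius, e), e)
              for e in candidates(adj, focal, radius)]
    (benefit, cost), edge = max(scored, key=lambda s: (s[0][0], -s[0][1]))
    return edge, benefit, cost

def nearest_node_rule(adj, pos, focal, radius):
    """Hypothetical stand-in heuristic: shortest new edge ending at the focal node."""
    options = [e for e in candidates(adj, focal, radius) if e[1] == focal]
    edge = min(options, key=lambda e: math.dist(pos[e[0]], pos[e[1]]))
    benefit, cost = evaluate(adj, pos, focal, radius, edge)
    return edge, benefit, cost

def deviations(heuristic, optimum):
    """Assumed scaling: fractions of the exhaustive-search benefit and cost."""
    _, b_h, c_h = heuristic
    _, b_opt, c_opt = optimum
    return (b_opt - b_h) / b_opt, (c_h - c_opt) / c_opt

def scaled_log_time(t_heuristic, t_exhaustive):
    """Assumed time scaling: log10 of the ratio to the exhaustive search time."""
    return math.log10(t_heuristic / t_exhaustive)

# Toy U-shaped street network: node e is near F in space but far along the streets.
pos = {"F": (0, 0), "a": (1, 0), "b": (2, 0), "c": (2, 1), "d": (1, 1), "e": (0, 1)}
adj = {u: {} for u in pos}
for u, v in [("F", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]:
    adj[u][v] = adj[v][u] = math.dist(pos[u], pos[v])

optimum = exhaustive_search(adj, pos, "F", radius=2.5)
heuristic = nearest_node_rule(adj, pos, "F", radius=2.5)
benefit_dev, cost_dev = deviations(heuristic, optimum)
print("exhaustive:", optimum)
print("stand-in heuristic:", heuristic)
print("benefit deviation:", round(benefit_dev, 3), "cost deviation:", round(cost_dev, 3))
```

In this toy case the stand-in rule picks a shorter connection than the exhaustive search but reaches fewer nodes, so its cost deviation is negative while its benefit deviation is positive, matching the sign convention in the Figure 5 caption.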

CONCLUSION

The network connectivity problem introduced in this study is relevant to a wide range of applications and is nontrivial, as the number of solutions can become large even for small systems. This type of combinatorial optimization problem highlights the difficulty in determining local search routines a priori. The nodal characteristics were nonlinearly related to the solutions, and different characteristics varied in their correlation with solution quality for different networks, making it difficult to exclude specific characteristics for network connectivity optimization. Distance to the focal node was consistently related to the quality of the solution, as this lowers connection length costs, while centrality, which provides greater benefit through more connections, was only intermittently correlated with solution quality. Clustering the nodal characteristics did not provide additional useful information for the random networks. This could arise from the following issues: the curse of dimensionality, i.e. large sparse subspaces in the solution space; the nodal characteristics are highly correlated with each other; outliers; finding the appropriate influential nodal characteristics is not possible a priori; and the influence of specific characteristics is dynamic as the heuristics converge. For the school networks: the clustering coefficient was a poor measure due to the lack of triplets in the networks; the network modularity also had poor results, possibly due to the measure's inability to account for the spatial component of the nodes; and the network-constrained clusters were also poor in explaining the solutions, due to the complexity of the network topology.

The optimization heuristics save computational time but vary considerably in their ability to find a solution near the optimum. The stochastic hill climbing search was not effective due to the large neighborhood search space explored. In our experiment, the number of solutions checked at each iteration was > 300, which resulted in a skewed probability distribution of objective values favoring the selection of low values. This degraded the efficiency of the method, resulting in the selection of poor solutions. The variable neighborhood search method was similarly unreliable because of the large neighborhood search space (the number of possible solutions explored at a given iteration could be > [...]) [...] times. Optimization of this feature is currently under development, as is a tool for ArcGIS and Python for planners and researchers to use.

ACKNOWLEDGMENTS

We would like to thank Alex Zendel (GIS Analyst at the Knoxville-Knox County Metropolitan Planning Commission) for providing the street networks and residential data around the schools.
