A Parallel Evolutionary Multiple-Try Metropolis Markov Chain Monte Carlo Algorithm for Sampling Spatial Partitions
Wendy K. Tam Cho and Yan Y. Liu

Abstract
We develop an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling spatial partitions that lie within a large and complex spatial state space. Our algorithm combines the advantages of evolutionary algorithms (EAs) as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distributions. Local optimality information that is identified via a directed search by our optimization heuristic is used to adaptively update a Markov chain in a promising direction within the framework of a Multiple-Try Metropolis Markov Chain model that incorporates a generalized Metropolis-Hastings ratio. We further expand the reach of our EMCMC algorithm by harnessing the computational power afforded by massively parallel architecture through the integration of a parallel EA framework that guides Markov chains running in parallel.
Keywords: Markov Chain Monte Carlo; Evolutionary Algorithms; Spatial Partitioning

Wendy K. Tam Cho: Department of Political Science, Department of Statistics, Department of Mathematics, Department of Asian American Studies, the College of Law, and the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. [email protected]

Yan Y. Liu: Research Scientist, Computational Urban Sciences Group, Computational Science and Engineering Division, Oak Ridge National Laboratory. [email protected]

INTRODUCTION
Markov Chain Monte Carlo (MCMC) methods originated in statistical physics (Metropolis et al., 1953) and have migrated to applications in many disciplines. A particular use of MCMC that has wide application is for sampling from complex and unknown distributions. While the theory of MCMC methods ensures sampling from unknown distributions, it is not always straightforward to devise these methods for a particular application, and the theoretical result, moreover, is asymptotic. Indeed, if the application problem is large and the state space is difficult to traverse, the amount of time required before the theoretical convergence of a Markov chain is realized may be prohibitively long. Hence, while MCMC methods are theoretically attractive, successful implementation for complex applications can be quite challenging.

Consider the spatial partitioning application illustrated in Figure 1. The goal is to characterize the set of all possible feasible partitions of size $k$, where feasible is defined as a partition that satisfies a specific set of constraints (e.g., contiguity). The plot on the left displays a fixed set of geographic units that collectively encompasses the entire state of Ohio. The plot on the right shows one particular mutually exclusive and exhaustive partition of these geographic units into $k = 16$ disjoint contiguous sets. One can see visually that, given the large number of base geographic units, there is an enormous number of distinct partitions into 16 disjoint contiguous sets. Indeed, this number is sufficiently large that it is computationally infeasible to enumerate the set of all partitions within current computing capabilities. Since an exhaustive search is not possible, an alternative route is to produce a representative sample of the universal set, which would also allow us to derive insights into the underlying population of feasible partitions. MCMC methods provide a possible route for sampling from this unknown population distribution.
However, how to specify an MCMC model that produces a sample of feasible spatial partitions is not obvious. For many applications, a common MCMC strategy is to define a Markov transition function that amounts to a small or local random change in the current state. Small or local changes are

INTELLIGENT STATE SPACE TRAVERSAL
Intelligent and randomized state space traversal is fundamental to performance, whether for an optimization heuristic or for an MCMC algorithm. One way in which the moves can be both random and purposeful, rather than simply random, is to follow the local dynamics of the target distribution. A Gibbs sampler follows the local dynamics of the target distribution by composing a sequence of conditional distributions along a set of sampling directions. In practice, while Gibbs samplers are ensured theoretical convergence, they may converge quite slowly, because it is often not clear how to intelligently sample from the full conditional distributions.
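To make the mechanics concrete before the formal description, the following is a minimal sketch (ours, not drawn from the works cited here) of a systematic-scan Gibbs sampler for a toy bivariate normal target with correlation `rho`, where both full conditionals are univariate normals:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Systematic-scan Gibbs sampler for a standard bivariate normal
    with correlation rho: each sweep draws x1 | x2, then x2 | x1."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)  # step 1: draw x1 from f(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)  # step 2: draw x2 from f(x2 | x1)
        samples.append((x1, x2))
    return samples
```

The sampler only ever draws from univariate conditionals, which is the property discussed above; with `rho = 0.9`, the empirical correlation of the draws approaches 0.9, but mixing slows as `rho` grows, illustrating the slow-convergence caveat.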
Suppose we wish to sample from a target distribution $f(x)$ where $x$ can be broken down into components, $x = (x_1, x_2, \ldots, x_n)$. One iteration of a systematic scan Gibbs sampler can be described as follows. Begin at some state, $x^{(0)}$. At iteration $i$,

1. Sample $x_1^{(i+1)}$ from the conditional distribution $f(x_1 \mid x_2^{(i)}, x_3^{(i)}, \ldots, x_n^{(i)})$.
2. Sample $x_2^{(i+1)}$ from the conditional distribution $f(x_2 \mid x_1^{(i+1)}, x_3^{(i)}, \ldots, x_n^{(i)})$.
...
n. Sample $x_n^{(i+1)}$ from $f(x_n \mid x_1^{(i+1)}, x_2^{(i+1)}, \ldots, x_{n-1}^{(i+1)})$.

The sequence of realizations, $\{x^{(t)} = (x_1^{(t)}, x_2^{(t)}, \ldots, x_n^{(t)})\}$, forms a Markov chain with stationary distribution $f(x)$.

Two challenges must be overcome for the Gibbs sampler to be successful. First is the issue of conjugacy, or the ability to sample from the conditional distributions (Carlin and Gelfand, 1991). Second is how to overcome the difficulty in constructing univariate sampling directions in such a way that ensures rapid movement around the support of $f(x)$ when simple random movement is insufficient. In other words, while it is possible to devise a Gibbs sampler that follows the local dynamics of the target distribution, how precisely one intelligently, and thus effectively, devises this movement is non-obvious and, further, application dependent.

The conjugacy issue can be addressed with the Griddy-Gibbs sampler (Ritter and Tanner, 1992), where rather than sampling from the full conditional distributions, one may form a simple approximation of the inverse CDF on a grid of points. However, even with a successful implementation of the Griddy-Gibbs sampler that addresses the difficulties of sampling from the full conditional distributions, the challenge of devising intelligent movements in purposeful directions remains.

Adaptive Direction Sampling (ADS) is a general technique that fits within the framework of the Gibbs sampler and aims to sample in a particular search direction.
It was introduced as an automated way of overcoming the slow convergence of the Gibbs sampler by providing a way to guide sampling from the conditional distributions (Gilks, Roberts and George, 1994; Roberts and Gilks, 1994). The ADS algorithm also introduces a way in which the interaction of multiple Markov chains can be specified to improve the performance of MCMC, by proposing that the sampling direction be updated with information from a different Markov chain state.

ADS can be adapted in various ways. The general form of ADS is
$$x_c^{(t+1)} = x_c^{(t)} + r\,(v^{(t)} + u^{(t)} x_c^{(t)}), \qquad (1)$$
where $v^{(t)}$, an $n$-vector, and $u^{(t)}$, a scalar, are any functions of the current set, $S^{(t)}$, excluding $x_c^{(t)}$, the current state, all at time or state $t$. If $v^{(t)}$ is the difference between two points in the current set, parallel ADS emerges. If $v^{(t)}$ is a random coordinate direction with $u^{(t)} = 0$, we have the Gibbs sampler. The hit-and-run algorithm emerges when $v^{(t)}$ is a random direction and $u^{(t)} = 0$ (Bélisle, Romeijn and Smith, 1993).
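The special cases of equation (1) can be illustrated with a small sketch; the states, directions, and step sizes below are hypothetical and chosen only for illustration:

```python
import random

def ads_update(x_c, v, u, r):
    """One Adaptive Direction Sampling move, equation (1):
    x_new = x_c + r * (v + u * x_c), applied componentwise."""
    return [xi + r * (vi + u * xi) for xi, vi in zip(x_c, v)]

x_c = [1.0, 2.0]

# Parallel ADS: v is the difference of two other states from the
# current set S (u taken as 0 here for simplicity).
s1, s2 = [0.5, -1.0], [2.0, 0.0]
v_par = [a - b for a, b in zip(s1, s2)]
y_par = ads_update(x_c, v_par, u=0.0, r=0.5)

# Gibbs special case: v is a coordinate direction with u = 0, and the
# scalar r plays the role of the sampled increment along that axis.
rng = random.Random(7)
y_gibbs = ads_update(x_c, [1.0, 0.0], u=0.0, r=rng.gauss(0.0, 1.0))
```

In the Gibbs case only the first coordinate changes, which is exactly the coordinatewise movement described above.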
Rather than relying on an arbitrary direction, Liang and Wong (2001) guide the directional sampling of ADS with their snooker crossover, a recombination operator from an evolutionary algorithm (EA) that they modify to fit within the MCMC framework. That is, they utilize an optimization heuristic to define the sampling direction in ADS. In the snooker algorithm, a set of possible states, $S$, is retained. In the language of evolutionary algorithms, these possible states comprise the population. From the set $S$, two states are chosen randomly, one as the current state, $x_c$, and one as the anchor state, $x_a$. The direction of movement is then chosen along a line that connects the current state and the anchor state. From equation (1), the snooker algorithm arises when $u^{(t)} = -1$ and $v^{(t)} = x_a^{(t)}$, where $x_a^{(t)}$ is a randomly chosen state from $S^{(t)}$, excluding $x_c^{(t)}$. To prevent the new states from being too concentrated around the anchor state, one can sample from an adjusted full conditional distribution. The intuition is that after a burn-in period, the set of states is likely to be in high density regions, and high density states are helpful for identifying sampling directions for a Gibbs sampler.

While the snooker algorithm is a recombination evolutionary algorithm operator that can be adapted to fit within the mathematical framework of MCMC, the effectiveness and efficiency of the algorithm for recovering samples for any particular application is still dependent on how its operators traverse that particular state space. In the optimization literature, it is clear that while EAs have been successful for many applications, significant effort must still be expended to adapt the heuristic to the particular solution landscape. To be sure, EAs are not panaceas.
Rather, they comprise a general technique whose success is dependent on successful adaptation of the evolutionary framework to a specific application.

At the same time, combining optimization heuristics within an MCMC framework is a promising strategy for difficult problems, and one that has already been successfully implemented for a number of interesting problems, including Bayesian mixture models, $C_p$ model sampling, and change point problems (Liang and Wong, 2000, 2001; Laskey and Myers, 2003). The main hindrance is that the approach is general, and successful implementation of both the optimization as well as the MCMC components, especially for complex applications, must be tuned to the idiosyncrasies of the application state space.

Liu, Liang and Wong (2000) generalize how Markov chain movement can be adapted with optimization heuristics in their Multiple-Try Method (MTM). They propose a way to incorporate optimization steps into an MCMC sampler with a multiple-try Metropolis-like transition rule that
uses a proposal transition function $T(x, y)$. The method is general, and Liu, Liang and Wong (2000) demonstrate how MTM can be combined with different MCMC variants (in particular, ADS, Griddy-Gibbs samplers, and the Hit-and-Run algorithm) to produce more effective samplers.

In the MTM framework, suppose that the current state is $x$. The optimization heuristic proposes some set of $m$ proposal moves, $Y = \{y_1, \ldots, y_m\}$. In this way, the optimization heuristic provides a method for directional sampling without the shortcoming of previous approaches that relied primarily on a random direction. Importantly, since an optimization move may not be a proper Markov transition, it cannot simply be accepted as the next state in a Markov chain. Instead, from the set of proposal states created by the optimization heuristic, one proposal state, $y \in Y$, is selected probabilistically, and this selected proposal is then accepted or rejected according to a generalized Metropolis-Hastings ratio that has the form,
$$r_g = \min\left\{1,\; \frac{\xi(y_1, x) + \cdots + \xi(y_m, x)}{\xi(x_1^*, y) + \cdots + \xi(x_m^*, y)}\right\}, \qquad (2)$$
where $\xi(y_j, x)$ is the weight determining the probability of choosing proposal $y_j$ from the set $Y$, and $\{x_i^*\}$, $i = 1, \ldots, m$, comprise a set of proposal moves from the same optimization heuristic that now begins at state $y$ and moves in the direction of state $x$.

Liu, Liang and Wong (2000) develop and demonstrate these ideas with a Conjugate-Gradient Monte Carlo where the search direction proceeds along a vector whose direction is based on the conjugate gradient. Once the direction is chosen, randomness is injected via the magnitude of this vector. Searching along a gradient is one optimization technique, but plainly not the only option. This tactic is effective for certain types of state spaces, but not ideal for other types, and certainly not for all types.
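A minimal sketch of one MTM step may be useful. Here we substitute a symmetric Gaussian random walk for the optimization heuristic and take $\lambda \equiv 1$, so the weight $\xi(y, x)$ reduces to the (unnormalized) target density; the standard-normal target is purely illustrative:

```python
import math
import random

def mtm_step(x, log_pi, m, step, rng):
    """One Multiple-Try Metropolis move with a symmetric random-walk
    proposal; with lambda = 1 the weight xi(y, x) reduces to pi(y)."""
    ys = [x + rng.gauss(0.0, step) for _ in range(m)]
    wy = [math.exp(log_pi(y)) for y in ys]
    # select one proposal with probability proportional to its weight
    y = rng.choices(ys, weights=wy)[0]
    # reference set: m - 1 fresh draws from y, plus the current state x
    xs = [y + rng.gauss(0.0, step) for _ in range(m - 1)] + [x]
    wx = [math.exp(log_pi(xr)) for xr in xs]
    # generalized Metropolis-Hastings ratio, equation (2)
    r_g = min(1.0, sum(wy) / sum(wx))
    return y if rng.random() < r_g else x

def sample(n, m=5, step=1.0, seed=3):
    rng = random.Random(seed)
    log_pi = lambda x: -0.5 * x * x  # standard normal target, unnormalized
    x, out = 0.0, []
    for _ in range(n):
        x = mtm_step(x, log_pi, m, step, rng)
        out.append(x)
    return out
```

The point of the sketch is the structure, not the proposal: in the paper's setting the $m$ candidate states would come from an optimization heuristic rather than a random walk.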
Indeed, optimization heuristics come in many different genres, and within these genres, with many different specifications and implementations. However, the MTM framework is general such that any optimization technique can be employed to create a set of proposal moves that are then accepted or rejected according to the generalized Metropolis-Hastings ratio.

The evolution of these algorithms highlights a theme encountered repeatedly both in statistical modeling and in the design of optimization heuristics. Namely, there is not one fixed solution that is ideal for every application. There are only general guiding principles for utilizing particular modeling frameworks. Incontrovertibly, devising Markov chains must be done with the peculiarities of the application in mind. As well, the success of optimization heuristics increases in likelihood when they are designed to adapt to application idiosyncrasies. In every case, circumspection is necessary in the selection of a meaningful and effective search direction for either the Markov chain transitions or the optimization heuristic steps.

While traversal of some state spaces may seem to fit into standard frameworks, other state spaces are more unusual. Constrained spatial state spaces produce particularly thorny applications since it is difficult to envision how one might transform multi-dimensional spatial characteristics into uni-dimensional searches. Indeed, spatial constraints pose particular problems for both spatial optimization as well as sampling techniques because spatial requirements can impose significant
AN EVOLUTIONARY ALGORITHM FOR SPATIAL OPTIMIZATION
Consider the spatial partitioning application where we wish to partition a set of $n$ spatial units, $u_1, u_2, \ldots, u_n$, into a set of $k$ disjoint spatially contiguous zones that each satisfy a specified set of spatial as well as non-spatial constraints.

Mathematical Formulation

$I$: set of spatial units;
$A$: set of adjacent unit pairs;
$K$: set of zones;
$n_k$: number of units in zone $k$;
$x = \{x_{ik}\}$: $x_{ik} = 1$ if unit $i$ is assigned to zone $k$, $0$ otherwise;
$y_{ijk}$: flow from unit $i$ to unit $j$ for zone $k$;
$h_{ik}$: $h_{ik} = 1$ if unit $i$ is the hub of zone $k$, $0$ otherwise.

Objective: minimize $obj(x)$, subject to
$$\sum_{j \mid (i,j) \in A} y_{ijk} - \sum_{j \mid (j,i) \in A} y_{jik} = n_k h_{ik} - x_{ik} \quad \forall k \in K,\ \forall i \in I \qquad (3)$$
$$\sum_{j \mid (j,i) \in A} y_{jik} \le (n_k - 1)\, x_{ik} \quad \forall k \in K,\ \forall i \in I \qquad (4)$$
$$\sum_{k \in K} x_{ik} = 1 \quad \forall i \in I \qquad (5)$$
$$\sum_{i \in I} h_{ik} = 1 \quad \forall k \in K \qquad (6)$$
$$a x \le b \qquad (7)$$
$$x_{ik}, h_{ik} \in \{0, 1\} \quad \forall k \in K,\ \forall i \in I \qquad (8)$$
$$y_{ijk} \ge 0 \quad \forall k \in K,\ \forall (i,j) \in A \qquad (9)$$

The above formulation is revised from Shirabe (2009), who defines the contiguity of each partition as a network flow from all of the spatial units in each zone to their zone hub, which receives the flow. The objective function, $obj()$, is a weighted sum of spatial and non-spatial objectives. Constraint (3) requires that the difference between the flow out of a unit $i$ and the flow into $i$ equal $n_k h_{ik} - x_{ik}$. This means that if unit $i$ is the hub of zone $k$, the flow traverses each unit in the zone exactly once. Otherwise, this constraint has no effect. Constraint (4) ensures that no unit is visited twice. Constraint (5) guarantees that each unit is a member of one and only one zone. Constraint (6) guarantees that each zone has only one hub. These four constraints together ensure that all of the units are partitioned into exactly $k$ contiguous zones.
Constraint (7) denotes all other non-spatial constraints. While unit assignment is discrete (Constraint (8)), the flow is formulated as a continuous variable (Constraint (9)). We can also see that this problem is computationally intractable, since Constraints (3) and (4) generate a number of inequalities that increases exponentially with the number of units.

One example of a constraint is the requirement for weight balancing across zones. Here, each spatial unit, $u_j$, $j = 1, \ldots, n$, is associated with a weight, $u_j^{(w)}$. Each zone, $i = 1, \ldots, k$, is an aggregation of some number of these units, and the weight for zone $i$ is $w_i = \sum_{u_j \in \text{zone } i} u_j^{(w)}$. The aggregated weight for any one zone is required to be within a small specified range, $\varepsilon$, of all other zones,
$$\frac{\max_{i=1}^k w_i - \min_{i=1}^k w_i}{\sum_{i=1}^k w_i} < \varepsilon. \qquad (10)$$

There are many other constraints that may be imposed on the solutions. One example is a mathematically defined shape requirement that may, for instance, constrain the size of the isoperimetric quotient, or the ratio of the area of a zone to the area of a circle with the same perimeter. Another example of a spatial constraint is a requirement that certain spatial units must be contained in the same zone. While there are a large number of possibilities for spatial constraints, they all have the effect that they limit the global state space to a smaller feasible state space, where feasibility is defined as states that satisfy these specific spatial and non-spatial constraints.

Each candidate solution is encoded as a chromosome of $n$ alleles, $\beta_1, \beta_2, \ldots, \beta_n$. Each allele, $\beta_i$, takes on one value from the set $[1, k]$. We introduce a virtual unit 0 and its associated zone 0 to indicate the region border outside and surrounding the units being partitioned.
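The balance requirement in (10) is straightforward to check; a small sketch (the zone weights used below are hypothetical):

```python
def weight_balanced(zone_weights, eps):
    """Constraint (10): the spread between the heaviest and lightest
    zone, relative to the total weight, must fall below eps."""
    total = sum(zone_weights)
    return (max(zone_weights) - min(zone_weights)) / total < eps
```

For example, zones with weights 98, 100, and 102 satisfy a 5% tolerance, while weights 50, 100, and 150 do not.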
Every spatial unit is assigned to exactly one zone.

This formulation is similar to the Real-Encoded Evolutionary Monte Carlo (Liang and Wong, 2001) except that $\beta_i \in [0, k] \cap \mathbb{Z}$ instead of $\beta_i \in \mathbb{R}$. While this may seem like a reduction in the solution space, note that in our application the state space becomes much more complex, since the spatial constraints significantly restrict which encodings are valid or feasible solutions and render most of the combinatorial encodings infeasible because they violate the spatial constraints. In turn, the restrictions on which encodings are valid make the solution space traversal especially difficult, since randomly choosing alleles to alter is then very likely to result in a solution that violates the constraints. Movement in the solution space thus must be carefully considered and spatially aware to avoid wasted computational effort.

For this spatial partitioning optimization problem, Liu and Cho (2020) formulated two EA spatial operators. The first is an ejection chain-based mutation operator, ECMUT, which selects a chromosome at random and mutates a small number of its "mutable" alleles, where an allele is defined as mutable if the change in the allele does not result in an infeasible solution. When a small number of alleles are mutated, at least one of the alleles is on a zone border. This process may be described as follows.

1. Randomly choose one chromosome, $x_i$, from the current population, $S$.
2. Add a random vector subject to constraints, $e$, to create a new chromosome $y_i = x_i + e$, where $e$ is chosen so that $y_i$ is a feasible solution.

The second EA operator is a spatial path-relinking crossover operator (PRCRX) that adapts the general non-spatial path relinking method to the spatial context. It does so by providing an ordered way to explore and perform recombination in the neighborhood space defined by two chromosomes (Glover, 1994; Glover, Laguna and Marti, 2000) while respecting spatial constraints.
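The mutability test that underlies ECMUT can be sketched as follows; the representation and the function names are ours, assuming a generic adjacency-list graph. A unit is mutable if it borders another zone and its removal leaves its current zone connected:

```python
from collections import deque

def zone_connected(units, adj, assign, zone, skip=None):
    """BFS test that the units assigned to `zone` (optionally with
    `skip` removed) form one connected component. An emptied zone
    counts as infeasible (False), since the zone would vanish."""
    members = {u for u in units if assign[u] == zone and u != skip}
    if not members:
        return False
    start = next(iter(members))
    seen, frontier = {start}, deque([start])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v in members and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen == members

def mutable_moves(units, adj, assign):
    """Enumerate (unit, new_zone) reassignments that keep every zone
    contiguous: the unit must border the new zone, and removing it
    must not disconnect or empty its current zone."""
    moves = []
    for u in units:
        neighbor_zones = {assign[v] for v in adj[u]} - {assign[u]}
        for z in sorted(neighbor_zones):
            if zone_connected(units, adj, assign, assign[u], skip=u):
                moves.append((u, z))
    return moves
```

Only zone-boundary units can appear in the result, since an interior unit has no neighbor in a different zone; the number of distinct mutable units, `len({u for u, _ in moves})`, corresponds to the count $M_s$ used later in the sampling discussion.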
Details of their operators are provided in Liu and Cho (2020). Here, we provide just a general description. In particular, their spatial path-relinking crossover can be described as follows. At step $i$,

1. Randomly choose two chromosomes, a source solution, $x_s$, and a target solution, $x_t$, $s \neq t$, from the population $S^{(i)}$.
2. The relinking process is comprised of a "walk" in the spatial neighborhood space between the two solutions. Each step in the path converts a random allele from its value in $x_s$ to its value in $x_t$. It is possible that some of these steps will lead to infeasible solutions, though spatial contiguity is always maintained.
3. If feasible solutions are found on the path, randomly pick one, $y$, to replace chromosome $x_s$ with some probability, yielding a new population, $S^{(i+1)}$.

Figure 3: Performance of the Spatial Path Relinking Crossover Operator (PRCRX). (Axes: fitness value vs. iterations; series: ECMUT, PRCRX.)

The ECMUT spatial mutation and PRCRX spatial crossover operators have been shown to be effective and efficient for a large spatial partitioning optimization application. Figure 3 shows the performance of these EA operators in an optimization context. We can see from the figure that while ECMUT is also effective in identifying solutions with increasingly optimal values, the spatial path relinking operator provides significant additional improvement in the search process. In particular, PRCRX was able to reach lower fitness values more quickly and more deliberately than the ECMUT operator alone, which exhibited improved, but more variable, behavior.

In Liu and Cho (2020)'s empirical evaluation, they compared the performance of their EA operators with the performance of four previously proposed spatial optimizers and provided comprehensive comparisons.

EVOLUTIONARY SPATIAL SAMPLING
Spatial optimization and spatial sampling share a common component: the need for effective and efficient traversal of a spatial state space. While operators that have been designed for spatial optimization cannot be used directly for sampling from spatial state spaces, they can be adapted to an MCMC framework, either through a Metropolis-Hastings framework, for instance, or by providing a proposal set for directional sampling according to the structure of the Multiple-Try Metropolis MCMC model. We now discuss how this adaptation might proceed for these particular EA operators in the spatial partitioning problem.
ECMUT is a fairly small and local movement, simply exchanging some small number, $p$, of mutable units. The adaptation of the ECMUT mutation operator to define the transition of a Markov chain for a standard Metropolis-Hastings MCMC is theoretically straightforward. In particular, an ECMUT proposal is accepted with probability $\min(1, r)$, defined by the Metropolis-Hastings rule, where
$$r = \frac{\pi(y)\, T(y, x)}{\pi(x)\, T(x, y)} = \exp\{-[H(y) - H(x)]\}\, \frac{T(y, x)}{T(x, y)}, \qquad (11)$$
and $T(x, y)$ is a proposal transition function. When the transition function, $T(\cdot)$, is symmetric (as it is for non-spatial state spaces, where every allele is subject to possible mutation and may be mutated to any other allele), only the fitness values are needed to compute the Metropolis-Hastings ratio.

For our application, the transition function is not symmetric because we constrain ECMUT movement to feasible alleles, at least one of which is located on a zone border. Hence, for the spatial partitioning problem, we calculate the MH ratio as
$$r_{\text{MUT}} = \frac{M_x}{M_y} \exp\{-[H(y) - H(x)]\}, \qquad (12)$$
where $M_x$ is the number of mutable units that may be chosen for the mutation step that originates from solution $x$ and that results in another feasible solution. Similarly, $M_y$ is the number of mutable units that may be chosen for the mutation step that originates from solution $y$ and that results in another feasible solution. $H(\cdot)$ is the fitness function value for a particular solution.

To distinguish the use of the operator for differing numbers of mutated units, we call the operator ECMUT-$p$, where $p$ is the number of units that are mutated. Variations of ECMUT-1 as an MCMC transition proposal for spatial partitioning have been discussed by others (Bangia et al., 2017; Mattingly and Vaughn, 2014).
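The acceptance rule implied by ratio (12) can be sketched directly; the fitness values and mutable-unit counts below are assumed to be computed elsewhere (e.g., by a mutability scan of the zone boundaries):

```python
import math
import random

def accept_ecmut(H_x, H_y, M_x, M_y, rng):
    """Metropolis-Hastings accept/reject for an ECMUT proposal y from x
    using the asymmetric ratio (12): r = (M_x / M_y) * exp(-(H_y - H_x)).
    M_x and M_y count the feasible mutations available from x and y."""
    r = (M_x / M_y) * math.exp(-(H_y - H_x))
    return rng.random() < min(1.0, r)

# A downhill move (H_y < H_x) with equal mutable counts has r > 1 and
# is always accepted:
accepted = accept_ecmut(10.0, 0.0, 5, 5, random.Random(0))  # True
```

Note that when $M_x = M_y$ the correction factor vanishes and the rule reduces to the symmetric case of (11), where only fitness values matter.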
For ECMUT-1, $M_x$ and $M_y$ are fairly straightforward to compute. To compute $M_s$, where $s$ is some solution, we can assess whether each boundary unit for each zone is mutable, which means that it can be reassigned to a neighboring zone without disconnecting its current zone and would result in a feasible solution after mutation. Notice that, for a planar graph, where the edges indicate unit connectivity, the computing cost for $M_s$ depends on the number of neighbors that must be checked for contiguity for each unit on a zone boundary and the total number of zone boundary units. Because of the sparsity of a planar graph, the average cost of checking the direct neighbors of a zone boundary unit for mutability is $O(1)$. In addition, following the Euler characteristic, for any convex polyhedron surface, $V - E + F = 2$, where $V$ is the number of vertices, $E$ is the number of edges, and $F$ is the number of faces. In our application, $V = n$, the number of units, $E$ is the number of edges or adjacency links, and $F$ is the number of faces in our resulting finite and connected planar graph. We know that $E \le 3n - 6$ when $n \ge 3$, since any face is bounded by at least three edges, and each edge touches two faces. Therefore, for any unit, the average number of direct neighbors is less than six.

The relationship between $b$, the number of boundary units for all $k$ zones, and $n$ can be modeled as the total length of the zone perimeters and the sum of the zone areas. We can make an analogy to cutting a pizza into $k$ pieces, where the shape of each piece is inconsequential as long as the $k$ pieces collectively encompass the whole pizza. Here, each pizza piece is akin to a zone. The upper bound of $b$ is $n$, regardless of the value of $k$. For example, consider an $n = q \times k$ grid, where each row of $q$ cells is a zone in a $k$-partition solution. The lower bound of $b$ is $k$.
This would occur in the scenario where one zone has $(n - k + 1)$ contiguous units while the $(k - 1)$ remaining zones each contain one of the remaining $(k - 1)$ units. Hence, we know that $\frac{k}{n} \le \frac{b}{n} \le 1$. An important observation is that, when $k$ is fixed, the ratio $\frac{b}{n}$ decreases as $n$ grows. That is, while the area increases linearly with $n$, the perimeter increases in proportion to $\sqrt{n}$. The manifestation in our problem is that, as the problem size increases, the cost of computing $M_s$ rises much more slowly than the linear increase of $n$.

While ECMUT-1 provides a proper Markov transition with a computationally feasible MH ratio, we have already discussed the issues associated with the ECMUT-1 transition. First, while this type of transition in a Metropolis-Hastings MCMC is simple conceptually and operationally, and results in a fluid Markov chain, the performance of such an operationalization is inefficient and possibly insufficient if the state space is particularly large or complex. Unless the performance can be greatly improved, perhaps via massive computational resources, this type of transition proposal will be inadequate for large scale applications. Second, even for a relatively small spatial partitioning application, because the spatial constraints render large numbers of the combinatorial encodings infeasible, this Markov chain is likely to become trapped in localized regions that are disconnected from other feasible solutions, resulting in a reducible chain that is incapable of sampling from the global state space.

An ECMUT-$p$ operator, where $p > 1$, produces larger transitions, which contributes to searching efficiency and reduces (though does not solve) the issues associated with disconnected state space regions, since invalid encodings can be traversed between the first and the $p$th single unit step. If we are able to enumerate the set of possible transitions of size $p$ from a solution $x$, we can compute the MH ratio (12) directly.
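The bounds on $b$ discussed above can be checked numerically with a small boundary-unit counter for grid instances (a sketch assuming rook adjacency; the grids below are hypothetical):

```python
def count_boundary_units(grid):
    """Count units with at least one rook-adjacent neighbor in a
    different zone; `grid` is a 2-D list of zone labels."""
    rows, cols = len(grid), len(grid[0])
    boundary = 0
    for i in range(rows):
        for j in range(cols):
            if any(grid[ni][nj] != grid[i][j]
                   for ni, nj in ((i - 1, j), (i + 1, j),
                                  (i, j - 1), (i, j + 1))
                   if 0 <= ni < rows and 0 <= nj < cols):
                boundary += 1
    return boundary

# Rows-as-zones on a q x k grid (here q = 4, k = 3): every unit borders
# another zone, so b attains its upper bound n.
b_upper = count_boundary_units([[0] * 4, [1] * 4, [2] * 4])  # 12 of n = 12
```

Scaling the grid while holding $k$ fixed shows the ratio $b/n$ shrinking toward the perimeter-versus-area behavior described above.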
The tradeoff is that, as $p$ increases, this combinatorial calculation grows rapidly.

EA crossover operators have been successfully integrated into an MCMC framework. For example, Liang and Wong (2000) adapt a binary-encoded EA, where the alleles, $\beta_i \in \{0, 1\}$, produce a chromosome that is a binary vector. Their crossover operators include a 1-point, 2-point, and uniform crossover. They choose the first parent chromosome, $x_i$, from the population, $x$, using a roulette wheel. The second parent, $x_j$, is chosen randomly from the remaining chromosomes. From the two parent chromosomes, two offspring chromosomes, $y_i$ and $y_j$, are generated. Without loss of generality, assume $H(x_i) \ge H(x_j)$ and $H(y_i) \ge H(y_j)$. A new population, $y = \{x_1, \ldots, y_i, \ldots, y_j, \ldots, x_n\}$, is proposed and is accepted with probability $\min(1, r_c)$, where
$$r_c = \frac{\pi(y)\, T(y, x)}{\pi(x)\, T(x, y)} = \exp\{-[H(y_i) - H(x_i)]/t_i - [H(y_j) - H(x_j)]/t_j\}\, \frac{T(y, x)}{T(x, y)} \qquad (13)$$
is the Metropolis-Hastings ratio, and $T(x, y) = P((x_i, x_j) \mid x)\, P((y_i, y_j) \mid (x_i, x_j))$. Within the transition probability, $P((x_i, x_j) \mid x)$, the probability of selecting $(x_i, x_j)$ from the population $x$, is
$$P((x_i, x_j) \mid x) = \frac{1}{(n - 1)\, Z(x)} \left[\exp\{-H(x_i)/t\} + \exp\{-H(x_j)/t\}\right], \qquad (14)$$
where $Z(x) = \sum_{i=1}^n \exp\{-H(x_i)/t\}$. Since this crossover operator is symmetric, the ratio of transition probabilities reduces to 1, and so only the selection probabilities, which are either chosen randomly or based on fitness values, are needed to determine the acceptance probability.

Real-encoded EAs can be more intimidating than binary-encoded EAs in the sense that the non-spatial real-encoded model, $x = \{\beta_1, \beta_2, \ldots, \beta_k\}$, with $\beta_i \in \mathbb{R}$, encompasses a much larger solution space to traverse.
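The parent-selection probability (14) can be sketched as follows; the function name is ours, and summing it over all unordered pairs yields 1, since each chromosome appears in $n - 1$ pairs:

```python
import math

def parent_pair_prob(i, j, H, t):
    """Equation (14): probability of selecting the parent pair
    (x_i, x_j), where the first parent is drawn by a Boltzmann
    roulette wheel over fitness values H at temperature t and the
    second uniformly from the remaining n - 1 chromosomes."""
    n = len(H)
    Z = sum(math.exp(-h / t) for h in H)
    return (math.exp(-H[i] / t) + math.exp(-H[j] / t)) / ((n - 1) * Z)
```

Because both orders of drawing the pair are pooled into one expression, the probabilities over all pairs form a proper distribution, which is what makes the transition-probability bookkeeping in (13) tractable.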
However, when moving to the non-spatial real-encoded values, no additional complications are introduced in the adaptation for symmetric transition probabilities. This occurs for simple crossover operators like the $k$-point or uniform crossovers.

Unfortunately, as we have discussed, these simple crossover operators are ineffective for spatial applications because they are spatially unaware. Indeed, this was the impetus for deriving new EA operators for spatial optimization, where we designed the spatially cognizant PRCRX operator to produce both larger movements and enable a more diversified search. Our empirical evaluation showed that PRCRX offered significant improvement in both the effectiveness and the efficiency of the optimization search process (Liu and Cho, 2020), making it an auspicious prospect for an MCMC transition. Adaptation of PRCRX into an MCMC context, however, is not as simply realized as it is for the common and basic binary EA crossover operators.

Fifield et al. (2019) bypass the need for computational advancements by proposing an approximation for the transition probability, $p!\left(\frac{u}{x}\right)^p$. However, their approximation is based on intuition and was not theoretically derived. They acknowledge that "it is difficult to develop a rigorous theoretical justification for the proposed approximation." In addition, while they provide an empirical example, the closeness of this approximation to the true value is unknown in general and has not been rigorously tested or evaluated.

We adapt PRCRX, the spatial path relinking crossover operator that we developed as part of an optimization heuristic, to an MCMC algorithm through the Multiple-Try Model framework.
Here, the purpose of the optimization crossover operator is to create a Markov transition proposal set. One proposal state from this set is chosen probabilistically and is accepted or rejected according to a generalized Metropolis-Hastings ratio.

We begin with a set of $q$ parallel chains, $X_1, X_2, \ldots, X_q$, that are randomly generated. Their states at time $t$ are $x_1^{(t)}, x_2^{(t)}, \ldots, x_q^{(t)}$, respectively. For the spatial path relinking crossover, we need a current state and a target state. If, at time $t$, for chain $X_i$, we want to invoke a spatial path relinking crossover transition, then state $x_i^{(t)}$ becomes the current state. We randomly select another chain, $X_j$, $j \neq i$, from the set or population of all other Markov chains. The current state of that randomly chosen chain, $x_j^{(t)}$, becomes the target state. Roberts and Gilks (1994) established that essentially any way of choosing the target state, $x_j^{(t)}$, is appropriate provided that $x_i^{(t)}$ and $x_j^{(t)}$ are independent.

The spatial path relinking crossover, PRCRX, begins at state $x_i^{(t)}$ and moves in the direction of state $x_j^{(t)}$. Each step in the crossover heuristic involves the swap of some set of contiguous units from their zone in the current state to their zone in the target state. Figure 4 illustrates this process. The current state is shown in subfigure (a). The target state is shown in subfigure (b). Each of the outlined regions in subfigure (c) has a unique source/target zone label, indicating its zone assignment in the source state and its zone assignment in the target state. Since there are 4 zones, there are $4 \times 4 = 16$ possible unique current/target zone labels, though not all of these zone labels will necessarily appear in an overlap. In the illustration, only 13 of the 16 possible groupings appear.

The path relinking begins with a set of $k$ seed groups, where $k$ is the number of zones in the partitioning problem.
These seed groups can be chosen in any way as long as the resulting groups each have unique target zones. Subfigure (d) shows one way in which to choose these seed groups, i.e., by picking those groups where the current and target zone labels are the same.

Figure 4 shows one possible path that traverses the entire distance from the source state to the target state. Note that full traversal of the path from the current state to the target state is not necessary, since the purpose is simply to explore the feasible intermediate states in the neighborhood space, not necessarily to complete the path. Subfigure (f) shows one intermediate state along one path that results from the re-assignment of the region labeled ∆. Many intermediate/proposal states exist along the various possible paths in the spatial neighborhood space.

Let C be the total number of sets of contiguous units that need to be moved or re-assigned to complete the path. A proposal state is the result of the swap of some number, c, of these components, 0 < c < C. The distance between states x_i^{(t)} and x_j^{(t)} is defined as d = Σ_{z=1}^{C} |G_z|, where z indexes the groups, G_z is a set of contiguous units, C is the number of sets, G_z, in the source zone that can potentially be moved to the target zone, and |G_z| is the cardinality of the set of contiguous units in the group G_z. At most, d units, contained within C connected components, need to be moved to complete the entire path from the current state to the target state.

The MTM framework requires T(x, y) > 0 if and only if T(y, x) > 0. That is, the transition to y from x is possible if and only if this transition is reversible with non-zero probability. To ensure that x could be chosen as the anchor solution, whenever a path relinking transition is accepted, we enlarge the target pool by one by inserting the current solution, x_i^{(t)}, into the target pool. Since we just traversed a path that connects x to y, we know such a path exists. Hence, if y becomes the current state, to satisfy the reversibility requirement, it only has to be possible that x_i^{(t)} could be chosen as the target solution.

The direction of the movement from the current state is determined by the target state. The magnitude of the movement in this direction is chosen randomly by sampling d_m ∼ Unif[{|G_z| ∀ z}]. To identify a set of m proposal states, y_1, y_2, ..., y_m, at various distances, d_m, along the path in the spatial neighborhood space of state x_i^{(t)} and state x_j^{(t)}, we draw m samples.

Once we have our proposal set, Y = {y_1, y_2, ..., y_m}, we choose a state, y, from this set with probability proportional to ξ(x, y) = π(x) T(x, y) λ(x, y), where λ(x, y) is a non-negative symmetric function and T(x, y) is the proposal transition function defined by the path relinking operator. Note that T(x, y) does not need to be symmetric. In addition, the only requirement for λ(x, y) is that λ(x, y) > 0 whenever T(x, y) > 0. How to identify an optimal λ(x, y) for a particular application is an open question. However, Liu, Liang and Wong (2000), in their set of numerical experiments, found that the performance of an MTM sampler was insensitive to the choice of λ(x, y) and that the simplest choice is λ(x, y) ≡ 1. This is the choice we utilize. If c is the number of connected components that have been moved from the current state, x, to reach the proposal state, y, then T(x, y) = 1 / [C(C − 1) ··· (C − (c − 1))] and, with π(x) ∝ exp{−H(x)}, ξ(x, y) = T(x, y) exp{−H(x)}.

Once the state y is chosen, we generate a second set, {x*} = {x*_1, x*_2, ..., x*_{m−1}}, from T(y, ·), which is a transition that is also defined by the spatial path relinking model, but y is now the current state, x_i^{(t)} is the target state, and {x*} is a set of intermediate states that lie in the neighborhood space between y and x_i^{(t)}. We then accept y as the next Markov chain state with probability defined by the generalized Metropolis-Hastings ratio,

r_PRCRX = min{ 1, [ ξ(y, x) + Σ_{j=1}^{m−1} ξ(y_j, x) ] / [ ξ(x, y) + Σ_{j=1}^{m−1} ξ(x*_j, y) ] },   (15)

and let x_i^{(t+1)} = x_i^{(t)} with the remaining probability.

This process allows us to update one Markov chain, X_i, with information from another Markov chain, X_j, that is randomly chosen to be the target chain. The next step, x_i^{(t+1)}, for the current Markov chain, X_i, is updated with an MTM transition along the direction defined by the path between x_i^{(t)} and x_j^{(t)}.

Our Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm combines the advantages of evolutionary algorithms as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distributions. We encompass these two algorithms within the framework of a Multiple-Try Metropolis Markov Chain model that incorporates a generalized Metropolis-Hastings ratio. The general framework is as follows.
1. Initialize q parallel chains, X_1, X_2, ..., X_q, at separate randomly generated feasible solutions, x_1^{(0)}, x_2^{(0)}, ..., x_q^{(0)}.

2. For each iteration of the algorithm, t = 0, 1, ..., T, the current state of each chain, x_i^{(t)}, is updated to x_i^{(t+1)}. The state at iteration t + 1 is determined with either a mutation transition, with probability p_m, or a crossover transition, with probability p_c = 1 − p_m. The values of p_m and p_c may be the same or different across the q chains.

3. If the next step is determined with a mutation transition, generate a proposal transition via the ECMUT operator, and accept or reject that proposal with the MH ratio,

r_MUT = (M_x / M_y) exp{ −[H(y) − H(x)] }.
4. If the next step is a proposal from the PRCRX spatial path relinking crossover operator, accept the proposal with the generalized Metropolis ratio,

r_PRCRX = min{ 1, [ ξ(y, x) + Σ_{j=1}^{m−1} ξ(y_j, x) ] / [ ξ(x, y) + Σ_{j=1}^{m−1} ξ(x*_j, y) ] }.

By adapting the ECMUT mutation operator and the spatial PRCRX crossover operator into the MCMC framework as described in Sections 4.1–4.3, we arrive at a new MCMC algorithm, which we term an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm, for sampling constrained spatial partitions. EMCMC utilizes optimization heuristics to guide the movement of Markov chains. This enables large movements in the state space while supplying a way for these large movements to identify states that are likely to be associated with a reasonably large Metropolis-Hastings ratio. The Multiple-Try Method provides a generalized Metropolis-Hastings ratio that enables the integration of optimization methods with MCMC algorithms, making it possible to sample constrained spatial partitions from spatial state spaces.
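The framework above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the energy function H, the proposal transition T, and the proposal sampler `propose` are user-supplied stand-ins for the ECMUT/PRCRX operators described in the text; the weight ξ uses π(x) ∝ exp{−H(x)} and λ ≡ 1, and the acceptance step follows Eq. (15).

```python
import math
import random

def make_xi(H, T):
    """MTM weight with the simplest choice lambda ≡ 1:
    xi(a, b) = pi(a) * T(a, b), with pi(a) ∝ exp(-H(a))."""
    def xi(a, b):
        return math.exp(-H(a)) * T(a, b)
    return xi

def mtm_step(x, m, H, T, propose, rng):
    """One Multiple-Try Metropolis update (Liu, Liang and Wong, 2000).
    `propose(s)` draws one candidate from the transition T(s, .)."""
    xi = make_xi(H, T)
    ys = [propose(x) for _ in range(m)]                  # forward proposal set Y
    weights = [xi(y, x) for y in ys]                     # selection weights xi(y_j, x)
    y = rng.choices(ys, weights=weights, k=1)[0]         # pick y with prob ∝ xi(y_j, x)
    xstars = [propose(y) for _ in range(m - 1)]          # reverse set drawn from T(y, .)
    numer = sum(weights)                                 # xi(y, x) + sum over other y_j
    denom = xi(x, y) + sum(xi(xs, y) for xs in xstars)   # denominator of Eq. (15)
    accept = min(1.0, numer / denom)
    return y if rng.random() < accept else x

def emcmc_iteration(x, p_m, mutate_step, crossover_step, rng):
    """Step 2 of the framework: mutation w.p. p_m, crossover w.p. 1 - p_m."""
    return mutate_step(x) if rng.random() < p_m else crossover_step(x)
```

As a sanity check, running `mtm_step` on a toy integer chain with H(x) = x²/2 and a symmetric two-step proposal concentrates the samples near zero, as a target π ∝ exp{−H} requires; the spatial operators of the paper would replace the toy proposal.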
EMPIRICAL EXAMPLE

Figure 5: Spatial Partitioning Example. A representation with spatial geographic units is shown on the left. A graph representation is shown on the right.

We demonstrate the properties and effectiveness of EMCMC with an empirical application. In this application, there are 25 spatial units that we seek to partition into 3 spatially contiguous and separate zones. The plot on the left in Figure 5 shows these units and their spatial configuration as a collection of spatial geographic units. This representation is not particularly conducive to computational structures, and so we transform these units into a graph theory format. The corresponding graph theory representation is shown on the right in Figure 5. The edges in the graph indicate spatial adjacency. Each node is able to preserve the attributes of the geographic unit via node weights/attributes. The unit weights are shown by the number in each of the nodes.

In this example, a feasible solution is defined as an exhaustive and mutually exclusive partition of the 25 nodes into 3 disjoint contiguous subgraphs such that the total weight in each subgraph is within 10% of the total weight in each of the other subgraphs. The coloring in the map and the graph displays one feasible partition of the units into 3 zones/subgraphs that satisfies the contiguity and weight balancing requirements. The total weight of the red subgraph is 54,565. The total weight of the green subgraph is 61,711. The total weight of the blue subgraph is 58,767. If we measure the weight balance as max_i |w_i − µ| / µ, where µ = Σ_i w_i / k and k is the number of zones, then this partition has a weight balance score of 0.058, which is below our threshold value of 0.10. We seek to produce a uniform sample of the set of all such feasible partitions.

With no constraints (i.e., no contiguity requirement and no weight balancing requirement), there are S(25, 3) = 141,197,991,025 ways to partition these spatial units into 3 zones, where S(n, k) is a Stirling number of the second kind. When a contiguity requirement is imposed, the number of feasible partitions reduces to 117,688 possible solutions. When the weight balancing requirement is added, only 927 of the possible partitions remain feasible. Evidently, the constraints significantly reduce the feasible portion of the overall state space.

We employed our EMCMC algorithm with 16 parallel chains. In 5 seconds, each chain visited approximately 7,000 feasible states or partitions, where feasible is defined as satisfying the contiguity and the weight balancing requirements. Let X represent a feasible state. Since these states are spatial partitions, the state space does not have a convenient structure that is simple either to order or to represent in R. To alleviate this problem, let f : X → R be a function that takes a partition, X, and maps it to a metric in R. This function may be an arbitrary function. The partition metric we use in our empirical example is

f(X) = (1/2) Σ_{i=1}^{k} [ w_i / Σ_{i=1}^{k} w_i ] · |r_i − R| / [R(1 − R)],   (16)

where i = 1, ..., k is a zone index, w_i is the aggregated weight in zone i, r_i is the proportion of weight in zone i with a particular characteristic, r, and R is the proportion of that characteristic, r, across all zones. The resulting distribution of this partition metric, f(X), among the visited states is shown in Figure 6.

Figure 6: Spatial Partitioning Application Example. The true underlying distribution of the partition metric, f(X), is shown with a red histogram. Overlaid on that red histogram is a blue histogram that shows the metric for the states visited by our Markov chains. In purple is the overlap of these two distributions.

As we can see, the overlap of the states visited by the Markov chains and the true underlying distribution, while not exact, is fairly close. The histograms and their overlap provide evidence of the ability of our EMCMC algorithm to uniformly sample the underlying spatial state space.

THE CHALLENGES WITH SCALING TO LARGER APPLICATIONS
While EMCMC was able to successfully sample the feasible spatial partitions in this spatial state space, this particular spatial partitioning application, while not trivial, is still small in size and does not reach the level of complexity that accompanies larger problem sizes and/or additional constraints. Hence, while this spatial state space provides some proof of concept, additional hurdles must be overcome in order to ensure successful navigation for more complex applications. While our small example does not, itself, constitute a sufficiently challenging application, it has two important roles. First, it is small enough that we can enumerate all of the spatial partitions, which allows us to test the ability of our algorithm to produce a uniform sample from the true underlying distribution. Second, it is large enough to allow us to explore the structure and root of the obstacles that manifest in larger and more complex applications.
One issue is solution sparsity. In our small problem, the simple unconstrained combinatorial construction of sets has more than 10^{11} possible solutions. When the spatial contiguity constraint is imposed, the number of feasible partitions is reduced by many orders of magnitude, to 117,688. That is, fewer than 0.0001% of the set partitions are spatially contiguous. Though it is obvious that considerations of spatially defined locality and adjacency have a direct and explicit effect on solution feasibility, this example highlights that even in a small application, the effect is dramatic. Figure 7 helps us visualize the precipitous decline in the number of feasible solutions from the number of unconstrained combinatorial set partitions. In the figure on the left, if the plot region represents the full set of combinatorial set partitions, then the red square in the middle is 100 times larger than the set of contiguous partitions. Pointedly, the feasible state space is much smaller than the entire decision space.
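These sparsity figures are easy to verify for the unconstrained count. The short computation below assumes only the standard closed form for Stirling numbers of the second kind and the contiguous-partition count of 117,688 enumerated in the empirical example:

```python
from math import comb, factorial

def stirling2(n, k):
    # S(n, k) = (1/k!) * sum_{j=0}^{k} (-1)^j * C(k, j) * (k - j)^n
    return sum((-1) ** j * comb(k, j) * (k - j) ** n
               for j in range(k + 1)) // factorial(k)

total = stirling2(25, 3)      # unconstrained 3-way set partitions of 25 units
contiguous = 117_688          # contiguous partitions, enumerated in the text

print(total)                  # 141197991025
print(contiguous / total)     # ≈ 8.3e-07, i.e., fewer than 0.0001%
```

The ratio confirms the claim in the text: well under one in a million unconstrained set partitions of these 25 units is spatially contiguous.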
While the red square helps us visualize the size of the feasible space, note that the feasible solutions are not concentrated in the central part, or in any particular part, of the entire solution space. Instead, the feasible solutions are scattered throughout the decision space in an unstructured and unknown manner, as illustrated in the plot on the right in Figure 7. While the size of the dots is now too large (collectively, they should be 100 times smaller than the red square in the plot on the left), they are correctly shown as scattered throughout the space. For any reasonably sized spatial partitioning problem, then, an intelligent, spatially guided search is essential, since simple random movement in the decision space almost surely identifies an infeasible solution, which results in enormous wasted computational effort (Liu and Cho, 2020).

Importantly, note the critical implications that arise from the choice of the graph theoretic representation. Computationally, the graph theoretic framework is advantageous because it permits a simple way to incorporate spatial adjacency with convenient discrete structures and provides an organizational framework for state space traversal. It also facilitates a way to define a fully connected state space such that, beginning at any contiguous state, one can reach any other contiguous solution via a sequence of deletions/creations of single edges between nodes. Unsurprisingly, accounting for contiguity in this graph traversal is relatively simple, since the basis of the graph structure is spatial contiguity. However, perhaps also unsurprisingly, graph traversal accounting for other constraints is not so easily facilitated and requires dedicated effort and ingenuity.

To be sure, any graph theoretic framework compels a specific notion of solution adjacency and proximity. In particular, the number of edges that must be changed to convert one solution to another defines the distance between those two solutions in the state space.
In our graph representation, the distance from one contiguous solution to an adjacent contiguous solution is 1, and the set of all contiguous solutions forms a connected set. When feasibility includes another constraint, this additionally constrained set of feasible solutions, under this graph traversal mechanism, almost surely comprises a disconnected set.
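This disconnection can be observed directly on a toy instance. The sketch below is our own encoding, not the paper's code: it enumerates every two-zone partition of a weighted 4-cycle, keeps those that are contiguous and weight balanced, connects two feasible states when they differ by re-assigning a single unit, and counts connected components of the resulting feasible-state graph. With a 10% balance tolerance, every feasible state turns out to be isolated.

```python
from collections import deque
from itertools import product

# Toy instance: a 4-cycle of units with weights, partitioned into 2 zones.
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
W = {0: 1, 1: 3, 2: 1, 3: 3}

def is_contiguous(nodes):
    """True if `nodes` induces a connected subgraph of ADJ."""
    nodes = set(nodes)
    if not nodes:
        return False
    seen, frontier = set(), deque([min(nodes)])
    while frontier:
        u = frontier.popleft()
        if u in seen:
            continue
        seen.add(u)
        frontier.extend(v for v in ADJ[u] if v in nodes)
    return seen == nodes

def is_feasible(assign, tol=0.10):
    """Both zones nonempty, contiguous, and within tol of the zone mean weight."""
    zones = {z: [u for u in ADJ if assign[u] == z] for z in set(assign.values())}
    if len(zones) != 2 or not all(is_contiguous(ns) for ns in zones.values()):
        return False
    sums = [sum(W[u] for u in ns) for ns in zones.values()]
    mu = sum(sums) / len(sums)
    return max(abs(s - mu) for s in sums) / mu <= tol

# Enumerate all feasible states (assignments of the 4 units to zones 0/1).
states = [a for a in (dict(zip(ADJ, bits)) for bits in product((0, 1), repeat=4))
          if is_feasible(a)]

def neighbors(a, b):
    # Distance-1 move: the two states differ in the zone of exactly one unit.
    return sum(a[u] != b[u] for u in ADJ) == 1

# Count connected components of the feasible-state graph under such moves.
comps, unvisited = 0, list(range(len(states)))
while unvisited:
    comps += 1
    stack = [unvisited.pop()]
    while stack:
        i = stack.pop()
        for j in list(unvisited):
            if neighbors(states[i], states[j]):
                unvisited.remove(j)
                stack.append(j)
# comps == len(states): every feasible state is its own component.
```

Here each single-unit move from a feasible state passes through an imbalanced (infeasible) intermediate, so the feasible set is entirely disconnected under boundary moves, which is precisely the pathology described in the text.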
Figure 9: Spatially Isolated Partitions. With a Markov transition that involves only movement on zone boundaries, these two feasible partitions are isolated from all other feasible partitions.

A connected space can be visualized with the amoeba shape shown in Figure 8. The main feature of the "shape" of the contiguous solutions is that it comprises a connected subspace such that any contiguous partition can be reached from any other contiguous partition with transitions that traverse only other contiguous partitions. That is, movement is completely confined within the contiguous set, and movement into infeasible space is not necessary. However, the imposition of any other constraint, say, weight balancing, induces a "patchy" decision space that is not fully connected by similar traversal. We can visualize this as the figure on the right in Figure 8. How to traverse from one contiguous solution that satisfies the weight balancing criterion to another contiguous solution that also satisfies it is non-trivially more difficult, featuring not only greater sparsity but also disconnected solution regions. For our small example, Figure 9 shows two feasible partitions that are proximate and reachable from each other, but isolated from all other feasible partitions if traversal involves only unit swaps on subgraph borders.
As we have already noted, for the task of sampling solutions satisfying only the contiguity constraint, a Markov transition like ECMUT-1, as well as variants that have been proposed by others (Bangia et al., 2017), is sufficient. While the Markov chain produced by this transition proposal is irreducible on the fully connected state space, it is not irreducible when additional constraints restrict the feasible solution space and induce a disconnected state space. A reducible Markov chain will not produce a representative sample of the underlying space.

This issue can be resolved by 1) allowing movement into infeasible space in the search for other feasible regions, 2) formulating a set of parallel or serial chains that uniformly sample among and in the disconnected regions, or 3) specifying a different transition proposal that does produce an irreducible Markov chain. The first option is theoretically viable, but plainly produces massive computational waste. In our small example, the wasted effort is not strictly limiting given the small problem size. However, for a larger and more realistic application, this approach is not computationally viable.
Certainly, the advantages of employing these separate options can be made more effective by combining the various strategies. Movement need not completely avoid infeasible solutions; parallelism can be implemented from a variety of different serial scenarios; and larger Markov transitions can be integrated along with parallel chains that sometimes explore infeasible space. Indeed, a confluence of these strategies is likely to be more effective than any one on its own. This is the approach that we employ in our EMCMC algorithm, where all of these strategies have been incorporated.

Finally, computational scalability transcends the choice of search strategy. While convergence rates are certainly affected by the efficiency of the transition proposal, for large applications, one must also pursue computational strategies to further improve efficiency. This brings us to our last challenge, where even with a significantly more efficient and irreducible Markov chain, the issue of scalability remains.
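The population-of-chains organization lends itself to a simple parallel skeleton. The sketch below is a skeletal illustration only, with our own function names and a placeholder random walk in place of the ECMUT/PRCRX transitions; a production version would use processes or MPI ranks on a supercomputer rather than a thread pool, but the structure, independent advancement punctuated by synchronization points where crossover can draw target states from the population, is the same.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def advance(state, steps, seed):
    """Hypothetical stand-in for running `steps` EMCMC updates on one chain."""
    rng = random.Random(seed)
    for _ in range(steps):
        state += rng.choice([-1, 1])   # placeholder for ECMUT/PRCRX transitions
    return state

def run_population(states, epochs, steps_per_epoch):
    """Advance q chains concurrently; the synchronization point after each
    epoch is where crossover proposals could exchange target states."""
    for epoch in range(epochs):
        with ThreadPoolExecutor(max_workers=len(states)) as pool:
            states = list(pool.map(
                lambda args: advance(*args),
                [(s, steps_per_epoch, hash((epoch, i)))
                 for i, s in enumerate(states)]))
        # synchronization point: population-level crossover would go here
    return states
```

Because chains only interact at the epoch boundaries, adding chains scales the work per epoch linearly, which mirrors the scaling behavior discussed next.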
The EMCMC algorithm is parallel by nature, with a structure that is straightforward to scale, so scalability is already an integral part of our algorithm. In Figure 10, we see how our EMCMC algorithm scales when we enlist more processor cores. As we double the number of processor cores, EMCMC reaches an increasingly larger portion of the underlying state space. The figure on the right shows the number of unique solutions identified in 10 minutes.

To be sure, there may be other issues, such as a multimodal state space replete with copious low and high energy states. Here, there is a significant literature with many and varied proposals. None of these can be regarded as a general fix. Rather, they must be tuned to the particular application. The strategies generally fall into one of two camps. The first is the set of methods that incorporate temperature as an auxiliary variable to facilitate sampling at a broad array of temperatures, both low and high. Included in this set are, for example, parallel tempering (Geyer, 1991), the equi-energy sampler (Kou, Zhou and Wong, 2006), and the Swendsen-Wang algorithm (Swendsen and Wang, 1986). The second is the set that incorporates information from past samples, which includes, for example, multicanonical sampling (Berg and Neuhaus, 1991), the Wang-Landau algorithm (Wang and Landau, 2001), and stochastic approximation Monte Carlo (Liang, Liu and Carroll, 2007). The choice depends on which strategy is most tightly coupled with the peculiarities of the specific state space. This literature addresses issues of efficiency in the Markov chain. That is, if one has an irreducible Markov chain that converges slowly, these strategies help to increase its efficiency and hasten convergence by employing a mechanism that facilitates movement out of low energy states. Fifield et al. (2019) found that a parallel tempering approach was able to sample spatial partitions, but they failed to realize that the sample was enabled by the different chains that were initialized in different disconnected regions. That is, what makes this approach tenable is not parallel tempering per se, but the ability to initiate different Markov chains (whether in parallel or not) at different and random starting states.
DISCUSSION
We have devised a massively parallel Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling from astronomically large and complex spatial state spaces. The applicability and performance of our algorithm have been demonstrated for a spatial partitioning problem. Performance depends on factors such as the proposal set size, m, and the complexity of the landscape of the underlying distribution, π. As well, while our EA operators were effective, there are other and perhaps more efficient methods for landscape traversal in these spatial state spaces. Improvement in the optimization components would translate into obvious gains in performance for the sampler. There is an unknown and untested relationship between the population size, or the number of parallel chains, and the rate of convergence of the sampler. A larger number of chains increases the diversification of the search, which is desirable, while fewer but longer chains have an impact on the convergence rate. There are a number of trade-offs, some of which are application specific, while others have roots in the theoretical properties of the underlying model.

In addition, computational advances will improve the performance of the sampler. This includes innovations that increase the efficiency of the computational algorithms as well as novel ways to adapt the serial implementation of the algorithm to run on the massively parallel architecture that is available on supercomputers. This type of architecture has an obvious relationship with the population size, or the number of Markov chains, that can be run in parallel. It has a more subtle relationship with how the search can be made more efficient, perhaps by intelligently increasing the diversity and the intensity of the optimization heuristic in response to variations in the state space landscape (Cho and Liu, 2019). The large number of processors could then span the solution space to increase diversity in the search while working collectively or intensively in regions where the search is particularly difficult.

ACKNOWLEDGEMENTS

This research is part of the Blue Waters sustained petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the State of Illinois.
Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
REFERENCES
Bangia, Sachet, Christy Vaughn Graves, Gregory Herschlag, Han Sung Kang, Justin Luo, Jonathan C. Mattingly and Robert Ravier. 2017. "Redistricting: Drawing the Line." arXiv:1704.03360 [stat.AP].

Bélisle, Claude J.P., H. Edwin Romeijn and Robert L. Smith. 1993. "Hit-and-Run Algorithms for Generating Multivariate Distributions."
Mathematics of Operations Research
Physics Letters B
Statistics and Computing
SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis.

Cho, Wendy K. Tam and Yan Y. Liu. 2019. Parallel Hybrid Metaheuristics with Distributed Intensification and Diversification for Large-scale Optimization in Big Data Statistical Analysis. In
Proceedings of the 2019 IEEE International Conference on Big Data. Los Angeles, CA.

Duque, Juan C., Richard L. Church and Richard S. Middleton. 2011. "The p-Regions Problem."
Geographical Analysis
Facility Layout and Location: An Analytical Approach. Englewood Cliffs, NJ: Prentice Hall.

Gelman, Andrew and Donald B. Rubin. 1992. "Inference from Iterative Simulation Using Multiple Sequences."
Statistical Science
Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, ed. E.M. Keramidas. Fairfax Station: Interface Foundation, pp. 156–163.

Gilks, W.R., G.O. Roberts and E.I. George. 1994. "Adaptive Direction Sampling."
The Statistician
Statistics and Computing
Control and Cybernetics
Biometrika
Spatial Optimization for Managed Ecosystems. New York: Columbia University Press.

Kou, S.C., Qing Zhou and Wing Hung Wong. 2006. "Equi-energy Sampler with Applications in Statistical Inference and Statistical Mechanics."
The Annals of Statistics
Machine Learning
Journal of the American Statistical Association. C_p Model Sampling and Change-point Problems."
Statistica Sinica
Journal of the American Statistical Society
Journal of the American Statistical Association
Applied Soft Computing Journal.

Liu, Yan Y., Wendy K. Tam Cho and Shaowen Wang. 2016. "PEAR: A Massively Parallel Evolutionary Computation Approach for Political Redistricting Optimization and Analysis."
Swarm and Evolutionary Computation
The Journal of Chemical Physics
(T, V, µ) Monte Carlo Method for the Computer Simulation of Fluids."
Molecular Physics
Journal of Transportation Geography
Journal of the American Statistical Association
Journal of Multivariate Analysis
Environment and Planning B: Planning and Design
Physical Review Letters