Non-reversible Markov chain Monte Carlo for sampling of districting maps
Gregory Herschlag, Jonathan C. Mattingly, Matthias Sachs, Evan Wyse
Department of Mathematics, Duke University, Durham, NC 27708
Department of Statistical Science, Duke University, Durham, NC 27708
The Statistical and Applied Mathematical Sciences Institute (SAMSI), Durham, NC 27709
Abstract.
Evaluating the degree of partisan districting (gerrymandering) in a statistical framework typically requires an ensemble of districting plans drawn from a prescribed probability distribution that adheres to realistic and non-partisan criteria. In this article we introduce novel non-reversible Markov chain Monte Carlo (MCMC) methods for the sampling of such districting plans which have improved mixing properties in comparison to previously used (reversible) MCMC algorithms. In doing so we extend the current framework for construction of non-reversible Markov chains on discrete sampling spaces by considering a generalization of skew detailed balance. We provide a detailed description of the proposed algorithms and evaluate their performance in numerical experiments.

1. Introduction
The use of computer generated alternative redistricting plans to benchmark particular redistricting maps has gained legal and scientific traction in recent years. The generation of such an ensemble of maps has been used to identify and quantify the extent of partisan and racial gerrymandering by answering the question "What would one expect to have happened if no partisan or racial information had been used?" These methods produce a baseline informed by the geo-political geography of the state which does not assume proportional representation or rely on unrealistic symmetry assumptions. This baseline can then be used to evaluate a particular redistricting plan of interest.

In [18, 11, 16, 20], an ensemble of maps is generated by sampling from a probability distribution constructed on the space of possible redistricting plans using only non-partisan considerations. In this thread of work, the sampling was performed via Markov chain Monte Carlo (MCMC) using a standard Metropolis-Hastings algorithm based on a single node flip proposal chain. Other ensemble methods have used generative techniques based on optimization, genetic algorithms, or Markov chains without a clearly describable stationary measure. Examples of the latter include the generation of samples using simulated annealing [2, 13, 12] and Markov chains based on merge-split operations [6] (see also [5] for an extension of the latter work which allows one to generate samples from a prescribed target measure).

Many of the above samplers utilize the Metropolis-Hastings algorithm, so the underlying generating Markov chain is reversible. The reversible methods, by definition, have a Markov kernel satisfying a detailed balance condition with respect to the corresponding stationary measure. Heuristically, this implies that the Markov chain has a diffusive nature.
Other samplers which are non-reversible typically sample from an unknown distribution.

In recent years, MCMC methods based on non-reversible Markov chains (i.e., Markov chains whose Markov kernel fails to satisfy a detailed balance condition) have attracted increased attention because of their favorable convergence and mixing properties; without claim to completeness of the work listed, we refer the reader to [22, 25, 28, 26, 14] and to [3, 19, 4, 10, 9, 15, 8] for examples of non-reversible MCMC methods for sampling on continuous spaces and discrete spaces, respectively. For many of these methods, improved mixing over their reversible counterparts is folklore among practitioners; however, there is a growing body of theoretical work that supports these claims [7, 10].

In the setup of a continuous sampling space, non-reversible Markov chains naturally arise through time discretization of stochastically perturbed versions or modifications of Newton's equations of motion (see e.g., [1, 23]). In these cases, reversibility of the dynamics is broken due to the presence of inertia modeled by the momenta associated with each degree of freedom. This is consistent with the physical intuition that the resulting ballistic-like motion tends to exhibit better mixing properties than the purely diffusive dynamics of a reversible Markov chain. For example, the existence of momentum is typically cited as the strength of Langevin sampling over simple Brownian dynamics.

For sampling in discrete spaces, a common approach for designing non-reversible MCMC methods is what is sometimes referred to as "lifting" [29, 19]. Here, a reversible MCMC method is modified by replicating the state space through the introduction of a dichotomous auxiliary variable taking values in {−1, +1}, along with a simple directed subgraph of the Markov state graph induced by the original reversible Markov chain.
Depending on the value of the auxiliary variable, transition probabilities along the assigned directions of the simple subgraph are then either increased or decreased. As such, the auxiliary variable has a similar effect as the momentum variable in the continuous setup. For example, in the case where the Markov state graph induced by the original reversible MCMC method is a circular graph, a simple way of implementing a lifting approach is by increasing clockwise transition probabilities for a positive value of the auxiliary variable and increasing counter-clockwise transition probabilities for a negative value of the auxiliary variable [7].

The sampling efficiency of the Markov chain obtained by lifting depends strongly on the choice of the simple directed subgraph. While in the above mentioned example a "good" choice can easily be identified, constructing a suitable subgraph for Markov chains whose associated Markov state graph has a more complex topology can be difficult.

In this article, we introduce non-reversible MCMC methods for the sampling of redistricting maps. Creating a collection of redistricting maps, via sampling of a specified measure, is an important step in many methods currently used to evaluate redistricting and to detect and explain gerrymandering. In this note, we introduce a heuristic for implementing an efficient lifting approach which is based on a notion of flowing the center of mass of districts along a defined vector field; the center of mass arises from an embedding of the districting graph in R². We also introduce a novel construction for non-reversible MCMC dynamics as a generalization of the standard lifting approach which allows the incorporation of multiple momenta variables. This allows us to construct non-reversible MCMC schemes for our application which make use of the structure of the induced district-level graph.
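The lifted walk on a circular graph described above can be sketched in a few lines. The example below is our own illustration, not code from the paper: the target is simply the uniform distribution on a cycle of n states, so every directed move is accepted, and the auxiliary variable theta is flipped with a small probability to keep the chain ergodic:

```python
import random

def lifted_cycle_step(x, theta, n, flip_prob=0.1):
    """One step of a lifted walk on a cycle of n states targeting the
    uniform distribution: move one step in the direction theta, except
    that with probability flip_prob the direction is reversed instead."""
    if random.random() < flip_prob:
        return x, -theta
    return (x + theta) % n, theta

def run(n=10, steps=100_000, seed=0):
    """Empirical occupation frequencies of the position component."""
    random.seed(seed)
    counts = [0] * n
    x, theta = 0, 1
    for _ in range(steps):
        x, theta = lifted_cycle_step(x, theta, n)
        counts[x] += 1
    return [c / steps for c in counts]
```

Because the walk travels ballistically in the direction of theta rather than diffusing, it traverses the cycle in a number of steps that grows linearly, rather than quadratically, in the cycle length.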
Finally, we combine these methods with a tempering scheme which minimizes rejection rates in the non-reversible Markov chain and thereby increases sampling efficiency.

The remainder of this article is organized as follows. In Section 3.1, we review the formal definition of non-reversibility of Markov chains on discrete sampling spaces. In Section 3.2, we review the basic construction of non-reversible MCMC methods via a skew detailed balance condition. In Section 4, we describe a novel construction of non-reversible MCMC schemes which allows for multiple momenta corresponding to different proposal chains. In Section 5, we describe the implementation of our approach for the application under consideration; in Section 7, we test our ideas numerically.

2. Exposition of the main algorithms
Before we rigorously develop the underlying mathematical framework, we start by informally describing the two main sampling algorithms proposed in this article and demonstrating how they would be applied to sampling redistricting plans for the North Carolina Congressional Delegation. We construct our non-reversible sampling methods as modifications of a variant of the single node flip algorithm (see Section 5.3), where random redistricting maps are sequentially generated by changing in each iteration the color (district allocation) of a single precinct located at the border of the current redistricting map. We introduce non-reversibility by directing transitions along what we informally refer to as a flow. Depending on the value of a momentum variable, only transitions in the positive or negative direction are permitted, resulting in macroscopic kinetic-like movements along/against the flow.

The intuition behind the first proposed method ("Center-of-mass flow", see Section 6.1 and Fig. 1) is that fast mixing on a macroscopic level is obtained if districts tend to collectively follow the flow of a suitable, well-stirring vector field in R², in which the district graph is embedded. For example, under an appropriate choice of the vector field, the resulting collective rotational movements of districts in the course of a simulation produce more efficient mixing than more diffusive sampling algorithms (see Fig. 1). Technically, we implement this idea by aligning the movement of the centers of mass of districts with the vector field. For a positive/negative momentum value, only transitions for which the midpoints of the centers of mass of the modified districts move in the positive/negative direction of the vector field are permitted. The second proposed method ("(Pair-wise) District-to-district flow", see Section 6.2 and Fig. 2) utilizes an extended framework which allows the incorporation of multiple momenta, each associated with a different flow.

Figure 1. Center-of-mass flow introduced in Section 6.1. Changes in district boundaries must, on average, move the centers of mass of the districts either with or against the drawn vector field. Drawing the 13-district N.C. Congressional map is used as an example.

Figure 2. District-to-district flow introduced in Section 6.2. Depending on the individual velocity values associated with each vector, the corresponding boundaries between adjacent districts may only move either in the direction of, or in the direction opposite to, the displayed vector.

The idea of the second method is to associate a momentum variable with each district pair. Depending on the value of the respective momentum, only transitions that flow districts in a direction aligned with the orientation of the respective momentum arrow are permitted. For example, consider the redistricting plan depicted in Fig. 2. If the value of the momentum variable associated with the orange and light blue districts is positive, then among the transitions which modify both of these two districts, only transitions that add a precinct from the light blue district to the orange district are permitted.
3. Reversible and Non-Reversible Markov Chains
3.1. Detailed Balance.
Consider a Markov chain on a countable state space X with a transition kernel P. The Markov kernel P is said to be reversible if there exists a probability distribution π on X so that the pair (P, π) satisfies detailed balance. That is,

    π(x) P(x, x′) = π(x′) P(x′, x),  ∀ x, x′ ∈ X.  (1)

(We restrict to a countable state space for simplicity. There are no inherent obstructions to generalizing to a general Polish space; see Remark 4.5.)
Markov transition kernels P which fail to satisfy the detailed balance condition for any measure π are referred to as non-reversible. Since π(x)P(x, x′) is the probability flux in equilibrium flowing from state x to x′, detailed balance can be restated as: the equilibrium flux from x to x′ is the same as that from x′ to x.

The detailed balance condition is a sufficient, but not a necessary, condition for the transition kernel P to preserve the measure π. By definition, invariance of the measure π only requires that πP = π, which is just a compact notation for

    Σ_{x ∈ X} π(x) P(x, x′) = π(x′),  ∀ x′ ∈ X.

It can be rewritten as

    Σ_{x′ ∈ X\{x}} π(x′) P(x′, x) = Σ_{x′ ∈ X\{x}} π(x) P(x, x′),  ∀ x ∈ X,  (2)

and, as such, states that for any state x ∈ X the total probability flux into the state x (the left-hand side of (2)) is equal to the total probability flux out of the state x (the right-hand side of (2)). This condition is commonly referred to as a global balance condition and is satisfied by any Markov kernel which preserves π.

3.2. Skew detailed balance.
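The distinction between detailed balance (1) and global balance (2) is easy to probe numerically. The following sketch (an illustration of ours, not from the paper) checks both conditions for a kernel given as a row-stochastic matrix; the example kernel circulates probability around a triangle, so it preserves the uniform measure while violating detailed balance:

```python
def satisfies_detailed_balance(P, pi, tol=1e-12):
    """Check pi(x) P(x, x') == pi(x') P(x', x) for all pairs, i.e. (1)."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))

def preserves_measure(P, pi, tol=1e-9):
    """Check the global balance condition pi P == pi, i.e. (2)."""
    n = len(P)
    return all(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) <= tol
               for j in range(n))

# A "rotational" kernel on three states: doubly stochastic, hence it
# preserves the uniform measure, but the clockwise and counter-clockwise
# fluxes differ, so detailed balance fails.
P = [[0.0, 0.7, 0.3],
     [0.3, 0.0, 0.7],
     [0.7, 0.3, 0.0]]
pi = [1 / 3, 1 / 3, 1 / 3]
```

Here `P` is non-reversible in exactly the sense of the text: global balance holds while condition (1) fails for every off-diagonal pair.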
A common way of constructing non-reversible Markov chains with a prescribed invariant measure π is by enforcing global balance through an involutive transform. This structure is called skew detailed balance and ensures that detailed balance holds up to some π-invariant involutive transformation. More precisely, let S: X → X be a π-invariant involutive transformation, so that S = S⁻¹ and π(S(A)) = π(S⁻¹(A)) = π(A) for all A ⊂ X. Then the Markov kernel P satisfies skew detailed balance if

    π(x) P(x, x′) = π(x′) P(S(x′), S(x)),  ∀ x, x′ ∈ X.  (3)

It is easy to verify that skew detailed balance implies global balance (see e.g. [24], or the proof of Theorem 4.1 in Appendix B), and thus invariance of π under P.

Due to its local nature, skew detailed balance with respect to π can be easily enforced by an accept-reject step. More precisely, let Q denote a "proposal" Markov kernel on X, and denote by (x_k)_{k ∈ N} the Markov chain generated by the following generalization of the Metropolis-Hastings algorithm:

(1) x′ ∼ Q(x_k, ·),
(2) with probability r(x_k, x′) set x_{k+1} = x′; otherwise x_{k+1} = S(x_k),

where

    r(x, x′) := min( 1, [π(x′) Q(S(x′), S(x))] / [π(x) Q(x, x′)] ).

(The detailed balance condition is equivalent to the Kolmogorov definition of reversibility, which requires that the probability of following any sequence of states is the same as that of following the sequence in reverse order; this justifies the name reversible. Here, and in the remainder of this article, we denote by N the set of non-negative integers.)
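As a minimal sketch (ours, with hypothetical helper names), the two-step scheme above translates directly into code; `Q_sample` draws from Q(x, ·), `Q_prob` evaluates Q(x, x′), and `S` is the involution:

```python
import random

def skew_mh_step(x, pi, Q_sample, Q_prob, S):
    """One step of the skew Metropolis-Hastings scheme of Section 3.2.

    pi       : dict mapping states to (unnormalized) target weights
    Q_sample : function x -> proposed state x'
    Q_prob   : function (x, x') -> proposal probability Q(x, x')
    S        : pi-invariant involution on the state space
    """
    xp = Q_sample(x)
    num = pi[xp] * Q_prob(S(xp), S(x))
    den = pi[x] * Q_prob(x, xp)
    r = min(1.0, num / den) if den > 0 else 1.0
    if random.random() < r:
        return xp          # accept the proposal
    return S(x)            # reject: apply the involution instead
```

On a lifted cycle with a uniform target, deterministic directed proposals, and S flipping the direction of travel, the ratio r is identically one, so the chain moves ballistically around the cycle.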
Provided that the acceptance probability r(x, x′) is well defined for all pairs (x, x′), the transition kernel of the generated Markov chain takes the form

    P(x, x′) = r(x, x′) Q(x, x′) + (1 − r(x, x′)) 𝟙_{{S(x)}}(x′),

which can indeed be verified to satisfy the skew detailed balance condition (3) (see e.g., [24] or the proof of Theorem 4.2 in Appendix B).

4. A General Non-Reversible Process Construction
In this section we first introduce a generalization of the standard skewed balance condition, termed the mixed skewed balance condition, and show that this condition is sufficient for the corresponding Markov kernel to preserve a prescribed probability measure. We then provide a generalization of the Metropolis-Hastings algorithm, the Mixed Skew Metropolis-Hastings algorithm (MSMH), which utilizes the mixed skewed balance condition.

4.1. The mixed skewed balance condition.
In the following, let {S_i}_{i=1}^n be a collection of π-invariant involutions, {P_i}_{i=1}^n a collection of Markov kernels on X, and

    ω: X → Δ^{n−1},  ω(x) = (ω_1(x), ..., ω_n(x)),

a weight vector taking values in the nth standard simplex Δ^{n−1} := { y ∈ R^n : y_i ≥ 0, Σ_{i=1}^n y_i = 1 }. Since at each point x the weights are non-negative and sum to one, we can build a new kernel P out of the collection of Markov kernels {P_i}_{i=1}^n by setting P = ω · P, which is written more explicitly as

    P(x, ·) = Σ_{i=1}^n ω_i(x) P_i(x, ·),  ∀ x ∈ X.

From this we see clearly that P is an x-dependent mixture of the kernels {P_i}. With this comes the interpretation that a draw from P can be realized by first picking an index i according to the weights and then drawing the next state according to P_i.

We say that the Markov kernel P satisfies mixed skewed balance with respect to π if, for all x, x′ ∈ X and for all i ∈ {1, ..., n},

    ω_i(x) π(x) P_i(x, x′) = ω_i(x′) π(x′) P_i(S_i(x′), S_i(x)).  (4)

As discussed further in Remark 4.3, equation (4) means that the i-th kernel P_i satisfies the skew detailed balance condition for an invariant measure proportional to π(x) ω_i(x). Yet, as the following results show, by mixing these kernels according to the weights ω_i(x), one obtains a Markov kernel which has π as its invariant measure. Typically one has that π(x) = π(S_i(x)) and ω_i(x) = ω_i(S_i(x)); hence, (4) can again be understood as a probability flux balancing condition: the flux from x to x′ is equal to the flux from S_i(x′) to S_i(x), given that we chose the i-th kernel P_i according to the weight ω_i.

Theorem 4.1.
If the Markov kernel P defined by the collection {(ω_i, P_i, S_i)}_{i=1}^n satisfies mixed skewed balance with respect to π, then πP = π, i.e., the Markov kernel P preserves the probability measure π.

For a proof of this theorem, see Appendix B.
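Drawing from the mixture kernel P = Σ_i ω_i(x) P_i(x, ·) is exactly the two-stage procedure described above: pick a component, then move. A minimal sketch of ours (`omega` and the component kernels are placeholder callables, not objects from the paper):

```python
import random

def mixture_step(x, omega, kernels):
    """One draw from P(x, .) = sum_i omega_i(x) P_i(x, .):
    first pick a component index i with probability omega_i(x),
    then sample the next state from the chosen kernel P_i."""
    weights = omega(x)
    i = random.choices(range(len(kernels)), weights=weights)[0]
    return kernels[i](x)
```

Note that the weights are re-evaluated at the current state x on every step; this x-dependence of the mixture is what distinguishes the construction from a plain mixture of π-invariant kernels.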
4.2. The Mixed Skew Metropolis-Hastings algorithm.
Consider a collection of n Markov "proposal kernels" Q_i, i = 1, ..., n, defined on subsets X_i ⊂ X which form a cover of the whole domain, i.e., ∪_{i=1}^n X_i = X. Moreover, let S_i: X_i → X_i, i = 1, ..., n, be a collection of π-invariant involutions.

In what follows we describe how the collections of proposal kernels and involutions, together with a suitable state dependent weight vector ω: X → Δ^{n−1}, can be used to generate a Markov chain which preserves the target measure π. The mixed skewed balance condition provides the appropriate framework for "patching" these proposal kernels together to obtain a Markov chain which samples from the target measure π.

Algorithmically, this can be implemented in a two-step algorithm (see Algorithm 1). In the first step of this algorithm, a proposal x′ is generated from the current state x of the Markov chain as

    x′ ∼ Q_i(x, ·),  where i ∼ ω(x).

The mixed skew detailed balance condition is then enforced through an accept-reject step, where the proposal is accepted with probability

    min( 1, [π(x′) ω_i(S_i(x′)) Q_i(S_i(x′), S_i(x))] / [π(x) ω_i(x) Q_i(x, x′)] ),

in which case the subsequent state of the Markov chain is set to be x′; otherwise the proposal is rejected, in which case the next state of the Markov chain is set to be the i-th involutive transformation of the current state, that is, S_i(x).

In order for these two steps to be well-defined, and for the resulting transition kernel to indeed preserve the target measure π, we require the weight vector ω to satisfy

(C1) ω_i(x) > 0 ⟺ x ∈ X_i, for i = 1, ..., n;
(C2) ω_i(x) > 0 and Q_i(x, x′) > 0 imply ω_i(x′) > 0 and Q_i(S_i(x′), S_i(x)) > 0;
(C3) ω_i(S_i(·)) = ω_i(·) for i = 1, ..., n, i.e., the i-th entry of the weight vector is invariant under the i-th involutive transformation.

Condition (C1) ensures that the effective proposal kernel Q, Q(x, ·) = Σ_{i=1}^n ω_i(x) Q_i(x, ·) for all x ∈ X, is well defined, and (C2) ensures that the Metropolis ratio r_i(x, x′) is well defined. Invariance of the i-th weight under the i-th involution, as stated in (C3), ensures that the mixed skew detailed balance condition holds for the generated Markov chain. In summary, we have

Theorem 4.2.
Let X_i ⊂ X, i = 1, ..., n, be a cover of X, and let Q_i, i = 1, ..., n, be Markov kernels defined on X_i, respectively. Moreover, let S_i, i = 1, ..., n, be a collection of π-invariant involutions on X, and let ω: X → Δ^{n−1} be an x-dependent weight vector satisfying the conditions (C1) to (C3). Then the MSMH Markov chain generated by Algorithm 1 possesses π as an invariant measure.

For a proof of this theorem, see Appendix B.
Algorithm 1: Mixed Skew Metropolis-Hastings (MSMH)
  input: x
  sample partition i ∼ ω(x);
  generate proposal x′ ∼ Q_i(x, ·);
  r_i(x, x′) ← [π(x′) ω_i(S_i(x′)) Q_i(S_i(x′), S_i(x))] / [π(x) ω_i(x) Q_i(x, x′)];
  sample u ∼ U([0, 1]);
  if u < r_i(x, x′) then x ← x′ else x ← S_i(x);
  return x

Algorithm 2: Mixed Skew Metropolis-Hastings on extended state space
  input: ξ, θ⃗
  sample partition i ∼ ω̃(ξ);
  sample proposal (ξ′, θ⃗′) ∼ Q_i((ξ, θ⃗), (·, ·));
  r_i((ξ, θ⃗), (ξ′, θ⃗′)) ← [ω̃_i(ξ′) π̃(ξ′) Q_i((ξ′, R_i(θ⃗′)), (ξ, R_i(θ⃗)))] / [ω̃_i(ξ) π̃(ξ) Q_i((ξ, θ⃗), (ξ′, θ⃗′))];
  sample u ∼ U([0, 1]);
  if u < r_i((ξ, θ⃗), (ξ′, θ⃗′)) then (ξ, θ⃗) ← (ξ′, θ⃗′) else θ_i ← −θ_i;
  return ξ, θ⃗

The Mixed Skew Metropolis-Hastings algorithm in generic form (Algorithm 1) and as obtained via augmenting the sampling space (Algorithm 2).
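A direct transcription of Algorithm 1 into code might look as follows. This is a sketch of ours; the representation of π, ω, the Q_i and the S_i as Python callables is an assumption, and the un-truncated ratio r_i is compared against a uniform draw exactly as in the listing:

```python
import random

def msmh_step(x, pi, omega, Q_sample, Q_prob, S):
    """One iteration of Mixed Skew Metropolis-Hastings (Algorithm 1).

    pi       : dict of (unnormalized) target weights pi(x)
    omega    : function x -> list (omega_1(x), ..., omega_n(x))
    Q_sample : list of samplers, Q_sample[i](x) draws from Q_i(x, .)
    Q_prob   : list of evaluators, Q_prob[i](x, xp) = Q_i(x, xp)
    S        : list of involutions S[i]
    """
    w = omega(x)
    i = random.choices(range(len(w)), weights=w)[0]   # sample partition i ~ omega(x)
    xp = Q_sample[i](x)                               # proposal x' ~ Q_i(x, .)
    num = pi[xp] * omega(S[i](xp))[i] * Q_prob[i](S[i](xp), S[i](x))
    den = pi[x] * w[i] * Q_prob[i](x, xp)
    r = num / den                                     # Metropolis ratio r_i(x, x')
    if random.random() < r:
        return xp                                     # accept the proposal
    return S[i](x)                                    # reject: apply i-th involution
```

With a single flow (n = 1) this reduces to the skew Metropolis-Hastings step of Section 3.2.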
Remark 4.3. The transition kernel P of the Markov chain generated by Algorithm 1 takes the explicit form P(x, ·) = Σ_{i=1}^n ω_i(x) P_i(x, ·) with

    P_i(x, x′) = min(1, r_i(x, x′)) Q_i(x, x′) + (1 − min(1, r_i(x, x′))) 𝟙_{{S_i(x)}}(x′),  i = 1, ..., n.

If the entries of the weight vector ω are constant in x, then the weight entries in the expression of the respective Metropolis-Hastings ratios r_i(x, x′), i = 1, ..., n, cancel, so that each P_i is π-invariant, and P is simply a mixture of π-invariant Markov kernels. In contrast, the weights ω will not be constant in our examples; hence, the Markov kernels P_i will not generally preserve the target measure π. Instead, these kernels can be shown to preserve the probability measures π_i(x) ∝ ω_i(x) π(x), i = 1, ..., n, respectively.

4.3. Implementation on the state space graph of a Markov process.
While Algorithm 1 is very general, we have not yet specified how the involutions {S_i}_{i=1}^n and the proposal kernels {Q_i}_{i=1}^n may be chosen, or provided any intuition for why the algorithm might be an improvement over the classical Metropolis-Hastings algorithm.

In what follows we provide a general construction which takes a proposal Markov transition kernel Q̃ and target measure π̃ on a discrete state space Ξ and builds a collection of proposals {Q_i}_{i=1}^n and involutions {S_i}_{i=1}^n on an extended state space X so that Algorithm 1 can be used. This construction will make more precise the idea that the skew Metropolis-Hastings algorithm adds "momentum" to the standard Metropolis-Hastings algorithm. Our main conditions on the proposal kernel Q̃ and the measure π̃ are summarized in the following assumption.

Assumption 1.
Let the proposal kernel Q̃ and the target measure π̃ be such that
(i) π̃(ξ) > 0 for all ξ ∈ Ξ;
(ii) Q̃(ξ, ξ′) ≠ 0 ⟺ Q̃(ξ′, ξ) ≠ 0;
(iii) the Markov chain generated by Q̃ is irreducible.

The first condition is mild, as states ξ with π̃(ξ) = 0 can simply be removed from Ξ. The symmetry condition (ii) ensures that Q̃ is equivalent (in the sense that the corresponding transition probabilities have identical support) to some reversible kernel with invariant measure π̃. Condition (iii) is necessary to ensure that the constructed non-reversible Markov chain is uniquely ergodic (see Section 4.4).

The proposal kernel induces a graph structure on Ξ. States ξ, ξ′ ∈ Ξ are said to be adjacent if Q̃(ξ, ξ′) > 0. We refer to the corresponding adjacency graph G = (V, E), with vertices V = Ξ and edges E = { (ξ, ξ′) ∈ Ξ × Ξ | Q̃(ξ, ξ′) > 0 }, as the state graph of the Markov chain; see Fig. 3a for an illustration. In view of this graph structure, condition (ii) ensures that the graph G is symmetric (or undirected) in the sense that if (u, v) ∈ E then so is (v, u) ∈ E, and condition (iii) ensures that G is connected.

The general idea of our construction is to build a non-reversible Markov chain by introducing non-reversible flows, typically shaped like "vortices," on the state graph, each of which is associated with an involutive transformation. Concretely, we begin by specifying these flows by a collection of oriented subgraphs G_i^+ = (V_i, E_i^+), i = 1, ..., n. We require that each G_i^+ has no isolated vertices and that the symmetric completions G_i = (V_i, E_i), where E_i = E_i^+ ∪ E_i^− with E_i^− = { (u, v) | (v, u) ∈ E_i^+ }, form a cover of G in the sense that V = ∪_{i=1}^n V_i and E = ∪_{i=1}^n E_i; see Fig. 3b.

For each oriented subgraph G_i^+, we introduce an auxiliary variable θ_i which takes values in {−1, +1}, and we denote the vector of all such auxiliary variables by θ⃗ = (θ_1, ..., θ_n) ∈ {−1, +1}^n. In accordance with our notation of Section 4, we denote the extended state space by X = Ξ × {−1, +1}^n and use the shorthand notation x = (ξ, θ⃗) ∈ X for elements of that space.

Given this collection of directed subgraphs, our general recipe to build a collection of associated proposals {Q_i}_{i=1}^n, involutions {S_i}_{i=1}^n and weights {ω̃_i}_{i=1}^n on the extended space X is as follows. Let

    N_i(ξ) := { ξ′ | (ξ, ξ′) ∈ E_i }

denote the neighborhood of ξ in the graph G_i, i.e., the states which are reachable from ξ under the proposal kernel Q̃.
Let

    N_i^+(ξ) = { ξ′ ∈ Ξ : (ξ, ξ′) ∈ E_i^+ },  N_i^−(ξ) = { ξ′ ∈ Ξ : (ξ, ξ′) ∈ E_i^− },

denote the partition of the neighborhood N_i(ξ) into the set of states which can be reached in one step from the state ξ following the direction of the positive flow E_i^+ and the negative flow E_i^−, respectively; see Fig. 3b. We use this partition to build for each subgraph G_i a proposal kernel Q_i on X_i := V_i × {−1, +1}^n ⊂ X, which for a positive value θ_i = +1 proposes new states in the direction of the positive flow E_i^+, and for a negative value θ_i = −1 proposes new states in the direction of the negative flow E_i^−. That is,

    Q_i((ξ, θ⃗), (·, θ⃗)) ∝ 𝟙_{N_i^{θ_i}(ξ)}(·) Q̃(ξ, ·),

or, more precisely,

    Q_i((ξ, θ⃗), (ξ′, θ⃗′)) = Q̃(ξ, ξ′) / Q̃(ξ, N_i^{θ_i}(ξ))  if ξ′ ∈ N_i^{θ_i}(ξ) and θ⃗′ = θ⃗,
                          = 1                                if N_i^{θ_i}(ξ) = ∅ and S_i((ξ′, θ⃗′)) = (ξ, θ⃗),
                          = 0                                otherwise,  (5)

where in both of the above expressions we used the shorthand notation

    N_i^θ(ξ) = N_i^+(ξ) if θ = +1,  N_i^−(ξ) if θ = −1.

As a natural choice for the involutive map S_i: X → X we consider the map which flips the sign of the i-th component of the vector θ⃗, i.e.,

    S_i((ξ, θ⃗)) = (ξ, R_i(θ⃗)),  with R_i(θ⃗) = θ⃗ − 2(θ⃗ · e_i) e_i,  (6)

where e_i denotes the i-th canonical basis vector of R^n. For an illustration, see Fig. 3.
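For a concrete feel for the objects N_i^±(ξ), the split of a state's neighbors induced by an oriented edge set can be computed directly. A small sketch of ours, with edge sets represented as Python sets of ordered pairs:

```python
def neighborhoods(xi, E_plus):
    """Split the neighbors of state xi induced by the oriented edge set
    E_i^+ into the positive neighborhood N_i^+(xi) (edges leaving xi in
    E_i^+) and the negative neighborhood N_i^-(xi) (edges entering xi,
    i.e. edges leaving xi in the reversed set E_i^-)."""
    n_plus = {v for (u, v) in E_plus if u == xi}
    n_minus = {u for (u, v) in E_plus if v == xi}
    return n_plus, n_minus
```

On the directed 3-cycle 0 → 1 → 2 → 0, for instance, state 0 has positive neighborhood {1} and negative neighborhood {2}.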
Throughout the remainder of this article we assume that the weight vector ω is purely a function of ξ, i.e., ω((ξ, θ⃗)) = ω̃(ξ) for some ω̃: Ξ → Δ^{n−1}, so that (C3) is trivially satisfied. With the Q_i's and S_i's as defined in (5) and (6), respectively, it can be verified that

(C1′) ω̃_i(ξ) > 0 ⟺ ξ ∈ V_i

is sufficient for the remaining conditions (C1) and (C2) to be satisfied, provided that the symmetry condition (ii) of Assumption 1 holds. We say that the i-th proposal is active in state ξ if ξ ∈ V_i, or, equivalently, if ω̃_i(ξ) > 0, and we denote by

    A(ξ) = { i ∈ {1, ..., n} | ω̃_i(ξ) > 0 }

the index set of proposals which are active in ξ.

Remark 4.4. Note that by our assumptions on the graphs G_i^+, they have non-empty edge sets E_i^+. This implies that for every i ∈ {1, ..., n} there exists at least one ξ ∈ Ξ with i ∈ A(ξ). Hence there is always at least one state at which the i-th momentum can be flipped.

We consider

    ω̃_i(ξ) ∝ Q̃(ξ, N_i^+(ξ)) + Q̃(ξ, N_i^−(ξ))  (7)

as a generic choice for the weight vector, which can be easily verified to satisfy condition (C1′). Lastly, we extend the definition of the target measure π̃ from Ξ to X as

    π((ξ, θ⃗)) = (1/2^n) π̃(ξ),  ξ ∈ Ξ, θ⃗ ∈ {−1, +1}^n,

so that π is the product measure of π̃ and the uniform measure on {−1, +1}^n. In particular,

    Σ_{θ⃗ ∈ {−1,+1}^n} π((ξ, θ⃗)) = π̃(ξ),

i.e., the marginal measure of π in ξ coincides with the target measure π̃ on Ξ. Moreover, with the augmented measure being of product form and uniform in the θ⃗-component, it follows that the S_i's are π-invariant, i.e., π(S_i((ξ, θ⃗))) = π((ξ, θ⃗)) for all i ∈ {1, ..., n}, ξ ∈ Ξ and θ⃗ ∈ {−1, +1}^n, since S_i only changes the θ⃗-component.

Figure 3. Panel (a) shows the undirected state graph G = (V, E) induced by a reversible proposal kernel Q̃ on a state space comprised of 9 states. Panel (b) shows an exemplary assignment of three flows, E_1^+ (blue), E_2^+ (red), and E_3^+ (green). Panel (c) indicates the positive and negative neighborhoods of a state ξ: depending on the signs of the momentum components θ⃗, only the corresponding positive or negative neighborhoods of the active subgraphs can be reached from ξ in one step, and some of these neighborhoods may be empty.

In conclusion, the collection of proposals, weights and involutions (Q_i, S_i, ω_i), i ∈ {1, ..., n}, and the augmented measure π satisfy by construction the hypotheses of Theorem 4.2. Hence, with these choices, the MSMH algorithm (see Algorithm 2) produces a Markov chain which preserves the measure π on X, and thus also the marginal measure π̃ on Ξ.

Remark 4.5. We have chosen to assume Ξ to be countable. Extending to the case where Ξ is a general separable measure space is straightforward when the proposal kernel from state ξ, Q̃(ξ, dξ′), is absolutely continuous with respect to a common (σ-finite) Radon measure λ(dξ′) for all ξ ∈ Ξ. It then follows that the measure π̃ also needs to be absolutely continuous with respect to λ. One can then write the detailed balance condition as π̃(dξ) Q̃(ξ, dξ′) = π̃(dξ′) Q̃(ξ′, dξ) as measures on the product space Ξ × Ξ.
If we denote by q(ξ, ξ′) and p(ξ) the densities of Q̃(ξ, dξ′) and π̃(dξ) with respect to λ, then all that follows makes sense and applies in this more general setting with Q̃ and π̃ replaced, respectively, by q and p. For example, the conditions of Assumption 1 become p(ξ) > 0 rather than π̃(ξ) > 0, and q(ξ, ξ′) > 0 rather than Q̃(ξ, ξ′) > 0. The ratios of Q̃'s in (5) become ratios of q's evaluated at the same points. The existence of a version (indistinguishable up to null sets) of these ratios, and of the acceptance ratios r_i from Algorithms 1 and 2, which has the right measurability properties is guaranteed by the arguments of Proposition 1 in [27]. The fact that the constructed graph has countably many vertices is unimportant, as the definitions do not really use the graph structure; all of the definitions make sense once the above modifications have been made.

4.4. Ergodic properties.
In the remainder of this section, let
$$P\big((\xi,\vec{\theta}), \cdot\big) = \sum_{i=1}^{n} \widetilde{\omega}_i(\xi)\, P_i\big((\xi,\vec{\theta}), \cdot\big)$$
denote the Markov kernel generated by Algorithm 2 with the generic weight function $\widetilde{\omega}$ as specified in (7). If, in addition to Assumption 1, the following

Assumption 2.
For any extended state $(\xi, \vec{\theta}) \in \Xi \times \{-1,1\}^n$ and any active index $i \in \mathcal{A}(\xi)$, there are $m \in \mathbb{N}$ and $\xi' \in \Xi$ so that $P_i^m\big((\xi, \vec{\theta}), (\xi', R_i(\vec{\theta}))\big) > 0$.

is satisfied, then we can conclude unique ergodicity of the generated Markov chain as detailed in the following theorem.

Theorem 4.6.
Let Assumption 1 and Assumption 2 be satisfied. The non-reversible Markov chain $(x_k)_{k \in \mathbb{N}}$ generated by $P$ and initial extended state $x_0 = \tilde{x}$ is uniquely ergodic in the sense that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} \mathbb{1}_{\{x'\}}(x_k) = \pi(x') \qquad (8)$$
almost surely for any value of the initial state $\tilde{x} \in \mathcal{X}$ and all $x' \in \mathcal{X}$. In particular, for any observable $\varphi : \Xi \to \mathbb{R}$, we have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} \varphi(x_k) = \sum_{x \in \mathcal{X}} \varphi(x)\, \pi(x) \qquad (9)$$
almost surely for all initial states $\tilde{x} \in \mathcal{X}$.

Proof. A proof of this theorem can be found in Appendix B. □
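As a quick numerical sanity check of the ergodic averages in (8), the following sketch (purely illustrative; the three-state chain and all names are ours, not part of the paper's construction) simulates a small non-reversible chain, a biased cycle whose stationary measure is uniform by symmetry, and compares the empirical occupation frequencies with the stationary probabilities:

```python
import random

def occupation_frequencies(P, x0, n_steps, rng):
    """Empirical occupation frequencies (the ergodic averages in (8)) of a
    finite-state chain; P maps each state to a list of (next_state, prob)."""
    counts = {s: 0 for s in P}
    x = x0
    for _ in range(n_steps):
        counts[x] += 1
        u, acc = rng.random(), 0.0
        for y, p in P[x]:
            acc += p
            if u < acc:
                x = y
                break
    return {s: c / n_steps for s, c in counts.items()}

# Non-reversible chain cycling 0 -> 1 -> 2 -> 0 with occasional holds;
# by symmetry its stationary measure is uniform on {0, 1, 2}.
P = {0: [(1, 0.9), (0, 0.1)],
     1: [(2, 0.9), (1, 0.1)],
     2: [(0, 0.9), (2, 0.1)]}
freq = occupation_frequencies(P, 0, 100_000, random.Random(0))
```

The time averages approach the uniform stationary probabilities at the usual $O(N^{-1/2})$ Monte Carlo rate, even though the chain is strongly non-reversible.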
By the construction described in Section 4.3, Assumption 2 implies that from every state $\xi \in V_i$ we can reach a vertex $\xi'$ for which the probability of flipping the $i$-th velocity component is positive. This can be accomplished either by following a directed path along edges in $E_i^+$ if $\theta_i = +1$, or by following a directed path along edges in $E_i^-$ if $\theta_i = -1$. Provided that each of the Markov kernels $\{P_i\}_{i=1}^n$ possesses an invariant measure (which Assumption 1 guarantees; see Remark 4.3), we will see below that the only obstruction to the existence of such a reachable vertex $\xi'$ is that $\xi$ is a vertex of a cycle in the associated directed graph. To make this precise, we need a few concepts.

A circuit in a directed graph $G = (V, E)$ is a sequence of vertices $(v_0, \dots, v_m)$ such that (i) each pair $(v_i, v_{i+1}) \in E$ for $i = 0, \dots, m-1$, and (ii) $v_0 = v_m$. Note that we allow edges $(v_i, v_{i+1})$ to be repeated. We say that a circuit is escapable if there is at least one edge $(v, v') \in E$ with $v$ being a vertex in the circuit and $v'$ not. Conversely, we say that a circuit is non-escapable if no such edge exists. We say that a non-escapable circuit $(v_0, \dots, v_m)$ is maximal if the vertices of the circuit, namely $\{v_k \mid k = 0, \dots, m\}$, are not a proper subset of the vertices of another non-escapable circuit. Equipped with this terminology, we can state the following Assumption 3 and Theorem 4.7.

Assumption 3.
For every value of $\vec{\theta} \in \{-1,1\}^n$ and $i \in \{1, \dots, n\}$, every maximal non-escapable circuit in $G_i^{\theta_i}$ contains at least one state $\xi'$ for which $P_i\big((\xi', \vec{\theta}), (\xi', R_i(\vec{\theta}))\big) > 0$. Here we use the short-hand notation
$$G_i^{\theta_i} = \begin{cases} G_i^+, & \theta_i = +1, \\ G_i^-, & \theta_i = -1. \end{cases}$$

Theorem 4.7.
Provided that Assumption 1 holds, Assumption 2 and Assumption 3 are equivalent. In short: (Assumption 1 + Assumption 3) $\iff$ (Assumption 1 + Assumption 2).

Proof. We prove this theorem in Appendix B. □
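In graph terms, the vertex set of a maximal non-escapable circuit is precisely a sink strongly connected component (an SCC with no outgoing edges) that contains at least one cycle, so Assumption 3 can in principle be checked on a small state graph by enumerating sink SCCs. The sketch below (a plain Kosaraju SCC computation with a sink test; the dict-of-lists graph encoding and all names are our own illustration, not from the paper) does exactly that:

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; adj maps each node to a list of successors.
    Returns (list of components, node -> component-index map)."""
    order, seen = [], set()
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:  # iterative DFS recording finish order
            node, it = stack[-1]
            advanced = False
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj[v])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()
    radj = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)
    comp, comps = {}, []
    for u in reversed(order):  # sweep the reversed graph
        if u in comp:
            continue
        c, stack = [], [u]
        comp[u] = len(comps)
        while stack:
            x = stack.pop()
            c.append(x)
            for y in radj[x]:
                if y not in comp:
                    comp[y] = len(comps)
                    stack.append(y)
        comps.append(c)
    return comps, comp

def maximal_non_escapable_circuits(adj):
    """Vertex sets of maximal non-escapable circuits: sink SCCs (no edge
    leaves the component) that actually contain a cycle."""
    comps, comp = strongly_connected_components(adj)
    sinks = []
    for i, c in enumerate(comps):
        if all(comp[v] == i for u in c for v in adj[u]):
            if len(c) > 1 or c[0] in adj[c[0]]:
                sinks.append(sorted(c))
    return sinks
```

For instance, in the graph `{0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}` the circuit through 0, 1, 2 is escapable (via the edge into 3), while {3, 4} is the unique maximal non-escapable circuit.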
In practice, Assumption 2 and even Assumption 3 may often be difficult to check. In order to guarantee ergodicity in the sense of Theorem 4.6, one may instead consider a "lazy" version of the algorithm, where in each step
(1) with some small probability $\varepsilon > 0$, a component index is uniformly sampled from $\{1, \dots, n\}$ and the sign of the corresponding velocity component is flipped, while the state $\xi$ remains unchanged;
(2) with probability $1 - \varepsilon$, the steps in Algorithm 2 are executed.
Since the Markov kernel of the chain generated by this modification is a mixture of two kernels which each preserve $\pi$, it directly follows that this Markov chain indeed preserves $\pi$. Moreover, provided that $\widetilde{Q}$ is irreducible, it follows by arguments similar to those in the proof of Theorem 4.6 that the resulting non-reversible Markov chain is also irreducible, which is sufficient for the Markov chain to be uniquely ergodic in the sense specified in Theorem 4.6. In addition, we may consider an even "lazier" modification of the algorithm where, in each step and in addition to the above modification, we leave the complete extended state $(\xi, \vec{\theta})$ unchanged with non-zero probability. It is easy to see that under this modification the resulting Markov chain will not only preserve the target measure and be irreducible, but will also be guaranteed to be aperiodic, so that the chain converges in law to the target measure, i.e.,
$$\lim_{N \to \infty} P^N\big((\xi, \vec{\theta}), (\xi', \vec{\theta}')\big) = \pi(\xi', \vec{\theta}')$$
for all $(\xi, \vec{\theta}) \in \Xi \times \{-1,1\}^n$ and all $(\xi', \vec{\theta}') \in \Xi \times \{-1,1\}^n$.

In what follows we show how the framework described in this section can be applied to devise non-reversible MCMC methods for the sampling of redistricting plans. For this purpose, we describe in the following Section 5.1 the sampling space $\Xi$ of that application, the set of redistricting plans, and the target measure $\widetilde{\pi}$ defined on that space. We then introduce in Section 5.3 a reversible Markov kernel $\widetilde{Q}$ on the set of redistricting plans, which we construct as a tempered version of the single node flip algorithm of [12, 16].
In Section 6 we describe and discuss three methods to partition and direct the induced state space graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.

5. Sampling redistricting plans
5.1. The space and distribution of redistricting plans.
Throughout this article we consider the problem of sampling redistricting plans as equivalent to sampling the space of $n_D$-partitions of a graph $G_p = (V_p, E_p)$, which is a discrete representation of the administrative region (e.g., a state, or a collection of counties) for which redistricting plans are drawn. The vertices $V_p$ of the graph correspond to geographic regions of a certain administrative level (e.g., voter tabulation districts (VTDs), precincts, census blocks), and edges $E_p$ are placed between vertices that are either rook, queen, or legally adjacent, depending on the situation; see Fig. 4a. In order to keep language simple, we present our approach in the setup where the vertex set $V_p$ represents the precincts of a state, and we refer to $G_p$ as the precinct graph.

We represent a single districting plan made up of $n_D$ districts as a function $\xi : V_p \to \{1, \dots, n_D\}$. In other words, districting plans are $n_D$-colorings of the graph $G_p$. Informally, $\xi(v) = i$ means that the precinct associated with vertex $v$ is in the $i$-th district. Given a districting plan $\xi$, we denote by
$$D_i(\xi) = \{v \in V_p \mid \xi(v) = i\}, \qquad E_i(\xi) = \{(u,v) \in E_p \mid \xi(v) = \xi(u) = i\},$$
the set of precincts assigned to the $i$-th district, and the set of edges between vertices corresponding to precincts within the $i$-th district, respectively.

On the set of all districting plans $\{\xi \mid \xi : V_p \to \{1, \dots, n_D\}\}$, we define a score function $J$ which measures how well a redistricting plan complies with a set of prescribed criteria such as compactness of districts, equal partition of the population, or preservation of municipalities. This score function is used to define a probability measure $\widetilde{\pi}$ on the set of districting plans via the relation
$$\widetilde{\pi}(\xi) \propto e^{-J(\xi)}. \qquad (10)$$
The lower the score $J(\xi)$ of a redistricting plan, the better it complies with the criteria and the higher the probability assigned to it by the Gibbs measure $\widetilde{\pi}(\xi)$.
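In code, a districting plan $\xi$ can be represented simply as a map from vertices to district labels; $D_i(\xi)$ and $E_i(\xi)$ are then one-line filters. A minimal sketch (the tiny 2x2 "precinct graph" is a made-up example, not data from the paper):

```python
def district_nodes(plan, i):
    """D_i(xi): the set of vertices (precincts) assigned to district i."""
    return {v for v, d in plan.items() if d == i}

def district_edges(plan, edges, i):
    """E_i(xi): edges of the precinct graph internal to district i."""
    return {(u, v) for (u, v) in edges if plan[u] == plan[v] == i}

# Toy 2x2 lattice of precincts 0..3 with rook adjacency.
edges = {(0, 1), (0, 2), (1, 3), (2, 3)}
plan = {0: 1, 1: 1, 2: 2, 3: 2}  # xi: two districts, split in half
```

The same dictionary encoding is convenient for all the flip-based operations that follow, since a proposal only ever rewrites a single entry.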
In particular, if a redistricting plan $\xi$ is non-compliant, then $J(\xi) = \infty$, and thus $\widetilde{\pi}(\xi) = 0$.

Rook adjacency means that the geographical boundary between two regions has non-zero length; queen adjacency means that the boundaries touch, but may do so only at a point. At times two regions may not be geographically adjacent but may be considered adjacent for legal purposes; for example, an island may still be considered adjacent to regions on a mainland for the purposes of making districts.
Figure 4. (A) State with associated precinct graph; the embedding $\phi$ used for displaying the precinct graph places nodes at the areal centroid of the associated precinct. (B) Coloring of the precinct graph which corresponds to a partition of the state into three districts. Superimposed is the associated district graph.

In the remainder of this article we denote by
$$\Xi := \{\xi : V_p \to \{1, \dots, n_D\} \mid J(\xi) < \infty\}$$
the support of $\widetilde{\pi}$, to which in the following we will also refer as the set of all possible maps or possible redistricting plans.

5.2. The score function.
The score function $J(\xi)$ which determines the measure $\widetilde{\pi}$ typically relies on additional information associated with the vertices and edges of the precinct graph $G_p$, such as population, land area, and border length. This additional data is used to evaluate the districts on desired redistricting criteria, such as equal population and compactness.

We denote by $\mathrm{pop}(v)$ and $\mathrm{area}(v)$ the population and area, respectively, of the geographical region corresponding to vertex $v \in V_p$. Similarly, for $e = (u,v) \in E_p$, we denote by $\mathrm{boundary}(e)$ the length of the boundary shared between the precincts $u$ and $v$. Certain vertices may not share all of their boundary with adjacent vertices; for example, a vertex may lie on the boundary of the map. In this case, we denote the unshared boundary of a vertex by $\mathrm{boundary}(v)$ (which is zero for interior nodes).

On the set of all possible maps, the score function $J$ may be constructed as a positive linear combination of sub-functions, each of which evaluates a different property of the redistricting map (see, e.g., [12]). For example, in some of the numerical examples of Section 7, we let
$$J(\xi) = w_{\mathrm{pop}} J_{\mathrm{pop}}(\xi) + w_c J_c(\xi), \qquad (11)$$
where $J_{\mathrm{pop}}(\xi)$ is a measure of the population deviation and $J_c(\xi)$ is a measure of how compact the districts are. Typically $J_{\mathrm{pop}}(\xi)$ takes the form of a hard constraint or a sum of squared deviations of each district's population from a target population; similarly, $J_c(\xi)$ is typically a sum of district isoperimetric ratios or some measure of the overall perimeter.

5.3. The tempered single node flip proposal and algorithm.
To utilize the ideas of Section 4.3 in order to sample from $\widetilde{\pi}$ on the space of possible redistricting plans, we must first establish a proposal kernel $\widetilde{Q}$ on the domain of possible redistricting plans $\Xi$ which satisfies Assumption 1.

For redistricting problems, one of the most widely used methods is what is commonly known as the single node flip algorithm. This algorithm has been shown to mix well on smaller problems [11, 17, 16], but as the size of the districting plan grows and the criteria for redistricting become more complex, these moving-boundary MCMC algorithms still converge in theory, yet their mixing times may render their use computationally infeasible [21].

In this article we use the proposal kernel $\widetilde{Q}$ of a tempered version of the single node flip algorithm as the basis for constructing our non-reversible MCMC methods. As in the classical version of the algorithm, the proposal kernel of this variant changes ("flips") the color of exactly one vertex on the boundary of a district to the color of a neighboring district.

More specifically, for an ordered pair $(u,v) \in V_p \times V_p$ of two distinct precincts we define the flip operator
$$F_{(u,v)} : \Xi \to \Xi, \qquad F_{(u,v)}(\xi)(w) = \begin{cases} \xi(w), & w \neq v, \\ \xi(u), & w = v, \end{cases}$$
which flips the label of vertex $v$ to the label of vertex $u$, where as above $\Xi$ is the domain of all possible maps. For a given districting map $\xi \in \Xi$, we denote by
$$C(\xi) := \big\{(u,v) \in E_p : \xi(u) \neq \xi(v),\ F_{(u,v)}(\xi) \in \Xi\big\}$$
the set of all conflicting edges, i.e., edges which connect precincts with different labels (precincts which are assigned to different districts), and for which application of the corresponding flip operator results in a valid redistricting plan.
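The flip operator and the conflicted-edge set translate directly into code. In this sketch the validity test is left as a user-supplied predicate (standing in for the condition $J(\xi) < \infty$); the example graph is a toy 2x2 lattice and all names are our own illustration:

```python
def flip(plan, u, v):
    """F_(u,v): reassign vertex v to the district of vertex u."""
    new = dict(plan)
    new[v] = plan[u]
    return new

def conflicted_edges(plan, edges, is_valid):
    """C(xi): ordered pairs (u, v) with differing labels whose flip yields
    a valid plan (validity delegated to the predicate is_valid)."""
    out = set()
    for (u, v) in edges:
        for a, b in ((u, v), (v, u)):
            if plan[a] != plan[b] and is_valid(flip(plan, a, b)):
                out.add((a, b))
    return out

plan = {0: 1, 1: 1, 2: 2, 3: 2}
edges = {(0, 1), (0, 2), (1, 3), (2, 3)}
nonempty = lambda p: len(set(p.values())) == 2  # both districts survive
```

Note that each undirected precinct edge contributes up to two ordered conflicted pairs, one per flip direction.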
Moreover, we denote by
$$\mathcal{N}(\xi) := \big\{F_{(u,v)}(\xi) \mid (u,v) \in C(\xi)\big\}$$
the set of all possible redistricting plans which can be obtained from the districting plan $\xi$ by application of the flip operator along a conflicting edge; these are the vertices of the neighborhood of $\xi$. With this notation at hand, we define the proposal distribution $\widetilde{Q}(\xi, \cdot)$ to be a tempered version of the target measure $\widetilde{\pi}$ constrained to the set $\mathcal{N}(\xi)$, i.e.,
$$\widetilde{Q}(\xi, \cdot) \propto \mathbb{1}_{\mathcal{N}(\xi)}(\cdot)\, e^{-\beta J(\cdot)}, \qquad \xi \in \Xi,$$
or, more explicitly,
$$\widetilde{Q}(\xi, \xi') = \begin{cases} Z_\beta(\xi)^{-1}\, e^{-\beta J(\xi')}, & \xi' \in \mathcal{N}(\xi), \\ 0, & \text{otherwise,} \end{cases} \qquad (12)$$

Figure 5.
Support of the single node flip proposal; the color(s) of the circular markers within precincts indicate that the probability of proposing the districting plan which is obtained from the current districting plan by changing the color of the respective precinct to (one of) the color(s) of the marker is positive.

where $\beta > 0$ and
$$Z_\beta(\xi) := \sum_{\xi' \in \mathcal{N}(\xi)} e^{-\beta J(\xi')}$$
is the corresponding partition function.

The tempered proposal kernel $\widetilde{Q}$ can be combined with a Metropolis acceptance-rejection criterion to obtain a reversible Metropolis-Hastings algorithm (see Algorithm 5) which possesses $\widetilde{\pi}$ as an invariant measure.

6. Application to graph partitions and redistricting
In this section we present two different approaches for applying the framework of Section 4.3 to build non-reversible Markov chains for the sampling of redistricting maps out of the tempered single node flip algorithm described in Section 5.3. We describe these two approaches in Section 6.1 and Section 6.2, respectively. In Section 6.3 we discuss computational aspects of the algorithms and the role of tempering, i.e., the choice of the temperature parameter $\beta$.

6.1. Precinct flows via planar embeddings.
In this approach we construct a single flow via a planar embedding of the (planar) precinct graph, in such a way that state transitions in the positive flow direction are aligned with a prescribed vector field in $\mathbb{R}^2$.

Consider an embedding $\phi : V_p \to \mathbb{R}^2$ of the precinct graph in $\mathbb{R}^2$ and a volume-preserving vector field $v$ in $\mathbb{R}^2$. More specifically, we consider the embedding which maps every vertex $v$ to the center of mass (assuming constant density) of the associated precinct, and the field defined by concentric circles oriented in the counter-clockwise direction. That is,
$$\phi(v) = \frac{1}{\mathrm{area}(v)} \int_{A_v} x \, dx,$$
where $A_v \subset \mathbb{R}^2$ is the precinct represented by $v$ in some suitable geographical map representation of the state, and
$$v(x,y) = \big(-r \sin(\alpha),\ r \cos(\alpha)\big),$$
where $(r, \alpha)$ denotes the polar representation of $(x,y) \in \mathbb{R}^2$.

For a given embedding $\phi$ and vector field $v$, we need to make precise what it means for a transition to be aligned with the vector field $v$. For this purpose we define a function $\mathrm{orientation}_{\phi,v} : \mathcal{E} \to \{-1, +1\}$, which maps every edge of the state graph $\mathcal{E}$ of the Markov chain to the set $\{-1, +1\}$, and set
$$\mathcal{E}^+ = \{(\xi, \xi') \mid \mathrm{orientation}_{\phi,v}(\xi, \xi') > 0\}.$$
The question now is how to construct such a function $\mathrm{orientation}_{\phi,v}$. To accomplish this, we positively align the movement of centroids with the vector field. Let
$$c(D_i(\xi)) = \frac{1}{\mathrm{area}(D_i(\xi))} \sum_{v \in D_i(\xi)} \mathrm{area}(v)\, \phi(v) \qquad (13)$$
denote the centroid of the $i$-th district.
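The centroid (13), the counter-clockwise vortex field, and the alignment test that is formalized in (14) are each a few lines of code. A sketch (all function names and the dictionary encodings are our own illustration; centroids are passed in as district-to-point maps):

```python
def district_centroid(nodes, area, phi):
    """c(D_i(xi)) as in (13): area-weighted average of embedded vertices."""
    total = sum(area[v] for v in nodes)
    x = sum(area[v] * phi[v][0] for v in nodes) / total
    y = sum(area[v] * phi[v][1] for v in nodes) / total
    return (x, y)

def vortex(p):
    """Counter-clockwise circular field: (-r sin a, r cos a) = (-y, x)."""
    x, y = p
    return (-y, x)

def orientation(c_old, c_new, i, j, field):
    """Sign of the summed inner products between the field (evaluated at
    the midpoint of each centroid displacement) and the displacement
    itself, for the two modified districts i and j; cf. (14)."""
    s = 0.0
    for k in (i, j):
        (x0, y0), (x1, y1) = c_old[k], c_new[k]
        fx, fy = field(((x0 + x1) / 2.0, (y0 + y1) / 2.0))
        s += fx * (x1 - x0) + fy * (y1 - y0)
    return 1 if s > 0 else -1
```

For example, a transition whose district centroids both move counter-clockwise around the origin gets orientation +1, and the reverse transition gets orientation -1, as required for a skew-symmetric edge direction.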
We orient edges $(\xi, \xi')$ such that transitions from $\xi$ to $\xi'$ move the centroids of the involved districts $i, j$ (these are the districts for which a precinct is either removed or added in the course of the transition) in alignment with the direction of the vector field $v$, i.e.,
$$\mathrm{orientation}_{\phi,v}(\xi, \xi') = \mathrm{sign} \sum_{k \in \{i,j\}} v\!\left(\frac{c(D_k(\xi')) + c(D_k(\xi))}{2}\right) \cdot \big[c(D_k(\xi')) - c(D_k(\xi))\big], \qquad (14)$$
where $i, j$ are the indices of the two districts which are modified in the transition from $\xi$ to $\xi'$. See Algorithm 3 for an algorithmic implementation and Fig. 6 for a graphical illustration of this method.

6.2. District to district flows.
We associate $n = n_D(n_D - 1)/2$ momenta $\theta_e \in \{-1, 1\}$, $e \in E_d$, with the ordered district pairs $E_d = \{(i,j) \in \{1, \dots, n_D\}^2 : i < j\}$. If the momentum $\theta_e$ associated with an adjacent district pair $e = (i,j)$ is positive, we may only propose state changes in which a boundary precinct of district $j$ is reassigned to district $i$. If, on the other hand, $\theta_e = -1$, we may only propose state changes in which a boundary precinct of district $i$ is reassigned to district $j$; see Fig. 7a.

In the view of Section 4.3, the construction of the algorithm is as follows. The velocity vector is of the form $\vec{\theta} = (\theta_{e_1}, \dots, \theta_{e_n}) \in \{-1,1\}^{E_d}$. For any pair of districts $e = (i,j) \in E_d$,

Algorithm 3:
Center-of-mass flow
input: ξ, θ
N^θ(ξ) ← {ξ' ∈ N(ξ) : orientation_{φ,v}(ξ, ξ') = θ};
if N^θ(ξ) = ∅ then
    θ ← −θ
else
    sample ξ' ∼ 1_{N^θ(ξ)}(·) e^{−βJ(·)};  sample u ∼ U([0, 1]);
    N^{−θ}(ξ') ← {ξ'' ∈ N(ξ') : orientation_{φ,v}(ξ', ξ'') = −θ};
    if N^{−θ}(ξ') = ∅ or u < [e^{J(ξ)−J(ξ')} Z_β^θ(ξ)] / [e^{βJ(ξ)−βJ(ξ')} Z_β^{−θ}(ξ')] then
        ξ ← ξ'
    else
        θ ← −θ
return ξ, θ

In the edge-aligned version of the algorithm, orientation_{φ,v}(ξ, ξ') has the form specified in (14).
Figure 6. (A) Geometrical illustration of the orientation-assigning function $\mathrm{orientation}_{\phi,v}$ in the center-of-mass flow method: in a transition where the color of the hatched precinct changes from red to blue, the centroids of the corresponding districts (black dots) change as indicated by the blue arrows. The red arrows correspond to evaluations of the vector field (black arrows) at the respective midpoints of the blue arrows. The orientation of the transition is computed as the sign of the sum of the inner products of the two arrow/vector pairs of each district. (B) Support of the center-of-mass-flow proposal distribution for positive velocity $\theta = 1$; the same convention is used for displaying the support as in Fig. 5.

the corresponding positive flow $\mathcal{E}_e^+$ of transitions between possible districtings, in which a precinct is removed from district $j$ and added to district $i$, is
$$\mathcal{E}_e^+ = \big\{(\xi, F_{(u,v)}(\xi)) \mid \xi \in \Xi,\ (u,v) \in C_e^+(\xi)\big\},$$
where
$$C_e^+(\xi) = \big\{(u,v) \in C(\xi) \mid u \in D_i(\xi),\ v \in D_j(\xi)\big\}$$
denotes the set of directed conflicted edges which connect a precinct of district $i$ with a precinct of district $j$. Similarly, we have
$$\mathcal{E}_e^- = \big\{(\xi, F_{(u,v)}(\xi)) \mid \xi \in \Xi,\ (u,v) \in C_e^-(\xi)\big\}, \qquad C_e^-(\xi) = \big\{(u,v) \mid (v,u) \in C_e^+(\xi)\big\}.$$
The $e$-th vicinity of the state $\xi$ in the positive and negative direction can be explicitly written as
$$\mathcal{N}_e^+(\xi) = \big\{F_{(u,v)}(\xi) \mid (u,v) \in C_e^+(\xi)\big\}, \qquad \mathcal{N}_e^-(\xi) = \big\{F_{(u,v)}(\xi) \mid (v,u) \in C_e^+(\xi)\big\},$$
respectively, and we can write the $e$-th proposal kernel on the extended space $\Xi \times \{-1,1\}^{E_d}$ as
$$Q_e\big((\xi, \vec{\theta}), (\xi', \vec{\theta}')\big) = \begin{cases} e^{-\beta J(\xi')} / Z_e^{\theta_e}(\xi), & \text{if } \xi' \in \mathcal{N}_e^{\theta_e}(\xi) \text{ and } \vec{\theta}' = \vec{\theta}, \\ 1, & \text{if } \mathcal{N}_e^{\theta_e}(\xi) = \emptyset,\ \xi' = \xi \text{ and } \vec{\theta}' = F_e(\vec{\theta}), \\ 0, & \text{otherwise,} \end{cases}$$
where $Z_e^{\theta_e}(\xi) = \sum_{\xi' \in \mathcal{N}_e^{\theta_e}(\xi)} e^{-\beta J(\xi')}$.
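The oriented neighborhoods $\mathcal{N}_e^{\pm}$ and the normalization $Z_e^{\theta_e}$ can be sketched as follows (the plan encoding, validity predicate and names are illustrative assumptions; an empty neighborhood is the case that triggers a velocity flip, as in the second branch of the kernel above):

```python
import math
import random

def pair_neighborhood(plan, edges, e, theta, is_valid):
    """N_e^theta(xi): plans obtained by reassigning a boundary precinct of
    district j to district i (theta = +1), or of i to j (theta = -1)."""
    i, j = e if theta == +1 else (e[1], e[0])
    out = []
    for (u, v) in edges:
        for a, b in ((u, v), (v, u)):
            if plan[a] == i and plan[b] == j:
                cand = dict(plan)
                cand[b] = i
                if is_valid(cand):
                    out.append(cand)
    return out

def tempered_proposal(plan, edges, e, theta, J, beta, is_valid, rng):
    """Sample xi' with weight exp(-beta J(xi')) from N_e^theta(xi); return
    (None, 0.0) when the neighborhood is empty (velocity flip case)."""
    nbrs = pair_neighborhood(plan, edges, e, theta, is_valid)
    if not nbrs:
        return None, 0.0
    w = [math.exp(-beta * J(c)) for c in nbrs]
    Z = sum(w)  # Z_e^theta(xi)
    r, acc = rng.random() * Z, 0.0
    for c, wi in zip(nbrs, w):
        acc += wi
        if r < acc:
            return c, Z
    return nbrs[-1], Z
```

With a constant score and $\beta = 1$ this reduces to uniform sampling over the oriented neighborhood, matching the discussion of the $\beta \to 0$ limit in Section 6.3.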
The generic choice (7) for the weight vector $\widetilde{\omega}(\xi) = (\widetilde{\omega}_{e_1}(\xi), \dots, \widetilde{\omega}_{e_n}(\xi))$ results in weights of the form
$$\widetilde{\omega}_e(\xi) = \frac{Z_e(\xi)}{Z(\xi)}, \qquad \text{where } Z_e(\xi) = Z_e^{\theta_e}(\xi) + Z_e^{-\theta_e}(\xi), \qquad (15)$$
for $e \in E_d$, so that $Z(\xi) = \sum_{\tilde{e} \in E_d} Z_{\tilde{e}}(\xi)$. With this choice of proposal kernel and weights, the Metropolis-Hastings ratio becomes
$$r_e\big((\xi, \vec{\theta}), (\xi', \vec{\theta}')\big) = \frac{e^{J(\xi) - J(\xi')}\, Z_e^{\theta_e}(\xi)}{e^{\beta J(\xi) - \beta J(\xi')}\, Z_e^{-\theta_e}(\xi')}.$$
We provide an explicit implementation as Algorithm 4.

6.2.1.
Associated district graph.
If the number of districts is three or larger, then for certain redistricting plans some pairs of districts may not share a border. That means that for a given redistricting plan $\xi \in \Xi$ the set of adjacent district pairs
$$E_d(\xi) = \big\{(i,j) \mid C_e^+(\xi) \cup C_e^-(\xi) \neq \emptyset\big\}$$
may be a proper subset of $E_d$. We refer to the state-dependent graph $G_d(\xi) = (V_d, E_d(\xi))$, where $V_d = \{1, \dots, n_D\}$ is the index set of the districts, as the district graph of $\xi$; see Fig. 4b. If $e = (i,j) \in E_d(\xi)$, we refer to $Q_e$ as an active kernel. The generic choice of weights as in (15) ensures that only active kernels are selected in the proposal step.
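The state-dependent edge set $E_d(\xi)$ is cheap to compute with a single pass over the precinct edges. A sketch (the dict-based plan encoding is an illustrative assumption):

```python
def district_adjacency(plan, edges):
    """E_d(xi): district pairs (i, j), i < j, joined by at least one
    conflicted edge of the precinct graph; the edge set of the
    state-dependent district graph G_d(xi)."""
    adj = set()
    for (u, v) in edges:
        a, b = plan[u], plan[v]
        if a != b:
            adj.add((min(a, b), max(a, b)))
    return adj
```

Only the kernels $Q_e$ with $e$ in this set are active, so in an implementation the weight vector (15) need only be evaluated over `district_adjacency(plan, edges)`.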
Figure 7. (A) Support of the district-to-district flow proposal distribution with $(\theta_{(1,2)}, \theta_{(1,3)}, \theta_{(2,3)}) = (1, 1, 1)$; the momenta of the district pairs $(1,2)$, $(1,3)$, and $(2,3)$ are displayed as red, blue, and green colored directed edges of the district graph, respectively. (B) Support of a generalized version (see Remark 6.2) of the district-to-district flow where a momentum variable $\theta_{(1,2,3)}$ is assigned to the red colored counter-clockwise oriented cycle on the district graph. The support is shown for positive velocity $\theta_{(1,2,3)} = 1$. For both figures the same convention is used for displaying the support as in Fig. 5.

Remark 6.1. The total number of district pairs $|E_d| = n_D(n_D-1)/2$ grows quadratically in $n_D$, and thus keeping track of all entries in $\vec{\theta}$ may be memory intensive if $n_D$ is large. Moreover, the stochastic process $(\xi_n)_{n \in \mathbb{N}}$ typically converges to the target measure $\widetilde{\pi}$ (when accounting for internal symmetries/label permutations) before all proposal kernels have become active. For this reason one may not want to keep track of all velocities over the whole simulation time. Instead one may resample the velocity $\theta_e$ from the uniform measure on $\{-1, 1\}$ whenever the kernel $Q_e$ becomes active after a transition which results in the previously non-adjacent district pair $e = (i,j)$ sharing a border.

Remark 6.2. Algorithm 4 can be viewed as a special case of a much larger class of non-reversible algorithms which use the structure of the state-dependent district graph $G_d(\xi) = (V_d, E_d(\xi))$. Instead of assigning velocities to edges as in Algorithm 4, one may assign velocities to any type of subgraph of $G_d(\xi)$, and associate proposals which mimic flows across the respective districts. For example, one may associate velocities with cycles of 3 districts (see Fig. 7b), and choose the proposal $Q_{(i,j,k)}$ so that it mimics a district flow in clockwise/anti-clockwise direction depending on the value of the velocity $\theta_{(i,j,k)}$, e.g.,
$$Q_{(i,j,k)}\big((\xi, \vec{\theta}), \cdot\big) \propto \mathbb{1}_{\mathcal{N}^{\theta}_{(i,j)}(\xi)}(\cdot)\, \widetilde{Q}_{(i,j)}(\xi, \cdot) + \mathbb{1}_{\mathcal{N}^{\theta}_{(j,k)}(\xi)}(\cdot)\, \widetilde{Q}_{(j,k)}(\xi, \cdot) + \mathbb{1}_{\mathcal{N}^{\theta}_{(k,i)}(\xi)}(\cdot)\, \widetilde{Q}_{(k,i)}(\xi, \cdot),$$
with the terms on the right-hand side as defined in the pair-wise district graph version of the algorithm.

6.3. Computational aspects and choice of the temperature parameter.
The computational efficiency of an MCMC scheme depends both on the cost per generated sample and on the mixing rate of the Markov chain itself. It is intuitively clear that very high rejection rates negatively affect the mixing speed of a Markov chain. This is in particular true for non-reversible Markov chains constructed as presented here, since every rejection event results in the reversal of a momentum variable, so that high rejection rates prevent kinetic-like movement of the chain. Therefore, it may be worthwhile to trade higher computational cost per proposal for reduced rejection rates. In what follows we discuss this tradeoff for two important special cases of the tempered proposal kernel of Algorithm 4 (the same observations follow as a special case for Algorithm 5).

• If $\beta = 1$, then the proposal distribution $\widetilde{Q}(\xi, \cdot)$ is the target measure constrained to $\mathcal{N}(\xi)$, and the Metropolis ratio simplifies to
$$r_e(\xi, \xi') = \frac{Z_e^{\theta_e}(\xi)}{Z_e^{-\theta_e}(\xi')} \times \frac{Z_e(\xi')}{Z_e(\xi)} \times \frac{Z(\xi)}{Z(\xi')}. \qquad (16)$$
In this case the acceptance probability of the proposal does not depend on the energy difference $\Delta J = J(\xi') - J(\xi)$, but is merely a function of various partition functions of the current state and the proposed state.

• In the limit $\beta \to 0$, the proposal distribution $\widetilde{Q}(\xi, \cdot)$ becomes the uniform distribution on $\mathcal{N}(\xi)$, and we thus recover for $\beta = 0$ a version of the single node flip algorithm where the proposal is sampled uniformly from the set $\mathcal{N}(\xi)$. In this case the acceptance probability becomes
$$r_e(\xi, \xi') = e^{-\Delta J} \times \frac{|\mathcal{N}_e^{\theta_e}(\xi)|}{|\mathcal{N}_e^{-\theta_e}(\xi')|} \times \frac{|\mathcal{N}_e(\xi')|}{|\mathcal{N}_e(\xi)|} \times \frac{|\mathcal{N}(\xi)|}{|\mathcal{N}(\xi')|}.$$

In terms of computational cost it is important to note that in the case $\beta > 0$, generating a proposal requires evaluating the score function for all states in the neighborhood of the current state $\xi$, as well as for all states in the neighborhood of the proposal $\xi'$. Depending on the form of the score function $J$, the computational cost of this operation may vary. In particular, if $J$ is of the form (11), computing $J(\xi')$ for a neighboring state of $\xi$ may only require re-evaluating the two terms in the sums of the sub-score functions which relate to the two districts modified in the transition from $\xi$ to $\xi'$. In contrast, the operations involved in generating a proposal and evaluating the Metropolis-Hastings ratio in the case $\beta = 0$ only require identifying the various neighborhoods of the current state $\xi$ and the proposal $\xi'$.

The higher computational cost of using a tempered proposal with $\beta = 1$ may be offset by potentially drastically reduced rejection rates. For large redistricting maps, the terms $Z_e(\xi')/Z_e(\xi)$ and $Z(\xi)/Z(\xi')$ in the Metropolis-Hastings ratio (16) can be expected to be close to 1 with high probability. Similarly, the first factor may only be significantly smaller than 1 if extending district $i$ into district $j$ is on average energetically unfavorable in comparison to extending district $j$ into district $i$, a property which may give rise to cyclic alignment of district-to-district velocities; see Section 7.

In contrast, the Metropolis-Hastings ratio, and thus the acceptance probability, in the case $\beta = 0$ does directly depend on the energy difference between the current state $\xi$ and the proposal $\xi'$. In situations where the variation of score values of states in the neighborhood of a given state is large, this may result in drastically increased rejection rates in comparison to the tempered proposal.

Algorithm 4: Pair-wise district-to-district flow
input: ξ, θ⃗
sample e ∼ ω̃(ξ) = [Z_ẽ(ξ)/Z(ξ)]_{ẽ ∈ E_d};
if Z_e^{θ_e}(ξ) = 0 then
    θ_e ← −θ_e
else
    sample ξ' ∼ 1_{N_e^{θ_e}(ξ)}(·) e^{−βJ(·)};  sample u ∼ U([0, 1]);
    if Z_e^{−θ_e}(ξ') = 0 or u < [e^{J(ξ)−J(ξ')} Z_e^{θ_e}(ξ)] / [e^{βJ(ξ)−βJ(ξ')} Z_e^{−θ_e}(ξ')] then
        ξ ← ξ';  θ_ẽ ∼ U({−1, 1}) for any edge ẽ ∈ E_d(ξ') \ E_d(ξ);
    else
        θ_e ← −θ_e;
return ξ, θ⃗

7. Numerical experiments
To test our ideas numerically, we consider an example first presented in Section 4 of [21] of a square lattice split into two districts. As noted in that work, this districting problem is equivalent to considering a loop-free random walk that partitions the region. We consider the score function
$$J(\xi) = J_{\mathrm{pop}}(\xi) + J_C(\xi), \qquad (17)$$
where the population score
$$J_{\mathrm{pop}}(\xi) = \begin{cases} 0, & \mathrm{pop}_{\min} \le |D_1(\xi)| \le \mathrm{pop}_{\max}, \\ \infty, & \text{otherwise}, \end{cases}$$
is a hard constraint ensuring that the number of nodes in each district lies between $\mathrm{pop}_{\min}$ and $\mathrm{pop}_{\max}$. The compactness score $J_C(\xi) = |C(\xi)|$ corresponds to the length of the boundary, i.e., the number of conflicted edges.

We sample the space of districting plans that are simply connected on a $10 \times 10$ square lattice with parameter values $\mathrm{pop}_{\min} = 45$, $\mathrm{pop}_{\max} = 55$ (i.e., we allow up to 10% deviation from half of the lattice points).

We note that the score function $J(\xi)$ is minimized when the district boundary lies either perfectly horizontally or vertically, and grows as the district boundary length grows. Intuitively we may think of four meta-stable states, one with a given district in the north, south, east or west, with an energetic barrier between these meta-stable states. Indeed, as demonstrated in [21], the single node flip algorithm may mix extremely slowly. Using $J_C(\xi) = \log(10)\,|\mathrm{cut}(\xi)|$ on a $40 \times 40$ lattice, the authors found that the system did not change meta-stable states even after nearly 3 billion steps when using the single node flip algorithm without tempering.

We evaluate the performance of the single node flip algorithm, the non-reversible district-to-district flow algorithm, and the non-reversible center-of-mass flow algorithm. For the single node flip algorithm we consider versions with and without tempering. For the tempered algorithms we set $\beta = 0.5$. For the center-of-mass algorithm, we place a circular vortex field with its origin at the center of the redistricting graph; in Cartesian coordinates, the field is defined as $(\cos(\alpha), -\sin(\alpha))$, where $\alpha$ is the angle between the vector and the positive horizontal axis.

For each method we simulate 10 independent chains of $10^7$ samples. All chains are initialized in the same meta-stable state, with one district in the north and one in the south, respectively.

We first evaluate the early mixing properties of the methods on the basis of the first 25000 steps of a single chain for each method. For each vertex of the precinct graph, we compute the fraction $f$ of steps that vertex is assigned to district 1. By symmetry, the expected value of $f$ with respect to the target measure $\widetilde{\pi}$ is $1/2$; we therefore display the signed deviation
$$|f - 1/2|\; \mathrm{sgn}(f - 1/2), \qquad (18)$$
where $f$ is the fraction of time a node spends in a chosen district and $\mathbb{E}[f] = 1/2$. We summarize this field variable in the early part of the chains in Figure 8.

In the single node flip algorithm, we find that the plan is predominantly stuck in the north/south orientation (see Figure 8). Tempering helps to alleviate this issue; however, this method appears to favor diagonal cuts (or, more likely, oscillates between a vertical and a horizontal orientation). The district-to-district flow is biased toward a north/south districting plan, although apparently less so than the first two methods. The center-of-mass flow is nearly unbiased in its orientation. The largest deviation of $f$ from the expected value at any vertex is found to be ….

Figure 8.
We examine the fraction of time each node in a $10 \times 10$ square lattice remains in a given district for the four methods after 25000 steps (top row) and $10^7$ steps (bottom row). The methods are single node flip (left-most), single node flip with tempered proposals (second from the left), center-of-mass flow (second from the right) and district-to-district flow (right-most).

We also examine the chains over all 10 million steps in the second row of Figure 8, and find that for all methods and all vertices $f$ does not deviate further than 0.02 from the expected value.

To further probe the mixing rates of the chains, we examine how frequently the chains transition between the meta-stable states. We define a meta-stable state as a state where the boundary between the two districts is within three nodes of a horizontal or vertical cut on an opposite side of the lattice. We begin by examining the frequency with which each chain visits each of the meta-stable states after $10^7$ steps and look at the variance of these frequencies over the 10 chains (see Figure 9). For the single node flip algorithm, the range of frequencies for the meta-stable states varies between 3.2% and 4.5%; when adding tempering, the range of frequencies across chains lies between 3.6% and 4.7%; for the district-to-district flow, the range lies between 2.8% and 2.8%. In contrast, the center-of-mass flow frequencies range between 1.4% and 2.8%, meaning that the variance across runs is significantly lower than for the other methods.

We expect that the reason for the faster mixing of the center-of-mass flow is a higher rate of transitions across meta-stable states. We examine the frequency of transitions between meta-stable states in Figure 10 after 10 million steps on a single chain. We find that all of the methods are symmetric in terms of how often they transition between meta-stable states.
We find that the center of mass flow has significantly more transitions than all other methods – this method transitions roughly 7,500 times in 10 million proposals, which is more than twice the number of transitions of the other methods. In contrast, the single node flip tempered and district to district flows transition with a nearly identical frequency (roughly 3,600 times per 10 million proposals), and the single node flip method transitions even less (roughly 2,200 times per 10 million proposals).

Figure 9. We display the median frequency spent in the meta-stable states after 10 million proposals over 10 chains for each method. We also display the deviation of these frequencies across the chains with standard box plots. The half colored squares correspond to a given district being in the north, east, south and west (where shaded). Dashed lines show the long time expected limit (i.e., equal probability of being in any of the meta-stable states).

We conclude by examining how the districting plan decorrelates along the trajectory of each sampling method. To examine this correlation, we consider the precinct assignment matrix $\phi : \Xi \to \mathbb{R}^{n \times d}$ such that $\phi(\xi)_{ij} = 1$ if precinct $i$ (of $n$ precincts) is assigned to district $j$ (of $d$ districts), and $\phi(\xi)_{ij} = 0$ otherwise. To compare two precinct assignments, we note that

(19) $\{\phi(\xi)\phi(\xi')^T\}_{ii} = \delta_{\xi(i)\,\xi'(i)}$,

which is to say that the $i$-th diagonal entry of the product is one when precinct $i$ is assigned to the same district in both plans and zero otherwise. This implies that

(20) $\frac{1}{n}\,\mathrm{tr}\big(\phi(\xi)\phi(\xi)^T\big) = 1$.

Furthermore, by symmetry, we have that

(21) $\mathbb{E}\big(\phi(\xi)\big)_{ij} = 1/d$.
Figure 10. We display the number of transitions between meta-stable states in the first 10 million proposals on a single chain for each of the four methods. The half colored squares correspond to a given district being in the north, east, south and west. All sub-figure bar charts have the same scale in the vertical axis with a range of zero to 1,100 transitions.
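The precinct-assignment identities above, and the Hamming-distance relation invoked below, are easy to verify numerically. In this sketch the two plans are random labelings standing in for sampled plans, and the helper `phi` is our own.

```python
import numpy as np

n, d = 100, 2  # precincts, districts
rng = np.random.default_rng(1)
xi  = rng.integers(0, d, size=n)   # hypothetical plan: district of each precinct
xi2 = rng.integers(0, d, size=n)   # a second hypothetical plan

def phi(xi, d):
    """One-hot precinct-assignment matrix: phi[i, j] = 1 iff precinct i is in district j."""
    m = np.zeros((len(xi), d))
    m[np.arange(len(xi)), xi] = 1.0
    return m

p, p2 = phi(xi, d), phi(xi2, d)

# (20): tr(phi phi^T) / n = 1, since each precinct sits in exactly one district
assert np.isclose(np.trace(p @ p.T) / n, 1.0)

# (23): tr(phi(xi) phi(xi')^T) = n - HammingDistance(xi, xi')
hamming = int((xi != xi2).sum())
assert np.isclose(np.trace(p @ p2.T), n - hamming)

# centred similarity of a plan with itself, in the spirit of G(t); it equals 1
Ephi = np.full((n, d), 1.0 / d)    # (21): E(phi)_{ij} = 1/d under exchangeable labels
G = d / (n * (d - 1)) * np.trace((p - Ephi) @ (p - Ephi).T)
assert np.isclose(G, 1.0)
```

The last check mirrors the normalization of the similarity statistic: the centred trace of a plan against itself is $n(d-1)/d$, so the prefactor $d/(n(d-1))$ rescales it to one.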
We use the above facts to develop a measurement of precinct similarity given by

(22) $G(t) = \frac{d}{n(d-1)}\, \mathbb{E}\Big[\mathrm{tr}\Big(\big(\phi(\xi_0) - \mathbb{E}(\phi)\big)\big(\phi(\xi_t) - \mathbb{E}(\phi)\big)^T\Big)\Big]$,

where the outer expectation is taken with respect to the law of the process assuming $\xi_0 \sim \tilde{\pi}$. We remark that $G(0) = 1$ and that $G(t) \to 0$ as $t \to \infty$, since $\xi_t$ becomes independent of $\xi_0$ in the limit $t \to \infty$. We also remark that this correlation is closely related to the evolution of the Hamming distance between two plans. The Hamming distance, $d(\xi, \xi')$, counts the number of precincts that are assigned to different districts across two plans. Thus

(23) $\mathrm{tr}\big(\phi(\xi)\phi(\xi')^T\big) = n - d(\xi, \xi')$.

We estimate $G(t)$ by taking 100,000 boot-strapped samples from each chain of 10 million samples. From each sample, we gather statistics as a function of progressive steps from the sample and then average across all chains. We plot $G(t)$ as a function of the steps from the initial samples in Figure 11. We find that all four methods decorrelate within 50,000 steps. We find that the center of mass flow decorrelates significantly faster than the three other methods. The district to district and single node flip tempered algorithms decorrelate at nearly the same rate, which makes sense due to the fact that a two district state does not admit cycles which can create longer time flows without rejection. The single node flip algorithm decorrelates far slower than all other methods.

Figure 11. We show the correlation of precinct assignments, $G(t)$, as a function of the number of steps taken for each of the four methods. The results are obtained by bootstrap sampling over the 10 different runs on the first $10^7 - 5\times 10^4$ steps for each method and then gathering statistics for $G(t)$ starting with each of the samples.

Discussion
To the best of our knowledge this article is the first work to propose using non-reversible MCMC schemes in the context of sampling redistricting plans. We explore various natural choices for introducing non-reversibility in this application (i.e., via flows induced by a vector field, and district-to-district flows). We provide the necessary (and novel) mathematical framework for implementing these approaches in the form of the Mixed Skew Metropolis-Hastings algorithm, which relies on a generalization of the skew detailed balance condition involving multiple momenta and a mixture of proposal kernels. The resulting Markov chain, on a state space extended to include unit momenta, satisfies mixed skew detailed balance. In this setting, we derive conditions which ensure ergodicity of the generated Markov chain with respect to the target measure.

This framework can be used to derive many instances of non-reversible MCMC algorithms, both in general and for the sampling of redistricting plans. We focus in this article on two specific simple implementations of the above mentioned approaches, namely the center-of-mass flow (Algorithm 3) and the pair-wise district-to-district flow (Algorithm 4). Even with these simple implementations, we find in numerical experiments that the proposed methods mix significantly better than comparable reversible methods. We expect that further improvement can be achieved by more principled implementations of the approaches suggested here. This may open many interesting directions of research. This pertains in particular to the choice of the vector field $v$, the embedding $\phi$, and the orientation function. Similarly, we expect that a more refined assignment of cycles on the district graph, as alluded to in Remark 6.2, may lead to significant improvements in performance in redistricting problems with a larger number of districts.

Acknowledgement
Matthias Sachs acknowledges support from grant DMS-1638521 from SAMSI. Jonathan Mattingly acknowledges the partial support of the NSF grant DMS-1613337. Jonathan Mattingly and Gregory Herschlag acknowledge partial support from the Duke Mathematics Department, the Rhodes Information Initiative at Duke, and the Duke Dean's and Provost's offices. All authors are thankful for the support of SAMSI and the Duke NSF-TRIPODS grant (NSF-CCF-1934964), which supported the very stimulating activity around the mathematics of redistricting during the time of this work. We also acknowledge the PRUV and Data+ undergraduate research programs, under which our initial work in quantifying gerrymandering was developed starting in 2013. Jonathan Mattingly is thankful for very educational and inspiring conversations with Manon Michel while at CIRM in Marseille in September 2018, which started him on this line of inquiry. We would also like to thank Andrea Agazzi, Jianfeng Lu, and the Triangle Quantifying Gerrymandering Meet-up for useful and stimulating discussions.
References

[1] E. Akhmatskaya, N. Bou-Rabee, and S. Reich. A comparison of generalized hybrid Monte Carlo methods with and without momentum flip. Journal of Computational Physics, 228(6):2256–2265, 2009.
[2] S. Bangia, C. V. Graves, G. Herschlag, H. S. Kang, J. Luo, J. C. Mattingly, and R. Ravier. Redistricting: Drawing the line. arXiv preprint arXiv:1704.03360, 2017.
[3] J. Bierkens, P. Fearnhead, G. Roberts, et al. The zig-zag process and super-efficient sampling for Bayesian analysis of big data. The Annals of Statistics, 47(3):1288–1320, 2019.
[4] A. Bouchard-Côté, S. J. Vollmer, and A. Doucet. The bouncy particle sampler: A nonreversible rejection-free Markov chain Monte Carlo method. Journal of the American Statistical Association, 113(522):855–867, 2018.
[5] D. Carter, G. Herschlag, Z. Hunter, and J. Mattingly. A merge-split proposal for reversible Monte Carlo Markov chain sampling of redistricting plans. arXiv preprint arXiv:1911.01503, 2019.
[6] D. DeFord, M. Duchin, and J. Solomon. Recombination: A family of Markov chains for redistricting. arXiv preprint arXiv:1911.05725, 2019.
[7] P. Diaconis, S. Holmes, and R. M. Neal. Analysis of a nonreversible Markov chain sampler. Annals of Applied Probability, pages 726–752, 2000.
[8] P. Dobson, I. Fursov, G. Lord, and M. Ottobre. Reversible and non-reversible Markov chain Monte Carlo algorithms for reservoir simulation problems. Computational Geosciences, pages 1–13, 2020.
[9] A. Duncan, G. Pavliotis, and K. Zygalakis. Nonreversible Langevin samplers: Splitting schemes, analysis and implementation. arXiv preprint arXiv:1701.04247, 2017.
[10] A. B. Duncan, T. Lelièvre, and G. Pavliotis. Variance reduction using nonreversible Langevin samplers. Journal of Statistical Physics, 163(3):457–491, 2016.
[11] B. Fifield, M. Higgins, K. Imai, and A. Tarr. A new automated redistricting simulator using Markov chain Monte Carlo. Working Paper, Princeton University, Princeton, NJ, 2015.
[12] G. Herschlag, H. S. Kang, J. Luo, C. V. Graves, S. Bangia, R. Ravier, and J. C. Mattingly. Quantifying gerrymandering in North Carolina. arXiv preprint arXiv:1801.03783, 2018.
[13] G. Herschlag, R. Ravier, and J. C. Mattingly. Evaluating partisan gerrymandering in Wisconsin. arXiv preprint arXiv:1709.01596, 2017.
[14] K. Hukushima and Y. Sakai. An irreversible Markov-chain Monte Carlo method with skew detailed balance conditions. In Journal of Physics: Conference Series, volume 473, page 012012. IOP Publishing, 2013.
[15] Y.-A. Ma, E. B. Fox, T. Chen, and L. Wu. Irreversible samplers from jump and continuous Markov processes. Statistics and Computing, 29(1):177–202, 2019.
[16] J. C. Mattingly. Expert report for Common Cause v. Lewis. 2019.
[17] J. C. Mattingly. Rebuttal of defendant's expert reports for Common Cause v. Lewis. 2019.
[18] J. C. Mattingly and C. Vaughn. Redistricting and the will of the people, 2014.
[19] M. Michel. Irreversible Markov chains by the factorized Metropolis filter: Algorithms and applications in particle systems and spin models. PhD thesis, École Normale Supérieure, 2016.
[20] L. Najt, D. DeFord, and J. Solomon. Complexity and geometry of sampling connected graph partitions, 2019.
[21] L. Najt, D. DeFord, and J. Solomon. Complexity and geometry of sampling connected graph partitions. Preprint, arXiv:1908.08881, 2019.
[22] R. M. Neal. Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. arXiv preprint math/0407281, 2004.
[23] M. Ottobre, N. S. Pillai, F. J. Pinski, A. M. Stuart, et al. A function space HMC algorithm with second order Langevin diffusion limit. Bernoulli, 22(1):60–106, 2016.
[24] G. Stoltz, M. Rousset, et al. Free Energy Computations: A Mathematical Perspective. World Scientific, 2010.
[25] Y. Sun, J. Schmidhuber, and F. J. Gomez. Improving the asymptotic performance of Markov chain Monte-Carlo by inserting vortices. In Advances in Neural Information Processing Systems, pages 2235–2243, 2010.
[26] H. Suwa and S. Todo. General construction of irreversible kernel in Markov chain Monte Carlo. arXiv preprint arXiv:1207.0258, 2012.
[27] L. Tierney. A note on Metropolis-Hastings kernels for general state spaces. The Annals of Applied Probability, 8(1):1–9, Feb 1998.
[28] K. S. Turitsyn, M. Chertkov, and M. Vucelja. Irreversible Monte Carlo algorithms for efficient sampling. Physica D: Nonlinear Phenomena, 240(4–5):410–414, 2011.
[29] M. Vucelja. Lifting – a nonreversible Markov chain Monte Carlo algorithm. American Journal of Physics, 84(12):958–968, 2016.
Appendix A. Algorithms
A.1. Tempered single node-flip algorithm.
Combining the tempered single node-flip proposal of Section 5.3 with a Metropolis-Hastings accept-reject step results in the following algorithm.

Algorithm 5: Tempered single node-flip algorithm.
  input: $\xi$
  generate proposal $\xi' \sim \widetilde{Q}(\xi, \cdot) \propto \mathbb{1}_{N(\xi)}(\cdot)\, e^{-\beta J(\cdot)}$;
  $r(\xi, \xi') = \dfrac{e^{-J(\xi') + \beta J(\xi')}\, Z(\xi)}{e^{-J(\xi) + \beta J(\xi)}\, Z(\xi')}$;
  sample $u \sim \mathcal{U}([0, 1])$;
  if $u < r(\xi, \xi')$ then $\xi \leftarrow \xi'$
  return $\xi$

Note that:
• For $\beta = 1$, the proposal distribution $\widetilde{Q}(\xi, \cdot)$ is the target measure constrained to $N(\xi)$, and the Metropolis ratio simplifies to $r(\xi, \xi') = Z(\xi)/Z(\xi')$. In this case the acceptance probability of the proposal is not explicitly dependent on the energy difference $\Delta J = J(\xi') - J(\xi)$, but is merely a function of the partition functions of the current state and the proposed state.
• For $\beta = 0$, a version of the single node flip algorithm where the proposal is sampled uniformly from the set $N(\xi)$ is recovered, and the Metropolis-Hastings ratio becomes $r(\xi, \xi') = \dfrac{e^{-J(\xi')}\, |N(\xi')|^{-1}}{e^{-J(\xi)}\, |N(\xi)|^{-1}}$. The acceptance probability in this case tends to be lower in comparison to the tempered proposal with $\beta = 1$.

Furthermore, if the underlying tempered proposal generates an irreducible Markov chain, then the tempered MCMC algorithm is also irreducible and hence has the desired target measure as its unique invariant measure. This is summarized in the following proposition.

Proposition A.1.
If the topology of the graph $G_p = (V_p, E_p)$ is such that the Markov chain generated by the single-node-flip proposal as defined in (12) is irreducible, then the probability measure $\tilde{\pi}$ is the unique invariant measure of the Markov chain $(\xi_n)_{n \in \mathbb{N}}$ generated by the MCMC algorithm given in Algorithm 5.

Appendix B. Proofs
B.1. Proof of Theorem 4.1.
We now return to the proof of Theorem 4.1, which guarantees that the mixed skew detailed balance condition ensures that the desired target measure $\pi$ is an invariant measure for the Markov chain $P$. The proof has the same structure as the analogous proof for the original skew detailed balance condition [7].

Proof of Theorem 4.1.
\begin{align*}
(\pi P)(x) &= \sum_{x' \in \mathcal{X}} \pi(x')\, P(x', x) = \sum_{x' \in \mathcal{X}} \sum_{i=1}^{n} \pi(x')\,\omega_i(x')\, P_i(x', x) \\
&= \sum_{x' \in \mathcal{X}} \sum_{i=1}^{n} \omega_i(x)\,\pi(x)\, P_i\big(S_i(x), S_i(x')\big) \\
&= \pi(x) \sum_{i=1}^{n} \omega_i(x) \Big( \sum_{x' \in \mathcal{X}} P_i\big(S_i(x), S_i(x')\big) \Big) \\
&= \pi(x) \Big( \sum_{i=1}^{n} \omega_i(x) \Big) = \pi(x),
\end{align*}
where the third equality follows from the mixed skew detailed balance condition (4), and the fifth and sixth equalities follow from the fact that the sum within each pair of parentheses equals one: in the first case because $P_i$ is a Markov transition kernel and $S_i(\mathcal{X}) = \mathcal{X}$, and in the second case because the weights $\omega_i(x)$ sum to one. □

B.2.
Proof of Theorem 4.2.
We now give the proof that the Mixed Skew Metropolis-Hastings (MSMH) algorithm given in Algorithm 1 satisfies the mixed skew detailed balance condition given in (4), and hence has the desired target measure $\pi$ as an invariant measure.

Proof of Theorem 4.2.
The form of the algorithm directly implies that the transition matrices/kernels $P_i$, $i = 1, \dots, n$, of the generated Markov chain are of the form $P_i(x, x') = g_{i,1}(x, x') + g_{i,2}(x, x')$, where

(24) $g_{i,1}(x,x') = Q_i(x,x')\,\min\big(1, r_i(x,x')\big)$,
     $g_{i,2}(x,x') = \mathbb{1}_{\{S_i(x)\}}(x')\,\Big[1 - \sum_{\tilde{x} \in \mathcal{X}} Q_i(x,\tilde{x})\,\min\big(1, r_i(x,\tilde{x})\big)\Big]$;

thus a generalized skew detailed balance condition is satisfied for $P = \omega \cdot \widetilde{P}$ if

(25) $g_{i,j}(x,x')\,\pi(x)\,\widetilde{\omega}_i(x) = g_{i,j}\big(S_i(x'), S_i(x)\big)\,\pi(x')\,\widetilde{\omega}_i(x')$

for $j = 1, 2$, all $x, x' \in \mathcal{X}$, and all $i = 1, \dots, n$. We will now verify this condition for each $j$ separately.

Case $j = 1$: A simple computation shows that $r_i(x,x') = 1 / r_i(S_i(x'), S_i(x))$. Thus $r_i(x,x') \in (0,1) \iff r_i(S_i(x'), S_i(x)) > 1$, and $r_i(x,x') = r_i(S_i(x'), S_i(x)) \iff r_i(x,x') = 1$. Therefore, if $r_i(x,x') \le 1$, we find
\begin{align*}
g_{i,1}(x,x')\,\pi(x) &= \min\big(1, r_i(x,x')\big)\, Q_i(x,x')\,\pi(x) \\
&= \frac{Q_i(S_i(x'), S_i(x))\,\pi(x')}{Q_i(x,x')\,\pi(x)}\, Q_i(x,x')\,\pi(x) \\
&= g_{i,1}\big(S_i(x'), S_i(x)\big)\,\pi(x'),
\end{align*}
and, similarly, since $r_i(x,x') > 1 \iff r_i(S_i(x'), S_i(x)) \in (0,1)$,
\begin{align*}
g_{i,1}(x,x')\,\pi(x) &= Q_i(x,x')\,\min\big(1, r_i(x,x')\big)\,\pi(x) = Q_i(x,x')\,\pi(x) \\
&= \frac{Q_i(x,x')\,\pi(x)}{Q_i(S_i(x'), S_i(x))\,\pi(x')}\, Q_i\big(S_i(x'), S_i(x)\big)\,\pi(x') \\
&= \min\big(1, r_i(S_i(x'), S_i(x))\big)\, Q_i\big(S_i(x'), S_i(x)\big)\,\pi(x') \\
&= g_{i,1}\big(S_i(x'), S_i(x)\big)\,\pi(x').
\end{align*}

Case $j = 2$: For $S_i(x) \neq x'$, it follows from the definition of $g_{i,2}(x,x')$ in (24) that $g_{i,2}(x,x') = g_{i,2}(S_i(x'), S_i(x)) = 0$, and (25) is thus trivially satisfied. If $S_i(x) = x'$, it follows that $g_{i,2}(x,x') = g_{i,2}(S_i(x), S_i(x')) = g_{i,2}(S_i(x'), S_i(x))$, which, by virtue of the fact that $\pi$ is invariant under $S_i$, also implies (25). □

B.3.
Proof of Theorem 4.6.
We now turn to the proof of the unique ergodicity and convergence of averages result given in Theorem 4.6. Since $\pi$ is invariant under $P$, that is, $\pi P = \pi$, and $\pi(\xi) > 0$ for all $\xi \in \Xi$, it is sufficient to show that the Markov chain generated by $P$ is irreducible, meaning that we can reach any extended state $(\xi', \vec{\theta}')$ from any other extended state $(\xi, \vec{\theta})$ within a finite number of steps, i.e.,

(26) $\forall\, (\xi, \vec{\theta}), (\xi', \vec{\theta}') \in \Xi \times \{-1, +1\}^n,\ \exists\, m \in \mathbb{N} \text{ s.t. } P^m\big((\xi, \vec{\theta}), (\xi', \vec{\theta}')\big) > 0.$

We will prove this statement by combining the following two intermediate facts.
(1) Show that we can find a path with non-zero probability between any state $(\xi, \vec{\theta}) \in \Xi \times \{-1, +1\}^n$ and the set $\{\xi'\} \times \{-1, +1\}^n$ sitting above an arbitrary point $\xi' \in \Xi$ (Lemma B.1 below). Such a path will be used to move about the state space.
(2) Show that for any $(\xi, \vec{\theta}) \in \Xi \times \{-1, +1\}^n$, if $i \in \mathcal{A}(\xi)$, we can find a path with non-zero probability beginning at $(\xi, \vec{\theta})$ and ending at $(\xi, R_i(\vec{\theta}))$. These paths will be used to flip the $i$-th momentum as needed, without changing the underlying state $\xi$ (Lemma B.2 below).

The basic idea of the proof of Theorem 4.6 is that the first of the above facts allows one to move around the state space from $\xi$ to $\xi'$, but without control of what happens to the momentum variable $\vec{\theta}$. The second fact then allows one to modify the resulting $\vec{\theta}''$ to the desired $\vec{\theta}'$. We will see that, at its core, the first fact follows from the irreducibility of the proposal kernel $\widetilde{Q}$, which was an input to our algorithm.
The second fact will follow again from this base irreducibility, along with Assumption 2 (or, in light of Theorem 4.7, Assumption 3), which ensures there is a place along the path at which to flip the signs in the momentum $\vec{\theta}''$ until it agrees with $\vec{\theta}'$. This is the basic arc of the proof of Theorem 4.6, though each of the above statements will need a little refinement, and a few additional technical elements will need to be added to complete the argument. A graphic representation of the sketch of the proof is given in Fig. 12.

We begin by stating the two lemmas which correspond to the two statements above. The proofs of these lemmas are postponed to the end of the section.

Lemma B.1.
For all extended states $(\xi, \vec{\theta}) \in \Xi \times \{-1, +1\}^n$ and any state $\xi' \in \Xi$, there exists $m \in \mathbb{N}$ so that $P^m\big((\xi, \vec{\theta}), \{\xi'\} \times \{-1, +1\}^n\big) > 0$. Additionally, $m$ can be chosen so that there is a path $\big((\xi_0, \vec{\theta}_0), \dots, (\xi_m, \vec{\theta}_m)\big)$ of positive probability starting at $(\xi, \vec{\theta})$ and ending in the set $\{\xi'\} \times \{-1, +1\}^n$, satisfying the following property: for every $i \in \{1, \dots, n\}$ there exists a $k \in \{0, \dots, m\}$ (depending on $i$) with $i \in \mathcal{A}(\xi_k)$.

Lemma B.2.
For any extended state $(\xi, \vec{\theta}) \in \Xi \times \{-1, +1\}^n$ and $i \in \mathcal{A}(\xi)$, there is $m \in \mathbb{N}$ so that $P^m\big((\xi, \vec{\theta}), (\xi, R_i(\vec{\theta}))\big) > 0$.

The construction of the MSMH chain $P$ from the proposal chain $\widetilde{Q}$ in Section 4.3, and in particular the structure of $N_i^+$ and $N_i^-$, imply that

(27) $\bigcup_{i \in \mathcal{A}(\xi)} \Big[ \operatorname{supp} P_i\big((\xi, \vec{\theta}), (\cdot, \vec{\theta})\big) \cup \operatorname{supp} P_i\big((\xi, R_i(\vec{\theta})), (\cdot, R_i(\vec{\theta}))\big) \Big] = \operatorname{supp} \widetilde{Q}(\xi, \cdot).$

The one central obstacle in deducing Lemma B.1 from this fact is that a particular coordinate momentum of $\vec{\theta}$ might not be aligned in the right direction to allow us to propose, and then follow, a given transition which is possible under $\widetilde{Q}$. Lemma B.2 allows us to flip a particular coordinate momentum $\theta_i$ provided $i$ is active. One remaining concern is that we might be forced to constantly correct the momenta which are changed as a side-effect of previous moves. The following lemma is central to ruling out this scenario.

Lemma B.3.
Let Assumption 1 hold and consider the transition kernel $P\big((\xi, \vec{\theta}), \cdot\big) = \sum_{i=1}^{n} \widetilde{\omega}_i(\xi)\, P_i\big((\xi, \vec{\theta}), \cdot\big)$ of the Markov chain generated by Algorithm 2 with generic weights as specified in (7). For any two states $\xi, \xi' \in \Xi$ and an $i \in \{1, \dots, n\}$ such that $i \in \mathcal{A}(\xi)$ and $i \notin \mathcal{A}(\xi')$, the following conclusion holds:

$\forall\, \vec{\theta} \in \{-1, +1\}^n, \quad P_i\big((\xi, \vec{\theta}), (\xi', \vec{\theta})\big) = 0.$
Figure 12. A sketch of the proof of Theorem 4.6, showing that there is a path with non-zero probability between any two points $(\xi, \vec{\theta})$ and $(\xi', \vec{\theta}')$ in the extended state space.

Equipped now with Lemma B.1, Lemma B.2, and Lemma B.3, as well as (27), we are able to give the proof of Theorem 4.6. As already mentioned, the basic outline is given in Fig. 12. The proofs of these lemmas are postponed to the end of this section.
Proof of Theorem 4.6.
As already mentioned, Lemma B.1 ensures that for any extended state $(\xi, \vec{\theta})$ and state $\xi' \in \Xi$, there is a sequence of states $(\xi_0, \vec{\theta}_0), \dots, (\xi_{m''}, \vec{\theta}_{m''})$ with $(\xi_0, \vec{\theta}_0) = (\xi, \vec{\theta})$, $\xi_{m''} = \xi'$, and $P\big((\xi_k, \vec{\theta}_k), (\xi_{k+1}, \vec{\theta}_{k+1})\big) > 0$ for $k = 0, \dots, m'' - 1$. This is the path from $\xi_0$ to $\xi_{m''}$ which runs horizontally across the center of the diagram (orange arrows). In order to show (26), we need to show that this sequence can be modified/extended to a sequence whose final velocity coincides with the velocity vector $\vec{\theta}'$ specified in (26).

First notice that if $\vec{\theta}'' := \vec{\theta}_{m''}$ and $\vec{\theta}'$ only differ in components which are active in $\xi'$, then the existence of such a modified sequence directly follows from Lemma B.2. One simply adds loops (green arrows in Fig. 12) starting from $\xi'$ and the current $\vec{\theta}$, and returning with a particular component of the momentum's sign flipped to agree with $\vec{\theta}'$. In Fig. 12, these excursions correspond to extensions of the initial path, which ended at $\xi_{m''}$, which snake downward and then back to the left from $(\xi', \vec{\theta}'')$.

If $\vec{\theta}''$ and $\vec{\theta}'$ differ in a component $i$ which is activated along the path $(\xi_0, \dots, \xi_{m''})$, then we can again invoke Lemma B.2 to insert a loop in the middle of the original sequence to flip the $i$-th component of the momentum (blue arrows in Fig. 12). We consider the state $\xi_{\tau_i}$ in the sequence $(\xi_0, \dots, \xi_{m''})$ at which the component $i$ is active for the last time, i.e., $\tau_i := \max\{ j \in \{0, \dots, m''\} \mid i \in \mathcal{A}(\xi_j) \}$. After this state we insert a loop $(\xi^\star_1, \vec{\theta}^\star_1), \dots, (\xi^\star_{m_i}, \vec{\theta}^\star_{m_i})$ into the sequence which, according to Lemma B.2, flips the sign of the $i$-th velocity component, i.e., $\xi^\star_{m_i} = \xi_{\tau_i}$, $\vec{\theta}^\star_{m_i} = R_i(\vec{\theta}_{\tau_i})$. This produces a modified sequence of states

$(\xi_0, \vec{\theta}_0), \dots, (\xi_{\tau_i}, \vec{\theta}_{\tau_i}), (\xi^\star_1, \vec{\theta}^\star_1), \dots, (\xi^\star_{m_i}, \vec{\theta}^\star_{m_i}), (\xi_{\tau_i+1}, R_i(\vec{\theta}_{\tau_i+1})), \dots, (\xi_{m''}, R_i(\vec{\theta}_{m''})),$

which is attained with positive probability by the Markov chain (see Fig. 12). Observe that the transition probabilities between the remaining states $(\xi_{\tau_i+1}, R_i(\vec{\theta}_{\tau_i+1})), \dots, (\xi_{m''}, R_i(\vec{\theta}_{m''}))$ are not affected by the sign change in the $i$-th velocity component, since the $i$-th component is, by the choice of $\tau_i$, inactive for all these states. All that remains is to show that the transition from the extended state $(\xi^\star_{m_i}, \vec{\theta}^\star_{m_i}) = (\xi_{\tau_i}, R_i(\vec{\theta}_{\tau_i}))$ to $(\xi_{\tau_i+1}, R_i(\vec{\theta}_{\tau_i+1}))$ does not use the kernel $P_i$, and hence cannot be affected by the sign change in the $i$-th velocity component between the start and the end of the loop. Lemma B.3 ensures that $P_i\big((\xi_{\tau_i}, R_i(\vec{\theta}_{\tau_i})), (\xi_{\tau_i+1}, R_i(\vec{\theta}_{\tau_i+1}))\big) = 0$, because we know that $i \notin \mathcal{A}(\xi_{\tau_i+1})$ from the definition of $\tau_i$.

By repeating the described procedure for all remaining mismatched components which are activated somewhere along the original sequence of states $(\xi_0, \dots, \xi_{m''})$, we obtain a sequence for which the final velocity coincides in all these components with $\vec{\theta}'$. Since Lemma B.1 allows us to choose the sequence $(\xi_0, \dots, \xi_{m''})$ connecting $(\xi, \vec{\theta})$ with $(\xi', \vec{\theta}')$ such that every component $i \in \{1, \dots, n\}$ is activated at least once along the way, this concludes the proof. □

We now return to the proofs of Lemma B.1, Lemma B.2, and Lemma B.3, which we will give in reverse order.
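The irreducibility statement (26) can be checked mechanically on a small example. The sketch below builds a toy lifted chain, a walker on a cycle whose momentum flips with small probability; this toy kernel is our own stand-in, not one of the chains of this paper.

```python
import numpy as np

# Toy lifted chain on a cycle of K sites: the extended state is (xi, theta)
# with theta in {+1, -1}; the walker drifts one step in direction theta with
# probability 1 - eps and flips its momentum with probability eps.
K, eps = 6, 0.1
N = 2 * K  # number of extended states

def idx(xi, th):
    return 2 * xi + (0 if th == +1 else 1)

P = np.zeros((N, N))
for xi in range(K):
    for th in (+1, -1):
        P[idx(xi, th), idx((xi + th) % K, th)] = 1 - eps  # drift with the momentum
        P[idx(xi, th), idx(xi, -th)] = eps                # momentum flip

assert np.allclose(P.sum(axis=1), 1.0)  # P is a stochastic matrix

# (26): every extended state reaches every other within at most N steps;
# (I + A)^N has a positive entry wherever a path of length <= N exists
A = (P > 0).astype(np.int64)
reach = np.linalg.matrix_power(np.eye(N, dtype=np.int64) + A, N)
assert (reach > 0).all()
```

With `eps = 0` the flip edges disappear, the two momentum copies of the cycle decouple, and the same reachability check fails, which is exactly the degeneracy the momentum-flip lemmas rule out.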
Proof of Lemma B.3.
Since mixed skew detailed balance holds for $P$, we have in particular

(28) $P_i\big((\xi, \vec{\theta}), (\xi', \vec{\theta})\big)\, \tilde{\pi}(\xi)\, \widetilde{\omega}_i(\xi) = P_i\big((\xi', R_i(\vec{\theta})), (\xi, R_i(\vec{\theta}))\big)\, \tilde{\pi}(\xi')\, \widetilde{\omega}_i(\xi').$

By the hypotheses of the lemma we have $\widetilde{\omega}_i(\xi') = 0$, since $i \notin \mathcal{A}(\xi')$. Thus, the right hand side of (28) is certainly 0. Similarly, we have $\widetilde{\omega}_i(\xi) > 0$ and $\tilde{\pi}(\xi) > 0$, so we must have $P_i\big((\xi, \vec{\theta}), (\xi', \vec{\theta})\big) = 0$ in order for the equality given in (28) to hold. □

Proof of Lemma B.2.
First notice that, since each kernel $P_i$ satisfies a modified skew detailed balance condition as detailed in (4.1), we have that $P_i\big((\xi, \vec{\theta}), (\xi', \vec{\theta})\big) > 0 \iff P_i\big((\xi', R_i(\vec{\theta})), (\xi, R_i(\vec{\theta}))\big) > 0$. Thus it follows inductively that any sequence of states $(\xi_0, \vec{\theta}), (\xi_1, \vec{\theta}), \dots, (\xi_m, \vec{\theta})$ which can be observed with positive probability when evolving according to $P_i$ can also, with positive probability, be "walked back" in reverse order as $(\xi_m, R_i(\vec{\theta})), \dots, (\xi_1, R_i(\vec{\theta})), (\xi_0, R_i(\vec{\theta}))$ after flipping the sign of the $i$-th momentum. Since $i \in \mathcal{A}(\xi)$, Assumption 2 guarantees that there is some path with positive $P_i$ probability from $(\xi, \vec{\theta})$ to a point at which $\vec{\theta}$ can be flipped to $R_i(\vec{\theta})$. Without loss of generality, we can assume that $\vec{\theta}$ does not change until this last step. By "walking back" along this path, we arrive at $(\xi, R_i(\vec{\theta}))$ as desired. □

Proof of Lemma B.1.
From Assumption 1, which states that $\widetilde{Q}$ is irreducible on $\Xi$, we know that there exists an integer $m$ and a path $\xi = \xi_0, \dots, \xi_m = \xi'$ with positive $\widetilde{Q}$ probability. By the construction of the $P$ chain from $\widetilde{Q}$, we know that for any pair $\xi_i$ and $\xi_{i+1}$ along this path there exists some kernel $P_k$ and some choice of the $k$-th momentum $\epsilon \in \{-1, +1\}$ so that $P_k\big((\xi_i, \vec{\theta}^*), (\xi_{i+1}, \vec{\theta}^*)\big) > 0$ with $\theta^*_k = \epsilon$. If by chance we arrive at $\xi_i$ with the wrong sign in the $k$-th momentum, we can insert one of the loops constructed in Lemma B.2 to change the sign of the $k$-th momentum. By following this procedure inductively for each step in the original path, we obtain the path needed to prove the first part of Lemma B.1.

Finally, we need to show that we can construct the path connecting $(\xi, \vec{\theta})$ to $(\xi', \vec{\theta}')$ such that for each index $i \in \{1, \dots, n\}$ there is at least one state $(\xi_k, \vec{\theta}_k)$ along it with $i \in \mathcal{A}(\xi_k)$. First notice that Remark 4.4 guarantees the existence of states $\xi^*_i \in \Xi$ with $i \in \mathcal{A}(\xi^*_i)$ for each $i \in \{1, \dots, n\}$. The last result is obtained by prepending to the path constructed above a path which visits each of the $\xi^*_i$, $i \in \{1, \dots, n\}$, before heading on to $(\xi', \vec{\theta}')$. □

B.4.
Proof of Theorem 4.7.
We close out this last appendix of the paper by proving Theorem 4.7, which shows that, under Assumption 1, Assumption 3 and Assumption 2 are equivalent.
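The property at the heart of this equivalence, that every state reachable without flipping $\theta_i$ can itself reach some state where the flip has positive probability, can be phrased as a reachability check on the directed graph $G_i^+$. The edge list and flip set below are hypothetical stand-ins, chosen only to make the check concrete.

```python
from collections import deque

# Toy directed graph of moves available while theta_i = +1, plus the set of
# states at which the momentum flip theta_i -> -theta_i has positive
# probability. Both are made-up examples, not taken from the paper.
edges = {0: [1], 1: [2], 2: [0, 3], 3: [3]}  # adjacency of G_i^+
flip_states = {3}                            # states where the flip is possible

def reachable(start):
    """Breadth-first search: all vertices reachable from `start` in G_i^+."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in edges.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

# From every state one can reach a flip state, so the i-th momentum can
# always be reversed: no "non-escapable circuit" traps the chain.
assert all(reachable(v) & flip_states for v in edges)
```

Deleting the edge `2 -> 3` would leave the cycle `0 -> 1 -> 2 -> 0` with no route to a flip state, the kind of non-escapable circuit that Assumption 3 excludes.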
Proof of Theorem 4.7.
Consider the Markov chain obtained by starting in $(\xi, \vec{\theta})$ and evolving according to the kernel $P_i$. Moreover, assume without loss of generality $\theta_i = +1$, and denote by

$V_i^+(\xi) = \bigcup_{m \in \mathbb{N}} \operatorname{supp} P_i^m\big((\xi, \vec{\theta}), (\cdot, \vec{\theta})\big)$

the set of states which can be reached from $\xi$ in a finite number of steps when evolving according to the kernel $P_i$ without changing the sign of the velocity $\theta_i = +1$. This set of reachable vertices $V_i^+(\xi)$ induces a subgraph of $G_i^+$ which we denote by $G_i^+(\xi) = (V_i^+(\xi), E_i^+(\xi))$, where $E_i^+(\xi) := E_i^+ \cap \big(V_i^+(\xi) \times V_i^+(\xi)\big)$.

Let Assumption 3 be violated, that is, for a certain $i \in \{1, \dots, n\}$ the corresponding graph $G_i^+$ contains a non-escapable circuit. If $\xi$ is contained in this circuit, then $G_i^+(\xi)$ coincides with this closed circuit and, in particular, $P_i\big((\xi', \vec{\theta}), (\xi', R_i(\vec{\theta}))\big) = 0$ for every $\xi' \in V_i^+(\xi)$. Thus, it follows immediately that Assumption 2 is also violated. This shows that Assumption 2 implies Assumption 3.

In order to show that Assumption 3 together with Assumption 1 implies Assumption 2, it suffices to show that if Assumption 3 and Assumption 1 hold, then there is always at least one state/node, say $\tilde{\xi}$, in the set of reachable vertices $V_i^+(\xi)$ for which there is a positive probability of flipping the velocity component $\theta_i$ (in the sense that $P_i\big((\tilde{\xi}, \vec{\theta}), (\tilde{\xi}, R_i(\vec{\theta}))\big) > 0$). We treat the two cases where $G_i^+(\xi)$ either does or does not contain a non-escapable circuit separately.

If the graph $G_i^+(\xi)$ contains a non-escapable circuit, then the existence of such a state $\tilde{\xi}$ is guaranteed by Assumption 3.

If the graph $G_i^+(\xi)$ does not contain a non-escapable circuit, the existence of such a state $\tilde{\xi}$ can be easily shown using the fact that the probability measure $\pi_i(\xi', \vec{\theta}') \propto \widetilde{\omega}_i(\xi')\, \tilde{\pi}(\xi')$ is invariant under $P_i$ (see Remark 4.3): there is at least one state $\xi^\star \in V_i^+(\xi)$ which is not part of a circuit (otherwise $G_i^+(\xi)$ would be a non-escapable circuit). If $P_i\big((\xi', \vec{\theta}), (\xi', R_i(\vec{\theta}))\big) = 0$ for all $\xi' \in V_i^+(\xi) \supset V_i^+(\xi^\star)$, then this state is transient, which is in direct contradiction to $\pi_i(\xi^\star, \vec{\theta}) > 0$. Consequently, there must be at least one state in $V_i^+(\xi)$ for which the probability of flipping the $i$-th velocity component is positive. □