A Fast MCMC for the Uniform Sampling of Binary Matrices with Fixed Margins
Guanyang Wang∗

June 21, 2019
Abstract
Uniform sampling of binary matrices with fixed margins is an important and difficult problem in statistics, computer science, ecology and other fields. The well-known swap algorithm can be inefficient when the matrix is large or when the matrix is very sparse or very dense. Here we propose the Rectangle Loop algorithm, a Markov chain Monte Carlo algorithm that samples binary matrices with fixed margins uniformly. Theoretically, the Rectangle Loop algorithm dominates the swap algorithm in Peskun's order. Empirical studies also demonstrate that the Rectangle Loop algorithm is remarkably more efficient than the swap algorithm.
The problem of sampling binary matrices with fixed row and column sums has attracted much attention in numerical ecology. In ecological studies, the binary matrix is called an occurrence matrix. Rows usually correspond to species and columns to locations. For example, the binary matrix shown in Table 1 is known as the "Darwin's Finch" dataset, which comes from Darwin's studies of the finches on the Galapagos islands (an archipelago in the East Pacific). The matrix represents the presence/absence of 13 species of finches on 17 islands. A "1" or "0" in entry (i, j) indicates the presence or absence of species i on island j. It is clear from Table 1 that some pairs of species tend to occur together (for example, species 9 and 10) while some other pairs tend to be disjoint. Therefore, it is of interest to investigate whether cooperation/competition influences the distribution of species on islands, or whether the patterns found arise just by chance. Assuming different species have independent distributions on the islands, the observed binary matrix is simply a random sample from the uniform distribution over all binary matrices with the same margins. Table 2 gives an example of all configurations of 3 × 3 binary matrices with [2, 1, 1] as both row and column sums. Ideally, if we could list all the binary matrices of arbitrary size, then we could compare the pattern found in the observed matrix with the others, to conclude whether the observed matrix is simply by chance.

∗ Guanyang Wang is with the Department of Mathematics, Stanford University. Email: guanyang@stanford.edu.

Table 1: Occurrence matrix of the finches on the Galapagos islands.

                            Island
Finch   A B C D E F G H I J K L M N O P Q
  1     0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
  2     1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0
  3     1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0
  4     0 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1
  5     1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 0
  6     0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
  7     0 0 1 1 1 1 1 1 1 0 0 1 0 1 1 0 0
  8     0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
  9     0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0
 10     0 0 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0
 11     0 0 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0
 12     0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 13     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Table 2: All possible 3 × 3 binary matrices with [2, 1, 1] as both row and column sums, labeled A–E (A denotes the configuration that is swappable with each of the other four):

      A          B          C          D          E
    0 1 1      1 1 0      1 1 0      1 0 1      1 0 1
    1 0 0      1 0 0      0 0 1      1 0 0      0 1 0
    1 0 0      0 0 1      1 0 0      0 1 0      1 0 0

However, enumerating matrices with fixed margins is often impractical both theoretically and computationally, even for matrices of moderate size. Therefore, sampling such random matrices becomes the natural choice.

The problem of sampling such matrices also occurs in many other fields, under different names. For example, an equivalent formulation is uniformly sampling undirected bipartite graphs with given vertex degrees. A bipartite graph G = (U, V, E) is a graph whose vertices are divided into two disjoint sets U = {u_1, ..., u_m} and V = {v_1, ..., v_n}; E is called the edge set, where every edge connects one vertex in U to one in V. The binary matrix M = (m_{i,j})_{m×n} is often called the bi-adjacency matrix of G and is defined by

    m_{i,j} = 1 if there is an edge connecting u_i and v_j, and m_{i,j} = 0 otherwise.

Bipartite graphs are often used in network studies to model the interaction between two types of objects, for example customers and products, and in network analysis it is often required to sample graphs uniformly with a preserved degree sequence. Throughout this paper, we will use the term 'binary matrix' instead of 'bipartite graph' to avoid confusion, although they are equivalent.

The algorithms for sampling binary matrices with fixed margins are divided into two classes. The first class relies on rejection sampling or importance sampling techniques; see [Sni91], [MP04], [CDHL05], [HM13], [HJ96] for examples. Importance sampling usually draws samples from a non-uniform distribution, but these samples can be used to construct estimators of quantities of interest, such as the number of binary matrices with given margins. Chen et al. [CDHL05] introduced a sequential importance sampling (SIS) scheme to test the hypothesis mentioned at the beginning of this paper on the "Darwin's Finch" dataset. The second class falls into the Markov chain Monte Carlo (MCMC) category and will be our main focus in this paper.
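The correspondence between a bipartite graph and its bi-adjacency matrix is mechanical. The following minimal Python sketch makes it concrete (the function name and the edge-list representation are our own, not from the paper):

```python
def biadjacency(m, n, edges):
    """Bi-adjacency matrix of a bipartite graph with parts
    u_1..u_m and v_1..v_n, given edges as 0-indexed (i, j) pairs:
    entry (i, j) is 1 iff u_{i+1} and v_{j+1} are connected."""
    M = [[0] * n for _ in range(m)]
    for i, j in edges:
        M[i][j] = 1
    return M
```

Row sums of the result are the degrees of the U-vertices, and column sums are the degrees of the V-vertices, which is why sampling matrices with fixed margins and sampling bipartite graphs with fixed degrees are the same problem.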
The well-known "swap algorithm" has been used for decades. To the author's best knowledge, it was first introduced by Besag and Clifford [BC89] in 1989 to solve a statistical testing problem, and was formally proposed and analysed by Rao et al. [RJB96] and Kannan et al. [KTV99] in the 1990s. A similar question is to sample matrices with non-negative integer entries and fixed row and column sums; Diaconis and Gangolli proposed a random walk Metropolis algorithm for this problem [DG95], and many variations and extensions of this algorithm are described by Diaconis and Sturmfels [DS+98]. The swap algorithm performs at most a single swap in each iteration, but when the matrix is large, or is mostly filled (or unfilled), its efficiency can be relatively low. In 2008, Verhelst [Ver08] proposed a new MCMC algorithm based on the idea of performing multiple swaps per iteration. In 2014, Strona et al. [SNB+14] introduced the "Curveball algorithm", which uses a 'fair trade' operation to replace the 'swap' operation in the swap algorithm, aiming for faster mixing. The mathematical formulation of the Curveball algorithm is equivalent to Verhelst's algorithm, but with a different implementation and reasoning. A nice survey and numerical comparisons of the existing algorithms can be found in a recent dissertation [Rec18]. The class of 'multiple swaps' algorithms tends to improve the mixing time empirically. However, each step of a 'multiple swaps' algorithm is slower than a step of the classical swap algorithm. Meanwhile, it is hard to compare the 'multiple swaps' algorithms with the classical swap algorithm theoretically, as the corresponding Markov chains have complicated behaviors and are therefore hard to analyze mathematically. The only existing result can be found in [CK18].

In this paper, we introduce a novel algorithm called the Rectangle Loop algorithm. The algorithm is based on the classical swap algorithm, with a careful utilization of the matrix structure given by the margins. It is easy to implement, and we prove that the resulting Markov chain dominates the chain used in the swap algorithm in the sense of Peskun's partial ordering [Pes73]. Section 2 reviews the swap algorithm and the Curveball algorithm, including the details of both algorithms and a discussion. In Section 3 we introduce our new algorithm, the Rectangle Loop algorithm. Section 4 proves the theoretical properties of the Rectangle Loop algorithm. Section 5 gives numerical results.
Existing Methods
The swap algorithm, or equivalently the swap chain, is based on the idea of swapping checkerboard units. Here a checkerboard unit is a two by two matrix with one of the following forms:

    (1 0)        (0 1)
    (0 1)   or   (1 0) .

A swap means changing one checkerboard unit into the other. Starting from an initial matrix, one chooses two rows and two columns uniformly at random among all rows and columns. If the resulting 2 × 2 submatrix is a checkerboard unit, it is swapped; otherwise the matrix is left unchanged. The procedure is summarized in Algorithm 1.

Algorithm 1
Swap Algorithm
Input: initial binary matrix A_0, number of iterations T
for t = 1, ..., T do
    Choose two distinct rows and two distinct columns uniformly at random
    If the corresponding 2 × 2 submatrix of A_{t-1} is a checkerboard unit, i.e.
        (1 0)      (0 1)
        (0 1)  or  (1 0) ,
    swap the submatrix to obtain A_t
    Otherwise A_t ← A_{t-1}
end for

The swap algorithm is a Metropolis-type Markov chain Monte Carlo algorithm which converges to the uniform distribution.
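One iteration of Algorithm 1 takes only a few lines of code. The following is a minimal Python sketch (the function name and the list-of-lists matrix representation are our own choices, not from the paper):

```python
import random

def swap_step(A):
    """One iteration of the swap algorithm on a 0/1 matrix A
    (list of lists), modified in place.

    Two distinct rows and two distinct columns are chosen uniformly
    at random; if the induced 2x2 submatrix is a checkerboard unit,
    it is swapped.  Returns True iff a swap occurred."""
    m, n = len(A), len(A[0])
    i1, i2 = random.sample(range(m), 2)
    j1, j2 = random.sample(range(n), 2)
    a, b = A[i1][j1], A[i1][j2]
    c, d = A[i2][j1], A[i2][j2]
    # The two checkerboard units are (1 0 / 0 1) and (0 1 / 1 0).
    if (a, b, c, d) in {(1, 0, 0, 1), (0, 1, 1, 0)}:
        A[i1][j1], A[i1][j2] = b, a
        A[i2][j1], A[i2][j2] = d, c
        return True
    return False
```

Each call preserves every row and column sum, since a swap only permutes entries within the two chosen rows and columns.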
The swap algorithm can often be inefficient. Taking the Darwin's Finch data as an example, there are C(13,2) · C(17,2) = 78 · 136 = 10608 submatrices of size 2 × 2, but only about 3% of them are swappable. This means a very large T (number of iterations) is required to ensure the generated sample is close to uniformly distributed. The Curveball algorithm provides another solution. The Curveball algorithm uses a 'trade' instead of a 'swap' operation in each iteration. Steps 3-5 of Algorithm 2 illustrate trading: for two rows a and b, the columns indexed by V are traded against the remaining tradeable columns (those in S_{a-b} ∪ S_{b-a} defined below), preserving the row and column sums. Though seemingly complicated, there is a very intuitive explanation of the Curveball algorithm; we refer the readers to [SNB+14] for detailed illustrations.

Algorithm 2
Curveball Algorithm
Input: initial binary matrix A_0, number of iterations T
for t = 1, ..., T do
    Choose two distinct rows a, b uniformly at random
    Determine the two disjoint sets
        S_{a-b} = {k : A_{t-1}(a, k) = 1, A_{t-1}(b, k) = 0},
        S_{b-a} = {l : A_{t-1}(a, l) = 0, A_{t-1}(b, l) = 1}
    Choose a subset V ⊂ S_{a-b} ∪ S_{b-a} with |V| = |S_{a-b}| uniformly at random
    Set A_t = A_{t-1} except for rows a and b:
        For row a: A_t(a, l) = 1 if l ∈ V; A_t(a, l) = 0 if l ∈ (S_{a-b} ∪ S_{b-a}) \ V; A_t(a, l) = A_{t-1}(a, l) otherwise
        For row b: A_t(b, k) = 1 if k ∈ (S_{a-b} ∪ S_{b-a}) \ V; A_t(b, k) = 0 if k ∈ V; A_t(b, k) = A_{t-1}(b, k) otherwise
end for

Rectangle Loop Algorithm
The swap algorithm is proven to converge to the uniform distribution. However, getting stuck at the same configuration is inefficient, and thus the convergence can be very slow. For example, numerical experiments suggest that one would expect more than 30 iterations before each successful swap on the 'Darwin's Finch' dataset. If the randomly chosen row is mostly filled, such as row 1 in Table 1, the two randomly chosen entries in this row would most likely be [1, 1], but the row of a 'checkerboard unit' has to be either [1, 0] or [0, 1].

The Rectangle Loop algorithm is designed to avoid such waste. Consider a toy example: a 5 × 5 binary matrix with row names R_1, ..., R_5 and column names C_1, ..., C_5. In each step, we choose one row and one column uniformly at random (Step A). Suppose R_1 and C_1 are chosen, with corresponding entry 1 (the red number in the top middle plot of Figure 1). Then we randomly choose a 0 among all the 0s in R_1 (Step B). Since there is only one 0 in R_1, which is at location C_4, this is our only choice. Next, we scan through all the entries in the same column as the 0 just chosen (C_4) and randomly choose a 1 among all the 1s (Step C); suppose the chosen 1 is at (R_i, C_4). Now the three locations (R_1, C_1), (R_1, C_4), (R_i, C_4) together determine the fourth one, (R_i, C_1), making the four entries a rectangle (Step D). If the fourth entry equals 0, then we swap the submatrix as in the swap method (Step E). Otherwise the fourth entry equals 1 and the original matrix is not changed. After Steps A-E are iterated many times, the resulting randomized matrices are used as representatives of the uniform distribution over matrices with fixed margins.

The main difference between the Rectangle Loop algorithm and the swap algorithm is the sampling scheme. The Rectangle Loop algorithm performs 'conditional sampling', making it more efficient than the swap method, which does 'unconditional choosing'. For example, suppose both the swap method and the Rectangle Loop algorithm have chosen R_1 and C_1, an entry with value 1, and R_1 has only one 0, which is in column 4. For the swap algorithm, the probability of correctly choosing C_4 is only 1/4, as it chooses uniformly among the remaining columns. The Rectangle Loop algorithm, however, chooses C_4 with probability 1, as its mechanism guarantees that we only sample from the zero entries. Therefore it significantly increases the swapping probability, leading to faster convergence than the swap chain. The details of the Rectangle Loop algorithm are described in Algorithm 3.
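The per-iteration logic of Steps A-E translates directly into code. The following is a minimal Python sketch of one iteration (the function name and the guards for degenerate rows/columns are our own additions):

```python
import random

def rectangle_loop_step(A):
    """One iteration of the Rectangle Loop algorithm on a 0/1 matrix A
    (list of lists), modified in place.

    From a uniformly chosen entry (r1, c1): if it is a 1, pick a random
    0 in its row and then a random 1 in that 0's column; if it is a 0,
    pick a random 1 in its column and then a random 0 in that 1's row.
    If the four corners form a checkerboard unit, swap it.
    Returns True iff a swap occurred."""
    m, n = len(A), len(A[0])
    r1, c1 = random.randrange(m), random.randrange(n)
    if A[r1][c1] == 1:
        zeros = [j for j in range(n) if A[r1][j] == 0]
        if not zeros:                      # fully filled row: no move
            return False
        c2 = random.choice(zeros)
        ones = [i for i in range(m) if A[i][c2] == 1]
        if not ones:
            return False
        r2 = random.choice(ones)
    else:
        ones = [i for i in range(m) if A[i][c1] == 1]
        if not ones:                       # empty column: no move
            return False
        r2 = random.choice(ones)
        zeros = [j for j in range(n) if A[r2][j] == 0]
        if not zeros:
            return False
        c2 = random.choice(zeros)
    corners = (A[r1][c1], A[r1][c2], A[r2][c1], A[r2][c2])
    if corners in {(1, 0, 0, 1), (0, 1, 1, 0)}:   # checkerboard unit
        A[r1][c1], A[r1][c2] = A[r1][c2], A[r1][c1]
        A[r2][c1], A[r2][c2] = A[r2][c2], A[r2][c1]
        return True
    return False
```

As with the swap step, every move preserves the row and column sums; the conditional sampling only changes how often a swappable rectangle is found.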
It is noteworthy that after finding a 0 entry, we sample a 1 within the same column as the 0, and after finding a 1 entry, we sample a 0 within the same row as the 1. This 'symmetric' design ensures that the algorithm converges to the correct distribution, as will be proved in Section 4. The sampled entries in each iteration always form a rectangle, which is where the name 'Rectangle Loop algorithm' comes from.

Given row sums r = (r_1, r_2, ..., r_m) and column sums c = (c_1, c_2, ..., c_n), we define Σ_{r,c} to be the set of all binary matrices with row sums r and column sums c. The sufficient and necessary

Figure 1: An illustration of the Rectangle Loop algorithm.

Algorithm 3
Rectangle Loop Algorithm
Input: initial binary matrix A_0, number of iterations T
for t = 1, ..., T do
    Choose one row and one column (r_1, c_1) uniformly at random
    if A_{t-1}(r_1, c_1) = 1 then
        Choose one column c_2 uniformly at random among all the 0 entries in row r_1
        Choose one row r_2 uniformly at random among all the 1 entries in column c_2
    else (A_{t-1}(r_1, c_1) = 0)
        Choose one row r_2 uniformly at random among all the 1 entries in column c_1
        Choose one column c_2 uniformly at random among all the 0 entries in row r_2
    end if
    if the 2 × 2 submatrix extracted from rows r_1, r_2 and columns c_1, c_2 is a 'checkerboard unit' then
        Swap the submatrix to obtain A_t
    else
        A_t ← A_{t-1}
    end if
end for

condition for Σ_{r,c} to be non-empty is given by Gale [G+57] and Ryser [Rys09]. We call two matrices A, B ∈ Σ_{r,c} 'swappable' if one can be transformed into the other via one step of the swap algorithm; equivalently, A and B differ only in a 2 × 2 checkerboard unit. Throughout, we assume 0 < r_i < n and 0 < c_j < m for all 1 ≤ i ≤ m, 1 ≤ j ≤ n, as otherwise we could simply delete the degenerate row/column. The following theorem characterizes the limit distribution and transition probability of the swap chain.

Theorem 1.
Given r, c and an initial matrix A ∈ Σ_{r,c}, the swap algorithm defines a Metropolis-type Markov chain with stationary distribution Unif(Σ_{r,c}), transition kernel

    P(A, B) = 1 / (C(m,2) C(n,2))          if A and B are swappable,
    P(A, A) = 1 − s(A) / (C(m,2) C(n,2)),  where s(A) = #{C : C and A are swappable},
    P(A, B) = 0                            otherwise,

and acceptance probability 1.

Proof. When A and B are swappable, there exist two rows i_1, i_2 and two columns j_1, j_2 such that A and B differ only in the 2 × 2 submatrix formed by rows i_1, i_2 and columns j_1, j_2. Therefore the probability of swapping A to B equals

    1 / (C(m,2) C(n,2)).

Recall that in the setting of Metropolis-Hastings, the acceptance probability from A to B is given by

    min{1, [π(B) P(B, A)] / [π(A) P(A, B)]}.

Notice that here the stationary distribution π is designed to be Unif(Σ_{r,c}) and P(A, B) = P(B, A) = 1/(C(m,2) C(n,2)). Hence it is clear that the swap algorithm is a Metropolis-type Markov chain with Unif(Σ_{r,c}) as stationary distribution and acceptance probability 1, which justifies the correctness of the swap algorithm.

The key point in the swap algorithm is symmetry. When two different states A, B are swappable, the associated transition probability is symmetric, i.e., P(A, B) = P(B, A); this ensures the chain has acceptance probability 1.

In order to compare the efficiency of different Markov kernels with the same stationary distribution, Peskun [Pes73] introduced the following partial ordering. Let P_1, P_2 be two Markov transition kernels on the same state space S with the same stationary distribution π. Then P_1 dominates P_2 off the diagonal, written P_1 ⪰ P_2, if P_1(x, A) ≥ P_2(x, A) for all x ∈ S and all measurable A with x ∉ A. When the state space is finite, as in our case, P_1 ⪰ P_2 if and only if every off-diagonal entry of P_1 is greater than or equal to the corresponding off-diagonal entry of P_2. This indicates that P_1 has lower probability of getting stuck in the same state, and explores the state space in a more efficient way. The following theorem shows that the Rectangle Loop algorithm also has the uniform distribution as its stationary distribution, and that the corresponding chain dominates the swap chain off the diagonal. For simplicity, we will use P_s and P_r to denote the transition kernels of the swap chain and the Rectangle Loop chain respectively, omitting their dependency on r, c, and Unif(Σ_{r,c}).

Theorem 2.
Given r, c and an initial matrix A ∈ Σ_{r,c}, the Rectangle Loop algorithm defines a Metropolis-type Markov chain with stationary distribution Unif(Σ_{r,c}). The transition kernel P_r dominates P_s off the diagonal.

Proof. Given any two swappable configurations A and B, we aim to show P_r(A, B) = P_r(B, A) ≥ P_s(A, B). As A, B are swappable, there exist two rows i_1, i_2 and two columns j_1, j_2 such that A and B differ only in the 2 × 2 submatrix formed by rows i_1, i_2 and columns j_1, j_2. Without loss of generality, we assume the checkerboard unit corresponding to A has the form

    (1 0)
    (0 1) ,

i.e. A(i_1, j_1) = A(i_2, j_2) = 1 and A(i_1, j_2) = A(i_2, j_1) = 0. Notice that there are four vertices of the 2 × 2 'checkerboard unit', and the probability of transforming A to B equals the sum of four probabilities, each one corresponding to choosing one specific vertex as the starting point.

Figure 2 illustrates the calculation for one path, starting from the vertex (i_1, j_1). The probability of choosing row i_1 and column j_1 is 1/(mn). Then one chooses a 0 among all the 0s in row i_1, and there are n − r_{i_1} of them; therefore the probability of choosing column j_2 equals 1/(n − r_{i_1}). Similarly, after choosing j_2, one chooses a 1 among all the 1s in column j_2, and there are c_{j_2} of them; hence the probability of choosing row i_2 equals 1/c_{j_2}. The fourth entry is fixed after determining the first three, so the last step has probability 1. Multiplying these probabilities together, the probability of transforming A to B starting from (i_1, j_1) equals

    (1/(mn)) · 1/(n − r_{i_1}) · 1/c_{j_2}.

The probability of transforming A to B with each of the other starting vertices can be calculated accordingly. It turns out that P_r(A, B) can be written as the following sum:

    P_r(A, B) = (1/(mn)) [ 1/((n − r_{i_1}) c_{j_2}) + 1/((n − r_{i_2}) c_{j_1}) + 1/(c_{j_2} (n − r_{i_2})) + 1/(c_{j_1} (n − r_{i_1})) ].

Following the same strategy, P_r(B, A) can also be calculated; Figure 3 illustrates the calculation of one path of P_r(B, A), starting from (i_1, j_2):

    P_r(B, A) = (1/(mn)) [ 1/((n − r_{i_1}) c_{j_1}) + 1/((n − r_{i_2}) c_{j_2}) + 1/(c_{j_1} (n − r_{i_2})) + 1/(c_{j_2} (n − r_{i_1})) ].

After matching all the terms of P_r(B, A) with those of P_r(A, B), we conclude that P_r(B, A) = P_r(A, B).

Figure 2: An illustration of calculating P_r(A, B) for a single path, starting from vertex (i_1, j_1).

Figure 3: An illustration of calculating P_r(B, A) for a single path, starting from vertex (i_1, j_2).
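The transition probability of the Rectangle Loop chain between two swappable configurations is a sum of four path probabilities that depends only on the margins and the chosen rows and columns, which makes its symmetry and its lower bound easy to check numerically. A small self-contained sketch (function names are our own; `p_rect` implements the four-term closed form for a swap on rows i1, i2 and columns j1, j2 of an m × n matrix with row sums r and column sums c):

```python
from math import comb

def p_swap(m, n):
    # Swap chain: probability of proposing one particular swappable pair.
    return 1 / (comb(m, 2) * comb(n, 2))

def p_rect(m, n, r, c, i1, i2, j1, j2):
    """Sum of the four path probabilities of the Rectangle Loop chain,
    one per starting corner of the rectangle (0-indexed rows/columns)."""
    return (1 / (m * n)) * (
        1 / ((n - r[i1]) * c[j2]) + 1 / ((n - r[i2]) * c[j1])
        + 1 / (c[j2] * (n - r[i2])) + 1 / (c[j1] * (n - r[i1]))
    )
```

Because the expression depends only on the margins, exchanging the roles of the two configurations (equivalently, relabeling i1 with i2 and j1 with j2) leaves it unchanged, and when 1 ≤ r_i ≤ n − 1 and 1 ≤ c_j ≤ m − 1 each of the four terms is at least 1/(m(m−1)n(n−1)), giving the dominance over `p_swap`.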
This justifies that the Rectangle Loop algorithm has Unif(Σ_{r,c}) as its stationary distribution.

To show P_r(A, B) ≥ P_s(A, B), notice that

    P_s(A, B) = 1 / (C(m,2) C(n,2)) = 4 / (m(m−1) n(n−1)),

while P_r(A, B) is the sum of four terms of the form (1/(mn)) · 1/((n − r_i) c_j). Since 1 ≤ r_i ≤ n − 1 and 1 ≤ c_j ≤ m − 1, each such term satisfies

    (1/(mn)) · 1/((n − r_i) c_j) ≥ (1/(mn)) · 1/((n − 1)(m − 1)) = 1 / (m(m−1) n(n−1)).

Therefore we conclude that P_r(A, B) ≥ P_s(A, B) for any swappable A, B. This implies P_r ⪰ P_s: the Rectangle Loop algorithm explores the state space in a more efficient way than the swap algorithm.

Now we use the example from [SNB+14] and [MP04] to compare the existing algorithms with the Rectangle Loop algorithm. The example below is concrete: the transition matrix can be calculated explicitly and convergence can be assessed analytically. The five matrices shown in Table 2 are all possible configurations of 3 × 3 binary matrices with [2, 1, 1] as both row and column sums. The transition matrices for the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm are shown in Table 3.
With the five configurations ordered A, B, C, D, E as in Table 2 (A being the configuration swappable with all four others), the transition matrices are:

    Swap                                Curveball                           Rectangle Loop
          A    B    C    D    E              A    B    C    D    E               A    B    C    D    E
    A   5/9  1/9  1/9  1/9  1/9        A   1/3  1/6  1/6  1/6  1/6         A    0   1/4  1/4  1/4  1/4
    B   1/9  2/3  1/9  1/9   0         B   1/6  1/3  1/3  1/6   0          B   1/4  1/4  1/6  1/3   0
    C   1/9  1/9  2/3   0   1/9        C   1/6  1/3  1/3   0   1/6         C   1/4  1/6  1/4   0   1/3
    D   1/9  1/9   0   2/3  1/9        D   1/6  1/6   0   1/3  1/3         D   1/4  1/3   0   1/4  1/6
    E   1/9   0   1/9  1/9  2/3        E   1/6   0   1/6  1/3  1/3         E   1/4   0   1/3  1/6  1/4
Table 3: Transition matrices for the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm.

Figure 4 shows the comparison of the three algorithms. Here we measure the distance between the k-step transition kernel P^k and the stationary distribution π by the worst-case total variation distance

    max_{A ∈ Σ_{r,c}} ||P^k(A, ·) − π||_TV = (1/2) max_{A ∈ Σ_{r,c}} Σ_{B ∈ Σ_{r,c}} |P^k(A, B) − π(B)|,

where k denotes the power. All three algorithms converge exponentially to the uniform distribution, but the Rectangle Loop algorithm converges faster than the swap algorithm and the Curveball algorithm.

For larger matrices, it is infeasible to calculate the transition matrix theoretically. To provide empirical justification for the advantage of the Rectangle Loop algorithm, we designed the following experiment. For each p = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, a 100 × 100 matrix is generated, with each entry independently set to 1 with probability p. We ran each algorithm for 10000 iterations and collected the corresponding number of successful swaps, as shown in Table 4. When the filled portion p is small, the Rectangle Loop algorithm is extremely efficient, producing more than 73 times more swaps than the swap algorithm. For larger p, the advantage of the Rectangle Loop algorithm is reduced, but it is still very significant. Taking computation time into account, for p = 0.01 the Rectangle Loop algorithm is about 31 times more efficient than the swap algorithm, and even for p = 0.5 it is still about 4 times more efficient.

We have also used the perturbation score suggested by Strona et al. [SNB+14] to assess convergence for both algorithms. The perturbation score of a matrix is defined as the fraction of cells differing from the corresponding cells of the initial matrix. It takes a number of iterations for each algorithm to stabilize around its expectation. Figure 5 shows that the Rectangle Loop algorithm takes fewer iterations and less time to stabilize, suggesting a faster mixing than the swap algorithm.
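The 5-state example above can be reproduced by brute force: enumerate Σ_{r,c} for r = c = (2, 1, 1), build the swap chain's transition matrix, and evaluate the worst-case total variation distance. A self-contained numpy sketch (variable and function names are our own):

```python
from itertools import product
import numpy as np

r, c = (2, 1, 1), (2, 1, 1)

# Enumerate all 3x3 binary matrices (as flat 9-tuples) with the margins.
states = [M for M in product((0, 1), repeat=9)
          if all(sum(M[3 * i:3 * i + 3]) == r[i] for i in range(3))
          and all(sum(M[j::3]) == c[j] for j in range(3))]
N = len(states)  # five configurations

def swappable(M1, M2):
    # True iff M1 and M2 differ exactly in a 2x2 checkerboard unit.
    diff = [k for k in range(9) if M1[k] != M2[k]]
    return (len(diff) == 4
            and len({k // 3 for k in diff}) == 2
            and len({k % 3 for k in diff}) == 2)

# Swap chain: each swappable neighbour is reached with probability
# 1/(C(3,2)*C(3,2)) = 1/9; the remaining mass stays on the diagonal.
P = np.zeros((N, N))
for x in range(N):
    for y in range(N):
        if x != y and swappable(states[x], states[y]):
            P[x, y] = 1 / 9
    P[x, x] = 1 - P[x].sum()

pi = np.full(N, 1 / N)

def tv_dist(P, k):
    # Worst-case total variation distance of the k-step kernel from uniform.
    Pk = np.linalg.matrix_power(P, k)
    return max(0.5 * np.abs(Pk[x] - pi).sum() for x in range(N))
```

The matrix produced this way is doubly stochastic, confirming that the uniform distribution is stationary, and `tv_dist` decays geometrically in k, matching the exponential convergence seen in Figure 4.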
Going back to the 'Darwin's Finch' dataset, we use the test statistic S̄ suggested by Roberts and Stone [RS90] to compare the three algorithms. S̄ is defined by

    S̄(A) = (1 / (m(m−1))) Σ_{i ≠ j} s_{ij},

where m is the number of rows of matrix A and S = (s_{ij}) = A Aᵀ. For the finch data, S̄ = 45.03.
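The statistic is a direct average of the off-diagonal entries of A Aᵀ, where s_{ij} counts the locations shared by rows i and j. A short numpy-based sketch (the function name is our own):

```python
import numpy as np

def s_bar(A):
    """Roberts-Stone statistic: average off-diagonal entry of S = A A^T,
    where s_ij counts the columns (islands) shared by rows i and j."""
    A = np.asarray(A)
    m = A.shape[0]
    S = A @ A.T
    return (S.sum() - np.trace(S)) / (m * (m - 1))
```

Under the null hypothesis of no interaction, the observed value is compared against the distribution of S̄ over uniformly sampled matrices with the same margins, which is exactly what the MCMC samplers provide.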
Figure 4: Comparison between the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm. Top: total variation distance against the power of the transition matrix. Bottom: logarithm of the total variation distance against the power of the transition matrix.
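The perturbation score used in the convergence comparison is simple to compute; a minimal sketch (the function name is our own):

```python
import numpy as np

def perturbation_score(A0, A):
    """Fraction of cells of A that differ from the initial matrix A0."""
    A0, A = np.asarray(A0), np.asarray(A)
    return float(np.mean(A0 != A))
```

Tracking this score along the chain gives a cheap, if rough, diagnostic: once it stabilizes around its expectation, the chain has largely forgotten its starting matrix.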
Figure 5: Comparison between the swap algorithm and the Rectangle Loop algorithm in mixing 100 × 100 matrices. The left two subplots show the relationship between time and perturbation score for different filled portions; the right two subplots show the relationship between (every 100) iterations and perturbation score. The Rectangle Loop algorithm is represented by blue curves and the swap algorithm by red curves.

    Method           Filled portion   Number of swaps   Time per swap (s)
    Rectangle Loop        1%                586            1.?×10^-?
    Swap                  1%                  8            3.?×10^-?
    Rectangle Loop        5%                977            5.?×10^-?
    Swap                  5%                 42            3.?×10^-?
    Rectangle Loop       10%               1838            3.?×10^-?
    Swap                 10%                156            1.?×10^-?
    Rectangle Loop       20%               3271            2.?×10^-?
    Swap                 20%                509            5.?×10^-?
    Rectangle Loop       30%               4222            2.?×10^-?
    Swap                 30%                803            5.?×10^-?
    Rectangle Loop       40%               4794            1.?×10^-?
    Swap                 40%               1160            4.?×10^-?
    Rectangle Loop       50%               5080            1.?×10^-?
    Swap                 50%               1271            5.?×10^-?

Table 4: Comparison between the swap algorithm and the Rectangle Loop algorithm. Each algorithm is run for 10000 iterations on 100 × 100 matrices with different filled portions. The third column records the number of successful swaps among the 10000 iterations, and the last column records the average time per swap.

We implemented the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm on the same data for 20000 iterations, using the running average as an estimator for E(S̄). The results are shown in Figure 6. After 20000 iterations, the Rectangle Loop algorithm gives an estimate of 42.135, compared with 42.126 for the swap algorithm. In terms of both the S̄ estimate and its standard deviation, the Rectangle Loop algorithm stabilizes much earlier than the swap algorithm, indicating a faster mixing.
Figure 6: Comparison between the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm on the finch data. The left subplot shows the running average of S̄ against the number of iterations; the right subplot shows the standard deviation of S̄ against the number of iterations. Blue, red and green curves represent the Rectangle Loop, swap and Curveball algorithms respectively.

Discussion
There is a growing tendency to study the behavior of binary matrices with fixed margins in numerous scientific fields, ranging from mathematics to natural science to social science. For example, mathematicians and computer scientists are interested in the total number of configurations with given margin sums. Ecologists use the so-called occurrence matrix to model the presence/absence of species in different locations. Biologists use binary matrices to model neuronal networks. Social scientists use binary matrices to study social network features.

One of the central and difficult problems is uniformly sampling binary matrices with given margins. In this article we have developed the Rectangle Loop algorithm, which is efficient, intuitive and easy to implement. Theoretically, the algorithm is superior to the classical swap algorithm in Peskun's order. In practice, the Rectangle Loop algorithm is notably more efficient than the swap approach. For a fixed number of iterations, the Rectangle Loop algorithm produces 4 to 73 times more successful swaps than the swap algorithm. For a fixed amount of time, the Rectangle Loop algorithm still produces 4 to 31 times more successful swaps than the swap algorithm. This suggests the Rectangle Loop algorithm is efficient both statistically and computationally.

Many other problems remain. From a theoretical point of view, it is important to give sharp bounds on the convergence speed of a given Markov chain; however, giving a useful running time estimate is often challenging in practical problems. It would be very interesting if the swap algorithm, the Curveball algorithm and the Rectangle Loop algorithm could be investigated analytically. From an applied point of view, many factors influence the performance of the algorithms, such as the running time per step (the swap algorithm is the fastest, while the Curveball algorithm is the slowest), the initialization of the matrix, the size of the matrix, the ratio between the number of rows and columns, and the filled proportion. Our empirical studies suggest that all of these factors have a significant impact on the convergence speed of all the algorithms. It would be beneficial to carry out more numerical experiments, yielding a complete and comprehensive comparison between all the existing algorithms.

References

[BC89] Julian Besag and Peter Clifford. Generalized Monte Carlo significance tests.
Biometrika, 76(4):633–642, 1989.

[CDHL05] Yuguo Chen, Persi Diaconis, Susan P. Holmes, and Jun S. Liu. Sequential Monte Carlo methods for statistical analysis of tables. Journal of the American Statistical Association, 100(469):109–120, 2005.

[CK18] Corrie Jacobien Carstens and Pieter Kleer. Speeding up switch Markov chains for sampling bipartite graphs with given degree sequence. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[DG95] Persi Diaconis and Anil Gangolli. Rectangular arrays with fixed margins. In Discrete Probability and Algorithms, pages 15–41. Springer, 1995.

[DS+98] Persi Diaconis, Bernd Sturmfels, et al. Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics, 26(1):363–397, 1998.

[G+57] David Gale et al. A theorem on flows in networks. Pacific Journal of Mathematics, 7(2):1073–1082, 1957.

[HJ96] R. B. Holmes and L. K. Jones. On uniform generation of two-way tables with fixed margins and the conditional volume test of Diaconis and Efron. The Annals of Statistics, pages 64–68, 1996.

[HM13] Matthew T. Harrison and Jeffrey W. Miller. Importance sampling for weighted binary random matrices with specified margins. arXiv preprint arXiv:1301.3928, 2013.

[KTV99] Ravi Kannan, Prasad Tetali, and Santosh Vempala. Simple Markov-chain algorithms for generating bipartite graphs and tournaments. Random Structures & Algorithms, 14(4):293–308, 1999.

[MP04] István Miklós and János Podani. Randomization of presence–absence matrices: comments and new algorithms. Ecology, 85(1):86–92, 2004.

[Pes73] Peter H. Peskun. Optimum Monte-Carlo sampling using Markov chains. Biometrika, 60(3):607–612, 1973.

[Rec18] Steffen Rechner. Markov chain Monte Carlo algorithms for the uniform sampling of combinatorial objects. Dissertation, 2018.

[RJB96] A. Ramachandra Rao, Rabindranath Jana, and Suraj Bandyopadhyay. A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals. Sankhyā: The Indian Journal of Statistics, Series A, pages 225–242, 1996.

[RS90] Alan Roberts and Lewis Stone. Island-sharing by archipelago species. Oecologia, 83(4):560–567, 1990.

[Rys09] Herbert J. Ryser. Combinatorial properties of matrices of zeros and ones. In Classic Papers in Combinatorics, pages 269–275. Springer, 2009.

[SNB+14] Giovanni Strona, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and Jesus San-Miguel-Ayanz. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications, 5:4114, 2014.

[Sni91] Tom A. B. Snijders. Enumeration and simulation methods for 0–1 matrices with given marginals. Psychometrika, 56(3):397–417, 1991.

[Ver08] Norman D. Verhelst. An efficient MCMC algorithm to sample binary matrices with fixed marginals. Psychometrika, 73(4):705–728, 2008.