Many-Configuration Markov-Chain Monte Carlo
Fedor Šimkovic IV∗
CPHT, CNRS, École Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau, France
Collège de France, 11 place Marcelin Berthelot, 75005 Paris, France
Riccardo Rossi†
Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, New York 10010, USA and
Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
(Dated: February 11, 2021)

We propose a minimal generalization of the celebrated Markov-Chain Monte Carlo algorithm which allows for an arbitrary number of configurations to be visited at every Monte Carlo step. This is advantageous when a parallel computing machine is available, or when many biased configurations can be evaluated at little additional computational cost. As an example of the former case, we report a significant reduction of the thermalization time for the paradigmatic Sherrington-Kirkpatrick spin-glass model. For the latter case, we show that, by leveraging the exponential number of biased configurations automatically computed by Diagrammatic Monte Carlo, we can speed up computations in the Fermi-Hubbard model by two orders of magnitude.
I. INTRODUCTION
The Markov-Chain Monte Carlo method [1] (MCMC) is a generic algorithm that allows one to draw samples from a given unnormalized probability distribution. It has found applications in many areas of science, in particular in classical and quantum physics [2–4], and statistics [5]. The Markov-chain thermalization time is one of the critical properties of the MCMC algorithm [6]. Also known as the mixing time, it is the time it takes for the Markov chain to approach the steady-state distribution, which is sometimes called the thermal-equilibrium distribution in statistical physics. In the context of a Monte Carlo calculation, the thermalization time is the time one needs to wait before accumulating statistics. If it is larger than the total number of Monte Carlo steps, the MCMC technique is not applicable. Another critical property is the autocorrelation time, which is defined as the time it takes to draw two uncorrelated samples from the Markov chain after the thermalization time has passed. It can be reduced linearly by using a parallel computing machine running independent Markov chains, while the thermalization time cannot be reduced indefinitely with this approach. Introducing couplings between Markov chains has been shown to help reduce the thermalization time: one usually either considers chains at different temperatures [7, 8], or constructs unbiased estimators [9].
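As a concrete illustration of the autocorrelation time just defined (a sketch of ours, not part of the original text), the following Python function estimates the integrated autocorrelation time of a chain of measurements; the function name and the truncation rule (stopping at the first non-positive sample correlation) are our own choices:

```python
import numpy as np

def integrated_autocorrelation_time(samples, max_lag=None):
    """Estimate tau = 1 + 2 * sum_k rho_k from a 1D chain of measurements,
    truncating the sum at the first non-positive sample autocorrelation."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 10
    var = np.dot(x, x) / n
    tau = 1.0
    for lag in range(1, max_lag):
        rho = np.dot(x[:-lag], x[lag:]) / ((n - lag) * var)
        if rho <= 0.0:      # truncate: remaining correlations are noise
            break
        tau += 2.0 * rho
    return tau

# Independent samples give tau close to 1; correlated chains give tau >> 1.
rng = np.random.default_rng(0)
iid = rng.normal(size=20000)
print(integrated_autocorrelation_time(iid))  # close to 1
```

Running independent chains in parallel reduces the statistical error coming from this autocorrelation, but, as stated above, each chain must still thermalize on its own.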
Another strategy to reduce the thermalization time is the use of irreversible Markov chains [10, 11], a method that has recently been extended to allow for parallelization [12]. On the other hand, quasi-Monte Carlo techniques [13] do not have a thermalization time by construction, and have recently been successfully applied to quantum impurity problems [14], but their application to large configuration spaces has not yet been fully explored.

In some situations, one is able to generate many biased proposals for new configurations of the Markov chain with little or no additional computational effort. This is the case, for instance, with some variants of the Diagrammatic Monte Carlo technique, as they automatically generate an exponential number of configurations at each Monte Carlo step at no additional cost [15–20]. Here, the MCMC algorithm can only consider one configuration at a given Monte Carlo step, and there is no way to use the information of the multiple biased configurations that have been generated.

In this paper, we propose a minimal generalization of the Markov-Chain Monte Carlo algorithm, which we call Many-Configuration Markov-Chain Monte Carlo (MCMCMC), that allows one to consider multiple configurations at the same Monte Carlo step. This guarantees a straightforward parallelization of the computation in order to reduce the thermalization time, as well as allowing for the use of many biased configurations that might be available at a given Monte Carlo step.

The document is organized as follows: In Sec. II we formulate and motivate the problem which this work addresses; in Sec. III we present an intuitive picture of the MCMCMC technique by considering the case of two and three configurations explicitly; in Sec. IV we present benchmark results for the Sherrington-Kirkpatrick spin-glass model, showing how the use of a parallel machine can drastically reduce the thermalization time; in Sec.
V we provide numerical results for the Fermi-Hubbard model within Diagrammatic Monte Carlo, using a specifically designed version of the MCMCMC algorithm.

FIG. 1. Construction of the Markov chain in standard MCMC. The initial state of the Markov chain is c_0. The new proposed state is c_1; we depict this proposal process as a directed graph. Then, detailed balance is imposed to stochastically select the next state of the Markov chain.

II. PROBLEM STATEMENT
The MCMC method is based on the generation of a sequence of states, called a Markov chain, such that the transition probability between these states depends only on the current state. The goal of an MCMC algorithm is to build a Markov chain such that, for every initial state, the probability of a given state converges to an arbitrary "equilibrium" probability distribution P_eq in the long-time limit. The Markov chain is usually built by proposing a new state c_1 given the current state c_0 with a certain, arbitrary, proposal probability distribution, P_prp(c_1|c_0), which is easy to sample from; then, in the simplest form of MCMC, the proposed state is accepted or rejected according to the Metropolis rate satisfying detailed balance: min(1, P_eq(c_1) P_prp(c_0|c_1) / (P_eq(c_0) P_prp(c_1|c_0))).

While MCMC is an extremely general and powerful algorithm, there are situations where the sequential generation of new configurations leads to inefficiency. In the following, we will consider two such situations:

1. A parallel computing machine is available, and the thermalization time for the Markov chain is non-negligible;

2. It is possible to compute a large number of biased configurations with little or no additional computational effort.

In this paper, we present a solution to the following problem: What is a minimal modification of Markov-Chain Monte Carlo that allows one to take advantage of one (or both) of these possibilities?
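For reference, the standard single-configuration algorithm described above can be sketched in a few lines of Python. This is our own illustration (the toy target distribution and all names are ours, not from the paper); it uses exactly the Metropolis rate quoted above:

```python
import random

def metropolis_chain(p_eq, propose, p_prp, c0, n_steps, seed=0):
    """Standard single-configuration MCMC with the Metropolis rate
    min(1, P_eq(c1) P_prp(c0|c1) / (P_eq(c0) P_prp(c1|c0)))."""
    rng = random.Random(seed)
    chain, c = [c0], c0
    for _ in range(n_steps):
        c_new = propose(c, rng)
        ratio = (p_eq(c_new) * p_prp(c, c_new)) / (p_eq(c) * p_prp(c_new, c))
        if rng.random() < min(1.0, ratio):
            c = c_new                    # accept the proposal
        chain.append(c)                  # on rejection, the old state repeats
    return chain

# Toy example: sample the states 0..4 with unnormalized weights 1..5,
# using a symmetric nearest-neighbour proposal on a ring of five states.
weights = [1, 2, 3, 4, 5]
chain = metropolis_chain(
    p_eq=lambda c: weights[c],
    propose=lambda c, rng: (c + rng.choice([-1, 1])) % 5,
    p_prp=lambda c_new, c_old: 0.5,      # symmetric: cancels in the ratio
    c0=0,
    n_steps=200000,
)
print(chain.count(4) / len(chain))  # close to 5/15 ≈ 0.333
```

The sequential structure is explicit here: exactly one proposal is generated and at most one configuration is accepted per step, which is the limitation this paper removes.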
As already mentioned, the difficulty of MCMC in dealing with these situations can be traced back to its sequential definition: The Markov chain is built by considering only one new configuration at a given Monte Carlo step,
and there is no way to simultaneously use the information from multiple configurations. Even if multiple configurations can be generated, at most one of them can be accepted. In particular, in the presence of a non-negligible thermalization time, there is little advantage in using a parallel computing machine over a single-core machine, as any statistics accumulated before thermalization has completed cannot be used.

FIG. 2. Construction of the Markov chain for MCMCMC. The initial state of the Markov chain is c_0. The newly generated configurations are c_1, . . . , c_n; we depict this generation process as a directed graph. In the thermalization phase, the double-arrow graph is considered, and a fraction of a time step that allows one to respect detailed balance within the graph is spent in each node of the graph. Then, the new state of the Markov chain is stochastically selected within the graph with a rate equal to the fraction of time spent in each node.

III. INTUITIVE PICTURE
Below, we aim at providing the reader with an intuitive picture of the novel algorithm presented in this work, Many-Configuration Markov-Chain Monte Carlo (MCMCMC hereafter), in the graph version [the set version is discussed in the Appendix, see Sec. A 3]. Here, we do not provide complete derivations and proofs, all of which can be found in Appendix A.

Let us first describe a standard version of MCMC in a language that will easily allow for its generalization. Using the notation of Fig. 1, let us suppose that, at a given Monte Carlo step, the Markov-chain state is the configuration c_0, which belongs to some configuration space we want to sample in. We propose a new configuration c_1 as the next state of the Markov chain, where c_1 is proposed given c_0 with some probability distribution P_prp(c_1|c_0). P_prp can be freely chosen as long as it is guaranteed that the Markov chain is ergodic. We schematically depict this proposal process with a directed graph composed of the nodes c_0 and c_1, and of an arrow connecting c_0 to c_1 [see Fig. 1]. In order for the Markov chain to converge to a given equilibrium distribution P_eq, it is sufficient to impose the detailed-balance condition:

P_eq(c_0) P_prp(c_1|c_0) P_acc(c_1|c_0) = P_eq(c_1) P_prp(c_0|c_1) P_acc(c_0|c_1),   (1)

where P_acc(c_k|c_j) is the probability of accepting the proposed configuration c_k as the next Markov-chain state when proposing it from the configuration c_j. In Fig. 1 we represent the detailed-balance process by an undirected graph with double arrows. After the acceptance/rejection process, the Markov-chain state is either c_0 or c_1, and this whole procedure is repeated at the next Monte Carlo step.

We now introduce the MCMCMC algorithm as a natural many-configuration generalization of MCMC. Let c_0 be the Markov-chain state at a given Monte Carlo time step. We want to follow as closely as possible the MCMC procedure illustrated in Fig. 1.
For this reason, we want to find a way to propose multiple configurations by repeatedly using the "two-body" probability distribution P_prp(c_k|c_j) introduced above. A natural way to achieve this is sketched in Fig. 2: We consider the configurations c_0, . . . , c_n to be nodes of a directed graph; an arrow going from c_j to c_k means that the node c_k has been proposed using the probability distribution P_prp(c_k|c_j). The graph structure and generation process are completely arbitrary; in the following we will detail some of the possible choices. In the MCMC procedure described above, once the graph is generated, one of the two configurations is chosen as the Markov-chain state at the next step. In the case of MCMCMC, as the graph we consider at each step can be very big, we want to make full use of all the proposed configurations. For this reason, we add a thermalization phase: All the nodes of the graph are visited for a fraction of time proportional to the rate at which they would be chosen according to the detailed-balance condition between configurations belonging to the same undirected graph [see Fig. 2]. After this thermalization phase, a node of the graph is chosen with a probability proportional to the rate at which it is visited, and the process is iterated.

In the rest of this section we progressively motivate MCMCMC as a minimal extension of MCMC by considering the case of three proposed configurations per Monte Carlo step in detail. To this end, we present three algorithms that "interpolate" between MCMC and MCMCMC: the first is a rewriting of the standard MCMC with a heat-bath acceptance rate within the graph language we use in this paper, the second is a modification of the first where we visit multiple configurations at each Monte Carlo step, and the third is a MCMCMC algorithm for many configurations that we specialize to the case of three configurations.

R^{(0,1)} := c_0 → c_1    R^{(1,0)} := c_0 ← c_1    T := c_0 — c_1

FIG. 3.
Top: in R^{(0,1)}, the configuration c_1 is generated from c_0. Center: in R^{(1,0)}, the configuration c_0 is generated from c_1. Bottom: the undirected graph T describes the two processes.

A. MCMC in the language of graphs
We now discuss in more detail the MCMC algorithm of Fig. 1, highlighting its graphical interpretation. We suppose that, at a given Monte Carlo step, the Markov-chain state is the configuration c_0. As discussed above, we propose the configuration c_1 as the next Markov-chain state given c_0 with the probability distribution P_prp(c_1|c_0). We can represent this process by a directed graph R^{(0,1)} with two nodes, c_0 and c_1, and an arrow connecting the configuration c_0 to the configuration c_1, as illustrated in Fig. 3.

As we want to use detailed balance, we also need to consider the inverse process, in which the configuration c_0 is proposed from c_1. We can represent the inverse process by a directed graph R^{(1,0)} with the same nodes as R^{(0,1)} and the opposite direction for the arrow [see Fig. 3]. Finally, we introduce the undirected graph T having the same nodes as R^{(0,1)} and R^{(1,0)} [see Fig. 3]. We define P_acc(c|T) as the probability of accepting a node c ∈ {c_0, c_1} of the undirected graph T as the next state of the Markov chain, which, for simplicity, we choose to be independent of the creation history of T. We decide to impose the detailed-balance condition independently for any such T:

P_eq(c_0) P_prp(T|c_0) P_acc(c_1|T) = P_eq(c_1) P_prp(T|c_1) P_acc(c_0|T),   (2)

where we defined P_prp(T|c_α) := P_prp(c_{1−α}|c_α) as the probability of proposing c_{1−α} given c_α, as discussed above. We remark that while Eq. (2) is equivalent to Eq. (1), their interpretation is different: in Eq. (1) we have directly considered the detailed balance between configurations, while in Eq. (2) we have used the graph T as an intermediary. One can easily show that the following choice of P_acc satisfies the detailed-balance condition [Eq. (2)]:

P_acc(c|T) := P_eq(c) P_prp(T|c) / Σ_{c′ ∈ V(T)} P_eq(c′) P_prp(T|c′),   (3)

where V(T) := {c_0, c_1} is the set of nodes of T. This is equivalent to standard MCMC with a "heat-bath" acceptance rate.
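In our own Python sketch below (all names ours), one step of this graph-language MCMC builds the two-node graph T and then selects the next state with the heat-bath rate of Eq. (3); iterating it reproduces a toy equilibrium distribution:

```python
import random

def heat_bath_step(c0, p_eq, propose, p_prp, rng):
    """One MCMC step in the graph language: build T = {c0, c1} by proposing
    c1 from c0, then select a node of T with probability proportional to
    P_eq(c) P_prp(T|c), i.e. the heat-bath rate of Eq. (3)."""
    c1 = propose(c0, rng)
    w0 = p_eq(c0) * p_prp(c1, c0)   # P_prp(T|c0) = P_prp(c1|c0)
    w1 = p_eq(c1) * p_prp(c0, c1)   # P_prp(T|c1) = P_prp(c0|c1)
    return c0 if rng.random() < w0 / (w0 + w1) else c1

# Toy check on five states with unnormalized weights 1..5 and a symmetric
# nearest-neighbour proposal on a ring.
weights = [1, 2, 3, 4, 5]
rng = random.Random(0)
c, visits = 0, [0] * 5
for _ in range(200000):
    c = heat_bath_step(c, lambda s: weights[s],
                       lambda s, r: (s + r.choice([-1, 1])) % 5,
                       lambda c_new, c_old: 0.5, rng)
    visits[c] += 1
print([v / 200000 for v in visits])  # close to weights / 15
```

For two configurations this heat-bath rate reduces to Barker's acceptance rule, which mixes somewhat more slowly than Metropolis, but, unlike Metropolis, it generalizes naturally to graphs with many nodes.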
R^{{(0,1),(0,2)}} := c_1 ← c_0 → c_2    R^{{(0,1),(1,2)}} := c_0 → c_1 → c_2

FIG. 4. The two directed graphs of configurations that can be generated starting from c_0.

R^{{(0,1),(1,2)}} := c_0 → c_1 → c_2    R^{{(1,0),(1,2)}} := c_0 ← c_1 → c_2    R^{{(1,0),(2,1)}} := c_0 ← c_1 ← c_2    T := c_0 — c_1 — c_2

FIG. 5. Directed graphs that correspond to the same undirected graph T.

B. Generalization to three configurations per Monte Carlo step
In this section we generalize the procedure of Sec. III A to the case of three configurations per Monte Carlo step, for a particular graph-generation process. The main purpose of this section is to give the intuition behind the procedure, without providing a proof of all the statements, which is left to the Appendix [see Sec. A 2].

Suppose that, at a given Monte Carlo step, the Markov-chain state is c_0. We propose a new configuration c_1 with probability P_prp(c_1|c_0), analogously to what is done in Sec. III A. At this point, there are two possibilities: with probability p, we generate a new configuration c_2 from c_0 with generation probability P_prp(c_2|c_0); otherwise, with probability q = 1 − p, we generate a new configuration c_2 from c_1 with generation probability P_prp(c_2|c_1). We represent these two processes by two directed graphs, which we respectively denote by R^{{(0,1),(0,2)}} and R^{{(0,1),(1,2)}} [see Fig. 4]. Suppose that we have stochastically chosen to generate the second directed graph in Fig. 4, R^{{(0,1),(1,2)}}. Let T be the undirected graph with the same nodes as R^{{(0,1),(1,2)}} [see Fig. 5]. We consider all the directed graphs that have the same nodes as T, and edges that can be obtained from T by removing one arrow per edge: these are R^{{(0,1),(1,2)}}, R^{{(1,0),(1,2)}}, and R^{{(1,0),(2,1)}}; they correspond to the different ways of generating the graph T starting from one of the nodes [see Fig. 5 for the definition].

As the next Markov-chain state, we choose c ∈ {c_0, c_1, c_2} with probability P_acc(c|T), a quantity that only depends on the undirected graph T. In order to have a useful Monte Carlo process, we need to impose that the average time spent in a given configuration c coincides with P_eq(c). For this, it is sufficient to impose the detailed-balance condition:

P_eq(c_j) P_prp(T|c_j) P_acc(c_k|T) = P_eq(c_k) P_prp(T|c_k) P_acc(c_j|T),   (4)

FIG. 6. Discrete and continuous Monte Carlo time steps in MCMCMC.
A discrete step corresponds to the generation of a new graph of configurations. A continuous step is a virtual update between two configurations which are connected by an edge within a given graph, a process which we refer to as "thermalization" of the undirected graph.

where c_j and c_k are nodes of the undirected graph T. It can be shown [see Sec. A 2] that the following choice satisfies detailed balance:

P_acc(c|T) := P_eq(c) P_prp(T|c) / Σ_{c′ ∈ V(T)} P_eq(c′) P_prp(T|c′),   (5)

where V(T) := {c_0, c_1, c_2} is the set of nodes of the graph T, and P_prp(T|c) is the probability of generating the graph T from c, which, for the specific stochastic graph-generation process we consider in this section, is given by

P_prp(T|c_0) = P_prp(c_1|c_0) q P_prp(c_2|c_1),
P_prp(T|c_1) = 2 P_prp(c_0|c_1) p P_prp(c_2|c_1),
P_prp(T|c_2) = P_prp(c_1|c_2) q P_prp(c_0|c_1),   (6)

where we have taken into account the two ways of generating T given c_1.

C. Visiting many configurations in one Monte Carlo step by "thermalization" of the undirected graph
In order to render the algorithm more efficient when many configurations are simultaneously available, we introduce a modification of the Markov-chain definition that allows one to visit multiple configurations per Monte Carlo step. We call this extension the "thermalization" of the undirected graph [see Fig. 6]. As an intuitive picture, we can imagine that the generated undirected graph of configurations is the whole configuration space for one time step, and continuous-time Monte Carlo updates are made between nodes of the graph. Formally, this is simply done by renaming the "probability of proposal of configurations", P_prp, to the "probability of generation of configurations", P_gen, and interpreting the probability of acceptance of a configuration, P_acc, as the fraction of time spent in a node of the graph, P_node.

A powerful way to formalize this concept is to consider a Markov chain of undirected graphs instead of the usual Markov chain of configurations. Suppose that at a given Monte Carlo step the Markov-chain state is the undirected graph T consisting of the nodes c_0 and c_1 [see Fig. 3]. We generate the next state of the Markov chain, T′, in the following way: we choose one of the nodes of T, say c_0, with a certain probability distribution P_node(c_0|T) that coincides with P_acc(c_0|T) [defined in Sec. III A]; we then generate the new state of the Markov chain, T′, with a probability P_gen(T′|c_0) that coincides with P_prp(T′|c_0) [P_prp is introduced in Sec. III A]. To this discrete-time Markov chain of undirected graphs, we associate a Markov chain of configurations in continuous time in the following way: suppose that the Markov-chain state at discrete time j is T; for a continuous time t such that j ≤ t < j + 1, we spend a fraction of the time equal to P_node(c|T) in a given configuration c of the undirected graph T [see Fig. 6 for an intuitive picture].
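To make the two-node version of this construction concrete, here is a small Python sketch of ours (not the paper's implementation; all names are our own choices): at every discrete step, both nodes of the generated graph contribute to the estimator, weighted by the fraction of continuous time P_node, and the next root is drawn with the same weights.

```python
import random

def mcmcmc_two_node(p_eq, propose, p_prp, obs, c0, n_steps, seed=0):
    """Two-node illustration of graph 'thermalization': both nodes of the
    generated graph T = {c, c1} contribute to the time average, weighted by
    P_node(c|T) ∝ P_eq(c) P_gen(T|c); the next root is drawn with the same
    weights, and a new graph is generated from it."""
    rng = random.Random(seed)
    c, total = c0, 0.0
    for _ in range(n_steps):
        c1 = propose(c, rng)
        w0 = p_eq(c) * p_prp(c1, c)     # P_eq(c)  P_gen(T|c)
        w1 = p_eq(c1) * p_prp(c, c1)    # P_eq(c1) P_gen(T|c1)
        p0 = w0 / (w0 + w1)
        total += p0 * obs(c) + (1.0 - p0) * obs(c1)  # both nodes contribute
        c = c if rng.random() < p0 else c1           # next root ~ P_node
    return total / n_steps

# Toy target on five states with weights 1..5; the exact mean of c is 40/15.
weights = [1, 2, 3, 4, 5]
est = mcmcmc_two_node(
    p_eq=lambda c: weights[c],
    propose=lambda c, rng: (c + rng.choice([-1, 1])) % 5,
    p_prp=lambda c_new, c_old: 0.5,
    obs=lambda c: c,
    c0=0,
    n_steps=200000,
)
print(est)  # close to 40/15 ≈ 2.667
```

Even in this minimal two-node case, the rejected configuration is not wasted: it enters the estimator with its P_node weight, which is exactly the effect that becomes substantial for large graphs.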
For any given P_gen ≡ P_prp, we adjust P_node to achieve an average time of P_eq(c) spent in each configuration c. It can be shown that substituting P_acc with P_node and P_prp with P_gen in Eq. (3) guarantees this equilibrium condition. In Sec. A 2 we provide a formal proof of this fact.

IV. APPLICATION TO A SPIN-GLASS MODEL

A. The Sherrington-Kirkpatrick model
We now present the implementation of the MCMCMC algorithm for the Sherrington-Kirkpatrick model, defined by the energy

E({J_jk}, {S_j}) := (1/√L) Σ_{1 ≤ j < k ≤ L} J_jk S_j S_k,

where the S_j = ±1 are Ising spins and the J_jk are random couplings.

We have compared the following algorithms for the Sherrington-Kirkpatrick model simulation: the simplest local MCMC algorithm; the MCMCMC extension of the local MCMC with the line-graph addition process [see Sec. B 1 for details]; and the graph populating process for a k-ary tree [see Sec. B 2 for details].

In Fig. 7, we present the results of the numerical comparison between the standard local Metropolis MCMC

FIG. 7. Energy per site, averaged over parallel computing cores, for a particular disorder realization of the Sherrington-Kirkpatrick model with L = 2 and T = 1, as a function of Monte Carlo steps. The MCMC dotted line represents the standard local Metropolis algorithm, while the solid lines are the MCMCMC results. The blue line is the line-graph random addition process with graph size of 2, the red line is the binary tree of depth 7, the yellow line is the ternary tree of depth 5, the violet line is the 16-ary tree of depth 3.

algorithm with a single update, and various MCMCMC techniques differing by graph structure. We see that the thermalization time is decreased by two orders of magnitude, demonstrating the success of the algorithm in dramatically reducing the thermalization time.

V. APPLICATION TO LATTICE FERMIONS

A. The Fermi-Hubbard model

We consider the doped two-dimensional fermionic Hubbard model [22–24]:

H = Σ_{k,σ} (ε_k − µ) c†_kσ c_kσ + U Σ_i n_i↑ n_i↓.   (9)

Here µ denotes the chemical potential, k the momentum, σ ∈ {↑, ↓} the spin, U the onsite repulsion strength, i labels lattice sites, and the (square-lattice) dispersion is given by

ε_k = −2t [cos(k_x) + cos(k_y)],   (10)

where t is the nearest-neighbor hopping amplitude (we set t = 1 in our units). We further define T as the temperature and n as the density. In the chosen parameter regime of T = 0. , n = 0. , and
U = 8, which represents a much investigated strongly correlated regime of the model, with stripe magnetic order in the ground state [25–27].

B. Numerical results

As traditional Quantum Monte Carlo techniques are affected by the fermionic sign problem when simulating the repulsive Hubbard model away from half filling, we use Diagrammatic Monte Carlo, which allows one to circumvent this issue by sampling Feynman diagrams directly in the thermodynamic limit. We hereafter use a MCMCMC algorithm for sets that we specifically developed for use within the Connected Determinant Diagrammatic Monte Carlo (CDet) [15] [see Sec. C for a CDet introduction]. The reader can find more details about the set-generating process we use in the Appendix, Sec. D.

To assess the relative performance of MCMCMC and MCMC, we compare the stochastic error bars obtained after a given time (about one hour) on a single CPU processor [see Fig. 8]. For a given order, we run the MCMCMC algorithm at a higher order than the standard MCMC, as this is optimal for the MCMCMC performance. Both algorithms spend around 10% of their time on normalisation. We report an improvement of up to two orders of magnitude in the computational time needed to reach a given stochastic error for the highest expansion coefficient, 12, and, importantly, we observe that the improvement grows as a function of order.

FIG. 8. Improvement in computational time needed to reach a given stochastic error for a given expansion coefficient of the double occupancy, as a function of expansion order, for the Fermi-Hubbard model in the thermodynamic limit, at T = 0.

VI. CONCLUSION

In this paper we have introduced a minimal generalization of the Markov-Chain Monte Carlo algorithm that allows one to consider multiple configurations at the same Monte Carlo step.
This new technique permits straightforward parallelization, as well as giving the possibility of using an arbitrary number of biased configurations at the same Monte Carlo step.

After giving an intuitive picture of the algorithm based on a graphical interpretation, we have presented numerical results for spin-glass and quantum fermionic models, showcasing the generality and the potential of this new technique. More specifically, we have shown that, for the Sherrington-Kirkpatrick model, a very significant reduction of the thermalization time can be achieved when many configurations are considered at each Monte Carlo step. We have further shown how we can use the knowledge from all subsets of diagram configurations within the Connected Determinant Diagrammatic Monte Carlo method to further improve its computational efficiency, and we have presented benchmark results for the paradigmatic Fermi-Hubbard model away from half filling, where a speedup of up to two orders of magnitude was observed at twelve Feynman loops.

We believe that our work has potential applications to areas where Markov-chain Monte Carlo techniques are often considered too computationally intensive due to their non-parallelizable nature, such as big-data Bayesian inference [28]. More generally, the Markov-chain Monte Carlo algorithm for multiple configurations we have introduced in this paper will be potentially applicable to situations where the thermalization time is an issue and a parallel computing machine is available, or when many biased configurations can be evaluated at little computational effort.

ACKNOWLEDGMENTS

We thank Giuseppe Carleo, Michel Ferrero, Werner Krauth, Botao Li, and Shiwei Zhang for valuable discussions. We thank Dries Sels for pointing out Ref. [9]. This work has been supported by the Simons Foundation within the Many Electron Collaboration framework. The Flatiron Institute is a division of the Simons Foundation.
This work was granted access to the HPC resources of TGCC and IDRIS under the allocations A0090510609 and A0070510609, attributed by GENCI (Grand Équipement National de Calcul Intensif).

Appendix A: General theory

1. Markov chain of configurations

In this section we review the standard MCMC algorithm. We consider a discrete configuration space 𝒮. The Markov chain is usually defined as an infinite sequence of configurations (c(j = 1), c(j = 2), . . .), c(j) ∈ 𝒮, indexed by a discrete "time" j, such that a configuration c(j + 1) is generated stochastically using only c(j) as input. Let us define P_eq(c) as the average time spent in the configuration c ∈ 𝒮:

P_eq(c) := lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{c,c(j)}.   (A1)

We also define P((c, j)) as the probability of being in the configuration c at discrete time j. We consider a Markov chain such that (at least on average)

lim_{j→∞} P((c, j)) = P_eq(c).   (A2)

The MCMC algorithm is particularly useful to compute average values of an "observable" O, which takes as arguments elements of the configuration space 𝒮:

O := Σ_{c∈𝒮} w(c) O(c) / Σ_{c∈𝒮} w(c),   (A3)

where w(c) ≥ 0. If the condition

P_eq(c) = w(c) / Σ_{c′∈𝒮} w(c′)   (A4)

is satisfied for every c ∈ 𝒮, then we can compute O as a time average of O(c(j)), where c(j) is the Markov-chain state at discrete time j:

O = lim_{J→∞} (1/J) Σ_{j=1}^{J} O(c(j)).   (A5)

2. Markov chain of graphs of configurations

In this section, we generalize the standard MCMC formalism of Sec. A 1 to a Markov chain consisting of an infinite sequence of undirected graphs of configurations (T(j = 1), T(j = 2), . . .), such that the set of nodes of T(j), V(T(j)), is contained in 𝒮, and such that T(j + 1) is stochastically generated using only T(j) as input. We can introduce the average time spent in a given graph T:

P_eq(T) = lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{T,T(j)}.
(A6)

At a continuous time t ∈ [j, j + 1[, j ∈ N, we imagine visiting a configuration c ∈ V(T(j)) with a certain probability P((c, t)|(T, j)), which we impose to be time-independent, and we call it P_node:

P_node(c|T) =: P((c, t)|(T, j)).   (A7)

We assume this probability to be normalized to one: if the graph T belongs to the Markov chain at time j ∈ N, then for every t ∈ [j, j + 1[, the system can be in one and only one node c ∈ V(T) of the graph T at a time, which is equivalent to imposing

Σ_{c∈V(T)} P_node(c|T) = 1   (A8)

for every undirected graph T.

The Markov chain of undirected graphs is built with the sole purpose of computing average values of functions defined on the nodes of the graph. Therefore, if we define

P_eq(c) = Σ_{T: V(T)∋c} P_eq(T) P_node(c|T)   (A9)

and if we impose Eq. (A4), we can compute observables in configuration space from the Markov chain in graph space:

O = lim_{J→∞} (1/J) Σ_{j=1}^{J} Σ_{c∈V(T(j))} P_node(c|T(j)) O(c),   (A10)

where O is defined in Eq. (A3). The advantage of this formulation compared to Eq. (A5) is that we can consider multiple configurations at a given Monte Carlo step j, and visit each one of them.

We now show how to compute P_node from the graph transition probability, which is a free parameter in this formulation. While we could choose an arbitrary graph transition probability, here we limit ourselves to the following choice: the new graph is generated starting from one of the vertices of the old graph,

P((T′, j + 1)|(T, j)) := Σ_{c∈V(T)∩V(T′)} P_node(c|T) P_gen(T′|c),   (A11)

where P_gen(T′|c) is the probability of generating a graph T′ starting from a node c. P_gen can be chosen in a completely arbitrary way, but a choice of P_gen constrains the form of P_node.
We will impose the detailed-balance condition in configuration space, which is defined as

P((c, t), (c′, t′)) = P((c′, t), (c, t′)),   (A12)

where c, c′ ∈ 𝒮, t < t′, and t, t′ ∈ R+ are much larger than the thermalization time. First of all, we remark that in our Monte Carlo process there are no transitions at non-integer times; therefore, we can suppose t′ = j ∈ N, and t = j − 1 + 0⁺. We then obtain

P((c, j − 1 + 0⁺), (c′, j)) = Σ_{T: V(T)∋c} P_eq(T) P_node(c|T) × Σ_{T′: V(T′)∋c′} P((T′, j)|(T, j − 1)) P_node(c′|T′).   (A13)

By using Eq. (A11), one can verify that the choice

P_node(c|T) := P_eq(c) P_gen(T|c) / P_eq(T)   (A14)

satisfies detailed balance [Eq. (A12)]. We remark that the probability of being in the undirected graph T, P_eq(T), does not need to be computed, as one can use the normalization condition [see Eq. (A8)].

3. Markov chain of sets of configurations

In this section we consider a version of the Markov-chain algorithm for sets of configurations. We consider a Markov chain consisting of an infinite sequence of sets of configurations (S(j = 1), S(j = 2), . . .), such that S(j) is a set of fixed cardinality n ∈ N, n > 0, contained in 𝒮, and such that S(j + 1) is stochastically generated using only S(j) as input. We can introduce the average time spent in a given set S ⊆ 𝒮 as

P_eq(S) = lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{S,S(j)}.   (A15)

At a continuous time t ∈ [j, j + 1[, j ∈ N, we visit a subset S′ ⊊ S with a certain probability P((S′, t)|(S, j)), which we impose to be time-independent:

P_subset(S′|S) =: P((S′, t)|(S, j)).
(A16)

We assume this probability to be normalized to one: if S is the Markov-chain state at discrete time j, then we must be in one and only one subset S′ ⊊ S for every continuous time t ∈ [j, j + 1[, which is equivalent to imposing

Σ_{S′⊊S} P_subset(S′|S) = 1   (A17)

for every S ⊆ 𝒮 such that |S| = n. We would like to evaluate

O_u := Σ_{S⊆𝒮: |S|=u} w(S) O(S)   (A18)

for u ∈ {1, 2, . . . , n}, and w(S) ≥ 0. We introduce λ_u > 0, u ∈ {1, . . . , n − 1}, and define

N_n := Σ_{S⊆𝒮: |S|=n} . . .

Using Eq. (A26) and Eq. (A20), we see that the explicit λ_u factor in Eq. (A21) simplifies to one, and we can safely take the λ_u → 0⁺ limit for some u ∈ {1, 2, . . . , n − 1}. This means that we can accumulate statistics for sets that are never visited by the Markov chain.

Appendix B: Generating graphs of configurations

In this section we describe some choices for the graph-generation probability P_gen introduced in Sec. A 2. We introduce two classes of graph-generation processes: addition processes [Sec. B 1], and populating processes [Sec. B 2].

1. Line-graph addition process

We consider here the following process: starting from the configuration c, we create a new configuration c′ with a probability given by some function P_gen(c′|c), and we consider the graph with an edge between c and c′. Suppose now that we have created a graph consisting of c_0, c_1, . . . , c_{l−1}, such that c_k is connected by an edge to c_{k+1}. With probability 1/2, we add a new vertex c′ connected to c_0 with the probability distribution P_gen(c′|c_0); otherwise, we add the vertex c′ to the graph, connecting it to c_{l−1} with the probability distribution P_gen(c′|c_{l−1}). If T is the line graph with c_0, c_1, . . .
, c_{L−1} as nodes, one therefore has

P_{gen}(T | c_k) = \frac{1}{2^{L−2}} \binom{L−1}{k} \left( \prod_{k_l = 0}^{k−1} P_{gen}(c_{k_l} | c_{k_l + 1}) \right) × \left( \prod_{k_r = k+1}^{L−1} P_{gen}(c_{k_r} | c_{k_r − 1}) \right).   (B1)

2. General graph populating process

We now present a graph-generation process that is amenable to analytic treatment. The key idea is first to generate an "empty" graph, where no node is assigned, and then to populate the graph nodes with configuration values in such a way that the computation of P_{gen}(T | c) becomes straightforward. We consider specifically a tree T. We choose a root r for the tree, r ∈ {0, 1, ..., |V(T)| − 1}, with a given probability distribution p_r, which determines a rooted tree R. From the root r, we can populate the tree by generating a configuration for each edge of the rooted tree with the probability distribution P_{gen}(c | c'), where c and c' are connected by an edge going from c to c'. We can therefore write, assuming that c is the root of the rooted tree R,

P_{gen}(T | c) = p_r \prod_{(c, c') ∈ E(R)} P_{gen}(c | c').   (B2)

One has complete freedom in the choice of p_r: the uniform choice is simply p_r = 1 / |V(T)|, which gives a self-thermalizing tree if P_{gen} is symmetric.

Appendix C: Introduction to Connected Determinant Diagrammatic Monte Carlo

Below we provide a very brief overview of CDet. We are interested in computing a physical observable C(ξ), which we express in terms of an infinite power series

C(ξ) := \sum_{n=0}^{∞} c_n ξ^n,   (C1)

where the expansion coefficients c_n are computed from the stochastic sampling of multi-dimensional integrals over a set of n internal variables {X_1, . . .
, X_n} =: S, corresponding to vertex positions in some arbitrary space:

c_n = \frac{1}{N_n} \int_S c(S),   (C2)

where N_n is a normalisation constant and c(S) represents the sum of the weights of all topologically distinct connected graphs T constructed from the vertices in S and obeying additional rules imposed by the choice of model and observable. The weight of a particular graph is given by the product of the weights of its edges E(S)[i, j], which are functions of two vertex positions X_i and X_j. We can write

c(S) = \sum_T \prod_{\{i, j\} ∈ T} E(S)[i, j].   (C3)

In general, the number of distinct connected graph topologies grows factorially with the number of vertices, and so does the computational cost of evaluating c(S). However, in many cases it is possible to compute the related sum of the weights of all connected and disconnected graphs, a(S), at only polynomial or exponential cost [29]. In order to obtain c(S), one needs to eliminate all disconnected diagrams from a(S) using the recursive formula

c(S) = a(S) − \sum_{S' ⊊ S, S' \ni X} c(S') a(S \setminus S'),   (C4)

where, in order to properly define connectivity, the sum is over all subsets S' containing the arbitrarily chosen vertex X from S. The cost of evaluating Eq. (C4) scales as O(3^n) with the number of vertices and, sometimes together with the evaluation of a(S), represents the computational bottleneck of the CDet algorithm. It is important to note from Eq. (C4) that, in order to compute c(S) for a given set of vertices, one is obliged to compute c(S') for all of its subsets S'; these can be reused in computing the lower-order expansion coefficients c_{n'} with n' < n. This means that in the process of computing the weight at order n we generate an exponential number (2^n − 1) of lower-order weights as a side product.

Appendix D: Generating sets of configurations

1.
Stochastic generation of the set

In this section we define the stochastic process that we perform to generate a set S of cardinality n ∈ \mathbb{N}, given a subset S' ⊊ S with |S'| = m. Let S'_m = S'. Given S'_k, k ∈ {m, m + 1, ..., n − 1}, we generate S'_{k+1} by adding a vertex to S'_k in the following way: we choose one element e' of S'_k randomly, we generate a new element e with some arbitrary probability distribution P_{gen}(e | e'), and we add it to S'_k to generate S'_{k+1} := S'_k ∪ {e}. We then identify S'_n ≡ S.

2. Recursive exponential formula for the generation probability

We now show how to compute P_{gen}(S | S') for S' ⊊ S, for a given S of cardinality |S| = n ∈ \mathbb{N}. This is needed to compute P_{subset} [see Eq. (A26)]. A brute-force approach would result in an algorithm that scales factorially with n. We are going to present a recursive formula whose computational cost scales only exponentially with n. We define

\tilde{P}_S(S') := P_{gen}(S | S \setminus S')   (D1)

for S' ⊊ S. One has

\tilde{P}_S(\{e'\}) = \frac{1}{|S| − 1} \sum_{e'' ∈ S \setminus \{e'\}} P_{gen}(e' | e'')   (D2)

for e' ∈ S. For S' ⊊ S, the Chapman-Kolmogorov equation can be written as

\tilde{P}_S(S') = \frac{1}{|S \setminus S'|} \sum_{e'' ∈ S \setminus S'} \sum_{e' ∈ S'} P_{gen}(e' | e'') \, \tilde{P}_S(S' \setminus \{e'\}).   (D3)

We can express the previous equation in words: in order to generate the set S from the set S \setminus S', we first generate an element e' of S' starting from one uniformly chosen element e'' of S \setminus S', and we then generate the set S from the set (S \setminus S') ∪ {e'}. We note that the cardinality of the subset on the l.h.s. of Eq.
(D3) is one higher than that of the r.h.s., and therefore this equation can be solved recursively. This leads to a computational cost of O(n^2 2^n). It is possible to improve upon this scaling by introducing a "cumulative" probability distribution \tilde{P}_{gen}(S', e'), which we compute recursively as

\tilde{P}_{gen}(S', e') = \sum_{e'' ∈ S, e'' ≠ e'} P_{gen}(e' | e'')   if |S'| = 1,
\tilde{P}_{gen}(S', e') = \tilde{P}_{gen}(S' \setminus \{e_{S'}\}, e') − P_{gen}(e' | e_{S'})   otherwise,   (D4)

where e_{S'} ∈ S' can be chosen arbitrarily as long as e_{S'} ≠ e'. Then we can rewrite Eq. (D3) as

\tilde{P}_S(S') = \frac{1}{|S \setminus S'|} \sum_{e' ∈ S'} \tilde{P}_{gen}(S', e') \, \tilde{P}_S(S' \setminus \{e'\}),   (D5)

which now has a computational cost of only O(n 2^n). This computational cost is negligible in comparison to the O(3^n) asymptotic computational scaling of the CDet [15] algorithm used for the evaluation of connected Feynman diagrams.

[1] N. Metropolis, A. W. Rosenbluth, M. Rosenbluth, A. H. Teller, and E. Teller, Journal of Chemical Physics 21, 1087 (1953).
[2] W. Krauth, Statistical Mechanics: Algorithms and Computations, Vol. 13 (OUP Oxford, 2006).
[3] J. Gubernatis, N. Kawashima, and P. Werner, Quantum Monte Carlo Methods (Cambridge University Press, 2016).
[4] F. Becca and S. Sorella, Quantum Monte Carlo Approaches for Correlated Systems (Cambridge University Press, 2017).
[5] D. Gamerman and H. F. Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (CRC Press, 2006).
[6] L. Lovász and P. Winkler, Microsurveys in Discrete Probability, 85 (1998).
[7] R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 (1986).
[8] G. Altekar, S. Dwarkadas, J. P. Huelsenbeck, and F.
Ronquist, Bioinformatics 20, 407 (2004).
[9] P. E. Jacob, J. O'Leary, and Y. F. Atchadé, arXiv preprint arXiv:1708.03625 (2017).
[10] E. P. Bernard, W. Krauth, and D. B. Wilson, Phys. Rev. E 80, 056704 (2009).
[11] M. Michel, S. C. Kapfer, and W. Krauth, Journal of Chemical Physics 140, 054116 (2014).
[12] B. Li, S. Todo, A. Maggs, and W. Krauth, Computer Physics Communications, 107702 (2021).
[13] J. Dick, F. Y. Kuo, and I. H. Sloan, Acta Numerica 22, 133–288 (2013).
[14] M. Maček, P. T. Dumitrescu, C. Bertrand, B. Triggs, O. Parcollet, and X. Waintal, Phys. Rev. Lett. 125, 047702 (2020).
[15] R. Rossi, Phys. Rev. Lett. 119, 045701 (2017).
[16] F. Šimkovic and E. Kozik, Phys. Rev. B 100, 121102 (2019).
[17] A. Moutenet, W. Wu, and M. Ferrero, Phys. Rev. B 97, 085117 (2018).
[18] R. Rossi, arXiv preprint arXiv:1802.04743 (2018).
[19] R. Rossi, F. Šimkovic, and M. Ferrero, EPL (Europhysics Letters) 132, 11001 (2020).
[20] F. Šimkovic, R. Rossi, and M. Ferrero, Phys. Rev. B 102, 195122 (2020).
[21] G. Parisi, Phys. Rev. Lett. 43, 1754 (1979).
[22] J. Hubbard, in Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Vol. 276 (The Royal Society, 1963) pp. 238–257.
[23] P. W. Anderson, Solid State Physics 14, 99 (1963).
[24] P. W. Anderson et al., The Theory of Superconductivity in the High-Tc Cuprate Superconductors, Vol. 446 (Princeton University Press, Princeton, NJ, 1997).
[25] J. P. F. LeBlanc, A. E. Antipov, F. Becca, I. W. Bulik, G. K.-L. Chan, C.-M. Chung, Y. Deng, M. Ferrero, T. M. Henderson, C. A. Jiménez-Hoyos, E. Kozik, X.-W. Liu, A. J. Millis, N. V. Prokof'ev, M. Qin, G. E. Scuseria, H. Shi, B. V. Svistunov, L. F. Tocchio, I. S. Tupitsyn, S. R. White, S. Zhang, B.-X. Zheng, Z. Zhu, and E. Gull (Simons Collaboration on the Many-Electron Problem), Phys. Rev. X 5, 041041 (2015).
[26] B.-X. Zheng, C.-M. Chung, P. Corboz, G. Ehlers, M.-P. Qin, R. M. Noack, H. Shi, S. R.
White, S. Zhang, and G. K.-L. Chan, Science 358, 1155 (2017).
[27] M. Qin, C.-M. Chung, H. Shi, E. Vitali, C. Hubig, U. Schollwöck, S. R. White, and S. Zhang (Simons Collaboration on the Many-Electron Problem), Phys. Rev. X 10, 031016 (2020).
[28] R. Bardenet, A. Doucet, and C. Holmes, Journal of Machine Learning Research 18, 1 (2017).
[29] This is often done by computing determinants or permanents of n × n matrices M(S) with edge weights as entries: (M(S))_{ij} = E(S)[i, j].