Many-Configuration Markov-Chain Monte Carlo
Fedor Šimkovic IV∗
CPHT, CNRS, École Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau, France
Collège de France, 11 place Marcelin Berthelot, 75005 Paris, France
Riccardo Rossi†
Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, New York 10010, USA and
Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
(Dated: February 11, 2021)

We propose a minimal generalization of the celebrated Markov-Chain Monte Carlo algorithm which allows for an arbitrary number of configurations to be visited at every Monte Carlo step. This is advantageous when a parallel computing machine is available, or when many biased configurations can be evaluated at little additional computational cost. As an example of the former case, we report a significant reduction of the thermalization time for the paradigmatic Sherrington-Kirkpatrick spin-glass model. For the latter case, we show that, by leveraging the exponential number of biased configurations automatically computed by Diagrammatic Monte Carlo, we can speed up computations in the Fermi-Hubbard model by two orders of magnitude.
I. INTRODUCTION
The Markov-Chain Monte Carlo method [1] (MCMC) is a generic algorithm that allows one to draw samples from a given unnormalized probability distribution. It has found applications in many areas of science, in particular in classical and quantum physics [2–4], and statistics [5]. The Markov-chain thermalization time is one of the critical properties of the MCMC algorithm [6]. Also known as the mixing time, it is the time it takes for the Markov chain to approach the steady-state distribution, which is sometimes called the thermal-equilibrium distribution in statistical physics. In the context of a Monte Carlo calculation, the thermalization time is the time one needs to wait before accumulating statistics. If it is larger than the total number of Monte Carlo steps, the MCMC technique is not applicable. Another critical property is the autocorrelation time, which is defined as the time it takes to draw two uncorrelated samples from the Markov chain after the thermalization time has passed. It can be reduced linearly by using a parallel computing machine running independent Markov chains, while the thermalization time cannot be reduced indefinitely with this approach. Introducing couplings between Markov chains has been shown to help reduce the thermalization time: one usually either considers chains at different temperatures [7, 8], or constructs unbiased estimators [9].
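As a concrete illustration of the autocorrelation time just defined (a sketch of ours, not part of the original text), the following Python function estimates the integrated autocorrelation time of a chain of measurements; the function name and the truncation rule (stopping at the first non-positive sample correlation) are our own choices:

```python
import numpy as np

def integrated_autocorrelation_time(samples, max_lag=None):
    """Estimate tau = 1 + 2 * sum_k rho_k from a 1D chain of measurements,
    truncating the sum at the first non-positive sample autocorrelation."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 10
    var = np.dot(x, x) / n
    tau = 1.0
    for lag in range(1, max_lag):
        rho = np.dot(x[:-lag], x[lag:]) / ((n - lag) * var)
        if rho <= 0.0:      # truncate: remaining correlations are noise
            break
        tau += 2.0 * rho
    return tau

# Independent samples give tau close to 1; correlated chains give tau >> 1.
rng = np.random.default_rng(0)
iid = rng.normal(size=20000)
print(integrated_autocorrelation_time(iid))  # close to 1
```

Running independent chains in parallel reduces the statistical error coming from this autocorrelation, but, as stated above, each chain must still thermalize on its own.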
Another strategy to reduce the thermalization time is the use of irreversible Markov chains [10, 11], a method that has recently been extended to allow for parallelization [12]. On the other hand, quasi-Monte Carlo techniques [13] do not have a thermalization time by construction, and have recently been successfully applied to quantum impurity problems [14], but their application to large configuration spaces has not yet been fully explored.

In some situations, one is able to generate many biased proposals for new configurations of the Markov chain with little or no additional computational effort. This is the case, for instance, with some variants of the Diagrammatic Monte Carlo technique, as they automatically generate an exponential number of configurations at each Monte Carlo step at no additional cost [15–20]. Here, the MCMC algorithm can only consider one configuration at a given Monte Carlo step, and there is no way to use the information of the multiple biased configurations that have been generated.

In this paper, we propose a minimal generalization of the Markov-Chain Monte Carlo algorithm, which we call Many-Configuration Markov-Chain Monte Carlo (MCMCMC), that allows one to consider multiple configurations at the same Monte Carlo step. This guarantees a straightforward parallelization of the computation in order to reduce the thermalization time, as well as allowing for the use of many biased configurations that might be available at a given Monte Carlo step.

The document is organized as follows: In Sec. II we formulate and motivate the problem which this work addresses; in Sec. III we present an intuitive picture of the MCMCMC technique by considering the case of two and three configurations explicitly; in Sec. IV we present benchmark results for the Sherrington-Kirkpatrick spin-glass model, showing how the use of a parallel machine can drastically reduce the thermalization time; in Sec.
V we provide numerical results for the Fermi-Hubbard model within Diagrammatic Monte Carlo, using a specifically designed version of the MCMCMC algorithm.

FIG. 1. Construction of the Markov chain in standard MCMC. The initial state of the Markov chain is c_0. The new proposed state is c_1; we depict this proposal process as a directed graph. Then, detailed balance is imposed to stochastically select the next state of the Markov chain.

II. PROBLEM STATEMENT
The MCMC method is based on the generation of a sequence of states, called a Markov chain, such that the transition probability between these states depends only on the current state. The goal of an MCMC algorithm is to build a Markov chain such that, for every initial state, the probability of a given state converges to an arbitrary "equilibrium" probability distribution P_eq in the long-time limit. The Markov chain is usually built by proposing a new state c_1 given the current state c_0 with a certain, arbitrary, proposal probability distribution, P_prp(c_1|c_0), which is easy to sample from; then, in the simplest form of MCMC, the proposed state is accepted or rejected according to the Metropolis rate satisfying detailed balance: min(1, P_eq(c_1) P_prp(c_0|c_1) / (P_eq(c_0) P_prp(c_1|c_0))).

While MCMC is an extremely general and powerful algorithm, there are situations where the sequential generation of new configurations leads to inefficiency. In the following, we will consider two such situations:

1. A parallel computing machine is available, and the thermalization time for the Markov chain is non-negligible;

2. It is possible to compute a large number of biased configurations with little or no additional computational effort.

In this paper, we present a solution to the following problem: What is a minimal modification of Markov-Chain Monte Carlo that allows one to take advantage of one (or both) of these possibilities?
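For reference, the standard single-configuration algorithm described above can be sketched in a few lines of Python. This is our own illustration (the toy target distribution and all names are ours, not from the paper); it uses exactly the Metropolis rate quoted above:

```python
import random

def metropolis_chain(p_eq, propose, p_prp, c0, n_steps, seed=0):
    """Standard single-configuration MCMC with the Metropolis rate
    min(1, P_eq(c1) P_prp(c0|c1) / (P_eq(c0) P_prp(c1|c0)))."""
    rng = random.Random(seed)
    chain, c = [c0], c0
    for _ in range(n_steps):
        c_new = propose(c, rng)
        ratio = (p_eq(c_new) * p_prp(c, c_new)) / (p_eq(c) * p_prp(c_new, c))
        if rng.random() < min(1.0, ratio):
            c = c_new                    # accept the proposal
        chain.append(c)                  # on rejection, the old state repeats
    return chain

# Toy example: sample the states 0..4 with unnormalized weights 1..5,
# using a symmetric nearest-neighbour proposal on a ring of five states.
weights = [1, 2, 3, 4, 5]
chain = metropolis_chain(
    p_eq=lambda c: weights[c],
    propose=lambda c, rng: (c + rng.choice([-1, 1])) % 5,
    p_prp=lambda c_new, c_old: 0.5,      # symmetric: cancels in the ratio
    c0=0,
    n_steps=200000,
)
print(chain.count(4) / len(chain))  # close to 5/15 ≈ 0.333
```

The sequential structure is explicit here: exactly one proposal is generated and at most one configuration is accepted per step, which is the limitation this paper removes.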
As already mentioned, the difficulty of MCMC in dealing with these situations can be traced back to its sequential definition: The Markov chain is built by considering only one new configuration at a given Monte Carlo step,
and there is no way to simultaneously use the information from multiple configurations. Even if multiple configurations can be generated, at most one of them can be accepted. In particular, in the presence of a non-negligible thermalization time, there is little advantage in using a parallel computing machine over a single-core machine, as any statistics accumulated before thermalization has completed cannot be used.

FIG. 2. Construction of the Markov chain for MCMCMC. The initial state of the Markov chain is c_0. The newly generated configurations are c_1, . . . , c_n; we depict this generation process as a directed graph. In the thermalization phase, the double-arrow graph is considered, and a fraction of a time step that allows one to respect detailed balance within the graph is spent in each node of the graph. Then, the new state of the Markov chain is stochastically selected within the graph with a rate equal to the fraction of time spent in each node.

III. INTUITIVE PICTURE
Below, we aim at providing the reader with an intuitive picture of the novel algorithm presented in this work, Many-Configuration Markov-Chain Monte Carlo (MCMCMC hereafter), in the graph version [the set version is discussed in the Appendix, see Sec. A 3]. Here, we do not provide complete derivations and proofs, all of which can be found in Appendix A.

Let us first describe a standard version of MCMC in a language that will easily allow for its generalization. Using the notation of Fig. 1, let us suppose that, at a given Monte Carlo step, the Markov-chain state is the configuration c_0, which belongs to some configuration space we want to sample in. We propose a new configuration c_1 as the next state of the Markov chain, where c_1 is proposed given c_0 with some probability distribution P_prp(c_1|c_0). P_prp can be freely chosen as long as it is guaranteed that the Markov chain is ergodic. We schematically depict this proposal process with a directed graph composed of the nodes c_0 and c_1, and of an arrow connecting c_0 to c_1 [see Fig. 1]. In order for the Markov chain to converge to a given equilibrium distribution P_eq, it is sufficient to impose the detailed-balance condition:

P_eq(c_0) P_prp(c_1|c_0) P_acc(c_1|c_0) = P_eq(c_1) P_prp(c_0|c_1) P_acc(c_0|c_1),   (1)

where P_acc(c_k|c_j) is the probability of accepting the proposed configuration c_k as the next Markov-chain state when proposing it from the configuration c_j. In Fig. 1 we represent the detailed-balance process by an undirected graph with double arrows. After the acceptance/rejection process, the Markov-chain state is either c_0 or c_1, and this whole procedure is repeated at the next Monte Carlo step.

We now introduce the MCMCMC algorithm as a natural many-configuration generalization of MCMC. Let c_0 be the Markov-chain state at a given Monte Carlo time step. We want to follow as closely as possible the MCMC procedure illustrated in Fig. 1.
For this reason, we want to find a way to propose multiple configurations by repeatedly using the "two-body" probability distribution P_prp(c_k|c_j) introduced above. A natural way to achieve this is sketched in Fig. 2: We consider the configurations c_0, . . . , c_n to be nodes of a directed graph; an arrow going from c_j to c_k means that the node c_k has been proposed using the probability distribution P_prp(c_k|c_j). The graph structure and generation process are completely arbitrary; in the following we will detail some of the possible choices. In the MCMC procedure described above, once the graph is generated, one of the two configurations is chosen as the Markov-chain state at the next step. In the case of MCMCMC, as the graph we consider at each step can be very big, we want to make full use of all the proposed configurations. For this reason, we add a thermalization phase: All the nodes of the graph are visited for a fraction of time proportional to the rate at which they would be chosen according to the detailed-balance condition between configurations belonging to the same undirected graph [see Fig. 2]. After this thermalization phase, a node of the graph is chosen with a probability proportional to the rate at which it is visited, and the process is iterated.

In the rest of this section we progressively motivate MCMCMC as a minimal extension of MCMC by considering the case of three proposed configurations per Monte Carlo step in detail. To this end, we present three algorithms that "interpolate" between MCMC and MCMCMC: the first is a rewriting of the standard MCMC with a heat-bath acceptance rate within the graph language we use in this paper, the second is a modification of the first where we visit multiple configurations at each Monte Carlo step, and the third is a MCMCMC algorithm for many configurations that we specialize to the case of three configurations.

R^{(0,1)} := c_0 → c_1    R^{(1,0)} := c_0 ← c_1    T := c_0 — c_1

FIG. 3.
Top: in R^{(0,1)}, the configuration c_1 is generated from c_0. Center: in R^{(1,0)}, the configuration c_0 is generated from c_1. Bottom: the undirected graph T describes the two processes.

A. MCMC in the language of graphs
We now discuss in more detail the MCMC algorithm of Fig. 1, highlighting its graphical interpretation. We suppose that, at a given Monte Carlo step, the Markov-chain state is the configuration c_0. As discussed above, we propose the configuration c_1 as the next Markov-chain state given c_0 with the probability distribution P_prp(c_1|c_0). We can represent this process by a directed graph R^{(0,1)} with two nodes, c_0 and c_1, and an arrow connecting the configuration c_0 to the configuration c_1, as illustrated in Fig. 3.

As we want to use detailed balance, we also need to consider the inverse process, in which the configuration c_0 is proposed from c_1. We can represent the inverse process by a directed graph R^{(1,0)} with the same nodes as R^{(0,1)} and the opposite direction for the arrow [see Fig. 3]. Finally, we introduce the undirected graph T having the same nodes as R^{(0,1)} and R^{(1,0)} [see Fig. 3]. We define P_acc(c|T) as the probability of accepting a node c ∈ {c_0, c_1} of the undirected graph T as the next state of the Markov chain, which, for simplicity, we choose to be independent of the creation history of T. We decide to impose the detailed-balance condition independently for any such T:

P_eq(c_0) P_prp(T|c_0) P_acc(c_1|T) = P_eq(c_1) P_prp(T|c_1) P_acc(c_0|T),   (2)

where we defined P_prp(T|c_α) := P_prp(c_{1−α}|c_α) as the probability of proposing c_{1−α} given c_α, as discussed above. We remark that while Eq. (2) is equivalent to Eq. (1), their interpretation is different: in Eq. (1) we have directly considered the detailed balance between configurations, while in Eq. (2) we have used the graph T as an intermediary. One can easily show that the following choice of P_acc satisfies the detailed-balance condition [Eq. (2)]:

P_acc(c|T) := P_eq(c) P_prp(T|c) / Σ_{c′ ∈ V(T)} P_eq(c′) P_prp(T|c′),   (3)

where V(T) := {c_0, c_1} is the set of nodes of T. This is equivalent to standard MCMC with a "heat-bath" acceptance rate.
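In our own Python sketch below (all names ours), one step of this graph-language MCMC builds the two-node graph T and then selects the next state with the heat-bath rate of Eq. (3); iterating it reproduces a toy equilibrium distribution:

```python
import random

def heat_bath_step(c0, p_eq, propose, p_prp, rng):
    """One MCMC step in the graph language: build T = {c0, c1} by proposing
    c1 from c0, then select a node of T with probability proportional to
    P_eq(c) P_prp(T|c), i.e. the heat-bath rate of Eq. (3)."""
    c1 = propose(c0, rng)
    w0 = p_eq(c0) * p_prp(c1, c0)   # P_prp(T|c0) = P_prp(c1|c0)
    w1 = p_eq(c1) * p_prp(c0, c1)   # P_prp(T|c1) = P_prp(c0|c1)
    return c0 if rng.random() < w0 / (w0 + w1) else c1

# Toy check on five states with unnormalized weights 1..5 and a symmetric
# nearest-neighbour proposal on a ring.
weights = [1, 2, 3, 4, 5]
rng = random.Random(0)
c, visits = 0, [0] * 5
for _ in range(200000):
    c = heat_bath_step(c, lambda s: weights[s],
                       lambda s, r: (s + r.choice([-1, 1])) % 5,
                       lambda c_new, c_old: 0.5, rng)
    visits[c] += 1
print([v / 200000 for v in visits])  # close to weights / 15
```

For two configurations this heat-bath rate reduces to Barker's acceptance rule, which mixes somewhat more slowly than Metropolis, but, unlike Metropolis, it generalizes naturally to graphs with many nodes.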
R^{{(0,1),(0,2)}} := c_1 ← c_0 → c_2    R^{{(0,1),(1,2)}} := c_0 → c_1 → c_2

FIG. 4. The two directed graphs of configurations that can be generated starting from c_0.

R^{{(0,1),(1,2)}} := c_0 → c_1 → c_2    R^{{(1,0),(1,2)}} := c_0 ← c_1 → c_2    R^{{(1,0),(2,1)}} := c_0 ← c_1 ← c_2    T := c_0 — c_1 — c_2

FIG. 5. Directed graphs that correspond to the same undirected graph T.

B. Generalization to three configurations per Monte Carlo step
In this section we generalize the procedure of Sec. III A to the case of three configurations per Monte Carlo step, for a particular graph-generation process. The main purpose of this section is to give the intuition behind the procedure, without providing a proof of all the statements, which is left to the Appendix [see Sec. A 2].

Suppose that, at a given Monte Carlo step, the Markov-chain state is c_0. We propose a new configuration c_1 with probability P_prp(c_1|c_0), analogously to what is done in Sec. III A. At this point, there are two possibilities: with probability p, we generate a new configuration c_2 from c_0 with generation probability P_prp(c_2|c_0); otherwise, with probability q = 1 − p, we generate a new configuration c_2 from c_1 with generation probability P_prp(c_2|c_1). We represent these two processes by two directed graphs, which we respectively denote by R^{{(0,1),(0,2)}} and R^{{(0,1),(1,2)}} [see Fig. 4]. Suppose that we have stochastically chosen to generate the second directed graph in Fig. 4, R^{{(0,1),(1,2)}}. Let T be the undirected graph with the same nodes as R^{{(0,1),(1,2)}} [see Fig. 5]. We consider all the directed graphs that have the same nodes as T, and edges that can be obtained from T by removing one arrow per edge: these are R^{{(0,1),(1,2)}}, R^{{(1,0),(1,2)}}, and R^{{(1,0),(2,1)}}; they correspond to the different ways of generating the graph T starting from one of the nodes [see Fig. 5 for the definition].

As the next Markov-chain state, we choose c ∈ {c_0, c_1, c_2} with probability P_acc(c|T), a quantity that only depends on the undirected graph T. In order to have a useful Monte Carlo process, we need to impose that the average time spent in a given configuration c coincides with P_eq(c). For this, it is sufficient to impose the detailed-balance condition:

P_eq(c_j) P_prp(T|c_j) P_acc(c_k|T) = P_eq(c_k) P_prp(T|c_k) P_acc(c_j|T),   (4)

FIG. 6. Discrete and continuous Monte Carlo time steps in MCMCMC.
A discrete step corresponds to the generation of a new graph of configurations. A continuous step is a virtual update between two configurations which are connected by an edge within a given graph, a process which we refer to as "thermalization" of the undirected graph.

where c_j and c_k are nodes of the undirected graph T. It can be shown [see Sec. A 2] that the following choice satisfies detailed balance:

P_acc(c|T) := P_eq(c) P_prp(T|c) / Σ_{c′ ∈ V(T)} P_eq(c′) P_prp(T|c′),   (5)

where V(T) := {c_0, c_1, c_2} is the set of nodes of the graph T, and P_prp(T|c) is the probability of generating the graph T from c, which, for the specific stochastic graph-generation process we consider in this section, is given by

P_prp(T|c_0) = P_prp(c_1|c_0) q P_prp(c_2|c_1),
P_prp(T|c_1) = 2 P_prp(c_0|c_1) p P_prp(c_2|c_1),
P_prp(T|c_2) = P_prp(c_1|c_2) q P_prp(c_0|c_1),   (6)

where we have taken into account the two ways of generating T given c_1.

C. Visiting many configurations in one Monte Carlo step by "thermalization" of the undirected graph
In order to render the algorithm more efficient when many configurations are simultaneously available, we introduce a modification of the Markov-chain definition that allows one to visit multiple configurations per Monte Carlo step. We call this extension the "thermalization" of the undirected graph [see Fig. 6]. As an intuitive picture, we can imagine that the generated undirected graph of configurations is the whole configuration space for one time step, and continuous-time Monte Carlo updates are made between nodes of the graph. Formally, this is simply done by renaming the "probability of proposal of configurations", P_prp, to the "probability of generation of configurations", P_gen, and interpreting the probability of acceptance of a configuration, P_acc, as the fraction of time spent in a node of the graph, P_node.

A powerful way to formalize this concept is to consider a Markov chain of undirected graphs instead of the usual Markov chain of configurations. Suppose that at a given Monte Carlo step the Markov-chain state is the undirected graph T consisting of the nodes c_0 and c_1 [see Fig. 3]. We generate the next state of the Markov chain, T′, in the following way: we choose one of the nodes of T, say c_0, with a certain probability distribution P_node(c_0|T) that coincides with P_acc(c_0|T) [defined in Sec. III A]; we then generate the new state of the Markov chain, T′, with a probability P_gen(T′|c_0) that coincides with P_prp(T′|c_0) [P_prp is introduced in Sec. III A]. To this discrete-time Markov chain of undirected graphs, we associate a Markov chain of configurations in continuous time in the following way: suppose that the Markov-chain state at discrete time j is T; for a continuous time t such that j ≤ t < j + 1, we spend a fraction of the time equal to P_node(c|T) in a given configuration c of the undirected graph T [see Fig. 6 for an intuitive picture].
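To make the two-node version of this construction concrete, here is a small Python sketch of ours (not the paper's implementation; all names are our own choices): at every discrete step, both nodes of the generated graph contribute to the estimator, weighted by the fraction of continuous time P_node, and the next root is drawn with the same weights.

```python
import random

def mcmcmc_two_node(p_eq, propose, p_prp, obs, c0, n_steps, seed=0):
    """Two-node illustration of graph 'thermalization': both nodes of the
    generated graph T = {c, c1} contribute to the time average, weighted by
    P_node(c|T) ∝ P_eq(c) P_gen(T|c); the next root is drawn with the same
    weights, and a new graph is generated from it."""
    rng = random.Random(seed)
    c, total = c0, 0.0
    for _ in range(n_steps):
        c1 = propose(c, rng)
        w0 = p_eq(c) * p_prp(c1, c)     # P_eq(c)  P_gen(T|c)
        w1 = p_eq(c1) * p_prp(c, c1)    # P_eq(c1) P_gen(T|c1)
        p0 = w0 / (w0 + w1)
        total += p0 * obs(c) + (1.0 - p0) * obs(c1)  # both nodes contribute
        c = c if rng.random() < p0 else c1           # next root ~ P_node
    return total / n_steps

# Toy target on five states with weights 1..5; the exact mean of c is 40/15.
weights = [1, 2, 3, 4, 5]
est = mcmcmc_two_node(
    p_eq=lambda c: weights[c],
    propose=lambda c, rng: (c + rng.choice([-1, 1])) % 5,
    p_prp=lambda c_new, c_old: 0.5,
    obs=lambda c: c,
    c0=0,
    n_steps=200000,
)
print(est)  # close to 40/15 ≈ 2.667
```

Even in this minimal two-node case, the rejected configuration is not wasted: it enters the estimator with its P_node weight, which is exactly the effect that becomes substantial for large graphs.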
For any given P_gen ≡ P_prp, we adjust P_node to achieve an average time of P_eq(c) spent in each configuration c. It can be shown that substituting P_acc with P_node and P_prp with P_gen in Eq. (3) guarantees this equilibrium condition. In Sec. A 2 we provide a formal proof of this fact.

IV. APPLICATION TO A SPIN-GLASS MODEL

A. The Sherrington-Kirkpatrick model
We now present the implementation of the MCMCMC algorithm for the Sherrington-Kirkpatrick model, defined by the energy

E({J_jk}, {S_j}) := (1/√L) Σ_{1 ≤ j < k ≤ L} J_jk S_j S_k,

where the S_j = ±1 are Ising spins and the J_jk are random couplings.

We have compared the following algorithms for the Sherrington-Kirkpatrick model simulation: the simplest local MCMC algorithm; the MCMCMC extension of the local MCMC with the line-graph addition process [see Sec. B 1 for details]; and the graph populating process for a k-ary tree [see Sec. B 2 for details].

In Fig. 7, we present the results of the numerical comparison between the standard local Metropolis MCMC

FIG. 7. Energy per site, averaged over parallel computing cores, for a particular disorder realization of the Sherrington-Kirkpatrick model with L = 2 and T = 1, as a function of Monte Carlo steps. The MCMC dotted line represents the standard local Metropolis algorithm, while the solid lines are the MCMCMC results. The blue line is the line-graph random addition process with graph size of 2, the red line is the binary tree of depth 7, the yellow line is the ternary tree of depth 5, the violet line is the 16-ary tree of depth 3.

algorithm with a single update, and various MCMCMC techniques differing by graph structure. We see that the thermalization time is decreased by two orders of magnitude, demonstrating the success of the algorithm in dramatically reducing the thermalization time.

V. APPLICATION TO LATTICE FERMIONS

A. The Fermi-Hubbard model

We consider the doped two-dimensional fermionic Hubbard model [22–24]:

H = Σ_{k,σ} (ε_k − µ) c†_kσ c_kσ + U Σ_i n_i↑ n_i↓.   (9)

Here µ denotes the chemical potential, k the momentum, σ ∈ {↑, ↓} the spin, U the onsite repulsion strength, i labels lattice sites, and the (square-lattice) dispersion is given by

ε_k = −2t [cos(k_x) + cos(k_y)],   (10)

where t is the nearest-neighbor hopping amplitude (we set t = 1 in our units). We further define T as the temperature and n as the density. In the chosen parameter regime of T = 0. , n = 0. , and
U = 8, which represents a much investigated strongly correlated regime of the model, with stripe magnetic order in the ground state [25–27].

B. Numerical results

As traditional Quantum Monte Carlo techniques are affected by the fermionic sign problem when simulating the repulsive Hubbard model away from half filling, we use Diagrammatic Monte Carlo, which allows one to circumvent this issue by sampling Feynman diagrams directly in the thermodynamic limit. We hereafter use a MCMCMC algorithm for sets that we specifically developed for use within the Connected Determinant Diagrammatic Monte Carlo (CDet) [15] [see Sec. C for a CDet introduction]. The reader can find more details about the set-generating process we use in the Appendix, Sec. D.

To assess the relative performance of MCMCMC and MCMC, we compare the stochastic error bars obtained after a given time (about one hour) on a single CPU processor [see Fig. 8]. For a given order, we run the MCMCMC algorithm at a higher order than the standard MCMC, as this is optimal for the MCMCMC performance. Both algorithms spend around 10% of their time on normalisation. We report an improvement of up to two orders of magnitude in the computational time needed to reach a given stochastic error for the highest expansion coefficient, 12, and, importantly, we observe that the improvement grows as a function of order.

FIG. 8. Improvement in computational time needed to reach a given stochastic error for a given expansion coefficient of the double occupancy, as a function of expansion order, for the Fermi-Hubbard model in the thermodynamic limit, at T = 0.

VI. CONCLUSION

In this paper we have introduced a minimal generalization of the Markov-Chain Monte Carlo algorithm that allows one to consider multiple configurations at the same Monte Carlo step.
This new technique permits straightforward parallelization, as well as giving the possibility of using an arbitrary number of biased configurations at the same Monte Carlo step.

After giving an intuitive picture of the algorithm based on a graphical interpretation, we have presented numerical results for spin-glass and quantum fermionic models, showcasing the generality and the potential of this new technique. More specifically, we have shown that, for the Sherrington-Kirkpatrick model, a very significant reduction of the thermalization time can be achieved when many configurations are considered at each Monte Carlo step. We have further shown how we can use the knowledge from all subsets of diagram configurations within the Connected Determinant Diagrammatic Monte Carlo method to further improve its computational efficiency, and we have presented benchmark results for the paradigmatic Fermi-Hubbard model away from half filling, where a speedup of up to two orders of magnitude was observed at twelve Feynman loops.

We believe that our work has potential applications to areas where Markov-chain Monte Carlo techniques are often considered too computationally intensive due to their non-parallelizable nature, such as big-data Bayesian inference [28]. More generally, the Markov-chain Monte Carlo algorithm for multiple configurations we have introduced in this paper will be potentially applicable to situations where the thermalization time is an issue and a parallel computing machine is available, or when many biased configurations can be evaluated at little computational effort.

ACKNOWLEDGMENTS

We thank Giuseppe Carleo, Michel Ferrero, Werner Krauth, Botao Li, and Shiwei Zhang for valuable discussions. We thank Dries Sels for pointing out Ref. [9]. This work has been supported by the Simons Foundation within the Many Electron Collaboration framework. The Flatiron Institute is a division of the Simons Foundation.
This work was granted access to the HPC resources of TGCC and IDRIS under the allocations A0090510609 and A0070510609, attributed by GENCI (Grand Équipement National de Calcul Intensif).

Appendix A: General theory

1. Markov chain of configurations

In this section we review the standard MCMC algorithm. We consider a discrete configuration space 𝒮. The Markov chain is usually defined as an infinite sequence of configurations (c(j = 1), c(j = 2), . . .), c(j) ∈ 𝒮, indexed by a discrete "time" j, such that a configuration c(j + 1) is generated stochastically using only c(j) as input. Let us define P_eq(c) as the average time spent in the configuration c ∈ 𝒮:

P_eq(c) := lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{c,c(j)}.   (A1)

We also define P((c, j)) as the probability of being in the configuration c at discrete time j. We consider a Markov chain such that (at least on average)

lim_{j→∞} P((c, j)) = P_eq(c).   (A2)

The MCMC algorithm is particularly useful to compute average values of an "observable" O, which takes as arguments elements of the configuration space 𝒮:

O := Σ_{c∈𝒮} w(c) O(c) / Σ_{c∈𝒮} w(c),   (A3)

where w(c) ≥ 0. If the condition

P_eq(c) = w(c) / Σ_{c′∈𝒮} w(c′)   (A4)

is satisfied for every c ∈ 𝒮, then we can compute O as a time average of O(c(j)), where c(j) is the Markov-chain state at discrete time j:

O = lim_{J→∞} (1/J) Σ_{j=1}^{J} O(c(j)).   (A5)

2. Markov chain of graphs of configurations

In this section, we generalize the standard MCMC formalism of Sec. A 1 to a Markov chain consisting of an infinite sequence of undirected graphs of configurations (T(j = 1), T(j = 2), . . .), such that the set of nodes of T(j), V(T(j)), is contained in 𝒮, and such that T(j + 1) is stochastically generated using only T(j) as input. We can introduce the average time spent in a given graph T:

P_eq(T) = lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{T,T(j)}.
(A6)

At a continuous time t ∈ [j, j + 1[, j ∈ N, we imagine visiting a configuration c ∈ V(T(j)) with a certain probability P((c, t)|(T, j)), which we impose to be time-independent, and we call it P_node:

P_node(c|T) =: P((c, t)|(T, j)).   (A7)

We assume this probability to be normalized to one: if the graph T belongs to the Markov chain at time j ∈ N, then for every t ∈ [j, j + 1[, the system can be in one and only one node c ∈ V(T) of the graph T at a time, which is equivalent to imposing

Σ_{c∈V(T)} P_node(c|T) = 1   (A8)

for every undirected graph T.

The Markov chain of undirected graphs is built with the sole purpose of computing average values of functions defined on the nodes of the graph. Therefore, if we define

P_eq(c) = Σ_{T: V(T)∋c} P_eq(T) P_node(c|T)   (A9)

and if we impose Eq. (A4), we can compute observables in configuration space from the Markov chain in graph space:

O = lim_{J→∞} (1/J) Σ_{j=1}^{J} Σ_{c∈V(T(j))} P_node(c|T(j)) O(c),   (A10)

where O is defined in Eq. (A3). The advantage of this formulation compared to Eq. (A5) is that we can consider multiple configurations at a given Monte Carlo step j, and visit each one of them.

We now show how to compute P_node from the graph transition probability, which is a free parameter in this formulation. While we could choose an arbitrary graph transition probability, here we limit ourselves to the following choice: the new graph is generated starting from one of the vertices of the old graph,

P((T′, j + 1)|(T, j)) := Σ_{c∈V(T)∩V(T′)} P_node(c|T) P_gen(T′|c),   (A11)

where P_gen(T′|c) is the probability of generating a graph T′ starting from a node c. P_gen can be chosen in a completely arbitrary way, but a choice of P_gen constrains the form of P_node.
We will impose the detailed-balance condition in configuration space, which is defined as

P((c, t), (c′, t′)) = P((c′, t), (c, t′)),   (A12)

where c, c′ ∈ 𝒮, t < t′, and t, t′ ∈ R+ are much larger than the thermalization time. First of all, we remark that in our Monte Carlo process there are no transitions at non-integer times; therefore, we can suppose t′ = j ∈ N, and t = j − 1 + 0⁺. We then obtain

P((c, j − 1 + 0⁺), (c′, j)) = Σ_{T: V(T)∋c} P_eq(T) P_node(c|T) × Σ_{T′: V(T′)∋c′} P((T′, j)|(T, j − 1)) P_node(c′|T′).   (A13)

By using Eq. (A11), one can verify that the choice

P_node(c|T) := P_eq(c) P_gen(T|c) / P_eq(T)   (A14)

satisfies detailed balance [Eq. (A12)]. We remark that the probability of being in the undirected graph T, P_eq(T), does not need to be computed, as one can use the normalization condition [see Eq. (A8)].

3. Markov chain of sets of configurations

In this section we consider a version of the Markov-chain algorithm for sets of configurations. We consider a Markov chain consisting of an infinite sequence of sets of configurations (S(j = 1), S(j = 2), . . .), such that S(j) is a set of fixed cardinality n ∈ N, n > 0, contained in 𝒮, and such that S(j + 1) is stochastically generated using only S(j) as input. We can introduce the average time spent in a given set S ⊆ 𝒮 as

P_eq(S) = lim_{J→∞} (1/J) Σ_{j=1}^{J} δ_{S,S(j)}.   (A15)

At a continuous time t ∈ [j, j + 1[, j ∈ N, we visit a subset S′ ⊊ S with a certain probability P((S′, t)|(S, j)), which we impose to be time-independent:

P_subset(S′|S) =: P((S′, t)|(S, j)).
(A16)

We assume this probability to be normalized to one: if S is the Markov-chain state at discrete time j, then we must be in one and only one subset S′ ⊊ S for every continuous time t ∈ [j, j + 1[, which is equivalent to imposing

Σ_{S′⊊S} P_subset(S′|S) = 1   (A17)

for every S ⊆ 𝒮 such that |S| = n. We would like to evaluate

O_u := Σ_{S⊆𝒮: |S|=u} w(S) O(S)   (A18)

for u ∈ {1, 2, . . . , n}, and w(S) ≥ 0. We introduce λ_u > 0, u ∈ {1, . . . , n − 1}, and define

N_n := Σ_{S⊆𝒮: |S|=n} . . .

Using Eq. (A26) and Eq. (A20), we see that the explicit λ_u factor in Eq. (A21) simplifies to one, and we can safely take the λ_u → 0⁺ limit for some u ∈ {1, 2, . . . , n − 1}. This means that we can accumulate statistics for sets that are never visited by the Markov chain.

Appendix B: Generating graphs of configurations

In this section we describe some choices for the graph-generation probability P_gen introduced in Sec. A 2. We introduce two classes of graph-generation processes: addition processes [Sec. B 1], and populating processes [Sec. B 2].

1. Line-graph addition process

We consider here the following process: starting from the configuration c, we create a new configuration c′ with a probability given by some function P_gen(c′|c), and we consider the graph with an edge between c and c′. Suppose now that we have created a graph consisting of c_0, c_1, . . . , c_{l−1}, such that c_k is connected by an edge to c_{k+1}. With probability 1/2, we add a new vertex c′ connected to c_0 with the probability distribution P_gen(c′|c_0); otherwise, we add the vertex c′ to the graph, connecting it to c_{l−1} with the probability distribution P_gen(c′|c_{l−1}). If T is the line graph with c_0, c_1, . . .
, c_{L−1} as nodes, one therefore has

P_{gen}(T | c_k) = \frac{1}{2^{L−2}} \binom{L−1}{k} \left( \prod_{k_l = 0}^{k−1} P_{gen}(c_{k_l} | c_{k_l + 1}) \right) × \left( \prod_{k_r = k+1}^{L−1} P_{gen}(c_{k_r} | c_{k_r − 1}) \right).   (B1)

2. General graph populating process

We now present a graph-generation process that is amenable to analytic treatment. The key idea is first to generate an "empty" graph, where no node is assigned, and then to populate the graph nodes with configuration values in such a way that the computation of P_{gen}(T | c) becomes straightforward. We consider specifically a tree T. We choose a root r for the tree, r ∈ {0, 1, ..., |V(T)| − 1}, with a given probability distribution p_r, which determines a rooted tree R. From the root r, we can populate the tree by generating a configuration for each edge of the rooted tree with the probability distribution P_{gen}(c | c'), where c and c' are connected by an edge going from c to c'. We can therefore write, assuming that c is the root of the rooted tree R,

P_{gen}(T | c) = p_r \prod_{(c, c') ∈ E(R)} P_{gen}(c | c').   (B2)

One has complete freedom in the choice of p_r: the uniform choice is simply p_r = 1 / |V(T)|, which gives a self-thermalizing tree if P_{gen} is symmetric.

Appendix C: Introduction to Connected Determinant Diagrammatic Monte Carlo

Below we provide a very brief overview of CDet. We are interested in computing a physical observable C(ξ), which we express in terms of an infinite power series

C(ξ) := \sum_{n=0}^{∞} c_n ξ^n,   (C1)

where the expansion coefficients c_n are computed from the stochastic sampling of multi-dimensional integrals over a set of n internal variables {X_1, . . .
, X_n} =: S, corresponding to vertex positions in some arbitrary space:

c_n = \frac{1}{N_n} \int_S c(S),   (C2)

where N_n is a normalisation constant and c(S) represents the sum of the weights of all topologically distinct connected graphs T constructed from the vertices in S and obeying additional rules imposed by the choice of model and observable. The weight of a particular graph is given by the product of the weights of its edges E(S)[i, j], which are functions of two vertex positions X_i and X_j. We can write

c(S) = \sum_T \prod_{\{i, j\} ∈ T} E(S)[i, j].   (C3)

In general, the number of distinct connected graph topologies grows factorially with the number of vertices, and so does the computational cost of evaluating c(S). However, in many cases it is possible to compute the related sum of the weights of all connected and disconnected graphs, a(S), at only polynomial or exponential cost [29]. In order to obtain c(S), one needs to eliminate all disconnected diagrams from a(S) using the recursive formula

c(S) = a(S) − \sum_{S' ⊊ S, S' \ni X} c(S') a(S \setminus S'),   (C4)

where, in order to properly define connectivity, the sum is over all subsets S' containing the arbitrarily chosen vertex X from S. The cost of evaluating Eq. (C4) scales as O(3^n) with the number of vertices and, sometimes together with the evaluation of a(S), represents the computational bottleneck of the CDet algorithm. It is important to note from Eq. (C4) that, in order to compute c(S) for a given set of vertices, one is obliged to compute c(S') for all of its subsets S'; these can be reused in computing the lower-order expansion coefficients c_{n'} with n' < n. This means that in the process of computing the weight at order n we generate an exponential number (2^n − 1) of lower-order weights as a side product.

Appendix D: Generating sets of configurations

1.
Stochastic generation of the set

In this section we define the stochastic process that we perform to generate a set S of cardinality n ∈ \mathbb{N}, given a subset S' ⊊ S with |S'| = m. Let S'_m = S'. Given S'_k, k ∈ {m, m + 1, ..., n − 1}, we generate S'_{k+1} by adding a vertex to S'_k in the following way: we choose one element e' of S'_k randomly, we generate a new element e with some arbitrary probability distribution P_{gen}(e | e'), and we add it to S'_k to generate S'_{k+1} := S'_k ∪ {e}. We then identify S'_n ≡ S.

2. Recursive exponential formula for the generation probability

We now show how to compute P_{gen}(S | S') for S' ⊊ S, for a given S of cardinality |S| = n ∈ \mathbb{N}. This is needed to compute P_{subset} [see Eq. (A26)]. A brute-force approach would result in an algorithm that scales factorially with n. We are going to present a recursive formula whose computational cost scales only exponentially with n. We define

\tilde{P}_S(S') := P_{gen}(S | S \setminus S')   (D1)

for S' ⊊ S. One has

\tilde{P}_S(\{e'\}) = \frac{1}{|S| − 1} \sum_{e'' ∈ S \setminus \{e'\}} P_{gen}(e' | e'')   (D2)

for e' ∈ S. For S' ⊊ S, the Chapman-Kolmogorov equation can be written as

\tilde{P}_S(S') = \frac{1}{|S \setminus S'|} \sum_{e'' ∈ S \setminus S'} \sum_{e' ∈ S'} P_{gen}(e' | e'') \, \tilde{P}_S(S' \setminus \{e'\}).   (D3)

We can express the previous equation in words: in order to generate the set S from the set S \setminus S', we first generate an element e' of S' starting from one uniformly chosen element e'' of S \setminus S', and we then generate the set S from the set (S \setminus S') ∪ {e'}. We note that the cardinality of the subset on the l.h.s. of Eq.
(D3) is one higher than that of the r.h.s., and therefore this equation can be solved recursively. This leads to a computational cost of O(n^2 2^n). It is possible to improve upon this scaling by introducing a "cumulative" probability distribution \tilde{P}_{gen}(S', e'), which we compute recursively as

\tilde{P}_{gen}(S', e') = \sum_{e'' ∈ S, e'' ≠ e'} P_{gen}(e' | e'')   if |S'| = 1,
\tilde{P}_{gen}(S', e') = \tilde{P}_{gen}(S' \setminus \{e_{S'}\}, e') − P_{gen}(e' | e_{S'})   otherwise,   (D4)

where e_{S'} ∈ S' can be chosen arbitrarily as long as e_{S'} ≠ e'. Then we can rewrite Eq. (D3) as

\tilde{P}_S(S') = \frac{1}{|S \setminus S'|} \sum_{e' ∈ S'} \tilde{P}_{gen}(S', e') \, \tilde{P}_S(S' \setminus \{e'\}),   (D5)

which now has a computational cost of only O(n 2^n). This computational cost is negligible in comparison to the O(3^n) asymptotic computational scaling of the CDet [15] algorithm used for the evaluation of connected Feynman diagrams.

[1] N. Metropolis, A. W. Rosenbluth, M. Rosenbluth, A. H. Teller, and E. Teller, Journal of Chemical Physics 21, 1087 (1953).
[2] W. Krauth, Statistical Mechanics: Algorithms and Computations, Vol. 13 (OUP Oxford, 2006).
[3] J. Gubernatis, N. Kawashima, and P. Werner, Quantum Monte Carlo Methods (Cambridge University Press, 2016).
[4] F. Becca and S. Sorella, Quantum Monte Carlo Approaches for Correlated Systems (Cambridge University Press, 2017).
[5] D. Gamerman and H. F. Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (CRC Press, 2006).
[6] L. Lovász and P. Winkler, Microsurveys in Discrete Probability, 85 (1998).
[7] R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 (1986).
[8] G. Altekar, S. Dwarkadas, J. P. Huelsenbeck, and F.
Ronquist, Bioinformatics 20, 407 (2004).
[9] P. E. Jacob, J. O'Leary, and Y. F. Atchadé, arXiv preprint arXiv:1708.03625 (2017).
[10] E. P. Bernard, W. Krauth, and D. B. Wilson, Phys. Rev. E 80, 056704 (2009).
[11] M. Michel, S. C. Kapfer, and W. Krauth, Journal of Chemical Physics 140, 054116 (2014).
[12] B. Li, S. Todo, A. Maggs, and W. Krauth, Computer Physics Communications, 107702 (2021).
[13] J. Dick, F. Y. Kuo, and I. H. Sloan, Acta Numerica 22, 133–288 (2013).
[14] M. Maček, P. T. Dumitrescu, C. Bertrand, B. Triggs, O. Parcollet, and X. Waintal, Phys. Rev. Lett. 125, 047702 (2020).
[15] R. Rossi, Phys. Rev. Lett. 119, 045701 (2017).
[16] F. Šimkovic and E. Kozik, Phys. Rev. B 100, 121102 (2019).
[17] A. Moutenet, W. Wu, and M. Ferrero, Phys. Rev. B 97, 085117 (2018).
[18] R. Rossi, arXiv preprint arXiv:1802.04743 (2018).
[19] R. Rossi, F. Šimkovic, and M. Ferrero, EPL (Europhysics Letters) 132, 11001 (2020).
[20] F. Šimkovic, R. Rossi, and M. Ferrero, Phys. Rev. B 102, 195122 (2020).
[21] G. Parisi, Phys. Rev. Lett. 43, 1754 (1979).
[22] J. Hubbard, in Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Vol. 276 (The Royal Society, 1963) pp. 238–257.
[23] P. W. Anderson, Solid State Physics 14, 99 (1963).
[24] P. W. Anderson et al., The Theory of Superconductivity in the High-Tc Cuprate Superconductors, Vol. 446 (Princeton University Press, Princeton, NJ, 1997).
[25] J. P. F. LeBlanc, A. E. Antipov, F. Becca, I. W. Bulik, G. K.-L. Chan, C.-M. Chung, Y. Deng, M. Ferrero, T. M. Henderson, C. A. Jiménez-Hoyos, E. Kozik, X.-W. Liu, A. J. Millis, N. V. Prokof'ev, M. Qin, G. E. Scuseria, H. Shi, B. V. Svistunov, L. F. Tocchio, I. S. Tupitsyn, S. R. White, S. Zhang, B.-X. Zheng, Z. Zhu, and E. Gull (Simons Collaboration on the Many-Electron Problem), Phys. Rev. X 5, 041041 (2015).
[26] B.-X. Zheng, C.-M. Chung, P. Corboz, G. Ehlers, M.-P. Qin, R. M. Noack, H. Shi, S. R.
White, S. Zhang, and G. K.-L. Chan, Science 358, 1155 (2017).
[27] M. Qin, C.-M. Chung, H. Shi, E. Vitali, C. Hubig, U. Schollwöck, S. R. White, and S. Zhang (Simons Collaboration on the Many-Electron Problem), Phys. Rev. X 10, 031016 (2020).
[28] R. Bardenet, A. Doucet, and C. Holmes, Journal of Machine Learning Research 18, 1 (2017).
[29] This is often done by computing determinants or permanents of n × n matrices M(S) with edge weights as entries: (M(S))_{ij} = E(S)[i, j].