Performance of Domain-Wall Encoding for Quantum Annealing
Jie Chen, Tobias Stollenwerk, and Nicholas Chancellor
Department of Physics; Joint Quantum Centre (JQC) Durham-Newcastle, Durham University, South Road, Durham, DH1 3LE, UK
German Aerospace Center (DLR), Linder Höhe, 51147 Cologne, Germany
(Dated: December 2020)

In this paper we experimentally test the performance of the recently proposed domain-wall encoding of discrete variables from [Chancellor Quantum Sci. Technol. 4 045004] on Ising model flux qubit quantum annealers. We compare this encoding with the traditional one-hot method and find that it outperforms the one-hot encoding for three different problems at different sizes, both of the problem and of the variables. From these results we conclude that the domain-wall encoding yields superior performance against a variety of metrics; furthermore, we do not find a single metric by which one-hot performs better. We even find that a 2000Q quantum annealer with a drastically less connected hardware graph but using the domain-wall encoding can outperform the next generation Advantage processor if that processor uses one-hot encoding.
I. INTRODUCTION
Quantum annealing is a subject of much recent interest, because of recent advances in both theory and experimental implementations. After the initial numerical studies which pointed to quantum annealing as a potential tool for optimization [1], focus was mostly on relatively simple closed systems in the adiabatic limit [2, 3]. However, a wide variety of advances have now taken place, for example better understanding of the role noise plays [4], more rapid quenches [5–8], and how to incorporate quantum annealing into hybrid protocols [9–12]. Experimentally this field is exciting because it allows for large scale experiments on superconducting hardware designed to solve difficult optimization problems. Proof-of-concept studies have taken place on a diverse range of topics including aerospace problems [13, 14], hydrology [15], radar waveform design [16], scheduling [17–19] and traffic flow optimization [20, 21]. While to our knowledge a scaling advantage for optimization has yet to be seen, signs of a potential advantage have been observed in recent quantum simulation experiments [22].

The problems which these devices solve are encoded as energy minimization with respect to quadratic unconstrained binary optimization (QUBO) Hamiltonians (also known as penalty functions) of the form

$$H_{\mathrm{QUBO}} = \sum_{i,j \in \chi} Q_{ij} b_i b_j,$$

where $b_i \in \{0, 1\}$ encodes the value of qubit $i$ and $Q$ is the QUBO matrix which defines the problem [23]. Similarly, one could use the equivalent Ising formulation by substituting $b_i = (1 - z_i)/2$ with $z_i \in \{1, -1\}$. The interactions are constrained to a graph $\chi$, but as long as the graph obeys certain structural constraints, arbitrary connectivity can be mapped using a technique known as minor embedding [24, 25], where variables within a graph minor are joined using strong ferromagnetic (negative) interactions. These joined variables are referred to as chains, and the strength of the interactions is referred to as chain strength.
An alternative approach to mapping is to use parity constraints [26–29], but we use minor embedding methods for this paper because they are more commonly used, and because quantum Monte-Carlo studies have suggested that this is a better method for the kind of devices we study here [30].

Since solving a QUBO is known to be NP-hard, any problem in NP can be mapped to a QUBO with only a polynomial overhead. One particular mapping which is common is a discrete-to-binary mapping, where discrete variables with more than two values are mapped to binary variables. The traditional way to do this is to use a kind of constraint known as a one-hot constraint, which requires that exactly one of a set of qubits be in the $|1\rangle$ configuration.

To construct a quantum algorithm to solve a QUBO problem, quantum annealing is performed. The Hamiltonian which describes this process includes Pauli $X$ terms ($X_i$) which introduce quantum mechanical (qu)bit flips,

$$H(A, B) = -A(t) \sum_i X_i + B(t) H_{\mathrm{QUBO}},$$

where the protocol starts out in an equal positive superposition of all possible solutions with $A(t=0)/B(t=0) \gg 1$, and ends with $B(t=t_f)/A(t=t_f) \gg 1$. Note that the first term, which is called the driver, can be replaced by other operators which do not commute with the second operator. While we are not concerned with the detailed physics of how these devices operate in the present study, it is worth remarking that the devices we study here operate in a highly dissipative regime, where interactions with a low-temperature environment play an important role in the dynamics.

Recently it has been demonstrated in [31] that a different way of encoding discrete variables, known as domain-wall encoding, can lead to problem structures which make minor embedding more efficient and use fewer variables than one-hot encodings, while still allowing arbitrary interactions between the variables. This study was purely numerical and theoretical, and furthermore pointed out that the qubit flips explore the solution space in fundamentally different ways for one-hot versus domain-wall encoded problems. The domain-wall encoding has found use in quantum simulations of quantum field theories [32, 33] and has been used in proof-of-concept experiments for using quantum fluctuations to guide searches on annealers [34]. To our knowledge, however, there has never been a direct experimental test of the relative performance between domain-wall and one-hot encodings.

Since the solution space is not explored in the same way for the two encodings, it is not a priori clear that the more efficient embedding will translate to improved problem solving abilities. The search could be less effective in a way which negates the gains from improved embedding. However, when we perform the experiments on several examples we find that the domain-wall encoding does indeed lead to an improvement over many different metrics. These experiments are performed on two different quantum processing units (QPUs) manufactured by D-Wave Systems Inc.
which have different allowed interaction graphs: an older, less connected generation (2000Q), and a newer, more connected one (Advantage). We find that, at least by some metrics, the use of the more sophisticated domain-wall encoding can make more of a difference to the ability to solve problems than the re-engineered hardware graphs. Although not the primary goal of this paper, we also compare between the two QPU architectures (and between all QPU-encoding combinations); this allows us to compare the gains from using the domain-wall encoding to those attained by using a more connected architecture.

II. DISCRETE QUADRATIC MODELS
While the native models for quantum annealers are quadratic unconstrained binary optimization problems (QUBOs), many real world problems are most naturally expressed in terms of discrete variables, in a way which still involves pairwise interactions between terms; these are referred to as discrete quadratic models (DQMs). A DQM can be described by a set of discrete variables $d_i$, $i \in [n-1]$. The values of $d_i$ do not need to be integers or even numbers; they can be drawn from any discrete set, for example colors in a coloring problem, but are indexed in order by the index $i$. A DQM Hamiltonian can be written as an extension of a QUBO Hamiltonian with an additional index denoting the variable value,

$$H_{\mathrm{DQM}} = \sum_{i,j} \sum_{\alpha,\beta} D_{(i,j,\alpha,\beta)} x_{i,\alpha} x_{j,\beta},$$

where

$$x_{i,\alpha} = \begin{cases} 1, & \text{if } d_i \text{ takes value } \alpha, \\ 0, & \text{otherwise}, \end{cases}$$

and $D_{(i,j,\alpha,\beta)}$ defines the pairwise interactions between the variables. For the sake of simplicity, we further constrain the values of the discrete variables to consecutive integers $\alpha \in [m-1]$ in this work. An example of such discrete variables is the colors $\alpha$ of vertex $i$ in graph coloring problems. The extension to arbitrary sets of discrete values is straightforward; see [14] for an example.

A. Domain-Wall and One-Hot Encoding
The discrete variable $d_i$ is encoded in multiple binary variables $x_{i,\alpha}$. We now quickly review the encoding methods which are commonly used for these discrete variables. The traditional method is known as one-hot encoding. Here, each qubit corresponds to one possible value of the discrete variable, and a constraint which specifies that the variable only takes one value has to be imposed,

$$\forall i: \quad \sum_{\alpha} x_{i,\alpha} = 1. \tag{1}$$

This constraint can be enforced by adding a quadratic penalty term to the Hamiltonian,

$$H_{\mathrm{one\,hot}} = H_{\mathrm{DQM}} + \lambda \sum_i \left( \sum_{\alpha=0}^{m-1} x_{i,\alpha} - 1 \right)^2. \tag{2}$$

Hence, we can write the discrete variables as $d_i = \sum_{\alpha=0}^{m-1} \alpha x_{i,\alpha}$. However, from the perspective of physical implementation on a real device, one-hot encoding has the undesirable property that all qubits used to encode a variable must be able to interact with all others used to encode that same variable.

It was shown in [31] that instead using a “domain wall” encoding strategy can encode discrete variables with one fewer qubit per variable and does not require interactions between all qubits to implement the constraint. This work found that minor embedding was more efficient when the domain-wall encoding was used. While a full description of this encoding strategy can be found in [31], and Python code to implement the domain-wall encoding can be found at [35], we review the key details here in the interest of making this manuscript self contained. The underlying principle of the domain-wall encoding is to use the degeneracy of domain wall positions on a segment of a frustrated Ising spin chain, as shown in Figure 1. For each discrete variable $d_i$, we have $m-1$ Ising spin variables $s_{i,\alpha}$, with the boundary spins fixed to $s_{i,-1} = -1$ and $s_{i,m-1} = 1$.

FIG. 1: Domain wall encoding scheme, showing the spin chain $s_{i,-1}, s_{i,0}, s_{i,1}, s_{i,2}, \cdots, s_{i,m-2}, s_{i,m-1}$ with couplings $\kappa$ between neighboring spins. The value of the discrete variable $d_i$ is given by the position $\alpha \in [m-1]$ of the domain wall in a chain with fixed end spins $s_{i,-1} = -1$ and $s_{i,m-1} = 1$.

The variable $d_i$ is correctly encoded if there is a single domain wall in the Ising chain.
This is enforced by the penalty Hamiltonian describing a ferromagnetic coupling between the Ising spin variables,

$$H_{\mathrm{chain}} = -\kappa \left( \sum_{\alpha=-1}^{m-2} s_{i,\alpha} s_{i,\alpha+1} \right), \tag{3}$$

where $\kappa$ is a coupling large enough to enforce a single domain wall in the ground state (cf. [31]). Then, the binary variable $x_{i,\alpha}$ depends on the values of the spin variables according to

$$x_{i,\alpha} = \frac{1}{2} \left( s_{i,\alpha} - s_{i,\alpha-1} \right), \quad \forall\, i, \alpha \in [n-1] \times [m-1],$$

and Equation (1) is fulfilled. Note that only $m-1$ binary variables are needed to encode $m$ different values. This is one fewer than in the one-hot encoding case. Hence, the total Hamiltonian for the DQM reads

$$H_{\mathrm{domain\,wall}} = H_{\mathrm{DQM}} + H_{\mathrm{chain}}.$$

Note that this is a function purely of the $(m-1)n$ (inner) spin variables $\{ s_{i,\alpha} \mid i \in [n-1], \alpha \in [m-2] \}$. The conversion back to a QUBO is straightforward.

B. k-Coloring

One type of problem we use to contrast the performance of different QPU-encoding combinations is maximum coloring problems. Formally these are Max-$k$-Colorable Subgraph [36] problems, or equivalently Max-$k$-Cut problems [37, 38]. However, since we do not consider any other types of coloring problems in the paper, we refer to these problems as $k$-coloring problems without fear of ambiguity. These problems consist of finding ways to color a graph with $q$ nodes using $k$ colors such that nodes of the same color are in contact with each other in as few places as possible. For graphs which are colorable this reduces to finding a coloring of the graph. We use randomly generated Erdős-Rényi graphs [39, 40], where for each pair of nodes the presence or absence of an edge is independently and randomly decided with a given probability.

In the one-hot encoding scheme for the Max-$k$-Colorable Subgraph problem [36], we have $k = m$ decision variables $x_{i\alpha}$ for each node $i$ in the graph, one for every color $\alpha \in [m-1]$, each of which is 1 if node $i$ has color $\alpha$. The Hamiltonian for a graph $G = ([n-1], E)$ reads

$$H_{\mathrm{DQM}} = \sum_{\alpha=0}^{m-1} \sum_{(i,j) \in E} x_{i\alpha} x_{j\alpha}.$$

Following the procedure in [31], we consider two classes of $k$-coloring problems. For the first we fix $k = 3$ and vary $q$; we refer to these as maximum three coloring problems, or simply three coloring problems. For these problems we use graphs with an edge probability of 0.[...]. The second class consists of $k$-coloring problems where both $k$ and $q$ are varied, with $q = 2k$ and an edge probability of 0.75; we refer to these as maximum $k$-coloring problems, or simply $k$-coloring problems.

C. Flight-Gate Assignment
The Flight-Gate Assignment problem was already investigated for QAOA [41] and for quantum annealing in the one-hot encoding scheme [13]. We want to assign $n$ flights to $m$ gates using the decision variable

$$x_{i\alpha} = \begin{cases} 1, & \text{if flight } i \text{ is assigned to gate } \alpha, \\ 0, & \text{otherwise}, \end{cases}$$

for $i \in [n-1]$ and $\alpha \in [m-1]$. The DQM Hamiltonian reads

$$H_{\mathrm{DQM}} = \sum_{i\alpha} \left( n^{d}_i t^{d}_\alpha + n^{a}_i t^{a}_\alpha \right) x_{i\alpha} + \sum_{ij\alpha\beta} n_{ij} t_{\alpha\beta} x_{i\alpha} x_{j\beta},$$

where the various problem parameters are listed in Table I.

Symbol                 Meaning
$n^{d}_i$              Number of passengers departing with flight $i$
$n^{a}_i$              Number of passengers arriving with flight $i$
$n_{ij}$               Number of transfer passengers between flights $i$ and $j$
$t^{\mathrm{in}}_i$    Arrival time of flight $i$
$t^{\mathrm{out}}_i$   Departure time of flight $i$
$t^{a}_\alpha$         Transfer time from gate $\alpha$ to baggage claim
$t^{d}_\alpha$         Transfer time from check-in to gate $\alpha$
$t_{\alpha\beta}$      Transfer time from gate $\alpha$ to gate $\beta$
$t^{\mathrm{buf}}$     Buffer time between two flights at the same gate

TABLE I: Flight-Gate Assignment problem parameters

There are two hard constraints in the problem. First, a flight should be assigned to exactly one gate. This is represented by Equation (1). Second, flights with temporal overlap are not allowed to be assigned to the same gate,

$$\forall \alpha, \ \forall (i,j) \in E: \quad x_{i,\alpha} \cdot x_{j,\alpha} = 0,$$

where

$$E = \left\{ (i,j) \ \middle|\ \left( t^{\mathrm{in}}_i - t^{\mathrm{out}}_j < t^{\mathrm{buf}} \right) \wedge \left( t^{\mathrm{in}}_j - t^{\mathrm{out}}_i < t^{\mathrm{buf}} \right) \right\}$$

is the set of forbidden flight pairs. This second constraint is enforced by adding the term

$$H_{\mathrm{temp}} = \mu \sum_{\alpha} \sum_{(i,j) \in E} x_{i\alpha} x_{j\alpha} \tag{4}$$

to the total Hamiltonians (2) and (3), respectively. Again the penalty weight $\mu$ must be sufficiently large to ensure that the constraint is satisfied in the ground state (see [13] for details). Note that both constraints together are equivalent to the proper coloring of a graph with edges $E$. This relation to graph coloring is extensively discussed e.g. in [41]. Due to precision problems of the D-Wave quantum annealer, it was found in [13] that the largest instances with non-vanishing success probability were 29 instances with $n = 7$ flights and $m = 2$. We used these instances for our study.

III. EXPERIMENTAL SETUP AND PERFORMANCE MEASURES
In this section, we discuss the experimental setup andmeasures for comparing the performance of domain-walland one-hot encoding.
A. Chain and constraint strengths
To experimentally study these problems we need to decide upon a method to choose the strength of both the embedding chains and the constraints used to restrict the number of domain walls to one or to enforce the one-hot constraint (i.e. $\lambda$, $\mu$ and $\kappa$ in Equations (2), (3) and (4)). Since the goal of this paper is to compare rather than to develop an absolute benchmark, it is not crucial that these choices be completely optimal, but for these comparisons to be relevant to real calculations we should still ensure that we are operating in a regime where the behavior is likely to be similar to that of the optimal choices. For the chain strength we use the “uniform torque compensation” [42] feature which is available through the D-Wave Ocean software repository [43]. On the other hand, no such feature exists for finding the strength of the constraints. However, a practical approach is to choose the strength of the constraint parameters $\lambda$, $\kappa$ and $\mu$ equal to the magnitude of the largest single field or coupler in the problem definition before embedding. Both of these choices have the advantage of being “automatic” in the sense of adjusting based on the structure of the problem at hand. Since the goal of this paper is to make a comparison of performance on the same footing, rather than to test the optimal performance of the device, we do not need to show that our parameter choices are the best possible, only that they perform reasonably well. To verify that these choices are sensible, we compare different constraint choices for a size 15 three coloring problem in Figure 2. We find that our strategy for choosing the constraint strength is roughly optimal for 2000Q, but slightly suboptimal for Advantage. While this performs well enough for the purposes of the analysis we do here, it is worth keeping in mind that there is potential room for further gains within our Advantage data by finding a better performing parameter setting heuristic.
FIG. 2: The cost function C, which is equal to the total number of places at which the same colors touch for the three coloring problem with 15 nodes, averaged over 100 instances, versus the constraint strength. (Horizontal axis: constraint value, with ticks at “largest term”, 1, 2 and 3; curves: one hot Advantage, domain wall 2000Q, domain wall Advantage.)
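The “largest term” rule for the constraint strengths can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the experiments: the function name `largest_term_strength`, the dictionary layout for the fields `h` and couplers `J`, and the choice to evaluate the rule on the Ising form of the unembedded problem are all assumptions made for this example.

```python
def largest_term_strength(h, J):
    """Return the magnitude of the largest single field or coupler.

    h: dict mapping a variable label to its field (hypothetical layout).
    J: dict mapping a pair of variable labels to its coupler.
    """
    magnitudes = [abs(value) for value in h.values()]
    magnitudes += [abs(value) for value in J.values()]
    if not magnitudes:
        raise ValueError("problem has no fields or couplers")
    return max(magnitudes)


# Toy problem definition (before embedding); the values are illustrative only.
h = {0: 0.5, 1: -1.5, 2: 0.0}
J = {(0, 1): 1.0, (1, 2): -2.0}

strength = largest_term_strength(h, J)
# Use the same value for the one-hot, domain-wall and temporal penalties.
penalties = {"lambda": strength, "kappa": strength, "mu": strength}
print(penalties)
```

Because the rule depends only on the problem definition, the same helper can be applied unchanged to every instance, which is what makes the choice “automatic”.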
B. Experimental procedures
All experiments reported here were performed in the autumn of 2020. They were performed using the default anneal time of 5 µs, and 100 anneals were performed in each run using a single embedding (we found that the performance variation between different embeddings was negligible). All embeddings were performed using the minorminer embedding software provided by D-Wave Systems Inc. [44]. If an embedding failed, we attempted it twice more to verify that this failure was not an anomaly. We found that for all problem classes (for example a particular size of graph for three coloring or number of colors for k-coloring), for any given QPU/embedding combination either all problem embeddings failed or they all succeeded. Except when explicitly stated otherwise in the appendix, broken chain decoding was performed using the majority vote decoding tool included in the software package. Spin reversal transforms (sometimes referred to as gauge averaging) were not used. Experimental data are available in a public repository [45].

Numerical analysis and plotting were performed using MATLAB and the Python programming language [46]; in particular, heavy use was made of the numpy [47, 48] and matplotlib [49] packages, as well as Jupyter notebooks [50, 51].

C. Hypothesis testing
One method to compare between pairs of QPU-encoding combinations is to run them on the same problems and see how many times each can outperform the other, ignoring cases where they each find equally good solutions. The immediate question then becomes how statistically significant a given sample is. In other words, how likely are we to see a result which is at least as favorable by random chance? To quantify this significance, we perform hypothesis testing.

To this end, we partition the solutions where one QPU-encoding combination outperforms the other into two classes. First, let the number of cases where the combination we expect to do better (the “expected winner”) has performed better than the combination it is being compared to be $n_b$. Second, let the number of cases where the “expected winner” performs worse be $n_w$. This allows us to calculate the statistical significance,

$$p = \frac{1}{2^{n_b + n_w}} \sum_{k=n_b}^{n_b + n_w} \binom{n_b + n_w}{k}. \tag{5}$$

This is effectively the probability that the expected winner could perform better at least as many times if the better performing QPU-encoding combination were chosen at random with 50% probability (our null hypothesis). In other words, it is the probability that the result (or an even more favorable one) could happen by chance if both performed equally well.

By convention, $p < 0.05$ is considered to be a statistically significant result rejecting the null hypothesis [52], and therefore confirming that the expected winner does indeed perform better. By symmetry, $p > 0.95$ is also a statistically significant result, but one rejecting the alternate hypothesis that the expected winner was chosen correctly, and therefore showing that we chose the expected winner incorrectly. A value of $0.05 < p < 0.95$ is not statistically significant and indicates that an insufficient number of samples have been taken to draw any conclusions based on our hypothesis testing strategy.
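As a hedged illustration of this significance test, the following Python sketch evaluates the one-sided binomial tail, assuming the sum runs over outcomes in which the expected winner does better at least $n_b$ times out of $n_b + n_w$ fair coin flips. The function name `sign_test_p_value` is chosen for this example and is not taken from the analysis code behind the paper.

```python
from math import comb


def sign_test_p_value(n_better, n_worse):
    """Probability of at least n_better wins in n_better + n_worse fair trials."""
    n = n_better + n_worse
    if n == 0:
        # Both combinations tied on every problem: significance is undefined.
        raise ValueError("no decisive comparisons")
    # Count the outcomes with k >= n_better wins, then normalise by 2^n.
    tail = sum(comb(n, k) for k in range(n_better, n + 1))
    return tail / 2 ** n


if __name__ == "__main__":
    print(sign_test_p_value(42, 0))  # equals 2**-42, roughly 2.3e-13
    print(sign_test_p_value(1, 1))   # 0.75, far from significant
```

Exact integer arithmetic is used for the binomial coefficients so that very small tails, such as those arising when one combination wins every decisive comparison, are not lost to rounding before the final division.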
D. Performance Measures
To understand the effect of encoding and QPU choice on the ability of the device to solve the problem, we choose to analyze four performance metrics. The first is the fraction of raw solutions which do not contain any broken embedding chains, $R_{\mathrm{chain}}$. This measure allows us to get a sense of how faithfully the device is able to represent the problem, since solutions with broken chains no longer correspond to valid solutions.

The second quantity we examine is the rate of solutions where, after majority vote decoding is performed on any broken chains, the solutions satisfy all of the one-hot or domain-wall constraints. We call this rate $R_{\mathrm{enc}}$.

We next define the cost function $C$, which is the quantity we are attempting to minimize subject to our constraints. By convention, if an annealing run has not returned any solutions where all the constraints are satisfied, then we define the effective value of $C$ to be infinite, since no valid solution is returned. Therefore, as we have defined it, the average cost function is infinite if even a single problem does not yield a valid solution.

Finally, we define the success probability $P$ as the fraction of problems for which the optimal value of $C$ was found. Recall however that we only perform 100 anneals per run, so it is likely that higher success probabilities could be found by increased sampling, potentially also using spin reversal transforms, or even more advanced tricks like reverse annealing [9, 11, 53], pausing [54], sample persistence [55], anneal offsets [56], or extended coupling range [57]. The goal of this study is a relative comparison between QPUs and encoding methods rather than a benchmark of the best possible performance which can be attained using these devices, and for this reason we have elected to keep our experiments simple at the expense of cutting-edge performance.

IV. RESULTS
In this section, we present our results by showing the four performance measures for the flight gate assignment problem, the three-coloring problem and the k-coloring problem. We also show our results on the hypothesis testing for the three-coloring and k-coloring problems.

A. Flight Gate Assignment
As discussed in Section II C, all studied flight gate assignment instances have $m = 2$. Since the encoding of a discrete variable of size two into a domain-wall encoding reduces to a direct binary encoding, it is not mathematically possible for the domain-wall constraint to be violated in these cases. On the other hand, we find that the one-hot constraint is only satisfied in 71% and 65% of solutions on average for 2000Q and Advantage respectively. We do find some chain breaks in all cases, but they are so rare that it is not possible to reliably differentiate which encoding performs better based on our data, although we have observed that the domain-wall encoding seems to perform slightly better. We observe that the problem is solved after 100 reads in all cases except for three using the one-hot encoding on Advantage.

To study what effect chain breaks would have on larger systems, we set embedding chain strengths to intentionally suboptimal values as opposed to using the uniform torque compensation tool, which performs well in all cases. The results are shown in Figure 3. We find that the domain-wall encoding performs as well as or better than one-hot against all metrics, regardless of the QPU used. What is particularly striking is that the domain-wall encoding was able to solve all instances at all chain strengths for both processors, while the one-hot encoding was only able to achieve this for the uniform torque compensation scheme.
FIG. 3: Mean of the four performance measures against the chain strength (uniform torque, 1, 2, 3) for the 29 instances of the flight gate assignment problem for all four QPU-encoding combinations: (a) rate of unbroken chains in physical solutions, $R_{\mathrm{chain}}$; (b) rate of correctly encoded logical solutions, $R_{\mathrm{enc}}$; (c) cost function of the problem from logical solutions, $C - C_{\min}$; (d) success probability, $P$. Note that the cost function is normalized by subtracting the optimal value $C_{\min}$ for more meaningful comparison across problems. In part (c) the one-hot 2000Q point only appears for uniform torque compensation, since the data for all other chain strengths contained at least one problem where none of the 100 reads produced a solution which satisfied all one-hot constraints. All bars in this plot are standard error.

B. Three-Coloring
For further demonstration of the effect of the different encodings, we consider 100 random instances of three coloring problems as studied in [31]. For these problems each variable $d_i$ has three possible values, one for each color. Since the domain-wall encoding consists of two qubits in this case, it is mathematically possible for the single domain-wall constraint to not be satisfied. For each of these instances we run both domain-wall and one-hot encodings on each of the Advantage and 2000Q QPUs.

Figure 4 shows the four performance measures as discussed in Section III D. For the same QPU the domain-wall encoding performs consistently better than one-hot. Also, for the same encoding the Advantage QPU performs consistently better than the 2000Q QPU. When comparing the domain-wall encoding on the 2000Q QPU with one-hot encoding on the Advantage QPU, we see that the average solution quality appears to be either comparable or to favor the domain-wall encoding on the less advanced processor. On the other hand, even using the one-hot encoding, the Advantage processor has fewer chain breaks. In essence, both using a more connected QPU and using domain-wall as opposed to one-hot seem to reduce chain breaks; furthermore, the higher connectivity seems to be the more decisive factor.

However, it seems to be the case that (at least for majority vote decoding) broken chains in the domain-wall encoding are more likely to be decoded to correct solutions. For the cost function as well, there is a visible difference between the performance of the one-hot and domain-wall encodings.
Finally, we observe that, while the domain-wall encoding makes a large difference in the probability of finding the optimal solution within 100 reads, the difference between the two types of QPUs is within error bars for all sizes we test. Aside from the fact that larger problems can be encoded on the Advantage QPU, there is not a significant difference between the performance of the two QPUs for the same encoding for graph sizes less than about 25; however, as we demonstrate later, the performance differences can be better understood using analysis based on hypothesis testing.

FIG. 4: Mean of the four performance measures against the problem size (graph size) for the three coloring problem for all four QPU-encoding combinations: (a) rate of unbroken chains in physical solutions, $R_{\mathrm{chain}}$; (b) rate of correctly encoded logical solutions, $R_{\mathrm{enc}}$; (c) cost function of the problem from logical solutions, $C$ / edge number; (d) success probability, $P$. The cost function in part (c) has been normalized by edge number to allow a more direct comparison at different sizes. The bars for parts (a), (b) and (c) are the standard deviation of the distribution, rather than standard error. Since all data points are based on 100 samples, the standard error is ten times smaller than what is depicted by these bars. Bars for (d) are standard error.
C. Hypothesis Testing for Three-Coloring
Since it is difficult to visually distinguish the average cost functions, we apply a different technique to compare the performance of solutions against the cost function. We examine in how many cases each processor-encoding combination does better or worse than any of the others. We further perform hypothesis testing to see which of the differences are statistically significant, as described in Section III C. The results are shown in Table II. We can see that, except at the smallest size (5 nodes), there are differences in how the processor-encoding combinations perform. We find that, except at this smallest size, the domain-wall encoding always performs better than the one-hot, even when comparing the domain-wall encoding on a 2000Q to one-hot encoding on an Advantage. We further find that all statistically significant results point toward Advantage performing better than 2000Q, but at smaller sizes the differences are not statistically significant.

A particularly striking result here is that, at least up to the size where the problems can no longer be embedded on the 2000Q, using a domain-wall rather than one-hot encoding makes a bigger difference to solution quality than using the more advanced processor. This underscores the importance of encoding methods to obtaining high quality solutions over just waiting for hardware improvements.

Size     Row    Adv. dw/oh   2000Q dw/oh  dw Adv./2000Q  oh Adv./2000Q  (dw, Adv.)/(oh, 2000Q)  (dw, 2000Q)/(oh, Adv.)
5 node   (b,w)  0, 0         0, 0         0, 0           0, 0           0, 0                    0, 0
5 node   p      -            -            -              -              -                       -
10 node  (b,w)  42, 0        37, 0        2, 0           19, 21         39, 0                   40, 0
10 node  p      2.… × 10^-…  … × 10^-…    … × 10^-…      … × 10^-…      … × 10^-…               … × 10^-…
15 node  (b,w)  85, 2        95, 3        32, 34         70, 22         94, 1                   91, 2
15 node  p      2.… × 10^-…  … × 10^-…    … × 10^-…      … × 10^-…      … × 10^-…               … × 10^-…
20 node  (b,w)  99, 0        100, 0       43, 41         94, 3          100, 0                  93, 2
20 node  p      1.… × 10^-…  … × 10^-…    … × 10^-…      … × 10^-…      … × 10^-…               … × 10^-…
25 node  (b,w)  100, 0       FAIL         66, 20         FAIL           FAIL                    98, 2
25 node  p      7.… × 10^-…  -            … × 10^-…      -              -                       … × 10^-…
30 node  (b,w)  100, 0       FAIL         72, 20         FAIL           FAIL                    97, 2
30 node  p      7.… × 10^-…  -            … × 10^-…      -              -                       … × 10^-…
35 node  (b,w)  100, 0       FAIL         FAIL           FAIL           FAIL                    FAIL
35 node  p      7.… × 10^-…  -            -              -              -                       -
40 node  (b,w)  100, 0       FAIL         FAIL           FAIL           FAIL                    FAIL
40 node  p      7.… × 10^-…  -            -              -              -                       -

TABLE II: Hypothesis testing results for all six possible comparisons of QPU-encoding combinations for three coloring problems of different sizes. For each comparison the expected winner is listed first. For each size the counts of cases where the expected winner (written first at the top of the column) performs better, $n_b$ (left), and worse, $n_w$ (right), are listed. Below is listed the value of $p$ as calculated by Equation (5). In cases where either both combinations perform the same on all problems, or one or both fail to embed, statistical significance cannot be calculated. In cases where the expected winner failed to embed, we write ‘FAIL’ in the left column, and likewise in the right column if the embedding fails for the QPU-encoding combination described by the right column. These comparisons are performed for the single best solution found out of all 100 samples, using majority vote decoding for broken chains. If none of the samples decode to valid solutions, then the cost function is treated as being “infinite” and any finite value is considered to be better. Color coding is used as a guide to the eye: green indicates a statistically significant rejection of the null hypothesis, while yellow indicates a result which is not statistically significant.

An astute reader may question whether this result is simply because the majority vote decoding seems to perform better on domain-wall encoded problems than on one-hot (as can be seen by comparing Figure 4a to Figure 4b). To answer this question, we consider an alternate way of processing the data, in which solutions with broken chains are discarded rather than decoded by majority vote. We find that this approach does not affect the qualitative result that the domain-wall encoding on a 2000Q performs better in a statistically significant way. For completeness, these results are shown in Table IV in the appendix.
While the domain-wall encoding always makes a bigger performance difference than the choice of QPU in cases where the problem can be embedded on both QPUs, it is possible to embed larger problems on the Advantage QPU. Even with the domain-wall encoding, 30 nodes is the largest size we are able to embed on a 2000Q, whereas Advantage can embed a problem of at least size 40.

D. k-Coloring

Now that we have demonstrated that the domain-wall encoding leads to significantly better performance in encoding discrete variables with both two and three possible values, the next natural question is what happens at higher values, particularly because [31] found that these were the cases where the structure had the most effect on embedding efficiency. To answer it, we examine maximum k-coloring problems for which the value of k scales with the number of nodes. Figure 5 shows the results. We see a similar pattern as before, with Figure 5a showing that the QPU structure makes a bigger difference in terms of chain breaks. However, at least for the Advantage QPU, usage of the domain-wall encoding can also significantly reduce the number of breaks. As in the three-coloring case, the encoding type is the dominant factor in determining the number of solutions which are decoded correctly. Unlike the three-coloring case, however, R_enc decreases toward zero as the number of colors, and therefore the problem size, is increased. This is likely because, for more colors, there are more possible ways to violate the constraint. The final cost function likewise shows encoding being the dominant factor in determining performance, and a more significant factor than QPU type.
We further see this trend in the probability of finding the optimal solution, although the performance difference between the Advantage and 2000Q by this metric is larger than in the three-coloring case, suggesting that the slightly better performance from Advantage is a real effect. As we will see later, however, a different method of comparison actually favors the older version of the processor, the 2000Q.

[Figure 5: four panels, each plotted against color number for one hot 2000Q, one hot Advantage, domain wall 2000Q, and domain wall Advantage. (a) Rate of unbroken chains in physical solutions, R_chain. (b) Rate of correctly encoded logical solutions, R_enc. (c) Cost function of the problem from logical solutions, C / edge number. (d) Success probability, P.]

FIG. 5: Mean of the four performance measures against the number of colors for the maximum k-coloring problem, comparing all four QPU-encoding combinations. The cost function in part (c) has been normalized by edge number to allow a more direct comparison at different sizes. The bars for parts (a-c) are the standard deviation of the distribution, rather than the standard error. Since all data points are based on 100 samples, the standard error is ten times smaller than what is depicted by these bars. Bars for (d) are standard error.
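The remark that more colors give more ways to violate the constraint can be made concrete by counting invalid configurations of a single k-valued variable. The Python sketch below assumes the usual conventions, which are not spelled out in this section: one-hot uses k bits with exactly one set, and domain-wall (following [31]) uses k − 1 spins between fixed virtual boundary spins −1 and +1, with exactly one domain wall. The function names are illustrative only.

```python
from itertools import product


def one_hot_valid(bits):
    """One-hot: exactly one of the k bits equals 1."""
    return sum(bits) == 1


def domain_wall_valid(spins):
    """Domain-wall: exactly one wall in the chain -1, s_1, ..., s_{k-1}, +1.

    The boundary spins are virtual (fixed), an assumption following [31].
    """
    chain = (-1, *spins, 1)
    walls = sum(1 for a, b in zip(chain, chain[1:]) if a != b)
    return walls == 1


def count_invalid(k):
    """Number of constraint-violating states for one k-valued variable."""
    one_hot = sum(not one_hot_valid(b) for b in product((0, 1), repeat=k))
    domain_wall = sum(not domain_wall_valid(s)
                      for s in product((-1, 1), repeat=k - 1))
    return one_hot, domain_wall


for k in range(3, 8):
    oh, dw = count_invalid(k)
    # Both encodings have k valid states, so the invalid counts are
    # 2**k - k (one-hot) and 2**(k - 1) - k (domain-wall).
    assert oh == 2**k - k and dw == 2**(k - 1) - k
    print(k, oh, dw)
```

Both counts grow with k, which is in line with R_enc falling for both encodings, while the domain-wall count is always the smaller of the two. This is a count of raw configurations only and says nothing about how often the hardware actually returns them.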
E. Hypothesis Testing for k-Coloring

As we have done before, we perform a hypothesis-testing analysis on the best cost function values returned by all QPU-encoding combinations. As with the three-coloring case, the domain-wall encoding leads to a statistically significant improvement at all but the smallest size. Even when we perform a cross comparison, we see that the domain-wall encoding on a 2000Q out-performs the one-hot encoding on the Advantage. We do find one surprising result, not in terms of comparison between the two encodings, but in terms of comparisons between the two QPUs. For the one-hot encoding of the maximum five-coloring problem, we find a highly statistically significant result that the 2000Q actually outperforms the Advantage processor. This is the opposite trend to what is seen when the domain-wall encoding is used, and very unusual, since a more highly connected device should perform better than a less connected one. To understand the root cause of this effect, we perform the same analysis, but discard broken-chain solutions rather than perform majority vote decoding. As Table V in the appendix shows, the effect goes away when we change decoding strategy, indicating that this is an artifact of the strategy. Since the primary purpose of this paper is to compare encoding strategy, rather than QPU performance, we have elected not to probe this effect further, although doing so could potentially yield interesting results.
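As a rough guide to how the (n_b, n_w) counts translate into significance, the sketch below computes a one-sided sign-test p-value. Equation (5) is not reproduced in this part of the text, so the binomial form used here is an assumption; it does, however, agree with the surviving leading digits of the tabulated values (for example, 100 wins and 0 losses give 2^-100 ≈ 7.9 × 10^-31). The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_better: int, n_worse: int) -> float:
    """One-sided sign-test p-value, assumed form of Equation (5).

    Under the null hypothesis each decisive comparison is a fair coin
    flip; ties are ignored. Returns the probability of seeing at least
    n_better wins out of n_better + n_worse decisive comparisons.
    """
    n = n_better + n_worse
    if n == 0:
        # Mirrors the table convention: no decisive cases, no p-value.
        raise ValueError("significance cannot be calculated without decisive cases")
    return sum(comb(n, j) for j in range(n_better, n + 1)) / 2**n


print(sign_test_p(100, 0))  # 2**-100, about 7.9e-31
print(sign_test_p(34, 1))   # 36 / 2**35, about 1.05e-09
print(sign_test_p(22, 21))  # close to 0.5: no evidence either way
```

A p-value near 0.5, as for a (22, 21) split, corresponds to the cases marked as not statistically significant.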
Adv. dw/oh | 2000Q dw/oh | dw Adv./2000Q | oh Adv./2000Q | (dw, Adv.)/(oh, 2000Q) | (dw, 2000Q)/(oh, Adv.)
3 color (b,w): 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0
3 color p
4 color (b,w): 34 1 | 37 2 | 11 3 | 26 16 | 44 1 | 33 7
4 color p 1 . × − . × − . × − . × − . × − . × − . × − . × − . × − ≈ . × − . × − . × − . × − . × − . × −

TABLE III: Hypothesis testing results for all six possible comparisons of QPU-encoding combinations for k-color problems with different numbers of colors. For each comparison the expected winner is listed first. For each size, the counts of cases where the expected winner (written first at the top of the column) performs better, n_b (left), and worse, n_w (right), are listed. Below these is the value of p as calculated by Equation (5). In cases where either both combinations perform the same on all problems, or one or both fail to embed, statistical significance cannot be calculated. In cases where the expected winner failed to embed, we write ‘FAIL’ in the left column, and likewise in the right column if the embedding fails for the QPU-encoding combination described by the right column. These comparisons are performed for the single best solution found out of all 100 samples, using majority vote decoding for broken chains. If none of the samples decode to valid solutions, then the cost function is treated as being “infinite” and any finite value is considered to be better. Color coding is used as a guide to the eye: green indicates a statistically significant rejection of the null hypothesis, while yellow indicates a result which is not statistically significant. Red indicates a statistically significant result which rejects the alternate hypothesis.

V. DISCUSSION AND CONCLUSIONS

We have performed the first (to our knowledge) experimental tests of the domain-wall encoding proposed in [31] on quantum annealing processors. We find that for problems with variables up to size seven, the domain-wall encoding out-performs the traditional one-hot encoding in all but the smallest cases, for which their performance is roughly equal due to the ease of the problems. We further find that the domain-wall encoding generally reduces the number of broken minor-embedding chains and increases the proportion of solutions which decode correctly.
Crucially, for every problem we look at, we do not find a single metric for which the domain-wall encoding performs worse on average than one-hot on the same QPU, suggesting that the domain-wall encoding should be the method of choice.

Dramatically, when we perform a cross comparison of performance between domain-wall encoding on the older 2000Q QPU versus one-hot on the newer Advantage QPU, we find that the encoding makes a bigger difference in solution quality than the QPU architecture (at least on problems which can be embedded into both QPUs; embedding fails at a smaller size for domain wall on a 2000Q than it does for one hot on Advantage). This underscores the importance of “software” advances like better encoding in parallel to hardware advances. This is particularly true given that, while developing new hardware can be a very expensive endeavor, using a different encoding can be done at almost no cost. Given the recent trend for the programming of these devices to be done at an increasingly higher level, the end user does not necessarily need to even “see” the encoding steps at all. For example, the D-Wave Ocean repository currently has a discrete quadratic model (DQM) solver [58], which uses one-hot encoding to solve optimization problems [59]. Changing the underlying encoding used by this solver is likely to improve performance, but would have no other effect on the way the end users interact with this solver, and would not require them to understand how this strategy works. The German Aerospace Center (DLR) is developing a library which is more hardware agnostic (including gate-based quantum computers) with overlapping functionality.
It is planned to integrate different encodings, including domain-wall encoding, and publish the software under an open-source license.

While it is not the main purpose of our study, we have also compared the Advantage and 2000Q QPUs. We have found that minor-embedding chains break less frequently on the Advantage QPU, as should be expected on a more connected graph. We found that the Advantage QPU also performed better at solving problems (or, for small problems, no statistically significant difference could be found) in all but one example on a five-coloring problem. This exception is highly statistically significant, so it is unlikely to be a statistical anomaly. It does, however, go away when we do not perform majority vote decoding on the broken chains. While we have not investigated this effect further since it is far from our main purpose, it is likely that a more complete investigation could be fruitful in finding improved broken-chain decoding strategies. It is also worth remembering that there was room for improvement in how the constraint strength was chosen for Advantage; a better strategy here would change the comparison between Advantage and 2000Q (further) in favor of Advantage.

While our work gives compelling evidence that the domain-wall strategy is superior for currently available superconducting flux qubit QPUs, there are still many unanswered questions with regard to the encoding which are beyond the scope of our current work. It would be illuminating to test these strategies on larger problems and more connected hardware graphs using quantum Monte Carlo. This approach has previously been used to compare embedding strategies [30]. It could further be interesting to test whether the advantages seen in annealing carry over to gate model optimization algorithms. Preliminary work [31] gives some theoretical suggestions for this, because the domain-wall encoding approach will require fewer interactions between distant qubits.
However, an experimental test would be enlightening.

VI. ACKNOWLEDGMENTS
NC and JC were funded by UKRI EPSRC grant number EP/S00114X/1, and QPU access for early experiments was supported by impact acceleration funding associated with grant EP/L022303/1, although none of these data are directly presented in the manuscript. The authors gratefully acknowledge the Jülich Supercomputing Centre [60] for funding this project by providing computing time through the Jülich UNified Infrastructure for Quantum computing (JUNIQ) on the D-Wave quantum annealer.
APPENDIX: HYPOTHESIS TESTING TABLES DISCARDING BROKEN CHAINS
In this appendix we provide versions of Tables II and III in which solutions with broken chains are discarded rather than decoded by majority vote. The results appear in Tables IV and V, respectively.

[1] T. Kadowaki and H. Nishimori, Quantum annealing in the transverse Ising model, Phys. Rev. E , 5355 (1998).
[2] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, Quantum computation by adiabatic evolution (2000), arXiv preprint quant-ph/0001106.
[3] T. Albash and D. A. Lidar, Adiabatic quantum computation, Rev. Mod. Phys. , 015002 (2018).
[4] L. C. Venuti, T. Albash, D. A. Lidar, and P. Zanardi, Adiabaticity in open quantum systems, Phys. Rev. A , 032118 (2016).
[5] J. G. Morley, N. Chancellor, S. Bose, and V. Kendon, Quantum search with hybrid adiabatic–quantum-walk algorithms and realistic noise, Phys. Rev. A , 022339 (2019).
[6] M. B. Hastings, Duality in Quantum Quenches and Classical Approximation Algorithms: Pretty Good or Very Bad, Quantum , 201 (2019).
[7] A. Callison, M. Festenstein, J. Chen, L. Nita, V. Kendon, and N. Chancellor, An energetic perspective on rapid quenches in quantum annealing, arXiv:2007.11599 (2020).
[8] E. J. Crosson and D. A. Lidar, Prospects for quantum enhancement with diabatic quantum annealing (2020), arXiv preprint quant-ph/2008.09913.
[9] A. Perdomo-Ortiz, S. E. Venegas-Andraca, and A. Aspuru-Guzik, A study of heuristic guesses for adiabatic quantum computation, Quantum Information Processing , 33 (2011).
[10] Q.-H. Duan, S. Zhang, W. Wu, and P.-X. Chen, An alternative approach to construct the initial hamiltonian of the adiabatic quantum computation, Chinese Physics Letters , 010302 (2013).
[11] N. Chancellor, Modernizing quantum annealing using local searches, New Journal of Physics , 023024 (2017).
[12] T. Graß, Quantum annealing with longitudinal bias fields, Phys. Rev. Lett. , 120501 (2019).
[13] T. Stollenwerk, E. Lobe, and M. Jung, Flight gate assignment with a quantum annealer, in Proceedings of the First International Workshop on Quantum Technology and Optimization Problems, Theoretical Computer Science and General Issues No. 9 (Springer, Munich, Germany, 2019).
[14] T. Stollenwerk, B. O’Gorman, D. Venturelli, S. Mandrà, O. Rodionova, H. Ng, B. Sridhar, E. G. Rieffel, and R. Biswas, Quantum annealing applied to de-conflicting optimal trajectories for air traffic management, IEEE Transactions on Intelligent Transportation Systems , 1 (2019).
[15] D. O’Malley, An approach to quantum-computational hydrologic inverse analysis, Scientific Reports , 6919 (2018).
[16] G. E. Coxson, C. R. Hill, and J. C. Russo, Adiabatic quantum computing for finding low-peak-sidelobe codes (2014), presented at the 2014 IEEE High Performance Extreme Computing conference.
[17] A. Crispin and A. Syrichas, Quantum annealing algorithm for vehicle scheduling, in (2013) pp. 3523–3528.
[18] D. Venturelli, D. J. J. Marchand, and G. Rojo, Quantum Annealing Implementation of Job-Shop Scheduling (2015), arXiv preprint quant-ph/1506.08479.
[19] T. T. Tran, M. N. Do, E. G. Rieffel, J. Frank, Z. Wang, B. O’Gorman, D. Venturelli, and J. C. Beck, A hybrid quantum-classical approach to solving scheduling problems, in SOCS (2016).
[20] F. Neukart, G. Compostella, C. Seidel, D. von Dollen, S. Yarkoni, and B. Parney, Traffic flow optimization using a quantum annealer, Frontiers in ICT , 29 (2017).
[21] S. Yarkoni, F. Neukart, E. M. G. Tagle, N. Magiera, B. Mehta, K. Hire, S. Narkhede, and M. Hofmann, Quantum shuttle: Traffic navigation with quantum computing, in Proceedings of the 1st ACM SIGSOFT International Workshop on Architectures and Paradigms for Engineering Quantum Software (Association for Computing Machinery, New York, NY, USA, 2020) pp. 22–30.
[22] A. D. King et al., Scaling advantage in quantum simulation of geometrically frustrated magnets, arXiv:1911.03446 (2019).
[23] Note that b_i^2 = b_i.
[24] V.
Choi, Minor-embedding in adiabatic quantum computation: I. the parameter setting problem, Quantum Information Processing , 193 (2008), arXiv:0804.4884.

Adv. dw/oh | 2000Q dw/oh | dw Adv./2000Q | oh Adv./2000Q | (dw, Adv.)/(oh, 2000Q) | (dw, 2000Q)/(oh, Adv.)
5 node (b,w): 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0
5 node p
10 node (b,w): 42 0 | 38 0 | 2 0 | 22 21 | 40 0 | 40 0
10 node p 2 . × − . × − . × − . × − . × − . × −
15 node (b,w): 88 2 | 96 3 | 35 28 | 78 16 | 96 0 | 91 3
15 node p 3 . × − . × − . × − . × − . × − . × −
20 node (b,w): 99 1 | 96 0 | 60 33 | 98 0 | 100 0 | 82 15
20 node p 7 . × − . × − . × − . × − . × − . × −
25 node (b,w): 100 0 | FAIL | 93 6 | FAIL | FAIL | 61 24
25 node p 7 . × − . × − . × −
30 node (b,w): 100 0 | FAIL | 100 0 | FAIL | FAIL | 14 3
30 node p 7 . × − . × − . × −
35 node (b,w): 100 0 | FAIL FAIL | FAIL | FAIL | FAIL | FAIL
35 node p 7 . × −
40 node (b,w): 88 0 | FAIL FAIL | FAIL | FAIL | FAIL | FAIL
40 node p 3 . × −

TABLE IV: Hypothesis testing results for all six possible comparisons of QPU-encoding combinations for three-color problems of different sizes. For each comparison the expected winner is listed first. For each size, the counts of cases where the expected winner performs better, n_b (left), and worse, n_w (right), are listed. Below these is the value of p as calculated by Equation (5). In cases where either both combinations perform the same on all problems, or one or both fail to embed, statistical significance cannot be calculated. In cases where the expected winner failed to embed, we write ‘FAIL’ in the left column, and likewise in the right column if the embedding fails for the QPU-encoding combination described by the right column. These comparisons are performed for the single best solution found out of all 100 samples, where samples with broken chains are treated as being invalid. If none of the samples decode to valid solutions, then the cost function is treated as being “infinite” and any finite value is considered to be better. Color coding is used as a guide to the eye: green indicates a statistically significant rejection of the null hypothesis, while yellow indicates a result which is not statistically significant.

Adv. dw/oh | 2000Q dw/oh | dw Adv./2000Q | oh Adv./2000Q | (dw, Adv.)/(oh, 2000Q) | (dw, 2000Q)/(oh, Adv.)
3 color (b,w): 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0
3 color p
4 color (b,w): 39 1 | 42 3 | 17 2 | 33 14 | 53 1 | 32 9
4 color p 3 . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × − . × −

TABLE V: Hypothesis testing results for all six possible comparisons of QPU-encoding combinations for k-color problems with different numbers of colors. For each comparison the expected winner is listed first. For each size, the counts of cases where the expected winner performs better, n_b (left), and worse, n_w (right), are listed. Below these is the value of p as calculated by Equation (5).
In cases where either both combinations perform the same on all problems, or one or both fail to embed, statistical significance cannot be calculated. In cases where the expected winner failed to embed, we write ‘FAIL’ in the left column, and likewise in the right column if the embedding fails for the QPU-encoding combination described by the right column. These comparisons are performed for the single best solution found out of all 100 samples, where samples with broken chains are treated as being invalid. If none of the samples decode to valid solutions, then the cost function is treated as being “infinite” and any finite value is considered to be better. Color coding is used as a guide to the eye: green indicates a statistically significant rejection of the null hypothesis.

[25] V. Choi, Minor-embedding in adiabatic quantum computation: II. minor-universal graph design, Quantum Information Processing , 343 (2011), arXiv:1001.3116.
[26] W. Lechner, P. Hauke, and P. Zoller, A quantum annealing architecture with all-to-all connectivity from local interactions, Science Advances , 10.1126/sciadv.1500838 (2015), http://advances.sciencemag.org/content/1/9/e1500838.full.pdf.
[27] A. Rocchetto, S. C. Benjamin, and Y. Li, Stabilisers as a design tool for new forms of Lechner-Hauke-Zoller annealer, Science Advances , e1601246 (2016), arXiv:1603.08554.
[28] M. Leib, P. Zoller, and W. Lechner, A transmon quantum annealer: decomposing many-body Ising constraints into pair interactions, Quantum Science and Technology , 015008 (2016).
[29] N. Chancellor, S. Zohren, and P. A. Warburton, Circuit design for multi-body interactions in superconducting quantum annealing systems with applications to a scalable architecture, npj Quantum Information , 10.1038/s41534-017-0022-6 (2017), arXiv:1603.09521.
[30] T. Albash, W. Vinci, and D. A. Lidar, Simulated quantum annealing with two all-to-all connectivity schemes, Phys. Rev. A , 022327 (2016), arXiv:1603.03755.
[31] N. Chancellor, Domain wall encoding of discrete variables for quantum annealing and QAOA, Quantum Science and Technology , 045004 (2019).
[32] S. Abel, N. Chancellor, and M. Spannowsky, Quantum computing for quantum tunnelling, arXiv:2003.07374 (2020).
[33] S. Abel and M. Spannowsky, Observing the fate of the false vacuum with a quantum laboratory (2020), arXiv:2006.06003.
[34] N. Chancellor, Fluctuation-guided search in quantum annealing, Phys. Rev. A , 062606 (2020).
[35] N. Chancellor, Code associated with: Domain wall encoding of integer variables for quantum annealing and QAOA (), https://doi.org/10.15128/r27d278t029.
[36] S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, From the quantum approximate optimization algorithm to a quantum alternating operator ansatz, Algorithms , 34 (2019), (Special issue “Quantum Optimization Theory, Algorithms, and Applications”), arXiv:1709.03489.
[37] A. Frieze and M. Jerrum, Improved approximation algorithms for max k-cut and max bisection, Algorithmica , 67 (1997).
[38] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell, Optimal inapproximability results for max-cut and other 2-variable csps?, SIAM Journal on Computing , 319 (2007).
[39] P. Erdős and A. Rényi, On random graphs, Publ. Math. Debrecen , 290 (1959).
[40] P. Erdős and A. Rényi, On the evolution of random graphs, Bull. Inst. Internat. Statist , 343 (1960).
[41] T. Stollenwerk, S. Hadfield, and Z. Wang, Toward quantum gate-model heuristics for real-world planning problems, IEEE Transactions on Quantum Engineering , 1 (2020).
[42] Documentation on d-wave uniform torque compensation tool, https://docs.ocean.dwavesys.com/projects/system/en/stable/reference/generated/dwave.embedding.chain_strength.uniform_torque_compensation.html (2021), accessed 2021-19-01.
[43] D-wave ocean software documentation, https://docs.ocean.dwavesys.com/en/stable/ (2021), accessed 2021-19-01.
[44] minorminer, https://github.com/dwavesystems/minorminer (2021), accessed 2021-19-01.
[45] N. Chancellor, Experimental data and code associated with: Performance of domain-wall encoding for quantum annealing (), to appear in a later version.
[46] G. Van Rossum and F. L. Drake, Python language reference manual
A guide to NumPy, Vol. 1 (Trelgol Publishing USA, 2006).
[49] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in science & engineering , 90 (2007).
[50] T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, and C. Willing, Jupyter notebooks – a publishing format for reproducible computational workflows, in Positioning and Power in Academic Publishing: Players, Agents and Agendas, edited by F. Loizides and B. Schmidt (IOS Press, 2016) pp. 87–90.
[51] F. Pérez and B. E. Granger, IPython: a system for interactive scientific computing, Computing in Science & Engineering (2007).
[52] E. L. Lehmann, The fisher, neyman-pearson theories of testing hypotheses: One theory or two?, Journal of the American Statistical Association , 1242 (1993).
[53] D. Venturelli and A. Kondratyev, Reverse quantum annealing approach to portfolio optimization problems, Quantum Machine Intelligence , 17 (2019).
[54] J. Marshall, D. Venturelli, I. Hen, and E. G. Rieffel, Power of pausing: Advancing understanding of thermalization in experimental quantum annealers, Phys. Rev. Applied , 044083 (2019).
[55] H. Karimi, G. Rosenberg, and H. G. Katzgraber, Effective optimization using sample persistence: A case study on quantum annealers and various monte carlo optimization methods, Phys. Rev. E , 043312 (2017).
[56] S. Yarkoni, H. Wang, A. Plaat, and T. Bäck, Boosting quantum annealing performance using evolution strategies for annealing offsets tuning, in Quantum Technology and Optimization Problems, edited by S. Feld and C. Linnhoff-Popien (Springer International Publishing, Cham, 2019) pp. 157–168.
[57] Source code for dwave.system.composites.virtual_graph, https://docs.ocean.dwavesys.com/en/stable/_modules/dwave/system/composites/virtual_graph.html (2021), accessed 2021-19-01.
[58] Discrete quadratic models, https://docs.ocean.dwavesys.com/en/stable/concepts/dqm.html (2021), accessed 2021-19-01.
[59] T. M., forum response to: Does the discrete quadratic model function use a one-hot encoding?, D-Wave user forums (response permalink https://support.dwavesys.com/hc/en-us/community/posts/360051436394/comments/360015043293) (December 8, 2020).
[60]