Ferromagnetically shifting the power of pausing
Zoe Gonzalez Izquierdo, Shon Grabbe, Stuart Hadfield, Jeffrey Marshall, Zhihui Wang, Eleanor Rieffel
Department of Physics and Astronomy, and Center for Quantum Information Science & Technology, University of Southern California, Los Angeles, California 90089, USA
QuAIL, NASA Ames Research Center, Moffett Field, California 94035, USA
USRA Research Institute for Advanced Computer Science, Mountain View, California 94043, USA
(Dated: June 16, 2020)

We study the interplay between quantum annealing parameters in embedded problems, providing both deeper insights into the physics of these devices and pragmatic recommendations to improve performance on optimization problems. We choose as our test case the class of degree-bounded minimum spanning tree problems. Through runs on a D-Wave quantum annealer, we demonstrate that pausing in a specific time window in the anneal provides improvement in the probability of success and in the time-to-solution for these problems. The time window is consistent across problem instances, and its location is within the region suggested by prior theory and seen in previous results on native problems. An approach to enable gauge transformations for problems with the qubit coupling strength J in an asymmetric range is presented and shown to significantly improve performance. We also confirm that the optimal pause location exhibits a shift with the magnitude of the ferromagnetic coupling, |J_F|, between physical qubits representing the same logical one. We extend the theoretical picture for pausing and thermalization in quantum annealing to the embedded case. This picture, along with perturbation theory analysis and exact numerical results on small problems, confirms that the effective pause region moves earlier in the anneal as |J_F| increases. It also suggests why pausing, while still providing significant benefit, has a less pronounced effect on embedded problems.

I. INTRODUCTION
Quantum computing provides novel mechanisms for efficient computing, but the extent of its impact is as of yet undetermined. A tantalizing area of application is combinatorial optimization, where challenging instances are currently attacked by a variety of classical heuristics, and where quantum heuristics have the potential to outperform these classical approaches. Here, we advance the understanding of one such heuristic, quantum annealing, deepening the theoretical picture of the roles that thermalization, adiabatic processes, and diabatic processes play in quantum annealing, and demonstrating the impact of annealing schedules and the interplay between quantum annealing parameters on performance, particularly on application-related problems that require embedding.

Our work builds on the theoretical picture of Marshall et al. [1], which explains why pausing in an appropriate time window during the anneal enables the system to thermalize better, improving the fit of the output distribution to a Boltzmann distribution and increasing the success probability by orders of magnitude. Because quantum annealing happens at non-zero temperature, temperature plays a significant role, along with the quantum dynamics induced by varying the Hamiltonian, particularly near where the temperature and the minimal energy gap between the ground state and the first excited state are commensurate. This effect has also been studied in simulations [2], and recently, rigorous sufficient conditions under which pausing helps were identified in Ref. [3]. Here, we build on the above understanding, extending it beyond the native problems studied in [1] to embedded problems.

It is well known that most problem instances, in particular those related to applications, will not have a structure that matches that of the hardware, in which case the problems must be embedded. Embedded problems use multiple physical qubits to represent each logical qubit, with these physical qubits coupled via ferromagnetic couplings J_F < 0. In the embedded problems we study, we confirm an improvement in success probability, and that for this class of problem, as was found for native problems, there is a time window in which a pause reliably improves the performance across problem instances. We extend the theoretical picture of Refs. [1, 3] to embedded problems, including a perturbative analysis of the effect of |J_F| on the minimal energy gap between the ground and the first excited states. Our gap analysis and numerical simulations on small systems show that as |J_F| increases, the minimal gap shifts earlier and the gap size decreases. This extended picture explains why one would expect a shift of the optimal pause location to earlier in the anneal with increasing |J_F|, and also a somewhat less pronounced improvement from pausing on embedded problems than on native problems.

The class of problems studied, bounded-degree minimum spanning tree (BD-MST) problems, seen in a variety of application areas such as a broad spectrum of network-related problems, has not been studied before in this context. We demonstrate that small instances of these problems can be embedded and successfully solved by state-of-the-art quantum annealers, and confirm the results predicted by our theoretical picture. We demonstrate that for the best parameters, pausing improves not only the probability of success, but also the time-to-solution (TTS). To obtain these results, we used the newly added extended-range feature of the D-Wave 2000Q to enable the use of stronger ferromagnetic couplings relative to the problem instance couplings. Because of the asymmetry in the extended range, we could not use the standard gauge approach to randomize the effect of qubit biases in the D-Wave 2000Q on the annealing runs.
We developed a partial gauge approach that enabled us to obtain much cleaner results and substantially better probabilities of success than running without partial gauges.

The rest of the paper is organized as follows: In Sec. II, we review background information on spanning tree problems and on quantum annealing. In Sec. III, we describe the specifics related to the hardware, the instances, the parameters for our runs, and the metrics we use to evaluate them. Sec. IV is devoted to results on the annealer. Results for annealing without a pause are shown in Sec. IV A, how pausing can be helpful is demonstrated in Sec. IV B, and how the optimal pause location shifts with |J_F| in Sec. IV C. The technical treatment that enables the conclusive results, partial gauges, is discussed in Sec. IV D. We provide theoretical analysis and a physical picture for the shifting of the optimal pause location with |J_F| in Sec. V. In Sec. VI we summarize the results and discuss future work.

II. BACKGROUND
We review background material on spanning tree problems and on quantum annealing.
A. Spanning Tree Problem Classes
Definition A.1. A spanning tree for a graph G is a subgraph of G that is a tree and contains all vertices of G.

Spanning trees are important for several reasons. They play a critical role in designing efficient routing algorithms. Some computationally hard problems, such as the Steiner Tree problem and the Traveling Salesperson Problem, can be solved approximately using spanning trees [4]. Spanning tree problems also find broad applications in network design, bioinformatics, etc.

One flavor of the spanning tree problems is the weighted spanning tree problem: Given a connected undirected graph G = (V, E) and a set of weights w_uv for each edge (uv) ∈ E, we seek a spanning tree T ⊂ E such that the tree weight ∑_{(uv)∈T} w_uv is minimized.

For general graphs, determining whether there exists a spanning tree of weight W can be decided in polynomial time, and different efficient algorithms exist to find a minimum weight tree; for example, Kruskal's algorithm requires time O(|E| log |V|) [5]. (Special classes of graphs can be solved even faster.) On the other hand, with the additional constraint that the maximum vertex degree of the spanning tree found is at most ∆, even deciding whether there exists such a spanning tree becomes NP-complete for fixed ∆ ≥ 2.

The BD-MST Problem: Given an integer ∆ ≥ 2 and a graph G = (V, E) with edge weights w_uv, (uv) ∈ E, find a minimum weight spanning tree of maximum degree at most ∆.

We refer interested readers to Appendix B and references therein for approximation complexity theory related to the BD-MST problem.

B. Solving on a Quantum Annealer
Quantum annealing is a quantum metaheuristic for optimization. Quantum annealers are quantum hardware designed to run this metaheuristic. Any classical cost function C(x) that is a polynomial over binary variables x ∈ {0, 1}^n can, with the addition of auxiliary variables, be turned into a quadratic cost function. Problems with quadratic cost functions over binary variables without additional constraints are called quadratic unconstrained binary optimization (QUBO) problems. Quantum annealing is carried out by evolving the system under a time-dependent Hamiltonian H(s) = A(s) H_D + B(s) H_C, where H_D is a driver Hamiltonian, most commonly H_X = −∑_i X_i, and H_C is an Ising Hamiltonian derived from a classical cost function. There is a straightforward mapping between QUBO and Ising problems. The parameter s is a dimensionless time parameter that ranges from 0 to 1, with A(s) and B(s) determining the form of the anneal schedule. As we will see, many different schedules s(t) are possible. More information about quantum annealing generally, including mappings of problems to QUBO, can be found in [7–9].

For most application problems, on hardware with restricted qubit connectivity, the resulting QUBO problem must further be embedded to conform with the hardware connectivity; graph minor embedding enables coupling between logical qubits in the QUBO graph by representing each logical qubit by a set of physical qubits, ferromagnetically coupled among themselves with magnitude |J_F| to promote collective behavior (J_F is always negative, so we typically refer to its magnitude |J_F|). Following standard terminology in graph theory, each such set of physical qubits is called a vertex model for its corresponding logical qubit. When embedding, we use the same coupling strength |J_F| for all the couplings within a vertex model.
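The QUBO–Ising correspondence mentioned above is a simple linear change of variables. As an illustrative sketch (our own helper, not part of any vendor API; sign conventions vary, and here we assume x_i = (1 − s_i)/2 with s_i ∈ {−1, +1}):

```python
def qubo_to_ising(Q):
    """Convert a QUBO cost sum_{i<=j} Q[i,j] x_i x_j (x_i in {0,1}) to
    Ising form sum_i h_i s_i + sum_{i<j} J_ij s_i s_j + offset,
    using the substitution x_i = (1 - s_i) / 2.

    Q is a dict {(i, j): coefficient} with i <= j; diagonal entries
    (i, i) are linear terms, since x_i^2 = x_i for binary variables.
    """
    h, J, offset = {}, {}, 0.0
    for (i, j), q in Q.items():
        if i == j:
            # q * x_i = q/2 - (q/2) s_i
            h[i] = h.get(i, 0.0) - q / 2.0
            offset += q / 2.0
        else:
            # q * x_i x_j = q/4 * (1 - s_i - s_j + s_i s_j)
            J[(i, j)] = J.get((i, j), 0.0) + q / 4.0
            h[i] = h.get(i, 0.0) - q / 4.0
            h[j] = h.get(j, 0.0) - q / 4.0
            offset += q / 4.0
    return h, J, offset
```

Minimizing the resulting Ising energy ∑_i h_i s_i + ∑_{i<j} J_ij s_i s_j + offset is equivalent to minimizing the original QUBO cost.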
Problems that do not require embedding because their structure matches that of the hardware are called native problems for that hardware.

While |J_F| can be set to a large value such that the embedded problem preserves the ground state of the logical problem, and analytical bounds on this value can be obtained [10], too large a |J_F| can reduce quantum annealing performance. Physically, there is an energy limit on the Hamiltonian as a whole, and too large a |J_F| relative to the other parameters means that all of the problem parameters must be scaled down, which can reduce performance due to precision issues and noise in implementation. Furthermore, the energy spectrum throughout the anneal varies with the value of |J_F|, and its effect on the annealing often requires careful case-by-case consideration [7, 8, 11, 12]. Thus, optimally setting the ferromagnetic coupling |J_F| is a challenging task. Prior work has shown there is a sweet spot for this value. Physically this makes sense because a stronger |J_F| makes it less likely for individual qubits within a vertex model to flip, which helps to avoid breaking the vertex model, but too large a |J_F| makes it increasingly costly for the vertex-model qubit values to flip together, potentially preventing the system from leaving a non-optimal configuration. To boost the probability of success, |J_F| must strike the right balance, leading to better chances of arriving at—and staying in—the correct configuration.

The D-Wave 2000Q allows asymmetric extension of the pairwise qubit coupling strengths to J_{i,j} ∈ [−2, 1] (in addition to the canonical symmetric option J_{i,j} ∈ [−1, 1]), allowing a larger |J_F| in the extended range. We show how the extended values improve the success probability of our problems.

The schedule s(t) can significantly affect performance. Of particular interest to us are schedules that include a pause, where for some sub-interval s(t) is constant (i.e., H is constant for a specified time). Marshall et al. [1] observed on an ensemble of native problems that, strikingly, a pause at a location (generally) insensitive to the instance specifics boosts the probability of finding the ground state—the success probability—by orders of magnitude. The physical picture underlying such a universal effect is reviewed and expanded in Sec. V.

III. METHODS
Here, we discuss the specifics of the problem instances, annealing schedules and parameters, and the metrics used to obtain our results.
A. Problem Instances
Each BD-MST problem instance consists of a weighted graph G = (V, E) and a degree bound ∆. The underlying graphs are chosen by exhausting all connected graphs with n = |V| = 5, which have m = |E| ranging from 4 to 10. Weight sets were uniformly drawn from 1 to 7. Graphs and weight sets were combined to yield a large number of unique instances. Results are averaged over ensembles of instances; the size of the ensemble will be specified for each result in Section IV. The complete list of graphs and weight sets can be found in Table III and Table IV, respectively, of Appendix C.

A number of mappings of the BD-MST problem to QUBO can be found in Ref. [13]; here we use the resource-efficient level-based mapping described in App. A. For each problem instance, the level-based mapping yields an objective function Hamiltonian H_C. For the degree bound we generally selected ∆ = 2, resulting in problems equivalent to Hamiltonian path problems; we also tested ∆ = 3 and our claims hold for this case as well (see Fig. 10 in Appendix F).

B. Annealing Parameters and Schedules
We ran our problems on the D-Wave 2000Q quantum annealer housed at NASA Ames Research Center, which has 2031 qubits and a Chimera graph architecture [14]. To embed the resulting QUBO instances in the D-Wave 2000Q hardware graph, we ran D-Wave's embedding-finding algorithm 30 times and used the smallest embedding found (fewest total physical qubits). This procedure found an embedding for all graphs we considered. Detailed information about the typical size of the embedded problems for different graphs can be found in Fig. 9 in Appendix D, including the number of physical qubits and the size of the vertex models. Embedding statistics for a future D-Wave architecture (Pegasus) are also given.

The objective Hamiltonians were scaled so that the coupling strengths are in the range [−1, 1], while the extended J range is used to couple physical qubits representing the same logical qubits. We chose |J_F| in the range [1, 2].

The anneal schedule is specified through A(s), B(s), and s(t); we explore two qualitatively different schedules. The first is a standard anneal, with time parameter s(t) = t/t_a, where t_a is the annealing time. Baseline runs were performed with this schedule, and several annealing times t_a were initially tested. The shortest time allowed by D-Wave, t_a = 1 µs, was found to be optimal in terms of TTS for the instance ensembles, agreeing with previous studies for other problems [12, 15, 16].

The second type of schedule includes a pause. The beginning and end of the anneal are the same as in the first case, but at some intermediate point s_p the Hamiltonian is held constant for some time t_p. The entire range of possible pause locations (s_p ∈ [0, 1]) was considered initially; although the optimal pause location depends on |J_F|, it is always within the range [0.2, 0.5], so we focused on that region, with s_p varied in 0.02 intervals. A range of pause durations t_p was also surveyed. Since shorter pause times were found to yield better TTS (see Sec. IV B), our runs are performed with pause duration t_p = 1 µs unless otherwise noted. After optimal values for other parameters were found, other t_p values in the range [0.25, 2] µs were explored.

We use the extended range of J_{i,j} ∈ [−2, 1] (in addition to the canonical symmetric option J_{i,j} ∈ [−1, 1]). Because the extended range is asymmetric, standard gauge transformations cannot be applied; instead we use a partial gauge transformation that selectively applies the transformation only to couplings in the symmetric range [−1, 1]. Partial gauges were used both to determine the optimal |J_F| in the case of no pausing, and to study the shift of the optimal pause location with |J_F|. Partial gauges, and their effect, will be discussed in more detail in Sec. IV D.

Unless otherwise specified, all runs are performed with t_a = 1 µs, 50,000 anneals (or reads), and 100 partial gauges.
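The pause schedule described above can be written as a piecewise-linear list of (time, s) vertices, which is the general form annealing interfaces accept; the helper below is our own sketch (the function name and units are our assumptions, not a vendor API):

```python
def pause_schedule(t_a, s_p, t_p):
    """Piecewise-linear anneal schedule (time in microseconds, s
    dimensionless in [0, 1]) that ramps to s_p, holds H(s_p) constant
    for t_p, then completes the anneal. Total time is t_a + t_p, and
    the ramp rate before and after the pause matches a plain t_a anneal."""
    t1 = s_p * t_a                   # time at which the pause begins
    return [(0.0, 0.0),
            (t1, s_p),               # ramp up at rate 1 / t_a
            (t1 + t_p, s_p),         # hold: s constant for duration t_p
            (t_a + t_p, 1.0)]        # resume and finish the anneal

# e.g. a 1 µs anneal paused for 100 µs at s_p = 0.32:
points = pause_schedule(t_a=1.0, s_p=0.32, t_p=100.0)
```

Note that with t_a = 1 µs and t_p = 100 µs the total schedule time is 101 µs, which is why a no-pause anneal of t_a = 101 µs is a natural comparison point later in the paper.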
C. Metrics
We use the empirical probability of success (p_success) and time to solution (TTS) as our figures of merit for determining how likely a problem is to be solved, defined as:

p_success = (anneals with correct solution) / (total anneals),   (1)

TTS = t_tot · log(1 − 0.99) / log(1 − p_success),   (2)

where the total time t_tot = t_a + t_p is the time spent on each run, taking into account both the base annealing time t_a and the pause duration t_p.

These two measures are complementary to each other. The TTS figure of merit reports the expected time required to solve the problem with 99% confidence. While p_success is directly determined by, and hence provides a portal to understand, the underlying physical process, TTS gives a more practical measure that is universal across different parameter ranges and different solvers. A higher success probability does not necessarily mean a lower TTS. For instance, we might get a slightly higher p_success by using a longer annealing time t_a = 100 µs than a shorter one t_a = 1 µs, yet the chance of finding the solution might be higher by repeating the t_a = 1 µs run 100 times than doing the t_a = 100 µs anneal once.

Because we compare results from two different schedules (baseline no-pause and pause), we also need metrics that help us examine the benefits that the latter presents over the former. To this end, we define two quantities based on the instance-wise improvement in TTS. The first one is the absolute TTS improvement, defined for each instance i as

∆TTS_i = TTS_i(no pause) − TTS_i(pause),   (3)

with the two TTS values calculated at their respective optimal |J_F| values (|J*_F| = 1.6 for no pause and 1.8 for pause). A positive ∆TTS_i indicates that a pause improves upon the baseline results (i.e., reduces TTS) for that particular instance. The second one is the relative TTS improvement, defined as the ratio

∆TTS_i / TTS_i = [TTS_i(no pause) − TTS_i(pause)] / TTS_i(no pause).   (4)

When a valid solution is not found for a specific instance, and thus p_success = 0 for that instance, its corresponding TTS is infinity. If TTS for both the pause and no-pause results is infinity, the pause is not improving upon the no-pause results, hence ∆TTS_i = 0. When TTS = ∞ only for the no-pause case, pausing provides the maximum possible improvement, and we set ∆TTS_i = ∞ and ∆TTS_i/TTS_i = 1. Finally, when TTS = ∞ only for the pause case, the opposite occurs, with ∆TTS_i = −∞ and ∆TTS_i/TTS_i = −∞.

After the embedded problem is run on the D-Wave, outputs with any inconsistent values on physical qubits that represent the same logical qubit—or with violated penalty terms such that the output doesn't encode a degree-bounded spanning tree—are considered to be invalid answers, and counted as failed runs. The retained valid answers are then verified against the exact solution of the problem, which is obtained through direct enumeration for the small problem sizes we consider. Reported data points correspond to the median, with the error bars marking the 35th and 65th percentiles. For ensembles of instances, 10 bootstraps are performed over the instances to obtain those values, where each bootstrap sample is drawn with replacement from the original instance ensemble until it is of the same size as the original ensemble. The median and the 35th and 65th percentiles from the bootstrap samples are reported. There are a few instances that were not solved with or without pauses; these instances are not excluded from the ensemble in our bootstrap procedure, but are given a TTS of ∞. These ±∞ values for ∆TTS_i and ∆TTS_i/TTS_i do not appear in our reported results, as they remain very far from the median (which we report as our data point) and from the 35th and 65th percentiles of the bootstrapped results that we present as error bars.

D-Wave returns the solution with the minimum cost it has found.
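As a concrete sketch, the figures of merit of Eqs. (1) and (2) amount to only a few lines (the 0.99 target is the 99% confidence level used in the text; the function names are ours):

```python
import math

def success_probability(n_correct, n_total):
    """Eq. (1): empirical success probability."""
    return n_correct / n_total

def time_to_solution(p_success, t_tot, target=0.99):
    """Eq. (2): expected time to find the solution at least once with
    99% confidence, TTS = t_tot * log(1 - target) / log(1 - p_success).
    Returns infinity when the solution was never observed."""
    if p_success == 0.0:
        return math.inf
    if p_success == 1.0:
        return t_tot                 # a single run suffices
    return t_tot * math.log(1.0 - target) / math.log(1.0 - p_success)
```

This makes the trade-off discussed above explicit: a longer schedule raises t_tot linearly, so it only pays off in TTS if p_success rises enough to compensate.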
To ensure the validity of this solution, we first confirm that the resulting graph is in fact a spanning tree that satisfies the degree constraint, and then that it is a true optimal solution, by comparing with the true minimal cost obtained by an exact classical algorithm. Any other outcome is weighted zero toward p_success.

IV. RESULTS
We now present our results on the D-Wave 2000Q, including anneals without a pause (baseline) and the effect of pausing.
A. Annealing without pause, effect of |J_F|

We first show that the BD-MST problems we study are successfully solved on the D-Wave 2000Q using a standard annealing schedule, demonstrating the ability of a quantum annealer to solve a new class of optimization problems, and study the effect of the strength of the ferromagnetic coupling on the success probability.

FIG. 1: Optimal |J_F| for baseline. TTS for an ensemble of 45 instances as |J_F| varies. A 1 µs anneal is used. The best performance is observed at |J*_F| = 1.6.

The baseline results are obtained with no pause and t_a = 1 µs, which is the shortest that D-Wave allows, and was chosen for consistently yielding the best TTS for ensembles of problem instances, both in this study and for previously studied problems [12, 15, 16]. By exploring the available range of |J_F| values between 1.2 and 2.0, we confirm the advantage of using the extended |J_F| range and identify its optimal value for the base case at |J*_F| = 1.6, as shown in Fig. 1, which plots TTS against |J_F| for the ensemble of instances. Results vary for groups of instances with different n; the optimal |J_F| for n = 4 is lower.

B. Improvement with a pause
After establishing the baseline with the no-pause schedule, we introduce a mid-anneal pause. A pause can be placed at any point in the anneal, i.e., s_p ∈ [0, 1].

FIG. 2: Improvement of p_success with a pause. Success probability for an ensemble of 45 instances. |J_F| = 1.8 and a 1 µs anneal are used. Pause duration is 100 µs. The horizontal line shows the baseline, i.e., no-pause results. Each data point represents the results when introducing a pause at the location s_p. At the optimal pause locations, an improvement of about an order of magnitude in p_success is obtained.

Our results show that, as for native problems, the probability of success significantly improves when pausing within a specific region that is consistent across problem instances. The optimal pause location is between approximately 0.3 and 0.4 (for t_p = 100 µs and |J_F| = 1.8, as opposed to the optimal |J_F| = 1.6 for the no-pause case). When we examine how the optimal pause location is affected by |J_F|, we will see that the peak in p_success moves earlier with increasing |J_F|, but remains in this range. We also find that for instances that were unsolved in the baseline (no-pause) runs, a solution is often found after introducing an appropriate pause. See Appendix E for statistics on such cases. These findings will be given theoretical and numerical support in Sec. V.

As in Ref. [1], the probability of success grows monotonically as the pause duration increases in the range t_p ∈ [0.25, 2] µs (not shown). With respect to expected time-to-solution (TTS), a longer duration can cancel out improvements due to increased probability of success. We were able to locate a sweet spot in pause duration for the various TTS metrics (Sec. III C), with pause durations of t_p = 0.75 or t_p = 1 µs at pause locations s_p = 0.3 or s_p = 0.32. We now discuss these results in more detail.

The results of Fig. 3 demonstrate that a properly placed pause of certain duration leads to statistically significant improvement in the various TTS metrics on our ensemble of BD-MST instances. After sparsely sweeping through a range of parameters (not shown), we found that the parameter ranges t_p ∈ [0.25, 2] µs, s_p = 0.3 or s_p = 0.32, and |J_F| = 1.8 performed best. We consider the TTS, the instance-wise difference ∆TTS, and the instance-wise relative difference ∆TTS/TTS, taking the median of this difference across the ensemble. The "median of the difference" of the two latter metrics can be quite different from the "difference in median". Since the magnitude of the TTS across our instances ranges over a few orders of magnitude, the instance-wise relative difference ∆TTS/TTS can be quite different from the instance-wise difference ∆TTS. While several of the pause schedules are better than the baseline according to every metric we use, others only do better in some of the metrics. The magnitude of the improvement, as well as the optimal pause location and duration, can vary significantly depending on the metric.

The left panel of Fig. 3 shows TTS for the ensemble in the above narrowed parameter range. Plotted as a horizontal line is the baseline (no-pause case) at its optimal |J_F| = 1.6. At both pause locations s_p = 0.3 and s_p = 0.32, a pause duration t_p = 1 µs is optimal on the ensemble of 45 instances. While at s_p = 0.3, only the t_p = 1 µs case beats the baseline, at s_p = 0.32, the TTS for all values of t_p ∈ [0.25, 2] µs is consistently lower (better) than that of the baseline. (See Fig. 14 in Appendix F for the corresponding p_success.)

The center panel in Fig. 3 shows the median instance-wise difference ∆TTS for the ensemble of 45 instances. All the data points and their respective error bars are above 0, indicating that pausing provides a statistically significant improvement when the pause parameters are in the studied range with s_p ∈ {0.3, 0.32} and t_p ∈ [0.25, 2] µs. For example, while the median TTS of the ensemble is better for the baseline case than for the pause schedule with s_p = 0.3 and t_p = 0.25 µs, this pause schedule did better than the baseline on more than half of the 45 instances, leading to a positive ∆TTS. These two metrics provide different information about the strengths of each method.

The right panel of Fig. 3 represents the instance-wise relative improvement in TTS; that is, each instance-wise improvement is divided by the corresponding baseline no-pause TTS for that particular instance, and then the median over the ensemble of instances of this set of values is calculated. We find that for a pause duration of t_p = 1 µs the median relative improvement attains an optimal value of ∼0.22. This pause duration was not the optimal one for the absolute improvement shown in the middle panel, giving somewhat lower values of ∆TTS than a pause duration of 0.75 µs. This 'change of order' occurs whenever the following condition is met:

TTS_j(base) / TTS_k(base) > ∆TTS_j(t_p, s_p) / ∆TTS_k(t'_p, s'_p),   (5)

where j is the instance where the median of ∆TTS/TTS(t_p, s_p) occurs, and k the instance where the median of ∆TTS/TTS(t'_p, s'_p) does. We examine in more detail the four best data points with respect to the ∆TTS/TTS metric, those with s_p ∈ {0.3, 0.32} and t_p ∈ {0.75, 1} µs.

We first look at the absolute improvement. At pause location s_p = 0.3, the median improvements for pause durations t_p = 1 and 0.75 µs were 216 and 266 µs, respectively. At s_p = 0.32, they are 369 and 401 µs, respectively, all the same order of magnitude. Consider the four instances that yield these four values. Their baseline no-pause TTS values vary considerably, being 897, 5003, 2738, and 25581 µs, respectively. The substantially longer baseline TTS for the instances that are the median in each of the 1 µs cases than those in the 0.75 µs cases (5x at 0.3 and 10x at 0.32) suggests that the 1 µs pause will perform better than the 0.75 µs pause under the relative difference metric. (This is not certain because the median in the two metrics may correspond to different instances.)

We now look at the relative improvement. Compared to how it did with respect to the ∆TTS metric, the 0.75 µs pause did much worse than expected relative to the other pause durations. For all four cases with parameters s_p ∈ {0.3, 0.32} and t_p ∈ {0.75, 1} µs, many more instances' relative performance improved with a pause than were hurt by a pause (see Tables I and II). On the other hand, for pauses at s_p = 0.3, the harm was larger in the t_p = 0.75 case, with the median harm over 5 times that of the median benefit, compared to the other case where the ratio was less than 2. At s_p = 0.32, the median benefit is larger than or the same as the median harm. (In all cases, there are 3 instances that were not solved with or without a pause, hence are not included here.)

TABLE I: s_p = 0.3
                  No. instances   median ∆TTS/TTS
t_p = 1
  pause hurts          15            -0.6666
  pause helps          27             0.3960
t_p = 0.75
  pause hurts          14            -1.1875
  pause helps          28             0.2522

TABLE II: s_p = 0.32
                  No. instances   median ∆TTS/TTS
t_p = 1
  pause hurts          12            -0.1993
  pause helps          30             0.3467
t_p = 0.75
  pause hurts          –                –
  pause helps          –                –
30 0 . s p × × TT S ( µ s ) s p = 0 . s p = 0 . .
30 0 . s p ∆ TT S ( µ s ) s p = 0 . s p = 0 . .
30 0 . s p . . . . . . . ∆ TT S / TT S s p = 0 . s p = 0 . no pause t p = 0.25 µ s t p = 0.5 µ s t p = 0.75 µ s t p = 1.0 µ s t p = 2.0 µ s FIG. 3:
Effect of pause duration on TTS. Left : With pause duration of { . , . , . , , } µ s, and | J F | =1.8,the median TTS for an ensemble of 45 instances is shown for pause locations s p = 0 . s p = 0 .
FIG. 3: Improvement in TTS from pausing. Left: The reference (horizontal line and band for the median and the 35th and 65th percentiles, respectively) is the no-pause case with parameters optimal for TTS: t_a = 1 µs and |J_F| = 1.6. Data points show the median, with error bars at the 35th and 65th percentiles, after performing 10 bootstraps over the set of instances. Center: Instance-wise absolute improvement in TTS (in µs). ∆TTS represents the reduction in TTS accomplished by introducing a short pause at an optimal location s_p; a positive ∆TTS indicates that the TTS is reduced (improved) by the introduction of the pause. ∆TTS is calculated by subtracting the TTS of the pause case from that of the no-pause case (with the optimal |J_F| for each case). Data points are the median; error bars are the 35th and 65th percentiles obtained from 10 bootstraps over 45 instances. Right: Instance-wise improvement ratio ∆TTS/TTS. Data points are staggered along the s_p axis for readability. Note that the error bars in all panels are the 35th and 65th percentiles of the bootstrap samples, and thus indicate the uncertainty in the reported medians, rather than the medians of the 35th and 65th percentiles over the instance ensemble.

When interpreting these results, it is worth keeping in mind that, given the exponential dependence of the TTS on the solution probability, long TTS values are subject to much greater statistical fluctuations than shorter ones.

C. Shift in optimal pause location with |J_F|

One interesting new avenue that opens up with the study of embedded problems is how the value of |J_F| affects the benefits and effects of pausing. As previously discussed, the p_success vs. s_p curve typically shows a peak around an optimal pause location and is mostly flat far away from it (as in Fig. 2). We have also seen in Fig. 1 that, without a pause, the value of |J_F| affects p_success. For the pausing case, when |J_F| increases, not only does the height of the peak change, but its position shifts as well, moving earlier in the anneal. The top panel of Fig. 4 shows this shift for a demo instance and a wide range of |J_F|, with the horizontal axis spanning the range of pause locations where the peak in p_success is found. Such clear shifting is found in many instances, and results in a shift in the behavior of the whole instance ensemble, as shown in the bottom panel of Fig. 4. For figure clarity, pausing results for just three values of |J_F| are shown; the shift is consistent over all |J_F| values we examined.

FIG. 4: Shift of optimal pause location with |J_F|. Top: Probability of success versus pause location for the demo instance. The anneal was performed with a 1 µs anneal time and a 100 µs pause. A monotonic shift in the peak location with |J_F| is observed. The horizontal line corresponds to no pause, |J_F| = 1.6, and an anneal time of 1 µs. The reason for the lower success probability at |J_F| = 1.2 is detailed in Sec. IV C. Bottom: Probability of success for an ensemble of 9 instances of n = 5 with a pause duration t_p = 100 µs and t_a = 1 µs. The horizontal lines (median) and bands (35th to 65th percentiles) are baseline results with no pause and |J_F| = 1.6. Blue (lower) line/band: t_a = 1 µs; orange (upper) line/band: t_a = 101 µs.

Note also that p_success for small |J_F| values, such as 1.2 and 1.3, is clearly lower than for larger |J_F| values, even away from the peak. (This holds true for the ensemble of instances, although some individual outliers with a high p_success at small |J_F| have been found; Fig. 11 in the Appendix shows some examples.) The reason is that when the ferromagnetic coupling is not very strong compared to the problem couplings, it is more likely that the low-lying energy states are densely populated by states with an inconsistent vertex model, i.e., states in which not all the qubits in a vertex model are aligned and hence no longer act as a single variable. Accordingly, even when the annealer does well at finding the ground state or a low-lying state, such outcomes do not correspond to a valid solution of the original problem. Indeed, by applying simulated annealing to solve the embedded problem (which is too large to diagonalize exactly), we verified that in the range of |J_F| we used, the fraction of inconsistent-vertex-model states among the ground and low-lying states is significantly higher for the smaller |J_F| values. This dependence on |J_F| fits our theoretical understanding, which is laid out in Section V B.
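The vertex-model consistency check described above can be sketched in code. This is an illustrative sketch, not the code used in the study; the sample and embedding formats and both function names are our own assumptions:

```python
def decode_sample(sample, embedding):
    """Map a physical spin sample back to logical variables.

    sample: dict mapping physical qubit -> spin (+1 or -1).
    embedding: dict mapping logical variable -> list of physical qubits
    (its vertex model). Returns (logical_sample, consistent), where
    consistent is False if any vertex model is broken, i.e. its qubits
    do not all agree.
    """
    logical, consistent = {}, True
    for var, chain in embedding.items():
        spins = [sample[q] for q in chain]
        if len(set(spins)) > 1:
            consistent = False  # inconsistent vertex model (broken chain)
        # majority vote is a common convention to still extract a logical value
        logical[var] = 1 if sum(spins) >= 0 else -1
    return logical, consistent

def consistent_fraction(samples, embedding):
    """Fraction of samples whose vertex models are all consistent."""
    flags = [decode_sample(s, embedding)[1] for s in samples]
    return sum(flags) / len(flags)
```

For instance, with embedding {0: [0, 1], 1: [2]}, the sample {0: +1, 1: -1, 2: -1} has a broken two-qubit chain and is flagged as inconsistent, even if its energy is low.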
FIG. 5:
Effect of partial gauges. Top: Partial gauges help boost the average success probability for a set of 117 instances. Middle: Effect of |J_F| on p_success for a single instance; t_a = 1 µs, with a pause of duration 100 µs, and no gauge transformations performed. Bottom: The same instance and parameters as in the middle panel, but with 100 partial gauges applied. Partial gauges suppress the variance and reveal the peak shift with |J_F|.

D. Help of partial gauges
We developed a partial gauge transformation technique that significantly improved the success probability and enabled the confirmation of the peak shift.

Gauge averaging is a technique commonly used to alleviate the effect that intrinsic biases on the local fields and couplers can have on the data obtained from a quantum annealer [17]. It can help improve statistics and lead to less noisy results and improved p_success and TTS. A gauge transformation starts with assigning a random sequence a_j ∈ {±1} to re-define the basis for each qubit, Z̃_j = a_j Z_j. If we accordingly adjust the local fields and couplers such that

J̃_ij = a_i a_j J_ij, h̃_i = a_i h_i,

then the resulting Hamiltonian has the same energy spectrum as the original one. This Hamiltonian is run on the annealer and the output bit string is transformed back using the same a_j's. By performing multiple gauge transformations and averaging results over them, biases that stem from, for example, a qubit having a slight preference for aligning in one direction over the opposite can be suppressed.

When J_ij ∈ [−1, 1], any gauge transformation keeps the couplings within the allowed range. With the extended J range, however, allowing J_ij ∈ [−2, 1], couplings in [−2, −1) cannot change sign. Our partial gauge method circumvents this issue by only applying the gauge transformation to the couplings within the interval [−1, 1]. In prior work on a family of embedded problems, an improvement in p_success from pausing was observed, but no relation between the optimal pausing location and J_F was reported. In our study, with the help of the partial gauge transformation, the variance in the annealing output is significantly suppressed, resulting in the revelation of the shift of the peak in Sec. IV C. This improvement in the variance is seen in the top panel (note the log scale) and by comparing the middle and bottom panels of Fig. 5.

Another benefit was a remarkable increase in p_success for hard problems. Usually, we don't expect p_success to change significantly from gauge averaging, because solving the problem without gauge transformations is just applying one gauge, which typically will be near average rather than an outlier. But the success probability is lower bounded by zero, and when problems like the ones we are solving here are difficult for the solver, the typical empirical p_success is zero or very close to zero. The existence of such a lower bound explains the significant benefit of applying gauges: even if we get a bad gauge, p_success cannot go below 0, while a good gauge can yield a much higher p_success. Among a number of gauges, one is likely to encounter a few good ones, bringing the average p_success up. The top panel of Fig. 5 shows, for a large ensemble of instances, that the improvement in success probability with 100 partial gauges is significant: about an order of magnitude over the results obtained without gauge transformations.

The improvement in p_success saturates as one increases the number of gauges applied. Fig. 13 in the Appendix shows that, for an n = 4 ensemble, applying as few as 10 gauges yields similar p_success to 100 gauges.
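The partial gauge transformation can be sketched as follows. This is an illustrative sketch, not the study's code: the dict-based h, J format and function names are our own, and we realize "only gauging couplings in [−1, 1]" by giving qubits joined by out-of-range couplers a common gauge sign, which leaves those couplers unchanged while preserving the spectrum:

```python
import random

def partial_gauge(h, J, lo=-1.0, hi=1.0):
    """Random gauge a_j in {+1,-1}: h_i -> a_i h_i, J_ij -> a_i a_j J_ij.

    Couplers outside [lo, hi] (e.g. ferromagnetic chains at J = -1.6 in the
    extended range [-2, 1]) cannot change sign, so qubits joined by such
    couplers are grouped (union-find) and share one gauge sign: a_i a_j = 1
    there, and only in-range couplers are effectively transformed.
    """
    qubits = set(h) | {q for e in J for q in e}
    parent = {q: q for q in qubits}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for (i, j), v in J.items():
        if not lo <= v <= hi:
            parent[find(i)] = find(j)  # force a common sign on this coupler

    sign, a = {}, {}
    for q in qubits:
        r = find(q)
        if r not in sign:
            sign[r] = random.choice((-1, 1))
        a[q] = sign[r]

    h_g = {i: a[i] * v for i, v in h.items()}
    J_g = {(i, j): a[i] * a[j] * v for (i, j), v in J.items()}
    return a, h_g, J_g

def ungauge(sample, a):
    """Map a spin sample from the gauged problem back to the original basis."""
    return {q: a[q] * s for q, s in sample.items()}
```

Because every out-of-range coupler has a_i a_j = 1, the gauged problem stays within the hardware's asymmetric J range, while the energy of any sample under the gauged Hamiltonian equals that of the ungauged sample under the original Hamiltonian.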
These results indicate that with 10 gauges we are already likely to encounter one or more positive outliers, leading to the large improvement in p_success. As the number of gauges further increases, the effect is not as dramatic, indicating that the spread in gauge quality approaches the intrinsic distribution.

The partial gauge transformation therefore enables us to extend the benefits of general gauge averaging to embedded problems.

V. PHYSICAL PICTURE
In this section, we expand the physical picture of Ref. [1] to embedded problems, explaining both the shift of the optimal pausing location with increasing |J_F| and why embedded problems, while benefiting significantly, benefit less from a pause than native problems. We provide a perturbation analysis supporting the picture, and numerical evidence on the change in the minimal gap location. The picture is far from that of the adiabatic regime: pausing is effective after, not at, the minimum gap, and diabatic and thermal effects play a significant role.

A. How pausing helps
We start with a recap of the physical picture of Ref. [1] that explains the increase in success probability from introducing a pause in the middle of the annealing schedule, after the minimal energy gap. Recent work [3] verified this qualitative picture in numerical simulations, and also provided sufficient conditions under which pausing improves success probability. Loosely speaking, so long as the relaxation time scale shortly after the minimum gap is small enough (relative to the pause time), one can expect a pause to boost success probability. As discussed above, whether or not this improves the TTS is not as obvious.

We use GS and FES to refer to the ground state and the subspace of first excited states of the instantaneous quantum Hamiltonian. In the rest of the section we refer to the gap as the energy gap between the GS and the FES.

At very early or late stages of the anneal, only one Hamiltonian dominates: either the driver H_X or the problem H_C. Since both the problem Hamiltonian and the driving Hamiltonian are classical when acting alone, dynamics in these regions are almost classical. Because the temperature T is much lower than the relevant energy scale, thermal relaxation rates remain slow.

In the middle of the anneal, when the scales of H_X and H_C are comparable, the system dynamics is determined by the interplay of the energy gap, non-adiabaticity (annealing speed relative to the gap), and thermalization. In this region we expect significant population loss from the ground state to excited states. In particular, when the gap is small enough, approximate instantaneous thermalization may occur, populating excited states. This region is also where non-adiabatic transitions are expected to be largest.

We thus distinguish three different regimes in the anneal, as described below and illustrated by the cartoon schematic in Fig. 6.

Regime I: ||A(s) H_X|| ≫ ||B(s) H_C||.
The instantaneous Hamiltonian is mainly H_X, and its energy scale is much larger than the temperature T. The system stays in the ground state of H_X.

Regime II: ||A(s) H_X|| ∼ ||B(s) H_C||, and their energy scale is comparable to the temperature. Both thermal and quantum dynamics occur, and the minimal gap lies in this region. Thermalization and non-adiabaticity both contribute to populating the FES (compared to the case of zero-temperature adiabatic evolution, in which all population remains in the GS). As the anneal proceeds in this regime, it sequentially goes through the following stages:

(a) The gap approaches the temperature and the system leaves the adiabatic regime, but transitions (non-adiabatic and thermal) may still be relatively slow compared to the system evolution.

(b) The gap is near its minimum and is much smaller than the temperature: thermalization happens almost instantaneously and the system is near the thermal equilibrium state. Quantum non-adiabatic effects could be strong enough to increase the population of the FES beyond its magnitude at thermal equilibrium.

(c) The gap is larger than the temperature and non-adiabaticity is weak. The system may still approximately equilibrate if given enough time (e.g., a pause), but will not fully thermalize during the standard evolution.

At stage IIb, the instantaneous equilibration erases the memory of the state history: the system is simply in thermal equilibrium. Due to the closeness of the GS and FES, the FES is significantly populated. As the system enters stage IIc from IIb, pausing promotes better thermalization, which can bring significant FES population in stage IIc down to the GS, since the temperature T is now lower relative to the gap, hence boosting the success probability.

FIG. 6: Cartoon diagram for the three regimes. Colored in purple, the left- and rightmost regions are the adiabatic Regimes I and III (which further extend to the left and right as indicated by the arrows). Regime II is further subdivided into three stages, a, b, c, as in the main text, determined by the instantaneous gap ∆ and the temperature T. In IIa we expect the system to stop behaving strictly adiabatically; in IIb instantaneous thermalization may occur if the relaxation time scale is small enough, as well as non-adiabatic transitions; in IIc, a pause may help to repopulate the GS. This should be thought of as an approximate picture of what occurs, to aid the reader; in reality the transitions between these regions will of course not be sharp or defined by a single point during the evolution.

Regime III: ||B(s) H_C|| ≫ ||A(s) H_X|| and ||B(s) H_C|| ≫ T. Dynamics are slow, the system simply picks up phases under H_C, and the population distribution is final. This is also known in the literature as the frozen region.

B. How |J_F| shifts the optimal pausing location earlier

An increase in |J_F| is expected to shift the minimal gap to earlier in the anneal, meaning that stage IIb occurs earlier, and therefore also shifting the optimal pause region IIc earlier. This shift of the minimal gap can partly result from the increase in the relative norm of H_C, i.e., a decrease of the ratio (A(s)/B(s)) (||H_X||/||H_C||). Similar to Ref. [7], this is akin to shifting each point earlier in the anneal.

In Fig. 7, for a small Ising problem embedded into 4 physical qubits with Chimera connectivity—which allows exact diagonalization of the instantaneous

FIG. 7: Shift of minimal gap with |J_F|.
Energy gap between the ground and the first excited states of the instantaneous quantum Hamiltonian during the anneal, for a toy problem. The logical problem is an Ising problem on a complete graph of size 3, embedded into 4 physical qubits with Chimera connectivity. The gap is computed exactly by diagonalizing the instantaneous Hamiltonian. As |J_F| increases, the instantaneous gap closes, and the minimum gap shifts to earlier in the anneal.

Hamiltonian—we show the change of the minimal gap with |J_F|. Note that because the cells in Chimera are bipartite graphs, odd cycles are not native to the structure; in this small example, a triangle on 3 nodes requires minor embedding as a square on 4 qubits. Below we provide an argument as to why the minimum gap increases in value with decreasing ferromagnetic strength.

Before providing a proof sketch for the gap increase, we mention another picture that comes into play: an increase in |J_F| yields clusters (physical qubits representing the same logical qubit) with stronger internal couplings. Changing the state of such clusters requires collective flipping of qubits, demanding greater quantum dynamics. Accordingly, the transition from stage IIb to IIc happens earlier in the anneal. Such a picture may also account for the less dramatic increase in success rate compared to the native Ising case: the associated energy barrier may require a much higher relative temperature, so while pausing earlier helps, the amount it can help is limited (because it is an interplay of the three influences, which are correlated in a given annealing schedule and at a given temperature).

Proof sketch of gap scaling under |J_F|: We apply first-order non-degenerate perturbation theory. Let H(s) = H_0(s) + B(s) λ H_F, with H_0(s) = A(s) H_X + B(s) H_C + B(s) J_F H_F, where λ > 0, J_F < 0, and H_F is the ferromagnetic Hamiltonian for the vertex model. That is, we are considering the effect of weakening the ferromagnetic coupling, |J_F| → |J_F| − λ.

FIG. 8: Energy level shift with |J_F|. The individual energy levels show an increase in energy upon decreasing the ferromagnetic strength (i.e., the case λ > 0).

To simplify matters, assume the only vertex model is a chain of length 2. Then H_F = σ^z_{k1} σ^z_{k2} for two qubits k1, k2. Write |E_i(s)⟩ for the instantaneous i-th eigenstate of H_0(s). For simplicity, we drop the explicit s dependence (i.e., we will just consider s fixed at some value). Then we can always decompose our instantaneous eigenstates in the computational basis, |E_i⟩ = Σ_j a_j^(i) |z_j^L⟩ + Σ_k b_k^(i) |z_k^B⟩, where the |z_j^L⟩ are logical states and the |z_k^B⟩ have the chain broken. We compute the matrix elements

⟨E_i| H_F |E_i⟩ = Σ_j |a_j^(i)|² − Σ_k |b_k^(i)|². (6)

Note, by normalization, Σ_k |b_k^(i)|² = 1 − Σ_j |a_j^(i)|², and, denoting the logical probability P_L^(i) := Σ_j |a_j^(i)|²,

⟨E_i| H_F |E_i⟩ = 2 P_L^(i) − 1. (7)

This tells us, to first order in λ > 0, that the low-lying energy levels experience an increase in energy upon decreasing the ferromagnetic strength (|J_F| → |J_F| − λ), i.e., E'_i = E_i + Bλ (2 P_L^(i) − 1) > E_i, assuming that P_L^(i) > 1/2. We see behaviour consistent with this picture in Fig. 8 (even though this figure is not in the perturbative limit).

Now the gap ∆ = E_1 − E_0 changes under λ, to first order, as

∆' = ∆ + Bλ (⟨E_1| H_F |E_1⟩ − ⟨E_0| H_F |E_0⟩) = ∆ + 2Bλ (P_L^(1) − P_L^(0)), (8)

which therefore increases in magnitude (at a fixed s) upon weakening the ferromagnetic couplings, assuming P_L^(1) > P_L^(0).

Note that at the start of the anneal, |E_0⟩ = |+⟩^⊗N, and so P_L^(0) = 1/2. At s = 0, FESs are linear combinations containing one excitation in the x eigenbasis, i.e., a single |−⟩. Consider the symmetric FES, denoted |FE_+⟩, where the state of the chain is (1/√2)(|−+⟩ + |+−⟩) (and the other qubits are all |+⟩). This state is entirely in the logical subspace, due to the cancelling of the |01⟩ and |10⟩ terms. When the transverse field is 'strong', i.e., 'near' s = 0 (but where the FES degeneracy is broken), by the perturbation theory we may therefore indeed expect that ∆' > ∆. We see this in Fig. 7, where the strongest chain, |J_F| = 8, has the smallest instantaneous gap. In the arbitrary-chain-length case, following the general expression in the first line of Eq. (8), a similar argument applies provided P_L^(1) is large enough relative to P_L^(0), though the precise dependence is more complicated.

We also know that once the transverse field becomes weak relative to the problem Hamiltonian (e.g., A/B ≪ 1), the low-lying eigenstates approach classical states and P_L^(1) − P_L^(0) → 0, so the first-order change of the gap vanishes late in the anneal.

VI. CONCLUSIONS AND FUTURE WORK
We studied how mid-anneal pauses affect performance on embedded problems, using the previously unconsidered class of degree-bounded minimum spanning tree problems. We developed a partial gauge approach that allowed us to take advantage of the extended J range while also (partially) using gauges, yielding significantly cleaner results and improved performance over runs without partial gauges, and enabling us to confirm theoretical predictions. Our results confirm that, as for native problems, there is a region, consistent across instances, in which a pause improves the probability of success. We further showed that the pause generally improves the time-to-solution (TTS) for these problems and evaluated the performance on three TTS-related metrics. We extended the theoretical picture of [1] to embedded problems, describing the interaction of embedding parameters with annealing parameters, thermalization, and non-adiabatic effects. This picture explains why the optimal pause location moves earlier in the anneal as |J_F| increases, and why the benefit provided by pausing, while significant, is not as great as for native problems. It provides both deeper insights into the physics of these devices and pragmatic recommendations to improve performance on optimization and sampling problems.

This study suggests a number of avenues for future research. As the connectivity of quantum annealing hardware increases, as is anticipated in D-Wave's upcoming Pegasus architecture, lower embedding overhead should translate to greater benefit from pausing. Larger and more connected devices will allow larger problem sizes to be run, enabling scaling analyses. As annealing hardware becomes more flexible, a wider variety of advanced schedules becomes possible, such as a smooth slowing down rather than a pause, or annealing at different rates in different parts of the system depending on the local embedding characteristics or local problem structure. All of these possibilities should be explored on a variety of optimization problems as well as on the BD-MST problem class investigated here. Embedding affects sampling problems even more than optimization problems [19], so a study of the interplay between embedding parameters and annealing parameters should be done in that context as well. Experiments at other temperatures, and with the ability to perform quick quenches at arbitrary points in the anneal, would give further insight into the underlying physics. Further, given that diabatic behavior is expected to be useful even for devices that could remain adiabatic throughout a run, an intriguing area for both theoretical research and hardware development is the use of engineered dissipation to support cooling in conjunction with diabatic evolution, enabling much more controlled utilization of thermalization in the quantum annealers of the future.

VII. ACKNOWLEDGMENTS
We are grateful for support from NASA Ames Research Center, particularly the NASA Transformative Aeronautics Concepts Program. We also appreciate support from the AFRL Information Directorate under grant F4HBKC4162G001 and the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA), via IAA 145483. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. ZGI was also supported by the USRA Feynman Quantum Academy funded by the NAMS R&D Student Program at NASA Ames Research Center. ZGI, SH, JM, and ZW are also supported by NASA Academic Mission Services, Contract No. NNA16BD14C. We acknowledge insightful discussions with Davide Venturelli and also with Riccardo Mengoni, particularly preliminary work mapping spanning tree problems to QUBO [13].

[1] J. Marshall, D. Venturelli, I. Hen, and E. G. Rieffel, "Power of Pausing: Advancing Understanding of Thermalization in Experimental Quantum Annealers," Phys. Rev. Applied, 044083 (2019).
[2] G. Passarelli, V. Cataudella, and P. Lucignano, "Improving quantum annealing of the ferromagnetic p-spin model through pausing," Phys. Rev. B, 024302 (2019).
[3] H. Chen and D. A. Lidar, "Why and when is pausing beneficial in quantum annealing?" arXiv:2005.01888 (2020).
[4] Vijay V. Vazirani, Approximation Algorithms (Springer Science & Business Media, 2013).
[5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein,
Introduction to Algorithms (MIT Press, 2009).
[6] Michael R. Garey and David S. Johnson,
Computers and Intractability: A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., New York, NY, USA, 1979).
[7] V. Choi, "The effects of the problem Hamiltonian parameters on the minimum spectral gap in adiabatic quantum optimization," Quant. Inf. Process., 90 (2020).
[8] Eleanor G. Rieffel, Davide Venturelli, Bryan O'Gorman, Minh B. Do, Elicia M. Prystay, and Vadim N. Smelyanskiy, "A case study in programming a quantum annealer for hard operational planning problems," Quantum Information Processing, 1–36 (2015).
[9] Andrew Lucas, "Ising formulations of many NP problems," Frontiers in Physics, 5 (2014).
[10] V. Choi, "Minor-embedding in adiabatic quantum computation: I. The parameter setting problem," Quantum Inf. Process., 193 (2008).
[11] Yan-Long Fang and P. Warburton, "Minimizing minor embedding energy: an application in quantum annealing," arXiv:1905.03291 (2019).
[12] D. Venturelli, S. Mandrà, S. Knysh, B. O'Gorman, R. Biswas, and V. Smelyanskiy, "Quantum optimization of fully-connected spin glasses," Phys. Rev. X, 031040 (2015).
[13] Zhihui Wang, Mostafa Adnane, Bryan O'Gorman, Stuart Hadfield, Riccardo Mengoni, Davide Venturelli, Zoe Gonzalez Izquierdo, and Eleanor Rieffel, "Mapping spanning graph problems to quantum heuristics: techniques for handling global connectivity," in preparation (2020).
[14] "QPU Properties: D-Wave 2000Q System at NASA Ames," D-Wave User Manual 09-1151A-D (2019).
[15] Sergio Boixo, Troels F. Rønnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer, "Evidence for quantum annealing with more than one hundred qubits," Nature Physics, 218–224 (2014).
[16] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A.
Lidar, and Matthias Troyer, "Defining and detecting quantum speedup," Science, 420–424 (2014).
[17] Sergio Boixo, Tameem Albash, Federico M. Spedalieri, Nicholas Chancellor, and Daniel A. Lidar, "Experimental signature of programmable quantum annealing," Nature Communications, 2067 (2013).
[18] M. Kim, D. Venturelli, and K. Jamieson, "Leveraging Quantum Annealing for Large MIMO Processing in Centralized Radio Access Networks," in Proceedings of the ACM Special Interest Group on Data Communication, SIGCOMM '19 (Association for Computing Machinery, New York, NY, USA, 2019) pp. 241–255.
[19] J. Marshall, A. Di Gioacchino, and E. G. Rieffel, "Perils of embedding for sampling problems," Phys. Rev. Research, 023020 (2020).
[20] R. Ravi, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, and Harry B. Hunt III, "Many birds with one stone: Multi-objective approximation algorithms," in Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing (Association for Computing Machinery, New York, NY, USA, 1993) pp. 438–447.
[21] Martin Fürer and Balaji Raghavachari, "Approximating the minimum degree spanning tree to within one from the optimal degree," in
Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms (Society for Industrial and Applied Mathematics, 1992) pp. 317–324.
[22] Michel X. Goemans, "Minimum bounded degree spanning trees," in
Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS '06 (IEEE Computer Society, USA, 2006) pp. 273–282.
[23] Mohit Singh and Lap Chi Lau, "Approximating minimum bounded degree spanning trees to within one of optimal," in
Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (Association for Computing Machinery, New York, NY, USA, 2007) pp. 661–670.
[24] Jochen Könemann and R. Ravi, "A matter of degree: Improved approximation algorithms for degree-bounded minimum spanning trees," in
Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing (Association for Computing Machinery, New York, NY, USA, 2000) pp. 537–546.
[25] M. S. Zahrani, Martin J. Loomes, J. A. Malcolm, and Andreas A. Albrecht, "A local search heuristic for bounded-degree minimum spanning trees," Engineering Optimization, 1115–1135 (2008).
[26] Samir Khuller, Balaji Raghavachari, and Neal Young, "Low-degree spanning trees of small weight," SIAM Journal on Computing, 355–368 (1996).
[27] Raja Jothi and Balaji Raghavachari, "Degree-bounded minimum spanning trees," Discrete Applied Mathematics, 960–970 (2009).
[28] Thang N. Bui and Catherine M. Zrncic, "An ant-based algorithm for finding degree-constrained minimum spanning trees," in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (Association for Computing Machinery, New York, NY, USA, 2006) pp. 11–18.
[29] Thang N. Bui, Xianghua Deng, and Catherine M. Zrncic, "An improved ant-based algorithm for the degree-constrained minimum spanning tree problem," IEEE Transactions on Evolutionary Computation, 266–278 (2011).
[30] Mohan Krishnamoorthy, Andreas T. Ernst, and Yazid M. Sharaiha, "Comparison of algorithms for the degree constrained minimum spanning tree," Journal of Heuristics, 587–611 (2001).

Appendix A: Problem mapping: 'level-based'
Consider a graph G = (V, E) with a weight w(e) for each edge e, from which we wish to obtain a minimum-weight spanning tree with maximum degree ∆, i.e., to find its BD-MST. This involves minimizing the sum of the weights of the tree edges, represented by the cost function

C = Σ_{p,v} w_{pv} x_{p,v}, (A1)

which we explain below. Several constraints will also be imposed to ensure that the result is in fact a spanning tree and that its degree is bounded by ∆.

A root for the tree is picked randomly or based on problem structure—generally, picking a high-degree vertex as the root results in lower resource costs—and assigned to level 1. Its children are at level 2, their children at level 3, and so on, leading to the 'level-based' designation.

The variables x_{p,v} appearing in Eq. (A1) represent the parent-child relationships in the tree; x_{p,v} = 1 if p is the (adjacent) parent of v (and 0 if not). The indices p, v range over p = 1, ..., n and v = 2, ..., n, restricted to (intersected with) pairs (p, v) or (v, p) that occur in E. Thus there are 2 variables for every edge not containing the root, and one for every root edge, giving 2m − d_r total x_{p,v} variables, with m the number of edges in E and d_r the degree of the root.

Since our problem needs to be in QUBO form, the constraints are expressed as penalty terms. The first penalty term enforces that every node (except the root) has exactly one parent,

C_pen1 = Σ_{v ∈ {2,...,n}} ( Σ_{p: (pv) ∈ E} x_{p,v} − 1 )². (A2)

The number of terms in the sum is 2m − d_r, i.e., equal to the number of variables x_{p,v}.

The second penalty term enforces that each vertex exists at exactly one level of the tree,

C_pen2 = Σ_{v ∈ {2,...,n}} ( Σ_{ℓ=2}^{n} y_{v,ℓ} − 1 )². (A3)

It introduces the y_{v,ℓ} variables, with y_{v,ℓ} = 1 if v is at depth ℓ of the tree, v = 2, ..., n, ℓ = 2, ..., n. There are (n − 1)² such variables.
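Squared exactly-one penalties of this kind expand into QUBO coefficients mechanically. A minimal sketch of that expansion (the variable-key convention is our own illustration, not the code used for the study):

```python
from collections import defaultdict
from itertools import combinations

def add_one_hot_penalty(qubo, variables, weight=1.0):
    """Add weight * (sum_i x_i - 1)^2 to a QUBO stored as {(u, v): coeff}.

    For binary x_i (so x_i^2 = x_i), the expansion is
    (sum_i x_i - 1)^2 = 1 - sum_i x_i + 2 sum_{i<j} x_i x_j.
    The constant 1 is dropped, as it only offsets the energy.
    """
    for v in variables:
        qubo[(v, v)] -= weight           # linear part: -x_i
    for u, v in combinations(variables, 2):
        qubo[(u, v)] += 2 * weight       # quadratic part: +2 x_i x_j

# Example: a one-parent constraint for node v=3 with candidate parents 1, 2
# (as in Eq. A2), and a one-level constraint for the same node (as in Eq. A3).
qubo = defaultdict(float)
add_one_hot_penalty(qubo, [("x", 1, 3), ("x", 2, 3)])
add_one_hot_penalty(qubo, [("y", 3, level) for level in (2, 3, 4)])
```

With the constant dropped, any exactly-one assignment evaluates to −weight, while zero or two or more selected variables evaluate to a strictly higher value, which is what makes the term a valid penalty.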
However, since the number of variables will eventually determine how many logical qubits the problem requires, it is in our interest to reduce it as much as possible. By picking the root smartly, the range of ℓ can be reduced. We also carry out the following pre-processing: taking the original graph G = (V, E), the distance from each node to the one we have selected as the tree root is calculated. Given that it is impossible for a node to be at a level smaller than its distance to the root, we can avoid generating any y_{v,ℓ} for which that is the case, further bringing down the total number of y_{v,ℓ} variables.

The third penalty term enforces that the tree has degree at most ∆,

C_pen3 = Σ_{p=2}^{n} ( Σ_{v: (pv) ∈ E} x_{p,v} − Σ_{j=1}^{∆−1} z_{p,j} )² (A4)
+ ( Σ_{v: (1v) ∈ E} x_{1,v} − Σ_{j=1}^{∆} z_{1,j} )². (A5)

It is separated into two terms to account for the fact that the root can have up to ∆ children, while all other nodes cannot have more than ∆ − 1. To enforce the inequality Σ_{v: (pv) ∈ E} x_{p,v} ≤ ∆ − 1, an integer slack variable z_p ∈ [0, ∆ − 1] is introduced, and the inequality is enforced as the equality Σ_{v: (pv) ∈ E} x_{p,v} = z_p. The integer variable is further encoded into binary variables z_{p,j}. In general, various encoding methods can be applied to encode an integer into binary variables, including binary, unary, and one-hot encodings. While binary encoding is most efficient when the range is a power of two, we use unary encoding here, which applies straightforwardly to arbitrary values of ∆.

The fourth and final penalty term enforces that the tree encoding is consistent, i.e., that if p is the parent of v then its level is one less than v's,

C_pen4 = Σ_{p,v} Σ_{ℓ=3}^{n} x_{p,v} y_{v,ℓ} (1 − y_{p,ℓ−1}) (A6)
+ Σ_{v: (1v) ∈ E} x_{1,v} (1 − y_{v,2}) + Σ_{v: (1v) ∈ E} y_{v,2} (1 − x_{1,v}), (A7)

where the last two sums handle the d_r edges connected to the root and their terms are quadratic, while the first sum deals with the remaining edges and produces cubic terms of the form x_{p,v} y_{v,ℓ} (1 − y_{p,ℓ−1}). While the original number of cubic terms would be (2m − d_r)(n − 2), thanks to the pre-processing of the y_{v,ℓ} variables this number is reduced. Because cubic terms cannot be directly encoded on D-Wave, we introduce an ancilla variable a_{p,v,ℓ} to encode the product x_{p,v} y_{v,ℓ}, and accordingly a penalty function f(x, y, a) = 3a + xy − 2ax − 2ay is added to raise a penalty if a = xy is violated. The term x_{p,v} y_{v,ℓ} (1 − y_{p,ℓ−1}) can then be replaced by the quadratic expression

4a − a y_{p,ℓ−1} + x_{p,v} y_{v,ℓ} − 2a x_{p,v} − 2a y_{v,ℓ}. (A8)

The total number of variables (and hence logical qubits) without pre-processing is at most

2m − d_r + (n − 1)² + n(∆ − 1) + 1 + (2m − d_r)(n − 2) ≃ 2mn + n².

This would mean, for instance, that the complete graph K_5 with ∆ = 3 would require between 86 and 100 logical qubits (depending on d_r). With pre-processing, we are able to bring this number down to 74.

Finally, we can write the overall objective function as

C_tot = C + A (C_pen1 + C_pen2 + C_pen3 + C_pen4), (A9)

and accordingly the cost Hamiltonian H_C. In Eq. (A9) we have defined the minimum penalty weight to be the maximum edge weight,

A = w_max = max_{(uv) ∈ E} w_{uv}. (A10)

In Ref. [13] we prove that setting A = w_max + ε for any positive ε suffices to guarantee that C_tot is minimized by bounded-degree spanning trees that are optimal for C and correctly encoded. In our runs, for convenience, we set ε = 0, which could in principle have led to an invalid bit string also minimizing C_tot. The solutions returned from the quantum annealer were each checked for optimality and correct encoding. Though increasing A by any amount would guarantee that an optimal cost implies a correct encoding, in practice we observed that this was still the case despite having set ε = 0. We provide details and further discussion in [13].

Appendix B: Approximation complexity for BD-MST Problems
Finding a degree-bounded spanning tree of cost at most r times the optimum remains NP-hard for any r ≥ 1 when the degree bound must hold exactly, since even deciding whether a spanning tree of maximum degree ∆ exists is NP-hard; here ∆∗ denotes the minimal ∆ for which such a spanning tree exists. For the weighted case, Ref. [22] gives a polynomial-time algorithm that returns a spanning tree with vertex degree at most ∆ + 2 (subsequently improved to ∆ + 1 in [23]) and cost at most OPT, where OPT is the optimal spanning tree weight under the desired bound ∆. Alternatively, heuristics exist that return valid ∆-bounded spanning trees, but with suboptimal cost that may be difficult to quantify in general. A wide variety of approaches have been developed for this problem [24–29], including specific approximations for various special cases (e.g., geometric weights); see [30] for an overview.
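As a toy illustration of the kind of heuristic mentioned above, the sketch below grows a spanning tree greedily, Prim-style, refusing any edge that would push a vertex past the degree bound ∆. This is our own illustrative variant, not one of the cited algorithms: it can get stuck even when a valid tree exists, and the tree it returns may be suboptimal.

```python
def degree_bounded_tree(n, edges, delta):
    """Greedy Prim-style heuristic: grow a tree from vertex 1, always
    adding the cheapest edge that crosses the cut and keeps both
    endpoints below degree `delta`. Returns a list of (u, v, w) tree
    edges, or None if the greedy construction gets stuck."""
    in_tree = {1}
    degree = {v: 0 for v in range(1, n + 1)}
    tree = []
    while len(in_tree) < n:
        candidates = [(w, u, v) for (u, v, w) in edges
                      if (u in in_tree) != (v in in_tree)
                      and degree[u] < delta and degree[v] < delta]
        if not candidates:
            return None  # greedy choice painted itself into a corner
        w, u, v = min(candidates)
        degree[u] += 1
        degree[v] += 1
        in_tree.update((u, v))
        tree.append((u, v, w))
    return tree

# The 5-cycle m5ver1 with the first m = 5 weights of list w2 ([1, 2, 1, 2, 1]):
edges = [(1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (1, 5, 1)]
tree = degree_bounded_tree(5, edges, delta=2)
assert tree is not None and len(tree) == 4
# Every vertex respects the degree bound in the returned tree.
assert all(sum(x in (u, v) for u, v, _ in tree) <= 2 for x in range(1, 6))
```

On this instance the heuristic happens to return a valid ∆ = 2 spanning tree; in general only the degree feasibility of a returned tree is guaranteed, not its cost.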
Appendix C: BD-MST Problem instances
All connected graphs with n = 5 and m = |E| ranging from 4 to 10 for which a BD-MST with ∆ = 2 exists are considered. The edges of these graphs are provided in Table III. Additionally, the graph labeled 'm5ver5' is included to demonstrate a BD-MST with ∆ ≥ 3. For each graph, problem instances were generated by assigning a set of weights sampled from one of the lists of weights appearing in Table IV. The first m weights in each weight list were used to define an instance.

Appendix D: Embedding Statistics
Table V contains the mapped problem size for each graph, along with embedding features such as the number of physical qubits and the size of the vertex model. Embedding statistics on a future D-Wave architecture (Pegasus) are also included in this table. In the Pegasus architecture, each qubit can couple to 15 other qubits, as opposed to the Chimera architecture, which allows each qubit to connect to at most 6 additional qubits.

As discussed in Section III B, Fig. 9 contains detailed information about the typical size of the embedded problems for different graphs, including the number of physical qubits and the size of the vertex models.
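The bootstrapped median-with-percentile-band statistics used for the error bars in Fig. 9 (and in later figures) can be sketched as follows. The function name and the toy physical-qubit counts are our own; only the procedure (resample the ensemble, take the median of each resample, report the 35th and 65th percentiles of the resulting medians) follows the text.

```python
import random
import statistics

def bootstrap_median_band(samples, n_boot=1000, lo=35, hi=65, seed=0):
    """Median of `samples` plus a (lo, hi) percentile band estimated by
    bootstrapping: resample with replacement n_boot times, record the
    median of each resample, and read off percentiles of those medians."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]
        medians.append(statistics.median(resample))
    medians.sort()
    return (statistics.median(samples),
            medians[int(n_boot * lo / 100)],
            medians[int(n_boot * hi / 100)])

# e.g. physical-qubit counts over an ensemble of embeddings (toy numbers)
med, p35, p65 = bootstrap_median_band([212, 230, 198, 240, 221, 205])
assert p35 <= p65
```

This is the standard nonparametric bootstrap; with small ensembles (as here, 10 to 20 instances per graph) the band mainly reflects instance-to-instance spread rather than sampling noise.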
FIG. 9:
Embedding comparison between currentand future architectures.
Embedding of the complete graphs for problem sizes n = 4-10 with default embedding parameters and 10 to 20 instances drawn for each graph. Chimera embedding was performed with D-Wave's SAPI2 find_embedding routine on the D-Wave 2000Q hardware adjacency graph; Pegasus embedding was performed with the Ocean minorminer find_embedding routine. The median number of physical qubits is shown as a function of the number of logical qubits, with error bars at the 35th and 65th percentiles after bootstrapping over the ensemble of instances.

TABLE III: n = 5 graphs.

m  | label   | graph6 name | edges
4  | m4ver1  | DhC  | (1,2), (2,3), (3,4), (4,5)
5  | m5ver1  | Dhc  | (1,2), (2,3), (3,4), (4,5), (1,5)
5  | m5ver2  | DiK  | (1,2), (2,3), (2,5), (3,4), (4,5)
5  | m5ver3  | DjC  | (1,2), (2,3), (2,4), (3,4), (4,5)
5  | m5ver5  | DiS  | (1,2), (1,3), (1,4), (1,5), (4,5)
5  | m5ver6  | DKs  | (1,2), (2,3), (3,4), (4,5), (3,5)
6  | m6ver1  | DyK  | (1,2), (1,5), (2,5), (2,3), (3,4), (4,5)
6  | m6ver2  | DjS  | (1,2), (2,3), (2,4), (2,5), (3,4), (4,5)
6  | m6ver3  | DjK  | (1,2), (2,3), (2,5), (3,5), (3,4), (4,5)
6  | m6ver4  | D{K  | (1,2), (1,5), (1,3), (2,3), (3,4), (4,5)
6  | m6ver5  | D{c  | (1,2), (1,3), (2,3), (3,5), (3,4), (4,5)
6  | m6ver6  | D]o  | (1,2), (2,3), (3,4), (1,4), (2,5), (4,5)
7  | m7ver1  | D|S  | (1,2), (1,5), (1,4), (2,5), (2,3), (3,4), (4,5)
7  | m7ver2  | DzW  | (1,2), (1,5), (2,5), (2,3), (2,4), (3,5), (4,5)
7  | m7ver3  | D|c  | (1,2), (1,3), (1,4), (1,5), (2,3), (3,4), (4,5)
7  | m7ver4  | D~C  | (1,2), (1,3), (1,4), (2,4), (2,3), (3,4), (4,5)
7  | m7ver5  | D]w  | (1,2), (2,3), (3,4), (1,4), (4,5), (2,5), (3,5)
7  | m7ver6  | Dh{  | (1,2), (1,3), (1,4), (1,5), (2,3), (3,4), (4,5)
8  | m8ver1  | D}k  | (1,2), (1,5), (1,3), (1,4), (2,5), (2,3), (3,4), (4,5)
8  | m8ver2  | Dz[  | (1,2), (1,5), (2,5), (2,3), (2,4), (3,4), (3,5), (4,5)
9  | m9ver1  | D~k  | (1,2), (2,3), (4,5), (1,5), (1,4), (1,3), (2,5), (2,4), (3,5)
10 | m10ver1 | D~{  | (1,2), (2,3), (3,4), (4,5), (1,5), (1,4), (1,3), (2,5), (2,4), (3,5)

TABLE IV: Graph weights are uniformly drawn from the lists below.

label | weight list
w2  | [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
w3  | [1, 1, 2, 1, 1, 2, 1, 1, 2, 1]
w4  | [1, 1, 2, 2, 1, 1, 2, 2, 1, 1]
w5  | [1, 4, 1, 4, 1, 4, 1, 4, 1, 4]
w6  | [1, 3, 6, 1, 3, 6, 1, 3, 6, 1]
w7  | [1, 7, 1, 7, 1, 7, 1, 7, 1, 7]
w8  | [3, 2, 1, 3, 2, 1, 3, 2, 1, 3]
w9  | [4, 3, 2, 1, 4, 3, 2, 1, 4, 3]
w10 | [5, 4, 3, 2, 1, 5, 4, 3, 2, 1]
w11 | [6, 5, 4, 3, 2, 1, 6, 5, 4, 3]
w12 | [7, 6, 5, 4, 3, 2, 1, 7, 6, 5]
w13 | [1, 1, 3, 4, 2, 1, 2, 3, 4, 2]
w14 | [3, 2, 1, 1, 1, 1, 2, 4, 2, 2]
w15 | [2, 1, 2, 1, 4, 1, 1, 3, 3, 2]
w16 | [4, 3, 3, 4, 3, 3, 4, 3, 4]
w17 | [3, 4, 7, 5, 5, 5, 5]
w18 | [2, 1, 4, 1, 2, 1, 2]
w19 | [4, 6, 4, 7, 4, 7]
w20 | [1, 1, 2, 3, 2, 3]
w21 | [4, 5, 4, 5, 5]
w22 | [2, 2, 6, 2, 4]
w23 | [3, 3, 5, 2, 3, 2, 5, 2, 5]
w24 | [4, 3, 2, 2]
w25 | [2, 2, 6, 2, 4]
w26 | [4, 3, 3, 3]
w27 | [3, 4, 7, 5, 5, 5, 5]
w28 | [4, 6, 4, 7, 4, 7]
w29 | [6, 4, 2, 2]

n m Chimera Architecture Pegasus Architecture
TABLE V: Mapped problem sizes for n = 4-6 using the D-Wave Chimera architecture, and problem sizes n = 4-10 using the future D-Wave Pegasus architecture. For the Pegasus architecture entries, embedding was only performed for the complete graphs, which is the reason for the large number of unset entries in the last three columns. For the Chimera architecture entries, we were unable to embed graphs with n > 6, and the n = 4 rows correspond to network communication graphs that we examined early in the study, which is the reason for the missing Chimera architecture entries for the n = 4 graphs near the top of the table.

Appendix E: Details on unsolved instances
As a special case of the improvement in TTS, we find that for certain problems, annealing without a pause failed to find a solution even after 50K reads, while annealing with an appropriate pause was able to find one. In particular, out of the 45 instances tested, the no-pause annealing failed to solve 7. Of those 7, 3 remained unsolved by any of the pause runs (we consider a total of 10 pause runs, resulting from 2 pause locations s_p and 5 pause durations t_p = {0.25, 0.5, 0.75, 1, 2} µs), while the other 4 were solved by most or all of them: 2 were solved by all 10 pause runs, 1 by 9 pause runs, and the remaining one by 8. There are also 2 other instances that were solved by the no-pause runs but that 1 and 3 of the pause runs, respectively, could not solve (all the rest could). There are no instances solved by the no-pause runs that were not solved by any of the pause runs. Of the 10 pause runs, the worst one fails to solve 6 of the 45 instances (making it better than no pause by that metric). The second worst fails on 5, two fail on 4, and the remaining six fail on 3.

Appendix F: Supporting instances showcasing the shift of optimal pause location, improvements with partial gauges, and the effect of pause on success probability
In Fig. 10, we show the shift of the optimal pause location with |J_F| for a problem instance with bounded degree ∆ = 3. In Fig. 11, we show a few more instances from the instance ensemble for ∆ = 2, n = 5. Figure 12 illustrates the clear shift of the optimal pause location for an instance ensemble over all |J_F| values we examined (in the range [1.2, 2]); results for individual |J_F| are shown earlier in the bottom panel of Fig. 4 and discussed in Section IV C.

As discussed in Section IV D, the improvement in p_success saturates as one increases the number of gauges applied. Fig. 13 shows that for an n = 4 ensemble, applying as few as 10 gauges yields a p_success similar to that of 100 gauges. As detailed in Section IV D, these results indicate that with 10 gauges we are already likely to encounter one or more positive outliers, leading to the large improvement in p_success. As the number of gauges increases further, the effect is not as dramatic, indicating that the spread in gauge quality approaches the intrinsic distribution.

Figure 3 in Section IV B contains results for TTS for an ensemble in the narrowed parameter range discussed in this section. The corresponding p_success values are shown in Fig. 14.
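The gauge transformations referred to above send h_i → g_i h_i and J_ij → g_i g_j J_ij for signs g_i ∈ {±1}, which leaves the Ising spectrum unchanged. For a "partial" gauge one draws a single sign per chain, so that the intra-chain ferromagnetic couplings J_F are left untouched even when the available J range is asymmetric. The sketch below illustrates this; the dict-based problem layout and function name are our own, not the interface actually used in the experiments.

```python
import random

def apply_partial_gauge(h, J, chains, seed=0):
    """Apply a gauge transformation h_i -> g_i h_i, J_ij -> g_i g_j J_ij,
    drawing one random sign per *chain* (all physical qubits in a chain
    share the same sign), so that intra-chain couplings pick up
    g_i g_j = +1 and the ferromagnetic J_F values are unchanged."""
    rng = random.Random(seed)
    sign = {}
    for chain in chains:
        g = rng.choice((-1, 1))
        for q in chain:
            sign[q] = g
    h_g = {q: sign[q] * v for q, v in h.items()}
    J_g = {(a, b): sign[a] * sign[b] * v for (a, b), v in J.items()}
    return h_g, J_g

# Two logical variables, each represented by a 2-qubit chain with J_F = -1.8.
chains = [(0, 1), (2, 3)]
h = {0: 0.5, 1: 0.5, 2: -0.3, 3: -0.3}
J = {(0, 1): -1.8, (2, 3): -1.8, (1, 2): 0.7}
h_g, J_g = apply_partial_gauge(h, J, chains)
# Intra-chain couplings are invariant under the partial gauge.
assert J_g[(0, 1)] == -1.8 and J_g[(2, 3)] == -1.8
```

Only the inter-chain coupling (here (1, 2)) and the local fields can flip sign, which is what allows averaging over gauges without pushing any J_F outside the hardware range.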
[Figure: p_success vs. pause location s_p, for |J_F| from 1.2 to 2.]
FIG. 10:
Optimal pause location shift for an instance with ∆ = 3. Probability of success versus the annealing pause location for the n = 5, m5ver5 (m = 5, K_{1,4} + e, g6: DiS), weight set 14 instance, using embedding number 20 with ∆ = 3, 1 µs anneal, 100 µs pause, 50K reads, and 0 partial gauges. The pause location ranges from 0.2 to 0.5, and J_ferro is varied from -1.2 to -2.0. The peak in p_success shifts from s_p = 0.42 at J_ferro = -1.2, to s_p = 0.36 for J_ferro = -1.5, and to s_p = 0.32 for J_ferro = -2.0. Note that ∆ = 3 is the minimum degree bound for which this graph admits a bounded-degree spanning tree.
[Figure: nine panels of p_success vs. s_p, labeled n5m5ver2w14emb12, n5m5ver3w14emb12, n5m6ver1w14emb26, n5m6ver2w14emb12, n5m6ver3w14emb9, n5m6ver4w14emb11, n5m7ver1w14emb1, n5m7ver2w14emb2, and n5m7ver3w14emb13, for |J_F| from 1.2 to 2.]
FIG. 11:
Shift of optimal pause location with |J_F| (for multiple instances). Shift of the optimal pause location with |J_F| for multiple instances with a 100 µs pause. Note that the scale of the y axis differs across instances.
[Figure: p_success vs. s_p for |J_F| from 1.2 to 2.]
FIG. 12:
Shift of optimal pause location with |J_F| (ensemble). Probability of success for an ensemble of 9 instances of n = 5 with pause duration t_p = 100 µs and t_a = 1 µs.

[Figure: p_success vs. |J_F| with no gauges, 10 gauges, and 100 gauges.]
FIG. 13:
Improvement of probability of success with partial gauges.
Effect of partial gauges on the probability of success for 10 instances of n = 4 with varying |J_F| and a no-pause schedule.
[Figure: p_success vs. s_p at two pause locations, for pause durations t_p = 0.25, 0.5, 0.75, 1.0, and 2.0 µs.]
FIG. 14:
Effect of pause duration on success probability. Success probability corresponding to the TTS shown in Figure 3 in Section IV B. With pause durations of {0.25, 0.5, 0.75, 1, 2} µs and |J_F| = 1.8, the success probability for an ensemble of 45 instances is shown for two pause locations, including s_p = 0.32 (which we found to be optimal during an initial sweep). The reference (horizontal line and band, for the median and the 35th and 65th percentiles, respectively) is the no-pausing case with parameters optimal for TTS: t_a = 1 µs and |J_F| = 1.6. Data points show the median, with error bars at the 35th and 65th percentiles, after performing 105