Scaling overhead of locality reduction in binary optimization problems
Elisabetta Valiante, Maritza Hernandez, Amin Barzegar, Helmut G. Katzgraber
Microsoft Quantum, Microsoft, Redmond, Washington 98052, USA
Professional Services, Amazon Web Services, Seattle, Washington 98101, USA
Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843-4242, USA
(Dated: December 18, 2020)

Recently, there has been considerable interest in solving optimization problems by mapping these onto a binary representation, sparked mostly by the use of quantum annealing machines. Such a binary representation is reminiscent of a discrete physical two-state system, such as the Ising model. As such, physics-inspired techniques, commonly used in fundamental physics studies, are ideally suited to solve optimization problems in a binary format. While binary representations can often be found for paradigmatic optimization problems, these typically result in k-local higher-order unconstrained binary optimization cost functions. In this work, we discuss the effects of the locality reduction needed for solvers such as the D-Wave quantum annealer, the Fujitsu Digital Annealer, or the Toshiba Simulated Bifurcation Machine, which can only accommodate 2-local (quadratic) cost functions. General locality-reduction approaches require the introduction of ancillary variables, which cause an overhead over the native problem. Using a parallel tempering Monte Carlo solver on Azure Quantum, as well as k-local binary problems with planted solutions, we show that, post reduction to a corresponding 2-local representation, the problems become considerably harder to solve. We further quantify the increase in computational hardness introduced by the reduction algorithm by measuring the variation of the number of variables, the statistics of the coefficient values, and the entropic family size. Our results demonstrate the importance of avoiding locality reduction when solving optimization problems.

PACS numbers: 75.50.Lk, 75.40.Mg, 05.50.+q, 64.60.-i
I. INTRODUCTION
In recent years there have been many technological and algorithmic advances in solving optimization problems, in particular in industrial settings. Sparked by the work of D-Wave Systems Inc., a whole new field of optimization based on physical processes has emerged. Specifically, the development of hardware quantum annealers has stimulated new ways of analyzing problems previously thought to be intractable.

Despite these advances, the use of quantum annealers for large-scale industry applications remains limited if not paired with classical algorithms on CMOS hardware. Tackling an application first requires a Boolean representation of the problem. In most cases, this mapping step carries a variable overhead, which typically makes a problem harder to solve. However, due to hardware limitations, only 2-local (quadratic unconstrained binary optimization, or QUBO) cost functions can be tackled with quantum annealing hardware. This means that a higher-order unconstrained binary polynomial optimization problem requires a locality reduction, which can result in a sizable variable overhead. In this work, we focus on the locality reduction, and do not discuss additional overheads due to the embedding of a binary problem onto the hardwired sparse quasi-two-dimensional topology of annealing hardware or the effects of analog noise.

The hardware limitations play an important role when solving problems naturally formulated as a Hamiltonian with k-local interactions with k > 2. There are various optimization problems, both in fundamental physics and in applications, that are natively k-local. Examples in physics are computing the partition function of a four-dimensional pure lattice gauge theory [1, 2], measuring the fault tolerance of topological colour codes [3], and solving k-SAT problems with k > 2.

∗ The work of H. G. K. was performed before joining Amazon Web Services.
Examples of practical applications are circuit fault diagnosis [4, 5], molecular similarity measurement [6], molecular conformational sampling [7], and traffic light synchronization [8].

Quadratization techniques are algorithms used to reduce a higher-degree multilinear polynomial to a quadratic one [9]. The reduction process can introduce different types of overhead. First, the quadratization itself can result in a large overhead before any solver is applied to the problem of interest. Second, quadratization requires the introduction of additional variables and terms. As such, the complexity of the problem increases and, in turn, so does the time to solution. Finally, the quadratization process might also introduce features (e.g., broader coupler distributions) that affect the intrinsic difficulty of the problem. An extensive comparison of several quadratization methods, highlighting the pros and cons of each, has been compiled by Dattani in Ref. [10].

In this paper, we use Microsoft Quantum's k-local solvers based on simulated annealing and parallel tempering Monte Carlo to measure the time overhead introduced by the quadratization process when reducing an optimization problem with k-local interactions to its 2-local counterpart. We study unconstrained problems with a binary representation and planted solutions, and disregard the time it takes for the quadratization algorithm to run. Our results demonstrate that the locality reduction introduces a large overhead when solving the problems. Employing a commonly used proxy metric, we demonstrate that, on average, optimization problems become much harder to solve when the locality is reduced. Könz et al. (in prep.) study the embedding overhead when using sparse hardware topologies.
Both complementary studies highlight the importance of developing new optimization machines and techniques that can handle k-local cost functions natively on complete graphs.

This paper has the following structure: in Sec. II, we describe the benchmark problems used for the experiment; in Sec. III, we present the setup of the experiment and the metrics used to compare performance; in Secs. IV and V, we discuss and analyze the results of the experiment; in Sec. VI, we present our conclusions.

II. BENCHMARK PROBLEMS
In order to study the scaling overhead caused by reducing a k-local problem to a quadratic (2-local) formulation, we first generate Ising problems for k = 3 and k = 4. The k-local instances have been generated using the Chook package, which is publicly available on GitHub; see [11]. Using this package, we are able to construct planted-solution instances, thus ensuring that the ground state and the corresponding energy are known a priori. The construction of k-local problems is done by combining problems of lower order, that is, k ≤ 2. In this study, 3-local instances have been generated by combining a tile-planting problem with Ising spins coupled to a bimodal random field, while 4-local instances have been generated by combining two tile-planting problems. The tile-planting problems are defined by four subproblem classes that correspond to unit cycles (plaquettes) with different levels of frustration. A subproblem is constructed by assigning discrete coupler values according to the class to which the subproblem belongs. The class is assigned with a certain probability, and each instance class is defined by three probability parameters. We set these parameters to the default values used in Chook [11]. For each locality considered, we generate instances with problem sizes N (number of variables) between 16 and 400.

The k-local instances are then reduced to their quadratic form using an iterative reduction-by-substitution algorithm [12, 13]. Here, the product of two variables is substituted by a new auxiliary variable, and a penalty term is added to enforce equality in the ground state. This process is repeated until the final function becomes quadratic. Tuning the value of the energy penalty is extremely important: a value that is too small could yield a 2-local problem that does not have the same optimum as the original higher-order problem. Therefore, a large value is commonly used in various implementations of this algorithm.
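As an illustration of reduction by substitution [12, 13], the following is a minimal sketch (not the Hobo2Qubo implementation used in this work; all names are ours) for 0/1 variables. Each product x_i x_j appearing in a higher-order term is replaced by an ancilla y, and the Rosenberg penalty M(x_i x_j − 2 x_i y − 2 x_j y + 3y) is added, which is zero when y = x_i x_j and at least M otherwise:

```python
from itertools import product

def quadratize(poly, penalty):
    """Iterative reduction by substitution for 0/1 variables.

    poly: dict mapping frozensets of variable names to coefficients.
    penalty: the (sufficiently large) penalty weight M.
    """
    poly = dict(poly)
    n_ancilla = 0
    while True:
        high = [t for t in poly if len(t) > 2]
        if not high:
            return poly
        xi, xj = sorted(high[0])[:2]
        y = "y%d" % n_ancilla
        n_ancilla += 1
        # Substitute the pair (xi, xj) by the ancilla y in every term.
        for t in list(poly):
            if xi in t and xj in t:
                coeff = poly.pop(t)
                new = (t - {xi, xj}) | {y}
                poly[new] = poly.get(new, 0) + coeff
        # Rosenberg penalty enforcing y = xi * xj in any ground state.
        for vars_, c in (((xi, xj), 1), ((xi, y), -2), ((xj, y), -2), ((y,), 3)):
            key = frozenset(vars_)
            poly[key] = poly.get(key, 0) + penalty * c

def energy(poly, assignment):
    """Evaluate a multilinear 0/1 polynomial on an assignment dict."""
    return sum(c * all(assignment[v] for v in t) for t, c in poly.items())
```

For a sufficiently large penalty, minimizing the quadratized polynomial over all variables, including the ancillas, recovers the optimum of the original higher-order problem.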
The penalty value can grow very large when high values of k are reduced. This can pose issues when attempting to solve problems on current analog quantum annealing hardware, because large coefficients amplify the effects of the analog noise. The reduction of k-local problems in this work is done via the Hobo2Qubo function available through 1QBit's 1Qloud Platform [14], which uses a tight bound for the penalty coefficient and sets it independently for each reduced term. The computational time required to reduce a single instance is negligible with respect to the time required by the solver. Moreover, the reduction from k-local to 2-local is known to scale polynomially [13]; that is, it should be irrelevant in the thermodynamic limit.

The sizes and densities of the 2-local instances obtained after reduction from 3-local and 4-local instances are shown in Tables I and II, respectively. The number of variables increases considerably when reducing the locality from k-local to 2-local, as can be expected for a reduction-by-substitution algorithm.

The density ρ of a k-local instance is calculated as

  ρ = [1/(k − 1)] Σ_{k_t = 2}^{k} [(N − k_t)! k_t! / N!] E_{k_t},   (1)

where k is the locality of the polynomial, the sum runs over all degrees in the polynomial from k_t = 2 to k_t = k, E_{k_t} is the number of individual terms of degree k_t, and N is the number of variables in the polynomial. For 2-local instances, this expression reduces to the common graph-density expression. Notice that, for all problems, the densities decrease slightly after the locality reduction.

TABLE I: Reduction of 3-local problems to 2-local problems. Densities for each instance are calculated as per Eq. (1). The mean values (denoted by an overbar) are calculated over the generated instances. The number of variables of the reduced problems increases by a factor of ∼3.

      3-local              2-local reduction
   N       ρ̄              N̄       ρ̄
   16    — ± 0.020         46      —
   64    — ± 0.014        192      —
  144    — ± 0.008        432      —
  256    — ± 0.007        768      —
  400    — ± 0.005       1200      —

TABLE II: Reduction of 4-local problems to 2-local problems. Densities for each instance are calculated as per Eq. (1). The mean values (denoted by an overbar) are calculated over the generated instances. The number of variables of the reduced problems increases by a factor of ∼6.

      4-local              2-local reduction
   N       ρ̄              N̄       ρ̄
   16    — ± 0.023         76      —
   64    — ± 0.017        448      —
  144    — ± 0.008        887      —
  256    — ± 0.007       1501      —
  400    — ± 0.004       2248      —

III. EXPERIMENT SETUP
The simulations are performed with Azure Quantum's solvers, which can handle k-local terms natively. There are two variants of the solvers. The parameter-free version requires the user to enter only a timeout and automatically optimizes the parameters to find solutions to binary cost functions with high probability. The standard solvers require parameter optimization to obtain the optimal scaling.

A. Setup
For the experiments, we use the parameter-free ParallelTempering (v1.0) solver. The best values for the temperatures, the number of sweeps, and the number of replicas are calculated internally and are customized for each submitted problem individually. The only parameter to set is timeout, which is the time spent in the core solver loop (in seconds). Note that timeout does not include the time spent by the solver to calculate the parameters that are used during the annealing process. The total time the solver needs to solve the problem is referred to as runtime. The advantage of using a parameter-free solver is that no tuning experiment is necessary. The disadvantage is that the runtime we measure includes both the time to calculate the parameters and the time to solve the problem. At the time of running the experiment, the parameters calculated by the solver were not returned to the user in the current implementation. As such, we cannot list them in this work.

The benchmark experiment consists of solving 30 random instances for each system size and locality, as well as their respective 2-local reductions (see Tables I and II for details). For each of these instances, we perform 30 runs to gather statistics. We set timeout = 100. In cases where this is not enough time to find the ground-state energy, we increase timeout to 500.

B. Metrics
The primary objective of our benchmark experiment is to quantify how the computational effort of solving a problem scales as the size of the problem input increases. The common approach is to measure the time to solution (TTS). We calculate the TTS following the approach defined in Ref. [15]:

  TTS = τ R,   (2)

where R is the number of runs required to find the ground-state energy with a probability of 0.99 and τ is the time it takes to run the algorithm once (i.e., the solver output runtime). Measuring the TTS requires the algorithm to find the ground-state energy of each problem in a sufficient fraction of the successive runs performed. When it is not possible to measure the TTS, because the ground-state energy cannot be determined sufficiently often, we measure other performance metrics, such as the fraction of solved problems and the residual energies, both defined below.
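The TTS of Eq. (2) can be sketched as follows. Here we assume (our notation, not the paper's) that the per-run success probability θ is estimated empirically, and R is the number of independent runs needed to hit the ground state at least once with the target probability:

```python
import math

def time_to_solution(tau, theta, p_target=0.99):
    """TTS = tau * R, with R derived from the per-run success
    probability theta so that R runs find the ground state at least
    once with probability p_target."""
    if theta <= 0.0:
        return math.inf  # ground state never found
    if theta >= 1.0:
        return tau       # every run succeeds
    runs = math.ceil(math.log(1 - p_target) / math.log(1 - theta))
    return tau * max(runs, 1)
```

For example, a solver that finds the ground state in half of its runs (θ = 0.5) needs 7 runs to reach the 0.99 target, so its TTS is 7τ.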
FIG. 1: TTS mean value and standard deviation for k-local problems with k = 3 and k = 4 using the parallel tempering solver.

The fraction of solved problems is defined as the number of runs for which the ground-state energy is found by the solver divided by the total number of runs. We have performed a total of 900 runs for each problem size and locality. The residual energy is calculated for each problem and each run in the following way:

  E_res = (E_GS − E_best)/E_GS,   (3)

where E_GS is the known planted ground-state energy of the problem and E_best is the best energy found by the algorithm. The values reported here are obtained by resampling the distribution of residuals over all problems and runs.

IV. RESULTS
Figure 1 shows the TTS for planted 3- and 4-local problems with a number of variables N ranging from 16 to 400 using the parallel tempering algorithm. Both problem types show a similar scaling. We have fit an exponential function of the form TTS = 10^(α + βN) to estimate the scaling exponent β for each locality; the two fits yield similar exponents, consistent with the similar scaling visible in Fig. 1. The fraction of solved runs is 1 for all sizes of both 3- and 4-local problems. However, it is not possible to calculate the TTS for the 2-local reductions of either the 3- or 4-local problems. Figure 2 shows the fraction of solved problems (left panel) and the residual energies (right panel). The 2-local problems derived from the 4-local instances seem to be computationally harder to solve than the ones generated from the 3-local problems. We surmise that the higher the locality, the harder it is to solve the 2-local reductions. The benchmark experiment has been performed with two different values of the parameter timeout; however, increasing the timeout does not improve the quality of the results.
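The exponential fit amounts to a linear least-squares fit of log10(TTS) against N. A sketch with synthetic placeholder data (not our measured values):

```python
import numpy as np

# Fit TTS = 10**(alpha + beta*N) by regressing log10(TTS) on N.
N = np.array([16, 64, 144, 256, 400], dtype=float)
tts = 10.0 ** (0.5 + 0.004 * N)                 # synthetic TTS values (s)
beta, alpha = np.polyfit(N, np.log10(tts), 1)   # slope, then intercept
```

Note that numpy.polyfit returns the highest-order coefficient first, so the slope β precedes the intercept α.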
FIG. 2: Fraction solved (left) and residuals (right) of 2-local problems obtained by reducing k-local instances with k = 3 and k = 4. The dashed line represents the reference for the ideal case. The benchmark experiment has been performed with two different values of the parameter timeout. The data show that solving the 2-local versions of the problems is extremely difficult. In fact, we were unable to perform a scaling analysis, as the majority of the problems could not be solved.

V. DISCUSSION
The computational hardness of the 3- and 4-local instances is set in the planting tool Chook by a careful choice of the couplers from different disorder distributions with varying levels of frustration. The reduction to 2-local interactions in the Hamiltonian requires the introduction of auxiliary variables and penalty terms. The latter, in particular, change the frustration levels, and thereby the hardness of the problems. Tables I and II show the increase in the number of variables when reducing the problems to their 2-local versions. We observe an increase by a factor of approximately 3 for the 3-local problems, which grows to a factor of ∼6 when reducing the 4-local problems. Higher-order Hamiltonians will naturally require an even larger overhead.

Figure 3 compares the coupler distributions for the 3- and 4-local problems of different system sizes with their corresponding 2-local reductions. The histograms show that, while the distributions of the 3- and 4-local problems are quite similar (note that the x-axes in the plots on the left and right sides of the figure have different scales), the distributions of their 2-local reductions are significantly wider, in particular when the reduction starts from a higher polynomial degree. A more quantitative analysis of this effect is shown in Fig. 4, where we have calculated the standard deviation and the kurtosis of the coupler distributions. While the former increases by a sizable factor, the latter decreases when reducing the problems from k-local to 2-local. Having a large dynamic range in the coupler distributions of the reduced problems typically makes these harder to solve with physics-based solvers.

To corroborate the aforementioned observation that the problems become harder when their locality is reduced and, in turn, the coupler distributions have greater variance, we use population annealing Monte Carlo (PAMC) [16–21] to measure the entropic family size ρ_s. Similar to simulated annealing (SA) [22], population annealing is a sequential Markov chain Monte Carlo (MCMC) algorithm in which a population of "replicas" is slowly annealed toward a target low temperature. At each temperature, the population is reconfigured via a resampling process during which some replicas are multiplied or eliminated to maintain an equilibrium Gibbs distribution of energies. In a well-thermalized PAMC simulation, a sufficient number of the original replica families must survive. This can be quantified by the family entropy S_f:

  S_f = − Σ_i n_i log n_i,   (4)

where n_i is the fraction of the replicas in the i-th family and R is the total population size. The average family size in thermal equilibrium can then be obtained from

  ρ_s = lim_{R→∞} R / e^{S_f}.   (5)

Note that ρ_s is, by definition, an intensive quantity, and therefore independent of the population size R in the thermodynamic limit. In practice, ρ_s converges to its true value at a large but finite population. For all measurements of ρ_s, we ensured that such convergence was achieved unless the simulation timed out. Figure 5 shows ρ_s averaged over all instances generated for each system size. The 2-local reduction critically increases the hardness of the problems, especially for large system sizes. In particular, measuring ρ_s for the reduced version of 4-local problems was possible only for sizes N = 16 and N = 64. The problems were so hard that the simulation converged only partially for N = 144, and did not converge at all within the allocated time for larger problem sizes.

FIG. 3: Coupler distributions of k-local problems with k = 3 (left panel) and k = 4 (right panel) for different system sizes N, and their corresponding 2-local reductions. In the 3-local case the distributions are similar; however, weight is redistributed to the tails. In the 4-local case there is a sizable increase in the width of the distributions after the locality reduction.

Our results demonstrate the advantage of solving optimization problems in their original k-local formulation, and we expect this result to be independent of the choice of solver. Reference [5] shows that a simulated quantum annealing (SQA) algorithm has no advantage in solving a k-local formulation of a problem, instead of its 2-local reduction, for small problem sizes. In particular, the authors claim that tunnelling would allow the system to pass across the large energy barriers introduced by the reduction. Nevertheless, we would expect such barriers to become wider as the size of the problem increases [23], until no finite-range tunnelling can be beneficial during the optimization.

VI. CONCLUSIONS
We have generated problems with planted solutions having k-local interactions and reduced them to their corresponding 2-local versions, which are more amenable to current physics-inspired optimization tools than the original ones. The reduction has been performed using a customized version of a classic and extensively adopted quadratization algorithm. The computational time required by the reduction algorithm is known to scale polynomially with the size of the input and thus does not affect the overall exponential scaling found in current physics-inspired optimization methods. Using Azure Quantum's implementation of the ParallelTempering parameter-free algorithm, designed to handle problems of any locality, we have attempted to find optima for the native 3- and 4-local problems, as well as their 2-local reductions. All k-local problems with k = 3 and k = 4 have been solved to optimality during the allocated 100-second timeout. The TTS for 4-local problems is consistently larger than for the 3-local ones. In contrast, even after increasing the timeout to 500 seconds, the 2-local reductions could not be solved. It is common practice to apply locality reduction in order to accommodate higher-order polynomial unconstrained optimization problems on optimizers that natively handle only quadratic problems. Nevertheless, our results show that doing so should, ideally, be avoided. As such, investing in creating hardware and/or software to tackle higher-order problems should be prioritized.

FIG. 4: Kurtosis and standard deviation calculated from the coupler distributions of k-local problems with k = 3 and k = 4, and their corresponding 2-local reductions. While the kurtosis decreases, the standard deviation of the distributions increases noticeably, thus making the problems harder to solve. Both panels have the same horizontal axis.

FIG. 5: Entropic family size ρ_s, calculated using population annealing Monte Carlo for k-local problems with k = 3 and k = 4, and their corresponding 2-local reductions. The family size of the reduced version of k = 4 problems with N = 144 converged only partially, while for larger sizes the value did not converge during the allocated timeout. In all cases, a reduction in locality makes the problems computationally harder to solve.

Acknowledgments
We thank Marko Bucyk for his careful editing and reviewing of the manuscript. H. G. K. would like to thank David Poulin for inspiring discussions and dedicates this manuscript to him.

[1] G. De las Cuevas, W. Dür, M. Van den Nest, and H. J. Briegel, Completeness of classical spin models and universal quantum computation, J. Stat. Mech. P07001 (2009).
[2] G. De las Cuevas, W. Dür, H. J. Briegel, and M. A. Martin-Delgado, Mapping all classical spin models to a lattice gauge theory, New J. Phys. 12, 043014 (2010).
[3] R. S. Andrist, H. G. Katzgraber, H. Bombin, and M. A. Martin-Delgado, Tricolored lattice gauge theory with randomness: fault tolerance in topological color codes, New J. Phys. 13, 083006 (2011).
[4] A. Feldman, G. Provan, and A. Van Gemund, Approximate Model-Based Diagnosis Using Greedy Stochastic Search, Journal of Artificial Intelligence Research, 371–413 (2010).
[5] A. Perdomo-Ortiz, A. Feldman, A. Ozaeta, S. V. Isakov, Z. Zhu, B. O'Gorman, H. G. Katzgraber, A. Diedrich, H. Neven, J. de Kleer, et al., Readiness of Quantum Optimization Machines for Industrial Applications, Phys. Rev. Applied 12, 014004 (2019).
[6] M. Hernandez, A. Zaribafiyan, M. Aramon, and M. Naghibi, A Novel Graph-Based Approach for Determining Molecular Similarity (2016), arXiv:1601.06693.
[7] D. J. J. Marchand, M. Noori, A. Roberts, G. Rosenberg, B. Woods, U. Yildiz, M. Coons, D. Devore, and P. Margl, A Variable Neighbourhood Descent Heuristic for Conformational Search Using a Quantum Annealer, Sci. Rep. 9, 13708 (2019).
[8] Microsoft Quantum, Jij and Toyota Tsusho: reducing carbon emissions with Azure Quantum, https://cloudblogs.microsoft.com/quantum/2020/08/04/jij-toyota-azure-quantum-reducing-carbon-emissions/ .
[9] E. Boros and A. Gruber, On Quadratization of Pseudo-Boolean Functions (2014), arXiv:1404.6538.
[10] N. Dattani, Quadratization in discrete optimization and quantum mechanics (2019), arXiv:1901.04405.
[11] D. Perera, I. Akpabio, F. Hamze, S. Mandrà, N. Rose, M. Aramon, and H. G. Katzgraber, Chook – A comprehensive suite for generating binary optimization problems with planted solutions (2020), arXiv:2005.14344.
[12] I. Rosenberg, Reduction of bivalent maximization to the quadratic case, Cahiers du Centre d'Etudes de Recherche Operationnelle 17, 71 (1975).
[13] E. Boros and P. L. Hammer, Pseudo-Boolean optimization, Discrete Applied Mathematics 123, 155 (2002).
[14] 1QBit, Convert a HOBO to a QUBO, https://portal.1qbit-prod.com/docs/task/convert-hobo-to-a-qubo , accessed: 2020-12-09.
[15] M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber, Physics-Inspired Optimization for Quadratic Unconstrained Problems Using a Digital Annealer, Frontiers in Physics 7, 48 (2019), arXiv:1806.08815.
[16] K. Hukushima and Y. Iba, in The Monte Carlo Method in the Physical Sciences: Celebrating the 50th Anniversary of the Metropolis Algorithm, edited by J. E. Gubernatis (AIP, Los Alamos, New Mexico, USA, 2003), vol. 690, p. 200.
[17] J. Machta, Population annealing with weighted averages: A Monte Carlo method for rough free-energy landscapes, Phys. Rev. E 82, 026704 (2010).
[18] J. Machta and R. Ellis, Monte Carlo Methods for Rough Free Energy Landscapes: Population Annealing and Parallel Tempering, J. Stat. Phys. 144, 541 (2011).
[19] W. Wang, J. Machta, and H. G. Katzgraber, Population annealing: Theory and application in spin glasses, Phys. Rev. E 92, 063307 (2015).
[20] C. Amey and J. Machta, Analysis and optimization of population annealing, Phys. Rev. E 97, 033301 (2018).
[21] A. Barzegar, C. Pattison, W. Wang, and H. G. Katzgraber, Optimization of population annealing Monte Carlo for large-scale spin-glass simulations, Phys. Rev. E 98, 053308 (2018).
[22] S. Kirkpatrick, C. D. Gelatt, Jr., and M. Vecchi, Optimization by Simulated Annealing, Science 220, 671 (1983).
[23] S. Mandrà, Z. Zhu, W. Wang, A. Perdomo-Ortiz, and H. G. Katzgraber, Strengths and weaknesses of weak-strong cluster problems: A detailed overview of state-of-the-art classical heuristics versus quantum approaches, Phys. Rev. A 94, 022337 (2016).