Applications of Quantum Annealing in Statistics
AApplications of Quantum Annealing inStatistics
Robert C. Foster, Brian Weaver, Jim GattikerLos Alamos National LaboratoryNovember 21, 2019
Abstract
Quantum computation offers exciting new possibilities for statis-tics. This paper explores the use of the D-Wave machine, a specializedtype of quantum computer, which performs quantum annealing. Ageneral description of quantum annealing through the use of the D-Wave is given, along with technical issues to be encountered. Quantumannealing is used to perform maximum likelihood estimation, gener-ate an experimental design, and perform matrix inversion. Thoughthe results show that quantum computing is still at an early stagewhich is not yet superior to classical computation, there is promise forquantum computation in the future.
Keywords—
Quantum computing, Quantum annealing, D-Wave, Simu-lated Annealing, Maximum likelihood, Experimental Design, Matrix Inversion
LA-UR-19-23207 a r X i v : . [ s t a t . C O ] N ov Introduction
Quantum computing has arrived. Long imagined as a technology of the future,early quantum computers exist today which are being used to solve real problems.These computers, however, exist in what has been dubbed the noisy intermediate-scale quantum (NISQ) era (Preskill (2018)), with enough qubits that classicalcomputers can not simply simulate the machines, but with hardware issues intro-ducing noise into the computation and an inability to perform error-checking.This paper will focus on one particular type of quantum computer - the D-Wave quantum annealer. It must be stressed that this is not a universal quantumcomputer which can be programmed to run various types of quantum computingalgorithms, but rather a specialty quantum computer built to solve combinato-rial optimization problems using one type of quantum computing algorithm. Theadvantage of focusing on the D-Wave is two-fold: first, that quantum annealing us-ing the D-Wave is much simpler to perform and understand than algorithms whichrun on universal quantum computers, which often requires knowledge of quantummechanics to comprehend and whose programming requires direct implementationof quantum logic gates. Second, it is currently much more readily available foruse through companies and organizations such as the Universities Space ResearchAssociation (USRA). Quantum annealing may potentially provide a polynomialincrease in computing power depending on the problem at hand, though researchis ongoing.The purpose of this paper is not to give an in-depth discussion into theunderlying physics of quantum computation or hardware of the D-Wave quantumannealer, but rather to provide an introduction to quantum annealing and the D-Wave hardware in general, and suggest directions for future research. The originalpaper discussing the use of the Ising model for quantum annealing is Bian et al.(2010). An excellent overview of quantum annealing may be found in Biswaset al. (2017). Within the statistical literature, Wang et al. (2016) also provides athorough description of both the physics and hardware of the D-Wave machine,and derives statistical theory for tests comparing the results of D-Wave output toresults from classical algorithms. Section 2 gives a general description of quantumbits, focusing on the property needed to understand quantum annealing, and ageneral description of quantum annealing through the adiabatic theorem. Section3 describes technical issues which affect the performance of the quantum annealingalgorithm on the D-Wave. Sections 4, 5, 6 give examples of quantum annealingusing the D-Wave for maximum likelihood, experimental design generation, andmatrix inversion, respectively. Quantum Computation using the D-Wave
Before beginning a discussion of what a quantum computer is, it is important toclarify what it is not. A quantum computer is not simply a traditional computerthat operates with much more computational power. In fact, for most problemstraditional computation is faster. If all that existed were quantum computers,it would still be necessary to invent traditional computers in order to improveperformance for most applications. And a quantum computer is not simply amachine that operates on all answers simultaneously and produces the correctanswer as if by magic - its mathematics are much more subtle.
A quantum computer is a device which exploits properties of elementary particlesto perform linear algebra in the complex plane without the requirement of storingnumerical values in memory and performing computational operations on them.If a traditional bit may only take the values 0 and 1, define a qubit, short forquantum bit, to be a particle existing in the form q = c | (cid:105) + c | (cid:105) (1)where c , c ∈ C and | (cid:105) , | (cid:105) are, using standard bra-ket notation, orthonormalbasis vectors in the complex plane corresponding to physical properties of theparticle such as spin but nominally representing numerical values. The complexcoefficients c and c must have | c | + | c | = 1, and with this constraint the setof values that q may take is often represented as a sphere in complex space. Inthis way, the qubit manages to exist as a linear combination of the vectors | (cid:105) and | (cid:105) . The qubit is connected to the traditional bit by associating the pure | (cid:105) and pure | (cid:105) states with numerical values. These values may theoretically beanything, but most common are {− , } and { , } . For this paper, define 0 be thenumerical value associated with the superposition defined by c = 1 and c = 0,and likewise 1 as the numerical value associated with the superposition defined by c = 0 and c = 1. Given values of c and c before observation of the qubit, uponobservation the qubit q collapses into either 0 or 1 with probability P (0) = | c | and P (1) = | c | . Further properties of quantum mechanics are necessary for otherquantum computing algorithms but the superposition property is all that is neededfor quantum annealing. .2 Problem Format The D-Wave is not a universal quantum computer. It is strictly an optimizationmachine which solves one specific problem. The D-Wave in principle finds the setof q i that minimize the energy function Energy = (cid:88) i a i q i + (cid:88) j>i b ij q i q j (2)where the q i are binary variables that can fit into one of two forms: either q i ∈ {− , } , known as the Ising model, or q i ∈ { , } , known as the quadraticunconstrained binary optimization (QUBO) model. The a i and b ij can be any realnumber, though the physical properties of the D-Wave hardware impose minimumand maximum values for each and rescaling must be applied in the case that a i and b ij values input by the user are outside this range. For a system with n q qubits, there are 2 n q possible solutions. This exponential scaling of possible solu-tions means the computation required to find the minimum energy of systems asin Equation (2) also scales exponentially as the problem size increases.The D-Wave operates natively using the Ising model, and any QUBO mod-els input to the machine are converted to Ising models before running. This paperwill work primarily with the QUBO model, as it is much more natural for theproblems considered. A QUBO problem can also be written in the form Energy = x (cid:48) Qx (3)where x is an n q length vector consisting of the q i values and Q is an upper-triangular matrix consisting of the a i values along the diagonals and the b ij valuesin the upper triangle. Each of the two models, Ising and QUBO, may be convertedback and forth to each other using linear transformations, potentially including lin-ear offsets, and D-Wave provides automated software to perform this. Formattinga problem to run in QUBO form on the D-Wave means determining a matrix Q as in Equation (3).Each system as in Equation (2) can also be thought of a graphical networkbetween the n q qubits, wherein each qubit is assigned weight a i to itself andany interactions between qubits are assigned weight b ij . This visualization as agraphical network is important to the operation of the D-Wave machine as thegraph must be mapped onto the physical hardware, which will be discussed inSection 3. A three-qubit system and corresponding QUBO matrix are shown inFigure 1. a i and eachpair of qubits is connected with interaction b ij . Each qubit may take one oftwo values: either q i ∈ {− , } , known as the Ising model, or q i ∈ { , } ,known as the QUBO model. The energy of the system is given by pluggingthe q i values into Equation (2). When the q i follow the QUBO model, thesystem can be represented by the upper-triangular matrix on the right, andthe energy calculated by Equation (3). Systems in the form of Equation (2) are commonly optimized by techniques such assimulated annealing which are able to both climb and descend peaks in the energysurface in order to avoid becoming trapped in local optima. In the worst case,however, they still may require an enumeration of all 2 n q possible combinationsof qubits in order to find the lowest-energy solution. The quantum annealingprocedure, in theory, allows for direct estimation of the minimum energy statewithout traversal of the entire energy surface. Maintaining quantum coherenceof the qubit in a linear combination of 0 and 1 states, as in Equation (1), allowsfor a phenomenon called quantum tunneling, in which the solution travels directlythrough an energy barrier between local minima without the “backing-out” of alocal optima that a procedure like simulated annealing performs.Figure 2 demonstrates the manner in which a solution moves through theoptima in the energy surface in order to find the global optima. In theory, quantumannealing performs best on energy surfaces with tall, narrow peaks that imposesubstantial difficulties on hill-climbing techniques, but require only a short tunnelfor the quantum anneal. In practice, such problems are hard to find. Attemptsto construct problems which show supremacy of the quantum annealer over clas-sical techniques, have so far been focused on artificially constructed problems, as escribed in Mandr`a et al. (2017).The D-Wave performs this quantum annealing through the use of the adia-batic theorem. The adiabatic theorem, stated simply, says that if a system beginsin the minimum energy state of a Hamiltonian, where a Hamiltonian is a system ofqubits, weights, and interactions as in Equation (2) and Figure 1, and transitionsto a new Hamiltonian slowly enough, then the system remains in the lowest-energystate the entire time. This is, of course, a vast simplification of the procedure. Amore complete technical description, including the underlying physics, may befound in Biswas et al. (2017).Beginning with a predetermined Hamiltonian H in which the lowest-energy state is easily found, the system transitions to the user-input Hamiltonian H through the Equation H ( s ) = A ( s ) H + B ( s ) H (4)where s transitions smoothly from 0 to 1, and A ( s ) are chosen such that A (0) > B (1) = 0 at the beginning of the anneal, while A (1) = 0 and B (1) > s from 0 to 1, called the annealing schedule, the user candetermine the amount time it takes to complete the anneal in Equation (4). Theminimum annealing time is 10 µs . Whereas in simulated annealing the energysystem remains constant and bits are changed in order to explore it, in quantumannealing the energy system is changed around the qubits and physics forces themto adapt in order to remain in the lowest-energy state.How slowly is slowly enough? The system is only guaranteed to arrive atthe lowest-energy state in the case of an anneal of infinite length time. In general,the necessary anneal time increases with the complexity of the energy surface. Therequired annealing time is determined by the energy gap between the lowest-energysolution and the second lowest-energy solution at any point during the annealprocess, at which the qubits are likely to tunnel from the optimal solution to anon-optimal solution. It is currently unknown whether or not this gap decreasesexponentially as problem size grows, though if the system could be run at zerotemperature it is known that this gap would decrease polynomially. A polynomialdecrease in computational time, however, would still yield a sizeable increase incomputational power, taking an algorithm which runs in cubic time to square timeor lower, for example, or an algorithm which runs in linear time to square root timeas with Grover’s search algorithm (Grover (1996)). Some evidence suggests thatthe temperature may need to drop at minimum in a logarithmic rate, or possiblyat a power rate, as the problem size increases (Albash et al. (2017)).Because the system is not guaranteed to stay in the lowest-energy statefor a given anneal, it is common to perform a large number of anneals and return A ( s ) and B ( s ) are as in Equation (4). At s = 0, A ( s ) > B ( s ) = 0, so the D-Wave is set entirely to system H which has an easilylocated lowest-energy state. At s = 1, A ( s ) = 0 and B ( s ) >
0, so theD-Wave is set entirely to the user-input system H as in Equation (2). Ifthe annealing is done slowly enough, the adiabatic theorem states that thesystem will stay in the lowest-energy configuration of q i the entire time. Theuser has control over the annealing time and other aspects of the anneal.Figure originally from Katzgraber et al. (2015).8 he set of q i and energy for each. The default number of anneals is currently 1000but can be increased to up to 10000 by the user. The machine will usually returnsome low-energy solution for each anneal, and so over a large number of samplesthere is a probability of returning the lowest-energy solution which should be highfor well-formed problems. The user then takes the solution with the lowest-energystate as the final result. It is entirely possible, however, that the machine will failto find the lowest-energy solution. Because of this, users should not attempt touse the D-Wave to solve problems in which obtaining only a good solution ratherthan the best solution is disastrous.Bian et al. (2010) notes that under theoretically optimal operating condi-tions, the distribution of outcomes can be modeled as a Boltzmann distribution. P ( { q , q , . . . } ) ∝ exp (cid:18) (cid:80) i a i q i + (cid:80) j>i b ij q i q j τ (cid:19) (5)In Equation (5), the temperature parameter τ can be thought of as correspondingto physical thermal effects present in the hardware due to the machine running atnonzero temperature. A number of estimators for the value of τ have been devel-oped, as discussed in Raymond et al. (2016). Operating conditions are not optimal,however, and empirical output from the machine shows significant deviations fromthe model in Equation (5). The D-Wave quantum computer is, to put it mildly, a controversial machine withinthe quantum computing community, with some questioning how “quantum” themachine really is (Shin et al. (2014)). All types of quantum computing rely uponmaintaining the coherence of quantum particles as in Equation (1), and the length-ening time until decoherence continues to prove a formidable problem. The deco-herence time of qubits in the D-Wave is reported to be on the order of nanoseconds.The annealing time is at a minimum of 1 µs , which is orders of magnitude longerthan the decoherence time. The initial question regarding the D-Wave hardware,then, was whether the machine was performing any sort of quantum computationat all. This was answered affirmatively in Johnson et al. (2011), which comparedthe results of quantum annealing using the D-Wave to classical techniques andshowed that the D-Wave output exhibited characteristics that could only have oc-curred with quantum tunneling. The next question was whether any problem couldbe formulated, even artificially, which showed quantum supremacy over the bestclassical algorithms. Though quantum supremacy as a concept is still ill-defined(Ronnow et al. (2014)), attempts are currently being made to obtain it using theD-Wave (though not without controversy) as in Mandr`a et al. (2017). The final uestion remains whether any quantum supremacy the D-Wave shows on the Isingspin energy minimizations is because of its quantum computational nature, or be-cause it is a machine specifically built for solving Ising spin energy minimizations.All hope is not lost, however - future improvements of the machine may allow fordemonstration of quantum supremacy strictly due to quantum effects, and sincethe D-Wave is known to perform quantum tunneling, however briefly, it is stillworthwhile to explore the computational model. All computers, quantum or otherwise, must map a problem into a format thecomputer can understand. In the classical realm, this is using binary floatingpoint arithmetic to represent numbers as bits, for example. These issues appearedcomplex in early classical computing, but the effect of these classical technicalissues has been reduced to a level that it can be safely ignored for the vast majorityof calculations. Quantum computers exhibit technical issues analogous to thosefaced by early classical computers, however, and as such are currently both limitedin the problems they can represent and noisy in the ways they solve these problems.
As previously mentioned, the range of values allowed for the a i and b ij in Equation(2) is limited in the hardware implementation of the algorithm. On the D-Wave2000Q, these values are a i ∈ [ − . , .
0] and b ij ∈ [ − . , . a i and b ij values which are reasonable in theQUBO model do not become wildly out of range of the hardware in the resultingIsing model. If there are a i or b ij values outside of this range, the entire set ofcoefficients must be rescaled so that all the a i and b ij fit within the range of thehardware.A rescaling of the coefficients would not be a problem, being equivalent tomultiplying both sides of Equation (2) by a constant, except that the values of a i and b ij are themselves subject to noise. This noise is due to both imperfections inthe hardware and thermal fluctuations which affect the physical implementationof the annealing process. A model for the noisy coefficients is a ∗ i = a i + (cid:15)b ∗ ij = b ij + δ (6) here (cid:15) and δ are random variables, ideally independent between qubits, but ev-idence of both temporal and spatial correlation of errors exists (Michalak andPicard (2017)). These have been commonly modeled as mean-zero Gaussian ran-dom variables, though Perdomo-Ortiz et al. (2016) gives empirical evidence of bias,and also offers a method of bias correction. Intuitively, the values of a i and b ij which are smallest in magnitude are those most likely to be affected by the noise.In general, these errors can not be avoided. Decreasing the standard deviationsof the noise components will likely involve lowering the operating temperature ofthe D-Wave machine (Albash et al. (2017)) - not an easy task, as it already runsat 0 .
015 degrees Kelvin. The power draw necessary for this cooling is claimed tobe less than 25 kW, far less than the US Department of Energy’s stated goal of20-30 MW power for an exascale supercomputer (King et al. (2017)).These errors can both significantly alter the system so that the lowest-energy state is no longer the one desired by the user and degrade the performance ofquantum annealing so that it shows no speedup over classical algorithms (Venturelliet al. (2015)). These errors also prevent the D-Wave from being used for truerandom number generation, as even if the zero model is used with all a i = b ij = 0so that every set of q i is theoretically equally likely, the results of the qubit spinsare still not independent Bernoulli random variables. Research is ongoing intomodels for the D-Wave output which incorporates noise. Coffrin (2019) gives animproved statistical model over the Boltzmann sampler of Equation (5) for theoutput from the D-Wave as a hierarchical model wherein the (cid:15) in Equation (6)are modeled as mean zero Gaussians, and gives evidence that this model providesa closer match of output from the D-Wave hardware to theoretical probabilities.This is only an approximate model, however, and there are still sources of noisebeyond errors in coefficients which causes the output from the D-Wave to deviatefrom the noisy Boltzmann sampler model. A further technical issue is embedding. Though the model has so far been pre-sented as a theoretical graphical network of qubits, this theoretical graph must beembedded onto the physical hardware of the D-Wave. The graph configurationchosen for the D-Wave hardware is called a Chimera structure, in which qubitsare arranged in sparsely connected groups of at most six other qubits, called unitcells. A sample unit cell and the connected Chimera structure is shown in Figure4. The Chimera structure precludes the simple three-qubit system of Figure1 from being solved directly on the D-Wave, as there is no set of three qubits allconnected to each other on the Chimera structure. The graph must be embedded onto the Chimera structure to find the minimum energy of this system. To dothis, multiple qubits must be chained together and the chain treated as if it werea single qubit. This is called a chain, and an embedding of the three-qubit systemusing four qubits and one chain is shown in Figure 5. To chain qubits together, theproblem must be first converted to Ising form if it is not already. In Ising form, q i q j = 1 when q i = q j and q i q j = − q i (cid:54) = q j . Then in place of an interactionterm b ij , a single chain strength c is chosen for all connections between chainedqubits such that, ideally, all low-energy solutions have qubits in a chain return thesame value. It is possible to use multiple long chains to map complex graphicalstructures onto the Chimera structure. On the D-Wave machine, all chains mustuse the same chain strength c . D-Wave offers automated tools to find and performan embedding and unembed the results automatically, currently available in C,Python, and Matlab.The value of c should be negative so that the total energy of the system ismade smaller by having unbroken chains with q i = q j for all i, j in a chain. Withinthis constraint, it is clear that the magnitude of the chain strength c must be chosencarefully. If c is chosen too low in magnitude, then chains will be returned broken,with q i (cid:54) = q j . If c is chosen too high in magnitude, then unnecessary rescaling willcause the effects of noise introduced into the coefficients of the graphical network to c . The qubits q and q are treatedas a single qubits and the value of c should be chosen such that all low-energysolutions return q = q . Using chains, large and complex graph structurescan be embedded onto the Chimera structure used by the D-Wave hardware.13 egrade performance. Research has been performed on selecting an optimal choicefor c , as in Rieffel et al. (2015), but at the moment there is no solid criterion forchoosing a value. It may be necessary to use the quantum annealer with multiplevalues of the chain strength c in order to determine which is optimal for a givenproblem.Embedding reduces the size of problems available to solve on the D-Wave,as a problem which takes tens of qubits in the theoretical graphical model mayrequire hundreds or thousands of qubits when embedded on the Chimera struc-ture. The necessity of embedding also magnifies the influence of the errors on thecoefficients given in Equation (6), adding hundreds or thousands of errors to thecalculation. In general, reducing the size of the embedding will improve perfor-mance. Running a program on the D-Wave follows these steps:1. Formulate the problem as a QUBO or Ising model, as in Equation (2).2. Embed the graphical structure of the problem onto the Chimera graph inFigure 4. If the original problem is in QUBO form, this will necessarilyinvolve a conversion to the Ising model. This can be done either by hand orusing the automated tools provided by D-Wave.3. Specify any parameters of the anneal, such as the number of reads, annealtime, or other more advanced processing options.4. Perform the quantum anneal and obtain the results.5. Unembed the results from the Chimera structure to obtain a solution andenergy for each read of the anneal. Analyzing them may involve simplychoosing the solution returned with the lowest energy or performing someanalysis over all returned solutions.Keeping all a i and b ij coefficients within hardware bounds and minimizingembedding generally improves performance of the quantum anneal. This paper willnot focus on doing these things, instead using the machine as it is likely to be usedby non-specialists: by formulating a problem into QUBO or Ising form, letting theautomated tools provided by D-Wave perform the embedding and unembeddingof the problem, and analyzing results. The goal is to use the hardware to solvepractical optimization problems, many of which exist in statistics. To be clear,quantum computing should not be expected to outperform classical computing for ears to come. The purpose is to develop algorithms and strategies for the use ofquantum computers so that, when hardware developments do allow for quantumsupremacy, statisticians are prepared to make full use of the technology. One of the most common optimization problems in statistics is maximum likeli-hood estimation. Though the D-Wave has been used to solve, for example, linearleast squares problems in Borle and Lomonaco (2019) and systems of polynomialequations in Chang et al. (2019), most likelihood functions are non-polynomial.This section provides an algorithm which is, as far as the authors are aware, thefirst general purpose quantum annealing algorithm which can be applied to opti-mization of continuous functions.Suppose that independent and identically distributed data x d for d =1 , ..., n is available from some two-parameter distribution f ( x d | θ, φ ). The max-imum likelihood estimates are then(ˆ θ, ˆ φ ) = arg max ( θ,φ ) (cid:88) d (cid:96) ( θ, φ | x d ) (7)where (cid:80) d (cid:96) ( θ, φ | x d ) is the standard log-likelihood.In order to use the D-Wave, the log-likelihood function must be put in theform of Equation (2). Begin by writing the parameters θ and φ in binary. Supposethat a total of n q qubits in QUBO form are available for use, indexed by q i for i = 1 , , . . . , n q . Let the subscript θ on the index indicate that the qubit belongsto set of qubits which represent binary digits of θ , and likewise the subscript φ onthe index as the set of qubits which represent powers of φ . The conversion backinto decimal form is then θ = (cid:88) i p iθ q i θ φ = (cid:88) i p iφ q i φ (8)where p i θ and p i φ are the powers of 2 to be used for the calculation of eachparameter. For example, using five qubits for the calculation of θ with powers p i θ = { , , − , − , − } allows for θ to be any value between 0 and 3.875 with anumerical resolution of 0.125. This binary representation of numbers using qubitshas previously been introduced in O’Malley and Vesselinov (2016).Note that Equation (8) forces θ > φ >
0. It is easy to extend this tonegative numbers by introducing a sign qubit as in Borle and Lomonaco (2019); owever, this reduces the numerical resolution of the calculations. To maximizenumerical resolution, only positive solutions were considered. In general, theremay be information about the problem or the solution that can allow the rangeof powers of two in Equation (8) to be reduced, increasing the accuracy of thecalculations.Given the single datum x d , expand the log-likelihood (cid:96) ( θ, φ | x d ) aroundpoints θ and φ using a two-term Taylor series expansion. (cid:96) ( θ, φ | x d ) ≈ (cid:96) ( θ , φ | x d ) + (cid:96) θ ( θ , φ | x d )( θ − θ ) + (cid:96) φ ( θ , φ | x d )( φ − φ )+ 12 [ (cid:96) θθ ( θ , φ | x d )( θ − θ ) + 2 (cid:96) θφ ( θ , φ | x d )( θ − θ )( φ − φ )+ (cid:96) φφ ( θ , φ | x d )( φ − φ ) ] (9)where subscripts on the log-likelihood (cid:96) represent partial derivatives with respectto the notated parameter or parameters in the order given. A maximum power oftwo is chosen for the Taylor expansion because higher powers induce three-qubitinteractions c ijk q i q j q k or higher, which are not supported by the D-Wave hardware.Then plugging the representations in Equation (8) into Equation (9) wher-ever possible, expanding the polynomials, and collecting terms (with q i = q i since q i ∈ { , } ), the following Equations are obtained: (cid:96) ( θ, φ | x d ) ≈ (cid:88) i a i q i + (cid:88) j>i b ij q i q j where a i = − n (cid:88) d =1 [2 p iθ (cid:96) θ ( θ , φ | x d ) + 2 p iθ − (cid:96) θθ ( θ , φ | x d ) − p iθ θ (cid:96) θθ ( θ , φ | x d ) − p iθ φ (cid:96) θφ ( θ , φ )] i is a qubit of θ − n (cid:88) d =1 [2 p iφ (cid:96) φ ( θ , φ | x d ) + 2 p iφ − (cid:96) φφ ( θ , φ | x d ) − p iφ φ (cid:96) φφ ( θ , φ | x d ) − p iφ θ (cid:96) θφ ( θ , φ )] i is a qubit of φ (10) ij = − n (cid:88) d =1 (cid:2) p iθ + p jθ (cid:96) θθ ( θ , φ | x d ) (cid:3) i, j are both qubits of θ − n (cid:88) d =1 (cid:104) p iφ + p jφ (cid:96) φφ ( θ , φ | x d ) (cid:105) i, j are both qubits of φ − n (cid:88) d =1 (cid:104) p iθ + p jφ (cid:96) θφ ( θ , φ | x d ) (cid:105) i, j are qubits of different parameters(11)The set of qubits for the lowest-energy solution returned by the quantumannealer can be converted into numerical estimates ˆ θ and ˆ φ using Equation (8).Unfortunately, these ˆ θ and ˆ φ maximize only the approximated likelihood con-structed using a two-term Taylor series expansion in Equation (9), not the fulllikelihood function in Equation (7). Because of this, the procedure is iterated bytaking the ˆ θ and ˆ φ returned by the quantum anneal, expanding the Taylor seriesaround those values, and finding a new ˆ θ and ˆ φ values which maximize this func-tion using a new quantum anneal. For most likelihood functions, which are wellbehaved, the values ˆ θ and ˆ φ will converge to the maximum likelihood estimatesin Equation (7). At the maximum likelihood estimates, expanding a Taylor seriesaround ˆ θ and ˆ φ and then maximizing it will return those same ˆ θ and ˆ φ values.The ideas behind this iterative process are not new and not unique to quantum an-nealing, using the same principles as Newton and Raphson’s original optimizationmethod. A summary of the iterative process is given below.1. Choose initial values ( θ , φ ) such that (cid:96) ( θ , φ | x d ) and the first two deriva-tives at each point exist and are finite.2. Expand the two-term Taylor series for (cid:96) ( θ , φ | x d ) around ( θ , φ ), as inEquation (9).3. Find lowest-energy values (ˆ θ, ˆ φ ) using the quantum annealer with Equations(10) and (11) and binary representations in Equation (8).4. Take (ˆ θ, ˆ φ ) as new expansion points ( θ , φ ).5. Repeat steps 2-4 until some stopping criterion is met.Because of the noise inherent in the system and the approximate natureof quantum annealing, a stopping criterion based on a tolerance level will fail toconverge unless set large. From Equations (10) and (11), the values a i and b ij values which are smallest in magnitude are those of the least significant qubits due o the influence of the 2 p i terms. These are precisely the ones which contribute theleast to the total energy of the system and that will be most affected by the noisein the model inputs, as described in Equation (6). It is unlikely that the methodwill fix upon the exact values which minimize the energy of the system, but willvary randomly in the least significant qubits. A simple test case for this algorithm is to estimate the parameters of a N ( θ, φ )distribution from a random sample of data. Using parameters θ = 0 . φ = 1, arandom sample x d = [ − . , − . , − . , . , . , . , . , . , . , . θ = ¯ x = 0 . φ ≈ . c = − .
0. Eightqubits were used for each variable, with powers p i θ = p i φ = { , , − , − , − , − , − , − , − } , for sixteen qubits total. Following Equations (10) and (11), eachqubit interacts with all other qubits, forming a complete K graph for the prob-lem. This graph, and its resulting embedding onto the Chimera structure, areshown in Figure 6. The Chimera network is efficient at embedding this completegraph, requiring approximately only 90 qubits total.Starting values were θ = 0 and φ = 1. The minima and energy at eachiteration are given in Table 1. The chosen powers of 2 give a maximum numericalresolution of 2 − = 0 . There is much work to do in the general use of quantum annealing for maxi-mum likelihood optimization, and this method is only a step. Constructing andmaximizing a quadratic surface at each iteration leads only to the nearest localminima rather than the global minima, in many ways defeating the purpose ofquantum annealing. One possible solution to this issue is including more termsin the Taylor expansion of Equation (9). This will create interactions of three ormore qubits, but these can be handled by two-qubit interactions combined withpenalty functions. In fact, through the use of penalties to represent higher-order K n q graph. On the left, a complete K graphrepresenting all the connections between the qubits of θ and φ in the ex-ample. On the right, the embedding of the K graph onto the Chimerahardware of the D-Wave. The Chimera graph is well-suited for this completegraphical network. qubit interactions, any function which has a Taylor series representation has atheoretical corresponding QUBO representation, though such representation willnecessarily be truncated in practice. A potential approach is to attempt to es-timate said QUBO representation using the function itself as an objective, as inmachine learning. Lastly, the method here is for two parameters, but it may beexpanded to any number of parameters given the appropriate dimension Taylorseries expansion and algebraic manipulations. All of these improvements requiremore qubits or the ability to form larger networks between them. Though notpossible on current hardware, they may be of use on quantum annealers of thefuture. A more realistic use of quantum annealing is for problems which currently rely ontechniques such as simulated annealing or genetic algorithms, such as the case ofexperimental design. Finding an optimal design generally scales exponentially withthe size or dimensionality of the problem, and quantum annealing may provide apolynomial increase in computational speed for high dimensional problems. θ ˆ φ Energy (cid:96) ( θ, φ | x d )1 0.5078125 0.9765625 -423.439 -15.2182 0.5 1.0625 -422.93 -15.0893 0.515625 1.0859375 -420.865 -15.0824 0.5 1.09375 -420.434 -15.0805 0.5 1.0859375 -420.283 -15.0816 0.4765625 1.09375 -420.42 -15.0827 0.53125 1.09375 -420.263 -15.0848 0.484375 1.09375 -420.308 -15.0819 0.5 1.09375 -420.272 -15.08010 0.5 1.0859375 -420.283 -15.081Table 1: Ten iterations of the D-Wave quantum annealer, with starting values θ = 0 and φ = 1. Eight qubits were used for each parameter, sixteen total,with a maximum numerical resolution of 2 − = 0 . θ and ˆ φ from the previous iteration was maximized using the quantum annealer. Theresults show that the quantum annealing algorithm is performing well on thisdata set. The energy is not simply a function of the log-likelihood (cid:96) ( θ, φ | x d )due to both the the necessity of using a two-term Taylor series approximationof the log-likelihood and embedding introducing many more qubits into theannealing procedure, each of which contributes to the energy total. Suppose an N × N Latin hypercube is desired. Current methods for findinga space-filling Latin hypercube use simulated annealing to find the design whichis maximin, as in Morris and Mitchell (1995).
Suppose that instead of a maximin or some other distance-based criterion, a designis desired for which there is one observation within each row, column, and diagonalof the design matrix. The row and column requirements force a Latin hypercubewhile the diagonal requirements force some degree of space-filling, or may be de-sired for its own properties depending on the particular problem at hand. Thisis the classic N -queens problem of placing N queens on an N × N chessboardsuch that no queen is directly attacking another queen. Though found in manyreferences, Mandziuk and Macuk (1992) gives a more thorough description of theproblem including an equation for the energy in the form of Equation (2). An example solution to the 8-queens problem and corresponding Latin hypercube isshown in Figure 7.Each square on the grid can be represented in QUBO form by a singlequbit, q i , i = 1 , , . . . , N , where q i = 0 represents no design point in the squareand q i = 1 represents a design point in the square. The particular labeling isnot important, so long as qubits can consistently be identified as belonging to thesame row, column, and diagonal. An example layout of a 4 × q i and q j represent two qubits such that i (cid:54) = j , the energy of thesystem in the form of Equation (2) is a i = − b ij = i, j in the same row2 for i, j in the same column1 for i, j in the same forward diagonal1 for i, j in the same backward diagonal0 otherwise (12)The minimum possible energy for this system is Energy = − N , reached at thesolution where no two qubits in the same row, column, or diagonal are both setto q i = 1, as in Figure 7. Note that this system is not unique in defining theproblem, as different choices of a i and b ij in Equation (12) will still yield the N-queens solution as the lowest-energy design but with differing energies for other q q q q q q q q q q q q q q q Figure 8: Layout of qubits used for generation of a Latin hypercube on aregular 4 x 4 grid. The particular layout of qubits is not important, so longas it is possible to identify which qubits belong to the same row, column, anddiagonal. designs. The particlar values in Equation (12) penalize violation of the Latinhypercube criterion moreso than violation of the diagonal criterion and were chosenin an attempt to mitigate the risk of failing to obtain the lowest-energy design.Manipulating the distribution of energies in order to emphasize certain classes ofdesigns while still maintaining the preferred design as lowest-energy is perhaps aninteresting area for further research.
The D-Wave 2000Q minimized the energy function in Equation (12) for multiplesized Latin hypercubes. Unfortunately, this set of connections does not readily lenditself to embedding on the Chimera graph as strongly as the complete K graphof Figure 6. Using the embedding software provided by D-Wave and the D-Wave2000Q computer, embedding the 36 qubits of a 6 x 6 hypercube uses approximately400 total qubits, while embedding the 64 qubits of an 8 × c = − . µs anneal and 10000 reads, while using the same parameters for an 8x 8 hypercube did not yield a good solution - the lowest-energy solution returnedfailed to place eight total observations. Both are shown in Figure 9. This analysiswas repeated using a variety of chain strengths and annealing times, but there didnot appear to be a significant improvement in performance.Solutions to the N -queens problem are not unique. There are multiple hypercubes which achieve the minimum energy of − N , many of which are simplerotations or reflections of existing hypercubes. Due to noise in the hardware asin Equation (6), however, the D-Wave will tend towards one particular solutionabove all others. This tended to be one particular solution within a set of annealsbut different solutions between different sets anneals, possibly due to the effect oftemporal correlation in the noise.In theory, the weights given in Equation (12) can be rescaled to induce alarge difference between the lowest and second-lowest energy solution, so that thesystem may more easily remain in the lowest-energy state throughout the entireanneal. In practice, the rescaling of coefficients performed by the D-Wave preventsthis. Furthermore, the large number of qubits required increases the number ofpoints at which errors introduced through thermal or chaining effects may affect thefinal result. Allowing for a wider range of input values, or allowing for additionalconnections between qubits, may help increase the potential problem size. Matrix Inversion
General models similar to the form of Equation (2) exist, and a number of problemshave been already been placed into said formats in a manner that is easy toconvert to the QUBO or Ising model of the D-Wave. For example, Jang et al.(1988) presents matrix optimization using a Hopfield network, a form of recurrentartificial neural network strikingly similar to that of the Ising-spin model given inEquation (2), and which is easily adapted for use in quantum annealing. Matrixinversion is of particular important in statistics, as it often serves as the bottleneckin fitting statistical models such as Gaussian process regressions. Any method thatwould decrease computational time at this bottleneck would greatly increase theapplicability of such techniques.It should be noted that this is not the only approach to using the D-Wavemachine to perform matrix inversion. Rogers and Singleton Jr. (2019) presentsa similar method for inverting matrices using the D-Wave hardware, though ap-proaching the problem as solving a set of linear Equations rather than the Hopfieldnetwork of Jang et al. (1988). Other matrix include nonnegative binary matrixfactorization in O’Malley et al. (2018).
Define A as an n × n matrix and V as its corresponding inverse, respectively. Thenfrom Jang et al. (1988), column k of V may be found individually by minimizingthe energy Equation E k = (cid:32) n (cid:88) (cid:96) =1 A (cid:96) V (cid:96)k (cid:33) + . . . + (cid:32) n (cid:88) (cid:96) =1 A k(cid:96) V (cid:96)k − (cid:33) + (cid:32) n (cid:88) (cid:96) =1 A n(cid:96) V (cid:96)k (cid:33) (13)Writing the elements V rk as sums of powers of two times qubits in QUBOform, similarly to Equation (8), gives V rk = n rk (cid:88) (cid:96) =1 p (cid:96)rk q p (cid:96)rk = 2 p rk q rk + . . . + 2 p nrk q n rk (14)where n rk is the number of qubits used in the binary representation of V rk andonce again, (cid:96) rk indicates that the qubit represents a power of V rk . Expanding andcollecting terms, the energy Equation (13) for column k of V can be rewritten inthe form of Equation (2) as i = 2 p irk α r − p irk +1 A kr i is a qubit of V rk b ij = (cid:40) p irk + p jrk +1 α r i, j are qubits of the same V rk p irk + p jrk +1 β r i ,r j i, j are qubits of different V rk where the quantities α r and β r ,r are given by α r = n (cid:88) (cid:96) =1 A (cid:96)r β r ,r = n (cid:88) (cid:96) =1 A (cid:96)r A (cid:96)r The quantities α i and β ij only need to be calculated once, before the actualquantum anneal. In fact, nearly the entire matrix Q (as in Equation (3)) may beprecomputed, as only the subtraction of the A kr elements in the calculation of the a i differs between minimizations for different columns k . Unfortunately, creatingthe entire set of β ij requires calculating n ( n −
1) objects each requiring a sumof n terms, precluding this method from improving on the fastest classical matrixinversion algorithms. This time could be reduced by parallel computation, bothin classical calculation of the α i and β ij and, supposing one had access to multiplequantum annealers, minimization of the energy function in Equation (13). This method was tested on matrices of multiple sizes with known inverses con-sisting entirely of positive entries. An example 3 × A and correspondinginverse V are shown below. .
344 0 . − . − .
018 1 . − . . − .
384 0 . × .
613 0 .
037 0 . .
586 1 .
068 1 . .
074 0 .
531 1 . = A V I × Using six qubits per element of V rk - 18 qubits total for the column - theestimated inverse V of the above matrix A was found using the D-Wave 2000Q.With embedding, a total of 125 qubits were used. Using 2500 reads, a defaultannealing time, and c = −
10 chain strength, the resulting estimate is shown below. .
344 0 . − . − .
018 1 . − . . − .
384 0 . × .
625 0 . . . . . . . . = . − . − . − .
088 1 .
023 0 . − .
019 0 .
016 1 . A V I × The quantum annealer was not able to find the inverse V precisely, thoughthis is in part due to the limited numerical resolution. The diagonal elements ofthe estimated inverse are reasonably close to the true values, with V being themost off at 6% relative error. The off-diagonal elements show larger relative errors.In practice, 3 × × A , depending on the number ofbits used for each column element V rk ; however, this does not yield a good estimateof the inverse matrix. Note that the discretization used in Equation (14) can havea significant effect upon the final result. The energy measures the distance of theproduct of A and the estimate of V from the identity matrix, not the distance ofthe estimate of V from the true V . Because of this, moving an individual elementof the estimate V rk closer to its true value while changing nothing else can producea larger energy in terms of the energy function in Equation (13). Multiple qubitsmust be adjusted simultaneously, providing a deceptively good test case for the D-Wave hardware. Multi-qubit tunneling provides another instance of a bottleneckwhich may have prevented the anneal from reaching the lowest-energy state, inaddition to the aforementioned errors introduced by chaining and thermal effects.Once again, alternate chain strengths and annealing times did not significantlyaffect the results. Though quantum computing has arrived in at least one form, it is still not accurateenough to be incorporated into regular use in statistics. Qubit coherence remainsan issue for quantum computers of all types, and theoretical research is ongoing intowhich types of problems may provide a speedup over classical computation usingquantum annealing by analyzing how the gap between lowest and second-lowestenergy states scales with problem size. The D-Wave, though more user-friendlythan universal quantum computers, forces all problems to be formatted in theIsing or QUBO model of Equation (2) - not a trivial task. The machine itselfsuffers from hardware noise which may mask the true solution and the process ofembedding reduces the scale of problems available. The combined effects of these ssues were shown in Sections 5 and 6, where larger problems could be embeddedonto the machine, but suitable solutions could not be found. The effects of noiseand difficulty of annealing with hundreds or thousands of qubits leads to solutionswhich are easily identifiable as incorrect but for which the correct solution cannot be enforced, as in the solution to the N-queens problem with too few queens.Furthermore, while the machine has been proven to utilize quantum tunnelingeffects not available to classical solver, it is still unclear whether these effects areeven useful in solving the problem or scalable as the problem size grows.Those qualifications aside, the D-Wave is still potentially relevant for prob-lems of the form in Equation (2). This paper has shown maximum likelihood esti-mation, experimental design generation, and matrix inversion on the D-Wave. Ofthese, experimental design generation arguably shows the most promise for futureresearch using quantum computing, as it relies on simulated annealing techniquesfor finding designs which satisfy an optima criterion - a problem for which thequantum annealing is well suited.This paper will hopefully serve as inspiration for further research in quan-tum computing. Though noisy, hardware is available for current research intoquantum computing. Proof of concept is here, and what remains is for quantumannealers to find ways to make these current drawbacks in implementing the com-putational model in hardware more faithful so that the solutions are reliable. It willbe exciting to observe how this new technology is used to broaden computationalpossibilities in the field of statistics. eferences Albash, T., Martin-Mayor, V., and Hen, I. (2017), “Temperature Scaling Law forQuantum Annealing Optimizers,”
Phys. Rev. Lett. , 119, 110502.Bian, Z., Chudak, F. A., Macready, W. G., and Rose, G. (2010), The Ising model: teaching an old problem new tricks,, Technical report, D-Wave Systems.Biswas, R., Jiang, Z., Kechezhi, K., Knysh, S., Mandr, S., OGorman, B., Perdomo-Ortiz, A., Petukhov, A., Realpe-Gmez, J., Rieffel, E., Venturelli, D., Vasko, F.,and Wang, Z. (2017), “A NASA Perspective on Quantum Computing,”
ParallelComput. , 64(C), 81–98.Borle, A., and Lomonaco, J. (2019), “Analyzing the Quantum Annealing Ap-proach for Solving Linear Least Squares Problems: 13th International Confer-ence, WALCOM 2019, Guwahati, India, February 27 – March 2, 2019, Proceed-ings,”.Chang, C. C., Gambhir, A., Humble, T. S., and Sota, S. (2019), “Quantum an-nealing for systems of polynomial equations,”
Scientific Reports
Proceedings of the Twenty-eighth Annual ACM Symposium on Theory ofComputing , STOC ’96, ACM, New York, NY, USA, pp. 212–219.Jang, J.-S., Lee, S.-Y., and Shin, S.-Y. (1988), “An Optimization Network for Ma-trix Inversion,” in
Neural Information Processing Systems , ed. D. Z. AndersonAmerican Institute of Physics, pp. 397–401.Johnson, M. W., Amin, M. H. S., Gildert, S., Lanting, T., Hamze, F., Dickson, N.,Harris, R., Berkley, A. J., Johansson, J., Bunyk, P., Chapple, E. M., Enderud,C., Hilton, J. P., Karimi, K., Ladizinsky, E., Ladizinsky, N., Oh, T., Perminov,I., Rich, C., Thom, M. C., Tolkacheva, E., Truncik, C. J. S., Uchaikin, S., Wang,J., Wilson, B., and Rose, G. (2011), “Quantum annealing with manufacturedspins,”
Nature , 473, 194 EP –.Katzgraber, H. G., Hamze, F., Zhu, Z., Ochoa, A. J., and Munoz-Bauza, H. (2015),“Seeking Quantum Speedup Through Spin Glasses: The Good, the Bad, andthe Ugly,”
Phys. Rev. X , 5, 031026. ing, J., Yarkoni, S., Raymond, J., Ozfidan, I., King, A. D., Nevisi, M. M., Hilton,J. P., and McGeoch, C. (2017), Quantum Annealing amid Local Ruggedness andGlobal Frustration,, Technical report, D-Wave.Mandr`a, S., Katzgraber, H. G., and Thomas, C. (2017), “The pitfalls of planarspin-glass benchmarks: raising the bar for quantum annealers (again),” Quan-tum Science and Technology , 2(3), 038501.Mandziuk, J., and Macuk, B. (1992), “A neural network designed to solve theN-Queens Problem,”
Biological Cybernetics , 66(4), 375–379.Michalak, S., and Picard, R. (2017), Leveraging LANL’s D-WAVE 2X for Lever-aging LANL’s D-Wave 2X for Random Number Generation,, Technical report,Los Alamos National Laboratory.Morris, M. D., and Mitchell, T. J. (1995), “Exploratory designs for computationalexperiments,”
Journal of Statistical Planning and Inference , 43(3), 381 – 402.O’Malley, D., and Vesselinov, V. V. (2016), ToQ.jl: A high-level programminglanguage for D-Wave machines based on Julia,, in , pp. 1–7.O’Malley, D., Vesselinov, V. V., Alexandrov, B. S., and Alexandrov, L. B. (2018),“Nonnegative/Binary matrix factorization with a D-Wave quantum annealer,”
PLOS ONE , 13(12), 1–12.Perdomo-Ortiz, A., O’Gorman, B., Fluegemann, J., Biswas, R., and Smelyanskiy,V. N. (2016), “Determination and correction of persistent biases in quantumannealers,”
Scientific Reports , 6, 18628 EP –.Preskill, J. (2018), “Quantum Computing in the NISQ era and beyond,”
Quantum ,2, 79.Raymond, J., Yarkoni, S., and Andriyash, E. (2016), “Global Warming: Temper-ature Estimation in Annealers,”
Frontiers in ICT , 3, Article 23.Rieffel, E. G., Venturelli, D., O’Gorman, B., Do, M. B., Prystay, E. M., andSmelyanskiy, V. N. (2015), “A case study in programming a quantum annealerfor hard operational planning problems,”
Quantum Information Processing ,14(1), 1–36.Rogers, M. L., and Singleton Jr., R. L. (2019), Floating-Point Calculations on aQuantum Annealer: Division and Matrix Inversion,. arXiv:1901.06526 [quant-ph]. onnow, T. F., Wang, Z., Job, J., Boixo, S., Isakov, S. V., Wecker, D., Martinis,J. M., Lidar, D. A., and Troyer, M. (2014), “Defining and detecting quantumspeedup,” Science , 345(6195), 420–424.Shin, S. W., Smith, G., Smolin, J. A., and Vazirani, U. (2014), How ”Quantum”is the D-Wave Machine?,. arXiv:1401.7087 [quant-ph].Venturelli, D., Mandr`a, S., Knysh, S., O’Gorman, B., Biswas, R., and Smelyanskiy,V. (2015), “Quantum Optimization of Fully Connected Spin Glasses,”
Phys.Rev. X , 5, 031040.Wang, Y., Wu, S., and Zou, J. (2016), “Quantum Annealing with Markov ChainMonte Carlo Simulations and D-Wave Quantum Computers,”
Statist. Sci. ,31(3), 362–398.,31(3), 362–398.