# Statistical Analysis of Quantum Annealing

SStatistical Analysis of Quantum Annealing

Xinyu Song , Yazhen Wang , Shang Wu Donggyu Kim , School of Statistics and Management,Shanghai University of Finance and Economics Department of Statistics,University of Wisconsin-Madison College of Business,Korea Advanced Institute of Science and TechnologyJanuary 19, 2021

Abstract

Quantum computers use quantum resources to carry out computational tasks andmay outperform classical computers in solving certain computational problems. Special-purpose quantum computers such as quantum annealers employ quantum adiabatictheorem to solve combinatorial optimization problems. In this paper, we compare clas-sical annealings such as simulated annealing and quantum annealings that are done bythe D-Wave machines both theoretically and numerically. We show that if the classicaland quantum annealing are characterized by equivalent Ising models, then solving anoptimization problem, i.e., ﬁnding the minimal energy of each Ising model, by the twoannealing procedures, are mathematically identical. For quantum annealing, we alsoderive the probability lower-bound on successfully solving an optimization problem bymeasuring the system at the end of the annealing procedure. Moreover, we presentthe Markov chain Monte Carlo (MCMC) method to realize quantum annealing byclassical computers and investigate its statistical properties. In the numerical section,we discuss the discrepancies between the MCMC based annealing approaches and thequantum annealing approach in solving optimization problems.

Keywords and phrases:

Combinatorial optimization, ground state success probability,Hamiltonian, Ising model, Markov chain Monte Carlo (MCMC), quantum computing, quan-tum annealing

Running title:

Quantum Annealing and MCMC1 a r X i v : . [ s t a t . O T ] J a n Introduction

While classical computation follows classical physics and uses electronic transistors to crunchzeros and ones individually, quantum computation employs quantum resources to manage ze-ros and ones simultaneously and may speed up certain calculation work dramatically (Nielsenand Chuang, 2000; Wang, 2012). Quantum computation strives to understand how to takeadvantage of the huge information hidden in quantum systems by exploring the enormouspotential of atoms and photons. It uses quantum phenomena such as quantum superposition,quantum entanglement and quantum tunneling to compute and process infomation. Twomajor approaches to realize and implement quantum computation are logic-gate based quan-tum computing and adiabatic quantum computing (Aharonov et al., 2008; Browne, 2014;Deutsch, 1985; DiVincenzo, 1995; Farhi et al., 2000, 2001, 2002; Johnson et al., 2011). Thelogic-gate based quantum computers are constructed based on quantum circuits with quan-tum gates and provide the quantum analog of classical universal or general-purpose comput-ers. Intensive eﬀorts are underway around the world by academic labs, large companies, andgovernment institutes to overcome technical problems in constructing the universal quantumcomputers. However, the practical application of general-purpose quantum computers maybe still decades away while on the other hand, special-purpose quantum computers such asquantum annealers and quantum simulators, can be constructed with capabilities exceedingtheir classical counterparts (Aharonov and Ta-Shma, 2003; Aharonov et al., 2008; Albashand Lidar, 2016; Britton et al., 2012; Browne, 2014; Brumﬁel, 2012; Wang, 2012). Quan-tum annealers are physical hardwares to realize quantum annealing and are used to solvecombinatorial optimization problems and to realize Monte Carlo sampling more eﬃciently.Annealing materials by ﬁrst heating and then slow cooling in order to reduce their brittlenessis an ancient technology, which has been used in making and reﬁning materials such as glassand metal. Computers can be employed to reproduce this process, which creates simulatedannealing that provides an optimization tool. It is based on the analogy between the be-havior of a complex physical system with multiple degrees of freedom and an optimizationproblem of ﬁnding the global minimum given an objective function deﬁned by many param-eters. The objective function of the optimization problem can be regarded as the energy ofthe physical system, then ﬁnding the minimum energy conﬁgurations or ground states of themany-body physical system is equivalent to solving the corresponding optimization problem.Computer simulation algorithms can be developed to mimic the annealing procedure of thephysical system and ﬁnally reach the minimum energy conﬁgurations. We discuss the clas-sical approach called simulated annealing (SA) and the quantum approach called quantumannealing.Given an optimization problem, SA takes into account the relative conﬁguration energiesand a ﬁctitious time-dependent temperature when exploring the immense search space prob-abilistically. Speciﬁcally, we assign the physical system a temperature that can be regardedas a control parameter introduced artiﬁcially. By slowly driving the temperature from ahigh value to zero, we are able to move the physical system to the state with the lowestvalue of energy, hence also arrive at the solution of the optimization problem (Bertsimasand Tsitsiklis, 1993; Kirkpatrick et al., 1983; Wang et al., 2016). Quantum annealing con-2tructs on the physical process of a quantum system whose lowest energy or equivalently,the ground state of the system, represents the solution to an optimization problem posed. Itﬁrst establishes a simple quantum system initialized in its ground state, and then graduallymoves the simple system to the target complex system. Quantum adiabatic theorem (Farhiet al., 2000, 2001, 2002; Kadowaki and Nishimori, 1998) indicates that the system will likelyto stay in the ground state during the gradual evolution, thus, under certain probability,we can ﬁnd the solution of the original optimization problem by measuring the system atits ﬁnal state. That is, quantum annealing replaces thermal ﬂuctuations in SA by quantumtunneling to keep the system close to its instantaneous ground state during the evolution.It is similar to a quasi-equilibrium state being maintained during the evolution of SA. SeeBrooke et al. (1999); Isakov et al. (2016); J¨org et al. (2010); Wang et al. (2016) for moredetails. Both the classical and quantum annealing methods are powerful tools for solvinghard optimization problems, whether achieved by physical devices or simulation methods.The physical scheme either employs a natural system or builds a device to engineer a phys-ical system where the ground state of the system represents the sought-after solution of anoptimization problem (McGeoch, 2014). The simulation approach applies “escape” rules incomputer simulations to prevent the system from getting trapped in local minima given acost function, and eventually, reach the global minimum with some probability (Martoˇn´aket al., 2002; Rieger and Kawashima, 1999). In both situations, the system can probabilisti-cally explore its the huge conﬁguration space, and ﬁnally freeze in the global minimum withcertain probability. Through suﬃcient repeated attempts, we can ﬁnd the global minimumand solve the optimization problem.Quantum annealing devices are actively pursued by several academic labs and companiessuch as Google and D-Wave Systems, with uncertain quantum speedup. For example, theD-Wave quantum computer is a commercially available hardware device that is designed andbuilt to implement quantum annealing physically. It is an analog computing device based onsuperconducting quantum bits (also called qubits) to process the annealing procedure andsolve combinatorial optimization problems (Albash et al., 2015; Boixo et al., 2014, 2016, 2018;Brady and van Dam, 2016; Rønnow et al., 2014; Wang et al., 2016). The D-Wave computershave been used to solve simple optimization problems in graphs and networks, machinelearning, artiﬁcial intelligence, and computational biology (Bian et al., 2013; O’Gormanet al., 2015; Perdomo-Ortiz et al., 2012, 2015; Rieﬀel et al., 2015). As an example, the lowestenergy alignment of a protein is considered as its preferred state, and protein folding is toﬁnd the lowest energy point in its energy landscape; the D-Wave computers can be arrangedto manipulate qubits to reach their lowest energy state and solve the folding problem of asimple protein (McGeoch, 2014; Perdomo-Ortiz et al., 2012).The rest of the paper proceeds as follows. Section 2 brieﬂy reviews quantum mechan-ics and quantum computation. Section 3 introduces quantum annealing and discusses itsimplementations by the D-Wave devices and its realizations by the MCMC based methods,called simulated quantum annealing (SQA), in the contex of Ising model. Section 4 carriesout simulation experiments by classical computers to illustrate SA and SQA. The resultsare compared with ground state success probability data obtained from quantum annealing.Section 5 features concluding remarks regarding statistical issues associated with the study3f the D-Wave devices and whether those MCMC based annealing methods can providestatistical models for the D-Wave devices.

For this paper, we consider the ﬁnite dimension only and let C d be the d -dimensional complexspace. For a vector ψ in C d , we employ the usual notations in quantum science and useDirac notations ket |·(cid:105) and bra (cid:104)·| to denote the column | ψ (cid:105) and row (cid:104) ψ | vector. Let thesuperscripts ∗ , (cid:48) and † be the conjugate of a complex number, the transpose of a vectoror matrix, and the conjugate transpose operation, respectively. A natural inner product in C d is given by (cid:104) u | v (cid:105) = (cid:80) dj =1 u ∗ j v j = ( u ∗ , · · · , u ∗ d )( v , · · · , v d ) (cid:48) , where (cid:104) u | = ( u , · · · , u d ) and | v (cid:105) = ( v , · · · , v d ) (cid:48) , and the modulus follows to be | u | = (cid:112) (cid:104) u | u (cid:105) . A matrix A is said to beHermitian if A = A † and a matrix U is said to be unitary if UU † = U † U = I , where I isan identity matrix. In classical computation, the most fundamental entity is a bit with two mutually exclusivestate values 0 and 1. The quantum analog of the classical bit is a qubit with two statevalues | (cid:105) and | (cid:105) , where we use the customary notation |·(cid:105) called ket to denote the qubitstate. Moreover, quantum computation allows qubits to encode the two states, zeros andones, simultaneously, which is known as the quantum superposition. That is, a qubit canbe in superposition states | ψ (cid:105) = α | (cid:105) + α | (cid:105) where complex numbers α and α are calledamplitudes and satisfy | α | + | α | = 1. Thus, the states of a qubit are unit vectors in C ,and the states | (cid:105) and | (cid:105) form an orthonormal basis for the C space, which are knownas computational basis states. Unlike the classical bits that have mutually exclusive statesand can be examined to determine whether it is in state 0 or 1, the qubits can be zero andone simultaneously while its state cannot be determined by examing the qubit. Instead,by quantum mechanics, we can measure a qubit | ψ (cid:105) and obtain either the result 0, withprobability | α | , or the result 1, with probability | α | .Like classic bits, we also can deﬁne multiple qubits. For one b -qubit, its states are unitvectors in a 2 b -dimensional complex vector space with computational basis states of the form | x x · · · x b (cid:105) , x j = 0 or 1, j = 1 , . . . , b . For example, the states of 2-qubit are unit vectors in C space with the superposition states | ψ (cid:105) = α | (cid:105) + α | (cid:105) + α | (cid:105) + α | (cid:105) , whereamplitudes α x are complex numbers satisfying | α | + | α | + | α | + | α | = 1. Similar tothe single qubit case, we obtain a measurement outcome x as one of the 00 , , ,

11 with acorresponding probability | α x | . The quantum exponential complexity is then shown in theexponential growth of dimensionality 2 b and the number of 2 b amplitudes needed to specifythe superposition states. Thus, as the quantum systems evolves, they are able to store andkeep track an exponential number of complex numbers while performing data manipulations4nd calculations. See Nielsen and Chuang (2000) and Wang (2012) for details. Quantum systems are entirely characterized by their states and the time evolution of thestates. For a ﬁxed time, the states of a d -dimensional quantum system can be characterizedby a unit vector | ψ (cid:105) in C d . To study the quantum system, we perform measurements on anobservable M , which is a Hermitian matrix on C d . The eigendecomposition of M is assumedto be the following M = r (cid:88) a =1 λ a Q a , where λ , . . . , λ r are the real eigenvalues of M , and Q a is the projection onto the eigen-spacecorresponding to the eigenvalue λ a . Quantum probability theory says that if we measure M for the quantum system under the state | ψ (cid:105) , we may obtain the measurement outcome Λ,which is a random variable that takes values in { λ , λ , . . . , λ r } with probability distribution P (Λ = λ a ) = tr( Q a | ψ (cid:105)(cid:104) ψ | ) = (cid:104) ψ | Q a | ψ (cid:105) , a = 1 , , . . . , r . Statistically, we can measure M many times to obtain measurement data for making inferences on the quantum states.We now focus on the time evolution of a quantum system. Let | ψ ( t ) (cid:105) be the state ofthe quantum system at time t , which is known as a wave function. The states | ψ ( t ) (cid:105) and | ψ ( t ) (cid:105) at times t and t are connected through | ψ ( t ) (cid:105) = U ( t , t ) | ψ ( t ) (cid:105) , where U ( t , t ) =exp[ − i H ( t − t )] is a unitary matrix and H is a Hermitian matrix on C d . In this regard,the continuous time evolution of | ψ ( t ) (cid:105) is governed by the following Schr¨odinger equation √− ∂ | ψ ( t ) (cid:105) ∂t = H | ψ ( t ) (cid:105) or equivalently | ψ ( t ) (cid:105) = e −√− H t | ψ (0) (cid:105) , (2.1)here H is a possibly time-dependent Hermitian matrix on C d , which is known as the Hamil-tonian of the quantum system for governing its quantum evolution. See Cai et al. (2016);Holevo (2011); Sakurai and Napolitano (2017); Shankar (2012); Wang (2012, 2013) and Wangand Xu (2015) for details. In this section, we ﬁrst review the classical simulated annealing and then discuss the quantumannealing. We show that if the simulated and quantum annealing are described by equivalentIsing models, then solving the corresponding optimization problem, or equivalently, ﬁndingthe minimal energy of each Ising model, are mathematically identical by the two annealingapproaches. We also derive the ground state success probability for quantum annealing andpresent the path-integral Monte Carlo method to mimic the quantum annealing by classicalsimulation. Relevant statistical theorems are presented.5 .1 Simulated annealing

Ising model is often used to describe the natural systems in physics and many optimizationproblems can be mapped into physical systems described by the Ising model. Examples in-clude traveling salesman problem, portfolio optimization, integer factoring, social economicsnetwork, protein folding, protein modeling, and statistical genetics. The ground state of theIsing model provides a solution for the optimization problem (Irb¨ack et al., 1996; Majewskiet al., 2001; McGeoch, 2014; Stauﬀer, 2008).Ising model is described by a graph G = ( V ( G ) , E ( G )), where V ( G ) and E ( G ) representthe vertex and edge sets of G , respectively. Each vertex is occupied by a random variablewhose value is { +1 , − } , and each edge corresponds to the coupling (or interaction) betweentwo vertex variables connected by the edge. A conﬁguration s = { s j , j ∈ V ( G ) } is deﬁned asa set of values assigned to all vertex variables s j , j ∈ V ( G ). We refer to the vertices as sitesand vertex variables as spins in physics, where +1 is for spin up and − b lattice sites and at each site j ,we let the variable s j be a binary random variable representing the position of the magnet,where s j = ± j th magnet pointing up or down, respectively.The classical Ising model has the following Hamiltonian H cI ( s ) = − (cid:88) ( i,j ) ∈E ( G ) J ij s i s j − (cid:88) j ∈V ( G ) h j s j , (3.2)where ( i, j ) stands for the edge between the sites i and j , the ﬁrst sum takes over all pairsof vertices with edge ( i, j ) ∈ E ( G ), J ij stands for the interaction (or coupling) between sites i and j associated with edge ( i, j ) ∈ E ( G ), and h j describes an external magnetic ﬁeld onvertex j ∈ V ( G ). We refer to a set of ﬁxed values { J ij , h j } as one instance of the Ising model.Given a speciﬁc conﬁguration s , the energy of the Ising model is H cI ( s ).According to Boltzmann’s law, the probability of a given conﬁguration s is described bythe following Boltzmann (or Gibbs) distribution P β ( s ) = e − β H cI ( s ) Z β , Z β = (cid:88) s e − β H cI ( s ) , (3.3)here β = ( k B T ) − is an inverse temperature where k B a generic physical constant called theBoltzmann constant and T the absolute temperature, moreover, the normalization constant Z β is called the partition function. If k B = 1, then T serves as the fundamental temperatureof the system with units of energy, and β is reciprocal to the fundamental temperature. Theconﬁguration probability P β ( s ) denotes the probability that the physical system is in a statewith conﬁguration s in equilibrium.When using the Ising model to represent a combinatorial optimization problem, the goalis to ﬁnd the ground state of the Ising model, that is, we need to ﬁnd a conﬁguration thatcan minimize the energy function H cI ( s ). If the Ising model contains b sites, then the conﬁgu-ration space is {− , +1 } b and the total number of possible conﬁgurations is 2 b . We note that6he system complexity increases exponentially in the number of sites b , and thus, it is verydiﬃcult to ﬁnd the ground states and solve the minimization problem numerically when b islarge. In fact, the search space that grows exponentially prohibits us to solve the minimiza-tion problem with deterministic exhaustive search algorithms. Instead, annealing methodssuch as SA can be employed to search the space probabilistically. To ﬁnd the conﬁgurationwith minimal energy, SA uses MCMC methods such as Metropolitan Hastings algorithm togenerate conﬁguration samples from the Boltzmann distribution while decreasing the tem-perature slowly. We initialize the algorithm with a random spin conﬁguration and ﬂip spinsat each step randomly. A new spin conﬁguration is accepted if it lowers the energy and if not,it is accepted probabilistically based on the Metropolis rule; meanwhile, the temperature islowered to reduce the escape probability of trapping in local minima. That is, the initialspin conﬁguration s (0) = { s (0) j } is obtained by randomly and independently assign +1 or − k th sweep, we attempt to ﬂip the i th spin from state s ( k − i to thenew state s ( k ) i = − s ( k − i while all the other spins remain unchanged. The change of energycaused by the ﬂipping is the following∆ E ( k ) i = − h i ( s ( k ) i − s ( k − i ) − i − (cid:88) j =1 J ij s ( k ) j ( s ( k ) i − s ( k − i ) − b (cid:88) j = i +1 J ij s ( k − j ( s ( k ) i − s ( k − i ) . The state of the i th spin is updated from s ( k − i to s ( k ) i if the energy is lowered, that is,∆ E ( k ) i ≤

0; otherwise, the state is updated with probability exp( − ∆ E ( k ) i /T k ), where T k isthe annealing schedule to lower the temperature. That is, the new state s ( k ) i is accepted withprobability min { , exp( − ∆ E ( k ) i /T k ) } . Annealing schedules used to lower the temperatureoften have T k that is proportional to 1 /k or 1 / log k . See Bertsimas and Tsitsiklis (1993);Geman and Geman (1984); Hajek (1988), and Wang et al. (2016). As in the classical annealing method, graph G is used to describe the quantum Ising modelwhile the vertex set V ( G ) represents the quantum spins, and the edge set E ( G ) denotesthe couplings (or interactions) between two quantum spins. Since qubits can be used torealize quantum spins, each vertex can be represented by a qubit. Given b vertices in G ,the quantum system is characterized by a d -dimensional complex space where d = 2 b . Thequantum state is described by a unit vector in C d , and its dynamic evolution is governed bythe Schr¨odinger equation deﬁned in (2.1) via a quantum Hamiltonian. The Hamiltonian hereis a Hermitian matrix of size d . The eigenvalues of the quantum Hamiltonian are connectedwith the energies of the quantum system, and the eigenvector corresponds to the smallesteigenvalue represents a ground state. I j = (cid:18) (cid:19) , σ xj = (cid:18) (cid:19) , σ zj = (cid:18) − (cid:19) , j = 1 , . . . , b, σ xj and σ zj are Pauli matrices in x and z axes, respectively. We note that the Paulimatrix in y axis is not needed (Nielsen and Chuang, 2000; Wang, 2012). For the quantumsystem, each classical vertex variable s j = ± σ zj for the j th quantumspin, which is further realized by a qubit. The two eigenvalues ± σ zj correspond to the eigenstates | + 1 (cid:105) and |− (cid:105) where each further represents the spin up state | ↑(cid:105) and spin down state | ↓(cid:105) . In total, there are 2 b possible quantum conﬁgurations made bycombining the 2 b eigenstates in the form |± (cid:105) of the Pauli matrices { σ zj } bj =1 . We replace s j in the classical Ising Hamiltonian H cI ( s ) by σ zj to obtain the quantum version, that is, H qI = − (cid:88) ( i,j ) ∈E ( G ) J ij σ zi σ zj − (cid:88) j ∈V ( G ) h j σ zj , (3.4)where J ij is the Ising coupling along the edge ( i, j ) ∈ E ( G ), and h j is the local ﬁeld on thevertex j ∈ V ( G ). Here we follow the convention in quantum literature so that σ zj and σ zi σ zj in (3.4) stand for their tensor products along with identical matrices σ zi σ zj ≡ I ⊗ · · · ⊗ I i − ⊗ σ zi ⊗ I i +1 ⊗ · · · ⊗ I j − ⊗ σ zj (cid:124) (cid:123)(cid:122) (cid:125) vertices i and j ⊗ I j +1 ⊗ · · · ⊗ I b , σ zj ≡ I ⊗ · · · ⊗ I j − ⊗ σ zj ⊗ I j +1 (cid:124) (cid:123)(cid:122) (cid:125) vertex j ⊗ · · · ⊗ I b . Each term in (3.4) is a tensor product of b matrices of size two and as a result, all terms in H qI are diagonal matrices of size 2 b , and so does H qI . We have one matrix to act on the j thqubit, either a Pauli matrix σ zj or an identity matrix I j . The quantum convention identiﬁesthe qubits with Pauli matrices for real actions but omits the identical matrices and tensorproduct signs.To ﬁnd a quantum spin conﬁguration with the minimal energy, or equivalently, a groundstate of quantum Hamiltonian H qI , we need to search for an eigenvector of H qI correspondingto its smallest eigenvalue. We note that H qI involves only commuting diagonal matrices, andits eigenvalues are equal to its diagonal entries, which in turn are the 2 b possible values ofclassical Hamiltonian H cI ( s ). Thus, the quantum system governed by H qI behaves essentiallylike a classical system and ﬁnding the minimal energy of the quantum Ising Hamiltonian H qI is equivalent to ﬁnding the minimal energy of the classical Ising Hamiltonian H cI . Thisresult is established mathematically in the following theorem. Theorem 1

Suppose that the classical and quantum Ising models have respective Hamilto-nians H cI and H qI described by the same graph G with b vertices, the identical couplings J ij and local ﬁelds h j . Then the quantum Ising model governed by H qI has the same Boltzmanndistribution of observing conﬁgurations as the classical Ising model governed by H cI , andﬁnding the minimal energy of the quantum Ising model is mathematically identical to ﬁndingthe minimal energy of the classical Ising model. Remark 2

Theorem 1 indicates that the original optimization problem described in Section3.1 can be formulated in the quantum framework, and the computational task for solving theoptimization problem remains the same as in the classical case.

8o carry out quantum annealing, it is essential to engineer a transverse magnetic ﬁeld thatis orthogonal to the Ising axis and obtain the corresponding Hamiltonian in the transverseﬁeld. The transverse ﬁeld represents kinetic energy that doe not commute with the potentialenergy H qI , therefore, it induces transitions between the up and down states of every singlespin, and converts the system behavior from classical to quantum. Assume that the followingquantum Hamiltonian governs the transverse magnetic ﬁeld H X = − (cid:88) j ∈V ( G ) σ xj , (3.5)where σ xj denotes the tensor products of b matrices of size 2, σ xj ≡ I ⊗ · · · ⊗ I j − ⊗ σ xj ⊗ I j +1 (cid:124) (cid:123)(cid:122) (cid:125) vertex j ⊗ · · · ⊗ I b , (3.6)and does not commute with σ zj in H qI . We note that σ xj has two eigenvalues +1 and − | v j, (cid:105) = (1 , † and | v j, − (cid:105) = (1 , − † . Asa result, the eigenvector corresponds to the smallest eigenvalue of H X is | v + (cid:105) = | v , +1 (cid:105) ⊗| v , +1 (cid:105) ⊗ · · · ⊗ | v b, +1 (cid:105) , and | v + (cid:105) is the ground state of H X .We start the quantum annealing procedure with a quantum system that is driven by thetransverse magnetic ﬁeld H X and is initialized in its ground state | v + (cid:105) . The system thengradually evolves from the initial Hamiltonian H X to its ﬁnal target Hamiltonian H qI . Duringthe Hamiltonian change, the system tends to stay in the ground states of the instantaneousHamiltonian via quantum tunneling based on the adiabatic quantum theorem (Farhi et al.,2000, 2001, 2002; McGeoch, 2014). At the end of the annealing procedure, if the quantumsystem stays in a ground state of the ﬁnal Hamiltonian H qI , we are able to obtain an optimalsolution by measuring the system.In speciﬁc, quantum annealing is realized by an instantaneous Hamiltonian for the Isingmodel in the transverse ﬁeld as follows, H D ( t ) = A ( t ) H X + B ( t ) H qI , t ∈ [0 , t f ] , (3.7)where A ( t ) and B ( t ) are smooth functions depend on time t that control the annealingschedules, and t f is the total annealing time. To drive the system from H X to H qI , we take A ( t f ) = B (0) = 0 where A ( t ) is decreasing and B ( t ) is increasing. It follows that when t = 0, H D (0) = A (0) H X and when t = t f , H D ( t f ) = B ( t f ) H qI . Since A (0) and B ( t f ) are knownscalars, H D ( t ) has the same eigenvectors as H X at the initial time t = 0 and as H qI at theﬁnal time t = t f , where the corresponding eigenvalues diﬀer by factors of A (0) and B ( t f ),respectively. Therefore, H D ( t ) moves the system from H X initialized in its ground state tothe ﬁnal target H qI . When the control functions A ( t ) and B ( t ) are chosen appropriately,the quantum adiabatic theorem indicates that the annealing procedure driven by (3.7) willhave a suﬃciently high probability in ﬁnding the global minimum of H cI ( s ) and solvingthe minimization problem at the ﬁnal annealing time t f . The following theorem providesthe probability lower bound on successfully solving the optimization problem at the ﬁnalannealing time t f by quantum annealing. 9 heorem 3 Suppose that the quantum system associated with quantum annealing is drivenby H QA ( t ) = A ( t ) H + B ( t ) H , t ∈ [0 , t f ] , where t f > , H and H are two quantumHamiltonians, annealing schedules A ( t ) and B ( t ) are smooth functions and satisfy A ( t f ) = B (0) = 0 , A ( t ) is decreasing and B ( t ) is increasing, and the system is initialized in a groundstate of H at t = 0 . The probability that the lowest energy of H is obtained by measuringthe system at the end of the annealing procedure is then bounded from below by e − ξ − b ζ Π e ξ (1 − p ) , (3.8) where ζ is the number of ground states, p is a constant such that p ∈ [0 , , Π = max (cid:40)(cid:12)(cid:12)(cid:12)(cid:12) λ j ( u ) − λ ( u ) (cid:104) ( (cid:96) ) ( u ) | d H QA ( ut f ) du | j ( u ) (cid:105) (cid:12)(cid:12)(cid:12)(cid:12) , j ≥ , (cid:96) = 1 , . . . , ζ, u ∈ [0 , (cid:41) ,λ j ( u ) , j = 0 , , , . . . , are eigenvalues of H QA ( ut f ) listed in an increasing order, and | ( (cid:96) ) ( u ) (cid:105) , (cid:96) = 1 , . . . , ζ are the ground states corresponding to the smallest eigenvalue λ ( u ) , with | j ( u ) (cid:105) any normalized eigenvector corresponding to eigenvalue λ j ( u ) , j ≥ , u ∈ [0 , , A ( u ) =( A (cid:96)l ( u )) is a ζ by ζ matrix given by A (cid:96)l ( u ) = −√− (cid:104) ˇ0 ( (cid:96) ) ( u ) | ddu ˇ0 ( l ) ( u ) (cid:105) for (cid:96) (cid:54) = l and for (cid:96) = l , ξ = (cid:90) (cid:107) A ( u ) (cid:107) du (cid:18) − π (cid:90) (cid:107) A ( u ) (cid:107) du (cid:19) − , and (cid:107) A ( u ) (cid:107) is the spectral norm of A ( u ) . In particular, if there is a unique ground state,then ζ = 1 and ξ = 0 , the probability lower bound in (3.8) becomes to − b Π(1 − p ) . The ground state success probability of quantum annealing is usually derived under theunique ground state condition in an asymptotic sense, that is, we obtain some expressionsor bounds for the leading terms of the ground state success probability by taking t f to go toinﬁnity (Aharonov et al., 2008; Born and Fock, 1928; McGeoch, 2014; Morita and Nishimori,2008). We provide the probability lower bound in (3.8) that is for ﬁnite t f and does notimpose the unique ground state restriction. From the proof, p can be interpreted as theprobability that the quantum system stays in a ground state during the annealing procedure.We note that d H QA ( u t f ) du = dA ( u t f ) du H + dB ( u t f ) du H , depends on u only through the derivatives of annealing schedules A ( t ) and B ( t ). By choosing A ( t ) and B ( t ) appropriately, we can make the probability lower bound in (3.8) away fromzero and thus, guarantee that quantum annealing can ﬁnd the lowest energy of H with someprobability. In Theorem 3, we may take H = H X given by (3.5), H = H qI deﬁned in (3.4),and H QA = H D in (3.7), then by Theorems 1 and 3 together, we conclude that the quantumannealing driven by (3.7) can ﬁnd the global minimum of H cI ( s ) and solve the minimizationproblem with certain probability. 10 .3 Simulated Quantum Annealing with path-integral Monte Carlomethod In this section, we discuss how to employ classical Monte Carlo method to mimic the behaviorof quantum annealing and derive relevant theoretical results. Metropolis-Hastings algorithmis one of the most popular ways to generate the Markov chain and we review the homogeneouscase ﬁrst. To obtain random samples from the target probability distribution q ( x ) using theMetropolis-Hastings, we ﬁrst choose an arbitrary point x (0) and then generate a candidate y (0) for the next sample using the generation probability P ( y, x (0) ) which is symmetric, thatis, P ( x, y ) = P ( y, x ). We accept y (0) as the next sample x (1) with acceptance probability A ( y (0) , x (0) ) and reject y (0) with probability 1 − A ( y (0) , x (0) ). One common choice of theacceptance probability is A ( y, x ) = min (1 , q ( y ) /q ( x )). We repeat the above algorithms andobtain the sequence of random samples from q ( x ). Speciﬁcally, Algorithm 1 summarizes theMetropolis-Hastings algorithm. Algorithm 1

Metropolis-Hastings Algorithm

Input: (1) Initial state x (0) ;(2) Generation probability P ( y, x );(3) Acceptance probability A ( y, x ). Generate y t ∼ P ( y, x ( t ) ); Take x ( t +1) = (cid:40) y ( t ) with probability A ( y ( t ) , x ( t ) ) x ( t ) with probability 1 − A ( y ( t ) , x ( t ) ) . The transition probability of the Metropolis-Hastings is given by G ( y, x ) = P ( y, x ) A ( y, x )while the suﬃcient condition of existing stationary distribution q ( x ) is detailed balance, thatis, G ( y, x ) q ( x ) = G ( y, x ) q ( y ) . We extend above homogeneous Markov chain to the inhomogeneous Markov chain andapply the path-integral method to the d -dimensional transverse magnetic ﬁeld Ising model.We deﬁne the transition probability that characterizes a Monte Carlo. For an Ising modelthat has b sites, we let q M,β ( s , t ) = 1 Z M,β ( t ) exp ( βF ,M ( s ) + γ β ( t ) F ,M ( s )) ,Z M,β ( t ) = (cid:88) s ∈{− , } bM exp ( βF ,M ( s ) + γ β ( t ) F ,M ( s )) ;11 M,β ( s ) = 1 Z M,β exp ( βF ,M ( s )) , Z M,β = (cid:88) s ∈{− , } bM exp ( βF ,M ( s )) , where F ,M ( s ) = 1 M M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k , F ,M ( s ) = M (cid:88) k =1 b (cid:88) i =1 s i,k s i,k +1 ,γ β ( t ) = 12 log (cid:18) coth β Γ( t ) M (cid:19) . We deﬁne the transition probability from state x ∈ {− , } bM to state y ∈ {− , } bM at timestep t as follows G ( y, x ; t ) = (cid:40) P ( y, x ) A ( y, x ; t ) if x (cid:54) = y − (cid:80) z (cid:54)∈ x P ( z, x ) A ( z, x ; t ) if x = y (3.9)where P ( y, x ) is the generation probability and A ( y, x ; t ) is the acceptance probability.Speciﬁcally, P ( y, x ), the probability to generate the next candidate state y from the presentstate x , satisﬁes the following conditions(1) ∀ x, y ∈ {− , } bM : P ( y, x ) = P ( x, y ) ≥ ∀ x ∈ {− , } bM : (cid:80) y ∈{− , } bM P ( y, x ) = 1;(3) ∀ x ∈ {− , } bM : P ( x, x ) = 0;(4) ∀ x, y ∈ {− , } bM , ∃ n > , ∃ z , . . . , z n − ∈ {− , } bM : (cid:81) n − k =0 P ( z k +1 , z k ) > , z = x, z n = y . A ( y, x ; t ), the acceptance probability, is deﬁned as A ( y, x ; t ) = g (cid:18) q M,β ( y, t ) q M,β ( x, t ) (cid:19) . where the acceptance function g ( u ) satisﬁes (i) monotone increasing, (ii) 0 ≤ g ( u ) ≤ g (1 /u ) = g ( u ) /u . Examples include u/ (1 + u ) and min(1 , u ). Given above set-up, q M,β ( y, t ) satisﬁes the detailed balance condition, that is, G ( y, x ; t ) q M,β ( x, t ) = G ( x, y ; t ) q M,β ( y, t ) . For a ﬁxed t , q M,β ( x, t ) is then the stationary distribution of the homogeneous Markov chaindeﬁned by the transition matrix ( G ( x, y ; t )) x,y .To establish the theoretical results, we deﬁne R = min { max { d ( y, x ) : y ∈ {− , } bM } : x ∈ {− , } bM \ S m } ,L ,M = max (cid:8) | F ,M ( x ) − F ,M ( y ) | : P ( y, x ) > , x, y ∈ {− , } bM (cid:9) , S m = { x : x ∈ {− , } bM , F ,M ( y ) ≤ F ,M ( x ) , ∀ y ∈ {− , } bM and P ( y, x ) > } , and d ( y, x ) denotes the minimum number of steps necessary to make a transition from x to y , P ( y, x ) is deﬁned in (3.9). We ﬁrst consider the marginal distribution. Let q M,β ( s , t ) = M (cid:88) k =2 (cid:88) s k q M,β ( s , t ) , q M,β ( s ) = M (cid:88) k =2 (cid:88) s k q M,β ( s ) , where s k = { s i,k , i = 1 , . . . , b } and s = ∪ Mk =1 s k . Lemma 4

Suppose Γ( t ) ≥ Mβ tanh − t + 2) / ( RL ,M ) . Then q M,β ( s , t ) with β and q ,β/M ( s , t ) with β/M are asymptotically the same as t → ∞ . Let X M ( t ) = 1 √ M M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k , where s is generated by q M,β ( s , t ). Theorem 5

For any given (cid:15) > , there are X ∞ and M ( (cid:15) ) such that for any given M ≥ M ( (cid:15) ) , there is t ( (cid:15), M ) such that for all t ≥ t ( (cid:15), M ) and any x ∈ R , | P ( X M ( t ) ≤ x ) − P ( X ∞ ≤ x ) | < (cid:15), if Γ( t ) ≥ Mβ tanh − t + 2) / ( RL ,M ) , where X ∞ ∼ N (0 , (cid:80) (cid:104) ij (cid:105) J ij ) . Remark 6

The marginal distribution of s k has the probability mass function (PMF) P ( s k = s k ) = 1 Z β/M exp βM (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k . By Taylor expansion, we have Z β/M = (cid:88) s βM (cid:88) (cid:104) ij (cid:105) J ij s i s j + O ( M − ) = 2 b + O ( M − )13 nd exp βM (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = 1 + βM (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k + O ( M − ) , which implies that P ( s k = s k ) converges to the discrete uniform distribution as M → ∞ .That is, P ( s k = s k ) → b as M → ∞ . Then s k has mean zero and variance (cid:80) (cid:104) ij (cid:105) J ij as M → ∞ . Note: please check, s k ∼ q M,β ( s ) Lemma 7

For given t , we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) bM Z M,β ( t ) − Z β ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (( β/M ) ) , where Z β ( t ) = T r (cid:16) exp (cid:16) β (cid:80) (cid:104) ij (cid:105) J ij σ zi σ zj + β Γ( t ) (cid:80) i σ xi (cid:17)(cid:17) . We investigate data collected from annealing experiments conducted by D-Wave (DW) ma-chine where the number of qubits used equals to either 485 or 1094. That is, we consider anIsing model with b spins where b = 485 or 1094. The Ising model studied in this paper tookexternal magnetic ﬁeld h j = 0 so that the optimization problem could be easier. Withineach experiment, to create a problem instance, we randomly assigned a value of either +1or − J ij in the DW machine. The same procedure was repeated for 100times so that in total, 100 problem instances each with a random assignment of couplings { J ij , ( i, j ) ∈ E ( G ) } where J ij = ±

1, were obtained. For each of the 100 selected instances,after its couplings were programmed, 1000000 annealing runs were performed on the DWmachine. The energy level at the end of each annealing run was recorded. Theorem 1 and3 guarantee that the quantum annealing can ﬁnd the global minimum of H cI ( s ) by measur-ing the quantum system at the end of each annealing run with a positive probability. Todetermine whether the system has found the global minimum or reached the ground state,we adopt the following annealing method: given a problem instance, we declare that a par-ticular annealing run obtains the ground state if the ﬁnal energy level recorded is the sameas the global minimum energy level for the problem instance, H cI ( s ). For each of the 100selected instances, we were able to determine whether the ground state has been achievedduring a particular run out of 1000000 runs. We recorded the frequency of ﬁnding theground state among 1000000 runs and computed its corresponding probability, that is, the14round state success probability. Figure 1 presents the histogram plot that consists of the100 ground state success probabilities for the 100 problem instances under the circumstanceswhere b = 485 and b = 1094. Both histograms exhibit unimodal shape. For b = 485, groundstate success probability ranges from 0.00024 to 0.28502 while for b = 1094, ground statesuccess probability ranges from 0.0001 to 0.00371. As the number of qubits increases, theDW machine ﬁnds it harder to reach minimum energy level given various problem instancesas the interactions among couplers become more complicated. DW Machine 485 success probability nu m be r o f i n s t an c e s DW Machine 1094 success probability nu m be r o f i n s t an c e s Figure 1: Histogram plot of ground state success probability data collected from DW machineconsist of 100 problem instances.

We approximate quantum annealing with classical path-integral Monte Carlo method de-scribed in Section 3.3. The SQA procedure assumes a Chimera graph where the total numberof qubits is b . For the quantum annealing Hamiltonian H D ( t ) in (3.7), to ﬁnd the Boltz-mann state of the quantum system, we are required to evaluate canonical partition functiontr[ e − H D ( t ) /T ] for the transverse ﬁeld quantum Ising model. To approximate the partitionfunction, we use the Trotter formula (Kato, 1978; Suzuki, 1976; Trotter, 1959). Speciﬁcally,given annealing schedules A ( t ) and B ( t ) from quantum annealing and to approximate thepartition function for quantum annealing Hamiltonian H D ( t ) with temperature T , we replacethe temperature parameter by T /B ( t ) and the transverse ﬁeld parameter by A ( t ) /B ( t ). Suchapproximation maps the transverse quantum Ising model to a classical (2 + 1)-dimensional15nisotropic Ising model with temperature τ T and Hamiltonian H caI ( s ) = − τ (cid:88) l =1 B ( t ) (cid:88) ( i,j ) ∈E ( G ) J ij s il s jl + J ( t ) (cid:88) j ∈V ( G ) s jl s j,l +1 where s il = ± τ is an integer, J ij are the regular couplings along theoriginal 2-dimensional direction of the Ising model and l is the index for an extra dimensionthat is often referred to as the imaginary-time dimension with J ( t ) = − τ T (cid:20) tanh (cid:18) A ( t ) τ T (cid:19)(cid:21) as the coupling along the imaginary-time direction.Let s l = { s il , l = 1 , . . . , b } , 1 ≤ l ≤ τ be the l th Trotter slice correponding to oneparticular conﬁguration. We now transform the original 2-dimensional transverse quantumIsing model to a classical (2+1)-dimensional Ising model with the additional dimension inimaginary-time. The imaginary-time dimension has a ﬁnite length τ , uniform coupling andan annealing schedule J ( t ). Given the classical anisotropic Ising Hamiltonian H caI , we areallowed to approximate the transverse ﬁeld quantum Ising model by a classical path-integralMonte Carlo method. With the additional dimension in imaginary-time, the Monte Carlosimulation needs to adopt a standard Metropolis-Hastings algorithm both in local and globalmoves. For the local moves, we perform usual independent spin ﬂips for all spins embeddedwithin all Trotter slices while for the global moves, we attemp to ﬂip all the replicas of thesame spin embedded within all Trotter slices.We start the SQA algorithm by randomly assigning ± s (0) = { s (0) jl , j ∈V ( G ) , l = 1 , . . . , τ } . We update spins one by one for local moves and spin replicas site bysite for global moves. We call a complete update of all spins both in local and global movesa sweep and denote the total number of sweeps as R . Let t k = k/R , k = 1 , . . . , R . For the k th local sweep, we focus on spin i in the l th Trotter slice while keeping all the other spinsthe same. The energy change from state s ( k − il to a new state s ( k ) il = − s ( k − il is (cid:52) E ( k )1 il = − B ( t k ) (cid:34) i − (cid:88) j =1 J ij s ( k ) jl (cid:16) s ( k ) il − s ( k − il (cid:17) + b (cid:88) j = i +1 J ij s ( k − jl (cid:16) s ( k ) il − s ( k − il (cid:17)(cid:35) − J ( t k ) (cid:104) s ( k ) il s ( k ) i,l +1 + s ( k ) i,l − s ( k ) il − s ( k − il s ( k − i,l +1 − s ( k − i,l − s ( k − il (cid:105) and the new state s ( k ) il is accepted with probability min (cid:110) , exp (cid:104) −(cid:52) E ( k )1 il / ( τ T ) (cid:105)(cid:111) during the k th local sweep. On the other hand, for the k th global sweep, we attempt to ﬂip spinsat site i embedded within all Trotter slices l , l = 1 , . . . , τ . The energy change from state { s ( k − il , l = 1 , . . . , τ } to state { s ( k ) il = − s ( k − il , l = 1 , . . . , τ } is (cid:52) E ( k )2 i = − τ (cid:88) l =1 B ( t k ) (cid:34) i − (cid:88) j =1 J ij s ( k ) jl (cid:16) s ( k ) il − s ( k − il (cid:17) + b (cid:88) j = i +1 J ij s ( k − jl (cid:16) s ( k ) il − s ( k − il (cid:17)(cid:35) s ( k ) il is accepted with probability min (cid:110) , exp (cid:104) −(cid:52) E ( k )2 i / ( τ T ) (cid:105)(cid:111) during the k th global sweep. To evaluate the original classical Hamiltonian H cI ( s ) in (3.2), we simplyplug in the conﬁguration s ( k ) = { s ( k ) i , i ∈ V ( G ) } obtained from the ﬁrst Trotter slice at thelast sweep and take h j = 0. The SQA method based on path-integral Monte Carlo simulationstudies the 2-dimensional transverse ﬁeld quantum Ising model through a (2+1)-dimensionalclassical Ising model and brought the system to an equilibrium state at each sweep underthe Metropolis-Hastings algorithm.We now take a Chimera graph that is close to the one adopted by DW machine wherethe total number of spins b equals to either 485 or 945 that is comparable to the totalnumber of qubits studied in previous Section 4.1. We create 100 problem instances byrandomly assigning ± A ( t ) = (cid:40) t − . t + 2 . , if 0 ≤ t ≤ . , if 0 . < t < ≤ B ( t ) = 5 . t + 0 . t, t ∈ [0 , . We take temperature T to be 0.1 and number of Trotter slices τ to be either 30 or 60.Let the total number of sweeps for SQA method to be 100000, 200000 and 500000. Foreach of the 100 problem instances, we ran the SQA algorithm for 3000 times and for eachannealing run, we declare that the particular run ﬁnds its ground state if the run yields aminimum value that is the same as the known global minimum of H cI ( s ) for the instance. Theglobal minimum values are determined by SA algorithms described in Section 3.1 where foreach assigned problem instance, we run the algorithm with the following number of sweeps:1000000, 5000000 and 10000000, each for 3000 times. For a speciﬁc problem instance, out ofthe 9000 SA runs, the minimum energy level reported is considered to be the global minimumenergy level.Figure 2 and 3 present the histograms for ground state success probability data obtainedby the SA method and SQA method, respectively. Given the same 100 problem instances andglobal minimum energy levels reported by SA, the SA on average has a higher probability ofﬁnding the minimum values comparing to SQA. For the SA method, the histograms exhibituniform pattern when b = 485 and unimodal pattern when b = 945 for all sweep numbers.As the number of sweeps increases, the ground state success probability increases on averageso that the patterns become weaker. For the SQA method, the histograms exhibit similarunimodal patterns for all sweep and slice numbers. The additional number of slices made itmore diﬃcult for the SQA to ﬁnd the minimum energy level given our current sweep numberscomparing to the SA. For each slice number, ground state success probability increases asthe sweep number increase for most of the 100 problem instances so that in general, theunimodal pattern becomes weaker as the sweep number increases. For each sweep number,ground state success probability decreases as the slices number changes from 30 to 60 for mostof the 100 problem instances so that in general, the unimodal pattern becomes stronger asthe slices number changes from 30 to 60. The histograms for ground state success probability17btained by the SQA method share the similar pattern with the histogram generated by theDW machine. However, the success probability obtained by SQA method ranges from 0 to0.8 when the number of spins equals to 485 and ranges from 0 to 0.45 when the number ofspins equals to 945 while the success probability generated by DW machine ranges from 0to 0.3. (a) total number of spins: 485 SA with 1000000 sweeps success probability nu m be r o f i n s t an c e s SA with 5000000 sweeps success probability nu m be r o f i n s t an c e s SA with 10000000 sweeps success probability nu m be r o f i n s t an c e s (b) total number of spins: 945 SA with 1000000 sweeps success probability nu m be r o f i n s t an c e s SA with 5000000 sweeps success probability nu m be r o f i n s t an c e s SA with 10000000 sweeps success probability nu m be r o f i n s t an c e s Figure 2: Histogram plots of ground state success probability data based on SA algorithmfor 100 problem instances.

Quantum computation has drawn enormous attention in many frontiers of multiple scientiﬁcﬁelds and can give rise to an exponential speedup over classical computers in solving certaincomputational problems. Since statistics and machine learning nowadays involve compu-tation work heavily, there is a great demand in studying statistical issues for theoreticalresearch and experimental work with quantum computers.18 a) total number of spins: 485

SQA with 100000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 200000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 500000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 100000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s SQA with 200000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s SQA with 500000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s (b) total number of spins: 945 SQA with 100000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 200000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 500000 sweeps and 30 slices success probability nu m be r o f i n s t an c e s SQA with 100000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s SQA with 200000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s SQA with 500000 sweeps and 60 slices success probability nu m be r o f i n s t an c e s Figure 3: Histogram plots of ground state success probability data based on SQA algorithmfor 100 problem instances. 19his paper reviews classical annealing such as simulated annealing, discusses quantumannealing that is implemented by D-Wave computing devices and by MCMC based annealingmethods. We show that if the classical and quantum annealing are characterized by equiv-alent Ising models, then solving an optimization problem by the two annealing proceduresare mathematically identical. Moreover, we derive the probability lower bound of ﬁnding theminimal energy, or equivalently, solving the optimization problem, through quantum anneal-ing by measuring the system at the end of the annealing procedure. Attempts to quantifythe quantum nature of the D-Wave devices have been not only met with excitement butalso confronted with suspicion. Wang et al. (2016) studied the consistency or inconsistencybetween data obtained from D-Wave devices and gathered through MCMC based annealingmethods where the total number of qubits is 100, which is relatively small. In this paper,we demonstrate that fundamental distinguishment exists between the classical and quan-tum annealing methods when the total number of qubits involved is at the level of 500 or1000. The results provide strong evidence that the input-output data of D-Wave devices areinconsistent with the statistical behaviors of the MCMC-based annealing models.20

Appendix: Proofs

Denote C ’s the generic constants whose values are free of m, n, M, d and may change fromappearance to appearance. Let e j, +1 = (1 , † and e j, − = (0 , † , j = 1 , · · · , b , then e j, ± are eigenvectors of Pauli matrix σ zj corresponding to eigenvalues ±

1. For the classical Ising model, given a conﬁguration s = ( s , · · · , s b ) with energy H cI ( s ) in (3.2), we deﬁne a unit vector in C d as follows, e s = e ,s ⊗ e ,s · · · ⊗ e b,s b . We show that e s is an eigenvector of H qI with corresponding eigenvalue H cI ( s ). Indeed, σ zj e j,s j = s j e j,s j , (cid:0) I ⊗ · · · I ⊗ σ zj ⊗ I · · · ⊗ I (cid:1) ( e ,s ⊗ e ,s · · · ⊗ e b,s b ) = s j e ,s ⊗ e ,s · · · ⊗ e b,s b = s j e s , (cid:0) I ⊗ · · · I ⊗ σ zi ⊗ I · · · I ⊗ σ zj ⊗ I · · · ⊗ I (cid:1) = s i s j e ,s ⊗ e ,s · · · ⊗ e b,s b = s i s j e s , it follows that H qI e s = − (cid:88) ( i,j ) ∈E ( G ) J ij (cid:0) I ⊗ · · · I ⊗ σ zi ⊗ I · · · I ⊗ σ zj ⊗ I · · · ⊗ I (cid:1) ( e ,s ⊗ e ,s · · · ⊗ e b,s b ) − (cid:88) j ∈V ( G ) h j (cid:0) I ⊗ · · · I ⊗ σ zj ⊗ I · · · ⊗ I (cid:1) ( e ,s ⊗ e ,s · · · ⊗ e b,s b )= − (cid:88) ( i,j ) ∈E ( G ) J ij s i s j e s − (cid:88) j ∈V ( G ) h j s j e s = − (cid:88) ( i,j ) ∈E ( G ) J ij s i s j − (cid:88) j ∈V ( G ) h j s j e s = H cI ( s ) e s . Thus, the 2 b eigenvalues of H qI are H cI ( s ), s ∈ { +1 , − } b , which are actually the diagonalentries of H qI . If s achieves the global minimum of H cI ( s ), then H cI ( s ) is the smallesteigenvalue of H qI .The partition function and Gibbs state of the quantum Ising model are, respectively,given by Z = tr[ e − β H qI ] , ρ = e − β H qI Z .

Since H qI has eigenvalues H cI ( s ) with eigenvectors e s , it is easy to compute the partitionfunction as follows, Z = (cid:88) s (cid:104) e s | e − β H qI | e s (cid:105) = (cid:88) s e − β H cI ( s ) , s is given by (cid:104) e s | ρ | e s (cid:105) = 1 Z (cid:104) e s | e − β H qI | e s (cid:105) = 1 Z e − β H cI ( s ) , which is equal to the classical Boltzmann distribution. Let u = t/t f be the dimensionless time, and set H ( u ) ≡ H QA ( u t f ) = H QA ( t ) , | ϕ ( u ) (cid:105) ≡ | ψ ( u t f ) (cid:105) = | ψ ( t ) (cid:105) . Since state | ψ ( t ) (cid:105) of the quantum system in nature time t follows the Schr¨odingier equation(2.1) with Hamiltonian H QA ( t ), | ϕ ( u ) (cid:105) satisﬁes i d | ϕ ( u ) (cid:105) du = t f H ( u ) | ϕ ( u ) (cid:105) , (6.10)where we reserve i for the unit imaginary number √− H ( u ). Let λ ( u ) < λ ( u ) < · · · < λ k ( u ) < · · · be the eigenvalues of H ( u ) listed in an increasing order, and denote by | k ( (cid:96) ) ( u ) (cid:105) the normalized instantaneous eigenstates of H ( u ) corresponding to the k -th eigen-value λ k ( u ). For example, there are ζ ground states | ( (cid:96) ) ( u ) (cid:105) , (cid:96) = 1 , · · · , ζ , corresponding tothe smallest eigenvalue λ ( u ). Since our main analysis is targeted for the ground states, wefocus on | ( (cid:96) ) ( u ) (cid:105) , their amplitudes and relationships with other eigenstates.First we need to adjust the eigenstates to meet certain conditions in order to facilitateour analysis. From H ( u ) | j ( l ) ( u ) (cid:105) = λ j ( u ) | j ( l ) ( u ) (cid:105) , we take derivatives on both sides to obtain d H ( u ) du | j ( l ) ( u ) (cid:105) + H ( u ) d | j ( l ) ( u ) (cid:105) du = dλ j ( u ) du | j ( l ) ( u ) (cid:105) + λ j ( u ) d | j ( l ) ( u ) (cid:105) du , and thus for k (cid:54) = j , (cid:104) k ( (cid:96) ) ( u ) | d H ( u ) du | j ( l ) ( u ) (cid:105) + (cid:104) k ( (cid:96) ) ( u ) | H ( u ) d | j ( l ) ( u ) (cid:105) du = (cid:104) k ( (cid:96) ) ( u ) | dλ j ( u ) du | j ( l ) ( u ) (cid:105) + (cid:104) k ( (cid:96) ) ( u ) | λ j ( u ) d | j ( l ) ( u ) (cid:105) du . (6.11)For orthonormal | j ( l ) ( u ) (cid:105) and | k ( (cid:96) ) ( u ) (cid:105) , (cid:104) j ( l ) ( u ) (cid:105)| k ( (cid:96) ) ( u ) (cid:105) = 0 and (cid:104) k ( (cid:96) ) ( u ) | H ( u ) = λ k ( u ) (cid:104) k ( (cid:96) ) ( u ) | .Substitute these into (6.11) to yield (cid:104) k ( (cid:96) ) ( u ) | d H ( u ) du | j ( l ) ( u ) (cid:105) + λ k ( u ) (cid:104) k ( (cid:96) ) ( u ) | d | j ( l ) ( u ) (cid:105) du = dλ j ( u ) du (cid:104) k ( (cid:96) ) ( u ) | j ( l ) ( u ) (cid:105) + λ j ( u ) (cid:104) k ( (cid:96) ) ( u ) | d | j ( l ) ( u ) (cid:105) du = λ j ( u ) (cid:104) k ( (cid:96) ) ( u ) | d | j ( l ) ( u ) (cid:105) du , (cid:104) k ( (cid:96) ) ( u ) | ddu | j ( l ) ( u ) (cid:105) ≡ (cid:104) k ( (cid:96) ) ( u ) | d | j ( l ) ( u ) (cid:105) du = 1 λ j ( u ) − λ k ( u ) (cid:104) k ( (cid:96) ) ( u ) | d H ( u ) du | j ( l ) ( u ) (cid:105) , (6.12)for j (cid:54) = k . Let | ˇ j ( l ) ( u ) (cid:105) = exp { iη ( l ) j ( u ) }| j ( l ) ( u ) (cid:105) be a time-dependent phase shift of | j ( l ) ( u ) (cid:105) ,that is, we add an accent mark ˇ to mean a time-dependent shift for the eigenstates, where η ( l ) j ( u ) satisﬁes (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + i dη ( l ) j ( u ) du = 0 , which is possible since (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + (cid:18) (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) (cid:19) † = ddu (cid:104) j ( l ) ( u ) | j ( l ) ( u ) (cid:105) = 0 , and hence (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) is a pure imaginary number. Thus, (cid:104) ˇ j ( l ) ( u ) | ddu | ˇ j ( l ) ( u ) (cid:105) = e iη ( l ) j ( u ) (cid:104) ˇ j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + i dη ( l ) j ( u ) du (cid:104) ˇ j ( l ) ( u ) | ˇ j ( l ) ( u ) (cid:105) = e iη ( l ) j ( u ) e − iη ( l ) j ( u ) (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + i dη ( l ) j ( u ) du = (cid:104) j ( l ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + i dη ( l ) j ( u ) du = 0 . (6.13)Of course {| ˇ j ( l ) ( u ) (cid:105) , j = 0 , , · · · , } remains to be orthonormal, the pair ( λ j ( u ) , | ˇ j ( l ) ( u ) (cid:105) ) stillsatisﬁes H ( u ) | ˇ j ( l ) ( u ) (cid:105) = e iη ( l ) j ( u ) H ( u ) | j ( l ) ( u ) (cid:105) = e iη ( l ) j ( u ) λ j ( u ) | j ( l ) ( u ) (cid:105) = λ j ( u ) | ˇ j ( l ) ( u ) (cid:105) , and for j (cid:54) = k , (cid:104) ˇ k ( (cid:96) ) ( u ) | ddu | ˇ j ( l ) ( u ) (cid:105) = e iη ( l ) j ( u ) (cid:104) ˇ k ( (cid:96) ) ( u ) | ddu | j ( l ) ( u ) (cid:105) + i dη ( l ) j ( u ) du (cid:104) ˇ k ( (cid:96) ) ( u ) | ˇ j ( l ) ( u ) (cid:105) = e i [ η ( l ) j ( u ) − η ( (cid:96) ) k ( u )] (cid:104) k ( (cid:96) ) ( u ) | ddu | j ( l ) ( u ) (cid:105) = e i [ η ( l ) j ( u ) − η ( (cid:96) ) k ( u )] λ j ( u ) − λ k ( u ) (cid:104) k ( (cid:96) ) ( u ) | d H ( u ) du | j ( l ) ( u ) (cid:105) = 1 λ j ( u ) − λ k ( u ) (cid:104) ˇ k ( (cid:96) ) ( u ) | d H ( u ) du | ˇ j ( l ) ( u ) (cid:105) , (6.14)where the third equality is due to (6.12).Now with (6.13)-(6.14) satisﬁed by the instantaneous eigenstates ˇ j ( l ) ( u ) of H ( u ), we usethem to express the quantum state | ϕ ( u ) (cid:105) as follows, | ϕ ( u ) (cid:105) = (cid:88) j,l ≥ α ( l ) j ( u ) | ˇ j ( l ) ( u ) (cid:105) . (6.15)23lugging above expression into the Schr¨odinger equation (6.10) we obtain (cid:88) j,l ≥ i ddu (cid:104) α ( l ) j ( u ) | ˇ j ( l ) ( u ) (cid:105) (cid:105) = (cid:88) j,l ≥ t f H ( u ) (cid:104) α ( l ) j ( u ) | ˇ j ( l ) ( u ) (cid:105) (cid:105) , and simple manipulations lead to (cid:88) j,l ≥ i (cid:34) dα ( l ) j ( u ) du | ˇ j ( l ) ( u ) (cid:105) + α ( l ) j ( u ) ddu | ˇ j ( l ) ( u ) (cid:105) (cid:35) = (cid:88) j,l ≥ t f α ( l ) j ( u ) H ( u ) | ˇ j ( l ) ( u ) (cid:105) = (cid:88) j,l ≥ t f α ( l ) j ( u ) λ j ( u ) | ˇ j ( l ) ( u ) (cid:105) . Taking products with state (cid:104) ˇ k ( (cid:96) ) ( u ) | on both sides and noting the scalar nature of t f , α ( l ) j ( u )and λ j ( u ), we arrive at (cid:88) j,l ≥ i (cid:34) dα ( l ) j ( u ) du (cid:104) ˇ k ( (cid:96) ) ( u ) | ˇ j ( l ) ( u ) (cid:105) + α ( l ) j ( u ) (cid:104) ˇ k ( (cid:96) ) ( u ) | ddu | ˇ j ( l ) ( u ) (cid:105) (cid:35) = (cid:88) j,l ≥ t f λ j ( u ) α ( l ) j ( u ) (cid:104) ˇ k ( (cid:96) ) ( u ) | ˇ j ( l ) ( u ) (cid:105) , which can be simpliﬁed by (6.13) and the orthonormality of | ˇ j ( l ) ( u ) (cid:105) as dα ( (cid:96) ) k ( u ) du + (cid:88) l (cid:54) = (cid:96) α ( l ) k (cid:104) ˇ k ( (cid:96) ) ( u ) | ddu | ˇ k ( l ) ( u ) (cid:105) + (cid:88) j (cid:54) = k (cid:88) l α ( l ) j ( u ) (cid:104) ˇ k ( (cid:96) ) ( u ) | ddu | ˇ j ( l ) ( u ) (cid:105) = − it f λ k ( u ) α ( (cid:96) ) k ( u ) . (6.16)Use (6.14) to re-write (6.16) with k = 0 as a linear diﬀerential equation system for theamplitudes α ( (cid:96) )0 ( u ) of ζ ground states dα ( (cid:96) )0 ( u ) du = − it f λ ( u ) α ( (cid:96) )0 ( u ) − (cid:88) l (cid:54) = (cid:96) α ( l )0 ( u ) (cid:104) ˇ0 ( (cid:96) ) ( u ) | ddu | ˇ0 ( l ) ( u ) (cid:105)− (cid:88) j> (cid:88) l α ( l ) j ( u ) 1 λ j ( u ) − λ ( u ) (cid:104) ˇ0 ( (cid:96) ) ( u ) | d H ( u ) du | ˇ j ( l ) ( u ) (cid:105) , (6.17)where (cid:96) = 1 , · · · , ζ , the sum in the second term is l = 1 , . . . , (cid:96) − , (cid:96) + 1 , . . . , ζ for groundstates, and the sums in the third term are over for all excited states.The linear diﬀerential equation system (6.17) has solution( α (1)0 ( u ) , · · · , α ( ζ )0 ( u )) (cid:48) = U ( u )( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:48) + U ( u ) (cid:90) u [ U ( x )] − (cid:88) j> α ( l ) j ( x ) 1 λ j ( x ) − λ ( x ) (cid:104) ˇ ( x ) | d H ( x ) dx | ˇ j ( l ) ( x ) (cid:105) dx, (6.18)24here (cid:104) ˇ ( x ) | = ( (cid:104) ˇ0 (1)( x ) | , · · · , (cid:104) ˇ0 ( ζ ) ( x ) | ) (cid:48) , and U is a fundamental matrix for the homogeneouslinear diﬀerential equation system corresponding to (6.17) with initial condition U (0) = I ,that is, the columns of U form a complete linearly independent set of solutions for thehomogeneous equation system, dα ( (cid:96) )0 ( u ) du = − it f λ ( u ) α ( (cid:96) )0 ( u ) − (cid:88) l (cid:54) = (cid:96) (cid:104) ˇ0 ( (cid:96) ) ( u ) | ddu | ˇ0 ( l ) ( u ) (cid:105) α ( l )0 ( u ) , or in a vector-matrix form d ( α (1)0 ( u ) , · · · , α ( ζ )0 ( u )) (cid:48) du = D ( u )( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:48) , where D ( u ) = − it f λ ( u ) I − i A ( u ) is a matrix of size ζ , where A ( u ) = ( A (cid:96)l ( u )) and A (cid:96)l ( u ) = − i (cid:104) ˇ0 ( (cid:96) ) ( u ) | ddu | ˇ0 ( l ) ( u ) (cid:105) for l (cid:54) = (cid:96) and 0 for l = (cid:96) . Since iA (cid:96)l ( u ) + [ iA l(cid:96) ( u )] ∗ = (cid:104) ( (cid:96) ) ( u ) | ddu | ( l ) ( u ) (cid:105) + (cid:18) (cid:104) ( l ) ( u ) | ddu | ( (cid:96) ) ( u ) (cid:105) (cid:19) ∗ = ddu (cid:104) ( (cid:96) ) ( u ) | ( l ) ( u ) (cid:105) = 0 , A ( u ) is a Hermitian matrix. Matrix U has an expression through the so called Magnusexpansion Blanes et al. (2009), U ( u ) = exp {− it f λ ( u ) I − Ξ ( u ) } , Ξ ( u ) = ∞ (cid:88) k =1 i k Ξ k ( u ) , where Ξ k ( u ) in the Magnus expansion can be computed by a recursive procedure throughthe matrices Υ ( j ) k as follows, Ξ ( u ) = (cid:90) u A ( v ) dv, Ξ k ( u ) = k − (cid:88) l =1 B j j (cid:90) u Υ ( l ) k ( v ) dv, k ≥ ,B j are Bernoulli numbers, Υ (1) k = [ Ξ k − , A ] , Υ ( k − k = ad ( k − Ξ ( A ) , Υ ( j ) k = k − j (cid:88) l =1 [ Ξ l , Υ ( j − k − l ] , j = 2 , . . . , k − ,ad Ξ ( A ) = A , ad k +1 Ξ ( A ) = [ Ξ , ad k Ξ ( A )] , [ A , B ] = AB − BA is the matrix commutator of A and B , ad k Ξ is a shorthand for an iteratedcommutator, and (cid:107) Ξ k ( u ) (cid:107) ≤ π (cid:18) π (cid:90) u (cid:107) A ( v ) (cid:107) dv (cid:19) k . (cid:107) Ξ ( u ) (cid:107) ≤ ∞ (cid:88) k =1 (cid:107) Ξ k ( u ) (cid:107) ≤ π ∞ (cid:88) k =1 (cid:18) π (cid:90) u (cid:107) A ( v ) (cid:107) dv (cid:19) k ≤ (cid:90) u (cid:107) A ( v ) (cid:107) dv (cid:18) − π (cid:90) u (cid:107) A ( v ) (cid:107) dv (cid:19) − ≤ (cid:90) (cid:107) A ( v ) (cid:107) dv (cid:18) − π (cid:90) (cid:107) A ( v ) (cid:107) dv (cid:19) − = ξ,e − ξ ≤ exp( −(cid:107) Ξ ( u ) (cid:107) ) ≤ (cid:107) U ( u ) (cid:107) ≤ exp( (cid:107) Ξ ( u ) (cid:107) ) ≤ e ξ . (6.19)As the system initializes at the ground states, (cid:107) ( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:107) = 1. From (6.18) and(6.19) we ﬁnd (cid:107) ( α (1)0 (1) , · · · , α ( ζ )0 (1)) (cid:107) ≥ (cid:107) U (1)( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:107) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) U (1) (cid:90) [ U ( x )] − (cid:88) j> (cid:88) l α ( l ) j ( x ) 1 λ j ( x ) − λ ( x ) (cid:104) ˇ ( x ) | d H ( x ) dx | ˇ j ( l ) ( x ) (cid:105) dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , (6.20) (cid:107) U (1)( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:107) ≥ e − ξ (cid:107) ( α (1)0 (0) , · · · , α ( ζ )0 (0)) (cid:107) = e − ξ , (6.21)and (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) U ( u ) (cid:90) u [ U ( x )] − (cid:88) j> (cid:88) l α ( l ) j ( x ) 1 λ j ( x ) − λ ( x ) (cid:104) ˇ ( x ) | d H ( x ) dx | ˇ j ( l ) ( x ) (cid:105) dx (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) U ( u )[ U ( x ∗ )] − (cid:88) j> (cid:88) l α ( l ) j ( x ∗ ) 1 λ j ( x ∗ ) − λ ( x ∗ ) (cid:104) ˇ ( x ∗ ) | d H ( x ∗ ) dx | ˇ j ( l ) ( x ∗ ) (cid:105) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13) U ( u )[ U ( x ∗ )] − (cid:13)(cid:13) ζ (cid:88) (cid:96) =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:88) j> (cid:88) l α ( l ) j ( x ∗ ) 1 λ j ( x ∗ ) − λ ( x ∗ ) (cid:104) ˇ0 ( (cid:96) ) ( x ∗ ) | d H ( x ∗ ) dx | ˇ j ( l ) ( x ∗ ) (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:107) U ( u ) (cid:107)(cid:107) U − ( x ∗ ) (cid:107) ζ (cid:88) (cid:96) =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:88) j> (cid:88) l α ( l ) j ( x ∗ ) 1 λ j ( x ∗ ) − λ ( x ∗ ) (cid:104) ˇ0 ( (cid:96) ) ( x ∗ ) | d H ( x ∗ ) dx | ˇ j ( l ) ( x ∗ ) (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ e ξ ζ (cid:88) (cid:96) =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:88) j> (cid:88) l α ( l ) j ( x ∗ ) 1 λ j ( x ∗ ) − λ ( x ∗ ) (cid:104) ˇ0 ( (cid:96) ) ( x ∗ ) | d H ( x ∗ ) dx | ˇ j ( l ) ( x ∗ ) (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Π e ξ ζ (cid:34)(cid:88) j> (cid:88) l | α ( l ) j ( x ∗ ) | (cid:35) ≤ Π e ξ ζ b (cid:88) j> (cid:88) l (cid:12)(cid:12)(cid:12) α ( l ) j ( x ∗ ) (cid:12)(cid:12)(cid:12) ≤ Π e ξ ζ b (1 − p ) , (6.22)where the ﬁrst equality is from the mean value theorem and x ∗ ∈ (0 , p = min ≤ x ≤ ζ (cid:88) (cid:96) =1 | α ( (cid:96) )0 ( x ) | , (cid:88) j,l | α ( l ) j ( x ) | = 1 . The probability of the quantum annealing process staying in the ground states at the ﬁnalannealing time is equal to the sum of these modulus squares given by the left hand side of(6.20), and thus (6.20)-(6.22) together imply that the probability has the bound stated inthe theorem. For ζ = 1, A is a scalar equal to zero, and hence ξ = 0. By Corollary 5.4 Morita and Nishimori (2008), (cid:88) s | q M,β ( s , t ) − q M,β ( s ) | → . We have q M,β ( s ) = M (cid:88) k =2 (cid:88) s k Z M,β exp β M M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = 1 Z M,β exp β M (cid:88) (cid:104) ij (cid:105) J ij s i, s j, M (cid:88) k =2 (cid:88) s k exp β M M (cid:88) k =2 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = 1 Z M,β exp β M (cid:88) (cid:104) ij (cid:105) J ij s i, s j, C ∗ , where C ∗ = (cid:80) Mk =2 (cid:80) s k exp (cid:16) β M (cid:80) Mk =2 (cid:80) (cid:104) ij (cid:105) J ij s i,k s j,k (cid:17) . Since (cid:88) s q M,β ( s ) = 1 , we have (cid:88) s exp β M (cid:88) (cid:104) ij (cid:105) J ij s i, s j, = Z M,β C ∗ , which implies q M,β ( s ) = 1 Z M,β exp βM (cid:88) (cid:104) ij (cid:105) J ij s i, s j, , Z M,β = (cid:88) s exp βM (cid:88) (cid:104) ij (cid:105) J ij s i, s j, = Z ,β/M . Therefore, q M,β ( s , t ) and q ,β/M ( s , t ) are asymptotically the same as t → ∞ .27 .4 Proof of Theorem 5 Let X M = 1 √ M M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k where s is generated by q M,β ( s ). Simple algebra shows | P ( X M ( t ) ≤ x ) − P ( X ∞ ≤ x ) |≤ | P ( X M ≤ x ) − P ( X ∞ ≤ x ) | + | P ( X M ( t ) ≤ x ) − P ( X M ≤ x ) | = ( I ) + ( II ) . Consider ( I ). We have for any given θ ∈ R and S = {− , } bM , E [exp( θX M )] = (cid:88) s ∈ S Z M,β exp ( β + √ M θ ) 1 M M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = Z M,β + √ Mθ Z M,β → exp θ (cid:88) (cid:104) ij (cid:105) J ij as M → ∞ , where the last line is due to (6.23) below. We have Z M,β + √ Mθ = (cid:88) s ∈ S exp β + √ M θM M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = M (cid:89) k =1 (cid:88) s k exp β + √ M θM (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k = M (cid:89) k =1 Z β + √ MθM = (cid:16) Z β + √ MθM (cid:17) M = T r exp β + √ M θM (cid:88) (cid:104) ij (cid:105) J ij σ zi σ zj M = T r I + β + √ M θM (cid:88) (cid:104) ij (cid:105) J ij σ zi σ zj + M θ M (cid:88) (cid:104) ij (cid:105) J ij σ zi σ zj + o ( M − ) M = b + 2 b θ M (cid:88) (cid:104) ij (cid:105) J ij + o ( M − ) M , T r ( σ zi σ zj ) = 0 and T r ( σ zi σ zj · σ zi (cid:48) σ zj (cid:48) ) = 0 for i (cid:54) = i (cid:48) or j (cid:54) = j (cid:48) . Similarly, Z M,β = (cid:2) b + O ( M − ) (cid:3) M . Thus Z M,β + √ Mθ Z M,β = (cid:104) θ M (cid:80) (cid:104) ij (cid:105) J ij + o ( M − ) (cid:105) M [1 + O ( M − )] M → exp θ (cid:88) (cid:104) ij (cid:105) J ij as M → ∞ . (6.23)By Levy continuity Theorem for the moment generating function (Theorem 7.5 in Kapadiaet al. (2005)), we have X M d → X ∞ , where X ∞ follows the normal distribution with mean zero and variance (cid:80) (cid:104) ij (cid:105) J ij . Thus thereis M ( · ) such that for any M ≥ M ( (cid:15) ), | P ( X M ≤ x ) − P ( X ∞ ≤ x ) | ≤ (cid:15)/ . (6.24)Consider ( II ). Now we ﬁx M ≥ M ( (cid:15) ). By Corollary 5.4 in Morita and Nishimori (2008),there is t ( (cid:15), M ) such that for all t ≥ t ( (cid:15), M ) and all x , | P ( X M ( t ) ≤ x ) − P ( X M ≤ x ) | ≤ (cid:15)/ . (6.25)By (6.24) and (6.25), the statement is proved. Let H D = (cid:88) (cid:104) ij (cid:105) J ij σ zi σ zj and H N = Γ( t ) (cid:88) i σ xi . We have Z β ( t ) = T r (exp ( β [ H D + H N ]))= T r (cid:32) M (cid:89) k =1 exp (cid:18) βM [ H D + H N ] (cid:19)(cid:33) = T r M (cid:89) k =1 exp (cid:18) βM [ H D + H N ] (cid:19) b (cid:88) i k =1 e i k e Ti k = T r (cid:32) b (cid:88) i M =1 · · · b (cid:88) i =1 exp (cid:18) βM [ H D + H N ] (cid:19) e i e Ti exp (cid:18) βM [ H D + H N ] (cid:19) e i e Ti · · · e Ti M − exp (cid:18) βM [ H D + H N ] (cid:19) e i M e Ti M (cid:33) = b (cid:88) i M =1 · · · b (cid:88) i =1 e Ti M exp (cid:18) βM [ H D + H N ] (cid:19) e i e Ti exp (cid:18) βM [ H D + H N ] (cid:19) × e i e Ti · · · e Ti M − exp (cid:18) βM [ H D + H N ] (cid:19) e i M , (6.26)where e k , k = 1 , . . . , b , are the canonical basis. Let e k = e k, ⊗ e k, ⊗ · · · ⊗ e k,b , where e k,i is one of (1 , T and (0 , T . Let s k,i = 1 if e k,i = (1 , T and s k,i = − e k,i = (0 , T . Forany l, k ∈ { , . . . , b } , e Tl exp (cid:18) βM [ H D + H N ] (cid:19) e k = e Tl exp (cid:18) βM H D (cid:19) exp (cid:18) βM H N (cid:19) e k + β M e Tl ( H N H D − H D H N ) e k + O (cid:0) ( β/M ) (cid:1) = exp (cid:18) βM e Tl H D e l (cid:19) e Tl exp (cid:18) βM H N (cid:19) e k + O (cid:0) ( β/M ) (cid:1) = (cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) b exp βM (cid:88) (cid:104) ij (cid:105) J ij s l,i s l,j + 12 log (cid:18) coth β Γ( t ) M (cid:19) b (cid:88) i =1 s l,i s k,i + O (cid:0) ( β/M ) (cid:1) , (6.27)where the second equality is derived using the Taylor expansion, and the last equality is dueto (6.28) and (6.29). We have e Tl H D e l = (cid:88) (cid:104) ij (cid:105) J ij ( e Tl,i σ z e l,i )( e Tl,j σ z e l,j ) = (cid:88) (cid:104) ij (cid:105) J ij s l,i s l,j . (6.28)Since σ xi , i = 1 , . . . , b, commute each other, we have e Tl exp (cid:18) βM H N (cid:19) e k = e Tl b (cid:89) k =1 exp (cid:18) β Γ( t ) M σ xk (cid:19) e k = e Tl (cid:20) exp (cid:18) β Γ( t ) M σ x (cid:19) ⊗ · · · ⊗ exp (cid:18) β Γ( t ) M σ x (cid:19)(cid:21) e k = (cid:20) e Tl, exp (cid:18) β Γ( t ) M σ x (cid:19) e k, ⊗ · · · ⊗ e Tl,b exp (cid:18) β Γ( t ) M σ x (cid:19) e k,b (cid:21) = (cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) b b (cid:89) i =1 exp (cid:18)

12 log (cid:18) coth β Γ( t ) M (cid:19) s l,i s k,i (cid:19) = (cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) b exp (cid:32)

12 log (cid:18) coth β Γ( t ) M (cid:19) b (cid:88) i =1 s l,i s k,i (cid:33) (6.29)30here the second equality is derived by e A ⊗ e B = e A ⊕ B and ( A ⊗ B )( C ⊗ D ) = ( AC ) ⊗ ( BD ),and the fourth equality is due to (6.30) below. Note: please check the reasoning for the second equality

We haveexp (cid:18) β Γ( t ) M σ x (cid:19) = (cid:32) cosh β Γ( t ) M sinh β Γ( t ) M sinh β Γ( t ) M cosh β Γ( t ) M (cid:33) = (cid:114)

12 sinh 2 β Γ( t ) M exp (cid:16) log (cid:16) coth β Γ( t ) M (cid:17)(cid:17) exp (cid:16) − log (cid:16) coth β Γ( t ) M (cid:17)(cid:17) exp (cid:16) − log (cid:16) coth β Γ( t ) M (cid:17)(cid:17) exp (cid:16) log (cid:16) coth β Γ( t ) M (cid:17)(cid:17) . (6.30)By (6.26) and (6.27), we have Z β ( t )= (cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) bM (cid:88) s M · · · (cid:88) s exp βM M (cid:88) k =1 (cid:88) (cid:104) ij (cid:105) J ij s i,k s j,k + γ β ( t ) M (cid:88) k =1 b (cid:88) i =1 s i,k s i,k +1 + O ( β /M )= (cid:32)(cid:114)

12 sinh 2 β Γ( t ) M (cid:33) bM Z M,β ( t ) + O ( β /M ) . Acknowledgements:

The research of Xinyu Song was supported by the Fundamen-tal Research Funds for the Central Universities (2018110128), China Scholarship Coucil(201806485017), and National Natural Science Foundation of China (Grant No. 11871323).The research of Yazhen Wang was supported in part by NSF grants DMS-1528375 andDMS-1707605. The research of Donggyu Kim was supported in part by KAIST Settle-ment/Research Subsidies for Newly-hired Faculty grant G04170049 and KAIST Basic Re-search Funds by Faculty (A0601003029). 31 eferences

Aharonov, D. and Ta-Shma, A. (2003). Adiabatic quantum state generation and statisticalzero knowledge. In

Proceedings of the thirty-ﬁfth annual ACM symposium on Theory ofcomputing , pages 20–29. ACM.Aharonov, D., Van Dam, W., Kempe, J., Landau, Z., Lloyd, S., and Regev, O. (2008).Adiabatic quantum computation is equivalent to standard quantum computation.

SIAMreview , 50(4):755–787.Albash, T. and Lidar, D. A. (2016). Adiabatic quantum computing. arXiv preprintarXiv:1611.04471 .Albash, T., Rønnow, T. F., Troyer, M., and Lidar, D. A. (2015). Reexamining classical andquantum models for the d-wave one processor.

The European Physical Journal SpecialTopics , 224(1):111–129.Bertsimas, D. and Tsitsiklis, J. (1993). Simulated annealing.

Statistical science , 8(1):10–15.Bian, Z., Chudak, F., Macready, W. G., Clark, L., and Gaitan, F. (2013). Experimentaldetermination of ramsey numbers.

Physical review letters , 111(13):130505.Blanes, S., Casas, F., Oteo, J., and Ros, J. (2009). The magnus expansion and some of itsapplications.

Physics reports , 470(5-6):151–238.Boixo, S., Isakov, S. V., Smelyanskiy, V. N., Babbush, R., Ding, N., Jiang, Z., Bremner,M. J., Martinis, J. M., and Neven, H. (2018). Characterizing quantum supremacy innear-term devices.

Nature Physics , 14(6):595.Boixo, S., Smelyanskiy, V. N., Shabani, A., Isakov, S. V., Dykman, M., Denchev, V. S.,Amin, M., Smirnov, A., Mohseni, M., and Neven, H. (2014). Computational role ofcollective tunneling in a quantum annealer. arXiv preprint arXiv:1411.4036 .Boixo, S., Smelyanskiy, V. N., Shabani, A., Isakov, S. V., Dykman, M., Denchev, V. S., Amin,M. H., Smirnov, A. Y., Mohseni, M., and Neven, H. (2016). Computational multiqubittunnelling in programmable quantum annealers.

Nature communications , 7:10327.Born, M. and Fock, V. (1928). Beweis des adiabatensatzes.

Zeitschrift f¨ur Physik , 51(3-4):165–180.Brady, L. T. and van Dam, W. (2016). Quantum Monte Carlo simulations of tunneling inquantum adiabatic optimization.

Physical Review A , 93(3):032304.Britton, J. W., Sawyer, B. C., Keith, A. C., Wang, C.-C. J., Freericks, J. K., Uys, H.,Biercuk, M. J., and Bollinger, J. J. (2012). Engineered two-dimensional ising interactionsin a trapped-ion quantum simulator with hundreds of spins.

Nature , 484(7395):489.32rooke, J., Bitko, D., and Aeppli, G. (1999). Quantum annealing of a disordered magnet.

Science , 284(5415):779–781.Browne, D. (2014). Quantum computation: Model versus machine.

Nature Physics ,10(3):179.Brumﬁel, G. (2012). Simulation: Quantum leaps.

Nature News , 491(7424):322.Cai, T., Kim, D., Wang, Y., Yuan, M., and Zhou, H. H. (2016). Optimal large-scale quantumstate tomography with pauli measurements.

The Annals of Statistics , 44(2):682–712.Deutsch, D. (1985). Quantum theory, the church–turing principle and the universal quantumcomputer.

Proc. R. Soc. Lond. A , 400(1818):97–117.DiVincenzo, D. P. (1995). Quantum computation.

Science , 270(5234):255–261.Farhi, E., Goldstone, J., and Gutmann, S. (2002). Quantum adiabatic evolution algorithmsversus simulated annealing. arXiv preprint quant-ph/0201031 .Farhi, E., Goldstone, J., Gutmann, S., Lapan, J., Lundgren, A., and Preda, D. (2001). Aquantum adiabatic evolution algorithm applied to random instances of an np-completeproblem.

Science , 292(5516):472–475.Farhi, E., Goldstone, J., Gutmann, S., and Sipser, M. (2000). Quantum computation byadiabatic evolution. arXiv preprint quant-ph/0001106 .Geman, S. and Geman, D. (1984). Stochastic relaxation, gibbs distributions, and thebayesian restoration of images.

IEEE Transactions on pattern analysis and machine in-telligence , (6):721–741.Hajek, B. (1988). Cooling schedules for optimal annealing.

Mathematics of operationsresearch , 13(2):311–329.Holevo, A. S. (2011).

Probabilistic and statistical aspects of quantum theory , volume 1.Springer Science & Business Media.Irb¨ack, A., Peterson, C., and Potthast, F. (1996). Evidence for nonrandom hydrophobicitystructures in protein chains. proceedings of the National Academy of Sciences , 93(18):9533–9538.Isakov, S. V., Mazzola, G., Smelyanskiy, V. N., Jiang, Z., Boixo, S., Neven, H., and Troyer,M. (2016). Understanding quantum tunneling through quantum Monte Carlo simulations.

Physical review letters , 117(18):180402.Johnson, M. W., Amin, M. H., Gildert, S., Lanting, T., Hamze, F., Dickson, N., Harris, R.,Berkley, A. J., Johansson, J., and Bunyk, P. (2011). Quantum annealing with manufac-tured spins.

Nature , 473(7346):194. 33¨org, T., Krzakala, F., Kurchan, J., and Maggs, A. C. (2010). Quantum annealing of hardproblems.

Progress of Theoretical Physics Supplement , 184:290–303.Kadowaki, T. and Nishimori, H. (1998). Quantum annealing in the transverse ising model.

Physical Review E , 58(5):5355.Kato, T. (1978). Trotter’s product formula for an arbitrary pair of self-adjoint contractionsemigroup.

Topics in Func. Anal., Adv. Math. Suppl. Studies , 3:185–195.Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated anneal-ing.

Science , 220(4598):671–680.Majewski, J., Li, H., and Ott, J. (2001). The ising model in physics and statistical genetics.

The American Journal of Human Genetics , 69(4):853–862.Martoˇn´ak, R., Santoro, G. E., and Tosatti, E. (2002). Quantum annealing by the path-integral Monte Carlo method: The two-dimensional random ising model.

Physical ReviewB , 66(9):094203.McGeoch, C. C. (2014). Adiabatic quantum computation and quantum annealing: Theoryand practice.

Synthesis Lectures on Quantum Computing , 5(2):1–93.Morita, S. and Nishimori, H. (2008). Mathematical foundation of quantum annealing.

Jour-nal of Mathematical Physics , 49(12):125210.Nielsen, M. A. and Chuang, I. L. (2000).

Quantum computation and quantum information .Cambridge Univ. Press.O’Gorman, B., Babbush, R., Perdomo-Ortiz, A., Aspuru-Guzik, A., and Smelyanskiy, V.(2015). Bayesian network structure learning using quantum annealing.

The EuropeanPhysical Journal Special Topics , 224(1):163–188.Perdomo-Ortiz, A., Dickson, N., Drew-Brook, M., Rose, G., and Aspuru-Guzik, A. (2012).Finding low-energy conformations of lattice protein models by quantum annealing.

Sci-entiﬁc reports , 2:571.Perdomo-Ortiz, A., Fluegemann, J., Narasimhan, S., Biswas, R., and Smelyanskiy, V. N.(2015). A quantum annealing approach for fault detection and diagnosis of graph-basedsystems.

The European Physical Journal Special Topics , 224(1):131–148.Rieﬀel, E. G., Venturelli, D., O’Gorman, B., Do, M. B., Prystay, E. M., and Smelyanskiy,V. N. (2015). A case study in programming a quantum annealer for hard operationalplanning problems.

Quantum Information Processing , 14(1):1–36.Rieger, H. and Kawashima, N. (1999). Application of a continuous time cluster algorithm tothe two-dimensional random quantum ising ferromagnet.

The European Physical JournalB-Condensed Matter and Complex Systems , 9(2):233–236.34ønnow, T. F., Wang, Z., Job, J., Boixo, S., Isakov, S. V., Wecker, D., Martinis, J. M.,Lidar, D. A., and Troyer, M. (2014). Deﬁning and detecting quantum speedup.

Science ,345(6195):420–424.Sakurai, J. and Napolitano, J. (2017). Modern quantum mechanics.

Modern QuantumMechanics, by JJ Sakurai, Jim Napolitano, Cambridge, UK: Cambridge University Press,2017 .Shankar, R. (2012).

Principles of quantum mechanics . Springer Science & Business Media.Stauﬀer, D. (2008). Social applications of two-dimensional ising models.

American Journalof Physics , 76(4):470–473.Suzuki, M. (1976). Generalized trotter’s formula and systematic approximants of exponentialoperators and inner derivations with applications to many-body problems.

Communica-tions in Mathematical Physics , 51(2):183–190.Trotter, H. F. (1959). On the product of semi-groups of operators.

Proceedings of theAmerican Mathematical Society , 10(4):545–551.Wang, Y. (2012). Quantum computation and quantum information.

Statistical Science ,27(3):373–394.Wang, Y. (2013). Asymptotic equivalence of quantum state tomography and noisy matrixcompletion.

The Annals of Statistics , 41(5):2462–2504.Wang, Y., Wu, S., and Zou, J. (2016). Quantum annealing with Markov chain Monte Carlosimulations and D-Wave quantum computers.

Statistical Science , 31(3):362–398.Wang, Y. and Xu, C. (2015). Density matrix estimation in quantum homodyne tomography.