[PDF] Optimization of the Variational Quantum Eigensolver for Quantum Chemistry Applications

Abstract

This work studies the variational quantum eigensolver algorithm, designed to determine the ground state of a quantum mechanical system by combining classical and quantum hardware. Methods of reducing the number of required qubit manipulations, prone to induce errors, for the variational quantum eigensolver are studied. We formally justify the qubit removal process as sketched by Bravyi, Gambetta, Mezzacapo and Temme [arXiv:1701.08213 (2017)]. Furthermore, different classical optimization and entangling methods, both gate based and native, are surveyed by computing ground state energies of H 2 and LiH. This paper aims to provide performance-based recommendations for entangling methods and classical optimization methods. Analyzing the VQE problem is complex, where the optimization algorithm, the method of entangling, and the dimensionality of the search space all interact. In specific cases however, concrete results can be shown, and an entangling method or optimization algorithm can be recommended over others. In particular we find that for high dimensionality (many qubits and/or entanglement depth) certain classical optimization algorithms outperform others in terms of energy error.

Full PDF

OOptimization of the Variational Quantum Eigensolver for Quantum ChemistryApplications

R.J.P.T. de Keijzer, V.E. Colussi,

1, 2

B. ˘Skori´c, and S.J.J.M.F. Kokkelmans Eindhoven University of Technology, P. O. Box 513, 5600 MB Eindhoven, The Netherlands ∗ INO-CNR BEC Center and Dipartimento di Fisica, Universit`a di Trento, 38123 Povo, Italy (Dated: February 4, 2021)This work studies the variational quantum eigensolver algorithm, designed to determine theground state of a quantum mechanical system by combining classical and quantum hardware. Meth-ods of reducing the number of required qubit manipulations, prone to induce errors, for the varia-tional quantum eigensolver are studied. We formally justify the qubit removal process as sketched byBravyi, Gambetta, Mezzacapo and Temme [arXiv:1701.08213 (2017)]. Furthermore, diﬀerent clas-sical optimization and entangling methods, both gate based and native, are surveyed by computingground state energies of H and LiH. This paper aims to provide performance-based recommenda-tions for entangling methods and classical optimization methods. Analyzing the VQE problem iscomplex, where the optimization algorithm, the method of entangling, and the dimensionality ofthe search space all interact. In speciﬁc cases however, concrete results can be shown, and an en-tangling method or optimization algorithm can be recommended over others. In particular we ﬁndthat for high dimensionality (many qubits and/or entanglement depth) certain classical optimizationalgorithms outperform others in terms of energy error. I. INTRODUCTION

Presently, quantum computing is in the noisyintermediate-scale (NISQ) era [1], where the avail-able universal quantum computers cannot outperformtheir classical counterparts except for a few, speciﬁc,well-designed cases [2]. However, even in the NISQera quantum computers can be used in practice. Onesuch application is the variational quantum eigensolver(VQE) algorithm, ﬁrst proposed in a paper by Peruzzoand McClean in 2014 [3]. The algorithm is hybrid,e.g. at certain points a quantum processor unit (QPU)is addressed. The VQE algorithm has shown proof ofconcept for small molecules with several diﬀerent designsof qubits [4–8]. The aim of the algorithm is to determinethe lowest eigenvalue of a quantum Hamiltonian, whichcan be equated to ﬁnding the ground state energy of amolecule. The algorithm has also been implemented tosolve NP-hard problems, such as the travelling salesmanproblem [9], in polynomial time.To illustrate how the VQE algorithm works, aschematic of the relevant circuit diagram is shown inFig. 1. To determine the ground state energy of a Hamil-tonian the VQE algorithm maps the fermionic degrees offreedom of molecules onto a set of qubits. These qubitscan then be manipulated into a trial state with a certainenergy by applying a sequence of entanglement opera-tors U ent and individual rotations U i,j . The length ofthis sequence is the depth d which plays an importantrole in determining which part of the state space canbe addressed by the VQE. The entanglement operator U ent is a multi-qubit gate that ensures entangled states ∗ Corresponding author: [email protected] can be reached and thus its choice is critical in exploit-ing the quantum nature of the method. This operator isnecessary for utilizing the computational powers of themulti-qubit structure. By applying a problem dependentHamiltonian to the trial state, its energy can be mea-sured using a QPU. Based on these measurements theclassical part of the computer proposes a new trial state(new rotations) by means of a classical optimization algo-rithm [10]. Because of the limited hardware of quantumcomputers, each qubit manipulation has a non-negligibleprobability of error. Therefore, it is important to mini-mize the number of necessary manipulations. { |0000 U U U U U U U U U ent H d q q q q ClassicalOptimizationAlgorithm

Figure 1: Schematic circuit diagram of the VQE algo-rithm. The qubits are manipulated by a sequence of en-tanglement operators U ent and individual rotations U i,j into a prepared trial state.This paper examines three ways of achieving thisminimization: (i) by reducing the number of qubits usedto describe the problem as proposed by Bravyi et al.[11], (ii) by reducing the number of qubit manipulationsusing entanglement methods native to the physicalsystem, and (iii) by optimizing the classical searchalgorithm used to converge on the ground state. a r X i v : . [ qu a n t - ph ] F e b The layout of this paper is as follows. In Secs. II and IIIthe VQE algorithm and the method of simulation are de-scribed. Sec. IV gives a constructive proof of the schemeby Bravyi et al. [11] Secs. V and VI respectively describemulti-qubit entangling methods and classical optimiza-tion algorithms. Finally, in Sec. VII we compare andcontrast the methods and their results on small moleculeVQE problems.

II. QUANTUM MEASUREMENTS ANDSIMULATION SETUP

The VQE algorithm exploits the variational princi-ple and aims to ﬁnd a set of parameters minimizingthe energy of a quantum system. The states of a sys-tem are mapped to qubits and the classical optimiza-tion algorithm optimizes the variational calculation. Todo so for Hamiltonians of small molecules requires acharacterization of their electron spin-orbitals. For thisthe Hartree-Fock method is applied, which reformulatesthe Hamiltonian in ﬁrst-quantization form [12]. TheBorn-Oppenheimer approximation is applied which en-sures that the nuclear contribution to the Hamiltonianis constant [12]. The Hamiltonian H under the Born-Oppenheimer approximation becomes H = − (cid:88) i ∇ (cid:126)r i − (cid:88) i,j Z i | (cid:126)R i − (cid:126)r j | + (cid:88) i,j>i | (cid:126)r i − (cid:126)r j | + (cid:88) i,j>i Z i Z j | (cid:126)R i − (cid:126)R j | , (1)where, (cid:126)R i and Z i are the position and charge of nucleus i . (cid:126)r i is the position of electron i . In VQE, the Hamilto-nian is described in Hartree units [13]. The Hamiltonianis reformulated in second-quantization form [14] H , byprojecting it onto a ﬁnite set of orthogonal spin-orbitalmodes { φ } ni =1 , as H = V nn + (cid:88) p,q h pq a † p a q + 12 (cid:88) p,q,r,s h pqrs a † p a † q a r a s , (2)where the nuclei-nuclei interaction potential V nn = (cid:80) i,j>i Z i Z j / | (cid:126)R i − (cid:126)R j | is constant and the annihilationand creation operators a † i and a i work on a set of or-thogonal spin-orbital modes { φ } ni =1 determined by thelinear combination of atomic orbitals (LCAO) methodand which induce the fermionic algebra [15]. The num-ber of considered orbitals determines the complexity ofthe problem and with that the number of required qubits.For this paper the 1 s orbital for hydrogen H and the 1 s ,2 s and 2 p x orbitals for lithium-hydrogen LiH (aligned onthe x -axis) are low enough in energy to be consideredin the LCAO construction. All higher energy atomic or-bitals are ignored. The molecular integrals h pq and h pqrs are referred to as the one-electron and two-electron inte-grals respectively h pq = (cid:90) S ∗ p ( ω ) S q ( ω ) dω × (cid:90) R φ ∗ p ( (cid:126)r ) (cid:32) ∇ (cid:126)r − (cid:88) i Z i | (cid:126)R i − (cid:126)r | (cid:33) φ q ( (cid:126)r ) d(cid:126)r, (3) h pqrs = (cid:90) (cid:90) S ∗ p ( ω ) S s ( ω ) S ∗ q ( ω ) S r ( ω ) dω dω × (cid:90) R (cid:90) R φ ∗ p ( (cid:126)r ) φ ∗ q ( (cid:126)r ) φ r ( (cid:126)r ) φ s ( (cid:126)r ) | (cid:126)r − (cid:126)r | d (cid:126)r d (cid:126)r , (4)where S i ( ω ) is the spin part of the spin orbitals [16].Some important concepts in quantum computing areintroduced below. These are required to understand theVQE method and are used in proofs that follow. Deﬁnition 1.

Pauli Matrices I := (cid:18) (cid:19) , X := (cid:18) (cid:19) , Y := (cid:18) − ii (cid:19) , Z := (cid:18) − (cid:19) . (5) Deﬁnition 2. m -fold Single Qubit Operator σ ρj := I ⊗ ( j − ⊗ ρ ⊗ I ⊗ ( m − j ) , j ∈ , ..., m, ρ ∈ { X, Y, Z } . (6)These m -fold single qubit operator work on a m -qubitstate but only change the state of one of the qubits. Deﬁnition 3. m -fold Pauli operator space P m P m := {± , ± i } × { I, X, Y, Z } ⊗ m . (7) P m forms a group under operator multiplication sincethe Pauli operators P form a group. Note that since thePauli matrices are a complete set for the Hilbert spaceof complex 2 × P m is a complete set for theHilbert space of complex 2 m × m matricesThe Jordan-Wigner transformation can be used to mapthe creation and annihilation operators of Eq. (2) to com-binations of Pauli matrices [17]. The transformation isgiven by a j = I ⊗ j − ⊗ σ + ⊗ Z ⊗ m − j , (8) a † j = I ⊗ j − ⊗ σ − ⊗ Z ⊗ m − j , (9)where m is the number of qubits. The σ ± operators aredeﬁned as σ ± := X ± iY . (10)It can be shown that the transformed operators a j and a † j obey the canonical commutation relations forfermions. Since this transformation is local the electronconﬁguration can be determined from the qubit state.Since the number of required qubits is a key factor deter-mining the required eﬀort for calculation on both physicalsystems and in simulations it is desirable to reduce thenumber of qubits where possible. This can be done byprojecting only on those states describing the right num-ber of electrons (e.g. for LiH the state | (cid:105) woulddescribe three electrons while only two electrons, in theouter shells, are analyzed for the problem). For this pur-pose a projector operator Π melecN on the states describing N electrons is deﬁned asΠ melecN := (cid:89) j (cid:54) = Nj =0 ,...,m N state − jN − j , (11)where N state is the operator returning the state it oper-ates on multiplied by the number of electrons describedby the state. A similar projection can be done on spin upand down of states. Another way of reducing the numberof qubits necessary to describe the problem is usingthe symmetry of the Hamiltonian as described in Sec. IV.Because P m is a complete set for the Hilbert space ofcomplex 2 m × m matrices, any m -qubit Hamiltonian H can be written as a linear combination of elements of the m -fold Pauli operators in P m as H = (cid:88) k h k σ k , σ k = σ k ⊗ σ k ⊗ ... ⊗ σ km ∈ P m ,h k = (cid:104) H, σ k (cid:105) F = (cid:88) i,j H Tij σ kij = tr ( H † σ k ) . (12)This is a ﬁnite dimensional Fourier decomposition withthe Frobenius inner product (cid:104)· , ·(cid:105) F on matrix represen-tations of the operators. Note that in Eq. (12) σ kij is amatrix element, not a Pauli matrix working on a qubit.Any rotation on a single qubit can be written as R ( α, β, γ, δ ) = e iα Z β X γ Z δ , (13)where α, β, γ, δ ∈ [0 , π ]. The operators X φ and Z φ de-note a rotation of an angle φ around the given axis andare given by X φ = cos( φ/ I + sin( φ/ X,Y φ = cos( φ/ I − i sin( φ/ Y,Z φ = cos( φ/ I + sin( φ/ Z. (14) III. TRIAL STATE INITIALIZATION ANDMEASUREMENT

In order to initialize a trial state the individual qubitshave to be put in a requested state using a sequence ofrotation and entanglement operations. The depth d of astate preparation is deﬁned as the length of this sequence,or equivalently the number of entanglement operationsapplied. Each state on the Bloch sphere of a single qubitcan be reached with a ZXZ -rotation. Such a rotation ona qubit q at a depth i can be written as U q,i ( (cid:126)θ ) = Z θ q,i X θ q,i Z θ q,i , (15)where (cid:126)θ has three elements for every qubit and depthpair. In total 3 d + 3 rotations would be done for everyqubit. However, in this paper the ansatz state will alwaysbe the vacuum state. The ﬁrst Z -rotation can thereforebe omitted. Thus the parameter vector (cid:126)θ is an elementof the search space [0 , π ] ⊗ D , where D = (3 d + 2) m . Toreach the trial state described by the parameter vector (cid:126)θ the ansatz state | ψ init (cid:105) is rotated and entangled. Theentanglement operator is U ent . The prepared trial stateafter all rotations and entanglements will be | Ψ( (cid:126)θ ) (cid:105) = (cid:32) m (cid:89) q =1 U q,d ( (cid:126)θ ) × U ent (cid:33) × (cid:32) m (cid:89) q =1 U q,d − ( (cid:126)θ ) × U ent (cid:33) × ... × (cid:32) m (cid:89) q =1 U q, ( (cid:126)θ ) (cid:33) | ψ init (cid:105) . (16)With the trial state prepared, the expectation value of H can be measured on a quantum computer as (cid:104) H (cid:105) (cid:126)θ = (cid:104) Ψ( (cid:126)θ ) | H | Ψ( (cid:126)θ ) (cid:105) = (cid:88) k h k (cid:104) Ψ( (cid:126)θ ) | σ k | Ψ( (cid:126)θ ) (cid:105) = (cid:88) k h k (cid:104) Ψ( (cid:126)θ ) | σ k ⊗ σ k ⊗ ... ⊗ σ km | Ψ( (cid:126)θ ) (cid:105) . (17)In quantum computers these expectation values requiresuﬃcient measurements. In simulation the expectationvalues require an inner product of matrices and vectors. IV. Z SYMMETRY QUBIT REDUCTIONSCHEME

This section formally justiﬁes the qubit removal pro-cess as described by Bravyi et al. [11]. In this processthe Z symmetries in Hamiltonians [11], which often orig-inate from geometric symmetries in the molecule, areexploited to reduce the number of qubits necessary todescribe the Hamiltonian. The reductions will substan-tially lower computation power and time. The main ideaof the qubit reduction scheme is to transform a m -qubitHamiltonian in such a way that the Pauli operators inthe decomposition will all have either I or X as the last r ∈ N factors (e.g. YXXZIX and ZXYZXX). If the i -thterm of every Pauli operator in the decomposition is I or X then [ H, σ xi ] = 0, so the Hamiltonian H and σ xi commute and thus share common eigenvectors. One cantherefore replace the i -th factors of the operators withthe eigenvalues ± m -fold Pauli operators becausethese do not inﬂuence commutation. Doing so every m -fold Pauli operator becomes its own inverse. First somedeﬁnitions are given and lemmas are proven in order tohelp provide a constructive proof of the scheme. Deﬁnition 4.

Symmetry GroupA symmetry group S of a group G is an abelian subgroupof G such that − I (cid:54)∈ S . Deﬁnition 5.

Center and CentralizerLet G be a group. The center Z ( G ) of G is the set ofelements commuting with every element of G . Z ( G ) = { z ∈ G | zg = gz ∀ g ∈ G } . (18)The centralizer C G ( A ) of A ⊆ G is the set of all elementsin G which commute with every element of A . C G ( A ) = { g ∈ G | ag = ga ∀ a ∈ A } . (19) Lemma . Symmetry group commuting with 2 m × m HamiltonianLet S ⊆ P m be a symmetry group. S commutes with a2 m × m Hamiltonian H = (cid:80) k h k σ k if S commutes withevery Pauli operator in the decomposition of H. Thus,let H = (cid:80) k h k σ k . If ∀ k S commutes with σ k then S commutes with H . Proof. sH = s (cid:80) k h k σ k = (cid:80) k h k s σ k = (cid:80) k h k σ k s = Hs Lemma . Redeﬁning of generatorsLet the generators τ , τ , ..., τ r generate the abelian group S = (cid:104) τ , τ , ..., τ r (cid:105) ⊂ P m and let i, j ∈ { , ....r } with i (cid:54) = j . Then S = (cid:104) τ , τ , ..., τ i , ..., τ j , ..., τ r (cid:105) = (cid:104) τ , τ , ..., τ i , ..., τ j − , τ i τ j , τ j +1 , ..., τ r (cid:105) Proof.

Since τ i , τ j ∈ S it holds that τ i τ j ∈ S . Therefore, (cid:104) τ , τ , ..., τ i , ..., τ j − , τ i τ j , τ j +1 , ..., τ r (cid:105) ⊆(cid:104) τ , τ , ..., τ i , ..., τ j , ..., τ r (cid:105) . Now if τ i τ j , τ i ∈ S then τ i τ i τ j = τ j ∈ S . Therefore (cid:104) τ , τ , ..., τ i , ..., τ j , ..., τ r (cid:105) ⊆(cid:104) τ , τ , ..., τ i , ..., τ j − , τ i τ j , τ j +1 , ..., τ r (cid:105) . Lemma . Commutation of productLet

A, B , B and B be operators such that A commuteswith B and anti-commutes with B and B . Then B B and B B anti-commute with A. B B and B B com-mute with A. Proof. AB B = B AB = − B B A and AB B = − B AB = − B B A . So indeed B B and B B anti-commute with A. Furthermore, AB B = − B AB = B B A and AB B = − B AB = B B A . So indeed B B and B B anti-commute with A.An important result of stabilizer theory [18] is that asymmetry group commuting with a m -qubit Hamilto-nian will at most require m generators. This is a keyfact in the reduction scheme (the exact proof of thisis given by Fujii [19]). A proof sketch would take thefollowing form. Each symmetry group commuting with a m -qubit Hamiltonian is a symmetry group of P m . Eachsymmetry group generator projects a m -qubit basisstate onto one of its eigenvalues ±

1. Therefore, each ofthe generators divides the Hilbert space of m -qubits intotwo subspaces. The generators share a common set ofeigenvectors as they all commute. The generators thuspartition the m -qubit space into 2 subspaces.Since there are only 2 m basis states, ≤ m .Now the algorithm for the qubit reduction is given be-low. For notation purposes the single qubit operators X, Y and Z on qubit i are sometimes denoted as σ Xi , σ Yi and σ Zi respectively. The ﬁrst goal is to ﬁnd a symmetrygroup S , commuting with H , which has a maximal num-ber of generators. First oﬀ, every Pauli operator σ k ∈ P m will be encoded as a binary row-vector ( a x | a z ) of length2 m as follows σ ( a x | a z ) = σ ( a x , ..., a xm | a z , ..., a zm ) = m (cid:89) i =1 ( σ xi ) a xi ( σ zi ) a zi , (20)where a xi , a zi ∈ { , } . As an example IZXY =(0011 | m -dimensional vectors in Z , resulting in a value 0 or 1, thefollowing commutation relation holds σ ( a x | a z ) σ ( b x | b z ) = ( − a x · b z + a z · b x σ ( b x | b z ) σ ( a x | a z ) . (21)Following this, a scalar product × is deﬁned on the vectorspace of binary vectors of length 2 m as a × b := a x · b z + a z · b x ∈ { , } . (22)This vector product is symmetrical and linear in both a and b . By evaluating this vector product it can easily bedetermined if the corresponding operators will commuteor anti-commute, see Eq. (21).The set of k Pauli operators σ , ..., σ k appearingin the decomposition of H can be represented by vectors( a x | a z ) , ..., ( a kx | a kz ) which can in turn be represented in a k × m matrix G as G =  · · · a x · · · · · · a z · · ·· · · a x · · · · · · a z · · · ... ... · · · a rx · · · · · · a rz · · ·  (23)Now if a Pauli operator σ ( b x | b z ) were to commute withevery term in H it must be that G × ( b x | b z ) = 0, wherethe vector product × , deﬁned in Eq. (22), is taken forevery row in G . From the theory of stabilizer codes theconcept of a parity check matrix E is borrowed [20]. Thismatrix E is deﬁned as E =  · · · a z · · · · · · a x · · ·· · · a z · · · · · · a x · · · ... ... · · · a rz · · · · · · a rx · · ·  Note that EG T = 0. Now if the operator σ ( b x | b z ) were tobe part of the symmetry group S then ( b x | b z ) ∈ ker( E ).Next a basis for ker( E ) can be constructed. Let d :=dim(ker( E )). These basis vectors b , ..., b d give rise toPauli operators σ , ..., σ d ∈ P m . However, it is notguaranteed that these Pauli operators mutually commuteand thus will form a set of generators for a symme-try group S . Commutation can be enforced by apply-ing the algorithm below to obtain a set of new vectors g , ..., g r ∈ span( b , ..., b d ) with r ≤ d . This is the partnot formally justiﬁed in the paper by Bravyi et al [11].The operators corresponding to these vectors will be lin-early independent and commuting. We are thus lookingfor generators of a maximal Abelian subgroup of ker( E ).Hence ∀ i, j ∈ { , ..., r } : g i × g j = 0 . (24)The algorithm for ﬁnding maximal abelian subgroups [21]is described as follows1. Deﬁne G = (cid:104) σ ( b ) , ...σ ( b d ) (cid:105) .2. Now compute the center C of G and set A := C .Since A is abelian A ⊆ C G ( A ).3. If C G ( A ) = A then A is a maximal abelian sub-group of G . Else, take an arbitrary a ∈ C G ( A ) \ A .Deﬁne A := (cid:104) A ∪ { a }(cid:105) and repeat step 3.This procedure must halt as G is ﬁnitely generated and A keeps growing while still A ⊆ G holds. As mentionedpreviously, it is known that there are maximally m gen-erators in S [19]. Thus S = (cid:104) τ , τ , ...τ r (cid:105) where r ≤ m . Deﬁnition 6.

Generator transformLet i ∈ , ..., r and let τ i be a generator of a symmetrygroup S = (cid:104) τ , ..., τ r (cid:105) . Let τ i anti-commute with a single qubit operator σ ρj , where j ∈ , ...m and ρ ∈ { X, Y, Z } .Deﬁne new operators ˜ τ k as˜ τ k = (cid:40) τ k if k = i ∨ τ k σ ρj = σ ρj τ k τ i τ k else . (25)These new operators are linearly independent, mutuallycommuting and generate S according to lemma 2. Fur-thermore, these new generators were transformed in sucha way that ˜ τ i anti-commutes with σ ρj while all other ˜ τ k commute with σ ρj according to lemma 3.For each n ≤ r the qubit reduction scheme requires atriple { A n , B n , V n } with A n = { q ( i ) | i = 1 , ..., n i (cid:54) = j → q ( i ) (cid:54) = q ( j ) } a set of qubit-indices, B n = { ρ ( i ) | i =1 , ..., n ρ ( i ) ∈ { X, Z }} a set of labels and V n = { ν i | i =1 , ..., r, ν i = τ α ...τ α r r , α j ∈ { , }} a set of generatorsof S . This triple { A n , B n , V n } should satisfy the predi-cateΠ( { A n , B n , V n } ) : ∀ i ∈ { , ..., n } ∀ j ∈ { , ..., r } σ ρ ( i ) q ( i ) ν j = ( − δ i,j ν j σ ρ ( i ) q ( i ) (26)This requirement states that each qubit i should anti-commute only with generator i . Lemma . Assume that Π( { A n , B n , V n } ) is satisﬁed. Let τ n +1 = P P ...P m where P i ∈ { I, X, Y, Z } . There exists j ∈ { , ..., m } such that j (cid:54)∈ A n and P j (cid:54) = I . Proof.

Proof by contradiction. Assume that ∀ i (cid:54)∈ A n : P i = I . Deﬁne the subgroup S nl < P m for l ≤ n as S nl ( { A n , B n , V n } ) = { σ = P P ...P m | j (cid:54)∈ A n ⇒ P j = I, ∀ i ∈ { , ..., l } στ i = τ i σ, ∀ w ∈ { , ..., n } σσ ρ ( w ) q ( w ) = σ ρ ( w ) q ( w ) σ } (27) S nl is a subgroup of P m as for every l ≤ n it is true that I ⊗ m ∈ S l and if σ a , σ b ∈ S nl then σ a σ b ∈ S nl (the inversesare of course trivially satisﬁed as every element is its owninverse). One can see that τ n +1 ∈ S nn by construction.From the condition ∀ w ∈ { , ..., n } σσ p q ( w ) q ( w ) = σ p q ( w ) q ( w ) σ itfollows that | S n | = 2 n , as in every position in A n onlytwo operators out of { I, X, Y, Z } are allowed. It is easyto see that S nl +1 is a subgroup of S nl . Lagrange’s theoremgives | S nl | / | S nl +1 | = | S nl /S nl +1 | ∈ N . (28)Note now that σ ρ ( l +1) q ( l +1) is in S nl but not in S nl +1 , since itmust anti commute with τ l +1 . So, | S nl | / | S nl +1 | ≥

2. Then | S n | / | S nn | ≥ n . Therefore, | S n | = 1 and thus S n = I , theonly group with one element. But by construction it wasknown that τ n +1 ∈ S nn . Since I can not be a generatora contradiction is reached. Therefore, there must indeedexist a j ∈ , ..., m such that j (cid:54)∈ A n and P j (cid:54) = I . Theorem 5.

Let S = (cid:104) τ , τ , ...τ r (cid:105) be a symmetry group.For every n ≤ r there exists a triple { A n , B n , V n } suchthat Π( { A n , B n , V n } ) is satisﬁed. Proof.

The proof is given by induction over n Base case n = 1 : consider τ = P P ...P m where P i ∈ { I, X, Y, Z } . Since τ is a generator there must bea P j (cid:54) = I . The ﬁrst qubit index is chosen as q (1) = j thus A = { j } . The ﬁrst label ρ (1) is chosen as ρ (1) = (cid:40) X if P j ∈ { Y, Z } Z if P j = X . (29)Thus, B = { ρ (1) } . The generator transform from deﬁ-nition 6 can now be applied to the original generators of S to yield new generators ˜ τ , ..., ˜ τ r which form V . Forease of argumentation the new generators ˜ τ , ..., ˜ τ r arerenamed to τ , ..., τ r . Lemmas 2 and 3 now show thatΠ( { A , B , V } ) is satisﬁed. Induction cases ≤ n + 1 ≤ r .From lemma 4 there exists a j ∈ , ...m such that j (cid:54)∈ A n and P j (cid:54) = I . One can then choose q ( n + 1) = j and ρ ( n + 1) = (cid:40) X if P j ∈ { Y, Z } Z if P j = X . (30)The generator transform of deﬁnition 6 can be applied tothe generators in V n to yield new generators ˜ τ , ..., ˜ τ r .These generators form V n +1 . For ease of argumenta-tion the new generators ˜ τ , ..., ˜ τ r are renamed to τ , ..., τ r .Π( { A n +1 , B n +1 , V n +1 } ) is now satisﬁed.The existence and construction of a triple { A r , B r , V r } such that Π( { A r , B r , V r } ) is satisﬁed has been proven.Using the found triple { A r , B r , V r } one can deﬁne theoperators U i for i = 1 , ...r, according to U i = (cid:40) ( σ zq ( i ) + τ i )( σ zq ( i ) + σ xq ( i ) ) if ρ ( i ) = Z √ ( σ xq ( i ) + τ i ) if ρ ( i ) = X . (31)Below certain properties of the deﬁned operators U i areshown. Since Pauli matrices are Hermitian and unitary σ ρ ( i ) † q ( i ) = σ ρ ( i ) q ( i ) , τ † i = τ i and σ ρ ( i ) † q ( i ) σ ρ ( i ) q ( i ) = τ † i τ i = 1. Usingthe implied commutation relations the following holds U † i U i = 1 , U i σ xq ( i ) U † i = τ i ,U i σ xq ( j ) U † i = σ xq ( j ) for j (cid:54) = i. (32)This allows for the deﬁnition of the unitary operator U = U U ...U r W , where W is a permutation operatorassigning qubit i to qubit q ( i ). As a combination of uni-tary operators U is also unitary. The following relationfollows from Eq. (32) U σ xi U † = τ i . The transformed Hamiltonian H (cid:48) can be deﬁned as H (cid:48) = U † HU = (cid:88) k h k U † σ k U. (33)Deﬁne η k = U † σ k U . Since ∀ i, j ∈ { , ..., r } [ σ i , τ j ] = 0it must be that ∀ i, j ∈ { , ..., r } [ η i , σ xj ] = 0. Therefore, H (cid:48) commutes with the last r qubits and thus shareseigenvectors with σ xm − ( r − , ...σ xm . The last r qubits canthus be replaced by the X operator eigenvalues ± Z symmetry qubit reductionscheme described by Bravyi et al. [11]. It shows a con-structive proof of the scheme. Since it is constructive itshould be feasible to implement this scheme in code andput it to use. Bravyi et al. have successfully applied thisscheme in practice to Hamiltonians of small moleculessuch as H and LiH to reduce the number of qubits nec-essary to describe the problem by 1. Because the focusof this paper is on the individual inﬂuences of the entan-gling methods and the classical optimization algorithmson the VQE algorithm, the reduction algorithm will notbe used in the rest of the paper. V. MULTI-QUBIT ENTANGLING METHODS

In the preparation of the trial state the entanglementoperator U ent ensures that the full Hilbert space oftrial states can be reached. In this section multi-qubitentangling methods will be described, some gate basedand others native to the physical qubit systems. In thiswork we study the Rydberg quantum computing systemand its accompanying Rydberg interaction [22–25] .This section discusses the multi-qubit gates used in thesimulations of Sec. VII.Figure 2: Quantum circuit diagram of the CNOT chainand pairs methods for m = 4.The standard entanglement operator on two qubits isthe CNOT gate. A way to entangle more than two qubitsis by applying the CNOT gate in a chain or pairwise asis shown in Fig. 2. Since qubit operations are prone toerrors it is desirable to use as few as possible to entangleall qubits. The chain and pair methods respectivelyuse m − m ( m − / m qubits. A way of reducing the number of operationsis using entanglement interactions native to the qubitsystem. These interactions can entangle all qubits usingonly one operation and thus when considered as a singlegate, are a more eﬃcient way of entangling all qubits.Note that this eﬃciency only holds if the interaction isnative to the qubit system, on other systems it mighthave to be constructed out of multiple other interactions.A method proposed by Jaksch et al. [24] uses theRydberg interaction. The Rydberg interaction is a longrange interaction which, when one Rydberg atom (qubit)is in the excited state, blocks the others from going tothat state. Thus it can function much like a CNOTgate. However, because of the long range interaction italso allows for a so called C m − NOT gate where onequbit controls all m − m gate which mirrorsthe left and right side of a chain of qubits and multipliesit by a factor i (for instance | (cid:105) → i | (cid:105) ).The Krawtchouk chain is an important method toconstruct other multi-qubit gates. Several studies haveshown that the Krawtchouk Hamiltonian gate can betransformed, using only few one and two qubit rotations[28, 29]. Two examples of these transformations are theiSWAP gate and the C m − NOT gate. The iSWAP gateswaps the last two qubits and multiplies the state by aphase i . This swapping is controlled by the ﬁrst m − m gate can betransformed into a C m − NOT gate.

VI. CLASSICAL OPTIMIZATION METHODS

Alongside the multi-qubit entangling method theclassical optimization method plays an important rolein the VQE algorithm. In this paper two contrast-ing methods which have been applied to VQE inprevious research, are used. The local SimultaneousPerturbation Stochastic Approach (SPSA) [4] and theglobal DIviding RECTangles (DIRECT) method [30, 31].SPSA starts with an input initial trial state | Ψ( (cid:126)θ ) (cid:105) with parameter vector (cid:126)θ ∈ [0 , π ] ⊗ D . At every itera-tion step a small perturbation is done from the param-eter vector towards a random direction to obtain twonew parameter vectors (cid:126)θ k, ± = (cid:126)θ k + c k (cid:126) ∆ k . Here (cid:126) ∆ k is a(3 d + 2) m -dimensional random vector that is generatedby a suﬃciently random generator and c k is a yet to bedeﬁned constant. In this paper (cid:126) ∆ k is determined by theRademacher distribution where every vector entry has aprobability of 1 / (cid:126)g k is estimated by (cid:126)g k = (cid:104) Ψ( (cid:126)θ k, + ) | H | Ψ( (cid:126)θ k, + ) (cid:105) − (cid:104) Ψ( (cid:126)θ k, − ) | H | Ψ( (cid:126)θ k, − ) (cid:105) c k (cid:126) ∆ k . (34)To decrease the energy of the trial state at iteration k astep against the gradient direction is taken such that (cid:126)θ k +1 = (cid:126)θ k − a k (cid:126)g k , (35)where a k is a weight factor dependent on the iterationstep. The factors a k and c k are given by a k = ak A , c k = ck Γ . (36)The constants A, c and Γ are best chosen to be 0.602,0.01 and 0.101 respectively to guarantee the smoothestgradient descent [32]. The constant a is dependent onthe gradient around the initial position and is deﬁnedaccording to a = 2 π c (cid:68) |(cid:104) Ψ( (cid:126)θ , + ) | H | Ψ( (cid:126)θ , + ) (cid:105) − (cid:104) Ψ( (cid:126)θ , − ) | H | Ψ( (cid:126)θ , − ) (cid:105)| (cid:69) (cid:126) ∆ , (37)where (cid:104) ... (cid:105) (cid:126) ∆ is the expectation value over the distribu-tion of (cid:126) ∆ . This is the way the average slope around theinitialization vector is calculated in order to calibratethe optimization process. In practice this expectationvalue is approximated using Monte-Carlo methods. Astep is taken in the direction opposite of the gradient inorder to, conceivably, reach a lower energy value. TheSPSA algorithm is a local optimization method.The second method is the DIviding RECTangles (DI-RECT) approach [33, 34]. The DIRECT algorithm at-tempts to ﬁnd the global minimum of a Lipschitz con-tinuous function f : [0 , π ] ⊗ D → R deﬁned on a convexhypercube [0 , π ] ⊗ D . A Lipschitz continuous function isone for which the following relation holds ∃ K > ∀ (cid:126)x , (cid:126)x ∈ [0 , π ] ⊗ D : | f ( (cid:126)x ) − f ( (cid:126)x ) | ≤ K || (cid:126)x − (cid:126)x || . (38)Appendix A shows the proof that Hamiltonian evalua-tion, f ( (cid:126)x ) = (cid:104) (cid:126)x | H | (cid:126)x (cid:105) is Lipschitz continuous.A hypercube s is a cuboid subset of R D . Let S k be theset of hypercubes that the original hypercube [0 , π ] ⊗ D Figure 3: Quantum circuit diagram for the C NOT gate for m = 5 together with how it would be constructed fromthe iPHASE gate which is created by the Krawtchouk Hamiltonian. Here H is a Hadamard gate.was divided in after k iterations. Each of these hyper-cubes s kj ∈ S k has a center c kj ∈ R D and a dimension d k,maxj ∈ R , the length of its largest side. d k,ij will be thelength of its side in direction i . In the DIRECT algorithmoptimal hypercubes will be be partitioned in the next it-eration. A hypercube s ki ∈ S k is considered optimal ifthere exists a ˜ K > f ( c ki ) − ˜ Kd k,maxi ≤ f ( c kj ) − ˜ Kd k,maxj , j = 1 , ..., | S k | (39a) f ( c kj ) − ˜ Kd k,maxj ≤ f kmin − (cid:15) | f kmin | , (39b)where f kmin = min j { f ( c kj ) } , and (cid:15) is a small positive con-stant. In this research (cid:15) = 10 − . An optimal hypercube s kj with center c kj will be partitioned into smaller hyper-cubes. For a subdivision ﬁrst a set of new sample pointsin the hypercube is deﬁned as A k +1 j = { c kj } ∪ { c kj ± d k,maxj (cid:126)e i | i : d k,ij = d k,maxj } , (40)where (cid:126)e i is the unit vector in direction i . So partitionsare only done in the direction where the hypercube hasthe largest length. Since hypercubes can not overlap theorder of partitioning the diﬀerent dimensions matters.The ﬁrst division will be done in the direction of thesample point with the lowest function value. The seconddivision with the second lowest function value, etc.,see Fig. 4. This way the hypercube with the lowestfunction value will become the largest of the partition.Larger hypercubes are more likely to satisfy Eq. (39a)and thus to be partitioned again in the next iteration.The partitioning will result in a new set of hypercubes S k +1 which can be tested for optimality. The centerof an optimal hypercube S k will again be the centerof a hypercube in S k +1 . The set of centers for whichthe energy is tested thus grows at each iterations step.The hypercubes will eventually converge to the optimalground state. The DIRECT algorithm is a globaloptimization method. Figure 4: Iteration step of the DIRECT algorithm for athree dimensional ( D = 3) hypercube. Divisions are onlydone along the dimensions with the largest length. VII. RESULTS

This section will investigate the behaviour of thediscussed multi-qubit entangling methods and classicaloptimization algorithms on VQE problems of H andLiH. For both H and LiH the simulations are run overinteratomic distances from 0.392 a to 0 . a and 0.72 a to 1 . a respectively, in steps of 0.049 a and 0.09 a respectively, with a the Bohr-radius, and the resultingerrors with the Hartree-Fock energy are averaged. Forproblems with a small number of qubits, such as H withthe 1 s orbital (2 qubits) and LiH with the 1 s , 2 s and2 p x orbitals (6 qubits), the dimension of the state spaceis limited. In other, more complex, problems one mightwant to start oﬀ with a certain ansatz initial state to seeif the VQE algorithm is able to improve upon this. Forconsistency, we choose the initial state to be the vacuumstate | ψ init (cid:105) = | ... (cid:105) in this work. When the errors inground state energies of two algorithms are compared,the algorithm with the lower error is said to outperformthe other.Table I: Errors in Hartree for exact ground energy for 10 iteration SPSA simulations comparing the entanglingmethods for H and LiH at depths 1 to 6. Molecule Entangler d=1 d=2 d=3 d=4 d=5 d=6H CNOT chain 4 . × − . × − . × − . × − . × − . × − C k NOT 2 . × − . × − . × − . × − . × − . × − iPHASE m . × − . × − . × − . × − . × − . × − iSWAP . × − . × − . × − . × − . × − . × − LiH

CNOT chain 1 . × − . × − . × − . × − . × − . × − C k NOT . × − . × − . × − . × − . × − . × − iPHASE m . × − . × − . × − . × − . × − . × − iSWAP . × − . × − . × − . × − . × − . × − Figure 5: A typical energy error from Hartree-Fock en-ergy versus iteration number for the entangling meth-ods iPHASE m , C m − NOT and iSWAP compared to thestandard CNOT chain. The VQE problem describes aLiH molecule (6 qubits) at depth 6, with classical opti-mization SPSA.To investigate the dependence of the VQE resultson entangling methods for ﬁxed (SPSA) classical opti-mization algorithm, we ﬁrst analyze the ground state ofLiH for d=6 using the iPHASE m , iSWAP , C m − NOTand CNOT chain gates (discussed in Sec. V) as shownin Fig. 5. Here it is found that methods based on theRydberg interaction are able to perform comparablyto the CNOT chain method (see Sec. V.) Repeatingthese simulations for LiH and H as a function of depth d we get the resulting errors given in Table I. Theseresults seem to further support that Rydberg interactionentangling methods can compare to the CNOT chainmethod in terms of performance. The optimal entanglingmethod however appears to be problem dependent. Theentanglement depth d can open up the search space forthe trial state allowing lower errors to be reached as canbe seen in many of the LiH cases of Table I. However, it can also overcomplicate the problem by introducing toomany rotation parameters and thus take more iterationsto converge as we see in H cases. Another interestingfact is seen for iPHASE m and iSWAP at H , where theentangling method is not ﬁt for the problem regardlessof the depth, and an error as low as the CNOT chainand C k NOT errors is not reached.Next, we analyze the ground state energies of LiH andH as a function of the classical algorithm optimizationalgorithm (SPSA or DIRECT, see Sec. VI) for bothCNOT and iPHASE m to check additional dependenceon the entangling method as shown in Fig. 6. We sum-marize the results of the VQE calculations for a speciﬁcmolecule in terms of the depth and dimensionality. Theconsidered VQE problems are Hamiltonians of H atdepths 1 to 8 and of LiH at depths d and 25 times respectively.These numbers were chosen because for consideredground state calculations, the errors of the SPSAand DIRECT methods do not decrease by more than10 − Hartree in the last 10 and 5 iterations, respectively.From Fig. 6 it can be seen that for H ((a) and (c))SPSA performs better, while for LiH ((b) and (d))the DIRECT algorithm performs better. However, theDIRECT algorithm performs better on average at higherdepths regardless of the entangling method, althoughthis holds to a lesser degree for the iPHASE m results inFig. 6 (c). The DIRECT algorithm also performs betterfor larger numbers of qubits m , indicating that dimen-sionality D diﬀerentiates signiﬁcantly the performanceof classical search algorithms.Having highlighted the role of dimensionality D , wereturn ﬁnally to discuss the relatively lesser dependencein Fig. 6 on the chosen entangling method. Comparingthe results for CNOT pairs (Figs. 6 (a) and (b)) and0iPHASE m (Figs. 6 (c) and (d)) it is seen that the er-rors reached in the same problems diﬀer per entanglingmethod. For instance at H , for CNOT pairs at highdepths DIRECT gives less error than SPSA while foriPHASE m SPSA still performs better. The preferenceof one algorithm over another seems thus to be depen-dent on the entangling method. It is remarkable thata diﬀerent entanglement scheme changes the preferencefor a certain classical optimization algorithm as they areimplemented in distinct parts of the algorithm. We spec-ulate that this has to do with the “scrambling” natureof a given entangling method in a given search space. Ina fully scrambled search space a gradient is meaningless.For this reason the SPSA algorithm might perform worse compared to the DIRECT algorithm and thus a localsearch becomes expensive in terms of computing power.This would explain, for instance, why at high dimen-sionality the CNOT pairs method performs better whencombined with a global rather than local search method(Figs. 6 (a) and (b)) and vice versa for the iPHASE m method (Figs. 6 (c) and (d)). Ultimately this requires away to measure the amount of scrambling that an entan-gling method performs on a given search space. Althoughno such measure exists presently for operators, it wouldbe analogous to how the entropy of a state provides ameasure of its degree of entanglement. The derivation ofsuch a scrambling measure for operators on a given searchspace remains however the subject of future studies.Figure 6: Error with the Hartree-Fock energy in the reached ground state energy for VQE problems with diﬀerentdepth, optimized with the SPSA and DIRECT algorithms. The entangling methods used are CNOT pairs ((a) and(b)) and iPHASE m ((c) and (d)). Used VQE problems are the Hamiltonians of H (2 qubits, (a) and (c)) at depths1 to 8 and LiH (6 qubits, (b) and (d)) at depths 1 to 4. Thus, the problems have dimensions ranging from D = 10 to D = 84. SPSA was run with 10 iterations and DIRECT was run with 25 iterations. These were empirically chosen aswas found that the next 10 and 5 iterations respectively yielded less than 10 − Hartree improvement on the groundstate energy.1

VIII. CONCLUSION

In this paper, we have focused on optimizing the VQE algorithm for quantum chemistry. First, a constructiveproof was shown for a qubit reduction scheme using the Z symmetries of Hamiltonians, based on the paper byBravyi et al. [11], which eﬀectively reduces the total number of qubit operations. Then we examined the performanceof the VQE algorithm as a function of the entangling method choice, depth, and classical optimization algorithm.Entangling methods based on both physical interactions, such as iPHASE m and C m − NOT gave a convergencethat was comparable to that of the CNOT chain method in the tested small molecule VQE problems. From theresults, although the entangling method and classical optimization method appear to interact in a complex, problemdependent manner, we found a general trend that global search algorithms perform better than local algorithms athigher dimensionality.To further investigate links between dimensionality, entangling method, and classical optimization algorithm, VQEcalculations should be performed for more complicated molecular systems. Furthermore, our ﬁndings motivate futurestudies on the scrambling powers of entanglement operators on a search space and how this might be encapsulatedquite generally by some measure. Such a concept would be foundational towards the design of eﬃcient VQE circuitry.Data available on request from the authors.

ACKNOWLEDGEMENTS

We thank Jasper Postema and Gijs Groeneveld for discussions. This research is ﬁnancially supported by theNetherlands Organisation for Scientiﬁc Research (NWO) under Grant No. 680-47-623 and 680.92.18.05. [1] J. Preskill, Quantum , 79 (2018), ISSN 2521-327X, URL http://dx.doi.org/10.22331/q-2018-08-06-79 .[2] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C.Bardin, R. Barends, R. Biswas, S. Boixo, F. G.S. L. Brandao, D. A. Buell, et al., Nature , 505(2019), ISSN 1476-4687, URL https://doi.org/10.1038/s41586-019-1666-5 .[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q.Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien,Nature Communications , 4213 (2014), 1304.3061.[4] A. Kandala, A. Mezzacapo, K. Temme, M. Takita,M. Brink, J. M. Chow, and J. M. Gambetta, Nature(London) , 242 (2017), 1704.05018.[5] C. Hempel, C. Maier, J. Romero, J. McClean, T. Monz,H. Shen, P. Jurcevic, B. P. Lanyon, P. Love, R. Babbush,et al., Phys. Rev. X , 031022 (2018), URL https://link.aps.org/doi/10.1103/PhysRevX.8.031022 .[6] A. Aspuru-Guzik and P. Walther, Nature Physics (2012).[7] T. O’Brien, P. Ro˙zek, and A. Akhmerov, Physical reviewletters , 220504 (2018).[8] F. and Arute, K. Arya, R. Babbush, D. Bacon, J. C.Bardin, R. Barends, S. Boixo, M. Broughton, B. B.Buckley, D. A. Buell, et al., Science , 1084 (2020),ISSN 0036-8075, URL https://science.sciencemag.org/content/369/6507/1084 .[9] I. B. Jonathan Koller, William Bender, ed., ExploringQuantum Approximations of Traveling Salesman (Sei-denberg School of CSIS, Pace University, Pleasantville,New York 10570, 2019).[10] IBM-Research-Editorial-Staﬀ,

How to measure amolecule’s energy using a quantum computer (2017),URL . [11] S. Bravyi, J. M. Gambetta, A. Mezzacapo, andK. Temme, arXiv e-prints arXiv:1701.08213 (2017),1701.08213.[12] S. McArdle, S. Endo, A. Aspuru-Guzik, S. Benjamin,and X. Yuan, arXiv e-prints arXiv:1808.10402 (2018),1808.10402.[13] Note1, hartree units are also known as atomic units wherethe reduced constant of Plank (cid:126) , the electron mass m e ,the elementary charge e and Coulomb’s constant k e =1 / π(cid:15) are equal to unity. In this system the Bohr radius a = 4 π(cid:15) (cid:126) /m e e is also equal to unity. Hartree unitsare often used in molecular level calculations [15].[14] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics , 023023 (2016),1509.04279.[15] D. McQuarrie, Quantum Chemistry , v. 1 (University Sci-ence Books, 2008), ISBN 9781891389504, URL https://books.google.nl/books?id=zzxLTIljQB4C .[16] Note2, in this work the integrals in Eq. 3 and Eq. 4have been determined using quantum computation li-brary

OpenFermion [35].[17] P. Coleman,

Simple examples of second quantization (Cambridge University Press, 2015), p. 71–94.[18] T. A. Brun,

Quantum error correction (2019),1910.03672.[19] K. Fujii,

Quantum Computation with Topological Codes ,vol. 1 of (Springer Singapore, 2015), 1st ed., ISBN 978-981-287-995-0.[20] D. Gottesman, Ph.D. thesis, California Institute of Tech-nology (1997).[21] A. G. Kurosh, Theory of groups (Amer MathematicalSociety, 2014).[22] M. Saﬀman, Journal of Physics B: Atomic, Molecular andOptical Physics , 202001 (2016), URL https://doi. org/10.1088/0953-4075/49/20/202001 .[23] C. S. Adams, J. D. Pritchard, and J. P. Shaﬀer, Jour-nal of Physics B: Atomic, Molecular and Optical Physics , 012002 (2019), URL https://doi.org/10.1088/1361-6455/ab52ef .[24] D. Jaksch, J. I. Cirac, P. Zoller, S. L. Rolston, R. Cˆot´e,and M. D. Lukin, Phys. Rev. Lett. , 2208 (2000),quant-ph/0004038.[25] M. Morgado and S. Whitlock, Quantum simulationand computing with rydberg-interacting qubits (2020),2011.03031.[26] T. G. Walker and M. Saﬀman, in

Advances in Atomic,Molecular, and Optical Physics , edited by P. Berman,E. Arimondo, and C. Lin (Academic Press, 2012), vol. 61of

Advances In Atomic, Molecular, and Optical Physics ,pp. 81 – 115, URL .[27] H. Labuhn, D. Barredo, S. Ravets, S. de L´es´eleuc,T. Macr`ı, T. Lahaye, and A. Browaeys, Nature (London) , 667 (2016), 1509.04543.[28] K. Groenland and K. Schoutens, Phys. Rev. A , 042321(2018), 1707.05144.[29] S. R. Clark, C. Moura Alves, and D. Jaksch, New Journalof Physics , 124 (2005), quant-ph/0406150. [30] C. Kokail, C. Maier, R. van Bijnen, T. Brydges,M. K. Joshi, P. Jurcevic, C. A. Muschik, P. Silvi,R. Blatt, C. F. Roos, et al., Nature , 355–360(2019), ISSN 1476-4687, URL http://dx.doi.org/10.1038/s41586-019-1177-4 .[31] C. Kokail, C. Maier, R. van Bijnen, T. Brydges,M. K. Joshi, P. Jurcevic, C. A. Muschik, P. Silvi,R. Blatt, C. F. Roos, et al., Nature , 355–360(2019), ISSN 1476-4687, URL http://dx.doi.org/10.1038/s41586-019-1177-4 .[32] J. C. Spall, IEEE Transactions on Aerospace and Elec-tronic Systems , 817 (1998), ISSN 0018-9251.[33] P. J. Nicholas, in A Dividing Rectangles Algorithm forStochastic Simulation Optimization (2015).[34] M. Piao Tan and C. A. Floudas,

Determining the opti-mal number of clustersDetermining the Optimal Numberof Clusters (Springer US, Boston, MA, 2009), pp. 687–694, ISBN 978-0-387-74759-0, URL https://doi.org/10.1007/978-0-387-74759-0_123 .[35] J. R. McClean, K. J. Sung, I. D. Kivlichan,Y. Cao, C. Dai, E. Schuyler Fried, C. Gidney,B. Gimby, P. Gokhale, T. H¨aner, et al., arXiv e-printsarXiv:1710.07629 (2017), 1710.07629. Appendix A: Proof Lipschitz continuity of Hamiltonian evaluation

Theorem 6.

Let H be a m × m Hamiltonian operator. The function f ( (cid:126)x ) = (cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105) , where (cid:126)x describes arotation of an ansatz | φ (cid:105) , is Lipschitz continuous.Proof. | f ( (cid:126)x ) − f ( (cid:126)x ) | = |(cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105) − (cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105)| = |(cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105) − (cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105) + (cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105) − (cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105)| = |(cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) − Ψ( (cid:126)x ) (cid:105) + (cid:104) Ψ( (cid:126)x ) − Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105)|≤ |(cid:104) Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) − Ψ( (cid:126)x ) (cid:105)| + |(cid:104) Ψ( (cid:126)x ) − Ψ( (cid:126)x ) | H | Ψ( (cid:126)x ) (cid:105)|≤ || Ψ( (cid:126)x ) || · || H || · || Ψ( (cid:126)x ) − Ψ( (cid:126)x ) || + || Ψ( (cid:126)x ) || · || H || · || Ψ( (cid:126)x ) − Ψ( (cid:126)x ) || = 2 || H || · || Ψ( (cid:126)x ) − Ψ( (cid:126)x ) || . (A1)Now the trial state | Ψ( (cid:126)x ) (cid:105) is generated by rotations on the ansatz | φ (cid:105) , so | Ψ( (cid:126)x ) (cid:105) = R ( (cid:126)x ) | φ (cid:105) . All of the rotationmatrices as deﬁned in Eq. (14) have elements which are continuously diﬀerentiable. Since the product of continuouslydiﬀerentiable functions is still continuously diﬀerentiable, it is true that R ( (cid:126)x ) can be written as R ( (cid:126)x ) =  f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x ) f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x )... ... . . . ... f m , ( (cid:126)x ) f m , ( (cid:126)x ) · · · f m , m ( (cid:126)x )  , (A2)where f i,j ( (cid:126)x ) are all continuously diﬀerentiable functions. Since all continuously diﬀerentiable functions are Lipschitzcontinuous ∀ i, j ∃ a i,j > | f i,j ( (cid:126)x ) − f i,j ( (cid:126)x ) | ≤ a i,j || (cid:126)x − (cid:126)x || . (A3)Then it is true that || Ψ( (cid:126)x ) − Ψ( (cid:126)x ) || = || R ( (cid:126)x ) | ... (cid:105) − R ( (cid:126)x ) | ... (cid:105)|| ≤ || R ( (cid:126)x ) − R ( (cid:126)x ) || = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x ) f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x )... ... . . . ... f m , ( (cid:126)x ) f m , ( (cid:126)x ) · · · f m , m ( (cid:126)x )  −  f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x ) f , ( (cid:126)x ) f , ( (cid:126)x ) · · · f , m ( (cid:126)x )... ... . . . ... f m , ( (cid:126)x ) f m , ( (cid:126)x ) · · · f m , m ( (cid:126)x ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ m (cid:88) i =1 2 m (cid:88) j =1 || f i,j ( (cid:126)x ) − f i,j ( (cid:126)x ) || ≤ m (cid:88) i =1 2 m (cid:88) j =1 a i,j || (cid:126)x − (cid:126)x || ≤ K || (cid:126)x − (cid:126)x || , (A4)where K = (cid:80) m i =1 (cid:80) m j =1 a i,j ∈ R + . Thus | f ( (cid:126)x ) − f ( (cid:126)x ) | ≤ ·|| H || · K ·|| (cid:126)x − (cid:126)x || and thus ff