Tailoring Term Truncations for Electronic Structure Calculations Using a Linear Combination of Unitaries
aa r X i v : . [ qu a n t - ph ] J u l Tailoring Term Truncations for Electronic Structure Calculations Using a LinearCombination of Unitaries
Richard Meister, Simon C. Benjamin, and Earl T. Campbell
2, 3 Department of Materials, University of Oxford, Oxford OX1 3PH, United Kingdom Department of Physics and Astronomy, University of Sheffield, Sheffield S3 7RH, United Kingdom AWS Center for Quantum Computing, Pasadena, CA 91125, USA (Dated: July 24, 2020)A highly anticipated use of quantum computers is the simulation of complex quantum systemsincluding molecules and other many-body systems. One promising method involves directly applyinga linear combination of unitaries (LCU) to approximate a Taylor series by truncating after someorder. Here we present an adaptation of that method, optimized for Hamiltonians with terms ofwidely varying magnitude, as is commonly the case in electronic structure calculations. We showthat it is more efficient to apply LCU using a truncation that retains larger magnitude terms asdetermined by an iterative procedure. We obtain bounds on the simulation error for this generalizedtruncated Taylor method, and for a range of molecular simulations we report these bounds as wellas direct numerical emulation results. We find that our adaptive method can typically improve thesimulation accuracy by an order of magnitude, for a given circuit depth.
I. Introduction
One of the most promising applications of quantumcomputers is the efficient simulation of quantum sys-tems [1], including those that arise in quantum chem-istry. Following the first concepts for such simula-tions [2, 3], there have been numerous proposed algo-rithms to simulate these systems using quantum com-puters [4–12], often with variations of Trotter-Suzukiproduct formulas [13, 14]. These methods usually ap-proximate the time evolution operator by sequentiallyevolving the terms in the Hamiltonian individually.Through extensive study the required number of gateswas reduced substantially over time [15–20]. However,the scaling of the inverse simulation error for such prod-uct formulas is polynomial in the circuit gate count.An alternative is available through the technique oflinear combinations of unitaries (LCU). Here, in con-trast to the product formula approaches, one derivesa quantum circuit that directly applies a sum of uni-taries, allowing for a much greater variety of accessibleoperators. A key enhancement was the replacement ofa probabilistic step in the original scheme [21] with anear-deterministic process based on oblivious amplitudeamplification [22].The LCU method gave rise to a number of implemen-tations for Hamiltonian simulation. The approach inRef. [23] uses linear combinations of product formulas,taking advantage of commuting terms in the Hamilto-nian – like pure product formulas – while improvingthe complexity scaling with inverse error to be onlypoly-logarithmic using LCU. In Ref. [24] it is appliedto enhance the scaling of the complexity with simula-tion error of Szegedy quantum walks, while retainingtheir advantage for sparse Hamiltonians. Extensionsof this approach are the so-called quantum signal pro-cessing [25, 26] and qubitization [27], of which variantsspecifically for quantum chemistry exist [28].One of the most direct uses of LCU is presentedin [29], where the time evolution operator is approxi-mated by truncating its Taylor expansion at some ap- propriate order. This results in exponentially betterscaling of the complexity with inverse error comparedto product formulas.In this work, we present a variant of the truncatedTaylor LCU scheme [29] that includes terms by weightrather than order. This makes use of the fact that insome quantum mechanical systems – especially elec-tronic structure Hamilonians – the magnitudes of theterms vary considerably. This suggests that some(large) terms should be included to higher orders thanother (small) terms. Our variant implements just that,while respecting the efficient circuit implementationof [29] and subsequent improvements to select and prepare subroutines [30, 31].Our algorithm starts from an empty expansion and it-eratively adds terms which facilitate the largest declineof the error bound for one additional gate. This greedymethod leads to a more rapid reduction of the error inthe very early stage of the construction when appliedto electronic structure Hamiltonians. At a later stage,the rate of convergence becomes roughly equal to theoriginal method, maintaining a roughly constant factoradvantage in the error for the investigated molecules.Therefore, the asymptotic behavior is equivalent forboth methods, but we accomplish a constant improve-ment. We find that the error of our modified schemeis typically one order of magnitude lower than in theoriginal method at the same gate cost. For a fixed errormagnitude, this results in reducing the circuit depth byroughly one full order of the expansion.The rest of the paper is structured as follows. Sec-tion II contains a detailed description of our modifiedmethod adapted from [29]. In Section III, we present re-sults for error bounds as well as numerically evaluatederrors for a variety of electronic structure Hamiltoni-ans. Lastly, Section IV concludes the paper and givesan outlook to possible further work.
II. Truncated Taylor series
Our method is closely related to the approach pre-sented by Berry et al. [29]. We will give a detaileddescription of our modified method, which at the sametime serves as a summary of [29].
A. Linear combination of unitaries
The protocol is based on a method of adding unitarieswith the help of ancilla qubits [21]. We start from aHamiltonian of the form H = L − X ℓ =0 α ℓ h ℓ , where α ℓ are real positive scalars and h ℓ are unitariesfor which implementations on a quantum computer ex-ist. Without loss of generality, we assume the terms aresorted by magnitude, i.e. α ℓ +1 ≥ α ℓ . The approachalso used in [29] is to implement an approximation tothe corresponding time evolution operator U ( t ) = e − iHt with a Taylor series. Taking t to be sufficiently small,the series representation of U ( t ) can be approximatedby the sum U L ( t ) := + ∞ X k =1 ( − it ) k k ! k Y j =1 L j − X ℓ j =0 α ℓ j h ℓ j (1)where L is a vector of L k with ≤ L k ≤ L , k ∈ N + ,meaning the individual sums in the product only con-tain the L k largest terms of H . This is the main dif-ference to [29], where the series is truncated at someappropriate order n . This yields U n ( t ) := + n X k =1 ( − it ) k k ! k Y j =1 L − X ℓ j =0 α ℓ j h ℓ j , which is a special case of Eq. (1), where all orders up to n are added in full. Our modified version of the sumincludes some orders only partially, giving greater con-trol over the total gate count and allowing for quickerconvergence of the error bounds.The magnitude of the time step t will be a fixed valuerestricted by the method. Longer times τ = rt , can besimulated by applying U r L . However, most of this paperwill focus on the implementation of a single time step.To keep the notation simple, the products of the co-efficients α ℓ with t k /k ! are gathered into new variables Phases can always be pushed into the operators h ℓ . For all quantities with an L subscript we will alternatively re-place it with n to mean an L where L k = L for k ≤ n and L k = 0 for k > n . β j , and all products of the unitaries h ℓ together with ( − i ) k are collected into operators V j , with a newly in-troduced label j numbering all terms in the sum. Notethat even if different products of h ℓ yield identical op-erators, they are treated as separate V j , each with acorresponding weight β j . By construction, all β j arealso real and positive. Thus, Eq. (1) becomes U L = m − X j =0 β j V j , where the time dependence of U L and β j is not explicitlydenoted, and the total number of terms m implicitlydepends on L .In order to apply U L to a state | ψ i , we define theunitary operators P ( t ) and S ( prepare and select ) inaccordance with [29]. The prepare operator P , whosetime dependence we will make implicit from here on,maps the | i state of the ancilla qubits to the weightedsuperposition P | i := 1 √ s L m − X j =0 p β j | j i (2)with the also implicitly t -dependent normalization con-stant s L := m − X j =0 β j . The select operator S acts on a state | ψ i with theoperator V j , where j depends on the state of the ancilla.So its action on a tensor state is defined as S | j i | ψ i := | j i V j | ψ i . Given these two operators P and S , we proceed analo-gously to [29] by introducing a new operator W := ( P † ⊗ ) S ( P ⊗ ) which has the effect W | i | ψ i = 1 s L | i U L | ψ i + N | ⊥ , Φ i (3)where N is the appropriate constant for the state tobe normalized, and with a garbage state | ⊥ , Φ i whoseancilla part has no overlap with the ancillary | i state. B. Oblivious amplitude amplification
The naïve method for obtaining U L | ψ i would be tomeasure the ancilla of W | i | ψ i , see Eq. (3), and post-select for the ancilla | i state. However, since s L in-creases with t , the success probability for large t di-minishes. Additionally, t is always subject to conver-gence of Eq. (1). Due to the postselection, dividing t into smaller segments and repeating the process multi-ple times would also suppress the total success proba-bility.One way around this problem also used in [29] is theso-called oblivious amplitude amplification. As detailedin Lemma 1 in Appendix A, and references therein, if U L were unitary and s L = 2 , the amplification operator Q := −W R W † R (4)with R := 2Π − the reflection operator about the | i state of the ancilla and Π := | ih | ⊗ the projectoronto the ancilla | i , would have the effect [22] QW | i | ψ i = | i U L | ψ i . Thus, we define A := QW = −W R W † R W . (5)We first discuss the requirement of s L = 2 . Our formof the Taylor expansion leads to s L being of the form s L ( t ) := ∞ X k =1 t k k ! k Y j =1 L j − X ℓ j =0 α ℓ j | {z } :=Λ j = ∞ X k =0 t k k ! k Y j =1 Λ j . (6)The restriction s L = 2 therefore forces the simulationtime t to be the only real root of ∞ X k =0 t k k ! k Y j =1 Λ j − which we call t L . If we were to include all orders infull, i.e. L k = L, ∀ k , all Λ j would be equal and theinfinite sum on the left would become the series of theexponential function. We call the time step for this case t ∞ = log (2) / Λ , with the definition Λ := P Lj =0 α j .Shorter times can be accomplished by using an extraqubit, as described in [22]. Since the only requirementfor oblivious amplitude amplification to work is s L = 2 ,and shorter times mean s L < , i.e. the amplitude ofthe ancilla | i is too large, we can introduce an addi-tional qubit to the ancilla and prepare it with enoughweight such that the ancilla | i reduces to amplitude / . These shorter times are only relevant in the lasttime step of a simulation and have almost the same costas a full step, so we limit the rest of the discussion tomultiples of t L .Equation (5) only strictly holds for unitary U L , butit is only close to unitary. We need the action of A for ageneral U L and again follow [29]. Applying A to a state | i | ψ i and projecting onto the ancilla | i yields Π A | i | ψ i = | i (cid:18) s L U L − s L U L U † L U L (cid:19) | ψ i , (derivation in Appendix A, Lemma 2) and we call theoperator we are actually applying in the | ψ i subspace ˜ A L := 3 s L U L − s L U L U † L U L . (7) C. Gate construction
We also want to elaborate on the specific gate con-struction to implement A efficiently, adapted from [29].First, the ancilla is divided into κ + 1 registers, where κ := || L || is the number of non-zero elements in thevector L . The first register is named q and contains κ qubits, while the others are given labels c . . . c κ , with c k containing ⌈ log L k ⌉ qubits.The q register’s purpose is to represent different or-ders, while registers c k are needed for the terms in eachorder. This makes it convenient to use a multi-index j ≡ ( k, ℓ , . . . , ℓ k ) . The corresponding state of the an-cilla is | j i = | k i q | ℓ i c . . . | ℓ k i c k . . . where we leave the state of the registers c k ′ with k ′ > k unspecified. The coefficient associated with this indexis β j = β ( k,ℓ ,...,ℓ k ) = t k k ! α ℓ . . . α ℓ k . prepare For this operator we slightly deviatefrom [29]. Exact implementation of P as defined inEq. (2) would necessitate the preparation of the c k reg-isters to be conditioned on qubits in the q register. Wecan, however, implement an operator P ⋆ , which actsequivalently to P when used in W , and does not needcontrolled operations on the c k registers.The q register is prepared to contain the prefactorfor each order k in unary coding, i.e. | k i q := (cid:12)(cid:12) k k − κ (cid:11) q .Thus, the prepare operator P ⋆ ( t ) acts on this registerproportional to | κ i q κ X k =0 vuut t k k ! k Y j =1 Λ j | k i q . This can be implemented by a rotation on the firstqubit, and rotations controlled by the previous one oneach subsequent qubit.The c k registers can now all be almost identically pre-pared to contain the coefficients in the Hamiltonian us-ing regular binary coding. So the action of P ⋆ on asingle c k register is proportional to | i c k L k − X ℓ =0 √ α ℓ | ℓ i c k . For this, any efficient method for arbitrary state prepa-ration can be used, whose cost we discuss presently.Combining these constituents into a single unitary P ⋆ and applying it to the whole ancilla yields the desiredoperator equivalent to Eq. (2) if used in W , which isshown in more detail in Lemma 3 in Appendix A. select Using the established structure of the an-cilla, the S operator must have the action S | k i q | ℓ i c . . . | ℓ k i c k . . . | ℓ κ i c κ | ψ i = | k i q | ℓ i c . . . | ℓ k i c k . . . | ℓ κ i c κ ˜ h ℓ . . . ˜ h ℓ k | ψ i with ˜ h ℓ := − ih ℓ . This can be accomplished by having asequence of groups of unitaries in the circuit. Each ofthe groups m = 1 . . . κ contains the unitaries ˜ h ℓ m , with ℓ m = 0 . . . L m − , acting on the target state | ψ i .The register c m is used as the addressing register forgroup m , i.e. the state | ℓ m i c m determines which unitaryin group m is applied. To achieve this, we use the factthat the c registers are in binary coding, so ℓ m is rep-resented as a binary number with the ⌈ log L m ⌉ qubitsin c m as digits. By controlling ˜ h ℓ m on the c m registerin a way that matches the binary representation of ℓ m ,only the unitary with the correct index is applied. Forexample, ˜ h would be controlled by the last and ante-penultimate qubit in c m and anti-controlled by all otherqubits in c m (since 5 corresponds to the state | . . . i in binary coding).Additionally, the q register specifies how many of thegroups are applied. If q is in the state | k i q , only thefirst k groups should be active. The unary coding in q makes this straightforward to implement by addition-ally controlling every unitary in group m with the m th qubit in register q . Figure 1 shows a sketch of the fullconstruction. qc c ... | ψ i ......... . . .. . .. . .. . .. . .. . .. . . . . .. . .. . .. . .. . .. . .. . . ˜ h ˜ h . . .. . .. . .. . .. . .. . .. . . ˜ h ˜ h second group first group FIG. 1. Sketch of the gate construction for S . By takingadvantage of the unary iteration structure, the T -count ofthe multi-controls can be significantly reduced [30]. How-ever, we include this non-optimized diagram for pedagogicalpurposes. Gate cost
Lastly, we want to estimate the gatecomplexity of the operator A . Its constituents are tworeflections R , and three instances of W , each of whichcontain one S and two P ⋆ . We consider the universalset of Clifford + T and count the number of expensive T -gates [32–34] for our complexity analysis.Each reflection R is a single Pauli- Z operator on oneof the ancilla qubits (padded between two not gates),anti-controlled on all others. This can be done with O ( P k log L k ) T -gates and a second ancilla register ofsize ( κ + P k ⌈ log L k ⌉ − [35]. The groups in the circuit are numbered right-to-left to matchthe established numbering convention of the operators.
The prepare stage for the q register consists of κ − controlled rotations with a total T -complexity of O ( κ ) .Each of the κ registers c k needs to be initialized to aspecific state with ⌈ log L k ⌉ ∼ L k coefficients, requir-ing between O ( P k L k ) and O ( P k √ L k log ( L k /ǫ )) T -gates per register, depending on the number of addi-tionally available ancillas, where ǫ is the accuracy ofthe preparation [31]. In total, this yields a T -count be-tween O ( P k L k ) and O ( P k √ L k log ( L k /ǫ )) .The fact that the controls of each h ℓ in S form aso-called unary iteration can be exploited to lower thetotal gate count. Each sequence of L k operators onlyrequires O ( L k ) T -gates [30], plus L k times the cost ofimplementing a single − ih ℓ operator. This totals to P k L k such operators, thus S has a T -complexity of O ( P k L k ) Combining all these counts results in a total complex-ity of O ( P k L k ) for A . We thus define C L := ∞ X k =1 L k = || L || (8)to use as a proxy for the total gate cost in our results. D. Error bounds
We consider the error of the method per time step tobe the norm of an operator ∆ L which fulfils U ( t ∞ ) = ˜ A L ( t ∞ ) + ∆ L ( t ∞ ) where we now use the step size t ∞ = log(2) / Λ . We findthat the error made by applying Π A once and tracingout the ancilla can be bounded by δ L := || ∆ L ( t ∞ ) || ≤ − s L ( t ∞ ) =: ε L up to order ε L . Because using t L or t ∞ makes no dif-ference in the error up to order ε L , we exclusively use t ∞ in our calculations. The error for a total simulationtime τ = rt ∞ = r log(2) / Λ , r ∈ N , is then || ˜ A L ( t ∞ ) r − U ( t ∞ ) r || = rδ L = Λ δ L log 2 τ ≤ rε L , also up to order ε L .We call the bound on the total simulation error of r steps ǫ := rε . The T -gate complexity C ǫ of a simulationfor time τ in terms of the total error bound ǫ is then inthe range O Λ τ log Λ τǫ log log Λ τǫ ! < C ǫ ≤ O L Λ τ log Λ τǫ log log Λ τǫ ! , depending on the Hamiltonian. All of these results areshown in more detail in Appendix A, Lemmas 4, 5, 7and 9 and Corollaries 6, 8 and 10. || · || in this paper always means the operator norm. E. Insertion strategy
The notion of partially included orders together withan expression for the error bound allows us to start fromany given expansion L and determine which L k shouldbe increased by 1 – i.e. which additional gate shouldbe included – to give the quickest decrease of the errorbound. Specifically, it is the k which maximizes theexpression X ν ≥ k t ν ν ! α L k Y j = k ≤ j ≤ ν L j X i =1 α i . (9)Starting from L = , repeatedly adding terms that max-imize (9) results in a greedy algorithm for decreasingthe error bound, which we used to iteratively constructcircuits. III. Results
We first observe that for Hamiltonians with evenlydistributed magnitudes α ℓ , the only benefit of our modi-fication is the finer control over the total gate count. Byconstruction, whenever C L = νL , ν ∈ N , our protocoland the method used in [29] yield identical results.We may expect our modification to be advantageouswhenever the magnitudes of α ℓ vary over orders of mag-nitude, because this allows terms in low orders con-taining small α ℓ to be smaller than terms in higherorders containing large α ℓ . Such magnitude distribu-tions are often found in electronic structure Hamilto-nians for molecules [36]. Because the efficiency of ourmethod depends critically on the specific amplitudes in − − − C L /L ε L / t ∞ o r δ L / t ∞ FIG. 2. Accuracy of the Taylor expansion for the electronicHamiltonian of hydrogen fluoride (HF), at time step size t ∞ ,in terms of the error per unit time vs the circuit cost C L asdefined in Eq. (8) per cost of a full order. Lines are theerror bounds ε L for the unmodified and modifiedcircuit. Squares are the numerically obtained errors δ L forfully expanded orders, circles analogous for partial orders.The vertical gray bars point to where the error would be ifwe could implement U L without the amplification step. TABLE I. Molecules used in our calculations with theirmolecular formula, PubChem Compound ID (CID), num-ber of qubits (excluding ancillas), and number of terms L .Formula CID Qubits L HO
157 350 11 631 HF
16 211 014 11 631 HN LiH
62 714 11 631 BH BeH
139 073 13 666 CH
123 164 13 1086 NH
123 329 13 1086 BH
139 760 13 1086 H O
962 13 1086 BH CH NH
222 15 2929 CH
297 17 6892 O
977 19 2239 N
947 19 2951 NO
145 068 19 4427
BeO
14 775 19 5851
LiF
224 478 19 5851 CO
281 19 5851 CN BN
66 227 19 5851
LiOH
HBO
518 615 21 8758
HCN
768 21 8758
HOF
123 334 21 12 070
CHO
123 370 21 12 070
CHF
186 213 21 12 074
HNO
945 21 12 078 H NO CH O
712 23 9257 NH F
139 987 23 15 673 CH F
138 041 23 15 681 CH F
11 638 25 18 600 CH Li H NO
787 25 22 080
OCH
123 146 25 39 392
LiBH CH OH
887 27 30 419 C H O C H
12 302 244 91 1 897 809 the Hamiltonian, analytical results are hard to obtain.Therefore we resort to a numerical study comparing theaccuracy of the modification to the method in [29] fora group of molecules listed in Table I.The Hamiltonians for these molecules were obtainedusing OpenFermion [37] and PySCF [38], with the ba-sis set STO-3G [39], and geometry data retrieved fromPubChem [40] and the NIST Computational ChemistryComparison and Benchmark Database [41]. Mappingfrom second quantization to spin operators was doneusing the Jordan-Wigner transformation [42].In addition to the listed molecules, we also replacedthe coefficients of the Hamiltonian for LiH with ran-dom numbers from a normal distribution with µ = 1 and σ = 0 . , to show the vanishing effect of our modi-fication whenever all weights are similar. These resultsare labelled Random . RandomHOHFHNLiHBHBeH CH NH BH H OBH CH NH CH O N NOBeOLiFCOCNBNLiOHHBOHCNHOFCHOCHFHNOH NOCH ONH FCH FCH FCH LiH NOOCH LiBH CH OHC H O C H ε n /ε L or δ n /δ L Suppression of simulation errorfor a given gate cost
FIG. 3. Ratio of the errors obtained without, versus with,our modification for different molecules, using a time step of t ∞ . Errors were evaluated at cost values C L = (1 . . . L .Each line represents the ratio of errors at some cost C L .The advantage of our modified algorithm therefore increasesleft-to-right. Top-bottom split data indicates ratios of errorbounds ε n /ε L at the top and ratios of numerically obtainederrors δ n /δ L at the bottom. If there is no split, lines repre-sent error bounds only. . . . RandomHOHFHNLiHBHBeH CH NH BH H OBH CH NH CH O N NOBeOLiFCOCNBNLiOHHBOHCNHOFCHOCHFHNOH NOCH ONH FCH FCH FCH LiH NOOCH LiBH CH OHC H O C H ( C n − C L ) /L Saving in gate costfor a given simulation error
FIG. 4. Difference between the cost of full expansions C n = nL to order n = 1 . . . and the cost of an iterativelyconstructed circuit C L to arrive at the same error, normal-ized to the cost of one full order L , for each molecule. Theadvantage of our modified algorithm therefore increases left-to-right. The time step size is t ∞ . Top-bottom split dataindicates differences for error bounds at the top and for nu-merically obtained errors on the bottom. If there is no split,lines represent error bounds only. Figure 2 shows the error bounds as well as the numer-ically evaluated exact errors per unit time for hydrogenfluoride. Compared to the expansion to full orders, wesee that our modification leads to a much quicker de-crease of the error bound as well as the exact error inthe range < C L < L , followed by very similar con-vergence for C L > L . This pattern is consistent withthe convergence we observed for all other molecules wecalculated.To summarize the results for all molecules, we ob-tained the ratio of the errors of the original and themodified version at cost values C L = nL , with n =1 . . . , where the time step was set to t ∞ for each re-spective molecule. The results are depicted in Fig. 3.Across the listed molecules, our modification consis-tently yields errors roughly one order of magnitudelower than the unmodified method at equivalent costs,with some ratios as low as 3 and some as high as 100.We also compared the cost required to obtain a cer-tain error threshold. To this end, the errors of the ex-pansions δ n to full orders n were calculated, and thecost C L of the modified version to yield the same errorwas recorded. The results are depicted in Fig. 4. Us-ing the modified method leads to saving approximatelyone order in most cases, i.e. the accuracy obtained byexpanding n full orders can be produced with a cost of C L = ( n − L .Our results show no strong correlation with neitherthe number of qubits in the Hamiltonian nor the numberof terms L . Therefore we presume that these propertieswill also hold for other chemical Hamiltonians obtainedin the same way. IV. Conclusion
We demonstrated that a natural extension of themethod proposed in [29] can lead to noticeable improve-ments in the convergence of the approximation, if usedfor electronic structure Hamiltonians of molecules. Theasymptotic behaviour is equivalent; however, minimiz-ing the required number of gates will be important forimplementations on actual quantum hardware.Our modification does not need the introduction of any new subroutines. It only rearranges some parts ofthe gate construction to facilitate quicker convergenceof both the error bound and the actual error.Due to the lack of analytic relations between the am-plitudes in the Hamiltonians we investigated, only nu-meric results are available. However, because of therelatively large sample size of molecules we considered,it stands to reason that this behaviour will generalizeto a large portion of electronic structure Hamiltonians.The aforementioned methods of qubitization andquantum signal processing have been shown to ex-hibit better scaling for several types of Hamiltoni-ans [27, 28, 30, 43]. However, there are instances wherethey are less suited, one prominent example being forsimulating time dependent Hamiltonians. Even for in-trinsically time-independent cases, introducing a timedependence by transforming to a rotating frame can bebeneficial if the Hamiltonian is diagonally dominant. Incontrast to qubitization and quantum signal processing,the approach of the truncated Taylor series in [29] canbe applied to such time dependent cases with reasonableoverhead, as shown in Refs. [44, 45]. Investigating thesuitability of our modification to such problems wouldtherefore be an interesting question for future work.Furthermore, combining our proposed adaptationswith the improvements by Novo and Berry [46], whoadd an additional correction step to the method, couldalso be worth exploring.
Acknowledgments
The authors would like to acknowledge the use ofthe University of Oxford Advanced Research Comput-ing (ARC) facility in carrying out this work. Thiswork was completed while ETC was at the Universityof Sheffield. ETC was supported by the EPSRC (grantno. EP/M024261/1). SCB acknowledges support fromthe EU Flagship project AQTION, the NQIT Hub(EP/M013243/1) and the QCS Hub (EP/T001062/1).The authors also thank Bálint Koczor and Sam McAr-dle for useful discussions, as well as Yingkai Ouyang forcomments on the manuscript. [1] R. P. Feynman. Simulating physics with computers.
In-ternational Journal of Theoretical Physics , 21(6):467–488, Jun 1982. doi .[2] S. Lloyd. Universal quantum simulators.
Science , 273:1073–1078, Aug 1996. doi .[3] D. S. Abrams and S. Lloyd. Simulation of many-bodyFermi systems on a universal quantum computer.
Phys-ical Review Letters , 79(13):2586–2589, Sep 1997. doi . DOI 10.5281/zenodo.22558 [4] G. Ortiz, J. E. Gubernatis, E. Knill, and R. Laflamme.Quantum algorithms for fermionic simulations.
PhysicalReview A , 64(2), Jul 2001. doi .[5] A. Aspuru-Guzik. Simulated quantum computation ofmolecular energies.
Science , 309(5741):1704–1707, Sep2005. doi .[6] D. W. Berry, G. Ahokas, R. Cleve, and B. C. Sanders.Efficient quantum algorithms for simulating sparseHamiltonians.
Communications in Mathematical Phy-sics , 270(2):359–371, Dec 2006. doi .[7] H. Wang, S. Kais, A. Aspuru-Guzik, and M. R. Hoff-mann. Quantum algorithm for obtaining the energy spectrum of molecular systems.
Physical ChemistryChemical Physics , 10(35):5388, 2008. doi .[8] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik.Simulation of electronic structure Hamiltonians usingquantum computers.
Molecular Physics , 109(5):735–750, 2011. doi .[9] E. Campbell. Random compiler for fast Hamiltoniansimulation.
Physical Review Letters , 123(7):070503,Aug 2019. doi .[10] A. M. Childs and Y. Su. Nearly optimal lattice simula-tion by product formulas.
Physical Review Letters , 123(5):050503, Aug 2019. doi .[11] A. M. Childs, A. Ostrander, and Y. Su. Faster quan-tum simulation by randomization.
Quantum , 3:182, Sep2019. doi .[12] Y. Ouyang, D. R. White, and E. T. Campbell. Compila-tion by stochastic Hamiltonian sparsification.
Quantum ,4:235, Feb 2020. doi .[13] H. F. Trotter. On the product of semi-groups of opera-tors.
Proceedings of the American Mathematical Society ,10(4):545–551, 1959. doi .[14] M. Suzuki. Generalized Trotter’s formula and system-atic approximants of exponential operators and innerderivations with applications to many-body problems.
Communications in Mathematical Physics , 51(2):183–190, 1976. doi .[15] D. Wecker, B. Bauer, B. K. Clark, M. B. Hastings, andM. Troyer. Gate-count estimates for performing quan-tum chemistry on small quantum computers.
PhysicalReview A , 90:022305, Aug 2014. doi .[16] R. Babbush, J. McClean, D. Wecker, A. Aspuru-Guzik,and N. Wiebe. Chemical basis of Trotter-Suzuki errorsin quantum chemistry simulation.
Physical Review A ,91(2), Feb 2015. doi .[17] M. B. Hastings, D. Wecker, B. Bauer, and M. Troyer.Improving quantum algorithms for quantum chemistry,2014. ar x iv .[18] D. Poulin, M. B. Hastings, D. Wecker, N. Wiebe, A. C.Doherty, and M. Troyer. The Trotter step size requiredfor accurate quantum simulation of quantum chemistry,2014. ar x iv .[19] J. R. McClean, R. Babbush, P. J. Love, and A. Aspuru-Guzik. Exploiting locality in quantum computation forquantum chemistry. The Journal of Physical ChemistryLetters , 5(24):4368–4380, Dec 2014. doi .[20] A. M. Childs, Y. Su, M. C. Tran, N. Wiebe, and S. Zhu.A theory of Trotter error, 2019. ar x iv .[21] A. M. Childs and N. Wiebe. Hamiltonian simulationusing linear combinations of unitary operations. Quan-tum Information & Computation , 12(11–12):901–924,Nov 2012. doi .[22] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, andR. D. Somma. Exponential improvement in precisionfor simulating sparse Hamiltonians. In
Proceedings ofthe Forty-Sixth Annual ACM Symposium on Theory ofComputing , STOC ’14, page 283–292, New York, NY,USA, 2014. Association for Computing Machinery. doi . [23] G. H. Low, V. Kliuchnikov, and N. Wiebe. Well-conditioned multiproduct Hamiltonian simulation,2019. ar x iv .[24] D. W. Berry, A. M. Childs, and R. Kothari. Hamilto-nian Simulation with nearly optimal dependence on allparameters. In , pages 792–809, 2015. doi .[25] G. H. Low and I. L. Chuang. Optimal Hamiltonian sim-ulation by quantum signal processing. Physical ReviewLetters , 118(1):010501, Jan 2017. doi .[26] A. Gilyén, Y. Su, G. H. Low, and N. Wiebe. Quantumsingular value transformation and beyond: exponentialimprovements for quantum matrix arithmetics. In
Pro-ceedings of the 51st Annual ACM SIGACT Symposiumon Theory of Computing , STOC ’19, pages 193–204.ACM Press, 2019. doi .[27] G. H. Low and I. L. Chuang. Hamiltonian simulation byqubitization.
Quantum , 3:163, Jul 2019. doi .[28] D. W. Berry, C. Gidney, M. Motta, J. R. McClean, andR. Babbush. Qubitization of arbitrary basis quantumchemistry leveraging sparsity and low rank factoriza-tion.
Quantum , 3:208, Dec 2019. doi .[29] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, andR. D. Somma. Simulating Hamiltonian dynamics witha truncated Taylor series.
Physical Review Letters , 114:090502, Mar 2015. doi .[30] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. Mc-Clean, A. Paler, A. Fowler, and H. Neven. Encodingelectronic spectra in quantum circuits with linear Tcomplexity.
Physical Review X , 8:041015, Oct 2018. doi .[31] G. H. Low, V. Kliuchnikov, and L. Schaeffer. TradingT-gates for dirty qubits in state preparation and unitarysynthesis, 2018. ar x iv .[32] S. Bravyi and A. Kitaev. Universal quantum computa-tion with ideal Clifford gates and noisy ancillas. Phys.Rev. A , 71:022316, Feb 2005. doi .[33] M. Howard, J. Wallman, V. Veitch, and J. Emerson.Contextuality supplies the magic for quantum compu-tation.
Nature , 510(7505):351–355, Jun 2014. doi .[34] E. T. Campbell, B. M. Terhal, and C. Vuillot. Roadstowards fault-tolerant universal quantum computation.
Nature , 549(7671):172–179, Sep 2017. doi .[35] M. A. Nielsen and I. L. Chuang.
Quantum Computationand Quantum Information , chapter 4, pages 182–184.Cambridge University Press, 2nd edition, 2010. isbn
978 1 107 00217 3 .[36] T. Helgaker, P. Jorgensen, and J. Olsen.
Molecularelectronic-structure theory . John Wiley & Sons, 2000. isbn
978 1 118 53147 1 .[37] J. R. McClean, K. J. Sung, I. D. Kivlichan, Y. Cao,C. Dai, E. S. Fried, C. Gidney, B. Gimby, P. Gokhale,T. Häner, T. Hardikar, V. Havlíček, O. Higgott,C. Huang, J. Izaac, Z. Jiang, X. Liu, S. McArdle,M. Neeley, T. O’Brien, B. O’Gorman, I. Ozfidan, M. D.
Radin, J. Romero, N. Rubin, N. P. D. Sawaya, K. Setia,S. Sim, D. S. Steiger, M. Steudtner, Q. Sun, W. Sun,D. Wang, F. Zhang, and R. Babbush. OpenFermion:The electronic structure package for quantum comput-ers, 2017. ar x iv .[38] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth,S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova,S. Sharma, S. Wouters, and G. K. Chan. PySCF:the Python-based simulations of chemistry framework,2017. doi .[39] W. J. Hehre, R. F. Stewart, and J. A. Pople. Self-consistent molecular-orbital methods. I. Use of Gaus-sian expansions of Slater-type atomic orbitals. TheJournal of Chemical Physics , 51(6):2657–2664, 1969. doi .[40] S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He,Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Za-slavsky, J. Zhang, and E. E. Bolton. PubChem 2019 up-date: improved access to chemical data.
Nucleic AcidsResearch , 47(D1):D1102–D1109, Oct 2018. doi .[41] R. D. Johnson III. NIST Computational ChemistryComparison and Benchmark Database, NIST Standard Reference Database Number 101, Release 20, Aug 2019. url http://cccbdb.nist.gov .[42] P. Jordan and E. Wigner. Über das Paulische Äquiv-alenzverbot.
Zeitschrift für Physik , 47(9):631–651,1928. doi .[43] A. M. Childs, D. Maslov, Y. Nam, N. J. Ross, and Y. Su.Toward the first quantum simulation with quantumspeedup.
Proceedings of the National Academy of Sci-ences , 115(38):9456–9461, Sep 2018. doi .[44] G. H. Low and N. Wiebe. Hamiltonian simulation inthe interaction picture, 2018. ar x iv .[45] R. Babbush, D. W. Berry, J. R. McClean, and H. Neven.Quantum simulation of chemistry with sublinear scalingin basis size. npj Quantum Information , 5(1), Nov 2019. doi .[46] L. Novo and D. W. Berry. Improved Hamiltonian sim-ulation via a truncated Taylor series and corrections,2016. ar x iv . A. Proofs
For completeness and convenience we collect some of the results we used in this appendix, including some thatmay be well known.
Lemma 1.
The optimal choice for the number of amplification steps is ν = 1 , resulting in s L = 2 .Proof. For unitary U L , the operator Q ν W , with Q as defined in Eq. (4), would have the effect [22] Q ν W | i | ψ i = sin [(2 ν + 1) sin − ( s − L )] | i U L | ψ i + cos [(2 ν + 1) cos − ( N )] | ⊥ , Φ i . For any given number of amplification steps ν , the amplitude of the desired state | i U L | ψ i can be tuned to 1 bysetting t such that s L fulfils s L = sin (cid:18) π ν + 2 (cid:19) − ∼ ν + 2 π . (A1)For this argument it is sufficient to analyze the full expansion to order n . To find the optimal number ν weconsider the operator Q ν W , which contains the most expensive operator W a total of ν + 1 times. Therefore thecost is approximately linear in ν , meaning it is also linear in s n . However, we know that s n = n X k =0 t k k ! L − X ℓ =0 α ℓ | {z } :=Λ ! k = n X k =0 t k Λ k k ! ≈ e Λ t , (A2)where the rightmost approximation holds for sufficiently large n . Equations (A1) and (A2) imply that the cost isexponential in t , indicating there is no benefit in amplifying more than once. Exact numerical evaluation showsthat for n → ∞ , one or two amplification steps ( ν ∈ { , } ) yield approximately equivalent time-per-gate, but thesmaller t of ν = 1 leads to quicker convergence in n . Consequently, it is best to choose ν = 1 . This choice forces s L to satisfy s L = sin (cid:16) π (cid:17) − = 2 . Lemma 2.
The action of Π A on a product state | i | ψ i is given by [29] Π A | i | ψ i = | i (cid:18) − s L U L U † L U L + 3 s L U L (cid:19) | ψ i . Proof.
We can explicitly expand R and use that Π = Π as well as Π | i | ψ i = | i | ψ i to find Π A | i | ψ i = − Π W R W † R W | i | ψ i = − Π W (2Π − ) W † (2Π − ) W | i | ψ i = ( − W Π W † Π W + 2Π WW † Π W + 2Π W Π W † W − Π WW † W ) | i | ψ i = ( − W Π W † Π W + 3Π W ) | i | ψ i = ( − W Π | {z } s L ( | ih |⊗ U L ) Π W † ΠΠ W Π + 3Π W Π) | i | ψ i = | i (cid:18) − s L U L U † L U L + 3 s L U L (cid:19) | ψ i as claimed. Lemma 3.
If used in
W | i | ψ i , P ⋆ has the same effect as P , i.e. W | i | ψ i = ( P † ⊗ ) S ( P ⊗ ) | i | ψ i =( P ⋆ † ⊗ ) S ( P ⋆ ⊗ ) | i | ψ i Proof. P ⋆ on any of the c k registers has the action P ⋆ | i c k = 1 √ Λ k L k − X ℓ =0 √ α ℓ | ℓ i c k and on the q register P ⋆ | i q = 1 p N q κ X k =0 vuut t k k ! k Y j =1 Λ j | k i q where N q = P ∞ k =0 t k k ! Q kj =1 Λ j and κ is the largest nonzero index in L . Therefore S ( P ⋆ ⊗ ) | i | ψ i = 1 p N q Q κk =1 Λ k κ X k =0 vuuut k Y j =1 Λ j t k k ! | k i q k O j =1 L j − X ℓ =0 | ℓ i c j √ α ℓ ˜ h ℓ κ O j = k +1 L j − X ℓ =0 | ℓ i c j √ α ℓ | ψ i Transforming back with P ⋆ † and projecting onto the ancilla | i yields Π( P ⋆ † ⊗ ) S ( P ⋆ ⊗ ) = Π N q Q κk =1 Λ k κ X k =0 k Y j =1 Λ j t k k ! | i q k O j =1 L j − X ℓ =0 | i c j α ℓ ˜ h ℓ κ O j = k +1 L j − X ℓ =0 | i c j α ℓ | {z } Λ j | i cj | ψ i = | i N q Q κk =1 Λ k κ X k =0 κ Y j =1 Λ j t k k ! k Y j =1 L j − X ℓ =0 α ℓ ˜ h ℓ | ψ i = 1 N q | i κ X k =0 t k k ! k Y j =1 L j − X ℓ =0 α ℓ ˜ h ℓ | ψ i = 1 N q | i U L | ψ i = Π W | i | ψ i which is Eq. (3) and we see that N q = s L as defined in Eq. (6). Lemma 4.
The error of a single time step δ L when using t ∞ = log(2) / Λ can be bounded by δ L ( t ∞ ) ≤ − s L ( t ∞ ) =: ε L . Proof.
For easier notation, we first consider fully expanded orders and define ˜∆ n ( t ) := U n ( t ) − U ( t ) ε n := s ∞ ( t ∞ ) − s n ( t ∞ ) = 2 − s n ( t ∞ ) and observe that || ˜∆ n ( t ) || = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X k =1 ( − it ) k k ! L − X ℓ ,...,ℓ k =0 α ℓ . . . α ℓ k h ℓ . . . h ℓ k − ∞ X k =1 ( − it ) k k ! L X ℓ ,...,ℓ k α ℓ . . . α ℓ k h ℓ . . . h ℓ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∞ X k = n +1 ( − it ) k k ! L − X ℓ ,...,ℓ k =0 α ℓ . . . α ℓ k h ℓ . . . h ℓ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∞ X k = n +1 t k k ! L − X ℓ ,...,ℓ k =0 α ℓ . . . α ℓ k || h ℓ || . . . || h ℓ k || | {z } = ∞ X k =1 t k k ! L − X ℓ ,...,ℓ k =0 α ℓ . . . α ℓ k − n X k =1 t k k ! L − X ℓ ,...,ℓ k =0 α ℓ . . . α ℓ k = s ∞ ( t ) − s n ( t ) which means || ˜∆ n ( t ∞ ) || ≤ s ∞ ( t ∞ ) − s n ( t ∞ ) = ε n . Using these in our definition of ∆ n yields − ∆ n ( t ∞ ) = ˜ A n ( t ∞ ) − U ( t ∞ )= 3 s n ( t ∞ ) U n ( t ∞ ) − s n ( t ∞ ) U n ( t ∞ ) U † n ( t ∞ ) U n ( t ∞ ) − U ( t ∞ )= 32 − ε n | {z } = + εn + O ( ε n ) ( U + ˜∆ n ) − − ε n ) | {z } = + εn + O ( ε n ) = U +2 ˜∆ n + U ˜∆ † n U + O ( ˜∆ n ) z }| { ( U + ˜∆ n )( U + ˜∆ n ) † ( U + ˜∆ n ) − U = ˜∆ n (cid:18) − ε n (cid:19) − U ˜∆ † n U (cid:18)
12 + 3 ε n (cid:19) + O ( ˜∆ n ) + O ( ε n ) . Now we can finally bound the error to δ n = || ∆ n ( t ∞ ) || ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˜∆ n (cid:18) − ε n (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) U ˜∆ † n U (cid:18)
12 + 3 ε n (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + O ( ε n ) ≤ ε n ε n O ( ε n ) = ε n + O ( ε n ) , which straightforwardly extends to δ L with ε L for partial orders. Lemma 5.
The logarithmic inverse error bound of a single time step for full orders log (cid:0) ε − n (cid:1) scales like log 1 ε n = O ( n log n ) Proof.
We can bound the residual of the Taylor series by ε n = ∞ X k = n +1 ( log 2 z}|{ t ∞ Λ) k k != log n n ! ∞ X k =1 n ! log k k + n )! ≤ log n n ! e log 2 = 2 log n n ! n ! ≤ e n n +1 / e − n to find ε n ≤ e − n log n e n n +1 / log ε n ≤ log 2 − n − n log log 2 − − (cid:18) n + 12 (cid:19) log n. Therefore log 1 ε n = O ( n log n ) . Corollary 6.
Because the total complexity is of the order C n = nL , the complexity when using full orders dependingon the error bound ε , which we will call C ε, full scales like C ε, full = O (cid:18) L log ε log log ε (cid:19) Proof.
We can use the inequality in Lemma 5 n < log 1 ε n to replace the n in the logarithm of Lemma 5 and find n = O log ε n log log ε n ! and therefore C ε, full = O (cid:18) L log ε log log ε (cid:19) . Lemma 7.
The bound for the logarithmic inverse error of the modified version log ε − L scales like O (cid:0) C L L log C L L (cid:1) ≤ log ε − L < O ( C L log C L ) , depending on the Hamiltonian.Proof. The left inequality follows immediately from the worst case that α = α = . . . = α L . In this case themodification is equivalent to the original method and Lemma 5 with n = C L /L holds.In the other extreme case of α ℓ α → ∀ ℓ ∈ { . . . L } , one term dominates the whole Hamiltonian, and adding h in some order of the expansion equates to adding that whole order, effectively reducing L to 1. Therefore Lemma 5with L = 1 defines an upper bound for the error scaling. Corollary 8.
The complexity of our modified version depending on the simulation error bound ε , which we call C ε , is bounded by O (cid:18) log ε log log ε (cid:19) < C ε ≤ O (cid:18) L log ε log log ε (cid:19) , Proof.
The same reasoning as in Corollary 6 applies to the bounds established in Lemma 7.
Lemma 9.
The error of r time steps || U r − ˜ A r L || is bounded by r times the error of a single time step δ L = || ∆ L || = || U − ˜ A L || , up to order δ L .Proof. We use the definition of ∆ = U − ˜ A and substitute for ˜ A . || U r − ˜ A r L || = || U r − ( U − ∆) r || = || U r − U r + r X k =1 U k − ∆ U r − k + O (∆ ) ||≤ r X k =1 || U k − ∆ U r − k || + ||O (∆ ) || ≤ r X k =1 || ∆ || + O ( δ ) = rδ + O ( δ ) Corollary 10.
The complexity of simulating for a time τ with a total error bound ǫ is bounded by O Λ τ log Λ τǫ log log Λ τǫ ! < C ǫ ≤ O L Λ τ log Λ τǫ log log Λ τǫ ! . Proof.