Turning Gate Synthesis Errors into Incoherent Errors

Abstract

Using error correcting codes and fault tolerant techniques, it is possible, at least in theory, to produce logical qubits with significantly lower error rates than the underlying physical qubits. Suppose, however, that the gates that act on these logical qubits are only approximations of the desired gates. This can arise, for example, in synthesizing a single qubit unitary from a set of Clifford and T gates; for a generic such unitary, any finite sequence of gates only approximates the desired target. In this case, errors in the gates can add coherently so that, roughly, the error $\epsilon$ in the unitary of each gate must scale as $\epsilon \lesssim 1/N$, where $N$ is the number of gates. If, however, one has the option of synthesizing one of several unitaries near the desired target, and if an average of these options is closer to the target, we give some elementary bounds showing cases in which the errors can be made to add incoherently by averaging over random choices, so that, roughly, one needs $\epsilon \lesssim 1/\sqrt{N}$. We remark on one particular application to distilling magic states where this effect happens automatically in the usual circuits.



Matthew B. Hastings

Station Q, Microsoft Research, Santa Barbara, CA 93106-6105, USA
Quantum Architectures and Computation Group, Microsoft Research, Redmond, WA 98052, USA

There are several different settings in quantum computation where the same theme occurs: there is some error which can be reduced at the cost of an overhead, either in space (number of qubits) or time (depth of quantum circuit) or both [1–4]. To give one architecture where this theme occurs three times, imagine a system of noisy physical qubits. Then, imagine implementing an appropriate CSS code to produce a system of logical qubits with a smaller noise and with Clifford gate operations at high accuracy; the error rate decreases as the code distance increases, which requires an increasing number of physical qubits for the same number of logical qubits.
Then, taking high accuracy Clifford operations as given, if one can approximately implement a T gate at sufficiently high accuracy, one can distill magic states to produce T gates with higher accuracy, again at the cost of an overhead [5–7]. Finally, using Clifford and T gates from the first two steps, one can approximate arbitrary single qubit unitaries by a sequence of these gates, up to an error $\epsilon$ that is exponentially small in the length of the sequence; this is called gate synthesis [8, 9].

In the simplest analysis of such a scheme, one tries to make the error rate in Clifford operations negligible so that one can assume at later steps that they are exact, and then one tries to make the error rate in the T gates negligible so that in the synthesis operation one can assume that the T gates are exact. It is important to know how serious the effects of errors are, though, as one would like to incur the minimal necessary overhead. In this paper, we do not focus on errors which cause a large change in the state, such as a single spin flip; these are the sorts of errors that are handled by the CSS code. Instead, we focus on errors where the system evolves under a unitary which is close to the desired one but not exactly correct, as might occur in gate synthesis.

As a toy example, consider a single qubit. Suppose that one wishes to apply $N$ successive unitaries $\exp(i\theta\sigma_Z)$ to this qubit. Suppose however that one instead applies $N$ unitaries $\exp(i\theta'\sigma_Z)$ with $\theta' - \theta = \epsilon$. Then, in order for the error in the evolution to be small, one needs $|\epsilon| \lesssim 1/N$. Suppose instead that each of the unitaries implements $\exp(i(\theta \pm \epsilon)\sigma_Z)$, with the sign $\pm$ chosen uniformly and independently for each unitary. Then, the net error results from a random walk and if $|\epsilon| \lesssim 1/\sqrt{N}$ the error will likely be small. Roughly (we do not give a precise definition) we call the first kind of scaling “coherent” and the second kind of scaling “incoherent”.
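The two scalings in the toy example can be illustrated with a short numerical sketch. It assumes, purely for illustration, that the qubit starts in the $\sigma_x = +1$ state and that the error is measured by the infidelity with the ideal final state; under those assumptions an accumulated angle error $\Delta$ gives infidelity $\sin^2(\Delta)$. The function names are hypothetical and not taken from the paper.

```python
import math
import random


def coherent_infidelity(n_gates, eps):
    """Infidelity when every gate over-rotates by +eps (angle error N * eps)."""
    return math.sin(n_gates * eps) ** 2


def random_sign_infidelity(n_gates, eps):
    """Infidelity averaged over independent uniform signs, in closed form.

    E[sin^2(eps * S)] = (1 - cos(2 * eps) ** N) / 2, where S is a sum of
    N independent +/-1 signs.
    """
    return 0.5 * (1.0 - math.cos(2.0 * eps) ** n_gates)


def random_sign_infidelity_mc(n_gates, eps, n_runs, seed=0):
    """Monte Carlo estimate of the same average, resampling signs on each run."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        net_sign = sum(rng.choice((-1, 1)) for _ in range(n_gates))
        total += math.sin(eps * net_sign) ** 2
    return total / n_runs


if __name__ == "__main__":
    n_gates, eps = 100, 0.01
    print("coherent        :", coherent_infidelity(n_gates, eps))
    print("averaged (exact):", random_sign_infidelity(n_gates, eps))
    print("averaged (MC)   :", random_sign_infidelity_mc(n_gates, eps, 4000))
```

With $N = 100$ and $\epsilon = 0.01$ the coherent case has $N\epsilon = 1$, so the error is of order one, while the averaged error is close to $N\epsilon^2$.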
In this paper, we give some elementary bounds showing how to achieve the incoherent scaling if one can approximate the desired unitary by an average over other unitaries. We then explain how this would be applied in practice, by repeatedly re-running the same algorithm with different random choices of unitaries.

The effect in this toy example would occur in simulating a Hamiltonian by Trotter-Suzuki methods [10]; since the same unitaries are applied repeatedly to reach a time large compared to the time step, small errors in angle can add coherently. Indeed, the toy example is an example of this method using just a single term in the Trotter-Suzuki decomposition. While it was noticed before [11] in numerical studies of Trotter-Suzuki that one could obtain good results by choosing the angle of the evolution randomly at each time step to obtain the correct average angle, here we give a general result. Further, we propose also varying the angle from one run of the algorithm to the next; in the toy example if one chooses the same random angles on every run, then typically there is an error in expectation value (of $\sigma_y$ if the system is initialized in the $\sigma_x = +1$ state) which is $\sim \epsilon\sqrt{N}$ as opposed to the $\sim \epsilon^2 N$ error arising from averaging over angles that we show below.

GENERAL RESULTS

For a general setting suppose that we wish to implement a quantum circuit composed of a sequence of unitary gates $U_1, U_2, \ldots, U_N$, so that the circuit implements unitary transformation $U$ defined by

$$U = U_N U_{N-1} \cdots U_1. \tag{1}$$

Now suppose that we are only able to approximate this on some quantum computer. For some or all of the gates, we implement the gates with some error, so that rather than performing unitary $U_i$, we instead perform some other unitary $V_i$, with $V_i \approx U_i$. Let

$$V = V_N V_{N-1} \cdots V_1. \tag{2}$$

For applications in quantum computing, we initialize the system in some initial (possibly mixed) state $\rho$, then apply the quantum circuit, and then measure some operator $M$. The error in expectation is $\mathrm{tr}(U\rho U^\dagger M) - \mathrm{tr}(V\rho V^\dagger M)$. This is upper bounded by $\mathrm{tr}(|U\rho U^\dagger - V\rho V^\dagger|) \cdot \|M\|$, where $\mathrm{tr}(|\ldots|)$ denotes the trace norm (i.e., the sum of the singular values) and $\|\ldots\|$ denotes the operator norm (i.e., the maximum singular value).

Of course, trivially we have $\|V - U\| \le \sum_i \|V_i - U_i\|$, which immediately implies the bound $\mathrm{tr}(|U\rho U^\dagger - V\rho V^\dagger|) \le 2\sum_i \|V_i - U_i\|$. However, we are interested in finding a better bound with incoherent error scaling.

Suppose that for each $i$, there are several unitaries, $W_{i,1}, W_{i,2}, \ldots, W_{i,n(i)}$ that we can implement, where $n(i)$ is some integer depending on $i$, and we can take $V_i$ to be any given one of these unitaries. That is, we have the option to choose some integer $a_i$ and then set $V_i = W_{i,a_i}$. We are able to choose these $a_i$ independently for each $i$. The question then is how to do this to minimize error.
The key idea here is that while it may be difficult to show that any given sequence minimizes error, it will be easier to show that an average of the state value $V\rho V^\dagger$ over an appropriate ensemble of sequences gives a good approximation to $U\rho U^\dagger$.

Before continuing, we briefly introduce the concept of the diamond norm. Given a linear map $\mathcal{E}$ on matrices, a natural norm is $\|\mathcal{E}\|_1 \equiv \max_{\sigma,\, \mathrm{tr}(|\sigma|)=1} \mathrm{tr}(|\mathcal{E}(\sigma)|)$. The diamond norm is defined by stabilizing this norm; it is $\|\mathcal{E}\|_\diamond \equiv \max_{\sigma,\, \mathrm{tr}(|\sigma|)=1} \mathrm{tr}(|(\mathcal{E} \otimes I)(\sigma)|)$, where we have tensored $\mathcal{E}$ with the identity channel on an auxiliary Hilbert space of sufficiently large dimension. The diamond norm is often used as a way to estimate the difference between two such linear maps; a bound on the diamond norm is a stronger statement than a bound on the norm $\|\ldots\|_1$. The diamond norm provides a useful language for the following results and it is important because it helps understand that if the unitaries $W_a$ below act only on a subsystem of the full system and act trivially on the rest of the system, then the norm bounds can be computed on that subsystem.

Lemma 1.

Let $W_a$ be unitaries. Suppose that there is a probability distribution $q(a)$ such that the following holds. Let $\sum_a q(a) W_a \equiv W$. Let $\delta \equiv \sum_a q(a) \|W_a - W\|^2$. Let $\mathcal{E}$ be the quantum channel defined by $\mathcal{E}(\sigma) = U\sigma U^\dagger$. Let $\mathcal{G}$ be the quantum channel defined by $\mathcal{G}(\sigma) = \sum_a q(a) W_a \sigma W_a^\dagger$. Then,

$$\|\mathcal{E} - \mathcal{G}\|_\diamond \le \delta + 2\|W - U\|. \tag{3}$$

Proof.

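Let $\mathcal{F}$ be the linear map defined by $\mathcal{F}(\sigma) = W\sigma W^\dagger$. We have $\|\mathcal{E} - \mathcal{F}\|_1 \le 2\|W - U\|$. Indeed, we have $\|\mathcal{E} - \mathcal{F}\|_\diamond \le 2\|W - U\|$ since $\|(W - U) \otimes I\| = \|W - U\|$.

We have

$$(\mathcal{G} \otimes I)(\sigma) = \sum_a q(a)\,(W_a \otimes I)\,\sigma\,(W_a \otimes I)^\dagger \tag{4}$$

$$
\begin{aligned}
&= \sum_a q(a)\,\bigl((W_a - W) \otimes I + W \otimes I\bigr)\,\sigma\,\bigl((W_a - W) \otimes I + W \otimes I\bigr)^\dagger \\
&= (W \otimes I)\,\sigma\,(W \otimes I)^\dagger \\
&\quad + \sum_a q(a)\,\bigl((W_a - W) \otimes I\bigr)\,\sigma\,(W \otimes I)^\dagger + \mathrm{h.c.} \\
&\quad + \sum_a q(a)\,\bigl((W_a - W) \otimes I\bigr)\,\sigma\,\bigl((W_a - W) \otimes I\bigr)^\dagger.
\end{aligned} \tag{5}
$$

The second-to-last line of the above equation vanishes, since $\sum_a q(a)(W_a - W) = 0$, while the last line is bounded in trace norm by $\delta\,\mathrm{tr}(|\sigma|)$ since $\|(W_a - W) \otimes I\| = \|W_a - W\|$. So $\|\mathcal{F} - \mathcal{G}\|_\diamond \le \delta$. By a triangle inequality, this implies Eq. (3).

Let $\mathcal{E}_1, \ldots, \mathcal{E}_N$ and $\mathcal{G}_1, \ldots, \mathcal{G}_N$ be quantum channels and let $\mathcal{E} \circ \mathcal{F}$ denote composition of channels $\mathcal{E}, \mathcal{F}$. We have the following inequality:

$$\|\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1\|_\diamond \le \sum_{i=1}^{N} \|\mathcal{E}_i - \mathcal{G}_i\|_\diamond. \tag{6}$$

This follows because $\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1 = (\mathcal{E}_N - \mathcal{G}_N) \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1 + \mathcal{E}_N \circ (\mathcal{E}_{N-1} - \mathcal{G}_{N-1}) \circ \mathcal{G}_{N-2} \circ \cdots \circ \mathcal{G}_1 + \ldots$. Then, use a triangle inequality and use the fact that $\mathrm{tr}(|\mathcal{E}_i(\sigma)|) \le \mathrm{tr}(|\sigma|)$ and $\mathrm{tr}(|\mathcal{G}_i(\sigma)|) \le \mathrm{tr}(|\sigma|)$. This immediately implies:

Lemma 2.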

Suppose that for each $i = 1, \ldots, N$ there is a probability distribution $q_i(a)$, for $a = 1, \ldots, n(i)$, such that the following holds. Let $\sum_a q_i(a) W_{i,a} \equiv W_i$. Let $\delta_i \equiv \sum_a q_i(a) \|W_{i,a} - W_i\|^2$.

Assume $\mathrm{tr}(|\rho|) = 1$. Define $U, V$ by Eqs. (1,2). Let $V_i = W_{i,a(i)}$, with $a(i)$ chosen independently from probability distribution $q_i(a(i))$. Let $E[\ldots]$ denote expectation value. Then,

$$\mathrm{tr}\left(\left|E[V\rho V^\dagger - U\rho U^\dagger]\right|\right) \le \sum_i \left(\delta_i + 2\|W_i - U_i\|\right). \tag{7}$$

Proof.

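Let $\mathcal{E}_i$ be the quantum channel defined by $\mathcal{E}_i(\sigma) = U_i \sigma U_i^\dagger$. Let $\mathcal{G}_i$ be the quantum channel defined by $\mathcal{G}_i(\sigma) = \sum_a q_i(a) W_{i,a} \sigma W_{i,a}^\dagger$. Use Lemma 1 to bound $\|\mathcal{E}_i - \mathcal{G}_i\|_\diamond$. Use Eq. (6) to bound $\|\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1\|_\diamond$. Note that $E[V\rho V^\dagger] = \mathcal{G}_N \circ \cdots \circ \mathcal{G}_1(\rho)$.

APPLICATIONS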

Lemma 2 can be applied in practice as follows. Suppose that one wishes to estimate some expectation value $\mathrm{tr}(U\rho U^\dagger M)$. This expectation value could be estimated by applying the quantum circuit $U$ to the initial state, measuring $M$, and repeating several times to improve statistics. One obtains approximately the same result by applying quantum circuit $V$ to the initial state, measuring $M$, and repeating several times, randomly resampling the unitaries $V_i$ each time to improve statistics, with an error bounded by $\|M\|$ times the error term in Eq. (7).

We now estimate the error in a simple setting. Suppose that for each $i$, either $U_i$ can be implemented exactly (i.e., we can choose $a_i$ such that $V_i = U_i$) or $U_i$ is a rotation of a single qubit by a unitary $\exp(i\theta_i\sigma_Z)$. Suppose for those latter $i$, we have $n(i) = 2$ and $W_{i,1}, W_{i,2}$ are the rotations $\exp(i\theta_{i,1}\sigma_Z)$ and $\exp(i\theta_{i,2}\sigma_Z)$. Suppose we can find probabilities $q_i(1), q_i(2)$ so that $q_i(1)\theta_{i,1} + q_i(2)\theta_{i,2} = \theta_i$.

To estimate $\delta_i$, let $\phi_{i,1} = \theta_{i,1} - \theta_i$ and $\phi_{i,2} = \theta_{i,2} - \theta_i$. We can compute $\|W_i - U_i\|$ by considering a system consisting of just a single qubit, so both $W_i$ and $U_i$ are diagonal 2-by-2 matrices; we get the same result for a system of multiple qubits with the unitaries $U_i, W_i$ just acting on one qubit as we simply tensor by the identity which does not change the operator norm (indeed, this is why the results above held for the diamond norm). We find

$$\|W_i - U_i\| = \left|q_i(1)\exp(i\phi_{i,1}) + q_i(2)\exp(i\phi_{i,2}) - 1\right| = \sqrt{\bigl(q_i(1)\cos(\phi_{i,1}) + q_i(2)\cos(\phi_{i,2}) - 1\bigr)^2 + \bigl(q_i(1)\sin(\phi_{i,1}) + q_i(2)\sin(\phi_{i,2})\bigr)^2}.$$

Note that $1 \ge \cos(\phi) \ge 1 - \phi^2/2$, so that $|q_i(1)\cos(\phi_{i,1}) + q_i(2)\cos(\phi_{i,2}) - 1| \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2$. Note also that, since $q_i(1)\phi_{i,1} + q_i(2)\phi_{i,2} = 0$, the terms linear in the angles cancel and $|q_i(1)\sin(\phi_{i,1}) + q_i(2)\sin(\phi_{i,2})| \le O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3)$. Hence,

$$\|W_i - U_i\| \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2 + O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3). \tag{8}$$

Similarly,

$$\delta_i \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2 + O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3). \tag{9}$$

Thus, in this setting we need to take $\phi_{i,1}, \phi_{i,2} \lesssim 1/\sqrt{N}$ in order to make the error small.
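The bound of Lemma 2 in this simple setting can be checked numerically. The following is a minimal sketch under stated assumptions, not the procedure of the paper: a single qubit, every gate a $\sigma_Z$ rotation, the two available angles taken as $\theta_i \pm \epsilon$ with $q_i(1) = q_i(2) = 1/2$, and diagonal gates stored as their two diagonal entries. All function names are hypothetical. It compares the exactly averaged state with the ideal state, with a fixed coherent choice of $+\epsilon$ on every gate, and with the right-hand side of Eq. (7).

```python
import cmath
import math


def rz_diag(theta):
    """Diagonal entries of exp(i * theta * sigma_Z)."""
    return (cmath.exp(1j * theta), cmath.exp(-1j * theta))


def diag_op_norm(d):
    """Operator norm of a diagonal matrix: largest absolute diagonal entry."""
    return max(abs(x) for x in d)


def conjugate(d, rho):
    """Return D rho D^dagger for a diagonal D and a 2x2 nested-list rho."""
    return [[d[r] * rho[r][c] * d[c].conjugate() for c in range(2)]
            for r in range(2)]


def trace_norm_hermitian(m):
    """Trace norm of a 2x2 Hermitian matrix via its two real eigenvalues."""
    mean = (m[0][0].real + m[1][1].real) / 2.0
    radius = math.hypot((m[0][0].real - m[1][1].real) / 2.0, abs(m[0][1]))
    return abs(mean + radius) + abs(mean - radius)


def subtract(a, b):
    return [[a[r][c] - b[r][c] for c in range(2)] for r in range(2)]


def lemma2_check(thetas, eps, rho):
    """Return (averaged error, coherent error, right-hand side of Eq. (7))."""
    ideal, averaged, coherent = rho, rho, rho
    probs = [0.5, 0.5]
    bound = 0.0
    for theta in thetas:
        u = rz_diag(theta)
        options = [rz_diag(theta + eps), rz_diag(theta - eps)]
        # W_i is the q-weighted average of the available unitaries.
        w_mean = tuple(sum(q * w[k] for q, w in zip(probs, options))
                       for k in range(2))
        delta = sum(q * diag_op_norm((w[0] - w_mean[0], w[1] - w_mean[1])) ** 2
                    for q, w in zip(probs, options))
        gap = diag_op_norm((w_mean[0] - u[0], w_mean[1] - u[1]))
        bound += delta + 2.0 * gap
        ideal = conjugate(u, ideal)
        # Exact expectation over the random choice: apply the channel G_i.
        terms = [conjugate(w, averaged) for w in options]
        averaged = [[sum(q * t[r][c] for q, t in zip(probs, terms))
                     for c in range(2)] for r in range(2)]
        coherent = conjugate(options[0], coherent)
    return (trace_norm_hermitian(subtract(averaged, ideal)),
            trace_norm_hermitian(subtract(coherent, ideal)),
            bound)


if __name__ == "__main__":
    rho_plus = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
    print(lemma2_check([0.3] * 100, 0.01, rho_plus))
```

In this example both the averaged error and the bound are close to $2N\epsilon^2$, while the coherent error grows with $N\epsilon$, which is the behavior expected from Eqs. (8) and (9).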
T GATES BY STATE INJECTION

Another application of the above is to implementing T gates by state injection (see for example Refs. 6, 7). Assume we have an ancilla qubit (the target) in the state $2^{-1/2}\bigl(\exp(i\tfrac{\theta}{2})|0\rangle + \exp(-i\tfrac{\theta}{2})|1\rangle\bigr)$. In state injection, we apply a CNOT gate from another qubit (the control) to this target, and then measure the target in the Z basis. If the measurement result is $|0\rangle$, then (up to a global phase) we implement the unitary $\exp(i\tfrac{\theta}{2}\sigma_Z)$ on the control, while if the measurement is $|1\rangle$, we implement $\exp(-i\tfrac{\theta}{2}\sigma_Z)$. If $\theta = \pi/4$, then this gives a way to implement the T gate, assuming we can implement the S gate $\exp(-i\tfrac{\pi}{4}\sigma_Z)$: if the measurement outcome is $|1\rangle$, we follow the measurement by an S gate. This way, with probability $1/2$ we implement $\exp(i\tfrac{\theta}{2}\sigma_Z)$ on the control, and with probability $1/2$ we implement $\exp(i(\tfrac{\pi}{4} - \tfrac{\theta}{2})\sigma_Z)$ on the control. This is the situation considered above, where if this state injection is used to implement the $i$-th unitary in the circuit we have $\theta_{i,1} = \theta/2$, $\theta_{i,2} = \pi/4 - \theta/2$, $q_i(1) = q_i(2) = 1/2$, and $\theta_i = \pi/8$. The randomness of the outcome of state injection automatically produces the needed averaging over two different angles.

Distillation schemes produce ancillas in states $2^{-1/2}\bigl(\exp(i\tfrac{\theta}{2})|0\rangle + \exp(-i\tfrac{\theta}{2})|1\rangle\bigr)$ with $\theta \approx \pi/4$. There can be both random errors (so that the value of $\theta$ varies from one ancilla to another) and systematic errors (so that the average value of $\theta$ differs from $\pi/4$). Let $\mu_2$ denote the average, over ancillas, of $(\theta - \pi/4)^2$ and let $\mu_3$ denote the average of $|\theta - \pi/4|^3$. It does not matter whether or not the angles are independent between different ancillas. Let $\rho$ be some initial state and $U\rho U^\dagger$ be the result of some quantum circuit including T gates and $\sigma$ be the result of the quantum circuit with the T gates performed (approximately) with state injection, choosing a random ancilla from this ensemble each time we do state injection. Assume that there are $S$ different state injections. Then, $\mathrm{tr}(|U\rho U^\dagger - \sigma|) \le S\mu_2 + S \cdot O(\mu_3)$. Note that $\mu_3$ is bounded by a constant times $\mu_2$ since we can assume that all angles are bounded by $\pi$.

One may also consider a more general case where the ancilla qubit may be in state $u|0\rangle + v|1\rangle$, with $u = \cos(\tau)\exp(i\tfrac{\theta}{2})$ and $v = \sin(\tau)\exp(-i\tfrac{\theta}{2})$. We now give a separate treatment of this case. The effect of the state injection protocol is to implement the quantum channel $\mathcal{G}$ defined by $\mathcal{G}(\sigma) = \sum_i A_i \sigma A_i^\dagger$, with

$$A_1 = \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} \quad \text{and} \quad A_2 = \begin{pmatrix} v\exp(i\tfrac{\pi}{4}) & 0 \\ 0 & u\exp(-i\tfrac{\pi}{4}) \end{pmatrix}.$$

Let $A = (1/2)(A_1 + A_2)$, let $\Delta_1 = A_1 - A$ and $\Delta_2 = A_2 - A$. Then,

$$\mathcal{G}(\sigma) = 2A\sigma A^\dagger + \sum_i \Delta_i \sigma \Delta_i^\dagger. \tag{10}$$

Defining $\mathcal{E}(\sigma) = \exp(i\tfrac{\pi}{8}\sigma_Z)\,\sigma\,\exp(-i\tfrac{\pi}{8}\sigma_Z)$, we have

$$\|\mathcal{E} - \mathcal{G}\|_\diamond \le 2\,\|\exp(i\tfrac{\pi}{8}\sigma_Z) - \sqrt{2} \cdot A\| + \sum_i \|\Delta_i\|^2. \tag{11}$$

Let $\omega = \exp(i\tfrac{\pi}{4})$. Note that $A = \mathrm{diag}\bigl((u + \omega v)/2,\ (v + \omega^{-1}u)/2\bigr)$, so that

$$\|\exp(i\tfrac{\pi}{8}\sigma_Z) - \sqrt{2} \cdot A\| = |\omega^{1/2} - (u + \omega v)/\sqrt{2}|. \tag{12}$$

The right-hand side of the above equation is second order in $\tau - \pi/4$ and $\theta - \pi/4$; i.e., it is $O((\tau - \pi/4)^2) + O((\theta - \pi/4)^2) + O((\tau - \pi/4)(\theta - \pi/4))$, though it is not analytic near $\theta = \tau = \pi/4$. Also, $\|\Delta_i\|^2 = (1/4)\,|u - v\exp(i\tfrac{\pi}{4})|^2$, which is second order in $\tau - \pi/4$ and $\theta - \pi/4$.
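The second-order behavior of the two terms on the right-hand side of Eq. (11) can be checked numerically. The sketch below is an illustration under stated assumptions: the diagonal Kraus operators $A_1, A_2$ are stored as pairs of diagonal entries, the operator norm of a diagonal matrix is taken as its largest absolute entry, and the function name `injection_terms` is hypothetical.

```python
import cmath
import math


def injection_terms(tau, theta):
    """Return the two terms on the right-hand side of Eq. (11).

    First value:  2 * || exp(i pi/8 sigma_Z) - sqrt(2) * A ||
    Second value: sum_i || Delta_i ||^2
    """
    u = math.cos(tau) * cmath.exp(1j * theta / 2)
    v = math.sin(tau) * cmath.exp(-1j * theta / 2)
    omega = cmath.exp(1j * math.pi / 4)
    a1 = (u, v)                   # Kraus operator A_1 (diagonal entries)
    a2 = (v * omega, u / omega)   # Kraus operator A_2 (diagonal entries)
    a = tuple((x + y) / 2 for x, y in zip(a1, a2))
    target = (cmath.exp(1j * math.pi / 8), cmath.exp(-1j * math.pi / 8))
    first = 2 * max(abs(t - math.sqrt(2) * x) for t, x in zip(target, a))
    second = sum(max(abs(x - m) for x, m in zip(ai, a)) ** 2
                 for ai in (a1, a2))
    return first, second


if __name__ == "__main__":
    for h in (4e-2, 2e-2, 1e-2):
        first, second = injection_terms(math.pi / 4 + h, math.pi / 4 + h)
        # For a second-order quantity, (first + second) / h**2 is
        # roughly constant as h shrinks.
        print(h, (first + second) / h ** 2)
```

Both terms vanish at $\tau = \theta = \pi/4$, and halving the deviation reduces their sum by roughly a factor of four, as expected for a second-order quantity.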
Hence, $\|\mathcal{E} - \mathcal{G}\|_\diamond$ is second order in $\tau - \pi/4$ and $\theta - \pi/4$. (We omit the exact expression to second order, which can be found by Taylor series.)

DISCUSSION

We have considered the effects of errors in gates on a quantum computation. An elementary calculation shows that an appropriate averaging can in some cases significantly improve the scaling, so that instead of requiring error $\epsilon \lesssim 1/N$ for $N$ gates one instead requires only $\epsilon \lesssim 1/\sqrt{N}$. This averaging can be implemented in some cases by using a different random choice of gates each time the algorithm is run. In some cases, such as in T gates, the averaging occurs automatically. In an architecture as mentioned at the start, where one has errors in the T gates and where one uses a sequence of such approximate T gates and Clifford gates to synthesize an approximation to another unitary, the diamond norm error in the approximation of the T gates adds to the diamond norm error in the synthesis.

Acknowledgments:

I thank D. Wecker for useful discussions.

[1] D. Aharonov and M. Ben-Or, “Fault-tolerant quantum computation with constant error”, in STOC ’97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, 1997.
[2] J. Preskill, “Reliable quantum computers”, Proc. R. Soc. Lond. A, 385 (1998).
[3] E. Knill, R. Laflamme, and W. H. Zurek, “Resilient quantum computation: Error models and thresholds”, Phil. Trans. R. Soc. Lond. A, 365 (1998).
[4] P. W. Shor, “Fault-tolerant quantum computation”, in Proceedings of the 37th Symposium on the Foundations of Computer Science, Los Alamitos, California, 1996, IEEE press, p. 56-65.
[5] S. Bravyi and A. Kitaev, “Universal quantum computation with ideal Clifford gates and noisy ancillas”, Phys. Rev. A, 022316 (2005).
[6] E. Knill, “Fault-Tolerant Postselected Quantum Computation: Threshold Analysis”, arXiv:quant-ph/0404104.
[7] E. Knill, “Quantum computing with realistically noisy devices”, Nature, 39-44 (2005).
[8] A. Kitaev, A. Shen, and M. Vyalyi, Classical and quantum computation, volume 47 of Graduate studies in mathematics. American Mathematical Society, 2002.
[9] N. J. Ross and P. Selinger, “Optimal ancilla-free Clifford+T approximation of z-rotations”, QIC, 901 (2016).
[10] S. Lloyd, “Universal quantum simulators”, Science 273