Turning Gate Synthesis Errors into Incoherent Errors
Matthew B. Hastings
Station Q, Microsoft Research, Santa Barbara, CA 93106-6105, USA, and Quantum Architectures and Computation Group, Microsoft Research, Redmond, WA 98052, USA
Using error correcting codes and fault tolerant techniques, it is possible, at least in theory, to produce logical qubits with significantly lower error rates than the underlying physical qubits. Suppose, however, that the gates that act on these logical qubits are only approximations of the desired gates. This can arise, for example, in synthesizing a single qubit unitary from a set of Clifford and T gates; for a generic such unitary, any finite sequence of gates only approximates the desired target. In this case, errors in the gates can add coherently so that, roughly, the error $\epsilon$ in the unitary of each gate must scale as $\epsilon \lesssim 1/N$, where $N$ is the number of gates. If, however, one has the option of synthesizing one of several unitaries near the desired target, and if an average of these options is closer to the target, we give some elementary bounds showing cases in which the errors can be made to add incoherently by averaging over random choices, so that, roughly, one needs $\epsilon \lesssim 1/\sqrt{N}$. We remark on one particular application to distilling magic states where this effect happens automatically in the usual circuits.

There are several different settings in quantum computation where the same theme occurs: there is some error which can be reduced at the cost of an overhead, either in space (number of qubits) or time (depth of quantum circuit) or both [1–4]. To give one architecture where this theme occurs three times, imagine a system of noisy physical qubits. Then, imagine implementing an appropriate CSS code to produce a system of logical qubits with a smaller noise and with Clifford gate operations at high accuracy; the error rate decreases as the code distance increases, which requires an increasing number of physical qubits for the same number of logical qubits.
Then, taking high accuracy Clifford operations as given, if one can approximately implement a T gate at sufficiently high accuracy, one can distill magic states to produce T gates with higher accuracy, again at the cost of an overhead [5–7]. Finally, using Clifford and T gates from the first two steps, one can approximate arbitrary single qubit unitaries by a sequence of these gates, up to an error $\epsilon$ that is exponentially small in the length of the sequence; this is called gate synthesis [8, 9].

In the simplest analysis of such a scheme, one tries to make the error rate in Clifford operations negligible so that one can assume at later steps that they are exact, and then one tries to make the error rate in the T gates negligible so that in the synthesis operation one can assume that the T gates are exact. It is important to know how serious the effect of errors is, though, as one would like to incur the minimal necessary overhead. In this paper, we do not focus on errors which cause a large change in the state, such as a single spin flip; these are the sorts of errors that are handled by the CSS code. Instead, we focus on errors where the system evolves under a unitary which is close to the desired one but not exactly correct, as might occur in gate synthesis.

As a toy example, consider a single qubit. Suppose that one wishes to apply $N$ successive unitaries $\exp(i\theta\sigma_Z)$ to this qubit. Suppose, however, that one instead applies $N$ unitaries $\exp(i\theta'\sigma_Z)$ with $\theta' - \theta = \epsilon$. Then the errors in angle add to a net error $N\epsilon$, so in order for the error in the evolution to be small, one needs $|\epsilon| \lesssim 1/N$. Suppose instead that each of the unitaries implements $\exp(i(\theta \pm \epsilon)\sigma_Z)$, with the sign $\pm$ chosen uniformly and independently for each unitary. Then, the net error results from a random walk and if $|\epsilon| \lesssim 1/\sqrt{N}$ the error will likely be small.
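The toy example can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the qubit starts in the $\sigma_x = +1$ state, for which a net rotation $\exp(i\alpha\sigma_Z)$ gives $\langle\sigma_y\rangle = -\sin(2\alpha)$, and the function names and parameter values are illustrative choices. It compares the error from a fixed offset $\epsilon$, the typical error when one random sign pattern is reused on every run, and the error when the signs are redrawn on every run.

```python
import numpy as np

def sigma_y_expectation(total_angle):
    """<sigma_y> after exp(i * total_angle * sigma_Z) acts on the sigma_x = +1 state."""
    return -np.sin(2.0 * total_angle)

def toy_errors(n_gates, theta, eps, n_runs=100_000, seed=0):
    """Return (coherent, fixed_signs, resampled) errors in <sigma_y>.

    coherent:    every gate has angle theta + eps.
    fixed_signs: RMS error over sign patterns, i.e. the typical error if one
                 random pattern of +/- eps is reused on every run.
    resampled:   error of the average when signs are redrawn on every run.
    """
    rng = np.random.default_rng(seed)
    ideal = sigma_y_expectation(n_gates * theta)
    coherent = abs(sigma_y_expectation(n_gates * (theta + eps)) - ideal)

    # The sum of n_gates independent +/-1 signs is 2 * Binomial(n_gates, 1/2) - n_gates.
    steps = 2 * rng.binomial(n_gates, 0.5, size=n_runs) - n_gates
    per_pattern = sigma_y_expectation(n_gates * theta + eps * steps) - ideal

    fixed_signs = np.sqrt(np.mean(per_pattern**2))
    resampled = abs(np.mean(per_pattern))
    return coherent, fixed_signs, resampled

if __name__ == "__main__":
    n, theta, eps = 400, 0.3, 0.002
    coherent, fixed_signs, resampled = toy_errors(n, theta, eps)
    print("coherent offset      :", coherent)
    print("one fixed sign choice:", fixed_signs, "(scale eps*sqrt(N) =", eps * np.sqrt(n), ")")
    print("signs redrawn per run:", resampled, "(scale eps^2*N =", eps**2 * n, ")")
```

With these illustrative parameters, $N\epsilon$ is of order one while $\epsilon\sqrt{N}$ and $\epsilon^2 N$ are both small, so the three error scales are well separated.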
Roughly (we do not give a precise definition), we call the first kind of scaling "coherent" and the second kind of scaling "incoherent". In this paper, we give some elementary bounds showing how to achieve the incoherent scaling if one can approximate the desired unitary by an average over other unitaries. We then explain how this would be applied in practice, by repeatedly re-running the same algorithm with different random choices of unitaries.

The effect in this toy example would occur in simulating a Hamiltonian by Trotter-Suzuki methods [10]; since the same unitaries are applied repeatedly to reach a time large compared to the time step, small errors in angle can add coherently. Indeed, the toy example is an example of this method using just a single term in the Trotter-Suzuki decomposition. While it was noticed before [11] in numerical studies of Trotter-Suzuki that one could obtain good results by choosing the angle of the evolution randomly at each time step to obtain the correct average angle, here we give a general result. Further, we propose also varying the angle from one run of the algorithm to the next; in the toy example, if one chooses the same random angles on every run, then typically there is an error in expectation value (of $\sigma_y$ if the system is initialized in the $\sigma_x = +1$ state) which is $\sim \epsilon\sqrt{N}$, as opposed to the $\sim \epsilon^2 N$ error arising from averaging over angles that we show below.

GENERAL RESULTS
For a general setting, suppose that we wish to implement a quantum circuit composed of a sequence of unitary gates $U_1, U_2, \ldots, U_N$, so that the circuit implements the unitary transformation $U$ defined by

$$U = U_N U_{N-1} \cdots U_1. \tag{1}$$

Now suppose that we are only able to approximate this on some quantum computer. For some or all of the gates, we implement the gates with some error, so that rather than performing unitary $U_i$, we instead perform some other unitary $V_i$, with $V_i \approx U_i$. Let

$$V = V_N V_{N-1} \cdots V_1. \tag{2}$$

For applications in quantum computing, we initialize the system in some initial (possibly mixed) state $\rho$, then apply the quantum circuit, and then measure some operator $M$. The error in expectation is $\mathrm{tr}(U\rho U^\dagger M) - \mathrm{tr}(V\rho V^\dagger M)$. This is upper bounded in absolute value by $\mathrm{tr}(|U\rho U^\dagger - V\rho V^\dagger|) \cdot \|M\|$, where $\mathrm{tr}(|\ldots|)$ denotes the trace norm (i.e., the sum of the singular values) and $\|\ldots\|$ denotes the operator norm (i.e., the maximum singular value).

Of course, trivially we have $\|V - U\| \le \sum_i \|V_i - U_i\|$, which immediately implies the bound $\mathrm{tr}(|U\rho U^\dagger - V\rho V^\dagger|) \le 2\sum_i \|V_i - U_i\|$. However, we are interested in finding a better bound with incoherent error scaling.

Suppose that for each $i$, there are several unitaries $W_{i,1}, W_{i,2}, \ldots, W_{i,n(i)}$ that we can implement, where $n(i)$ is some integer depending on $i$, and we can take $V_i$ to be any given one of these unitaries. That is, we have the option to choose some integer $a_i$ and then set $V_i = W_{i,a_i}$. We are able to choose these $a_i$ independently for each $i$. The question then is how to do this to minimize error.
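As a sanity check on the trivial (coherent) bound $\|V - U\| \le \sum_i \|V_i - U_i\|$, the following minimal sketch builds a random single-qubit circuit and a perturbed copy and compares both sides. The helper names `random_unitary`, `perturb`, and `circuit`, the circuit length, and the perturbation size are hypothetical choices made for this illustration only.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale each column by a unit-modulus phase

def perturb(u, eps, rng):
    """Return u @ exp(i * eps * H) for a random Hermitian H with operator norm 1."""
    g = rng.normal(size=u.shape) + 1j * rng.normal(size=u.shape)
    w, v = np.linalg.eigh((g + g.conj().T) / 2)
    w = w / np.max(np.abs(w))
    return u @ (v * np.exp(1j * eps * w)) @ v.conj().T

def circuit(gates):
    """Product gates[N-1] @ ... @ gates[0], so gates[0] acts first as in Eq. (1)."""
    out = np.eye(gates[0].shape[0], dtype=complex)
    for g in gates:
        out = g @ out
    return out

rng = np.random.default_rng(1)
us = [random_unitary(2, rng) for _ in range(50)]
vs = [perturb(u, 1e-3, rng) for u in us]

lhs = np.linalg.norm(circuit(vs) - circuit(us), 2)            # ||V - U||
rhs = sum(np.linalg.norm(v - u, 2) for u, v in zip(us, vs))   # sum_i ||V_i - U_i||
print("||V - U|| =", lhs, " sum_i ||V_i - U_i|| =", rhs)
```

For generic perturbations the left-hand side is usually well below the sum; the sum is approached when the individual errors line up, as in the toy example with a fixed offset in every rotation angle.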
The key idea here is that while it may be difficult to show that any given sequence minimizes error, it will be easier to show that an average of the state value $V\rho V^\dagger$ over an appropriate ensemble of sequences gives a good approximation to $U\rho U^\dagger$.

Before continuing, we briefly introduce the concept of the diamond norm. Given a linear map $\mathcal{E}$ on matrices, a natural norm is $\|\mathcal{E}\| \equiv \max_{\sigma,\,\mathrm{tr}(|\sigma|)=1} \mathrm{tr}(|\mathcal{E}(\sigma)|)$. The diamond norm is defined by stabilizing this norm; it is $\|\mathcal{E}\|_\diamond \equiv \max_{\sigma,\,\mathrm{tr}(|\sigma|)=1} \mathrm{tr}(|(\mathcal{E} \otimes I)(\sigma)|)$, where we have tensored $\mathcal{E}$ with the identity channel on an auxiliary Hilbert space of sufficiently large dimension. The diamond norm is often used as a way to estimate the difference between two such linear maps; a bound on the diamond norm is a stronger statement than a bound on the norm $\|\ldots\|$. The diamond norm provides a useful language for the following results, and it is important because it makes clear that if the unitaries $W_a$ below act only on a subsystem of the full system and act trivially on the rest of the system, then the norm bounds can be computed on that subsystem.

Lemma 1.
Let $W_a$ be unitaries. Suppose that there is a probability distribution $q(a)$ such that the following holds. Let $\sum_a q(a) W_a \equiv W$. Let $\delta^2 \equiv \sum_a q(a) \|W_a - W\|^2$. Let $\mathcal{E}$ be the quantum channel defined by $\mathcal{E}(\sigma) = U\sigma U^\dagger$. Let $\mathcal{G}$ be the quantum channel defined by $\mathcal{G}(\sigma) = \sum_a q(a) W_a \sigma W_a^\dagger$. Then,

$$\|\mathcal{E} - \mathcal{G}\|_\diamond \le \delta^2 + 2\|W - U\|. \tag{3}$$

Proof.
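Let $\mathcal{F}$ be the linear map defined by $\mathcal{F}(\sigma) = W\sigma W^\dagger$. We have $\|\mathcal{E} - \mathcal{F}\| \le 2\|W - U\|$, since $\|U\| = 1$ and $\|W\| \le 1$. Indeed, we have $\|\mathcal{E} - \mathcal{F}\|_\diamond \le 2\|W - U\|$ since $\|(W - U) \otimes I\| = \|W - U\|$.

We have

$$\begin{aligned}
(\mathcal{G} \otimes I)(\sigma) &= \sum_a q(a)\,(W_a \otimes I)\,\sigma\,(W_a \otimes I)^\dagger \qquad (4)\\
&= \sum_a q(a)\,\big((W_a - W) \otimes I + W \otimes I\big)\,\sigma\,\big((W_a - W) \otimes I + W \otimes I\big)^\dagger \\
&= (W \otimes I)\,\sigma\,(W \otimes I)^\dagger \\
&\quad + \sum_a q(a)\,\big((W_a - W) \otimes I\big)\,\sigma\,(W \otimes I)^\dagger + \mathrm{h.c.} \\
&\quad + \sum_a q(a)\,\big((W_a - W) \otimes I\big)\,\sigma\,\big((W_a - W) \otimes I\big)^\dagger. \qquad (5)
\end{aligned}$$

The second-to-last line of the above equation vanishes because $\sum_a q(a)(W_a - W) = 0$, while the last line is bounded in trace norm by $\delta^2\,\mathrm{tr}(|\sigma|)$ since $\|(W_a - W) \otimes I\| = \|W_a - W\|$. So $\|\mathcal{F} - \mathcal{G}\|_\diamond \le \delta^2$. By a triangle inequality, this implies Eq. (3).

Let $\mathcal{E}_1, \ldots, \mathcal{E}_N$ and $\mathcal{G}_1, \ldots, \mathcal{G}_N$ be quantum channels and let $\mathcal{E} \circ \mathcal{F}$ denote composition of channels $\mathcal{E}, \mathcal{F}$. We have the following inequality:

$$\|\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1\|_\diamond \le \sum_{i=1}^{N} \|\mathcal{E}_i - \mathcal{G}_i\|_\diamond. \tag{6}$$

This follows because $\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1 = (\mathcal{E}_N - \mathcal{G}_N) \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1 + \mathcal{E}_N \circ (\mathcal{E}_{N-1} - \mathcal{G}_{N-1}) \circ \mathcal{G}_{N-2} \circ \cdots \circ \mathcal{G}_1 + \ldots$. Then, use a triangle inequality and use the fact that $\mathrm{tr}(|\mathcal{E}_i(\sigma)|) \le \mathrm{tr}(|\sigma|)$ and $\mathrm{tr}(|\mathcal{G}_i(\sigma)|) \le \mathrm{tr}(|\sigma|)$.

This immediately implies:

Lemma 2.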
Suppose that for each $i = 1, \ldots, N$ there is a probability distribution $q_i(a)$, for $a = 1, \ldots, n(i)$, such that the following holds. Let $\sum_a q_i(a) W_{i,a} \equiv W_i$. Let $\delta_i^2 \equiv \sum_a q_i(a) \|W_{i,a} - W_i\|^2$.

Assume $|\rho|_1 = 1$. Define $U, V$ by Eqs. (1,2). Let $V_i = W_{i,a(i)}$, with $a(i)$ chosen independently from probability distribution $q_i(a(i))$. Let $E[\ldots]$ denote expectation value. Then,

$$\left| E[V\rho V^\dagger - U\rho U^\dagger] \right|_1 \le \sum_i \left( \delta_i^2 + 2\|W_i - U_i\| \right). \tag{7}$$

Proof.
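Let $\mathcal{E}_i$ be the quantum channel defined by $\mathcal{E}_i(\sigma) = U_i \sigma U_i^\dagger$. Let $\mathcal{G}_i$ be the quantum channel defined by $\mathcal{G}_i(\sigma) = \sum_a q_i(a) W_{i,a} \sigma W_{i,a}^\dagger$. Use Lemma 1 to bound $\|\mathcal{E}_i - \mathcal{G}_i\|_\diamond$. Use Eq. (6) to bound $\|\mathcal{E}_N \circ \mathcal{E}_{N-1} \circ \cdots \circ \mathcal{E}_1 - \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1\|_\diamond$. Note that $E[V\rho V^\dagger] = \mathcal{G}_N \circ \cdots \circ \mathcal{G}_1(\rho)$.

APPLICATIONS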
Lemma 2 can be applied in practice as follows. Suppose that one wishes to estimate some expectation value $\mathrm{tr}(U\rho U^\dagger M)$. This expectation value could be estimated by applying the quantum circuit $U$ to the initial state, measuring $M$, and repeating several times to improve statistics. One obtains approximately the same result by applying quantum circuit $V$ to the initial state, measuring $M$, and repeating several times, randomly resampling the unitaries $V_i$ each time, to improve statistics, with an error bounded by $\|M\|$ times the error term in Eq. (7).

We now estimate the error in a simple setting. Suppose that for each $i$, either $U_i$ can be implemented exactly (i.e., we can choose $a_i$ such that $V_i = U_i$) or $U_i$ is a rotation of a single qubit by a unitary $\exp(i\theta_i\sigma_Z)$. Suppose for those latter $i$, we have $n(i) = 2$ and $W_{i,1}, W_{i,2}$ are rotations $\exp(i\theta_{i,1}\sigma_Z)$ or $\exp(i\theta_{i,2}\sigma_Z)$. Suppose we can find probabilities $q_i(1), q_i(2)$ so that $q_i(1)\theta_{i,1} + q_i(2)\theta_{i,2} = \theta_i$.

To estimate $\delta_i$, let $\phi_{i,1} = \theta_{i,1} - \theta_i$ and $\phi_{i,2} = \theta_{i,2} - \theta_i$. We can compute $\|W_i - U_i\|$ by considering a system consisting of just a single qubit, so both $W_i$ and $U_i$ are diagonal 2-by-2 matrices; we get the same result for a system of multiple qubits with the unitaries $U_i, W_i$ just acting on one qubit as we simply tensor by the identity, which does not change the operator norm (indeed, this is why the results above held for the diamond norm). We find

$$\|W_i - U_i\| = \left| q_i(1)\exp(i\phi_{i,1}) + q_i(2)\exp(i\phi_{i,2}) - 1 \right| = \sqrt{\big(q_i(1)\cos(\phi_{i,1}) + q_i(2)\cos(\phi_{i,2}) - 1\big)^2 + \big(q_i(1)\sin(\phi_{i,1}) + q_i(2)\sin(\phi_{i,2})\big)^2}.$$

Note that $1 \ge \cos(\phi) \ge 1 - \phi^2/2$. Hence, $|q_i(1)\cos(\phi_{i,1}) + q_i(2)\cos(\phi_{i,2}) - 1| \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2$. Note also that, since $q_i(1)\phi_{i,1} + q_i(2)\phi_{i,2} = 0$, we have $|q_i(1)\sin(\phi_{i,1}) + q_i(2)\sin(\phi_{i,2})| \le O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3)$. Hence,

$$\|W_i - U_i\| \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2 + O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3). \tag{8}$$

Similarly,

$$\delta_i^2 \le q_i(1)\phi_{i,1}^2 + q_i(2)\phi_{i,2}^2 + O(|\phi_{i,1}|^3 + |\phi_{i,2}|^3). \tag{9}$$

Thus, in this setting we need to take $\phi_{i,1}, \phi_{i,2} \lesssim 1/\sqrt{N}$ in order to make the error small.

T GATES BY STATE INJECTION
Another application of the above is to implementing T gates by state injection (see for example Refs. 6, 7). Assume we have an ancilla qubit (the target) in the state $2^{-1/2}\left(\exp(i\frac{\theta}{2})|0\rangle + \exp(-i\frac{\theta}{2})|1\rangle\right)$. In state injection, we apply a CNOT gate from another qubit (the control) to this target, and then measure the target in the $Z$ basis. If the measurement result is $|0\rangle$, then (up to a global phase) we implement the unitary $\exp(i\frac{\theta}{2}\sigma_Z)$ on the control, while if the measurement is $|1\rangle$, we implement $\exp(-i\frac{\theta}{2}\sigma_Z)$. If $\theta = \pi/4$, then this gives a way to implement the T gate, assuming we can implement the S gate $\exp(i\frac{\pi}{4}\sigma_Z)$: if the measurement outcome is $|1\rangle$, we follow the measurement by an S gate. This way, with probability $1/2$ we implement $\exp(i\frac{\theta}{2}\sigma_Z)$ on the control, and with probability $1/2$ we implement $\exp(i(\frac{\pi}{4} - \frac{\theta}{2})\sigma_Z)$ on the control. This is the situation considered above, where if this state injection is used to implement the $i$-th unitary in the circuit we have $\theta_{i,1} = \theta/2$, $\theta_{i,2} = \pi/4 - \theta/2$, $q_i(1) = q_i(2) = 1/2$, and $\theta_i = \pi/8$. The randomness of the outcome of state injection automatically produces the needed averaging over two different angles.

Distillation schemes produce ancillas in states $2^{-1/2}\left(\exp(i\frac{\theta}{2})|0\rangle + \exp(-i\frac{\theta}{2})|1\rangle\right)$ with $\theta \approx \pi/4$. There can be both random errors (so that the value of $\theta$ varies from one ancilla to another) and systematic errors (so that the average value of $\theta$ differs from $\pi/4$). Let $\mu_2$ denote the average, over ancillas, of $(\theta - \pi/4)^2$ and let $\mu_3$ denote the average of $|\theta - \pi/4|^3$. It does not matter whether or not the angles are independent between different ancillas. Let $\rho$ be some initial state, let $U\rho U^\dagger$ be the result of some quantum circuit including T gates, and let $\sigma$ be the result of the quantum circuit with the T gates performed (approximately) with state injection, choosing a random ancilla from this ensemble each time we do state injection. Assume that there are $S$ different state injections. Then, $\mathrm{tr}(|U\rho U^\dagger - \sigma|) \le S\mu_2 + S \cdot O(\mu_3)$. Note that $\mu_3$ is bounded by a constant times $\mu_2$ since we can assume that all angles are bounded by $\pi$.

One may also consider a more general case where the ancilla qubit may be in state $u|0\rangle + v|1\rangle$, with $u = \cos(\tau)\exp(i\frac{\theta}{2})$ and $v = \sin(\tau)\exp(-i\frac{\theta}{2})$. We now give a separate treatment of this case. The effect of the state injection protocol is to implement the quantum channel $\mathcal{G}$ defined by $\mathcal{G}(\sigma) = \sum_i A_i \sigma A_i^\dagger$, with

$$A_1 = \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} \quad\text{and}\quad A_2 = \begin{pmatrix} v\exp(i\frac{\pi}{4}) & 0 \\ 0 & u\exp(-i\frac{\pi}{4}) \end{pmatrix}.$$

Let $A = (1/2)(A_1 + A_2)$, let $\Delta_1 = A_1 - A$ and $\Delta_2 = A_2 - A$. Then,

$$\mathcal{G}(\sigma) = 2A\sigma A^\dagger + \sum_i \Delta_i \sigma \Delta_i^\dagger. \tag{10}$$

Defining $\mathcal{E}(\sigma) = \exp(i\frac{\pi}{8}\sigma_Z)\,\sigma\,\exp(-i\frac{\pi}{8}\sigma_Z)$, we have

$$\|\mathcal{E} - \mathcal{G}\|_\diamond \le 2\left\|\exp(i\tfrac{\pi}{8}\sigma_Z) - \sqrt{2} \cdot A\right\| + \sum_i \|\Delta_i\|^2. \tag{11}$$

Let $\omega = \exp(i\frac{\pi}{4})$. Note that $A = \mathrm{diag}\big((u + \omega v)/2,\ (v + \omega^{-1}u)/2\big)$, so that

$$\left\|\exp(i\tfrac{\pi}{8}\sigma_Z) - \sqrt{2} \cdot A\right\| = \left|\omega^{1/2} - (u + \omega v)/\sqrt{2}\right|. \tag{12}$$

The right-hand side of the above equation is second order in $\tau - \frac{\pi}{4}$ and $\theta - \frac{\pi}{4}$; i.e., it is $O((\tau - \frac{\pi}{4})^2) + O((\theta - \frac{\pi}{4})^2) + O(|\tau - \frac{\pi}{4}||\theta - \frac{\pi}{4}|)$, though it is not analytic near $\theta = \tau = \frac{\pi}{4}$. Also, $\|\Delta_i\|^2 = (1/4)|u - v\exp(i\frac{\pi}{4})|^2$, which is second order in $\tau - \frac{\pi}{4}$ and $\theta - \frac{\pi}{4}$.
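The second-order behaviour can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it takes the diagonal Kraus operators $A_1 = \mathrm{diag}(u, v)$ and $A_2 = \mathrm{diag}(v\omega, u\omega^{-1})$ with $\omega = \exp(i\pi/4)$, evaluates the right-hand side of Eq. (11) with a factor of 2 on the first term (assumed here, matching the form of Eq. (3)), and verifies the decomposition in Eq. (10) on a test matrix. The function names are hypothetical.

```python
import numpy as np

OMEGA = np.exp(1j * np.pi / 4)

def kraus_ops(tau, theta):
    """Kraus operators for ancilla u|0> + v|1>, including the S correction on outcome |1>."""
    u = np.cos(tau) * np.exp(1j * theta / 2)
    v = np.sin(tau) * np.exp(-1j * theta / 2)
    a1 = np.diag([u, v])
    a2 = np.diag([v * OMEGA, u / OMEGA])
    return a1, a2

def eq11_bound(tau, theta):
    """Assumed form of the right-hand side of Eq. (11)."""
    a1, a2 = kraus_ops(tau, theta)
    a = (a1 + a2) / 2
    target = np.diag([np.exp(1j * np.pi / 8), np.exp(-1j * np.pi / 8)])
    first = 2 * np.linalg.norm(target - np.sqrt(2) * a, 2)
    second = sum(np.linalg.norm(ai - a, 2) ** 2 for ai in (a1, a2))
    return first + second

def eq10_residual(tau, theta, sigma):
    """Largest entry of sum_i A_i s A_i^+ - (2 A s A^+ + sum_i D_i s D_i^+)."""
    a1, a2 = kraus_ops(tau, theta)
    a = (a1 + a2) / 2
    lhs = sum(ai @ sigma @ ai.conj().T for ai in (a1, a2))
    rhs = 2 * a @ sigma @ a.conj().T
    rhs = rhs + sum((ai - a) @ sigma @ (ai - a).conj().T for ai in (a1, a2))
    return np.max(np.abs(lhs - rhs))

if __name__ == "__main__":
    for d in (4e-2, 2e-2, 1e-2):
        print(d, eq11_bound(np.pi / 4 + d, np.pi / 4 + d))
```

Halving the deviation `d` should reduce the printed bound by roughly a factor of four, which is the quadratic dependence claimed in the text.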
Hence, $\|\mathcal{E} - \mathcal{G}\|_\diamond$ is second order in $\tau - \frac{\pi}{4}$ and $\theta - \frac{\pi}{4}$. (We omit the exact expression to second order, which can be found by Taylor series.)

DISCUSSION
We have considered the effects of errors in gates on a quantum computation. An elementary calculation shows that an appropriate averaging can in some cases significantly improve the scaling, so that instead of requiring error $\epsilon \lesssim 1/N$ for $N$ gates one instead requires only $\epsilon \lesssim 1/\sqrt{N}$. This averaging can be implemented in some cases by using a different random choice of gates each time the algorithm is run. In some cases, such as in T gates, the averaging occurs automatically. In an architecture as mentioned at the start, where one has errors in the T gates and where one uses a sequence of such approximate T gates and Clifford gates to synthesize an approximation to another unitary, the diamond norm error in the approximation of the T gates adds to the diamond norm error in the synthesis.

Acknowledgments:
I thank D. Wecker for useful discussions.

[1] D. Aharonov and M. Ben-Or, "Fault-tolerant quantum computation with constant error", in STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, 1997.
[2] J. Preskill, "Reliable quantum computers", Proc. R. Soc. Lond. A, , 385 (1998).
[3] E. Knill, R. Laflamme, and W. H. Zurek, "Resilient quantum computation: Error models and thresholds", Phil. Trans. R. Soc. Lond. A, , 365 (1998).
[4] P. W. Shor, "Fault-tolerant quantum computation", in Proceedings of the 37th Symposium on the Foundations of Computer Science, Los Alamitos, California, 1996, IEEE press, p. 56-65.
[5] S. Bravyi and A. Kitaev, "Universal quantum computation with ideal Clifford gates and noisy ancillas", Phys. Rev. A , 022316 (2005).
[6] E. Knill, "Fault-Tolerant Postselected Quantum Computation: Threshold Analysis", arXiv:quant-ph/0404104.
[7] E. Knill, "Quantum computing with realistically noisy devices", Nature , 39-44 (2005).
[8] A. Kitaev, A. Shen, and M. Vyalyi, Classical and quantum computation, volume 47 of Graduate studies in mathematics. American Mathematical Society, 2002.
[9] N. J. Ross and P. Selinger, "Optimal ancilla-free Clifford+T approximation of z-rotations", QIC , 901 (2016).
[10] S. Lloyd, "Universal quantum simulators", Science 273