[PDF] Error mitigation for short-depth quantum circuits

Abstract

Two schemes are presented that mitigate the effect of errors and decoherence in short depth quantum circuits. The size of the circuits for which these techniques can be applied is limited by the rate at which the errors in the computation are introduced. Near-term applications of early quantum devices, such as quantum simulations, rely on accurate estimates of expectation values to become relevant. Decoherence and gate errors lead to wrong estimates of the expectation values of observables used to evaluate the noisy circuit. The two schemes we discuss are deliberately simple and don't require additional qubit resources, so to be as practically relevant in current experiments as possible. The first method, extrapolation to the zero noise limit, subsequently cancels powers of the noise perturbations by an application of Richardson's deferred approach to the limit. The second method cancels errors by resampling randomized circuits according to a quasi-probability distribution.

Full PDF

EError mitigation for short-depth quantum circuits

Kristan Temme, Sergey Bravyi and Jay M. Gambetta

IBM T.J. Watson Research Center, Yorktown Heights NY 10598 (Dated: November 7, 2017)Two schemes are presented that mitigate the effect of errors and decoherence in short-depth quantum circuits.The size of the circuits for which these techniques can be applied is limited by the rate at which the errors inthe computation are introduced. Near-term applications of early quantum devices, such as quantum simulations,rely on accurate estimates of expectation values to become relevant. Decoherence and gate errors lead to wrongestimates of the expectation values of observables used to evaluate the noisy circuit. The two schemes wediscuss are deliberately simple and don’t require additional qubit resources, so to be as practically relevant incurrent experiments as possible. The ﬁrst method, extrapolation to the zero noise limit, subsequently cancelspowers of the noise perturbations by an application of Richardson’s deferred approach to the limit. The secondmethod cancels errors by resampling randomized circuits according to a quasi-probability distribution.

From the time quantum computation generated wide spreadinterest, the strongest objection to its viability was the sensi-tivity to errors and noise. In an early paper, William Unruh[1] found that the coupling to the environment sets an ulti-mate time and size limit for any quantum computation. Thisinitially curbed the hopes that the full advantage of quantumcomputing could be harnessed, since it set limits on the scal-ability of any algorithm. This problem was, at least in theory,remedied with the advent of quantum error correction [2–4].It was proven that if both the decoherence and the imprecisionof gates could be reduced below a ﬁnite threshold value, thenquantum computation could be performed indeﬁnitely [5, 6].Although it is the ultimate goal to reach this threshold in anexperiment that is scalable to larger sizes, the overhead that isneeded to implement a fully fault-tolerant gate set with cur-rent codes [7] seems prohibitively large [8, 9]. In turn, it isexpected that in the near term the progress in quantum exper-iments will lead to devices with dynamics, which are beyondwhat can be simulated with a conventional computer. Thisleads to the question: what computational tasks could be ac-complished with only limited, or no error correction?The suggestions of near-term applications in such quan-tum devices mostly center around quantum simulations withshort-depth circuit [10–12] and approximate optimization al-gorithms [13]. Furthermore, certain problems in materialsimulation may be tackled by hybrid quantum-classical algo-rithms [14]. In most such applications, the task can be ab-stracted to applying a short-depth quantum circuits to somesimple initial state and then estimating the expectation valueof some observable after the circuit has been applied. This es-timation must be accurate enough to achieve a simulation pre-cision comparable or exceeding that of classical algorithms.Yet, although the quantum system evolves coherently for themost part of the short-depth circuit, the effects of decoherencealready become apparent as an error in the estimate of the ob-servable. For the simulation to be of value, the effect of thiserror needs to be mitigated.In this paper we introduce two techniques for quantumerror mitigation that increase the quality of any such short-depth quantum simulations. We ﬁnd that the accuracy ofthe expectation value can be increased signiﬁcantly in the presence of noise. We are looking for error mitigationtechniques that are as simple as possible and don’t requireadditional quantum resources. Both techniques require thatsome noise parameter taken together with system size andcircuit depth can be considered a small number. The ﬁrstscheme does not make any assumption about the noise modelother than it being weak and constant in time. In comparison,the second scheme can tolerate stronger noise; however, itrequires detailed knowledge of the noise model.

Extrapolation to the zero noise limit:

It is our goal to es-timate the expectation value of some quantum observable A with respect to an evolved state ρ λ ( T ) after time T that issubject to noise characterized by the parameter λ in the limitwhere λ → . To achieve this, we apply Richardson’s de-ferred approach to the limit to cancel increasingly higher or-ders of λ [15].Although gates are typically used to describe quantum cir-cuits, for our analysis it is more convenient to consider thetime-dependent Hamiltonian dynamics implementing the cir-cuit. The time-dependent multi-qubit Hamiltonian is denotedby K ( t ) . It can be expanded into N - qubit Pauli opera-tors P α ∈ (cid:104) , X j , Y j , Z j (cid:105) j =1 ...N , where X j , Y j , Z j acts as asingle-qubit Pauli matrix on site j and trivially elsewhere. Weallow for time-dependent coupling coefﬁcients J α ( t ) ∈ R .The circuit is encoded as K ( t ) = (cid:80) α J α ( t ) P α . The totalevolution of the open system with initial state ρ will be de-scribed by an equation of the following form: ∂∂t ρ ( t ) = − i [ K ( t ) , ρ ( t )] + λ L ( ρ ( t )) (1)for time t ∈ [0 , T ] . We do not specify the exact form of thegenerator L ( ρ ) but only require that it is invariant under timerescaling and independent from the parameters J α ( t ) in K ( t ) .The noise term L ( ρ ) could be given as a Lindblad operator, orit could correspond to a Hamiltonian that couples to a bathto model non-Markovian dynamics. We ask that there is aparameter λ (cid:28) that indicates a weak action of the noise andthat we can bound (cid:107)L I,t ◦L I,t ◦ . . . ◦L I,t n ( ρ ) (cid:107) ≤ l n , whereat most l n = O ( N n ) . The map L I,t is short-hand notation forthe transformation of L into the interaction frame generated a r X i v : . [ qu a n t - ph ] N ov by K ( t ) .The expectation value of the observable A is obtained fromthe ﬁnal state ρ λ ( T ) as E K ( λ ) = tr ( Aρ λ ( T )) . The function E K ( λ ) can be expressed as a series in λ where the contri-bution with λ corresponds to the noise-free evolution. Thiscan be seen by transforming the evolution into the interac-tion frame w.r.t K ( t ) and expanding the Born series, c.f.supp. mat. sec I. Starting from the noise-free expectation value E ∗ = tr ( Aρ ( T )) , the expansion is given by E K ( λ ) = E ∗ + n (cid:88) k =1 a k λ k + R n +1 ( λ, L , T ) . (2)The a k are model-speciﬁc constants typically growing like a k ∼ N k T k . Here R n +1 ( λ, L , T ) is the remainder ofthe expansion and can be bounded by | R n +1 ( λ, L , T ) | ≤(cid:107) A (cid:107) l n +1 ( λT ) n +1 / ( n + 1)! by standard arguments. Since weassumed an extensive scaling of l n , such an expansion is onlymeaningful whenever N T λ is small. We are of course inter-ested in lim λ → E K ( λ ) = E ∗ ; however, we are faced witha small but ﬁnite parameter λ . Since we only have access to E K ( λ ) , our estimate of E ∗ will be off by O ( λ ) .This estimate can be improved by Richardson’s deferredapproach to the limit [15, 16]. To explain the idea, let us as-sume we can run the quantum circuit at different noise rates λ j , with j = 0 , . . . , n and obtain experimental estimates ˆ E K ( λ j ) = E K ( λ j )+ δ j . Here the λ j = c j λ are appropriatelyrescaled values of the experimental noise rate λ . The estimatedeviates from the actual expectation value due to experimen-tal inaccuracies and ﬁnite sampling errors by an error δ j . Theestimate of E ∗ can be signiﬁcantly improved by consideringthe approximation ˆ E nK ( λ ) , which is written as the linear com-bination ˆ E nK ( λ ) = n (cid:88) j =0 γ j ˆ E K ( c j λ ) . (3)Here we require the coefﬁcients γ j to satisfy the linear systemof equations [16]. n (cid:88) l =0 γ j = 1 and n (cid:88) j =0 γ j c kj = 0 for k = 1 . . . n. (4)The linear combination Eq. (3) will be an approximation to E ∗ up to an error of order O ( λ n +1 ) .To obtain estimates at different noise rates λ j , we use arescaling trick. We run the same circuit n + 1 times withrescaled parameters in K ( t ) . We follow the protocol:1. For j = 0 , . . . , n (a) choose a rescaling coefﬁcient c j > ( c = 1 )and evolve ρ with rescaled Hamiltonian K j ( t ) = (cid:80) α J jα ( t ) P α , where J jα ( t ) = c − j J α (cid:0) c − j t (cid:1) , (5)for time T j = c j T . (b) Estimate observable A to obtain ˆ E K ( c j λ ) .2. Solve equations (4) and compute ˆ E nK ( λ ) as in Eq. (3).A rescaling of the equations shows that the state ρ jλ ( T j ) ,which evolves under ˙ ρ jλ = − i [ K j ( t ) , ρ j ] + λ L ( ρ j ) for time T j , satisﬁes ρ jλ ( T j ) = ρ c j λ ( T ) , c.f. supp. mat. sec. II. Hencethe estimates ˆ E K ( c j λ ) = tr (cid:16) Aρ jλ ( T j ) (cid:17) + δ j can be obtainedfrom the n + 1 runs rescaled according to the protocol. If the protocol is performed for n + 1 steps, the errorbetween the exact expectation value E ∗ and the estimator ˆ E nK ( λ ) can be bounded by | E ∗ − ˆ E nK ( λ ) | ≤ Γ n (cid:18) δ ∗ + (cid:107) A (cid:107) l n +1 ( λT ) n +1 ( n + 1)! (cid:19) . (6) Here Γ n = (cid:80) nj =0 | γ j | c n +1 j and δ ∗ = max j | δ j | is the largestexperimental error. This follows from repeated application of the trian-gle inequality, c.f. supp. mat. sec. III. The equations(4) can be solved, and one ﬁnds that the coefﬁcients γ j = (cid:81) m (cid:54) = j c m ( c j − c m ) − , so that the constant Γ n canbe evaluated. In the literature [16], several choices forprogression of c j are common. The two most frequent seriesare exponential decrease (Bulirsch - Stoer) and harmonicdecay. In our experiments we are actually increasing the noiserate starting from the optimal value, wheres it is common inthe numerical literature to improve the small parameter. Theresult is, of course, the same. Examples :

To demonstrate this method we will considerthree numerical examples. In all the examples the timeevolution is given by a Hamiltonian K ( t ) that encodes acontrol problem. For a single drift step we evolve with aHamiltonian K R ( t ) = U N ( θ ) K U † N ( θ ) , where the singlequbit product unitary U N ( θ ) ∈ SU (2) ⊗ N is chosen Haar-random, and the drift Hamiltonian K = (cid:80) i,j J i,j X i Z j ischosen with respect to a random graph and Gaussian dis-tributed couplings J i,j . The evolution is subject to three dif-ferent noise models: ﬁrst Fig 1(a), we evolve in the pres-ence of depolarizing noise described by the sum of singlequbit generators L i = − λ (2 − tr i ( ρ ) − ρ ) acting on all N qubits. Second, Fig 1(b), we consider dephasing andamplitude damping noise on every qubit, where we havechosen a ratio of λ /λ = 1 . with a generator L i = λ (cid:0) σ − i ρσ + i − { σ + i σ − i , ρ } (cid:1) + λ ( Z i ρZ i − ρ ) and σ ± i =2 − ( X i ± iY i ) . Third, Fig 1(c), we consider a highly non-Markovian setting, where each of the N qubits i is coupledto its own single-qubit bath b i via the Hamiltonian V i =1 / X i ⊗ X b i + 1 / Z b i and the bath is prepared in the initialstate ρ B = (2 cosh( β/ − N exp( − β (cid:80) b i σ zb i ) . Then, afterthe evolution of each noisy circuit T = td we measure a ran-domly chosen multi-qubit Pauli operator P α .The graphs in Fig 1 show that with modest effort very highprecisions can be obtained. In the low noise range (cid:15) ∼ − × -3 -8 -6 -4 -2 × -3 × -3 -6 -5 -4 -3 -2 -1 λ λ λ λ -12 -10 -8 -6 -4 -2 Depolarizing noise Ampl. Damp / Dephasing non - Markovian

FIG. 1. (color online) The plots show a random Hamiltonian evolu-tion for N = 4 system qubits and d = 6 drift steps, each for time t = 2 . For all systems plot the error ∆ E = | E ∗ − ˆ E nK ( λ ) | for n = 0 , , , . Here λ , n = 0 corresponds to the uncorrected er-ror. The noise parameter λ = − / − (cid:15) ) is chosen so thatall plots have the same perturbation measured in the depolarizingstrength (cid:15) = 10 − . . . − . The plot shows the mitigation of (a)Depolarizing noise (b) Amplitude damping / dephasing noise and (c)non-Markovian noise, for { c j } chosen as random partition of in theinterval [1 , . the relative error can be reduced to ∆ E ∼ − − − .The precision is then essentially determined by the samplingerror δ ∗ , which we have neglected in the plots. Probabilistic error cancellation:

Here we discuss a noisereduction scheme for quantum circuits subject to a Marko-vian noise. First let us state our assumptions on the noisemodel. A noisy N -qubit device will be described by a ba-sis set of noisy operations Ω = {O , . . . , O m } that can beimplemented on this device. Each operation O α is a trace-preserving completely positive (TPCP) map on N qubits thatacts non-trivially only on a small subset of qubits, say atmost two. For example, O α could be a noisy unitary gateapplied to a speciﬁed pair of qubits or a noisy qubit initial-ization. We assume that noise in the system can be fullycharacterized such that the map O α is known for each α . Acircuit of length L in the basis Ω is a sequence of L oper-ations from Ω . Let Ω L be a set of all length- L circuits inthe basis Ω . A circuit α = ( α , . . . , α L ) implements a map O α = O α L · · · O α O α . The expectation value of an observ-able A on the ﬁnal state produced by a noisy circuit α is E ( α ) = Tr (cid:2) A O α ( | (cid:105)(cid:104) | ⊗ n ) (cid:3) . For simplicity, we ignore errors in the initial state preparationand in the ﬁnal measurement. Such errors can be accountedfor by adding dummy noisy operations before each measure-ment and after each qubit initialization. Furthermore, we shallassume that A is diagonal in the Z -basis and (cid:107) A (cid:107) ≤ .Below we show that under certain conditions the task ofsimulating an ideal quantum circuit can be reduced to esti-mating the expectation value E ( α ) for a suitable random en-semble of noisy quantum circuits α . Moreover, the ideal andthe noisy circuits act on the same number of qubits and havethe same depth.Let Γ = {U , . . . , U k } be a ﬁxed basis set of ideal gates. Each gate U β ( ρ ) = U β ρU † β is described by a unitary TPCPmap on N qubits that acts non-trivially on a small subset ofqubits. An ideal length- L circuit in the basis Γ is a sequenceof L gates from Γ . A circuit β = ( β , . . . , β L ) implements amap U β = U β L · · · U β U β . Deﬁne an ideal expectation value E ∗ ( β ) = Tr (cid:2) A U β ( | (cid:105)(cid:104) | ⊗ n ) (cid:3) . We consider a simulation task where the goal is to estimate E ∗ ( β ) with a speciﬁed precision δ .The key idea of our scheme is to represent the ideal circuitas a quasi-probabilistic mixture of noisy ones. Let us say thata noisy basis Ω simulates an ideal circuit β with the overhead γ β ≥ if there exists a probability distribution P β ( α ) on theset of noisy circuits α ∈ Ω L such that U β = γ β (cid:88) α ∈ Ω L P β ( α ) σ β ( α ) O α (7)for some coefﬁcients σ β ( α ) = ± . We also require that thedistribution P β ( α ) is sufﬁciently simple so that one can ef-ﬁciently sample α from P β ( α ) . The coefﬁcients γ β , σ β ( α ) must be efﬁciently computable. We shall refer to Eq. (41) as a quasi-probability representation (QPR) of the ideal circuit β .Note that γ β ≥ because U β and O α are trace-preserving.Quasi-probability distributions have been previously used toconstruct classical algorithms for simulation of quantum cir-cuits [17, 18]. Our work can be viewed as an application ofthese methods to the problem of simulating ideal quantum cir-cuits by noisy ones.Substituting Eq. (41) into the deﬁnition of E ∗ ( β ) gives E ∗ ( β ) = γ β (cid:88) α ∈ Ω L P β ( α ) σ β ( α ) E ( α ) . (8)Let α ∈ Ω L be a random variable drawn from P β ( α ) and x ∈{ , } n be the ﬁnal readout of the noisy circuit α obtained bymeasuring each qubit of the ﬁnal state O α ( | (cid:105)(cid:104) | ⊗ n ) in the Z -basis. Note that (cid:104) x | A | x (cid:105) is an unbiased estimator of E ( α ) with the variance O (1) . Thus from Eq. (42) one infers that γ β σ β ( α ) (cid:104) x | A | x (cid:105) is an unbiased estimator of the ideal expec-tation value E ∗ ( β ) with the variance O ( γ β ) . We can nowestimate E ∗ ( β ) with any desired precision δ by the MonteCarlo method. Deﬁne M = ( δ − γ β ) (9)and generate M samples α , . . . , α M ∈ Ω L drawn from P β ( α ) . By Hoeffding’s inequality, E ∗ ( β ) is approximatedwithin error O ( δ ) w.h.p. by a random variable ˆ E ( β ) = γ β M M (cid:88) a =1 σ β ( α a ) (cid:104) x a | A | x a (cid:105) , (10)where x a ∈ { , } n is the ﬁnal string of the noisy circuit α a . Computing the estimator ˆ E ( β ) requires M runs of thenoisy circuits, with each run producing a single readout string x a . Estimating E ∗ ( β ) with a precision δ in the absence ofnoise by Monte Carlo method would require approximately δ − runs. Thus the quantity γ β determines the simulationoverhead (see Eq. (9)).A systematic method of constructing QPRs with a smalloverhead is given in supp. mat. sec. IV. Here we illustrate themethod using toy noise models usually studied in the quantumfault-tolerance theory: the depolarizing noise and the ampli-tude damping noise. For concreteness, we choose the idealgate set Γ as the standard Clifford+ T basis.Let D k be the depolarizing noise on k = 1 , qubits thatreturns the maximally mixed state with probability (cid:15) and doesnothing with probability − (cid:15) . Deﬁne a noisy version of a k -qubit unitary gate U as D k U . The noisy basis Ω is obtained bymultiplying ideal gates on the left by arbitrary Pauli operatorsand adding the depolarizing noise. Thus Ω is a set of opera-tions O α = D k PU , where U ∈ Γ is a k -qubit ideal gate and P ∈ {I , X , Y , Z} ⊗ k is a Pauli TPCP map. The random en-semble of noisy circuits O α that simulates an ideal circuit U β is constructed in three steps: (1) Start from the ideal circuit, O α = U β . (2) Modify O α by adding a Pauli X, Y, Z aftereach single-qubit gate with probability p = (cid:15)/ (4 + 2 (cid:15) ) . Thegate is unchanged with probability − p . (3) Modify O α byadding a Pauli IX, IY, . . . , ZZ after each CNOT with proba-bility p = (cid:15)/ (16 + 14 (cid:15) ) . The CNOT is unchanged with prob-ability − p . The resulting circuit is then implemented ona noisy device (which adds the depolarizing noise after eachgate) and the ﬁnal readout string x is recorded. By generating M samples of x one can estimate E ∗ ( β ) from Eq. (10). Thesign function σ β ( α ) is equal to ( − r , where r is the num-ber of Pauli operators added to the ideal circuit U β . As shownin supp. mat. sec. IV, the above deﬁnes a QPR of the idealcircuit U β with the overhead γ β ≈ (cid:15) (3 L / L / ,where L is the number of single-qubit gates and L is thenumber of CNOTs in the ideal circuit. The method has beentested numerically for random noisy Clifford+ T circuits, seeFig. 2.A more interesting example is the noise described bythe amplitude-damping channel A that resets every qubitto its ground state with probability (cid:15) . A noisy version of a k -qubit unitary gate U is deﬁned as A ⊗ k U . In contrast to theprevious example, noisy unitary gates A ⊗ k U alone cannotsimulate any ideal unitary gate since A is not a unital map. Toovercome this, we extend the noisy basis Ω by adding noisyversions of single-qubit state preparations AP | ψ (cid:105) , where P | ψ (cid:105) maps any input state to | ψ (cid:105)(cid:104) ψ | . Our scheme requires statepreparations for single qubit states | ψ (cid:105) = | + (cid:105) , |−(cid:105) , | (cid:105) , | (cid:105) that can be performed at any time step (not only at thebeginning). In supp. mat. sec. V we show how to constructa QPR of the ideal Clifford+ T circuit U β with the overhead γ β ≈ (cid:15) (2 L + 4 L ) . The examples considered abovesuggest that well-characterized noisy circuits can simulateideal ones with overhead γ ≈ (1 + c(cid:15) ) L , where (cid:15) is the typicalerror rate and c is a small constant. The value of c can bedetermined by performing quantum process tomography [19] Simulation precision N u m be r o f s i m u l a t ed c i r c u i t s mitigation offmitigation on FIG. 2. Simulation precision δ ( β ) = | ˆ E ( β ) − E ∗ ( β ) | for ran-domly generated ideal Clifford+ T circuits on N = 6 qubits withdepth d = 20 . The gates are subject to single- and two-qubit de-polarizing noise (cid:15) = 10 − . The ﬁgure shows results for simula-tions without (a) and with (b) error cancellation. In both cases eachideal circuit was simulated by M = 4000 runs of the noisy cir-cuit. For each circuit U β we deﬁned the observable A as a projector Π out onto the subset of N − basis vectors with the largest weightin the ﬁnal state. The results are consistent with γ β ≈ . so that γ β M − / ≈ . . and ﬁnding the QPR for each ideal gate. Using Eq. (9),one can estimate the number of noisy circuit runs of length L as M ∼ exp (2 c(cid:15)L ) . Assuming error rates in the range (cid:15) ∼ − , it may be possible to simulate ideal circuits with O (10 ) gates. Conclusions:

Both error mitigation schemes require noadditional quantum hardware such as ancilla or code qubitsand work directly with the physical qubits. The zero-noise ex-trapolation requires sufﬁcient control of the time evolution toimplement the rescaled dynamics and hinges on the assump-tion of a large time-scale separation between the dominantnoise and the controlled dynamics. For the probabilistic errorcancellation a full characterization of the noisy computationaloperations is necessary. To obtain this to a precision of ∼ − is challenging in practice. However, if one is willingto sacriﬁce optimality, a Pauli- or Clifford-twirling [20, 21]can be applied that converts any noise channel into a simplemixture of Pauli errors or depolarizing noise, making thecharacterization task much more manageable. A very recentindependent paper by Li and Benjamin [22] discusses similarissues to those addressed here. Acknowledgements:

We thank Antonio Mezzacapo for in-sightful discussions, and we acknowledge support from theIBM Research Frontiers Institute [1] W. G. Unruh, Physical Review A , 992 (1995).[2] P. W. Shor, Physical review A , R2493 (1995).[3] A. M. Steane, Physical Review Letters , 793 (1996).[4] A. R. Calderbank and P. W. Shor, Physical Review A , 1098(1996). [5] D. Aharonov and M. Ben-Or, in Proceedings of the twenty-ninthannual ACM symposium on Theory of computing (ACM, 1997)pp. 176–188.[6] A. Y. Kitaev, Russian Mathematical Surveys , 1191 (1997).[7] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cle-land, Physical Review A , 032324 (2012).[8] N. C. Jones, R. Van Meter, A. G. Fowler, P. L. McMahon,J. Kim, T. D. Ladd, and Y. Yamamoto, Physical Review X ,031007 (2012).[9] S. J. Devitt, A. M. Stephens, W. J. Munro, and K. Nemoto,Nature communications (2013).[10] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou,P. J. Love, A. Aspuru-Guzik, and J. L. O?Brien, Nature com-munications (2014).[11] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik,arXiv preprint arXiv:1509.04279 (2015).[12] D. Wecker, M. B. Hastings, and M. Troyer, Physical Review A , 042303 (2015).[13] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprintarXiv:1411.4028 (2014).[14] B. Bauer, D. Wecker, A. J. Millis, M. B. Hastings, andM. Troyer, arXiv preprint arXiv:1510.03859 (2015).[15] L. F. Richardson and J. A. Gaunt, Philosophical Transactions of the Royal Society of London. Series A, containing papers of amathematical or physical character , 299 (1927).[16] A. Sidi, “Practical extrapolation methods: Theory and appli-cations, volume 10 of cambridge monographs on applied andcomputational mathematics,” (2003).[17] H. Pashayan, J. J. Wallman, and S. D. Bartlett, Physical reviewletters , 070501 (2015).[18] N. Delfosse, P. A. Guerin, J. Bian, and R. Raussendorf, Physi-cal Review X , 021003 (2015).[19] M. Mohseni, A. Rezakhani, and D. Lidar, Phys. Rev. A ,032322 (2008).[20] M. Silva, E. Magesan, D. W. Kribs, and J. Emerson, PhysicalReview A , 012347 (2008).[21] E. Magesan, J. M. Gambetta, and J. Emerson, Physical ReviewA , 042311 (2012).[22] Y. Li and S. C. Benjamin, arXiv preprint arXiv:1611.09301(2016).[23] G. Lindblad, Communications in Mathematical Physics , 119(1976).[24] H.-P. Breuer and F. Petruccione, The theory of open quantumsystems (Oxford University Press on Demand, 2002).[25] C. Schneider, Numerische Mathematik , 177 (1975). SUPPLEMENTAL MATERIALReducing noise by Richardson extrapolation

In numerical analysis, Richardson extrapolation [15, 16] is a sequence acceleration method, used to improve the rate ofconvergence of a sequence. We use the same technique to extrapolate to the zero-noise limit in short-depth quantum circuits inthe presence of noise. We assume that the noise process is constant in time and does not depend on the rescaling of the systemHamiltonian parameters. We consider various noise models in continuous time.It is our goal to estimate the expectation value of some observable A with respect to the evolved state ρ λ ( T ) . The actualcomputation is now encoded in the time-dependent Hamiltonian K ( t ) , and the full evolution is given by the equation ∂∂t ρ = − i [ K ( t ) , ρ ] + λ L ( ρ ) . (11)Here we have that K ( t ) = (cid:88) α J α ( t ) P α (12)is some multi-qubit Pauli Hamiltonian with time-dependent coupling constant J α ( t ) ∈ R and Pauli operators P α ∈(cid:104) , X i , Y i , Z i (cid:105) i . Starting from some initial state ρ the system is evolved for some time T . We consider different forms ofnoise L ( ρ ) . As the simplest form of noise, we assume a time-independent Lindblad operator [23] of the form L ( ρ ) = (cid:88) β L β ρL † β − { L † β L β , ρ } . (13)However, we can also imagine other forms of errors, such as L ( ρ ) = − i [ V, ρ ] , (14)where V is some Hamiltonian. This setting in useful when we want to consider more general, possibly non-Markovian noisemodels, or a noisy evolution derived from ﬁrst principle [24]. One can make the assumption that the initial state is given by ρ = ρ S (0) ⊗ ρ B (0) and give the most general form of an interaction Hamiltonian between system and bath, such as V = (cid:88) α S α ⊗ B α + H B . (15)Here we take the point of view that a small λ indicates a separation of time scales, and ρ S (0) = | ψ (cid:105)(cid:104) ψ | may be the initialstate of the computation. We assume that ρ (0) = ρ S (0) ⊗ ρ B , where the bath state is a steady state w.r.t the bath Hamiltonian [ H B , ρ B ] = 0 . The observable we want to estimate A = A S ⊗ is then only supported on the system degrees of freedom.We assume that ρ λ ( T ) is the state we obtain after the noisy evolution for time T . From this state we can estimate theexpectation value of the observable A by various methods. Typically we will sample the expectation value E K ( λ ) = tr ( Aρ λ ( T )) (16)so that an additional sampling error δ is introduced, and we obtain from out measurement the statistic ˆ E K ( λ ) = E K ( λ ) + δ .The error can assumed to be asymptotically Gaussian δ = O (cid:16) M − / (cid:112) tr ( ρ λ ( T )( A − E K ( λ )) ) (cid:17) since one typically repeatsthe experiment M (cid:29) times and the i.i.d hypothesis holds. I Series expansion in the noise parameter

We now show that the function E K ( λ ) can be expressed as a series in λ where the contribution with λ corresponds to thenoise-free evolution. We also provide a bound on the error term. To this end, we transform into the interaction picture of K ( t ) .We deﬁne U K ( t ) = T (cid:110) exp( − i (cid:82) t K ( t (cid:48) ) dt (cid:48) ) (cid:111) , where T {·} deﬁnes the time order expansion. We deﬁne the interaction picturethrough ρ I ( t ) = U K ( t ) ρ ( t ) U † K ( t ) and L I,t ( ◦ ) = U K ( t ) L (cid:16) U † K ( t ) ◦ U K ( t ) (cid:17) U † K ( t ) , (17)where now the generator L I,t has become time-dependent. The evolution equation in the interaction picture now reads ∂ t ρ I ( t ) = λ L I,t ( ρ I ( t )) . (18)Recall that every ﬁrst-order differential equation can be reformulated as an integral equation ρ I ( T ) = ρ I (0) + λ (cid:90) T L I,t ( ρ I ( t )) dt. (19)This equation can be recursively solved to increasing order in λ so that ρ I ( T ) = ρ I (0) + λ (cid:90) T L I,t ( ρ I (0)) dt + λ (cid:90) T (cid:90) t L I,t ◦ L

I,t (cid:48) ( ρ I (0)) dtdt (cid:48) + λ (cid:90) T (cid:90) t (cid:90) t (cid:48) L I,t ◦ L

I,t (cid:48) ◦ L

I,t (cid:48)(cid:48) ( ρ I (0)) dtdt (cid:48) dt (cid:48)(cid:48) . . . . (20)Recall that ρ I (0) = ρ (0) . Furthermore, we can conjugate the full expression on both sides with the unitary U K ( T ) .Let us for notational convenience deﬁne ρ λ ( T ) as the resulting state after evolution with noise rate λ . We observe that U K ( T ) † ρ (0) U K ( T ) = ρ ( T ) , whereas U K ( T ) † ρ I ( T ) U K ( T ) = ρ λ ( T ) , so that we obtain the expression in the Schr¨odingerpicture as ρ λ ( T ) = ρ ( T ) + n (cid:88) k =1 λ k (cid:90) T (cid:90) t . . . (cid:90) t k − U † K ( T ) L I,t ◦ L I,t ◦ . . . ◦ L I,t k ( ρ (0)) U K ( T ) dt dt . . . dt k + λ n +1 (cid:90) T (cid:90) t . . . (cid:90) t n U † K ( T ) L I,t ◦ L I,t ◦ . . . ◦ L I,t n +1 ( ρ I ( t n +1 )) U K ( T ) dt dt . . . dt n +1 . (21)The expectation value E K ( λ ) = tr ( Aρ λ ( T )) for the observable A can immediately be expanded in a series with parameter λ ofthe form E K ( λ ) = tr ( Aρ ( T )) + n (cid:88) k =1 a k λ k + R n +1 ( λ, L , T ) , (22)where the constants a k and the remainder R n +1 ( λ, L , T ) are obtained by pairing the integrals with the trace tr ( A · ) and tr ( Aρ ( T )) = E ∗ corresponds to the noise-free evolution to which we seek to extrapolate. We read off that a k = (cid:90) T (cid:90) t . . . (cid:90) t k − tr (cid:16) U K ( T ) AU † K ( T ) L I,t ◦ L I,t ◦ . . . ◦ L I,t k ( ρ (0)) (cid:17) dt dt . . . dt k , (23)as well as R n +1 ( λ, L , T ) = λ n +1 (cid:90) T (cid:90) t . . . (cid:90) t n tr (cid:16) U K ( T ) AU † K ( T ) L I,t ◦ . . . ◦ L I,t n +1 ( ρ I ( t n +1 )) (cid:17) dt . . . dt n +1 . (24)We can bound | R n +1 ( λ, L , T ) | by a simple application of Cauchy’s mean value theorem and H¨older’s inequality. We observeby ﬁrst applying the midpoint Theorem that there exist ξ , . . . , ξ n +1 so that R n +1 ( λ, L , T ) = λ n +1 T n +1 ( n + 1)! tr (cid:16) U K ( T ) AU † K ( T ) L I,ξ ◦ . . . ◦ L I,ξ n +1 ( ρ I ( ξ n +1 )) (cid:17) . (25)We can then of course immediately bound the inner product | tr (cid:16) U K ( T ) AU † K ( T ) L I,t ◦ . . . ◦ L I,t n +1 ( ρ I ( t n +1 )) (cid:17) | ≤ (cid:107) A (cid:107)(cid:107)L I,tξ ◦ . . . ◦ L I,ξ n +1 ( ρ I ( ξ n +1 )) (cid:107) (26)by a direct application of H¨older’s inequality. Note that all Schatten norms are unitarily invariant, so when the map L is bounded,we can apply the subsequent operator norm inequalities (cid:107)L I,tξ ◦ . . . ◦ L I,ξ n +1 ( ρ I ( ξ n +1 )) (cid:107) ≤ (cid:107)L(cid:107) n +11 → . (27)It is safe to assume that a Lindblad operator L acting on a ﬁnite dimensional system, such as a collection of qubits, is bounded.However, we also consider the case of a ﬁrst-principle noise model that can even be non-Markvoian. In this setting the operator L ( ρ ) = − [ V, ρ ] is expected to couple to an arbitrary large bath and V may contain unbounded operators, such as bosonicoperators. In such a setting an upper bound in terms of an operator norm of L is a moot point. Yet, in this case we can transformthe evolution into the Heisenberg picture L ∗ , for the observable A (0) = A S ⊗ , and look at the equations for A ( t ) instead. Thealmost identical analysis as performed above can be carried through, but this time we can obtain a bound on | tr (cid:16) ρ I (0) L ∗ I,ξ n +1 ◦ . . . ◦ L ∗ I,tξ ( A ( t )) (cid:17) | ≤ (cid:107) A ( t ) (cid:107)(cid:107)L I,tξ ◦ . . . ◦ L I,ξ n +1 ( ρ I (0)) (cid:107) . (28)We obtain almost the same type of bound from H¨older’s inequality since (cid:107) A ( t ) (cid:107) ≤ (cid:107) A (cid:107) for contractive evolutions, where thesole difference is now that (cid:107)L I,tξ ◦ . . . ◦ L I,ξ n +1 ( ρ I (0)) (cid:107) ≤ l n +1 only depends on the initial state. Since we now consider theaction of the operators in V on a well-behaved initial state ρ (0) , we can assume that l n +1 is a reasonable bound. In either case,we will now write for the bound on | R n +1 ( λ, L , T ) | from now on: | R n +1 ( λ, L , T ) | ≤ (cid:107) A (cid:107) l n +1 λ n +1 T n +1 ( n + 1)! . (29)The coefﬁcients a k can be bounded in a similar fashion. Note that, if we assume that noise acts locally on each qubit, such asfor instance, when the dissipator L corresponds to single-qubit depolarizing noise, so that L ( ρ ) = (cid:80) Ni =1 ( tr [ i ] ( ρ ) − ρ ) . Wehave that ||L|| → = O ( N ) is extensive in the system size. A similar argument holds for the case when the individual qubitscouple to a bath. From this we can deduce that for local noise we typically ﬁnd l k = O ( N k ) as mentioned in the main text, andthat | a k | ≤ O (( N T ) k ) .It is also worthwhile to point out the following observation. For different types of error terms L it may happen that not allpowers of λ are present in the expansion. It is conceivable that some system bath interactions could lead to an expansion in onlyeven powers of λ . If this occurs, the Richardson extrapolation method is particularly efﬁcient, since a higher order of precisioncan be obtained with fewer values of λ j . II Experimental rescaling of the noise parameter

In order to apply Richardson extrapolation, we have to be able to evaluate E K ( λ ) for different values of λ . In an actualexperiment, we can’t directly control the parameter λ : however, we may control the evolution K ( t ) . To this end, we introduce arescaling. We redeﬁne T → T (cid:48) = cT as well as J α ( t ) → J (cid:48) α ( t ) = c − J α ( c − t ) from which also ρ ( t ) → ρ (cid:48) ( t ) = ρ ( c − t ) . (30)We claim that this rescaling maps ρ (cid:48) λ ( T (cid:48) ) = ρ cλ ( T ) if the noise operator L does not depend on the Hamiltonian couplings J α ( t ) and is constant in time. This rescaled density matrix then leads to a new evaluation E (cid:48) K ( λ ) → E K ( cλ ) of the expectation value.To see that the rescaling has the desired effect, we again make use of the integral representation of ρ λ ( T ) , for which we canwrite now in the Schr¨odinger picture ρ λ ( T ) = ρ (0) − i (cid:90) T [ K ( t ) , ρ ( t )] dt + λ (cid:90) T L ( ρ ( t )) dt. (31)We can now choose a re-parametrization of the evolution c − J α ( c − t ) and an increased runtime cT , and write ρ (cid:48) λ ( T (cid:48) ) = ρ (0) − i (cid:90) cT [ K (cid:48) ( t ) , ρ (cid:48) ( t )] dt + λ (cid:90) cT L ( ρ (cid:48) ( t )) dt (32)with K (cid:48) ( t ) = (cid:80) α c − J α ( c − t ) P α . If we now substitute the integration variable according to t = ct (cid:48) , we have that dt = cdt (cid:48) ,which leads to ρ (cid:48) λ ( T (cid:48) ) = ρ (0) − i (cid:90) T (cid:88) α c − J α ( t (cid:48) )[ P α , ρ ( t (cid:48) )] cdt (cid:48) + λ (cid:90) T L ( ρ (cid:48) ( t )) cdt (cid:48) = ρ (0) − i (cid:90) T (cid:88) α [ K ( t (cid:48) ) , ρ ( t (cid:48) )] dt (cid:48) + λc (cid:90) T L ( ρ ( t (cid:48) )) dt (cid:48) = ρ cλ ( T ) . (33)Hence, rescaling the evolution according to equation (30) leads to an effective rescaling of the dissipative rate λ . This can bedone for any constant dissipator L and allows the experimenter to evaluate E K ( λ ) for different values of cλ so that we canapply the Richardson extrapolation procedure.Note that for different experimental circumstances, other rescaling methods of the parameter λ may actually be easier toimplement. For example, in an optical experiment that is plagued by photon loss, it may be suitable to consider differentmethods of directly changing the photon loss rate. The only requirement is that the modiﬁcation of λ j can be performedsufﬁciently accurately so the extrapolation can be performed. III Error bounds on the noise-free estimator

Let us now show that the protocol leads to the desired error bound on the estimated expectation value as claimed in the maintext. Recall that we ﬁrst choose a set of n +1 rescaling parameters c = 1 < c < . . . < c n , to evolve with respect to the rescaledHamiltonian K j ( t ) for time T j = c j T . As discussed in the previous section, this evolution leads to a state ρ jλ ( T j ) = ρ c j λ ( T ) ,c.f. Eq. (33) as was discussed in section II. If we now measure the observable A on these states we obtain for j = 0 . . . n + 1 the estimates ˆ E K ( c j λ ) = E K ( c j λ ) + δ j . Recall the set of equations for γ j deﬁned in [25] and given in the main text, whichrequires for the { c j } that n (cid:88) l =0 γ j = 1 n (cid:88) j =0 γ j c kj = 0 for k = 1 . . . n. (34)Now we observe that estimators ˆ E K ( c j λ ) can be expressed as ˆ E K ( c j λ ) = E ∗ + n (cid:88) k =1 a k c kj λ k + R ( c j λ, L , T ) + δ j (35)due to the expansion (22) discussed in section I. Recall now the deﬁnition of our improved estimator ˆ E nK ( λ ) as given in themain text, ˆ E nK ( λ ) = (cid:80) nj =0 γ j ˆ E K ( c j λ ) , for which then ˆ E nK ( λ ) = n (cid:88) j =0 (cid:32) γ j E ∗ + n (cid:88) k =1 a k c kj λ k + R ( c j λ, L , T ) + δ j (cid:33) = E ∗  n (cid:88) j =0 γ j  + n (cid:88) k =1 a k λ k  n (cid:88) j =0 γ j c kj  +  n (cid:88) j =0 γ j R ( c j λ, L , T ) + δ j  . (36)Recall the equations for γ j from which we can then infer after the application of the triangle inequality | E ∗ − ˆ E nK ( λ ) | ≤ n (cid:88) j =0 | γ j | ( | R ( c j λ, L , T ) | + | δ j | ) . (37)After the application of the bound | R ( c j λ, L , T ) | ≤ (cid:107) A (cid:107) l n +1 c n +1 j λ n +1 T n +1 (( n + 1)!) − and the observation that c j ≥ , wecan bound the difference with Γ n = (cid:80) nj =0 | γ j | c n +1 j and obtain the ﬁnal bound | E ∗ − ˆ E nK ( λ ) | ≤ Γ n (cid:18) δ ∗ + (cid:107) A (cid:107) l n +1 λ n +1 T n +1 ( n + 1)! (cid:19) , (38)with δ ∗ = max j | δ j | .In the Richardson extrapolation literature [16], two types of sequences c j are considered frequently. In the Bulirsch - Stoerseries the rescalings are chosen so that c j = h j c constitutes an exponential series, which is typically chosen at base h = 1 / ;but harmonic series have also been frequently applied, e.g. for parameters q > , η ≥ one can choose c j = ( j + η ) − q c .Note that in our experiments we are actually increasing the noise rate starting from the optimal value, whereas it is commonin the numerical literature to improve the small parameter, so that c j +1 ≤ c j . The result here is of course the same, and justcorresponds to a reordering of the labels when n is ﬁnite. For both of the aforementioned cases, a bound on Γ n has been derived.One is mostly interested in the asymptotic behavior of Γ n as n → ∞ in order to analyze the numerical stability of the method.In current experiments we only expect to go to third or forth order, making the stability analysis less relevant. Probabilistic error cancellation by resampling

The key idea of our scheme is to represent the ideal circuit as a quasi-probabilistic mixture of noisy ones. Central to thisapproach is the quasi probability representation (QPR) of the noise-free circuit U β . We note that quasi-probability distributionshave been previously used to construct classical algorithms for simulation of quantum circuits [17, 18]. Our work can be viewedas an application of these methods to the problem of simulating ideal quantum circuits by noisy ones.The general approach to constructing a QPR for a quantum circuit is the following: Suppose you are given a set of noisyoperations Ω = {O , . . . , O m } that can be implemented on a noisy N -qubit device. We assume that we can perform gatetomography [19] to specify the gates with an accuracy that is comparable to the desired accuracy of the ideal circuit. Thesenoisy operations are noisy versions of ideal quantum gates and are assumed to form a full basis of TPCP operations, i.e. anelement O k ∈ Ω is always of the form O k ( ρ ) = (cid:88) i O i,k ρO † i,k with (cid:88) i O † i,k O i,k = . (39)A crucial condition is that the set of noisy operators Ω constitutes a basis in the space of TPCP operations that is sufﬁcientlylarge, so that any ideal, unitary gate U ( ρ ) = U ρU † can be expressed as a linear combination of noisy gates in Ω . Hence, therehave to be coefﬁcients η α ∈ R , and noisy operations in O α ∈ Ω so that we can write for any ideal gate in the circuit U ( ρ ) = (cid:88) α η α O α ( ρ ) , ∀ ρ. (40)This linear expansion of U can then be cast into the form of a quasi-probability representation.On real quantum devices we can only apply the noisy operations Ω . We say that a circuit of length L in the basis Ω isa sequence of L operations from Ω . Such a circuit, c.f. Fig 3(b), indexed by α = ( α , . . . , α L ) implements a noisy map O α = O α L · · · O α O α .0The expectation value of an observable A on the ﬁnal state produced by a noisy circuit α is E ( α ) = Tr (cid:2) A O α ( | (cid:105)(cid:104) | ⊗ n ) (cid:3) . For simplicity, we ignore errors in the initial state preparation and in the ﬁnal measurement. Such errors can be accounted for byadding dummy noisy operations before each measurement and after each qubit initialization. Furthermore, we shall assume that A is diagonal in the Z -basis and (cid:107) A (cid:107) ≤ .The task of simulating an ideal quantum circuit U β , c.f. Fig. 3(a), of ideal gates U β = U β L . . . U α U α , can be reduced toestimating the expectation values E ( α ) for a suitable random ensemble of noisy quantum circuits α . That is, we can obtainestimates for the ideal expectation values E ∗ ( β ) = Tr (cid:2) A U β ( | (cid:105)(cid:104) | ⊗ n ) (cid:3) , after the application of the circuit U β by estimating noisy circuit outputs. Moreover, the ideal and the noisy circuits act on thesame number of qubits and have the same length.We say that the noisy basis Ω simulates an ideal circuit β if there exists a probability distribution P β ( α ) on the set of noisycircuits α ∈ Ω L such that U β = γ β (cid:88) α ∈ Ω L P β ( α ) σ β ( α ) O α (41)for some coefﬁcients σ β ( α ) = ± . We require that the distribution P β ( α ) is sufﬁciently simple so that one can efﬁcientlysample α from P β ( α ) . The coefﬁcients γ β , σ β ( α ) must be efﬁciently computable. FIG. 3. (color online) The ﬁgure (a) represents the ideal circuit we want to simulate. It is comprised of single- and two-qubit gates { U , . . . , U } . We assume that a complete set of noisy gates exist Ω = {O α , . . . , O α } , which serve as an operator basis in whichthe action of the ideal set can be expanded. It is then sufﬁcient to sample circuits, as given in ﬁgure (b), where the gates are drawn from theprobability distribution P β in Eq. (41). We can see that the estimates of the noisy circuit are related to the ideal circuit probability by substituting Eq. (41) into thedeﬁnition of E ∗ ( β ) . This gives E ∗ ( β ) = γ β (cid:88) α ∈ Ω L P β ( α ) σ β ( α ) E ( α ) . (42)The construction of QPRs with optimal overhead of general operation-dependent noise is an interesting open problem. Apreliminary analysis shows that a noisy basis Ω that includes noisy versions of all single-qubit and two-qubit Clifford gates, T -gates, and noisy qubit initializations in the X, Y, Z basis can simulate any ideal gate U β from the Clifford+ T gate set withthe overhead γ β ≤ O ( (cid:15) ) , provided that each noisy operation is (cid:15) -close to its ideal analogue. Unfortunately, the constantcoefﬁcient in this upper bound is far too large to have any practical implications.Furthermore, the full Clifford group on two-qubits contains gates. It may not be feasible to perform process tomographyfor each of those gates. We shall see that for certain noise models, such as the amplitude damping noise, QPRs can be constructedonly if the noisy basis Ω includes some qubit initialization maps. In particular, noisy circuits α that appear in Eq. (41) may applyqubit initializations at intermediate time steps, even though the ideal circuit β initializes all the qubits at the very ﬁrst step. AllQPRs constructed below preserve the circuit depth. That is, if the ideal circuit β has depth d then all noisy circuits α in Eq. (41)have depth at most d .1 IV Minimal overhead decomposition of noise free circuit

Let us discuss how to construct QPRs with a small overhead. For concreteness, we choose the ideal gate set Γ as the Clifford+ T basis. It includes the identity gate I , the Hadamard gate H , phase-shift gates S = diag[1 , i ] and T = diag[1 , e iπ/ ] , andthe CNOT. For technical reasons, we shall assume that each CNOT is followed by single-qubit gates (that could be identitygates). We shall consider toy noise models usually studied in the quantum fault-tolerance theory: the depolarizing noise and theamplitude damping noise.First let us describe product QPRs that can be constructed independently for each gate in the ideal circuit. Consider a ﬁxedideal gate U β ∈ Γ . Let O , . . . , O p ∈ Ω be the list of all noisy operations whose support is contained in the support of U β .Consider the following linear program with p real variables µ , . . . , µ p , η , . . . , η p . minimize p (cid:88) α =1 µ α (43) subject to  η α ≤ µ α − η α ≤ µ α U β = (cid:80) pα =1 η α O α . (44)Suppose { µ α , η α } is the optimal solution of the program. Note that µ α = | η α | for all α since otherwise the objective functioncan be decreased. Deﬁne γ β = (cid:80) pα =1 µ p , P β ( α ) = µ α /γ β , and σ β ( α ) = sgn( η α ) . Then U β = γ β p (cid:88) α =1 P β ( α ) σ β ( α ) O α , (45)which is a gate-wise version of the QPR Eq. (41). We shall say that a noisy basis Ω simulates a gate U β with the overhead γ β ifthe linear program Eqs. (43,44) has a feasible solution with value γ β . A product QPR of the ideal circuit β is deﬁned as a productof all gate-wise QPRs Eq. (45). It gives γ β = γ β · · · γ β L , P β ( α ) = P β ( α ) · · · P β L ( α L ) and σ β ( α ) = σ β ( α ) · · · σ β L ( α L ) .The assumption that all noisy operations O α in Eq. (44) act non-trivially only within the support of U β allows one to restrictEq. (44) to operations acting on at most two qubits. Such operations can be represented by real matrices of size × bycomputing matrix elements of O α and U β in the Pauli basis. Thus the program Eqs. (43,44) can be solved in time O (1) . Sincethe ideal gate set has size O ( n ) , product QPRs can be computed in time O ( n ) . Furthermore, if two ideal gates have disjointsupports, then the gate-wise QPRs deﬁned in Eq. (45) have disjoint supports. Thus product QPRs preserve the circuit depth. V Depolarizing noise cancellation and numerical results

Let us illustrate the construction of product QPRs using the depolarizing noise as an example. Let D k be the (cid:15) -depolarizingchannel on k qubits that returns the maximally mixed state with probability (cid:15) and does nothing with probability − (cid:15) . Deﬁnea noisy version of a k -qubit unitary gate U as D k U . Deﬁne a noisy basis Ω by multiplying ideal gates on the left by arbitraryPauli operators and adding the depolarizing noise. Thus Ω is a set of operations O α = D k PU , where U ∈ Γ is a k -qubitideal gate and P ∈ {I , X , Y , Z} ⊗ k is a Pauli TPCP map. A Pauli map P corresponding to a Pauli operator P ∈ { I, X, Y, Z } is deﬁned by P ( ρ ) = P ρP . Here k = 1 , . We claim that Ω simulates ideal single-qubit gates U β ∈ Γ with the overhead γ β = (1 + (cid:15)/ / (1 − (cid:15) ) and simulates CNOTs with the overhead γ β = (1 + 7 (cid:15)/ / (1 − (cid:15) ) .Indeed, suppose U β ∈ Γ is a single-qubit gate. Let us look for a solution of Eq. (44) in the form O α = D PU β , where P ∈ {I , X , Y , Z} . Then Eq. (44) is equivalent to D − = η I + η X + η Y + η Z . One can easily check that the optimal solution minimizing (cid:80) α | η α | is η = 1 + 3 (cid:15)/ − (cid:15) ) and η α = − (cid:15)/ − (cid:15) ) for α = 2 , , . Therefore γ β = (cid:80) α | η α | = (1 + (cid:15)/ / (1 − (cid:15) ) . The CNOT is simulated in a similar fashion by representing D − asa linear combination of two-qubit Pauli maps. The random ensemble of noisy circuits O α that simulates an ideal circuit U β isconstructed in three steps:1. Start from the ideal circuit, O α = U β .22. Modify O α by adding a Pauli X, Y, Z after each single-qubit gate with probability p = (cid:15)/ (4+2 (cid:15) ) . The gate is unchangedwith probability − p .3. Modify O α by adding a Pauli IX, IY, . . . , ZZ after each CNOT with probability p = (cid:15)/ (16 + 14 (cid:15) ) . The CNOT isunchanged with probability − p .The resulting circuit is then implemented on a noisy device (which adds the depolarizing noise after each gate) and the ﬁnalreadout string x is recorded. By generating M samples of x one can estimate E ∗ ( β ) using Eq. (10) of the main text. The signfunction σ β ( α ) is equal to ( − r , where r is the number of Pauli operators added to the ideal circuit U β to obtain O α . Numerical simulations

The error cancellation method was tested numerically for small Clifford+ T circuits subject to the depolarizing noise. Wechoose the ideal circuit U β as a composition of d alternating layers of gates, with each layer being either a tensor product of n single-qubit gates I, H, S, T (for odd layers) or a tensor product of n/ CNOTs (for even layers). The resulting circuit U β hasdepth d . Simulations were performed for random circuits U β as above with the initial state | + (cid:105) ⊗ n . Each single-qubit gatewas picked randomly from the set { I, H, S, T } . Control and target qubits for each CNOT were picked at random.For each ideal circuit U β we choose the observable A as a projector onto the subset of n − basis states x ∈ { , } n whoseprobability in the ﬁnal state of U β is above the median value. In other words, A = (cid:88) x ∈ S | x (cid:105)(cid:104) x | , S = arg min S ⊆{ , } n | S | =2 n − (cid:88) x ∈ S (cid:104) x |U β ( | + (cid:105)(cid:104) + | ⊗ n ) | x (cid:105) . By construction, E ∗ ( β ) ≥ / for any circuit β . Furthermore, we observed that E ∗ ( β ) is well separated from / for most ofthe circuits see Fig. 4. Recall that we deﬁne a noisy version of a k -qubit unitary gate U as D k U , where N u m be r o f s i m u l a t ed c i r c u i t s FIG. 4. Distribution of the ideal circuits according to their output probability E ∗ ( β ) . D k ( ρ ) = (1 − (cid:15) ) ρ + (cid:15)I k Tr( ρ ) is the depolarizing channel on k qubits. Noise was added after all gates including the identity gates. In this case the totalsimulation overhead γ β depends only on the number of qubits and the circuit depth, namely γ β = (cid:20) (cid:15)/ − (cid:15) (cid:21) nd/ · (cid:20) (cid:15)/ − (cid:15) (cid:21) nd/ . Consider a ﬁxed ideal circuit U β and let P β ( α ) , O α be the random ensemble of noisy circuits obtained from U β by insertingrandom Pauli operators and adding noise as described in the main text. Instead of using the estimate Eq. (10) of the main text3for the ideal output probability E ∗ ( β ) we opted for a slightly optimized estimate. It is deﬁned by dividing the total budget of M runs into K groups such that the j -th group contains M j runs M = K (cid:88) j =1 M j . Deﬁne a random variable ˆ E ( β ) ≡ γ β K − K (cid:88) j =1 σ β ( α j ) 1 M j M j (cid:88) a =1 (cid:104) x aj | A | x aj (cid:105) , (46)where α , . . . , α K are independent samples drawn from the distribution P β ( α ) and x aj ∈ { , } n are readout strings obtainedby measuring each qubit of the ﬁnal state O α j ( ρ in ) in the Z -basis. We prepare a fresh copy of the ﬁnal state to generate eachstring x aj . Thus computing ˆ E ( β ) requires M runs of the noisy circuits with each run producing a single readout string. One caneasily check that ˆ E ( β ) is an unbiased estimator of E ∗ ( β ) for any choice of { M j } . Our goal is to choose { M j } that minimizethe variance of ˆ E ( β ) for a ﬁxed M . One can easily check that the optimal choice is M j ≈ M σ j (cid:80) Ki =1 σ i where σ j = E ( α j ) − E ( α j ) . In order to choose optimal values of M j , one has to run each circuit α j at least a few times,which gives a rough estimate of E ( α j ) and thus σ j . Numerical simulations were performed for the following parameters:number of qubits n = 6 circuit depth d = 20 error rate (cid:15) = 0 . total number of runs M = 4 , simulation overhead γ β ≈ . Our results are presented on the left panel of Figure 5. For each of ≈ ideal circuits β generated at random we computed asimulation precision δ ( β ) ≡ | ˆ E ( β ) − E ∗ ( β ) | where ˆ E ( β ) is the estimate deﬁned in Eq. (46). The plot on Figure 5, left, showsdistribution of the ideal circuits β according to their simulation precision δ ( β ) . The median value of δ ( β ) is approximately . .This is consistent with the estimate δ ( β ) ≈ γ β √ M = 4 . √ ≈ . . We also computed a simulation precision δ ( β ) that one would obtain by running the circuit β directly on a noisy devicewithout error cancellation, see the right panel of Figure 5. It is deﬁned as δ ( β ) ≡ | E ( β ) − E ∗ ( β ) | . For each circuit the outputprobability E ( β ) was estimated using M = 4 , circuit runs. Thus the simulations presented on the left and the right panelsof Figure 5 have access to exactly the same resources. The median value of δ ( β ) is approximately . . We conclude that errorcancellation signiﬁcantly improves the simulation precision. VI Quasi-probability representation for amplitude-damping noise

A more interesting example is the noise described by the amplitude-damping channel A ( ρ ) = A ρA † + A ρA † , where A = (cid:34) − (cid:15) ) / (cid:35) and A = (cid:15) / (cid:34) (cid:35) . A noisy version of a k -qubit unitary gate U is deﬁned as A ⊗ k U . In contrast to the previous example, noisy unitary gates A ⊗ k U alone cannot simulate any ideal unitary gate. Indeed, assume the contrary. Suppose U β is a single-qubit gate that has a QPREq. (45) with O α = AV α for some unitary maps V α . Rewrite Eq. (45) as A − = γ β p (cid:88) α =1 P β ( α ) σ β ( α ) V α U − β . N u m be r o f s i m u l a t ed c i r c u i t s N u m be r o f s i m u l a t ed c i r c u i t s FIG. 5. Simulation precision for ≈ randomly generated ideal Clifford+ T circuits on n = 6 qubits with depth d = 20 . The left and theright panels show results for simulations with and without error cancellation. In both cases each ideal circuit was simulated by M = 4000 runs of the noisy circuit. Since the maps V α U − β are unital, we infer that A − and A are unital which is false. Thus Eq. (45) has no solutions.To overcome this problem we shall extend the noisy basis by adding state preparations. Also we shall employ non-productQPRs. Given a single-qubit state | ψ (cid:105) , deﬁne a state preparation map P | ψ (cid:105) ( ρ ) = Tr( ρ ) · | ψ (cid:105)(cid:104) ψ | . (47)Let S ( ρ ) = SρS − be the S -gate considered as a TPCP map. Deﬁne a noisy basis Ω that includes noisy state preparations AP | ψ (cid:105) with | ψ (cid:105) = | + (cid:105) , |−(cid:105) , | (cid:105) , | (cid:105) , noisy single-qubit gates AU β , AS ± U β for each ideal single-qubit gate U β ∈ Γ , and noisytwo-qubit gates A c A t S yc S zt U cnot , y, z ∈ { , ± } where c, t are the control and the target qubits of a CNOT gate U cnot ∈ Γ . Here the subscripts indicate qubits acted upon by eachmap. We claim that this noisy basis Ω simulates any ideal Clifford+ T circuit β with the overhead γ β ≤ γ L +2 L , γ ≡ (cid:15) − (cid:15) , (48)where L k is the number of k -qubit gates in β . The corresponding QPR Eq. (41) preserves the circuit depth, although it does nothave a simple product form as above.Indeed, consider a single-qubit gate U β ∈ Γ . Let us look for a solution of Eq. (44) with p = 4 and O = AU β , O = ASU β , O = AS − U β , O = P | (cid:105) . Note that P | (cid:105) ∈ Ω since AP | (cid:105) = P | (cid:105) . Furthermore, since P | (cid:105) = AP | (cid:105) U β , one can rewrite Eq. (44) as A − = η I + η S + η S − + η P | (cid:105) . (49)One can easily check that the optimal solution minimizing (cid:80) α | η α | is η = 1 √ − (cid:15) , η = η = 1 − √ − (cid:15) − (cid:15) ) , η = − (cid:15) − (cid:15) . Therefore, Ω simulates U β with the overhead (cid:80) α | η α | = γ , where γ is deﬁned in Eq. (48).Next consider the CNOT gate U cnot ∈ Γ . Consider a decomposition of A − c A − t obtained by applying Eq. (49) twice.Multiplying this decomposition on the right by U cnot and on the left by A c A t one obtains U cnot = (cid:88) α η α O (cid:48) α O α , (cid:88) α | η α | = γ (50)5where O α = A c A t S yc S zt U cnot ∈ Ω is a valid noisy operation and O (cid:48) α is either identity or a state preparation map P | (cid:105) appliedto the control and/or target qubits. Here we noted that AP | (cid:105) = P | (cid:105) A . Although O (cid:48) α O α might not be a valid noisy operationfrom Ω , we may merge O (cid:48) α with the next gate applied after the CNOT. Indeed, by assumption, each CNOT in the ideal circuitis followed by some single-qubit gates U c and U t applied to the control and the target qubits. The gates I, S, T can be absorbedinto P | (cid:105) since they act trivially on the state | (cid:105) . The only non-trivial case is when P | (cid:105) is merged with the Hadamard gate. Inthis case the latter is replaced by the state preparation P | + (cid:105) .Since P | + (cid:105) can now appear in the ideal circuit, we must be able to use noisy operations from Ω to simulate P | + (cid:105) . Let us lookfor a solution of Eq. (44) with U β ≡ P | + (cid:105) in the form P | + (cid:105) = η AP | + (cid:105) + η AP |−(cid:105) + η AP | (cid:105) . (51)Note that the righthand side of Eq. (51) contains only noisy operations from Ω . One can rewrite Eq. (51) as | + (cid:105)(cid:104) + | = η A ( | + (cid:105)(cid:104) + | ) + η A ( |−(cid:105)(cid:104)−| ) + η A ( | (cid:105)(cid:104) . The optimal solution minimizing (cid:80) α | η α | is η , = ± (cid:18) √ − (cid:15) ± − (cid:15) − (cid:15) (cid:19) , η = (cid:15) − (cid:15) . Therefore Ω simulates the ideal state preparation P | + (cid:105) with the overhead γ (cid:48) = (cid:80) α | η α | ≤ γ , where γ is deﬁned in Eq. (48).A QPR of the ideal circuit β with the overhead Eq. (48) is constructed in two steps. First, one applies the decompositionEq. (50) to each CNOT of β and merges state preparation maps P | (cid:105) that appear in O (cid:48) α (if any) with the single-qubit gates of β following the CNOT. Now all CNOT gates are replaced by noisy gates from Ω . The rest of the circuit consists of single-qubitgates U β ∈ Γ and state preparations P | + (cid:105) . At the second step, each of these ideal operations is replaced by its QPR constructedabove. Note that each CNOT contributes γ to the total overhead γ β , see Eq. (50). Each single-qubit gate U β ∈ Γ or a statepreparation P | + (cid:105) contributes at most γγ