OOn maximum-likelihood decoding with circuit-level errors
Leonid P. Pryadko
Department of Physics & Astronomy, University of California, Riverside, California 92521, USA (Dated: July 15, 2020)Error probability distribution associated with a given Clifford measurement circuit is describedexactly in terms of the circuit error-equivalence group, or the circuit subsystem code previouslyintroduced by Bacon, Flammia, Harrow, and Shi. This gives a prescription for maximum-likelihooddecoding with a given measurement circuit. Marginal distributions for subsets of circuit errors arealso analyzed; these generate a family of related asymmetric LDPC codes of varying degeneracy.More generally, such a family is associated with any quantum code. Implications for decodinghighly-degenerate quantum codes are discussed.
I. INTRODUCTION
Quantum computation offers exponential algorithmicspeed-up for some classically hard problems. Thispromise is conditional in a fundamental way upon theuse of quantum error correction (QEC). However, despitean enormous progress achieved both in theory and ex-periment during the quarter century since the inventionof QEC[1], universal repetitive QEC protecting againstboth phase and amplitude errors has not yet been demon-strated in any of the qubit systems constructed to date.This illustrates the enormous difficulty of building quan-tum computers (QCs) with sufficiently small error rates.Given also the engineering difficulties with scaling upthe number of qubits[2], it is important that on the algo-rithmic side one tries to achieve every optimization possi-ble. Among other things, we would like to maximize theprobability of successful syndrome-based decoding withQEC. Given the available hardware, this requires choos-ing the optimal code, the optimal measurement circuit,and the optimal decoder. In particular, a decoder shouldbe designed for the specific syndrome measurement cir-cuit, as it must be aware of the associated correlationsbetween the errors[3–5]—at sufficiently high error proba-bilities such correlations are present even in fault-tolerant(FT) circuits designed to prevent a single fault from af-fecting multiple data qubits.In contrast, the standard approach is to use a de-coder optimized for the underlying code, regardless ofthe actual circuit used for syndrome measurement. Insome cases, e.g., in the case of surface codes withminimum-weight perfect matching (MWPM) decoder,some leading-order correlations can be included in theedge weights of the graph used for decoding[4, 6–8]. How-ever, such a scheme is limited to codes and measure-ment circuits where MWPM can be done efficiently, e.g.,surface codes with single-ancilla measurement circuits[9],and certain other classes or topological codes[10, 11].Further, there is necessarily a decoding accuracy losswhen measurement circuit is not simply repeated, e.g.,with code deformation or lattice surgery[12].A feasible approach to building a decoder optimized forthe specific measurement circuit is to train a neural net-work (NN) using extensive simulation data[13–20]. How- ever, as is commonly the case with NNs, there is alwaysa question whether training has been sufficient to achieveoptimal decoding performance.A parameterization of the probability distribution ofcorrelated quantum errors in terms of a spin model hasbeen recently considered by Chubb and Flammia[5]. Inparticular, they describe how such a formalism can beused for maximum-likelihood (ML) decoding in the pres-ence of circuit-level noise. However, Chubb and Flam-mia focused on larger circuits composed of measurementblocks. Errors are assumed uncorrelated between theblocks, while a model of error correlations at the outputof each block has to be constructed offline, e.g., usingerror propagation for a Clifford measurement circuit. Inparticular, Chubb and Flammia stopped short of analyz-ing circuit-level errors, and only considered numericallya toy model of correlated errors.The goal of this work is to give an explicit numeri-cally efficient algorithm for analyzing error correlationsresulting from a given qubit-based Clifford measurementcircuit, and for constructing decoders optimized for sucha circuit. Main result is that such correlations can beaccounted for by using phenomenological error model(no error correlations) with the circuit-associated subsys-tem code constructed by Bacon, Flammia, Harrow, andShi[21, 22]. Thus, any generic decoder capable of deal-ing with uncorrelated data and syndrome measurementerrors in highly-degenerate sparse-generator subsystemcodes can be rendered to account for circuit-level errorcorrelations. Error correlations needed for implementingthe scheme of Chubb and Flammia are recovered by cal-culating the marginals of the constructed distribution,which can be done in practice using Ising model star-polygon transformations[23]. The construction naturallyincludes additional correlations between errors in differ-ent fault locations on the circuit.An immediate application is for designing decodersapproaching true ML decoding for Clifford circuits, tobe used in quantum memory. In particular, more ac-curate decoding and optimization, with the ability toaccount for error rates’ variations on individual qubits,could help improve QEC to the level sufficient to passthe break-even point and show coherent lifetimes longerthan that of an unprotected qubit in present-day or near-future devices. Related techniques could also be more a r X i v : . [ qu a n t - ph ] J u l widely applicable for design and analysis of FT protocolsand control schemes. Examples include error correctionwith FT gadgets like flag error correction[24–26], schemesfor protected evolution, e.g., using code deformationswhere conventional approaches may result in a reduceddistance[12], and optimized single-shot error-correctionprotocols[27–29].In addition, analysis of marginal error distributionsand associated families of asymmetric codes of varyingdegeneracy could become an important tool in the theoryof QECCs. In particular, it could help constructing bet-ter decoders for highly-degenerate quantum LDPC codes.Also, thorough analytical understanding of error correla-tions could be useful for fundamental analysis of thresh-olds, e.g., in order to extend and/or improve the boundsin Refs. 30 and 31.Paper outline: After this Introduction, an overviewof relevant notations and results in quantum codes andmulti-variable Bernoulli distributions is given in Sec. II.Pauli error channels with correlated errors, includingcircuit-level correlations, are discussed in Sec. III. Cor-responding marginal distributions are constructed inSec. IV, followed by a discussion of possible implicationsfor exact and approximate ML decoding with circuit-levelerrors in Sec. V. Sec. VI gives the concluding remarks. II. BACKGROUNDA. Quantum codes
Generally, an n -qubit quantum code Q is a subspaceof the n -qubit Hilbert space H ⊗ n . A quantum [[ n, k, d ]]stabilizer code is a 2 k -dimensional subspace specified as acommon +1 eigenspace of all operators in an Abelian sta-bilizer group S ⊂ P n , − (cid:54)∈ S , where the n -qubit Pauligroup P n is generated by tensor products of single-qubitPauli operators. The stabilizer is typically specified interms of its generators, S = (cid:104) S , . . . , S n − k (cid:105) . The weightof a Pauli operator is the number of qubits that it af-fects. The distance d of a quantum code is the minimumweight of a Pauli operator E ∈ P n which commutes withall operators from the stabilizer S , but is not a part ofthe stabilizer, E (cid:54)∈ S . Such operators act non-trivially inthe code and are called logical operators.Subsystem codes[32, 33] are a generalization of sta-bilizer codes where only some of the logical qubits areused. More precisely, given a stabilizer group S , the sta-bilized subspace Q S is further decomposed into a tensorproduct of two subspaces, Q S = Q L ⊗ Q G , where Q L is a 2 k -dimensional “logical” subspace used to store thequantum information, while we do not care about thestate of the “gauge” qubits in the subspace Q G after therecovery. Logical operators of the original code whichact non-trivially only in Q G , together with the operatorsfrom the stabilizer group S , generate the non-Abeliangauge group G which fully characterizes the subsystemcode. In particular, the center Z ( G ) is formed by the elements of the original stabilizer S , up to a phase i m , m ∈ { , , , } .A Pauli error E that anticommutes with an elementof the stabilizer, ES = − SE , S ∈ S , is called de-tectable. Such an error results in a non-zero syndrome s = { s , . . . , s r } whose bits s i ∈ { , } are obtained bymeasuring the eigenvalues ( − s i of the chosen set of sta-bilizer generators S i , i = { , . . . , r } , r ≥ n − k . Unlikein the case of classical codes, there may be many equiv-alent ( mutually degenerate ) errors resulting in the samesyndrome. For a subsystem code, errors E (cid:48) degeneratewith E , denoted E (cid:48) (cid:39) E , have the form E (cid:48) = EG , where G ∈ G is an element of the gauge group; such errorscan not and need not be distinguished. Non-trivial logi-cal operators of the subsystem code are logical operatorsof the original stabilizer code which act non-trivially in Q L . A bare logical operator U acts trivially in Q G andcommutes with any element of G . Such restrictions donot apply to dressed logical operators which are only re-quired to commute with elements of the stabilizer S . Inany case, multiplication by a logical operator U gives anerror with the same syndrome but from a different equiv-alence class, EU (cid:54)(cid:39) E .Analysis of error correction is conveniently done usingquaternary, or an equivalent binary, representation of thePauli group[34, 35]. A Pauli operator U ≡ i m X u Z v ,where u , v ∈ { , } ⊗ n and X u = X u X u . . . X u n n , Z v = Z v Z v . . . Z v n n , is mapped, up to a phase, to a length-2 n binary vector e = ( u | v ). Two Pauli operators U , U commute iff the symplectic inner product e (cid:63) e T ≡ e Σ e T , Σ ≡ (cid:18) I n I n (cid:19) , (1)of the corresponding binary vectors is zero, e (cid:63) e T = 0.Here I n is the n × n identity matrix. Thus, if H =( H Z | H X ) is an m × n binary matrix whose rows rep-resent stabilizer generators, and (cid:101) H ≡ H Σ = ( H X | H Z ),then the syndrome s of an error U e with binary represen-tation e = ( u | v ) is given by s T = H e T . If we similarlydenote G = ( G X | G Z ) an r × n matrix formed by thegauge group generators (for a stabilizer code, G = (cid:101) H ),the errors with binary representation e and e (cid:48) = e + α G are equivalent, e (cid:48) (cid:39) e , for any length- r binary string α ∈ F ⊗ m . Generally, GH T = 0, andrank H = n − k − κ, rank G = n − k + κ, (2)where κ is the number of gauge qubits. Matrices G and H can be viewed as generator matrices of an auxiliarylength-2 n CSS code which encodes 2 k qubits. These cor-respond to a basis set of 2 k independent logical oper-ators of the original subsystem code. It will be conve-nient to introduce a logical generating matrix L such that LH T = 0, rank L = 2 k . Rows of L map to basis logicaloperators; a non-zero linear combination of the rows of L must be linearly independent from the rows of G .Gauge or stabilizer generators can be measured witha Clifford circuit which consists of ancillary qubit initial-ization and measurement in the preferred (e.g., Z ) ba-sis, and a set of unitary Clifford gates, e.g., single-qubitHadamard H and phase P gates and two-qubit CNOT.Generally, a Clifford unitary U maps Pauli operators toPauli operators, E (cid:48) = U † EU , E, E (cid:48) ∈ P n . Ignoring theoverall phase (for complete description, see Refs. 36 and37), this corresponds to a linear map of the correspond-ing binary vectors, ( e (cid:48) ) T = C e T , where C ≡ C U is asymplectic matrix with the property C T Σ C = Σ. B. Bernoulli distribution
Multi-variate Bernoulli distribution describes a jointprobability distribution of m single-bit variables x i ∈ F ,e.g., components of a binary vector x ∈ F m . Most gener-ally, such a distribution can be specified in terms of 2 m probabilities p x ≥
0, with normalization (cid:80) x ∈ F m p x = 1.A convenient representation of such a distribution as anexponent of a polynomial of m binary variables with realcoefficients was given by Dai et al. in Ref. 38. Namely, fora single variable x ∈ { , } , we can write P ( x ) = p − x p x as a product of two terms, where p + p = 1 are the out-come probabilities. For m variables we have, similarly,the product of 2 m terms, P ( x ) = p (1 − x )(1 − x ) ... (1 − x m )00 ... × p (1 − x )(1 − x ) ... (1 − x m − ) x m ... . . . p x x ...x m ... , (3)where the probabilities are assumed positive, p x >
0. Forany given x ∈ F m only one exponent is non-zero so thatthe result is p x . Taking the logarithm and expanding,one obtains the corresponding “energy” E ≡ − ln P ( x )as a polynomial of m binary variables. The correspond-ing coefficients can be viewed as binary cumulants[38];presence of high-degree terms indicates a complex prob-ability distribution with highly non-trivial correlations.For applications it is more convenient to work withspin variables s i = ( − x i ≡ − x i ∈ ±
1, and rewritethe energy function using the general Ising representationfirst introduced by Wegner[39], E ≡ E ( { s j } ) = − (cid:88) b K b R b + const , R b = (cid:89) i s θ ib i , (4)parameterized by the binary spin-bond incidence matrix θ with m rows, and the bond coefficients K b . While mostgeneral m -variate Bernoulli distribution requires 2 m − θ , e.g., withbounded row and column weights, which also limits thenumber of columns. In the simplest case of independentidentically-distributed (i.i.d.) bits with equal set proba-bilities p = p , p = 1 − p , we can take θ = I m , the iden-tity matrix, and all coefficients equal, K = ln[(1 − p ) /p ] / − p ) /p ] is commonly called a log-likelihood ratio (LLR); the coefficient K here and K b inEq. (4) are thus called half-LLR coefficients. III. ERROR CORRELATIONS IN A CLIFFORDMEASUREMENT CIRCUITA. Pauli error channel with correlations.
Consider the most general Pauli error channel ρ → (cid:88) e ∈ F n P ( e ) E e ρE † e , (5)where P ( e ) is the probability of an error E e with binaryrepresentation e , with the irrelevant phase disregarded.Technically, P ( e ) describes a 2 n -variate Bernoulli distri-bution. The probability P ( e ) can be parameterized interms of a 2 n × m binary coupling matrix θ with m < n columns, and a set of coefficients K b , b ∈ { , . . . , m } , P ( e ; θ, { K b } ) = Z − exp (cid:32)(cid:88) b K b ( − [ e θ ] b (cid:33) , (6)where [ e θ ] b in the exponent is the corresponding compo-nent of the row-vector e θ . The normalization constant Z ≡ Z ( θ, { K b } ) in Eq. (6) is the partition function of theIsing model in Wegner’s form, cf. Eq. (4), Z ( θ, { K b } ) = (cid:88) { s i ∈± } (cid:89) b e K b R b . (7)In the simplest case of independent X and Z errors, θ = I n , the identity matrix, while e K b = (1 − p X ) /p X for b ≤ n , and the corresponding expression with p X → p Z for n < b ≤ n . In the case of the depolarizing channelwith error probability p , we have, instead, θ = (cid:18) I n I n I n I n (cid:19) , e K = 3(1 − p ) /p, (8)where the additional column block is to account for corre-lations between X and Z errors. Additional correlationsbetween the errors can be introduced by adding columnsto matrix θ and the corresponding coefficients K b .Given a probability distribution in the form (6), it iseasy to construct an expression for probability of an errorequivalent to e in a subsystem code with gauge generatormatrix G , extending the approach of Refs. 6, 40, and41, and reproducing some of the results from Ref. 5. Asubstitution e → e + α G and a summation over α gives P ( e (cid:48) ∈ F n | e (cid:48) (cid:39) e ) = 2 rank G − r Z (cid:0) Gθ, { K b ( − [ e θ ] b } (cid:1) Z ( θ, { K b } ) . (9)Here G is an r × n binary matrix, cf. Eq. (2), and theprefactor accounts for a possible redundancy in the sum-mation. Notice that the partition function in the nu-merator has the same number of bonds m as that in thedenominator, but with the signs of the coefficients K b corresponding to non-zero elements of the binary vector (cid:15) ≡ e θ flipped. When both the gauge generator matrix G and the error correlation matrix θ are sparse, the inci-dence matrix Θ ≡ Gθ in the numerator of Eq. (9) mustalso be sparse. B. Clifford measurement circuit and associatedinput/output codes.
Let us now consider error correlations resulting froma Clifford measurement circuit. Specifically, followingRefs. 21 and 22, consider an n -qubit circuit, n = n + n a ,with n data qubits and n a ancillary qubits. First, an-cillary qubits are initialized to | (cid:105) , second, a collectionof Clifford gates forms a unitary U , and finally the an-cillary qubits are measured in the Z basis, see Fig. 1.In the absence of errors and in the event of all measure-ments returning +1 (zero syndrome), the correspondingpost-selected evolution is described by the matrix V = ( I ⊗ n ⊗ (cid:104) | ⊗ n a ) U ( I ⊗ n ⊗ | (cid:105) ⊗ n a ) , (10)where I is a single-qubit identity operator. The circuit isassumed to be a good error-detecting circuit (good EDC),namely, V † V be proportional to the projector onto a sub-space Q ⊆ H n , V † V = c Π , c > Q , called the input code ,is an [[ n , k, d ]] stabilizer code encoding k qubits withdistance d . | ψ i / n U / n | ψ ′ i| i / n a / n a Z FIG. 1. Generic circuit with n data and n a ancillary qubitsinitialized to | (cid:105) and measured in Z basis. In practice, an-cillary qubit can be measured during evolution, and subse-quently reused after initialization. However, there is no mech-anism for adapting the gates to the measurement results. A good EDC also defines an output code Q (cid:48) ⊆ H ⊗ n which encodes the same number of qubits k . Indeed,since V † V = c Π , matrix V has only one non-zero singu-lar value, √ c ; this immediately gives V V † = c Π (cid:48) , withthe projector onto a space Q (cid:48) ⊆ H ⊗ n of the same di-mension 2 k , the output code. Moreover, for any inputstate | ψ (cid:105) ∈ Q , the output | ψ (cid:105) (cid:48) ≡ V | ψ (cid:105) ∈ Q (cid:48) is in theoutput code, and the corresponding transformation is a(scaled) Clifford unitary.Even though the map between Q and Q (cid:48) is unitary,the distance d (cid:48) of the output code does not necessarilyequals d . In particular, adding a unitary decoding cir-cuit on output data qubits may be used to render d (cid:48) = 1. C. Errors in a Clifford circuit.
Using standard circuit identities, any circuit error E can be propagated forward to the output of the circuit,thus giving an equivalent data error E (cid:48) ( E ) ∈ P n andthe (gauge) syndrome σ (cid:48) ( E ) ∈ F n a corresponding to themeasurement results. Clearly, there is a big redundancy even if phases are ignored, as many circuit errors canresult in the same or equivalent E (cid:48) ( E ) and σ (cid:48) ( E ). Thegoal is to find the conditional probability distribution forthe equivalence class of the output error E (cid:48) ( E ) given themeasured value of σ (cid:48) ( E ).In a given Clifford circuit, consider N possible error lo-cations , portions of horizontal wires starting and endingon a gate or an input/output end of the wire. For exam-ple, the circuit in Fig. 2 has N = 15 error locations. APauli error may occur on any of these locations. A circuiterror E is a set of N single-qubit Pauli operators with-out the phase, P i ∈ { I, X, Y, Z } , i ∈ { , . . . , N } . Whentwo circuit errors are applied sequentially, the result isa circuit error whose elements are pointwise products ofPauli operators with the phases dropped. The algebradefined by such a product is isomorphic to the N -qubitPauli group without the phase. This is an Abelian groupwhich also admits a representation in terms of length-2 N binary vectors e ≡ ( u | v ) ∈ F N . Multiplication of twocircuit errors amounts to addition of the correspondingbinary vectors e . • I • • I • | i Z | i Z FIG. 2. Circuit measuring generators Z Z and Z Z of athree-qubit toy code. Digits indicate distinct circuit locations.Identity operators I are inserted so that the number of circuitlocations along each qubit wire be odd. With these definitions, error propagation through aClifford circuit can be described as products of the orig-inal circuit error E with trivial circuit errors which haveno effect on the outcome. The collection of all such errorsforms error equivalence group (EEG) of the circuit. Thegenerators of this group involved in error propagation aretrivial errors, each formed as a union of a single-qubitPauli P i , i ∈ { , . . . , N } (but not in the output for dataqubits, or right before the measurements for ancillaryqubits) with the result of error propagation of P i acrossthe subsequent gate. For example, for the circuit inFig. 2, some of the EEG generators, with identity opera-tors dropped, are { Z , Z } , { X , X } , { X , X , X } , and { Z , Z , Z } (propagation across the leftmost CNOT ),and { X , X } , { Z , Z } (propagation across the leftmostidentity gate labeled I ).In addition, for a circuit which includes qubits’ ini-tialization and measurement, the list of EEG generatorsincludes errors Z j on ancillary qubits right after initial-ization to | (cid:105) and right before the Z -basis measurement.For the circuit in Fig. 2, these are { Z } , { Z } (ancillaryqubits right after initialization), and { Z } , { Z } (an-cillary qubits just before measurement). It is importantthat stabilizer group of the input code, after its elementsare promoted as circuit errors, forms a subgroup of thusconstructed full circuit EEG[22]. Same applies to thestabilizer group of the output code.In the binary representation, a single-qubit Hadamardgate connecting circuit positions i and i (cid:48) corre-sponds to a pair of generators with non-zero elements( u i , v i | u i (cid:48) , v i (cid:48) ) = (10 |
01) and (01 |
10) ( X i propagates into Z i (cid:48) and v.v.), a phase gate similarly corresponds to(10 |
10) and (01 |
11) ( Z i propagates into Z i (cid:48) and X i into Y i (cid:48) ), and a CNOT gate (10 00 |
10 00) (input Z i on the con-trol and an output Z i (cid:48) on the same wire), (01 00 |
01 01)(input X i on the control and X i (cid:48) , X j (cid:48) on the out-puts), (00 01 |
00 01) (target X pass-through), and, finally,(00 10 |
10 10). The generators for the single-qubit trivialerrors are even simpler, since a Z j maps to the weight-onevector ( u j , v j ) = (10).For a given circuit error E with binary form e ∈ F N ,any equivalent error can be obtained by adding a linearcombination of thus constructed circuit EEG generators g j . It is convenient to combine the corresponding gen-erators into a generator matrix G with 2 N columns. Asdiscussed in the Appendix A, for a good EDC with therank of the input-code stabilizer group r = n − k andthe constant c = 1 / κ , see Eq. (10) and below, the gen-erator matrix has the rankrank G = 2 N − n − f, f ≡ n a − κ − r ≥ . (12)Generally, a non-zero value f > e ∈ F N is proportional to the Ising partition function, P ( e (cid:48) ∈ F N | e (cid:48) (cid:39) e ) = Const Z ( Gθ, { K b ( − [ e θ ] b } ) , (13)cf. Eq. (9). It is also easy to find the conditional probabil-ity distribution of output errors with a given syndrome.Namely, given the binary form of an output data error e (cid:48) ∈ F ⊗ n and a syndrome vector σ (cid:48) ∈ F ⊗ n a , we needto form the corresponding vector e ≡ e ( e (cid:48) , σ (cid:48) ) ∈ F N ,filling only the components corresponding to the outputdata qubits and ancillary X j just before the measure-ments which give non-zero syndrome bits in σ (cid:48) .For ML decoding, we compare thus computed prob-abilities for all inequivalent errors consistent with themeasured syndrome σ (cid:48) . In particular, these include er-rors that differ by a logical operator of the input (or,equivalently, the output) code, since non-trivial logicaloperators are outside the circuit EEG. In other words,length-2 N binary vectors corresponding to logical opera-tors are linearly independent from the rows of the circuitEEG generator matrix G . It is convenient to define abinary logical generator matrix L of dimension 2 k × N , whose rows correspond to mutually inequivalent logicaloperators of the input code.For the ease of decoding, it is also convenient to intro-duce the parity-check matrix H , also with 2 N columns,whose rows are orthogonal to the rows of both matrices G and L , and whose rank satisfiesrank H = 2 N − rank G − rank L = n a − κ + r . (14)Clearly, H is dual to a matrix combing the rows of G and L . As in the case of a subsystem code, see Eq. (2), thesematrices can be seen as forming a half of a CSS code,with stabilizer generator matrices G X = G , G Z = H ,and X -logical operator generator matrix L X = L .The orthogonality requirement for rows of H can alsobe interpreted in terms of the N -qubit Pauli group as-sociated with the circuit. Namely, the Pauli operatorscorresponding to the rows of the symplectic dual matrix (cid:101) H = H Σ, see Eq. (1) and below, must commute withgenerators of the circuit EEG. This guarantees that eachof these operators be a spackle , i.e., a circuit error wherethe single-qubit Pauli operators in any time layer can beobtained by error propagation from those in the previostime layer, see Ref. 22. Respectively, row weights of H scale linearly with the circuit depth. D. Circuit subsystem code
The discussion in Ref. 22 focused on the special caseof good EDCs where each qubit line is required to havean odd number of locations. In this special case, thecircuit EEG can be seen as the gauge group of a quan-tum subsystem code of length N which encodes the samenumber of qubits k as the input/output codes, and hasa distance not exceeding the corresponding distances, d ≤ min( d , d (cid:48) ). Respectively, rows of the matrix (cid:101) H which correspond to generators of the stabilizer group ofthe subsystem code are necessarily given by linear com-binations of the rows of G . In addition, circuit errorscorresponding to the rows of the matrix L can be seen asdressed logical operators, while the bare logical operatorswhich commute with the elements of the circuit EEG canbe constructed as spackles.In practice, any circuit can be easily modified to satisfythis additional requirement by inserting a null (identity)gate into each qubit line with an even number of loca-tions, see Fig. 2 for an example. While it is not strictlynecessary to work with circuits that satisfy this require-ment, it is convenient, as the additional structure of thesubsystem code can be used to verify the validity of theconstructed matrices.However, once the matrices G , H , and L are con-structed, there is no need to refer to the subsystem code.In fact, the generator matrix row-reduction transforma-tion described in the following Section IV preserves theorthogonality and the relation Eq. (14) between the ranksinherent in the CSS code map, but not the structure ofthe circuit subsystem code. E. Circuit code distance.
How good can a measurement protocol be? What arethe bounds on the distance d of the subsystem code as-sociated with the circuit?Generally, if d and d (cid:48) are the distances of the inputand the output codes, the distance of the correspondingcircuit code satisfies d ≤ min( d , d (cid:48) ). This follows fromthe fact that a logical operator of the input code, e.g.,is naturally mapped to a (dressed) logical operator ofthe circuit code. An important result in Ref. [22] is thatone can always design a fault-tolerant circuit so that thedistance d of the corresponding subsystem code be asgood as that of the input code, d = d .Unfortunately, circuit-code distance d does not have adirect relation to the probability distribution of the out-put errors; even single-qubit output errors may remainundetected. This is a well known “feature” of quan-tum error-correcting codes operating in a fault-tolerantregime, even for codes with single-shot properties[27–29].Indeed, regardless of the circuit structure, errors on thedata qubits in the locations just before the circuit outputwill not be detected.In comparison, with a formally defined circuit code,such an error can be propagated back to the input layerand (when it is detectable, e.g., if its weight is smallerthan the distance d (cid:48) of the output code) it would nec-essarily anticommute with one or more combination(s) Z g of the ancillary qubits. The original error would thusbe detectable in the circuit code. Causality does notpermit such an operation with actual circuit evolution.Formally, this functionality is removed due to assumedancillary qubit initialization to | (cid:105) .Of course, even if a small-weight error goes undetected,it may get corrected after one or more additional mea-surement rounds. In practice, when an error-correctingcode is analyzed in a fault-tolerant setting, the standardnumerical procedure is to add a layer of perfect stabilizermeasurements (no measurement errors). This guaranteesthat all small-weight errors at the end of the simulationbe detected, and thus recovers the distance d > IV. MARGINAL DISTRIBUTIONS FORCORRELATED ERRORS
The circuit EEG fully describes correlations betweenthe circuit errors. However, it also contains a lot of ex-cessive information: for the purpose of error correction,we are only interested in the distribution of the outputerrors and the syndrome, which are all supported at therightmost locations of the circuit. In addition, the largesize and sparsity of the circuit generator matrix G makesdecoding difficult, except with the simplest circuits.Present goal is to reduce the number of independentvariables, by constructing the marginal distribution for agiven subset of the variables. This amounts to a summa- tion over the variables one is not interested in, e.g., P ( e s +1 , . . . , e m ) = (cid:88) e (cid:88) e . . . (cid:88) e s P ( e , e , . . . , e m ) . (15)In the case of binary variables e i ∈ { , } , both the orig-inal and the resulting marginal distributions are multi-variate Bernoulli distributions, and each can be describedin terms of the Ising energy function (4). A. Row-reduction transformation
1. Generator matrix and the coupling coefficients
Given an n -variate Bernoulli distribution described bythe coupling matrix Θ, e.g., Θ = Gθ in Eq. (9), and a setof half-LLR coefficients K b , 1 ≤ b ≤ m , what are the cor-responding parameters of the marginal distribution (15)?In the equivalent Ising-model representation (4), the goalis to describe the couplings between the remaining spinsafter a partial summation. Such a star-polygon transfor-mation for a general Ising model has been constructed inRef. 23. The transformation includes two special caseslong known in the Ising model literature: the Onsager’sstar-triangle transformation[42] and the (inverse) deco-ration transformation[43, 44].It is convenient to derive the result directly, focusingon the marginal distribution after a summation over justone spin variable s i ∈ ± i th row ofΘ. Without limiting generality, assume that the chosenrow has w non-zero elements in positions 1, 2, . . . , w ,decompose the corresponding bond variables R b = s i T b , T b ∈ ±
1, 1 ≤ b ≤ w , and perform the summation explic-itly (with the additional one-half factor for convenience), B τ ≡ (cid:88) s i = ± exp (cid:16) s i w (cid:88) b =1 K b T b (cid:17) = cosh (cid:16) w (cid:88) b =1 K b T b (cid:17) , (16)where τ ∈ F w is a composite index with elements τ b such that T b = ( − τ b . To exponentiate this expression,rewrite B τ by analogy with Eq. (3), B τ = B τ
12 1+ τ ... τw ... B τ
12 1+ τ ... τw −
12 1 − τw ... × . . . B − τ
12 1 − τ ... − τw ... , (17)where the coefficients B ... in the base of the exponentsare the hyperbolic cosines (16) of the sum of coefficients ± K b with the signs fixed, and matching exactly the signsin the exponents. As in Eq. (3), after a substitutionof any given τ ∈ F w , only one term with the correctindex τ remains in the product. The modified bondsand the corresponding coefficients K (cid:48) b are obtained byexpanding the polynomial in the exponent of Eq. (17).Because of the symmetry of the hyperbolic cosine, onlyeven-weight products of the original bonds result fromthis expansion. Thus, in general, for an original row ofweight w , the corresponding w columns are combined toproduce w (cid:48) = 2 w − − w = 2 w − − w − w = 1, the transfor-mation amounts to simply dropping the row and the cor-responding column of Θ. The values of K b remain thesame, except for the one value that is dropped.For a row of weight w = 2, only the sum of the corre-sponding columns is retained in Θ, with the coefficient K (cid:48) , = 12 ln cosh( K + K )cosh( K − K ) ≡
12 ln B B , cf. Eq. (16). Equivalently, tanh K (cid:48) , = tanh K tanh K .For a row of weigth w = 3, the three columns of theoriginal matrix Θ are replaced by their pairwise sums,with the coefficient K (cid:48) , = 14 ln B B B B for the combination of the first two columns. The re-maining coefficients K (cid:48) , and K (cid:48) , can be obtained withcyclic permutations of the indices.For a row of weight w = 4, the four columns of theoriginal matrix Θ are replaced with six pairwise sums andthe seventh column combining all four original columns,with the coefficients, e.g., K (cid:48) , = 18 ln B B B B B B B B ,K (cid:48) , , , = 18 ln B B B B B B B B . In general, the coefficient K (cid:48) J in front of the product of T b with indices b in an (even) subset J ⊆ { , , . . . , w } isgiven by the sum of logarithms of the hyperbolic cosines B τ with τ = 0 (this accounts for symmetry of hyperboliccosine), with the coefficients ± / w − , where the sign isdetermined by the parity of the weight of the subset τ [ J ]restricted to J . It is easy to check that the numbersof positive and negative coefficients always match. Re-spectively, the coefficients for high-weight products aretypically small in magnitude.
2. Transformation for a parity check matrix
The row-reduction transformation can also be writ-ten in terms of the parity-check matrix H, also with m columns, and dual to Θ, such thatHΘ T = 0 and rank H = m − rank Θ . (18)To this end, consider the row-reduction of Θ as a combi-nation of the following elementary column steps:(i) The 1st column of Θ is added to columns 2, 3, . . . , w of Θ; as a result the i th row of Θ has a non-zeroelement only in the first position.(ii) The 1st column of thus modified Θ is dropped,which leaves the i th row zero—it may be droppedas well; (iii) If w >
2, 2 w − − w combinations of two, three, . . . , w − b ≤ w − (cid:48) ) Columns b ∈ { , , . . . , w } of H are added to its 1stcolumn, which becomes identically zero as a result.(ii (cid:48) ) Drop all-zero 1st column from thus modified H.(iii (cid:48) ) For each column, e.g., b (cid:48) , added to Θ as a linearcombination of two existing columns b and b , Hacquires a new row with the support on { b , b , b (cid:48) } to express this linear relation.It is easy to check that row orthogonality, HΘ T = 0, ispreserved. Also, the rank of Θ is reduced by one, whilethe increase of the rank of H matches the number ofcolumns added in step (iii (cid:48) ), so that the exact duality(18) is preserved. B. Marginal distribution for error equivalenceclasses
This analysis is easily carried over to the problem ofsyndrome-based ML decoding for an [[ n, k ]] subsystemcode under a Pauli channel characterized by a 2 n × m matrix θ and a set of m half-LLR coefficients { K b } , seeSec. III A. Given a gauge generator matrix G , the prob-ability of an error equivalent to e is given by Eq. (9).Generally, for ML decoding we need to choose the largestof the 2 k partition functions Z ( Gθ, { K b ( − [ e (cid:48) θ ] b ) , e (cid:48) = e + α L, α ∈ F k . (19)Typically, this needs to be done for a large number of er-ror vectors e . Can the calculation be simplified en masseby doing partial summation over the spins correspondingto all rows of Gθ as described in the previous section?The structure of the logical operators can be accountedfor by extending the rows of the generator matrix whichnow has two row blocks,Θ = (cid:32) GθLθ (cid:33) , ˜ K b = K b ( − [ e θ ] b , (20)and the half-LLR coefficients ˜ K b , b ∈ { , , . . . , m } . Amatching parity check matrix H is dual to Θ, see Eq. (18).
1. Independent X and Z errors Let us first consider the simpler case of independent X and Z errors, where matrix θ = I n . In this case H = H = (cid:101) H Σ, where (cid:101) H is a generator matrix of the code’sstabilizer group, see Eq. (2). Marginal distribution beingindependent of the choice of the generator matrix, userow transformations and column permutations to renderΘ = (cid:32) GL (cid:33) = (cid:32) I r A BI k C (cid:33) , (21) H = (cid:16) B T + C T A T C T I (cid:96) (cid:17) , (22)where matrices B and C have (cid:96) ≡ n − r − k columns,and r = rank( G ). The matrices Θ and H are mutuallydual as can be immediately verified.Row-reduction operations applied to each row in theupper block of Θ correspond to:(i (cid:48)(cid:48) ) Column operations to set both blocks A and B tozero, and conjugate column operations on H to setits left-most column block to zero.(ii (cid:48)(cid:48) ) Drop the upper row-block of the obtained Θ, aswell as the left-most column blocks of the resultingΘ and H .(iii (cid:48)(cid:48) ) Add an extra column block M + CM to the re-sulting Θ, where columns of M and M specifythe linear combinations of the columns in its tworemaining blocks, and a matching row-block to H .As a result, the transformed matrices acquire the formΘ (cid:48) = (cid:16) I k C M + CM (cid:17) , (23) H (cid:48) = (cid:32) C T I (cid:96) M T M T I (cid:33) , (24)where double vertical lines are used to separate the newlyadded columns.When the described transformation is applied to anyof the original partition functions (19), the result is justan exponential exp (cid:16)(cid:80) b K (fin) b (cid:17) of the sum of the finalhalf-LLR coefficients. Can identical columns of the finalgenerator matrix (23) be similarly combined to simplifythe structure of the final marginal distribution for errorequivalence classes? The answer is yes, as long as weaccount for the effect of the error e on the values of thecoefficients K (fin) b .In fact, it is easy to check that the row-reduction trans-formations in Sec. IV A are such that the additional signsin Eq. (19) only affect the signs of the coefficients K (fin) b .Moreover, these signs correspond to the bits of the trans-formed error vector, cf. Eq. (23), e (fin) = (cid:16) ε ε ε M + ε M (cid:17) , (25)where the vector ε selects the equivalence class, and ε = s + ε C , with s ≡ e H T the original syndrome.Clearly, the right-most blocks in Eqs. (23) and (25) areobtained from ε ≡ ( ε | ε ) with 2 k + (cid:96) = 2 n − r com-ponents as a right product with the combined matrix M = (cid:0) M M (cid:1) . All (cid:96) ≡ n − r components of ε being inde-pendent, there are m (cid:48) max = 2 (cid:96) − ε to e (fin) , we can ensure thatthe final matrices contain no more than m (cid:48) max columns.With Eq. (2), r = n − k + κ , so that (cid:96) = n + k − κ , where κ is the number of gauge qubits in the subsystem code.For a stabilizer code, κ = 0, thus (cid:96) = n + k . Clearly, thelatter is just the number of logical generators, 2 k , plusthe number of independent syndrome bits, n − k .
2. A more general Pauli channel
Instead of considering most general situation, consideran important case of a Pauli channel where any single-qubit error has a non-zero probability. Then, the inci-dence matrix can be chosen to have an identity-matrixblock, θ = ( I n | T ), where T is an ( m − n ) × n binarymatrix. As a result, the matrix Θ and the half-LLR co-efficients in Eq. (20) both acquire additional blocks oflinearly-dependent components, while the parity-checkmatrix dual to Θ can be chosen in the formH = (cid:32) H T T I m − n (cid:33) . (27)Since the relevant error in Eq. (20) is e θ = ( e | e T ), thelower row-block of H gives a zero syndrome, just like thelower row-block in Eq. (24). Basically, after degeneracyis taken into account, the number of independent com-ponents of e θ is 2 n − r , the same as for e ; final matricesΘ (cid:48) and H (cid:48) can always be constructed to have m (cid:48) ≤ m (cid:48) max components given by Eq. (26). Of course, the actual re-sulting matrices, as well as the final half-LLR coefficientsdo depend on the assumed error model.How much freedom is there to choose the matrices Θ (cid:48) and H (cid:48) ? For the purpose of ML decoding, we need togo over the entire linear space C Θ (cid:48) generated by the rowsof Θ (cid:48) ; the choice of basis is irrelevant. The same is trueregarding the parity check matrix H (cid:48) . These are the samesymmetries as for a generator and a parity-check matricesof a binary code.In essence, the original quantum subsystem code withgauge G and logical L generator matrices has been trans-formed into a classical binary code, with the transfor-mation dependent in a non-trivial fashion on the errorprobability distribution. This binary code has length m (cid:48) ≤ m (cid:48) max , and it encodes 2 k bits.Further, any non-trivial linear combination of rows of Lθ with rows of Gθ has weight lower-bounded by the dis-tance of the quantum code, which gives the lower bound d (cid:48) ≥ d on the distance of this binary code. In general,given the structure of the row-reduction transformation,the distance may be quite a bit larger, possibly scalinglinearly with m (cid:48) . Notice, however, that with highly non-uniform error probability distribution, more relevant pa-rameter is not the distance, but the corresponding quan-tity weighted with half-LLR coefficients K (fin) b , relatedto the probability of the most-likely logical error. Byconstruction, this quantity is exactly the same as for theoriginal quantum code.Let us consider an important case of a sparse origi-nal parity check matrix H, e.g., with row and columnweights bounded. This requires a low-density parity-check (LDPC) code with a sparse stabilizer generatormatrix (cid:101) H = H Σ, and an error channel with the gen-erator matrix θ = ( I n | T ) also sparse. When actingon the parity check matrix, each row-reduction trans-formation drops a column of H, and may also add oneor more rows of weight 3, see Sec. IV A 2. Thus, theparity-check matrix of the classical binary code describ-ing the marginal probability distribution of error equiv-alence classes must also be sparse, with row weights notexceeding w (cid:48) ≤ max( w, w is the maximum rowweight of H in Eq. (27). C. Marginal distribution for output errors in agood measurement circuit
This discussion also applies to the code associated withthe error equivalence group of a good EDC. In this casethe matrices G and H have 2 N columns each, where N isthe number of circuit locations, and their ranks are givenby Eqs. (12) and (14). Just as any circuit error can bepushed all the way to the right, row-reduction can also bedone starting with the bits at the beginning of the circuitand pushing toward its output. This way, a circuit errorequivalence class can be characterized by (cid:96) = 2 N − rank G = 2 n + f = ( n + k ) + ( n a − κ ) (28)bits, where n + k is the number of linearly-independenterror equivalence classes in an [[ n , k ]] stabilizer code and n a − κ is the number of syndrome bits measured in thecircuit. Alternatively, as in a subsystem code with κ gauge qubits, (cid:96) = n + k − κ is the exponent in Eq. (26),and n a is the number of additional error positions rightbefore the measurements. As in the previous section, thisgives an upper bound on the maximum number M (cid:48) ofcolumns in the matrices Θ (cid:48) and H (cid:48) that may be necessary, M (cid:48) ≤ M (cid:48) max = 2 (cid:96) − . (29)After row-reduction for all generators of the circuit EEG,we get a classical [ M (cid:48) , k, d (cid:48) ] binary code, where k is thenumber of encoded qubits, and d (cid:48) ≥ d .Is this an LDPC code? This question has not beenanswered in the previous section: the parity check matrix H of the circuit code is not necessarily sparse as its rowweights scale linearly with circuit depth. To analyze thesparsity of the output-error parity-check matrix H (cid:48) , writeit in a block form similar to Eq. (27),H (cid:48) = (cid:32) H (cid:48) H (cid:48)(cid:48) H (cid:48)(cid:48) (cid:33) , (30)where the upper row-block contains only the originalcolumns of the circuit EEG parity check matrix H re-maining after row-reduction steps (i (cid:48) ) and (ii (cid:48) ), while any row with exactly three non-zero entries added in a step(iii (cid:48) ) goes to the lower row-block. Further, if we assumecircuit EEG H in the form (27), with the lower row-block of bounded weight (e.g., w = 3 for depolarizingerrors), any potentially large-weight row in H (cid:48) must cor-respond to (i) a stabilizer generator of the output code,or (ii) a generator of the group H (cid:48) , see Ref. 22. All ofthese have bounded weights for any bounded-weight sta-bilizer LDPC code with a measurement circuit where sta-bilizer generators are measured independently and with abounded number of ancillary qubits. On the other hand,these conditions do not generally hold for any family ofconcatenated codes based on a given code, or for sub-system codes where a stabilizer generator may have anunbounded weight or cannot be expressed as a product ofgauge generators with a bounded number of terms. Nev-ertheless, even in these cases, given a potentially verylarge number of columns (29), it is reasonable to expectthe final H (cid:48) to be a sparse matrix, with only a small frac-tion of non-zero elements. V. DECODING STRATEGIESA. Decoding based on circuit EEG
The present approach is to analyze error propagationin a Clifford measurement circuit in terms of its circuitEEG characterized by a logical generator matrix L andeither a generator matrix G or a parity check matrix H .With a minor constraint on the circuit, these correspondto the circuit subsystem code constructed in Refs. 21 and22. More generally, these matrices form (a half of) a CSScode, with generator matrices G X = G and G Z = H ,while rows of L define X -type logical operators. Simplyput, any decoder capable of dealing with a CSS code withuncorrelated X -type errors can now be used to accountfor error correlations in a given measurement circuit.Additional correlations can also be accounted for. As-suming a Pauli error channel (with or without correla-tions between different circuit locations) characterized bya coupling matrix θ and a set of half-LLR coefficients K b (one per column), probability of a circuit error equivalentto a given one is given by the Ising partition function (13),with the spin-bond coupling matrix Gθ .As already discussed, for ML decoding we need to findthe largest of the Ising partition functions (19). Sucha calculation can be expensive. Indeed, given a codeencoding k qubits, we need to compute and choose thelargest of all 2 k partition functions corresponding to allnon-trivial sectors; this can only be done in reasonabletime for a code with k small. Previously, a computa-tionally efficient ML decoder using this approach and 2Dtensor network contraction for computing the partitionfunctions has been constructed for surface codes[45] inthe channel model (perfect syndrome measurement).Feasible approaches for evaluating the partition func-tions (19) include tensor network contraction (see, e.g.,0in Ref. 46, for a 3D network contraction with complexityscaling as ∝ nχ , where χ is the bond dimension) andMonte-Carlo (MC) methods constructed specifically forefficient calculations of free energy differences, e.g., thenon-equilibrium dynamics method [47] or the classicalBennett acceptance ratio[48]. In application to surfacecodes in FT regime, MC calculations are in essence sim-ulations of bond-disordered 3D Ising model; such calcula-tions can be done using GPU[49], FPGA[50], or TPU[51]hardware acceleration.Notice also that the circuit code is extremely degener-ate: with Hadamard, Phase, and CNOT gates the rowsof the generator matrix G have (quaternary) weights notexceeding three. On the other hand, the row weightsof the parity-check matrix H scale linearly with the cir-cuit depth. By these reasons, iterative decoders like be-lief propagation (BP) are expected to fare even worsethan with the usual (not so degenerate) quantum LDPCcodes[52–54]. B. Generator-based decoding via marginaldistributions
The calculation of the Ising partition functions neededfor ML decoding can be simplified via partial summationover a subset of the spins. Technically, this correspondsto using a marginal distribution for the subset of variablesneeded, as discussed in Sec. IV.Denote V the set of rows of the original generator ma-trix G [also, rows in the upper block of Θ in Eq. (20)].Then, row-reduction in an increasing sequence of subsets I ≡ ∅ ⊂ I ⊂ I ⊂ . . . ⊂ I s ≡ V (31)defines a sequence of mutually-dual pairs of matrices { Θ ( j ) , H ( j ) } with m j columns, whereΘ ( j ) = (cid:32) G ( j ) L ( j ) (cid:33) , j ∈ { , . . . , s } . (32)The ranks of logical generator matrices remain the same,rank L ( j ) = 2 k , while the sequence r j ≡ rank G ( j ) isdecreasing with increasing j , ending at r s = 0. Inessence, this defines a sequence of asymmetric (half) CSScodes [[ m j , k ]], with generator matrices G ( j ) X = G ( j ) and G ( j ) Z = H ( j ) , where rows of L ( j ) define X -type logicaloperators. The sequence ends with a CSS code with anempty X -generator matrix, i.e., a classical binary codewith the parity check matrix H ( s ) .For each of the codes in the sequence (32), there isalso a set of half-LLR coefficients { K ( j ) b , ≤ b ≤ m j } .Given an error vector e ∈ F m j matching the syndrome,ML decoding can be done by choosing the largest of the2 k Ising partition functions [cf. Eq. (19)] Z (cid:16) G ( j ) , { K ( j ) b ( − e (cid:48) b } (cid:17) , e (cid:48) = e + α L ( j ) , α ∈ F k . (33) By construction, the result should be the same for ev-ery j ≤ s . However, the complexities for computing thepartition functions may differ dramatically.One possibility for exact ML decoding is thus to choosea subset I ⊂ V to optimize the partition function calcula-tion. In particular, one option is to choose I such thatall rows of weights one, two, and three are eliminatedfrom the corresponding matrix G (1) . This minimizes thenumber of columns in the exact generator matrix of themarginal-distribution. Even though all rows in the origi-nal circuit EEG generator matrix G have row weights notexceeding three, row-reduction on rows of weight threetends to create higher-weight rows. Thus, in general, r >
0: the resulting partition function is not expectedto be trivial.To quote some numbers, Tables I and II give the dimen-sions of generator matrices and row-reduced generatormatrices for two code families: the toy codes [[ n , , d X = n /d Z = 1]] with stabilizer generators Z i Z i +1 mod n ,0 ≤ i < n , and the rotated toric codes (Ex. 11 in Ref. 55)[[ n , , t + 1]] with n = t + ( t + 1) and stabilizer gen-erators Z i X i + t mod n X i + t +1 mod n Z i +2 t +1 mod n , where0 ≤ i < n and t = 1 , , . . . , with up to three measure-ments cycles. The simplest examples of the measurementcircuits used for these two code families are shown inFigs. 3 and 4, respectively. In particular, while the origi-nal generator matrix for the circuit EEG of the [[13 , , n cyc = 3 times has di-mensions 1066 × × • • input • • output • • | i | i | i FIG. 3. Single-cycle circuit measuring the overcomplete set { Z Z , Z Z , Z Z } of stabilizer generators for the toy codewith n = 3 data and n a = 3 ancillary qubits. Second possibility for exact ML decoding is to performa summation over all spins, as described in Sec. IV C.This gives a probability distribution directly for the er-ror equivalence classes; the logarithm of probability ofan error equivalent to e is given, up to an additive con-stant, by a weighted sum of the half-LLR coefficients − (cid:80) b e b K (fin) b . Unfortunately, such an exact expressionis expected to have an exponentially long list of coeffi-cients, see Eqs. (28) and (29) for an upper bound. Thecorresponding column numbers are large even for the rel-atively simple circuits in Tables I and II. In fact, the1 circuit parameters generator matrix dimensions remaining columns m (cid:15) n n a n cyc d orig w = 2 w = 3 final w fin (cid:15) = 10 − (cid:15) = 10 − (cid:96) ×
36 3 ×
10 0 ×
10 0 ×
10 0 10 10 75 5 1 5 50 ×
60 5 ×
16 0 ×
16 0 ×
16 0 16 16 117 7 1 7 70 ×
84 7 ×
22 0 ×
22 0 ×
22 0 22 22 153 6 2 3 72 ×
78 9 ×
19 3 ×
19 2 ×
43 21 38 21 105 10 2 5 120 ×
130 15 ×
31 5 ×
31 3 ×
79 21 69 34 167 14 2 7 168 ×
182 21 ×
43 7 ×
43 4 ×
115 21 99 48 223 9 3 3 102 ×
108 15 ×
28 6 ×
27 4 ×
75 35 69 31 135 15 3 5 170 ×
180 25 ×
46 10 ×
45 7 ×
116 20 104 50 217 21 3 7 238 ×
252 35 ×
64 14 ×
63 10 ×
157 20 140 69 29TABLE I. Parameters of the original and row-reduced generator matrices for repetition code circuits as in Fig. 3 but with n cyc measurement cycles, n data and n a = n cyc n ancillary qubits. Also shown are dimensions of row-reduced generator matriceswith rows of weights w = 2 and w = 3 (and smaller) eliminated; w fin is the minimum row-weight of the final generator matrixwith the smallest number of rows remaining. Remaining columns m (cid:15) is the number of columns after columns with | K b | < (cid:15) aredropped from the final generator matrix, assuming p = 0 .
05 corresponding to a half-LLR value K ≈ . (cid:96) in the upper bound (29) on the number of columns.circuit parameters generator matrix dimensions remaining columns m (cid:15) n n a n cyc d orig w = 2 w = 3 w = 4 final w fin (cid:15) = 10 − (cid:15) = 10 − (cid:15) = 10 − (cid:96) ×
140 25 ×
35 20 ×
35 16 ×
43 13 ×
75 10 65 47 23 115 10 2 3 280 ×
290 55 ×
65 45 ×
65 37 ×
81 32 ×
139 10 112 82 46 165 15 3 3 410 ×
420 85 ×
95 70 ×
95 58 ×
119 51 ×
203 10 159 118 70 2113 13 1 5 338 ×
364 65 ×
91 52 ×
91 40 ×
115 32 ×
163 9 163 120 61 2713 26 2 5 728 ×
754 143 ×
169 117 ×
169 93 ×
217 83 ×
291 9 273 213 118 4013 39 3 5 1066 × ×
247 182 ×
247 146 ×
319 134 ×
419 9 384 305 182 5325 25 1 7 650 ×
700 125 ×
175 100 ×
175 76 ×
223 58 ×
331 9 331 234 116 5125 50 2 7 1400 × ×
325 225 ×
325 177 ×
421 157 ×
555 9 528 413 223 76TABLE II. Same as in Tab. I but for rotated toric codes [[ t + ( t + 1) , , t + 1]] with t = 1 , ,
3, represented as single-generatorcyclic codes, see Example 11 in Ref. 55. | q (cid:105) • H • • H •| q (cid:105) • H • • H •| q (cid:105) • H • • H •| q (cid:105) • H • • H •| q (cid:105) • H • • H •| q (cid:105) | (cid:105) ⊕ ⊕ ⊕ ⊕| q (cid:105) | (cid:105) ⊕ ⊕ ⊕ ⊕| q (cid:105) | (cid:105) ⊕ ⊕ ⊕ ⊕| q (cid:105) | (cid:105) ⊕ ⊕ ⊕ ⊕| q (cid:105) | (cid:105) ⊕ ⊕ ⊕ ⊕ FIG. 4. Cingle-cycle measurement circuit for the five-qubitcode with n a = 5 ancillary qubits. Mathematica program (which was not written for effi-ciency) failed to complete full row-reduction except forthe simplest repetition codes with n cyc = 1 round of mea- surements.In practice, the list of coefficients K (fin) b often con-tains a large number of entries with small magnitudes.This suggests a range of approximations, where only suf-ficiently large coefficients are preserved, e.g., | K (fin) b | > (cid:15) ,for a given (cid:15) >
0. For incompletely reduced matrices inTables I and II, with (cid:15) = 0 .
1, the number of columns is re-duced, roughly, by a factor of two. On general grounds,the reduction factor is expected to be much larger forfully-reduced generator matrices.Alternatively, only a fixed number χ of the largestin magnitude coefficients may be preserved. This latterapproach is similar in spirit to approximate tensor net-work contraction using singular value decomposition anda fixed maximum bond dimension. Notice that if only thecoefficients corresponding to columns of the matrix H (cid:48) inEq. (30) are preserved, we get an approximation similarto the conventional phenomenological noise model.The “history code” decoding algorithm suggested byChubb and Flammia[5], can be seen as a special case ofgenerator-based decoding. Here the measurement circuitis assumed to have a block structure, and summation is2done over all circuit locations with the exception of thoseat the output of each block. With the circuits in TablesI and II, a block corresponds to a single measurementcycle. The full probability distribution is then recov-ered using Bayes rules, assuming no correlations betweenmeasurements in different blocks. Evidently, even in thiscase, the complexity of the probability distribution ac-counting for full error correlations could be prohibitivefor exact ML decoding. C. Parity-based decoding via marginaldistributions
Summations over the spins in the subsets I j with in-creasing j from a sequence like (31) give marginal distri-butions which account exactly for increasing numbers ofalternative spin configurations. Respectively, the degen-eracy of the corresponding row-reduced half CSS codesdecreases with increasing j , down to a classical code withno degeneracy at the end of the sequence, j = s . A gen-eral expectation is that the accuracy of minimum-energy(ME) decoding would be increasing with increasing j .ME decoding becomes strictly equivalent to ML decod-ing for the end-of-the-sequence classical code.Formally, given a parity-check matrix H and a set ofLLR weights K b , ME decoding aims to find an error vec-tor e which gives the correst syndrome e H T = s andmaximizes the error likelihood (cid:80) b ( − e b K b or, equiva-lently, minimizes the error energy E = (cid:80) b e b K b . Com-pared to generator-based decoding, an obvious advantageis that there is no need to go over all 2 k logical oper-ators. Unfortunately, for generic codes, even the rela-tively simple problem of ME decoding has an exponentialcomplexity[56]. Given that the intermediate codes in thesequence (31) tend to be long, this makes it unpracticalto use generic ME decoding algorithms with exponentialcomplexity, e.g., the information subset[57, 58] or theclassical Viterbi[59] algorithm.Notice, however, that the sparsity of parity-check ma-trices H ( j ) increase with j . The final matrix H ( s ) is ex-pected to be sparse whenever the output code is an LDPCcode. Ideally, this classical code could be decoded with alinear complexity using a variant of belief propagation(BP) algorithm[60–63]. Assuming a sufficiently smallfraction of failed convergence cases, the result would beequivalent to ML decoding of the correlated errors.Unfortunately, this does not look so promising giventhe fact that this final code is expected to have an expo-nential length, see Eq. (29). Additionally, as confirmedwith limited numerical simulations[64], having a largenumber of small in magnitude LLR coefficients tends toreduce the convergence rate. Using approximate decod-ing schemes with reduced number of LLR coefficients asdiscussed in Sec. V B is expected to help with both is-sues. Notice, however, that for such a reduction certaincolumns in the generator matrix Θ are merely dropped(puncturing). The corresponding transformation for the parity-check matrix H is shortening [65], which may re-duce the sparsity of the resulting matrix and, in turn,negatively affect the convergence of BP decoding. VI. DISCUSSION
Improving the accuracy of syndrome-based decoding inthe presence of circuit-level error correlations would bothincrease the threshold to scalable quantum computationand improve the finite-size performance of quantum er-ror correction. Present results make two steps in thisdirection. First, the observation that ML decoding un-der these conditions amounts to decoding the code[21, 22]associated with the circuit EEG, in the absence of cor-relations. Second, the structure of this latter code canbe significantly simplified using row-reduction transfor-mations, while leaving the probability of ML decodingunchanged. A variety of approximate decoding schemesnaturally follow, which interpolate between the exact MLdecoding and the decoding within a relatively simple phe-nomenological error model, with an additional handle onthe degeneracy of the actual code to be used for decoding.Designing practical decoding algorithms, e.g., in appli-cation to surface codes with well-developed near-optimalsyndrome measurement circuits, would require a substan-tial additional effort. However, this does not look like anunsolvable problem.Decoding quantum LDPC codes is a major problem inthe theory of QECCs, especially in a fault-tolerant regimewith syndrome measurement errors present. While asubstantial progress has been made in recent years[66–70], this problem remains open in application to generichighly-degenerate codes. Transformations discussed inSec. IV change the degeneracy of a quantum code, andcan even map from a quantum to a classical code. Thisopens new avenues to explore in decoding, in particular,new ways to apply existing iterative decoding algorithmsto highly degenerate codes.Circuit optimization: Traditional approach to quan-tum error correction is to start with a code, come upwith an FT measurement circuit, compile it to a set ofgates available on a specific quantum computer, and thenfinally design a decoder. Instead, one could start with thelist of permitted two-qubit gates on a particular deviceand enumerate all good error-detecting circuits, increas-ing circuit depth and the number of gates. Given the cir-cuit, it is easy to find the parameters of the input/outputcodes, as well as construct the associated circuit code.While at the end we would still need to evaluate andcompare the performance of thus constructed codes, sucha procedure could offer a shortcut to circuit optimizationfor specific hardware.
Acknowledgment:
This work was supported in partby the NSF Division of Physics via grant No. 1820939.3
Appendix A: Ranks of the matrices1. Rank of the circuit-EEG generator matrix
Consider a good error-detecting circuit in Fig. 1 withthe constant c = 1 / κ in Eq. (11) and the input (or out-put) code encoding k = n − r qubits, where r is therank of the input-code stabilizer group. Here κ is aninteger[22]. Also assume that f measurements are re-dundant, so that the total number of ancillary qubits is n a = κ + r + f . Generators of the circuit EEG can beused to propagate any circuit error all the way to theoutput layer on the right, which requires 2 N − n + n a )independent generators, where N is the number of circuitlocations. In addition, there are n a ancillary Z j gener-ators on the input and n a on the output layers. How-ever, not all of them are independent. Indeed, whenthe n a ancillary Z j operators are propagated to theright, we get r independent operators, each containinga product of ancillary Z j and an element of the stabi-lizer group, κ independent operators containing ancillary X j , and f additional combinations that are redundantexcept for a product of ancillary Z j . Overall, this givesrank G = 2 N − n + n a ) + n a + κ + r , which is the same as in Eq. (12).
2. Rank of the circuit stabilizer group
The circuit stabilizer group must commute with anyEEG generator and also with any logical operator ofthe output code (say). Necessarily, its elements mustbe spackles[22]. Any spackle is uniquely determined byits support on the input layer, thus, there are total of2( n + n a ) independent spackles. We also have to en-sure commutativity with ancillary Z j generators on theleft (drop n a spackles) and on the right (drop κ spack-les). Additional r spackles have to be removed to en-sure commutativity with the elements of the output codestabilizer generators, and 2 k spackles to ensure commu-tativity with the output code logical operators. Overall,this leavesrank H = 2( n + n a ) − n a − κ − r − n − r )= n a − κ + r , which is exactly the result in Eq. (14). [1] P. W. Shor, “Scheme for reducing decoherence in quan-tum computer memory,” Phys. Rev. A , R2493 (1995).[2] C. G. Almudever, L. Lao, X. Fu, N. Khammassi,I. Ashraf, D. Iorga, S. Varsamopoulos, C. Eichler,A. Wallraff, L. Geck, A. Kruth, J. Knoch, H. Bluhm,and K. Bertels, “The engineering challenges in quantumcomputing,” in Design, Automation Test in Europe Con-ference Exhibition (DATE), 2017 (2017) pp. 836–845.[3] P. Aliferis, D. Gottesman, and J. Preskill, “Quan-tum accuracy threshold for concatenated distance-3codes,” Quantum Inf. Comput. , 97–165 (2006), quant-ph/0504218.[4] David S. Wang, Austin G. Fowler, and Lloyd C. L. Hol-lenberg, “Surface code quantum computing with errorrates over 1%,” Phys. Rev. A , 020302 (2011).[5] Christopher T. Chubb and Steven T. Flammia, “Statisti-cal mechanical models for quantum codes with correlatednoise,” (2018), unpublished, 1809.10704.[6] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill,“Topological quantum memory,” J. Math. Phys. , 4452(2002).[7] Austin G. Fowler, Adam C. Whiteside, and Lloyd C. L.Hollenberg, “Towards practical classical processing forthe surface code,” Phys. Rev. Lett. , 180501 (2012).[8] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N.Cleland, “Surface codes: Towards practical large-scalequantum computation,” Phys. Rev. A , 032324 (2012).[9] Austin G. Fowler, Adam C. Whiteside, Angus L.McInnes, and Alimohammad Rabbani, “Topologicalcode autotune,” Phys. Rev. X , 041003 (2012).[10] Christopher Chamberland, Guanyu Zhu, Theodore J. Yo-der, Jared B. Hertzberg, and Andrew W. Cross, “Topo-logical and subsystem codes on low-degree graphs with flag qubits,” Phys. Rev. X , 011022 (2020).[11] Christopher Chamberland, Aleksander Kubica,Theodore J Yoder, and Guanyu Zhu, “Triangularcolor codes on trivalent graphs with flag qubits,” NewJournal of Physics , 023019 (2020).[12] Christophe Vuillot, Lingling Lao, Ben Criger, Car-men Garc´ıa Almud´ever, Koen Bertels, and Barbara M.Terhal, “Code deformation and lattice surgery are gaugefixing,” New Journal of Physics , 033028 (2019).[13] Giacomo Torlai and Roger G. Melko, “Neural decoder fortopological codes,” Phys. Rev. Lett. , 030501 (2017).[14] S. Krastanov and L. Jiang, “Deep neural network prob-abilistic decoder for stabilizer codes,” Scientific Reports , 11003 (2017), 1705.09334.[15] N. P. Breuckmann and X. Ni, “Scalable neural networkdecoders for higher dimensional quantum codes,” Quan-tum , 68 (2018), 1710.09489.[16] Zhih-Ahn Jia, Yuan-Hang Zhang, Yu-Chun Wu, LiangKong, Guang-Can Guo, and Guo-Ping Guo, “Efficientmachine-learning representations of a surface code withboundaries, defects, domain walls, and twists,” Phys.Rev. A , 012307 (2019).[17] Paul Baireuther, Thomas E. O’Brien, Brian Tarasin-ski, and Carlo W. J. Beenakker, “Machine-learning-assisted correction of correlated qubit errors in a topo-logical code,” Quantum , 48 (2018).[18] Christopher Chamberland and Pooya Ronagh, “Deepneural decoders for near term fault-tolerant experi-ments,” Quantum Science and Technology , 044002(2018).[19] P. Baireuther, M. D. Caio, B. Criger, C. W. J. Beenakker,and T. E. O’Brien, “Neural network decoder for topolog-ical color codes with circuit level noise,” New Journal of Physics , 013003 (2019).[20] Nishad Maskara, Aleksander Kubica, and TomasJochym-O’Connor, “Advantages of versatile neural-network decoding for topological codes,” Phys. Rev. A , 052351 (2019).[21] D. Bacon, S. T. Flammia, A. W. Harrow, and J. Shi,“Sparse quantum codes from quantum circuits,” in Pro-ceedings of the Forty-Seventh Annual ACM on Sympo-sium on Theory of Computing , STOC ’15 (ACM, NewYork, NY, USA, 2015) pp. 327–334, 1411.3334.[22] D. Bacon, S. T. Flammia, A. W. Harrow, and J. Shi,“Sparse quantum codes from quantum circuits,” IEEETransactions on Information Theory , 2464–2479(2017).[23] Jozef Streˇcka, “Generalized algebraic transformationsand exactly solvable classical-quantum models,” PhysicsLetters A , 3718 – 3722 (2010).[24] Christopher Chamberland and Michael E. Beverland,“Flag fault-tolerant error correction with arbitrary dis-tance codes,” Quantum , 53 (2018), 1708.02246.[25] C. Chamberland and A. W. Cross, “Fault-tolerant magicstate preparation with flag qubits,” Quantum , 143(2019), 1811.00566.[26] Rui Chao and Ben W. Reichardt, “Quantum error correc-tion with only two extra qubits,” Phys. Rev. Lett. ,050502 (2018).[27] H´ector Bomb´ın, “Single-shot fault-tolerant quantum er-ror correction,” Phys. Rev. X , 031043 (2015).[28] Benjamin J. Brown, Naomi H. Nickerson, and Dan E.Browne, “Fault-tolerant error correction with the gaugecolor code,” Nature Communications , 12302 (2016).[29] Earl T. Campbell, “A theory of single-shot error correc-tion for adversarial noise,” Quantum Science and Tech-nology , 025006 (2019), 1805.09271.[30] I. Dumer, A. A. Kovalev, and L. P. Pryadko, “Thresh-olds for correcting errors, erasures, and faulty syndromemeasurements in degenerate quantum codes,” Phys. Rev.Lett. , 050502 (2015), 1412.6172.[31] A. A. Kovalev, S. Prabhakar, I. Dumer, and L. P.Pryadko, “Numerical and analytical bounds on thresholderror rates for hypergraph-product codes,” Phys. Rev. A , 062320 (2018), 1804.01950.[32] David Poulin, “Stabilizer formalism for operator quan-tum error correction,” Phys. Rev. Lett. , 230504(2005).[33] Dave Bacon, “Operator quantum error-correcting subsys-tems for self-correcting quantum memories,” Phys. Rev.A , 012340 (2006).[34] Daniel Gottesman, Stabilizer Codes and Quantum ErrorCorrection , Ph.D. thesis, Caltech (1997).[35] A. R. Calderbank, E. M. Rains, P. M. Shor, andN. J. A. Sloane, “Quantum error correction via codesover GF(4),” IEEE Trans. Info. Theory , 1369–1387(1998).[36] Jeroen Dehaene and Bart De Moor, “Clifford group, sta-bilizer states, and linear and quadratic operations overGF(2),” Phys. Rev. A , 042318 (2003).[37] Scott Aaronson and Daniel Gottesman, “Improved sim-ulation of stabilizer circuits,” Phys. Rev. A , 052328(2004).[38] Bin Dai, Shilin Ding, and Grace Wahba, “MultivariateBernoulli distribution,” Bernoulli , 1465–1483 (2013).[39] F. Wegner, “Duality in generalized Ising models andphase transitions without local order parameters,” J. Math. Phys. , 12 (1971).[40] A. J. Landahl, J. T. Anderson, and P. R. Rice, “Fault-tolerant quantum computing with color codes,” (2011),presented at QIP 2012, December 12 to December 16,arXiv:1108.5738.[41] A. A. Kovalev and L. P. Pryadko, “Spin glass re-flection of the decoding transition for quantum error-correcting codes,” Quantum Inf. & Comp. , 0825(2015), arXiv:1311.7688.[42] Lars Onsager, “Crystal statistics. I. a two-dimensionalmodel with an order-disorder transition,” Phys. Rev. ,117–149 (1944).[43] Shigeo Naya, “On the spontaneous magnetizations ofhoneycomb and Kagom´e Ising lattices,” Progress of The-oretical Physics , 53–62 (1954).[44] Michael E. Fisher, “Transformations of Ising models,”Phys. Rev. , 969–981 (1959).[45] Sergey Bravyi, Martin Suchara, and Alexander Vargo,“Efficient algorithms for maximum likelihood decodingin the surface code,” Phys. Rev. A , 032326 (2014).[46] Markus Hauru, Clement Delcamp, and Sebastian Miz-era, “Renormalization of tensor networks using graph-independent local truncations,” Phys. Rev. B , 045111(2018).[47] M. de Koning, Wei Cai, A. Antonelli, and S. Yip,“Efficient free-energy calculations by the simulation ofnonequilibrium processes,” Computing in Science Engi-neering , 88–96 (2000).[48] Charles H. Bennett, “Efficient estimation of free energydifferences from Monte Carlo data,” Journal of Compu-tational Physics , 245268 (1976).[49] Tobias Preis, Peter Virnau, Wolfgang Paul, and Jo-hannes J. Schneider, “ { GPU } accelerated monte carlosimulation of the 2d and 3d ising model,” Journal of Com-putational Physics , 4468 – 4477 (2009).[50] A. Gilman, A. Leist, and K. A. Hawick, “3D lat-tice Monte Carlo simulations on FPGAs,” in Proceed-ings of the International Conference on Computer De-sign (CDES) (The Steering Committee of The WorldCongress in Computer Science, Computer Engineeringand Applied Computing (WorldComp), 2013).[51] Kun Yang, Yi-Fan Chen, Georgios Roumpos, ChrisColby, and John Anderson, “High performance MonteCarlo simulation of Ising model on TPU clusters,”(2019), unpublished, 1903.11714.[52] D. Poulin and Y. Chung, “On the iterative decoding ofsparse quantum codes,” Quant. Info. and Comp. , 987(2008), arXiv:0801.1241.[53] Ye-Hua Liu and David Poulin, “Neural belief-propagation decoders for quantum error-correctingcodes,” Phys. Rev. Lett. , 200501 (2019), 1811.07835.[54] Alex Rigby, J. C. Olivier, and Peter Jarvis, “Modi-fied belief propagation decoders for quantum low-densityparity-check codes,” Phys. Rev. A , 012330 (2019),1903.07404.[55] A. A. Kovalev, I. Dumer, and L. P. Pryadko, “Designof additive quantum codes via the code-word-stabilizedframework,” Phys. Rev. A , 062319 (2011).[56] Pavithran Iyer and David Poulin, “Hardness of decodingquantum stabilizer codes,” IEEE Transactions on Infor-mation Theory , 5209–5223 (2015), arXiv:1310.3235.[57] E. A. Kruk, “Decoding complexity bound for linear blockcodes,” Probl. Peredachi Inf. , 103–107 (1989), (InRussian). [58] J. T. Coffey and R. M. Goodman, “The complexity ofinformation set decoding,” IEEE Trans. Info. Theory ,1031 –1037 (1990).[59] Andrew J. Viterbi, “Error bounds for convolutional codesand an asymptotically optimum decoding algorithm,”IEEE Transactions on Information Theory , 260–269(1967).[60] R. G. Gallager, Low-Density Parity-Check Codes (M.I.T.Press, Cambridge, Mass., 1963).[61] M. P. C. Fossorier, “Iterative reliability-based decodingof low-density parity check codes,” IEEE Journal on Se-lected Areas in Communications , 908–917 (2001).[62] Thomas J. Richardson and R¨udiger L. Urbanke, “The ca-pacity of low-density parity-check codes under message-passing decoding,” Information Theory, IEEE Transac-tions on , 599–618 (2001).[63] David Declerq, Marc Fossorier, and Ezio Biglieri, eds., Channel Coding. Theory, Algorithms, and Applications (Academic Press Library in Mobile and Wireless Com-munications, San Francisco, 2014).[64] Weilei Zeng and Leonid P. Pryadko, “Iterative decodingof row-reduced quantum LDPC codes,” (2020), unpub-lished. [65] F. J. MacWilliams and N. J. A. Sloane,
The Theoryof Error-Correcting Codes (North-Holland, Amsterdam,1981).[66] Omar Fawzi, Antoine Grospellier, and Anthony Lever-rier, “Efficient decoding of random errors for quantumexpander codes,” (2017), unpublished, 1711.08351.[67] Omar Fawzi, Antoine Grospellier, and Anthony Lever-rier, “Constant overhead quantum fault-tolerance withquantum expander codes,” in59th IEEE Annual Sympo-sium on Foundations of Computer Science, FOCS 2018,Paris, France, October 7-9, 2018