Resource optimization for fault-tolerant quantum computing

Abstract

In this thesis we examine a variety of techniques for reducing the resources required for fault-tolerant quantum computation. First, we show how to simplify universal encoded computation by using only transversal gates and standard error correction procedures, circumventing existing no-go theorems. We then show how to simplify ancilla preparation, reducing the cost of error correction by more than a factor of four. Using this optimized ancilla preparation, we develop improved techniques for proving rigorous lower bounds on the noise threshold. Additional overhead can be incurred because quantum algorithms must be translated into sequences of gates that are actually available in the quantum computer. In particular, arbitrary single-qubit rotations must be decomposed into a discrete set of fault-tolerant gates. We find that by using a special class of non-deterministic circuits, the cost of decomposition can be reduced by as much as a factor of four over state-of-the-art techniques, which typically use deterministic circuits. Finally, we examine global optimization of fault-tolerant quantum circuits under physical connectivity constraints. We adapt techniques from VLSI in order to minimize time and space usage for computations in the surface code, and we develop a software prototype to demonstrate the potential savings.


Resource optimization for fault-tolerant quantum computing, by Adam Paetznick

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science. Waterloo, Ontario, Canada, 2013.

Copyright notice. Chapter 5 contains material from [PR13], which is copyrighted by the American Physical Society. Chapters 6 and 7 contain material from [PR12], which is copyrighted by Rinton Press. Remaining material is © Adam Paetznick 2013.

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

Quantum computing offers the potential for efficiently solving otherwise classically difficult problems, with applications in material and drug design, cryptography, theoretical physics, number theory and more. However, quantum systems are notoriously fragile; interaction with the surrounding environment and lack of precise control constitute noise, which makes construction of a reliable quantum computer extremely challenging. Threshold theorems show that by adding enough redundancy, reliable and arbitrarily long quantum computation is possible so long as the amount of noise is relatively low, below a “threshold” value. The amount of redundancy required is reasonable in the asymptotic sense, but in absolute terms the resource overhead of existing protocols is enormous when compared to current experimental capabilities.

In this thesis we examine a variety of techniques for reducing the resources required for fault-tolerant quantum computation. First, we show how to simplify universal encoded computation by using only transversal gates and standard error correction procedures, circumventing existing no-go theorems. The cost of certain error correction procedures is dominated by preparation of special ancillary states. We show how to simplify ancilla preparation, reducing the cost of error correction by more than a factor of four. Using this optimized ancilla preparation, we then develop improved techniques for proving rigorous lower bounds on the noise threshold. The techniques are specifically intended for analysis of relatively large codes such as the 23-qubit Golay code, for which we compute a lower bound on the threshold error rate of 0.132 percent per gate for depolarizing noise. This bound is the best known for any scheme.

Additional overhead can be incurred because quantum algorithms must be translated into sequences of gates that are actually available in the quantum computer. In particular, arbitrary single-qubit rotations must be decomposed into a discrete set of fault-tolerant gates. We find that by using a special class of non-deterministic circuits, the cost of decomposition can be reduced by as much as a factor of four over state-of-the-art techniques, which typically use deterministic circuits.

Finally, we examine global optimization of fault-tolerant quantum circuits. Physical connectivity constraints require that qubits are moved close together before they can interact, but such movement can cause data to lie idle, wasting time and space. We adapt techniques from VLSI in order to minimize time and space usage for computations in the surface code, and we develop a software prototype to demonstrate the potential savings.

Acknowledgements

I must begin by thanking my supervisor, Ben Reichardt, for his support over the past four years. Ben is responsible for teaching me much of what I know about fault-tolerant quantum computation. In addition he has served as a tremendous guide in terms of academic writing and speaking, and navigation of the academic world in general. Much of my writing and speaking style is due to Ben’s advice.

I would also like to thank Richard Cleve for supporting me throughout, but especially for support in the past two years during which Ben has been at USC. Special thanks also to Michele Mosca for bringing me into the quantum circuits and Torque group. I am grateful to other members of the Torque team for their enthusiasm and support including especially Martin Roetteler and Rich Lazarus. The surface code was largely a mystery until it was marvelously explained to me by Austin Fowler. His tenacity for finding practical solutions to important problems has inspired me to try to do the same.

Some of my most valuable discussions and collaborations occurred during internships away from Waterloo. I would like to thank all of those in the quantum computing group at HRL, and Jim Harrington and Bryan Fong in particular, for their hospitality and support. Thanks also to the QuArC group at Microsoft Research including: Krysta Svore, Alex Bocharov, Dave Wecker and Nathan Wiebe.

Much of my financial support has come from the Mike and Ophelia Lazaridis fellowship, for which I am very grateful.

Of course, I must also acknowledge the support of my peers, at Waterloo and elsewhere, for their friendship and for helpful suggestions and conversations. This includes: Vadym Kliuchnikov, Cody Jones, Peter Brooks, Robin Kothari, Alessandro Cosentino, Matt Amy, Vinayak Pathak, Lucy Zhang, David Gosset, Rajat Mittal, Ansis Rosmanis, Stacy Jeffery, Moritz Ernst, Tomas Jochym-O’Connor, Jaimie Sikora, Sevag Gharibian, Sarvagya Upadhyay, Laura Mancinska, Abel Molina and Shelby Kimmel.

To my many other friends including Troy Borneman, Chad Daley, Mike Wesolowski, Kurt Schreiter, Mike Zhang, Daniel Park, Chris Wood, Holger Haas, Shane Farnsworth, Andrew Achkar and Halle Revell, thank you for making the experience in Waterloo an enjoyable one for me and my wife Marion.

Finally, my personal and academic successes are due largely to the influence, love and support of my parents, Duane and Phyllis, and my brother Brandon and his wife Heather. Thank you for your unwavering encouragement, especially during the first few years in Waterloo, which were difficult for both Marion and myself. I am similarly grateful to Marion’s parents, Bill and Sue, and my sister-in-law Gwen and her husband Rich.

Dedication

To my loving wife Marion,

I would not have even considered this pursuit had it not been for your enthusiastic encouragement and support. Conferences and internships have kept us apart for long stretches, and while I have been traveling all over the world, you have been working the extra jobs to keep us afloat. When we are together, you fill me with life and laughter. This thesis is as much a product of your time, effort and love as it is of mine. I love you.

Table of Contents

List of Tables
List of Figures
1 Motivation and results
…
Protecting quantum information
…
… V
8.7 Decomposition with the circuit database
8.7.1 Decomposition with axial rotations
8.7.2 Decomposition with non-axial rotations
8.8 Quantum algorithms using coarse angles
8.9 Possible generalizations and limitations
…
10 Concluding thoughts
Appendices
A Proof of Claim 7.5.1
References

List of Tables

… |…⟩
6.2 Correlated X error counts for Golay encoded |…⟩
6.3 Distribution of errors for Golay encoded |…⟩
6.4 Random ancilla preparation schedules for Golay encoded |…⟩
6.5 Golay code permutations for ancilla verification
6.6 Acceptance probabilities for Golay code ancilla verification
7.1 Location counts for preparing |…⟩ in the Golay code
7.2 Depolarizing noise threshold lower bounds for the Golay code
8.1 Decomposition methods for arbitrary single-qubit unitaries
8.2 Decomposition methods for Z-axis rotations
8.3 Expected T counts for approximation of random Z-axis rotations with RUS circuits
8.4 Size and density of the Z-axis rotation database according to the maximum expected number of T gates
9.1 Surface code gate set

List of Figures

… T gate
4.10 Distillation by encoded gate teleportation
4.11 Naive syndrome measurement
4.12 Steane-style error correction
4.13 Knill-style error correction
4.14 Shor-style syndrome measurement
5.1 Toffoli and CCZ equivalence
5.2 Transversal H plus error correction
5.3 Toffoli state distillation using CCZ
5.4 CCZ implementation from Toffoli gate teleportation
5.5 CCZ gate teleportation
5.6 An error detecting Toffoli gate
5.7 T count for Toffoli distillation protocols
5.8 Gate teleportation of H in the [[15, …
… |…⟩ for the [[7, …
… |…⟩
6.6 A 57 CNOT circuit for encoded |…⟩ in the Golay code
6.7 First order verification circuits
6.8 Twelve ancilla verification circuit for the Golay code
6.9 Four ancilla verification circuit for the Golay code
6.10 Overhead estimates for Golay code error correction
7.1 A circuit component
7.2 EC and exRec components
7.3 Upper block of the CNOT exRec
7.4 CNOT exRec components
7.5 Malignant event probabilities for Golay code CNOT
7.6 Acceptance probabilities and Pr[bad]
7.7 Gate overhead for Golay and Fibonacci schemes
7.8 Qubit overhead for Golay and Fibonacci schemes
8.1 Repeat-until-success circuits for V
8.2 General form of an RUS circuit
8.3 General form of circuits in the RUS circuit database
8.4 Clifford simplifications to Figure 8.3
8.5 Statistics for the database of RUS circuits
8.6 Axial and non-axial RUS circuit costs compared to KMM
8.7 Cheap RUS circuit with high KMM T count
8.8 The smallest RUS circuit in our database
8.9 A two-qubit RUS circuit for V
8.10 High-order V-basis RUS circuits
8.11 T count scaling for approximation of Z-axis rotations
8.12 T count scaling for approximation of non-axial single-qubit unitaries
9.1 Surface code primal-dual CNOT
9.2 An example of topological deformation
9.3 Surface code T gate
9.4 Time ordering of T gates
9.5 Primitive plumbing pieces
9.6 Surface code cells
9.7 An example of gravity forces
9.8 An example of the tension force
9.9 Compaction of a CNOT gate
9.10 Compaction of eleven CNOT gates
9.11 A jog node
9.12 A cuboid with ports
9.13 A linking node
9.14 An enclosing cuboid around a primal loop
Chapter 1 Motivation and results

The discovery of quantum mechanics in the early 1900s represented a fundamental departure from previous understanding of the natural world. In a similar way, quantum computers, conceived by Feynman in 1982, represent a fundamental shift from the traditional way of solving computational problems [Fey82]. Feynman observed that simulation of quantum mechanics, though an apparently difficult task for (classical) computers, is accomplished tautologically by natural physical systems. Consequently, a computing device operating according to the laws of quantum mechanics could have a distinct advantage over its classical counterparts.

Indeed, simulation of quantum mechanical systems is of enormous practical importance, with potential applications in drug design, materials science, protein folding and more (see, e.g., [KW11]). Feynman’s original ideas have since been refined and show that exponential speedups for simulation of quantum mechanical systems are indeed possible, in theory [AL97, BT98, Zal98].

Exponential improvements are not limited to simulation, though. In 1994, Shor developed a polynomial-time algorithm for factoring large numbers, a problem which is widely believed to be intractable for classical computers [Sho94]. Other exponential speedups exist, including algorithms for solving linear systems of equations [HHL09] and other mathematical problems [Ked06, JW06, Hal07, AJKR10]. Finding new algorithms is a subject of active research [Mos08, CvD10].

To date, however, quantum computers capable of outperforming classical devices do not exist. The limited number of experimental efforts that have been attempted, while encouraging, fall well short of the scale necessary for real-world applications [LJL+…] … decoherence, which destroys entanglement.

Decoherence can be delayed by carefully isolating the quantum information from its environment.
However, too much isolation also prevents (wanted) access to the quantum system, making control and readout difficult. At the same time, coherently controlling a large quantum mechanical system for the duration of an algorithm requires extreme accuracy. Such stringent control requirements, combined with the inherent fragility of quantum information, raise concerns about the feasibility of constructing a quantum computer.

Is accurate large-scale quantum computation possible? It turns out that, by incorporating enough redundancy, quantum computation with arbitrary accuracy is possible, at least in principle [AB97]. In practice, the engineering challenges are significant and the necessary amount of redundancy can be overwhelmingly large. In this thesis, we will discuss the challenges and propose a variety of methods for reducing resource requirements.

Errors in a quantum computer originate from two sources. First, control of the quantum system may be imperfect. For example, operations in a quantum computer can be described by rotations about a set of fixed axes. Over time, small over- or under-rotations can accumulate, resulting in data corruption. Second, the surrounding environment may interact undesirably with the system. For example, data stored in an electron can be altered by interaction with surrounding magnetic fields. Collectively, imperfect control and environmental interactions represent noise in a quantum computer.

Noise is not exclusive to quantum systems. Classical devices can also suffer from errors due to imperfections or external physical phenomena. However, most electronics can be manufactured so that errors are vanishingly rare. When this is not possible, errors can be suppressed by adding redundancy. Error-correcting codes use a large number of physical bits in order to represent some smaller number of “logical” bits [MS93]. As long as the number of physical bit errors is small enough, the information inside of the code can be retrieved accurately.
Indeed, a very simple kind of error protection is used in dynamic random-access memory (DRAM), which is ubiquitous in modern electronics. Each bit in DRAM is stored in a small capacitor as an electric charge, which may leak away over time. To avoid data loss, each charge is periodically “refreshed” by reading it and then rewriting it. Unfortunately, directly refreshing quantum bits is not possible. Merely reading a quantum bit, or qubit, generally disturbs its state.

One might hope that quantum hardware could be manufactured to reduce noise to acceptable levels. However, most quantum algorithms will require billions of operations and many hundreds or thousands of qubits. Controlling such a large number of qubits, each with an error rate below one part in a billion, is far beyond the capability of current technology, and is likely to remain so for the foreseeable future.

The inability to refresh is due, in part, to the fact that quantum information cannot be cloned [WZ82]. One might expect that the use of error-correcting codes for quantum information is therefore also prohibited. Nevertheless, quantum information can be protected by combining classical error-correcting codes in a novel way [Sho96]. Indeed, so long as the probability of an error is below a constant threshold value, it is possible to use error-correcting codes to protect quantum information during arbitrarily long computations [AB97].

Error correction is not the only technique available for protecting quantum information. Decoherence-free subspaces and dynamical decoupling are capable of improving the fidelity of quantum operations [PSE96, DG97, VKL99, Ban98]. However, these methods have limitations and are generally regarded as complementary to active error correction, which is where we will focus our attention.
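The classical principle that redundancy suppresses errors is easiest to see in the three-bit repetition code. The following minimal Python sketch is our own illustration, not a construction from the thesis: the function names are hypothetical, and it assumes each physical bit flips independently with probability $p$. Majority voting fails only when at least two of the three bits flip, so the logical error rate is $3p^2(1-p) + p^3$, which is below $p$ whenever $p < 1/2$.

```python
import random


def encode(bit: int, n: int = 3) -> list[int]:
    """Repetition code: copy one logical bit onto n physical bits."""
    return [bit] * n


def apply_noise(bits: list[int], p: float, rng: random.Random) -> list[int]:
    """Flip each physical bit independently with probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def decode(bits: list[int]) -> int:
    """Majority vote; correct whenever fewer than half of the bits flipped."""
    return int(sum(bits) > len(bits) // 2)


def exact_logical_error_rate(p: float) -> float:
    """Three-bit code: decoding fails iff at least two of the three bits flip."""
    return 3 * p**2 * (1 - p) + p**3


def simulated_logical_error_rate(p: float, trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate for the three-bit code, always encoding logical 0."""
    rng = random.Random(seed)
    failures = sum(decode(apply_noise(encode(0), p, rng)) != 0 for _ in range(trials))
    return failures / trials
```

For $p = 0.1$, `exact_logical_error_rate` gives about 0.028, and `simulated_logical_error_rate` should agree to within sampling error. Quantum codes cannot simply copy and re-read qubits in this way, which is why the quantum constructions cited above are needed.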

Quantum error-correcting codes permit high-quality protection of quantum information from noise, but is it enough? At a minimum, the amount of noise that can be tolerated by error correction must meet or exceed the amount of noise in the physical system. Threshold theorems tell us that arbitrary accuracy is possible even if error rates are constant, but small enough [AB97, Kit97, KLZ96, Rei06b, TB05, AGP06, AKP06, NP09, Pre13]. What is the noise threshold for quantum computing, and can it be physically achieved?

Initial estimates of the threshold error rate were around 0.01 percent per gate [Zal96], but have been subsequently improved to as high as one to three percent per gate [Kni05, RH07, FH11]. This range meets or approaches the gate error rates reported by a variety of experimental efforts for small-scale systems [LJL+10, MSB+11, CGC+12, GGZ13]. It seems, therefore, that quantum error-correcting codes have the capacity to protect quantum information in realistic conditions.

But there is a second, potentially more alarming concern. In principle, quantum computation with error correction is efficient. If the size of the ideal circuit is $n$, then the corresponding fault-tolerant circuit need only be a factor of $\mathrm{poly}(\log n)$ larger. However, the constants involved can be quite large, and numerical studies have shown that the resource requirements can be astoundingly large in absolute terms. A single encoded quantum gate can require millions or billions of physical gates [Kni05, RHG07, PR12, JVF+…] … − [Kni05]. That is, if the size of the original algorithm is $n$, then the size of the quantum computer would need to be roughly $10^{\dots} n$. For other size and error parameters, the gate and qubit overhead can range from one-thousand to one-billion fold, or more.

These kinds of resource requirements place a huge burden on the construction of a quantum computer. Even if billions of qubits can be coherently controlled, such large overhead is clearly undesirable. The fear is that the overhead required to protect quantum information is so large as to make quantum computers wholly impractical, or to effectively negate any algorithmic speedups over classical computers. The most important goal of the quantum circuit designer, therefore, is to reduce resource requirements to manageable levels.

Resource overhead in fault-tolerant circuits is incurred in a variety of ways, including large error-correction circuits, large gate costs, low encoding rates and more. Thus one should consider a variety of optimization strategies in order to address each problem. Accordingly, this thesis proposes a number of new techniques for reducing the resources required to accurately implement quantum algorithms, subject to realistic constraints imposed by quantum computing hardware.
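The gap between asymptotic efficiency and absolute cost can be seen in the standard heuristic for concatenated distance-3 codes, used here only as an illustration; it is not the analysis carried out in this thesis. If one level of encoding maps a physical error rate $p$ below the threshold $p_{\mathrm{th}}$ to roughly $p_{\mathrm{th}}(p/p_{\mathrm{th}})^2$, then $k$ levels give $p_{\mathrm{th}}(p/p_{\mathrm{th}})^{2^k}$. Reaching a per-gate failure rate of $1/n$ therefore needs only $k = O(\log \log n)$ levels, and if each level multiplies the gate count by a constant $C$, the blowup $C^k$ is polylogarithmic in $n$. The constants $p_{\mathrm{th}}$ and $C$ are where the large absolute overhead hides. In the Python sketch below, both are hypothetical inputs and the function names are ours.

```python
def levels_needed(p: float, p_th: float, target: float) -> int:
    """Smallest k with p_th * (p / p_th) ** (2 ** k) <= target."""
    if not 0 < p < p_th:
        raise ValueError("need 0 < p < p_th (physical error rate below threshold)")
    if target <= 0:
        raise ValueError("target failure rate must be positive")
    k = 0
    # Each extra level squares the ratio p / p_th, so the loop ends in O(log log(1/target)) steps.
    while p_th * (p / p_th) ** (2**k) > target:
        k += 1
    return k


def blowup(n: int, p: float, p_th: float, gates_per_level: int) -> int:
    """Gate-count blowup C**k for an n-gate circuit, aiming at per-gate failure 1/n."""
    return gates_per_level ** levels_needed(p, p_th, 1 / n)
```

For example, with $p = 10^{-3}$, a hypothetical $p_{\mathrm{th}} = 10^{-2}$ and $n = 10^9$, `levels_needed` returns 3, so a hypothetical $C = 100$ gives a blowup of $10^6$.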

Universality with transversal gates

Fault-tolerant computation involves performing operations on the data while it is encoded. For most quantum error-correcting codes, there is a small set of operations that can be performed easily, and another set of operations that are much more difficult but are required in order to implement quantum algorithms. The Toffoli gate, for example, is used heavily in classical subroutines but usually involves costly decomposition into a sequence of other gates. In Chapter 5 we show that a particular family of quantum codes admits a simple “transversal” implementation of the controlled-controlled-Z gate. A relatively cheap implementation of Toffoli can then be obtained with the help of encoded Hadamards, which we show can also be implemented transversally. Toffoli and Hadamard are universal for quantum computation [Shi03] (in the sense of computational universality, since both gates have real matrix entries), and so only these simple transversal gates are necessary.

Smaller error correction circuits

Error correction dominates the resource costs of many fault-tolerance schemes. Reducing the cost of error correction therefore reduces the total cost by nearly the same amount. Chapter 6 examines methods for efficiently preparing so-called “stabilizer states”, which comprise the bulk of the cost for several types of error correction. These methods can be applied to a large class of quantum error-correcting codes, and are particularly effective for codes of medium to large size. For example, the cost of error correction for the 23-qubit Golay code can be reduced by more than a factor of four when compared to previous methods.
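To make the term “stabilizer state” concrete, the sketch below builds a small example: the encoded $|0\rangle$ of the 7-qubit Steane code, the kind of ancilla consumed by Steane-style error correction. This is our own illustration of a standard construction, not the Golay-code preparation studied in Chapter 6; the 7-qubit code is chosen only because its $2^7$-entry state vector is tiny, and all helper names are hypothetical. The state is the uniform superposition over the eight codewords spanned by the Hamming parity checks, and it is a $+1$ eigenstate of the corresponding $X$-type and $Z$-type check operators, as well as of the logical $Z = Z^{\otimes 7}$.

```python
import itertools

import numpy as np

# Parity-check rows of the [7,4,3] Hamming code; they span its even-weight dual code.
HAMMING_CHECKS = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
N = 7


def dual_codewords() -> set[tuple[int, ...]]:
    """All GF(2) linear combinations of the check rows (8 words of weight 0 or 4)."""
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(HAMMING_CHECKS)):
        word = [0] * N
        for c, row in zip(coeffs, HAMMING_CHECKS):
            if c:
                word = [w ^ r for w, r in zip(word, row)]
        words.add(tuple(word))
    return words


def index(bits) -> int:
    """Computational-basis index of a bit string, first qubit most significant."""
    return int("".join(map(str, bits)), 2)


def encoded_zero() -> np.ndarray:
    """Steane-code |0_L>: uniform superposition over the dual codewords."""
    words = dual_codewords()
    psi = np.zeros(2**N)
    for w in words:
        psi[index(w)] = 1 / np.sqrt(len(words))
    return psi


def apply_pauli(kind: str, support, psi: np.ndarray) -> np.ndarray:
    """Apply X (bit flips) or Z (sign flips) on the qubits where support has a 1."""
    out = np.zeros_like(psi)
    for i, amp in enumerate(psi):
        bits = [(i >> (N - 1 - q)) & 1 for q in range(N)]
        if kind == "X":
            out[index([b ^ s for b, s in zip(bits, support)])] += amp
        else:
            parity = sum(b & s for b, s in zip(bits, support)) % 2
            out[i] += (-1) ** parity * amp
    return out
```

Every check operator leaves `encoded_zero()` unchanged, while $X^{\otimes 7}$ maps it to an orthogonal state (the encoded $|1\rangle$), which is the defining property of a stabilizer state for this code.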

Improved noise thresholds

Computational accuracy increases rapidly as the physical noise rate drops below the threshold. Thus, an effective way to reduce resource requirements is to raise the provable noise threshold by improving lower bounds. Chapter 7 describes a technique for more accurately calculating lower bounds on the noise threshold when noise is modeled as a Pauli channel. We calculate a threshold error rate of 0.132 percent per gate for depolarizing noise, the best lower bound currently known. Our proof uses malignant set counting [AGP06], extensively tailored for our optimized error-correction circuits and for Pauli channel noise. Instead of assuming adversarial (i.e., worst-case) noise at higher levels of code concatenation, the counting procedure keeps track of multiple types of malignant events to create a transformed independent noise model for each level, allowing for a more accurate analysis.
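Depolarizing noise is one particular Pauli channel. A common convention in threshold analyses, assumed here rather than quoted from Chapter 7, is that each gate acting on $m$ qubits fails with probability $p$, and a failure applies one of the $4^m - 1$ non-identity $m$-qubit Pauli operators uniformly at random. The following Python sketch (hypothetical helper names) writes down that channel's probabilities and samples faults from it.

```python
import itertools
import random


def nontrivial_paulis(m: int) -> list[str]:
    """All 4**m - 1 Pauli strings on m qubits other than the identity."""
    return ["".join(s) for s in itertools.product("IXYZ", repeat=m) if set(s) != {"I"}]


def depolarizing_probs(m: int, p: float) -> dict[str, float]:
    """Exact Pauli-channel probabilities for m-qubit depolarizing noise of strength p."""
    others = nontrivial_paulis(m)
    probs = {"I" * m: 1 - p}
    probs.update({s: p / len(others) for s in others})
    return probs


def sample_fault(m: int, p: float, rng: random.Random) -> str:
    """Pauli error applied after one noisy m-qubit gate (identity means no fault)."""
    if rng.random() >= p:
        return "I" * m
    return rng.choice(nontrivial_paulis(m))
```

For a two-qubit gate at $p = 0.003$, each of the 15 non-identity Paulis occurs with probability 0.0002.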

Low-cost approximations of single-qubit unitaries

Fault-tolerance schemes offer a universal but finite set of gates from which to implement quantum algorithms. An arbitrary unitary requested by an algorithm must be approximated by decomposition into a sequence of fault-tolerant gates. Traditional approximation methods output a deterministic sequence of gates [DN05, Fow11, Sel12, KMM12c]. In Chapter 8 we explore the use of non-deterministic but repeatable quantum circuits. By optimized direct computer search, we find a large number of such circuits and show how to use them to reduce the cost of approximating a single-qubit unitary by about a factor of three.
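The expected cost of a non-deterministic circuit follows from a geometric distribution. Suppose, as an assumption for this sketch, that each attempt of a repeat-until-success circuit consumes $t$ T gates, succeeds independently with probability $q$, and on failure leaves a state that can be cheaply restored so the circuit can be retried. Then the number of attempts has mean $1/q$ and the expected cost is $t/q$. The Python below (hypothetical names; not the thesis's circuit database) computes this and checks it by simulation.

```python
import random


def expected_cost(t_per_attempt: float, success_prob: float) -> float:
    """Mean cost t/q: the number of attempts is geometric with mean 1/q."""
    if not 0 < success_prob <= 1:
        raise ValueError("success probability must lie in (0, 1]")
    return t_per_attempt / success_prob


def simulate_cost(t_per_attempt: float, success_prob: float,
                  trials: int = 100_000, seed: int = 7) -> float:
    """Monte Carlo estimate: repeat each run until one attempt succeeds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= success_prob:  # failure: assume cheap recovery, then retry
            attempts += 1
        total += attempts * t_per_attempt
    return total / trials
```

A circuit that consumes 4 T gates per attempt and succeeds with probability 1/2 therefore costs 8 T gates on average; comparisons with deterministic decompositions should use this expected value rather than the per-attempt count.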

Circuit optimization subject to geometric constraints

Resource calculations often ignore geometric connectivity constraints imposed by a quantum computer. Fault-tolerant quantum circuits encoded in the surface code automatically respect two-dimensional nearest-neighbor constraints but do not consider global dimensions of the computer, wasting both space and time. To solve this problem, Chapter 9 proposes two algorithms for placing fault-tolerant quantum circuits onto a two-dimensional qubit lattice of fixed, but arbitrary, size. The algorithms exploit topological properties of the surface code in order to transform the initial circuit into one that fits compactly into the lattice geometry.

Chapter 2 The mechanics of a quantum computer

Classical computers operate based on the laws of electricity and magnetism. However, the physical details are usually abstracted and, instead, operations are described in terms of bits and logic gates. Similarly, though quantum computers operate based on the laws of quantum mechanics, we will use abstractions such as qubits and quantum gates. In this chapter, we summarize the mathematics of quantum computation. This summary introduces only the concepts that are necessary for quantum error correction and fault tolerance. For a more complete treatment, the reader is referred to any of several textbooks [NC00, KSV02, KLM07].

The content, or state, of a classical computer is described by bits. A bit has value either zero or one; alternatively, a bit is a vector

$$\vec{v} = a\vec{0} + b\vec{1}, \tag{2.1}$$

where $a, b \in \{0, 1\}$ and $a + b = 1$. A string, or register, of $n$ bits is then a length-$n$ vector over the field $\mathbb{Z}_2 = \{0, 1\}$, i.e., an ordered collection of bits.

The state of a quantum computer is described by qubits. Like a bit, a qubit is a vector

$$|\psi\rangle = a|0\rangle + b|1\rangle, \tag{2.2}$$

except that the “amplitudes” $a, b \in \mathbb{C}$ are now free to take complex values and must satisfy the normalization condition $|a|^2 + |b|^2 = 1$. The notation $|\cdot\rangle$ is called a “ket” and is conventional for quantum states. Measurement of a qubit yields a bit, the value of which is determined by a probability distribution defined by $a$ and $b$. The normalization condition ensures that the total probability is equal to one. See Section 2.3.

A register of $n$ qubits is a unit vector in a $2^n$-dimensional vector space over the complex field $\mathbb{C}$. Unlike a classical register of $n$ bits, which has length $n$, a register of $n$ qubits has length $2^n$, one entry for each of the possible bit strings of length $n$. This is akin to a probabilistic classical register, which may take one of $2^n$ possible values according to a probability distribution. In this way, a qubit register is a generalization of a probabilistic register in which the coefficients are complex and could, for example, be negative. The normalization condition for a register $\sum_i a_i |x_i\rangle$ is $\sum_i |a_i|^2 = 1$.

Any two $n$-qubit registers $|\psi\rangle = \sum_i a_i |i\rangle$ and $|\phi\rangle = \sum_i b_i |i\rangle$ have the inner product

$$\langle |\psi\rangle, |\phi\rangle \rangle = \langle \phi | \psi \rangle = \sum_i a_i b_i^*. \tag{2.3}$$

The normalization condition enforces that a quantum register has inner product one with itself, i.e., $\langle \psi | \psi \rangle = 1$.

Registers of qubits can be joined together by tensor product.
For example, the tensor product of the states $|\psi\rangle$ and $|\phi\rangle$ defined above is given by

$$|\psi\rangle \otimes |\phi\rangle = \sum_{i,j} a_i b_j \, |i\rangle \otimes |j\rangle. \tag{2.4}$$

Often the $\otimes$ notation is dropped, instead using the shorthand $|\psi\rangle|\phi\rangle$, or sometimes $|\psi, \phi\rangle$. The tensor product of $k$ identical registers $|\psi\rangle$ is denoted by $|\psi\rangle^{\otimes k}$, or sometimes $|\psi^k\rangle$.

Computers map input states to output states through a series of operations called gates. A classical gate takes some number of bit registers as input, and outputs one or more bit registers. A quantum gate is similar, but manipulates registers of qubits.

A quantum gate operating on $n$ qubits can be described by a $2^n \times 2^n$ unitary matrix. A matrix $U$ is unitary if and only if $UU^\dagger = I$, where $U^\dagger$ is the matrix obtained by transposing $U$ and then taking the entry-wise complex conjugate, and $I$ is the identity matrix of appropriate dimension. Unitary operations are reversible. That is, the inputs of a quantum gate $U$ can be obtained from the outputs by performing the gate $U^\dagger$.

Like registers, quantum gates can be joined by tensor product. Again, the $\otimes$ notation is sometimes dropped for visual clarity. This can create an ambiguity between the matrix product $UV$ and the tensor product $U \otimes V$. When the intended product cannot be inferred from the context we will use $\otimes$ explicitly.

One particularly important class of unitary gates is the single-qubit Pauli operators. There are four such Pauli operators:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \tag{2.5a}$$

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{2.5b}$$

$$Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \tag{2.5c}$$

$$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2.5d}$$

The square of any Pauli is equal to the identity $I$, and except for $I$, the Paulis pairwise anticommute. That is, $PQ = -QP$ for $P, Q \in \{X, Y, Z\}$ and $P \neq Q$. The Paulis are orthogonal under the Hilbert-Schmidt matrix inner product

$$\langle U, V \rangle := \operatorname{Tr}(U^\dagger V). \tag{2.6}$$

Accordingly, they form an orthogonal basis for the set of $2 \times 2$ complex matrices, and, up to a global phase, any $2 \times 2$ unitary $U$ can be written as a linear combination

$$U = \cos(\theta) I - i \sin(\theta)(aX + bY + cZ), \tag{2.7}$$

for $\theta \in [0, \pi]$ and real values $a, b, c$ such that $\sqrt{a^2 + b^2 + c^2} = 1$.

The set of tensor products of Pauli operators forms a group under multiplication. The product of any two Pauli operators is a Pauli operator, up to a possible unit phase $\{\pm 1, \pm i\}$. The extra phase can usually be ignored, and the corresponding group is called the Pauli group.

2.3 Measurement

Results of a quantum operation or quantum algorithm are obtained by measuring quantum registers. Let $\{|\phi_i\rangle\}$ be an orthonormal basis for a quantum register $|\psi\rangle$ such that $|\psi\rangle = \sum_i a_i |\phi_i\rangle$. The measurement of $|\psi\rangle$ with respect to this basis yields outcome $i$ with probability $|a_i|^2$. For example, measurement of the single-qubit state $a|0\rangle + b|1\rangle$ yields outcome zero with probability $|a|^2$ and outcome one with probability $|b|^2$. Since $|0\rangle$ and $|1\rangle$ are eigenstates of $Z$, this is called a $Z$-basis measurement.

We may alternatively measure in the $X$ eigenbasis $\{|+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle),\ |-\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\}$. Measurement in the $X$ basis is equivalent to first performing the Hadamard gate

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{2.8}$$

and then measuring in the $Z$ basis, since $H|+\rangle = |0\rangle$ and $H|-\rangle = |1\rangle$.

Measurement in other bases, and measurement of multi-qubit registers, is physically possible in principle. However, we will use only single-qubit $Z$-basis and $X$-basis measurements in this thesis.

Unlike bits of a classical register, qubits in a quantum register need not be independent of each other. Consider the so-called “Bell state” on two qubits

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle). \tag{2.9}$$

This state is a “superposition” of two cases, one in which both qubits have value zero, and one in which both qubits have value one.

If we measure the first qubit of $|\psi\rangle$, then we get a classical bit, either zero or one. But in this case, we know that the value of the second qubit must be equal to the value of the first qubit. That is, if we measure zero on the first qubit, then the value of the second qubit must also be zero.
Similarly, if we measure a one on the first qubit, then the second qubit must also have value one. A state such as (2.9) in which qubit values are not independent is said to be entangled. Entangled states are an important part of many quantum algorithms and are used heavily in quantum error-correcting codes.

2.5 Universality

Any quantum algorithm can be expressed as a sequence of unitary operations and single-qubit measurements. However, rather than construct a quantum computer capable of executing an infinite number of possible unitary operations, it is more practical to decompose quantum algorithms into a finite, but universal, set of gates.

Definition 2.5.1 (Universality). A set of quantum gates $\mathcal{G}$ is universal if for any unitary $U$ and $\epsilon > 0$, there exists some $k$ and $V = G_1 G_2 \cdots G_k$ such that $G_1, G_2, \ldots, G_k \in \mathcal{G}$ and $\|V - U\| \le \epsilon$.

Informally, Definition 2.5.1 says that a universal gate set is one from which any unitary $U$ can be approximated to any desired error tolerance $\epsilon$. The choice of norm $\|V - U\|$ is largely arbitrary; when necessary, the choice of norm will be stated explicitly. It can be shown that the set of arbitrary single-qubit gates with the addition of any non-trivial multi-qubit gate (i.e., one that cannot be expressed as a product of single-qubit gates) is universal [DiV95]. Thus the problem of universality can be reduced to just the single-qubit case.

One special class of quantum gates is the Clifford gates. A gate $G$ on $n$ qubits is Clifford if and only if $e^{i\theta} G^\dagger P G \in \mathcal{P}^{\otimes n}$ for all $P \in \mathcal{P}^{\otimes n}$ and some unit phase $e^{i\theta}$, where $\mathcal{P} = \{I, X, Y, Z\}$. That is, the Clifford gates are those that map Pauli operators to Pauli operators under conjugation. The Clifford operators form a group. The single-qubit Clifford group has size 24 (up to global phases) and can be generated by $\{H, S = \mathrm{diag}(1, i)\}$. The entire Clifford group can be generated by adding a single two-qubit gate, usually

$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{2.10}$$

The first input of the CNOT is called the control and the second input is called the target. The CNOT gate flips the value of the target qubit if and only if the state of the control qubit is $|1\rangle$.

The Clifford group is important in the study of fault-tolerant quantum computing for two reasons. First, many quantum error-correcting codes permit very simple and robust encoded versions of Clifford gates. Second, and more importantly, it is particularly easy to calculate the effect of Pauli errors as they propagate through sequences of Clifford gates. Although the Clifford group contains several important quantum gates, including H and CNOT, quantum computations that contain only Clifford gates can be efficiently simulated by a classical computer, a result known as the Gottesman-Knill theorem. In fact, the Clifford group is strictly less powerful than (universal) classical computation [AG04].

Propagation of Pauli errors through Clifford gates is used heavily throughout this thesis. For convenience, we give the relevant equations explicitly for X and Z. Propagation for Y follows from $Y = iXZ$.

$$
\begin{aligned}
HX &= ZH, & \text{(2.11a)}\\
HZ &= XH, & \text{(2.11b)}\\
SX &= YS, & \text{(2.11c)}\\
SZ &= ZS, & \text{(2.11d)}\\
\mathrm{CNOT}\,(I \otimes X) &= (I \otimes X)\,\mathrm{CNOT}, & \text{(2.11e)}\\
\mathrm{CNOT}\,(X \otimes I) &= (X \otimes X)\,\mathrm{CNOT}, & \text{(2.11f)}\\
\mathrm{CNOT}\,(I \otimes Z) &= (Z \otimes Z)\,\mathrm{CNOT}, & \text{(2.11g)}\\
\mathrm{CNOT}\,(Z \otimes I) &= (Z \otimes I)\,\mathrm{CNOT}. & \text{(2.11h)}
\end{aligned}
$$

The relatively meager computational power of the Clifford group implies that Clifford gates alone cannot be universal for quantum computation.
It turns out, however, that the addition of any non-Clifford gate is sufficient for universality (see, e.g., [CAB12] Appendix D). The most common choice is the single-qubit gate

$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}. \tag{2.12}$$

Note that $T^2 = S$. There are other sensible choices, however. For example, the three-qubit Toffoli gate, defined by $|a, b, c\rangle \mapsto |a, b, c \oplus (a \cdot b)\rangle$, is universal for classical computation and is therefore also useful in constructing classical reversible subroutines such as addition. Some other alternatives are discussed in Chapter 8.
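The propagation rules (2.11) and the claims about T are easy to confirm numerically. The following NumPy sketch is an illustration rather than anything taken from the thesis: it uses the standard matrices from (2.8), (2.10) and (2.12), assumes the CNOT control is the first tensor factor, checks each rule in the form $G P = Q G$, and applies the Clifford definition above (conjugation must map X and Z to Paulis up to phase) to H, S and T. The helper names are hypothetical.

```python
import numpy as np

# Single-qubit Paulis and the gates from (2.8), (2.10) and (2.12).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
# CNOT in the basis |control, target>, control = first tensor factor.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Each rule (G, P, Q) states G P = Q G: the Pauli P before G becomes Q after G.
RULES = {
    "2.11a": (H, X, Z),
    "2.11b": (H, Z, X),
    "2.11c": (S, X, Y),
    "2.11d": (S, Z, Z),
    "2.11e": (CNOT, np.kron(I2, X), np.kron(I2, X)),
    "2.11f": (CNOT, np.kron(X, I2), np.kron(X, X)),
    "2.11g": (CNOT, np.kron(I2, Z), np.kron(Z, Z)),
    "2.11h": (CNOT, np.kron(Z, I2), np.kron(Z, I2)),
}


def check_propagation_rules():
    """Return {equation label: True/False} for the identities (2.11a)-(2.11h)."""
    return {name: bool(np.allclose(G @ P, Q @ G)) for name, (G, P, Q) in RULES.items()}


def is_pauli_up_to_phase(M):
    """True if M = c * P for a single-qubit Pauli P and a unit-modulus scalar c."""
    for P in (I2, X, Y, Z):
        c = np.trace(P.conj().T @ M) / 2  # Hilbert-Schmidt coefficient, cf. (2.6)
        if np.isclose(abs(c), 1) and np.allclose(M, c * P):
            return True
    return False


def is_single_qubit_clifford(G):
    """Clifford test from the definition: G^dagger P G must be a Pauli up to phase.
    Checking P = X and P = Z suffices, because Y = iXZ."""
    return all(is_pauli_up_to_phase(G.conj().T @ P @ G) for P in (X, Z))


print(check_propagation_rules())             # every entry should be True
print(is_single_qubit_clifford(H), is_single_qubit_clifford(S),
      is_single_qubit_clifford(T))           # True True False
print(np.allclose(T @ T, S))                 # T^2 = S
```

The same Hilbert-Schmidt coefficient used in `is_pauli_up_to_phase` also shows why T fails: $TXT^\dagger = (X + Y)/\sqrt{2}$ has two coefficients of magnitude $1/\sqrt{2}$ rather than one of magnitude one.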

Figure 2.1: An example of a quantum circuit. The circuit takes three qubits as input, and outputs a single qubit. CNOT gates are indicated by vertical lines between qubits; the black dot indicates the control, and the ⊕ indicates the target. Measurements are represented by "D" shapes, and the basis (X or Z) is indicated. Classically-controlled gates are denoted by double lines. This particular circuit performs "teleportation", transferring $|\psi\rangle$ from the first qubit to the third qubit.

It is often convenient and helpful to describe sequences of quantum gates visually, as circuits. Technically, a quantum circuit is a directed acyclic graph in which the vertices represent quantum gates, and the edges represent qubits. Figure 2.1 shows an example of a circuit composed of gates from {CNOT, H, X, Z}.

A circuit can be partitioned into time-steps in which each qubit is involved in at most one gate. By convention, time goes from left to right. Note that this is the opposite of the convention for matrix multiplication, in which gates are applied on the state $|\psi\rangle$ from right to left. In Figure 2.1, the Hadamard gate is applied first, followed by a CNOT on qubits two and three and then a CNOT on qubits one and two.

Measurements output classical bits, indicated by the double lines. Quantum gates can be conditionally applied based on classical measurement values. In this example, the X gate is applied only if the Z-basis measurement on the second qubit is one, and the Z gate is applied only if the X-basis measurement on the first qubit is one.

The circuit shown in Figure 2.1 demonstrates a uniquely quantum concept called teleportation [BBC+…], which transfers the state $|\psi\rangle$ of the first qubit onto the third qubit. Initially the second and third qubits must be located close together in order to execute the first CNOT gate.
The third qubit can then be transported to any desired location. Upon executing the remainder of the circuit, the state of the first qubit is instantly transported to the location of the third qubit, up to Pauli corrections based on the measurement outcomes.

Figure 2.2: Two modifications of the teleportation circuit shown in Figure 2.1. (a) One-qubit teleportation. The input $|\psi\rangle$ is teleported using just one ancilla qubit, prepared as $|+\rangle$. After teleportation, a Z-axis rotation is applied to the output. (b) Gate teleportation. Using the relation $R_Z(\theta)X = R_Z(2\theta)XR_Z(\theta)$, the Z-axis rotation can be shifted to the left and a new conditional correction is required.

Teleportation is used frequently in fault-tolerant circuits, but for a different reason. Consider the circuit shown in Figure 2.2a. This circuit also teleports the state $|\psi\rangle$, but requires only one additional qubit [ZLC00]. After teleportation, a Z-axis rotation

$$R_Z(\theta) = \cos(\theta/2) I - i \sin(\theta/2) Z \tag{2.13}$$

is applied to the output. Next, observe that

$$R_Z(\theta) X = R_Z(\theta) X R_Z(-\theta) R_Z(\theta) = R_Z(2\theta) X R_Z(\theta), \tag{2.14}$$

where the second equality uses $X R_Z(-\theta) = R_Z(\theta) X$, which follows from $XZX = -Z$. Therefore, the Z-axis rotation may be shifted to the left of the conditional X correction, and to the left of the CNOT gate (since Z has no effect on the control of a CNOT). The $R_Z(\theta)$ gate can now be performed "offline" on the ancillary qubit, before interaction with the state $|\psi\rangle$. The technique of preparing a gate offline by commuting it through the teleportation circuit is called gate teleportation [GC99].

Of course, the conditional correction $R_Z(2\theta)X$ in the gate teleportation circuit is now more complicated than it was before. However, there are certain cases in which fault-tolerantly executing $R_Z(2\theta)$ is far easier than executing $R_Z(\theta)$.
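As a numerical sanity check on (2.14) and Figure 2.2b, here is a minimal NumPy state-vector sketch (ours, not the thesis's). It assumes, as the remark about Z commuting through the control suggests, that the $|+\rangle$ ancilla is the CNOT control and the data qubit is the target; the function and constant names are hypothetical. With $\theta = \pi/4$, $R_Z(\theta)$ equals T and the correction $R_Z(2\theta)$ equals S, each up to a global phase, so the offline gate is non-Clifford while the correction is Clifford.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rz(theta):
    """Z-axis rotation R_Z(theta) = cos(theta/2) I - i sin(theta/2) Z, as in (2.13)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Z


# Basis |q1 q2> with index 2*q1 + q2; q1 = data qubit, q2 = ancilla.
# CNOT with the ancilla (q2) as control and the data qubit (q1) as target.
CNOT_ANCILLA_TO_DATA = np.array([[1, 0, 0, 0],
                                 [0, 0, 0, 1],
                                 [0, 0, 1, 0],
                                 [0, 1, 0, 0]], dtype=complex)


def gate_teleport(psi, theta):
    """Simulate the gate-teleportation circuit of Figure 2.2b (as assumed here).

    Returns {m: (probability of outcome m, corrected ancilla state)} for the
    Z-basis measurement outcome m of the data qubit.
    """
    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    ancilla = rz(theta) @ plus                    # R_Z(theta) applied "offline"
    state = CNOT_ANCILLA_TO_DATA @ np.kron(psi, ancilla)
    results = {}
    for m in (0, 1):
        branch = state[2 * m: 2 * m + 2]          # amplitudes with q1 = m
        prob = float(np.vdot(branch, branch).real)
        out = branch / np.sqrt(prob)
        if m == 1:
            out = rz(2 * theta) @ X @ out         # conditional correction R_Z(2 theta) X
        results[m] = (prob, out)
    return results


# Identity (2.14): R_Z(theta) X = R_Z(2 theta) X R_Z(theta).
theta = np.pi / 4        # R_Z(pi/4) equals T up to a global phase
assert np.allclose(rz(theta) @ X, rz(2 * theta) @ X @ rz(theta))

rng = np.random.default_rng(seed=1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
target = rz(theta) @ psi
for m, (prob, out) in gate_teleport(psi, theta).items():
    # Each outcome occurs with probability 1/2 and yields R_Z(theta)|psi>.
    print(m, round(prob, 3), round(abs(np.vdot(target, out)), 6))
```

Both measurement branches end in exactly $R_Z(\theta)|\psi\rangle$, which is the point of gate teleportation: the only gate that touches the data after the interaction is the conditional correction.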
Offline preparation of the more difficult $R_Z(\theta)$ allows for more efficient error suppression, as we will see in Chapter 4.

Chapter 3

Protecting quantum information

In Chapter 1 we discussed the fragility of the information stored in quantum bits. An unprotected quantum system interacts freely with its environment, causing the information that it contains to be corrupted or lost. Before a qubit can be used for computation, it must be protected against noise.

In this chapter, we detail a major tool for protecting quantum information: quantum error-correcting codes. Quantum codes use many physical qubits to represent one logical qubit, thereby reducing the impact of an error on any one of the physical qubits. Quantum information is more complicated than classical information, and likewise quantum errors are more complicated than classical errors. Nonetheless, it is still possible to use the wealth of classical coding theory to develop quantum codes.

Classical codes operate by adding redundancy. For example, the simplest classical code is the two-bit repetition code in which a single logical bit is encoded using two noisy bits. The logical value zero is encoded as 00 and the logical one is encoded as 11. An error on either one of the two noisy bits will result in a value of 01 or 10. This single error can be detected by taking the parity of the two bits (i.e., the sum of the bits modulo two); in this case an odd parity indicates an error. By adding a third bit of repetition, single bit-flips can be corrected. For example, the value 010 can be restored by flipping the second bit back to zero. The errant bit can be identified by taking the parity of each pair of bits. An odd parity for the first two and the last two bits indicates an error on the middle bit.

A common simplifying assumption is that errors occur identically and independently on each bit. If the probability of an error on a single bit is $p$, then the probability of a simultaneous error on two given bits is $p^2$. Since the three-bit repetition code can correct any single-bit error, an uncorrectable error occurs only when there are simultaneous errors on two or more of the bits. The probability $p_L$ of this uncorrectable, or "logical", error is given by

$$p_L = 3p^2(1 - p) + p^3, \tag{3.1}$$

where there are $\binom{3}{2} = 3$ ways for two errors to occur. So long as

$$p > 3p^2(1 - p) + p^3, \tag{3.2}$$

which is true for $p < 0.5$,¹ the encoding yields a net improvement over just a single bit.

¹ In this case, net improvement can also be obtained for $p > 0.5$.

The repetition code can be extended to correct larger numbers of errors by simply adding more bits. The number of simultaneously correctable errors is given by $\lfloor (n - 1)/2 \rfloor$, where $n$ is the number of bits in the code. In the limit of large $n$, each additional bit increases the number of correctable errors by one-half.

Improved efficiency can be obtained by encoding more than one logical bit at a time. Linear codes are defined by a $k \times n$ binary matrix $G$, where $n$ is the number of bits of the code and $k$ is the number of encoded logical bits. The logical value $x$ is encoded into a codeword $c$ by binary (i.e., sum modulo two) matrix-vector multiplication

$$c = G^{\top} x, \tag{3.3}$$

where $x$ and $c$ are treated as column vectors. All codewords satisfy a set of linear constraints called parity checks, defined by an $(n - k) \times n$ binary matrix $H$ such that

$$H G^{\top} = 0, \tag{3.4}$$

which implies that $Hc = 0$ for all codewords $c$. The parity check matrix $H$ is useful in identifying errors since for any codeword $c$ and any $n$-bit vector $e$,

$$H(c + e) = Hc + He = He. \tag{3.5}$$

The $(n - k)$-bit vector $He$ identifies the parity checks violated by the error $e$ and is called the error syndrome. Each syndrome can be associated with a recovery operation $e'$ that returns the vector $(c + e)$ to a codeword, i.e., $H(c + e + e') = 0$.

Table 3.1: Parity check matrices for the (a) [7, 4, 3] and (b) [15, 11, 3] Hamming codes.

$$
\text{(a)}\;
\begin{pmatrix}
0&0&0&1&1&1&1\\
0&1&1&0&0&1&1\\
1&0&1&0&1&0&1
\end{pmatrix}
\qquad
\text{(b)}\;
\begin{pmatrix}
0&0&0&0&0&0&0&1&1&1&1&1&1&1&1\\
0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
1&0&1&0&1&0&1&0&1&0&1&0&1&0&1
\end{pmatrix}
$$

The distance of a linear code is defined as the minimum Hamming weight of any nonzero codeword. The distance corresponds to the minimum number of bits that must be flipped to transform one codeword into another, i.e., the minimum Hamming distance between codewords. The all-zero vector is always a codeword of any linear code, and so the minimum Hamming distance cannot be larger than the minimum weight of a nonzero codeword. Conversely, for any two codewords $c_1, c_2$, the linear combination $c = c_1 + c_2$ is also a codeword and the Hamming weight of $c$ is equal to the Hamming distance of $c_1$ and $c_2$. Thus, the Hamming distance between $c_1$ and $c_2$ is at least the code distance. The three-bit repetition code, for example, has distance three since 111 has weight three.

A code with distance $d$ can detect up to $d - 1$ bit errors. Any vector $c + e$ that is not a codeword yields a nonzero syndrome, and so applying a nonzero error $e$ to a codeword $c$ results in a syndrome of zero only if $e$ has Hamming weight at least $d$. A linear code can correct up to $t = \lfloor (d - 1)/2 \rfloor$ bit errors. The correction procedure takes a vector $c + e$ and replaces it with the closest (in Hamming distance) codeword $c'$. Informally, an error of weight $k$ moves the data $k$ steps away from the codeword. So long as $k$ is less than halfway to any other codeword, the correction procedure will succeed. Again, the three-bit repetition code can detect errors up to weight two, but can only correct errors of weight one.

A linear code using $n$ noisy bits to encode $k$ logical bits to a distance of $d$ is denoted by $[n, k, d]$. Perhaps the most well known class of linear codes is the family of $[2^r - 1, 2^r - r - 1, 3]$ Hamming codes, for $r \ge 2$. The smallest ($r = 2$) Hamming code corresponds to the three-bit repetition code discussed above.
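To make the syndrome calculation (3.5) concrete, the following brute-force Python sketch (illustrative only; all helper names are hypothetical) builds a parity-check matrix for the seven-bit Hamming code, assuming the common convention in which the $j$-th column is the binary representation of $j$. Under that convention the syndrome of a single bit-flip, read as a binary number, is the position of the flipped bit. The sketch also confirms that the code has 16 codewords and distance three, and evaluates the repetition-code logical error rate (3.1).

```python
import itertools

import numpy as np

# Assumed parity-check matrix of the [7, 4, 3] Hamming code: column j (1-indexed)
# is the 3-bit binary representation of j, most significant bit in the top row.
H7 = np.array([[(j >> b) & 1 for j in range(1, 8)] for b in (2, 1, 0)])


def syndrome(H, v):
    """Error syndrome H v over GF(2), cf. (3.5)."""
    return H.dot(v) % 2


def codewords(H):
    """All n-bit vectors c with H c = 0 (mod 2), by brute-force enumeration."""
    n = H.shape[1]
    vectors = (np.array(bits) for bits in itertools.product((0, 1), repeat=n))
    return [c for c in vectors if not syndrome(H, c).any()]


def distance(H):
    """Code distance: minimum Hamming weight of a nonzero codeword."""
    return min(int(c.sum()) for c in codewords(H) if c.any())


def correct_single_error(H, received):
    """Flip the bit whose column of H equals the syndrome; no-op if syndrome is 0."""
    s = syndrome(H, received)
    fixed = received.copy()
    if s.any():
        for j in range(H.shape[1]):
            if np.array_equal(H[:, j], s):
                fixed[j] ^= 1
                break
    return fixed


def repetition_logical_error(p):
    """Logical error probability of the three-bit repetition code, (3.1)."""
    return 3 * p**2 * (1 - p) + p**3


c = codewords(H7)[5]                 # an arbitrary valid codeword
received = c.copy()
received[4] ^= 1                     # flip bit 5 (index 4)
print(syndrome(H7, received))        # [1 0 1]: binary 5, the flipped position
print(np.array_equal(correct_single_error(H7, received), c))   # True
print(len(codewords(H7)), distance(H7))                          # 16 3
print(repetition_logical_error(0.1))  # approximately 0.028, below p = 0.1
```

Brute-force enumeration is only practical for small codes; it is used here because it makes the definitions of codeword and distance directly visible.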
Parity check matrices for the seven-bit and 15-bit Hamming codes are shown in Table 3.1.

3.1.2 Dual codes

The generator matrix $G$ and the parity check matrix $H$ are interchangeable. Just as $G$ defines the codewords of a linear code, $H$ defines the codewords of a different code, called the dual. The parity checks of the dual code are then given by $G$. Alternatively, given a linear code $C$, the codewords of the dual code are given by the orthogonal complement of $C$, defined by the set $C^\perp = \{g : g \cdot c = 0 \bmod 2,\ \forall c \in C\}$.

Unfortunately, classical codes cannot be used directly to protect quantum information, primarily because in addition to bit flips, qubits can suffer from more exotic kinds of errors. For example, consider the state $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. If the Pauli operator Z is accidentally applied to this state then it becomes $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. This kind of error is called a phase-flip, since the relative phase between $|0\rangle$ and $|1\rangle$ has changed from $+1$ to $-1$.

On the surface the problem appears to be even worse than just dealing with bit-flip and phase-flip errors. Consider the operator

$$E_\theta = \begin{pmatrix} 1 & 0 \\ 0 & e^{-2i\theta} \end{pmatrix}, \tag{3.6}$$

where $\theta \in [0, \pi)$. Accidental application of $E_\theta$ introduces one of an infinite number of continuous phase errors $e^{-2i\theta}$. Bit-flip errors may be similarly continuous.

However, we may rewrite (3.6) as

$$E_\theta = e^{-i\theta}\left(\cos(\theta) I + i \sin(\theta) Z\right). \tag{3.7}$$

When written in this way, what was a continuous phase error now appears as a discrete Z error, but with a continuous amplitude. Up to a global phase, the state is either left unchanged with amplitude $\cos(\theta)$ or incurs a phase-flip with amplitude $i\sin(\theta)$. The global phase $e^{-i\theta}$ is generally unimportant, since it has no effect on measurement outcomes.

More generally, an error can be modeled as a unitary transformation $U_E$ on the joint state of the quantum computer $|\psi\rangle$ and its surrounding environment $|E\rangle$.
Using the fact that the Pauli operators form a basis for single-qubit operators, $U_E$ can be decomposed as

$$U_E = \sum_{i,j} e_{ij}\, P_i \otimes E_j, \tag{3.8}$$

where each $P_i$ is a tensor product of Pauli operators and $E_j$ acts only on the environment. The result of an error $U_E$ on the joint state is then given by

$$U_E |\psi\rangle |E\rangle = \sum_i \left(P_i |\psi\rangle\right) \otimes \Big(\sum_j e_{ij} E_j |E\rangle\Big). \tag{3.9}$$

Again, as in (3.7), the error is written as a discrete sum over Pauli operators.

Equation (3.9) implies that the task of protecting quantum information can be reduced to the task of guarding against products of Pauli errors. Additionally, since $Y = iXZ$, each tensor product of Paulis can be expressed using only X and Z, up to an unimportant global phase. In other words, quantum errors can be expressed solely in terms of bit-flips and phase-flips on individual qubits.

The error expressed in (3.8) is not entirely general in that it does not directly account for leakage and loss errors. Leakage occurs when the state $|\psi\rangle$ goes outside of the expected $2^n$-dimensional state space. For example, a qubit may be represented physically by the first two energy levels of an ion. Thermal excitations could cause the ion to jump to a higher energy level, in which case the state would have to be represented by a qutrit

$$|\psi\rangle = a_0 |0\rangle + a_1 |1\rangle + a_2 |2\rangle, \tag{3.10}$$

where the state $|2\rangle$ represents leakage outside of the qubit space. Similarly, loss occurs when a qubit is removed or otherwise disappears from the computer. This could happen if an ion is spontaneously ejected from a trap.

Left unchecked, leakage and loss errors can have serious consequences for protection of quantum information [GFMG13]. However, they can usually be controlled with a small amount of effort [Pre98, Fow13a].
We will not consider leakage and loss errors in this thesis.

3.3 Quantum error-correcting codes

Equation (3.9) shows us that the state of a quantum register after being subjected to noise can be expressed as a superposition of the original state over a discrete set of bit-flip and phase-flip errors. Informally then, the goal of a quantum error-correcting code is to project the register onto one of the terms of that superposition, identify the error and reverse it.

More formally, let $\{|\psi_i\rangle\}$ be an orthonormal basis for the codewords of a quantum error-correcting code $C$, and let $\{E_a\}$ be a set of errors against which we would like to protect. The conditions under which the code $C$ can correct errors $\{E_a\}$ are given by the following theorem [BDSW96, KLV00].

Theorem 3.3.1 (Quantum error correction condition). A code $C$ with codewords $\{|\psi_i\rangle\}$ can correct the set of errors $\{E_a\}$ if and only if

$$\langle \psi_i | E_a^\dagger E_b | \psi_j \rangle = C_{ab}\, \delta_{ij}, \tag{3.11}$$

where $\delta_{ij}$ equals one if $i = j$ and equals zero otherwise, and $C_{ab} \in \mathbb{C}$ is independent of $i$ and $j$.

Theorem 3.3.1 can be understood by considering a code with just two codewords $\{|\bar{0}\rangle, |\bar{1}\rangle\}$, where the notation $|\bar{a}\rangle$ indicates the encoded logical state $|a\rangle$. Then (3.11) requires that $E_a|\bar{0}\rangle$ and $E_b|\bar{1}\rangle$ are orthogonal. If this were not the case, then an error $E_a$ on $|\bar{0}\rangle$ and $E_b$ on $|\bar{1}\rangle$ would yield overlapping states, and measurement of the error could confuse the two cases. In particular, if $E_a^\dagger E_b$ is a logical operator (say $\bar{X}$), then (3.11) is certainly violated. Likewise, consider an error $E$ with the property that $\langle \bar{0}| E^\dagger E |\bar{0}\rangle \neq \langle \bar{1}| E^\dagger E |\bar{1}\rangle$, again violating (3.11).
Then $E$ changes the relative amplitudes of $|\bar{0}\rangle$ and $|\bar{1}\rangle$ so that $E(|\bar{0}\rangle + |\bar{1}\rangle) \propto |\bar{0}\rangle + \delta|\bar{1}\rangle$ for some $\delta$. But $|\bar{0}\rangle + \delta|\bar{1}\rangle$ is itself a codeword, so the error $E$ cannot be distinguished from a valid logical operation.

The most widely studied class of quantum error-correcting codes is stabilizer codes, the quantum analog of classical linear codes [Got96a, CRSS97]. A stabilizer code is defined by a stabilizer group $\mathcal{M}$ for which each element is a tensor product of Pauli operators. The set of codewords is given by $\{|\psi\rangle : M|\psi\rangle = |\psi\rangle,\ \forall M \in \mathcal{M}\}$; each codeword is a +1-eigenvector of all of the elements in the stabilizer group. Since $\mathcal{M}$ is a group, the stabilizers can be specified by a set of generating elements called stabilizer generators. The stabilizer generators are directly analogous to the parity checks of a classical linear code.

Error correction can be performed by measuring each of the stabilizer generators in order to determine the error syndrome. The number of simultaneous single-qubit errors that the code can correct is given by $\lfloor (d - 1)/2 \rfloor$, where $d$ is the code distance. Define the normalizer of $\mathcal{M}$, $\mathcal{N} := \{P \in \mathcal{P}_n : PS = SP,\ \forall S \in \mathcal{M}\}$, as the set of $n$-qubit Pauli group elements that commute with all of the stabilizers. The distance of the code is then equal to the minimum weight of any element of $\mathcal{N} \setminus \mathcal{M}$. Here, the weight of an operator is defined as the number of X, Y and Z operators in its tensor product decomposition.

A stabilizer group on $n$ physical qubits with $m$ independent generators encodes $n - m$ logical qubits. The $2^m$ syndromes partition the $2^n$-dimensional state space, yielding a codespace of dimension $2^{n-m}$.
Each logical qubit $i$ is associated with a pair of logical operators $\bar{X}_i, \bar{Z}_i \in \mathcal{N} \setminus \mathcal{M}$ such that $\bar{X}_i$ and $\bar{Z}_i$ commute with all of the stabilizers, but anti-commute with each other. Logical operators on different logical qubits commute. The situation is in direct correspondence with single-qubit Pauli operators on physical qubits. A stabilizer code encoding $k$ logical qubits into $n$ physical qubits to a distance of $d$ is denoted as $[[n, k, d]]$.

Stabilizer algebra

Given a set of generators and logical operators for a stabilizer code, it is possible to write out each of the codewords $\{|\psi_i\rangle\}$ explicitly, and therefore to calculate how the encoded quantum state evolves under unitary operations and measurements. However, the stabilizer formalism offers an alternative which is usually more efficient and intuitive. Consider the effect of applying a unitary $U$ to a codeword $|\psi\rangle$. We would like to understand how $U$ impacts the stabilizers and the logical operators of the code. Since $M|\psi\rangle = |\psi\rangle$ for any stabilizer $M$, we have

$$U|\psi\rangle = U(M|\psi\rangle) = (UMU^\dagger)\, U|\psi\rangle. \tag{3.12}$$

Thus, a stabilizer $M$ of the original state is transformed by conjugation $UMU^\dagger$ to a stabilizer of the new state $U|\psi\rangle$. The logical operators are similarly transformed by conjugation.

In this way, stabilizers offer an analog of the Heisenberg interpretation of quantum mechanics [Got99]. Rather than tracking the evolution of the state $|\psi\rangle$, we may track the evolution of the stabilizers. For a code on $n$ qubits, there are $2^n$ possible terms in the expansion of $|\psi\rangle$, but at most $n$ stabilizer generators. Thus expressing an encoded state in terms of its code stabilizers can be exponentially more efficient than the corresponding expression as a quantum state.

The effect of measurements on the stabilizers is slightly more complicated, but can still be calculated efficiently. Consider a Z-basis measurement on the first qubit of an $n$-qubit codeword. After the measurement, the state is stabilized by the operator $Z \otimes I^{\otimes n-1}$, up to a phase of $\pm 1$. The definition of the stabilizer group implies that all stabilizers must commute. Thus, the stabilizers of the state after the measurement must all commute with $Z \otimes I^{\otimes n-1}$. Any operator that was a stabilizer before the measurement, but anti-commutes with Z on the first qubit, cannot be a stabilizer after the measurement. Note, however, that it is always possible to express the set of stabilizer generators so that at most one generator anti-commutes with the measurement. If both $M_1$ and $M_2$ anti-commute with the measurement, then $M_2$ can be replaced by $M_1 M_2$, which does commute. Thus the single anti-commuting generator is replaced by $Z \otimes I^{\otimes n-1}$ and all of the other generators remain unchanged.

To make this more concrete, we illustrate with an example using the $[[7, 1, 3]]$ code, whose stabilizer generators and logical operators are

$$
\begin{array}{cc}
IIIXXXX & IIIZZZZ \\
IXXIIXX & IZZIIZZ \\
XIXIXIX & ZIZIZIZ
\end{array}
\qquad X_L = XXXXXXX, \quad Z_L = ZZZZZZZ, \tag{3.13}
$$

where $X_L$ and $Z_L$ are the X and Z logical operators, respectively, and for visual clarity the tensor product notation has been omitted. Now consider the effect of applying the Hadamard operator to each qubit. Hadamard swaps X and Z under conjugation; $HXH = Z$ and $HZH = X$. So the result of applying $H^{\otimes 7}$ is

$$
\begin{array}{cc}
IIIZZZZ & IIIXXXX \\
IZZIIZZ & IXXIIXX \\
ZIZIZIZ & XIXIXIX
\end{array}
\qquad X_L = ZZZZZZZ, \quad Z_L = XXXXXXX. \tag{3.14}
$$

The stabilizers have been preserved, and the $X_L$ and $Z_L$ logical operators have been swapped. The operator $H^{\otimes 7}$ therefore acts as a logical Hadamard on the code.

Now consider a Z-basis measurement on the first qubit. All but the operators $XIXIXIX$ and $Z_L = XXXXXXX$ commute with the measurement. However, the $Z_L$ operator may be multiplied by $XIXIXIX$ so that it commutes with $Z \otimes I^{\otimes 6}$. (Remember that multiplication by a stabilizer is equivalent to multiplying by the identity.) The resulting stabilizers after measurement are

$$
\begin{array}{cc}
IIIZZZZ & IIIXXXX \\
IZZIIZZ & IXXIIXX \\
ZIZIZIZ & \pm\mathbf{ZIIIIII}
\end{array}
\qquad X_L = ZZZZZZZ, \quad Z_L = IXIXIXI, \tag{3.15}
$$

where the new stabilizer is highlighted in bold and the sign $\pm$ is determined by the measurement outcome.

Stabilizer states

Normally we are interested in codes that contain at least one logical qubit. For a stabilizer code on $n$ qubits, this means that the number of stabilizer generators should be $(n - k)$ for some $k > 0$. Then the set of codewords lives in a $2^k$-dimensional subspace representing $k$ logical qubits. If $k = 0$, however, then the set of codewords has dimension one: a single quantum state.

An $n$-qubit state that is defined by a set of $n$ independent stabilizer generators is called a stabilizer state. In the seven-qubit code, for example, adding the Z logical operator $Z_L$ to the set of stabilizers yields a stabilizer state. By definition, this state is a +1-eigenstate of $Z_L$, and so it is the encoded state $|\bar{0}\rangle$, just as (physical) $|0\rangle$ is the +1-eigenstate of Z.

Not all quantum states are stabilizer states. Consider the effect of applying T to the first qubit of the encoded $|\bar{0}\rangle$ state defined above. The conjugation relations for T are

$$TZT^\dagger = Z, \qquad TXT^\dagger = (X + Y)/\sqrt{2}. \tag{3.16}$$

Therefore some of the resulting stabilizers are no longer tensor products of Paulis, but rather linear combinations of tensor products of Paulis. The encoded state $(T \otimes I^{\otimes 6})|\bar{0}\rangle$ is not a stabilizer state.

On the other hand, an inductive argument shows that the output of any circuit composed of Clifford gates, $|0\rangle$ preparation and Z-basis measurement is a stabilizer state. Conversely, the definition of the Clifford group implies that any stabilizer state can be expressed by such a circuit [AG04]. Stabilizer states and their corresponding circuits are a major component of fault-tolerant error correction, and are discussed in more detail in Chapter 6.

A particularly useful subset of stabilizer codes can be constructed from classical linear codes. The construction requires two linear codes $C_1 = [n, k_1, d_1]$ and $C_2 = [n, k_2, d_2]$ that are orthogonal, i.e., $C_2^\perp \subseteq C_1$. The parity checks of $C_2$ can be translated into tensor products of Pauli X operators, and the parity checks of $C_1$ can be translated into tensor products of Pauli Z operators. Together these operators form the stabilizer generators of the quantum error-correcting code.
The tensor products of X are called X stabilizers and the tensor products of Z are called Z stabilizers.

Codes based on this construction are known as CSS codes after Calderbank, Shor and Steane, and include the most commonly known codes such as Steane's $[[7, 1, 3]]$ code […]. CSS codes have two useful properties. First, the codewords can be written as

$$|\bar{x}\rangle = \frac{1}{\sqrt{|C_2^\perp|}} \sum_{w \in C_2^\perp} |x + w\rangle, \tag{3.17}$$

where $x$ is the coset representative of an element of $C_1 / C_2^\perp$. Equation (3.17) shows that each codeword $|\bar{x}\rangle$ can be interpreted as a superposition over each of the X stabilizers. The code $C_2^\perp$ partitions $C_1$ into $|C_1| / |C_2^\perp|$ cosets, and so there are $2^{k_1 - (n - k_2)}$ codewords. Second, CSS codes permit independent correction of X errors and Z errors. The X stabilizers defined by $C_2$ are used to correct Z errors, and the Z stabilizers defined by $C_1$ are used to separately correct X errors. Independent X and Z correction is exploited in Chapter 6 and Chapter 7. As a consequence of these two properties, the CSS construction yields an $[[n, k_1 + k_2 - n, \min\{d_1, d_2\}]]$ quantum code.

Stabilizer codes can be combined to form other larger stabilizer codes. Given two stabilizer codes $C_1 = [[n_1, k_1, d_1]]$ and $C_2 = [[n_2, 1, d_2]]$, an $[[n_1 n_2, k_1, d_1 d_2]]$ code is obtained by encoding each physical qubit of $C_1$ in the code $C_2$ [KL96]. This construction is known as code concatenation, and is a key element of many threshold theorems, including the one in Chapter 7. In particular, concatenation can be performed repeatedly in order to obtain an arbitrarily large code distance.

Concatenation can also be accomplished when $C_2$ encodes multiple logical qubits, in which case the resulting code is $[[n_1 n_2, k_1 k_2, d_1 d_2]]$ (see, e.g., [Got97]). Other methods for combining codes include pasting to increase $k$ [Got96b], and welding [Mic12]. We focus only on concatenation in this thesis, however.

3.3.4 Topological codes

Another notable subset of stabilizer codes are so-called topological codes.
These codes have the property that the stabilizer generators can be defined locally when qubits are laid out as a lattice on some manifold. Prominent examples include the toric code [Kit97], and the surface code [BK98].

Each topological code is, in fact, a family of codes. Notably, both the number of encoded qubits and the distance can be increased arbitrarily while maintaining locality of the stabilizer generators. This permits fault-tolerance schemes which require only local interactions among qubits, a feature which is useful on a large number of proposed physical quantum computing architectures. By contrast, concatenated codes require interactions between qubits which may be far apart.
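As an illustration of locality, the following Python sketch builds the toric code [Kit97] in its standard textbook formulation, which we assume here since the construction is not spelled out in this chapter: qubits sit on the edges of an $L \times L$ periodic lattice, each vertex carries an X-type "star" generator and each face a Z-type "plaquette" generator. The sketch checks that every generator has weight four, that all X-type and Z-type generators commute (an even overlap of supports), and counts logical qubits as $n$ minus the number of independent generators, computed by a GF(2) rank. Names such as `toric_code` and `gf2_rank` are hypothetical.

```python
import numpy as np


def gf2_rank(rows):
    """Rank over GF(2) of a list of 0/1 vectors, by Gaussian elimination."""
    m = np.array(rows, dtype=np.uint8) % 2
    n_rows, n_cols = m.shape
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]       # move the pivot row into place
        for r in range(n_rows):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]                   # clear this column elsewhere
        rank += 1
        if rank == n_rows:
            break
    return rank


def toric_code(L):
    """Supports of star (X-type) and plaquette (Z-type) generators on an L x L torus."""
    n = 2 * L * L

    def h(i, j):  # horizontal edge from vertex (i, j) to (i, j + 1)
        return (i % L) * L + (j % L)

    def v(i, j):  # vertical edge from vertex (i, j) to (i + 1, j)
        return L * L + (i % L) * L + (j % L)

    stars, plaquettes = [], []
    for i in range(L):
        for j in range(L):
            star = np.zeros(n, dtype=np.uint8)
            star[[h(i, j), h(i, j - 1), v(i, j), v(i - 1, j)]] = 1
            plaq = np.zeros(n, dtype=np.uint8)
            plaq[[h(i, j), h(i + 1, j), v(i, j), v(i, j + 1)]] = 1
            stars.append(star)
            plaquettes.append(plaq)
    return n, stars, plaquettes


def all_commute(x_checks, z_checks):
    """An X-type and a Z-type Pauli commute iff their supports overlap evenly."""
    return all(int(s.astype(int) @ p.astype(int)) % 2 == 0
               for s in x_checks for p in z_checks)


for L in (2, 3, 4):
    n, stars, plaquettes = toric_code(L)
    k = n - gf2_rank(stars) - gf2_rank(plaquettes)
    weights = {int(g.sum()) for g in stars + plaquettes}
    print(L, n, k, weights, all_commute(stars, plaquettes))  # k = 2, weights {4}, True
```

In this particular family the generators stay weight four and the number of encoded qubits stays at two as $L$ grows; the lattice size is what controls the distance.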

There are also quantum error-correcting codes that do not conform to the stabilizer construction. A variety of codes can be constructed by relaxing the stabilizer formalism in some way. Subsystem codes, for example, encode qubits as linear subsystems rather than two-dimensional subspaces [Bac06]. Another relaxation of the stabilizer formalism can be used to construct approximate quantum error-correcting codes [LNCY97]. Codes can be used to protect qudits ($d$-dimensional quantum systems) rather than qubits [Kni96]. Yet more codes are possible if the code block is entangled with an outside resource [Bow02].

Stabilizer codes are generalized by so-called codeword stabilized codes [LYGG08, CSSZ09]. A codeword stabilized code is characterized by a stabilizer state and a set of "word operators" that act as logical X operators. The structure of these codes is more complicated than for stabilizer codes. The word operators need not commute with each other, for example. Codeword stabilized codes have not been widely studied in the context of fault-tolerant quantum computing.

The protection offered by quantum error-correcting codes was demonstrated experimentally as early as 1998, when the three-qubit phase-flip code was implemented in liquid state NMR [CPM+…]. […]99, KLMN01, BPF+02, BVFC05, ZGML11, GZL12, ZLS12], trapped ions [CLS+04, SBM+…] […] $p$ to roughly $p^2$. A similar, but much more recent study shows even sharper improvements [ZGML11].

On the other hand, the limited scale of the experiments illustrates the need to improve threshold and resource overhead requirements. Most experimental setups are large enough to encode only a single logical qubit, whereas quantum algorithms require hundreds or thousands of qubits. Experimental capabilities will continue to improve, but so must the resource costs of error correction.

Quantum error-correcting codes are not the only means by which to protect quantum information. For completeness, we briefly outline some alternative techniques.

Originally formalized for quantum information by [PSE96] and [DG97], and later named by [LCW98], a decoherence-free subspace (DFS) encodes data into states for which the effect of environmental noise is trivial. As a toy example, consider a noise model in which only Z errors occur, and when they do they occur simultaneously on all qubits in the system. That is, for a system of $n$ qubits, the only possible error is $Z^{\otimes n}$. Even this very simple noise model can cripple a quantum computer. But this error acts trivially on certain states, for example,

$$(Z \otimes Z)\left(a|00\rangle + b|11\rangle\right) = a|00\rangle + b|11\rangle. \tag{3.18}$$

Thus, by encoding in the subspace $\{|00\rangle, |11\rangle\}$ ($|00\rangle$ for logical $|0\rangle$, and $|11\rangle$ for logical $|1\rangle$) the logical qubit is completely immune to errors. In this way, a DFS is equivalent to an error-correcting code for a very simple and specific noise model.

Decoherence-free subspaces enjoy several advantages over error-correcting codes. First, they usually require only a very small number (two in the above example) of physical qubits per logical qubit. Second, since errors act trivially, a DFS requires no active intervention in order to correct errors. Furthermore, the strength of the noise can be very high, in contrast to error-correcting codes which can tolerate only low levels of noise (see Chapter 4). On the other hand, given a particular noise model, finding the symmetries required to construct a DFS, provided that they even exist, is difficult. Indeed, DFS is known to be insufficient for some reasonable noise models [LBKW01].

Dynamical decoupling (DD) is another technique for suppressing errors for simple and well-characterized noise models [VKL99, Ban98]. If noise causes the system to evolve in an uncontrolled but predictable way, then quick control pulses can be used to periodically "reverse" the noise and cause it to cancel out. Again a toy example is helpful.
Say that noise acts continuously on a qubit, and that for a fixed duration of time t the effect is given by

$$E(t) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta t} \end{pmatrix}. \qquad (3.19)$$

By periodically applying Pauli X, the effect of the noise can be canceled, since $XE(t)X = e^{i\theta t}E^\dagger(t)$ and therefore

$$E(t)\,X\,E(t)\,X = e^{i\theta t}\,E(t)E^\dagger(t) = e^{i\theta t}\,I, \qquad (3.20)$$

which is the identity up to an unobservable global phase. In more practical examples the noise and the required control pulses are more complicated, but the idea is the same.

Dynamical decoupling has the advantage of requiring no additional qubits. Its disadvantage is that it requires fast and accurate control. Moreover, complicated pulse sequences can make data manipulation more difficult and increase gate times.

DFS and DD are usually considered complementary to quantum error-correcting codes. A variety of authors have considered methods for using DFS, DD and quantum error-correcting codes in different combinations [LBW99, LBKW01, NLP11, PSL13]. DFS and DD can act as a "first line of defense" against errors, after which error correction is applied to achieve arbitrary accuracy. In this thesis we focus only on fault-tolerance protocols based on codes. It is likely, however, that the best complete strategies for suppressing errors will involve elements from all three techniques.
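The two toy examples above can be checked numerically. The following is a minimal sketch in plain Python (no external libraries), assuming the two-qubit DFS encoding $a|00\rangle + b|11\rangle$ of (3.18) and the noise operator $E(t) = \mathrm{diag}(1, e^{i\theta t})$ of (3.19). The helper names `dfs_encode`, `zz_on`, and `dd_residual` are hypothetical and chosen only for illustration.

```python
import cmath

# Pauli X as a 2x2 matrix stored as nested lists.
X = [[0, 1], [1, 0]]

def matmul(A, B):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dfs_encode(a, b):
    """Toy DFS encoding a|0> + b|1>  ->  a|00> + b|11>.

    Amplitudes are listed in the basis order |00>, |01>, |10>, |11>.
    """
    return [a, 0, 0, b]

def zz_on(state):
    """Apply Z (x) Z, which multiplies |xy> by (-1)**(x + y)."""
    return [(-1) ** bin(i).count("1") * amp for i, amp in enumerate(state)]

def dd_residual(theta, t):
    """Return E(t) X E(t) X for the toy noise E(t) = diag(1, exp(i*theta*t))."""
    E = [[1, 0], [0, cmath.exp(1j * theta * t)]]
    return matmul(matmul(matmul(E, X), E), X)

psi = dfs_encode(0.6, 0.8)
print(zz_on(psi) == psi)   # the collective Z error leaves the encoded state unchanged

U = dd_residual(theta=0.37, t=2.0)
print(U)                   # exp(0.74j) times the identity: only a global phase remains
```

The second check makes the content of (3.20) concrete: the refocused evolution is proportional to the identity, so the leftover factor $e^{i\theta t}$ has no observable effect.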

A third, and dramatically different, alternative to quantum error-correcting codes is topological quantum computation. In topological quantum computation, data is stored in exotic particles called anyons [Kit03]. Consider a pair of particles that are placed side by side and then exchanged; the particle on the left moves to the right, and the particle on the right moves to the left. For typical physical particles, such as photons or electrons, the effect of this exchange is essentially trivial. For anyons, however, this exchange induces a non-trivial phase akin to a diagonal unitary gate. Sequences of exchanges, called "braids", can be composed in order to perform quantum computation [FLW02a, FLW02b].

The novel feature of topological quantum computation is that, in principle, it is inherently robust against errors. The computational states are degenerate ground states, which means that errors are suppressed naturally by the system. So long as the anyons are kept far enough apart, no active error suppression is required. Though promising, the existence of anyons with the right properties, and the capability to produce them, is still largely speculative [DFN05, NSS+08, LK12, SL13].

Chapter 4
Fault tolerance: making quantum computing error-free

The most straightforward use of quantum error-correcting codes is in transmitting quantum information over noisy channels. In this case, the sender encodes his quantum state and sends it over the noisy channel to the receiver, who then decodes. Of course, in a realistic setting, errors can occur before the encoding process and after decoding, when the quantum information is unprotected.

In order to achieve reliable quantum computation, the data must be protected at all times. In particular, unitary gates should be performed while the data is still encoded. The typical procedure involves alternating rounds of encoded gates and error correction. The encoded gate manipulates the data in the error-correcting code, and error correction attempts to eliminate errors introduced by the encoded gate. See Figure 4.1.

Figure 4.1: Typical fault-tolerant circuits are constructed by alternating rounds of error correction with encoded gates. (Diagram: two encoded blocks, each passing through EC, an encoded CNOT, EC, an encoded H on the top block or T on the bottom block, and a final EC.)

The use of encoded gates alone is not enough. Both the encoded gates and the error correction circuits should be fault tolerant. Roughly, a quantum circuit is fault tolerant if the errors that occur during each step are small in number and can be kept well controlled. Errors in a fault-tolerant circuit have very little chance of spreading or combining in order to cause data corruption. In this chapter we make this concept precise and examine techniques for constructing fault-tolerant quantum circuits.

Before delving into the details of fault-tolerant quantum computation, it is instructive to outline the path from its early beginnings to the current state of the art. This history will show the successes and difficulties of the theory of fault-tolerant quantum computing, and provide motivation and context for the new results in subsequent chapters.

The first proposal for fault-tolerant quantum computation was made by Shor in 1996 [Sho96]. Shor showed that his construction tolerates a noise rate that shrinks only logarithmically with the size of the computation (measured by the number of gates). Roughly, Shor's error correction circuit contains a logarithmic number of gates, so an error rate proportional to the inverse of that size is sufficient. Soon after, the first "threshold theorems" were proven independently by Aharonov and Ben-Or [AB97], Kitaev [Kit97], and Knill, Laflamme and Zurek [KLZ96], each of which permitted a constant noise rate per gate regardless of computation size. Importantly, the amount of extra time and space resources required scales only polynomially in the logarithm of the computation size.

Theorem 4.1.1 (Constant noise threshold for quantum computation). Consider a quantum circuit C of size N, a quantum computer with gates that fail independently with probability at most p, and a target failure probability $\epsilon > 0$. There exists a different quantum circuit C′ of size at most

$$O\!\left(N \cdot \mathrm{poly}\!\left(\log \frac{N}{\epsilon}\right)\right) \qquad (4.1)$$

that can be implemented on the quantum computer and simulates C with probability of error at most ε, provided that p is below a constant threshold value $p_{th}$.

The intuition is that an [[n, k, d]] code yields encoded gates with a logical error rate of at most $c\,p^{t+1}$, where $t = \lfloor (d-1)/2 \rfloor$, $c = \binom{A}{t+1}$, and A is the number of physical gates contained in a single encoded gate plus error correction. Concatenating the code with itself j times requires $n^{j+1}$ qubits per block, but an inductive argument yields a logical error rate per gate of

$$p_j \le c^{-1/t}\left(c^{1/t}\,p\right)^{(t+1)^{j+1}}. \qquad (4.2)$$

That is, the size of the code scales exponentially, but so does the minimum distance. The right-hand side of (4.2) converges to zero so long as the physical error rate obeys

$$p < c^{-1/t} = p_{th}. \qquad (4.3)$$

Taking the logarithm of both sides of (4.2) twice, we see that achieving a target error rate per gate of $p_j \le \epsilon/N$ only requires concatenation to level $j = O(\log\log(N/\epsilon))$. The total code size is then a polynomial in $\log(N/\epsilon)$.

Interestingly, the early threshold theorems hold only for quantum error-correcting codes of distance at least five. Thresholds for distance-three codes were not known until 2006, when they were discovered independently by Reichardt [Rei06b] and by Aliferis, Preskill and Gottesman [AGP06]. A novel fault-tolerance scheme using distance-two error-detecting codes was proposed by Knill in 2005, though without an explicit proof of a threshold [Kni05]. A rigorous proof of a threshold for distance-two schemes was given by Reichardt [Rei07] (see also [Rei06a]), and later by Aliferis, Preskill and Gottesman [AGP08].

The existence of a noise threshold permits arbitrary quantum computation for a constant amount of engineering cost per gate, at least in principle. In practice, the value of the threshold matters since, while error rates near one percent are currently achievable in some small-scale experiments, e.g., [LJL+10, MSB+11, CGC+12, GGZ13], rates much lower than, say, $10^{-\dots}$ on a large scale are perhaps impossible even in the long term.

The earliest estimate based on a rigorous threshold proof was calculated by Aharonov and Ben-Or to be an error rate per gate of about $10^{-\dots}$. Later calculations based on [Rei06b] and [AGP06] were similarly low, at $6.\dots \times 10^{-\dots}$ and $2.\dots \times 10^{-\dots}$, respectively. Since then, rigorous threshold bounds have steadily improved. As of 2011, the highest lower bound was $1.\dots \times 10^{-\dots}$, by Aliferis and Preskill [AP09]. In Chapter 7, we adapt the technique of [AGP06] to prove a threshold of $1.\dots \times 10^{-\dots}$.

Another popular technique is to estimate the threshold using Monte Carlo simulation. Threshold estimates, though not rigorous, paint a much more optimistic picture than lower bounds. An initial estimate by Zalka placed the threshold at about $10^{-\dots}$ [Zal96]. In 2004 Knill estimated a threshold for his distance-two scheme as high as three percent. Simulations for the surface code indicate a threshold of about one percent [WFH11]. Figure 4.2 shows thresholds from a large number of studies and for a variety of error-correcting codes, noise models, and geometric constraints.

Figure 4.2: Threshold calculations since 2003 arranged in chronological order by year (threshold error rate on a logarithmic axis versus year, 2003 to 2013). Blue diamonds indicate estimates based on Monte Carlo sampling [Ste03, Rei04, Kni05, MTC+05, SFR+06, DHN06, SDT07, AC07, RH07, RHG07, SE09, WFHH10, WFSH10, FSG09, FY10, WFH11, SMN13]. Orange triangles indicate rigorous lower bounds, with varying assumptions [AGP06, Rei06b, SDT07, AGP08, AC07, SR09, SFH08, AP09, PR12, Fow12b, LPSB13].

Modern fault-tolerance schemes provide reasonable confidence that noise thresholds can be met using near-term technologies, at least for small numbers of qubits. At the same time, the resource requirements for these schemes can be overwhelming. Knill, for example, estimates that his distance-two scheme would require a resource overhead ranging from a thousandfold to a billionfold or more, depending on computation size and gate error rate. Estimates for a cluster-state-based scheme due to Raussendorf, Harrington and Goyal are similarly large [RHG07].

Accordingly, the focus in quantum fault tolerance has shifted from threshold calculations to resource reduction and optimization. In all schemes, particularly those based on concatenated codes, the dominant source of overhead is error correction. Most encoded gates on an n-qubit code can be implemented using roughly n gates. Typical error correction procedures, meanwhile, require additional ancillary qubits and can require ten to one hundred times as many gates [Sho96, Ste96, Kni05].

Steane has proposed an error correction method based on ancillary encoded stabilizer states [Ste96] (see Section 4.4.1), and in 2002 showed a method for preparing such states fault-tolerantly [Ste02]. Steane's method uses a hierarchy of many encoded stabilizer states that can be used to verify the reliability of a single encoded state. Reichardt suggested a procedure for improving on Steane's method [Rei06a], and in Chapter 6 we examine additional improvements in detail.

Aliferis and Cross have demonstrated a fundamentally different approach to fault-tolerant error correction for the family of Bacon-Shor subsystem codes [AC07].
Their method eliminates the need for encoded ancillas and, instead, requires only nearest-neighbor two-qubit measurements, which can be accomplished with just a single "bare" ancilla qubit. Similar bare-ancilla techniques are used for topological codes [LAR11, FMMC12].

For many quantum error-correcting codes, Clifford operations can be implemented very efficiently. In 2004, Bravyi and Kitaev showed that universal fault-tolerant quantum computation is possible with only Clifford gates and special "magic" resource states [BK05]. Specifically, a fault-tolerant T gate can be obtained by progressively refining noisy magic states into fewer, but less noisy, copies in a process known as state distillation. See Section 4.3.2.

Unfortunately, state distillation is usually very costly. The cost of distilling a T gate to fidelity $(1 - \epsilon)$ scales as $O(\log^{\dots}(1/\epsilon))$, but again the numbers are large in absolute terms; usually thousands of magic states are required. Recently, though, a flurry of results has yielded significant improvements. In 2012, Meier, Eastin and Knill [MEK13], Bravyi and Haah [BH12], and Jones [Jon12] each proposed new methods for T-gate distillation. The protocol of Jones comes arbitrarily close to $O(\log(1/\epsilon))$ in the number of magic states, and this is conjectured to be optimal. However, the total costs of the new protocols are more challenging to calculate, and so their practical benefits are less clear [FDJ13, Jon13c].

Fowler and others have incorporated and optimized various distillation methods for use in the surface code [FD12, FDJ13], including a method for parallelization [Fow12c]. Jones and Eastin have independently observed that distillation of so-called Toffoli states can yield improvements compared to Toffoli gate constructions that use fault-tolerant Clifford and T gates [Jon13d, Eas13, Jon13a, Jon13c].
In total, such optimizations can yield orders-of-magnitude improvements in the fault-tolerance resource overhead compared to naive methods [Jon13d, Jon13c].

4.1.3 Unitary decomposition

Fault-tolerance schemes provide universality through a small discrete set of encoded gates. However, quantum algorithms are usually specified in terms of arbitrary unitaries. Until recently, the standard method for decomposition into fault-tolerant gates has been the Solovay-Kitaev algorithm [DN05]. Once again, the decomposition cost of $O(\log^{\dots}(1/\epsilon))$ is asymptotically efficient, but often requires tens of thousands of fault-tolerant gates in absolute terms.

In principle, the decomposition cost is lower bounded by a more modest scaling of $O(\log(1/\epsilon))$ [Kit97, KSV02]. Fowler suggested an optimal approximation of single-qubit unitaries by an optimized but exponential-time direct search [Fow11]. In 2012 Kliuchnikov, Maslov and Mosca (KMM) characterized the set of single-qubit unitaries that can be exactly decomposed with {Clifford, T} and gave an optimal and efficient algorithm for exact decomposition [KMM12b], and later an asymptotically optimal algorithm for approximate decomposition [KMM12a]. Further improvements by Selinger [Sel12] and KMM [KMM12c] soon followed.

Several other methods for single-qubit unitary decomposition have been proposed. One method involves preparing so-called Fourier states and using phase kickback [KSV02]. Using recent optimizations due to Jones [Jon13b], this method is shown to be competitive with [Sel13] and [KMM12c] when using the surface code. Bocharov and Svore have shown that decomposition into an alternative gate set $\{\text{Clifford}, V = (I + 2iZ)/\sqrt{5}\}$ can be up to six times better than [KMM12c], but requires an implementation of V that is more efficient than those currently known [BGS13]. In Chapter 8 we discuss a V implementation that requires $5.\dots$ T gates (in expectation), thus making [BS12] competitive with all of the methods above.
We also present a class of non-deterministic quantum circuits that can be used to approximate single-qubit unitaries for less than half the cost of existing methods.

Noise thresholds for quantum computation manifest in a variety of forms depending on physical noise and gate models, physical connectivity constraints, choice of error-correcting code, method of error correction, and the rigor with which the result is obtained. In all cases, though, the goal is the same: determine the conditions under which reliable large-scale implementation of a quantum algorithm is possible. We now discuss these various conditions and outline techniques for calculating threshold values.

4.2.1 Noise models

In order for fault-tolerant techniques to be effective, the strength of the noise must be below a certain threshold value. The way that strength is defined, and the methods for calculating the threshold, depend on the way in which the noise is modeled. Many different models can be considered, and a broad categorization includes:
• Stochastic: physical gates fail according to a probability distribution,
• Markovian: physical gates fail independently,
• Non-Markovian: gate failures may be correlated,
• Local: gate behavior is correlated with a constant number of other gates.
Additional classifications are also possible. For example, one can consider noise that acts unitarily on the computer alone and does not involve the environment.

Pauli and Clifford channels

The simplest way to model noise is as a Pauli channel. In this setting, each gate is specified by the ideal version of the unitary followed by either the identity or some Pauli-group error, according to a probability distribution. The Pauli channel is an example of a stochastic and Markovian noise model in that errors occur independently at each gate according to a fixed probability distribution. Specific cases include physically motivated noise such as the depolarizing channel and the dephasing channel [NC00]. In the depolarizing channel, for example, a single-qubit gate may be followed by one of {X, Y, Z}, each with probability p/3, where the parameter $0 \le p \le 1$ …
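As a minimal sketch of how such a stochastic, Markovian channel is typically sampled in simulation, the function below draws independent single-qubit depolarizing errors: each qubit is left alone with probability 1 − p and otherwise receives X, Y or Z with probability p/3 each. The name `depolarize` and the list-of-labels representation are illustrative assumptions; noise on two-qubit gates and the rest of the channel definition are not modeled here.

```python
import random

PAULIS = ("X", "Y", "Z")

def depolarize(num_qubits, p, rng=random):
    """Sample one depolarizing error pattern on num_qubits qubits.

    Each qubit independently gets "I" (no error) with probability 1 - p,
    and otherwise one of X, Y, Z chosen uniformly, i.e. p/3 each.
    `rng` may be the random module itself or a seeded random.Random.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return ["I" if rng.random() >= p else rng.choice(PAULIS)
            for _ in range(num_qubits)]

# Example: empirical per-qubit frequencies over many samples.
rng = random.Random(2013)
samples = depolarize(100_000, 0.03, rng)
for label in ("I",) + PAULIS:
    print(label, samples.count(label) / len(samples))
```

Passing a seeded `random.Random` keeps runs reproducible, which is convenient when comparing error-correction procedures on identical error patterns.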

Figure 4.3: A rectangle, indicated here by the dotted line, includes a gate gadget (Ga) followed by a trailing error correction (TEC). An extended rectangle (exRec) also includes the leading error correction (LEC).

More general noise models

Threshold calculations can be made for more general kinds of noise models as well. Aliferis, Gottesman and Preskill (AGP) [AGP06] assume a local non-Markovian error model that is similar to a Pauli channel except that, when an error occurs, an adversary is allowed to choose the Pauli error. In this model, gates fail stochastically, but the adversary is allowed to coordinate the errors (in both time and space) among faulty gates in the circuit.

AGP also prove a threshold for a stronger non-stochastic model in which the behavior of a gate can depend on conditions of both the quantum computer and the environment at other points in space and time. That is, gate failures are no longer independent but can be correlated by a kind of quantum memory. Others have also considered non-stochastic models with varying restrictions on the type and strength of correlations [TB05, AKP06, NP09].

Preskill has considered a still more general noise model [Pre13]. In his model, the coupling between the environment and the computer is allowed to be completely arbitrary, assuming only that single qubits can be prepared with reasonable fidelity. Preskill shows that a positive threshold exists so long as the strength of k-qubit interactions decays rapidly (i.e., exponentially) with k.

Many threshold theorems consider fault-tolerant, noisy simulations constructed by compiling an ideal quantum circuit into a sequence of rectangles, each of which contains an encoded operation "gadget" (Ga) and a trailing error correction gadget (TEC). See Figure 4.3. The methods and notation here and in the remainder of the chapter follow [AGP06]. A gadget may contain many physical locations, i.e., unitary gates, qubit preparations, and measurements, each of which may be faulty (according to the prescribed noise model).
A gadget in which there are n faulty locations is said to contain n faults.

For simplicity, we will assume that data is encoded into a quantum error-correcting code that encodes just a single qubit. That is, each logical qubit belongs to its own code block. We will also assume that the same error-correcting code is used throughout. A decoder is a gadget that maps an encoded logical state, possibly containing errors, to the corresponding single-qubit state. We can use the decoder gadget in order to reason about the relationship between a rectangle and the intended logical gate.

Figure 4.4: A rectangle is correct if the rectangle followed by an ideal decoder is equivalent to an ideal decoder followed by the ideal gate. (Diagram: Ga, TEC, ideal decoder ≡ ideal decoder, ideal gate.)

Definition 4.2.1 (Rectangle correctness). A rectangle is correct if the output of the rectangle followed by an ideal decoder (a decoder containing no faults) is equivalent to the output of an ideal decoder followed by an ideal implementation of the corresponding gate. See Figure 4.4. If a rectangle is not correct, then it is incorrect.

In other words, a correct rectangle effectively acts as an encoded version of the intended gate. If all rectangles are correct, then a simple inductive argument shows that the compiled, noisy circuit successfully simulates the original ideal circuit. By "simulates" we mean that the probability distribution obtained by measuring the outputs of the ideal circuit is equivalent to the probability distribution obtained by measuring the outputs of the noisy fault-tolerant circuit. We should emphasize here that the decoder gadget, ideal or otherwise, is conceptual only. It is not actually used in the fault-tolerant simulation.

For a fixed stochastic noise model and a fixed quantum error-correcting code, the probability that a rectangle is correct is a constant, and therefore the probability that all rectangles are correct will generally be exponentially small in the number of gates in the circuit being simulated. To achieve a constant success probability, code concatenation (see Section 3.3.3) is often used. In a concatenated fault-tolerant simulation, each gate is first compiled into a rectangle, called a level-one rectangle (1-Rec), as described above. Then, a level-two rectangle (2-Rec) is constructed by compiling each physical gate of the 1-Rec into a rectangle. This process is repeated as many times as desired, resulting in a circuit composed of a hierarchy of rectangles.

Strict fault tolerance

Definition 4.2.1 says nothing about the conditions under which we can expect the rectangle to be correct. Of course, we should expect that a rectangle is correct when it contains zero faults. It will be helpful to impose some additional constraints on each gadget, however. A gadget that satisfies these constraints will be called strictly fault tolerant.

Informally, strict fault tolerance requires that a gadget must 1) faithfully perform its encoded function (either correction of errors or data manipulation) and 2) control the propagation of errors. The roles of gate and error correction gadgets are distinct, and we define strict fault tolerance separately for each.

In the definitions below, let $t = \lfloor (d-1)/2 \rfloor$, where d is the minimum distance of the error-correcting code in use.

Definition 4.2.2 (Strict fault tolerance: Ga). Consider a Ga that contains r faults and for which the input contains an error of weight s such that r + s ≤ t. Then the Ga is strictly fault tolerant if and only if:
1. the effect of perfectly decoding the output of the Ga is the same as first perfectly decoding the input to the Ga and then performing the corresponding ideal gate, and
2. the weight of the error at the output of the Ga is at most r + s.

Definition 4.2.3 (Strict fault tolerance: EC). Similarly, consider an EC that contains r faults and has an input with a weight-s error. The EC is strictly fault tolerant if and only if:
1. for r + s ≤ t, the state obtained by decoding the output of the EC is the same as the state obtained by removing the EC and (ideally) decoding the input, and
2. the output of the EC contains an error of weight at most r for all r ≤ t, regardless of s.
In the above definitions, the input $|\psi\rangle$ to the gadget is some quantum state on n qubits. The input is said to contain an error of weight k if $|\psi\rangle$ is equal to a codeword multiplied by some Pauli error of weight k, modulo the stabilizers and the logical operators.

The extended rectangle

Rectangles do not overlap, but the output of a rectangle is the input of another, and so rectangles do not act independently when errors occur. An error on the output of one rectangle could combine with an error that occurs in the subsequent rectangle to cause a logical error. In order to circumvent this problem, the preceding (or leading) error correction gadget (LEC) of a rectangle can be included to form an extended rectangle (exRec). ExRecs do overlap, but under certain reasonable assumptions, the behavior of an exRec is independent of the errors on its inputs.

In particular, if the correction applied by the LEC is deterministic for all possible input errors, then it can be shown that the syndrome on the output of the LEC is independent of the input [CDT09]. The correctness of the enclosed rectangle, therefore, can be determined by analyzing the exRec in isolation. This observation is a key element of the malignant set counting technique discussed in Section 4.2.4.

If all rectangles at all levels of concatenation are correct, then the fault-tolerant simulation reproduces the results of the corresponding ideal quantum circuit.

Level reduction is a conceptual technique for coping with incorrect rectangles in order to maintain a faithful simulation result. The idea of level reduction is to incrementally replace each rectangle at the lowest level with either an ideal location (when the rectangle is correct) or a faulty location (when the rectangle is incorrect). Repeating the process for each level of concatenation yields a quantum circuit that directly reflects the original circuit and, hopefully, contains no faulty locations.

Level reduction begins by placing ideal decoders at the outputs of the rightmost 1-Recs. If a 1-Rec is correct, then by definition, the behavior of the simulation is unchanged by moving the decoder to the left and replacing the rectangle with the corresponding ideal location. If a 1-Rec is incorrect, however, then the decoder is stuck and cannot be moved to the left. Instead, an ideal decoder-encoder pair is placed to the left of the LEC of the corresponding 1-exRec. The result is a 1-exRec flanked by an ideal encoder and decoder that can be represented at level two by a faulty location. See Figure 4.5.

By repeating the process for each 1-Rec, the ideal decoders gradually sweep from right to left across the entire simulation. A decoder is free to move to the left until encountering an incorrect 1-Rec, at which point a new decoder-encoder pair is created to take its place. The result is a level-(k − 1) simulation with faulty locations at previously incorrect 1-Recs. In this way, level reduction allows the level-(k + 1) analysis to proceed by treating each k-Rec as a single independent location. The probability that a "location" fails in the level-(k + 1) simulation is upper bounded by the probability that the corresponding k-Rec is incorrect.

Figure 4.5: If a rectangle (indicated by the dotted line) is found to be incorrect, then the ideal decoder cannot be moved through to the left. Instead, an ideal decoder-encoder pair is placed to the left of the LEC so that the entire exRec is flanked. The new ideal decoder can now proceed to the left as normal. The encoder-exRec-decoder sequence in the dashed box is replaced by a single faulty location in the next level of concatenation. (Diagram: ideal decoder, ideal encoder, LEC, Ga, TEC, ideal decoder.)

As a concrete example, consider the circuit shown in Figure 4.6. The level-reduction procedure proceeds as follows.
1. Examine exRec 2. If the enclosed rectangle is incorrect, then replace the entire exRec with a faulty version of the associated (level-zero) gate. Otherwise, replace the rectangle with an ideal version of the associated gate.
2. Examine exRec 3. Follow the same procedure as for exRec 2.
3. Examine exRec 1. Depending on the outcomes of exRec 2 and exRec 3, one or both of the TECs may have been removed. The enclosed rectangle now consists of the encoded CNOT and any remaining TECs. If the remains of rectangle 1 are incorrect, exRec 1 is replaced with a faulty level-zero gate. Otherwise, the rectangle is replaced with an ideal level-zero gate.

There are two technicalities in the level-reduction process that must be addressed. First, when an incorrect rectangle is encountered, the newly created decoder-encoder pair splits the preceding rectangle, effectively removing the TEC from the rectangle. This problem can be readily fixed by defining correctness for the partial rectangle in a straightforward way. Second, flanking the incorrect exRec with an encoder and decoder allows us to treat it as a faulty location at level two.
But the definition of incorrectness is insufficient to identify which error actually occurred. AGP solve this problem by using an adversarial noise model in which the worst-case error is always assumed. In Chapter 7, we will see that other noise models can be accommodated by more carefully characterizing correctness and incorrectness.

Figure 4.6: An example of a fault-tolerant simulation with three overlapping exRecs. Level reduction starts by sweeping the decoders back through exRec 2 and exRec 3, and then moving on to exRec 1.

At each level k of concatenation, the probability that the k-Rec is correct increases relative to level k − 1. … t, and assume that the gadgets in the exRec are strictly fault tolerant. Then the enclosed rectangle is guaranteed to be correct if it contains no more than t faults, and the probability that the rectangle is incorrect can be naively upper bounded as

$$\Pr[\text{incorrect}] \le \binom{n}{t+1} p^{t+1}, \qquad (4.4)$$

where n is the number of locations in the exRec and p is an upper bound on the probability that a location is faulty. An inductive argument shows that the threshold is then lower bounded by

$$p_{th} \ge \binom{n}{t+1}^{-1/t}. \qquad (4.5)$$

Equation (4.4) (and therefore (4.5)) can be improved by noting that, though the code can only correct errors up to weight t, an exRec that contains more than t faults need not be incorrect. Say, for example, that two faults occur, one in the LEC and one in the TEC, and that the code can correct a single error, i.e., t = 1. If the TEC fault occurs early on, then it is likely that the two faults combine to cause an uncorrectable error. But if the TEC fault occurs after the error from the LEC has been corrected, then the rectangle will still be correct.

Malignant set counting is the process of enumerating subsets of faulty locations in the exRec, and counting only those that can actually cause incorrectness. A set of locations is considered malignant if there exists some fixed combination of nontrivial Pauli errors acting on that set of locations that causes the enclosed rectangle to be incorrect. Let $M_k$ be the number of malignant sets of size k. Then by counting all of the malignant sets of size at most K, we may use the bound

$$\Pr[\text{incorrect}] \le \binom{n}{K+1} p^{K+1} + \sum_{k=t+1}^{K} M_k\, p^k, \qquad (4.6)$$

which can be substantially better than (4.4).

Malignant set counting is both conceptually simple and highly flexible. As a concrete example, AGP used malignant set counting to prove a threshold of $2.\dots \times 10^{-\dots}$ for a deterministic scheme based on the [[7, 1, 3]] code … K.
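The counting bounds (4.4) through (4.6), and the per-level recursion behind (4.2) and (4.3), are simple enough to evaluate directly. The sketch below assumes a hypothetical exRec with n = 300 locations and t = 1, together with a hypothetical malignant-set count $M_2$; these numbers are illustrative only and are not taken from any scheme analyzed in this thesis. It also assumes the naive constant $\binom{n}{t+1}$ from (4.4) as the recursion constant c. All function names are hypothetical.

```python
from math import comb

def naive_rec_bound(p, n, t):
    """Eq. (4.4): Pr[rectangle incorrect] <= C(n, t+1) * p**(t+1)."""
    return comb(n, t + 1) * p ** (t + 1)

def naive_threshold(n, t):
    """Eq. (4.5): p_th >= C(n, t+1) ** (-1/t)."""
    return comb(n, t + 1) ** (-1.0 / t)

def malignant_bound(p, n, K, malignant_counts):
    """Eq. (4.6): C(n, K+1) p**(K+1) + sum over k of M_k p**k.

    `malignant_counts` maps a set size k to the number M_k of malignant
    sets of that size; only sizes k <= K contribute.
    """
    tail = comb(n, K + 1) * p ** (K + 1)
    return tail + sum(m * p ** k for k, m in malignant_counts.items() if k <= K)

def levels_needed(p, n, t, target, max_levels=30):
    """Number of concatenation levels until the naive bound is <= target.

    Iterates p -> C(n, t+1) * p**(t+1), one step per level.  Returns None
    at or above the naive threshold, where the recursion need not converge.
    """
    if p >= naive_threshold(n, t):
        return None
    c = comb(n, t + 1)
    level = 0
    while p > target:
        if level == max_levels:
            return None
        p = c * p ** (t + 1)
        level += 1
    return level

n, t = 300, 1                       # hypothetical exRec size and correctable weight
print(naive_threshold(n, t))        # 1 / C(300, 2) = 1/44850
print(levels_needed(1e-5, n, t, 1e-12))
print(malignant_bound(1e-4, n, 2, {2: 10_000}), naive_rec_bound(1e-4, n, t))
```

The last line illustrates why malignant set counting pays off: when only a fraction of the pairs of locations are malignant, (4.6) with K = 2 sits well below the naive bound (4.4).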
It is usually feasible to count subsets only up to some small fixed size. In Chapter 7 we discuss a solution that eliminates many subsets of locations that are unlikely to be simultaneously faulty, thereby permitting much larger values of K.

4.2.5 Alternative proof techniques

Malignant set counting and related techniques are effective for proving threshold lower bounds for schemes based on concatenated codes. For other codes, and for topological codes in particular, the arguments made by level reduction no longer apply, since there are no "levels", so to speak. Alternative proof techniques are available, however. One popular method is to map errors in topological codes onto models based on statistical physics [DKLP02, Har04]. With these models, it is possible to prove thresholds in the range of 1 to 10 percent. However, these high thresholds are obtained by assuming the ability to measure stabilizer generators without creating correlated errors, and the ability to classically compute corrections based on global information about the syndromes. Recently, Fowler used a combinatorial argument to prove a lower bound of $7.\dots \times 10^{-\dots}$ for the surface code [Fow12b]. His model includes explicit circuits used to measure stabilizer generators (so that measurements can introduce correlations) and requires only locally bounded classical syndrome processing.

An alternative solution to the complexity of malignant set counting is to randomly sample, rather than exhaust over, all possible subsets. Any stochastic error model induces a probability distribution of faulty locations, which can be sampled using the Monte Carlo method. The result is an estimate of the threshold to within some statistical confidence interval. Aliferis and Cross have used this technique to calculate thresholds for a variety of codes [AC07]. Steane [Ste03] and Knill [Kni05] have used Monte Carlo sampling to directly simulate depolarizing noise on sequences of rectangles and calculate the probability of correctness.
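To make the sampling idea concrete, here is a deliberately simplified sketch. Instead of sampling faulty locations inside an exRec, it samples independent bit flips on a distance-d repetition code with majority-vote decoding, and compares the Monte Carlo failure estimate, with an approximate 95% normal-approximation interval, against the exact binomial value. The toy model and the function names are assumptions for illustration; an actual threshold study samples faults on every location of the fault-tolerant circuit.

```python
import math
import random

def exact_failure(d, p):
    """Exact probability that majority vote fails: more than (d-1)//2 flips."""
    t = (d - 1) // 2
    return sum(math.comb(d, k) * p**k * (1 - p) ** (d - k)
               for k in range(t + 1, d + 1))

def monte_carlo_failure(d, p, shots, seed=0):
    """Sample `shots` noisy codewords and count majority-vote failures.

    Returns (estimate, half_width), where half_width is the 95% normal
    approximation 1.96 * sqrt(est * (1 - est) / shots).  This interval is
    unreliable when few failures are observed (it is zero if est == 0).
    """
    rng = random.Random(seed)
    t = (d - 1) // 2
    failures = 0
    for _ in range(shots):
        flips = sum(rng.random() < p for _ in range(d))
        if flips > t:
            failures += 1
    est = failures / shots
    return est, 1.96 * math.sqrt(est * (1 - est) / shots)

est, hw = monte_carlo_failure(d=3, p=0.1, shots=20_000, seed=1)
print(f"Monte Carlo: {est:.4f} +/- {hw:.4f}   exact: {exact_failure(3, 0.1):.4f}")
```

The comparison with the exact value is only possible because the toy model is tiny; for real fault-tolerant circuits the sampled estimate and its confidence interval are all one has.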
Svore and others have used a more limited simulation of a single level-one exRec [STD05, SCCA06, CDT09]. Their simulations yield a value called the pseudo-threshold, which is a rough estimate of the threshold rather than a statistical bound, but is easier to calculate.

Monte Carlo simulation has been used extensively to estimate thresholds for schemes based on topological error-correcting codes, which do not conform to the rectangle and gadget paradigm outlined in Section 4.2.2 [RHG06, RH07, RHG07, FSG09, WFSH10, FY10, WFH11, SMN13]. In these cases, a small patch of the code is simulated many times over a range of physical error rates and for progressively larger code distances. Plotting the results by code distance yields a "waterfall" shape in which the intersection of the curves converges to a point that is deemed the threshold. Figure 4.7 shows simulation results for the surface code [Fow13c].

[Figure 4.7 plot: logical $X$ error rate ($p_L$) versus depolarizing probability ($p$) on logarithmic axes, for code distances $d$ = 3, 5, 7, 9, 11, 15, 25.]

Figure 4.7: An example of Monte Carlo simulation for the surface code. A patch of the surface code is simulated for a variety of depolarizing noise strengths $p$ and code distances $d$. The threshold corresponds to the intersection point. Reproduced, with permission, from [Fow13c].

4.2.7 Limitations

Threshold theorems describe the circumstances under which reliable quantum computation is possible. Under which circumstances is quantum computation not possible? In other words, what are the upper bounds on the noise threshold?

Harrow and Nielsen showed that two-qubit gates are incapable of generating entanglement when subject to depolarizing noise with strength 0.74, or 0.50 for more general noise [HN03]. This result for depolarizing noise was sharpened to 0.67 by [VHP05]. Another way to upper bound the threshold is to allow perfect stabilizer operations and then determine the noise rate at which $\{\text{Clifford}, T\}$ circuits can be simulated classically. (Recall from Section 2.5.1 that Clifford circuits can be simulated classically.) Using this technique, [VHP05] show that classical simulation is possible for dephasing noise with strength 0.3, or about 0.[…] […]. These bounds […] were later shown to be tight in the sense that magic state distillation (see Section 4.3.2) permits universal quantum computation if the noise strength on the $T$ gate is below the bound [Rei05]. More recently, it was shown that the results are tight for all single-qubit non-Clifford gates, not just $T$ [vDH09].

On the other hand, [KRUdW10] have considered the case in which single-qubit gates are perfect, but $k$-qubit gates are subject to depolarizing noise. For the case $k = 2$, they show that the output of the circuit is independent of the input when the noise strength is 0.[…] […] 0.26 without restriction on the protocol for non-Clifford gates. The bounds in other cases are as low as 0.[…].

4.3 Encoded computing

Fundamentally, fault tolerance is the practice of simulating an ideal computation by carefully manipulating encoded data. In particular, we would like encoded operations to meet the conditions given by Definition 4.2.2, namely that they faithfully execute the intended logical operations and that they prevent errors from spreading among physical qubits.

The logical operations permitted within a fixed error-correcting code are limited, however. For a stabilizer code, the set of available operations corresponds exactly to the normalizer, i.e., the operators that commute with all of the stabilizers (see Section 3.3.1). Most unitary operations that can be performed on a code block do not actually realize a unitary operation on the encoded qubits.

Furthermore, proposals for quantum computing architectures usually provide a small set of physical one- and two-qubit operations (see, e.g., [LJL+…]). A commonly used universal gate set is $\{H, T, \text{CNOT}\}$, though there are others. In Chapter 5 we will use $\{H, \text{CCZ}\}$, and in Chapter 8 we will discuss another alternative gate set.

The simplest and most well-behaved class of encoded operations is called transversal. A circuit is transversal if each physical gate acts on at most one qubit in the encoded block. In the case of multi-qubit operations, the circuit is transversal if each gate acts on at most one qubit in each of the encoded blocks, and no qubit is involved in more than one gate. See Figure 4.8.

Transversal circuits are automatically (strictly) fault tolerant. A single faulty gate can produce only a single error on a given block. Thus the maximum weight of an error on any block after application of a transversal circuit is at most $r + s$, where $r$ is the maximum weight of an existing error on any block and $s$ is the number of faulty gates.

The set of encoded gates that can be implemented transversally depends on the error-correcting code. The single-qubit Pauli operators are transversal for any stabilizer code, and CNOT is transversal for any CSS code (a consequence of independent $X$ and $Z$ stabilizers). Specific codes may admit transversal implementations of other operations. The [[7,1,3]] code, for example, admits transversal $H$ and $S$, in addition to $\{X, Y, Z, \text{CNOT}\}$.

Figure 4.8: Transversal implementation of an encoded CNOT. Each gate touches exactly one qubit per block, and no qubit is involved in more than one gate.

As noted in Section 2.5.1, the gate set $\{H, S, \text{CNOT}\}$ generates the Clifford group which, though useful, is insufficient for universal quantum computation. Indeed, no quantum error-correcting (or error-detecting) code admits a transversal implementation of a universal set of gates [EK09]. In Chapter 5, however, we will see a scheme that effectively circumvents this limitation by incorporating error correction.

Given the Clifford group, universality can be achieved by adding a single non-Clifford gate (see, e.g., [CAB12] Appendix D). Fault-tolerant implementation of the non-Clifford gate is usually accomplished by preparing many noisy copies of a special resource state and "distilling" them into a single high-fidelity copy. The high-fidelity state can then be used to effect the desired gate using gate teleportation (Section 2.7).

Importantly, the distillation and gate-teleportation circuits for certain resource states can be accomplished using only Clifford gates and $Z$-basis measurement. For example, the state $|A\rangle = T|+\rangle$ can be distilled and teleported to implement $T$ using only CNOT, $H$ and $S$ [BK05]. See Figure 4.9.

The distillation circuit shown in Figure 4.9c can be understood as a novel kind of gate-teleportation circuit. Consider the circuit in Figure 4.10. This circuit implements teleportation of the state $T|+\rangle$. In this case, however, the $T$ gate has been commuted to the right side of the CNOT (rather than the left). Before performing the $T$ gate, the top ancilla is encoded into a quantum error-correcting code that supports a transversal (or otherwise robust) implementation of $T$. In this case we use the [[15,1,3]] code, for which $T$ is transversal [KLZ96]. An encoded $X$-basis measurement then completes the circuit. Usually, the entire circuit is already encoded in the code that we are using to implement Clifford gates. The encoding gate in Figure 4.10 then concatenates this "base" code with the [[15,1,3]] code, which supports the transversal $T$ gate.

Figure 4.9: State distillation of the $T$ gate. (a) For many quantum error-correcting codes, the $T$ gate is implemented by preparing the resource state $|A\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ and using gate teleportation. Conditioned on the measurement outcome, an $S$ correction may be required. (b) Fifteen noisy $|A\rangle$ states can be used to prepare a single high-fidelity $|A\rangle$ state, conditioned on a +1 outcome for each measurement [BK05]. The circuit is based on the decoding circuit for the [[15,1,3]] code. (c) […] $T$ may be applied to one half of a Bell pair that is encoded into the [[15,1,3]] code. […] $X$-basis measurement then teleports the $T$ gate onto the other half of the Bell pair, again conditioned on +1 results for each measurement [RHG07]. An abstract version of this circuit is shown in Figure 4.10.

Figure 4.10: This circuit outputs $|A\rangle = T|+\rangle$ by gate teleportation. Before performing the $T$ gate, the top qubit is encoded into an error-correcting code. The $T$ gate and $X$-basis measurements are performed logically on the code.

Variations on Figure 4.9 and Figure 4.10 also work. Recently, Bravyi and Haah showed how to construct a wide class of quantum codes that admit efficient implementation of the encoded $T$ gate [BH12] and can similarly be used for distillation.
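The gate-teleportation step of Figure 4.9a can be checked with a small state-vector calculation. The sketch below assumes one common wiring: a CNOT with the $|A\rangle$ ancilla as control and the data qubit as target, followed by a $Z$-basis measurement of the data. On outcome 1 it applies the correction $SX$ (first $X$, then $S$) to the ancilla; the figure may draw the Pauli part of the correction differently, so this is an illustration rather than a transcription of the figure.

```python
import numpy as np

# Single-qubit gates and the resource state |A> = T|+>.
T = np.diag([1, np.exp(1j * np.pi / 4)])
S = np.diag([1, 1j])
X = np.array([[0, 1], [1, 0]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
A = T @ plus

# Two-qubit basis ordered as |data, ancilla>; control = ancilla, target = data.
CNOT_anc_to_data = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
], dtype=complex)

def teleport_T(psi, outcome):
    """Normalized ancilla state after the data qubit is measured in the
    Z basis with the given outcome (0 or 1), including the correction."""
    state = CNOT_anc_to_data @ np.kron(psi, A)
    anc = state.reshape(2, 2)[outcome]  # row index = data qubit's value
    anc = anc / np.linalg.norm(anc)
    if outcome == 1:
        anc = S @ X @ anc  # apply X first, then S
    return anc

def equal_up_to_phase(u, v):
    return bool(np.isclose(abs(np.vdot(u, v)), 1.0))

rng = np.random.default_rng(7)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
for m in (0, 1):
    print(m, equal_up_to_phase(teleport_T(psi, m), T @ psi))
```

Both measurement outcomes occur with probability 1/2 for any input, and in both cases the ancilla ends up in $T|\psi\rangle$ up to a global phase, using only a CNOT, a $Z$ measurement and Clifford corrections.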
Others have developed protocols based on codes that admit transversal Hadamard [MEK13, Jon12]. Toffoli gates can be obtained using a similar distillation and teleportation procedure [Eas13, Jon13d].

Early proposals for fault-tolerant implementations of non-Clifford gates differed somewhat from the protocol described above. Shor proposed a procedure for implementing the Toffoli gate based on fault-tolerant construction of a cat state plus other transversal gates [Sho96]. Knill, Laflamme and Zurek proposed the use of the [[15,1,3]] code, for which $T$ is transversal but $H$ is not [KLZ96]. They construct a fault-tolerant $H$ using preparation of an encoded $|+\rangle$ state and a teleportation-like circuit. These methods are discussed further in Chapter 5.

Topological codes offer a qualitatively different way to perform fault-tolerant encoded gates. Many topological codes are also stabilizer codes, and for those codes the same concept of transversality still applies. However, it can be more productive to implement encoded gates by instead deforming the surface on which the code is supported. In the surface code, for example, encoded qubits are defined by introducing logical "defects" into the lattice of physical qubits. Encoded gates are then performed by moving defects around each other, and fault tolerance is ensured by keeping the defects sufficiently far apart (see, e.g., [FMMC12]). Code deformation is not universal on its own, though; state distillation is typically used for topological codes as well.

Figure 4.11: This circuit measures the four-qubit stabilizer $P_1 \otimes P_2 \otimes P_3 \otimes P_4$, where the $P_i$ are Pauli operators. Each Pauli operator is applied controlled on the ancilla qubit, which is prepared in $|+\rangle$. The measurement outcome corresponds to the eigenvalue of the stabilizer. This circuit is not fault tolerant, since an error on the ancilla can spread to the other qubits through the controlled-$P_i$ gates.

Fault-tolerant encoded gates are carefully designed to prevent errors from spreading between qubits.
Even so, errors must be periodically identified and flushed away by measuring error syndromes and making corrections. There is a very simple circuit that measures the error syndrome; Figure 4.11 shows an example for a weight-four stabilizer. However, this circuit is not fault tolerant. An error on the ancilla qubit can spread to many of the data qubits, possibly causing a logical error. More complicated error-correction circuits are usually required in order to limit the spread of errors.

A variety of error-correction techniques have been studied, and three broad categories are so-called Shor-type [Sho96], Steane-type [Ste96] and Knill-type [Kni04] error correction. This is only a rough categorization, and it leaves significant room for introducing new ideas and optimizations within or beyond these categories; see, e.g., [Rei04, DA07, AC07]. Common to each of these types of error correction is the use of ancillary qubits to extract error information from the data blocks. Before interacting with the data, the ancilla qubits need to be prepared in an entangled state. Error information is transferred by coupling this state with the data. Finally, measurements are used to obtain syndrome information. The methods differ mainly in the type of entangled states that are required.
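The spread of an ancilla fault in the circuit of Figure 4.11 can be traced with a short Pauli-propagation calculation. As illustrative assumptions, the sketch takes every $P_i = X$, so that each controlled-$P_i$ is a CNOT from the ancilla (qubit 0) onto data qubit $i$, and injects a single $X$ fault on the ancilla between the second and third CNOTs.

```python
def propagate_cnots(x, z, cnots):
    """Propagate a Pauli error (X part x, Z part z, lists of 0/1 bits)
    through a sequence of CNOT(control, target) gates.

    Under conjugation by CNOT, X on the control spreads to the target,
    and Z on the target spreads to the control.
    """
    x, z = list(x), list(z)
    for c, t in cnots:
        x[t] ^= x[c]
        z[c] ^= z[t]
    return x, z

n = 5  # qubit 0 is the ancilla; qubits 1-4 are data
cnots = [(0, 1), (0, 2), (0, 3), (0, 4)]

# A single X fault on the ancilla, occurring after the first two CNOTs,
# only passes through the remaining two gates.
x_err = [1, 0, 0, 0, 0]
z_err = [0] * n
x_out, z_out = propagate_cnots(x_err, z_err, cnots[2:])
data_weight = sum(x_out[1:])
print("X part on data:", x_out[1:], "weight", data_weight)
```

One fault therefore leaves a weight-two $X$ error on the data, which is more than a distance-three code can correct. The cat-state ancilla of Figure 4.14 avoids this by letting each ancilla qubit touch only one data qubit.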

Steane-type error correction is based on the circuit shown in Figure 4.12. $X$ errors are corrected by preparing an encoded $|+\rangle$ state, performing CNOT from the data to the ancilla, and then measuring the ancilla in the $Z$ basis. $Z$ errors are independently corrected by instead preparing an encoded $|0\rangle$, performing CNOT from the ancilla to the data, and measuring the ancilla in the $X$ basis. Note that, under ideal conditions, neither of the circuits has any effect on the encoded data. The state $|+\rangle$ is the +1 eigenstate of $X$ and is therefore invariant as the target of a CNOT. Likewise, a CNOT does not activate when its control qubit is in state $|0\rangle$.

Figure 4.12: In Steane-style error correction, $Z$ and $X$ errors are corrected separately. $Z$ errors are corrected by preparing an encoded $|0\rangle$, performing transversal CNOT, and then transversally measuring in the $X$ basis. Similarly, $X$ errors are corrected by preparing an encoded $|+\rangle$, performing transversal CNOT, and measuring transversally in the $Z$ basis.

Steane error correction requires that $X$ and $Z$ errors can be corrected independently, and therefore applies only to CSS codes (see Section 3.3.2). Transversal measurement of the ancilla effectively measures all of the stabilizer generators of a particular type (either $X$ or $Z$) in parallel. Thus it is typically more efficient for large codes than Shor-type error correction, which measures each generator individually. Its conceptual simplicity has also made it a popular choice for threshold studies, e.g., [AGP06, Rei06b, CDT09].

The drawback of Steane error correction is that preparation of sufficiently robust encoded $|0\rangle$ and $|+\rangle$ states can be complicated. Systematic techniques for preparing such encoded stabilizer states exist [Ste02, PMH03], but errors can occur during preparation.
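The classical side of Steane-type $X$-error correction can be sketched for the [[7,1,3]] code, whose $Z$ stabilizers are the rows of the parity-check matrix of the classical [7,4,3] Hamming code. Transversal CNOT copies the data's $X$ error onto the encoded $|+\rangle$ ancilla, so an ideal $Z$-basis measurement returns a uniformly random Hamming codeword plus that error, and applying the parity checks to the measured string yields the syndrome of the error alone. The choice of matrix (column $j$ is the binary representation of $j$) and the sampled codeword are illustrative assumptions; ancilla preparation errors are ignored.

```python
import random

# Parity-check matrix of the [7,4,3] Hamming code: column j (1-indexed) is
# the binary representation of j. Its rows define the Z stabilizers of the
# [[7,1,3]] code.
PARITY_CHECK = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

def syndrome(bits):
    return [sum(h * v for h, v in zip(row, bits)) % 2 for row in PARITY_CHECK]

def random_hamming_codeword(rng):
    """Uniformly random Hamming codeword (the Z-basis outcome of an ideal
    encoded |+> ancilla), drawn by rejection sampling."""
    while True:
        c = [rng.randint(0, 1) for _ in range(7)]
        if syndrome(c) == [0, 0, 0]:
            return c

def steane_x_correction(data_x_error, rng):
    """Return the 0-indexed qubit that needs an X correction, or None."""
    ancilla = random_hamming_codeword(rng)
    # Transversal CNOT (data -> ancilla) copies the data's X error.
    measured = [a ^ e for a, e in zip(ancilla, data_x_error)]
    s = syndrome(measured)  # equals syndrome(data_x_error)
    pos = s[0] + 2 * s[1] + 4 * s[2]  # read the syndrome as a binary number
    return None if pos == 0 else pos - 1

rng = random.Random(3)
error = [0, 0, 0, 0, 1, 0, 0]  # single X error on qubit 4 (0-indexed)
print(steane_x_correction(error, rng))
```

The random codeword cancels out of the syndrome, which is why the measurement reveals the error without disturbing the encoded data.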
Because of such preparation errors, the ancilla state must be "verified" before it is coupled to the data [Ste02, Rei04, Rei06a]. Several techniques for improving stabilizer state preparation and verification are discussed in detail in Chapter 6.

Knill-type error correction, like Steane-type, uses encoded ancillary states. In this case, however, the required states are more complicated. Knill error correction is based on gate teleportation; see Figure 4.13. First an ancillary Bell state is prepared, followed by application of the desired unitary. Then a Bell measurement serves simultaneously to teleport the data and to measure the error syndrome.

Figure 4.13: In Knill-style error correction, syndrome measurements and an encoded gate are accomplished simultaneously. The circuit implements the single-qubit encoded unitary $U$ and corrects both $X$ and $Z$ errors. The resource state in the dashed box can be prepared and verified offline. Conditioned on the (logical) measurement outcomes, a Pauli correction may be required (as in teleportation). In most cases this Pauli correction can be noted classically and need not actually be applied.

The advantage of preparing a more complicated resource state is that the preparation can be done "offline". The bulk of the work of both error correction and encoded gates can be completed before ever touching the data. As a result, it is possible to use error detection, throwing away ancillary states that exhibit errors. Since codes can detect far more errors than they can correct, this method can offer substantially higher thresholds than Steane-type or Shor-type error correction [Kni05]. The concept has strong similarities with state distillation, which also uses error detection.

High levels of non-determinism, however, can be very resource intensive and can lead to poor threshold performance in certain circumstances [LPSB13]. Rigorous threshold analysis is also more complicated [Rei07, AGP08].
Shor-type error correction is the only one of the three types that does not use encoded ancillary states to extract syndrome information. Instead, each syndrome is measured by preparing a so-called GHZ, or "cat", state, as in Figure 4.14. As in the Steane and Knill methods, the ancillary state must first be verified using error detection to make sure that errors do not spread back to the data. In order to ensure reliable results, each syndrome measurement is repeated a number of times that is proportional to the distance of the code.

The size of a stabilizer measurement circuit corresponds directly to the weight of the stabilizer. Consequently, Shor-type error correction is most useful for codes that have low-weight stabilizers. In some cases, syndrome measurements can be implemented using a single "bare" ancilla qubit, without creating a cat state. Bare ancillas are usually used in surface code schemes, for example [FMMC12].

Figure 4.14: A weight-four Shor-style syndrome measurement. This circuit differs from Figure 4.11 in that each qubit of the cat-state ancilla interacts with at most one qubit of the encoded data. The cat state $\frac{1}{\sqrt{2}}\left(|0000\rangle + |1111\rangle\right)$ must be checked for errors (verified) before it can be used.

The resource requirements for fault-tolerant quantum computation can be specified in a variety of ways, including circuit size (number of gates), circuit depth (time), circuit width (number of physical qubits), or the number of a particular type of gate ($T$ gates, for example). Often it is possible to trade one type of resource for another. One common example is to trade circuit width for circuit depth. Fowler, for example, has shown how to minimize computational depth in the surface code at the expense of a larger qubit lattice [Fow12c].
Therefore, it is often sensible to express resource requirements in terms of circuit area (depth × width) or volume (space × time).

Threshold theorems show that both the time and space resources required for reliable quantum computation scale efficiently with respect to the size of the original noisy computation, in the asymptotic sense. Given a noisy circuit of size $n$, it is possible to construct a fault-tolerant simulation that takes time and space $n \cdot \mathrm{polylog}(n)$. The constants in the polynomial can be overwhelmingly high, however; some examples were noted in Section 1.2.

We now have a clearer picture of why the overhead is so large, and how the various parts of fault-tolerance schemes contribute to it. The most obvious sources are error correction and state distillation, both of which involve multiple rounds of error checking in order to produce resource states of suitable fidelity. But there are other, less obvious sources, too. For example, the resource overhead increases rapidly as the gate error rate approaches the threshold. From (4.2) we see that a physical error rate of $p = p_{\mathrm{th}}/\alpha$ […]$/\log\alpha$, which increases exponentially near the threshold $p_{\mathrm{th}}$. Additional overhead is incurred from decomposing unitaries from the quantum algorithm into the limited set of fault-tolerant gates.

In the remaining chapters, we will examine each of these sources of overhead in turn. In most cases, our focus will be on optimizing size and width requirements, though some optimizations will also improve circuit depth.

In addition to suppressing noise, fault-tolerant constructions must also satisfy other hardware constraints. This can mean, for example, accounting for more complicated noise models such as those with qubit leakage, but may also involve limitations on the set of available gates, or on the placement of and interactions among qubits.

One of the most significant limitations of proposed quantum computing architectures is qubit geometry.
Many such proposals involve a lattice of qubits in a limited number of spatial dimensions (see references contained in [SE09, FMMC12]). Qubits in the lattice are allowed to interact only with a small number of nearby qubits, usually only nearest neighbors. A variety of studies have considered lattices in one dimension [Got00, SSP13, DFH04, FHH04, SE09, SWD10], two dimensions [STD05, SDT07, FMMC12], and three dimensions [BMD07, Haa11, Mic12, BK12, Kim12]. Geometric connectivity constraints can significantly impact the performance of a fault-tolerance scheme, particularly for those based on concatenated codes [SDT07, LPSB13]. Topological codes, however, are each tailored to a specific geometry and suffer little when the computer geometry is similar to the intended topology of the code.

Other limitations have also been considered. Gate execution times can vary depending on the gate. Measurements often take longer than unitary gates, though it is possible to overcome this limitation [DA07, PSBT10b]. Fault tolerance can also be achieved when control of individual qubits is limited [BBK03, Kay05, Kay07, FT07, FT09, PSBT10a, PSBT11]. Production of qubit lattices on physical substrates will likely include some number of defective qubits. With some care, fault-tolerance protocols can be adapted to avoid defective regions, even subject to geometric locality constraints [N+…]. Practically speaking, it is easier to manufacture many small regions than one monolithic lattice. Several authors have considered fault-tolerant quantum computation in which qubits are distributed among many small nodes [DMN11, VLFY10, KK09, DFS+09, HFDV12].

An assumption that is almost ubiquitous in the analysis of fault-tolerance schemes is that perfect and arbitrarily fast classical control logic is available. In reality, though, classical computers have limitations, and connecting classical and quantum logic requires physical space. Decoding and interpreting measurement results is efficient for concatenated codes, and can be made similarly efficient for topological codes [DFT+10, FWH12, Fow13b]. Even so, low-latency, high-performance classical logic is desirable and may be necessary for architectures with small quantum gate times. One attractive option is to use low-power superconducting technology [HHOI11, Muk11, VSFM13, HRM13, HHO+…].

Chapter 5
Fault-tolerant universal computation with transversal gates

This chapter is based on material that appears in [PR13].

At the highest level, fault-tolerant quantum computation involves only two steps: encoded computation and error correction. Thus reducing resource overhead requires simplification of either or both of these steps. In this chapter we address the former, encoded computation. In particular, we show that a universal set of fault-tolerant gates can be implemented using only the simplest of constructions, transversal gates.

Recall from Section 4.3.1 that a transversal gate is the application of physical gates transversally across the codewords, usually meaning that the $j$th gate is applied to the $j$th qubits of the codewords, for every $j$. Transversal gates are highly desirable because they are both extremely simple and automatically fault tolerant, according to Definition 4.2.2. Depending on the gate, a transversal implementation may or may not preserve the codespace and execute a valid encoded operation. Consider the [[7,1,3]] code, for which transversal Hadamard exchanges the $X$ and $Z$ stabilizers and also exchanges the $X$ and $Z$ logical operators, and so transversal Hadamard implements logical Hadamard. On the other hand, transversal $T$ is not a logical operation on this code; it corrupts the $X$ logical operator.

Until 2007, an important open question in quantum information theory was whether or not there exist codes that admit transversal implementation of a universal set of gates. Due to the inability to find one, it was conjectured that no such code existed. Zeng, Cross and Chuang confirmed this conjecture for stabilizer codes on qubits [ZCC11], and then, along with Chen and Chung, extended the result to qudits [CCC+08].

Theorem 5.0.1 (Transversal universality is impossible [EK09]). For any quantum code capable of detecting an error on any physical subsystem, the set of transversal logical operations is not universal.

Theorem 5.0.1 is unfortunate because the traditional method for completing a universal set of fault-tolerant gates is state distillation, a procedure that is highly costly compared to transversal gates (see Section 4.3.2). Indeed, state distillation dominates the resource overhead for fault-tolerant quantum computation [RHG07, FDJ13].

In this chapter we propose a way of implementing a universal set of quantum gates transversally, up to a correction that can be made by the standard error-correction procedure. The inclusion of error correction means that Theorem 5.0.1 is not violated. However, since error correction is required anyway, our protocol effectively shows that the no-go theorems [ZCC11, CCC+08, EK09] can be circumvented without adding any new machinery. Separate injection and distillation procedures are not required.

Our construction is based on two main insights for the class of "triorthogonal" quantum stabilizer codes, introduced recently by Bravyi and Haah [BH12]. First, we observe that the controlled-controlled-$Z$ operation, defined by $\mathrm{CCZ}\,|a, b, c\rangle = (-1)^{abc}\,|a, b, c\rangle$ for bits $a, b, c$, can be implemented transversally for any triorthogonal quantum code. Second, we show that Hadamard can be implemented by transversal $H$ gates followed by stabilizer measurements and Pauli $X$ corrections. Together, $H$ and CCZ are universal for quantum computation [Shi03, Aha03].

Let us begin by specifying the construction of stabilizer codes based on triorthogonal matrices. For two binary vectors $f, g \in \{0,1\}^n$, let $f \cdot g \in \{0,1\}^n$ be their entry-wise product, and let $|f|$ denote the Hamming weight of $f$.

Definition 5.1.1 (Triorthogonal matrix [BH12]). An $m \times n$ binary matrix $G$, with rows $f_1, \ldots, f_m \in \{0,1\}^n$, is triorthogonal if
$$ |f_i \cdot f_j| = 0 \pmod 2 \quad \text{and} \quad |f_i \cdot f_j \cdot f_k| = 0 \pmod 2 $$
for all pairs $(i, j)$ and triples $(i, j, k)$ of distinct indices.

An $m \times n$ triorthogonal matrix $G$ can be used to construct an $n$-qubit "triorthogonal" CSS code as follows.

Definition 5.1.2 (Triorthogonal code [BH12]). For each even-weight row of a triorthogonal matrix $G$, add an $X$ stabilizer generator by mapping non-zero entries to $X$ operators, e.g., $(1, 0, 1) \mapsto X \otimes I \otimes X$. Similarly, add a $Z$ stabilizer for each row of the orthogonal complement $G^\perp = \{g : |g \cdot f| = 0 \bmod 2,\ \forall f \in G\}$. The logical $X$ and $Z$ operators are then given by mapping non-zero entries of the odd-weight rows of $G$ to $X$ and $Z$, respectively.

For example, the [[15,1,3]] code […] [[…k + 8, k, …]] […].

A special subset of triorthogonal codes admits transversal implementation of the single-qubit $T$ gate. The [[15,1,3]] code, for example, is based on a triorthogonal matrix that additionally satisfies
$$ |f_i \cdot f_j| = 0 \bmod 4 \tag{5.1} $$
for all distinct pairs of even-weight rows $(f_i, f_j)$.
This condition implies that all of the stabilizers of the code have weight 0 mod 8. Codes that satisfy (5.1) are called triply even [BM12]. In general, $T$ is transversal for triorthogonal codes only up to (non-transversal) Clifford corrections [BH12].

We next construct a fault-tolerant CCZ gate for a triorthogonal code. We claim that for any triorthogonal code, transversal application of CCZ gates realizes CCZ gates on the encoded qubits.
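This claim can be sanity-checked by brute force on the 15-qubit code before reading the proof. The sketch below assumes one standard triorthogonal matrix for that code (four weight-8 rows in which qubit $j$ belongs to row $b$ iff bit $b$ of $j$ is set, plus the all-ones row as the single odd-weight row); it may differ from the matrix used later in this chapter by a permutation of qubits. It checks Definition 5.1.1 and condition (5.1), then verifies that for every triple of basis states $|g\rangle, |h\rangle, |i\rangle$ in the encoded $|a\rangle, |b\rangle, |c\rangle$, the phase $(-1)^{|g \cdot h \cdot i|}$ picked up under transversal CCZ equals $(-1)^{abc}$.

```python
from itertools import combinations, product

N = 15
# Rows stored as bitmasks: bit (j - 1) represents qubit j, for j = 1..15.
EVEN_ROWS = [sum(1 << (j - 1) for j in range(1, N + 1) if (j >> b) & 1)
             for b in range(4)]
ODD_ROW = (1 << N) - 1  # all-ones row; defines the logical operators
G = EVEN_ROWS + [ODD_ROW]

def weight(v):
    return bin(v).count("1")

def is_triorthogonal(rows):
    """Definition 5.1.1: even overlaps for all distinct pairs and triples."""
    pairs_ok = all(weight(f & g) % 2 == 0 for f, g in combinations(rows, 2))
    triples_ok = all(weight(f & g & h) % 2 == 0
                     for f, g, h in combinations(rows, 3))
    return pairs_ok and triples_ok

def span(rows):
    vecs = {0}
    for r in rows:
        vecs |= {v ^ r for v in vecs}
    return vecs

G0 = span(EVEN_ROWS)             # basis states in the encoded |0>
G1 = {ODD_ROW ^ g for g in G0}   # basis states in the encoded |1>
COSETS = (G0, G1)

def transversal_ccz_is_logical(a, b, c):
    """True if every basis-state triple picks up the phase (-1)^(abc)."""
    target = a & b & c
    return all(weight(g & h & i) % 2 == target
               for g in COSETS[a] for h in COSETS[b] for i in COSETS[c])

print(is_triorthogonal(G))
print(all(weight(f & g) % 4 == 0 for f, g in combinations(EVEN_ROWS, 2)))
print(all(transversal_ccz_is_logical(a, b, c)
          for a, b, c in product((0, 1), repeat=3)))
```

Because the sign depends only on $a$, $b$ and $c$, transversal CCZ multiplies each encoded basis state by $(-1)^{abc}$, which is exactly logical CCZ.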

Theorem 5.2.1 (Transversal CCZ for triorthogonal codes). Let $C$ be a triorthogonal code based on a triorthogonal matrix $G$. Then transversal application of CCZ implements logical CCZ transversally on each of the encoded qubits of $C$.

Proof. For simplicity, consider first the case of a triorthogonal code with a single encoded qubit, i.e., one based on a triorthogonal matrix $G$ with a single odd-weight row $f_\star$. Let $\mathcal{G}_0 \subseteq \{0,1\}^n$ be the linear span of all the even-weight rows of $G$, and let $\mathcal{G}_1$ be the coset $\{f_\star + g : g \in \mathcal{G}_0\}$. Then the encoding of $|a\rangle$, for $a \in \{0,1\}$, is given by the uniform superposition over $\mathcal{G}_a$:
$$ |\bar a\rangle = \frac{1}{\sqrt{|\mathcal{G}_a|}} \sum_{g \in \mathcal{G}_a} |g\rangle. $$
Omitting the normalization, the action of transversal CCZ on an encoded basis state $|\bar a, \bar b, \bar c\rangle$, for $a, b, c \in \{0,1\}$, is therefore given by
$$ \mathrm{CCZ}^{\otimes n}\,|\bar a, \bar b, \bar c\rangle = \sum_{g \in \mathcal{G}_a,\, h \in \mathcal{G}_b,\, i \in \mathcal{G}_c} \mathrm{CCZ}^{\otimes n}\,|g, h, i\rangle = \sum_{g \in \mathcal{G}_a,\, h \in \mathcal{G}_b,\, i \in \mathcal{G}_c} (-1)^{|g \cdot h \cdot i|}\,|g, h, i\rangle. \tag{5.2} $$
Now $g \cdot h \cdot i$ can be expanded as $(a f_\star + g') \cdot (b f_\star + h') \cdot (c f_\star + i')$, where $g', h', i' \in \mathcal{G}_0$. Expanding further gives one term $abc\,(f_\star \cdot f_\star \cdot f_\star) = abc\, f_\star$, plus other triple-product terms in which $f_\star$ appears at most twice. Since $G$ is triorthogonal, these other terms necessarily have even weight. The term $abc\, f_\star$ has odd weight if and only if $a = b = c = 1$. Because the parity of the Hamming weight is additive under addition modulo 2, it follows that $|g \cdot h \cdot i| \equiv abc \pmod 2$. Substituting back into (5.2) gives, as desired,
$$ \mathrm{CCZ}^{\otimes n}\,|\bar a, \bar b, \bar c\rangle = (-1)^{abc}\,|\bar a, \bar b, \bar c\rangle. \tag{5.3} $$
In the case that $G$ has some number $k > 1$ of odd-weight rows $\{f^{(1)}_\star, f^{(2)}_\star, \ldots, f^{(k)}_\star\}$, we may define $2^k$ cosets, one for each codeword. Let $\mathbf{a}$ be a length-$k$ binary vector where each element $a_i$ represents a logical qubit of the code, and let
$$ \mathcal{G}_{\mathbf{a}} := \Big\{ g' + \sum_{i=1}^{k} a_i f^{(i)}_\star \;\Big|\; g' \in \mathcal{G}_0 \Big\}. \tag{5.4} $$
From (5.4), we can see that the expansion of $g \cdot h \cdot i$ will contain even-weight terms plus $k$ terms of the form $a_i b_i c_i\,(f^{(i)}_\star \cdot f^{(i)}_\star \cdot f^{(i)}_\star)$, each of which is odd if and only if $a_i = b_i = c_i = 1$. Again substituting back into (5.2), we obtain
$$ \mathrm{CCZ}^{\otimes n}\,|\bar{\mathbf a}, \bar{\mathbf b}, \bar{\mathbf c}\rangle = \prod_{i=1}^{k} (-1)^{a_i b_i c_i}\,|\bar{\mathbf a}, \bar{\mathbf b}, \bar{\mathbf c}\rangle. \tag{5.5} $$
Thus transversal CCZ implements logical CCZ transversally across each of the encoded qubits.

Figure 5.1: The Toffoli gate is equivalent to a CCZ gate in which the target qubit is conjugated by Hadamard gates.

We note that transversality of CCZ for the subset of triply even codes follows trivially from the fact that CCZ can be expressed as a sequence of gates from $\{T, \text{CNOT}\}$ [NC00]. Theorem 5.2.1 extends this result to all triorthogonal codes. In a sense, Theorem 5.2.1 shows that CCZ is more "natural" than $T$ for triorthogonal codes, since Clifford corrections may be required for $T$ [BH12] but are never required for CCZ.

If the orthogonality conditions on the matrix $G$ are strengthened, then additional types of diagonal operations are transversal. If $G$ satisfies the condition that all $j$-fold products of distinct rows have even weight for all $2 \le j \le h$, then the $h$-qubit controlled-$Z$ gate (with $h - 1$ controls, so that $h = 3$ recovers CCZ) is transversal in the corresponding stabilizer code. This observation is similar to a result of Landahl and Cesare, who demonstrated that codes satisfying increasingly stringent conditions on the weights of the codewords admit transversal $Z$-axis rotations by angles proportional to $1/2^k$ for increasing $k$ [LC13].

To achieve universality, we also require a fault-tolerant implementation of the Hadamard gate. For Hadamard to be transversal, the code must be self-dual, i.e., $G = G^\perp$. Unfortunately, no triorthogonal code is self-dual. Indeed, otherwise, since CCZ is transversal, it would be possible to obtain transversal implementations of both Toffoli and $H$ for the same code (see Figure 5.1).
However, Toffoli and $H$ together are universal [Shi03, Aha03], and so transversal implementations of both would violate Theorem 5.0.1. Nonetheless, fault-tolerant and effectively transversal implementations of logical $H$ are still possible.

Theorem 5.3.1 (Transversal H for triorthogonal codes). Let $C$ be a triorthogonal code based on a triorthogonal matrix $G$. Then the encoded Hadamard gate on each of the encoded qubits of $C$ can be implemented fault-tolerantly using transversal $H$, fault-tolerant syndrome measurement and classically controlled transversal $X$ gates.

Proof. When transversal $H$ is performed on a triorthogonal code, the logical operators are transformed properly: logical $X$ maps to logical $Z$ and vice versa. A subset of the stabilizers is preserved; observe that $\mathcal{G}_0 \subset G^\perp$, and thus each element of $\mathcal{G}_0$ corresponds to both an $X$ and a $Z$ stabilizer, which transversal $H$ swaps. Transversal $H$ does not preserve the $Z$ stabilizers corresponding to $G^\perp \setminus \mathcal{G}_0$, so these must be restored by measuring and correcting them.

Consider the effect of measuring one of the $Z$ stabilizer generators $\zeta$ corresponding to $G^\perp \setminus \mathcal{G}_0$. The measurement projects the code block onto either the +1 or the −1 eigenspace of $\zeta$, according to the measurement outcome. Let $\chi$ be a tensor product of $I$ and $X$ operators such that $\chi$ anticommutes with $\zeta$ and commutes with all other $Z$ stabilizer generators and $Z$ logical operators. Such an operator always exists, since $\zeta$ is neither an element of the (current) stabilizer nor an element of the normalizer. If the measurement outcome is −1, then applying $\chi$ restores the code block to the +1 eigenspace of $\zeta$.

Importantly, even with the additional $X$ corrections that fix the $Z$ stabilizers of $G^\perp \setminus \mathcal{G}_0$, the procedure is fault tolerant. That is, $k$ gate failures can lead to a data error of weight at most $k$, for $k$ less than half the code's distance $d$. Let $d_Z$ be the code's distance against $Z$ errors, as determined by the $X$ stabilizers of $\mathcal{G}_0$. Likewise, let $d_X$ be the distance against $X$ errors, as determined by the $Z$ stabilizers of $G^\perp$.
The minimum distance of the code (against arbitrary Pauli errors) is then d = min{d_X, d_Z}. But G ⊂ G⊥ implies that d_Z ≤ d_X and, therefore, the code's minimum distance is determined solely by G. Since both the X and Z stabilizers of G are preserved, a minimum distance of d is maintained throughout. So long as the stabilizer measurements are performed fault-tolerantly, and since the other operations are transversal, the entire procedure is fault-tolerant.

In fact, the Hadamard construction of Theorem 5.3.1 holds for any CSS code in which the X and Z logical operators have identical supports and transversal Hadamard conjugates the X stabilizers to a subset of the Z stabilizers. The triorthogonality condition (Definition 5.1.1) is not strictly necessary; rather, it is the symmetry of the X and Z stabilizers in the triorthogonal code construction that is important.

Informally, Theorem 5.3.1 takes advantage of the fact that the X and Z stabilizers have an asymmetry that is required in order to provide triorthogonality (and therefore transversal CCZ), but that is otherwise unnecessary. In principle, the extra X-error distance provided by the Z stabilizers could be used to improve performance against biased noise [AP08, BP12]. But it can be difficult to exploit this asymmetry properly in practice. For example, direct application of transversal T is not allowed because it splits X errors into both X and Z errors (see (3.16)). We choose, instead, to use the asymmetry to reduce the complexity of the Hadamard gate.

Figure 5.2: An implementation of the logical Hadamard operation in a triorthogonal code, using Steane's method for error correction. Transversal Hadamard gates are applied to the data block. In order to restore the data to the codespace, and also correct any X errors, an encoded |+⟩ state is prepared, coupled to the data with transversal CNOT gates and measured.
X corrections are applied as necessary.

The stabilizer measurements required by Theorem 5.3.1 can be incorporated into the normal fault-tolerant error-correction procedure. Steane's procedure [Ste96], for example, involves a transversal CNOT from the data to an encoded |+⟩ ancilla state. Transversal Z-basis measurements of the ancilla then permit correcting X errors on the data, while simultaneously restoring the stabilizer group. See Figure 5.2. (See also Section 4.4.1.) Alternatively, Knill-style or Shor-style error correction could be used. In any case, the required stabilizers can be measured and corrected using H, X, CNOT, |0⟩ preparation and Z-basis measurements. By using CCZ gates to simulate CNOT and X, universality can be achieved using only |0⟩ preparation, Z-basis measurement, and H and CCZ gates.

In order to make our universal construction concrete, we now walk through an example based on the 15-qubit code. We present the example in two equivalent ways. First, with the [[15, 1, 3]] […]:

Z·····Z··Z·Z···
·Z····Z·Z··Z···
··Z···ZZ···Z···
···Z··Z·ZZ·····
····Z·ZZ·Z·····
·····ZZZZ······
·······XXXXXXXX
·······ZZZZZZZZ
···XXXX····XXXX
···ZZZZ····ZZZZ
·XX··XX··XX··XX
·ZZ··ZZ··ZZ··ZZ
X·X·X·X·X·X·X·X
Z·Z·Z·Z·Z·Z·Z·Z

where the X stabilizers come directly from Table 3.1b and the Z stabilizers come from the orthogonal complement. For visual clarity, identity operators are indicated by dots. The logical X and Z operators correspond to transversal X and Z, respectively. By construction, this code is triorthogonal according to Definition 5.1.2. The four X stabilizers provide distance-three protection against Z errors and the ten Z stabilizers provide distance-seven protection against X errors.

Transversal Hadamard swaps the X and Z logical operators. The X and Z stabilizers are also swapped. The bottom four generators of each type (X and Z) are preserved, since they are symmetric. The remaining six Z generators have now become X generators. Now the code provides distance-three protection against X errors and distance-seven protection against Z errors; it is the dual of the original code. The [[15, 1, 3]] […] Z generators. For each −1 outcome, the X correction corresponds to one of the six asymmetric X generators of the dual code.

There is an alternative way to interpret this example by using the [[15, 7, 3]] […] X generators as the [[15, 1, 3]] […] Z generators. It encodes seven logical qubits. The logical Z operators correspond to the top six generators of the [[15, 1, 3]] […] Z. However, the code, as given, is not triorthogonal.

In order to induce triorthogonality, we will treat six of the encoded qubits as "gauge qubits". That is, we will not use them to store computational data. Instead, we will require that they are always prepared as encoded |0⟩, so that their logical Z operators become stabilizers. If we choose the six gauge qubits so that the remaining computation qubit is the one with transversal logical Z, then we recover the [[15, 1, 3]] […] H can be interpreted as implementing logical H, except that the gauge qubits have been corrupted because they are no longer in state |0⟩.
As for the [[15, 1, 3]] […] Z operators.

The simplest way to use the CCZ and Hadamard constructions above is with a concatenated triorthogonal code. The relation shown in Figure 5.1 implies that a universal set of fault-tolerant operations can be constructed from only CCZ and H gates. Thus, using triorthogonal codes for computation could be advantageous for circuits that contain large numbers of Toffoli gates. One could also imagine using multiple codes for computation by, for example, teleporting into the code best suited for each logical operation. In this setting, a triorthogonal code could be used to implement the CCZ operation efficiently.

Threshold error rates for triorthogonal codes are largely unknown, though one estimate for the [[15, 1, 3]] […].01 percent per gate for depolarizing noise [CDT09]. Toffoli- and CCZ-type gates have been demonstrated in a number of experimental settings, with fidelities ranging from 68 to 98 percent [MKH+08, MWY+11, FSB+12, RDN+…].

Ironically, while the original motivation for implementing CCZ transversally was to eliminate state distillation, Theorem 5.2.1 also implies an alternative protocol for distillation. Bravyi and Haah have proposed distillation procedures using triorthogonal codes that permit fault-tolerant implementation of the T gate [BH12]. We show that a similar procedure can be used to implement Toffoli gates.

Figure 5.3: A Toffoli state distillation circuit using a triorthogonal code encoding k qubits. Three separate blocks are prepared in the encoded state |+⟩^{⊗k} and then transversal CCZ gates are applied. Conditioned on detecting no errors, each block is decoded and Hadamard gates are applied to each of the target qubits, yielding k Toffoli states.
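The protocol of Figure 5.3 rests on the weight conditions that make a code triorthogonal. As an illustration only, the sketch below checks the j-tuple condition stated earlier in this chapter (every product of j distinct rows of G has even weight, for 2 ≤ j ≤ h) and treats triorthogonality as the case h = 3, following the pairwise and triple-wise conditions of Bravyi and Haah [BH12]. It is applied to the four X stabilizers of the 15-qubit code listed earlier, together with transversal logical X. The helper names are hypothetical, and generating the rows from the binary expansions of the qubit labels is our shortcut; it reproduces the listed X stabilizers.

```python
from itertools import combinations

def support(row):
    """Columns (qubits) on which a 0/1 row vector is nonzero."""
    return {col for col, bit in enumerate(row) if bit}

def all_tuple_products_even(G, h):
    """True if every product of j distinct rows of G, for 2 <= j <= h,
    has even weight (i.e. the rows' common support has even size)."""
    supports = [support(row) for row in G]
    for j in range(2, h + 1):
        for rows in combinations(supports, j):
            if len(set.intersection(*rows)) % 2 == 1:
                return False
    return True

def is_triorthogonal(G):
    """Pairs and triples of distinct rows have even overlap (the h = 3 case)."""
    return all_tuple_products_even(G, 3)

# Qubit q (1..15) belongs to row b exactly when bit b of q is set; for b = 3, 2, 1, 0
# this gives the four weight-8 X stabilizers of the 15-qubit code.
x_stabilizers = [[(q >> b) & 1 for q in range(1, 16)] for b in (3, 2, 1, 0)]
logical_x = [1] * 15                      # transversal logical X (odd weight)
G15 = x_stabilizers + [logical_x]

print(is_triorthogonal(G15))                      # True
print(all_tuple_products_even(x_stabilizers, 4))  # False: the four rows share only qubit 15
```

The last line shows that these rows satisfy the j-tuple condition only up to j = 3, which is exactly the triorthogonal case.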

The Toffoli state is defined as the output of the Toffoli gate on input |+, +, 0⟩, where the third qubit is the target. The circuit in Figure 5.3 uses a [[3k + 8, k, 2]] triorthogonal code and 3k + 8 noisy CCZ gates to produce k Toffoli states with higher fidelity. Note that the Hadamard gates are performed after decoding, and thus the circuit in Figure 5.2 is not required.

To simplify the analysis, we will assume that all Clifford gates can be implemented perfectly. This assumption is justified by the fact that many quantum error-correcting codes admit simple (e.g., transversal) implementations of the Clifford group. Using such a code, we can then reduce the logical error rate per Clifford gate arbitrarily using fault-tolerant protocols for that code. Error-free Clifford gates are conventional for the analysis of state-distillation protocols, though some studies have considered a more complete error model [JYHL12, Bro13].

The circuit in Figure 5.3 is directly adapted from the T-gate distillation protocol of Bravyi and Haah. Their protocol involved only a single code block, but the error analysis can be re-used here directly. Consider a [[3k + 8, k, 2]] code, with k > […], in which each qubit independently suffers a Z error with probability p, after which the X stabilizers are (perfectly) measured and the code block is decoded. Bravyi and Haah show that, conditioned on detecting no errors during stabilizer measurement, the probability of an error on a (logical) qubit after decoding is given by (3k + 1)p^2 to leading order in p.

Figure 5.4: This circuit implements a CCZ gate on the input |abc⟩. Each input qubit is individually teleported onto an ancilla. The CCZ gate commutes through the CNOT controls and can therefore be performed after the CNOT gates. Assuming perfect Clifford operations, the output contains only Z errors.
The scaling in the error comes from counting the number of weight-two logical Z operators that have support on a particular logical qubit, which is equal to 3k + 1.

For Bravyi and Haah, the independent Z errors originate from T gates. Here, the independent Z errors instead originate from CCZ gates. However, the error analysis for a given code block is precisely the same. Given access to CCZ gates that contain Z errors independently with probability p, and conditioned on detecting no errors during stabilizer measurement, the circuit in Figure 5.3 produces k Toffoli states with error rate (3k + 1)p^2 per state, to leading order in p.

Perhaps the most obvious way to obtain CCZ gates containing Z errors with probability p is to use a recursive protocol. At the lowest level of recursion, we may choose to use physical CCZ gates if they are available, or an equivalent circuit composed of Clifford and T gates. These physical CCZ gates may also contain X errors, but it is possible to eliminate X errors by probabilistically applying Clifford gates, a process known as "twirling" [DLT02]. Alternatively, X errors can be eliminated by using gate teleportation; see Figure 5.4. Since CCZ is diagonal, it can be commuted through the control of a CNOT gate. X errors on the CCZ have no impact on the X-basis measurements, and Z errors on the CCZ can lead only to Z errors on the output. The concept here is similar to that of Figure 4.9c for T distillation, except with a three-qubit gate.

Another issue with the recursive protocol is that errors on the k Toffoli states at the output of Figure 5.3 are not independent.
Therefore, two Toffoli states from the same distillation circuit cannot be used together as inputs to a distillation circuit at the next level up. When many Toffoli states are required, as is expected in large-scale quantum algorithms, Toffoli states can be routed appropriately without any waste.

Figure 5.5: A CCZ gate can be implemented by consuming a single Toffoli state [NC00]. The input qubits are teleported into the Toffoli state (enclosed by the dashed line) with Clifford corrections conditioned on the measurement outcomes.
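The Clifford relations used throughout this construction can be checked directly on three-qubit state vectors. The NumPy sketch below is ours, not part of the original analysis. It confirms the identity of Figure 5.1 and that the Toffoli state (Toffoli applied to |+, +, 0⟩) equals CCZ applied to |+, +, +⟩ followed by a Hadamard on the target, which is how the decoded CCZ outputs of Figure 5.3 become Toffoli states. The qubit ordering (third qubit is the target, big-endian basis index) is an assumption of the sketch.

```python
import numpy as np
from functools import reduce

def kron(*ops):
    """Kronecker product of several operators or vectors (first argument = first qubit)."""
    return reduce(np.kron, ops)

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)

# Basis index 4*a + 2*b + c for |a b c>; the third qubit is the target.
CCZ = np.diag([1, 1, 1, 1, 1, 1, 1, -1]).astype(complex)
TOFFOLI = np.eye(8, dtype=complex)
TOFFOLI[[6, 7]] = TOFFOLI[[7, 6]]        # swap |110> and |111>

H_target = kron(I2, I2, H)

# Figure 5.1: conjugating the target of CCZ by Hadamard gives the Toffoli gate.
print(np.allclose(H_target @ CCZ @ H_target, TOFFOLI))  # True

zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Toffoli state: Toffoli |+, +, 0>; equivalently CCZ |+, +, +> followed by H on the target.
toffoli_state = TOFFOLI @ kron(plus, plus, zero)
via_ccz = H_target @ CCZ @ kron(plus, plus, plus)
print(np.allclose(toffoli_state, via_ccz))  # True
```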

We find, however, that a more efficient method is to use the Toffoli distillation protocol due to Eastin [Eas13] and Jones [Jon13d] to implement CCZ gates, and to use the triorthogonal protocol only at the top level. A Toffoli state can be used to implement the CCZ gate with the help of classically controlled Clifford gates, as shown in Figure 5.5. To see how Figure 5.3 can be combined with the protocol of Eastin and Jones, we give the following illustrative example.

Suppose we wish to implement a Toffoli gate with error below 10^{-…}. The procedure of [Jon13d] consumes eight T gates with error rate p to produce a Toffoli state with error rate 28p^2. See Figure 5.6. The T gates can be implemented using a combination of protocols; Table I of [Jon12] lists optimal protocol combinations for a large range of target error rates. If physical T gates can be performed with error at most 10^{-…}, then using the Toffoli construction of [Jon13d], as given, requires on average about 540 T gates.

Alternatively, we could use a [[3k + 8, k, 2]] code […] each triorthogonal code block is even. To leading order, this occurs only if a pair of input Toffoli states contain identical errors. There are seven possible errors on the output states of [Jon13d], each of which is equally likely. Thus, if the input Toffoli states have error p, then to leading order the failure probability of the triorthogonal protocol is given by 7(3k + 1)(p/7)^2 per output Toffoli state. For k = 100, this yields an average T-gate cost of 428.7, a savings of 25% over [Jon13d] alone. Calculations for a range of target error rates are shown in Figure 5.7.

The T-gate count alone is an incomplete measure of the overhead. Indeed, Figure 5.7 shows that the double error-detecting protocol of [Jon13a] usually has a higher T-gate cost than the single error-detecting protocol. However, the double error-detecting protocol can still yield savings, since smaller code distances may be used for Clifford gates in intermediate distillation levels [FDJ13, Jon13d, Jon13a, Jon13c]. Our protocol similarly allows for reduced Clifford gate costs and offers the flexibility to be used recursively or on top of any other Toffoli state distillation protocol, including [Eas13, Jon13d] and [Jon13a]. Complete overhead calculations depend on architectural considerations.

Jones has performed detailed optimizations and resource calculations of various Toffoli constructions for the surface code [Jon13c], though the protocol of Figure 5.3 is not among them. He finds that the single-error detecting circuit of [Eas13, Jon13d] usually requires the smallest total space-time volume. Given the results in Figure 5.7, we expect that triorthogonal distillation performs similarly well in the surface code. The corresponding optimization and volume calculations have not been performed here, however.

Figure 5.6: This circuit prepares a Toffoli state on the top three qubits [Jon13d]. Assume that each T gate fails with probability p and the Clifford gates are perfect. Then, conditioned on a Z-basis measurement outcome of zero, the probability of an error on the output is 28p^2, to leading order in p. The bottom eight qubits can be discarded.

Figure 5.7: The average number of physical T gates (vertical axis) required for three different Toffoli state distillation protocols. For the previous protocols of [Eas13, Jon13d] and [Jon13a], input T gates are first distilled to the appropriate fidelity according to Table I of [Jon12]. The solid black line shows the cost of our protocol for [[3k + 8, k, 2]] codes, where k ≤ 100 has been chosen optimally at each target error rate. Input CCZ gates to the triorthogonal protocol are produced using [Jon13d]. Physical T gates are assumed to have error at most 10^{-…}.

Although state distillation is the most widely used protocol, other methods for achieving universality exist for certain codes. Shor's original proposal used Toffoli states and teleportation to implement Toffoli gates for the class of "doubly-even" codes [Sho96]. However, each Toffoli state was prepared using a verified cat state and a particular four-qubit transversal gate rather than by distillation, which was developed afterwards. Shor's approach was later extended by Gottesman to accommodate any stabilizer code [Got98].

Knill, Laflamme and Zurek showed that T and CNOT are transversal for the [[15, 1, 3]] […] |+⟩, each of the gates in this circuit can be performed transversally.

Figure 5.8: This circuit implements H (up to a global phase) with the help of |+⟩ and X-basis measurement.

This circuit bears a striking resemblance to the gate teleportation circuit used for state distillation in Figure 4.9a. Indeed, the most costly part of Figure 5.8 is the fault-tolerant preparation of the "resource state" |+⟩. One difference in this case, though, is that |+⟩ is a stabilizer state and can be prepared with the methods discussed in Chapter 6. This method for achieving universality has also been used by Bombin and others in topological color codes [BMD07, BCHMD13].

Another alternative has been employed to implement a fault-tolerant T gate in the [[7, 1, 3]] […] TXT† = SX (up to a global phase), of which |A⟩ = T|+⟩ is the +1 eigenstate.
Conditioned on the outcomes of these measurements, an ancilla state is projected onto encoded |A⟩ with high fidelity [AGP06].

Recently, Jochym-O'Connor and Laflamme have proposed a different protocol for universality [JL13]. They concatenate two different codes and use the incomplete set of transversal gates from each one in order to obtain a universal set overall. Their method uses only transversal gates (in a certain sense), but whereas the distance of a concatenated code is typically given by the product d_1 d_2 of the two code distances, they achieve a minimum distance of only min{d_1, d_2}. Thus, while their protocol is conceptually interesting, it is less efficient than ours.

These methods for achieving universality suggest several possible categorizations.

Distillation and teleportation

This category includes traditional T [BK05, MEK13, BH12, Jon12] and Toffoli [Eas13, Jon13d] distillation protocols, as well as the [[15, 1, 3]] Hadamard construction of Figure 5.8.

Cat state projection

Protocols in this category use cat states and transversal gates in order to measure a particular operator of which the desired state is an eigenstate. This includes [Sho96, Got98] and [AGP06].
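For the cat-state approach of [AGP06], the underlying single-qubit algebra is easy to verify. The NumPy sketch below works on one unencoded qubit and does not model the fault-tolerant cat-state measurement itself. It checks that TXT† equals SX up to the global phase e^{-iπ/4}, that |A⟩ = T|+⟩ is its +1 eigenstate, and that the orthogonal state T|−⟩ is its −1 eigenstate, so a measurement of this operator distinguishes the two.

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
S = np.diag([1, 1j])
X = np.array([[0, 1], [1, 0]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)

TXT_dag = T @ X @ T.conj().T

# TXT^dagger equals SX up to the global phase e^{-i pi/4}.
print(np.allclose(TXT_dag, np.exp(-1j * np.pi / 4) * S @ X))  # True

A = T @ plus            # |A> = T|+>
A_perp = T @ minus      # orthogonal partner T|->
print(np.allclose(TXT_dag @ A, A))             # True: +1 eigenstate
print(np.allclose(TXT_dag @ A_perp, -A_perp))  # True: -1 eigenstate
```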

Transversal gates and error correction

This category includes the protocol described in this chapter, and potentially [JL13].

Each protocol, regardless of category, requires preparation of some sort of ancillary state. Even the Hadamard construction described in Section 5.3 requires an ancilla in order to measure the stabilizer generators. Another way to partition universality techniques, therefore, is based on the type of ancilla state that is prepared. One obvious choice is to distinguish the protocols that require only stabilizer states, such as |+⟩ or cat states, from those that require non-stabilizer states, such as |A⟩ or Toffoli states.

Regardless of categorization, though, the most important property of each protocol is the amount of resources required. Transversal gates plus error correction is the simplest of all protocols, but the uncertainty regarding thresholds for triorthogonal codes prevents a more thorough analysis. High thresholds and minimal connectivity requirements suggest that T or Toffoli distillation in the surface code may require fewer resources overall.

Chapter 6
Reducing the overhead of error-correction

This chapter is based on material that appears in [PR12].

We have seen in Chapters 4 and 5 that error-correction circuits are much more complicated than transversal gates. Furthermore, since error correction is also required in distillation circuits, it is the dominant factor in determining a scheme's resource overhead, and is usually the major bottleneck in determining the noise threshold. In particular, the details of how error correction is implemented are more important than the properties of the underlying quantum error-correcting code.

For example, with the nine-qubit Bacon-Shor code, a fault-tolerant logical CNOT gate between two code blocks can be implemented using nine physical CNOT gates, whereas an optimized error-correction method uses 24 physical CNOT gates [AC07]. For larger quantum error-correcting codes, the asymmetry between computation and error correction is greater still.

Larger quantum error-correcting codes, with higher distances and possibly higher rates, can still outperform smaller codes. Separate numerical studies by Steane [Ste03] (see also [Ste07]) and by Cross, DiVincenzo and Terhal [CDT09] have each compared fault-tolerance schemes based on a variety of codes. They identify larger codes that, compared to the [[7, 1, 3]] code, […] |0⟩ and |+⟩ are prepared and used to detect errors on the data. The complexity of ancilla preparation grows quickly with the size of the code, however, and dominates the overall cost of error correction.

In this chapter we present a variety of methods for reducing the cost of ancilla state preparation for CSS codes. Our derivation is based on two main ideas. First, we simplify Steane's Latin-rectangle-based scheme for preparing encoded |0⟩ states [Ste02] by taking advantage of overlaps among the code's stabilizers. Second, we reduce the overall number of encoded ancilla states required for error correction by carefully tracking the exact propagation of errors.

To demonstrate the utility of our approach, we give an optimized fault-tolerant error-correction procedure for the Golay code that uses only 640 CNOT gates (compared to 1177 for a more naive procedure), while also being highly parallelizable. All of our methods are generally applicable to other large quantum error-correcting codes.

Robust preparation of stabilizer states is a key ingredient of both Steane- and Knill-style error-correction protocols. Indeed, preparation of stabilizer states is required for any fault-tolerance scheme based on stabilizer codes in order to prepare logical qubits for computation.

One way to prepare a stabilizer state for an n-qubit code is to prepare any state on n qubits, say |0⟩^{⊗n}.
Then, by measuring each of the stabilizer generators (including the corresponding logical operator), the state is projected onto the one-dimensional subspace that defines the stabilizer state. Steane has proposed an alternative, more compact method for CSS codes [Ste02].

Steane's method involves constructing and solving a partial Latin rectangle based on the stabilizer generators. For simplicity, consider an [[n, 1, d]] CSS code. Let n_X be the number of X stabilizer generators. Then the X stabilizer generators form an n_X × n binary matrix in which the X operators in the tensor product are represented as 1s. Each column represents a (physical) qubit in the code, and each row represents one stabilizer generator. To prepare encoded |0⟩, Gaussian elimination is performed until the matrix is of the form

$$n_X\Bigl\{\;\bigl(\overbrace{\;I\;}^{n_X}\;\;\overbrace{\;A\;}^{n-n_X}\bigr) \qquad (6.1)$$

The first n_X qubits, called "control" qubits, are prepared as |+⟩, and the remaining "target" qubits are prepared as |0⟩. The matrix A is called the redundancy matrix, and represents a partial Latin rectangle, the solution to which is used to schedule rounds of CNOT gates from control to target qubits.

For example, by swapping qubits three and four, the X stabilizers of the [[7, 1, 3]] code […]. To see that this procedure prepares encoded |0⟩, notice that the stabilizer generators of the initial state of the control and target qubits are described by the binary matrix

$$\begin{pmatrix} I_{n_X} & 0 \\ 0 & I_{n_Z+1} \end{pmatrix}, \qquad (6.2)$$

where the first n_X rows are weight-one X generators and the last n_Z + 1 = n − n_X rows are weight-one Z generators. The first n_X qubits are controls and the remaining n_Z + 1 qubits are targets. Let S_i be the operator corresponding to row i, let U be the unitary operation corresponding to the CNOT schedule, and let |ψ⟩ be the initial state. Then U performs the transformation

$$S_i \mapsto U S_i U^\dagger, \qquad |\psi\rangle \mapsto U|\psi\rangle. \qquad (6.3)$$

The operators U S_i U† form an independent set of stabilizers of U|ψ⟩, the first n_X of which are the X stabilizer generators of the code, by construction. The remaining n_Z + 1 operators are also independent stabilizers of U|ψ⟩. Indeed, they form a basis for the (n − n_X)-dimensional subspace orthogonal to the X stabilizers and are therefore equivalent to the Z stabilizer generators and the Z logical operator of the code. Therefore U|ψ⟩ is the encoded state |0̄⟩.

The procedure for encoded |+⟩ is entirely analogous, except that the Z stabilizers are used in place of the X stabilizers, and the roles of control and target are swapped. The procedure can also be generalized to CSS codes that encode multiple qubits.

Steane's Latin rectangle method treats each stabilizer generator independently. However, by taking advantage of similarities between stabilizer generators, it is possible to reduce the number of CNOT gates significantly.
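As a small illustration of Steane's construction, and of the column overlaps that the optimization below exploits, the sketch builds the redundancy matrix A of the [[7, 1, 3]] code in the form (6.1), lists one CNOT per 1 in A, and computes A^⊤A. It assumes a labeling in which column q of the X-stabilizer matrix is the binary expansion of q, so that qubits 1, 2 and 4 become the controls once qubits three and four are swapped; this labeling is our assumption. The sketch does not solve the Latin rectangle, i.e., it does not assign timesteps.

```python
import numpy as np

# Assumed labeling: column q of the Steane X-stabilizer matrix is the binary expansion of q.
qubits = [1, 2, 4, 3, 5, 6, 7]           # column order after swapping qubits three and four
controls, targets = qubits[:3], qubits[3:]
M = np.array([[(q >> b) & 1 for q in qubits] for b in range(3)])

# Form (6.1): the first three columns are an identity; the rest is the redundancy matrix A.
assert np.array_equal(M[:, :3], np.eye(3, dtype=int))
A = M[:, 3:]

# Steane's method: one CNOT from control i to target j for every 1 in A.
cnots = [(controls[i], targets[j])
         for i in range(A.shape[0]) for j in range(A.shape[1]) if A[i, j]]
print(len(cnots))                         # 9

# Column overlaps: diagonal = column weights, off-diagonal = number of shared 1s.
O = A.T @ A
print(O.tolist())  # [[2, 1, 1, 2], [1, 2, 1, 2], [1, 1, 2, 2], [2, 2, 2, 3]]

# Column 6 is contained in column 7 (shared controls 2 and 4), so one CNOT 6 -> 7,
# applied after qubit 6 has received its own CNOTs, replaces CNOTs 2 -> 7 and 4 -> 7.
i6, i7 = targets.index(6), targets.index(7)
shared = [controls[i] for i in range(A.shape[0]) if A[i, i6] and A[i, i7]]
print(shared)                             # [2, 4]
print(len(cnots) - len(shared) + 1)       # 8
```

Because the columns of any Hamming parity-check matrix in the form (I | A) are three weight-two vectors and the all-ones vector, the overlap matrix does not depend on which labeling is chosen, up to the order of qubits 3, 5 and 6.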
Figure 6.1: Preparation of encoded |0⟩ for the [[7, 1, 3]] code. […] An entry t at position (r, c) specifies a CNOT on qubits r and c, controlled by r, in timestep t. (c) An alternative circuit for preparing encoded |0⟩ using one fewer CNOT gate. The new CNOT gate has the same effect as the two removed gates. See Section 6.1.2.

To explain the optimization, consider once again the [[7, 1, 3]] code. […] A^⊤A = O, where A is the redundancy matrix of the X stabilizer generators when expressed in the form (6.1), and A^⊤ is the transpose of A. Entry O(i, j) of this matrix is the number of nonzero entries shared by columns i and j of A, that is, the number of stabilizer generators whose support contains both qubit i and qubit j.

The algorithm proceeds by selecting the set of disjoint pairs {(i_k, j_k)} that yields the largest sum Σ_k O(i_k, j_k), for some k ≤ n_X/2. The overlap between each pair of columns (i, j) is then removed from column j of A, and the process is repeated until no overlaps remain. The schedule of CNOT gates is then obtained from the chosen pairs and the remaining 1s in A, while also accounting for the time-ordering required by the overlap CNOTs.

For example, after swapping columns three and four of the [[7, 1, 3]] X-stabilizer matrix, and labeling the columns of A by qubits 3, 5, 6 and 7, we obtain

$$O = A^{\top}A = \begin{pmatrix} 2 & 1 & 1 & 2 \\ & 2 & 1 & 2 \\ & & 2 & 2 \\ & & & 3 \end{pmatrix}, \qquad (6.4)$$

where the lower triangular entries have been omitted because the matrix is symmetric. Each diagonal entry indicates the weight of the corresponding column, and the off-diagonal entries indicate the overlap between pairs of columns. In this example we see that column seven has overlap two with each of the other three columns. In Figure 6.1c we have chosen to use the overlap between columns six and seven. Alternatively, we could have chosen to use the overlap between columns three and seven or between columns five and seven.

In this case, column seven is the only choice that yields an improvement over Steane's method; overlaps of one yield no net gain. In Section 6.3 we will examine larger codes for which there are more overlaps.

In the asymptotic setting, for arbitrarily large circuits of CNOT gates, the overlap-based method bears resemblance to the algorithm presented in [PMH03]. Both methods exploit similarities across columns (or rows) of a matrix to eliminate CNOT gates. Our method differs in that we use only the redundancy matrix rather than the full n × n linear transformation, and we exploit similarities between columns without first using Gaussian elimination to make the columns identical. In this way, and by making the optimizations by hand, we are usually able to preserve circuit depth.

The most obvious benefit of this method is the reduction in the size of the encoding circuit. For the [[7, 1, 3]] […]

Definition 6.1.1 (Correlated error). Consider an encoding circuit C for a code with distance d. An error e caused by a set of k ≤ ⌊(d − 1)/2⌋ faulty locations in C is correlated if e propagates through C to an error f such that |f| > k. An error that is not correlated is said to be uncorrelated.

Informally, an error is correlated if its weight, modulo the stabilizers, is larger than the number of faulty locations that combined to cause the error. This definition is motivated by the desire for strict fault tolerance (Definition 4.2.3). If each location in the circuit fails with probability p, then an uncorrelated error of weight k occurs with probability O(p^k). Preparation of stabilizer states with small numbers of correlated errors is highly desirable for fault-tolerant error correction, as we shall see in Section 6.4.

For the [[7, 1, 3]] […] three possible correlated errors, each a weight-two error of the form X_i X_j, and there are no weight-three errors. Here the notation X_i indicates an X error on qubit i. In Figure 6.1c, however, there are only two possible correlated errors of this form. A correlated XX error could occur on the final CNOT between qubits six and seven. However, X_6 X_7 is equivalent to one of these two errors modulo a weight-four X stabilizer that contains X_6 X_7. The reduction in the number of correlated errors is fairly modest for this code, but can be substantially larger for other codes.

Stabilizer state preparation can be extended in order to encode an arbitrary state |ψ⟩. One way to prepare an arbitrary state is to use a teleportation protocol due to Knill [Kni04]. The idea here is to prepare an encoded Bell pair and then teleport the (physical) input state |ψ⟩ into the encoding. See Figure 6.2. The circuit requires two encoded stabilizer states, |0⟩ and |+⟩, plus some additional Clifford operations.

A more efficient alternative, however, is to use just the encoded |0⟩ preparation circuit and a controlled version of the logical X operator, as shown in Figure 6.3.
Let $U$ be the unitary operation implemented by the $|0\rangle$ encoding circuit, and consider the operator
$$X'_L = U^\dagger X_L U \qquad (6.5)$$
obtained by propagating logical $X$ from the output through $U$ to the input. In Figure 6.3 we take one of the input $|0\rangle$ qubits with support on $X'_L$ and replace it with $|\psi\rangle$. Let $\tilde{X}_L$ be the part of $X'_L$ that does not have support on this qubit. We then perform $\tilde{X}_L$, controlled by $|\psi\rangle$. Assuming that the encoding circuit contains only CNOT gates, $X'_L$ is a tensor product of $X$ and $I$, and so the controlled operation can be accomplished with CNOT gates. Finally, implementing the circuit for encoded $|0\rangle$ (using either Steane's method or by exploiting overlaps) outputs an encoded version $|\overline{\psi}\rangle$ of the input state. Here we have assumed a single-qubit state $|\psi\rangle$, though the procedure can be adapted to multiple qubits.

Figure 6.2: Encoding of an arbitrary state $|\psi\rangle$ by teleportation [Kni04]. An encoded Bell pair is constructed by preparing the stabilizer states $|\overline{0}\rangle$ and $|\overline{+}\rangle$ and coupling them with a CNOT. One half of the Bell pair is then decoded, and the decoded half is used in a Bell measurement to teleport the input state $|\psi\rangle$ onto the encoded half of the Bell pair.

To see that this works, we examine the effect of the circuit on each of the basis states of $|\psi\rangle = a|0\rangle + b|1\rangle$. Let $\Lambda X$ be the controlled-$\tilde{X}_L$ operation, $U$ the unitary corresponding to the encoding circuit, and $|\phi\rangle$ the state of the $n-1$ qubits other than $|\psi\rangle$. We need to show that
$$U\,\Lambda X\,|\phi\rangle\,(a|0\rangle + b|1\rangle) = a|\overline{0}\rangle + b|\overline{1}\rangle. \qquad (6.6)$$
We examine the two basis states $|0\rangle$ and $|1\rangle$ separately; the result then follows by linearity. The case $|\psi\rangle = |0\rangle$ is immediate: the controlled-$\tilde{X}_L$ gate does not activate, and we obtain $U|\phi\rangle|0\rangle = |\overline{0}\rangle$ by construction.

Now consider the case $|\psi\rangle = |1\rangle$. Since the control activates, the circuit is equivalent to setting $|\psi\rangle$ to $|0\rangle$, applying $X'_L$, and then applying $U$. That is,
$$U\,\Lambda X\,|\phi\rangle|1\rangle = U X'_L |\phi\rangle|0\rangle. \qquad (6.7)$$
Using (6.5), we then obtain
$$U X'_L |\phi\rangle|0\rangle = X_L U |\phi\rangle|0\rangle = X_L |\overline{0}\rangle = |\overline{1}\rangle. \qquad (6.8)$$

Note that when using Steane's Latin rectangle construction, $X'_L = X_L$. Any $X$ component of $X_L$ on a control qubit can be removed by multiplying by a stabilizer, so we may assume that all of the qubits on which $X_L$ has support are targets of CNOT gates.

Figure 6.3: Encoding of an arbitrary single-qubit state $|\psi\rangle$ without resorting to teleportation. Qubits are prepared as in the encoding procedure for encoded $|0\rangle$, except that one of the $|0\rangle$ inputs is replaced by $|\psi\rangle$. Controlled on $|\psi\rangle$, the logical $X$ operator is conditionally applied. Here $\tilde{X}_L$ indicates the part of logical $X$ with support disjoint from $|\psi\rangle$. Implementing the encoding circuit for $|0\rangle$ then yields the encoded state $|\overline{\psi}\rangle$.
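As a numerical sanity check of (6.6), the Python sketch below simulates the construction of Figure 6.3 for the [[7,1,3]] code with a plain statevector. The standard-form stabilizer generators, the 0-based qubit labelling, the choice of qubit 2 as the $|\psi\rangle$ input, and the logical-$X$ representative $X_2X_4X_5$ (supported only on CNOT targets, so that $X'_L = X_L$ as in the Latin rectangle case) are all assumptions made for illustration; they are not read off the thesis figures.

```python
import itertools
import numpy as np

N = 7  # physical qubits of the [[7,1,3]] code, labelled 0..6

# Assumed standard-form X-stabilizer generators (Hamming parity checks),
# written as pivot -> other qubits. Each pivot appears in exactly one generator.
GENERATORS = {0: [2, 4, 6], 1: [2, 5, 6], 3: [4, 5, 6]}
# Logical-X representative supported only on CNOT targets, so X'_L = X_L.
LOGICAL_X = [2, 4, 5]
PSI_QUBIT = 2  # the |0> input on the support of X'_L that is replaced by |psi>


def apply_cnot(state, control, target):
    """Apply a CNOT to a statevector; qubit 0 is the most significant bit."""
    cmask, tmask = 1 << (N - 1 - control), 1 << (N - 1 - target)
    out = state.copy()
    for idx in range(len(state)):
        if idx & cmask:
            out[idx] = state[idx ^ tmask]
    return out


def encode(a, b):
    """Figure 6.3: controlled-X~_L from |psi>, followed by the |0> encoder U."""
    plus, zero = np.array([1, 1]) / np.sqrt(2), np.array([1, 0])
    inputs = [plus if q in GENERATORS else zero for q in range(N)]
    inputs[PSI_QUBIT] = np.array([a, b])
    state = np.array([1.0 + 0j])
    for single in inputs:
        state = np.kron(state, single)
    for q in LOGICAL_X:  # X~_L: the part of X'_L off the |psi> qubit
        if q != PSI_QUBIT:
            state = apply_cnot(state, PSI_QUBIT, q)
    for pivot, targets in GENERATORS.items():  # the encoding unitary U
        for t in targets:
            state = apply_cnot(state, pivot, t)
    return state


def logical_state(shift):
    """Uniform superposition over the X-stabilizer row space, offset by `shift`."""
    rows = [[1 if q in (pivot, *targets) else 0 for q in range(N)]
            for pivot, targets in GENERATORS.items()]
    vec = np.zeros(2 ** N, dtype=complex)
    for coeffs in itertools.product([0, 1], repeat=len(rows)):
        bits = list(shift)
        for c, row in zip(coeffs, rows):
            if c:
                bits = [x ^ y for x, y in zip(bits, row)]
        vec[int("".join(map(str, bits)), 2)] += 1
    return vec / np.linalg.norm(vec)


zero_bar = logical_state([0] * N)
one_bar = logical_state([1 if q in LOGICAL_X else 0 for q in range(N)])
a, b = 0.6, 0.8j
print(np.allclose(encode(a, b), a * zero_bar + b * one_bar))
```

Because every gate is a CNOT, each computational basis input is mapped to a single basis string, which is why the check reduces to comparing against uniform superpositions over a coset of the stabilizer row space.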
Pauli $X$ commutes through a CNOT target, and therefore $X_L$ commutes through a Latin rectangle circuit. For overlap-based circuits, the operator $X'_L$ may be somewhat different, but it will still be a tensor product of $X$ and $I$.

Circuits of the form given by Figure 6.3 are typically used for state distillation in topological codes [FMMC12], where the code in question is the [[7,1,3]] code (for the $S$ gate) or the [[15,1,3]] code (for the $T$ gate). The overlap-based optimizations given in Section 6.1.2 therefore suggest that such distillation circuits could be improved, particularly Figure 4.9c. In Section 6.3.1 we show this optimization explicitly.

The [[7,1,3]] …

For our first example, we examine the [[15,1,3]] code. It has four $X$ stabilizer generators, each of weight eight, so a Latin rectangle encoding circuit for encoded $|0\rangle$ has size $4 \times 7 = 28$ and depth seven. There are ten $Z$ stabilizer generators. In (5.4) there are four generators of weight eight and six of weight four. However, using Gaussian elimination we can obtain the following presentation, in which each generator has weight four:

Z · · · · · Z · · · Z · Z · ·
· Z · · · · Z · · · Z · · Z ·
· · Z · · · Z · · · Z · · · Z
· · · Z · · Z · · · · · Z Z ·
· · · · Z · Z · · · · · Z · Z
· · · · · Z Z · · · · · · Z Z
· · · · · · · Z · · Z · Z Z ·
· · · · · · · · Z · Z · Z · Z
· · · · · · · · · Z Z · · Z Z
· · · · · · · · · · · Z Z Z Z      (6.9)

The corresponding Latin rectangle circuit for encoded $|+\rangle$ then has size $10 \times 3 = 30$. Its depth is at least six, the maximum weight of a column of (6.9).

By exploiting overlaps between pairs of generators, as described above, we construct the circuits shown in Figure 6.4. The circuit for encoded $|0\rangle$ has size 22 and the circuit for $|+\rangle$ has size 25, reductions of roughly 21% and 17%, respectively (equivalently, the Latin rectangle circuits are about 27% and 20% larger). The depth of both circuits is seven, so the depth of the $|+\rangle$ circuit has actually increased relative to the Latin rectangle circuit. The extra timestep is necessary to exploit overlaps between two weight-six columns.

As an immediate consequence of Figure 6.4a, the gate cost of state distillation can be decreased. This circuit can be substituted for the bulk of the CNOT gates in Figure 4.9 using the protocol discussed in Section 6.2. Additional savings can be obtained by noting that not all of the qubits need to be prepared at the beginning of the circuit. For example, qubit 15 is not required until timestep six.

Thorough analysis of the resource savings requires choosing another error-correcting code for computation and specifying any geometric constraints. The standard [[15,1,3]] …
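The counts quoted for (6.9) follow mechanically from the generator matrix. The Python sketch below uses the rows transcribed from (6.9) as 1-indexed qubit positions (an assumption if the transcription differs from the original typeset matrix), checks the generator weights, the Latin rectangle size, the column-weight depth bound and the rank, and then computes both the relative reduction and the relative excess for the Figure 6.4 circuit sizes. The helper `gf2_rank` is ours.

```python
from collections import Counter

# Z generators of (6.9): 1-indexed positions of Z in each weight-four row.
Z_GENERATORS = [
    (1, 7, 11, 13), (2, 7, 11, 14), (3, 7, 11, 15), (4, 7, 13, 14),
    (5, 7, 13, 15), (6, 7, 14, 15), (8, 11, 13, 14), (9, 11, 13, 15),
    (10, 11, 14, 15), (12, 13, 14, 15),
]


def gf2_rank(masks):
    """Rank over GF(2) of integer bitmasks, via an incremental XOR basis."""
    basis = []
    for v in masks:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)


weights = [len(g) for g in Z_GENERATORS]
column_weights = Counter(q for g in Z_GENERATORS for q in g)
latin_size = sum(w - 1 for w in weights)       # one CNOT per non-pivot qubit
depth_bound = max(column_weights.values())     # a qubit in 6 rows needs 6 steps
rank = gf2_rank([sum(1 << (q - 1) for q in g) for g in Z_GENERATORS])
print(weights, latin_size, depth_bound, rank)

for latin, overlap in [(28, 22), (30, 25)]:
    print(f"{1 - overlap / latin:.1%} fewer CNOTs; Latin is {latin / overlap - 1:.0%} larger")
```

The two percentages answer different questions: the first is the saving relative to the Latin rectangle circuit, the second is how much larger the Latin rectangle circuit is than the overlap circuit.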

Figure 6.4: Optimized encoding circuits for the [[15,1,3]] code, with rounds (1) through (7). (a) Encoded $|0\rangle$: the encoding circuit requires 22 CNOT gates and seven rounds. (b) Encoded $|+\rangle$: the encoding circuit requires 25 CNOT gates and seven rounds. Gates in the same round are applied in parallel.

Next we consider the family of Bacon-Shor codes [Bac06]. For a fixed $n$, this code family uses $n^2$ physical qubits to encode one logical qubit to a distance of $n$ and $(n-1)^2$ logical qubits to a distance of two. Usually, only the single distance-$n$ qubit is used and the state of the remaining "gauge" qubits is ignored. In this case the code is treated as $[[n^2, 1, n]]$.

The qubits of this code can be laid out as an $n \times n$ square lattice. In this geometry, the stabilizer generators can be expressed in a particularly simple form. The $X$ stabilizer generators correspond to neighboring pairs of rows, and the $Z$ stabilizers correspond to neighboring pairs of columns. Following [AC07], for each row $j$ let $X_{j,*}$ be the operator that acts as a tensor product of Pauli $X$ on the qubits of row $j$ and acts trivially elsewhere. Similarly, for each column $j$ let $Z_{*,j}$ be the operator that acts as a tensor product of Pauli $Z$ on column $j$. Then the stabilizer generators of the code are given by
$$\{X_{j,*}X_{j+1,*},\; Z_{*,j}Z_{*,j+1} \mid j \in [n-1]\}. \qquad (6.10)$$
When presented in this way, we immediately see that each generator has weight $2n$ and, except for $X_{1,*}X_{2,*}$ and $X_{n-1,*}X_{n,*}$ (and, respectively, $Z_{*,1}Z_{*,2}$ and $Z_{*,n-1}Z_{*,n}$), overlaps with two other generators on exactly $n$ qubits. To take advantage of these overlaps, however, we prefer to present the generators in a different way.
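The overlap structure claimed for (6.10) is easy to confirm for a concrete lattice size. This Python sketch builds the generator supports as sets, assuming the row-major labelling $r \cdot n + c$ for the qubit in row $r$, column $c$ (0-indexed); the function names are ours.

```python
def bacon_shor_generators(n):
    """Supports of the X and Z generators in (6.10) for an n x n lattice.

    Qubit (row r, column c) is labelled r * n + c, with r, c in range(n);
    this labelling is an assumption of the sketch.
    """
    rows = [{r * n + c for c in range(n)} for r in range(n)]
    cols = [{r * n + c for r in range(n)} for c in range(n)]
    x_gens = [rows[j] | rows[j + 1] for j in range(n - 1)]
    z_gens = [cols[j] | cols[j + 1] for j in range(n - 1)]
    return x_gens, z_gens


def overlaps(gens):
    """For each generator, the sizes of its nonempty overlaps with the others."""
    return [[len(g & h) for k, h in enumerate(gens) if k != i and g & h]
            for i, g in enumerate(gens)]


x_gens, z_gens = bacon_shor_generators(5)
print([len(g) for g in x_gens])  # [10, 10, 10, 10]: weight 2n
print(overlaps(x_gens))          # [[5], [5, 5], [5, 5], [5]]: overlaps of size n
print(all(len(x & z) % 2 == 0 for x in x_gens for z in z_gens))  # True: X and Z commute
```

The first and last generators each have a single neighbor, matching the two exceptions noted in the text.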
Consider the product of the last two $X$ generators,
$$X_{n-2} := (X_{n-2,*}X_{n-1,*})(X_{n-1,*}X_{n,*}) = X_{n-2,*}X_{n,*}.$$
This operator has support on rows $n-2$ and $n$. Setting $X_{n-1} := X_{n-1,*}X_{n,*}$, we may similarly define operators
$$X_j := (X_{j,*}X_{j+1,*})\,X_{j+1} = X_{j,*}X_{n,*}. \qquad (6.11)$$
The set $\{X_j \mid j \in [n-1]\}$ forms an alternate basis of $X$ stabilizer generators for the code. Each column of the generator matrix has weight one, except for the last $n$ columns, which each have weight $n-1$. For example, the $X$ generators for the case $n = 4$ are given by

X X X X · · · · · · · · X X X X
· · · · X X X X · · · · X X X X
· · · · · · · · X X X X X X X X      (6.12)

The weight-one columns can be filled in using a total of $(n-1)^2$ CNOT gates, and one of the last $n$ columns can be filled in using $n-1$ additional CNOTs. The remaining block of $(n-1)^2$ $X$s can then be filled, using overlaps, with $n-1$ CNOTs. The corresponding circuit prepares logical $|0\rangle$ on each of the encoded qubits (including the gauge qubits) using $(n-1)^2 + 2(n-1) = (n-1)(n+1)$ CNOTs; see Figure 6.5. By obtaining a similar presentation of the $Z$ generators, encoded $|+\rangle$ can be prepared across all logical qubits for the same cost.

From Figure 6.5 we see that the circuit consists of $n-1$ cat state preparations, plus another circuit that also resembles a cat state. A cat state can be prepared in depth $\lceil \log_2 n \rceil$ using a tree-like sequence of CNOT gates, and so the entire circuit can be implemented in depth $n + 2\lceil \log_2 n \rceil - 1$ ($\lceil \log_2 n \rceil$ for the cat states, $n-1$ sequential CNOTs into the shared column, and $\lceil \log_2 n \rceil$ for the final fan-out). Alternatively, following [AC07], if the gauge qubits are prepared in $|+\rangle$ rather than $|0\rangle$, the encoded $|0\rangle$ state (on the distance-$n$ qubit) can be expressed as a tensor product of $n$ cat states $(|0\rangle^{\otimes n} + |1\rangle^{\otimes n})/\sqrt{2}$, breaking the coupling required in Figure 6.5. Thus, if we are unconcerned with the state of the gauge qubits, encoded $|0\rangle$ can be prepared using only $n(n-1)$ CNOTs and $\lceil \log_2 n \rceil$ timesteps.

Both Figure 6.5 and the cat state method of [AC07] compare favorably to the Latin rectangle method. The Latin rectangle method requires each of the rows to be filled separately, yielding $(n-1)(2n-1)$ CNOT gates and a depth of $2n-1$. The overlap and cat state circuits beat this by roughly a factor of two in size. Statistics for all three encoding methods are shown in Table 6.1.

This example also illustrates why exploiting stabilizer overlaps reduces the number of correlated errors produced by the encoding circuit, compared to the Latin rectangle method. Reichardt has observed that the correlated errors in a Latin rectangle circuit can be characterized in a systematic way [Rei06a]. Consider a single $X$ stabilizer generator of weight $m$. Ignoring the qubits on which this generator acts trivially, the circuit for this generator is of the form (6.13): a control qubit prepared in $|+\rangle$, followed by CNOT gates from that qubit to each of the other $m-1$ qubits, which are prepared in $|0\rangle$.

Figure 6.5: This circuit prepares logical $|0\rangle$ on each of the qubits (including gauge qubits) of an $n^2$-qubit Bacon-Shor code. For visual clarity, each of the $n$ boxed subcircuits uses CNOTs from the same control qubit. Alternate but equivalent subcircuits can be implemented in depth $\lceil \log_2 n \rceil$.

Method          Size (CNOTs)       Depth
Latin rect.     $(n-1)(2n-1)$      $2n-1$
Cat state       $n(n-1)$           $\lceil \log_2 n \rceil$
Overlap         $(n-1)(n+1)$       $n + 2\lceil \log_2 n \rceil - 1$

Table 6.1: Size and depth of circuits for preparing encoded $|0\rangle$ or $|+\rangle$ for an $[[n^2, 1, n]]$ Bacon-Shor code. Row one shows the Latin rectangle method due to [Ste02], row two the cat state method due to [AC07], and row three the overlap method from Section 6.1.2. The cat state method of [AC07] prepares $|0\rangle$ on the distance-$n$ logical qubit and $|+\rangle$ on each of the gauge qubits, whereas the overlap method prepares $|0\rangle$ on each of the gauge qubits.

Next consider the $X$ errors that can occur as a result of a single faulty gate in the circuit (6.13). Pauli $X$ errors on target qubits do not propagate and are uncorrelated. Any correlated $X$ error must have support on the first qubit and some consecutive sequence of qubits $\{j, \ldots, m\}$ for $j > 1$. Up to multiplication by the stabilizer, $X_1 X_2 \cdots X_m$ is trivial and $X_1 X_3 \cdots X_m$ has weight one. Thus, there are exactly $m-2$ unique correlated errors that occur with first-order probability.

Of course, the stabilizer generators of the entire code are not disjoint, and so the total number of first-order correlated errors is more complicated to compute. However, in the case of the Bacon-Shor code the intersections between stabilizers are particularly simple and do not affect the analysis. An $n^2$-qubit Bacon-Shor code has $n-1$ $X$ generators, each of weight $2n$, and so a Latin rectangle encoding circuit will contain $(n-1)(2n-2)$ correlated $X$ errors to first order.

The situation for the overlap-based circuit is somewhat different. From Figure 6.5 we see that the encoding circuit contains $n$ subcircuits of the same form as (6.13), each with $m = n$. Each of these subcircuits can produce $n-2$ correlated $X$ errors from a first-order fault. The extra CNOT gates that span the circuit add another $n-1$ such correlated errors. Thus the entire circuit can produce $n(n-2) + (n-1) = n(n-1) - 1$ correlated $X$ errors, roughly half of the number of correlated $X$ errors produced by a corresponding Latin rectangle circuit.
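The overlap construction behind Figure 6.5 can be checked by propagating stabilizers through the CNOT list. The Python sketch below assumes the row-major labelling $r \cdot n + c$, takes the pivot of generator $j$ to be the first qubit of row $j$ (prepared in $|+\rangle$), and uses the first qubit of the last row as the shared column; these choices, and all function names, are ours. It verifies that the prepared $X$-type stabilizers are exactly the alternate basis (6.11), that they span the same space as (6.10), and that the circuit uses $(n-1)(n+1)$ CNOTs. Only the $X$-type stabilizers are checked; a full check of the prepared state would also track the $Z$ side.

```python
def gf2_rank(masks):
    """Rank over GF(2) of integer bitmasks, via an incremental XOR basis."""
    basis = []
    for v in masks:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def row_mask(n, r):
    """Bitmask of the n qubits in lattice row r (label r * n + c)."""
    return sum(1 << (r * n + c) for c in range(n))


def overlap_circuit(n):
    """Pivots and (control, target) CNOT list for the Figure 6.5 construction."""
    def q(r, c):
        return r * n + c
    cnots = [(q(j, 0), q(j, c)) for j in range(n - 1) for c in range(1, n)]  # cat states
    cnots += [(q(j, 0), q(n - 1, 0)) for j in range(n - 1)]     # fill the shared column
    cnots += [(q(n - 1, 0), q(n - 1, c)) for c in range(1, n)]  # copy it along the last row
    return [q(j, 0) for j in range(n - 1)], cnots


def propagate_x(pivots, cnots):
    """Track the X_pivot stabilizers of the |+> inputs; a CNOT maps X_c to X_c X_t."""
    paulis = [1 << p for p in pivots]
    for c, t in cnots:
        paulis = [x ^ (1 << t) if (x >> c) & 1 else x for x in paulis]
    return paulis


for n in range(2, 8):
    pivots, cnots = overlap_circuit(n)
    prepared = propagate_x(pivots, cnots)
    code_x = [row_mask(n, j) ^ row_mask(n, j + 1) for j in range(n - 1)]     # (6.10)
    alt_basis = [row_mask(n, j) ^ row_mask(n, n - 1) for j in range(n - 1)]  # (6.11)
    assert prepared == alt_basis
    assert gf2_rank(code_x) == gf2_rank(code_x + prepared) == n - 1
    assert len(cnots) == (n - 1) * (n + 1)
    print(n, {"overlap": len(cnots), "latin": (n - 1) * (2 * n - 1), "cat": n * (n - 1)})
```

The printed sizes reproduce the Size column of Table 6.1 for small $n$.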

Figure 6.6: An optimized circuit for preparing $|0\rangle$ encoded in the Golay code uses 57 CNOT gates applied in seven rounds, (1) through (7). Gates in the same round are applied in parallel.

In our final example, we construct circuits for encoding $|0\rangle$ in the 23-qubit Golay code. The Golay code has 11 $X$ stabilizer generators, each of weight eight:

· X · · X · · X X X X X · · · · · · · · · · X
X · · X · · X X X X X · · · · · · · · · · X ·
· X X · X X X · · · X X · · · · · · · · X · ·
X X · X X X · · · X X · · · · · · · · X · · ·
X X X X · · · X · · X X · · · · · · X · · · ·
X · X · X · X X X · · X · · · · · X · · · · ·
· · · X X X X · X X · X · · · · X · · · · · ·
· · X X X X · X X · X · · · · X · · · · · · ·
· X X X X · X X · X · · · · X · · · · · · · ·
X X X X · X X · X · · · · X · · · · · · · · ·
X · X · · X · · X X X X X · · · · · · · · · ·      (6.14)

The $Z$ stabilizers are entirely symmetric: they are given by the same matrix with $X$ replaced by $Z$. The logical $X$ and $Z$ operators correspond to transversal $X$ and transversal $Z$, respectively.

Latin rectangle circuits for $|0\rangle$ use $11 \times 7 = 77$ CNOT gates and seven time steps. The overlap-optimized circuit for $|0\rangle$ also has depth seven but uses only 57 CNOT gates, a savings of about 26% (equivalently, the Latin rectangle circuit uses about 35% more CNOTs); see Figure 6.6. Since the $X$ and $Z$ stabilizers of the Golay code are symmetric, $|+\rangle$ can be prepared from the circuit for $|0\rangle$ by taking the dual circuit in the standard way.

(a) Overlap
X-error weight:    2     3     4     5    6    7
Order 1:          16    14     4     0    0    0
Order 2:           -   493   400    35    2    0

(b) Latin rectangle
X-error weight:    2     3     4     5    6    7
Order 1:          22    22    11     0    0    0
Order 2:           -   848   718    73    3    0

Table 6.2: Correlated $X$ error counts for circuits encoding $|0\rangle$ in the Golay code. (a) Correlated $X$ error counts for the overlap-optimized circuit in Figure 6.6. (b) Correlated $X$ error counts for a Latin rectangle encoding circuit (not shown).

By reducing the number of CNOT gates, this circuit also reduces the number of correlated errors. For example, single failures in the Latin rectangle encoding circuit can cause 22 distinct weight-two errors, but single failures in Figure 6.6 can cause only 16. The contrast for second-order faults is even larger: there the overlap-optimized circuit produces roughly half as many correlated errors. The correlated error counts for first and second order are shown in Table 6.2.

We briefly note that the overlap method, and the circuit in Figure 6.6 in particular, may not be optimal. Indeed, there are equivalent circuits with fewer CNOT gates. However, Figure 6.6 is the smallest circuit we found that also preserves depth.

None of the stabilizer state preparation circuits shown thus far are fault tolerant. A single physical fault may lead to errors on multiple qubits. For example, an $XX$ error on the final CNOT of Figure 6.1c leaves the weight-two error $X_6X_7$. The code is limited by its distance and cannot necessarily protect against such correlated errors. As a result, the ancilla states themselves must be checked for errors. The primary task of fault-tolerant ancilla preparation, then, is to prevent errors in the preparation circuit from spreading through the ancilla block.

One way to check for errors, which is particularly useful for large CSS codes, is to use a Steane-style error-detection circuit. To check for $X$ errors, a second encoded ancilla is prepared as $|+\rangle$ and a transversal CNOT is used to copy errors from the first ancilla to the second, as shown in Figure 6.7a. If the $Z$-basis measurement implies the presence of an error, then the ancilla is discarded and the process begins again. To check for $Z$ errors, we instead prepare encoded $|0\rangle$ and swap the control and target of the CNOT. However, correlated $X$ errors that occur during preparation of $|0\rangle$ can propagate through the CNOT to the original ancilla. To prevent this, we first check the $|0\rangle$ state for $X$ errors and then use it for $Z$-error detection, as in Figure 6.7b. Again, if an error is detected, the ancilla is discarded.

Figure 6.7: First-order verification circuits. (a) $X$-error verification: $X$ errors are copied onto the encoded $|+\rangle$ ancilla and then detected by the $Z$-basis measurement. (b) $Z$-error verification: an encoded $|0\rangle$ ancilla is first checked for $X$ errors in order to prevent $X$ errors from spreading to the top qubit. Then $Z$ errors are copied from the top qubit and detected by the $X$-basis measurement.

The circuits in Figure 6.7 are sufficient to detect correlated errors up to first order. But for high-distance codes we desire verification up to order $t = \lfloor (d-1)/2 \rfloor$. Higher-order verification can be accomplished by using additional and more complex hierarchical error-detection circuits. In general, (t+1)t+1 encoded ancillas are sufficient to produce a single ancilla verified to order $t$. For example, use $t$ $X$-error verifications, followed by $t$ $Z$-error verifications in which each encoded $|0\rangle$ ancilla has been verified using an additional $t$ $X$-error verifications. The total overhead required to prepare a fault-tolerant ancilla also depends on the probability that any errors are detected.

To maximize efficiency, preparation and verification circuits may be constructed using a pipeline architecture in which part of the computer is dedicated to preparing many ancillas in parallel. Even so, ancilla production constitutes the majority of the space requirement for a fault-tolerant quantum circuit. In [IWPK08], for example, the ancilla pipeline is estimated to take up to 68 percent of the entire circuit footprint.

One of the reasons that a hierarchical verification structure is required is that identically prepared stabilizer states produce identical sets of correlated errors.
For example, say that two encoded ancillas are identically prepared. Assume that a single failure occurs in the first ancilla and propagates through the preparation circuit to produce a weight-three error. Then the same single failure in the other ancilla will produce the same weight-three error. When the error from the first ancilla is copied to the second, the two errors will cancel each other and no error will be detected. This is a second-order event that results in a weight-three error.

However, DiVincenzo and Aliferis [DA07] have observed that different preparation circuits exhibit different error propagation behavior, and this can be exploited. Intuitively, if the sets of errors produced by two different preparation circuits are sufficiently different, only a small number of errors will cancel out at each verification, and fewer verification steps will be required overall. Therefore, we seek to prepare encoded ancillas that produce different correlated error sets. In the next section we analyze the correlated errors produced by preparation circuits for the Golay code, as well as randomized methods for finding ancillas with different correlated error sets.

Since the circuits, and therefore the correlated errors, differ depending on the error-correcting code employed, the verification circuits that can be obtained by mixing preparation circuits will also differ. The most concrete way to show the benefits of this technique is with an example. In this section we consider the 23-qubit Golay code. The Golay code is an illustrative example because it has relatively large distance but is small enough for manual inspection. Furthermore, estimates show that the Golay code has a fairly high threshold. The examples discussed here will also be used in Chapter 7 to prove a lower bound on the threshold for the Golay code.

For the Golay code, the standard recursive verification technique requires twelve encoded ancillas and at least 1177 CNOT gates ($12 \times 77$ for preparation plus $11 \times 23$ for the transversal CNOTs). One such circuit is shown in Figure 6.8.
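The cancellation argument can be reproduced with the Golay stabilizers themselves. The Python sketch below takes the $X$ generators transcribed from (6.14) (an assumption if the transcription differs from the typeset matrix) and models one $X$-error check against a $|0\rangle$-encoded checker: the transversal CNOT adds the first ancilla's error to the second's, and the $Z$-basis measurement flags anything that is not an $X$ stabilizer. The error patterns are hypothetical examples, and the helper names are ours.

```python
# X-stabilizer generators of the Golay code from (6.14), as 1-indexed positions.
GOLAY_ROWS = [
    (2, 5, 8, 9, 10, 11, 12, 23), (1, 4, 7, 8, 9, 10, 11, 22),
    (2, 3, 5, 6, 7, 11, 12, 21), (1, 2, 4, 5, 6, 10, 11, 20),
    (1, 2, 3, 4, 8, 11, 12, 19), (1, 3, 5, 7, 8, 9, 12, 18),
    (4, 5, 6, 7, 9, 10, 12, 17), (3, 4, 5, 6, 8, 9, 11, 16),
    (2, 3, 4, 5, 7, 8, 10, 15), (1, 2, 3, 4, 6, 7, 9, 14),
    (1, 3, 6, 9, 10, 11, 12, 13),
]


def mask(*qubits):
    """Bitmask with bit q - 1 set for each 1-indexed qubit q."""
    return sum(1 << (q - 1) for q in qubits)


STABILIZERS = [mask(*row) for row in GOLAY_ROWS]


def gf2_rank(masks):
    """Rank over GF(2) of integer bitmasks, via an incremental XOR basis."""
    basis = []
    for v in masks:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def error_detected(e1, e2):
    """X-error check of ancilla 1 (error e1) against ancilla 2 (error e2).

    The check passes only if e1 ^ e2 lies in the span of the X stabilizers.
    """
    return gf2_rank(STABILIZERS + [e1 ^ e2]) > gf2_rank(STABILIZERS)


print(all(bin(a & b).count("1") % 2 == 0 for a in STABILIZERS for b in STABILIZERS))  # True
same = mask(1, 2, 3)  # hypothetical weight-three correlated error
print(error_detected(same, same))                    # False: identical errors cancel
print(error_detected(same, mask(4, 5, 6)))           # True: different errors are caught
print(error_detected(same, same ^ STABILIZERS[0]))   # False: equal up to a stabilizer
```

The first check (every pair of rows overlaps evenly) is what allows the same matrix to serve for both $X$ and $Z$ stabilizers.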
Variants of this circuit have been used in previous studies of the Golay code, including [Ste03] and [CDT09]. By considering many different preparation circuits, we find that the number of ancillas can be significantly reduced. We now outline two methods that produce circuits of the form shown in Figure 6.9, requiring only four encoded $|0\rangle$ ancillas and as few as 297 CNOT gates ($4 \times 57$ for preparation plus $3 \times 23$ for the transversal CNOTs).

Figure 6.8: This circuit produces a single Golay-encoded $|0\rangle$ state that is ready to be used in fault-tolerant error correction. Each of the twelve encoded $|0\rangle$ ancillas, denoted $|\overline{0}\rangle$, is identically prepared using the Steane Latin rectangle method (see Section 6.4.2). The wires represent 23-qubit code blocks, and the indicated CNOT and measurement operations are transversal.

Figure 6.9: Our simplified ancilla preparation and verification circuit uses only four encoded $|0\rangle$ ancillas. The ancillas are prepared using different encoding circuits, shown in Figure 6.6 and Table 6.5, and also in Table 6.4.

Randomized method for preparing encoded $|0\rangle$

An $X$ error in the preparation circuit can propagate to other qubits only if it occurs on a control qubit, and then only through the $X$ stabilizer being created from that control qubit. Thus single faults can create up to 22 weight-two errors (for each of the eleven $X$ stabilizers, either $IIIIIIXX$ or $IIXXXXXX \sim XXIIIIII$), 22 weight-three errors and eleven weight-four errors ($IIIIXXXX$ for each stabilizer).

Weight:                0    1     2      3      4     5    6    7
Number of X errors:    1   23   253   1771   1771   253   23    1
Number of Z errors:    1   23   253   1771      0     0    0    0

Table 6.3: The number of errors on Golay-encoded $|0\rangle$, by Hamming weight. All $Z$ errors are correctable, so there are no $Z$ errors of weight greater than three.

A single $X$ fault, i.e., a fault resulting in an $X$ error, cannot break the verification circuit in Figure 6.9: if it creates a correlated error on the first ancilla, that error will be detected on the second ancilla, and both will be discarded. Four or more $X$ faults also cannot break the verification circuit, because we only seek fault tolerance up to order three.

Two $X$ faults can break the verification circuit only if there is one failure in each ancilla preparation that propagates to an error of weight at least three, necessarily the same error so that it is undetected. To obtain a crude estimate of how likely this is to occur, consider a circuit obtained by sampling uniformly at random over all possible circuits that prepare encoded $|0\rangle$. (Several methods for approximating such a sample are discussed below.) Pretend that the correlated errors created by such a circuit are uniformly distributed among all errors of the same weights. The number of errors on encoded $|0\rangle$ of each weight is given in Table 6.3. Then the probability that two preparation circuits share no such correlated errors is estimated as
$$\frac{\binom{1771-22}{22}}{\binom{1771}{22}} \cdot \frac{\binom{1771-11}{11}}{\binom{1771}{11}} \approx 0.71.$$
Here, $\binom{1771-22}{22}$ is the number of ways to select 22 weight-three $X$ errors on the second ancilla such that none of them correspond to the 22 weight-three errors on the first ancilla. Similarly, $\binom{1771-11}{11}$ is the number of ways to select 11 weight-four $X$ errors on the second ancilla.

Three $X$ faults can break the circuit if they lead to an undetected error of weight four or greater on the first ancilla. Consider the case that there are two failures while preparing the first ancilla and one failure while preparing the second ancilla. The number of different weight-four errors created with second-order probability (i.e., excluding those created with first-order probability) depends on the circuit.
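The crude estimates in this subsection are ratios of binomial coefficients under the uniform-distribution assumption stated above. The Python sketch below evaluates them with `math.comb`, using the counts quoted in the text: 1771 errors of weight three and of weight four (Table 6.3), 22 and 11 first-order correlated errors, the average of 711 second-order weight-four errors, and roughly 60 weight-three $Z$ errors. It also covers the three-fault and $Z$-error estimates that follow. The helper name `p_disjoint` is ours.

```python
from math import comb

N_WEIGHT3 = N_WEIGHT4 = 1771  # weight-three and weight-four X errors (Table 6.3)


def p_disjoint(total, k_first, k_second):
    """Probability that k_second errors drawn uniformly from `total` avoid k_first fixed ones."""
    return comb(total - k_first, k_second) / comb(total, k_second)


# Two faults: weight-three and weight-four first-order error sets must not coincide.
p_two = p_disjoint(N_WEIGHT3, 22, 22) * p_disjoint(N_WEIGHT4, 11, 11)
# Three faults: 711 second-order weight-four errors vs. 11 first-order ones, both orders.
p_three = p_disjoint(N_WEIGHT4, 711, 11) ** 2
# Z errors: roughly 60 weight-three errors per X-error-verified ancilla.
p_z = p_disjoint(N_WEIGHT3, 60, 60)

print(round(p_two, 2), f"{p_three:.1e}", f"{1 / (p_two * p_three):.1e}", round(p_z, 2))
```

The third printed value is the expected number of random pairs to try before finding one that is fully fault-tolerant for $X$-error verification.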
For ten random circuits, the smallest count we obtained was 688 and the largest 735, with an average of 711. Using this average value, we estimate that the probability of a random circuit succeeding against three $X$ faults is roughly
$$\left[\binom{1771-711}{11} \Big/ \binom{1771}{11}\right]^2 \approx 1.2 \cdot 10^{-5}.$$
(The square is because we want the circuit to work both in the case of two failures in the first ancilla and one failure in the second, and vice versa.) Overall, we expect to have to try about $1.2 \cdot 10^{5}$ random pairs of preparation circuits before we find one that gives fully fault-tolerant $X$-error verification.

The result of $X$-error verification is a single ancilla free of correlated $X$ errors up to weight three, but possibly containing correlated $Z$ errors. The $Z$-error propagation can be analyzed in a manner similar to that used for $X$ errors. A single failure in an $X$-error verified ancilla can produce roughly 60 $Z$ errors of weight three. Again assuming a uniform distribution, the probability of finding two $X$-error verified ancillas that share no correlated $Z$ errors of weight three is $\binom{1771-60}{60} / \binom{1771}{60} \approx 0.12$. In total, we expect to try about five $X$-error fault-tolerant pairs in order to find two pairs that are fully fault-tolerant for both $X$-error and $Z$-error verification, since $\binom{5}{2} = 10$ and $10 \times 0.12 \approx 1$.

To find fault-tolerant verification circuits in this way, one needs to be able to generate sufficiently random preparation circuits. As the Latin rectangle procedure for finding encoding circuits is fully algorithmic, it can be randomized by starting with a random presentation of the Golay code. Alternatively, one can begin with a fixed encoding circuit and randomly permute the seven rounds of CNOT gates (all of the CNOTs commute). The Golay code is preserved by qubit permutations in a symmetry group known as the Mathieu group $M_{23}$. (This symmetry is inherited from the classical 23-bit Golay code; see, e.g., [PBH98], p. 1411. Generators for this group can be obtained at [Gan99].) Therefore another option is to permute encoding circuits based on random elements of $M_{23}$. By trying a large number of random pairs, we found 14 pairs of ancillas that were fully fault-tolerant against $X$ errors. Of the $\binom{14}{2}$ combinations, six were also fully fault-tolerant against $Z$ errors. Table 6.4 presents one such set.

Table 6.4: Four seven-round ancilla-preparation schedules: (a) Ancilla 1, (b) Ancilla 2, (c) Ancilla 3, (d) Ancilla 4. In each table, the entry in row $i$, column $j$ specifies the target qubit of a CNOT gate with control qubit $i$ applied in round $j$. Using these schedules in the verification circuit of Figure 6.9, the output encoded $|0\rangle$ state is fully fault-tolerant against both $X$ and $Z$ errors.

Overlap method for preparing encoded $|0\rangle$

Ideally, though, we could use preparation circuits based on the overlap optimization of Section 6.1.2. The smaller number of correlated errors produced by Figure 6.6 means that it should be easier to find fault-tolerant circuits. However, unlike Latin rectangle schedules, the overlap-based schedule depends on a fixed code presentation and on a fixed round ordering, since the CNOT gates do not commute.

To obtain randomized overlap-method encoding circuits, we use the qubit permutation symmetry of the Golay code and permute the qubits of Figure 6.6 according to a pseudo-random element of the symmetry group $M_{23}$. By analyzing the correlated error sets of randomly permuted circuits, we have found many sets of fault-tolerant four-ancilla preparation circuits. In fact, we have even found sets for which the fault order required for a weight-$k$ error to pass verification is at least $k+1$ (rather than $k$) for all $k \le 2$. This reduces, for example, the probability of accumulating an uncorrectable error on the data block by first a weight-two error in $Z$-error correction and then another weight-two error in $X$-error correction. One such set of four permutations is given in Table 6.5.

Ancilla    Qubit permutation
2          (0, 20, 13, 7, 12, 14, 1)(2, 11)(3, 19, 5, 4, 8, 22, 6, 15, 10, 16, 9, 18, 21, 17)
3          (0, 14, 6, 12, 16, 2, 11, 22, 17, 21, 9, 20, 5, 7, 3, 13, 18, 4, 15, 1, 10, 8, 19)
4          (0, 12, 4, 17, 9, 6, 1)(2, 10, 18, 22, 21, 16, 13)(3, 11, 20, 15, 7, 19, 5)(8)(14)

Table 6.5: The first ancilla in Figure 6.9 is prepared using the circuit of Figure 6.6. The other three ancillas are prepared in the same way, except with the qubits rearranged according to the above permutations.

To evaluate the practical importance of our optimizations, we now analyze the resource requirements of Steane-style error correction circuits based on ancillas prepared by Figure 6.9. We use Monte Carlo simulation to compare the overhead of our ancilla preparation and verification circuits for the Golay code to that of standard circuits.

One natural measure of the overhead is the number of CNOT gates used to ready an ancilla. Another overhead measure, important given the difficulty of scaling quantum computers, is the space complexity, i.e., the number of qubits that must be dedicated to ancilla preparation in a pipeline so that an ancilla is always ready in time for error correction. We consider both measures.

As listed in the third column of Table 6.6, the overlap-based four-ancilla preparation and verification circuit involves roughly a factor of four fewer CNOT gates than the standard twelve-ancilla circuit (297 versus 1177). In fact, this understates the improvement, because the overhead also depends on the acceptance rate of each verification test. For an ancilla to leave the twelve-ancilla circuit, it must pass eleven tests, compared to only three tests for the four-ancilla circuit. The probability of passing all tests should be significantly higher for the optimized circuit, and so one expects the ratio between the expected numbers of CNOT gates used by the two circuits to be greater than four.

To estimate the expected overhead, each circuit was modeled and subjected to depolarizing noise in a Monte Carlo computer simulation.
We assumed that test results are available soon enough that a failed verification circuit can be immediately aborted; later test failures are therefore the most costly. This assumption impacts the twelve-ancilla circuits the most, since there are many ways to construct the hierarchy of verifications. The circuit shown in Figure 6.8 is a reasonable choice here because only six of the verification tests depend on results of previous tests. Other circuits (see, e.g., [Rei06a, Sec. 2.3.2]) may contain as many as nine dependent tests.

[Figure 6.10 plot: (a) expected CNOTs versus $p$ for $0 \le p \le 0.002$; (b) qubits versus $p$; curves labeled Overlap-4, Steane-4 and Steane-12.]

Figure 6.10: Overhead estimates for the twelve-ancilla preparation and verification circuit and for each of our optimized circuits. The Steane-4 circuit is based on ancillas prepared according to Table 6.4. Overlap-4 is based on ancillas prepared according to Figure 6.6 and Table 6.5. (a) Expected number of CNOT gates required to produce a verified encoded $|0\rangle$. (b) Number of qubits required to produce one verified encoded $|0\rangle$, in expectation, at every time step. Standard error intervals are too small to be seen here.

Estimates of the expected number of CNOT gates required for each circuit are given in the last column of Table 6.6 for the CNOT depolarization rate $p = 10^{-3}$, and are plotted versus $p$ in Figure 6.10a. At $p = 10^{-3}$, the overlap method reduces the expected number of CNOT gates by roughly a factor of 4.5 compared to the twelve-ancilla circuit, and the improvement for our optimized Latin rectangle scheme is a factor of 3.6. At lower error rates, the improvement is smaller. To investigate the effects of different error parameters, we also considered setting the rest error rate to zero; in this case, the expected number of CNOT gates used in the overlap circuit further decreases by about 11 percent, compared to less than four percent for our other four-ancilla circuit and less than two percent for the twelve-ancilla circuit. The larger improvement for the overlap circuit is due primarily to the fact that the overlap preparation method replaces many CNOT gates with rest locations.

To evaluate the space overhead, we plot in Figure 6.10b the number of qubits required to produce a single verified encoded $|0\rangle$, in expectation, per time step, for each of the preparation and verification circuits. For example, the space overhead for a pipeline to produce a single unverified ancilla state is $8 \cdot 23 = 184$ qubits: at any given time step, one 23-qubit block is initialized, and CNOT gates are applied to seven other blocks (one per round in, e.g., Figure 6.6) so that one ancilla is prepared. (In fact, the overhead is slightly less than this since some of the qubits in the block can be prepared during rounds one and two.) Estimates are calculated recursively by computing $E[\text{qubits}] = (E[\text{qubits}_1] + E[\text{qubits}_2]) / \Pr[\text{accept}]$ for each verification step, where the numerator is the expected number of qubits required to prepare the two states used in that verification step and $\Pr[\text{accept}]$ is the probability that the verification measurement detects no errors. The results at $p = 10^{-3}$ are given in the second column of Table 6.6. Both of our optimized schemes reduce the required space by a factor of at least three at $p = 10^{-3}$.

To judge the significance of these results, recall that the ancilla production pipeline can consume the majority of resources in a fault-tolerant quantum computer. In the case of [IWPK08], physical ancilla production space is proportional to the number of CNOT gates in the pipeline. A factor of 4.…

[Table 6.6 body lost in extraction; surviving entries include 5183, 1413 and 1399.]

Table 6.6: … at $p = 10^{-3}$. The column labeled $\Pr[\text{accept}]$ gives the probability that all auxiliary ancilla measurements in the verification circuit detect no errors. The next column, $E[\text{qubits}]$, gives the expected number of physical qubits required to produce one verified encoded $|0\rangle$. This is calculated recursively, by computing the expected number of qubits needed to pass each verification step. The last two columns specify, respectively, the minimum number of CNOT gates and the expected number of CNOT gates required to produce a single verified ancilla.

Chapter 7
Improving threshold lower bounds

This chapter is based on material that appears in [PR12].

The malignant set counting technique discussed in Chapter 4 provides a simple way to calculate lower bounds on the noise threshold, particularly for low-distance codes. However, it suffers from two limitations. First, the number of faulty gate sets of size $k$ scales exponentially with $k$. A large fraction of faulty sets may be harmless, but counting all of them is computationally intractable. Second, the assumed noise model is adversarial and, while more general than the model of independent Pauli channels, is probably overly pessimistic.

The first limitation is particularly troublesome if we wish to prove high thresholds for large codes, which can correct many more sets of errors than smaller codes. Large codes can be more efficient than small codes because they require fewer levels of concatenation in order to achieve the same level of error protection. Using large codes could, therefore, lead to a significant reduction in resource overhead.

Instead of exhaustively counting all subsets of locations, Aliferis and Cross have used Monte Carlo sampling in order to estimate the fraction of malignant subsets to within prescribed confidence intervals [AC07].
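Malignant set sampling in this spirit can be sketched as follows. The oracle `toy_is_malignant` is hypothetical: a real oracle would check whether some choice of Pauli errors at the sampled locations makes the rectangle incorrect. The normal-approximation 95% interval is likewise our choice for the sketch, not necessarily the interval used in [AC07].

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, Sequence, Tuple


def sample_malignant_fraction(
    locations: Sequence[int],
    k: int,
    is_malignant: Callable[[FrozenSet[int]], bool],
    samples: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Estimate the fraction of size-k location subsets that are malignant.

    Returns (estimate, half_width), where half_width is 1.96 standard errors,
    i.e. a normal-approximation 95% confidence interval.
    """
    hits = sum(is_malignant(frozenset(rng.sample(locations, k))) for _ in range(samples))
    frac = hits / samples
    return frac, 1.96 * math.sqrt(frac * (1 - frac) / samples)


def toy_is_malignant(subset: FrozenSet[int]) -> bool:
    """Hypothetical oracle: 100 locations in 10 blocks of 10; malignant if two hit one block."""
    blocks = [loc // 10 for loc in subset]
    return len(blocks) != len(set(blocks))


locations = range(100)
estimate, half_width = sample_malignant_fraction(
    locations, 2, toy_is_malignant, 20000, random.Random(1)
)
# Exact fraction by exhaustive enumeration, feasible only because this toy is tiny.
pairs = list(itertools.combinations(locations, 2))
exact = sum(toy_is_malignant(frozenset(pair)) for pair in pairs) / len(pairs)
print(f"sampled {estimate:.4f} +/- {half_width:.4f}, exact {exact:.4f}")
```

The number of samples, not the number of subsets, controls the cost, which is why sampling reaches larger subsets than exhaustive counting; the population being sampled still grows exponentially with $k$.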
Despite this improvement, the scaling of the population size is still exponential, and so the ability to count large subsets is limited.

In this chapter, we show how malignant set counting can be adapted to prove good thresholds for large codes while simultaneously removing the requirement for an adversarial noise model. The adaptation is based on two main ideas. First, when errors occur independently, it is possible to partition the error correction circuit into small subcircuits. Malignant subsets within each subcircuit can be counted separately, and then recombined in an efficient way. By combining information from each subcircuit, we can effectively count very large sets.

The second main idea involves the way that error rates are calculated for each level of code concatenation. Standard malignant set counting calculates the probability that any uncorrectable error occurs during execution of the encoded gate. This error rate can then be re-used to calculate similar probabilities at increasing levels of concatenation. We instead keep track of the probability of each type of uncorrectable error that can occur. This can significantly improve the accuracy of the effective noise model for higher levels of concatenation.

For example, say that the probability that an encoded gate introduces a logical $Z$ error is 0.01 and that the probability of a logical $X$ error is the same. In standard malignant set counting, this would be treated as a total error probability of 0.02 at the next level of concatenation. Using our method, error rates are reported separately, potentially saving a factor of two in this example.

By combining these two ideas and including error-correction optimizations from Chapter 6, we can calculate rigorous lower bounds on the noise threshold for relatively large codes. As a concrete example we calculate an error-rate bound of 0.…

Before describing the adapted malignant set counting procedure in detail, it is worthwhile to examine the requirements that will be imposed on the noise model and fault-tolerance scheme. There are essentially only two requirements:
1. errors must occur independently at each circuit location, and
2. error-correction and gate gadgets must be strictly fault-tolerant.
Roughly, the strict fault-tolerance requirement means that for a code that corrects up to $t$ errors, the probability that the circuit causes a weight-$k$ error on the data is no more than $O(p^k)$ for all $k \le t$ and gate error rate $p$. This requirement was described in Section 4.2.2. We begin, instead, with the noise model.

7.1.1 Noise model

An important requirement of the modified malignant set counting technique is that errors occur independently at each physical circuit location. Indeed, one primary motivation for modifying the malignant set counting procedure was to move away from the adversarial noise model in which circuit locations fail independently, but the errors at the failing locations are correlated.

We study noisy circuits constructed from the following physical operations: $|0\rangle$ and $|+\rangle$ initialization, a CNOT gate, and single-qubit measurement in the $Z$ and $X$ eigenbases. Every qubit in the computer can be involved in at most one operation per discrete time step. CNOT gates are allowed between arbitrary qubits, without geometry constraints. Resting qubits are also subject to noise.

Definition 7.1.1 (Independent Pauli noise with parameter $\gamma$).
Choose weights $w_{ab} \in [0, 15]$ for all $a, b \in \{I, X, Y, Z\}$ such that
$$\sum_{a,b:\, ab \neq II} w_{ab} = 15. \qquad (7.1)$$
Additionally, choose weights $w_{|0\rangle}, w_{|+\rangle}, w_{m_X}, w_{m_Z}, w_{r_X}, w_{r_Y}, w_{r_Z} \in [0, 1/\gamma]$ such that $(w_{r_X} + w_{r_Y} + w_{r_Z})\gamma \le 1$. Then noisy operations are modeled by:
1. A noisy CNOT gate is a perfect CNOT gate followed by, with probability $\gamma$, a nontrivial two-qubit Pauli error drawn from $\{I, X, Y, Z\}^{\otimes 2} \setminus \{I \otimes I\}$ according to $\{w_{ab}/15\}$.
2. Noisy preparation of a $|0\rangle$ state is modeled as ideal preparation of $|0\rangle$, followed by application of an $X$ error with probability $w_{|0\rangle}\gamma$. Similarly, noisy preparation of $|+\rangle$ is modeled as ideal preparation of $|+\rangle$ with probability $1 - w_{|+\rangle}\gamma$ and of $|-\rangle = Z|+\rangle$ with probability $w_{|+\rangle}\gamma$.
3. Noisy $Z$-basis ($|0\rangle$, $|1\rangle$) measurement is modeled by applying an $X$ error with probability $w_{m_X}\gamma$, followed by ideal $Z$-basis measurement. Similarly, noisy $X$-basis ($|+\rangle$, $|-\rangle$) measurement is modeled as ideal measurement except preceded by a $Z$ error with probability $w_{m_Z}\gamma$.
4. A noisy rest operation is modeled as applying either the identity gate, with probability $1 - (w_{r_X} + w_{r_Y} + w_{r_Z})\gamma$, or a Pauli error $a \in \{X, Y, Z\}$ with probability $w_{r_a}\gamma$.
All locations fail independently of each other.

Informally, this noise model works by modeling each physical location as an ideal operation, possibly followed (or preceded) by an error on the corresponding qubits. When an error occurs, it is selected from a probability distribution defined by the weights for that location. Definition 7.1.1 defines weights only for CNOT, qubit preparation and measurement in the $X$ and $Z$ bases, and rest locations. This set of locations is sufficient for the fault-tolerance schemes considered in this chapter. However, additional locations (e.g., Hadamard) can be added as necessary.
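Item 1 of Definition 7.1.1 can be illustrated by sampling the error that follows a single noisy CNOT. In this minimal sketch, two-qubit Paulis are encoded as two-letter strings and the function names are hypothetical; preparation, measurement and rest locations would be sampled the same way with their own weights.

```python
import itertools
import random
from collections import Counter
from typing import Dict

# The 15 nontrivial two-qubit Paulis "IX", "IY", ..., "ZZ" (identity "II" excluded).
TWO_QUBIT_ERRORS = ["".join(ab) for ab in itertools.product("IXYZ", repeat=2) if ab != ("I", "I")]


def depolarizing_weights() -> Dict[str, float]:
    """w_ab = 1 for every nontrivial ab, so the weights sum to 15 as (7.1) requires."""
    return {ab: 1.0 for ab in TWO_QUBIT_ERRORS}


def sample_cnot_error(weights: Dict[str, float], gamma: float, rng: random.Random) -> str:
    """Pauli error applied after a noisy CNOT.

    With probability gamma, a nontrivial error ab is drawn with probability
    w_ab / 15; otherwise the CNOT is perfect and "II" is returned.
    """
    if abs(sum(weights.values()) - 15) > 1e-9:
        raise ValueError("weights must satisfy (7.1)")
    if rng.random() >= gamma:
        return "II"
    labels = list(weights)
    # random.choices normalizes the weights, which is the same as dividing by 15.
    return rng.choices(labels, weights=[weights[ab] for ab in labels])[0]


rng = random.Random(0)
counts = Counter(sample_cnot_error(depolarizing_weights(), 0.1, rng) for _ in range(100_000))
print(1 - counts["II"] / 100_000)  # empirical CNOT failure rate, close to gamma = 0.1
```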
The counting procedure and threshold calculations of this section can be extended to accommodate any number of location types.

The condition imposed by (7.1) is for convenience and concreteness only. A sum of 15 was chosen to correspond nicely with a depolarizing noise model in which $w_{ab} = 1$ for all $a, b$.

The noise model described by Definition 7.1.1 is quite flexible and greatly improves our ability to analyze fault-tolerant quantum circuits when compared to an adversarial noise model. However, it is weaker than adversarial noise and may seem artificial compared to even more general, or more physically realistic, noise models described in Chapter 4.

We justify Definition 7.1.1 in two ways. First, as a special case, this noise model describes independent depolarizing noise, which is commonly used in Monte Carlo threshold estimates [Zal96, Ste03, Rei04, Kni05, DHN06, SDT07, CDT09, LPSB13]. Therefore, our adapted malignant set counting technique can be used to obtain rigorous threshold lower bounds that can be more fairly compared with Monte Carlo threshold estimates. Second, although physical noise may be complicated, methods for rigorously replacing realistic physical noise with simpler models do exist. For example, Magesan et al. have shown how to replace an arbitrary single-qubit channel with a Pauli channel that approximates the original channel as closely as possible without underestimating the error strength [MPGC13].

During error counting, $X$ and $Z$ errors are usually considered separately, and the error probability is computed by omitting the $Z$ or $X$ part of each error, respectively. For example, when considering only $X$, error $XY$ is equivalent to $XX$, $XZ$ is equivalent to $XI$, and so on. Thus, the marginal distribution of $X$ errors for a CNOT is:
$$\Pr[IX] = w_{IX} + w_{IY} + w_{ZX} + w_{ZY}, \quad \Pr[XI] = w_{XI} + w_{YI} + w_{XZ} + w_{YZ}, \quad \Pr[XX] = w_{XX} + w_{XY} + w_{YX} + w_{YY}. \qquad (7.2)$$
The $Z$ error distribution for CNOT, and the $X$ and $Z$ error distributions for rest locations, are calculated similarly.
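Marginalizing as in (7.2) amounts to dropping the $Z$ part of each single-qubit factor and summing weights. A minimal sketch, in which two-qubit Paulis are encoded as two-letter strings (a hypothetical choice for illustration):

```python
import itertools
from typing import Dict

TWO_QUBIT_ERRORS = ["".join(ab) for ab in itertools.product("IXYZ", repeat=2) if ab != ("I", "I")]


def x_part(pauli: str) -> str:
    """Keep only the X component of each factor: X and Y map to X; I and Z map to I."""
    return "".join("X" if p in "XY" else "I" for p in pauli)


def x_marginal(weights: Dict[str, float]) -> Dict[str, float]:
    """Marginal X-error weights of a CNOT, as in (7.2).

    For example, IX collects w_IX + w_IY + w_ZX + w_ZY. Errors whose X part is
    the identity (IZ, ZI, ZZ) contribute nothing to the X-error count.
    """
    marginal: Dict[str, float] = {}
    for ab, w in weights.items():
        x = x_part(ab)
        if x != "II":
            marginal[x] = marginal.get(x, 0.0) + w
    return marginal


depolarizing = {ab: 1.0 for ab in TWO_QUBIT_ERRORS}
print(x_marginal(depolarizing))  # {'IX': 4.0, 'XI': 4.0, 'XX': 4.0}
```

The $Z$ marginal follows by swapping the roles of $X$ and $Z$ in `x_part`.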
When preparing $|0\rangle$ or measuring in the $Z$ basis, no $Z$ errors are possible, and similarly no $X$ errors are possible when preparing $|+\rangle$ or measuring in the $X$ basis.

For computer analysis, it is convenient to choose integer-valued weights for each location. Any noise model that satisfies Definition 7.1.1 can be approximated to arbitrary precision with integer weights by relaxing (7.1) and rescaling $\gamma$.

In order both to reduce the time complexity of the counting procedure and to simplify its analysis, we will make a few additional assumptions. First, we assume that the quantum error-correcting code (or codes) in use are CSS codes. Specifically, when $X$ and $Z$ errors can be corrected independently, as is the case for CSS codes, the number of errors that must be counted is significantly reduced. This optimization is described in Section 7.2.

The second simplifying assumption is that quantum gates are not geometrically constrained. That is, multi-qubit gates can act on any set of qubits of appropriate size, and the properties of a quantum gate do not depend on the qubits on which the gate acts or the position of the gate within the circuit.

The unconstrained geometry assumption is common to many threshold calculations, including the AGP method of malignant set counting. AGP do not require use of CSS codes. However, nearly all fault-tolerance schemes that have been studied use CSS codes. (Some exceptions include [DS96, Got98].)

Finally, we will assume some level of determinism in the error-correction gadgets. Specifically, syndrome measurements and corresponding corrections must be deterministic, though offline procedures such as ancilla preparation and verification may still be non-deterministic. In particular, verification procedures such as those described in Chapter 6 are allowed.
Perhaps the biggest drawback of malignant set counting for high-distance codes is that obtaining an accurate threshold value requires counting large subsets, but the counting complexity scales poorly with subset size. The number of subsets of size $k$ in an exRec with $n$ locations scales as $\binom{n}{k}$, which is exponential in $k$.

Monte Carlo simulations of circuits using the 23-qubit Golay code [Ste03, DHN06, CDT09] indicate that the depolarizing noise threshold should be on the order of $p = 10^{-3}$. Unfortunately, it is not straightforward to prove such a high threshold using malignant set counting. For example, say that we check for malignancy all location subsets of size up to $k_{\text{good}}$, and we assume that all larger subsets are malignant. Then the estimate we obtain for the probability of an incorrect rectangle is at least
$$\sum_{k = k_{\text{good}}+1}^{n} \binom{n}{k} p^k (1-p)^{n-k}. \qquad (7.3)$$
Using optimized circuits from Chapter 6, the size of the CNOT exRec for the Golay code is $n = 5439$. For this size and $p = 10^{-3}$, the probability of incorrectness drops below $10^{-3}$ only for $k_{\text{good}} \ge 14$. However, there are more than $10^{41}$ subsets of size at most 14, so checking them one at a time is computationally intractable.

Instead of checking each set for malignancy, one can sample random sets of locations in order to estimate the fraction that are malignant. This technique, called malignant set sampling, can provide threshold estimates with statistical confidence intervals. However, both malignant set counting and sampling techniques study the threshold for worst-case adversarial noise, and may be overly conservative for a more physically realistic, non-adversarial noise model such as depolarizing noise. For example, malignant set sampling results from [AC07] estimate a threshold of only $p \approx 10^{-\ldots}$ for the Golay code.

On the other hand, when a large number of errors occur, it is relatively unlikely that all of the errors occur in the same region. Rather, we expect errors to be distributed roughly evenly throughout the exRec. We therefore choose to divide the exRec into a hierarchy of components and sub-components. We then compute an upper bound on the probability of each error that a component may produce, by counting location sets up to a certain small size. At the exRec level, we synthesize the component error bounds into upper bounds on the probability that the rectangle is incorrect. The resulting error probabilities are treated as an effective transformed noise model for the encoded gate. With some care, the transformed noise model can be fed recursively back into the procedure to determine an effective noise model for the next level of encoding, and so on. See Section 7.5.1.

Effectively, dividing the exRec into components allows us to account efficiently for even very large location subsets. Most large sets will be roughly evenly divided between the components, with only a small number of locations in each component.
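The tail sum (7.3) and the subset count can be checked numerically. This minimal sketch evaluates the binomial terms in log space (to avoid overflowing $\binom{n}{k}$) and takes the comparison target to be $10^{-3}$; under those assumptions it reports $k_{\text{good}} = 14$ and a count of subsets of size at most 14 that exceeds $10^{41}$.

```python
import math


def log_binomial_pmf(n: int, k: int, p: float) -> float:
    """log[C(n, k) p^k (1 - p)^(n - k)], via lgamma to avoid overflowing C(n, k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def incorrect_bound(n: int, p: float, k_good: int) -> float:
    """The sum (7.3): probability that more than k_good of n locations fail."""
    return sum(math.exp(log_binomial_pmf(n, k, p)) for k in range(k_good + 1, n + 1))


def smallest_k_good(n: int, p: float, target: float) -> int:
    """Smallest k_good for which the bound (7.3) falls below `target`."""
    k_good = 0
    while incorrect_bound(n, p, k_good) >= target:
        k_good += 1
    return k_good


n, p = 5439, 1e-3  # Golay CNOT exRec size and noise rate from the text
k_good = smallest_k_good(n, p, 1e-3)
subsets = sum(math.comb(n, k) for k in range(k_good + 1))
print(k_good, f"{subsets:.2e}")
```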
The remainder of this section outlines the exRec component structure.

[Figure 7.1 diagram: input error $(\chi_{\text{in}}, \zeta_{\text{in}})$ enters a box labeled "Component with $K$ failures", which emits output error $(\chi_{\text{out}}, \zeta_{\text{out}})$.]

Figure 7.1: A circuit component with input error $(\chi_{\text{in}}, \zeta_{\text{in}})$ and output error $(\chi_{\text{out}}, \zeta_{\text{out}})$.

We will divide the exRec into its encoded operation and its error corrections. The error corrections will each divide into $X$-error correction and $Z$-error correction, and further recursive divisions will continue until reaching the physical location level.

Each component in the hierarchy has input error $(\chi_{\text{in}}, \zeta_{\text{in}})$, some number of internal failures $K$, and output error $(\chi_{\text{out}}, \zeta_{\text{out}})$, which depends on the internal failures and on the input error (see Figure 7.1). Here, the notation $(\chi, \zeta)$ indicates an error equal to the product $\chi\zeta$, where $\chi$ is a tensor product of $X$ and $I$ operators and $\zeta$ is a tensor product of $Z$ and $I$ operators. For every error equivalence class on the inputs and outputs and for every $k \in \mathbb{N}$, we would like to compute
$$\Pr\big[(\chi_{\text{out}}, \zeta_{\text{out}}) = (x_{\text{out}}, z_{\text{out}}),\ K = k \,\big|\, (\chi_{\text{in}}, \zeta_{\text{in}}) = (x_{\text{in}}, z_{\text{in}})\big], \qquad (7.4)$$
the probability that there are exactly $k$ failures and the output error is $(x_{\text{out}}, z_{\text{out}})$, conditioned on the input error $(x_{\text{in}}, z_{\text{in}})$.

For components that are physical gate locations, the probability in (7.4) is defined by the appropriate Pauli-channel noise model (Definition 7.1.1). Larger components are analyzed by first analyzing each enclosed sub-component. At the exRec level, the LEC, transversal Ga and TEC components provide all of the information necessary to determine the probability that the enclosed rectangle is incorrect. Indeed, we shall see in Section 7.2.3 that they contain enough information to compute the probability for each way that the rectangle can be incorrect.

There are, however, two logistical problems. First, on each $n$-qubit code block, there are $2^{n+1}$ inequivalent Pauli errors in total (assuming a single encoded qubit per block).
For a component involving two code blocks, this means we should compute, for each $k$, up to $(2^{n+1})^4$ quantities, one for each combination of input and output errors. Second, since there are $\binom{n}{k}$ size-$k$ subsets of $n$ locations and since each CNOT gate has 15 different ways to fail, a computation that accounts for all possibilities scales roughly as $\binom{n}{k} 15^k$. Such a computation is feasible only for small $k$ and small $n$.

The first problem can be solved by observing that $X$ errors and $Z$ errors can be corrected independently for CSS codes. Furthermore, error correction can be accomplished without using gates that mix $X$ and $Z$, so $X$ and $Z$ errors mostly propagate independently. There are cases, such as ancilla verification, in which $X$ and $Z$ errors cannot be treated entirely independently. A specific example of this issue is discussed in Section 7.7.2. Still, for most components, the $X$-error part of the output of a component depends only on the $X$-error part of the input and the $X$ failures that occur inside the component. A similar observation holds for $Z$ errors. Thus, expression (7.4) may be split into separate $X$ and $Z$ parts:
$$\Pr[\chi_{\text{out}} = x_{\text{out}},\ K_X = k \mid \chi_{\text{in}} = x_{\text{in}}] \qquad (7.5a)$$
$$\Pr[\zeta_{\text{out}} = z_{\text{out}},\ K_Z = k \mid \zeta_{\text{in}} = z_{\text{in}}]. \qquad (7.5b)$$
Here, the random variable $K_X$ is the number of failures inside the component that contain an $X$ when decomposed into a tensor product of Pauli operators. The value $K_Z$ is similarly defined for $Z$. When considering $X$ and $Z$ errors separately, the input and output of a two-block component contain at most roughly $2^n$ inequivalent errors, for codes that protect evenly against $X$ and $Z$ errors, and the worst-case combination is a large but more manageable $2^{2n}$ cases.

The second problem is eliminated by noting that, for a fixed $k$, the probability of an order-$k$ fault decreases rapidly as the size of the component decreases. For example, for $p = 10^{-3}$, the probability of an order-ten fault in an exRec of size 5000 is about 0.…, whereas for a small component it is of order $10^{-\ldots}$.
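The effect of component size on an order-ten fault can be reproduced with the binomial distribution. This minimal sketch treats every location as failing independently with the same probability $p$, a simplification of Definition 7.1.1; the exRec size of 5000 comes from the text, while the 100-location component is a hypothetical size chosen for comparison.

```python
import math


def prob_exactly(n: int, k: int, p: float) -> float:
    """Binomial probability that exactly k of n independent locations fail."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


p = 1e-3
for size in (5000, 100):  # exRec size from the text; 100 is a hypothetical small component
    print(size, f"{prob_exactly(size, 10, p):.1e}")
```

In this model the order-ten probability is roughly $2 \times 10^{-2}$ for the full exRec but below $10^{-16}$ for the small component.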
Thus there is little gain in counting errors of order ten or higher in components of small size.

In general, the probability that a component contains a fault of order greater than $k_{\text{good}}$ can be bounded according to
$$\Pr[K > k_{\text{good}}] \le \sum_{k=k_{\text{good}}+1}^{n} \binom{n}{k} (1-p)^{n-k} p^k, \qquad (7.6)$$
where $p$ is an upper bound on the probability of a physical gate failure. (A tighter bound can be achieved by considering separate $k$ for each location type. See [PR12] Appendix A.) We will choose a value of $k_{\text{good}}$ for each component and then pessimistically assume that all faults of order greater than $k_{\text{good}}$ within the component cause the rectangle to be incorrect. For large enough values of $k_{\text{good}}$ the overall impact on the threshold is negligible. There is a tradeoff here between running time and accuracy. A larger value of $k_{\text{good}}$ yields a more accurate bound on the probability that the rectangle is incorrect. A smaller value of $k_{\text{good}}$ is easier to compute. We must choose for each component a suitable $k_{\text{good}}$ that balances the two.

In the end we are left with two sets of faults for each component: those of order at most $k_{\text{good}}$ and those of order greater than $k_{\text{good}}$. Each fault in the first set is counted to obtain accurate estimates of (7.5a) and (7.5b). When a fault from this set occurs we call it a good event. Faults in the second set are not counted and are instead bounded using (7.6) and pessimistically added to the final incorrectness probability bounds for the rectangle. When a fault from this set occurs we call it a bad event. The probability that the rectangle is incorrect is then upper-bounded by
$$\Pr[\text{incorrect}] \le \Pr[\text{incorrect}, \text{good}] + \Pr[\text{bad}]. \qquad (7.7)$$
In general, there are four quantities we need to upper bound for each component:
$$\Pr[\chi_{\text{out}} = x_{\text{out}},\ K_X = k,\ \text{good}_X \mid \chi_{\text{in}}], \qquad (7.8a)$$
$$\Pr[\zeta_{\text{out}} = z_{\text{out}},\ K_Z = k,\ \text{good}_Z \mid \zeta_{\text{in}}], \qquad (7.8b)$$
$$\Pr[\text{bad}_X], \qquad (7.8c)$$
$$\Pr[\text{bad}_Z]. \qquad (7.8d)$$
The event $\text{good}_X \equiv \neg\text{bad}_X$ occurs when there is a set of $X$-error failures in the component that we choose to count. It will usually depend only on $k_{\text{good}}$, in which case $\text{good}_X \Leftrightarrow (K_X \le k_{\text{good}})$. In some cases $\text{good}_X$ may depend on a vector $\vec{k}$ representing the number of $X$-error failures across multiple sub-components. The event $\text{good}_Z \equiv \neg\text{bad}_Z$ is similarly defined for $Z$.

Finally, it is assumed that most components operate deterministically. Non-deterministic components can be accommodated, however. If, for example, the output errors of a component are dependent on a "successful" measurement outcome, then the component must also report the probability of success. Then, the component output probabilities can be bounded using Bayes's rule:
$$\Pr[\text{output} \mid \text{success}] = \frac{\Pr[\text{output}, \text{success}]}{\Pr[\text{success}]} \le \frac{\Pr[\text{output}]}{\Pr[\text{success}]}. \qquad (7.9)$$

In the remainder of this section we outline the procedure for computing the above quantities for the error-correction and exRec components. Details of lower-level components, such as ancilla preparation and verification, depend on the choice of error-correcting code.

[Figure 7.2 diagram: (a) CSS error-correction component, consisting of $Z$-error correction followed by $X$-error correction; (b) two-qubit exRec component, with blocks labeled LEC-A, LEC-B, Ga, TEC-A and TEC-B.]

Figure 7.2: (a) The error-correction component for a CSS code consists of independent $Z$-error and $X$-error corrections. Here, we have chosen an arbitrary convention that $X$-error correction follows $Z$-error correction. (b) The (encoded) two-qubit exRec consists of two leading error-correction (LEC) components, a gate gadget (Ga) component and two trailing error-correction (TEC) components.

An error-correction component consists of $Z$-error correction and $X$-error correction, as shown in Figure 7.2a. (Recall that CSS codes admit independent correction of $X$ and $Z$ errors.) After extracting the error syndrome, the lowest-weight correction is computed. The correction itself can be applied classically, and therefore without error, by a change in the qubit's Pauli frame [Kni05].

There are two types of error-correction components: leading error correction (LEC) and trailing error correction (TEC). For the LEC, we may assume that the input errors $\chi_{\text{in}}$ and $\zeta_{\text{in}}$ are both zero. This is because we have assumed that syndrome measurement and correction are deterministic. The probability that the rectangle is incorrect depends only on the syndrome of the output of the LEC, and that syndrome depends only on the errors inside of the LEC [CDT09].

To be more precise, consider the two errors $X_1$ and $X_1 X_L$, where $X_1 = X \otimes I^{\otimes (n-1)}$ and $X_L$ is the logical $X$ operator of the code. These two errors yield the same syndrome, but they are inequivalent since $X_1 X_L$ flips the logical state of the encoded qubit, and $X_1$ does not. But correctness of the rectangle that follows is independent of the logical state of the input: the rectangle is not accountable for a logical error that occurred prior to its execution. Accordingly, we may treat $X_1$ and $X_1 X_L$ as equivalent errors in this case. More generally, we may assume that all of the errors at the output of the LEC are correctable, since the relationship with the logical operator is irrelevant.
This reduces the number of inequivalent errors at the output of each LEC by a factor of two, and therefore reduces the counting complexity by the same amount.

For trailing error correction, we care only about the result of applying a logical decoder to the output. In other words, we only need to know whether the output errors $\chi_{\text{out}}$ and $\zeta_{\text{out}}$ represent correctable errors or not. The four relevant quantities are:
LEC: $\Pr[\chi_{\text{out}} = x_{\text{out}},\ K_X = k,\ \text{good} \mid \chi_{\text{in}} = 0]$ and $\Pr[\zeta_{\text{out}} = z_{\text{out}},\ K_Z = k,\ \text{good} \mid \zeta_{\text{in}} = 0]$;
TEC: $\Pr[D(\chi_{\text{out}}) = d,\ K_X = k,\ \text{good} \mid \chi_{\text{in}} = x_{\text{in}}]$ and $\Pr[D(\zeta_{\text{out}}) = d,\ K_Z = k,\ \text{good} \mid \zeta_{\text{in}} = z_{\text{in}}]$,
where $d \in \{0, 1\}$ and $D(e)$ identifies whether $e$ is a correctable error (0) or an uncorrectable error (1). That is, $D(e) = 1$ if and only if $e$ decodes to a nontrivial Pauli error. The details of $D$ depend on the choice of error-correcting code.

A two-qubit exRec, shown in Figure 7.2b, is divided into five components: two leading error corrections, a gate gadget, and two trailing error corrections. At this level, we are interested in malignant events, i.e., the events for which the rectangle is incorrect. Furthermore, when a malignant event occurs we would like to know how the rectangle is incorrect.

Let $|\psi_1\rangle$ be the two-qubit state obtained by applying ideal decoders on the two blocks of the Ga immediately following the LECs. Similarly, let $|\psi_2\rangle$ be the state obtained by applying ideal decoders immediately following the TECs. Then define $\text{mal}_{IX}$ as the event that $(I \otimes X) U_{\text{Ga}} |\psi_1\rangle = |\psi_2\rangle$, where $U_{\text{Ga}}$ is the two-qubit unitary corresponding to the ideal Ga gate. Similarly define the events $\text{mal}_{XI}$, $\text{mal}_{XX}$, $\text{mal}_{IZ}$, $\text{mal}_{ZI}$, $\text{mal}_{ZZ}$. The event $\text{mal}_E$ can be informally interpreted as the event in which the rectangle introduces a logical error $E$. The relevant quantities are
$$\Pr[M_X,\ K_X = k,\ \text{good}], \text{ and} \qquad (7.10a)$$
$$\Pr[M_Z,\ K_Z = k,\ \text{good}], \qquad (7.10b)$$
for $M_X \in \{\text{mal}_{IX}, \text{mal}_{XI}, \text{mal}_{XX}\}$ and $M_Z \in \{\text{mal}_{IZ}, \text{mal}_{ZI}, \text{mal}_{ZZ}\}$.
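The decoder flag $D$ is code-specific. As a small, hypothetical stand-in for the Golay code, this sketch implements $D$ for $X$ errors on the 7-qubit Steane code, whose $X$-error syndromes come from the [7,4] Hamming parity checks. Encoding an error as a set of 1-indexed qubit positions, and the function names, are illustrative assumptions.

```python
from typing import Set

N_QUBITS = 7  # hypothetical stand-in: the 7-qubit Steane code, not the Golay code


def syndrome(error: Set[int]) -> int:
    """Hamming [7,4] syndrome of an X error, given as 1-indexed qubit positions.

    The parity-check column of qubit j is the binary expansion of j, so the
    syndrome is the XOR of the positions that carry an X.
    """
    s = 0
    for pos in error:
        if not 1 <= pos <= N_QUBITS:
            raise ValueError(f"qubit position {pos} out of range")
        s ^= pos
    return s


def decode_flag(error: Set[int]) -> int:
    """D(e): 0 if e is correctable, 1 if it decodes to a logical X.

    The minimum-weight correction for a nonzero syndrome s is a single X on
    qubit s. The residual error then has zero syndrome, so it is a Hamming
    codeword: even-weight codewords are X stabilizers, odd-weight ones are
    logical X operators.
    """
    residual = set(error)
    s = syndrome(error)
    if s:
        residual ^= {s}  # apply the single-qubit correction
    return len(residual) % 2


print(decode_flag({5}), decode_flag({1, 2}), decode_flag({4, 5, 6, 7}))  # 0 1 0
```

A single $X$ error is corrected, a weight-two error is miscorrected into a logical $X$, and the weight-four stabilizer $X_4X_5X_6X_7$ is harmless.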
Each of the malignant events can be determined by propagating errors from the output of the LECs and Ga through the TECs. For example, let $x_1$ and $x_2$ be the $X$ errors on the outputs of the first and second LECs, respectively. Let $x_1'$ and $x_2'$ be the $X$ result of propagating $x_1$ and $x_2$ to the input of the TECs and combining with the $X$ error $x_3$ of the Ga. Then the probability of the malignant $IX$ event is given by
$$\Pr[\text{mal}_{IX} \mid x_1, x_2, x_3] = \Pr[D(\chi_{\text{out}}) = 0 \mid \chi_{\text{in}} = x_1'] \cdot \Pr[D(\chi_{\text{out}}) = 1 \mid \chi_{\text{in}} = x_2'], \qquad (7.11)$$
where, as before, $D(x)$ determines whether $x$ is a correctable error (0) or not (1). The quantities on the right-hand side can be readily obtained from the TEC components. Recall from Section 7.2.2 that the errors $x_1, x_2$ are assumed to be correctable errors. Therefore, $D(\chi_{\text{out}}) = 0$ corresponds to a logical identity operator and $D(\chi_{\text{out}}) = 1$ corresponds to a logical $X$. Probabilities of the other malignant events can be similarly calculated.

When counting $X$ and $Z$ errors separately, it is not possible to compute logical $Y$ error quantities, and the analysis will therefore double-count $Y$ errors. Intuitively this is not a great loss, because the correlations between $X$ and $Z$ are much smaller at this level than they are in the original noise model. In Section 7.5 we show how to use (7.10) to compute a lower bound on the threshold.

The component quantities (7.8) are conceptually straightforward and easy to compute numerically for a fixed $\gamma$. However, we would like to compute exact bounds that hold for a range of $\gamma$. In this section we discuss a few of the implementation details that allow for maintaining the bounds as polynomials with integer coefficients.

The ultimate goal is to compute upper bounds on the probabilities of malignant events at the outermost layer of the exRec. That is, we want to compute Equations (7.8) and combine them to get, for example,
$$\Pr[\text{mal}_{IX}(\vec{\chi}) \mid \text{accept}] \le \Pr[\text{mal}_{IX}(\vec{\chi}), \text{good}_X \mid \text{accept}] + \Pr[\text{bad}_X \mid \text{accept}]. \qquad (7.12)$$
Here, accept is the event that any and all non-deterministic sub-components (ancilla verification, for example) accept or succeed. The right-hand side of this inequality decomposes into sums of individual component quantities of the form
$$\Pr[\chi = x,\ K_X = k] = \sum_{|\vec{k}| = k} \Pr[\chi = x,\ \vec{K}_X = \vec{k}], \qquad (7.13)$$
where $\vec{k} = (k_1, k_2, k_3, k_4)$ expresses the number of failing CNOT, rest, $|0\rangle$ preparation and $Z$-basis measurement locations, respectively.

For each term in the sum, the number of failures for each type of location is fixed, but the particular locations on which those failures occur are not fixed, nor are the errors that occur at those locations. Let $L(\vec{k}) := \{\vec{l} : (|\vec{l}_1|, |\vec{l}_2|, |\vec{l}_3|, |\vec{l}_4|) = \vec{k}\}$ be the set of all possible tuples of failing locations consistent with $\vec{k}$. Also, let $E(\vec{l})$ be the set of all possible tuples of $X$ errors consistent with failures at all locations $\vec{l}$. To fix the locations and the errors, use
$$\Pr[\chi = x,\ \vec{K}_X = \vec{k}] = \sum_{\vec{l} \in L(\vec{k}),\, \vec{e} \in E(\vec{l})} \Pr[\chi = x,\ \vec{E} = \vec{e}] = \sum_{\vec{l} \in L(\vec{k}),\, \vec{e} \in E(\vec{l})} I(x, \vec{e}) \Pr[\vec{E} = \vec{e}], \qquad (7.14)$$
where in the second equality we have made the substitution $I(x, \vec{e}) = \Pr[\chi = x \mid \vec{E} = \vec{e}]$. The indicator function $I(x, \vec{e})$ takes value one if the component produces the error $x$ for a given "configuration" of errors $\vec{e}$ and value zero otherwise. The error configuration $\vec{e}$ fully specifies the locations that have failed and the error at each failing location. Let $\vec{n} = (n_1, n_2, n_3, n_4)$ be the total number of CNOT, rest, $|0\rangle$ preparation and $Z$-basis measurement locations in the component, respectively.
Let $W_1 = w_{IX} + w_{IY} + w_{XI} + w_{YI} + w_{XX} + w_{YY}$ be the sum of all of the CNOT $X$-error weights, let $W_2 = w_{r_X} + w_{r_Y}$, $W_3 = w_{|0\rangle}$, $W_4 = w_{m_X}$, and $W := \max\{W_1, W_2, W_3, W_4\}$. For simplicity, assume also that $w_{IX} = w_{IY} = w_{XI} = w_{YI} = w_{XX} = w_{YY} =: w_1$ and $w_{r_X} = w_{r_Y} =: w_2$, and let $w_{|0\rangle} =: w_3$ and $w_{m_X} =: w_4$. Then from the marginal noise model discussed in Section 7.1.1 and a configuration of $X$ errors $\vec{e}$ we have
$$\Pr[\vec{E} = \vec{e}] = \prod_{j=1}^{4} (1 - W_j\gamma)^{n_j} \left(\frac{w_j\gamma}{1 - W_j\gamma}\right)^{k_j} \le A_{\vec{n}} \left(\frac{\gamma}{1 - W\gamma}\right)^{k} \prod_{j=1}^{4} w_j^{k_j}, \qquad (7.15)$$
where $A_{\vec{n}} := \prod_{j=1}^{4} (1 - W_j\gamma)^{n_j}$. This inequality is a reasonable approximation for small $\gamma$. It allows us to move $\gamma$ into a prefactor in front of the sum of (7.13) and, assuming integer weights $\{w_j\}$, permits an integer representation in the computer analysis. Indeed, substituting back into equation (7.13) gives
$$\Pr[\chi = x,\ K_X = k] \le A_{\vec{n}} \left(\frac{\gamma}{1 - W\gamma}\right)^{k} \sum_{\substack{|\vec{k}| = k \\ \vec{l} \in L(\vec{k}),\, \vec{e} \in E(\vec{l})}} I(x, \vec{e}) \prod_{j=1}^{4} w_j^{k_j}. \qquad (7.16)$$

Another advantage of counting component probabilities in this way is that the counts compose nicely. If we apply (7.13) to itself and combine with (7.16), we end up with
$$\Pr[\chi = x,\ K_X = k] = \sum_{\substack{|\vec{k}| = k \\ \vec{x} \in \text{out}(x)}} \prod_i \Pr[\chi_i = x_i,\ K_{X,i} = k_i] \le A_{\vec{n}} \left(\frac{\gamma}{1 - W\gamma}\right)^{k} \left[\sum_{\substack{|\vec{k}| = k \\ \vec{x} \in \text{out}(x)}} \prod_i \sum_{\substack{|\vec{k}_i| = k_i \\ \vec{l} \in L(\vec{k}_i),\, \vec{e} \in E(\vec{l})}} I(x_i, \vec{e}) \prod_{j=1}^{4} w_j^{k_j}\right]. \qquad (7.17)$$
The substitution made in the first equality can be applied successively for each sub-component $i$. Once the lowest-level component is reached, we use (7.16) to push dependence on $\gamma$ outside of the sum.
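The composition step in (7.17) is a truncated convolution of integer weighted counts, with $\gamma$ entering only through the prefactor. A minimal sketch, assuming $X$ errors are encoded as bitmasks and that the caller supplies the rule `combine` for merging sub-component outputs (bitwise XOR when errors on the same qubits simply accumulate, as for consecutive rest locations); all names are hypothetical.

```python
from collections import defaultdict
from operator import xor
from typing import Callable, Dict, Tuple

# (output X error as a bitmask, fault order k) -> integer weighted count
Counts = Dict[Tuple[int, int], int]


def convolve(a: Counts, b: Counts, combine: Callable[[int, int], int], k_good: int) -> Counts:
    """Compose the weighted counts of two sub-components.

    Fault orders add, integer weights multiply, and `combine` maps the pair of
    sub-component outputs to the component output. Orders above k_good are
    dropped; those faults are accounted for as bad events instead.
    """
    out: Dict[Tuple[int, int], int] = defaultdict(int)
    for (x1, k1), c1 in a.items():
        for (x2, k2), c2 in b.items():
            if k1 + k2 <= k_good:
                out[(combine(x1, x2), k1 + k2)] += c1 * c2
    return dict(out)


def probability_bounds(counts: Counts, gamma: float, big_w: float,
                       a_prefactor: float) -> Dict[Tuple[int, int], float]:
    """Attach the prefactor A_n * (gamma / (1 - W gamma))**k to each weighted count."""
    ratio = gamma / (1 - big_w * gamma)
    return {(x, k): a_prefactor * ratio**k * c for (x, k), c in counts.items()}


# Hypothetical example: three rest locations in sequence on one qubit, each
# failing with an X-type error with weight w_rX + w_rY = 2 (depolarizing weights of 1).
rest: Counts = {(0, 0): 1, (1, 1): 2}
three_rests = convolve(convolve(rest, rest, xor, k_good=2), rest, xor, k_good=2)
print(three_rests)  # {(0, 0): 1, (1, 1): 6, (0, 2): 12}
```

Because only one location type appears in this example, $W = W_2 = 2$, and applying `probability_bounds` with $A_{\vec{n}} = (1 - 2\gamma)^3$ reproduces the exact probability $6\gamma(1 - 2\gamma)^2$ of a single failure; with mixed location types the common denominator $1 - W\gamma$ makes the result an upper bound.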
The integer value inside the brackets is the discrete convolution of weighted counts from the sub-components, summed over all possible failure partitions $\vec{k}$ of size $k$. It is a weighted count of all possible ways to produce error $x$ with an order-$k$ fault.

A similar formula holds for the general case in which each of the weights may be unique (i.e., $w_{IX} \ne w_{IY} \ne w_{XI} \ldots$, etc.). In general, the product of weights $\prod_{j=1}^{4} w_j^{k_j}$ is more complicated and may depend on the error configuration $\vec{e}$.

The primary task of the computer analysis is to compute $I$ for each (good) error configuration, starting with the lowest-level component. It must also store the resulting weighted sums

$$\sum_{\substack{|\vec{k}| = k \\ \vec{l} \in L(\vec{k}),\, \vec{e} \in E(\vec{l})}} I(x, \vec{e}) \prod_{j=1}^{4} w_j^{k_j} \qquad (7.18)$$

(or equivalent) for use in the counting of larger components. At each level, counts for the sub-components are convolved to generate new counts. The prefactor $A_{\vec{n}} \left(\frac{\gamma}{1 - W\gamma}\right)^{k}$ need only be computed at the end, when calculating the threshold.

One quantity that can be immediately calculated from our counts is the so-called pseudo-threshold [SCCA06] for the CNOT location. The pseudo-threshold for location $l$ is defined as the solution to the equation $p = p^{(1)}_l$. Here $p$ is the probability that the physical (level-0) location fails, and $p^{(1)}_l$ is the probability that the 1-Rec for location $l$ is incorrect. We may compute a lower bound on the pseudo-threshold for CNOT by upper bounding

$$p^{(1)}_{\mathrm{cnot}} \le \Pr[\mathrm{bad} \mid \mathrm{accept}] + \sum_k \left(\Pr[\mathrm{mal}_X, K_X = k, \mathrm{good}] + \Pr[\mathrm{mal}_Z, K_Z = k, \mathrm{good}]\right), \qquad (7.19)$$

where $\mathrm{mal}_X \equiv (\mathrm{mal}_{IX} \vee \mathrm{mal}_{XI} \vee \mathrm{mal}_{XX})$ and $\mathrm{mal}_Z \equiv (\mathrm{mal}_{IZ} \vee \mathrm{mal}_{ZI} \vee \mathrm{mal}_{ZZ})$.

The pseudo-threshold is of practical interest for cases in which a finite failure probability is acceptable and only a few levels of concatenation are desired.
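Suppose a bound such as (7.19) is expressed as a function of $p$. The lower bound on the pseudo-threshold is then a root of $p^{(1)}(p) - p$. A minimal bisection sketch follows. Here `p1_upper` is a hypothetical stand-in for the counted upper bound, and the quadratic example polynomials are invented purely for illustration. Assuming a single crossing, an upper bound on $p^{(1)}$ crosses the line $p$ no later than the true curve does, so its root is a lower bound on the pseudo-threshold.

```python
def pseudo_threshold(p1_upper, lo=1e-12, hi=0.5, iters=200):
    """Solve p = p1_upper(p) by bisection on gap(p) = p1_upper(p) - p.

    Assumes gap(lo) < 0 < gap(hi) and a single sign change in [lo, hi].
    """

    def gap(p):
        return p1_upper(p) - p

    if not (gap(lo) < 0 < gap(hi)):
        raise ValueError("bracket [lo, hi] does not straddle a crossing")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid  # still below the line p: crossing lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative polynomial only (not a result from the thesis): p1(p) = 1000 p^2,
    # whose crossing with p is at p = 1e-3.
    print(pseudo_threshold(lambda p: 1000 * p**2))
```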
For example, when the physical failure rate is sufficiently below the pseudo-threshold, a large code could be used to bootstrap into other codes with lower overhead.

The pseudo-threshold is useful to us for two reasons. First, pseudo-threshold estimates have been calculated for a variety of fault-tolerant quantum circuits and codes [CDT09]. They therefore serve as a reference for our counting results. Second, [SCCA06] conjectured that the pseudo-threshold is an upper bound on the asymptotic threshold. It thus provides a reasonable target for our calculation of the asymptotic threshold lower bound, which requires a maximum noise strength to be specified.

Traditionally, malignant sets are those for which some combination of Pauli errors at the corresponding locations causes the enclosed rectangle to be incorrect. Our malignant sets are different. We count subsets of faulty locations, but the counted information is synthesized into error probability upper bounds based on a particular noise model and error correction scheme.

In this section we outline an alternative method for rigorously lower bounding the noise threshold. It is tailored specifically to the information obtained by our counting procedure. The basic idea is to treat each level-one rectangle in the level-two simulation as a single “location”, with a transformed noise model based on the malignant event upper bounds obtained in Section 7.2. In particular, we show how to treat each level-one exRec independently while maintaining valid upper bounds on the error probabilities.

The asymptotic noise threshold is defined as the largest value $\gamma_{th}$ with the following property: for all $\gamma < \gamma_{th}$, the probability that the fault-tolerant simulation succeeds can be made arbitrarily close to one by using sufficiently many levels of code concatenation. To prove a lower bound on the threshold we must show, in particular, that the probability of an incorrect CNOT $k$-Rec decreases monotonically with $k$ for all $\gamma < \gamma_{th}$.
Our counting technique gives an upper bound on the probability that a CNOT 1-Rec is incorrect. We now show how to upper bound incorrectness for level two and higher, and therefore how to lower bound $\gamma_{th}$.

7.5.1 Preserving independent Pauli noise under level reduction

Consider an isolated level-one CNOT exRec. Let $\Pr[\mathrm{mal}_E]$ be the probability that the malignant event $\mathrm{mal}_E$ occurs. For this event, the enclosed 1-Rec behaves as an encoded CNOT gate followed by a two-block error that, when ideally decoded, leaves a two-qubit error $E$ on the decoded state. Our counting technique then provides upper bounds on $\Pr[\mathrm{mal}_E]$ for $E \in \{IX, XI, XX, IZ, ZI, ZZ\}$. These upper bounds can be viewed as an error model for the CNOT 1-Rec in which the correlations between $X$ and $Z$ errors are unknown.

We would now like to analyze the level-two CNOT exRec. Ideally, we could treat each 1-Rec in the level-two simulation as a single “location” and use the error model obtained from level one to describe the probability of failure. Level-two analysis could then proceed by feeding this “transformed” error model back into the counting procedure in order to compute $\Pr[\mathrm{mal}_E]$ for the CNOT 2-Rec.

However, the transformed error model is based on analysis of an isolated level-one CNOT exRec. A typical level-one simulation will contain many exRecs. Adjacent exRecs may share error corrections, and at that point they can no longer be considered independently.

Level reduction works when counting sets of malignant locations because exRecs with incorrect rectangles are replaced with faulty gates in the same way, regardless of the malignant event that actually occurs. The quantity used to bound the incorrectness probability is non-increasing as locations (i.e., TECs) are removed. To see this, consider sets of exRec locations of size $k$ and denote the set of all such sets by $S_k$.
Let $M \subseteq S_k$ be those sets for which some combination of nontrivial errors at the $k$ locations causes the rectangle to be incorrect (i.e., the malignant sets). The probability that the rectangle is incorrect due to failures at exactly $k$ locations is then no more than $|M| p^k$. If an error correction is removed from the exRec, some of the sets in $M$ now contain fewer than $k$ exRec locations. The remaining sets with $k$ exRec locations are those that do not contain a location in the removed error correction. The number of such sets is at most $|M|$, so the original bound on the incorrectness probability still holds.

For non-adversarial noise models, the disadvantage of this approach is that it fails to consider all of the available information. In particular, for a fixed set of malignant locations it assumes the worst-case error for each location. The probability that a given set of $k$ locations is actually malignant can be significantly less than $p^k$. To obtain a more accurate analysis of the second level, we would like to replace each incorrect 1-Rec according to the malignant event that has actually occurred.

Figure 7.3: Upper block of the CNOT exRec. The error at the output of the TEC is either correctable ($I$) or not ($X$). Similarly, the error immediately preceding the TEC is either correctable ($I'$) or not ($X'$).

Our transformed noise model of an isolated CNOT exRec provides upper bounds on the probability of each type of malignant event, but we must show that the bounds still hold when exRecs overlap. Unfortunately, the bounds almost certainly will not hold. Consider, for example, the control block of the CNOT exRec, shown in Figure 7.3. Assume that the error immediately preceding the transversal CNOT is correctable (the error itself is not important). Let $X$ be the event that an uncorrectable $X$ error exists on the output of the TEC, and let $I$ be the event that the error on the output is correctable.
In other words, $X \equiv (\mathrm{mal}_{XI} \vee \mathrm{mal}_{XX})$ and $I \equiv \neg X$. Then define $X' \equiv \neg I'$ as the event that an uncorrectable $X$ error exists on the block following the transversal CNOT but before error correction. When the trailing error correction is removed, $\Pr[\mathrm{mal}_{XI}]$ will be non-increasing only if $\Pr[X'] \le \Pr[X]$. On the other hand, $\Pr[\mathrm{mal}_{IX}]$ will be non-increasing only if $\Pr[I'] \le \Pr[I]$. Since $\Pr[X] + \Pr[I] = \Pr[X'] + \Pr[I'] = 1$, both conditions are satisfied only if $\Pr[X] = \Pr[X']$ and $\Pr[I] = \Pr[I']$, which of course is highly unlikely.

To ensure a proper upper bound on each of the malignant event probabilities, we must calculate upper bounds for the complete exRec and for incomplete exRecs in which one or more trailing error corrections have been removed. Calculations for the complete exRec were discussed in Section 7.2.3. Calculations for the incomplete exRecs are the same, except that some of the TEC components are not considered. Bounding the malignant event probability is then a matter of finding a polynomial that bounds all four cases: both TECs present, either one removed, or both removed. Details of the bounding polynomial can be found in Appendix D of [PR12].

Once proper bounds on the level-one malignant event probabilities are determined, we would like to plug the transformed error model into our counting procedure to determine the level-two error probabilities. There are a few things to consider before doing so. First, part of the counting strategy, such as ancilla verification, may rely on the correlations between $X$ and $Z$ errors in order to avoid over-counting during postselection (for example, see Section 7.7.2). The transformed error model, however, contains no such correlation information, so the counting strategy must be altered accordingly. Second, the CNOT malignant event upper bounds do not contain information about rest, preparation or measurement locations.
Level-one error models for these locations can be computed using the same counting strategy as the CNOT, but with an appropriately modified exRec. (Alternatively, they can be incorporated into the CNOT exRecs [AC07].)

Finally, in the Pauli-channel noise model, the error probabilities of each location are constant multiples of the noise strength $\gamma$. Our upper bounds on the malignant event probabilities, however, need not have any scalar relationship. For computer analysis, error probabilities must be re-normalized in terms of $\gamma$ and error weights recalculated, as follows. Let $P^{(1)}_E$ be our upper bound on the level-one malignant event $\mathrm{mal}_E$. Then construct a polynomial $\Gamma^{(1)}$ and choose constants $\alpha_E$ such that

$$P^{(1)}_E(\gamma) \le \alpha_E \Gamma^{(1)}(\gamma) \qquad (7.20)$$

for all $E$. The polynomial $\Gamma^{(1)}$ can be viewed as an effective noise strength “reference” for level one. $\Gamma^{(1)}(\gamma)$ is a function of $\gamma$, but for notational convenience we will usually denote it as $\Gamma^{(1)}$. Together with the weights $\alpha_E$, $\Gamma^{(1)}$ defines a new independent Pauli channel noise model. Again, see Appendix D of [PR12] for details of the construction.

The new error model is now input into the counting procedure, and upper bounds on the level-two error rates are computed. Let $P^{(2)}_E(\Gamma)$ be the upper bound computed for $\mathrm{mal}_E$ at level two. Then we have the following conditions on the level-one and level-two malignant event probabilities:

$$\Pr[\mathrm{mal}^{(1)}_E] \le P^{(1)}_E(\gamma) \le \alpha_E \Gamma^{(1)}, \qquad \Pr[\mathrm{mal}^{(2)}_E] \le P^{(2)}_E(\Gamma^{(1)}). \qquad (7.21)$$

The transformed noise model provides a means for computing malignant event probabilities at level two based on the malignant event probabilities of level one. In principle, the procedure can be repeated to calculate malignant event probabilities up to any desired level of concatenation.

To prove a noise threshold, we could continue to concatenate until the transformed noise strength is sufficiently low, and then use schemes for which a threshold is known. For example, Aliferis and Preskill prove a threshold for depolarizing noise of $1.\ldots \times 10^{-\ldots}$ for a scheme based on the [[4,2,2]] code … $P^{(2)}_E$ obeys the following property:

Claim 7.5.1.

For $0 \le \epsilon \le 1$, $P^{(2)}_E(\epsilon \Gamma^{(1)}(\gamma)) \le \epsilon^{t+1} P^{(2)}_E(\Gamma^{(1)}(\gamma))$, where $t = \lfloor (d-1)/2 \rfloor$ and $d$ is the minimum distance of the (unconcatenated) code.

In other words, when the effective noise strength is scaled down, the level-two malignant event polynomials shrink at a rate that corresponds with the distance of the code. This is just the kind of behavior we should expect from a strictly fault-tolerant scheme. The proof of this claim rests on two facts: the form of the polynomials constructed by our counting technique, and the strict fault tolerance of our circuits. Details of the proof are delegated to Appendix A.

We are now in a position to establish conditions for a noise threshold, i.e., the conditions under which the probability of a successful simulation can be made arbitrarily close to one.

Theorem 7.5.2.

Let $M$ be the set of all level-one CNOT, preparation, measurement and rest malignant events, consisting of $\mathrm{mal}_{IX}$, $\mathrm{mal}_{XI}$, $\mathrm{mal}_{XX}$, $\mathrm{mal}_{IZ}$, $\mathrm{mal}_{ZI}$, $\mathrm{mal}_{ZZ}$, $\mathrm{mal}^{\mathrm{prep}}_X$, $\mathrm{mal}^{\mathrm{prep}}_Z$, $\mathrm{mal}^{\mathrm{meas}}_X$, $\mathrm{mal}^{\mathrm{meas}}_Z$, $\mathrm{mal}^{\mathrm{rest}}_X$ and $\mathrm{mal}^{\mathrm{rest}}_Z$. Also let $P^{(1)}_E$, $P^{(2)}_E$ and $\Gamma^{(1)}$ be polynomials and $\alpha_E$ constants as discussed above. Then the tolerable noise threshold for depolarizing noise is lower bounded by the largest value $\gamma_{th}$ such that

$$P^{(2)}_E(\Gamma^{(1)}(\gamma_{th})) \le \alpha_E \Gamma^{(1)}(\gamma_{th}) \qquad (7.22)$$

for all $\mathrm{mal}_E \in M$.

Proof. Assume that $P^{(2)}_E(\Gamma^{(1)}) < \alpha_E \Gamma^{(1)}$ for all $\mathrm{mal}_E$ and all $\gamma \in (0, \gamma_{th})$. Then, for a fixed $\gamma \in (0, \gamma_{th})$, there exists some positive $\epsilon < 1$ such that, for all $E$, $P^{(2)}_E(\Gamma^{(1)}) \le \epsilon \alpha_E \Gamma^{(1)}$. By choosing $\Gamma^{(2)} := \epsilon \Gamma^{(1)}$ we obtain an effective noise model for level two in which the weights $\alpha_E$ are unchanged. Our counting method depends only on the error weights. The polynomials that upper bound the level-three malignant events will therefore be the same as the polynomials that upper bound the level-two malignant events. That is, $P^{(k)}_E(\Gamma) = P^{(2)}_E(\Gamma)$ for $k \ge 2$. Thus,

$$\Pr[\mathrm{mal}^{(3)}_E] \le P^{(3)}_E(\Gamma^{(2)}) = P^{(2)}_E(\epsilon \Gamma^{(1)}) \le \epsilon^{t+2} \alpha_E \Gamma^{(1)}, \qquad (7.23)$$

where the last inequality follows from Claim 7.5.1 together with $P^{(2)}_E(\Gamma^{(1)}) \le \epsilon \alpha_E \Gamma^{(1)}$. Defining $\Gamma^{(3)} := \epsilon^{t+1} \Gamma^{(2)}$ and repeating this process $k$ times yields

$$\Pr[\mathrm{mal}^{(k+1)}_E] \le P^{(k+1)}_E(\Gamma^{(k)}) \le \epsilon^{(k-1)(t+1)+1} \alpha_E \Gamma^{(1)}, \qquad (7.24)$$

which approaches zero in the limit of large $k$.

Testing the assumption $P^{(2)}_E(\Gamma^{(1)}) < \alpha_E \Gamma^{(1)}$ over a fixed interval $(0, \gamma_{th})$ is straightforward if all of the malignant event polynomials (including $\Gamma^{(1)}$) are monotone non-decreasing up to sufficiently large values of $\gamma$. Monotonicity is highly plausible for values of $\gamma$ surrounding or below threshold. It must nevertheless be checked explicitly, based on the weighted counts obtained from malignant set counting. Appendix C of [PR12] provides an explicit procedure for checking monotonicity.

The entire malignant set counting procedure is somewhat lengthy. For convenience, we now summarize each of the steps.

1. Choose a CSS code, error correction scheme, and an independent Pauli noise model. For each encoded gate type, construct the corresponding extended rectangle that satisfies Definitions 4.2.2 and 4.2.3.
2. Partition each exRec into a hierarchy of small components.
3. For each lowest-level component, choose a small integer $k_{\mathrm{good}}$ and count all of the errors that occur with up to $k_{\mathrm{good}}$ faulty locations, according to the weights of the selected noise model. Also compute $\Pr[\mathrm{bad}]$, the probability that more than $k_{\mathrm{good}}$ locations are faulty. If necessary, compute the probability $\Pr[\mathrm{accept}]$ that the component is accepted.
4. For higher-level components, again choose a $k_{\mathrm{good}}$, and count errors by convolving results from lower-level components up to $k_{\mathrm{good}}$. Calculate $\Pr[\mathrm{bad}]$ and $\Pr[\mathrm{accept}]$ as necessary.
5. For each exRec, compute $\Pr[E]$, the probability of the logical error $E$, for each $X$ and $Z$ error. Construct the corresponding transformed Pauli noise model.
6. Either repeat the procedure (if parts of the exRec are non-deterministic), or bound the threshold analytically using Theorem 7.5.2.

7.7 Example: a depolarizing noise threshold for the Golay code

To quantify the efficacy of our adapted malignant set counting technique, we use it to calculate the depolarizing threshold of the 23-qubit Golay code. The Golay code is ideal for this task for a variety of reasons. First, with distance seven, it is substantially larger than typically studied codes, which usually have distance three. Still, it is small enough that the number of possible errors on a single block is quite manageable. Second, numerical estimates place the Golay code as one of the top performers, with depolarizing threshold estimates on the order of $10^{-\ldots}$ [Ste03, DHN06, CDT09]. On the other hand, malignant set sampling has yielded statistical lower bounds for adversarial noise of just $10^{-\ldots}$, leaving ample room for improvement.

In this section, we prove a depolarizing noise threshold lower bound of $1.\ldots \times 10^{-\ldots}$ for the Golay code. This bound essentially matches numerical estimates and is the highest known rigorous lower bound for any code. Furthermore, we show that the resource overhead for our scheme is usually substantially lower than the [[4,2,2]] …

The depolarizing noise model is particularly easy to define in terms of the weights prescribed by Definition 7.1.1. For the CNOT gate, choose $w_{ab} = 1$ for all $a, b \in \{I, X, Y, Z\}$ with $ab \ne II$. The rest location weights are chosen based on the one-qubit marginals of the CNOT. Use $w_{r_a} = \sum_{b \in \{I,X,Y,Z\}} w_{ab} = 4$ for $a \in \{X, Y, Z\}$. For preparation and measurement locations, use $w_{|0\rangle} = w_{|+\rangle} = w_{m_X} = w_{m_Z} = 4$. The preparation and measurement weights are lower than the one-qubit marginals (which would imply values of eight). Any higher noise rate could be reduced to $4\gamma + O(\gamma^2)$ by repeating the preparation or measurement using two qubits coupled by a CNOT.

The threshold calculation is most limited by the exRec with the largest number of locations. The Golay code admits transversal implementations of encoded Clifford group unitaries, and universality can be achieved by state distillation. Therefore the largest exRec in our case is for the encoded CNOT gate. This exRec consists of four Steane-type error corrections plus 23 CNOT gates (see Figure 7.4). Table 7.1 gives a breakdown of the number of locations for our preparation circuits, and the total number of locations in the CNOT exRec.

|0⟩ preparation circuit | CNOT | Prep. | Meas. | Rest | Total | CNOT exRec total
Steane                  |   77 |    23 |     0 |    6 |   106 | 5439
Overlap                 |   57 |    23 |     0 |   38 |   118 | 5823

Table 7.1: Location counts for preparing encoded $|0\rangle$ in the Golay code (columns CNOT through Total give location types). Encoded $|0\rangle$ ancillas are prepared with either the pseudorandomly constructed Steane preparation circuits (Table 6.4) or the overlap preparation circuits (Figure 6.6 and Table 6.5). The last column shows the total number of locations inside the CNOT exRec shown in Figure 7.4, including the transversal CNOT operation and four error corrections.

Verification schedule | CNOT pseudo-threshold | Threshold
Steane-4              | 1.… × 10^−…           | … × 10^−…
Overlap-4             | 1.… × 10^−…           | … × 10^−…

Table 7.2: Threshold lower bounds for circuits based on our four-ancilla preparation and verification schedules for the Golay code, based on Figure 6.9. Thresholds are given with respect to $p$, the probability that a physical CNOT gate fails, according to the depolarizing noise model defined in Section 7.7.1.
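The depolarizing weight assignments are simple enough to tabulate directly. The sketch below makes three assumptions:

- The CNOT weights cover the 15 non-identity pairs $ab \ne II$.
- The rest weights are the one-qubit marginals $w_{r_a}$, as stated.
- The value eight comes from the fact that either an $X$ or a $Y$ error flips a $|0\rangle$ preparation, so inheriting the rest marginals would give $w_{r_X} + w_{r_Y} = 8$.

The third point is an interpretation, and all names are illustrative.

```python
PAULIS = "IXYZ"

# Depolarizing CNOT: weight 1 for each of the 15 non-identity two-qubit Paulis ab.
cnot_weights = {a + b: 1 for a in PAULIS for b in PAULIS if a + b != "II"}


def rest_marginal(a):
    """One-qubit marginal w_{r_a} = sum over b in {I, X, Y, Z} of w_{ab}."""
    return sum(cnot_weights[a + b] for b in PAULIS)


rest_weights = {a: rest_marginal(a) for a in "XYZ"}

# Assumed interpretation: an X or a Y error flips a |0> preparation, so taking the
# rest marginals would give w_{r_X} + w_{r_Y}; the text uses weight 4 instead.
marginal_prep_flip = rest_weights["X"] + rest_weights["Y"]
prep_meas_weight = 4

if __name__ == "__main__":
    print(len(cnot_weights), rest_weights, marginal_prep_flip, prep_meas_weight)
```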
[Figure 7.4 drawing: ancilla preparation with X verification and Z verification stages; Z-error correction followed by X-error correction on the data block (input errors χ_in, ζ_in; output errors χ_out, ζ_out), using verified |0⟩ and verified |+⟩ ancillas.]

Figure 7.4: Organization of a CNOT exRec for the Golay code. The CNOT exRec includes four error corrections and a transversal CNOT gate, as illustrated in Figure 7.2b. (a) Each error-correction component consists of separate $Z$ and $X$ error corrections. $Z$-error correction requires an encoded $|0\rangle$ state ($|\overline{0}\rangle$) that has been verified against errors, and $X$-error correction requires a verified $|+\rangle$ ancilla state. (b) A verified $|\overline{0}\rangle$ state is prepared by checking two pairs of prepared $|\overline{0}\rangle$ states against each other for $X$ errors. Conditioned on no $X$ errors being detected, the results are then checked against each other for $Z$ errors. Verified $|+\rangle$ is prepared by taking the dual of the $|\overline{0}\rangle$ circuit. See Chapter 6.

[Figure 7.5 plots: upper bounds on malignant event probabilities versus p (0 to 0.0018). (a) X-error malignant events: mal_IX, mal_XI, mal_XX, mal^meas_X, mal^rest_X, mal^prep_X. (b) Z-error malignant events: mal_IZ, mal_ZI, mal_ZZ, mal^meas_Z, mal^rest_Z, mal^prep_Z.]

Figure 7.5: These plots show upper bounds on the probability of malignant events for the different level-one exRecs. The $\mathrm{mal}_{IX}$, $\mathrm{mal}_{XI}$, $\mathrm{mal}_{XX}$, $\mathrm{mal}_{IZ}$, $\mathrm{mal}_{ZI}$ and $\mathrm{mal}_{ZZ}$ events all pertain to the CNOT exRec. The $\mathrm{mal}^{\mathrm{prep}}_X$ and $\mathrm{mal}^{\mathrm{prep}}_Z$ events correspond to the $|0\rangle$ and $|+\rangle$ preparation exRecs, respectively. $\mathrm{mal}^{\mathrm{meas}}_X$ and $\mathrm{mal}^{\mathrm{meas}}_Z$ correspond to $Z$-basis and $X$-basis measurement exRecs, and $\mathrm{mal}^{\mathrm{rest}}_X$ and $\mathrm{mal}^{\mathrm{rest}}_Z$ pertain to the rest exRecs. Note that the upper bound on $\mathrm{mal}_{ZI}$ is significantly higher than that of its dual counterpart $\mathrm{mal}_{IX}$. This is due largely to the arbitrary choice in error correction to correct $Z$ errors first and $X$ errors second.

X-error verification

$X$-error verification requires two encoded $|0\rangle$ states.
The first is verified against the second for $X$ errors by applying transversal CNOT gates between the two code blocks and then measuring each qubit of the second block in the $Z$ eigenbasis ($|0\rangle$, $|1\rangle$ basis). Conditioned on no $X$ errors being detected, the first code block is accepted. See Figure 7.4a.

Letting $\mathrm{accept}$ denote the event that no $X$ errors are detected, we use the definition of conditional probability

$$\Pr[\text{event} \mid \mathrm{accept}] = \frac{\Pr[\text{event}, \mathrm{accept}]}{\Pr[\mathrm{accept}]} \qquad (7.25)$$

to compute the conditional probabilities of different error events. For an event $\chi$ involving only $X$ errors, this calculation is straightforward.

However, if the event is a $Z$ error $\zeta$, then the numerator $\Pr[\zeta = z, \mathrm{accept}]$ is difficult to compute because it mixes $X$ and $Z$ errors. The obvious bound, $\Pr[\zeta = z, \mathrm{accept}] \le \Pr[\zeta = z]$, is quite pessimistic. In the depolarizing noise model we expect $X$ errors to occur with $Z$ errors roughly half of the time, so $X$-error verification should remove many $Z$ errors. An accurate count of $Z$ errors is important because they strongly influence the acceptance rate of the upcoming $Z$-error verification. Therefore, we also count $X$ and $Z$ errors together for very low-order faults and apply a correction to the $Z$-only counts. Specifically, when counting $X$ and $Z$ errors together, we keep track of the errors that are rejected rather than those that are accepted. Since the $Z$-only counts contain all errors, we may subtract off the rejected error counts while maintaining proper counts for the accepted errors. Details are worked out in [PR12].

The improvement obtained by counting $X$ and $Z$ errors simultaneously is twofold. First, the reduction in $Z$ errors directly reduces the probability of a $Z$-error malignant event. Indeed, we find that the correction cuts the number of $Z$ errors roughly in half, as expected. More importantly, a smaller number of $Z$ errors means an increased acceptance probability during the upcoming $Z$-error verification.
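The rejected-count bookkeeping can be illustrated on a toy register. The sketch below is an illustration under simplifying assumptions, not the procedure of [PR12]:

- Each failing location applies one of $X$, $Y$, $Z$ with weight one.
- Verification rejects any configuration with a nonzero $X$ component.
- Joint $X$/$Z$ counting is done only up to a hypothetical order `k_joint`.

Subtracting the jointly counted rejections from the $Z$-only counts gives exact accepted counts up to `k_joint`, and leaves the pessimistic $Z$-only counts above it.

```python
from collections import Counter
from itertools import combinations, product

# Single-qubit Paulis as (x_bit, z_bit).
PAULI_XZ = {"X": (1, 0), "Y": (1, 1), "Z": (0, 1)}


def configurations(location_qubits, k):
    """Yield (x_mask, z_mask) for every way that k locations fail with X, Y or Z."""
    for failed in combinations(range(len(location_qubits)), k):
        for paulis in product("XYZ", repeat=k):
            x_mask = z_mask = 0
            for i, p in zip(failed, paulis):
                xb, zb = PAULI_XZ[p]
                x_mask ^= xb << location_qubits[i]
                z_mask ^= zb << location_qubits[i]
            yield x_mask, z_mask


def corrected_z_counts(location_qubits, k_max, k_joint):
    """Z-only counts minus jointly counted rejections (orders k <= k_joint only)."""
    z_only, rejected = Counter(), Counter()
    for k in range(k_max + 1):
        for x_mask, z_mask in configurations(location_qubits, k):
            z_only[(z_mask, k)] += 1  # Z-only counting ignores the X component
            if k <= k_joint and x_mask != 0:
                rejected[(z_mask, k)] += 1  # joint counting tracks rejections
    return {key: z_only[key] - rejected[key] for key in z_only}


def accepted_z_counts(location_qubits, k_max):
    """Direct count of accepted configurations (x_mask == 0), for comparison."""
    accepted = Counter()
    for k in range(k_max + 1):
        for x_mask, z_mask in configurations(location_qubits, k):
            if x_mask == 0:
                accepted[(z_mask, k)] += 1
    return accepted


if __name__ == "__main__":
    qubits = [0, 1]
    print(corrected_z_counts(qubits, 2, 2)[(1, 1)], accepted_z_counts(qubits, 2)[(1, 1)])
```

With `k_joint` at least `k_max` the corrected table matches the direct count. With a smaller `k_joint` it stays an upper bound, which is the property needed for the malignant event bounds.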
We see from Figure 7.6a that the lower bound on $Z$-error verification acceptance at $p = 10^{-\ldots}$ is about 0.84. We crudely estimate a lower bound without the correction of about 0.63, a decrease by a factor of about 1.33. There are four $Z$-error verifications of encoded $|0\rangle$ in the (full) exRec and four similar $X$-error verifications of encoded $|+\rangle$. Thus, in the normalization factor alone, the correction reduces upper bounds on the malignant event probabilities by roughly a factor of $1.33^{8} \approx 10$ … as $p$ decreases.

exRec

Counting of the exRec component was discussed in Section 7.2.3. However, there are a few items of note for our example based on the Golay code. First, the ancilla verification components are non-deterministic. Accordingly, all of the malignant event probabilities must be conditioned on acceptance of all of the verification stages. The counts reported by the ancilla verification stages already assume successful verification. Calculating the conditional probability is therefore simply a matter of dividing by the product of all of the acceptance probabilities.

Second, we seek to combine large subsets of the sub-component counts. However, the block size of the Golay code and the size of the sub-components in the CNOT exRec make it impractical to take all possible convolutions of the sub-component error counts. Instead, the $\mathrm{bad}_X$ event for the exRec (and analogously the $\mathrm{bad}_Z$ event) occurs when any of the following are true:

• any of the sub-components are $\mathrm{bad}_X$,
• there are more than 25 $X$ failures in the exRec,
• there is more than one $X$ failure in the transversal CNOT and there are more than three $X$ failures in each of the two leading ECs.

The last condition eliminates faults that are particularly difficult to count. The time required to count an exRec fault is proportional to the product of the numbers of unique syndromes that can result at the output of the two leading ECs and the transversal CNOT. With two $X$ failures, the transversal CNOT can produce $\binom{23}{2} \cdot 3^2 = 2277$ unique syndromes. With one $X$ failure, the number is $23 \cdot 3 = 69$. … $X$ failures, respectively.
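The three rules for the $\mathrm{bad}_X$ event, and the syndrome-count figures for the transversal CNOT, can be written out directly. The function names are hypothetical. The sketch assumes that each failing CNOT contributes one of three $X$-type errors ($IX$, $XI$ or $XX$). This is an inference, chosen because it reproduces the figure of 2277.

```python
from math import comb


def exrec_is_bad_x(subcomponent_bad, k_total, k_cnot, k_lec1, k_lec2):
    """Return True when the CNOT exRec is declared bad_X by the three rules."""
    return (
        subcomponent_bad  # any sub-component is bad_X
        or k_total > 25  # more than 25 X failures in the exRec
        or (k_cnot > 1 and k_lec1 > 3 and k_lec2 > 3)  # expensive-to-count faults
    )


def cnot_x_patterns(failures, n_gates=23, x_types=3):
    """Distinct X-error patterns from `failures` failing gates of a transversal CNOT.

    Assumes each failing CNOT contributes one of x_types X-type errors (IX, XI, XX).
    """
    return comb(n_gates, failures) * x_types**failures


if __name__ == "__main__":
    print(cnot_x_patterns(2), cnot_x_patterns(1))
    # Two CNOT failures with 3 and 2 LEC failures: still counted, just costly.
    print(exrec_is_bad_x(False, 7, 2, 3, 2))
```

The expensive joint events are only those in which all three counts are large at once, which is why the last rule singles out that combination.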
So, for example, the event $K_{X,1} = 2, K_{X,2} = 3, K_{X,3} = 1$ ($277 \cdot \ldots \approx \ldots$) requires far less time than the event $K_{X,1} = 2, K_{X,2} = 3, K_{X,3} = 2$ ($277 \cdot \ldots \approx \ldots$). In particular, we would like to avoid counting faults for which $K_{X,3} = 2$.

Calculations for each of the $\mathrm{bad}_X$ terms are plotted in Figure 7.6b. Label each of the exRec sub-components with numbers, starting with the LECs (1, … $\Pr[\mathrm{bad}^{(3)}_X]$) or the condition involving the transversal CNOT and the two LECs ($\Pr[K_{X,3} > 1] \prod_{j=1}^{2} \Pr[K_{X,j} > 3 \mid \mathrm{accept}^{(j)}]$).

[Figure 7.6 plots versus p: (a) acceptance probabilities, for p from 0 to 0.0025; (b) upper bounds on the conditions leading to bad_X, on a logarithmic scale.]

Figure 7.6: Plotted in (a) are lower bounds on the Overlap-4 acceptance probabilities for the two $X$-error verifications ($\mathrm{accept}^{(1)}$ and $\mathrm{accept}^{(2)}$). Also plotted is the acceptance probability for the $Z$-error verification ($\mathrm{accept}$), conditioned on success of the $X$-error verifications. The plot in (b) shows upper bounds on conditions that lead to a $\mathrm{bad}_X$ event in the CNOT exRec.

Our thresholds compare favorably to threshold results for similar circuits. For a six-ancilla preparation and verification circuit, Aliferis and Cross [AC07] give a threshold estimate based on malignant set sampling of $p \approx \ldots \times 10^{-\ldots}$ for adversarial noise. Our results beat this by an order of magnitude. They provide strong evidence that, for depolarizing noise, our counting technique improves on both malignant set sampling and malignant set counting. Our results also essentially close the gap with other analytical and Monte Carlo threshold estimates for depolarizing noise. Using a closed-form analysis, Steane [Ste03] estimated a threshold on the order of $10^{-\ldots}$ for the Golay code with similar noise parameters. Dawson, Haselgrove and Nielsen [DHN06] calculated a higher estimate of just under $3 \times 10^{-\ldots}$. Cross et al. [CDT09] estimated a pseudo-threshold of $2.\ldots \times 10^{-\ldots}$ based on Monte Carlo simulations of a twelve-ancilla preparation and verification circuit.

Beyond circuits based on the Golay code, our results are apparently the highest rigorous threshold lower bounds known. Aliferis and Preskill [AP09] prove a lower bound of $p \ge \ldots \times 10^{-\ldots}$. Their analysis applies to teleportation-based gates due to Knill [Kni05], in which Bell pairs encoded into an error correcting code are prepared as follows. Each qubit of the code block is first encoded into an error detecting code, and error detection and postselection are performed after each step of the encoding. Our best threshold is only about 5 percent better, but it applies to circuits that usually require far less overhead (see Section 7.7.4). This implies only that our analysis is more accurate in the depolarizing noise model, not that our schemes tolerate more noise.

The limiting factor on the threshold value is the event $\mathrm{mal}_{ZI}$. That is, $\mathrm{mal}_{ZI}$ is the event $\mathrm{mal}_E$ for which the crossing $\Pr[\mathrm{mal}^{(2)}_E] = \Pr[\mathrm{mal}^{(1)}_E]$ occurs at the smallest value of $p$. In fact, the corresponding threshold values for nearly all $Z$-error malignant events are lower than the threshold values for any of the $X$-error events. This asymmetry is due to the arbitrary order in which we perform error correction: $Z$ first, then $X$. Some $X$ errors resulting from the leading $Z$-error correction will be corrected by the $X$-error correction that follows. However, $Z$ errors resulting from the $X$-error correction may propagate through the encoded operation before arriving at the $Z$-error correction on the trailing end. As a result, $Z$ errors on individual blocks are more likely to be combined by the CNOT gate into an uncorrectable error.
Evidence of this effect can be seen in the level-one malignant event probabilities shown in Figure 7.5.

It should be possible to reduce such lopsided event probabilities by customizing the error correction order for each EC based on the specifics of the ancilla preparation circuits. However, analyzing such a scheme would require consideration of up to 36 different full or partial CNOT exRecs (two choices for each EC) instead of four, and is likely to yield only a small improvement in the threshold. Note that other small improvements could be made by, for example, eliminating measurement or rest exRecs at level two. For simplicity, these optimizations were not considered.

The threshold provides a target accuracy for quantum computing hardware, but it does not give a complete picture on its own. In particular, we would also like to understand how the resource overhead of our scheme scales as the physical error rate drops below threshold. Ultimately, the resource scaling will determine how small physical error rates must be in order to keep space and time resources at a manageable level. In this section we calculate upper bounds on the number of physical gates and the number of physical qubits required to implement a single logical gate with a given effective error rate.

Our threshold analysis assumes that an infinite supply of ancilla qubits is available for use in error correction. In order to bound the resource overhead, we instead assume that some finite number of ancillas is available to each $k$-EC. Error correction proceeds normally unless all ancilla verifications fail. If the number of available ancillas is high enough, then the probability that all verifications fail will be small, and the impact on the logical error rate will be similarly small.

[Figure 7.7 plots: physical gates per logical gate versus $p$. (a) Golay scheme with Overlap-4 preparation. (b) Fibonacci scheme.]
Figure 7.7: Gate overhead upper bounds for (a) our Golay scheme with overlap ancilla preparation and (b) the Fibonacci scheme presented in [AP09]. Each plot shows the number of physical gates required to implement a logical gate with target error rates $p_{\text{target}} \in \{10^{-\ldots}, 10^{-\ldots}, 10^{-\ldots}, 10^{-\ldots}\}$. Black text labels indicate the required level of concatenation, and colored lines are a guide for the eye.

More precisely, our approach is as follows. The ancilla verification circuit (Figure 7.4a) is considered as a single unit. Each level-$k$ $Z$-error correction consists of $m_k$ $|0\rangle$ verifications performed in parallel, plus a transversal rest, CNOT and $X$-basis measurement. If all of the $m_k$ verifications fail, then $Z$-error correction is aborted and the data is left idle. Level-$k$ $X$-error correction is similar. For simplicity, if any of the error corrections are aborted, then we consider the entire top-level logical gate to have failed.

Let $p_{\text{target}}$ be the overall target error rate per logical gate, let $P^{(k)} := \max_i P^{(k)}_i$, and let $K$ be the minimum level of concatenation that achieves $P^{(K)} < p_{\text{target}}$ assuming an unbounded number of ancillas. We may then calculate a bound on the number of ancilla verifications $m_k$ for every $k \le K$. Setting $\delta^{(k)} = p_{\text{target}} - P^{(k)}$, the total gate overhead $g(k)$ for a CNOT $k$-Rec can be computed recursively by $g(k) \le (2 m_k A_{EC} + 23)\, g(k-1)$, where $A_{EC}$ is the number of locations in the error-correction component. Details are provided in [PR12].

Gate overhead upper bounds for the overlap-based scheme are shown in Figure 7.7a. The overhead increases dramatically as the target logical error rate decreases. However, compared to similar upper bounds for the Fibonacci scheme, which has a similar threshold lower bound [AP09], our scheme is better for a wide range of error rates, often by several orders of magnitude. One reason for the improved overhead is that our scheme is based on a code with higher distance than the Fibonacci scheme, which uses the [[4, … $k$-Rec has depth three, independent of $k$. We therefore pessimistically assume that once a qubit is measured, it cannot be reused within the same rectangle. The qubit overhead then depends only on the gate overhead and the qubit-gate ratio for $|0\rangle$ verification. Using a ratio of $8 \cdot \ldots/(A_{EC} - 46)$, we obtain $q(k) \le \ldots\, g(k)$. Therefore, the level-$k$ qubit overhead is roughly $k$ orders of magnitude lower than the level-$k$ gate overhead.

[Figure 7.8 plots: physical qubits per logical gate versus $p$. (a) Golay scheme with Overlap-4 preparation. (b) Fibonacci scheme.]
Figure 7.8: Qubit overhead upper bounds. Plots are formatted identically to Figure 7.7.

The qubit-gate ratio for Bell-state preparation in the Fibonacci scheme is relatively large ($\approx \ldots$). With $p_{\text{target}} = 10^{-\ldots}$ and $p = 10^{-\ldots}$, our scheme requires two levels of concatenation and about $10^{\ldots}$ physical gates per logical gate. For the same error rates, the Fibonacci scheme requires three levels of concatenation, but fewer than $10^{\ldots}$ gates.

Finally, note that bounds for our scheme when $p_{\text{target}} = 10^{-\ldots}$ are a bit loose due to a constant offset that is added during the transformed noise model construction. In our computer analysis, these offsets were on the order of $\epsilon \approx 10^{-\ldots}$. In principle, this offset does not affect the actual error rates; rather, it is an artifact of our construction.

Our explicit calculations for the Golay code show the power of the modified malignant set counting technique. Compared to standard malignant set counting, we are able to count much larger sets of faulty locations, and we obtain a bound on the threshold that is about an order of magnitude larger than previous attempts. Intuitively, this is because we efficiently ignore subsets of faulty locations that are unlikely to occur. Use of the independent Pauli noise model permits fair comparisons of our bounds with Monte Carlo estimates. In the case of the Golay code, our rigorous lower bound roughly matches numerical estimates due to [Ste03, DHN06, CDT09].

The technique is quite general and can be applied to any CSS code. However, there are still several drawbacks to our approach. First, we count errors in terms of equivalence classes based on the stabilizers of the code, but the number of unique errors per block is still exponential in the block size. For the Golay code, this meant keeping track of $2^{\ldots}$ $X$ errors and $2^{\ldots}$ $Z$ errors. For two blocks, the total number of errors was $2 \cdot \ldots$.
This number of errors is manageable, but numbers for larger codes may become unwieldy.

Another drawback is that we have assumed arbitrary qubit interactions, ignoring any physical geometric locality constraints. This simplifies the analysis greatly, but artificially inflates the threshold and underestimates resource requirements in the case that geometric constraints are actually required. Therefore, our results are not directly comparable to thresholds for topological codes, including the surface code, for example. Of course, our technique can be adapted to account for geometric constraints by, for example, inserting swap gates if necessary. We have not considered such adaptations here.

Chapter 8
Decomposition of single-qubit unitaries into fault-tolerant gates

This chapter is based on material that appears in [PS13].

The mapping of a quantum algorithm into its equivalent fault-tolerant circuit representation requires a choice of universal basis, most commonly consisting of CNOT and single-qubit gates (see Section 2.5). Traditional methods for single-qubit unitary decomposition take as input a unitary $U$ and a distance parameter $\epsilon$, and output a sequence of gates $W = G_1 \cdots G_k$ such that $\|U - W\| \le \epsilon$, for $G_1, \ldots, G_k$ in the chosen gate set and some choice of norm $\|\cdot\|$. The operation $W$ is said to approximate $U$ to within a distance $\epsilon$. This approach is justified by the fact that when $\|U - W\|$ is small, the output distribution of a circuit containing $U$ is close to the output distribution obtained by substituting $W$.

The set of single-qubit unitaries that can be implemented fault tolerantly is predominantly dictated by the existence of resource-efficient fault-tolerance protocols (see Section 4.3). A common universal single-qubit basis is $\{H, S, T\}$, since $H$ and $S$ can often be implemented transversally, and $T$ can be achieved through state distillation.
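To make the distance criterion concrete, the following sketch computes $\|U - W\|$ using the operator (spectral) norm, which is one common choice for $\|\cdot\|$. It is a minimal sketch under stated assumptions: the helper names (`spectral_norm_2x2`, `distance`, `rz`) are hypothetical, the convention $R_Z(\theta) = \mathrm{diag}(1, e^{i\theta})$ is the one used later in this chapter, and no global phase is factored out (some decomposition methods measure distance only up to a global phase).

```python
import cmath
import math


def spectral_norm_2x2(a):
    """Largest singular value of a 2x2 complex matrix given as nested lists."""
    (a00, a01), (a10, a11) = a
    # Entries of the Hermitian matrix M = A^dagger A.
    m00 = abs(a00) ** 2 + abs(a10) ** 2
    m11 = abs(a01) ** 2 + abs(a11) ** 2
    m01 = a00.conjugate() * a01 + a10.conjugate() * a11
    tr = m00 + m11
    det = m00 * m11 - abs(m01) ** 2
    # Largest eigenvalue of M; clamp tiny negative values caused by rounding.
    lam_max = (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2
    return math.sqrt(max(lam_max, 0.0))


def distance(u, w):
    """Operator-norm distance ||U - W|| between two 2x2 matrices."""
    diff = [[u[r][c] - w[r][c] for c in range(2)] for r in range(2)]
    return spectral_norm_2x2(diff)


def rz(theta):
    """R_Z(theta) = diag(1, e^{i theta}) (assumed convention)."""
    return [[1 + 0j, 0j], [0j, cmath.exp(1j * theta)]]


T_GATE = rz(math.pi / 4)  # T = diag(1, e^{i pi/4})

print(distance(rz(0.1), rz(0.0)))  # equals 2 sin(0.05) for these diagonal matrices
```

For diagonal matrices such as $R_Z(\theta)$ the result reduces to the largest entrywise difference, but the helper handles general $2 \times 2$ matrices as well.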
The cost of an $\{H, S, T\}$ circuit is usually defined to be the number of $T$ gates, since the resource cost of a fault-tolerant $T$ gate is up to an order of magnitude larger than the resource cost of a fault-tolerant $H$ gate [RHG07, FDJ13]. (The inclusion of $S$ is a direct consequence of this choice of cost function; the $S$ gate is otherwise redundant, since $S = T^2$.) The sequence $W$ involves no measurements, and is therefore deterministic (at the logical level). In this chapter we will show that by allowing a small number of ancilla qubits and measurements, non-deterministic circuits can outperform deterministic circuits that are otherwise optimal. The circuits that we consider can be used to approximate a single-qubit unitary with roughly one-third to one-fourth the cost of traditional decomposition methods.

As an example, consider the circuit shown in Figure 8.1a, which performs the single-qubit unitary $V = (I + 2iZ)/\sqrt{5}$. This circuit involves two measurements in the $X$ basis. If both measurement outcomes are zero, then the output is equivalent to $V|\psi\rangle$. If any other outcome occurs, then the output is $I|\psi\rangle = |\psi\rangle$. Thus, the circuit may be repeated until obtaining the all-zeros outcome, and the number of repetitions follows a geometric probability distribution. (In this case the probability of getting both zeros is $5/8$.) Note that $V$ is implemented exactly, even though the overall circuit is non-deterministic. Each Toffoli gate can be implemented using four $T$ gates, so each attempt with the two Toffoli gates costs eight $T$ gates, and the overall expected cost is $8/(5/8) = 12.8$. By contrast, an approximation of $V$ to within $\epsilon = 10^{-\ldots}$ using the deterministic algorithm of [KMM12c] requires 67 $T$ gates.

We call a circuit of the form of Figure 8.1a, which may be repeated until obtaining some desired outcome, a repeat-until-success circuit, or RUS circuit for short. Through the use of an optimized direct-search algorithm, we present thousands of RUS circuits that exactly implement select unitary rotations at extremely low $T$ count. By explicitly computing the circuit sequences, we construct a database of single-qubit unitaries that is sufficiently large to approximate an arbitrary $Z$-axis rotation to within $\epsilon \ge 10^{-\ldots}$. Using this database, the expected number of $T$ gates required to approximate a random $Z$-axis rotation
$$R_Z(\theta) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$$
scales as
$$\mathrm{Exp}_Z[T] = 1.26 \log_2(1/\epsilon) - \ldots.53. \tag{8.1}$$

While existing algorithmic decomposition methods are capable of approximations to smaller distances, our techniques provide approximations with extremely low $T$ counts. Furthermore, approximations to within $10^{-\ldots}$ are sufficient for many quantum algorithms, including Shor's factoring algorithm [FH04] and quantum chemistry algorithms [JWM+12]. An arbitrary single-qubit unitary $U$ can be approximated by first expressing it as a product of three $Z$-axis rotations,
$$U = R_Z(\theta_1)\, H\, R_Z(\theta_2)\, H\, R_Z(\theta_3). \tag{8.2}$$
Each rotation can then be decomposed individually. However, RUS circuits can also be used to approximate arbitrary single-qubit unitaries directly, without resorting to $Z$-axis rotations. Our results indicate a $T$-count scaling of
$$\mathrm{Exp}_U[T] = 2.\ldots \log_2(1/\epsilon) - \ldots, \tag{8.3}$$
roughly another 50 percent better than using (8.1), and up to four-fold better than traditional deterministic decomposition of three $Z$-axis rotations. Constructing a database of RUS circuits for arbitrary unitaries is significantly more challenging than for the $Z$-axis case, however. We have computed approximations only up to $\epsilon = 8 \times 10^{-\ldots}$.

[Figure 8.1 circuits: (a) Exp[T] = 12.8; (b) Exp[T] = 6.4; (c) Exp[T] < …]
Figure 8.1: Repeat-until-success circuits for $V = (I + 2iZ)/\sqrt{5}$. Each of the circuits implements $V$ conditioned on an $X$-basis measurement outcome of zero on each of the top two ancilla qubits. If any other measurement outcome occurs, then each circuit implements the identity. The probability of measuring 00 is $5/8$; the circuits differ in their expected numbers of $T$ gates, as indicated. (a) A slight modification of the circuit presented in [NC00], p. 198. Each Toffoli gate can be implemented with four $T$ gates (see [Jon13d]). (b) A circuit proposed by Jones that requires just a single Toffoli gate [Jon13c]. (c) An alternative circuit found by our computer search. Measurement of the first qubit can be performed before interaction with the data qubit, so the top-left part of the circuit can be repeated until measuring zero. The probability of measuring zero on the first qubit is $3/4$. The probability of measuring zero on the second qubit, conditioned on a zero outcome on the first qubit, is $5/6$. The $T$ gate applied directly to $|\psi\rangle$ can be freely commuted through the CNOT. In the case that an even number of attempts is required, the $T$ gates can be combined to yield the Clifford gate $T^2 = S$.

By the Solovay-Kitaev theorem [Kit97, KSV02], a single-qubit unitary operation can be efficiently approximated to within a desired $\epsilon$ by decomposition into a sequence of gates from a discrete universal basis with length $O(\log^c(1/\epsilon))$, where $c = 1$ is the asymptotic lower bound [Kni95], and the best-known practical implementation achieves $c \sim 3.97$ [DN05]. The algorithm works by finding progressively better approximations of a unitary $U$ through application of the group commutator $G_1 G_2 G_1^\dagger G_2^\dagger$ for pairs of gates $G_1, G_2$ in the gate set. The key insight of the theorem is that use of the group commutator converges to $U$ exponentially fast.

Approximations with optimal scaling $O(\log(1/\epsilon))$ are possible. Fowler proposed an exponential-time algorithm that yields an optimal decomposition with a $T$-gate count of roughly $2.95 \log_2(1/\epsilon) + 3.75$ on average [Fow11]. He used an optimized but exhaustive search over gate sequences of progressively longer length, stopping at the first sequence within the required distance. The weakness of this approach is that it is practical only for approximations up to about $\epsilon \ge 10^{-\ldots}$. Bocharov and Svore have proposed a more efficient method that can be used to extend this range somewhat [BS12].

An ancilla-based method known as "phase kickback" provides a computationally efficient and cost-competitive alternative for approximating $Z$-axis rotations [KSV02]. Phase kickback involves preparing a special ancilla state based on the quantum Fourier transform and then using addition circuits controlled by the single-qubit input to effect the desired rotation. Optimization of the ancilla state preparation yields a cost scaling that is somewhat higher than Fowler's results [JWM+12, Jon13b], but can be made more competitive in certain cases [Jon13c]. Phase kickback offers the possibility of very low circuit depth, as low as $O(\log\log(1/\epsilon))$, but requires a relatively large number of ancilla qubits, $O(\log(1/\epsilon))$.

Recently, in a series of breakthroughs, efficient algorithms for asymptotically optimal single-qubit decomposition were discovered [KMM12a, Sel12, KMM12c]. These algorithms are based on an earlier algorithm for optimally and exactly decomposing a certain class of unitaries into Clifford and $T$ gates [KMM12b]. The approximation algorithms work by first rounding the unitary $U$ to the closest $\tilde{U}$ that can be exactly decomposed over $\{\mathrm{Clifford}, T\}$, and then using the exact decomposition algorithm on $\tilde{U}$. Unlike phase kickback, these algorithms do not require ancilla qubits.

Selinger showed that ancilla-free approximation of a single-qubit $R_Z(\theta)$ rotation to within a distance of $\epsilon$ requires $4\log_2(1/\epsilon) + 11$ $T$ gates in the worst case [Sel12]. For many values of $\theta$, however, the number of $T$ gates can be significantly smaller. Kliuchnikov, Maslov and Mosca (KMM) gave an efficient algorithm that is shown to scale as $3.21\log_2(1/\epsilon) - \ldots$ for $R_Z(1/10)$ [KMM12c].
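As a rough illustration of the ancilla-free scalings quoted above, the sketch below evaluates Fowler's average-case formula and Selinger's worst-case bound for a few target distances, assuming base-2 logarithms. The function names are hypothetical. Note that one formula is an average and the other is a worst case, and that Fowler's search is practical only for comparatively large $\epsilon$, so the smallest distances are extrapolations of the formula rather than measured counts.

```python
import math


def fowler_avg_t(eps):
    """Average T count of Fowler's exhaustive search: 2.95 log2(1/eps) + 3.75."""
    return 2.95 * math.log2(1 / eps) + 3.75


def selinger_worst_t(eps):
    """Worst-case T count of Selinger's ancilla-free R_Z synthesis: 4 log2(1/eps) + 11."""
    return 4 * math.log2(1 / eps) + 11


for eps in (1e-4, 1e-6, 1e-10):
    print(f"eps={eps:g}: Fowler (avg) ~ {fowler_avg_t(eps):.1f}, "
          f"Selinger (worst) <= {selinger_worst_t(eps):.1f}")
```

Both formulas grow linearly in $\log_2(1/\epsilon)$; the comparison is mainly about the leading coefficient.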

A few non-deterministic decomposition techniques have also been developed. So-called "programmable ancilla rotations" (PAR) use a cascading set of specially prepared ancilla states along with gate teleportation [JWM+12]. The number of $T$ gates required by PAR is larger than for ancilla-free methods, but the expected resource costs are comparable in some architectures [Jon13c]. Similar use of non-deterministic circuits to produce a "ladder" of non-stabilizer states, and in turn approximate an arbitrary unitary, has also been proposed [DS12].

RUS circuits have already been proposed for decomposition into an alternate logical gate set. Bocharov, Gurevich and Svore (BGS) showed that arbitrary single-qubit unitaries can be approximated using the gate set $\{H, S, V\}$ with a typical scaling of $3\log(1/\epsilon)$ in the number of $V$ gates [BGS13]. They suggest a fault-tolerant implementation of the $V$ gate using Figure 8.1a, which requires eight $T$ gates, four for each Toffoli (see [Jon13d]). Later, Jones improved this circuit, using only a single Toffoli gate [Jon13c]. Through optimized direct search, we found an alternative RUS circuit for $V$ that uses only four $T$ gates and has a lower expected $T$ count than the other two circuits, as shown in Figure 8.1c. Further discussion of decomposition with $V$ is found in Section 8.6.

Our proposed method of single-qubit unitary decomposition based on RUS circuits is, of course, also non-deterministic. In the next section we describe these circuits in detail, and in Section 8.5 we analyze the results of our optimized direct search. Decomposition algorithms are described in Sections 8.6 and 8.7.

[Figure 8.2 circuit: ancillas in $|0\rangle$ and the input $|\psi\rangle$ enter $W$; the ancillas are measured; on failure the recovery $\{R_i^\dagger\}$ is applied; the output is $U|\psi\rangle$.]
Figure 8.2: A repeat-until-success circuit that implements the unitary $U$. Ancilla qubits are prepared in $|0\rangle$, then the unitary $W$ is performed on both the ancillas and $|\psi\rangle$.
Upon measuring the ancillas, a unitary operation is effected on $|\psi\rangle$ that is either $U$ or one of $\{R_i\}$, depending on the measurement outcome. If the measurement outcome indicates $R_i$, then the recovery operation $R_i^\dagger$ is performed, and the process can be repeated.

For convenience, a summary of single-qubit decomposition methods is given in Tables 8.1 and 8.2.

Method | Description | T count | Comments
Solovay-Kitaev | Converging $\epsilon$-net based on group commutators. | $O(\log^{3.97}(1/\epsilon))$ | Computationally efficient, but sub-optimal $T$ count.
Ladder states | Hierarchical distillation based on $|H\rangle$ states. | $O(\log^{\ldots}(1/\epsilon))$ | Some of the cost can be shifted "offline".
Direct search | Optimized exponential-time search. | $2.95\log_2(1/\epsilon) + 3.75$ | Optimal ancilla-free $T$ count.
BGS | Direct search decomposition with $V$. | $T_V \cdot 3\log(1/\epsilon)$ | $T_V$ is the $T$ count for the choice of fault-tolerant implementation of $V$.
RUS (non-axial) | Database lookup. | $2.\ldots\log_2(1/\epsilon) - \ldots$ | …
Table 8.1: Decomposition methods for arbitrary single-qubit unitaries using the gate set $\{H, S, T\}$.

Method | Description | T count | Comments
Phase kickback | Uses Fourier states and phase estimation. | $O(\log(1/\epsilon))$ (implementation dependent) | $O(\log(1/\epsilon))$ ancillas. Optimizations make it cost competitive with Selinger and KMM.
PAR | Cascading gate teleportation. | $O(\log(1/\epsilon))$ | Constant depth (on average), higher $T$ count than phase kickback.
Selinger | Round-off followed by exact decomposition. | $4\log_2(1/\epsilon) + 11$ | $T$ count is optimal for worst-case rotations.
KMM | Round-off followed by exact decomposition. | $3.21\log_2(1/\epsilon) - \ldots$ | $T$ count based on scaling for $R_Z(1/10)$.
RUS (axial) | Database lookup. | $1.26\log_2(1/\epsilon) - \ldots.53$ | Approximation to within $\epsilon = 10^{-\ldots}$.
Table 8.2: Decomposition methods for $Z$-axis rotations using the gate set $\{H, S, T\}$. Approximation of an arbitrary single-qubit unitary is possible by using the relation $U = R_Z(\theta_1) H R_Z(\theta_2) H R_Z(\theta_3)$.

The structure of a repeat-until-success (RUS) circuit over a gate set $G$ is as follows. First, some number $m$ of ancilla qubits are prepared in the state $|0^m\rangle$. Then, given an input state $|\psi\rangle$ on $n$ qubits, a unitary $W$ is applied to all of the $n + m$ qubits using gates from $G$. Finally, each ancilla qubit is measured in the computational basis. The output is given by $\Phi_i|\psi\rangle$, where $\Phi_i$ is a quantum channel (i.e., a unitary plus measurements) on $n$ qubits that depends on the measurement outcome $i \in \{0, 1\}^m$.

The measurement outcomes are partitioned into two sets: "success" and "failure". Success corresponds to some set of desired operations $\{\Phi_i : i \in \text{success}\}$; failure corresponds to some set of undesired operations $\{\Phi_i : i \in \text{failure}\}$. In the case of success, no further action is required. In the case of failure $i$, a recovery operation $\Phi_i^{-1}$ is applied, and the circuit is repeated.

We restrict to the case in which $|\psi\rangle$ is a single qubit and the $\{\Phi_i\}$ are unitary. We also limit to a single "success" output $U|\psi\rangle$, for some unitary $U$, though $U$ may correspond to multiple measurement outcomes. The operation $W$ is then a $2^{m+1} \times 2^{m+1}$ unitary matrix of the form
$$W = \frac{1}{\sqrt{\sum_i |\alpha_i|^2}} \begin{pmatrix} \alpha_0 U & \cdots \\ \alpha_1 R_1 & \cdots \\ \vdots & \\ \alpha_l R_l & \cdots \end{pmatrix}, \tag{8.4}$$
where $U, R_1, \ldots, R_l$ are $2 \times 2$ unitaries and $\alpha_0, \ldots, \alpha_l \in \mathbb{C}$ are scalars. Since the ancillas are prepared in $|0^m\rangle$, only the first two columns of $W$ are of consequence. The contents of the remaining columns are essentially unrestricted, except that $W$ must be unitary. Each of the $l + 1 = 2^m$ measurement outcomes corresponds to application of a unitary from $\{U\} \cup \{R_i\}$ on the data qubit. Without loss of generality, we have selected the all-zeros outcome to correspond with application of $U$, since outcomes can be freely permuted. The entire protocol is illustrated in Figure 8.2.

For simplicity, we assume that $U \ne R_i$ for all $1 \le i \le l$. The case in which $U$ appears multiple times can be easily accommodated. In order for the circuit to be useful, the remaining matrices $R_1, \ldots, R_l$ should be invertible at a low cost.

In order to be compatible with existing fault-tolerance schemes, we require that $W$ can be synthesized using the gate set $\{\mathrm{Clifford}\} \cup \{T\}$, where $\{\mathrm{Clifford}\}$ denotes the Clifford group generated by $\{H, S, \mathrm{CNOT}\}$. (Our method is also extensible to other gate sets; however, such extensions are not explored here.) A unitary matrix is exactly implementable by $\{\mathrm{Clifford}, T\}$ if and only if its entries are contained in the ring extension $\mathbb{Z}[i, 1/\sqrt{2}]$ [GS12]. Thus, we require that $\alpha_0 U, \alpha_1 R_1, \ldots, \alpha_l R_l$ are matrices over $\mathbb{Z}[i, 1/\sqrt{2}]$. Furthermore, the normalization $1/\sqrt{\sum_i |\alpha_i|^2}$ must also be in the ring. The unitarity condition on $W$ then requires that
$$\sum_i |\alpha_i|^2 = 2^k \tag{8.5}$$
for some integer $k$.

If all of the recovery operations $R_1, \ldots, R_l$ are exactly implementable by $\{\mathrm{Clifford}, T\}$, then we may assume that $\alpha_1, \ldots, \alpha_l \in \mathbb{Z}[i, 1/\sqrt{2}]$. If $|\alpha_0|^2$ is an integer, then Lagrange's four-square theorem implies that (8.5) can be satisfied using at most $m = 3$ ancilla qubits.

We pause briefly to note that any element of the ring extension $\mathbb{Z}[i, 1/\sqrt{2}]$ can be written as
$$\frac{a + ib + \sqrt{2}(c + id)}{\sqrt{2}^{\,k}} \in \mathbb{Z}[i, 1/\sqrt{2}], \tag{8.6}$$
for integers $a, b, c, d, k$. Below we will eliminate the denominator, in which case we may write
$$a + ib + \sqrt{2}(c + id) \in \mathbb{Z}[i, \sqrt{2}]. \tag{8.7}$$

Consider a $2 \times 2$ unitary $U$ such that
$$U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix} = \frac{1}{\sqrt{2}^{\,k}\,\alpha} \begin{pmatrix} \beta_{00} & \beta_{01} \\ \beta_{10} & \beta_{11} \end{pmatrix}, \tag{8.8}$$
for $\alpha \in \mathbb{R}$, $\beta_{00}, \ldots, \beta_{11} \in \mathbb{Z}[i, \sqrt{2}]$ and integer $k \ge 0$. We are concerned with exactly implementing $U$ only up to a global unit phase $e^{i\phi}$ for some $\phi \in [0, 2\pi)$. Accordingly, we may assume without loss of generality that $\alpha$ is real and non-negative, since multiplying any nonzero $\beta \in \mathbb{C}$ by the unit phase $\beta^*/|\beta|$ gives $\beta\beta^*/|\beta| = |\beta| \ge 0$. The restriction to $\mathbb{Z}[i, \sqrt{2}]$ rather than $\mathbb{Z}[i, 1/\sqrt{2}]$ is also without loss of generality, since $k$ can be chosen to eliminate any denominators. Then, choosing $\alpha_0 = \sqrt{2}^{\,k}\alpha$, we have
$$\alpha_0 = \sqrt{|\beta_{00}|^2 + |\beta_{10}|^2} = \sqrt{x + y\sqrt{2}}, \tag{8.9}$$
where, writing $\beta_{00} = a_1 + \sqrt{2}\,b_1 + i(c_1 + \sqrt{2}\,d_1)$ and $\beta_{10} = a_2 + \sqrt{2}\,b_2 + i(c_2 + \sqrt{2}\,d_2)$ for integers $a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2$, we have $x = a_1^2 + c_1^2 + a_2^2 + c_2^2 + 2(b_1^2 + d_1^2 + b_2^2 + d_2^2)$ and $y = 2(a_1 b_1 + c_1 d_1 + a_2 b_2 + c_2 d_2)$.

Any target unitary $U$ must have this form due to (8.4). In other words, the only unitaries that can be obtained by $\{\mathrm{Clifford}, T\}$ circuits of the form of Figure 8.2 are those that can be expressed by entries in $\mathbb{Z}[i, \sqrt{2}]$ after multiplying by a scalar. Nonetheless, this restricted class of unitaries can be used to approximate arbitrary unitaries more efficiently than unitaries limited to $\mathbb{Z}[i, 1/\sqrt{2}]$, as we show in Section 8.5 and Section 8.7.

In addition to their use in [BGS13], repeat-until-success circuits have been considered by Wiebe and Kliuchnikov for small-angle $Z$-axis rotations [WK13]. Whereas Wiebe and Kliuchnikov propose hierarchical RUS circuits over $\{\mathrm{Clifford}, T\}$, we do not a priori restrict to a hierarchical structure or to small $Z$-axis rotations. RUS circuits have been studied to a limited extent in other contexts as well. For example, repeated gate operations have been proposed for use in linear optics to implement a CZ gate [LBK04]. More recently, [SO13] adapted deterministic ancilla-driven methods [AOK+10, KOB+09] to allow for non-determinism.

8.3.2 Success probability and expected cost

The success probability, i.e., the probability of obtaining the zero outcome for all ancilla measurements, can be computed from (8.5) and is given by
$$\Pr[\text{success}] = \frac{|\alpha_0|^2}{2^k} \le \frac{|\alpha_0|^2}{2^{\lceil \log_2 |\alpha_0|^2 \rceil}}, \tag{8.10}$$
where, since $|\alpha_0|^2 < 2^k$, we may use $k \ge \lceil \log_2 |\alpha_0|^2 \rceil$. The circuits in Figure 8.1, for example, each yield a value of $\alpha_0 = \sqrt{5}$, so that the success probability is $5/8$. On the other hand, if $U$ appears multiple times in (8.4), then we have
$$\Pr[\text{success}] = \frac{m|\alpha_0|^2}{2^k} \le \frac{m|\alpha_0|^2}{2^{\lceil \log_2 (m|\alpha_0|^2) \rceil}}, \tag{8.11}$$
where $m$ is the number of times that $U$ appears. This upper bound can be made arbitrarily close to one for large enough $m$.

The expected number of repetitions required in order to achieve success is given by a geometric distribution with expectation value $1/p$ and variance $(1-p)/p^2$, where $p = \Pr[\text{success}]$. If $C(W)$ is the cost of implementing the unitary $W$, then the expected cost of the RUS circuit is given by $C(W)/p$, with a variance of $C(W)^2(1-p)/p^2$. Since the resources required to implement a $\{\mathrm{Clifford}, T\}$ fault-tolerant circuit are often dominated by the cost of implementing the $T$ gate, we will define $C(W)$ as the number of $T$ gates in the circuit used to implement $W$.

We choose to use $T$-gate count as the cost function because it is simple and is consistent with other $\{\mathrm{Clifford}, T\}$ decomposition algorithms [KMM12b, AMMR12, Sel12, KMM12c, WK13, GKMR13]. However, RUS circuits employ techniques that are not present in the circuits produced by previous decomposition methods. In particular, rapid classical feedback and control is required. Moreover, variable time scales for logical single-qubit gates imply the need for active synchronization. Thus, while $T$ count allows for direct comparison of RUS circuits with other methods, a more complete metric may be required for resource calculations on a particular architecture.

We may describe the action of the multi-qubit unitary $W$ by
$$W|0^m\rangle|\psi\rangle = \sqrt{p}\,|0^m\rangle U|\psi\rangle + \sqrt{1-p}\,|\Phi^\perp\rangle, \tag{8.12}$$
where $|\Phi^\perp\rangle$ is a state that depends on $|\psi\rangle$ and satisfies $(|0^m\rangle\langle 0^m| \otimes I)|\Phi^\perp\rangle = 0$. That is, $W$ outputs a state that has amplitude $\sqrt{p}$ on the "success" subspace and amplitude $\sqrt{1-p}$ on the "failure" subspace.
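The expected-cost formulas above can be checked with a short simulation. This is a minimal sketch, assuming that failures are undone by zero-cost Clifford recoveries (as for the circuits of Figure 8.1, where failure yields the identity); the function names are hypothetical. It uses the Figure 8.1a numbers: $|\alpha_0|^2 = 5$, $k = 3$, and eight $T$ gates per attempt.

```python
import math
import random


def success_probability(alpha0_sq, k):
    """Pr[success] = |alpha_0|^2 / 2^k, as in (8.10)."""
    return alpha0_sq / 2 ** k


def expected_cost(cost_w, p):
    """Mean T count: C(W) times the mean of a Geometric(p) number of attempts."""
    return cost_w / p


def cost_variance(cost_w, p):
    """Variance of the T count: C(W)^2 (1 - p) / p^2."""
    return cost_w ** 2 * (1 - p) / p ** 2


def simulate_cost(cost_w, p, trials, seed=0):
    """Monte Carlo estimate: repeat W until success, counting T gates used."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p:  # failure branch: free recovery, then retry
            attempts += 1
        total += cost_w * attempts
    return total / trials


# Figure 8.1a: |alpha_0|^2 = 5, k = 3, two Toffolis = 8 T gates per attempt.
p = success_probability(5, 3)
print(p, expected_cost(8, p), math.sqrt(cost_variance(8, p)), simulate_cost(8, p, 20000))
```

The Monte Carlo estimate should approach the analytic value $C(W)/p = 12.8$ as the number of trials grows; the large standard deviation reflects the geometric tail of repeated attempts.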
We show below that in some cases we may apply the amplitude amplification algorithm to boost the success probability and reduce the expected $T$ count of an RUS circuit.

Traditional amplitude amplification [BHMT00] proceeds by applying the operator $(RS)^j$ to the initial state $W|0^m\rangle|\psi\rangle$ for some integer $j > 0$, where
$$S = I - 2\,|0^m\rangle|\psi\rangle\langle 0^m|\langle\psi|, \qquad R = W S W^\dagger = I - 2\,W|0^m\rangle|\psi\rangle\langle 0^m|\langle\psi|W^\dagger. \tag{8.13}$$
In the two-dimensional subspace spanned by $\{|0^m\rangle U|\psi\rangle, |\Phi^\perp\rangle\}$, $RS$ acts as a rotation by $2\theta$, where $\sin(\theta) = \sqrt{p}$. Therefore $(RS)^j (W|0^m\rangle|\psi\rangle) = \sin((2j+1)\theta)\,|0^m\rangle U|\psi\rangle + \cos((2j+1)\theta)\,|\Phi^\perp\rangle$. The goal, then, is to choose $j$ appropriately so as to minimize the expected number of $T$ gates.

The problem in this case is that $|\psi\rangle$ is unknown, and therefore we cannot directly implement $S$. We can, however, implement $S' = \mathrm{CZ}(m) \otimes I$, the generalized controlled-$Z$ gate on $m$ qubits defined by $\mathrm{CZ}(m)|x_1, x_2, \ldots, x_m\rangle = (-1)^{x_1 x_2 \cdots x_m}|x_1, x_2, \ldots, x_m\rangle$. We could, therefore, apply $(W S' W^\dagger S')^j$ instead of $(RS)^j$.

Proposition 8.3.1.

Consider a unitary $W$ that satisfies (8.12). Amplitude amplification on $|0^m\rangle U|\psi\rangle$ can be performed using the operator $W S' W^\dagger S'$, where $S' = \mathrm{CZ}(m) \otimes I$. More precisely,
$$(W S' W^\dagger S')^j (W|0^m\rangle|\psi\rangle) = \sin((2j+1)\theta)\,|0^m\rangle U|\psi\rangle + \cos((2j+1)\theta)\,|\Phi^\perp\rangle, \tag{8.14}$$
where $\sin(\theta) = \sqrt{p}$.

Proof of this claim relies on the 2D Subspace Lemma of Childs and Kothari.

Lemma 8.3.2 ([CK13]). Let $W$ be a unitary that satisfies (8.12). Then the state $|\Psi^\perp\rangle := W^\dagger\left(\sqrt{1-p}\,|0^m\rangle U|\psi\rangle - \sqrt{p}\,|\Phi^\perp\rangle\right)$ satisfies $(|0^m\rangle\langle 0^m| \otimes I)|\Psi^\perp\rangle = 0$.

Proof of Proposition 8.3.1. First, note that both $R$ and $S$ preserve the two-dimensional subspace spanned by $|0^m\rangle U|\psi\rangle$ and $|\Phi^\perp\rangle$. That is, the state that results from applying any sequence of $R$ and $S$ to $W|0^m\rangle|\psi\rangle$ can be written as a linear combination of $|0^m\rangle U|\psi\rangle$ and $|\Phi^\perp\rangle$. Next, observe that $S'$ also preserves this subspace and is equivalent to $S$, since $S'|0^m\rangle U|\psi\rangle = -|0^m\rangle U|\psi\rangle$ and $S'|\Phi^\perp\rangle = |\Phi^\perp\rangle$.

The claim, then, is that the reflection $W S' W^\dagger$ about the state $W|0^m\rangle|\psi\rangle$ also preserves the subspace and is equivalent to $R$. Clearly, $(W S' W^\dagger)\, W|0^m\rangle|\psi\rangle = -W|0^m\rangle|\psi\rangle$. On the other hand, the action of $W S' W^\dagger$ on the state $W|\Psi^\perp\rangle$, which is orthogonal to $W|0^m\rangle|\psi\rangle$ (in the 2D subspace), is less obvious and requires Lemma 8.3.2, which implies that $(W S' W^\dagger)\, W|\Psi^\perp\rangle = W|\Psi^\perp\rangle$, as desired. We therefore conclude that $(W S' W^\dagger S')^j$ is equivalent to "real" amplitude amplification on $W|0^m\rangle|\psi\rangle$ and, in particular, that $(W S' W^\dagger S')^j W|0^m\rangle|\psi\rangle = \sin((2j+1)\theta)\,|0^m\rangle U|\psi\rangle + \cos((2j+1)\theta)\,|\Phi^\perp\rangle$.

If $m \le 2$, then $S'$ can be implemented with only Clifford gates, i.e., $Z$ or CZ. Then, for a fixed value of $j$, the total number of $T$ gates in the corresponding amplified circuit is given by $(2j+1)T$, where $T$ is the number of $T$ gates in the unamplified circuit. In order for amplitude amplification to yield an improvement in the expected number of $T$ gates, we therefore require that
$$(2j+1)\sin^2(\theta) < \sin^2((2j+1)\theta), \tag{8.15}$$
a condition that holds if and only if $0 \le p < \ldots/3$. Thus a sensible course of action is to apply amplitude amplification for all RUS circuits for which $p < \ldots/3$, and to leave higher-probability circuits unchanged.

Consider, for example, an RUS circuit that contains 15 $T$ gates and has a success probability of 0.1. In this case, using amplitude amplification with $j = 1$ yields a new circuit with success probability 0.676 and 45 $T$ gates, an improvement in the expected number of $T$ gates by a factor of $(15/0.1)/(45/0.676) \approx 2.25$. The effects of amplitude amplification on our database of RUS circuits are discussed in Section 8.5.

Cost analysis of amplitude amplification for circuits with more than two ancilla qubits is more complicated, because the reflection operator $S' = \mathrm{CZ}(m)$ is not a Clifford gate. For three ancilla qubits, for example, $S'$ is the controlled-controlled-$Z$ gate, which can be implemented with four $T$ gates [Jon13d]. Larger versions of $\mathrm{CZ}(m)$ could be synthesized directly [Kli13, WGMAG13] or by using a recursive procedure [NC00]. The circuits presented in Section 8.5 use at most two ancilla qubits, however, so more complicated amplification circuits are not an issue in our analysis.

8.4 Direct search methods

Equations (8.4) and (8.9) restrict the kinds of unitaries that can be obtained from RUS circuits. However, these conditions say little about how to implement the unitary $W$. Given $W$ explicitly, it is possible to synthesize a corresponding $\{\mathrm{Clifford}, T\}$ circuit with a minimum number of $T$ gates [GKMR13], at least for small $W$. However, given a unitary $U$ of the form (8.8), there are potentially many choices of $W$. The minimum number of $T$ gates required is therefore unclear and is a direction for future research.

In order to better understand the scope and power of RUS circuits, we designed an optimized direct search algorithm that checks for RUS circuits up to a given $T$-gate count. Our direct search algorithm is as follows:
1. Select the number of ancilla qubits and the number of gates.
2. Construct a $\{\mathrm{Clifford}, T\}$ circuit and compute the resulting unitary matrix $W$.
3. Partition the first two columns of $W$ into $2 \times 2$ … $R_i$ of the circuits found by our search to the set of single-qubit Cliffords.
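This choice is motivated by our use of the $T$ count as a cost function; Clifford gates, and therefore the recovery operations, are assigned a cost of zero.

In order to identify relevant search parameters, we initially performed a random search over a wide range of circuit widths (number of qubits) and sizes (number of gates). Our search was most successful with small numbers of ancilla qubits, large numbers of $T$ gates, and just one or two entangling gates. We therefore focused on circuits of the form shown in Figure 8.3. These circuits contain just a single ancilla qubit and two CZ gates, interleaved with single-qubit gates.

Naively, the number of circuits of the form of Figure 8.3 is $O(3^n)$, where $n$ is the maximum number of (non-CZ) gates in the circuit, and the base of three is the size of the set $\{H, S, T\}$. In order to reduce the complexity of our search, we constructed each of the single-qubit gate sequences using the canonical form proposed by [BS12]. A canonical-form sequence is the product of three $2 \times 2$ unitaries $g_1 C g_2$, where $g_1, g_2$ belong to the Clifford group and $C$ is the product of some number of "syllables" $TH$ and $SHTH$. The canonical form yields a unique representation of all single-qubit circuits over $\{H, T\}$; there are $2^{t-\ldots} + 4$ canonical circuits of $T$ count at most $t$. This yields more than a quadratic improvement compared to the naive search, since the number of $T$ gates is roughly one-half the total number of gates.

[Figure 8.3 circuit: an ancilla in $|0\rangle$ and the data qubit $|\psi\rangle$, each acted on by alternating $g\,C\,g$ blocks and coupled by two CZ gates.]
Figure 8.3: This circuit illustrates the general form of most of the circuits in our database. Each of the gates labeled $g$ represents an element of the single-qubit Clifford group. Each of the gates labeled $C$ represents a single-qubit canonical circuit as defined in [BS12].

[Figure 8.4 circuit fragments (a) to (d): canonical blocks $C$ adjacent to restricted Clifford sets such as $\{I, X\}$, $\{I, SH, HSH\}$ and $\{H, HS, HSH\}$.]
Figure 8.4: Some of the $g$ gates in Figure 8.3 can be restricted to a subset of the single-qubit Clifford group. (a) Circuits that begin with diagonal gates can be eliminated, since they add a trivial phase to $|0\rangle$. (b) Similarly, diagonal gates have no impact on the $Z$-basis measurement. (c) Pauli gates and $S$ gates can be commuted through the CZ and absorbed into either $|\psi\rangle$ or the preceding $g$ gate. (d) Analogously, Pauli and $S$ gates occurring before the CZ can be absorbed by the trailing $g$ gate or by the output.

In general, the canonical form requires conjugation by the full single-qubit Clifford group, which contains 24 elements. Given a product of syllables $C$, each of the $24^2 = 576$ circuits $g_1 C g_2$ is unique. However, when multiple canonical-form circuits are placed in a larger circuit, as in Figure 8.3, some combinations of Clifford gates can be eliminated. For example, in $g_1 C g_2|0\rangle$, $g_2$ need only be an element of $\{I, X, SH, SHX, HSH, HSHX\}$, since diagonal gates act trivially on $|0\rangle$. Similar simplifications for Figure 8.3 are shown in Figure 8.4. In total, these Clifford simplifications reduce the search space by a factor of more than $10^{\ldots}$.

Despite these simplifications, the search time is still exponential in the number of $T$ gates. To save time, we partitioned the search into thousands of small pieces running in parallel on a large cluster and collected the results in a central database.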

I, SH, HSH } C . . . (a) . . . C { H, HS, HSH } (b) . . . g • { I, SH, HSH } C . . . | ψ i • { I, SH, HSH } C . . . (c) . . . C { H, HS, HSH } • g . . .. . . C { H, HS, HSH } • U | ψ i (d) Figure 8.4: Some of the g gates in Figure 8.3 can be restricted to a subset of the single-qubitCliﬀord group. (a) Circuits that begin with diagonal gates can be eliminated since theyadd a trivial phase to | (cid:105) . (b) Similarly, diagonal gates have no impact on the Z -basismeasurement. (c) Pauli gates and S gates can be commuted through the CZ and absorbedinto either | ψ (cid:105) or the preceding g gate. (d) Analogously, Pauli and S gates occurring beforethe CZ can be absorbed by the trailing g gate or by the output.yields a unique representation of all single-qubit circuits over { H, T } ; there are 2 t − + 4canonical circuits of T -count at most t . This yields more than a quadratic improvementcompared to the naive search, since the number of T gates is roughly one-half the totalnumber of gates.In general, the canonical form requires conjugation by the full single-qubit Cliﬀordgroup, which contains 24 elements. Given a product of syllables C , each of the 24 = 576circuits g Cg are unique. However, when multiple canonical form circuits are placed ina larger circuit, as in Figure 8.3, some combinations of Cliﬀord gates can be eliminated.For example, in g Cg | (cid:105) , g need only be an element of { I, X, SH, SHX, HSH, HSHX } since diagonal gates act trivially on | (cid:105) . Similar simpliﬁcations for Figure 8.3 are shownin Figure 8.4. In total, these Cliﬀord simpliﬁcations reduce the search space by a factor ofmore than 10 . 141espite these simpliﬁcations, the search time is still exponential in the number of T gates. To save time, we partitioned the search into thousands of small pieces running inparallel on a large cluster and collected the results in a central database. 
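The Clifford counts used above (24 group elements, 576 pairs, and only six inequivalent choices for the Clifford acting directly on $|0\rangle$) can be checked numerically. The following is a minimal sketch, not the search code used for this thesis: it generates the single-qubit Clifford group from H and S modulo global phase, normalizing by $u^*/|u|$ for the first non-zero entry u (the same phase convention used later for equivalence-class representatives), then counts the diagonal elements and the distinct states $g|0\rangle$. The helper names `phase_key`, `matrix_key` and `clifford_group` are our own.

```python
import math
from collections import deque

R = 1 / math.sqrt(2)
H = ((R, R), (R, -R))
S = ((1, 0), (0, 1j))
IDENTITY = ((1, 0), (0, 1))


def mul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def phase_key(entries, digits=9):
    """Hashable key for a list of entries, modulo a global phase.

    Multiplies by conj(u)/|u|, where u is the first non-zero entry,
    then rounds real and imaginary parts so equal matrices collide.
    """
    u = next(x for x in entries if abs(x) > 1e-9)
    f = u.conjugate() / abs(u)
    return tuple((round((x * f).real, digits), round((x * f).imag, digits))
                 for x in entries)


def matrix_key(m):
    # Row-major order, so u is the first non-zero entry of the first row.
    return phase_key([x for row in m for x in row])


def clifford_group():
    """Close {H, S} under multiplication, modulo global phase (breadth-first)."""
    seen = {matrix_key(IDENTITY): IDENTITY}
    queue = deque([IDENTITY])
    while queue:
        g = queue.popleft()
        for gen in (H, S):
            h = mul(gen, g)
            key = matrix_key(h)
            if key not in seen:
                seen[key] = h
                queue.append(h)
    return list(seen.values())


cliffords = clifford_group()
# g|0> is proportional to |0> exactly when the (1, 0) entry vanishes, i.e. g is diagonal.
diagonal = [g for g in cliffords if abs(g[1][0]) < 1e-9]
# Distinct states g|0> up to phase: phase-normalized first columns.
states = {phase_key([g[0][0], g[1][0]]) for g in cliffords}

print(len(cliffords), len(cliffords) ** 2, len(diagonal), len(states))  # 24 576 4 6
```

The 24 Cliffords split into six classes that differ only by a right-multiplied diagonal factor, which is why six representatives suffice for the Clifford applied directly to $|0\rangle$.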
We were able to exhaustively search circuits of the form of Figure 8.3 up to a total (raw) T count of 15. The search took roughly one week running on hundreds of cores. The results of this search are presented in the next section.

Our search yielded many circuits that implement the same unitary U, but with different T-gate counts and success probabilities. To eliminate redundancy we maintained, for a given U, a database containing only the circuit with the minimum expected T count. The result is a database containing 2194 RUS circuits. Upon success, each circuit exactly implements a unique non-Clifford single-qubit unitary U, and otherwise implements a single-qubit Clifford operation. Database statistics are shown in Figure 8.5. For circuits with success probability less than 1/3, we used amplitude amplification to improve performance (see Section 8.3.3). Figure 8.5b illustrates the impact of amplitude amplification on the expected T count. Amplification improved the performance of circuits with relatively high expected T count, but did not improve circuits with expected T count of 30 or less. Note that the database also includes some circuits that were found by preliminary searches and are not of the form of Figure 8.3.

The database contains 1659 axial rotations, i.e., unitaries which, modulo conjugation by Cliffords, are rotations about the Z-axis of the Bloch sphere, and 535 non-axial rotations. The number of axial rotations is noteworthy since, modulo Clifford conjugation, only one non-trivial single-qubit rotation can be exactly synthesized with {Clifford, T} and without measurement, namely T [KMM12b]. Our results show that many axial rotations can be implemented exactly (conditioned on success) when measurement is allowed.

At the same time, the non-axial rotations in our database offer an expected T count that is dramatically better than the T count obtained by approximation algorithms [Sel12, KMM12c]. For each circuit in the database we computed the number of T gates required to approximate the corresponding unitary to within a distance of $10^{-[\ldots]}$ using the algorithm of KMM. Figure 8.6 shows the ratio of the T count given by KMM to the expected T count for the RUS circuit. Our results show a typical improvement of about a factor of three for axial rotations and of about a factor of 12 for non-axial rotations. The larger improvement for non-axial rotations is expected, since the KMM algorithm requires the unitary to be first decomposed into a sequence of three axial rotations.
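The ranking by expected T count, and the effect of amplitude amplification on success probability, can be illustrated with a short sketch. It assumes that every attempt of an RUS circuit costs the same number of T gates (ignoring any distinction between offline and online T gates) and uses the textbook amplitude-amplification formula; the function names are hypothetical. Amplification also raises the per-attempt cost, since one round applies the circuit or its inverse three times plus two reflections, so both effects have to be combined before deciding whether amplification pays off.

```python
import math


def expected_t_count(t_per_attempt, p_success):
    """Expected T count of a repeat-until-success circuit.

    Assumes every attempt costs t_per_attempt T gates and succeeds
    independently with probability p_success, so the number of attempts
    is geometric with mean 1 / p_success.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return t_per_attempt / p_success


def amplified_probability(p_success, rounds=1):
    """Success probability after `rounds` rounds of amplitude amplification.

    If p = sin^2(theta), then after k rounds the success probability
    is sin^2((2k + 1) * theta).
    """
    theta = math.asin(math.sqrt(p_success))
    return math.sin((2 * rounds + 1) * theta) ** 2


# Illustrative numbers: 4 T gates per attempt, success probability 7/8.
print(expected_t_count(4, 7 / 8))   # about 4.57
print(amplified_probability(0.25))  # about 1.0
```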

[Figure 8.5: Statistics for the database of repeat-until-success circuits, including all circuits of the form of Figure 8.3 up to a T count of 15. (a) The total number of circuits grouped by (raw) T-gate count and success probability. (b) The total number of circuits grouped by expected T count, both before and after amplitude amplification (x-axis: expected T count; y-axis: number of circuits). The two histograms are overlaid, and the darker hatched bars indicate circuits that are unaffected by amplification. Only circuits with an expected T count of at most 100 are shown.]

[Figure 8.6: Contents of the RUS database, split into axial and non-axial single-qubit rotations. For each circuit in the database, the number of T gates required to approximate the corresponding "success" unitary U to within $10^{-[\ldots]}$ was calculated using the algorithm of [KMM12c]. The x-axis represents the ratio of the KMM T count to the expected number of T gates for the RUS circuit; the y-axis is the number of circuits.]

As an example, the RUS circuit shown in Figure 8.7 implements the non-axial single-qubit rotation $U = (2X + \sqrt{2}Y + Z)/\sqrt{7}$ using four T gates and a probability of success of 7/8. By contrast, approximating U to within $\epsilon = 10^{-[\ldots]}$ using the KMM algorithm requires a total of 182 T gates. Thus Figure 8.7 not only implements the intended unitary exactly, but does so at roughly one-fortieth of the cost of the best approximation methods.

Our database is too large to offer an analysis of each circuit in detail. Instead, we present some additional noteworthy examples. The smallest circuit in our database contains two T gates and is shown in Figure 8.8. Upon measuring zero, which occurs with probability 3/4, it implements $(I + i\sqrt{2}X)/\sqrt{3}$; upon measuring one, it implements I. This circuit is notable in that its existence was predicted by Gosset and Nagaj in [GN13]. They required a {Clifford, T} circuit that exactly implemented $R = (\sqrt{[\ldots]}\,I - iY)/\sqrt{[\ldots]}$ […] R up to conjugation by Clifford gates.

[Figure 8.7: An RUS circuit that implements the unitary $U = (2X + \sqrt{2}Y + Z)/\sqrt{7}$ with probability 7/8, and otherwise implements Z. Approximation of U without ancillas requires 182 T gates (roughly 40 times more) for $\epsilon = 10^{-[\ldots]}$.]

[Figure 8.8: The smallest circuit in our database. Upon measuring zero, which occurs with probability 3/4, it implements $(I + i\sqrt{2}X)/\sqrt{3}$ on $|\psi\rangle$. Upon measuring one, it implements the identity.]

As discussed in Section 8.2, our database contains a circuit that implements V. In addition to the circuit shown in Figure 8.1c, our search also found a circuit that implements V with the same number of T gates (four), but just a single ancilla qubit, as shown in Figure 8.9. The expected T count of the single-ancilla circuit is worse than that of Figure 8.1c, though, since all four of the T gates on the ancilla must be performed "online".

[Figure 8.9: Like the circuits in Figure 8.1, this circuit implements V with probability 5/6, but with only one ancilla qubit and one measurement.]

The V gate is one of a family of V-basis gates for which the normalization factor is $1/\sqrt{5}$. In addition to single-qubit unitary decomposition based on V, [BGS13] also offers the possibility of decomposing single-qubit unitaries using V-basis gates with normalization factors $1/\sqrt{p}$, where p is a prime. These "higher-order" V gates cover SU(2) more rapidly than V and therefore offer potentially more efficient decomposition algorithms. A number of such V-basis gates can be found in our database, including axial versions for $p \in \{13, 17, 29\}$, as shown in Figure 8.10. The prospect of decomposition algorithms with these circuits is discussed in Section 8.6.

[Figure 8.10: RUS circuits for V-basis gates with prime normalization factors (a) p = 13, (b) p = 17 and (c) p = 29. Upon success the circuits implement (a) $(2Z + 3iI)/\sqrt{13}$ with success probability 13/[…], (b) $(4I + iZ)/\sqrt{17}$ with success probability ≈ […], and (c) $(5I + 2iZ)/\sqrt{29}$ with success probability ≈ […]. Each circuit implements the identity upon failure.]

The V circuit in Figure 8.1c can be used directly in the decomposition algorithm of [BGS13]. The BGS direct search algorithm can produce an ε-approximation of a randomly chosen single-qubit unitary with a number of V gates given by $3\log_5(1/\epsilon)$ in most cases. Multiplying by an expected T cost of 5.26 using Figure 8.1c yields an algorithm with an expected T count of
$$15.78\log_5(1/\epsilon). \qquad (8.16)$$
This is an improvement over the estimated T count of $3(3.21\log_2(3/\epsilon) - 6.93)$ due to [KMM12c] for all $\epsilon < 0.25$. This scaling is worse than Fowler's optimal exponential-time search by roughly a factor of two. However, the exponential nature of Fowler's method means that it can provide approximations in reasonable time only up to roughly $\epsilon = 10^{-[\ldots]}$. The BGS direct search can provide approximations to within $\epsilon = 10^{-[\ldots]}$. Thus V decomposition appears to be the best option when relatively high precision is required.

The database also contains some V-basis gates with prime normalization factors larger than 5. In [BGS13], the authors conjecture that the decomposition algorithm for p = 5 extends to other primes with a T-count scaling of $4\log_p(1/\epsilon)$. However, whereas p = 5 requires only the single V gate, higher prime values require implementation of multiple V gates. Suppose that each such V gate can be implemented with some number $T_p$ of T gates. Then the decomposition yielded for prime p will be better than that obtained with V if
$$1 < \frac{5.26}{T_p}\log_5(p). \qquad (8.17)$$
Unfortunately, our database contains only a single V-basis gate for each of $p \in \{13, 17, 29\}$. Still, we calculate (8.17) under the optimistic assumption that the remaining V gates can somehow be implemented at the same cost. Using the circuits in Figure 8.10 we obtain
$$\frac{5.26}{[\ldots].38}\log_5 13 \approx [\ldots], \qquad (8.18a)$$
$$\frac{5.26}{[\ldots].17}\log_5 17 \approx [\ldots], \qquad (8.18b)$$
$$\frac{5.26}{[\ldots].22}\log_5 29 \approx [\ldots]. \qquad (8.18c)$$
Based on these calculations we conclude that, while improved decomposition may be possible using p = 13, higher values of p are unlikely to yield cost benefits on their own.

On the other hand, given implementations of multiple V gates, there is no reason to limit ourselves to a single value of p. One could imagine an algorithm that combined multiple classes of V gates, using mostly V and using the more expensive higher-order V gates selectively. We do not consider such an algorithm directly. In the next section, however, we study the effect of optimally combining all of the RUS circuits in our database, not just V gates.

It is possible to approximate an arbitrary single-qubit unitary to any desired accuracy using just Clifford gates and the circuits in our database. But actually finding the optimal sequence among all possible combinations of circuits is a challenging task. Ideally, we could construct an efficient decomposition algorithm based on an algebraic characterization of the set of available circuits, similar to algorithms for more limited gate sets [Sel12, KMM12c, BGS13]. But the current theoretical characterization of RUS circuits is limited and is a direction for future work. Instead, we elect to expand the database by explicitly constructing all possible sequences of circuits.

Construction of the expanded database is similar in nature to the constructions of [Fow11] and [BS12]. Starting with the set of circuits found by our direct search algorithm, we compute all products of pairs of circuits, keeping those that produce a unitary which is not yet in the database. Triples of circuits can then be constructed from singles and pairs, and so on. Composite circuits of arbitrary size can be constructed in this way. Call a circuit a class-k circuit if it is composed of a k-tuple of circuits from the original database.
Then the number $N_k$ of class-k circuits is bounded by
$$N_k \le N \cdot N_{k-1} \le N^k, \qquad (8.19)$$
where N is the number of circuits in the original database.

To make database expansion more manageable, we keep only those circuits that yield an expected T count of at most some fixed value T. This has the simultaneous effect of discarding poorly performing circuits and reducing the value of $N_k$, so that construction of class-(k+1) circuits is less computationally expensive. Furthermore, circuits can be partitioned into equivalence classes by Clifford conjugation. The unitaries of the initial set of circuits are of the form $g_1 U g_2$, where U is the unitary obtained from the RUS circuit, and $g_1, g_2$ are Cliffords. Thus, the product of k such circuits has the form
$$g_0 U_1 g_1 U_2 g_2 \cdots U_k g_k. \qquad (8.20)$$
The set of class-(k+1) circuits can then be constructed by using
$$g_0 U_1 g_1 U_2 g_2 \cdots U_k g_k (g_k' U_{k+1} g_{k+1}) = g_0 U_1 g_1 U_2 g_2 \cdots U_k g_k'' U_{k+1} g_{k+1}, \qquad (8.21)$$
so that the Clifford $g_k$ is unnecessary. Furthermore, $g_0$ can always be prepended later, and so we instead express each class-k unitary as
$$U_1 g_1 U_2 g_2 \cdots U_k. \qquad (8.22)$$
To find an equivalence class representative of U, we first adjust the global phase by multiplying by $u^*/|u|$, where u is the first non-zero entry in the first row of U. Next, we conjugate U by all possible pairs of single-qubit Cliffords. The first element of a lexicographical sort then yields the representative $g_1 U g_2$ for some Cliffords $g_1, g_2$.

Once the database has been constructed, the decomposition algorithm is straightforward. Given a single-qubit unitary U and $\epsilon \in [0, 1]$, we select, from among all database entries V such that $D(U, V) \le \epsilon$, where
$$D(U, V) = \sqrt{1 - \frac{|\mathrm{Tr}(U^\dagger V)|}{2}}, \qquad (8.23)$$
the one with the smallest expected T count.

8.7.1 Decomposition with axial rotations

An arbitrary single-qubit unitary can be decomposed into a sequence of three Z-axis rotations and two Hadamard gates [NC00]. Therefore, approximate decomposition of Z-axis rotations suffices to approximate any single-qubit unitary.
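The decomposition cited from [NC00] can be written as $U = e^{i\varphi} R_Z(\alpha)\,H R_Z(\beta) H\,R_Z(\gamma)$, using $H R_Z(\beta) H = R_X(\beta)$. The sketch below recovers the three angles of a given unitary and checks the result with the distance (8.23). It assumes the convention $R_Z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$; the angle formulas come from matching the first row of $U/\sqrt{\det U}$ against $R_Z(\alpha) R_X(\beta) R_Z(\gamma)$, and the function names are our own.

```python
import cmath
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]


def rz(theta):
    """R_Z(theta) = diag(exp(-i theta/2), exp(i theta/2))."""
    return [[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def distance(u, v):
    """D(U, V) = sqrt(1 - |Tr(U^dagger V)| / 2), as in (8.23)."""
    m = matmul(dagger(u), v)
    return math.sqrt(max(0.0, 1.0 - abs(m[0][0] + m[1][1]) / 2))


def zxz_angles(u):
    """(alpha, beta, gamma) with U = e^{i phi} R_Z(alpha) H R_Z(beta) H R_Z(gamma)."""
    s = cmath.sqrt(u[0][0] * u[1][1] - u[0][1] * u[1][0])
    a, b = u[0][0] / s, u[0][1] / s              # U/s = [[a, b], [-b*, a*]], det 1
    beta = 2 * math.acos(min(1.0, abs(a)))       # |a| = cos(beta / 2)
    plus = -2 * cmath.phase(a)                   # alpha + gamma
    minus = -2 * (cmath.phase(b) + math.pi / 2)  # alpha - gamma
    return (plus + minus) / 2, beta, (plus - minus) / 2


def zhzhz(alpha, beta, gamma):
    """R_Z(alpha) H R_Z(beta) H R_Z(gamma), i.e. R_Z(alpha) R_X(beta) R_Z(gamma)."""
    return matmul(rz(alpha), matmul(H, matmul(rz(beta), matmul(H, rz(gamma)))))


# Round trip on a unitary with an arbitrary global phase.
u = [[cmath.exp(0.4j) * x for x in row] for row in zhzhz(0.3, 1.1, -0.7)]
print(distance(u, zhzhz(*zxz_angles(u))))  # close to 0
```

Because the check uses (8.23), any global phase and the sign ambiguity of the square root are ignored, which matches how the database treats unitaries.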
If we limit ourselves to Z-axis (i.e., diagonal) rotations only, then a few additional simplifications are possible. In particular, each unitary can be represented by a single real number corresponding to the rotation angle in radians. The result of a sequence of such rotations is then given by the sum of the angles. Furthermore, up to conjugation by {X, S}, all Z-axis rotations can be represented by an angle in the range $[0, \pi/4]$. As a result, we can construct a database of Z-axis rotations that is much larger than a database of arbitrary (non-axial) unitaries.

Using the database expansion procedure described above, we were able to construct a database containing all combinations of RUS circuits with expected T count at most 30. The maximum distance (according to (8.23)) between any two neighboring rotations is less than $2.[\ldots] \times 10^{-[\ldots]}$, and can be improved to $2 \times 10^{-[\ldots]}$ by selectively filling the largest gaps. So the resulting database permits approximation of any Z-axis rotation to within $\epsilon = 10^{-[\ldots]}$.

To approximate a Z-axis rotation by an angle θ, we simply select all of the entries that are within the prescribed distance ε, and then choose the one with the smallest expected T count. This procedure is efficient since the database can be sorted according to rotation angle; the subset of entries that are within ε can then be identified by binary search.

In order to assess the performance of this method, we approximate, for various values of ε, a sample of 10 randomly generated angles in the range $[0, \pi/4]$. […] T count for each ε yields a scaling given by (8.1), with a slope roughly 2.[…] […] $R_Z(1/[\ldots])$ […] $1.14\log(1/\theta)$ for small angles θ. However, their RUS circuits are specially designed for small angles. For arbitrary angles they report an expected T count of about
$$1.14\log(10^{\gamma}) + 8\log(10^{-\gamma}/\epsilon), \qquad (8.24)$$
where $\theta = a \times 10^{-\gamma}$ for some $a \in (0, 1)$ and integer $\gamma > 0$. Using (8.24) to calculate costs for the same 10 random angles as above, we obtain a fit function of
$$6\log(1/\epsilon) - [\ldots]. \qquad (8.25)$$
Formula (8.25) indicates that the efficiency of the circuits in [WK13] does not extend to coarse angles.

[Figure 8.11: The expected number of T gates required to approximate a single-qubit Z-axis rotation to within a distance ε, plotted against log(1/ε) for RUS, KMM and Selinger. The plot was constructed by selecting 10 real numbers in the range $[0, \pi/4]$ uniformly at random. For each value θ, the RUS circuit with the smallest expected T count within ε of the unitary $R_Z(\theta)$ was selected. The mean for each value of ε is plotted, yielding a fit curve of $1.26\log(1/\epsilon) - [\ldots].53$. The gray region is an estimate of the interval containing the actual number of T gates with probability 95%. The scaling of the Selinger and KMM algorithms is included for reference.]
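The Z-axis procedure described above (angles add under composition, expected T counts add, composites are pruned by a maximum expected T count, and lookup is a binary search over sorted angles) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the Mathematica implementation used for the thesis: the reduction to $[0, \pi/4]$ is taken to be θ mod π/2 followed by θ → π/2 − θ, both $\theta_a + \theta_b$ and $\theta_a - \theta_b$ are allowed when composing (because X conjugation maps $R_Z(\theta)$ to $R_Z(-\theta)$ at zero T cost), and the base circuits in the example are made up.

```python
import bisect
import math


def canonical(theta):
    """Assumed reduction of a Z-rotation angle to [0, pi/4].

    Shifting by pi/2 costs only a Clifford (R_Z(pi/2) is S up to phase),
    and conjugating by X maps theta to -theta.
    """
    t = theta % (math.pi / 2)
    return min(t, math.pi / 2 - t)


def z_distance(a, b):
    """(8.23) for R_Z(a), R_Z(b): sqrt(1 - |cos((a - b)/2)|), in a stable form."""
    q = (a - b) / 4
    return math.sqrt(2) * min(abs(math.sin(q)), abs(math.cos(q)))


def expand(base, t_max, max_class):
    """Compose (angle, expected T) circuits up to class max_class.

    Keeps composites with expected T <= t_max and, per canonical angle,
    only the cheapest one; only newly improved entries are extended.
    Returns a sorted list of (canonical angle, expected T).
    """
    best = {}

    def offer(angle, cost):
        key = round(canonical(angle), 12)
        if cost <= t_max and cost < best.get(key, math.inf):
            best[key] = cost
            return True
        return False

    layer = [(a, c) for a, c in base if offer(a, c)]
    for _ in range(max_class - 1):
        layer = [(a + s * b, ca + cb)
                 for a, ca in layer for b, cb in base for s in (1, -1)
                 if offer(a + s * b, ca + cb)]
    return sorted(best.items())


def lookup(db, theta, eps):
    """Cheapest (expected T, angle) within distance eps of R_Z(theta), or None."""
    target = canonical(theta)
    # D <= eps  <=>  |a - b| <= width, valid since canonical angles differ by <= pi/4.
    width = 4 * math.asin(min(1.0, eps / math.sqrt(2)))
    angles = [a for a, _ in db]  # would be precomputed once in practice
    lo = bisect.bisect_left(angles, target - width)
    hi = bisect.bisect_right(angles, target + width)
    hits = [(c, a) for a, c in db[lo:hi] if z_distance(a, target) <= eps]
    return min(hits, default=None)


# Hypothetical base set: the T gate (angle pi/4, cost 1) and a made-up RUS rotation.
db = expand([(math.pi / 4, 1.0), (0.3, 2.5)], t_max=10.0, max_class=4)
print(lookup(db, 0.3, 1e-3))
```

Sorting by canonical angle is what makes the lookup logarithmic in the database size; the same idea does not carry over directly to non-axial unitaries, which need three angles.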

[Table 8.3: Expected T counts for approximation of random Z-axis rotations with RUS circuits. Columns: log(1/ε) | Exp T (σ) | ±95% (σ). The middle column indicates the expected T count based on a sample of 10 random angles. The right-hand column indicates the expected 95 percent confidence interval of the T count for the best RUS circuit, given a random angle θ. The variance of each expected value is indicated in parentheses. Numerical entries not recoverable.]

Equation (8.1) also implies that RUS Z-axis rotations can be used to approximate arbitrary single-qubit unitaries with a scaling approaching that of optimal ancilla-free decomposition. Since an arbitrary unitary can be expressed as a product of three axial rotations, the expected T count for approximating an arbitrary single-qubit unitary is given by $3.[\ldots]\log(3/\epsilon) - [\ldots].37$. On the other hand, Fowler calculates an optimal T count of $2.95\log(1/\epsilon) + 3.75$ (on average) without using ancillas [Fow11].

Since our circuits are non-deterministic, we are also concerned with the probability distribution of the number of T gates. For each composite circuit in the database, we calculate the variance $\sigma^2$ of the T count based on the variance of each individual circuit. We may then obtain a confidence interval using Chebyshev's inequality
$$\Pr\big(|\mathrm{Actual}[T] - \mathrm{Exp}[T]| \ge k\sigma\big) \le \frac{1}{k^2}. \qquad (8.26)$$
Table 8.3 shows the mean of the expected T count for each ε. By also calculating the mean of the variance $\sigma^2$, we obtain an estimate of the corresponding 95% confidence interval, shown by the gray region in Figure 8.11. That is, for a randomly chosen angle θ, the total number of T gates required to implement $R_Z(\theta)$ is within the given interval around $1.26\log(1/\epsilon) - [\ldots].53$, with probability 0.95.

Construction of the database up to an expected T count of 30 took roughly 20 hours and 41 GB of memory using Mathematica. Table 8.4 shows the number of circuit combinations and corresponding rotation angle densities for increasing values of the expected T count. The size and density of the database increase by about an order of magnitude for every five T gates. We expect that with a more efficient implementation (in C/C++, for example) the worst-case approximation accuracy could be improved.

Using either the above database or the methods of KMM or Selinger, decomposition of an arbitrary unitary incurs an additional factor of three in cost, because each of the three Z-axis rotations is approximated separately. The increased cost is illustrated in Figure 8.6 by the larger ratios for non-axial unitaries. Indeed, Figure 8.6 suggests that incorporating both axial and non-axial RUS circuits could yield better approximations than using Z-axis rotations alone.

Fowler's method does not incur the additional factor of three for arbitrary unitaries, maintaining a scaling of $2.95\log(1/\epsilon) + 3.75$. But as noted before, RUS circuits offer a larger domain of exactly implementable unitaries than circuits without ancillas. Just as RUS circuits outperform ancilla-free Z-axis decomposition, they could outperform ancilla-free non-axial decomposition.

[Table 8.4: […] Z-axis rotation database according to the maximum expected number of T gates. Columns: Max. exp. T count | Size | Mean D | Max D. The mean and the maximum distances between nearest neighbors are given in columns three and four, respectively.]

On the other hand, construction of the database in the non-axial case is significantly more challenging than in the axial case. Unitaries must be represented by three rotation angles instead of one. Multiplication of circuit combinations is less efficient than for Z-axis rotations, which only require addition. Organizing the database for efficient lookup is also more complicated: Z-axis rotations can be sorted by rotation angle, but arbitrary unitaries require a more complicated data structure such as a k-d tree [DN05, Amy13].

Despite these limitations, there are some savings to be had. We may still express each unitary by its Clifford equivalence class representative (8.22). Conjugation by all 576 pairs of Cliffords is not required, however. First, note that any single-qubit Clifford can be written as a product $g_1 g_2$, where $g_1 \in G_1$, $g_2 \in G_2$, and
$$G_1 = \{I, Z, S, S^\dagger\}, \qquad G_2 = \{I, H, X, XH, HS, XHS, HSH, XHSH\}. \qquad (8.27)$$
Now, instead of conjugating by the entire Clifford group, we conjugate only by $G_2$. Then each resulting unitary can be decomposed into three rotations
$$g U g' = R_Z(\theta_1) R_X(\theta_2) R_Z(\theta_3), \qquad (8.28)$$
where $g \in G_2$ and $g' \in \{g^\dagger \mid g \in G_2\}$. The Cliffords in $G_1$ are diagonal and only modify $\theta_1$ and $\theta_3$. Up to conjugation by these remaining Cliffords, we then have
$$R_Z(\theta_1) R_X(\theta_2) R_Z(\theta_3) \equiv R_Z(\theta_1 \bmod \pi/2)\, R_X(\theta_2)\, R_Z(\theta_3 \bmod \pi/2). \qquad (8.29)$$
Choosing $0 \le \theta_1, \theta_3 < \pi/2$, we can find an equivalence class representative without actually conjugating by $G_1$, saving a factor of 576/64 = 9.

Even with this optimization, though, our Mathematica implementation is quite slow. We were able to construct a database of size 45526 consisting of all RUS circuits with expected T count at most 18. We then calculated the best circuit for each of 100 random single-qubit unitaries for a variety of $\epsilon \ge [\ldots] \times 10^{-[\ldots]}$. A fit curve for the data yields a scaling given by (8.3). Based on the slope, the savings is only about 18 percent over Fowler, but in absolute terms the savings is roughly a factor of two, at least for modest approximation accuracy. See Figure 8.12.

Given the relatively large ratios for non-axial unitaries in Figure 8.6, the scaling given by (8.3) is perhaps disappointing. We note, however, that our database contains only a limited subset of possible RUS circuits. Incorporating a larger set of circuits could improve performance.

The accuracy that the database decomposition methods can reach is limited by the size of the database. Our Z-axis rotation database is capable of approximations to within $10^{-[\ldots]}$. If the required accuracy is higher than that, then either the database must be expanded, or an algorithmic decomposition such as Selinger, KMM, or that of Section 8.6 must be used. However, a variety of important quantum algorithms require only relatively coarse accuracy. Fowler, for example, used numerical analysis to argue that Shor's algorithm requires rotation angles no smaller than $\theta = \pi/64 \approx 0.05$ with an approximation error of $\epsilon = \pi/256 \approx 0.012$ [FH04].

Another application of coarse angles is in quantum chemistry. Consider a Hamiltonian for a molecule expressed in second quantized form (which expresses the quantum system in terms of the number of particles in each possible state; the specifics are not important for the current discussion), where the objective is to determine the ground state energy of the molecule. Wecker et al. [WBCT13] have developed a technique to scale the coefficients of the non-commuting terms in the Hamiltonian to the maximum coefficient, while maintaining arbitrary accuracy on the estimate of the energy. This scaling allows one to use large angles within the phase estimation algorithm, where the angles require at most $10^{-[\ldots]}$ accuracy in practice. Similarly, Jones et al. show how to optimize quantum chemistry simulations by ignoring terms with small norm [JWM+…]. […] Z-axis rotations with approximation accuracies in the range $\epsilon = 10^{-[\ldots]}$.

[Figure 8.12: The expected number of T gates required to approximate an arbitrary single-qubit unitary to within a distance ε, plotted against log(1/ε) for RUS (U), RUS (Z), Fowler and BGS. Each point indicates the mean of 100 random unitaries approximated to the corresponding accuracy with our full database of RUS circuits. With 95 percent confidence, the solid black line has slope in the range [2.[…], […]]. […] Z-axis RUS database from Section 8.7.1. The solid red line indicates the scaling obtained by using the circuit in Figure 8.1c for V decomposition [BGS13]. This scaling is worse than the others, but is valid for $\epsilon > 10^{-[\ldots]}$. The estimated scaling due to Fowler [Fow11] is shown for reference.]
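The confidence interval from (8.26) can be computed as in the sketch below. It assumes that each RUS circuit costs the same number of T gates per attempt and that attempts are independent, so the number of attempts is geometric with mean $1/p$ and variance $(1-p)/p^2$; for a composite circuit, means and variances of the independent pieces add. Setting $1/k^2 = 0.05$ gives $k = \sqrt{20} \approx 4.47$ for a 95% interval. The function names and the example circuits are hypothetical.

```python
import math


def rus_t_moments(t_per_attempt, p_success):
    """Mean and variance of the T count of one RUS circuit.

    The number of attempts N is geometric: E[N] = 1/p, Var[N] = (1 - p)/p**2,
    and the T count is t_per_attempt * N.
    """
    p = p_success
    return t_per_attempt / p, t_per_attempt ** 2 * (1 - p) / p ** 2


def composite_moments(circuits):
    """Means and variances add for independently executed circuits."""
    moments = [rus_t_moments(t, p) for t, p in circuits]
    return sum(m for m, _ in moments), sum(v for _, v in moments)


def chebyshev_interval(mean, variance, confidence=0.95):
    """Interval holding the T count with probability >= confidence, by (8.26).

    Pr(|X - mean| >= k sigma) <= 1/k**2, so k = 1/sqrt(1 - confidence).
    The lower end is clamped at 0 because T counts are non-negative.
    """
    k = 1 / math.sqrt(1 - confidence)
    half = k * math.sqrt(variance)
    return max(0.0, mean - half), mean + half


# Hypothetical composite of three circuits given as (T per attempt, success probability).
mean, var = composite_moments([(4, 7 / 8), (2, 3 / 4), (6, 0.5)])
print(mean, chebyshev_interval(mean, var))
```

Chebyshev's bound is distribution-free, so the interval is conservative; the geometric tail would give a tighter interval if one were willing to use it.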
8.9 Possible generalizations and limitations

Traditional methods decompose single-qubit unitaries into deterministic sequences of gates. Wiebe and Kliuchnikov showed that by adding measurements and allowing non-deterministic circuits, decompositions with fewer T gates are possible (in expectation) for very small Z-axis rotations [WK13]. Our results extend that conclusion to arbitrary single-qubit unitaries. By constructing a database of repeat-until-success circuits and then progressively composing those circuits, we can approximate arbitrary single-qubit unitaries to within a distance of $10^{-[\ldots]}$, which is sufficient for many quantum algorithms. For a random Z-axis rotation, our database yields an approximation that requires as little as one-third as many T gates as [Sel12], [KMM12c] and [Fow11]. Using all of the circuits in our database (not just the Z-axis rotations), the improvement for arbitrary unitaries can be larger, though achieving high approximation accuracy is challenging.

Our results suggest a number of possible areas for improvement and further research. First, the circuits proposed by [WK13] use traditional decomposition algorithms (i.e., Selinger or KMM) to generate the unitaries required for the mantissa a of the angle $a \times 10^{-\gamma}$. Our RUS circuits could be used instead in order to improve performance. Indeed, one could consider a hybrid approach that combined all available decomposition methods in order to find the most efficient circuit. Second, circuits of the form shown in Figure 8.3 make up only a subset of possible RUS circuits. Expanding the search to include additional types of circuits could improve database density. Third, the formal theory of RUS circuits is not yet understood. A better understanding could lead to efficient decomposition algorithms based on RUS circuits and allow for approximation to much smaller values of ε.
A tight characterization of RUS circuits would seem to first require a better understanding of {Clifford, T} complexity for multi-qubit unitaries.

One could also consider some relaxations of the RUS circuit framework. We consider only single-qubit unitaries. However, multi-qubit unitaries or non-unitary channels may also be of interest. We also restrict to recovery operations that are Clifford operators. That restriction could be modified to allow for larger or alternative classes of operations. On the other hand, fault-tolerance schemes based on stabilizer codes often permit no-cost application of Pauli operators [Kni05]. Thus, it might be sensible to limit recovery operations to only tensor products of Paulis.

Finally, the non-deterministic nature of RUS circuits imposes some additional constraints on the overall architecture of the quantum computer. Many fault-tolerance schemes already use non-deterministic methods such as state distillation to implement certain gates. But most of the non-determinism occurs "offline", without impacting the computational data qubits. Since RUS circuits are "online", the time required to implement a given unitary cannot be determined in advance. Such asynchronicity could complicate placement and routing techniques (see Chapter 9) and classical control logic, thereby increasing resource overhead requirements. Thorough architecture-specific analysis will be required in order to concretely assess the improvements obtained by using RUS circuits.

Chapter 9
Global optimization of fault-tolerant quantum circuits

This chapter is based on material that appears in [PF13].

One issue that is generally ignored in fault-tolerant constructions, particularly for concatenated codes and including the one in Chapter 7, is that realistic proposals for quantum computers impose geometric constraints. Many proposed architectures involve a two-dimensional lattice of qubits for which interactions are limited to a small set of neighboring locations (see Section 4.6).
Ultimately, any practical fault-tolerance scheme must account for the particular geometry offered by the quantum computer.

In this chapter we propose two algorithms for efficient placement of fault-tolerant quantum circuits onto a two-dimensional rectangular lattice of qubits. Our algorithms operate within the context of the surface code and therefore automatically respect nearest-neighbor interaction constraints. Encoded computation in the surface code is represented by a three-dimensional object in spacetime called a braid. Our algorithms are based on the fact that the encoded quantum circuit is invariant under topological transformations of the braid. We may, therefore, smoothly deform the braid according to the dimensions of the quantum computer.

Informally, braid compaction is the problem of topologically deforming a braid so that it fits into a prescribed spacetime volume. This problem bears a striking resemblance to VLSI placement. In VLSI placement the goal is to arrange a set of logic elements (represented by rectangles) and wires into the smallest possible area subject to connectivity and distance constraints. In braid compaction, the task is to pack a set of gates, some of which are represented by boxes, into the smallest possible volume subject to distance and topology constraints. The VLSI placement problem is known to be NP-complete [SLW83]. We conjecture that braid compaction is NP-complete as well, though attempts at a formal reduction have been unsuccessful.

Correspondingly, our algorithms are constructed from carefully designed heuristics. The first algorithm is loosely based on physical principles of gravity and tension. The braid is treated as a physical object that is allowed to slide into a spacetime box under its own weight.
Gravity forces direct the braid toward the bottom of the box in order to minimize time, and tension forces keep the braid compact.

Our second algorithm uses the optimization technique of simulated annealing, and is based on a similar algorithm for VLSI placement [HLL88]. Each part of the braid is modeled as a cuboid (i.e., a box). Some cuboids have fixed dimensions and some are allowed to expand and contract. Size, distance and topology constraints are given by sets of linear inequalities on the coordinates of each cuboid. Depending on the shape of the braid, some constraints must be actively enforced, and others need not be enforced. The annealing step consists of swapping constraints in and out of the active set to change the shape and size of the braid.

The main goal of the optimization techniques in previous chapters has been to reduce the space requirements of fault-tolerant quantum computation. In many cases, these optimizations also lead to smaller time overhead. To this point, however, time optimization has been a secondary goal. Furthermore, these techniques focus on small but repeated parts of the circuit. They do not address, for example, global parallelism concerns.

In our current context, we are instead given a fixed two-dimensional lattice of qubits, and are asked to minimize the time overhead. If we can minimize the space requirements without increasing time requirements, then we should. But space that is available but otherwise unoccupied is wasted.

An important goal, therefore, is to parallelize quantum algorithms. However, many quantum algorithms are serial in nature, leaving large numbers of qubits idle much of the time. Low-gate-count arithmetic quantum circuits, for example, form a staircase structure of linear depth [CDKM04]. Parallelization of certain procedures, such as the quantum Fourier transform, is possible when extra qubits are available but is typically done on a case-by-case basis [CW00].

Some general techniques for parallelization exist. Typical methods involve local circuit rewriting rules for trading between sequences of gates and additional qubits [MN01, MDMN08, SWD10]. Small-depth circuits can be achieved for certain sub-classes of quantum circuits. Clifford group circuits, for example, can be parallelized to quantum circuits of constant depth followed by log-depth classical post-processing [RB01].

Others have proposed global circuit optimization procedures that involve a multi-staged transformation to and from the measurement-based quantum computing model [BK09, dSPK13]. Indeed, there are strong similarities between the measurement-based model and the surface code [RH07].
However, the template-based and measurement-based optimizations are not fault-tolerant and, except for [SWD10], do not explicitly consider geometric constraints imposed by the quantum computer. It is not clear that the resulting circuits remain compact under such restrictions.

By contrast, since our algorithms operate within the surface code, the output is automatically fault-tolerant and can be easily mapped to a wide variety of two-dimensional nearest-neighbor architectures [DiV09, GFG12]. Furthermore, the rules for topologically transforming surface code braids are conceptually simple. There is no need to break up the transformation into multiple stages. Thus, compared to other proposals, we feel that our approach is easier to understand, implement, and extend.

The optimization algorithms in Section 9.4 and Section 9.5 are based on fault-tolerant quantum circuits for the surface code. The surface code uses a fundamentally different approach to encoding logical quantum gates than we have previously seen for concatenated codes, and this encoding is key to our optimization approach. In this section, we give a brief pedagogical introduction to the surface code, with a focus on the mapping from a quantum circuit to a surface code braid. Other details of the surface code are not essential for understanding our compaction algorithms. For a comprehensive introduction to the surface code we refer the reader to [FMMC12].

The surface code has a number of desirable properties. First, it operates on a two-dimensional rectangular lattice of qubits. All operations can be performed using only one-qubit gates and two-qubit gates involving only nearest-neighbor qubits. As a result, the required number of qubits scales much more slowly for the surface code than for concatenated codes on 2-D nearest-neighbor architectures. At the same time, the surface code tolerates noisier physical gates than many other quantum error correcting codes. Reliable computation is possible so long as the noise rate is below roughly 0.…

The surface code is a CSS code that can be defined on a 2-D rectangular lattice graph of degree four. A qubit is placed on each edge of the graph. The X stabilizer generators correspond to weight-four operators around each vertex, i.e., each operator has support only on the qubits adjacent to the corresponding vertex. The Z stabilizer generators correspond to weight-four operators around each face of the lattice, i.e., each operator has support only on the qubits of the edges that define the face. Encoded qubits are created by disabling some of the generators, thereby adding new degrees of freedom to the code. We choose to define a qubit as a pair of defects. Defects are contiguous regions of the lattice for which the stabilizers are not measured.
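To make the vertex/face structure of the generators concrete, the following Python sketch builds the support of each X-type (vertex) and Z-type (face) generator and checks that every X-type generator commutes with every Z-type one; an X and a Z Pauli operator commute exactly when their supports overlap on an even number of qubits. It is a toy illustration under a simplifying assumption: the lattice is taken to be periodic (a torus) so that every generator has weight four, whereas the planar lattices used in this chapter have boundaries. The edge-indexing scheme and all function names are hypothetical.

```python
from itertools import product


def edge_index(L, r, c, horizontal):
    """Index of the qubit on an edge of an L x L periodic lattice (hypothetical layout).

    Horizontal edge (r, c) joins vertex (r, c) to (r, c + 1); vertical edge (r, c)
    joins vertex (r, c) to (r + 1, c). Coordinates wrap around (torus).
    """
    r, c = r % L, c % L
    return 2 * (r * L + c) + (0 if horizontal else 1)


def vertex_generator(L, r, c):
    """Support of the X-type generator on the four edges incident to vertex (r, c)."""
    return {
        edge_index(L, r, c, True),       # horizontal edge leaving to the right
        edge_index(L, r, c - 1, True),   # horizontal edge arriving from the left
        edge_index(L, r, c, False),      # vertical edge to the next row
        edge_index(L, r - 1, c, False),  # vertical edge from the previous row
    }


def face_generator(L, r, c):
    """Support of the Z-type generator on the four edges bounding the face whose
    corner closest to the origin is vertex (r, c)."""
    return {
        edge_index(L, r, c, True),
        edge_index(L, r + 1, c, True),
        edge_index(L, r, c, False),
        edge_index(L, r, c + 1, False),
    }


def generators_commute(L):
    """True if every vertex (X) generator overlaps every face (Z) generator on an
    even number of qubits, i.e. all X- and Z-type generators commute."""
    sites = list(product(range(L), repeat=2))
    return all(
        len(vertex_generator(L, *v) & face_generator(L, *f)) % 2 == 0
        for v in sites
        for f in sites
    )


if __name__ == "__main__":
    L = 4
    print("qubits:", 2 * L * L)  # one qubit per edge
    print("commute:", generators_commute(L))
```

A vertex and a face that touch share exactly two edges, and all other pairs share none, which is why the check succeeds.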
There are two types of defects, primal and dual. Primal defects correspond to operators around vertices of the lattice (Z stabilizer generators), and dual defects correspond to operators around the faces of the lattice (X stabilizer generators).

Error protection is achieved by creating defects of sufficient size, and by keeping defects well separated in space. For a code distance of d, we require that all defects have circumference d and that defects of the same type are separated in L∞ distance by d. For defects of opposite type, the minimum distance depends on the shape of each defect. In all cases a distance of d/… d), though in some cases primal and dual defects may be as close as d/….

Most encoded operations in the surface code proceed by moving defects around each other. Defect movement is achieved by turning off new regions of stabilizer measurements and then turning on other stabilizer measurements. The movement can be divided into time-slices. By stacking time-slices on top of each other, the encoded operations are represented by a three-dimensional object in space and time called a braid. See Figure 9.1. Transformation of a quantum circuit to a braid can be done systematically by constructing canonical braid elements for each quantum gate. Preparation of encoded |0⟩ is represented by a “U”-shaped primal defect. Encoded Z-basis measurement is essentially the reverse. A CNOT operation is performed by a loop of dual defects that wraps around the two associated encoded qubits. See Table 9.1.

Figure 9.1: (a) Top view. Encoded surface code qubits are defined by pairs of defects, either primal (red) or dual (blue). Each defect is composed of multiple physical qubits on the two-dimensional lattice. Operations are performed by moving defects around. Here, an encoded two-qubit operation is performed by moving one defect from the dual encoded qubit around one of the defects of the primal encoded qubit. (b) Side view. The same operation can be written as a space-time diagram in which one of the space axes has been flattened.

Braids consisting of these operations are invariant under topological deformation. That is, a quantum circuit can be represented by a canonical braid, and also by any braid that is topologically equivalent to that canonical braid. Strings of defects may be smoothly pulled or pushed around in space and time without altering the encoded quantum computation. See Figure 9.2. Note that space and time are symmetric here. Space can be traded for time and vice versa.

Not all encoded operations in the surface code can be performed topologically, however. The encoded Hadamard operation, for example, requires the encoded qubit, i.e., the two corresponding defects, to be placed on a separate lattice, isolated from all other encoded qubits. This is achieved by first “cutting out” part of the lattice around the encoded qubit and then later re-attaching it to the rest of the lattice [Fow12a]. The resulting space-time volume is a cuboid (i.e., a box) of dimension roughly 3d/… × d/… × d/2. However, the cuboid contains a variety of boundary types near the surface, thus imposing some restrictions on the configurations of other surrounding defects. The cuboid can be translated in any direction, or rotated about the time-axis by increments of π/2, but is otherwise treated as a rigid object. For concreteness, we adopt the convention that time corresponds to the z-axis.

Footnote: In principle, a sideways Hadamard gate is possible and would allow for rotations about the x and y axes. However, the chosen implementation requires the cuboid to be vertically oriented.

Table 9.1: The surface code gate set (top) and corresponding canonical braids (bottom). Each braid is a three-dimensional collection of defects. For visual clarity, the braids have been flattened here into two dimensions.

Figure 9.2: Surface code braids are invariant under topological deformation. The space-time diagram on the left (a) is topologically equivalent to the diagram on the right (b). Defect strings and loops may be smoothly stretched and contracted without altering the encoded operation.

We will also require one other non-topological operation, the encoded T-gate. This gate cannot be implemented directly in the surface code and is instead constructed by the state distillation protocol described in Section 4.3.2. Distillation does not explicitly require the encoded qubit to be cut out of the lattice, as the Hadamard does. However, both the distillation and gate teleportation involve measurements which are probabilistic. The required circuit changes depending on the measurement outcomes.

Likewise, the corresponding braid cannot be entirely determined ahead of time. It is possible, however, to shift all of the non-determinism either offline or into logical measurements, which can be performed very efficiently [Fow12c]. Figure 9.3b shows an alternative circuit that also implements T. In this circuit, an S gate, implemented with the help of a resource state |Y⟩ = (|0⟩ + i|1⟩)/√2, is selectively teleported into the circuit conditioned on the outcome of a Z-basis measurement. Given states |A⟩ and |Y⟩, the entire circuit is determined ahead of time except for the measurement bases for selective teleportation.

The circuit in Figure 9.3a is smaller than that of Figure 9.3b. The latter circuit, however, has the advantage that it can be composed in parallel with any number of additional T gate circuits. The braid corresponding to the single-qubit unitary THT, for example, can be parallelized as shown in Figure 9.4. The logical measurements in this braid are implemented differently than previously discussed. The cap on the defects has been flattened into a wider, but thinner, set of defects that looks like a tabletop. This allows for maximum parallelization of sequences of T gates.

The measurement regions of Figure 9.4 must obey a relative time ordering. In particular, the Z-basis measurement of the input qubit |ψ⟩ must be completed before the selective teleportation measurements can be performed. In addition, the selective teleportation of the previous T (if applicable) must be completed before the selective teleportation measurements of the current T gate can be performed. In this way, the measurement regions for sequences of T gates form a tree. Each measurement region must be located strictly later in time than each of its children.

There are a variety of options for preparing the |A⟩ and |Y⟩ states required by Figure 9.3b. The |A⟩ state, for example, can be prepared using the [[15,1,3]] … |A⟩ and |Y⟩ preparation as rigid cuboids, similar to the Hadamard gate. This gives us the freedom to define braid compaction algorithms without being coupled to a particular distillation procedure.

Figure 9.3: Two circuits that implement the T gate on input state |ψ⟩. (a) The resource state |A⟩ = (|0⟩ + e^{iπ/4}|1⟩)/√2 is constructed by injection and distillation. Conditioned on the measurement outcome, a corrective S rotation may be required, which requires a non-destructive use of an ancilla state |Y⟩ = (|0⟩ + i|1⟩)/√2, initially prepared by injection and distillation (not shown). (b) Instead of performing the conditional S gate directly, selective destination teleportation can be used [Fow12c]. On one path of the teleportation, the S gate is applied, and on the other path it is not. The Z-basis measurement on |ψ⟩ determines the bases in which the other four qubits are measured. The output is T|ψ⟩, up to Pauli corrections from teleportation.

The gates listed in Table 9.1 are universal for quantum computing. Thus any quantum circuit can be mapped to a surface code braid by first decomposing it into this gate set, and then sequentially constructing each of the canonical braid elements.

The canonical braid is a fault-tolerant representation of the original circuit, but there is no guarantee that it will fit onto the two-dimensional lattice of qubits that is available. Indeed, the structure of the canonical braid closely resembles that of the original circuit. It is essentially a long line of defects that extends out in time. Even if the braid fits, its two-dimensional shape means that most of the qubits in the quantum computer will be left unused.
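The tree-shaped ordering of measurement regions described above can be turned into a schedule by giving every region the earliest time layer that lies strictly after all regions it depends on. The Python sketch below does this with a memoized recursion. It is only an illustration: it treats each tabletop as occupying a single unit layer, and the region names (`z1`, `s1`, ...) and the function name are hypothetical.

```python
from functools import lru_cache


def earliest_layers(depends_on):
    """Map each measurement region to the earliest integer time layer that is
    strictly later than the layers of all regions it depends on (its children
    in the ordering tree). Regions with no dependencies go in layer 0.

    depends_on: dict mapping each region to the regions that must complete first.
    Every region must appear as a key.
    """
    @lru_cache(maxsize=None)
    def layer(region):
        deps = depends_on[region]
        return 0 if not deps else 1 + max(layer(d) for d in deps)

    return {region: layer(region) for region in depends_on}


# THT: z_i is the Z-basis measurement of the i-th T gate's input and s_i its
# selective-teleportation measurements (grouped into one region here).
order = {"z1": [], "s1": ["z1"], "z2": [], "s2": ["z2", "s1"]}
print(earliest_layers(order))  # {'z1': 0, 's1': 1, 'z2': 0, 's2': 2}
```

Both Z-basis measurements can happen in the first layer; only the chain of selective teleportations forces a sequential depth, which is why sequences of T gates parallelize well in this form.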


Figure 9.4: (a) A quantum circuit for the single-qubit unitary THT in which time runs left to right. (b) A schematic representation of the corresponding surface code braid in which time runs bottom to top. For simplicity, the braids corresponding to Figure 9.3b are shown as boxes, except for the measurements which are shown as thin tabletop structures. The T and H boxes may be placed in parallel, and the |A⟩ and |Y⟩ states may be prepared ahead of time. The first measurement of the T gate must complete before the remaining four selective teleportation measurements can be performed. Selective teleportation measurements between T gates also obey a relative time-ordering as indicated by the black dotted lines. Any sequence of single-qubit gates from {T, H, S} may be parallelized in this way.

Of course, one could try to compile the braid in a different way, so as to use more of the available space. However, the efficiency of the compilation will depend heavily on the structure of the original circuit. Qubits that were originally local when arranged linearly might be placed far apart when arranged in two dimensions, thereby increasing the volume required for a CNOT between the two.

We instead choose to optimize the canonical braid by smoothly deforming it. So long as the deformations are topological, the optimized braid will be logically equivalent to the original. Braid compaction, then, is the problem of taking a braid B and converting it into a topologically equivalent braid B′ that fits into a smaller bounding volume. Alternatively, the problem can be described as follows.

Braid compaction

Given a braid B, code distance d, and a rectangular lattice of dimension A = (x, y), find a braid B′ that is topologically equivalent to B, achieves a minimum code distance of d, and is contained in a volume V = (x, y, z) of minimum size.

The x and y dimensions of the bounding volume are fixed by the size and geometry of the quantum computer. The goal is to efficiently use the provided space in order to minimize computation time.

Abstractly, we can view braid compaction as a process of placing cuboids (Hadamard and T gates) in a large box, subject to certain distance, connectivity and topology constraints. When viewed in this way, the problem looks strikingly similar to that of VLSI placement [SLW83]. In the VLSI placement problem, the task is to pack a set of circuit elements, represented by rectangles, on a two-dimensional circuit board of minimum area. Some of the circuit elements must be connected by wires, and some must be separated from other circuit elements by a minimum distance.

VLSI placement is NP-complete [SLW83]. Given the close similarities with VLSI placement and with other packing problems, we conjecture that braid compaction is also NP-complete. However, despite their similarities, there are several key differences between VLSI placement and braid compaction. In particular, the rigid objects in VLSI placement have arbitrary dimension whereas the Hadamard cuboids in the braid are of fixed size. Thus a naive reduction from VLSI placement to braid compaction is not possible. Attempts at a more complicated reduction or reduction from related problems such as 3-Partition and bin packing have so far failed.

9.4 A force-directed compaction algorithm

We now describe our force-directed algorithm, the first of two proposed algorithms for braid compaction. The algorithm employs two complementary “forces”. A gravity force acts to pull the braid down toward the bottom of the space-time grid, thereby reducing computation time.
Meanwhile, a tension force prevents the braid from becoming too large and impeding the progress of gravity.

For our force-directed algorithm, the braid is modeled as a set of plumbing pieces (i.e., pipes) placed on a three-dimensional grid. For circuits containing preparation, measurement, single-qubit Paulis and CNOT gates, only four types of pipes are required: straight and bent (elbow-shaped) pipes, both primal and dual. See Figure 9.5. The braid is then constructed by connecting pipes into interlocking loops. Junctions can also be supported by merging two or more pipes.

The three-dimensional (l × w × h) grid is partitioned into 4 × … −y face must always connect at position (1, …, 2) within the cell. Including the empty pipe, there are 2⁶ = 64 possible primal pipes and 64 possible dual pipes, for a total of 4096 possible cell configurations. See Figure 9.6.

The structure of the cell enforces a minimum distance of a single unit cube between defects of opposite type and a distance of three unit cubes between distinct defects of the same type. Thus, if the length of a unit cube is δ, the resulting surface code distance is d = 3δ. A unit cube contains 2δ physical qubits per side (including qubits for stabilizer measurement), so that a single time-slice of a cell contains 64δ² qubits.

Regions such as Hadamards, and state distillation and tabletop measurement for T gates, cannot be represented as a collection of conventional plumbing pieces. Instead, they are represented by a volume of special-purpose pipes which collectively are treated as a contiguous region. These pipes are much like regular pipes, except that they consume an arbitrary region of the 4 × …

Figure 9.5: In the force-directed algorithm, braids are constructed by rotating and connecting the four primitive “plumbing” pieces: (a) straight primal, (b) bent primal, (c) straight dual, (d) bent dual.
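The counting in this subsection can be checked mechanically. The Python sketch below assumes, as the count of 2⁶ implies, that a pipe of each type may connect to any subset of the cell's six faces (the empty subset being the empty pipe), and that a cell is four unit cubes on a side with 2δ qubits per unit-cube side; the function names are hypothetical.

```python
def cell_configurations(faces=6):
    """Return (pipes per defect type, total cell configurations).

    Each defect type (primal, dual) independently connects to any subset of the
    cell's faces, so there are 2**faces pipes per type (including the empty pipe).
    """
    per_type = 2 ** faces
    return per_type, per_type * per_type


def code_distance(delta):
    """Same-type defects are kept three unit cubes apart, so d = 3 * delta."""
    return 3 * delta


def qubits_per_cell_slice(delta, cubes_per_side=4, qubits_per_cube_side=2):
    """Physical qubits in one time-slice of a cell: each side spans 4 unit cubes
    of 2 * delta qubits (data plus stabilizer-measurement qubits)."""
    side = cubes_per_side * qubits_per_cube_side * delta
    return side * side


print(cell_configurations())                        # (64, 4096)
print(code_distance(3), qubits_per_cell_slice(3))   # 9 576
```

For example, δ = 3 gives a distance-9 code and 576 = 64 · 3² physical qubits per time-slice of a single cell.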

Figure 9.6: (a) An example of a 4 × 4 × 4 cell containing both a primal pipe and a dual pipe. The primal pipe connects to the southern face (−z) and the eastern face (+x). The dual pipe connects to the western face (−x) and to the far face (+y). (b) All possible pipes superimposed on a single cell. Primal and dual defects are always separated by at least one unit cube. Neighboring unconnected defects of the same type are always separated by at least three unit cubes.

9.4.2 Braid synthesis

As defined, the braid compaction problem takes an arbitrary braid as input. Thus our algorithm need not address the synthesis of a quantum circuit into a braid. Indeed, the force-directed braid model requires only that rigid collections of pipes (i.e., cuboids) be specified along with rotation and time-ordering constraints.

For concreteness, however, we will assume that the initial braid is constructed from a quantum circuit in the canonical way as described in Section 9.2. That is, qubits are represented by pairs of primal defects. Single-qubit preparation corresponds to two bent pipes connected to form a “U” shape, and single-qubit measurement is the same, except that the U shape is upside-down. Hadamards and T gates are abstracted as cuboids of fixed dimension. CNOT gates are constructed by wrapping a dual loop around the corresponding primal loops.

The Hadamard cuboid is three cells wide, four cells deep and four cells high. This cuboid is larger than is strictly necessary to enclose the Hadamard operation. Part of the Hadamard operation involves cutting a boundary around the corresponding logical qubit. The volume given above provides enough room for the Hadamard operation to take place inside the boundary, while enforcing that defects outside of the boundary are a safe distance away.
Affixed to opposite faces of the cuboid are pairs of straight pipes representing the input and output logical qubit.

The specifics of the T-gate braid depend on the distillation protocol and on the desired gate accuracy, but otherwise follow Figure 9.3b. Our compaction algorithm is flexible enough to allow any type of distillation scheme. For simplicity, we will assume the existence of two cuboid regions for each T gate, one for |A⟩ and one for |Y⟩. Straight pipes representing the output are affixed to the top of each cuboid.

The primary “force” in the algorithm is a vector field that loosely resembles physical gravity acting on the braid. With each cell in the grid, we associate two vectors of the form (a, m), specified by an axis a ∈ {x, y, z} and a magnitude m ∈ ℤ. The first vector represents a force on the primal pipe contained in the cell, and the second vector represents a force on the dual pipe.

There are a number of reasonable ways to initialize and update the gravity field as defects are moved around. The simplest strategy is to assign a fixed, negative magnitude to each space-time point and align the vector along the z-axis so that the force always points downward. In order that defects may slide past each other, though, we allow vectors to point sideways along the x and y axes, as well. See Figure 9.7. Roughly, gravity vectors are assigned to point to the closest cell from which the defect may then move downward. For example, a primal pipe occupying cell (x, y, z) may be blocked by a dual pipe in cell (x, y, z − 1). If cells (x + 1, y, z) and (x + 1, y, z − 1) are empty, then the primal gravity vector for cell (x, y, z) is assigned to point along the positive x-axis.

Figure 9.7: Gravity vectors (shown as green cones) generally point downward, but may point in any direction.

The gravity force, while effective at directing pipes toward the bottom of the grid, has the effect of stretching strings and loops, thus increasing the length of the braid. This happens, for example, when a loop is pulled by gravity in one direction but a small segment of the loop is prevented from moving because other defects are in the way. When a string or loop becomes very long, it may take up space that could otherwise be occupied by other parts of the braid. To prevent this behavior we implement a tension force which acts to reduce the length of a string.

Tension is applied to each string of defects independently. For each pipe in the string, there is a force pulling in the direction of the input face and a force pulling in the direction of the output face. For example, a pipe connected to the −x and +z faces will experience a force in the −x and +z directions. The magnitude of the force is proportional to the length of the string, just as for a physical spring.

This choice of tension forces means that the inward and outward forces cancel for straight pipes. Bent pipes, however, feel an inward force toward the rest of the string. This inward force tends to decrease the curvature of the string, thereby reducing its length. See Figure 9.8.

Figure 9.8: The tension force pulls inward on each of the corners of the loop, (a) before and (b) after. The result is a smaller rectangular loop.

Tension forces also act on cuboids. Each of the pipes connected to a cuboid exerts a force that pulls in the direction of the pipe. Again, the force is proportional to the length of the string to which each connecting pipe belongs.
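The two force rules above can be sketched per pipe as follows. This Python sketch simplifies under stated assumptions: gravity only inspects the immediate sideways neighbours (the text's “closest cell from which the defect may then move downward” may involve a wider search), occupancy is modeled as a plain set of blocked cells, and the spring constant `k`, the face labels and the function names are hypothetical.

```python
# Unit vectors for the six faces of a cell (hypothetical labels).
FACES = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def gravity_vector(cell, blocked, g=1):
    """Return (axis, magnitude) of the gravity force on a pipe in `cell`.

    Points down the z-axis when the cell below is free. Otherwise it points toward
    the first sideways neighbour from which the pipe could then move down, so that
    defects can slide past each other. Returns ("z", 0) if no such neighbour exists.
    """
    x, y, z = cell
    if (x, y, z - 1) not in blocked:
        return ("z", -g)
    for axis, dx, dy in (("x", 1, 0), ("x", -1, 0), ("y", 0, 1), ("y", 0, -1)):
        side = (x + dx, y + dy, z)
        side_below = (x + dx, y + dy, z - 1)
        if side not in blocked and side_below not in blocked:
            return (axis, g * (dx + dy))  # dx + dy is +1 or -1
    return ("z", 0)


def tension_force(connected_faces, string_length, k=1.0):
    """Spring-like tension on one pipe: a pull toward each face the pipe connects to,
    with magnitude proportional to the length of its string. Pulls toward opposite
    faces cancel, so straight pipes feel no net force; bent pipes are pulled inward."""
    force = [0.0, 0.0, 0.0]
    for face in connected_faces:
        for i, u in enumerate(FACES[face]):
            force[i] += k * string_length * u
    return tuple(force)


# The example from the text: a primal pipe blocked from below by a dual pipe,
# with free cells at (x + 1, y, z) and (x + 1, y, z - 1).
print(gravity_vector((5, 5, 5), blocked={(5, 5, 4)}))  # ('x', 1)
print(tension_force(["-x", "+z"], string_length=10))   # (-10.0, 0.0, 10.0)
print(tension_force(["-x", "+x"], string_length=10))   # (0.0, 0.0, 0.0)
```

The last two calls illustrate the cancellation argument: a bent pipe feels a net diagonal pull, while a straight pipe feels none.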

The braid is initially placed above the three-dimensional grid. Since the braid may be wider than the grid dimensions, a funnel is placed on top of the grid. This allows the braid to slowly deform according to the geometry of the lattice. Compaction then proceeds by iterating through each of the cuboids and strings. Cuboids are translated or rotated as a single rigid object. Other regions of pipes form strings which either connect to cuboids or form loops. Strings are treated as flexible objects in which each pipe can be translated independently.

Associated with each pipe is a velocity vector. Each pipe in a string is moved by first taking the initial velocity vector and updating it according to the gravity and tension forces at that location. The pipe is then translated according to the direction and magnitude of the new velocity vector. During the move, additional pipes may be added or removed in order to maintain connectivity of the string.

To translate a cuboid, the total velocity is calculated by summing each of the individual velocity vectors. Similarly, the gravity and tension forces are calculated by summing the force vectors associated with each pipe. The cuboid velocity is then updated by dividing the total force by the number of pipes (each pipe is assumed to have the same mass) and then adding to the existing velocity. Finally, the cuboid is translated according to the direction and magnitude of the velocity vector.

In the case of tabletop measurement translations along the z-axis, we must preserve the partial ordering. When translating a tabletop m along the −z-axis we must check the height of the other measurements on which m depends. Likewise, when translating m along the +z-axis, we must check the height of the measurement that depends on m.

Cuboid rotations are performed similarly by calculating a rotational velocity according to the moments of each pipe and the torque due to gravity and tension forces.
Rotation about a given axis is performed only if rotation is allowed and the magnitude of the angular velocity is large enough to induce a rotation of π/2.

The complexity of a single compaction iteration scales as the size of the braid. The size of a canonical braid is O(nm), where n is the number of qubits and m is the number of gates in the input circuit. The number of iterations required to obtain good compaction results depends on the ratio of the lattice area, i.e., the x-y plane, to the braid size. In the case that the lattice area is large compared to the braid size, it seems reasonable to expect the braid to flatten in time proportional to the height of the canonical braid. If the canonical braid has area large compared to the lattice, then O(nm) iterations may be required in order to funnel the entire braid into the proper bounding box.

For small circuit sizes, a runtime of O(n²m²) is reasonable. But for large circuits consisting of thousands of qubits and possibly millions or billions of gates, we require a better strategy. Indeed, we cannot hope to globally optimize braids for large-scale problem sizes. Instead, the circuit is partitioned into subcircuits of manageable size and the braid is synthesized and compacted hierarchically. Just as we treat single-qubit Hadamards as atomic cuboids of fixed size, we may consider sub-braids as fixed-size cuboids.

Each sub-braid is represented as a tangle of defects in which some defects are anchored to grid boundaries. Subject to the anchoring constraints, the sub-braid is compacted as normal. Once its compacted size is determined, the sub-braid is then treated as a black box in the larger braid. If two sub-braids contain measurements that are time-ordered, then the sub-braids must also be time-ordered. But again, this is no different than the time-ordering restrictions on tabletop measurements in the original model.

We anticipate that the best partitioning strategy will be one that reflects the structure of the input circuit.
Reasonable representations of large input circuits will be hierarchical and it should be possible to mimic this hierarchy for large-scale compaction. This technique will be particularly useful for highly repetitive circuits. Repeated sub-circuits can be synthesized and compacted once, and then duplicated in the larger braid.

We have implemented the force-directed compaction algorithm in C++ as a tool called Braidpack. Braidpack takes, as input, a representation of a circuit along with physical space restrictions. It produces, as output, a compact logically equivalent surface code braid. The current implementation is not fully functional, but is capable of synthesizing and compacting arbitrary circuits of CNOT gates, including qubit preparation and measurement.

Figure 9.9 shows the result of compaction on a single CNOT gate. The tension force first contracts the primal loop on the right-hand side. Then gravity flattens the braid. Compaction in this example was done without implementing collisions between pipes. As a result, tension is unable to fully contract the dual loop. With a more complete implementation of the algorithm, we expect the braid to fully flatten and contract.
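The rigid-body translation step for cuboids and the ordering check for tabletop measurements, as described earlier in this section, can be sketched in Python as follows. This is not Braidpack code. Rounding the velocity to an integer grid displacement, and testing the ordering with the strict inequality z′ + 1 ≤ z that Section 9.5 uses for dependent measurements, are assumptions of this sketch; all names are hypothetical.

```python
def vector_sum(vectors):
    """Component-wise sum of a list of 3-vectors."""
    total = [0.0, 0.0, 0.0]
    for v in vectors:
        for i in range(3):
            total[i] += v[i]
    return tuple(total)


def cuboid_step(velocities, forces):
    """One translation step for a rigid cuboid made of equal-mass pipes.

    As described in the text, the cuboid velocity is the sum of the pipe velocities;
    the summed force is divided by the number of pipes and added to it. Returns the
    new velocity and the integer grid displacement obtained by rounding it.
    """
    n = len(velocities)
    total_v = vector_sum(velocities)
    total_f = vector_sum(forces)
    new_v = tuple(v + f / n for v, f in zip(total_v, total_f))
    displacement = tuple(int(round(c)) for c in new_v)
    return new_v, displacement


def tabletop_move_ok(new_bottom, new_top, dep_tops, parent_bottoms):
    """Check the partial order after moving tabletop m to z-range [new_bottom, new_top].

    dep_tops: top z of each measurement that m depends on (must stay strictly below m).
    parent_bottoms: bottom z of the measurement(s) depending on m (must stay above m).
    """
    below_ok = all(top + 1 <= new_bottom for top in dep_tops)
    above_ok = all(new_top + 1 <= bottom for bottom in parent_bottoms)
    return below_ok and above_ok


# Four stationary pipes, each pulled down by a force of 2 units:
print(cuboid_step([(0, 0, 0)] * 4, [(0, 0, -2)] * 4))  # ((0.0, 0.0, -2.0), (0, 0, -2))
print(tabletop_move_ok(3, 3, dep_tops=[1], parent_bottoms=[5]))  # True
print(tabletop_move_ok(1, 1, dep_tops=[1], parent_bottoms=[5]))  # False
```

A move that fails `tabletop_move_ok` would simply be rejected, leaving the tabletop where it was for this iteration.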

Figure 9.9: Compaction of a single CNOT using a prototype of the force-directed algorithm. (a) A canonical CNOT braid is initially arranged vertically. (b) After compaction, the braid has been almost completely flattened.

Figure 9.10 shows the same prototype implementation of Braidpack for a circuit composed of eleven CNOT gates. For simplicity of implementation, the qubit preparation and measurements in the canonical braid are arranged in a staircase fashion. Ignoring the staircases, the canonical braid has a bounding box of size (3 × … × …) …

In this section we describe our second compaction algorithm, which is based on simulated annealing. Simulated annealing is a general optimization technique that has been applied to a wide variety of problems. The main idea is to explore the solution space by hopping randomly from the current solution to a nearby solution. Hops that result in an improved solution are kept. In order to avoid local minima, hops that result in a less desirable solution are also kept with some non-zero probability, thus permitting broader exploration of the set of possible solutions.

Figure 9.10: Compaction of eleven CNOT gates with a prototype implementation of the force-directed algorithm. The canonical braid (a) is compressed into a smaller but topologically equivalent braid (b).

Our simulated annealing algorithm is based largely on a procedure used for VLSI placement [HLL88]. In the VLSI algorithm, circuit elements and wires are represented by rectangles. Size, distance and connectivity constraints are given by linear inequalities on the coordinates of each rectangle. Rectangles can be shifted around by swapping linear constraints. The idea for braids is similar. Defects are represented by cuboids. Size, distance and topology constraints are given by linear inequalities which can be swapped to perform topological deformation.
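The generic hop-and-accept loop described above can be sketched as follows. This Python sketch uses the standard Metropolis acceptance rule with a geometric cooling schedule; both choices, the parameter values and the function names are assumptions rather than details taken from [HLL88] or from our algorithm. For braid compaction, `neighbor` would swap a constraint into or out of the active set and `cost` would measure the resulting layout volume.

```python
import math
import random


def anneal(initial, neighbor, cost, t0=10.0, cooling=0.95, steps=1000, seed=0):
    """Generic simulated annealing with Metropolis acceptance.

    neighbor(state, rng) proposes a nearby state; cost(state) is minimized.
    Improving hops are always kept; worsening hops are kept with probability
    exp(-increase / temperature), which shrinks as the temperature is lowered.
    Returns the best state seen and its cost.
    """
    rng = random.Random(seed)
    state, c = initial, cost(initial)
    best, best_c = state, c
    t = t0
    for _ in range(steps):
        cand = neighbor(state, rng)
        cand_c = cost(cand)
        if cand_c <= c or rng.random() < math.exp((c - cand_c) / t):
            state, c = cand, cand_c
            if c < best_c:
                best, best_c = state, c
        t = max(t * cooling, 1e-12)  # keep the temperature positive
    return best, best_c


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 hops.
best, best_cost = anneal(10, lambda x, rng: x + rng.choice((-1, 1)), lambda x: (x - 3) ** 2)
print(best, best_cost)
```

Early on, the high temperature lets the search accept uphill hops and escape local minima; as the temperature falls, it behaves more and more like greedy descent.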

In the force-directed algorithm, the braid was modeled as a connected configuration of plumbing pieces. Some collections of pipes formed rigid cuboids. Other collections of pipes formed flexible strings and loops. For simulated annealing, we take a different approach. Each cuboid is represented by a pair of points (p, p′) in the three-dimensional lattice. Point p specifies the point closest to the origin (lower-left corner) and p′ specifies the point furthest from the origin (upper-right corner). Defect strings and loops are also represented by cuboids. A string of defects is given by a set of overlapping cuboids of arbitrary dimension. By connecting cuboids it is possible to construct any desired loop or string.

Thus the entire braid is specified by a set of cuboids. A layout of n cuboids is defined by 2n three-dimensional integer coordinates. The x, y, z dimensions of the layout are defined by the maximum x, y, and z coordinates, respectively. The layout must satisfy a set of constraints which we group into the following types:

1. size constraints,
2. time-ordering constraints,
3. minimum distance constraints,
4. jog constraints,
5. connectivity constraints, and
6. topological constraints.

Except for the topological constraints, all of the constraints can be directly expressed as sets of linear inequalities.

Size constraints

Minimum dimension constraints of a cuboid are specified by a triple $(\delta_x, \delta_y, \delta_z)$ of non-negative real values and three linear inequalities:

$$x + \delta_x \le x', \qquad y + \delta_y \le y', \qquad z + \delta_z \le z'. \tag{9.1}$$

For string cuboids (those that are not H gates or table-like measurements), $\delta_x = \delta_y = \delta_z = d/4$, where $d$ is the code distance. Hadamard and T gates may be rotated 90 degrees about the z-axis. Each gate can take on one of four different rotations $\{0, \pi/2, \pi, -\pi/2\}$. Rotations $0$ and $\pi$ correspond to the set of constraints given by (9.1). Rotations $\pm\pi/2$ correspond to a permuted version of (9.1) in which $\delta_x$ and $\delta_y$ have been exchanged. We therefore assign one of two sets of constraints to each H and T gate, either the constraints of (9.1) or the permuted version. The corresponding cuboids must satisfy all constraints from at least one of the sets.

Time-ordering constraints

The non-deterministic implementation of T gates in the surface code induces a partial time-ordering of tabletop measurement regions. As discussed in Section 9.2, this partial ordering requires that, for certain pairs, one tabletop measurement must be located above another. The time-ordering constraint for two dependent measurements $a$ and $b$, with $b$ located above $a$, is given by

$$z'_a + 1 \le z_b. \tag{9.2}$$

Minimum distance constraints

Like the size constraints, minimum distances are proportional to $d$, the distance of the code. With a few exceptions (see the discussions of jog nodes and connectivity constraints in Section 9.5.1), primal defect cuboids must be at least a distance $d$ away from other primal defects. Similarly, dual defect cuboids must be $d$ away from other dual cuboids. Cuboids of opposite type must be at least $d/\ldots$ apart. In general, if two cuboids $r_i, r_j$ must be separated by $\delta$, then at least one of the following constraints must be satisfied:

$$x'_i + \delta \le x_j,\quad x'_j + \delta \le x_i,\quad y'_i + \delta \le y_j,\quad y'_j + \delta \le y_i,\quad z'_i + \delta \le z_j,\quad z'_j + \delta \le z_i. \tag{9.3}$$

Each constraint corresponds to a different relative arrangement of the two cuboids. The constraint $x'_i + \delta \le x_j$, for example, enforces that $r_i$ is placed to the left of $r_j$, whereas $z'_i + \delta \le z_j$ requires that $r_i$ be placed below $r_j$.

Jog nodes

A fixed string of defects may be represented by a set of overlapping cuboids, each of which has a fixed orientation along one of the three axes. However, in order to accommodate topological deformation we require a representation that allows for flexible strings of cuboids. This is analogous to a VLSI instance in which an arbitrary number of jogs are allowed in each wire. To fulfill this requirement, we introduce an object called a jog node.

A jog node is a set of six cuboids $c_1, \dots, c_6$, each of which has a particular orientation axis. The first cuboid is oriented along the $+x$ axis, the second along the $+y$ axis, and the third along the $+z$ axis. The fourth, fifth and sixth cuboids are oriented along the $-x$, $-y$ and $-z$ axes, respectively. Each cuboid in the jog node is allowed to expand along its corresponding axis. Adjacent cuboids are required to overlap so that the entire jog node forms a continuous path. Writing $(x_k, y_k, z_k), (x'_k, y'_k, z'_k)$ for the corners of cuboid $c_k$, the constraints for a jog node are given by:

$$\begin{aligned}
&x_1 \le x_2,\ y_1 = y_2,\ z_1 = z_2,\ x'_1 = x'_2,\ y'_1 \le y'_2,\ z'_1 = z'_2,\\
&x_2 = x_3,\ y_2 \le y_3,\ z_2 = z_3,\ x'_2 = x'_3,\ y'_2 = y'_3,\ z'_2 \le z'_3,\\
&x_3 \ge x_4,\ y_3 = y_4,\ z_4 \ge z_3,\ x'_3 = x'_4,\ y'_3 = y'_4,\ z'_3 = z'_4,\\
&x_4 = x_5,\ y_4 \ge y_5,\ z_4 = z_5,\ x'_4 \ge x'_5,\ y'_4 = y'_5,\ z'_4 = z'_5,\\
&x_5 = x_6,\ y_5 = y_6,\ z_5 \ge z_6,\ x'_5 = x'_6,\ y'_5 \ge y'_6,\ z'_5 = z'_6.
\end{aligned} \tag{9.4}$$

It is possible to connect two jog nodes at their endpoints. Given the sixth cuboid $a_6$ of node $a$ and the first cuboid $b_1$ of node $b$, the endpoints are connected by requiring

$$x_{a_6} = x_{b_1},\quad y_{a_6} = y_{b_1},\quad z_{a_6} = z_{b_1}. \tag{9.5}$$

In this way, jog nodes can be connected to form a defect path of arbitrary length. It is possible to form both loops and open-ended strings.

The jog node constraints, as stated, conflict with the minimum distance constraints in Section 9.5.1. For example, cuboids $a_1$ and $a_2$ of the same node must overlap, so they cannot also be separated by $\delta$.
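The jog node constraints (9.4) and the conflict just described can be checked on a small example. This is an illustrative sketch only: the table `JOG_RELATIONS` encodes our reading of (9.4) as relations between consecutive cuboids $c_k$ and $c_{k+1}$ (the subscript assignment is an assumption), and the code distance $d = 8$ with $\delta = d/4$ is a hypothetical value. All function names are ours.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int, int]
Cuboid = Tuple[Point, Point]  # (p, p'): corners nearest to and furthest from the origin

# Our reading of (9.4): relation between cuboid c_k and c_{k+1}, applied as
# "coordinate of c_k  REL  coordinate of c_{k+1}" to (x, y, z, x', y', z').
# Cuboids c_1..c_6 are oriented along +x, +y, +z, -x, -y, -z.
JOG_RELATIONS = [
    ("<=", "==", "==", "==", "<=", "=="),  # c1 -> c2
    ("==", "<=", "==", "==", "==", "<="),  # c2 -> c3
    (">=", "==", "<=", "==", "==", "=="),  # c3 -> c4
    ("==", ">=", "==", ">=", "==", "=="),  # c4 -> c5
    ("==", "==", ">=", "==", ">=", "=="),  # c5 -> c6
]


def _holds(a: int, rel: str, b: int) -> bool:
    if rel == "<=":
        return a <= b
    if rel == ">=":
        return a >= b
    return a == b


def coords(c: Cuboid) -> Tuple[int, int, int, int, int, int]:
    (x, y, z), (xp, yp, zp) = c
    return (x, y, z, xp, yp, zp)


def has_min_size(c: Cuboid, delta: int) -> bool:
    """Size constraint (9.1) with delta_x = delta_y = delta_z = delta."""
    x, y, z, xp, yp, zp = coords(c)
    return x + delta <= xp and y + delta <= yp and z + delta <= zp


def is_jog_node(cuboids: Sequence[Cuboid]) -> bool:
    """Check all 30 relations between the six consecutive cuboids of a node."""
    if len(cuboids) != 6:
        return False
    for k, rels in enumerate(JOG_RELATIONS):
        a, b = coords(cuboids[k]), coords(cuboids[k + 1])
        if not all(_holds(a[i], rels[i], b[i]) for i in range(6)):
            return False
    return True


def separated(ci: Cuboid, cj: Cuboid, delta: int) -> bool:
    """Minimum distance (9.3): at least one of the six inequalities holds."""
    xi, yi, zi, xpi, ypi, zpi = coords(ci)
    xj, yj, zj, xpj, ypj, zpj = coords(cj)
    return (xpi + delta <= xj or xpj + delta <= xi or
            ypi + delta <= yj or ypj + delta <= yi or
            zpi + delta <= zj or zpj + delta <= zi)


# A node oriented along +x only: c_1 is long, c_2..c_6 coincide at its far end.
d = 8
delta = d // 4
far_end = ((8, 0, 0), (10, 2, 2))
node = [((0, 0, 0), (10, 2, 2))] + [far_end] * 5
```

The example node satisfies the jog node relations and the size constraints, yet `separated(node[0], node[1], delta)` is false: the overlap that makes the path continuous is exactly what violates (9.3).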
As a workaround, we first require that each jog node be oriented along at most one axis. This is accomplished by changing the appropriate inequality constraints to equality constraints. For example, to force an orientation along the $+x$ axis only, leave the $x_1 \le x_2$ constraint alone and change all of the other inequalities to equalities. Then the cuboid corresponding to the $+x$ axis can be of arbitrary size (subject to minimum dimension constraints) and all other cuboids of the node must fit inside of it. See Figure 9.11.

Figure 9.11: A jog node consists of six overlapping cuboids. Each cuboid is allowed to extend in only one direction, and only one cuboid in the node may be extended. The six possible jog node configurations are shown above. The node origin is indicated by a black dot, where visible.

Next, remove the minimum distance constraints for all jog node cuboids except those that correspond to the orientation axis. Finally, remove minimum distance constraints between cuboids in adjacent jog nodes. Now, overlapping cuboids within the same jog node or between connected jog nodes are consistent with all other constraints.

A jog node may also be configured to take no orientation. In this case, all cuboids in the node are constrained to be of minimum size, i.e., $x + \delta_x = x'$, $y + \delta_y = y'$, $z + \delta_z = z'$. Furthermore, all minimum distance constraints involving the node are removed. This type of node will either be unconnected to any other node (in which case it can be removed), or it will be contained entirely within another jog node. In either case, its distance from other objects in the braid is unimportant.

Connectivity constraints

Jog nodes allow for arbitrary defect paths and loops. We must also define how jog nodes are used to connect to cuboids such as Hadamard gates and state distillation circuits. Each gate cuboid contains some number of ports to which string defect cuboids are allowed to attach. The locations of the ports are fixed relative to the gate. However, since gates can be rotated, the constraints that describe the connection must correspond to the permutation of the dimensional constraints from Section 9.5.1.

Figure 9.12: The above cuboid has four ports defined on its surface, two on top and two on the bottom. Jog nodes are affixed to the points that define each port.

A port is a rectangle defined by two coordinates on the surface of the gate. A jog node is connected to a port by requiring that certain coordinates of the jog node cuboid match the coordinates of the port. For example, if the input port $(x, y, z), (x', y', z)$ is located on the top of the gate, then the connection constraint for the attached jog node cuboid, with corners $(x_c, y_c, z_c), (x'_c, y'_c, z'_c)$, is given by

$$x_c = x,\quad y_c = y,\quad x'_c = x',\quad y'_c = y',\quad z_c = z. \tag{9.6}$$

See Figure 9.12.

To maintain consistency, the minimum distance constraints between the gate and the connecting jog node must be eliminated. Note that it is still possible for two connected gates to achieve a separation of exactly $d$. In this case, the node connected to the output port of the first gate is also connected to the input port of the second gate, and vice versa. But since each node is of minimum size, the minimum distance constraints between the node and the gates do not apply (see Section 9.5.1).

Topological constraints

Finally we address the topological constraints. Informally, these constraints enforce the linking between loops. Links between loops of the same type are trivial and need not be constrained. However, certain linking properties between loops of different types must be maintained. In particular, it is sufficient to consider the linking number for each primal-dual loop pair. For each primal-dual pair $(l_p, l_d)$ we have the constraint

$$l_{pd} \equiv L_{pd} \pmod{2}, \tag{9.7}$$

where $l_{pd}$ is the linking number of loops $l_p$ and $l_d$, and $L_{pd} \in \{0, 1\}$ is an input parameter.

There is a simple linear-time algorithm to compute the linking number between two loops (see, e.g., [Kau01]). However, in order to efficiently compute the cost function of a layout, we will require that all constraints be linear. See Section 9.5.2.

We impose linear topology constraints separately for loop pairs with odd linking number (i.e., loops that are linked) and loop pairs with even linking number (loops that are not linked). First consider two loops with odd linking number. One of the loops consists of primal defects and the other loop consists of dual defects. To the primal loop, attach a new primal cuboid, which we will call a linking node. The linking node has dimension $(5d/\ldots \times d/\ldots \times d/\ldots)$ … Let $x$ be the minimal $x$-coordinate of any cuboid in this set and let $x'$ be the maximal $x$-coordinate of any cuboid in the set. Similarly define $y$, $y'$, $z$ and $z'$ as the minimal and maximal $y$- and $z$-coordinates. Then the entire primal loop is contained in a bounding box of dimension $(x' - x,\ y' - y,\ z' - z)$.

Figure 9.13: When a primal and a dual loop are linked in the canonical braid, a linking node (a) is inserted and attached to both loops. Once compaction has completed, the linking node is removed. The linking number can be left unchanged (b), or toggled (c) if necessary.

Figure 9.14: In order to avoid unwanted links, a cuboid is placed around each primal loop.
Dual loops which do not link with the primal loop are prohibited from entering the enclosing cuboid.

If all of the cuboids in the dual loop stay outside of the bounding box that encloses the primal loop, then the linking number is guaranteed to be zero. We therefore introduce a new cuboid that encloses the primal loop. For all dual loops which have even linking number with the corresponding primal loop, we add primal-dual minimum distance constraints between the dual cuboids and the enclosing cuboid. See Figure 9.14.

In order to ensure that the new cuboid actually encloses the primal loop, additional variables and constraints are required. Let $(x, y, z)$ and $(x', y', z')$ be variables describing the enclosing cuboid. Then for each cuboid $(x_i, y_i, z_i), (x'_i, y'_i, z'_i)$ in the primal loop we require that

$$x \le x_i,\quad x'_i \le x',\quad y \le y_i,\quad y'_i \le y',\quad z \le z_i,\quad z'_i \le z'. \tag{9.8}$$

9.5.2 The annealing algorithm

The algorithm takes a canonical braid as input. Initialization consists of constructing all of the cuboids and constraint groups. The instance includes a set of coordinates $P$, which can be divided into sets of integers $X$, $Y$ and $Z$ corresponding to the $x$-, $y$-, and $z$-coordinates, respectively. The constraints can be represented as a set $C$ of integer triples. Some of the constraints, such as time-ordering constraints, must be satisfied for all possible layouts. Other constraints may be partitioned into subsets for which the layout must satisfy at least one of the constraints in the subset. Let $C'$ be the set of all constraints that must always be satisfied, let $C''$ be the remaining constraints, and let $B$ be the corresponding partition of $C''$ into constraint subsets. Let $A \subset C$ be the set of "active" constraints such that $C' \subset A$ and $A$ contains exactly one constraint from each element of $B$.

A key element of the algorithm is to calculate the "cost" of a layout.
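Once $A$ is fixed, the coordinates along one axis, and hence the cost, follow from a longest-path computation over the active single-axis constraints (the constraint-graph method of [LW83, HLL88] described in the following paragraphs). A minimal sketch, under these assumptions: each active constraint is stored as a triple `(i, j, w)` meaning $x_i + w \le x_j$ (one plausible reading of "integer triples"); coordinates are recomputed from scratch rather than by the online update; vertices are ordered with Kahn's topological sort, which is our choice; and equality constraints, which would form zero-weight cycles, are not handled. Function and variable names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical encoding: (i, j, w) stands for the constraint  x_i + w <= x_j.
Constraint = Tuple[Hashable, Hashable, int]


def assign_coordinates(variables: Iterable[Hashable],
                       constraints: Iterable[Constraint]) -> Optional[Dict[Hashable, int]]:
    """Smallest non-negative coordinates satisfying all active constraints.

    Each coordinate is its longest-path distance from an implicit source
    vertex (the origin, joined to every variable by a weight-0 edge).
    Returns None if the constraint graph contains a cycle, which this sketch
    treats as an infeasible set of active constraints.
    """
    variables = list(variables)
    successors: Dict[Hashable, List[Tuple[Hashable, int]]] = {v: [] for v in variables}
    in_degree = {v: 0 for v in variables}
    for i, j, w in constraints:
        successors[i].append((j, w))
        in_degree[j] += 1

    value = {v: 0 for v in variables}
    queue = deque(v for v in variables if in_degree[v] == 0)
    processed = 0
    while queue:  # Kahn's algorithm visits vertices in topological order
        u = queue.popleft()
        processed += 1
        for v, w in successors[u]:
            value[v] = max(value[v], value[u] + w)  # longest-path relaxation
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)

    if processed < len(variables):
        return None  # some vertices lie on a cycle
    return value


def axis_cost(value: Dict[Hashable, int]) -> int:
    """Extent of the layout along one axis: its largest coordinate."""
    return max(value.values(), default=0)


# Two cuboids on the x axis: minimum width 2 each, and cuboid 1 left of cuboid 2
# with separation 8 (one active choice from a (9.3)-style group).
sizes = [("x1", "x1p", 2), ("x2", "x2p", 2)]
left_of = ("x1p", "x2", 8)
layout = assign_coordinates(["x1", "x1p", "x2", "x2p"], sizes + [left_of])
```

Here `layout` places cuboid 1 on $[0, 2]$ and cuboid 2 on $[10, 12]$, so `axis_cost(layout)` is 12. Activating both "left of" alternatives at once creates a cycle and returns `None`, which corresponds to rejecting the swap.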
There are a number of choices of cost function. The goal is to construct a braid of small height that fits into an $x$-$y$ area of fixed size. The first step is to ensure that the braid fits into that area. We initially set the cost function as the $x$-coordinate of the bounding box. Once this $x$-coordinate is small enough, we impose a global constraint that the $x$-coordinates of all cuboids must be no greater than that of the bounding box. We then set the cost function as the $y$-coordinate of the bounding box and repeat the procedure. Finally, once the entire braid fits into the $x$-$y$ area, we minimize over the height.

Start by choosing a set of active constraints such that all constraints in $A$ are satisfied by the canonical braid. The algorithm then proceeds by repeating the following sequence.

1. Randomly select an element $\beta \in B$.
2. Randomly select a constraint $b \in \beta$ such that $b \notin A$.
3. Locate the single constraint $b' \in A \cap \beta$. Remove $b'$ from $A$ and replace it with the new constraint $b$.
4. Compute the new minimum bounding box size and corresponding cost function.
5. If the new set of active constraints is infeasible, then reject the swap by removing $b$ from $A$ and replacing it with $b'$.
6. If the cost is smaller than before, keep the new constraint.
7. If the cost is larger than before, then keep the new constraint with probability given by the annealing schedule (see below).

In order for the algorithm to be efficient, we require an efficient way to compute the size of the minimum bounding box. This can be done using the constraint graph method proposed in [LW83] and used by [HLL88]. First, partition the active constraints into three sets: those that involve only $x$ coordinates, those that involve only $y$ coordinates and those that involve only $z$ coordinates. Note that there are no constraints that involve coordinates for two different axes. Consider just the set of $x$-coordinates $X$. We construct a weighted directed graph $G_X = (V_X, E_X)$.
Assign $V_X = X \cup \{x_\emptyset, x_\infty\}$, where $x_\emptyset$ and $x_\infty$ are boundary coordinates. For each constraint $x_i + d_{ij} \le x_j$ there is a directed edge from vertex $x_i$ to vertex $x_j$ with weight $d_{ij}$. The value of each coordinate $x \in X$ is assigned by computing the longest path from $x_\emptyset$ to $x$. Assuming that the set of constraints can be satisfied, $G_X$ is acyclic. Thus the longest paths can be computed in linear time by negating the weights and computing shortest paths in topological order. Constraint graphs for $y$ and $z$ coordinates are similarly constructed.

The cost of constructing the initial constraint graphs is $O(n)$, where $n$ is the number of cuboids. Once the graphs are constructed, updates can be computed by an online algorithm. When a constraint swap is performed, only those paths affected by the corresponding vertices need to be recalculated. This algorithm can also detect cycles induced by the new constraint. If a cycle is detected, then the set of constraints is infeasible and the swap is rejected.

For VLSI placement, Hsieh, Leong and Liu use a fixed-ratio temperature schedule in which the temperature is reduced by a constant factor after each time step [HLL88]. This schedule is simple and efficient and can also be used for our algorithm. Other kinds of schedules could also be used.

The surface code provides a unique opportunity for fault-tolerant quantum circuit optimization by topological deformation. We have defined the problem of braid compaction subject to geometric constraints, and given two heuristic algorithms.
Our tool Braidpack implements the first of these, the force-directed algorithm, and small examples indicate that compaction algorithms can lead to significant improvement in spacetime overhead when compared to the canonical braid.

Currently, Braidpack is a proof-of-principle rather than a production-ready software tool. Small-scale results are largely encouraging, but not all of the intended features have been implemented, and larger-scale examples are required to demonstrate the extent of its usefulness. Implementation of the simulated annealing algorithm is desired in order to compare the performance of the two algorithms. Indeed, we could also construct a hybrid algorithm which incorporates both techniques.

Our simulated annealing algorithm is inspired by a similar algorithm for VLSI placement. VLSI also offers a number of other techniques, including genetic algorithms, numerical and partitioning algorithms, and force-directed algorithms that are distinct from our own (see, e.g., [SM91]). Perhaps some of these additional techniques could be adapted to braid compaction.

Due to the similarity with VLSI compaction and other packing problems, we conjecture that braid compaction is NP-complete. A formal reduction has proven elusive, however. Thus an obvious open problem is to confirm or refute that conjecture.

Finally, we have focused on topological deformation. However, other non-topological braid identities exist [FD12, RHG07]. Optimization involving these identities has previously been done by hand, but it may be possible to incorporate non-topological techniques into an automated tool such as ours.

Chapter 10

Concluding thoughts

The promise of a reliable large-scale quantum computer is in the exponential speedups that it offers for real-world applications in physics, cryptography and number theory. Quantum computers do not yet exist in the real world, however. It is the main objective of the fault-tolerant quantum circuit designer to reduce resource requirements to match the capabilities of current or near-term technology. In this thesis we have tried to further this objective by optimizing a variety of aspects of fault tolerance, including encoded gates, error correction, threshold calculations, unitary decomposition and global parallelization.

We can extract a number of themes from these optimizations. One theme is the circumvention of optimality or no-go theorems by making novel use of the available machinery or by removing unnecessary constraints. Theorem 5.3.1 shows that the Eastin-Knill theorem against transversal universality can be side-stepped at essentially no cost. Overlap-based stabilizer state preparation breaks the convention of treating stabilizer generators independently in exchange for reduced circuit size. A tighter threshold can be obtained by eliminating the need for an adversarial noise model. Repeat-until-success circuits achieve better-than-optimal scaling by incorporating quantum measurements.

The use of gate teleportation and ancillary qubits has been a theme in quantum fault tolerance from the earliest protocols due to Shor [Sho96], and we have continued the trend here. The utility of ancillas is particularly evident in the circuits presented in Chapter 8. By using ancillas and measurement, a much wider class of unitary operations can suddenly be implemented without expanding the gate set beyond {Clifford, T}. Ancillas and teleportation are used heavily in state distillation, and we saw two new distillation protocols, one in Chapter 5 and one in Chapter 6.

Another strong theme is the development and use of software tools to aid in circuit design and discovery.
Indeed, except for Chapter 5, all of the new results presented in this thesis made use of custom computer software in some form or another. Undoubtedly, software tools will continue to be an important part of fault-tolerance optimizations going forward. One can imagine a kind of software "toolchain" for compiling and optimizing quantum algorithms, taking a high-level description of an algorithm and progressively decomposing it into machine-level instructions.

The new results and ideas in this thesis introduce many new questions, and leave room for improvement in several areas. Given their universal and transversal power, triorthogonal codes appear to have a special place in the theory of fault-tolerant quantum computation. However, beyond numerical study of the [[15, … constant overhead may be possible by using certain low-density parity-check (LDPC) quantum codes [Got13]. However, realization of these claims presumes efficient classical decoding algorithms for these codes, algorithms which are not currently known.

Another exciting but speculative pursuit is the use of non-abelian anyons for topological quantum computation. Because of their inherently robust properties, some have likened anyons to "quantum transistors" (thereby implying a comparison between quantum circuits and vacuum tubes). The experimental viability of this method remains to be seen.

At the current time, the surface code seems to be the leader among realistic schemes for fault-tolerant quantum computation. Its high threshold and 2D nearest-neighbor properties make it a very appealing option for a variety of proposed quantum computing architectures. Indeed, it has been the subject of intense study in recent years.
We addressed global topological optimization for the surface code in this thesis, but others have also considered optimizations, particularly for state distillation [FD12, FDJ13, Jon13c].

The motivation for resource optimization is a strong one, and more improvements are necessary before requirements become low enough for implementation of quantum algorithms. To quote Gottesman [Got13], "the main thing is not to give up". We can be pleased with the optimizations that we discover, but we should not be satisfied until fault-tolerant quantum computing is a reality.

Appendices

Appendix A

Proof of Claim 7.5.1

We now prove Claim 7.5.1, that the level-two malignant event upper bounds decrease with $\gamma$ according to the distance of the code. The claim is restated here for convenience.

Claim.

For $0 \le \epsilon \le 1$,

$$P^{(2)}_E\bigl(\epsilon\,\Gamma^{(1)}(\gamma)\bigr) \le \epsilon^{t+1}\, P^{(2)}_E\bigl(\Gamma^{(1)}(\gamma)\bigr),$$

where $t = \lfloor (d-1)/2 \rfloor$ and $d$ is the minimum distance of the (unconcatenated) code.

Proof. From (7.12) we see that $P^{(2)}_E$ can be bounded as

$$\frac{\Pr[\mathrm{mal}_E, \mathrm{good}]}{\Pr[\mathrm{accept}]} + \Pr[\mathrm{bad} \mid \mathrm{accept}]. \tag{A.1}$$

The $\Pr[\mathrm{mal}_E, \mathrm{good}]$ term is expressed as a sum of the form

$$\sum_{k=0}^{k_{\max}} c(k)\,\Gamma^k, \tag{A.2}$$

where all of the coefficients $c(k)$ are non-negative (because there are no non-deterministic components at level two) and it is understood that $\Gamma$ is a function of $\gamma$. The $\Pr[\mathrm{accept}]$ term in the denominator is a product of terms of the form

$$1 - \sum_{k=0}^{k_{\max}} c(k)\,\Gamma^k, \tag{A.3}$$

where, again, all $c(k)$ are non-negative. $\Pr[\mathrm{bad} \mid \mathrm{accept}]$ is a sum of terms similar to (A.2), some of which contain (A.3) terms in the denominator.

Strict fault tolerance of the exRec implies that the coefficients $c(k)$ of (A.2) and the numerator coefficients of $\Pr[\mathrm{bad} \mid \mathrm{accept}]$ are zero for $k \le t$. Therefore, for $0 \le \epsilon \le 1$, $P^{(2)}_E(\epsilon\Gamma)$ is a sum of non-negative terms of the form

$$\frac{\sum_{k=0}^{k_{\max}} c(k)\,(\epsilon\Gamma)^k}{1 - \sum_{k=0}^{k_{\max}} c(k)\,(\epsilon\Gamma)^k} \le \epsilon^{t+1}\,\frac{\sum_{k=t+1}^{k_{\max}} c(k)\,\Gamma^k}{1 - \sum_{k=0}^{k_{\max}} c(k)\,\Gamma^k}. \tag{A.4}$$

The inequality holds because only terms with $k \ge t+1$ survive in the numerator and $\epsilon^k \le \epsilon^{t+1}$ for each of them, while, since every $c(k)$ and $\Gamma$ are non-negative, the denominator can only decrease when $\epsilon\Gamma$ is replaced by $\Gamma$. This completes the proof.

References

[AB97] Dorit Aharonov and Michael Ben-Or. Fault Tolerant Quantum Computation with Constant Error. Proc. 29th Annual ACM Symp. on Theory of Computing (STOC), pages 176–188, 1997, arXiv:9611025.
[AC07] Panos Aliferis and Andrew W. Cross. Subsystem Fault Tolerance with the Bacon-Shor Code.

Physical Review Letters , 98(22):4, 2007, arXiv:0610063 .[AG04] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizercircuits.

Physical Review A , 70:052328, 2004, arXiv:0406196 .[AGP06] Panos Aliferis, Daniel Gottesman, and John Preskill. Quantum accuracythreshold for concatenated distance-3 codes.

Quantum Information andComputation , 6:97–165, 2006, arXiv:0504218 .[AGP08] Panos Aliferis, Daniel Gottesman, and John Preskill. Accuracy threshold forpostselected quantum computation.

Quantum Information and Computation ,8:181–244, 2008, arXiv:0703264 .[Aha03] Dorit Aharonov. A Simple Proof that Toﬀoli and Hadamard are QuantumUniversal. 2003, arXiv:0301040 .[AJKR10] Gorjan Alagic, Stephen P. Jordan, Robert K¨onig, and Ben W. Reichardt.Estimating Turaev-Viro three-manifold invariants is universal for quantumcomputation.

Physical Review A , 82(4):040302, 2010, arXiv:1003.0923 .[AKP06] Dorit Aharonov, Alexei Y. Kitaev, and John Preskill. Fault-Tolerant Quan-tum Computation with Long-Range Correlated Noise.

Physical ReviewLetters , 96(5):050504, 2006. 192AL97] Daniel Abrams and Seth Lloyd. Simulation of Many-Body Fermi Systems ona Universal Quantum Computer.

Physical Review Letters , 79(13):2586–2589,1997, arXiv:9703054 .[AMMR12] Matthew Amy, Dmitri Maslov, Michele Mosca, and Martin Roetteler. Ameet-in-the-middle algorithm for fast synthesis of depth-optimal quantumcircuits. 2012, arXiv:1206.0758 .[Amy13] Matthew Amy.

Algorithms for the Optimization of Quantum Circuits . Mas-ter’s thesis, University of Waterloo, 2013.[AOK +

10] Janet Anders, Daniel Kuan Li Oi, Elham Kasheﬁ, Dan E. Browne, andErika Andersson. Ancilla-Driven Universal Quantum Computation.

PhysicalReview A , 82:020301, 2010, arXiv:0911.3783 .[AP08] Panos Aliferis and John Preskill. Fault-tolerant quantum computation againstbiased noise.

Physical Review A , 78:052331, 2008, arXiv:0710.1301 .[AP09] Panos Aliferis and John Preskill. Fibonacci scheme for fault-tolerant quantumcomputation.

Physical Review A , 79:12332, 2009, arXiv:0809.5063 .[Bac06] Dave Bacon. Operator quantum error-correcting subsystems for self-correctingquantum memories.

Physical Review A , 73:12340, 2006, arXiv:0506023 .[Ban98] Masashi Ban. Photon-echo technique for reducing the decoherence of aquantum bit.

Journal of Modern Optics , 45(11):2315–2325, 1998.[BBC +

93] Charles H. Bennett, Gilles Brassard, Claude Cr´epeau, Richard Jozsa, AsherPeres, and William K. Wootters. Teleporting an unknown quantum statevia dual classical and Einstein-Podolsky-Rosen channels.

Physical ReviewLetters , 70(13):1895–1899, 1993.[BBK03] Adel Bririd, Simon C. Benjamin, and Alastair Kay. Quantum error correctionin globally controlled arrays. 2003, arXiv:0308113 .[BCHMD13] Hector Bombin, Ravindra W. Chhajlany, Micha(cid:32)l Horodecki, and Miguel-Angel Martin-Delgado. Self-correcting quantum computers.

New Journal ofPhysics , 15(5):055023, 2013, arXiv:0907.5228 .193BCL +

06] Harry Buhrman, Richard Cleve, Monique Laurent, Noah Linden, Alexan-der Schrijver, and Falk Unger. New Limits on Fault-Tolerant Quan-tum Computation. In , pages 411–419. IEEE, 2006, arXiv:/arxiv.org/abs/quant-ph/0604141 .[BDSW96] Charles H. Bennett, David P. DiVincenzo, John A. Smolin, and William K.Wootters. Mixed State Entanglement and Quantum Error Correction.

Phys-ical Review A , 54:3824, 1996, arXiv:9604024 .[BGS13] Alex Bocharov, Yuri Gurevich, and Krysta M. Svore. Eﬃcient Decompositionof Single-Qubit Gates into V Basis Circuits.

Physical Review A , 88:012313,2013, arXiv:1303.1411 .[BH12] Sergey Bravyi and Jeongwan Haah. Magic-state distillation with low overhead.

Physical Review A , 86:052329, 2012, arXiv:1209.2426 .[BHMT00] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. QuantumAmplitude Ampliﬁcation and Estimation. 2000, arXiv:0005055 .[BK98] Sergey Bravyi and Alexei Y. Kitaev. Quantum codes on a lattice withboundary. 1998, arXiv:9811052 .[BK05] Sergey Bravyi and Alexei Y. Kitaev. Universal quantum computation withideal Cliﬀord gates and noisy ancillas.

Physical Review A , 71:022316, 2005, arXiv:0403025 .[BK09] Anne Broadbent and Elham Kasheﬁ. Parallelizing quantum circuits.

Theo-retical Computer Science , 410(26):2489–2510, 2009, arXiv:0704.1736 .[BK12] Sergey Bravyi and Robert K¨onig. Classiﬁcation of topologically protectedgates for local stabilizer codes.

Physical Review Letters , 110:170503, 2012, arXiv:1206.1609 .[Ble] Blender, .[BM12] Koichi Betsumiya and Akihiro Munemasa. On triply even binary codes.

Jour-nal of the London Mathematical Society , 86(1):1–16, 2012, arXiv:1012.4134 .[BMD07] Hector Bombin and Miguel-Angel Martin-Delgado. Topological Computationwithout Braiding.

Physical Review Letters , 98:160502, 2007, arXiv:0610024 .194Bow02] Garry Bowen. Entanglement required in achieving entanglement-assistedchannel capacities.

Physical Review A , 66(5):052313, 2002, arXiv:0205117 .[BP12] Peter Brooks and John Preskill. Fault-tolerant quantum computationwith asymmetric Bacon-Shor codes.

Physical Review A , 87:032310, 2012, arXiv:1211.1400 .[BPF +

02] Nicolas Boulant, Marco A. Pravia, Evan M. Fortunato, Timothy F. Havel, andDavid G. Cory. Experimental Concatenation of Quantum Error Correctionwith Decoupling.

Quantum Information Processing , 1(1-2):135–144, 2002.[Bro13] Peter Brooks.

Quantum error correction with biased noise . PhD thesis,Caltech, 2013.[BS12] Alex Bocharov and Krysta M. Svore. A Depth-Optimal Canonical Form forSingle-qubit Quantum Circuits.

Physical Review Letters , 109:19050, 2012, arXiv:1206.3223 .[BT98] Bruce M. Boghosian and Washington Taylor. Simulating quantum mechanicson a quantum computer.

Physica D: Nonlinear Phenomena , 120(1-2):30–42,1998.[BT11] Avraham Ben-Aroya and Amnon Ta-Shma. Approximate quantum errorcorrection for correlated noise.

IEEE Transactions on Information Theory ,57:3982–3988, 2011, arXiv:0909.1466 .[BVFC05] Nicolas Boulant, Lorenza Viola, Evan Fortunato, and David G. Cory. Exper-imental Implementation of a Concatenated Quantum Error-Correcting Code.

Physical Review Letters , 94(13):130501, 2005, arXiv:0409193 .[CAB12] Earl T. Campbell, Hussain Anwar, and Dan E. Browne. Magic state distil-lation in all prime dimensions using quantum Reed-Muller codes.

PhysicalReview X , 2:041021, 2012, arXiv:1205.3104 .[CCC +

08] Xie Chen, Hyeyoun Chung, Andrew W. Cross, Bei Zeng, and Isaac L.Chuang. Subsystem stabilizer codes cannot have a universal set of transversalgates for even one encoded qudit.

Physical Review A , 78(1):012353, 2008, arXiv:0801.2360 .[CDKM04] Steven A. Cuccaro, Thomas G. Draper, Samuel A. Kutin, and David P.Moulton. A new quantum ripple-carry addition circuit. 2004, arXiv:0410184 .195CDT09] Andrew W. Cross, David P. DiVincenzo, and Barbara M. Terhal. A com-parative code study for quantum fault-tolerance.

Quantum Information andComputation , 9(7&8):541–572, 2009, arXiv:0711.1156 .[CGC +

12] Jerry M. Chow, Jay M. Gambetta, Antonio D. Corcoles, Seth T. Merkel,John A. Smolin, Chad Rigetti, S. Poletto, George A. Keefe, Mary B. Rothwell,John R. Rozen, Mark B. Ketchen, and Matthias Steﬀen. Complete universalquantum gate set approaching fault-tolerant thresholds with superconductingqubits.

Physical Review Letters , 109:060501, 2012, arXiv:1202.5344 .[CK13] Andrew M. Childs and Robin Kothari. In preparation. 2013.[CLS +

04] John Chiaverini, D Leibfried, Tobias Schaetz, Murray D Barrett, R BBlakestad, Joseph W. Britton, Wayne M. Itano, Juergen D. Jost, EmanuelKnill, C. E. Langer, Roee Ozeri, and David J. Wineland. Realization ofquantum error correction.

Nature , 432(7017):602–5, 2004.[CPM +

98] David G. Cory, Mark Price, W. Maas, Emanuel Knill, Raymond Laﬂamme,Wojciech H. Zurek, Timothy F. Havel, and Shyamal S. Somaroo. ExperimentalQuantum Error Correction.

Physical Review Letters , 81(10):2152–2155, 1998, arXiv:9802018 .[CRSS97] A. Robert Calderbank, Eric M. Rains, Peter W. Shor, and Neil J.A. Sloane.Quantum Error Correction and Orthogonal Geometry.

Physical ReviewLetters , 78:405–409, 1997, arXiv:9605005 .[CS96] A. Robert Calderbank and Peter W. Shor. Good quantum error-correctingcodes exist.

Physical Review A , 54(2):1098–1105, 1996, arXiv:9512032 .[CSSZ09] Andrew W. Cross, Graeme Smith, John A. Smolin, and Bei Zeng. CodewordStabilized Quantum Codes.

IEEE Transactions on Information Theory ,55(1):433–438, 2009, arXiv:0708.1021 .[CvD10] Andrew M. Childs and Wim van Dam. Quantum algorithms for algebraicproblems.

Reviews of Modern Physics , 82(1):1–52, 2010, arXiv:0812.0380 .[CW00] Richard Cleve and John Watrous. Fast parallel circuits for the quantumFourier transform.

Foundations of Computer Science, 2000. Proceedings. 41stAnnual Symposium on , pages 526–536, 2000, arXiv:0006004 .196DA07] David P. DiVincenzo and Panos Aliferis. Eﬀective fault-tolerant quantumcomputation with slow measurements.

Physical Review Letters , 98:20501,2007, arXiv:0607047 .[DFH04] Simon J. Devitt, Austin G. Fowler, and Lloyd C. L. Hollenberg. Implementa-tion of Shor’s algorithm on a linear nearest neighbour qubit array.

QuantumInformation and Computation , 4(4):237–251, 2004, arXiv:0402196 .[DFN05] Sankar Das Sarma, Michael H. Freedman, and Chetan Nayak. TopologicallyProtected Qubits from a Possible Non-Abelian Fractional Quantum HallState.

Physical Review Letters, 94(16):166802, 2005, arXiv:0412343.

[DFS+09] Simon J. Devitt, Austin G. Fowler, Ashley M. Stephens, Andrew D. Greentree, Lloyd C. L. Hollenberg, William J. Munro, and Kae Nemoto. Architectural design for a topological cluster state quantum computer.
New Journal of Physics, 11(8):083032, 2009, arXiv:0808.1782.

[DFT+10] Simon J. Devitt, Austin G. Fowler, Todd Tilma, William J. Munro, and Kae Nemoto. Classical Processing Requirements For A Topological Quantum Computing System.
International Journal of Quantum Information, 08:1–27, 2010, arXiv:0906.0415.

[DG97] Lu-Ming Duan and Guang-Can Guo. Preserving Coherence in Quantum Computation by Pairing Quantum Bits.
Physical Review Letters, 79(10):1953–1956, 1997, arXiv:9703040.

[DHN06] Christopher M. Dawson, Henry Haselgrove, and Michael A. Nielsen. Noise Thresholds for Optical Quantum Computers.
Physical Review Letters, 96(2):4, 2006, arXiv:0509060.

[DiV95] David P. DiVincenzo. Two-bit gates are universal for quantum computation.
Physical Review A, 51:1015–1022, 1995, arXiv:9407022.

[DiV09] David P. DiVincenzo. Fault-tolerant architectures for superconducting qubits.
Physica Scripta, T137:014020, 2009, arXiv:0905.4839.

[DKLP02] Eric Dennis, Alexei Y. Kitaev, Andrew J. Landahl, and John Preskill. Topological quantum memory.
Journal of Mathematical Physics, 42(9), 2002, arXiv:0110143.

[DLT02] David P. DiVincenzo, Debbie W. Leung, and Barbara M. Terhal. Quantum data hiding.
IEEE Transactions on Information Theory, 48(3):580–598, 2002, arXiv:0103098.

[DMN11] Simon J. Devitt, William J. Munro, and Kae Nemoto. High Performance Quantum Computing.
Progress in Informatics, 8:1–7, 2011, arXiv:0810.2444.

[DN05] Christopher M. Dawson and Michael A. Nielsen. The Solovay-Kitaev algorithm.
Quantum Information and Computation, 6(1):81–95, 2005, arXiv:0505030.

[DS96] David P. DiVincenzo and Peter W. Shor. Fault-tolerant error correction with efficient quantum codes.
Physical Review Letters, 77:3260–3263, 1996, arXiv:9605031.

[DS12] Guillaume Duclos-Cianci and Krysta M. Svore. A State Distillation Protocol to Implement Arbitrary Single-qubit Rotations.
page 10, 2012, arXiv:1210.1980.

[dSPK13] Raphael Dias da Silva, Einar Pius, and Elham Kashefi. Global Quantum Circuit Optimization.
2013, arXiv:1301.0351.

[Eas13] Bryan Eastin. Distilling one-qubit magic states into Toffoli states.
Physical Review A, 87:032321, 2013, arXiv:1212.4872.

[EK09] Bryan Eastin and Emanuel Knill. Restrictions on Transversal Encoded Quantum Gate Sets.
Physical Review Letters, 102:11050, 2009, arXiv:0811.4262.

[FD12] Austin G. Fowler and Simon J. Devitt. A bridge to lower overhead quantum computation.
2012, arXiv:1209.0510.

[FDJ13] Austin G. Fowler, Simon J. Devitt, and Cody Jones. Surface code implementation of block code state distillation.
Scientific Reports, 3(1939), 2013, arXiv:1301.7107.

[Fey82] Richard P. Feynman. Simulating Physics with Computers.
International Journal of Theoretical Physics, 21(6–7):467–488, 1982.

[FH04] Austin G. Fowler and Lloyd C. L. Hollenberg. Scalability of Shor’s algorithm with a limited set of rotation gates.
Physical Review A, 70:32329, 2004, arXiv:0306018.

[FHH04] Austin G. Fowler, Charles D. Hill, and Lloyd C. L. Hollenberg. Quantum-error correction on linear-nearest-neighbor qubit arrays.
Physical Review A, 69:42314, 2004, arXiv:0311116.

[FLW02a] Michael H. Freedman, Michael J. Larsen, and Zhenghan Wang. A modular functor which is universal for quantum computation.
Communications in Mathematical Physics, 227:605–622, 2002, arXiv:0001108.

[FLW02b] Michael H. Freedman, Michael J. Larsen, and Zhenghan Wang. The Two-Eigenvalue Problem and Density of Jones Representation of Braid Groups.
Communications in Mathematical Physics, 228(1):177–199, 2002, arXiv:0103200.

[FMMC12] Austin G. Fowler, Matteo Mariantoni, John M. Martinis, and Andrew N. Cleland. A primer on surface codes: Developing a machine language for a quantum computer.
2012, arXiv:1208.0928.

[Fow11] Austin G. Fowler. Constructing arbitrary Steane code single logical qubit fault-tolerant gates.
Quantum Information and Computation, 11:867–873, 2011, arXiv:0411206.

[Fow12a] Austin G. Fowler. Low-overhead surface code logical H.
2012, arXiv:1202.2639.

[Fow12b] Austin G. Fowler. Proof of finite surface code threshold for matching.
Physical Review Letters, 109:180502, 2012, arXiv:1206.0800.

[Fow12c] Austin G. Fowler. Time-optimal quantum computation.
2012, arXiv:1210.4626.

[Fow13a] Austin G. Fowler. Coping with qubit leakage in topological codes.
2013, arXiv:1308.6642.

[Fow13b] Austin G. Fowler. Minimum weight perfect matching in O(1) parallel time.
page 7, 2013, arXiv:1307.1740.

[Fow13c] Austin G. Fowler. Optimal complexity correction of correlated errors in the surface code.
2013, arXiv:1310.0863.

[FSB+12] Arkady Fedorov, Lars Steffen, Matthias Baur, Marcus P. da Silva, and Andreas Wallraff. Implementation of a Toffoli gate with superconducting circuits.
Nature, 481(7380):170–2, 2012, arXiv:1108.3966.

[FSG09] Austin G. Fowler, Ashley M. Stephens, and Peter Groszkowski. High-threshold universal quantum computation on the surface code.
Physical Review A, 80(5), 2009, arXiv:0803.0272.

[FT07] Joseph F. Fitzsimons and Jason Twamley. Globally controlled fault tolerant quantum computation.
2007, arXiv:0707.1119.

[FT09] Joseph F. Fitzsimons and Jason Twamley. Quantum Fault Tolerance in Systems with Restricted Control.
Electronic Notes in Theoretical Computer Science, 258(2):35–49, 2009.

[FWH12] Austin G. Fowler, Adam C. Whiteside, and Lloyd C. L. Hollenberg. Towards practical classical processing for the surface code.
Physical Review Letters, 108:180501, 2012, arXiv:1110.5133.

[FWMR12] Austin G. Fowler, Adam C. Whiteside, Angus L. McInnes, and Alimohammad Rabbani. Topological code Autotune.
Physical Review X, 2:041003, 2012, arXiv:1202.6111.

[FY10] Keisuke Fujii and Katsuji Yamamoto. Topological one-way quantum computation on verified logical cluster states.
Physical Review A, 82(6):4, 2010, arXiv:1008.2048.

[Gan99] Xiao Gang. PermGroup.
1999, http://wims.unice.fr/wims/en_tool~algebra~permgroup.en.phtml

[GC99] Daniel Gottesman and Isaac L. Chuang. Demonstrating the viability of universal quantum computation using teleportation and single-qubit operations.
Nature, 402:390–393, 1999, arXiv:9908010.

[GFG12] Joydip Ghosh, Austin G. Fowler, and MR Geller. Surface code with decoherence: An analysis of three superconducting architectures.
Physical Review A, 2012, arXiv:1210.5799.

[GFMG13] Joydip Ghosh, Austin G. Fowler, John M. Martinis, and Michael R. Geller. Leakage and paralysis in ancilla-assisted qubit measurement: Consequences for topological error correction in superconducting architectures.
2013, arXiv:1306.0925.

[GGZ13] Joydip Ghosh, Andrei Galiautdinov, and Zhongyuan Zhou. High-fidelity CZ gate for resonator-based superconducting quantum computers.
Physical Review A, 87:022309, 2013, arXiv:1301.1719.

[GKMR13] David Gosset, Vadym Kliuchnikov, Michele Mosca, and Vincent Russo. An algorithm for the T-count.
2013, arXiv:1308.4134.

[GN13] David Gosset and Daniel Nagaj. Quantum 3-SAT is QMA1-complete.
2013, arXiv:1302.0290.

[Got96a] Daniel Gottesman. A Class of Quantum Error-Correcting Codes Saturating the Quantum Hamming Bound.
Physical Review A, 54:1862–1868, 1996, arXiv:9604038.

[Got96b] Daniel Gottesman. Pasting Quantum Codes.
1996, arXiv:9607027.

[Got97] Daniel Gottesman.
Stabilizer Codes and Quantum Error Correction. PhD thesis, Caltech, 1997, arXiv:9705052.

[Got98] Daniel Gottesman. Theory of fault-tolerant quantum computation.
Physical Review A, 57(1):127–137, 1998, arXiv:9702029.

[Got99] Daniel Gottesman. The Heisenberg Representation of Quantum Computers.
In S. P. Corney, R. Delbourgo, and P. D. Jarvis, editors, Proceedings of the XXII International Colloquium on Group Theoretical Methods in Physics, pages 32–43. International Press, 1999, arXiv:9807006.

[Got00] Daniel Gottesman. Fault-Tolerant Quantum Computation with Local Gates.
Journal of Modern Optics, 47:333–345, 2000, arXiv:9903099.

[Got13] Daniel Gottesman. What is the Overhead Required for Fault-Tolerant Quantum Computation?
2013, arXiv:1310.2984.

[GS12] Brett Giles and Peter Selinger. Exact synthesis of multi-qubit Clifford+T circuits.
Physical Review A, 87, 032332, 2012, arXiv:1212.0506.

[Haa11] Jeongwan Haah. Local stabilizer codes in three dimensions without string logical operators.
Physical Review A, 83:042330, 2011, arXiv:1101.1962.

[Hal07] Sean Hallgren. Polynomial-time quantum algorithms for Pell’s equation and the principal ideal problem.
Journal of the ACM, 54(1), 2007.

[Har04] Jim Harrington.
Analysis of quantum error-correcting codes: symplectic lattice codes and toric codes. PhD thesis, Caltech, 2004.

[HFDV12] Clare Horsman, Austin G. Fowler, Simon J. Devitt, and Rodney Van Meter. Surface code quantum computing by lattice surgery.
New Journal of Physics, 14:123011, 2012, arXiv:1111.4022.

[HHL09] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for solving linear systems of equations.
Physical Review Letters, 103:150502, 2009, arXiv:0811.3171.

[HHO+13] Anna Y. Herr, Quentin P. Herr, Oliver T. Oberg, Ofer Naaman, John X. Przybysz, Pavel Borodulin, and Steven B. Shauck. An 8-bit carry look-ahead adder with 150 ps latency and sub-microwatt power dissipation at 10GHz.
Journal of Applied Physics, 113(3):033911, 2013.

[HHOI11] Quentin P. Herr, Anna Y. Herr, Oliver T. Oberg, and Alexander G. Ioannidis. Ultra-low-power superconductor logic.
Journal of Applied Physics, 109:103903, 2011, arXiv:1103.4269.

[HLL88] Tsai-Ming Hsieh, Hon Wai Leong, and Chang Liu. Two-dimensional layout compaction by simulated annealing.
In IEEE International Symposium on Circuits and Systems, pages 2439–2443. IEEE, 1988.

[HN03] Aram W. Harrow and Michael A. Nielsen. How robust is a quantum gate in the presence of noise?
Physical Review A, 68:012308, 2003, arXiv:0301108.

[HRM13] D. Scott Holmes, Andrew L. Ripple, and Marc A. Manheimer. Energy-Efficient Superconducting Computing: Power Budgets and Requirements.
IEEE Transactions on Applied Superconductivity, 23(3):1701610–1701610, 2013.

[IWPK08] Nemanja Isailovic, Mark Whitney, Yatish Patel, and John Kubiatowicz. Running a Quantum Circuit at the Speed of Data.
pages 177–188, 2008, arXiv:0804.4725.

[JL13] Tomas Jochym-O’Connor and Raymond Laflamme. Using concatenated quantum codes for universal fault-tolerant quantum gates.
2013, arXiv:1309.3310.

[Jon12] Cody Jones. Multilevel distillation of magic states for quantum computing.
2012, arXiv:1210.3388.

[Jon13a] Cody Jones. Composite Toffoli gate with two-round error detection.
Physical Review A, 87, 052334, 2013, arXiv:1303.6971.

[Jon13b] Cody Jones. Distillation protocols for Fourier states in quantum computing.
2013, arXiv:1303.3066.

[Jon13c] Cody Jones.
Logic synthesis for fault-tolerant quantum computers. PhD thesis, Stanford University, 2013, arXiv:1310.7290.

[Jon13d] Cody Jones. Low-overhead constructions for the fault-tolerant Toffoli gate.
Physical Review A, 87, 022328, 2013, arXiv:1212.5069.

[JVF+12] Cody Jones, Rodney Van Meter, Austin G. Fowler, Peter L. McMahon, Jungsang Kim, Thaddeus D. Ladd, and Yoshihisa Yamamoto. Layered Architecture for Quantum Computing.
Physical Review X, 2(3):031007, 2012, arXiv:1010.5022.

[JW06] Dominik Janzing and Pawel Wocjan. Estimating diagonal entries of powers of sparse symmetric matrices is BQP-complete.
2006, arXiv:0606229.

[JWM+12] Cody Jones, James D. Whitfield, Peter L. McMahon, Man-Hong Yung, Rodney Van Meter, Alán Aspuru-Guzik, and Yoshihisa Yamamoto. Simulating chemistry efficiently on fault-tolerant quantum computers.
New Journal of Physics, 14, 115023, 2012, arXiv:1204.0567.

[JYHL12] Tomas Jochym-O’Connor, Yafei Yu, Bassam Helou, and Raymond Laflamme. The robustness of magic state distillation against errors in Clifford gates.
2012, arXiv:1205.6715.

[Kal11] Gil Kalai. How Quantum Computers Fail: Quantum Codes, Correlations in Physical Systems, and Noise Accumulation.
2011, arXiv:1106.0485.

[Kau01] Louis H. Kauffman.
Knots and physics. World Scientific, Teaneck, NJ, 2001.

[Kay05] Alastair Kay. Error Correcting the Control Unit in Global Control Schemes.
2005, arXiv:0504197.

[Kay07] Alastair Kay. Deriving a Fault-Tolerant Threshold for a Global Control Scheme.
2007, arXiv:0702239.

[Ked06] Kiran S. Kedlaya. Quantum computation of zeta functions of curves.
Computational Complexity, 15:1–19, 2006, arXiv:0411623.

[Kim12] Isaac H. Kim. 3D local qupit quantum code without string logical operator.
2012, arXiv:1202.0052.

[Kit97] Alexei Y. Kitaev. Quantum computations: algorithms and error correction.
Russian Mathematical Surveys, 52(6):1191–1249, 1997.

[Kit03] Alexei Y. Kitaev. Fault-tolerant quantum computation by anyons.
Annals of Physics, 303(1):2–30, 2003, arXiv:9707021.

[KK09] Jungsang Kim and Changsoon Kim. Integrated Optical Approach to Trapped Ion Quantum Computation.
Quantum Information and Computation, 9:181–202, 2009, arXiv:0711.3866.

[KL96] Emanuel Knill and Raymond Laflamme. Concatenated quantum codes.
1996, arXiv:9608012.

[Kli13] Vadym Kliuchnikov. Synthesis of unitaries with Clifford+T circuits.
2013, arXiv:1306.3200.

[KLM07] Phillip Kaye, Raymond Laflamme, and Michele Mosca.
An Introduction to Quantum Computing. Oxford University Press, 2007.

[KLMN01] Emanuel Knill, Raymond Laflamme, Rudy Martinez, and Camille Negrevergne. Benchmarking Quantum Computers: The Five-Qubit Error Correcting Code.
Physical Review Letters, 86(25):5811–5814, 2001, arXiv:0101034.

[KLV00] Emanuel Knill, Raymond Laflamme, and Lorenza Viola. Theory of Quantum Error Correction for General Noise.
Physical Review Letters, 84(11):2525–2528, 2000, arXiv:9604034.

[KLZ96] Emanuel Knill, Raymond Laflamme, and Wojciech H. Zurek. Threshold Accuracy for Quantum Computation.
1996, arXiv:9610011.

[KMM12a] Vadym Kliuchnikov, Dmitri Maslov, and Michele Mosca. Asymptotically optimal approximation of the single qubit unitaries by Clifford+T circuits using at most three ancillary qubits.
Physical Review Letters, 110:190502, 2012, arXiv:1212.0822.

[KMM12b] Vadym Kliuchnikov, Dmitri Maslov, and Michele Mosca. Fast and efficient exact synthesis of single qubit unitaries generated by Clifford and T gates.
Quantum Information and Computation, 13(7&8):607–630, 2012, arXiv:1206.5236.

[KMM12c] Vadym Kliuchnikov, Dmitri Maslov, and Michele Mosca. Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits.
2012, arXiv:1212.6964.

[Kni95] Emanuel Knill. Approximation by Quantum Circuits.
Technical Report LAUR-95-2225, Los Alamos National Laboratory, 1995, arXiv:9508006.

[Kni96] Emanuel Knill. Non-binary unitary error bases and quantum codes.
Technical Report LAUR-96-2717, Los Alamos National Laboratory, 1996, arXiv:9608048.

[Kni04] Emanuel Knill. Fault-Tolerant Postselected Quantum Computation: Schemes.
2004, arXiv:0402171.

[Kni05] Emanuel Knill. Quantum Computing with Very Noisy Devices.
Nature, 434(7029):39–44, 2005, arXiv:0410199.

[KOB+09] Elham Kashefi, Daniel Kuan Li Oi, Daniel E. Browne, Janet Anders, and Erika Andersson. Twisted graph states for ancilla-driven quantum computation.
Proc. 25th Conference on the Mathematical Foundations of Programming Semantics (MFPS 25), ENTCS, 249:307–331, 2009, arXiv:0905.3354.

[KRUdW10] Julia Kempe, Oded Regev, Falk Unger, and Ronald de Wolf. Upper Bounds on the Noise Threshold for Fault-tolerant Quantum Computing.
Quantum Information and Computation, 10(5&6):0361–0376, 2010, arXiv:0802.1464.

[KSV02] Alexei Y. Kitaev, Alexander H. Shen, and Mikhail N. Vyalyi.
Classical and Quantum Computation. American Mathematical Society, Providence, RI, 2002.

[KW11] Ivan Kassal and JD Whitfield. Simulating chemistry using quantum computers.
Annual Review of Physical Chemistry, 62:185–207, 2011, arXiv:1007.2648.

[LAR11] Andrew J. Landahl, Jonas T. Anderson, and Patrick R. Rice. Fault-tolerant quantum computing with color codes.
page 28, 2011, arXiv:1108.5738.

[LBK04] Yuan Liang Lim, Almut Beige, and Leong Chuan Kwek. Repeat-Until-Success Quantum Computing.
Physical Review Letters, 95, 030505, 2004, arXiv:0408043.

[LBKW01] Daniel A. Lidar, Dave Bacon, Julia Kempe, and K. Birgitta Whaley. Decoherence-free subspaces for multiple-qubit errors. II. Universal, fault-tolerant quantum computation.
Physical Review A, 63(2):022307, 2001.

[LBW99] Daniel A. Lidar, Dave Bacon, and K. Birgitta Whaley. Concatenating Decoherence-Free Subspaces with Quantum Error Correcting Codes.
Physical Review Letters, 82(22):4556–4559, 1999, arXiv:9809081.

[LC13] Andrew J. Landahl and Chris Cesare. Complex instruction set computing architecture for performing accurate quantum Z rotations with less magic.
2013, arXiv:1302.3240.

[LCW98] Daniel A. Lidar, Isaac L. Chuang, and K. Birgitta Whaley. Decoherence-Free Subspaces for Quantum Computation.
Physical Review Letters, 81(12):2594–2597, 1998, arXiv:9807004.

[LJL+10] Thaddeus D. Ladd, Fedor Jelezko, Raymond Laflamme, Yasunobu Nakamura, Christopher Monroe, and Jeremy L. O’Brien. Quantum Computing.
Nature, 464(7285):45–53, 2010, arXiv:1009.2267.

[LK12] Igor Lesanovsky and Hosho Katsura. Interacting Fibonacci anyons in a Rydberg gas.
Physical Review A, 86(4):041601, 2012, arXiv:1204.0903.

[LNCY97] Debbie W. Leung, Michael A. Nielsen, Isaac L. Chuang, and Yoshihisa Yamamoto. Approximate quantum error correction can lead to better codes.
Physical Review A, 56:2567–2573, 1997, arXiv:9704002.

[LPSB13] Ching-Yi Lai, Gerardo Paz, Martin Suchara, and Todd A. Brun. Performance and Error Analysis of Knill’s Postselection Scheme in a Two-Dimensional Architecture.
2013, arXiv:1305.5657.

[LVZ+99] Debbie W. Leung, Lieven Vandersypen, Xinlan Zhou, Mark Sherwood, Constantino Yannoni, Mark Kubinec, and Isaac L. Chuang. Experimental realization of a two-bit phase damping quantum code.
Physical Review A, 60(3):1924–1943, 1999, arXiv:9811068.

[LW83] Yuh-Zen Liao and Chak-Kuen Wong. An Algorithm to Compact a VLSI Symbolic Layout with Mixed Constraints.
pages 107–112. IEEE, 1983.

[LYGG08] Shiang Looi, Li Yu, Vlad Gheorghiu, and Robert Griffiths. Quantum-error-correcting codes using qudit graph states.
Physical Review A, 78(4):042303, 2008, arXiv:0712.1979.

[MBRL11] Osama Moussa, Jonathan Baugh, Colm A. Ryan, and Raymond Laflamme. Demonstration of sufficient control for two rounds of quantum error correction in a solid state ensemble quantum information processor.
Physical Review Letters, 107:160501, 2011, arXiv:1108.4842.

[MDMN08] Dmitri Maslov, Gerhard W. Dueck, D. Michael Miller, and Camille Negrevergne. Quantum Circuit Simplification and Level Compaction.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(3):436–444, 2008, arXiv:0604001.

[MEK13] Adam M. Meier, Bryan Eastin, and Emanuel Knill. Magic-state distillation with the four-qubit code.
Quantum Information and Computation, 13(3&4):195–209, 2013, arXiv:1204.4221.

[MF12] Thomas J. Milburn and Austin G. Fowler. Checking the error correction strength of arbitrary surface code logical gates.
2012, arXiv:1210.4249.

[Mic12] Kamil Michnicki. 3-d quantum stabilizer codes with a power law energy barrier.
2012, arXiv:1208.3496.

[MKH+08] Thomas Monz, Kihwan Kim, Wolfgang Hänsel, M. Riebe, Alessandro Villar, Philipp Schindler, Michael Chwalla, Markus Hennrich, and Rainer Blatt. Realization of the quantum Toffoli gate with trapped ions.
Physical Review Letters, 102(4):11, 2008, arXiv:0804.0082.

[MN01] Cristopher Moore and Martin Nilsson. Parallel quantum computation and quantum codes.
SIAM Journal on Computing, 2001, arXiv:9808027.

[Mos08] Michele Mosca. Quantum Algorithms.
2008, arXiv:0808.0369.

[MPGC13] Easwar Magesan, Daniel Puzzuoli, Christopher E. Granade, and David G. Cory. Modeling quantum noise for efficient testing of fault-tolerant circuits.
Physical Review A, 87:012324, 2013, arXiv:1206.5407.

[MS93] Florence J. MacWilliams and Neil J. A. Sloane.
The Theory of Error-Correcting Codes. North-Holland, 1993.

[MSB+11] Thomas Monz, Philipp Schindler, Julio T. Barreiro, Michael Chwalla, Daniel Nigg, William A Coish, M Harlander, Wolfgang Hänsel, Markus Hennrich, and Rainer Blatt. 14-Qubit Entanglement: Creation and Coherence.
Physical Review Letters, 106:130506, 2011, arXiv:1009.6126.

[MTC+05] Tzvetan Metodi, Darshan Thaker, Andrew W. Cross, Fred Chong, and Isaac L. Chuang. A quantum logic array microarchitecture: Scalable quantum data movement and computation.
2005, arXiv:0509051.

[Muk11] Oleg A. Mukhanov. Energy-Efficient Single Flux Quantum Technology.
IEEE Transactions on Applied Superconductivity, 21(3):760–769, 2011.

[MWY+11] Matteo Mariantoni, Haiyan Wang, Takashi Yamamoto, Matthew Neeley, Radoslaw C. Bialczak, Yu Chen, Mike Lenander, Erik Lucero, Aaron D. O’Connell, Daniel Sank, Martin Weides, Jim Wenner, Yi Yin, Jian Zhao, Alexander N. Korotkov, Andrew N. Cleland, and John M Martinis. Implementing the quantum von Neumann architecture with superconducting circuits.
Science, 334(6052):61–5, 2011, arXiv:1109.3743.

[N+] Shota Nagayama et al.
In preparation.

[NC00] Michael A. Nielsen and Isaac L. Chuang.
Quantum Computation and Quantum Information. Cambridge University Press, 2000.

[NLP11] Hui Khoon Ng, Daniel A. Lidar, and John Preskill. Combining dynamical decoupling with fault-tolerant quantum computation.
Physical Review A, 84(1):012305, 2011, arXiv:0911.3202.

[NP09] Hui Khoon Ng and John Preskill. Fault-tolerant quantum computation versus Gaussian noise.
Physical Review A, 79(3):30, 2009, arXiv:0810.4953.

[NSS+08] Chetan Nayak, Steven H. Simon, Ady Stern, Michael H. Freedman, and Sankar Das Sarma. Non-Abelian anyons and topological quantum computation.
Reviews of Modern Physics, 80(3):1083–1159, 2008, arXiv:0707.1889.

[OV10] Carlo Ottaviani and David Vitali. Implementation of a three-qubit quantum error-correction code in a cavity-QED setup.
Physical Review A, 82(1):012319, 2010, arXiv:1005.3072.

[PBH98] Vera Pless, Richard A. Brualdi, and W. C. Huffman.
Handbook of Coding Theory. Elsevier Science Inc., New York, NY, USA, 1998.

[PF13] Adam Paetznick and Austin G. Fowler. Quantum circuit optimization by topological compaction in the surface code.
2013, arXiv:1304.2807.

[PJF05] T. Pittman, B. Jacobs, and J. D. Franson. Demonstration of quantum error correction using linear optics.
Physical Review A, 71(5):052332, 2005, arXiv:0502042.

[PMH03] Ketan N. Patel, Igor L. Markov, and John P. Hayes. Efficient Synthesis of Linear Reversible Circuits.
2003, arXiv:0302002.

[PR] Adam Paetznick and Ben W. Reichardt. qfault: Python modules for counting malignant sets of locations in fault-tolerant quantum circuits.
http://code.google.com/p/qfault/

[PR12] Adam Paetznick and Ben W. Reichardt. Fault-tolerant ancilla preparation and noise threshold lower bounds for the 23-qubit Golay code.
Quantum Information and Computation, 12(11&12):1034–1080, 2012, arXiv:1106.2190.

[PR13] Adam Paetznick and Ben W. Reichardt. Universal fault-tolerant quantum computation with only transversal gates and error correction.
Physical Review Letters, 111, 09050, 2013, arXiv:1304.3709.

[Pre98] John Preskill. Reliable Quantum Computers.
Proceedings of the Royal Society A, 454:385–410, 1998, arXiv:9705031.

[Pre13] John Preskill. Sufficient condition on noise correlations for scalable quantum computing.
Quantum Information and Computation, 13:181–194, 2013, arXiv:1207.6131.

[PS13] Adam Paetznick and Krysta M. Svore. Repeat-Until-Success: Non-deterministic decomposition of single-qubit unitaries.
2013, arXiv:1311.1074.

[PSBT10a] Gerardo A. Paz-Silva, Gavin K. Brennen, and Jason Twamley. Fault tolerant Quantum Information Processing with Holographic control.
2010, arXiv:1008.1634.

[PSBT10b] Gerardo A. Paz-Silva, Gavin K. Brennen, and Jason Twamley. On fault-tolerance with noisy and slow measurements.
2010.

[PSBT11] Gerardo A. Paz-Silva, Gavin K. Brennen, and Jason Twamley. Bulk fault-tolerant quantum information processing with boundary addressability.
New Journal of Physics, 13(1):013011, 2011.

[PSE96] G. Massimo Palma, Kalle-Antti Suominen, and Artur K. Ekert. Quantum Computers and Dissipation.
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 452(1946):567–584, 1996, arXiv:9702001.

[PSL13] Gerardo A. Paz-Silva and Daniel A. Lidar. Optimally combining dynamical decoupling and quantum error correction.
Scientific Reports, 3:1530, 2013, arXiv:1206.3606.

[PV10] Martin B. Plenio and Shashank Virmani. Upper bounds on fault tolerance thresholds of noisy Clifford-based quantum computers.
New Journal of Physics, 12(3):033012, 2010, arXiv:0810.4340.

[RB01] Robert Raussendorf and Hans Briegel. Computational model underlying the one-way quantum computer.
2001, arXiv:0108067.

[RDN+12] Matthew D. Reed, Leonardo DiCarlo, Simon E. Nigg, Luyan Sun, Luigi Frunzio, Steven M. Girvin, and Robert J. Schoelkopf. Realization of three-qubit quantum error correction with superconducting circuits.
Nature, 482(7385):382–5, 2012, arXiv:1109.4948.

[Rei04] Ben W. Reichardt. Improved ancilla preparation scheme increases fault-tolerant threshold.
2004, arXiv:0406025.

[Rei05] Ben W. Reichardt. Quantum Universality from Magic States Distillation Applied to CSS Codes.
Quantum Information Processing, 4:251, 2005, arXiv:0608085.

[Rei06a] Ben W. Reichardt.
Error-detection-based quantum fault tolerance against discrete Pauli noise. PhD thesis, UC Berkeley, 2006, arXiv:0612004.

[Rei06b] Ben W. Reichardt. Fault-tolerance threshold for a distance-three quantum code.
Lecture Notes in Computer Science, 4051:50–61, 2006, arXiv:0509203.

[Rei07] Ben W. Reichardt. Error-Detection-Based Quantum Fault-Tolerance Threshold.
Algorithmica, 55(3):517–556, 2007.

[RH07] Robert Raussendorf and Jim Harrington. Fault-Tolerant Quantum Computation with High Threshold in Two Dimensions.
Physical Review Letters, 98, 190504, 2007, arXiv:0610082.

[RHG06] Robert Raussendorf, Jim Harrington, and Kovid Goyal. A fault-tolerant one-way quantum computer.
Annals of Physics, 321(9):2242–2270, 2006, arXiv:0510135.

[RHG07] Robert Raussendorf, Jim Harrington, and Kovid Goyal. Topological fault-tolerance in cluster state quantum computation.
New Journal of Physics, 9(6):199–199, 2007, arXiv:0703143.

[SBM+11] Philipp Schindler, Julio T. Barreiro, Thomas Monz, Volckmar Nebendahl, Daniel Nigg, Michael Chwalla, Markus Hennrich, and Rainer Blatt. Experimental repetitive quantum error correction.
Science, 332(6033):1059–61, 2011.

[SCCA06] Krysta M. Svore, Andrew W. Cross, Isaac L. Chuang, and Alfred V. Aho. A flow-map model for analyzing pseudothresholds in fault-tolerant quantum computing.
Quantum Information and Computation, 6(3):193–212, 2006, arXiv:0508176.

[SDT07] Krysta M. Svore, David P. DiVincenzo, and Barbara M. Terhal. Noise Threshold for a Fault-Tolerant Two-Dimensional Lattice Architecture.
Quantum Information and Computation, 7:20, 2007, arXiv:0604090.

[SE09] Ashley M. Stephens and Zachary W. E. Evans. Accuracy threshold for concatenated error detection in one dimension.
Physical Review A, 80:22313, 2009, arXiv:0902.2658.

[Sel12] Peter Selinger. Efficient Clifford+T approximation of single-qubit operators.
2012, arXiv:1212.6253.

[Sel13] Peter Selinger. Quantum circuits of T-depth one.
Physical Review A, 87:042302, 2013, arXiv:1210.0974.

[SFH08] Ashley M. Stephens, Austin G. Fowler, and Lloyd C. L. Hollenberg. Universal fault tolerant quantum computation on bilinear nearest neighbor arrays.
Quantum Information and Computation, 8:330, 2008.

[SFR+06] Thomas Szkopek, Heng Fan, Vwani Roychowdhury, Eli Yablonovitch, P. Oscar Boykin, Geoffrey Simms, Mark Gyure, and Bryan Fong. Threshold Error Penalty for Fault Tolerant Computation with Nearest Neighbour Communication.
IEEE Transactions on Nanotechnology, 5:42–49, 2006, arXiv:0411111.

[Shi03] Yaoyun Shi. Both Toffoli and Controlled-NOT need little help to do universal quantum computation.
Quantum Information and Computation, 3(1):84–92, 2003, arXiv:0205115.

[Sho94] Peter W. Shor. Polynomial Time Algorithms for Discrete Logarithms and Factoring on a Quantum Computer.
In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 1994, arXiv:9508027.

[Sho96] Peter W. Shor. Fault-tolerant quantum computation.
Proc. 37th Annual Symp. on Foundations of Computer Science (FOCS), pages 56–65, 1996, arXiv:9605011.

[SL13] Ady Stern and Netanel H Lindner. Topological quantum computation–from basic concepts to first experiments.
Science, 339(6124):1179–84, 2013.

[SLB+11] D. Stucki, M. Legré, F. Buntschu, B. Clausen, N. Felber, N. Gisin, L. Henzen, P. Junod, G. Litzistorf, P. Monbaron, L. Monat, J.-B. Page, D. Perroud, G. Ribordy, A. Rochas, S. Robyr, J. Tavares, R. Thew, P. Trinkler, S. Ventura, R. Voirol, N. Walenta, and H. Zbinden. Long-term performance of the SwissQuantum quantum key distribution network in a field environment.
New Journal of Physics, 13(12):123001, 2011, arXiv:1203.4940.

[SLW83] Martine Schlag, Yuh-Zen Liao, and Chak-Kuen Wong. An algorithm for optimal two-dimensional compaction of VLSI layouts.
Integration, the VLSI Journal, 1(2-3):179–209, 1983.

[SM91] Khushro Shahookar and Pinaki Mazumder. VLSI cell placement techniques.
ACM Computing Surveys, 23(2):143–220, 1991.

[SMN13] Ashley M. Stephens, William J. Munro, and Kae Nemoto. High-threshold topological quantum error correction against biased noise.
2013, arXiv:1308.4776.

[SO13] Kerem Halil Shah and Daniel Kuan Li Oi. Ancilla Driven Quantum Computation with arbitrary entangling strength.
In Proc. 8th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC2013), 2013, arXiv:1303.2066.

[SR09] Federico M. Spedalieri and Vwani P. Roychowdhury. Latency in local, two-dimensional, fault-tolerant quantum computing.
Quantum Information and Computation, 9:666–682, 2009, arXiv:0805.4213.

[SSP13] Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram. Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures.
In Proceedings of the 50th Annual Design Automation Conference (DAC13), page 41, 2013.

[Sta11] Daan Staudt. The Role of Correlated Noise in Quantum Computing.
2011, arXiv:1111.1417.

[STD05] Krysta M. Svore, Barbara M. Terhal, and David P. DiVincenzo. Local fault-tolerant quantum computation.
Physical Review A, 72(2):44, 2005, arXiv:0410047.

[Ste96] Andrew M. Steane. Active stabilisation, quantum computation and quantum state synthesis.
Physical Review Letters, 78(11):2252–2255, 1996, arXiv:9611027.

[Ste02] Andrew M. Steane. Fast fault-tolerant filtering of quantum codewords.
2002, arXiv:0202036.

[Ste03] Andrew M. Steane. Overhead and noise threshold of fault-tolerant quantum error correction.
Physical Review A, 68(4):042322, 2003, arXiv:0207119.

[Ste07] Andrew M. Steane. How to build a 300 bit, 1 Giga-operation quantum computer.
Quantum Information and Computation, 7:171–183, 2007, arXiv:0412165.

[SWD10] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum circuits for linear nearest neighbor architectures.
Quantum Information Processing, 10(3):355–377, 2010, arXiv:1110.6412.

[SZRL11] Alexandre M. Souza, Jingfu Zhang, Colm A. Ryan, and Raymond Laflamme. Experimental magic state distillation for fault-tolerant quantum computing.
Nature Communications, 2:169, 2011, arXiv:1103.2178.

[TB05] Barbara M. Terhal and Guido Burkard. Fault-tolerant quantum computation for local non-Markovian noise.
Physical Review A, 71(1):19, 2005, arXiv:0402104.

[vDH09] Wim van Dam and Mark Howard. Tight Noise Thresholds for Quantum Computation with Perfect Stabilizer Operations.
Physical Review Letters, 103(17):170504, 2009, arXiv:0907.3189.

[VHP05] Shashank Virmani, Susana F. Huelga, and Martin B. Plenio. Classical simulatability, entanglement breaking, and quantum computation thresholds.
Physical Review A, 71:042328, 2005, arXiv:0408076.

[VKL99] Lorenza Viola, Emanuel Knill, and Seth Lloyd. Dynamical Decoupling of Open Quantum Systems.
Physical Review Letters, 82(12):2417–2421, 1999, arXiv:9809071.

[VLFY10] Rodney Van Meter, Thaddeus D. Ladd, Austin G. Fowler, and Yoshihisa Yamamoto. Distributed Quantum Computation Architecture Using Semiconductor Nanophotonics.
International Journal of Quantum Information, 8:295–323, 2010, arXiv:0906.2686.

[VSFM13] Mark H. Volkmann, Anubhav Sahu, Coenrad J. Fourie, and Oleg A. Mukhanov. Implementation of energy efficient single flux quantum digital circuits with sub-aJ/bit operation.
Superconductor Science and Technology, 26(1):015002, 2013, arXiv:1209.6383.

[WBCT13] Dave Wecker, Bela Bauer, Bryan Clark, and Matthias Troyer.
Private communication, 2013.

[WFH11] David S. Wang, Austin G. Fowler, and Lloyd C. L. Hollenberg. Surface code quantum computing with error rates over 1%.
Physical Review A, 83(2):020302, 2011, arXiv:1009.3686.

[WFHH10] David S. Wang, Austin G. Fowler, Charles D. Hill, and Lloyd C. L. Hollenberg. Graphical algorithms and threshold error rates for the 2d colour code.
Quantum Information and Computation, 10:780, 2010, arXiv:0907.1708.

[WFSH10] David S. Wang, Austin G. Fowler, Ashley M. Stephens, and Lloyd C. L. Hollenberg. Threshold error rates for the toric and surface codes.
Quantum Information and Computation, 10:456, 2010, arXiv:0905.0531.

[WGMAG13] Jonathan Welch, Daniel Greenbaum, Sarah Mostame, and Alán Aspuru-Guzik. Efficient Quantum Circuits for Diagonal Unitaries Without Ancillas.
2013, arXiv:1306.3991.

[WK13] Nathan Wiebe and Vadym Kliuchnikov. Floating point representations in quantum circuit synthesis.
New Journal of Physics, 15:093041, 2013, arXiv:1305.5528.

[WZ82] W. K. Wootters and Wojciech H. Zurek. A single quantum cannot be cloned.
Nature, 299:802–803, 1982.

[YGL+13] Norman Y. Yao, Zhe-Xuan Gong, Chris R. Laumann, Steven D. Bennett, L. M. Duan, Mikhail D. Lukin, Liang Jiang, and Alexey V. Gorshkov. Quantum Logic between Remote Quantum Registers.
Physical Review A, 87:022306, 2013, arXiv:1206.0014.

[Zal96] Christof Zalka. Threshold Estimate for Fault Tolerant Quantum Computing.
1996, arXiv:9612028.

[Zal98] Christof Zalka. Simulating Quantum Systems on a Quantum Computer.
Proceedings of the Royal Society A, A454:313–322, 1998, arXiv:9603026.

[ZCC11] Bei Zeng, Andrew W. Cross, and Isaac L. Chuang. Transversality versus universality for additive quantum codes.
IEEE Transactions on Information Theory, 57(9):6272–6284, 2011, arXiv:0706.1382.

[ZGML11] Jingfu Zhang, Dorian Gangloff, Osama Moussa, and Raymond Laflamme. Experimental quantum error correction with high fidelity.
Physical Review A, 84(3), 2011, arXiv:1109.4821.

[ZGZL12] Jingfu Zhang, Markus Grassl, Bei Zeng, and Raymond Laflamme. Experimental Implementation of a Codeword Stabilized Quantum Code.
Physical Review A, 85:062312, 2012, arXiv:1111.5445.

[ZLC00] Xinlan Zhou, Debbie W. Leung, and Isaac L. Chuang. Methodology for quantum logic gate construction.

Physical Review A , 62(5):17, 2000, arXiv:0002039 .[ZLS12] Jingfu Zhang, Raymond Laﬂamme, and Dieter Suter. Experimental Im-plementation of Encoded Logical Qubit Operations in a Perfect Quan-tum Error Correcting Code.

Physical Review Letters , 109:100503, 2012, arXiv:1208.4797arXiv:1208.4797