Flexible layout of surface code computations using AutoCCZ states
Craig Gidney and Austin G. Fowler
Google Inc., Santa Barbara, California 93117, USA
May 23, 2019
We construct a self-correcting CCZ state (the “AutoCCZ”) with embedded delayed choice CZs for completing gate teleportations. Using the AutoCCZ state we create efficient surface code spacetime layouts for both a depth-limited circuit (a ripple-carry addition) and a Clifford-limited circuit (a QROM read). Our layouts account for distillation and routing, are based on plausible physical assumptions for a large-scale superconducting qubit platform, and suggest that circuit-level Toffoli parallelism (e.g. using a carry-lookahead adder instead of a ripple-carry adder) will not reduce the execution time of computations involving fewer than five million physical qubits. We reduce the spacetime volume of delayed choice CZs by a factor of 4 compared to techniques from previous work (Fowler 2012), and make several improvements to the CCZ magic state factory from (Gidney 2019).
An interesting consequence of the topological nature of quantum computation in the surface code [7] is that, in spacetime diagrams describing the computation, there is very little distinction between spacelike and timelike directions. It is valid to redirect a qubit’s worldline “backwards in time” so that its next operation actually happened earlier. There is of course a forward-in-time description of the computation occurring, where a magic state representing an operation is prepared, applied to the qubit via gate teleportation, and then completed via classically controlled fixup operations. But conceptually it is helpful to treat the time direction as just another space direction.

Computations that only involve Clifford operations are particularly easy to treat purely topologically, with time equivalent to space, because when Clifford operations are applied via gate teleportation the resulting fixup operations are always classically controlled Paulis (which can be applied entirely within the classical control system). In order to treat time like space when performing more general operations, such as T gates and Toffoli gates, it is necessary to use techniques such as the selective routing construction described by Fowler in [8].

Fowler’s technique allows general operations to be laid out arbitrarily through spacetime, but produces a series of “routing qubits” that must be stored until an adaptive measurement process determines whether to measure them in the X or Z basis. The measurement depth (e.g. T depth) of the circuit determines how many times the classical control system will have to perform a set of measurements, decide which basis to use for the next set of measurements, and start those measurements. The speed at which the control system can run this loop, and work through the measurements, determines the speed of computation.

The characteristic time it takes the control system to react to a measurement, and perform the following dependent measurement, is the control system’s “reaction time”. We refer to the general paradigm of performing a quantum computation whose speed is limited only by the measurement depth of the circuit and the reaction time of the control system as “reaction limited computation”.
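As a rough illustration of the definition above, the runtime of a reaction limited computation can be sketched as the measurement depth multiplied by the reaction time. The function name and the example depth of 2000 are hypothetical and chosen only for illustration; the 10 microsecond reaction time is the value this paper assumes when making estimates.

```python
def reaction_limited_runtime_us(measurement_depth: int, reaction_time_us: float) -> float:
    """Estimate runtime when every dependent measurement layer costs one reaction time."""
    if measurement_depth < 0 or reaction_time_us <= 0:
        raise ValueError("depth must be non-negative and reaction time positive")
    return measurement_depth * reaction_time_us


# Hypothetical circuit with 2000 sequentially dependent measurement layers,
# run with the 10 microsecond reaction time assumed for estimates in this paper.
runtime_ms = reaction_limited_runtime_us(2000, 10.0) / 1000.0
print(f"{runtime_ms} ms")  # 20.0 ms
```

This ignores every other bottleneck (factory throughput, routing, Clifford work), so it should be read as a lower bound rather than a full cost model.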
Figure 1: Equivalent concepts expressed in quantum circuit diagrams, ZX calculus graphs, and 3d topological diagrams. In circuit diagrams, time goes from left to right. In ZX calculus graphs, there is no preferred time direction. In 3d topological diagrams, time goes from bottom to top. We never show Pauli operations in lattice surgery diagrams or in ZX calculus graphs, because they are performed by the classical control system instead of by operating on qubits. Our usage of the ZX calculus is somewhat non-standard in that we consider ZX graphs to be equivalent if they are equal modulo Pauli operations, we use a non-standard node coloring that matches the coloring of our topological diagrams, and we introduce a “delayed choice node” to represent adaptive effects coming from the classical control system. We exaggerate the spacing of our 3d topological diagrams, as in [11], so that it is possible to see how the components are interconnected.

| Quantity | Value | Effect of 10x decrease | Effect of 10x increase |
|---|---|---|---|
| Physical gate error rate | $10^{-3}$ | 4x less space, same time, fewer factories. Reaction limited at 7 factories. | Too close to threshold for tractable computation. |
| Surface code cycle time | 1 µs | Same time, fewer factories. Reaction limited at 2 factories. | 10x more spacetime volume. Reaction limited at 135 factories. |
| Reaction time | 10 µs | Easy to trade space for time. Reaction limited at 135 factories. | Hard to trade space for time. Reaction limited at 2 factories. |
| Physical connectivity | planar | N/A | N/A |

Table 1: Physical assumptions we make in this paper, and the effect of varying them.
Our contributions in this paper relate to decreasing the space overhead of reaction limited computation, and making it easier to route. The paper is organized as follows. In Section 1 we describe the context in which we are operating and note some notational conventions. In Section 2 we present our optimized version of a reaction limited selective CZ, which we refer to as a delayed choice CZ. In Section 3 we show how to produce and consume AutoCCZ states, which make routing easier because they decouple the consumption of the CCZ state from the fixup operations needed to complete a gate teleportation. Section 4 presents several improvements to the CCZ distillation factory from [11]. In Section 5 we lay out a reaction limited ripple-carry adder that relies heavily on AutoCCZ states. In Section 6 we lay out a QROM read that uses multiple access hallways in order to nearly transition from being Clifford limited to being reaction limited. Finally, in Section 7 we summarize our contributions and discuss their implications.

In this paper we will represent quantum computations using quantum circuit diagrams, ZX calculus graphs [5] and 3d topological diagrams. Figure 1 shows how to translate between the three representations.

When making estimates, we will be using assumptions that are plausible for a future large-scale superconducting qubit platform: a reaction time of 10 microseconds, a surface code cycle time of 1 microsecond, and a characteristic gate error rate of $10^{-3}$. We show these quantities in Table 1, which also notes the effect of changing each value.

Fowler’s selective routing technique from [8] is based on controllable multiplexers and demultiplexers (see Figure 2). Each (de)multiplexer produces two “routing qubits” which, depending on whether they are measured in the X or Z basis at a later time, can route a data qubit through one of multiple precomputed worldlines. By chaining these selective computations together through space instead of through time, the computation becomes reaction limited.

For example, consider a series of Toffoli gates where the output of each gate affects the control of a following Toffoli gate, such as the Toffolis in a ripple-carry adder. Normally, the current Toffoli would need to be completely finished (including fixup operations necessitated due to using gate teleportation) before it was possible to start the next Toffoli gate. But, by using multiplexers routing through multiple possible precomputed fixups, it is possible to apply all of the Toffoli operations simultaneously while delayed-choice routing through all the various possible fixups.

Although the (de)multiplexer construction from [8] is very general, it is often not optimal. For example, teleporting a CCZ gate produces up to three possible CZ fixup operations. Using the multiplexer construction to delay the choice of whether or not the various CZ fixups should be applied would produce eight routing qubits per potential CZ (because there are two qubits involved in a CZ and each must go through a multiplexer/demultiplexer pair). In Figure 3, we present a more efficient construction for performing delayed choice CZs that only uses two routing qubits.

Figure 2: Delayed choice multiplex/demultiplex construction from [8]. Top left is a circuit diagram directly from [8]. The bottom left shows the process as a ZX calculus graph which, unlike the circuit diagram, is identical for multiplexing and demultiplexing. Known rewrite rules are used to show equivalence with the claimed “choose which route is connected” functionality. On the right side is a 3d topological diagram of a lattice surgery implementation of the construction. The vertical poles coming out of the branches of the fork are the routing qubits used to control which of the two branches connects to the rear trunk. They can be extended arbitrarily. The red squares atop the routing qubit columns are placeholders for an eventual X or Z basis measurement, to be determined by classical control software.

Figure 3: Our optimized delayed choice CZ as a circuit and as a lattice surgery construction. The two forms are shown to be equivalent via ZX graph identities. During execution, the choice of whether or not to apply the CZ is delayed by extending the red-topped columns (the “routing qubits”) in the 3d topological diagram. Once the choice is known, the columns are terminated with the red square replaced either by a white square (activates the CZ) or a black square (skips the CZ). The circuit can be opened in the online simulator Quirk by following this link.

The AutoCCZ state
We can embed three instances of the delayed choice CZ construction directly into a CCZ state, so that there is one delayed choice CZ for each CZ fixup that may be needed when performing gate teleportation. This augments the CCZ state into an “AutoCCZ” state, so named because it automatically cleans up its own CZ fixup garbage. This makes consuming the state much simpler, because no corrections are needed at the consumption site. (Note that any magic state can be augmented into an auto magic state. For example, Figure 17 of [15] defines an auto-corrected T gate.)

In Figure 4, we show a circuit diagram producing and consuming an AutoCCZ state. The figure also shows how to represent this concept as a ZX calculus graph. Then, in Figure 5, we construct a compact “CCZ fixup box” with contents equivalent to the CZ fixup part of the ZX graph. AutoCCZ states are produced by linking the CCZ state output from a CCZ factory to the ports of the fixup box as the CCZ state is routed to its destination.

In Figure 6, we show how to use a CCZ state to perform a Toffoli operation. In Figure 7, we show the resulting reaction limited control system cycle.

We believe that it is possible to reduce the amount of spacetime volume we use to transform CCZ states into AutoCCZ states. For example, our current construction uses six routing qubits (two for each possible CZ fixup) but intuition would suggest that only three qubits should be needed in order to distinguish between the eight possible fixup cases. Alternatively, it should be possible to save volume by carefully integrating the delayed choice CZ fixups directly into the CCZ factory. We leave this task as future work.
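One way to see where the three possible CZ fixups and the eight fixup cases come from is to check how X byproducts move through a CCZ. The following is a minimal sketch, not the construction used in this paper: it tracks computational basis states and signs in plain Python, and all function names are hypothetical. It checks that an X on one qubit before a CCZ equals the CCZ followed by a CZ on the other two qubits and the same X, and that any combination of X byproducts differs from the corresponding set of CZ fixups only by a Z-type Pauli (which a classical Pauli frame could absorb).

```python
from itertools import product

BASIS = list(product((0, 1), repeat=3))  # all 3-qubit computational basis states


def apply_x(bits, sign, q):
    """Flip qubit q of a basis state; the sign is unchanged."""
    flipped = list(bits)
    flipped[q] ^= 1
    return tuple(flipped), sign


def apply_cz(bits, sign, q1, q2):
    """Negate the sign when both qubits are 1."""
    return bits, -sign if bits[q1] and bits[q2] else sign


def apply_ccz(bits, sign):
    """Negate the sign when all three qubits are 1."""
    return bits, -sign if all(bits) else sign


def byproducts_then_ccz(bits, x_mask):
    """X byproducts first, then the CCZ."""
    sign = 1
    for q in range(3):
        if x_mask[q]:
            bits, sign = apply_x(bits, sign, q)
    return apply_ccz(bits, sign)


def ccz_then_fixups(bits, x_mask):
    """CCZ first, then one CZ fixup per X byproduct, then the X byproducts."""
    bits, sign = apply_ccz(bits, 1)
    for q in range(3):
        if x_mask[q]:
            a, b = [p for p in range(3) if p != q]
            bits, sign = apply_cz(bits, sign, a, b)
    for q in range(3):
        if x_mask[q]:
            bits, sign = apply_x(bits, sign, q)
    return bits, sign


def leftover_exponent(bits, x_mask):
    """0 if both orderings agree on this basis state, 1 if they differ by a sign."""
    b1, s1 = byproducts_then_ccz(bits, x_mask)
    b2, s2 = ccz_then_fixups(bits, x_mask)
    assert b1 == b2
    return 0 if s1 == s2 else 1


def leftover_is_pauli_z(x_mask):
    """True if the leftover sign is affine in the bits, i.e. a Z-type Pauli times +-1."""
    g = {bits: leftover_exponent(bits, x_mask) for bits in BASIS}
    zero = (0, 0, 0)
    return all(
        g[tuple(a ^ b for a, b in zip(s, t))] ^ g[s] ^ g[t] ^ g[zero] == 0
        for s in BASIS for t in BASIS
    )


single_exact = all(
    leftover_exponent(bits, mask) == 0
    for mask in [(1, 0, 0), (0, 1, 0), (0, 0, 1)] for bits in BASIS
)
all_cases_pauli = all(leftover_is_pauli_z(mask) for mask in BASIS)
print(single_exact, all_cases_pauli)  # True True
```

Each of the three X byproducts toggles one CZ fixup independently, which gives the 2^3 = 8 fixup cases mentioned above.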
Figure 4: A circuit diagram and ZX calculus graph simplification showing how to create and consume an AutoCCZ magic state, powered by delayed choice CZs, to perform a CCZ gate. The bottom left part of the circuit is producing the AutoCCZ magic state, then the middle left does parity measurements vs the target qubits, then the middle right uses those measurements to determine the basis of measurements that determine whether fixup operations occur, then finally on the right side all the measurement results are used to update the Pauli frame tracked in the control software. The circuit can be opened in the online simulator Quirk by following this link.

Figure 5: Lattice surgery implementation of the CZ fixups bubble from Figure 4. The 3d model is stored in ancillary file “ccz-fixup.skp”; it can be opened online using Sketchup. The tee-junctions in the 3d diagram have been numbered, so that it is easier to see how they correspond to the cycle in the ZX calculus graph. The pink box on the right, with routing qubit “chimneys”, is the simplified abstract representation that we will use in larger diagrams.

Figure 6: Performing a Toffoli gate by consuming an AutoCCZ state.

Figure 7: Performing a series of Toffoli gates using reaction limited quantum computation [8]. The Toffolis can be laid out in a spacelike fashion, so that they are “performed simultaneously”, but doing so will produce routing qubits that the control software will still process serially. The rate at which the control software can cycle between computing Pauli frames and applying fixups, i.e. the time it takes to measure a set of routing qubits and determine the basis to measure the next set in, determines the speed of the quantum computation.

Operations that are not native to the surface code can be performed using magic state distillation [3] and gate teleportation [12]. A particularly useful magic state is the CCZ state $|CCZ\rangle = CCZ |+\rangle^{\otimes 3} = \sum_{a,b,c \in \{0,1\}} (-1)^{abc} |abc\rangle$. The reason this state is useful is because the quantum equivalent of the AND gate, the Toffoli gate, is not native to the surface code but can be performed by consuming one CCZ state [13, 6, 11] (see Figure 6). Algorithms with a lot of arithmetic, such as Grover’s algorithm and Shor’s algorithm, perform many Toffoli gates and benefit from using a state specialized to this task.

It is also possible to perform Toffoli gates using T states, but four states are required instead of one [13]. Whether this is better or worse than using a CCZ state, or some other technique, depends on the relative spacetime volumes of the different types of magic state factories, their error rates, and the number of operations that need to be performed.

In this paper we will be using the CCZ factory from [11]. It produces level 0 T states using the state injection technique of Li [14], then distills level 1 T states using one round of 15-to-1 distillation based on the Reed-Muller code [3], then uses those T states to perform an error-detecting Toffoli [13] to produce the final CCZ states.

We make several small improvements to this factory, in order to improve its depth:
• We choose code distances based on the target error rate, instead of based on what packs most neatly.
• We use six level 1 T factories instead of five.
• We improve the depth of the T teleportation/injection construction.
• We interleave the level 1 T factories better.

The exact code distances that we choose depend on the number of operations that have to be performed. Loosely speaking, the level 2 code distance will almost always be lower (e.g. 27 instead of the original 31) and the level 1 code distance will sometimes be larger (e.g. 17 instead of the original 15). Changing the code distances requires recomputing the footprint and depth of the factory. Depending on the ratio between the level 1 and level 2 code distances, the output rate may be limited by either the production rate of level 1 T states or by the height of the level 2 part of the factory. The footprint of the factory is similarly lower bounded by either one code distance or the other, depending on their relative size (e.g. see Figure 9).

Since writing [11], we have checked that it is possible to slightly lower the locations where T states are injected into our factories. Basically, because the T states are being injected on the outside of the vertical poles storing the factory’s qubits, while the stabilizer measurements are happening on the insides of the poles, they are not constrained into waiting for each other and it is possible to overlap the two. We show this new injection style, which reduces the depth of the CCZ factory from $5.5d$ to $5d$, in Figure 8.

We also used this new injection style in the level 1 T factories within the CCZ factory. Additionally, we found a slightly better way to interleave successive instantiations of the T factories. In total, as shown in Figure 11, we achieved a depth of . d (compared to . d in [11]).

For extremely large problem sizes, where the number of Toffoli operations exceeds the capabilities of the CCZ state factory from [11], we can fall back to the T state factory from [9]. According to the spreadsheet included in [11], and accounting for the ability to increase the code distances, this transition becomes necessary when performing on the order of ten trillion Toffoli operations.

In order to perform a reaction limited computation, more than one factory is needed. The exact number depends on the reaction time of the control system, the cycle time of the surface code, and the depth of the factory. In this paper we are assuming a reaction time of 10 microseconds and a cycle time of 1 microsecond. For the sake of example, we will assume a level 1 code distance of 17 and a level 2 code distance of 27.

The production rate of the factory can be limited by either the level 1 or level 2 distances. At level 2 the production rate of the factory is limited by the factory’s depth times the cycle time times the level 2 code distance. So the level 2 part of the factory is technically capable of producing states at a rate of $(5 \cdot 1\,\mu\text{s} \cdot 27)^{-1} \approx 7.4$ kHz. The level 1 part of the factory needs to produce 8 level 1 T states for each CCZ state that will be output. There are six level 1 T factories, and they have a depth of . d , which means the output rate of the entire factory cannot be larger than (5 . · · µs · / − ≈ . kHz. Therefore the level 2 code distance is the limiting factor, and the factory runs at 7.4 kHz.

In a reaction limited computation, one CCZ state will be needed per reaction time of the classical control system. That is to say, CCZ states are consumed at a rate of 100 kHz. Therefore, given our assumptions, a reaction limited computation requires $\lceil 100 / 7.4 \rceil = 14$ CCZ factories running in parallel.

Note that the combined footprint of these 14 CCZ factories would be approximately 2.6 million physical qubits. This suggests that quantum computations covering less than five million physical qubits will see no gains from circuit-level parallelism, such as using carry-lookahead adders instead of ripple-carry adders, because there simply isn’t the space to fit the data qubits, the factories, and the routing needed to run multiple threads of execution faster than a single thread of execution.
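The factory count above can be reproduced with a short calculation. This is a sketch under the assumption, taken from the text, that the level 2 part of the factory is the bottleneck, so one factory emits a CCZ state every (depth in units of d) × (level 2 code distance) × (cycle time). The function names are hypothetical. Varying the cycle time and reaction time by 10x while holding the code distance fixed also reproduces the 2 and 135 factory entries in Table 1.

```python
import math


def factory_period_us(depth_in_d: float, code_distance: int, cycle_time_us: float) -> float:
    """Time between CCZ outputs of one factory, assuming level 2 is the bottleneck."""
    return depth_in_d * code_distance * cycle_time_us


def factories_for_reaction_limit(period_us: float, reaction_time_us: float) -> int:
    """Factories needed to supply one CCZ state per reaction time."""
    return math.ceil(period_us / reaction_time_us)


# Baseline assumptions: depth 5d, level 2 distance 27, 1 us cycle, 10 us reaction time.
period = factory_period_us(5, 27, 1.0)               # 135 us per CCZ state
rate_khz = round(1000.0 / period, 1)                 # about 7.4 kHz
baseline = factories_for_reaction_limit(period, 10.0)

# Variations with the code distance held fixed.
fast_cycle = factories_for_reaction_limit(factory_period_us(5, 27, 0.1), 10.0)
slow_cycle = factories_for_reaction_limit(factory_period_us(5, 27, 10.0), 10.0)
fast_reaction = factories_for_reaction_limit(period, 1.0)
slow_reaction = factories_for_reaction_limit(period, 100.0)

print(rate_khz, baseline)         # 7.4 14
print(fast_cycle, slow_cycle)     # 2 135
print(fast_reaction, slow_reaction)  # 135 2
```

The sketch does not model the level 1 T factories or any change in code distance, so it does not cover the gate error rate row of Table 1.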
Figure 8: Improved method for gate teleporting a T state into a qubit. This 3d model is stored in ancillary file “inject-t.skp”; it can be opened online using Sketchup. The new method (right) saves . d surface code cycles, where d is the code distance of the factory, compared to the old method (left) [9].

Figure 9: The CCZ factory from [11], after adjusting the level 1 distance to 17 and the level 2 distance to d = 27, has a footprint that fits inside a × rectangular patch of distance d = 27 logical qubits.

Our goal in this section is to perform a reaction limited addition using the ripple-carry adder described by Cuccaro et al [4] (see Figure 12). There are two problems we need to solve to make this possible. First, we need to figure out how to lay out the operations of the adder in a spacelike fashion. Second, we need to efficiently route the CCZ states being produced by the many CCZ factories into the ripple-carry computation.

Thanks to the flexibility of the AutoCCZ state, laying out the ripple-carry process in a spacelike fashion is straightforward. We just need implementations of the MAJ and UMA carry rippling operations that accept a CCZ state and propagate the involved bits horizontally across space, instead of vertically through time. Figure 13 shows a 3d topological diagram of a MAJ implementation with this property. It consumes one AutoCCZ state, propagates carry bits left to right, propagates the qubits being operated on front to back, and propagates nothing past-to-future. It is possible to tile multiple copies of this block across space, as shown in Figure 14. The control system will then analyze the dependencies between the AutoCCZ states, and propagate the carry signal in a reaction limited fashion (as in Figure 7).

Note that we can’t lay out the entire addition in a spacelike fashion. That would require a number of factories proportional to the size of the addition, instead of proportional to the reaction time of the control system. We instead zig-zag the addition back and forth across space, performing just enough carry rippling to keep the factories and the control system busy.

We now move on to the routing question. Each MAJ box has four inputs and three outputs. One of the inputs, and also one of the outputs, is the carry qubit. Another two of the inputs (and outputs) are the data qubits, one from the target register and one from the offset register. The remaining input is the three qubits making up the CCZ part of an AutoCCZ state. We must route all these input and output qubits in a way that causes them to intersect the MAJ box at the right place and at the right time.

Our solution to this problem is to zig-zag the carry qubit back and forth along the X axis (right/left through space), while running data qubits through along the Y axis (forward/back through space). We place CCZ factories in front of and behind the zig-zagging area, so that their outputs are produced directly adjacent to where they are needed, making routing trivial. We then leave small gaps between adjacent factories, so that data qubits from outside the zig-zagging area can be routed through those gaps as needed.

As more and more data qubits are routed from behind the zig-zagging area to in front (or vice versa), we gradually shift the zig-zagging area backward (or forward). We interleave the two data registers into alternating rows, so that qubits that need to reach the same MAJ box at the same time are adjacent. Within each row we do additional interleaving, spacing out qubits that are technically sequential in the register. This prevents traffic jams as the data qubits are routed through the gaps between the factories.

The layout we are describing can be seen in 2d in Figure 15 and in 3d in Figure 16. We also provide an “across time” view in Figure 18. Note how there is some additional space for the CCZ fixup boxes from Figure 5 and for left/right routing of data qubits. There are two CCZ fixup boxes per CCZ factory because the routing qubits (the “chimneys”) emerging from a CCZ fixup box can extend vertically into the next “zig-zag layer” before the control system determines how to measure said qubits.

This layout is capable of performing ripple-carry additions at the reaction limited rate, propagating the carry information from qubit to qubit at 100 kHz. Given our physical assumptions, this layout would add a pair of thousand-qubit registers in approximately 20 milliseconds.

A ripple-carry adder is ideal for reaction limited computation because it has only a small amount of Clifford operations per Toffoli operation. A table lookup (also called a QROM read [1]) is exactly the opposite. For each Toffoli performed in a table lookup, there is a huge amount of Clifford work to do. In particular, each Toffoli triggers a gigantic multi-target CNOT potentially touching all of the lookup’s output qubits. Because of this, the limiting factor during a table lookup is not the control system’s reaction time but rather access to the output qubits.

In order to target a logical qubit with a CNOT, there must be an unused logical-qubit sized patch of surface code adjacent to that logical qubit. The CNOT operation will then need to use that patch for d cycles, where d is the code distance. For qubits where only one side is accessible, only one CNOT can be performed per d cycles. Given our assumption of a surface code cycle time of 1 microsecond, and using a code distance of 27 just for example, this suggests a maximum CNOT rate of 37 kHz (instead of the 100 kHz of a reaction limited computation).

It is often possible to work around this CNOT rate limitation.
For example, if there are multiple single-control single-target CNOTs all targeting the same qubit, it is possible to fuse the many CNOTs into a single generalized CNOT where the control is a Pauli product of all the individual controls. But this doesn’t work in the case of table lookups, because the set of relevant control qubits differs from output qubit to output qubit.

One thing we can do is to make two sides of each qubit accessible, instead of just one. The gigantic multi-target CNOTs can then alternate between using one side, and using the other side. This doubles the achievable Toffoli rate from 37 kHz to 74 kHz, which is much closer to 100 kHz. One can go further, e.g. making temporary copies of qubits in order to allow more and more parallel access, but the space tradeoffs start to become problematic. In this paper we will limit ourselves to two-sided access.

We focus on the case where we are performing a large lookup that prepares a register whose rows are interleaved between rows of another register. This specific case is interesting to us because it is the kind of lookup that is needed when performing windowed arithmetic [10], where groups of addition operations are replaced by single lookup+addition operations.

While performing the lookup, we use a tiled row interleaving pattern of R L L R where an L is a lookup data row, an R is an existing data row not involved in the lookup, and an underscore is an empty access row that can be used when performing the multi-target CNOTs. The multi-target CNOTs alternate between using the single inner access row and both of the outer access rows. In order to access the access rows, we place cross-row access corridors on opposing sides of the layout. The multi-target CNOTs alternate between using the two access corridors, so that they can branch into individual access rows as needed.

Figure 17 has a 2d floor plan of where factories and access hallways are located during a lookup. We also provide an “across time” view in Figure 18 of a lookup followed by an addition.

Figure 10: 3D diagram of the core of the CCZ factory we are using, derived from the factory in [11]. This 3d model is stored in ancillary file “ccz-factory.skp”; it can be opened online using Sketchup. The triplets of open-ended pipes are the CCZ state output. Each small open-ended pipe is for a T state input, coming from the six level 1 T factories (not shown here, see Figure 11). Due to how the factory is interleaved with itself, a CCZ state of sufficient quality is produced every $d \cdot 5$ surface code cycles where d is the code distance.

Figure 11: Different views of a 3D diagram of the level 1 T factory we are using to produce $|T\rangle$ states, derived from the factory in [9, 11]. This 3d model is stored in ancillary file “t1-factory.skp”; it can be opened online using Sketchup. Due to how the factory is interleaved with itself, a $|T\rangle$ state is produced every d · . surface code cycles (where d is the level 1 code distance).

In this paper we defined a self-correcting AutoCCZ state, improved the distillation factory and delayed choice CZ constructions underlying it, and demonstrated the basics of using it to lay out low-depth surface code computations. We laid out a reaction limited ripple-carry adder, a Clifford limited table lookup, and ensured they fit together in a fashion that would be useful for performing windowed arithmetic.

We estimated, under plausible physical assumptions for future large-scale superconducting qubit platforms, that running a serial circuit at the reaction limited rate requires 14 CCZ factories.
Because this number of factories covers several million physical qubits, we believe that (under our physical assumptions) computations involving fewer than five million physical qubits will not benefit from using circuits that perform Toffolis in parallel (as in carry-lookahead adders) instead of serially (as in ripple-carry adders).

We hope readers find the techniques we’ve described, and the example layouts we’ve presented, helpful when laying out their own large scale computations.

Figure 12: Cuccaro’s ripple-carry adder [4]. Uses local operations travelling in a “V” shaped wave. If m is the length of the target register t, then the Toffoli count and measurement depth of this construction is m − .

Figure 13: Lattice surgery implementation of the MAJ operation. Top-left is the MAJ circuit from Cuccaro’s paper [4]. Top-center is a ZX calculus graph with function equivalent to the circuit. Center-right is a lattice surgery implementation of the ZX graph (stored in ancillary file “maj.skp” which can be opened online using Sketchup). Bottom-center is a simplified representation of the 3d model that shows only the ports and the 3x3x5 bounding box.

Figure 14: Lattice surgery implementation of a series of MAJ operations. Each light blue box is an instance of Figure 13. To enable parallelization, the carry propagation is done left to right, through space, instead of forward through time. Dark and light coloring indicates boundary type, but otherwise coloring is for labelling. Red bars are injecting three qubits from an AutoCCZ state, green bars are passing qubits from the target register through the block (front to back), and yellow bars are passing qubits from the offset register through the block (front to back).

Figure 15: Layout of a reaction limited ripple-carry addition, assuming a level 2 code distance of 27 and a level 1 code distance of 17. The CCZ factories (red boxes) and CCZ fixups (pink boxes) are producing AutoCCZ states fed into the MAJ computations (light blue region). As the carry qubit zig-zags back and forth, qubits from the offset register and target register (green and yellow rows) are routed through gaps between the factories.

Figure 16: Surface code activity during an addition, assuming a level 2 code distance of 27 and a level 1 code distance of 17. This 3d model is stored in ancillary file “adder-layout.skp”, which can be opened online using Sketchup. Yellow and green rows are qubits from the target and input registers of the addition. Dark blue rows are qubits from an idle register. Red boxes are CCZ magic state factories, auto-corrected by the pink boxes (see Figure 3). Light blue boxes are the MAJ operation of Cuccaro’s adder (see Figure 13), arranged into a spacelike sweep to keep ahead of the control software (see Section 3). The full adder is formed by repeating this pattern, with the operating area gradually sweeping up and then down through the green/yellow data (see Figure 18).

Craig Gidney constructed the AutoCCZ state and produced the adder and lookup layouts using it. Austin Fowler helped iterate on ideas for laying out CCZ fixups, found the more efficient factory interleavings, and played a supervisory role guiding the content of the paper.
We thank Hartmut Neven for creating an environment where this research was possible in the first place.
Figure 17: Data layout during a double-access lookup, assuming a level 2 code distance of 27 and a level 1 code distance of 17. Output qubits (yellow) are interleaved between the rows of an existing register (green). The vertical access corridors and horizontal access rows provide two distinct ways to simultaneously access all output qubits when performing many-target CNOTs. The dark gray area in the bottom region is sufficient space to run a unary iteration [1]. The number of factories shown is slightly lower than what is needed to saturate the access rate of the double-access hallways.

Figure 18: Layout of data (top to bottom) over time (left to right) during a lookup addition. During lookup, the target register is spread out to make room for the temporary register that will hold the lookup’s output. During addition the target register and lookup output register are squeezed through a moving operating area that sweeps up then down, applying the MAJ and UMA sweeps of Cuccaro’s adder. Uncomputing the lookup is done using measurement based uncomputation [2], and has negligible cost compared to the other steps.

References
[1] Ryan Babbush, Craig Gidney, Dominic W Berry, Nathan Wiebe, Jarrod McClean, Alexandru Paler, Austin Fowler, and Hartmut Neven. Encoding electronic spectra in quantum circuits with linear t complexity. Physical Review X, 8(4):041015, 2018.
[2] Dominic W Berry, Craig Gidney, Mario Motta, Jarrod R McClean, and Ryan Babbush. Qubitization of arbitrary basis quantum chemistry by low rank factorization. arXiv preprint arXiv:1902.02134, 2019.
[3] Sergey Bravyi and Alexei Kitaev. Universal quantum computation with ideal clifford gates and noisy ancillas. Physical Review A, 71(2):022316, 2005.
[4] Steven A Cuccaro, Thomas G Draper, Samuel A Kutin, and David Petrie Moulton. A new quantum ripple-carry addition circuit. arXiv preprint quant-ph/0410184, 2004.
[5] Niel de Beaudrap and Dominic Horsman. The zx calculus is a language for surface code lattice surgery. arXiv preprint arXiv:1704.08670, 2017.
[6] Bryan Eastin. Distilling one-qubit magic states into toffoli states. Physical Review A, 87(3):032321, 2013.
[7] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland. Surface codes: Towards practical large-scale quantum computation. Phys. Rev. A, 86:032324, 2012. URL https://doi.org/10.1103/PhysRevA.86.032324. arXiv:1208.0928.
[8] Austin G Fowler. Time-optimal quantum computation. arXiv preprint arXiv:1210.4626, 2012.
[9] Austin G Fowler and Craig Gidney. Low overhead quantum computation using lattice surgery. arXiv preprint arXiv:1808.06709, 2018.
[10] Craig Gidney. Windowed quantum arithmetic. arXiv preprint arXiv:1905.07682, 2019.
[11] Craig Gidney and Austin G Fowler. Efficient magic state factories with a catalyzed ccz to 2t transformation. arXiv preprint arXiv:1812.01238, 2018.
[12] Daniel Gottesman and Isaac L Chuang. Demonstrating the viability of universal quantum computation using teleportation and single-qubit operations. Nature, 402(6760):390, 1999.
[13] Cody Jones. Low-overhead constructions for the fault-tolerant toffoli gate. Physical Review A, 87(2):022328, 2013.
[14] Ying Li. A magic state’s fidelity can be superior to the operations that created it. New Journal of Physics, 17(2):023037, 2015.
[15] Daniel Litinski. A game of surface codes: Large-scale quantum computing with lattice surgery. arXiv preprint arXiv:1808.02892, 2018.