Comparison of ancilla preparation and measurement procedures for the Steane [[7,1,3]] code on a model ion trap quantum computer

Abstract

We schedule the Steane [[7,1,3]] error correction on a model ion trap architecture with ballistic transport. We compare the level one error rates for syndrome extraction using the Shor method of ancilla prepared in verified cat states to the DiVincenzo-Aliferis method without verification. The study examines how the quantum error correction circuit latency and error vary with the number of available ancilla and the choice of protocol for ancilla preparation and measurement. We find that with few exceptions the DiVincenzo-Aliferis method without cat state verification outperforms the standard Shor method. We also find that additional ancilla always reduces the latency but does not significantly change the error due to the high memory fidelity.



Yu Tomita, Mauricio Gutiérrez, Chingiz Kabytayev, and Kenneth R. Brown ∗
Schools of Chemistry and Biochemistry; Computational Science and Engineering; and Physics, Georgia Institute of Technology, Atlanta, Georgia 30332, USA

M. R. Hutsel, A. P. Morris, Kelly E. Stevens, and G. Mohler

Georgia Tech Research Institute, Atlanta, Georgia 30332, USA
(Dated: October 31, 2018)

∗ Author to whom correspondence should be addressed. Electronic mail: [email protected]

I. INTRODUCTION

The reliability of a fault-tolerant quantum computation depends not only on the choice of error correction code but also on the methods used for syndrome extraction, state preparation, and error decoding. These choices can be compared at an abstract level of quantum circuits and depolarizing channels, but realistic quantum information devices will have error rates that depend on circuit elements as well as limited connectivity for applying two-qubit gates [1]. Topological codes have an advantage in that they are naturally suited to nearest-neighbor architectures [2, 3]. Concatenated code error correction procedures require additional resources to map these circuits onto local architectures, which leads to a reduced error threshold relative to the abstract model [1, 4]. Still, these codes offer potential benefits over topological codes for systems with low error rates and fast communication between distant qubits by ballistic transport or interaction with flying qubits.

The extraction of syndromes requires the preparation and measurement of fresh ancilla states. This process is what allows us to remove the entropy from the quantum system [5]. One question that arises is how many extra qubits one should dedicate to ancilla. Consider the Steane [[7,1,3]] code [6] using the Shor method for syndrome extraction [7] based on verified cat states. Each cat state contains four qubits; six syndrome measurements are required, suggesting that between 4 and 24 ancilla qubits could be used. The proper balance of ancilla resources depends on the device details and the error rates of the physical operations. For most quantum information devices, measurement is the slowest operation. It has been shown that in a nonequiprobable error environment where Z-type error is dominant, the fidelity of the Shor state may decrease with verification [8, 9].
To avoid bottlenecks due to the measurements used to verify cat states, DiVincenzo and Aliferis [10] proposed a method that does not require verification of ancilla states. Here we compare these methods on a model ion trap quantum computer.

The ion trap architecture is a promising basis for quantum computation and has already demonstrated long coherence times and high-fidelity operations. A scalable architecture has been proposed based on shuttling ions between traps [11], and work is ongoing to implement this architecture experimentally [12–20]. This framework has been the basis for a number of studies on the resource requirements for implementing large quantum algorithms [21–23] and has also been considered as the elementary logical unit of hybrid schemes using photonic interconnects [24].

While an arbitrarily well-connected ion trap layout can be envisioned, such that there is little fear of collision or backlogs, this is not realistic given current technology. The ion trap layout, for example, is a grid of narrow paths where no ion may pass by another. Performing multiple two-qubit gates efficiently becomes problematic. There will be a limited number of interaction zones, and the paths to reach them will be obstructed by other qubits, which adds non-trivial transport time to the time required to execute gates.

This introduces the issue of latency, which is defined here as the total amount of time experienced by qubits after physical state preparation. Latency includes qubit transport times, gate times, and idle times due to traffic in the layout.
When mapping a quantum circuit to a series of device operations for a layout with limited connectivity, resources dedicated to the transport of qubit information quickly come to dominate the cost of algorithm execution [21]. The goal then becomes to find a schedule of qubit operations that reduces latency as much as possible, both to make operation times feasible for large algorithms and to reduce memory errors due to ever-present environmental noise. Parallelization of operations is one of the most direct ways to reduce latency and is the focus of this work.

One simple way to increase the parallelizability is to prepare additional ancillary qubits ahead of time in states needed by the computation. Just as in classical computing, this is a trade-off between memory and latency. Maximum parallelization may call for the simultaneous creation and preparation of multiple ancilla sets (low latency), but this results in “stale” ancilla that may suffer logical errors before being used (poor memory performance). Both of these factors can be calculated quantitatively. Total latency can be calculated given a layout, a schedule of gates, and a set of operation times (gate time, qubit speed, measurement time, etc.). Logical error can be characterized in terms of fidelity, or alternatively in terms of the qubit error rate. In general, the logical error increases with increasing latency. Using these calculations, we can study the effect of additional resources on the error rate of the overall algorithm execution.

The impact of ancilla preparation on overhead has been previously studied for both individual logical qubits [25, 26] and large-scale quantum computation [21]. The individual logical qubit studies done for the Steane [[7,1,3]] code assumed an abstracted layout. Although the studies did consider memory errors due to gate operation times, they did not include the additional errors due to movement latency.
The large-scale study looked at ion trap layouts holding large numbers of logical qubits, and found that ancilla generation was the primary performance bottleneck. The bottleneck was removed by creating regions dedicated to ancilla preparation and recycling. Our approach flows in part from these prior studies; here, multiple ancilla blocks are assigned to individual logical qubits, and two different ancilla encodings are employed. Once an ancilla block size and encoding are chosen, execution of the Steane code is simulated using a software design tool. The design tool is used to include realistic latency and scheduling bottlenecks, pointing towards the most practical ancilla encoding and block size. Our study focuses on a single round of Steane-code quantum error correction on a model ion trap architecture as a function of ancilla encoding/decoding and ancilla resources.

II. METHODS

A. Cat state syndrome extraction and the Steane [[7,1,3]] code

The Steane [[7,1,3]] code is the best-known of the Calderbank-Shor-Steane codes [27]. It encodes one logical qubit into seven physical qubits. The resulting logical states $|0\rangle$ and $|1\rangle$ have a Hamming distance of three, and the code is able to detect and correct up to one physical bit-flip error and one physical phase-flip error. The Steane code has been widely studied and has been shown to have a threshold in the range between 10 − and 10 − [28–30], which makes it suitable for fault-tolerant quantum error correction (FTQEC).

The Steane code has six weight-four syndrome operators. Each syndrome is extracted by measuring a four-qubit cat state after interacting with the data qubits. In this study, the Steane QEC process is simulated and the latency and fidelity are calculated while varying the number of Shor ancilla sets from one to six. The two preparation/decoding cases are: (1) “on-demand,” where only two sets of ancilla are prepared at any time, and (2) “one-time,” where all ancilla are prepared at once before the first use. These procedures are done for both Shor and DiVincenzo-Aliferis ancilla encodings. Circuit diagrams for the two methods are shown in Figure 1.

B. Ion trap physical machine description

Previous ion trap studies in the literature have used a gate-level error model to calculate error correction properties. Here we model our ion trap using parameters and constraints derived from the Physical Machine Description (PMD) provided by the IARPA Quantum Computer Science program [31]. The ion trap PMD is a collection of linear ion trapping regions joined by cross junctions; see Figure 2. It is modeled after the ion trap charge-coupled device architecture of Kielpinski, Monroe, and Wineland [11]. Each bus segment (white) is capable of holding four ion trapping regions or “wells.” Each well is capable of hosting up to five ions. Individual ion loading wells are indicated in yellow, and interaction wells capable of executing gate or measurement operations are in green. In order to undergo a two-qubit gate (such as controlled-phase), the two qubits must be co-located in an interaction well. Shuttling a qubit between adjacent empty wells takes 10 µs. There is an additional time cost of 10 µs to add a qubit to, or remove a qubit from, a well that is occupied. This reflects the increased experimental complexity of joining and splitting single ions from ion chains [32, 33].

Logical errors are assumed to arise from stochastic white noise and 1/f noise in the control and background Hamiltonians. The result is a very asymmetric error model that better reflects the dominance of gate errors over memory errors in the actual physical system. The model does not consider the heating of the ion motion due to transport. The error of two-qubit ion gates is modeled as a stochastic noise term in the two-qubit Hamiltonian. Table I gives the latency and error rate costs of each gate type for the ion trap PMD.

In order to get the error rates in Table I, we approximate the real error channel derived from the stochastic noise with the closest Pauli error channel. We denote the process matrix of the noisy gate by $\chi'$.
The process matrix of the operation corresponding to the target (error-free) unitary followed by an X gate is denoted by $\chi_X$. The process matrices corresponding to the target unitary followed by a Y or Z gate are denoted analogously. We then calculate the error rates as the overlap between $\chi_i$ and $\chi'$: $X_{er} = \langle \chi_X, \chi' \rangle$, $Y_{er} = \langle \chi_Y, \chi' \rangle$, and $Z_{er} = \langle \chi_Z, \chi' \rangle$, where $\langle A, B \rangle = \mathrm{Tr}(A^{\dagger} B)$.

The measurement error is above the Steane code threshold, but this can be fixed by introducing two extra qubits and CZ and Hadamard gates as shown in Figure 3. This enhancement provides an error rate of $O(\epsilon)$, where $\epsilon$ is the error rate of a single measurement. The enhanced measurement operation is denoted as ‘MULTIMEASURE’ in Table I.

C. Quantum Machine Parameterizer

A design tool is required to model the ion trap layout and execute qubit schedules on it. We use the Quantum Machine Parameterizer (QMP) code suite developed at GTRI. QMP is used for designing architecture layouts and creating operation schedules with real locality constraints. QMP can currently be used for any hardware where the locality constraints can be mapped to a planar graph. QMP has three primary facets: quantum computer layout modeling, operation scheduling, and physical qubit state tracking.


FIG. 1. Circuits for extraction of Z type syndrome measurement of the Steane code using the (a) standard Shor and (b) DiVincenzo-Aliferis method. In the DiVincenzo-Aliferis method, the cat state verification step is substituted with post-measurement decoding of the ancilla. Dashed lines demarcate different sections of the circuits. Also shown to the right are representations of the schedule of operations as a function of circuit sections. P=Prepare, V=Verify, C=Couple, D=Decode, M=Measure. Grey regions correspond to operations that are exclusively movement.

FIG. 2. Layout for QCS Ion Trap PMD.

FIG. 3. Improvement of measurement gates by adding two ancilla. This reduces the failure rate when the ancilla preparation and the controlled gates are relatively reliable compared to the measurement gates. The final measurement value is determined by the majority vote of the three measurements.

TABLE I. Execution time and error rates of physical operations. The MULTIMEASURE gate is the enhanced measurement gate described in Figure 3.

Gate             Latency (µs)   Error Rate X   Error Rate Y   Error Rate Z
X                3              1.6E-8         8.0E-10        1.0E-9
Y                3              8.0E-10        1.6E-8         1.0E-9
Z                3              0.0            0.0            1.8E-8
S                2              0.0            0.0            5.5E-9
T                1              0.0            0.0            1.7E-9
HADAMARD         6              1.6E-8         4.0E-9         1.9E-9
CZ               105.5          0.0            0.0            IZ: 6.7E-8, ZI: 6.7E-8, ZZ: 2.5E-5
PREPARE Z        10             0.0            0.0            0.0
MEASURE Z        100            0.0            0.0            1.0E-4
MULTIMEASURE Z   355            0.0            0.0            3.1E-6
WAIT/MOVE        t . t ·

The layout modeling module allows the user to describe a quantum computing system, specifically the physical connectivity of allowed qubit paths and the location of addressable zones. This module also accepts device-dependent parameters such as gate times and movement cost times to customize the behavior of a physical machine. This includes device-specific operations such as JOIN and SPLIT, which are required for two-qubit interactions.

The operation scheduling module allows the user to write a schedule of operations that can be performed in a circuit-model-based quantum computer (gate operations, qubit preparation, etc.). This schedule is written at a “high level,” which we define as a list of gate operations and move requests that only specify qubit and destination address. The scheduling module then calculates qubit paths using a specialized A* path-finding algorithm [34, 35], parallelizes the schedule where possible, removes possible collision events, and produces a series of qubit movement operations. Movement parallelization is performed with priority given according to the order of the move requests within a parallel block of the schedule. The first qubit is moved with no impediment, provided that a path exists.
The second qubit’s movement must defer to the first qubit, and WAIT commands are issued as needed to the second qubit to prevent collisions. This continues until the end of a parallel block is reached. In order to optimize this approach to parallelized movement, QMP analyzes move calls, qubit start positions, and destinations, and then re-orders the move calls as necessary. This module uses the device-dependent timing parameters such as gate times to complete the latency calculation of the operations schedule. This approach means that QMP will automatically create different (ideally optimal) qubit transport schedules and overall latencies for different choices in initial qubit arrangement, ancilla population, etc.

Finally, the qubit state tracking module allows the user to visualize the positions of the physical qubits within the layout as a function of time, and produces as output the total latency of the operations schedule. The module also produces an “error schedule” file which reduces the detailed physical machine schedule to a sequence of error-relevant events that the Quantum Circuit Fault Tracer (Section II D) uses to calculate the failure rate.
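As a rough illustration of the latency calculation described above, the following sketch sums the Table I operation times over a schedule that has already been split into parallel blocks. The schedule format, the function name `total_latency`, and the flat 10 µs cost per shuttle step are assumptions made for illustration. This is not the QMP implementation: it ignores path-finding, WAIT insertion, and the extra cost of entering or leaving an occupied well.

```python
# Illustrative sketch only, not QMP. Gate times are the Table I latencies in µs.
LATENCY_US = {
    "X": 3.0, "Y": 3.0, "Z": 3.0, "S": 2.0, "T": 1.0,
    "HADAMARD": 6.0, "CZ": 105.5,
    "PREPARE_Z": 10.0, "MEASURE_Z": 100.0, "MULTIMEASURE_Z": 355.0,
    "MOVE": 10.0,  # assumed: one shuttle step between adjacent empty wells
}


def total_latency(schedule):
    """Return the latency in µs of a schedule given as a list of parallel blocks.

    Each block is a list of per-qubit operation sequences. Sequences inside a
    block run concurrently, so a block lasts as long as its slowest sequence;
    blocks run one after another.
    """
    total = 0.0
    for block in schedule:
        total += max(sum(LATENCY_US[op] for op in seq) for seq in block)
    return total


# Hypothetical three-block schedule: prepare, shuttle and couple, measure.
example = [
    [["PREPARE_Z", "HADAMARD"], ["PREPARE_Z"]],
    [["MOVE", "MOVE", "CZ"]],
    [["MEASURE_Z"]],
]
print(total_latency(example))  # 241.5
```

In this toy schedule the first block costs 16 µs (the slower of the two preparation sequences), the second 125.5 µs, and the third 100 µs.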

D. Quantum Circuit Fault Tracer

The Quantum Circuit Fault Tracer (QCFT) is a tool created to efficiently compute logical failure rates of concatenated FTQEC codes. This tool is based on the “fault paths” concept, introduced by Aliferis, Gottesman, and Preskill to calculate thresholds of distance-3 concatenated FTQEC codes [36]. It takes as input a quantum circuit and failure rates of physical gates including WAIT and MOVE. Starting from the output state qubits, the circuit is traced backwards, marking possible fault points. Fault points are circuit locations where an error can propagate into a fatal error for the FTQEC circuit. The QCFT then combines the fault points and calculates the overall failure rate of the given circuit. To improve the accuracy of the output logical failure rate, we separate errors into X-type and Z-type, and propagate them on the circuit independently. Each Y-type error rate in Table I is added into both X-type and Z-type rates. When the input circuit is a distance-3 QECC circuit, the output is the probability of having errors propagated into two or more output data qubits.
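The X-type/Z-type separation described above can be illustrated with a few Table I entries. The sketch below folds each Y-type rate into both the X-type and the Z-type rate; the dictionary layout and the function name `split_xz` are assumptions made for illustration and are not part of QCFT.

```python
# Illustrative sketch only, not QCFT. Tuples are (X, Y, Z) rates from Table I.
TABLE_I = {
    "X": (1.6e-8, 8.0e-10, 1.0e-9),
    "Y": (8.0e-10, 1.6e-8, 1.0e-9),
    "Z": (0.0, 0.0, 1.8e-8),
    "HADAMARD": (1.6e-8, 4.0e-9, 1.9e-9),
}


def split_xz(rates):
    """Fold the Y-type rate into both the X-type and the Z-type rate."""
    x, y, z = rates
    return x + y, z + y


for gate, rates in TABLE_I.items():
    x_type, z_type = split_xz(rates)
    print(f"{gate:9s} X-type {x_type:.2e}  Z-type {z_type:.2e}")
```

For the Hadamard gate, for example, this gives an X-type rate of about 2.0E-8 and a Z-type rate of about 5.9E-9.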

III. PROCEDURE

Executing an algorithm on the ion trap layout requires a set of starting positions for all of the qubits present in the computation. Each starting position corresponds to an interaction well. That is, each qubit has its own home interaction well which it starts in at the beginning of the QEC round; see Figure 4. The qubits are ordered into rows by function. The data qubits sit in the top row and never move from their initial positions. Each additional row of qubits is a self-contained ancilla set, which is prepared according to the Steane-Shor or Steane-DiVincenzo-Aliferis circuit, coupled with the data, returned to the ancilla area (although not necessarily to the set’s original position), and measured to obtain the error syndrome. Each qubit returns to its home well after being called away for a sequence of two-qubit gates. By this convention, the control bit travels and the target bit stays home. These choices remove some optimization capability, but allow us to test and compare different QEC circuits from the same initial conditions and scheduling assumptions.

[Figure 4 row labels: Data; Ancilla set 1; Ancilla set 2; Ancilla set 3 (spare); Ancilla set 4 (spare); Verifier]

FIG. 4. Sample layout for the Steane algorithm with four sets of Shor ancilla and two preparation rows. The data qubits never move from their positions in the top row. The preparation rows are indicated by the presence of static verifier qubits.

Once the initial state and scheduling assumptions are established, QMP is used to calculate the time required to perform level one error correction assuming different operation times and ancilla management strategies. This includes varying the method of ancilla preparation and measurement, the number of ancilla, the parallelization of ancilla manipulation, and the time of gates and measurements. For each set of conditions, QMP uses the A* algorithm to optimize the latency from an initial hand-crafted schedule. An error schedule is then produced containing all gate and latency information. Finally, QCFT reads the error schedule and determines the logical error rate for that set of conditions.

The Steane code is theoretically improved in terms of latency and error rate by creating multiple ancilla sets in parallel. To compare the efficacy of “on-demand” ancilla with “one-time” ancilla, we look at two different parallelizations for both Shor and DiVincenzo-Aliferis ancilla. At this point, we use the following notation to represent choices of ancilla management: yPxR, where y is the number of ancilla sets that can be prepared and measured simultaneously (y=All means complete parallelization over the set) and x is the number of ancilla sets, one set per row in the layout.

The on-demand approach is 2PxR, where ancilla preparation is only allowed in the two rows immediately below the data row. In this arrangement, ancilla qubits are moved up into one of the preparation rows, prepared, coupled with the data, and then moved to the bottom of the ancilla “stack” for measurement, which makes room for the next ancilla set. For the Shor case, verifier qubits are only kept in the two preparation rows. Six sets of ancilla are prepared in total in order to perform the three bit-stabilizer and three phase-stabilizer measurements.

The one-time approach is AllPxR, wherein all ancilla are prepared at once and coupled with the data as soon as possible.
For fewer than six ancilla sets, the ancilla sets are prepared at once, coupled with the data, measured, and then prepared again as soon as possible, repeated until all stabilizer measurements are performed. For the Shor case, every row has a verification qubit.
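The yPxR labels defined above are easy to handle programmatically. The following sketch parses a label into the number of sets that can be prepared in parallel and the number of ancilla rows; the function name `parse_strategy` and the choice to map y=All to the row count are assumptions made for illustration, not part of QMP.

```python
import re


def parse_strategy(label):
    """Parse a label such as '2P4R' or 'AllP6R' into (parallel_sets, rows).

    parallel_sets is the number of ancilla sets that can be prepared and
    measured simultaneously; 'All' means every set, so it equals rows.
    """
    match = re.fullmatch(r"(All|\d+)P(\d+)R", label)
    if match is None:
        raise ValueError(f"not a yPxR label: {label!r}")
    rows = int(match.group(2))
    parallel = rows if match.group(1) == "All" else int(match.group(1))
    return parallel, rows


print(parse_strategy("2P4R"))    # (2, 4)
print(parse_strategy("AllP6R"))  # (6, 6)
```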

IV. RESULTS AND DISCUSSION

A. Scheduled error correction: Two-row preparation

The total latencies for the Steane-Shor and Steane-DiVincenzo-Aliferis algorithms are shown in Figure 5. The latency for one ancilla set represents the standard one-set strategy for the algorithms. For the case of a single ancilla row, the latencies for Steane-Shor and Steane-DiVincenzo-Aliferis are almost identical, with a slight advantage to Steane-Shor. In terms of latency, the verification step in Steane-Shor is roughly equivalent to the decoding step in Steane-DiVincenzo-Aliferis. However, for Steane-Shor, ancilla qubits can be moved to the data qubits while the verifier qubit is being measured. By contrast, the ancilla qubits are tied up during decoding, and no further parallelization is possible. Thus, assuming the verifier shows successful ancilla creation, the Shor encoding is slightly more efficient. As will be shown, increasing the gate time, increasing the measurement time, or adding additional ancilla rows (increasing parallelizability) will separate Steane-Shor and Steane-DiVincenzo-Aliferis performance.

The latencies decrease greatly for both encodings with the addition of a second preparation row (2PxR) and, in general, continue to decrease with the addition of spare ancilla sets. The Steane-Shor latency reaches a minimum with four ancilla sets (two spare sets). This occurs because the preparation step in the Steane-Shor algorithm dominates the time required to perform a single bit or phase stabilizer measurement. Since only two sets can be prepared at a time in this scheme, four total ancilla sets are sufficient to continuously utilize the preparation rows.

[Figure 5 plot, titled “Total Latency (2PxR)”: total latency (µs) versus number of rows (1 to 6), with series S-S and D-A.]

FIG. 5. Total latency for the Steane-Shor (S-S) and Steane-DiVincenzo-Aliferis (D-A) algorithms as a function of number of ancilla sets for the case of two preparation rows (2PxR).

The latency for the Steane-DiVincenzo-Aliferis algorithm reaches a minimum with a full six sets of ancilla. This occurs because, unlike preparation, decoding can occur at any interaction well. Additional ancilla rows allow for greater parallelized preparation and decoding and consequently less down-time between data-coupling steps. The elimination of the limited verifier measurement in mid-circuit clears out a critical bottleneck in the QEC execution.

More specifically, the efficiency of the algorithm execution depends on the degree of overlap between separate operations. The individual stabilizer measurements must be performed sequentially when using the one-set strategy. With two rows of ancilla, two stabilizer measurements can be performed in parallel. This is achieved by partially overlapping the preparation of the second set with the first. The preparation of the second set is delayed so that moving it to the data takes place when the first set is moved back to the top row for measurement. Following coupling with the data, the second set is moved to its original row via the unoccupied outer columns of the layout for measurement. Once a set of ancilla is measured, it is re-prepared for its next stabilizer measurement. This process, shown in Figure 6(a), is repeated two more times to perform the remaining four stabilizer measurements.

Adding a third or fourth set of ancilla provides additional spare sets that can move up and begin preparation as soon as the second set moves to the data. Furthermore, the measurements of the first and second sets complete before it is necessary to reuse them to perform the final two stabilizer measurements. Therefore, unnecessary delays are removed by using more sets. This process is shown in Figure 6(b).
For Steane-Shor, adding fifth and sixth sets removes the need to reuse ancilla but does not provide any further decrease in the latency, due to the verification bottleneck.

Comparing DiVincenzo-Aliferis to Shor shows that Steane-DiVincenzo-Aliferis requires less time for preparation but more time for measurement. As a consequence, the ancillae cannot be reused as quickly as they are needed for a subsequent preparation. For example, with four sets of ancillae, the first four stabilizer measurements can be performed in rapid succession. However, ancilla sets three and four are prepared quickly and move to the data before the decoding/measuring of sets one and two is complete. Thus, the preparation of sets one and two for the last two stabilizer measurements cannot begin as early as possible. This process is shown in Figure 7. A unique set of ancilla must be available for each stabilizer measurement operation (six sets) to fully optimize the parallelization of the DiVincenzo-Aliferis algorithm.

[Figure 6: timing diagrams of the error checks PH1, PH2, PH3, BT1, BT2, BT3 for (a) two ancilla sets and (b) four ancilla sets. Legend: Pn = Prepare ancilla set n; Vn = Verify ancilla set n; Cn = Couple ancilla set n with data; Mn = Measure ancilla set n; a separate marker denotes Move.]

FIG. 6. Parallel strategy for implementing the Steane-Shor algorithm with (a) two sets of ancilla and two preparation rows and (b) four sets of ancilla and two preparation rows. “PHx” and “BTx” indicate phase-stabilizer and bit-stabilizer operations, respectively. Dashed lines indicate steps for separate ancilla sets that must occur in sequence. Red arrows indicate steps that must occur in sequence because an ancilla set is reused. The need to reuse the two ancilla sets, as indicated by the red arrows in (a), prevents the first set from being prepared as early as possible, as indicated by the non-horizontal dashed lines between PH2 and PH3 and BT1 and BT2. The additional ancilla sets in (b) ensure that the ancilla are used at the speed of computation, with measurement of an ancilla set occurring before the need to prepare that ancilla set.

[Figure 7: timing diagram of the error checks PH1, PH2, PH3, BT1, BT2, BT3 for four ancilla sets. Legend: Pn = Prepare ancilla set n; Cn = Couple ancilla set n with data; Dn = Decode ancilla set n; Mn = Measure ancilla set n; a separate marker denotes Move.]

FIG. 7. Parallel strategy for implementing the Steane-DiVincenzo-Aliferis algorithm with four sets of ancilla and two preparation rows. The need to reuse ancilla set one, as indicated by the red arrow, does not permit this set to be prepared for BT2 as early as possible, as indicated by the non-horizontal dashed line between BT1 and BT2.

B. Scheduled error correction: All-row preparation

Further decreases in the latency for the Steane-Shor and Steane-DiVincenzo-Aliferis algorithms can be achieved when three or more sets of ancillae are used by preparing every set in parallel. This all-row preparation allows as many stabilizer measurements as there are ancilla sets to be performed in rapid succession. However, preparing on three or more rows introduces delays between preparing the lower sets of ancillae and coupling them with the data. Here, the performance of the all-row strategy is studied for two, three, and six sets of ancillae. Four and five sets of ancilla are not considered because they are not commensurate with the total number of stabilizer measurements that must be performed.

The total latencies for the Steane-Shor and Steane-DiVincenzo-Aliferis algorithms for all-row preparation are shown in Figure 8. The latencies again decrease consistently with the addition of extra ancilla sets, due to the ability to prepare and measure ancilla sets in parallel. The biggest change from “on-demand” ancilla is that, for the Shor method, all ancilla rows are allowed to have verification. This removes the bottleneck seen in the two-row case. The parallelization advantage of Steane-Shor (moving ancilla qubits to data qubits for coupling while the verification measurement is occurring) then gives it a lower latency than DiVincenzo-Aliferis for any number of ancilla sets.
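The commensurability remark above can be checked with a few lines of arithmetic. The sketch below reads “commensurate” as “divides the six stabilizer measurements evenly,” which is our interpretation rather than a definition given in the text; the function name `all_row_rounds` is likewise hypothetical.

```python
STABILIZER_MEASUREMENTS = 6  # three bit-stabilizer plus three phase-stabilizer


def all_row_rounds(num_sets):
    """Return the number of prepare/couple/measure rounds needed when
    num_sets ancilla sets are all prepared at once, or None when num_sets
    does not divide the six stabilizer measurements evenly."""
    if num_sets < 1 or STABILIZER_MEASUREMENTS % num_sets != 0:
        return None
    return STABILIZER_MEASUREMENTS // num_sets


for x in range(1, 7):
    print(x, all_row_rounds(x))
```

Under this reading, one, two, three, and six sets need six, three, two, and one rounds respectively, while four and five sets would leave a partially used final round.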

[Figure 8 plot, titled “Total Latency (AllPxR)”: total latency (µs) versus number of rows (1 to 6), with series S-S and D-A.]

FIG. 8. Total latency for the Steane-Shor (S-S) and Steane-DiVincenzo-Aliferis (D-A) algorithms as a function of number of ancilla sets.

C. Gate time variation

The total latencies and the effectiveness of using spare ancilla sets to parallelize the algorithms vary with the gate times. For the case of two-row preparation, the latencies of the Steane-Shor and Steane-DiVincenzo-Aliferis algorithms as a function of a gate-time multiplier for various numbers of ancilla sets are shown in Figure 9(a) and (b), respectively. Latencies asymptotically approach a linear dependence on the gate time for large gate times. This is expected because the time spent on moves and measurements becomes negligible compared to the time spent on gates, specifically controlled gates.

As the gate times get larger and dominate the total latency, run time improvement is dependent on the time spent on CNOT gates. The one-set strategy for the Steane-Shor algorithm requires 30 CNOT stages to be performed, where a stage is defined as one or more overlapping CNOT gates. In contrast, the two-row preparation strategy requires only 16 CNOT stages, cutting the latency almost in half. This is because more CNOT gates can be performed in overlapping pairs or triplets. The two-row preparation limit prevents any further significant latency reduction, since additional rows have to wait for a preparation row to clear out before they are prepared. In particular, having three ancilla sets reduces Steane-Shor to 14 CNOT stages, a very minor improvement over two ancilla sets, while having four to six ancilla sets offers no additional reduction beyond three sets.

For the DiVincenzo-Aliferis algorithm in the case of two preparation rows, the time saved with two ancilla sets follows a similar trend to that achieved with the Steane-Shor algorithm. However, using more sets provides greater latency reduction since decoding is not subject to the two-row preparation limit. Similar to the Steane-Shor algorithm, the one-set strategy for the Steane-DiVincenzo-Aliferis algorithm requires 30 CNOT stages.
Two rows reduce this to 16 CNOT stages, three rows to 12 stages, four rows to 11 stages, and five or six rows to 10 stages. Six rows reduce the latency a slight additional amount due to better parallel transport. The diminishing returns in adding ancilla rows for the case of long gate times can be seen in Figure 10(a).

All-row preparation results as a function of a gate-time multiplier for various numbers of ancilla sets are shown in Figure 11(a) and (b), respectively. These times again asymptotically approach a linear dependence on the gate time for large gate times, because of the dominance of gate time in the latency. In this case, Steane-Shor and Steane-DiVincenzo-Aliferis are nearly identical in latency. Without the preparation limit, both approaches have the exact

[Figure 9: four panels, (a) to (d), plotting total running time (µs) against the gate-time multiplier (top) and the measurement-time multiplier (bottom).]

FIG. 9. Top: Total latencies for the (a) Steane-Shor and (b) Steane-DiVincenzo-Aliferis algorithms as a function of gate-time multiplier for various numbers of ancilla sets. Bottom: Total latencies for the (c) Steane-Shor and (d) Steane-DiVincenzo-Aliferis algorithms as a function of measurement-time multiplier for various numbers of ancilla sets. In all cases, preparation is limited to the top two ancilla rows.

[Figure 10, panels (a) "Total Latency with Slow CZ Gates (2P x R)" and (b) "Total Latency with Slow CZ Gates (AllP x R)": total latency (µs) versus number of rows; legend: S-S, D-A.]

FIG. 10. Latency for Steane-Shor (S-S) and Steane-DiVincenzo-Aliferis (D-A) circuits as a function of total number of ancilla rows for the case of CNOT execution time 1000 times longer than the default time. (a) Two preparation rows (2P x R); (b) ancilla can be prepared on all rows (AllP x R).

[Figure 11, panels (a)-(d): total running time (µs) versus gate time multiplier and measurement time multiplier; legend: AllP1R, AllP2R, AllP3R, AllP6R.]

FIG. 11. Top: Total latency for (a) the Steane-Shor and (b) Steane-DiVincenzo-Aliferis algorithms as a function of gate-time multiplier for various numbers of ancilla sets. Bottom: Total latency for (c) the Steane-Shor and (d) Steane-DiVincenzo-Aliferis algorithms as a function of measurement-time multiplier for various numbers of ancilla sets. In all cases, the ancilla sets are prepared at once (no limit on preparation).
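
The gate-dominated limit discussed in this subsection can be illustrated with a short Python sketch. It assumes, as a simplification that the paper does not state, that for very large gate-time multipliers each CNOT stage costs roughly one CNOT time, so that latency ratios reduce to ratios of stage counts. The stage counts are those quoted in the text for two-row preparation. All names are illustrative, and `count_cnot_stages` is a hypothetical helper that applies the stated definition of a stage (one or more overlapping CNOT gates) to a list of (start, end) times.

```python
# Sketch only: names and the "one CNOT time per stage" model are assumptions.

# CNOT stage counts quoted in the text for two-row preparation,
# keyed by the number of ancilla sets.
SHOR_TWO_ROW_STAGES = {1: 30, 2: 16, 3: 14, 4: 14, 5: 14, 6: 14}
DA_TWO_ROW_STAGES = {1: 30, 2: 16, 3: 12, 4: 11, 5: 10, 6: 10}


def count_cnot_stages(gates):
    """Count stages, where a stage is a maximal group of overlapping CNOTs.

    `gates` is an iterable of (start, end) times. Gates that merely touch
    (one ends exactly when the next starts) are counted as separate stages.
    """
    stages = 0
    stage_end = None
    for start, end in sorted(gates):
        if stage_end is None or start >= stage_end:
            stages += 1                      # no overlap: a new stage begins
            stage_end = end
        else:
            stage_end = max(stage_end, end)  # overlap: extend the current stage
    return stages


def gate_dominated_ratio(stage_counts, n_sets):
    """Latency relative to the one-set schedule in the large-gate-time limit."""
    return stage_counts[n_sets] / stage_counts[1]


if __name__ == "__main__":
    for n in range(1, 7):
        print(n,
              round(gate_dominated_ratio(SHOR_TWO_ROW_STAGES, n), 3),
              round(gate_dominated_ratio(DA_TWO_ROW_STAGES, n), 3))
```

Under this simplified model, two ancilla sets give a ratio of 16/30 (about 0.53) for both algorithms, consistent with the latency dropping almost in half, while six sets give 14/30 for Steane-Shor and 10/30 for Steane-DiVincenzo-Aliferis.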

D. Measurement time variation

The total latency and the effectiveness of using spare ancilla sets to parallelize the algorithms also vary with the measurement time. The results for the two-row-preparation Steane-Shor and Steane-DiVincenzo-Aliferis algorithms as a function of a measurement-time multiplier for various numbers of ancilla sets are shown in Figure 9(c) and (d), respectively. Results for all-row preparation are shown in Figure 11(c) and (d). As with increased gate time, the latency is dominated by measurement time for large measurement times. This is expected because the time spent on moves and gates becomes negligible compared to the time spent on measurements.

For both preparation types, the latency is reduced consistently as more sets of ancilla are used for each algorithm. As with the gate time variation, this is because more stabilizer measurements can be performed in parallel with more sets. The time saved also increases consistently as the measurement time increases. This increase indicates that more measurements are performed in parallel.

As in all other cases, Steane-Shor sees a saturation effect as the number of rows is increased. Increasing the measurement time puts more emphasis on the verification measurement bottleneck, since the ancilla-data controlled gate is not allowed to occur until after the verification measurement is completed. For extremely long measurement times and one ancilla set, Steane-Shor is effectively 12 measurement stages: one verification measurement and one ancilla measurement per stabilizer. For two ancilla sets, Steane-Shor is effectively 7 measurement stages. For three or more ancilla sets, it saturates at 4 measurement stages. For shorter measurement times, this behavior is moderated by transport and gate times, but the latency reduction still appears.

By contrast, for long measurement times, Steane-DiVincenzo-Aliferis sees a superior latency for even a single ancilla row, and a staggered decrease in latency for additional ancilla sets.
This is not surprising, since handling long measurement times was the motivation for this scheme. For a single set and extremely long measurement times, Steane-DiVincenzo-Aliferis is effectively 6 measurement stages (one measurement per stabilizer). Using the two-row preparation strategy with two ancilla sets, measurement is reduced to 3 stages. Using three, four, or five sets reduces this to 2 measurement stages, and finally six sets reduces the latency to 1 measurement stage. This behavior simply comes from dividing the total number of measurement stages (six) by the number of ancilla sets, rounding up. Figure 12(a) shows the behavior of both encodings for long measurement times (1000 times longer than the default value).

[Figure 12, panels (a) "Total Latency with Slow Measurements (2P x R)" and (b) "Total Latency with Slow Measurements (AllP x R)": total latency (µs) versus number of rows; legend: S-S, D-A.]

FIG. 12. Latency for Steane-Shor (S-S) and Steane-DiVincenzo-Aliferis (D-A) circuits as a function of total number of ancilla rows for the case of measurement execution time 1000 times longer than the default time. (a) Two preparation rows (2P x R); (b) ancilla can be prepared on all rows (AllP x R).

For the case of long measurement and all-row preparation, both Steane-Shor and Steane-DiVincenzo-Aliferis follow the same latency reduction: the algorithm is reduced to twelve or six measurement stages, respectively, which is divided by the number of ancilla sets and rounded up. In all cases, the DiVincenzo-Aliferis method is superior to the Shor method in terms of latency. This is shown in Figure 12(b).
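
The rounding rule for measurement stages can be checked with a few lines of Python. This is a minimal sketch, not the authors' scheduler: the function name `measurement_stages` and the constants are illustrative, and it models only the measurement-dominated limit in which the stage count is the total number of measurement stages divided by the number of ancilla sets, rounded up. The two-row Steane-Shor schedule does not follow this rule (the text reports 12, 7, and then 4 stages), so it is not modeled here.

```python
# Sketch only: function and constant names are illustrative assumptions.

DA_TOTAL_STAGES = 6     # one measurement per stabilizer
SHOR_TOTAL_STAGES = 12  # one verification and one ancilla measurement per stabilizer


def measurement_stages(total_stages, n_sets):
    """Effective number of sequential measurement stages: ceil(total / n_sets)."""
    if n_sets < 1:
        raise ValueError("n_sets must be at least 1")
    return -(-total_stages // n_sets)  # integer ceiling division


if __name__ == "__main__":
    for n in range(1, 7):
        print(n,
              measurement_stages(DA_TOTAL_STAGES, n),
              measurement_stages(SHOR_TOTAL_STAGES, n))
```

For one to six ancilla sets this gives 6, 3, 2, 2, 2, 1 stages for Steane-DiVincenzo-Aliferis, matching the counts quoted in the text, and 12, 6, 4, 3, 3, 2 stages for Steane-Shor with all-row preparation under the same rule.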

E. Scheduled error correction: Two-row versus all-row preparation

Figure 13 shows the total execution time and logical error rates of Steane QECC with different numbers of ancilla qubits and different scheduling schemes. To reduce the errors in syndrome measurements, we assume that we run the whole QECC circuit three times and that the final syndromes are determined by majority vote. This enables us to ignore one measurement error in a set of syndrome extractions.

As expected, it takes the longest to execute the whole QECC scheme when we keep only one set of ancilla in both Steane-Shor and Steane-DiVincenzo-Aliferis circuits. For Steane-Shor circuits, the number of preparation rows has the most influence on the execution time. Adding four more ancilla sets with 16 qubits only reduced the time by an additional 5% (2P2R → → AllP3R and 2P6R → AllP6R). The execution times of the Steane-DiVincenzo-Aliferis circuits are more sensitive to the number of available ancilla qubits. Adding four ancilla sets with the same number of preparation rows reduces the total time by 20% (2P2R → . × − . This is comparable to the error rate of a single CNOT gate, and each qubit in a Steane QECC encounters multiple CNOT gates. We also find that ancilla decoding (Steane-DiVincenzo-Aliferis) yields a substantially lower error than ancilla verification (Steane-Shor). An abstract model using Steane syndrome extraction, instead of Shor syndrome extraction, also showed a fidelity improvement when decoding was used instead of verification [25].

Taken in total, the results suggest that increasing the number of simultaneous ancilla preparations has the greatest impact on QEC run times, without adversely affecting the QEC error rate. They also show that the Steane-DiVincenzo-Aliferis algorithm is equivalent or superior to the Steane-Shor algorithm, particularly for long measurement times, given the error model presented in this paper. This is attributable to reducing (for simultaneous preparation) or removing (for DiVincenzo-Aliferis) the ancilla verification bottleneck after preparation.

[Figure 13, panels "Steane Shor" and "Steane DiVincenzo-Aliferis": % time from baseline for each schedule (baseline, two-row-preparation, and all-row-preparation schedules); panel "Error Rates": failure rate for each schedule; legend: Shor, DiVincenzo-Aliferis.]

FIG. 13. Steane-Shor and Steane-DiVincenzo-Aliferis execution time and error rates. The baseline schedule keeps only one set of ancillae and re-uses it by preparing it six times. We assume that the whole set of syndrome extractions is repeated three times to reduce the measurement errors. Execution times are shown relative to the baselines, which are 5 . × µs for the Steane-Shor circuit and 5 . × µs for the Steane-DiVincenzo-Aliferis circuit.

V. CONCLUSION

We examined changes in execution time and logical error rates of Steane QECC by varying the number of ancilla qubits and how they are scheduled on the ion trap architecture. We identified possible resource bottlenecks and opportunities for parallelism in preparing blocks of Shor ancilla. After studying both standard Shor and DiVincenzo-Aliferis ancilla for a variety of multiple ancilla set preparations, we found that one-time ancilla preparation was superior to on-demand preparation. This is attributed to the time-intensive process of ancilla preparation driving QEC latency. On-demand ancilla preparation limits the speed of the QEC round to the speed of sequential ancilla preparations, particularly for verification schemes like Steane-Shor.

In comparing the Steane-Shor and Steane-DiVincenzo-Aliferis latencies, we found that for the case of a single ancilla set with roughly equivalent gate, measurement, and transport times, Steane-Shor has slightly lower latency. This is due to the ability to perform parallel operations on the verification and ancilla qubits. This also holds true for multiple ancilla sets with one-time preparation. For on-demand preparation, verification becomes a bottleneck, significantly slowing down Steane-Shor compared to Steane-DiVincenzo-Aliferis as ancilla sets are added. When gate times are increased, Steane-Shor and Steane-DiVincenzo-Aliferis become effectively identical, as they both have the same number of parallel CNOT operations. As measurement times are increased, Steane-DiVincenzo-Aliferis shows a much lower latency, as expected.

The results presented are based on an ion trap description with optimistic error rates, but pessimistic gate and movement times. The long times required for error correction in this paper could be improved in a number of ways. For example, by using ultrafast lasers, single-qubit gate times as fast as 50 ps have been achieved [37], and two-qubit gate times can in principle be considerably improved [38, 39]. Transport in the model is limited to 1 m/s, but recent experiments have shown that a transport speed of 40-80 m/s can be achieved while still controlling the quantum states of the ion motion [40, 41].

In the future, we plan to extend this analysis to other quantum error-correcting codes, including non-CSS codes. We can also apply the same methods to study the performance of various quantum architectures and to determine whether there exists an affinity between certain devices and types of codes. The QMP method is flexible enough to handle a wide array of qubit architectures and couplings. Future improvements to the QCFT method will allow us to approximate error rates for circuits beyond those generated by Clifford operators.

ACKNOWLEDGMENTS

This work was supported by the Office of the Director of National Intelligence - Intelligence Advanced Research Projects Activity through Department of Interior contract D11PC20167. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

[1] K. M. Svore, D. P. DiVincenzo, and B. M. Terhal, Quantum Information & Computation , 297 (2007).
[2] R. Raussendorf and J. Harrington, Phys. Rev. Lett. , 190504 (2007).
[3] A. G. Fowler, A. M. Stephens, and P. Groszkowski, Phys. Rev. A , 052312 (2009).
[4] A. W. Cross, Master's thesis, Massachusetts Institute of Technology, Cambridge, MA (2005).
[5] D. Aharonov, M. Ben-Or, R. Impagliazzo, and N. Nisan, arXiv:quant-ph/9611028v1 (1996).
[6] A. M. Steane, Phys. Rev. Lett. , 793 (1996).
[7] D. P. DiVincenzo and P. W. Shor, Phys. Rev. Lett. , 3260 (1996).
[8] Y. S. Weinstein, Phys. Rev. A , 012323 (2011).
[9] Y. S. Weinstein and S. D. Buchbinder, Phys. Rev. A , 052336 (2012).
[10] D. P. DiVincenzo and P. Aliferis, Phys. Rev. Lett. , 020501 (2007).
[11] D. Kielpinski, C. Monroe, and D. J. Wineland, Nature , 709 (2002).
[12] S. A. Schulz, U. Poschinger, F. Ziesel, and F. Schmidt-Kaler, New Journal of Physics , 045007 (2008).
[13] F. Splatt, M. Harlander, M. Brownnutt, F. Zähringer, R. Blatt, and W. Hänsel, New Journal of Physics , 103008 (2009).
[14] J. M. Amini, H. Uys, J. H. Wesenberg, S. Seidelin, J. Britton, J. J. Bollinger, D. Leibfried, C. Ospelkaus, A. P. VanDevender, and D. J. Wineland, New Journal of Physics , 033031 (2010).
[15] M. D. Hughes, B. Lekitsch, J. A. Broersma, and W. K. Hensinger, Contemporary Physics , 505 (2011).
[16] D. L. Moehring, C. Highstrete, D. Stick, K. M. Fortier, R. Haltli, C. Tigges, and M. G. Blain, New Journal of Physics , 075018 (2011).
[17] R. B. Blakestad, C. Ospelkaus, A. P. VanDevender, J. H. Wesenberg, M. J. Biercuk, D. Leibfried, and D. J. Wineland, Phys. Rev. A , 032314 (2011).
[18] J. T. Merrill, C. Volin, D. Landgren, J. M. Amini, K. Wright, S. C. Doret, C.-S. Pai, H. Hayden, T. Killian, D. Faircloth, et al., New Journal of Physics , 103005 (2011).
[19] S. C. Doret, J. M. Amini, K. Wright, C. Volin, T. Killian, A. Ozakin, D. Denison, H. Hayden, C.-S. Pai, R. E. Slusher, et al., New Journal of Physics , 073012 (2012).
[20] K. Wright, J. Amini, D. Faircloth, C. Volin, S. Doret, H. Hayden, C.-S. Pai, D. Landgren, D. Denison, T. Killian, et al., New Journal of Physics , 033004 (2012).
[21] N. Isailovic, M. Whitney, Y. Patel, and J. Kubiatowicz, SIGARCH Comput. Archit. News , 177 (2008).
[22] T. S. Metodi, D. D. Thaker, A. W. Cross, F. T. Chong, and I. L. Chuang, in MICRO-38: Proc. 38th Annual IEEE/ACM Int. Symp. on Microarchitecture (2005), pp. 305.
[23] C. R. Clark, T. S. Metodi, S. D. Gasster, and K. R. Brown, Phys. Rev. A , 062314 (2009).
[24] C. Monroe, R. Raussendorf, A. Ruthven, K. R. Brown, P. Maunz, L.-M. Duan, and J. Kim, arXiv:quant-ph/1208.0391 (2012).
[25] A. Abu Nada, B. Fortescue, and M. Byrd, arXiv:quant-ph/1303.4026 (2013).
[26] P. J. Salas and A. L. Sanz, Phys. Rev. A , 052322 (2004).
[27] A. M. Steane and B. Ibinson, Phys. Rev. A , 052335 (2005).
[28] K. M. Svore, A. V. Aho, A. W. Cross, I. Chuang, and I. L. Markov, Computer , 74 (2006).
[29] T. S. Metodi, D. Thaker, A. W. Cross, F. T. Chong, and I. L. Chuang, in Proc. SPIE Vol. 5815, p. 91 (2005).
[30] A. M. Steane, Phys. Rev. A , 042322 (2003).
[31] The ion trap PMD is part of the government furnished information that IARPA has provided for official use only. The necessary data to reproduce the results are contained in this paper.
[32] M. A. Rowe, A. Ben-Kish, B. Demarco, D. Leibfried, V. Meyer, J. Beall, J. Britton, J. Hughes, W. M. Itano, B. Jelenkovic, et al., Quantum Information & Computation , 257 (2002).
[33] J. P. Home and A. M. Steane, Quantum Information & Computation , 289 (2006).
[34] N. Nilsson, Problem Solving Methods in Artificial Intelligence (McGraw Hill, 1971).
[35] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving (Addison Wesley, 1984).
[36] P. Aliferis, D. Gottesman, and J. Preskill, Quantum Information & Computation , 97 (2006).
[37] W. C. Campbell, J. Mizrahi, Q. Quraishi, C. Senko, D. Hayes, D. Hucul, D. N. Matsukevich, P. Maunz, and C. Monroe, Phys. Rev. Lett. , 090502 (2010).
[38] J. J. Garcia-Ripoll, P. Zoller, and J. I. Cirac, Phys. Rev. Lett. , 157901 (2003).
[39] C. D. B. Bentley, A. R. R. Carvalho, D. Kielpinski, and J. J. Hope, New Journal of Physics , 043006 (2013).
[40] R. Bowler, J. Gaebler, Y. Lin, T. R. Tan, D. Hanneke, J. D. Jost, J. P. Home, D. Leibfried, and D. J. Wineland, Phys. Rev. Lett. , 080502 (2012).
[41] A. Walther, F. Ziesel, T. Ruster, S. T. Dawkins, K. Ott, M. Hettrich, K. Singer, F. Schmidt-Kaler, and U. Poschinger, Phys. Rev. Lett. 109