Halving the width of Toffoli based constant modular addition to n+3 qubits
Oumarou Oumarou, Alexandru Paler, and Robert Basmadjian
Clausthal University of Technology, 38678 Clausthal-Zellerfeld, Germany
University of Texas at Dallas, Richardson, TX 75080, USA
Transilvania University, 500036 Brașov, Romania
Johannes Kepler University, 4040 Linz, Austria
We present an arithmetic circuit performing constant modular addition having $O(n)$ depth of Toffoli gates and using a total of $n + 3$ qubits. This is an improvement by a factor of two compared to the width of the state-of-the-art Toffoli-based constant modular adder. The advantage of our adder, compared to the ones operating in the Fourier basis, is that it does not require small-angle rotations and their Clifford+T decomposition. Our circuit uses a recursive adder combined with the modular addition scheme proposed by Vedral et al. The circuit is implemented and verified exhaustively with QUANTIFY, an open-sourced framework. We also report on the Clifford+T cost of the circuit.

I. INTRODUCTION
Arithmetic quantum circuits, namely adders, modular adders and multipliers, are an integral part of the implementation of practical quantum algorithms. In practice, adders, e.g. [2–4, 8], are building blocks of more complex functions, such as the modular exponentiation needed for Shor's algorithm [1, 6, 7]. Optimising adders as a sub-circuit can eventually benefit the entire circuit, due to the intertwined relation that exists between the arithmetic operations. There are also quantum arithmetic circuits developed specifically for Shor's algorithm, such as [1, 7]. These exploit the fact that one of the inputs is an integer which does not require quantum storage.

Modular addition is a fundamental operation in Shor's algorithm. The circuit for modular addition takes three integer inputs, $a$, $b$ and the ring size $N$, and outputs $a + b$ or $a + b - N$ depending on whether $a + b < N$ or not. In [7], the presented modular adder is composed of two comparators and an adder. The comparators in the modular adder circuit use approximately $2n$ qubits (i.e. width), but the adder only uses around $n$ qubits. Hence, the width of the entire circuit ($\sim 2n$) is dictated by the comparators.

In [13], no comparators per se are used, but the width of the modular adder is even greater than that of the circuit described above, because the adder alone, disregarding the other employed registers, needs $3n + 1$ qubits.

Modular adders of low depth and using $O(n)$ qubits are known, but use the QFT [3] approach. The QFT uses controlled rotation gates, and the angles are of the form $e^{i\pi k/2^{N}}$, where the maximum value of $N$ is $n$ for addition on $n$ qubits. The rotation angles get smaller with increasing number of qubits. When error-correcting such circuits, the controlled rotation gates have to be decomposed into Clifford+T.
The decomposition procedure introduces on the order of a hundred T gates per rotation gate [12] (the exact number depends on the precision of the approximation), such that QFT modular addition is not necessarily resource efficient when error-corrected.

The main contribution of this work is a method that performs constant modular addition using the adder from [7] while bypassing the need for comparators. Inspired by the modular addition method from [13], we combine it with the adder from [7] to yield a modular adder with a linear depth of $O(n)$ and a qubit width of $n + 3$. Consequently, the width of constant modular addition is halved and reduced very close to its minimum of $n$ wires.

The rest of this paper is organised as follows. In Section II, we present the circuits used to construct our modular adder. In Section III, we present the design method and steps for our circuit. Finally, in Section IV we investigate different scenarios for the decomposition into Clifford+T, and analyse and compare them.

II. PRELIMINARIES
In the following, we review the construction of the recursive adder from [7], and the modular addition method from [13].
A. The Incrementer and the Carry Gates
The Incrementer is a circuit that adds one to the value of an integer stored in a quantum register. Incrementation can be achieved with the help of a quantum adder with $|a\rangle$ as the operand to be incremented, while the value of the second operand $|g\rangle$ is irrelevant (garbage register). The second register is used in order to perform a trick based on the two's complement representation. We denote by $\tilde{g}$ the bitwise negation of the number $g$. For numbers represented in two's complement, $\tilde{g} = -g - 1$, such that $g + \tilde{g} = -1$. Thus, the value $a$ can be incremented by computing $a - g - \tilde{g}$. In terms of a quantum circuit, the incrementation procedure is the following:
• Subtract $g$ from $a$: $|a\rangle |g\rangle \longrightarrow |a - g\rangle |g\rangle$
• Flip all the qubits of the garbage register $g$: $|a - g\rangle |g\rangle \longrightarrow |a - g\rangle |\tilde{g}\rangle$
• Subtract $\tilde{g}$ from $a - g$: $|a - g\rangle |\tilde{g}\rangle \longrightarrow |a - g - \tilde{g}\rangle |\tilde{g}\rangle = |a - g + g + 1\rangle |\tilde{g}\rangle$
• Restore the garbage register to its original state: $|a + 1\rangle |\tilde{g}\rangle \longrightarrow |a + 1\rangle |g\rangle$

Figure 1. The Incrementer. There are two quantum registers: $|a\rangle$ stores the operand, $|g\rangle$ is a garbage register. $Add^{-1}$ is the subtraction operation implemented using the inverse of an addition circuit. $X^{n}$ is $n$ $X$ gates applied on the garbage register.

The Carry gate proposed in [7] determines the most significant bit of a sum of two integers. The circuit uses a classical $n$-bit constant $c$, an $n$-qubit register $|a\rangle$ which stores the first operand, a garbage register of $n - 1$ qubits, and an ancilla initialised to $|0\rangle$. The content of the garbage register is irrelevant, but it will be used during the computation.
The Carry gate stores the $n$-th bit of the sum $(a + c)$ in the third register:

$|a\rangle |g\rangle |0\rangle \xrightarrow{\text{CarryGate}} |a\rangle |g\rangle |(a + c)_n\rangle$

B. The Recursive Adder
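Both building blocks used by the recursive adder, the Incrementer and the Carry gate of Section II A, act on computational basis states like ordinary integer arithmetic, so their behaviour can be sketched classically. The following Python fragment is only an illustration under that assumption and is not the QUANTIFY implementation; the function names `increment_with_garbage` and `carry_bit` are hypothetical, and registers are modelled as unsigned integers modulo $2^n$.

```python
def increment_with_garbage(a, g, n):
    """Return (a + 1 mod 2**n, g) using only subtractions and bit flips."""
    mask = (1 << n) - 1      # registers hold n bits, so arithmetic is modulo 2**n
    a = (a - g) & mask       # subtract g from a
    g = ~g & mask            # flip every bit of the garbage register
    a = (a - g) & mask       # subtract the flipped garbage: a - g - ~g = a + 1
    g = ~g & mask            # flip again to restore the garbage register
    return a, g


def carry_bit(a, c, n):
    """Return bit n of a + c, i.e. the carry out of an n-bit addition."""
    return ((a + c) >> n) & 1


if __name__ == "__main__":
    n = 4
    # The garbage value never influences the result and is always restored.
    for a in range(1 << n):
        for g in range(1 << n):
            assert increment_with_garbage(a, g, n) == ((a + 1) % (1 << n), g)
    print(carry_bit(0b1001, 0b1000, n))
```

The exhaustive loop checks that the increment holds for every garbage value, which is the property that lets the quantum circuit borrow qubits in an unknown state.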
The recursive adder from [7] computes the sum of two integers: the quantum register $|a\rangle$ and the classical constant $c$. This adder uses mainly two sub-circuits: the Carry gate and the Incrementer of Section II A. The inputs to the adder are a quantum register of size $n + 1$, a garbage qubit $|g\rangle$ and a classical constant $c$. It outputs the sum $|a + c\rangle |g\rangle$.

The recursive adder is an in-place adder: the construction uses the fact that the sum bit at position $m$ depends on the carry bit generated by the $m - 1$ bits before it. For $m$ being the middle of the $|a\rangle$ bitstring, $m = n/2$, and knowing the carry bit from the first half of the bits (use the Carry gate), the second half of the bits can be treated as a separate number which is just incremented (use the Incrementer). For simplicity of demonstration, we consider the garbage qubit to be initialised to $|0\rangle$. Later we present the general concept with an arbitrary value of $|g\rangle$. The addition procedure is performed as:
1. Split the register $|a\rangle = |a_H\rangle |a_L\rangle$, where $a_H$ and $a_L$ are respectively the higher and lower halves of the binary representation of $a$. Split the constant $c$ in the same manner as $|a\rangle$.
2. Apply the Carry gate to $|a_L\rangle$ and $c_L$, using $|a_H\rangle$ as the garbage register and the garbage qubit $|g\rangle$ to store the carry bit of $a_L + c_L$.
3. Use the garbage qubit $|g\rangle$ to control whether the upper half $a_H$ should be incremented or not. If $|g\rangle = |1\rangle$, then we have a carry bit from $a_L + c_L$ and it should be added to $a_H$. Hence the Incrementer should be applied. Otherwise, we do not apply it.
4. To reset the carry qubit to $|0\rangle$, reapply the Carry gate.
5. Recursively apply the previous three steps to $a_L$ and $a_H$.

The upper part of Figure 2 illustrates the recursion process for the case of a 4-bitstring $|a\rangle$.
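Steps 1 to 5 can be traced on classical integers. The sketch below is a minimal illustration, assuming the garbage qubit starts in $|0\rangle$ and the register holds exactly $n$ bits (so the sum is taken modulo $2^n$); the names `recursive_add` and `carry_bit` are hypothetical and the Carry gate and Incrementer are replaced by their classical effect rather than by circuits.

```python
def carry_bit(a, c, n):
    """Classical stand-in for the Carry gate: bit n of a + c."""
    return ((a + c) >> n) & 1


def recursive_add(a, c, n):
    """Return (a + c) mod 2**n via the split / carry / increment recursion."""
    if n == 1:
        return (a ^ c) & 1                    # one-bit addition is a NOT applied when c = 1
    low = n // 2                              # size of the lower half a_L
    high = n - low                            # size of the upper half a_H
    low_mask, high_mask = (1 << low) - 1, (1 << high) - 1
    a_l, a_h = a & low_mask, (a >> low) & high_mask   # step 1: split a
    c_l, c_h = c & low_mask, (c >> low) & high_mask   # step 1: split c
    g = 0                                     # garbage qubit, assumed |0>
    g ^= carry_bit(a_l, c_l, low)             # step 2: carry of a_L + c_L into g
    a_h = (a_h + g) & high_mask               # step 3: increment a_H controlled on g
    g ^= carry_bit(a_l, c_l, low)             # step 4: uncompute the carry
    assert g == 0                             # the garbage qubit is restored
    a_l = recursive_add(a_l, c_l, low)        # step 5: recurse on the lower half
    a_h = recursive_add(a_h, c_h, high)       # step 5: recurse on the upper half
    return (a_h << low) | a_l


if __name__ == "__main__":
    n = 5
    assert all(recursive_add(a, c, n) == (a + c) % (1 << n)
               for a in range(1 << n) for c in range(1 << n))
```

Note that the carry is computed from the untouched lower half both times, which is why reapplying the same operation in step 4 clears $|g\rangle$.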
In Figure 2, the leftmost part (the big box) represents the entire 4-bitstring, which is divided into two 2-bitstrings (the two middle boxes). Those are in turn divided into four 1-bitstrings (the four small boxes on the right). Note that the recursion stops when one $n$-bitstring is subdivided into $n$ […] $|a_H\rangle$ controlled by $|g\rangle$ a set of CNOT gates also targeting $|a_H\rangle$ and controlled by $|g\rangle$ from the left.
• Right: another set of CNOT gates.

This construction works because if the initial garbage is $|0\rangle$, the circuit is just as listed before. For $|g\rangle = |1\rangle$, the first Incrementer generates $|g + 1\rangle$, and the bitwise negation results in $|\tilde{g}\rangle = -(g + 1) - 1 = -g - 2$. There are two options: a) the first Carry flips $|g\rangle$ such that the lowest bit is $|0\rangle$, the Incrementer and the second Carry are not called, and the second negation returns the state to $|g + 1\rangle$; b) the first Carry does not flip $|g\rangle$, the controlled Incrementer is called such that the state is $|-g - 1\rangle$, the second Carry is not called and the final bit flips result in $|\widetilde{-g - 1}\rangle = g + 1 - 1 = g$.

C. Modular Addition
The intuitive way of constructing a modular adder is to use a Comparator gate to test the sum of the two operands against the maximum value representable in the ring. Based on the comparison result, we either only add the two operands, or, in case there was an overflow, subtract $N$ from the sum. This approach to modular addition, used by [7], requires $2n + O(1)$ wires because of the comparisons.

Compared to the intuitive approach from [7], the modular adder in [13] has higher depth and a $4n + O(1)$ width. However, the modular addition approach is general and not tied to a particular adder design: the adder can be replaced. In Section III, we use in-place recursive adders for the additions from Figure 3. Also, two of the inputs, accounting for $2n$ qubits, can be replaced by classical values in the case of constant addition.

We successfully designed a modular adder that uses only $n + 3$ qubits, which roughly halves the width compared to a state-of-the-art modular adder like the one in [7], while simultaneously maintaining the linearity in the depth.

Figure 2. Recursive adder circuit design [7]. The registers $|a_L\rangle$, $|a_H\rangle$ are the lower and upper halves of the input integer $a$ respectively, and $|g\rangle$ is the garbage qubit used as a control for the Incrementer and the $X$ gates and as a result qubit for the Carry gates. The triangles point to the used garbage qubits.

Figure 3. Modular adder circuit design used in [13]. $a$, $b$ are the two integers and $N$ is the size of the ring. $|a\rangle$, $|b\rangle$ are quantum registers of size $n$ and $n + 1$ respectively. The addition and subtraction are performed using the adder in [13], hence requiring another quantum register of size $n$ for carry bits, which is not depicted here.

III. METHODS
In [7], the Comparator circuits are the culprit behind the $2n$ width of modular addition, although the modular adder uses only approximately half of the wires for the addition/subtraction. At the same time, the modular adder from [13] does not use Comparator circuits, and is agnostic of how the adders are implemented. We use a recursive adder of width $n + 2$ from [7] (in Figure 2, $|a\rangle$ is of size $n + 1$ and there is an additional ancilla $g$ of size 1 qubit) to implement the modular addition. The advantage is that we halve the width because we eliminate the need for Comparator circuits.

The Comparator in [7] is implemented by applying their Carry circuit, which uses $2n + O(1)$ qubits, out of which $n$ are garbage. When implementing recursive addition, the doubling of qubits is not an issue, because in the adder half of the $n$ qubits are garbage for the other half (cf. Figure 2). However, for the Comparator circuit this approach is not efficient, because the Carry has to consider all $n$ qubits. Although one of the modular adder diagrams in [9] shows $n$ wires, internally the adder uses $n$ ancillae for a total width of $2n$.

It is possible to implement constant modular addition without a Comparator circuit and the corresponding ancillae [13]. We implement the recursive adder using: a) the Carry gate from [7], and b) an Incrementer based on the linear-depth controlled adder (CtrlAdd) from [8]. The original CtrlAdd circuit has a width of $2n + 3$, and we can cut two of the ancillae because the size of the input register $|a\rangle$ guarantees that the incrementation never overflows. The CtrlAdd circuit will be applied to only half of the bits from the recursive adder, and the other half are used as garbage [8] (see Figure 2).

The original circuit from [13] (Figure 3) has width $4n + O(1)$, of which $3n + 1$ qubits are used to store $a$, $b$ and $N$ (which were not hardwired).
Implementing constant modular addition requires a single $n$-qubit quantum register, namely $|a\rangle$, while $c$ (we use $c$ instead of $b$ to highlight that it is a constant) and the size of the ring, $N$, are classical values. The modular addition is performed in the following steps (Figure 4):
1. Add $a$ and $c$: $|a\rangle |g\rangle |0\rangle \xrightarrow{add(a,c)} |a + c\rangle |g\rangle |0\rangle$
2. Subtract $N$ from the previous sum, by running the recursive adder in inverse with $a + c$ and $N$ as the integer and classical constant inputs respectively.
3. If the flag bit (the most significant bit) of the result of the previous subtraction equals 1, then the result is negative and we need to re-add $N$. Otherwise, if it is positive, we leave it as is.
4. Reset the flag qubit to its original state. Subtract $c$ from $a + c \bmod N$. If the most significant bit of the result is 0 (positive), then the flag qubit should be flipped.
5. Add $c$ to the result to recover $a + c \bmod N$.

Figure 4. Registers $a$ (quantum) and $c$ (classical) are the two operands. $N$ is the ring size. $|a\rangle$ is the quantum register holding $a$ at the beginning of the circuit. The size of $|a\rangle$ is $n + 1$ because it will hold the sum $a + c$. $|g\rangle$ is a single garbage qubit used in the recursive adder, and $|0\rangle$ marks the qubit controlling the addition/subtraction operations.

After the second step the state is $|a + c - N\rangle |g\rangle |0\rangle$. The state of the most significant qubit (MSB) of $|a + c - N\rangle$ indicates whether it is positive or negative. During the third step, if $a + c - N$ is indeed non-negative, then its MSB is $|0\rangle$ and $a + c - N = a + c \bmod N$. On the other hand, if $a + c - N < 0$, the MSB is $|1\rangle$ and we should re-add the constant $N$. To implement both conditions in the circuit, we apply a CNOT gate between the MSB and the flag qubit, which is initialised to $|0\rangle$. As a result, the flag equals $|0\rangle$ if $a + c - N \geq 0$ and equals $|1\rangle$ in the other case. During the third step, we hence apply the recursive adder controlled by the flag qubit, with $a + c - N$ and $N$ as operands.

The fourth step resets the lowest qubit in Figure 4, the flag qubit, to its original state $|0\rangle$.
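The five steps listed above can be followed on classical integers to see why the flag qubit ends in its initial state. The sketch below is illustrative only and rests on stated assumptions: $0 \leq a, c < N \leq 2^n$, an $(n + 1)$-bit register interpreted in two's complement, and plain modular arithmetic standing in for each invocation of the recursive adder. The function name `constant_mod_add` is hypothetical.

```python
def constant_mod_add(a, c, N, n):
    """Return (a + c) mod N for 0 <= a, c < N <= 2**n, tracking the flag bit."""
    mask = (1 << (n + 1)) - 1            # the register |a> has n + 1 qubits

    def msb(x):
        return (x >> n) & 1              # sign bit of the (n + 1)-bit register

    flag = 0                             # flag qubit, initialised to |0>
    r = (a + c) & mask                   # step 1: add c
    r = (r - N) & mask                   # step 2: subtract N (inverse adder)
    flag ^= msb(r)                       # CNOT from the MSB onto the flag
    if flag:                             # step 3: flag-controlled re-addition of N
        r = (r + N) & mask
    r = (r - c) & mask                   # step 4: subtract c
    flag ^= 1 ^ msb(r)                   # flip the flag when the result is non-negative
    r = (r + c) & mask                   # step 5: add c back
    assert flag == 0                     # the flag qubit is restored
    return r


if __name__ == "__main__":
    n = 4
    assert all(constant_mod_add(a, c, N, n) == (a + c) % N
               for N in range(1, (1 << n) + 1)
               for a in range(N) for c in range(N))
```

After step 3 the flag is 1 exactly when $a + c < N$, and in that case the subtraction of $c$ yields the non-negative value $a$; otherwise it yields $a - N < 0$. This is why the sign of the step 4 result suffices to uncompute the flag.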
To perform the reset of the fourth step, we subtract $c$ from $a + c \bmod N$ by applying the inverse of the recursive adder. If the result is positive, the flag qubit will be flipped: apply a CNOT between the most significant qubit of the result $(a + c \bmod N) - c$ and the flag qubit.

Effectively, the constant modular adder using the method from [13] has the width of the adder used as a component. Using the recursive addition circuit, the total width of the resulting modular adder is $n + 3$. The input $|a\rangle$ is $n + 1$ qubits wide in order to store the carry. There are also two ancillae: 1) an ancilla to control the incrementation procedures throughout the recursive addition operation, 2) the flag qubit from within the modular adder.

IV. RESULTS AND RESOURCE ANALYSIS
The presented modular adder was implemented in QUANTIFY [10], which is open-sourced and available at https://github.com/quantumresource/quantify . We exhaustively tested the compiled Toffoli formulation of the adder using the Toffoli circuit simulator from [5]. Herein we focus on the Clifford+T cost of the adder, because the Toffoli gate decomposition influences the resource efficiency of the compiled circuit. The resource analysis was implemented with QUANTIFY, too.

To determine the depth of our modular adder, we first need to determine the depth of the Incrementer and the recursive adder. For the Incrementer, let $D_{Tf}$ denote the depth of a decomposed Toffoli, and $A$ represent the number of ancillae. Then for $n >$ […] we have the following [D]epth and [W]idth:

$D_{Inc}(n) = (6n - [\ldots])\, D_{Tf} + 2n - [\ldots]$
$W(n) = 2n + 1 + A$

The recursive adder is built with two components, namely the Incrementer and the Carry gate. Unlike the Incrementer, the depth of the Carry gate, and consequently of the recursive adder, depends on the value of the constant $c$. In the following, we study the worst-case scenario that yields the maximum depth, which corresponds to $c$ being equal to $2^{n} - 1$. Using the same notation as in the previous equation, the depth of the Carry gate for $n >$ […] is given by:

$D^{max}_{carry}(n) = \begin{cases} (4n - [\ldots])\, D_{Tf} + 4n - [\ldots], & \text{if } n \geq 3 \\ [\ldots], & \text{if } n = 2 \\ 1, & \text{if } n = 1 \end{cases}$

Hence the depth of the recursive adder equals:

$D_{RA} = 2 \sum_{i=1}^{\log(n)} \left( D_{Inc}(n_i) + D^{max}_{carry}(n_i) + 2 \right)$
$W = n + 2$

Lastly, our modular adder, being composed of four recursive adders, one controlled version (which has the same depth but two more Toffoli gates that replace two CNOT gates) and two CNOT gates, has the following depth:

$D_{HBA} = 4 \times D_{RA} + D_{CRA} + 2$    (1)

with $D_{CRA}$ being the depth of the controlled version, equal to $D_{CRA} = (4n - [\ldots])\, D_{Tf} + 4n - [\ldots]$.

The adder is of the ripple-carry type, and all the Toffoli gates are sequential.
When using a Toffoli decomposition (see Appendix) which requires ancillae, only a constant number of ancillae would be needed, since they can be reused for the rest of the Toffoli gates. The same applies to the Carry gate.

V. CONCLUSION
We designed a constant modular adder that uses only $n + 3$ qubits, which roughly halves the width compared to the state-of-the-art constant modular adder from [7], while simultaneously maintaining the linearity in the depth. We conjecture that the addition method from [13] is a generalisation of the incrementation trick (Section II A). We implemented the new constant modular adder in QUANTIFY [10], which is an open-sourced framework. Moreover, using this framework, we compiled the circuits using Toffoli gates and exhaustively verified their correctness. Future work is to use our modular addition for improving circuits from [11].

The proposed adder is useful for implementing, for example, quantum random walks or any other computation where state changes are a function of a constant. Our adder will also be useful for verifying very large quantum circuits that include constant modular addition. The size of the state vectors and of the overall matrix representing the circuit is reduced approximately to a half and a quarter when compared to [7] and [13] respectively. Such a reduction will yield a quadratic and quartic speedup, respectively, in the simulation of the quantum circuits.

[1] Stephane Beauregard. Circuit for Shor's algorithm using 2n+3 qubits. Quantum Information & Computation, 3(2):175–185, 2003.
[2] Steven A Cuccaro, Thomas G Draper, Samuel A Kutin, and David Petrie Moulton. A new quantum ripple-carry addition circuit. arXiv preprint quant-ph/0410184, 2004.
[3] Thomas G Draper. Addition on a quantum computer. arXiv preprint quant-ph/0008033, 2000.
[4] Thomas G Draper, Samuel A Kutin, Eric M Rains, and Krysta M Svore. A logarithmic-depth quantum carry-lookahead adder. Quantum Information & Computation, 6(4):351–369, 2006.
[5] Casey Duckering. Cirq Toffoli circuit simulator. https://github.com/cduck/cirqtools/blob/master/cirqtools/classical_simulator.py .
[6] AG Fowler, SJ Devitt, and LCL Hollenberg. Implementation of Shor's algorithm on a linear nearest neighbour qubit array. Quantum Inf. Comput., 4(quant-ph/0402196):237–251, 2004.
[7] Thomas Häner, Martin Roetteler, and Krysta M Svore. Factoring using 2n+2 qubits with Toffoli based modular multiplication. Quantum Information & Computation, 17(7-8):673–684, 2017.
[8] Edgard Muñoz-Coreas and Himanshu Thapliyal. Quantum circuit design of a T-count optimized integer multiplier. IEEE Transactions on Computers, 68(5):729–739, 2018.
[9] Kento Oonishi, Tomoki Tanaka, Shumpei Uno, Takahiko Satoh, Rodney Van Meter, and Noboru Kunihiro. Efficient Construction of a Control Modular Adder on a Carry-Lookahead Adder Using Relative-phase Toffoli Gates. arXiv preprint arXiv:2010.00255, 2020.
[10] Oumarou Oumarou, Alexandru Paler, and Robert Basmadjian. QUANTIFY: A framework for resource analysis and design verification of quantum circuits. In […], pages 126–131. IEEE, 2020.
[11] Rich Rines and Isaac Chuang. High performance quantum modular multipliers. arXiv preprint arXiv:1801.01081, 2018.
[12] Neil J. Ross and Peter Selinger. Optimal ancilla-free Clifford+T approximation of z-rotations. Quantum Info. Comput., 16(11–12):901–953, September 2016.
[13] Vlatko Vedral, Adriano Barenco, and Artur Ekert. Quantum networks for elementary arithmetic operations. Physical Review A, 54(1):147, 1996.

Figure 5. The 4AT1 Toffoli decomposition offers better depth than the 0AT3 decomposition.
APPENDIX
Figure 5 illustrates the depth of the modular adder when using the 4AT1 (Figure 7) and the 0AT3 (Figure 6) decompositions. The four ancillae from 4AT1 are reused for all the Toffoli gates of the Incrementer, such that the overall area is better when using 4AT1.

Concerning the recursive adder within the modular adder, three scenarios exist regarding the decomposition of the Toffoli gates. We consider the $O(n \log(n))$-depth sequential recursive adder and the $O(n)$-depth parallel recursive adder:
1. Using the sequential adder and without any parallelism: all Toffoli gates are decomposed using 4AT1 such that only 4 ancillae are introduced and can be reused for the recursive addition sub-circuits.
2. Using the parallel adder and maintaining the parallelism: we choose the 4AT1 decompositions and the parallel recursive addition sub-circuits use different ancillae, for a total of $4 \times n/4 = n$, because the leaves do not have Toffoli gates. (There are n/2 additions of single bits, n/4 additions of 2 bits, etc. Single bit additions use CNOTs. Toffoli gates are used only starting with the 2 bit additions.)
3. Using the parallel adder without introducing ancillae: the Toffoli gates are decomposed using 0AT3. The depth increases by a factor of […]/[…] if parallel CNOTs are allowed, otherwise the depth is actually reduced by […]/[…] ≈ […].

The gate count of the recursive adder circuit is in $O(n \log(n))$. If gate parallelism is allowed, the depth can be reduced to $O(n)$, with two options (cf. Fig. 2):
1. With $n$ ancillae. Introduce as many ancillae as necessary to use them as garbage qubits and hence parallelise the sub-circuit blocks in each recursion. This would mean that $n$ ancillae are added in total.
2. Without ancillae. Without loss of generality, we first execute the sub-circuit blocks only on the lower half $|a_L\rangle$. We then use the qubits of the upper half $|a_H\rangle$ as garbage qubits to parallelise the sub-circuit blocks in each recursion round. Once the recursion is finished on the lower half $|a_L\rangle$, we execute the sub-circuit blocks on the upper half $|a_H\rangle$ using $|a_L\rangle$ as garbage qubits for the same purpose.

Figure 6. The 0AT3 Toffoli gate decomposition uses no ancillae and has T-depth 3.

Figure 7. The 4AT1 Toffoli gate decomposition uses four ancillae and has T-depth 1.
Without parallelism, the area (depth × width) of the adder is definitely worse than with parallelism. We have an $O(n \log(n))$ depth and still a linear width, even though no ancillae were introduced with 0AT3. As a result, the overall area is equivalent to $O(n^{2} \log(n))$.

When parallelising the adder and using 4AT1, the area scales asymptotically in $O(n^{2})$. However, with 4AT1 there is a depth ratio of ∼[…] compared to the third option with 0AT3, but with approximately twice as much width. As a result, when determining the area, the third decomposition scenario with the 0AT3 decomposition is better than the second alternative with the 4AT1 decomposition, because the depth ratio ∼[…] is less than the width ratio ∼2.

The overall area of the adder decomposed with 4AT1 in the last scenario is then ∼[…] × […] > 1