[PDF] Bit-Slicing the Hilbert Space: Scaling Up Accurate Quantum Circuit Simulation to a New Level

Abstract

Quantum computing is greatly advanced in recent years and is expected to transform the computation paradigm in the near future. Quantum circuit simulation plays a key role in the toolchain for the development of quantum hardware and software systems. However, due to the enormous Hilbert space of quantum states, simulating quantum circuits with classical computers is extremely challenging despite notable efforts have been made. In this paper, we enhance quantum circuit simulation in two dimensions: accuracy and scalability. The former is achieved by using an algebraic representation of complex numbers; the latter is achieved by bit-slicing the number representation and replacing matrix-vector multiplication with symbolic Boolean function manipulation. Experimental results demonstrate that our method can be superior to the state-of-the-art for various quantum circuits and can simulate certain benchmark families with up to tens of thousands of qubits.

Full PDF

BBit-Slicing the Hilbert Space: Scaling Up AccurateQuantum Circuit Simulation to a New Level

Yuan-Hung Tsai † , Jie-Hong R. Jiang †‡ , and Chiao-Shan Jhang ‡ † Graduate Institute of Electronics Engineering, ‡ Department of Electrical EngineeringNational Taiwan University , Taipei, Taiwan

Abstract —Quantum computing is greatly advanced in recentyears and is expected to transform the computation paradigmin the near future. Quantum circuit simulation plays a key rolein the toolchain for the development of quantum hardware andsoftware systems. However, due to the enormous Hilbert space ofquantum states, simulating quantum circuits with classical com-puters is extremely challenging despite notable efforts have beenmade. In this paper, we enhance quantum circuit simulation intwo dimensions: accuracy and scalability. The former is achievedby using an algebraic representation of complex numbers; thelatter is achieved by bit-slicing the number representation andreplacing matrix-vector multiplication with symbolic Booleanfunction manipulation. Experimental results demonstrate thatour method can be superior to the state-of-the-art for variousquantum circuits and can simulate certain benchmark familieswith up to tens of thousands of qubits.

I. I

NTRODUCTION

Recent progress in building quantum computers has set themilestone of demonstrating quantum supremacy [1]. Quantumcomputation is expected to provide computing power beyondthe reach of classical computers and transform the informationtechnology in the near future. Quantum hardware and soft-ware systems, e.g., [2], [3], are under active development.Quantum system design requires a comprehensive softwaretoolchain, where quantum circuit simulation is one of the keycomponents.Simulating quantum circuits on a classical computer isindispensable to understand system behavior and verify designcorrectness especially before universal quantum computersare ready. However the simulation is challenging becausequantum states have to be described in the complex vectorspace and the space is exponential in the number of quantumbits (qubits). Although there are special classes of quantumcircuits, such as the stablizer circuits, that allow efﬁcientsimulation by classical computers, see, e.g., [4], simulatinggeneral quantum circuits can be extremely difﬁcult. In fact,it is this difﬁculty that triggered Richard Feynman envisag-ing quantum simulators/computers in his seminal lecture on’Simulating Physics with Computers’ in 1981.Despite the aforementioned challenges, various simulationalgorithms have been proposed and tools are available to date.Depending on the underlying data structure, existing methodscan be mainly classiﬁed into array-based, e.g., [5]–[9] , ordecision-diagram (DD) based, e.g., [10]–[13]. The formeris rather limited without exploiting supercomputing facilitiesand hardly scalable to 50 qubits even with supercomputing.On the other hand, although decision diagrams are well-known for their typical memory explosion problems, the latter when engineered properly can be superior to the former[12]. The simulation method proposed in this work is DD-based. While prior DD-based methods [10], [13], [14] requirespecializing multi-terminal or multi-valued DDs for generalquantum circuit simulation, ours relies on standard binarydecision diagrams (BDDs) and takes an off-the-shelf BDDpackage for computation.The state-of-the-art methods [12], [13] are based onthe Quantum Multiple-valued Decision Diagrams (QMDDs)[14]. The data structure consists of decision nodes withmulti-valued branching for matrix representation and edgesweighted with complex numbers for unitary operator andstate vector representation and manipulation. In contrast,we simply rely on BDD to represent quantum states andsupport matrix and vector multiplication. Moreover, unlikeprior work [12], [13] with precision loss representing complexnumbers, our method employs the algebraic representation[15] for accurate complex number representation under theconsidered set of unitary operators (see Table I) generalenough to achieve universal quantum computation. Note thatalthough the QMDD-based methods can potentially beneﬁtfrom algebraic representation, it cannot be done as easy asours due to the complications of unique representation anddivision by normalization factors [15]. To the best of ourknowledge, our method is the ﬁrst work that utilizes theaccurate representation for quantum circuit simulation.In addition to the accuracy enhancement, to extend thecapacity of quantum circuit simulation, we devise 1) a bit-slicing technique that represents a state vector bit by bit eachcorresponds to a BDD, and 2) an implicit method that replacesmatrix-vector multiplication with a set of precharacterizedBoolean formulas of the unitary operators for BDD manip-ulation. Experimental results demonstrate the accuracy andscalability advantages of the proposed method compared tothe state-of-the-art over a number of different benchmarks.Notably for certain benchmark families, our method cansimulate circuits up to tens of thousands of qubits beyond thecapacity of other existing simulators. While encouraged by thestrengths of our approach, we also identify some weaknessesfor future improvements.The rest of the paper is organized as follows. In Section II,preliminaries are given. The main simulation algorithm is thenpresented in Section III. Section IV shows the experimentalresults and evaluates different simulation methods. Finally inSection V we concludes this paper and outline some directionsfor future work. a r X i v : . [ c s . ET ] J u l I. P

RELIMINARIES

For convention in the sequel, variables are denoted withlower-case letters, e.g. x , while Boolean functions and theirBDD representations are denoted with upper-case letters, e.g., F . We denote Boolean connectives negation by overline or ¬ , conjunction by ∧ , disjunction by ∨ , and exclusive-or by ⊕ . We sometimes omit ∧ in a Boolean formula.A literal is a Boolean variable, e.g., x , in the positive phase,or its negation, e.g., ¬ x , in the negative phase. Let var ( (cid:96) ) denote the underlying variable of literal (cid:96) ; let phase ( (cid:96) ) denotethe phase of (cid:96) for phase ( (cid:96) ) = 1 if (cid:96) = δ ( (cid:96) ) and phase ( (cid:96) ) = 0 otherwise. A cube is a conjunction of literals, which is treatedas a set of literals.The cofactor of a Boolean function (BDD) F with respectto a literal (cid:96) is denoted by F | (cid:96) , which corresponds to the newBoolean function same as F expect for variable var ( (cid:96) ) in F being substituted with phase ( (cid:96) ) . The notion of cofactoris straightforwardly generalized to a cube q so that F iscofactored with respect to the literals in q . A. Quantum Circuit Basics

Quantum computation through quantum circuit executiontakes three actions: 1) initial state preparation, 2) state evo-lution via quantum circuit operation, 3) qubit measurement.A quantum circuit simulator has to implement algorithms toperform these actions as detailed in the following.

1) Initial State Preparation:

Unlike a classical bit takesvalue either 0 (in other words, in state 0, denoted | (cid:105) ) or1 (in state 1, denoted | (cid:105) ), a qubit in state | ψ (cid:105) can be in asuperposition state of both | (cid:105) and | (cid:105) described by | ψ (cid:105) = α · | (cid:105) + β · | (cid:105) where α, β ∈ C are probability amplitudes satisfying thenormalization constraint | α | + | β | = 1 . For an n -qubitquantum system, the qubits can be entangled and an n -qubitstate | ψ (cid:105) can be described by | ψ (cid:105) = (cid:88) i ∈{ , } n α i · | i (cid:105) , (1)where the probability amplitudes α i ∈ C satisfy (cid:88) i ∈{ , } n | α i | = 1 . (2)Therefore, a quantum state of n qubits can be alternativelyrepresented as a n -dimensional state vector [ α , ..., α n − ] T .For initial state preparation, a quantum circuit simulatorneeds to construct a state vector representing some speciﬁedinitial quantum state.

2) State Evolution via Quantum Circuit Operation:

Thestate of a quantum system can be updated by the applicationof quantum operations (or so-called quantum gates). Thefunctionality of a quantum operation applied on n qubitscan be described by a n × n dimensional unitary matrix U (satisfying U − = U † , that is, its inverse matrix equalsits Hermitian adjoint). The quantum gates considered in thiswork are listed in Table I. TABLE IQ

UANTUM GATES SUPPORTED IN THIS WORK .Gate Symbol MatrixPauli-X (X) (cid:20) (cid:21)

Pauli-Y (Y) (cid:20) − ıı (cid:21) Pauli-Z (Z) (cid:20) − (cid:21) Hadamard (H) √ (cid:20) − (cid:21) Phase (S) (cid:20) ı (cid:21) T (cid:20) e ıπ/ (cid:21) Rx( π ) √ (cid:20) − ı − ı (cid:21) Ry( π ) √ (cid:20) −

11 1 (cid:21)

Controlled-NOT (CNOT)  

Controlled-Z (CZ)  −  Toffoli  

Fredkin  

Consequently the state of a quantum system can be updatedby multiplying a state vector with a unitary matrix. Fora quantum circuit with m gates corresponding to unitaryoperators M , . . . , M m in order, let v be the initial inputstate and v m be the ﬁnal output state of the circuit. Then v m = M m × M m − × · · · × M × v . (3)

3) Qubit Measurement:

After all quantum gates are ap-plied, we may need to measure some qubits to get theirstate outcome and determine its probability. In quantummechanics, measuring a qubit makes its state collapse (withsuperposition being destroyed) into an eigenstate with respectto the measurement basis. The probability of an eigenstatebeing observed is determined by its corresponding probabil-ity amplitude. Speciﬁcally, the probability of qubit q beingcollapsed to | (cid:105) (similarly | (cid:105) ) can be calculated by Pr (cid:2) q = | (cid:105) (cid:3) = (cid:88) i ∈{ , } n with bit q =0 | α i | . (4)uppose after measuring qubit q , it collapses to state | (cid:105) ,say. Then the quantum state of the n -qubit system will have0 probability amplitudes for states | i (cid:105) for i ∈ { , } n with bit q being 1. Consequently the probability amplitudes for theother states, i.e., | i (cid:105) for i ∈ { , } n with bit q being 0, arerenormalized by the factor / (cid:113) Pr (cid:2) q = | (cid:105) (cid:3) .Note that the measurement process can be repeated forsome other qubits sequentially or simultaneously.III. BDD-B ASED Q UANTUM S IMULATION

In this section, we describe the proposed BDD-basedstructure for representing state vectors, and provide methodsto complete quantum simulation on this BDD-based structure,i.e., realize three actions discussed in Section II.

A. Algebraic Representation of Complex Values

In this work, we employ the algebraic representation ofcomplex values proposed in [15] to achieve accurate quantumsimulation without precision loss. Essentially, any complexvalue (scalar) α that can be represented exactly can beexpressed as α = 1 √ k ( aω + bω + cω + d ) , (5)where the coefﬁcients a, b, c, d, k ∈ Z , α ∈ C , and ω = e ıπ/ , Therefore, in the context of quantum circuit simulation,when all the entries in the initial state vector and the unitaryoperators can be exactly represented, all the complex valuesresult from the matrix-vector multiplication can be exactlyrepresented, too.The representation is appealing as we only need to maintainﬁve integers to represent a complex number. By a simplecounting argument that Z is countable while C is uncount-able, clearly not every complex number can be compactly rep-resented in this algebraic form. However as there are universalgate sets, such as the Clifford+T set, for quantum computingwhose entries are all exactly representable, quantum circuitsimulation under the algebraic representation can be generallydone without loss of generality. It is because the universalityof a gate set allows any unitary operator to be approximatedusing the gates from the set within any desired precision.Hence as long as the initial state can be exactly represented,quantum circuit simulation can be achieved without precisionloss.The next question we should address is how to efﬁcientlymaintain and manipulate the integers for quantum circuitsimulation. B. Bit-Slicing State Vectors with BDDs

Given a state vector | ψ (cid:105) of an n -qubit quantum system withits entries represented algebraically in the form of Eq. (5), weexploit BDDs for representation as follows.By Eq. (5), the state vector | ψ (cid:105) (with n entries ofcomplex values) is described by an integer scalar k andfour vectors (cid:126)a = [ a , . . . , a n − ] T ,(cid:126)b = [ b , . . . , b n − ] T , (cid:126)c =[ c , . . . , c n − ] T , (cid:126)d = [ d , . . . , d n − ] T , each with n entriesof integer values. Hence the probability amplitude α i of basisstate | i (cid:105) in | ψ (cid:105) equals √ k ( a i ω + b i ω + c i ω + d i ) for Fig. 1. Bit-slicing algebraic numbers with BDDs. i ∈ { , } n . Let a i , b i , c i , d i be r -bit integers ( r can beadjusted on-the-ﬂy as large as enough in our implementation).We represent the j th bit, for j = 1 , . . . , r , of each of thevectors (cid:126)a,(cid:126)b, (cid:126)c, (cid:126)d with a BDD as illustrated in Fig. 1. Therebyeach BDD is a function over the n qubit variables so that thetruth table of the function corresponds to the corresponding abit vector of n entries. On the other hand, because scalar k isshared among all n entries of (cid:126)a,(cid:126)b, (cid:126)c, (cid:126)d , it is stored separately.Overall we use r BDDs over n variables to represent an n -qubit state vector. In the sequel, we denote the BDDs of the i th bit of (cid:126)a , (cid:126)b , (cid:126)c , and (cid:126)d as F ai , F bi , F ci , and F di , respectively.Note that we require no algebraic representation for unitarymatrices as we replace matrix-vector multiplication withBoolean formula manipulation over state-vector BDDs as tobe explained in Section III-D. C. Initial State Construction

A typical, if not every, quantum computation algorithmprepares its initial state in one of the basis states speciﬁed apriori . In this case, all BDDs correspond to constant 0 (logicalfalse) except for F d . Given an n -qubit quantum circuit withan initial state | i (cid:105) = | b . . . b n − (cid:105) for some i ∈ { , } n and b , . . . , b n − ∈ { , } , then F d = n − (cid:94) j =0 l j , for l j = (cid:40) q j , if b j = 0 q j , if b j = 1 , (6)where q j is the Boolean variable of the formula/BDD underconstruction corresponding to the j th qubit of the quantumcircuit. D. Unitary State Evolution via Formula Manipulation

Under the bit-sliced state vector representation using BDDsas detailed in Section III-B, we show how to update a statevector when an quantum gate in Table I is applied. Essentially,the new state is the result of the matrix-vector multiplication.We note that the gates in Table I form a library generalenough to achieve universal quantum computation as it is a su-perset of the Clifford+T [16] as well as the Toffoli+Hadamard[17] universal gate sets.To avoid vector-matrix multiplication but achieve the samecomputation, we examine the effect of every consideredquantum gate and characterize its corresponding Booleanformulas for updating the bit-sliced algebraic parameters ( (cid:126)a,(cid:126)b, (cid:126)c, (cid:126)d, k ) stated in Section III-B. The obtained update rulesor F ai , F bi , F ci , F di are summarized in Table II. For brevity,below we only detail the update rule derivation of H-gatewhile leaving others as exercises for interested readers toverify.For simplicity, consider a 2-qubit quantum system with thestate vector [ α , α , α , α ] T , where α , α , α , and α arethe probability amplitudes of basis states | q q (cid:105) = | (cid:105) , | (cid:105) , | (cid:105) , and | (cid:105) , respectively. Without loss of generality, assumea Hadamard gate is applied on qubit q . The unitary matrix forthe 2-qubit system is then obtained by the Kronecker product H ⊗ I = 1 √ (cid:20) − (cid:21) ⊗ (cid:20) (cid:21) = 1 √  − −  . With the scaling factor √ being omitted (which contributesto the increment of parameter k by 1), the state vector isupdated to  − −   α α α α  =  α + α α + α α − α α − α  =  α α α α  +  α α − α − α  . The summation of the above two vectors forms the basis ofupdating F ai , F bi , F ci , F di for the Hadamard gate.For a general n -qubit quantum system, the followingproposition speciﬁes the formulas that update F ai (similarly F bi , F ci , F di ) of the original state vector to ˆ F ai of the newstate vector. Proposition 1.

For an n -qubit quantum system with its statevector being algebraically represented with F ai (along with F b i , F c i , F d i ), let Hadamard gate be applied on an arbitraryqubit q t . Then the new state vector is speciﬁed by ˆ F ai (alongwith ˆ F b i , ˆ F c i , ˆ F d i derivable similarly) as follows. G ai = F ai | q t , (7) D ai = q t F ai | q t ∨ q t F ai , (8) C a = q t , (9) C a ( i +1) = G ai D ai ∨ ( G ai ∨ D ai ) C ai , (10) ˆ F ai = G ai ⊕ D ai ⊕ C ai , (11)where i = 0 , . . . , r − for r be the integer size of algebraicparameters ( (cid:126)a,(cid:126)b, (cid:126)c, (cid:126)d ) . Moreover, the k -value increases by 1.To see the correctness (with the help of the above 2-qubit example), observe that Eq. (7) and Eq. (8) derivethe BDDs of two component vectors ( [ α , α , α , α ] T and [ α , α , − α , − α ] T , respectively). Moreover, Eq. (9),Eq. (10) and Eq. (11) together fulﬁll the function of an addersumming the two component vectors bitwisely. It is worthnoting that since we use 2’s complement to represent integers,we set C a = q t in Eq. (9) as the initial carry-in of the adderfunction to realize the ”plus one” action for the negated entriesof the second component vector (in [ α , α , − α , − α ] T thelast two entries are complemented in Eq. (8) by the secondterm, i.e., q t F ai , and their ”plus one” actions are realized bythe initial carry-in setting). On the other hand, the Hadamardgate increases the k -value by 1 due to the √ scaling factor. For other quantum gates, their F ai , F bi , F ci , and F di formulas are obtained and summarized in Table II, where q c is the control bit, Q c is the conjunction of all control bits ofthe Toffoli and Fredkin gate, q t is the target bit, q t (cid:48) is thesecond target for Fredkin gates, and i = 0 , . . . , r − . Also inthe table, for brevity we deﬁne Car(

A, B, C ) (cid:44) AB ∨ ( A ∨ B ) C, Sum(

A, B, C ) (cid:44) A ⊕ B ⊕ C, to denote the carry and sum operations over formulas A, B, C .We note that the formulas of the Toffoli gate in the tablework for a general Toffoli gate of an arbitrary number ofcontrol bits. Note also that quantum gates X, Z, H, Ry( π ),CNOT, CZ, Toffoli, and Fredkin involve no imaginary partsand thus their F ai , F bi , F ci , and F di formulas are mutuallyindependent. In contrast, quantum gates Y, S, T, and Rx( π )involve imaginary parts and cause phase shifts making their F ai , F bi , F ci , and F di formulas mutually dependent. Aboutthe algebraic parameter k , its value remains the same forall the quantum gates except for being incremented by 1for Hadamard, Rx( π/ ), and Ry( π/ ) gates due to their √ scaling factors.The correctness of Table II construction is asserted in thefollowing theorem. Theorem 1.

The formulas F ai , F bi , F ci , F di of Table II andthe k value update rule stated above correctly update thequantum state under the algebraic representation of Eq. (5).We mention that in our implementation the integer bit size r is augmented dynamically when necessary (i.e., overﬂowoccurs). Thereby with the accuracy guarantee of the algebraicrepresentation, our simulation is exact (i.e., no precision loss). E. Measurement and Probability Calculation

In the QMDD-based method [12], measurement and prob-ability calculation can be done efﬁciently by traversing theQMDD. For a qubit q to be measured, placing it as the topvariable of the QMDD makes Pr (cid:2) q = | (cid:105) (cid:3) and Pr (cid:2) q = | (cid:105) (cid:3) derivable in one QMDD traversal.In our case, measurement and probability calculation arenot as easy as the QMDD case because we do not have amonolithic BDD but rather r BDDs. We adopt the hyper-function construction [18] of combining multiple BDDs intoone monolithic BDD F as follows. Let F = x x F (cid:126)a ∨ x x F (cid:126)b ∨ x x F (cid:126)c ∨ x x F (cid:126)d , (12)for F (cid:126)a = r − (cid:95) i =0 g i F ai , F (cid:126)b = r − (cid:95) i =0 g i F bi ,F (cid:126)c = r − (cid:95) i =0 g i F ci , F (cid:126)d = r − (cid:95) i =0 g i F di , ABLE IIB

OOLEAN FORMULAS FOR QUANTUM STATE EVOLUTION OF THE SUPPORTED GATE SET .Gate Boolean formulasUpdate F ai Update F bi Update F ci Update F di X ˆ F ai = q t F ai | q t ∨ q t F ai | q t . same as F ai except renamingY G ai = q t F ci | q t ∨ q t F ci | q t , G bi = q t F di | q t ∨ q t F di | q t , G ci = q t F ai | q t ∨ q t F ai | q t , G di = q t F bi | q t ∨ q t F bi | q t ,D ai = q t G ai ∨ q t G ai , D bi = q t G bi ∨ q t G bi , D ci = q t G ci ∨ q t G ci , D di = q t G di ∨ q t G di ,C a = q t , C b = q t , C c = q t , C d = q t ,C a ( i +1) = Car( D ai , , C ai ) , C b ( i +1) = Car( D bi , , C bi ) , C c ( i +1) = Car( D ci , , C ci ) , C d ( i +1) = Car( D di , , C di ) , ˆ F ai = Sum( D ai , , C ai ) . ˆ F bi = Sum( D bi , , C bi ) . ˆ F ci = Sum( D ci , , C ci ) . ˆ F di = Sum( D di , , C di ) . Z G ai = q t F ai ∨ q t F ai , same as F ai except renaming C a = q t ,C a ( i +1) = Car( G ai , , C ai ) , ˆ F ai = Sum( G ai , , C ai ) . H G ai = F ai | q t , same as F ai except renaming D ai = q t F ai | q t ∨ q t F ai ,C a = q t ,C a ( i +1) = Car( G ai , D ai , C ai ) , ˆ F ai = Sum( G ai , D ai , C ai ) . S ˆ F ai = q t F ai ∨ q t F ci . ˆ F bi = q t F bi ∨ q t F di . G ci = q t F ci ∨ q t F ai , G di = q t F di ∨ q t F bi ,C c = q t , C d = q t ,C c ( i +1) = Car( G ci , , C ci ) , C d ( i +1) = Car( G di , , C di ) , ˆ F ci = Sum( G ci , , C ci ) . ˆ F di = Sum( G di , , C di ) . T ˆ F ai = q t F ai ∨ q t F bi . ˆ F bi = q t F bi ∨ q t F ci . ˆ F ci = q t F ci ∨ q t F di . G di = q t F di ∨ q t F ai ,C d = q t ,C d ( i +1) = Car( G di , , C di ) , ˆ F di = Sum( G di , , C di ) . Rx( π ) D ai = q t F ci | q t ∨ q t F ci | q t , D bi = q t F di | q t ∨ q t F di | q t , D ci = q t F ai | q t ∨ q t F ai | q t , D di = q t F bi | q t ∨ q t F bi | q t ,C a = 1 , C b = 1 , C c = 0 , C d = 0 ,C a ( i +1) = Car( F ai , D ai , C ai ) , C b ( i +1) = Car( F bi , D bi , C bi ) , C c ( i +1) = Car( F ci , D ci , C ci ) , C d ( i +1) = Car( F di , D di , C di ) , ˆ F ai = Sum( F ai , D ai , C ai ) . ˆ F bi = Sum( F bi , D bi , C bi ) . ˆ F ci = Sum( F ci , D ci , C ci ) . ˆ F di = Sum( F di , D di , C di ) . Ry( π ) G ai = F ai | q t , same as F ai except renaming D ai = q t F ai ∨ q t F ai | q t ,C a = q t ,C a ( i +1) = Car( G ai , D ai , C ai ) , ˆ F ai = Sum( G ai , D ai , C ai ) . CNOT ˆ F ai = q c F ai ∨ q c q t F ai | q c q t same as F ai except renaming ∨ q c q t F ai | q c q t . CZ G ai = q c q t F ai ∨ q c q t F ai , same as F ai except renaming C a = q c q t ,C a ( i +1) = Car( G ai , , C ai ) , ˆ F ai = Sum( G ai , , C ai ) . Toffoli ˆ F ai = Q c F ai ∨ Q c q t F ai | Q c q t same as F ai except renaming ∨ Q c q t F ai | Q c q t . Fredkin ˆ F ai = Q c ( q t ⊕ q t (cid:48) ) F ai same as F ai except renaming ∨ Q c q t q t (cid:48) F ai | Q c q t q t (cid:48) ∨ Q c q t q t (cid:48) F ai | Q c q t q t (cid:48) . ith g i =  x x · · · x (cid:100) log r (cid:101) +1 , if i = 0 x x · · · x (cid:100) log r (cid:101) +1 , if i = 1 x x · · · x (cid:100) log r (cid:101) +1 , if i = 2 ... x x · · · x (cid:100) log r (cid:101) +1 , if i = r − , where x to x (cid:100) log r (cid:101) +1 are fresh new Boolean variables usedfor labeling/encoding the r BDDs.Given a monolithic BDD F , the measurement procedure isconducted on F as illustrated in Fig. 2, where the qubit vari-ables q , . . . , q n − are ordered above the encoding variables x , x , which are followed by variables x . . . , x (cid:100) log r (cid:101) +1 variables, and p ij denotes the probability Pr (cid:2) q i = | j (cid:105) (cid:3) for j ∈ { , } . For probability calculation of measurement, wecompute the accumulated probabilities of the nodes at thetop n levels recursively, and record them by a hash map.(The accumulated probability of a node is the sum of theprobabilities of its left and right children.) If the recursiveprocedure reaches the n th level of F (counting from 0), thefour algebraic integers a , b , c , and d can be decoded by paths x x , x x , x x , and x x , respectively, to obtain values ofthe bit positions, and the corresponding probability amplitude α can be computed.As mentioned in Section II-A, all probability amplitudesshould be multiplied with a normalization factor after someprobability amplitudes are set to zero due to measurement.Unfortunately, a normalization factor may not be algebraicallyrepresented by Eq. (5). Hence, we modify Eq. (5) as s × α = s × √ k ( aω + bω + cω + d ) , (13)where s ∈ R is a normalization factor for measuring somequbit(s). Note that the potential precision loss in this ﬁnalsimulation step of measurement is inevitable because we haveto represent the answer probability using a ﬂoating pointnumber anyway. On the other hand, for measuring multiplequbits, having one measurement on all of the interested qubitsyields less precision loss than a sequence of measurementson the qubits one at a time. The former avoids the need ofnormalization (at the cost of exploring exponential numberof outcomes), while the latter requires normalization severaltimes. If our interested query is about the probability of aparticular outcome, e.g., Pr (cid:2) q q q = | (cid:105) (cid:3) , rather than theprobabilities of all possible outcomes, then the former is morepreferred than the latter.By replacing α with s × α in Eq. (4), we derive (cid:88) i ∈{ , } n with q = j | sα i | = s (cid:88) i ∈{ , } n with q = j | α i | , where j ∈ { , } . Therefore, to compute the current proba-bilities after normalization, we can simply multiply s withthe accumulated probabilities of the current nodes. Hencethe accumulated probabilities need not be recomputed dueto normalization. Fig. 2. Monolithic BDD F for measurement. Assume we would like to measure multiple qubits one at atime. We make the BDD variables of the interested qubits ontop and follow the measurement order in F . Without loss ofgenerality let the order be q , . . . , q n − . Then, the ﬁrst qubitto be measured is q , which is the root node of F . As shownin the left part of Fig. 2, p and p are the accumulatedprobabilities of the 0-child and 1-child of the root node,respectively. Assume we obtain q = | (cid:105) after measurement,as shown in the right part of Fig. 2, the amplitudes with q = | (cid:105) can easily be set to zero by connecting the 0-edge of q to the constant-0 node, and s is updated from to / √ p to satisfy the normalization constraint. We repeat the sameprocedure to measure the other qubits following the order ofvariables. In fact, when measuring q i , there exists only onenode labelled q i .It is worth noting that, when some qubits are to bemeasured, the order of measuring them is immaterial. Thisfreedom allows BDD reordering to be performed to reduceBDD size. The only requirement is to make the qubit variablesto be measured above the qubit variables not to be measured,and the encoding variables below all the qubit variables.IV. E XPERIMENTAL R ESULTS

The proposed simulation algorithm was implemented inC++ in the

ABC system [19] and used

CUDD [20] as theunderlying BDD package. For the dynamic variable reorder-ing heuristic, the implementation of [21] in

CUDD was used.In our implementation, the size of the integers r was ini-tially set to 32. When overﬂows were detected, extra BDDswere allocated for each integer. To reduce the precision lossdue to probability calculation in measurement, we used the GNU MPFR library [22] to increase the precision of theﬂoating point numbers. In the experiments, the state-of-the-art simulator

DDSIM (version v1.0.1a) [12], [13], which isbased on QMDD [14], was compared. All experiments wereconducted on a server with Intel(R) Xeon(R) Silver 4210CPU @ 2.20GHz, 30.7GB RAM. The time-out (TO) limitwas set to 7200 seconds, and the memory-out (MO) limitwas set to 2GB for each case. The evaluation was performedon four sets of benchmarks, including 1) randomly generatedbenchmarks, 2)

RevLib benchmarks [23] and its variantswith entanglements, 3) quantum algorithm benchmarks forntanglement (Bell state preparation) and BernsteinVazirani(BV) algorithm [24], and 4) supremacy benchmarks fromGoogle [25].For the ﬁrst set of benchmarks, we randomly generatedcircuits of various qubit sizes (40, 80, 120, 160, 200, 300, 400,500) using all the supported gates, but excluding Rx( π/ ) andRy( π/ ) as they exhibit similar effects as the H-gate. The ratioof gates : qubits was ﬁxed to 3:1, and 10 circuits weregenerated for each size. In building a circuit, we ﬁrst insertedan H-gate to every qubit (so to impose state superposition inthe beginning), and then inserted the targeted number of gatesinto the circuit by picking every gate uniformly at randomfrom the mentioned gate set and applied it to some qubit(s)selected uniformly at random.The results on the random circuits are shown in Table III,where Column 1 lists qubits , Column 2 lists gates ,and Columns 3 and 4 list the average runtime and thenumbers of TO/MO/error/segmentation fault cases of DDSIM ,respectively, and Columns 5 and 6 list the similar informationof our simulator. The term ‘error’ indicates a numerical errorhappened if all state probabilities do not sum to 1 due toprecision loss. The message ‘failed’ means all 10 cases cannotbe simulated successfully. The reported runtime was onlyaveraged over success cases.From Table III, we see that

DDSIM fails to simulate circuitswith 120 or more qubits due to TO, MO, numerical errors, orsegmentation faults. In contrast, our method tends to be muchmore robust and scalable when simulating the consideredrandom circuits.

DDSIM yielded 13 MO and 30 error cases.On the contrary, our method produced no such cases. In thisbenchmark setting, our simulator is superior to DDSIM inruntime, memory efﬁciency, and accuracy.For the second benchmark set, we took reversible circuits

RevLib [23] for experiments. However, because most of thecircuits from

RevLib are converted from classical circuits,they do not exhibit quantum effects and can be simulated ef-ﬁciently. (In the simulation we assumed random initial valuesfor inputs whose initial values are not speciﬁed.) To make thecircuits more interesting with quantum effects, we modiﬁedthe original circuits by inserting H-gates to the inputs whoseinitial values are not speciﬁed in the original circuit suchthat we create superposition states in the beginning. Theresults on the

RevLib benchmarks are shown in Table IV,where Column 1 lists the circuit name, Column 2 lists thecorresponding qubits , Columns 3-5 list the gates beforethe modiﬁcation and the results on the original circuits, andsimilarly, Columns 6-8 list the gates after the modiﬁcationand the results on the modiﬁed circuits. As one can seefrom Table IV, both

DDSIM and our method can simulatethe circuits of classical functionalities efﬁciently. When themodiﬁed circuits are considered,

DDSIM suffered mostly fromMO when simulating the modiﬁed circuits. For those circuitsthat

DDSIM cannot simulate successfully, our method cansimulate them within the timeout limit. To further investigatethe MO cases of

DDSIM , we performed a case study on There are some

RevLib benchmarks that our method cannot simulatesuccessfully due to timeout. However,

DDSIM fails on those circuits, too. callif_32_439 and removed the MO limit. Nevertheless,

DDSIM still cannot ﬁnish simulation within TO limit, and thememory usage grows to 9.72GB upon TO.For the third benchmark set, we collected quantum algo-rithm circuits, including the entanglement and BV circuits. The results are shown in Table V, where Column 1 lists qubits , Columns 2-4 and 5-7 list the gates and runtimeinformation for entanglement and BV circuits, respectively.For the entanglement circuits,

DDSIM encountered MO at qubits = 10000 , whereas our method ﬁnishes within 67seconds. For the BV circuits,

DDSIM encountered numericalerrors and segmentation faults for the qubits ≥ cases,whereas our method ﬁnishes in hundreds of seconds whensimulating circuits with thousands of qubits. We note thatthe entanglement circuits belong to the category of stabilizercircuits, which are known efﬁciently simulatable by classi-cal computers [4]. When CHP [26], a simulator based on[4] dedicated to stabilizer circuit simulation, is applied, theentanglement circuit with qubits = 10000 can be simulatedin 6.7 seconds. It is not surprising as

CHP exploits additionalcircuit properties for fast simulation. On the other hand, BVcircuits are beyond the stabilizer circuit category and cannotbe simulated by

CHP .For the fourth benchmark set, we took the random circuitsproposed by Google for showing quantum supremacy [25]. These circuits are meant to create highly entangled states tomake them challenging to simulate by classical computers.As the circuits are too difﬁcult to simulate, we simpliﬁedthe circuits with depth 10 by reducing their depths to 5. Theresults are shown in Table VI, where all columns list the sameinformation as that in Table III, except that memory usageinformation is additionally reported and the numbers of errorand segmentation fault cases are all 0 and omitted. Similar tothe experiments of Table III, we have 10 random circuits foreach qubit size (in each row in Table VI), and the runtimewas averaged over the cases that are simulated successfully.On the other hand, the memory usage was averaged over all10 cases including TO and MO cases. The obtained resultsshow that,

DDSIM and our method suffer from MO andTO, respectively, when simulating qubits ≥ . As canbe observed, DDSIM took smaller average runtime comparedto ours. However, by examining circuits with some speciﬁc qubits , we can observe that our method possibly simulatesmore cases than

DDSIM . For example, when qubits = 49 ,our method simulates 9 cases, whereas

DDSIM simulatesonly 4 cases. Overall,

DDSIM simulates 74 out of the total120 cases, whereas our method simulates 77 out of 120cases. Furthermore, our method clearly outperforms

DDSIM in terms of the memory usage. This challenging benchmarkset motivates us for further investigation and improvement. There are other quantum algorithm circuits, such as QFT and Shoralgorithms. However, they involve unitary operators not algebraically rep-resentable and are excluded from our experiments. The circuits were downloaded from https://github.com/sboixo/GRCS un-der the directory ”inst/rectangular/cz v2”.ABLE IIIR

ESULTS ON R ANDOM C IRCUITS . TABLE IVR

ESULTS ON R EVLIB C IRCUITS . Original ModiﬁedTime(s) Time(s)Benchmark

V. C

ONCLUSIONS

In this paper, we have presented a new quantum circuitsimulator that outperforms the state-of-the-art in both scala-bility and accuracy. The algebraic representation, bit-slicingtechnique, and quantum gate formula pre-characterizationtogether enable the success. For future work, we would liketo investigate the supremacy benchmarks further and identifypoints for improvements.A

CKNOWLEDGMENT

This work was supported in part by the Ministry of Scienceand Technology of Taiwan under grant MOST 108-2218-E-002-073. The authors are grateful to Stefan Hillmich foranswering questions about the DDSIM package.R

EFERENCES[1] F. Arute et al. , “Quantum supremacy using a programmable supercon-ducting processor,”

Nature et al. , “Qiskit: An open-source framework for quantumcomputing,” 2019.[4] S. Aaronson and D. Gottesman, “Improved simulation of stabilizercircuits,”

Physical Review A , vol. 70, no. 5, Nov 2004.[5] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and B. Val-iron, “Quipper: A scalable quantum programming language,” in

ACMSIGPLAN Conference on Programming Language Design and Imple-mentation, (PLDI) , 2013, pp. 333–342.[6] D. Wecker and K. M. Svore, “Liqui |(cid:105) : A software design architectureand domain-speciﬁc language for quantum computing,” 2014, arXiv:quant-ph/1402.4467.[7] N. Khammassi, I. Ashraf, X. Fu, C. G. Almudever, and K. Bertels,“QX: A high-performance quantum computer simulation platform,” in

Design, Automation and Test in Europe Conference and Exhibition(DATE) , 2017, pp. 464–469.[8] E. Pednault et al. , “Breaking the 49-qubit barrier in the simulation ofquantum circuits,” 2017, arXiv: quant-ph/1710.05867. TABLE VR

ESULTS ON Q UANTUM A LGORITHM C IRCUITS .

ESULTS ON G OOGLE S UPREMACY C IRCUITS . < [9] D. S. Steiger, T. Hner, and M. Troyer, “ProjectQ: An open sourcesoftware framework for quantum computing,” Quantum , vol. 2, p. 49,Jan 2018.[10] G. F. Viamontes, I. L. Markov, and J. P. Hayes,

Quantum CircuitSimulation . Springer Publishing Company, Incorporated, 2009.[11] V. Samoladas, “Improved BDD algorithms for the simulation of quan-tum circuits,” in

European Symposium on Algorithms (ESA) , 2008, pp.720–731.[12] A. Zulehner and R. Wille, “Advanced simulation of quantum compu-tations,”

Trans. on CAD of Integrated Circuits and Systems , vol. 38,no. 5, pp. 848–859, 2019.[13] A. Zulehner, S. Hillmich, and R. Wille, “How to efﬁciently handlecomplex values? Implementing decision diagrams for quantum comput-ing,” in

International Conference on Computer-Aided Design (ICCAD) ,2019, pp. 1–7.[14] P. Niemann, R. Wille, D. M. Miller, M. A. Thornton, and R. Drechsler,“QMDDs: Efﬁcient quantum function representation and manipulation,”

IEEE Transactions on Computer-Aided Design of Integrated Circuitsand Systems , vol. 35, no. 1, pp. 86–99, 2016.[15] A. Zulehner, P. Niemann, R. Drechsler, and R. Wille, “Accuracyand compactness in decision diagrams for quantum computation,” in

Design, Automation and Test in Europe Conference and Exhibition(DATE) , 2019, pp. 280–283.[16] P. O. Boykin, T. Mor, M. Pulver, V. Roychowdhury, and F. Vatan,“A new universal and fault-tolerant quantum basis,”

Inf. Process. Lett. ,vol. 75, no. 3, p. 101107, 2000.[17] D. Aharonov, “A simple proof that Toffoli and Hadamard are quantumuniversal,” 2003, arXiv: quant-ph/0301040.[18] J.-H. R. Jiang, J.-Y. Jou, and J.-D. Huang, “Compatible class encodingin hyper-function decomposition for FPGA synthesis,” in

Design andAutomation Conference (DAC) , 1998, pp. 712–717.[19] R. Brayton and A. Mishchenko, “ABC: An Academic Industrial-Strength Veriﬁcation Tool,” in

Proceedings of International Conferenceon Computer Aided Veriﬁcation (CAV) , 2010, pp. 24–40.[20] F. Somenzi, “CUDD: CU decision diagram package (release 2.4.2),”

University of Colorado at Boulder , 2005.[21] S. Panda, F. Somenzi, and B. F. Plessier, “Symmetry detection anddynamic variable ordering of decision diagrams,” in

InternationalConference on Computer-Aided Design (ICCAD) , 1994, p. 628631.[22] L. Fousse, G. Hanrot, V. Lef`evre, P. P´elissier, and P. Zimmermann,“MPFR: A multiple-precision binary ﬂoating-point library with correctrounding,”

ACM Trans. Math. Softw. , vol. 33, no. 2, p. 13es, Jun. 2007.[23] R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drechsler,“RevLib: An online resource for reversible functions and reversibleircuits,” in

Int’l Symp. on Multi-Valued Logic

SIAM J.Comput. , vol. 26, no. 5, p. 14111473, Oct. 1997.[25] S. Boixo et al. , “Characterizing quantum supremacy in near-termdevices,”