Quantum Divide and Compute: Exploring The Effect of Different Noise Sources

Abstract

Our recent work (Ayral et al., 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)) showed the first implementation of the Quantum Divide and Compute (QDC) method, which breaks quantum circuits into smaller fragments with fewer qubits and shallower depth. QDC can thus cope with the limited number of qubits and short coherence times of noisy, intermediate-scale quantum processors. This article investigates the impact of different noise sources (readout error, gate error and decoherence) on the success probability of the QDC procedure. We perform detailed noise modeling on the Atos Quantum Learning Machine, allowing us to understand tradeoffs and formulate recommendations about which hardware noise sources should be preferentially optimized. We describe in detail the noise models we used to reproduce experimental runs on IBM's Johannesburg processor. This work also includes a detailed derivation of the equations used in the QDC procedure to compute the output distribution of the original quantum circuit from the output distributions of its fragments. Finally, we analyze the computational complexity of the QDC method for the circuit under study via tensor-network considerations, and elaborate on the relation of the QDC method to tensor-network simulation methods.


Thomas Ayral*, François-Marie Le Régent, Zain Saleem, Yuri Alexeev, Martin Suchara


Keywords

Quantum Circuit Compilation · Noise Modeling · Simulation · NISQ

Thomas Ayral (corresponding author, +33 1 30 80 7000): Atos Quantum Laboratory, Les Clayes-sous-Bois, France
F.-M. Le Régent: Atos Quantum Laboratory, Les Clayes-sous-Bois, France, and Ecole Polytechnique, Palaiseau, France
Zain Saleem, Yuri Alexeev, Martin Suchara: Argonne National Laboratory, Lemont, Illinois, United States of America

The advent of Noisy Intermediate Scale Quantum (NISQ) technologies [2] makes multiqubit processors with modest but increasing numbers of qubits available. Google, IBM, and Intel have recently announced quantum computers with 72, 65, and 49 qubits, respectively [3,4,5]; and new systems with 50 to 200 qubits are expected to be commercially available in the next few years. However, our ability to use the hardware to solve interesting problems is lagging. Solving practical computational problems typically requires evaluating quantum circuits with many hundreds or even thousands of qubits, exceeding the size of the devices. In addition, large gate errors and short qubit coherence times prevent accurate evaluations of deep circuits.

Despite the remarkable progress in manufacturing and controlling these small multiqubit systems, building hardware with a sufficiently high number of high-fidelity qubits remains an extremely challenging task. Engineering challenges worsen as the systems scale and are inherent to all major qubit technologies, including superconducting qubits (errors due to Josephson junction defects and spurious microwave resonances [6]), ion traps (susceptibility to noise and difficulty of addressing individual ions [7]), neutral atoms (motion of the atoms inside the lattice [8]), and quantum dots (difficulty of entangling multiple qubits [9,10]).

Successfully solving practical computational problems can be achieved only by developing techniques that can simultaneously map large problems onto small qubit systems and mitigate the effects of noise. The Quantum Divide and Compute (QDC) approach is one such technique. In this approach, we divide a large and potentially deep quantum circuit to suit the number of qubits and coherence times available in the current quantum hardware.
We then perform the computations on the subcircuits obtained by this division on a quantum processor, and we finally recombine our output results to obtain the output of the original circuit. This allows us to compute the outputs of quantum circuits that are too deep or too wide to be run on existing small-scale quantum processors.

There has been some previous work related to this approach. Bravyi et al. [11] showed that a quantum circuit on $n + k$ qubits can be simulated by sparse circuits on $n$ qubits and exponential classical processing that takes time $2^{O(k)}\,\mathrm{poly}(n)$. A more general approach that allows fragmenting larger quantum circuits into smaller subcircuits was introduced in [12]. In that work, tensor-network techniques were used to show how to decompose a circuit with a large quantum volume [13] into smaller subcircuits with quantum volumes compatible with NISQ devices. The classical computing overhead of the circuit fragmenting techniques was reduced in [14], and maximum likelihood tomography was applied on top of the circuit fragmentation to ensure that the reconstructed probability distributions are strictly non-negative and normalized. This work also showed, with the help of classical simulations, that the QDC strategy, when combined with maximum likelihood tomography, can estimate the output of a clustered circuit with higher fidelity than the full circuit execution. In [15], a method was introduced to find the optimal location of the cut (the location where the circuit should be fragmented).
The QDC strategy was applied to commonly known circuits in quantum computing such as supremacy circuits, Grover and Bernstein-Vazirani circuits, and was shown to achieve a high quantum circuit evaluation fidelity.

The ultimate test for the quantum computing field, the ability to use controlled quantum systems to perform tasks surpassing what can be done using classical computers (also called quantum supremacy [16]), has received considerable attention from both the scientific community and the general public. The largest classical supercomputers are capable of reliably simulating quantum systems with approximately 50 qubits [17,18], and there is evidence that devices with more than 50 qubits may be able to demonstrate quantum supremacy even in the presence of noise [19]. While quantum supremacy is not one of the goals of this work, the developed techniques will allow increasing the size of circuits that can be evaluated on quantum hardware as well as on quantum simulators run on classical hardware [20,21,22,23] by a constant factor. Consequently, it will be possible to evaluate quantum circuits with hundreds of qubits and use quantum algorithms to solve problems larger than ever before.

Circuit cutting naturally complements variational quantum-classical algorithms such as the Variational Quantum Eigensolver (VQE) [24,25] and the Quantum Approximate Optimization Algorithm (QAOA) [26]. These approaches have successfully produced suitable quantum circuits for optimization problems by combining shallow quantum circuits with classical processing; and they allow some control over the width, depth, and connectivity of the circuits. However, the quality of the approximate solutions produced by VQE and QAOA decreases as the width and depth of their circuits decrease, and solving most interesting problems still requires hundreds of qubits [27,28].

Circuit cutting offers numerous benefits.
First, the technique does not compromise the quality of the solution as the size of the subcircuits decreases (the overhead may scale exponentially with the number of cuts, however). Second, the technique can be applied to any sparsely connected quantum circuit, irrespective of the structure of the problem. Third, circuit cutting has a close relationship with tensor network quantum simulation techniques that are used to address scalability limitations due to memory requirements that grow exponentially with the size of the simulated systems. Fourth, circuit cutting can enhance the performance of existing quantum-classical variational approaches because it can increase the size of the subproblems tackled by the variational quantum eigensolver.

In this article, we follow up on our previous work on the topic [1]: we start by giving a detailed derivation of the formula for the output reconstruction of the original circuit from the outputs of its fragments, and a description of the noise models we chose to reproduce the experimental results (Section 2). We quantify the performance of the QDC method by recalling our previous results [1] on a 20-qubit IBM processor for different qubit counts and fragment sizes (Section 3.1). Then, in Section 3.2, based on noisy simulations, we quantify the differential influence of various noise sources such as readout error, gate error and decoherence on the success probability of the algorithm for different qubit counts and fragment sizes. Finally, we discuss the classical complexity of the method, its relation to tensor-network simulation approaches, and its implications for homogeneous and heterogeneous quantum computing.

Let us consider an $m$-qubit circuit described as the following composition of operations:

$$\mathcal{O} = \mathcal{O}^{a}_{A} \circ \mathcal{O}^{a}_{B} \circ \mathcal{O}^{b}_{A} \circ \mathcal{O}^{b}_{B}$$

where the support of superoperators $\mathcal{O}^{b}_{A}$ and $\mathcal{O}^{b}_{B}$ is a bipartition of the qubits; similarly, the support of $\mathcal{O}^{a}_{A}$ and $\mathcal{O}^{a}_{B}$ is a bipartition such that the two "a" (for "after") sets differ from the "b" (for "before") sets by one qubit. Without loss of generality, one can assume that up to a relabeling, the support of $\mathcal{O}^{b}_{A}$ is $q_0, \dots, q_n$ and that of $\mathcal{O}^{b}_{B}$ is $q_{n+1}, \dots, q_{m-1}$, and the "a" supports, $q_0, \dots, q_{n-1}$ and $q_n, \dots, q_{m-1}$ (see Fig. 1(a)).

Fig. 1

Cutting sketch in the two-fragment case. (a) Original circuit, (b) Upper fragment of the circuit, (c) Lower fragment of the circuit, (d) Lower fragment in its Bell-state variant.

The final state of the circuit is given by the density matrix:

$$\rho = \mathcal{O}(\rho_0) = \mathcal{O}^{a}_{A} \circ \mathcal{O}^{a}_{B} \circ \mathcal{O}^{b}_{A} \circ \mathcal{O}^{b}_{B}(\rho_0)$$

where $\rho_0$ is the initial density matrix. The probability of measuring a state $i$ with binary representation $i = (\hat{b}_0(i), \dots, \hat{b}_{m-1}(i))$ is given by

$$p(i) = \mathrm{Tr}\left[\Pi_i \cdot \rho\right] \tag{1}$$

where $\Pi_i$ is the projector on state $i$ ($i = 0, \dots, 2^m - 1$). It can be expressed as $\Pi_i = |i\rangle\langle i| = \bigotimes_{k=0}^{m-1} |\hat{b}_k(i)\rangle\langle \hat{b}_k(i)|$, where $\hat{b}_k(i)$ is the value of the $k$th bit of $i$. We note that $\Pi_i^\dagger = \Pi_i$, and $\sum_i \Pi_i = \bigotimes_k \sum_{b_k=0}^{1} |\hat{b}_k\rangle\langle \hat{b}_k| = I$. Thus:

$$p(i) = \mathrm{Tr}\left[\Pi_i^\dagger \cdot \mathcal{O}^{a}_{A} \circ \mathcal{O}^{a}_{B} \circ \mathcal{O}^{b}_{A} \circ \mathcal{O}^{b}_{B}(\rho_0)\right]. \tag{2}$$

We now switch to a Pauli-basis representation (see Appendix A for a reminder). Using Eq. (16), we get

$$p(i) = 2^m \langle\langle \Pi_i | \mathcal{R}^{a}_{A} \mathcal{R}^{a}_{B} \mathcal{R}^{b}_{A} \mathcal{R}^{b}_{B} | \rho_0 \rangle\rangle \tag{3}$$

where $\mathcal{R}^{a/b}_{A/B}$ is the Pauli transfer matrix (PTM) representation of superoperator $\mathcal{O}^{a/b}_{A/B}$.

We now derive the splitting formula. Let us decompose the one-qubit PTM representation of the identity superoperator as

$$\mathcal{R}_I = \sum_{\alpha = X,Y,Z} \; \sum_{bb' \in \{0,1\}} \tilde{\gamma}^{bb'}_{\alpha} \, |\sigma^{b}_{\alpha}\rangle\rangle \langle\langle \sigma^{b'}_{\alpha}| \tag{4}$$

where $|\sigma^{b}_{\alpha}\rangle\rangle$ are the (real) coordinates in the Pauli basis of the density matrix corresponding to the $b$th eigenvector $|\psi^{b}_{\alpha}\rangle$ of Pauli matrix $\sigma_\alpha$.
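The decomposition of Eq. (4) can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes that the Pauli-basis coordinates of a one-qubit operator $\rho$ are $\mathrm{Tr}[P_k \rho]/2$ with $P_k \in \{I, X, Y, Z\}$ (Appendix A is not reproduced here), that $b = 0$ labels the $+1$ eigenvector, and it uses the $\tilde{\gamma}$ coefficients stated in the sentence that follows ($2\delta_{bb'} - 1$ for $X$ and $Y$, $2\delta_{bb'}$ for $Z$). All function names are illustrative.

```python
import numpy as np

# One-qubit Pauli matrices, ordered (I, X, Y, Z).
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_coordinates(rho):
    """Real Pauli-basis coordinates Tr[P_k rho] / 2 (assumed normalization)."""
    return np.array([np.trace(P @ rho).real / 2 for P in PAULIS])


def eigenstate_density_matrix(alpha, b):
    """Projector on the b-th eigenvector of Pauli number alpha (b=0: eigenvalue +1)."""
    return (PAULIS[0] + (-1) ** b * PAULIS[alpha]) / 2


def gamma_tilde(alpha, b, b_prime):
    """Coefficients of Eq. (4): 2*delta - 1 for X and Y, 2*delta for Z."""
    delta = 1.0 if b == b_prime else 0.0
    return 2 * delta if alpha == 3 else 2 * delta - 1


def identity_from_decomposition():
    """Right-hand side of Eq. (4), assembled as a 4x4 real matrix."""
    total = np.zeros((4, 4))
    for alpha in (1, 2, 3):  # X, Y, Z
        for b in (0, 1):
            ket = pauli_coordinates(eigenstate_density_matrix(alpha, b))
            for b_prime in (0, 1):
                bra = pauli_coordinates(eigenstate_density_matrix(alpha, b_prime))
                total += gamma_tilde(alpha, b, b_prime) * np.outer(ket, bra)
    return total


# The one-qubit identity PTM is the 4x4 identity matrix in this convention.
print(np.allclose(identity_from_decomposition(), np.eye(4)))
```

In this convention the $X$ and $Y$ terms each contribute the projector on the corresponding Pauli coordinate, while the $Z$ term contributes both the $I$ and $Z$ coordinates, which is why $\tilde{\gamma}_Z$ differs from $\tilde{\gamma}_X$ and $\tilde{\gamma}_Y$.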
The $\tilde{\gamma}$ tensor is given by $\tilde{\gamma}^{bb'}_{X} = \tilde{\gamma}^{bb'}_{Y} = 2\delta_{bb'} - 1$ and $\tilde{\gamma}^{bb'}_{Z} = 2\delta_{bb'}$.

Inserting $\mathcal{R}_I$ (acting on qubit $q_n$) in the expression for the probability, Eq. (3), we obtain

$$
\begin{aligned}
p(i) &= 2^m \langle\langle \Pi_i | \underbrace{\mathcal{R}^{a}_{A}}_{q_0 \dots q_{n-1}} \underbrace{\mathcal{R}^{a}_{B}}_{q_n \dots q_{m-1}} \underbrace{\mathcal{R}_I}_{q_n} \underbrace{\mathcal{R}^{b}_{A}}_{q_0 \dots q_n} \underbrace{\mathcal{R}^{b}_{B}}_{q_{n+1} \dots q_{m-1}} | \rho_0 \rangle\rangle \\
&= 2^m \sum_{\alpha = X,Y,Z} \; \sum_{bb' \in \{0,1\}} \tilde{\gamma}^{bb'}_{\alpha} \, \langle\langle \Pi_i |_{q_0 \dots q_{n-1}} \langle\langle \Pi_i |_{q_n \dots q_{m-1}} \mathcal{R}^{a}_{A} \mathcal{R}^{a}_{B} | \sigma^{b}_{\alpha} \rangle\rangle_{q_n} \\
&\qquad \times \langle\langle \sigma^{b'}_{\alpha} |_{q_n} \mathcal{R}^{b}_{A} \mathcal{R}^{b}_{B} | \rho_0 \rangle\rangle_{q_0 \dots q_n} | \rho_0 \rangle\rangle_{q_{n+1} \dots q_{m-1}} \\
&= 2^m \sum_{\alpha = X,Y,Z} \; \sum_{bb' \in \{0,1\}} \tilde{\gamma}^{bb'}_{\alpha} \, 2^{-n-1} \, p^{\alpha}_{A}(i|_{0 \dots n-1}; b') \; 2^{-m+n} \, p^{\alpha b}_{B}(i|_{n \dots m-1}).
\end{aligned}
$$

We thus obtain the final formula (with $i = (\hat{b}_0 \dots \hat{b}_{m-1})$):

$$p(\hat{b}_0 \dots \hat{b}_{m-1}) = \frac{1}{2} \sum_{\alpha = X,Y,Z} \; \sum_{bb' \in \{0,1\}} \tilde{\gamma}^{bb'}_{\alpha} \, p^{\alpha}_{A}(\hat{b}_0 \dots \hat{b}_{n-1}; b') \, p^{\alpha b}_{B}(\hat{b}_n \dots \hat{b}_{m-1}) \tag{5}$$

with

$$p^{\alpha}_{A}(\hat{b}_0 \dots \hat{b}_{n-1}; b') \equiv 2^{n+1} \langle\langle \Pi_{\hat{b}_0 \dots \hat{b}_{n-1}} | \langle\langle \sigma^{b'}_{\alpha} |_{q_n} \mathcal{R}_{A} | \rho_0 \rangle\rangle_{q_0 \dots q_n} \tag{6}$$

$$p^{\alpha b}_{B}(\hat{b}_n \dots \hat{b}_{m-1}) \equiv 2^{m-n} \langle\langle \Pi_{\hat{b}_n \dots \hat{b}_{m-1}} | \mathcal{R}_{B} | \sigma^{b}_{\alpha} \rangle\rangle_{q_n} | \rho_0 \rangle\rangle_{q_{n+1} \dots q_{m-1}} \tag{7}$$

where we have regrouped $\mathcal{R}_A \equiv \mathcal{R}^{a}_{A} \mathcal{R}^{b}_{A}$ and $\mathcal{R}_B \equiv \mathcal{R}^{a}_{B} \mathcal{R}^{b}_{B}$. In other words, $p^{\alpha}_{A}(\hat{b}_0 \dots \hat{b}_{n-1}; b')$ is the probability of measuring bitstring $\hat{b}_0 \dots \hat{b}_{n-1}, b'$ when measuring the final state of fragment A with a measurement on axis $\alpha$ for qubit $q_n$ (see Fig. 1(b)), and $p^{\alpha b}_{B}(\hat{b}_n \dots \hat{b}_{m-1})$ is the probability of measuring bitstring $\hat{b}_n \dots \hat{b}_{m-1}$ when measuring the final state of fragment B with qubit $q_n$ initially prepared in the $b$th eigenstate of Pauli matrix $\sigma_\alpha$ (see Fig. 1(c)).

Variant using Bell pair

We now derive a different expression based on the following idea: instead of preparing both eigenstates of $\sigma_\alpha$, one can use an ancilla qubit, prepare a Bell state, and measure the value of the ancilla along measurement axis $\alpha$ and obtain an equivalent result, with a slightly different expression.

Switching from the Pauli-basis expression back to the original representation, Eq. (7) is equivalent to

$$p^{\alpha b}_{B}(i) = \mathrm{Tr}\left[\Pi_i \, \mathcal{O}_B(\sigma^{b}_{\alpha} \otimes \rho_0)\right]$$

where $\sigma^{b}_{\alpha} = |\psi^{b}_{\alpha}\rangle\langle\psi^{b}_{\alpha}|$. Let us decompose

$$|\psi^{b}_{\alpha}\rangle = \sum_{k \in \{0,1\}} \langle k | \psi^{b}_{\alpha} \rangle \, |k\rangle,$$

then

$$
\begin{aligned}
p^{\alpha b}_{B}(i) &= \sum_{kk'} \langle k | \psi^{b}_{\alpha} \rangle \langle \psi^{b}_{\alpha} | k' \rangle \, \mathrm{Tr}\left[\Pi_i \cdot \mathcal{O}_B(|k\rangle\langle k'| \otimes \rho_0)\right] \\
&= \sum_{kk'} \langle \psi^{b*}_{\alpha} | k \rangle \langle k' | \psi^{b*}_{\alpha} \rangle \times \mathrm{Tr}\left[(I \otimes \Pi_i) \cdot (I \otimes \mathcal{O}_B)(I \otimes |k\rangle\langle k'| \otimes \rho_0)\right] \\
&= \mathrm{Tr}\left[\left(|\psi^{b*}_{\alpha}\rangle\langle\psi^{b*}_{\alpha}| \otimes \Pi_i\right) \times (I \otimes \mathcal{O}_B)\left(\sum_{kk'} |k\rangle\langle k'| \otimes |k\rangle\langle k'| \otimes \rho_0\right)\right] \\
&= 2\,\mathrm{Tr}\left[\left(\Pi^{b*}_{\alpha} \otimes \Pi_i\right) \cdot (I \otimes \mathcal{O}_B)(\rho_{\Phi^+} \otimes \rho_0)\right]
\end{aligned}
$$

where $\Pi^{b*}_{\alpha} = |\psi^{b*}_{\alpha}\rangle\langle\psi^{b*}_{\alpha}|$ is the projector onto the complex conjugate of the $b$th eigenstate of the $\sigma_\alpha$ Pauli matrix, and $\rho_{\Phi^+}$ is the density matrix of the Bell state

$$|\Phi^+\rangle \equiv \frac{1}{\sqrt{2}} \sum_{k=0,1} |kk\rangle. \tag{8}$$

In the second line, we have added an ancilla qubit. Now, let us note that for $\alpha = X, Z$, $|\psi^{b}_{\alpha}\rangle = |\psi^{b*}_{\alpha}\rangle$ (the eigenvector is real-valued), while $|\psi^{b*}_{Y}\rangle = |\psi^{1-b}_{Y}\rangle$, and let us define

$$\hat{p}^{\alpha}_{B}(b; i) \equiv \mathrm{Tr}\left[\left(\Pi^{b}_{\alpha} \otimes \Pi_i\right) (I \otimes \mathcal{O}_B)(\rho_{\Phi^+} \otimes \rho_0)\right]. \tag{9}$$

Then:

$$
p^{\alpha b}_{B}(i) =
\begin{cases}
2\,\hat{p}^{\alpha}_{B}(b; i) & \alpha = X, Z \\
2\,\hat{p}^{\alpha}_{B}(1-b; i) & \alpha = Y
\end{cases}
$$

Thus, after relabeling $b \to 1 - b$ for $\alpha = Y$ in the final formula Eq. (5), we obtain the final expression:

$$p(\hat{b}_0 \dots \hat{b}_{m-1}) = \sum_{\alpha = X,Y,Z} \; \sum_{bb' \in \{0,1\}} \gamma^{bb'}_{\alpha} \, p^{\alpha}_{A}(\hat{b}_0 \dots \hat{b}_{n-1}; b') \, \hat{p}^{\alpha}_{B}(b; \hat{b}_n \dots \hat{b}_{m-1}) \tag{10}$$

where $\gamma^{bb'}_{X} = 2\delta_{bb'} - 1$, $\gamma^{bb'}_{Y} = -\gamma^{bb'}_{X}$ and $\gamma^{bb'}_{Z} = 2\delta_{bb'}$.

The graphical representation for such a contraction is shown in Fig. 2 (a).

Fig. 2

Graphical representation of the contraction formula. Panel (a): Two fragment case. Panel (b): Multifragment case for the GHZ circuit shown in Fig. 1 (a).

The formula for the multi-fragment case can be inferred from that of the two-fragment case: the procedure sketched for the two-fragment case can be recast in more generic terms, as described in [12]. This is done by considering the directed acyclic graph $G = (V, E)$ corresponding to the quantum circuit at hand (see Fig. 3 for an illustration of the procedure). Its vertices $V$ are quantum operations such as qubit initialization, measurement and gates. The cutting procedure amounts to finding a subset $E' \subset E$ of $M$ (directed) edges in this graph whose removal leads to $K$ disconnected directed acyclic graphs $\{G^{(i)} = (V_i, E_i)\}_{i=1 \dots K}$. In each disconnected graph, $n_i + m_i$ vertices have a dangling edge corresponding to the original $n_i$ incoming and $m_i$ outgoing edges connecting it to the rest of the original graph, with $\sum_i n_i = \sum_i m_i = M$. One then adds a measurement along axis $\alpha_k$ ($\alpha_k = X, Y, Z$) as a termination to each outgoing dangling edge ($k = 1 \dots n_i$), and a Bell-state gadget (as described in the previous subsection), whose ancilla line is terminated by an $\alpha'_k$-measurement, to each incoming dangling edge. Translating the family of graphs $G^{(i)}_{\alpha_1 \dots \alpha_{n_i}, \alpha'_1 \dots \alpha'_{m_i}}$ back to quantum circuits $C^{(i)}_{\alpha_1 \dots \alpha_{n_i}, \alpha'_1 \dots \alpha'_{m_i}}$, we can sample (using a quantum computer) the corresponding probability distributions. We denote as

$$p_i^{\alpha_1 \dots \alpha_{n_i}, \alpha'_1 \dots \alpha'_{m_i}}\left(b_1, \dots, b_{n_i}; s; b'_1, \dots, b'_{m_i}\right)$$

the probability of measuring bitstring $b_1, \dots, b_{n_i}; s; b'_1, \dots, b'_{m_i}$, with $s = (\hat{b}_1 \dots \hat{b}_{p_i})$ a bitstring corresponding to the state of "final" qubits of circuit $C^{(i)}$, and $(b_1, \dots, b_{n_i})$ (resp. $(b'_1, \dots, b'_{m_i})$) the bitstrings corresponding to the measured value for the measurements on the incoming (resp. outgoing) edges of sub-graph $G^{(i)}$ after pre-measurement rotations along axes $\alpha_1 \dots \alpha_{n_i}, \alpha'_1 \dots \alpha'_{m_i}$.

Fig. 3

Graphical representation of the contraction formula for a generic case (here with three fragments). Panel (a): Sketch of the fragmentation of a four-qubit circuit in three fragments. Panel (b): Corresponding tensor network to contract to get final distribution.

The final probability distribution is obtained by contracting the tensor network defined by the graph $\hat{G} = (\hat{V}, \hat{E})$, with $|\hat{V}| = K + M$ and $|\hat{E}| = 2M$. Here, $K$ "fragment" vertices correspond to the $K$ disconnected components $\{G^{(i)}\}$, and $M$ "connecting" vertices to the $M$ removed edges. The $M$ edges connect each of the $K$ "fragment" vertices via one of the $M$ "connecting" vertices. To each "fragment" vertex, we associate a distribution $p_i$, while to each "connecting" vertex, we associate a $\gamma$ tensor (as defined below Eq. (10)).

We give an example of such a tensor network for the Greenberger-Horne-Zeilinger (GHZ) circuit we considered in our previous work as well in Fig. 2 (b): in this case, the underlying graph turns out to be linear. We also show, in Fig. 3, an example with a more complex circuit and the resulting, more complex tensor network. Here, $K = 3$ and $M = 3$.

The contraction of these networks yields the sought-after distribution. The classical complexity of carrying out this contraction will be discussed in section 3.3.

2.2 Noisy simulation

NISQ processors are characterized by a substantial level of noise. In this section, we describe what noise processes we took into account in our simulation of the IBM Johannesburg quantum processor.

In this study, we focus on the noise processes whose quantitative levels are reported by the hardware manufacturer, IBM (see Table 1 for a summary of the numerical values used in the noisy simulations below). This pragmatic approach is justified a posteriori by the reasonable agreement of our numerical simulations with the experimental data (see Ref. [1], and Section 3 below). It should nevertheless be emphasized that (i) it uses rather simple noise models, that should be compared to noise models extracted from a full process tomography of the processor, and that (ii) it excludes some noise processes that are suspected to affect the final quantum state distribution in a non-negligible way, e.g., crosstalk (spatially correlated noise) and temporally correlated noise (like 1/f noise).

Table 1

Johannesburg processor metrics, as retrieved from IBM Quantum Experience on March 5th, 2020. All rates are averages over all the qubits/qubit pairs.

Parameter | Value
Readout error rate $\gamma$ |
$\epsilon^{(1)}_{\mathrm{avg}}$ |
$\epsilon^{(2)}_{\mathrm{avg}}$ |
$T_1$ | µs
Dephasing time $T_2$ | µs

The most prominent source of error in today's superconducting processors is the readout error.
The duration of the dispersive readout conducted in transmon processors, of the order of a few microseconds, makes for a higher probability of error, most notably of the relaxation (or amplitude damping) type. We thus model the readout process as a two-outcome POVM corresponding to an amplitude-damping quantum channel of duration $\tau$ followed by a perfect $Z$-axis measurement: $\{I - E, E\}$, with

$$E = \begin{pmatrix} 0 & 0 \\ 0 & 1 - \gamma \end{pmatrix}.$$

The duration $\tau$ is adjusted so as to obtain a readout error rate $\gamma = 1 - e^{-\tau/T_1}$ that matches the readout error rate reported by IBM. With $\gamma$ = 4. and $T_1 = 65\,\mu\mathrm{s}$, we find $\tau$ = 2. µs, a duration that is consistent with the usual measurement durations of dispersive readout processes. Note that this noise model does not include measurement crosstalk effects [29].

Another source of error is gate noise, i.e. gate imperfections. Here, since the hardware manufacturer only reports average 1- and 2-qubit gate error rates, we picked the simplest noise process to model gate noise, namely depolarizing noise with a depolarization probability adjusted so that the average process fidelity $F_{\mathrm{avg}}$ matches the qubit-averaged average error rates $\epsilon_{\mathrm{avg}} = 1 - F_{\mathrm{avg}}$ reported by the hardware maker. We recall that the one-qubit depolarizing noise process is characterized by the following Kraus operators:

$$K^{D}_{0} = \sqrt{1 - p_D^{(1)}}\; I, \qquad K^{D}_{i} = \sqrt{\frac{p_D^{(1)}}{3}}\; \sigma_i, \quad i = 1, 2, 3,$$

where $\sigma_i$ denote the Pauli spin matrices. We model two-qubit depolarization processes as a tensor product of the one-qubit depolarizing noise. Let us stress that more structured, and therefore more accurate, noise models could be extracted from quantum process tomography methods, at the cost of a larger characterization overhead.
Furthermore, this noise model does not include any crosstalk effects (see e.g. [30]), despite evidence that they play some role in today's NISQ processors.

Finally, we include the effect of decoherence on idle qubits, i.e. qubits that are not being acted upon by a quantum gate, but are nevertheless coupled to the outside environment. This decoherence can be decomposed into two main types, namely relaxation and dephasing. Relaxation (also known as amplitude damping or, in other contexts, spontaneous emission) causes excited qubits (i.e. in state $|1\rangle$) to relax to their ground state ($|0\rangle$) with a probability that is characterized by a time $T_1$: $p^{\mathrm{A.D}}_{\tau_{\mathrm{idle}}} = 1 - e^{-\tau_{\mathrm{idle}}/T_1}$, namely, the longer the idling duration $\tau_{\mathrm{idle}}$, the higher the probability of a relaxation event. Similarly, dephasing events cause the two components $|0\rangle$ and $|1\rangle$ of a superposed state to acquire an unwanted dephasing with a certain probability. Under simplifying assumptions about the power spectral density (PSD) of the qubit-environment system, namely the assumption of a white noise PSD, this probability is given by $p^{\mathrm{P.D}}_{\tau_{\mathrm{idle}}} = 1 - e^{-\tau_{\mathrm{idle}}/T_\varphi}$, with $\frac{1}{T_\varphi} = \frac{1}{T_2} - \frac{1}{2T_1}$. We note that this is quite a strong simplification, as actual transmon processors are known to have a PSD that deviates from white noise, with, most notably, a sizable pink (1/f) noise component (see e.g. [31] for a review) that leads to a deviation from the exponential decay of the formula we used. Let us also stress that such a noise modeling does not take into account temporally correlated noise. As a reminder, here are the Kraus operators associated with amplitude damping and (pure) dephasing:

$$K^{\mathrm{A.D}}_{0} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - p^{\mathrm{A.D}}_{\tau_{\mathrm{idle}}}} \end{pmatrix}, \qquad K^{\mathrm{A.D}}_{1} = \begin{pmatrix} 0 & \sqrt{p^{\mathrm{A.D}}_{\tau_{\mathrm{idle}}}} \\ 0 & 0 \end{pmatrix},$$

$$K^{\mathrm{P.D}}_{0} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - p^{\mathrm{P.D}}_{\tau_{\mathrm{idle}}}} \end{pmatrix}, \qquad K^{\mathrm{P.D}}_{1} = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{p^{\mathrm{P.D}}_{\tau_{\mathrm{idle}}}} \end{pmatrix}.$$
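The channels of this section can be written down explicitly. The sketch below is illustrative only and is not the QLM implementation: it builds the amplitude-damping, pure-dephasing and one-qubit depolarizing Kraus operators in their textbook form, checks that each set is trace preserving, and inverts $\gamma = 1 - e^{-\tau/T_1}$ to get the readout duration. The function names and the numerical values other than $T_1 = 65\,\mu$s (in particular the readout error rate, the dephasing time and the depolarizing probability) are assumptions chosen for the example.

```python
import numpy as np


def amplitude_damping_kraus(tau, t1):
    """Kraus operators for relaxation during an idle time tau."""
    p = 1.0 - np.exp(-tau / t1)
    return [
        np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]]),
        np.array([[0.0, np.sqrt(p)], [0.0, 0.0]]),
    ]


def pure_dephasing_kraus(tau, t_phi):
    """Kraus operators for pure dephasing during an idle time tau."""
    p = 1.0 - np.exp(-tau / t_phi)
    return [
        np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]]),
        np.array([[0.0, 0.0], [0.0, np.sqrt(p)]]),
    ]


def depolarizing_kraus(p):
    """One-qubit depolarizing channel with depolarization probability p."""
    identity = np.eye(2, dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1.0 - p) * identity] + [np.sqrt(p / 3.0) * s for s in (sx, sy, sz)]


def is_trace_preserving(kraus_ops):
    """Check the completeness relation sum_k K_k^dagger K_k = I."""
    total = sum(k.conj().T @ k for k in kraus_ops)
    return np.allclose(total, np.eye(2))


def apply_channel(kraus_ops, rho):
    """Apply a channel given by its Kraus operators to a density matrix."""
    return sum(k @ rho @ k.conj().T for k in kraus_ops)


def readout_duration(gamma, t1):
    """Invert gamma = 1 - exp(-tau / T1) for the readout duration tau."""
    return -t1 * np.log(1.0 - gamma)


T1_US = 65.0           # relaxation time quoted in the text (microseconds)
GAMMA_EXAMPLE = 0.04   # hypothetical readout error rate, for illustration only
T_PHI_EXAMPLE = 100.0  # hypothetical pure dephasing time (microseconds)

for name, ops in [
    ("amplitude damping", amplitude_damping_kraus(1.0, T1_US)),
    ("pure dephasing", pure_dephasing_kraus(1.0, T_PHI_EXAMPLE)),
    ("depolarizing", depolarizing_kraus(0.01)),
]:
    print(name, "trace preserving:", is_trace_preserving(ops))

print("readout duration (us):", readout_duration(GAMMA_EXAMPLE, T1_US))
```

Because the readout model ties $\gamma$ to the ratio $\tau/T_1$ only, dividing $\tau$ by some factor and multiplying $T_1$ by the same factor give the same readout error rate in this sketch.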
The values we used for $T_1$ and $T_2$ are reported in Table 1.

The noisy simulations are conducted on the Atos Quantum Learning Machine (QLM), a classical supercomputing platform dedicated to writing, simulating and optimizing quantum algorithms [22]. Before simulating the circuits resulting from the fragmentation procedure described in the previous section, we use the QLM's Nnizer plugin to compile the circuits, i.e. most notably to adapt them to the Johannesburg processor's restricted qubit topology (shown in Fig. 4). Then, we perform noisy simulations using a density-matrix-based noise simulator that uses a dense representation of the density matrix $\rho$ of the qubit register.

Fig. 4

Qubit connectivity map of the Johannesburg processor. Edges are shown between qubit pairs coupled via a resonator that allows application of the two-qubit CNOT gate.

Fig. 5

Success probability as a function of circuit size (number of qubits) for various numbers of fragments using IBM's Johannesburg processor.

Fig. 6

Success probability as a function of circuit size (number of qubits) for various numbers of fragments using IBM's Johannesburg processor (solid black lines) and Atos QLM noisy simulation (dashed blue lines). The black integers next to each black disk indicate the maximum fragment size (in number of qubits).

$$P_{\mathrm{success}} \equiv p\left(|0\rangle^{\otimes m/2} |1\rangle^{\otimes m/2}\right) + p\left(|1\rangle^{\otimes m/2} |0\rangle^{\otimes m/2}\right), \tag{11}$$

which, given the GHZ circuit at hand, is unity in the absence of any noise.

We carried out the procedure both using an actual 20-qubit processor, IBM Johannesburg, and using the Atos Quantum Learning Machine's noisy simulator.

The experimental success probabilities, shown in Fig. 5, display two clear trends. On the one hand, increasing the number of qubits leads to a decreasing success probability. This trend can be accounted for by the fact that increasing the number of qubits increases the number of gates of the circuit, and thus the sensitivity to gate errors and environmental decoherence. On the other hand, increasing the number of fragments in general leads to an improved success probability: the 6-8 fragment success probabilities are larger than the success probabilities obtained for lower numbers of fragments (with some exceptions to this observation: the one-fragment success probability often exceeds that of the 2 and 4-5 fragment cases, maybe due to compiler optimizations on the hardware side for circuits with larger numbers of qubits; we also note a point at $n_{\mathrm{qbits}} = 10$ where the 4-5 fragment success probability exceeds that of the 6-8 fragment case). This trend can be ascribed to the smaller gate count of each individual fragment, and thus a reduced sensitivity to errors. This smaller gate count not only comes from the mere cutting procedure, but also from the fact that smaller circuits better match the limited connectivity (Fig. 4) of the Johannesburg chip. Conversely, larger circuits need to be compiled to fulfill the connectivity constraints, leading to larger gate counts.

To substantiate these interpretations, we performed noisy simulations with noise models constructed using the manufacturer's calibration data (Table 1). We show the results in Fig. 6: the noisy simulations and the experimental data agree to within 20%. In particular, the drops in success probability, which can be traced back to connectivity-related insertions of SWAP gates, are reproduced. We note that the error bars coming from the finite number of shots (8192) used for each fragment are contained within the data symbols.

3.2 Analysis of the influence of the different noise types

In this section, we study and compare the differential impact of all the noise types we have previously taken into account: gate imperfections, idling and readout errors. Our goal is to understand which types of noise have a particularly severe influence on the fidelity of the fragmenting procedure and to formulate recommendations as to which noise types should be addressed first if one wants to make the most of the fragmenting procedure. Hence, we study the influence of the three noise types by simulating better readout measurements (Fig. 7), better gates (Fig. 8) and a better coherence time (Fig. 9).

Fig. 7

Effect of better readout: Same as Fig. 5, but with a readout duration divided by 5.

Fig. 8

Effect of better gates: Same as Fig. 5, but with a depolarizing error per gate divided by 5.

Faster readout.

First, we analyze the impact of readout errors by decreasing the duration $\tau$ of the measurements on all the subcircuits generated by the splitting procedure. Readout error is at present the largest source of errors in superconducting processors, with error rates as high as a few percent. It is thus reasonable to assume that large experimental efforts are going to be made to reduce this error rate. Here, we suppose the reduction in readout error rate to originate from a reduction of the readout duration (in practice by a factor of 5), although it would be equivalent, in this noise model that assumes the errors to come only from amplitude damping noise, to keep the readout duration fixed and to increase the $T_1$ coherence time (by the same factor of 5). In reality, progress is being made on both fronts (see e.g. [32, Fig. 2.c] for the increasing $T_1$ trend, and [33] for recent efforts towards faster measurements).

We see in Fig. 7 that better readout improves the overall success probability, all the more so when the number of fragments is large. The difference between the solid and the dashed lines qualitatively increases with the number of readout measurements used, and consequently with the number of fragments. Indeed, more fragments necessitate more measurements to characterize the quantum state of each fragment. Nevertheless, we still see drops in the evolution of the success probability with the number of qubits. They can be explained by the topology constraints that require the use of several SWAP gates when we try to perform gates between physical qubits that are not adjacent. This calls for the study in the next paragraph.

Fig. 9

Effect of better coherence: Same as Fig. 5, but with $T_1$ and $T_2$ coherence times multiplied by 5.

Better gates.

To model the use of better gates, we lower the amplitude of the depolarizing channel by dividing the depolarizing error rate by a factor of 5. The limited gate fidelity is the second major source of errors in superconducting processors. It comes from calibration errors as well as decoherence. Such a factor is realistic, in view of the improvements in gate quality of superconducting processors in recent years, and of the variability in the error rates reported by the hardware providers (the two-qubit error rates reported for IBM Johannesburg [34], Google Sycamore [35, Fig. 2, Table II] and Rigetti Aspen 7 [36] are, respectively, 0.2%, 0.62% and 4.8%).

The results of this change in the noise model can be seen in Fig. 8. We notice that the slope is more regular as the number of qubits increases: the "drops" in success probability are smoothed out. These drops were the consequence of performing a gate between qubits that are not adjacent in the connectivity map (Fig. 4), which requires using several SWAP gates. Thus, better gates help mitigate the effect of topology: the insertion of additional SWAP gates because of topology constraints becomes less detrimental to the overall success probability when the inserted gates are of good fidelity.
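The interplay between gate quality and SWAP insertion can be illustrated with a back-of-the-envelope sketch. It assumes a deliberately crude model that is not the noise model used in this work: every two-qubit gate fails independently with probability eps, so that a circuit with N two-qubit gates survives with probability (1 - eps)^N, and each SWAP is compiled into three CNOTs. The error rate `EPS_REFERENCE` and the gate counts are hypothetical values chosen for illustration only.

```python
CNOTS_PER_SWAP = 3  # standard decomposition of a SWAP into three CNOTs


def survival_probability(eps: float, n_two_qubit_gates: int) -> float:
    """Probability that none of the two-qubit gates fails (toy model)."""
    return (1.0 - eps) ** n_two_qubit_gates


def swap_penalty(eps: float, n_cnots: int, n_swaps: int) -> float:
    """Drop in survival probability caused by inserting `n_swaps` SWAPs."""
    baseline = survival_probability(eps, n_cnots)
    routed = survival_probability(eps, n_cnots + CNOTS_PER_SWAP * n_swaps)
    return baseline - routed


EPS_REFERENCE = 0.01  # hypothetical two-qubit error rate, illustration only
N_CNOTS, N_SWAPS = 10, 2  # hypothetical gate counts

for label, eps in (("reference", EPS_REFERENCE), ("5x better", EPS_REFERENCE / 5)):
    penalty = swap_penalty(eps, N_CNOTS, N_SWAPS)
    print(f"{label:>10}: penalty from SWAPs = {penalty:.4f}")
```

In this toy model the penalty paid for the same number of inserted SWAPs shrinks when the error rate is divided by 5, which is the qualitative smoothing effect described above.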

Better coherence.

Finally, in order to understand the impact of coherence on the splitting procedure, we increase the relaxation time $T_1$ and the dephasing time $T_2$ by multiplying them by a factor of 5 (see Sec. [...]

Fig. 10

Increase in success probability averaged over the number of qubits as a function of the number of fragments, $\Delta P(S, n_f)$, for a faster readout (blue, parameters of Fig. 7), better gates (orange, parameters of Fig. 8), and better coherence (green, parameters of Fig. 9).

[...] $P^{(0)}_{\mathrm{success}}$ computed with the Johannesburg noise parameters. For each of the scenarios $S$ introduced above, we compute the increase in probability defined as:

$$\Delta P(S, n_f) = \left\langle P_{\mathrm{success}}(S, n_f, n_q) - P^{(0)}_{\mathrm{success}}(n_f, n_q) \right\rangle_{n_q}. \qquad (12)$$

We see that, as discussed above, better readout is all the more helpful as the number of fragments is large, while, conversely, better coherence is more beneficial for smaller numbers of fragments. Achieving better gate fidelities, on the other hand, is equally beneficial with and without fragmentation since the slope of the orange line is close to zero. (We stress that because of the arbitrariness in the quantitative choice of the level of improvement for the three scenarios, one cannot draw any quantitative insight from the value of the improvement; here, our conclusions are qualitative and only based on the slope with respect to the number of fragments.) Consequently, to make the most of the fragmenting procedure in the case of numerous fragments, the major error source to focus on is the measurement error, by designing faster readouts.

3.3 Contraction complexity and relation to tensor-network simulation

In this section, we elaborate on the complexity of the fragmentation algorithm. As presented in Section 2.1, the fragmentation method consists of a quantum and a classical step. In the quantum step, a batch of quantum circuits is executed on a Quantum Processing Unit (QPU). The number of such circuits scales as the number $K$ of disconnected subgraphs of the original directed acyclic graph with some edges removed. The outcome of this step is a list of probability distributions $p_i$. In the classical step, a tensor network with nodes corresponding either to the probability distributions or to the $\gamma$ tensors defined in Section 2.1 needs to be contracted.

Here, we shall be interested in the contraction complexity of such a tensor network, assuming one wants to recover the probability of a single bitstring $(\hat{b}_0, \dots, \hat{b}_{m-1})$, i.e. for a fixed assignment of the external legs of the tensor network shown in Fig. 2 (b). A naive contraction of the tensor network at hand, namely a simultaneous summation over all internal indices $(\alpha_i, b_i, b'_i)_{i=1 \dots K-1}$, would entail a contraction complexity of $12^{K-1}$, i.e. a classical computation that is exponential in the number $K$ of fragments. In our case, however, the linear structure of the graph underlying the tensor network allows for a much more efficient sequential contraction strategy. Such a strategy, which is also widely exploited for contracting so-called Matrix Product States (see e.g. [37,38] for a review), consists in sequentially contracting the nodes of the network starting from one end of the linear graph. This is illustrated in Fig. 11, where we show the first three steps. The contraction complexity of the successive steps is 12, 36, 12, 36, ..., 12, 36, 12, 6. For $K$ fragments, this yields an overall contraction complexity of $48(K-2) + 18 = 48K - 78$, i.e. a linear complexity in the fragment number $K$.

Fig. 11 First three contraction steps for the fragmentation of the GHZ-type circuit of Fig. 1.

In the case of a general tensor network, the optimal contraction complexity can be shown to be at least of the order of $O(e^{T})$, where $T$ is the so-called treewidth of the network graph [39]. The treewidth of a graph can be defined as a combinatorial metric of closeness of the graph to a tree. There are a few ways to define the treewidth more formally: as the minimum $k$ for which a given graph is a partial $k$-tree, or as the elimination width.

Tensor-network theory can also be leveraged to simulate quantum circuits classically. There are a number of tensor-network-based simulators developed for such simulations: QFlex [40], AC-QDP [41], Quimb [42], and QTensor [43]. These simulators are typically much faster and more efficient than state vector simulators for shallow circuits [44] such as the circuits in this work. In these tensor simulators, the circuits are not directly represented by tensors, but rather by line graphs, an approach proposed by Boixo et al. [45]. This approach has multiple benefits. Its only disadvantage is its limited usability for simulating sub-tensors of amplitudes, a limitation resolved in the work by Schutski et al. [46].

The method studied in our work, circuit cutting, has a counterpart in tensor-network-based simulation, called tensor slicing. One way to understand slicing a tensor along an index is to view the tensor as a function of many variables evaluated at a fixed value of one variable:

$$f(x_1, x_2, \dots, x_n)\big|_{x_1 = a} = \tilde{f}(x_2, \dots, x_n),$$

where the variables take integer values $x_i \in [0, d-1]$. Thus, in this technique, slicing reduces the number of indices of the tensor one by one. Since all the indices we use have size 2, removing $n$ vertices splits the expression into $2^n$ separate parts.
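The slicing identity can be checked numerically on a toy network. The sketch below assumes a hypothetical ring of three random rank-2 tensors with indices of size 2 (it is not one of the circuits studied here) and relies on NumPy's `einsum`. Fixing the values of two indices produces one independent part per assignment (four here, since each index takes two values), and the sum of these parts reproduces the full contraction.

```python
import itertools

import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical toy network: a ring A[i,j] B[j,k] C[k,i], all indices of size 2.
A, B, C = (rng.random((2, 2)) for _ in range(3))


def full_contraction() -> float:
    """Sum over all three indices at once."""
    return float(np.einsum("ij,jk,ki->", A, B, C))


def sliced_parts() -> list:
    """Fix (slice) indices i and j; each assignment gives an independent part."""
    parts = []
    for i, j in itertools.product(range(2), repeat=2):
        # With i and j fixed, only k remains to be summed inside this part.
        parts.append(float(A[i, j] * np.einsum("k,k->", B[j, :], C[:, i])))
    return parts


parts = sliced_parts()
print(len(parts), np.isclose(sum(parts), full_contraction()))
```

Each part involves smaller tensors than the original network and can be evaluated independently of the others, which is what makes slicing attractive for parallel execution.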
This operation is also equivalent to a decomposition of the full tensor expression. Each separate tensor is represented by a graph with lower connectivity than the original one. As a result, slicing dramatically reduces the complexity of finding the optimal elimination order, and thus results in a lower contraction cost. It is a powerful technique that makes it possible to simulate large circuits, as does the circuit-cutting technique described in this work.

3.4 Homogeneous and heterogeneous quantum computing

One exciting application of the circuit-cutting technique is the execution of much larger circuits. This can be done in two ways: by splitting circuits and running the fragments sequentially on one quantum device (as we demonstrated in [1]), or by running them at the same time on multiple quantum devices. The latter way can lead to an exciting new era of how quantum computation is done: distributed quantum computing. It can potentially not only allow for the execution of larger circuits, but also for a much faster execution. It is arguably a more realistic approach in the near future compared to the "true" distributed quantum computing that requires a quantum network connecting quantum devices. In our approach, indeed, we would utilize only the classical network.

In this work, we further investigated the Quantum Divide and Compute (QDC) approach whose first implementation was demonstrated in a recent work of ours [1]. After giving more details as to the mathematical framework and physical models used for this implementation, we analyzed the influence of different noise sources on the success probability of a simple, GHZ-type circuit using classical noisy simulations on the Atos Quantum Learning Machine. We focused on the three main noise sources of today's superconducting processors, namely readout errors, gate errors and decoherence (relaxation and dephasing) on idle qubits. We showed that readout errors are the most detrimental to the QDC procedure, because QDC requires additional measurements as the number of fragments increases.
Conversely, the effect of idling noise is mitigated by QDC, as QDC results in smaller circuits that are less susceptible to this source of noise.

We also analyzed the computational complexity of QDC using tensor-network methods. While for a general circuit the contraction complexity increases exponentially with the number of cuts, for the GHZ-like circuit we studied, the complexity increases linearly with the number of cuts.

Finding more complex circuits in which the contraction complexity is still manageable is an interesting future direction. Circuits that have a "clustered" structure [14], which are e.g. required in methods like the Dynamic Quantum Variational Ansatz [47], are promising candidates. In these methods, indeed, the ansatz has a mixer unitary that is made up of partial mixers that can have limited connectivity between each other, and can therefore form clusters.
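The contrast between exponential and linear contraction cost for a linear chain of fragments can be checked with a short numerical sketch. It rests on assumptions that are not taken from the actual tensors of Section 2.1: the cut indices are given sizes 3 (for alpha) and 2 (for b and b'), chosen because they reproduce step costs of 12, 36 and 6, and random arrays stand in for the fragment distributions and the gamma tensors. The function names are hypothetical. Under these assumptions, a sequential contraction of K fragments costs 12(K-1) + 36(K-2) + 6 = 48K - 78 multiplications, whereas a simultaneous summation visits 12^(K-1) index assignments.

```python
import itertools

import numpy as np

D_ALPHA, D_B = 3, 2  # assumed sizes of the alpha and b / b' indices


def random_chain(n_fragments: int, seed: int = 0):
    """Random stand-ins for the fragment distributions and gamma tensors."""
    rng = np.random.default_rng(seed)
    first = rng.random((D_ALPHA, D_B))  # indices [alpha, b]
    # Middle fragments carry indices [alpha, b', alpha_next, b_next].
    middles = [rng.random((D_ALPHA, D_B, D_ALPHA, D_B))
               for _ in range(n_fragments - 2)]
    last = rng.random((D_ALPHA, D_B))  # indices [alpha, b']
    gammas = [rng.random((D_ALPHA, D_B, D_B))  # indices [alpha, b, b']
              for _ in range(n_fragments - 1)]
    return first, middles, last, gammas


def sequential_contraction(first, middles, last, gammas):
    """Contract from one end of the chain, counting scalar multiplications."""
    cost, left = 0, first
    for cut, gamma in enumerate(gammas):
        left = np.einsum("ab,abc->ac", left, gamma)  # sum over b
        cost += D_ALPHA * D_B * D_B  # 12 multiplications
        if cut < len(middles):
            # Absorb the next fragment: sum over alpha and b'.
            left = np.einsum("ac,acde->de", left, middles[cut])
            cost += (D_ALPHA * D_B) ** 2  # 36 multiplications
    cost += D_ALPHA * D_B  # 6 multiplications for the last fragment
    return float(np.einsum("ac,ac->", left, last)), cost


def naive_contraction(first, middles, last, gammas):
    """Sum over every assignment of all cut indices simultaneously."""
    cut_values = list(itertools.product(range(D_ALPHA), range(D_B), range(D_B)))
    total, n_terms = 0.0, 0
    for cuts in itertools.product(cut_values, repeat=len(gammas)):
        term = first[cuts[0][0], cuts[0][1]]
        for cut, (alpha, b, b_prime) in enumerate(cuts):
            term *= gammas[cut][alpha, b, b_prime]
            if cut < len(middles):
                alpha_next, b_next, _ = cuts[cut + 1]
                term *= middles[cut][alpha, b_prime, alpha_next, b_next]
        term *= last[cuts[-1][0], cuts[-1][2]]
        total += term
        n_terms += 1
    return total, n_terms


for K in (2, 3, 4):
    chain = random_chain(K)
    value_seq, cost = sequential_contraction(*chain)
    value_naive, n_terms = naive_contraction(*chain)
    print(K, cost, n_terms, np.isclose(value_seq, value_naive))
```

Both strategies return the same number; only the amount of work differs, and the gap widens quickly as K grows.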

Acknowledgements

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This research also used the resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Yuri Alexeev, Zain H. Saleem and Martin Suchara were supported by the DOE, Office of Science, under Contract DE-AC02-06CH11357. The compilation and noisy simulations were performed using Argonne National Laboratory's and Atos Quantum Laboratory's Quantum Learning Machines.

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Conflict of Interest:

The authors declare that they have no conflict of interest.

The data and materials presented in this paper are available upon request to the authors.

A Pauli-basis representation of operators and superoperators

We can decompose any Hermitian operator (including density matrices) as

$$\rho = \sum_{\alpha} \rho_{\alpha} P_{\alpha}, \qquad \rho_{\alpha} = \frac{1}{d} \mathrm{Tr}\left[ P_{\alpha} \rho \right] \qquad (13)$$

with $d = 2^{n_{\mathrm{qbits}}}$ and $P_{\alpha}$ a generalized Pauli matrix on $n_{\mathrm{qbits}}$ qubits. Similarly, superoperators can be decomposed on this basis,

$$[R]_{\alpha\beta} = \frac{1}{d} \mathrm{Tr}\left[ P_{\alpha} \cdot \mathcal{O}(P_{\beta}) \right].$$

$R$ is called the Pauli transfer matrix (PTM) representation of $\mathcal{O}$. Then, the coordinates of $\rho' = \mathcal{O}(\rho)$ in the Pauli basis are simply given by

$$\rho'_{\alpha} = \frac{1}{d} \mathrm{Tr}\left[ P_{\alpha} \mathcal{O}(\rho) \right] = \frac{1}{d} \sum_{\beta} \rho_{\beta} \mathrm{Tr}\left[ P_{\alpha} \mathcal{O}(P_{\beta}) \right] = \sum_{\beta} R_{\alpha\beta} \rho_{\beta}. \qquad (14)$$

We note that

$$\mathrm{Tr}\left[ A^{\dagger} \cdot B \right] = \sum_{\alpha\beta} A^{*}_{\alpha} B_{\beta} \mathrm{Tr}\left[ P^{\dagger}_{\alpha} P_{\beta} \right] = \sum_{\alpha\beta} A^{*}_{\alpha} B_{\beta} \, d \, \delta_{\alpha\beta} = d \sum_{\alpha} A^{*}_{\alpha} B_{\alpha}. \qquad (15)$$

Defining the scalar product $\langle\langle A | B \rangle\rangle \equiv \sum_{\alpha} A^{*}_{\alpha} B_{\alpha}$, we thus have

$$\mathrm{Tr}\left[ A^{\dagger} \cdot B \right] = d \, \langle\langle A | B \rangle\rangle = 2^{n_{\mathrm{qbits}}} \langle\langle A | B \rangle\rangle. \qquad (16)$$

References

1. T. Ayral, F. M. Le Regent, Z. Saleem, Y. Alexeev, and M. Suchara, “Quantum divide and compute: Hardware demonstrations and noisy simulations,” Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, pp. 138–140, 2020. [Online]. Available: https://doi.org/10.1109/ISVLSI49217.2020.00034
2. J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum [...]
6. [...] npj Quantum Information, vol. 3, no. 1, p. 2, 2017. [Online]. Available: https://doi.org/10.1038/s41534-016-0004-0
7. C. Monroe and J. Kim, “Scaling the ion trap quantum processor,” Science, vol. 339, no. 6124, pp. 1164–1169, 2013. [Online]. Available: https://science.sciencemag.org/content/339/6124/1164
8. M. Saffman, “Quantum computing with neutral atoms,” National Science Review, vol. 6, no. 1, pp. 24–25, Sep. 2018. [Online]. Available: https://doi.org/10.1093/nsr/nwy088
9. F. Dolde, I. Jakobi, B. Naydenov, N. Zhao, S. Pezzagna, C. Trautmann, J. Meijer, P. Neumann, F. Jelezko, and J. Wrachtrup, “Room-temperature entanglement between single defect spins in diamond,” Nature Physics, vol. 9, p. 139, Feb. 2013. [Online]. Available: https://doi.org/10.1038/nphys2545
10. H. Bernien, B. Hensen, W. Pfaff, G. Koolstra, M. S. Blok, L. Robledo, T. H. Taminiau, M. Markham, D. J. Twitchen, L. Childress, and R. Hanson, “Heralded entanglement between solid-state qubits separated by three metres,” Nature, vol. 497, p. 86, Apr. 2013. [Online]. Available: https://doi.org/10.1038/nature12016
11. S. Bravyi, G. Smith, and J. A. Smolin, “Trading classical and quantum computational resources,” Phys. Rev. X, vol. 6, p. 021043, Jun. 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevX.6.021043
12. T. Peng, A. Harrow, M. Ozols, and X. Wu, “Simulating large quantum circuits on a small quantum computer,” arXiv preprint arXiv:1904.00102, 2019. [Online]. Available: https://arxiv.org/abs/1904.00102
13. A. W. Cross, L. S. Bishop, S. Sheldon, P. D. Nation, and J. M. Gambetta, “Validating quantum computers using randomized model circuits,” Physical Review A, vol. 100, no. 3, p. 032328, Sep. 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.100.032328
14. M. A. Perlin, Z. H. Saleem, M. Suchara, and J. C. Osborn, “Quantum circuits: Divide and compute with maximum likelihood tomography,” arXiv preprint arXiv:2005.12702, 2020. [Online]. Available: https://arxiv.org/abs/2005.12702
15. W. Tang, T. Tomesh, J. Larson, M. Suchara, and M. Martonosi, “CutQC: using small quantum computers for large quantum circuit evaluations,” in Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2021.
16. J. Preskill, “Quantum computing and the entanglement frontier,” arXiv:1203.5813, Nov. 2012. [Online]. Available: https://arxiv.org/abs/1203.5813
17. Y. Alexeev, “Evaluation of the Intel-QS performance on Theta supercomputer,” Argonne National Laboratory - Leadership Computing Facility, Technical report ANL/ALCF 18/2, Apr. 2018.
18. T. Häner and D. S. Steiger, “0.5 petabyte simulation of a 45-qubit quantum circuit,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’17. New York, NY, USA: ACM, 2017, pp. 33:1–33:10. [Online]. Available: http://doi.acm.org/10.1145/3126908.3126947
19. S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, “Characterizing quantum supremacy in near-term devices,” Nature Physics, vol. 14, no. 6, pp. 595–600, 2018. [Online]. Available: https://doi.org/10.1038/s41567-018-0124-x
20. G. Aleksandrowicz et al., “Qiskit: An open-source framework for quantum computing,” 2019.
21. M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik, “qHiPSTER: The quantum high performance software testing environment,” arXiv:1601.07195, 2016. [Online]. Available: https://arxiv.org/abs/1601.07195
22. “Atos Quantum Learning Machine,” https://atos.net/wp-content/uploads/2018/07/Atos-Quantum-Learning-Machine-brochure.pdf, Jun. 2018.
23. D. S. Steiger, T. Häner, and M. Troyer, “ProjectQ: an open source software framework for quantum computing,” Quantum, vol. 2, p. 49, Jan. 2018.
24. J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, “The theory of variational hybrid quantum-classical algorithms,” New Journal of Physics, vol. 18, no. 2, p. 023023, Feb. 2016. [Online]. Available: https://doi.org/10.1088%2F1367-2630%2F18%2F2%2F023023
25. S. Barrett, K. Hammerer, S. Harrison, T. E. Northup, and T. J. Osborne, “Simulating quantum fields with cavity QED,” Phys. Rev. Lett., vol. 110, p. 090501, Feb. 2013. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.110.090501
26. E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” arXiv:1411.4028, Nov. 2014. [Online]. Available: https://arxiv.org/abs/1411.4028
27. D. Wecker, M. B. Hastings, and M. Troyer, “Progress towards practical quantum variational algorithms,” Phys. Rev. A, vol. 92, p. 042303, Oct. 2015. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.92.042303
28. G. G. Guerreschi and A. Y. Matsuura, “QAOA for max-cut requires hundreds of qubits for quantum speed-up,” arXiv:1812.07589, Dec. 2018. [Online]. Available: https://arxiv.org/abs/1812.07589
29. Y. Chen, M. Farahzad, S. Yoo, and T.-C. Wei, “Detector tomography on IBM quantum computers and mitigation of an imperfect measurement,” Physical Review A, vol. 100, no. 5, p. 052315, Nov. 2019. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.100.052315
30. M. Sarovar, T. Proctor, K. Rudinger, K. Young, E. Nielsen, and R. Blume-Kohout, “Detecting crosstalk errors in quantum information processors,” Quantum, vol. 4, p. 321, Sep. 2020. [Online]. Available: https://quantum-journal.org/papers/q-2020-09-11-321/
31. E. Paladino, Y. Galperin, G. Falci, and B. L. Altshuler, “1/f noise: Implications for solid-state quantum information,” Reviews of Modern Physics, vol. 86, no. 2, pp. 361–418, 2014. [Online]. Available: https://doi.org/10.1103/RevModPhys.86.361
32. M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, “Superconducting Qubits: Current State of Play,” Annual Review of Condensed Matter Physics [...]
33. [...] Physical Review Applied, vol. 10, no. 3, p. 034040, Sep. 2018. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.10.034040
34. “IBM Quantum Experience website,” https://quantum-computing.ibm.com/, accessed: 2020-03-05.
35. F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L. Brandao, D. A. Buell, B. Burkett, Y. Chen, Z. Chen, B. Chiaro, R. Collins, W. Courtney, A. Dunsworth, E. Farhi, B. Foxen, A. Fowler, C. Gidney, M. Giustina, R. Graff, K. Guerin, S. Habegger, M. P. Harrigan, M. J. Hartmann, A. Ho, M. Hoffmann, T. Huang, T. S. Humble, S. V. Isakov, E. Jeffrey, Z. Jiang, D. Kafri, K. Kechedzhi, J. Kelly, P. V. Klimov, S. Knysh, A. Korotkov, F. Kostritsa, D. Landhuis, M. Lindmark, E. Lucero, D. Lyakh, S. Mandrà, J. R. McClean, M. McEwen, A. Megrant, X. Mi, K. Michielsen, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, C. Neill, M. Y. Niu, E. Ostby, A. Petukhov, J. C. Platt, C. Quintana, E. G. Rieffel, P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger, V. Smelyanskiy, K. J. Sung, M. D. Trevithick, A. Vainsencher, B. Villalonga, T. White, Z. J. Yao, P. Yeh, A. Zalcman, H. Neven, and J. M. Martinis, “Quantum supremacy using a programmable superconducting processor,” Nature [...]
37. [...] Annals of Physics, vol. 326, no. 1, pp. 96–192, Aug. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.aop.2010.09.012
38. R. Orus, “A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States,” Annals of Physics, vol. 349, pp. 117–158, Jun. 2013. [Online]. Available: http://dx.doi.org/10.1016/j.aop.2014.06.013
39. I. L. Markov and Y. Shi, “Simulating quantum computation by contracting tensor networks,” SIAM Journal on Computing, vol. 38, no. 3, pp. 963–981, 2008. [Online]. Available: https://doi.org/10.1137/050644756
40. B. Villalonga, S. Boixo, B. Nelson, C. Henze, E. Rieffel, R. Biswas, and S. Mandrà, “A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware,” npj Quantum Information, vol. 5, pp. 1–16, 2019. [Online]. Available: https://doi.org/10.1038/s41534-019-0196-1
41. C. Huang, M. Szegedy, F. Zhang, X. Gao, J. Chen, and Y. Shi, “Alibaba cloud quantum development platform: Applications to quantum algorithm design,” arXiv preprint arXiv:1909.02559, 2019. [Online]. Available: https://arxiv.org/abs/1909.02559
42. J. Gray, “quimb: A python package for quantum information and many-body calculations,” Journal of Open Source Software, vol. 3, no. 29, p. 819, 2018.
43. D. Lykov, C. Ibrahim, A. Galda, and Y. Alexeev, “Tensor network simulator QTensor,” https://github.com/danlkv/QTensor, 2020.
44. X.-C. Wu, S. Di, E. M. Dasgupta, F. Cappello, H. Finkel, Y. Alexeev, and F. T. Chong, “Full-state quantum circuit simulation by using data compression,” in Proceedings of the High Performance Computing, Networking, Storage and Analysis International Conference (SC19). Denver, CO, USA: IEEE Computer Society, 2019. [Online]. Available: https://doi.org/10.1145/3295500.3356155
45. S. Boixo, S. V. Isakov, V. N. Smelyanskiy, and H. Neven, “Simulation of low-depth quantum circuits as complex undirected graphical models,” arXiv preprint arXiv:1712.05384, 2017.
46. R. Schutski, D. Lykov, and I. Oseledets, “An adaptive algorithm for quantum circuit simulation,” arXiv preprint arXiv:1911.12242, 2019. [Online]. Available: https://arxiv.org/pdf/1911.12242.pdf
47. Z. H. Saleem, B. Tariq, and M. Suchara, “Approaches to constrained quantum approximate optimization,” arXiv preprint arXiv:2010.06660