Enabling multi-programming mechanism for quantum computing in the NISQ era

Abstract

As NISQ devices have several physical limitations and unavoidable noisy quantum operations, only small circuits can be executed on a quantum machine to get reliable results. This leads to the quantum hardware under-utilization issue. Here, we address this problem and improve the quantum hardware throughput by proposing a multi-programming approach to execute multiple quantum circuits on quantum hardware simultaneously. We first introduce a parallelism manager to select an appropriate number of circuits to be executed at the same time. Second, we present two different qubit partitioning algorithms to allocate reliable partitions to multiple circuits: a greedy one and a heuristic one. Third, we use the Simultaneous Randomized Benchmarking protocol to characterize the crosstalk properties and consider them in the qubit partition process to avoid crosstalk effects during simultaneous executions. Finally, we enhance the mapping transition algorithm to make circuits executable on hardware using a decreased number of inserted gates. We demonstrate the performance of our multi-programming approach by executing circuits of different sizes on IBM quantum hardware simultaneously. We also investigate this method on the VQE algorithm to reduce its overhead.


Siyuan Niu

LIRMM, University of Montpellier, 34095 Montpellier, [email protected]

Aida Todri-Sanial

LIRMM, University of Montpellier, CNRS, 34095 Montpellier, [email protected]


Quantum computing promises to achieve an exponential speedup to tackle certain computational tasks compared with classical computers [11, 18, 21, 32, 33, 46, 47]. Although quantum technologies are continuously improving, current quantum devices are still qualified as Noisy Intermediate-Scale Quantum (NISQ) hardware [41], with several physical constraints. For example, for superconducting devices, which we target in this paper, connections are only allowed between two neighbouring qubits. Besides, the gate operations of NISQ devices are noisy and have unavoidable error rates. As we do not have enough qubits to realize Quantum Error Correction [9, 10, 22], only small circuits with limited depth can obtain reliable results when executed on quantum hardware, which leads to a waste of hardware resources. Moreover, with the growing demand for access to quantum hardware, this under-utilization increases the waiting time for users, which indicates the need to improve the hardware throughput.

As the qubit number of the hardware increases and the error rates improve, it becomes possible to execute multiple circuits on a quantum chip simultaneously. The multi-programming mapping problem was first introduced by [15], which demonstrated that the throughput and utilization of NISQ hardware can be enhanced by executing several circuits at the same time. Ref. [16] further improved it in terms of fidelity and gate number by proposing a Community Detection Assisted Partition algorithm along with the X-SWAP scheme (we refer to this algorithm as CDAP for brevity). However, their results showed that when executing multiple quantum circuits simultaneously, the activity of one circuit can negatively impact the fidelity of others, due to the difficulty of allocating reliable regions to each circuit, the higher chance of crosstalk error [45], and the qubit movement limitation (only inside of the partition).
Previous works [15, 16] have left these issues largely unexplored and have not addressed the problem holistically: (1) Hardware topology and calibration data are not fully analyzed, so circuits can be allocated to unreliable or sparsely connected partitions while robust qubits and links are ignored. (2) These works use only SWAP gates for the mapping transition process, and the modified circuits always have a large number of additional gates. (3) Crosstalk error is not considered when allocating partitions for circuits. For example, the X-SWAP scheme [16] for reducing the number of inserted SWAPs can only be performed when the two circuits are allocated to neighbouring partitions, which can introduce crosstalk and decrease the circuit output fidelity. A detrimental crosstalk impact when executing multiple parallel instructions has been reported in [5, 6, 37] using Simultaneous Randomized Benchmarking (SRB) [23]. In the presence of crosstalk, gate error can be increased by an order of magnitude. Ref. [5] even proposed a fault-attack model using crosstalk in a multi-programming environment.

It is important to investigate the multi-programming approach in the NISQ era, especially for Variational Quantum Algorithms (VQAs) [12]. For example, the multi-programming mechanism makes it possible to execute several ansatz states reliably in parallel on one quantum processor, such as in the Variational Quantum Eigensolver (VQE) [31, 40], the Variational Quantum Linear Solver (VQLS) [8, 29], or the Variational Quantum Classifier (VQC) [27, 43]. It is also general enough to be applied to other quantum circuits regardless of applications or algorithms.

In this work, we address the problem of multi-programming while considering the impact of hardware topology, calibration data, and crosstalk without losing circuit fidelity. First, we introduce a parallelism manager that can optimally select the number of circuits being executed on the quantum hardware simultaneously. Second, we present two different qubit partition algorithms to allocate reliable partitions to different circuits. One is a greedy partition algorithm which provides optimal choices. The other one is based on a heuristic which can give nearly optimal results and significantly reduce the time complexity. Third, we consider crosstalk error during the partition process to lower the crosstalk effect during simultaneous executions. Then, we improve the mapping transition step of the qubit mapping problem to make quantum circuits executable on quantum hardware with a reduced number of additional gates.
Finally, we evaluate our algorithm on real quantum hardware by first executing circuits of different sizes at the same time and then applying it to the VQE algorithm to estimate the ground state energy of the deuteron. To the best of our knowledge, this is the first attempt to propose a complete multi-programming process flow for executing an optimal number of workloads in parallel while ensuring the output fidelity by analyzing the hardware limitations.

The multi-programming workflow is schematically shown in Fig. 1 and includes the following steps:

• Input layer. It contains a list of small quantum circuits written in the OpenQASM language [14], and the quantum hardware information, including the hardware topology, calibration data, and crosstalk effect.
• Parallelism manager. It determines whether to execute circuits concurrently or separately. If simultaneous execution is allowed, it further decides the number of circuits to be executed on the hardware at the same time without losing fidelity, based on the fidelity metric included in the hardware-aware multi-programming compiler.
• Hardware-aware multi-programming compiler. Qubits are partitioned into several reliable regions and allocated to different quantum circuits using qubit partition algorithms. Then, the partition fidelity is evaluated by the post qubit partition process. We introduce a fidelity metric here which helps to decide whether this number of circuits can be executed simultaneously or whether the number needs to be reduced based on their properties.
• Scheduler. The mapping transition algorithm is applied and circuits are transpiled to be executable on real quantum hardware.
• Output layer. Output circuits are executed on the quantum hardware simultaneously or independently according to the previous steps, and the experimental results are obtained.

In order to determine the optimal number of circuits that can be executed on the hardware in parallel without losing fidelity, we introduce the parallelism manager, shown in Fig. 2a.

Suppose we have a list of $n$ circuit workloads, where circuit $i$ has $n_i$ qubits, that are expected to be executed on $N$-qubit hardware. First, the circuits are sorted according to their densities. The density of a circuit is defined as the number of CNOTs divided by the qubit number of the circuit, $\#CNOTs / n_i$ [16]. Then, we pick $K$ circuits, where $K$ is the maximum number of circuits that can be executed on the hardware at the same time, i.e., $\sum_{i=1}^{K} n_i \le N$. If $K$ is equal to one, then all the circuits should be executed independently. Otherwise, these circuits are passed to the hardware-aware multi-programming compiler. The parallelism manager and the compiler work together to decide an optimal number of simultaneous circuits to be executed.

Here, we present the key features of the qubit partition algorithms. A motivational example can be found in Supplementary Note 2.
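For illustration, the selection performed by the parallelism manager can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the names `Circuit`, `pick_max_parallel`, and `choose_parallel_set` are hypothetical, circuits are assumed to be sorted with the densest first, $K$ is taken as the longest prefix of the sorted list that fits on the hardware, and the fidelity check of the compiler is represented by a caller-supplied function `fidelity_gap` that returns $\Delta S$ for a candidate set. Dropping the least dense circuit when $K$ is reduced is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Circuit:
    name: str
    num_qubits: int
    num_cnots: int

    @property
    def density(self) -> float:
        # Density as defined in the text: number of CNOTs / number of qubits.
        return self.num_cnots / self.num_qubits


def pick_max_parallel(circuits: Sequence[Circuit], hardware_qubits: int) -> List[Circuit]:
    """Sort by density (assumed: densest first) and keep the longest prefix
    whose total qubit count does not exceed the hardware size N."""
    ordered = sorted(circuits, key=lambda c: c.density, reverse=True)
    picked: List[Circuit] = []
    used = 0
    for circuit in ordered:
        if used + circuit.num_qubits > hardware_qubits:
            break
        picked.append(circuit)
        used += circuit.num_qubits
    return picked


def choose_parallel_set(
    circuits: Sequence[Circuit],
    hardware_qubits: int,
    fidelity_gap: Callable[[List[Circuit]], float],  # stand-in for the compiler's Delta S
    delta: float,
) -> List[Circuit]:
    """Reduce K one circuit at a time until Delta S falls below the threshold.
    A result of length one means the circuits run independently."""
    picked = pick_max_parallel(circuits, hardware_qubits)
    while len(picked) > 1 and fidelity_gap(picked) >= delta:
        picked = picked[:-1]  # assumption: drop the least dense circuit
    return picked


# Toy workloads with made-up sizes, on a hypothetical 6-qubit device.
workloads = [Circuit("A", 3, 9), Circuit("B", 4, 4), Circuit("C", 2, 4)]
toy_gap = lambda group: 0.1 * len(group)  # made-up Delta S that grows with K
selected = choose_parallel_set(workloads, 6, toy_gap, delta=0.3)
```

With these toy values, circuits A and C fit together on the device and are kept when the threshold is 0.3, whereas a stricter threshold of 0.15 reduces the set to circuit A alone.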

Crosstalk effect characterization.

Crosstalk is one of the major noise sources in NISQ devices, which can corrupt a quantum state due to quantum operations on other qubits [44]. There are two types of crosstalk. The first one is quantum crosstalk, which is caused by the always-on ZZ interaction [35, 52]. The second one is classical crosstalk, caused by the incorrect control of the qubits. The calibration data provided by IBM do not include the crosstalk error. To consider the crosstalk effect in partition algorithms, we must first characterize it in the hardware. There are several protocols presented in [7, 19, 23, 28, 42] to benchmark the crosstalk effect in quantum devices. In this paper, we choose the most widely used protocol, Simultaneous Randomized Benchmarking (SRB) [23], to detect and quantify the crosstalk between CNOT pairs when executing them in parallel.

We characterize the crosstalk effect following the optimization methods presented in [37]. On IBM quantum devices, the crosstalk effect is significant only at one-hop distance between CNOT pairs [37], such as the pair shown in Fig. 3a, when the control pulse of one qubit propagates an unwanted drive to nearby qubits that have similar resonant frequencies. Therefore, we perform SRB only on CNOT pairs that are separated by one-hop distance. For those pairs whose distance is greater than one hop, the crosstalk effects are very weak and we ignore them. This allows us to parallelize SRB experiments of multiple CNOT pairs when they are separated by two or more hops. For example, on IBM Q 27 Toronto (ibmq_toronto) [1], three such pairs can be characterized in parallel.

We perform the crosstalk characterization on IBM Q 27 Toronto twice. The results show that, although the absolute gate errors vary every day, the pairs that have a strong crosstalk effect remain the same across days. An SRB experiment on a CNOT pair $(g_i | g_j)$ gives the error rates $E(g_i | g_j)$ and $E(g_j | g_i)$. Here, $E(g_i | g_j)$ represents the CNOT error rate of $g_i$ when $g_i$ and $g_j$ are executed in parallel. If there is a crosstalk effect between the two gates, it will lead to $E(g_i | g_j) > E(g_i)$ or $E(g_j | g_i) > E(g_j)$. The crosstalk effect characterization is expensive and time-consuming. Some of the pairs do not have a crosstalk effect, whereas the CNOT error of the pair affected the most by crosstalk is increased by more than five times. Therefore, we extract the pairs with a significant crosstalk effect, i.e., $E(g_i | g_j) > 3 \times E(g_i)$, and only characterize these pairs when crosstalk properties are needed. We choose the same factor 3 as [37] to identify the pairs with strong crosstalk error. The result of the crosstalk effect characterization on IBM Q 27 Toronto is shown in Fig. 3b.

Greedy sub-graph partition algorithm.

We develop a Greedy Sub-graph Partition algorithm (GSP) for the qubit partition process, which is able to provide, in theory, the optimal partitions for different quantum circuits (see Supplementary Note 3 for the pseudo-code of GSP). The first step of the GSP algorithm is to traverse the overall hardware to find all the possible partitions for a given circuit. For example, suppose we have a five-qubit circuit; we find all the subgraphs of the hardware topology (also called the coupling graph) containing five qubits as the partition candidates. Each candidate has a score representing its fidelity, depending on the topology and calibration data. The partition with the best fidelity is selected and all the qubits inside of the partition are marked as used so they cannot be assigned to other circuits. For the next circuit, a subgraph with the required number of qubits is considered and we check whether it overlaps with the partitions of previous circuits. If not, the subgraph is a partition candidate for the given circuit, and the same process is applied to each subsequent circuit. To account for crosstalk, we check if any pairs in a subgraph have a strong crosstalk effect caused by the allocated partitions of other circuits. If so, the score of the subgraph is adjusted to take crosstalk error into account.

Figure 1: Overview of the proposed multi-programming framework. The input layer includes the quantum hardware information and multiple quantum circuit workloads. The parallelism manager helps to decide whether to execute circuits simultaneously or independently. For simultaneous executions, it works with the hardware-aware multi-programming compiler to select an optimal number of shared workloads to be executed at the same time. Then, the scheduler makes all the circuits executable on the quantum hardware and we can obtain the results of the output circuits.

Figure 2: Process flow of each block that constitutes our multi-programming approach. (a) The parallelism manager selects $K$ circuits according to their densities and passes them to the hardware-aware multi-programming compiler. (b) The qubit partition algorithms allocate reliable regions to multiple circuits. $\Delta S$ is the difference between partition scores when partitioning independently and simultaneously, which is the fidelity metric. $\delta$ is the threshold set by the user. The fidelity metric helps to select the optimal number of simultaneous circuits to be executed. (c) The scheduler performs the mapping transition algorithm and makes quantum circuits executable on real quantum hardware.

In order to evaluate the reliability of a partition, two factors need to be considered: the partition topology, and the error rates of the two-qubit links together with the readout error of each qubit. One-qubit gates are ignored for simplicity and because of their relatively low error rates compared to the other quantum operations. If there is a qubit pair in a partition that has strong crosstalk affected by other partitions, then the crosstalk effect is included in the CNOT error of this pair. Note that the most recent calibration data should be retrieved through the IBM Quantum Experience before each usage to ensure that the algorithm has access to the most accurate and up-to-date information. To evaluate the partition topology, we determine the longest shortest path (also called the graph diameter) of the partition, denoted $L$. The smaller the longest shortest path is, the better the partition is connected, and eventually fewer SWAP gates are needed to connect two qubits in a well-connected partition.

We devise a reliability score metric for a partition that is the sum of the graph diameter $L$, the average CNOT error rate of the links times the number of CNOTs of the circuit, and the sum of the readout error rates of the qubits in the partition (Eq. 1). Note that the CNOT error rate includes the crosstalk effect if it exists.
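The exhaustive search and the reliability score described above can be sketched in Python as follows. This is an illustrative sketch only (the pseudo-code of GSP is in Supplementary Note 3): the function names, the dictionary-based data layout, and the toy calibration values are assumptions. In particular, `crosstalk_err` is a hypothetical mapping from a (link, other link) pair to the error $E(\text{link} \mid \text{other})$ measured in parallel, and `busy_links` lists links inside partitions already given to other circuits.

```python
from collections import deque
from itertools import combinations


def edge(a, b):
    """Undirected link between two physical qubits."""
    return frozenset((a, b))


def diameter(qubits, coupling):
    """Longest shortest path L inside the induced subgraph, or None if disconnected."""
    qs = set(qubits)
    longest = 0
    for src in qs:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search restricted to the candidate qubits
            u = queue.popleft()
            for v in coupling[u]:
                if v in qs and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(qs):
            return None
        longest = max(longest, max(dist.values()))
    return longest


def gsp_score(qubits, coupling, cx_err, readout_err, num_cnots,
              crosstalk_err=None, busy_links=()):
    """L + (average CNOT error) * (number of CNOTs) + sum of readout errors."""
    L = diameter(qubits, coupling)
    if L is None:
        return None  # not a connected subgraph, so not a partition candidate
    qs = set(qubits)
    links = {edge(a, b) for a in qs for b in coupling[a] if b in qs}
    errors = []
    for link in links:
        err = cx_err[link]
        for other in busy_links:
            # A strong-crosstalk pair uses the error measured in parallel instead.
            err = max(err, (crosstalk_err or {}).get((link, other), err))
        errors.append(err)
    avg = sum(errors) / len(errors) if errors else 0.0
    return L + avg * num_cnots + sum(readout_err[q] for q in qs)


def best_partition(k, coupling, cx_err, readout_err, num_cnots,
                   used=frozenset(), crosstalk_err=None, busy_links=()):
    """Score every connected k-qubit subgraph of the unused qubits; keep the lowest score."""
    free = sorted(q for q in coupling if q not in used)
    best = None
    for cand in combinations(free, k):
        score = gsp_score(cand, coupling, cx_err, readout_err, num_cnots,
                          crosstalk_err, busy_links)
        if score is not None and (best is None or score < best[0]):
            best = (score, cand)
    return best


# Toy 4-qubit line 0-1-2-3 with made-up calibration values.
coupling = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cx_err = {edge(0, 1): 0.01, edge(1, 2): 0.02, edge(2, 3): 0.05}
readout_err = {0: 0.02, 1: 0.01, 2: 0.03, 3: 0.04}
first = best_partition(2, coupling, cx_err, readout_err, num_cnots=10)
second = best_partition(2, coupling, cx_err, readout_err, num_cnots=10,
                        used={0, 1}, busy_links=[edge(0, 1)],
                        crosstalk_err={(edge(2, 3), edge(0, 1)): 0.2})
```

In this toy example the first circuit receives qubits (0, 1); the second circuit is left with (2, 3), whose score is raised because its link is assumed to suffer strong crosstalk from the link already in use. Enumerating all $k$-subsets is what makes this approach expensive on larger devices.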

$$Score_g = L + Avg_{CNOT} \times \#CNOTs + \sum_{Q_i \in P} R_{Q_i} \qquad (1)$$

The graph diameter $L$ is always prioritized in this equation, since it is more than one order of magnitude larger than the other two factors. The partition with the smallest reliability score is selected. It is supposed to have the best connectivity and the lowest error rate. Moreover, the partition algorithm prioritizes the quantum circuit with a large density because the input circuits are ordered by their densities during the parallelism manager process. The partition algorithm is then called for each circuit in order. However, the GSP algorithm is expensive and time-consuming. For small circuits, the GSP algorithm gives the best choice of partition. It is also useful as a baseline to compare with other partition algorithms. Beyond the NISQ era, a better approach should be explored to overcome the complexity overhead.

Figure 3: Characterization of crosstalk effect. (a) Crosstalk pairs separated by one-hop distance. The crosstalk pairs should be able to be executed at the same time. Therefore, they cannot share the same qubit. One hop is the minimum distance between crosstalk pairs. (b) Crosstalk effect results of IBM Q 27 Toronto using SRB. The arrow of the red dashed line points to a CNOT pair that is affected significantly by crosstalk when the two gates are executed simultaneously. As we choose 3 as the factor to pick up pairs with a strong crosstalk effect, a pair whose error rate increases by less than this factor has no arrow.

Qubit fidelity degree-based heuristic sub-graph partition algorithm.

The Qubit fidelity degree-based Heuristic Sub-graph Partition algorithm (QHSP) is intended to perform as well as GSP but without the large runtime overhead.

In QHSP, when allocating partitions, we favor qubits with high fidelity. We define the fidelity degree of a qubit based on the CNOT and readout fidelities of this qubit, as in Eq. 2.

$$F\_Degree_{Q_i} = \sum_{Q_j \in N(Q_i)} \lambda \times (1 - E[Q_i][Q_j]) + (1 - R_{Q_i}) \qquad (2)$$

$Q_j$ are the neighbour qubits connected to $Q_i$, $E$ is the CNOT error matrix, and $R$ is the readout error rate. $\lambda$ is a user-defined parameter that weights the CNOT error rate against the readout error rate. Such a parameter is useful for two reasons: (1) Typically, in a quantum circuit, the number of CNOT operations is different from the number of measurement operations. Hence, the user can decide on $\lambda$ based on the relative number of operations. (2) For some qubits, the readout error rate is one or more orders of magnitude larger than the CNOT error rate. Thus, it is reasonable to add a weight parameter.

The fidelity degree metric reveals two aspects of the qubit. The first one is the connectivity of the qubit. The more neighbours a qubit has, the larger its fidelity degree is. The second one is the reliability of the qubit, accounting for the CNOT and readout error rates. Thus, the metric allows us to select a reliable qubit with good connectivity. Instead of trying all the possible subgraph combinations (as in the GSP algorithm), we propose the QHSP algorithm to build partitions that contain qubits with a high fidelity degree while significantly reducing runtime.

To further improve the algorithm, we construct a list of qubits with good connectivity as starting points. We sort all physical qubits (qubits used in hardware) by their physical node degree, which is defined as the number of links of a physical qubit. Note that the physical node degree is different from the fidelity degree. Similarly, we also obtain the largest logical node degree among the logical qubits (qubits used in the quantum circuit) by checking the number of different qubits that are connected to a qubit through CNOT operations. Next, we compare these two metrics.

If the largest physical node degree is less than the largest logical node degree, we cannot find a suitable physical qubit to map the logical qubit with the largest logical node degree that satisfies all the connections. In this case, we only collect the physical qubits with the largest physical node degree. Otherwise, the physical qubits whose physical node degree is greater than or equal to the largest logical node degree are collected as starting points. By limiting the starting points, this heuristic partition algorithm becomes even faster.

For each qubit in the starting points list, the algorithm explores its neighbours, finds the neighbour qubit with the highest fidelity degree calculated in Eq. 2, and merges it into the sub-partition. Then, the qubit inside of the sub-partition with the highest fidelity degree explores its neighbour qubits and merges the best one. The process is repeated until the number of qubits inside of the sub-partition is equal to the number of qubits needed. This sub-partition is considered as a subgraph and is added to the partition candidates (see Supplementary Note 3 for the pseudo-code of QHSP).

After obtaining all the partition candidates, we compute the fidelity score for each of them. As we start from a qubit with a high physical node degree and merge neighbour qubits with a high fidelity degree, the constructed partition is supposed to be well-connected. Hence, we do not need to check the connectivity of the partition using the longest shortest path $L$ as in Eq. 1 of the GSP algorithm. We only need to compare the error rates. The fidelity score metric is simplified by only calculating the CNOT and readout error rates, as in Eq. 3. It is calculated for each partition candidate and the best one is selected. See Supplementary Note 3 for a detailed example of QHSP.
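The heuristic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the reference pseudo-code (see Supplementary Note 3): all function names and the toy calibration values are hypothetical, the physical node degree is counted on the full coupling graph, and when the highest-ranked qubit of the sub-partition has no free neighbour left, the sketch falls back to the next-ranked qubit, which is an assumption not stated in the text.

```python
def link(a, b):
    """Undirected link between two physical qubits."""
    return frozenset((a, b))


def fidelity_degree(q, coupling, cx_err, readout_err, lam=1.0):
    """Eq. 2: sum over neighbours of lam * (1 - E[q][j]), plus (1 - R[q])."""
    return sum(lam * (1 - cx_err[link(q, j)]) for j in coupling[q]) + (1 - readout_err[q])


def starting_points(coupling, max_logical_degree, used=frozenset()):
    """Qubits whose physical degree covers the largest logical degree,
    or the best-connected qubits if no qubit reaches it."""
    free = [q for q in coupling if q not in used]
    if not free:
        return []
    max_physical = max(len(coupling[q]) for q in free)
    threshold = min(max_physical, max_logical_degree)
    return [q for q in free if len(coupling[q]) >= threshold]


def grow_partition(start, k, coupling, fdeg, used=frozenset()):
    """Greedily merge the best neighbour of the best qubit until k qubits are collected."""
    part = [start]
    while len(part) < k:
        for q in sorted(part, key=fdeg.get, reverse=True):
            outside = [j for j in coupling[q] if j not in part and j not in used]
            if outside:
                part.append(max(outside, key=fdeg.get))
                break
        else:
            return None  # no free neighbour anywhere: this start cannot reach k qubits
    return part


def score_h(part, coupling, cx_err, readout_err, num_cnots):
    """Simplified score: average CNOT error * number of CNOTs + sum of readout errors."""
    qs = set(part)
    links = {link(a, b) for a in qs for b in coupling[a] if b in qs}
    avg = sum(cx_err[l] for l in links) / len(links) if links else 0.0
    return avg * num_cnots + sum(readout_err[q] for q in qs)


def qhsp(k, coupling, cx_err, readout_err, num_cnots, max_logical_degree,
         lam=1.0, used=frozenset()):
    fdeg = {q: fidelity_degree(q, coupling, cx_err, readout_err, lam) for q in coupling}
    candidates = []
    for start in starting_points(coupling, max_logical_degree, used):
        part = grow_partition(start, k, coupling, fdeg, used)
        if part is not None:
            candidates.append(part)
    if not candidates:
        return None
    return min(candidates, key=lambda p: score_h(p, coupling, cx_err, readout_err, num_cnots))


# Toy 5-qubit device (links 0-1, 1-2, 1-3, 3-4) with made-up calibration values.
coupling = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
cx_err = {link(0, 1): 0.01, link(1, 2): 0.03, link(1, 3): 0.02, link(3, 4): 0.05}
readout_err = {0: 0.02, 1: 0.01, 2: 0.05, 3: 0.02, 4: 0.04}
partition = qhsp(3, coupling, cx_err, readout_err, num_cnots=10, max_logical_degree=2)
```

On this toy device, qubits 1 and 3 are the starting points, and both grow into the same three-qubit partition {0, 1, 3}. Only one candidate is built per starting point, which is where the runtime saving over exhaustive enumeration comes from.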

$$Score_h = Avg_{CNOT} \times \#CNOTs + \sum_{Q_i \in P} R_{Q_i} \qquad (3)$$

Runtime analysis

Let $n$ be the number of hardware qubits, $k$ the number of circuit qubits to be allocated in a partition, and $g$ the number of gates in the circuit.

For the GSP algorithm, in most cases the number of circuit qubits is less than the number of hardware qubits, thus the time cost is $O(k\,n^{k})$. It increases exponentially as the number of circuit qubits grows. The QHSP algorithm starts by collecting a list of $m$ starting points, where $m \le n$. It takes $O(mk + n\log(n) + g)$, which is polynomial. For a detailed explanation of the runtime analysis, see Supplementary Note 3.

By default, the multi-programming mechanism reduces circuit fidelity compared to the standalone circuit execution mode. If the fidelity reduction is significant, circuits should be executed independently or the number of simultaneous circuits should be reduced, even though the hardware throughput decreases as well. Therefore, we consistently check the circuit fidelity difference between independent and concurrent execution.

We start with the qubit partition process for each circuit independently and obtain the fidelity score of the partition. Next, the qubit partition process is applied to these circuits together to compute the fidelity score when executing them simultaneously. The difference between the fidelity scores is denoted $\Delta S$, which is the fidelity metric. If $\Delta S$ is less than a specific threshold $\delta$, simultaneous circuit execution does not significantly degrade the fidelity score, so the circuits can be executed concurrently; otherwise, the circuits are executed independently or the number of simultaneous circuits is reduced. The fidelity metric along with the parallelism manager helps to determine the optimal number of simultaneous circuits to be executed.

The circuits need to be transformed to be executable on real quantum hardware, which includes two steps: initial mapping and mapping transition.
The initial mapping of each circuit is created while taking into account the swap error rate and swap distance needed to perform qubit movement operations [39]. The initial mapping of the simultaneous mapping transition process is obtained by merging the initial mapping of each circuit according to its partition. We further improve the mapping transition algorithm [39] by modifying the heuristic cost function to better select the inserted gate. We also introduce the Bridge gate to the simultaneous mapping transition process for multi-programming.

First, each quantum circuit is transformed into a more convenient format, a Directed Acyclic Graph (DAG) circuit, which represents the operation dependencies of the circuit without considering the connectivity constraints. Then, the compiler traverses the DAG circuit and goes through each quantum gate sequentially. A gate that does not depend on other gates (i.e., all the gates before it have been executed) is allocated to the first layer, denoted $F$. The compiler checks if the gates on the first layer are hardware-compliant. The hardware-compliant gates can be executed on the hardware directly without modification. They are added to the scheduler, removed from the first layer, and marked as executed. If the first layer is not empty, which means some gates are non-executable on hardware, a SWAP or Bridge gate is needed. We collect all the possible SWAPs and Bridges, and use the cost function $H$ (see Eq. 5) to find the best candidate. The process is repeated until all the gates are marked as executed (see Supplementary Note 4 for the pseudo-code of the simultaneous mapping transition algorithm).

A SWAP gate requires three CNOTs, and inserting a SWAP gate can change the current mapping. A Bridge gate requires four CNOTs, and inserting a Bridge gate does not change the current mapping; it can only be used to execute a CNOT when the distance between the control qubit and the target qubit is exactly two. Both gates need three supplementary CNOTs: the three CNOTs of a SWAP come on top of the original CNOT, whereas the four CNOTs of a Bridge already implement the original CNOT. The SWAP gate is preferred when it has a positive impact on the following gates, allocated in the extended layer $E$, i.e., when it makes these gates executable or reduces the distance between their control and target qubits. Otherwise, a Bridge gate is preferred.

A cost function $H$ is introduced to evaluate the cost of inserting a SWAP or Bridge. We use the following distance matrix (Eq. 4), as in [39], to quantify the impact of the SWAP or Bridge gate,

$$D = \alpha_1 \times S + \alpha_2 \times \mathcal{E} \qquad (4)$$

where $S$ is the swap distance matrix and $\mathcal{E}$ is the swap error matrix. We set $\alpha_1$ and $\alpha_2$ to 0.5 to equally consider the swap distance and swap error rate. In [39], only the impact of a SWAP or Bridge on other gates (first and extended layer) was considered, without considering the cost of the inserted gate itself. As each of them is composed of either three or four CNOTs, this cost cannot be ignored. Hence, in our multi-programming mapping transition algorithm, we take the self-impact into account and create a list of both SWAP and Bridge candidates, labeled as "tentative gates". The heuristic cost function is:

$$H = \frac{1}{|F + N_{Tent}|} \left( \sum_{g \in F} D[\pi(g.q_1)][\pi(g.q_2)] + \sum_{g \in Tent} D[\pi(g.q_1)][\pi(g.q_2)] \right) + W \times \frac{1}{|E|} \sum_{g \in E} D[\pi(g.q_1)][\pi(g.q_2)] \qquad (5)$$

where $W$ is the parameter that weights the impact of the extended layer, $N_{Tent}$ is the number of CNOTs in the tentative gate, $Tent$ represents a SWAP or Bridge gate, and $\pi$ represents the mapping. A SWAP gate has three CNOTs, thus $N_{Tent}$ is three and we consider the impact of three CNOTs on the first layer. The mapping is the new mapping after inserting the SWAP. For a Bridge gate, $N_{Tent}$ is four and we consider four CNOTs on the first layer, and the mapping is the current mapping, as a Bridge gate does not change the current mapping. We weight the impact on the extended layer to prioritize the first layer. This cost function helps the compiler select the best gate to insert between a SWAP and a Bridge gate.
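To make the role of each term in Eq. 5 concrete, the cost function can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name and data layout are hypothetical, $|F + N_{Tent}|$ is read as the number of first-layer gates plus the number of CNOTs of the tentative gate, the default $W = 0.5$ is a placeholder, and the toy distance matrix is a plain hop-count matrix standing in for $D$ of Eq. 4.

```python
def heuristic_cost(front, extended, tentative_cnots, mapping, dist, weight=0.5):
    """Cost H for one tentative gate.

    front, extended : lists of two-qubit gates as (logical_q1, logical_q2) pairs
    tentative_cnots : the CNOTs making up the tentative SWAP (3) or Bridge (4)
    mapping         : logical -> physical qubit; the mapping *after* the SWAP for a
                      SWAP candidate, the current mapping for a Bridge candidate
    dist            : matrix D indexed by physical qubits
    """
    def d(gate):
        q1, q2 = gate
        return dist[mapping[q1]][mapping[q2]]

    n_tent = len(tentative_cnots)
    first_layer = sum(d(g) for g in front) + sum(d(g) for g in tentative_cnots)
    cost = first_layer / (len(front) + n_tent)
    if extended:  # look-ahead term, down-weighted to prioritize the first layer
        cost += weight * sum(d(g) for g in extended) / len(extended)
    return cost


# Toy example: physical line 0-1-2, logical qubits a, m, b placed on 0, 1, 2.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # hop counts as a stand-in for D
front = [("a", "b")]                      # CNOT(a, b) is not executable: distance 2
current = {"a": 0, "m": 1, "b": 2}
after_swap = {"a": 1, "m": 0, "b": 2}     # SWAP between physical qubits 0 and 1
swap_cnots = [("a", "m")] * 3
bridge_cnots = [("a", "m"), ("m", "b")] * 2

h_swap = heuristic_cost(front, [], swap_cnots, after_swap, dist)
h_bridge = heuristic_cost(front, [], bridge_cnots, current, dist)

# If the next gate is CNOT(m, b), the SWAP would move m away from b.
follow = [("m", "b")]
h_swap_ext = heuristic_cost(front, follow, swap_cnots, after_swap, dist)
h_bridge_ext = heuristic_cost(front, follow, bridge_cnots, current, dist)
```

In this toy setting the SWAP has the lower cost when nothing follows, while the Bridge has the lower cost once the extended layer contains a gate that the SWAP would push further apart, which matches the preference rule stated above.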

We first evaluated our multi-programming approach by executing a list of different-size benchmarks at the same time on two quantum devices, IBM Q 27 Toronto and IBM Q 65 Manhattan (ibmq_manhattan) [2] (see Supplementary Note 1 for further information about the selected quantum hardware). All the benchmarks are collected from the previous work [54], including several functions taken from RevLib [49] as well as some quantum algorithms written in Quipper [25] or Scaffold [4]. These benchmarks are widely used in the quantum community and their details are shown in Table 1. We chose small quantum circuits with shallow depth, since only small circuits can obtain reliable results when executed on real quantum hardware. The metrics we used to evaluate our algorithm include the Probability of a Successful Trial (PST), the number of additional CNOT gates, and the Trial Reduction Factor (TRF); see Methods for a detailed explanation.

Figure 4: Comparison of fidelity and number of additional gates on IBM Q 27 Toronto when executing two circuits simultaneously. (a) Fidelity. (b) Number of additional gates. Legend: HA, PHA, CDAP, QHSP.

Figure 5: Comparison of fidelity and number of additional gates on IBM Q 65 Manhattan when executing three circuits simultaneously. (a) Fidelity. (b) Number of additional gates. Legend: HA, PHA, CDAP, QHSP.

Figure 6: Comparison of fidelity and number of additional gates on IBM Q 65 Manhattan when executing four circuits simultaneously. (a) Fidelity. (b) Number of additional gates. Legend: HA, PHA, CDAP, QHSP.

Several published qubit mapping algorithms [26, 30, 34, 36, 39, 50, 53] and multi-programming mapping algorithms are available, as discussed in Section 1. HA [39] seems to be the best qubit mapping algorithm in terms of the number of additional gates and circuit fidelity. We use HA as the baseline for independent executions of multiple circuits. The CDAP algorithm proposed in [16] seems to be the best multi-programming mapping algorithm and is considered as the baseline for concurrent executions of multiple circuits.

To summarize, we compare our multi-programming algorithms, (1) GSP + improved mapping transition (labeled as GSP) and (2) QHSP + improved mapping transition (labeled as QHSP), with the baseline CDAP. The loss of fidelity due to simultaneous executions of multiple circuits is reported by comparing concurrent versus independent executions. Moreover, we compare the partition + improved mapping transition algorithm based on HA (labeled as PHA) versus HA on independent executions to show the impact of partition in large quantum hardware for a small circuit. The details of the configuration of algorithms are presented in Methods.

We first ran two quantum circuits on IBM Q 27 Toronto simultaneously. Results on output state fidelity and the number of additional gates are shown in Fig. 4. For independent executions, the fidelity is improved by 46.8% and the number of additional gates is reduced by 8.7% comparing PHA to HA. For simultaneous executions, QHSP and GSP allocate the same partitions except for the first experiment, (ID1, ID1). In this experiment, GSP improves the fidelity by 6% compared to QHSP.
Partition results might be different due to the various calibration data and the choice of λ, but the difference of the partition fidelity score between the two algorithms is small. The results show that QHSP is able to allocate nearly optimal partitions while reducing runtime significantly. Therefore, for the remaining experiments, we only evaluate the QHSP algorithm. QHSP can improve the fidelity by 28.9% and reduce the additional gate number by 52.3% compared to CDAP. Comparing simultaneous (QHSP) versus independent (PHA) executions for two circuits, fidelity decreases by 5.8% and the number of additional gates is almost the same. During the post-partition process, ΔS does not pass the threshold and TRF is two.

Next, we executed three and four simultaneous quantum circuits on IBM Q 65 Manhattan. Fig. 5 and Fig. 6 show the comparison of fidelity and the number of additional gates. PHA always outperforms HA for independent executions. QHSP significantly outperforms CDAP as the number of simultaneous circuits increases. The output fidelity is increased by 74.8% and 55.3% on average for the two cases. The reduction of inserted gate number is always more than 50%. The threshold is still not passed and TRF becomes three and four. Moreover, fidelities decrease by 1.5% and 6.7% when comparing simultaneous (QHSP) versus independent (PHA) executions.

Table 1: Information of benchmarks (columns: ID, Name, Qubits, Num g, Num CNOT).

Finally, to evaluate the hardware limitations of executing multiple circuits in parallel, we set the threshold δ to 0.2. All five benchmarks can be executed simultaneously on IBM Q 65 Manhattan. The partition fidelity difference is 0.18. Results show that the fidelity of simultaneous executions (QHSP) is decreased by 9.5% compared to independent executions (PHA). Both the fidelity and additional gate number improvements of QHSP are more than 50% compared to CDAP. The complete experimental results can be found in Supplementary Note 5.

For independent executions, PHA is always better than HA for two reasons: (1) The initial mapping of the two algorithms is based on a random process. During the experiment, we perform the initial mapping generation process ten times and select the best one. However, for PHA, we first limit the random process to a reliable and well-connected small partition space rather than the overall hardware space used by HA. Therefore, with only ten trials, PHA finds a better initial mapping. (2) We improve the mapping transition process of PHA, which can make a better selection between SWAP and Bridge gates. HA is shown to be sufficient for hardware with a small number of qubits, for example a 5-qubit quantum chip. If we want to map a circuit on large hardware, it is better to first limit the search space to a reliable small partition and then find the initial mapping. This qubit partition approach can be applied to the general qubit mapping problem to limit the search space when large hardware is selected for mapping.

For simultaneous executions, QHSP performs better than CDAP for the following reasons: (1) CDAP constructs a hierarchy tree according to the modularity-based FN community detection algorithm [38]. The tree is constructed by calculating the modularity of the overall hardware coupling graph. However, when allocating a partition to a circuit, we focus on the topology and calibration data inside the partition, rather than the whole hardware. As the number of partitions to allocate increases, the performance of CDAP becomes worse. (2) CDAP only considers the SWAP gate to realize the connection, ignoring the Bridge gate, which can significantly reduce the number of additional gates. (3) CDAP does not consider the crosstalk effect. Although the X-SWAP scheme used in CDAP can slightly reduce the number of additional gates, it only works when the allocated partitions are close to each other, which will increase the crosstalk effect. In contrast, QHSP takes the partition topology, error rate, and crosstalk effect into consideration and can provide better partitions. QHSP uses almost the same number of additional gates, whereas fidelity is decreased by less than 10% compared to PHA if the threshold is set to 0.1.
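The percentages quoted in this section appear to be relative changes between averaged PST values or summed gate counts. This reading is inferred from the numbers in Tables 2 and 3 rather than from a formula stated in the text, so the following Python sketch should be taken as an assumption; the function name is hypothetical and the inputs are the first rows of those tables.

```python
def relative_change_percent(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# First row of Table 2: average PST of HA (0.565) and PHA (0.681).
fidelity_gain = relative_change_percent(0.565, 0.681)

# First row of Table 3: additional gates of CDAP (42) and QHSP (24).
# A negative change is a reduction, so the sign is flipped to report it.
gate_reduction = -relative_change_percent(42, 24)
```

With rounded table entries as inputs, the computed values can differ in the last digit from the published columns, which were presumably derived from unrounded data.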

In order to demonstrate the potential interest of applying the multi-programming mechanism to existing quantum algorithms, we investigated it on the VQE algorithm. To do this, we performed the same experiment as [17, 24] on IBM Q 65 Manhattan, estimating the ground state energy of deuteron, which is the nucleus of a deuterium atom, an isotope of hydrogen.

Deuteron can be modeled using a 2-qubit Hamiltonian spanning four Pauli strings: ZI, IZ, XX, and YY [17, 24]. If we use the naive measurement to calculate the state energy, one ansatz corresponds to four different measurements. Pauli operator grouping has been proposed to reduce this overhead by utilizing simultaneous measurement [13, 24, 31]. For example, the Pauli strings can be partitioned into two commuting families, {ZI, IZ} and {XX, YY}, using the approach proposed in [24]. It allows one parameterized ansatz to be measured twice instead of four times as in the naive method.

We used a simplified Unitary Coupled Cluster ansatz with a single parameter and three gates, as described in [17, 24]. The algorithm configuration of this experiment is explained in Methods. We applied our multi-programming method on top of the Pauli operator grouping approach (labeled as PG) [24]. We performed this experiment twice across different days. For the first experiment, the parallelism manager worked with the hardware-aware multi-programming compiler to finally select ten circuits for simultaneous execution without passing the fidelity threshold. This corresponds to performing five optimisations (five different parameterized circuits) at the same time, since one parameterized circuit needs two measurements. The selected ten circuits were passed to the scheduler to be executed in parallel. The required circuit number is reduced by ten times compared to PG. Note that, relative to the naive measurement, the number of circuits needed is reduced by a factor of 20. The result is shown in Fig. 7a. The error rate is quite high for the two executions, 29.7% for PG and 64.4% for multi-programming + PG. The result of the second experiment is shown in Fig. 7b. In this case, four optimisations (eight circuits) were selected to be executed at the same time with respect to the fidelity threshold. The error rate is 9.3% and 7% for the two methods. Applying multi-programming can even improve the output fidelity. The huge fidelity difference is due to the different calibration data of the device, which are the input of our multi-programming approach. The complete result of the two experiments including hardware throughput is shown in Fig. 7c.
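The grouping of the four deuteron Pauli strings into two commuting families, and the resulting circuit counts, can be illustrated with a short Python sketch. It uses a greedy first-fit heuristic chosen here for illustration, which is not necessarily the procedure of [24]; the function names are hypothetical.

```python
def commute(p, q):
    """Return True if Pauli strings p and q (e.g. "ZI", "XX") commute."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must have equal length")
    # Count positions where both letters are non-identity and different.
    # The two strings commute exactly when this count is even.
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def group_commuting(paulis):
    """Greedy first-fit grouping into families of mutually commuting strings."""
    families = []
    for p in paulis:
        for family in families:
            if all(commute(p, q) for q in family):
                family.append(p)
                break
        else:
            # No existing family accepts p, so it starts a new one.
            families.append([p])
    return families


families = group_commuting(["ZI", "IZ", "XX", "YY"])

# Circuit counts for five optimisations, as in the first experiment.
optimisations = 5
circuits_with_grouping = optimisations * len(families)
circuits_naive = optimisations * 4
```

Under these assumptions the sketch yields two families, so five optimisations need ten circuits with grouping and twenty with naive measurement, matching the factors of 10 and 20 quoted above when all ten circuits run as one multi-programmed job.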

In this article, we presented a multi-programming approach that allows multiple circuits to be executed on a quantum chip simultaneously without losing fidelity. We introduced the parallelism manager and fidelity metric to select optimally the number of circuits to be executed at the same time. Moreover, we proposed a hardware-aware multi-programming compiler which contains two qubit partition algorithms taking hardware topology, calibration data, and crosstalk effect into account to allocate reliable partitions to different quantum circuits. We also demonstrated an improved simultaneous mapping transition algorithm which helps to transpile the circuits on quantum hardware with a reduced number of inserted gates.

We first executed a list of circuits of different sizes simultaneously and compared our algorithm with the state-of-the-art multi-programming approach. Experimental results showed that our approach can outperform the state of the art in terms of both output fidelity and the number of additional gates. Then, we investigated our multi-programming approach on the VQE algorithm to estimate the ground state energy of deuteron, showing the added value of applying our approach to existing quantum algorithms. The multi-programming approach is evaluated on IBM hardware, but it is general enough to be adapted to other quantum hardware.

(a) ⟨H⟩ Deuteron estimation, 5 optimisations (curves: PG, Multi+PG, Theory)
(b) ⟨H⟩ Deuteron estimation, 4 optimisations (curves: PG, Multi+PG, Theory)
(c)

| Experiment | Method | n_c | Error rate (%) | Hardware throughput |
|---|---|---|---|---|
| ID1 | PG | 1 | 29.7 | 0.03 |
| ID1 | Multi+PG | 10 | 64.4 | 0.3 |
| ID2 | PG | 1 | 9.3 | 0.03 |
| ID2 | Multi+PG | 8 | 7 | 0.25 |

Figure 7: The estimation of the ground state energy of deuteron under PG and multi-programming + PG. (a) Five optimisations with ten measurements. (b) Four optimisations with eight measurements. (c) The complete result of the two experiments. n_c is the number of simultaneous circuits.

Based on the experimental results, we found that the main concern with the multi-programming mechanism is a trade-off between output fidelity and hardware throughput: for example, how one can decide which programs to execute simultaneously and how many of them to execute without losing fidelity. Here, we list several guidelines to help the user utilize our multi-programming approach.

• Check the target hardware topology and calibration data. The multi-programming mechanism is more suitable for a quantum chip that is relatively large compared to the quantum circuit and has a low error rate.
• Choose an appropriate fidelity threshold for the post qubit partition process. A high threshold can improve the hardware throughput but lead to a reduction of output fidelity. It should be set carefully depending on the size of the benchmark. For benchmarks of the small size that we used in experiments, it is reasonable to set the threshold to 0.1.
• The number of circuits that can be executed simultaneously will mainly depend on the fidelity threshold and the calibration data of the hardware.
• The QHSP algorithm is suggested for the partition process due to its efficiency, and GSP is recommended to evaluate the quality of the partition algorithm. Using both algorithms, one can explore which circuits can be executed simultaneously and how many of them within the given fidelity threshold.

Quantum hardware development with more and more qubits will enable execution of multiple quantum programs simultaneously and is possibly a linchpin for quantum algorithms requiring parallel sub-problem executions. Variational Quantum Algorithm is becoming a leading strategy to demonstrate quantum advantages for practical applications.
In such algorithms, the preparation of the parameterized quantum state and the measurement of the expectation value are realized on shallow circuits [51]. Taking VQE as an example, the Hamiltonian can be decomposed into several Pauli operators, and simultaneous measurement by grouping Pauli operators has been proposed in [13, 24, 31] to reduce the overhead of the algorithm. Based on our experiment, we have shown that the overhead of VQE can be further reduced by executing several sets of Pauli operators at the same time using the multi-programming mechanism.

For future work, we would like to apply our multi-programming algorithm to other variational quantum algorithms such as VQLS or VQC to enable the preparation of states in parallel and to reduce the overhead of these algorithms. Moreover, in our qubit partition algorithms, we take the crosstalk effects into consideration by characterizing them and adding them to the fidelity score of the partition, which avoids crosstalk errors at a high level. There are other approaches that eliminate the crosstalk error more cheaply than performing the SRB protocol, for example using commutativity rules to reorder the simultaneous gate operations [30, 37]. However, these methods have some challenges, such as trading off between crosstalk and decoherence. Further crosstalk mitigation techniques targeted at simultaneous executions are needed. In addition, not all the benchmarks have the same circuit depth. Taking the time-dependency into consideration, choosing the optimal combination of circuits of different depths to run simultaneously can also be the focus of future work.

Here are the detailed explanations of the metrics that we use to evaluate our algorithm.

(1) Probability of a Successful Trial (PST) [48]. This metric is defined as the number of trials that give the expected result divided by the total number of trials. The expected result is obtained by executing the quantum circuit on the simulator. To have a precise estimation of the PST, we execute each quantum circuit on the quantum hardware for a large number of trials (8192).
(2) Number of additional CNOT gates. This metric is related to the number of SWAP or Bridge gates inserted. It shows the ability of the algorithm to reduce the number of additional gates.
(3) Trial Reduction Factor (TRF). This metric is introduced in [15] to evaluate the improvement of the throughput thanks to the multi-programming mechanism. It is defined as the ratio of the trials needed when quantum circuits are executed independently to the trials needed when they are executed simultaneously.
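The PST and TRF definitions above translate directly into a few lines of Python. This is a minimal sketch, assuming the expected result is a single bitstring and the measurement outcomes are given as a histogram; the function names and the counts are made up for illustration and are not experimental data.

```python
def pst(counts, expected):
    """Probability of a Successful Trial from a histogram of bitstrings."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trials recorded")
    return counts.get(expected, 0) / total


def trf(independent_trials, simultaneous_trials):
    """Trial Reduction Factor: independent trials over simultaneous trials."""
    if simultaneous_trials == 0:
        raise ValueError("simultaneous_trials must be non-zero")
    return independent_trials / simultaneous_trials


shots = 8192
# Illustrative histogram only; "00" plays the role of the simulator's output.
counts = {"00": 6144, "01": 1024, "10": 512, "11": 512}
example_pst = pst(counts, "00")

# Two circuits run one after the other need 2 * shots trials,
# whereas running them simultaneously needs only `shots` trials.
example_trf = trf(2 * shots, shots)
```

In this toy case the PST is 0.75 and the TRF is 2, consistent with the statement that TRF is two when two circuits are executed simultaneously.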

Here, we consider the algorithm configurations of different multi-programming and standalone mapping approaches. We select the best initial mapping out of ten attempts for HA, PHA, GSP, and QHSP. The weight parameter W in the cost function (Eq. 5) is set to 0.5 and the size of the extended layer is set to 20. The two α parameters are each set to 0.5 to consider equally the swap distance and swap error rate. For the experiments of multiple different-size circuits, the weight parameter λ of QHSP (Eq. 2) is set to 2 because of the relatively large number of CNOT gates in the benchmarks, whereas for the deuteron experiment, λ is set to 1 because of the small number of CNOTs of the parameterized circuit. The threshold δ for post qubit partition is set to 0.1 to ensure the multi-programming fidelity. Due to the expensive cost of SRB, we perform SRB only on IBM Q 27 Toronto and collect the pairs with significant crosstalk effect. Only the collected pairs are characterized and their crosstalk properties are provided to the partition process. The experimental results on IBM Q 65 Manhattan do not consider the crosstalk effect. For each algorithm, we only evaluate the mapping transition process, which means no optimisation methods like gate commutation or cancellation are applied.

Figure 8: IBM Q 27 Toronto topology and error rates.

The algorithm is implemented in Python and evaluated on a PC with one Intel i5-5300U CPU and 8 GB memory. The operating system is Ubuntu 18.04. All the experiments were performed with the IBM quantum information science kit (Qiskit) [20], version 0.21.0. The source code of the algorithms used in this paper is available on the GitHub repository [3].
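To make the role of the weight W concrete, the cost line of Algorithm 3 can be sketched in Python. The fraction numerators are lost in the extracted listing, so this sketch assumes the form H = (H_basic + H_tentative) / |F + N_tent| + W · H_extend / |E|, which is common for look-ahead mapping heuristics of this family; the function and argument names are hypothetical.

```python
def heuristic_cost(h_basic, h_tentative, h_extend,
                   front_size, n_tentative, extended_size, weight=0.5):
    """Look-ahead cost of one SWAP/Bridge candidate (assumed form, see lead-in)."""
    denominator = front_size + n_tentative
    if denominator <= 0:
        raise ValueError("front layer and tentative gate count cannot both be empty")
    # Average distance cost over the front layer plus the tentative gates.
    front_term = (h_basic + h_tentative) / denominator
    if extended_size == 0:
        # No look-ahead gates are left, so only the front term contributes.
        return front_term
    # Averaged look-ahead cost, damped by the weight W (0.5 in the experiments).
    return front_term + weight * h_extend / extended_size


# Illustrative numbers only: front layer of 2 gates, 1 tentative gate,
# extended layer of 20 gates as configured in the experiments.
example_cost = heuristic_cost(4.0, 2.0, 10.0, 2, 1, 20)
```

A smaller W makes the choice of SWAP or Bridge depend more on the gates that must be executed next, while a larger W gives more influence to the extended layer.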

Noise can cause several errors during the execution process, such as (1) coherence errors due to the fragile nature of qubits, since a qubit can only maintain information for a limited amount of time; (2) operational errors, including gate errors and measurement errors (readout errors); and (3) crosstalk errors that violate the isolated qubit state due to operations on other qubits.

Supplementary Fig. 8 shows the hardware topology and the calibration data of IBM Q 27 Toronto. We list the calibration data of single-qubit error rate, CNOT error rate, and readout error rate. Note that these errors are not constant and change at each recalibration of the chip, and IBM does not provide the statistics of crosstalk error. The other device that we choose to evaluate our algorithm is IBM Q 65 Manhattan. Its topology and calibration data are shown in Supplementary Fig. 9.

Figure 9: IBM Q 65 Manhattan topology and calibration data.

CNOT error rates are one order of magnitude higher than their one-qubit counterparts. Moreover, the readout error rate is of the same order of magnitude as or higher than the CNOT error rate. In this paper, we only focus on CNOT error rate and readout error rate because of the relatively low error rates of one-qubit gates.

It is important to note that the interconnects between qubits, as well as the qubits themselves, are not equally reliable with respect to CNOT error rate and readout error rate. Taking IBM Q 27 Toronto as an example, the best CNOT gate has an error rate 4.8 times lower than the worst CNOT, and the most reliable qubit has a readout error rate 31.7 times lower than the worst qubit. Therefore, qubits cannot all be treated equally, and we need to consider the error difference between the links and qubits.

In this article, we mainly focus on IBM architectures. However, the proposed methods are general enough to be applied to any other quantum chips that use the quantum-gate model of computation, such as Google's Sycamore [56] or Rigetti's Aspen-8.

To motivate the qubit partition problem, we execute two small circuits QC1 and QC2 simultaneously on IBM Q 27 Toronto with different partitions (Supplementary Fig. 10). The CNOT error rate of each link is shown in the figure, and the unreliable links and qubits with high readout error rates are highlighted in red. Both circuits have five qubits with a different number of gates, as listed in Supplementary Fig. 11.

There are two constraints to be considered when executing multiple circuits concurrently. First, each circuit should be allocated to a partition containing reliable physical qubits. Allocated physical qubits cannot be shared among quantum circuits. Second, qubits can be moved only inside their circuit partition; in other words, qubits can be swapped within the same partition only. Thus, finding reliable partitions for multiple circuits is an important step in the multi-programming mapping problem.

Figure 10: A motivational example of qubit partition problem (error rate in %). (a) Partition without considering operational error. (b) Partition considering operational error without considering crosstalk effect. (c) Partition considering both operational error and crosstalk effect.

We compare three partitions with the same topology to show the impact of different error sources on the output fidelity: (1) Partition P1 without considering the operational error (Supplementary Fig. 10a). (2) Partition P2 only considering operational error without the crosstalk effect (Supplementary Fig. 10b). (3) Partition P3 considering both operational error and crosstalk effect (Supplementary Fig. 10c). Note that the operational error includes CNOT error and readout error. For illustration, we fix the partition of QC2 to { , , , , } and only change the partition of QC1. It is important to note that if we have different topologies, the fidelity of the circuit will be different as well because the number of additional gates is strongly related to the hardware topology.

(a)

| Benchmark | Name | n | g | P1 | P2 | P3 |
|---|---|---|---|---|---|---|
| QC1 | alu-v0_27 | 5 | 36 | 0.256 | 0.289 | 0.467 |
| QC2 | 4mod5-v1_22 | 5 | 21 | 0.792 | 0.759 | 0.783 |

(b) Bar chart of the fidelity of QC1 and QC2 for partitions P1, P2, and P3.

Figure 11: Results of the motivational example. (a) Circuit information and output fidelity results of different partitions. n: qubit number. g: gate number of the circuit. (b) Output fidelity results of different partitions.

Results in Supplementary Fig. 11b show that Partition P1 has the lowest fidelity. Partition P2 considers operational error and selects { , , , , } with reliable qubits and links. However, it does not consider the crosstalk effect. Since 𝑄 is the neighbour of 𝑄 , when 𝐶𝑋 , and 𝐶𝑋 , are executed at the same time, they can affect each other and violate the qubit state. Partition P3 includes { , , , , } and considers both operational error and crosstalk effect. P3 does not have the crosstalk effect and is slightly better than P2 in terms of the operational error; however, the output fidelity of QC1 is increased by about 61% (from 0.289 to 0.467).

In this note, we first demonstrate the pseudo-code of the GSP algorithm. Then, we show an example of the QHSP algorithm and its pseudo-code. Finally, we explain the runtime analysis of the two algorithms in detail.

The pseudo-code of GSP is shown in Algorithm 1.
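The enumeration at the core of GSP, which tries every combination of free physical qubits and keeps the connected ones, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the partition score (Set_Partition_Score) and the crosstalk lookup are omitted, used qubits are filtered before enumeration rather than inside the loop, and the function names and the 5-qubit coupling map are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Check that `nodes` induce a connected subgraph of the coupling graph."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    # Depth-first search restricted to the candidate nodes.
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == nodes


def candidate_partitions(num_qubits, edges, circuit_qubits, used_qubits=()):
    """Enumerate all connected subsets of free qubits of the requested size."""
    used = set(used_qubits)
    free = [q for q in range(num_qubits) if q not in used]
    return [set(subset)
            for subset in combinations(free, circuit_qubits)
            if is_connected(subset, edges)]


# Hypothetical T-shaped 5-qubit coupling map: 0-1, 1-2, 1-3, 3-4.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
partitions = candidate_partitions(5, edges, 4)
remaining = candidate_partitions(5, edges, 4, used_qubits=(0,))
```

Because every size-k subset is visited, the number of candidates grows combinatorially with the hardware size, which is the cost that the heuristic QHSP algorithm is designed to avoid.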

Algorithm 1: GSP algorithm
input: Quantum circuit QC, Coupling graph G, Calibration data C, Crosstalk properties crosstalk_props, Used_qubits q_used
output: A list of candidate partitions sub_graph_list
begin
    qubit_num ← QC.qubit_num;
    Set sub_graph_list to empty list;
    for sub_graph ∈ combinations(G, qubit_num) do
        if sub_graph is connected then
            if q_used is empty then
                sub_graph.Set_Partition_Score(G, C, QC);
                sub_graph_list.append(sub_graph);
            end
            if no qubit in sub_graph is in q_used then
                crosstalk_pairs ← Find_Crosstalk_pairs(sub_graph, crosstalk_props, q_used);
                sub_graph.Set_Partition_Score(G, C, QC, crosstalk_pairs);
                sub_graph_list.append(sub_graph);
            end
        end
    end
    return sub_graph_list;
end

Supplementary Fig. 12 shows an example of applying QHSP on IBM Q 5 Valencia (5-qubit ibmq_valencia) [55] for a four-qubit circuit. The calibration data of IBM Q 5 Valencia, including readout error rate and CNOT error rate, is shown in Supplementary Fig. 12a. The fidelity degree of each qubit calculated by Eq. 2 is shown in Supplementary Fig. 12c. Here, we consider a circuit of medium size and set λ to two. Suppose the largest logical degree is three. Therefore, 𝑄 is selected as the starting point since it is the only physical qubit that has the same physical node degree as the largest logical degree. It has three neighbour qubits: 𝑄 , 𝑄 , and 𝑄 . 𝑄 is merged into the sub-partition because it has the highest fidelity degree among the neighbour qubits. The sub-partition becomes { 𝑄 , 𝑄 }. As the fidelity degree of 𝑄 is larger than that of 𝑄 , the algorithm again selects the remaining neighbour qubit of 𝑄 with the largest fidelity degree, which is 𝑄 . The sub-partition becomes { 𝑄 , 𝑄 , 𝑄 }. 𝑄 is still the qubit with the largest fidelity degree in the current sub-partition, so its neighbour qubit 𝑄 is merged. The final sub-partition is { 𝑄 , 𝑄 , 𝑄 , 𝑄 } and it can be considered as a partition candidate. The merging process is shown in Supplementary Fig. 12b.

The pseudo-code of QHSP is shown in Algorithm 2.

Let $n$ be the number of hardware qubits and $k$ the number of qubits in the circuit to be allocated in a partition. The GSP algorithm selects all the combinations of $k$-qubit subgraphs from the $n$-qubit hardware and takes $O(C(n, k))$ time, which is $O\left(\binom{n}{k}\right)$.
For each subgraph, it computes its fidelity score, including calculating the longest shortest path, which scales at $O(k^3)$. It ends up being equivalent to $O(k^3 \min(n^k, n^{n-k}))$. In most cases, the number of circuit qubits is less than the number of hardware qubits, thus the time complexity becomes $O(k^3 n^k)$. It increases exponentially as the number of qubits of the circuit grows.

Algorithm 2: QHSP algorithm
input: Quantum circuit QC, Coupling graph G, Calibration data C, Crosstalk properties crosstalk_props, Used_qubits q_used, Starting points starting_points
output: A list of candidate partitions sub_graph_list
begin
    circ_qubit_num ← QC.qubit_num;
    Set sub_graph_list to empty list;
    for i ∈ starting_points do
        Set sub_graph to empty list;
        qubit_num ← 0;
        while qubit_num < circ_qubit_num do
            if sub_graph is empty then
                sub_graph.append(i);
                qubit_num ← qubit_num + 1;
                continue;
            end
            best_qubit ← find_best_qubit(sub_graph, G, C);
            if best_qubit ≠ None then
                sub_graph.append(best_qubit);
                qubit_num ← qubit_num + 1;
                continue;
            end
        end
        if len(sub_graph) = circ_qubit_num then
            if q_used is empty then
                sub_graph.Set_Partition_Error(G, C, QC);
                sub_graph_list.append(sub_graph);
            end
            if no qubit in sub_graph is in q_used then
                crosstalk_pairs ← Find_Crosstalk_pairs(sub_graph, crosstalk_props, q_used);
                sub_graph.Set_Partition_Error(G, C, QC, crosstalk_pairs);
                sub_graph_list.append(sub_graph);
            end
        end
    end
    return sub_graph_list;
end

Figure 12: Example of qubit partition on IBM Q 5 Valencia for a four-qubit circuit using QHSP. Suppose the largest logical degree of the target circuit is three. (a) Calibration data of IBM Q 5 Valencia. The value inside the node represents the readout error rate (in %), and the value above the link represents the CNOT error rate (in %). (b) Process of constructing a partition candidate using QHSP. (c) The physical node degree and the fidelity degree of each qubit calculated by Eq. 2.

The QHSP algorithm starts by collecting a list of $m$ starting points where $m \le n$. To get the starting points, we sort the $n$ physical qubits by their physical node degree, which takes $O(n \log n)$. Then, we iterate over all the gates of the circuit (e.g. the circuit has $g$ gates) and sort the $k$ logical qubits according to the logical node degree, which takes $O(g + k \log k)$. Next, for each starting point, it iteratively merges the best neighbour qubit until each sub-partition contains $k$ qubits. To find the best neighbour qubit, the algorithm finds the best qubit in a sub-partition and traverses all its neighbours to select the one with the highest fidelity degree. Finding the best qubit in the sub-partition is $O(p)$, where $p$ is the number of qubits in a sub-partition. The average number of qubits $p$ is $k/2$, so this process takes $O(k)$ time on average. Finding the best neighbour qubit is $O(1)$ because of the nearest-neighbor connectivity of superconducting devices. Since each of the $m$ starting points requires $k$ merges of $O(k)$ each, QHSP overall takes $O(mk^2 + n \log n + g + k \log k)$ time, and it can be truncated to $O(mk^2 + n \log n + g)$, which is polynomial.

In this note, we present the pseudo-code of our simultaneous mapping transition algorithm (see Algorithm 3).

In this note, we demonstrate the exact experimental results whenexecuting a different number of circuits on the two devices, IBM Q27 Toronto and IBM Q 65 Manhattan, at the same time.

REFERENCES [1] 27-qubit backend: IBM Q team, "IBM Q 27 toronto backend specification V1.0.7,"(2020). Retrieved from https://quantum-computing.ibm.com.[2] 65-qubit backend: IBM Q team, "IBM Q 65 manhattan backend specificationV1.0.5," (2020). Retrieved from https://quantum-computing.ibm.com.[3] Github repository of the hardware-aware multi-programming approach. https://github.com/peachnuts/Multiprogramming.[4] Ali J Abhari, Arvin Faruque, Mohammad J Dousti, Lukas Svec, Oana Catu, AmlanChakrabati, Chen-Fu Chiang, Seth Vanderwilt, John Black, and Fred Chong.Scaffold: Quantum programming language. Technical report, Princeton Univ NJDept of Computer Science, 2012.[5] Abdullah Ash-Saki, Mahabubul Alam, and Swaroop Ghosh. Analysis of crosstalkin nisq devices and security implications in multi-programming regime. In

Algorithm 3:

Simultaneous mapping transition algorithm input :

Circuits

𝐷𝐴𝐺𝑠 , Coupling graph 𝐺 , Distancematrices 𝐷𝑠 , Initial mapping 𝜋 𝑖 , First layers 𝐹𝑠 output : Final schedule schedule begin 𝜋 𝑐 ← 𝜋 𝑖 ; while not all gates are executed do Set swap_bridge_lists to empty list; for 𝐹 𝑖 in 𝐹𝑠 do for gate in 𝐹 𝑖 do if gate is hardware-compliant then schedule . append ( gate ); Remove gate from 𝐹 𝑖 ; end end if 𝐹 𝑖 is not empty then swap_bridge_candidate_list ← FindSwapBridgePairs ( 𝐹 𝑖 , 𝐺 ); swap_bridge_lists . append ( swap_bridge_candidate_list ); end end for swap_bridge_candidate_list ∈ swap_bridge_lists do for 𝑔 tmp ∈ swap_bridge_candidate_list do 𝜋 tmp ← Map_Update ( 𝑔 tmp , 𝜋 𝑐 ); 𝐻 basic ← for gate ∈ 𝐹 𝑖 do 𝐻 basic ← 𝐻 basic + 𝐷 𝑖 ( gate , 𝜋 tmp ) end 𝐻 tentative ← 𝑔 tmp . 𝑐𝑜𝑠𝑡 ( 𝐺 , 𝐷 𝑖 , 𝜋 tmp ); Update the extended layer 𝐸 ; 𝐻 extend ← for gate ∈ 𝐸 do 𝐻 extend ← 𝐻 extend + 𝐷 𝑖 ( gate , 𝜋 tmp ); end 𝐻 ← | 𝐹 + 𝑁 tent | ( 𝐻 basic + 𝐻 tentative ) + 𝑊 | 𝐸 | 𝐻 extend end Choose the best gate 𝑔 𝑛 ; 𝜋 𝑐 ← Map_Update ( 𝑔 𝑛 , 𝜋 𝑐 ); end Update the First layers; end return schedule end Proceedings of the ACM/IEEE International Symposium on Low Power Electronicsand Design , pages 25–30, 2020.[6] Abdullah Ash-Saki, Mahabubul Alam, and Swaroop Ghosh. Experimental char-acterization, modeling, and analysis of crosstalk in a quantum computer.

IEEETransactions on Quantum Engineering , 2020.12 nabling multi-programming mechanism for quantum computing in the NISQ era

Table 2: Comparison of fidelity when executing two circuits simultaneously on IBM Q 27 Toronto.

Benchmarks Independent Correlated ComparisonID HA PHA CDAP QHSP GSP ∆

PST %ID1 ID2 PST1 PST2 Avg PST1 PST2 Avg PST1 PST2 Avg PST1 PST2 Avg t PST1 PST2 Avg t Indp. Corr.1 1 0.571 0.558 0.565 0.686 0.676 0.681 0.597 0.506 0.552 0.675 0.641 0.658 0.009 0.641 0.682 0.662 0.4 20.6 19.31 2 0.334 0.75 0.542 0.661 0.789 0.725 0.522 0.585 0.554 0.69 0.789 0.74 0.012 0.69 0.789 0.74 7.4 33.8 33.61 3 0.547 0.412 0.48 0.687 0.591 0.639 0.616 0.487 0.552 0.619 0.552 0.586 0.007 0.619 0.552 0.586 7.4 100 6.21 4 0.476 0.45 0.463 0.574 0.642 0.608 0.562 0.158 0.36 0.626 0.647 0.637 0.016 0.626 0.647 0.637 7.4 31.3 76.81 5 0.495 0.445 0.47 0.673 0.582 0.628 0.561 0.437 0.499 0.647 0.511 0.579 0.012 0.647 0.511 0.579 1.6 33.5 162 2 0.647 0.53 0.589 0.78 0.775 0.778 0.567 0.426 0.5 0.808 0.591 0.7 0.006 0.808 0.591 0.7 14.4 32.1 40.92 3 0.428 0.304 0.366 0.787 0.626 0.707 0.635 0.602 0.619 0.764 0.529 0.647 0.013 0.764 0.529 0.647 15 93 4.52 4 0.561 0.607 0.584 0.791 0.645 0.718 0.483 0.431 0.457 0.788 0.467 0.628 0.008 0.788 0.467 0.628 14.7 23 37.32 5 0.573 0.311 0.442 0.796 0.568 0.682 0.534 0.506 0.52 0.774 0.531 0.653 0.006 0.774 0.531 0.653 8.7 54.3 25.5

Avg : average of PSTs. t : runtime in seconds of the partition process. ∆ PST : comparison of average fidelity.

Table 3: Comparison of number of additional gates when executing two circuits simultaneously on IBM Q 27 Toronto.

Benchmarks Independent Correlated ComparisonID HA PHA CDAP QHSP ∆ g %ID1 ID2 g g Sum g g Sum g g

Indp. Corr.1 1 12 12 24 12 12 24 42 24 0 42.91 2 12 9 21 12 6 18 42 18 14.3 57.11 3 12 15 27 12 15 27 57 27 0 52.61 4 12 24 36 12 24 36 48 33 0 31.31 5 12 18 30 12 18 30 60 30 0 502 2 6 12 18 6 6 12 42 15 33.3 64.32 3 9 15 24 6 15 21 51 18 12.5 64.72 4 9 24 33 6 21 27 54 27 18.2 502 5 6 18 24 6 18 24 57 24 0 57.9 g : number of additional gates. Sum : sum of number of additional gates. ∆ g : comparison of sum of number of additional gates. Table 4: Comparison of fidelity when executing three circuits simultaneously on IBM Q 65 Manhattan.

Benchmarks Independent Correlated ComparisonID HA PHA CDAP QHSP ∆

PST %ID1 ID2 ID2 PST1 PST2 PST3 Avg PST1 PST2 PST3 Avg PST1 PST2 PST3 Avg PST1 PST2 PST3 Avg t Indp. Corr.1 2 3 0.61 0.566 0.624 0.6 0.651 0.624 0.555 0.61 0.566 0.57 0.177 0.438 0.609 0.526 0.714 0.616 0.047 1.7 40.81 2 4 0.521 0.683 0.289 0.5 0.637 0.703 0.48 0.607 0.163 0.624 0.131 0.306 0.559 0.708 0.531 0.599 0.048 21.9 95.91 2 5 0.627 0.725 0.368 0.573 0.623 0.653 0.487 0.588 0.15 0.466 0.233 0.283 0.609 0.592 0.528 0.576 0.047 2.5 103.72 3 4 0.644 0.434 0.389 0.489 0.631 0.566 0.544 0.58 0.547 0.156 0.211 0.305 0.633 0.565 0.498 0.565 0.04 18.7 85.62 3 5 0.689 0.617 0.488 0.598 0.585 0.542 0.486 0.538 0.548 0.276 0.237 0.354 0.7 0.528 0.34 0.523 0.04 -10 47.8

Avg : average of PSTs. t : runtime in seconds of the partition process. ∆ PST : comparison of average fidelity.

Table 5: Comparison of number of additional gates when executing three circuits simultaneously on IBM Q 65 Manhattan.

| ID1 | ID2 | ID3 | HA g1 | HA g2 | HA g3 | HA Sum | PHA g1 | PHA g2 | PHA g3 | PHA Sum | CDAP g | QHSP g | ∆g Indp. (%) | ∆g Corr. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 12 | 12 | 12 | 36 | 12 | 6 | 12 | 30 | 75 | 30 | 16.7 | 60 |
| 1 | 2 | 4 | 12 | 9 | 21 | 42 | 12 | 6 | 18 | 36 | 69 | 36 | 14.3 | 47.8 |
| 1 | 2 | 5 | 12 | 9 | 18 | 39 | 12 | 6 | 18 | 36 | 78 | 36 | 7.7 | 53.8 |
| 2 | 3 | 4 | 9 | 15 | 18 | 42 | 6 | 12 | 18 | 36 | 84 | 39 | 14.3 | 53.6 |
| 2 | 3 | 5 | 9 | 15 | 18 | 42 | 9 | 12 | 18 | 39 | 93 | 36 | 7.1 | 61.3 |

HA and PHA are the independent methods; CDAP and QHSP are the correlated methods. g: number of additional gates. Sum: sum of the number of additional gates. ∆g: comparison of the sum of the number of additional gates.

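Table 6: Comparison of fidelity when executing four circuits simultaneously on IBM Q 65 Manhattan.

| ID1 | ID2 | ID3 | ID4 | HA PST1 | HA PST2 | HA PST3 | HA PST4 | HA Avg | PHA PST1 | PHA PST2 | PHA PST3 | PHA PST4 | PHA Avg | CDAP PST1 | CDAP PST2 | CDAP PST3 | CDAP PST4 | CDAP Avg | QHSP PST1 | QHSP PST2 | QHSP PST3 | QHSP PST4 | QHSP Avg | t (s) | ∆PST Indp. (%) | ∆PST Corr. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 0.512 | 0.622 | 0.486 | 0.35 | 0.493 | 0.588 | 0.644 | 0.572 | 0.443 | 0.562 | 0.145 | 0.625 | 0.383 | 0.283 | 0.359 | 0.443 | 0.747 | 0.542 | 0.443 | 0.544 | 0.06 | 14.1 | 51.5 |
| 1 | 2 | 3 | 5 | 0.44 | 0.644 | 0.608 | 0.203 | 0.474 | 0.648 | 0.638 | 0.561 | 0.491 | 0.585 | 0.157 | 0.619 | 0.511 | 0.475 | 0.441 | 0.612 | 0.645 | 0.581 | 0.373 | 0.553 | 0.058 | 23.4 | 25.5 |
| 1 | 3 | 4 | 5 | 0.6 | 0.542 | 0.228 | 0.289 | 0.415 | 0.592 | 0.504 | 0.497 | 0.404 | 0.499 | 0.123 | 0.608 | 0.468 | 0.145 | 0.336 | 0.557 | 0.53 | 0.32 | 0.426 | 0.458 | 0.058 | 20.4 | 36.4 |
| 2 | 3 | 4 | 5 | 0.643 | 0.544 | 0.287 | 0.278 | 0.438 | 0.699 | 0.53 | 0.525 | 0.465 | 0.555 | 0.271 | 0.489 | 0.154 | 0.138 | 0.263 | 0.691 | 0.477 | 0.492 | 0.369 | 0.507 | 0.048 | 26.7 | 92.9 |

HA and PHA are the independent methods; CDAP and QHSP are the correlated methods. Avg: average of PSTs. t: runtime in seconds of the partition process. ∆PST: comparison of average fidelity.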

Table 7: Comparison of number of additional gates when executing four circuits simultaneously on IBM Q 65 Manhattan.

| ID1 | ID2 | ID3 | ID4 | HA g1 | HA g2 | HA g3 | HA g4 | HA Sum | PHA g1 | PHA g2 | PHA g3 | PHA g4 | PHA Sum | CDAP g | QHSP g | ∆g Indp. (%) | ∆g Corr. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 12 | 9 | 15 | 24 | 60 | 12 | 9 | 15 | 18 | 54 | 102 | 51 | 10 | 50 |
| 1 | 2 | 3 | 5 | 12 | 9 | 15 | 12 | 48 | 12 | 6 | 12 | 18 | 48 | 114 | 51 | 0 | 55.3 |
| 1 | 3 | 4 | 5 | 12 | 15 | 18 | 18 | 63 | 12 | 12 | 18 | 18 | 60 | 129 | 60 | 4.8 | 53.5 |
| 2 | 3 | 4 | 5 | 6 | 15 | 21 | 18 | 60 | 6 | 15 | 18 | 18 | 57 | 126 | 54 | 5 | 57.1 |

HA and PHA are the independent methods; CDAP and QHSP are the correlated methods. g: number of additional gates. Sum: sum of the number of additional gates. ∆g: comparison of the sum of the number of additional gates.

This work is funded by the QuantUM Initiative of the Region Occitanie, University of Montpellier and IBM Montpellier. The authors would like to thank Xinglei Dou and Lei Liu for the meaningful discussions and exchanges. The authors are very grateful to Adrien Suau for the helpful suggestions and feedback on an early version of this manuscript. We acknowledge use of the IBM Q for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team.

S.N. and A.T.S. contributed equally to this work. A.T.S. proposed the problem formalism. S.N. implemented the algorithms and wrote the paper. A.T.S. revised the paper. Both authors reviewed and discussed the analyses and results of the work.

The authors declare no competing interests.

10 ADDITIONAL INFORMATION

Correspondence and requests for materials should be addressed to S.N.