A Hardware-Aware Heuristic for the Qubit Mapping Problem in the NISQ Era

Abstract

Due to several physical limitations in the realisation of quantum hardware, today's quantum computers are qualified as Noisy Intermediate-Scale Quantum (NISQ) hardware. NISQ hardware is characterized by a small number of qubits (50 to a few hundred) and noisy operations. Moreover, current realisations of superconducting quantum chips do not have the ideal all-to-all connectivity between qubits but rather at most a nearest-neighbour connectivity. All these hardware restrictions add supplementary low-level requirements. They need to be addressed before submitting the quantum circuit to an actual chip. Satisfying these requirements is a tedious task for the programmer. Instead, the task of adapting the quantum circuit to a given hardware is left to the compiler. In this paper, we propose a Hardware-Aware mapping transition algorithm (HA) that takes the calibration data into account with the aim to improve the overall fidelity of the circuit. Evaluation results on IBM quantum hardware show that our HA approach can outperform the state of the art both in terms of the number of additional gates and circuit fidelity.


Siyuan NIU
LIRMM, Univ Montpellier, Montpellier

Adrien Suau
LIRMM, Univ Montpellier, Montpellier, France
CERFACS, Toulouse

Gabriel Staffelbach
CERFACS, Toulouse

Aida Todri-Sanial
LIRMM, CNRS, Montpellier


In recent years, quantum computing has become a very active field of research. It promises to solve classically intractable computational problems such as integer factorisation [36], quantum chemistry [9], linear algebra [7, 18, 21, 22, 35, 43], or optimisation [15, 24, 25]. Along with algorithms, quantum hardware has attracted the attention of several companies such as IBM, Google, Intel, or Rigetti that have demonstrated quantum chips with 53, 72, 49, and 28 qubits respectively. IBM and Rigetti have also given access to a cloud quantum computing service on which anyone can submit quantum circuits to real quantum hardware. The aforementioned quantum hardware can already be qualified as NISQ hardware [33]. Still, none of them is fault-tolerant as quantum error correction codes (QECC) are in their infancy. Nevertheless, it is believed that even a noisy quantum chip with limited qubit-to-qubit connectivity can be used to solve some classically intractable problems, one of the most promising candidates being quantum chemistry [9].

From the algorithm perspective, a new paradigm for quantum algorithms has emerged to take into account the limitations of NISQ hardware – variational algorithms. Examples of variational algorithms include the Variational Quantum Eigensolver (VQE) [32], the Variational Quantum Linear Solver (VQLS) [7, 22], or the Quantum Approximate Optimisation Algorithm (QAOA) [15]. However, there is a difference between the quantum program written by the programmer and what can be executed on the current quantum hardware. Quantum programs are written as if they were running on ideal quantum hardware without any noise or physical constraints. But real quantum chips are not ideal – for example, for the superconducting devices targeted in this paper, current two-qubit gates can at best only be applied between two neighbouring qubits. If we want to perform quantum computations, our quantum circuits must obey such connectivity constraints, which means that a modification of the quantum program is necessary to adapt it to the real quantum device. This problem of adapting a quantum program to given hardware connectivity is called the qubit mapping problem and is the focus of this paper.

The qubit mapping problem can be reformulated as two sub-problems. First, to find an initial mapping, i.e. a mapping from the "logical qubits" (the qubits of a quantum circuit) to the "physical qubits" (the qubits of a quantum chip). Secondly, to determine a mapping transition algorithm to identify the quantum gates to insert in a quantum circuit such that it complies with the targeted quantum hardware topology. Finding the optimal solution for the qubit mapping problem is likely to be an NP-complete problem as noted in [38].

Two types of methods have been used to solve the qubit mapping problem. The first method is to reformulate it as a mathematically equivalent problem that can then be solved using a specialised solver. Such mathematical formalism can be Integer Linear Programming (ILP) [4, 5, 13, 27], Satisfiability Modulo Theory (SMT) [30, 31], or even Constraint Programming (CP) [6, 40]. However, these mathematical approaches suffer from long runtime and are difficult to scale up. The second method is to use heuristics to modify the quantum circuit, starting from the first quantum gate and transforming the circuit sequentially by making each gate, one after the other, hardware-compliant.

Most of the previous works [2, 26, 34, 37, 42] using the second method only target nearest-neighbour connectivity and cannot be directly applied to actual quantum architectures with nonuniform connections.
Recently, publications [10, 19, 23, 29, 38, 44–46] that are not restricted to a specific architecture have been released. The algorithm presented in [46] uses a heuristic to find the best permutation at each step of the mapping procedure. Instead of representing a quantum circuit as a fixed sequence of layers, [20] introduces the Directed Acyclic Graph (DAG) that takes into account the dependency and commutativity of quantum gates. A major improvement has been shown by [29], which uses a "forward-backward-forward" mapping algorithm. Moreover, the "look-ahead" strategy has been introduced in the heuristic cost function for further optimisation in some existing works, notably [23, 29, 44–46]. For qubit movement, most of these methods only use the SWAP gate. A notable exception is [23], which considered both the Bridge and the SWAP gate. Moreover, most of these works aim to minimise the number of inserted gates and do not consider the noise impact on different qubits.

In [3, 5, 16, 30, 39], calibration data is exploited and the applied approach is to insert additional gates between strongly linked qubits, i.e. qubits linked with a low two-qubit gate error rate. However, these works either do not take a holistic view of the problem, for example by exploring the initial mapping, or use a heuristic function that is not efficient enough to select the best candidate gate to insert.

In this work, we follow the second type of methods, which consists in developing a heuristic to choose the best SWAP to insert based on calibration data. We propose a Hardware-Aware (HA) heuristic mapping transition algorithm to address the drawbacks mentioned above. Our main contributions can be listed as follows. First, we present a mapping transition algorithm that takes into account the hardware topology and the calibration data to improve the overall output state fidelity and reduce the total execution time. Second, to reduce the number of additional gates required to map the quantum circuit to the quantum chip, our algorithm can select between a SWAP or Bridge gate. Finally, we run our HA algorithm on real quantum hardware and compare with various mapping methods from the literature.

Here, we introduce the state of the art on quantum hardware devices and their constraints, focusing mainly on IBM quantum devices. Then, we explain the qubit mapping problem. Finally, a small motivational example is shown to illustrate the gist of our algorithm. The notations used in this paper are summarised in Table 1 (we reference some notations from [29]).

NISQ hardware is characterized in [33] as quantum hardware having from 50 to a few hundred noisy qubits on which one can only perform noisy operations. At the time of writing, several companies have already demonstrated quantum chips that can, according to the definition, be qualified as NISQ chips. For example, IBM announced the latest 53-qubit quantum chip and gave access to the community to execute quantum circuits. Other companies like Google (72 qubits) or Intel (49 qubits) announced quantum chips that could be qualified as NISQ but did not provide any information about their characteristics. One of the significant challenges of these quantum chips that limits them from solving real-world problems is their level of noise – even if these chips have enough qubits theoretically to show a quantum speedup, the fidelity of their quantum operations is still too low to obtain any advantage over the classical computer on real-world problems. In this paper, we mainly focus on IBM architectures. Other hardware such as Google's Sycamore [14] or Rigetti's Aspen-7 is not specially targeted. Still, the proposed algorithm and methods are general enough to be applicable to any quantum chip that uses the quantum-gate model of computation and so should be applicable to this hardware.

Table 1: Notations used in this paper

Notation    Definition
q           logical qubits for quantum circuit
Q           physical qubits for quantum device
g           quantum gate
g.q_n       n-th logical qubit the quantum gate g is applied on
G           hardware coupling graph
D           distance matrix
S           swap matrix
G_ℰ         hardware graph with swap error rates as weights
ℰ           swap error matrix
G_T         hardware graph with swap execution time as weights
T           swap execution time matrix
H           heuristic cost function
π           mapping from q to Q
F           first layer
E           extended layer
W           weight parameter

Figure 1: ibmq_almaden topology. Qubits are represented as circles labelled with their index. A connection between two qubits is represented by an edge between the two qubits.

Fig. 1 shows the topology, also called coupling graph, of IBM Quantum's ibmq_almaden, a 20-qubit system. Each vertex represents a qubit and the edge represents the coupling interconnect between two qubits. Table 2 shows the calibration data that are extracted from [1]. It includes CNOT error rates, single qubit error rates, energy relaxation and decoherence characteristic times T1 and T2, and execution time (gate length). The calibration data show that the error of two-qubit gates is one order of magnitude higher than that of their one-qubit counterparts. This is also the case for gate execution times – two-qubit gates are approximately an order of magnitude slower than one-qubit gates. For simplicity and because of the relatively low error rates and execution times of one-qubit gates when compared to two-qubit gates, we focus on two-qubit gates in this paper.

Table 2: ibmq_almaden characteristics

Qubit number              20
Single qubit error rate   2 . e − to 9 . e −
CNOT error rate           8 . e − to 3 . e −
id gate length            35.56 ns
u1 gate length            0 ns
u2 gate length            35.56 ns
u3 gate length            71.11 ns
CNOT gate length          248.88 ns to 860.44 ns
T1                        34 . µs to 139 . µs
T2                        12 . µs to 200 . µs

Note that the exact hardware characteristics are not constant and change at each re-calibration of the chip.

Moreover, it is important to note that not all the interconnects between qubits are equal with respect to CNOT gate error rate or execution time. Taking ibmq_almaden as an example, the best CNOT gate has an error rate 4.18 times lower than the worst CNOT and the maximum execution time is 3.46 times longer than the minimum one. Therefore, we cannot treat each qubit equally, and we need to consider the interconnect topology between qubits as well as their error rate. CNOT gates can be applied in either direction by conjugating with H gates. As we do not consider one-qubit gates in this study, we do not have to consider the connectivity direction.

Following the abstraction first introduced in classical computing decades ago, most quantum circuits are described in a generic manner that does not take into account all the physical hardware constraints. Many of the currently existing frameworks for quantum algorithm development encourage this way of development by giving access to a broad set of "primitive" gates. For example, the Qiskit library allows the developer to choose from more than 30 primitive gates, whereas the IBM quantum chips only provide four physical hardware gates (five if we take the identity gate into account). However, any gate can also be implemented with OpenPulse [8], which is a low-level hardware control that lets users generate their own gates to mitigate errors. Such an abstraction relieves the developer of the burden of adapting the code to specific hardware and transfers it to the compiler, whose role is to transform an abstracted code into the most efficient hardware code possible. To do so, a compiler for quantum programs should perform several steps summarised in the following paragraphs.

The first step is to decompose the abstracted quantum gates into hardware gates. The hardware gates available strongly depend on the quantum hardware we are compiling for, but generally comprise a two-qubit "entangler" gate (controlled-X gate for IBM hardware, fSim gate for Google hardware) and several one-qubit gates (u1, u2, u3, and id gates for IBM hardware). At the end of this step, the quantum circuit has been modified to only contain quantum gates that are directly implemented in the quantum hardware.

But translating all abstract gates to hardware gates is generally not enough to make the quantum circuit executable on the specific hardware – the hardware topology is rarely respected at the end of this first step and the circuit requires a second step with further modifications. Such modification of the quantum circuit to make it compliant with the hardware topology is often done by inserting SWAP gates before non-executable two-qubit gates. Note that on current hardware, only two-qubit gates are restricted by the hardware topology.

Finally, once the quantum circuit is executable on the specified quantum hardware, a third step is performed to optimise the quantum circuit. Depending on the figure of interest, the optimisation can aim at reducing the execution time or the gate count, increasing the final state fidelity, or even reducing the number of qubits needed.

The qubit mapping problem is defined as the second compilation step that modifies the quantum circuit to contain only two-qubit gates that fit into the hardware topology. But in practice qubit mapping algorithms also try to consider the third step, which consists in optimising the generated quantum circuit according to a chosen figure of merit.

Fig. 2a shows a small quantum circuit, which is composed of three CNOT gates and one X gate. It is mapped to a 5-qubit IBM quantum device called ibmq_valencia, shown in Fig. 2b. For simplicity, the initial mapping is allocated linearly as { 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 }. Gates 𝑔 and 𝑔 comply with the hardware topology (i.e. coupling constraints) and can be executed directly. However, 𝑔 is applied to two non-connected qubits. Therefore, a movement (i.e. a SWAP gate, shown in Fig. 3) of logical qubits is needed before being able to execute 𝑔 on the hardware connection between 𝑞 and 𝑞 . Referring to the coupling graph in Fig. 2b, three SWAP gates are possible: { 𝑞 , 𝑞 }, { 𝑞 , 𝑞 } and { 𝑞 , 𝑞 }. Among these possible SWAPs, two of them change the current mapping between logical and physical qubits in such a way that the CNOT gate between 𝑞 and 𝑞 becomes executable – swapping of { 𝑞 , 𝑞 } and { 𝑞 , 𝑞 }. Translating the logical qubits to their physical counterparts, the SWAPs { 𝑄 , 𝑄 } and { 𝑄 , 𝑄 } are our candidates. At this step, most of the state-of-the-art algorithms consider the two possible SWAPs to be equal and will randomly select one. However, if the calibration data is considered, the SWAP between { 𝑄 , 𝑄 } is less noisy than the other (the error rates of the two interconnects are shown in Fig. 2b). A SWAP operation consists of three CNOTs and we want to insert a SWAP gate with the least noise. Thus, the SWAP gate between { 𝑞 , 𝑞 } is inserted and the final mapping is { 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 , 𝑞 → 𝑄 }. The updated circuit is shown in Fig. 2c.

Figure 2: A motivational example for the qubit mapping problem. (a) Original circuit. (b) ibmq_valencia. (c) Updated circuit.
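The choice made in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link error rates below are hypothetical placeholders (the values of Fig. 2b are not reproduced here), each link is given a single symmetric CNOT error rate, errors are treated as independent, and the helper names `swap_error` and `least_noisy_swap` are introduced for this sketch only.

```python
# Hypothetical CNOT error rates per physical link (placeholders, not the
# calibration values of Fig. 2b). Links are stored as unordered pairs.
cnot_error = {
    frozenset({0, 1}): 0.012,
    frozenset({1, 2}): 0.009,
    frozenset({1, 3}): 0.020,
    frozenset({3, 4}): 0.015,
}


def swap_error(link, errors):
    """Error rate of a SWAP made of three CNOTs on the same link.

    Assumes independent errors and one symmetric CNOT error rate per link.
    """
    success = (1.0 - errors[frozenset(link)]) ** 3
    return 1.0 - success


def least_noisy_swap(candidates, errors):
    """Return the candidate link whose SWAP has the lowest error rate."""
    return min(candidates, key=lambda link: swap_error(link, errors))


# Two candidate SWAPs that would both make the blocked CNOT executable.
candidates = [(1, 2), (1, 3)]
best = least_noisy_swap(candidates, cnot_error)
print(best, round(swap_error(best, cnot_error), 6))
```

A hardware-agnostic heuristic would treat both candidates as equivalent; with calibration data, the candidate on the lower-error link is preferred.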

We are inspired by the SABRE algorithm presented in [29], which is a SWAP-based heuristic algorithm to reduce the number of additional CNOT gates. We propose a Hardware-Aware SWAP and Bridge based heuristic search algorithm. Compared to the SABRE algorithm, which aims at reducing the number of additional gates, we improve the circuit fidelity as well as reduce the number of additional gates by introducing a new distance matrix that takes into account both the hardware connectivity and the calibration data. Moreover, SABRE only uses the SWAP gate when a qubit movement is needed, whereas our algorithm decides between a SWAP and a Bridge for qubit movement to further reduce the number of additional gates. Finally, we also develop an initial mapping algorithm called Hardware-aware Simulated Annealing (HSA) in order to evaluate the mapping transition algorithm of different flavours.

The compiler takes as input a quantum program written in the OpenQASM language [11] and the calibration data of a specific IBM quantum device. During the compilation process, it considers the hardware constraints such as hardware topology and gate availability. Then, the qubit mapping algorithm is applied. It contains two principal parts – initial mapping and mapping transition algorithm. In the mapping transition step, some optimisations are done to generate a circuit with a better performance in terms of final state fidelity. The source code is publicly available at https://github.com/peachnuts/HA.

We start by explaining our HA algorithm in subsection 3.1. In subsection 3.2, we describe the hardware-aware simulated annealing (HSA) method for initial mapping. Finally, subsection 3.3 presents the metrics used to evaluate our algorithm.

The first step of the algorithm is to process the input quantum circuit in order to reformulate it in a more convenient data format. Starting from the input quantum circuit, we can obtain a Directed Acyclic Graph (DAG) circuit which represents the operation dependencies in the quantum circuit without considering the hardware constraints. The DAG is constructed such that quantum gates are represented by the graph nodes and the directed edge (𝑖, 𝑗) between nodes 𝑖 and 𝑗 represents a dependency from gate 𝑖 to 𝑗, i.e. gate 𝑖 should be executed before 𝑗.

Once the DAG is constructed, graph nodes (i.e. quantum gates) can be ordered according to the gate dependencies – for example, if gate 𝑗 depends on gate 𝑖, then gate 𝑖 will be ordered before gate 𝑗. One possible ordering that fulfils this property of dependency is the well-known topological ordering. Note that depending on the quantum circuit, this ordering might not be unique.

Figure 3: SWAP gate

Quantum gates can then be divided into three groups: the executed gates, the executable gates, and the future gates still to be executed. Executed gates are quantum gates that have already been mapped by the algorithm. Executable gates constitute the first layer, denoted 𝐹. A gate is considered executable when all the gates it depends on are in the executed gates group. Finally, the future gates are the rest of the gates (not yet executed nor executable). These gates are included in the extended layer, 𝐸. An illustration of layers 𝐸 and 𝐹 is shown in Fig. 6. A heuristic cost function 𝐻 is introduced to estimate the cost of each possible (i.e. executable) swap pair at a given step of the iterative algorithm. Its objective is to quantify the quality of the possible swap pairs according to the distance considered and to select the best swap pair.

When inserting a SWAP gate, the circuit is divided into two layers: the first layer 𝐹 and the extended layer 𝐸.
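The layer bookkeeping described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: gates are assumed to be given as tuples of logical qubit indices in program order, a gate depends only on the previous gate acting on each of its qubits (gate commutativity is ignored), and the names `build_dependencies` and `first_layer` are hypothetical.

```python
def build_dependencies(gates):
    """Return preds, where preds[j] is the set of gates that gate j depends on.

    gates is a list of tuples of logical qubit indices in program order.
    Gate j depends on the most recent earlier gate touching each of its qubits.
    """
    preds = {j: set() for j in range(len(gates))}
    last_on_qubit = {}
    for j, qubits in enumerate(gates):
        for q in qubits:
            if q in last_on_qubit:
                preds[j].add(last_on_qubit[q])
            last_on_qubit[q] = j
    return preds


def first_layer(gates, preds, executed):
    """Gates not yet executed whose dependencies have all been executed."""
    return [
        j
        for j in range(len(gates))
        if j not in executed and preds[j] <= executed
    ]


# Five two-qubit gates on five logical qubits (an arbitrary example circuit).
gates = [(0, 1), (2, 3), (1, 2), (0, 1), (3, 4)]
preds = build_dependencies(gates)

print(first_layer(gates, preds, executed=set()))      # gates with no dependency
print(first_layer(gates, preds, executed={0, 1}))     # after executing gates 0 and 1
```

Every gate that is neither executed nor in the list returned by `first_layer` would belong to the extended layer 𝐸 (possibly truncated to a fixed size).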
Note that inserting a SWAP gate will not only influence the gates in the first layer 𝐹 but also the gates in the extended layer 𝐸. The approach of considering the swap pair's impact on the extended layer is referred to as the look-ahead ability. It can contribute to a better selection and depends on the size of the extended layer.

We devise several metrics that can be used to estimate the cost of a swap pair in HA. We consider three different distance matrices – swap matrix 𝑆, swap error matrix ℰ and swap execution time matrix 𝑇. Because 𝑆, ℰ, and 𝑇 contain entries with incompatible units and different scales, we update 𝑇 to make it dimensionless and each matrix is normalised. Moreover, we introduce weights (𝛼_1, 𝛼_2, and 𝛼_3 for 𝑆, ℰ, and 𝑇, respectively) to allow choosing the importance of each parameter in terms of number of SWAPs, gate error and execution time.

Matrix 𝑆 is constructed such that the entry (𝑖, 𝑗) stores the distance on the real hardware between qubit 𝑖 and a neighbour of qubit 𝑗, which is also equal to the minimum number of SWAP gates needed to move qubit 𝑖 to qubit 𝑗. The matrix is efficiently constructed by using the Floyd-Warshall algorithm [17].

Matrix ℰ stores in its entry (𝑖, 𝑗) the minimum error rate attainable to move the qubit 𝑖 to a neighbour of qubit 𝑗. The error rate of each possible SWAP is computed based on the calibration data of CNOT gates. The decomposition of a SWAP gate in terms of CNOT gates is shown in Fig. 3.

The success rate of a CNOT between the physical qubits 𝑄_i and 𝑄_j, denoted by 𝑆(𝑄_i, 𝑄_j), is computed from the error rates given in the calibration data. Equation (1) computes the error rate of a SWAP gate between two connected physical qubits 𝑄_i and 𝑄_j while taking into account that the swap operation is symmetric. The final ℰ matrix is constructed by using the Floyd-Warshall algorithm on the graph 𝐺_ℰ with the computed errors as edge weights.

$$G_{\mathcal{E}}(Q_i, Q_j) = 1 - S(Q_i, Q_j) \times S(Q_j, Q_i) \times \max\bigl(S(Q_i, Q_j), S(Q_j, Q_i)\bigr) \tag{1}$$

Matrix 𝑇 is computed, similarly to 𝑆 and ℰ, with the Floyd-Warshall algorithm applied on graph 𝐺_T but by using the SWAP execution time. This execution time is computed with (2), where 𝑡(𝑄_i, 𝑄_j) is the execution time of the CNOT gate with 𝑄_i as control and 𝑄_j as target, extracted from the calibration data.

$$G_T(Q_i, Q_j) = t(Q_i, Q_j) + t(Q_j, Q_i) + \min\bigl(t(Q_i, Q_j), t(Q_j, Q_i)\bigr) \tag{2}$$

The weighted sum of the three matrices forms a new matrix called distance matrix 𝐷 (shown in (3)). The distance matrix represents the "distance" between each pair of qubits in the quantum chip. Here, the "distance" means the combination of swap distance, overall error rate and execution time of the shortest path.

$$D = \alpha_1 \times S + \alpha_2 \times \mathcal{E} + \alpha_3 \times T \tag{3}$$

Inserting a SWAP gate will have an impact on the current mapping 𝜋_c, changing it to 𝜋_temp. We compute the cost of this SWAP on the first layer 𝐹 with the cost function 𝐻_basic shown in (4). A small score means the SWAP has little impact on the first layer gates with respect to the overall distance considered. The swap pair with the minimum score is selected as the best candidate.

$$H_{basic} = \sum_{g \in F} D[\pi_{temp}(g.q_1)][\pi_{temp}(g.q_2)] \tag{4}$$

We also consider the impact of the swap pair on the extended layer 𝐸. The impact of a SWAP on the first layer is prioritised over its impact on the extended layer. As a result, a weight parameter 𝑊 is added to the extended layer cost to scale its impact.
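As an illustration of (1) to (3), the construction of the distance matrix can be sketched in Python. Several points are assumptions of this sketch and not taken from the paper: the normalisation divides each matrix by its largest entry, 𝑆 is approximated by the plain hop count between qubits, the coupling graph is assumed to be connected, the default weights are arbitrary, and the function names are hypothetical.

```python
import math


def floyd_warshall(n, weights):
    """All-pairs shortest paths on an undirected graph with n vertices.

    weights maps a link (i, j) to its non-negative cost.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in weights.items():
        dist[i][j] = dist[j][i] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def normalise(matrix):
    """Divide by the largest entry (an assumed normalisation scheme)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return matrix
    return [[x / peak for x in row] for row in matrix]


def distance_matrix(n, cnot_success, cnot_time, alphas=(0.5, 0.5, 0.0)):
    """Combine hop count, SWAP error and SWAP duration into one matrix D.

    cnot_success and cnot_time map directed pairs (control, target) to the
    CNOT success rate and execution time; both directions must be present.
    """
    links = {tuple(sorted(pair)) for pair in cnot_success}
    hops, err, dur = {}, {}, {}
    for i, j in links:
        s_ij, s_ji = cnot_success[(i, j)], cnot_success[(j, i)]
        t_ij, t_ji = cnot_time[(i, j)], cnot_time[(j, i)]
        hops[(i, j)] = 1.0
        err[(i, j)] = 1.0 - s_ij * s_ji * max(s_ij, s_ji)   # equation (1)
        dur[(i, j)] = t_ij + t_ji + min(t_ij, t_ji)         # equation (2)
    S, E, T = (normalise(floyd_warshall(n, w)) for w in (hops, err, dur))
    a1, a2, a3 = alphas
    return [                                                # equation (3)
        [a1 * S[i][j] + a2 * E[i][j] + a3 * T[i][j] for j in range(n)]
        for i in range(n)
    ]


# Three qubits on a line 0 - 1 - 2 with identical (made-up) calibration data.
success = {(0, 1): 0.99, (1, 0): 0.99, (1, 2): 0.99, (2, 1): 0.99}
duration = {pair: 300.0 for pair in success}
D = distance_matrix(3, success, duration, alphas=(1.0, 1.0, 1.0))
```

Setting one weight to zero switches the corresponding criterion off, which makes it easy to compare a purely topological distance with a calibration-aware one.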
In addition, the impacts on the first layer and extended layer are normalised by dividing them by their respective number of gates. The complete heuristic function including the extended layer 𝐸 with look-ahead ability is shown in (5). Even though (4) and (5) are similar to equations in [29], it is important to note that the distance matrix 𝐷 is different.

$$H = \frac{1}{|F|} \sum_{g \in F} D[\pi_{temp}(g.q_1)][\pi_{temp}(g.q_2)] + W \times \frac{1}{|E|} \sum_{g \in E} D[\pi_{temp}(g.q_1)][\pi_{temp}(g.q_2)] \tag{5}$$

Figure 4: Bridge gate

Another important metric of the HA algorithm is the heuristic cost function that estimates the usefulness of a SWAP. In some situations, even the best SWAP may have a negative impact on the overall circuit. In that case, inspired by [23], our heuristic function decides to insert a Bridge gate instead of a SWAP gate if the topology allows it. The decomposition of the Bridge gate with four CNOTs is shown in Fig. 4. The Bridge gate allows executing a CNOT between two qubits that share a common neighbour. Both SWAP and Bridge gate need three supplementary CNOTs. Note that the Bridge gate can only be used to replace a CNOT if the distance between the control and target qubits (i.e. the minimum number of links between the two qubits) is exactly two.

Fig. 5a shows an example of quantum circuit that is mapped to ibmq_valencia with the topology described in Fig. 2b. The quantum gates 𝑔 and 𝑔 comply with the topology of the chip, but 𝑔 does not. By evaluating the heuristic cost function 𝐻, the SWAP between 𝑞 and 𝑞 is selected. But as shown in Fig. 5b, the chosen SWAP has a negative impact on the extended layer – gate 𝑔 is no longer executable and another SWAP gate is required to execute it.

Such situations can be solved by using a Bridge gate instead of a SWAP gate as shown in Fig. 5c. Since the distance between the control qubit 𝑞 and the target qubit 𝑞 of gate 𝑔 is two, we can insert a Bridge gate instead. Using a Bridge gate allows executing the CNOT gate 𝑔 without changing the current mapping. Moreover, by using a Bridge gate, we only add three CNOTs to map the entire circuit, instead of six (two times more) if only SWAP gates were used.

Once the cost 𝐻 of each swap pair is computed, the heuristic will try to choose the best option between inserting a SWAP or Bridge gate. To do so, it considers two mappings: 𝜋_c, the mapping used before selecting the best swap pair and also the mapping obtained after inserting a Bridge gate, and 𝜋_temp, the new mapping that would be obtained after inserting the best SWAP gate. The overall effect of the SWAP gate on the extended layer 𝐸 is computed according to (6). If the effect of the best SWAP gate is negative, this means that the considered swap pair has an overall negative impact on the extended layer 𝐸. In this case, we consider that it is better to keep the current mapping so, if the hardware topology permits it, a Bridge gate is inserted instead of a SWAP gate.

Figure 5: An example of a quantum circuit showing the difference between SWAP and Bridge transformation. (a) Original circuit. (b) SWAP gate transformation. (c) Bridge gate transformation.
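The claim that four CNOTs through a common neighbour implement a CNOT between two non-adjacent qubits can be checked with a short script. The sketch below tracks classical bit strings only; this is sufficient because a circuit made solely of CNOT gates permutes the computational basis states without introducing phases, so agreement on every basis state implies that the two circuits implement the same unitary. The gate ordering used in `bridge` is one valid choice and is not necessarily the one drawn in Fig. 4; the function names are introduced for this sketch.

```python
from itertools import product


def cnot(state, control, target):
    """Apply a CNOT to a tuple of classical bits."""
    bits = list(state)
    bits[target] ^= bits[control]
    return tuple(bits)


def bridge(state, control, middle, target):
    """CNOT(control, target) built from four nearest-neighbour CNOTs."""
    for c, t in [(control, middle), (middle, target),
                 (control, middle), (middle, target)]:
        state = cnot(state, c, t)
    return state


def swap(state, a, b):
    """SWAP built from three CNOTs on the same link."""
    for c, t in [(a, b), (b, a), (a, b)]:
        state = cnot(state, c, t)
    return state


# The Bridge acts as CNOT(0, 2) and leaves the middle qubit unchanged.
bridge_ok = all(
    bridge(s, 0, 1, 2) == cnot(s, 0, 2) for s in product((0, 1), repeat=3)
)
# Three CNOTs exchange the two bits.
swap_ok = all(
    swap(s, 0, 1) == (s[1], s[0]) for s in product((0, 1), repeat=2)
)
print(bridge_ok, swap_ok)
```

Unlike the SWAP, the Bridge leaves every qubit in place, which is why the mapping 𝜋_c is unchanged after its insertion.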

$$\mathrm{Effect} = \sum_{g \in E} \Bigl( D[\pi_c(g.q_1)][\pi_c(g.q_2)] - D[\pi_{temp}(g.q_1)][\pi_{temp}(g.q_2)] \Bigr) \tag{6}$$

The mapping transition algorithm will go through each quantum gate sequentially and mark the directly executable gates as executed. If no more gates can be marked as executed, this means that either the quantum circuit is fully mapped or all the gates in the first layer do not comply with the hardware topology. In the first case, the mapping algorithm can be stopped and the mapped quantum circuit returned. In the second case, the algorithm calls a heuristic function to choose the best SWAP or Bridge gate to insert in order to make some of the gates in the first layer executable. The algorithm then iterates until the quantum circuit is fully mapped.

Algorithm 1 shows the pseudo-code of the HA heuristic method. Note that the most recent calibration data should be retrieved (i.e. through the IBM Quantum Experience) before each usage of the HA algorithm to ensure that the algorithm has access to the most accurate and up-to-date information possible.

The heuristic method to insert a SWAP or Bridge when no gate in the first layer 𝐹 is executable can be described as follows. First, a list of all the candidate SWAP gates, swap_candidate_list, is constructed based on the quantum gates in the first layer 𝐹 and the hardware coupling graph 𝐺. Then, for each SWAP candidate a temporary mapping 𝜋_temp is computed with the Map_Update function. The final cost of the candidate SWAP is computed following (5). The SWAP with the minimum score is selected and called swap_min.

The last step is to choose between a SWAP gate or a Bridge gate. A SWAP gate can always be used, whereas a Bridge gate can only be inserted if a gate in the first layer 𝐹 becomes executable from the mapping obtained after applying the swap_min gate. If a Bridge gate is not insertable, then the algorithm has no choice but to insert a SWAP gate. Otherwise, the algorithm decides which gate (SWAP or Bridge) to insert based on the effect of the SWAP gate on the extended layer computed with (6). If adding a SWAP gate has a negative impact on the extended layer, then a Bridge gate (which does not change the current mapping) is inserted. Otherwise, if adding a SWAP gate has a positive effect on the extended layer, then the algorithm inserts a SWAP gate.
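The selection procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: gates are pairs of logical qubits, `mapping` is a dictionary from logical to physical qubits, `hops` is the hop-count matrix of the coupling graph (adjacent qubits at distance 1, so a Bridge requires distance 2), the list of candidate swaps is supplied by the caller, the default value of `W` is arbitrary, and all function names are hypothetical.

```python
def apply_swap(mapping, pair):
    """Return the mapping obtained after swapping two physical qubits."""
    a, b = pair
    exchange = {a: b, b: a}
    return {q: exchange.get(p, p) for q, p in mapping.items()}


def layer_cost(layer, mapping, D):
    """Sum of distances D between the mapped qubits of each gate in a layer."""
    return sum(D[mapping[q1]][mapping[q2]] for q1, q2 in layer)


def select_gate(F, E, mapping, D, hops, candidates, W=0.5):
    """Choose between the best SWAP and a Bridge, following (5) and (6)."""
    scored = []
    for pair in candidates:
        temp = apply_swap(mapping, pair)
        h = layer_cost(F, temp, D) / len(F)               # first-layer term of (5)
        if E:
            h += W * layer_cost(E, temp, D) / len(E)      # look-ahead term of (5)
        effect = layer_cost(E, mapping, D) - layer_cost(E, temp, D)  # (6)
        scored.append((h, pair, temp, effect))
    _, pair, temp, effect = min(scored, key=lambda item: item[0])

    if effect < 0:
        # The best SWAP hurts the extended layer: use a Bridge if a gate that
        # the SWAP would unlock currently has its qubits at distance two.
        for q1, q2 in F:
            unlocked = hops[temp[q1]][temp[q2]] == 1
            bridgeable = hops[mapping[q1]][mapping[q2]] == 2
            if unlocked and bridgeable:
                return mapping, ("bridge", (q1, q2))
    return temp, ("swap", pair)


# Three physical qubits on a line 0 - 1 - 2; D is the plain hop count here.
hops = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
identity = {0: 0, 1: 1, 2: 2}

# CNOT(q0, q2) is blocked; both upcoming gates are currently executable,
# so any SWAP degrades the extended layer and a Bridge is chosen.
new_map, gate = select_gate(
    F=[(0, 2)], E=[(0, 1), (1, 2)], mapping=identity,
    D=hops, hops=hops, candidates=[(0, 1), (1, 2)],
)
print(new_map, gate)
```

In a complete mapper this function would be called each time the first layer contains no executable gate, and the returned mapping would become the current mapping for the next iteration.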

The HA algorithm outperforms SABREalgorithm thanks to several modifications while not changing itsasymptotic complexity. The mapping procedure is separated into

Algorithm 1:

Heuristic algorithm for selecting additionalgate candidate input :

Circuit

𝐷𝐴𝐺 , Coupling graph 𝐺 , Current mapping 𝜋 𝑐 , Distance matrix 𝐷 , Swap matrix 𝑆 , First layer 𝐹 ,Extended layer 𝐸 , Weight parameter 𝑊 output : New mapping 𝜋 𝑛 , Inserted gate 𝑔 add begin Set score to empty list; Set effect to empty list; swap_candidate_list ← FindSwapPairs( 𝐹 , 𝐺 ) ; for swap ∈ swap_candidate_list do 𝜋 temp ← Map_Update ( swap ); 𝐻 basic ← for gate ∈ 𝐹 do 𝐻 basic ← 𝐻 basic + 𝐷 (gate , 𝜋 temp ) ; end 𝐻 extended ← for gate ∈ 𝐸 do 𝐻 extended ← 𝐻 extended + 𝐷 (gate , 𝜋 temp ) ; effect_cost ← effect_cost + 𝐷 (gate , 𝜋 𝑐 ) − 𝐷 (gate , 𝜋 temp ) ; end 𝐻 ← | 𝐹 | 𝐻 basic + 𝑊 | 𝐸 | 𝐻 extended ; score . append( 𝐻 ) ; effect . append( effect_cost ) ; end Find the swap with minimum score: swap min ; Find the gate in 𝐹 that become executable by applying swap min : 𝑔 𝑠 ; if effect (cid:2) swap min (cid:3) < and 𝑆 ( 𝑔 𝑠 , 𝜋 𝑐 ) = then 𝜋 𝑛 ← 𝜋 𝑐 ; 𝑔 add ← 𝑔 𝐵 ; else 𝜋 𝑛 ← Map_Update ( swap min ); 𝑔 add ← swap min ; end return 𝜋 𝑛 , 𝑔 add ; end Hardware-Aware Heuristic for the Qubit Mapping Problem in the NISQ Era g g g g g q ( Q ) q ( Q ) q ( Q ) q ( Q ) q ( Q ) F E (a) Beginning of the HA mapping algorithm. 𝑔 and 𝑔 do not overlap and are the first gates inthe circuit so they are in 𝐹 . The other gates arepushed in 𝐸 . g g g g g q ( Q ) q ( Q ) q ( Q ) q ( Q ) q ( Q ) F E (b) 𝑔 and 𝑔 are compliant with the hardwaretopology. They are executed and removed from 𝐹 . The gate 𝑔 is pushed into 𝐹 but is not com-pliant and a SWAP/Bridge should be inserted. 𝑔 overlaps with 𝑔 and cannot be inserted in 𝐹 . g g g g g q ( Q ) q ( Q ) q ( Q ) q ( Q ) q ( Q ) F E (c) After Bridge insertion, 𝑔 is executed and re-moved from 𝐹 . 𝑔 no longer overlap with a gatein 𝐹 and is added to the first layer. 𝑔 overlapswith 𝑔 and so should stay in 𝐸 . Figure 6: Evolution of the layers 𝐹 and 𝐸 on a simple circuit with a detailed explanation at each step. 
two steps: an initialisation step that is independent of the mapped quantum circuit and a mapping step.

The initialisation step computes the distance matrix that is used afterwards in the mapping step. In our algorithm, the distance matrix is computed according to (4). Each of 𝑆, E and 𝑇 constituting the distance matrix 𝐷 requires one run of the Floyd-Warshall algorithm on the hardware graph. This means that we need to perform three calls to an algorithm of 𝑂(𝑛³) complexity, 𝑛 being the number of qubits of the targeted quantum chip. Moreover, the weights used by the Floyd-Warshall algorithm for the matrices E and 𝑇 should be retrieved online with the Qiskit API. This retrieval theoretically takes 𝑂(𝑛²) time in the worst case, as we need to retrieve the CNOT error rate and execution time for each link. Note that current quantum chips only have 𝑂(𝑛) links, so the asymptotic complexity of this step is 𝑂(𝑛) in practice. Overall, the initialisation step is dominated by the cost of applying the Floyd-Warshall algorithm, which takes 𝑂(𝑛³) time.

After the initialisation step, the actual mapping procedure is applied. Let 𝑛 be the number of qubits, 𝑔 the number of CNOT gates in the mapped quantum circuit and 𝑑 the diameter of the chip, i.e. the minimum SWAP distance between the two farthest qubits on the quantum chip. In the worst case, all the CNOT gates should be mapped because none of them comply with the hardware topology. Moreover, each CNOT gate might need up to 𝑑 SWAPs in order to become executable. Finally, for each SWAP insertion we need to execute the heuristic cost function. This function will need to explore at most 𝑛² links (in the case of an all-to-all connected chip; this number improves to 𝑂(𝑛) on practical quantum chips with a nearest-neighbour connectivity), where exploring one link might take a time of 𝑂(𝑔) if all the CNOT gates are included in either 𝐹 or 𝐸. In summary, the mapping step takes 𝑂(𝑔²𝑑𝑛²) time in the worst case, which can be improved to 𝑂(𝑔𝑛^2.5) under reasonable assumptions (nearest-neighbour chip connectivity, i.e. 𝑑 ∈ 𝑂(√𝑛), and an extended layer 𝐸 with at most 𝑂(𝑛) CNOT gates).

It is important to note that the initialisation step only needs to be repeated when the calibration data change, but this requires recovering data from the Internet, which can be a slow operation (in the order of several seconds).
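The two steps analysed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the distance matrix is assumed to be a plain weighted sum of three Floyd-Warshall results (hop count, CNOT error, CNOT time), which is our reading of equation (4); the function names, the dictionary-based mapping and the calibration values are all hypothetical. The sketch covers only the scoring loop of Algorithm 1 and leaves out the final choice between a SWAP and a Bridge gate.

```python
import math


def floyd_warshall(n, weights):
    """All-pairs shortest paths on an undirected graph in O(n^3) time."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():
        dist[u][v] = dist[v][u] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


def distance_matrix(n, cnot_error, cnot_time, alphas):
    """Assumed form of D: weighted sum of hop count, CNOT error and CNOT time.

    Both dictionaries map a link (u, v) to its calibration value; the coupling
    graph is assumed to be connected so that every distance is finite.
    """
    a1, a2, a3 = alphas
    hops = floyd_warshall(n, {link: 1.0 for link in cnot_error})
    error = floyd_warshall(n, cnot_error)
    time = floyd_warshall(n, cnot_time)
    return [[a1 * hops[i][j] + a2 * error[i][j] + a3 * time[i][j]
             for j in range(n)] for i in range(n)]


def apply_swap(mapping, swap):
    """Exchange the logical qubits held by the two physical qubits of `swap`."""
    p, q = swap
    exchange = {p: q, q: p}
    return {logical: exchange.get(phys, phys) for logical, phys in mapping.items()}


def select_swap(front, extended, mapping, links, D, W=0.5):
    """Score each candidate SWAP as in Algorithm 1; return (swap, score, effect).

    `front` must be non-empty; gates are (logical, logical) pairs.
    """
    def dist(gate, pi):
        return D[pi[gate[0]]][pi[gate[1]]]

    busy = {mapping[q] for gate in front for q in gate}
    best = None
    for swap in links:
        if swap[0] not in busy and swap[1] not in busy:
            continue  # candidates must touch a qubit used by the first layer
        temp = apply_swap(mapping, swap)
        score = sum(dist(g, temp) for g in front) / len(front)
        if extended:
            score += W * sum(dist(g, temp) for g in extended) / len(extended)
        effect = sum(dist(g, mapping) - dist(g, temp) for g in extended)
        if best is None or score < best[1]:
            best = (swap, score, effect)
    return best


# Hypothetical 5-qubit line 0-1-2-3-4 with made-up calibration values.
links = [(0, 1), (1, 2), (2, 3), (3, 4)]
cnot_error = {link: 0.01 for link in links}
cnot_time = {link: 300.0 for link in links}
D = distance_matrix(5, cnot_error, cnot_time, alphas=(1.0, 0.0, 0.0))
mapping = {q: q for q in range(5)}  # logical qubit -> physical qubit
best_swap, best_score, best_effect = select_swap(
    front=[(0, 2)], extended=[(2, 4)], mapping=mapping, links=links, D=D)
```

The three nested loops of `floyd_warshall` make the cubic initialisation cost visible, while `select_swap` shows why one heuristic evaluation costs one pass over the candidate links times one pass over the gates in the two layers.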

Heuristic-based mapping transition algorithms rely crucially on a good initial mapping to achieve the best results. A well-known algorithm for approximating the global minimum of a scalar function over a discrete search space is simulated annealing. Simulated annealing is a meta-heuristic designed to explore the search space by randomly selecting neighbours of the current state, evaluating them with the provided cost function and evolving in such a way that the algorithm does not get trapped in local minima. The simulated annealing algorithm is depicted in Algorithm 2. A modified version of simulated annealing has already been applied in [44], where a repetition parameter 𝑅 is used to explore several neighbours at each temperature step. The authors consider a simple get_neighbour function that randomly modifies the current mapping 𝜋 into a neighbouring mapping 𝜋_neighbour. However, this get_neighbour function is limited as it is not aware of the underlying hardware. This means that, among the mappings generated by this function and evaluated by the simulated annealing procedure, several could be excluded even before evaluating the mapping cost.

We aim to improve the initial mapping generated with the simulated annealing procedure by designing a Hardware-aware Simulated Annealing (HSA) algorithm that uses a hardware-aware get_neighbour method to explore the neighbouring mappings. To explore different mappings, we separate the get_neighbour procedure into three algorithms governed by a top execution policy. This top-layer policy decides which one of the three algorithms the get_neighbour method should execute to obtain a new mapping. The policy we used chooses the algorithm at random, based on the value of a random number.

The first algorithm, called shuffle, does not change the physical qubits involved in the current mapping but changes how they are mapped to logical qubits. The most straightforward algorithm that can be used for this task is a random shuffle: we list the physical qubits involved in the mapping, randomly shuffle them, and obtain a new arbitrary mapping with the same physical qubits.

The second algorithm, expand, does not change the mapping between physical qubits and logical ones but replaces one of the physical qubits involved in the mapping by another physical qubit that is not part of the mapping. Instead of a hardware-unaware expand, we use an expand algorithm that tries to avoid separating the physical qubits in the current mapping into two disconnected groups. Moreover, the algorithm encourages re-arrangement of qubits based on the figure of merit chosen (i.e. final state fidelity, circuit depth, execution time). In this algorithm, we consider that strongly connected qubits have high fidelity. The hardware-aware implementation aims to identify the qubits with the least and most connections. Moreover, our tests show that qubit measurement operations are a large source of errors. To account for these errors, we add weights to the qubits to determine the best and worst qubits in terms of their measured fidelity.

The third algorithm used in the get_neighbour algorithm is called reset. Its purpose is to allow the simulated annealing algorithm to escape local minima. This algorithm is needed because the first two algorithms, shuffle and expand, will likely explore only the close neighbourhood of the current mapping and may not be able to escape a local minimum. To avoid being stuck, the reset algorithm tries to find a potentially good new initial mapping from a randomly chosen qubit, without considering the previously explored mappings. The algorithm starts with a random qubit and expands the mapping by iteratively weighting all the qubits and adding the best qubit to the new mapping.

Figure 7: ibmq_tokyo topology.

Algorithm 2: Simulated annealing

input: Initial mapping 𝜋₀, cost function C, neighbour computation function get_neighbour, initial temperature 𝑇_init, final temperature 𝑇_f, temperature evolution constant Δ
output: Best initial mapping found 𝜋_opt

begin
    𝜋 ← 𝜋₀; 𝜋_opt ← 𝜋; 𝑇 ← 𝑇_init;
    cost ← C(𝜋); cost_opt ← cost;
    while 𝑇 ⩾ 𝑇_f do
        𝜋_neighbour ← get_neighbour(𝜋);
        cost_neighbour ← C(𝜋_neighbour);
        if cost_neighbour < cost_opt then
            cost_opt ← cost_neighbour; 𝜋_opt ← 𝜋_neighbour;
        end
        if cost_neighbour < cost then
            cost ← cost_neighbour; 𝜋 ← 𝜋_neighbour;
        else
            if rand() < exp((cost − cost_neighbour)/𝑇) then
                cost ← cost_neighbour; 𝜋 ← 𝜋_neighbour;
            end
        end
        𝑇 ← 𝑇 × Δ;
    end
    return 𝜋_opt;
end
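The annealing loop of Algorithm 2 and the random top-level policy can be sketched as follows. This is a minimal illustration, not the HSA implementation: the three moves below are deliberately hardware-unaware stand-ins for shuffle, expand and reset, the mapping is represented as a tuple with `pi[logical] = physical`, and the cost function is a toy one. The default probabilities (0.9 for shuffle, 0.08 for expand, the remainder for reset) are the ones reported for the evaluation; every function name is hypothetical.

```python
import math
import random


def simulated_annealing(pi0, cost, get_neighbour, t_init, t_final, delta, rng):
    """Algorithm 2 with geometric cooling (requires 0 < delta < 1, t_final > 0)."""
    pi, pi_opt = pi0, pi0
    current = best = cost(pi0)
    t = t_init
    while t >= t_final:
        cand = get_neighbour(pi, rng)
        c = cost(cand)
        if c < best:
            best, pi_opt = c, cand
        # Always accept an improvement; accept a worse mapping with
        # probability exp((cost - cost_neighbour) / T).
        if c < current or rng.random() < math.exp((current - c) / t):
            current, pi = c, cand
        t *= delta
    return pi_opt


def make_get_neighbour(n_physical, p_shuffle=0.9, p_expand=0.08):
    """Random top-level policy over three simplified, hardware-unaware moves."""
    def shuffle(pi, rng):
        perm = list(pi)
        rng.shuffle(perm)
        return tuple(perm)

    def expand(pi, rng):
        free = [q for q in range(n_physical) if q not in pi]
        if not free:
            return shuffle(pi, rng)
        new = list(pi)
        new[rng.randrange(len(new))] = rng.choice(free)
        return tuple(new)

    def reset(pi, rng):
        return tuple(rng.sample(range(n_physical), len(pi)))

    def get_neighbour(pi, rng):
        r = rng.random()
        if r < p_shuffle:
            return shuffle(pi, rng)
        if r < p_shuffle + p_expand:
            return expand(pi, rng)
        return reset(pi, rng)

    return get_neighbour


# Toy cost: total distance of interacting pairs on a hypothetical 6-qubit line.
pairs = [(0, 1), (1, 2), (0, 2)]


def cost(pi):
    return sum(abs(pi[a] - pi[b]) for a, b in pairs)


start = (0, 5, 2)  # pi[logical] = physical
found = simulated_annealing(start, cost, make_get_neighbour(6),
                            t_init=10.0, t_final=0.01, delta=0.95,
                            rng=random.Random(0))
```

A hardware-aware variant would replace the bodies of `expand` and `reset` with versions that weight qubits by connectivity and measured fidelity, as described above; the policy and the annealing loop would stay unchanged.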

To evaluate our solution and compare it to other algorithms, we use the metrics described in the following paragraphs.

The first metric is the success rate of the mapped quantum circuit on a given hardware. We define the success rate of a quantum circuit as the fidelity of the quantum state obtained at the end of the execution of this quantum circuit on the hardware. We estimate this success rate by executing the quantum circuit a large number of times (8192), counting the number of executions that gave the expected answer and dividing this number by the total number of executions. The expected answer is obtained by executing the quantum circuit on a simulator.

The second metric is the number of additional CNOT gates. This metric is tightly linked with the total number of SWAP gates inserted.

The third metric is the total execution time of the circuit. As the execution time of each CNOT gate can be extracted from [1], we can estimate the overall execution time for a given circuit. This metric is important for several reasons. First, it shows the ability of the mapping algorithm to schedule gates in parallel when possible and how well it does so. Secondly, it gives an idea of the importance of decoherence noise in the computed fidelity. For each qubit, the execution time is computed by adding the total execution time of the gate operations acting on it. The longest qubit execution time is selected to represent the total execution time of the quantum circuit.
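The first and third metrics reduce to two short computations, sketched below. The data layout is an assumption made for illustration: `counts` is a dictionary from measured bitstring to number of shots, and each gate is a `(qubits, duration)` pair; neither the function names nor the sample numbers come from the paper.

```python
def success_rate(counts, expected):
    """Fraction of shots that returned the expected bitstring."""
    shots = sum(counts.values())
    return counts.get(expected, 0) / shots


def execution_time(gates):
    """Longest per-qubit sum of gate durations.

    `gates` is a list of (qubits, duration) pairs, where `qubits` is a tuple
    of the qubits the gate acts on.
    """
    busy = {}
    for qubits, duration in gates:
        for q in qubits:
            busy[q] = busy.get(q, 0.0) + duration
    return max(busy.values(), default=0.0)


# Made-up example: 8192 shots, of which 6144 gave the expected answer.
rate = success_rate({"101": 6144, "111": 2048}, expected="101")

# Made-up durations: qubit 1 takes part in both CNOT gates.
total = execution_time([((0, 1), 300.0), ((1, 2), 400.0), ((3,), 50.0)])
```

In this example qubit 1 accumulates 300 + 400 time units, so it determines the execution time of the whole circuit even though the gate on qubit 3 can run in parallel.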

All the benchmarks used are collected from previous works [28, 29, 46]. They include several functions taken from RevLib [41] as well as quantum algorithms from a variety of domains including optimization, simulation, quantum arithmetic, etc. They are well known in the community and given as quantum circuits written in the OpenQASM language [11].

Figure 8: Comparison of number of additional gates and fidelity on ibmq_valencia. HA has been used with 𝛼₁ = 0.5, 𝛼₂ = 0.5 and 𝛼₃ = 0. Panels: (a) Number of additional gates; (b) Fidelity. Methods compared: HA+SABRE, HA+HSA, SABRE, N-A and Qiskit.

Figure 9: Comparison of number of additional gates and fidelity on ibmq_almaden. HA has been used with 𝛼₁ = 0.5, 𝛼₂ = 0.5 and 𝛼₃ = 0. Panels: (a) Number of additional gates; (b) Fidelity. Methods compared: HA+SABRE, HA+HSA, SABRE, N-A and Qiskit.

We chose two quantum chips, ibmq_almaden and ibmq_valencia, available from the IBM Quantum Experience website, and one quantum chip, ibmq_tokyo, which is not accessible currently but is widely used by state-of-the-art algorithms. ibmq_almaden is a 20-qubit quantum chip. Its topology and characteristics are summarised in Fig. 1 and Table 2. ibmq_valencia is a 5-qubit chip depicted in Fig. 2b. ibmq_tokyo is a 20-qubit virtual chip depicted in Fig. 7. We execute benchmarks on ibmq_valencia and ibmq_almaden to check cases when the mapped quantum circuit needs all the available qubits or only a small number of them. Moreover, we use ibmq_almaden and ibmq_tokyo to compare our algorithm with state-of-the-art algorithms in terms of number of additional gates. Note that we do not have to execute the mapped quantum circuit on real quantum hardware to count the number of additional gates.

Our algorithm is implemented in Python and the Qiskit version is 0.19.1. To empirically evaluate our algorithm, we use a personal computer with one Intel i5-5300U CPU and 8 GB of memory. The operating system is Ubuntu 18.04.

Several published qubit mapping algorithms are available, as discussed in section 1. SABRE [29] seems to be the best algorithm at the time of writing when comparing the number of gates inserted to make the quantum circuit hardware-compliant. It provides a good initial mapping method and a mapping transition algorithm. Another algorithm, DL [45] (Dynamic look-ahead), is based on SABRE and shows an improvement in terms of number of additional gates. Moreover, the mapping method presented in [30] uses the hardware calibration data to try to find a good mapping. We compare to all these algorithms.
The source code of SABRE has been provided by the authors of the algorithm, and the mapping method presented in [30], called Noise-Adaptive (N-A) Compiler, has been integrated into Qiskit as a transpiler pass. Finally, we also include the default transpiler included in Qiskit as the baseline. We execute our HA mapping transition algorithm with two different initial mapping algorithms: the SABRE initial mapping algorithm and our Hardware-aware Simulated Annealing (HSA) algorithm.

Summarising, to test on real hardware, five different algorithms are included in the benchmarks: 1) our HA mapping algorithm with SABRE initial mapping, 2) our HA mapping algorithm with HSA initial mapping, 3) SABRE mapping algorithm with SABRE initial mapping, 4) N-A Compiler and 5) Qiskit transpiler. For a fair comparison, we set the optimization_level parameter of the Qiskit transpiler to zero and make sure that the circuits obtained from the five methods are all executed with the same calibration data. The optimization_level is set to zero to invoke only the mapping transformation and not the optimisation transformation. Moreover, when using the N-A Compiler, the routing method is set to "lookahead" to make sure that it has the look-ahead ability. To compare the number of additional gates without access to real hardware, three algorithms are included: 1) SABRE, 2) DL, 3) HA.

To evaluate our algorithm with the different initial mapping methods, we allow each of them to call the mapping algorithm at most 100 times. The number of calls to the mapping algorithm is a natural parameter of the simulated annealing-based method, but the SABRE initial mapping method only needs two calls. To let the SABRE algorithm take advantage of a larger number of calls, we repeat the algorithm on several random initial mappings until no more calls are allowed and choose the best mapping found. The whole process is repeated 10 times to obtain 10 initial mappings.

We divide benchmarks by size according to their number of gates.
We only execute small benchmarks on real quantum hardware, because the other benchmarks, with a large number of gate operations, introduce too much noise to obtain any meaningful results. Moreover, the initial mapping generation process described above is applied to small and medium-sized benchmarks. Large benchmarks suffer from long run times, so we generate 10 random initial mappings and use them with the different algorithms. When using the ibmq_tokyo virtual chip, we select the best result out of five attempts, which is similar to the approach applied in SABRE and DL.

When testing the HSA algorithm we used a random policy to choose which one of the three subroutines to execute. The shuffle procedure is executed with a probability of 0.9, the expand algorithm is chosen with a probability of 0.08 and the reset procedure is executed when the two previous algorithms are not used (i.e. with a probability of 0.02).

First, the weight parameter 𝛼₁ of the swap matrix 𝑆 is set to 0.5, the weight parameter 𝛼₂ of the CNOT error matrix E is set to 0.5 and the weight parameter 𝛼₃ of the CNOT execution time 𝑇 is set to 0. Second, we compare the number of additional gates and the total execution time. The weight parameter 𝛼₁ of the swap matrix 𝑆 is set to 0.5, the weight parameter 𝛼₂ of the CNOT error matrix E is set to 0 and the weight parameter 𝛼₃ of the CNOT execution time 𝑇 is set to 0.5. Third, we compare the number of additional gates for circuits that are not executable on the real quantum device. The weight parameter 𝛼₁ of the swap matrix 𝑆 is set to 1, and the other two parameters are set to 0. For these three comparisons, the weight parameter 𝑊 in the cost function is set to 0.5 and the size of the extended layer is set to 20.

We compare both the average number of additional gates (see Fig. 8a and Fig. 9a) and the average output state fidelity (see Fig. 8b and Fig. 9b) among the 10 initial mappings for the five methods. The complete experimental results are listed in Table 5 and Table 6.

The Qiskit default qubit mapping algorithm is nearly always the worst one in terms of additional gates, which translates in most cases to the worst output state fidelity. Although the N-A Compiler takes into account the calibration data and has the look-ahead ability, results show that it does not outperform the SABRE mapping algorithm with SABRE initial mapping (labelled as SABRE in the plots). Our HA mapping algorithm with SABRE initial mapping (labelled as HA+SABRE in the plots) seems to be the best combination, as on average it achieves the best output state fidelity. Moreover, our HA algorithm with SABRE initial mapping gives the minimum number of additional gates. The HA mapping algorithm with HSA initial mapping (labelled as HA+HSA in the plots) is also good, but its results are less consistent than those of HA+SABRE due to its random nature.
In many test cases, however, HA+HSA still outperforms SABRE.

We also tried to map and execute the qft_10 circuit. We found that its output fidelity is less than 0.01 for all the methods tested in the benchmark. Because the base fidelity is too low to perform a meaningful comparison, we only compare the number of additional gates, as summarised in Table 3 and Table 4 for quantum circuits with a medium-to-large number of gates.

Fig. 10 shows the result of comparing the execution times, number of additional gates and fidelities of our HA algorithm with the SABRE algorithm on ibmq_valencia. The execution time is reduced by 19% on average. Even though the weight parameter 𝛼₂ of the CNOT error matrix E is set to 0, the fidelity is improved by 8%. The number of additional gates is reduced by 38%.

Table 3 lists the number of additional gates on ibmq_almaden. By selecting between SWAP and Bridge gates, our HA algorithm can outperform SABRE on circuits of different sizes. For medium circuits, HA gives results similar to SABRE and improves on SABRE for only one circuit among the eight circuits tested. For large circuits, HA outperforms SABRE and consistently reduces the number of additional gates, by 28% on average.

Table 4 shows the number of additional gates on ibmq_tokyo when comparing our HA algorithm with SABRE and DL. DL outperforms SABRE, and our HA algorithm further reduces the number of additional gates by 14% on average. SABRE and DL only provide their runtime on ibmq_tokyo; the runtimes of the three algorithms are shown in Table 4. Note that DL is written in C++ and tested on a normal personal computer. SABRE is written in Python and tested on a server with 2 Intel Xeon E5-2680 CPUs (48 logical cores) and 378 GB of memory. Since there is an intrinsic speed difference between C++ and Python, and since different devices were used, the runtime data in this table are for reference rather than for comparison.
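The comparison columns of the tables appear to follow the usual relative-reduction formula, (baseline − new) / baseline × 100; this formula is inferred from the table entries rather than stated in the text. The sketch below applies it to the average additional gate counts of the 13 large circuits of Table 3 (SABRE versus HA). The function name and the convention of returning 0 for a zero baseline are assumptions.

```python
def reduction_percent(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    if baseline == 0:
        return 0.0  # assumed convention when there is nothing to reduce
    return 100.0 * (baseline - ours) / baseline


# (SABRE g, HA g) for the large circuits of Table 3, in table order.
large = [(2973, 2136), (2742, 2040), (2628, 1872), (3024, 2022),
         (3999, 2892), (4539, 3261), (5127, 3795), (6477, 4851),
         (8679, 6012), (11889, 8721), (16710, 13071), (30558, 21900),
         (30471, 21949)]

per_circuit = [reduction_percent(sabre, ha) for sabre, ha in large]
average = sum(per_circuit) / len(per_circuit)

# Medium circuit qft_10 of Table 3: average and minimum gate counts.
qft10_avg = reduction_percent(93, 66)
qft10_min = reduction_percent(81, 42)
```

With these numbers the mean reduction over the large circuits comes out between 27% and 28%, close to the figure of 28% quoted above, and the two qft_10 values round to the 29 and 48.1 listed in the table.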

Given the currently available NISQ hardware, it is important to adapt quantum programs to execute on such hardware while taking into account their physical constraints and limitations (noisy operations, number of qubits and gates). Here, we list several guidelines that can help a programmer to design quantum circuits that comply with a given quantum hardware.

• Check the topology and the calibration data of the targeted device. Try to map the most used qubit of the mapped circuit to the physical qubit that has the strongest coupling connection.
• Try to apply a CNOT gate on qubits that are directly connected and with a reliable (i.e. low error rate) interconnect, so that no additional gates are needed and the overall circuit fidelity is improved.
• If a CNOT gate cannot be applied on two adjacent qubits, try to apply it on two qubits whose distance is two on the coupling graph. In such a situation, one can select between a SWAP and a Bridge gate to execute the CNOT gate, which also reduces the number of additional gates.
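The last guideline can be checked on a small example. The sketch below assumes the standard four-CNOT Bridge decomposition and the standard three-CNOT SWAP decomposition, which may be ordered differently from the figures of the paper; the function names are hypothetical. Because circuits made only of CNOT gates permute computational basis states, comparing the two circuits on all eight basis states of three qubits is enough to compare the corresponding unitaries.

```python
from itertools import product


def cnot(state, control, target):
    """Apply a CNOT to a tuple of classical bits."""
    bits = list(state)
    bits[target] ^= bits[control]
    return tuple(bits)


def bridge(state, a, b, c):
    """CNOT from a to c through the middle qubit b, using four CNOT gates."""
    for control, target in ((a, b), (b, c), (a, b), (b, c)):
        state = cnot(state, control, target)
    return state


def swap_then_cnot(state, a, b, c):
    """SWAP(a, b) as three CNOT gates, followed by the now-adjacent CNOT(b, c)."""
    for control, target in ((a, b), (b, a), (a, b), (b, c)):
        state = cnot(state, control, target)
    return state


states = list(product((0, 1), repeat=3))

# Bridge implements CNOT(0 -> 2) and leaves the mapping untouched.
bridge_ok = all(bridge(s, 0, 1, 2) == cnot(s, 0, 2) for s in states)

# SWAP + CNOT applies the same logical gate but exchanges qubits 0 and 1.
swap_ok = all(swap_then_cnot(s, 0, 1, 2) == (s[1], s[0], s[2] ^ s[0])
              for s in states)
```

Both options use four CNOT gates in total, i.e. three more than the original gate; the difference is that the SWAP changes the mapping for the gates that follow, while the Bridge does not.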

In this work, we present an efficient hardware-aware mapping algorithm based on heuristic search. For future studies, we find that the following potential research directions can be explored. First, our HA algorithm only takes into consideration the calibration data, which includes the gate error and the execution time. However, other physical constraints, such as crosstalk error, may be included to take into account crosstalk coupling between interconnects. Secondly, we would like to investigate the adaptation of such a mapping algorithm to a multi-programming mechanism as introduced in [12]. Executing multiple quantum circuits on the same chip allows us to use hardware resources more efficiently but may decrease the fidelity of the quantum operations due to unwanted interactions. Finally, we find it relevant to investigate mapping algorithms for specific use cases such as quantum circuits constructed for quantum chemistry computations with VQE [32] or to solve linear systems with the VQLS algorithm [7].

Quantum computers are now in the NISQ era. There is a gap between the design and the execution of a quantum circuit on NISQ hardware. In this paper, we present a hardware-aware heuristic for the qubit mapping problem that adapts the quantum circuit to the quantum hardware. We design a mapping transition algorithm that uses calibration data and selects either a SWAP or a Bridge gate for qubit movement. Experimental results show that our algorithm can outperform state-of-the-art algorithms in terms of the number of additional gates, fidelity and execution time. Our algorithm is evaluated on IBM quantum devices, but should be general enough to be used on quantum devices from other vendors as well.

ACKNOWLEDGMENT

This work is funded by the QuantUM Initiative of the Region Occitanie, University of Montpellier and IBM Montpellier as well as by a research collaboration grant between TOTAL, LIRMM and CERFACS. We would like to thank the authors of SABRE for the meaningful discussions and exchanges.

SUPPLEMENTARY INFORMATION

The authors have made the source code available at the following link: https://github.com/peachnuts/HA.

REFERENCES [1] IBMQ backends information. https://github.com/Qiskit/ibmq-device-information, 2019. Accessed: 2019-09-18.[2] Mohammad Alfailakawi, Imtiaz Ahmad, and Suha Hamdan. Lnn reversible circuitrealization using fast harmony search based heuristic. In

Asia-Pacific Conferenceon Computer Science and Electrical Engineering , 11 2014.[3] Abdullah Ash-Saki, Mahabubul Alam, and Swaroop Ghosh. Qure: Qubit re-allocation in noisy intermediate-scale quantum computers. In

Proceedings ofthe 56th Annual Design Automation Conference 2019 , pages 1–6, 2019. doi: https://doi.org/10.1145/3316781.3317888.[4] Debjyoti Bhattacharjee and Anupam Chattopadhyay. Depth-optimal quantumcircuit placement for arbitrary topologies, 2017. arXiv : https://arxiv.org/abs/1703.08540.[5] Debjyoti Bhattacharjee, Abdullah Ash Saki, Mahabubul Alam, Anupam Chat-topadhyay, and Swaroop Ghosh. Muqut: Multi-constraint quantum circuit map-ping on NISQ computers. In , page 8942132. Institute of Electrical and ElectronicsEngineers Inc., 2019. doi: https://doi.org/10.1109/ICCAD45719.2019.8942132.[6] Kyle E. C. Booth, Minh Do, J. Christopher Beck, Eleanor Rieffel, Davide Venturelli,and Jeremy Frank. Comparing and integrating constraint programming andtemporal planning for quantum circuit compilation. In

International Conferenceon Automated Planning and Scheduling , pages 366–374, 2018. arXiv: https://arxiv.org/abs/1803.06775. iyuan NIU, Adrien Suau, Gabriel Staffelbach, and Aida Todri-Sanial m o d m il s

65 a l u - v

027 31713 d ec o d - v m o d d

264 4g t E x ec u t i o n t i m e ( µ s ) SABRE HA (a) Execution time m o d m il s

65 a l u - v

027 31713 d ec o d - v m o d d

264 4g t N u m b e r o f a dd i t i o n a l ga t e s SABRE HA (b) Number of additional gates m o d m il s

65 a l u - v

027 31713 d ec o d - v m o d d

264 4g t . . . . . . F i d e li t y SABRE HA (c) Fidelity

Figure 10: Comparison of execution time, number of additional gates and fidelity on ibmq_valencia. HA has been used with 𝛼 = . , 𝛼 = and 𝛼 = . .Table 3: Number of additional gates on ibmq_almaden for large circuits. HA has been used with 𝛼 = , 𝛼 = and 𝛼 = . Original Circuit SABRE HA Comparisontype name n g all g g min g g min t ∆ g % ∆ g min %medium qaoa 6 270 30 27 30 27 0.008 0 0medium ising model 10 10 480 0 0 0 0 0.02 0 0medium ising model 13 13 633 0 0 0 0 0.03 0 0medium ising model 16 16 786 3 0 9 0 0.10 -200 0medium qft 10 10 200 93 81 66 42 0.04 29 48.1medium qft 13 13 403 192 177 195 171 0.07 -1.6 3.4medium qft 16 16 512 425 372 450 375 0.24 -5.9 -0.8large adr4 197 13 3439 2973 2856 2136 2004 2.13 28.2 29.8large radd 250 13 3213 2742 2655 2040 1926 1.62 25.6 27.5large z4 268 11 3073 2628 2559 1872 1815 1.44 28.8 29.1large sym6 145 14 3888 3024 2982 2022 1965 2.18 33.1 34.1large misex1 241 15 4813 3999 3831 2892 2630 3.04 27.7 31.3large rd73 252 10 5321 4539 4428 3261 3090 3.73 28.2 30.2large cycle10 2 110 12 6050 5127 5043 3795 3576 4.87 26 29.1large square root 7 15 7630 6477 6324 4851 4707 7.00 25.1 25.6large sqn 258 10 10223 8679 8580 6012 5736 13.92 30.7 33.1large rd84 253 12 13658 11889 11673 8721 8574 24.54 26.6 26.5large co14 215 15 17936 16710 16368 13071 12426 37.81 21.8 24.1large sym9 193 10 34881 30558 30027 21900 21168 160.19 28.3 29.5large 9symml 195 11 34881 30471 30129 21949 21168 151.84 28 29.7 n : number of qubits. g all : total number of gates. g : average number of additional gates. g min : minimum number of additional gates. t : runtime in seconds. ∆g : comparison of averagenumber of additional gates between HA and SABRE. ∆g min : comparison of minimum number of additional gates between HA and SABRE. Hardware-Aware Heuristic for the Qubit Mapping Problem in the NISQ Era

Table 4: Number of additional gates on ibmq_tokyo for large circuits. HA has been used with 𝛼 = , 𝛼 = and 𝛼 = . Original Circuit SABRE DL HA Comparisontype name n g all g t g t g t ∆ g %medium ising model 10 10 480 0 0.004 0 0 0 0.005 0medium ising model 13 13 633 0 0.007 0 0 0 0.01 0medium ising model 16 16 786 0 0.01 0 0 0 0.02 0medium qft 10 10 200 54 0.103 39 0.015 36 0.015 7.7medium qft 13 13 403 93 0.036 96 0.031 78 0.043 18.8medium qft 16 16 512 186 0.084 192 0.062 174 0.09 9.4large adr4 197 13 3439 1614 0.49 1224 0.218 882 1.41 27.9large radd 250 13 3213 1275 0.48 1047 0.186 840 1.24 19.8large z4 268 11 3073 1365 0.44 855 0.202 801 1.13 6.3large sym6 145 14 3888 1272 0.56 1017 0.202 786 1.71 22.7large misex1 241 15 4813 1251 0.89 1098 0.249 942 2.57 14.2large rd73 252 10 5321 2133 0.94 2193 0.343 1635 3.19 25.4large cycle10 2 110 12 6050 2622 1.35 1968 0.348 1719 4.02 12.7large square root 7 15 7630 2598 1.5 1788 0.406 828 5.66 53.7large sqn 258 10 10223 4344 3.52 3057 0.563 2712 11.7 11.3large rd84 253 12 13658 6147 5.39 5697 0.892 3843 21.8 32.5large co14 215 15 17936 8982 9.51 5061 1.062 6429 36 -27large sym9 193 10 34881 16653 30.17 13746 2.091 11553 138.3 16 n : number of qubits. g all : total number of gates. g : minimum number of additional gates. t : runtime in seconds. ∆g : comparison of minimum number of additional gates betweenHA and DL. Table 5: Comparison of number of additional gates and fidelity on ibmq_valencia. HA has been used with 𝛼 = . , 𝛼 = . and 𝛼 = . Original Circuit SABRE HA + SABRE HA + HSA Qiskit N-A Comparisonname n g all g g min

S S max g g min

S S max t g g min

S S max g S g S ∆ g % ∆ g min % ∆ S % ∆ S max %BV5 5 15 3 3 0.576 0.639 3 3 0.612 0.639 0 3 3 0.581 0.63 12 0.456 3 0.56 0 0 6.3 0mod5mils 65 5 35 21 21 0.495 0.515 12 12 0.525 0.559 0.003 12 12 0.53 0.559 27 0.275 27 0.443 42.9 42.9 6.1 8.5alu-v0 27 5 36 24 24 0.322 0.329 18 18 0.437 0.437 0.002 18 18 0.384 0.431 24 0.335 24 0.319 25 25 35.7 32.83 17 13 3 36 18 18 0.43 0.476 12 12 0.503 0.546 0.004 12 12 0.463 0.542 36 0.458 21 0.354 33.3 33.3 17 14.7alu-v1 28 5 37 24 24 0.225 0.233 18 18 0.342 0.384 0.004 18 18 0.269 0.384 39 0.178 27 0.192 25 25 52 64.8decod24-v2 43 4 52 36 36 0.262 0.396 18 18 0.307 0.37 0.004 18 18 0.303 0.372 36 0.07 36 0.213 50 50 17.2 -6.6mod5d2 64 5 53 45 45 0.14 0.208 24 24 0.199 0.207 0.005 24 24 0.194 0.207 42 0.171 48 0.125 46.7 46.7 42.1 -0.44gt13 92 5 66 45 45 0.171 0.191 24 24 0.194 0.206 0.006 24 24 0.199 0.22 69 0.154 48 0.18 46.7 46.7 13.5 7.9ising 5 90 24 24 0.133 0.145 24 24 0.134 0.141 0.007 24 24 0.137 0.143 60 0.113 33 0.1 0 0 0.8 -2.8 n : number of qubits. g all : total number of gates. g : average number of additional gates. g min : minimum number of additional gates. S : mean of success rate. S max : maximumof success rate. ∆g : comparison of average number of additional gates between HA+SABRE and SABRE. ∆g min : comparison of minimum number of additional gates betweenHA+SABRE and SABRE. ∆S : comparison of mean of success rate between HA+SABRE and SABRE. ∆S max : comparison of maximum of success rate between HA+SABRE and SABRE. t : runtime of HA+SABRE in seconds. [7] Carlos Bravo-Prieto, Ryan LaRose, M. Cerezo, Yigit Subasi, Lukasz Cincio, andPatrick J. Coles. Variational quantum linear solver: A hybrid algorithm for linearsystems. 09 2019. arXiv: https://arxiv.org/abs/1909.05820v2.[8] McKay David C, Thomas Alexander, Luciano Bello, Michael J Biercuk, Lev Bishop,Jiayin Chen, Jerry M Chow, Antonio D Córcoles, Daniel Egger, Stefan Filipp, et al.Qiskit backend specifications for OpenQASM and openpulse experiments. 
arXivpreprint arXiv:1809.03452 , 2018. arXiv: https://arxiv.org/abs/1809.03452.[9] Yudong Cao, Jonathan Romero, Jonathan P Olson, Matthias Degroote, Peter DJohnson, Mária Kieferová, Ian D Kivlichan, Tim Menke, Borja Peropadre, Nico-las PD Sawaya, et al. Quantum chemistry in the age of quantum computing. Chemical reviews , 119(19):10856–10915, 2019. doi: https://doi.org/10.1021/acs.chemrev.8b00803.[10] Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, WillSimmons, and Seyon Sivarajah. On the Qubit Routing Problem. In Wimvan Dam and Laura Mancinska, editors, , volume135 of

Leibniz International Proceedings in Informatics (LIPIcs) , pages 5:1–5:32,Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.doi: https://doi.org/10.4230/LIPIcs.TQC.2019.5.[11] Andrew W Cross, Lev S Bishop, John A Smolin, and Jay M Gambetta. Openquantum assembly language. arXiv preprint arXiv:1707.03429 , 2017. arXiv: https://arxiv.org/abs/1707.03429.[12] Poulami Das, Swamit S Tannu, Prashant J Nair, and Moinuddin Qureshi. A casefor multi-programming quantum computers. In

Proceedings of the 52nd AnnualIEEE/ACM International Symposium on Microarchitecture , pages 291–303, 2019.doi: https://doi.org/10.1145/3352460.3358287.[13] Alexandre A. A. de Almeida, Gerhard W. Dueck, and Alexandre C. R. da Silva.Finding optimal qubit permutations for ibm’s quantum computer architectures.In

Proceedings of the 32nd Symposium on Integrated Circuits and Systems Design ,SBCCI ’19, New York, NY, USA, 2019. Association for Computing Machinery. doi: iyuan NIU, Adrien Suau, Gabriel Staffelbach, and Aida Todri-Sanial

Table 6: Comparison of number of additional gates and fidelity on ibmq_almaden. HA has been used with 𝛼 = . , 𝛼 = . and 𝛼 = . Original Circuit SABRE HA + SABRE HA + HSA Qiskit N-A Comparisonname n g all g g min

S S max g g min

S S max t g g min

S S max g S g S ∆ g % ∆ g min % ∆ S % ∆ S max %BV5 5 15 3 3 0.436 0.624 3 3 0.497 0.651 0.002 7 6 0.318 0.508 24 0.04 6 0.37 0 0 14 4.3mod5mils 65 5 35 21 21 0.315 0.47 12 12 0.383 0.481 0.003 19 15 0.268 0.439 54 0.107 33 0.214 42.9 42.9 21.6 2.3alu-v0 27 5 36 21 21 0.276 0.413 15 15 0.3 0.483 0.002 26 19 0.265 0.408 36 0.127 36 0.139 28.6 28.6 8.7 16.93 17 13 3 36 18 18 0.333 0.469 12 12 0.395 0.519 0.002 12 12 0.35 0.502 33 0.216 27 0.207 33.3 33.3 18.6 10.7alu-v1 28 5 37 24 24 0.25 0.359 15 15 0.391 0.478 0.002 21 21 0.27 0.408 48 0.054 30 0.087 37.5 37.5 56.4 33.1decod24-v2 43 4 52 36 36 0.199 0.334 18 18 0.284 0.401 0.006 20 18 0.235 0.387 54 0.076 39 0.145 50 50 42.7 20.1mod5d2 64 5 53 45 45 0.132 0.198 24 24 0.16 0.266 0.003 33 33 0.15 0.263 54 0.073 48 0.056 46.7 46.7 21.2 34.34gt13 92 5 66 45 45 0.13 0.249 24 24 0.145 0.312 0.007 32 27 0.165 0.347 99 0.061 66 0.106 46.7 46.7 11.5 25.3ising 5 90 24 24 0.115 0.177 24 24 0.133 0.191 0.01 36 30 0.121 0.235 51 0.07 33 0.054 0 0 15.7 7.9 n : number of qubits. g all : total number of gates. g : average number of additional gates. g min : minimum number of additional gates. S : mean of success rate. S max : maximumof success rate. ∆g : comparison of average number of additional gates between HA+SABRE and SABRE. ∆g min : comparison of minimum number of additional gates betweenHA+SABRE and SABRE. ∆S : comparison of mean of success rate between HA+SABRE and SABRE. ∆S max : comparison of maximum of success rate between HA+SABRE and SABRE. t : runtime of HA+SABRE in seconds. https://doi.org/10.1145/3338852.3339829.[14] Frank Arute et. al. Quantum supremacy using a programmable superconductingprocessor. Nature , 574:505–510, 10 2019. doi: https://doi.org/10.1038/s41586-019-1666-5.[15] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximateoptimization algorithm. 11 2014. arXiv: https://arxiv.org/abs/1411.4028v1.[16] Will Finigan, Michael Cubeddu, Thomas Lively, Johannes Flick, and PrinehaNarang. 