Benchmarking quantum co-processors in an application-centric, hardware-agnostic and scalable way
SIMON MARTIEL, THOMAS AYRAL, CYRIL ALLOUCHE
Atos Quantum Laboratory, Les Clayes-sous-Bois, France
ABSTRACT
Existing protocols for benchmarking current quantum co-processors fail to meet the usual standards for assessing the performance of High-Performance-Computing platforms. After a synthetic review of these protocols—whether at the gate, circuit or application level—we introduce a new benchmark, dubbed Atos Q-score™, that is application-centric, hardware-agnostic and scalable to quantum advantage processor sizes and beyond. The Q-score measures the maximum number of qubits that can be used effectively to solve the MaxCut combinatorial optimization problem with the Quantum Approximate Optimization Algorithm. We give a robust definition of the notion of effective performance by introducing an improved approximation ratio based on the scaling of random and optimal algorithms. We illustrate the behavior of Q-score using perfect and noisy simulations of quantum processors. Finally, we provide an open-source implementation of Q-score that makes it easy to compute the Q-score of any quantum hardware.
INDEX TERMS
Quantum benchmarking, Combinatorial optimization, Quantum algorithms
INTRODUCTION
Recent years have witnessed great progress in the field of quantum technologies, whether on the hardware side—with growing computer sizes and quantum operation fidelities—or on the software side—with many algorithmic improvements. This progress has, among other achievements, enabled recent claims that quantum advantage—the capacity for a quantum processor to outperform a classical machine—was attained by some of the most advanced Noisy Intermediate-Scale Quantum (NISQ, [Pre18]) processors [AAB+19]. In this context, we introduce a new metric, the Q-score, defined as the maximum number of quantum bits that a quantum computer can use to solve a combinatorial optimization problem—the MaxCut problem—significantly better than a classical random algorithm. In other words, it is an estimate of the largest combinatorial optimization problem that can be solved better on a quantum processor than on a classical computer. Q-score can be run and computed on any gate-based quantum hardware. It takes into account the performance of the compilation. An implementation is available under an open-source license.
To define the Q-score, we carefully investigate the size-dependence of the average performance of random and optimal classical algorithms, as well as the QAOA quantum algorithm, for solving the MaxCut problem on classes of random graphs. The metric that we propose, akin to an improved approximation ratio, allows us to measure non-trivial performance above the level of random classical algorithms. Finally, we illustrate the behavior of the Q-score using noisy simulations with a depolarizing noise intensity compatible with today's NISQ processors.
This paper is organized as follows: we start by spelling out the desirable properties of quantum metrics and by reviewing the main existing quantum metrics (Section I). We then describe the Q-score protocol (Section II) and discuss its properties (Section III). We finally explain how to run this benchmark using an open-source script we provide online (Section IV).
I. CHARACTERIZING QUANTUM PROCESSORS: GOALS AND PRIOR WORK
The careful design of Quantum Characterization, Verification and Validation (QCVV) protocols is crucial for assessing the potential of current and future quantum processing units (QPUs). Several such protocols have been proposed in recent years, with various levels of proximity to applications, scalability, fairness and practicality. In this section, we start by laying out the QCVV criteria we deem to be most important from a High-Performance Computing (HPC) perspective. We then briefly review the main existing proposals and to what extent they fulfill these criteria.
A. A HIGH-PERFORMANCE-COMPUTING-DRIVEN LIST OF CRITERIA
The first useful applications of quantum processors will likely be demonstrated in setups where quantum co-processors will be used as accelerators for performing very specific hard computational tasks within a High-Performance-Computing (HPC) system. The usefulness of the co-processor will be measured by comparing the performance of such a (possibly hybrid) computation with the performance of its purely classical counterpart. With this in mind, we argue that useful QCVV protocols should fulfill the following three criteria:
(1) Application-centric: The protocol should yield a single number (or a few) that unequivocally reflects the potential of a given QPU for solving a real-life HPC application. Ideally, the score of the QPU for this given application should be a proxy for how well the processor performs in general, i.e. for other applications. This focus on applications and its "holistic" goal excludes protocols that narrow the characterization down to low-level components only, such as, e.g., gate quality or the ability to sample specific classes of circuits (random or square circuits).
(2) Hardware-agnostic: The protocol should put all the existing or future hardware technologies on an equal footing. In particular, it should not favor a given technology over the others.
(3) Scalable: The protocol should be scalable to large numbers of qubits. In particular, the classical computational complexity for processing the quantum output and outputting the metric should be reasonably moderate. This constraint excludes protocols that involve classical computations that are exponentially costly in the number of qubits.
B. PRIOR PROPOSALS
Most previously proposed QCVV protocols focus on gate-level and circuit-level characterization. We briefly review these protocols, which give valuable, albeit partial, insights into the performance of a given QPU. We then turn to the previous attempts at characterizing QPUs from an application perspective.
1) Gate-level protocols
In the past years, several protocols have been proposed to characterize the performance of the main low-level components of QPUs: quantum gates and sequences of gates, namely quantum circuits. The corresponding metrics give valuable information to compare different implementations of similar quantum technologies, such as two different experimental realizations of superconducting transmon processors. They also give indications about the ability of QPUs to run certain classes of quantum circuits.
The most widely used protocol for characterizing the gate-level quality of a QPU is Randomized Benchmarking (RB) [MGE12]. It yields the average fidelity f or average error rate ε = 1 − f of a given gate set [PRY+17] while requiring only polynomial classical resources (provided potentially exponential compilation overheads are avoided, such as in Direct Randomized Benchmarking [PCDR+19]). However, the circuit-level performance of a QPU cannot be directly inferred from the average metrics of its gate set (see [PRY+20] for protocols that use structured circuits). One major reason for this deficiency is that RB gives little information about crosstalk errors, which influence the performance of a QPU at the circuit level (although we note that recent works propose ways of extending RB to crosstalk estimation [MCWG20]).
Another widely used protocol that goes beyond the measurement of the mere average fidelity of a gate set is Gate Set Tomography (GST) [BKGN+17]. More recently, Cycle Benchmarking (CB) [EWP+19] has been proposed to go beyond the limitation of RB and GST to the characterization of QPUs with operations acting on only a few qubits. Indeed, beyond the crosstalk characterization issue we raised in the previous paragraph, some technologies such as trapped-ion QPUs provide operations like the Mølmer-Sørensen gate that act on multiple (even all) qubits in a register. CB can detect crosstalk and is robust to SPAM errors [EWP+19].
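For intuition, the core of the RB analysis can be sketched on synthetic data: survival probabilities of random gate sequences of length m are fitted to the decay model A·p^m + B, and the decay parameter p is converted into an average error rate (for a single qubit, ε = (1 − p)/2). This is a generic illustration on made-up data, not tied to any particular QPU or to the cited implementations:

```python
import numpy as np
from scipy.optimize import curve_fit

def rb_decay(m, a, p, b):
    # Standard RB decay model: survival probability P(m) = A * p**m + B
    return a * p**m + b

# Synthetic survival probabilities for a hypothetical qubit with p = 0.99
lengths = np.arange(1, 200, 10)
rng = np.random.default_rng(0)
survival = 0.5 * 0.99**lengths + 0.5 + rng.normal(0, 0.002, lengths.size)

(a, p, b), _ = curve_fit(rb_decay, lengths, survival, p0=[0.5, 0.98, 0.5])
avg_error_rate = (1 - p) / 2   # epsilon = 1 - f for a single qubit (d = 2)
print(f"fitted p = {p:.4f}, average error rate = {avg_error_rate:.5f}")
```

The fitted error rate characterizes the gate set on average, which is precisely why, as discussed above, it cannot by itself predict circuit-level effects such as crosstalk.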
2) Circuit-level protocols
A number of protocols have been proposed to measure the ability of QPUs to run certain classes of circuits. One such protocol is the Quantum Volume (QV) [CBS+19], which quantifies the largest random square circuit (with as many layers as qubits) that a QPU can execute with acceptable fidelity. Checking the success criterion requires classically simulating the sampled circuits, an exponentially costly step. Related proposals compare measured and ideal probability distributions through a statistical norm; similar limitations in terms of the classical complexity to compute the metric, and the difficulty of using it as a proxy for an actual application, also apply to this line of work.
Ref. [PRY+20] recently proposed a protocol based on the "mirroring" concept (also used in RB) that allows one to get rid of the exponential classical effort that plagues the previous circuit-level protocols. Yet, the ability to use this other circuit-level metric to reliably predict the behavior of a given QPU for a real application remains to be investigated.
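The mirroring idea can be illustrated with a toy, noiseless single-qubit simulation: a circuit is followed by its layer-wise inverse, so an ideal device returns to the initial state, and success can be checked by a simple measurement instead of an exponential classical simulation. This is a sketch for intuition only; the protocol of [PRY+20] is considerably more elaborate:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_unitary_2x2():
    # Random single-qubit unitary via QR decomposition of a Gaussian matrix
    m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(m)
    return q * (np.diag(r) / np.abs(np.diag(r)))

gates = [random_unitary_2x2() for _ in range(20)]
mirror = [g.conj().T for g in reversed(gates)]   # layer-wise inverses, reversed

state = np.array([1.0, 0.0], dtype=complex)      # start in |0>
for g in gates + mirror:                         # circuit, then its mirror
    state = g @ state

survival = abs(state[0])**2                      # probability of returning to |0>
print(f"survival probability = {survival:.6f}")
```

On a noiseless simulator the survival probability is 1 by construction; on real hardware, its decay directly measures circuit-level errors without any classical simulation of the circuit.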
3) Application-level protocols
We now turn to application-level protocols. One of the most promising applications of quantum processors is the field of quantum many-body physics, since quantum processors are by construction quantum many-body systems with a large number of quantum bits interacting with one another in a controlled fashion.
Ref. [DDSG+20] recently proposed a metric dubbed Fermionic Depth (FD) to quantify the ability of a QPU to tackle a quantum many-body problem. The prototypical many-body problem chosen in this work is the one-dimensional Fermi-Hubbard model, whose ground-state energy in the infinite-size limit, E_∞^exact, can be computed exactly in polynomial time on a classical computer via the so-called Bethe ansatz method [LW68]. The protocol consists in computing, with a QPU, the approximate ground-state energy of this model, E_L, for different (linear) sizes L, and then returning the deviation from the exact energy at infinite size, ∆E_L = E_L − E_∞^exact. In practice, due to the limited coherence of current (NISQ) processors, E_L is computed via a hybrid quantum-classical method, the Variational Quantum Eigensolver (VQE, [PMS+14]). Since finite-size deviations shrink with L while noise-induced deviations grow with circuit size, the ∆E_L curve is going to display a minimum at a given size L⋆, dubbed the fermionic length of the QPU under investigation. This fermionic length thus gives an indication about the maximum size of a fermionic problem that a given QPU can handle.
The predictive power of this metric for problems outside the 1D Fermi-Hubbard model remains to be investigated: whether the fermionic length estimated for a one-dimensional problem is related to the fermionic length that can be achieved for two-dimensional quantum many-body problems is an open question. Indeed, those two-dimensional problems, which are among the hardest to tackle with the most advanced classical algorithms, display phenomena (high-temperature superconductivity, pseudogap phase, ...) that are radically different from one-dimensional problems. Quantum chemistry problems, on the other hand, usually feature interactions between many orbitals, whereas the Hubbard model has only local interactions, raising the question of the relevance of the fermionic length for chemistry problems. We note that Ref. [MPJ+19] proposed a chemistry-based benchmark of quantum processors, albeit with a focus on small molecules only and therefore no clear path towards scalability yet.
Finally, Ref. [DL20] proposed an extension of the LINPACK benchmark (that is used to rank classical supercomputers) to a quantum setting. The protocol consists in solving a linear system of equations Ax = b, with A a random dense matrix, by outputting an approximate solution g(A)|b⟩, with g(x) a polynomial approximation of x⁻¹. While this protocol avoids the usual read-in problem (it does not require the use of a QRAM to load A from classical data) through a block-encoding method (random circuits U_A are used such that one of the blocks of this unitary is A, with A a random dense matrix), its measure of success consists in comparing the output vector g(A)|b⟩ to the actual solution (in addition to a measure of the wall-clock time). This entails an exponential classical cost (through e.g. a cross-entropy test), which limits the scalability of the method.
II. THE PROTOCOL
In this section we describe our benchmark metric proposal. Similarly to other benchmark proposals, Q-score works by iteratively testing a quantum co-processor using a scalable test T_n indexed by a problem size n. Naturally, the score will be the largest problem size n⋆ such that T_{n⋆} holds. Informally, the test consists in:
(a) Picking a collection of random graphs of size n
(b) Running a QAOA-MaxCut algorithm on these graphs and computing C(n), the average of the expected cut cost for each instance
(c) Computing a score β(n) that depends on C(n) and testing T_n: β(n) > β⋆ for some constant β⋆.
The next subsection is dedicated to the description of this test T_n. The detailed explanation of the various choices described in this section can be found in Section III.
A. DESCRIPTION OF THE TEST
Our test T_n consists in running a Quantum Approximate Optimization Algorithm (QAOA) for a MaxCut instance of size n. We now describe the settings in which the algorithm is run, and how its performance is assessed for a given instance size.
a: The circuit implementation.
We assume that we tackle instances using the standard QAOA Ansatz as described in [FGG14]. Given a graph G = (V, E) (with V and E the vertex and edge set, respectively) and a depth parameter p, we implement the parameterized circuit:
U(γ, β) = ∏_{1≤i≤p} e^{−iβ_i H} e^{−iγ_i H_G}, (1)
where H = −∑_{1≤i≤n} σ_x^(i) and
H_G = (1/2) ∑_{(i,j)∈E} σ_z^(i) σ_z^(j) − |E|/2. (2)
Here, σ_x and σ_z denote the Pauli X and Z operators, and |E| is the number of edges in the graph. In practice, each rotation e^{−iγ_i σ_z^(i) σ_z^(j)/2} is decomposed using a sub-circuit of two CNOT gates and a single R_Z rotation. The propagator e^{−iβ_i H} is implemented using a wall of R_X gates.
b: The classical optimizer.
The classical optimization routine used to minimize the Ansatz energy is COBYLA [Pow94]. This optimizer behaves well in perfect settings and for shallow circuits (i.e. circuits with a low number of parameters). Since we expect that increasing the depth of the Ansatz will probably only degrade performances, this choice seems reasonable.
c: Computing the score.
For a given size n, we run a QAOA-MaxCut on random graphs in G(n, p = 1/2), the distribution of Erdös-Renyi graphs obtained by taking an empty graph and connecting each pair of vertices with probability 1/2. These graphs are relatively dense and constitute a standard class used for benchmarks. Given C(n), the average of the energies (multiplied by −1) produced by QAOA over these graphs, we compute the following ratio:
β(n) = (C(n) − n²/8) / (λn^{3/2}). (3)
We say that the quantum processor passes the test for this size n if β(n) > β⋆. Here, the threshold β⋆ ∈ ]0, 1] dictates how demanding the test is: a test with β⋆ = 0 can be passed by a simple coin toss, while a test with β⋆ = 1 can only be passed by an exact solver. Hence β⋆ can be seen as a fraction of performance between a naive randomized algorithm and an exact solver. In practice, the threshold β⋆ is arbitrarily set to 20%. We take λ = 0.178 (see discussion below, Section III). We also fix the number of shots (repetitions) to be used to get the estimate of the QAOA energy for a given graph to 2048.

FIGURE 1: Evolution of β(n) for different simulated QPUs: perfect QPU with p = 1 (blue), p = 2 (cyan), and noisy QPU with a depolarizing noise model (see text), with p = 1, all-to-all connectivity (solid red lines), and grid connectivity (dashed red lines). The dash-dotted black line shows the 20% threshold above which the Q-score test is passed. The error bar is the standard error of the mean score over 100 graphs.

The final Q-score is the largest n such that this test succeeds, i.e.
n⋆ ≡ max{n ∈ N, β(n) > β⋆}. (4)
d: Remarks.
The choice of β⋆ is somewhat arbitrary. β⋆ was set so that a QAOA of depth p = 1 running on a perfect quantum processor will pass the test and will have an infinite Q-score. (As will be seen later [Fig. 2], for p = 1, β_Q(n) ≈ 40% for a perfect QPU.) In practice, the Q-score implementation we provide is parameterized by this β⋆. Moreover, it is usually not necessary to iteratively try each instance size until the test fails, since β(n) is expected to be a monotonically decreasing function of n. This implies that one can employ a dichotomic search in order to find n⋆, the largest n such that β(n⋆) > β⋆. Our implementation supports both iterative evaluation and dichotomic search.
B. ILLUSTRATION: PERFECT AND NOISY SIMULATIONS
To illustrate the meaning of the Q-score, we simulated the behavior of QAOA-MaxCut on various Quantum Processing Units (QPUs) using the Atos Quantum Learning Machine (QLM).
We started by running QAOA-MaxCut on a perfect (noiseless) QPU for two values of the number p of QAOA layers. As expected, we see, in Figure 1, that the score increases with an increasing p due to an increased expressivity of the QAOA ansatz. We also observe that the ratio β(n) achieved by this perfect QPU is roughly constant as n increases, with β(n) ≈ 40% for p = 1, and β(n) ≈ 60% for p = 2. This means that QAOA executed on a perfect quantum processor achieves scalings within 40% (resp. 60%) of the optimal scaling λn^{3/2} (after subtraction of the leading n²/8 term).
We compare this behavior to the scores obtained with simulations of noisy QPUs. We choose a simple depolarizing noise model with a level of noise that is consistent with today's NISQ processors. More specifically, we add depolarizing noise after each gate, with an average error rate of ε = 2% for two-qubit gates (comparable to the two-qubit error rates reported for IBM Johannesburg [IBM], Google Sycamore [AAB+19, Fig. 2, Table II], Rigetti Aspen 7 [Rig] and ionQ [WBD+19]) and ε = 0.4% for one-qubit gates (this factor of 5 between the one- and two-qubit error rates is observed in typical superconducting and trapped-ion architectures, with reported one-qubit error rates of 0.041%, 0.16%, 0.77% and 0.5% for the four aforementioned platforms). For the sake of simplicity, we assume perfect initialization and readout, and neglect noise during idling periods.
We observe that the ratio β(n) achieved with a noisy QPU is, as expected, lower than with a perfect QPU. More importantly, it decreases with the problem size n (i.e. the number of qubits): larger problems require longer circuits and hence lead to an increased sensitivity to noise. Moreover, a limited connectivity (e.g. a grid connectivity) leads to a decreased ratio, since these connectivity constraints require the original QAOA circuit to be optimized to comply with the constraints. This optimization, carried out following a method described in [HNYN09], [MdB20] using one of Atos QLM's compilation plugins, leads to longer circuits and hence degraded performance in the presence of noise.
From these simulations, we can infer that the Q-score for a noisy QPU with a grid connectivity is n⋆ = 11. For the noisy QPU with an all-to-all connectivity, we can infer that n⋆ = 21. For perfect QPUs, QAOA achieves an infinite Q-score. Let us stress that this example also shows that beyond assessing the quality of the hardware for solving QAOA-MaxCut, the Q-score also assesses the performance of the software stack: for instance, a better compiler to optimize for connectivity constraints will lead to an increased β(n) and hence to an increased Q-score.
III. DISCUSSION
In this section, we discuss the various choices made in this proposal. First of all, let us recall briefly what we need to achieve.
A. THE ALGORITHM CHOICE
We are not looking at finding a discerning metric for quantum supremacy. Our goal is simply to consider an application that is both representative of practical needs from the industry and challenging for current hardware platforms.
a: The choice of QAOA-MaxCut
Most, if not all, proposed algorithms compatible with the NISQ era are variational algorithms. It thus seems natural, in an application-centric benchmark, to focus on this type of algorithms. Among all these propositions, we need one that fits a particular set of requirements. First, the algorithm should be scalable, in the sense that one should be able to rather smoothly increase the problem size in order to isolate the precise threshold where the quantum co-processor fails. Combinatorial optimization problems usually fit this criterion quite easily. Moreover, we also need the test to be efficiently computable. By averaging over a simple class of random instances, we can deduce asymptotic values for usually intractable quantities (see next subsection). This might be hard to do efficiently for other classes of problems. Hence, the Quantum Approximate Optimization Algorithm seems to be a good candidate that fits these needs. We chose the MaxCut problem for the simple reason that it is both simple to implement and simple to analyze. For instance, it is possible to know the average number of entangling gates required in the Ansatz, even after compilation and optimization. This would not be the case were we to consider problems that involved clauses over more than two variables (mainly due to the variability of the literature in architecture-aware phase polynomial synthesis algorithms [NGM20], [MdB20], [vdGD20] or other less competitive SWAP-based routing techniques).
b: The choice of the class G(n, 1/2)
This class of graphs is quite standard in the random graph literature and has a predictable behavior with regard to the MaxCut problem. Moreover, these graphs constitute a class of dense graphs, with half of their possible edges present (on average). QAOA-MaxCut benchmarks are often run using k-regular graphs for the simple reason that these graphs are very sparse.
In fact, their edge density decreases with their size. We argue that most real-world applications will not have this property. Hence the choice of G(n, 1/2). One could relax the test a bit by picking a class G(n, f(n)) with f(n) = o(1), that is, a class of graphs where edges are picked uniformly with a probability that decreases with n, but such that the average number of edges still grows faster than n.
B. TEST DEFINITION AND APPROXIMATION RATIO
In this subsection, we detail the reasoning behind the definition of the score (Eq. (3)) and the corresponding success criterion.
a: The usual approximation ratio and its lower bound.
We recall that the algorithm is run on Erdös-Renyi graphs G of fixed size n and with edge probability 1/2, denoted G(n, 1/2). A standard way to evaluate the performance of an approximation heuristic such as QAOA is to consider the approximation ratio α(G) = C(G)/C_max(G), where C(G) is the score of the worst solution that can be produced by the heuristic and C_max(G) is the cost of the optimal solution for the given graph G. Since we are dealing with a randomized algorithm, this quantity translates into α_Q(G) = E_Q[C(G)]/C_max(G), where E_Q[C(G)] is the expected score of a solution produced by QAOA. (With an infinite number of shots, E_Q[C(G)] = −⟨Ψ(γ, β)|H_G|Ψ(γ, β)⟩, with |Ψ(γ, β)⟩ = U(γ, β)|+⟩^⊗n, see Eqs. (1) and (2).)
Since we are interested in a typical behavior over a class of random graphs, we want to average this quantity, giving us an expected approximation ratio over instances of a given size, α_Q(n) = E_{G∼G(n,1/2)}[α_Q(G)]. The behavior of this quantity is hard to derive, but it is easy to derive the behavior of the closely related quantity
ᾱ_Q(n) ≡ E_{G∼G(n,1/2)}[E_Q[C(G)]] / E_{G∼G(n,1/2)}[C_max(G)] ≡ C_Q(n)/C_max(n).
ᾱ_Q(n) can be seen as a first-order approximation to α_Q(n). Since QAOA produces score distributions that are at least as good as straightforward random sampling, we get
ᾱ_Q(n) ≥ C_R(n)/C_max(n). (5)
We now turn to the behavior of C_R(n) and C_max(n). Erdös-Renyi graphs of G(n, 1/2) have, on average, about n²/4 edges. On average over the complete family, their cuts have an expected cost
C_R(n) ≡ E_{G∼G(n,1/2)}[E_R[C(G)]] = E[|E|]/2 = n²/8.
Recent results [GL18], [DMS17] show that their typical maximum cut size grows as
C_max(n) ≡ E_{G∼G(n,1/2)}(C_max(G)) = n²/8 + λn^{3/2} + o(n^{3/2}), (6)
with λ a constant bounded by 1/(2√π) ≈ 0.28 (see the bound on P⋆ in Eq. (10) below).
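At very small sizes, the scalings of C_R(n) and C_max(n) can be checked directly by exhaustive enumeration. This is an illustrative sketch only; the fits reported in the paper rely instead on the AKMaxSAT exact solver over larger instances:

```python
import itertools
import random

def max_cut_brute_force(n, edges):
    """Exact MaxCut by enumerating all 2^(n-1) bipartitions (small n only)."""
    best = 0
    for bits in itertools.product([0, 1], repeat=n - 1):
        assignment = (0,) + bits          # fix vertex 0 to halve the search space
        cut = sum(1 for i, j in edges if assignment[i] != assignment[j])
        best = max(best, cut)
    return best

# Average maximum cut over a few G(n, 1/2) samples, compared with the
# random-cut baseline n^2/8 that any heuristic trivially matches.
rng = random.Random(1)
n = 12
samples = []
for _ in range(20):
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if rng.random() < 0.5]
    samples.append(max_cut_brute_force(n, edges))
avg_max_cut = sum(samples) / len(samples)
print(f"n = {n}: average max cut = {avg_max_cut:.1f}, baseline n^2/8 = {n**2 / 8}")
```

The gap between the measured average and the n²/8 baseline is precisely the λn^{3/2} correction that the improved ratio below normalizes against.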
In practice, a numerical fit at small sizes yields a value of λ ≈ 0.178 (see Figure 2). Plugging these results into Eq. (5), we obtain
ᾱ_Q(n) ≥ (n²/8) / (n²/8 + λn^{3/2}), (7)
which approaches 1 when n diverges.
b: An improved approximation ratio
This lower bound suggests that ᾱ_Q(n) is not the appropriate quantity to consider to assess the quality of a heuristic for MaxCut on this class of graphs. Because the expected approximation ratio of random sampling grows with n, requiring a quantum processor to achieve a fixed approximation ratio is not an interesting test (for this class of graphs): this ratio will get easier and easier to reach as n grows.

FIGURE 2: Top: Scaling of the expected maximum cut size for Erdös-Renyi graphs of increasing size. Each data point (in blue) is computed by solving MaxCut instances exactly (using the AKMaxSAT solver [AMP05]). In orange is a fit of shape y = n²/8 + λn^{3/2} with λ ≈ 0.178, obtained by a standard least-squares method. Bottom: Fit of QAOA scores C(n) − n²/8 to νn^{3/2} for p = 1 (blue, ν = 0.070 ± 1.15e−03) and p = 2 (cyan, ν = 0.107 ± 1.05e−03). The obtained values for ν correspond to β = ν/λ = 40% and 60%, respectively.

For instance, the previous inequality tells us that over large random graphs, random sampling will produce cuts with an average score that is a large fraction of the average score of the maximal cuts. This means that the most ineffective quantum processor, as long as it has enough qubits, will achieve at least the same ratio of expected cost. This phenomenon was for instance observed in [DHJ+20] for the G(n, 1/2) class. In fact, this result holds for any class of random graphs such that edges are picked uniformly at random [GL18], [DMS17] and such that the number of edges grows faster than O(n). If the number of edges is O(n), then the standard average approximation ratio definition will be upper bounded by a constant that can be analytically derived. This is for instance the case for k-regular graphs (see Section III-D).
In order to avoid this issue, we consider instead the same quantities after subtracting the leading n²/8 term:
β(n) ≡ (C(n) − n²/8) / (C_max(n) − n²/8) = (C(n) − n²/8) / (λn^{3/2}). (8)
We use this definition to specify the conditions to pass the Q-score: we require that the quantum algorithm achieve a ratio that exceeds a constant value β⋆ ∈ ]0, 1]:
β_Q(n) ≥ β⋆. (9)
Based on numerical simulations with NISQ-compatible noise levels (see subsection II-B above), we fix β⋆ to β⋆ = 20%. This requirement implies that the quantum heuristic must fulfill satisfactory scalability properties: indeed, achieving a ratio β_Q(n) ≥ β⋆ implies that the quantity C_Q(n) − n²/8 grows at least as νn^{3/2}, with ν = β⋆λ and λ the scaling of the optimal solution (see Eq. (6)).
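The extraction of the scaling coefficient ν can be reproduced schematically: given average scores C(n), ν is the least-squares slope of C(n) − n²/8 against n^{3/2} (cf. Fig. 2, bottom). The data below are synthetic, generated with ν = 0.07 plus small noise, and do not come from an actual QPU:

```python
import numpy as np

LAMBDA = 0.178                  # optimal-solution scaling (Eq. (6))

# Synthetic average scores C(n) for n = 5..20, mimicking a heuristic with nu = 0.07
ns = np.arange(5, 21)
rng = np.random.default_rng(3)
C = ns**2 / 8 + 0.07 * ns**1.5 + rng.normal(0, 0.05, ns.size)

# Least-squares slope (through the origin) of C(n) - n^2/8 versus n^(3/2)
x = ns**1.5
y = C - ns**2 / 8
nu = float(x @ y / (x @ x))
print(f"nu = {nu:.3f}, beta = nu / lambda = {nu / LAMBDA:.0%}")
```

A fitted ν near β⋆λ = 0.2 × 0.178 ≈ 0.036 would sit exactly at the pass/fail boundary; larger values indicate a margin above the threshold.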
In other words, we require the scaling rate of the quantum heuristic to be at least within a fraction β⋆ = 20% of the scaling of the optimal solution. For instance, random sampling, which always produces a vanishing ratio β_R(n) = 0, cannot pass the Q-score test for any β⋆ > 0. Conversely, requiring β⋆ = 100% would mean requiring to achieve the optimal solution. Figure 3 gives a qualitative graphical summary of the different quantities discussed here.
c: Remark.
In [AAB+20], QAOA was run with the Hamiltonian H′_G = ∑_{(i,j)∈E} σ_z^(i) σ_z^(j), i.e. without the constant energy offset proportional to |E| (compare to H_G in Eq. (2)). The spectra of these Hamiltonians do not coincide with the usual cut size functions, but exhibit the same feature as the cost metric described in Eq. (8).
d: A continuous score.
Even though the proposed protocol outputs a single number, it is possible to extract far more information from a run of Q-score. For instance, a good benchmark metric would be to track the largest ν constant accessible for each problem size n. This scaling would allow a manufacturer to track the performances of its processors when scaling up the number of qubits/problem size. Moreover, this ν factor provides a comparison tool with various behaviors, whether it is random sampling (ν = 0), perfect solving (ν = λ ≈ 0.178), or perfect QAOA (ν ≈ 0.070 for p = 1, ≈ 0.107 for p = 2, see Fig. 2).
C. A NOTE ON THE EXPERIMENTAL PARAMETERS
When defining the protocol, we set the value of the number of shots as well as the optimization procedure. While these choices are somewhat arbitrary, they arguably do not significantly impact the final value of the Q-score. The number of shots (2048) is representative of the typical numbers of shots used on experimental processors. It gives reasonable statistical errors on the estimate of the cost function.
As for the choice of the classical optimization procedure: we argue that the value of Q-score does not depend on the classical optimizer, provided it is "good enough", i.e. it suitably optimizes the QAOA parameters. In our experience, COBYLA is one such optimizer.
Finally, we note that the time-to-solution could easily be taken into account by Q-score by setting a maximum time budget to compute β_Q(n) for a given graph size n. For the time being, we did not specify such a time limit, but Q-score should be reported together with the absolute time required to compute β(n⋆).

FIGURE 3: (a) Typical scaling of the expected costs C(n) for three cases: expected maximum cut size C_max(n) (orange), expected random cut size C_R(n) (blue), and cost corresponding to our threshold β⋆ = 20% (green). (b) Scaling of the average expectation ratios α(n) (namely, cost normalized by the expected maximum cut size C_max(n)). (c) Scaling of the cost with the leading n²/8 term subtracted. (d) Scaling of the improved expectation ratios β(n) (namely, with the leading term subtracted and normalized by the maximum cut scaling). (e) Evolution of β(n) for a typical Q-score run: in red, the scaling of a QAOA running on a perfect quantum processor. In purple, the scaling of a QAOA running on an imperfect processor. In this last setting, the dashed red line gives the returned Q-score.
D. CHANGING THE GRAPH CLASS
In this protocol, and in the discussion of Section III-B, we focused on a particular class of random graphs, namely Erdös-Renyi random graphs with edge probability 1/2. These graphs have the nice property of being dense, and thus any (positive) result for this class of graphs has a good chance to transpose to any application. However, running QAOA-MaxCut for these graphs can be quite demanding, since a typical circuit would have of order kn²/2 CNOT gates for an Ansatz of depth k over a graph of size n. This quadratic scaling can be quite demanding for a real hardware platform. In this section, we show how a similar score/test can be derived for other classes of random graphs that would define less demanding tests, as in running circuits with a lower entangling gate count. All the results presented below can be derived from the scaling proven in [DMS17]. In this work, the authors state that the scaling of the average maximum cut size for random graphs with γn edges picked uniformly can be expressed as:
C_max(n) = nγ/2 + P⋆ n √(γ/2) + o(n√γ), (10)
where P⋆ ≤ √(2/π). Numerical estimates of this constant give P⋆ ≈ 0.763. This result quite naturally gives us the difference in scaling between the cut sizes produced by random sampling, nγ/2, and the cut sizes produced by an exact solver. We now detail this scaling for two classes of graphs: generic G(n, p) random graphs and random k-regular graphs.
a: G(n, p) graphs
We can run the same calculation as the one for G(n, 1/2) for any edge probability p. In this setting, we have γ = pn/2 and, similarly to the G(n, 1/2) case, the average maximum cut size grows as
C_max(n) = pn²/4 + λ_p n^{3/2}
for some constant λ_p. Analytically, we expect λ_p = √(2p) λ, with λ the scaling of the p = 1/2 case.
The direct consequence is that we can use a similar test as for p = 1/2 and pose:

β(n) = (C(n) − pn²/4) / (C_max(n) − pn²/4) = (C(n) − pn²/4) / (λ_p n^{3/2})

where λ_p can either be fitted numerically or taken as √(2p) λ. Overall, this boils down to comparing the QAOA performance C(n) − pn²/4 against an n^{3/2} scaling. Here, we derived an expression for β(n) for constant p, but the derivation holds for any size-dependent probability p = f(n). Hence, we can define the same benchmark with increasingly dense (and thus more difficult to implement) instances.

b: k-regular graphs

Regular graphs have the convenient property of being very sparse, with kn/2 edges for a k-regular graph of size n. For this class of graphs, the scaling of the average maximum cut is in fact proven, and not only known within an interval. Applying Eq. (10) with γ = k/2 gives us:

C_max(n) = nk/4 + (P* √k / 2) n + o(n √k),

hence a natural choice of β is:

β(n) = (C(n) − nk/4) / (C_max(n) − nk/4) = (C(n) − nk/4) / (λ n)

for some constant λ = P* √k / 2. Once again, we can either use the analytical value of λ or fit it numerically on small instances. That is, if we fix k, we are comparing the QAOA performance over k-regular graphs, C(n) − nk/4, to a linear scaling in n.

IV. RUNNING Q-SCORE YOURSELF: AN OPEN-SOURCE REPOSITORY
We provide a Python package, qscore, built on top of the myQLM library. Once the qscore package is installed, here is the typical script that needs to be run:

from qat.qscore.benchmark import QScore
from qat.plugins import ScipyMinimizePlugin
from qat.qpus import get_default_qpu

QPU = ScipyMinimizePlugin(
    method="COBYLA",
    tol=1e-4,
    options={"maxiter": 300},
) | get_default_qpu()

benchmark = QScore(
    QPU,
    size_limit=20,
    depth=1,
    output="perfect.csv",
    rawdata="perfect.raw",
)
benchmark.run()

Listing 1: Python script to run Q-score

Here, the QPU is a perfect circuit simulator provided by myQLM. In order to use a true hardware QPU, one simply needs to interface one's QPU with the myQLM API. This thin layer typically looks as follows:

from qat.core.qpu import QPUHandler
from qat.core import Result

class MyQPU(QPUHandler):
    def submit_job(self, job):
        circuit = job.circuit        # the circuit to execute
        observable = job.observable  # the cost observable to evaluate
        qubits = job.qubits          # the qubits to measure
        # ... run the job on your hardware and fill a Result object ...
        result = Result()
        return result

Listing 2: Python script to make your own QPU compatible with myQLM
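As an aside, once a run has produced pairs (n, C(n)), the improved ratios β(n) of Section III-D can be computed with a few lines of standalone Python. The sketch below takes λ at its analytical value P*/(2√2) rather than fitting it numerically (an assumption made for illustration; the input costs are illustrative, not measured data):

```python
# Standalone sketch of the improved ratio beta(n) for the graph classes of
# Section III-D. lambda is taken at its analytical value P*/(2*sqrt(2))
# instead of being fitted numerically (an assumption for illustration).
P_STAR = 0.7632
LAMBDA = P_STAR / (2 * 2 ** 0.5)  # p = 1/2 constant

def beta_gnp(cost, n, p):
    """beta(n) = (C(n) - p n^2/4) / (lambda_p n^{3/2}), lambda_p = sqrt(2p) lambda."""
    lambda_p = (2 * p) ** 0.5 * LAMBDA
    return (cost - p * n ** 2 / 4) / (lambda_p * n ** 1.5)

def beta_regular(cost, n, k):
    """beta(n) = (C(n) - n k/4) / (lambda_k n), lambda_k = P* sqrt(k)/2."""
    lambda_k = P_STAR * k ** 0.5 / 2
    return (cost - n * k / 4) / (lambda_k * n)

# Sanity check: random sampling sits at beta ~ 0, while a solver reaching the
# asymptotic maximum cut sits at beta ~ 1.
n, p = 100, 0.5
random_cost = p * n ** 2 / 4
optimal_cost = random_cost + (2 * p) ** 0.5 * LAMBDA * n ** 1.5
print(beta_gnp(random_cost, n, p))   # 0.0
print(beta_gnp(optimal_cost, n, p))  # ~1.0
```

The same normalization applies to the CSV output of a Q-score run once the per-size average costs have been extracted.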
V. CONCLUSION
In this note, we have introduced the Atos Q-score, an application-centric, hardware-agnostic and scalable metric that measures the ability of a full quantum stack—hardware and software—to solve a prototypical combinatorial optimization problem, MaxCut, using the Quantum Approximate Optimization Algorithm, a widespread variational quantum heuristic compatible with Noisy Intermediate-Scale Quantum co-processors. Instead of focusing on how well the basic building blocks of a quantum processor work, like most existing metrics, the Q-score provides information as to the capacity of the processor to solve an actual problem. It does so without favoring any hardware technology or software paradigm, and will remain applicable to very large problems thanks to its scalability.

Like the classical LINPACK benchmark, the Q-score focuses on a given problem as a proxy for most other hard computational problems. Here, MaxCut was chosen as a representative hard problem because it is both simple and universal. In the search for the "killer application" for quantum co-processors, other more relevant problems may appear and supersede MaxCut, but the same strategy as the one we describe in this note will likely remain applicable. Likewise, the choices of optimizer (COBYLA) and other parameters (number of shots, number of graphs, etc.) whose values we set for the sake of standardization have a degree of arbitrariness. In a similar vein, the current protocol is geared to digital quantum co-processors; an extension to analog processors is rather straightforward, and will be the topic of future work.

All these variations on the protocol proposed in this note should not influence the overall outcome of the procedure, and thus the usefulness of the benchmark.

ACKNOWLEDGEMENTS
We acknowledge useful discussions with the members of the Atos Quantum Advisory Board, Alain Aspect, David DiVincenzo, Artur Ekert, Daniel Estève, and Serge Haroche. The computations have been performed on the Atos Quantum Learning Machine.
REFERENCES

[AAB+19] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandrà, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, oct 2019.

[AAB+20] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Sergio Boixo, Michael Broughton, Bob B. Buckley, David A. Buell, Brian Burkett, Nicholas Bushnell, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Sean Demura, Andrew Dunsworth, Daniel Eppens, Edward Farhi, Austin Fowler, Brooks Foxen, Craig Gidney, Marissa Giustina, Rob Graff, Steve Habegger, Matthew P. Harrigan, Alan Ho, Sabrina Hong, Trent Huang, L. B. Ioffe, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Cody Jones, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Seon Kim, Paul V. Klimov, Alexander N. Korotkov, Fedor Kostritsa, David Landhuis, Pavel Laptev, Mike Lindmark, Martin Leib, Erik Lucero, Orion Martin, John M. Martinis, Jarrod R. McClean, Matt McEwen, Anthony Megrant, Xiao Mi, Masoud Mohseni, Wojciech Mruczkiewicz, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Florian Neukart, Hartmut Neven, Murphy Yuezhen Niu, Thomas E. O'Brien, Bryan O'Gorman, Eric Ostby, Andre Petukhov, Harald Putterman, Chris Quintana, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Andrea Skolik, Vadim Smelyanskiy, Doug Strain, Michael Streif, Kevin J. Sung, Marco Szalay, Amit Vainsencher, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, and Leo Zhou. Quantum approximate optimization of non-planar graph problems on a planar superconducting processor, 2020.

[AC16] Scott Aaronson and Lijie Chen. Complexity-Theoretic Foundations of Quantum Supremacy Experiments. dec 2016.

[AMP05] Teresa Alsinet, Felip Manyà, and Jordi Planes. Improved exact solvers for weighted max-sat. In Fahiem Bacchus and Toby Walsh, editors, Theory and Applications of Satisfiability Testing, pages 371–377, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.

[BKGN+13] Robin Blume-Kohout, John King Gamble, Erik Nielsen, Jonathan Mizrahi, Jonathan D. Sterk, and Peter Maunz. Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit. oct 2013.

[BKY19] Robin Blume-Kohout and Kevin C. Young. A volumetric framework for quantum computer benchmarks. 2019.

[CBS+19] Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation, and Jay M. Gambetta. Validating quantum computers using randomized model circuits. Physical Review A, 100(3):032328, sep 2019.

[DDSG+20] Pierre-Luc Dallaire-Demers, Michał Stęchły, Jerome F. Gonthier, Ntwali Toussaint Bashige, Jonathan Romero, and Yudong Cao. An application benchmark for fermionic quantum simulations. 2020.

[DGG+20] Olivia Di Matteo, John Gamble, Chris Granade, Kenneth Rudinger, and Nathan Wiebe. Operational, gauge-free quantum tomography. Quantum, 4:364, nov 2020.

[DHJ+20] Constantin Dalyac, Loïc Henriet, Emmanuel Jeandel, Wolfgang Lechner, Simon Perdrix, Marc Porcheron, and Margarita Veshchezerova. Qualifying quantum approaches for hard industrial optimization problems. A case study in the field of smart-charging of electric vehicles, 2020.

[DL20] Yulong Dong and Lin Lin. Random circuit block-encoded matrix and a proposal of quantum LINPACK benchmark. pages 1–22, 2020.

[DLP03] Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. The LINPACK benchmark: Past, present and future. Concurrency and Computation: Practice and Experience, 15(9):803–820, 2003.

[DMS17] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. Ann. Probab., 45(2):1190–1217, 03 2017.

[EWP+19] Alexander Erhard, Joel James Wallman, Lukas Postler, Michael Meth, Roman Stricker, Esteban Adrian Martinez, Philipp Schindler, Thomas Monz, Joseph Emerson, and Rainer Blatt. Characterizing large-scale quantum computers via cycle benchmarking. pages 1–13, 2019.

[FGG14] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A Quantum Approximate Optimization Algorithm. nov 2014.

[GL18] David Gamarnik and Quan Li. On the max-cut of sparse random graphs. Random Structures & Algorithms, 52(2):219–262, 2018.

[Gre15] Daniel Greenbaum. Introduction to Quantum Gate Set Tomography. 2015.

[HNYN09] Yuichi Hirata, Masaki Nakanishi, Shigeru Yamashita, and Yasuhiko Nakashima. An Efficient Method to Convert Arbitrary Quantum Circuits to Ones on a Linear Nearest Neighbor Architecture. In 2009 Third International Conference on Quantum, Nano and Micro Technologies, pages 26–33. IEEE, feb 2009.

[IBM] IBM Quantum Experience website. https://quantum-computing.ibm.com/. Accessed: 2020-03-05.

[LW68] Elliott H. Lieb and F. Y. Wu. Absence of Mott transition in an exact solution of the short-range, one-band model in one dimension. Physical Review Letters, 20(25):1445–1448, 1968.

[MCWG20] David C. McKay, Andrew W. Cross, Christopher J. Wood, and Jay M. Gambetta. Correlated randomized benchmarking. arXiv, 2020.

[MdB20] Simon Martiel and Timothée Goubault de Brugière. Architecture aware compilation of quantum circuits via lazy synthesis, 2020.

[MGE12] Easwar Magesan, Jay M. Gambetta, and Joseph Emerson. Characterizing quantum gates via randomized benchmarking. Physical Review A, 85(4):042311, apr 2012.

[MGS+13] Seth T. Merkel, Jay M. Gambetta, John A. Smolin, Stefano Poletto, Antonio D. Córcoles, Blake R. Johnson, Colm A. Ryan, and Matthias Steffen. Self-consistent quantum process tomography. Physical Review A, 87(6):062119, jun 2013.

[MPJ+19] Alexander J. McCaskey, Zachary P. Parks, Jacek Jakowski, Shirley V. Moore, Titus D. Morris, Travis S. Humble, and Raphael C. Pooser. Quantum Chemistry as a Benchmark for Near-Term Quantum Computers. npj Quantum Information, 5(1):1–10, 2019.

[MSSD20] Daniel Mills, Seyon Sivarajah, Travis L. Scholten, and Ross Duncan. Application-Motivated, Holistic Benchmarking of a Full Quantum Computing Stack. 2020.

[NGM20] Beatrice Nash, Vlad Gheorghiu, and Michele Mosca. Quantum circuit optimizations for NISQ architectures. Quantum Science and Technology, 5(2):025010, mar 2020.

[NRK+17] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, S. V. Isakov, V. Smelyanskiy, R. Barends, B. Burkett, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. Fowler, B. Foxen, R. Graff, E. Jeffrey, J. Kelly, E. Lucero, A. Megrant, J. Mutus, M. Neeley, C. Quintana, D. Sank, A. Vainsencher, J. Wenner, T. C. White, H. Neven, and J. M. Martinis. A blueprint for demonstrating quantum supremacy with superconducting qubits. 2017.

[PCDR+19] Timothy J. Proctor, Arnaud Carignan-Dugas, Kenneth Rudinger, Erik Nielsen, Robin Blume-Kohout, and Kevin Young. Direct Randomized Benchmarking for Multiqubit Devices. Physical Review Letters, 123(3):1–13, 2019.

[PMS+14] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1):4213, sep 2014.

[Pow94] M. J. D. Powell. A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation, pages 51–67. Springer Netherlands, Dordrecht, 1994.

[Pre18] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, jan 2018.

[PRY+17] Timothy Proctor, Kenneth Rudinger, Kevin Young, Mohan Sarovar, and Robin Blume-Kohout. What Randomized Benchmarking Actually Measures. Physical Review Letters, 119(13):130502, sep 2017.

[WBD+19] K. Wright, K. M. Beck, S. Debnath, J. M. Amini, Y. Nam, N. Grzesiak, J. S. Chen, N. C. Pisenti, M. Chmielewski, C. Collins, K. M. Hudek, J. Mizrahi, J. D. Wong-Campos, S. Allen, J. Apisdorf, P. Solomon, M. Williams, A. M. Ducore, A. Blinov, S. M. Kreikemeier, V. Chaplin, M. Keesan, C. Monroe, and J. Kim. Benchmarking an 11-qubit quantum computer. Nature Communications, 10(1):1–6, 2019.

[ZWD+20] Han-Sen Zhong, Hui Wang, Yu-Hao Deng, Ming-Cheng Chen, Li-Chao Peng, Yi-Han Luo, Jian Qin, Dian Wu, Xing Ding, Yi Hu, Peng Hu, Xiao-Yan Yang, Wei-Jun Zhang, Hao Li, Yuxuan Li, Xiao Jiang, Lin Gan, Guangwen Yang, Lixing You, Zhen Wang, Li Li, Nai-Le Liu, Chao-Yang Lu, and Jian-Wei Pan. Quantum computational advantage using photons. Science (New York, N.Y.), 370(6523):1460–1463, dec 2020.