Layer VQE: A Variational Approach for Combinatorial Optimization on Noisy Quantum Computers

Abstract

Combinatorial optimization on near-term quantum devices is a promising path to demonstrating quantum advantage. However, the capabilities of these devices are constrained by high noise levels and limited error mitigation. In this paper, we propose an iterative Layer VQE (L-VQE) approach, inspired by the Variational Quantum Eigensolver (VQE). We present a large-scale numerical study, simulating circuits with up to 40 qubits and 352 parameters, that demonstrates the potential of the proposed approach. We evaluate quantum optimization heuristics on the problem of detecting multiple communities in networks, for which we introduce a novel qubit-frugal formulation. We numerically compare L-VQE with QAOA and demonstrate that QAOA achieves lower approximation ratios while requiring significantly deeper circuits. We show that L-VQE is more robust to sampling noise and has a higher chance of finding the solution as compared with standard VQE approaches. Our simulation results show that L-VQE performs well under realistic hardware noise.



Xiaoyuan Liu, Anthony Angone, Ruslan Shaydulin, Ilya Safro, Yuri Alexeev, and Lukasz Cincio

University of Delaware, Newark, DE 19716, USA
School of Computing, Clemson University, Clemson, SC 29634, USA
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
Computational Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA


I. INTRODUCTION

Recent advances in quantum computing hardware open the possibility of demonstrating quantum advantage in practical applications [1, 2]. A promising target application domain is combinatorial optimization, where problems become classically intractable to solve exactly (in the current state of theory) even for moderately sized instances. This suggests that relatively few qubits are needed to tackle certain classically hard combinatorial optimization problems, which opens the possibility that noisy intermediate-scale quantum (NISQ) [3] devices become competitive with state-of-the-art classical methods for such problems.

Near-term quantum devices are expected to have high noise levels, and only partial error mitigation is currently possible. This limits the maximum depth of a quantum circuit that can be reliably executed on NISQ devices. This constraint motivated the development of a number of hybrid quantum-classical algorithms for optimization, most notably the Quantum Approximate Optimization Algorithm (QAOA) [4, 5] and variational quantum algorithms for optimization [6–8]. These algorithms execute only a short parameterized circuit on the quantum computer and use a classical outer-loop procedure to find "good" parameters [9]. The short parameterized circuit is often referred to as the ansatz. In general, the goal of the outer-loop procedure is to find parameters such that the output of the quantum circuit includes high-quality solutions to the combinatorial optimization problem being solved.

The choice of the ansatz is a key problem in hybrid algorithms. The two main concerns are the expressivity and the trainability of the chosen ansatz. First, the ansatz has to be sufficiently expressive, meaning that there should exist parameters with which the ansatz prepares a state suitably close to the solution of the problem.
Second, the ansatz has to be trainable, meaning that it has to be feasible to find sufficiently good parameters [10].

For combinatorial optimization problems, the solution is classical; in other words, it is a computational basis state. Therefore the first criterion, the expressivity of the ansatz, reduces to the ability to prepare a state with sufficiently large overlap with the computational basis state encoding the solution of the problem. This means that the ansatz can be sufficiently expressive without generating any entanglement or having any quantum properties whatsoever: one layer of single-qubit rotations is sufficient to prepare an arbitrary computational basis state. Such ansätze may not be trainable, however. Their structure enforces localized optimization, which is prone to local minima. As we discuss below, that class of ansätze may be extended to enhance trainability by introducing correlations between distant parts of the system. A commonly used class of highly expressive ansätze are those with alternating layers of single-qubit and two-qubit gates, where the two-qubit gates are aligned with the connectivity available on the hardware. These ansätze are known as quantum neural networks [11] or hardware-efficient ansätze [12]. An alternative and "natural" approach is the Hamiltonian-evolution ansatz used in QAOA. Such ansätze can be less expressive, however, since the state they have to prepare is a nontrivial entangled state due to the symmetry-preserving properties of the ansatz [13]. This observation was used by Bravyi et al. [14] to show that, because of the $\mathbb{Z}_2$ symmetry of the ansatz, QAOA with constant depth is outperformed by the classical Goemans–Williamson algorithm for MaxCut.
As a result, QAOA needs a comparatively large circuit depth to achieve the same (classical) expressivity as a hardware-efficient ansatz.

For ansätze with a large number of parameters, high-quality parameters are typically found by using a classical optimizer. Thus the second criterion, the trainability of the ansatz, is typically framed in terms of the cost function landscape over which the classical outer-loop routine optimizes. Recent results show that highly expressive ansätze such as hardware-efficient ansätze suffer from "barren plateaus" in the optimization landscape, making finding high-quality parameters intractable [11, 15–24]. At the same time, a series of recent results show that, because of the structured nature of the QAOA ansatz, one may be able to find high-quality parameters by using machine learning approaches [25–27] or by restricting the parameters to a specific physically motivated class [28–30]. Note that our notion of trainability differs from the one commonly used when discussing effects such as "barren plateaus." In fact, it is more general: under our definition, a circuit may have large gradients over the whole parameter space and still not be trainable.

[Figure 1 schematic: a QUANTUM COMPUTER performs measurements and produces samples; a CLASSICAL COMPUTER constructs and iteratively grows the ansatz and optimizes the parameters.]

FIG. 1: Layer-VQE: start from a simple and shallow ansatz with one $r_y$ gate acting on each qubit; optimize and update the parameters; after a predefined number of iterations, increment the size of the ansatz; optimize and update all parameters. The ansatz can be incremented multiple times.

In this paper, we propose a practical approach to combinatorial optimization on near-term quantum computers. We introduce an iterative approach, which we call Layer VQE (L-VQE), inspired by recent advances in hybrid quantum-classical algorithms with an adaptive ansatz [7, 31, 32]. In L-VQE, we start with one layer of parameterized rotations and increment the size of the ansatz systematically by introducing entangling gates and additional parameterized rotations. To heuristically decrease the likelihood of getting trapped in a local optimum of the parameters, we increment the ansatz before reaching convergence. To guarantee that the quality of the solution does not decrease at each step, we initialize the added part of the ansatz such that it evaluates to the identity. We work with qubits aligned in a chain and assume nearest-neighbor connectivity. This allows us to consider large problems by using tensor network techniques to simulate the circuits. We expect that in practical applications on real quantum hardware, one would organize the ansatz layers according to the (typically two-dimensional) qubit connectivity to further enhance circuit expressiveness. Quantum circuits on such layouts cannot, in general, be simulated efficiently on classical computers and are thus not considered in the present study.

Fig. 1 gives a schematic presentation of L-VQE. We study the algorithm on the problem of detecting k communities in a network, for which we introduce a novel qubit-frugal formulation; for a network with n nodes, $n\lceil \log_2 k \rceil$ qubits are required for the circuit. We present a large-scale numerical study of the proposed approach, simulating circuits with up to 40 qubits and 352 rotation gates (i.e., parameters).
Our numerical simulation results show that the proposed approach achieves a higher approximation ratio than QAOA while requiring significantly lower circuit depth. The proposed approach is more robust to sampling noise and performs better than hybrid approaches with a fixed ansatz. Moreover, we show that the proposed approach performs well under realistic hardware noise.

The rest of the paper is organized as follows. In Section II we review the relevant background on solving combinatorial optimization problems on quantum computers. In Section III we review related work. Section IV introduces our L-VQE approach, and in Section V we discuss our novel formulation of the k-community detection problem. Section VI presents our numerical simulation results, and in Section VII we summarize our conclusions.

II. BACKGROUND

We begin by briefly reviewing our notion of combinatorial optimization on quantum computers and the relevant concepts. Suppose we have an objective function $C(x)$ defined on the Boolean cube $x = \{x_i\}_{i=1}^{n} \in \{0,1\}^n$ and a corresponding optimization problem

$$\max_{x \in \{0,1\}^n} C(x), \tag{1}$$

where the objective function $C(x)$ can be written in the form

$$C(x) = \sum_{q} w_q \prod_{i \in q} x_i \prod_{j \notin q} (1 - x_j). \tag{2}$$

Here, $q \subset \{1, 2, \dots, n\}$ are given index sets, and $w_q$ are given coefficients. The objective function $C(x)$ is said to be faithfully represented by a Hamiltonian $H$ if $H|x\rangle = C(x)|x\rangle$ for each $x \in \{0,1\}^n$. For a function given in the form (2), such a Hamiltonian can be constructed by substituting every $x_i$ with the matrix $x_i \to \frac{1}{2}(I - z_i)$, where $I$ is the identity matrix and $z_i$ is the Pauli $z$ operator acting on qubit $i$ (so that each factor $1 - x_j$ becomes $\frac{1}{2}(I + z_j)$):

$$H = \sum_{q} w_q \prod_{i \in q} \frac{I - z_i}{2} \prod_{j \in q^{C}} \frac{I + z_j}{2}. \tag{3}$$

Note that the operator $H \in \mathbb{C}^{2^n \times 2^n}$ is never constructed explicitly. Instead, we construct a compact representation of it as a combination of Pauli $z$ operators.

A. Combinatorial Optimization on Near-Term Quantum Computers

The two most prominent candidate algorithms for combinatorial optimization on noisy near-term quantum computers are the Variational Quantum Eigensolver (VQE, originally proposed in the context of quantum chemistry [33]) and the Quantum Approximate Optimization Algorithm (QAOA) [4]. Both are hybrid quantum-classical algorithms that combine a parameterized trial state $|\psi(\theta)\rangle$ prepared on a quantum computer with a classical routine used to find high-quality parameters $\theta$. The goal is to find parameters $\theta$ such that, when the state $|\psi(\theta)\rangle$ is measured, the measurement result corresponds to a good solution of the classical optimization problem. The parameterized trial state $|\psi(\theta)\rangle$ is commonly called the ansatz.

In VQE for optimization, the ansatz is frequently tailored to the hardware [8, 34], and the parameters $\theta$ are found by using a classical outer-loop optimizer. The expectation value $\langle\psi(\theta)|H|\psi(\theta)\rangle$ is commonly used as the objective for the optimizer, although other approaches have been suggested [35]. QAOA uses a problem-dependent ansatz given by

$$|\psi_p(\gamma, \beta)\rangle = e^{-i\beta_p B} e^{-i\gamma_p H} \cdots e^{-i\beta_1 B} e^{-i\gamma_1 H} |+\rangle^{\otimes n}, \tag{4}$$

where $B = \sum_{i=1}^{n} x_i$ is the mixing Hamiltonian, $x_i$ is the Pauli $x$ operator acting on qubit $i$, $H$ is the Hamiltonian faithfully representing the objective function, and $p$ is a parameter controlling the depth.
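As a concrete illustration of Eqs. (2)–(4), the following pure-Python sketch builds the diagonal of $H$ for a small objective and prepares the QAOA state by direct state-vector simulation. The three-qubit objective TERMS is a hypothetical toy example (with 0-based qubit indices), the function names are ours, and the convention that qubit i corresponds to bit i of the basis-state index is an assumption. Because $H$ is diagonal in the computational basis, only its $2^n$ diagonal entries are stored, in line with the remark that $H$ is never constructed as a full matrix.

```python
import cmath
import math

# Hypothetical 3-qubit objective in the form of Eq. (2): each index set q rewards
# exactly the bitstring whose 1-positions are q, with weight w_q (0-based indices).
N_QUBITS = 3
TERMS = {frozenset({0}): 1.0, frozenset({0, 2}): 2.0, frozenset({1, 2}): 0.5}


def bit(index, qubit):
    """Value of x_qubit in the basis state |index> (assumed: qubit i is bit i)."""
    return (index >> qubit) & 1


def objective(index):
    """C(x) from Eq. (2), evaluated directly on the bits of |index>."""
    total = 0.0
    for q, w in TERMS.items():
        term = w
        for i in range(N_QUBITS):
            term *= bit(index, i) if i in q else 1 - bit(index, i)
        total += term
    return total


def hamiltonian_diagonal():
    """Diagonal of H from Eq. (3), using z_i|x> = (1 - 2 x_i)|x>."""
    diag = []
    for index in range(2 ** N_QUBITS):
        z = [1 - 2 * bit(index, i) for i in range(N_QUBITS)]
        value = 0.0
        for q, w in TERMS.items():
            term = w
            for i in range(N_QUBITS):
                term *= (1 - z[i]) / 2 if i in q else (1 + z[i]) / 2
            value += term
        diag.append(value)
    return diag


def qaoa_state(gammas, betas, diag):
    """|psi_p(gamma, beta)> from Eq. (4): apply gamma_1, beta_1, ..., gamma_p, beta_p."""
    dim = len(diag)
    amps = [complex(1 / math.sqrt(dim))] * dim  # |+> on every qubit
    for gamma, beta in zip(gammas, betas):
        # Phase separator: H is diagonal, so e^{-i gamma H} multiplies each amplitude by a phase.
        amps = [a * cmath.exp(-1j * gamma * h) for a, h in zip(amps, diag)]
        # Mixer: e^{-i beta B} = prod_i (cos(beta) I - i sin(beta) x_i), applied qubit by qubit.
        c, s = math.cos(beta), math.sin(beta)
        for qubit in range(N_QUBITS):
            mask = 1 << qubit
            for i in range(dim):
                if not i & mask:
                    a0, a1 = amps[i], amps[i | mask]
                    amps[i], amps[i | mask] = c * a0 - 1j * s * a1, c * a1 - 1j * s * a0
    return amps


def expectation(amps, diag):
    """<psi|H|psi> for a diagonal H."""
    return sum(abs(a) ** 2 * h for a, h in zip(amps, diag))
```

Here hamiltonian_diagonal() reproduces $C(x)$ entry by entry (for instance, the entry for basis index 5, where qubits 0 and 2 are set, equals the weight 2.0), which is the faithful-representation property $H|x\rangle = C(x)|x\rangle$; expectation(qaoa_state(gammas, betas, diag), diag) is the quantity a classical outer loop would optimize over $\gamma$ and $\beta$.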
The special structure of the QAOA ansatz enables finding high-quality parameters $\gamma, \beta$ purely classically in many settings [4, 36, 37] or using very few iterations of the outer-loop optimizer [25, 38, 39].

We evaluate the quality of the final quantum state $|\psi(\theta)\rangle$ by computing the approximation ratio $\rho$, defined as

$$\rho = \frac{\langle\psi(\theta)|H|\psi(\theta)\rangle}{C_{\mathrm{bkv}}}, \tag{5}$$

where $C_{\mathrm{bkv}}$ is the global optimum of the objective function $C(x)$ or, when that is unavailable, the best known value: as the problem size grows, the global optimum $\max_{x \in \{0,1\}^n} C(x)$ may not be accessible.

B. The k-Community Detection

The k-community detection problem, also known as modularity clustering, is a well-known problem in network science. The goal is to partition a network into k communities such that the modularity metric [40] is maximized. Intuitively, when the modularity of a partition of the network is large, the connectivity between nodes inside each community is dense, while the connectivity between communities is sparse. Modularity is defined as the fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random; when maximized, it reveals the communities in a given network.

For a formal definition, let $G = (V, E)$ be an undirected simple graph with $|V| = n$ nodes and $|E| = m$ edges. The adjacency matrix of $G$ is denoted by $A = \{A_{u,v}\}_{1 \le u,v \le n}$, where $A_{u,v} = 1$ if there is an edge between node $u$ and node $v$, and $0$ otherwise. The degree of a node $v$ is denoted by $d_v$. A k-community clustering $\mathcal{C} = \{C_1, \dots, C_k\}$ is a partition of $V$ into k disjoint sets, namely, $\bigcup_{i=1}^{k} C_i = V$ and $C_i \cap C_j = \emptyset$ for all $1 \le i \ne j \le k$. Furthermore, $c_v$ denotes the membership of node $v$ for a given clustering; that is, if $v \in C_i$, then $c_v = i$.
The modularity of a clustering $\mathcal{C}$ is given by

$$Q(\mathcal{C}) = \frac{1}{2m} \sum_{u,v=1}^{n} B_{u,v}\, \delta(c_u, c_v), \tag{6}$$

where the modularity matrix $B$ is given by $B_{u,v} = A_{u,v} - \frac{d_u d_v}{2m}$ for $1 \le u, v \le n$, and $\delta$ is the Kronecker delta:

$$\delta(c_u, c_v) = \begin{cases} 1, & \text{if } c_u = c_v, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

Our goal is to find the clustering $\mathcal{C}$ that maximizes the modularity, $\arg\max_{\mathcal{C}} Q(\mathcal{C})$. The problem has applications in chemistry [41], biology [42], social sciences [43], and other fields. Solving the modularity maximization problem to optimality is NP-hard; its decision version is NP-complete [44].
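Equation (6) translates directly into code. The plain-Python sketch below computes $Q(\mathcal{C})$ and, for very small graphs, finds the maximal modularity with up to k communities by exhaustive search over all $k^n$ label assignments. The function names and the two-triangle test graph are our own illustrative choices, and community labels are 0-based rather than 1-based.

```python
import itertools


def modularity(edges, n, labels):
    """Q(C) from Eq. (6) for an undirected simple graph on nodes 0..n-1.

    labels[v] is the community of node v (0-based integer labels here).
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    degree = [sum(row) for row in adj]
    m = len(edges)
    total = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:  # Kronecker delta, Eq. (7)
                total += adj[u][v] - degree[u] * degree[v] / (2 * m)  # B_{u,v}
    return total / (2 * m)


def best_modularity(edges, n, k):
    """Exhaustive search over all k**n assignments (empty communities allowed, so 'up to k')."""
    return max(modularity(edges, n, labels)
               for labels in itertools.product(range(k), repeat=n))


# Hypothetical test graph: two triangles joined by the bridge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
```

For the two triangles joined by a single bridge, splitting along the bridge gives $Q = 2\,(3/7 - (7/14)^2) = 5/14 \approx 0.357$, while putting every node in one community gives $Q = 0$. The exhaustive search costs $k^n$ modularity evaluations, so it is only useful for obtaining reference optima of tiny instances.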

III. RELATED WORK

A. Hybrid Quantum-Classical Algorithms

The question of ansatz choice is central to the success of the hybrid quantum-classical methods introduced in Section II A. In VQE, the choice of the ansatz determines the expressivity and trainability of the trial state; therefore, VQE is only as good as its ansatz. Different strategies for parameterizing the ansatz and updating the parameters also affect the performance of the algorithm. While reaching an arbitrary state requires a circuit of exponential depth, shallow circuits are preferred in applications, especially if the goal is to run the circuits on current NISQ devices. McClean et al. [11] show that, for randomly initialized parameterized circuits, the exponential dimension of the Hilbert space and the complexity of gradient estimation make the optimization of deep circuits intractable. Moreover, Wang et al. [45] show that another type of "barren plateau" is induced by hardware noise. More specifically, under local Pauli noise, the gradient vanishes exponentially with the depth of the circuit. Similar results have been demonstrated for QAOA [46].

Recently, a number of approaches have been proposed that attempt to overcome these limitations by using an adaptive or iteratively constructed ansatz. In conventional VQE approaches, the wave function ansatz (such as the unitary coupled cluster (UCC) ansatz [33] or the hardware-efficient ansatz [12]) is preselected and fixed upfront. To address this limitation, Grimsley et al. propose an adaptive variational algorithm, ADAPT-VQE [31], that starts from an ansatz with a small number of parameters and grows it systematically. This approach performs better than a UCC ansatz approach in terms of both circuit depth and accuracy.

In the same spirit, Zhu et al. propose an adaptive version of QAOA, called ADAPT-QAOA [7]. Compared with the standard QAOA ansatz, which alternates between the predefined exponentiated cost and mixing Hamiltonian operators, ADAPT-QAOA grows the ansatz with two operators at a time.
It also uses a gradient criterion to select the mixing operator from a predefined operator pool. On a class of MaxCut graph problems, ADAPT-QAOA demonstrates faster convergence while also reducing the number of optimization parameters and the CNOT gate count compared with standard QAOA. Skolik et al. [32] propose a layer-wise learning strategy that grows the circuit depth incrementally during optimization and updates only subsets of the parameters in training. However, a recent paper [22] shows that this type of layer-wise training strategy, namely, training a circuit piecewise in sequence, can encounter abrupt transitions in the training process as the depth of the circuit grows.

B. Community Detection

Community detection has been extensively studied classically [40, 47], as well as by using the D-Wave quantum annealer [48–51] and QAOA [49, 51, 52]. In these hybrid quantum-classical approaches, the optimization problem is encoded as an Ising model Hamiltonian that has only two-body terms. In these formulations, for a graph with n vertices, solving the 2-community modularity maximization problem requires n qubits, where each qubit encodes the membership of a node. For the k-community problem, encoding the membership of each node requires associating k qubits with each node and introducing quadratic penalty constraints into the Ising Hamiltonian to enforce that each node belongs to only one community. This formulation requires kn qubits.

IV. LAYER VQE

We advocate an iterative hybrid approach to quantum optimization on NISQ devices, which we call Layer VQE. L-VQE combines ideas from recent developments in adaptive variational algorithms, such as [7, 31, 32]. In this section, we describe L-VQE in detail.

Suppose we use a problem encoding that requires n qubits. We start the algorithm with an ansatz with no entangling gates and one $r_y$ gate acting on each qubit, where $r_y$ is the single-qubit rotation through an angle $\theta$ around the y-axis, with unitary matrix $r_y(\theta) \equiv e^{-i\theta y/2}$, and $y$ is the Pauli $y$ operator. The parameters of these $r_y$ gates are initialized uniformly at random on $[0, \pi]$. We denote the parameters for this layer of gates (Layer 0 in Fig. 2) as $\theta_0$ and the layer as $U_0(\theta_0)$. The quantum state after applying the circuit to the initial state $|0\rangle$ is denoted as $|\psi(\theta_0)\rangle \equiv U_0(\theta_0)|0\rangle$. We then proceed with the conventional VQE routine and iteratively update the parameters $\theta_0$ to minimize the cost function $\langle\psi(\theta_0)|H|\psi(\theta_0)\rangle$. In conventional VQE, this iterative procedure is run until convergence; in L-VQE, however, we stop after a fixed number of iterations and then add another set of gates to the ansatz. Running to convergence could produce a better result at this step, but after the new set of gates is added, the subsequent optimization may then more easily get trapped in a local minimum. In our experiments, the number of iterations is picked empirically and increases linearly with system size.

The newly added set of gates includes $r_y$ gates and CNOT gates that act on nearest-neighbor qubits. Another way to describe this procedure is that we embed the obtained parameterized circuit into a deeper circuit. We denote this newly added layer of the circuit by $U_1(\theta_1)$ (Layer 1 in Fig. 2). The newly added parameters $\theta_1$ are initialized to zero.
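The identity-initialization property can be checked with a small state-vector simulation. This pure-Python sketch assumes one particular layout for each added layer: every nearest-neighbor pair of the chain receives an $r_y$ rotation on both qubits, a CNOT, another $r_y$ on both qubits, and a second CNOT. That layout is our assumption, chosen because it matches the per-layer counts implied by Table I (for 14 qubits, 52 rotations and 26 CNOTs per layer); the exact gate order of Fig. 2 may differ. All function names are hypothetical.

```python
import math
import random


def apply_ry(amps, qubit, theta):
    """Apply r_y(theta) = exp(-i theta y / 2) in place; the matrix is real."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << qubit
    for i in range(len(amps)):
        if not i & mask:
            a0, a1 = amps[i], amps[i | mask]
            amps[i], amps[i | mask] = c * a0 - s * a1, s * a0 + c * a1


def apply_cnot(amps, control, target):
    """Apply CNOT in place: swap target 0/1 amplitudes wherever the control bit is 1."""
    cmask, tmask = 1 << control, 1 << target
    for i in range(len(amps)):
        if i & cmask and not i & tmask:
            amps[i], amps[i | tmask] = amps[i | tmask], amps[i]


def chain_pairs(n):
    """Nearest-neighbour bonds of an n-qubit chain: even bonds first, then odd bonds."""
    return [(q, q + 1) for q in range(0, n - 1, 2)] + [(q, q + 1) for q in range(1, n - 1, 2)]


def apply_growth_layer(amps, n, params):
    """Hypothetical added layer: per bond, r_y r_y, CNOT, r_y r_y, CNOT (4 angles, 2 CNOTs)."""
    angles = iter(params)
    for a, b in chain_pairs(n):
        for _ in range(2):
            apply_ry(amps, a, next(angles))
            apply_ry(amps, b, next(angles))
            apply_cnot(amps, a, b)


def prepare(n, theta0, layers):
    """Layer 0 (one r_y per qubit acting on |0...0>) followed by the given added layers."""
    amps = [0.0] * (2 ** n)
    amps[0] = 1.0
    for qubit, theta in enumerate(theta0):
        apply_ry(amps, qubit, theta)
    for params in layers:
        apply_growth_layer(amps, n, params)
    return amps


if __name__ == "__main__":
    n = 6
    rng = random.Random(1)
    theta0 = [rng.uniform(0, math.pi) for _ in range(n)]  # Layer 0 angles on [0, pi]
    before = prepare(n, theta0, [])
    after = prepare(n, theta0, [[0.0] * (4 * (n - 1))])  # new layer initialized to zero
    print(max(abs(x - y) for x, y in zip(before, after)))  # prints 0.0
```

With all new angles at zero, every $r_y$ reduces to the identity and each pair of CNOTs cancels, so the state, and hence the cost, is unchanged; nonzero angles in the new layer then give the optimizer additional, entangling degrees of freedom.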
Note that here, since $r_y(0) = I$ and $\mathrm{CNOT}^2 = I$, where $I$ is the identity matrix, the CNOT gates of the new layer cancel in pairs when all new angles are zero, and the quantum state becomes

$$|\psi(\theta_0, \theta_1)\rangle \equiv U_1(\theta_1) U_0(\theta_0)|0\rangle = U_0(\theta_0)|0\rangle = |\psi(\theta_0)\rangle. \tag{8}$$

Therefore, initializing the newly added parameters to zero guarantees that the cost function being optimized does not change when the new layer is added. However, one may add a small random perturbation to the parameters before optimization is continued; in that sense, $\theta_1$ are initialized not with zeros but with small random numbers. As we proceed with the optimization, iteratively updating the parameters $\theta_0, \theta_1$ to minimize the cost function $\langle\psi(\theta_0,\theta_1)|H|\psi(\theta_0,\theta_1)\rangle$, initializing with small random numbers may help avoid local minima and speed up the optimization in general. At this point, we can either let the optimization run until convergence or repeat the previous process: stop after a fixed number of iterations, add another set of gates to the circuit, and then optimize. The pseudocode of the algorithm is presented in Algorithm 1.

[Figure 2: circuit diagram of the L-VQE ansatz on six qubits in a chain, showing Layer 0, Layer 1, and Layer 2 built from $r_y$ rotations and nearest-neighbor entangling gates.]

FIG. 2: L-VQE ansatz for a 6-qubit quantum state. $r_y$ denotes rotation around the y-axis, defined as $r_y(\theta) \equiv e^{-i\theta y/2}$. Every $r_y$ contains a parameter that is optimized over in the outer loop.

Algorithm 1

L-VQE with $\ell$ layers
    Initialize the ansatz with one $r_y$ acting on each qubit.
    Update the parameters to minimize $\langle\psi(\theta)|H|\psi(\theta)\rangle$; stop after $k$ iterations (before reaching convergence).
    for $l = 1, \dots, \ell$ do
        Add a new layer to the ansatz, and initialize its parameters such that it evaluates to the identity.
        Update all parameters to minimize $\langle\psi_l(\theta)|H|\psi_l(\theta)\rangle$; stop after $k_l$ iterations (before reaching convergence).
    end for
    Update all parameters to minimize $\langle\psi_\ell(\theta)|H|\psi_\ell(\theta)\rangle$ until convergence.

In simulations, the cost function $\langle\psi(\theta)|H|\psi(\theta)\rangle$ can be evaluated exactly, but in practical applications one repeats the state preparation and measurement multiple times to generate a number of samples and uses the samples to estimate the cost function. In our experiments we investigate the performance of the algorithm in both cases.

Similar to ADAPT-VQE [31] and ADAPT-QAOA [7], we grow the size of the ansatz as we iteratively update the parameters. The added parameterized part of the ansatz is initialized such that it evaluates to the identity, in order to avoid deterioration of the optimization. In ADAPT-VQE and ADAPT-QAOA, however, the algorithm identifies the operator with the largest gradient from a collection of operators and then adds this operator to the ansatz. In L-VQE, we define the newly added ansatz upfront. Unlike layer-wise learning [32], where only subsets of the parameters are updated in training, we optimize over all parameters and thereby may reduce the limitations caused by the lack of layer-wise trainability [22]. In addition, unlike previous approaches, we grow the size of the ansatz before convergence is reached, which again may help avoid local minima.

V. THE k-COMMUNITY DETECTION

We propose a novel qubit-frugal formulation for the k-community detection problem.
When the problem is to divide the network into two communities, namely, with k = 2, we can associate a binary variable with each node $v \in V$ such that

$$x_v = \begin{cases} 1, & \text{if } c_v = 1, \\ 0, & \text{if } c_v = 2. \end{cases}$$

Then, we can rewrite the Kronecker delta (7) in terms of these binary variables:

$$\delta(c_u, c_v) = \delta(x_u, x_v) = 2x_u x_v - x_u - x_v + 1. \tag{9}$$

Plugging (9) into (6) leads to the expression for the modularity:

$$Q(\mathcal{C}) = \frac{1}{2m} \sum_{u,v=1}^{n} B_{u,v}\,(2x_u x_v - x_u - x_v + 1).$$

For larger k, we can use a binary encoding by associating $N = \lceil \log_2 k \rceil$ binary variables $\{x_{j,v}\}_{j=1}^{N} \in \{0,1\}^N$ with each node $v \in V$. We can then write the membership of node v (up to a relabeling of the communities) as

$$c_v = \sum_{j=1}^{N} 2^{j-1} x_{j,v}.$$

Again, we can rewrite the Kronecker delta (7) in terms of these binary variables:

$$\delta(c_u, c_v) = \prod_{j=1}^{N} \delta(x_{j,u}, x_{j,v}) = \prod_{j=1}^{N} (2x_{j,u}x_{j,v} - x_{j,u} - x_{j,v} + 1). \tag{10}$$

Plugging (10) into (6), we obtain for the modularity

$$Q(\mathcal{C}) = \frac{1}{2m} \sum_{u,v=1}^{n} B_{u,v} \prod_{j=1}^{N} (2x_{j,u}x_{j,v} - x_{j,u} - x_{j,v} + 1). \tag{11}$$

Following the construction described in Section II, maximizing the modularity in (11) can be formulated as finding the ground state of the Hamiltonian

$$H = -\frac{1}{2m} \sum_{u,v=1}^{n} B_{u,v} \prod_{j=1}^{N} \frac{I + z_{j,u} z_{j,v}}{2}, \tag{12}$$

where the binary variables $x_{j,v}$ have been substituted with $\frac{1}{2}(I - z_{j,v})$ for all $j \in \{1, 2, \dots, N\}$ and $v \in V$, which maps each factor $2x_{j,u}x_{j,v} - x_{j,u} - x_{j,v} + 1$ to $\frac{1}{2}(I + z_{j,u}z_{j,v})$. Here, $z_{j,v}$ is the Pauli $z$ operator acting on qubit $(j, v)$.

Other formulations have been proposed to tackle the problem on specific quantum architectures. Ushijima-Mwesigwa et al. [50] use an Ising Hamiltonian formulation to detect two communities using quantum annealing on the D-Wave system, which requires n qubits. Negre et al. [48] extend it to detect k communities, which requires kn qubits. In contrast, thanks to the binary encoding introduced above, the Hamiltonian we propose in this work requires only $n\lceil \log_2 k \rceil$ qubits.
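Identity (10) and Hamiltonian (12) can be sanity-checked exhaustively on a toy graph. In this plain-Python sketch, the function names, the toy graph, and the bit ordering (bits[0] plays the role of $x_{1,v}$) are illustrative assumptions. Each diagonal entry of (12) is computed by replacing $z_{j,v}$ with its eigenvalue $1 - 2x_{j,v}$, and it should equal $-Q(\mathcal{C})$ from Eq. (6) for the clustering encoded by the same bits.

```python
import itertools


def delta_product(bits_u, bits_v):
    """Right-hand side of Eq. (10): prod_j (2 x_ju x_jv - x_ju - x_jv + 1)."""
    prod = 1
    for xu, xv in zip(bits_u, bits_v):
        prod *= 2 * xu * xv - xu - xv + 1
    return prod


def membership(bits):
    """c_v = sum_{j=1}^{N} 2^(j-1) x_{j,v}; bits[0] plays the role of x_{1,v}."""
    return sum(b << j for j, b in enumerate(bits))


def modularity_matrix(edges, n):
    """B_{u,v} = A_{u,v} - d_u d_v / (2m), together with the edge count m."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    deg = [sum(row) for row in adj]
    m = len(edges)
    return [[adj[u][v] - deg[u] * deg[v] / (2 * m) for v in range(n)] for u in range(n)], m


def energy_eq12(bmat, m, x):
    """Diagonal entry of H in Eq. (12) for bits x[v][j], using z_{j,v} -> 1 - 2 x_{j,v}."""
    n = len(x)
    total = 0.0
    for u in range(n):
        for v in range(n):
            prod = 1.0
            for xu, xv in zip(x[u], x[v]):
                prod *= (1 + (1 - 2 * xu) * (1 - 2 * xv)) / 2
            total += bmat[u][v] * prod
    return -total / (2 * m)


def modularity_eq6(bmat, m, labels):
    """Q(C) from Eq. (6) for integer community labels."""
    n = len(labels)
    return sum(bmat[u][v] for u in range(n) for v in range(n)
               if labels[u] == labels[v]) / (2 * m)


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]  # hypothetical toy graph
    n, n_bits = 6, 2  # N = ceil(log2 4) = 2 bits per node, 12 qubits in total
    bmat, m = modularity_matrix(edges, n)
    mismatches = 0
    for flat in itertools.product((0, 1), repeat=n * n_bits):
        x = [flat[n_bits * v:n_bits * (v + 1)] for v in range(n)]
        labels = [membership(bits) for bits in x]
        if abs(energy_eq12(bmat, m, x) + modularity_eq6(bmat, m, labels)) > 1e-12:
            mismatches += 1
    print(mismatches)  # 0 if Eq. (12) reproduces -Q for every assignment
```

Because every assignment of the $nN$ qubits encodes a clustering into at most $2^N$ communities, the ground-state energy of (12) is minus the maximal modularity over such clusterings.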
A possible downside of our formulation is that the Hamiltonian now contains many-body interactions, which are harder to implement than the two-body terms in existing formulations.

VI. EXPERIMENTS

In this section we present the numerical results. Since QAOA is considered the leading approach for combinatorial optimization on NISQ devices, we begin in Section VI A with a numerical comparison of L-VQE and QAOA. We then compare L-VQE with the second leading approach, VQE, in Section VI B. To highlight the potential of the proposed L-VQE approach on NISQ devices, we present further evidence in Section VI C, including a scalability analysis and simulation results of L-VQE on a trapped-ion noisy quantum simulator with a realistic noise level. To highlight the importance of entanglement for optimization, in Section VI D we present results comparing VQE with and without entanglement.

A. L-VQE and QAOA

For the first set of experiments, we run simulations of the L-VQE and QAOA algorithms with the proposed Hamiltonian (12). The goal is to find a clustering into up to 4 communities that maximizes the modularity. We are thus simulating 2n qubits for a graph with n vertices. For L-VQE, we run our simulations of the quantum circuits in MATLAB. We use matrix product state techniques to simulate quantum circuits, which allows us to reach large system sizes (up to 40 qubits and 352 parameters). We also represent the Hamiltonian in the form of a matrix product operator [53, 54]. For the classical optimization, we use a sequential minimal optimizer (SMO) [55]. For QAOA, we use the high-performance simulator Qiskit Aer [56] to simulate QAOA circuits. Because of the simulation complexity and the need to optimize parameters for the benchmark instances, we limit the QAOA simulations to 20 qubits. For optimization in QAOA, we use COBYLA [57, 58] as implemented in the SciPy [59] package, and we also use COBYLA as a local optimizer in the libEnsemble [60] implementation of APOSMM [61, 62]. Given a fixed number of iterations, APOSMM, as a multistart method, runs the local optimizer until convergence and then restarts the optimization. This approach has been shown to work well in our previous work [39].

We first generate a random graph with 7 vertices, shown in the inset of Fig. 3 (a), simulating 14 qubits. The maximal modularity with up to 4 communities of this graph can be found by brute force (0.1790). We report the approximation ratio ρ (defined in (5)) found by QAOA in Fig. 3 (a). We first run QAOA 10 times for each p, with p ranging from 1 to 30, using COBYLA as the optimizer. Each run is given a different random seed and run until convergence. In Fig. 3 (a) we report the best approximation ratio found in the 10 runs. Note that local optimizers such as COBYLA are not guaranteed to find the optimal parameters, especially as p increases.
This is why the approximation ratio does not grow monotonically with p. To further improve the optimization, we therefore use the multistart method APOSMM, which runs an ensemble of local optimization solvers, with COBYLA as the local solver. We give APOSMM a limit of 30,000 iterations. The limit is chosen based on the empirical observation that with this choice APOSMM restarts COBYLA at least 10 times, usually many more. Indeed, with the multistart method, the results improve compared with using COBYLA alone. All results are presented in Fig. 3 (a). We observe that for this small graph, even if we increase p up to 30, QAOA at most finds an estimate of the ground state with approximation ratio 0.817. We also run QAOA experiments on slightly larger graphs, up to 10 vertices, with p up to 10. The results of these experiments are shown in Fig. 3 (b). For comparison, we run L-VQE on each graph 10 times with different random seeds. Each run is given a limit of 3,000 iterations, and we report the best result found by L-VQE in Fig. 3 (b). For each graph, L-VQE finds an estimate of the ground state with an approximation ratio ρ of at least 0.99.

TABLE I: Assuming full connectivity and compiling the higher-order terms in the Hamiltonian (12) into the gate set $\{r_z, \mathrm{CNOT}\}$, the gate count of QAOA scales quadratically with n, while that of L-VQE scales linearly (n denotes the number of graph vertices, i.e., 2n qubits). In our experiments presented in Fig. 3 (a), QAOA circuits with p steps consist of 77p single-qubit gates and 336p CNOT gates, while L-VQE with ℓ layers contains 52ℓ + 14 single-qubit gates and 26ℓ CNOT gates. Thus, we expect the L-VQE approach to be more robust to noise in real-life experiments. The CNOT count of QAOA can be decreased by further circuit optimizations and more efficient native gates. On the other hand, it would be increased if the connectivity is not full.

Gate count | QAOA with p steps | L-VQE with ℓ layers
CNOT count | $8n(n-1)p$ | $\ell(4n-2)$

[Figure 3: (a) "QAOA on a 7-node graph": approximation ratio versus p, for APOSMM and COBYLA. (b) "L-VQE vs QAOA": approximation ratio for QAOA with p = 1 to 10 and for L-VQE.]

FIG. 3: (a) Best approximation ratio found by QAOA for the 7-node graph (shown in the inset) with p ranging from 1 to 30. Even with the multistart method APOSMM to improve the optimizer COBYLA, we at most find an estimate of the ground state with approximation ratio 0.817. (b) Comparison of QAOA and L-VQE on graphs with 7 to 10 nodes, simulating 14–20 qubits. The L-VQE ansatz is iteratively grown up to ℓ = 2 layers. For each graph, L-VQE finds the ground state or a state close to it (with approximation ratio at least 0.99).

For a graph with n vertices, our approach requires 2n qubits in order to detect 4 communities. Assuming full connectivity and CNOT as the entangling gate, the gate counts of QAOA and L-VQE circuits are summarized in Table I. When p is small, the Hamiltonian-evolution ansatz used in QAOA is less expressive than the hardware-efficient ansatz used in L-VQE. Therefore a large circuit depth is needed to achieve the required overlap with the target state. At the same time, the cost function landscape of QAOA is highly nonconvex and contains many low-quality local optima, which makes finding high-quality parameters difficult for larger p. In addition, the Hamiltonian (12) contains many-body terms, which make it harder to compile into gates. The compilation is further complicated in practice by the limited connectivity of the hardware. In contrast, the L-VQE ansatz follows the connectivity of the hardware, since the Hamiltonian structure does not enter the ansatz explicitly.

B. VQE and L-VQE

To further examine the performance of L-VQE, we compare VQE with a fixed ansatz and L-VQE on larger problems. In Section VI B 1 we compare the performance of VQE and L-VQE with sampling noise, and in Section VI B 2 we compare their performance without sampling noise. The results are summarized in Section VI B 3.

TABLE II: Graph information of the NetworkX-generated instances.

| Graph class | Number of vertices |
| relaxed caveman | 20 |
| gaussian random | 20 |
| random partition | 20 |
| windmill | 17 |
| gnp random | 20 |
| power law cluster | 20 |

We generated 16 graph instances with NetworkX. Graph information is summarized in Table II. The goal is to find a clustering into up to 4 communities that maximizes the modularity; thus we simulate 34 qubits for the windmill graph and 40 qubits for all other graphs.

For VQE, we define a fixed form of the ansatz upfront and then iteratively optimize over all parameters. We compare 3 sets of ansätze, shown in Fig. 2 as Layer 0 only (ℓ = 0), Layers 0 to 1 (ℓ = 1), and Layers 0 to 2 (ℓ = 2). For L-VQE with ℓ = 0, the ansatz does not grow; thus the algorithm is the same as VQE with one ry gate acting on each qubit. For L-VQE with ℓ = 1 and ℓ = 2, we set the parameter k = 200 in Algorithm 1. In other words, we first run L-VQE with the Layer 0 ansatz for 200 iterations and then reuse the optimized parameters by embedding them in the ansatz with 1 layer and 2 layers, respectively. As before, we simulate the quantum circuits in MATLAB, with the Hamiltonian in the form of a matrix product operator [53]. For optimization, we use the sequential minimal optimizer (SMO) [55] and COBYLA [57]. For each graph and each approach, we initialize the ansatz with 10 different random seeds.
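The link between the number of communities and the qubit count, and the modularity being maximized, can be made concrete in a few lines. The sketch below assumes the qubit-frugal encoding assigns each vertex ⌈log₂ k⌉ qubits holding its community label in binary; the bit ordering used in `decode_labels` is a hypothetical choice, not necessarily the one behind Hamiltonian (12). Modularity is Newman's Q = (1/2m) Σᵢⱼ [Aᵢⱼ − dᵢdⱼ/(2m)] δ(cᵢ, cⱼ).

```python
import math

import numpy as np


def qubits_needed(n_vertices: int, k: int) -> int:
    """Qubit-frugal count: ceil(log2 k) qubits per vertex (a one-hot encoding needs k per vertex)."""
    return n_vertices * math.ceil(math.log2(k))


def decode_labels(bitstring: str, n_vertices: int, k: int) -> list[int]:
    """Hypothetical decoding: vertex i owns bits [i*b, (i+1)*b), read as a binary integer.

    When k is not a power of two, some labels >= k can appear; they are treated
    as additional communities here.
    """
    b = math.ceil(math.log2(k))
    return [int(bitstring[i * b:(i + 1) * b], 2) for i in range(n_vertices)]


def modularity(adj: np.ndarray, labels: list[int]) -> float:
    """Newman modularity Q = (1/2m) sum_ij [A_ij - d_i d_j / (2m)] delta(c_i, c_j)."""
    deg = adj.sum(axis=1)
    two_m = deg.sum()
    same = np.equal.outer(labels, labels)  # delta(c_i, c_j)
    mod_matrix = adj - np.outer(deg, deg) / two_m
    return float((mod_matrix * same).sum() / two_m)


if __name__ == "__main__":
    # Two triangles joined by one edge; the natural split has Q = 5/14.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = np.zeros((6, 6))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    labels = decode_labels("000000010101", n_vertices=6, k=4)
    print(labels)                             # [0, 0, 0, 1, 1, 1]
    print(round(modularity(adj, labels), 4))  # 0.3571
    print(qubits_needed(20, 4), qubits_needed(17, 4))  # 40 34
```

With k = 4, the count 2n reproduces the 40 qubits used for the 20-vertex graphs and 34 qubits for the 17-vertex windmill graph.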

1. VQE and L-VQE with sampling noise

We report the results of VQE and L-VQE with sampling noise in Tables III–VI. To evaluate the cost function $\langle \psi(\theta) | H | \psi(\theta) \rangle$, we execute the circuit, generate 2,000 samples, and use the sample mean as an estimator. Using a finite number of samples is a realistic setup, since the exact computation of the cost function becomes intractable as the system grows. In Table III, we report the best approximation ratio (ρ_best) achieved over the 10 runs using SMO for each graph with sampling noise. In Table IV, we report the average and standard deviation (ρ_average ± σ) of the approximation ratio over the 10 runs for each graph. We additionally report the results that use COBYLA as the optimizer in Tables V and VI.
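The sampled cost function can be sketched for a small statevector. Assuming, as for a classical cost, that H is diagonal in the computational basis, each shot yields a bitstring whose energy is known, and the mean over 2,000 shots is an unbiased estimate of ⟨ψ(θ)|H|ψ(θ)⟩ with a standard error that shrinks like 1/√shots. The random state and energies below are hypothetical placeholders, not the paper's MPS simulation.

```python
import numpy as np


def exact_expectation(state: np.ndarray, energies: np.ndarray) -> float:
    """<psi|H|psi> when H is diagonal with entries `energies` in the computational basis."""
    return float(np.abs(state) ** 2 @ energies)


def sampled_expectation(state: np.ndarray, energies: np.ndarray,
                        shots: int = 2000, rng=None) -> float:
    """Estimate <psi|H|psi> by the mean energy of `shots` measured bitstrings."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()  # guard against round-off before sampling
    outcomes = rng.choice(probs.size, size=shots, p=probs)
    return float(energies[outcomes].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n_qubits = 6
    energies = rng.normal(size=2**n_qubits)  # hypothetical diagonal cost
    state = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    state /= np.linalg.norm(state)
    exact = exact_expectation(state, energies)
    estimate = sampled_expectation(state, energies, shots=2000, rng=rng)
    probs = np.abs(state) ** 2
    std_err = np.sqrt(probs @ energies**2 - exact**2) / np.sqrt(2000)
    print(f"exact {exact:.4f}  sampled {estimate:.4f}  std. error {std_err:.4f}")
```

The optimizer only ever sees the noisy `sampled_expectation`; setting it aside in favor of `exact_expectation` corresponds to the noiseless experiments of Section VI B 2.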

2. VQE and L-VQE without sampling noise

We report the results of VQE and L-VQE without sampling noise in Tables VII–VIII. In each iteration we evaluate the cost function exactly. In Table VII, we report the best approximation ratio (ρ_best) achieved over the 10 runs using SMO for each graph without sampling noise. In Table VIII, we report the average and standard deviation (ρ_average ± σ) of the approximation ratio over the 10 runs for each graph.

3. Summary of VQE and L-VQE

Across all instances we set the approximation-ratio threshold to 0.99, 0.95, and 0.90, and in Table IX we report the percentage of local optimizer runs that find a quantum state with an approximation ratio above the threshold at the end of the algorithm. The rows in blue are the experiments with sampling noise (i.e., the cost function is estimated by the mean of the samples), and the rows in white are the experiments without sampling noise (i.e., the cost function is evaluated exactly).

TABLE III: Best approximation ratio with sampling noise using SMO. As the number of layers in the ansatz increases, the results of VQE deteriorate. L-VQE does not suffer from this problem, and its results improve as the number of layers grows.

| graph | VQE ρ_best | L-VQE ρ_best |

TABLE IV: Average approximation ratio with sampling noise using SMO. As the number of layers in the ansatz increases, the results of VQE deteriorate, whereas L-VQE achieves better results.

| graph | VQE ρ_average ± σ | L-VQE ρ_average ± σ |

Intuitively, when we increase the size of the ansatz, the ansatz becomes more expressive, and we should have a better chance of finding the ground state. However, for VQE with sampling noise, the results deteriorate as the number of layers in the ansatz increases, whereas for L-VQE the results improve as the ansatz grows. Moreover, it is not practical to evaluate the energy exactly in applications when the system gets larger. In L-VQE, by reusing the parameters optimized for the smaller ansatz as the starting point for the larger one, we achieve a higher probability of finding the ground state or a state that is sufficiently close to it.
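The parameter-reuse step of L-VQE can be sketched on a toy statevector problem. This is a minimal NumPy/SciPy sketch under several assumptions: the cost Hamiltonian is diagonal with random energies (not Hamiltonian (12)); each added layer is a hypothetical brick of nearest-neighbor blocks CNOT, RY⊗RY, CNOT, RY⊗RY, chosen because it reduces to the identity at zero angles and has 2 CNOTs and 4 RY gates per block (for 14 qubits this gives the 26 CNOTs and 52 single-qubit gates per layer listed in Table I, but the actual layer in Fig. 2 may differ); and every stage runs COBYLA for k iterations. New-layer parameters start at zero, so the grown circuit initially prepares exactly the state found by the smaller ansatz.

```python
import numpy as np
from scipy.optimize import minimize


def ry(angle):
    """Single-qubit RY rotation."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(psi, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored as a (2,)*n tensor."""
    return np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)


def apply_cnot(psi, control, target):
    """CNOT: flip the target axis on the slice where the control qubit is 1."""
    out = psi.copy()
    idx = [slice(None)] * psi.ndim
    idx[control] = 1
    t_axis = target if target < control else target - 1  # control axis is sliced away
    out[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis)
    return out


def brick_pairs(n):
    """Nearest-neighbor pairs: even bonds, then odd bonds (n - 1 blocks in total)."""
    return [(q, q + 1) for q in range(0, n - 1, 2)] + [(q, q + 1) for q in range(1, n - 1, 2)]


def prepare_state(theta, n, layers):
    psi = np.zeros((2,) * n)
    psi[(0,) * n] = 1.0
    for q in range(n):  # Layer 0: one RY per qubit
        psi = apply_1q(psi, ry(theta[q]), q)
    pos = n
    for _ in range(layers):
        for a, b in brick_pairs(n):
            # Hypothetical block: CNOT, RY x RY, CNOT, RY x RY (identity at zero angles).
            psi = apply_cnot(psi, a, b)
            psi = apply_1q(apply_1q(psi, ry(theta[pos]), a), ry(theta[pos + 1]), b)
            psi = apply_cnot(psi, a, b)
            psi = apply_1q(apply_1q(psi, ry(theta[pos + 2]), a), ry(theta[pos + 3]), b)
            pos += 4
    return psi.reshape(-1)


def energy(theta, n, layers, energies):
    """Exact <psi|H|psi> for a Hamiltonian diagonal in the computational basis."""
    return float(np.abs(prepare_state(theta, n, layers)) ** 2 @ energies)


def layer_vqe(energies, n, max_layers=2, k=200, seed=0):
    """Grow the ansatz one layer at a time, reusing the optimized parameters each time."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    history = []
    for layers in range(max_layers + 1):
        if layers > 0:
            # Embed the previous parameters; the new layer starts at zero (identity).
            theta = np.concatenate([theta, np.zeros(4 * (n - 1))])
        best = {"f": energy(theta, n, layers, energies), "x": theta.copy()}

        def tracked(t, layers=layers):
            f = energy(t, n, layers, energies)
            if f < best["f"]:  # keep the best point seen, whatever COBYLA returns
                best["f"], best["x"] = f, np.array(t, dtype=float)
            return f

        minimize(tracked, theta, method="COBYLA", options={"maxiter": k})
        theta = best["x"]
        history.append(best["f"])
    return theta, history


if __name__ == "__main__":
    n = 4
    energies = np.random.default_rng(1).normal(size=2**n)  # toy diagonal cost
    theta, history = layer_vqe(energies, n, max_layers=2, k=200)
    print(history, energies.min())
```

Because each stage starts from a point whose energy equals the previous stage's best, the recorded energies cannot increase as layers are added; a fixed deep ansatz started from random parameters has no such guarantee.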
By comparing the results of L-VQE with and without sampling noise, we see no significant difference, which suggests that our approach is relatively robust to sampling noise. We further investigate the effect of reusing parameters and adding layers to the ansatz in L-VQE; the results are summarized in Tables III–VIII.

TABLE V: Best approximation ratio with sampling noise using COBYLA. As the number of layers in the ansatz increases, the results of VQE deteriorate, whereas L-VQE achieves better results. Thus L-VQE is more robust under sampling noise than VQE.

| graph | VQE ρ_best | L-VQE ρ_best |

TABLE VI: Average approximation ratio with sampling noise using COBYLA. As the number of layers in the ansatz increases, the results of VQE deteriorate, whereas L-VQE achieves better results. Thus L-VQE is more robust under sampling noise than VQE.

| graph | conventional VQE ρ_average ± σ | L-VQE ρ_average ± σ |

We observe that for runs that start from the same initial Layer 0 ansatz, reusing the parameters obtained from that ansatz improves the results in most cases. Across all runs with sampling noise, for 1 layer, 147 of the 160 runs (91.88%) find a state with a better or equal approximation ratio compared with the ansatz with 0 layers. For 2 layers, 151 of the 160 runs (94.38%) find a state with a better or equal approximation ratio compared with the ansatz with 0 layers. Similarly, across all runs without sampling noise, for 1 layer, 150 of the 160 runs (93.75%) find a state with a better or equal approximation ratio compared with the ansatz with 0 layers.
For 2 layers, 151 of the 160 runs (94.38%) find a quantum state with a better or equal approximation ratio compared with the ansatz with 0 layers.

TABLE VII: Best approximation ratio without sampling noise using SMO. L-VQE is clearly more robust under sampling noise compared with VQE.

| graph | VQE ρ_best | L-VQE ρ_best |

TABLE VIII: Average approximation ratio without sampling noise using SMO. L-VQE is clearly more robust under sampling noise compared with VQE.

| graph | VQE ρ_average ± σ | L-VQE ρ_average ± σ |

C. Further Evidence of the Potential of L-VQE

To provide further evidence of the potential of L-VQE, we present a scaling analysis of L-VQE in Section VI C 1 and discuss simulation results of L-VQE on a trapped-ion noisy quantum simulator in Section VI C 2.

TABLE IX: Percentage of runs of local optimizers that reach a given approximation ratio: blue rows show results from experiments with sampling noise; white rows are from experiments without sampling noise. The optimizer is SMO. With sampling noise, as the number of layers in the ansatz increases, the results of VQE deteriorate, whereas L-VQE achieves better results. Thus L-VQE is more robust under sampling noise than VQE.

Approximation ratio > 0.99
| | 0 layers | 1 layer | 2 layers |
| VQE (with sampling noise) | 11.875% | 0.625% | 0.0% |
| L-VQE (with sampling noise) | 11.875% | 29.375% | 30.625% |
| VQE (without sampling noise) | 7.5% | 26.875% | 24.375% |
| L-VQE (without sampling noise) | 7.5% | 31.25% | 27.5% |

Approximation ratio > 0.95
| | 0 layers | 1 layer | 2 layers |
| VQE (with sampling noise) | 21.25% | 33.75% | 1.25% |
| L-VQE (with sampling noise) | 21.25% | 49.375% | 48.125% |
| VQE (without sampling noise) | 18.75% | 45.0% | 48.125% |
| L-VQE (without sampling noise) | 18.75% | 57.5% | 58.125% |

Approximation ratio > 0.90
| | 0 layers | 1 layer | 2 layers |
| VQE (with sampling noise) | 40.0% | 55% | 19.375% |
| L-VQE (with sampling noise) | 40.0% | 66.875% | 67.5% |
| VQE (without sampling noise) | 42.5% | 72.5% | 66.25% |
| L-VQE (without sampling noise) | 42.5% | 71.25% | 71.25% |
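Each Table IX entry is a percentage of 160 local-optimizer runs (16 graphs × 10 random seeds), so every entry is a multiple of 100/160 = 0.625%, and the counts quoted in the text (for example 147 of 160) follow the same arithmetic. The helper names in this small sketch are hypothetical.

```python
def percent_above(ratios, threshold):
    """Percentage of runs whose final approximation ratio exceeds `threshold`."""
    return 100.0 * sum(r > threshold for r in ratios) / len(ratios)


def runs_from_percent(percent, total_runs=160):
    """Recover the integer run count behind a reported percentage."""
    return round(percent * total_runs / 100.0)


if __name__ == "__main__":
    # First two rows of Table IX at threshold 0.99 (0, 1, 2 layers).
    rows_099 = {"VQE": [11.875, 0.625, 0.0], "L-VQE": [11.875, 29.375, 30.625]}
    for method, row in rows_099.items():
        print(method, [runs_from_percent(p) for p in row])
    print(100.0 * 147 / 160)  # 91.875, reported as 91.88%
```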

1. Scaling analysis

In this set of experiments, we generate random graphs with 8 to 20 vertices. For finding a clustering into up to 4 communities that maximizes the modularity, this means we need to simulate 16 to 40 qubits. For each graph and each approach, we run the experiments 10 times and record the average number of iterations needed for convergence. The results are shown in Fig. 4. The number of iterations grows polynomially as the number of vertices increases. Since the number of ry gates in the ansatz scales linearly with the number of qubits (ansatz shown in Fig. 2), the number of parameters to be optimized also scales linearly. In addition, the number of samples used to evaluate the cost function is fixed. Thus, the resources required for the entire algorithm scale polynomially. We point out, however, that our algorithm is heuristic by design, and there is no guarantee of obtaining a solution of a specified quality.

[Figure 4: Scaling analysis of L-VQE on a log-log scale; iterations y vs. number of vertices n. Fitted lines: 0 layers, log(y) = 1.53 log(n) + 2.47; 1 layer, log(y) = 1.75 log(n) + 3.10; 2 layers, log(y) = 1.64 log(n) + 3.79.]

FIG. 4: The average number of iterations until convergence scales polynomially with the size of the graph.

2. Noisy simulations

The experiments described in the preceding sections are simulated without gate noise, although we do simulate sampling noise in some cases. For demonstration purposes, in the next set of experiments we also investigate the performance of L-VQE using a trapped-ion noisy quantum simulator. We use realistic error rates in our simulations. Details of the noise model are given in Appendix C of [63] and in [64]. We run the experiments on a caveman graph with 20 nodes. For L-VQE with Layer 1 and Layer 2, we run the experiments 10 times each. Fig. 5 shows a violin plot of the results. We observe that as the size of the ansatz increases, the probability of finding the ground state or a state that is sufficiently close increases. This suggests that L-VQE is also relatively robust to hardware noise and can be adapted to different quantum architectures.

[Figure 5: approximation ratio of L-VQE in noisy simulation on the caveman graph]

FIG. 5: Violin plot of L-VQE performance on a trapped-ion noisy quantum simulator. The plot shows the probability density of the results, with the kernel density estimate truncated to (min(ρ), max(ρ)) (since the approximation ratio cannot exceed 1). As the size of the ansatz increases, the probability of finding the ground state or a state that is sufficiently close increases.

D. Entanglement vs no entanglement

Our next experiment aims to understand the role of entanglement in VQE. We use the same methodology as proposed in [8]: the experiment replaces each entangling CNOT gate with a T gate acting on both qubits. Compared with previous work, our simulator lets us investigate the algorithm's performance on larger problems. We run the experiments on 4 graphs (caveman, gnp, random, and gaussian). For each graph, we repeat the experiments 10 times with different random seeds. For the experiments with entanglement, we use the ansatz described in Fig. 2 with Layer 0 and Layer 1. For the experiments without entanglement, we replace all CNOT gates with a T gate acting on both qubits. The results are summarized in Table X, where we report the percentage of runs that reach the approximation ratio thresholds 0.99, 0.95, and 0.90. In both cases, with and without sampling noise, the ansatz with entanglement performs better than the ansatz without entanglement.

VII. CONCLUSIONS AND DISCUSSION

Combinatorial optimization on near-term quantum devices is a leading candidate to demonstrate quantum advantage, and hybrid quantum-classical algorithms have been developed to solve such problems. In this work, we propose an iterative L-VQE approach inspired by VQE. We specifically studied the application of k-community detection. In existing works, for a graph with n vertices, solving the k-community modularity maximization problem requires kn qubits to encode the problem as an Ising model Hamiltonian. We propose a novel qubit-frugal formulation that requires only $n \lceil \log_2 k \rceil$ qubits.

TABLE X: Percentage of experiments that reach the given approximation ratio threshold.

Approximation ratio > 0.99
| | Entanglement | No entanglement |
| With sampling noise | 15% | 0% |
| Without sampling noise | 37.5% | 32.5% |

Approximation ratio > 0.95
| | Entanglement | No entanglement |
| With sampling noise | 45% | 37.5% |
| Without sampling noise | 57.5% | 47.5% |

Approximation ratio > 0.90
| | Entanglement | No entanglement |
| With sampling noise | 65% | 60% |
| Without sampling noise | 70% | 57.5% |

We compared the performance of L-VQE with QAOA, which is widely considered a strong candidate for quantum advantage in applications with NISQ computers. However, the many-body terms in the Hamiltonian make the problem harder to implement in the QAOA setting. Moreover, the numerical results show that the optimization indeed gets harder, which suggests that L-VQE provides a practical alternative to QAOA for combinatorial optimization on noisy near-term quantum computers.

Unlike VQE, which has an ansatz fixed upfront, L-VQE starts from a simple and shallow hardware-efficient ansatz with a small number of parameterized gates and then adds layers to the ansatz systematically. This strategy allows us to make the ansatz more expressive while reducing the optimization overhead. Our numerical results suggest that adding layers to the ansatz indeed increases the probability of finding the ground state or a state that is sufficiently close to it. In the presence of sampling noise, however, VQE is more likely to fail. We empirically observe L-VQE to be more robust under sampling noise, making it a promising approach for NISQ devices. We use a matrix product state representation to perform large-scale simulations of the quantum circuits in MATLAB. Doing so allowed us to explore larger problems (simulations with up to 40 qubits and 352 parameters). We also studied the performance of L-VQE using a simulator of a noisy trapped-ion quantum computer.
The results suggest that our approach is relatively robust to hardware noise and can be adapted and generalized to different quantum architectures. Finally, we present numerical results on the role of entanglement in VQE. The results clearly show that the ansatz with entanglement performs better than the ansatz without entanglement. Our results are the first indication that introducing additional entangling parameters in VQE for classical problems, as proposed in [65, Section V-B], breaks down barriers in the optimization landscape, making it more convex and therefore more amenable to simple local outer-loop optimizers for finding a minimum. This is in sharp contrast with the previous results of Nannicini [8], who did not observe any beneficial effects of entanglement. The difference between our findings and those presented in [8] suggests how important the parameterization choice and the overall VQE procedure design are to the success of such methods. We hope that this work will lead to even better algorithms for designing ansätze for NISQ devices.

ACKNOWLEDGMENTS

We thank Jeffrey Larson for help with tuning APOSMM for QAOA parameter optimization. Clemson University is acknowledged for generous allotment of compute time on the Palmetto cluster. X.L., A.A., R.S., I.S. and Y.A. were supported in part with funding from the Defense Advanced Research Projects Agency (DARPA). R.S. and Y.A. were supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357. R.S. was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Accelerated Research for Quantum Computing program. L.C. was supported by the Laboratory Directed Research and Development (LDRD) program of Los Alamos National Laboratory (LANL) under project number 20200056DR. LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy (contract no. 89233218CNA000001). L.C. was also supported by the U.S. DOE, Office of Science, Office of Advanced Scientific Computing Research, under the Accelerated Research in Quantum Computing (ARQC) program.

[1] Y. Alexeev, D. Bacon, K. R. Brown, R. Calderbank, L. D. Carr, F. T. Chong, B. DeMarco, D. Englund, E. Farhi, B. Fefferman, et al., Quantum computer systems for scientific discovery, arXiv preprint arXiv:1912.07577 (2019).
[2] R. Shaydulin, H. Ushijima-Mwesigwa, C. F. A. Negre, I. Safro, S. M. Mniszewski, and Y. Alexeev, A hybrid approach for solving optimization problems on small quantum computers, Computer, 18 (2019).
[3] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum, 79 (2018).
[4] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm, arXiv preprint arXiv:1411.4028 (2014).
[5] S. Hadfield, Z. Wang, B. O'Gorman, E. Rieffel, D. Venturelli, and R. Biswas, From the quantum approximate optimization algorithm to a quantum alternating operator ansatz, Algorithms, 34 (2019).
[6] E. Farhi, J. Goldstone, S. Gutmann, and H. Neven, Quantum algorithms for fixed qubit architectures, arXiv preprint arXiv:1703.06199 (2017).
[7] L. Zhu, H. L. Tang, G. S. Barron, N. J. Mayhall, E. Barnes, and S. E. Economou, An adaptive quantum approximate optimization algorithm for solving combinatorial problems on a quantum computer, arXiv preprint arXiv:2005.10258 (2020).
[8] G. Nannicini, Performance of hybrid quantum-classical variational heuristics for combinatorial optimization, Physical Review E, 013304 (2019).
[9] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., Variational quantum algorithms, arXiv preprint arXiv:2012.09265 (2020).
[10] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, Connecting ansatz expressibility to gradient magnitudes and barren plateaus, arXiv preprint arXiv:2101.02138 (2021).
[11] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications, 1 (2018).
[12] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets, Nature, 242 (2017).
[13] R. Shaydulin, S. Hadfield, T. Hogg, and I. Safro, Classical symmetries and QAOA, arXiv preprint arXiv:2012.04713 (2020).
[14] S. Bravyi, A. Kliesch, R. Koenig, and E. Tang, Obstacles to state preparation and variational optimization from symmetry protection, arXiv preprint arXiv:1910.08980 (2019).
[15] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost-function-dependent barren plateaus in shallow quantum neural networks, arXiv preprint arXiv:2001.00550 (2020).
[16] K. Sharma, M. Cerezo, L. Cincio, and P. J. Coles, Trainability of dissipative perceptron-based quantum neural networks, arXiv preprint arXiv:2005.12458 (2020).
[17] A. Pesah, M. Cerezo, S. Wang, T. Volkoff, A. T. Sornborger, and P. J. Coles, Absence of barren plateaus in quantum convolutional neural networks, arXiv preprint arXiv:2011.02966 (2020).
[18] M. Cerezo and P. J. Coles, Impact of barren plateaus on the Hessian and higher order derivatives, arXiv preprint arXiv:2008.07454 (2020).
[19] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms, arXiv preprint arXiv:2007.14384 (2020).
[20] Z. Holmes, A. Arrasmith, B. Yan, P. J. Coles, A. Albrecht, and A. T. Sornborger, Barren plateaus preclude learning scramblers, arXiv preprint arXiv:2009.14808 (2020).
[21] T. J. Volkoff, Efficient trainability of linear optical modules in quantum optical neural networks, arXiv preprint arXiv:2008.09173 (2020).
[22] E. Campos, A. Nasrallah, and J. Biamonte, Abrupt transitions in variational quantum circuit training, arXiv preprint arXiv:2010.09720 (2020).
[23] C. Ortiz Marrero, M. Kieferová, and N. Wiebe, Entanglement induced barren plateaus, arXiv preprint arXiv:2010.15968 (2020).
[24] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, The power of quantum neural networks, arXiv preprint arXiv:2011.00027 (2020).

[25] S. Khairy, R. Shaydulin, L. Cincio, Y. Alexeev, and P. Balaprakash, Learning to optimize variational quantum circuits to solve combinatorial problems, Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) (2019).
[26] M. Wilson, S. Stromswold, F. Wudarski, S. Hadfield, N. M. Tubman, and E. Rieffel, Optimizing quantum heuristics with meta-learning, arXiv preprint arXiv:1908.03185 (2019).
[27] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, Learning to learn with quantum neural networks via classical neural networks, arXiv preprint arXiv:1907.05415 (2019).
[28] L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices, Physical Review X, 021067 (2020).
[29] G. E. Crooks, Performance of the quantum approximate optimization algorithm on the maximum cut problem, arXiv preprint arXiv:1811.08419 (2018).
[30] G. B. Mbeng, R. Fazio, and G. Santoro, Quantum annealing: A journey through digitalization, control, and hybrid quantum variational schemes, arXiv preprint arXiv:1906.08948 (2019).
[31] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J. Mayhall, An adaptive variational algorithm for exact molecular simulations on a quantum computer, Nature Communications, 1 (2019).
[32] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, Layerwise learning for quantum neural networks, arXiv preprint arXiv:2006.14904 (2020).
[33] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nature Communications, 4213 (2014).
[34] M. Paredes Quinones and C. Junqueira, Modeling Linear Inequality Constraints in Quadratic Binary Optimization for Variational Quantum Eigensolver, arXiv preprint arXiv:2007.13245 (2020).
[35] P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner, Improving variational quantum optimization using CVaR, Quantum, 256 (2020).
[36] M. Streif and M. Leib, Training the quantum approximate optimization algorithm without access to a quantum processing unit, Quantum Science and Technology, 034008 (2020).
[37] R. Shaydulin and S. M. Wild, Exploiting symmetry reduces the cost of training QAOA, arXiv preprint arXiv:2101.10296 (2021).
[38] F. G. Brandao, M. Broughton, E. Farhi, S. Gutmann, and H. Neven, For fixed control parameters the quantum approximate optimization algorithm's objective function value concentrates for typical instances, arXiv preprint arXiv:1812.04170 (2018).
[39] R. Shaydulin, I. Safro, and J. Larson, Multistart methods for quantum approximate optimization, in (IEEE, 2019) pp. 1–8.
[40] M. E. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences, 8577 (2006).
[41] A. M. Niklasson, S. M. Mniszewski, C. F. Negre, M. J. Cawkwell, P. J. Swart, J. Mohd-Yusof, T. C. Germann, M. E. Wall, N. Bock, E. H. Rubensson, et al., Graph-based linear scaling electronic structure theory, The Journal of Chemical Physics, 234101 (2016).
[42] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, The large-scale organization of metabolic networks, Nature, 651 (2000).
[43] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow, The anatomy of the Facebook social graph, arXiv preprint arXiv:1111.4503 (2011).
[44] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, Maximizing modularity is hard, arXiv preprint arXiv:physics/0608255 (2006).
[45] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms, arXiv preprint arXiv:2007.14384 (2020).
[46] C. Xue, Z.-Y. Chen, Y.-C. Wu, and G.-P. Guo, Effects of quantum noise on quantum approximate optimization algorithm, arXiv preprint arXiv:1909.02196 (2019).
[47] M. C. Nascimento and A. C. De Carvalho, Spectral methods for graph clustering–a survey, European Journal of Operational Research, 221 (2011).
[48] C. F. Negre, H. Ushijima-Mwesigwa, and S. M. Mniszewski, Detecting multiple communities using quantum annealing on the D-Wave system, PLoS ONE, e0227538 (2020).
[49] R. Shaydulin, H. Ushijima-Mwesigwa, I. Safro, S. Mniszewski, and Y. Alexeev, Network community detection on small quantum computers, Advanced Quantum Technologies, 1900029 (2019).
[50] H. Ushijima-Mwesigwa, C. F. Negre, and S. M. Mniszewski, Graph partitioning using quantum annealing on the D-Wave system, in Proceedings of the Second International Workshop on Post Moores Era Supercomputing (2017) pp. 22–29.
[51] H. Ushijima-Mwesigwa, R. Shaydulin, C. F. Negre, S. M. Mniszewski, Y. Alexeev, and I. Safro, Multilevel combinatorial optimization across quantum architectures, to appear in ACM Transactions on Quantum Computing, arXiv preprint arXiv:1910.09985 (2019).

[52] R. Shaydulin, H. Ushijima-Mwesigwa, I. Safro, S. Mniszewski, and Y. Alexeev, Community detection across emerging quantum architectures, Proceedings of the 3rd International Workshop on Post Moore's Era Supercomputing (2018).
[53] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics, 117 (2014).
[54] D. Lykov, R. Schutski, A. Galda, V. Vinokur, and Y. Alexeev, Tensor network quantum simulator with step-dependent parallelization, arXiv preprint arXiv:2012.02430 (2020).
[55] K. M. Nakanishi, K. Fujii, and S. Todo, Sequential minimal optimization for quantum-classical hybrid algorithms, arXiv preprint arXiv:1903.12166 (2019).
[56] Qiskit: An open-source framework for quantum computing (2019).
[57] M. J. Powell, A direct search optimization method that models the objective and constraint functions by linear interpolation, in Advances in optimization and numerical analysis (Springer, 1994) pp. 51–67.
[58] M. J. D. Powell, Direct search algorithms for optimization calculations, Acta Numerica, 287 (1998).
[59] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source scientific tools for Python (2001–), [Online].
[60] S. Hudson, J. Larson, S. M. Wild, and D. Bindel, libEnsemble users manual (2019).
[61] J. Larson and S. M. Wild, A batch, derivative-free algorithm for finding multiple local minima, Optimization and Engineering, 205 (2016).
[62] J. Larson and S. M. Wild, Asynchronously parallel optimization solver for finding multiple minima, Mathematical Programming Computation, 303 (2018).
[63] L. Cincio, K. Rudinger, M. Sarovar, and P. J. Coles, Machine learning of noise-resilient quantum circuits, arXiv preprint arXiv:2007.01210 (2020).
[64] C. J. Trout, M. Li, M. Gutiérrez, Y. Wu, S.-T. Wang, L. Duan, and K. R. Brown, Simulating the performance of a distance-3 surface code in a linear ion trap, New Journal of Physics, 043038 (2018).
[65] J. R. McClean, M. P. Harrigan, M. Mohseni, N. C. Rubin, Z. Jiang, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Low depth mechanisms for quantum optimization, arXiv preprint arXiv:2008.08615 (2020).

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan http://energy.gov/downloads/doe-public-access-plan