Empirical performance bounds for quantum approximate optimization
Phillip C. Lotshaw and Travis S. Humble
Quantum Computational Sciences Group, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830 USA

Rebekah Herrman and James Ostrowski
Department of Industrial and Systems Engineering, University of Tennessee at Knoxville, Knoxville, Tennessee 37996-2315 USA

George Siopsis
Department of Physics and Astronomy, University of Tennessee at Knoxville, Knoxville, Tennessee 37996-1200 USA
The quantum approximate optimization algorithm (QAOA) is a variational method for noisy, intermediate-scale quantum computers to solve combinatorial optimization problems. Quantifying performance bounds with respect to specific problem instances provides insight into when QAOA may be viable for solving real-world applications. Here, we solve every instance of MaxCut on non-isomorphic unweighted graphs with nine or fewer vertices by numerically simulating the pure-state dynamics of QAOA. Testing up to three layers of QAOA depth, we find that distributions of the approximation ratio narrow with increasing depth while the probability of recovering the maximum cut generally broadens. We find QAOA exceeds the Goemans-Williamson approximation ratio bound for most graphs. We also identify consistent patterns within the ensemble of optimized variational circuit parameters that offer highly efficient heuristics for solving MaxCut with QAOA. The resulting data set is presented as a benchmark for establishing empirical bounds on QAOA performance that may be used to test on-going experimental realizations.
I. INTRODUCTION
Noisy, intermediate-scale quantum (NISQ) computers may soon solve problems of practical importance with a quantum computational advantage [1, 2]. One promising approach uses the quantum approximate optimization algorithm (QAOA) [3-17] or its variants [18-24] to find approximate solutions to combinatorial optimization problems. By using alternating layers of a mixing operator and a cost operator, QAOA promises to prepare an approximation to the quantum state that maximizes the expectation value of the cost operator [3]. Moreover, the cost operator may represent an instance of unconstrained combinatorial optimization, which opens the application of QAOA to a wide variety of practical but challenging computational problems, including the well known graph problem MaxCut.

Alongside theoretical guarantees, the empirical performance of QAOA is an important open question for evaluating computational utility. The optimal number of alternating circuit layers as well as their tuning has been observed to vary with specific problem instances. Previous efforts examining MaxCut have tested performance in terms of the approximation ratio, which quantifies the average cost function value observed relative to the optimal value.
These include studies on families of problem instances represented by 2-regular graphs [3-5], 3-regular graphs [6-10], small samples of random graphs [11, 12], and small samples of graphs with various fixed symmetries [13]. Notably, Crooks has shown that QAOA may exceed the performance bounds set by the Goemans-Williamson algorithm [12], which yields the best conventional lower bound on the approximation ratio for MaxCut. This collection of results has encouraged further study of the more general circumstances under which QAOA may provide a practical improvement in performance.

An additional measure of QAOA performance is the resources required to prepare the optimized circuit layers. By design, the alternating layers of QAOA are tuned to prepare the approximate quantum state using parameterized gates optimized with respect to the observed cost value. However, the effort to identify these optimized circuits becomes intractable for large numbers of parameters. Previous studies have used optimization heuristics based on machine learning [12, 13] and various numerical optimization algorithms [6-8, 11, 22]. These results have been limited to small sets of graphs, often with simple regular structures, and an open question is whether there are general heuristics that yield good parameters quickly and consistently.

Here we use numerical simulations of pure-state dynamics to quantify performance bounds of QAOA solving an exhaustive set of MaxCut instances on all graphs with n ≤ 9 vertices at depths 1 ≤ p ≤ 3. The exhaustive problem-instance set gives a thorough and systematic account of QAOA behavior on MaxCut for small graphs. We also evaluate the effectiveness of circuit optimization using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [25] by benchmarking against exact optimization software and brute-force solutions for all graphs at p = 1 and for a subset of graphs at p = 2. We confirm BFGS returns optimal angles for QAOA state preparation in all these cases. Ultimately, our simulation results reveal patterns in the optimized circuits that support heuristics for more efficient parameter selection.

Our characterizations of QAOA performance for solving MaxCut address the approximation ratio and the probability of obtaining the optimal result. While the former measure estimates the average cost function value returned by QAOA relative to the actual optimal value, the latter quantifies the probability to recover the optimal cut. We compile the results for each graph instance into a data set [26] that serves as a validated benchmark to support experimental testing of QAOA on NISQ devices as well as analyses of how specific graph structure features are correlated with performance [27].

The remainder of the presentation is organized as follows: we review QAOA in Sec. II and discuss our approach to variational circuit simulations in Sec. III. We present results from these simulations in Sec. IV, including trends in the performance measures and gate parameters, before concluding with our key findings in Sec. V.

II. QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM
The quantum approximate optimization algorithm (QAOA) is a variational algorithm designed to find good approximate solutions to combinatorial optimization problems [3]. A problem instance is specified by a cost function C(z) with z = (z_{n−1}, z_{n−2}, ..., z_0) and z_j ∈ {0, 1}. QAOA recovers a candidate solution z*, and the quality of this candidate can be quantified by the normalized value of the cost function

R(z*) = C(z*)/C_max.  (1)

Here

C_max = max_z C(z)  (2)

is the globally optimal value with optimal solution

z_max = argmax_z C(z)  (3)

and R(z_max) = 1 while R < 1 otherwise. QAOA encodes each z as a quantum state with respect to the computational basis |z_j⟩ ∈ {|0⟩, |1⟩} such that |z⟩ = |z_{n−1}, ..., z_0⟩. The cost function C(z) is expressed as the operator Ĉ that is diagonal in the computational basis with matrix elements

⟨z|Ĉ|z⟩ = C(z).  (4)

The initial state of QAOA is taken as a uniform superposition of the computational basis states

|ψ⟩ = (1/√(2^n)) Σ_{z=0}^{2^n−1} |z⟩  (5)

that is transformed by a series of p unitary operations as

|ψ_p(γ, β)⟩ = [ ∏_{q=1}^{p} Û(B̂, β_q) Û(Ĉ, γ_q) ] |ψ⟩  (6)

with variational angle parameters γ = (γ_1, ..., γ_p) and β = (β_1, ..., β_p). The first unitary operator

Û(Ĉ, γ_q) = exp(−iγ_q Ĉ)  (7)

applies a C(z)-dependent phase to each of the computational basis states. The second unitary

Û(B̂, β_q) = exp(−iβ_q B̂)  (8)

applies coupling in the computational basis as

B̂ = Σ_{j=0}^{n−1} X̂_j,  (9)

where X̂_j is the Pauli X operator on qubit j. The resulting transitions between the {|z⟩} depend on the C(z)-dependent phases from Û(Ĉ, γ_q) and the angle parameters (γ, β). QAOA selects these parameters to maximize the value ⟨Ĉ⟩, as discussed in more detail in Sec. II A. Measurement of the state |ψ_p(γ, β)⟩ in the computational basis yields each z_j ∈ {0, 1} and therefore a candidate solution z*.
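For small n, the state preparation of Eqs. (5)-(9) can be simulated directly with a dense state vector. The sketch below is our own illustration, not the authors' simulation code; the function names and the example graph are arbitrary. The mixing unitary exp(−iβX̂_q) is applied qubit by qubit using exp(−iβX) = cos(β)·1 − i·sin(β)·X.

```python
import numpy as np

def maxcut_cost(n, edges):
    """C(z) of the MaxCut cost function for every basis state z = (z_{n-1}, ..., z_0)."""
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1              # bit j of each z
    return sum(bits[:, j] ^ bits[:, k] for j, k in edges).astype(float)

def qaoa_state(n, edges, gammas, betas):
    """|psi_p(gamma, beta)> of Eq. (6), built layer by layer."""
    C = maxcut_cost(n, edges)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # uniform superposition, Eq. (5)
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * C) * psi                  # Eq. (7): diagonal phase layer
        for q in range(n):                               # Eqs. (8)-(9): exp(-i b X_q) per qubit
            flip = np.arange(2 ** n) ^ (1 << q)          # index with bit q flipped
            psi = np.cos(b) * psi - 1j * np.sin(b) * psi[flip]
    return psi

def cost_expectation(n, edges, gammas, betas):
    """<C> under the measurement probabilities |<z|psi_p>|^2."""
    P = np.abs(qaoa_state(n, edges, gammas, betas)) ** 2
    return float(P @ maxcut_cost(n, edges))

# Single edge at p = 1: the angles (gamma, beta) = (pi/2, pi/8) cut the edge
# with certainty, so <C> = 1.
print(cost_expectation(2, [(0, 1)], [np.pi / 2], [np.pi / 8]))
```

For a single edge this reproduces the analytic p = 1 value ⟨Ĉ⟩ = 1/2 + (1/2) sin(4β) sin(γ), which equals 1 at the angles shown.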
The probability to observe a specific z is given by the Born rule

P(z) = |⟨z|ψ_p(γ, β)⟩|².  (10)

Following measurement, the observed result z is used to calculate the cost C(z). Repeating the sequence of preparation and measurement approximates the distribution of z given by Eq. (10). The proposed solution z* may then be selected as the argument that yields the largest cost function as defined by Eq. (1).

As shown by Farhi et al. [3], the probability that a measurement returns the optimal solution z_max converges to unity as the QAOA depth p goes to infinity. Less is known about the performance of QAOA at finite p, where studies suggest a modest p may suffice for achieving a quantum computational advantage [7, 12].

A. Gate parameter optimization
Finding optimal solutions, or good approximations, to Eq. (1) requires optimizing the angle parameters that tune each QAOA layer. A standard approach to optimizing these angles is to pick an initial set of values (γ^(0), β^(0)) and then prepare and measure many copies of the state |ψ_p(γ^(0), β^(0))⟩ to estimate the expectation value of the cost function operator as

⟨Ĉ(γ, β)⟩ = Σ_z P(z) C(z).  (11)

Analytically, P(z) depends on |ψ_p(γ, β)⟩ through Eq. (10). When using an optimization algorithm, such as gradient descent or BFGS [25], new angles (γ^(1), β^(1)) are tested for increases in ⟨Ĉ⟩. After (γ^(1), β^(1)) have been selected, the process is repeated with the new angles: prepare and measure a set of states |ψ_p(γ^(1), β^(1))⟩, send the results {z} to a classical computer to calculate an estimate of ⟨Ĉ(γ^(1), β^(1))⟩, then pick new angles to try with the quantum computer. The optimization process is repeated until a set of angles is found to maximize ⟨Ĉ⟩. States with the optimized angles are repeatedly prepared and measured to sample z and identify a solution z* that gives the best approximation.

Previous studies have found patterns in the optimized parameters that can potentially simplify gate-angle optimization. Zhou et al. [6] examined regular graphs and developed parameter optimization approaches for deep circuits using linear interpolation or discrete cosine and sine transformations of parameters from lower depths. Brandão et al. [9] have argued that parameters should be transferable between graphs with sufficiently similar structure, suggesting that approaches like that of Zhou et al. may be applicable within any families of structured graphs.
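The measure-and-update loop described at the start of this subsection can be emulated classically by replacing sampled estimates with exact expectation values from a state-vector simulation. The sketch below is an illustration under that assumption, not the authors' pipeline; the graph, the number of random starts, and the helper names are our own choices. It uses scipy's BFGS on the negated cost expectation.

```python
import numpy as np
from scipy.optimize import minimize

def cost_expectation(n, edges, angles):
    """Exact <C(gamma, beta)> of Eq. (11) for depth p = len(angles) // 2."""
    p = len(angles) // 2
    gammas, betas = angles[:p], angles[p:]
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1
    C = sum(bits[:, j] ^ bits[:, k] for j, k in edges).astype(float)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * C) * psi                  # phase layer
        for q in range(n):                               # mixing layer, qubit by qubit
            psi = np.cos(b) * psi - 1j * np.sin(b) * psi[z ^ (1 << q)]
    return float(np.abs(psi) ** 2 @ C)

def optimize_angles(n, edges, p, seeds=20, rng=np.random.default_rng(0)):
    """Multi-start BFGS: maximize <C> by minimizing its negative from random seeds."""
    best = None
    for _ in range(seeds):
        x0 = rng.uniform(-np.pi, np.pi, size=2 * p)
        res = minimize(lambda x: -cost_expectation(n, edges, x), x0, method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return -best.fun, best.x

# Path graph 0-1-2 at p = 1; the p = 1 closed form gives the optimum 1 + 3*sqrt(3)/8.
val, angles = optimize_angles(3, [(0, 1), (1, 2)], p=1)
print(round(val, 4))   # ~ 1.6495
```

Keeping the best of several random starts mirrors the multi-start strategy used later in Sec. III B, where the number of seeds grows with the depth p.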
Other studies suggest that parameter patterns apply more generally between varying types of graphs, as observed in small samples of random graphs [11, 12], or perhaps even between varying problems and QAOA-like algorithms, as in an approach to Max-k Vertex Cover using a variant formulation of QAOA [22]. Together these studies suggest that parameter patterns may be generic and important in simplifying optimization for QAOA. However, it is unclear if these patterns will extend over general graphs or if they are byproducts of the structures and small samples of graphs examined so far.

In addition to the angle parameters, a depth parameter p defines a given state preparation circuit and instance of QAOA. Higher depths theoretically give better performance [3], but they also require more computationally intensive optimizations. It is therefore efficient to use the smallest depth needed to obtain a desired quality of solution. Previous studies have characterized performance for 2-regular graphs at arbitrary depths [3, 4] and established performance bounds for 3-regular graphs at several low depths [10]. For general graphs, an analytical expression that characterizes performance has been derived for the lowest-possible depth p = 1 [4, 5] and the relationship between depth and performance under specific parameter schedules has been examined for graphs with varying symmetry [13]. However, much less is known about QAOA performance on general graphs at depths p > 1.

B. Characterizing QAOA performance
The quality of a solution z* is characterized by the normalized cost function in Eq. (1). In QAOA, z* depends on measurements that probabilistically sample the computational basis states following Eq. (10). Therefore, we characterize QAOA performance using the quantum approximation ratio

r = ⟨R(z)⟩,  (12)

where R(z) is the normalized cost and the expectation value is taken as in Eq. (11). The quantity r describes the average R(z) with respect to measurements of |ψ_p(γ, β)⟩. An argument about concentration in QAOA claims that solutions z* with R(z*) ≥ r can be expected with high probability given a modest number of quantum state measurements [3]. The quantum r is thus also related to the classical notion of an approximation ratio as a minimum bound on performance for the optimization algorithm.

A second way to characterize QAOA success quantifies the probability to obtain the optimal solution z_max from Eq. (3). In general, there can be multiple optimal solutions, which we denote as the set of k optimal solutions {z^i_max}_{i∈[k]}, and the probability of optimizing the cost function is

P(C_max) = Σ_{i=1}^{k} P(z^i_max),  (13)

where P(z^i_max) is the probability of z^i_max from Eq. (10). Letting P_s be a desired threshold probability to recover the maximum cut, the minimal number of samples N_s needed to observe a cut that maximizes the cost function is

N_s = log(1 − P_s) / log(1 − P(C_max)),  (14)

since the probability to observe no maximum cut in N_s samples is P′ = (1 − P(C_max))^{N_s}, and setting P′ = 1 − P_s gives Eq. (14).

C. MaxCut
MaxCut partitions a graph G such that the number of edges shared between the partitions is maximized. Let G = (V, E) with V = {n−1, ..., 0} the set of vertex labels and E = {⟨j,k⟩ : j,k ∈ V} the set of unweighted edges. For each vertex j ∈ V, a cut z = (z_{n−1}, ..., z_0) assigns a binary label z_j ∈ {0, 1} that denotes the corresponding set. We express the cost function for MaxCut as a sum of pair-wise clauses

C(z) = Σ_{⟨j,k⟩} C_{⟨j,k⟩}(z),  (15)

with

C_{⟨j,k⟩}(z) = z_j + z_k − 2 z_j z_k = { 1 if z_j ≠ z_k ; 0 if z_j = z_k }.  (16)

An individual clause C_{⟨j,k⟩}(z) is maximized when two vertices j, k connected by an edge are assigned to opposing sets, and the purpose of MaxCut is to find a cut z_max that maximizes Eq. (15). There are always at least two maximum cuts, since each C_{⟨j,k⟩} is invariant under the exchange 0 ↔ 1 of every z_j in any z. From Eq. (4), the cost function is cast as a sum of operators for the edges

Ĉ = Σ_{⟨j,k⟩} Ĉ_{⟨j,k⟩},  (17)

with edge operators

Ĉ_{⟨j,k⟩} = (1/2)(1̂ − Ẑ_j Ẑ_k),  (18)

where 1̂ is the identity operator and Ẑ_j is the Pauli Z operator on qubit j. The notation Ẑ_j Ẑ_k uses an implicit tensor product with the identity operator on the Hilbert space of unlisted qubits.

III. SIMULATIONS OF QAOA
We simulate QAOA to solve MaxCut on all connected non-isomorphic graphs with n ≤ 9 vertices. The number of such graphs N_n grows rapidly with the number of vertices n, as shown in Table I.

n    2  3  4  5   6    7    8      9
N_n  1  2  6  21  112  853  11117  261080

TABLE I: The number of connected non-isomorphic graphs N_n grows with the number of vertices n.

A. Global optimal solutions for p = 1

We first optimize instances of QAOA for depth p = 1 using the expectation value of the cost function operator

⟨Ĉ(γ, β)⟩ = Σ_{⟨j,k⟩} ⟨Ĉ_{⟨j,k⟩}(γ, β)⟩.  (19)

Previously, Hadfield derived a general expression for ⟨Ĉ_{⟨j,k⟩}⟩ at depth p = 1 as [5]

⟨Ĉ_{⟨j,k⟩}(γ, β)⟩ = 1/2 + (1/4) sin(4β) sin(γ) [cos^d(γ) + cos^e(γ)] − (1/4) sin²(2β) cos^{d+e−2f}(γ) [1 − cos^f(2γ)],  (20)

where d = deg(j) − 1 and e = deg(k) − 1 count the additional neighbors of the vertices j and k connected by the edge ⟨j,k⟩, and f is the number of triangles containing ⟨j,k⟩.

We maximize ⟨Ĉ(γ, β)⟩ using the numerical optimization software Couenne [29, 30], which optimizes problems of the form

max F(x, y)
s.t. p_i(x, y) ≤ 0 ∀ i ∈ [n]
x ∈ R^n, y ∈ Z^n.  (21)

Here, we specify the continuous variable x = (γ, β) and the integer variable y is not used. The function F(x, y) then corresponds to Eq. (19), while the polynomial constraints p_i(x) are not included since there are no constraints on the angles. Couenne simplifies this problem by reformulating it with auxiliary variables. Then, a convex relaxation of the reformulated problem is found and solved using branch-and-bound techniques. The process is repeated until an optimal solution is recovered [29, 30]. The results provide globally optimal instances of QAOA for p = 1.

B. Numerical searches for general p

For QAOA depth p > 1, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm [25]. BFGS has been used previously in a wide variety of contexts including QAOA [6]. BFGS begins with an initial set of angles and then iteratively finds angles that converge to a local maximum of ⟨Ĉ(γ, β)⟩. Each iteration is determined by the numerical gradient of ⟨Ĉ⟩ and an approximate Hessian second-derivative matrix that is constructed from the gradient at successive steps. Using the approximate Hessian when calculating the steps gives faster convergence than first-order methods while also avoiding the computational expense of calculating the exact Hessian.

FIG. 1: Convergence of multi-start BFGS optimization for the 853 graphs with n = 7 vertices and depths p = 1, 2, 3 in panels (a)-(c). Solid lines show the mean r over the graphs and black dashed lines show ± one standard deviation. For p = 1, BFGS converges to the global optimal solutions from Couenne, shown in red.

We test convergence of BFGS using random initial angles and monitoring the local maximum of ⟨Ĉ⟩. We repeat the initialization procedure a fixed number of times that varies with depth p: we use 50 random-angle seeds when p = 1, 100 random seeds when p = 2, and 500 seeds when p = 3. As shown in Fig. 1(a), the average and standard deviation of the quantum approximation ratio at p = 1 quickly converge to the solutions obtained using Couenne. We compared results from BFGS to results from Couenne for every graph with n ≤ 8; we could not obtain Couenne results at n = 9 due to numerical issues with Couenne, so instead we checked these results using brute-force searches. As shown in Figs. 1(b) and (c), we observe similar behavior in convergence for the cases of p = 2 and 3.

We extend our validation of BFGS using brute-force search at n = 9 for p = 1 and for the smallest graphs at p = 2, evaluating ⟨Ĉ⟩ on a grid with spacing π/100 for each of the angles (γ_1, β_1) at p = 1 and (γ_1, γ_2, β_1, β_2) at p = 2. We found both BFGS and brute force return values for the optimal ⟨Ĉ⟩ that agree to within a small additive difference. As BFGS consistently recovers the larger maxima, we attribute these differences to coarse-graining in the brute-force method. We conclude that BFGS is finding globally optimal results at p = 2 for these graphs.

To verify our results are consistent for the remaining graphs at p = 2 and for all n at p = 3, we run the BFGS calculations a second time using different sets of 100 or 500 random seeds. We observe an increase in ⟨Ĉ⟩ after the second round of BFGS for less than 1% of graphs at each n and p. We conclude that BFGS typically finds globally optimal solutions at these higher n and p, though it is unclear how deviations may grow beyond p = 3.

C. Symmetry analysis
The optimized angles recovered by BFGS exhibit a variety of symmetries, in which multiple angle solutions give the same optimized expectation value of the cost function. We simplify these results by systematically reporting a single assignment for the optimized angles, to compare the distributions of angles recovered across different graph instances. Symmetry analysis proves essential for revealing patterns in the optimized angles discussed in Secs. IV C-IV D. We summarize how we use the symmetries below, with detailed descriptions of the symmetries deferred to the Appendix.

Many of these observed symmetries are identified by extending the analysis of Zhou et al. for regular graphs [6], which we apply more broadly to graphs where each vertex has even degree or each vertex has odd degree. Specifically, the angles β_q are periodic over intervals of π/2, so we take −π/4 ≤ β_q ≤ π/4 for all q. We always set β_1 < 0. There are no analogous periodicities in the γ_q for generic graphs, so generally these have −π ≤ γ_q ≤ π for all q. For graphs where every vertex has even degree or every vertex has odd degree, we use symmetries to always transform to angles in the half-intervals −π/2 ≤ γ_q ≤ π/2. A small number of graphs have additional symmetries, and for these we report angles with the most component (γ_q, β_q) pairs inside the intervals

Γ = [−π/2, 0],  B = [−π/4, 0].  (22)

IV. QAOA SIMULATION RESULTS
We analyze QAOA performance on the MaxCut problem for graphs with various numbers of vertices n at various depths p, using the two performance measures of Section II B. We also discuss patterns in the observed optimized angle distributions, and we construct a search heuristic that uses these patterns to reduce the computational expense of parameter optimization.

FIG. 2: Distributions of the approximation ratios r for graphs with n = 7 vertices. Panels (a)-(d) show p = 0, ..., 3; the dashed line marks the Goemans-Williamson ratio r_GW.

p   r̄      σ      F̄_0.7   F̄_0.8   F̄_0.9
0   0.633  0.063  0.116   0.004   0.000
1   0.809  0.053  0.986   0.551   0.039
2   0.884  0.037  1.000   1.000   0.321
3   0.930  0.029  1.000   1.000   0.825

TABLE II: Mean r̄, standard deviation σ, and complementary cumulative distribution functions F̄_R(r) of Eq. (23) for the r distributions in Fig. 2.

A. Approximation ratio
Our first measure of QAOA performance on MaxCut is the approximation ratio r from Eq. (12). We present the distribution of r across graphs with various n and p. Figure 2 shows an example of the distribution of r for all 853 graphs with n = 7 vertices for depths p = 0, ..., 3, where p = 0 corresponds to the initial state in Eq. (5). Trends for other n are similar and described further below. In Fig. 2, the width of the plotted bars is smaller than the histogram bin width w = 0.02 to clearly visualize the distributions. Each panel also overlays a Gaussian distribution (w/√(2πσ²)) exp(−(r − r̄)²/2σ²) using the mean r̄ and standard deviation σ in Table II as calculated from the observed distributions of r.

Figure 2(a) shows r for the initial states of Eq. (5). Every z is equally probable in these states, so r is equivalent to the approximation ratio obtained by averaging random cuts from the uniform distribution. This cuts half the edges in E on average, ⟨Ĉ⟩ = |E|/2, while the maximum number of cut edges is bounded by the total number of edges, C_max ≤ |E|, so r = ⟨Ĉ⟩/C_max ≥ 1/2. Figures 2(b)-(d) show r for p = 1, 2, 3. The mean of r increases with p and approaches one as p increases. QAOA instances also perform more similarly with increasing depth, as indicated by the decrease in the standard deviation of r.

We further quantify increases in r with p by calculating F̄_R(r) as the fraction of graphs with approximation ratio r that exceeds a threshold R. This is given by the complementary cumulative distribution function

F̄_R(r) = (1/N_n) Σ_{G(n)} Θ(r − R),  (23)

where N_n is the number of non-isomorphic connected graphs with n vertices, Σ_{G(n)} is the sum over these graphs, and Θ is the Heaviside step function. Table II presents F̄_R(r) for n = 7 with R = 0.7, 0.8, and 0.9. Nearly all graphs have r ≥ 0.7 at p = 1 and more than half exceed r ≥ 0.8. At p = 2, all graphs have r > 0.8, while a majority exceed r > 0.9 only at p = 3. We note that the latter results exceed the best conventional lower bound for the approximation ratio, r_GW = 0.878, from the Goemans-Williamson algorithm.

Figure 3 shows the mean r for all n ≤ 9. Similar trends are observed for all n: the mean r increases as p increases and the distributions become narrower in terms of the standard deviation. The distributions of r over different graphs at the same n are well described by Gaussian distributions when n ≥ 7, while for n < 7 the small number of graphs limits such a description. Figure 3(a) shows that the means for p = 0 are approximately 2/3 for all n, while the standard deviation decreases with increasing n as the number of sampled graphs N_n grows. For p > 0 in Figs. 3(b)-(d), the smallest graphs reach an optimal r = 1 while larger graphs decay to steady values of r < 1 as n increases. There is not much variation in the mean or standard deviation of r between nearby n, especially at the largest n, so larger n might be expected to return similar r distributions. As p increases, at each n the mean r becomes larger and the standard deviations become smaller, which is consistent with the trends at n = 7 seen in Fig. 2. The mean r exceeds the Goemans-Williamson r_GW by at least a standard deviation at p = 3.

B. Probability of the maximum cut
We next assess the performance of QAOA in terms of the probability for obtaining the maximum cut, P(C_max). Figure 4 shows an example of the normalized distribution of P(C_max) for the 853 graphs with n = 7 at varying depths p. Statistics describing these distributions are given in Table III. The distribution for p = 0 shown in Fig. 4(a) peaks around P(C_max) ≈ 1/2^{n−1} at small values.

FIG. 3: Mean r across all graphs with various numbers of vertices n and depths p. Panels (a)-(d) show p = 0, ..., 3. In addition to the mean r, the dashed red line represents the approximation ratio r_GW of the Goemans-Williamson algorithm.

FIG. 4: Distributions of probabilities to obtain the maximum cut P(C_max) for graphs with n = 7 vertices. Panels (a)-(d) show p = 0, ..., 3.
p   P̄(C_max)  σ      F̄_{R1}  F̄_{R2}  F̄_{R3}
0   0.046     0.051  0.009   0.001   0.000
1   0.211     0.132  0.278   0.035   0.009
2   0.426     0.172  0.835   0.318   0.046
3   0.610     0.187  0.975   0.715   0.252

TABLE III: Mean P̄(C_max), standard deviation σ, and complementary cumulative distribution functions F̄_R(P(C_max)), similar to Eq. (23), at three increasing thresholds R for the P(C_max) distributions in Fig. 4.

An exponential fit to the p = 0 distribution yields exp(−κ P(C_max)) with κ ≈ 29. At p = 0, each possible cut is equally probable in the initial state, and there are 2^n possible cuts with at least two maximum cuts. This implies the bound

P_0(C_max) ≥ 2/2^n = 1/2^{n−1},  (24)

where the "0" subscript refers to the initial state. The minimum value P_0(C_max) = 1/2^{n−1} gives the main contribution to the lowest bar in Fig. 4(a), while higher values are obtained for those graphs with additional symmetries, for example, in a complete graph where symmetry gives n!/(⌊n/2⌋!⌈n/2⌉!) maximum cuts. The peak around P(C_max) = 1/2^{n−1} indicates that large degeneracies like this are uncommon in the total set of graphs.

Figure 4(b) shows P(C_max) when p = 1. We fit this distribution with an exponential starting at the second histogram bar, similar to the p = 0 distribution. This yields a smaller decay constant κ ≈ 7. Panels (c) and (d) show the distributions for p = 2 and 3, respectively. These distributions are better described by Gaussian distributions, with means and standard deviations presented in Table III. The means increase with p and the distributions widen, indicating greater differences between graphs at larger p. The graphs pass thresholds P(C_max) > R gradually, as seen by F̄_R(P(C_max)).

We contrast distributions of P(C_max) against distributions of the quantum approximation ratio r in Fig. 2 and Table II. The mean of P(C_max) increases the least when going from p = 0 to p = 1, while by contrast the mean of r increases the most in going from p = 0 to p = 1. The exponential and Gaussian distributions are also significantly different at small p. At higher p, P(C_max) and r both follow Gaussian distributions, but there are differences in how their widths change with p, with increasing differences between different graphs for P(C_max) and the opposite for r.

We explain why distributions of r have high averages and narrow widths when concurrently P(C_max) presents broad widths and relatively small averages. Consider the normalized eigenspectrum Ĉ/C_max of energies R(z) = C(z)/C_max. For each n, we made a histogram of R(z) for each graph, then averaged these values to obtain an average spectrum for R(z). Figure 5 shows the average spectrum of R(z) at n = 7 vertices, with each data point indicating the fraction of basis states in a histogram bin of width 1/12, averaged over all graphs with n = 7. The results look similar for other n.

FIG. 5: Average spectrum of the normalized cost function R(z) from Eq. (1) for graphs with n = 7 vertices.

We show error bars in the figure denoting the first and third quartiles of the distributions of the fractions of basis states, calculated by listing the fractions for each graph in increasing order, then taking the entries 1/4 and 3/4 of the way up the list, rounded to the nearest integer. Most basis states give sub-optimal cuts with R(z) < 1, with most of the weight at intermediate values of R(z) and few optimal states at R(z) = 1. Thus, optimization of r in state preparation favors large total probabilities in many states with near-optimal R(z) over the smaller probability of preparing truly optimal states. This gives relatively high and uniform r for different graphs but a much more variable P(C_max).

Figure 6 shows how the average of P(C_max) varies for the distribution of graphs at various n and p, using asymmetric error bars expressed as quartiles. The mean of P(C_max) decreases exponentially with n, as shown by the blue fit to the averages

P_mean^fit(C_max) = N_p e^{−k_p n}  (25)

with the fit parameters in Table IV. We fit to a subset of the data at each p to include only the largest n where decay is observed, and we show the n we fit at each p by the location of the fit curve.

At p = 0 the first quartiles follow the minimum of Eq. (24), shown by the dashed black line. This indicates many of the graphs saturate the minimum theoretical value of P(C_max). As n increases, the averages and quartiles of the P(C_max) distributions approach small values near the minimum of Eq. (24).

Figure 6(b) shows the averages and quartiles of the distributions of P(C_max) at p = 1. The average P(C_max) have increased relative to p = 0 and are again decreasing exponentially with n, but the rate of exponential decay k_p is about half of the rate at p = 0, see Table IV. The error bars denoting quartiles shrink for n = 3, as these graphs are approaching full optimization. For higher n the error bars increase, indicating there are larger deviations between different graphs at p = 1.

TABLE IV: Parameters N_p and k_p for the exponential fits of Eq. (25) in Fig. 6.

The same trends continue at higher p: the average P(C_max) increases with p, with decreasing exponential decay constants in Table IV. The distributions widen as p increases at large n, indicating P(C_max) is increasingly sensitive to graph structure. This is in contrast to the distributions of r, which narrow as p increases, with more similar r for different graphs.

C. Optimized angle distributions
We next discuss the distribution of optimized angle parameters, shown in Figs. 7-9 for the 853 graphs at n = 7. These figures present two-dimensional histograms of the angle distributions, where the optimized (γ_j, β_j) have been organized into bins of size π/20 × π/20 and counted. We use a logarithmic color scale to visualize the distributions and add a base of one count in each bin so the logarithm does not diverge when there are zero counts.

Patterns have been observed previously in the optimized angles for solving simple families of graphs. Optimized angle patterns have been observed for 3-regular graphs [6], small samples of random graphs [11, 12], and even in a variant formulation of QAOA solving the Max-k Vertex Cover problem [22]. An argument has been given that angle patterns should be expected within families of graphs with systematic structure [9]. However, it is not clear if similar angle patterns hold over the set of general graphs considered here.
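The binning just described can be sketched with numpy's histogram2d. The angle samples below are synthetic stand-ins for optimized (γ_1, β_1) values, used only to illustrate the π/20 bins and the log color scale with its base of one count; the cluster location and spread are our own assumptions, not the paper's data.

```python
import numpy as np

# Hypothetical optimized (gamma_1, beta_1) pairs standing in for the BFGS results.
rng = np.random.default_rng(1)
gammas = rng.normal(-0.35, 0.05, size=853)    # stand-in cluster at small negative gamma
betas = rng.normal(-0.20, 0.03, size=853)     # stand-in cluster at small negative beta

w = np.pi / 20                                 # bin width used in Figs. 7-9
gamma_edges = np.arange(-np.pi, np.pi + w / 2, w)       # gamma axis: [-pi, pi]
beta_edges = np.arange(-np.pi / 4, np.pi / 4 + w / 2, w)  # beta axis: [-pi/4, pi/4]
counts, _, _ = np.histogram2d(gammas, betas, bins=[gamma_edges, beta_edges])

# Log color scale with a base of one count, so empty bins map to log10(1) = 0.
log_counts = np.log10(counts + 1)
print(int(counts.sum()))   # all 853 sample pairs fall inside the plotted window
```

Plotting log_counts (e.g., with matplotlib's imshow) then reproduces the style of Fig. 7, with empty bins at zero on the color scale rather than at negative infinity.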
1. Angle patterns at p = 1

Figure 7 shows the distribution of optimized angles for all graphs with n = 7 vertices at depth p = 1 in Eq. (6). The overwhelming majority of the angles are focused in the bright spot at small negative γ₁ and β₁, and similar behavior is found for each n studied: a very limited range of angles (γ₁, β₁) is observed to optimize almost all graphs at p = 1.

FIG. 6: Probabilities for obtaining the maximum cut P(C_max), averaged over graphs at various n, with p = 0, ..., 3. The dashed black line in (a) follows 1/2^{n−1} from Eq. (24); solid blue lines show the exponential fits of Eq. (25) to the indicated data, with parameters from Table IV.

FIG. 7: Optimized angle distribution histogram for p = 1 QAOA on graphs with n = 7 vertices.

We categorize the optimized angles using a partitioning of parameter space. Using Γ = [−π/2, 0] and B = [−π/4, 0] from Eq. (22), we say angle-pairs (γ_j, β_j) inside (Γ, B) follow the angle patterns, and angle-pairs outside (Γ, B) deviate from the pattern. Categorizing by (Γ, B) quantifies how many graphs have angles that follow the observed patterns in Fig. 7 at p = 1 and also extends to categorizing graphs when p > 1. Let D_{n,p} be the number of n-vertex graphs that deviate from the angle patterns at depth p. At p = 1, D_{n,p} is the number of graphs with (γ₁, β₁) ∉ (Γ, B). For general p, D_{n,p} is the number of graphs with (γ_j, β_j) ∉ (Γ, B) for any j. Let

𝒟_{n,p} = D_{n,p} / N_n, (26)

be the fraction of n-vertex graphs that deviate from the patterns at depth p, where N_n is the total number of graphs from Table I. We calculate the D_{n,p} using a small numerical error tolerance ε, so that γ_j that are in Γ to within numerical error are counted as γ_j ∈ Γ, and similarly for the β_j.

Table V lists the counts D_{n,p} and fractions 𝒟_{n,p} of graphs that do not follow the angle patterns at p = 1 for n ≤ 9.
No graphs deviate for n ≤ 6; a small number of graphs deviate beginning with n = 7. The number of graphs that deviate increases with n, but they are a small fraction of the total number of graphs.

TABLE V: Numbers D_{n,p} and fractions 𝒟_{n,p} of graphs with optimized angles (γ₁, β₁) ∉ (Γ, B) at p = 1.
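The (Γ, B) classification is simple interval arithmetic; a sketch follows, assuming the pattern region Γ = [−π/2, 0] and B = [−π/4, 0] as reconstructed here, with a hypothetical tolerance eps standing in for the paper's ε.

```python
import numpy as np

# Classify optimized angles against the pattern region (Gamma, B).
# The bounds and the tolerance eps are assumptions for this sketch.
GAMMA = (-np.pi / 2, 0.0)
B = (-np.pi / 4, 0.0)

def deviates(gammas, betas, eps=1e-8):
    """True if any layer's (gamma_j, beta_j) lies outside (Gamma, B)."""
    for g, b in zip(gammas, betas):
        in_gamma = GAMMA[0] - eps <= g <= GAMMA[1] + eps
        in_beta = B[0] - eps <= b <= B[1] + eps
        if not (in_gamma and in_beta):
            return True
    return False
```

Counting `deviates` over all graphs at a given n and p yields D_{n,p}, and dividing by N_n yields the fraction 𝒟_{n,p} of Eq. (26).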
2. Angle patterns at p > 1

Figure 8 shows the distributions of optimized angles for all graphs with n = 7 vertices at depth p = 2. The observed (γ₁, β₁) differ from the values at p = 1 because these angles are optimized together with (γ₂, β₂) at p = 2. The majority of (γ₁, β₁) are still concentrated near the same small negative values as at p = 1, but the distribution is more dispersed within (Γ, B) in comparison with Fig. 7. There is also a new cluster of angles at positive γ₁, outside the pattern region (recall β₁ ≤ 0 by our angle conventions). Figure 8(b) shows the second-layer angle pairs (γ₂, β₂). The majority of these angles follow a peaked distribution in the parameter space of (Γ, B), shifted from the distribution at p = 1 to new angles with β₂ ≈ −π/13. There are also subsets of graphs with angles that form clusters around seemingly-random parameter values.

FIG. 8: Angle patterns at depth p = 2 for graphs with n = 7 vertices, with (γ₁, β₁) from the first layer in (a) and (γ₂, β₂) in (b).

We again categorize the optimized angles as agreeing with the pattern when they are inside the parameter space (Γ, B) from Eq. (22) and deviating from the pattern when they are outside (Γ, B) to within numerical error. The total fraction of graphs that deviate from the pattern is 𝒟_{n,p} from Eq. (26). We further separate the 𝒟_{n,p} into components to understand which (γ_j, β_j) deviate and their correlations at p = 2. Let 𝒟^{(j)}_{n,p} denote the fraction of graphs that deviate only in the jth angle pair, with (γ_j, β_j) ∉ (Γ, B) and (γ_k, β_k) ∈ (Γ, B) for all k ≠ j. Let 𝒟^{(jk)}_{n,p} denote the fraction of graphs where two angle pairs deviate, (γ_j, β_j) ∉ (Γ, B) and (γ_k, β_k) ∉ (Γ, B) but (γ_l, β_l) ∈ (Γ, B) for all l
∉ {k, j}; similarly, let 𝒟^{(jkl)}_{n,p} denote the fraction of graphs where the jth, kth, and lth angle-pairs deviate from (Γ, B). The total fraction of graphs that deviate from the angle patterns is the sum of the fractions that deviate in the different sets of angle pairs; for example, at p = 2 the total fraction of graphs that deviate is the sum of fractions that deviate in one or both angle pairs, 𝒟_{n,p} = 𝒟^{(1)}_{n,p} + 𝒟^{(2)}_{n,p} + 𝒟^{(12)}_{n,p}.

Table VI shows the fractions of graphs that deviate from the angle patterns at p = 2. The total fraction of graphs that deviate, 𝒟_{n,p}, typically decreases as n increases; however, 𝒟_{n,p} also increases with p, as seen by comparison with Table V. The angles deviate most often in (γ₂, β₂) or in both (γ₁, β₁) and (γ₂, β₂), with 𝒟^{(2)}_{n,p} ≈ 𝒟^{(12)}_{n,p}, while deviations in only (γ₁, β₁) are observed less often, with 𝒟^{(1)}_{n,p} < 𝒟^{(2)}_{n,p}, 𝒟^{(12)}_{n,p}.

TABLE VI: Fractions of graphs with optimized angles that deviate from the angle-pattern parameter space (Γ, B) at p = 2; see text for details.

Figure 9 shows the optimized angle distributions at p = 3 and n = 7; Table VII shows the fractions of graphs that deviate from (Γ, B). These continue the pattern: the majority of angles are concentrated in a single cluster in (Γ, B) at each layer, with additional small clusters of angles distributed unpredictably over the parameter space. The fractions of graphs in these clusters increase with p. Deviations from the angle patterns occur most often in the final layer of angles and in combinations connecting the final layer with earlier layers, with relatively large 𝒟^{(3)}_{n,p} ≈ 𝒟^{(23)}_{n,p} ≈ 𝒟^{(123)}_{n,p}. Typically, smaller fractions of graphs deviate in the earlier layers only, as in 𝒟^{(1)}_{n,p}, 𝒟^{(2)}_{n,p}, and 𝒟^{(12)}_{n,p}, or in disjoint layers, as in 𝒟^{(13)}_{n,p}. The total fraction of graphs that deviate from the patterns, 𝒟_{n,p}, decreases with n and increases with p.

FIG. 9: Angle patterns at depth p = 3 for graphs with n = 7 vertices, for the first-layer parameters (γ₁, β₁) in (a), (γ₂, β₂) in (b), and (γ₃, β₃) in (c).

TABLE VII: Fractions of graphs with optimized angles that deviate from the region (Γ, B) at p = 3, similar to Tables V-VI.

TABLE VIII: Median optimized angles at n = 7 and p = 3, shown to five decimal places.
   β₁/π      β₂/π      β₃/π      γ₁/π      γ₂/π      γ₃/π
 −0.15244  −0.10299  −0.06517  −0.12641  −0.24101  −0.27459

The deviations of the clusters of angles away from (Γ, B) appear to be due to differences in graph structure. We have confirmed that optimal angles deviate from (Γ, B) for some graphs. While parameter optimization is limited by the sampling of random seeds with BFGS, these deviations are not attributed to limits on sampling. We expect the clusters contain graphs with similar structures, with more graphs in the clusters at higher p indicating a greater sensitivity to graph structure features. Angles that optimize a graph at a given p are not necessarily close to the angles that optimize the same graph at p + 1. However, for most graphs at the n and p tested here, the angle variations are minimal and contained in (Γ, B). This motivates a heuristic approach to identifying optimized angles for most graphs, developed in the next section.
D. Median Angles
We consider how patterns of optimized angles may identify good approximate angles for most graphs and greatly reduce the computational cost of searching for angles with BFGS. Similar uses of angle patterns have been considered for 3-regular graphs [6], families of structured graphs [9], and small samples of random graphs [12].

We define a set of angles that follows the angle patterns by first taking the median of γ₁ over all graphs with n = 7 and p = 3, then defining similar medians γ_j and β_j for each j, to obtain the set of angles shown in Table VIII. We consider two approaches to QAOA that use these angles to avoid the computationally expensive random seeding of the standard BFGS approach. The first approach uses the median angles in the evolution of Eq. (6) without any optimization; results from this approach are denoted r_m and P_m(C_max). The second approach uses the median angles as seeds in a single BFGS optimization; results from this approach are denoted r_mB and P_mB(C_max). Figures 10-11 compare results from the median-angle approaches at n = 7 and p = 3 to our previous results from BFGS optimization with hundreds of random seeds. The results are visualized using two-dimensional histograms on a logarithmic color scale.

Figure 10(a) compares the standard r from the full BFGS search to the r_m from the median angles without any optimization. The results are concentrated near the diagonal, shown by the white dotted line, and the mean and standard deviation of the difference r − r_m are small, as seen in Table IX. The median angles thus give a good approximation r_m to the r from the full BFGS search.

Figure 10(b) compares the standard r against the r_mB from single BFGS optimizations using the median angles as seeds. The r_mB are significantly improved in comparison with the r_m, with 87% of the results satisfying r_mB = r up to a small additive factor, and with significantly reduced mean and standard deviation of the difference r − r_mB in Table IX.
The approximation ratios from the single runs of BFGS with the median-angle seeds are typically identical or close to the results from the BFGS search with random seeds and are calculated at a small fraction of the computational expense.

Figure 11 assesses the probability of obtaining the maximum cut in the median-angle approaches. The P_m(C_max) in Fig. 11(a) roughly follow the P(C_max) along the diagonal but with some considerable spread below the diagonal; note the difference in scale in comparison with the r in Fig. 10. There is also some spread above the diagonal, indicating the median angles give higher probabilities for the maximum cut for some graphs. The mean and standard deviation of the difference P(C_max) − P_m(C_max) are shown in Table IX; they are small but larger than the corresponding values for r − r_m. Overall, the median angles give probabilities for the maximum cut that are typically close to the full BFGS results, but the probabilities are more sensitive to graph structure and vary significantly for some graphs.

Figure 11(b) shows P_mB(C_max), where the median angles have been used to seed a single BFGS optimization. The results are much closer to the diagonal than in Fig. 11(a), and the mean and standard deviation of the difference P(C_max) − P_mB(C_max) are smaller than the corresponding quantities with P_m(C_max). In comparison with the r_mB, we again see that the probability of obtaining the maximum cut is more sensitive to the choice of angles, with greater deviations for some graphs in the figures and a larger mean and standard deviation of the difference in Table IX.
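As a sketch of the median-angle seeding, the following simulates depth-p QAOA for a small graph by direct state-vector evolution and runs a single BFGS optimization from a fixed seed. The graph, the seed values (chosen near the medians of Table VIII), and the helper name are illustrative, not the paper's code or data.

```python
import numpy as np
from functools import reduce
from scipy.optimize import minimize

def maxcut_qaoa_expectation(angles, edges, n, p):
    """<C> for depth-p QAOA on MaxCut; angles = (g1..gp, b1..bp)."""
    gammas, betas = angles[:p], angles[p:]
    dim = 2 ** n
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1
    C = np.zeros(dim)
    for j, k in edges:                       # C(z) counts cut edges
        C += bits[:, j] ^ bits[:, k]
    X = np.array([[0, 1], [1, 0]])
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)   # |+...+>
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * C) * psi                   # U(C, gamma)
        u = np.cos(b) * np.eye(2) - 1j * np.sin(b) * X    # exp(-i b X)
        psi = reduce(np.kron, [u] * n) @ psi              # U(B, beta)
    return float(np.real(psi.conj() @ (C * psi)))

# Seed near the Table VIII medians (illustrative ordering g1..g3, b1..b3)
p = 3
median_seed = np.array([-0.13, -0.24, -0.27, -0.15, -0.10, -0.07]) * np.pi
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # example 4-vertex graph
res = minimize(lambda a: -maxcut_qaoa_expectation(a, edges, 4, p),
               median_seed, method="BFGS")
# r = -res.fun / C_max gives the approximation ratio (C_max = 4 here)
```

A single BFGS run from a pattern-based seed replaces the hundreds of randomly seeded runs of the full search, which is the source of the cost savings described above.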
However, for most graphs the median angles work very well, with 87% of graphs obtaining identical probabilities for the maximum cut using the median angles as seeds in BFGS as with the much more computationally expensive search over random seeds.

TABLE IX: Means and standard deviations of the differences between the quantities r_m, r_mB, P_m(C_max), and P_mB(C_max), calculated with the median angles, and the quantities r and P(C_max), calculated from BFGS optimizations with random seeds; see text for details.

FIG. 10: Two-dimensional histograms comparing the approximation ratio from the full BFGS search to results using the median angles without optimization (a) and using the median angles as seeds in BFGS (b), for n = 7 and p = 3. Dashed white lines indicate the diagonals where r_m = r and r_mB = r.

FIG. 11: Histograms similar to Fig. 10 but showing the probability of obtaining a maximum cut.

V. CONCLUSIONS
We have presented results that identify empirical performance bounds on optimized instances of QAOA for MaxCut. Using numerical simulations, we investigated an exhaustive set of MaxCut instances for depths p ≤ 3 on all graphs with n ≤ 9 vertices; we obtained optimized angles at p = 1 and 2 and validated results from BFGS at depths p ≤ 3. The catalog of graph instances, optimized angles, and simulated states is available online [26].

Our analysis used the approximation ratio r and the probability for obtaining a maximum cut P(C_max) as measures of QAOA performance. We observed that r becomes more similar across graph structures as p and n increase, and the Gaussian distributions of r narrow. Most graphs at n ≤ 9 exceed the Goemans-Williamson (GW) approximation ratio at p = 3 in these narrow distributions, indicating the viability of modest-depth QAOA to outperform GW.

In contrast to the narrowing distributions of r, distributions of P(C_max) were found to broaden with increasing p. We attributed the difference to the design of the cost function and its corresponding spectrum. A preponderance of nearly-optimal states skews the optimization of r away from the much smaller set of truly optimal states. While this yields a large, uniform value for r, it leaves P(C_max) less constrained. One alternative is to focus optimization on state preparation to ensure the largest values of P(C_max), for example by choosing gate angles that optimize a non-linear function such as ⟨exp(αĈ)⟩ [23].

For p > 0, we observed an exponential decay in P(C_max) with respect to n. The rate constant k_p was found to decrease as p increases, and this raises the question of whether such exponential behavior can predict performance metrics at larger n and p.

The patterns observed in the optimized angles across this exhaustive set of instances mirror previous results for narrower cases [6, 9, 11, 12, 22]. Using these patterns as a search heuristic demonstrated high-quality results for most graphs and required a significantly smaller computational cost than BFGS search with random seeding. Identifying the prevalence of these patterns at larger n and p, as well as their correlation with graph properties, can enable new search heuristics, for example for general QAOA-like algorithms, as suggested by the work of Cook et al. [22].

ACKNOWLEDGMENTS
P. C. L. thanks Zak Webb and Yan Wang for discussing time-reversal symmetry. This work was supported by the DARPA ONISQ program under award W911NF-20-2-0051. J. Ostrowski acknowledges the Air Force Office of Scientific Research award AF-FA9550-19-1-0147. G. Siopsis acknowledges the Army Research Office award W911NF-19-1-0397. J. Ostrowski and G. Siopsis acknowledge the National Science Foundation award OMA-1937008. This research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
APPENDIX: ANGLE SYMMETRIES
In this Appendix we discuss details of the angle symmetries from Section III C. Most of the symmetries have been described previously by Zhou et al. [6], although we note that two of the symmetries they described for regular graphs apply more broadly, to graphs where every vertex has even degree or every vertex has odd degree. Each symmetry relates different sets of angles (γ, β) and (γ′, β′) that give the same approximation ratio in Eq. (11),

⟨Ĉ(γ, β)⟩ = ⟨Ĉ(γ′, β′)⟩, (27)

where

γ = (γ₁, ..., γ_{q−1}, γ_q, γ_{q+1}, ..., γ_p)
β = (β₁, ..., β_{q−1}, β_q, β_{q+1}, ..., β_p), (28)

with (γ′, β′) related to (γ, β) in different ways for the different symmetries. An example is the periodic behavior of the β_q angles over intervals of π/2; we use this to restrict every β_q component to the interval −π/4 ≤ β_q ≤ π/4, as in Zhou et al. There are a variety of additional symmetries we describe below, with derivations in subsequent subsections.

The first type of symmetry applies to graphs where every vertex has even degree. In this case the γ_q angles are periodic over intervals of π, since all the eigenvalues of Ĉ are even, as shown in detail in the next subsection. Thus the symmetry of Eq. (27) holds for any pair of angles with β = β′ and

γ′ = (γ₁, ..., γ_{q−1}, γ_q ± π, γ_{q+1}, ..., γ_p), (29)

where γ′ differs from γ by a shift γ′_q = γ_q ± π for any q. We use this symmetry to organize the angle distributions so |γ_q| ≤ π/2 for each q for graphs where every vertex degree is even. We search for generic optimized angles in our implementation of the BFGS algorithm from Section III B, but if we find a |γ_q| > π/2 we shift it by π to get a |γ′_q| ≤ π/2.

The second type of symmetry applies to graphs where every vertex has odd degree. In this case Eq. (27) holds for

γ′ = (γ₁, ..., γ_{q−1}, γ_q ± π, γ_{q+1}, ..., γ_p)
β′ = (β₁, ..., β_{q−1}, −β_q, −β_{q+1}, ..., −β_p). (30)

In Eq. (30), γ′ differs from γ in the qth component, γ′_q = γ_q ± π for any q, and β′ differs from β in the sign of all subsequent components, β′_r = −β_r for all r ≥ q. The proof follows from Pauli operator commutation relations applied to the two unitary operators of Eqs. (7)-(8), as shown in detail later. In our calculations, we use this symmetry to organize the angles so |γ_q| ≤ π/2 for each q for graphs where every vertex degree is odd, similar to how we organize the angles when all the vertex degrees are even.

The third analytic symmetry we use is the "time-reversal" symmetry [6],

(γ′, β′) = (−γ, −β). (31)

TABLE X: Numbers of graphs M(p) where we found degeneracies and used the rule of saving the angles with the most components in (Γ, B) following Eq. (22), for various depths p.

 n  M(1)  M(2)  M(3)
 2   0     –     –
 3   0     1*    –
 4   0     2*    0
 5   0     2*    1*
 6   0     1     3*
 7   0     0     2*
 8   0     1     1
 9   0     0     1*
Numbers with asterisks denote that all the graphs we used the rule on were fully optimized; dashes denote that all the graphs were optimized at smaller depths.

The time-reversal symmetry is related to the B̂ and Ĉ operators in Eqs. (7)-(8) and the initial state |ψ₀⟩ of Eq. (5), which have real-valued matrix elements and coefficients in the computational basis. We give a proof at the end of the Appendix. We use this symmetry to always transform to angles with β₁ ≤ 0. The β_q for q > 1 may take either sign after this transformation. When we find degenerate sets of optimized angles for a graph, we save the set with the most components (γ_i, β_i) ∈ (Γ, B) from Eq. (22). Using the angles with the most components in (Γ, B) is designed to emphasize the angle patterns in Section IV C. The number of graphs M(p) for which we used the (Γ, B) rule is shown for each n and p in Table X. The rule was used most often when a graph became fully optimized. For example, we found two sets of angles for an n = 3 graph that optimized the cost function, ⟨Ĉ⟩ = C_max; we saved the angles with the most components in (Γ, B). Asterisks in the table denote that each graph we used the rule on for that n and p was fully optimized. Note that when graphs are optimized at some p we do not simulate them at depths p′ > p, so they do not carry over to columns for greater depths in the table. Overall, the rule is applied to a very limited subset of the graphs we study.
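A sketch of applying these symmetries to put reported angles in a standard range; the folding and sign conventions follow the text, while the function name is ours.

```python
import numpy as np

# Fold each beta_q into [-pi/4, pi/4] using the pi/2 periodicity, then
# use time reversal (gamma, beta) -> (-gamma, -beta) to enforce
# beta_1 <= 0, as in the conventions described above.
def canonicalize(gammas, betas):
    gammas = np.asarray(gammas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    betas = (betas + np.pi / 4) % (np.pi / 2) - np.pi / 4
    if betas[0] > 0:                  # time-reversal flip
        gammas, betas = -gammas, -betas
    return gammas, betas
```

The analogous π-periodic folding of γ_q applies only to graphs where every vertex degree is even or every vertex degree is odd, so it is not included in this generic sketch.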
1. Symmetry when all vertices have even degree
When all the vertices have even degree, the angle components γ_q are periodic over intervals of π, as discussed around Eq. (29). To demonstrate the equivalence of ⟨Ĉ⟩ in Eq. (27) for the angles γ′ from Eq. (29) and γ from Eq. (28), in the next paragraph we show that Ĉ of Eq. (17) has only even eigenvalues when each vertex has even degree. Then C(z) = 2m_z for all z, where m_z is an integer. The periodicity over π follows since the matrix elements of the unitary operator Û(Ĉ, γ_q) of Eq. (7) are invariant under the change γ_q → γ_q ± π,

⟨z|Û(Ĉ, γ_q ± π)|z⟩ = exp(∓2i m_z π) exp(−2i m_z γ_q) = ⟨z|Û(Ĉ, γ_q)|z⟩, (32)

where the factor 1 = exp(∓2i m_z π) connects the two expressions.

To show the eigenvalues C(z) are all even when all the vertex degrees are even, we begin by showing there exists a single bitstring z^(0) for which C(z^(0)) is even. Then we show that if C(z) is even for any z and all the vertex degrees are even, then modifying any bit z_k in the bitstring z to get a new bitstring z′ gives a C(z′) that is also even. Together these imply that every bitstring has even C(z) when all the vertex degrees are even, since any bitstring can be made from a series of single-bit modifications to z^(0), and every modification preserves the evenness of C(z).

First consider the zero bitstring z^(0) = (0, 0, ..., 0), which cuts no edges and therefore has C(z^(0)) = 0, which is even. Next consider an arbitrary bitstring z = (z_{n−1}, z_{n−2}, ..., z_k, ..., z_0) for which C(z) is even. Suppose we flip a bit z_k to make a new bitstring z′ = (z_{n−1}, z_{n−2}, ..., z′_k, ..., z_0), where z′_k ≠ z_k but the z_j for j ≠ k are the same. The change z_k → z′_k will change the value of each C_{⟨j,k⟩} from Eq. (16) that is associated with z_k, so that if C_{⟨j,k⟩}(z) = 1 then C_{⟨j,k⟩}(z′) = 0 and vice-versa. The affected edge terms C_{⟨j,k⟩} can be separated into κ terms that increase in value and δ terms that decrease in value when z → z′. The value of the cost function for z′ can then be expressed as C(z′) = C(z) + κ − δ. The number of edge components C_{⟨j,k⟩} that depend on z_k is even when each vertex degree is even, so δ + κ = 2m for some integer m. This implies that κ and δ are either both even or both odd; either way, the difference κ − δ is even. Since C(z) is even by assumption, C(z′) = C(z) + κ − δ is also even.

We have shown there is a bitstring z^(0) for which C(z^(0)) is even, and we have shown that flipping any bit of a bitstring with even C(z) gives a new C(z′) that is also even. This implies C(z) is even for all z for every graph where each vertex degree is even. By Eq. (32), the angle components γ_q are then periodic over intervals of π for these graphs.
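The parity claim can be checked directly by enumeration; a small sketch on a cycle graph, where every vertex has even degree 2:

```python
import itertools

# Check the claim: when every vertex has even degree, C(z) is even
# for every bitstring z. C5 (the 5-cycle) has all degrees equal to 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
n = 5

def cut_value(z, edges):
    """Number of edges cut by the bipartition encoded in bitstring z."""
    return sum(z[j] != z[k] for j, k in edges)

all_even = all(cut_value(z, edges) % 2 == 0
               for z in itertools.product([0, 1], repeat=n))
print(all_even)  # True
```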
2. Symmetry when all vertices have odd degree
When all the vertices have odd degree, there is a symmetry Eq. (27) relating the angle sets (γ, β) and (γ′, β′) from Eqs. (28) and (30), respectively. Only graphs with an even number of vertices can have this symmetry, since it is impossible for a graph with an odd number of vertices to have every vertex of odd degree. We prove the symmetry holds in the simple case p = 1; the extension to p > 1 is similar. We first derive operator identities relating the circuits with angles (γ, β) and (γ′, β′), then use these to relate the time evolution and probabilities P(z) for states with the different angles.

Let γ′ = γ ± π. The unitary operator Û(Ĉ, γ) from Eq. (7) is a product of unitary-operator components for each edge; a single component can be expressed as

exp(−iγ Ĉ_{⟨j,k⟩}) = exp( −i γ (1 − Ẑ_j Ẑ_k)/2 )
= exp( −i (γ′ ∓ π)/2 ) exp( i (γ′ ∓ π) Ẑ_j Ẑ_k / 2 ). (33)

Separating out a term exp( ∓i (π/2) Ẑ_j Ẑ_k ) = ∓i Ẑ_j Ẑ_k, we obtain

exp(−iγ Ĉ_{⟨j,k⟩}) = e^{iδ} Ẑ_j Ẑ_k exp(−iγ′ Ĉ_{⟨j,k⟩}), (34)

where e^{iδ} is an overall phase.

Now consider the unitary operator Û(Ĉ, γ) of Eq. (7). Using Eq. (34) we have

Û(Ĉ, γ) = ∏_{⟨j,k⟩} exp(−iγ Ĉ_{⟨j,k⟩}) = e^{iη} ∏_{j=0}^{n−1} Ẑ_j Û(Ĉ, γ′), (35)

where e^{iη} is a phase factor. The term ∏_{j=0}^{n−1} Ẑ_j comes from the product of Ẑ_j Ẑ_k over all the edges: each Ẑ_j is raised to an odd power in the product, since each vertex degree is odd, and using (Ẑ_j)² = Î reduces this to ∏_{j=0}^{n−1} Ẑ_j.

We will use Eq. (35) to simplify the time evolution of |ψ_p(γ, β)⟩ in Eq. (6). This involves the unitary operator Û(B̂, β) from Eq. (8), which shows up in the product Û(B̂, β)Û(Ĉ, γ). Our next goal is to use commutation relations to move the ∏_{j=0}^{n−1} Ẑ_j to the left side of this product. Express the Û(B̂, β) from Eq. (8) as

Û(B̂, β) = ∏_{j=0}^{n−1} ( cos(β) Î − i sin(β) X̂_j ). (36)

Now consider the product of the unitary operators,

Û(B̂, β)Û(Ĉ, γ) = e^{iη} ∏_{j=0}^{n−1} ( cos(β) Î − i sin(β) X̂_j ) ∏_{j=0}^{n−1} Ẑ_j Û(Ĉ, γ′). (37)

The Pauli operators anticommute, so

Û(B̂, β)Û(Ĉ, γ) = e^{iη} ∏_{j=0}^{n−1} Ẑ_j Û(B̂, −β) Û(Ĉ, γ′). (38)

We are now ready to show that the angles (γ, β) and (γ′, β′) give equivalent ⟨Ĉ⟩ in Eq. (11), where γ′ = γ ± π and β′ = −β. To demonstrate the equivalence we show the computational-basis state probabilities P(z) are the same for both (γ, β) and (γ′, β′). The probability amplitude for a basis state |z⟩ is

⟨z|ψ(γ, β)⟩ = ⟨z|Û(B̂, β)Û(Ĉ, γ)|ψ₀⟩. (39)

Using Eq. (38) this can be expressed as

⟨z|ψ(γ, β)⟩ = ⟨z| e^{iη} ∏_{j=0}^{n−1} Ẑ_j Û(B̂, β′)Û(Ĉ, γ′)|ψ₀⟩ = e^{iη} ⟨z| ∏_{j=0}^{n−1} Ẑ_j |ψ(γ′, β′)⟩. (40)

Squaring the amplitudes, we obtain

|⟨z|ψ(γ, β)⟩|² = ⟨ψ(γ′, β′)| ∏_{j=0}^{n−1} Ẑ†_j |z⟩⟨z| ∏_{j=0}^{n−1} Ẑ_j |ψ(γ′, β′)⟩. (41)

The ∏_{j=0}^{n−1} Ẑ_j can be moved to the left since it commutes with |z⟩⟨z|; then using ( ∏_{j=0}^{n−1} Ẑ†_j )( ∏_{j=0}^{n−1} Ẑ_j ) = Î gives

P(z) = |⟨z|ψ(γ, β)⟩|² = |⟨z|ψ(γ′, β′)⟩|², (42)

where P(z) is the probability of |z⟩ from Eq. (10). The P(z) are the same for (γ, β) and (γ′, β′), so ⟨Ĉ(γ, β)⟩ = ⟨Ĉ(γ′, β′)⟩ in Eq. (11), which is the desired symmetry.
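The p = 1 case of this symmetry can be verified numerically by state-vector simulation; a sketch on the complete graph K4, where every vertex has odd degree 3 (the helper below is ours, not the paper's code):

```python
import numpy as np
from functools import reduce

def qaoa_p1_probs(edges, n, gamma, beta):
    """Computational-basis probabilities for p = 1 QAOA on MaxCut."""
    dim = 2 ** n
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1
    C = np.zeros(dim)
    for j, k in edges:               # C(z) counts the cut edges
        C += bits[:, j] ^ bits[:, k]
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)   # |+...+>
    psi = np.exp(-1j * gamma * C) * psi                   # U(C, gamma)
    X = np.array([[0, 1], [1, 0]])
    u = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X  # exp(-i beta X)
    psi = reduce(np.kron, [u] * n) @ psi                  # U(B, beta)
    return np.abs(psi) ** 2

# K4: every vertex has odd degree 3, so (gamma, beta) and
# (gamma + pi, -beta) must give identical probabilities P(z).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
g, b = 0.4, -0.3
p_a = qaoa_p1_probs(edges, 4, g, b)
p_b = qaoa_p1_probs(edges, 4, g + np.pi, -b)
print(np.allclose(p_a, p_b))  # True
```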
3. Time-reversal symmetry
We finally consider the "time-reversal" symmetry, with the angles of Eqs. (28) and (31) in the symmetry relation Eq. (27) [6]. The symmetry is related to the structures of the B̂ and Ĉ operators, which have real-valued matrix elements in the computational basis in Eqs. (9) and (17), and the structure of the initial state |ψ₀⟩, which has real-valued coefficients in the computational basis in Eq. (5). To demonstrate the symmetry we calculate generic computational-basis probabilities P(z) for states with both sets of angles and show they are equal; the ⟨Ĉ⟩ are then equal following Eq. (11).

Consider the probability amplitude ⟨z|ψ_p(γ, β)⟩ for a single basis state |z⟩. From Eqs. (5)-(8) this is

⟨z|ψ_p(γ, β)⟩ = (1/√(2^n)) Σ_{z′} ⟨z| ( ∏_{q=1}^{p} e^{−iβ_q B̂} e^{−iγ_q Ĉ} ) |z′⟩. (43)

Taking the complex conjugate of Eq. (43) only changes the sign of the angle terms, since the B̂ and Ĉ matrices have real-valued matrix elements in the computational basis; thus

⟨z|ψ_p(γ, β)⟩* = ⟨z|ψ_p(−γ, −β)⟩. (44)

The Born probabilities P(z) from Eq. (10) are the same for both states, since

P(z) = |⟨z|ψ_p(γ, β)⟩|² = |⟨z|ψ_p(γ, β)⟩*|² = |⟨z|ψ_p(−γ, −β)⟩|², (45)

so the ⟨Ĉ⟩ are the same in Eq. (11).

[1] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019.
[2] Han-Sen Zhong, Hui Wang, Yu-Hao Deng, Ming-Cheng Chen, Li-Chao Peng, Yi-Han Luo, Jian Qin, Dian Wu, Xing Ding, Yi Hu, Peng Hu, Xiao-Yan Yang, Wei-Jun Zhang, Hao Li, Yuxuan Li, Xiao Jiang, Lin Gan, Guangwen Yang, Lixing You, Zhen Wang, Li Li, Nai-Le Liu, Chao-Yang Lu, and Jian-Wei Pan. Quantum computational advantage using photons.
Science, 2020.
[3] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014.
[4] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and Eleanor G. Rieffel. Quantum approximate optimization algorithm for MaxCut: A fermionic view. Physical Review A, 97(2):022304, 2018.
[5] Stuart Hadfield. Quantum algorithms for scientific computing and approximate optimization. arXiv preprint arXiv:1805.03265, 2018. Eq. 5.10, p. 114.
[6] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D. Lukin. Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. Phys. Rev. X, 10:021067, 2020.
[7] G. G. Guerreschi and A. Y. Matsuura. QAOA for Max-Cut requires hundreds of qubits for quantum speed-up. Scientific Reports, 9, 2019.
[8] Matija Medvidović and Giuseppe Carleo. Classical variational simulation of the quantum approximate optimization algorithm. arXiv preprint arXiv:2009.01760, 2020.
[9] Fernando G. S. L. Brandão, Michael Broughton, Edward Farhi, Sam Gutmann, and Hartmut Neven. For fixed control parameters the quantum approximate optimization algorithm's objective function value concentrates for typical instances. arXiv preprint arXiv:1812.04170, 2018.
[10] Jonathan Wurtz and Peter Love. Bounds on MAXCUT QAOA performance for p > 1. arXiv preprint arXiv:2010.11209, 2020.
[11] R. Shaydulin and Y. Alexeev. Evaluating quantum approximate optimization algorithm: A case study. In , pages 1–6, 2019.
[12] Gavin E. Crooks. Performance of the quantum approximate optimization algorithm on the maximum cut problem. arXiv preprint arXiv:1811.08419, 2018.
[13] Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, and Ilya Safro. Classical symmetries and QAOA. arXiv preprint arXiv:2012.04713, 2020.
[14] James Ostrowski, Rebekah Herrman, Travis S. Humble, and George Siopsis. Lower bounds on circuit depth of the quantum approximate optimization algorithm. arXiv preprint arXiv:2008.01820v2, 2020.
[15] V. Akshay, H. Philathong, M. E. S. Morales, and J. D. Biamonte. Reachability deficits in quantum approximate optimization. Phys. Rev. Lett., 124:090504, 2020.
[16] Mario Szegedy. What do QAOA energies reveal about graphs? arXiv preprint arXiv:1912.12272v2, 2020.
[17] Guido Pagano, Aniruddha Bapat, Patrick Becker, Katherine S. Collins, Arinjoy De, Paul W. Hess, Harvey B. Kaplan, Antonis Kyprianidis, Wen Lin Tan, Christopher Baldwin, Lucas T. Brady, Abhinav Deshpande, Fangli Liu, Stephen Jordan, Alexey V. Gorshkov, and Christopher Monroe. Quantum approximate optimization of the long-range Ising model with a trapped-ion quantum simulator. Proceedings of the National Academy of Sciences, 117(41):25396–25401, 2020.
[18] Zhihui Wang, Nicholas C. Rubin, Jason M. Dominy, and Eleanor G. Rieffel. XY-mixers: analytical and numerical results for the quantum alternating operator ansatz. Physical Review A, 101:012320, 2020.
[19] Linghua Zhu, Ho Lun Tang, George S. Barron, Nicholas J. Mayhall, Edwin Barnes, and Sophia E. Economou. An adaptive quantum approximate optimization algorithm for solving combinatorial problems on a quantum computer. arXiv preprint arXiv:2005.10258, 2020.
[20] Zhang Jiang, Eleanor G. Rieffel, and Zhihui Wang. Near-optimal quantum circuit for Grover's unstructured search using a transverse field. Physical Review A, 95:062317, 2017.
[21] Andreas Bärtschi and Stephan Eidenbenz. Grover mixers for QAOA: Shifting complexity from mixer design to state preparation. arXiv preprint arXiv:2006.00354v2, 2020.
[22] Jeremy Cook, Stephan Eidenbenz, and Andreas Bärtschi. The quantum alternating operator ansatz on maximum k-vertex cover. arXiv preprint arXiv:1910.13483v2, 2020.
[23] Li Li, Minjie Fan, Marc Coram, Patrick Riley, and Stefan Leichenauer. Quantum optimization with a novel Gibbs objective function and ansatz architecture search. Phys. Rev. Research, 2, 2020.
[24] Reuben Tate, Majid Farhadi, Creston Herold, Greg Mohler, and Swati Gupta. Bridging classical and quantum with SDP initialized warm-starts for QAOA. arXiv preprint arXiv:2010.14021, 2020.
[25] William H. Press, Brian P. Flannery, and Saul A. Teukolsky. Numerical Recipes in Fortran 77: The Art of Scientific Computing. Cambridge University Press, second edition, 1993. https://people.sc.fsu.edu/∼inavon/5420a/DFP.pdf.
[26] Phillip C. Lotshaw and Travis S. Humble. QAOA dataset. Found at https://code.ornl.gov/qci/qaoa-dataset-version1.
[27] Rebekah Herrman, Lorna Treffert, James Ostrowski, Phillip C. Lotshaw, Travis S. Humble, and George Siopsis. Impact of graph structures for QAOA on MaxCut. arXiv preprint arXiv:2102.05997, 2021.
[28] Brendan McKay. Graphs. https://users.cecs.anu.edu.au/∼bdm/data/graphs.html. Accessed July 8, 2020.
[29] Pietro Belotti. Couenne: A user's manual. Technical report, Lehigh University, 2009.
[30] Pietro Belotti, Jon Lee, Leo Liberti, Francois Margot, and Andreas Wächter. Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods & Software, 24(4-5):597–634, 2009.
[31] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.