Accelerating Quantum Approximate Optimization Algorithm using Machine Learning
Mahabubul Alam
Department of Electrical Engineering, Pennsylvania State University
University Park, PA; [email protected]
Abdullah Ash-Saki
Department of Electrical Engineering, Pennsylvania State University
University Park, PA; [email protected]
Swaroop Ghosh
Department of Electrical Engineering, Pennsylvania State University
University Park, PA; [email protected]
Abstract—We propose a machine learning (ML) based approach to accelerate the implementation of the quantum approximate optimization algorithm (QAOA), a promising quantum-classical hybrid algorithm for demonstrating so-called quantum supremacy. In QAOA, a parametric quantum circuit and a classical optimizer iterate in a closed loop to solve hard combinatorial optimization problems. The performance of QAOA improves with an increasing number of stages (depth) in the quantum circuit. However, two new parameters are introduced with each added stage, increasing the number of optimization loop iterations for the classical optimizer. We note a correlation among the parameters of lower-depth and higher-depth QAOA implementations, and exploit it by developing a machine learning model that predicts gate parameters close to their optimal values. As a result, the optimization loop converges in a smaller number of iterations. We choose the graph MaxCut problem as a prototype to solve using QAOA. We perform a feature extraction routine using different QAOA instances and develop a training data-set of optimal parameters. We present our analysis for four flavors of regression models and four flavors of classical optimizers. Finally, we show that the proposed approach can curtail the number of optimization iterations by 44.9% on average (up to 65.7%) in an analysis performed with various flavors of graphs.

I. INTRODUCTION
Quantum approximate optimization algorithm (QAOA) [1], [2] is a quantum-classical hybrid algorithm to solve combinatorial optimization problems. Hybrid algorithms are considered promising for demonstrating quantum advantage (i.e., proving superior performance on a problem compared to state-of-the-art classical methods) [3], [4]. Versions of QAOA are expected to find approximate solutions to combinatorial search problems faster than classical algorithms. Thus, exploring new avenues to improve the performance of QAOA has become an exciting research domain [5]-[9].

The theoretical discussion of QAOA can be found in [1], [5]. In this paper, we delineate our proposal based on the quantum circuit-level implementation of QAOA. Fig. 1(a) shows a typical stage of the QAOA circuit. The first layer consists of Hadamard gates which put the qubits in superposition states. The next level is the phase-separation layer, consisting of Controlled-NOT (CNOT) and parametric RZ(-γ) gates. The final layer is the mixing layer with parametric RX(β) gates. Thus, each stage of the circuit has two gate parameters (γ, β). This typical stage can be repeated several times to construct a higher-depth QAOA circuit, where each stage has a different set of gate parameters (γ and β). The QAOA performance is known to improve with an increasing number of stages in the circuit [5], [6]. On the basis of the gate parameters, the quantum circuit generates an output quantum state |ψ(γ, β)⟩. QAOA involves a cost function which is the expectation value of the cost Hamiltonian (H_C) in the output state |ψ(γ, β)⟩. The optimization goal is to maximize this expectation value. In each iteration, the classical optimizer tracks the expectation value and generates a new set of parameters (γ, β) which drives the quantum circuit. The optimization loop iterates between the quantum computer and the classical optimizer until an optimal set of gate parameters is found (Fig. 1(a)).
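The circuit just described can be mimicked classically with a small statevector simulation. The sketch below is a minimal numpy stand-in (not the paper's QuTiP-based simulator; the function names are ours): the phase-separation layer is applied as the diagonal e^{-iγ H_C}, and the mixing layer as an RX rotation e^{-iβX} on every qubit.

```python
import numpy as np

def maxcut_values(n_qubits, edges):
    """Diagonal of the MaxCut cost Hamiltonian H_C: c[z] = edges cut by bitstring z."""
    c = np.zeros(2 ** n_qubits)
    for z in range(2 ** n_qubits):
        bits = [(z >> q) & 1 for q in range(n_qubits)]
        c[z] = sum(bits[i] != bits[j] for i, j in edges)
    return c

def qaoa_expectation(gammas, betas, n_qubits, edges):
    """<psi(gamma, beta)| H_C |psi(gamma, beta)> for a depth-p QAOA circuit."""
    c = maxcut_values(n_qubits, edges)
    # Hadamard layer: uniform superposition over all 2^n basis states
    psi = np.full(2 ** n_qubits, 1 / np.sqrt(2 ** n_qubits), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        # phase-separation layer: diagonal e^{-i gamma H_C} (CNOT + RZ(-gamma) in gate form)
        psi = np.exp(-1j * gamma * c) * psi
        # mixing layer: e^{-i beta X} on every qubit
        cb, sb = np.cos(beta), np.sin(beta)
        for q in range(n_qubits):
            t = psi.reshape([2] * n_qubits)            # axis (n-1-q) indexes qubit q
            t = np.moveaxis(t, n_qubits - 1 - q, 0)
            a0, a1 = t[0].copy(), t[1].copy()
            t[0] = cb * a0 - 1j * sb * a1
            t[1] = cb * a1 - 1j * sb * a0
            psi = np.moveaxis(t, 0, n_qubits - 1 - q).reshape(-1)
    return float(np.real(np.vdot(psi, c * psi)))
```

For the two-node, single-edge graph of Fig. 1, the p = 1 expectation reduces analytically to (1 + sin 4β sin γ)/2, so `qaoa_expectation([np.pi/2], [np.pi/8], 2, [(0, 1)])` reaches the exact cut value of 1.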
The number of these loop iterations is a key factor in the QAOA run-time. A higher-depth (more stages) QAOA implementation has better performance, but a larger number of parameters may lead to a greater number of loop iterations (Fig. 1(c)). We identify features from the optimal parameters of QAOA instances and train a Machine Learning (ML) based predictor model. The predictor model predicts initial parameters for a higher-depth QAOA circuit from the optimal gate parameters of a single-stage QAOA. These tuned initial parameters are close to the optimal ones. Therefore, when the optimization loop is intelligently initialized (noted as QCML in Fig. 1(d)) with these tuned parameters, the optimization goal is achieved faster, cutting down the loop iterations by 44.9% on average (up to 65.7%). We select the MaxCut problem to be solved by QAOA. The backgrounds of the MaxCut formulation and QAOA are provided in the Appendix. The contributions made in this paper are:

(a) Feature extraction: We explore a graph MaxCut under various setups (e.g., 100 different QAOA instances with varied depth) and identify the optimal gate parameters for each setup. From these optimal values we select features for our ML model.

(b) Training data-set:
We prepare a training data-set of optimal parameters (330 different graphs and 6 QAOA instances for each graph).

(c) ML model:
We train four ML models, namely, Gaussian Process Regression (GPR), Linear Regression (LM), Regression Tree (RTREE), and Support Vector Machine Regression (RSVM), to study the effect of the ML model on the classical loop optimization. GPR exhibits the best performance on the basis of prediction accuracy.

(d) Accelerating optimization loop:
We propose a two-level ML-based approach to accelerate the QAOA optimization loop. First, we calculate the optimal parameters (γ_OPT, β_OPT) for a single-stage (lowest depth) QAOA circuit using the naive method. Next, we feed this (γ_OPT, β_OPT) of the single-stage implementation to the ML model and predict tuned parameters for a higher target-depth QAOA instance. The higher-depth instance (initialized with the predicted parameters) is then run in the optimization loop with a local optimizer to generate the final solution. We also introduce a hierarchical prediction by using optimal parameters from an intermediate-stage QAOA implementation along with the single-stage values.

(e) Exhaustive analysis: We explore a broad spectrum of classical optimizers (a total of four) to establish that our approach is optimizer-agnostic. Two of these optimizers are gradient-based (L-BFGS-B and SLSQP), and two are gradient-free (Nelder-Mead and COBYLA), all from the Python SciPy library [10]. We have used a QuTiP library [11] based quantum computer simulator as the quantum computer of the optimization loop.

(f) Quantified speed-up: The proposed approaches reduce the optimization loop iterations by 44.9% on average.

(Accepted in the Design Automation and Test in Europe Conference 2020.)

Fig. 1: (a) Typical QAOA flow with random initializations (QCR) of the control parameters, and a QAOA circuit instance for a MaxCut problem of a two-node, single-edge graph; (b) expected patterns in the optimal parameter values (β decreases and γ increases between steps); (c) approximation ratio (AR, indicating performance) and run-time (QC calls) distributions for QAOA MaxCut instance optimization of four 3-regular graphs (8 nodes) with varying depths (p); (d) proposed QAOA flow with ML-based initialization (QCML) of the parameters.
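The two-level flow of contribution (d) can be sketched as a short driver around `scipy.optimize.minimize`. Everything below is an illustrative stand-in: `fake_negative_expectation` replaces the quantum expectation with a synthetic smooth landscape, and `predict_initial_params` replaces the trained regression model with a hand-written extrapolation of the observed trends (γ growing, β shrinking between stages).

```python
import numpy as np
from scipy.optimize import minimize

def fake_negative_expectation(params):
    # Stand-in for -<psi(params)|H_C|psi(params)>: a synthetic landscape with
    # its minimum at gammas = 1.0, betas = 0.5 (illustrative only).
    p = len(params) // 2
    gammas, betas = params[:p], params[p:]
    return np.sum((gammas - 1.0) ** 2) + np.sum((betas - 0.5) ** 2)

def predict_initial_params(gamma1_opt, beta1_opt, p_target):
    # Hypothetical stand-in for the trained regression model: extrapolates the
    # trends reported in the paper (gamma increases, beta decreases with stage).
    stages = np.arange(1, p_target + 1)
    gammas = gamma1_opt * (0.8 + 0.2 * stages / p_target)  # increasing between stages
    betas = beta1_opt * (1.2 - 0.2 * stages / p_target)    # decreasing between stages
    return np.concatenate([gammas, betas])

def two_level_qaoa(p_target, rng):
    # Level 1: cheap p = 1 optimization from a random start.
    x0 = rng.uniform(0, np.pi, size=2)
    res1 = minimize(fake_negative_expectation, x0, method="L-BFGS-B")
    gamma1_opt, beta1_opt = res1.x
    # Level 2: ML-style intelligent initialization of the depth-p_target instance.
    x_init = predict_initial_params(gamma1_opt, beta1_opt, p_target)
    res2 = minimize(fake_negative_expectation, x_init, method="L-BFGS-B")
    total_fc = res1.nfev + res2.nfev  # function calls (FC), as counted in the paper
    return res2, total_fc

rng = np.random.default_rng(0)
res, fc = two_level_qaoa(5, rng)
```

The total function-call count is the paper's run-time metric: level-1 calls (random start, cheap) plus level-2 calls (informed start, fewer iterations than a random start would need).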
To the best of our knowledge, this is the first work on optimizing the parameters of QAOA using ML.

II. PARAMETER TRENDS, FEATURE SELECTION, AND PROPOSED APPROACH
In this section, we describe the notations used in the paper to avoid ambiguity. We also discuss the trends in the QAOA circuit parameters that are leveraged to select appropriate features for the ML model and to predict the initial parameters.
A. Notations
The depth (or the total number of stages) in a QAOA circuit is denoted by p. Each stage/step of a p-depth circuit is indexed with i. For example, for a QAOA circuit with depth (p =) 5, i can be {1, 2, 3, 4, 5}. For a single-depth circuit (i.e., p = 1), i = {1}. Additionally, γ1_OPT(p = 3) denotes the optimal γ parameter (i.e., the phase-separation parameter) of the 1st stage (i = 1) of a QAOA circuit implementation with depth 3. Note that the same problem can be solved by a lower-depth QAOA circuit as well as by a higher-depth QAOA circuit; the Approximation Ratio (AR) of the higher-depth implementation is better.

B. Patterns in Optimal Control Parameters
In this section, we present the patterns in the optimal control parameters for a fixed-depth QAOA circuit (i.e., patterns in the optimal γ1_OPT, γ2_OPT, and γ3_OPT values for a QAOA instance with p = 3). An interesting observation is that the optimal parameter values of any QAOA instance are not necessarily random. Rather, they show some regularities [5], [6]. The optimal mixing-layer parameter (βi_OPT) values of any QAOA instance with depth p gradually decrease between steps (stages), whereas the optimal phase-separating parameters (γi_OPT) increase between steps. Fig. 2(a) and (b) show the optimal control parameter values at 3 steps (γ1_OPT(p=3), β1_OPT(p=3) .. γ3_OPT(p=3), β3_OPT(p=3)) and 5 steps (γ1_OPT(p=5), β1_OPT(p=5) .. γ5_OPT(p=5), β5_OPT(p=5)) for four 8-node 3-regular graphs (denoted by G1, G2, G3, and G4 in Fig. 2) with p = 3 and p = 5, respectively. For every graph and every p-value, the L-BFGS-B optimizer has been used with a fixed functional tolerance limit to find the optimal parameters from 20 random initializations.

C. Optimal Control Parameters with Depth-p
In this section, we analyze the patterns in a given control parameter (say, γ1_OPT) across instance depths (i.e., how the γ1_OPT value changes from p = 1 (γ1_OPT(p = 1)) to p = 2 (γ1_OPT(p = 2)) for the same MaxCut problem). The optimal control parameter values of a given QAOA step (γi_OPT, βi_OPT) have a strong correlation with the chosen circuit depth p. The optimal phase-separating control parameter of a given QAOA step (γi_OPT) decreases with the circuit depth p (annotated by arrows in Fig. 3(a)). Contrarily, the optimal mixing control parameter (βi_OPT) increases. Fig. 3(a) shows the optimal phase-separating control parameters with varying depth (p = 1 to 5). Fig. 3(b) shows the corresponding optimal mixing parameters.

Note that the above observations are significant. The optimal control parameter values at a lower depth give significant clues about the optimal control parameters at a higher depth. For instance, if we have already determined γ1_OPT(p = 1) and β1_OPT(p = 1) for any given problem at p = 1, a γ1_init(p = 2) value smaller than γ1_OPT(p = 1) can be a good starting point for the p = 2 instance optimization (Fig. 3). Moreover, an initial value larger than γ1_init(p = 2) can be a good starting point for γ2_init(p = 2) (Fig. 2). Similar techniques can be applied for the β1_init(p = 2) and β2_init(p = 2) initialization. However, the extent of the differences will depend on the gap between the two depths and on the optimal γ1_OPT(p = 1) and β1_OPT(p = 1) values. A predictor model trained with a sufficient number of similar problem instances can essentially learn these correlations and can be used for smart initialization of the variables of a higher-depth implementation from the low-depth optimal control parameters.

Fig. 2: Trends in the optimal control parameters of four 3-regular graphs for (a) p = 3; (b) p = 5 (each instance is optimized from 20 random initializations).

Fig. 3: Trends in (a) the optimal γi_OPT values, and (b) the optimal βi_OPT values for varying depths for a single 3-regular graph with 20 random initializations.

Fig. 4: Proposed two-level QAOA implementation flow.
D. Feature Selection
The lower-depth optimal control parameters and the target depth (say, p = p_t) are used as the features of the predictor models. For the two-level approach (discussed next), γ1_OPT(p = 1), β1_OPT(p = 1), and the target depth p_t are used as the feature vector (a total of 3 features). On the basis of these features, the predictor generates the 2·p_t parameters of the target-depth instance.

E. Proposed Approach to Accelerate QAOA
To accelerate the convergence of QAOA with a classical local optimizer at a moderate target depth p_t, we propose a two-level optimization procedure (Fig. 4). In the first level, the QAOA instance of depth p = 1 is optimized for the target problem. This is a relatively simple and fast optimization process. The optimal γ1_OPT(p = 1) and β1_OPT(p = 1) values and the target depth p_t are then used to predict the initial values of the control parameters for the p_t-depth QAOA instance using pre-trained regression models. The local optimizer then optimizes the control variables from these initial values to find the target solution. The overall convergence cost is the sum of the number of loop iterations for the p = 1 instance optimization (typically fast) and the subsequent target depth-p_t instance optimization (which we have accelerated).

III. MACHINE LEARNING FRAMEWORK
We train regression models with the optimal parameter values of various problem instances. To evaluate the proposed technique, we have created a data-set which includes the optimal QAOA parameter values of an ensemble of problem graphs for varied depths. The details are provided below.
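The problem-graph ensemble described below can be generated with NetworkX's `erdos_renyi_graph(8, 0.5)`; the stdlib-only stand-in here sketches the same G(n, p) construction (each possible edge included independently with probability p).

```python
import random
from itertools import combinations

def erdos_renyi(n_nodes, edge_prob, rng):
    """G(n, p) random graph as an edge list.
    (The paper uses networkx.erdos_renyi_graph; this is a stdlib stand-in.)"""
    return [e for e in combinations(range(n_nodes), 2) if rng.random() < edge_prob]

# 330 8-node problem graphs with edge probability 0.5, as in the paper's data-set
rng = random.Random(42)
ensemble = [erdos_renyi(8, 0.5, rng) for _ in range(330)]
```

Each graph in the ensemble then serves as one MaxCut instance, optimized for every depth p = 1 to 6 to build the training and test data.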
A. Data Generation
We have picked the problem graphs for MaxCut-QAOA from the Erdos-Renyi ensemble [12] with edge probability 0.5 using the Python NetworkX package [13]. This ensemble is frequently used in graph theory to validate whether a certain property holds for almost all types of graphs [12]. A total of 330 graphs
are chosen, each containing 8 nodes with a random number of edges and random connectivity. The optimal control parameters for each of the graphs have been generated for various circuit depths (p = 1 to 6) using L-BFGS-B as the classical optimizer [10]. The quantum computer is simulated using the QuTiP framework [11]. The functional tolerance limit is identical for all the runs, with the optimization domain restricted to βi ∈ [0, π], γi ∈ [0, 2π] [1] for the random initializations. This data is used to create the data-set (discussed next). Note that the data generation is a one-time cost.

Fig. 5: Correlation between the predictors of the two-level approach (γ1_OPT(p = 1), β1_OPT(p = 1), p) and the response variables for (a) γi_OPT; (b) βi_OPT.

B. Data-set Analysis
We performed a detailed analysis of the correlations between the predictor variables (i.e., the inputs to the predictor model) and the response variables (i.e., the outputs of the predictor) in our data-set. The optimal γ_OPT and β_OPT parameters for p = 1 show a strong linear correlation (correlation coefficient R = 0.92) [14] with each other. The correlation coefficient (R) between γi_OPT and p is negative, which indicates a decrease in γi_OPT with an increase in p (Fig. 5(a)). Note that this correlation weakens for the higher-order parameters: for instance, R between a lower-order γ_OPT and p is found to be -0.63, while it weakens to -0.44 for a higher-order γ_OPT. The correlation between βi_OPT and p is found to be positive, and it increases for the higher-order parameters (Fig. 5(b)). On one hand, the higher-order βi_OPT parameters are smaller compared to the lower-order ones. On the other hand, the lower-order γi_OPT parameters are smaller compared to the higher-order ones. The results indicate that the γi_OPT and βi_OPT values are dictated more by the p values when those optimal parameters are expected to be small.

A decreasing trend is also observed in the R between the βi_OPT's and β1_OPT(p = 1), the βi_OPT's and γ1_OPT(p = 1), the γi_OPT's and γ1_OPT(p = 1), and the γi_OPT's and β1_OPT(p = 1) (Fig. 5(a) and (b)). These variables showed positive correlations. The decreasing trend in these correlation coefficients indicates that the further we move from a given depth, the more weakly correlated the associated control parameters become. In other words, the control parameters at p = 1 will be more weakly correlated with the control parameters at p = 3 than with the parameters at p = 2. We can expect high correlation between the optimal parameters at closer depths.

The data-set is split into a training and a test data-set in a 20:80 ratio (66 graphs in the training data-set and 264 in the test data-set).
The reason behind choosing a small training data-set is twofold: first, to determine whether the proposed approach generalizes to instances beyond the training set; second, to emphasize that a small training set is sufficient to reap the benefit of our proposed methodology.

C. Supervised ML Models
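Before the model comparison below, a minimal from-scratch sketch of the core GPR computation may help fix ideas: the posterior mean of a zero-mean Gaussian process with an RBF kernel, in the spirit of MATLAB's `fitrgp`, applied to the paper's 3-feature inputs (γ1_OPT(p = 1), β1_OPT(p = 1), p_t). All data here is synthetic and the response function is made up.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_fit_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean of a zero-mean GP regressor: k_* (K + noise I)^{-1} y."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha

# Feature rows per the two-level approach: (gamma1_OPT(p=1), beta1_OPT(p=1), p_t).
# 66 synthetic rows standing in for the 66-graph training set.
rng = np.random.default_rng(1)
X = rng.uniform([0, 0, 2], [2 * np.pi, np.pi, 5], size=(66, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2]  # made-up smooth response
pred = gpr_fit_predict(X, y, X[:5])
```

In practice one such regressor is trained per response variable (each γi and βi of the target-depth instance), with the kernel hyperparameters fitted rather than fixed.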
We have experimented with four different regression models (GPR, LM, RSVM, and RTREE) from the MATLAB Statistics and Machine Learning Toolbox [15] to analyze their performance as our predictor models. GPR showed the best performance metrics (lowest mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE), and highest R^2 and adjusted R^2 statistics [14]) for all the tasks. Therefore, we have used GPR as our regression model in all further analysis. Note that we have used MSE as the cost function during the training phase. The performance improvements with our proposed approach are summarized in the following section.

IV. NUMERICAL SIMULATION RESULTS
As a test case, we have selected QAOA circuits with target depths ranging from 2 to 5 (i.e., a total of 4, 6, 8, and 10 parameters, respectively, in the circuit).

First, we calculate the run-time for random initialization. For that, we solve the MaxCut problem for the test graphs, each with the following setup: the four different local optimizers, random loop initializations, circuit depths p = 2 to 5, and a fixed tolerance. We report the mean and standard deviation of the function calls (FC) (i.e., the number of loop iterations), along with the mean and standard deviation of the approximation ratio (AR), in Table I for each depth and local optimizer.

Next, we calculate the run-time for our ML-based initialization. We solve the same MaxCut problems for the same graphs with an identical setup (i.e., the same local optimizers, circuit depths, and tolerance), except that the optimization loop is initialized with the parameters predicted by the trained ML model. We report function calls and approximation ratios as before. The number of function calls in this case consists of two components: the number of calls to obtain γ1_OPT(p = 1) and β1_OPT(p = 1) (with random initialization) plus the number of calls to solve the problem at the target depth (with ML-based initialization).

Note that the prediction error is higher for larger target depths on the test data-set (Fig. 6). For example, the average percentage error in the p = 2 instance parameter predictions (the difference from the actual optimal control parameter values for the 264 graphs in the test data-set) has been found to be 5.7%. This is smaller than the error in the p = 3 instance parameter predictions (8.1%). These results match our expectation, as the predictor variables have weaker correlations with the higher-order control parameters (refer to Section III-B).

The two-level approach shows an average improvement of 44.9% in run-time across all the local optimization procedures. Note that the improvement in run-time is more pronounced for a higher target-depth implementation.
For instance, the Nelder-Mead optimizer showed an average reduction of 12.3% in function calls over the naive approach for target depth p = 2 on the entire test set, and this increased to 43.3% for p = 3 and further to 57.7% for p = 5 (Table I). Note that, for any target depth p_t, the p = 1 instance optimization constitutes a large portion of the total run-time (as it starts from random initialization). Therefore, the improvement in run-time is less evident for lower target depths (say, p_t = 2) even though the prediction accuracy is higher.

V. CONCLUSION
In this paper, we presented an ML-based method to accelerate the QAOA optimization loop, using the MaxCut problem. We present analysis for a broad set of graphs, local optimizers, and circuit depths. We extract trends in the QAOA circuit parameters, select appropriate features for the ML models, generate a training data-set, and run the QAOA optimization loop with predicted parameters. Our approach reduces the number of loop iterations by 44.9% on average, thus accelerating the process. We also present possible tweaks to augment our approach, relevant discussions and limitations, and future perspectives.

Fig. 6: Prediction errors in (a) p = 2 (mean absolute % error 5.7, σ = 6.12); (b) p = 3 (mean 8.1, σ = 8.8); (c) p = 4 (mean 9.4, σ = 10.86); and (d) p = 5 (mean 10.2, σ = 14.36) QAOA instance control parameter predictions for the test data-set (264 graphs) in the two-level approach.

Acknowledgement:
This work is supported by SRC (2847.001) and NSF (CNS-1722557, CCF-1718474, CNS-1814710, DGE-1723687, and DGE-1821766).

REFERENCES
[1] E. Farhi, J. Goldstone, and S. Gutmann, "A quantum approximate optimization algorithm," arXiv preprint arXiv:1411.4028, 2014.
[2] E. Farhi, J. Goldstone, S. Gutmann, and H. Neven, "Quantum algorithms for fixed qubit architectures," arXiv preprint arXiv:1703.06199, 2017.
[3] E. Farhi and A. W. Harrow, "Quantum supremacy through the quantum approximate optimization algorithm," arXiv preprint arXiv:1602.07674.
[4] J. Preskill, "Quantum computing in the NISQ era and beyond," Quantum, vol. 2, p. 79, 2018.
[5] L. Zhou et al., "Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices," arXiv preprint arXiv:1812.01041, 2018.
[6] G. E. Crooks, "Performance of the quantum approximate optimization algorithm on the maximum cut problem," arXiv preprint arXiv:1811.08419, 2018.
[7] M. Wilson, S. Stromswold, F. Wudarski, S. Hadfield, N. M. Tubman, and E. Rieffel, "Optimizing quantum heuristics with meta-learning," arXiv preprint arXiv:1908.03185, 2019.
[8] D. Wecker, M. B. Hastings, and M. Troyer, "Training a quantum optimizer," Physical Review A, vol. 94, no. 2, p. 022309, 2016.
[9] M. Streif and M. Leib, "Training the quantum approximate optimization algorithm without access to a quantum processing unit," arXiv preprint arXiv:1908.08862, 2019.
[10] E. Jones, T. Oliphant, P. Peterson et al., "SciPy: Open source scientific tools for Python," 2001.
[11] J. R. Johansson et al., "QuTiP 2: A Python framework for the dynamics of open quantum systems," Computer Physics Communications, vol. 184, no. 4, pp. 1234-1240, 2013.
[12] B. Bollobás, Random Graphs. Cambridge University Press, 2001.
[13] NetworkX Developers, "NetworkX," networkx.lanl.gov, 2010.
[14] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, "The elements of statistical learning: data mining, inference and prediction," The Mathematical Intelligencer, vol. 27, no. 2, pp. 83-85, 2005.
[15] MathWorks Inc., "MATLAB and Statistics and Machine Learning Toolbox, release 2017a," 2017.