Training of Quantum Circuits on a Hybrid Quantum Computer

D. Zhu,* N. M. Linke, M. Benedetti, K. A. Landsman, N. H. Nguyen, C. H. Alderete, A. Perdomo-Ortiz, N. Korda, A. Garfoot, C. Brecque, L. Egan, O. Perdomo, C. Monroe

Joint Quantum Institute, Department of Physics, and Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, MD 20742, USA; IonQ, Inc., College Park, MD 20740; Department of Computer Science, University College London, WC1E 6BT London, UK; Cambridge Quantum Computing Limited, CB2 1UB Cambridge, UK; Mind Foundry Limited, OX2 7DD Oxford, UK; Department of Mathematics, Central Connecticut State University, New Britain, CT 06050, USA; Zapata Computing Inc., 439 University Avenue, Office 535, Toronto, ON, M5G 1Y8

*To whom correspondence should be addressed; E-mail: [email protected].
Generative modeling is a flavor of machine learning with applications ranging from computer vision to chemical design. It is expected to be one of the techniques most suited to take advantage of the additional resources provided by near-term quantum computers. Here we implement a data-driven quantum circuit training algorithm on the canonical Bars-and-Stripes data set using a quantum-classical hybrid machine. The training proceeds by running parameterized circuits on a trapped ion quantum computer and feeding the results to a classical optimizer. We apply two separate strategies, Particle Swarm and Bayesian optimization, to this task. We show that the convergence of the quantum circuit to the target distribution depends critically on both the quantum hardware and classical optimization strategy. Our study represents the first successful training of a high-dimensional universal quantum circuit, and highlights the promise and challenges associated with hybrid learning schemes.

One Sentence Summary
We train generative modeling circuits on a quantum-classical hybrid computer, showing the trade-off between optimization strategy and resources.
Introduction
Hybrid quantum algorithms ( ) use both classical and quantum resources to solve potentially difficult problems. This approach is particularly promising for current quantum computers of limited size and power ( ). Several variants of hybrid quantum algorithms have recently been demonstrated, such as the Variational Quantum Eigensolver (VQE) for quantum chemistry and related applications ( ), and the Quantum Approximate Optimization Algorithm (QAOA) for graph or other optimization problems (
8, 9). Hybrid quantum algorithms can also be used for generative models, which aim to learn representations of data in order to make subsequent tasks easier. Applications of generative modeling include computer vision ( ), speech synthesis ( ), the inference of missing text ( ), de-noising of images ( ), and chemical design ( ). Here, we apply a hybrid quantum learning scheme on a trapped ion quantum computer ( ) to accomplish a generative modeling task.

Data-driven quantum circuit learning (DDQCL) is a hybrid framework for generative modeling of classical data where the model consists of a parameterized quantum circuit ( ). The model is trained by sampling the output of a quantum computer and updating the circuit parameters using a classical optimizer. After convergence, the optimal circuit produces a quantum state that captures the correlations in the training data sets. Hence the trained circuit serves as a generative model for the training data. Theoretical results suggest that such generative models have more expressive power than widely used classical neural networks (
17, 18). This is because instantaneous quantum polynomial circuits, special cases of the parameterized quantum circuits used for generative modeling, cannot be efficiently simulated by classical means.

The Bars-and-Stripes (BAS) data set is a canonical body of synthetic data for generative modeling ( ). It can be easily visualized in terms of images containing horizontal bars or vertical stripes, where each pixel represents a qubit. Here, we use the uniformly distributed 2-by-2 BAS shown in Fig. 1 in a proof-of-principle generative modeling task on a trapped-ion quantum computer. This is the first successful demonstration of generative quantum circuits trained on multi-qubit quantum hardware; we note that there has been a single-qubit experiment in this context ( ). We compare the performance of different classical optimization algorithms and conclude that Bayesian optimization shows significant advantages over Particle Swarm Optimization for this task.

The experiment is performed on four qubits within a seven-qubit fully programmable trapped ion quantum computer ( ) (see Methods). With individual addressing and readout of all qubits, the system can perform sequences of gates from a universal gate set, composed of Ising gates and arbitrary rotations ( ). In order to run the large number of variational circuit instances necessary for the data-driven learning, we calibrate single- and two-qubit gates and execute lists of circuits in an automated fashion.

The training pipeline is illustrated in Fig. 1. The quantum circuits are structured as layers of parameterized gates. We use two types of layers, involving single-qubit rotations and two-qubit entangling gates. A single-qubit layer sandwiches an X-rotation between two Z-rotations
Figure 1: Data-driven quantum circuit learning (DDQCL) is a hybrid quantum algorithm scheme that can be used for generative modeling, illustrated here by the example of 2-by-2 Bars and Stripes (BAS) data. From top left, clockwise: a parametrized circuit is initialized at random. Then, at each iteration, the circuit is executed on a trapped ion quantum computer. The measured probability distribution is compared on a classical computer against the BAS target data set. Next, the quantified difference is used to optimize the parametrized circuit. This learning process is iterated until convergence.

on each qubit $i$, i.e. $R_z^{(i)}(\alpha_i)\, R_x^{(i)}(\beta_i)\, R_z^{(i)}(\gamma_i)$, involving twelve rotation parameters for the four qubits (see Fig. 2). An entangling layer applies Ising or XX gates between all pairs of qubits according to any imposed connectivity graph. This is expressed as a sequence of $XX_{i,j}(\chi_{i,j})$ operations (as shown in Fig. 2), with up to six entangling parameters ( ) for four qubits. Due to the universality of this gate set, a sufficiently long sequence of layers of these two types can produce arbitrary unitaries.
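As a concrete illustration of these two layer types, here is a minimal numpy sketch (not the experimental control code) that applies one rotation layer and one all-to-all entangling layer to a four-qubit register, assuming the conventions $R_z(\alpha) = e^{-i\alpha Z/2}$, $R_x(\beta) = e^{-i\beta X/2}$, and $XX_{i,j}(\chi) = e^{-i(\chi/2) X_i X_j}$; the exact phase conventions of the hardware gates may differ:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(q, gate, n=4):
    """Embed a single-qubit gate on qubit q of an n-qubit register."""
    return reduce(np.kron, [gate if k == q else I2 for k in range(n)])

def rotation_layer(params, n=4):
    """Rz-Rx-Rz on every qubit; params has shape (n, 3) of angles (alpha, beta, gamma)."""
    U = np.eye(2**n, dtype=complex)
    for q, (a, b, c) in enumerate(params):
        rz1 = np.cos(a/2)*I2 - 1j*np.sin(a/2)*Z
        rx  = np.cos(b/2)*I2 - 1j*np.sin(b/2)*X
        rz2 = np.cos(c/2)*I2 - 1j*np.sin(c/2)*Z
        U = op_on(q, rz2 @ rx @ rz1, n) @ U
    return U

def xx_gate(i, j, chi, n=4):
    """Ising gate XX_{i,j}(chi) = exp(-i chi/2 X_i X_j)."""
    XiXj = op_on(i, X, n) @ op_on(j, X, n)
    return np.cos(chi/2)*np.eye(2**n) - 1j*np.sin(chi/2)*XiXj

# One all-to-all circuit layer applied to |0000>
rng = np.random.default_rng(7)
psi = np.zeros(16, dtype=complex); psi[0] = 1.0
psi = rotation_layer(rng.uniform(0, 2*np.pi, (4, 3))) @ psi
for i, j in [(0,1), (0,2), (0,3), (1,2), (1,3), (2,3)]:
    psi = xx_gate(i, j, rng.uniform(0, 2*np.pi)) @ psi
probs = np.abs(psi)**2   # measurement distribution over the 16 bitstrings
```

Sampling `probs` plays the role of the measurement step in Fig. 1; on hardware this distribution is instead estimated from repeated circuit executions.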
Figure 2: Connectivity graphs and corresponding training circuits. Top: fully-connected training circuit layer, with layers of rotations (square boxes) and entangling gates (rounded boxes) between any pair of the four qubits. Bottom: star-connectivity training circuit layer, with restricted entangling gates. In either case, each rotation (denoted by X or Z) and each entangling gate (denoted by XX) includes a distinct control parameter, for a total of 18 parameters for the fully-connected circuit layer and 15 parameters for the star-connected circuit layer. We remove the first Z rotation (dashed square box) acting on the initial state $|0\rangle$, resulting in 14 and 11 parameters, respectively. The connectivity figures on the left define the mapping between the four qubits and the pixels of the BAS images (see Fig. 1).

At the start of DDQCL, all the rotation and entangling parameters are initialized with random values. Next, the circuit is repeatedly executed on the trapped ion quantum computer in order to reconstruct the state distribution. A classical computer then compares the measured distribution with the target distribution and quantifies the difference using a cost function (see Methods for details). A classical optimization algorithm then varies the parameters. We iterate the entire process until convergence.

We impose two distinct connectivity graphs in a four-qubit circuit: all-to-all and star, as shown in Fig. 2. With star connectivity, entanglement between certain qubit pairs cannot occur within a single gate layer, which means more layers are necessary for certain target distributions. Comparing the training process between circuits of different connectivity provides insight into the performance of DDQCL algorithms on platforms with more limited interaction graphs. For each connectivity graph, we add layers until the goal of reproducing the BAS data with the trained model is achieved.
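To make the target concrete, the 2-by-2 BAS distribution and a clipped cost of the kind defined in Materials and Methods can be sketched in a few lines of Python (the pixel ordering, top row then bottom row, is an assumption for illustration):

```python
import numpy as np
from itertools import product

def is_bas(b0, b1, b2, b3):
    """2x2 image with pixels (b0 b1 / b2 b3): valid if rows are uniform (bars)
    or columns are uniform (stripes)."""
    bars = b0 == b1 and b2 == b3
    stripes = b0 == b2 and b1 == b3
    return bars or stripes

patterns = [b for b in product((0, 1), repeat=4) if is_bas(*b)]
target = {int("".join(map(str, b)), 2): 1/len(patterns) for b in patterns}
print(sorted(target))   # six valid patterns: [0, 3, 5, 10, 12, 15]

def clipped_nll(target, measured, eps=1e-4):
    """Clipped negative log-likelihood of a measured distribution w.r.t. the target."""
    return -sum(p * np.log(max(eps, measured.get(s, 0.0)))
                for s, p in target.items())

# A perfect model reaches the entropy of the target, log(6); a bad model scores higher.
print(round(clipped_nll(target, target), 3))   # -> 1.792
```

Note that the checkerboard patterns 6 (0110) and 9 (1001) are excluded from the target, matching the extra state components seen in Fig. 3b.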
The match between training data and model is limited by noise, experimental throughput rate (how fast the system can process circuits), and sampling errors. The cost function used in optimization scores the result, but a successful training process must be able to generate data that can be qualitatively recognized as a BAS pattern to ensure that the system provides usable results in the spirit of generative modeling in machine learning ( ).

We now describe the classical optimization strategies for the training algorithm. Although gradient-based approaches were recently proposed for DDQCL ( ), we employ gradient-free optimization schemes that appear less sensitive to noise and experimental throughput. We explore two such schemes: Particle Swarm Optimization (PSO) ( ) and Bayesian Optimization (BO) ( ). PSO is a stochastic optimization scheme commonly used in machine learning that works by creating many "particles" randomly distributed across parameter space that explore the landscape collaboratively. We limit the number of particles to twice the number of parameters. BO is a global optimization paradigm that can handle the expensive sampling of many-parameter functions. It works by maintaining a surrogate model of the underlying cost function and, at each iteration, updates the model to guide the search for the global minimum. Essentially, the problem of optimizing the real cost is replaced with that of optimizing the surrogate model, which is designed to be a much easier optimization problem. We use OPTaaS, a BO software package developed by Mind Foundry and adapted for this work.

Results

Results from PSO optimization are shown in Fig. 3. We first simulate the training procedure using a classical simulator in place of the quantum processor (orange plots in Fig. 3). Since the PSO method is sensitive to the initial "seed" values of the particles, we simulate the convergence for many different random seeds (see Fig. 3).
We choose a seed that converges quickly and reliably under simulated sampling error to start the training procedure on the trapped ion quantum computer illustrated in Fig. 1. We iterate the training until it converges (blue plots in Fig. 3). In practice, which seeds are successful is unknown, and different seeds need to be tried experimentally until a good model is obtained. This incurs an additional cost in the form of multiple independent DDQCL training rounds.

For all-to-all connectivity, we find that a circuit with one rotation gate layer and one entangling gate layer is able to produce the desired BAS distribution (Fig. 3a). This is not the case for the star-connected circuit, with the closest state having two additional components in the superposition (states 6 and 9 in Fig. 3b). With two additional layers, the star-connected circuit is able to model the BAS distribution (orange plots of Fig. 3c). In the experiment, however (blue plots in Fig. 3c), PSO is unable to converge to an acceptable solution even using the best pre-screened seed value and sufficient sample statistics. We conclude that PSO fails because the throughput rate is too low for effectively training the circuit in the face of gate imperfections.

For these reasons, we instead employ a Bayesian optimization scheme for the circuit training procedure. We find that all circuits experimentally converge in agreement with the simulations, as shown in Fig. 4. Moreover, even the star-connected circuit with four layers now produces a recognizable BAS distribution (Fig. 4c). In contrast to PSO, BO dramatically reduces the number of samples needed for training and does not require any pre-selection of random seeds or other prior knowledge of the cost-function landscape.

BO updates the surrogate model using the experimental result of every iteration. Therefore, the classical part of each BO iteration consumes more time than with PSO, where the time cost on the classical optimizer is negligible.
However, the BO procedure converges faster to the desired BAS distribution. More generally, these examples highlight the need to balance quantum and classical resources in order to produce acceptable performance and run time in a hybrid quantum algorithm.

As a measure of the performance of the various training procedures, we compute the Kullback-Leibler (KL) divergence ( ) and the qBAS score (an alternative performance measure suggested in ( )) of the experimental results at the end of each DDQCL training run, shown in Table 1. We also compute the entanglement entropy ($S$) averaged over all two-plus-two qubit partitions assuming a pure state ( ), estimated via simulation of the quantum state from the trained circuits. The entanglement entropy quantifies the level of entanglement of a state, and thus indicates how difficult it is to produce such a state. This metric shows that the successfully trained circuits generate states that are consistent with a high level of entanglement. As a reference, the entanglement entropy of a GHZ state over any partition is $S = 1$.

Figure 3: Quantum circuit training results with Particle Swarm Optimization (PSO), with simulations (orange) and trapped ion quantum computer results (blue). Column (a) corresponds to a circuit with one layer of single-qubit rotations (square boxes) and one layer of entangling gates (rounded boxes) of all-to-all connectivity. The circuit converges well to produce the Bars-and-Stripes (BAS) distribution. Columns (b) and (c) correspond to a circuit with two and four layers and star-connectivity, respectively. In (b), the simulation shows imperfect convergence with two extra state components (6 and 9), due to the limited connectivity, and the experimental results follow the simulation. In (c), the simulation shows convergence to the BAS distribution, but the experiment fails to converge despite performing 1,400 quantum circuits. The optimization is sensitive to the choice of initialization seeds.
To illustrate the convergence behavior, the shaded regions span the 5th-95th percentile range of random seeds (500 for (a) and (b), 1000 for (c)), and the orange curve shows the median. The two-layer circuits have 14 and 11 parameters for (a) all-to-all and (b) star-connectivity, while the (c) star-connectivity circuit with four layers has 26 parameters. The number of PSO particles used is twice the number of parameters, and each training sample is repeated times. Including circuit compilation, controller-upload time, and classical PSO optimization, each circuit instance takes about 1 min to be processed, in addition to periodic interruptions for the re-calibration of gates.
Figure 4: Quantum circuit training results with Bayesian Optimization (BO), with simulations (orange) and trapped ion quantum computer results (blue). Column (a) corresponds to a circuit with two layers of gates and all-to-all connectivity. Columns (b) and (c) correspond to a circuit with two and four layers and star-connectivity, respectively. Convergence is much faster than with PSO (Fig. 3). Unlike the PSO results, the four-layer star-connected circuit in (c) is trained successfully, and no prior knowledge enters the BO process. As before, the two-layer circuits have 14 and 11 parameters for (a) all-to-all and (b) star-connectivity, while the (c) star-connectivity circuit with four layers has 26 parameters. We use a batch of 5 circuits per iteration, and each training sample is repeated times. Including circuit compilation, controller-upload time, and BO classical optimization, each circuit instance takes 2-5 minutes, depending on the amount of accumulated data.

Circuit                 Optimizer   D_KL    qBAS score   S
All-to-all, 2 layers    PSO         0.116   0.91         1.628
                        BO          0.094   0.91         1.659
Star, 2 layers          PSO         0.357   0.74         0.9950
                        BO          0.328   0.77         0.9999
Star, 4 layers          PSO         0.646   0.59         0.8867
                        BO          0.100   0.91         1.709

Table 1: KL divergence ($D_{KL}$, see Materials and Methods), qBAS score, and entanglement entropy ($S$) for the state obtained at the end of each DDQCL training run on hardware, for the various circuits and classical optimizers used.

Discussion
This demonstration of generative modeling using reconfigurable quantum circuits of up to 26 parameters represents one of the most powerful hybrid quantum applications to date. With ongoing engineering improvements ( ), we expect the system to grow in both qubit number and gate quality. This approach can be scaled up to handle larger data sets with increased qubit number by adapting the cost function for sparser sampling ( ). Moreover, this procedure can be adapted for other types of hybrid quantum algorithms.

Classical optimization techniques for hybrid quantum algorithms on intermediate-scale quantum computers do not always succeed ( ). Recent work suggests that the cost-function landscapes of typical medium- to large-scale variational quantum circuits resemble "barren plateaus" ( ), making optimization hard. As quantum computers scale up for larger problems, the cost of classical optimization such as BO must be weighed against the quantum algorithmic advantage.

Materials and Methods

Trapped Ion Quantum Computer
The trapped ion quantum computer used for this study consists of a chain of seven $^{171}$Yb$^+$ ions confined in a Paul trap and laser cooled close to their motional ground state. Each ion provides one physical qubit in the form of a pair of states in the hyperfine-split $^2S_{1/2}$ ground level with an energy difference of 12.642821 GHz, which is insensitive to magnetic fields to first order. The qubits are collectively initialized into $|0\rangle$ through optical pumping, and state readout is accomplished by state-dependent fluorescence detection ( ). Qubit operations are realized via pairs of Raman beams, derived from a single 355-nm mode-locked laser ( ). These optical controllers consist of an array of individual addressing beams and a counter-propagating global beam that illuminates the entire chain. Single-qubit gates are realized by driving resonant Rabi rotations of defined phase, amplitude, and duration. Single-qubit rotations about the z-axis are performed by classically advancing/retarding the phase of the optical beatnote applied to the particular qubit. Two-qubit gates are achieved by illuminating two selected ions with beatnote frequencies near the motional sidebands, creating an effective Ising spin-spin interaction via transient entanglement between the two qubits and the motion in the trap ( ). Since our particular scheme involves multiple modes of motion, we use an amplitude modulation scheme to disentangle the qubit state from the motional state at the end of the interaction ( ). Typical single-qubit gate fidelities are . . Typical two-qubit gate fidelities are − , with fidelity mainly limited by residual entanglement of the qubit states to the motional state of the ions, coherent crosstalk, and driving intensity noise from classical imperfections in our optical controllers.

In our experiment, the effect of the gate errors is seen as an offset in the cost function after convergence. An improvement in gate fidelity will reduce this offset.
But the convergence behavior of an ideal system (as shown in the simulations in Figs. 3 and 4) is not significantly faster than that of the actual experimental system, because convergence is limited by the classical optimization routine.

The trapped ion quantum architecture is scalable to a much larger number of qubits, as atomic clock qubits are perfectly replicable and do not suffer idle errors ($T_1$ and $T_2$ times are essentially infinite). All of the errors in scaling arise from the classical controllers, such as applied noise on the trap electrodes and laser-beam intensity fluctuations. Fundamental errors (like spontaneous scattering from the control laser beams) are not expected to play a role until our gates approach . fidelity. However, as the qubit number grows beyond about 20-30, we expect to sacrifice full connectivity, as gates will only be performed with high fidelity between any qubit and its 15-20 nearest neighbors.

Another limitation is the sampling rate on the quantum computer. This is limited by technical issues in the current experiment, and can be improved, e.g. by increasing the upload speed of the experimental control system.

Classical Optimizers: PSO and BO
We explore two different classical optimizers in this study: Particle Swarm Optimization (PSO) and Bayesian Optimization (BO).

PSO is a gradient-free optimization method inspired by the social behaviour of some animals. Each particle represents a candidate solution and moves within the solution space according to its current performance and the performance of the swarm. Three hyperparameters control the dynamics of the swarm: a cognition coefficient $c_1$, a social coefficient $c_2$, and an inertia coefficient $w$ ( ).

Concretely, each particle consists of a position vector $\theta_i$ and a velocity vector $v_i$. At iteration $t$ of the algorithm, the velocity of particle $i$ for the coordinate $d$ is updated as

$$v^{(t+1)}_{i,d} = w\, v^{(t)}_{i,d} + c_1 r^{(t)}_{1,d} \big(p^{(t)}_{i,d} - \theta^{(t)}_{i,d}\big) + c_2 r^{(t)}_{2,d} \big(g^{(t)}_{d} - \theta^{(t)}_{i,d}\big), \quad (1)$$

where $r^{(t)}_{1,d}$ and $r^{(t)}_{2,d}$ are random numbers sampled from the uniform distribution on $[0,1]$ for every dimension and every iteration, $p^{(t)}_i$ is the particle's best position, and $g^{(t)}$ is the swarm's best position. The position is then updated as

$$\theta^{(t+1)}_i = \theta^{(t)}_i + v^{(t)}_i. \quad (2)$$

In our problem, each particle corresponds to a point in the parameter space of the quantum circuit. For example, in the fully connected circuit with two layers, each particle consists of an instance of the 14 parameters. Recall, however, that the parameters are angles and are therefore periodic; we customized the PSO updates above to use this information. In Eq. (1), $p^{(t)}_{i,d}$ and $\theta^{(t)}_{i,d}$ can be thought of as two points on a circle. Instead of using the standard displacement $p^{(t)}_{i,d} - \theta^{(t)}_{i,d}$, we use the angular displacement, that is, the signed length of the minor arc on the unit circle. We use the same definition of displacement for the swarm's best position $g^{(t)}_{d}$. Finally, in Eq. (2), we make sure to always express angles using their principal values.

In our experiments, we set the number of particles to twice the number of parameters of the circuit.
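This customized update rule can be sketched in a few lines; the hyperparameter defaults below are illustrative placeholders, not the experiment's settings, and wrapping positions into $[0, 2\pi)$ is one convention for taking principal values:

```python
import numpy as np

def ang_diff(a, b):
    """Signed angular displacement from b to a, wrapped into (-pi, pi]."""
    d = (a - b) % (2*np.pi)
    return np.where(d > np.pi, d - 2*np.pi, d)

def pso_step(theta, v, p_best, g_best, w=0.5, c1=1.0, c2=1.0, rng=None):
    """One velocity/position update per Eqs. (1)-(2), using angular
    displacements for the periodic circuit parameters."""
    rng = rng or np.random.default_rng()
    r1 = rng.uniform(size=theta.shape)
    r2 = rng.uniform(size=theta.shape)
    v_new = w*v + c1*r1*ang_diff(p_best, theta) + c2*r2*ang_diff(g_best, theta)
    theta_new = (theta + v_new) % (2*np.pi)   # keep principal values
    return theta_new, v_new
```

A particle sitting at both its own and the swarm's best position receives zero attraction, so only the inertia term moves it, as expected.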
Position and velocity vectors of each particle are initialized from the uniform distribution. For the coefficients we use $c_1 = c_2 = 1$ and $w = 0.$ .

Bayesian Optimisation is a powerful global optimisation paradigm. It is best suited to finding optima of multi-modal objective functions that are expensive to evaluate. Two main features characterize a BO process: the surrogate model and an acquisition function.

The surrogate model is a non-parametric model of the objective function. At each iteration, the surrogate model is updated using the sampled points in parameter space. The package used in this study is OPTaaS by Mind Foundry. It implements the surrogate model as Gaussian Process regression ( ). A kernel (or correlation function) characterizes the Gaussian process; we use a Matérn 5/2 kernel as it provides the most flexibility.

The acquisition function is computed from the surrogate model. It is used to select points for evaluation during the optimization, and it trades off exploration against exploitation. The acquisition function of a point has a high value if the cost function is expected to give a significant improvement over historically sampled points, or if the uncertainty at the point is high, according to the surrogate model. A simple and well-known acquisition function, Expected Improvement ( ), is employed here.

In our case, OPTaaS also leverages the cyclic symmetry of the angles by embedding the parameter space into a metric space with the appropriate topology, effectively allowing the Gaussian Process surrogate model to be placed over a hyper-torus rather than a hyper-cube. This greatly alleviates the so-called curse of dimensionality ( ), and allows for much more efficient use of samples of the objective function.

It is key in Bayesian Optimisation to adequately optimise the acquisition function during each iteration.
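OPTaaS's internals are proprietary, but the Expected Improvement criterion itself is standard; a minimal closed-form sketch for minimization (not OPTaaS code) is:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """EI of a candidate whose surrogate prediction is N(mu, sigma^2),
    given the best (lowest) cost f_best observed so far."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

# Exploitation: a confidently low prediction scores high.
print(expected_improvement(0.2, 0.01, 1.0) > expected_improvement(0.9, 0.01, 1.0))  # True
# Exploration: at equal mean, larger uncertainty scores higher.
print(expected_improvement(1.0, 2.0, 1.0) > expected_improvement(1.0, 0.5, 1.0))    # True
```

Maximizing this quantity over the (here, toroidal) parameter space is itself a non-convex problem, which is the inner optimization referred to above.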
OPTaaS puts considerable computational resources towards this non-convex optimisation problem.

There are two major reasons why BO outperforms PSO in our specific case. First, PSO spends a significant amount of computational resources exploring trajectories far from optimal, while BO mitigates this through the use of the acquisition function. Second, maintaining the surrogate model lets us make much better use of the information from the historical exploration of the parameter space.

Cost Functions
We use a cost function to quantify the difference between the target BAS distribution and the experimental measurements of the circuit. The cost functions used to implement the training are variants of the original Kullback-Leibler divergence ($D_{KL}$) ( ):

$$D_{KL}(p, q) = -\sum_i p(i) \log \frac{q(i)}{p(i)}. \quad (3)$$

Here $p$ and $q$ are two distributions. $D_{KL}(p, q)$ is an information-theoretic measure of how two probability distributions differ. If base 2 is used for the logarithm, it quantifies the expected number of extra bits required to store samples from $p$ when an optimal code designed for $q$ is used instead. It can be shown that $D_{KL}(p, q)$ is non-negative, and is zero if and only if $p = q$. However, it is asymmetric in its arguments and does not satisfy the triangle inequality; therefore $D_{KL}(p, q)$ is not a metric.

The KL divergence is a very general measure, but it is not always well-defined: e.g., if an element of the domain is supported by $p$ and not by $q$, the measure will diverge. This problem may occur quite often if $D_{KL}(p, q)$ is estimated from samples and the dimensionality of the domain is large. For PSO, we use the clipped negative log-likelihood cost function ( ),

$$C_{nll} = -\sum_i p(i) \log\{\max[\epsilon, q(i)]\}. \quad (4)$$

Here we set $p$ as the target distribution. Thus Eq. (4) is equivalent to Eq. (3) up to a constant offset, so the optimization of these two functions is equivalent. $\epsilon$ is a small number (0.0001 here) used to avoid a numerical singularity when $q(i)$ is measured to be zero.

For BO, we use the clipped symmetrized Kullback-Leibler (KL) divergence as the cost function,

$$\tilde{D}_{KL}(p, q) = D_{KL}[\max(\epsilon, p), \max(\epsilon, q)] + D_{KL}[\max(\epsilon, q), \max(\epsilon, p)]. \quad (5)$$

This is found to be the most reliable variant of $D_{KL}$ for BO.

Acknowledgments

We thank C. Figgatt for helpful discussion.
This work was supported by the Army Research Office (ARO) with funds from the Intelligence Advanced Research Projects Activity (IARPA) LogiQ program (Grant Number W911NF16-1-0082), the ARO MURI program on Modular Quantum Circuits (Grant Number W911NF1610349), the AFOSR MURI program on Optimal Quantum Measurements (Grant Number 5710003628), the NSF STAQ Practical Fully-Connected Quantum Computer Project, and the NSF Physics Frontier Center at JQI (Grant Number PHY0822671). L. Egan is additionally funded by NSF award DMR-1747426.
Authors’ contributions
D. Z., N. M. L., M. B., K. A. L., A. P., and C. M. designed the research. D. Z., N. M. L., M. B., K. A. L., N. H. N., C. H. A., A. P., L. E., and O. P. collected and analyzed data. D. Z., M. B., A. P., N. K., A. G., and C. B. contributed to the software used in this study. All authors contributed to this manuscript.
Competing interests
C.M. is a founding scientist of IonQ, Inc. All other authors declare that they have no competing interests.
Data availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the corresponding author.

References
1. J. R. McClean, J. Romero, R. Babbush, A. Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 023023 (2016).
2. J. Preskill, Quantum Computing in the NISQ era and beyond. Quantum, 79 (2018).
3. A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 242 (2017).
4. A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 4213 (2014).
5. C. Hempel, C. Maier, J. Romero, J. McClean, T. Monz, H. Shen, P. Jurcevic, B. P. Lanyon, P. Love, R. Babbush, A. Aspuru-Guzik, R. Blatt, C. F. Roos, Quantum chemistry calculations on a trapped-ion quantum simulator. Phys. Rev. X, 031022 (2018).
6. P. O'Malley, R. Babbush, I. Kivlichan, J. Romero, J. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, et al., Scalable quantum simulation of molecular energies. Physical Review X, 031007 (2016).
7. C. Kokail, C. Maier, R. van Bijnen, T. Brydges, M. Joshi, P. Jurcevic, C. Muschik, P. Silvi, R. Blatt, C. Roos, et al., Self-verifying variational quantum simulation of lattice models. Nature, 355 (2019).
8. E. Farhi, J. Goldstone, S. Gutmann, A quantum approximate optimization algorithm. MIT-CTP/4610 (2014).
9. J. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried, S. Hong, et al., Unsupervised machine learning on a hybrid quantum computer. arXiv preprint arXiv:1712.05771 (2017).
10. J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, Proceedings of the IEEE International Conference on Computer Vision (2017), pp. 2223-2232.
11. A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, WaveNet: A generative model for raw audio. CoRR abs/1609.03499 (2016).
12. S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, S. Bengio, Generating sentences from a continuous space. SIGNLL Conference on Computational Natural Language Learning (CoNLL) (2016).
13. Y. Bengio, L. Yao, G. Alain, P. Vincent, Advances in Neural Information Processing Systems (2013), pp. 899-907.
14. R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, A. Aspuru-Guzik, Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 268-276 (2018).
15. S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, K. Wright, C. Monroe, Demonstration of a small programmable quantum computer with atomic qubits. Nature, 63 (2016).
16. M. Benedetti, D. Garcia-Pintos, O. Perdomo, V. Leyton-Ortega, Y. Nam, A. Perdomo-Ortiz, A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Information, 45 (2019).
17. Y. Du, M.-H. Hsieh, T. Liu, D. Tao, The expressive power of parameterized quantum circuits. arXiv preprint arXiv:1810.11922 (2018).
18. X. Gao, Z.-Y. Zhang, L.-M. Duan, A quantum machine learning algorithm based on generative models. Science Advances (2018).
19. D. J. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003).
20. L. Hu, S.-H. Wu, W. Cai, Y. Ma, X. Mu, Y. Xu, H. Wang, Y. Song, D.-L. Deng, C.-L. Zou, et al., Quantum generative adversarial learning in a superconducting quantum circuit. Science Advances, eaav2761 (2019).
21. K. A. Landsman, C. Figgatt, T. Schuster, N. M. Linke, B. Yoshida, N. Y. Yao, C. Monroe, Verified quantum information scrambling. Nature, 61 (2019).
22. L. Theis, A. v. d. Oord, M. Bethge, A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 (2015).
23. J.-G. Liu, L. Wang, Differentiable learning of quantum circuit Born machines. Physical Review A, 062324 (2018).
24. J. Kennedy, R. Eberhart, Proc. IEEE International Conference on Neural Networks, Perth, Australia (1995), pp. 1942-1948.
25. P. I. Frazier, A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811 (2018).
26. S. Kullback, R. A. Leibler, On information and sufficiency. The Annals of Mathematical Statistics, 79-86 (1951).
27. A. Higuchi, A. Sudbery, How entangled can two couples get? Physics Letters A, 213-217 (2000).
28. K. Wright, K. Beck, S. Debnath, J. Amini, Y. Nam, N. Grzesiak, J.-S. Chen, N. Pisenti, M. Chmielewski, C. Collins, et al., Benchmarking an 11-qubit quantum computer. arXiv preprint arXiv:1903.08181 (2019).
29. K. E. Hamilton, E. F. Dumitrescu, R. C. Pooser, Generative model benchmarks for superconducting qubits. arXiv preprint arXiv:1811.09905 (2018).
30. J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, H. Neven, Barren plateaus in quantum neural network training landscapes. Nature Communications, 4812 (2018).
31. S. Olmschenk, K. C. Younge, D. L. Moehring, D. N. Matsukevich, P. Maunz, C. Monroe, Manipulation and detection of a trapped Yb+ hyperfine qubit. Phys. Rev. A, 052314 (2007).
32. K. Mølmer, A. Sørensen, Multiparticle entanglement of hot trapped ions. Phys. Rev. Lett., 1835-1838 (1999).
33. E. Solano, R. L. de Matos Filho, N. Zagury, Deterministic Bell states and measurement of the motional state of two trapped ions. Phys. Rev. A, R2539-R2543 (1999).
34. G. Milburn, S. Schneider, D. James, Ion trap quantum computing with warm ions. Fortschritte der Physik, 801-810 (2000).
35. T. Choi, S. Debnath, T. A. Manning, C. Figgatt, Z.-X. Gong, L.-M. Duan, C. Monroe, Optimal quantum control of multimode couplings between trapped ion qubits for scalable entanglement. Phys. Rev. Lett., 190502 (2014).
36. C. E. Rasmussen, Summer School on Machine Learning (Springer, 2003), pp. 63-71.
37. E. Brochu, V. M. Cora, N. De Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010).
38. R. E. Bellman,