Machine learning quantum states in the NISQ era
Giacomo Torlai and Roger G. Melko
Center for Computational Quantum Physics, Flatiron Institute, New York, New York, 10010, USA
Department of Physics and Astronomy, University of Waterloo, Ontario, N2L 3G1, Canada
Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada
We review the development of generative modeling techniques in machine learning for the purpose of reconstructing real, noisy, many-qubit quantum states. Motivated by its interpretability and utility, we discuss in detail the theory of the restricted Boltzmann machine. We demonstrate its practical use for state reconstruction, starting from a classical thermal distribution of Ising spins, then moving systematically through increasingly complex pure and mixed quantum states. Intended for use on experimental noisy intermediate-scale quantum (NISQ) devices, we review recent efforts in reconstruction of a cold atom wavefunction. Finally, we discuss the outlook for future experimental state reconstruction using machine learning, in the NISQ era and beyond.
I. INTRODUCTION
We are entering the age where quantum computers with tens – or soon hundreds – of qubits are becoming available. These noisy intermediate-scale quantum (NISQ) devices [1] are being constructed out of cold atoms [2], superconducting quantum circuits [3], trapped ions [4], and other quantum systems for which we have achieved an exquisite degree of control. NISQ devices will soon play an important role, since they are poised to surpass the ability of the world's most powerful computers to perform exact simulations of them, ushering in the era of so-called quantum supremacy [5].

Some of the first tasks for these NISQ devices will be as simulators (or emulators) of other highly-entangled quantum many-body systems. The goal is to supplant our current conventional computer simulation technology, such as exact diagonalization, quantum Monte Carlo, or tensor network methods. Efforts to produce quantum simulators of some of the most important physical systems, such as the fermionic Hubbard model [6], are progressing in earnest. However, with the advent of increasingly larger NISQ devices comes a paradox: how will we simulate the simulators? That is, how will we validate an intermediate-scale quantum device, confirming that it is producing the behavior it was designed for? Along with quantum supremacy comes the necessary breakdown of conventional tomography – the gold standard for quantum state reconstruction. We are left searching for imperfect alternatives.

The answer may lie in new data-driven approaches inspired by rapid advances in machine learning. A strategy for unsupervised learning, called generative modeling, has demonstrated the ability to integrate well with the data produced by NISQ devices. In industry applications, the goal of generative modeling is to reconstruct an unknown probability distribution P(x) from a set of data x drawn from it. In the most powerful versions of generative modeling, the reconstructed probability distribution is represented approximately by a graphical model or neural network – the weights and biases serving as a parameterization of P(x). After training, these generative models can be used to estimate the likelihood, or to produce samples, of new x in a way that generalizes and scales well.

This procedure can be extended to data produced by quantum devices, with the goal of reconstructing the quantum wavefunction (a complex generalization of a classical probability distribution). NISQ devices with single-site control are particularly suited to this data-driven approach, since they can produce projective measurements of the state of individual qubits. If a sufficient type and number of projective measurements can be obtained, industry-standard algorithms for unsupervised learning of the relevant probability distributions (produced according to the Born rule) can be used to reconstruct the underlying quantum state.

Such data-driven state reconstruction may play by different rules than Hamiltonian-driven discovery of quantum states. In fact, the latter consists of obtaining a quantum state underlying a microscopic model (i.e. a Hamiltonian), and it is a benchmark for quantum supremacy.
Instead, the former assumes no knowledge of the Hamiltonian, but requires informationally-complete sets of measurement data on the quantum state. The question of how efficiently the data-driven approach scales for wavefunctions of various structures of interest to physicists, and how it compares to the more conventional Hamiltonian-driven approach, is largely unanswered.

The most obvious role for a quantum state reconstructed via generative modeling is to produce new physical observables. To be useful, this must be done in a tractable way that scales efficiently with increasing number of qubits, while generalizing well to unseen data. The observables in question may be inaccessible to the device, such as those encoded in a basis for which no projective measurement was taken, or those (such as Renyi entanglement entropies) that require elaborate technical setups [7]. Generative models are also capable of mitigating noise in the state preparation and measurement, a ubiquitous and defining condition in NISQ devices. Finally, the ability to off-load the production of various observables to a parameterized model frees experimentalists to focus solely on the production of high-quality projective measurements. It is this type of inelegant compromise that will allow machine learning techniques to contribute to the verification of quantum devices as they grow into the NISQ era and beyond.

In this paper, we review the development of generative modeling for quantum state reconstruction. Beginning with the classical treatment of probability distributions, we motivate the use of a restricted Boltzmann machine (RBM), and demonstrate its ability to parameterize the thermal distribution of data drawn from a classical Ising model. The same type of RBM is shown to faithfully reconstruct a real-positive wavefunction, and we demonstrate the production of non-trivial observables from the parameterized model. We then discuss extensions of standard RBMs to reconstruct complex wavefunctions and density matrices. We end this review with a discussion of recent efforts in reconstruction of a real-world Rydberg atom quantum simulator. Despite challenges in noisy state preparation and measurement, the demonstration of real experimental state reconstruction is a milestone for the use of machine learning in the NISQ era.

II. GENERATIVE MODELING
Let us begin by considering an unknown probability distribution P(x) defined over the N-dimensional space of binary states x = (x_1, ..., x_N), and a set of data D = {x_k} distributed according to P(x). Can we infer the features of such a distribution, such as regularities and correlations, directly from observation of the data? In other words, can we discover an approximate representation p(x) ≈ P(x) from the limited-size dataset D? The simplest approach consists of approximating the unknown probability with the frequency distribution obtained by inverting the measurement counts in the dataset:

\[ p(\mathbf{x}) = P_{\text{data}}(\mathbf{x}) = \frac{1}{\|\mathcal{D}\|} \sum_{\mathbf{x}_k \in \mathcal{D}} \delta_{\mathbf{x},\mathbf{x}_k}. \tag{1} \]

The validity of this approximation depends on the size of the system N, the entropy of the distribution, and the size ||D|| of the dataset. For most practical purposes, however, it fails to generalize the features of P(x) beyond the training set. In contrast, generative modeling aims to discover an approximation of the unknown distribution that captures the underlying structure and is also capable of generalization.

The first ingredient is a compact representation of the probability distribution, i.e. a parametrization p_λ(x) in terms of a set of parameters λ whose number is much smaller than the size of the configuration space. Then, generative modeling consists of finding an optimal set of parameters λ* such that the parametric distribution p_λ*(x) mimics the unknown distribution P(x) underlying the finite number of dataset samples. In practice, this search is carried out through an optimization procedure, where the distance between the two distributions is minimized with respect to the model parameters λ. The distance between two probability distributions can be quantified by the Kullback-Leibler (KL) divergence

\[ \mathrm{KL}_\lambda(P\,\|\,p_\lambda) = \sum_{\mathbf{x}} P(\mathbf{x}) \log\frac{P(\mathbf{x})}{p_\lambda(\mathbf{x})}, \tag{2} \]

a non-symmetric statistical measure such that KL_λ(P||p_λ) ≥ 0, with KL_λ(P||p_λ) = 0 if and only if P = p_λ. By approximating the KL divergence with the measurement data, we obtain

\[ \mathrm{KL}_\lambda(P\,\|\,p_\lambda) \approx -\frac{1}{\|\mathcal{D}\|} \sum_{\mathbf{x} \in \mathcal{D}} \log p_\lambda(\mathbf{x}) - H_{\mathcal{D}}, \tag{3} \]

where H_D is the dataset entropy. This quantity can be minimized iteratively by one of the many variants of the gradient descent algorithm. This procedure allows one to obtain a representation of the unknown distribution and generate new configurations that were not encountered in the learning stage. The most successful approach relies on the representation of p_λ(x) in terms of networks of artificial neurons.
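To make Eqs. (1)–(3) concrete, the following minimal sketch (our own NumPy illustration, not part of the original text) builds the frequency distribution of a toy binary dataset and evaluates the data-based estimate of the KL divergence for a simple parametric model; the model here is an independent-bit (product) distribution, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: samples of N correlated binary variables drawn from a hidden rule.
N, num_samples = 4, 5000
hidden = rng.random(num_samples) < 0.5
data = np.stack([hidden ^ (rng.random(num_samples) < 0.1) for _ in range(N)], axis=1).astype(int)

# Eq. (1): frequency (empirical) distribution over the 2^N configurations.
codes = data.dot(1 << np.arange(N))            # encode each bitstring as an integer
counts = np.bincount(codes, minlength=2**N)
p_data = counts / counts.sum()

# A simple parametric model p_lambda(x): independent bits with fitted means.
means = data.mean(axis=0)
configs = (np.arange(2**N)[:, None] >> np.arange(N)) & 1
p_model = np.prod(np.where(configs == 1, means, 1.0 - means), axis=1)

# Eq. (3): KL estimated from data = -<log p_lambda>_data - H_data.
nll = -np.mean(np.log(p_model[codes]))
h_data = -np.sum(p_data[p_data > 0] * np.log(p_data[p_data > 0]))
print(f"negative log-likelihood = {nll:.3f}, dataset entropy = {h_data:.3f}, KL estimate = {nll - h_data:.3f}")
```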
A. Artificial neural networks

Artificial neural networks, the bedrock of modern machine learning and artificial intelligence [8], have a history spanning decades. Initially investigated to understand the process of human cognition, neural network models are based on the idea that information (in the brain) has a distributed representation over a large collection of elementary units (neurons), and information processing occurs through the mutual interaction between neurons [9]. The fundamental ingredients are: i) a set of neurons, each one applying a simple type of computation to the input signal it receives; ii) a set of interactions defined over a graph structure connecting the neurons; iii) an external environment providing a "teaching signal"; iv) a learning rule, i.e. a prescription for modifying the interactions according to the external environment.

The first artificial neuron capable of computation, the perceptron, was proposed by Frank Rosenblatt as early as 1957 [10]. Based on the previous work of McCulloch and Pitts [11], the perceptron was capable of discriminating different classes of input patterns, a process called supervised learning. It was later shown that a single-layer perceptron is only capable of learning linearly separable functions [12], and since no learning algorithms were known for multi-layer perceptrons, the model was abandoned, leading to a decrease in both popularity and funding of neural networks (the first AI winter). The first resurgence of the field took place more than a decade later, with the invention of the backpropagation algorithm [13] and the Boltzmann machine (BM) [14]. The latter was built directly on the connection between cognitive science and statistical mechanics established by the works of condensed matter physicists William Little [15, 16] and John Hopfield [17].
1. The Hopfield model
The Hopfield network, introduced in 1982 as a model for associative memories [17], was inspired by the concept of emergence in condensed matter physics, where complex behaviors effectively emerge from the mutual interactions of a large number of degrees of freedom. In this context, Hopfield formulated a physics-inspired model of cognition for the task of recovering a corrupted memory. By regarding a memory as a state x containing N bits of information, the corresponding network consists of N binary neurons fully connected with symmetric weights (or interactions), described by an energy function

\[ E(\mathbf{x}) = -\sum_{ij} W_{ij}\, x_i x_j. \tag{4} \]

Each neuron in the network carries out the computation of \( \sum_j W_{ij} x_j \), and updates itself according to the following rule:

\[ x_i = \begin{cases} 1, & \text{if } \sum_j W_{ij} x_j > 0 \\ 0, & \text{otherwise.} \end{cases} \tag{5} \]

Since the energy difference between the two possible states of the i-th neuron is \( \Delta E_i = \sum_j W_{ij} x_j \), the dynamics resulting from the asynchronous update of the neurons monotonically minimizes the total energy. Therefore, given an initial state, the network evolves in time by following the above equation of motion until a stable configuration (i.e. a local energy minimum) is found. In the context of associative memories, given a set of desired memory states \( \{\bar{\mathbf{x}}_1, \bar{\mathbf{x}}_2, \dots\} \), there exists a learning rule to modify the interactions W in such a way that these states become local minima in the energy landscape [17]. Thus, if the network is initialized to a corrupted memory \( \bar{\mathbf{x}}_k + \boldsymbol{\delta} \) which is sufficiently close to the true state (i.e. small δ), the network is able to recover the correct memory simply by evolving with its equations of motion.
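A minimal sketch of this associative-memory dynamics follows (our own illustration, not from the original text). For simplicity it uses the equivalent spin convention s = 2x − 1 rather than the 0/1 neurons of Eq. (5), and a standard Hebbian prescription for the weights, which the review cites but does not write out.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

# Two target memories, stored as +/-1 spins (the 0/1 neurons of Eq. (5) map to spins via s = 2x - 1).
memories = rng.choice([-1, 1], size=(2, N))

# Hebbian weights: "neurons that fire together wire together" (a standard choice, assumed here).
W = (memories.T @ memories).astype(float) / N
np.fill_diagonal(W, 0.0)                      # no self-interactions

def recall(s, sweeps=10):
    """Asynchronous updates that monotonically lower the energy E = -sum_ij W_ij s_i s_j."""
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt a stored memory by flipping a few spins, then relax to the nearest energy minimum.
corrupted = memories[0].copy()
flip = rng.choice(N, size=5, replace=False)
corrupted[flip] *= -1
recovered = recall(corrupted)
print("overlap with stored memory:", int(recovered @ memories[0]), "/", N)
```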
2. The Boltzmann machine
The two major limitations of the Hopfield model are the tendency for the network to get trapped in local minima, and its restricted capacity. Nevertheless, it suggested the important connection between cognitive science and statistical physics, which was further strengthened with the invention of the BM by Ackley, Hinton and Sejnowski in 1985 [14]. Similarly to the Hopfield model, the BM consists of a set of N binary neurons interacting with the energy given in Eq. 4. However, in order to allow the system to escape local minima, the neural network is placed at thermal equilibrium at some inverse temperature β = 1/T. The update rule now becomes stochastic, with the i-th neuron activating (x_i = 1) with probability

\[ p_i = \frac{1}{1 + e^{-\beta \Delta E_i}}, \tag{6} \]

where ΔE_i is once again the energy difference between its two internal states. As the temperature goes to zero (β → ∞), one recovers the Hopfield model with a deterministic dynamics minimizing the energy. For a stochastic dynamics at finite temperature, the network instead minimizes the free energy, equilibrating to the canonical Boltzmann distribution:

\[ p(\mathbf{x}) = \frac{1}{Z}\, e^{-\beta E(\mathbf{x})}, \qquad Z = \sum_{\mathbf{x}} e^{-\beta E(\mathbf{x})}. \tag{7} \]

The BM is one of the simplest examples of a generative model. In fact, the set of interactions can be considered as tunable parameters, resulting in a parametric distribution p_λ(x) (with λ = W). Then, the interactions can be modified following an unsupervised learning procedure in order for the network distribution to mimic an unknown probability distribution underlying a given set of data points D = {x}. By minimizing the statistical divergence between the data and model distributions (Eq. 3), one obtains the following learning rule for the parameters [14],

\[ \Delta W_{ij} = \beta \left[ \langle x_i x_j \rangle_{\mathcal{D}} - \langle x_i x_j \rangle_{p(\mathbf{x})} \right]. \tag{8} \]

In the positive phase, the weight W_ij is increased according to the average value of x_i x_j over the data points in D, corresponding to traditional Hebbian learning (i.e. "neurons that fire together wire together"). This term effectively lowers the energy of all configurations that are compatible with the dataset, thus increasing their probability. In contrast, in the negative phase, the same process occurs with the reverse sign, decreasing the probability of configurations generated by the BM when running freely at thermal equilibrium. Clearly, when the two averages coincide, the BM distribution reproduces the dataset and there is no net change in the parameters. Otherwise, the network is trying to unlearn configurations generated at equilibrium that lead to an imbalance with respect to the data. It is interesting to note how this learning and unlearning process had already been proposed in an ad-hoc way by Hopfield to eliminate spurious minima in his model of associative memories [18].

Figure 1. Probabilistic graphical models. (a) A fully connected neural network, which can represent either the Hopfield model or the Boltzmann machine, depending on the update rule. (b) A restricted Boltzmann machine, with a set of symmetric weights W connecting the visible and the hidden layer.

The major limitation of this network is the structure of the energy function, allowing the BM to capture only pairwise correlations in the data (e.g. it cannot learn the XOR function [19]). The simplest way to increase the reach of its representational capabilities is to introduce an auxiliary set of neurons which do not appear in the input space of the data.
The full network is then divided as x = (v, h), where v are called visible units, corresponding to the degrees of freedom in the dataset, and h are called hidden units [14]. In order to derive a learning rule for the network parameters, one needs to eliminate the hidden degrees of freedom so that an explicit distribution over the visible neurons, p_λ(v) = Σ_h p_λ(v, h), can be obtained. Therefore, to attain a tractable marginal distribution, one can restrict the interactions to pairs of neurons in different layers, resulting in the famous restricted Boltzmann machine (RBM).

B. Restricted Boltzmann machines
The RBM, originally introduced by Smolensky under the name of Harmonium [20], is a probabilistic graphical model with energy

\[ E_\lambda(\mathbf{v}, \mathbf{h}) = -\sum_{ij} W_{ij}\, h_i v_j - \sum_j b_j v_j - \sum_i c_i h_i, \tag{9} \]

where we have added bias terms (i.e. magnetic fields) b and c for the visible and hidden layers respectively. The set of tunable parameters is now λ = (W, b, c) (Fig. 1b). The marginal distribution, obtained by tracing out the hidden neurons, can be calculated analytically:

\[ p_\lambda(\mathbf{v}) = \sum_{\mathbf{h}} p_\lambda(\mathbf{v}, \mathbf{h}) = \frac{1}{Z_\lambda} \sum_{\mathbf{h}} e^{-E_\lambda(\mathbf{v}, \mathbf{h})} = \frac{1}{Z_\lambda}\, e^{-\mathcal{E}_\lambda(\mathbf{v})}, \tag{10} \]

where we set the inverse temperature to β = 1 and we introduced the new energy function

\[ \mathcal{E}_\lambda(\mathbf{v}) = -\sum_j b_j v_j - \sum_i \log\left( 1 + e^{\sum_j W_{ij} v_j + c_i} \right). \tag{11} \]

The effective energy 𝓔_λ(v) defines an effective system consisting of the visible neurons only. We can see that the energy contains two terms: a mean-field contribution proportional to the visible bias b, and a non-linearity containing correlations between visible neurons at all orders. The particular structure of such an effective energy allows the RBM distribution p_λ(v) to be a universal function approximator of discrete distributions [21]. This means that, given a large enough number of hidden neurons, any function of discrete binary variables can be approximated to arbitrary precision. However, in the worst-case scenario, the number of hidden neurons may grow exponentially with the visible layer.
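The marginalization leading to Eq. (11) can be checked numerically. The sketch below (our own illustration, with arbitrary random parameters) evaluates the effective visible energy with a numerically stable log(1 + e^x), and compares it against a brute-force sum over all hidden configurations of the joint energy in Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(2)
n_v, n_h = 6, 3

# Random RBM parameters lambda = (W, b, c).
W = 0.1 * rng.standard_normal((n_h, n_v))
b = 0.1 * rng.standard_normal(n_v)            # visible biases
c = 0.1 * rng.standard_normal(n_h)            # hidden biases

def effective_energy(v):
    """Eq. (11): E_lambda(v) = -b.v - sum_i log(1 + exp(W_i.v + c_i))."""
    return -v @ b - np.sum(np.logaddexp(0.0, W @ v + c))

def brute_force_energy(v):
    """Check: -log sum_h exp(-E_lambda(v, h)), with E of Eq. (9) summed over all 2^n_h hidden states."""
    hs = (np.arange(2**n_h)[:, None] >> np.arange(n_h)) & 1
    energies = -(hs @ W @ v) - v @ b - hs @ c
    return -np.log(np.sum(np.exp(-energies)))

v = rng.integers(0, 2, size=n_v)
print(effective_energy(v), brute_force_energy(v))   # identical up to floating-point error
```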
1. Unsupervised learning
The goal of unsupervised learning is to discover a set of parameters so that the RBM distribution mimics the unknown distribution underlying a dataset D = {v_1, v_2, ...} of visible samples. The cost function, given by the KL divergence from Eq. 3, is

\[ C_\lambda = -\frac{1}{\|\mathcal{D}\|} \sum_{\mathbf{v} \in \mathcal{D}} \log p_\lambda(\mathbf{v}) = \frac{1}{\|\mathcal{D}\|} \sum_{\mathbf{v} \in \mathcal{D}} \mathcal{E}_\lambda(\mathbf{v}) + \log Z_\lambda, \tag{12} \]

where we have omitted the constant entropy term H_D. The parameters are updated in the direction of steepest descent, Δλ ∝ −∇_λ C_λ, where the gradient of the cost function is

\[ \nabla_\lambda C_\lambda = \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{\mathcal{D}} - \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{p_\lambda(\mathbf{v})}, \tag{13} \]

and the gradients of the effective energy with respect to λ are straightforward to calculate. Similar to the regular BM, the gradient contains two competing terms, driven respectively by the data and the RBM distribution. The first term (the positive phase) is trivial to compute, being an average over the data:

\[ \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{\mathcal{D}} = \frac{1}{\|\mathcal{D}\|} \sum_{\mathbf{v} \in \mathcal{D}} \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}). \tag{14} \]

Conversely, the calculation of the negative phase is in general intractable. It needs to be approximated using a Monte Carlo simulation,

\[ \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{p_\lambda(\mathbf{v})} = \frac{1}{Z_\lambda} \sum_{\mathbf{v}} e^{-\mathcal{E}_\lambda(\mathbf{v})}\, \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \approx \frac{1}{M} \sum_{k=1}^{M} \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}_k), \tag{15} \]

where the configurations v_k are drawn from a Markov chain running on the distribution p_λ(v).

The sampling stage to estimate the negative phase, which is the computational bottleneck of the training, is aided by the restricted nature of the RBM graph: neurons in a given layer are conditionally independent of one another. That is, due to the lack of intra-layer connections, the conditional probabilities for the neurons in one layer, conditioned on the current state of the other, factorize over each individual neuron,

\[ p_\lambda(\mathbf{v}\,|\,\mathbf{h}) = \prod_j p_\lambda(v_j\,|\,\mathbf{h}), \qquad p_\lambda(\mathbf{h}\,|\,\mathbf{v}) = \prod_i p_\lambda(h_i\,|\,\mathbf{v}), \tag{16} \]

and can be easily calculated analytically [22]. When running the Markov chain to collect the statistics in Eq. 15, one can sample the state of all neurons in one layer simultaneously using the above conditional probabilities, alternating between the visible and hidden layers. This sampling strategy is called block Gibbs sampling.
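A minimal sketch of block Gibbs sampling follows (our own NumPy illustration, with randomly initialized parameters): the factorized conditionals of Eq. (16) reduce to independent sigmoid probabilities, p(h_i = 1 | v) = σ(Σ_j W_ij v_j + c_i) and p(v_j = 1 | h) = σ(Σ_i W_ij h_i + b_j), so an entire layer can be resampled in one shot.

```python
import numpy as np

rng = np.random.default_rng(3)
n_v, n_h = 6, 3
W = 0.1 * rng.standard_normal((n_h, n_v))
b = 0.1 * rng.standard_normal(n_v)
c = 0.1 * rng.standard_normal(n_h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v):
    """Factorized conditional of Eq. (16): p(h_i = 1 | v) = sigmoid(W_i . v + c_i)."""
    p = sigmoid(v @ W.T + c)
    return (rng.random(p.shape) < p).astype(float)

def sample_v_given_h(h):
    """Factorized conditional of Eq. (16): p(v_j = 1 | h) = sigmoid(h . W_j + b_j)."""
    p = sigmoid(h @ W + b)
    return (rng.random(p.shape) < p).astype(float)

def block_gibbs(v0, steps):
    """Alternate whole-layer updates; each layer is resampled simultaneously."""
    v = v0
    for _ in range(steps):
        h = sample_h_given_v(v)
        v = sample_v_given_h(h)
    return v

# Draw approximate samples of p_lambda(v) from a batch of random starting states.
v0 = rng.integers(0, 2, size=(100, n_v)).astype(float)
samples = block_gibbs(v0, steps=50)
print("mean visible activations:", samples.mean(axis=0))
```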
2. Training by contrastive divergence
The calculation of the negative phase, even if carried out using block Gibbs sampling, is still computationally intensive. In fact, at each training iteration, the RBM needs to reach its equilibrium distribution p_λ(v) before collecting the statistics for the negative phase calculation. Furthermore, the gradient in Eq. 13 can display a large variance, being the difference of two averages computed from two different distributions. A solution to both of these issues is to consider a different cost function, namely the contrastive divergence (CD) between the data and the RBM after a sequence of k block Gibbs sampling steps [23],

\[ \mathrm{CD}_k = \mathrm{KL}(P_{\text{data}}\,\|\,p_\lambda) - \mathrm{KL}(p_\lambda^{(k)}\,\|\,p_\lambda), \tag{17} \]

where p_λ^(k) is the probability distribution of the visible layer after k steps. The new update, obtained from the gradient of CD_k, simply replaces the equilibrium average in the negative phase with an average over p_λ^(k) [24–26]:

\[ \nabla_\lambda C_\lambda \approx \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{\mathcal{D}} - \big\langle \nabla_\lambda \mathcal{E}_\lambda(\mathbf{v}) \big\rangle_{p_\lambda^{(k)}(\mathbf{v})}. \tag{18} \]

The resulting CD training consists of initializing the RBM to a random sample from the dataset D and using the visible state after k steps of block Gibbs sampling to evaluate the negative phase.

Once the gradient of the cost function is calculated, the parameters λ are updated with a gradient descent algorithm. The simplest one, called stochastic gradient descent, uses a random subset of the data to evaluate the positive phase and performs the update Δλ = −η ∇_λ C_λ, where η is the step size of the update, also called the learning rate. The total number of data samples used for the update is called the batch size. Other algorithms can be used to speed up the convergence [27] and tune the learning rate in an adaptive way [28, 29]. Furthermore, an additional term should be added to the cost function to help generalization, i.e. to avoid overfitting the data points. A common choice is weight decay regularization [30], which penalizes large values of the weights. We refer the reader to Ref. [31] for more details on the practical training of RBMs and a description of the various training hyperparameters (and how to choose them).
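Putting the pieces together, the sketch below (our own minimal illustration, not the authors' implementation) trains a tiny RBM with CD-1 on a synthetic dataset of perfectly correlated bits. For the weights, the data-driven term of the update is ⟨h_i v_j⟩ with h averaged over its conditional, a standard choice assumed here.

```python
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy dataset: n_v perfectly correlated bits (each sample is all zeros or all ones).
n_v, n_h, batch = 8, 16, 64
data = np.repeat(rng.integers(0, 2, size=(2000, 1)), n_v, axis=1).astype(float)

W = 0.01 * rng.standard_normal((n_h, n_v))
b, c = np.zeros(n_v), np.zeros(n_h)

def cd_k_step(v_data, k=1, eta=0.05):
    """One CD-k update (Eq. 18): positive phase from data, negative phase after k Gibbs steps."""
    global W, b, c
    ph_data = sigmoid(v_data @ W.T + c)                     # p(h = 1 | v) on the data
    v_model = v_data.copy()
    for _ in range(k):                                      # k steps of block Gibbs sampling
        h = (rng.random((len(v_model), n_h)) < sigmoid(v_model @ W.T + c)).astype(float)
        v_model = (rng.random((len(v_model), n_v)) < sigmoid(h @ W + b)).astype(float)
    ph_model = sigmoid(v_model @ W.T + c)
    # Positive minus negative phase, approximated by CD-k.
    W += eta * (ph_data.T @ v_data - ph_model.T @ v_model) / len(v_data)
    b += eta * (v_data - v_model).mean(axis=0)
    c += eta * (ph_data - ph_model).mean(axis=0)

for step in range(200):
    batch_idx = rng.choice(len(data), size=batch, replace=False)
    cd_k_step(data[batch_idx], k=1)

# Samples drawn from the trained RBM; with enough training they become strongly correlated, like the data.
v = rng.integers(0, 2, size=(5, n_v)).astype(float)
for _ in range(200):
    h = (rng.random((5, n_h)) < sigmoid(v @ W.T + c)).astype(float)
    v = (rng.random((5, n_v)) < sigmoid(h @ W + b)).astype(float)
print(v)
```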
III. QUANTUM STATE RECONSTRUCTION

Imagine that an experimental NISQ apparatus in the laboratory containing N qubits is prepared in some quantum state of interest, described by a density operator ϱ̂. Because of the practical limitations imposed by the hardware, measurements of properties of interest might be costly, or not technically possible. It is then highly desirable to be able to reconstruct the quantum state ϱ̂ from simple, experimentally feasible measurements.

The traditional approach for reconstructing a quantum state from measurement data is called quantum state tomography (QST) [32–34]. A typical procedure consists of maximum-likelihood reconstruction of a density operator parametrized as ρ̂ ∝ T̂†T̂ [35], where T̂ is a complex lower-triangular matrix, enforcing the positive semi-definite requirement on ρ̂. Such procedures assume no a priori phase structure of the quantum state, nor even whether it is necessarily pure. Such "full" QST therefore typically scales exponentially. Given this, full QST can only be effectively carried out for systems with a relatively small number of particles or qubits [36]. In general, however, physical quantum states – such as ground states of local Hamiltonians – possess a large degree of structure. This often makes it possible to obtain a compact representation with resources scaling polynomially with the system size. The most notable example is matrix product states (MPS), which have been used to successfully reconstruct quantum states outside the reach of full QST [37, 38]. However, so-called MPS tomography inherits the intrinsic limitations of the MPS representation, namely the restriction to one-dimensional systems and low-entangled states, which limits the reconstruction to short-time dynamics, for example. The inherent structure of a quantum state can also be exploited in alternative ways, such as in permutationally invariant QST [39, 40] or compressed sensing [41].

In this section, we overview a machine learning-based approach to QST, and show that unsupervised learning of generative models provides a very natural framework for reconstructing quantum many-body states. As described in the last section, RBMs offer a generative modeling framework that is conceptually interpretable in the context of statistical physics. In addition, they have been more widely explored in applications of classical and quantum state reconstruction than any other generative model. We start by considering the simplest case of reconstructing a thermal state in the classical limit, and proceed with increasing complexity to the case of pure quantum wavefunctions and finally density operators.

A. Classical limit
We start with the reconstruction of a physical system at thermal equilibrium, and consider the classical limit where the Hamiltonian under consideration is diagonal in the measurement basis {|σ⟩}. The density operator we aim to reconstruct simply reduces to

\[ \hat{\varrho} = \frac{e^{-\beta \hat{H}}}{\mathrm{Tr}\,[e^{-\beta \hat{H}}]} = \sum_{\boldsymbol{\sigma}} P_\beta(\boldsymbol{\sigma})\, |\boldsymbol{\sigma}\rangle\langle\boldsymbol{\sigma}|, \tag{19} \]

where P_β(σ) = e^{−βH(σ)}/Z_β is the classical Boltzmann distribution in the canonical ensemble and Z_β its partition function. State reconstruction is inherently a classical problem here, corresponding to the unsupervised learning of the distribution P_β(σ). A simple yet non-trivial example is given by the Ising model, where N spins interact with Hamiltonian

\[ H(\boldsymbol{\sigma}) = -\sum_{\langle ij\rangle} \sigma_i \sigma_j, \tag{20} \]

with the sum running over nearest neighbours on a lattice. In two dimensions, the spin system displays ferromagnetic order at low temperature and a high-temperature disordered state, separated by a continuous phase transition.

As first demonstrated in Ref. [43], different RBMs can be trained on datasets containing spin configurations at different temperatures across the phase diagram, generated by importance sampling the partition function using Monte Carlo simulations [44]. The quality of the reconstruction can be assessed by comparing expectation values of thermodynamic observables generated by the RBM with the exact values calculated on the datasets. In Fig. 2 we report such a comparison for the magnetization and specific heat, with a varying number of hidden units in the RBM. While the magnetization converges very quickly – since it is explicitly encoded in the dataset – a larger number of hidden units is required to accurately reproduce the specific heat, particularly in the presence of large fluctuations at the critical point. Finally, we point out the curious observation that the quality of the reconstruction does not obviously improve for deep versions of the RBM [45], such as deep belief networks [46] or deep Boltzmann machines [47].
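As a sketch of the comparison reported in Fig. 2 (our own illustration; the arrays mc_samples and rbm_samples are hypothetical stand-ins for the Monte Carlo training data and for configurations generated by a trained RBM), the magnetization and specific heat of the Hamiltonian in Eq. (20) can be estimated from either set of spin samples.

```python
import numpy as np

def ising_energy(spins, L):
    """Nearest-neighbour Ising energy of Eq. (20) on an L x L square lattice with periodic boundaries.
    `spins` has shape (num_samples, L*L) with entries +/-1."""
    s = spins.reshape(-1, L, L)
    return -np.sum(s * np.roll(s, 1, axis=1) + s * np.roll(s, 1, axis=2), axis=(1, 2))

def observables(spins, L, beta):
    """Average absolute magnetization per spin and specific heat per spin."""
    n = L * L
    m = np.abs(spins.sum(axis=1)).mean() / n
    c = beta**2 * ising_energy(spins, L).var() / n
    return m, c

# Hypothetical inputs; random stand-ins keep the sketch runnable.
L, beta = 8, 0.5
rng = np.random.default_rng(10)
mc_samples = rng.choice([-1, 1], size=(1000, L * L))
rbm_samples = rng.choice([-1, 1], size=(1000, L * L))
print("dataset:", observables(mc_samples, L, beta))
print("RBM    :", observables(rbm_samples, L, beta))
```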
B. Positive wavefunctions

We now turn to quantum states described by pure density operators ϱ̂ = |Ψ⟩⟨Ψ|, where the wavefunction has the representation |Ψ⟩ = Σ_σ Ψ(σ)|σ⟩, with coefficients Ψ(σ) = ⟨σ|Ψ⟩ in the measurement basis {|σ⟩}. In addition, we assume for now that the pure state |Ψ⟩ has a real and positive representation in this basis, Ψ(σ) ∈ ℝ and Ψ(σ) ≥ 0 for all |σ⟩.

Figure 2. Learning the thermodynamics of the classical Ising model at thermal equilibrium. Comparison of the average values of the magnetization (a) and the specific heat (b) between the exact values calculated on the dataset (sampled by MC) and the values generated after the reconstruction, for an increasing number of hidden neurons in the RBM (n_h = 4, 16, 64). Figure reproduced from reference [42].

Under this assumption, valid for example for ground states of so-called "stoquastic" Hamiltonians [48], the wavefunction |Ψ⟩ is uniquely characterized by the probability distribution underlying a set of projective measurements, given by the Born rule P(σ) = |Ψ(σ)|². The inherently probabilistic nature of quantum mechanics provides a simple and natural way to define a representation of a pure and positive quantum state in terms of an RBM [49],

\[ \psi_\lambda(\boldsymbol{\sigma}) = \sqrt{p_\lambda(\boldsymbol{\sigma})} = \frac{1}{\sqrt{Z_\lambda}}\, e^{-\mathcal{E}_\lambda(\boldsymbol{\sigma})/2}. \tag{21} \]

Note that, since RBMs are universal approximators of any discrete probability distribution, provided the number of hidden units in the network is sufficiently large, the RBM wavefunction ψ_λ(σ) is capable of representing any positive quantum state to arbitrary accuracy.

Because of the positivity of the target state, quantum state reconstruction in this case is equivalent to conventional RBM unsupervised learning. Upon minimization of the KL divergence between the projective measurement distribution and the RBM distribution,

\[ C_\lambda = \sum_{\boldsymbol{\sigma}} |\Psi(\boldsymbol{\sigma})|^2 \log\frac{|\Psi(\boldsymbol{\sigma})|^2}{|\psi_\lambda(\boldsymbol{\sigma})|^2} \approx -\frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}_k \in \mathcal{D}} \log p_\lambda(\boldsymbol{\sigma}_k) - H_{\mathcal{D}}, \tag{22} \]

the RBM wavefunction approximates the target state, ψ_λ ∼ Ψ, as desired.
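For a system small enough to enumerate, the wavefunction of Eq. (21) can be written down explicitly from a set of RBM parameters. The sketch below (our own illustration with arbitrary parameters, standing in for a trained model) verifies normalization and the Born-rule probabilities.

```python
import numpy as np

rng = np.random.default_rng(5)
n_v, n_h = 4, 8

# RBM parameters lambda (in practice obtained by training on |Psi(sigma)|^2 measurement data).
W = 0.2 * rng.standard_normal((n_h, n_v))
b = 0.2 * rng.standard_normal(n_v)
c = 0.2 * rng.standard_normal(n_h)

# Enumerate the full Hilbert space (feasible only for small N; in practice one samples instead).
sigma = ((np.arange(2**n_v)[:, None] >> np.arange(n_v)) & 1).astype(float)

# Effective energies of Eq. (11) and the positive wavefunction of Eq. (21).
E = -(sigma @ b) - np.sum(np.logaddexp(0.0, sigma @ W.T + c), axis=1)
psi = np.exp(-E / 2.0)
psi /= np.linalg.norm(psi)                    # division by sqrt(Z_lambda)

# Sanity checks: the state is normalized, and the Born-rule probabilities are p_lambda(sigma).
print("norm:", psi @ psi)
print("p_lambda(sigma) for the first few configurations:", (psi**2)[:4])
```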
1. Measurement of physical observables
By discovering a set of parameters that successfully minimizes the cost function, the RBM builds an internal representation of the unknown target wavefunction and can be sampled to compute expectation values of physical observables. If the observable Ô is diagonal in the measurement basis, its expectation value reduces to a thermal average with respect to the RBM distribution,

\[ \langle \hat{\mathcal{O}} \rangle = \langle \psi_\lambda | \hat{\mathcal{O}} | \psi_\lambda \rangle = \sum_{\boldsymbol{\sigma}} p_\lambda(\boldsymbol{\sigma})\, \mathcal{O}_{\boldsymbol{\sigma}\boldsymbol{\sigma}}, \tag{23} \]

which can be approximated by a Monte Carlo average using block Gibbs sampling. Calculations of diagonal observables provide a direct verification of the quality of the training, since the expectation values can be compared with those calculated directly on the training dataset.

More interestingly, the RBM allows one to estimate average values of observables which are off-diagonal in the measurement basis. In this case, the expectation value reduces to the average ⟨Ô⟩ = ⟨𝒪_L(σ)⟩_{p_λ(σ)}, where

\[ \mathcal{O}_L(\boldsymbol{\sigma}) = \sum_{\boldsymbol{\sigma}'} \frac{\psi_\lambda(\boldsymbol{\sigma}')}{\psi_\lambda(\boldsymbol{\sigma})}\, \mathcal{O}_{\boldsymbol{\sigma}'\boldsymbol{\sigma}} \tag{24} \]

is the so-called local estimate of the observable [50]. Provided the matrix representation of Ô is sufficiently sparse in the measurement basis (i.e. the number of non-zero off-diagonal elements scales polynomially with N), its expectation value can be efficiently estimated with Monte Carlo.

Another important quantity amenable to calculation with an RBM is the entanglement of a subsystem A, which for pure states is quantified by the Renyi entropy [51]

\[ S_\alpha(\hat{\rho}_A) = \frac{1}{1-\alpha} \log \mathrm{Tr}\,(\hat{\rho}_A^{\alpha}), \tag{25} \]

with ρ̂_A = Tr_{A⊥} |Ψ⟩⟨Ψ| the reduced density matrix of A. For the case of α = 2, the entanglement entropy can be calculated by considering two identical replicas of the original system, and computing the expectation value of the swap operator, which exchanges the configurations of subregion A between the two replicas [52–55].
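The sketch below (our own illustration, with an arbitrary random RBM standing in for a trained one) estimates the transverse magnetization on one site via the local estimator of Eq. (24). For the operator σ̂ˣ acting on a single spin, the only non-zero term in the sum is the amplitude ratio between a configuration and its single-spin-flipped partner; the Monte Carlo estimate is compared against the exact sum over a small Hilbert space.

```python
import numpy as np

rng = np.random.default_rng(6)
n_v, n_h = 6, 12
W = 0.2 * rng.standard_normal((n_h, n_v))
b = 0.2 * rng.standard_normal(n_v)
c = 0.2 * rng.standard_normal(n_h)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def log_psi(sigma):
    """Log of the (unnormalized) positive RBM wavefunction, -E_lambda(sigma)/2."""
    return 0.5 * (sigma @ b + np.sum(np.logaddexp(0.0, sigma @ W.T + c), axis=-1))

def sigma_x_local(sigma, site):
    """Local estimator of Eq. (24) for sigma^x on one site: psi(sigma')/psi(sigma), sigma' = flipped."""
    flipped = sigma.copy()
    flipped[:, site] = 1.0 - flipped[:, site]
    return np.exp(log_psi(flipped) - log_psi(sigma))

# Draw samples of p_lambda(sigma) with block Gibbs sampling, then average the local estimator.
v = rng.integers(0, 2, size=(2000, n_v)).astype(float)
for _ in range(200):
    h = (rng.random((len(v), n_h)) < sigmoid(v @ W.T + c)).astype(float)
    v = (rng.random((len(v), n_v)) < sigmoid(h @ W + b)).astype(float)
mc_estimate = sigma_x_local(v, site=0).mean()

# Exact value for this small system, by summing over the full Hilbert space.
basis = ((np.arange(2**n_v)[:, None] >> np.arange(n_v)) & 1).astype(float)
psi = np.exp(log_psi(basis)); psi /= np.linalg.norm(psi)
flipped = basis.copy(); flipped[:, 0] = 1.0 - flipped[:, 0]
codes = flipped.astype(int) @ (1 << np.arange(n_v))
exact = np.sum(psi * psi[codes])
print(f"<sigma^x_0> Monte Carlo: {mc_estimate:.4f}   exact: {exact:.4f}")
```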
2. Reconstructing quantum spins on a lattice
As an example, we review a numerical experiment for the quantum reconstruction of the ground state of the transverse-field Ising model, with Hamiltonian

\[ \hat{H} = -\sum_{\langle ij\rangle} \hat{\sigma}^z_i \hat{\sigma}^z_j - h \sum_i \hat{\sigma}^x_i. \tag{27} \]

This spin system undergoes a quantum phase transition between a ferromagnetic state for small values of the transverse field h, and a paramagnetic state for large h. Measurement data in the {|σ^z⟩} basis can be generated with standard methods [56, 57]. Similar to the case of the classical Ising model above, different RBMs are trained at different values of the transverse field, and then sampled to generate expectation values of observables [49]. Fig. 3 shows the reconstruction of the average diagonal and off-diagonal (transverse) magnetizations for the quantum Ising model on a square lattice, and the entanglement entropy for the one-dimensional chain, calculated using the swap operator between replicated copies of the neural network.
The assumption of a pure and positive quantum state enables RBM reconstruction with a favorable scaling with respect to the number of particles in the system. In general, however, experimental quantum states might violate this assumption, containing a sign or a phase structure, where the coefficients of the wavefunction can be both positive and negative, or complex-valued, Ψ(σ) = |Ψ(σ)| e^{iφ(σ)}. A sign structure often appears in ground states of non-stoquastic Hamiltonians, such as quantum spins with competing interactions on frustrated lattices, or fermions. In this case, data from a single measurement basis is clearly not sufficient to fully capture the quantum state, since the corresponding probability distribution P(σ) = |Ψ(σ)|² does not contain any fingerprint of the sign structure. Thus, reconstruction of the quantum state requires measurements in additional bases.

The first step in generalizing the RBM reconstruction to complex-valued wavefunctions is to define an appropriate neural-network parametrization of the quantum state. The most straightforward way consists of adding a phase factor to the positive RBM wavefunction defined in the previous section, ψ_λμ(σ) = √(p_λ(σ)) e^{iθ_μ(σ)}. There is a large amount of freedom in choosing the form of the phase function θ_μ(σ) in terms of additional network parameters μ, and it need not be restricted to generative models. In fact, any feedforward neural network, such as a convolutional network [58], could be used to this end. Another powerful way to adapt the RBM to quantum states is to use complex-valued weights and biases [59]. In this review we will use an additional RBM to capture the phases, leading to the following neural-network wavefunction [49]:

\[ \psi_{\lambda\mu}(\boldsymbol{\sigma}) = \frac{1}{\sqrt{Z_\lambda}}\, e^{-\left(\mathcal{E}_\lambda(\boldsymbol{\sigma}) + i\,\mathcal{E}_\mu(\boldsymbol{\sigma})\right)/2}. \tag{28} \]

Note that the generation of configurations in the reference basis corresponds to sampling the distribution |ψ_λμ(σ)|² = p_λ(σ), which does not depend on the phases and can therefore be carried out using block Gibbs sampling on the RBM with parameters λ.
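A minimal sketch of the two-network parametrization of Eq. (28) follows (our own illustration, with random parameters): the complex amplitudes are built from an amplitude RBM (λ) and a phase RBM (μ) by enumerating a small Hilbert space.

```python
import numpy as np

rng = np.random.default_rng(7)
n_v, n_h = 4, 8

def rbm_energy(sigma, W, b, c):
    """Effective energy of Eq. (11) for a given set of RBM parameters."""
    return -(sigma @ b) - np.sum(np.logaddexp(0.0, sigma @ W.T + c), axis=-1)

# Two independent RBMs: one (lambda) for the amplitude, one (mu) for the phase, as in Eq. (28).
params_amp = [0.2 * rng.standard_normal(s) for s in ((n_h, n_v), (n_v,), (n_h,))]
params_phs = [0.2 * rng.standard_normal(s) for s in ((n_h, n_v), (n_v,), (n_h,))]

basis = ((np.arange(2**n_v)[:, None] >> np.arange(n_v)) & 1).astype(float)
log_amp = -rbm_energy(basis, *params_amp) / 2.0
phase = -rbm_energy(basis, *params_phs) / 2.0
psi = np.exp(log_amp + 1j * phase)
psi /= np.linalg.norm(psi)

# |psi(sigma)|^2 depends only on the amplitude network, so sampling still uses plain block Gibbs.
print("norm:", np.vdot(psi, psi).real)
print("first few complex amplitudes:", psi[:3])
```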
1. Learning the phase structure
The reconstruction of a phase structure requires performing additional measurements in bases different from the reference one in which the RBM wavefunction is expressed. This involves applying a unitary transformation Û to the quantum state,

\[ \Psi(\boldsymbol{\sigma}^{\mathbf{b}}) = \sum_{\boldsymbol{\sigma}} U_{\boldsymbol{\sigma}^{\mathbf{b}} \boldsymbol{\sigma}}\, \Psi(\boldsymbol{\sigma}), \tag{29} \]

where |σ^b⟩ = |σ_1^{b_1}, ..., σ_N^{b_N}⟩ and b_j identifies a particular choice of local basis for the j-th degree of freedom. The corresponding probability distribution after the measurement, P(σ^b) = |Ψ(σ^b)|², contains partial information on the phases and can be used to reconstruct the complex state. In general, such a unitary transformation consists of a collection of independent rotations of the local Hilbert spaces. The number and the type of rotations required to extract sufficient information to learn the phases depends on the structure of the specific quantum state under reconstruction.

Given a dataset D = {σ^b} of measurements in different bases, the RBM reconstruction can be realized by minimizing the total KL divergence in all bases,

\[ C_{\lambda\mu} = -\frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}^{\mathbf{b}} \in \mathcal{D}} \log |\psi_{\lambda\mu}(\boldsymbol{\sigma}^{\mathbf{b}})|^2 = -\frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}^{\mathbf{b}} \in \mathcal{D}} \left[ \log\left( \sum_{\boldsymbol{\sigma}} U_{\boldsymbol{\sigma}^{\mathbf{b}} \boldsymbol{\sigma}}\, \psi_{\lambda\mu}(\boldsymbol{\sigma}) \right) + \mathrm{c.c.} \right], \tag{30} \]

where we have omitted the constant entropy term. By taking the gradients with respect to the parameters one obtains

\[ \nabla_\lambda C_{\lambda\mu} = \frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}^{\mathbf{b}} \in \mathcal{D}} \mathrm{Re}\left[ \big\langle \nabla_\lambda \mathcal{E}_\lambda(\boldsymbol{\sigma}) \big\rangle_{Q_b(\boldsymbol{\sigma})} \right] - \big\langle \nabla_\lambda \mathcal{E}_\lambda(\boldsymbol{\sigma}) \big\rangle_{p_\lambda}, \tag{31} \]

\[ \nabla_\mu C_{\lambda\mu} = -\frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}^{\mathbf{b}} \in \mathcal{D}} \mathrm{Im}\left[ \big\langle \nabla_\mu \mathcal{E}_\mu(\boldsymbol{\sigma}) \big\rangle_{Q_b(\boldsymbol{\sigma})} \right], \tag{32} \]

where the averages over the quasi-probability distribution Q_b(σ) = U_{σ^b σ} ψ_λμ(σ) are calculated directly on the measurement data. Since the negative phase does not depend on the phase parameters μ, standard CD training can be directly applied here. A detailed derivation of the gradients can be found in Ref. [42].
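As a toy illustration of Eq. (29) (our own sketch, with a small random complex state standing in for an RBM wavefunction), consider the simple case where a single qubit is measured in the x basis, i.e. a local Hadamard rotation: the rotated amplitude is a two-term sum weighted by Hadamard matrix elements, and the rotated-basis probabilities again sum to one.

```python
import numpy as np

rng = np.random.default_rng(8)
n_v = 3

# A small complex state psi(sigma), indexed by the integer code of sigma.
psi = rng.standard_normal(2**n_v) + 1j * rng.standard_normal(2**n_v)
psi /= np.linalg.norm(psi)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # single-qubit Hadamard: z basis -> x basis

def rotated_amplitude(outcome, site):
    """Eq. (29) for a rotation on one site only: Psi(sigma^b) = sum over the z value of that site."""
    amp = 0.0 + 0.0j
    for s in (0, 1):
        sigma = list(outcome)
        sigma[site] = s
        code = sum(bit << j for j, bit in enumerate(sigma))
        amp += H[outcome[site], s] * psi[code]
    return amp

# Born-rule probabilities when qubit 0 is measured in x and the rest in z.
probs = [abs(rotated_amplitude(tuple((k >> j) & 1 for j in range(n_v)), site=0))**2
         for k in range(2**n_v)]
print("sum of rotated-basis probabilities:", sum(probs))   # should be 1
```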
D. Density operators

When the purity of the quantum state of interest cannot be assumed, one needs to reconstruct the full density operator, ϱ(σ, σ') = ⟨σ|ϱ̂|σ'⟩. Similar to the case of a pure state, before handling the reconstruction we require a representation of the density matrix in terms of a set of network parameters, ρ_λμ(σ, σ'), i.e. a neural density operator (NDO). However, in contrast with an RBM wavefunction, the construction of an NDO has more stringent requirements, namely the Hermiticity condition ρ̂_λμ = ρ̂†_λμ and the positive semi-definite condition ρ̂_λμ ≥ 0. One way to enforce the latter directly in the neural network representation consists of adding a set of auxiliary degrees of freedom that purify the mixed state of the physical system [60].
1. Latent space purification
For any mixed quantum state, it is always possible to introduce a set of auxiliary variables α in such a way that the quantum state of the composite system is pure [61]. In the context of neural networks, we can introduce an RBM wavefunction for the enlarged system,

\[ |\psi_{\lambda\mu}\rangle = \sum_{\boldsymbol{\sigma}, \boldsymbol{\alpha}} \psi_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\alpha})\, |\boldsymbol{\sigma}\rangle \otimes |\boldsymbol{\alpha}\rangle, \tag{33} \]

and obtain an NDO by tracing out the auxiliary variables:

\[ \rho_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') = \sum_{\boldsymbol{\alpha}} \psi^*_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\alpha})\, \psi_{\lambda\mu}(\boldsymbol{\sigma}', \boldsymbol{\alpha}). \tag{34} \]

By embedding the auxiliary units in the latent space of the neural network, it is possible to perform this trace analytically [60]:

\[ \rho_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') = \frac{1}{Z_\lambda}\, e^{-\Gamma^{[+]}_{\lambda}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') - i\,\Gamma^{[-]}_{\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') - \Pi_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}')}. \tag{35} \]

Here we defined

\[ \Gamma^{[\pm]}_{\lambda/\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') = \frac{1}{2}\left[ \mathcal{E}_{\lambda/\mu}(\boldsymbol{\sigma}) \pm \mathcal{E}_{\lambda/\mu}(\boldsymbol{\sigma}') \right] \tag{36} \]

and

\[ \Pi_{\lambda\mu}(\boldsymbol{\sigma}, \boldsymbol{\sigma}') = -\sum_k \log\left[ 1 + e^{\left[ V_\lambda (\boldsymbol{\sigma} + \boldsymbol{\sigma}') + i\, V_\mu (\boldsymbol{\sigma} - \boldsymbol{\sigma}') \right]_k} \right], \tag{37} \]

capturing, respectively, the correlations within the system, and the correlations between the system and the environment. The new parameters V_{λ/μ} encode the degree of mixing of the state of the physical system – they are identically zero for a pure state.

The cost function for the quantum reconstruction of an NDO is given by

\[ C_{\lambda\mu} = -\frac{1}{\|\mathcal{D}\|} \sum_{\boldsymbol{\sigma}^{\mathbf{b}} \in \mathcal{D}} \log \rho_{\lambda\mu}(\boldsymbol{\sigma}^{\mathbf{b}}, \boldsymbol{\sigma}^{\mathbf{b}}) - H_{\mathcal{D}}, \tag{38} \]

and its gradients can be easily calculated analytically [42, 60]. Similarly to the case of a pure state, all the gradients can be evaluated directly on the training data (provided the appropriate unitary rotations are applied to the state). The exception is the term involving the partition function (the negative phase), which is approximated by the CD algorithm using a finite number of block Gibbs sampling steps (equivalent to sampling the distribution ρ_λμ(σ, σ)). Given that the purification through the latent space of an RBM architecture generates a physical density operator (ρ̂_λμ ≥ 0), this type of ansatz is also suitable for the simulation of quantum dynamics of open systems, which was recently explored in various numerical experiments [62–64].

When evaluating the gradients of the cost function in Eq. 38, the NDO needs to be transformed back into the reference basis by the appropriate unitary transformations related to the measurement basis, ρ̂^b_λμ = Û_b ρ̂_λμ Û†_b. This rotation has to be carried out explicitly, and it is therefore only feasible as long as Û acts non-trivially on a sufficiently small number of degrees of freedom. This limitation can be circumvented by avoiding the parametrization of the quantum state directly, and instead using a generative model to represent the probability distribution underlying the measurement outcomes of an informationally complete set of positive operator-valued measures (POVMs) [65].
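The purification of Eqs. (33)–(34) guarantees a physical density operator by construction, which can be checked directly for a small system. The sketch below (our own illustration, with a random purified state standing in for the RBM parametrization) traces out the auxiliary variables numerically.

```python
import numpy as np

rng = np.random.default_rng(9)
n_sys, n_aux = 3, 2

# A small complex "purified" wavefunction psi(sigma, alpha) over system x auxiliary configurations,
# a stand-in for the RBM purification psi_{lambda,mu}(sigma, alpha) of Eq. (33).
psi = rng.standard_normal((2**n_sys, 2**n_aux)) + 1j * rng.standard_normal((2**n_sys, 2**n_aux))
psi /= np.linalg.norm(psi)

# Eq. (34): rho(sigma, sigma') = sum_alpha psi*(sigma, alpha) psi(sigma', alpha).
rho = psi.conj() @ psi.T

# The construction guarantees a physical density matrix: Hermitian, unit trace, positive semi-definite.
print("Hermitian:", np.allclose(rho, rho.conj().T))
print("trace:", np.trace(rho).real)
print("smallest eigenvalue:", np.linalg.eigvalsh(rho).min())
```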
E. Reconstruction of experimental wavefunctions

We have shown that RBMs trained with unsupervised learning offer a versatile approach to quantum state reconstruction of many-body systems. In this section, we turn to the case of RBM reconstruction of experimental data from NISQ hardware.
1. Noise mitigation
One of the major obstacles in reconstructing quantum states from real experiments is the presence of measurement errors. In practice, when performing measurements on a system prepared in the quantum state ϱ̂, one obtains measurement outcomes τ which do not correspond to projective measurements, but are instead described by a POVM, Π̂(τ) = Σ_σ p(τ|σ) |σ⟩⟨σ|, where the distribution p(τ|σ) is the probability of recording the outcome |τ⟩ given the actual measurement |σ⟩. The probability distribution underlying a set of measurement data is then given by P(τ) = Tr[Π̂(τ) ϱ̂]. Assuming the rates of measurement errors p(τ|σ) are known from the experiment, it is possible to incorporate the noisy measurements into the RBM architecture in such a way that the neural network learns the de-noised distribution, corresponding to ideal projective measurements on the state ϱ̂.
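As a minimal illustration of the POVM description above (our own sketch, assuming independent single-qubit readout errors with a known flip probability), the observed distribution P(τ) is the ideal Born distribution transformed by a confusion matrix p(τ|σ).

```python
import numpy as np

n = 3                                   # number of qubits
p_flip = 0.05                           # probability of reading out the wrong value on each qubit

# Single-qubit confusion matrix p(tau_i | sigma_i), assumed independent across qubits.
single = np.array([[1 - p_flip, p_flip],
                   [p_flip, 1 - p_flip]])

# Full confusion matrix p(tau | sigma) over bitstrings, built as a tensor product.
confusion = single
for _ in range(n - 1):
    confusion = np.kron(confusion, single)

# Ideal (de-noised) Born distribution P(sigma) of some state, here a GHZ-like example.
p_ideal = np.zeros(2**n)
p_ideal[0] = p_ideal[-1] = 0.5

# Distribution actually observed in the lab: P(tau) = sum_sigma p(tau|sigma) P(sigma).
p_observed = confusion @ p_ideal
print("observed probabilities:", np.round(p_observed, 4))
```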
2. Application to a Rydberg-atom quantum simulator

Finally, we summarize a recent experiment where RBM quantum reconstruction was applied to real data from a NISQ simulator. Specifically, the experimental system consists of an array of cold Rydberg atoms [2, 68], one of the highest-quality platforms for programmable simulation of Ising-like quantum spins [69–71]. In the experiment, Rb atoms are individually trapped by optical tweezers in a defect-free array. The atomic ground state |g⟩ is coupled to a highly excited Rydberg state |r⟩ by a uniform laser drive, and the atoms interact through a van der Waals potential, resulting in the Hamiltonian

\[ \hat{H}(\Omega, \Delta) = -\Delta \sum_i \hat{n}_i - \frac{\Omega}{2} \sum_i \hat{\sigma}^x_i + \sum_{i<j} V_{ij}\, \hat{n}_i \hat{n}_j, \]

where n̂_i is the Rydberg occupation of atom i, Ω the Rabi frequency, Δ the laser detuning, and V_ij the van der Waals interaction between atoms i and j.

IV. OUTLOOK

Modern machine learning has provided us with generative modeling techniques that are perfectly suited for the emerging landscape of NISQ hardware. Stochastic neural networks, such as RBMs and their cousins [65, 67, 72], are heuristically known to provide good-quality state reconstructions for intermediate-scale and noisy data. In that sense, their adoption for quantum state reconstruction on devices of tens, hundreds, or even thousands of qubits should come as no surprise.

As discussed in this review, the systematic development of RBM theory for use in quantum state reconstruction is becoming well understood from a formal standpoint. Parallel to theoretical and algorithmic advancements, a crucial role is also played by the development of related open source software [73, 74], easily accessible to experimentalists. However, many fundamental questions still remain to be answered if such machine learning techniques are to become fully integrated with NISQ hardware.

First, as evident in this review, the most well-studied cases involve wavefunctions that are real and positive – mathematically equivalent to probability distributions.
Finally, we summarize a recent experiment where RBMquantum reconstruction was applied to real data from aNISQ simulator. Specifically, the experimental systemconsists of an array of cold Rydberg atoms [2, 68], oneof the highest-quality platforms for programmable simu-lation of Ising-like quantum spins [69–71]. In the exper-iment, Rb atoms are individually trapped by opticaltweezers in a defect-free array. The atomic ground state | g (cid:105) is coupled to an highly excited Rydberg state | r (cid:105) bya uniform laser drive, and the atoms interact with a Vander Waals potential, resulting into the Hamiltonian ˆ H (Ω , ∆) = − ∆ (cid:88) i ˆ n i − Ω2 (cid:88) i ˆ σ xi + (cid:88) i Modern machine learning has provided us with genera-tive modeling techniques that are perfectly suited for theemerging landscape of NISQ hardware. Stochastic neuralnetworks, such as RBMs and their cousins [65, 67, 72],are heuristically known to provide good quality state re-constructions for intermediate-scale and noisy data. Inthat sense, their adoption to quantum state reconstruc-tion on devices of tens, hundreds, or even thousands ofqubits should come as no surprise.As discussed in this review, the systematic develop-ment of RBM theory for use in quantum state reconstruc-tion is becoming well understood from a formal stand-point. Parallel to theoretical and algorithmic advance-ments, a crucial role is also played by the developmentof related open source software [73, 74], easily accessibleto experimentalists. However, many fundamental ques-tions still remain to be answered if such machine learn-ing techniques are to become fully integrated with NISQhardware.First, as evident in this review, the most well-studiedcases involve wavefunctions that are real and positive –mathematically equivalent to probability distributions. There, standard generative models such as RBMs canbe employed with little alteration from their originalindustry-motivated design. Under the assumption of pu-rity, recent work has demonstrated the efficiency of mod-ern algorithms for unsupervised learning in approximatestate reconstruction. In this case, RBMs in particularhave shown their utility in producing accurate and scal-able estimators for physical observables not directly avail-able from the original data set; i.e. they generalize well.Of particular interest is the basis-independent Renyi en-tanglement entropy, which can be measured directly froma trained RBM using a scalable algorithm involving repli-cation of the model wavefunction. This is perhaps themost striking example of a measurement that is resource-intensive experimentally [7], but relatively simple to im-plement in the trained generative model.Real and positive wavefunctions occupy a special placein the landscape of physically-interesting states; for ex-ample, they are the ground states of stoquastic Hamilto-nians. However, a large proportion of quantum states un-der study theoretically and experimentally cannot be as-sumed to have this significant simplification. As we havediscussed in detail, in the case of complex wavefunctions,state reconstruction is possible with RBMs (and othergenerative models). What is required first is a conven-tion to parameterize the phase, e.g. in additional hiddenlayers, or as complex weights [59]. Then, measurementsin more than one basis are needed to train the parame-ters encoding the phase of the wavefunction. 
Given this strategy, experimental NISQ wavefunctions, such as cold-atom implementations of the fermionic Hubbard model [6] or other interesting many-body Hamiltonians, may conceivably be reconstructed in the near future.

Herein lies one frontier for state reconstruction with machine learning. In the quest to construct a NISQ-compatible generative modeling method, foremost is the question of scaling of the number of measurement bases required for informational completeness. Very little is known theoretically about this scaling for wavefunctions of interest to NISQ simulators; in the pure case, the number of bases required to learn an N-qubit state could range from 1 (see above), to a number that grows exponentially in N (see e.g. Ref. [75]). This wide range of possibilities leaves open many questions about the learnability of quantum states. For example, for what other typical physical wavefunctions is the number of bases tractable in the context of generative models (RBMs or otherwise)? Also, how does the target wavefunction structure affect the number of measurements required in each basis? Finally, what is the relationship between these numbers and the scaling of the RBM parameters required for a desired representational accuracy? An entire field related to the study of how efficient learning relates to the sign or entanglement structure of a quantum state still lies ahead.

Moving away from pure states, the ability to represent density matrices suggests the possibility that machine-learning reconstruction can be expressed as an approximate re-formulation of more traditional quantum state tomography. The same scaling questions apply as above for the context of generic complex wavefunctions (albeit with the possibility of significant further roadblocks to scaling). A reformulation of the problem in the language of informationally-complete POVMs, briefly mentioned here [65], offers the tantalizing possibility of scaling improvements, at the cost of (experimentally) more complicated measurements. Finally, success with the mixed-state density matrix formulation suggests that a re-imagination of process tomography as an unsupervised learning problem could also be in store. With further development along these lines, generative modeling is poised to breach beyond the realm of NISQ simulators, to become a tool for gate-based architectures in the near future.

Looking forward, it is clear that today's hardware is a necessary stepping stone to the more powerful quantum technologies of the future. As these devices continue to grow, they will develop in lock-step with powerful classical algorithms, to aid in all stages of state preparation, measurement, verification, error correction, and more. With the dawn of artificial intelligence as the most powerful classical computing paradigm of a generation, it stands to reason that machine learning of quantum many-body states will play a critical role in the NISQ era and beyond.

Acknowledgements

The Flatiron Institute is supported by the Simons Foundation. R.G.M. is supported by NSERC of Canada, a Canada Research Chair, and the Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported through Industry Canada and by the Province of Ontario through the Ministry of Research & Innovation.

[1] J. Preskill, Quantum, 79 (2018).
[2] H. Bernien, S. Schwartz, A. Keesling, H. Levine, A. Omran, H. Pichler, S. Choi, A. S. Zibrov, M. Endres, M. Greiner, V. Vuletić, and M. D. Lukin, Nature, 579 (2017).
[3] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M.
Chow, and J. M. Gambetta, Nature, 242 EP (2017).
[4] J. Zhang, G. Pagano, P. W. Hess, A. Kyprianidis, P. Becker, H. Kaplan, A. V. Gorshkov, Z. X. Gong, and C. Monroe, Nature, 601 EP (2017).
[5] B. M. Terhal, Nature Physics, 530 (2018).
[6] A. Mazurenko, C. S. Chiu, G. Ji, M. F. Parsons, M. Kanász-Nagy, R. Schmidt, F. Grusdt, E. Demler, D. Greif, and M. Greiner, Nature, 462 EP (2017).
[7] R. Islam, R. Ma, P. M. Preiss, M. Eric Tai, A. Lukin, M. Rispoli, and M. Greiner, Nature, 77 EP (2015).
[8] Y. LeCun, Y. Bengio, and G. Hinton, Nature, 436 (2015).
[9] D. E. Rumelhart and J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations (MIT Press, Cambridge, MA, USA, 1986).
[10] F. Rosenblatt, Psychological Review, 386 (1958).
[11] W. S. McCulloch and W. Pitts, The Bulletin of Mathematical Biophysics, 115 (1943).
[12] M. Minsky and S. Papert, Perceptrons (MIT Press, Cambridge, MA, 1969).
[13] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature, 533 EP (1986).
[14] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, Cognitive Science, 147 (1985).
[15] W. Little, Mathematical Biosciences, 101 (1974).
[16] W. Little and G. L. Shaw, Mathematical Biosciences, 281 (1978).
[17] J. J. Hopfield, Proceedings of the National Academy of Sciences, 2554 (1982).
[18] J. J. Hopfield, D. I. Feinstein, and R. G. Palmer, Nature, 158 EP (1983).
[19] G. E. Hinton and T. J. Sejnowski (MIT Press, Cambridge, MA, USA, 1986) Chap. Learning and Relearning in Boltzmann Machines, pp. 282–317.
[20] P. Smolensky, in Parallel Distributed Processing, edited by D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (MIT Press, Cambridge, MA, USA, 1986) Chap. Information Processing in Dynamical Systems: Foundations of Harmony Theory, pp. 194–281.
[21] N. Le Roux and Y. Bengio, Neural Comput., 1631 (2008).
[22] A. Fischer and C. Igel, in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, edited by L. Alvarez, M. Mejail, L. Gomez, and J. Jacobo (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 14–36.
[23] G. E. Hinton, Neural Computation, 1771 (2002).
[24] Y. Bengio and O. Delalleau, Neural Computation, 1601 (2009).
[25] A. Fischer and C. Igel, Neural Computation, 664 (2011).
[26] M. Á. Carreira-Perpiñán and G. E. Hinton, in AISTATS (2005).
[27] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, in Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 28, edited by S. Dasgupta and D. McAllester (PMLR, Atlanta, Georgia, USA, 2013) pp. 1139–1147.
[28] M. D. Zeiler, ArXiv e-prints (2012), arXiv:1212.5701.
[29] D. P. Kingma and J. Ba, ArXiv e-prints (2014), arXiv:1412.6980.
[30] A. Krogh and J. A. Hertz, in Advances in Neural Information Processing Systems 4, edited by J. E. Moody, S. J. Hanson, and R. P. Lippmann (Morgan-Kaufmann, 1992) pp. 950–957.
[31] G. E. Hinton, in Neural Networks: Tricks of the Trade: Second Edition, edited by G. Montavon, G. B. Orr, and K.-R. Müller (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 599–619.
[32] K. Vogel and H. Risken, Phys. Rev. A, 2847 (1989).
[33] M. Ježek, J. Fiurášek, and Z. Hradil, Physical Review A, 012305 (2003).
[34] K. Banaszek, M. Cramer, and D. Gross, New Journal of Physics, 125020 (2013).
[35] D. F. V. James, P. G. Kwiat, W. J. Munro, and A. G. White, Phys. Rev. A, 052312 (2001).
[36] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-al-kar, M. Chwalla, T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, C.
Becher, O. Gühne, W. Dür, and R. Blatt, Nature, 643 (2005).
[37] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett, O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, Nature Communications, 149 (2010).
[38] B. P. Lanyon, C. Maier, M. Holzäpfel, T. Baumgratz, C. Hempel, P. Jurcevic, I. Dhand, A. S. Buyskikh, A. J. Daley, M. Cramer, M. B. Plenio, R. Blatt, and C. F. Roos, Nature Physics, 1158 (2017).
[39] G. Tóth, W. Wieczorek, D. Gross, R. Krischek, C. Schwemmer, and H. Weinfurter, Phys. Rev. Lett., 250403 (2010).
[40] T. Moroder, P. Hyllus, G. Tóth, C. Schwemmer, A. Niggebaum, S. Gaile, O. Gühne, and H. Weinfurter, New Journal of Physics, 105001 (2012).
[41] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Physical Review Letters, 150401 (2010).
[42] Giacomo Torlai, "Augmenting quantum mechanics with artificial intelligence," (2018).
[43] G. Torlai and R. G. Melko, Physical Review B, 165134 (2016).
[44] W. K. Hastings, Biometrika, 97 (1970).
[45] A. Morningstar and R. G. Melko, J. Mach. Learn. Res., 5975 (2017).
[46] G. E. Hinton, S. Osindero, and Y.-W. Teh, Neural Computation, 1527 (2006), pMID: 16764513, https://doi.org/10.1162/neco.2006.18.7.1527.
[47] R. Salakhutdinov and G. Hinton, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 5, edited by D. van Dyk and M. Welling (PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 2009) pp. 448–455.
[48] S. Bravyi, D. P. DiVincenzo, R. Oliveira, and B. M. Terhal, Quantum Info. Comput., 361 (2008).
[49] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics, 447 (2018).
[50] F. Becca and S. Sorella, Quantum Monte Carlo Approaches for Correlated Systems (Cambridge University Press, 2017).
[51] A. Rényi, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (University of California Press, Berkeley, Calif., 1961) pp. 547–561.
[52] M. B. Hastings, I. González, A. B. Kallin, and R. G. Melko, Physical Review Letters, 157201 (2010).
[53] A. B. Kallin, M. B. Hastings, R. G. Melko, and R. R. P. Singh, Phys. Rev. B, 165134 (2011).
[54] Y. Zhang, T. Grover, and A. Vishwanath, Phys. Rev. Lett., 067202 (2011).
[55] J.-M. Stéphan, H. Ju, P. Fendley, and R. G. Melko, New Journal of Physics, 015004 (2013).
[56] S. R. White, Phys. Rev. Lett., 2863 (1992).
[57] H. G. Evertz, Advances in Physics, 1 (2003), https://doi.org/10.1080/0001873021000049195.
[58] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Curran Associates, Inc., 2012) pp. 1097–1105.
[59] G. Carleo and M. Troyer, Science, 602 (2017).
[60] G. Torlai and R. G. Melko, Phys. Rev. Lett., 240503 (2018).
[61] G. Benenti, G. Casati, and G. Strini, Principles of Quantum Computation and Information (World Scientific, 2004).
[62] M. J. Hartmann and G. Carleo, arXiv e-prints, arXiv:1902.05131 (2019), arXiv:1902.05131 [quant-ph].
[63] F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, arXiv e-prints, arXiv:1902.10104 (2019), arXiv:1902.10104 [quant-ph].
[64] A. Nagy and V. Savona, arXiv e-prints, arXiv:1902.09483 (2019), arXiv:1902.09483 [quant-ph].
[65] J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Nature Machine Intelligence, 155 (2019).
[66] G. Torlai, B. Timar, E. P. L. van Nieuwenburg, H. Levine, A. Omran, A. Keesling, H. Bernien, M.
Greiner, V. Vuletić, M. D. Lukin, R. G. Melko, and M. Endres, arXiv e-prints, arXiv:1904.08441 (2019), arXiv:1904.08441 [quant-ph].
[67] A. Macarone Palmieri, E. Kovlakov, F. Bianchi, D. Yudin, S. Straupe, J. Biamonte, and S. Kulik, arXiv e-prints, arXiv:1904.05902 (2019), arXiv:1904.05902 [quant-ph].
[68] M. Endres, H. Bernien, A. Keesling, H. Levine, E. R. Anschuetz, A. Krajenbrink, C. Senko, V. Vuletic, M. Greiner, and M. D. Lukin, Science, 1024 (2016).
[69] P. Schauß, J. Zeiher, T. Fukuhara, S. Hild, M. Cheneau, T. Macrì, T. Pohl, I. Bloch, and C. Gross, Science, 1455 (2015).
[70] H. Labuhn, D. Barredo, S. Ravets, S. de Léséleuc, T. Macrì, T. Lahaye, and A. Browaeys, Nature, 667 EP (2016).
[71] E. Guardado-Sanchez, P. T. Brown, D. Mitra, T. Devakul, D. A. Huse, P. Schauß, and W. S. Bakr, Phys. Rev. X, 021069 (2018).
[72] A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo, and S. Severini, npj Quantum Information, 28 (2018).
[73] M. J. S. Beach, I. De Vlugt, A. Golubeva, P. Huembeli, B. Kulchytskyy, X. Luo, R. G. Melko, E. Merali, and G. Torlai, arXiv e-prints, arXiv:1812.09329 (2018), arXiv:1812.09329 [quant-ph].
[74] G. Carleo, K. Choo, D. Hofmann, J. E. T. Smith, T. Westerhout, F. Alet, E. J. Davis, S. Efthymiou, I. Glasser, S.-H. Lin, M. Mauri, G. Mazzola, C. B. Mendl, E. van Nieuwenburg, O. O'Reilly, H. Théveniaut, G. Torlai, and A. Wietek, arXiv e-prints, arXiv:1904.00031 (2019), arXiv:1904.00031 [quant-ph].
[75] X. Ma, T. Jackson, H. Zhou, J. Chen, D. Lu, M. D. Mazurek, K. A. G. Fisher, X. Peng, D. Kribs, K. J. Resch, Z. Ji, B. Zeng, and R. Laflamme, Phys. Rev. A 93.