Learning hard quantum distributions with variational autoencoders
Andrea Rocchetto,1,2,3,∗ Edward Grant,† Sergii Strelchuk, Giuseppe Carleo,5,6 and Simone Severini3,7

1 Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
2 Department of Materials, University of Oxford, Oxford OX1 3PH, UK
3 Department of Computer Science, University College London, London WC1E 6EA, UK
4 DAMTP, University of Cambridge, Cambridge CB3 0WA, UK
5 Institute for Theoretical Physics, ETH Zürich, Zürich 8093, Switzerland
6 Center for Computational Quantum Physics, Flatiron Institute, New York 10010, USA
7 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
The exact description of many-body quantum systems represents one of the major challenges in modern physics, because it requires an amount of computational resources that scales exponentially with the size of the system. Simulating the evolution of a state, or even storing its description, rapidly becomes intractable for exact classical algorithms. Recently, machine learning techniques, in the form of restricted Boltzmann machines, have been proposed as a way to efficiently represent certain quantum states with applications in state tomography and ground state estimation. Here, we introduce a practically usable deep architecture for representing and sampling from probability distributions of quantum states. Our representation is based on variational autoencoders, a type of generative model in the form of a neural network. We show that this model is able to learn efficient representations of states that are easy to simulate classically and can compress states that are not classically tractable. Specifically, we consider the learnability of a class of quantum states introduced by Fefferman and Umans. Such states are provably hard to sample from for classical computers, but not for quantum ones, under plausible computational complexity assumptions. The good level of compression achieved for hard states suggests these methods can be suitable for characterizing states of the size expected in first-generation quantum hardware.
INTRODUCTION
One of the most fundamental tenets of quantum physics is that the physical state of a many-body quantum system is fully specified by a high-dimensional function of the quantum numbers, the wave function. As the size of the system grows, the number of parameters required for its description scales exponentially in the number of its constituents. This complexity is a severe fundamental bottleneck in the numerical simulation of interacting quantum systems. Nonetheless, several approximate methods can handle the exponential complexity of the wave function in special cases. For example, quantum Monte Carlo (QMC) methods allow one to sample exactly from many-body states free of the sign problem [1–3], and
Tensor Network (TN) approaches represent very efficiently low-dimensional states satisfying the area law for entanglement [4, 5].

Recently, machine learning methods have been introduced to tackle a variety of tasks in quantum information processing that involve the manipulation of quantum states. These techniques offer greater flexibility and, potentially, better performance with respect to traditionally used methods. Research efforts have focused on representing quantum states in terms of restricted Boltzmann machines (RBMs). The RBM representation of the wave function, introduced by Carleo and Troyer [6], has been successfully applied to a variety of physical problems, ranging from strongly correlated spins [6, 7] and fermions [8] to topological phases of matter [9–11]. Particularly relevant to our purposes is the work by Torlai et al. [12], which makes use of RBMs to perform quantum state tomography of states whose evolution can be simulated in polynomial time using classical methods (e.g. matrix product states (MPS) [13]). Although it is remarkable that RBMs can learn an efficient representation of this class of states without any explicitly programmed instruction, it remains unclear how the model behaves on states for which no efficient classical description is available.

Theoretical analysis of the representational power of RBMs has been conducted in a series of works [7, 14–17]. Gao and Duan, in particular, showed that RBMs cannot efficiently encode every quantum state [14]. They proved that Deep Boltzmann Machines (DBMs) with complex weights, a multilayer variant of RBMs, can efficiently represent most physical states. Although this result is of great theoretical interest, the practical application of complex-valued DBMs in the context of unsupervised learning has not yet been demonstrated, due to the lack of efficient methods to sample from DBMs when the weights are complex-valued.
The absence of practically usable deep architectures remains an important limitation of current neural-network-based learning methods for quantum systems. Indeed, several research efforts on neural networks [18–20] have shown that depth significantly improves the representational capability of networks for some classes of functions (such as compositional functions).

In this Paper, we address several open questions concerning neural network quantum states. First, we study how the depth of the network affects the ability to compress quantum many-body states. This task is achieved upon the introduction of a deep neural network architecture for encoding the probability distribution of quantum states, based on variational autoencoders (VAEs) [21]. We benchmark the performance of deep networks on states for which no efficient classical description is known, finding that depth systematically improves the quality of the reconstruction for states that are computationally tractable and for hard states that can be efficiently constructed with a quantum computer. Surprisingly, the same does not apply to hard states that cannot be efficiently constructed by means of a quantum process. Here, depth does not improve the reconstruction accuracy.

Second, we show that VAEs can learn efficient representations of computationally tractable states and can reduce the number of parameters required to represent a hard quantum state by up to a factor 5. This improvement makes VAE states a promising tool for the characterization of early quantum devices, which are expected to have a number of qubits slightly larger than what can be efficiently simulated using existing methods [22].

Encoding quantum probability distributions with VAEs
Variational autoencoders (VAEs), introduced by Kingma and Welling in 2013 [21], are generative models based on layered neural networks. Given a set of i.i.d. data points X = {x^(i)}, where x^(i) ∈ R^n, generated from some distribution p_θ(x^(i)|z) over Gaussian-distributed latent variables z with model parameters θ, finding the posterior density p_θ(z|x^(i)) is often intractable. VAEs approximate the true posterior with a tractable model q_φ(z|x^(i)), with parameters φ, and provide a procedure to sample efficiently from p_θ(x^(i)|z). The procedure does not employ Monte Carlo methods.

As shown in Fig. 1, a VAE is composed of three main components: the encoder, which projects the input into the latent space; the latent space itself; and the decoder, which reconstructs the input from the latent representation. Once the network is trained, the encoder can be dropped and, by generating samples in the latent space, it is possible to sample according to the original distribution. In graph-theoretic terms, the graph representing a network with a given number of layers is a blow-up of a directed path on the same number of vertices. Such a graph is obtained by replacing each vertex of the path with an independent set of arbitrary but fixed size. The independent sets are then connected to form complete bipartite graphs.

The model is trained by minimizing over θ and φ the cost function:

J(\theta, \phi, x^{(i)}) = -\mathbb{E}_{z \sim q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)] + D_{KL}(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)).   (1)

The first term (reconstruction loss), -\mathbb{E}_{z \sim q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)], is the expected negative log-likelihood of the i-th data point and favors choices of θ and φ that lead to more faithful reconstructions of the input.
The second term (regularization loss), D_{KL}(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)), is the Kullback-Leibler divergence between the encoder's distribution q_φ(z|x^(i)) and the Gaussian prior on z. A full treatment and derivations of the variational objective are given in [21].

VAEs can be used to encode the probability distribution associated with a quantum state. Let us consider an n-qubit quantum state |ψ⟩ and a basis {|b_i⟩}_{i=1,…,2^n}. We can write the probability distribution corresponding to |ψ⟩ as p(b_i) = |⟨b_i|ψ⟩|². If we consider the computational basis, we can write |ψ⟩ = Σ_{i=1}^{2^n} ψ_i |i⟩, where each basis element corresponds to an n-bit string. A VAE can be trained to generate basis elements |i⟩ according to the probability p(i) = |⟨i|ψ⟩|² = |ψ_i|².

We note that, in principle, it is possible to encode a full quantum state (phase included) in a VAE. This requires samples taken from more than one basis and a network structure that can distinguish among the different inputs. The development of VAE encodings for full quantum states is left to future work.

We approximate the true posterior distribution across measurement outcomes in the latent space z with a multivariate Gaussian with diagonal covariance structure, zero mean, and unit standard deviation. The training set consists of a set of basis elements generated according to the distribution associated with a quantum state. Following training, the variables z are sampled from a multivariate Gaussian and used as the input to the decoder. By taking samples from this Gaussian as input, the decoder is able to generate strings corresponding to measurement outcomes that closely follow the distribution of measurement outcomes used to train the network.

Hard and easy quantum states
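For concreteness, the regularization term in Eq. (1) has a closed form when q_φ is a diagonal Gaussian and the prior is a standard normal. The following sketch (plain NumPy, illustrative only, not the network code used in this work) evaluates it:

```python
import numpy as np

def kl_diag_gaussian(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ): the regularization term of Eq. (1)
    for a diagonal-Gaussian encoder and a standard-normal prior."""
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))

# An encoder that matches the prior incurs no penalty ...
print(kl_diag_gaussian(np.zeros(4), np.ones(4)))  # 0.0

# ... while any deviation from the prior is penalized.
print(kl_diag_gaussian(np.array([1.0, -0.5]), np.array([0.5, 2.0])))
```

The term vanishes exactly when the encoder matches the prior and grows as the encoder drifts away from it; this is what keeps the latent space close to the Gaussian that samples are later drawn from.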
In this section we introduce a method to classify quantum states based on the hardness of sampling from their probability distribution in a given basis. This will be used to assess the power of deep neural network models at representing many-body wave functions.

Figure 1. Encoding quantum probability distributions with VAEs. A VAE can be used to encode, and then generate samples according to, the probability distribution of a quantum state. Each dot corresponds to a neuron and neurons are arranged in layers. The input (top), latent, and output (bottom) layers contain n neurons. The number of neurons in the other layers is a function of the compression and the depth. Layers are fully connected with each other, with no intra-layer connectivity. The network has three main components: the encoder (blue neurons), the latent space (green), and the decoder (red). Each edge of the network is labelled by a weight θ. The total number of weights m in the decoder corresponds to the number of parameters used to represent a quantum state. The network can approximate quantum states using m < 2^n parameters. The model is trained on a dataset consisting of basis elements drawn according to the probability distribution of a quantum state. Elements of the basis are presented to the input layer on top of the encoder and, during the training phase, the weights of the network are optimized to reconstruct the same basis element in the output layer.

We now define two concepts that will be used frequently throughout the paper and form the basis of our classification method: reconstruction accuracy and compression. Let ρ and σ be n-qubit quantum states. We say that σ is a good representation of ρ if the fidelity F = \mathrm{Tr}\sqrt{\rho^{1/2} \sigma \rho^{1/2}} \geq 1 - \epsilon for some ε > 0. This accuracy metric cannot be immediately applied to the analysis of VAEs, which can only encode the probability distribution associated with a state. We now show that the fidelity can be expressed in terms of the probability distributions over a measurement that maximally distinguishes the two states. Let E = {E_i} be a POVM measurement. Then, using a result by Fuchs and Caves [23], we can write

F = \min_E \sum_i \sqrt{\mathrm{Tr}(E_i \rho)\,\mathrm{Tr}(E_i \sigma)},   (2)

where the minimum is taken over all possible POVMs. Note that p(i) = Tr(E_i ρ) and q(i) = Tr(E_i σ) are the probabilities of obtaining the outcome labelled by i when measuring ρ and σ, respectively, and Σ_i √(p(i) q(i)) is the Bhattacharyya coefficient between the two distributions. Using Eq. (2) we can relate the complexity of a state to the problem of estimating the fidelity F.
This corresponds to the hardness of sampling from the probability distribution p(i) = Tr(E′_i ρ), where E′ minimises Eq. (2) (here we assume that sampling from the approximating distribution q(i) is at most as hard as sampling from p(i)). Throughout the paper, unless explicitly mentioned, we work with states that have only positive, real entries in the computational basis. In this case, it is easy to see that the Bhattacharyya coefficient between the distributions reduces to the fidelity and, hence, measurement in the Z basis minimises Eq. (2).

We remark that, if it is not possible to find a POVM for which Eq. (2) is minimised, it is always possible to use the standard formulation of the fidelity as a metric in the context of VAEs. This can be accomplished by making use of 3 VAEs to encode the state σ over 3 different bases. Using standard tomographic techniques, such as maximum likelihood, measurements in a complete set of bases can then be used to reconstruct the full density matrix.

To connect the above definition of state complexity with VAEs we introduce the compression factor. Given an n-qubit state that is represented by a VAE with m parameters in the decoder, the compression factor is C = m/2^n. We say that a state ρ is exponentially compressible if there exists a network that approximates ρ with high accuracy using m = O(poly(n)) parameters.

Once a network is trained, the cost of generating a sample is proportional to the number of parameters in the network. In this sense the complexity of a state is parametrised by the number of parameters used by its neural network representation. Based on these observations, we call easy states those that can be represented with high accuracy and exponential compression, and hard states those whose high-accuracy representation requires a number of parameters exponential in n.
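Since our states have real, positive amplitudes, Eq. (2) reduces to the Bhattacharyya coefficient between the Z-basis distributions, which is straightforward to evaluate from two probability vectors. A minimal sketch (NumPy; the example vectors are arbitrary):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient sum_i sqrt(p_i * q_i) between two distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

p = np.array([0.5, 0.25, 0.25, 0.0])    # arbitrary example distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # uniform distribution on 2 qubits

print(bhattacharyya(p, p))  # identical distributions: coefficient 1.0
print(bhattacharyya(p, q))  # any mismatch lowers it below 1
```

In the experiments, this coefficient is evaluated between the target distribution |ψ_i|² and the empirical distribution of strings generated by the decoder.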
The last category includes: 1) states that can be efficiently sampled with a quantum computer, but are conjectured to have no classical algorithm for doing so; 2) states that cannot be efficiently obtained on a quantum computer starting from some fixed product input state (e.g. random states).

Under this definition, states that admit an efficient classical description (such as stabilizer states or MPS with low bond dimension) are easy, because we know that O(poly(n)) parameters are sufficient to specify the state. Specifically, for the class of easy states we consider separable states obtained by taking the tensor product of n different 1-qubit random states. More formally, we consider states of the form |τ⟩ = ⊗_{i=1}^{n} |r_i⟩, where the |r_i⟩ are random 1-qubit states. These states can be described using only 2n parameters.

Among the class of hard states of the first kind, we study the learnability of a type of hard distribution introduced in [24] which can be sampled exactly on a quantum computer. These distributions are conjectured to be hard to approximately sample from classically: the existence of an efficient sampler would lead to the collapse of the Polynomial Hierarchy under some natural conjectures described in [24, 25]. We discuss how to generate this type of states in the Methods section.

Finally, for the second class of hard states, we consider random pure states. These are generated by normalizing a 2^n-dimensional complex vector drawn from the unit sphere according to the Haar measure.

RESULTS

The role of depth in compressibility
Classically, depth is known to play a significant role in the representational capability of a neural network. Recent results, such as those by Mhaskar, Liao, and Poggio [18], Telgarsky [19], and Eldan and Shamir [20], showed that some classes of functions can be approximated by deep networks with the same accuracy as shallow networks but with exponentially fewer parameters.

The representational capability of networks that represent quantum states remains largely unexplored. Some of the known results are based only on empirical evidence and sometimes yield unexpected conclusions. For example, Morningstar and Melko [26] showed that shallow networks are more efficient than deep ones when learning the energy distribution of a 2-dimensional Ising model. In the context of the learnability of quantum states, Gao and Duan [14] proved that DBMs can efficiently represent some states that cannot be efficiently represented by shallow networks (i.e. states generated by polynomial-depth circuits or k-local Hamiltonians with a polynomial-size gap) using a polynomial number of hidden units. However, there are no known methods to sample efficiently from DBMs when the weights include complex-valued coefficients.

We benchmark with numerical simulations the role played by depth in compressing states of different levels of complexity. We focus on three different states: an easy state (the completely separable state discussed in the previous section), a hard state (according to Fefferman and Umans), and a random pure state.

Our results are presented in Fig. 2. Here, by keeping the number of parameters in the decoder constant, we determine the reconstruction accuracy of networks with increasing depth. Remarkably, depth affects the reconstruction accuracy of hard quantum states. This might indicate that VAEs are able to capture correlations in hard quantum states.
As a sanity check, we notice that the network can learn random product states and that depth does not affect the learnability of random states.

Our simulations suggest a further link between neural networks and quantum states, a topic that has recently received the attention of the community. Specifically, Levine et al. [27] demonstrated that convolutional rectifier networks with product pooling can be described as tensor networks. Using graph-theoretic tools, they showed that nodes in different layers model correlations across different scales and that adding more nodes to deeper layers of a network can make it better at representing non-local correlations.

Efficient compression of physical states
In this section we focus on two questions: can VAEs find efficient representations of easy states? What level of compression can we obtain for hard states? Through numerical simulations we show that VAEs can learn to efficiently represent some easy states (that are challenging for standard methods) and achieve good levels of compression for hard states. Remarkably, our methods can compress the hard quantum states introduced in [28] by up to a factor 5. We remark that the exponential hardness cannot be overcome for general quantum states, and our methods achieve only a constant-factor improvement on the overall complexity. This may nevertheless be sufficient for use as a characterisation tool where full classical simulation is not feasible.

We test the performance of the VAE representation on two classes of states: the hard states that can be constructed efficiently with a quantum computer, introduced by Fefferman and Umans [28], and states that can be generated with long-range Hamiltonian dynamics, as found for example in experiments with ultra-cold ions [29]. The states generated through this evolution are highly symmetric physical states. However, because the bond dimension increases exponentially with the evolution time, these states are particularly challenging for MPS methods. An interesting question is whether neural networks are able to exploit these symmetries and represent these states efficiently. We describe the long-range Hamiltonian dynamics in the Methods section.

Results are displayed in Fig. 3. For states obtained through Hamiltonian evolution we achieve almost maximum reconstruction accuracy at compression levels corresponding to a number of parameters m = O(100) ≪ 2^18, which implies that the VAE has learned an efficient representation of the state. In the case of hard states we can reach a compression of 0.2, corresponding to a factor 5 reduction in the number of parameters required to represent the state. Note that the entanglement properties of hard states are likely to make them hard to compress for tensor network states. For example, to compress an 18-qubit hard state using MPS (a type of tensor network that is known to be efficiently contractable), we estimated the required bond dimension D from the entanglement entropy S across bipartitions (D ≈ 2^S). Considering the number of variational parameters of an MPS with this bond dimension (in the best case), this would yield about 200 thousand variational parameters to represent those hard states. The resulting MPS compression factor is then about 1.23, a significantly lower figure with respect to the factor 5 compression obtained with VAEs. We note that this calculation only shows that the entanglement structure of hard states is not well modelled by MPS. Other types of tensor networks might be more amenable to the specific structure of these states, but it is unlikely that these models will be computationally tractable.

Figure 2. Depth affects the learnability of hard quantum states. Fidelity as a function of the number of layers in the VAE decoder for (a) an 18-qubit hard state that is easy to generate with a quantum computer, (b) random 18-qubit product states that admit efficient classical descriptions, and (c) random 15-qubit pure states. Error bars for (b) and (c) show the standard deviation over 5 different random states. The compression level, defined as C = m/2^n where m is the number of parameters in the VAE decoder and n is the number of qubits, is set to C = 0.015 for (b). We use a lower compression rate for product states because, due to their simple structure, even a 1-layer network achieves almost perfect overlap. Plot (b) makes use of up to 4 layers in order to avoid the saturation effects discussed in the Methods section.

Although limited, the levels of compression we achieve for hard states could play a role in experiments aimed at showing quantum supremacy. In this setting a quantum machine with a handful of noisy qubits performs a task that is not reproducible even by the fastest supercomputer. As recently highlighted by Montanaro and Harrow [30], one of the key challenges for quantum supremacy experiments is to verify that the quantum machine is behaving as expected. Because quantum computers are conjectured not to be efficiently simulatable, verifying that a quantum machine is performing as expected is a hard problem for classical machines. The paper by Jozsa and Strelchuk [31] provides an introduction to several approaches to the verification of quantum computation. Our methods might allow one to characterise the result of a computation by reducing the complexity of the problem. Because any verification of quantum supremacy will likely involve a machine with only a few qubits above what can be efficiently classically simulated, even small reductions in the number of parameters of the state might allow relevant quantities to be approximated in a computationally tractable way. Potentially, a neural network approach to verification could be accomplished by compressing a trusted initial state into a VAE whose parameters are then evolved according to a set of rules specified by the quantum circuit. By comparing the experimental distribution with the one sampled from the VAE it is then possible to determine whether the device is faulty.
We remark that this type of verification protocol would only "approximately verify" the system, because of the errors introduced during the compression phase.
DISCUSSION
In this work we introduced VAEs, a type of deep, generative neural network, as a way to encode the probability distribution of quantum states. Our methods are completely unsupervised, i.e. they do not require a labelled training set. By means of numerical simulations we showed that deep networks can represent hard quantum states that can be efficiently obtained by a quantum computer better than shallow ones. On the other hand, for states that are hard and conjectured not to be efficiently producible by quantum computers, depth does not appear to play a role in increasing the reconstruction accuracy. Our results suggest that neural networks are able to capture correlations in states that are provably hard to sample from for classical computers but not for quantum ones. As already pointed out in other works, this might signal that states that can be produced efficiently by a quantum computer have a structure that is well represented by a layered neural network.

Through numerical experiments we showed that our methods have two important features. First, they are capable of representing, using fewer parameters, states that are known to have efficient representations but where other classical approaches struggle. Second, VAEs can compress hard quantum states up to a constant factor. However low, this compression level might make it possible to approximately verify quantum states of the size expected on near-future quantum computers.

Figure 3. VAEs can learn efficient representations of easy states and can be used to characterize hard states. Fidelity as a function of the compression C = m/2^n for (a) an 18-qubit state generated by evolving 2^{-n/2} Σ_i |i⟩ under the long-range Hamiltonian time evolution described in the Methods section for a time t = 20, and (b) an 18-qubit hard state generated according to [28]. Panel (a) shows that the VAE can learn to represent, efficiently and with almost perfect accuracy, easy states that are challenging for MPS. Panel (b) shows that hard quantum states can be compressed with high reconstruction accuracy up to a factor 5. The decoder in (a) has 1 hidden layer to allow for greater compression without incurring the saturation effects discussed in the Methods section. The decoder in (b) has 6 hidden layers in order to maximise the representational capability of the network.

Presently, our methods only encode the probability distribution of a quantum state. Future research should focus on developing VAE architectures that can reconstruct the full set of amplitudes. Other interesting directions involve finding methods to compute the quantum evolution of the parameters of the network, and investigating whether the depth of a quantum circuit is related to the optimal depth of a VAE learning its output states. Finally, it would be interesting to investigate how information is encoded in the latent layers of the network. Such an analysis might provide novel tools to understand the information-theoretic properties of a quantum system.

METHODS

Numerical experiments
All our networks were trained using the TensorFlow r1.3 framework on a single NVIDIA K80 GPU. Training was performed using backpropagation and the Adam optimiser with a small initial learning rate [32]. Leaky rectified linear units (LReLU) were used on all hidden layers, with the leak set to a small value [33]. The weight of the regularization (KL) term in the cost function was increased linearly up to 0.85 during training [34]. This turned out to be critical, especially for hard states. A consequence of this approach is that the model does not learn the distribution until close to the end of training, irrespective of the number of training iterations. Each network was trained using 50,000 batches of 1,000 samples each. Each sample consists of a binary string representing a measurement outcome.

Following training, the state was reconstructed from the VAE decoder by drawing 100 · 2^n samples from a multivariate Gaussian with zero mean and unit variance. The samples were passed through the decoder to generate measurement outcomes in the form of binary strings. The relative frequency of each string was recorded and used to reconstruct the learned distribution, which was compared to the true distribution to determine the fidelity.

In all experiments the number of nodes in the latent layer is the same as the number of qubits. Using fewer or more nodes in this layer resulted in worse performance. The number of nodes in the hidden layers is determined by the number of layers and the compression C = m/2^n, where n is the number of qubits and m is the number of parameters in the decoder. In all cases the encoder has the same number of hidden layers, and nodes per layer, as the decoder.

We compress the VAE representation of a quantum state by removing neurons from each hidden layer of the VAE. For small n, achieving a high level of compression caused instabilities in the network (i.e. the reconstruction accuracy became more dependent on the weight initialisation). In this respect we note that, by restricting the number of neurons in the penultimate layer, we are effectively constraining the number of possible basis states that can be expressed in the output layer and, as a result, the number of configurations the VAE can sample from. This can be shown by noting that the activation functions of the penultimate layer generate a set of linear inequalities that must be simultaneously satisfied. A geometric argument, based on how many regions of an n-dimensional space m hyperplanes can separate, leads to the conclusion that, to have full expressive capability, the penultimate layer must include at least n neurons. Similar arguments have been discussed in [35] for multilayer perceptrons.
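To make the definition of C concrete, the sketch below counts weights and biases in a fully connected decoder and computes the compression factor for n = 18 qubits; the layer sizes are hypothetical and do not correspond to the architectures used in our experiments:

```python
def decoder_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected decoder.

    layer_sizes lists neuron counts from the latent layer through the
    hidden layers to the output layer.
    """
    return sum(n_in * n_out + n_out              # weight matrix + bias vector
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n = 18                          # number of qubits
layers = [n, 200, 200, 200, n]  # hypothetical latent -> hidden -> output sizes
m = decoder_parameter_count(layers)
C = m / 2**n                    # compression factor C = m / 2^n
print(m, C)  # m = 87818, so C is well below 1: fewer parameters than amplitudes
```

Compressing further means shrinking the hidden layers, which, as discussed above, eventually restricts which basis states the output layer can express.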
States that are classically hard to sample from
We study the learnability of a special class of hard states introduced by Fefferman and Umans [28], produced by a quantum computational process that exhibits quantum "supremacy". The latter is a phenomenon whereby a quantum circuit, consisting of quantum gates and measurements on a number of qubit lines, samples from a particular class of distributions that is known to be hard to sample from on a classical computer modulo some very plausible computational complexity assumptions. To demonstrate quantum supremacy one only requires quantum gates to operate within a certain fidelity, without full error-correction. This makes efficient sampling from such distributions feasible on near-term quantum devices and opens the search for practically relevant decision problems.

To construct a distribution one starts from an encoding function h : [m] → {0,1}^N. The function h performs an efficient encoding of its argument and is used to construct the following so-called efficiently specifiable polynomial in N variables:

Q(X_1, \ldots, X_N) = \sum_{z \in [m]} X_1^{h(z)_1} \cdots X_N^{h(z)_N},   (3)

where h(z)_i denotes the i-th bit of h(z), and m is an arbitrary integer. In the following, we pick h to be related to the permanent. More specifically, h : [0, n!-1] → {0,1}^{n²} maps the i-th permutation (out of n!) to a string that encodes its n × n permutation matrix in the natural way, resulting in an N-coordinate vector with N = n². To encode a number A ∈ [0, n!-1], we convert A to the factorial number system to get A′, obtaining the N-coordinate vector which identifies a particular permutation σ. With the above encoding, our efficiently specifiable polynomial Q takes the form:

Q(X_1, \ldots, X_N) = \sum_{z \in [0, n!-1]} X_1^{h(z)_1} \cdots X_N^{h(z)_N}.   (4)

Fix some number L and consider the set of vectors y = (y_1, …, y_N) ∈ [0, L-1]^N (i.e.
each y j rangesbetween 0 and L − y construct anothervector Z y = ( z y , . . . , z y N ) constructed as follows: each z y j corresponds to a complex L -ary root of unity raisedto power y j . For instance, pick L = 4 and consider y (cid:48) =(1 , , , , , , , Z y (cid:48) =( w , w , w , w , w , w , w , w ), where w = e πi/ (for anarbitrary L it will be e πi/L ).Having defined Q fixed L we are now ready to constructeach element of the “hard” distribution D Q,L :Pr D Q,L [ y ] = | Q ( Z y ) | L N n ! . (5)A quantum circuit which performs sampling is remark-ably easy. It amounts to applying the quantum Fouriertransform to a uniform superposition which was trans-formed by h and measuring in the standard basis (seeTheorem 4 of Section 4 of [28]).Classical sampling of distributions based on the aboveefficiently specifiable polynomial is believed to be hardin particular because it contains the permanent problem.Thus, the existence of an efficient classical sampler wouldimply a collapse of the Polynomial Hierarchy to the thirdlevel (see Section 5 and 6 of [28] for detailed proof). Long-range quantum Hamiltonians
The long-range Hamiltonian we consider generates the evolution

$$|\Psi(t)\rangle = e^{-iHt}\,|\Psi(t=0)\rangle, \qquad (6)$$

where $H$ is a spin Hamiltonian with long-range interactions. The resulting states are highly entangled, and are, for example, challenging for MPS-based tomography [36]. To assess the ability of VAEs to compress highly entangled states, we focus on the task of reconstructing the outcomes of experimental measurements in the computational basis. In particular, we generate samples distributed according to the probability density $|\Psi_i(t)|^2$, and reconstruct this distribution with our generative, deep models.

Acknowledgements. We thank Carlo Ciliberto, Danial Dervovic, Alessandro Davide Ialongo, Joshua Lockhart, and Gillian Marshall for helpful comments and discussions. Andrea Rocchetto is supported by an EPSRC DTP Scholarship and by QinetiQ. Edward Grant is supported by EPSRC [EP/P510270/1]. Giuseppe Carleo is supported by the European Research Council through the ERC Advanced Grant SIMCOFE, and by the Swiss National Science Foundation through NCCR QSIT. Sergii Strelchuk is supported by a Leverhulme Trust Early Career Fellowship. Simone Severini is supported by The Royal Society, EPSRC and the National Natural Science Foundation of China.

Contributions. The concept of using VAEs to encode probability distributions of quantum states was conceived by A.R., E.G., and G.C. The complexity framework was developed by A.R., G.C., and S.St. E.G. wrote the code and performed the simulations with help from S.St. The project was supervised by A.R. and S.Se. The first draft of the manuscript was prepared by A.R. and all authors contributed to the writing of the final version. A.R. and E.G. contributed equally to this work.

Competing Interests. The authors declare no competing financial interests.

Data availability statements. All data needed to evaluate the conclusions are available from the corresponding author upon reasonable request.

∗ [email protected]
† [email protected]

[1] Nightingale, M. P. & Umrigar, C. J.
Quantum Monte Carlo methods in physics and chemistry. 525 (Springer Science & Business Media, 1998).
[2] Gubernatis, J., Kawashima, N. & Werner, P. Quantum Monte Carlo Methods (Cambridge University Press, 2016).
[3] Suzuki, M. Quantum Monte Carlo methods in condensed matter physics (World Scientific, 1993).
[4] Verstraete, F., Murg, V. & Cirac, J. I. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 143–224 (2008).
[5] Orús, R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 117–158 (2014).
[6] Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science, 602–606 (2017).
[7] Deng, D.-L., Li, X. & Das Sarma, S. Quantum Entanglement in Neural Network States. Physical Review X, 021021 (2017).
[8] Nomura, Y., Darmawan, A., Yamaji, Y. & Imada, M. Restricted-Boltzmann-Machine Learning for Solving Strongly Correlated Quantum Systems. arXiv:1709.06475 (2017).
[9] Deng, D.-L., Li, X. & Sarma, S. D. Exact Machine Learning Topological States. arXiv:1609.09060 (2016).
[10] Glasser, I., Pancotti, N., August, M., Rodriguez, I. D. & Cirac, J. I. Neural Networks Quantum States, String-Bond States and chiral topological states. arXiv:1710.04045 (2017).
[11] Kaubruegger, R., Pastori, L. & Budich, J. C. Chiral Topological Phases from Artificial Neural Networks. arXiv:1710.04713 (2017).
[12] Torlai, G. et al. Many-body quantum state tomography with neural networks. arXiv:1703.05334 (2017).
[13] Perez-Garcia, D., Verstraete, F., Wolf, M. M. & Cirac, J. I. Matrix product state representations. arXiv:quant-ph/0608197 (2006).
[14] Gao, X. & Duan, L.-M. Efficient representation of quantum many-body states with deep neural networks. Nature Communications, 662 (2017).
[15] Chen, J., Cheng, S., Xie, H., Wang, L. & Xiang, T.
On the Equivalence of Restricted Boltzmann Machines and Tensor Network States. arXiv:1701.04831 (2017).
[16] Huang, Y. & Moore, J. E. Neural network representation of tensor network and chiral states. arXiv:1701.06246 (2017).
[17] Clark, S. R. Unifying Neural-network Quantum States and Correlator Product States via Tensor Networks. arXiv:1710.03545 (2017).
[18] Mhaskar, H., Liao, Q. & Poggio, T. Learning functions: When is deep better than shallow. arXiv:1603.00988 (2016).
[19] Telgarsky, M. Benefits of depth in neural networks. arXiv:1602.04485 (2016).
[20] Eldan, R. & Shamir, O. The power of depth for feedforward neural networks. In Conference on Learning Theory, 907–940 (2016).
[21] Kingma, D. P. & Welling, M. Auto-encoding variational bayes. arXiv:1312.6114 (2013).
[22] Boixo, S. et al. Characterizing quantum supremacy in near-term devices. arXiv:1608.00263 (2016).
[23] Fuchs, C. A. & Caves, C. M. Ensemble-dependent bounds for accessible information in quantum mechanics. Physical Review Letters, 3047 (1994).
[24] Fefferman, W. J. The power of quantum Fourier sampling. Ph.D. thesis, California Institute of Technology (2014).
[25] Aaronson, S. & Arkhipov, A. The computational complexity of linear optics. In Proceedings of the forty-third annual ACM symposium on Theory of computing, 333–342 (ACM, 2011).
[26] Morningstar, A. & Melko, R. G. Deep learning the Ising model near criticality. arXiv:1708.04622 (2017).
[27] Levine, Y., Yakira, D., Cohen, N. & Shashua, A. Deep learning and quantum entanglement: Fundamental connections with implications to network design. arXiv:1704.01552 (2017).
[28] Fefferman, B. & Umans, C. The power of quantum fourier sampling. arXiv:1507.05592 (2015).
[29] Richerme, P. et al. Non-local propagation of correlations in quantum systems with long-range interactions. Nature, 198–201 (2014).
[30] Harrow, A. W. & Montanaro, A.
Quantum computational supremacy. Nature, 203–209 (2017).
[31] Jozsa, R. & Strelchuk, S. Efficient classical verification of quantum computations. arXiv:1705.02817 (2017).
[32] Kingma, D. & Ba, J. Adam: A method for stochastic optimization. arXiv:1412.6980 (2014).
[33] Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, vol. 30 (2013).
[34] Sønderby, C. K., Raiko, T., Maaløe, L., Sønderby, S. K. & Winther, O. Ladder variational autoencoders. In Advances in Neural Information Processing Systems, 3738–3746 (2016).
[35] Huang, S.-C. & Huang, Y.-F. Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Transactions on Neural Networks, 47–55 (1991).
[36] Cramer, M. et al. Efficient quantum state tomography. Nature Communications 1