# Morphology of three-body quantum states from machine learning

David Huber, Oleksandr V. Marchukov, Hans-Werner Hammer, Artem G. Volosniev


David Huber

Technische Universität Darmstadt, Department of Physics, Institut für Kernphysik, 64289 Darmstadt, Germany

Oleksandr V. Marchukov

Technische Universität Darmstadt, Institut für Angewandte Physik, Hochschulstraße 4a, 64289 Darmstadt, Germany

Hans-Werner Hammer

Technische Universität Darmstadt, Department of Physics, Institut für Kernphysik, 64289 Darmstadt, Germany and ExtreMe Matter Institute EMMI, GSI Helmholtzzentrum für Schwerionenforschung, 64291 Darmstadt, Germany

Artem G. Volosniev

Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria

The relative motion of three impenetrable particles on a ring, in our case two identical fermions and one impurity, is isomorphic to a triangular quantum billiard. Depending on the ratio $\kappa$ of the impurity and fermion masses, the billiards can be integrable or non-integrable (also referred to in the main text as chaotic). To set the stage, we first investigate the energy level distributions of the billiards as a function of $1/\kappa \in [0, 1]$ and find no evidence of integrable cases beyond the limiting values $1/\kappa = 1$ and $1/\kappa = 0$. Then, we use machine learning tools to analyze properties of probability distributions of individual quantum states. We find that convolutional neural networks can correctly classify integrable and non-integrable states. The decisive features of the wave functions are the normalization and a large number of zero elements, corresponding to the existence of a nodal line. The network achieves typical accuracies of 97%, suggesting that machine learning tools can be used to analyze and classify the morphology of probability densities obtained in theory and experiment.

I. INTRODUCTION

The correspondence principle conjectures that highly excited states of a quantum system carry information about the classical limit. In particular, it implies that there must be means to tell the difference between a 'typical' high-energy quantum state that corresponds to an integrable classical system and a 'typical' high-energy state that corresponds to a chaotic system. A discovery of such means is a complicated task that requires a coherent effort of physicists, mathematicians, and philosophers [1–3]. Currently, there are two main approaches to study chaotic features in quantum mechanics. One approach relies on the statistical analysis of the energy levels of a quantum-mechanical system. Another focuses on the morphology of wave functions. These approaches led to a few celebrated conjectures that postulate features of energy spectra and properties of eigenstates [4–6]. The postulates are widely accepted now, thanks to numerical as well as experimental data [7, 8].

Numerical and experimental data sets produced to confirm the proposed conjectures are so large that it is difficult, if not hopeless, for the human eye to find universal patterns beyond what has been conjectured. Therefore, it is logical to look for computational tools that can learn (with or without supervision) universal patterns from large datasets. One such tool is deep learning (DL) [9], which is a machine learning method that uses artificial neural networks with multiple layers for a progressive learning of features from the input data. It requires very little engineering by hand, and can easily be used to analyze big data across disciplines, in particular in physics [10]. DL tools present an opportunity to go beyond the standard approaches of quantum chaologists [11]. For example, in this paper, neural networks built upon many states are used to analyze the morphology of individual wave functions.
Therefore, DL provides us with means to connect and extend tools already used to understand 'chaos' in quantum mechanics.

Recent work [12] has already opened an exciting possibility to study the quantum-classical correspondence in integrable and chaotic systems using DL. In particular, it has been suggested that a neural network (NN) can learn the difference between wave functions that correspond to integrable and chaotic systems. It is important to pursue this research direction further, and to understand and interpret how a NN distinguishes the two situations. This information can be used in the future to formulate new conjectures on the role of classical chaos in quantum mechanics. The main challenge here is the extraction of this information from a NN, which often resembles a black box. Ongoing research on interpretability of NNs suggests certain routes to understand the black box [13, 14] (see also recent works that discuss this question for applications in physical sciences [15–17]). However, there is no standard approach to this problem. In part, this is connected to the fact that DL relies on general-purpose learning procedures; therefore, one does not expect that there can be a unique way to analyze a neural network at hand. For example, as we will see, the training of a network for the 'integrable' vs 'chaotic' state recognition is very similar to the classic dog-or-cat classification. It is not clear, however, that the tools that can be used to interpret the latter (e.g., based on stable spatial relationships [18]) are also useful for the former. In particular, a training set for the 'integrable'-or-'chaotic' problem contains information about vastly different length scales (determined by the energy), whereas a training set for cats vs dogs has only length scales given by the size of the animal.
Therefore, it is imperative to study interpretability of neural networks used in physics separately from that in other applications.

In this paper we analyze a neural network, which has been trained using highly excited states of a triangular billiard, and attempt to extract the learned features. Billiards are conceptually simple systems, yet it is expected that they contain all necessary ingredients for studying the role of chaos in quantum mechanics [7]. Furthermore, eigenstates of quantum billiards are equivalent to the eigenstates of the Helmholtz equation with the corresponding boundary conditions, which connects quantum billiards and the wave chaos in microwave resonators [7, 19]. The triangular billiard is one of the most-studied models in quantum chaology [20–24], and therefore it is well-suited for our study focused on analyzing neural networks as a possible tool for quantum chaology.

In our analysis, we rely on convolutional neural networks (ConvNets) for image classification [25], which have recently been successfully applied to categorize numerical and experimental data in physical sciences [12, 26–33]. These advances motivate us to apply ConvNets to categorize quantum states as integrable and non-integrable. Our goal can be stated as follows: given a set of highly excited states, build a network that can classify any input state as integrable or not, and, moreover, study features of this network. One comment is in order here. There are various definitions of quantum integrability [34], so we need to be more specific. In this work, we call a quantum system integrable if it is Bethe-ansatz integrable, i.e., if one can write any eigenstate as a finite superposition of plane waves. We shall also sometimes use the word chaotic instead of non-integrable. Finally, we note that the properties 'integrable' and 'non-integrable' are usually attached to a given physical system, e.g., following an analysis of global properties like the distribution of energy levels.
However, the correspondence principle implies that these labels can also be applied to individual states of a quantum system. In this paper, we use both notions and show that they are compatible. We employ neural networks to analyze the wave functions of individual quantum states.

We show that a trained network accurately classifies a state as being 'integrable' or 'non-integrable', even for an input state orthogonal to all states used for training. (The dog-or-cat classifier mentioned above is a network with one output label for a dog and one for a cat, which has been trained using a set of a few thousand pictures.) This implies that a ConvNet learns certain universal features of highly-excited states. We argue that a trained neural network considers almost any random state generated by a Gaussian, Laplace or other distribution as 'chaotic', as long as the state includes a sufficient amount of zero values. This observation agrees with our intuition that a non-integrable state has only weak correlations. We discuss the effect of noise and coarse graining in our classification scheme, which sets limitations on the applicability of neural networks to analyze experimental and numerical data.

The paper is organized as follows. In Sec. II we introduce the system at hand: a triangular quantum billiard that is isomorphic to three impenetrable particles on a ring. Its properties are discussed in Sec. III using standard methods. In Sec. IV, we present our neural network approach and use it in Sec. V to classify the states of the system. Moreover, we analyze the properties of the network. In Sec. VI, we conclude. Some technical details are presented in the appendix.

II. FORMULATION

We study billiards isomorphic to the relative motion of three impenetrable particles on a ring: two fermions and one impurity. Characteristics of these triangular billiards are presented below; see also Refs. [35, 36]. Our choice provides us with a simple parametrization of triangles in terms of the mass ratio, $\kappa = m_I/m$, where $m_I$ ($m$) is the mass of the impurity (fermions). Furthermore, it allows us to shed light on the problem of three particles on a ring with broken integrability [37–39].

For simplicity, we always assume that the impurity is heavier than (or as heavy as) the fermions, corresponding to $1/\kappa \in [0, 1]$. The limiting triangles are $(90^\circ, 45^\circ, 45^\circ)$ for $1/\kappa = 0$ and $(60^\circ, 60^\circ, 60^\circ)$ for $\kappa = 1$. These limiting triangles correspond to two identical hard-core particles in a square well and a 2+1 Gaudin-Yang model on a ring [40], respectively. Both limits are Bethe-ansatz integrable; see Refs. [21, 41] for a more detailed discussion. Note that certain extensions to the Bethe ansatz suggest that additional solvable cases exist [42, 43]. However, our numerical analysis does not find any traces of solvability beyond the two limiting cases, and supports the widely accepted idea that almost any one-dimensional problem with mass imbalance is non-integrable (notable exceptions include Refs. [44–48]). Therefore, in this work we refer to systems with $1/\kappa = 0$ and 1 as integrable, in the sense that they can be analytically solved using the Bethe ansatz (cf. Ref. [34]). Systems with other mass ratios are called non-integrable below.

A. Hamiltonian

The Hamiltonian of a three-particle system with zero-range interactions reads as
$$H = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x_1^2} - \frac{\hbar^2}{2m}\frac{\partial^2}{\partial x_2^2} - \frac{\hbar^2}{2\kappa m}\frac{\partial^2}{\partial y^2} + g\sum_{i}\delta(x_i - y). \quad (1)$$
Everywhere below we focus on the limit $g \to \infty$. In Eq. (1), $0 < x_i < L$ ($0 < y < L$) is the coordinate of the $i$th fermion (impurity), while $L$ is the length of the ring, see Fig. 1 (a). The eigenstates ($\phi$) of $H$ are periodic functions in each variable. They are antisymmetric with respect to the exchange of fermions, i.e., $\phi(x_1, x_2, y) = -\phi(x_2, x_1, y)$. Furthermore, the limit $g \to \infty$ demands that $\phi$ vanishes when a fermion approaches the impurity, i.e., $\phi(x_i \to y) \to 0$. For convenience, we use the system of units in which $\hbar = 1$ and $m = 1$ in the following. For our numerical analysis, we choose units such that $L = \pi$.

The Hamiltonian $H$ can be written as a sum of the relative and center-of-mass parts. To show this, we expand $\phi$ using a basis of non-interacting states, i.e.,
$$\phi(x_1, x_2, y) = \sum_{n_1, n_2, n_3} a^{(n_3)}_{n_1, n_2}\, e^{-\frac{2\pi i}{L}(n_1 x_1 + n_2 x_2 + n_3 y)}, \quad (2)$$
where $a^{(n_3)}_{n_1, n_2} = -a^{(n_3)}_{n_2, n_1}$ to satisfy the antisymmetry condition on the wave function. Using this expansion, it is easy to see that the Hamiltonian does not couple states with different values of $P = \frac{2\pi n_{\mathrm{tot}}}{L}$; $n_{\mathrm{tot}} = n_1 + n_2 + n_3$, which is thus an integral of motion – 'the total angular momentum'. The conserved quantity $P$ allows us to write the wave function as
$$\phi = e^{-iPy} \sum_{n_1, n_2} a^{(n_{\mathrm{tot}} - n_1 - n_2)}_{n_1, n_2}\, e^{-\frac{2\pi i}{L}\left(n_1(x_1 - y) + n_2(x_2 - y)\right)}, \quad (3)$$
and define the function, which depends only on the relative coordinates:
$$\psi_P(z_1, z_2) = e^{iPy}\, \phi(x_1, x_2, y), \quad (4)$$
where $z_i = L\,\theta(y - x_i) + x_i - y$, with the Heaviside step function: $\theta(x > 0) = 1$, $\theta(x < 0) = 0$. The coordinates $z_i$ are chosen such that the function $\psi_P(z_1, z_2)$ takes values on $z_i \in [0, L]$, see Fig. 1 (b).

The function $\psi_P$ is an eigenstate of the Hamiltonian
$$H_P = -\sum_{i=1}^{2} \frac{\partial^2}{\partial z_i^2} - \frac{1}{\kappa}\left(\sum_{i=1}^{2} \frac{\partial}{\partial z_i}\right)^2 + \frac{2iP}{\kappa}\sum_{i=1}^{2} \frac{\partial}{\partial z_i}, \quad (5)$$
which will be the cornerstone of our analysis. As we show below, it is enough to consider only $H_{P=0}$ for our purposes. To diagonalize $H_P$, we resort to exact diagonalization in a suitable basis. As a basis element, we use the real functions $\sin\left(\frac{n_1\pi z_1}{L}\right)\sin\left(\frac{n_2\pi z_2}{L}\right) - \sin\left(\frac{n_2\pi z_1}{L}\right)\sin\left(\frac{n_1\pi z_2}{L}\right)$, where $n_1$ and $n_2$ are integers with $n_{\max} > n_1 > n_2 > 0$, which is a standard choice for this type of problem, see, e.g., [49, 50]. This choice ensures that $\psi$ is real in our work for the ground and excited states. The parameter $n_{\max}$ defines the maximum element beyond which the basis is truncated. Note that the basis element is the eigenstate of the system for $1/\kappa = 0$. Therefore, we expect exact diagonalization to perform best for large values of $\kappa$ and more poorly for $\kappa = 1$. To estimate the accuracy of our results, we benchmark against the exact solution for an equilateral triangle ($\kappa = 1$), see the discussion in the Appendix. Using $n_{\max} = 130$, we calculate about 4000 states whose energies have a relative accuracy of the order of $10^{-}$. This set of 4000 states is an input for our analysis in the next section.

FIG. 1. The figure illustrates the system of interest and the correspondence between three particles on a ring and a triangular billiard. Panel (a): Three particles on a ring. Two fermions have coordinates $x_1$ and $x_2$; the coordinate of the impurity is $y$, see Eq. (1). Panel (b): The coordinates $z_1$ and $z_2$ describe the relative motion of the three particles. Panel (c): The coordinates $z$ and $Z$, which are obtained after rotation of $z_1$ and $z_2$, see Sec. II B. Panel (d): The triangular billiard is obtained upon rescaling the coordinates $z$ and $Z$. To illustrate the transformation from (a) to (d), we sketch the (real-valued) ground-state wave function for $\kappa = 1$ in panels (b)–(d). The blue (red) color denotes negative (positive) values of $\psi$ (see Eq. (4)), which is chosen to be real in our analysis. The intensity matches the absolute value.

To summarize this subsection: we perform the transformation from $H, \phi$ to $H_P, \psi_P$ to eliminate the coordinate of the impurity from the consideration. Our procedure can be considered as the Lee-Low-Pines transformation [51] in coordinate space, which is a known tool for studying many-body systems with impurities on a ring [52–55]. Below we argue that $H_P$ can be further mapped onto a triangular billiard. Note, however, that we are going to work with $H_P$ everywhere. Its eigenfunctions are defined on a square (see Fig. 1 (b)), allowing us to use them directly as an input for ConvNets.
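Since the basis elements above are themselves eigenstates at $1/\kappa = 0$, the spectrum in that limit can be written down directly: in the units of the text ($L = \pi$), each basis function has energy $n_1^2 + n_2^2$. The few lines below (our own illustration) enumerate these levels and show the number-theoretic degeneracies that play a role in the level statistics discussed later.

```python
# Spectrum at 1/kappa = 0 with L = pi: E = n1^2 + n2^2, integers n1 > n2 > 0.
levels = sorted(n1**2 + n2**2 for n1 in range(2, 40) for n2 in range(1, n1))

print(levels[:6])        # lowest levels: [5, 10, 13, 17, 20, 25]
print(levels.count(65))  # 2: degeneracy, since 65 = 8^2 + 1^2 = 7^2 + 4^2
```

Such exact degeneracies are special to the integrable limits; as discussed in Sec. III, they disappear as soon as the mass ratio is detuned.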

It is known that three particles on a ring can be mapped onto a triangular billiard [35, 36]. Here we show this mapping starting with $H_P$. First of all, we rotate the system of coordinates to eliminate the mixed derivative $\frac{\partial}{\partial z_1}\frac{\partial}{\partial z_2}$; see Fig. 1 (c). To this end, we introduce the coordinates $z = (z_1 - z_2)/\sqrt{2}$ and $Z = (z_1 + z_2)/\sqrt{2}$, in which the Hamiltonian reads as
$$H_P(z, Z) = -\frac{\partial^2}{\partial z^2} - \frac{\partial^2}{\partial Z^2} - \frac{2}{\kappa}\frac{\partial^2}{\partial Z^2} + i\frac{2\sqrt{2}P}{\kappa}\frac{\partial}{\partial Z}. \quad (6)$$
The last term here can be eliminated by a gauge transformation $\psi_P \to \exp\left(i\frac{\sqrt{2}P}{\kappa+2}Z\right)\psi_P$. Therefore, in what follows we only consider $P = 0$ without loss of generality. We shall omit the subscript, i.e., we write $\psi$. Note that it is enough to study only $z \geq 0$, because of the symmetry of the problem.

To derive the standard Hamiltonian for quantum billiards:
$$h = -\frac{\partial^2}{\partial \tilde{z}^2} - \frac{\partial^2}{\partial \tilde{Z}^2}, \quad (7)$$
we rescale and shift the coordinates as $\tilde{z} = z$ and $\tilde{Z} = \sqrt{\kappa/(\kappa+2)}\,(Z - L/\sqrt{2})$. The Hamiltonian $h$ is defined on an isosceles triangle with the base angle obtained from $\tan(\alpha) = \sqrt{(\kappa+2)/\kappa}$. For systems with more particles, the corresponding transformations $H \to H_P \to h$ lead to quantum billiards in polytopes, allowing one to connect an $N$-body quantum mechanical problem to a quantum billiard in $N-1$ dimensions. Note that if we had used a hard-core potential of a finite radius $R$ instead of the delta function, then the considerations above would also lead to a mapping of the system onto a triangle. (See Ref. [56] for an illustration with an equilateral triangle.)
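The billiard eigenproblem of Eq. (7) can be cross-checked numerically without the basis expansion used in the paper. The sketch below (our own illustration, not the authors' method) diagonalizes the Dirichlet Laplacian with a five-point finite-difference stencil in the limit $1/\kappa = 0$, where $\tan(\alpha) = 1$ and the billiard is a right isosceles triangle (legs rescaled to unit length), whose exact spectrum $\pi^2(m^2 + n^2)$ with $m > n \geq 1$ is known. Function names and grid size are ours.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import eigsh

def triangle_levels(n=60, k=4):
    """Lowest k Dirichlet eigenvalues of h = -d^2/dx^2 - d^2/dy^2 on the
    right isosceles triangle {x > 0, y > 0, x + y < 1} (the 1/kappa = 0
    billiard with unit legs), via finite differences on an n x n grid."""
    h = 1.0 / n
    # Grid points strictly inside the triangle; boundary values are zero.
    pts = [(i, j) for i in range(1, n) for j in range(1, n) if i + j < n]
    idx = {p: a for a, p in enumerate(pts)}
    A = lil_matrix((len(pts), len(pts)))
    for (i, j), a in idx.items():
        A[a, a] = 4.0 / h**2
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if nb in idx:  # neighbors outside the domain carry psi = 0
                A[a, idx[nb]] = -1.0 / h**2
    vals, _ = eigsh(A.tocsc(), k=k, sigma=0)  # eigenvalues nearest zero
    return np.sort(vals)

levels = triangle_levels()
print(levels / np.pi**2)  # exact values: m^2 + n^2 = 5, 10, 13, 17
```

On a $60\times 60$ grid the lowest levels agree with the exact spectrum to well below a percent, which illustrates why the billiard picture is a convenient benchmark for the exact diagonalization of Sec. II A.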

A discussion of highly excited states of triangular billiards can be found in the literature [20–24]. However, we find it necessary to review some known results and calculate some new quantities in order to explain our current understanding of the difference between integrable and non-integrable states. In principle, highly excited states of a quantum system can be simulated using microwave resonators (see, e.g., [57, 58]), or generated by means of Floquet engineering – by choosing the driving frequency to match the energy difference between the initial and the desired final state (see, e.g., Ref. [59]). Therefore the results of this section are not of purely theoretical interest, as they can be observed in a laboratory.

FIG. 2. The spectrum of the Hamiltonian $H_P$ from Eq. (6) as a function of the mass ratio, $\kappa$. The figure presents the first 30 states: 15 states with $p = 1$ (red, solid curves) and 15 states with $p = -1$. The bottom panel shows the energies rescaled by $\rho^{(\kappa)}$ from Eq. (8).

As we outlined in the introduction, there are two main approaches for analyzing a connection between highly-excited states and classical integrability. The first one relies on statistical properties of the energy spectra, while the second one focuses on the morphology of individual quantum states. This section sets the stage for our further study by discussing these approaches in more detail.

A. Energy

We start by calculating the energy spectrum. It provides a basic understanding of the evolution from an 'integrable' to a 'chaotic' system in our work as a function of $\kappa$. We present the first 30 states of $H_P$ in Fig. 2 (top). Note that an isosceles triangle has a symmetry axis ($\tilde{Z} \to -\tilde{Z}$), which corresponds to a mirror transformation (in the particle picture this symmetry corresponds to $z_i \to L - z_i$). The wave function can be symmetric or antisymmetric with respect to the mirror transformation, and we consider these cases separately. The former states are denoted as having $p = 1$, and the latter have $p = -1$.

To compare spectra for different mass ratios, we introduce the density of states $\rho^{(\kappa)}(E) = \mathrm{d}N/\mathrm{d}E$, where $N(E)$ is the number of states with energy less than $E$. The function $\rho^{(\kappa)}(E)$ can be easily calculated using Weyl's law [62] for the triangular billiard described by the Hamiltonian $h$:
$$\rho^{(\kappa)}(E \to \infty) \to \frac{L^2}{8\pi}\sqrt{\frac{\kappa}{\kappa+2}}. \quad (8)$$
The density of states is independent of the energy in this equation because we work with a two-dimensional object. Equation (8) is derived assuming large values of $E$; however, in practice, it also describes well the density of states in a lower part of the spectrum (cf. Ref. [22]). If we multiply the energies presented in Fig. 2 (top) by $\rho^{(\kappa)}$, then we obtain a spectrum without inflation, i.e., all levels are equally spaced on average, see Fig. 2 (bottom). Multiplication of $E$ by $\rho^{(\kappa)}$ is a simple example of unfolding, which allows us to directly compare features of the energy spectrum for different values of $\kappa$.

FIG. 3. The histogram shows the nearest-neighbor distribution, $P(s)$, as a function of $s$ for different values of the mass ratio, $\kappa$. The (red) solid curve shows the Wigner distribution from Eq. (9). The (black) dashed curve shows the Poisson distribution, $e^{-s}$. To produce these figures, only states with $p = 1$ are used.
The goal of the unfolding is to extract the 'average' properties of the level distribution and, thus, diminish the effect of local level density fluctuations in the spectrum. While there are many possible ways to implement the unfolding procedure, which depend on the properties of the energy spectrum (for further information see, e.g., Refs. [8, 63, 64]), the ultimate goal is to obtain rescaled levels with unit mean spacing. Below, we rescale all of the energy levels by the mean distance between them, thus obtaining the unit mean spacing. We benchmarked results of this unfolding against more complicated approaches, and found qualitatively equivalent outcomes.

We use unfolded spectra to analyze the distribution of nearest neighbors, $P(s)$, which shows the probability that the distance between a random pair of two neighboring energy levels is $s$. The function $P(s)$ is presented in Fig. 3, see also [22, 23, 41], where some limiting cases are analyzed. For the sake of discussion, we only study the states with $p = 1$; however, we have checked that the case with $p = -1$ behaves similarly. Figure 3 shows the evolution of $P(s)$ from $\kappa = 1$ to larger values. The degeneracies in the energy spectrum for $\kappa = 1$ and $1/\kappa = 0$ lead to well spaced bins in the figure. This behavior is however rather unique, and it is immediately broken for other mass ratios. For example, already for $\kappa = 1.2$, $P(s)$ can be approximated by the Wigner distribution [7]
$$P_{\mathrm{GOE}}(s) = \frac{\pi s}{2}\, e^{-\frac{\pi s^2}{4}}. \quad (9)$$
Note that it is important to use only one value of $p$ for this conclusion. Levels that correspond to different values of $p$ do not repel each other, and the Wigner distribution cannot be realized [22].

FIG. 4. The minimal distance between energy levels as a function of the number of considered levels. The data points are given as (red) crosses. The (black) solid curve is added to guide the eye. The left panels show $\delta_{\min}$ for $\kappa = 5$ and $p = \pm 1$. The right panels display $\delta_{\min}$ averaged over different mass ratios $\kappa$ (see the text for details). The (green) dashed curves in the right panels show the best fit to the asymptotic $1/\sqrt{N}$ behavior, which is expected for random matrices.

It is impossible to analyze every value of $\kappa$. However, we can also say something on average about our system. To that end, we calculate the dependence of the minimal distance between levels on the number of considered levels, $\delta_{\min}(N) = \inf\{E_n - E_{n-1}\}_{n \leq N}$. For random matrices, the probability to observe a spacing smaller than $\delta_{\min}$ follows from Eq. (9); for $\delta_{\min} \to 0$, this expression can be approximated by $\pi\delta_{\min}^2/4$. If we consider $N$ lowest states, then the probability that all nearest neighbors are separated by $\delta > \delta_{\min}$ is given by $(1 - \pi\delta_{\min}^2/4)^N$. To keep this probability independent of $N$, the parameter $\delta_{\min}$ must be proportional to $1/\sqrt{N}$.

We show $\delta_{\min}$ for our system for $\kappa = 5$ in Fig. 4 (left panels). We see that for a given value of $\kappa$ it is impossible to verify the $1/\sqrt{N}$ scaling, at least for the considered amount of eigenstates. However, the randomness present in a mass-imbalanced system can be recovered. To show this, we average $\delta_{\min}$ over different masses, i.e., $\delta_{\min}^{\mathrm{average}} = \frac{1}{M}\sum_i \delta_{\min}(\kappa_i)$, where $M$ determines how many values of $\kappa$ appear in the sum. To produce Fig. 4 (right panels), we sum over the following values of the mass ratios: $\kappa = 1.1, 1.2, \ldots, 5$. The parameter $\delta_{\min}^{\mathrm{average}}$ has approximately $1/\sqrt{N}$ behavior at large values of $N$, which confirms our expectation that systems with $1/\kappa \in (0, 1)$ are not integrable.
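The unfolding-plus-spacing analysis described above is straightforward to prototype. The sketch below (our own illustration, not the paper's code) rescales a spectrum to unit mean spacing and extracts nearest-neighbor spacings; applied to the bulk of a random matrix from the Gaussian orthogonal ensemble, it reproduces the level repulsion encoded in Eq. (9), i.e., the near-absence of small spacings that a Poisson (integrable-like) sequence would show.

```python
import numpy as np

def unfolded_spacings(levels):
    """Nearest-neighbor spacings rescaled to unit mean, <s> = 1."""
    e = np.sort(np.asarray(levels))
    s = np.diff(e)
    return s / s.mean()

# Sample spectrum: a random GOE matrix, (A + A^T)/2 with Gaussian entries.
rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=(n, n))
eigvals = np.linalg.eigvalsh((a + a.T) / 2)

# Use only the bulk of the spectrum, where the level density is roughly flat,
# so that a global mean-spacing unfolding is adequate.
s = unfolded_spacings(eigvals[n // 4 : 3 * n // 4])

# Level repulsion: the fraction of spacings near s = 0 is strongly suppressed
# compared to the Poisson value 1 - exp(-0.1) ~ 0.095.
print(s.mean(), (s < 0.1).mean())
```

The same function applied to the billiard spectra of the main text distinguishes the Wigner-like statistics at intermediate mass ratios from the clustered spectra of the integrable limits.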

B. Wave function

The analysis above shows a drastic change of properties of the system when moving from integrable to non-integrable regimes. Information about this transition is extracted by analyzing the energy levels as in Fig. 3, although the correspondence principle conjectures properties at the level of individual wave functions. The wave function of a highly excited state contains too much information for the human eye, and one has to rely on a few existing conjectures that allow one to connect classical chaos to quantum states. For example, the chaotic states are expected to be similar to a random superposition of plane waves [5], since the underlying classical phase space has no structure, i.e., the classical motion is not associated with motion on a two-dimensional torus. This expectation applies to a typical random state (not to atypical, e.g., scarred states [67]). In contrast, the wave functions of integrable states are expected to have some non-trivial morphology, since the classical phase space of integrable systems has some structure. Below, we illustrate these ideas for our problem. We focus on a distribution of wave-function amplitudes, although other signatures of 'chaos' in eigenstates connected to local currents and nodal lines [68–70] will also be important when we analyze our neural network.

A celebrated result of the random-wave conjecture is a Gaussian distribution of wave-function amplitudes, see examples in Refs. [71–73]:
$$P(\psi) = \frac{1}{\sqrt{2\pi v}}\, e^{-\frac{\psi^2}{2v}}, \quad (10)$$
where the variance $v = 1/L^2$ fixes the normalization of the wave function. (The value of $v$ is calculated by using the average value of $\psi^2$, i.e., $\langle\psi^2\rangle = \int x^2 P(x)\,\mathrm{d}x$, in the normalization condition, i.e., $\int \psi^2\,\mathrm{d}z_1\mathrm{d}z_2 = \langle\psi^2\rangle \int \mathrm{d}z_1\mathrm{d}z_2 = 1$.)

FIG. 5. The distributions of values of $|\psi|$ for different mass ratios, $\kappa = m_I/m$. The histogram describes the probability that a pixel has a value of $|\psi|$ in a specific interval. For the main plots, we use a $315\times 315$ pixel raster image of $|\psi|$. The insets show $P(|\psi|)$ for the state represented by $33\times 33$ pixels.

We present our numerical calculations of $P(\psi)$ in Fig. 5. For this figure, we discretize the wave function for the 500th state using either a $315\times 315$ pixel grid or a $33\times 33$ pixel grid, and assign to each unit of the grid a value that corresponds to $\psi$ in the center of the unit. The distribution of these central values for a given value of $\kappa$ is presented as a histogram in Fig. 5. For $\kappa = 1$, $P(\psi)$ resembles an exponential function (cf. Ref. [73]). For larger values of $\kappa$, a Gaussian profile develops. The distinction between the histogram and Eq. (10) is clear for $\kappa = 1$. For $1/\kappa = 0$ the difference is less evident. Note that the peak at $\psi = 0$ is enhanced in comparison to the prediction of Eq. (10) for all values of $\kappa$. This is due to the evanescence of the wave function at the boundaries, which is a finite-size effect beyond Eq. (10). Finally, the characteristics of the states are also visible in low-resolution images, see the insets of Fig. 5. This feature will be used in the design of our neural network discussed below.

IV. NEURAL NETWORK

To construct a neural network that can distinguish integrable states from non-integrable ones, we need to

A. prepare a data set for training the network,

B. choose a suitable architecture and a training algorithm.

In this section, we discuss these two items in detail.

A. A data set

As data set we use the set $A$ made of two-dimensional images that represent highly excited states. We can use images of (real) wave functions, $\psi(z_1, z_2)$, or probability densities, $|\psi(z_1, z_2)|^2$. We have checked that these two representations lead to similar results. In the paper, we present only our findings for $|\psi(z_1, z_2)|^2$. To produce $A$, we diagonalize the Hamiltonian $H_{P=0}$ of Eq. (5) for $\kappa = 1, 2, 5$, and $1/\kappa = 0$. Each image has a label – integrable (for $\kappa = 1$ and $1/\kappa = 0$) or non-integrable (for $\kappa = 2$ and 5). We do not include information about the mirror symmetry, i.e., states with different values of $p$ are treated on the same footing, since we do not expect that this information is relevant for a coarse-grained (see below) image of $|\psi(z_1, z_2)|^2$. This allows us to work with twice as large datasets compared to Fig. 3. Each mass ratio contributes 1000 states to $A$, which therefore contains 4000 images in total. It is reasonable to not use data sets that contain states with very different energies: very different energies lead to very different length scales, and hence different information content that should be learned. We choose to include all states from the 50th to 1050th excited states. Not much should be deducible about the low-lying states (with $N \sim 10$) from the correspondence principle; therefore, we do not use them in our study. (To avoid any bias towards non-integrable states, we use non-integrable states for only two values of $\kappa$. However, we have checked that our conclusions also hold true if we include other values of $\kappa$ in the data set, in particular, if we add 1000 states to $A$ from a system with $\kappa = 15$.)

A wave function $\psi(z_1, z_2)$ is a continuous function of the variables $z_1$ and $z_2$, see Fig. 1 (b). To use it as an input for a network, we need to discretize and coarse-grain it. To this end, we represent $\psi$ as a $64\times 64$ pixel image, and as the value of the pixel we use the value of the wave function at the center of the pixel. (The color depth of a pixel, i.e., how many colors are available, is effectively given by the numerical precision used to produce the input data. If experimental data is used as an input, then its accuracy will determine the color depth of a pixel.) The resolution is important for this discretization. Low resolution might not be able to capture oscillations present in highly excited states, leading to a loss of important physical information. For example, approximately the $N$th state in the spectrum for $1/\kappa = 0$ will have about $\sqrt{N}$ oscillations in each direction, and it is therefore important to use a $2\sqrt{N}\times 2\sqrt{N}$ representation of the wave function (similar to the Nyquist–Shannon sampling theorem). For a lower resolution, the oscillations are not faithfully reproduced in the low-resolution image and spatial aliasing occurs. We illustrate this using the $33\times 33$ resolution in Fig. 6 for an integrable state that is susceptible to spatial aliasing.

FIG. 6. Schematic representations of a probability distribution of a state for $1/\kappa = 0$ susceptible to aliasing. The left image shows a high-resolution representation ($320\times 320$ pixel image). This representation contains too many pixels for our purposes, and can be optimized. The right images present two low-resolution representations, which we use to train the network. The representation with $33\times 33$ pixels does not contain enough information and spatial aliasing occurs. The representation with $64\times 64$ pixels contains all relevant information, and is used for the analysis in Sec. V.

Note that out of curiosity, we have also used images with $33\times 33$ pixel resolution to train our network. The network could reach relatively high accuracy (higher than 90%). However, not all integrable states were detected properly. For example, the one in Fig. 6 was classified as non-integrable by the network. In general, spatial aliasing is more damaging for integrable states, which have symmetries that should be respected; non-integrable states are more random, and some noise does not change the classification of the network. Everywhere below we use the $64\times 64$ pixel representation, which gives a sufficiently accurate representation of the state, so that we do not need to worry that the network learns unphysical properties. Note that certain features (e.g., $\psi(z_1 = z_2) = 0$) of the wave function may disappear at this resolution. The overall high accuracy of our network suggests that such features are not important for our analysis.

Layer dimensions from Fig. 7: Inputs $1@64\times 64$ $\to$ Convolution ($3\times 3$ kernel) $\to$ Feature maps $24@62\times 62$ $\to$ Max-pooling ($2\times 2$ kernel) $\to$ $24@31\times 31$ $\to$ Convolution ($3\times 3$ kernel) $\to$ $36@29\times 29$ $\to$ Max-pooling ($2\times 2$ kernel) $\to$ $36@14\times 14$ $\to$ Flatten $\to$ 7056 hidden units $\to$ Fully connected $\to$ 128 hidden units $\to$ Outputs: 2.

FIG. 7. An illustration of the ConvNet used in our analysis. An input layer, a 64 × 64 image, is followed by a sequence of layers: a convolutional layer, a pooling layer, a convolutional layer, a pooling layer, and two fully connected layers. The last layer is used to produce an output layer, which is made out of two neurons.

The set A seems somewhat small. For example, the well-known Asirra dataset [74] for the cat-dog classification contains 25000 images that are commonly used to test and compare different image recognition techniques. However, we will see that A is large enough to train a network that can accurately classify integrable and non-integrable states. The dataset A is further divided into two parts. We randomly draw 85% of all states and use them as a training set. The remaining 15% is used for testing. We fix the random seed used for drawing to avoid discrepancies between different realizations of the network. It is worth noting that in image-recognition applications, the dataset A may be divided into more than two parts. For example, in addition to the training set and the testing set, one can introduce a validation (or development) set [75], which is used to fine-tune parameters of the model. We do not use this additional set here. The focus of this work is on understanding the features of our general image classifier, not on improving its accuracy.

B. Architecture
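The seeded 85/15 split described above can be sketched as follows (our sketch, not the authors' code; the size of A, 4000 states, is taken from the numbers quoted in Sec. V):

```python
import numpy as np

n_states = 4000                        # size of the set A (cf. Sec. V)
rng = np.random.default_rng(seed=0)    # fixed seed makes the split reproducible

indices = rng.permutation(n_states)
n_train = 85 * n_states // 100         # 85% for training (integer arithmetic)

train_idx = indices[:n_train]
test_idx = indices[n_train:]           # remaining 15% for testing

assert len(train_idx) == 3400
assert len(test_idx) == 600            # matches the '600 used for testing' in Sec. V
```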

The neural network in our problem is a map that acts in the space X made of all 64 × 64 pixel representations of |ψ|². By analogy to the standard dog-vs-cat classifier, the output of the network is a vector with two elements, b = (b₁, b₂). Its first element, 0 ≤ b₁ ≤ 1, is the probability that the input state is integrable; the second element, b₂ = 1 − b₁, is the probability that the input state is non-integrable. An input state is classified as integrable (non-integrable) if b₁ > b₂ (b₁ < b₂). Mathematically, the network is a map f, which acts on an element a of X as

f(a; θ, θ_hyp) = b.    (11)

The map f is determined by the set of parameters θ, which are optimized by training. Since our problem is similar to image recognition (in particular, dog-vs-cat classification) [9], which is one of the standard applications of machine learning, we can use the already known training routines (SGD, ADAM, Adadelta, ...) for optimizing θ. The outcome for the parameters θ may vary between different trainings, and we use this variability to check the universality of our results. Specifications of f that are not trained but specified by the user are called hyperparameters (θ_hyp). Examples include the loss function, the optimization algorithm, the learning rate, the network architecture, the size of the batches that the data is split into for training (batch size), and the length of training (epochs). We find hyperparameters by trial and error.

The simplest form of a network is called a dense network, in which all input neurons are connected to all output neurons. However, in most cases of image detection, this architecture does not lead to accurate results. This also happens in our case. Instead, we resort to a standard architecture based on ConvNets for image recognition, see Fig. 7. Our network consists of two convolutional layers and two max-pooling layers. The former use a set of filters and apply them in parts to the image to produce a new, smaller image. This is somewhat analogous to a renormalization group transformation [76]. A set of images produced by a convolutional layer is called a feature map.
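As a cross-check, the layer sizes quoted in Fig. 7 follow from elementary shape arithmetic for unpadded 3 × 3 convolutions with stride 1 and non-overlapping 2 × 2 max pooling (our sketch; the padding and stride conventions are inferred from the quoted dimensions):

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Output width of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_out(size: int, window: int = 2) -> int:
    """Output width of non-overlapping max pooling (floor division)."""
    return size // window

s = 64                   # input: a 64 x 64 image
s = conv_out(s)          # first convolution: 24 feature maps, 62 x 62
assert s == 62
s = pool_out(s)          # first max pooling: 24 @ 31 x 31
assert s == 31
s = conv_out(s)          # second convolution: 36 @ 29 x 29
assert s == 29
s = pool_out(s)          # second max pooling: 36 @ 14 x 14
assert s == 14

assert 36 * s * s == 7056   # flattened hidden units, as quoted in Fig. 7
```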
Each convolutional layer is followed by a max-pooling layer, which reduces the size of an image. The size of the max-pooling layers is a hyperparameter. In our implementation, max-pooling layers take the largest pixel out of groups of 2 by 2. One could use architectures different from the one presented in Fig. 7. However, we checked that they do not lead to noticeably different results. Therefore, we do not investigate this possibility further.

Footnote: n output neurons are usual for classifying n classes. However, it is possible to use a single output neuron for a binary classification.

Footnote: This figure is generated by adapting the code from https://github.com/gwding/draw_convnet .

[Fig. 8 panels: columns labeled 'wrong', 'correct', 'uncertain'; rows labeled 'integrable', 'non-integrable'.]

FIG. 8. The figure shows |ψ|² of exemplary integrable (upper row) and non-integrable states (lower row), together with the corresponding prediction of the network. The network assigns a wrong (correct) label to the states in the first (second) column. The third column shows states which are identified correctly by the network, but with a low confidence level (about 60%). In other words, the states in the third column confuse the network and lead to b₁ ≃ b₂ in Eq. (11).

V. NUMERICAL EXPERIMENTS

Following the discussion in Sec. IV, we train and test the neural network. We observe that a typical accuracy of the trained network (which we refer to later as N) is ≃ 97%. This means that about 18 states out of the 600 used for testing are given the wrong label. Out of these 18 states, roughly one half is integrable. We illustrate typical wave functions that are classified correctly and wrongly in Fig. 8. This does not mean that these states are in any way special – another implementation (e.g., another random seed for the weights) will lead to other states being given the wrong label. Non-integrable states with some structure (e.g., states with scars) in general confuse the network and might be classified as integrable.

In general, it is hard to interpret the predictions of the neural network. This becomes clear after noticing that some images can be changed so that a human eye can hardly detect any variation, while at the same time this change completely modifies the prediction of the network. Such a change can be accomplished especially easily for integrable states; thus, deep learning (DL) confirms our intuition that integrable states are fragile against perturbations. However, such a situation can also occur for non-integrable (in particular, scarred) states. We illustrate this in Fig. 9, which is obtained by slightly modifying states from A using tools of adversarial machine learning, see Ref. [77].

Footnote: We use the word 'typical' to emphasize that a trained network depends on hyperparameters and random seeds. Even for a given set of hyperparameters, each set of random parameters leads to a slightly different network N. We can tune hyperparameters to reach higher accuracies. We do not discuss this possibility here, since high accuracy is not the main purpose of our study.

[Fig. 9 panels: columns 'initial', 'final', 'initial − final'; color scales −0.10 … 0.10 (top) and −0.050 … 0.050 (bottom).]

FIG. 9. Fooling the network. The first column shows |ψ|² of the initial state, which is correctly identified by the network as integrable (non-integrable) in the upper (lower) panel. The second column shows a slightly modified image of |ψ|², which is wrongly identified by the network. The third column shows the difference between the first and second columns divided by the maximum value of the initial state. The color chart corresponds to the images in the third column only.

One simple way to extract features of the network is to look at the feature maps, which should contain information about what features are important. For example, the first layer might represent edges at particular parts of the image, the second might detect specific arrangements of edges, etc. However, we could not extract any meaningful information from this analysis. This is expected: the features of integrable and non-integrable states are more abstract and not as intuitive as the features of cats and dogs or images of other objects we encounter in everyday life.

Other approaches to analyzing a network rely on estimating the effect of removing a single element (or a group of elements) on a model. For large data sets, this can be done by introducing influence functions [78, 79]. Here, we work with a small data set, and, therefore, we can directly calculate the actual effect of leaving states out of the training on a given prediction. Our goal is to understand correlations between states of different energies. In our implementation, we compare the prediction of f from Eq. (11) for |ψ|² to the prediction of f_{−β} for the same state. Here, f_{−β} is obtained by training a neural network after leaving out the set β from A. The comparison of the two predictions (f − f_{−β}) allows us to estimate the importance of the set β for the classification of a test state ψ.

Footnote: Note that it is important to choose a test state ψ for which the network gives an accurate prediction with high confidence level, i.e., b_i →

1. For other states, an intrinsic randomness of ConvNets can lead to a drastic change in the classification of the network.

We would like to understand the network N further. To this end, we resort to numerical experiments. We employ N to analyze states outside of the set A. First, we study physical states, and then non-physical ones.

A. Classification of physical states outside of A

As a first application of N, we use it to classify eigenstates of H_{P=0} not used in the training, i.e., for κ ≠ 1 and 1/κ ≠ 0. These states are non-integrable (cf. Fig. 4), and we observe that N accurately classifies them as such, as long as κ is far enough from κ = 1, see Fig. 11. The figure shows that the predictions of N are inaccurate only for systems with κ = 1 + ε, where ε is a small parameter. These systems are non-integrable; however, the morphology of their eigenstates is very similar to that of the integrable ones at κ = 1. The network classifies them wrongly because of this. Already for κ ≃ 1.5, the accuracy of the network is close to one, and it stays high for larger values of κ. The region between 1 and 1.5 is challenging for N. Therefore, we do not investigate it further.

To test the network on integrable states, we use wave functions of two non-interacting bosons in a box potential of size L:

Ψ_B = (N_{k₁,k₂}/L) [sin(k₁z₁) sin(k₂z₂) + sin(k₂z₁) sin(k₁z₂)],    (12)

where k₁ ≤ k₂, and N_{k₁,k₂} is a normalization constant, N_{k₁=k₂} = 1 and N_{k₁≠k₂} = √2. The set of functions {Ψ_B} is complementary to the 1/κ = 0 case studied above for fermions. The bosonic symmetry used here is from the orthogonal Hilbert space, and therefore, the training routine can have no microscopic information about the wave function Ψ_B. We use 1000 states of the bosonic type (from the 50th to the 1050th) as an input for N. We observe that N accurately classifies Ψ_B as integrable.

To connect the analysis of Ψ_B to studies of quantum billiards, we note that two bosonic impurities in an infinite square well can be mapped on a right triangle with two impenetrable boundaries. At the third boundary a zero Neumann boundary condition should be satisfied – Ψ′_B|_{z₁=z₂} = 0. The mapping follows from the mapping discussed for fermions (see Fig. 1) assuming that the impurity is infinitely heavy. In particular, Fig. 1 b) shows

[Fig. 10 panels: f − f_{−β} vs the energy of the LOO state (0–2500); legend: κ = 1, 1/κ = 0, κ = 2, κ = 5.]

FIG. 10. A typical outcome of a leave-one-out (LOO) algorithm. (Top) The panel shows f − f_{−β} for a β that consists of a single state, as a function of the energy of the state that was left out. (Bottom) The panel shows f − f_{−β} for a β that consists of ten consecutive states, as a function of the energy of the first state in β. The test state ψ here is for κ = 5; its energy is 534.

the geometry of the problem in this case. Note that the bosonic symmetry requires that the derivative of the wave function vanishes on the diagonal of the square in Fig. 1 b). The high accuracy of the classification of the bosonic states suggests that a network trained using the Dirichlet boundary condition can also be used to classify states with the Neumann boundary condition. In other words, the network is mainly concerned with the 'bulk' properties of the wave function; the boundary is not important.

B. Classification of non-physical states

The network N can classify any 64 × 64 pixel input image, and it is interesting to explore the outcome of the network for images that have no direct physical meaning. We start by considering non-normalized eigenstates of H_{P=0}. The normalization coefficient does not change the physics behind the states. However, since the function f is non-linear, i.e., f(αx) ≠ αf(x), input states must have the same normalization as the states in the training set for a meaningful interpretation of the network. To illustrate this statement, we use states from A multiplied by a factor, i.e., we use α|ψ|² instead of |ψ|². Figure 12 shows the accuracy of the network as a function of α. The maximum accuracy of the network is reached at α = 1, i.e., for the states used for the training. Integrable states are classified as non-integrable almost everywhere except close to α = 1. A different situation occurs for non-integrable states. They are classified correctly almost everywhere, and we conclude that they are less susceptible to the factor α. The shape of the curves in Fig. 12 is not universal; it depends on the hyperparameters of the network. However, a general conclusion holds – the normalization is important, and we use normalized input functions in the further analysis.

Footnote: Here we can still talk about the accuracy, since the states αψ correspond to integrable or non-integrable situations.

[Fig. 11: network accuracy vs mass imbalance κ.]

FIG. 11. Typical accuracies of the network N as a function of the mass imbalance κ. Different dots show accuracies for different random seeds, which are used to train a neural network. The curves are added to guide the eye. The point κ = 1 corresponds to an integrable system. All other points are expected to be non-integrable (cf. Fig. 3).

[Fig. 12: accuracy vs α; legend: total, κ = 1, 1/κ = 0, κ = 2, κ = 5.]

FIG. 12. Prediction of the network for the states from A multiplied by a factor α. The curves show the accuracies for four values of the mass ratio κ. The average of these four curves is shown as a thick solid curve.

[Fig. 13: accuracy vs σ; legend: total, integr., non-integr.]

FIG. 13. Predictions of the network for the states from A with noise. The (red) dashed curve shows the accuracy for states which are integrable for σ = 0 (the accuracy here is defined as the percentage of the states identified as integrable). The (green) dotted curve shows the accuracy of the network for non-integrable states. The (blue) solid curve shows the average of the first two curves.

As a next step, we add noise to the images from A, i.e., we build a new data set using wave functions ψ̃ = a_σ ψ(1 + r_σ), where r_σ is a noise function whose values are drawn from the normal distribution with zero mean and standard deviation σ; a_σ is a normalization factor, which is determined for each input state depending on the function r_σ. We assume that r_σ possesses the basic symmetries of the problem: fermionic and mirror. Functions ψ̃ naturally appear in applications, and therefore, it is interesting to investigate the resilience of the network to random noise. We use 4000 states of A with noise to make a relevant statistical statement, see Fig. 13. Small values of σ lead to weak noise, and the network correctly classifies almost all input states. However, larger values of σ lead to confusing input states, and the network fails. It actually fails for integrable states, where the noise destroys correlations. The accuracy for non-integrable states is always high. The resilience of the network to noise suggests it as a tool to analyze experimental data (e.g., obtained using microwave billiards). These experiments [19] can produce a large amount of data; however, there is a limited variety of tools to analyze the simulated states.
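The construction of ψ̃ can be sketched in a few lines of NumPy (our illustration: the eigenstate is replaced by a toy antisymmetric function, the additional mirror symmetry of the triangle is omitted, and the grid normalization convention is our assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
M, sigma = 64, 0.1

# toy stand-in for an eigenstate psi(z1, z2): antisymmetric under z1 <-> z2
z = np.linspace(0.0, np.pi, M)
Z1, Z2 = np.meshgrid(z, z, indexing="ij")
psi = np.sin(Z1) * np.sin(2 * Z2) - np.sin(2 * Z1) * np.sin(Z2)

# noise drawn from N(0, sigma); symmetrized so that psi*(1 + r) keeps
# the fermionic antisymmetry of psi
r = rng.normal(0.0, sigma, size=(M, M))
r = 0.5 * (r + r.T)

psi_noisy = psi * (1.0 + r)

# a_sigma: renormalize on the grid so that sum |psi~|^2 * dz^2 = 1
dz = z[1] - z[0]
a_sigma = 1.0 / np.sqrt(np.sum(psi_noisy**2) * dz**2)
psi_noisy *= a_sigma

assert np.isclose(np.sum(psi_noisy**2) * dz**2, 1.0)
# the antisymmetry is preserved by the symmetric noise
assert np.allclose(psi_noisy, -psi_noisy.T)
```

The symmetrization step is what the text means by requiring that r_σ possess the basic symmetries of the problem: a multiplicative noise that is symmetric under particle exchange leaves the fermionic antisymmetry of ψ intact.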
In particular, neural networks can be used to identify atypical states which do not fit the overall pattern, e.g., scars.

Our choice of ψ̃ to represent a noisy state is not unique. One could, for example, use instead of ψ̃ the function ψ̄ = a_σ(ψ + G r_σ), where the parameter G determines the relative weight of the function r_σ, which is defined as above. Note that G cannot be absorbed in r_σ, since for a given value of σ, the average amplitude of r_σ is well defined. For G = 0, the function ψ̄ is a physical state, whereas for G → ∞, the function ψ̄ is completely random.

[Fig. 14 panels: P(|ψ|²) for κ = 1, 1/κ = 0, κ = 2, κ = 5 (state no. 500).]

FIG. 14. The histogram shows distributions of probability-density amplitudes for different values of the mass ratio, κ. We use a 315 × 315 pixel representation of the probability density, |ψ|², for this analysis. The insets give P(|ψ|²) for the probability density at the lower (33 × 33) pixel resolution.

In contrast to ψ̃, the function ψ̄ can become completely independent of the function ψ. This happens if the parameter G is large. For the data set based upon ψ̄ with small values of G, the accuracy of the network is similar to that presented in Fig. 13. For large values of G, the network is confused and classifies states in a random manner. This behavior should be compared with the data set ψ̃, for which the states are classified as non-integrable when the noise is large. To understand this difference, note that ψ̃ retains information about the nodal lines of the physical state. It turns out that it is important for the input state to have enough pixels with small (zero) values (note that the number of zero pixels is very large for |ψ|², see Fig. 14). Only such states have a direct meaning for the network; all other states confuse the network and do not allow for the extraction of any meaningful information. It is worth noting that the network does not learn the physical nodal lines: we checked that almost any random state with a large number of vanishing pixels is classified as non-integrable, and that perturbations in κ or noise change the prediction of N for these states. The asymmetry also suggests that the standard implementation of deep learning (DL) presented here should be modified to reveal the physics behind non-integrable states. The present network does not distinguish a non-physical random state with a large number of zero values from a physical non-integrable state, since it was not trained for that question. A possible modification is the addition of an extra label (b₃ in Eq. (11)) for non-physical states. Since there are many possible non-physical states, one should frame the problem having an experimental or numerical set-up in mind, where such states have some origin and interpretation. We leave an investigation of this possibility for future studies.

Finally, we note that the conclusion that the network classifies almost all random images as non-integrable is general – it does not depend on the values of the hyperparameters, the initial seed, or the distribution (Laplace, Gaussian, etc.) that we use for the generation of the random states. We checked this by performing a number of numerical experiments. In particular, this means that the network does not learn P(|ψ|²) from Eq. (10).

To summarize: a trained network can accurately classify integrable and non-integrable states. The network can even classify input states from an orthogonal (bosonic) Hilbert space, although this might not be that surprising given that we work with low-resolution 64 × 64 pixel images. The network classifies almost all random images with nodal lines as non-integrable, which suggests that useful information for the network is mostly contained in integrable states.

VI. SUMMARY AND OUTLOOK

We used convolutional neural networks to analyze states of a quantum triangular billiard illustrated in Fig. 1. We argued that neural networks can correctly classify integrable and non-integrable states. The important features of the states for the network are the normalization of the wave functions and a large number of zero elements, corresponding to the existence of a nodal line. Almost any random image that satisfies these criteria is classified as non-integrable. All in all, the neural network supports our expectation that non-integrable states are resilient to noise, as discussed in subsection V B, and have a 'random' structure, unlike integrable states, whose structure can be revealed by considering, for example, nodal lines.

Our results suggest that machine learning tools can be used to analyze the morphology of wave functions of highly excited states obtained numerically or experimentally, to solve problems like: find exceptional states (e.g., scars or integrable states) in the spectra, investigate the transition from chaotic to integrable dynamics, etc. However, further investigations are needed to set the limits of applicability of deep learning tools. For example, our network considers all states without clear correlations as non-integrable. This means that it must be modified for the analysis of noisy data, where a noisy image without any physical meaning could be classified as non-integrable. To circumvent this, one could introduce additional labels for training the network. For example, one could consider three classes – 'integrable', 'non-integrable', and 'noise'. This classification might allow for a more precise description of data sets, and may help one to extract more information about the physics behind the problem.

We speculate that a network memorizes integrable states, and all other states are classified as non-integrable, provided that an image has a large number of vanishing values.
This would explain our findings, and it would align nicely with the observation that we see no overfitting even after many epochs of working through the training set. In the future, it will be interesting to use other integrable systems to test this idea. In particular, one could use non-triangular billiards, or systems without an impenetrable boundary. For example, one can consider a two-dimensional harmonic oscillator with cold atoms. At a single-body level, the integrability of this system can be broken by potential bumps [81, 82] or spin-orbit coupling [83].

In the present work, we focus on data related to the spatial representation of quantum states. However, our approach can also be used to analyze other data. For example, for few-body cold-atom systems, correlation functions in momentum space are of interest in theory and experiment (see, e.g., [84]). Therefore, in the future, it will be interesting to train neural networks using experimental/numerical data that correspond to a momentum-space representation of quantum states, and study the corresponding features of 'quantum chaos'.

To extract further information about the map f, one could investigate its geometry close to its maximum (minimum) values. For example, in the vicinity of some accurately determined integrable state x₀ (f(x₀) = 1), we can write

f(x₀ + δx) ≃ 1 + δxᵀ G δx,    (13)

where the position of the maximum, x₀, is understood as a vector, and G is the Hessian matrix. The first derivative of f vanishes since the function f is analytic and bounded. Eigenstates of the Hessian G provide us with the most important correlations. A preliminary study shows that there are only a handful of relevant eigenvalues of G for our network, which suggests the next step in the analysis of our image classifier.

ACKNOWLEDGMENTS

We thank Aidan Tracy for his input during the initial stages of this project. We thank Nathan Harshman, Achim Richter, and Wojciech Rzadkowski for helpful discussions and comments on the manuscript. This work has been supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 754411 (A.G.V.); by the German Aeronautics and Space Administration (DLR) through Grant No. 50 WM1957 (O.V.M.); by the Deutsche Forschungsgemeinschaft through Project VO 2437/1-1 (Projektnummer 413495248) (A.G.V. and H.W.H.); by the Deutsche Forschungsgemeinschaft through Collaborative Research Center SFB 1245 (Projektnummer 279384907); and by the Bundesministerium für Bildung und Forschung under contract 05P18RDFN1 (H.W.H.). H.W.H. also thanks the ECT* for hospitality during the workshop "Universal physics in Many-Body Quantum Systems – From Atoms to Quarks". This infrastructure is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 824093.

[1] M. V. Berry, I. C. Percival, and N. O. Weiss, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 183 (1987).
[2] G. Belot and J. Earman, Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 147 (1997).
[3] W. H. Zurek, Rev. Mod. Phys., 715 (2003).
[4] M. V. Berry, M. Tabor, and J. M. Ziman, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 375 (1977).
[5] M. V. Berry, Journal of Physics A: Mathematical and General, 2083 (1977).
[6] O. Bohigas, M. J. Giannoni, and C. Schmit, Phys. Rev. Lett., 1 (1984).
[7] H.-J. Stöckmann, Quantum Chaos - An Introduction (University Press, Cambridge, 1999).
[8] F. Haake,

Quantum Signatures of Chaos, 2nd edition (Springer, Berlin, 2001). [9] Y. LeCun, Y. Bengio, and G. Hinton, Nature ,436–444 (2015).[10] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld,N. Tishby, L. Vogt-Maranto, and L. Zdeborov´a, Rev.Mod. Phys. , 045002 (2019).[11] M. Berry, Physica Scripta , 335 (1989).[12] Y. A. Kharkov, V. E. Sotskov, A. A. Karazeev, E. O.Kiktenko, and A. K. Fedorov, Phys. Rev. B , 064406(2020).[13] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Gian-notti, and D. Pedreschi, ACM Comput. Surv. (2018),10.1145/3236009.[14] Q. Zhang and S. Zhu, Frontiers Inf Technol ElectronicEng , 27–39 (2018).[15] B. Kaspschak and U.-G. Meißner, (2020),arXiv:2003.09137 [physics.comp-ph].[16] B. Kaspschak and U.-G. Meißner, (2020),arXiv:2009.03192 [cs.LG].[17] A. Dawid, P. Huembeli, M. Tomza, M. Lewenstein, andA. Dauphin, New Journal of Physics , 115001 (2020).[18] Q. Zhang, X. Wang, R. Cao, Y. N. Wu, F. Shi, andS. Zhu, IEEE Transactions on Pattern Analysis and Ma-chine Intelligence , 1 (2020).[19] A. Richter, “Playing billiards with microwaves — quan-tum manifestations of classical chaos,” in Emerging Ap-plications of Number Theory , edited by D. A. Hejhal,J. Friedman, M. C. Gutzwiller, and A. M. Odlyzko(Springer New York, New York, NY, 1999) pp. 479–523.[20] M. V. Berry and M. Wilkinson, Proceedings of the RoyalSociety of London. Series A, Mathematical and PhysicalSciences , 15 (1984).[21] W. Li and S. M. Blinder, Journal of MathematicalPhysics , 2784 (1985).[22] D. L. Kaufman, I. Kosztin, and K. Schulten, AmericanJournal of Physics , 133 (1999).[23] F. M. de Aguiar, Phys. Rev. E , 036201 (2008).[24] T. Ara´ujo Lima, S. Rodr´ıguez-P´erez, and F. M.de Aguiar, Phys. Rev. E , 062902 (2013).[25] W. Rawat and Z. Wang, Neural Computation , 2352(2017), pMID: 28599112.[26] L. Wang, Phys. Rev. B , 195105 (2016).[27] P. Broecker, J. Carrasquilla, R. Melko, and et al., SciRep , 8823 (2017).[28] W. Hu, R. R. P. Singh, and R. T. Scalettar, Phys. 
Rev.E , 062122 (2017).[29] Y. Zhang, A. Mesaros, K. Fujita, and et al., Nature ,484–490 (2019).[30] B. Rem, N. K¨aming, M. Tarnowski, and et al., Nat.Phys. , 917–920 (2019).[31] A. Bohrdt, C. Chiu, G. Ji, and et al., Nat. Phys. ,921–924 (2019).[32] J. Pekalski, W. Rzadkowski, and A. Z. Panagiotopoulos,The Journal of Chemical Physics , 204905 (2020).[33] W. Rzadkowski, N. Defenu, S. Chiacchiera, A. Trombet-toni, and G. Bighin, New Journal of Physics , 093026(2020).[34] J.-S. Caux and J. Mossel, Journal of Statistical Mechan-ics: Theory and Experiment , P02023 (2011).[35] H. R. Krishnamurthy, H. S. Mani, and H. C. Verma,Journal of Physics A: Mathematical and General ,2131 (1982).[36] S. Glashow and L. Mittag, J Stat Phys , 937–941(1997).[37] A. Lamacraft, Phys. Rev. A , 012707 (2013). [38] R. E. Barfknecht, I. Brouzos, and A. Foerster, Phys.Rev. A , 043640 (2015).[39] S. Joseph and M. Sanju´an, Entropy , 79 (2016).[40] X.-W. Guan, M. T. Batchelor, and C. Lee, Rev. Mod.Phys. , 1633 (2013).[41] H. Schachner and G. Obermair, Z. Physik B - CondensedMatter , 113–119 (1994).[42] Y.-Q. Li and Z.-S. Ma, Phys. Rev. B , R13071 (1995).[43] J. McGuire and C. Dirk, Journal of Statistical Physics , 971 (2001).[44] M. Olshanii and S. G. Jackson, New Journal of Physics , 105005 (2015).[45] N. Loft, A. Dehkharghani, N. Mehta, A. G. Volosniev,and N. T. Zinner, Eur. Phys. J. D , 65 (2015).[46] T. Scoquart, J. J. Seaward, S. G. Jackson, and M. Ol-shanii, SciPost Phys. , 005 (2016).[47] N. L. Harshman, M. Olshanii, A. S. Dehkharghani, A. G.Volosniev, S. G. Jackson, and N. T. Zinner, Phys. Rev.X , 041001 (2017).[48] Y. Liu, F. Qi, Y. Zhang, and S. Chen, iScience , 181(2019).[49] A. Miltenburg and T. Ruijgrok, Physica A: StatisticalMechanics and its Applications , 476 (1994).[50] A. S. Dehkharghani, A. G. Volosniev, and N. T. Zin-ner, Journal of Physics B: Atomic, Molecular and OpticalPhysics , 085301 (2016).[51] T. D. Lee, F. E. Low, and D. Pines, Phys. Rev. , 297(1953).[52] A. G. 
Volosniev and H.-W. Hammer, Phys. Rev. A ,031601 (2017).[53] G. Panochko and V. Pastukhov, Annals of Physics ,167933 (2019).[54] S. I. Mistakidis, A. G. Volosniev, N. T. Zinner, andP. Schmelcher, Phys. Rev. A , 013619 (2019).[55] J. Jager, R. Barnett, M. Will, and M. Fleischhauer,Phys. Rev. Research , 033142 (2020).[56] R. N. Hill, Journal of Mathematical Physics , 1083(1980).[57] H.-J. St¨ockmann and J. Stein, Phys. Rev. Lett. , 2215(1990).[58] S. Sridhar and E. J. Heller, Phys. Rev. A , R1728(1992).[59] F. Lenz, B. Liebchen, F. K. Diakonos, andP. Schmelcher, New Journal of Physics , 103019 (2011).[60] P. Pechukas, Phys. Rev. Lett. , 943 (1983).[61] T. Yukawa, Phys. Rev. Lett. , 1883 (1985).[62] V. Ivrii, Bull. Math. Sci. , 379–452 (2016).[63] O. Bohigas and M.-J. Giannoni, Mathematical and Com-putational Methods in Nuclear Physics , edited by J. S.Dehesa, J. M. G. Gomez, and A. Polls (Springer BerlinHeidelberg, 1984) pp. 1–99.[64] T. Prosen and M. Robnik, Journal of Physics A: Mathe-matical and General , 2371 (1993).[65] G. Ben Arous and P. Bourgade, Ann. Probab. , 2648(2013).[66] V. Blomer, J. Bourgain, M. Radziwill, and Z. Rudnick,Annales Scientiﬁques de l’Ecole Normale Superieure ,1283 (2017).[67] L. Kaplan and E. Heller, Annals of Physics , 171(1998).[68] J. R. Evans and M. I. Stockman, Phys. Rev. Lett. ,4624 (1998).[69] K. F. Berggren, K. N. Pichugin, A. F. Sadreev, andA. Starikov, Jetp Lett. , 403 (1999). [70] S. R. Jain and R. Samajdar, Rev. Mod. Phys. , 045005(2017).[71] M. Shapiro and G. Goelman, Phys. Rev. Lett. , 1714(1984).[72] S. W. McDonald and A. N. Kaufman, Phys. Rev. A ,3067 (1988).[73] R. Samajdar and S. R. Jain, Journal of MathematicalPhysics , 012103 (2018).[74] J. Elson, J. Douceur, J. Howell, and J. Saul, in Proceed-ings of 14th ACM Conference on Computer and Com-munications Security (CCS) (Association for ComputingMachinery, Inc. (ACM), 2007).[75] G. James, D. Witten, T. Hastie, and R. Tibshirani,

An Introduction to Statistical Learning (Springer, NewYork, NY, 2013).[76] K. G. Wilson, Rev. Mod. Phys. , 583 (1983).[77] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Er- han, I. Goodfellow, and R. Fergus, “Intriguing propertiesof neural networks,” (2014), arXiv:1312.6199 [cs.CV].[78] F. R. Hampel, Journal of the American Statistical Asso-ciation , 383 (1974).[79] P. W. Koh and P. Liang, in International Conference onMachine Learning , ICML’17, Vol. 70 (JMLR.org, 2017)p. 1885–1894.[80] V. N. Prigodin, Phys. Rev. Lett. , 1566 (1995).[81] N. L. Harshman, Phys. Rev. A , 053616 (2017).[82] J. Keski-Rahkonen, A. Ruhanen, E. J. Heller, andE. R¨as¨anen, Phys. Rev. Lett. , 214101 (2019).[83] O. V. Marchukov, A. G. Volosniev, D. V. Fedorov, A. S.Jensen, and N. T. Zinner, Journal of Physics B: Atomic,Molecular and Optical Physics , 195303 (2014).[84] A. Bergschneider, V. M. Klinkhamer, J. H. Becher,R. Klemt, L. Palm, G. Z¨urn, S. Jochim, and P. M. Preiss,Nature Physics , 640 (2019). APPENDIX

To generate the input for a neural network, we diagonalize the Hamiltonian H_{P=0} from Eq. (5) in a truncated Hilbert space whose basis elements are

ξ_{n₁,n₂}(x₁, x₂) = N [sin(n₁πx₁/L) sin(n₂πx₂/L) − sin(n₂πx₁/L) sin(n₁πx₂/L)],    (14)

where c ≥ n₂ > n₁ ≥ 1, c is the cutoff parameter, and N is the normalization constant. In this basis, the matrix elements of the Hamiltonian read (in our numerical analysis, we use units in which L = π)

∫₀^π ∫₀^π ξ_{m₁,m₂}(x₁, x₂)* H ξ_{n₁,n₂}(x₁, x₂) dx₁ dx₂
  = (1/2)(n₁² + n₂²)(1 + 1/κ)(δ_{m₁,n₁} δ_{m₂,n₂} − δ_{m₁,n₂} δ_{m₂,n₁})
  + (n₁n₂)/(π²κ) [I(m₁+n₁, m₂+n₂) + I(m₁+n₁, m₂−n₂) + I(m₁−n₁, m₂+n₂) + I(m₁−n₁, m₂−n₂)
  − I(m₁+n₂, m₂+n₁) − I(m₁+n₂, m₂−n₁) − I(m₁−n₂, m₂+n₁) − I(m₁−n₂, m₂−n₁)],    (15)

where I(s, t) = [(−1)^s − 1][(−1)^t − 1]/(st) if s, t ≠ 0, and I(s, t) = 0 otherwise. To write these matrix elements in a matrix form, we use the index

n = n₂ − n₁ + c(n₁ − 1) − (n₁ − 1)n₁/2.    (16)

The parameters n₁ and n₂ can be uniquely recovered as

n₁ = ⌊(1 + 2c)/2 − √(c² − c − 2n + 9/4)⌋,    n₂ = n + n₁ − c(n₁ − 1) + (n₁ − 1)n₁/2.    (17)

To choose the cutoff parameter, c, we should find a good balance between the calculation time and the accuracy of our results. To quantify the accuracy, we compute energies for κ = 1 by diagonalizing H, and compare them to the exact ones obtained with the Bethe ansatz:

E_BA = n₁² + n₂² + n₃²                             if Σᵢ nᵢ = 0,
E_BA = (n₁ + 1/3)² + (n₂ + 1/3)² + (n₃ + 1/3)²     if Σᵢ nᵢ = −1,
E_BA = (n₁ + 2/3)² + (n₂ + 2/3)² + (n₃ + 2/3)²     if Σᵢ nᵢ = −2.    (18)

This solution assumes that the total momentum is zero. The relative difference

ε = (E_BA − E(c))/E_BA    (19)

provides us with a measure of the accuracy. Note that our exact diagonalization method is expected to work better for κ > 1, since ξ is an eigenstate of a system with 1/κ = 0. The input for a neural network is obtained using c = 130, for which we obtain 726 (3795) states with ε < 10⁻⁴ (10⁻³).
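The index map of Eqs. (16) and (17), as reconstructed here, can be checked to be a bijection over all basis pairs below the cutoff. A sketch (our code, not the authors'; the floor in Eq. (17) is our reading of the formula):

```python
import math

def flat_index(n1: int, n2: int, c: int) -> int:
    """Eq. (16): map a basis pair (n1, n2), 1 <= n1 < n2 <= c, to n = 1, 2, ..."""
    return n2 - n1 + c * (n1 - 1) - (n1 - 1) * n1 // 2

def pair_from_index(n: int, c: int) -> tuple:
    """Eq. (17): recover (n1, n2) from the flat index n."""
    n1 = math.floor((1 + 2 * c) / 2 - math.sqrt(c * c - c - 2 * n + 9 / 4))
    n2 = n + n1 - c * (n1 - 1) + (n1 - 1) * n1 // 2
    return n1, n2

c = 130  # cutoff used in the text
n = 0
for n1 in range(1, c + 1):
    for n2 in range(n1 + 1, c + 1):
        n += 1
        # the map hits every index exactly once, and the inverse recovers the pair
        assert flat_index(n1, n2, c) == n
        assert pair_from_index(n, c) == (n1, n2)
assert n == c * (c - 1) // 2   # all pairs below the cutoff are covered
```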