Learning What a Machine Learns in a Many-Body Localization Transition
Rubah Kausar¹, Wen-Jia Rao²˒*, and Xin Wan¹˒³

¹ Zhejiang Institute of Modern Physics, Zhejiang University, Hangzhou 310027, China
² School of Science, Hangzhou Dianzi University, Hangzhou 310027, China
³ CAS Center for Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China

* E-mail: [email protected]
Abstract.
We employ a convolutional neural network to explore the distinct phases in random spin systems with the aim to understand the specific features that the neural network chooses to identify the phases. With the energy spectrum normalized to the bandwidth as the input data, we demonstrate that a network of the smallest nontrivial kernel width selects level spacing as the signature to distinguish the many-body localized phase from the thermal phase. We also study the performance of the neural network with an increased kernel width, based on which we find an alternative diagnostic to detect phases from the raw energy spectrum of such a disordered interacting system.

PACS numbers: 84.35.+i, 71.55.Jv, 75.10.Pq, 64.60.aq
1. Introduction
Recent research has established the existence of two generic phases in isolated quantum many-body systems: the thermal phase and the many-body localized (MBL) phase [1, 2]. Ergodicity is preserved in the thermal phase, while in the MBL phase localization persists in the presence of weak interactions. The difference between the thermal and MBL phases shows up in many aspects, such as quantum entanglement. A thermal system can act as the heat bath for its own subsystem, hence the entanglement is extensive and satisfies a volume law. An MBL system, however, yields small entanglement that scales with the area of the subsystem boundary. More recently, much attention has been drawn to the study of the entanglement spectrum (ES) [3], which is the eigenvalue spectrum of the reduced density matrix of a subsystem. The ES contains more information than its von Neumann entropy (the entanglement entropy), which is a single number. The variance of the entanglement entropy and its evolution after a local quench from an exact eigenstate, together with the spectral statistics of the ES, are all promising tools in the study of the MBL phase [4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

Modern developments in machine learning (ML) [14] provide a new paradigm for studying phases and phase transitions in condensed matter physics. In computer science, ML is an efficient algorithm to extract hidden features in data, such as images, in order to make predictions about the nature of new ones. This is similar to the study of phase transitions, where we use (local or non-local) order parameters to distinguish different phases. ML includes both unsupervised and supervised methods. Unsupervised learning is a collection of exploratory methods that extract the hidden patterns in the input data without prior knowledge of the desired output, whereas in supervised learning the input data are accompanied by matching labels, and a machine is trained to recognize patterns and predict correct labels. A significant amount of work has been devoted to using ML methods to study equilibrium phase transitions [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31].

For MBL physics, ML has been successfully employed to study the MBL transition point, the mobility edge, and the evolution of initial states [32, 33, 34, 35, 36, 37]. In these works, the ES is a popular choice of input data [32, 34, 35, 36]. The choice, as was explicitly argued in Ref. [36], builds on past studies that established the ES as a sensitive probe of the phases in MBL systems, while ML successfully extracts relevant features from the complex pattern of the ES. Unfortunately, such a common practice, unlike conventional studies in physics, does not reveal what the relevant features in the spectral pattern are. In addition, the ES is obtained by pre-processing wave functions and is, therefore, high-level data, while the general success of ML hints that we should be able to use a "lower level" physical quantity, such as the wave function or the energy spectrum.

Given the energy spectrum of a many-body system, the most widely used statistical quantity, i.e. a relevant feature, is the distribution of level spacings (gaps between nearest levels). Eigenstates in the thermal phase are extended with finite overlap with each other, resulting in a correlated energy spectrum, while the levels are independent in the MBL phase.
2. Models and Methods
The canonical model to study MBL phenomena in one dimension (1D) is the spin-1/2 Heisenberg chain with random external fields [50], whose Hamiltonian is given by

H = J \sum_i \mathbf{S}_i \cdot \mathbf{S}_{i+1} + \sum_{\alpha=x,z} \sum_i \varepsilon_i^{\alpha} S_i^{\alpha}.  (1)

Here, \mathbf{S}_i is the spin-1/2 operator at site i. The isotropic interaction J couples nearest-neighbouring spins. The disorder is introduced via a random field that couples to the x and z components of the spin operator; each \varepsilon_i^{\alpha} \in [-h, h] is sampled from a uniform distribution of width 2h. In this work we set the interaction strength J to unity and implement periodic boundary conditions.
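For concreteness, the following is a minimal exact-diagonalization sketch of Eq. (1) in Python/NumPy. It is our illustration, not code from the paper; the helper names are our own, and the dense-matrix construction is practical only for small L.

```python
import numpy as np

# Spin-1/2 operators
sx = np.array([[0, 0.5], [0.5, 0]])
sy = np.array([[0, -0.5j], [0.5j, 0]])
sz = np.array([[0.5, 0], [0, -0.5]])
id2 = np.eye(2)

def site_op(op, i, L):
    """Embed a single-site operator at site i of an L-site chain."""
    mats = [id2] * L
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def heisenberg_random_field(L, h, J=1.0, rng=np.random.default_rng()):
    """H = J sum_i S_i . S_{i+1} + sum_{a=x,z} sum_i eps_i^a S_i^a, Eq. (1), PBC."""
    dim = 2 ** L
    H = np.zeros((dim, dim), dtype=complex)
    for i in range(L):
        j = (i + 1) % L  # periodic boundary conditions
        for op in (sx, sy, sz):
            H += J * site_op(op, i, L) @ site_op(op, j, L)
        # random fields coupling to the x and z spin components
        H += rng.uniform(-h, h) * site_op(sx, i, L)
        H += rng.uniform(-h, h) * site_op(sz, i, L)
    return H

E = np.linalg.eigvalsh(heisenberg_random_field(L=8, h=1.0))
print(E[:5])  # lowest few eigenvalues of one disorder realization
```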
Figure 1: (a) Energy spectrum of the random-field Heisenberg model for L = 6 at different disorder strengths. (b) Comparison of the level spacing distribution P(s) for the random-field Heisenberg model. At small disorder (h = 1), P(s) follows the GOE distribution, while at larger disorder (h = 5) it has a Poisson distribution.

The random matrix theory (RMT) pioneered by Wigner and Dyson [38] in the 1960s to understand the behaviour of complex nuclei established a deep connection between the symmetries of the Hamiltonian and the statistical properties of the eigenvalue spectrum. For instance, a system with time-reversal invariance is represented by a Hamiltonian matrix that is real and symmetric, which is invariant under orthogonal transformations, hence belongs to the Gaussian orthogonal ensemble (GOE). Note that our model Eq. (1) breaks time-reversal symmetry due to the external field, but there remains an anti-unitary symmetry, comprised of time reversal and a rotation by π of all spins about the z axis, which leaves the Hamiltonian unchanged; hence the model still belongs to the GOE. A Hamiltonian with spin-rotational invariance but broken time-reversal symmetry is represented by a matrix belonging to the Gaussian unitary ensemble (GUE), while the Gaussian symplectic ensemble (GSE) represents systems with time-reversal symmetry but broken spin-rotational symmetry. All these ensembles describe thermal phases with finite correlations between different energy levels, and they exhibit characteristic features that are determined solely by the symmetry, independent of the microscopic details.

Among the various features of RMT, the most widely used one is the distribution of nearest level spacings P(s), where s is the normalized spacing E_{i+1} − E_i between neighbouring levels. For our model Eq. (1), it can be proven that in the thermal phase with small disorder the nearest level spacings follow the GOE distribution

P(s) = \frac{\pi s}{2} \exp\left(-\frac{\pi s^2}{4}\right),

reflecting the repulsion between levels. On the other hand, in a fully localized phase with large disorder all the energy levels become independent, and the nearest level spacing distribution evolves into the Poisson distribution P(s) = \exp(-s). The level spacing has proved to be a powerful tool for exploring the behaviour of complex systems such as disordered systems [51, 52, 53, 54], chaotic systems [55] and quasi-periodic systems [56].

We plot representative energy spectra of the Hamiltonian in Fig. 1(a); the bandwidth increases with disorder strength h. More importantly, in all cases the levels are denser in the middle part of the spectrum, where the density of states (DOS) is also more uniform. For this reason we choose the middle part of the spectrum for the level statistics. The evolution of the level spacing distribution at low (h = 1) and high (h = 5) disorder strength is shown in Fig. 1(b), obtained after implementing the unfolding procedure [57, 58, 59]. We clearly see that at large disorder strength (h = 5) the level spacing distribution is Poissonian, while at small disorder (h = 1) the distribution follows the GOE. The fit for h = 5 shows noticeable deviations around s → 0, as the disorder strength is not far above the transition point h_c.

Figure 2: (a) Sketch of the CNN architecture used in the present work. (b) The flow of data in the CNN model.

Our CNN consists of a convolution layer and a fully connected layer, as shown in Fig. 2(a). In the convolution layer the input data X is convolved with N_f filters F^β (also referred to as kernels), where β = 1, 2, ..., N_f.
Mathematically, the 1D convolutions performed by our CNN model can be expressed as

Z^{\beta}(k) = (X \ast F^{\beta})(k) = \sum_{i=1}^{Q} X(k+i)\, F^{\beta}(i),  (2)

where Z^β is the βth feature map obtained as the result of the convolution and Q is the width of the kernel. The filter weights constitute a set of N_f × Q parameters in the convolution layer, which are optimized during the training. The resulting feature maps are processed by a nonlinear activation function, whose output is flattened and passed to the fully connected layer without pooling. A linear map with weights W takes the flattened output to \tilde{y}_1 and \tilde{y}_2, which correspond to the two phases; this set of weights is also optimized in the training. Finally, the probability of being in either of the phases is obtained by a softmax function,

y_1^{CNN} = \frac{\exp(\tilde{y}_1)}{\exp(\tilde{y}_1) + \exp(\tilde{y}_2)},  (3)

y_2^{CNN} = \frac{\exp(\tilde{y}_2)}{\exp(\tilde{y}_1) + \exp(\tilde{y}_2)}.  (4)

The flow of the data in the CNN model is summarized in Fig. 2(b).

To classify the thermal and MBL phases, we train the neural network in a supervised learning scheme with a collection of raw eigenvalue spectra obtained by diagonalizing Eq. (1). In other words, we label the spectra with the corresponding phases and train the network to extract relevant features from the input data. In particular, the CNN is trained with the mini-batch gradient descent method. The optimization algorithm searches for a set of parameters that minimizes the cross entropy

E = -\sum_{I=1}^{N_b} \sum_{i=1}^{2} y_{I,i} \log y_{I,i}^{CNN},  (5)

where y_{I,i} = 0 or 1 is the true phase label of the Ith sample, and N_b is the size of the batch in one training step. After the training, the neural network can use its gained knowledge to predict or validate the class of a new set of data. The performance of the network depends on the network model, as well as on the quality of the training. It should be pointed out that the number of parameters in a CNN is large (although significantly smaller than in a fully connected NN), hence finding the global minimum of the cross entropy is almost impossible. Nevertheless, our aim is not the precise values of the optimized parameters or the best performance, but a numerical trend in the parameters that enables the machine to extract non-trivial physics.
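A compact PyTorch sketch of the architecture and training step described above follows. It is our reconstruction, not the authors' code; the class and variable names are ours, and the input length and toy data are placeholders.

```python
import torch
import torch.nn as nn

class PhaseCNN(nn.Module):
    """Convolution layer + fully connected layer, as sketched in Fig. 2(a)."""
    def __init__(self, n_levels, n_f=1, q=2):
        super().__init__()
        # Eq. (2): N_f filters of width Q, stride 1, no pooling, no bias
        self.conv = nn.Conv1d(1, n_f, kernel_size=q, stride=1, bias=False)
        self.act = nn.Sigmoid()  # the text uses sigmoid (tanh for normalized input)
        self.fc = nn.Linear(n_f * (n_levels - q + 1), 2)  # two output classes

    def forward(self, x):                 # x has shape (batch, 1, n_levels)
        z = self.act(self.conv(x))        # feature maps
        return self.fc(z.flatten(1))      # logits (y~_1, y~_2)

model = PhaseCNN(n_levels=100, n_f=1, q=2)
for p in model.parameters():              # truncated-normal initialization, sigma = 0.1
    nn.init.trunc_normal_(p, mean=0.0, std=0.1)

loss_fn = nn.CrossEntropyLoss()           # combines the softmax of Eqs. (3)-(4) with Eq. (5)
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # mini-batch gradient descent

# one training step on a placeholder mini-batch (N_b = 100)
spectra = torch.randn(100, 1, 100)        # stand-in for labelled energy spectra
labels = torch.randint(0, 2, (100,))      # 0 = thermal, 1 = MBL
opt.zero_grad()
loss = loss_fn(model(spectra), labels)
loss.backward()
opt.step()
```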
3. CNN Training Results
We begin by understanding what a CNN learns to distinguish the phases, explicitly, the thermal phase at low disorder and the MBL phase at high disorder, from the energy spectra of the random spin system, as displayed in Fig. 1(a). We assume no prior knowledge of the exact transition point and numerically generate the raw energy spectrum {E_i} of the Heisenberg model deep in each phase. Explicitly, we collect data for the thermal phase (labelled as 0) in the range 1.0 ≤ h ≤ 2.4, and for the MBL phase (labelled as 1) in the range 4.6 ≤ h ≤ 6.0. In each region we generate 1000 samples of {E_i} with parameter interval Δh = 0.1. The conventional wisdom is that the energy spectra in the two phases can be distinguished by the nearest-neighbour level spacing, as we demonstrate in Fig. 1(b). In the CNN training, we assume we have no knowledge of the level statistics and feed all the labelled data to the CNN for supervised learning. We then analyze the kernel parameters to learn how the neural network filters the energy spectra to distinguish the two phases.

An energy spectrum is a 1D set of data, so we use 1D filters with kernel width Q. The simplest nontrivial case is Q = 2. Therefore, we start our training with a CNN architecture that has a 2 × 1 kernel. The feature map Z^β generated by convolution with a filter F^β = (F^β_1, F^β_2) on two neighbouring eigenvalues E_i and E_{i+1} is

Z_i^{\beta} = F_1^{\beta} E_i + F_2^{\beta} E_{i+1},  (6)

where β indicates the channel index. For each collection of data we run the training process T_L times; in each training the network parameters are initialized stochastically from a truncated normal distribution with mean μ = 0 and standard deviation σ = 0.1.
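To make Eq. (6) concrete, the following NumPy fragment (illustrative only; the function name and toy spectrum are ours) evaluates the width-2 feature map and shows how special filter choices pick out either level spacings or local mean energies:

```python
import numpy as np

def feature_map(E, F):
    """Width-Q, stride-1 feature map: Z_i = sum_j F[j] * E[i+j], cf. Eq. (6)."""
    Q = len(F)
    return np.array([np.dot(F, E[i:i + Q]) for i in range(len(E) - Q + 1)])

E = np.sort(np.random.default_rng(0).uniform(-1.0, 1.0, 8))  # toy spectrum

# F2 = -F1 extracts nearest-neighbour spacings (up to sign),
# while F2 = +F1 extracts the local mean energy:
print(feature_map(E, [1.0, -1.0]))  # -(E_{i+1} - E_i)
print(feature_map(E, [0.5, 0.5]))   # (E_i + E_{i+1}) / 2
```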
All testing accuracies are close to 100%, suggesting that distinguishing the thermal phase from the MBL phase is an easy task even for the simplest CNN architecture.

In Fig. 3(a) we show the resulting filter weights F^β from the T_L = 1000 training processes, in which we use only N_f = 1 filter. We find that the filter weights split into two branches, in quadrants I and III, respectively. We fit the filter weights by

F_2^{\beta} = m F_1^{\beta} \pm c,  (7)

where the plus sign corresponds to quadrant I and the minus sign to quadrant III. The best fit yields m = −0.95 ± 0.02 and c = 0.360 ± 0.003. With m ≈ −1, the feature map of Eq. (6) can be rewritten as

Z_i^{\beta} \approx -F_1^{\beta} \Delta E_i \pm 0.36\, E_{i+1},  (8)

where ΔE_i = E_{i+1} − E_i is the nearest-neighbour level spacing. Eq. (8) reveals the two underlying features that the neural network filters to distinguish the phases: the level spacing ΔE_i and the energy E_{i+1} itself. Because the network identifies the phases with almost perfect accuracy regardless of the value of F^β_1, we conclude that the neural network is not sensitive to the level spacing, which is commonly used as a diagnostic for phase transitions in disordered and chaotic systems. This apparently surprising result is rooted in the supervised learning scheme and in the disorder dependence of the bandwidth. As shown in Fig. 1(a), the bandwidth of the system grows monotonically with the disorder strength. In supervised learning the bandwidth can thus be a feature that the CNN learns to distinguish the phases. For two neighbouring levels we can combine their energies into their difference and their mean, which are independent. Our observation suggests that the energy difference is irrelevant, confirming that the mean is the feature that is passed by the convolution layer to the subsequent fully connected layer, whose output scales with the bandwidth.

This is a vivid example that a machine in a supervised scheme may not always learn nontrivial knowledge. The bandwidth can be used to distinguish a low-disorder system from a high-disorder system, but it cannot be used to detect the phase transition point without prior knowledge. The above example is, therefore, a caution for the study of phase transitions via deep neural networks.

Figure 3: (a) Filter weights after performing convolution with a kernel of width 2 on the energy spectra {E_i} of the 10-site random-field Heisenberg chain; the fitted line has m = 0.95 ± 0.02 and c = 0.360 ± 0.003. Data are shown for T_L = 1000 training loops, keeping stride s = 1 and using the sigmoid activation function. Other training parameters are: batch size N_b = 100, learning rate η = 0.01, and filter number N_f = 1. (b) Filter weights for all the filter channels obtained from the normalized energy levels {E′_i}; the fitted line has m = 1.025 ± 0.003 and c = 0.005 ± 0.008. Other parameters are: N_b = 100, s = 1, η = 0.01, N_f = 3, and T_L = 500. Here we use tanh as the activation function.
The failure to extract nontrivial features can be corrected, however, by properly manipulating the input data. We can compare the spectra at various disorder strengths in an unbiased way by normalizing the spectrum of each sample by its minimum and maximum energies as

E_i' = \frac{2(E_i - E_{\min})}{E_{\max} - E_{\min}} - 1,  (9)

where E′_i is the normalized energy level. The normalized spectrum {E′_i} always has a bandwidth of 2 and preserves the level statistics of the original spectrum. We then feed the normalized spectra {E′_i} to the CNN and perform T_L = 500 independent trainings. We find that once the bandwidth effects are removed by the normalization, the performance of the same neural network drops from 100% to around 70%, even though we increase the number of channels to N_f = 3.
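Both normalization conventions used in this work, the min-max form of Eq. (9) and the mean/standard-deviation variant considered later in the text, take one line each; a sketch (the function names are ours):

```python
import numpy as np

def normalize_minmax(E):
    """Eq. (9): bandwidth fixed to 2 (range [-1, 1]); level statistics preserved."""
    return 2.0 * (E - E.min()) / (E.max() - E.min()) - 1.0

def normalize_standard(E):
    """Alternative convention discussed in the text: (E - mean) / std."""
    return (E - E.mean()) / E.std()
```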
Fig. 3(b) presents the filter weights F^β obtained in all trainings, which fall roughly on a straight line. When we fit the data by

F_2^{\beta} = m F_1^{\beta} + c,  (10)

we obtain m = −1.025 ± 0.003 and c = 0.005 ± 0.008, i.e. m ≈ −1. This means that the convolution layer now passes only the level spacing information to the subsequent layer, consistent with our expectation that the level spacing can be used to detect the MBL phase and the MBL-thermal transition.

In Fig. 3(b) we use N_f = 3 convolutional filter channels. The resulting weights for the three channels all prefer the level-spacing information, rather than the bandwidth. We have also tested training with N_f = 1 and N_f = 2 channels. We find, in general, that more training loops (hence longer time) are needed to filter out the bandwidth information with fewer filters. For example, we need about 1000 training loops with N_f = 2, compared to 500 with N_f = 3, to obtain a trend line similar to that in Fig. 3(b). We speculate that this is because the level spacing distribution is a statistical "order parameter" in this system; more level-spacing information can be extracted with a larger N_f, which allows an easier identification of its distribution.

We also note that the details of the energy normalization are not important. We have also considered the normalization E′_i = (E_i − Ē)/σ, where Ē is the average energy of the spectrum and σ is the standard deviation from the mean value. This normalization convention yields a similar picture for the CNN parameters.

We can attribute the significant drop in the performance of the CNN on the normalized spectra to sample-to-sample fluctuations in finite systems. In a conventional level statistics study we identify the phase by analyzing the ensemble-averaged level spacing distribution. In the CNN approach, however, we ask which phase each individual sample belongs to. In a finite system, the fluctuations in the energy level spacings therefore prevent us from unambiguously classifying individual samples. Averaged over all samples, we still achieve a 70% accuracy, which is significantly higher than the 50% of random guesses. The difference is sufficient for the machine to select the level spacing as the feature to explore during the convolution. The next question is whether we can boost the performance by increasing the kernel width.

The CNN with kernel width 2 extracts the nearest-neighbour level spacing. To extract more information we also study the neural network with filters of kernel width 3, which capture both nearest-neighbour and next-nearest-neighbour level spacings. Explicitly, a filter F^β = (F^β_1, F^β_2, F^β_3) yields a feature map of the form

Z_i^{\beta} = F_1^{\beta} E_i + F_2^{\beta} E_{i+1} + F_3^{\beta} E_{i+2}.  (11)

Again, we input the energy spectra normalized by their minimum and maximum energies. In this case, we obtain an 82% accuracy among 500 trainings, significantly higher than the 70% with filters of kernel width 2.

We plot the resulting filter weights F^β_2 against F^β_1 in Fig. 4(a) and F^β_3 against F^β_2 in Fig. 4(b) and fit the results by straight lines. We find that

F_2^{\beta} = -(0.485 \pm 0.005) F_1^{\beta} + (0.01 \pm 0.01),  (12)

and

F_3^{\beta} = (0.85 \pm 0.02) F_2^{\beta} + (-0.03 \pm 0.03).  (13)

The results provide a strong motivation for us to approximate the weights by

F_2^{\beta} = -F_1^{\beta}/2, \qquad F_3^{\beta} = F_2^{\beta},  (14)

which leads to a feature map Z^β_i expressed in terms of only one filter element F^β_1. Substituting Eq. (14) into Eq. (11),

Z_i^{\beta} \simeq \frac{F_1^{\beta}}{2}(E_i - E_{i+1}) + \frac{F_1^{\beta}}{2}(E_i - E_{i+2}).  (15)

Figure 4: (a) Filter weights F^β_2 as a function of F^β_1, obtained by using a kernel of size 3 × 1 on the normalized spectra {E′_i}; the fitted line has m = 0.485 ± 0.005 and c = 0.01 ± 0.01. The stride is kept at s = 1.
Other training parameters are: N_b = 100, η = 0.01, N_f = 3, and T_L = 500. (b) Filter weights F^β_3 as a function of F^β_2; the fitted line has m = 0.85 ± 0.02 and c = 0.03 ± 0.03. Training parameters are the same as in (a).

This feature map is an equal-weight linear combination of the nearest-neighbour and next-nearest-neighbour level spacings. The filter weights learned by the CNN suggest that the neural network, when given the freedom to choose, tends to use both nearest-neighbour and next-nearest-neighbour level spacings with roughly equal weight to identify the energy spectra of the two phases across the MBL transition. The improved performance confirms that the combination is more effective than using the nearest-neighbour level statistics alone. Therefore, we turn to the analysis of the next-nearest-neighbour level statistics in the next subsection and try to understand the machine learning results with kernel width 3.

We note that the training results scatter around the linear regression lines with noticeable deviations. This results from the fact that the neural network has numerous parameters or weights, and, therefore, finding the globally optimal, hence reproducible, parameters is almost impossible. However, the goal of our study is not the precision of the parameters, but the numerical trend, which allows us to propose alternative level spacings that may outperform the conventional nearest-neighbour level spacings, at least in finite systems.

Before we try to understand the machine learning results, we discuss the distribution of the next-nearest-neighbour level spacings, which we denote by P_2(s). As for the nearest-neighbour level statistics, we demand

\int_0^{\infty} P_2(s)\, ds = \int_0^{\infty} s P_2(s)\, ds = 1,  (16)

which can be achieved by normalizing the level spacings by their mean.

In the MBL phase neighbouring eigenenergies likely correspond to two wave functions localized in different regions. Therefore, we can write the next-nearest-neighbour level spacing, E_{i+2} − E_i = (E_{i+2} − E_{i+1}) + (E_{i+1} − E_i), as the sum of two independent nearest-neighbour level spacings, each of which satisfies the Poisson distribution P(s) = \exp(-s). Therefore we have, for the unnormalized level spacing s′,

\tilde{P}_2(s') \propto \int_0^{s'} P(s'-s) P(s)\, ds = s' e^{-s'}.  (17)

Normalizing the distribution according to Eq. (16), we obtain

P_2(s) = 4s \exp(-2s),  (18)

which turns out to be a semi-Poisson distribution. Compared to the nearest-neighbour level statistics, the most noticeable difference is that now P_2(0) = 0. This is not a manifestation of level repulsion as in the thermal phase; rather, it simply states that three consecutive levels do not coincide.

In the thermal phase neighbouring levels are correlated. In random matrix theory the joint probability density function for the eigenvalues is [38]

P(\{E_i\}) \propto \prod_{i<j} |E_i - E_j|^{\beta} \exp\Big(-A \sum_i E_i^2\Big),  (19)

with Dyson index β = 1 for the GOE; integrating out the intermediate level yields the next-nearest-neighbour spacing distribution in the thermal phase (see Appendix A).

Figure 5: Evolution of the next-nearest level spacing distribution, from GOE to GSE at h = 1 and from Poisson to semi-Poisson at h = 5. The black dashed line corresponds to the distribution in Eq. (21).

The phase boundary can be determined by the percentage recognition of the two phases. In this case, our CNN study in the previous subsection reveals roughly equal weights on the two level spacings, which improves the recognition accuracy significantly.
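As a numerical companion to Eqs. (16)-(18), the sketch below (our illustration, not the authors' code) histograms k-th-neighbour spacings normalized to unit mean and defines the Poisson, semi-Poisson, and GOE reference curves; in the fully uncorrelated limit the k = 2 histogram should approach Eq. (18):

```python
import numpy as np

def spacing_histogram(spectra, order=1, bins=40, smax=4.0):
    """Histogram of k-th-neighbour spacings E_{i+k} - E_i, scaled to unit mean
    so that both conditions of Eq. (16) are met."""
    s = np.concatenate([np.sort(E)[order:] - np.sort(E)[:-order] for E in spectra])
    s /= s.mean()
    hist, edges = np.histogram(s, bins=bins, range=(0.0, smax), density=True)
    return 0.5 * (edges[1:] + edges[:-1]), hist

# reference curves, e.g. for plotting against the h = 1 and h = 5 data
s = np.linspace(0.0, 4.0, 200)
poisson = np.exp(-s)                                  # nearest neighbour, deep MBL
semi_poisson = 4.0 * s * np.exp(-2.0 * s)             # Eq. (18), next nearest, MBL
goe = 0.5 * np.pi * s * np.exp(-0.25 * np.pi * s**2)  # Wigner surmise, thermal

# sanity check in the fully uncorrelated limit: a Poisson "spectrum" built from
# i.i.d. exponential gaps reproduces Eq. (18) for order = 2
rng = np.random.default_rng(1)
fake = [np.cumsum(rng.exponential(size=200)) for _ in range(200)]
centers, hist = spacing_histogram(fake, order=2)
```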
We speculate that while the residual level repulsion due to the finite-size effect in the MBL phase favours the next-nearest-neighbour level statistics, the fluctuations in the level spacings tend to blur the difference between the similar peak structures in the two phases. What we observe is therefore likely a compromise between the two aspects of the finite-size effect. Finally, it is worth emphasizing that the filter size can be further increased to incorporate level spacings of longer range, and that the higher-order level spacings are also efficient for distinguishing the phases. Meanwhile, the results for filter sizes of two and three are sufficient to express the main point of the current work.

4. Conclusion

We deploy a CNN to study the thermal-MBL transition in a one-dimensional random spin system, using the raw energy spectrum as the training data. Our aim is to reveal the key feature that the neural network extracts to classify the phases. The simplest CNN, which contains a 2 × 1 kernel, classifies the raw spectra almost perfectly, but it does so by exploiting the bandwidth, which grows monotonically with the disorder strength; the accuracy is reduced to around 70% when the training data are the normalized energy spectra, for which the network selects the level spacing instead. This demonstrates that an NN without a feature extractor is less efficient in learning non-trivial physics, which motivated us to employ a CNN. Finally, we are able to show that machine learning can "discover" a less-known physical quantity, in this case the higher-order level spacings, for both the thermal and MBL phases. Therefore, the present work provides a vivid example of how one may use neural networks to develop and improve methods from low-level data in disordered systems.

In general, our approach can be applied to study dynamical phase transitions in any model that has an energy or entanglement spectrum. For example, in a quantum system with periodic driving, where a quasi-energy spectrum replaces the conventional energy spectrum, we believe the CNN can likewise capture the dynamical signal of the phase transition through the filters. In addition, by selecting different parts of the energy spectrum as training data, the CNN can also be used to locate the many-body mobility edge.

We note that our discussion relies only on random matrix theory, rather than on the specific Hamiltonian. We expect the results to apply to other disordered and chaotic systems, in both ML and conventional studies.

Acknowledgement

We acknowledge the support by the National Natural Science Foundation of China through Grants No. 11904069, No. 11847005 and No. 11674282, and the Strategic Priority Research Program of the Chinese Academy of Sciences through Grant No. XDB28000000.

Appendix A. Derivation of Eq. (20)

In this appendix we give an analytical derivation of the next-nearest level spacing distribution in the thermal phase. We start with the standard (unnormalized) energy level probability density for the Gaussian ensembles [38],

P(\{E_i\}) \propto \prod_{i<j} |E_i - E_j|^{\beta} \exp\Big(-A \sum_i E_i^2\Big).  (A.1)
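For completeness, we sketch the standard three-level calculation implied by Eq. (A.1); this is our reconstruction with Dyson index β = 1, not necessarily the authors' exact steps. For three consecutive levels E_1 < E_2 < E_3 at fixed s = E_3 − E_1, the repulsion factor is |E_2 − E_1| |E_3 − E_2| |E_3 − E_1|, and integrating out the middle level gives

\int_0^{s} x\,(s - x)\, dx = \frac{s^3}{6}
\quad\Longrightarrow\quad
P_2(s) \propto s \cdot \frac{s^3}{6}\, e^{-a s^2} \propto s^4 e^{-a s^2},

where the prefactor s comes from |E_3 − E_1|, the integral from the two factors involving the middle level, and the Gaussian weight from \exp(-A \sum_i E_i^2) after integrating out the centre of mass. Fixing a and the overall constant by the two normalization conditions in Eq. (16) gives the Wigner surmise of the GSE,

P_2(s) = \frac{2^{18}}{3^6 \pi^3}\, s^4 \exp\Big(-\frac{64}{9\pi} s^2\Big),

i.e. the next-nearest-neighbour spacings of the GOE follow GSE statistics, consistent with the dashed line in Fig. 5.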