Efficient Learning of a One-dimensional Density Functional Theory
M. Michael Denner, Mark H. Fischer, and Titus Neupert
Department of Physics, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
(Dated: May 8, 2020)

Density functional theory underlies the most successful and widely used numerical methods for electronic structure prediction of solids. However, it has the fundamental shortcoming that the universal density functional is unknown. In addition, the computational result, the energy and charge density distribution of the ground state, is useful for electronic properties of solids mostly when reduced to a band structure interpretation based on the Kohn-Sham approach. Here, we demonstrate how machine learning algorithms can help to free density functional theory from these limitations. We study a theory of spinless fermions on a one-dimensional lattice. The density functional is implicitly represented by a neural network, which predicts, besides the ground-state energy and density distribution, density-density correlation functions. At no point do we require a band structure interpretation. The training data, obtained via exact diagonalization, feed into an active learning scheme which minimizes the computational costs for data generation. We show that the network results are of high quantitative accuracy and, despite learning on random potentials, capture both symmetry-breaking and topological phase transitions correctly.
Introduction.—
Materials with strong electronic correlations host a variety of intriguing phenomena and quantum phases. Modeling and understanding these systems are among the greatest challenges in theoretical condensed matter physics. For quantitative and predictive results, numerical calculations are indispensable. The most widely and successfully used numerical approach to the electronic structure problem is based on density functional theory (DFT). In condensed matter physics, DFT is often linked to band structure calculations, while it is in principle much more powerful than that. The Hohenberg-Kohn theorems guarantee that a (potentially correlated) many-body ground state is uniquely determined by its energy and charge density distribution [1]. However, for practical implementations and a physical interpretation of calculated results, the Kohn-Sham ansatz is commonly used, producing the band structure of a different, non-interacting system with the same energy and density [2]. The implicit assumption is that this band structure captures the essential physics of the original system, at least if correlations are weak enough.

A critical shortcoming of DFT is that its eponymous functional is not known; instead, approximations of various levels of complexity are commonly employed [3]. It is important to emphasize that most of the functional is universal, representing the many-particle Schrödinger equation. The only non-universal input in a DFT calculation for a crystal is the potential landscape within the unit cell induced by the ions and core electrons, as well as the particle number, both of which do not affect the universal part of the functional.

The recent rise of machine-learning methods used to model physical systems has sparked hopes of using these methods to improve DFT calculations [4]. The approaches interject the DFT workflow at various stages, ranging from improving the Kohn-Sham scheme by representing the exchange-correlation functionals [5-10] to approximating the unknown energy functional and its derivatives [11-19]. More recent works bypass the Kohn-Sham solution scheme by directly learning the mapping between material parameters and ground-state properties [20-31], or by constructing the ground-state wavefunction from a corresponding density distribution [32]. Despite these advancements, previous approaches are limited by complicated, non-scalable networks, suffer from inefficient training-data generation, or struggle in applications to the different physical phases of the models used.

In this work, we take an approach that follows three guiding principles: (i) Implicit knowledge representation is the key strength of neural networks. Therefore, we use a neural network to implicitly represent the (minimized) DFT functional. (ii) We aim at solving for phases of quantum matter beyond the band structure paradigm. To that end, we train the neural network to directly output correlation functions [32], which can be used to characterize phases and phase transitions. (iii) Training the neural network is the key challenge, as data (theoretical or experimental) is precious. We set up an algorithm in which the neural network can be trained with data from different system sizes. This is the basis for an active learning scheme, via which the neural network requests costly training data for larger systems only in situations where it detects large finite-size effects.

Figures 1 (a) and (b) summarize our model, neural network, and workflow. We choose to work with a one-dimensional lattice model of spinless fermions.
The hopping and interaction terms of the model define our 'universal Schrödinger equation' and are therefore left unaltered throughout the study. Input to the neural network is the problem-specific potential and particle number. Its output is the ground state energy E_GS as well as the density-density correlation function. We start by introducing the active learning scheme and demonstrate the quantitative accuracy of network predictions after training on random potentials. The active training shows superior performance compared to conventional training, reaching small mean squared errors of the energy in units of the hopping integral. Finally, we apply the trained network to a topological and a symmetry-breaking phase transition. Our results demonstrate a scalable architecture, able to capture interacting lattice models, with successful applications to structured phases.

FIG. 1. (a) Schematic of the dense neural network used to learn the map from unit-cell potentials and filling to ground-state energy and density-density correlators. (b) Active learning checks the energy deviations between different system sizes, continuing to larger systems only if necessary. (c) Exact versus predicted ground-state energies on the test data set of the actively learning network (ALN). The inset shows the absolute error on the test-energy values as a density histogram n. (d) Mean absolute error per correlator entry on the test data set for the ALN. (e) Exact versus predicted ground-state energies for the network trained on a single system size (PLN), with the inset showing the density n of absolute errors on the test data set. (f) Mean absolute error per correlator entry on the test data set for the PLN.
Model.— While density functional theory was originally formulated as a continuum theory, it has also been successfully applied to lattice models [33]. We consider a Hamiltonian for spinless fermions on a one-dimensional lattice with sites labelled by i = 1, ..., L under periodic boundary conditions,

\hat{H} = -t \sum_i \left( \hat{c}^{\dagger}_{i} \hat{c}_{i+1} + \mathrm{h.c.} \right) + U_1 \sum_i \hat{n}_i \hat{n}_{i+1} + U_2 \sum_i \hat{n}_i \hat{n}_{i+2} + \sum_i V_i \hat{n}_i,    (1)

where \hat{c}^{\dagger}_{i} and \hat{c}_{i} are the fermion creation and annihilation operators on site i and \hat{n}_i = \hat{c}^{\dagger}_{i} \hat{c}_{i} is the corresponding density operator. Nearest-neighbor hopping is parametrized by t, which will serve as the energy unit throughout. The particles are subject to a repulsive interaction on nearest- and next-nearest-neighbor sites, which we fix to U_1 = 1 and a weaker U_2 < 1 throughout; for this choice, the interactions alone do not order the system at vanishing V_i, even at half filling [34].

Motivated by the Hohenberg-Kohn theorems, we consider the kinetic term and the electron-electron interactions as universal, such that the external or ionic potential \hat{V}_{\mathrm{ext}} = \sum_i V_i \hat{n}_i together with the filling uniquely determines the ground state and all of its properties. We only consider potentials with a periodicity of four sites. That is, the four values V_i, i = 1, ..., 4, completely specify the Hamiltonian for any lattice size L = 4 N_uc, with N_uc the number of unit cells and V_i = V_{i+4} for all i. This four-site unit cell can be thought of as the discretized unit cell of a periodic crystal, with V_i the ionic potential in this analogy. We restrict the potential to the range V_i ∈ [−4, 4] and the filling to 0 < n_e < 4, where n_e denotes the number of electrons per unit cell.
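To make the data-generation step concrete, the following is a minimal exact-diagonalization sketch for the Hamiltonian of Eq. (1). It is an illustration under stated assumptions rather than the code used in the paper: the function names are ours, and the default value of U2 in the signature is merely an illustrative stand-in for the fixed value quoted above. The correlator function computes the quantities the network is later trained to predict.

```python
# Minimal exact-diagonalization sketch for Eq. (1): spinless fermions on an
# L-site ring with a four-site periodic potential V_cell. Illustrative only;
# U2's default is a stand-in for the fixed value used in the paper.
import numpy as np
from itertools import combinations
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import eigsh

def ground_state(L, n_particles, V_cell, t=1.0, U1=1.0, U2=0.5):
    """Ground-state energy and vector in the fixed-particle-number sector."""
    V = np.array([V_cell[i % 4] for i in range(L)])
    basis = list(combinations(range(L), n_particles))  # occupied-site tuples
    index = {state: k for k, state in enumerate(basis)}
    H = lil_matrix((len(basis), len(basis)))
    for k, state in enumerate(basis):
        occ = set(state)
        # diagonal part: ionic potential plus density-density interactions
        H[k, k] = (sum(V[i] for i in occ)
                   + U1 * sum((i + 1) % L in occ for i in occ)
                   + U2 * sum((i + 2) % L in occ for i in occ))
        # off-diagonal part: nearest-neighbor hopping with fermionic signs
        for i in state:
            for j in ((i + 1) % L, (i - 1) % L):
                if j in occ:
                    continue
                new = tuple(sorted(occ - {i} | {j}))
                lo, hi = min(i, j), max(i, j)
                string = sum(lo < s < hi for s in occ)  # Jordan-Wigner string
                H[index[new], k] += -t * (-1) ** string
    energy, vec = eigsh(H.tocsr(), k=1, which="SA")  # lowest eigenpair
    return energy[0], vec[:, 0], basis

def correlator(vec, basis, i, j):
    """Density-density correlator <n_i n_j> in the computed ground state."""
    return sum(abs(a) ** 2 for a, s in zip(vec, basis) if i in s and j in s)
```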
Learning.— The supervised-machine-learning algorithm we use bypasses the Kohn-Sham scheme by directly learning the map from the external parameters n_e and V_i to the corresponding ground-state energy and density-density correlators ⟨n̂_i n̂_j⟩_GS. The density-density correlators are calculated for two adjacent unit cells [35]. The chosen fully connected neural network [36] consists of four hidden layers which increase in size towards the output, as depicted in Fig. 1 (a) [37].

A central challenge in machine learning is unbiased and efficient data generation; one usually deals with limited computational or experimental resources. Here, we generate data by finite-size exact diagonalization (ED) of systems with randomly chosen n_e and V_i. To reduce finite-size effects, these examples should naively be generated with as large systems as possible. However, the computational cost of data generation with ED grows exponentially with system size. For this reason, we employ an active learning procedure, performing costly large-system ED, as depicted in Fig. 1 (b), only if necessary. Using random values for n_e and V_i, the scheme iteratively computes larger systems until the finite-size deviation between ground-state energies lies below a previously chosen threshold θ. Correspondingly, the fast computation of smaller systems is used as often as possible, while more accurate data is provided in critical cases. The lower significance of inaccurate samples is accounted for by a sample weight [37]. The samples are further augmented by applying translations within the unit cell and inversion, allowing the network to capture the symmetries of the universal part of the Hamiltonian.
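The query logic of this scheme can be summarized in a few lines. The following is a sketch, not the paper's implementation: ed_energy is a placeholder for an exact-diagonalization solver returning the ground-state energy per unit cell, and the size limits N_uc = 3, ..., 6 follow the supplementary material [37].

```python
# Sketch of the active-learning query: grow the system one unit cell at a
# time until the ground-state energies (per unit cell) of two consecutive
# sizes agree within the threshold theta. ed_energy(V_cell, n_e, n_uc) is a
# placeholder for an ED solver such as the sketch above.
def active_sample(ed_energy, V_cell, n_e, theta, n_uc_min=3, n_uc_max=6):
    e_prev = ed_energy(V_cell, n_e, n_uc_min)
    for n_uc in range(n_uc_min + 1, n_uc_max + 1):
        e_next = ed_energy(V_cell, n_e, n_uc)
        if abs(e_next - e_prev) < theta:   # finite-size deviation acceptable
            return n_uc, e_next            # no costlier ED required
        e_prev = e_next
    return n_uc_max, e_prev                # largest affordable system reached
```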
We contrast the active learning approach outlined above with a passive learning scheme using training data generated for systems of fixed size N_uc = 5, with filling n_e and on-site potentials V_i chosen randomly. This system size is still efficiently solvable by ED, while sufficiently reducing finite-size effects. Both learning procedures were run with an equal time budget to ensure comparability. A mean absolute error loss function is then optimized to obtain the weights and biases of the actively (ALN) and passively (PLN) learning neural networks. The resulting performance is evaluated on unseen data, consisting of 20% of the full data set. Overfitting was avoided for both networks by suitable hyperparameter choices [37].

The absence of significant deviations in the energy correlation plot in Fig. 1 (c) shows that the ALN performs well on random data, with a sharply peaked distribution of absolute errors (inset). Similarly, Fig. 1 (d) shows only small errors in the correlator prediction. The PLN performs worse in predicting energy values and correlators, as shown in the inset of Fig. 1 (e) and in Fig. 1 (f), with errors at least twice as large as those of the ALN. This comparison shows the advantage of intelligent data generation at a fixed computational time budget.

Random potentials rarely represent a relevant physical scenario. We therefore move on to investigate how the randomly trained ALN and PLN perform for structured systems, where the V_i obey further symmetries.

FIG. 2. Neural network results for a transition between different atomic-limit insulators. (a) Compressibility κ for various potential strengths at quarter filling, as calculated from the actively and passively learned neural networks, exact diagonalization (ED), and the density matrix renormalization group (DMRG) for several system sizes. (b) Schematic depiction of the potential in the four-site unit cell: depending on the strength and sign, an obstructed atomic limit, a metallic, and an atomic-limit phase can be realized. (c) Corresponding observable C as calculated from the 8×8 density-density correlator for the same numerical methods as used for the compressibility. The insets show the correlator as obtained from the ALN in the atomic (lower left) and obstructed atomic limit (upper right), with the unit cell depicted in red.
Learnability of obstructed atomic limits.— We consider the model introduced in Eq. (1) for a potential choice (V_1, V_2, V_3, V_4) = (0, V, V, 0) at quarter filling (n_e = 1). The system is metallic for V = 0, which separates two distinct insulating phases for V > 0 and V < 0 [Fig. 2 (b)]. For V > 0, the single electron of each unit cell binds to the pair formed by site 4 and site 1 of the adjacent cell, placing its Wannier center at the unit-cell boundary and realizing an obstructed atomic limit; for V < 0, it binds to the pair of sites 2 and 3 inside the cell, realizing the trivial atomic limit. To distinguish the metal from the two insulators, we consider the electronic compressibility

\kappa = \frac{1}{n_e} \left( \frac{\partial^2 E_{\mathrm{GS}}(n_e)}{\partial n_e^2} \right)^{-1},    (2)

where n_e is the electron filling and E_GS(n_e) the corresponding ground state energy. Figure 2 (a) shows that, as one approaches the critical metallic state around V = 0, κ increases rapidly. We emphasize that since κ involves the second derivative of the energy, it is extremely susceptible to errors. Note, further, that the ED data show a strong even-odd effect in N_uc. We also calculated κ(V) with the density matrix renormalization group algorithm (DMRG) [38][39] for N_uc = 28. Compared with these exact results, the ALN produces a meaningful κ(V) with a peak value κ(V = 0) close to the DMRG result. On the contrary, the PLN does worse, with a less pronounced and non-symmetric peak.
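On the network side, Eq. (2) can be evaluated by a central second difference of the predicted energy. The following is a minimal sketch, assuming a callable predict_energy(V_cell, n_e) that wraps the trained model; the step dn is an arbitrary choice of ours.

```python
# Sketch: compressibility of Eq. (2) via a central second difference of the
# network-predicted ground-state energy. predict_energy is a placeholder
# wrapping the trained model; dn is an arbitrary finite-difference step.
def compressibility(predict_energy, V_cell, n_e, dn=0.05):
    d2E = (predict_energy(V_cell, n_e + dn)
           - 2.0 * predict_energy(V_cell, n_e)
           + predict_energy(V_cell, n_e - dn)) / dn**2
    return 1.0 / (n_e * d2E)  # kappa = (1/n) (d^2 E_GS / d n^2)^(-1)
```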
The location of the Wannier centers can be used to differentiate between the two insulating phases. To this end, we define an observable C as the difference between the density-density correlations of the neighboring-site pair inside the unit cell and of the corresponding pair across the unit-cell boundary [40]. The trivial atomic limit with localization inside the unit cell is obtained for C > 0, the phase transition happens at C = 0, and the non-trivial (obstructed) atomic limit has C < 0.
Figure 2 (c) highlights that both networks are able to capture C across the transition well, but the ALN results are markedly more accurate than the PLN results when compared with the ED and DMRG data. This supports the statement that active learning delivers quantitatively better results.
Learnability of spontaneously symmetry-broken phases.— Spontaneous breaking of translation symmetry can be triggered by introducing a potential of the form (V_1, V_2, V_3, V_4) = (−V, V, −V, V) at quarter filling (n_e = 1). This phase arises from the competition between the next-nearest-neighbor interaction U_2 and the increasing potential V, and it spontaneously breaks the four-site translation symmetry at V_c ≈ 1.8 [37]. To probe the transition, we study the dependence of E_GS on n_e around quarter filling [Fig. 3 (a)]. With increasing V, E_GS(n_e) develops a kink at n_e = 1, signalling the emergence of the symmetry-broken charge-density wave (CDW) insulator [Fig. 3 (b)]. Both neural networks represent the different phases very well; the deviations at small fillings are attributed to the limited number of training samples in this limit.

Figure 3 (c) shows that the correlations in the metallic phase are short-ranged and fast-decaying, whereas the symmetry-broken phase possesses a distinct order. The corresponding order parameter C_SSB = \frac{1}{N_{\mathrm{uc}}} \big[ \sum_i (-1)^i \langle \hat{n}_{2i+1} \rangle \big] is, however, zero, since the two degenerate ground states have opposite imbalance in electron density between the first and third site of each unit cell. Instead, the order can be diagnosed by computing the square of the order parameter from the density-density correlation functions. Its nonzero value is implied by the inequivalence between off-diagonal terms in the density-density correlator, highlighted by the red and green squares in Fig. 3 (c). This behaviour is well captured by the ALN, producing quantitatively accurate correlations in both phases.

FIG. 3. Neural network results for a spontaneous symmetry breaking phase. (a) Ground state energy at V = 0 for several electron fillings n_e, as calculated from the actively and passively learned neural networks and ED. The inset displays the non-interacting band structure. (b) Ground state energy as a function of the electron filling n_e in the symmetry-broken phase (V = 4). The kink at n_e = 1 signals an interaction-induced incompressible phase. The inset reveals the band flattening of the non-interacting system. (c) Phase diagram, schematic of the potential, and density-density correlation functions. The latter are obtained from the ALN for V = 0 (left) and V = 4 (right). Off-diagonal terms, whose inequivalence signals the symmetry-broken phase (right), are highlighted in red and green.
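This diagnostic takes only a few lines. The sketch below assumes the 8×8 correlator block corr[i, j] = ⟨n̂_i n̂_j⟩ for two adjacent unit cells as predicted by the network; the sign pattern implements the staggered order parameter C_SSB above, and the normalization is our own convention.

```python
# Sketch: square of the CDW order parameter from the predicted 8x8
# density-density correlator (two adjacent unit cells). The staggered sign
# pattern picks +n_1 and -n_3 in each unit cell; normalization is ours.
import numpy as np

def cdw_order_squared(corr):                       # corr: 8x8 numpy array
    signs = np.array([1, 0, -1, 0, 1, 0, -1, 0])   # (-1)^i on odd sites
    return float(signs @ corr @ signs) / 4.0       # 1/N_uc^2 with N_uc = 2
```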
Conclusion.— We presented a supervised learning approach for lattice DFT, bypassing the Kohn-Sham solution scheme. Employing an active learning procedure allowed us to improve our results at fixed computational cost for data generation. Focussing on correlation functions on a subsystem, and taking only the potential landscape in the unit cell and the particle number as input, results in a scalable architecture. Besides verification of our algorithm on unseen random potentials, we demonstrated that the trained networks reliably solve for different structured phases.

Looking ahead, it is highly desirable to construct similar implicit (neural network) representations of DFT for systems in continuous space and higher dimensions, in particular to attack the electronic structure problem in strongly correlated regimes. The main challenge is the generation of valid and balanced data sets, and the incorporation of data from various sources, including conventional DFT, Monte Carlo calculations, experiments, and future quantum simulation devices. Two concepts on which our study relies, (1) the focus on correlation functions instead of quantum states and (2) the use of active learning, should prove useful in this future venture.

We thank Giuseppe Carleo and Xi Dai for insightful discussions. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (ERC-StG-Neupert-757867-PARATOP).

[1] P. Hohenberg and W. Kohn, Physical Review 136, B864 (1964).
[2] W. Kohn and L. J. Sham, Physical Review 140, A1133 (1965).
[3] P. Mori-Sánchez, A. J. Cohen, and W. Yang, Physical Review Letters 100, 146401 (2008).
[4] G. R. Schleder, A. C. M. Padilha, C. M. Acosta, M. Costa, and A. Fazzio, Journal of Physics: Materials 2, 032001 (2019).
[5] K. T. Lundgaard, J. Wellendorff, J. Voss, K. W. Jacobsen, and T. Bligaard, Physical Review B, 235162 (2016).
[6] B. Kolb, L. C. Lentz, and A. M. Kolpak, Scientific Reports 7, 1192 (2017).
[7] Q. Liu, J. Wang, P. Du, L. Hu, X. Zheng, and G. Chen, The Journal of Physical Chemistry A, 7273 (2017).
[8] R. Nagai, R. Akashi, S. Sasaki, and S. Tsuneyuki, The Journal of Chemical Physics, 241737 (2018).
[9] S. Dick and M. Fernandez-Serra, Machine Learning a Highly Accurate Exchange and Correlation Functional of the Electronic Density, preprint (2019).
[10] J. Schmidt, C. L. Benavides-Riveros, and M. A. L. Marques, The Journal of Physical Chemistry Letters, 6425 (2019), arXiv:1908.06198.
[11] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Physical Review Letters 108, 253002 (2012).
[12] J. C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller, and K. Burke, The Journal of Chemical Physics, 224104 (2013).
[13] L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller, and K. Burke, International Journal of Quantum Chemistry, 819 (2016).
[14] L. Li, T. E. Baker, S. R. White, and K. Burke, Physical Review B 94, 245129 (2016).
[15] K. Yao and J. Parkhill, Journal of Chemical Theory and Computation, 1139 (2016).
[16] J. Seino, R. Kageyama, M. Fujinami, Y. Ikabata, and H. Nakai, The Journal of Chemical Physics, 241705 (2018).
[17] J. Nelson, R. Tiwari, and S. Sanvito, Physical Review B 99, 075132 (2019).
[18] P. Golub and S. Manzhos, Physical Chemistry Chemical Physics, 378 (2019).
[19] T. Nudejima, Y. Ikabata, J. Seino, T. Yoshikawa, and H. Nakai, The Journal of Chemical Physics, 024104 (2019).
[20] K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller, Journal of Chemical Theory and Computation 9, 3404 (2013).
[21] K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, Physical Review B 89, 205118 (2014).
[22] K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. Müller, and A. Tkatchenko, The Journal of Physical Chemistry Letters 6, 2326 (2015).
[23] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nature Communications 8, 13890 (2017).
[24] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, Nature Communications 8, 872 (2017).
[25] M. Bogojeski, F. Brockherde, L. Vogt-Maranto, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, "Efficient prediction of 3d electron densities using machine learning" (2018), arXiv:1811.06255 [physics.comp-ph].
[26] K. Ryczko, K. Mills, I. Luchak, C. Homenick, and I. Tamblyn, Computational Materials Science, 134 (2018).
[27] E. Schmidt, A. T. Fowler, J. A. Elliott, and P. D. Bristowe, Computational Materials Science, 250 (2018).
[28] S. Pilati and P. Pieri, Scientific Reports 9, 5613 (2019).
[29] C. A. Custodio, E. R. Filletti, and V. V. França, Scientific Reports 9, 1886 (2019), arXiv:1811.03774.
[30] K. Ryczko, D. A. Strubbe, and I. Tamblyn, Physical Review A 100, 022512 (2019).
[31] L. Zepeda-Núñez, Y. Chen, J. Zhang, W. Jia, L. Zhang, and L. Lin, "Deep Density: circumventing the Kohn-Sham equations via symmetry preserving neural networks" (2019), arXiv:1912.00775 [physics.comp-ph].
[32] J. R. Moreno, G. Carleo, and A. Georges, arXiv:1911.03580 (2019).
[33] K. Schönhammer, O. Gunnarsson, and R. M. Noack, Physical Review B 52, 2504 (1995).
[34] L. Markhof, B. Sbierski, V. Meden, and C. Karrasch, Physical Review B, 235126 (2018).
[35] In order to also use small systems (N_uc = 4) with periodic boundary conditions as part of the training data, the density-density correlator is restricted to at most two adjacent unit cells, which remain a valid representative of the bulk.
[36] F. Chollet et al., "Keras," https://keras.io (2015).
[37] For details about the implementation of the neural networks, the data generation and resulting training samples, the training, test, and validation performance, as well as a study of the transition point of the spontaneous-symmetry-breaking phase, see the supplementary material.
[38] Calculations were performed using the TeNPy library (version 0.5.0) [39], for a finite but periodic lattice with maximum bond dimension χ_max = 2000 and tight cutoffs on the truncation error ε and the energy convergence ΔE.
[39] J. Hauschild and F. Pollmann, SciPost Phys. Lect. Notes 5 (2018), 10.21468/SciPostPhysLectNotes.5; code available from https://github.com/tenpy/tenpy, arXiv:1805.00055.
[40] Note that we choose C to highlight that two neighboring sites are always occupied by just one electron. That distinguishes it from the band-insulator case at filling n_e = 2 and from a spontaneously translation-symmetry-broken phase at n_e = 1 (both of which have C = 0).

Supplementary Materials

M. Michael Denner, Mark H. Fischer, and Titus Neupert
Department of Physics, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
(Dated: May 8, 2020)

I. NETWORK PARAMETERS
The supervised-machine-learning algorithm we propose in this paper uses dense neural networks of identical architecture for both the active and the passive learning scheme. The layers are connected by Softplus(x) = ln(1 + e^x) activation functions, except for the output layer: the last layer accounts for the fact that correlator values and ground state energies have different ranges by employing a linear activation function. Table I indicates the relevant parameters used in this paper.

TABLE I: Relevant parameters used to create the neural networks in this paper.

Parameter         Value
Neurons layer 1   50
Activation 1      Softplus
Weight init. 1    lecun_uniform
Neurons layer 2   125
Activation 2      Softplus
Weight init. 2    lecun_uniform
Neurons layer 3   150
Activation 3      Softplus
Weight init. 3    lecun_uniform
Neurons layer 4   200
Activation 4      Softplus
Weight init. 4    lecun_uniform
Neurons layer 5   65
Activation 5      Linear
Weight init. 5    lecun_uniform
Optimizer         Adam
Batch size        100
Learning rate     0.001
Epochs            1500
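For concreteness, the architecture of Table I can be reproduced in Keras [36] as follows. This is a sketch under stated assumptions: the input dimension (the four potentials V_i plus the filling n_e) is our assumption about the encoding, while the 65 outputs correspond to the ground state energy plus the 8×8 correlator block.

```python
# Sketch of the dense network from Table I using Keras [36]. input_dim = 5
# (four V_i plus n_e) is an assumed encoding; output_dim = 65 is the energy
# plus the 64 entries of the 8x8 density-density correlator.
from tensorflow import keras
from tensorflow.keras import layers

def build_network(input_dim=5, output_dim=65):
    model = keras.Sequential(
        [keras.Input(shape=(input_dim,))]
        + [layers.Dense(n, activation="softplus",
                        kernel_initializer="lecun_uniform")
           for n in (50, 125, 150, 200)]       # four growing hidden layers
        + [layers.Dense(output_dim, activation="linear",
                        kernel_initializer="lecun_uniform")]
    )
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="mean_absolute_error")
    return model
```

Training would then call model.fit with the batch size and epoch count of Table I, together with the per-sample weights of Tab. III below; this wiring is our assumption, while the hyperparameters themselves are those of Table I.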
II. TRAINING DATA
The choice of training examples is crucial in a machine-learning setting, as data is precious. An intelligent data-generation procedure is advantageous if computational time is finite, removing the necessity to always calculate as large systems as possible. Naively, the latter is the way to go in order to avoid finite-size biases in the training examples. We contrast these two approaches as active and passive learning schemes.

The naive approach generates data with exact diagonalization of a five-unit-cell system, with electron fillings n_e and potentials V_i in the unit cell chosen randomly. The potentials are in the range |V_i| ≤ 4, and the filling can assume values in a bounded window between the empty and the completely filled lattice. The data set is augmented by translations V_i → V_{i+1} and by inversion within the unit cell. The resulting dataset consists of 12,020 pairs of fillings and external potentials, mapped to the corresponding ground state energies and density-density correlators. The split into training, validation, and test sets is given in Tab. II.

TABLE II: Data for the passively trained neural network.

Parameter                           Value
Total number of samples             12020
Number of training samples          7212
Number of validation samples        2404
Number of test samples              2404
Number of samples from 20-site ED   12020
Sample weight                       1

However, large-system diagonalization is not always necessary to obtain accurate data. This observation is the basis for the active learning scheme, which performs the diagonalization of large systems only if strong finite-size effects are detected. The input to this procedure is a random choice of electron fillings n_e and potentials V_i in the unit cell, drawn from the same ranges as above. If the ground-state energies of an N_uc = 3 and an N_uc = 4 unit-cell system, calculated with exact diagonalization, deviate by more than a previously chosen threshold θ, a system with one additional unit cell is calculated. This procedure is repeated up to N_uc = 6 unit cells if necessary. The neural network can thus query a larger system whenever the deviation of ground-state energies exceeds the chosen threshold θ, thereby reducing finite-size effects. A sample weight is used to capture the lower importance of small systems whenever finite-size effects are detected. Data augmentation is again used, not only to enlarge the dataset to 74,500 input-output data pairs (see Tab. III), but also to allow the network to capture the underlying symmetries.

TABLE III: Data for the actively trained neural network.

Parameter                           Value
Total number of samples             74500
Number of training samples          44700
Number of validation samples        14900
Number of test samples              14900
Number of samples from 16-site ED   68850
Number of samples from 20-site ED   5500
Number of samples from 24-site ED   150
Sample weight, 16-site data         1 (a)
Sample weight, 20-site data         2 (b)
Sample weight, 24-site data         3

(a) This weight is increased to 3 if no 20- or 24-site system had to be calculated for this sample.
(b) This weight is increased to 3 if no 24-site system had to be calculated for this sample.
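The augmentation and weighting rules above can be summarized as follows. This is an illustrative sketch with names of our choosing, not the paper's code; in practice the correlator targets must be relabeled consistently under each symmetry operation, which is omitted here.

```python
# Sketch of the data augmentation (translations within the four-site unit
# cell and inversion) and of the sample-weight rule of Table III.
def augmented_potentials(V_cell):
    """All potentials related to V_cell by translations and inversion."""
    variants = set()
    V = list(V_cell)
    for shift in range(4):
        shifted = tuple(V[shift:] + V[:shift])   # translation V_i -> V_{i+1}
        variants.add(shifted)
        variants.add(shifted[::-1])              # inversion within the cell
    return sorted(variants)

def sample_weight(n_uc, largest_n_uc_needed):
    """Weights of Table III: larger systems count more; a sample at whose
    size the active loop already converged is promoted to full weight."""
    base = {4: 1, 5: 2, 6: 3}                    # 16-, 20-, 24-site data
    return 3 if n_uc == largest_n_uc_needed else base[n_uc]
```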
The choice of the threshold θ is a crucial parameter of the active learning scheme. We therefore investigate the implications of different thresholds (see Tab. IV) and test the performance of the trained models on unseen examples. These examples are not part of the training set and were generated with random fillings n_e and potentials V_i in the range |V_i| ≤ 4 for systems of N_uc = 5 unit cells.

TABLE IV: Number of systems with N_uc unit cells entering the training data for six increasing thresholds θ_1 < ... < θ_6 (the largest being θ_6 = 1.5), calculated with exact diagonalization and an equal time budget for each parameter choice.

N_uc    θ_1     θ_2     θ_3     θ_4     θ_5     θ_6
4       1350    4200    13500   18850   81075   83250
5       1335    575     1030    790     850     0
6       245     10      30      15      20      0
The resulting mean absolute error is presented in Fig. 1, indicating that the error decreases with increasing θ. When considering random potentials, it is therefore advantageous to use training data generated with as large a θ as possible, resulting in the largest possible dataset. For random potentials in the range |V_i| ≤ 4, however, finite-size deviations are mostly negligible, so that good performance on such data does not necessarily correspond to a good elimination of finite-size effects for a realistic physical potential. Figure 1 highlights this with the evaluation of a dataset with potentials |V_i| ≤ 1, for which the minimum of the obtained mean absolute error shifts towards smaller values of θ.

Consequently, a trade-off is needed between the total number of training samples and the number of samples with N_uc = 5, 6. The best compromise is reached for an intermediate value of θ.

FIG. 1: Mean absolute error of neural network predictions for various datasets, trained on examples generated with different thresholds θ as listed in Tab. IV. Random potentials were used to generate the evaluation data, for N_uc = 5 with |V_i| ≤ 4, for N_uc = 6 with |V_i| ≤ 4, and for N_uc = 5 with |V_i| ≤ 1.

III. TRAINING PERFORMANCE
The actively and passively constructed datasets are split into training, validation, and test sets. Achieving consistent performance on the optimized and on unseen data indicates that the network has not been overfitted. The corresponding results for both training schemes are presented in Tabs. Va and Vb, highlighting that the intended mapping from electron fillings and potentials to ground state energy and density-density correlator is well captured.

TABLE V: Performance of the passively (a) and actively (b) trained neural networks: mean absolute and mean squared errors on E_GS and on the correlators ⟨n̂_i n̂_j⟩_GS, evaluated separately on the training, validation, and test sets.
IV. DETERMINATION OF THE POINT OF SPONTANEOUS SYMMETRY BREAKING
The four-site translation symmetry of the considered extended Hubbard model can be spontaneously broken by the introduction of an external potential \hat{V}_{\mathrm{ext}} = \sum_i V_i \hat{n}_i. Choosing a potential of the form V_1 = −V_2 = V_3 = −V_4 = −V causes a competition between the next-nearest-neighbor repulsion U_2 and the potential V at quarter filling. As a result, a symmetry-breaking phase with two degenerate ground states emerges, corresponding to ordering the electrons either to site one or to site three in each unit cell.

We consider two approaches to identify the potential strength at which the spontaneous symmetry breaking occurs. For each of these, we investigate a finite-size scaling plot to derive the extrapolated point of the phase transition in the thermodynamic limit.

As stated above, the symmetry-broken phase possesses a distinct order in the density-density correlator, occupying only sites with negative potential. By adding a small potential on one site, e.g. V_1 = −V − δV with δV ≪ V, one of the degenerate ground states is favoured. Consequently, the occupation ⟨n̂_i⟩_GS will concentrate on the first lattice site of each unit cell. We therefore investigate the difference in occupation between sites one and three, C = ⟨n̂_1⟩_GS − ⟨n̂_3⟩_GS. In the metallic phase at vanishing potential V = 0, all sites are equally occupied, despite the small offset δV. This changes around the point of the phase transition, where explicitly one of the symmetry-breaking ground states is selected. The position of this jump in C can then be studied for various system sizes and extrapolated to the thermodynamic limit. Since exact diagonalization reaches computational boundaries already for quite small systems, we additionally employ the density-matrix renormalization group algorithm (DMRG) [1][2]. In order to ensure convergence also for large systems, open boundary conditions are considered; the influence of the boundary can be neglected when considering density-density correlations in the middle of the system. Figure 2(a) indicates the transition point as extrapolated from several system sizes. As a second criterion, we track differences of ground-state energies E_GS for several system sizes; the critical potential V_c corresponds to the point of gap opening. The corresponding results are presented in Fig. 2(c). Both approaches consistently extrapolate to V_c ≈ 1.8.
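A sketch of this finite-size analysis follows. Here, occupation_difference(V, L) is a placeholder for the ED or DMRG measurement of C = ⟨n̂_1⟩ − ⟨n̂_3⟩ at system size L (including the small offset δV), and the linear extrapolation in 1/L is our assumption for the scaling form.

```python
# Sketch: locate the jump in C = <n_1> - <n_3> on a grid of potentials for
# each system size, then extrapolate V_c(L) linearly in 1/L to the
# thermodynamic limit. occupation_difference is a placeholder measurement.
import numpy as np

def extrapolate_vc(occupation_difference, V_grid, sizes):
    vc_of_L = []
    for L in sizes:
        C = np.array([occupation_difference(V, L) for V in V_grid])
        k = int(np.argmax(np.abs(np.diff(C))))        # largest step = jump
        vc_of_L.append(0.5 * (V_grid[k] + V_grid[k + 1]))
    slope, intercept = np.polyfit(1.0 / np.asarray(sizes), vc_of_L, 1)
    return intercept                                  # V_c for 1/L -> 0
```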
Correspondingly, choosing the range of training potentials as V_i ∈ [−4, 4] ensures that the neural networks can capture the phase transition.

[1] Calculations were performed using the TeNPy Library (version 0.5.0) [2], for a finite lattice with maximum bond dimension χ_max = 1000, a maximum truncation error of ε = 10^{-10}, and a comparable energy convergence criterion ΔE.
[2] J. Hauschild and F. Pollmann, SciPost Phys. Lect. Notes 5 (2018), 10.21468/SciPostPhysLectNotes.5; code available from https://github.com/tenpy/tenpy, arXiv:1805.00055.

FIG. 2: (a) Finite-size scaling plot of the critical potential V_c, obtained with a small potential δV that breaks the ground-state degeneracy. Density matrix renormalization group (DMRG) calculations with open boundary conditions are combined with the exact diagonalization (ED) of a 16-site system. (b) Schematic of the potential landscape in the unit cell. (c) Finite-size scaling plot of the critical potential V_c obtained from the gap criterion.