Machine learning topological invariants of non-Hermitian systems
Ling-Feng Zhang, Ling-Zhi Tang, Zhi-Hao Huang, Guo-Qing Zhang, Wei Huang, Dan-Wei Zhang
¹Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, GPETR Center for Quantum Precision Measurement and SPTE, South China Normal University, Guangzhou 510006, China
²Frontier Research Institute for Physics, South China Normal University, Guangzhou 510006, China
(Dated: September 16, 2020)

The study of topological properties by machine learning approaches has attracted considerable interest recently. Here we propose machine learning the topological invariants that are unique to non-Hermitian systems. Specifically, we train neural networks to predict the winding of eigenvalues of three different non-Hermitian Hamiltonians on the complex energy plane with nearly 100% accuracy. Our demonstrations on the Hatano-Nelson model, the non-Hermitian Su-Schrieffer-Heeger model, and the generalized Aubry-André-Harper model show the capability of the neural networks in exploring topological invariants and the associated topological phase transitions and topological phase diagrams in non-Hermitian systems. Moreover, neural networks trained on a small data set in the phase diagram can successfully predict topological invariants in untouched phase regions. Thus, our work paves the way to revealing non-Hermitian topology with the machine learning toolbox.
I. INTRODUCTION
Machine learning, which lies at the core of artificial intelligence and data science, has recently achieved huge success, from industrial applications (especially in computer vision and natural language processing) to fundamental research in physics, cheminformatics, and biology [1–4]. In physics, machine learning has shown its usefulness in experimental data analysis [5–7] and in the classification of phases of matter [8–23]. Among these applications, one of the most interesting problems is to extract the global properties of topological phases of matter from local inputs, such as the topological invariants, which are intrinsically nonlocal. Recent works have shown that artificial neural networks can be trained to predict the topological invariants of band insulators with high accuracy [16, 17]. The advantage of this approach is that the neural network can capture global topology directly from local raw data inputs. Other theoretical proposals for using supervised or unsupervised learning to identify topological phases have been suggested [15, 18, 21–26]. Notably, the convolutional neural network (CNN) trained on raw experimental data has been demonstrated to identify topological phases [27, 28].

On the other hand, growing efforts have been invested in uncovering exotic topological states and phenomena in non-Hermitian systems in recent years [29–68]. The non-Hermiticity may come from gain and loss effects [36–40], non-reciprocal hopping [46, 47], or dissipation in open systems [29, 30]. Non-Hermiticity-induced topological phases have also been investigated in disordered [53–62] and interacting systems [63–68]. In non-Hermitian topological systems, there are not only topological properties defined by the eigenstates (such as topological Bloch bands), but also topological invariants relying solely on the eigenenergies.

∗ [email protected]
† [email protected]
For instance, the complex energy landscapes (and exceptional points) give rise to new topological invariants, which include the winding number (vorticity) defined solely in the complex energy plane [48–51]. This winding number, and several closely related winding numbers in the presence of other symmetries, lead to a richer topological classification than that of their Hermitian counterparts. In addition, it was revealed [69–71] that a nonzero winding number in the complex energy plane is the topological origin of the non-Hermitian skin effect [31–35]. Given that topological invariants in Hermitian systems have recently been studied with machine learning approaches [15–18, 21–26], machine learning this new kind of winding number in non-Hermitian systems is an urgent and meaningful research direction.

In this work, we adapt machine learning with artificial neural networks to predict non-Hermitian topological invariants and classify the topological phases in three non-Hermitian models. We first take the Hatano-Nelson model [46, 47] as a feasibility verification of the machine learning method in identifying non-Hermitian topological phases. We show that the trained CNN can predict the winding numbers with high accuracy even for phases that are not included in the training, whereas the fully connected neural network (FCNN) can only predict winding numbers in the trained phases. We interpolate the intermediate values of the CNN and find a strong relationship with the winding angle of the eigenenergies in the complex plane. We then use the CNN to study topological phase transitions in a non-Hermitian Su-Schrieffer-Heeger (SSH) model [72] with non-reciprocal hopping. We find that the CNN can precisely detect the transition points near the boundary of each phase even though it is trained only by the data in the deep phase region.
Finally, by using the CNN, we obtain the topological phase diagram of a non-Hermitian generalized Aubry-André-Harper (AAH) model [73–75] with non-reciprocal hopping and a complex quasiperiodic potential. The winding numbers evaluated from the CNN show an accuracy of more than 99% against theoretical values in the whole parameter space, even though the complex on-site potential is absent in the training process. Our work provides an efficient and general approach to reveal non-Hermitian topology based on machine learning.

The rest of this paper is organized as follows. We first study the winding number of the Hatano-Nelson model as a feasibility verification of our machine learning method in Sec. II. The different performances of the CNN and the FCNN are also discussed. Section III is devoted to revealing the topological phase transition in the non-Hermitian SSH model by the CNN. In Sec. IV, we show that the CNN can precisely predict the topological phase diagram of the non-Hermitian generalized AAH model. A short summary is finally presented in Sec. V.

II. LEARNING TOPOLOGICAL INVARIANTS IN HATANO-NELSON MODEL
Let us begin with the Hatano-Nelson model, which can be considered the simplest single-band non-Hermitian model. In a one-dimensional lattice of length L, it takes the Hamiltonian

H = Σ_{j=1}^{L} ( t_r ĉ†_{j+μ} ĉ_j + t_l ĉ†_j ĉ_{j+μ} + V_j ĉ†_j ĉ_j ),   (1)

where t_l ≠ t_r* are the amplitudes of the non-reciprocal hopping, ĉ†_j (ĉ_j) is the creation (annihilation) operator on the j-th lattice site, μ denotes the hopping range between two sites, and V_j is the on-site energy in the lattice. The original Hatano-Nelson model takes a disorder potential with random V_j and nearest-neighbour hopping with μ = 1, as shown in Fig. 1(a). Here we consider the clean case by setting V_j = 0 and take μ as a parameter in learning the topological phase transition with neural networks. Under the periodic boundary condition, the corresponding eigenenergies are given by

E(k) = H(k) = t_r e^{−iμk} + t_l e^{iμk},   (2)

where H(k) is the Hamiltonian in momentum space with the quasimomenta k = 0, 2π/L, 4π/L, ..., 2π.

Following Ref. [48], we can define the winding number in the complex energy plane as a topological invariant of the Hatano-Nelson model:

w = ∫₀^{2π} (dk/2πi) ∂_k ln det H(k) = ∫₀^{2π} (dk/2π) ∂_k arg E(k) = { μ, |t_r| < |t_l|;  −μ, |t_r| > |t_l| },   (3)

where arg denotes the principal value of the argument, belonging to [0, 2π).

FIG. 1. (Color online) (a) The Hatano-Nelson model with non-reciprocal hopping t_r, t_l between two nearest-neighbour sites (μ = 1). (b) The complex eigenenergy draws a closed loop around the base energy E_B = 0 as the quasimomentum k varies from 0 to 2π, giving rise to the winding number w = ±1.

For discretized E(k) with finite lattice size L, the complex-energy winding number reduces to

w = (1/2π) Σ_{n=1}^{L} Δθ(n) = (1/2π) Σ_{n=1}^{L} [θ(n) − θ(n−1)],   (4)

where θ(n) = arg E(2πn/L). Note that for Hermitian systems (t_r = t_l*), one has w = 0 due to the real energy spectrum with arg E(k) = 0, π. According to this definition, a nontrivial winding number in this model gives the number of times the complex eigenenergy encircles the base point E_B = 0, which is unique to non-Hermitian systems. The complex eigenenergy windings of the two cases with w = ±1 are shown in Fig. 1(b). Here μ controls the number of times the complex eigenenergy encircles the origin of the complex plane: when the loop winds around the origin μ times as k varies from 0 to 2π, the winding number is ±μ, where ± denotes counterclockwise and clockwise windings, respectively.

We now build a supervised task for learning the winding number given by Eq. (4) based on neural networks. First, we need labeled data sets for training and evaluation. Since the winding number is intrinsically nonlocal and characterized by the complex energy spectrum, we feed the neural networks with the normalized spectrum-dependent configurations d(n) = [d_R(n), d_I(n)] at L + 1 points discretized uniformly from 0 to 2π, where d_R(n) = Re[E(2πn/L)] and d_I(n) = Im[E(2πn/L)]. Therefore, the input data is an (L + 1) × 2 array

[ d_R(0)  d_R(2π/L)  d_R(4π/L)  ...  d_R(2π);  d_I(0)  d_I(2π/L)  d_I(4π/L)  ...  d_I(2π) ]^T,

with a period of 2π: d(n) = d(n + 2π). In the following, we set L = 32, which is large enough to take discrete energy spectra as the input data of neural networks. Labels are computed according to Eq. (4) for the corresponding configurations.
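As a concrete illustration (our own sketch, not code from the paper), the discretized winding number of Eq. (4) amounts to accumulating the wrapped phase increments of E(k) around the closed spectral loop; the parameter values below are arbitrary:

```python
import numpy as np

def winding_number(E):
    """Winding of a closed complex-spectrum loop around E_B = 0, Eq. (4):
    sum the phase increments Delta-theta(n), each wrapped into [-pi, pi)."""
    theta = np.angle(E)
    dtheta = np.diff(np.concatenate([theta, theta[:1]]))  # close the loop
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi       # wrap increments
    return dtheta.sum() / (2 * np.pi)

# Hatano-Nelson spectrum E(k) = t_r e^{-i mu k} + t_l e^{i mu k}, Eq. (2)
L, mu = 32, 2
k = 2 * np.pi * np.arange(L) / L
E = 0.5 * np.exp(-1j * mu * k) + 1.0 * np.exp(1j * mu * k)  # |t_r| < |t_l|
print(round(winding_number(E)))   # 2, i.e. w = +mu as in Eq. (3)
```

Swapping the amplitudes (t_r = 1.0, t_l = 0.5) reverses the orientation of the loop and yields w = −μ.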
FIG. 2. (Color online) Schematic of the machine learning workflow and the structure of the neural networks for the Hatano-Nelson (HN) model, the non-Hermitian SSH (NHSSH) model, and the non-Hermitian generalized AAH (NHGAAH) model. The input data are represented by an (L + 1) × 2 array for the CNN and a 2(L + 1)-dimensional vector for the FCNN, respectively. Here d_R and d_I denote the real and imaginary parts of the input data (complex eigenenergies), respectively.

The machine learning workflow is schematically shown in Fig. 2. For the Hatano-Nelson model with different μ, the output of the neural network is a real number w̃, and the predicted winding number is interpreted as the integer closest to w̃. We first train the neural networks with both the complex spectrum configurations and their corresponding true winding numbers. After the training, we feed only the complex-spectrum-dependent configurations to the neural networks and compare their predictions with the true winding numbers, from which we determine the percentage of correct predictions as the accuracy. In this case, we consider two classes of neural networks: the CNN and the FCNN. The neural networks are similar to those in Ref. [16] for calculating the winding number of the Bloch vectors in Hermitian topological bands.

The CNN has two convolution layers, with 32 kernels of size 2 × 2 and one kernel of size 1 × 1, followed by a fully connected layer of 2 neurons before the output layer.
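A quick cross-check of the 262 trainable parameters quoted in the text, under our reconstruction of the kernel sizes (2 × 2 then 1 × 1; the scan is ambiguous on this point):

```python
# Reconstructed CNN for the (L+1) x 2 input with L = 32:
# Conv(32 kernels, 2 x 2) -> Conv(1 kernel, 1 x 1 over 32 channels)
# -> flatten (L values) -> Dense(2) -> linear output.
L = 32
conv1 = 32 * (2 * 2) + 32     # 160: weights + biases of the first convolution
conv2 = 1 * 32 + 1            # 33: one 1 x 1 kernel across 32 channels
dense = L * 2 + 2             # 66: L activations -> 2 hidden neurons
out = 2 * 1 + 1               # 3: linear output layer
print(conv1 + conv2 + dense + out)   # 262
```

That this layer layout reproduces the stated parameter count is what motivates the reconstruction; the paper itself only gives the total.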
The total number of trainable parameters is 262. The FCNN has two hidden layers with 32 and 2 neurons, respectively, and 2213 trainable parameters in total. The architecture of the two classes of neural networks is shown in Fig. 2. All the hidden layers have rectified linear units f(x) = max(0, x) as activation functions, and the output layer has a linear activation function f(x) = x. The objective function to be optimized is the mean squared error

J = (1/N) Σ_{i=1}^{N} ( w̃_i − w_i )²,   (5)

where w̃_i and w_i are, respectively, the winding number of the i-th complex eigenenergies predicted by the neural networks and the true winding number, and N is the total number of configurations in the training data set. The training set consists of equal numbers (a 1:1:1 ratio) of configurations with winding numbers {±1, ±2, ±3}. The test set consists of configurations with winding numbers w ∈ {±1, ±2, ±3} that are not included in the training set, and w ∈ {±4, ±5} that are never seen by the neural networks during the training, with the same number of configurations for each winding number. The training details are given in Appendix A.

After training, we test with other configurations; the predicted winding numbers w̃ are shown in Fig. 3(a). Note that the networks tend to produce w̃ close to integers, and thus we take each final winding number as the integer closest to w̃. In Fig. 3(b), we plot the probability distribution of w̃ predicted by the CNN on the different test data sets. The test results of the two neural networks are presented in Table I, which shows the very high accuracy (more than 98%) of the CNN and the FCNN on test data sets with winding numbers w = {±1, ±2, ±3}. We find that the CNN performs generally better than the FCNN. Surprisingly, the CNN works well even in the cases of w = {±4, ±5}, which consist of configurations with larger winding numbers not seen by the neural networks during the training. On the contrary, the FCNN cannot predict the true winding numbers even though it has more trainable parameters. These results indicate that the convolutional layers respect the translation symmetry of the complex spectrum in momentum space and can extract the local winding angle Δθ from adjacent spectrum points through their kernels. We therefore expect the activation values after the two convolution layers to depend approximately linearly on Δθ, with the subsequent fully connected layers performing a simple linear regression. We plot a_n versus Δθ(n), with n = 1, ..., L and a_n being the n-th component of the intermediate values after the two convolution layers. As shown in Fig. 3(c), the intermediate output is approximately linear in Δθ within certain regions. A linear combination of these intermediate values with the correct coefficients in the following fully connected layers then easily leads to the true winding number. In this way, the CNN realizes a calculation workflow that is equivalent to summing the winding angles Δθ in Eq. (4).

FIG. 3. (Color online) (a) Winding numbers predicted by the CNN on test data sets; different colors correspond to different true winding numbers. (b) Probability distribution of the predicted winding number from the CNN on test data sets. The sum of the probability distribution for a test set (bins with the same color) is equal to one, and there are narrow peaks at each integer. (c) The intermediate output a_n, the activation value after the two convolutional layers, against the corresponding exact winding angle Δθ(n); L points for each of 10 different test configurations are plotted.

TABLE I. The accuracy of the CNN and the FCNN on test data sets with winding numbers w = {±1, ±2, ±3, ±4, ±5} in the Hatano-Nelson model with μ = 1, 2, 3, 4, 5. The winding numbers w = {±4, ±5} are not seen by the neural networks during the training.

w               ±1       ±2       ±3       ±4      ±5
CNN accuracy    99.8%    99.4%    98.0%    . %     . %
FCNN accuracy   99.2%    99.0%    98.5%    . %     . %

III. LEARNING TOPOLOGICAL TRANSITION IN NON-HERMITIAN SSH MODEL
Based on the accurate winding numbers calculated by the CNN, we further use a similar CNN to study topological phase transitions in a non-Hermitian SSH model, as shown in Fig. 4(a). The model, with non-reciprocal intra-cell hopping in a one-dimensional dimerized lattice of L unit cells, is described by the Hamiltonian

H = Σ_{n=1}^{L} [ (t − δ) â†_n b̂_n + (t + δ) b̂†_n â_n + t′ â†_{n+1} b̂_n + t′ b̂†_n â_{n+1} ].   (6)

Here â†_n and b̂†_n (â_n and b̂_n) denote the creation (annihilation) operators on the n-th A and B sublattices, t is the uniform intra-cell hopping amplitude, δ is the non-Hermitian parameter, and t′ is the inter-cell hopping amplitude. When δ = 0, the model reduces to the Hermitian SSH model. Under the periodic boundary condition, the corresponding Hamiltonian in k space is given by

H(k) = [ 0,  t′ e^{−ik} + t − δ;   t′ e^{ik} + t + δ,  0 ].   (7)

The two energy bands are then given by

E_±(k) = ± √( t² − δ² + t′² + 2 t t′ cos k − 2 i δ t′ sin k ).   (8)

Following Refs. [48–51] and considering the chiral symmetry, one can define an inter-band winding number

w_± = ∫₀^{2π} (dk/2π) ∂_k arg( E_+ − E_− ) = ∫₀^{2π} (dk/4π) ∂_k arg E².   (9)

For discretized E_±(k) with finite L, it reduces to

w_± = (1/4π) Σ_{n=1}^{L} [ θ′(n) − θ′(n−1) ],   (10)

where θ′(n) = arg E²(2πn/L) in this model. Notably, w_± is half of the sum of the winding numbers of t′ e^{−ik} + t − δ and t′ e^{ik} + t + δ around the origin of the complex plane as k is increased from 0 to 2π. The inter-band winding number w_± is therefore quantized as Z/2, since the windings of t′ e^{−ik} + t − δ and t′ e^{ik} + t + δ are always integers due to periodicity [51]. We consider t′ = 1 in our study, with t and δ varied over finite intervals.
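As an illustration (our sketch, not the paper's code), w_± of Eq. (10) can be evaluated directly from the discretized band spectrum; the parameter values below are hypothetical:

```python
import numpy as np

def interband_winding(t, delta, tp=1.0, L=256):
    """w_pm of Eq. (10): half the total winding of arg E^2(k), where
    E^2(k) = (t' e^{-ik} + t - delta)(t' e^{ik} + t + delta) from Eq. (7)."""
    k = 2 * np.pi * np.arange(L + 1) / L
    E2 = (tp * np.exp(-1j * k) + t - delta) * (tp * np.exp(1j * k) + t + delta)
    dtheta = np.diff(np.angle(E2))
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi   # wrap each increment
    return dtheta.sum() / (4 * np.pi)

print(round(interband_winding(0.5, 1.0), 3))   # -0.5: one factor encircles the origin
print(round(interband_winding(1.5, 0.2), 3))   #  0.0: neither factor encircles it
```

Half-integer values arise exactly when only one of the two factors winds around the origin, which is possible only for δ ≠ 0; in the Hermitian limit the two windings cancel and w_± = 0.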
For this model, we set the configuration of the input data as d(n) = { Re[E²(2πn/L)], Im[E²(2πn/L)] }. To learn the topological phase transition in this model, we treat it as a classification task assisted by neural networks. The output of the neural network is the probabilities of the different winding numbers. We define {P₁, P₂, P₃} as the output probabilities of the winding numbers w̃_± = {0, 0.5, −0.5}, respectively. The predicted winding number is interpreted as the w̃_± with the highest probability. The architecture of the CNN is shown in Fig. 2, with some training details given in Appendix A. For this task, the objective function to be optimized is the categorical cross-entropy

J = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{n_w=3} 1( w_±^{(i)} = w̃_{±,j} ) log( P_j ),   (11)

where w_±^{(i)} is the label of the i-th configuration, and the set {w̃_{±,1}, w̃_{±,2}, ..., w̃_{±,n_w}} represents the winding numbers predicted by the neural networks. The indicator 1( w_±^{(i)} = w̃_{±,j} ) takes the value 1 when the condition w_±^{(i)} = w̃_{±,j} is satisfied and 0 otherwise. In this model, n_w = 3 and {w̃_{±,1}, w̃_{±,2}, w̃_{±,3}} represent the winding numbers w_± = {0, 0.5, −0.5}, respectively.

FIG. 4. (Color online) (a) The non-Hermitian SSH model with non-reciprocal hopping modulated by the parameter δ. (b) The accuracy on the two test sets against the distance threshold T. For each T, the data sets are regenerated and the CNN is retrained and retested. (c) Classification probabilities outputted by the CNN on test set II with T = 0.2, where the true phase transition points are located at δ = {−1.5, −0.5, 0.5, 1.5}. The predicted phase transition points are located at the crossing points of the prediction probabilities. Different colors represent different winding numbers.

To see whether the CNN is a good tool to study topological phase transitions in this model, we define the Euclidean distance s between a configuration and the phase boundaries in the parameter space of the Hamiltonian:

s = |Aδ + Bt + C| / √(A² + B²),   (12)

where Aδ + Bt + C = 0 (straight lines in the parameter space of δ and t) are the equations of the phase boundaries, with A, B, C the parameters of each line. In addition, we define a distance threshold T. Training configurations satisfying s ≥ T are sampled from the different phases with their winding numbers as labels.

We test the CNN with two different test data sets: (I) configurations satisfying s < T; (II) 300 configurations distributed uniformly along t = 0.5, δ ∈ [−2, 2]. Figure 4(b) shows the accuracy on the two test sets for T = 0.2, 0.3, 0.4, 0.5, 0.6. We find that the CNN achieves high accuracy for the different T, meaning that the CNN can detect the phase transitions precisely in these regions. Moreover, we locate the phase transition points from the crossing points of the prediction probabilities; the phase transitions determined by this method are relatively accurate, as shown in Fig. 4(c). In the deep phase, the probability for the true winding number w_± stays at nearly 100%. On the other hand, the probabilities cross sharply at the phase transitions. In a word, the CNN is a great supplementary tool to study phase transitions when only the phase properties in some confident regions (e.g., the deep phase) are provided.

IV. LEARNING TOPOLOGICAL PHASE DIAGRAM IN NON-HERMITIAN AAH MODEL
To show that our results can be generalized to other non-Hermitian topological models, we consider a generalized AAH model in a one-dimensional quasicrystal, as shown in Fig. 5(a), with two kinds of non-Hermiticities arising from the non-reciprocal hopping [55] and the complex on-site potential phase [56]. The Hamiltonian of such a non-Hermitian AAH model is given by [76]

H = Σ_j ( t_j^{(r)} ĉ†_{j+1} ĉ_j + t_j^{(l)} ĉ†_j ĉ_{j+1} + Δ_j n̂_j ),   (13)

where the non-reciprocal hopping terms and the on-site potential are parameterized as

t_j^{(r)} = { t + V₁ cos[2π(j + 1/2)β] } e^{−α},
t_j^{(l)} = { t + V₁ cos[2π(j + 1/2)β] } e^{α},   (14)
Δ_j = V₂ cos(2πjβ + ih).

Here t_j^{(r)} (t_j^{(l)}) denotes the right (left)-hopping amplitude between the j-th and (j + 1)-th sites, with the parameters t > V₁ ≥ 0 being real, Δ_j denotes the complex quasiperiodic potential with V₂ > 0, β is an irrational number, and the parameters α and h tune the non-reciprocity and the complex phase, respectively. For finite quasiperiodic systems, one can take the lattice site number L = F_{j+1} and the rational approximation β = F_j/F_{j+1}, with F_j the j-th Fibonacci number, since lim_{j→∞} F_j/F_{j+1} = (√5 − 1)/2. In the following, we set t = 1 and L = 89.

The winding numbers discussed previously cannot be directly used here due to the breaking of periodicity. In this case, one can consider a ring chain with an effective magnetic flux Φ penetrating through the center, such that the Hamiltonian matrix can be rewritten as

H(Φ) = [ Δ₁         t₁^{(l)}                             t_L^{(r)} e^{−iΦ};
         t₁^{(r)}   Δ₂        t₂^{(l)}                                    ;
                    ⋱         ⋱            ⋱                              ;
                              t_{L−2}^{(r)}  Δ_{L−1}     t_{L−1}^{(l)}    ;
         t_L^{(l)} e^{iΦ}                    t_{L−1}^{(r)}  Δ_L ].   (15)

One can define the winding number with respect to Φ and the energy base E_B [48, 55]:

w_Φ = ∫₀^{2π} (dΦ/2πi) ∂_Φ ln det[ H(Φ) − E_B ].   (16)

Here w_Φ counts the number of times the complex spectral trajectory encircles the energy base E_B (with E_B ∈ C not belonging to the energy spectrum) when the flux is increased from 0 to 2π. For discretized H(Φ) with Φ = 0, 2π/L_Φ, 4π/L_Φ, ..., 2π, the winding number can be rewritten as

w_Φ = (1/2π) Σ_{n=1}^{L_Φ} [ θ″(n) − θ″(n−1) ],   (17)

where θ″(n) = arg det[ H(2πn/L_Φ) − E_B ].

Below we show that the generalization ability enables the CNN to precisely obtain the topological phase diagrams of this non-Hermitian generalized AAH model, even though we only use non-reciprocal-hopping configurations in the training. To do this, we treat the problem as a classification task and set the configuration in this case as d(n) = { Re det[ H̃(n) ], Im det[ H̃(n) ] }, with H̃(n) ≡ H(2πn/L_Φ) − E_B. The architecture of the CNN is similar to that for the non-Hermitian SSH model, but the output layer now consists of two neurons for the two kinds of winding numbers. We define {P₁, P₂} as the output probabilities of the winding numbers w̃_Φ = {0, −1}, respectively.
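Equations (15)–(17) translate directly into a short numerical routine. The sketch below is ours, with illustrative parameter values that are not those of the paper's figures:

```python
import numpy as np

def aah_hamiltonian(phi, L=89, t=1.0, V1=0.5, V2=0.5, alpha=0.5, h=0.0):
    """Ring-chain Hamiltonian H(Phi) of Eq. (15), with the flux on the boundary bond."""
    beta = 55 / 89                      # F_j / F_{j+1} for L = 89
    j = np.arange(1, L + 1)
    amp = t + V1 * np.cos(2 * np.pi * (j + 0.5) * beta)   # hopping modulation, Eq. (14)
    H = np.zeros((L, L), dtype=complex)
    H[np.arange(L), np.arange(L)] = V2 * np.cos(2 * np.pi * j * beta + 1j * h)  # Delta_j
    H[np.arange(1, L), np.arange(L - 1)] = amp[:-1] * np.exp(-alpha)  # t_j^(r), lower diag
    H[np.arange(L - 1), np.arange(1, L)] = amp[:-1] * np.exp(alpha)   # t_j^(l), upper diag
    H[0, L - 1] = amp[-1] * np.exp(-alpha - 1j * phi)     # t_L^(r) e^{-i Phi}
    H[L - 1, 0] = amp[-1] * np.exp(alpha + 1j * phi)      # t_L^(l) e^{+i Phi}
    return H

def flux_winding(EB=0.0, L_phi=64, **kw):
    """w_Phi of Eq. (17): winding of det[H(Phi) - E_B] as Phi runs from 0 to 2 pi."""
    thetas = [np.angle(np.linalg.det(aah_hamiltonian(2 * np.pi * n / L_phi, **kw)
                                     - EB * np.eye(kw.get("L", 89))))
              for n in range(L_phi + 1)]
    d = (np.diff(thetas) + np.pi) % (2 * np.pi) - np.pi   # wrap increments
    return d.sum() / (2 * np.pi)

print(round(flux_winding(EB=0.5j, alpha=0.0)))                       # 0 (Hermitian ring)
print(abs(round(flux_winding(EB=0.0, V1=0.0, V2=0.0, alpha=1.0))))   # 1 (nonreciprocal ring)
```

Note that w_Φ is only well defined when E_B stays outside the spectrum for all Φ; in the Hermitian case the spectrum is real, so any E_B off the real axis gives w_Φ = 0, while a purely nonreciprocal ring winds once around E_B = 0.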
The objective function in this case is the categorical cross-entropy, similar to that in Eq. (11):

J = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{n_w=2} 1( w_Φ^{(i)} = w̃_{Φ,j} ) log( P_j ),   (18)

where {w̃_{Φ,1}, w̃_{Φ,2}} (with n_w = 2) represent w̃_Φ = {0, −1}, respectively.

To test the generality of the neural network, we train it with configurations corresponding to model Hamiltonians with h = 0, and test it with configurations corresponding to Hamiltonians with both non-reciprocal hopping amplitudes (α ≠ 0) and complex potentials (h ≠ 0). The training data set includes configurations with α sampled on a uniform grid from a finite interval, with the corresponding Hamiltonians sampled from the two-dimensional parameter space spanned by V₁ ∈ [0, 2] and V₂ ∈ [0, 2]. The test set includes 110 pairs of parameters (α, h), with α from 0.15 to 1.95 and h on a uniform grid of finite values, and with configurations corresponding to Hamiltonians from the region V₁ ∈ [0, 2] × V₂ ∈ [0, 2] for each pair of parameters.

After the training, we find that the CNN performs well even without any knowledge of the complex on-site potential (h = 0) during the training process. Figure 5(b) shows the test accuracy with respect to the two non-Hermiticity parameters α and h, with an accuracy of more than 99% in the whole parameter region. Moreover, we present the topological phase diagrams predicted by the CNN with respect to V₁ and V₂, as shown in Fig. 5(c). It is clear that the CNN performs excellently in the deep phases, with only a little struggle near the topological phase transitions. We attribute the high accuracy in this learning task to two reasons. First, normalizing the data makes both the training and test data distribute within the complex unit circle, which is important for the generality of the neural network. Second, the topological transitions in this model are consistent with the real-complex transitions in the energy spectrum [76], which reduces the complexity of the problem when the input data depend on the complex spectrum.

FIG. 5. (Color online) (a) The non-Hermitian generalized AAH model with non-reciprocal hopping and complex quasiperiodic potential. (b) Test accuracy with respect to the two non-Hermiticity parameters α and h. (c) The upper (lower) panel is the topological phase diagram predicted by the CNN for h = 1.2, α = 0.55 (h = 1.6, α = 1.95).

V. CONCLUSIONS
In summary, we have demonstrated that artificial neural networks can be used to predict the topological invariants and the associated topological phase transitions and topological phase diagrams in three different non-Hermitian models with high accuracy. The winding numbers of the Hatano-Nelson model were presented as a demonstration of our non-Hermitian machine learning method. The CNN trained by the data set within the deep phases has been shown to correctly detect the phase transitions near each boundary of the non-Hermitian SSH model. We have also investigated the non-Hermitian generalized AAH model with non-reciprocal hopping and complex quasiperiodic potential. It is found that the topological phase diagram in the two-dimensional non-Hermiticity parameter space predicted by the CNN agrees with the theoretical one with high accuracy. Our results show the generality of machine-learning-based methods for classifying topological properties in both single- and multi-band models.
Note added.
After the completion of this work, we noticed a complementary work on machine learning non-Hermitian topological phases [77], which focused on the winding number of the Hamiltonian vectors.
Appendix A: Training details
We first describe some training details for the Hatano-Nelson model. We use the deep learning framework PyTorch [78] to construct and train the neural networks. Weights are randomly initialized from a normal distribution with the Xavier algorithm [79], and the biases are initialized to 0. We use the Adam optimizer [80] to minimize the deviation of the neural network output w̃ from the true w. We set the initial learning rate to 0.001 and use the ReduceLROnPlateau algorithm [78] to lower it by a factor of 10 when the improvement of the validation loss stops for 20 epochs. All hyper-parameters are set to their defaults unless mentioned otherwise. In order to prevent overfitting, L₂ regularization and early stopping [81] are used during the training. We use mini-batch training with a batch size of 64 and a validation set to confirm that there is no overfitting during training; the validation configurations consist of a 1:1:1 ratio of winding numbers w = ±{1, 2, 3}. Typical loss during a training instance of the CNN and the FCNN is shown in Fig. 6(a), from which one can see that there is no sign of overfitting.

FIG. 6. (Color online) (a) The CNN and FCNN training loss history on the Hatano-Nelson model. The CNN training loss history on (b) the non-Hermitian SSH model for T = 0.2, 0.3, 0.4, 0.5, 0.6, and (c) the non-Hermitian generalized AAH model.
FIG. 7. (Color online) (a) The phase diagram of the non-Hermitian SSH model in the region t ∈ [−2, 2] × δ ∈ [−2, 2] with t′ = 1. (b) The distribution of the training, validation, and test data sets for T = 0.2.

We then provide some training details for the non-Hermitian SSH model. In this case, the CNN has two convolution layers, with 32 kernels of size 2 × 2 and one kernel of size 1 × 1, followed by a fully connected layer of 16 neurons before the output layer. In this model, the output layer consists of three neurons for the three different inter-band winding numbers. All the hidden layers have ReLU activation functions, and the output layer has the softmax function f(x)_i = exp(x_i) / Σ_{j=1}^{n} exp(x_j). The exact topological phase diagram in the parameter space spanned by t and δ is shown in Fig. 7(a). The training data set satisfies s ≥ T with T = 0.2, while configurations with s < T are randomly sampled from the parameter space for testing. The data set distribution is shown in Fig. 7(b). Typical loss during training instances of the CNN for the different training data sets is plotted in Fig. 6(b), which clearly shows that the neural networks converge quickly without overfitting.

Finally, we present briefly some details for the non-Hermitian generalized AAH model. In this case, the validation set consists of configurations corresponding to non-reciprocal-hopping Hamiltonians (with h = 0) that are not included in the training data set. Typical loss is shown in Fig. 6(c), with the networks converging quickly without overfitting.

ACKNOWLEDGMENTS
We thank Dan-Bo Zhang for helpful discussions. Thiswork was supported by the NSAF (Grants No. U1830111 and No. U1801661), the Key-Area Research and De-velopment Program of Guangdong Province (Grant No.2019B030330001), and the Science and Technology Pro-gram of Guangzhou (Grants No. 201804020055 and No.2019050001). [1] M. I. Jordan and T. M. Mitchell, Science , 255 (2015).[2] Y. LeCun, Y. Bengio, and G. Hinton, Nature , 436(2015).[3] I. J. Goodfellow, Y. Bengio, and A. Courville,Deep Learning (MIT Press, Cambridge, MA, USA, 2016) .[4] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld,N. Tishby, L. Vogt-Maranto, and L. Zdeborov´a, Rev.Mod. Phys. , 045002 (2019).[5] R. Biswas, L. Blackburn, J. Cao, R. Essick, K. A. Hodge,E. Katsavounidis, K. Kim, Y.-M. Kim, E.-O. Le Bigot,C.-H. Lee, J. J. Oh, S. H. Oh, E. J. Son, Y. Tao,R. Vaulin, and X. Wang, Phys. Rev. D , 062003(2013).[6] B. S. Rem, N. K¨aming, M. Tarnowski, L. Asteria,N. Fl¨aschner, C. Becker, K. Sengstock, and C. Weit-enberg, Nat. Phys. , 917 (2019).[7] G. Kasieczka, T. Plehn, A. Butter, K. Cranmer, D. Deb-nath, B. M. Dillon, M. Fairbairn, D. A. Faroughy, W. Fe-dorko, C. Gay, L. Gouskos, J. F. Kamenik, P. Komiske,S. Leiss, A. Lister, S. Macaluso, E. Metodiev, L. Moore,B. Nachman, K. Nordstrm, J. Pearkes, H. Qu, Y. Rath,M. Rieger, D. Shih, J. Thompson, and S. Varma, SciPostPhysics (2019), 10.21468/scipostphys.7.1.014.[8] L. Wang, Phys. Rev. B , 195105 (2016).[9] J. Carrasquilla and R. G. Melko, Nat. Phys. , 431(2017).[10] Y. Zhang and E.-A. Kim, Phys. Rev. Lett. , 216401(2017).[11] D.-L. Deng, X. Li, and S. Das Sarma, Phys. Rev. B ,195145 (2017).[12] P. Huembeli, A. Dauphin, P. Wittek, and C. Gogolin,Phys. Rev. B , 104106 (2019).[13] X.-Y. Dong, F. Pollmann, and X.-F. Zhang, Phys. Rev.B , 121104 (2019).[14] E. P. Van Nieuwenburg, Y.-H. Liu, and S. D. Huber,Nat. Phys. , 435 (2017).[15] D. Carvalho, N. A. Garc´ıa-Mart´ınez, J. L. Lado, andJ. Fern´andez-Rossier, Phys. Rev. B , 115453 (2018).[16] P. Zhang, H. 
Shen, and H. Zhai, Phys. Rev. Lett., 066401 (2018).
[17] N. Sun, J. Yi, P. Zhang, H. Shen, and H. Zhai, Phys. Rev. B, 085402 (2018).
[18] P. Huembeli, A. Dauphin, and P. Wittek, Phys. Rev. B, 134109 (2018).
[19] Y.-H. Tsai, M.-Z. Yu, Y.-H. Hsu, and M.-C. Chung, Phys. Rev. B, 054512 (2020).
[20] Y. Ming, C.-T. Lin, S. D. Bartlett, and W.-W. Zhang, npj Comput. Mater., 1 (2019).
[21] J. F. Rodriguez-Nieva and M. S. Scheurer, Nat. Phys., 790 (2019).
[22] N. L. Holanda and M. A. R. Griffith, Phys. Rev. B, 054107 (2020).
[23] T. Ohtsuki and T. Mano, J. Phys. Soc. Jpn., 022001 (2020).
[24] Y. Zhang, P. Ginsparg, and E.-A. Kim, Phys. Rev. Research, 023283 (2020).
[25] Y. Long, J. Ren, and H. Chen, Phys. Rev. Lett., 185501 (2020).
[26] M. S. Scheurer and R.-J. Slager, Phys. Rev. Lett., 226401 (2020).
[27] B. S. Rem, N. Käming, M. Tarnowski, L. Asteria, N. Fläschner, C. Becker, K. Sengstock, and C. Weitenberg, Nature Physics, 917 (2019).
[28] W. Lian, S.-T. Wang, S. Lu, Y. Huang, F. Wang, X. Yuan, W. Zhang, X. Ouyang, X. Wang, X. Huang, L. He, X. Chang, D.-L. Deng, and L. Duan, Phys. Rev. Lett., 210503 (2019).
[29] S. Diehl, A. Micheli, A. Kantian, B. Kraus, H. Büchler, and P. Zoller, Nat. Phys., 878 (2008).
[30] S. Malzard, C. Poli, and H. Schomerus, Phys. Rev. Lett., 200402 (2015).
[31] T. E. Lee, Phys. Rev. Lett., 133903 (2016).
[32] S. Yao and Z. Wang, Phys. Rev. Lett., 086803 (2018).
[33] S. Yao, F. Song, and Z. Wang, Phys. Rev. Lett., 136802 (2018).
[34] F. Song, S. Yao, and Z. Wang, Phys. Rev. Lett., 246801 (2019).
[35] F. K. Kunst, E. Edvardsson, J. C. Budich, and E. J. Bergholtz, Phys. Rev. Lett., 026808 (2018).
[36] K. Takata and M. Notomi, Phys. Rev. Lett., 213902 (2018).
[37] H. Wang, J. Ruan, and H. Zhang, Phys. Rev. B, 075130 (2019).
[38] Q.-B. Zeng, S. Chen, and R. Lü, Phys. Rev. A, 062118 (2017).
[39] L.-J. Lang, Y. Wang, H. Wang, and Y. D. Chong, Phys. Rev. B, 094307 (2018).
[40] R. Hamazaki, K. Kawabata, and M. Ueda, Phys. Rev. Lett., 090603 (2019).
[41] L. Jin and Z.
Song, Phys. Rev. B, 081103 (2019).
[42] K. Kawabata, S. Higashikawa, Z. Gong, Y. Ashida, and M. Ueda, Nat. Commun., 1 (2019).
[43] T. Liu, Y.-R. Zhang, Q. Ai, Z. Gong, K. Kawabata, M. Ueda, and F. Nori, Phys. Rev. Lett., 076801 (2019).
[44] C. H. Lee, L. Li, and J. Gong, Phys. Rev. Lett., 016805 (2019).
[45] K. Yamamoto, M. Nakagawa, K. Adachi, K. Takasan, M. Ueda, and N. Kawakami, Phys. Rev. Lett., 123601 (2019).
[46] N. Hatano and D. R. Nelson, Phys. Rev. Lett., 570 (1996).
[47] N. Hatano and D. R. Nelson, Phys. Rev. B, 8651 (1997).
[48] Z. Gong, Y. Ashida, K. Kawabata, K. Takasan, S. Higashikawa, and M. Ueda, Phys. Rev. X, 031079 (2018).
[49] A. Ghatak and T. Das, J. Phys.: Condens. Matter, 040401 (2017).
[51] H. Shen, B. Zhen, and L. Fu, Phys. Rev. Lett., 146402 (2018).
[52] G.-Q. Zhang, D.-W. Zhang, Z. Li, Z. D. Wang, and S.-L. Zhu, Phys. Rev. B, 054204 (2020).
[53] D.-W. Zhang, L.-Z. Tang, L.-J. Lang, H. Yan, and S.-L. Zhu, Sci. China Phys. Mech. Astron., 1 (2020).
[54] X.-W. Luo and C. Zhang, arXiv:1912.10652v1.
[55] H. Jiang, L.-J. Lang, C. Yang, S.-L. Zhu, and S. Chen, Phys. Rev. B, 054301 (2019).
[56] S. Longhi, Phys. Rev. Lett., 237601 (2019).
[57] T. Liu, H. Guo, Y. Pu, and S. Longhi, Phys. Rev. B, 024205 (2020).
[58] H. Wu and J.-H. An, Phys. Rev. B, 041119 (2020).
[59] Q.-B. Zeng, Y.-B. Yang, and Y. Xu, Phys. Rev. B, 020201 (2020).
[60] Q.-B. Zeng and Y. Xu, Phys. Rev. Research, 033052 (2020).
[61] L.-Z. Tang, L.-F. Zhang, G.-Q. Zhang, and D.-W. Zhang, Phys. Rev. A, 063612 (2020).
[62] H. Liu, Z. Su, Z.-Q. Zhang, and H. Jiang, Chinese Physics B, 050502 (2020).
[63] D.-W. Zhang, Y.-L. Chen, G.-Q. Zhang, L.-J. Lang, Z. Li, and S.-L. Zhu, Phys. Rev. B, 235150 (2020).
[64] Z. Xu and S. Chen, Phys. Rev. B, 035153 (2020).
[65] T. Liu, J. J. He, T. Yoshida, Z.-L. Xiang, and F. Nori, arXiv:2001.09475v2.
[66] W. Xi, Z.-H. Zhang, Z.-C. Gu, and W.-Q. Chen, arXiv:1911.01590v4.
[67] E. Lee, H. Lee, and B.-J. Yang, Phys. Rev. B, 121109 (2020).
[68] T. Yoshida, K. Kudo, and Y.
Hatsugai, Sci. Rep. (2019), 10.1038/s41598-019-53253-8.
[69] D. S. Borgnia, A. J. Kruchkov, and R.-J. Slager, Phys. Rev. Lett., 056802 (2020).
[70] N. Okuma, K. Kawabata, K. Shiozaki, and M. Sato, Phys. Rev. Lett., 086801 (2020).
[71] K. Zhang, Z. Yang, and C. Fang, arXiv:1910.01131v3.
[72] W. P. Su, J. R. Schrieffer, and A. J. Heeger, Phys. Rev. Lett., 1698 (1979).
[73] P. G. Harper, Proc. Phys. Soc. Sect. A, 874 (1955).
[74] S. Aubry and G. Andre, Ann. Israel Phys. Soc., 133 (1980).
[75] F. Liu, S. Ghosh, and Y. D. Chong, Phys. Rev. B, 014108 (2015).
[76] L.-Z. Tang et al., in preparation.
[77] B. Narayan and A. Narayan, arXiv:2009.06476 (2020).
[78] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019), pp. 8026–8037.
[79] X. Glorot and Y. Bengio, in Proceedings of Machine Learning Research, Vol. 9, edited by Y. W. Teh and M. Titterington (JMLR Workshop and Conference Proceedings, Chia Laguna Resort, Sardinia, Italy, 2010), pp. 249–256.
[80] D. P. Kingma and J. Ba, arXiv:1412.6980 (2014).
[81] Y. Yao, L. Rosasco, and A. Caponnetto, Constructive Approximation 26