Machine Learning Regression for Operator Dynamics
Justin A. Reyes, Sayandip Dhara, and Eduardo R. Mucciolo
Department of Physics, University of Central Florida, Orlando, FL, 32816, USA
(Dated: February 24, 2021)

Determining the dynamics of the expectation values of operators acting on quantum many-body systems is a challenging task. Matrix product states (MPS) have traditionally been the "go-to" models for these systems because calculating expectation values in this representation can be done with relative simplicity and high accuracy. However, such calculations can become computationally costly when extended to long times. Here, we present a solution for efficiently extending the computation of expectation values to long time intervals. We utilize a multi-layer perceptron (MLP) model as a tool for regression on MPS expectation values calculated within the regime of short time intervals. With this model, the computational cost of generating long-time dynamics is significantly reduced, while maintaining high accuracy. These results are demonstrated with operators relevant to quantum spin models in one spatial dimension.
I. INTRODUCTION
The accurate determination of expectation values for operators acting on quantum many-body (QMB) systems at long times remains an open problem. Much progress has been made for various specific systems of interest, such as the Ising chain with a quenched transverse field, or the Ohmic spin-boson model coupled to a harmonic non-Markovian environment. However, these developments have focused on systems where symmetries and approximations can be exploited, analytic or exact diagonalization methods can be used, or matrix product state algorithms can be employed. Such approaches are either limited in their scope or quickly become computationally demanding, particularly for systems in more than one spatial dimension. This is particularly true for the standard time-evolving block decimation (TEBD), time-dependent density matrix renormalization group (t-DMRG), and dynamic density matrix renormalization group (DDMRG) algorithms. Recent advances in machine learning models have offered new insights and paved new pathways for modeling QMB systems, often providing significant computational advantages over traditional methods.
Motivated by these successes, we investigate the advantages machine learning can provide when computing the expectation values of operators acting on QMB systems in the long-time regime. Previous work utilizing machine learning techniques in QMB systems has focused heavily on the use of restricted Boltzmann machines (RBMs) as generative models of quantum states.
These are energy-based models with an energy cost function given by

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j .   (1)

Here, v = {v_i} and h = {h_j} are the visible and hidden layers of neurons in the RBM network, respectively. Each visible (hidden) neuron can only take on the values ±1, with an associated bias a_i (b_j), and is fully connected to the hidden (visible) layer by the weight matrix W. Given a spin-1/2 system, the probability amplitude of a specific spin state, \Psi_{RBM}(v), can be represented by the RBM by setting a spin configuration v for the visible layer and performing a summation over all hidden variables as

\Psi_{RBM}(v) = \sum_h e^{-E(v,h)}   (2)
             = \prod_i e^{a_i v_i} \prod_j 2\cosh\left( b_j + \sum_i v_i W_{ij} \right).   (3)

To obtain information about the full state of the system, it is necessary to perform sampling over numerous spin configurations. While this model is best suited to the determination of ground-state properties, after some modifications it has also been used to determine dynamical properties of QMB systems. Recently, convolutional neural networks have also been used to map input QMB spin configurations to probability amplitudes. In spite of the success of these approaches, these models still face challenges similar to those of other computational methods, namely that accurately representing the system state becomes computationally demanding as the system size grows. To circumvent the computational demands of representing (or sampling) the full state of the system at any given time, we focus our attention on the direct evolution of expectation values by breaking time into two domains. For any given operator O acting on a quantum system |\Psi(t)\rangle, the expectation value of the operator at any given time is given by

\langle O \rangle = \langle \Psi(t) | O | \Psi(t) \rangle .   (4)
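As a concrete check of the marginalization in Eqs. (2)-(3), the sketch below (an illustration, not the authors' code; all parameter values are arbitrary) compares the analytic product of 2cosh factors against a brute-force sum over every hidden configuration:

```python
import numpy as np

def rbm_amplitude(v, a, b, W):
    """Eq. (3): the sum over the +/-1 hidden units is done analytically,
    giving a product of 2*cosh factors instead of a 2**M-term sum."""
    theta = b + v @ W                      # effective field on each hidden unit
    return np.exp(a @ v) * np.prod(2.0 * np.cosh(theta))

def rbm_amplitude_bruteforce(v, a, b, W):
    """Eq. (2): explicit sum over all hidden configurations, for checking."""
    M = len(b)
    total = 0.0
    for bits in range(2 ** M):
        h = np.array([1.0 if (bits >> j) & 1 else -1.0 for j in range(M)])
        E = -(a @ v) - (b @ h) - v @ W @ h  # Eq. (1)
        total += np.exp(-E)
    return total

rng = np.random.default_rng(0)
v = rng.choice([-1.0, 1.0], size=4)        # visible spin configuration
a, b = rng.normal(size=4), rng.normal(size=3)
W = rng.normal(size=(4, 3))
print(np.isclose(rbm_amplitude(v, a, b, W),
                 rbm_amplitude_bruteforce(v, a, b, W)))  # True
```

The analytic form is why the RBM amplitude of a *single* configuration is cheap; the cost quoted in the text comes from having to sample many configurations v, not from the hidden-layer sum.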
By computing \langle O \rangle using matrix product state (MPS) algorithms within the short-time domain, we show that the long-time expectation values can be determined with low computational effort and good accuracy by utilizing a multi-layer perceptron (MLP) as a tool for linear regression. We note that previous extrapolation methods have been studied, but these have either focused on constructing the wave function at each increment of time or on implementing linear prediction methods for t-DMRG spectral calculations. This paper is organized as follows: In Sec. II, we review the fundamentals of the multi-layer perceptron (MLP) model. In Sec. III, as a benchmark for algorithmic comparison, we review the time-evolving block decimation algorithm in the context of calculating operator dynamics. In Sec. IV, we describe the methodology involved in using the MLP for regression. In Sec. V, we demonstrate the computational advantage gained
by using MLP regression to determine operator dynamics for both the Ising and the XXZ model. Finally, in Sec. VI, we interpret these results and provide a framework for further improvement and investigation.

FIG. 1. A graphical representation of an example MLP, with input vectors x_n, each having two elements, used as input to the first layer of neurons (blue). This input is propagated to the next layer (red) by interconnecting weights W_1 and finally sent to the output (yellow) with weights W_2.

II. MULTI-LAYER PERCEPTRONS

A. Architecture
In machine learning, the MLP model is a ubiquitous tool for performing classification tasks. It is an input-output model approximating the function

y' = W \cdot f(x),   (5)

where f(x) is an activation function over a set of inputs x, W is a weight matrix, and y' is a guessed classification label. This model is composed of l sequential layers of neurons {a^{(i)}_{n_i}}, where 1 ≤ i ≤ l and n_i specifies the number of neurons in a given layer i. Each neuron is subject to the activation function f. Additionally, each layer of neurons has a specified weight matrix W^{(i)}, which connects the output from one layer to the input of the next. The components of each W^{(i)} are used as parameters for the optimization of the network. Figure 1 provides a schematic for an MLP with a single layer of neurons between the input and final output.
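Applied layer by layer, Eq. (5) amounts to the forward pass below (a minimal sketch with arbitrary random weights; the layer sizes mirror Fig. 1, not the models used later in the paper):

```python
import numpy as np

def mlp_forward(x, weights, f=lambda z: z):
    """Propagate an input through successive layers per Eq. (5):
    each weight matrix W^(i) acts on the activated output of the
    previous layer. Here f is the identity (a linear unit)."""
    a = x
    for W in weights:
        a = W @ f(a)
    return a

# Shapes mirroring Fig. 1: 2 inputs -> 4 hidden neurons -> 1 output.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(1, 4))
x = np.array([0.5, -0.2])
y_prime = mlp_forward(x, [W1, W2])

# With linear activations the whole network collapses to a single matrix:
print(np.allclose(y_prime, W2 @ W1 @ x))  # True
```

The final check foreshadows a point used in Sec. IV: with linear activations, the stack of layers is equivalent to one linear map.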
B. Supervised Learning
To minimize the cost of the guessed label y' generated by the MLP, we provide an initial data set of N elements, {x_n, y_n}, where 1 ≤ n ≤ N, for use in a training protocol. This provision for training is characteristic of supervised learning. In this learning procedure, each input vector x_n is accompanied by a corresponding true classification label y_n. This label provides the reference for a cost function C(y'_n, y_n), which measures the distance between the current MLP output classification label y'_n and the true classification label y_n. For our specific purposes, we define C(y'_n, y_n) as

C(y'_n, y_n) = \frac{1}{N} \sum_{n=1}^{N} | y'_n - y_n | .   (6)

Minimizing this cost function can be accomplished by any of a number of known optimization algorithms. For our purposes, we focus on using a stochastic gradient descent method.

III. TIME EVOLVING BLOCK DECIMATION
Before introducing the MLP regression algorithm, we provide a short review of the TEBD algorithm so that computational comparisons to our algorithm might be well understood. The TEBD algorithm facilitates the time evolution (real or imaginary) of one-dimensional quantum systems under local Hamiltonians. As such, it is naturally expressed within the framework of matrix product states (MPS). The time evolution is accomplished by generating and repeatedly applying Suzuki-Trotter expansions of the time evolution operator exp(-iHT), up to any specified order. Given a Hamiltonian with nearest-neighbor interactions and open boundary conditions (OBC) over N sites,

H = \sum_{i=1}^{N-1} H_{i,i+1} ,   (7)

the second-order Suzuki-Trotter expansion of exp(-iHT) for a small time step δ > 0 is given as

e^{-iTH} \approx \left[ \left( e^{-i(\delta/2)H_{1,2}} e^{-i(\delta/2)H_{3,4}} \cdots e^{-i(\delta/2)H_{N-1,N}} \right) \left( e^{-i\delta H_{2,3}} e^{-i\delta H_{4,5}} \cdots e^{-i\delta H_{N-2,N-1}} \right) \left( e^{-i(\delta/2)H_{1,2}} e^{-i(\delta/2)H_{3,4}} \cdots e^{-i(\delta/2)H_{N-1,N}} \right) \right]^{T/\delta} .   (8)

The sequential application of these operators demands that the MPS be brought to canonical form (i.e., orthonormalizing the indices) after every time step δ. Such a procedure involves O(poly(N) poly(D)) steps, where D is the maximum internal bond dimension of the MPS. In the absence of truncation, D grows exponentially with both the system size and the evolution time, and the computational cost of this procedure quickly becomes intractable for large systems and long times. After the application of these operators, the final state of the system is obtained as

|\Psi(t = T)\rangle = e^{-iTH} |\Psi(t = 0)\rangle .   (9)

Typically, there are two sources of error in the TEBD framework. The first comes from the truncation of the MPS bond dimensions during the orthonormalization process. The other source of error arises from the Suzuki-Trotter expansion, which for our purposes is taken to second order. In this case, the error per time step is on the order of O(δ³), resulting in an error over the total time interval on the order of O(δ²). In this paper, we choose to mitigate the first source of error by performing minimal amounts of truncation on the MPS (i.e., maintaining large bond dimensions). This is done to ensure that the primary source of error arises from the Trotter-Suzuki approximation itself.

FIG. 2. An example diagram for MLP regression. The input data set X is decomposed into sets of training examples consisting of input vectors (white) and output values (yellow) selected from contiguous blocks in X. These training examples are fed into the MLP as shown.

IV. MACHINE LEARNING REGRESSION
In order to effectively model the evolution of operator expectation values, we construct the MLP in a manner conducive to regression rather than classification. Accomplishing this involves a few specifications about the input-output pairs {x, y}. We treat an input vector X as being parameterized by time t over a time interval [0, τ], so that each element X_i ∈ X is labeled by a coordinate t_i. The total time τ is partitioned into m discrete time intervals {t_i | 1 ≤ i ≤ m}. From X, each input-output pair is constructed as follows. Starting from the first element in X, corresponding to time t = 0, we select a contiguous block of p elements from X to form an input vector x = {X_1, X_2, ..., X_p}. We call this block our training window. The corresponding label for this window is selected as the element X_{p+1}. To construct multiple input examples for training, we shift the starting position of the training window throughout X until the desired number of examples is achieved. A diagram of this initialization procedure is given in Fig. 2.

In addition to constructing the input-output pairs in the aforementioned manner, we choose to define our activation functions by the linear unit, f(x) = x. This activation allows us to effectively propagate all of the input information through the network. This is in contrast to the more commonly used rectified linear unit (ReLU), f(x) = max(0, x), which, depending on the values selected for the weights, can suppress some information propagation through the network by eliminating all negative values. The ReLU activation is useful when the MLP is used for classification over a discrete set of positive-valued labels. However, we select the linear activation because our output values are continuous and include values less than zero.

FIG. 3. (a) Time evolution of the expectation value \langle S^z \rangle up to τ = 25/J in time steps of δ = 0.025/J for the one-dimensional Ising spin chain with N = 12 sites, exchange coupling J, and transverse field h = J (i.e., at the critical point). The MLP used for the regression was constructed with 32 linear-activated neurons using a training window size of p = 4. The selection of training examples employs the first 110 time steps (left of the green line). Results for the MLP regression are compared to TEBD results with maximum bond dimension D = 200. (b) The absolute difference ε = |\langle S^z_{MPS} \rangle - \langle S^z_{MLP} \rangle| between the TEBD and the MLP regression, shown for each time step.
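The sliding-window construction described in this section can be sketched as follows (illustrative only; the sine series is a stand-in for MPS-generated expectation values, and the series length is chosen so the pair count matches the 995 quoted in Sec. V A):

```python
import numpy as np

def make_windows(series, p):
    """Slide a length-p training window over a time series: each input
    vector holds p consecutive values x = {X_1, ..., X_p}, and the
    label is the next value X_{p+1} (cf. Fig. 2)."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    y = np.array([series[i + p] for i in range(len(series) - p)])
    return X, y

series = np.sin(0.1 * np.arange(999))   # stand-in for MPS expectation values
X, y = make_windows(series, p=4)
print(X.shape, y.shape)                 # (995, 4) (995,)
```

A series of L points and a window of size p yield L - p input-output pairs, which is why a long TEBD run produces many more training examples than are actually used.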
V. RESULTS
We test our MLP regression by evaluating operator expec-tation values over two model systems and comparing them tosecond-order Trotter-Suzuki time evolved MPS calculations.
FIG. 4. A comparison between the total computational times for the TEBD and the MLP regression algorithms. Each method was used to determine the dynamics of the expectation value of the S^z operator for the transverse-field Ising model with exchange coupling J, transverse field h = J, and evolution time τ = 25/J in increments of δ = 0.025/J. The MLP regression was trained by stochastic gradient descent over 32 linearly activated neurons, regardless of system size. For the TEBD, the maximal bond dimension was set to D = 200 for all sizes.

Firstly, we determine values at the critical point for the one-dimensional Ising model in a transverse field with an evolution time of τ = 25/J, where J is the exchange coupling constant. To compare the computational cost between the TEBD and the MLP regression, we measure the computational time as a function of system size. Secondly, we apply the MLP regression to the one-dimensional XXZ model as a demonstration of its adaptability to various models. All machine learning simulations were implemented using the TensorFlow Keras library.

A. Ising Model
For N spins in a one-dimensional chain, the nearest-neighbor Ising model in the presence of a transverse field is given by the Hamiltonian

H = -J \sum_{i=1}^{N-1} S^z_i S^z_{i+1} - h \sum_{i=1}^{N} S^x_i ,   (10)

where S^z is the longitudinal spin operator, J is the exchange coupling constant, and h characterizes the strength of the transverse field. Due to the non-commutativity of terms in the Hamiltonian, this model is known to have a quantum phase transition at J = h in one spatial dimension. This phase transition takes the system from the ordered ferromagnetic state to a paramagnetic state. As an illustration of our method, we investigate the dynamics of the expectation value of the local spin operator \langle S^z \rangle near this phase transition for a short spin chain with N = 12 spins. We first use calculations obtained from an MPS with OBC initialized in the ferromagnetic state to generate expectation values for time steps of δ = 0.025/J. We split these time-ordered expectation values into subsets for training and testing the MLP. For our model, we select training windows of p = 4, giving us access to 995 input-output pairs. Of these, we use 110 pairs for training.

FIG. 5. (a) Time evolution of the expectation value \langle S^z \rangle up to τ = 20/J in time steps of δ = 0.01/J for the one-dimensional spin-1/2 XXZ chain with N = 12 sites, ∆ = 1/2, and h = J/2. The MLP used for the regression was constructed with 64 linear-activated neurons using a training window size of p = 4. The selection of training examples comes only from the first 100 time steps (left of the green line). Results for the MLP regression are compared with TEBD results without bond truncation (i.e., exact). (b) The absolute difference ε = |\langle S^z_{MPS} \rangle - \langle S^z_{MLP} \rangle| between TEBD calculations and the MLP regression, shown for each time step.
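For chains much smaller than N = 12, the training series can be generated by exact diagonalization instead of an MPS. The sketch below (an illustration with assumed units J = 1; not the authors' code) builds Eq. (10) and evolves \langle S^z_1 \rangle from the fully polarized state:

```python
import numpy as np

# Spin-1/2 operators (S = sigma/2) used to build Eq. (10) for a small chain.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_op(N, i, op):
    """Embed a single-site operator at site i of an N-site chain."""
    mats = [I2] * N
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def ising_hamiltonian(N, J=1.0, h=1.0):
    """Eq. (10): nearest-neighbor Ising chain in a transverse field, OBC."""
    H = np.zeros((2 ** N, 2 ** N), dtype=complex)
    for i in range(N - 1):
        H -= J * site_op(N, i, sz) @ site_op(N, i + 1, sz)
    for i in range(N):
        H -= h * site_op(N, i, sx)
    return H

# Exact short-time evolution of <S^z_1> from the fully polarized state,
# a small-N stand-in for the MPS-generated training series.
N, delta = 6, 0.025
H = ising_hamiltonian(N)                 # h = J: the critical point
w, V = np.linalg.eigh(H)
psi0 = np.zeros(2 ** N, dtype=complex)
psi0[0] = 1.0                            # |up, up, ..., up>

def evolve(t):
    return V @ (np.exp(-1j * w * t) * (V.conj().T @ psi0))

Sz1 = site_op(N, 0, sz)
series = [np.real(np.vdot(evolve(k * delta), Sz1 @ evolve(k * delta)))
          for k in range(5)]
print(round(series[0], 6))               # 0.5: fully polarized at t = 0
```

The eigendecomposition makes every later time equally cheap here, but the 2^N memory cost is exactly what forces the switch to MPS generation at N = 12 and beyond.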
Our MLP architecture is optimized with the following parameters: one layer of 32 linear-activated neurons, followed by a single layer with one linear-activated neuron for output. Training is carried out by stochastic gradient descent over the cost function given in Eq. (6). Figure 3 shows the results with h = J and maximal bond dimension D = 200. Within the training region, the MLP is trained until it has significant overlap with the MPS calculations. This overlap is seen to continue far past the region of training. Comparison with the MPS results, as shown in Fig. 3(b), reveals that the MLP deviates from the MPS calculations with an average absolute deviation ε = |\langle S^z_{MPS} \rangle - \langle S^z_{MLP} \rangle| on the order of 10^{-3}. We note that the training time and parameters were selected in such a way as to mitigate overfitting for the given number of training examples, which explains why the deviation is relatively high in the training range. Using a standard desktop computer, the time to train the MLP was 415.24 seconds, while the time to predict the rest of the dynamics was 35.87 seconds. Comparatively, at this system size, exact diagonalization calculations took approximately 60 seconds, and TEBD calculations (with fixed bond dimension D = 200) took approximately 0.146 seconds per time step, resulting in a total computation time of 73 seconds. To glean information about the scaling of the computational cost of our approach (demonstrating its advantage for larger systems), we measure the time required to generate short-time TEBD expectation values as input data and add this to the computational time required for training and predicting in the long-time regime for varying system sizes N. This computation time is compared to the total time taken by the TEBD to calculate expectation values over the full time interval τ = 25/J. Figure 4 displays this comparison. It is clear that the scaling of the computational time is more favorable for our method.
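Because every activation is linear, the trained MLP is mathematically equivalent to a single linear map from the p-value window to the next value. The sketch below exploits that equivalence with an ordinary least-squares fit on a synthetic damped-oscillation series (an illustration of the train-short/predict-long procedure; the signal, step size, and fit method are assumptions, not the paper's Keras model or its MPS data):

```python
import numpy as np

# A damped two-mode signal standing in for MPS-generated <S^z>(t); it obeys
# an order-4 linear recurrence, so a linear map on p = 4 past values suffices.
delta, p, n_train = 0.1, 4, 110
t = delta * np.arange(1000)
series = (0.7 * np.cos(2.0 * t) * np.exp(-0.05 * t)
          + 0.3 * np.cos(5.0 * t) * np.exp(-0.02 * t))

# Input-output pairs from sliding training windows (Sec. IV).
X = np.array([series[i:i + p] for i in range(len(series) - p)])
y = series[p:]
w, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)

# Autoregressive long-time prediction: feed each output back in as input.
window = list(series[:p])
pred = list(window)
for _ in range(len(series) - p):
    nxt = float(np.dot(w, window))
    pred.append(nxt)
    window = window[1:] + [nxt]

err = float(np.max(np.abs(np.array(pred) - series)))
print(err < 1e-3)   # the short-time fit extrapolates over the full interval
```

The extrapolation succeeds here because the signal exactly satisfies a low-order linear recurrence; for generic quantum dynamics the fit is only approximate, which is consistent with the finite deviations reported in Fig. 3(b).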
Still, a more interesting result appears if the time required for generating the input data is excluded. As shown in Table I, within this regime of system sizes (having trained until an average deviation of ε ≈ 10^{-3} is achieved), the scaling of the overall computational time for the MLP regression is due primarily to the time necessary to generate input-output training pairs. The computational time necessary for the training and prediction steps in the algorithm appears polynomial (nearly linear), being primarily due to the necessary increase in the number of training examples required to maintain the given deviation ε.

System size (N) | Number of training sets needed (N_train) | Training set generation (seconds) | Training + prediction (seconds)
12 | 110 | 16.06 | 451.11
14 | 120 | 148.8 | 572.47
16 | 140 | 1,069.32 | 594.62
18 | 150 | 3,205.5 | 626.69
20 | 175 | 6,562.5 | 783.55

TABLE I. Dependence of computational times on system sizes for the transverse-field Ising model. The second column shows the number of training examples generated to maintain an average deviation ε ≈ 10^{-3}. The third column shows the computational times to generate the training set of TEBD expectation values. The fourth column shows the computational times required for the training and prediction stages of the MLP regression.

B. XXZ Model
We test another ubiquitous spin system with our MLP regression, namely, the XXZ model. The Hamiltonian governing the evolution of this open-boundary system is given by

H = -J \sum_{i=1}^{N-1} \left( S^x_i S^x_{i+1} + S^y_i S^y_{i+1} + \Delta S^z_i S^z_{i+1} \right) - h \sum_{i=1}^{N} S^x_i ,   (11)

where J and ∆ control the strength of the exchange coupling and the uniaxial anisotropy, respectively, and h is the strength of the transverse field. The transverse and longitudinal exchange couplings are J_⊥ = J and J_z = J∆, respectively. Similar to the Ising model above, the XXZ model exhibits transitions between the paramagnetic and the ferromagnetic phases (J > 0), with critical values at h_c = ±(J_⊥ - J_z) = ±J(1 - ∆). We investigate this model for N = 12 spins with ∆ = 1/2 and h = J/2 (within the paramagnetic phase). Initially, the system is set in the fully polarized ferromagnetic state. We again select a training window of p = 4, producing 1995 input-output pairs. From these, we train over 100 pairs. The MLP is composed of a single layer of 64 linearly activated neurons, followed by an output layer with a single linearly activated neuron. This model is again trained using stochastic gradient descent. Comparing with the results taken from MPS calculations over time intervals δ = 0.01/J, we see in Fig. 5 that the MLP regression agrees with the MPS and continues to do so deep into the testing regime. As shown in Fig. 5(b), the MLP differs from the TEBD calculations by an average absolute difference on the order of 10^{-3}. The time to sufficiently train to the desired accuracy was 200.99 seconds, while the time to predict the rest of the dynamics was 150.14 seconds. Comparatively, exact diagonalization calculations took approximately 60 seconds, and TEBD calculations took approximately 300 seconds per time step at the maximal bond dimension, for a total computation time of approximately 166.67 hours.
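The XXZ Hamiltonian of Eq. (11) can be built the same way as the Ising one; the sketch below (illustrative, with J = 1 and the ∆ = 1/2, h = J/2 values used in the text) also checks the symmetry that the transverse field breaks:

```python
import numpy as np

# Spin-1/2 operators for building Eq. (11) on a small chain.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_op(N, i, op):
    mats = [I2] * N
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def xxz_hamiltonian(N, J=1.0, Delta=0.5, h=0.5):
    """Eq. (11): open-boundary XXZ chain with uniaxial anisotropy Delta
    and transverse field h."""
    H = np.zeros((2 ** N, 2 ** N), dtype=complex)
    for i in range(N - 1):
        H -= J * (site_op(N, i, sx) @ site_op(N, i + 1, sx)
                  + site_op(N, i, sy) @ site_op(N, i + 1, sy)
                  + Delta * site_op(N, i, sz) @ site_op(N, i + 1, sz))
    for i in range(N):
        H -= h * site_op(N, i, sx)
    return H

N = 4
H = xxz_hamiltonian(N)
Sz_tot = sum(site_op(N, i, sz) for i in range(N))

# Without the transverse field, total S^z is conserved ([H, S^z_tot] = 0);
# the h term is what makes the <S^z> dynamics nontrivial.
H0 = xxz_hamiltonian(N, h=0.0)
print(np.allclose(H, H.conj().T),
      np.allclose(H0 @ Sz_tot, Sz_tot @ H0))  # True True
```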
As in the case of the Ising model, for such a small system, exact diagonalization is the most cost-effective method for computing \langle S^z \rangle, but the cost of this method increases exponentially with system size as O(2^N). As previously shown in Table I, the scaling is more favorable for the MLP.

VI. DISCUSSION
By investigating the evolution of the expectation value of operators, we have demonstrated that MLP regression accurately extends calculations in a highly reduced parameter space using very few training examples. To understand the significance of this, a comparison between the computational resources used in TEBD and MLP calculations is presented. For TEBD calculations, long-time dynamics are obtained by determining the state of the system |\Psi_{MPS}\rangle at every time step. For N sites with maximal bond dimension D, this results in using O(poly(N) poly(D)) steps, more specifically, O(2 w^3 N D^3) steps for the sequential application of one- and two-body operators, where w is the matrix dimension of the applied local operator. Computations with this complexity quickly become cumbersome for long times, particularly when the correlation length of the system diverges and the bond dimension D scales exponentially. However, the MLP regression circumvents this computational cost by utilizing a small fixed "memory" of previously generated expectation values as the basis for extending calculations out to long times. As can be seen in Table I, the computational cost of the combined training and prediction steps of the regression is approximately polynomial. To understand how this short "memory" reduces the complexity, consider a training set having N_train examples constructed over training windows (i.e., "memories") of size p, input into a neural network having m neurons. For a given value of p elements, the training phase of the MLP regression has a computational cost which is determined entirely by the neural network model parameters, O(m p N_train). After the training, prediction for later times only has a computational cost of O(1). By generating the first few expectation values with the MPS, the MLP regression is shown to be able to predict long-time operator expectation values with only the addition of a relatively small number of compute cycles.
We conclude that, within the regime of sizes considered in this study, the computational cost of the MLP regression scales remarkably slowly (nearly constant). Though this computational advantage is significant, it is worth noting that the MLP regression can only extend the operator expectation values generated by the MPS. It is not a generative model and therefore cannot calculate operator dynamics without the presence of some initial expectation values. Further work must be done to explore machine learning architectures which can directly generate operator dynamics while maintaining low computational costs. Nonetheless, the results of this work indicate that machine learning techniques continue to provide unforeseen advantages in modeling QMB systems.

ACKNOWLEDGEMENTS
The authors acknowledge partial financial support from NSF grant No. CCF-1844434.

P. Calabrese, F. H. L. Essler, and M. Fagotti, Quantum quench in the transverse field Ising chain, Phys. Rev. Lett. 106, 227203 (2011).
A. Strathearn, P. Kirton, D. Kilda, J. Keeling, and B. W. Lovett, Efficient non-Markovian quantum dynamics using time-evolving matrix product operators, Nat. Commun. 9, 3322 (2018).
S. Paeckel, T. Köhler, A. Swoboda, S. R. Manmana, U. Schollwöck, and C. Hubig, Time-evolution methods for matrix-product states, Ann. Phys. 411, 167998 (2019).
G. Vidal, Efficient simulation of one-dimensional quantum many-body systems, Phys. Rev. Lett. 93, 040502 (2004).
E. Jeckelmann, Dynamical density matrix renormalization group method, Phys. Rev. B 66, 045114 (2002).
M. D. Caio, M. Caccin, P. Baireuther, T. Hyart, and M. Fruchart, Machine learning assisted measurement of local topological invariants, arXiv:1901.03346 (2019).
A. Melnikov, L. Fedichkin, and A. Alodjants, Predicting quantum advantage by quantum walk with convolutional neural networks, New J. Phys. 21, 125002 (2019).
K. Shinjo, S. Sota, S. Yunoki, and T. Tohyama, Characterization of photoexcited states in the half-filled one-dimensional extended Hubbard model assisted by machine learning, Phys. Rev. B 101, 195136 (2020).
Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, Restricted Boltzmann machine learning for solving strongly correlated quantum systems, Phys. Rev. B 96, 205152 (2017).
X. Gao and L.-M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nat. Commun. 8, 662 (2017).
I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Neural-network quantum states, string-bond states, and chiral topological states, Phys. Rev. X 8, 011006 (2018).
G. Montúfar, Restricted Boltzmann machines: Introduction and review, arXiv:1806.07066 (2018).
M. J. Hartmann and G. Carleo, Neural-network approach to dissipative quantum many-body dynamics, Phys. Rev. Lett. 122, 250502 (2019).
G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
D. Hendry and A. E. Feiguin, A machine learning approach to dynamical properties of quantum many-body systems, Phys. Rev. B 100, 245123 (2019).
M. Schmitt and M. Heyl, Quantum many-body dynamics in two dimensions with artificial neural networks, Phys. Rev. Lett. 125, 100503 (2020).
Y. Tian and S. R. White, Matrix product state recursion methods for strongly correlated quantum systems, arXiv:2010.00213 (2020).
T. Barthel, U. Schollwöck, and S. R. White, Spectral functions in one-dimensional quantum systems at T > 0, Phys. Rev. B 79, 245101 (2009).
M. W. Gardner and S. R. Dorling, Artificial neural networks (the multilayer perceptron): A review of applications in the atmospheric sciences, Atmos. Environ. 32, 2627–2636 (1998).
S. B. Kotsiantis, Supervised machine learning: A review of classification techniques, Informatica 31, 249–268 (2007).
T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, in Proceedings of the 21st International Conference on Machine Learning (2004).
C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, Activation functions: Comparison of trends in practice and research for deep learning, arXiv:1811.03378 (2018).
Keras, https://keras.io
J. Strečka and M. Jaščur, A brief account of the Ising and Ising-like models: Mean-field, effective-field and exact results, Acta Phys. Slovaca 65, 235–367 (2015).
F. Franchini, An introduction to integrable techniques for one-dimensional quantum systems (Springer, 2017).
R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Phys. 349, 117–158 (2014).