Machine learning quantum mechanics: solving quantum mechanics problems using radial basis function networks
Peiyuan Teng∗
Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
In this article, machine learning methods are used to solve quantum mechanics problems. The radial basis function (RBF) network in a discrete basis is used as the variational wavefunction for the ground state of a quantum system. Variational Monte Carlo (VMC) calculations are carried out for some simple Hamiltonians. The results are in good agreement with theoretical values. The smallest eigenvalue of a Hermitian matrix can also be acquired using VMC calculations. New results are provided to demonstrate that machine learning techniques are capable of solving quantum mechanical problems.
I. INTRODUCTION
Machine learning theory has been developing rapidly in recent years. Machine learning techniques have been successfully applied to solve a variety of problems, such as email filtering, optical character recognition (OCR), and natural language processing, and have become a part of everyday life. In the physical sciences, researchers are also applying machine learning methods to explore new possibilities. For example, machine learning methods are used in molecular dynamics[1][2], as a way to bypass the Kohn-Sham equation in density functional theory[3], to assist in materials discovery[4], or to identify phase transitions[5]. Considering the power of machine learning, it is interesting to consider solving quantum mechanics problems using machine learning methods.

Artificial neural networks (ANNs)[6], which are inspired by biological neural networks, are one of the most important methods in machine learning theory. An ANN consists of a network of artificial neurons; examples of ANNs include feedforward neural networks[7], radial basis function (RBF) networks[8], and restricted Boltzmann machines[9]. As universal approximators[10][11], ANNs can be used to represent functions, and it is possible to use an ANN as a representation of the wavefunction of a quantum system.

Researchers have been trying to combine neural network theory and quantum mechanics, for example, using a neural network in real space to solve differential equations, especially the Schrödinger equation with some specific potential[12]. Another example is the quantum neural network[13], where information in an ANN is processed quantum mechanically. One of the most promising works is the recent research by Carleo and Troyer in Ref.[14], where a restricted Boltzmann machine was used as the variational Monte Carlo (VMC) ground state wavefunction. In their work, the ground state of a many-body system could be efficiently represented by a neural network.

∗ [email protected]
Following their work, other possibilities were also explored. Most recently, in Ref.[15], a three-layer feedforward neural network was used to calculate the ground state energy of the Bose-Hubbard model. Machine learning methods were shown to be able to distinguish between different phases, even for systems with the sign problem[16]. VMC methods do not suffer from the fermion sign problem; therefore, using a neural network as a VMC ansatz is very promising and has the potential to tackle calculations that are almost impossible in other Monte Carlo methods.

In this article, the possibility of using an RBF network to represent the wavefunction of a quantum-mechanical system is discussed. Our work is new in two major aspects. First, the representation power of the RBF network is illustrated, which has not been discussed in the physics literature. Second, instead of a lattice system, where the dimension of the Hilbert space of each site is finite, a general quantum-mechanical system with infinite or continuous degrees of freedom is discussed. A binary restricted Boltzmann machine is not sufficient for the simulation of such a system; therefore, it is interesting to search for a new ansatz. An RBF network is one of the candidates.

In our work, a VMC procedure is formulated, where an RBF network is used as the variational wavefunction. A harmonic oscillator in a linear potential and a particle in a box with a linear potential are then used as benchmarks. Furthermore, we discuss the possibility of using the VMC method to solve for the lowest eigenvalue of a matrix.

This article is organized as follows. In section II, artificial neural network theory and variational Monte Carlo theory are reviewed. Section III contains the major results, that is, quantum mechanical problems are solved using the radial basis neural network. In section IV, we discuss some related questions.

II. ARTIFICIAL NEURAL NETWORK THEORY AND THE VARIATIONAL MONTE CARLO METHOD
In this section, the two cornerstones of this work will be introduced: artificial neural network theory and the variational Monte Carlo method.
A. Artificial neural network theory
Inspired by the biological neural network model, ANN theory was proposed by McCulloch and Pitts in 1943[6], in an attempt to give a mathematical description of the biological nervous system. Figure 1 illustrates a simple example of a neural network which consists of three layers of artificial neurons.
FIG. 1. An illustration of an artificial neural network. A typical neural network consists of three layers of neurons: the input layer, the hidden layer, and the output layer. Each neuron is represented by a circle. The lines between layers are associated with the parameters of the neural network.
Neural networks are widely used tools in machine learning theory, for example, as a function approximation tool in supervised learning. The goal is to find the optimal parameters by minimizing the cost function. This can be a highly non-trivial problem when there are a large number of parameters. For algorithms such as back-propagation, please see Ref.[7].

In a typical machine learning problem using neural network methods, the input neurons can take binary values. For example, in a handwritten digit recognition problem, each input neuron corresponds to a pixel in a figure and takes a value of 0 or 1. The input values are processed through the neural network using, for example, the rules mentioned above. The output values of the neural network are compared with the objective values, and the error is minimized by finding the optimal parameters.

In this article, the radial basis function (RBF) network is used as a variational wavefunction ansatz. For example, for a three-layer RBF network with a single output neuron, the output function z(x) of the neural network can be written as

z(x) = Σ_{i=1}^{M} a_i ρ_i(‖x − c_i‖).  (1)

In this output function, a_i and c_i are parameters of the neural network, x is the input vector, which has the same dimension as c_i, and M is the number of neurons in the hidden layer. ρ(‖·‖) is the radial basis function, which can be a Gaussian function with a Euclidean norm,

ρ_i(‖x − c_i‖) = e^{−|b_i| ‖x − c_i‖²},  (2)

or an exponential absolute value function,

ρ_i(‖x − c_i‖) = e^{−|b_i| ‖x − c_i‖}.  (3)

Other activation functions, such as multiquadrics,

ρ_i(‖x − c_i‖) = (‖x − c_i‖² + |b_i|²)^{1/2},  (4)

or inverse multiquadrics,

ρ_i(‖x − c_i‖) = (‖x − c_i‖² + |b_i|²)^{−1/2},  (5)

are also commonly used in the machine learning community. These activation functions can also be understood as kernel functions.
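The output function of Eqs. (1)-(2) can be sketched in a few lines. This is an illustrative Python snippet (the paper's own code is written in C++; the function and parameter names here are ours), showing how the Gaussian RBF output z(x) is evaluated:

```python
import numpy as np

def rbf_output(x, a, b, c):
    """z(x) = sum_i a_i * exp(-|b_i| * ||x - c_i||^2), cf. Eqs. (1)-(2).

    x : input vector, shape (d,)
    a : hidden-to-output weights a_i, shape (M,)
    b : spread parameters b_i (entering only through |b_i|), shape (M,)
    c : hidden-neuron centers c_i, shape (M, d)
    """
    sq_dist = np.sum((c - x) ** 2, axis=1)  # ||x - c_i||^2 for every hidden neuron
    return float(np.sum(a * np.exp(-np.abs(b) * sq_dist)))
```

With a single hidden neuron centered exactly at the input, z(x) reduces to a_1, and the output decays rapidly as x moves away from the centers, which is the property exploited later for wavefunction amplitudes.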
In the activation functions, |b_i| are parameters that control the spread of the activation function. Other activation functions are also possible; discussions about the activation function can be found in Ref.[17].

In addition to the RBF network, many different types of neural networks can be constructed, such as restricted Boltzmann machines or autoencoders, which are widely used in deep learning technology. The universal approximation theorem establishes the mathematical foundation of neural network theory; it states that neural network functions are dense in the space of continuous functions defined on a compact subset of R^n, under some assumptions about the activation function and given enough hidden neurons[10][11].

In this paper, the RBF network is used as a variational wavefunction represented in a discrete eigenbasis. Note that we use |b_i| as a variational parameter in our calculations instead of a constant number as in a regular RBF network. The absolute value of b_i is taken for the stability of the optimization.

When neural network methods are applied to quantum physics, the inputs of the neural network can take discrete quantum numbers. After being processed through the neural network, the outputs of the neural network represent the amplitudes of the wavefunction on the basis labeled by the input quantum numbers. The neural network is then trained by minimizing the energy expectation value. For example, for a three-dimensional quantum harmonic oscillator in an orthogonal coordinate system, we can use a neural network with three input neurons, where each input can take integer values from 0 to ∞. The trained neural network should represent the ground state of this system, in which, after proper normalization, the output should be 1 given (0, 0, 0) as the input, and 0 for other inputs.

B. Variational Monte Carlo method (VMC)
The VMC method, first proposed by McMillan in 1965[18], combines the variational method and the Monte Carlo method in order to evaluate the ground state of a quantum system. Starting from a Hamiltonian Ĥ and a variational wavefunction |ψ(λ)⟩, where λ is a set of variational parameters, the energy expectation value can be written as

E(λ) = ⟨ψ(λ)|Ĥ|ψ(λ)⟩ / ⟨ψ(λ)|ψ(λ)⟩.  (6)

This energy expectation value can be computed using the widely known Metropolis algorithm[19], which is one of the most efficient algorithms in computational science. As a Markov chain Monte Carlo method, it is currently among the few efficient approaches for evaluating a high-dimensional integral.

The next step of the VMC method is to minimize the energy in the parameter space. This can be a difficult problem when there are many variational parameters. Two examples of such algorithms are the linear method[20] and the stochastic reconfiguration method[21]. The minimization algorithm gives the minimum of the energy in the parameter space, and it is reasonable to use this value as our approximation for the ground state energy. For a detailed review of the VMC method, please refer to Ref.[22].

Currently, physicists believe that the accuracy of the VMC method depends, to a great extent, on a proper choice of the variational wavefunction; therefore, it is important to choose a wavefunction based on physical intuition or a physical understanding of the system. This belief may not hold in the age of machine learning. Neural network functions are capable of approximating unknown functions by maximizing or minimizing an objective function. It would be interesting to further explore the possibility of using a neural network function as the variational wavefunction of a quantum system.

III. SOLVING QUANTUM MECHANICS PROBLEMS USING ARTIFICIAL NEURAL NETWORKS
In the pioneering work of Carleo and Troyer[14], a restricted Boltzmann machine (RBM) was used as a variational wavefunction for many-body systems. The transverse-field Ising model and the anti-ferromagnetic Heisenberg model were benchmarked using the RBM wavefunction, and variational Monte Carlo calculations were carried out. Their results demonstrate that a neural network wavefunction is capable of capturing the quantum entanglement of the ground states and giving an accurate estimation of the ground state energy.

In this article, we continue developing this idea of using artificial neural network functions as the ground state variational wavefunction. In Ref.[14], the restricted Boltzmann machine is only binary-valued; we will demonstrate the representation power of a neural network wavefunction without this constraint. In addition, we discuss the possibility of using a neural network wavefunction to solve a generic quantum mechanics problem. This VMC method is at least as accurate as perturbation theory.
A. Theoretical outline
Consider a quantum system with a countable basis. An arbitrary state |ψ⟩ in the Hilbert space can be represented by

|ψ⟩ = Σ_{n₁,n₂,...,n_p} ψ(n₁, n₂, ..., n_p) |n₁, n₂, ..., n_p⟩,  (7)

where |n₁, n₂, ..., n_p⟩ is a set of basis states labeled by quantum numbers n_i, i = 1, ..., p, and p is the number of sites in the system. For example, for the Heisenberg model, p represents the number of spins; for a three-dimensional harmonic oscillator in Cartesian coordinates, we could use n₁, n₂, n₃ to label the three quantum numbers. ψ(n₁, n₂, ..., n_p) is the amplitude of |ψ⟩ on the basis state |n₁, n₂, ..., n_p⟩. We can interpret this amplitude as a function of n₁, n₂, ..., n_p. A similar ansatz is also used in Ref.[15].

This function can be represented by a neural network with one output neuron. Using an RBF network, the amplitude function can be written as

ψ(n₁, n₂, ..., n_p; a, b, c) = Σ_{i=1}^{M} a_i ρ_i(‖n − c_i‖),  (8)

where n represents the array of quantum numbers and

ρ_i(‖n − c_i‖) = e^{−|b_i| ‖n − c_i‖²}.  (9)

One reason to choose this neural network is that the Gaussian activation function guarantees that the amplitude does not diverge when n → ∞.

Practically, it is useful to truncate the quantum number n_i if its range is countably infinite. This is not necessary for a spin-half lattice system, since n_i can only take two values. For a harmonic oscillator, however, we may truncate the quantum number at some finite value. The universal approximation theorem is only valid on a compact space.
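For a single quantum number, the truncated amplitude function of Eqs. (8)-(9) can be tabulated and normalized directly. The following Python sketch (an assumed helper for illustration, not taken from the paper) evaluates ψ(n) on n = 0, ..., n_max − 1 for a Gaussian RBF network with one input neuron:

```python
import numpy as np

def rbf_amplitudes(a, b, c, n_max):
    """psi(n) = sum_i a_i exp(-|b_i| (n - c_i)^2), Eqs. (8)-(9), one input neuron.

    a, b, c : arrays of shape (M,) holding the variational parameters
    n_max   : truncation of the quantum number, n = 0..n_max-1
    """
    n = np.arange(n_max, dtype=float)
    # shape (n_max, M): the Gaussian of each hidden neuron evaluated at each n
    rho = np.exp(-np.abs(b)[None, :] * (n[:, None] - c[None, :]) ** 2)
    psi = rho @ a
    return psi / np.linalg.norm(psi)  # normalize so that sum_n psi(n)^2 = 1
```

Because of the Gaussian decay, the amplitudes far beyond the hidden-neuron centers are exponentially small, which is what makes the truncation at n_max harmless in practice.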
This truncation will also facilitate numerical simulations.

Using this variational wavefunction, the energy expectation value is

E(λ) = ⟨ψ(λ)|H|ψ(λ)⟩ / ⟨ψ(λ)|ψ(λ)⟩ = ∫ |ψ(n; λ)|² E_local(n; λ) dn / ∫ |ψ(n; λ)|² dn,  (10)

with

E_local(n; λ) = ⟨n|H|ψ(λ)⟩ / ⟨n|ψ(λ)⟩ = Σ_{n′} ⟨n|H|n′⟩ ⟨n′|ψ(λ)⟩ / ⟨n|ψ(λ)⟩.  (11)

Here, λ represents all the variational parameters, for example, a_i, b_i, and c_i.

The energy expectation value can be evaluated using the Metropolis algorithm. After initialization and thermalization, repeat these two steps until equilibrium: (1) generate a move from configuration n to n″; (2) accept or reject the move with probability min(1, |⟨n″|ψ(λ)⟩/⟨n|ψ(λ)⟩|²). Expectation values of other operators can be evaluated similarly.

Compared with exact diagonalization, one advantage of this formalism is that the matrix element ⟨n|H|n′⟩ is never stored explicitly. Only the non-zero matrix elements need to be evaluated and summed during the sampling process.

The energy as a function of the parameters λ can be minimized using, for example, the stochastic reconfiguration method[21]. In the stochastic reconfiguration method, an operator

O_i(n) = ∂_{λ_i} ψ_λ(n) / ψ_λ(n)  (12)

can be defined for each parameter in the variational wavefunction. For a radial basis neural network with the Gaussian basis function,

O_{a_i}(n) = ρ_i / ψ,  (13)

O_{b_i}(n) = −(a_i b_i / |b_i|) ‖n − c_i‖² ρ_i / ψ,  (14)

O_{c_{ij}}(n) = 2 a_i |b_i| (n_j − c_{ij}) ρ_i / ψ,  (15)

where c_{ij} is the j-th component of c_i.
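The two-step sampling rule above, together with the parameter update it feeds (the covariance matrix and forces introduced next), can be sketched compactly. This is a toy Python illustration, not the paper's C++ implementation: the ±1 trial move on a truncated one-dimensional basis is an assumption borrowed from the harmonic-oscillator example later in the text, and a constant diagonal regularization stands in for the decaying schedule of Ref.[14]. The update is written here as a descent step, λ → λ − α S⁻¹F:

```python
import numpy as np

def metropolis_energy(psi, h_row, n_max, n_samples=20000, seed=0):
    """Estimate E = <psi|H|psi>/<psi|psi> by sampling |psi(n)|^2, Eqs. (10)-(11).

    psi(n)   -> real amplitude on basis state |n>
    h_row(n) -> list of (n', <n|H|n'>) over the non-zero matrix elements
    """
    rng = np.random.default_rng(seed)
    n, e_sum = 0, 0.0
    for _ in range(n_samples):
        n_trial = n + rng.choice([-1, 1])      # random +-1 trial move
        if 0 <= n_trial < n_max:               # moves outside the truncation are rejected
            # accept with probability min(1, |psi(n')/psi(n)|^2)
            if rng.random() < min(1.0, (psi(n_trial) / psi(n)) ** 2):
                n = n_trial
        # local energy E_loc(n) = sum_{n'} <n|H|n'> psi(n') / psi(n)
        e_sum += sum(h * psi(m) for m, h in h_row(n)) / psi(n)
    return e_sum / n_samples

def sr_update(lam, O, e_local, alpha=0.05, reg=1e-4):
    """One stochastic-reconfiguration step from samples of O_i(n) and E_local(n).

    O: (n_samples, n_params) array; e_local: (n_samples,) array.
    """
    O_mean = O.mean(axis=0)
    S = (O.conj().T @ O) / len(O) - np.outer(O_mean.conj(), O_mean)  # covariance matrix
    F = (O.conj() * e_local[:, None]).mean(axis=0) \
        - e_local.mean() * O_mean.conj()                             # forces
    S = S + reg * np.diag(np.diag(S))   # constant diagonal regularization (cf. r(k) of [14])
    return lam - alpha * np.linalg.solve(S, F).real  # descent step lam -> lam - alpha S^-1 F
```

For a diagonal toy Hamiltonian with energies n + 1/2 and a trial wavefunction concentrated near n = 0, the sampled energy reproduces the exact Rayleigh quotient to within the Monte Carlo error.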
The covariance matrix and forces are defined as

S_{ij} = ⟨O*_i O_j⟩ − ⟨O*_i⟩⟨O_j⟩,  (16)

F_i = ⟨E_local O*_i⟩ − ⟨E_local⟩⟨O*_i⟩.  (17)

The parameters can be updated by

λ′_j = λ_j − α Σ_i (S⁻¹)_{ji} F_i.  (18)

Here, ⟨·⟩ is the expectation value of an operator, and α can be understood as the learning rate of the optimization algorithm. A regularization, S′_{ii} = S_{ii} + r(k) S_{ii}, is applied to the diagonal elements of the matrix S in all our calculations, where r(k) = max(100 × 0.9^k, 10⁻⁴)[14]. This process iterates until the optimization converges, and we treat the converged energy as our best approximation of the ground state energy.

In this article, the method mentioned above is used for the optimization. We note the recent work of Saito[15], in which a feedforward neural network was successfully used to represent the ground state of the Bose-Hubbard model. In their work, an exponential function was applied to the output of the feedforward neural network. It is an interesting question whether an exponential of a feedforward neural network output function can be used to represent a quantum mechanical wavefunction.

B. One dimensional quantum harmonic oscillator in an electric field
To start, we benchmark the quantum harmonic oscillator. Since we use a set of discrete quantum numbers to describe the variational wavefunction, it is natural to use the energy eigenbasis of the unperturbed harmonic oscillator to calculate the matrix elements. Consider the one dimensional Hamiltonian

H = p̂²/2 + x̂²/2 + E x̂ = H₀ + E x̂,  (19)

where E is a parameter that can be understood as the electric field. Using natural units, it is easy to see that the ground state energy of H₀ is 0.5. With the eigenstates of H₀ labeled by |n⟩, the variational ansatz for the ground state of H can be approximated by

|ψ⟩ = Σ_{n=0}^{n_max−1} ψ(n) |n⟩,  (20)

with ψ(n) represented by an RBF network with one input neuron, and we truncate the quantum number at n_max − 1. In this notation, the RBF network represents the function ψ(n). The variable n can take different values; for example, if n = 1, the output of the neural network is the coefficient on the basis state |1⟩, which is ψ(1). The neural network represents the function ψ, and the coefficient on the basis state |n⟩ is represented by ψ(n).

We use the VMC procedure described in Section III A to conduct the calculation. The parameters are initialized randomly. Our codes are written in C++, where the matrix-solving library Eigen[23] is used for the stochastic reconfiguration. Sample codes will be available at https://github.com/peiyuanteng.

A neural network with random parameters is first created, and then the ground state energy under one set of parameters is calculated using the Monte Carlo method. The state space of the Monte Carlo sampling is a truncated discrete space denoted by n. Specifically, our quantum numbers are those of the unperturbed Hamiltonian H₀, and the basis is the eigenbasis of H₀; we are trying to solve for the ground state of the perturbed Hamiltonian. A random plus or minus move is generated for each sample and accepted using the Metropolis algorithm. In this work, a random move that yields a quantum number below zero or above n_max − 1 is rejected. For a quantum number n, we can plug it into the neural network and get its amplitude. During the Monte Carlo process, 50000 samples are used. Being able to calculate the energy, we can then use the stochastic reconfiguration method to find the minimal energy, and we treat this energy as our best approximation of the ground state energy.

In Figure 2, we illustrate the minimization of the ground state energy during the iteration process using the Gaussian basis function (Eq. 2). The same learning rate is used for all runs; m denotes the number of neurons in the hidden layer. Alternatively, we can use the exponential absolute value function as the RBF (Eq. 3).
Under the same learning rate, this RBF network also converges to the correct eigenvalue; see Figure 3. It is easy to see that the Gaussian RBF network behaves better than the exponential one. Based on our experience, the Gaussian network also performs better in other cases; therefore, we use the Gaussian network in the later examples.

Remarks:
We use n as the variable of the variational wavefunction; ψ(n) is a function of a discrete variable. It should not be confused with the method that uses a Gaussian function in the coordinate representation as the variational wavefunction, which is trivial. One reason that we compare Eq. 2 and Eq. 3 is to demonstrate that this method is capable of giving the correct coefficients regardless of the radial basis function.

FIG. 2. Minimization of the ground state energy of H at E = 0, using the Gaussian radial basis network. m is the number of neurons in the hidden layer.

FIG. 3. Minimization of the ground state energy of H at E = 0, using Eq. 3 as the radial basis function. m is the number of neurons in the hidden layer.

Figure 4 illustrates the behavior of the VMC under different electric fields. In our simulation, a separate neural network is trained for each E. The theoretical value of the ground state energy is e_g = 0.5 − E²/2. The VMC results converge to the exact values (0.375, 0, and −1.5 for E = 0.5, 1.0, 2.0) to within the quoted statistical errors under the chosen n_max; in this section n_max = 20. Expectation values and errors in this article are calculated when the optimization is saturated.

Notice that during the optimization process, the sampled ground state energy may have some spikes. The author believes that this phenomenon is a result of the stochastic nature of the optimization algorithm. Random fluctuations of the expectation values of the operators and the complicated structure of the energy function may lead to drastic changes in the ground state energy during the optimization process.

FIG. 4. Minimization of the ground state energy of H at E = 0.5, 1.0, 2.0, using the Gaussian radial basis function.

Figure 5 shows ψ(n) as a function of n under different E. ψ(n) is normalized, and its value gives the overlap between the new ground state of H and the energy eigenstate |n⟩ of H₀. Theoretically, one can calculate

ψ(n) = ∫_{−∞}^{∞} (1/√(2ⁿ n!)) (1/π)^{1/2} e^{−(x−E)²/2} e^{−x²/2} H_n(x) dx,  (21)

where H_n(x) are the Hermite polynomials. Simplifying this expression, we get

ψ(n) = (1/√(2ⁿ n!)) Eⁿ e^{−E²/4}.  (22)

It can be seen that the VMC values agree very well with the exact values when E is small. Errors begin to increase when E gets larger. Based on these results, we claim that the radial basis neural network clearly captures the behavior of the 1D quantum harmonic oscillator.

C. Two dimensional quantum harmonic oscillator in an electric field
Similarly, we can consider a radial basis neural network with several input neurons. For example, with two input neurons, we can consider a two-dimensional quantum harmonic oscillator in an electric field. Consider the Hamiltonian

H = p̂_x²/2 + p̂_y²/2 + x̂²/2 + ŷ²/2 + E_x x̂ + E_y ŷ = H₀ + E_x x̂ + E_y ŷ.  (23)

It is easy to see that the ground state energy of H₀ is 1.0. We will treat E_x and E_y as our parameters.

FIG. 5. ψ(n) as a function of n at E = 0.0, 0.5, 1.0, 2.0, using the Gaussian radial basis function. Circles represent theoretical values and asterisks represent the values from the RBF network.
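The closed form of Eq. (22) shown in Fig. 5 can be cross-checked without any Monte Carlo sampling, by diagonalizing the truncated H₀ + E x̂ in the oscillator eigenbasis. The following Python sketch (the truncation n_max = 30 is our choice for illustration, not a value from the paper) uses ⟨n|x̂|n+1⟩ = √((n+1)/2):

```python
import numpy as np
from math import factorial

def oscillator_in_field(E, n_max=30):
    """Ground energy and |coefficients| of H = H0 + E*x in the truncated |n> basis."""
    n = np.arange(n_max)
    H = np.diag(n + 0.5)                        # H0 |n> = (n + 1/2) |n>
    off = np.sqrt((n[:-1] + 1) / 2.0)           # <n|x|n+1> = sqrt((n+1)/2)
    H = H + E * (np.diag(off, 1) + np.diag(off, -1))
    evals, evecs = np.linalg.eigh(H)
    return evals[0], np.abs(evecs[:, 0])        # ground energy, |psi(n)|

def psi_exact(n, E):
    """Closed form of Eq. (22): psi(n) = E^n e^{-E^2/4} / sqrt(2^n n!)."""
    return E**n * np.exp(-E**2 / 4) / np.sqrt(2.0**n * factorial(n))
```

At E = 1 the lowest eigenvalue reproduces e_g = 0.5 − E²/2 = 0, and the ground-state coefficients match Eq. (22) to machine precision, confirming the benchmark values that the RBF-network VMC is trained to reproduce.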
Our neural network wavefunction can be written as

|ψ⟩ = Σ_{n_x, n_y = 0}^{n_max − 1} ψ(n_x, n_y) |n_x, n_y⟩.  (24)

We can use the same VMC procedure as in the previous part to perform the calculation. The learning rate in this case is set at 0.2; our neural network has 10 hidden neurons and 2 input neurons. The algorithm used for this 2D example is similar to that of the 1D harmonic oscillator.

Figures 6 and 7 illustrate the behavior of the trained neural network at different electric fields. From the shape of the surface, we can see that a proper choice of n_max is important for the accuracy of this method. The reason is that, in this example, when E_x and E_y get larger, the bump in the function ψ(n) shifts away from the origin. The states beyond n_max are not considered; therefore, the accuracy will be affected if the overlaps beyond n_max are large. In these figures, we choose n_max = 10 to illustrate the influence of n_max on the accuracy.

The exact value of ψ(n_x, n_y) can be solved as

ψ(n_x, n_y) = (1/√(2^{n_x} n_x!)) (1/√(2^{n_y} n_y!)) E_x^{n_x} E_y^{n_y} e^{−E_x²/4} e^{−E_y²/4}.  (25)

Table I lists a sample of the relation between n_max and the VMC energy at E_x = 4.0, E_y = 2.0. We can see that in this example the accuracy of the results improves with n_max. Figure 8 shows ψ(n_x, n_y) as a function of n_y at different n_x, with E = (1.0, 1.0).

D. Particle in a box
Another example that is benchmarked is a particle in a box with a perturbation.

FIG. 6. ψ(n_x, n_y) as a function of n_x + 1, n_y + 1 at E_x = 1.0, E_y = 1.0, using the Gaussian radial basis function. In this figure, ψ(n_x, n_y) is not normalized.

FIG. 7. ψ(n_x, n_y) as a function of n_x + 1, n_y + 1 at E_x = 4.0, E_y = 2.0, using the Gaussian radial basis function. In this figure, ψ(n_x, n_y) is not normalized.

FIG. 8. ψ(n_x, n_y) as a function of n_y at different n_x with E_x = 1.0, E_y = 1.0. Circles represent exact values and asterisks represent the values from the RBF network. In this figure, ψ(n_x, n_y) is normalized.

TABLE I. The relation between n_max and the VMC energy at E_x = 4.0, E_y = 2.0. The VMC energy converges toward the exact ground state energy, 1.0 − (E_x² + E_y²)/2 = −9.0, as n_max increases.

Consider the Hamiltonian

H = p̂²/2 + V(x) + a x̂ = H₀ + a x̂,  (26)

with V(x) = 0 when 0 < x < 1 and V(x) = ∞ when x takes other values. a x̂ is a linear potential defined on 0 < x < 1, with a as a parameter. In natural units, the ground state energy of H₀ is π²/2 = 4.9348. First-order perturbation theory gives a correction of a/2, and second-order perturbation theory gives a correction of approximately −0.0022 a².

A radial basis neural network VMC simulation can be carried out similarly. As always, we choose the basis to be the eigenbasis of H₀; 50000 samples are used, ten hidden neurons (m = 10) are chosen in our calculation, and n_max is set at 20. The relevant matrix elements are

⟨n₁|a x̂|n₂⟩ = a ((−1)^{n₁+n₂} − 1) 4 n₁ n₂ / ((n₁ − n₂)² (n₁ + n₂)² π²),  (27)

when n₁ ≠ n₂, and

⟨n|a x̂|n⟩ = 0.5 a,  (28)

when n₁ = n₂.

In Figure 9, the convergence of the VMC ground state energy at different parameters is illustrated. Intermediate points with a value larger than 20 are set to 20 to maintain the scale of the graph. Notice that we get more spikes during the iteration when a is small. The heights of the spikes decrease if smaller learning rates are used.

Table II compares the results of the RBF network VMC, perturbation theory up to second order, and the exact results. The exact ground state energy values are calculated using Mathematica. We can see that the VMC performs much better than first-order perturbation theory and converges to a ground state energy that is very close to the exact ground state energy.

FIG. 9. Minimization of the ground state energy of H at a = 0.0, 2.0, 4.0, 8.0, −8.0.

TABLE II. First-order perturbation theory, second-order perturbation theory, the VMC energy, and the exact value for several values of a. At a = 0.0, all methods agree at 4.9348.

E. Neural network as a Hermitian matrix lowest eigenvalue solver
So far, the examples that have been benchmarked can all be solved by perturbation theory. Can the neural network VMC method have a wider application than perturbation theory? In this part, we illustrate the possibility of using an RBF network VMC method to solve for the smallest eigenvalue of a Hermitian matrix. This problem is non-perturbative and purely mathematical, and our result implies that neural network VMC can have a much broader scope than the perturbation method.

Consider an n × n Hermitian matrix H. The eigenvector that corresponds to the lowest eigenvalue is an n dimensional vector. We can write this eigenvector as

x⃗ = Σ_{i=1}^{n} ψ(i) î,  (29)

and any vector in this finite vector space can be written in this form. Define the objective function to be

E = x⃗† H x⃗,  (30)

with x⃗ normalized. Then the smallest value of E corresponds to the lowest eigenvalue of H, and our goal is to find a set of parameters in the neural network ψ that minimizes E.

We can convert the matrix multiplication in E into a discrete sum, which can be evaluated using the Metropolis algorithm. Instead of the energy eigenbasis, in this situation, we can choose our configuration space to be n points, where n is the dimension of the vector x⃗, and the trial move is from basis vector î to î′. Therefore, we can use the same VMC technique to minimize E. Our previous examples can essentially be understood in this way, since our Hamiltonians are truncated to finite dimensional matrices.

To give a concrete implementation of this idea, we consider a matrix

H(d)_{pq} = 1/p + 1/q.  (31)

Here, H(d) is a d × d matrix, p and q are the indices of H(d), and the matrix element on the p-th row and q-th column equals 1/p + 1/q.

We use the RBF network ansatz to calculate the lowest eigenvalue of H(d). The number of hidden neurons is set at 20, 50000 samples are chosen, the iteration runs for 300 steps, and the learning rate is 0.01. Table III shows the results of our VMC simulation.
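The test matrix of Eq. (31) is small enough that its spectrum can be checked directly. This Python sketch (ordinary dense diagonalization, not the VMC procedure) builds H(d) and returns its lowest eigenvalue, which for d = 2 should reproduce the −0.0811 entry of Table III:

```python
import numpy as np

def lowest_eigenvalue(d):
    """Smallest eigenvalue of the d x d matrix H(d)_pq = 1/p + 1/q, Eq. (31)."""
    p = np.arange(1, d + 1, dtype=float)
    H = 1.0 / p[:, None] + 1.0 / p[None, :]  # H_pq = 1/p + 1/q (indices start at 1)
    return np.linalg.eigvalsh(H)[0]          # eigvalsh returns ascending eigenvalues
```

Note that H(d) has rank two (it is a sum of two rank-one terms), so it has exactly one negative eigenvalue, which is the target of the VMC minimization.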
TABLE III. VMC results for the lowest eigenvalue of H(d). For d = 2, the exact value is −0.0811 and the VMC result is −0.0811 within the quoted error.

Our optimized neural network also yields the eigenvector that corresponds to the lowest eigenvalue. The components can be acquired by plugging each i into ψ(i). For example, when n = 10, the VMC gives an eigenvector V⃗₁ = (0.6851, 0.1174, −0.0711, −0.1646, −0.2200, −0.2562, −0.2813, −0.2994, −0.3127, −0.3226), while the exact vector is V⃗₂ = (0.6807, 0.1194, −0.0677, −0.1613, −0.2174, −0.2548, −0.2816, −0.3016, −0.3172, −0.3297). The Euclidean norm of the error is |V⃗₁ − V⃗₂| ≈ 1.1 × 10⁻².

We also calculated the relation between the accuracy and m (the number of neurons in the hidden layer). For d = 10, the variational energy was computed for several values of m, starting at m = 5.

Caveat: The learning rate depends on the number of hidden neurons, and it has to be set by trial and error. We also have to point out that when d > 10, the VMC optimization procedure may converge slowly or fail to converge. The stability also depends on the form of H. For some large ill-conditioned matrices, it is expected that the random sampling process will not capture all the matrix elements, leading to inaccurate results.

IV. DISCUSSION
Is it possible to use an RBF network with continuous variables as the variational wavefunction? This is possible for certain Hamiltonians. For example, we can use an RBF network with a Gaussian basis as the variational wavefunction for the ground state of a harmonic oscillator. Based on our tests, although this ansatz works perfectly for the harmonic oscillator, the iteration may not converge to the correct ground state when applied to other models. The harmonic-oscillator test is trivial, since its ground state is intrinsically a Gaussian function. For wavefunctions with continuous variables, Kato's cusp condition[24] poses strong constraints on the mathematical form of the wavefunction. A wavefunction that does not satisfy this condition will result in strong numerical instability in the VMC calculation.

How is this approach useful? This approach provides a new way to find the ground state energy of a quantum system. Compared with a traditional variational Monte Carlo simulation, this method does not require choosing a specific wavefunction from our intuition. Does this method depend on choosing a basis |n⟩? The example of the diagonalization of a Hermitian matrix illustrates that it does not, although a good basis may improve the accuracy and stability.

One advantage of ANN-based VMC is that the code is easy to modularize. When programming, we can write the modules for the neural network, the Hamiltonian, and the optimization separately. For the same Hamiltonian, we can also compare the representation power of different neural networks and different optimization methods. This greatly reduces programming difficulties and improves accuracy.

A potential issue with the neural network VMC method is that the optimization algorithm may fail to find the global minimum of the objective function. This is a common issue in machine learning methods.
We have seen that stochastic reconfiguration may not work well enough to find the smallest eigenvalue of a matrix of arbitrarily large dimension. Therefore, finding a stable algorithm or a stable neural network mathematical form for the VMC optimization is a crucial task. If successful, the neural network VMC method may give numerical conclusions to many unsolved problems in quantum physics.

Based on the above points, one important research direction is to develop more efficient VMC optimization algorithms. Another interesting direction is to discuss the representation power of different neural networks, since there are a variety of neural networks developed by the machine learning community. For example, one interesting problem is the representation power of a continuous restricted Boltzmann machine[25]. With a Gaussian activation function, a continuous restricted Boltzmann machine has some similarities with the RBF network ansatz discussed in this paper. It is promising to provide more accurate results due to the elegant mathematical structure of the restricted Boltzmann machine.

V. CONCLUSION
In this article, RBF networks are used as the variational wavefunction for quantum systems, and VMC calculations are carried out. For the examples that are examined, the VMC results agree well with theoretical predictions. Furthermore, it is possible to use the VMC method to calculate the lowest eigenvalue of a Hermitian matrix.
ACKNOWLEDGEMENT
Great thanks should be given to Dr. Yuan-Ming Lu for his helpful discussions and comments. I also want to thank the Ohio State University Physics Department for supporting my study. This work is supported by the startup funds of Dr. Yuan-Ming Lu at the Ohio State University.