Nuclear binding energy predictions using neural networks: Application of the multilayer perceptron
Esra Yüksel,a) Derya Soydaner,b) and Hüseyin Bahtiyarc)
Department of Physics, Faculty of Science and Letters, Yildiz Technical University, Davutpasa Campus, 34220 Esenler, Istanbul, Turkey
Department of Statistics, Mimar Sinan Fine Arts University, Bomonti 34380, Istanbul, Turkey
Department of Physics, Mimar Sinan Fine Arts University, Bomonti 34380, Istanbul, Turkey
(Dated: 29 January 2021)
In recent years, artificial neural networks and their applications for large data sets have become a crucial part of scientific research. In this work, we implement the multilayer perceptron (MLP), a class of feedforward artificial neural network (ANN), to predict the ground-state binding energies of atomic nuclei. Two different MLP architectures with three and four hidden layers are used to study their effects on the predictions. To train the MLP architectures, two different sets of inputs are used along with the latest atomic mass table, and the changes in the binding energy predictions are analyzed in terms of the changes in the input channel. It is seen that with appropriate MLP architectures and more physical information in the input channels, the MLP can make fast and reliable predictions for the binding energies of atomic nuclei, which are also comparable to those of the microscopic energy density functionals.

a) Electronic mail: [email protected]
b) Electronic mail: [email protected]
c) Electronic mail: [email protected]

I. INTRODUCTION

One of the major research areas in nuclear physics is the prediction of nuclear masses (binding energies), especially for nuclei far from the stability line with extreme proton-neutron ratios. As the most fundamental property of nuclei, accurate nuclear mass measurements and theoretical predictions are of vital importance not only for nuclear physics but also for nuclear astrophysics. Alongside other nuclear properties (i.e., charge radii, separation energies, decay properties, etc.), nuclear masses can provide information about the nucleon-nucleon interaction and the shell and pairing properties of nuclei, and their precise determination is also crucial for our understanding of the formation of chemical elements heavier than iron in the universe.

In the last decades, nuclear mass measurements have gained acceleration with the developments in experimental facilities. According to the latest atomic mass table, AME2016, the ground-state masses of 3435 nuclei have been measured. However, measurement of the ground-state properties of nuclei close to the drip lines still stands as a challenge. Also, there exist large deviations among theoretical model calculations for nuclei close to the drip lines. Therefore, further studies are needed to make reliable predictions in these regions.

Up to now, microscopic-macroscopic (mic-mac) and microscopic models have been employed for the mapping of the nuclear landscape. Although the microscopic models are more complete in terms of the physics behind them, much better results are obtained using the mic-mac models in nuclear mass predictions, since their constants are determined using the experimental ground-state masses of nuclei. While the root-mean-square (rms) deviation of nuclear masses is generally high (several MeV) using the microscopic models, the FRDM2012 predicts an rms deviation of 0.57 MeV with respect to the AME2003 atomic mass table, and the WS3 model gives an even lower rms deviation (0.336 MeV) for nuclear masses.

Considering the microscopic models, the first complete nuclear mass table was based on the well-known Skyrme Hartree-Fock (HF) method in the non-relativistic framework, and the root-mean-square error was obtained as 0.738 MeV using the 1995 Audi–Wapstra compilation. With further improvements, the HFB-31 mass model also gave a model error of 0.561 MeV for the measured masses of 2353 nuclei.
Using the Gogny HFB method, the rms deviation was obtained as 0.798 MeV with respect to the experimental data for 2149 nuclei. A systematic study was also conducted on 6969 nuclei to predict nuclear properties using the relativistic mean-field model, and the rms deviation for nuclear masses was obtained as 2.1 MeV. According to the latest microscopic mass model based on the relativistic continuum Hartree-Bogoliubov (RCHB) theory, the root-mean-square deviation of the binding energies with respect to the experimental data was obtained at around several MeV. Although considerable progress has been achieved with the mic-mac and microscopic models, the rms deviations are still high, and one needs more accurate results, especially for astrophysical applications. Therefore, different approximations and models are required to understand the discrepancies in the results as well as to make more precise predictions.

In recent years, there has been an increasing amount of interest in artificial neural networks, which are known as nonparametric estimators in Machine Learning (ML). Applications of neural networks cover many areas in science as well as the different branches of physics. Considering the variety and richness of the available experimental data, nuclear physics is also a good candidate for study using neural networks. Long ago, several studies were performed to predict nuclear properties using various techniques. Neural networks were used to predict nuclear mass excesses and neutron separation energies, and it was shown that neural networks can be used as a new tool to predict the properties of atomic nuclei. Later on, nuclear mass defect predictions were made using neural networks, showing that neural networks can be considered powerful tools to explore nuclear properties alongside the theoretical models. The ground-state energies and charge radii of nuclei were also investigated using artificial neural networks, and the usefulness of the method in the predictions was shown. Recently, various machine learning algorithms were used with the latest AME2016 data set to estimate the binding energies of atomic nuclei. Besides, it was shown that deep neural networks can predict ground-state and excited energies as accurately as the nuclear energy density functionals with less computational cost. In recent years, neural network approaches were also used to train the mass residuals of the theoretical models to improve their predictive power, achieving considerable success. To the best of our knowledge, there is no work related to the application of the multilayer perceptron (MLP), a class of feedforward artificial neural network (ANN), to nuclear physics data. Therefore, it would be interesting to investigate the success of this model in the prediction of nuclear properties.

In this work, we implement the multilayer perceptron to predict the total binding energies (BE) of atomic nuclei. We first use the experimental data along with the proton (Z) and mass (A) numbers of the selected nuclei as inputs. Then, we study the effects of increasing the number of hidden layers and inputs on the predictive power of the neural network. Finally, we compare our results with other microscopic and microscopic-macroscopic models to evaluate the success of the MLP in binding energy predictions.

II. MULTILAYER PERCEPTRON
In this study, we aim to create a model that makes ground-state binding energy predictions for atomic nuclei by using input data. The inputs are nuclear properties that can affect the binding energies of nuclei. Our model takes these properties as input and predicts the binding energies as the output. Such problems, where the output is a numerical value, are known as regression problems. Regression is a supervised learning problem where there is an input, X, an output, Y, and the task is to learn the mapping from the input to the output. To this end, in machine learning, we assume a model as shown below:

y = f(x \mid \theta) \qquad (1)

where f(.) is the model and θ are its parameters. In our case, y corresponds to the prediction for the binding energy, and f(.) is the regression function. In the context of machine learning, the parameters θ are optimized by minimizing a loss function. Thus, the predictions are obtained as close as possible to the correct values given in the input data.

We choose the multilayer perceptron (MLP) as the model f(.). The MLP is a neural network architecture that is mostly preferred for solving such regression problems. In the training stage of an MLP, the backpropagation algorithm is used for computing the gradient, while another algorithm is used to perform learning using this gradient. The second algorithm is used for optimization and is usually called the optimizer. In recent years, a new type of algorithm, called the adaptive gradient method, has been preferred as the optimizer. In this study, we use the Adam algorithm to train the MLP.

Basically, an MLP is a feedforward neural network with one or more hidden layers between the input and output layers. In the case of one hidden layer, first, the input x is fed to the input layer. By using an activation function, the activation propagates in the forward direction, and the values of the hidden units z are computed. Each hidden unit usually applies a nonlinear activation function to its weighted sum. After performing the forward pass, an error is computed by using a loss function. By using this error, the weights are updated in the backward pass.

However, an MLP with one hidden layer has limited capacity, and an MLP with multiple hidden layers can learn more complicated functions of the input. That is the idea behind deep neural networks, where each hidden layer combines the values in its preceding layer and learns more complicated functions. It is possible to have multiple hidden layers, each with its own weights and applying the activation function to its weighted sum. It should be noted that different activation functions can be used in multilayer perceptrons, e.g., ReLU, tanh, sigmoid, etc. In this work, we implement both tanh and ReLU, which are two commonly used activation functions in nuclear mass predictions, and find that the ReLU function gives better predictions on the test data. Therefore, we choose the ReLU function as the activation function of the hidden layers in this work, which is also mostly preferred for the hidden layers of deep neural networks:

\varphi(x) = \max(0, x) \qquad (2)

An MLP with three hidden layers is demonstrated in Fig. 1, where w_h, w_l, and w_k are the weights belonging to the first, second, and third hidden layers, respectively. The units on the first, second, and third hidden layers are represented as z_h, z_l, and z_k, and v are the output layer weights.
FIG. 1. The structure of a multilayer perceptron with three hidden layers.

Such an architecture requires four stages to compute the output. First, the input x is fed to the input layer, the weighted sum is computed, and the activation propagates in the forward direction. When the ReLU function is chosen as the activation function, z_h is computed as shown below:

z_h = \mathrm{ReLU}(w_h^T x) = \mathrm{ReLU}\left( \sum_{j=1}^{d} w_{hj} x_j + w_{h0} \right), \quad h = 1, \ldots, H \qquad (3)

The computations for the second hidden layer proceed similarly. At this stage, the second hidden layer activations are computed by taking the first hidden layer activations as their inputs. Then, the third hidden layer activations are computed by taking the second hidden layer activations as their inputs. In a regression problem, there is no nonlinearity in the output layer. Therefore, the output y is computed by taking z as input. Thus, the forward propagation is completed:

z_l = \mathrm{ReLU}(w_l^T z) = \mathrm{ReLU}\left( \sum_{h=1}^{H} w_{lh} z_h + w_{l0} \right), \quad l = 1, \ldots, H \qquad (4)

z_k = \mathrm{ReLU}(w_k^T z) = \mathrm{ReLU}\left( \sum_{l=1}^{H} w_{kl} z_l + w_{k0} \right), \quad k = 1, \ldots, H \qquad (5)

y = v^T z = \sum_{k=1}^{H} v_k z_k + v_0 \qquad (6)

When the MLP goes deeper, one more step is added to these computations for each additional hidden layer. In this study, we implement MLP architectures of two different depths: the first one includes three hidden layers, while the other includes four. Thus, we observe the effect of depth on the binding energy predictions. As we predict the binding energy, i.e., one single numerical value, only one unit exists in the output layer.

In order to determine the MLP architecture, we gradually increase the number of hidden units according to the prediction performance on the data sets. Then, we choose our final architectures. The MLP with three hidden layers includes 32, 16, and 8 hidden units, respectively. It includes 769 (833) parameters for the MLP model with two (four) inputs, which is much smaller than the number of training data. On the other side, the MLP with four hidden layers includes 32, 32, 16, and 8 hidden units. It includes 1825 (1889) parameters for the MLP model with two (four) inputs. In addition to the small number of parameters, our architectures do not overfit for two main reasons. First, the central challenge in machine learning is that the model must perform well on new, previously unseen inputs, not just on those on which it was trained. As seen in our results below, our neural network makes good predictions on the test data. Second, overfitting occurs when the gap between the training loss and the test loss is too large; in our calculations, this gap is very small. For instance, the loss is found as 0.0023 (0.0029) for the training (test) data of the MLP architecture with four hidden layers using four inputs. Therefore, our neural network does not overfit, and it generalizes well on the test data of each data set.

Another important step in creating these architectures is to initialize the layer weights. We initialize them with the Glorot normal initializer, also known as the Xavier normal initializer. Besides, the input data is randomly divided into two subsets: 70.0% for training and 30.0% for testing.
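For concreteness, a minimal sketch of these architectures is given below, assuming a TensorFlow/Keras implementation; the paper does not name a software framework, and the helper build_mlp is our own illustrative construction. The layer sizes, ReLU activations, linear output unit, and Glorot normal initialization follow the text, and the resulting parameter count (1889 for four inputs) matches the number quoted above.

```python
import tensorflow as tf

def build_mlp(n_inputs=4, hidden=(32, 32, 16, 8)):
    # Hypothetical helper: the 32-32-16-8 architecture described in the text,
    # with ReLU hidden activations and Glorot (Xavier) normal initialization.
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_inputs,)))
    for units in hidden:
        model.add(tf.keras.layers.Dense(units, activation="relu",
                                        kernel_initializer="glorot_normal"))
    # Regression output: a single unit with no nonlinearity, as in Eq. (6).
    model.add(tf.keras.layers.Dense(1, kernel_initializer="glorot_normal"))
    return model

model = build_mlp()
model.summary()  # 1889 trainable parameters for four inputs, as in the text
```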
We prefer the mean absolute error as the loss function on the training set X:

E(W, v \mid X) = \frac{1}{n} \sum_{t=1}^{n} \left| r^t - f(x^t) \right| \qquad (7)

where r^t are the desired values and f(x^t) are the predictions for the binding energy.

We train our MLP architectures for 800 epochs using the Adam optimization algorithm to minimize the mean absolute error. The name Adam is derived from adaptive moment estimation. It is an adaptive gradient method that individually adapts the learning rates of the model parameters. During training, this algorithm computes estimates of the first and second moments of the gradients and uses decay constants to update them. Therefore, the Adam algorithm requires hyperparameters called decay constants in addition to the learning rate. In this study, the initial learning rate is 0.001, and the decay constants are 0.9 and 0.999, respectively.
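Continuing the sketch above, the training setup described in this section could look as follows; the random placeholder arrays stand in for the AME2016-derived features and targets, whose preparation the paper does not detail.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real inputs: X rows would hold
# (Z, A, delta, P), and y the experimental binding energies in MeV.
X = np.random.rand(3388, 4).astype("float32")
y = (np.random.rand(3388) * 2000.0).astype("float32")

# Random 70/30 training/testing split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

model = build_mlp()  # defined in the previous sketch

# Adam with the learning rate (0.001) and decay constants (0.9, 0.999)
# quoted in the text; the loss is the mean absolute error of Eq. (7).
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                     beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer, loss="mae")
model.fit(X_train, y_train, epochs=800,
          validation_data=(X_test, y_test), verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))  # test-set MAE
```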
III. RESULTS

In this work, the multilayer perceptron is used to predict the ground-state binding energies of atomic nuclei. Two different MLP architectures and inputs are used to test the effect of the number of hidden layers and inputs on the results. In the input channels, we first use the proton (Z) and mass (A) numbers of nuclei as inputs along with the experimental data from the latest AME2016 mass table to predict binding energies. Therefore, this part of the present work does not include any physical identity or theory, except the proton and mass numbers. In the second part, we also include additional physical inputs to improve the predictive power of our models. These inputs carry information about the shell structure of nuclei, which is in turn related to the nuclear binding energies. One of them is the pairing term δ(Z, N), which is defined as

\delta(Z, N) = \left[ (-1)^Z + (-1)^N \right] / 2. \qquad (8)
The pairing term becomes +1 (−1) for even-even (odd-odd) nuclei, and 0 for other nuclei. A positive value of the pairing term indicates that the nucleus is more bound, while the opposite behavior holds for a negative value. Another input is the promiscuity factor (P) of nuclei, given by

P = \nu_p \nu_n / (\nu_p + \nu_n), \qquad (9)

where ν_{p(n)} is the difference between the actual proton (neutron) number and the nearest magic number. The promiscuity factor is defined as a measure of the valence proton-neutron (p-n) interactions. In this work, the proton and neutron magic numbers are taken as Z = 8, 20, 28, 50, 82, 126 and N = 8, 20, 28, 50, 82, 126, 184, respectively. The latest AME2016 mass table provides data for 3413 nuclei with A ≥ 8.
However, only 2479 of them are obtained experimentally, and the others are estimated using the trends from the mass surface (TMS) in the neighborhood. Although the properties of some nuclei are not obtained directly from experiments, they are expected to provide reasonable data and trends for unknown nuclei. As is well known, having a large amount of data is crucial for training an artificial neural network: using a large collection of data, the performance of an algorithm can be improved significantly. In our model, we make predictions by using only the proton and mass numbers of nuclei alongside some shell effects in the input. Since we do not provide much information to the model in the input channel, we need a large amount of data to make reasonable predictions. Therefore, non-experimental values from the AME2016 mass table are also used in the calculations to increase the performance of the model.
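As a concrete illustration, the two shell-structure inputs of Eqs. (8) and (9) can be computed as in the sketch below; the convention that P = 0 when ν_p + ν_n = 0 (doubly magic nuclei) is our assumption, since the text does not spell out this edge case.

```python
import numpy as np

# Magic numbers used in this work (see the text above).
MAGIC_Z = np.array([8, 20, 28, 50, 82, 126])
MAGIC_N = np.array([8, 20, 28, 50, 82, 126, 184])

def pairing_term(Z, N):
    # Eq. (8): +1 for even-even, -1 for odd-odd, 0 for odd-A nuclei.
    return ((-1) ** Z + (-1) ** N) / 2

def promiscuity_factor(Z, N):
    # Eq. (9): P = nu_p * nu_n / (nu_p + nu_n), where nu_p (nu_n) is the
    # distance of Z (N) to the nearest magic number.
    nu_p = np.min(np.abs(MAGIC_Z - Z))
    nu_n = np.min(np.abs(MAGIC_N - N))
    if nu_p + nu_n == 0:  # assumed convention for doubly magic nuclei
        return 0.0
    return nu_p * nu_n / (nu_p + nu_n)

# Example: 58Ni (Z=28, N=30) is even-even with nu_p = 0, nu_n = 2.
print(pairing_term(28, 30), promiscuity_factor(28, 30))  # 1.0 0.0
```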
FIG. 2. Comparison of the experimental and predicted binding energy differences for the testing set of the three-layer MLP architecture (32-16-8), using (Z, A) (panel (a), σ_rms = 3.98 MeV) and (Z, A, δ, P) (panel (b), σ_rms = 2.16 MeV) as inputs.
FIG. 3. The same as in Fig. 2, but for the four-layer MLP architecture (32-32-16-8): σ_rms = 3.72 MeV with (Z, A) and 1.84 MeV with (Z, A, δ, P) as inputs.

While training the model, light nuclei with N, Z < 10 are not taken into account, and we randomly divide our data set (3388 nuclei in total) into training (70.0%) and testing (30.0%) sets, as usual.

In Figs. 2 and 3, we display the binding energy (BE) differences between the experimental data and the results from the MLP model using two different architectures and inputs. The first architecture has three hidden layers (32-16-8), and we only use the proton and mass numbers of nuclei (Z, A) in the input channel (see Fig. 2(a)). Then, we also add the pairing and promiscuity factors (Z, A, δ, P) to see their effects on the results (see Fig. 2(b)).

The predictions of the MLP with three hidden layers are given in Fig. 2 for the 1017 nuclei in the testing set. Although the root-mean-square deviation (σ_rms) with respect to the experimental data is high, obtained as σ_rms = 3.
98 MeV, the MLP can make reasonable predictions for the nuclear binding energies using only (Z, A) as inputs, and the results are comparable with the predictions of the nuclear energy density functionals. The first thing to notice is the large deviation of the model results for light and heavy nuclei, which can be related to the limited number of experimental data in these regions. Adding more physical information to the input, we find that the predictive power of the MLP increases considerably, as can be seen from Fig. 2(b). By adding the pairing and promiscuity factors to the input channel, the root-mean-square deviation is improved by about 45.73% and obtained as 2.16 MeV for the testing set nuclei. It is also clear that the predictive power of the MLP increases considerably for light and heavy nuclei. To see the performance of the MLP in different regions of the nuclear landscape, we divide the testing set into three parts: light nuclei (Z < 20, 93 nuclei), medium-heavy nuclei (20 ≤ Z ≤ 82, 696 nuclei), and super-heavy nuclei (Z > 82, 228 nuclei). Using (Z, A) as inputs, the σ_rms values are obtained as 8.60, 2.93, and 3.80 MeV for light, medium-heavy, and super-heavy nuclei, respectively. Increasing the number of inputs in the MLP architecture, namely using (Z, A, δ, P) in the input channel, we obtain important improvements in the σ_rms values (see Table I). For instance, the σ_rms value for light nuclei decreases to 3.39 MeV, which corresponds to a 60.58% improvement in the predictions. Besides, the σ_rms values for medium-heavy and super-heavy nuclei decrease to 2.10 and 1.61 MeV, respectively. Using the MLP with four inputs, the model predictions are improved by about 28.32% and 57.63% for medium-heavy and super-heavy nuclei, respectively.

TABLE I. The root-mean-square deviations (σ_rms) in units of MeV for different MLP architectures and inputs. The results of the best MLP architectures are shown in bold. Since the number of parameters is higher than the number of nuclei in the training data set, the results of the two-hidden-layer (64-64) MLP architecture are not presented.

MLP               Input       Z < 20      20 ≤ Z ≤ 82   Z > 82       Testing set
                              93 nuclei   696 nuclei    228 nuclei   1017 nuclei
(32-32)           (A,Z)       7.80        2.80          2.11         3.46
(64-32)           (A,Z)       2.93        2.35          4.50         3.01
(32-32)           (A,Z,δ,P)   4.13        1.95          4.76         3.04
(64-32)           (A,Z,δ,P)   5.00        1.88          3.07         2.61
(32-16-8)         (A,Z)       8.60        2.93          3.80         3.98
(64-8-4)          (A,Z)       6.42        4.07          4.83         4.51
(64-16-8)         (A,Z)       6.22        2.11          2.05         2.75
(32-16-8)         (A,Z,δ,P)   3.39        2.10          1.61         2.16
(64-8-4)          (A,Z,δ,P)   6.75        2.93          4.76         3.90
(64-16-8)         (A,Z,δ,P)   4.60        1.89          2.47         2.40
(32-16-8-4)       (A,Z)       10.37       3.01          2.60         4.20
(32-16-16-8)      (A,Z)       1.98        2.73          2.37         2.60
(32-32-16-8)      (A,Z)       4.89        3.06          4.82         3.72
(32-16-8-4)       (A,Z,δ,P)   5.03        3.22          3.24         3.43
(32-16-16-8)      (A,Z,δ,P)   9.11        2.83          2.27         3.78
(32-32-16-8)      (A,Z,δ,P)   3.03        1.58          1.94         1.84
(32-16-8-8-8)     (A,Z)       4.20        2.56          5.25         3.50
(32-16-16-8-4)    (A,Z)       4.21        2.83          4.88         3.52
(64-32-16-16-8)   (A,Z)       2.87        2.46          5.55         3.44
(32-16-8-8-8)     (A,Z,δ,P)   5.56        2.17          1.35         2.54
(32-16-16-8-4)    (A,Z,δ,P)   10.83       2.98          1.78         4.18
(64-32-16-16-8)   (A,Z,δ,P)   3.83        7.47          2.47         4.92
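For reference, the region-wise figures of merit in Table I follow from the standard root-mean-square deviation; a minimal sketch, assuming arrays of proton numbers and of experimental and predicted binding energies for the testing set, is given below.

```python
import numpy as np

def sigma_rms(be_exp, be_pred):
    # Root-mean-square deviation (MeV) between experiment and prediction.
    be_exp, be_pred = np.asarray(be_exp), np.asarray(be_pred)
    return float(np.sqrt(np.mean((be_exp - be_pred) ** 2)))

def regional_sigma_rms(Z, be_exp, be_pred):
    # Region boundaries as in Table I: light, medium-heavy, super-heavy.
    Z, be_exp, be_pred = map(np.asarray, (Z, be_exp, be_pred))
    masks = {"Z < 20": Z < 20,
             "20 <= Z <= 82": (Z >= 20) & (Z <= 82),
             "Z > 82": Z > 82}
    return {name: sigma_rms(be_exp[m], be_pred[m])
            for name, m in masks.items()}
```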
FIG. 4. Comparison of the experimental and predicted binding energy differences for 956 nuclei between the MLP with three hidden layers (32-16-8) and the (a) UNEDF1, (b) SkM*, (c) FRDM-2012, and (d) WS3 models. The black dashed lines are given to guide the eye.

It is known that increasing or decreasing the number of hidden layers can also affect the predictive power of neural networks. Therefore, we also increase the number of hidden layers to four in the MLP architecture, and the same calculations are repeated to see the effect on the results. In Fig. 3, the binding energy differences between the experimental data and the MLP predictions with four hidden layers are displayed for the 1017 nuclei in the testing set. By increasing the number of hidden layers by one, the predictive power of the MLP model increases, and the σ_rms values are obtained as 3.72 and 1.84 MeV with two (see Fig. 3(a)) and four inputs (see Fig. 3(b)), respectively. Similar to the MLP with three hidden layers, the largest deviations in the binding energies are obtained for light and super-heavy nuclei, and the inclusion of the additional inputs increases the success of the model in these regions. Using the MLP with four hidden layers and two inputs (Z, A), the σ_rms deviations are obtained as 4.89, 3.06, and 4.82 MeV for light, medium-heavy, and super-heavy nuclei, respectively.
FIG. 5. The same as in Fig. 4, but using the four-layer MLP architecture (32-32-16-8).

Adding the pairing and promiscuity factors to the input channels, the results are improved by about 38.03%, 48.37%, and 59.75%, and the σ_rms values are obtained as 3.03, 1.58, and 1.94 MeV for light, medium-heavy, and super-heavy nuclei (see Table I), respectively.

We should mention that different MLP architectures with different numbers of hidden layers or hidden units were also tested to make nuclear mass predictions and to obtain the best MLP model. The results of these tests are also given in Table I. We found that decreasing the number of hidden layers also decreases the predictive power of the results. On the other hand, increasing the number of hidden layers beyond four does not give better results either. In this work, the best results are obtained using the three-layer (32-16-8) and four-layer (32-32-16-8) MLP architectures with four inputs. Our results indicate that using the proper number of hidden layers and inputs in the MLP architecture, we can make fast and reliable predictions for nuclear properties. Since the predictions are better using (Z, A, δ, P) as inputs, we always use the results with four inputs to compare with the other mic-mac and microscopic results in the rest of the paper.

TABLE II. The root-mean-square deviations (σ_rms) in units of MeV for the common 956 nuclei using various models. Here, MLP-3 and MLP-4 represent the three-layer (32-16-8) and four-layer (32-32-16-8) MLP architectures using (Z, A, δ, P) in the input channel. The nuclear binding energy results are taken from the nuclear energy density functionals UNEDF1 and SkM*. The results of the FRDM-2012 and WS3 models are taken from the corresponding references.

           MLP-3   MLP-4   UNEDF1   SkM*   FRDM-2012   WS3
σ_rms      1.97    1.72    2.13     7.81   < 1.0       < 1.0

In Figs. 4 and 5, we also display the binding energy differences between the experimental data and the model results from the MLP architectures with three and four hidden layers using the four (Z, A, δ, P) inputs. Both mic-mac (FRDM-2012 and WS3) and microscopic (UNEDF1 and SkM*) results from the theoretical database Mass Explorer are shown along with the MLP predictions for comparison. It is known that the accuracy of the self-consistent models is not high for nuclear mass predictions, and their root-mean-square deviations are obtained at about several MeV. On the other side, the mic-mac models (FRDM-2012 and WS3) are not self-consistent, and their parameter constants are fitted using the available experimental data, which in turn provides better estimations for the nuclear binding energies, as can be seen from Figs. 4 and 5. The first thing to notice is that the binding energy differences are generally high for light and heavy nuclei in all model predictions. Compared to UNEDF1, the deviation of the SkM* results from the experimental data is quite high and increases with increasing mass number. For the mic-mac models (FRDM-2012 and WS3), the highest deviations are obtained for nuclei in the heaviest mass region. The root-mean-square deviations (σ_rms) are also given in Table II for each model to get a better insight into the success of the models. The best results are obtained for the FRDM-2012 and WS3 models, with rms deviations below 1.0 MeV. While the rms deviation is rather high using the SkM* functional, obtained as 7.81 MeV, the UNEDF1 functional makes better predictions, with an rms deviation of 2.13 MeV.
The root-mean-square deviations are still high using the self-consistent microscopic models in the calculations. Besides, calculations using nuclear energy density functionals are still computationally demanding. It is seen that the MLP model can make reliable predictions that are comparable to the well-known microscopic models, and the σ_rms values are obtained as 1.97 and 1.72 MeV with the three- and four-layer MLP architectures, respectively.

IV. CONCLUSIONS
We implement the multilayer perceptron (MLP) to make ground-state binding energy predictions for atomic nuclei. Two different architectures and inputs are used in the MLP model to study the performance of this neural network in binding energy predictions. In the first case, we only use the proton and mass numbers of nuclei alongside the latest experimental data, and no physical input is included. Then, we add two additional inputs, the pairing and promiscuity factors of nuclei, to give more physical information to the models. We find that using proper hidden layers and units with relevant information in the input channels, the nuclear binding energy predictions using the MLP improve considerably, especially for light and medium-heavy nuclei. For the 1017 nuclei in the testing set, the best root-mean-square deviations are obtained as 2.16 MeV and 1.84 MeV for the three- and four-layer MLP architectures using (Z, A, δ, P) inputs, respectively.

Our findings show that the MLP model can make reasonable predictions for the binding energies of atomic nuclei, and the results are also comparable to other models. Although the MLP does not include any physics theory behind it and is considered a statistical model, it is seen that the model can make fast and reliable predictions with a proper architecture and relevant inputs. In this respect, artificial neural networks can be seen as an alternative tool to other mic-mac and microscopic models.

As future work, we plan to extend our calculations by including more physical quantities in the input to better estimate the nuclear properties. Improving the extrapolation abilities of the neural networks for very neutron-rich nuclei is another challenging task and remains as future work. Besides, the neural network approaches can be used to train the residuals of nuclear properties, which in turn can be helpful to understand the missing physics behind the microscopic models.

REFERENCES

D. Lunney, J. M. Pearson, and C. Thibault, "Recent trends in the determination of nuclear masses," Rev. Mod. Phys., 1021–1082 (2003).
M. R. Mumpower, R. Surman, D.-L. Fang, M. Beard, P. Möller, T. Kawano, and A. Aprahamian, "Impact of individual nuclear masses on r-process abundances," Phys. Rev. C, 035807 (2015).
M. Mumpower, R. Surman, D. L. Fang, M. Beard, and A. Aprahamian, "The impact of uncertain nuclear masses near closed shells on the r-process abundance pattern," Journal of Physics G: Nuclear and Particle Physics, 034027 (2015).
H. Schatz and W.-J. Ong, "Dependence of x-ray burst models on nuclear masses," The Astrophysical Journal, 139 (2017).
D. Martin, A. Arcones, W. Nazarewicz, and E. Olsen, "Impact of nuclear mass uncertainties on the r process," Phys. Rev. Lett., 121101 (2016).
M. Wang, G. Audi, F. G. Kondev, W. Huang, S. Naimi, and X. Xu, "The AME2016 atomic mass evaluation (II). Tables, graphs and references," Chinese Physics C, 030003 (2017).
N. Wang, Z. Liang, M. Liu, and X. Wu, "Mirror nuclei constraint in nuclear mass formula," Phys. Rev. C, 044304 (2010).
M. Liu, N. Wang, Y. Deng, and X. Wu, "Further improvements on a global nuclear mass model," Phys. Rev. C, 014333 (2011).
P. Möller, W. D. Myers, H. Sagawa, and S. Yoshida, "New finite-range droplet mass model and equation-of-state parameters," Phys. Rev. Lett., 052501 (2012).
S. Goriely, N. Chamel, and J. M. Pearson, "Skyrme-Hartree-Fock-Bogoliubov nuclear mass formulas: Crossing the 0.6 MeV accuracy threshold with microscopically deduced pairing," Phys. Rev. Lett., 152503 (2009).
S. Goriely, S. Hilaire, M. Girod, and S. Péru, "First Gogny-Hartree-Fock-Bogoliubov nuclear mass model," Phys. Rev. Lett., 242501 (2009).
M. Stoitsov, J. Dobaczewski, W. Nazarewicz, and P. Borycki, "Large-scale self-consistent nuclear mass calculations," International Journal of Mass Spectrometry, 243–251 (2006).
J. Erler, N. Birge, M. Kortelainen, W. Nazarewicz, E. Olsen, A. Perhac, and M. Stoitsov, "The limits of the nuclear landscape," Nature, 509–512 (2012).
A. V. Afanasjev and S. E. Agbemava, "Covariant energy density functionals: Nuclear matter constraints and global ground state properties," Phys. Rev. C, 054310 (2016).
S. E. Agbemava, A. V. Afanasjev, D. Ray, and P. Ring, "Global performance of covariant energy density functionals: Ground state observables of even-even nuclei and the estimate of theoretical uncertainties," Phys. Rev. C, 054320 (2014).
G. Audi, A. Wapstra, and C. Thibault, "The AME2003 atomic mass evaluation: (II). Tables, graphs and references," Nuclear Physics A, 337–676 (2003), the 2003 NUBASE and Atomic Mass Evaluations.
S. Goriely, F. Tondeur, and J. Pearson, "A Hartree-Fock nuclear mass table," Atomic Data and Nuclear Data Tables, 311–381 (2001).
G. Audi and A. Wapstra, "The 1995 update to the atomic mass evaluation," Nuclear Physics A, 409–480 (1995).
S. Goriely, N. Chamel, and J. M. Pearson, "Further explorations of Skyrme-Hartree-Fock-Bogoliubov mass formulas. XVI. Inclusion of self-energy effects in pairing," Phys. Rev. C, 034337 (2016).
L. Geng, H. Toki, and J. Meng, "Masses, Deformations and Charge Radii—Nuclear Ground-State Properties in the Relativistic Mean Field Model," Progress of Theoretical Physics, 785–800 (2005).
X. Xia, Y. Lim, P. Zhao, H. Liang, X. Qu, Y. Chen, H. Liu, L. Zhang, S. Zhang, Y. Kim, and J. Meng, "The limits of the nuclear landscape explored by the relativistic continuum Hartree–Bogoliubov theory," Atomic Data and Nuclear Data Tables, 1–215 (2018).
S. Akkoyun, T. Bayram, S. O. Kara, and A. Sinan, "An artificial neural network application on nuclear charge radii," Journal of Physics G: Nuclear and Particle Physics, 055106 (2013).
T. Bayram, S. Akkoyun, and S. O. Kara, "A study on ground-state energies of nuclei by using neural networks," Annals of Nuclear Energy, 172–175 (2014).
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
R. Utama and J. Piekarewicz, "Refining mass formulas for astrophysical applications: A Bayesian neural network approach," Phys. Rev. C, 044308 (2017).
Z. Niu and H. Liang, "Nuclear mass predictions based on Bayesian neural network approach with pairing and shell effects," Physics Letters B, 48–53 (2018).
L. Neufcourt, Y. Cao, W. Nazarewicz, and F. Viens, "Bayesian approach to model-based extrapolation of nuclear observables," Phys. Rev. C, 034318 (2018).
L. Neufcourt, Y. Cao, W. Nazarewicz, E. Olsen, and F. Viens, "Neutron drip line in the Ca region from Bayesian model averaging," Phys. Rev. Lett., 062502 (2019).
K. Gernoth, J. Clark, J. Prater, and H. Bohr, "Neural network models of nuclear systematics," Physics Letters B, 1–7 (1993).
S. Gazula, J. Clark, and H. Bohr, "Learning and prediction of nuclear stability by neural networks," Nuclear Physics A, 1–26 (1992).
S. Athanassopoulos, E. Mavrommatis, K. Gernoth, and J. Clark, "Nuclear mass systematics using neural networks," Nuclear Physics A, 222–235 (2004).
M. U. Anil, T. Malik, and K. Banerjee, "Nuclear binding energy predictions based on machine learning," arXiv:2004.14196 (2020).
R.-D. Lasseri, D. Regnier, J.-P. Ebran, and A. Penon, "Taming nuclear complexity with a committee of multilayer neural networks," Phys. Rev. Lett., 162502 (2020).
E. Alpaydın, Introduction to Machine Learning (MIT Press, 2014).
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, 533–536 (1986).
D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 (2014).
A. Idini, "Statistical learnability of nuclear masses," arXiv:1904.00057 (2019).
H. F. Zhang, L. H. Wang, J. P. Yin, P. H. Chen, and H. F. Zhang, "Performance of the Levenberg-Marquardt neural network approach in nuclear mass prediction," Journal of Physics G: Nuclear and Particle Physics, 045110 (2017).
X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," 14th International Conference on Artificial Intelligence and Statistics, 315–323 (2011).
X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256 (2010).
R. F. Casten and N. V. Zamfir, "The evolution of nuclear structure: the N_pN_n scheme and related correlations," Journal of Physics G: Nuclear and Particle Physics, 1521–1552 (1996).
M. W. Kirson, "Mutual influence of terms in a semi-empirical mass formula," Nuclear Physics A, 29–60 (2008).
M. Kortelainen, J. McDonnell, W. Nazarewicz, P.-G. Reinhard, J. Sarich, N. Schunck, M. V. Stoitsov, and S. M. Wild, "Nuclear energy density optimization: Large deformations," Phys. Rev. C, 024304 (2012).
"Mass Explorer," http://massexplorer.frib.msu.edu/content/dftmasstables.html (2020).
J. Bartel, P. Quentin, M. Brack, C. Guet, and H.-B. Håkansson, "Towards a better parametrisation of Skyrme-like effective forces: A critical study of the SkM force," Nuclear Physics A386, 79–100 (1982).