Symmetry-adapted graph neural networks for constructing molecular dynamics force fields
Zun Wang, Chong Wang, Sibo Zhao, Shiqiao Du, Yong Xu, Bing-Lin Gu, Wenhui Duan
State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
Department of Physics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
Institute for Advanced Study, Tsinghua University, Beijing 100084, China
Frontier Science Center for Quantum Information, Beijing 100084, China
RIKEN Center for Emergent Matter Science (CEMS), Wako, Saitama 351-0198, Japan

(Dated: January 11, 2021)

Molecular dynamics is a powerful simulation tool to explore material properties. Most realistic material systems are too large to be simulated with first-principles molecular dynamics. Classical molecular dynamics has a lower computational cost but requires accurate force fields to achieve chemical accuracy. In this work, we develop a symmetry-adapted graph neural network framework, named molecular dynamics graph neural network (MDGNN), to construct force fields automatically for molecular dynamics simulations of both molecules and crystals. This architecture consistently preserves translation, rotation and permutation invariance in the simulations. We propose a new feature-engineering method that includes higher-order contributions and show that MDGNN accurately reproduces the results of both classical and first-principles molecular dynamics. We also demonstrate that force fields constructed by the model have good transferability. MDGNN therefore provides an efficient and promising option for molecular dynamics simulations of large-scale systems with high accuracy.
I. INTRODUCTION
Molecular dynamics (MD) is an indispensable simulation method in physics, chemistry, biology and materials science. While current first-principles MD can feasibly compute material systems of small sizes, the computational cost is formidable for large-scale simulations [1]. On the other hand, the computational cost of classical MD is typically smaller than that of first-principles MD by orders of magnitude, but the accuracy depends heavily on the precision of the force fields, which describe the interactions between atoms. Traditional construction of force fields starts with predetermined functional forms whose parameters are then fitted to free energies of different atomic configurations [2-4]. This process relies on expert knowledge of computational quantum chemistry and generally cannot be automated.

Recently, machine learning methods have emerged as an efficient tool for handling tasks where human labor was thought to be indispensable. In particular, with the tremendous expressive power [5, 6] and generalization ability of deep neural networks, reconstructing accurate force fields with fewer data points becomes possible. Several successful schemes that reproduce force fields by machine learning have been proposed, including the Behler-Parrinello neural networks [7], gradient-domain machine learning [8], the deep tensor neural networks [9] and deep potential molecular dynamics [10-12]. The free energy of a system is manifestly invariant under rotation, translation and permutation of atoms of the same kind. This point, although trivial to human beings, is not transparent to a neural network if the Cartesian coordinates are used directly as inputs. To optimize the outcome of force fields constructed by machine learning, it is necessary to expose all the known symmetries to the neural networks. For this purpose, the Behler-Parrinello neural networks [7] construct a set of many-body symmetry functions of the coordinates as the input, which can only handle a small number of distinct elements. Moreover, the loss function of the Behler-Parrinello neural networks [7] includes only the contribution of the energies, which often traps the training process in a local minimum [13, 14]. Gradient-domain machine learning [8] and its variant, symmetric gradient-domain machine learning [15], are kernel-based statistical learning models in which the coordinates are mapped to the ordered eigenvalues of the Coulomb matrix. The input features of the deep tensor neural networks [9], which were proposed for small molecules, are charges and distances expanded in a Gaussian basis. Deep potential molecular dynamics (DeePMD) [10-12, 16] has been developed as a systematic pipeline to reduce data preprocessing, but the coordinates still need to be transformed into local spherical coordinate frames and ordered with certain rules to preserve all the symmetries. The reason for the heavy data preprocessing in these models is that graph-structured data cannot be processed directly by fully connected neural networks or statistical learning methods. A straightforward and intuitive method of characterizing atomic configurations is still lacking.

Recently, graph neural networks (GNNs) have shown great potential in learning from graph-structured data and have become an increasingly important tool in machine learning [17-22]. SchNet [14] is a GNN-based framework whose inputs are embeddings of charges and Gaussian expansions of distances.
DimeNet [23] is another novel GNN model whose features are inspired by solutions of the time-independent Schrödinger equation. Innovatively, DimeNet [23] explicitly introduces angular information into the GNN with directed graphs. However, these two models have only been applied to small molecules. The tensor embedded atom network (TeaNet) [24] is a GNN-based framework proposed to construct force fields for crystals. Similar to iterative relaxation, TeaNet repeats its interaction block several times to predict ground-state energies. Since the interaction blocks share the same parameters, TeaNet is actually an original GNN [17] framework in analogy with recursive neural networks [25-27], which probably tends to suffer from over-smoothing issues [28].

In this work, we propose a symmetry-adapted GNN framework, named Molecular Dynamics Graph Neural Network (MDGNN), to construct accurate force fields for both molecules and crystals. Compared to traditional GNNs, which preserve permutation invariance, MDGNN additionally imposes restrictions to respect translation and rotation invariance. With a new feature-engineering method proposed to introduce higher-order effects, we show that this architecture can reproduce the potential-energy surface (PES) up to the accuracy of first-principles methods for both molecules and crystals. The transferability of MDGNN-constructed force fields is demonstrated by simulating materials that MDGNN was not trained with. Compared to first-principles MD and even classical MD, MDGNN is computationally more efficient. With minimal data preprocessing, MDGNN automates the construction of force fields and is a computationally efficient tool to explore material properties.

II. RESULTS

A. Architecture
With atoms identified as vertices and interatomic interactions identified as edges, a GNN can naturally describe a collection of atoms, and all the symmetries mentioned previously can be enforced automatically. In this work we focus on undirected graphs $\mathcal{G}$. The goal of the GNN is to obtain hidden-state embeddings of each node by graph perception: with the neighbors of node $i$ passing their messages to the center, the hidden state of node $i$ comes to contain the information from its neighborhood. Gilmer et al. [29] proposed a recapitulative formula for GNNs,

$$\mathbf{x}_i^{(k)} = \gamma_{\boldsymbol{\Theta}}^{(k)}\Big(\mathbf{x}_i^{(k-1)},\ \square_{j \in \mathcal{N}(i)}\ \phi_{\boldsymbol{\Theta}}^{(k)}\big(\mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}, \mathbf{e}_{ij}^{(k-1)}\big)\Big), \quad (1)$$

where the superscript $k$ labels the $k$-th GNN layer, and $\mathbf{x}_i$ and $\mathbf{x}_j$ are the features of nodes $i$ and $j$, respectively. $\mathbf{e}_{ij}$ denotes the feature of the edge connecting nodes $i$ and $j$. $\phi_{\boldsymbol{\Theta}}$ and $\gamma_{\boldsymbol{\Theta}}$ are the message-passing function and the updating function, respectively; their specific forms can be expressed as multi-layer perceptrons (MLPs). $\mathcal{N}(i)$ denotes the neighborhood of node $i$, and $\square$ is the aggregation operator (e.g. sum or average), which ensures the permutation invariance of the graph data. In fact, the formula implements a spatial convolution on the graph, which is why the architecture is termed a graph convolutional network (GCN) [30]. The formula should be viewed as a two-step procedure, i.e., a message-passing step and an updating step. In the message-passing step, neighboring nodes pass their information to the central node (the convolution center). The central node then aggregates the incoming information and updates its hidden state accordingly. The final hidden states of each node are obtained after several GNN layers and are used to carry out node-level tasks. In our model, a graph-level task is conducted with a final readout layer [31, 32].

The architecture of the model is drawn schematically in Fig. 1(a). The input of the framework is a graph transformed from a frame of the molecular dynamics trajectory. Periodic boundary conditions are always adopted, even for isolated molecules, where vacuum layers are inserted between repeated images. The inputs of the computational graph are only the embeddings of the atomic types and the coordinates, without any data preprocessing. The neural network is expected to predict the potential energy of the corresponding configuration, which should satisfy translation, rotation and permutation invariance. As mentioned above, GNNs enforce permutation invariance by construction. To further ensure translation and rotation invariance, the distances $R_{ij}$ between the atoms are computed as the edge features. It must be emphasized that, due to the periodic boundary conditions, the distance $R_{ij}$ is calculated by taking the minimum of the distances between node $i$ and the periodic images of node $j$. To save computational cost, only interatomic interactions within a fixed truncation radius $r_c$ are calculated. To ensure that the distance between two atoms is unique, the minimum image convention requires the shortest lattice parameter of the supercell to be larger than $2r_c$. The input edge features of the GCN0 layer, which describe the atomic positions, are the concatenation of $\sin[l(R_{ij}/r_c)^m/m!]$ and $\cos[l(R_{ij}/r_c)^m/m!]$, where $l = 1, 2, \dots, L$ and $m = 1, 2, \dots, M$.
These features include high-order terms of $R_{ij}$ and take values from $-1$ to $1$. Besides the above features, the embeddings [33] of the two relevant atomic types are also concatenated as inputs to the next block of the neural network. There are other quantities that could be used to distinguish the atomic types, such as charges or masses; we find that including these features does not improve MDGNN's performance. This means that these features only distinguish the types of atoms, while the specific parameters of the force fields are learned by training and the principal contributions to the interactions between atoms come from the distances. We choose summation as the aggregation operation to ensure permutation invariance. Note that summation is the only aggregation that makes physical sense here, since forces and energies are additive. The above process can be viewed as a GCN layer with physically motivated constraints, which is denoted as the GCN0 layer in Fig. 1(a). In a single GCN layer, messages from a node's 1-hop neighborhood are aggregated to the center [30]. Similarly, after $k$ GCN layers, the central node will have collected messages within a $k$-hop distance [34]. The network comprises several graph convolutional [30] layers, and the last GCN layer outputs local energies. Nonlinear activations are applied to all but the final convolutional layer. Note that the hidden-state embeddings are element-wise added every two GCN layers, similar to residual connections [35], which allows the GCN to go deeper without suffering from over-smoothing [28]. The global add pooling simply adds the local potential energies of all atoms, yielding the potential energy of the supercell. To ensure energy conservation, we obtain forces by taking the gradients of the local energies with respect to the input coordinates through automatic differentiation of the network and the preprocessing operations. This practice is only justified when only conservative forces are involved in the dynamics; notable exceptions include magnetic fields, for which MDGNN cannot be used to perform MD simulations.

FIG. 1. The architecture diagram of MDGNN. (a) The overall structure of the model. The input of the model is a molecule graph or a crystal graph transformed from a graph-structured frame of a molecular dynamics trajectory. Processed by a translation- and rotation-preserving graph convolutional network (GCN) layer named GCN0 and $N$ GCN layers, the node features of the output graph represent the local energies of the center atoms. These features are element-wise added after every two GCN layers, as in residual networks. The total energy is obtained by summing over all the atomic features directly. The specific local energies are not required and the model is trained end-to-end to predict the total energy of the supercell. To obey energy conservation, the forces on each atom are the derivatives of the total energy with respect to the corresponding coordinates. (b) The crystal graph is built with periodic boundary conditions and the minimum image convention. For an isolated molecule, 15 Å vacuum layers are built for each dimension, and then the molecule graph is equivalent to a crystal graph. (c) The details of the GCN0 layer, which preserves translation and rotation invariance. The inputs of GCN0 are the embeddings of the atomic types and high-order features of $R_{ij}$ of atomic pairs. After several multi-layer perceptrons (MLPs), higher-dimensional node features are obtained.
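To make the construction of GCN0 concrete, the following is a minimal sketch of how such a symmetry-preserving input layer could be written with PyTorch and PyTorch-Geometric, the libraries used in this work. The function and class names (min_image_distances, radial_features, GCN0), the orthorhombic-cell assumption and the layer sizes are illustrative choices, not the authors' released code.

    import math
    import torch
    from torch import nn
    from torch_geometric.nn import MessagePassing

    def min_image_distances(pos, edge_index, cell):
        # Minimum image convention for an orthorhombic cell (`cell` holds the three
        # edge lengths); distances are unique when the shortest length exceeds 2*r_c.
        d = pos[edge_index[0]] - pos[edge_index[1]]
        d = d - cell * torch.round(d / cell)
        return d.norm(dim=-1)                                  # R_ij, shape [n_edges]

    def radial_features(r_ij, r_c, L=40, M=4):
        # Concatenated sin[l (R_ij/r_c)^m / m!] and cos[l (R_ij/r_c)^m / m!] for
        # l = 1..L, m = 1..M; every feature lies in [-1, 1] (自动 normalized inputs).
        feats = []
        for m in range(1, M + 1):
            base = (r_ij / r_c) ** m / math.factorial(m)
            for l in range(1, L + 1):
                feats += [torch.sin(l * base), torch.cos(l * base)]
        return torch.stack(feats, dim=-1)                      # [n_edges, 2*L*M]

    class GCN0(MessagePassing):
        """Input layer: permutation invariance comes from the sum aggregation;
        translation/rotation invariance because only distances R_ij are used."""
        def __init__(self, n_types, n_hidden=256, L=40, M=4):
            super().__init__(aggr='add')                       # summation aggregation
            self.embed = nn.Embedding(n_types, n_hidden)       # atomic-type embeddings
            self.L, self.M = L, M
            self.mlp = nn.Sequential(                          # 3 hidden layers, 256 units
                nn.Linear(2 * L * M + 2 * n_hidden, n_hidden), nn.Softplus(),
                nn.Linear(n_hidden, n_hidden), nn.Softplus(),
                nn.Linear(n_hidden, n_hidden))

        def forward(self, z, pos, edge_index, cell, r_c):
            x = self.embed(z)
            r_ij = min_image_distances(pos, edge_index, cell)
            e = radial_features(r_ij, r_c, self.L, self.M)
            return self.propagate(edge_index, x=x, edge_attr=e)

        def message(self, x_i, x_j, edge_attr):
            # concatenate the two type embeddings with the radial edge features
            return self.mlp(torch.cat([x_i, x_j, edge_attr], dim=-1))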
B. Training details

The dataset comprises frames from MD simulations: atom types, atomic coordinates, potential energies and forces at multiple time steps. Atomic types and interatomic distances are the only inputs to the model. The dataset, which is transformed into molecular or crystal graphs, is randomly shuffled and split into a training set, a validation set and a test set with a certain ratio. The mean value of the energies of the dataset is subtracted from the energies of all frames. With only atomic embeddings and interatomic distances as initial features, the symmetry-adapted MDGNN preserves rotation and translation symmetry. MDGNN is programmed with the PyTorch [36, 37] and PyTorch-Geometric [38] Python libraries. The neighborhood of each center atom is selected by the nearest-neighbors algorithm implemented in the Scikit-Learn [39] library. The periodic boundary conditions and the minimum image convention are enforced in the construction of the neighbor graph and the edge values. In the feature engineering, we choose $L = 40$ and $M = 4$ to create radial features describing the interactions between linked atoms.

Neural networks with more than one hidden layer are more expressive and thus more capable of learning representations of the data [40]. For the GCN0 layer, we use 3 hidden layers, each with 256 hidden units. The number of hidden units is 256 for the GCN$_i$ ($i = 1, 2, \dots, N-1$) layers and 1 for the GCN$_N$ layer. Except for GCN$_N$, each hidden layer is followed by a nonlinear activation function. It must be emphasized that if too many GCN layers are used, the GNN will suffer from over-smoothing [28], meaning that the hidden states on all nodes become similar because each node in the graph contains the information of all the other nodes due to the aggregation operation of the GCN. With residual connections alleviating this problem [41], we can train deeper GCNs. We find that the performance of our model is limited if fewer GCN layers are utilized, since they cannot see "far enough". In theory, an infinitely wide two-layer network with a very large cutoff radius should be sufficiently expressive, but in practice it is much more practical, efficient and effective to train a network with more layers and a smaller cutoff radius. Therefore, in our model, 8 GCN layers with residual connections are used to construct the MDGNN framework.

The loss function of the model is defined as the weighted sum of the mean absolute errors (MAEs) of energy and force,

$$\mathcal{L} = \gamma \mathcal{L}_E + (1-\gamma)\mathcal{L}_F,$$
$$\mathcal{L}_E = \frac{1}{\mathcal{N}_{\mathrm{atoms}}} \sum_{i=1}^{\mathcal{N}_{\mathrm{atoms}}} |E_i - \hat{E}_i|,$$
$$\mathcal{L}_F = \frac{1}{3\mathcal{N}_{\mathrm{atoms}}} \sum_{i=1}^{\mathcal{N}_{\mathrm{atoms}}} \sum_{\alpha=x,y,z} |F_{i\alpha} - \hat{F}_{i\alpha}|, \quad (2)$$

where $\mathcal{N}_{\mathrm{atoms}}$ is the number of atoms, $E_i$ and $F_i$ are the energies and forces predicted by MDGNN, and $\hat{E}_i$ and $\hat{F}_i$ are the true energies and forces. The subscript $\alpha$ is the Cartesian index, and $\gamma$ is the weight coefficient that adjusts the relative contributions of the two loss terms during training; it is a hyper-parameter that must be set manually. We find that with $\gamma = 0.2$ and the units of the energies and forces taken as eV and eV/Å, respectively, the loss of MDGNN converges rapidly.

Because the forces are obtained as derivatives of the predicted potential energy and also enter the loss function, the operations in MDGNN must be doubly differentiable (have a non-trivial second-order derivative) in order to perform back-propagation to train the network. For this reason, we use the doubly differentiable Softplus function $\zeta(x) = \log(1+\exp(x))$ [42-44] as the nonlinear activation. The network is trained with the Adam [45] optimizer.

To demonstrate the performance of MDGNN on both molecular and crystal structures, we first compare the performance of SchNet [14] and MDGNN on MD17, a public molecular dynamics dataset of eight small molecules. We then compare MDGNN with DeePMD and TeaNet on two crystals, copper and silicon dioxide. Finally, we demonstrate the transferability of MDGNN with aluminum-copper compounds.
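The force computation and the loss of Eq. (2) can be sketched as follows, assuming a `model` that maps a graph with positions `data.pos` to the scalar total energy; `create_graph=True` is what keeps the force term differentiable a second time. The function names and the per-structure energy handling are illustrative, not the authors' code.

    import torch

    def energy_and_forces(model, data):
        # Forces are (negative) gradients of the predicted total energy with respect
        # to the atomic positions; create_graph=True retains the computation graph
        # so the force loss itself can be back-propagated (double differentiability).
        data.pos.requires_grad_(True)
        energy = model(data)                                   # scalar total energy
        forces = -torch.autograd.grad(energy, data.pos,
                                      create_graph=True)[0]    # shape [n_atoms, 3]
        return energy, forces

    def loss_fn(energy, forces, energy_ref, forces_ref, gamma=0.2):
        # Eq. (2): gamma-weighted sum of the energy and force MAEs (eV, eV/Angstrom)
        n_atoms = forces.shape[0]
        loss_e = torch.abs(energy - energy_ref) / n_atoms
        loss_f = torch.abs(forces - forces_ref).sum() / (3 * n_atoms)
        return gamma * loss_e + (1 - gamma) * loss_f

    # One optimization step with Adam:
    # opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # energy, forces = energy_and_forces(model, batch)
    # loss = loss_fn(energy, forces, batch.energy, batch.forces)
    # opt.zero_grad(); loss.backward(); opt.step()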
C. Molecular and crystal dataset

The MD17 dataset [8] is a molecular dynamics dataset of small molecules. All trajectories are calculated at a temperature of 500 K with a time resolution of 0.5 fs. The potential energy and force labels are computed with the PBE+vdW-TS method. The goal of this benchmark is to demonstrate the performance of the MDGNN model on molecular systems. A separate MDGNN model is trained for each dataset, which is randomly split into a 1000-frame training set, a 1000-frame validation set and a test set containing the remaining frames. The learning rate is set to $10^{-4}$. Vacuum layers of 15 Å are inserted in each dimension for all the molecules, and the truncation radius is set to 9 Å to construct complete graphs. With $L$ and $M$ set to 40 and 4, respectively, the final results are shown in Table I; 56.25% of the results obtained by MDGNN outperform those obtained by SchNet [14].

While most proposed machine-learning-based methods are suitable solely for small molecules, MDGNN constructs force fields for both molecules and crystals. To demonstrate the performance of MDGNN on crystals, we compare our results for copper (Cu) and silicon dioxide (SiO₂) with those from DeePMD and TeaNet. The Cu dataset is randomly shuffled and split into a training set and a test set with the ratio 9:1, while the SiO₂ dataset is randomly shuffled and split into a training set, a validation set and a test set with the ratio 7:1.5:1.5. The learning rates are set to $10^{-4}$ and the cutoff radius of MDGNN is set to 4.0 Å for both Cu and SiO₂. With $L$ and $M$ set to 40 and 4, respectively, the MAEs of these models are shown in Table II. MDGNN outperforms DeepPot-SE (DeePMD) and TeaNet on the crystal structures except for the energy MAE of Cu. Although the energy MAE of Cu for MDGNN is larger than that of DeePMD, the error is still far below the chemical accuracy.

TABLE I. Comparison between the MAEs of SchNet and MDGNN trained on the MD17 datasets using 1000 training samples (energies in meV/atom and forces in meV/Å). The values in bold represent outperformance on the same task.
                      SchNet†    MDGNN
    Benzene   energy   3.44       –
              force    –          36.55
    Aspirin   energy   15.91      –
              force    58.05      –
    Ethanol   energy   –          –
              force    –          –

† The data of SchNet are extracted from [14].
TABLE II. Comparison of the MAEs of DeepPot-SE (DeePMD) [16] and MDGNN trained on the Cu dataset, split into a training set and a test set with the ratio 90%:10%, and of the MAEs of TeaNet [24] and MDGNN trained on the SiO₂ dataset (energies in meV/atom and forces in meV/Å). The Cu dataset is the public dataset from DeePMD, and the SiO₂ dataset is obtained by first-principles MD simulation, similar to the dataset from TeaNet. For clarity, the better results are indicated in bold.

                      DeepPot-SE (DeePMD)    TeaNet    MDGNN
    Cu     energy                              —        0.68
           force          90 (90)ᵃ             —
    SiO₂   energy            —               19.3ᵇ
           force             —               142ᵇ

ᵃ The data of DeepPot-SE and DeePMD are extracted from [16].
ᵇ The data of TeaNet are extracted from [24].
D. Aluminum-Copper Compounds
Herein we consider two types of aluminum-copper compounds: Al₂Cu [46] and AlCu₃ [47] [see Fig. 2(a)]. We train MDGNN on Al₂Cu and test the force fields on AlCu₃, for the purpose of demonstrating transferability. This is a nontrivial task since these two compounds have very different local chemical environments.

Al₂Cu has a tetragonal lattice with the I4/mcm space group. The dataset is obtained by classical MD simulations at temperatures from 50 K to 1000 K on a 3 × × supercell; the learning rate is set to $10^{-4}$. After 7000 training epochs, the training losses for the energy and force are 1.58 meV/atom and 28.80 meV/Å, with the corresponding validation losses being 1.98 meV/atom and 28.74 meV/Å, respectively. The MAEs on the unseen test set are 1.46 meV/atom and 28.75 meV/Å. The comparison of the ground truth and the prediction of the potential energies and forces on the test dataset is shown in the insets of Fig. 2(b). The results from MDGNN are in satisfactory agreement with those of the classical MD simulations. We then run MD simulations for a 4 × × Al₂Cu supercell at a constant temperature of 300 K with the MDGNN-based force fields and the ADP force fields. The radial distribution functions (RDFs) from the MD simulations, presented in Fig. 2(b), are almost identical for the two force fields. This clearly indicates that MDGNN-based force fields can be well trained on a system of small size and then applied in MD simulations of larger systems of the same material.

To test the transferability of MDGNN, we perform classical MD simulations for AlCu₃ at 300 K on a 3 × × supercell, using the MDGNN trained on the Al₂Cu dataset. The RDFs produced by both the MDGNN-based and ADP force fields are presented in Fig. 2(c). It can be observed that MDGNN reproduces all the RDF peak positions, despite some differences in the detailed RDF shapes and values. The transferability of the model is also tested by calculating the formation energy of an Al vacancy in Al₂Cu, which is 5.238 eV from the ADP force fields and 4.936 eV from MDGNN; the difference is 5.76%.

Even compared with empirical force fields used in classical MD simulations, the MDGNN-based force field is considerably more economical in computational resources. Table III presents the computational cost of MD simulations of 10000 time steps with different force fields on a 4 × × Al₂Cu supercell on the same Intel(R) Xeon(R) E5-2676 v3 CPU. MD with the MDGNN-based force field is much faster than with the embedded atom method (EAM) [50-52] and ADP force fields. We note that MDGNN is trained with the ADP force field, which contains angle information of different bonds; in contrast, EAM force fields do not take such angle information into account. Another advantage of MDGNN is that contemporary machine learning technology heavily and naturally utilizes GPUs, which are significantly faster than CPUs for many linear algebra operations. With an Nvidia RTX 2080Ti card, MDGNN runs 27 times faster than the ADP force field on CPU, as shown in Table III.

FIG. 2. (a) The unit cells of Al₂Cu (upper panel) and AlCu₃ (lower panel). (b) RDFs of a 4 × × Al₂Cu supercell from MDGNN-based (dotted lines) and ADP (solid lines) force fields at 300 K. The insets show the energies $E$ (top) and forces $F$ (bottom) of the Al₂Cu test set predicted by MDGNN versus the ground truth, where the subscripts gt and p refer to the ground truth and the prediction, respectively. (c) RDFs of a 3 × × AlCu₃ supercell from MDGNN (dotted lines) trained with the Al₂Cu dataset and from ADP force fields (solid lines) at 300 K.
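For reference, a radial distribution function of the kind compared in Figs. 2(b) and 2(c) can be computed from stored MD frames along the following lines. This is a generic NumPy sketch assuming an orthorhombic cell, not the analysis script actually used for the figures.

    import numpy as np

    def rdf(frames, cell, r_max=6.0, n_bins=120):
        # g(r) averaged over frames; `frames` is a list of [n_atoms, 3] position
        # arrays and `cell` holds the three edge lengths of an orthorhombic cell.
        edges = np.linspace(0.0, r_max, n_bins + 1)
        hist = np.zeros(n_bins)
        n = frames[0].shape[0]
        iu = np.triu_indices(n, k=1)                 # count each pair once
        for pos in frames:
            d = pos[:, None, :] - pos[None, :, :]
            d -= cell * np.round(d / cell)           # minimum image convention
            r = np.linalg.norm(d, axis=-1)[iu]
            hist += np.histogram(r, bins=edges)[0]
        rho = n / np.prod(cell)                      # average number density
        shells = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
        g = hist / (len(frames) * 0.5 * n * rho * shells)  # normalize to ideal gas
        return 0.5 * (edges[:-1] + edges[1:]), g     # bin centers, g(r)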
III. DISCUSSION
As explained in Sec. II A, the input features of MDGNN explicitly include high-order terms of the distances between atoms. In addition, all the input features take values between $-1$ and $1$ and are thus automatically normalized. Here we test the performance of MDGNN for different numbers of input features, which is controlled by the choice of $L$ and $M$. As shown in Fig. 3(a), the choice of $L$ is critical for the accuracy of MDGNN, while the value of $M$ plays a less relevant role. Therefore, we choose $L = 40$ and $M = 4$ for all calculations in this work.

Compared with the Gaussian-type input features in most of the existing frameworks, our data preprocessing method might be more effective. As shown in Fig. 3(b), the black lines represent the input features in SchNet [14], $\exp(-\gamma \| R_{ij} - \mu_k \|^2)$, where $\mu_k$ takes values from 0 Å to 30 Å with a 0.1 Å step and $\gamma = 10$ Å, while the colored lines represent the input features of MDGNN with $L = 50$ and $M = 3$. Because $\exp(-x^2)$ is a rapidly decreasing function of $x$, most of the input features of SchNet are zero for any specific $R_{ij}$. In contrast, most of the sine- and cosine-type input features are finite for any $R_{ij}$. We suspect that our input features provide more information for the neural network to construct the force field.

Force fields depend not only on the distances between atoms but also on angular information such as bond angles. Indeed, except for Cu, we have been training MDGNN with either classical force fields that contain angle information or first-principles MD data. The excellent accuracy indicates that MDGNN is able to capture the angle information in the data, despite the fact that the only input features are distances between atoms. Due to their sophisticated multi-layer nonlinear structure, deep neural networks are non-transparent and their predictions are not easily traceable by humans. We partially interpret the working mechanism of MDGNN in Fig. 3(c). Although each node aggregates the embeddings of only its neighbors in a single GCN layer, the next GCN layer will extract the information concerning the second-order neighbors from its first-order neighbors. Similarly, with $k$ GCN layers, each node in the graph will aggregate $k$-hop information. For three nodes linked with each other in a graph, forming a triangle, MDGNN may learn the angular information implicitly from the lengths of the three sides of the triangle.

There are also several other GNN-based architectures that could be used in molecular dynamics. For example, visual interaction networks [53], a model for the dynamics of physical systems, learn interactions from raw frames of motion. If the atoms are regarded as hard spheres, this network might be useful in learning the interactions between atoms. However, considering the down-sampling of the convolutional neural networks, visual interaction networks cannot be applied directly to MD simulations. Alternatively, the data in molecular dynamics simulations could be viewed as point clouds, which could be processed by PointNet [54] and PointNet++ [55] with some modifications to preserve all the symmetries. To sum up, these architectures should be capable of processing MD simulation datasets if certain symmetry-adapted tricks are introduced.

We have introduced a symmetry-adapted graph neural network architecture to construct molecular dynamics force fields for both molecules and crystals. Previously proposed non-GNN-based models are mostly based on traditional fully connected neural networks, for which preprocessing of the input data is required to ensure symmetries such as translation, rotation and permutation invariance. On the other hand, most of the proposed GNN-based models are applied solely to molecular structures. The MDGNN introduced here is based on graph convolutional networks, which can naturally handle graph-structured data. The framework transforms both molecular and crystal structures into crystal graphs with periodic boundary conditions and the minimum image convention. Force fields constructed by MDGNN can accurately describe systems larger than the training system. For materials with the same types of atoms as the training material, MDGNN can capture the most prominent features of the radial distribution function. Although the input features of MDGNN do not explicitly contain the bond-angle information of the atoms, the results demonstrate that this information is captured and utilized in the MD simulations with MDGNN-based force fields.
MDGNN thus has significant potential to become a tool for constructing accurate force fields and potential-energy surfaces with low computational cost and good transferability for different systems.

TABLE III. Comparison of the computational cost of 10000 MD time steps with different force fields on a 4 × × Al₂Cu supercell. Note that the MD simulations with the angular-dependent potential (ADP) and embedded-atom method (EAM) force fields can be run only on CPU in the Atomic Simulation Environment for Python. The results from the empirical force fields and MDGNN (CPU) are obtained on the same CPU. The result of MDGNN (GPU) is obtained with the model allocated on a GPU.

    Calculator     EAM    ADP     MDGNN   MDGNN (GPU)
    Time (hour)    5.10   11.35   3.24    0.61

FIG. 3. (a) The results of different feature-engineering choices on the aspirin dataset with the training size set to 1000. The black lines with different markers represent the energy loss and the blue ones the force loss. (b) Each colored line represents a nonlinear sine or cosine transformation with some $l \in \{1, 2, \dots, 50\}$ and $m \in \{1, 2, 3\}$; the black lines are the exponential transformations of the distances. (c) After a single GCN layer, the messages from the 1-hop neighborhood are aggregated to the central node; simultaneously, these 1-hop neighbors receive the messages of their own 1-hop neighbors. In the next GCN layer, these 2-hop messages are sent to the central node, and so on. Finally, after $k$ GCN layers, each central node receives messages from its $k$-hop neighborhood.
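The sparsity contrast in Fig. 3(b) is easy to reproduce numerically. The short sketch below evaluates both feature sets for one example distance and counts how many features are appreciably nonzero; the example distance and the 10⁻³ threshold are arbitrary illustrations, not values from the paper.

    import math
    import numpy as np

    r_ij, r_c = 2.5, 4.0                         # an example interatomic distance (Å)

    # SchNet-style Gaussian features exp(-gamma * (R_ij - mu_k)^2), mu_k on a 0.1 Å grid
    mu = np.arange(0.0, 30.0, 0.1)
    gauss = np.exp(-10.0 * (r_ij - mu) ** 2)

    # MDGNN-style sinusoidal features with L = 50, M = 3
    fact = np.array([math.factorial(m) for m in (1, 2, 3)])[:, None]
    base = (r_ij / r_c) ** np.array([1, 2, 3])[:, None] / fact     # shape [3, 1]
    l = np.arange(1, 51)[None, :]                                  # shape [1, 50]
    sincos = np.concatenate([np.sin(l * base), np.cos(l * base)]).ravel()

    print(np.mean(gauss > 1e-3))                 # only a few percent are active
    print(np.mean(np.abs(sincos) > 1e-3))        # nearly all features are finite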
IV. METHODS
The classical MD simulations in the canonical (NVT) ensemble are performed with the Python library Atomic Simulation Environment [56, 57]. The Langevin dynamics of the aluminum-copper compound is run with the angular-dependent potential (ADP) [58] force field. The initial momenta are drawn from a Maxwell-Boltzmann distribution at a temperature of 50 K. The desired temperature, time step and friction term are set to 1000 K, 1 fs and 0.2 atomic units, respectively. We obtain 7000 frames of the canonical-ensemble simulations as the dataset. Constant-temperature simulations at 300 K for a larger Al₂Cu supercell and for AlCu₃ are also performed for comparison with the MDGNN results.

The SiO₂ dataset is generated with first-principles MD simulations performed with the Vienna ab initio Simulation Package [59-62] using projector-augmented-wave [63, 64] pseudopotentials. The generalized gradient approximation is adopted with the Perdew-Burke-Ernzerhof exchange-correlation functional [65, 66]. The plane-wave cutoff is 400 eV and Monkhorst-Pack k-point meshes of 3 × × are used.
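A compact ASE script for this kind of data generation could look as follows. The structure file 'Al2Cu.cif' and the potential file 'AlCu.adp' are hypothetical placeholders, the temperature_K keyword assumes a recent ASE version, and the friction value is given in ASE units; this is a sketch of the setup described above, not the authors' production script.

    from ase import units
    from ase.io import read
    from ase.calculators.eam import EAM
    from ase.md.langevin import Langevin
    from ase.md.velocitydistribution import MaxwellBoltzmannDistribution

    atoms = read('Al2Cu.cif').repeat((3, 3, 3))        # hypothetical input structure
    atoms.calc = EAM(potential='AlCu.adp')             # placeholder ADP potential file

    MaxwellBoltzmannDistribution(atoms, temperature_K=50)   # initial momenta at 50 K
    dyn = Langevin(atoms, timestep=1 * units.fs,
                   temperature_K=1000, friction=0.2)        # target temperature 1000 K

    frames = []                                        # collect the training data
    def record():
        frames.append((atoms.get_positions().copy(),
                       atoms.get_potential_energy(),
                       atoms.get_forces().copy()))

    dyn.attach(record, interval=1)
    dyn.run(7000)                                      # 7000 frames, as in the text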
V. DATA AVAILABILITY

The raw data of MD17 are available at http://quantum-machine.org/datasets/. The raw data of Cu are available at .

VI. CODE AVAILABILITY
The implementations of MDGNN described in the paper will be available after the manuscript is accepted for publication.

[1] Car, R. & Parrinello, M. Unified approach for molecular dynamics and density-functional theory. Phys. Rev. Lett. 55, 2471 (1985).
[2] Vanommeslaeghe, K. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 31, 671-690 (2010).
[3] Jorgensen, W. L., Maxwell, D. S. & Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 118, 11225-11236 (1996).
[4] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157-1174 (2004).
[5] Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359-366 (1989).
[6] Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251-257 (1991).
[7] Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
[8] Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
[9] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 1-8 (2017).
[10] Han, J., Zhang, L., Car, R. et al. Deep potential: A general representation of a many-body potential energy surface. Preprint at https://arxiv.org/abs/1707.01478 (2017).
[11] Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
[12] Wang, H., Zhang, L., Han, J. & Weinan, E. DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 228, 178-184 (2018).
[13] Ciccotti, G., Lelièvre, T. & Vanden-Eijnden, E. Projection of diffusions on submanifolds: Application to mean force computation. Commun. Pure Appl. Math. 61, 371-408 (2008).
[14] Schütt, K. et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Adv. Neural Inf. Process. Syst., 991-1001 (2017).
[15] Chmiela, S., Sauceda, H. E., Müller, K.-R. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 1-10 (2018).
[16] Zhang, L. et al. End-to-end symmetry preserving interatomic potential energy model for finite and extended systems. Adv. Neural Inf. Process. Syst., 4436-4446 (2018).
[17] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61-80 (2009).
[18] Hamilton, W. L., Ying, R. & Leskovec, J. Representation learning on graphs: Methods and applications. Preprint at https://arxiv.org/abs/1709.05584 (2017).
[19] Bruna, J., Zaremba, W., Szlam, A. & LeCun, Y. Spectral networks and locally connected networks on graphs. Preprint at https://arxiv.org/abs/1312.6203 (2013).
[20] Zhou, J. et al. Graph neural networks: A review of methods and applications. Preprint at https://arxiv.org/abs/1812.08434 (2018).
[21] Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2020).
[22] Zhang, Z., Cui, P. & Zhu, W. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng. (2020).
[23] Klicpera, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Int. Conf. Learn. Represent. (2019).
[24] Takamoto, S., Izumi, S. & Li, J. TeaNet: universal neural network interatomic potential inspired by iterative electronic relaxations. Preprint at https://arxiv.org/abs/1912.01398 (2019).
[25] Frasconi, P., Gori, M. & Sperduti, A. A general framework for adaptive processing of data structures. IEEE Trans. Neural Netw. 9, 768-786 (1998).
[26] Sperduti, A. & Starita, A. Supervised neural networks for the classification of structures. IEEE Trans. Neural Netw. 8, 714-735 (1997).
[27] Hagenbuchner, M., Sperduti, A. & Tsoi, A. C. A self-organizing map for adaptive processing of structured data. IEEE Trans. Neural Netw. 14, 491-505 (2003).
[28] Li, Q., Han, Z. & Wu, X.-M. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
[29] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1263-1272 (JMLR.org, 2017).
[30] Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).
[31] Ying, Z. et al. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, 4800-4810 (2018).
[32] Zhang, M., Cui, Z., Neumann, M. & Chen, Y. An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
[33] Gal, Y. & Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. In Adv. Neural Inf. Process. Syst., 1019-1027 (2016).
[34] Xu, K. et al. Representation learning on graphs with jumping knowledge networks. Preprint at https://arxiv.org/abs/1806.03536 (2018).
[35] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 770-778 (2016).
[36] Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. In Wallach, H. et al. (eds.) Advances in Neural Information Processing Systems 32, 8024-8035 (Curran Associates, Inc., 2019).
[37] Paszke, A. et al. Automatic differentiation in PyTorch (2017).
[38] Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
[39] Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825-2830 (2011).
[40] Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? Preprint at https://arxiv.org/abs/1810.00826 (2018).
[41] Li, G., Müller, M., Thabet, A. & Ghanem, B. DeepGCNs: Can GCNs go as deep as CNNs? In Proc. IEEE Int. Conf. Comput. Vis., 9267-9276 (2019).
[42] Hahnloser, R. H., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J. & Seung, H. S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947-951 (2000).
[43] Hahnloser, R. H. & Seung, H. S. Permitted and forbidden sets in symmetric threshold-linear networks. In Adv. Neural Inf. Process. Syst., 217-223 (2001).
[44] Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 315-323 (2011).
[45] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
[46] Bradley, A. J. & Jones, P. An x-ray investigation of the copper-aluminium alloys. J. Inst. Met., 131-157 (1933).
[47] Tarora, I. The transformation process of the β-phase of the Cu-Al system and the effect of Mn addition upon it (fourth report). J. Jpn. Inst. Met., 13-8 (1979).
[48] Becker, C. A., Tavazza, F., Trautt, Z. T. & de Macedo, R. A. B. Considerations for choosing and using force fields and interatomic potentials in materials science and engineering. Curr. Opin. Solid State Mater. Sci. 17, 277-283 (2013).
[49] Hale, L. M., Trautt, Z. T. & Becker, C. A. Evaluating variability with atomistic simulations: the effect of potential and calculation methodology on the modeling of lattice and elastic constants. Model. Simul. Mater. Sci. Eng. 26, 055003 (2018).
[50] Daw, M. S. & Baskes, M. I. Semiempirical, quantum mechanical calculation of hydrogen embrittlement in metals. Phys. Rev. Lett. 50, 1285 (1983).
[51] Daw, M. S. & Baskes, M. I. Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals. Phys. Rev. B 29, 6443 (1984).
[52] Liu, X.-Y., Liu, C.-L. & Borucki, L. A new investigation of copper's role in enhancing Al-Cu interconnect electromigration resistance from an atomistic view. Acta Mater., 3227-.
[53] Watters, N. et al. Visual interaction networks: Learning a physics simulator from video. In Adv. Neural Inf. Process. Syst., 4539-4547 (2017).
[54] Qi, C. R., Su, H., Mo, K. & Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 652-660 (2017).
[55] Qi, C. R., Yi, L., Su, H. & Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Adv. Neural Inf. Process. Syst., 5099-5108 (2017).
[56] Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
[57] Bahn, S. R. & Jacobsen, K. W. An object-oriented scripting interface to a legacy electronic structure code. Comput. Sci. Eng. 4, 56-66 (2002).
[58] Apostol, F. & Mishin, Y. Interatomic potential for the Al-Cu system. Phys. Rev. B 83, 054116 (2011).
[59] Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558 (1993).
[60] Kresse, G. & Hafner, J. Ab initio molecular-dynamics simulation of the liquid-metal-amorphous-semiconductor transition in germanium. Phys. Rev. B 49, 14251 (1994).
[61] Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15-50 (1996).
[62] Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).
[63] Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).
[64] Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758 (1999).
[65] Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
[66] Perdew, J. P., Burke, K. & Ernzerhof, M. Erratum: Generalized gradient approximation made simple. Phys. Rev. Lett. 78, 1396 (1997).

ACKNOWLEDGMENTS
This work was supported by the Basic Science Center Project of NSFC (Grant No. 51788104), the Ministry of Science and Technology of China (Grant Nos. 2016YFA0301001 and 2017YFB0701502), and the Beijing Advanced Innovation Center for Materials Genome Engineering.
VII. AUTHOR CONTRIBUTIONS
Z.W., C.W. and W.D. conceived the idea and designed the research. Z.W. implemented the code and S.Z. further optimized it. Z.W. and S.D. performed the molecular dynamics simulations. W.D., Y.X. and B.G. supervised the work. All authors discussed the results and were involved in the writing of the manuscript.