Predicting Material Properties Using a 3D Graph Neural Network with Invariant Local Descriptors
Boyu Zhang a,∗, Mushen Zhou b, Jianzhong Wu b and Fuchang Gao a
a Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID 83843, United States
b Department of Chemical and Environmental Engineering, University of California, Riverside, CA 92521, United States
ARTICLE INFO
Keywords: Property Prediction, Graph Convolution Neural Network, Henry's Constant, Metal-Organic Frameworks
ABSTRACT
Accurately predicting material properties is critical for discovering and designing novel materials. Machine learning technologies have attracted significant attention in the materials science community for their potential for large-scale screening. Among machine learning methods, graph convolution neural networks (GCNNs) have been among the most successful because of their flexibility and effectiveness in describing 3D structural data. Most existing GCNN models focus on the topological structure but overly simplify the three-dimensional geometric structure. In materials science, however, the 3D spatial distribution of atoms is crucial for determining atomic states and interatomic forces. In this paper, we propose an adaptive GCNN with novel convolutions that model interactions among all neighboring atoms in three-dimensional space simultaneously. We apply the model to two distinctly challenging material property prediction problems. The first is Henry's constant for gas adsorption in Metal-Organic Frameworks (MOFs), which is notoriously difficult because of its high sensitivity to atomic configurations. The second is the ion conductivity of solid-state crystal materials, which is difficult because very little labeled data is available for training. The new model outperforms existing GCNN models on both data sets, suggesting that some important three-dimensional geometric information is indeed captured by the new model.
1. Introduction
Accurate and prompt property prediction plays an essential role in discovering and designing novel materials. First-principles (viz., ab initio) methods Šimůnek and Vackář (2006) are commonly used to calculate the properties of candidate materials. However, ab initio methods based on either Density-Functional Theory (DFT) or wavefunction theories are relatively time-consuming. An alternative is to predict these properties using machine learning (ML) technologies. By modeling the material structure and atomic interactions, ML algorithms are much faster than DFT and are expected to provide similar accuracy Schleder et al. (2019).

Graph Neural Networks (GNNs) have attracted great attention recently because of their flexibility and effectiveness in describing 3D structures formed by discrete particles. Graph Convolutional Neural Networks (GCNNs) model the atom distribution using an undirected graph and simulate interactions among the atoms by passing messages among vertices. Although most models only handle inputs with a fixed format, GCNNs are capable of modeling highly variable structures with varying numbers of nodes. This flexibility gives GCNNs wide applications in research fields such as computer vision and recommender systems. There have been several applications in chemistry with excellent performance on property prediction Allam et al. (2018); Liu et al. (2017); Schütt et al. (2018); Xie and Grossman (2018). However, these methods currently have some noticeable disadvantages. In particular, the graph description abandons most 3D structural information, except the relative distance between connected vertices. In chemistry terminology, this means that only bond lengths and coordination numbers are taken into consideration. This oversimplification naturally fails to model atomic interactions accurately.
It is reasonable to assume that adding 3D information will improve the model's capability of making accurate predictions. This paper presents a K-Nearest Atoms Graph Convolution Neural Network (KAGCN) for material property prediction. The proposed KAGCN replaces the "summing up" operation with a novel convolutional operation that integrates information from surrounding atoms simultaneously. We applied the model to two distinctly challenging problems on

⋆ This work is financially supported by the National Science Foundation Harnessing the Data Revolution Big Idea under Grant No. NSF 1940118 and Grant No. NSF 1940270.
∗ Corresponding author
B Zhang et al.:
Preprint submitted to Elsevier
predicting material properties. The first is Henry's constant for gas adsorption in Metal-Organic Frameworks (MOFs), which is notoriously difficult because of its high sensitivity to inter-atomic separations. The second is the ion conductivity of solid-state crystal materials, which is difficult due to the lack of labeled data available for training. The new model achieves state-of-the-art performance on both data sets, suggesting that some important three-dimensional geometric information is indeed captured by the new model.
2. Related Work
The performance of a machine learning algorithm improves with experience on the task Mitchell (1999). The emergence of large-scale materials data sets, such as the Inorganic Crystal Structure Database (ICSD) Hellenbrandt (2004), the Open Quantum Materials Database (OQMD) Kirklin et al. (2015), and the Materials Project (MP) Jain et al. (2013), greatly empowers the application of novel machine learning technologies in the field of materials science. In materials science, an algorithm typically addresses a given problem by optimizing model parameters based on existing material samples that have been experimentally validated. The rich information contained in large-scale data sets makes complex models feasible with minimal human interference.

The representation of materials plays a critical role in building an effective machine learning system, and it is challenging due to the complexity of chemical structures. One current strategy is to describe the materials by a group of physical properties collected from computational simulations and/or experiments. For example, Cubuk et al. (2019) designed a transfer learning framework that generalizes a feature-based predictor into a formula-based predictor and screened billions of materials. Their feature-based predictor employed 30 features. Allam et al. (2018) obtained electronic properties using DFT computations and then applied a neural network to predict other properties. These methods require professional knowledge to develop useful features and thus can barely benefit from increasing sample sizes. End-to-end optimization is not feasible for these methods either.

In contrast to such concise property descriptors, some researchers preferred to describe the chemical structure in detail. Hoffmann et al. (2019), for instance, constructed a smooth and continuous three-dimensional density representation of each crystal based on the atomic positions and designed an encoding/decoding system for molecules.
Zheng et al. (2018) presented the atomic configuration as a two-dimensional periodic table and normalized the table to mimic a digital image. They trained a multitask CNN model that outputs the lattice parameters and the enthalpy of formation simultaneously. Their experimental results showed that the method achieved competitive performance on OQMD and ICSD.

Graph-based models can handle variable inputs and thus have attracted much attention in chemistry. Schütt et al. (2018) described a molecule by a set of n atoms with nuclear charges and atomic positions and proposed continuous-filter convolutional layers to model local correlations. The SchNet presented excellent performance on molecular property prediction. Xie and Grossman (2018) described a crystal using an undirected graph, where the vertices and edges correspond to the atoms and bonds, respectively. The atomic interactions were captured by a specific convolution layer, and a pooling layer embedded the graph into the feature space. The CGCNN model presented excellent performance on a broad array of properties, including formation energy, bandgap, Fermi energy, and elastic properties. Sanyal et al. (2018) adapted the original CGCNN using multitask learning Zhang and Yang (2017), and their experiments showed that sharing parameters in the lower layers improved the prediction performance. Some researchers tried to integrate additional knowledge into the model for better performance. Chen et al. (2019) developed a universal model to predict properties for both molecules and crystals by incorporating state variables such as temperature, pressure, and entropy. Cho and Choi (2018) added a relative position matrix into the model and provided a mechanism to handle the feature update process. Their 3DGCN model showed good performance on predicting molecular properties and biochemical activities.
3. Method
From a theoretical ML perspective, predicting a material property is straightforward. We are looking for a function F that takes material features as input and outputs the property of interest. The input should contain features that reflect observable characteristics of the materials. A variety of features and functions have been used in existing research Liu et al. (2017). In this article, the input contains a feature descriptor and the position of each atom in the material. The parameters of the function are determined using the properties of interest of known materials. The neural network is expected to have great generalization capability and thus could effectively predict the property of interest for new materials.
Effective and concise representation of the atomic structure of materials is a challenging topic in the field, and a variety of methods have been proposed Mueller et al. (2016). Among these methods, graph-based methods present excellent computational efficiency and superiority in modeling the inner complexity of the structures. Models based on GCNs have been successfully applied to property prediction for crystals and molecules Xie and Grossman (2018); Sanyal et al. (2018), and the results demonstrated that these models are effective in general.

We represent the material by encoding three pieces of information. The first is prior chemistry knowledge about the atoms, extracted from existing information about the materials. In this article, an l × m matrix M_f was calculated using the adaptive 'atom2vector' method Zhou et al. (2018), where l is a hyper-parameter that sets the dimension of the feature of each atom, and m is the number of different elements in the data. Each column of M_f is an l-dimensional feature vector of the corresponding element. M_f summarizes all the prior knowledge about the materials in the data set and is independent of the specific structure. The second piece is the element type of the atoms in the structure, denoted by a matrix A with column vectors a_1, ..., a_i, ..., a_N, where N is the number of atoms in the unit cell. Each column vector is an m-dimensional vector that represents the atom type. The matrix A is the one-hot encoding Harris and Harris (2010) of the atoms in the structure: a_{i,j} equals 1 if the i-th atom is the j-th element in M_f, and 0 otherwise. The third is the coordinates of the atoms in the material, stored in a 3 × N matrix X. Each column of X is a 3D vector representing an atomic position.
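As a toy illustration of this encoding (the dimensions and the contents of M_f below are made-up placeholders, not values from the paper), the one-hot matrix A and the per-atom feature lookup M_f × A can be sketched as:

```python
import numpy as np

# Hypothetical sizes: l-dimensional features, m element types.
l, m = 4, 3
rng = np.random.default_rng(0)
M_f = rng.standard_normal((l, m))   # stands in for the atom2vector output

# Suppose the unit cell contains N = 4 atoms with element indices [0, 2, 2, 1].
element_index = [0, 2, 2, 1]
N = len(element_index)
A = np.zeros((m, N))
A[element_index, np.arange(N)] = 1.0   # one-hot column per atom

atom_features = M_f @ A                # l x N: one feature vector per atom
```

Column i of `atom_features` is exactly the M_f column for atom i's element, so the encoding amounts to a simple lookup.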
With these three pieces of information as input, we aim to find an optimal function F that minimizes the distance between the ground truth p and the prediction p̂ = F(M_f, A, X), where p̂ is the predicted value of the property of interest.

Like commonly applied GCN models, we use an undirected graph G = (V, E) to store the input information, where V = {v_1, ..., v_N} is the set of vertices and E = {e_{i,j} | i, j ∈ {1, ..., N}, i ≠ j} is the set of edges. There is a one-to-one correspondence between the vertices and the atoms. If there is a bond between two atoms at vertices v_i and v_j, an edge e_{i,j} is included in the edge set E of the graph Sanderson (2012). Each vertex in the graph G is associated with an element type represented by the one-hot matrix A. Thus, the atom features can be calculated as M_f × A. We denote the feature vector associated with vertex v_i as f_{v_i}. Similarly, each edge e_{i,j} is associated with a feature vector f_{e_{i,j}} = (e^{-α_1 |e_{i,j}|}, e^{-α_2 |e_{i,j}|}, ..., e^{-α_t |e_{i,j}|}), where α_1, α_2, ..., α_t are prechosen numbers and |e_{i,j}| is the length of the edge e_{i,j}. One can view e^{-α_k |e_{i,j}|} as the probability that information will be passed through the edge.

To simulate the atomic interactions, the feature associated with each vertex in the structure is updated using a specific convolution mechanism. A common approach is to integrate the effect of all bonded vertices. For vertex v_i, the updated feature f̂_{v_i} is calculated according to equation 1:

f̂_{v_i} = f_{v_i} + Σ_{e_{i,j} ∈ E} L(f_{v_i}, f_{e_{i,j}}, f_{v_j})    (1)

where L is a nonlinear transformation consisting of a linear transformation and a non-linear activation function. The network contains a sequence of convolution layers, and each layer updates the feature vectors for all atoms.
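A minimal sketch of the bonded-neighbour update in equation 1, with a single linear map plus tanh standing in for the transformation L (the layer sizes and α values below are illustrative assumptions, not the paper's):

```python
import numpy as np

def edge_feature(dist, alphas):
    """Edge descriptor (e^{-a_1 d}, ..., e^{-a_t d}) from the edge length d."""
    return np.exp(-np.asarray(alphas) * dist)

def conv_eq1(f_v, edges, dists, W, b, alphas):
    """One update in the style of equation 1: each vertex accumulates
    messages from its bonded neighbours. (W, b) is a toy stand-in for L."""
    out = f_v.copy()
    for i, j in edges:   # directed (receiver, sender) pairs
        msg = np.concatenate([f_v[i], edge_feature(dists[(i, j)], alphas), f_v[j]])
        out[i] += np.tanh(W @ msg + b)
    return out

# Tiny usage: 3 vertices with 4-d features, one bond (0, 1) of length 1.5.
rng = np.random.default_rng(1)
f = rng.standard_normal((3, 4))
W, b = rng.standard_normal((4, 10)), np.zeros(4)   # 10 = 4 + 2 + 4 inputs
f_new = conv_eq1(f, [(0, 1)], {(0, 1): 1.5}, W, b, alphas=[0.5, 1.0])
```

Note that the sum over neighbours makes the result independent of the order in which the edges are visited, which is the property the next section contrasts with.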
The convolution layers are used to simulate atomic interactions, and the final feature vectors are expected to reflect the stable atomic states.

The convolution described in equation 1 can effectively accumulate the interactions among immediately connected vertices and has been successfully applied in some material property prediction tasks Xie and Grossman (2018). However, it does not seem to fit well for metal-organic framework materials, whose properties are usually highly dependent on the geometric structure of the atoms of the framework in a large neighborhood, instead of merely among the bonded atoms. Reducing the 3D spatial relation to pairwise distances between bonded atoms neglects the interactions among atoms that are in the same neighborhood but not bonded. Also, the parameter-sharing design of GCNs forces the model to calculate the interactions sequentially. In the real world, however, an atom interacts with its bonded neighbors simultaneously. This drawback pushes the GCN model further away from accurately modeling the true local chemical environment of the atoms.

To tackle the above disadvantages, we involve the position matrix X in the calculation, and the convolution is defined to integrate all neighbor atoms simultaneously. The atomic feature attached to the i-th vertex is updated according
to equation 2:

f̂_{v_i} = f_{v_i} + L(f_{v_i}, X_i, ∪_{j ∈ K_i} (f_{e_{i,j}}, f_{v_j}, X_j))    (2)

where L is a combination of a linear transformation and a nonlinear activation function, and K_i is the set of indices of the k vertices that are closest to v_i. The value k is a hyper-parameter that decides how many neighbor atoms are involved in the convolution. The new convolution mechanism considers the k nearest neighbor atoms simultaneously in 3D space and thus is expected to achieve better performance.

There is one significant difference between the convolution mechanisms described in equations 1 and 2. The former does not care about the order of the vertices due to the 'summing up' mechanism. By contrast, for the proposed convolution mechanism, it is very important to keep the atom sequence identical even when the configuration is rotated. Assume k = 2 and denote the two neighbor vertices of vertex v_i as v_j and v_k. Then the input of L is (f_{v_i}, X_i, f_{e_{i,j}}, f_{v_j}, X_j, f_{e_{i,k}}, f_{v_k}, X_k). Using v_i as the center, we could rotate the geometry composed of v_i, v_j, and v_k until v_j and v_k switch places with each other. Then the input of L becomes (f_{v_i}, X_i, f_{e_{i,k}}, f_{v_k}, X_k, f_{e_{i,j}}, f_{v_j}, X_j), and the output of L will change accordingly. However, the chemical feature of atom v_i should not change even though the orientation is different. Theoretically, the equivalence of the rotated coordinates could be learned if a large set of training samples were provided. Unfortunately, we do not have enough training samples under most circumstances, and such learning is computationally expensive. In this paper, we developed a simple and effective method to achieve rotational invariance for the atom sequences.

There is existing research devoted to invariant descriptions. Most of these descriptions have a high computational cost and do not fit deep neural network structures.
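The k-nearest-neighbour convolution of equation 2 can be sketched in the same style. All k neighbours enter one linear map at once, so the output depends on the order in which they are concatenated; the dimensions below are illustrative assumptions:

```python
import numpy as np

def conv_eq2(f_v, X, k, W, b, alphas):
    """Equation-2-style update: the centre atom and its k nearest
    neighbours enter a single linear map simultaneously."""
    N = f_v.shape[0]
    out = f_v.copy()
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]       # k closest atoms, self excluded
        parts = [f_v[i], X[i]]
        for j in nbrs:
            parts += [np.exp(-np.asarray(alphas) * d[j]), f_v[j], X[j]]
        out[i] += np.tanh(W @ np.concatenate(parts) + b)
    return out

# Usage: 5 atoms, 4-d features, k = 2 neighbours, 2 alphas.
rng = np.random.default_rng(2)
f, X = rng.standard_normal((5, 4)), rng.standard_normal((5, 3))
in_dim = (4 + 3) + 2 * (2 + 4 + 3)   # centre + k * (edge, feature, position)
W, b = rng.standard_normal((4, in_dim)), np.zeros(4)
f_new = conv_eq2(f, X, 2, W, b, alphas=[0.5, 1.0])
```

Because the neighbours are concatenated rather than summed, swapping two of them changes the input vector, which is exactly why the rotation-invariant ordering developed below is needed.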
The scale-invariant feature transform (SIFT) Lowe is the most widely used algorithm for generating rotation-invariant descriptors for local patches in images. For each located keypoint, the SIFT descriptor calculates the gradient distribution and determines the principal direction. The algorithm then rotates the principal direction to the x-axis, and the description vector is calculated. The SIFT algorithm has been extensively investigated and widely applied. There are a few adaptations of the original SIFT algorithm Scovanner et al.; Flitton et al.; Berretti et al. that extend it to 3D space. Compared to images, the calculation in 3D space is more complex due to the additional degree of freedom. However, the chemical structure data under our consideration are discrete and sparse, and thus the calculation can be significantly simplified. We transform the coordinate system of the surrounding atoms according to Algorithm 1.

Algorithm 1:
The rotational invariant local description for material structure
input: Graph G = {V, E}, coordinate matrix X, number of neighbor atoms k
output: Transformed coordinate matrix X_out
for i ← 1 to N do
    Calculate the distance between vertex v_i and every v_j ∈ V, i ≠ j;
    Sort the v_j by distance and keep the k nearest neighbors;
    Translate the origin of the coordinate system to x_i;
    Calculate the transform matrix R_1 that rotates x_{i1} to [0, 0, |x_{i1}|]^T using Rodrigues' rotation formula;
    Update the coordinates by X_{iR} = R_1 × X_i;
    Calculate the transform matrix R_2 that rotates x_{i2} into the yz-plane;
    Update the coordinates by X_{iout} = R_2 × X_{iR}
end

For each atom v_i in the structure, we calculate the distance between v_i and each other atom v_j, i ≠ j. We then sort all v_j according to distance and find the k nearest atoms based on the hyper-parameter k. Denote by x_i the position of atom v_i, and by x_{i1} and x_{i2} the positions of the closest and the second closest atoms to x_i. We first find the translation T that moves x_i to the origin. Next, we calculate a matrix R_1 that rotates x_{i1} onto the positive z-axis according to Rodrigues' rotation formula Koks (2006). Then we calculate the matrix R_2 that rotates x_{i2} into the yz-plane, keeping
x_i fixed. Finally, the new coordinate R_2 ∘ R_1 ∘ T(x) is computed for every atom in the k-nearest neighborhood. It is readily seen that this distance-based sorting and rotation guarantee rotational invariance.
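A NumPy sketch of Algorithm 1, under simplifying assumptions: a plain O(N) neighbour search and no periodic-image handling, both of which a production implementation for crystals would need to add.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for angle theta about a unit axis (Rodrigues' formula)."""
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def rotate_onto_z(v):
    """Matrix that rotates vector v onto the positive z-axis."""
    v = v / np.linalg.norm(v)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(v, z)
    s = np.linalg.norm(axis)
    if s < 1e-12:                    # already (anti)parallel to z
        return np.eye(3) if v[2] > 0 else np.diag([1.0, -1.0, -1.0])
    return rodrigues(axis / s, np.arctan2(s, v @ z))

def invariant_local_coords(X, i, k):
    """Rotation-invariant coordinates of the k nearest neighbours of atom i."""
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]        # k closest atoms, self excluded
    Y = X[nbrs] - X[i]                   # T: translate atom i to the origin
    R1 = rotate_onto_z(Y[0])             # nearest neighbour -> positive z-axis
    Y = Y @ R1.T
    # R2: rotate about z so the 2nd neighbour lands in the yz-plane (y >= 0).
    phi = np.arctan2(Y[1, 0], Y[1, 1])
    R2 = rodrigues(np.array([0.0, 0.0, 1.0]), phi)
    return Y @ R2.T
```

Rotating the whole structure leaves the output unchanged, since the two alignment steps fix the frame completely once the nearest and second-nearest neighbours are pinned down.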
4. Experimental Data
The experiments involve two data sets. The main experimental data is the CoRE2019 MOF data set, which contains 10136 MOF structures. For the MOF data, the property of interest is Henry's constant Soto et al. (1981). Appendix A presents the details of the calculation of Henry's constant.

The original CoRE2019 MOF database contains 12763 MOF structures. According to our calculations, the range of the Henry's constant covers more than 30 orders of magnitude. After examining the MOF structure files, we found that the broad range of Henry's constant is due to some unreasonable MOF structures. A closer examination reveals that in a small portion of MOFs with exceptionally high Henry's constants, atoms are unreasonably close to each other. The overlapping atom positions lead to unreasonably high attraction and hence an unreasonably high Henry's constant. We removed MOF structures with potentially overlapping atoms using a threshold on the smallest atomic distance; MOFs whose smallest atomic distance fell below the threshold (in Å) were filtered out.

Meanwhile, there are some MOFs with extremely low Henry's constants (on the order of 10^-10). The low Henry's constant of those MOFs is due to their relatively confined geometry. Compared to other MOFs with a similar void fraction, such a MOF has a relatively thin structure, and therefore most interactions inside the MOF are repulsive, which results in a low Henry's constant. Similarly, we filtered out the MOFs with extremely low Henry's constants by removing structures with a surface area less than 50 Å² (meaning that there is very little space and surface adsorption is very unlikely to happen). The cleaned data set contains 10136 MOF structures. The Henry's constants still cover several orders of magnitude; however, we believe the remaining values are physically reasonable. The details of this sample selection process can be found in Appendix B.

The proposed algorithm was also tested on ion conductivity data that is a subset of the ICSD database.
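The two-stage MOF cleaning described above can be sketched as follows. The minimum-distance cutoff `MIN_DIST` is a hypothetical placeholder (the paper's exact threshold is not recoverable here), and a real implementation would compute distances under periodic boundary conditions and obtain surface areas from a dedicated geometry tool:

```python
import numpy as np

MIN_DIST = 0.5           # hypothetical cutoff in angstroms (placeholder value)
MIN_SURFACE_AREA = 50.0  # angstrom^2, the surface-area filter from the text

def min_interatomic_distance(X):
    """Smallest pairwise distance among atom positions X (N x 3).
    Ignores periodic images for simplicity."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min()

def keep_structure(X, surface_area):
    """True if the structure passes both cleaning filters."""
    return bool(min_interatomic_distance(X) >= MIN_DIST
                and surface_area >= MIN_SURFACE_AREA)
```

A structure is dropped either for overlapping atoms (spuriously high Henry's constant) or for a tiny surface area (spuriously low Henry's constant).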
The properties of interest for the ICSD data include the minimum bond length, the large Li site, and the percolation radius. These properties were calculated through topological analysis and can be used as indicators of high-potential conductors. Readers can refer to He et al. (2019) for the details of the calculation of these properties.
5. Computational Experiments
The values of H_cc for the MOF data in the database range across several orders of magnitude. Such extreme values would cause issues during the backpropagation of the gradient and make training infeasible. For example, it could be hard to find a reasonable learning rate because of the significant variances. Existing research English and Carroll (2001) predicted log(H_cc) instead of the widely varying original value. In this paper, we projected the original value to both log(H_cc) and log(H_cc + 1). The two terms are close when H_cc is large, but the latter puts less weight on materials with extremely low Henry's constants. We introduce log(H_cc + 1) because a high Henry's constant is favored for gas adsorption, while discovering materials with an extremely low Henry's constant is not of practical interest.

10-fold validation was used throughout all experiments. For each experiment, samples were randomly divided into three groups for training, validation, and testing. The parameters were selected based on the performance on the validation data, and the model was then evaluated on the remaining samples for testing. Mean Absolute Error (MAE) was chosen as the measurement. The above process was performed 10 times, and the algorithm's performance was evaluated using the average MAE.

For the algorithm in Xie and Grossman (2018), the network structure includes one feature embedding layer, three graph convolution layers, and three fully-connected layers. More details can be found in the original literature. For the 3DGCN in Cho and Choi (2018), we prefer max pooling for comparison. Similarly, the details of the 3DGCN can be found in the original literature. The proposed algorithm employs three convolution layers and three fully connected layers, similar to the CGCNN. The following settings were shared for all tasks: the batch size and the weight decay were held fixed.
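The two target transforms behave as described: they agree for large H_cc, while log(H_cc + 1) pushes the extremely small values toward zero instead of toward large negative numbers. Base-10 logarithms and the sample values below are used for illustration only:

```python
import numpy as np

# Henry's constants spanning many orders of magnitude (illustrative values).
H = np.array([1e-10, 1e-3, 1.0, 1e4])

log_H = np.log10(H)         # ranges from -10 down at the low end up to 4
log_H1 = np.log10(H + 1.0)  # ranges from ~0 up to ~4
```

For H = 1e-10 the first transform yields -10 while the second yields nearly 0, so a squared or absolute error loss stops being dominated by materials with negligible adsorption.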
These shared settings help enhance the generalization of the prediction model. Besides, we used Adam as the optimizer with the same initial learning rate for all tasks. The experiments were performed on a virtual machine with 24 vCPUs (2.5 GHz), 112 GB RAM, and one V100 GPU (5120 CUDA cores and 32 GB RAM).
Figure 1:
Results of Henry’s Constant prediction for MOF.
Figure 2:
The effect of the number of neighbors in the convolution, shown for the prediction of Log(1 + H_cc).

We first tested the proposed method on the MOF data, where the property of interest is H_cc. MOF structures include more atoms and are more complex than the crystals. We have 10136 MOF structures for the experiments. As mentioned above, we predicted both log(H_cc) and log(H_cc + 1). The results are summarized in Figure 1. The proposed method achieved a better MAE compared to the classical GCN model. The results show that the proposed method can handle complex structures and further validate our hypothesis. They also show that log(H_cc + 1) achieves a better MAE. Considering that the two terms are a one-to-one mapping, log(H_cc + 1) is the better pre-processing choice for the H_cc prediction task.

One crucial hyper-parameter is the number of neighbors used in calculating the convolution (see equation 2). It is similar to the scale of the convolutional kernel in computer vision. To check the effect of this parameter, we repeated the experiments using the same setup with different scale parameters: 4, 8, 12, and 16. Specifically, we used two convolutional layers and one fully connected layer. The batch size was set to 32 to suppress overfitting. The samples were divided into training, validation, and testing sets as described in the above section. The training processes converged within 100 epochs. The performance is summarized in Figure 2.

The number of neighbors plays an essential role in the model. The MAE is significantly higher when only a few neighbor atoms, e.g., four, are involved in the convolution. The MAE decreases as the number increases and more atoms enter the calculation. However, when the size of the neighborhood is increased beyond a certain range (e.g., 12), a further increase does not improve the performance. The newly involved atoms have no interaction with
Table 1
Average time consumed for processing samples (ms).

Data    Graph Building                         Training
        KAGCN    Xie and Grossman (2018)       KAGCN    Xie and Grossman (2018)
MOFs    493.15   335.89                        1.68     3.99
ICSD    120.7    113.06                        1.31     1.54

the target atom and are not helpful for the model because of the distances. The result suggests that the calculation of the connection matrix is not needed if we pick a proper scale parameter. On the other hand, if the scale is too small, the convolution does not consider the target atom's whole local environment, and the model underperforms.

The network structure is an important factor affecting performance. However, a simplified structure was used for the proposed method due to its superiority in describing the materials' physical features. We used two convolutional layers and one fully connected layer throughout the experiments; adding more convolutional layers led to overfitting on the training data, even though we have plenty of MOF data.

One concern about the proposed method is the computational cost. There are two important parts when discussing computational efficiency: graph building and convolution calculation. Our algorithm adds extra steps to generate rotation-invariant coordinates, which costs more time. Meanwhile, the new convolution layer has many more parameters than the existing methods. Assume the convolution involves four neighbor atoms, and the atom feature and edge feature dimensions are 128 and 32, respectively. The new convolution has slightly over 100 thousand parameters, versus about 36 thousand parameters for the convolution in Xie and Grossman (2018). The average time consumed per sample in the experiments is summarized in Table 1.

As expected, the extra processing during graph building increased the computational cost. For the MOF data, the average processing time increased from 335.89 milliseconds to 493.15 milliseconds, and for the ICSD data, from 113.06 milliseconds to 120.7 milliseconds.
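The parameter comparison can be reproduced with a back-of-the-envelope count. The exact layer layout is an assumption (one linear map from the concatenated inputs to a new 128-d atom feature, plus a bias), chosen so the totals land near the figures quoted in the text:

```python
atom_dim, edge_dim, coord_dim, k = 128, 32, 3, 4

# Proposed convolution: centre atom plus k neighbours enter one linear map.
in_knn = (atom_dim + coord_dim) + k * (edge_dim + atom_dim + coord_dim)
params_knn = in_knn * atom_dim + atom_dim        # weights + bias

# Pairwise convolution (equation-1 style): one (centre, edge, neighbour) triple.
in_pair = atom_dim + edge_dim + atom_dim
params_pair = in_pair * atom_dim + atom_dim

print(params_knn, params_pair)
```

Under these assumptions the k-neighbour layer lands slightly above 100 thousand parameters and the pairwise layer around 36 thousand, matching the comparison in the text.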
It is noticeable that the computational cost of training (forward and backward passes) is reduced significantly. The reason is that the method in Xie and Grossman (2018) needs to sum up the effects of the neighbor atoms, so multiple convolutions are needed. An improvement could be made by introducing more computation units and computing the convolutions in parallel, but the possible improvement is by no means without limitations. In comparison, the proposed method improves the overall computational efficiency by reducing the training time.

The 3DGCN algorithm also involves the coordinates of the atoms. However, the performance of the 3DGCN is only slightly better than the classical GCN model, and it was outperformed by the proposed method. By analyzing the model structure, one can see that the 3DGCN models only pairwise interactions, whereas the surrounding atoms interact with the target atom simultaneously. The experimental results suggest that the proposed method, which models multiple atoms at the same time, better fits the physical laws.

On the other hand, for the minimum bond length, the performance of the proposed method is better than the classical GCN model, but the advantage is not as significant as for the other two properties. The calculation of the bond length involves the atom types and the distance between atoms, which are already captured in the two-dimensional graph description, and thus the introduction of the three-dimensional topology did not help much.

We also tested the new algorithm on the ICSD crystal data. MAE was chosen as the measurement, and the results are summarized in Table 2.
Table 2
Performance of the proposed algorithm on ICSD data (MAE).

Property               Samples   Unit
Large Li Site          —         Å (0.1 nm)   0.390   0.375
Percolation Radius     2524      Å (0.1 nm)   0.063   0.060
Minimum Bond Length    1760      Å (0.1 nm)   0.697   0.700

The results show that the proposed method outperformed the GCN-based methods on predicting the topology-based properties. According to He et al. (2019), the enlarged Li site originates from the large local space within the crystal structural framework, and the percolation radius is defined as the maximum radius of a sphere that can percolate
across at least one direction of the structure. Both definitions involve three-dimensional topology information, and the proposed algorithm benefited from this extra information and thus outperformed the classical GCN model. The ICSD data set experiments validated our hypothesis that introducing three-dimensional information into the GCN model improves the model performance.
6. Conclusion
This paper investigated the use of three-dimensional topological information in the GCN model and proposed the KAGCN model. The experimental results showed that the 3D topology plays a vital role in modeling the interactions among atoms and that the proposed network structure better reflects the real-world interactions compared to existing graph-based models. Meanwhile, we developed an algorithm that generates rotation-invariant descriptors for the distribution of local atoms. The invariant descriptor enhances the generalization of the prediction model by eliminating the ambiguity caused by rotation. Due to the benefits brought by these improvements, the KAGCN model outperformed the existing methods on MOF and crystal property prediction. GCN models present excellent flexibility and effectiveness in modeling material structures. Our experiments showed that modeling a material structure as a 3D point cloud is feasible. The proposed model can be extended to more property prediction tasks and can easily be embedded in other models such as graph autoencoders and graph generative networks. Meanwhile, the introduction of the scale parameter helps discover the most significant local environment of the atoms and may lead to novel, meaningful chemical structures.
7. Discussion
Accurate and rapid prediction of material properties is a challenging problem. Recent major advances in deep learning and the emergence of large-scale material databases have empowered the application of machine learning technologies in the field. However, the variability of material structures still hinders the broad success of these technologies. Graph representations are concise and useful for describing such variable structures, and GCNs built on them can effectively handle most materials analysis tasks. An undirected graph, however, can hardly carry the higher-dimensional topology that is vital for understanding the interactions between atoms. This paper therefore defines a new convolutional operation that incorporates three-dimensional topological information.

The three-dimensional information makes the operation more complex than its counterpart on a plain graph. For materials, one of the core requirements is that the operation be rotation invariant in three-dimensional space. Inspired by the classic SIFT algorithm, we propose to rotate each atom's neighborhood in 3D space according to the nearest neighbor's coordinates, which makes the new descriptor rotation invariant. Both Cho and Choi (2018) and Hoffmann et al. (2019) reported reduced computational efficiency after incorporating 3D information. The additional cost of computing the new descriptor depends on the total number of atoms n in the structure and the neighborhood scale s, which is a constant; the computational complexity is therefore O(s·n) = O(n). The increased cost does not affect the forward/backward passes and does not slow down training. Moreover, our experiments show that the new convolutional operation runs faster than looping over all bonded atoms.

The concept of 'scale' should also be considered in the new model. Existing methods loop over all 'bonded' atom pairs to sum up the effects of all neighboring atoms.
In the new network structure, we can eliminate the connectivity matrix and instead choose a proper scale parameter to delimit the corresponding local area. The re-introduction of the scale aspect facilitates future applications of existing deep neural networks in chemistry.

References
Allam, O., Cho, B.W., Kim, K.C., Jang, S.S., 2018. Application of DFT-based machine learning for developing molecular electrode materials in Li-ion batteries. RSC Advances 8, 39414–39420.
Berretti, S., Del Bimbo, A., Pala, P., Amor, B.B., Daoudi, M., 2010. A set of selected SIFT features for 3D facial expression recognition, in: 2010 20th International Conference on Pattern Recognition, IEEE. pp. 4125–4128.
Chen, C., Ye, W., Zuo, Y., Zheng, C., Ong, S.P., 2019. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials 31, 3564–3572.
Cho, H., Choi, I.S., 2018. Three-dimensionally embedded graph convolutional network (3DGCN) for molecule interpretation. arXiv preprint arXiv:1811.09794.
Cubuk, E.D., Sendek, A.D., Reed, E.J., 2019. Screening billions of candidates for solid lithium-ion conductors: A transfer learning approach for small data. The Journal of Chemical Physics 150, 214701.
English, N.J., Carroll, D.G., 2001. Prediction of Henry's law constants by a quantitative structure property relationship and neural networks. Journal of Chemical Information and Computer Sciences 41, 1150–1161.
Flitton, G.T., Breckon, T.P., Bouallagu, N.M., . Object recognition using 3D SIFT in complex CT volumes, in: BMVC, pp. 1–12.
Harris, D., Harris, S., 2010. Digital Design and Computer Architecture. Morgan Kaufmann.
He, X., Bai, Q., Liu, Y., Nolan, A.M., Ling, C., Mo, Y., 2019. Crystal structural framework of lithium super-ionic conductors. Advanced Energy Materials 9, 1902078.
Hellenbrandt, M., 2004. The Inorganic Crystal Structure Database (ICSD)—present and future. Crystallography Reviews 10, 17–22.
Hoffmann, J., Maestrati, L., Sawada, Y., Tang, J., Sellier, J.M., Bengio, Y., 2019. Data-driven approach to encoding and decoding 3-D crystal structures. arXiv preprint arXiv:1909.00949.
Šimůnek, A., Vackář, J., 2006. Hardness of covalent and ionic crystals: first-principle calculations. Physical Review Letters 96, 085501.
Jain, A., Ong, S.P., Hautier, G., Chen, W., Richards, W.D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., Ceder, G., et al., 2013. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1, 011002.
Kirklin, S., Saal, J.E., Meredig, B., Thompson, A., Doak, J.W., Aykol, M., Rühl, S., Wolverton, C., 2015. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. npj Computational Materials 1, 1–15.
Koks, D., 2006. Explorations in Mathematical Physics: The Concepts Behind an Elegant Language. Springer Science & Business Media.
Liu, Y., Zhao, T., Ju, W., Shi, S., 2017. Materials discovery and design using machine learning. Journal of Materiomics 3, 159–177.
Lowe, D.G., . Object recognition from local scale-invariant features, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, IEEE. pp. 1150–1157.
Mitchell, T.M., 1999. Machine learning and data mining. Communications of the ACM 42, 30–36.
Mueller, T., Kusne, A.G., Ramprasad, R., 2016. Machine learning in materials science: Recent progress and emerging applications. Reviews in Computational Chemistry 29, 186–273.
Sanderson, R., 2012. Chemical Bonds and Bond Energy. Volume 21. Elsevier.
Sanyal, S., Balachandran, J., Yadati, N., Kumar, A., Rajagopalan, P., Sanyal, S., Talukdar, P., 2018. MT-CGCNN: Integrating crystal graph convolutional neural network with multitask learning for material property prediction. arXiv preprint arXiv:1811.05660.
Schleder, G.R., Padilha, A.C., Acosta, C.M., Costa, M., Fazzio, A., 2019. From DFT to machine learning: recent approaches to materials science—a review. Journal of Physics: Materials 2, 032001.
Schütt, K.T., Sauceda, H.E., Kindermans, P.J., Tkatchenko, A., Müller, K.R., 2018. SchNet—a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148, 241722.
Scovanner, P., Ali, S., Shah, M., . A 3-dimensional SIFT descriptor and its application to action recognition, in: Proceedings of the 15th ACM International Conference on Multimedia, pp. 357–360.
Soto, J.L., Fisher, P.W., Glessner, A.J., Myers, A.L., 1981. Sorption of gases in molecular sieves. Theory for Henry's constant. Journal of the Chemical Society, Faraday Transactions 1: Physical Chemistry in Condensed Phases 77, 157–168.
Xie, T., Grossman, J.C., 2018. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters 120, 145301.
Zhang, Y., Yang, Q., 2017. A survey on multi-task learning. arXiv preprint arXiv:1707.08114.
Zheng, X., Zheng, P., Zhang, R.Z., 2018. Machine learning material properties from the periodic table using convolutional neural networks. Chemical Science 9, 8426–8432.
Zhou, Q., Tang, P., Liu, S., Pan, J., Yan, Q., Zhang, S.C., 2018. Learning atoms for materials discovery. Proceedings of the National Academy of Sciences 115, E6411–E6417.
A. Appendix: The calculation of Henry's Constant
The Henry's constant (Soto et al., 1981) is calculated as

\[ H_{cc} = \frac{1}{RTV} \int e^{-\beta V_{\mathrm{ext}}(r)} \, dr \tag{A1} \]

where \( \beta = 1/(k_B T) \), \( R \) is the gas constant, \( k_B \) is the Boltzmann constant, \( T \) is the absolute temperature, and \( V_{\mathrm{ext}} \) is the external potential. In this work, we consider hydrogen adsorption in MOFs, where the interaction is mainly contributed by van der Waals forces. The Lennard-Jones 12-6 potential \( V_{LJ} \) is therefore used to describe the external potential of hydrogen inside a MOF, with a cutoff distance of 12.9 Å:

\[ V_{LJ} = 4\varepsilon \left[ \left( \frac{\sigma}{\gamma} \right)^{12} - \left( \frac{\sigma}{\gamma} \right)^{6} \right] \tag{A2} \]

where \( \gamma \) is the pairwise distance between the gas molecule and an atom in the MOF, and \( \sigma \) and \( \varepsilon \) are the size and energy parameters. In this work, the universal force field (UFF) is used for all atoms in the MOFs. For the hydrogen molecule, \( \sigma_H = 0. \) nm and \( \varepsilon_H / k_B = 34. \) K. For interactions between different kinds of atoms, Lorentz-Berthelot mixing rules are used.
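Equation (A1) can be estimated by Monte Carlo insertion: the integral over the cell equals \( V \) times the average Boltzmann factor at uniformly sampled points, so \( H_{cc} = \langle e^{-\beta V_{\mathrm{ext}}} \rangle / (RT) \). The sketch below assumes a cubic cell and point insertions; the LJ parameters in the usage example are placeholders, not the UFF values used in this work.

```python
import numpy as np

R_GAS = 8.314462618  # gas constant, J/(mol K)

def henry_constant(atoms_frac, sigma, eps_k, box, T, sig_gas, eps_gas_k,
                   n_samples=50_000, r_cut=12.9, seed=0):
    """Monte Carlo estimate of Eq. (A1): H = <exp(-V_ext/(k_B T))> / (R T).

    atoms_frac : (N, 3) fractional coordinates of framework atoms.
    sigma      : (N,) LJ size parameters, Angstrom.
    eps_k      : (N,) LJ energy parameters divided by k_B, Kelvin.
    box        : cubic cell edge, Angstrom; T: temperature, K.
    sig_gas, eps_gas_k : probe-molecule LJ parameters.
    Returns H in mol/(m^3 Pa).
    """
    rng = np.random.default_rng(seed)
    pts = rng.random((n_samples, 3))            # uniform insertion points
    diff = pts[:, None, :] - atoms_frac[None, :, :]
    diff -= np.round(diff)                      # minimum-image convention
    r = box * np.linalg.norm(diff, axis=-1)     # (n_samples, N) distances
    sig_ij = 0.5 * (sigma + sig_gas)            # Lorentz mixing rule
    eps_ij = np.sqrt(eps_k * eps_gas_k)         # Berthelot mixing rule
    x6 = (sig_ij / r) ** 6
    v_k = np.where(r < r_cut, 4.0 * eps_ij * (x6 * x6 - x6), 0.0)  # V/k_B, K
    boltz = np.exp(-v_k.sum(axis=1) / T)        # exp(-beta V_ext)
    return boltz.mean() / (R_GAS * T)
```

A sanity check: with no framework atoms the external potential vanishes everywhere and the estimate reduces exactly to \( 1/(RT) \).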
B. Appendix: The removal of MOF structures with extreme values
The original CoRE2019 MOF database contains 12763 MOF structures. According to our calculation, the Henry's constants span more than 30 orders of magnitude (as shown in Figure B1). After examining the MOF structure files, we found that this broad range is due to some unreasonable MOF structures. Figure B1 shows that most MOFs cluster around a Henry's constant of order unity in units of mol/(m³·Pa), while a small portion of MOFs have exceptionally high Henry's constants.

Figure B1: Distribution of Henry's constant for the 12763 MOFs in the CoRE2019 MOF database.
A closer examination reveals that in these structures, atoms are unreasonably close to each other. For example, in the MOF with the highest Henry's constant (RUBLEH), the oxygen-oxygen distance is even shorter than the shortest covalent bond (hydrogen-hydrogen). Such overlapping atom positions lead to unreasonably strong attraction and therefore unreasonably high Henry's constants. After removing the MOF structures with potentially overlapping atoms (i.e., those whose smallest interatomic distance falls below a cutoff), the distribution of Henry's constant is more reasonable, as shown in Figure B2; most MOFs with exceptionally high Henry's constants are filtered out.

Figure B2: Distribution of Henry's constant for MOFs without potentially overlapping atoms in the CoRE2019 MOF database.

It is also worth noticing that some Henry's constants are extremely low (on the order of 10⁻¹⁰). These low values are due to relatively confined geometry (e.g., XEKUO). Compared with other MOFs of similar void fraction, XEKUO has a relatively thin structure, so most interactions inside it are repulsive, which results in a low Henry's constant. For example, compared with XEJJOM (void volume of 253 Å³), XEKUO (void volume of 185 Å³) has zero surface area, while XEJJOM has a surface area of 9.40 Å² when probed with a test particle of diameter 3.68 Å (the size of a nitrogen molecule). After removing MOFs with surface area less than 50 Å² (meaning there is very little space and surface adsorption is very unlikely), all the MOFs with extremely low Henry's constants were filtered out. The cleaned data set contains 10136 MOF structures. The Henry's constants still cover several orders of magnitude, but we believe these values are physically attainable. The distribution of Henry's constants for the cleaned data set is shown in Figure B3.
Figure B3:
Distribution of Henry's constant for MOFs after removing structures with potentially overlapping atoms or extremely small surface area (left). Experimental Henry's constants from the literature (right).
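The first filtering step of this appendix, removing structures with overlapping atoms, can be sketched as below. The distance cutoff is elided in the text above, so it is left as a parameter, and a cubic cell with fractional coordinates is assumed for simplicity.

```python
import numpy as np

def min_pair_distance(atoms_frac, box):
    """Smallest interatomic distance (Angstrom) in a periodic cubic cell.
    atoms_frac: (N, 3) fractional coordinates; box: cell edge in Angstrom."""
    diff = atoms_frac[:, None, :] - atoms_frac[None, :, :]
    diff -= np.round(diff)                     # minimum-image convention
    d = box * np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, np.inf)                # ignore self-pairs
    return float(d.min())

def drop_overlapping(structures, box, min_dist):
    """Keep only structures whose closest atom pair is at least `min_dist`."""
    return [s for s in structures if min_pair_distance(s, box) >= min_dist]
```

For example, a structure whose closest pair sits 0.2 Å apart is discarded at a 0.5 Å cutoff, while one with a 3.0 Å closest pair is kept.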