Spherical Message Passing for 3D Graph Networks
Yi Liu, Limei Wang, Meng Liu, Xuan Zhang, Bora Oztekin, Shuiwang Ji
Abstract
We consider representation learning from 3D graphs in which each node is associated with a spatial position in 3D. This is an underexplored area of research, and a principled framework is currently lacking. In this work, we propose a generic framework, known as the 3D graph network (3DGN), to provide a unified interface at different levels of granularity for 3D graphs. Built on 3DGN, we propose spherical message passing (SMP) as a novel and specific scheme for realizing the 3DGN framework in the spherical coordinate system (SCS). We conduct formal analyses and show that the relative location of each node in a 3D graph is uniquely defined in the SMP scheme. Thus, our SMP represents a complete and accurate architecture for learning from 3D graphs in the SCS. We derive physically-based representations of geometric information and propose the SphereNet for learning representations of 3D graphs. We show that existing 3D deep models can be viewed as special cases of the SphereNet. Experimental results demonstrate that the use of complete and accurate 3D information in 3DGN and SphereNet leads to significant performance improvements in prediction tasks.
1. Introduction
In many real-world studies, structured objects such as molecules and proteins are naturally modeled as graphs (Gori et al., 2005; Wu et al., 2018; Shervashidze et al., 2011; Fout et al., 2017; Liu et al., 2020; Wang et al., 2020). With the advances of deep learning, graph neural networks (GNNs) have been developed for learning from graph data (Kipf & Welling, 2017; Veličković et al., 2018; Xu et al., 2019; Gao & Ji, 2019; Gao et al., 2020; Yuan & Ji, 2020). In Battaglia et al. (2018), existing GNN methods have been unified into the general graph network (GN) framework.

*Equal contribution. Department of Computer Science & Engineering, Texas A&M University, TX, USA. Correspondence to: Shuiwang Ji.
2. Related Work
Graph neural networks (GNNs) are an emerging area of research. Notable GNN methods include GCN (Kipf & Welling, 2017; Defferrard et al., 2016), GAT (Veličković et al., 2018), GraphSAGE (Hamilton et al., 2017), GIN (Xu et al., 2019), LGCN (Gao et al., 2018), GG-NN (Li et al., 2016), DGCNN (Zhang et al., 2018), graph U-Nets (Gao & Ji, 2019), etc.
Currently, message passing neural networks (MPNNs) (Gilmer et al., 2017) are viewed as the most general architectures for realizing GNNs. These networks and architectures are further extended and unified into a more generic framework, known as the graph network (GN), in Battaglia et al. (2018). The key steps of the GN framework include information aggregation and information update across different levels of granularity, such as nodes, edges, or the whole graph. However, the GN framework does not incorporate 3D positional information when performing the aggregation and update processes, while such information is vital in some real-world applications.
Formally, 3D graphs refer to graphs in which the 3D positions of all nodes are given in the Cartesian system and are useful in deriving graph representations. Generally, original Cartesian coordinates in 3D graphs cannot serve as direct inputs to computational models. Otherwise, they may hurt model performance, and the generated predictions would not be invariant to translation and rotation of input graphs. Hence, several types of relative 3D information can be derived based on absolute Cartesian coordinates, such as distances between nodes, angles between edges, angles between planes, etc.
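As a concrete illustration of such relative 3D information (this is a minimal NumPy sketch of ours, not code from the paper), the snippet below computes distances and angles from Cartesian coordinates and checks that they are unchanged under a rigid rotation and translation:

```python
import numpy as np

def pairwise_distance(r_i, r_j):
    """Euclidean distance between two 3D positions (translation/rotation invariant)."""
    return float(np.linalg.norm(r_i - r_j))

def bond_angle(r_i, r_j, r_k):
    """Angle at r_j formed by the directions (j->i) and (j->k), in radians."""
    a, b = r_i - r_j, r_k - r_j
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# The same three points, rigidly rotated and translated, yield identical features.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90-degree rotation about z
moved = pos @ R.T + np.array([2.0, -1.0, 3.0])                      # rotate, then translate
assert np.isclose(pairwise_distance(pos[0], pos[1]), pairwise_distance(moved[0], moved[1]))
assert np.isclose(bond_angle(pos[0], pos[1], pos[2]), bond_angle(moved[0], moved[1], moved[2]))
```

This is exactly the invariance property that absolute coordinates lack.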
Such 3D information is invariant to translation and rotation and is important in some applications. For instance, in molecular modeling, the 3D molecular information includes bond lengths, angles between bonds, and bond rotations. This information plays a crucial role in molecular representation learning.

The development of methods for 3D graphs is at an early stage, and existing studies focus on leveraging different types of 3D information. The SchNet (Schütt et al., 2017) incorporates the distance information during the information aggregation stage by using continuous-filter convolutional layers. The PhysNet (Unke & Meuwly, 2019) integrates both the node features and distance information in the proposed interaction block. The DimeNet (Klicpera et al., 2020b) is developed based on the PhysNet and moves a step forward by considering both distance and angle information in the interaction block. In addition to the above methods that explicitly integrate 3D information during the information aggregation process, the OrbNet (Qiao et al., 2020) combines distance information with the atomic orbital theory to design important SAAO features as inputs to GNNs. Generally, the use of 3D positional information usually results in improved performance. However, there lacks a unified and rigorous framework to systematically incorporate 3D information in the message passing schemes.

Figure 1. An illustration of the computational steps for our proposed 3DGN framework.
3. A Generic Framework for 3D Graphs
In machine learning, structured objects can be naturally represented as graphs. For instance, molecular representation learning is a key task in many fields, including biophysics and material science (Xie & Grossman, 2018; Wu et al., 2018). When modeling molecules as graphs, atoms are represented as nodes, and bonds between atoms are modeled as edges. In addition, characteristics of atoms and bonds, e.g., atom and bond types, can be encoded as node and edge attributes. With the advances of deep learning, GNNs have been developed to learn features from input graphs. There exist numerous architectures for GNNs, and Battaglia et al. (2018) unify and extend these architectures by proposing the graph network (GN) framework. The GN framework in Battaglia et al. (2018) is based on graphs without 3D positional information of the input graphs. In some applications, such as molecular property prediction, the 3D information, such as bond lengths, angles between bonds, and bond rotations, is of great importance for making accurate predictions. To this end, we propose the 3D graph network (3DGN) as a new and generic framework to explicitly consider 3D positional information in real-world data. Our 3DGN is a general framework that provides a clear interface for manipulating structured objects represented as 3D graphs.

A 3D graph is represented as a 4-tuple G = (u, V, E, P). The u ∈ ℝ^{d_u} is a global feature vector for the graph G. V = {v_i}_{i=1:n} is the set of node features, where each v_i ∈ ℝ^{d_v} is the feature vector for the node i. E = {(e_k, r_k, s_k)}_{k=1:m} is the set of edges, where each e_k ∈ ℝ^{d_e} is the feature vector, r_k is the index of the receiver node, and s_k is the index of the sender node for the edge k. P = {r_h}_{h=1:n} is the set of 3D Cartesian coordinates that contains 3D spatial information for each node. In addition, we let E_i = {(e_k, r_k, s_k)}_{r_k=i, k=1:m} denote the set of edges that point to the node i, and N_i denote the indices of incoming nodes of node i.

The computational steps of our proposed 3DGN framework are illustrated in Fig. 1. The used functions include a set of φ functions and a set of ρ functions. Generally, the φ functions are applied to nodes, edges, or the whole graph as information update functions for the corresponding geometries, while the ρ functions are used to aggregate information from one type of geometry to another. The outputs include the updated global feature vector u′ ∈ ℝ^{d_u}, the updated node features V′ = {v′_i}_{i=1:n}, and the updated edges E′ = {(e′_k, r_k, s_k)}_{k=1:m}. Formally, our proposed 3DGN framework is defined as
$$
\begin{aligned}
\mathbf{e}'_k &= \phi^e\big(\mathbf{e}_k, \mathbf{v}_{r_k}, \mathbf{v}_{s_k}, E_{s_k}, \mathbf{u}, \rho^{p\to e}\big(\{\mathbf{r}_h\}_{h=r_k\cup s_k\cup \mathcal{N}_{s_k}}\big)\big),\\
\mathbf{v}'_i &= \phi^v\big(\mathbf{v}_i, \rho^{e\to v}(E_i), \mathbf{u}, \rho^{p\to v}\big(\{\mathbf{r}_h\}_{h=i\cup \mathcal{N}_i}\big)\big),\\
\mathbf{u}' &= \phi^u\big(\rho^{e\to u}(E'), \rho^{v\to u}(V'), \mathbf{u}, \rho^{p\to u}\big(\{\mathbf{r}_h\}_{h=1:n}\big)\big).
\end{aligned}
\quad (1)
$$
Specifically, the function φ^e is applied to each edge k and outputs the updated edge vector e′_k. The indices of the input geometries to φ^e are illustrated in Fig. 2 (a).

Figure 2. Illustrations of the functions φ^e (a) and φ^v (b).
Correspondingly, the inputs include the old edge vector e_k, the receiver node vector v_{r_k}, the sender node vector v_{s_k}, the set of edges E_{s_k} that point to the node s_k, and the 3D positional information for all the nodes connected by the edge k and the edges in E_{s_k}, with the index set r_k ∪ s_k ∪ N_{s_k}. The function ρ^{p→e} aggregates 3D information from these nodes to update the edge k. The function φ^v is used for the per-node update and generates the new node vector v′_i for each node i. An illustration of the indices of the inputs to φ^v is provided in Fig. 2 (b). The inputs include the old node vector v_i, the set of edges E_i that point to the node i, and 3D information for all the related nodes (the index set is i ∪ N_i). The functions ρ^{e→v} and ρ^{p→v} are applied to aggregate the input edge features and the input nodes' positional information for updating the node i, respectively. The function φ^u is used to update the global graph feature, while the functions ρ^{e→u}, ρ^{v→u}, and ρ^{p→u} aggregate information from all the edge features, all the node features, and 3D information for all the nodes, respectively.

Figure 3. The chemical structure of H₂O₂.

Importantly, we explicitly incorporate the positional information contained in P, and use three aggregation functions ρ^{p→e}, ρ^{p→v}, and ρ^{p→u} to compute positional representations for different geometries. Note that absolute Cartesian coordinates stored in P are not invariant to translation and rotation, and usually contain excessive useless information. Hence, they are not used as immediate inputs to machine learning models. The ρ functions can be flexibly adapted to generate relative 3D information that is invariant to translation and rotation. For example, when updating node features in Eq. (1), ρ^{p→v} can be adapted to a radial basis function (RBF) that computes distances between node i and each of its incoming nodes in E_i.
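To make such a choice of ρ^{p→v} concrete, here is a minimal sketch of our own (the function names are hypothetical, not the paper's code) of ρ^{p→v} implemented as a sum of neighbor distances expanded in a Gaussian radial basis, a SchNet-style invariant encoding:

```python
import numpy as np

def gaussian_rbf(d, centers, gamma=10.0):
    """Expand a scalar distance into Gaussian radial basis features exp(-gamma*(d - mu)^2)."""
    return np.exp(-gamma * (d - centers) ** 2)

def rho_p_to_v(pos, i, incoming, centers):
    """A minimal rho^{p->v}: sum of RBF-expanded distances from node i to each
    incoming neighbor j. Invariant to translation and rotation of `pos`."""
    return sum(gaussian_rbf(np.linalg.norm(pos[i] - pos[j]), centers)
               for j in incoming)
```

In a full 3DGN realization, this aggregated positional representation would be concatenated with ρ^{e→v}(E_i) and u as input to φ^v.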
4. Spherical Message Passing Neural Networks
Currently, the class of message passing neural networks (MPNNs) (Gilmer et al., 2017) are the most widely used architectures for GNNs. In this work, we aim at constructing a suitable and specific message passing scheme that incorporates 3D positional information, thus realizing the 3DGN framework described in Sec. 3. Importantly, the encoded 3D information should be relative positional information that is invariant to translation and rotation of real-world graphs like molecules. To this end, we propose to perform message passing in the spherical coordinate system (SCS), resulting in a novel scheme known as spherical message passing (SMP). Based on our formal analysis in the SCS, we argue and show that our SMP represents a complete and accurate architecture for realizing 3DGN in the SCS. By integrating the SMP and physically based representations, we propose the spherical message passing neural networks, known as the SphereNet, for learning representations of 3D graph data. We show that existing 3DGN architectures, such as SchNet and DimeNet, are special cases of our SphereNet, as they capture partial 3D information.

Figure 4. (a) The message aggregation scheme for the spherical message passing. (b) An illustration for computing torsion angles in the spherical message passing architecture.
We first investigate the structure identification of 3D graphs in the spherical coordinate system. For any point in the SCS, its location is specified by a 3-tuple (d, θ, ϕ), where d, θ, and ϕ denote the radial distance, the polar angle, and the azimuthal angle, respectively. When modeling 3D graphs in the SCS, any node i can be the origin of a local SCS, and d, θ, and ϕ naturally become the edge length, the angle between edges, and the torsion angle, respectively. Thus, the relative location of each neighboring node of node i can be specified by the corresponding tuple (d, θ, ϕ). Similarly, the relative location of each node in the 3D graph can be determined, leading to the identified graph structure, which is naturally invariant to translation and rotation of the input graph. The SCS can be easily converted from the Cartesian coordinate system, where the 3D positional information P, introduced in Sec. 3, is defined. Thus, the tuple (d, θ, ϕ) can be easily obtained from P.

We use a molecular graph as an example to show how d, θ, and ϕ are vital for graph structure identification. The chemical structure of hydrogen peroxide (H₂O₂) is shown in Fig. 3. It is obvious that the structure is uniquely defined by the three bond lengths d₁, d₂, d₃, the two bond angles θ₁, θ₂, and the torsion angle ϕ. Note that the two O-H bonds can rotate around the O-O bond without changing any of the bond lengths and bond angles. In this situation, however, the torsion angle ϕ changes and the structure of the H₂O₂ varies accordingly. Hence, the torsion angle is necessary for determining structures of molecular graphs. The importance of the torsion angle has also been demonstrated in related research domains. Garg et al. (2020) formally show that the torsion angle along with the port numbering can improve the expressive power of GNNs in distinguishing geometric graph properties, such as girth and circumference.
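The local tuple (d, θ, ϕ) can be computed directly from Cartesian positions. The following sketch is our own minimal implementation (not the paper's code): it takes the origin node s, the node r defining the z-axis, a reference neighbor o, and a query node q, and returns the distance, the polar angle, and the torsion angle between the planes (s, r, o) and (s, r, q):

```python
import numpy as np

def spherical_tuple(s, r, o, q):
    """Local spherical coordinates (d, theta, phi) of node q in the frame where
    s is the origin, the edge s->r is the z-axis, and the plane (s, r, o)
    is the reference plane (o is the reference neighbor)."""
    z = (r - s) / np.linalg.norm(r - s)
    vq, vo = q - s, o - s
    d = np.linalg.norm(vq)
    theta = np.arccos(np.clip(np.dot(vq, z) / d, -1.0, 1.0))  # polar angle in [0, pi]
    # project both q and o onto the plane perpendicular to z through s
    pq = vq - np.dot(vq, z) * z
    po = vo - np.dot(vo, z) * z
    # signed angle from the reference projection to q's projection (torsion)
    phi = np.arctan2(np.dot(np.cross(po, pq), z), np.dot(po, pq)) % (2 * np.pi)
    return d, theta, phi
```

The result depends only on relative positions, so it is invariant to rigid motions of the whole graph.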
Other studies (Ingraham et al., 2019; Simm et al., 2020) reveal that protein sequences and molecules can be accurately generated by considering the torsion angle in the given 3D structures. In this work, we propose SMP to systematically consider distance, angle, and torsion information for completely determining structures of 3D graphs.

An illustration of the message aggregation scheme for SMP is provided in Fig. 4 (a). Based on Eq. (1), the embedding of the node r_k is obtained by aggregating each incoming message e_k. The message e_k is updated based on E_{s_k}, the set of incoming edges pointing to the node s_k. Let q denote the sender node of any edge in E_{s_k}. Hence, we can define a local SCS, where s_k serves as the origin, and the direction of the message e_k naturally serves as the z-axis. We define a neighboring node o of s_k as the reference node. Thus, the reference plane is formed by the three nodes s_k, r_k, and o. For the node q, its location is uniquely defined by the tuple (d, θ, ϕ), as shown in Fig. 4 (a). Specifically, d determines its distance to the node s_k, and θ specifies its direction to update the message e_k. Apparently, the relative location of the node q is not completely determined when fixing d and θ only. For instance, q can rotate around s_k without changing d and θ. To determine the location of q completely, we propose to use the torsion angle ϕ, which is formed by the defined reference plane and the plane spanned by s_k, r_k, and q. Intuitively, as an advanced message passing architecture in spherical coordinates for 3DGN, SMP can determine the relative location of any neighboring node q by considering all the distance, angle, and torsion information, leading to more informative and accurate 3D representations.

Generally, the node s_k may have several neighboring nodes, which we denote as q₁, ..., q_t. It is easy to compute the corresponding bond lengths and bond angles for these t nodes. The SMP computes torsion angles by projecting all the t nodes onto the plane that is perpendicular to e_k and intersects s_k. Then, on this plane, the torsion angles are formed in a predefined direction, such as the anticlockwise direction. By doing this, any node naturally becomes the reference node for its next node in the anticlockwise direction. Notably, the sum of these t torsion angles is 2π. A simplified case is illustrated in Fig. 4 (b). The node s_k has three neighboring nodes q₁, q₂, and q₃; q₂ is the reference node for q₁, and they form ϕ₁; q₃ is the reference node for q₂, and they form ϕ₂; similarly, q₁ is the reference node for q₃, and they form ϕ₃. It is obvious that the sum of ϕ₁, ϕ₂, and ϕ₃ is 2π.

Figure 5. An illustration of the computational steps for the spherical message passing.

Our 3DGN is a general framework considering the 3D positional information. The unified framework makes it easier for researchers to develop novel architectures on 3D graphs for real-world applications. In this work, we move a step forward by proposing the spherical message passing as a specific and novel architecture for the 3DGN. Formally, the definition of the SMP can be obtained by realizing Eq. (1) in the SCS as
$$
\begin{aligned}
\mathbf{e}'_k &= \phi^e\big(\mathbf{e}_k, \mathbf{v}_{r_k}, \mathbf{v}_{s_k}, E_{s_k}, \rho^{p\to e}\big(\{\mathbf{r}_h\}_{h=r_k\cup s_k\cup \mathcal{N}_{s_k}}\big)\big),\\
\mathbf{v}'_i &= \phi^v\big(\mathbf{v}_i, \rho^{e\to v}(E_i)\big),\\
\mathbf{u}' &= \phi^u\big(\mathbf{u}, \rho^{v\to u}(V')\big).
\end{aligned}
\quad (2)
$$
The 3D information in P is converted and used to update each edge feature e_k. Hence, among the three positional aggregation functions ρ^{p→e}, ρ^{p→v}, and ρ^{p→u}, SMP only uses ρ^{p→e} for edge updates. The functions φ^e, φ^v, and φ^u can be implemented in different ways, such as using neural networks and mathematical operations. The computational flow of SMP can be obtained by removing several connections in Fig. 1 and is given in Fig. 5. We propose to implement the function ρ^{p→e} as the physically based representation Ψ(d, θ, ϕ), which is the solution to the Schrödinger equation approximating the density functional theory (DFT).

The obtained 3-tuple (d, θ, ϕ) determines the relative location of any node in the graph. However, in some applications, such as molecular representation learning, the 3-tuple (d, θ, ϕ) cannot serve as the direct input to neural networks as it lacks meaningful physical representations. In this section, we propose a physically based representation for the 3-tuple (d, θ, ϕ), which we denote as Ψ(d, θ, ϕ).
In quantum systems, Ψ(d, θ, ϕ) can be treated as the solution to the Schrödinger equation approximating the DFT. When investigating the electronic structures of molecules, computational models are used to approximate the DFT (Sholl & Steckel, 2011; Calais, 1993). The key proposal of DFT is that the molecular properties are determined by functionals of the spatially dependent electron density. Hence, atomic locations are combined with quantum mechanics to predict properties of molecules. With the electron state expressed as a function of locations as Ψ(r), the Schrödinger equation can be written in a time-independent manner as
$$
-\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r}) + V(\mathbf{r})\Psi(\mathbf{r}) = E\,\Psi(\mathbf{r}),
$$
where m denotes the constant mass, E denotes the constant energy, ℏ is the reduced Planck constant, ∇² is the Laplacian in the Cartesian coordinate system, and V(r) is the potential as a function of locations. By performing separation of variables and converting the Cartesian coordinate system to the SCS (Griffiths & Schroeter, 2018), the generic and regular solution to the Schrödinger equation in the SCS is
$$
\Psi(d, \theta, \varphi) = \sum_{\ell=0}^{\infty}\sum_{m=-\ell}^{\ell} a_{\ell m}\, j_\ell(kd)\, Y_\ell^m(\theta, \varphi), \quad (3)
$$
where j_ℓ(·) is a spherical Bessel function of order ℓ, Y_ℓ^m is a spherical harmonic function of degree ℓ and order m, and a_{ℓm} is the set of coefficients regarding ℓ and m. In the SMP, we use an orthogonal basis (Cohen et al., 2019; Klicpera et al., 2020b) for j_ℓ(·) and set the boundary condition to be k = z_{ℓn}/c. Thus, we obtain the complete 3D representation for (d, θ, ϕ) as
$$
\tilde{t}_{\mathrm{BF},\ell mn}(d, \theta, \varphi) = \sqrt{\frac{2}{c^3\, j_{\ell+1}^2(z_{\ell n})}}\; j_\ell\!\Big(\frac{z_{\ell n}}{c}\, d\Big)\, Y_\ell^m(\theta, \varphi), \quad (4)
$$
where c denotes the cutoff and z_{ℓn} is the n-th root of the spherical Bessel function of order ℓ.
We also have ℓ ∈ {0, ..., N_SHBF − 1}, m ∈ {−ℓ, ..., ℓ}, and n ∈ {1, ..., N_SRBF}. N_SHBF and N_SRBF denote the highest orders for the spherical harmonics and spherical Bessel functions, respectively. They are hyperparameters in experimental settings.

We also consider two simplified cases, where we only consider d and θ for the first, and only d for the second (using a radial basis function). The representations for these two cases can be obtained as
$$
\tilde{a}_{\mathrm{SBF},\ell n}(d, \theta) = \sqrt{\frac{2}{c^3\, j_{\ell+1}^2(z_{\ell n})}}\; j_\ell\!\Big(\frac{z_{\ell n}}{c}\, d\Big)\, Y_\ell^0(\theta), \qquad
\tilde{e}_{\mathrm{RBF},n}(d) = \sqrt{\frac{2}{c}}\, \frac{\sin\!\big(\frac{n\pi}{c}\, d\big)}{d}, \quad (5)
$$
where the notations are defined as in Eq. (4).

Based upon the spherical message passing scheme described in Sec. 4.1 and the physical solutions to the DFT in Sec. 4.2, we build the SphereNet for real-world graph data. By incorporating the positional information in spherical coordinates, SphereNet generates predictions that are invariant to translation and rotation of input molecules. Our network is composed of an embedding block, several interaction blocks, and an output block. For a clear description, we assume the message e_k for the edge k in Fig. 4 and Eq. (2) is the message for update. Specifically, the embedding block generates the initial message for the edge k, and takes only the distance representation ẽ_{RBF,n}(d) in Eq. (5) as the input. Each interaction block updates the message for the edge k. The inputs include messages for all the neighboring edges, and all three representations, including t̃_{BF,ℓmn}(d, θ, ϕ), ã_{SBF,ℓn}(d, θ), and ẽ_{RBF,n}(d) in Eq. (4) and Eq. (5), based on the edge k and its neighboring edges. The output block first takes both the distance representation and the current message for k as inputs. Then the feature vector of the receiver node for the edge k (node r_k in Fig. 4 and Eq.
(2)) is obtained by aggregating all the messages pointing to it, where the other messages have a similar update process as e_k. The detailed architecture for the SphereNet is provided in Appendix A.
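The basis functions in Eqs. (4) and (5) can be sketched numerically with SciPy as follows; treat this as our reading of the equations rather than reference code, and note that the caller must supply the Bessel root z_ℓn (e.g., nπ for ℓ = 0):

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def rbf(d, n, c):
    """Radial basis of Eq. (5): sqrt(2/c) * sin(n*pi*d/c) / d."""
    return np.sqrt(2.0 / c) * np.sin(n * np.pi * d / c) / d

def y_l0(l, theta):
    """Zonal spherical harmonic Y_l^0(theta) = sqrt((2l+1)/(4*pi)) * P_l(cos(theta))."""
    return np.sqrt((2 * l + 1) / (4 * np.pi)) * eval_legendre(l, np.cos(theta))

def sbf(d, theta, l, z_ln, c):
    """Distance-angle basis of Eq. (5). z_ln is the n-th positive root of j_l,
    supplied by the caller (for l = 0, the roots are n*pi)."""
    norm = np.sqrt(2.0 / (c ** 3 * spherical_jn(l + 1, z_ln) ** 2))
    return norm * spherical_jn(l, z_ln * d / c) * y_l0(l, theta)
```

By the boundary condition k = z_ℓn/c, the radial part vanishes at the cutoff d = c.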
5. Relations with Prior 3DGN Models
When developing architectures for 3DGN using the spherical message passing, our SphereNet is an advanced model where the relative location of each node is deterministic. The development of 3DGN models is still at an early stage. To the best of our knowledge, there exist several notable models in the literature, and they all can be viewed as special cases of the SphereNet, as they capture partial 3D positional information. We describe two exemplary models, SchNet and DimeNet, in this section. Details of these two models and descriptions of other models are provided in Appendix B.
SchNet, Schütt et al. (2017)
SchNet uses continuous-filter convolutional layers to model local correlations for molecules. It essentially incorporates relative distances based on atomic positions. The computational steps for the SchNet are simplified from the SphereNet in Eq. (2) and can be expressed as
$$
\begin{aligned}
\mathbf{e}'_k &= \phi^e\big(\mathbf{v}_{r_k}, \rho^{p\to e}\big(\{\mathbf{r}_h\}_{h=r_k\cup s_k}\big)\big),\\
\mathbf{v}'_i &= \phi^v\big(\mathbf{v}_i, \rho^{e\to v}(E_i)\big),\\
\mathbf{u}' &= \phi^u\big(\rho^{v\to u}(V')\big).
\end{aligned}
\quad (6)
$$
As the SchNet only considers the distance information, ρ^{p→e} = ẽ_{RBF,n}(d) in Eq. (5), where d is the Euclidean distance between the nodes r_k and s_k. The other used functions are neural networks or mathematical operations.

DimeNet, Klicpera et al. (2020b)
DimeNet explicitly includes angles between directed edges in the proposed directional message passing process. Compared with the SchNet, DimeNet moves a step forward by considering both the distance information d and the bond angle information θ. As a special case of the SMP defined in Eq. (2), the directional message passing is expressed as
$$
\begin{aligned}
\mathbf{e}'_k &= \phi^e\big(\mathbf{e}_k, E_{s_k}, \rho^{p\to e}\big(\{\mathbf{r}_h\}_{h=r_k\cup s_k\cup \mathcal{N}_{s_k}}\big)\big),\\
\mathbf{v}'_i &= \phi^v\big(\mathbf{v}_i, \rho^{e\to v}(E_i)\big),\\
\mathbf{u}' &= \phi^u\big(\mathbf{u}, \rho^{v\to u}(V')\big).
\end{aligned}
\quad (7)
$$
Apparently, in the DimeNet, ρ^{p→e} = ã_{SBF,ℓn}(d, θ) in Eq. (5). The d and θ are illustrated in Fig. 4 and introduced in Sec. 4.1. The other used functions are neural networks or mathematical operations.
6. Experimental Studies
We apply our SphereNet to three benchmark datasets, including QM9 (Ramakrishnan et al., 2014), Open Catalyst 2020 (OC20) (Chanussot et al., 2020), and MD17 (Chmiela et al., 2017). Baseline methods include PPGN (Maron et al., 2019), SchNet (Schütt et al., 2017), PhysNet (Unke & Meuwly, 2019), Cormorant (Anderson et al., 2019), MGCN (Lu et al., 2019), DimeNet (Klicpera et al., 2020b), DimeNet++ (Klicpera et al., 2020a), CGCNN (Xie & Grossman, 2018), and sGDML (Chmiela et al., 2018). For all baseline methods, we report the results taken from the referred papers or provided by the original authors. For the SphereNet, all models are trained using stochastic gradient descent (SGD) with the Adam optimizer (Kingma & Ba, 2014). The optimal hyperparameters are obtained by grid search. Network configurations and search spaces for all models are provided in Appendix C. Code will be released after the anonymous review period.
We apply the SphereNet to the QM9 dataset, which is widely used for predicting various properties of molecules. It consists of organic molecules composed of up to 9 heavy atoms. Thus, this test can examine the power of the SphereNet for similar quantum chemistry systems. The dataset is originally split into three sets, where the training set contains 110,000, the validation set contains 10,000, and the test set contains 10,831 molecules. For energy-related properties, the training processes use the unit eV. All hyperparameters are tuned on the validation set and applied to the test set. We compare our SphereNet with baselines using the mean absolute error (MAE) for each property and the overall mean standardized MAE (std. MAE) for all the 12 properties.

Table 1. Comparisons between SphereNet and other models in terms of MAE and the overall mean std. MAE on QM9. '-' denotes that no results are reported in the referred papers for the corresponding properties. The best results are shown in bold and the second best results are shown with underlines.

Property  Unit         PPGN    SchNet  PhysNet  Cormorant  MGCN    DimeNet  DimeNet++  SphereNet
µ         D            0.047   0.033   0.0529   0.13       0.0560  0.0286   0.0297
α         a₀³
ε_HOMO    meV          40.3    41      32.9     36         42.1    27.8     24.6
ε_LUMO    meV          32.7    34      24.7     36         57.4    19.7     19.5
Δε        meV          60.0    63      42.5     60         64.2    34.8     32.6
⟨R²⟩      a₀²
U₀        meV          36.8    14      8.15     28         12.9    8.02     6.32
U         meV          36.8    19      8.34     -          14.4    7.89
H         meV          36.3    14      8.42     -          14.6    8.11     6.53
G         meV          36.4    14      9.40     -          16.2    8.98
c_v       cal/(mol·K)  0.055   0.033   0.0280   0.031      0.0380  0.0249   0.0230
std. MAE  %            1.84    1.76    1.37     2.14       1.86    1.05     0.98

Table 2. Comparisons between SphereNet and other models on IS2RE in terms of energy MAE and the percentage of EwT of the ground truth energy. Results are reported for models trained on the training set with size of 10k. The best results are shown in bold.

                    Energy MAE [eV] ↓                      EwT ↑
Model       ID      OOD Ads  OOD Cat  OOD Both    ID      OOD Ads  OOD Cat  OOD Both
CGCNN       1.0479  1.0527   1.0232   0.9608      1.39%   1.38%    1.59%    1.57%
SchNet      1.0858  1.1044   1.0720   1.0391      1.34%   1.39%    1.42%    1.44%
DimeNet     1.0117  1.0734   0.9814   0.9767      1.45%   1.41%    1.53%    1.41%
DimeNet++   0.8819  0.9106   0.8357   0.8408      1.94%   1.69%    2.13%    1.84%
SphereNet   0.8352  0.8723   0.7959   0.7952      1.96%   2.02%    2.19%    1.90%

The comparison results are summarized in Table 1. SphereNet achieves the best performance on 8 properties and the second best performance on 2 properties. It also sets a new state of the art on the overall mean std. MAE of the QM9 dataset.
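The aggregate QM9 metric can be sketched as follows; this is a minimal illustration of ours, and the exact standardization used for the reported std. MAE is our assumption:

```python
import numpy as np

def mean_std_mae(maes, target_stds):
    """Mean standardized MAE (%): each property's MAE divided by the standard
    deviation of its targets, averaged over properties and scaled to percent."""
    maes = np.asarray(maes, dtype=float)
    target_stds = np.asarray(target_stds, dtype=float)
    return float(np.mean(maes / target_stds) * 100.0)
```

Dividing by the per-property target standard deviation makes errors on properties with very different units and scales comparable before averaging.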
The Open Catalyst 2020 (OC20) dataset is a newly released large-scale dataset for catalyst discovery and optimization (Chanussot et al., 2020). It comprises millions of DFT relaxations across a huge chemical structure space such that machine learning models can be fully trained. There exist three tasks, including S2EF, IS2RS, and IS2RE. In this work, we focus on IS2RE, which predicts a structure's energy in the relaxed state. It is the most common task in catalysis, as relaxed energies usually influence the catalyst activity. The dataset for IS2RE is originally split into training/validation/test sets. The test labels are not publicly available. Experiments are conducted on the validation set, which has four splits, including In Domain (ID), Out of Domain Adsorbates (OOD Ads), Out of Domain Catalysts (OOD Cat), and Out of Domain Adsorbates and Catalysts (OOD Both), where the numbers of samples are 24,943, 24,961, 24,963, and 24,987, respectively. Results for all the baselines are provided by the original authors, and we report evaluation results of fixed epochs for SphereNet. Following a setting in Chanussot et al. (2020), we use the training set with size 10k for training models. The used metrics are the energy MAE and the percentage of Energies within a Threshold (EwT) of the ground truth energy. Table 2 shows that the SphereNet consistently achieves the best performance on all the four splits in terms of energy MAE and EwT. It reduces the average energy MAE on the four splits by 0.043, which is 4.91% of the second best model. In addition, it improves the average EwT from 1.90% to 2.02%, which is a large margin considering the inherently low EwT values.
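The EwT metric can be sketched as follows (a minimal illustration of ours; 0.02 eV is the commonly used OC20 threshold, treated here as an assumption):

```python
import numpy as np

def energy_within_threshold(pred, target, threshold=0.02):
    """Percentage of structures whose absolute energy error is within `threshold` (eV)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return 100.0 * float(np.mean(np.abs(pred - target) < threshold))
```

Because the threshold is tight relative to typical energy MAEs above 0.8 eV, even small EwT gains correspond to many additional correctly predicted structures.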
The MD17 dataset is used to examine the expressive power of SphereNet for molecular dynamics simulations. The goal is to predict energy-conserving forces at the atomic level for eight organic molecules, each of which has hundreds of thousands of states simulated by DFT along with atom coordinates. Following the settings in Schütt et al. (2017) and Klicpera et al. (2020b), we train a separate model for each molecule to predict atomic forces. We use 1000 samples for training, and each of the eight molecules has both validation and test sets. Hyperparameters are tuned on the validation sets and applied to the test sets. The results for forces are reported in Table 3. Results for baselines are taken from the referred papers, and there are no original results for DimeNet++. Note that for Benzene, all models are evaluated on Benzene17; thus, the result for sGDML is 0.20 rather than 0.06 (Benzene18). We can observe from the table that SphereNet consistently outperforms SchNet and DimeNet by large margins. Compared with sGDML, SphereNet performs better on four molecules, and achieves a much better std. MAE of 0.97. sGDML is one of the original works that created the MD17 dataset with carefully-designed features; thus, it naturally has advantages for small molecules. However, compared with SphereNet, sGDML has poorer generalization power to larger datasets without hand-engineered features.

Table 3. Comparisons between SphereNet and other models in terms of MAE of forces on MD17. The best results are shown in bold and the second best results are shown with underlines.

Molecule       sGDML  SchNet  DimeNet  SphereNet
Aspirin        0.68   1.35    0.499
Benzene        0.20   0.31    0.187
Ethanol        0.33   0.39    0.230
Malonaldehyde  0.41   0.66    0.383
Naphthalene

Table 4. Comparisons among three message passing strategies on the same SphereNet architecture on the partial MD17 dataset.

Molecule       SMP w/o (θ, ϕ)  SMP w/o ϕ  SMP
Ethanol        0.249           0.22       0.208
Malonaldehyde  0.550           0.360      0.340
Naphthalene    0.372           0.205      0.178
Toluene        0.446           0.182      0.155
The proposed SMP considers all of the distance, angle, and torsion information, leading to complete representations of 3D information. In this section, we investigate the contributions of the different 3D information to demonstrate the advantages of our SMP. We remove torsion information from SMP, which we denote as "SMP w/o ϕ"; we further remove angle information, which we denote as "SMP w/o (θ, ϕ)". The three message passing strategies are integrated into the same architecture, with all other network parts remaining the same. We evaluate these models on four molecules of MD17. Table 4 shows that SMP outperforms SMP w/o ϕ, and SMP w/o ϕ outperforms SMP w/o (θ, ϕ). These results demonstrate the effectiveness of the angle and torsion information used in SMP. The best performance of SMP further reveals that SMP is an accurate architecture for realizing the 3DGN framework.

Figure 6. Visualization of three SphereNet filters. Each row corresponds to a filter with torsion angles 0, π/2, π, and 3π/2 from left to right.

Fig. 6 provides a visualization of filters in a learned SphereNet model. Fixing any two of the distance, angle, and torsion, the structure of a filter changes as the remaining quantity varies. This shows that the distance, angle, and torsion information together determine the structural semantics of the filters, and further demonstrates that SMP enables learning from different types of 3D information to improve representations. Details of the SphereNet filters and more visualization results are provided in Appendix D.

Since SphereNet computes geometric quantities such as torsion angles and employs linear layers for incorporating 3D information, it involves extra parameters and computational resources. We study the efficiency of SphereNet by comparing it with other models in terms of the number of parameters and time cost per epoch using the same computing infrastructure (NVIDIA GeForce RTX 2080 Ti 11GB). Experiments are conducted on the property U of QM9, and the results are shown in Table 5. SphereNet uses computational resources similar to those of DimeNet++ and is much more efficient than DimeNet.

Table 5. Efficiency comparisons between SphereNet and other models in terms of number of parameters and time cost per epoch using the same infrastructure.

SchNet   DimeNet   DimeNet++   SphereNet
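The filter visualizations in Fig. 6 are produced by sampling locations (d, θ, ϕ) and reading out one element of the learned embedding at each location (the exact computation is given in Appendix D). A minimal NumPy sketch of this sampling; the grid ranges, the stand-in `basis_fn`, and all names here are illustrative assumptions, not the released implementation:

```python
import numpy as np

def filter_slices(basis_fn, W1, W2, element=0, n_torsion=4):
    """Sample filter values on a (d, theta) grid for each torsion sample.

    A sampling rate of pi/2 in the torsion direction gives the four
    angles 0, pi/2, pi, 3pi/2 (one 2D map per torsion angle).
    `basis_fn(d, theta, phi)` stands in for the torsion basis embedding,
    and W1, W2 stand in for the two linear layers applied to it.
    """
    ds = np.linspace(0.5, 5.0, 50)
    thetas = np.linspace(0.0, np.pi, 50)
    phis = np.arange(n_torsion) * (2 * np.pi / n_torsion)
    maps = np.empty((n_torsion, ds.size, thetas.size))
    for k, phi in enumerate(phis):
        for i, d in enumerate(ds):
            for j, th in enumerate(thetas):
                emb = W2 @ (W1 @ basis_fn(d, th, phi))
                maps[k, i, j] = emb[element]  # one filter element per map
    return maps
```

Each returned 2D map corresponds to one column of Fig. 6: a slice of the filter over distance and angle at a fixed torsion.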
7. Conclusion
3D information is important for real-world graph data, but the existing GN framework does not consider it. We first build the generic and unified 3DGN framework to provide a clear interface for 3D graphs. We further develop SMP, a novel message passing architecture for realizing 3DGN, and show that SMP is a complete and accurate architecture in the SCS. Based on SMP and physically meaningful representations, we present SphereNet for real-world 3D graph data. Experimental results on various types of datasets show that SphereNet leads to significant performance improvements without increasing computational cost.
Acknowledgments
This work was supported in part by National Science Foundation grant IIS-1908198 and National Institutes of Health grant 1R21NS102828.
References
Anderson, B., Hy, T.-S., and Kondor, R. Cormorant: Covariant molecular neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 14537–14546, 2019.

Axelrod, S. and Gomez-Bombarelli, R. GEOM: Energy-annotated molecular conformations for property prediction and molecular generation. arXiv preprint arXiv:2006.05531, 2020.

Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.

Calais, J.-L. Density-functional theory of atoms and molecules. R. G. Parr and W. Yang, Oxford University Press, New York, Oxford, 1989. ix+333 pp. International Journal of Quantum Chemistry, 47(1):101–101, 1993.

Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., et al. The Open Catalyst 2020 (OC20) dataset and community challenges. arXiv preprint arXiv:2010.09990, 2020.

Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.

Chmiela, S., Sauceda, H. E., Müller, K.-R., and Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nature Communications, 9(1):1–10, 2018.

Cohen, T., Weiler, M., Kicanaoglu, B., and Welling, M. Gauge equivariant convolutional networks and the icosahedral CNN. In International Conference on Machine Learning, pp. 1321–1330, 2019.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems, 29:3844–3852, 2016.

Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6533–6542, 2017.

Gao, H. and Ji, S. Graph U-Nets. In Proceedings of the 36th International Conference on Machine Learning, pp. 2083–2092, 2019.

Gao, H., Wang, Z., and Ji, S. Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1416–1424, 2018.

Gao, H., Liu, Y., and Ji, S. Topology-aware graph pooling networks. arXiv preprint arXiv:2010.09834, 2020.

Garg, V., Jegelka, S., and Jaakkola, T. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning, pp. 3419–3430, 2020.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1263–1272. JMLR.org, 2017.

Gori, M., Monfardini, G., and Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pp. 729–734. IEEE, 2005.

Griffiths, D. J. and Schroeter, D. F. Introduction to Quantum Mechanics. Cambridge University Press, 2018.

Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.

Ingraham, J., Garg, V. K., Barzilay, R., and Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems, pp. 15794–15805, 2019.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.

Klicpera, J., Giri, S., Margraf, J. T., and Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In NeurIPS-W, 2020a.

Klicpera, J., Groß, J., and Günnemann, S. Directional message passing for molecular graphs. In International Conference on Learning Representations (ICLR), 2020b.

Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016.

Liu, S., Demirel, M. F., and Liang, Y. N-gram graph: Simple unsupervised representation for graphs, with applications to molecules. Advances in Neural Information Processing Systems, 32:8464–8476, 2019.

Liu, Y., Yuan, H., Cai, L., and Ji, S. Deep learning of high-order interactions for protein interface prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 679–687, 2020.

Lu, C., Liu, Q., Wang, C., Huang, Z., Lin, P., and He, L. Molecular property prediction: A multilevel quantum interactions modeling perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 1052–1060, 2019.

Maron, H., Ben-Hamu, H., Serviansky, H., and Lipman, Y. Provably powerful graph networks. Advances in Neural Information Processing Systems, 32:2153–2164, 2019.

Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R., and Miller III, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. The Journal of Chemical Physics, 153(12):124111, 2020.

Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):1–7, 2014.

Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J., and Battaglia, P. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning, pp. 8459–8468, 2020.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.

Schütt, K., Kindermans, P.-J., Felix, H. E. S., Chmiela, S., Tkatchenko, A., and Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pp. 991–1001, 2017.

Shervashidze, N., Schweitzer, P., Van Leeuwen, E. J., Mehlhorn, K., and Borgwardt, K. M. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.

Sholl, D. and Steckel, J. A. Density Functional Theory: A Practical Introduction. John Wiley & Sons, 2011.

Simm, G., Pinsler, R., and Hernández-Lobato, J. M. Reinforcement learning for molecular design guided by quantum mechanics. In International Conference on Machine Learning, pp. 8959–8969, 2020.

Townshend, R., Bedi, R., Suriana, P., and Dror, R. End-to-end learning on 3D protein structure for interface prediction. Advances in Neural Information Processing Systems, 32:15642–15651, 2019.

Unke, O. T. and Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. Journal of Chemical Theory and Computation, 15(6):3678–3693, 2019.

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018.

Vignac, C., Loukas, A., and Frossard, P. Building powerful and equivariant graph neural networks with message-passing. arXiv preprint arXiv:2006.15107, 2020.

Wang, Z., Liu, M., Luo, Y., Xu, Z., Xie, Y., Wang, L., Cai, L., and Ji, S. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. arXiv preprint arXiv:2012.01981, 2020.

Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.

Xie, T. and Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14):145301, 2018.

Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.

Yuan, H. and Ji, S. StructPool: Structured graph pooling via conditional random fields. In Proceedings of the 8th International Conference on Learning Representations, 2020.

Zhang, M., Cui, Z., Neumann, M., and Chen, Y. An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

Spherical Message Passing for 3D Graph Networks: Appendix
Figure 7. Architecture of SphereNet. LB2 denotes a linear block with two linear layers, σ(LB) denotes a linear layer followed by an activation function, ‖ denotes concatenation, and ⊙ denotes element-wise multiplication. Each LB2 removes bottlenecks by performing a down-projection followed by an up-projection; hence, it has three hyperparameters: input embedding size, intermediate size, and output embedding size. Each linear block LB has two hyperparameters: input embedding size and output embedding size. Each block is described in Sec. A.

A. Architecture of SphereNet
The architecture of SphereNet is designed based on DimeNet++ (Klicpera et al., 2020a), into which we incorporate our proposed spherical message passing and the torsion representation t̃_BF,ℓmn(d, θ, ϕ). The detailed architecture is shown in Fig. 7. Specifically, SphereNet is composed of an input block, followed by multiple interaction blocks and an output block. For simplicity, the architecture is explained in terms of updating the receiver node r_k of the message e_k, as described in Eq. 2 and Sec. 4.3 of the main paper.

Input Block constructs the initial message e_k for edge k. Inputs include the distance representation ẽ_RBF,n(d) for edge k and the initial node embeddings v_{s_k} and v_{r_k} for the sender node s_k and the receiver node r_k. The distance information is encoded using an LB2 block.

Interaction Block updates the message e_k by incorporating all three physical representations. The input 3D information includes the distance embedding ẽ_RBF,n(d), the angle ã_SBF,ℓn(d, θ), and the torsion t̃_BF,ℓmn(d, θ, ϕ). Their initial embedding sizes are N_SHBF, N_SRBF × N_SHBF, and N_SRBF × N_SHBF, respectively. Other inputs are the old message e_k and the set of messages E_{s_k} that point to the sender node s_k. As in the input block, each type of 3D information is encoded using an LB2 block. Note that each ⊙ indicates element-wise multiplication between the corresponding 3D information, represented as a vector, and each message in the set E_{s_k}; thus, each neighboring message of e_k is gated by the encoded 3D information. The Σ aggregates all the gated messages in E_{s_k} into a vector, which is added to the transformation of the old message e_k to give the updated message e'_k. The transformation branch for the old message e_k is composed of several nonlinear layers and residual blocks, as shown in Fig. 7.
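The gating-and-aggregation step of the interaction block can be sketched as follows. This is a simplified NumPy illustration (a single edge, no biases or activations in the LB2, and illustrative shapes), not the exact implementation:

```python
import numpy as np

def lb2(x, w_down, w_up):
    """LB2 block: a down-projection followed by an up-projection."""
    return (x @ w_down) @ w_up

def gate_and_aggregate(geo_feat, neighbor_msgs, w_down, w_up):
    """Encode 3D information with an LB2 block, gate each neighboring
    message in E_{s_k} element-wise, and sum-aggregate the results."""
    g = lb2(geo_feat, w_down, w_up)      # (emb,) encoded 3D information
    gated = g[None, :] * neighbor_msgs   # (num_neighbors, emb) gating
    return gated.sum(axis=0)             # (emb,) aggregated message update
```

The aggregated vector would then be added to the transformed old message e_k, as in the residual branch of Fig. 7.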
Output Block aggregates all incoming messages to update the feature of node r_k. Each incoming message undergoes the same update process as e_k via the interaction blocks. For clarity, we use e'_k to denote each updated incoming message, which is further gated by the distance representation vector ẽ_RBF,n(d).

B. Relations with Prior 3DGN Models
B.1. SchNet, Schütt et al. (2017)
In SchNet, the aggregation function used to encode 3D positional information is ρ^{p→e}({r_h}_{h=r_k∪s_k}) = ẽ_RBF,n(‖r_{r_k} − r_{s_k}‖), which converts the positional information into a distance embedding. In addition to the ρ^{p→e} function, the φ^e function used is NN(NN(v_{r_k}) ⊙ NN(ẽ_RBF,n(‖r_{r_k} − r_{s_k}‖))), where NN denotes a neural network and ⊙ denotes element-wise multiplication. The ρ^{e→v} function is Σ_{(e'_k,r_k,s_k)∈E_i} e'_k. The φ^v function is v_i + Σ_{(e'_k,r_k,s_k)∈E_i} e'_k. The global feature u is updated based on the final node features V^T, and the function is φ^u = Σ_{i=1:n} NN(v_i^T). Formally, the update process is expressed as

$$\begin{aligned}
e'_k &= \phi^e\big(v_{r_k}, \rho^{p\to e}(\{r_h\}_{h=r_k\cup s_k})\big)
= \phi^e\big(v_{r_k}, \tilde{e}_{\mathrm{RBF},n}(\|r_{s_k}-r_{r_k}\|)\big)
= \mathrm{NN}\big(\mathrm{NN}(v_{r_k}) \odot \mathrm{NN}(\tilde{e}_{\mathrm{RBF},n}(\|r_{r_k}-r_{s_k}\|))\big), \\
v'_i &= \phi^v\big(v_i, \rho^{e\to v}(E_i)\big)
= \phi^v\Big(v_i, \sum_{(e'_k,r_k,s_k)\in E_i} e'_k\Big)
= v_i + \sum_{(e'_k,r_k,s_k)\in E_i} e'_k, \\
u &= \phi^u\big(\rho^{v\to u}(V^T)\big) = \sum_{i=1:n} \mathrm{NN}(v_i^T).
\end{aligned} \tag{8}$$

B.2. PhysNet, Unke & Meuwly (2019)
PhysNet uses the distance between atoms as an important feature and proposes more powerful neural networks for chemical applications. The positional aggregation function is ρ^{p→e}({r_h}_{h=r_k∪s_k}) = g(‖r_{r_k} − r_{s_k}‖), where g is any radial basis function with a smooth cutoff. For the information update functions, the φ^e function is σ(W_1 σ(v_{s_k})) ⊙ W_2 g(‖r_{r_k} − r_{s_k}‖), the φ^v function is NN(W_3 ⊙ v_i + NN(σ(W_4 σ(v_i)) + Σ_{(e'_k,r_k,s_k)∈E_i} e'_k)), and the φ^u function is u + Σ_{i=1:n} NN(v'_i). Here NN denotes a neural network, W_1, W_2, W_3, W_4 are learnable weight matrices, σ is an activation function, and ⊙ denotes element-wise multiplication. PhysNet is expressed as

$$\begin{aligned}
e'_k &= \phi^e\big(v_{s_k}, \rho^{p\to e}(\{r_h\}_{h=r_k\cup s_k})\big)
= \phi^e\big(v_{s_k}, g(\|r_{r_k}-r_{s_k}\|)\big)
= \sigma(W_1\,\sigma(v_{s_k})) \odot W_2\, g(\|r_{r_k}-r_{s_k}\|), \\
v'_i &= \phi^v\big(v_i, \rho^{e\to v}(E_i)\big)
= \mathrm{NN}\Big(W_3 \odot v_i + \mathrm{NN}\Big(\sigma(W_4\,\sigma(v_i)) + \sum_{(e'_k,r_k,s_k)\in E_i} e'_k\Big)\Big), \\
u' &= \phi^u\big(\rho^{v\to u}(V'), u\big) = u + \sum_{i=1:n} \mathrm{NN}(v'_i).
\end{aligned} \tag{9}$$

B.3. DimeNet, Klicpera et al. (2020b)
DimeNet explicitly considers distances between atoms and angles between directed edges. The aggregation function on the positional information is ρ^{p→e} = (ẽ_RBF,n ‖ ã_SBF,ℓn), where ‖ denotes concatenation. For the other functions, the φ^e function used produces e'_k = (e'_{k,1} ‖ e'_{k,2}), with e'_{k,1} and e'_{k,2} given in Eq. 10 below; NN denotes a neural network, W_1, …, W_5 are different weight matrices, and σ is an activation function. The ρ^{e→v} function is Σ_{(e'_k,r_k,s_k)∈E_i} e'_{k,2} and the φ^v is NN(Σ_{(e'_k,r_k,s_k)∈E_i} e'_{k,2}). The ρ^{v→u} is Σ_{i=1:n} v'_i and the φ^u is u + Σ_{i=1:n} v'_i. Note that the ρ^{p→v}, ρ^{p→u}, and ρ^{e→u} functions are not required in DimeNet. The whole model is expressed as

$$\begin{aligned}
e_k &= (e_{k,1} \,\|\, e_{k,2}), \qquad \rho^{p\to e} = (\tilde{e}_{\mathrm{RBF},n} \,\|\, \tilde{a}_{\mathrm{SBF},\ell n}), \\
e'_{k,1} &= \phi^e\big(e_k, E_{s_k}, \rho^{p\to e}(\{r_h\}_{h=r_k\cup s_k\cup \mathcal{N}_{s_k}})\big) \\
&= \mathrm{NN}\Big(e_{k,1} + \mathrm{NN}\Big(\sigma(W_1 e_{k,1}) + \sum_{(e_j,r_j,s_j)\in E_{s_k}} W_2\,\tilde{a}^{k,j}_{\mathrm{SBF},\ell n}\big(W_3\,\tilde{e}^{j}_{\mathrm{RBF},n} \odot \sigma(W_4 e_{j,1})\big)\Big)\Big), \\
e'_{k,2} &= W_5\,\tilde{e}^{j}_{\mathrm{RBF},n} \odot e'_{k,1}, \\
v'_i &= \phi^v\big(\rho^{e\to v}(E_i)\big) = \mathrm{NN}\Big(\sum_{(e'_k,r_k,s_k)\in E_i} e'_{k,2}\Big), \\
u' &= \phi^u\big(u, \rho^{v\to u}(V')\big) = u + \sum_{i=1:n} v'_i.
\end{aligned} \tag{10}$$

C. Experimental Setup
For all the models used on the three datasets, we set the input embedding size to 256 and the output embedding size to 64 for both LB2 and LB blocks. For each model, we first perform warmup on the initial learning rate. Two learning rate schedules, CosineAnnealingLR and StepLR, are then used for training. For StepLR, the learning rate is decayed by the decay ratio every fixed number of epochs, given by the step size. We do not use weight decay or dropout for any model. For MD17, we follow the settings in Klicpera et al. (2020b) and Schütt et al. (2017) and use a joint loss of forces and conserved energy during training. The force weight is set to 100 for all models. Some hyperparameters are fixed, and some are tuned by grid search. Values/search spaces of hyperparameters for OC20, QM9, and MD17 are provided in Table 6, Table 7, and Table 8, respectively. As described in the main paper, optimized hyperparameters are tuned on validation sets and applied to test sets for QM9 and MD17. For OC20, optimized hyperparameters are obtained on the ID split within the maximum number of epochs and then applied to the other three splits. PyTorch is used to implement all methods. For the QM9 and MD17 datasets, all models are trained using one NVIDIA GeForce RTX 2080 Ti 11GB GPU. For the OC20 dataset, all models are trained using four NVIDIA Tesla V100 32GB GPUs.
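The joint MD17 objective described above (a force weight of 100 on top of the conserved-energy term) can be sketched as follows. The MAE form and the precomputed force tensors are simplifying assumptions here; in energy-conserving models, forces are typically obtained as the negative gradient of the predicted energy with respect to atomic positions via autograd:

```python
import numpy as np

def joint_loss(e_pred, e_true, f_pred, f_true, rho=100.0):
    """Joint training objective (sketch): energy MAE plus a
    rho-weighted force MAE (rho = 100 in our MD17 setup)."""
    energy_mae = np.mean(np.abs(np.asarray(e_pred) - np.asarray(e_true)))
    force_mae = np.mean(np.abs(np.asarray(f_pred) - np.asarray(f_true)))
    return energy_mae + rho * force_mae
```

A large force weight emphasizes the per-atom force targets, which carry far more supervision signal per configuration than the single energy scalar.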
D. SphereNet Filter Visualization
We visualize SphereNet filters from a learned SphereNet model. Specifically, we take the learned weights of the LB2 block after the torsion embedding t̃_BF,ℓmn(d, θ, ϕ) in Fig. 7. For each location represented by a tuple (d, θ, ϕ), the initial embedding size is N_SRBF × N_SHBF. The computation of this LB2 is W_2(W_1 t̃_BF,ℓmn(d, θ, ϕ)), which results in a new embedding size of 64 for each location (d, θ, ϕ). We then sample locations in 3D space to visualize the weights as SphereNet filters. We set the sampling rate in the torsion direction to π/2 in Fig. 6 and use a smaller rate of π/4 in Fig. 8; thus, there are four samples in the torsion direction in Fig. 6 and eight samples in Fig. 8. For the distance and angle directions, we use much smaller sampling steps to provide visualization maps with high resolution. There are 64 elements in total for each location, and we randomly pick six of them, as shown in Fig. 8. Table 6.
Values/search space for hyperparameters on OC20.
Hyperparameters                                       Values/search space
Interaction block - distance LB2 intermediate size    8
Interaction block - angle LB2 intermediate size       8
Interaction block - torsion LB2 intermediate size     8
N_SRBF
N_SHBF
Figure 8. Visualization of six SphereNet filters. Each row corresponds to a filter with torsion angles 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, and 7π/4 from left to right. Table 7.
Values/search space for hyperparameters on QM9.
Hyperparameters                                       Values/search space
Interaction block - distance LB2 intermediate size    4, 8, 16
Interaction block - angle LB2 intermediate size       4, 8, 16
Interaction block - torsion LB2 intermediate size     4, 8, 16
N_SRBF
N_SHBF
Table 8. Values/search space for hyperparameters on MD17.