Interest-aware Message-Passing GCN for Recommendation
Fan Liu†, Zhiyong Cheng§∗, Lei Zhu‡, Zan Gao§, Liqiang Nie†∗
† School of Computer Science and Technology, Shandong University
§ Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences)
‡ School of Information Science and Engineering, Shandong Normal University
{liufancs,jason.zy.cheng,nieliqiang}@gmail.com
ABSTRACT
Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability of learning good user and item embeddings by exploiting the collaborative signals from high-order neighbors. Like other GCN models, GCN-based recommendation models also suffer from the notorious over-smoothing problem: when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulting in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent; however, we argue that they overlook an important factor for the over-smoothing problem in recommendation, namely that high-order neighboring users with no common interests with a user can also be involved in the user's embedding learning in the graph convolution operation. As a result, multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. A subgraph consists of users with similar interests and their interacted items. To form the subgraphs, we design an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user features and the graph structure. In this way, our model can avoid propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model can gain performance improvement by stacking more layers and significantly outperform the state-of-the-art GCN-based recommendation models.
CCS CONCEPTS
• Information systems → Recommender systems.

KEYWORDS
Recommendation, Graph Convolution Networks, Message-Passing Strategy, Interest-aware, Subgraph
ACM Reference Format:
Fan Liu†, Zhiyong Cheng§∗, Lei Zhu‡, Zan Gao§, Liqiang Nie†∗. 2021. Interest-aware Message-Passing GCN for Recommendation. In Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3442381.3449986

∗ Corresponding author: Zhiyong Cheng and Liqiang Nie.
This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3449986
INTRODUCTION

Recommendation systems have become one of the most important techniques for various online platforms. They not only provide personalized information for a specific user out of overwhelming information, but also increase the revenue of service providers. Collaborative filtering (CF) based models [1, 15, 20, 38, 41] have made substantial progress in learning user and item representations by modeling historical user-item interactions. For example, matrix factorization (MF) directly embeds users/items as feature vectors and models the user-item interactions with the inner product [1]. Neural collaborative filtering models replace the MF interaction function of the inner product with nonlinear neural networks to learn better user and item representations [15].

Recently, GCN-based models [14, 21, 29, 32, 33] have achieved great success in recommendation due to their powerful capability of representation learning from non-Euclidean structures. The core of GCN-based models is to iteratively aggregate feature information from local graph neighbors. This has been proved to be an efficient way to distill additional information from the graph structure, and thus it improves user and item representation learning and alleviates the sparsity problem. For example, NGCF [33] has shown that exploiting high-order connectivity can help alleviate the sparsity problem in recommendation. However, it is also well recognized that GCNs suffer from the over-smoothing problem [33], because the graph convolution operation is actually a special kind of graph Laplacian smoothing [33], making node representations become indistinguishable after multi-layer graph convolution [40]. As a result, most current GCN-based models obtain their peak performance by stacking only a few layers (e.g., 2 or 3 layers), and continuing to increase the depth leads to sharp performance degradation. In the domain of recommendation, Chen et al.
[3] have empirically demonstrated that the user/item embeddings become more similar when stacking more layers in NGCF due to the over-smoothing effect. In other words, the preferences of different users become homogeneous, resulting in performance degradation in recommendation. Based on these observations, they proposed the LR-GCN model, which removes the non-linearities in GCNs to simplify the network structure and introduces a residual network structure to alleviate the over-smoothing problem, achieving substantial improvement over NGCF in recommendation accuracy.

It is worth mentioning that the LightGCN proposed by He et al. [14] has a similar formulation to LR-GCN. With careful experimental studies, He et al. pointed out that the feature transformation and nonlinear activation have no positive effect (or even a negative effect due to the increase of training difficulty) on the final performance. Therefore, they only keep the neighborhood aggregation in LightGCN for collaborative filtering. Compared to LR-GCN, LightGCN further removes the "self-loop" in the aggregation operation. Although LightGCN is not dedicatedly designed for tackling the over-smoothing problem, it has almost the same formulation as LR-GCN and thus can also alleviate the over-smoothing problem to some extent.
In fact, both LR-GCN and LightGCN are consistent with the recent theories on simplifying GCNs [37] and obtain their best performance with a deeper structure (e.g., 4 layers). Although these two successful GCN-based models are designed for recommendation, we argue that they are still designed from the perspective of graph convolution and have not well considered the over-smoothing problem in the domain of recommendation.

A GCN-based recommendation model is built upon a user-item graph, in which users and items are linked according to the historical user-item interactions. A user embedding is learned by iteratively aggregating messages passed from the neighboring (both user and item) nodes. Note that the passed messages are distilled from the embeddings of neighboring nodes. When stacking 𝑘 layers, the information from the 𝑘-order neighbors, which are indirectly connected via items and users, is also involved in the embedding learning of a target node. An underlying assumption is that the collaborative signals from high-order neighbors are beneficial to embedding learning. However, not all the information from high-order neighbors is positive in reality. In the user-item interaction graph, high-order neighboring users could have no common or even contradictory interests with a target user. This is highly possible, especially when the graph is constructed based on implicit feedback (e.g., clicks). In fact, implicit feedback is more widely used than explicit feedback in modern recommendation systems. The core idea behind collaborative filtering is that similar users like similar items. Therefore, the collaborative signals that we would like to exploit should be from similar users (i.e., users with similar interests). However, existing GCN-based recommendation models do not distinguish the high-order neighbors and simply aggregate the messages from all those neighbors to update user embeddings.
As a result, the embeddings of dissimilar users are also involved in the embedding learning of a target user, negatively affecting the performance. This is also a reason for the over-smoothing effect in GCN-based recommendation models: it makes the embeddings of dissimilar users similar.

Motivated by the above considerations, in this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which groups users and their interacted items into different subgraphs and operates high-order graph convolutions inside subgraphs. More specifically, we adopt the simplified network structure of LightGCN, as its effectiveness has been well demonstrated in [14] and it can alleviate the over-smoothing problem to some extent. The first-order graph convolution is the same as that of LightGCN. For the high-order graph convolution, only the messages from nodes in the same subgraph are exploited to learn the node embeddings. The subgraphs are generated by a proposed subgraph generation module, which integrates user features and the graph structure to identify users with similar interests, and then constructs the subgraphs by retaining those users and their interacted items. In this way, our model can filter out the negative information propagation in the high-order graph convolution operations for embedding learning, and thus can preserve the uniqueness of users while stacking more graph convolution layers. Extensive experiments have been conducted on three large-scale real-world datasets to validate the effectiveness of our model. Results show that our model outperforms the state-of-the-art methods by a large margin and can obtain better performance with more layers (up to 7 layers). This indicates that our model can benefit from higher-order neighbors by excluding negative nodes.
Besides, with deep analysis of the results, we find that the negative information in the embedding propagation is the major reason for the performance degradation of existing GCN-based recommendation models with deep structures. We released the code and the involved parameter settings to facilitate reproduction of this work. In summary, the main contributions of this work are as follows:
• We step into the over-smoothing problem in existing GCN-based recommendation models and point out an overlooked factor: exploiting high-order neighbors indiscriminately makes the embeddings of users with dissimilar interests similar.
• We propose an IMP-GCN model which exploits high-order neighbors from the same subgraph, in which the user nodes share more similar interests than those in other subgraphs. It is proved to be effective in alleviating the over-smoothing problem.
• We design a subgraph generation module to group users and generate subgraphs from the user-item bipartite graph by considering user features and graph structure information.
• We conduct empirical studies on three benchmark datasets to evaluate the proposed IMP-GCN model. Results show that IMP-GCN can gain improvement by stacking more layers and learn better user/item embeddings, and thus outperforms the SOTA GCN-based recommendation models by a large margin.
PRELIMINARIES

Let 𝑨 ∈ R^{𝑁×𝑀} be the user-item interaction matrix, where 𝑁 and 𝑀 indicate the number of users and items, respectively. A nonzero entry 𝑎_{𝑢𝑖} ∈ 𝑨 indicates that user 𝑢 ∈ U has interacted with item 𝑖 ∈ I before; otherwise, the entry is zero. A user-item bipartite graph G = (W, E) can be constructed based on the interaction matrix, where the node set W consists of the two types of nodes (user nodes and item nodes) and E stands for the set of edges. For a nonzero 𝑎_{𝑢𝑖}, there is an edge between user 𝑢 and item 𝑖. The above information is taken as the input of a GCN model to learn the user and item representations by iteratively aggregating features from neighboring nodes in the bipartite graph.

Here we take LightGCN as an example to describe the GCN-based recommendation model, because it achieves state-of-the-art performance with a very light design, and our model is also developed based on its design. (Note that although LR-GCN was inspired by a different motivation, its final formulation is almost the same as LightGCN.) Let 𝒆_𝒖^(0) denote the ID embedding of user 𝑢 and 𝒆_𝒊^(0) denote the ID embedding of item 𝑖.

(In experiments, we found that by stacking 7 layers, a user node almost reaches all the other users in the three datasets; therefore, there is no further gain after stacking 7 layers. Our code is available at https://github.com/liufancs/IMP_GCN.)
[Figure 1: The average ratio of nodes involved in different layers of graph convolution on the three datasets (Kindle Store, Home&Kitchen, and Gowalla).]

The graph convolution operation in LightGCN is described as follows:

$\mathbf{e}_u^{(k)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}\sqrt{|\mathcal{N}_i|}} \mathbf{e}_i^{(k-1)}, \quad \mathbf{e}_i^{(k)} = \sum_{u \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i|}\sqrt{|\mathcal{N}_u|}} \mathbf{e}_u^{(k-1)},$  (1)

where 𝒆_𝒖^(𝑘) and 𝒆_𝒊^(𝑘) represent the embeddings of user 𝑢 and item 𝑖 after 𝑘 layers of propagation, respectively; N_𝑢 denotes the set of items that user 𝑢 has interacted with, and N_𝑖 denotes the set of users that have interacted with item 𝑖. $\frac{1}{\sqrt{|\mathcal{N}_u|}\sqrt{|\mathcal{N}_i|}}$ is the symmetric normalization term, which avoids the scale of embeddings increasing with graph convolution operations [18]. After 𝐾 layers of graph convolution, the final embeddings of a user 𝑢 and an item 𝑖 are the combination of their embeddings obtained at each layer in LightGCN:

$\mathbf{e}_u = \sum_{k=0}^{K} \alpha_k \mathbf{e}_u^{(k)}; \quad \mathbf{e}_i = \sum_{k=0}^{K} \alpha_k \mathbf{e}_i^{(k)},$  (2)

where $\alpha_k \geq 0$ denotes the importance of the 𝑘-th layer embedding in the final representation. Figure 1 reports the average ratio of the number of nodes that a target node reaches in the propagation, by stacking different numbers of layers, to all the nodes in the graph. It can be seen that after 6- or 7-layer graph convolution, a node can receive information from almost all the other nodes in embedding propagation. Therefore, by aggregating information from all the connected high-order neighbors, it is unavoidable that the node embeddings become homogeneous in current GCN-based models after stacking more layers, especially for the densely connected nodes, whose embeddings will become more and more similar. In the recommendation scenario, this means the uniqueness of users will be neglected in deep structures.

Actually, current GCN-based recommendation models achieve their peak performance with at most 3 or 4 layers [14, 37]. Besides the over-smoothing effect, we deem that a node also takes in noisy or negative information in the embedding propagation process, which hurts the final performance. This is because a user's interests often span a range of items.
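As a concrete illustration, the LightGCN propagation of Eqs. (1)–(2) can be sketched in a few lines of numpy. The function name and the dense-matrix representation are our own simplification for clarity (a real implementation would use sparse matrices), not the authors' released code; uniform layer weights α_k = 1/(K+1) follow LightGCN's default.

```python
import numpy as np

def lightgcn_propagate(R, E0, K):
    """Sketch of LightGCN propagation (Eqs. 1-2).

    R  : (N, M) binary user-item interaction matrix A.
    E0 : (N + M, d) initial ID embeddings, users stacked above items.
    K  : number of graph convolution layers.
    """
    N, M = R.shape
    # Bipartite adjacency and symmetric normalization D^{-1/2} A D^{-1/2}.
    A = np.zeros((N + M, N + M))
    A[:N, N:] = R
    A[N:, :N] = R.T
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Layer combination with uniform weights alpha_k = 1 / (K + 1).
    E, out = E0, E0 / (K + 1)
    for _ in range(K):
        E = L @ E            # e^(k) aggregates normalized e^(k-1) of neighbors
        out = out + E / (K + 1)
    return out               # final user rows: out[:N], item rows: out[N:]
```

Because every layer multiplies by the same normalized adjacency, a node's receptive field grows with K, which is exactly the mechanism behind the coverage ratios in Figure 1.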
Different users can have very different interests or even exhibit contradictory attitudes toward some items. Without distinguishing those users, the embedding propagation may be performed among users with very different interests when learning their embeddings in the graph convolution operation. To avoid this situation and alleviate the over-smoothing problem, it is important to group users with similar interests (and their interacted items) into subgraphs and constrain the embedding propagation to operate inside the subgraphs. To achieve this goal, we propose the interest-aware message-passing GCN model. By constructing subgraphs, we would like all the information propagated in a subgraph to contribute to the embedding learning of all the nodes in this subgraph. In other words, we aim to exclude the negative information propagation in the graph convolution operation by using subgraphs. To this end, we rely on user nodes to form subgraphs in the user-item bipartite graph. The general idea is that users with more similar interests are grouped into a subgraph, and the items which are directly linked to those users also belong to this subgraph. Therefore, each user belongs to only one subgraph, while an item can be associated with multiple subgraphs. Let 𝐺_𝑠 with 𝑠 ∈ {1, · · · , 𝑁_𝑠} denote a subgraph, where 𝑁_𝑠 is the number of subgraphs. In the following, we introduce the graph convolution operation in our model.

Because the direct interactions between users and items provide the most important and reliable information about user interests, in the first-order propagation, all the first-order neighbors are involved in the graph convolution operation. Let 𝒆_𝒖^(0) and 𝒆_𝒊^(0) denote the ID embeddings of user 𝑢 and item 𝑖, respectively.
The first-order graph convolution is:

$\mathbf{e}_u^{(1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}\sqrt{|\mathcal{N}_i|}} \mathbf{e}_i^{(0)}, \quad \mathbf{e}_i^{(1)} = \sum_{u \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i|}\sqrt{|\mathcal{N}_u|}} \mathbf{e}_u^{(0)},$  (3)

where 𝒆_𝒖^(1) and 𝒆_𝒊^(1) represent the first-layer embeddings of the target user 𝑢 and item 𝑖, respectively. For the high-order graph convolution, to avoid introducing noisy information, a node in a subgraph can only exploit the information from its neighboring nodes in this subgraph. Because the items interacted with by a user all belong to the subgraph of this user, the user can still receive information from all its linked items. However, for an item node, its direct user neighbors can be distributed in different
[Figure 2: An overview of our IMP-GCN model with two subgraphs as illustration. In IMP-GCN, the first-order propagation operates on the whole graph, and the high-order propagation operates inside the subgraphs.]

subgraphs. To learn the embeddings of an item 𝑖, for each subgraph 𝐺_𝑠 it belongs to, we learn a separate embedding for this item. Let 𝒆_𝒊𝒔^(𝑘) denote the embedding of item 𝑖 in subgraph 𝑠 after 𝑘 layers of graph convolution; the high-order propagation in IMP-GCN is defined as:

$\mathbf{e}_u^{(k+1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}\sqrt{|\mathcal{N}_i|}} \mathbf{e}_{is}^{(k)}, \quad \mathbf{e}_{is}^{(k+1)} = \sum_{u \in \mathcal{N}_i^s} \frac{1}{\sqrt{|\mathcal{N}_i|}\sqrt{|\mathcal{N}_u|}} \mathbf{e}_u^{(k)},$  (4)

where $\mathcal{N}_i^s$ denotes the users of item 𝑖 that belong to subgraph 𝐺_𝑠. In this way, we guarantee that the embedding of a node learned in a subgraph only contributes to the embedding learning of the other nodes in this subgraph, which avoids noisy information propagated from unrelated nodes. 𝒆_𝒊𝒔^(·) can be regarded as the features learned from the users with similar interests in the subgraph 𝐺_𝑠. This makes sense, since users with similar interests often prefer the same features of an item. The final representation of an item 𝑖 after 𝑘 layers of graph convolution is the combination of its embeddings learned in different subgraphs, i.e.,

$\mathbf{e}_i^{(k)} = \sum_{s \in \mathcal{S}} \mathbf{e}_{is}^{(k)},$  (5)

where S is the set of subgraphs that item 𝑖 belongs to. We combine the embeddings obtained at each layer to form the final representations of user 𝑢 and item 𝑖 as in Eq. 2. Similar to LightGCN, $\alpha_k$ is set uniformly to $1/(K+1)$ [14]. With the learned embeddings of users (i.e., 𝒆_𝒖) and items (i.e., 𝒆_𝒊), given a user 𝑢 and a target item 𝑖, the preference of the user for the item is computed by the inner product:

$\hat{r}_{ui} = \mathbf{e}_u^{T} \mathbf{e}_i.$  (6)

Notice that other interaction functions can also be applied, such as Euclidean distance.
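To make the subgraph-constrained propagation concrete, here is a minimal numpy sketch of one high-order IMP-GCN layer in the spirit of Eqs. (4)–(5). The per-node loops and the dict-based data structures are our own illustrative simplification, not the released implementation, which works with sparse matrices.

```python
import numpy as np

def imp_gcn_layer(Eu, Ei_s, user_items, item_users, group):
    """One high-order IMP-GCN layer (Eqs. 4-5), a simplified sketch.

    Eu        : (N, d) user embeddings at layer k.
    Ei_s      : dict {s: (M, d)} per-subgraph item embeddings at layer k.
    user_items: dict {u: list of items u interacted with}.
    item_users: dict {i: list of users who interacted with i}.
    group     : group[u] is the single subgraph index of user u.
    """
    M = len(item_users)
    norm = lambda u, i: 1.0 / (np.sqrt(len(user_items[u])) * np.sqrt(len(item_users[i])))

    Eu_next = np.zeros_like(Eu)
    Ei_s_next = {s: np.zeros((M, Eu.shape[1])) for s in Ei_s}
    for u, items in user_items.items():
        s = group[u]
        # A user aggregates the subgraph-specific embeddings of all its items.
        for i in items:
            Eu_next[u] += norm(u, i) * Ei_s[s][i]
    for i, users in item_users.items():
        # An item keeps one embedding per subgraph, aggregated only from
        # the users of that subgraph (the set N_i^s in Eq. 4).
        for u in users:
            Ei_s_next[group[u]][i] += norm(u, i) * Eu[u]
    # Eq. 5: the layer-k item embedding is the sum over its subgraphs.
    Ei_next = sum(Ei_s_next.values())
    return Eu_next, Ei_s_next, Ei_next
```

Note how messages from users in other subgraphs never enter Ei_s[s], which is exactly the filtering that keeps dissimilar users' embeddings apart.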
Because the main focus of this work is to study the effects of distinguishing user interests in the graph convolution of the GCN-based recommendation model, we adopt the inner product as in previous work [2, 33, 42] for fair comparisons in the empirical studies.

We implement our algorithm with the matrix-form propagation rule (see [33] for more details), by which we can simultaneously update the representations of all users and items in a rather efficient way. It is a commonly used approach to make graph convolution networks feasible for large-scale graphs [26, 33]. Let 𝑬^(0) be the representation matrix of user and item IDs, and 𝑬^(𝑘) the representations of users and items at the 𝑘-th layer. Similarly, 𝑬_𝒔^(𝑘) is defined as the representations of users and items at the 𝑘-th layer in subgraph 𝐺_𝑠. As shown in Fig. 2, the first-layer embedding propagation in our model can be described as follows:

$\mathbf{E}^{(1)} = \mathbf{L} \mathbf{E}^{(0)},$  (7)

where L is the Laplacian matrix of the user-item interaction graph. As we involve the subgraphs in the high-order graph convolution layers, the embedding propagation on subgraphs is formulated as follows:

$\mathbf{E}_s^{(k-1)} = \mathbf{L}_s \mathbf{E}_s^{(k-2)},$  (8)

where 𝑘 ⩾ 3 and L_𝑠 represents the Laplacian matrix of the subgraph 𝐺_𝑠. Then, the (𝑘−1)-th layer embeddings are propagated inside the subgraph to obtain the embeddings at the 𝑘-th layer:

$\mathbf{E}_s^{(k)} = \mathbf{L}_s \mathbf{E}_s^{(k-1)}.$  (9)

We aggregate the 𝑘-th layer embeddings of the different subgraphs to form the final 𝑘-th layer embeddings:

$\mathbf{E}^{(k)} = \sum_{s=1}^{N_s} \mathbf{E}_s^{(k)}.$  (10)
Lastly, we combine all the layers' embeddings to get the final representations of users and items, consistent with LightGCN [14]:

$\mathbf{E} = \alpha_0 \mathbf{E}^{(0)} + \alpha_1 \mathbf{E}^{(1)} + \cdots + \alpha_K \mathbf{E}^{(K)}.$  (11)

In this work, we target top-𝑛 recommendation, which aims to recommend a set of 𝑛 top-ranked items matching the target user's preference. Compared to rating prediction, this is a more practical task in real commercial systems [27]. Similar to other rank-oriented recommendation works [33, 42], we adopt the pairwise learning method for optimization. To perform pairwise learning, one needs to construct triplets {𝑢, 𝑖+, 𝑖−}, with an observed interaction between 𝑢 and 𝑖+ and an unobserved interaction between 𝑢 and 𝑖−. This method assumes that a positive item (i.e., 𝑖+) should rank higher than a negative item (i.e., 𝑖−). The objective function is formulated as:

$\arg\min \sum_{(u, i^+, i^-) \in \mathcal{O}} -\ln \phi(\hat{r}_{ui^+} - \hat{r}_{ui^-}) + \lambda \|\Theta\|^2,$  (12)

where O = {(𝑢, 𝑖+, 𝑖−) | (𝑢, 𝑖+) ∈ R+, (𝑢, 𝑖−) ∈ R−} denotes the training set; R+ indicates the observed interactions between user 𝑢 and item 𝑖+ in the training dataset, and R− is the sampled unobserved interaction set. λ and Θ represent the regularization weight and the parameters of the model, respectively. The L2 regularization is used to prevent overfitting. Mini-batch Adam [17] is adopted to optimize the prediction model and update the model parameters. Specifically, for a batch of randomly sampled triplets (𝑢, 𝑖+, 𝑖−) ∈ O, the representations of those users and items are first learned by the propagation rules, and then the model parameters are updated by using the gradients of the loss function.

In this section, we introduce our proposed subgraph generation module, which is designed to construct the subgraphs 𝐺_𝑠 with 𝑠 ∈ {1, · · · , 𝑁_𝑠} from a given input graph G.
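As a sketch, the pairwise objective of Eq. (12) can be written in numpy as follows, assuming φ is the sigmoid function (as is standard in BPR-style training); the function and variable names are our own:

```python
import numpy as np

def bpr_loss(E_u, E_pos, E_neg, params, reg=1e-4):
    """Pairwise objective of Eq. (12) for one batch of (u, i+, i-) triplets.

    E_u, E_pos, E_neg: (B, d) embeddings of users, positive and negative items.
    params: list of parameter arrays forming Theta (for L2 regularization).
    """
    # Inner-product preference scores (Eq. 6).
    r_pos = np.sum(E_u * E_pos, axis=1)
    r_neg = np.sum(E_u * E_neg, axis=1)
    # -ln sigmoid(r_ui+ - r_ui-), written stably as log(1 + exp(-x)).
    loss = np.sum(np.logaddexp(0.0, -(r_pos - r_neg)))
    # lambda * ||Theta||^2 regularization.
    loss += reg * sum(np.sum(p ** 2) for p in params)
    return loss
```

In practice this scalar would be minimized with mini-batch Adam, as described above; the gradients are obtained by the framework's automatic differentiation.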
Remember that the subgraphs are used to group users with common interests in our model. We formulate user grouping as a classification task, i.e., each user is classified into a group. Specifically, each user is represented by a feature vector, which is a fusion of the graph structure and the ID embedding:

$\mathbf{F}_u = \sigma(\mathbf{W}_1(\mathbf{e}_u^{(0)} + \mathbf{e}_u^{(1)}) + \mathbf{b}_1),$  (13)

where 𝑭_𝒖 is the user feature obtained via feature fusion; 𝒆_𝒖^(0) is the user ID embedding, and 𝒆_𝒖^(1) is the feature obtained by aggregating the local neighbors in the graph (i.e., the user embedding after the first-layer propagation). 𝑾_1 ∈ R^{𝑑×𝑑} and 𝒃_1 ∈ R^{1×𝑑} are the trainable weight matrix and bias vector of the fusion method, respectively. σ is the activation function; LeakyReLU [24] is adopted because it can encode both positive and small negative signals. To classify the users into different subgraphs, we cast the obtained user feature to a prediction vector with a 2-layer neural network:

$\mathbf{U}_h = \sigma(\mathbf{W}_2 \mathbf{F}_u + \mathbf{b}_2), \quad \mathbf{U}_o = \mathbf{W}_3 \mathbf{U}_h + \mathbf{b}_3,$  (14)

where 𝑼_𝒐 is the prediction vector; the position of the maximum value in 𝑼_𝒐 indicates which group/subgraph the user belongs to. 𝑾_2 ∈ R^{𝑑×𝑑}, 𝑾_3 ∈ R^{𝑑×𝑁_𝑠} and 𝒃_2 ∈ R^{1×𝑑}, 𝒃_3 ∈ R^{1×𝑁_𝑠} are the trainable weight matrices and bias vectors of the two layers, respectively. The dimension of the prediction vector is the same as the number of subgraphs, which is a pre-selected hyper-parameter.

Table 1: Basic statistics of the experimental datasets.
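The user-grouping classifier of Eqs. (13)–(14) can be sketched as a small forward pass. The weight shapes follow the text, while the parameter names and the LeakyReLU slope are illustrative assumptions rather than the authors' exact settings:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU keeps small negative signals, as noted for Eq. (13).
    return np.where(x > 0, x, slope * x)

def assign_subgraphs(E0_u, E1_u, W1, b1, W2, b2, W3, b3):
    """Subgraph generation module (Eqs. 13-14), a hedged sketch.

    E0_u: (N, d) user ID embeddings e_u^(0).
    E1_u: (N, d) first-layer user embeddings e_u^(1).
    W1, W2: (d, d); W3: (d, N_s); b1, b2: (1, d); b3: (1, N_s).
    """
    # Eq. 13: fuse the ID embedding with the graph-structure feature.
    F_u = leaky_relu((E0_u + E1_u) @ W1 + b1)
    # Eq. 14: two-layer prediction network over the fused feature.
    U_h = leaky_relu(F_u @ W2 + b2)
    U_o = U_h @ W3 + b3
    # Each user joins the subgraph with the maximum prediction score.
    return np.argmax(U_o, axis=1)
```

Given these assignments, each subgraph G_s is then formed by the users of group s together with all of their interacted items.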
EXPERIMENTS

To evaluate the effectiveness of IMP-GCN, we conducted experiments on three benchmark datasets: Amazon-Kindle Store, Amazon-Home&Kitchen, and Gowalla. The first two datasets are from the public Amazon review dataset (http://jmcauley.ucsd.edu/data/amazon), which has been widely used for recommendation evaluation in previous studies. The third dataset is a check-in dataset collected from Gowalla, where users share their locations by checking in. We followed the general setting in recommendation to filter out users and items with few interactions. For all the datasets, we used the 10-core setting, i.e., retaining only users and items with at least 10 interactions. The statistics of the three datasets are shown in Table 1. As can be seen, the datasets are of different sizes and sparsity levels, which is useful for analyzing the performance of our method and the competitors in different situations.

For each dataset, we randomly split it into training, validation, and testing sets with the ratio 80:10:10 for each user. The observed user-item interactions were treated as positive instances. For the methods which adopt the pairwise learning strategy, we randomly sampled a negative instance that the user did not consume before to pair with each positive instance. For each user in the test set, we treat all the items that the user did not interact with as negative items. Two widely used evaluation metrics for top-𝑛 recommendation are adopted in our evaluation: Recall and Normalized Discounted Cumulative Gain (NDCG) [13]. For each metric, the performance is computed based on the top-20 results. Notice that the reported results are the average values across all the testing users.

We implemented our model with TensorFlow and carefully tuned the key parameters. The embedding size is fixed to 64 for all models, and the embedding parameters are initialized with the Xavier method [39]. We optimized our method
[Figure 3: Results comparison between IMP-GCN and LightGCN at different layers on Kindle Store and Gowalla. IMP-GCN_2, IMP-GCN_3, and IMP-GCN_4 represent IMP-GCN with 2, 3, and 4 subgraphs, respectively.]

with Adam [17], using the default learning rate of 0.001 and the default mini-batch size of 1024 (on Gowalla, we increased the mini-batch size to 2048 for speed). The L2 regularization coefficient λ is searched over powers of 10. The early stopping and validation strategies are kept the same as those in LightGCN.

In this section, we first evaluate the performance of our IMP-GCN model when stacking different numbers of layers in graph convolution. This is to examine whether our interest-aware message-passing strategy can alleviate the over-smoothing problem. Next, we study the effects of the number of subgraphs on the performance of our model.
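For reference, the Recall@20 and NDCG@20 metrics used throughout the evaluation can be computed per user as below. This is a standard formulation with binary relevance; the exact variant used by the authors may differ slightly:

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, test_items, k=20):
    """Recall@k and NDCG@k for one user.

    ranked_items: item ids sorted by predicted score (candidates should
                  exclude the user's training items).
    test_items  : set of held-out positive items of this user.
    """
    top_k = ranked_items[:k]
    hits = [1.0 if i in test_items else 0.0 for i in top_k]
    recall = sum(hits) / len(test_items)
    # DCG with binary relevance, normalized by the ideal DCG.
    dcg = sum(h / np.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(test_items), k)))
    return recall, dcg / idcg
```

The reported numbers are then the averages of these per-user values over all testing users.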
To investigate the effectiveness of IMP-GCN with deeper structures, we increased the model depth and performed a detailed comparison with LightGCN. Since the adopted message-passing strategy is the same as that of LightGCN in the first-order convolution layer, we increased the layer number from 2 to 7. The experimental results are shown in Fig. 3, in which IMP-GCN_2, IMP-GCN_3, and IMP-GCN_4 indicate the model with 2, 3, and 4 subgraphs, respectively. We omitted the results on Home&Kitchen for space limitation, because they show exactly the same trend. From the results, we have some interesting observations.

Firstly, the proposed IMP-GCN outperforms LightGCN consistently when stacking more than 2 or 3 layers on both datasets. This indicates that our model can learn better embeddings with the interest-aware message-passing strategy. Secondly, the peak performance of LightGCN is obtained when stacking 3 or 4 layers, and adding more layers causes dramatic performance degradation, indicating that it suffers from the over-smoothing problem in deep structures. In contrast, IMP-GCN continues to achieve better performance with deeper structures (notice that when stacking more than 7 layers, a node already aggregates information from almost all the nodes; see Fig. 1). The results demonstrate the capability of our model in alleviating the over-smoothing problem. Moreover, they also 1) justify our claim that exploiting information from all nodes indiscriminately causes the over-smoothing in GCN-based recommendation models, and 2) validate the effectiveness of our subgraph generation algorithm in classifying users with common interests.

[Figure 4: Statistics of Recall and Coverage Ratio on Kindle Store in three subgraphs. (a) Recall on Kindle Store; (b) Coverage Ratio on Kindle Store.]
The performance of IMP-GCN with different numbers of subgraphs (i.e., {2, 3, 4}) can also be observed in Fig. 3. From the results, we can see that: (1) IMP-GCN with 2 subgraphs obtains the best results when stacking no more than 3 layers. This is because a node in the subgraphs of IMP-GCN_2 can reach more nodes within a short distance than one in IMP-GCN_3 or IMP-GCN_4 in the embedding propagation operation. (2) When stacking more than 3 layers, IMP-GCN_3 performs the best. After 3 layers of graph convolution, the number of involved nodes increases sharply in embedding propagation (see the examples in Fig. 1). On average, each node in IMP-GCN_2 should reach more nodes than one in IMP-GCN_3 and IMP-GCN_4; however, the performance improvement of IMP-GCN_2 is smaller, or even negative (on Kindle Store), compared to that of IMP-GCN_3 and IMP-GCN_4. This indicates that there is still noisy information in embedding propagation when discriminating user interests at a coarse level (i.e., 2 subgraphs), negatively impacting the performance. Note that IMP-GCN_2 can still benefit from high-order neighbors. (3) With more subgraphs, on the one hand, IMP-GCN can distinguish users with similar interests at a finer level and thus can better distill information from high-order neighbors; on the other hand, it also cuts more connections to other nodes, especially those within a short distance, which provide more valuable information in embedding learning. As a result, when stacking more layers, the performance of IMP-GCN_4 is only comparable to that of IMP-GCN_3. Therefore, there is a trade-off in selecting the number of subgraphs. We further studied the effects of subgraphs by analyzing the average coverage ratio of each node and the corresponding performance based on LightGCN and our IMP-GCN model.
Due to space limitation, we only provide the results on Kindle Store and omit the performance w.r.t. NDCG, which shows a similar trend as Recall. In this experiment, we used LightGCN with 4 layers and IMP-GCN with 3 subgraphs and 6 layers, which are their optimal settings on Kindle Store. The average recall and average coverage ratio of each user in a subgraph based on LightGCN and IMP-GCN are shown in Fig. 4(a) and Fig. 4(b), respectively. Notably, by grouping users with similar interests into subgraphs so that information only propagates inside subgraphs, IMP-GCN can benefit from more layers of graph convolution and distill positive information from high-order neighbors. In contrast, LightGCN is limited by the negative information from high-order neighbors and can only gain improvements up to 4 layers. Comparing the performance of different subgraphs, we can see that with a higher coverage ratio, the performance of IMP-GCN increases clearly.

Another interesting finding is that, by stacking 6 layers, a user node in a subgraph connects to almost all the other nodes in the whole graph. This indicates that the users in a subgraph interact with almost all the items in the graph (otherwise, the coverage ratio could not be that high). More importantly, IMP-GCN can still achieve improvement with such high coverage without over-smoothing. This indicates that the embeddings of items learned in a subgraph contribute to the embedding learning of the users in this subgraph, and the information distilled in a subgraph during graph convolution is useful for the embedding learning of all the nodes in this subgraph. This demonstrates the effectiveness of our interest-aware message-passing strategy and the subgraph generation algorithm.

To demonstrate the effectiveness, we compared our proposed method with several recently proposed competitive methods, including:
• NeuMF [15]: It is a state-of-the-art neural collaborative filtering method.
This method uses multiple hidden layers above the element-wise product and concatenation of user and item embeddings to capture their non-linear feature interactions. • HOP-Rec [42]:
This method exploits the high-order user-item interactions by random walks to enrich the original training data. In experiments, we used the codes released by the authors. • CSE [2]:
This recently proposed graph-based model also exploits the high-order proximity in the user-item bipartite graph. Different from HOP-Rec, this method explores the user-user and item-item relations by random walks to improve the performance. We used the codes released by the authors (the same link as HOP-Rec): https://github.com/cnclabs/smore. Table 2: Performance of our model and the competitors over three datasets. Note that the values are reported as percentages with '%' omitted.
Datasets        Kindle Store       Home&Kitchen       Gowalla
Metrics         Recall    NDCG     Recall    NDCG     Recall    NDCG
NeuMF           4.96      2.06     1.34      0.62     12.96     11.21
CSE             7.65      4.54     1.93      0.91     13.85     11.51
HOP-Rec         7.96      4.58     1.98      0.94     14.11     12.70
GCMC            7.93      4.55     1.42      0.64     14.03     11.68
NGCF            8.25      5.09     2.14      0.96     15.62     13.35
LightGCN        –         –        –         –        –         –
IMP-GCN         –         –        –         –        –         –
Improv.         6.46%     7.85%    6.27%     7.19%    4.07%     3.66%
The symbol * denotes that the improvement is statistically significant based on a two-tailed paired t-test. • GCMC [29]:
This method applies the GCN techniques on the user-item bipartite graph and employs one convolutional layer to exploit the direct connections between users and items. • NGCF [33]: This method explicitly encodes the collaborative signal in the form of high-order connectivities by performing embedding propagation on the user-item bipartite graph. • LightGCN [14]: It is a simplified version of NGCF that removes the feature transformation and nonlinear activation modules. It makes GCN-based methods more concise and appropriate for recommendation and achieves state-of-the-art performance. For fair comparisons, all the methods are optimized with the same pairwise learning strategy. We put great effort into tuning these methods on the validation dataset and report their best performance.
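All methods above are trained with the same pairwise objective (BPR [27]). A minimal NumPy sketch of that loss; the batch shapes and the regularization weight `reg` are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

def bpr_loss(u, pos, neg, reg=1e-4):
    """Mean BPR loss over a batch of (user, positive item, negative item)
    embeddings: -log sigmoid(score_pos - score_neg) plus L2 regularization."""
    diff = np.sum(u * pos, axis=1) - np.sum(u * neg, axis=1)
    log_sigmoid = -np.log1p(np.exp(-diff))   # log sigmoid(diff)
    l2 = (np.sum(u ** 2) + np.sum(pos ** 2) + np.sum(neg ** 2)) / len(u)
    return -np.mean(log_sigmoid) + reg * l2

# Toy batch of 8 triples with 16-dimensional embeddings.
rng = np.random.default_rng(0)
u, pos, neg = (rng.normal(size=(8, 16)) for _ in range(3))
print(bpr_loss(u, pos, neg))
```

In the GCN-based models compared here, `u`, `pos`, and `neg` would be the final user and item representations produced by the graph convolution layers.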
Table 2 shows the performance comparison results. The best and second-best results are highlighted in bold. From the results, we have the following observations. The performance of NeuMF is relatively poor as it does not explicitly leverage the high-order connectivities between users and items, resulting in suboptimal performance. Among the graph-based methods, CSE makes use of the implicit associations of user-user and item-item similarities via high-order neighborhood proximity by performing random walks on the user-item interaction graph. GCMC obtains better performance than CSE, demonstrating the advantage of GCN-based approaches, which can exploit graph structure information. However, it does not perform well on
Home&Kitchen because the useful information in neighbors cannot be efficiently aggregated. HOP-Rec outperforms the above methods on the three datasets because it samples user-item interactions from high-order neighbors to enrich the training data. NGCF achieves consistently much better performance than the above baselines. This is because it adopts GCN techniques to explicitly and directly exploit the high-order connectivities in embedding learning. In contrast, the GCMC method only utilizes the first-order neighbors for representation learning; HOP-Rec and CSE leverage the high-order neighbors to enrich the training data rather than using them in the embedding function for direct representation learning. This demonstrates the powerful representation learning capability of GCN and the importance of utilizing high-order information directly in representation learning.
Table 3: Performance of our model and its variants over three datasets. Note that the values are reported as percentages with '%' omitted.
Datasets        Kindle Store       Home&Kitchen       Gowalla
Metrics         Recall    NDCG     Recall    NDCG     Recall    NDCG
IMP-GCN         –         –        –         –        –         –
IMP-GCN_s       –         –        –         –        –         –
IMP-GCN_f       –         –        –         –        –         –
Similar to the results reported in [14], LightGCN achieves substantial improvements over NGCF by simplifying it with the removal of two common designs in GCN. IMP-GCN outperforms all the baselines consistently over all the datasets. In particular, compared to the strongest baseline, IMP-GCN reaches relative improvements over LightGCN in terms of NDCG@20 of 7.85%, 7.19%, and 3.66% on Kindle Store, Home&Kitchen, and Gowalla, respectively. The great improvement over LightGCN demonstrates the importance of distinguishing nodes in high-order neighbors in the graph convolution operation, as well as the effectiveness of our proposed interest-aware message-passing strategy.
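The interest-aware message-passing strategy can be illustrated with a small sketch. Assuming a binary user-item interaction matrix A and a hard assignment of users to subgraphs (the symbol names are ours, and the model's symmetric normalization coefficients are omitted for brevity), one propagation step for item embeddings keeps a separate component per subgraph, each aggregated only from that subgraph's users; summing the components recovers the full aggregation:

```python
import numpy as np

def propagate_items(A, user_emb, user_group, num_groups):
    """One message-passing step for item embeddings (normalization omitted).

    Full propagation aggregates every interacting user; the interest-aware
    variant keeps one component per user subgraph, aggregated only from
    that subgraph's users.
    """
    full = A.T @ user_emb                                  # (items, d): all users
    per_group = np.stack([
        A.T @ (user_emb * (user_group == g)[:, None])      # mask out other subgraphs
        for g in range(num_groups)
    ])                                                     # (groups, items, d)
    return full, per_group

# 3 users, 2 items; users 0 and 1 form one subgraph, user 2 another.
A = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
emb = np.arange(6, dtype=float).reshape(3, 2)
full, per_group = propagate_items(A, emb, np.array([0, 0, 1]), 2)
print(np.allclose(full, per_group.sum(axis=0)))  # True
```

Restricting higher-order steps to a single subgraph's component is what blocks the propagation of information from users with dissimilar interests.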
In this section, we examine the contribution of different components of our model to the final performance by comparing IMP-GCN with the following two variants: • IMP-GCN_s: This variant removes the graph structure information from the subgraph generation module (i.e., removing the graph-embedding term 𝒆_u in Eq. 13). • IMP-GCN_f: In this variant, the first-order propagation is also performed inside each subgraph (i.e., the equation for 𝒆_i in Eq. 3 is replaced with ∑_{s∈S} 𝒆_{i_s}). The results of the two variants and IMP-GCN are reported in Table 3, in which the best results are highlighted in bold. IMP-GCN outperforms IMP-GCN_s over all the datasets, which indicates the effectiveness of employing graph structure information in the subgraph generation module. As expected, IMP-GCN_s obtains much better performance than IMP-GCN_f, because the first-order neighbors (i.e., the direct interactions between users and items) contribute the most direct information for user and item embedding learning in the collaborative filtering process. The results also demonstrate the soundness of our IMP-GCN design. As one of the most important information retrieval techniques, recommendation has made tremendous progress in the past decades. Among the various recommendation approaches, model-based collaborative filtering (CF) [5, 6, 14–16, 19, 20, 27, 32, 33] has achieved great success and become the mainstream recommendation technique. CF learns user and item embeddings by reconstructing the user-item interaction matrix. Earlier research efforts mainly focused on shallow models, such as BPR [27], CML [16], and matrix factorization (MF) [19]. Their success motivated the development of various variants that leverage additional information (e.g., reviews [25], images [12], knowledge graphs [30–32]) to deal with different tasks (e.g., context-aware [22], session-based [23]).
With the rise of deep learning, it has also been widely applied in recommendation and exhibits great potential, by either enhancing the user/item embedding learning or introducing non-linearity into the interaction function, promoting another peak in the development of recommendation techniques. Many DL-based recommendation models have been proposed, such as NeuMF [15] and Wide&Deep [4], achieving better performance than traditional models. Another research line is graph-based recommendation, which can explicitly exploit the high-order proximity between users and items. Early approaches infer indirect preferences by random walks on the graph to provide recommendations [7, 10, 11]. Recently proposed approaches exploit the user-item bipartite graph to enrich the user-item interactions [42, 44] and explore other types of collaborative relations, such as user-user and item-item similarities [2, 44]. For example, HOP-Rec [42] samples positive user-item interactions via random walks to enrich the training data. WalkRanker [44] and CSE [2] perform random walks to explore the high-order proximity in user-user and item-item relations. As those methods rely on random walks to sample new interactions for model training, their performance heavily depends on the quality of the interactions generated by the random walks. As a result, these methods need careful selection and tuning efforts. In recent years, Graph Convolution Networks (GCNs) have attracted increasing attention in recommendation due to their powerful capability of representation learning from non-Euclidean structures [8, 9, 14, 21, 29, 32–36, 43, 45]. Accordingly, many GCN-based recommendation models have been developed.
For example, GC-MC [29] employs one convolution layer to exploit the direct connections between users and items; PinSage [43] combines random walks with multiple graph convolution layers on the item-item graph for Pinterest image recommendation; MEIRec [8] utilizes metapath-guided neighbors to exploit rich structure information for intent recommendation; NGCF [33] exploits high-order proximity by propagating embeddings on the user-item interaction graph; instead of implicitly capturing the high-order connectivity through embedding propagation, SMOG-CF [45] directly captures the high-order connectivity between neighboring nodes at any order. Multi-GCCF [28] explicitly incorporates the user-user and item-item graphs, which are built upon the user-item bipartite graph, into the embedding learning process. Inspired by the study of simplifying GCN [37], researchers have also revisited the complex designs in GCN-based recommendation models. He et al. [14] pointed out that two common designs, feature transformation and nonlinear activation, have no positive effects on the final performance, and proposed LightGCN, which substantially improves the performance over NGCF. Meanwhile, Chen et al. [3] also proposed to remove the nonlinearity in the network and introduced a residual network to alleviate the over-smoothing problem in existing GCN-based recommendation models. In this paper, we move a step further along this research line. We claim that indiscriminately exploiting the high-order neighboring nodes is also an important cause of the over-smoothing problem in GCN-based recommendation models. A typical example is that two users with contradictory interests can also be connected via a 𝑘-order path in the user-item interaction graph.
To tackle this problem, we propose an interest-aware message-passing strategy that makes the embedding propagation happen only inside a subgraph of users with similar interests. In this work, we argued that exploiting high-order nodes indiscriminately introduces negative information into the embedding propagation of GCN-based recommendation models, causing performance degradation when stacking more layers. We presented the IMP-GCN model, which learns user and item embeddings by performing high-order graph convolution inside subgraphs. The subgraphs are formed by a designed subgraph generation algorithm that groups users with similar interests and their interacted items into the same subgraph. In IMP-GCN, the embedding of a node learned in a subgraph only contributes to the embedding learning of other nodes in this subgraph. In this way, IMP-GCN can effectively avoid taking noisy information into the embedding learning. Experiments on large-scale real-world datasets demonstrate that IMP-GCN can gain improvements by stacking more layers to exploit information from higher-order neighbors, and achieves state-of-the-art performance. The advantages of IMP-GCN indicate the importance of distinguishing high-order neighbors in tackling the over-smoothing problem in GCN models. We believe the insights of this study can shed light on the further development of graph-based recommendation models.
This work is supported by the National Natural Science Foundation of China, No.: 61902223, No.: U1936203; the Innovation Teams in Colleges and Universities in Jinan, No.: 2018GXRC014; the Shandong Provincial Natural Science Foundation, No.: ZR2019JQ23; and the Young Creative Team in Universities of Shandong Province, No.: 2020KJN012.
REFERENCES
[1] Robert M. Bell and Yehuda Koren. 2007. Lessons from the Netflix prize challenge. In SIGKDD Explorations. ACM, 75–79.
[2] Chih-Ming Chen, Chuan-Ju Wang, Ming-Feng Tsai, and Yi-Hsuan Yang. 2019. Collaborative Similarity Embedding for Recommender Systems. In WWW. ACM, 2637–2643.
[3] Lei Chen, Le Wu, Richang Hong, Kun Zhang, and Meng Wang. 2020. Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach. In The Thirty-Fourth AAAI Conference on Artificial Intelligence. AAAI Press, 27–34.
[4] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
[5] Zhiyong Cheng, Xiaojun Chang, Lei Zhu, Rose C. Kanjirathinkal, and Mohan Kankanhalli. 2019. MMALFM: Explainable recommendation by leveraging reviews and images. TOIS 37, 2 (2019), 16.
[6] Zhiyong Cheng, Ying Ding, Lei Zhu, and Mohan Kankanhalli. 2018. Aspect-aware latent factor model: Rating prediction with ratings and reviews. In Proceedings of the 27th International Conference on World Wide Web. IW3C2, 639–648.
[7] Fabian Christoffel, Bibek Paudel, Chris Newell, and Abraham Bernstein. 2015. Blockbusters and Wallflowers: Accurate, Diverse, and Scalable Recommendations with Random Walks. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 163–170.
[8] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-Guided Heterogeneous Graph Neural Network for Intent Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2478–2486.
[9] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Yihong Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. In Proceedings of the 28th International Conference on World Wide Web. IW3C2, 417–426.
[10] François Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. 2007. Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng.
[11] In Proceedings of the 20th International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2766–2771.
[12] Ruining He and Julian McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 144–150.
[13] Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. TriRank: Review-aware Explainable Recommendation by Modeling Aspects. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 1661–1670.
[14] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 639–648.
[15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. IW3C2, 173–182.
[16] Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web. IW3C2, 193–201.
[17] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
[18] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR. OpenReview.net.
[19] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. In IEEE Computer, Vol. 42. 42–49.
[20] Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie, and Mohan Kankanhalli. 2019. User Diverse Preference Modeling by Multimodal Attentive Metric Learning. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 1526–1534.
[21] Fan Liu, Zhiyong Cheng, Lei Zhu, Chenghao Liu, and Liqiang Nie. 2020. An Attribute-aware Attentive GCN Model for Recommendation. IEEE Transactions on Knowledge and Data Engineering (2020), 1–12.
[22] Xin Liu and Wei Wu. 2015. Learning Context-Aware Latent Representations for Context-Aware Collaborative Filtering. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 887–890.
[23] Yuanxing Liu, Zhaochun Ren, Wei-Nan Zhang, Wanxiang Che, Ting Liu, and Dawei Yin. 2020. Keywords Generation Improves E-Commerce Session-Based Recommendation. In Proceedings of The Web Conference 2020. Association for Computing Machinery, 1604–1614.
[24] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML Workshop on Deep Learning for Audio, Speech and Language Processing.
[25] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 165–172.
[26] Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. 2018. DeepInf: Social Influence Prediction with Deep Learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2110–2119.
[27] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
[28] Jianing Sun, Yingxue Zhang, Chen Ma, Mark Coates, Huifeng Guo, Ruiming Tang, and Xiuqiang He. 2019. Multi-graph convolution collaborative filtering. In Proceedings of the IEEE International Conference on Data Mining. 1306–1311.
[29] Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2018. Graph Convolutional Matrix Completion. In ACM SIGKDD: Deep Learning Day. ACM.
[30] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In CIKM. ACM, 417–426.
[31] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Exploring High-Order User Preference on the Knowledge Graph for Recommender Systems. ACM Trans. Inf. Syst. 37, 3 (2019), 32:1–32:26.
[32] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 950–958.
[33] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 165–174.
[34] Yinwei Wei, Zhiyong Cheng, Xuzheng Yu, Zhou Zhao, Lei Zhu, and Liqiang Nie. 2019. Personalized Hashtag Recommendation for Micro-videos. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 1446–1454.
[35] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, and Tat-Seng Chua. 2020. Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, 3541–3549.
[36] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 1437–1445.
[37] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying Graph Convolutional Networks. PMLR, 6861–6871.
[38] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining. ACM, 153–162.
[39] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. JMLR, 249–256.
[40] Xin Xin, Alexandros Karatzoglou, I. Arapakis, and J. Jose. 2020. Graph Highway Networks. ArXiv abs/2004.04635 (2020).
[41] HongJian Xue, XinYu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep matrix factorization models for recommender systems. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 3203–3209.
[42] Jheng-Hong Yang, Chih-Ming Chen, Chuan-Ju Wang, and Ming-Feng Tsai. 2018. HOP-rec: high-order proximity for implicit recommendation. In RecSys. ACM, 140–144.
[43] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 974–983.
[44] Lu Yu, Chuxu Zhang, Shichao Pei, Guolei Sun, and Xiangliang Zhang. 2018. WalkRanker: A Unified Pairwise Ranking Model With Multiple Relations for Item Recommendation. In IJCAI. AAAI Press, 2596–2603.
[45] Hengrui Zhang and Julian McAuley. 2020. Stacked Mixed-Order Graph Convolutional Networks for Collaborative Filtering. In