User Embedding based Neighborhood Aggregation Method for Inductive Recommendation
Rahul Ragesh, Sundararajan Sellamanickam, Vijay Lingam, Arun Iyer, Ramakrishna Bairi
t-rarage,ssrajan,t-vili,ariy,[email protected]
Microsoft Research India, Bengaluru, India
ABSTRACT
We consider the problem of learning latent features (aka embedding) for users and items in a recommendation setting. Given only a user-item interaction graph, the goal is to recommend items for each user. Traditional approaches employ matrix factorization-based collaborative filtering methods. Recent methods using graph convolutional networks (e.g., LightGCN) achieve state-of-the-art performance. They learn both user and item embedding. One major drawback of most existing methods is that they are not inductive; they do not generalize for users and items unseen during training. Besides, existing network models are quite complex, difficult to train and scale. Motivated by LightGCN, we propose a graph convolutional network modeling approach for collaborative filtering (CF-GCN). We solely learn user embedding and derive item embedding using a light variant (CF-LGCN-U) performing neighborhood aggregation, making it scalable due to reduced model complexity. CF-LGCN-U models naturally possess the inductive capability for new items, and we propose a simple solution to generalize for new users. We show how the proposed models are related to LightGCN. As a by-product, we suggest a simple solution to make LightGCN inductive. We perform comprehensive experiments on several benchmark datasets and demonstrate the capabilities of the proposed approach. Experimental results show that similar or better generalization performance is achievable than the state-of-the-art methods in both transductive and inductive settings.
CCS CONCEPTS
• Computing methodologies → Neural networks; Learning latent representations.

KEYWORDS
recommendation, graph convolutional networks, heterogeneous networks, embeddings
1 INTRODUCTION
Recommender systems design continues to draw the attention of researchers, as new challenges are to be addressed, with more demanding requirements coming due to several factors: (1) users (e.g., personalized recommendation with high quality), (2) problem scale (e.g., number of users and items, rates at which they grow), (3) information available (e.g., history of user-item interactions, the volume of data, lack of side-information) for learning models [13, 21], and (4) system resources with constraints (e.g., storage cost and inference speed). Since recommender systems provide personalized recommendation by design, the development of collaborative filtering (CF) methods that make use of each user's past item interactions to build a model has drawn immense attention for more than a decade. CF methods' advantage is that they do not require other domain knowledge. Several important contributions [4, 6, 33] have been made along dimensions including new modeling approaches [13-15, 20, 21], optimization [27, 29, 34, 42], and scalability [8, 17, 24, 43].

CF methods require only a user-item interaction graph to build models. Neighborhood and latent factor modeling methods are two important classes of CF methods [21]. Neighborhood-based methods essentially work by discovering/computing relations between entities of the same type (e.g., item-item, user-user) and use this information to recommend items for users [16]. Our interest lies in learning latent features (a.k.a. embedding) for users and items. This way, user embedding can be directly compared with item embedding to score each user-item pair. Matrix factorization methods, a popular class of latent factor modeling methods, include Singular Value Decomposition [9], SVD++ [21], Alternating Least Squares (ALS) [34], and Factorization Machines (FM) [28].

* Equal Contribution.
With the advent of neural modeling and deep learning methods, building recommender models using deep neural networks (DNN) and graph neural networks (GNN) has been surging. Neural models are very powerful and deliver high-quality recommendations [5, 12, 14, 15]. However, they come with their own challenges: high model complexity, training and inference costs, ability to work only in the limited transductive setting, etc. See [7, 46] for more details.

Our focus in this work is to develop CF models using graph neural networks [2, 13, 19, 25, 38] for learning user and item embedding. Graph convolutional networks (GCNs) have been used successfully in applications such as node classification, link prediction, and recommendation. Of particular interest here is to learn GCN models for collaborative filtering when there is no side-information (e.g., node features, knowledge graphs). Several GCN model-based CF solutions [13, 25, 38] have been proposed recently. However, all of them have some limitations (e.g., high training cost, inability to handle large-scale data, only transductive). We have two important requirements: (1) scale well, in particular when the number of items is large and significantly higher than the number of users; this situation is encountered in many real-world applications, and (2) generalize well on users and items unseen during training, i.e., we need the model to be inductive.

Our work is inspired by a very recently proposed GCN model, LightGCN [13]. This important study investigated different aspects of GCNs for CF and proposed a very attractive simple model that delivered state-of-the-art performance on several benchmark datasets. However, it lacks the two important capabilities mentioned above.
For example, LightGCN learns embedding for both users and items. For reasons stated earlier, this model does not scale well, as the number of model parameters depends on the number of items. Therefore, it is expensive to train; also, it adds to model storage cost. Furthermore, LightGCN is transductive, as latent features are not available for new users and items during inference. We address these two important issues in this work, thereby expanding the capabilities of LightGCN.

A key contribution is the proposal of a novel graph convolutional network modeling approach for collaborative filtering (CF-GCN), and we investigate several variants within this important class of CF-GCN models. The most important one is the CF-GCN-U model, which learns only user embedding, and its LightGCN variant, CF-LGCN-U (i.e., using only the most important function, neighborhood aggregation). Since CF-LGCN-U models learn only user embedding, they can perform inductive recommendation with new items. However, this still leaves open the question: how to infer embedding for new users? We suggest a simple but effective solution that answers this question. As a by-product, we also suggest a simple method to make LightGCN inductive. The CF-LGCN-U modeling approach has several advantages. (1) The CF-LGCN-U model has inductive recommendation capability, as it generalizes for users and items unseen during training. (2) It scales well, as the model complexity depends only on the number of users. Therefore, it is very useful in applications where the growth rate of the number of items is relatively high. (3) Keeping both inductive and scalability advantages intact, it can achieve comparable or better performance than more complex models.

We suggest a twin CF-LGCN-U architecture to increase the expressive power of CF-LGCN-U models by learning two sets of user embedding. This helps to get improved performance in some applications, while keeping the model complexity advantage (with dependency only on the number of users).
Furthermore, we show how the CF-LGCN-U model and its counterpart (the CF-LGCN-E model that learns only item embedding) are related to LightGCN. Our basic analysis reveals how training the CF-LGCN-U and CF-LGCN-E models has the interpretation of learning neighborhood aggregation functions that make use of user-user and item-item similarities, like neighborhood-based CF methods.

We conduct comprehensive experiments on several benchmark datasets in both transductive and inductive settings. First, we show that the proposed CF-LGCN-U models achieve comparable or better performance compared to LightGCN and several baselines on benchmark datasets in the transductive setting. Importantly, this is achieved with reduced model complexity, highlighting that further simplification of LightGCN is possible, making it more scalable. Next, we conduct an experiment to demonstrate the inductive recommendation capability of CF-LGCN-U models. Our experimental results show that the performance in the inductive setting is very close to that achievable in the transductive setting.

In Section 2, we introduce notation and the problem setting. Background and motivation for our solution are presented in Section 3. Our proposed CF-GCN architecture and its variants are detailed in Section 4. We present our experimental setting and results in Section 5. This is followed by discussion, suggestions for future work, and related work in Sections 5 and 6. We conclude with important highlights and observations in Section 7.
2 NOTATION AND PROBLEM SETTING
Notation.
We use R ∈ {0, 1}^{m×n} to denote the user-item interaction graph, where m and n denote the number of users and items respectively. Let U ∈ ℝ^{m×d} and E ∈ ℝ^{n×d} denote the user and item embedding matrices. We assume that the embedding dimension (d) is the same for users and items. We use a superscript to indicate layer output embedding (e.g., U^(l) and E^(l) for the l-th layer embedding output of a network) explicitly, wherever needed. Lower-case letters and subscripts are used to refer to embedding vectors (row vectors of embedding matrices); for example, u_j^(l) and e_k^(l) would denote the embedding vectors of the j-th user and k-th item at the l-th layer output.

Problem Setting.
We are given only the user-item interaction graph, R. We assume that no side information (e.g., additional graphs encoding item-item or user-user relations, or user or item features) is available. The goal is to learn latent features for users (U) and items (E) such that user-specific relevant recommendations can be made by ranking items using scores computed from embedding vectors (e.g., the inner product of user and item embedding vectors, Z = UE^T).

Most collaborative filtering methods cannot make recommendations for users/items unseen during training because they are transductive. Therefore, their utility is limited. We are interested in designing a graph embedding based inductive recommendation model, where we require the model to have the ability to generalize for users/items unseen during training. We assume that some interactions are available for new users/items during inference. Thus, we require an inductive recommendation modeling solution that can infer embedding vectors for new users and items. Moreover, in many practical applications, the number of items, n, grows much faster than the number of users, m. Therefore, we aim at developing models with reduced model complexity, independent of the number of items.

3 BACKGROUND AND MOTIVATION
The simplicity and success of the recently proposed LightGCN model [13] for collaborative filtering inspire our work. LightGCN learns latent features (a.k.a. embedding) for users and items using a light-weight graph convolutional neural network. We present some background on graph convolutional networks followed by LightGCN. We then make some observations to motivate our work.
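As a concrete illustration of the scoring step described above, here is a minimal NumPy sketch; the toy sizes and random embeddings are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 6, 8                      # users, items, embedding dimension
U = rng.standard_normal((m, d))        # user embedding matrix
E = rng.standard_normal((n, d))        # item embedding matrix

# Score every user-item pair by the inner product: Z = U E^T
Z = U @ E.T                            # shape (m, n)

# For each user, rank items by score (highest first)
ranked = np.argsort(-Z, axis=1)
```

With learned (rather than random) embeddings, the top-ranked columns of `ranked` would form each user's recommendation list.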
Graph Convolutional Networks.
A graph convolutional network [19] is composed of graph convolutional layers, with each layer performing three basic operations: neighborhood aggregation, feature transformation, and non-linear activation. In a GCN, the l-th layer function is defined as:

H^(l+1) = σ(ÃH^(l)W^(l)).

It takes the previous layer output (H^(l)) as input, transforms it using the weight matrix (W^(l)), performs aggregation using an adjacency matrix (A), and produces the output (H^(l+1)) via a non-linear activation function σ(·) (e.g., ReLU or sigmoid). Kipf and Welling [19] proposed to use a normalized adjacency matrix defined as: Ã = D^(-1/2)(I + A)D^(-1/2), where D is the diagonal in-degree matrix of (I + A) and I helps to include the node's own representation in the neighborhood aggregation. In node classification tasks, H^(0) comprises the node features, and the model weights {W^(l) : l = 0, . . . , L − 1} are learned by optimizing an objective function (e.g., cross-entropy loss) using labeled data.

Recommendation Setting.
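The three per-layer operations can be sketched in a few lines of NumPy; the function names and the toy random graph below are ours, and the dense matrices are for illustration only (real implementations use sparse operations).

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops: D^{-1/2} (I + A) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops (the I term)
    d = A_hat.sum(axis=1)                    # node degrees of (I + A)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """One GCN layer: neighborhood aggregation, transformation, ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric graph, no self-loops
H0 = rng.standard_normal((5, 8))             # input node features H^(0)
W0 = rng.standard_normal((8, 4))             # layer weight matrix W^(0)
H1 = gcn_layer(normalize_adj(A), H0, W0)     # layer output H^(1)
```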
In the basic model with GCN, we use

A_CF = [ 0  R ; R^T  0 ]

as the graph, and H^(l) = [ U^(l) ; E^(l) ]. Note that user and item embedding vectors are available at each layer output and propagated through multiple layers. Then, user and item embedding (starting with 1-hot representation at the input) are learned with the GCN model weights using a loss function suitable for collaborative filtering. Following LightGCN [13], we use the Bayesian personalized ranking (BPR) loss function defined as:

L = Σ_{i=1}^{m} Σ_{j ∈ N_e(u_i)} Σ_{k ∈ NS_e(u_i)} g(z_{i,j}, z_{i,k}; H) + λ||H||²   (1)

where N_e(u_i) and NS_e(u_i) denote the item neighborhood (with interactions) and the negative samples (items) of user u_i. The function g(·) measures the degree of violation of the positive pairwise score (z_{i,j}) being higher than the negative pairwise score (z_{i,k}), and we use g(z_{i,j}, z_{i,k}) = −ln σ(z_{i,j} − z_{i,k}) in our experiments. λ is the weight regularization constant and ||H||² denotes the squared L2-norm of the embedding and layer weight vectors. We give more details of model training in the experiment section.

LightGCN.
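The pairwise term g(·) of the BPR loss is easy to sketch; the function name is ours, and only the single positive/negative pair term (not the full triple sum or the regularizer) is shown.

```python
import numpy as np

def bpr_pair_loss(z_pos, z_neg):
    """g(z_ij, z_ik) = -ln sigmoid(z_ij - z_ik): small when the positive
    item outscores the negative item by a large margin, large otherwise."""
    return -np.log(1.0 / (1.0 + np.exp(-(z_pos - z_neg))))

# A well-separated pair incurs a near-zero loss; a reversed pair a large one.
small = bpr_pair_loss(5.0, -5.0)
large = bpr_pair_loss(-5.0, 5.0)
```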
LightGCN draws upon the idea of propagating embedding from prior work on neural graph collaborative filtering (NGCF) [38]. NGCF uses the basic GCN recommendation model explained above, except that an additional term involving the element-wise (Hadamard) product of user and item embedding is part of the neighborhood aggregation. See [38] for more details. He et al. [13] empirically showed that the feature transformation and non-linear activation operations are unnecessary in NGCF and can even be harmful to performance. They proposed a simpler GCN architecture (LightGCN) keeping only the neighborhood aggregation function, with the result: H^(l+1) = ÃH^(l), l = 0, . . . , L − 1. LightGCN fuses these multi-layer outputs as:

H = Σ_{l=0}^{L} α_l H^(l) = Σ_{l=0}^{L} α_l Ã^l H^(0)   (2)

where {α_l : l = 0, . . . , L} are fusion hyperparameters. It is worth noting that LightGCN does not require I while using Ã, as defined by Kipf and Welling [19]. Also, note that since R is a bipartite graph, user- and item-specific in-degree matrices (D_u and D_e) are used in the pre/post multiplication of A. See LightGCN [13] for more details. In empirical studies, they found that tuning the fusion hyperparameters is not necessary and that setting equal weights (i.e., a simple average) works well. From (2), we see that the only learnable parameters are H^(0) = [ U^(0) ; E^(0) ]. Furthermore, higher-order neighborhood aggregation happens through the effective propagation matrix, Σ_{l=0}^{L} α_l Ã^l. Experimental results [13] showed state-of-the-art performance achieved by LightGCN on several benchmark datasets.

Motivation.
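The fusion rule (2) reduces to repeated multiplication by the propagation matrix plus a weighted sum. A minimal sketch (function name ours; a generic symmetric matrix stands in for Ã, and equal fusion weights mirror the simple average the paper mentions):

```python
import numpy as np

def lightgcn_embed(A_norm, H0, alphas):
    """Fuse layer outputs H^(l) = A_norm^l H^(0) with weights alpha_l,
    as in eq. (2): no weight matrices, no non-linear activation."""
    H, fused = H0, alphas[0] * H0
    for a in alphas[1:]:
        H = A_norm @ H              # pure neighborhood aggregation
        fused = fused + a * H
    return fused

rng = np.random.default_rng(0)
A = rng.random((6, 6)); A = (A + A.T) / 2    # stand-in propagation matrix
H0 = rng.standard_normal((6, 4))             # learnable embeddings H^(0)
L = 3
H = lightgcn_embed(A, H0, [1.0 / (L + 1)] * (L + 1))   # simple average
```

The loop is equivalent to the closed form Σ_l α_l Ã^l H^(0), without materializing matrix powers.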
We observe that LightGCN is quite attractive due to its simplicity and ability to give state-of-the-art performance. However, it still has two drawbacks. Firstly, it is transductive. Therefore, new items, when they arrive, cannot be recommended to existing users. Similarly, recommending existing items to new users is not possible. Secondly, the model complexity of LightGCN is O((m + n)d). Thus, it poses a scalability issue, for example, when the number of items is extremely large (i.e., n ≫ m). In many real-world applications, the rate at which the number of items grows is significantly higher compared to users. Thus, our interest lies in answering two key questions:

• RQ1: How do we make GCN models for collaborative filtering (CF-GCN) scalable, independent of the number of items, yet delivering state-of-the-art performance?
• RQ2: How do we make CF-GCN models inductive?
We answer these questions in the next section. As a by-product, we also show how LightGCN can be made inductive.
4 PROPOSED APPROACH
We propose a novel graph convolutional network modeling approach for inductive collaborative filtering (CF-GCN). We present the idea of learning CF-GCN models with only user embedding (termed CF-GCN-U, and its light version, CF-LGCN-U) and inferring item embedding. We discuss several CF-GCN model variants that can be useful for different purposes (e.g., increasing the expressive power of models, or scenarios where learning item embedding is beneficial). With basic analysis, we show how the proposed models are related to LightGCN. We also show that training these models can be interpreted as learning neighborhood aggregation functions using user-user and item-item similarity functions, like neighborhood-based CF methods. We show how to use the proposed architecture in the inductive recommendation setting. Finally, we also suggest a simple way of making LightGCN inductive.
The basic idea that we take from [26] is to use heterogeneous graph convolutional layers to compose individual networks and combine embedding outputs from multiple layers and networks. In our setting, since we have only two graphs, R and R^T, there can only be two layer types. We define the UE-GCN layer as the GCN layer that takes item embedding (E) as input and uses R to produce user embedding (U) as output. A similar definition holds for the EU-GCN layer. A CF-GCN network is composed of a cascaded sequence of UE-GCN and EU-GCN layers. There are only two network types, for two reasons: (1) we have only two embedding types (user and item) and start with only one embedding input (i.e., user or item), and (2) we have only two graphs (R and R^T) and layer compatibility (i.e., the output and input of two consecutive layers must match) is to be ensured. We name these networks CF-GCN-U and CF-GCN-E, as they take user and item embedding input respectively.

We illustrate with an example. CF-GCN (R^T - R - R^T) is a three-layer CF-GCN-U network, starting with user embedding input (U^(0)). See Figure 1. It produces item embedding outputs, E^(1) = σ(R^T I_m W^(1)) and E^(3) = σ(R^T U^(2) W^(3)), as the first and third layer outputs respectively. Note that U^(0) = W^(1) (with 1-hot encoding I_m for users). Further, the network produces user embedding, U^(2) = σ(RE^(1) W^(2)), as the second layer output. We fuse U^(0) and U^(2) using AGGREGATION (e.g., mean or weighted
Figure 1: Graph Convolutional Network for Collaborative Filtering Architecture (CF-GCN-U) with Twin Networks. Each network is CF-GCN-U, using GCN layers with graphs R and R^T. We learn user embedding and derive item embedding. User and item embedding from different layers/networks are fused. Removing the layer weight matrices (except in the first layer) and the non-linear activation function σ(·) results in the CF-LGCN-U network.

sum) or CONCAT functions. Likewise, we obtain fused item embedding. Finally, we compute each user-item pairwise score as the inner product of the user and item embedding vectors. In this network, the learnable parameters are U^(0) and the GCN layer weights (W^(2) and W^(3)). Thus, we learn only user embedding and derive item embedding using this architecture. Likewise, CF-GCN (R - R^T - R) is a three-layer CF-GCN-E network with learnable E^(0) (only item embedding) and layer weights. As explained earlier, LightGCN uses only neighborhood aggregation and abandons the feature and non-linear transformations. Using this idea, we simplify CF-GCN networks. Let us continue with the three-layer CF-GCN-U network example. On dropping the feature and non-linear transformations, we get the CF-LGCN-U network with outputs:

E_u^(1) = R^T U^(0),   E_u^(3) = R^T S_u U^(0)   (3)
U_u^(2) = S_u U^(0)   (4)

where S_u = RR^T measures user-user similarity. Note that we use the subscript u to denote the user-embedding-only model. Using a weighted mean with weights (α_0, α_1) to fuse, we get:

E_u = R^T (α_0 I_m + α_1 S_u) U^(0)   (5)
U_u = (α_0 I_m + α_1 S_u) U^(0).   (6)

Note that we have used the same set of weights (tunable hyperparameters) in (5) and (6) for simplicity; we can also use layer-wise weights. Thus, CF-LGCN-U learns only user embedding and uses second-order user-user information (S_u) to infer the final user embedding (U_u), which captures the user-user relation as well. Further, item embedding is inferred by aggregating over its neighbors' user embedding (see (5)).
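The three-layer CF-LGCN-U forward pass of (3)-(6) is three sparse matrix products plus a weighted mean. A NumPy sketch (toy sizes, random data, and equal fusion weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 6, 3
R = (rng.random((m, n)) < 0.5).astype(float)   # toy user-item interactions
U0 = rng.standard_normal((m, d))               # the only learnable embedding
a0, a1 = 0.5, 0.5                              # fusion weights

# Layer outputs of the R^T - R - R^T network (eqs. (3)-(4)):
E1 = R.T @ U0                # first-layer item embedding
U2 = R @ E1                  # second layer: equals S_u U^(0), S_u = R R^T
E3 = R.T @ U2                # third layer: equals R^T S_u U^(0)

# Weighted-mean fusion (eqs. (5)-(6)):
E_u = a0 * E1 + a1 * E3
U_u = a0 * U0 + a1 * U2
```

Note that item embeddings never enter as parameters; they are derived on the fly from U^(0) and R.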
It is also useful to note that since S_u is obtained using only R, we have: R^T S_u = S_e R^T, where S_e = R^T R. Thus, we can rewrite E_u^(3) in (3) and E_u in (5) as:

E_u^(3) = S_e R^T U^(0),   E_u = (α_0 I_n + α_1 S_e) R^T U^(0).   (7)

With the definition Ê_u^(1) = R^T U^(0), we can interpret Ê_u^(1) as an effective or proxy item embedding derived from the learned user embedding, with the result: E_u = S̃_e Ê_u^(1), where S̃_e = (α_0 I_n + α_1 S_e).

It is useful to understand the number of user and item embedding sets we get from a multi-layer network. In the one-layer case, we have one set each of user and item embedding, i.e., U^(0) and E^(1). With two layers, we get two sets of user embedding and one set of item embedding. With three layers, we have two sets each, as seen from (5) and (6). To generalize, we get one fewer set of item embedding with an even number of layers. This does not pose any issue for mean fusion. However, when we have an even number of layers and use the inner product score, we need to drop one set of embedding (either user or item) to ensure the dimensions match with CONCAT fusion.

Generalizing (6) and (7) with more layers, fusing with a weighted mean, and using the inner product score (Z_u = U_u E_u^T), we get the multi-layer CF-LGCN-U network (with L + 1 layers) outputs:

E_u = R^T S̃_u U^(0),   U_u = S̃_u U^(0)   (8)
Z_u = S̃_u U^(0) (U^(0))^T S̃_u^T R   (9)

where S̃_u = Σ_{l=0}^{L} α_l S_u^l and S_u^0 = I_m. Thus, zero and higher powers of the user-user and item-item similarities are used as the number of layers increases. Applying the same steps to the CF-LGCN-E network, we get:

E_e = S̃_e E^(0),   U_e = R S̃_e E^(0)   (10)
Z_e = R S̃_e E^(0) (E^(0))^T S̃_e^T   (11)

where S̃_e = Σ_{l=0}^{L} α_l S_e^l and S_e^0 = I_n. This network uses only item embedding, as denoted by the subscript e, and uses higher-order similarity information to learn/infer item/user embedding.
We observe that (9) can be rewritten as: Z_u = W_u R, where W_u = S̃_u U^(0) (U^(0))^T S̃_u^T. Thus, we can interpret training as learning a user-centric aggregation function (expressed through learnable user embedding, with a special structure), W_u, that takes into account the user-user similarity. Likewise, the scoring function of CF-LGCN-E (11) can be rewritten as: Z_e = R W_e, where W_e = S̃_e E^(0) (E^(0))^T S̃_e^T. Thus, the score is computed using an item-centric transformation (post-multiplication) or aggregation function that uses item-item similarity. Thus, CF-LGCN-U and CF-LGCN-E offer learning user- and item-oriented neighborhood aggregation functions for collaborative filtering.

One natural question is: which network is better, CF-LGCN-U or CF-LGCN-E? We conduct a comparative experimental study later. Note that the number of model parameters is higher for CF-LGCN-E when n > m. These additional degrees of freedom may help in some applications to get improved performance. However, we emphasize that our interest primarily lies in CF-LGCN-U networks, due to reasons explained earlier. We include the CF-LGCN-E discussion to show it as a possible variant within the class of CF-GCN models. Furthermore, it helps to connect the CF-GCN modeling approach with LightGCN, as explained below.

We observe that the standard GCN model, including LightGCN, uses A as the adjacency matrix in every GCN layer. Thus, it produces both user and item embedding as each layer's output, having O((m + n)d) model complexity. On the other hand, the model complexities of the CF-GCN-U and CF-GCN-E networks are only O(md) and O(nd) respectively. Thus, these networks help to reduce model complexity. It is possible to train CF-GCN-U and CF-GCN-E networks jointly and fuse the user/item embedding outputs from both networks. However, this solution has O((m + n)d) complexity. LightGCN is one such solution, as we show next.
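The factorization Z_u = W_u R can be checked numerically; the sketch below (toy sizes and weights ours) computes the score both ways, via the fused embeddings of (9) and via the m×m user-centric aggregation matrix W_u.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 4, 6, 3
R = (rng.random((m, n)) < 0.5).astype(float)
U0 = rng.standard_normal((m, d))
alphas = [0.4, 0.3, 0.3]                       # fusion weights, L = 2

S_u = R @ R.T                                   # user-user similarity
S_tilde = sum(a * np.linalg.matrix_power(S_u, l) for l, a in enumerate(alphas))

# Scores via the fused embeddings of eq. (9) ...
U_u = S_tilde @ U0
E_u = R.T @ S_tilde @ U0
Z_direct = U_u @ E_u.T

# ... and via the user-centric aggregation matrix (Z_u = W_u R)
W_u = S_tilde @ U0 @ U0.T @ S_tilde.T
Z_agg = W_u @ R
```

The two score matrices coincide, which is exactly the neighborhood-aggregation reading of the model: W_u re-weights each user's row of R using user-user similarity.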
To understand the relation with LightGCN, we begin with expression (2) for the embedding vectors and unfold up to three layers as follows:

A_CF² = [ S_u  0 ; 0  S_e ],   A_CF³ = [ 0  R S_e ; S_e R^T  0 ]   (12)

where S_u = RR^T and S_e = R^T R measure user-user and item-item similarities. Note that while A_CF³ is off-block-diagonal, A_CF² is block-diagonal. Thus, there is a special structure to the odd and even powers of A_CF. Using (2) and (12), we get:

U = (α_0 I_m + α_2 S_u) U^(0) + R (α_1 I_n + α_3 S_e) E^(0)   (13)
E = R^T (α_1 I_m + α_3 S_u) U^(0) + (α_0 I_n + α_2 S_e) E^(0)   (14)

On comparing (13) and (14) with the three-layer embedding outputs of CF-LGCN-U (see (5), (6)) and the similar expressions for CF-LGCN-E networks (not shown), we see that LightGCN combines these two networks' embedding outputs with suitably matching weights. Thus, it makes use of both user-user and item-item second-order information in computing user and item embedding. Further, the inner product score computed by LightGCN using (13) and (14) is essentially the sum of the CF-LGCN-U and CF-LGCN-E scores (i.e., (9) and (11)); in addition, it has the cross-term inner product scores of the user embedding of CF-LGCN-U (i.e., the first term in (13)) with the item embedding of CF-LGCN-E (i.e., the second term in (14)) and vice versa.

As noted earlier, LightGCN uses more parameters and has model complexity O((m + n)d); therefore, it is more powerful. The question is: can CF-LGCN-U match LightGCN performance with such reduced model complexity? One simple way to increase the power of CF-LGCN-U is to use two (twin) CF-LGCN-U networks, each learning a different set of user embedding (i.e., U_1^(0) and U_2^(0)). In this case, the effective item embedding of the second network is (Ê_u^(1))_2 = R^T U_2^(0) and has more degrees of freedom compared to the single CF-LGCN-U network, where the effective item embedding depends only on U_1^(0).
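The unfolding in (12)-(14) can be verified numerically. The sketch below uses the unnormalized block adjacency (degree normalization omitted for clarity) with equal fusion weights; sizes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d = 4, 5, 3
R = (rng.random((m, n)) < 0.5).astype(float)
U0 = rng.standard_normal((m, d))
E0 = rng.standard_normal((n, d))
a = [0.25, 0.25, 0.25, 0.25]            # equal fusion weights, L = 3

# LightGCN-style propagation on the block adjacency (eq. (2)):
A_CF = np.block([[np.zeros((m, m)), R], [R.T, np.zeros((n, n))]])
H0 = np.vstack([U0, E0])
H = sum(w * np.linalg.matrix_power(A_CF, l) @ H0 for l, w in enumerate(a))
U_light, E_light = H[:m], H[m:]

# Closed forms of eqs. (13)-(14):
S_u, S_e = R @ R.T, R.T @ R
U_cf = (a[0] * np.eye(m) + a[2] * S_u) @ U0 + R @ (a[1] * np.eye(n) + a[3] * S_e) @ E0
E_cf = R.T @ (a[1] * np.eye(m) + a[3] * S_u) @ U0 + (a[0] * np.eye(n) + a[2] * S_e) @ E0
```

The propagated and closed-form embeddings agree, making the CF-LGCN-U/CF-LGCN-E decomposition of LightGCN explicit.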
Note that we can have a different number of layers in each network, and the user/item embedding from both networks are fused to get the final user/item embedding. The advantage of the twin CF-LGCN-U network modeling approach is that the model complexity is still independent of the number of items, and a significant reduction is achieved when n ≫ m.

The other question is: do we need both networks to get state-of-the-art performance? In the experiment section, we evaluate different CF-GCN networks on several benchmark datasets. We show through empirical studies that the CF-LGCN-U network gives comparable or better performance than LightGCN. Furthermore, the model complexity is reduced, incurring lower storage costs when m < n.

Inductive Recommendation. LightGCN is a transductive method because we learn both user and item embedding, and it is not apparent how to infer embedding for new users and items. A graph embedding based inductive recommendation model requires the ability to infer new user and item embedding. Using CF-LGCN networks, which learn only user embedding, meets this requirement, as discussed below. We present our solution starting with the easier problem of a fixed user set with training data available to learn user embedding, where we need to generalize only for new items. Then, we expand the scope to include new users and suggest a simple solution for CF-LGCN-U networks. Also, as a by-product, we show how LightGCN can be modified for inductive inference.
Inductive Inference for New Items. Let us consider the three-layer CF-LGCN-U example again (see (3)-(7)). Since we learn only user embedding and infer item embedding, CF-LGCN-U can infer embedding for new items. Let R_I denote the new user-item interaction graph in which new items (i.e., new columns) are appended. We assume that at least a few interactions are available for each new item in R_I. In practice, this is possible because one can record interactions on new items shown to randomly picked or targeted existing users. Recall that the three-layer CF-LGCN-U network uses R^T - R - R^T in tandem. We compute new item embedding by substituting R_I for R in each layer. Note that we can choose to use R or R_I in the second layer (i.e., to compute fresh user embedding or not). We conduct an ablation study to evaluate the efficacy of using fresh user embedding and present our results shortly.

Inductive Inference for New Users. Since we learn user embedding in CF-LGCN-U, we face the same question of how to infer embedding for new users.
When we look at (5) and (6) closely, the main difficulty arises from the term α_0 I_m U^(0), as we cannot get embedding for new users without retraining. We propose a naive solution to address this problem. Suppose we set α_0 = 0, that is, we do not use U^(0) in the fusion step. We can then still compute fresh user embedding by substituting the new R_U (i.e., R with added rows) in the second layer. However, the number of user embedding sets available for fusion is reduced by one. Since the inner product of fused user and item embedding requires that the dimensions match, we also drop the first-layer item embedding output and add one more layer, if needed, to ensure a valid CONCAT fusion. This simple solution provides the ability to infer new user embedding, and we demonstrate its usefulness through empirical studies.

Inductive Inference for New Users and Items. We handle both new users and items by not using the embedding outputs of the first layer (i.e., by setting α_0 = 0 for the user and item embedding, as needed). Note that we update both user and item embedding. This update requires using R_I (i.e., ignoring new users) in the first layer and the fully updated R (i.e., having both new users and items) in the other layers. It is important to keep in mind that we require at least a few labeled entries for new users/items during inductive inference. Finally, we note that the idea of not using the first-layer user embedding is also useful for LightGCN. Our experimental results show that inductive LightGCN also works.
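The new-item case is the simplest to sketch: append the new items' interaction columns to R and re-run the forward pass with the frozen user embedding. Sizes, seeds, and fusion weights below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, d, n_new = 4, 6, 3, 2
R = (rng.random((m, n)) < 0.5).astype(float)   # training interactions
U0 = rng.standard_normal((m, d))               # learned user embedding (frozen)

# At inference, new items arrive with a few interactions from existing
# users; append them as new columns to obtain R_I.
new_cols = (rng.random((m, n_new)) < 0.5).astype(float)
R_I = np.hstack([R, new_cols])

# Re-run the R^T - R - R^T layers with R_I in place of R: no retraining,
# and the new items receive embeddings alongside the old ones.
E1 = R_I.T @ U0
U2 = R_I @ E1
E3 = R_I.T @ U2
E_fused = 0.5 * E1 + 0.5 * E3                  # rows n..n+n_new-1 are new items
```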
5 EXPERIMENTS
We first discuss the experimental setup, including a brief description of the datasets and the baselines used; this sub-section also covers the training details, metrics, and hyperparameter optimization. We then evaluate our proposed approach in both transductive and inductive settings, and present and discuss our results.
Datasets. We used the three datasets provided in the LightGCN repository: (a) Gowalla [22] contains user location check-in data; (b) Yelp2018 [38] contains local business (treated as item) recommendations to users; (c) Amazon-Book [11] contains book recommendations to users. We include one additional dataset, Douban-Movie [47], in our evaluation; this dataset consists of user-movie interactions. The statistics of the datasets are detailed in Table 1.
Data Splits. As the repository only contained train and test splits, we sampled 10% of items randomly for each user from the train split to construct the validation split. As there is a difference from the splits used in LightGCN, we retrain all the baselines on these new splits and report the metrics.
In the inductive setting, we do not have access to all the users or items during the training process. New users/items can arrive after model training, with new user-item interactions. Given these interactions, we need to derive embeddings for these new items/users without retraining. To evaluate in this setting, we hold out 5% of users and items randomly (with a minimum of 10 and 5 interactions, respectively) and remove them from the train, validation and test splits of the transductive setting. At the time of inference, we assume we have access to a partial set of interactions involving all the new users and new items, from which we derive their embeddings. We then evaluate the model on the rest of the interactions, along with the test set.
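The entity hold-out protocol above can be sketched as follows; `hold_out_entities` and its arguments are hypothetical names for illustration, assuming precomputed per-entity interaction counts.

```python
import random

def hold_out_entities(user_counts, item_counts, frac=0.05,
                      min_user_inter=10, min_item_inter=5, seed=0):
    """Randomly pick `frac` of users (resp. items) to hold out as 'new'
    entities, restricted to entities with enough interactions so that a
    partial set can be revealed at inference time.

    user_counts / item_counts : dict entity_id -> interaction count
    """
    rng = random.Random(seed)
    eligible_users = [u for u, c in user_counts.items() if c >= min_user_inter]
    eligible_items = [i for i, c in item_counts.items() if c >= min_item_inter]
    n_users = max(1, int(frac * len(user_counts)))
    n_items = max(1, int(frac * len(item_counts)))
    new_users = set(rng.sample(eligible_users, n_users))
    new_items = set(rng.sample(eligible_items, n_items))
    return new_users, new_items

user_counts = {u: 20 for u in range(100)}
item_counts = {i: 10 for i in range(100)}
new_users, new_items = hold_out_entities(user_counts, item_counts)
```

All interactions touching a held-out entity are then removed from the train/validation/test splits, and a portion is revealed only at inference time.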
LightGCN gives state-of-the-art performance; hence, LightGCN (code: https://github.com/kuandeng/LightGCN) forms the primary baseline against which we compare all our proposed models. Additionally, we have included the following baselines.

Dataset      | Users  | Items  | Interactions
Gowalla      | 29,858 | 40,981 | 1,027,370
Yelp2018     | 31,668 | 38,048 | 1,561,406
Amazon-Book  | 52,643 | 91,599 | 2,984,108
Douban-Movie | 3,022  | 6,971  | 195,472

Table 1: Statistics of benchmark datasets

MF [30]: The traditional matrix factorization model that does not utilize the graph information directly.
NGCF [38]: Neural Graph Collaborative Filtering is a graph neural network based model that captures high-order information by embedding propagation over graphs. We utilized the code from the authors' repository to obtain performance metrics.
Mult-VAE [23]: A collaborative filtering method based on a variational autoencoder. We run the code released by the authors after modifying it to run on the train/val/test splits of our other experiments.
GRMF-Norm [27]: Graph Regularized Matrix Factorization adds a graph Laplacian regularizer to the MF objective. We used the GRMF-Norm variant as described in [13].
GCN: This baseline is equivalent to the non-linear version of LightGCN, with transformation matrices learnt as is done in the traditional GCN layers.
We implemented all the above methods except NGCF and Mult-VAE. The training pipeline is set up identical to that of LightGCN for a fair comparison. We use the BPR [29] loss function to train all the models using the Adam optimizer [18]. All the models were trained for at most 1000 epochs, with validation recall@20 evaluated every 20 epochs; training stops early when there is no improvement in the metric for ten consecutive evaluations. Additionally, we conducted all our experiments in a distributed setting, training on 4 GPU nodes using Horovod [31]. All the models we implemented are PyTorch based. We fixed the embedding size to 64 for all the models, and the batch size for all datasets was fixed to 2048, except for Amazon-Book, for which it was set to 8192. The learning rate was swept over {1e-1, 1e-2, 5e-3, 1e-3, 5e-4}, embedding regularization over {1e-2, 1e-3, 1e-4, 1e-5, 1e-6}, and graph dropout over {0.0, 0.1, 0.2, 0.25}. Additionally, for NGCF, node dropout was swept over {0.0, 0.1, 0.2, 0.25}, and the coefficient of the graph Laplacian regularizer for GRMF-Norm was swept over {1e-2, 1e-3, 1e-4, 1e-5, 1e-6}. For Mult-VAE (code: https://github.com/dawenl/vae_cf), we tuned the dropout rate and 𝛽 as hyperparameters, with the learning rate set to 1e-3. For the proposed model, we also varied the graph normalization as a hyperparameter.

We carried out a detailed experimental study with the proposed CF-LGCN variants. We show the usefulness of twin networks first. Then, we present results with multiple layers and layer combination. Finally, we compare the recommended variants (CF-LGCN-U) of our proposed approach to the state-of-the-art baselines.
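The BPR objective used to train all models above can be written in a few lines. The sketch below is a minimal NumPy version for exposition; the function name, the numerically stable softplus form, and the simple L2 term are our illustrative choices, not the exact training code.

```python
import numpy as np

def bpr_loss(user_emb, pos_item_emb, neg_item_emb, reg=1e-4):
    """Bayesian Personalized Ranking loss on a batch of
    (user, positive item, negative item) triples: the score of the
    observed item should exceed that of a sampled unobserved item.

    Each argument is a (batch, d) array of embeddings.
    """
    pos_scores = np.sum(user_emb * pos_item_emb, axis=1)
    neg_scores = np.sum(user_emb * neg_item_emb, axis=1)
    # -log sigmoid(pos - neg), written as softplus(neg - pos) for
    # numerical stability.
    ranking = np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores)))
    l2 = reg * (np.sum(user_emb ** 2) + np.sum(pos_item_emb ** 2)
                + np.sum(neg_item_emb ** 2))
    return ranking + l2
```

A larger score margin between the positive and negative item drives the ranking term toward zero, which is the behavior the sweep over embedding regularization and learning rate tunes around.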
Recall / NDCG @ 20 | Gowalla       | Yelp2018 | Amazon-Book | Douban-Movie
CF-LGCN-U          | 16.81 / 14.35 |          |             |
Twin CF-LGCN-U     |               |          |             | / 3.28

Table 2: Twin CF-LGCN-U vs CF-LGCN-U

Figure 2: Training loss and test Recall@20 as a function of epochs
In Table 2, we present results obtained from evaluating the single CF-LGCN-U network and the twin CF-LGCN-U network on four benchmark datasets. We used three layers in each network. Recall that we learn two sets of user embedding with the twin network. We see that the twin CF-LGCN-U network performs better than the single CF-LGCN-U network on all datasets except Yelp2018. The results show that the twin CF-LGCN-U network is beneficial and can achieve higher performance by increasing the expressive power of the CF-LGCN-U network. Hence, we use only the twin network in the rest of our experiments. The performance on Yelp2018 suggests that even a single network is sufficient in some application scenarios. Therefore, it helps to experiment with both models and perform model selection using the validation set.
In this experiment, we study the usefulness of having multiple layers in the twin CF-LGCN-U network. From Table 4, we see that increasing the number of layers improves the performance significantly on the Yelp2018 and Amazon-Book datasets. This result demonstrates the usefulness of higher-order information available through propagation. The performance starts saturating, for example, on the Gowalla dataset. Besides, it is useful to note that increasing the number of layers beyond a certain limit may hurt the performance due to over-smoothing, as reported in node classification tasks [19]. Therefore, it is crucial to treat the layer size as a model hyperparameter and tune it using validation set performance.
LightGCN reported that fusing the embedding outputs from different layers helps to get improved performance compared to using only the final layer embedding output, and we observed a similar phenomenon in our experiments. We tried two variants of fusion: Mean and Concat. Following LightGCN, we used uniform weighting with Mean. From Table 3, we see that
Recall / NDCG @ 20  | Aggregation | Gowalla       | Yelp2018    | Amazon-Book | Douban-Movie
Twin CF-LGCN-U (2L) | Mean        | 16.38 / 13.76 |             |             |
Twin CF-LGCN-U (3L) | Mean        | 16.71 / 14.27 | 6.06 / 5.03 | 4.24 / 3.31 | 5.16 / 3.17
Twin CF-LGCN-U (3L) | Concat      |               |             |             |

Table 3: Different layer-wise fusion
Concat fusion delivers a significant improvement with our twin CF-LGCN-U network model. Therefore, we use Concat fusion with our model in the rest of the experiments. Note that fusion can be done in several other ways, particularly when working with two networks. We leave this exercise for future experimental studies.
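The two fusion variants compared above differ only in how per-layer outputs are combined. A minimal sketch, with `fuse_layers` and the toy arrays as illustrative names rather than the paper's code:

```python
import numpy as np

def fuse_layers(layer_outputs, mode="concat"):
    """Fuse per-layer embedding outputs into a final embedding.
    `layer_outputs` is a list of (n, d) arrays, one per layer."""
    if mode == "mean":       # uniform weighting, as in LightGCN
        return np.mean(layer_outputs, axis=0)
    if mode == "concat":     # keep per-layer information separate
        return np.concatenate(layer_outputs, axis=1)
    raise ValueError(mode)

# Three layers of 8-d embeddings for 4 entities.
layers = [np.ones((4, 8)), 2 * np.ones((4, 8)), 3 * np.ones((4, 8))]
fused_mean = fuse_layers(layers, "mean")
fused_cat = fuse_layers(layers, "concat")
```

Note that with Concat the fused dimension grows with the number of layers, so the user-side and item-side fusions must use the same number of layer outputs for the final inner product to be valid.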
We performed a detailed experimental study by comparing the proposed model with several state-of-the-art baselines. In Figure 2, we compare the training loss and recall@20 of the proposed twin CF-LGCN-U model with LightGCN on the Amazon-Book and Yelp2018 datasets. Detailed results are presented in Table 4. We observe that the 3-layer twin CF-LGCN-U network model performs uniformly better than the strongest competitor, LightGCN, except on the Gowalla dataset, where the performance is very close. Thus, the proposed model is quite competitive with LightGCN and is a powerful alternative to it. As explained earlier, the CF-LGCN-U network has the advantage of learning only user embedding. From Table 1, we see that the number of items is higher than the number of users, and is nearly twice as high for the Amazon-Book and Douban-Movie datasets. Thus, the proposed CF-LGCN-U network can deliver similar or better performance than LightGCN with significantly reduced complexity, enabling large-scale learning.
Our CF-GCN network architecture supports another variant, CF-LGCN-E. Though our interest primarily lies in CF-LGCN-U, we compare these two variants. In Table 5, we report results from this experiment. We see that the twin CF-LGCN-U network outperforms its counterpart, twin CF-LGCN-E, on all datasets except the Douban-Movie dataset, where the performance is very close. Note that the number of model parameters is significantly higher in the CF-LGCN-E network, as 𝑛 > 𝑚; nevertheless, its performance is inferior. There has always been the question of developing user-centric versus item-centric models [20], and studying this problem using CF-GCN models is beyond the scope of this work. We leave this study for the future.

We carried out a detailed experimental study to evaluate the effectiveness of the inductive variants of CF-LGCN-U and LightGCN. We report three sets of metrics for both these models.
Inductive: This is a standard inductive setup where several users and items are unseen during training. During inference, we use the partial interactions available for all the new entities to obtain their embeddings. Note that we do not need to retrain these models.
Transductive (Upper Bound): In this setup, we train the model along with the partial interactions, thereby having access to all the users and items during training. The motivation to report these numbers is to get an upper bound on the performance.
Transductive (Lower Bound): This set of numbers indicates the lower bound and highlights the performance gain that one can achieve by recommending for new users and items. Training and
Recall / NDCG @ 20  | Gowalla       | Yelp2018 | Amazon-Book | Douban-Movie
MF                  |               |          |             |
NGCF                |               |          |             |
Mult-VAE            |               |          |             |
GRMF-Norm           |               |          |             |
GCN (3L)            |               |          |             |
LightGCN (1L)       |               |          |             |
LightGCN (2L)       |               |          |             |
LightGCN (3L)       | 17.21 / 14.65 |          |             |
Twin CF-LGCN-U (1L) |               |          |             |
Twin CF-LGCN-U (2L) |               |          |             |
Twin CF-LGCN-U (3L) |               |          |             |

Table 4: Comparison with baselines
Recall / NDCG @ 20 | Gowalla | Yelp2018 | Amazon-Book | Douban-Movie
Twin CF-LGCN-U     |         |          |             |
Twin CF-LGCN-E     |         |          |             |

Table 5: Twin CF-LGCN-U vs Twin CF-LGCN-E

inference are made exactly as in the inductive setting. However, we assume that we cannot make recommendations for new users and new items during the evaluation. From the results in Table 6, the inductive variants of both CF-LGCN-U and LightGCN obtain significant gains over the corresponding Transductive (Lower Bound) metrics and come closer to the Transductive (Upper Bound) metrics. This result indicates that the proposed inductive modification is quite effective in generalizing to new users and items. CF-LGCN-U performs better than LightGCN on the Yelp2018 and Amazon-Book datasets, while it is marginally inferior on the Gowalla dataset (as was observed in the transductive setting).
Additionally, we carried out a few experiments in a setting where we had access to all the users during training and evaluated the performance only on new items. As we have access to all the user embedding, we only need to derive embedding for the new items. We can derive item embedding while retaining the user embedding as it is, or we can also update the embedding for users when we get access to the partial interactions for the new items. We report metrics for both these cases in Table 7. As we can see, updating user embedding with partially observed interactions during inference can be very useful compared to using static user embedding.
Recommendation problems are ubiquitous in our day-to-day life. User-item interaction graphs capture historical information on users' interactions with various items. Collaborative Filtering (CF) methods utilize these interaction graphs to make item recommendations to users. User-item interactions have several dimensions:
1. Binary / Real-Valued: Interactions could be binary (e.g. user
Recall / NDCG @ 20 | Gowalla | Yelp2018 | Amazon-Book

Transductive (Lower Bound)
LightGCN       | 15.58 / 12.60 | 5.79 / 4.87 | 3.66 / 2.94
Twin CF-LGCN-U | 14.49 / 12.49 | 5.78 / 4.92 | 4.02 / 3.29

Inductive
LightGCN       |               |             |
Twin CF-LGCN-U |               |             |

Transductive (Upper Bound)
LightGCN       | 17.19 / 15.34 | 6.44 / 5.65 | 4.59 / 3.98
Twin CF-LGCN-U | 17.09 / 15.11 | 6.48 / 5.71 | 5.11 / 4.49

Table 6: Generalization to new users and new items
Recall / NDCG @ 20          | Gowalla       | Yelp2018    | Amazon-Book
Twin CF-LGCN-U (2L)         | 16.12 / 13.72 | 6.11 / 5.19 | 4.08 / 3.33
Twin CF-LGCN-U (2L) with U+ |               |             |

Table 7: The U+ suffix denotes the model for which we update user embedding during inference

bought a product or not) or real-valued (e.g. user gave a 5/10 rating on a video). This distinction might seem trivial; however, there are several subtle differences between the two. In binary interaction graphs, we are often interested in predicting the top-k items for a user; hence, the metric of interest is often NDCG@k. In real-valued interactions like ratings, we are interested in predicting a user's rating for an item (regardless of the user's preference towards that item); thereby, the metric often measured here is the MSE. Also, there are inherent biases involved in real-valued interactions: a user may be prone to giving high ratings to all products, or a popular item may be prone to receiving high ratings. Such biases are absent in a binary interaction graph.
2. Explicit / Implicit: Interactions could be explicit (e.g. user purchased a product) or implicit (e.g. user watched a YouTube video). Explicit feedback creates a reliable user-interaction graph, which can lead to reliable recommendations. If a user purchases an item, we can be sure that they liked/sought the item. Hence, recommendations learnt from such graphs have a high degree of confidence in being liked/sought by the user. However, implicit feedback, due to its nature, creates a highly unreliable graph. Just because a user watched a YouTube video does not imply that the user liked it. Any recommendation made on such unreliable information can be noisy. Several methods have been proposed for CF over the years, addressing different aspects of the recommendation problem.
Factorization Methods: Early methods assumed fixed user and item sets, and proposed Matrix Factorization (MF) based methods [9, 21, 34] which treat the user-item interaction graph as a matrix. These methods can successfully handle binary interactions. For real-valued interactions, addressing inherent biases is essential; [20] computes explicit per-user, per-item bias estimates to get improved performance. In the case of implicit interactions, it is necessary to identify meaningful interactions for the downstream recommendation problem; hence, attention-based models that use side information to model attention have been developed for such scenarios [3, 14]. With the advent of deep learning, non-linear models of matrix factorization have been proposed [15], as well as models that can leverage rich side information to get improved recommendations [37, 44]. AutoEncoder-based models [23, 39] have been proposed to address implicit interactions.
GNN based: Viewing graphs as matrices has certain inherent limitations. Since not all users interact with all items, certain entries in the matrix are missing; replacing them with default values addresses this, but the choice of default value often dictates the performance of the model. Additionally, these methods do not explicitly exploit the collaborative nature of the user-item interaction graph. Researchers have started to look at Graph Neural Networks (GNNs) [40] to circumvent these issues. The Graph Convolutional Network (GCN) [19], a particular instantiation of GNNs, has been shown to be capable of exploiting graphs to give improved performance. [36] adapts GCNs for the user-item interaction graph and shows that they can improve performance over baseline MF methods; however, it only uses single-hop information. [38] adapts the model to incorporate multi-hop information. [43] developed a random-walk based GCN model that can work on massive graphs. [13] proposed a simplified version of the GCN where neighborhood aggregation alone gave good gains.
Inductive: All the above methods, however, assume a fixed user-item set. This assumption can be very restrictive, particularly in social network and e-commerce scenarios where several new users and new items get added every day. In such cases, it is desirable to have an inductive method which does not assume a fixed user-item set. Our proposed approach is one such method: we derive item embeddings from user embeddings, and user embeddings can in turn be derived from item embeddings. Exploiting this relationship allows the model to generalize to new users and new items without retraining. Our proposed approach is competitive with existing approaches in transductive settings, showing that it is possible to express item embeddings in terms of user embeddings and vice versa. It also performs well in inductive settings without degradation, demonstrating its generalizability to new users and new items. There have been a few prior works on inductive recommendation [10, 45]. However, these methods work on real-valued interaction graphs. As discussed earlier, real-valued interactions have implicit biases. These are addressed either by estimating per-user, per-item biases as done in [20], or by modeling each real-valued possibility as a separate relation as done in [45]. Addressing these biases is a non-trivial problem and beyond the scope of this work. We refrain from comparing with these approaches as we restrict our focus to binary-valued user-item interaction graphs.
We focused our attention on applying the proposed modeling approach to scenarios where user-interaction data is available as binary information and user-specific recommendations are made, with recommendation quality measured using metrics such as recall@𝑘 and ndcg@𝑘. Learning models in this application setting comes with several other important factors, including negative sampling [42] and the choice of loss functions (e.g., approximate ndcg [1]) that help to get improved performance. Incorporating these factors and conducting an experimental study is beyond the scope of this work. Another important recommendation application is the rating prediction task (e.g., users rating movies or products) [21]. Several methods have been developed and studied recently. These include graph neural network based matrix completion [36, 45] and heterogeneous (graph) information networks [32], which use concepts such as meta-paths, where a meta-path (e.g., User - Book - Author - Book) encodes semantic information with higher-order relations. Further, several solutions have been proposed to address cold-start (e.g., recommending for users/items having only a few rating entries) and inductive inference [45]. In some application settings, knowledge graphs (e.g., explicit user-user relational graphs) are available, and they are used to improve the recommendation quality; see [7] for more details. We can adapt our CF-GCN modeling approach to these different application scenarios by modifying the network architecture, for example, using GCN with additional graph types [26], relational GCN (R-GCN) [35], or Reco-GCN [41], and optimizing for metrics such as mean-squared error (MSE), mean absolute error (MAE), ndcg@𝑘, etc. We leave all these directions for future work.

We consider the problem of learning graph embedding based methods for collaborative filtering (CF). We set out to develop an inductive recommendation model with graph neural networks and build scalable models that have reduced model complexity.
We proposed a novel graph convolutional network modeling approach for collaborative filtering (CF-GCN). Among the possible variants within this approach's scope, our primary candidate is the CF-LGCN-U network and its twin variant. The CF-LGCN-U network model learns only user embedding. This network offers inductive recommendation capability and therefore generalizes to users and items unseen during training. Furthermore, its model complexity depends only on the number of users; thus, it is significantly smaller than even light models such as LightGCN. CF-LGCN-U models are quite attractive in many practical application scenarios, where the number of users is significantly smaller than the number of items. We showed the relation between the proposed models and LightGCN. Our experimental results demonstrate the efficacy of the proposed modeling approach in both transductive and inductive settings.
REFERENCES
[1] Sebastian Bruch, Masrour Zoghi, Michael Bendersky, and Marc Najork. 2019. Revisiting Approximate Metric Optimization in the Age of Deep Neural Networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19). Association for Computing Machinery, New York, NY, USA, 1241–1244.
[2] Jiangxia Cao, Xixun Lin, Shu Guo, Luchen Liu, Tingwen Liu, and Bin Wang. 2021. Bipartite Graph Embedding via Mutual Information Maximization. In WSDM.
[3] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[4] R. Chen, Q. Hua, Y. Chang, B. Wang, L. Zhang, and X. Kong. 2018. A Survey of Collaborative Filtering-Based Recommender Systems: From Traditional Methods to Hybrid Methods Based on Social Networks. IEEE Access.
[5] CoRR abs/1511.06443 (2015). arXiv:1511.06443.
[6] Michael D. Ekstrand, John T. Riedl, and Joseph A. Konstan. 2011. Collaborative Filtering Recommender Systems. Found. Trends Hum.-Comput. Interact. 4, 2 (Feb. 2011), 81–173.
[7] Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan. 2020. Deep Learning on Knowledge Graph for Recommender System: A Survey. arXiv:2004.00387 [cs.IR].
[8] T. George and S. Merugu. 2005. A scalable collaborative filtering framework based on co-clustering. In Fifth IEEE International Conference on Data Mining (ICDM '05).
[9] G. H. Golub and C. Reinsch. 1970. Singular Value Decomposition and Least Squares Solutions. Numer. Math. 14, 5 (April 1970), 403–420.
[10] Jason Hartford, Devon Graham, Kevin Leyton-Brown, and Siamak Ravanbakhsh. 2018. Deep Models of Interactions Across Sets. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, Stockholmsmässan, Stockholm, Sweden, 1909–1918.
[11] Ruining He and Julian J. McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. (2016).
[12] Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17). Association for Computing Machinery, New York, NY, USA.
[13] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. Association for Computing Machinery, New York, NY, USA, 639–648.
[14] X. He, Z. He, J. Song, Z. Liu, Y. Jiang, and T. Chua. 2018. NAIS: Neural Attentive Item Similarity Model for Recommendation. IEEE Transactions on Knowledge and Data Engineering 30, 12 (2018), 2354–2366.
[15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW '17. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE.
[16] Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored Item Similarity Models for Top-N Recommender Systems. In Proceedings of the 19th ACM SIGKDD.
[17] Efthalia Karydi and Konstantinos Margaritis. 2016. Parallel and Distributed Collaborative Filtering: A Survey. ACM Comput. Surv. 49, 2, Article 37 (Aug. 2016), 41 pages.
[18] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR.
[19] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
[20] Yehuda Koren. 2008. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD. Association for Computing Machinery, New York, NY, USA.
[21] Y. Koren, R. Bell, and C. Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[22] Dawen Liang, Laurent Charlin, James McInerney, and David M. Blei. 2016. Modeling User Exposure in Recommendation. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 951–961.
[23] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference.
[24] S. Papadimitriou and J. Sun. 2008. DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining.
[25] Shaowen Peng and Tsunenori Mine. 2020. A Robust Hierarchical Graph Convolutional Network Model for Collaborative Filtering. arXiv:2004.14734 [cs.IR].
[26] Rahul Ragesh, Sundararajan Sellamanickam, Arun Iyer, Ram Bairi, and Vijay Lingam. 2021. HeteGCN: Heterogeneous Graph Convolutional Networks for Text Classification. In WSDM.
[27] Nikhil Rao, Hsiang-Fu Yu, Pradeep Ravikumar, and Inderjit S. Dhillon. 2015. Collaborative Filtering with Graph Information: Consistency and Scalable Methods. In Advances in Neural Information Processing Systems 28, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). 2107–2115.
[28] S. Rendle. 2010. Factorization Machines. 995–1000.
[29] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09). AUAI Press, Arlington, Virginia, USA, 452–461.
[30] Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. Fast Context-Aware Recommendations with Factorization Machines. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11). Association for Computing Machinery, New York, NY, USA, 635–644.
[31] Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs/1802.05799 (2018).
[32] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu. 2019. Heterogeneous Information Network Embedding for Recommendation. IEEE Transactions on Knowledge and Data Engineering 31, 2 (2019), 357–370.
[33] Xiaoyuan Su and Taghi M. Khoshgoftaar. 2009. A Survey of Collaborative Filtering Techniques. Adv. in Artif. Intell.
[34] In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys '12). Association for Computing Machinery, New York, NY, USA, 83–90.
[35] Anqi Tian, Chunhong Zhang, Miao Rang, Xueying Yang, and Zhiqiang Zhan. 2020. RA-GCN: Relational Aggregation Graph Convolutional Network for Knowledge Graph Completion. In Proceedings of the 2020 12th International Conference on Machine Learning and Computing (ICMLC 2020). Association for Computing Machinery, New York, NY, USA, 580–586.
[36] Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. CoRR abs/1706.02263 (2017). arXiv:1706.02263.
[37] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). Association for Computing Machinery, New York, NY, USA, 1235–1244.
[38] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19). Association for Computing Machinery, New York, NY, USA, 165–174.
[39] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining.
[40] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A Comprehensive Survey on Graph Neural Networks. ArXiv abs/1901.00596 (2019).
[41] Fengli Xu, Jianxun Lian, Zhenyu Han, Yong Li, Yujian Xu, and Xing Xie. 2019. Relation-Aware Graph Convolutional Networks for Agent-Initiated Social E-Commerce Recommendation. In Proceedings of the 28th ACM CIKM.
[42] Zhen Yang, Ming Ding, Chang Zhou, Hongxia Yang, Jingren Zhou, and Jie Tang. 2020. Understanding Negative Sampling in Graph Representation Learning. 1666–1676.
[43] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD.
[44] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative Knowledge Base Embedding for Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). Association for Computing Machinery, New York, NY, USA, 353–362.
[45] Muhan Zhang and Yixin Chen. 2020. Inductive Matrix Completion Based on Graph Neural Networks. In ICLR.
[46] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Comput. Surv.