Dynamic Graph Collaborative Filtering
Xiaohan Li*†, Mengqi Zhang*‡§, Shu Wu‡§‖, Zheng Liu†, Liang Wang‡§, Philip S. Yu†

† University of Illinois at Chicago, Chicago, IL, USA. {xli241, zliu212, psyu}@uic.edu
‡ School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China. [email protected]
§ Institute of Automation, Chinese Academy of Sciences, Beijing, China. {shu.wu, wangliang}@nlpr.ia.ac.cn

* Both authors contributed equally to this research.
‖ To whom correspondence should be addressed.

Abstract—Dynamic recommendation is essential for modern recommender systems to provide real-time predictions based on sequential data. In real-world scenarios, the popularity of items and the interests of users change over time. Based on this assumption, many previous works focus on interaction sequences and learn evolutionary embeddings of users and items. However, we argue that sequence-based models are not able to capture collaborative information among users and items directly. Here we propose Dynamic Graph Collaborative Filtering (DGCF), a novel framework leveraging dynamic graphs to capture collaborative and sequential relations of both items and users at the same time. We propose three update mechanisms: zero-order 'inheritance', first-order 'propagation', and second-order 'aggregation', to represent the impact on a user or item when a new interaction occurs. Based on them, we update the related user and item embeddings simultaneously as interactions occur in turn, and then use the latest embeddings to make recommendations. Extensive experiments conducted on three public datasets show that DGCF significantly outperforms state-of-the-art dynamic recommendation methods by up to 30%. Our approach achieves higher performance when the dataset contains less action repetition, indicating the effectiveness of integrating dynamic collaborative information.
Index Terms—recommender system, dynamic graph
I. INTRODUCTION
Dynamic recommender systems have proved their effectiveness in many online applications, such as social media, online shopping, and streaming media. They leverage historical interaction sequences of users and items to predict the item that a user may interact with in the future. In real-world scenarios, both user interest and item popularity may shift and evolve over time. Therefore, it is crucial for a dynamic recommendation model to capture these changes from both the user and item perspectives in order to make accurate predictions. Besides, collaborative information has proved powerful in making recommendations [1]–[3]. Users that share common interacted items tend to have similar interests, and the methods that bring this property to recommender systems are what we call Collaborative Filtering (CF). Consequently, combining the dynamic changes of users and items with collaborative information is one of the main tasks in dynamic recommender systems.
Fig. 1. User-item graph. Red nodes denote users, and green nodes denote items. A solid line means the user has interacted with the item; a dashed line means the user and the item are interacting. Arrows in different colors represent different relations when updating the embedding of a node.
To build a dynamic recommender system, an intuitive way is to model sequences of interactions. Previously, many kinds of sequence-based methods have been developed based on user-item interactions. For example, RNN-based sequential prediction models [4]–[7] use the recurrent architecture to capture the long-term dependency of item sequences. Besides, to take user sequences into consideration as well, Jodie [8] and other dynamic evolution models [9]–[11] use double RNNs to simultaneously model the updates of users and items based on their evolutionary processes. We argue that all the models above fail to utilize collaborative information directly. These sequence-based methods primarily model the transition relations between items but ignore the similarity between users. To mitigate the lack of collaborative information, the graph structure is an alternative instrument for dynamic recommender systems.

Several previous works have shown the advantage of using the graph structure in recommender systems, but they all have restrictions in different aspects. According to [12], [13], the graph structure is capable of incorporating collaborative information explicitly. By taking user-item interactions as a bipartite graph, these models exploit the high-order connectivity of users and items and encode collaborative information into the graph structure. However, all these models are only suitable for static scenarios; the advantages of sequential dependency and time information are wasted in them. Moreover, SR-GNN [14] proves the superiority of graph structures over sequences in dynamic recommendation, but it fails to incorporate the evolution of items. To deal with these problems, we leverage dynamic graphs to model the evolutionary process of dynamic recommender systems.

In our proposed dynamic graph, nodes are users and items, and edges are their interactions. In the beginning, the graph only contains isolated user and item nodes. As more users, items, and interactions join the dataset, the nodes and edges grow into a large graph. To model this process and learn the embeddings of users and items at different times, we develop three update mechanisms in the dynamic graph, which are shown in Figure 1. The first one is zero-order inheritance, in which each node inherits its previous embedding and incorporates its new features to update its embedding. Secondly, first-order propagation builds the connection between the two sides of an interaction by propagating one side's embedding to the other. It updates the embeddings of the user and item in the interaction simultaneously. Finally, second-order aggregation leverages aggregator functions to obtain an overall embedding of all neighbors of the node on one side of the interaction, and then passes that embedding to the node on the other side. It is a direct manner of utilizing collaborative information.

Based on these three update mechanisms, we propose Dynamic Graph Collaborative Filtering (DGCF) to employ all of them under a unified framework. Figure 2 illustrates the workflow of the DGCF model. There are three modules in the model, corresponding to the three update mechanisms. Each module produces an embedding, and the embeddings generated by the three modules are then fused to learn the embedding of the node. At the end of our model, we utilize an evolutionary loss that takes time information, i.e., time stamps, into account to make recommendations. Details of the model are introduced in Section II.
Overall, our proposed DGCF is capable of directly learning user and item embeddings and performing recommendation tasks in an end-to-end framework. To summarize, our main contributions are listed as follows:
• To the best of our knowledge, our work is the first to introduce dynamic graphs into dynamic recommendation scenarios to model the interactions and updates between users and items.
• We design a novel framework for the dynamic recommendation task with graph structure, which can effectively model the dynamic relationship between users and items.
• We conduct empirical experiments on three public datasets. Experimental results demonstrate that our DGCF model achieves state-of-the-art results on these datasets, especially for datasets with lower action repetition.

II. PROPOSED MODEL
In this section, we present the proposed Dynamic Graph Collaborative Filtering (DGCF) in detail. We first formulate the dynamic graph recommendation problem. Then we introduce the embedding update and recommendation modules of DGCF. Finally, we illustrate the process of optimization and training.
TABLE I
NOTATIONS

Notation                                          Explanation
$t^-, t, t^+$                                     Time points of the previous, current, and future interaction
$\mathcal{G}_t = (\mathcal{V}_t, \mathcal{E}_t)$  Dynamic graph at time $t$
$S_i = (u_i, v_i, t_i, f_i)$                      The $i$-th interaction
$u, v$                                            User $u$ and item $v$
$h_u^t, h_v^t$                                    The embeddings of user and item at time $t$
$\theta, \phi, \zeta$                             Zero-, first-, and second-order functions
$W$                                               Weight matrix
$\mathcal{H}_{uv} = \{v_1, v_2, ..., v_m\}$       Second-order neighbors of item $v$
$\mathcal{H}_{vu} = \{u_1, u_2, ..., u_n\}$       Second-order neighbors of user $u$
$F(\cdot)$                                        Fusion function
MLP                                               Multi-Layer Perceptron
$f$                                               Feature vector

A. Preliminary

1) Dynamic recommendation:
Let $U$, $V$ represent the user and item sets, respectively. In a dynamic recommendation scenario, the $i$-th user-item interaction is represented as a tuple $S_i = (u_i, v_i, t_i, f_i)$, where $i \in \{1, 2, \cdots, I\}$ and $I$ is the total number of interactions. $u_i \in U$, $v_i \in V$ are the user and item in the interaction, and $t_i$ is the time stamp. $f_i$ denotes the features of the interaction, and it includes user features $f_u$ and item features $f_v$. The target of dynamic recommendation is to learn the representations of the user and item from the current interaction and historical records, and then predict the most probable item that the user will interact with in the future.
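For concreteness, this setup can be sketched in a few lines of Python; the class and field names below are illustrative, not taken from the paper's released code:

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class Interaction:
    """One event S_i = (u_i, v_i, t_i, f_i): user id, item id, time stamp, features."""
    user: int
    item: int
    time: float
    features: List[float] = field(default_factory=list)

def replay(interactions: List[Interaction]) -> Iterator[Interaction]:
    """Yield events in chronological order; each event triggers an
    embedding update for its (user, item) pair."""
    for s in sorted(interactions, key=lambda s: s.time):
        yield s
```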
2) Dynamic graph:
The interactions between users and items up to time stamp $t$ construct a dynamic graph $\mathcal{G}_t = (\mathcal{V}_t, \mathcal{E}_t)$, where $\mathcal{V}_t$, $\mathcal{E}_t$ are the sets of nodes and edges in $\mathcal{G}_t$, respectively. Under recommendation settings, $\mathcal{V}_t$ contains all user and item nodes, and $\mathcal{E}_t$ is the set of all interactions between users and items before time $t$. Essentially, the graph here is a bipartite graph, and all edges are between user and item nodes. We use $h_u^t \in \mathbb{R}^d$ and $h_v^t \in \mathbb{R}^d$ to represent the embeddings of user node $u$ and item node $v$ at time $t$. The initial graph $\mathcal{G}_{t_0} = (\mathcal{V}_{t_0}, \mathcal{E}_{t_0})$ at time $t_0$ consists of isolated nodes or a snapshot of the dynamic graph, and the initial embeddings of users and items are initial feature vectors or random vectors. Then, when another interaction $S = (u, v, t, f)$ joins the graph, the user and item embeddings $h_u^t$, $h_v^t$ are updated by our proposed DGCF. Besides, $h_u^{t^-}$ and $h_v^{t^-}$ represent the most recent embeddings of user node $u$ and item node $v$ before time stamp $t$. The notations used throughout this paper are summarized in Table I.

B. Overview
Figure 2 is an overview of our proposed framework. The interactions between users and items form a dynamic graph over time. When a new user-item interaction joins, we utilize the embedding update mechanism (Section II-C) to update both the user and item nodes together. Then we obtain the future user and item embeddings via the projection functions (Section II-D). Finally, we calculate the $L_2$ distance between the predicted item embedding and all other item embeddings, and then recommend the items with the smallest distance to the predicted item embedding.

Fig. 2. Illustration of Dynamic Graph Collaborative Filtering (DGCF). Left: a new interaction joins the user-item graph. Right: the overall structure of DGCF. The pink and light green nodes denote the neighbors of the target nodes. $h_u^{t^-}$ and $h_v^{t^-}$ denote the embeddings of user $u$ and item $v$ before time $t$. Based on the three update mechanisms, $h_u^t$ and $h_v^t$ are produced by DGCF at the same time. With $h_u^t$, $h_v^t$, and the evolutionary loss function, we obtain the recommendation in the end.

C. Embedding Update
First, we discuss the embedding update mechanisms: (1) zero-order 'inheritance', in which users and items take their previous embeddings and new features as input; (2) first-order 'propagation', which models the current user-item interaction; and (3) second-order 'aggregation', which aggregates the previous users that have interacted with the item into the current user, and vice versa.
1) Zero-order ‘inheritance’:
For a dynamic graph, the node to be updated first inherits the influence of its previous state and its new features. Similar to existing sequential prediction methods [4], [5], [8], we use the previous embedding as a part of the input to 'inherit' the previous state. Besides, we additionally encode the time interval between the current and previous interactions as a part of the features to learn the user and item embeddings. For user embedding $h_u$ and item embedding $h_v$, the forward formulas are

$\hat{h}_u^t = \theta_u(W_u h_u^{t^-} + w \Delta t_u + W_f f_u)$,   (1)
$\hat{h}_v^t = \theta_v(W_v h_v^{t^-} + w \Delta t_v + W_f f_v)$,   (2)

where $\Delta t$ is the time interval between the current time $t$ and the previous interaction time $t^-$, and $h_u^{t^-}$, $h_v^{t^-}$ are the most recent embeddings of user $u$ and item $v$ before $t$. $f_u$ and $f_v$ are the features of the user and the item, respectively. $W \in \mathbb{R}^{d \times d}$ are parameter matrices, and $w \in \mathbb{R}^d$ is the parameter vector that encodes the time interval $\Delta t$. $\theta_u$ and $\theta_v$ are activation functions. To improve computational speed, our work uses an identity map instead of a non-linear activation function.
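A minimal PyTorch sketch of Eqs. (1)-(2) may help; it assumes the identity activation stated above, and the class and parameter names are ours rather than the authors':

```python
import torch
import torch.nn as nn

class ZeroOrderInheritance(nn.Module):
    """Eqs. (1)-(2): inherit the previous embedding plus time-interval and feature terms."""
    def __init__(self, dim: int, feat_dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)        # W_u (or W_v)
        self.w = nn.Parameter(torch.randn(dim))         # w, encodes the time interval
        self.W_f = nn.Linear(feat_dim, dim, bias=False) # W_f

    def forward(self, h_prev: torch.Tensor, dt: torch.Tensor, f: torch.Tensor) -> torch.Tensor:
        # h_prev: (batch, dim), dt: (batch, 1), f: (batch, feat_dim)
        # theta is the identity map, as in the paper, so no non-linearity is applied
        return self.W(h_prev) + self.w * dt + self.W_f(f)
```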
2) First-order ‘propagation’:
In our model, we build a dynamic bipartite graph to model the interactions between user and item nodes, which means a user node's first-order neighbors are the items that he or she has interacted with, and vice versa.

In dynamic recommendation scenarios, the item that a user interacts with, to some extent, reflects his or her recent intention and interest. Correspondingly, the users who are interested in a specific item can be regarded as a part of the item's properties. Therefore, it is essential to exploit first-order neighbor information to learn user and item embeddings. Note that our model only incorporates the currently interacting node as input instead of taking all first-order neighbors as inputs.

To be specific, when an interaction involving user $u$ and item $v$ occurs, on the one hand, item $v$'s current embedding and features such as reviews or descriptions are incorporated into user $u$'s embedding. On the other hand, user $u$'s current embedding and features, e.g., the user's profile, are injected into item $v$'s embedding. Formally,

$\bar{h}_u^t = \phi_u(W_u h_v^{t^-} + W_f f_v)$,   (3)
$\bar{h}_v^t = \phi_v(W_v h_u^{t^-} + W_f f_u)$,   (4)

where $W \in \mathbb{R}^{d \times d}$ are parameter matrices. In this way, the features of the current interaction are propagated to update the user and item embeddings. Similar to zero-order inheritance, we still use the identity map for $\phi_u$ and $\phi_v$.
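First-order propagation is symmetric to the previous sketch; a compact rendering of Eqs. (3)-(4), again assuming the identity map and our own naming:

```python
import torch
import torch.nn as nn

class FirstOrderPropagation(nn.Module):
    """Eqs. (3)-(4): the counterpart node's embedding and features update this node."""
    def __init__(self, dim: int, feat_dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)        # W_u (or W_v)
        self.W_f = nn.Linear(feat_dim, dim, bias=False) # W_f

    def forward(self, h_other_prev: torch.Tensor, f_other: torch.Tensor) -> torch.Tensor:
        # For a user u the inputs come from the interacted item v, and vice versa;
        # phi is again the identity map.
        return self.W(h_other_prev) + self.W_f(f_other)
```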
3) Second-order ‘aggregation’:
In the spirit of modeling collaborative filtering in a dynamic graph, the update of a node considers not only the historical sequence of the node and the information from interacted nodes, but also the structural information among the nodes. The rationale behind second-order aggregation is to model the collaborative relationship between users and items. In a dynamic graph, it captures the influence of nodes at second-order proximity, passed through the other node participating in the interaction.

Specifically, a user may have bought a bunch of items before the current interaction; now this user buys a new item, so we assume the newly purchased item has a collaborative relation with the previously purchased items of the user. To model this relation, we build a direct connection, denoted as $v_u \rightarrow u \rightarrow v$. Here $v_u \in \mathcal{H}_{uv}$, where $\mathcal{H}_{uv} = \{v_1, v_2, ..., v_m\}$ is the set of previously purchased items of user $u$, and $v$ is the item in the current interaction. Node $u$ serves as a bridge passing information from $v_u$ to node $v$, so that $v$ receives the aggregated second-order information through $u$. For user $u$, the relation is $u_v \rightarrow v \rightarrow u$, where $u_v \in \mathcal{H}_{vu}$ and $\mathcal{H}_{vu} = \{u_1, u_2, ..., u_n\}$ is the set of users who previously purchased item $v$.

As shown in Figure 2, to make the second-order information flow from the neighbors of $v$ to $u$, we take $u$ as the anchor node and $u_i \in \mathcal{H}_{vu}$ as second-order neighbor nodes in the graph. Then we use aggregator functions to transmit neighborhood information to the anchor node. This process is formulated as

$\tilde{h}_u^t = \zeta_u(h_u^{t^-}, h_{u_1}^{t^-}, h_{u_2}^{t^-}, ..., h_{u_n}^{t^-})$,   (5)
$\tilde{h}_v^t = \zeta_v(h_v^{t^-}, h_{v_1}^{t^-}, h_{v_2}^{t^-}, ..., h_{v_m}^{t^-})$,   (6)

where $\zeta_u$ and $\zeta_v$ are aggregator functions. In the dynamic graph, users and items are two different kinds of nodes with different properties. For example, the neighbors of a user are the items that he or she has interacted with, which are time-dependent; the neighbors of an item are the users who have interacted with it, which tend to be similar. Besides, the number of users that have interacted with a popular item can be significantly large. Thus, the aggregator functions we use should also be different for user and item nodes. The following are candidate aggregator functions for the second-order update (minimal sketches of two of them are given at the end of this subsection):

• Mean aggregator is a straightforward operator to aggregate the neighbor information of user $u$ and item $v$. According to [15], the mean aggregator can be viewed as an inductive variant of the GCN [16] approach. The formulas of this aggregator are

$\tilde{h}_u^t = h_u^{t^-} + \frac{1}{|\mathcal{H}_{vu}|} \sum_{u_i \in \mathcal{H}_{vu}} W_u^m h_{u_i}^{t^-}$,   (7)
$\tilde{h}_v^t = h_v^{t^-} + \frac{1}{|\mathcal{H}_{uv}|} \sum_{v_i \in \mathcal{H}_{uv}} W_v^m h_{v_i}^{t^-}$,   (8)

where $W_\cdot^m \in \mathbb{R}^{d \times d}$ are aggregation parameters.

• LSTM aggregator is a more complex aggregator function based on the LSTM [17] architecture. By taking sequential data as input, LSTM has strong non-linear memory capability to keep track of long-term dependencies. From the user's perspective, the items that the user has previously interacted with have an explicit sequential dependency, so we feed all of the user node's neighbors into the aggregator function in chronological order. Likewise, from the item's perspective, we order its connected users by time and input them to the LSTM. The LSTM aggregator can be formulated as

$\tilde{h}_u^t = h_u^{t^-} + \mathrm{LSTM}_u(h_{u_1}^{t^-}, h_{u_2}^{t^-}, ..., h_{u_n}^{t^-})$,   (9)
$\tilde{h}_v^t = h_v^{t^-} + \mathrm{LSTM}_v(h_{v_1}^{t^-}, h_{v_2}^{t^-}, ..., h_{v_m}^{t^-})$.   (10)

• Graph attention aggregator computes attention weights between the central node and its neighbor nodes, which indicate the importance of each neighbor node to the central node. Inspired by the GAT [18] model, we define the graph attention aggregator as

$\tilde{h}_u^t = \sum_{u_i \in \mathcal{H}_{vu}} \alpha_{u_i} h_{u_i}^{t^-}$,   (11)
$\alpha_{u_i} = \frac{\exp(\mathrm{LeakyReLU}(W_w [h_u^{t^-} \| h_{u_i}^{t^-}]))}{\sum_{u_j \in \mathcal{H}_{vu}} \exp(\mathrm{LeakyReLU}(W_w [h_u^{t^-} \| h_{u_j}^{t^-}]))}$,   (12)
$\tilde{h}_v^t = \sum_{v_i \in \mathcal{H}_{uv}} \alpha_{v_i} h_{v_i}^{t^-}$,   (13)
$\alpha_{v_i} = \frac{\exp(\mathrm{LeakyReLU}(W_w [h_v^{t^-} \| h_{v_i}^{t^-}]))}{\sum_{v_j \in \mathcal{H}_{uv}} \exp(\mathrm{LeakyReLU}(W_w [h_v^{t^-} \| h_{v_j}^{t^-}]))}$,   (14)

where $W_w \in \mathbb{R}^{2d}$ is a weight vector and $\|$ is the concatenation operation.

In practical recommendation scenarios, second-order aggregation may face enormous computational costs due to the large scale of data. Therefore, when performing second-order aggregation, we select a fixed number of neighbors for aggregation. We call the number of selected neighbor nodes the aggregator size.

We choose not to use higher-order information for the following reasons. First, higher-order information may lead to the over-smoothing problem [19], which makes node embeddings prone to be similar. Besides, using higher-order information increases the computational complexity in a power law. As efficiency is a vital issue in recommender systems, we try to keep the computational complexity acceptable in most scenarios and only use up to second-order information.
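Below are minimal sketches of the mean aggregator (Eqs. (7)-(8)) and the graph attention aggregator (Eqs. (11)-(14)); sampling neighbors down to the aggregator size is assumed to happen before these modules are called, and the naming is ours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanAggregator(nn.Module):
    """Eqs. (7)-(8): average the transformed neighbor embeddings, plus a residual term."""
    def __init__(self, dim: int):
        super().__init__()
        self.W_m = nn.Linear(dim, dim, bias=False)

    def forward(self, h_prev: torch.Tensor, h_neighbors: torch.Tensor) -> torch.Tensor:
        # h_prev: (dim,), h_neighbors: (n, dim) -- the sampled second-order neighbors
        return h_prev + self.W_m(h_neighbors).mean(dim=0)

class AttentionAggregator(nn.Module):
    """Eqs. (11)-(14): GAT-style attention over the second-order neighbors."""
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(2 * dim, 1, bias=False)  # W_w in the paper

    def forward(self, h_anchor: torch.Tensor, h_neighbors: torch.Tensor) -> torch.Tensor:
        # h_anchor: (dim,), h_neighbors: (n, dim)
        anchor = h_anchor.expand_as(h_neighbors)                  # repeat the anchor per neighbor
        scores = F.leaky_relu(self.w(torch.cat([anchor, h_neighbors], dim=-1)))
        alpha = torch.softmax(scores, dim=0)                      # Eqs. (12)/(14)
        return (alpha * h_neighbors).sum(dim=0)                   # Eqs. (11)/(13)
```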
4) Fusion information:
To combine the three update mechanisms when learning node embeddings in the dynamic graph, we fuse the three representations above to obtain the final update formula:

$h_u^t = F_u(W_u^{zero} \hat{h}_u^t + W_u^{first} \bar{h}_u^t + W_u^{second} \tilde{h}_u^t)$,   (15)
$h_v^t = F_v(W_v^{zero} \hat{h}_v^t + W_v^{first} \bar{h}_v^t + W_v^{second} \tilde{h}_v^t)$,   (16)

where $h_u^t, h_v^t \in \mathbb{R}^d$ are the node embeddings updated after user $u$ interacts with item $v$ at time $t$. $F_u$, $F_v$ are the fusion functions of the user and item, respectively; here we generally choose the sigmoid $\sigma(\cdot)$ as the activation function. $W^{zero}$, $W^{first}$, and $W^{second} \in \mathbb{R}^{d \times d}$ are parameters that control the influence of the three update mechanisms.

D. Recommendation

In dynamic recommendation, the goal is to predict the item that the user is most likely to interact with at time $t$, according to his or her historical interaction sequence before time $t$. Intuitively, this is analogous to link prediction in dynamic graphs. To be specific, our target is to predict the item node $v$ in the dynamic graph that the user node $u$ is most likely to link to at time $t$. Based on [8], we propose an evolutionary loss for the dynamic graph.
1) Evolution formula:
Different from traditional collaborative filtering methods, our model is designed for predicting future interactions. Specifically, given a future time point, we can leverage our model to predict the future embeddings and then make recommendations. This is a more flexible setting, because the predicted results do not rely on the sequences; instead, they are based on the embeddings learned by the dynamic graph structure.

Since $h_u^t$ is the updated embedding of the user, we need an estimated future embedding to measure whether the prediction is accurate. Motivated by [8], we assume the growth of users is smooth, so the embedding vector of a user node evolves in a contiguous space. Therefore, we set a projection function to estimate the future embedding based on the element-wise product of the previous embedding and the time interval. We define the embedding projection formula of user $u$ from the current time $t$ to the future time $t^+$ as follows:

$\widehat{h}_u^{t^+} = \mathrm{MLP}_u(h_u^t \odot (\mathbf{1} + w_t(t^+ - t)))$,   (17)

where $w_t \in \mathbb{R}^d$ is a time-context parameter that converts the time interval to a vector, $\mathbf{1} \in \mathbb{R}^d$ is a vector with all elements equal to 1, and MLP denotes a Multi-Layer Perceptron. $t^+$ is the future time at which the user interacts with the next item. With this projection function, the future embedding grows in a smooth trajectory w.r.t. the time interval.

After obtaining the projected embedding $\widehat{h}_u^{t^+}$ of user $u$, we learn the future embedding of item $v$, denoted as $\widehat{h}_v^{t^+}$, by setting another projection function. The projected item embedding is based on three parts: the user that currently interacts with the item, and the update features of the user and the item itself, all of which are already known. So we define the projection formula of item $v$ as:

$\widehat{h}_v^{t^+} = \mathrm{MLP}_v(W_1 \widehat{h}_u^{t^+} + W_2 f_u + W_3 f_v)$,   (18)

where $W_1$, $W_2$, and $W_3$ denote weight matrices.
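A sketch of the two projection functions (Eqs. (17)-(18)) follows; the depth and activation of the MLPs are not specified in the text, so the two-layer ReLU networks below are an assumption, as is the W1/W2/W3 naming:

```python
import torch
import torch.nn as nn

class Projection(nn.Module):
    """Eqs. (17)-(18): project the current embeddings to a future time t+."""
    def __init__(self, dim: int, feat_dim: int):
        super().__init__()
        self.w_t = nn.Parameter(torch.randn(dim))  # time-context parameter
        self.mlp_u = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.mlp_v = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.W1 = nn.Linear(dim, dim, bias=False)
        self.W2 = nn.Linear(feat_dim, dim, bias=False)
        self.W3 = nn.Linear(feat_dim, dim, bias=False)

    def forward(self, h_u: torch.Tensor, dt: torch.Tensor,
                f_u: torch.Tensor, f_v: torch.Tensor):
        # Eq. (17): element-wise drift of the user embedding over the interval dt = t+ - t
        h_u_future = self.mlp_u(h_u * (1 + self.w_t * dt))
        # Eq. (18): the projected item embedding from the user and both feature vectors
        h_v_future = self.mlp_v(self.W1(h_u_future) + self.W2(f_u) + self.W3(f_v))
        return h_u_future, h_v_future
```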
2) Loss function:
Given the estimated future embeddings produced by the projection functions, we take them as ground-truth embeddings in our loss function. To train our model, the loss is composed of the Mean Square Error (MSE) between the model-generated embeddings $h_v^t$, $h_u^t$ and the estimated ground-truth embeddings $\widehat{h}_v^{t^+}$, $\widehat{h}_u^{t^+}$ at each interaction time $t$. Besides, we need another constraint on the embeddings to avoid overfitting. We constrain the distance between the model-generated $h_v^t$ and the most recent embedding $h_v^{t^-}$ of item $v$, and between $h_u^t$ and $h_u^{t^-}$, respectively, to make the node embeddings more consistent with their previous values. The assumption behind this constraint is that the properties of items and users tend to be stable over a short time. The loss function is written as follows:

$\mathcal{L} = \sum_{(u,v,t,f) \in \{S_i\}_{i=0}^{I}} \|\widehat{h}_v^{t^+} - h_v^t\|_2 + \lambda_u \|h_u^t - h_u^{t^-}\|_2 + \alpha_v \|h_v^t - h_v^{t^-}\|_2$,   (19)

where $\{S_i\}_{i=0}^{I}$ denotes the interaction events sorted in chronological order, and $\lambda_u$ and $\alpha_v$ are smoothing coefficients used to prevent the embeddings of users and items from deviating too much during the update process.

To make recommendations for a user, we calculate the $L_2$ distances between the predicted item embedding obtained from the projection function and all other item embeddings. The nearest top-$k$ items are then recommended to the user.

Compared with the traditional BPR loss [3], the evolutionary loss is more suitable for dynamic recommendation because it takes time into account. As a result, the changing trajectories of users and items are modeled by this loss [8], and it can make more precise recommendations for the next item.
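A sketch of Eq. (19) for a single interaction, together with the top-k retrieval by L2 distance described above; using squared distances and these function names is our choice:

```python
import torch

def evolutionary_loss(h_v_t, h_v_future, h_u_t, h_u_prev, h_v_prev,
                      lambda_u: float = 1.0, alpha_v: float = 1.0) -> torch.Tensor:
    """Eq. (19) for one interaction: MSE to the projected ground truth, plus
    smoothing terms that keep embeddings close to their previous values."""
    mse = torch.sum((h_v_future - h_v_t) ** 2)
    reg_u = lambda_u * torch.sum((h_u_t - h_u_prev) ** 2)
    reg_v = alpha_v * torch.sum((h_v_t - h_v_prev) ** 2)
    return mse + reg_u + reg_v

def recommend_top_k(pred_item_emb: torch.Tensor, all_item_embs: torch.Tensor, k: int = 10):
    """Rank all items by L2 distance to the predicted item embedding."""
    dists = torch.cdist(pred_item_emb.unsqueeze(0), all_item_embs).squeeze(0)
    return torch.topk(-dists, k).indices  # indices of the k nearest items
```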
E. Optimization and Training

Similar to Recurrent Neural Networks (RNNs), we apply the back-propagation through time (BPTT) algorithm for model training. The model parameters are optimized by the Adam optimizer [20].

To speed up the training process, we use the same method of constructing batches as [8]. As mentioned in [8], the training algorithm needs to follow two critical criteria: (1) it should process the interactions in each batch simultaneously, and (2) the batching process should maintain the original temporal ordering of the interactions and keep the sequential dependency in the generated embeddings. In practice, we arrange the interaction events $S_i$ in chronological order to get an event sequence $\{S_1, S_2, \cdots, S_I\}$ numbered by integers, where $I$ is the total number of interactions. We traverse the temporally sorted sequence of interactions iteratively and put each interaction into a batch $k$, where $k \in [1, I]$. In the initial stage of constructing the batch sequence, each batch is empty and every node's batch index is $-1$: we define $B_{init}(u) = -1$, $B_{init}(v) = -1$. For each interaction $(u, v, t, f)$, the index of the batch it is added to is $\max\{B(u) + 1, B(v) + 1\}$. When the interaction is added to that batch, we update the batch indices of $u$ and $v$ accordingly. This mechanism ensures that the embeddings of users and items in the same batch can be updated simultaneously in the training and testing process.
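A sketch of this batching rule (the batching scheme of [8]), with our own function and variable names:

```python
from collections import defaultdict

def build_batches(interactions):
    """Assign chronologically sorted interactions (u, v, t, f) to batches so that
    no user or item appears twice in the same batch and temporal order is kept."""
    last_batch = defaultdict(lambda: -1)   # B_init(u) = B_init(v) = -1
    batches = defaultdict(list)
    for u, v, t, f in sorted(interactions, key=lambda s: s[2]):
        k = max(last_batch[("u", u)], last_batch[("v", v)]) + 1  # max{B(u)+1, B(v)+1}
        batches[k].append((u, v, t, f))
        last_batch[("u", u)] = k           # update the batch indices of u and v
        last_batch[("v", v)] = k
    return [batches[k] for k in sorted(batches)]
```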
III. EXPERIMENTS

All source code and datasets are provided at https://github.com/CRIPAC-DIG/DGCF. In this section, we design experiments to answer the following questions:

Q1: How does DGCF perform compared with other state-of-the-art dynamic or sequential models?
Q2: What is the influence of the three types of embedding update mechanisms (zero-order inheritance, first-order propagation, second-order aggregation) in DGCF?
Q3: What is the effect of different aggregator functions in second-order aggregation on model performance?
Q4: How do different hyper-parameters (e.g., aggregator size) affect the performance of DGCF?

TABLE II
THE AMOUNTS OF USERS, ITEMS, AND INTERACTIONS, AND THE ACTION REPETITION RATE OF EACH DATASET.

Data       Users   Items   Interactions   Action Repetition
Reddit     10000   1000    672447         79%
Wikipedia  8227    1000    157474         61%
LastFM     1000    1000    1293103        8.6%

A. Datasets Description
To evaluate the proposed model, we conduct experiments on three real-world datasets. The numbers of users, items, and interactions for each dataset, together with its action repetition rate, are listed in Table II. It is worth emphasizing that the three datasets differ significantly in terms of users' repetitive behaviors. The details of the datasets are as follows:
Reddit:
This dataset contains one month of posts made by users on subreddits [21]. The 10,000 most active users and the 1,000 most active subreddits are selected and treated as users and items, respectively, yielding 672,447 interactions. Besides, each post's text is converted into a feature vector representing its LIWC categories [22].
Wikipedia edits:
This dataset contains one month of edits on Wikipedia [23]. The editors who made at least 5 edits and the 1,000 most edited pages are selected as users and items for recommendation. This dataset contains 157,474 interactions in total. Similarly, the edit texts are converted into LIWC feature vectors.
LastFM:
This dataset contains the listening records of users within one month [24]. 1,000 users and the 1,000 most listened songs are selected as users and items, and 1,293,103 interactions are in this dataset. Note that interactions do not have features in this dataset.

All user-item interactions are arranged in chronological order. We then split each dataset into training, validation, and test sets in a proportion of 80%, 10%, and 10%. For each interaction $(u, v, t, f)$ in the test set, our goal is to use the given $u$ and $t$ to predict the item that the user is most likely to interact with at time $t$.
To evaluate the performance of DGCF, we compare it with the following baseline methods:
• LSTM [17]: A variant of RNN, Long Short-Term Memory (LSTM). It updates the user (session) embedding by feeding the sequence of the user's historically interacted items into the LSTM cell, which can capture the long-term dependency of the item sequence.
• Time-LSTM [25]: It uses time gates in LSTM to model the time intervals in the interaction sequences.
• RRN [7]: Recurrent Recommender Network (RRN) predicts future trajectories to learn user and item embeddings based on LSTM.
• CTDNE [10]: A state-of-the-art model for generating embeddings from temporal networks, but it only produces static embeddings.
• DeepCoevolve [26]: It is based on co-evolutionary point process algorithms. We use 10 negative samples per interaction, following the setting of [8].
• Jodie [8]: A state-of-the-art model for the dynamic recommendation problem. It defines a projection operation to predict dynamic embedding trajectories.
C. Experimental Settings

1) Evaluation Metrics:
Two evaluation metrics are used to measure the performance of our DGCF framework:

(1) Mean Reciprocal Rank (MRR): suppose the model produces a list of recommended items for a user, ordered by the confidence of the prediction. MRR measures the performance of the model with respect to this ranking; a higher MRR score means that target items tend to have higher rank positions in the predicted item lists. Formally, MRR is defined as

$\mathrm{MRR} = \frac{1}{|N|} \sum_{i=1}^{|N|} \frac{1}{\mathrm{rank}_{u_i}}$,   (20)

where $\mathrm{rank}_{u_i}$ represents the rank position of the target item for user $u_i$ in the $i$-th test case.

(2) Recall@10: the fraction of test cases in which the target item appears in the top-10 recommendation list. It is calculated as

$\mathrm{Recall@10} = \frac{n_{hit}}{n_{test}}$,   (21)

where $n_{hit}$ is the number of target items that appear among the top-10 recommendation lists and $n_{test}$ is the number of all test cases.
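Both metrics reduce to simple functions of the target item's rank in each test case; a sketch, assuming the ranks have already been computed:

```python
def mrr(ranks):
    """Eq. (20): mean reciprocal rank of the target item over all test cases."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_10(ranks):
    """Eq. (21): fraction of test cases whose target item ranks within the top 10."""
    return sum(1 for r in ranks if r <= 10) / len(ranks)
```

For example, mrr([1, 2, 10]) ≈ 0.533, while recall_at_10([1, 2, 10]) = 1.0 since every target ranks within the top 10.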
2) Parameter Settings: We implement the DGCF framework in PyTorch (https://pytorch.org/). The dimensionality of embeddings is the same for all attempts. We initialize the embeddings of users and items with vectors randomly sampled from a Gaussian distribution with mean 0 and variance 1. Features of users and items are one-hot vectors. The Adam optimizer with a fixed learning rate and an L2 penalty is adopted in our model. The smoothing coefficients $\lambda$ and $\alpha$ in the loss function are set to 1. We run 50 epochs each time and select the best attempt based on the validation set for testing. For the comparison methods, we mostly use the default hyperparameters from their papers.

TABLE III
EXPERIMENTS ON THREE DATASETS COMPARING DGCF WITH SIX BASELINE MODELS BASED ON MEAN RECIPROCAL RANK (MRR) AND RECALL@10 (R@10). THE BOLD AND UNDERLINED NUMBERS MEAN THE BEST AND SECOND-BEST RESULTS ON EACH DATASET AND METRIC, RESPECTIVELY. "IMPROVEMENT" MEANS THE MINIMUM IMPROVEMENT AMONG ALL BASELINES.

Models         LastFM          Wikipedia       Reddit
               MRR     R@10    MRR     R@10    MRR     R@10
LSTM           0.081   0.127   0.332   0.459   0.367   0.573
Time-LSTM      0.088   0.146   0.251   0.353   0.398   0.601
RRN            0.093   0.199   0.530   0.628   0.605   0.751
CTDNE          0.010   0.010   0.035   0.056   0.165   0.257
DeepCoevolve   0.021   0.042   0.515   0.563   0.243   0.305
Jodie          0.239   0.387   0.746   0.821   0.724   0.851
DGCF
Improvement    34.3%   27.7%   5.4%    3.6%    0.2%    0.5%
D. Performance Comparison (Q1)
To demonstrate the superiority of our proposed DGCF model, we conduct experiments on three datasets and compare our model with six baseline methods. Table III shows that DGCF significantly outperforms all six baselines on the three datasets according to both MRR and Recall@10. Especially on LastFM, the improvements are 34.3% on MRR and 27.7% on Recall@10. Based on the experimental results shown in Table III, we make the following observations:
• DGCF yields significant improvements on LastFM. Moreover, compared with Jodie, the improvements on Reddit, Wikipedia, and LastFM are in increasing order, which is consistent with the repetitive action patterns in the datasets. The main reason for the improvements might be that DGCF explicitly considers the collaborative information in the graph. The LastFM dataset includes users and the songs they listen to. Intuitively, users tend to listen to different songs that belong to similar genres, which is why the LastFM dataset has a low action repetition rate. In this case, it is not easy to make recommendations based on sequences alone. However, with the collaborative information utilized in DGCF, we can find similar sequences from other users and make recommendations. This is evidence that DGCF can deal with low-action-repetition situations.
• In DGCF, the improvements on MRR are greater than those on Recall@10. This may result from the projection functions and the evolutionary loss. Because our loss considers the future time, the predicted results corresponding to different time stamps tend to be different. In this case, our model performs better on the ranking of recommended items.
E. Ablation Study (Q2)
As the three update mechanisms are crucial in DGCF, we measure their effectiveness by conducting ablation experiments. We compare our model with the following ablated variants: (1) DGCF-0: DGCF without zero-order inheritance; (2) DGCF-1: DGCF without first-order propagation; (3) DGCF-2: DGCF without second-order aggregation. The DGCF here uses graph attention as its aggregator function. Figure 3 summarizes the experimental results.

Fig. 3. Ablation study, in terms of (a) MRR and (b) Recall@10. Different node relations affect the performance of the model, and due to the different characteristics of the datasets, the results differ across them.

Compared with the ablated models, DGCF achieves the best results on all datasets, which proves the effectiveness of each module of our model. The following observations are further derived from the results:
• On the LastFM dataset, we find that DGCF-2 shows evidently lower performance than the others. Since the second-order update is capable of finding the users who like the same songs, it is plausible that second-order aggregation is advantageous on the LastFM dataset. For example, when a user does not have many repetitive actions to indicate his/her interests, our model is still capable of modeling the user based on the similarity with other users. As mentioned for Q1, DGCF is able to deal with low-action-repetition datasets; without the second-order update, our model cannot achieve similar results on LastFM. This supports the correctness of the assumption.
• Compared to DGCF, DGCF-0 and DGCF-1 tend to be stable. This shows that when the model lacks sequential information or current interaction information, second-order aggregation can still yield decent performance by taking advantage of collaborative information.
F. Aggregator Function (Q3)
In this subsection, we test the effectiveness of the mean, LSTM, and graph attention aggregator functions. The experimental results are shown in Table IV.

TABLE IV
AGGREGATOR FUNCTION. THE INFLUENCE OF DIFFERENT AGGREGATOR FUNCTIONS ON THE MODEL PERFORMANCE.

Aggregator   LastFM          Reddit          Wikipedia
             MRR     R@10    MRR     R@10    MRR     R@10
Mean         0.296   0.419   0.721   0.844   0.770   0.836
LSTM         0.291   0.425   0.721   0.841   0.755   0.815
Attention    0.321   0.456   0.726   0.856   0.786   0.852

In Table IV, graph attention outperforms the other two aggregator functions on all datasets. This may be because the attention mechanism can learn the strength of the relations between the central node and its neighbors. In other words, graph attention is able to explicitly select effective collaborative information among the many second-order neighbor nodes when updating user or item embeddings. Compared with graph attention, the drawback of the mean and LSTM aggregators is that all neighbors of the central node are treated as equally important during aggregation, so the most influential nodes might be ignored. Different from the other two datasets, the low-action-repetition dataset LastFM depends more on second-order aggregation, so the attention mechanism brings a larger improvement on it.
G. Hyper-Parameter Study (Q4)
An appropriate aggregator size not only guarantees model performance but also speeds up training and inference. In this section, we test the performance of DGCF with different aggregator sizes. We select the graph attention aggregator and choose the aggregator size from {20, 40, 60, 80, 100, 120}. Figure 4 shows the results. According to the results, a smaller aggregator size can lead to higher performance. The degradation of performance as the aggregator size increases may be due to the redundancy of second-order collaborative information. Therefore, we can reduce the aggregator size to improve training and inference speed.

Fig. 4. Aggregator size, in terms of (a) MRR and (b) Recall@10 on Reddit, Wikipedia, and LastFM. As the aggregator size changes, the performance fluctuation of the model on the datasets is small, indicating that the model is generally robust to the aggregator size in second-order aggregation.

IV. DISCUSSION
In this section, we compare DGCF with two representative dynamic models, RNN and Jodie, to highlight the main differences and the innovative parts of our model.
A. Difference between RNN and DGCF
Recurrent Neural Network (RNN) based models, such as GRU4Rec [4] and NARM [5], are widely used in sequential recommendation problems. These models take the user's historical item sequences as input to model the user's interest. However, RNN is only a special case of our DGCF in which the first- and second-order relations are removed. As a result, RNN fails to consider the variation of items as well as the interactions between users and items.
B. Difference between Jodie and DGCF
Joint Dynamic User-Item Embeddings (Jodie) [8] is a representative co-evolution based method. It uses two RNNs, a user-RNN and an item-RNN, together with a projection operation, to predict the embeddings of users and items. Our DGCF model shares a similar motivation with Jodie, but compared to DGCF, Jodie is also a special case with no second-order aggregation. This makes it difficult for Jodie to explicitly model the collaborative relations between users and items. When users have few repetitive actions, RNN-based models cannot achieve satisfactory results, and collaborative relations play an important role in this case. This difference explains our significant improvement over Jodie on the LastFM dataset, which has the lowest action repetition rate.

V. RELATED WORKS
In this section, we review related works on dynamic recommendation models, graph-based recommendation, and dynamic graph representation learning. We also compare our model with previous methods at the end of each subsection.
A. Dynamic recommendation
Distinct from static recommendation models like Matrix Factorization (MF) and Bayesian Personalized Ranking (BPR), the main task of dynamic recommendation models is to capture the variations of users and items. Recently, Recurrent Neural Networks (RNNs) and their variants (LSTM and GRU) have been widely used in dynamic recommendation problems. Hidasi et al. [4] apply RNNs to session-based recommendation, a sub-problem of dynamic recommendation, to model the fluctuation of users' interests. Time-LSTM [25] combines LSTM with time gates to model the time differences between interactions. Although RNN models have made great progress in sequential and dynamic recommendation, they have shortcomings in modeling long-range dependencies. To address this problem, NARM, STAMP, and SASRec [5], [27], [28] utilize attention mechanisms to capture users' main purposes while also improving training speed. Besides, CNN models [29] have also been introduced to dynamic recommendation. However, all these methods only utilize the item trajectory of a user to model variations in the user's interest, while ignoring the evolution of items.

To deal with this problem, some methods jointly learn representations of users and items by using point process models [26] or RNN models [7]. Jodie [8] predicts user and item embeddings with an RNN and a projection operation. All of the methods above are RNN-based recommendation models from different perspectives: they mainly handle item sequences or user sequences in temporal order. However, the collaborative signal, i.e., the indirect connections between user-user or item-item pairs, is not used in these works. Therefore, in DGCF, we not only consider both user and item sequences, but also exploit high-order neighbors in the user-item graph to enrich the training data.
B. Graph-based recommendation
Users, items, and their interactions can be seen as the two types of nodes and the edges of a bipartite graph. The advantages of modeling user-item interactions as a graph are: (1) graph-based algorithms such as random walks and Graph Neural Networks (GNNs) can be applied to predict links between users and items; (2) high-order connectivity can be explored to enrich the training data. Because of their ability to reach high-order neighbors, random walks have been tried for making recommendations on the interaction graph. HOP-Rec [30] performs random walks on the graph to consider high-order neighbors. RecWalk [31] is also a random walk-based method that leverages the spectral properties of nearly uncoupled Markov chains for top-N recommendation. However, random walk-based methods lack robustness.

Nowadays, GNNs show remarkable success in many applications [32]–[38], and their effectiveness has also been proved on recommendation problems. GCMC [39] applies Graph Convolutional Networks (GCN) [16] to complete the user-item matrix. PinSage [40] introduces GraphSAGE [15] into recommender systems on an item-item graph. Spectral CF [13] leverages spectral convolution over the user-item bipartite graph to discover possible connections in the spectral domain. SR-GNN [14] and A-PGNN [41] use GNNs for session-based recommendation. BasConv [42] applies graph convolution to user-basket-item graph embedding. NGCF [12] propagates user and item embeddings hierarchically to model high-order connectivity. Although these methods achieve significant performance on static recommender systems, all of them fail to make use of the influence of time, and the dynamics of users and items are not well considered. Therefore, we place the graph-based recommendation model under a dynamic graph framework to combine graph structure with time series.
C. Dynamic graph representation learning
Representation learning over graph-structured data has received wide attention [15], [43]–[45]. It aims to encode high-dimensional graph information into low-dimensional vectors. However, real-world graph data such as social networks and citation networks are always evolving. To deal with this, graph encoding methods should also consider the dynamics of the data; we call this kind of method dynamic graph representation learning.

Some preliminary methods take the evolving graph as a sequence of discrete-time snapshots. DANE [46] proposes an embedding method for dynamic attributed networks; it models the variations of the adjacency matrix and the attribute matrix based on matrix perturbation. DynGEM [47] trains a deep autoencoder-based model across snapshots of the graph to learn graph embeddings that are stable over time. TIMERS [48] proposes an incremental method based on Singular Value Decomposition (SVD) for dynamic representation learning. DyRep [49] defines association and communication events on dynamic graphs and learns representations based on graph attention networks. Our DGCF model is motivated by these works, especially DyRep, to learn dynamic representations of users and items under recommendation scenarios. However, the recommendation scenario is different from social networks because we have two types of nodes and different influences for dynamic events. Under these circumstances, we develop our DGCF model to capture dynamics at both the user and item levels.
ONCLUSION
In this paper, we associate the dynamic graph with the dy-namic recommendation scenarios and propose a novel frame-work based on dynamic graph for dynamic recommendation:Dynamic Graph Collaborative Filtering, abbreviated as DGCF.In DGCF, we design three dynamic node update mechanismsfor learning node embedding and making recommendations.Experimental results show that our model outperforms allseven baselines.The proposed DGCF is an initial trial of combining dy-namic graph with recommender system. Apart from user-itembipartite graph, many other kinds of graph structure can beexplored with dynamic graph, e.g., knowledge graph, socialnetwork, and attributed graph.VII. A
CKNOWLEDGEMENT
This work is supported in part by National Key Researchand Development Program (2018YFB1402600), and NSF un-der grants III-1526499, III-1763325, III-1909323, and SaTC-1930941.
REFERENCES

[1] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, "Factorizing personalized markov chains for next-basket recommendation," in WWW. ACM, 2010, pp. 811–820.
[2] Y. Koren, "Collaborative filtering with temporal dynamics," in KDD. ACM, 2009, pp. 447–456.
[3] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in UAI. AUAI Press, 2009, pp. 452–461.
[4] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," in ICLR, 2016.
[5] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma, "Neural attentive session-based recommendation," in CIKM. ACM, 2017, pp. 1419–1428.
[6] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in ICDM. IEEE, 2016, pp. 1053–1058.
[7] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in WSDM. ACM, 2017, pp. 495–503.
[8] S. Kumar, X. Zhang, and J. Leskovec, "Predicting dynamic embedding trajectory in temporal interaction networks," in KDD. ACM, 2019, pp. 1269–1278.
[9] J. You, Y. Wang, A. Pal, P. Eksombatchai, C. Rosenburg, and J. Leskovec, "Hierarchical temporal convolutional networks for dynamic recommender systems," in WWW. ACM, 2019, pp. 2236–2246.
[10] Y. Wang, N. Du, R. Trivedi, and L. Song, "Coevolutionary latent feature processes for continuous-time user-item interactions," in NeurIPS, 2016, pp. 4547–4555.
[11] Q. Wu, Y. Gao, X. Gao, P. Weng, and G. Chen, "Dual sequential prediction models linking sequential recommendation and information dissemination," in KDD. ACM, 2019, pp. 447–457.
[12] X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua, "Neural graph collaborative filtering," in SIGIR, 2019, pp. 165–174.
[13] L. Zheng, C.-T. Lu, F. Jiang, J. Zhang, and P. S. Yu, "Spectral collaborative filtering," in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 311–319.
[14] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan, "Session-based recommendation with graph neural networks," in AAAI, vol. 33, 2019, pp. 346–353.
[15] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in NeurIPS, 2017, pp. 1024–1034.
[16] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," ICLR, 2017.
[17] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[18] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," ICLR, 2018.
[19] Q. Li, Z. Han, and X.-M. Wu, "Deeper insights into graph convolutional networks for semi-supervised learning," in AAAI, 2018.
[20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," ICLR, 2015.
[21] J. Baumgarten, "Reddit data dump," URL https://files.pushshift.io/reddit.
[22] J. W. Pennebaker, M. E. Francis, and R. J. Booth, "Linguistic inquiry and word count: LIWC 2001," Mahway: Lawrence Erlbaum Associates, vol. 71, 2001.
[23] G. Kossinets, "Processed wikipedia edit history," 2012.
[24] B. Hidasi and D. Tikk, "Fast ALS-based tensor factorization for context-aware recommendation from implicit feedback," in ECML-PKDD. Springer, 2012, pp. 67–82.
[25] Y. Zhu, H. Li, Y. Liao, B. Wang, Z. Guan, H. Liu, and D. Cai, "What to do next: Modeling user behaviors by Time-LSTM," in IJCAI, 2017, pp. 3602–3608.
[26] H. Dai, Y. Wang, R. Trivedi, and L. Song, "Deep coevolutionary network: Embedding user and item features for recommendation," arXiv preprint arXiv:1609.03675, 2016.
[27] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang, "STAMP: Short-term attention/memory priority model for session-based recommendation," in KDD. ACM, 2018, pp. 1831–1839.
[28] W.-C. Kang and J. McAuley, "Self-attentive sequential recommendation," in ICDM. IEEE, 2018, pp. 197–206.
[29] J. Tang and K. Wang, "Personalized top-n sequential recommendation via convolutional sequence embedding," in WSDM. ACM, 2018, pp. 565–573.
[30] J.-H. Yang, C.-M. Chen, C.-J. Wang, and M.-F. Tsai, "HOP-Rec: High-order proximity for implicit recommendation," in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 140–144.
[31] A. N. Nikolakopoulos and G. Karypis, "RecWalk: Nearly uncoupled random walks for top-n recommendation," in WSDM. ACM, 2019, pp. 150–158.
[32] H. Peng, J. Li, Q. Gong, Y. Song, Y. Ning, K. Lai, and P. S. Yu, "Fine-grained event categorization with heterogeneous graph convolutional networks," in IJCAI. AAAI Press, 2019, pp. 3238–3245.
[33] Y. Dou, Z. Liu, L. Sun, Y. Deng, H. Peng, and P. S. Yu, "Enhancing graph neural network-based fraud detectors against camouflaged fraudsters," arXiv preprint arXiv:2008.08692, 2020.
[34] Z. Li, Z. Cui, S. Wu, X. Zhang, and L. Wang, "Fi-GNN: Modeling feature interactions via graph neural networks for CTR prediction," in CIKM. ACM, 2019, pp. 539–548.
[35] Y. Zhang, X. Yu, Z. Cui, S. Wu, Z. Wen, and L. Wang, "Every document owns its structure: Inductive text classification via graph neural networks," arXiv preprint arXiv:2004.13826, 2020.
[36] H. Peng, H. Wang, B. Du, M. Z. A. Bhuiyan, H. Ma, J. Liu, L. Wang, Z. Yang, L. Du, S. Wang et al., "Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting," Information Sciences, vol. 521, pp. 277–290, 2020.
[37] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Deep graph contrastive representation learning," in ICML Workshop on Graph Representation Learning and Beyond, 2020. [Online]. Available: http://arxiv.org/abs/2006.04131
[38] F. Yu, Y. Zhu, Q. Liu, S. Wu, L. Wang, and T. Tan, "TAGNN: Target attentive graph neural networks for session-based recommendation," arXiv preprint arXiv:2005.02844, 2020.
[39] R. v. d. Berg, T. N. Kipf, and M. Welling, "Graph convolutional matrix completion," KDD Deep Learning Day, 2018.
[40] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in KDD. ACM, 2018, pp. 974–983.
[41] S. Wu, M. Zhang, X. Jiang, X. Ke, and L. Wang, "Personalizing graph neural networks with attention mechanism for session-based recommendation," arXiv preprint arXiv:1910.08887, 2019.
[42] Z. Liu, M. Wan, S. Guo, K. Achan, and P. S. Yu, "BasConv: Aggregating heterogeneous interactions for basket recommendation with graph convolutional neural network," in Proceedings of the 2020 SIAM International Conference on Data Mining. SIAM, 2020, pp. 64–72.
[43] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in KDD. ACM, 2014, pp. 701–710.
[44] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in KDD, 2016, pp. 855–864.
[45] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "LINE: Large-scale information network embedding," in WWW, 2015, pp. 1067–1077.
[46] J. Li, H. Dani, X. Hu, J. Tang, Y. Chang, and H. Liu, "Attributed network embedding for learning in a dynamic environment," in CIKM. ACM, 2017, pp. 387–396.
[47] P. Goyal, N. Kamra, X. He, and Y. Liu, "DynGEM: Deep embedding method for dynamic graphs," arXiv preprint arXiv:1805.11273, 2018.
[48] Z. Zhang, P. Cui, J. Pei, X. Wang, and W. Zhu, "TIMERS: Error-bounded SVD restart on dynamic networks," in AAAI, 2018.
[49] R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, "DyRep: Learning representations over dynamic graphs," in ICLR, 2019.