Beyond Clicks: Modeling Multi-Relational Item Graph for Session-Based Target Behavior Prediction
Wen Wang, Wei Zhang, Shukai Liu, Qi Liu, Bo Zhang, Leyu Lin, Hongyuan Zha
Wen Wang, Wei Zhang ∗ (School of Computer Science and Technology, East China Normal University)
Shukai Liu, Qi Liu, Bo Zhang, Leyu Lin (Tencent)
Hongyuan Zha (Georgia Institute of Technology)
ABSTRACT
Session-based target behavior prediction aims to predict the next item to be interacted with under specific behavior types (e.g., clicking). Although existing methods for session-based behavior prediction leverage powerful representation learning approaches to encode items' sequential relevance in a low-dimensional space, they suffer from several limitations. Firstly, they focus on only utilizing the same type of user behavior for prediction, but ignore the potential of taking other behavior data as auxiliary information. This is particularly crucial when the target behavior is sparse but important (e.g., buying or sharing an item). Secondly, item-to-item relations are modeled separately and locally in one behavior sequence, and they lack a principled way to globally encode these relations more effectively. To overcome these limitations, we propose a novel Multi-relational Graph Neural Network model for Session-based target behavior Prediction, MGNN-SPred for short. Specifically, we build a Multi-Relational Item Graph (MRIG) based on all behavior sequences from all sessions, involving target and auxiliary behavior types. Based on MRIG, MGNN-SPred learns global item-to-item relations and further obtains user preferences w.r.t. current target and auxiliary behavior sequences, respectively. In the end, MGNN-SPred leverages a gating mechanism to adaptively fuse user representations for predicting the next item interacted with under the target behavior. Extensive experiments on two real-world datasets demonstrate the superiority of MGNN-SPred by comparing with state-of-the-art session-based prediction methods, validating the benefits of leveraging auxiliary behavior and learning item-to-item relations over MRIG.
CCS CONCEPTS
• Information systems → Personalization; • Computing methodologies → Neural networks.

∗ Wei Zhang is the corresponding author. This work is supported by NSFC (61702190), Shanghai Sailing Program (17YF1404500), and NSFC-Zhejiang (U1609220).

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW '20, April 20–24, 2020, Taipei, Taiwan
© 2020 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-7023-3/20/04.
https://doi.org/10.1145/3366423.3380077
KEYWORDS
Sequential recommendation, graph neural networks, user behavior modeling
ACM Reference Format:
Wen Wang, Wei Zhang, Shukai Liu, Qi Liu, Bo Zhang, Leyu Lin, and Hongyuan Zha. 2020. Beyond Clicks: Modeling Multi-Relational Item Graph for Session-Based Target Behavior Prediction. In Proceedings of The Web Conference 2020 (WWW '20), April 20–24, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3366423.3380077
INTRODUCTION

Unlike conventional recommendation algorithms, which model each user-item interaction separately [11], recent sequential recommendation approaches meet more realistic requirements through their ability to model users' dynamic interests. Session-based target behavior prediction [8] is one of the main problems studied in this regard, aiming to predict the next item a user will interact with under a specific type of behavior (e.g., clicking an item). Based on the predictions, information providers can effectively deliver items to appropriate users and, at the same time, users can quickly find the items they actually want. Note that we use session-based prediction and session-based recommendation interchangeably throughout this paper.

Early studies of this problem assume that the appearance of the next item depends only on its previous item [23, 33] in the same sequence. With such a strong assumption, they could only model the last item in each sequence and ignore other information from the sequence. To relax this assumption, various methods adopt sequential models to learn behavior sequences for session-based recommendation. Recurrent Neural Networks (RNNs) [9] are commonly leveraged to obtain promising performance. The relevant methods can roughly be attributed to two categories: single-session based recommendation models [8, 30] and multi-session based recommendation models [21, 31]. As the latter category requires the user ID of each behavior sequence to be known in advance, in order to link multiple sequences of the same user together, it is less universal than the first category due to privacy issues and the user scalability problem (e.g., a billion active users each day in WeChat). As such, we study session-based target behavior prediction from the perspective of single-session based modeling.
In the domain of single-session based behavior prediction, some studies [14, 22, 25] adopt attention mechanisms [1, 28] and outperform the pioneering RNN-based methods [8]. Recent advances in graph neural networks (GNNs) [3, 7] further boost the performance of session-based behavior prediction by modeling each session-based behavior sequence as a graph, achieving state-of-the-art performance [29, 30]. However, existing studies in this regard still suffer from several limitations.
Firstly, they focus on only using the same type of user behavior as input for next item prediction, but ignore the potential of leveraging other types of behavior as auxiliary information. This is particularly crucial when the target behavior is sparse but important (e.g., buying or sharing an item).
Secondly, item-to-item relations are modeled separately and locally, since both RNN-based and GNN-based recommendation models only utilize one behavior sequence at a time. It is intuitive that abundant item-to-item relations are hidden in various behavior sequences. For example, if many other users have bought item B after buying item A, the relation between items A and B is especially vital when a target user has just bought item A.

To overcome these limitations, we propose a novel Multi-relational Graph Neural Network model for Session-based target behavior Prediction, MGNN-SPred for short. The target behavior we focus on is the aforementioned sparse behavior, beyond the dense click behavior. MGNN-SPred jointly considers target and auxiliary behavior sequences and explores global item-to-item relations for accurate prediction. Specifically, to capture global item-to-item relations, we build a Multi-Relational Item Graph (MRIG) based on the past behavior sequences of all sessions. Multiple relations might exist between two graph nodes, denoting target and auxiliary behavior types. Based on MRIG, MGNN-SPred encodes global item-to-item relations into node representations and further obtains local representations for the current target and auxiliary behavior sequences, respectively. In the end, MGNN-SPred leverages a gating mechanism to adaptively fuse the representations of the target and auxiliary behavior sequences to produce the current user interest representation.

The main contributions of this work are summarized as follows:

1. We address the two limitations of existing methods by breaking the restriction of only using one type of behavior sequence in session-based recommendation and exploring another type of behavior as auxiliary information. We further construct the multi-relational item graph for learning global item-to-item relations.

2. To effectively model MRIG w.r.t. target and auxiliary behavior sequences, we develop the novel graph model MGNN-SPred, which learns global item-to-item relations through a graph neural network and integrates the representations of the current target and auxiliary sequences with a gating mechanism.

3. We carry out extensive experiments and demonstrate that MGNN-SPred achieves the best performance among strong competitors, showing the benefits of overcoming the two limitations.
RELATED WORK

Session-Based Behavior Prediction.
In the literature, the pioneering study [8] in the direction of single-session based recommendation first adopts a recurrent neural network based approach, with past interacted items as the input of different time steps, for session-based recommendation. Following that, [26] improves the model with data augmentation and consideration of temporal user behavior shift. In addition to using an RNN, [13] also adopts an attention mechanism to capture a user's sequential behavior and main purpose in the current session. Similarly, [14] proposes a novel attention mechanism to capture both users' general long-term interests and their short-term attention. More recently, with the flourishing of Graph Neural Network (GNN) methodologies, [29] first separates each session sequence into different graphs and uses graph neural networks to capture complex item transitions in a specific graph. Afterwards, each session is represented as the combination of the global preference and current interests of this session using an attention network. [30] is similar to [29], but uses a multi-layered self-attention network as an alternative to capture long-range dependencies between items within a session. As discussed in the introduction, these existing methods suffer from two limitations, which motivate the proposal of our model in this paper.
Multi-Behavior Modeling.
Multi-behavior modeling for recommender systems aims to leverage other types of user behavior to boost recommendation performance on the target behavior. A few studies have investigated this scenario from different perspectives. [12] considers leveraging users' social interactions as auxiliary behavior for target behavior prediction via collective matrix factorization (CMF) techniques. In a similar fashion, [34] builds multiple matrices from different user behaviors, covering user resharing behavior, user commenting behavior, user posting behavior, etc.; CMF is adopted to learn shared user representations for recommendation as well. [15] proposes multi-feedback Bayesian personalized ranking (BPR), an extension of the classical Bayesian personalized ranking approach tailored for different user behaviors. It differentiates the preference levels of different user behaviors in the sampling stage for ranking. [4] also considers assigning different preference levels to various user behaviors; instead of BPR, it incorporates this information into an element-wise alternating least squares learner. More recently, a neural network approach is proposed by [6] to learn representations for user-item interactions with different behaviors, and multi-task learning is conducted to predict multiple behaviors with respect to a certain item in a cascading way. Our work fundamentally differs from the above studies: all of them assume the independence of different user-item interactions, while our study is more realistic by modeling user behaviors in a sequential setting.
Graph Neural Networks.
Graph neural networks are methods used to generate representations of graph-structured data, such as social networks and knowledge graphs. [20] extends Word2vec [17] by proposing DeepWalk, a model that learns node representations based on sequences sampled from graphs. LINE [27] encodes the first-order and second-order proximity of nodes into a low-dimensional space. Recently, a surge of methods related to graph convolutional networks (GCNs) has emerged. [2] presents a method with a graph-based analogue of convolutional architectures, which is the original version of GCN. Later, a number of improvements, extensions, and approximations of these spectral convolutions were proposed [5, 7, 10, 18]. These approaches outperform methods based on random walks (e.g., DeepWalk and node2vec). With this success in mind, GCN-based methods have been widely applied in various domains, such as recommender systems [18]. However, most GCN-based methods require that all nodes in the graph be present in each propagation step of the GNN. Different from GCN, GraphSAGE [7] can train a GNN in a minibatch setting. Inspired by this, we design our GNN to learn from the constructed multi-relational item graph for session-based behavior prediction.

Figure 1: The architecture of our model. We use a toy MRIG and two current behavior sequences as input. The number of recommended items is set to 2.
PROBLEM FORMULATION

For a session s in the session set S, let P_s = [p^s_1, p^s_2, ..., p^s_{|P_s|}] denote the target behavior sequence and Q_s = [q^s_1, q^s_2, ..., q^s_{|Q_s|}] the auxiliary behavior sequence. Moreover, we construct a Multi-Relational Item Graph G = (V, E) based on all behavior sequences from all sessions, where V is the set of nodes containing all available items and E is the edge set, involving multiple types of directed edges. Each edge is a triple consisting of the head item, the tail item, and the type of the edge. For instance, if we construct the graph based on sharing and clicking behaviors, then an edge (a, b, share) ∈ E means that a user shared item a and subsequently shared item b, and an edge (a, b, click) ∈ E means that a user clicked item b after clicking item a. Given the above notations, we formulate the problem as follows:

Problem 1 (session-based target behavior prediction). Given a session s ∈ S with its target and auxiliary behavior sequences P_s and Q_s, along with the MRIG G, the goal is to learn a model that can generate the K items most likely to be interacted with by the user of the session next.

THE PROPOSED MODEL

The overall architecture of the proposed MGNN-SPred is depicted in Figure 1. The input to MGNN-SPred contains a Multi-Relational Item Graph (MRIG) and the two types of behavior sequences. MGNN-SPred first learns item correlations from MRIG via graph neural networks and encodes them into item representations. Afterwards, a user's two behavior sequences are regarded as two sub-graphs of the MRIG, where the items in each sub-graph are connected
with a virtual node ("T" or "A" in Figure 1), respectively. Subsequently, MGNN-SPred aggregates the nodes of each sub-graph into the corresponding virtual node, thus obtaining the representation of each behavior sequence. Finally, to fuse the two behavior representations into a user preference representation, a gating mechanism is adopted to adaptively decide the importance of the different behaviors and perform a weighted summation over them. For recommendation, MGNN-SPred calculates each item's score from the user and item representations via a bi-linear product and uses the scores to rank the items.

Abundant relationships between items lie in users' historical behaviors. If a user buys item a and subsequently buys item b in the same session, it indicates that items a and b probably have some dependency, but it does not reflect much similarity, since a user is unlikely to buy two very similar items within a short duration. In comparison, if a user clicks item a and subsequently clicks item b, it indicates that items a and b probably have large similarity. This is intuitive because a user usually browses a number of similar items and picks the most suitable one to buy.

We construct the multi-relational item graph by taking all items as nodes, with each type of behavior corresponding to one type of directed edge, denoting different relationships between items. The process of constructing MRIG is shown in Algorithm 1. Both the target and auxiliary behavior sequences from all sessions, P_s and Q_s (∀s ∈ S), are provided as input.

Algorithm 1: Multi-relational item graph construction
Input: session set S; target and auxiliary behavior sequences P_s and Q_s, ∀s ∈ S
Output: MRIG G = (V, E)
    V ← ∅, E ← ∅
    for s ∈ S do
        V ← V ∪ {P_s[1]}
        for i = 2, ..., |P_s| do
            V ← V ∪ {P_s[i]}, E ← E ∪ {(P_s[i−1], P_s[i], target)}
        end for
        V ← V ∪ {Q_s[1]}
        for i = 2, ..., |Q_s| do
            V ← V ∪ {Q_s[i]}, E ← E ∪ {(Q_s[i−1], Q_s[i], auxiliary)}
        end for
    end for

The algorithm traverses all behavior sequences, collects all items in the sequences as the nodes of the graph, and constructs edges between consecutive items in the same sequence, with their behavior types as the edge types. After constructing the graph with target and auxiliary behaviors, there are two types of directed edges in the graph.

For each node v ∈ V, we use ē_v ∈ R^{|V|} to denote its one-hot representation. Before feeding the one-hot representations of nodes into the GNN, we first convert each of them into a low-dimensional dense vector e_v ∈ R^d via a learnable embedding matrix E ∈ R^{|V|×d}: e_v = E^⊤ ē_v.
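As a minimal Python sketch of Algorithm 1 (the session contents and variable names below are illustrative, not from the paper's datasets):

```python
def build_mrig(sessions):
    """Build a Multi-Relational Item Graph from all sessions.

    `sessions` maps a session id to a dict with the target behavior
    sequence "P" and the auxiliary behavior sequence "Q". Returns the
    node set V and the edge set E, where each edge is a
    (head, tail, relation) triple as in Algorithm 1.
    """
    V, E = set(), set()
    for s in sessions.values():
        for seq, relation in ((s["P"], "target"), (s["Q"], "auxiliary")):
            if not seq:
                continue  # Algorithm 1 implicitly assumes non-empty sequences
            V.add(seq[0])
            for i in range(1, len(seq)):
                V.add(seq[i])
                E.add((seq[i - 1], seq[i], relation))
    return V, E

# Toy session: a user shared item 1 then item 2, and clicked 1, 3, 2.
sessions = {"s1": {"P": [1, 2], "Q": [1, 3, 2]}}
V, E = build_mrig(sessions)
```

Representing E as a set of triples keeps parallel edges of different relations between the same item pair, which is exactly what makes the graph multi-relational.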
After collecting the vectors e_v (∀v ∈ V), we feed them, together with the MRIG G, into the GNN to generate global node representations g_v. These representations are expected to encode the multiple item-to-item relations. We take node v as an example for illustration. First of all, we collect the neighbors of node v. Each node in the graph has four types of neighboring node sets. According to edge type and direction, we name the four sets "target-forward", "target-backward", "auxiliary-forward", and "auxiliary-backward". Taking the type "target" as an example, we obtain the neighbor groups corresponding to the forward and backward directions as:

N_{t+}(v) = {v' | (v', v, target) ∈ E},  N_{t−}(v) = {v' | (v, v', target) ∈ E}.  (1)

For the type "auxiliary", the neighbor groups N_{a+}(v) and N_{a−}(v) are acquired in the same way.

At each step of representation propagation in the GNN, we first aggregate each group of neighbors by mean-pooling to obtain the representation of that group:

h^k_{t+,v} = ( Σ_{v' ∈ N_{t+}(v)} h^{k−1}_{v'} ) / |N_{t+}(v)|.  (2)

The representations of the three remaining groups are calculated in a similar fashion. Consequently, for the purpose of jointly considering the different relations between items, we combine the four representations of the different neighbor groups by sum-pooling:

h̄^k_v = h^k_{t+,v} + h^k_{t−,v} + h^k_{a+,v} + h^k_{a−,v}.  (3)

Finally, we update the representation of the center node v by:

h^k_v = h^{k−1}_v + h̄^k_v.  (4)

After performing K iterations, we take the node representation of the last step as the representation of the corresponding item: g_v = h^K_v. In practice, we implement the GNN in a minibatch setting inspired by [7] to ensure scalability.
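A dense numpy sketch of Eqs. (2)-(4), under stated assumptions: the full-graph loop replaces the paper's minibatch implementation, and empty neighbor groups are simply skipped (the mean in Eq. (2) is undefined for them):

```python
import numpy as np

def propagate(h, edges, K=2):
    """MGNN-SPred-style propagation: for each node, mean-pool each of
    the four neighbor groups (target/auxiliary x forward/backward),
    sum the group representations (Eq. 3), and add the result to the
    node's previous representation (Eq. 4). `h` maps node -> vector;
    `edges` holds (head, tail, relation) triples."""
    for _ in range(K):
        new_h = {}
        for v in h:
            groups = [
                [u for (u, w, r) in edges if w == v and r == "target"],     # target-forward
                [u for (w, u, r) in edges if w == v and r == "target"],     # target-backward
                [u for (u, w, r) in edges if w == v and r == "auxiliary"],  # auxiliary-forward
                [u for (w, u, r) in edges if w == v and r == "auxiliary"],  # auxiliary-backward
            ]
            agg = sum(
                (np.mean([h[u] for u in g], axis=0) for g in groups if g),
                np.zeros_like(h[v]),
            )
            new_h[v] = h[v] + agg  # residual update, Eq. (4)
        h = new_h
    return h

# Toy check: a single target edge 1 -> 2, with 2-dimensional embeddings.
h0 = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}
h1 = propagate(h0, {(1, 2, "target")}, K=1)
```

After one step, each node's vector is its own embedding plus the mean of its only non-empty neighbor group.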
We tried different ways to compute the representation of the virtual node for the target and auxiliary behavior sequences, including using an attention mechanism to assign different importance weights to the nodes and performing sub-graph propagation several times. Empirically, we found that simple mean-pooling already achieves comparable performance while retaining low complexity. We denote the summarized representations of the target behavior sequence P and the auxiliary behavior sequence Q as p and q, respectively:

p = ( Σ_{i=1}^{|P|} g_{p_i} ) / |P|,  q = ( Σ_{i=1}^{|Q|} g_{q_i} ) / |Q|.  (5)

We argue that the two types of behavior sequence representations might contribute differently when building an integrated representation. This is because the auxiliary behavior is not exactly the same as the target behavior to be predicted, and different users might concentrate differently on different behaviors. For instance, some users might browse item pages frequently and click various items arbitrarily, while other users might only click the items they want to buy. It is self-evident that the contribution of the auxiliary behavior sequence to next item prediction differs between these situations. We define the following gating mechanism to calculate the relative importance weight α:

α = σ(W_g [p; q]),  (6)

where [p; q] denotes the concatenation of the two representations, σ is the sigmoid function, and W_g ∈ R^{1×2d} is a trainable parameter of our model. Finally, we obtain the user preference representation o for the current session by the weighted summation of p and q:

o = α · p + (1 − α) · q.  (7)

We further calculate the recommendation score s_v of each item v ∈ V using the item embedding e_v.
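The pooling-and-gating step of Eqs. (5)-(7) can be sketched as follows (the zero-valued W_g and toy vectors are illustrative, chosen so the gate is exactly 0.5):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(g_P, g_Q, W_g):
    """Mean-pool the node vectors of the target (P) and auxiliary (Q)
    sub-graphs (Eq. 5), then gate them into one user preference vector
    (Eqs. 6-7). W_g has shape (1, 2d)."""
    p = np.mean(g_P, axis=0)                           # Eq. (5), target
    q = np.mean(g_Q, axis=0)                           # Eq. (5), auxiliary
    alpha = sigmoid(W_g @ np.concatenate([p, q]))[0]   # Eq. (6), scalar gate
    return alpha * p + (1.0 - alpha) * q               # Eq. (7)

d = 2
g_P = np.array([[1.0, 0.0], [1.0, 0.0]])  # two target-sequence items
g_Q = np.array([[0.0, 1.0]])              # one auxiliary-sequence item
W_g = np.zeros((1, 2 * d))                # sigmoid(0) = 0.5, equal weighting
o = fuse(g_P, g_Q, W_g)
```

With a trained W_g, the gate moves away from 0.5 per session, which is exactly the adaptive weighting the paper argues for.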
A bi-linear matching scheme is employed:

s_v = o^⊤ W e_v,  (8)

where W ∈ R^{d×d} is a trainable parameter matrix of our model. To learn the parameters of the model, we apply a softmax function to normalize the scores s ∈ R^{|V|} over all items into the probability distribution ŷ:

ŷ = softmax(s).  (9)

Backpropagation is adopted to optimize the model by minimizing the cross-entropy loss of the predicted probability distribution ŷ w.r.t. the ground truth:

L_RS = − Σ_{i=1}^{|V|} ( y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ),  (10)

where y = (y_1, ..., y_{|V|}) denotes the one-hot representation of the ground truth. Note that L_RS is easily extended to a minibatch loss.

Table 1: Basic statistics of the datasets.

Data | WeChat | Yoochoose
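A toy numpy sketch of the scoring and training objective, Eqs. (8)-(10) (all dimensions and values below are illustrative):

```python
import numpy as np

def scores_and_probs(o, E_items, W):
    """Eqs. (8)-(9): bi-linear score s_v = o^T W e_v for every item,
    then a softmax over all items. Shapes: o is (d,), E_items is
    (|V|, d), W is (d, d)."""
    s = E_items @ (W @ o)       # one score per item
    exp = np.exp(s - s.max())   # numerically stable softmax
    return s, exp / exp.sum()

def cross_entropy(y_hat, y):
    """Eq. (10): cross-entropy summed over all items; for a one-hot y
    this is dominated by the -log y_hat term of the true item."""
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

d = 2
o = np.array([1.0, 0.0])                                 # user preference vector
E_items = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # three item embeddings
W = np.eye(d)                                            # identity bi-linear matrix
s, y_hat = scores_and_probs(o, E_items, W)
y = np.array([1.0, 0.0, 0.0])                            # ground truth: item 0
loss = cross_entropy(y_hat, y)
```

With W fixed to the identity, the bi-linear form degenerates to a dot product; training learns a general W that can re-weight and mix embedding dimensions.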
EXPERIMENTS

We evaluate our model on two real-world datasets, WeChat and Yoochoose. The Yoochoose dataset is obtained from the RecSys Challenge 2015; its user behavior sequences are already segmented into sessions and all users are anonymized. The WeChat dataset is collected from Top Stories (看一看) of WeChat, where videos are regarded as items. We randomly selected one hundred thousand active users and collected their behavior records over one week. Since the duration is relatively short, we retain the entire behavior sequence of each user, taking the sequence as a single session. In this paper, we treat the purchase behavior in Yoochoose and the sharing behavior in WeChat as the target behavior, and regard the click behavior in both datasets as the auxiliary behavior.

Given a session with target behavior sequence P = [p_1, p_2, ..., p_{|P|}] and auxiliary behavior sequence Q = [q_1, q_2, ..., q_{|Q|}], we construct training examples in a similar way to [13, 29]. That is, we treat each item p_i (i ≥ 2) as the label and use [p_1, p_2, ..., p_{i−1}] as the input of the target behavior. The treatment of the auxiliary behavior is a little different, because a user is very likely to click an item before buying or sharing it. To avoid the auxiliary input already seeing the labels, we only keep the clicked items occurring before the target item that is bought or shared by the user. We set a maximum length L for both types of sequences; for sequences longer than the maximum length, only the last L items are kept. Considering that the two datasets have different average sequence lengths (see Table 1), we set L to 10 for WeChat and 3 for Yoochoose. We discuss the impact of different maximum lengths in Section 4.4.3.

We split the datasets in chronological order for evaluation, consistent with real situations. We take the first 6/7 of each dataset as training data and use 1/3 of the remaining data as validation data to determine optimal hyper-parameter settings. The MRIG used throughout the experiments is constructed only from the training data. The basic statistics of the two datasets are summarized in Table 1.

We compare the proposed model with several strong competitors, including state-of-the-art graph neural network based models for session-based recommendation:

• POP. Recommends the top-N most frequent items in the training set, regardless of the behaviors in current sessions.
• Item-KNN [24]. Recommends the items most similar to the previously interacted items belonging to the same sessions.
• GRU4Rec [8]. The pioneering RNN-based deep sequential model for session-based recommendation.
• NARM [13]. Employs an attention mechanism to capture the different importance of each item according to their hidden states obtained by an RNN; a weighted integration of the item representations yields the final representation.
• STAMP [14]. Learns users' general interest from the long-term memory of the session context and current interest from the short-term memory of their last behaviors.
• SR-GNN [29] and GC-SAN [30]. Both graph-based models use only the current session to construct a graph on which a GNN learns item representations. The difference is that SR-GNN represents each session with a traditional attention network, while GC-SAN is based on a multi-layered self-attention mechanism.
• R-DAN. Reasoning-DAN [19] is used to model both behavior sequences simultaneously.
• CoAtt. Co-Attention [16] with alternating computation of interactive attention is adopted for comparison.
• HetGNN. A heterogeneous graph neural network [32] is applied for recommendation, with two edge types and one node type.

It is worth noting that the baselines originally developed for session-based recommendation, i.e., GRU4Rec, NARM, STAMP, SR-GNN, and GC-SAN, only consider the target behavior. To make the comparison fairer, we revise these methods as follows: we use their original forms to model the target behavior sequence and the auxiliary behavior sequence respectively, and afterwards utilize the proposed gating mechanism to fuse the two types of representations, as in our model. In addition, we also compare our model with the baselines when only the target behavior is considered (see Table 3 for details).

Table 2: Evaluation results of all methods.

Methods   | WeChat H@100 | M@100  | N@100  | Yoochoose H@100 | M@100  | N@100
POP       | 13.565       | 1.1247 | 3.2621 | 6.095           | 0.2529 | 1.2231
Item-KNN  | 15.770       | 1.1624 | 3.7222 | 15.286          | 1.9415 | 4.4040
GRU4Rec   | 18.831       | 1.3956 | 4.3966 | 19.114          | 2.5292 | 5.5830
NARM      | 19.131       | 1.4034 | 4.4416 | 18.775          | 2.5819 | 5.5813
STAMP     | 17.757       | 1.3083 | 4.1078 | 20.361          | 2.3487 | 5.6879
SR-GNN    | 18.940       | 1.3827 | 4.3967 | 21.262          | 2.6892 | 6.1232
GC-SAN    | 19.034       | 1.2090 | 4.2490 | 19.718          | 2.5218 | 5.6861
HetGNN    | 20.290       | 1.4171 | 4.6504 | 24.031          | 2.9546 | 6.8732
R-DAN     | 18.952       | 1.3879 | 4.3980 | 15.956          | 2.3107 | 4.8608
CoAtt     | 17.700       | 1.1931 | 4.0137 | 20.080          | 2.5742 | 5.8206
Ours      | 21.271       | 1.4797 | 4.8529 | 28.632          | 3.6564 | 8.2722

Table 3: Results of not using auxiliary behavior sequences.

Methods         | WeChat H@100 | M@100  | N@100  | Yoochoose H@100 | M@100  | N@100
GRU4Rec (w/o a) | 16.889       | 1.2346 | 3.9128 | 14.817          | 1.6032 | 4.0012
NARM (w/o a)    | 17.773       | 1.3123 | 4.1298 | 14.443          | 1.5540 | 3.8900
SR-GNN (w/o a)  | 18.093       | 1.2621 | 4.1368 | 15.302          | 1.5782 | 4.0852
Ours (w/o a)    | 19.252       | 1.3933 | 4.4473 | 21.089          | 2.3798 | 5.8221
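The training-example construction described above can be sketched as follows. The encoding of each behavior as an (item, timestamp) pair is an assumption, since cutting off the auxiliary clicks requires the relative order of the two sequences, and excluding the label item itself from the auxiliary input is one plausible reading of "avoid the auxiliary input already seeing the labels":

```python
def make_examples(P, Q, L):
    """Every target item p_i (i >= 2) becomes a label, with
    [p_1, ..., p_{i-1}] as the target-behavior input; the auxiliary
    input keeps only clicks that happened before the label (and drops
    the label item itself -- an assumed reading), and both inputs are
    truncated to their last L items. P and Q are (item, timestamp)
    lists, an assumed encoding."""
    examples = []
    for i in range(1, len(P)):
        label_item, label_t = P[i]
        target_in = [item for item, _ in P[:i]][-L:]
        aux_in = [item for item, t in Q
                  if t < label_t and item != label_item][-L:]
        examples.append((target_in, aux_in, label_item))
    return examples

# Toy session: the user clicked items 10, 20, 30 and bought 10 then 30.
P = [(10, 1), (30, 5)]
Q = [(10, 0), (20, 2), (30, 4)]
examples = make_examples(P, Q, L=3)
```

Here the single example predicts item 30 from target input [10] and auxiliary input [10, 20]; the click on 30 itself is excluded so the label is not leaked.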
We implement the proposed model in TensorFlow. The dimension of item embeddings is set to 64. Adam with its default parameter settings is adopted to optimize the model, with a mini-batch size of 64. The GNN runs in a minibatch setting and its depth K is set to 2. We terminate the learning process with an early stopping strategy. For the attention-based baselines, we test different forms of attention computation and report their best results. The hyper-parameters of the baselines are tuned on the validation datasets as well.

We consider the top-100 ranked predictions as recommended items. Following [29, 30], we adopt HR@100 (H@100), MRR@100 (M@100), and NDCG@100 (N@100) to evaluate the recommendation performance of all models after obtaining their recommendation lists.

Table 2 shows the performance comparison between our model and the adopted baselines. (1) The first part of the table corresponds to the simple baselines; their results are significantly worse than those of the other methods. (2) The second part involves standard sequential methods for session-based recommendation. Their results stay at the same level, except for STAMP on WeChat. This shows that: 1) taking session-based recommendation as a sequential modeling task can improve performance; 2) although NARM and STAMP are more advanced approaches that use attention mechanisms to combine hidden representations of different time steps, they do not show advantages on the sparse behavior prediction problem we study (unlike previous studies focusing on click prediction). (3) The third part contains the GNN-based models. SR-GNN and GC-SAN appear better than the sequential methods, and HetGNN further boosts performance. (4) The second-to-last part involves approaches for learning two sequences from other research domains. Their best results are worse than the best performance of the above recommendation methods, which suggests that modeling the interaction of items across the two sequences might not benefit the studied problem. Finally, our method outperforms all the other methods, demonstrating the superiority of our model for session-based recommendation.

Table 4: Ablation study of MGNN-SPred.

Methods        | WeChat H@100 | M@100  | N@100  | Yoochoose H@100 | M@100  | N@100
Ours (w/o ae)  | 20.923       | 1.4665 | 4.7945 | 25.463          | 2.7678 | 6.8907
Ours (w/o asg) | 19.742       | 1.3949 | 4.5167 | 22.517          | 2.6025 | 6.2631
Ours (w/o g)   | 20.363       | 1.3707 | 4.6154 | 27.577          | 3.3531 | 7.7896
Ours           | 21.271       | 1.4797 | 4.8529 | 28.632          | 3.6564 | 8.2722
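For reference, the three ranking metrics can be computed per prediction as below; with a single relevant item, NDCG@K reduces to 1/log2(rank + 1). The scores are toy values:

```python
import numpy as np

def rank_metrics(scores, target_idx, K=100):
    """HR@K, MRR@K, and NDCG@K for one prediction: `scores` holds one
    score per item, `target_idx` is the ground-truth item index."""
    ranking = np.argsort(-scores)[:K]        # top-K item indices, best first
    hits = np.where(ranking == target_idx)[0]
    if hits.size == 0:
        return 0.0, 0.0, 0.0                 # target missed: all metrics are 0
    rank = hits[0] + 1                       # 1-based rank of the target item
    return 1.0, 1.0 / rank, 1.0 / np.log2(rank + 1)

scores = np.array([0.1, 0.9, 0.4])
hr, mrr, ndcg = rank_metrics(scores, target_idx=2, K=2)
```

Dataset-level numbers such as those in Table 2 are averages of these per-prediction values over the test set.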
We choose several representative methods in Table 3 to test whether considering the auxiliary behavior sequence indeed boosts the performance of session-based recommendation. Methods marked "(w/o a)" have the auxiliary behavior sequence removed from their full versions. Firstly, we observe that our proposed model still consistently achieves better performance in this situation. Moreover, comparing each method in Table 2 with its "(w/o a)" version, every method beats its "(w/o a)" counterpart by a significant margin. Based on the above observations, we conclude that considering the auxiliary behavior sequence is indeed meaningful.
We conduct ablation studies of our model, using "w/o ae" to denote removing the edges related to the auxiliary behavior, "w/o asg" to denote not modeling the sub-graph of the auxiliary behavior sequence when obtaining the user preference representation, and "w/o g" to denote merging the representations of the target and auxiliary behavior sequences by simple summation instead of the gating mechanism. Table 4 shows the corresponding results. From "w/o ae", we observe that incorporating the auxiliary edges into the built graph is beneficial for the problem. From "w/o asg", the integration of the auxiliary behavior sequence with the target behavior sequence makes a notable contribution. Besides, from "w/o g", we find that performance becomes worse if we do not use the gating mechanism to merge the two representations. Through the above comparisons, we conclude that the main components of our model are effective.
We test different depth settings (from 0 to 3) for graph representation propagation. A depth of 0 means our model does not use the GNN and cannot learn any information from MRIG. Figure 2 shows the corresponding results. We can see that the performance at depth 0 is, without doubt, much worse than the results at depths 1 to 3. This comparison clarifies the significance of considering MRIG in our model. Moreover, the performance becomes significantly better when the depth grows from 1 to 2, showing that modeling high-order relations between items through the GNN is indispensable. When the number of graph representation propagation steps is larger than 3, the representations of nodes might become less distinguishable, which is not ideal for further improving the performance.

[Figure 2: Results of our model with different depths of GNN (Hit Rate@K vs. GNN depth 0–3; panels (a) and (b), the latter on Yoochoose).]

[Figure 3: Results for different maximum lengths (Hit Rate@K vs. maximum sequence length 1–20, Ours vs. SR-GNN; panels (a) and (b), the latter on Yoochoose).]
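The effect of propagation depth can be illustrated with a minimal sketch, assuming a mean-over-neighborhood aggregation on an adjacency-list item graph; the function name and aggregation rule are illustrative assumptions, not the paper's exact GNN layer:

```python
import numpy as np

def propagate(node_feats, adj, depth):
    """Run `depth` rounds of neighbor aggregation on an item graph.

    Each round replaces every node's vector with the mean of its own
    vector and its neighbors' vectors, so depth k mixes information
    from up to k-hop neighborhoods; depth 0 returns the initial
    features untouched (no GNN, as in the depth-0 baseline).
    """
    h = node_feats.copy()
    for _ in range(depth):
        new_h = np.empty_like(h)
        for v, neighbors in adj.items():
            stacked = np.vstack([h[v]] + [h[u] for u in neighbors])
            new_h[v] = stacked.mean(axis=0)
        h = new_h
    return h

# Toy item graph 0 - 1 - 2 (a path) with 2-dimensional embeddings.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

h0 = propagate(feats, adj, depth=0)  # unchanged features
h2 = propagate(feats, adj, depth=2)  # node 0 now reflects node 2 via node 1
```

With depth 1, node 0 sees only its direct neighbor; only at depth 2 does information from the 2-hop node 2 reach node 0, which mirrors why performance improves from depth 1 to 2, while ever larger depths average all nodes toward similar, less distinguishable vectors.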
We visualize the performance variation with the change of the maximum behavior sequence length 𝐿 in Figure 3, where 𝐿 ranges from 1 to 20. As expected, the performance of both our model and SR-GNN grows as the maximum sequence length increases from small values. After reaching a peak, the results decline slightly and finally stabilize. Overall, our model outperforms SR-GNN consistently. Besides, we find that the best-performing lengths differ between the two datasets. This is due to the fact that the average sequence length of Yoochoose is much smaller than that of WeChat, as shown in Table 1.

CONCLUSION

In this paper, we study session-based target behavior prediction. Two limitations of existing relevant models are addressed: using only the target behavior for next-item prediction and lacking a principled way to encode global item-to-item relations. To alleviate these issues, we propose MGNN-SPred, whose major novelty lies in building and modeling the multi-relational item graph. In addition, a gating mechanism is adopted to adaptively fuse the target and auxiliary behavior sequences into user preference representations for next-item prediction. Comprehensive experiments on two real-world datasets demonstrate that MGNN-SPred achieves the best performance and that its design is rational.

WWW '20, April 20–24, 2020, Taipei, Taiwan