Improving Accuracy and Diversity in Matching of Recommendation with Diversified Preference Network
Ruobing Xie, Qi Liu, Shukai Liu, Ziwei Zhang, Peng Cui, Bo Zhang, Leyu Lin
Ruobing Xie∗, WeChat, Tencent, Beijing, [email protected]
Qi Liu∗, WeChat, Tencent, Beijing, [email protected]
Shukai Liu, WeChat, Tencent, Beijing, [email protected]
Ziwei Zhang, Tsinghua University, Beijing, [email protected]
Peng Cui, Tsinghua University, Beijing, [email protected]
Bo Zhang, WeChat, Tencent, Beijing, [email protected]
Leyu Lin, WeChat, Tencent, Beijing, [email protected]
ABSTRACT
Recently, real-world recommendation systems need to deal with millions of candidates. It is extremely challenging to conduct sophisticated end-to-end algorithms on the entire corpus due to the tremendous computation costs. Therefore, conventional recommendation systems usually contain two modules. The matching module focuses on the coverage, which aims to efficiently retrieve hundreds of items from large corpora, while the ranking module generates specific ranks for these items. Recommendation diversity is an essential factor that impacts user experience. Most efforts have explored recommendation diversity in ranking, while the matching module should take more responsibility for diversity. In this paper, we propose a novel Heterogeneous graph neural network framework for diversified recommendation (GraphDR) in matching to improve both recommendation accuracy and diversity. Specifically, GraphDR builds a huge heterogeneous preference network to record different types of user preferences, and conducts a field-level heterogeneous graph attention network for node aggregation. We also innovatively conduct a neighbor-similarity based loss to balance both recommendation accuracy and diversity for the diversified matching task. In experiments, we conduct extensive online and offline evaluations on a real-world recommendation system with various accuracy and diversity metrics and achieve significant improvements. We also conduct model analyses and a case study for a better understanding of our model. Moreover, GraphDR has been deployed on a well-known recommendation system, which affects millions of users. The source code will be released.

∗Both authors contributed equally to this research.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
arXiv '21, Feb 07, 2021, Online
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM.
https://doi.org/10.1145/xxxxxxx.xxxxxxx
CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Neural networks.

KEYWORDS
recommender system, diversified recommendation, graph neural network, heterogeneous network
ACM Reference Format:
Ruobing Xie, Qi Liu, Shukai Liu, Ziwei Zhang, Peng Cui, Bo Zhang, and Leyu Lin. 2021. Improving Accuracy and Diversity in Matching of Recommendation with Diversified Preference Network. In arXiv '21, Feb 07, 2021, Online. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/xxxxxxx.xxxxxxx
1 INTRODUCTION

Recently, real-world personalized recommendation systems usually need to deal with hundreds of millions of items [39]. Therefore, it is challenging to conduct complicated end-to-end recommendation algorithms on the entire corpus, for even a linear time complexity w.r.t. the corpus size is unacceptable [54]. To balance both effectiveness and efficiency in real-world scenarios, conventional recommendation systems usually consist of two modules, namely the matching module and the ranking module [6, 47]. The matching module, also regarded as the candidate generation in the Youtube model [6], aims to retrieve a small subset of (usually hundreds of) items from the entire corpus efficiently. In contrast, the ranking module conducts sophisticated models on these retrieved items to get specific item ranks. Fig. 1 shows the classical two-step architecture. The matching module concentrates more on the diversity, efficiency and item coverage, while the ranking module focuses more on the accuracy of specific item ranks. This two-step architecture balances efficiency and effectiveness in practice.

Figure 1: An example of a real-world recommendation system. GraphDR focuses on the matching module, which aims to retrieve user-interested and diverse items efficiently.

Conventional recommendation models usually regard recommendation accuracy metrics such as click-through rate (CTR) as their central objectives, in which popular items clicked by users are more preferred. However, such objectives will lead to homogenization issues that reduce personalization and harm user experiences. To solve this issue, recommendation diversity is considered to evaluate the overall recommendation performance from another aspect [2].
It is measured in two classical ways: the individual diversity and the aggregate diversity [21]. The individual diversity focuses on the local diversity in each recommended item list, which aims to balance user-item similarities and item-item dissimilarities [3]. In contrast, the aggregate diversity focuses on the global diversity in the overall recommendation, which is usually measured by the coverage of items that could be recommended by models in the entire corpus [16]. The significance of diversity has been widely verified to provide highly idiosyncratic items in recommendation [49], which should be considered in real-world scenarios.

Lots of ranking models have explored recommendation diversity with the help of dissimilarity factors [2], external taxonomy information [57], clustering [1] and graph technologies [29]. However, most diversified recommendation models are specially designed for ranking and are far too time-consuming to be used in matching with millions of items [29], while very few works systematically focus on the diversity in matching. In fact, matching should take more responsibility for diversity, since it cares more about the coverage of user-interested items rather than their specific item ranks. The recommendation diversity needs to be first guaranteed in the matching module. Otherwise, the homogenization of the item candidates generated by the matching module will inevitably lead to a lack of diversity in the final recommendation.

In this paper, we aim to improve both recommendation accuracy and diversity in the matching module, which is essential in real-world recommendation systems. We propose a novel Heterogeneous graph neural network framework for diversified recommendation (GraphDR). Precisely, GraphDR mainly consists of three modules:

(1) Diversified preference network construction, which aims to build a huge global heterogeneous network containing various interactions between different types of nodes including videos, tags, medias, users and words. These interactions between essential recommendation factors reflect user diverse preferences from a global view, which are the sources of diversity.

(2) Heterogeneous network representation learning (NRL), which learns node representations with a novel field-level heterogeneous graph attention network (FH-GAT). FH-GAT helps to better maintain and aggregate different types of interactions. We also innovatively conduct a neighbor-similarity based objective to encode user diverse preferences into heterogeneous node representations. Different from CTR-oriented objectives that simply focus on click behaviors, the neighbor-similarity based objective highlights diversity by considering multiple factors of videos such as user watching habit, audience community, video content, video taxonomy, and content provider.

(3) Online multi-channel matching, which generates a small subset of user-interested and diverse item candidates efficiently through multiple channels. The multi-channel strategy is conducted to further amplify the diversity in the final results.

The diversity derives from all three modules in GraphDR. In experiments, we conduct both offline and online evaluations on a real-world video recommendation system, which is widely used by hundreds of millions of users. We conduct extensive experiments to measure the recommendation accuracy and diversity with dozens of metrics. We also explore GraphDR with model analyses, ablation tests and case studies for better understanding. The main contributions are concluded as follows:

• We highlight and systematically explore the recommendation diversity issue in the matching module, which is essential in practical large-scale recommendation systems.
• We propose a novel GraphDR framework to jointly improve both recommendation accuracy and diversity in real-world matching. To the best of our knowledge, we are the first to introduce GNN on heterogeneous preference networks for diversified recommendation in matching.
• We propose a novel field-level heterogeneous GAT model to aggregate neighbors with different feature fields. We also innovatively conduct the neighbor-similarity based loss to polish recommendation diversity.
• The offline and online evaluations indicate that GraphDR can improve both accuracy and diversity in practice. GraphDR is simple and effective, and has been deployed on a real-world recommendation system used by millions of users. It is also convenient to adapt GraphDR to other scenarios.
2 RELATED WORK

In related works, we first give a brief introduction to the classical recommendation algorithms, and then introduce the efforts in recommendation diversity. We also include a discussion on the graph neural networks used in recommendation.
Collaborative filtering (CF) is a classical method which recommends items based on similar items or users [34]. Matrix factorization (MF) attempts to decompose the user-item interaction matrix to get user and item representations [20]. FM [33] expands this to model second-order feature interactions with latent vectors. With the thriving of deep learning, neural models like Deep Crossing [35], FNN [51], PNN [32], Wide&Deep [4], DCN [40] and DFN [46] are proposed to improve recommendation performance. DeepFM [10], AFM [45] and NFM [12] improve the original FM with DNN or attention. AutoInt [36] and BERT4Rec [37] also bring in self-attention. Recently, AFN [5] and AutoFIS [23] are proposed to smartly model high-order feature interactions via logarithmic transformation or automatic feature selection. However, most of these ranking models rely on fine-grained user-item interactions for prediction, and are thus hard to use directly in a real-world matching module, for they are extremely time-consuming when dealing with millions of candidates.

In contrast, there are much fewer works specially designed for matching. Conventional systems usually use IR-based methods [17] or collaborative filtering (CF) based methods [34] for fast retrieval. For neural models, embedding-based retrieval such as DSSM [15] is also widely deployed. Recently, Youtube [6] brings deep models to learn user preferences in matching. Moreover, TDM [54], JTM [53] and OTM [56] arrange items with tree structures to accelerate top-n item retrieval, which combine matching and ranking in a single model. ICAN [47] is specially designed for cold-start multi-channel matching. Huang et al. [14] also propose an industrial embedding-based retrieval framework in Facebook search. However, these matching models mainly focus on CTR-oriented objectives. It is still challenging for these models to balance accuracy and diversity in real-world scenarios. In this work, we aim to improve both recommendation accuracy and diversity in the matching module via the proposed GraphDR with the diversified preference network.
Merely using CTR-oriented objectives will make hot items hotter, which inevitably brings in serious homogenization issues that may degrade user experiences [50]. The significance of diversity has been verified by lots of efforts, since it could provide highly idiosyncratic items with less homogeneity for users in personalized recommendation [2, 49]. Recommendation diversity is mainly measured as individual diversity and aggregate diversity [21]. The individual diversity focuses on the local diversity in a recommended list. [2] and [57] focus on intra-list item dissimilarities. [50] proposes a novel item novelty, which measures the additional information from new items. Some works measure diversity with the varieties of taxonomy in item lists [57]. In contrast, the aggregate diversity measures the global diversity of the overall system. [16] measures this diversity with the coverage of recommended items. A higher item coverage indicates that the model could recommend more long-tail items, which implies a more diversified system from the global aspect.

There are some works that model diversity in ranking. [2] brings dissimilarity factors into the loss functions to measure the individual diversity. External taxonomy information (e.g., tag, category and subtopic) [42, 57] and knowledge graphs [9] are useful factors for diversity. Other technologies such as entropy regularizers [31], clustering [1], graph-based models [28, 29, 55], and greedy MAP inference [3] have also been explored for diversified recommendation. Recently, diversified recommendation is armed with reinforcement learning [25] and adversarial learning [43]. Recommendation bandits [22, 27] are also well explored. However, most diversified models are specially designed for ranking and are hard to use directly in matching. To the best of our knowledge, we are the first to use GNN on global heterogeneous interactions to improve both accuracy and diversity in the matching module.
Recently, GNN has been widely explored and verified in various fields. GCN [19] introduces convolution to graphs based on spectral graph theory. GraphSAGE [11] conducts inductive representation learning on large graphs. Graph attention network (GAT) [38] brings in the graph attention mechanism. HetGNN [48] and HAN [41] extend GNN to heterogeneous networks. In recommendation, Wu et al. [44], Fan et al. [8] and He et al. [13] further use GNN for session-based and social-based recommendation. Heterogeneous graphs are also widely adopted to model different types of essential objects such as users, items, tags and providers in recommendation [24, 26]. Inspired by these models, we build a heterogeneous graph to model various types of feature interactions, and also use a heterogeneous GNN model for node aggregation.
3 METHODOLOGY

In this paper, we propose GraphDR to improve both accuracy and diversity in matching by considering user diverse preferences. In this section, we first show the overall framework of GraphDR (Sec. 3.1). Second, we introduce the construction of nodes and edges in the diversified preference network, which is the source of diversity in our model (Sec. 3.2). Next, we introduce the diversity-enhanced network representation learning model FH-GAT used to generate node representations for all types of nodes (Sec. 3.3). Finally, we give a detailed discussion on the proposed diversity-enhanced training objective (Sec. 3.4). We further introduce the online deployment of the multi-channel matching module (Sec. 4).
3.1 Overall Framework

The GraphDR framework mainly contains three modules as in Fig. 2, including diversified preference network construction, network representation learning, and online multi-channel matching. In offline NRL, GraphDR first collects various informative interactions between heterogeneous nodes to build a huge global diversified preference network. Next, we propose a field-level heterogeneous GAT model to learn node embeddings with the neighbor-similarity based objective. In online serving, the multi-channel matching retrieves hundreds of accurate and diverse item candidates efficiently with multiple channels. The offline NRL conducts time-consuming training to encode user diverse preferences into node embeddings, while the online serving efficiently uses these learned embeddings for fast and diversified multi-channel retrieval.
3.2 Diversified Preference Network Construction

The diversified preference network is the foundation of diversity. We attempt to bring in heterogeneous interactions between essential objects in recommendation to describe user diverse preferences. Precisely, we focus on five different types of nodes including video, tag, media, user and word, which are essential factors that may impact users in recommendation. Each video has a title (containing words) and several tags annotated by editors. The video provider is viewed as the media. To alleviate the data sparsity and reduce computation costs, we cluster users into user groups as communities according to their basic profiles (i.e., the gender-age-location attribute triplets in this work), and consider these user groups as user nodes. We group users via user basic profiles for higher coverage.
Figure 2: The offline NRL and online serving parts of GraphDR for matching in recommendation. The left offline NRL part is the proposed FH-GAT model, which builds the aggregated node embeddings with heterogeneous GAT on the diversified preference network. The right online multi-channel matching part aims to retrieve hundreds of videos from large corpora efficiently. The recommendation diversity comes from the diversified preference network, FH-GAT trained with the diversity-enhanced training objective, and online multi-channel matching.
We assume that the interactions between these five types of objects can reflect user diverse preferences. In GraphDR, we consider six types of edges to record these multi-aspect preferences:

• Video-video edge. We generate the video-video edge between two video nodes if they have appeared adjacently in a user's video session. To reduce noises, we only use the valid watching behaviors, where videos have been watched for more than 70% of their total time lengths. Video-video edges record the sequential user watching habits in sessions.
• Video-user edge. Video-user edges are built if a video is validly watched by a user group at least 3 times in a week. This edge stores coarse-grained user-item interactions and also implies the audience community of videos.
• Video-tag edge. A video-tag edge connects videos with their corresponding tags, which reflects the coarse-grained semantic preferences of taxonomy in videos.
• Video-word edge. A video-word edge links videos with the words in their titles, which reflects the fine-grained semantic preferences of detailed word-level contents in videos.
• Video-media edge. Video-media edges are drawn between videos and their medias, which shows the video providers.
• Tag-tag edge. We build tag-tag edges according to tag co-occurrence in a video, which highlights taxonomy relevance.

All edges are undirected and unweighted. These heterogeneous edges bring in additional information about videos besides user-item click behaviors. They can reflect user diverse preferences in user watching habit, audience community, video content, taxonomy, and content provider. For instance, two related videos may be linked via the same user groups, video providers, tags or watching sessions, or even connected by a multi-step path containing heterogeneous nodes. The multi-hop paths via heterogeneous nodes and edges build up the potential reasons for recommendation, which are implicit, low-correlational but diversified. It is also not difficult to extend GraphDR with other types of nodes and edges.
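The six edge types above can be assembled into a single heterogeneous graph. The following is a minimal sketch of that construction, assuming hypothetical toy records rather than the paper's real log pipeline; typed nodes are `(type, id)` pairs and the undirected, unweighted edges are stored as adjacency sets:

```python
from collections import defaultdict

# Undirected, unweighted heterogeneous graph: node -> set of neighbor nodes.
# Node types follow the paper: video / tag / media / user (group) / word.
graph = defaultdict(set)

def add_edge(a, b):
    """Add an undirected edge between two typed nodes."""
    graph[a].add(b)
    graph[b].add(a)

# Hypothetical toy records (the real system derives these from watch logs).
sessions = [["v1", "v2", "v3"]]                      # valid-watch sessions
video_tags = {"v1": ["t_tech"], "v2": ["t_tech", "t_phone"], "v3": ["t_phone"]}
video_media = {"v1": "m1", "v2": "m1", "v3": "m2"}
video_words = {"v1": ["apple", "event"], "v2": ["iphone"], "v3": ["charge"]}
group_watches = {("u_group1", "v1"): 5, ("u_group1", "v2"): 2}  # weekly valid watches

for s in sessions:                                   # video-video: adjacent in session
    for a, b in zip(s, s[1:]):
        add_edge(("video", a), ("video", b))

for (g, v), cnt in group_watches.items():            # video-user: >= 3 valid watches
    if cnt >= 3:
        add_edge(("user", g), ("video", v))

for v, tags in video_tags.items():                   # video-tag and tag-tag
    for t in tags:
        add_edge(("video", v), ("tag", t))
    for i, t1 in enumerate(tags):                    # tag co-occurrence in one video
        for t2 in tags[i + 1:]:
            add_edge(("tag", t1), ("tag", t2))

for v, m in video_media.items():                     # video-media
    add_edge(("video", v), ("media", m))

for v, words in video_words.items():                 # video-word
    for w in words:
        add_edge(("video", v), ("word", w))
```

With this structure, the multi-hop paths described above (e.g., video ↔ tag ↔ tag ↔ video) are simply walks over `graph`.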
3.3 Diversity-enhanced Network Representation Learning

Network representation learning aims to encode user diverse preferences into node representations. Inspired by [24, 41], we propose a new Field-level Heterogeneous Graph Attention Network (FH-GAT). Fig. 2 shows the 2-layer architecture.

We first project all heterogeneous nodes into the same feature space. For the $k$-th node, its overall neighbor set $N_k$ can be divided into five feature fields according to node types as $\{\bar{\boldsymbol{v}}_k, \bar{\boldsymbol{t}}_k, \bar{\boldsymbol{m}}_k, \bar{\boldsymbol{u}}_k, \bar{\boldsymbol{d}}_k\}$, where $\bar{\boldsymbol{v}}_k$, $\bar{\boldsymbol{t}}_k$, $\bar{\boldsymbol{m}}_k$, $\bar{\boldsymbol{u}}_k$ and $\bar{\boldsymbol{d}}_k$ indicate the one-hot representations of the video, tag, media, user and word neighbors respectively. The node feature embedding of the $k$-th node is built as:

$$\boldsymbol{f}_k = \mathrm{concat}(\boldsymbol{v}_k, \boldsymbol{t}_k, \boldsymbol{m}_k, \boldsymbol{u}_k, \boldsymbol{d}_k), \quad (1)$$

in which $\boldsymbol{v}_k$ indicates the video-field feature embedding. In this work, we empirically set $\boldsymbol{v}_k = \boldsymbol{P}_v \bar{\boldsymbol{v}}_k$, where $\boldsymbol{P}_v \in \mathbb{R}^{d_v \times n_v}$ represents the lookup projection matrix generating $\boldsymbol{v}_k$ from the video neighbors. $d_v$ is the dimension of $\boldsymbol{v}_k$ and $n_v$ is the number of video nodes. For efficiency, the projection matrix is pre-defined as the indicator of top-frequent video neighbors and fixed during training. $\mathrm{concat}(\cdot)$ is the concatenation operation. The tag, media, user and word field feature embeddings $\boldsymbol{t}_k$, $\boldsymbol{m}_k$, $\boldsymbol{u}_k$ and $\boldsymbol{d}_k$ are generated similarly to the video-field feature embedding $\boldsymbol{v}_k$.

The aggregation layer takes the neighbor feature embeddings $\{\boldsymbol{f}_1, \cdots, \boldsymbol{f}_l\}$ of the $k$-th node as inputs. We set a weighting vector group $\{\boldsymbol{w}_k^v, \boldsymbol{w}_k^t, \boldsymbol{w}_k^m, \boldsymbol{w}_k^u, \boldsymbol{w}_k^d\}$ for each field, where $\boldsymbol{w}_k^v$ represents the $k$-th weighting vector of the video field. The output embedding $\boldsymbol{y}_k^v$ of the video field is defined as:

$$\boldsymbol{y}_k^v = \sum_{i=1}^{l} \alpha_{ki}^v \boldsymbol{v}_i, \quad \alpha_{ki}^v = \frac{\exp(\boldsymbol{w}_k^{v\top} \boldsymbol{v}_i)}{\sum_{j=1}^{l} \exp(\boldsymbol{w}_k^{v\top} \boldsymbol{v}_j)}, \quad (2)$$

where $\alpha_{ki}^v$ is the attention weight of the $k$-th node to its $i$-th neighbor in the video field. The constructions of $\boldsymbol{y}_k^t$, $\boldsymbol{y}_k^m$, $\boldsymbol{y}_k^u$ and $\boldsymbol{y}_k^d$ are the same as that of $\boldsymbol{y}_k^v$. We concatenate these embeddings to form the final neighbor-based representation $\boldsymbol{y}_k^N$:

$$\boldsymbol{y}_k^N = \mathrm{ReLU}(\boldsymbol{W}_n \cdot \mathrm{concat}(\boldsymbol{y}_k^v, \boldsymbol{y}_k^t, \boldsymbol{y}_k^m, \boldsymbol{y}_k^u, \boldsymbol{y}_k^d)). \quad (3)$$

We further consider a self-loop projection as a supplement to highlight the central $k$-th node's information:

$$\boldsymbol{y}_k^S = \mathrm{ReLU}(\boldsymbol{W}_s \cdot \boldsymbol{f}_k). \quad (4)$$

Next, we combine the neighbor-based and self-loop based representations to get the first-layer output $\boldsymbol{y}_k$, and use the second FH-GAT layer to get the final aggregated representation $\boldsymbol{h}_k$:

$$\boldsymbol{h}_k = \mathrm{FH\text{-}GAT}(\boldsymbol{y}_k), \quad \boldsymbol{y}_k = \lambda_s \cdot \boldsymbol{y}_k^S + (1 - \lambda_s) \cdot \boldsymbol{y}_k^N, \quad (5)$$

where $\lambda_s$ is an empirically set balance weight.

3.4 Diversity-enhanced Training Objective

Conventional ranking models usually rely on supervised training with CTR-oriented objectives, which also brings in homogenization. In this work, instead of merely focusing on CTR, GraphDR aims to learn user diverse preferences from multi-aspect factors and improve both accuracy and diversity. Therefore, we conduct the neighbor-similarity based loss [24] instead of conventional CTR-oriented objectives to highlight diversity. Specifically, we assume that all nodes should be similar to their neighbors on the diversified preference network regardless of their types. The neighbor-similarity based loss can be viewed as a specialized DeepWalk [30] with the path length set to 2 (overly long paths may bring in more noise and computation costs), which is formalized as follows:

$$J = \sum_{\boldsymbol{h}_k} \sum_{\boldsymbol{h}_i \in N_k} \sum_{\boldsymbol{h}_j \notin N_k} \left( \log(\sigma(\boldsymbol{h}_k^\top \boldsymbol{h}_j)) - \log(\sigma(\boldsymbol{h}_k^\top \boldsymbol{h}_i)) \right). \quad (6)$$

$\boldsymbol{h}_k$ is the $k$-th aggregated node embedding given by FH-GAT, and $N_k$ is the neighbor set of the $k$-th node. $\sigma(\cdot)$ indicates the sigmoid function. We use Adam [18] with negative sampling for training.

The feasibility and necessity of the neighbor-similarity based loss are discussed as follows: (1) videos that a user may be interested in are very likely to be connected via (multi-step) paths in the diversified preference network. For example, the multi-step path video:Apple event ↔ tag:iPhone ↔ tag:fast charge ↔ video:new tech of charge connects two related videos users may watch sequentially. Through the neighbor-similarity based loss, related heterogeneous nodes linked by multi-hop paths in the diversified preference network will have similar representations. (2) GraphDR focuses on the matching module, which values efficiency.
Hence, the online multi-channel matching in Sec. 4 conducts an embedding-based retrieval to meet the requirement of efficiency, which ranks videos according to the similarities between different types of embeddings. The neighbor-similarity based loss perfectly matches the embedding-based retrieval for efficient, accurate and diverse matching.

Cooperating with the diversified preference network, the neighbor-similarity based loss can well balance both accuracy and diversity, since it calculates video similarities with multiple factors including user watching habit in session, audience community, video content, taxonomy, and content provider. Precisely, the click-based supervised information used in classical ranking models is collected via two global interactions in GraphDR: video-video edges (for sequential click information in sessions) and video-user edges (for community-aggregated user-item interactions). These two types of click-based interactions are still the dominating interactions (taking nearly 83% of all interactions in our dataset given in Table 1) to ensure the recommendation accuracy. In contrast, the other four interactions related to tags, medias and words mainly provide the generalization ability of node representations to ensure the recommendation diversity. Compared with classical CTR-oriented losses that merely focus on clicks, GraphDR jointly considers user diverse preferences from multiple heterogeneous interactions, and thus could achieve better accuracy and diversity in matching.

4 ONLINE DEPLOYMENT

We have deployed GraphDR on the matching module of a widely-used video recommendation system in WeChat Top Stories, which has nearly billion-level daily views generated by million-level users. We introduce the details of online serving below.
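Before detailing online serving, the offline NRL computation described above — field-level attention aggregation (Eqs. (2)-(5)) and the neighbor-similarity based loss (Eq. (6)) — can be sketched in simplified NumPy form. This is a single-layer illustration with tiny, arbitrary shapes, not the deployed two-layer implementation; all variable names and dimensions here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def field_attention(w_field, neighbors):
    """Eq. (2): attention-weighted sum of one field's neighbor embeddings."""
    alpha = softmax(neighbors @ w_field)           # alpha_{ki} over this field
    return alpha @ neighbors                       # y_k for this field

def fh_gat_layer(f_k, fields, W_n, W_s, lam_s, w_vecs):
    """One FH-GAT layer, simplified: per-field attention (Eq. 2),
    concatenation + projection (Eq. 3), self-loop (Eq. 4), combine (Eq. 5)."""
    y_fields = [field_attention(w, nb) for w, nb in zip(w_vecs, fields)]
    y_N = np.maximum(W_n @ np.concatenate(y_fields), 0.0)   # ReLU, Eq. (3)
    y_S = np.maximum(W_s @ f_k, 0.0)                        # self-loop, Eq. (4)
    return lam_s * y_S + (1.0 - lam_s) * y_N                # Eq. (5)

def neighbor_similarity_loss(h_k, h_pos, h_negs):
    """Eq. (6) for one node: pull a neighbor close, push sampled non-neighbors away."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = -np.log(sig(h_k @ h_pos))                         # -log sigma(h_k . h_i)
    neg = sum(np.log(sig(h_k @ h_j)) for h_j in h_negs)     # +log sigma(h_k . h_j)
    return pos + neg

# Illustrative shapes: 5 fields, 4 neighbors per field, field dim 8, output dim 8.
d, out = 8, 8
fields = [rng.normal(size=(4, d)) for _ in range(5)]   # video/tag/media/user/word
w_vecs = [rng.normal(size=d) for _ in range(5)]
f_k = rng.normal(size=5 * d)                           # Eq. (1) concatenated features
W_n = rng.normal(size=(out, 5 * d)) * 0.1
W_s = rng.normal(size=(out, 5 * d)) * 0.1
y_k = fh_gat_layer(f_k, fields, W_n, W_s, lam_s=0.5, w_vecs=w_vecs)
```

In the full model, `y_k` would be fed through a second FH-GAT layer to obtain $\boldsymbol{h}_k$, and the loss would be minimized over all nodes with negative sampling.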
4.1 Online Multi-channel Matching

Online multi-channel matching aims to retrieve hundreds of items from millions of candidates rapidly. GraphDR first builds the user representation with his/her valid watching behaviors $\{\hat{v}_1, \cdots, \hat{v}_m\}$ of videos. To improve the diversity, we conduct a multi-channel matching strategy as in Fig. 2, which jointly retrieves video candidates from multiple aspects of representative tags, medias and videos in user historical behaviors.

In the video channel, each video in the valid watching behavior sequence retrieves the top 100 videos by the cosine similarity between two aggregated video embeddings. The weighting score of the $i$-th video $v_i$ in the video channel is formulated as:

$$score_i^v = \sum_{j=1}^{m} x_v(i,j) \times complete_j \times time_j \times sim(v_i, \hat{v}_j). \quad (7)$$

$x_v(i,j)$ equals 1 only if the $i$-th video $v_i$ is in the top 100 nearest videos of the $j$-th video $\hat{v}_j$ in the valid watching sequence, and otherwise equals 0. $complete_j$ is the watching time length percentage of $\hat{v}_j$, which measures the user's satisfaction with $\hat{v}_j$. $sim(v_i, \hat{v}_j)$ represents the cosine similarity calculated with the aggregated node embeddings of $v_i$ and $\hat{v}_j$. We also use $time_j$ to highlight the short-term interests of users as follows:

$$time_j = \eta \cdot time_{j+1}, \quad time_m = 1, \quad (8)$$

in which $\eta = 0.95$ is a time decay factor.

In the tag and media channels, we first learn user preferences on tags and medias from user historical behaviors. For example, the $i$-th tag's preference score $p_i^t$ is defined as:

$$p_i^t = \sum_{j=1}^{m} z_t(i,j) \times complete_j \times time_j, \quad (9)$$

where $z_t(i,j)$ equals 1 when the $i$-th tag belongs to $\hat{v}_j$, and otherwise equals 0. To reduce noises, we only select the top 10 tags $\hat{t}_j$ ranked by $p_i^t$ to form the user preferred tag set $T_u$. Next, each tag in $T_u$ retrieves the top 100 videos by the cosine similarities between tag and video aggregated embeddings. The weighting score of the $i$-th video in the tag channel is calculated as:

$$score_i^t = \sum_{\hat{t}_j \in T_u} x_t(i,j) \times \frac{p_j^t}{\sum_{\hat{t}_k \in T_u} p_k^t} \times sim(v_i, \hat{t}_j). \quad (10)$$

$x_t(i,j)$ equals 1 if $v_i$ is in the top 100 nearest videos of $\hat{t}_j$, and otherwise equals 0. $sim(v_i, \hat{t}_j)$ indicates the cosine similarity between $v_i$ and $\hat{t}_j$. The weighting score of $v_i$ in the media channel, $score_i^m$, is calculated similarly to $score_i^t$ of the tag channel.

Finally, we combine all three channels in the joint ranking to get the final video weighting scores:

$$score_i = \lambda_v \cdot score_i^v + \lambda_t \cdot score_i^t + \lambda_m \cdot score_i^m. \quad (11)$$

We rank all videos by their final weighting scores and select the top 500 videos as the output of GraphDR. We do not use the user group embeddings learned by FH-GAT for online matching, since they are coarse-grained user community representations, and user historical behaviors are more informative for individuals. We also abandon the word channel considering the ambiguity in words.

The online recommendation system mainly contains two modules, ranking and matching. The ranking module adopts ensemble ranking models including DeepFM [10], AutoInt [36] and AFN [5] to model feature interactions between users, items and contexts. Reinforcement learning is also used for long-term and list-wise rewards. In contrast, the matching module aims to retrieve as many appropriate items as possible.
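The per-channel scoring and joint ranking (Eqs. (7)-(11)) can be sketched as follows. This is a toy illustration with two watched videos and made-up cosine similarities; the real system uses precomputed top-100 neighbor lists per behavior:

```python
import numpy as np

ETA = 0.95  # time decay factor from Eq. (8)

def time_decay(m):
    """Eq. (8): time_m = 1 for the most recent behavior, older ones decay by ETA."""
    return np.array([ETA ** (m - 1 - j) for j in range(m)])

def channel_scores(top_lists, weights, sims):
    """Generic channel score (Eqs. 7 and 10): for each candidate video, sum
    weight * similarity over the behaviors/tags whose top list retrieved it."""
    scores = {}
    for j, cand_list in enumerate(top_lists):     # j-th watched video (or tag)
        for v in cand_list:                       # its top retrieved candidates
            scores[v] = scores.get(v, 0.0) + weights[j] * sims[(v, j)]
    return scores

# Toy example: two valid watching behaviors with illustrative values.
complete = np.array([0.9, 0.8])                   # watch-completion ratios
t = time_decay(2)                                 # [0.95, 1.0]
w_video = complete * t                            # per-behavior weight in Eq. (7)
top_lists = [["vA", "vB"], ["vB", "vC"]]          # retrieved neighbors per behavior
sims = {("vA", 0): 0.9, ("vB", 0): 0.5, ("vB", 1): 0.7, ("vC", 1): 0.6}
score_v = channel_scores(top_lists, w_video, sims)

# Eq. (11): joint ranking over channels (tag/media channel scores omitted here).
lam_v, lam_t, lam_m = 1.0, 1.0, 1.0
final = {v: lam_v * s for v, s in score_v.items()}  # + lam_t*score_t + lam_m*score_m
ranked = sorted(final, key=final.get, reverse=True)
```

Here `vB` ranks first because it is retrieved by both behaviors, which is exactly how the summation in Eq. (7) rewards candidates supported by multiple parts of the user history.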
Therefore, the matching module consists of dozens of different types of matching strategies from various aspects. Our GraphDR and the compared matching baselines each work as one of the matching strategies in the matching module. All matching strategies compete with each other, aiming to generate items to be fed into the same shared ranking module. Sec. 5.6 gives the implementation details of an online matching evaluation.

Online matching especially values efficiency. In GraphDR, all embedding similarities like $sim(v_i, \hat{v}_j)$ are pre-calculated offline, which enables fast retrieval. Its online time complexity is less than $O(\log n)$ w.r.t. the corpus size $n$, which is much superior to most deep ranking models involving complicated user-item interactions.

5 EXPERIMENTS

In experiments, we conduct extensive offline and online evaluations with detailed analyses on a real-world recommendation system to verify that GraphDR can improve both accuracy and diversity. In this section, we attempt to answer the following five research questions:
(RQ1): How does the proposed GraphDR model perform against different types of competitive models on recommendation accuracy in matching (see Sec. 5.4)?
(RQ2): How does GraphDR perform against competitive baselines on recommendation diversity at the element level, list level and global level (see Sec. 5.5)?
(RQ3): How does GraphDR perform in the online system with various accuracy- and diversity-related online evaluation metrics (see Sec. 5.6)?
(RQ4): How do different essential parameters affect GraphDR on recommendation accuracy and diversity (see Sec. 5.7)?
(RQ5): Are the node representations learned by GraphDR successfully encoded with user diverse preferences (see Sec. 5.8)?
5.1 Dataset

Since there are few large-scale datasets for evaluating recommendation accuracy and diversity in matching, we build a novel dataset DivMat-2.1B extracted from WeChat Top Stories. We randomly select nearly 15 million users and collect their nearly 2.1 billion valid watching behavior instances for offline evaluation in matching.
Table 1: Statistics of the DivMat-2.1B dataset.

video  user  tag   media  word  instance
1.2M   15M   103K  74K    150K  2.1B
5.2 Baselines

We implement several classical models as baselines, and categorize these competitors into four groups.
IR-based Methods.
We implement three IR-based methods including Category-based, Tag-based and Media-based IR methods [17]. For the Tag-based method, we build a tag-video inverted index, where videos are ranked by their popularity. The online matching retrieves videos with user preferred tags. The other IR-based methods are similar to the Tag-based IR method.
CF-based Methods.
We implement Item-CF [34] to retrieve similar videos with video co-occurrence. Moreover, we implement BERT-CF, which uses semantic similarity to measure video similarity. Precisely, we calculate the semantic similarity of two videos with their title embeddings learned by BERT [7], and conduct CF to learn video embeddings for fast retrieval.

Homogeneous NRL Methods.

We implement several typical NRL models on the homogeneous video network built from video sessions, including DeepWalk [30] and GraphSAGE [11]. The learned video representations are then used for online embedding-based matching with the video channel.
Neural-based Methods.
The Youtube candidate generation model [6] is a classical deep model for matching. We further improve the original Youtube model with behavior-level attention [52] and neural FM [12] as Youtube+ATT+FM, which is a strong industrial baseline in practice. Moreover, we implement DSSM [15], which retrieves items according to user-item similarities, and AutoInt [36], which models feature interactions. These models are optimized under supervised learning with video behaviors. We deploy a nearest neighbor server for all embedding-based fast retrieval. Note that we do not compare with complicated diversified recommendation models specially designed for ranking, due to their tremendous computation costs in matching [16]. We do not report TDM/JTM either, for the static tree-based retrieval is challenging to handle the various aspects of diversity in videos.
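The embedding-based fast retrieval behind the nearest neighbor server can be approximated with a brute-force cosine search, sketched below with hypothetical names. A production nearest-neighbor server would replace this with an approximate index to reach the sub-linear query time discussed earlier:

```python
import numpy as np

def topk_similar(item_embs, query_emb, k=5):
    """Brute-force cosine retrieval over pre-computed item embeddings.

    item_embs: (num_items, dim) matrix of offline-learned embeddings.
    query_emb: (dim,) query embedding (e.g., a user or seed video).
    Returns the indices and scores of the k most similar items.
    """
    item_norm = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = item_norm @ q            # cosine similarity to every item
    top = np.argsort(-scores)[:k]     # exact top-k (O(n log n) here)
    return top, scores[top]
```

An approximate index trades a small amount of recall for the fast lookups that online matching requires.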
Ablation Test Settings.
We implement heterogeneous versions of GraphSAGE [11] and GAT [38] to replace FH-GAT in the NRL module for ablation tests, denoted as GraphDR(GraphSAGE) and GraphDR(GAT) respectively.
In GraphDR, the node feature embedding dimension is 900, where the video field's dimension d_v is 300 and the other fields' are 150. The dimensions of the two output embeddings in FH-GAT are 120. The numbers of sampled neighbors in the first and second layers are 30 and 20. In training, we randomly select 20 negative samples for each positive sample and set the batch size as 512. In online matching, we consider the top 200 recently watched videos and retrieve the top 500 candidates for ranking. The weighting scores λ_v, λ_t and λ_m are equally set to 1. We conduct grid search for parameter selection. For fair comparisons, all models follow the same settings in evaluation.

We first evaluate all GraphDR models and baselines on recommendation accuracy on the offline DivMat-2.1B dataset.
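The negative sampling in the training setup above (20 random negatives per positive) can be sketched as below; the function and variable names are illustrative only:

```python
import random

def sample_negatives(positive, corpus, num_neg=20, rng=random):
    """Uniformly draw num_neg negative items from the corpus,
    skipping the positive item itself (a simplified sketch; production
    samplers often correct for item popularity)."""
    negatives = []
    while len(negatives) < num_neg:
        candidate = rng.choice(corpus)
        if candidate != positive:
            negatives.append(candidate)
    return negatives
```

Each training step then contrasts one positive pair against these sampled negatives.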
We focus on matching, which aims to generate hundreds of item candidates. Differing from ranking, matching only cares whether good items are retrieved, not their specific ranks. Therefore, we use hit rate (HIT@N) [37] as the accuracy metric, where an instance is "hit" if the clicked item is ranked in the top N. We do not use classical ranking metrics such as MAP and NDCG since matching does not care about specific ranks. To simulate real-world scenarios, we report HIT@N with N set to 100, 200, 300 and 500. Since we retrieve the top 500 items in the online recommendation system, HIT@500 is considered the most essential accuracy metric.
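HIT@N as defined above can be computed as follows; this is a minimal sketch with hypothetical names, not our evaluation harness:

```python
def hit_at_n(retrieved_lists, clicked_items, n):
    """HIT@N: the fraction of instances whose clicked item appears
    among the top-N retrieved candidates.

    retrieved_lists: one ranked candidate list per instance.
    clicked_items: the ground-truth clicked item per instance.
    """
    hits = sum(
        1
        for retrieved, clicked in zip(retrieved_lists, clicked_items)
        if clicked in retrieved[:n]
    )
    return hits / len(clicked_items)
```

Note that only membership in the top N matters, never the position inside it, which is exactly why MAP/NDCG are unnecessary for matching.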
Table 2: Results of recommendation accuracy. We set N=500 in the matching module of our online system.

HIT@N            N=100   N=200   N=300   N=500
Category-based   0.0010  0.0018  0.0021  0.0031
Tag-based        0.0157  0.0207  0.0240  0.0287
Media-based      0.0235  0.0297  0.0337  0.0383
BERT-CF          0.0337  0.0469  0.0556  0.0669
Item-CF          0.0748  0.0904  0.1214  0.1459
DeepWalk         0.0799  0.0998  0.1130  0.1340
GraphSAGE        0.0932  0.1242  0.1568  0.1862
DSSM             0.1012  0.1326  0.1631  0.2031
AutoInt          0.1087  0.1488  0.1892  0.2401
Youtube+ATT+FM   -       -       -       -

In Table 2 we can observe that:
(1) GraphDR(FH-GAT) significantly outperforms all baselines on HIT@500 with the significance level α = 0.01, indicating that GraphDR(FH-GAT) could retrieve accurate items in matching. Differing from conventional CTR-oriented models, GraphDR considers user diverse preferences related to video session, community, taxonomy, semantics and provider, which makes the matching results more diversified. GraphDR is well suited to matching, since it concerns item coverage more than specific ranks.
(2) GraphDR(FH-GAT) performs comparably to or slightly worse than Youtube+ATT+FM when N is small. This is intuitive, since the neighbor-similarity based loss must balance accuracy and diversity, which inevitably harms ranking accuracy (not matching accuracy). In contrast, Youtube is a strong supervised baseline that benefits from its CTR-oriented objective. However, it suffers from overfitting and homogenization, and thus performs much worse than GraphDR(FH-GAT) when N grows larger (which is the practical scenario). The diversity issue will be discussed in Sec. 5.5.
(3) Both the IR-based methods and BERT-CF are unsatisfactory, indicating that taxonomy and semantic similarities contribute less to accuracy than user behaviors. In contrast, the Neural-based methods focus on CTR-oriented objectives and thus achieve better accuracy. However, they still perform worse than GraphDR, for they fail to consider heterogeneous interactions and thus lack coverage.

Ablation study.
Among the different GraphDR versions, we find that FH-GAT outperforms GAT and GraphSAGE, which confirms the power of field-specific aggregation in modeling user diverse preferences. Moreover, we conduct a further ablation test to verify that all node types are necessary for diversified recommendation. For instance, HIT@500 drops to 29.31% if we wipe out all word nodes in DivMat-2.1B.
In this subsection, we evaluate all models on both individual diversity and aggregate diversity in recommendation with various evaluation metrics.
Table 3: Results of different evaluation metrics on recommendation diversity.

Model                Element-level diversity   List-level diversity       Global-level diversity
                     tag    cate   media       tag     cate    media      coverage  long-tail  novelty
Category-based       17.64  1.00   13.15       206.98  4.26    98.76      0.0012    0.0836     0.0043
Tag-based            24.39  1.91   12.20       346.42  23.31   315.48     0.0270    0.1432     0.0343
Media-based          29.67  2.95   1.00        434.30  43.41   9.58       0.0309    0.1327     0.0543
BERT-CF              26.29  2.27   11.06       387.45  30.52   207.41     0.3829    0.2631     0.5734
Item-CF              31.86  3.66   11.47       499.42  55.42   234.31     0.1786    0.0000     0.3143
DeepWalk             30.64  3.24   13.23       476.76  52.53   246.33     0.1642    0.0000     0.3821
GraphSAGE            31.67  2.84   13.65       426.32  41.11   285.52     0.1806    0.0000     0.3532
DSSM                 25.15  2.13   13.94       363.41  29.65   211.32     0.1688    0.0525     0.2843
AutoInt              26.31  2.41   13.21       372.31  32.12   242.31     0.1762    0.0612     0.2971
Youtube+ATT+FM       31.22  2.79   12.83       457.15  41.93   217.67     0.1532    0.0734     0.3523
GraphDR(GraphSAGE)   33.19  3.61   14.91       498.31  51.21   327.28     0.4892    0.2854     0.6742
GraphDR(GAT)         34.77  3.79   15.34       516.93  56.62   358.82     0.4934    0.3242     0.7032
GraphDR(FH-GAT)      -      -      -           -       -       -          -         -          -

We adopt nine typical diversity metrics and group them into three classes, namely element-level diversity, list-level diversity and global-level diversity. The former two indicate individual diversity, while the last measures aggregate diversity [21]. The element-level diversity focuses on the diversity within each element, such as the tag, category and media in IR-based methods and the embeddings in the other baselines. Precisely, we regard the average number of deduplicated tags/categories/medias in the top 20 videos retrieved by these elements as the element-level diversity. The list-level diversity measures the diversity of the recommended lists (top 500 items). We use the average number of deduplicated tags/categories/medias in the final recommended lists as the list-level diversity [42, 57]. For the global-level diversity, coverage indicates the percentage of items that could be recommended [16], long-tail indicates the percentage of long-tail items in all results (videos that have not been watched for 15 days are empirically viewed as long-tail videos), and novelty represents the percentage of new items generated by this model that other models do not recommend [50].
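Under the definitions above, the list-level and global-level diversity metrics can be sketched as follows; the helper names and data layout are hypothetical, not our evaluation code:

```python
def list_level_diversity(rec_list, video_attr):
    """Number of distinct attribute values (e.g., tags) covered by one
    recommended list."""
    values = set()
    for video in rec_list:
        values.update(video_attr[video])
    return len(values)

def coverage(all_rec_lists, corpus_size):
    """Fraction of the corpus that appears in at least one recommended list."""
    recommended = set().union(*map(set, all_rec_lists))
    return len(recommended) / corpus_size

def novelty(rec_list, other_models_lists):
    """Fraction of items in this model's list that no other model recommends."""
    others = set().union(*map(set, other_models_lists))
    new_items = [v for v in rec_list if v not in others]
    return len(new_items) / len(rec_list)
```

Long-tail rate follows the same pattern as novelty, with the "long-tail" item set (videos unwatched for 15 days) in place of the other models' results.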
Table 3 shows the results of the various diversity metrics, from which we can observe that:
(1) GraphDR(FH-GAT) achieves the best performance on all diversity metrics. The improvement derives from all three modules: (i) in the diversified preference network, the heterogeneous interactions store user diverse preferences on taxonomy, semantics, community, video session and provider to link similar videos via multi-hop paths; (ii) in NRL, FH-GAT and its neighbor-similarity based loss successfully encode user diverse preferences into node representations; (iii) in online matching, the multi-channel strategy retrieves items from the tag/media/video aspects, which also amplifies diversity. In addition, GraphDR(GraphSAGE) and GraphDR(GAT) generally outperform all baselines but are still inferior to GraphDR(FH-GAT), which reconfirms the power of FH-GAT in diversity.
(2) The element-level and list-level diversities indirectly measure individual diversity via the diversities in tag, category and media. We assume that more tags/medias/categories in the recommended lists indicate a more diversified recommendation. We find that behavior-based models like Youtube and GraphSAGE perform better than the other baselines on individual diversity. Nevertheless, GraphDR achieves better results since it considers other types of interactions.
(3) The global-level diversity measures aggregate diversity, where coverage, long-tail and novelty focus on different aspects. Behavior-based models only consider video watching behaviors, and thus can hardly handle long-tail and new items. In contrast, BERT-CF focuses on content similarity and achieves good aggregate diversity. Still, GraphDR considers user diverse preferences in various fields and achieves the best aggregate diversity.
The offline evaluation has verified the improvements in accuracy and diversity in the matching module. We further conduct an online A/B test to evaluate GraphDR in a real-world industrial-level scenario.
We implement GraphDR in the matching module of WeChat Top Stories following Sec. 4. The original online matching model is an ensemble containing multiple IR-based, CF-based and Neural-based methods of Sec. 5.2. We regard GraphDR as an additional matching channel added to this existing online ensemble, with the ranking module unchanged. All videos retrieved by the different matching channels jointly compete with each other in the following ranking module.

In the online A/B test, we focus on the following seven representative metrics to evaluate accuracy and diversity: (1) video views per capita (VV), (2) video watching time per capita (VWT/c), (3) video watching time per video (VWT/v), (4) page turns per capita (PT), (5) deduplicated impressed videos per capita (DIV), (6) watched tags per capita (Tag diver), and (7) watched categories per capita (Cate diver). The former five metrics mainly measure accuracy, while the latter two measure diversity. We conduct the A/B test for 5 days.

Table 4: Online A/B test on recommendation accuracy and diversity in a real-world system.

Model                VV      VWT/c    VWT/v   PT      DIV      Tag diver  Cate diver
GraphSAGE            +3.08%  +6.20%   +1.48%  +4.66%  +2.43%   +6.24%     +10.27%
GraphDR(GraphSAGE)   +4.37%  +7.61%   +1.68%  +6.04%  +3.97%   +9.16%     +12.42%
GraphDR(GAT)         +5.30%  +9.36%   +2.49%  +6.07%  +8.00%   +12.81%    +15.57%
GraphDR(FH-GAT)      +6.08%  +10.79%  +3.10%  +6.10%  +10.43%  +14.68%    +17.00%

Table 4 shows the results of the online evaluation with multiple metrics, from which we find that:
(1) All GraphDR models outperform the ensemble base model, among which GraphDR(FH-GAT) achieves the best performance in accuracy and diversity with the significance level α = 0.01. We have also passed the homogeneity test in the online evaluation, which confirms that the system and traffic split are unbiased and the improvements are stable. This verifies the effectiveness of GraphDR in real-world scenarios. Moreover, the improvements from GraphSAGE to FH-GAT also imply the significance of FH-GAT.
(2) The significant improvements in the former five metrics reflect better accuracy. A better video view metric indicates that users are more willing to click videos, while better video watching time indicates users are genuinely interested in their clicked videos. Moreover, the page turns and deduplicated impressed video metrics also reflect user experience indirectly: users will slide down and browse more videos if they are satisfied with the results.
(3) The average numbers of watched tags and categories measure diversity. The better tag/category diversity derives from two factors: more diverse videos impressed to users, and better personalized results that attract users to watch more videos. These diverse items help us explore users' potential interests and give surprising results, which could even contribute to long-term performance.
We conduct several analyses on the different matching channels and on user behavior sequence lengths to better understand GraphDR.

In GraphDR, the online multi-channel matching module plays an important role in improving diversity. We evaluate GraphDR(FH-GAT) on HIT@N and the list-level diversity metrics with each channel individually. From Table 5 we find that the video channel achieves better HIT@N results, since video embeddings are directly influenced by video watching behaviors. In contrast, the tag and media channels are more responsible for diversity. To balance accuracy and diversity, we combine all three channels in GraphDR.
Table 5: Results of different matching channels.

Channel   tag     media   video   joint
HIT@100   0.1027  0.0934  -       -
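The joint setting above merges the per-channel candidates; a simplified sketch of such a weighted merge, assuming pre-computed per-channel scores and the λ weights from our settings (function and data names are hypothetical):

```python
def merge_channels(channel_results, weights, top_k=500):
    """Merge candidates from the tag/media/video channels by weighted score.

    channel_results: {channel: [(video, score), ...]} per-channel candidates.
    weights: {channel: lambda} channel weights (all 1 in our settings).
    Returns the top_k videos by accumulated weighted score.
    """
    merged = {}
    for channel, results in channel_results.items():
        w = weights[channel]
        for video, score in results:
            merged[video] = merged.get(video, 0.0) + w * score
    ranked = sorted(merged, key=merged.get, reverse=True)
    return ranked[:top_k]
```

Videos retrieved by multiple channels accumulate score, so items favored from several aspects naturally rise in the joint list.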
We also analyze the impact of different behavior sequence lengths in online matching. In Table 6, as the behavior sequence length increases, the HIT@N metrics achieve consistent improvements, while the diversity metrics become slightly worse. This implies that considering user long-term preferences leads to a better understanding of users in recommendation. However, user long-term preferences are more stable, which inevitably harms diversity. In GraphDR, we set the length to 200 since the improvements in accuracy are more significant than the losses in diversity.
Table 6: Results of different user behavior lengths.

Length         m=20    m=50    m=100   m=200
HIT@100        0.0791  0.0883  0.1072  -
HIT@200        0.1237  0.1373  0.1653  -
HIT@300        0.1742  0.1902  0.2114  -
HIT@500        0.2393  0.2617  0.2763  -
Tag diversity  -       -       -       -
In GraphDR, user diverse preferences are encoded in the node embeddings. We show some tags and their nearest tags to explicitly display the diversity in Table 7. The interest in Restaurant guide may expand to specific food like Foie gras and related stories like Food documentary. The nearest tags of El Nino phenomenon reflect interests in nature and science. Users who like iPhone 11 Pro Max may also seek information on its hardware, software and discounts. These nearest tags reflect similarities in both semantics and user preferences, since the node representations are learned under the neighbor-similarity based objective over a diversified preference graph containing various heterogeneous feature interactions. Similar phenomena can be found in other nodes.
Table 7: Examples of tags and their nearest tags.

Tag                  Nearest tags
Restaurant guide     Roasted goose; Food documentary; Melaleuca cake; Foie gras; Hong Kong cuisine
El Nino phenomenon   Superluminal speed; Easter Island; Darwin; Absolute zero; Parallel worlds theory
iPhone 11 Pro Max    iPhone SE; Fast charge; Mobile phone test; Voice assistant; iPhone discount

Table 8 shows the nearest tags of some typical user groups. According to the node embeddings and aggregated behaviors, young male users in our dataset are more interested in sports, while young female users focus more on fashion. Differing from the youth, the elderly in Beijing concentrate on traditional Chinese art and culture. Geographic distance also leads to fine-grained differences in preferred sports (e.g., golf vs. soccer). The preference divergences among different communities verify the success of diversity modeling.
Table 8: Examples of user groups with nearest tags.

Sex  Age  City     Nearest tags
M    21   Beijing  Sports news; Entrepreneur; Comedy; Scientific anecdotes; Soccer
F    21   Beijing  Summer wear; Constellation; Product promotion; Diet food; Potted plant
M    59   Beijing  Calligraphy; Social documentary; Tai Chi; Exercise; Family
M    21   London   London Olympics; The Celtic; Scientists; Golf; 100 metres race
In this work, we propose a simple and effective GraphDR framework to improve both accuracy and diversity in matching. We propose a new diversified preference network to capture heterogeneous interactions between essential objects in recommendation, and design a novel FH-GAT model with a neighbor-similarity based loss to encode user diverse preferences from these heterogeneous interactions. In experiments, we conduct extensive offline and online evaluations, model analyses and case studies. The significant improvements verify the effectiveness and robustness of GraphDR in improving accuracy and diversity simultaneously.

In the future, we will explore more types of interactions and weighted edges in GraphDR. Moreover, we will enhance the multi-channel matching with more sophisticated models. Better graph neural networks are also worth studying and can be easily adopted in our GraphDR framework.
REFERENCES
[1] Tevfik Aytekin and Mahmut Özge Karakaya. 2014. Clustering-based diversity improvement in top-N recommendation. Journal of Intelligent Information Systems (2014).
[2] Keith Bradley and Barry Smyth. 2001. Improving recommendation diversity. In Proceedings of AICS.
[3] Laming Chen, Guoxin Zhang, and Eric Zhou. 2018. Fast greedy MAP inference for determinantal point process to improve recommendation diversity. In Proceedings of NIPS.
[4] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems.
[5] Weiyu Cheng, Yanyan Shen, and Linpeng Huang. 2020. Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions. In Proceedings of AAAI.
[6] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of RecSys.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL.
[8] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. In Proceedings of WWW.
[9] Lu Gan, Diana Nurbakova, Léa Laporte, and Sylvie Calabretto. 2020. Enhancing Recommendation Diversity using Determinantal Point Processes on Knowledge Graphs. In Proceedings of SIGIR.
[10] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A factorization-machine based neural network for CTR prediction. In Proceedings of IJCAI.
[11] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of NIPS.
[12] Xiangnan He and Tat-Seng Chua. 2017. Neural factorization machines for sparse predictive analytics. In Proceedings of SIGIR.
[13] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of SIGIR.
[14] Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embedding-based retrieval in Facebook search. In Proceedings of KDD.
[15] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of CIKM.
[16] Mahmut Özge Karakaya and Tevfik Aytekin. 2018. Effective methods for increasing aggregate diversity in recommender systems. Knowledge and Information Systems (2018).
[17] Mohamed Koutheaïr Khribi, Mohamed Jemni, and Olfa Nasraoui. 2008. Automatic recommendations for e-learning personalization based on web usage mining techniques and information retrieval. In Proceedings of ICALT.
[18] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.
[19] Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR.
[20] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009).
[21] Matevž Kunaver and Tomaž Požrl. 2017. Diversity in recommender systems – A survey. Knowledge-Based Systems (2017).
[22] Shuai Li, Alexandros Karatzoglou, and Claudio Gentile. 2016. Collaborative filtering bandits. In Proceedings of SIGIR.
[23] Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, and Yong Yu. 2020. AutoFIS: Automatic feature interaction selection in factorization models for click-through rate prediction. In Proceedings of KDD.
[24] Qi Liu, Ruobing Xie, Lei Chen, Shukai Liu, Ke Tu, Peng Cui, Bo Zhang, and Leyu Lin. 2020. Graph Neural Network for Tag Ranking in Tag-enhanced Video Recommendation. In Proceedings of CIKM.
[25] Yong Liu, Yinan Zhang, Qiong Wu, Chunyan Miao, Lizhen Cui, Binqiang Zhao, Yin Zhao, and Lu Guan. 2019. Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation. arXiv preprint arXiv:1903.07826 (2019).
[26] Yuanfu Lu, Ruobing Xie, Chuan Shi, Yuan Fang, Wei Wang, Xu Zhang, and Leyu Lin. 2020. Social influence attentive neural network for friend-enhanced recommendation. In Proceedings of ECML-PKDD.
[27] Kanak Mahadik, Qingyun Wu, Shuai Li, and Amit Sabne. 2020. Fast distributed bandits for online recommendation systems. In Proceedings of ICS.
[28] Qiaozhu Mei, Jian Guo, and Dragomir Radev. 2010. DivRank: The interplay of prestige and diversity in information networks. In Proceedings of KDD.
[29] Sharad Nandanwar, Aayush Moroney, and M. Narasimha Murty. 2018. Fusing Diversity in Recommendations in Heterogeneous Information Networks. In Proceedings of WSDM.
[30] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of KDD.
[31] Lijing Qin and Xiaoyan Zhu. 2013. Promoting diversity in recommendation by entropy regularizer. In Proceedings of IJCAI.
[32] Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In Proceedings of ICDM.
[33] Steffen Rendle. 2010. Factorization machines. In Proceedings of ICDM.
[34] Badrul Munir Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of WWW.
[35] Ying Shan, T. Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In Proceedings of KDD.
[36] Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of CIKM.
[37] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of CIKM.
[38] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of ICLR.
[39] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of KDD.
[40] Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & cross network for ad click predictions. In Proceedings of ADKDD.
[41] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous Graph Attention Network. In Proceedings of WWW.
[42] Le Wu, Qi Liu, Enhong Chen, Nicholas Jing Yuan, Guangming Guo, and Xing Xie. 2016. Relevance meets coverage: A unified framework to generate diversified recommendations. TIST (2016).
[43] Qiong Wu, Yong Liu, Chunyan Miao, Binqiang Zhao, Yin Zhao, and Lu Guan. 2019. PD-GAN: Adversarial Learning for Personalized Diversity-Promoting Recommendation. In Proceedings of IJCAI.
[44] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based Recommendation with Graph Neural Networks. In Proceedings of AAAI.
[45] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In Proceedings of IJCAI.
[46] Ruobing Xie, Cheng Ling, Yalong Wang, Rui Wang, Feng Xia, and Leyu Lin. 2020. Deep Feedback Network for Recommendation. In Proceedings of IJCAI.
[47] Ruobing Xie, Zhijie Qiu, Jun Rao, Yi Liu, Bo Zhang, and Leyu Lin. 2020. Internal and Contextual Attention Network for Cold-start Multi-channel Matching in Recommendation. In Proceedings of IJCAI.
[48] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V. Chawla. 2019. Heterogeneous Graph Neural Network. In Proceedings of KDD.
[49] Lin Zhang, Qiang Yan, Junqiang Lu, Yongqiang Chen, and Yi Liu. 2019. Empirical Research on the Impact of Personalized Recommendation Diversity. In Proceedings of HICSS.
[50] Mi Zhang and Neil Hurley. 2008. Avoiding monotony: Improving the diversity of recommendation lists. In Proceedings of RecSys.
[51] Weinan Zhang, Tianming Du, and Jun Wang. 2016. Deep learning over multi-field categorical data. In Proceedings of ECIR.
[52] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of KDD.
[53] Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, and Kun Gai. 2019. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. In Proceedings of NIPS.
[54] Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of KDD.
[55] Xiaojin Zhu, Andrew Goldberg, Jurgen Van Gael, and David Andrzejewski. 2007. Improving diversity in ranking using absorbing random walks. In Proceedings of NAACL.
[56] Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, and Kun Gai. 2020. Learning Optimal Tree Models under Beam Search. In Proceedings of ICML.
[57] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of WWW.