SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback
Chao Wang, Hengshu Zhu*, Chen Zhu, Chuan Qin, Hui Xiong*
University of Science and Technology of China
Baidu Talent Intelligence Center, Baidu Inc.
Business Intelligence Lab, Baidu
[email protected], [email protected], {zc3930155, chuanqin0426, xionghui}@gmail.com

Abstract
The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches are still limited by several challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preferences does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate "ties" due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present a theoretical analysis of SetRank showing that the bound of the excess risk can be proportional to $\sqrt{M/N}$, where $M$ and $N$ are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

Introduction
Recommender systems have been widely deployed in many popular online services for enhancing user experience and business revenue (Wang, Wang, and Yeung 2015; Liu et al. 2019a; Liu et al. 2019b; Zhu et al. 2018). As one representative task of personalized recommendation, collaborative ranking aims at providing a user-specific item ranking based on preferences learned from historical feedback. Indeed, in real-world scenarios, most user feedback is implicit (e.g., clicks and purchases) rather than explicit (e.g., 5-star ratings). Different from explicit ratings, implicit feedback only contains positive and unobserved labels instead of graded user preferences, which brings new research challenges for building recommender systems (Hsieh, Natarajan, and Dhillon 2015). Therefore, collaborative ranking from implicit feedback has been attracting more and more attention in recent years (Rendle et al. 2009; Shi, Larson, and Hanjalic 2010; Huang et al. 2015; Xia 2019).

While considerable efforts have been made in this direction (Rendle et al. 2009; Chen et al. 2009; Weimer et al. 2008), represented by the well-known pairwise and listwise approaches, some critical challenges still exist. The family of pairwise approaches (Rendle et al. 2009; Freund et al. 2003; Chapelle and Keerthi 2010; Wu, Hsieh, and Sharpnack 2017; Krohn-Grimberghe et al. 2012), which takes the item pair as the basic element to model the preference structure in implicit feedback, is prone to an inconsistency between assumption and practice. For example, Bayesian Personalized Ranking (BPR) (Rendle et al. 2009), one of the most widely used collaborative pairwise approaches, tries to maximize the probability of binary comparisons between positive and unobserved feedback. Such treatment requires the strict assumption of independent pairwise preferences over two items as the basis for constructing the pairwise loss. However, as shown in Figure 1, if there exist the item preference pairs "A > B" and "C > D" for user 1, the pairs "A > D" and "C > B" must also exist for user 1 due to the binary value of implicit feedback. In other words, we have $p(A > D, C > B \mid A > B, C > D) = 1$ in the practical pair construction process, which breaks the independence among pairs and thus influences the optimization result of the pairwise loss. Some follow-up studies chose to relax the independence assumption by considering group information. For example, GBPR (Pan and Chen 2013b) introduced richer user interactions, and Cofiset (Pan and Chen 2013a) defined a user's preference on an item group to consider the composition effect. However, the inconsistency problem remains to some extent.

As for the listwise approaches, the key challenge is how to efficiently accommodate "ties" (items with the same rating value) due to the precondition of entire list permutation, since there is no clear sequential relationship but only binary ratings in implicit feedback.
Figure 1: The diagrammatic sketch of preference structures in different collaborative ranking approaches (the original rating matrix, and the Pairwise, Setwise, and Listwise structures for two users), where the notation ">" represents the preference order.
Besides, the listwise approaches measure the uncertainty between the top-$P$ items of the observed and predicted lists by calculating the cross-entropy (Cao et al. 2007; Shi, Larson, and Hanjalic 2010; Huang et al. 2015; Wang et al. 2016), which results in computational complexity exponential in $P$ (which is why $P$ is often set to 1). Though Wu, Hsieh, and Sharpnack (2018) proposed a permutation-probability-based listwise model to address the above challenges, only an upper bound rather than the original negative log-likelihood is optimized.

To avoid the limitations of the existing collaborative ranking approaches, in this paper, we propose a novel setwise Bayesian approach, namely SetRank, for collaborative ranking. SetRank is able to accommodate the characteristics of implicit feedback in recommender systems. Particularly, we first make a weaker independence assumption compared to pairwise approaches, namely that each user prefers every positive item over the set of unobserved items independently. Hence, we can transform the original rating records into comparisons between each single positive item and the set of unobserved items, which avoids the inconsistency problem of pairwise approaches, as the example in Figure 1 shows. Moreover, since there is no ordering information among unobserved items, it is unnecessary to rank the set of unobserved items, which relaxes the permutation form of listwise approaches. Specifically, our approach is named "setwise" because the preference order of a user is only defined between each positive item and the set of unobserved items. Consequently, SetRank is able to model the properties of implicit feedback in a more effective manner while avoiding the disadvantages of both pairwise and listwise ranking approaches. The contributions of this work can be summarized as follows:

• We propose a novel setwise Bayesian collaborative ranking approach, namely SetRank, to provide a new research perspective for implicit-feedback-based recommendation. SetRank can inherently accommodate the characteristics of implicit feedback.

• We design two implementations for SetRank, namely MF-SetRank and Deep-SetRank, based on matrix factorization and neural networks, respectively.

• We validate our approach by both theoretical analysis and experiments. Specifically, we prove that the excess risk can be bounded by a
M/N term, where M and N are the numbers of items and users, respectively. Mean-while, extensive experiments on four real-world datasetsclearly demonstrate the advantages of our approach com-pared with various state-of-the-art baselines. Setwise Bayesian Collaborative Ranking
Problem Formulation
Suppose there are $N$ users and $M$ items in the dataset. Let $P_i$ and $O_i$ denote the sets of positive and unobserved items for each user $i$, respectively. User $i$ has $J_i = |P_i|$ positive items and $K_i = |O_i|$ unobserved items. Then the rating matrix $R = \{R_{il}\}_{N \times M}$ is a binary matrix, i.e., $R_{ij} = 1$ for $j \in P_i$ and $R_{ik} = 0$ for $k \in O_i$. The goal of collaborative ranking is to recommend to each user an ordered item list by predicting the preference score matrix $X = \{X_{il}\}_{N \times M}$.
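To make the notation concrete, here is a minimal Python sketch (our illustration; the toy matrix and variable names are not from the paper) that derives $P_i$, $O_i$, $J_i$, and $K_i$ from a binary rating matrix:

```python
import numpy as np

# Toy binary rating matrix R (N = 3 users, M = 5 items): 1 = positive, 0 = unobserved.
R = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 0, 1],
              [1, 1, 0, 1, 0]])

# P[i] / O[i] are the positive / unobserved item index sets of user i.
P = [np.flatnonzero(R[i] == 1) for i in range(R.shape[0])]
O = [np.flatnonzero(R[i] == 0) for i in range(R.shape[0])]

for i in range(R.shape[0]):
    print(f"user {i}: J_i = {len(P[i])}, K_i = {len(O[i])}, P_i = {P[i].tolist()}")
```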
SetRank Optimization Criterion

The target of SetRank is to maximize the posterior probability of the preference structure to build the Bayesian formulation of collaborative ranking:

$$p(\Theta \mid >_{total}) \propto p(>_{total} \mid \Theta)\, p(\Theta), \tag{1}$$

where $>_{total} = \{>_i\}_{i=1}^{N}$ and $>_i$ is a random variable representing the preference structure of user $i$, which takes values over all possible preference structures. $\Theta$ denotes the model parameters to be learned.

Before modeling the setwise preference structure probability, we first give a new independence assumption:
Assumption 1. Every user $i$ prefers each positive item $j \in P_i$ to the unobserved item set $O_i$ independently.

In this setwise assumption, we ignore the direct comparisons among positive items or among unobserved items to better reflect the nature of implicit feedback, since there is no explicit item-level preference information. Supposing there is only one user, we have no reason to decide which positive item is better than another positive one, or which unobserved item is better than another unobserved one. Only when there are many users can we exploit collaborative information to derive entire ranking results.

By comparison, pairwise approaches like BPR (Rendle et al. 2009) establish an individual binary comparison between each positive item and each unobserved item, which needs the strict assumption that item comparisons are independent for optimization. However, in the pair construction process, the pairs are bound to be dependent due to the characteristics of implicit feedback. When calculating the pairwise loss, pairwise approaches still assume the pairs are independent and thus optimize an improper loss. By contrast, the setwise approach has no such inconsistency problem owing to its weaker independence assumption.

Moreover, the setwise permutation form is weaker than that of listwise approaches. In the comparisons, we do not care about the ranking of items within $O_i$, since the ordering information of unobserved items is naturally missing in implicit data. As a result, all the unobserved items are treated equally in the preference comparison, and thus the setwise ranking approach is inherently suitable for handling implicit data.

According to our assumption, we can transform Equation 1 into the following form:

$$p(>_{total} \mid \Theta) = \prod_{i=1}^{N} p(>_i \mid \Theta) = \prod_{i=1}^{N} \prod_{j \in P_i} p(j >_i O_i \mid \Theta), \tag{2}$$

where $j >_i O_i$ denotes that user $i$ prefers item $j$ to the item set $O_i$. Therefore, we turn to collecting the preference comparisons between a single positive item and an unobserved item set. For example, in Figure 1, there are two comparisons for user 1: A > {B, D} and C > {B, D}.

In the setwise preference structure, the positive item $j$ and the unobserved set $O_i$ compose a new item list $L_{ij}$. Hence, it is convenient to draw on the concept of permutation probability (Cao et al. 2007) from listwise approaches for further specifying the preference structure probability $p(j >_i O_i)$. Recall that in the listwise approach, a permutation $\pi = \{\pi_1, \pi_2, ..., \pi_m\}$ is a list of the $m$ items in descending order (Cao et al. 2007). Denote the scores assigned to the items as a vector $s = (s_1, s_2, ..., s_m)$, and let $\phi(x)$ be an increasing and strictly positive function. Then the permutation probability is defined as:

$$p_s(\pi) := \prod_{d=1}^{m} \frac{\phi(s_{\pi_d})}{\sum_{l=d}^{m} \phi(s_{\pi_l})}. \tag{3}$$

It is easy to verify that $p_s(\pi)$ is a valid probability distribution. In the literature, the permutation probability has been widely used in many listwise approaches to calculate the cross-entropy due to its many beneficial properties (Xia et al. 2008). These properties guarantee that items with higher scores are more likely to be ranked higher. However, a serious problem of the definition is that we have to calculate $P!$ permutation probabilities to obtain the top-$P$ probability of the list. Fortunately, in our case, we only need to place the positive item $j$ at the top of the list $L_{ij}$, which means we just concentrate on the top-1 probability.
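As a quick sanity check on Equation 3, the following sketch (our illustration; $\phi(x) = e^x$ is one valid choice of increasing, strictly positive function) enumerates all permutations of a small score vector, confirms the probabilities sum to one, and verifies that the enumerated top-1 probability matches the closed form derived next in Equation 4:

```python
import itertools
import numpy as np

def perm_prob(pi, s, phi=np.exp):
    # Eq. 3: product over positions d of phi(s_{pi_d}) / sum_{l >= d} phi(s_{pi_l})
    p = 1.0
    for d in range(len(pi)):
        p *= phi(s[pi[d]]) / sum(phi(s[pi[l]]) for l in range(d, len(pi)))
    return p

s = np.array([0.8, -0.3, 1.2, 0.1])                    # scores of m = 4 items
perms = list(itertools.permutations(range(len(s))))

print(sum(perm_prob(pi, s) for pi in perms))           # -> 1.0: valid distribution
top1_enum = sum(perm_prob(pi, s) for pi in perms if pi[0] == 2)
top1_closed = np.exp(s[2]) / np.exp(s).sum()           # Eq. 4 for item d = 2
print(top1_enum, top1_closed)                          # agree up to float error
```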
Actually, Cao et al. (2007) proved that the top-1 probability $p_{s,1}(d)$ of item $d$ can be efficiently calculated under the definition of Equation 3 as follows:

$$p_{s,1}(d) = \frac{\phi(s_d)}{\sum_{l=1}^{m} \phi(s_l)}. \tag{4}$$

With the help of Equation 4 and the preference score matrix $X$, we can now give the detailed formulation of the setwise preference probability over all users:

$$p(>_{total} \mid \Theta) = \prod_{i=1}^{N} \prod_{j \in P_i} \frac{\phi(X_{ij})}{\phi(X_{ij}) + \sum_{k \in O_i} \phi(X_{ik})}. \tag{5}$$

As one can see, Equation 5 indicates that if positive items have higher scores and unobserved items have lower scores, this preference structure will be more likely to be true.

At last, to complete the Bayesian inference, we introduce a general prior probability $p(\Theta)$. Following BPR (Rendle et al. 2009), $p(\Theta)$ is set as a normal distribution with zero mean and variance-covariance matrix $\lambda_\Theta I$. Hence, maximizing the posterior probability is equivalent to minimizing the following function:

$$\mathcal{L} = \sum_{i=1}^{N} \sum_{j \in P_i} -\log p(j >_i O_i \mid \Theta) + \lambda_\Theta \|\Theta\|^2. \tag{6}$$

Note that though some listwise approaches also exploit Equation 4 (Cao et al. 2007; Shi, Larson, and Hanjalic 2010), SetRank is actually quite different. First, listwise approaches are essentially based on the top-$P$ probability, since they consider the order of a list composed of multiple positive and unobserved items. In fact, using a larger $P$ tends to improve the performance of listwise approaches (Cao et al. 2007); they use the top-1 probability mainly as a compromise, given the exponential computational complexity of calculating the top-$P$ probability. In contrast, our setwise assumption, which is more appropriate for implicit feedback, is naturally based on the top-1 probability. Second, they can only employ the cross-entropy loss for optimization, and the cross-entropy loss may rank worse-scoring permutations higher (Wu, Hsieh, and Sharpnack 2018). On the contrary, our loss is strictly obtained by Bayesian inference without the adoption of cross-entropy.
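To make the criterion concrete, here is a minimal NumPy sketch (our illustration, not the authors' released code) of the negative log-posterior in Equation 6, instantiated with $\log \phi(x) = \sigma(x)$, the choice made in the Implementation section below:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def setrank_loss(X, P, O, lam):
    """Eq. 6 with phi(x) = exp(sigmoid(x)); X is the N x M score matrix."""
    loss = lam * np.sum(X ** 2)          # stand-in for the prior term on Theta
    for i, (Pi, Oi) in enumerate(zip(P, O)):
        log_phi_pos = sigmoid(X[i, Pi])                 # log phi(X_ij), j in P_i
        denom_unobs = np.exp(sigmoid(X[i, Oi])).sum()   # sum_k phi(X_ik)
        # -log p(j >_i O_i | Theta), summed over positives (Eq. 5)
        loss += np.sum(np.log(np.exp(log_phi_pos) + denom_unobs) - log_phi_pos)
    return loss

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
P = [np.array([0, 2]), np.array([1, 4]), np.array([0, 1, 3])]
O = [np.array([1, 3, 4]), np.array([0, 2, 3]), np.array([2, 4])]
print(setrank_loss(X, P, O, lam=0.01))
```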
Implementation

It is quite flexible to apply many well-known models to learn the score matrix $X$. In the literature, matrix factorization (Mnih and Salakhutdinov 2008) and neural networks (NN) (Xue et al. 2017) have demonstrated their effectiveness and practicability for recommender systems. Therefore, in this paper, we introduce two implementations for SetRank, namely MF-SetRank and Deep-SetRank, based on these two models, respectively.
MF-SetRank.
MF-SetRank is based on the popular collaborative model, Probabilistic Matrix Factorization (PMF) (Mnih and Salakhutdinov 2008). PMF factorizes the score matrix into two factor matrices representing user and item latent features. Along this line, we have $X = U^T V$, where $U \in \mathbb{R}^{r \times N}$ and $V \in \mathbb{R}^{r \times M}$ are the latent user and item matrices, respectively.
Algorithm 1: Gradient update for V when fixing U

Require: $V, U, \gamma, decay, \lambda, \{P_i, \tilde{O}_i \mid 1 \le i \le N\}$
Ensure: updated $V$
1: $grad = \lambda \cdot V$
2: for $i = 1$ to $N$ do
3:   Precompute $g_l = \sigma(u_i^T v_l)$ for all $l \in P_i \cup \tilde{O}_i$
4:   Initialize $totalsum = 0$, $sum = 0$, $s[l] = 0$ for all $l \in P_i$, $c[l] = 0$ for all $l \in P_i \cup \tilde{O}_i$
5:   for $l \in \tilde{O}_i$ do $sum \mathrel{+}= \exp(g_l)$ end for
6:   for $l \in P_i$ do
7:     $c[l] \mathrel{-}= g_l \cdot (1 - g_l)$
8:     $s[l] = sum + \exp(g_l)$
9:     $totalsum \mathrel{+}= 1 / s[l]$
10:  end for
11:  for $l \in \tilde{O}_i$ do $c[l] \mathrel{+}= \exp(g_l) \cdot g_l \cdot (1 - g_l) \cdot totalsum$ end for
12:  for $l \in P_i$ do $c[l] \mathrel{+}= \exp(g_l) \cdot g_l \cdot (1 - g_l) / s[l]$ end for
13:  for $l \in P_i \cup \tilde{O}_i$ do $grad[:, l] \mathrel{+}= c[l] \cdot u_i$ end for
14: end for
15: $V \mathrel{-}= \gamma \cdot grad$
16: $\gamma \mathrel{*}= decay$
17: Return $V$
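For readers who prefer code, here is a direct NumPy transcription of Algorithm 1 (our sketch; following the loss above we take $g_l = \sigma(u_i^T v_l)$, with $U \in \mathbb{R}^{r \times N}$ and $V \in \mathbb{R}^{r \times M}$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_V(V, U, gamma, decay, lam, P, O_tilde):
    """One pass of Algorithm 1: gradient step on V with U fixed."""
    grad = lam * V                           # gradient of the Gaussian prior term
    for i in range(U.shape[1]):
        idx = np.concatenate([P[i], O_tilde[i]])
        g = {l: sigmoid(U[:, i] @ V[:, l]) for l in idx}
        c = {l: 0.0 for l in idx}
        s, totalsum = {}, 0.0
        unobs_sum = sum(np.exp(g[k]) for k in O_tilde[i])
        for j in P[i]:
            c[j] -= g[j] * (1.0 - g[j])      # from the -sigma(u_i^T v_j) term
            s[j] = unobs_sum + np.exp(g[j])  # denominator of Eq. 5 for item j
            totalsum += 1.0 / s[j]
        for k in O_tilde[i]:                 # all positives share this factor
            c[k] += np.exp(g[k]) * g[k] * (1.0 - g[k]) * totalsum
        for j in P[i]:
            c[j] += np.exp(g[j]) * g[j] * (1.0 - g[j]) / s[j]
        for l in idx:
            grad[:, l] += c[l] * U[:, i]
    return V - gamma * grad, gamma * decay   # updated V and decayed step size
```

Each user contributes $O((J_i + \tilde{K}_i) r)$ work here, which is the source of the linear overall complexity discussed below.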
Then the prior probabilities over the columns of $U$ and $V$ are assumed to be normal distributions, i.e., $p(u_i) \sim \mathcal{N}(0, \lambda^{-1} I)$ and $p(v_l) \sim \mathcal{N}(0, \lambda^{-1} I)$, where $\lambda$ is the regularization parameter. In this way, we can transform Equation 6 into the following form:

$$\mathcal{L} = \sum_{i=1}^{N} \sum_{j \in P_i} -\log \frac{\phi(u_i^T v_j)}{\phi(u_i^T v_j) + \sum_{k \in O_i} \phi(u_i^T v_k)} + \lambda \left( \sum_{i=1}^{N} \|u_i\|^2 + \sum_{l=1}^{M} \|v_l\|^2 \right). \tag{7}$$

For ease of calculation, we let $\log \phi(x)$ be the sigmoid function, i.e., $\log \phi(x) = \sigma(x) = 1/(1 + e^{-x})$. It is easy to verify that such a $\phi(x)$ is an increasing and strictly positive function. Besides, this choice is also beneficial for bounding the excess risk, which we will discuss in the next subsection.

Another notable point is that we do not have to go through all the unobserved items for every user in each epoch, considering that the positive feedback is much more influential than the unobserved feedback. Following Wu, Hsieh, and Sharpnack (2018), we can randomly sample $\tilde{K}_i = \tau \cdot J_i$ unobserved items in each epoch to compose the set $\tilde{O}_i$, which replaces $O_i$ in Equation 7.

In each epoch, we update the latent factors $U$ and $V$ by the gradients $\nabla_U \mathcal{L}$ and $\nabla_V \mathcal{L}$, respectively. The speed efficiency of a recommender system is quite important (Wu, Hsieh, and Sharpnack 2017). A direct way to calculate the gradients costs $O(N J \tilde{K} r) = O(N J^2 r)$ time, where $J = \max\{J_i, 1 \le i \le N\}$ and $\tilde{K} = \tau \cdot J$, because it contains numerous repeated calculations.
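The per-epoch negative sampling can be as simple as the following sketch (our illustration; the `min` guard for small candidate sets is our addition):

```python
import numpy as np

def sample_unobserved(O, J, tau, rng):
    """Draw |O_tilde_i| = tau * J_i unobserved items per user for one epoch."""
    return [rng.choice(Oi, size=min(tau * Ji, len(Oi)), replace=False)
            for Oi, Ji in zip(O, J)]

rng = np.random.default_rng(0)
O = [np.array([1, 3, 4]), np.array([0, 2, 3])]   # unobserved sets O_i
J = [2, 2]                                        # J_i = |P_i|
O_tilde = sample_unobserved(O, J, tau=1, rng=rng)
```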
Figure 2: The modeling process of Deep-SetRank.
We provide a cleverer approach in Algorithm 1 that rearranges the computation so that it only requires $O(N(J + \tilde{K})r) = O(NJr)$ time. Take the process of updating $V$ as an example. Specifically, given a fixed latent user matrix $U$, the regularization parameter $\lambda$, the positive item sets $P_i$, the unobserved item sets $\tilde{O}_i$, and the decay rate $decay$ of the step size $\gamma$, Algorithm 1 shows how to compute the gradient updates for the latent item matrix $V$. Thus, MF-SetRank runs with a linear computational complexity, which is the same as efficient rating prediction methods based on matrix factorization (Mnih and Salakhutdinov 2008; Hu, Koren, and Volinsky 2008).

Deep-SetRank.
In recent years, neural networks have shown good capacity for non-linear projection and embedding in recommender systems (Qin et al. 2019; He et al. 2017). Inspired by Deep Matrix Factorization (DeepMF) (Xue et al. 2017), we design an NN-based setwise model called Deep-SetRank.

As shown in Figure 2, Deep-SetRank transforms the row $R_{i*}$ and column $R_{*l}$ of the rating matrix $R$ to obtain the latent user and item matrices via user and item embedding networks, respectively. Then we still employ Equation 7 as the setwise loss function. Following DeepMF, we choose the multi-layer perceptron (MLP) as the embedding network. Taking the user network as an example, we have:

$$h_1 = f_1(W_1 R_{i*} + b_1), \quad h_t = f_t(W_t h_{t-1} + b_t), \ t \in [2, n-1], \quad u_i = f_n(W_n h_{n-1} + b_n), \tag{8}$$

where $h_t$ is the $t$-th hidden layer with weight matrix $W_t$ and bias term $b_t$. For the activation function $f_t(\cdot)$, we employ the sigmoid function for the first $n-1$ layers and the $\tanh$ function for the last layer. Hence, we can predict the scores by the product of these two latent matrices. For each user, we simultaneously calculate the scores of the items in both $P_i$ and $\tilde{O}_i$ in one batch for optimizing the setwise loss. Different from MF-SetRank, Deep-SetRank needs to train two neural networks rather than latent matrices.
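A minimal NumPy sketch of the user-side embedding network in Equation 8 follows (our illustration; the layer widths are placeholders, as the exact dimensions appear only in the experimental settings):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_embed(x, Ws, bs):
    """Eq. 8: sigmoid activations on the first n-1 layers, tanh on the last."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = sigmoid(W @ h + b)
    return np.tanh(Ws[-1] @ h + bs[-1])

rng = np.random.default_rng(0)
M, hidden, r = 5, 8, 4                      # items, hidden width, latent rank
Ws = [rng.normal(scale=0.1, size=(hidden, M)),
      rng.normal(scale=0.1, size=(r, hidden))]
bs = [np.zeros(hidden), np.zeros(r)]

u_i = mlp_embed(np.array([1., 0., 1., 0., 0.]), Ws, bs)  # from row R_i*
# An analogous item network maps column R_*l to v_l; the score is X_il = u_i @ v_l.
```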
Theoretical Analysis

In this subsection, we aim at giving a theoretical bound for the excess risk, i.e., the expected difference between the estimate and the truth, of SetRank. Without loss of generality, we assume for convenience that all users have the same numbers of positive and unobserved items; hence, $J = J_i$ and $K = K_i$ for all $i$. Note that the result can be readily generalized to the individual setting.

Consider the following constrained optimization of a general setwise method:

$$\hat{X} := \arg\min_{X} -\log p(>_{total} \mid X) \quad \text{such that } X \in \mathcal{X}, \tag{9}$$

where $\mathcal{X}$ is the feasible set. Usually, $\mathcal{X}$ is constrained by a norm regularization to satisfy a low-rank condition. For example, in the personalized collaborative setting, $\mathcal{X} = \{X \mid X = U^T V, \|U\|_F \le c_u, \|V\|_F \le c_v\}$, where $\|\cdot\|_F$ is the Frobenius norm. Suppose there is an $X^* \in \mathcal{X}$ such that $>_{total}$ is generated from $p(>_{total} \mid X^*)$. Then the excess risk is given in the form of the KL divergence between the real and estimated probabilities:

$$D(X^*, \hat{X}) := \frac{1}{N} \sum_{i=1}^{N} \mathbb{E} \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid \hat{X}_i)}.$$

So far, the state-of-the-art listwise method can bound the excess risk by $O_P\left(\sqrt{rM/N} \ln M\right)$ in the personalized collaborative setting (Wu, Hsieh, and Sharpnack 2018). Here we will show that the bound of SetRank is $O_P\left(\sqrt{rM/N}\,(1 + J/K)\right)$ owing to the weaker precondition. In practice, the positive feedback always accounts for merely a tiny fraction of the total items, so we have $J/K \ll 1$, which makes the result sound.

First, we give another statistical interpretation of Equation 5 from the generative perspective:

Theorem 1. Suppose there is a matrix $Y = \{Y_{il}\}_{N \times M}$, where each entry $Y_{il}$ is independently drawn from an exponential distribution with rate $\phi(X_{il})$. For each row $Y_i$, let the $J$ smallest entries form the set $P_i$ and the others form the set $O_i$. Then the ranking structure probability $p(>_{total} \mid X)$, i.e., the probability that the entries in $P_i$ are less than those in $O_i$, is exactly equal to the RHS of Equation 5.

The proof of Theorem 1 can be found in the Appendix. From Theorem 1, we know that the setwise preference probability can also be seen as the probability of a ranking structure over a matrix $Y$ composed of $N \times M$ independent exponential random variables. Thus, we can give the following theorem, based on McDiarmid's inequality (McDiarmid 1989) and Dudley's chaining (Talagrand 2006):
Theorem 2. Let $\mathcal{Z} := \{\log \phi(X) \mid X \in \mathcal{X}\}$ be the image of the element-wise function $\log \phi(X)$, and let $\|\cdot\|_{\infty,2}$ be the norm defined as $\|Z\|_{\infty,2} := \sqrt{\sum_{i=1}^{N} \|Z_i\|_\infty^2}$ for $Z \in \mathbb{R}^{N \times M}$. Denote by $N(\epsilon, \mathcal{Z}, \|\cdot\|_{\infty,2})$ the $\epsilon$-covering number of $\mathcal{Z}$ in the $\|\cdot\|_{\infty,2}$ norm, i.e., the fewest number of spherical balls of radius $\epsilon$ needed to completely cover $\mathcal{Z}$ under this norm. Then, if $\|Z_i\|_\infty \le \alpha$ for all $i$, we have

$$D(X^*, \hat{X}) = O_P\left( g(\mathcal{Z})\, \frac{\sqrt{M}}{N}\, (1 + J/K) \right), \tag{10}$$

where $g(\mathcal{Z}) = \int_0^\infty \sqrt{\ln N(u, \mathcal{Z}, \|\cdot\|_{\infty,2})}\, du$.

By Theorem 2, we are able to obtain a bound in the general setting of $\mathcal{X}$. In particular, in the personalized collaborative setting, we can obtain the following further result:
Theorem 3. Suppose that $\|Z_i\|_\infty \le \alpha$ for all $i$ and that $\log \phi(x)$ is 1-Lipschitz. Then, in the personalized collaborative setting, we have

$$D(X^*, \hat{X}) = O_P\left( \sqrt{rM/N}\, (1 + J/K) \right). \tag{11}$$

The detailed proofs of Theorems 2 and 3 are in the Appendix. Theorem 3 shows that, when fixing the rank of the latent factors, we obtain a better estimate with a larger number of users and a smaller number of items, which accords with intuition. Besides, a smaller $J$ together with a larger $K$ is beneficial for bounding the excess risk.
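For intuition, the calculation behind Theorem 3 is a one-line combination of Theorem 2 with the covering-number bound $g(\mathcal{Z}) \le c' \sqrt{rN}$ quoted in the Appendix:

```latex
D(X^*, \hat{X})
  = O_P\!\left( g(\mathcal{Z}) \, \frac{\sqrt{M}}{N} \, (1 + J/K) \right)
  \le O_P\!\left( c' \sqrt{rN} \, \frac{\sqrt{M}}{N} \, (1 + J/K) \right)
  = O_P\!\left( \sqrt{rM/N} \, (1 + J/K) \right).
```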
Experiments

Experimental Settings
Datasets.
We evaluated the performance of our SetRank method on four real-world datasets, i.e., MovieLens, Kindle, Yahoo, and CiteULike. MovieLens is a commonly used movie recommendation dataset. Kindle contains Amazon product ratings collected from the Kindle Store. Yahoo (Marlin and Zemel 2009) contains ratings for songs from Yahoo! Music. CiteULike is composed of users' collections of articles on the CiteULike website. Following Wu, Hsieh, and Sharpnack (2018), we took two steps for data preprocessing. First, the original data of MovieLens, Kindle, and Yahoo are in the form of 5-star ratings; we transformed them into implicit data, where each entry was marked as 1 or 0 depending on whether its rating is greater than 3. Second, in order to make sure we have adequate positive feedback for better evaluating the recommendation algorithms, we filtered out users with fewer than 60, 20, 10, and 10 positive items in MovieLens, Kindle, Yahoo, and CiteULike, respectively. After data filtering, there are in total 3,937 users and 3,533 items with 923,473 positive entries in MovieLens; 4,379 users and 3,774 items with 102,545 positive entries in Kindle; 4,664 users and 921 items with 82,384 positive entries in Yahoo; and 4,123 users and 7,849 items with 135,365 positive entries in CiteULike.

Evaluation protocols.
We randomly sampled a fixed proportion of the positive items of each user to construct the training set of each dataset, while the maximum number of item samples per user was set to 10. Then, we sampled 1 positive item of each user as the validation set, and the remaining data were used for testing. In this way, we randomly split each dataset five times and report all results as mean values.

Dataset sources: https://grouplens.org/datasets/movielens/, http://jmcauley.ucsd.edu/data/amazon/, https://webscope.sandbox.yahoo.com/catalog.php?datatype=r

Table 1: The recommendation performance of different approaches. (Methods with notation * are our proposed methods. We conducted paired t-tests to verify that all improvements by SetRank are statistically significant.)

Datasets | Methods | P@5 | P@10 | R@5 | R@10 | MAP@5 | MAP@10
[Table 1 body: for each dataset (MovieLens, Kindle, Yahoo, CiteULike), one row each for WMF, BPR, Cofiset, SQL-Rank, MF-SetRank*, DeepMF, Deep-BPR, Deep-SQL, Multi-VAE, and Deep-SetRank*, reporting mean ± std for each metric; the numeric values are not recoverable from the source.]
To evaluate the performance, we adopted three widely used evaluation metrics, i.e., P@$P$, R@$P$, and MAP@$P$ (Wu, Hsieh, and Sharpnack 2018; Wang, Wang, and Yeung 2015). For each user, P (Precision)@$P$ measures the ratio of correct predictions among the top-$P$ items to $P$, and R (Recall)@$P$ measures the ratio of correct predictions among the top-$P$ items to all positive items. Furthermore, MAP (Mean Average Precision)@$P$ considers the ranking of the correct predictions among the top-$P$ items. The final results of the three metrics are averaged over all users.
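These per-user metrics can be computed as in the sketch below (our illustration; MAP conventions vary slightly across papers, and we use one common variant):

```python
def metrics_at_P(ranked_items, positives, P):
    """P@P, R@P, and AP@P for a single user; average AP over users for MAP@P."""
    hits = [int(item in positives) for item in ranked_items[:P]]
    prec = sum(hits) / P
    rec = sum(hits) / max(len(positives), 1)
    ap, n_hit = 0.0, 0
    for rank, h in enumerate(hits, start=1):   # precision at each hit position
        if h:
            n_hit += 1
            ap += n_hit / rank
    return prec, rec, ap / max(n_hit, 1)

print(metrics_at_P([3, 0, 4, 1, 2], positives={0, 4}, P=5))  # (0.4, 1.0, ~0.583)
```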
Baselines.

The recommendation methods for comparison are listed as follows:

• WMF: Weighted Matrix Factorization (Hu, Koren, and Volinsky 2008) is a popular rating prediction method for implicit data, which introduces confidence levels into the standard matrix factorization model.

• BPR: Bayesian Personalized Ranking (Rendle et al. 2009) is a widely used pairwise collaborative ranking approach, which transforms the original rating matrix into the form of independent pairs.

• Cofiset: Cofiset (Pan and Chen 2013a) defines the group preference as the mean value of each item in the group; the BPR loss function is then used for optimization.

• SQL-Rank: Stochastic Queuing Listwise Ranking (Wu, Hsieh, and Sharpnack 2018) is a state-of-the-art listwise approach, which breaks ties randomly and generates multiple possible permutations.

• DeepMF: Deep Matrix Factorization (Xue et al. 2017) is an NN-based matrix factorization model.

• Multi-VAE: Variational Autoencoders for Collaborative Filtering (Liang et al. 2018) is a state-of-the-art NN-based method, which extends variational autoencoders to recommendation from implicit feedback.

• Deep-BPR, Deep-SQL: These two methods have the same network architecture as Deep-SetRank, but we replace the loss function of SetRank with those of BPR and SQL-Rank, respectively. Hence, we obtain NN-based pairwise and listwise approaches.

• MF-SetRank, Deep-SetRank: These two methods are our proposed setwise Bayesian approaches for collaborative ranking from implicit feedback. We release our code at https://github.com/chadwang2012/SetRank.

Please note that WMF, BPR, SQL-Rank, and MF-SetRank are all implemented with a basic matrix factorization model, and the four "Deep" methods are all implemented with the same neural network architecture, so it is a fair setting for comparing the different item ranking approaches.
Parameter settings.
For the above baselines, we carefully explored the corresponding parameters, i.e., the number of dimensions and the regularization parameters. Besides, for SQL-Rank, we chose the ratio of subsampled unobserved items to positive items following the authors' guidance. For MF-SetRank, we tuned the learning rate and the decay rate over grids of candidate values, fixed the sampling ratio $\tau$, and tuned the number of dimensions $r$ in [50, 100, 150, 200, 250, 300] as well as the regularization parameter $\lambda$ over a grid of candidate values. For Multi-VAE, we set the encoder as a multi-layer MLP with the decoder mirroring it. For the other four "Deep" methods, we fixed the user and item networks as multi-layer MLPs with the same architecture. We then used the Adam (Kingma and Ba 2014) algorithm for optimization and tuned the learning rate.

Overall Performance Comparison
We present the overall recommendation performance of the nine methods in Table 1 under two settings, i.e., $P = 5$ and $P = 10$, since the top recommended items are much more important in practical scenarios. As shown in the results, Deep-SetRank achieves the best performance against all the baseline methods on every dataset. Specifically, Deep-SetRank outperforms the best baselines by an average relative boost of 4.28% for the precision metric on the four datasets. Besides, MF-SetRank achieves the best performance against all the other MF-based baselines. Specifically, MF-SetRank outperforms the state-of-the-art MF-based method, SQL-Rank, by an average relative boost of 11.57% for the precision metric. We can also observe that NN-based models have stronger embedding ability and can perform better than MF-based models. Nevertheless, it is notable that MF-SetRank achieves comparable performance with NN-based methods such as Multi-VAE and Deep-SQL. These outstanding performances clearly demonstrate the effectiveness of our setwise approaches. We can also observe that SetRank achieves the largest relative boost over the other baselines on the sparsest dataset, CiteULike, which shows its superior capacity for handling the sparsity problem. Another notable point is that the listwise approaches seem to perform better on top-5 metrics than on top-10 metrics, probably because they pay more attention to the top ranks in an item list. In contrast, SetRank treats every positive item and every unobserved item fairly and thus performs well on both top-5 and top-10 metrics.

Hyper-parameter Investigations
Effectiveness of negative sampling.
As mentioned in the Implementation section, it is unnecessary to utilize all the unobserved items for gradient calculations in SetRank. We can just randomly sample $\tau \cdot J_i$ negative items for each user $i$ in each epoch. Since the number of positive items $J_i$ is usually far smaller than the total number of items, there is little overlap among the unobserved items sampled for each user across different epochs. In this subsection, we fix all the other parameters and evaluate the influence of the sampling ratio $\tau$ on the final recommendation results. The precision results are shown in Figure 3. We find that when $\tau = 3$, the performance is already good enough; even if we further enlarge the value of $\tau$, the results do not improve significantly.

Figure 3: The performance of P@ with different values of sampling ratio τ on the four datasets.

Sensitivities of latent factors.

In this paper, we factorize the score matrix into the product of user and item latent factors in a low-rank space. Therefore, the rank $r$ of the latent space is quite influential on the result. If $r$ is too small, the model cannot fit the real-world data well, while if $r$ is too large, it may cause overfitting. We varied $r$ to train our method and present the results in Figure 4. We can observe that the performance of SetRank is not good when $r = 50$. With a larger value of $r$, the performance of MF-SetRank tends to be much better. Thus, we suggest adopting a large value of $r$ to get the best performance from MF-SetRank. By comparison, $r = 100$ seems to be good enough for Deep-SetRank.

Figure 4: The performance of P@ with different values of dimension r on the four datasets.

Conclusion
In this paper, we proposed a setwise Bayesian approach, namely SetRank, for collaborative ranking. SetRank is able to accommodate the characteristics of implicit feedback in recommender systems. Specifically, we first designed a novel setwise preference structure. Then, we maximized the posterior probability of the setwise preference structure to complete the Bayesian inference. In particular, we designed two implementations, MF-SetRank and Deep-SetRank. Moreover, we provided a theoretical analysis of SetRank showing that the bound of the excess risk can be proportional to $\sqrt{M/N}$. Finally, extensive experiments on four real-world datasets clearly validated the advantages of SetRank over various state-of-the-art baselines.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (No. 91746301, 61836013).
Appendix
Proof of Theorem 1
Proof. Owing to the independence assumption, the following equation holds:

$$p(>_{total} \mid X) = \prod_{i=1}^{N} p(>_i \mid X_i) = \prod_{i=1}^{N} \prod_{j \in P_i} p\left(Y_{ij} \le \min_{k \in O_i} \{Y_{ik}\} \,\Big|\, X_i\right), \tag{12}$$

where $\min_{k \in O_i} \{Y_{ik}\}$ obeys an exponential distribution with rate $\sum_{k \in O_i} \phi(X_{ik})$. Then we have

$$p\left(Y_{ij} \le \min_{k \in O_i} \{Y_{ik}\} \,\Big|\, X_i\right) = \int_0^\infty \phi(X_{ij})\, e^{-u \phi(X_{ij})}\, e^{-u \sum_{k \in O_i} \phi(X_{ik})}\, du = \frac{\phi(X_{ij})}{\phi(X_{ij}) + \sum_{k \in O_i} \phi(X_{ik})}. \tag{13}$$

With Equations 12 and 13, we obtain the conclusion. □
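Theorem 1 is easy to verify numerically: simulate the exponential "race" and compare the empirical frequency of the event with the closed form in Equation 13 (our sketch, with $\phi(x) = e^x$):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.exp
x_pos, x_unobs = 0.7, np.array([-0.2, 0.1, -1.0])   # one positive, K = 3 unobserved

n = 200_000
# numpy parameterizes the exponential by scale = 1 / rate
y_pos = rng.exponential(1.0 / phi(x_pos), size=n)
y_unobs = rng.exponential(1.0 / phi(x_unobs), size=(n, x_unobs.size))

empirical = np.mean(y_pos <= y_unobs.min(axis=1))
closed = phi(x_pos) / (phi(x_pos) + phi(x_unobs).sum())
print(empirical, closed)                             # should agree closely
```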
Proof of Theorem 2 and Theorem 3
To prove Theorem 2, we first follow (Wu, Hsieh, and Sharpnack 2018) and establish an important lemma that bounds the excess risk by an empirical process term.
Lemma 1
Suppose $\hat{X} := \arg\min_X -\log p(>_{total} \mid X)$ such that $X \in \mathcal{X}$, and suppose there is an $X^* \in \mathcal{X}$ such that $>_{total}$ is generated from $p(>_{total} \mid X^*)$. Then we have the following inequality, where the expectation $\mathbb{E}$ is over the draw of $>_i$:

$$D(X^*, \hat{X}) \le -\frac{1}{N} \sum_{i=1}^{N} \left( \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid \hat{X}_i)} - \mathbb{E} \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid \hat{X}_i)} \right). \tag{14}$$

Proof. Due to the optimality condition, we have

$$\sum_{i=1}^{N} -\log p(>_i \mid \hat{X}_i) \le \sum_{i=1}^{N} -\log p(>_i \mid X^*_i), \tag{15}$$

which is equivalent to

$$\frac{1}{N} \sum_{i=1}^{N} \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid \hat{X}_i)} \le 0.$$

Thus, it is easy to obtain the conclusion. □
As we can see from Lemma 1, if we fix $\hat{X}$, the empirical process term (the RHS of Equation 14) is a random function of the preference structure $>_{total}$ with mean zero. However, $\hat{X}$ is also random, so we have to uniformly bound the empirical process term over $\hat{X} \in \mathcal{X}$. To apply Dudley's chaining (Talagrand 2006), we first bound the variation between two preference scores $X_i$ and $X'_i$ with Lemma 2:
Lemma 2. Define the difference function $\Delta(>_i \mid X_i, X'_i) := \log \frac{p(>_i \mid X_i)}{p(>_i \mid X'_i)}$. If a single entry $Y_{il}$ changes, it may cause a transformation of the setwise preference structure, i.e., $>_i$ may be converted into $>'_i$. We can bound the variation of the difference function in the form

$$|\Delta(>_i \mid X_i, X'_i) - \Delta(>'_i \mid X_i, X'_i)| \le C\, \|\log \phi(X_i) - \log \phi(X'_i)\|_\infty, \tag{16}$$

where $C = 2 + 2 e^{2\alpha} J/K$.

Proof. If the change of $Y_{il}$ does not lead to a change of $P_i$ and $O_i$, there is no influence on the preference structure, i.e., $>_i = >'_i$ and $|\Delta(>_i \mid X_i, X'_i) - \Delta(>'_i \mid X_i, X'_i)| = 0$.

Otherwise, we assume that items $j' \in P_i$ and $k' \in O_i$ exchange their status with each other, so that in the new preference structure $>'_i$ we have $j' \in O'_i$ and $k' \in P'_i$. In the remainder of the proof, for ease of statement, we denote $\lambda_l = \phi(X_{il})$ and $\Lambda_j = \lambda_j + \sum_{k \in O_i \setminus \{k'\}} \lambda_k$; $\lambda'_l$ and $\Lambda'_j$ are defined analogously with $X'$. So we have

$$\Delta(>_i \mid X_i, X'_i) = \sum_{j \in P_i} \left( \log \frac{\lambda_j}{\lambda'_j} - \log \frac{\Lambda_j + \lambda_{k'}}{\Lambda'_j + \lambda'_{k'}} \right),$$

$$\Delta(>'_i \mid X_i, X'_i) = \sum_{j \in P_i \setminus \{j'\}} \left( \log \frac{\lambda_j}{\lambda'_j} - \log \frac{\Lambda_j + \lambda_{j'}}{\Lambda'_j + \lambda'_{j'}} \right) + \log \frac{\lambda_{k'}}{\lambda'_{k'}} - \log \frac{\Lambda_{j'} + \lambda_{k'}}{\Lambda'_{j'} + \lambda'_{k'}},$$

and thus

$$|\Delta(>_i \mid X_i, X'_i) - \Delta(>'_i \mid X_i, X'_i)| \le \left| \log \frac{\lambda_{j'}}{\lambda'_{j'}} - \log \frac{\lambda_{k'}}{\lambda'_{k'}} \right| + \sum_{j \in P_i \setminus \{j'\}} \left| \log \frac{\Lambda_j + \lambda_{j'}}{\Lambda'_j + \lambda'_{j'}} - \log \frac{\Lambda_j + \lambda_{k'}}{\Lambda'_j + \lambda'_{k'}} \right|.$$

Notice that

$$\left| \log \frac{\Lambda_j + \lambda_{j'}}{\Lambda'_j + \lambda'_{j'}} - \log \frac{\Lambda_j + \lambda_{k'}}{\Lambda'_j + \lambda'_{k'}} \right| \le \left| \log \frac{\Lambda_j + \lambda_{j'}}{\Lambda_j} - \log \frac{\Lambda'_j + \lambda'_{j'}}{\Lambda'_j} \right| + \left| \log \frac{\Lambda_j + \lambda_{k'}}{\Lambda_j} - \log \frac{\Lambda'_j + \lambda'_{k'}}{\Lambda'_j} \right|.$$

Hence, letting $\delta = \|\log \phi(X_i) - \log \phi(X'_i)\|_\infty$, we have

$$\left| \log \frac{\Lambda_j}{\Lambda'_j} \right| \le \max_l \left| \log \frac{\lambda_l}{\lambda'_l} \right| \le \delta.$$

Further, let $\beta_j = \max\{\lambda_{j'}/\Lambda_j,\, \lambda'_{j'}/\Lambda'_j\}$; then

$$\left| \log\left(1 + \frac{\lambda_{j'}}{\Lambda_j}\right) - \log\left(1 + \frac{\lambda'_{j'}}{\Lambda'_j}\right) \right| \le \left| \log(1 + \beta_j) - \log(1 + e^{-2\delta} \beta_j) \right| \le |1 - e^{-2\delta}|\, |\beta_j|.$$

Considering that $\Lambda_j \ge K e^{-\alpha}$, we can derive $\beta_j \le e^{2\alpha}/K$. A similar conclusion can be obtained for $\left| \log(1 + \lambda_{k'}/\Lambda_j) - \log(1 + \lambda'_{k'}/\Lambda'_j) \right|$.

Synthesizing the analysis above, we thus have

$$|\Delta(>_i \mid X_i, X'_i) - \Delta(>'_i \mid X_i, X'_i)| \le 2\delta + 2 \sum_{j \in P_i \setminus \{j'\}} |1 - e^{-2\delta}|\, e^{2\alpha}/K \le \delta\left(2 + 2 e^{2\alpha} J/K\right).$$

Consequently, letting $C = 2 + 2 e^{2\alpha} J/K$, we arrive at the conclusion. □
Proof of Theorem 2.
The empirical process function is defined as

$$\rho_N(X) := -\frac{1}{N} \sum_{i=1}^{N} \left( \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid X_i)} - \mathbb{E} \log \frac{p(>_i \mid X^*_i)}{p(>_i \mid X_i)} \right).$$

From Theorem 1, we know that $\rho_N(X)$ is a function of $N \times M$ independent exponential random variables. And from Lemma 2, we know that the change of the preference structure caused by the change of a single entry $Y_{il}$ is bounded. Specifically, the accumulated squares of the bounds are

$$\sum_{i=1}^{N} \sum_{l=1}^{M} C^2 \|\log \phi(X_i) - \log \phi(X'_i)\|_\infty^2 = M C^2 \|Z - Z'\|_{\infty,2}^2.$$

Then, according to McDiarmid's inequality (McDiarmid 1989), we have

$$p\left\{ N\left( \rho_N(X) - \rho_N(X') \right) > \epsilon \right\} \le \exp\left( -\frac{2\epsilon^2}{M C^2 \|Z - Z'\|_{\infty,2}^2} \right).$$

As a result, the stochastic process $\{N \rho_N(X) \mid X \in \mathcal{X}\}$ is a sub-Gaussian field with canonical distance $d(X, X') = \sqrt{M}\, C\, \|Z - Z'\|_{\infty,2}$. Following Dudley's chaining (Talagrand 2006), we obtain the conclusion. □

Proof of Theorem 3.
Wu, Hsieh, and Sharpnack (2018) have proved that $g(\mathcal{Z}) \le c' \sqrt{rN}$ in the personalized collaborative setting, where $c'$ is an absolute constant. Thus we can conclude the proof immediately. □

References

[Cao et al. 2007] Cao, Z.; Qin, T.; Liu, T.-Y.; Tsai, M.-F.; and Li, H. 2007. Learning to rank: from pairwise approach to listwise approach. In
Proceedings of the 24th International Conference on Machine Learning, 129–136. ACM.

[Chapelle and Keerthi 2010] Chapelle, O., and Keerthi, S. S. 2010. Efficient algorithms for ranking with SVMs. Information Retrieval.

[Chen et al. 2009] Chen, W.; Liu, T.-Y.; Lan, Y.; Ma, Z.-M.; and Li, H. 2009. Ranking measures and loss functions in learning to rank. In NIPS, 315–323.

[Freund et al. 2003] Freund, Y.; Iyer, R.; Schapire, R. E.; and Singer, Y. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research.

[He et al. 2017] He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; and Chua, T.-S. 2017. Neural collaborative filtering. In WWW, 173–182. International World Wide Web Conferences Steering Committee.

[Hsieh, Natarajan, and Dhillon 2015] Hsieh, C.-J.; Natarajan, N.; and Dhillon, I. S. 2015. PU learning for matrix completion. In ICML, 2445–2453.

[Hu, Koren, and Volinsky 2008] Hu, Y.; Koren, Y.; and Volinsky, C. 2008. Collaborative filtering for implicit feedback datasets. In ICDM, volume 8, 263–272. Citeseer.

[Huang et al. 2015] Huang, S.; Wang, S.; Liu, T.-Y.; Ma, J.; Chen, Z.; and Veijalainen, J. 2015. Listwise collaborative filtering. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 343–352. ACM.

[Kingma and Ba 2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Krohn-Grimberghe et al. 2012] Krohn-Grimberghe, A.; Drumond, L.; Freudenthaler, C.; and Schmidt-Thieme, L. 2012. Multi-relational matrix factorization using Bayesian personalized ranking for social network data. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, 173–182. ACM.

[Liang et al. 2018] Liang, D.; Krishnan, R. G.; Hoffman, M. D.; and Jebara, T. 2018. Variational autoencoders for collaborative filtering. In WWW, 689–698. International World Wide Web Conferences Steering Committee.

[Liu et al. 2019a] Liu, H.; Li, T.; Hu, R.; Fu, Y.; Gu, J.; and Xiong, H. 2019a. Joint representation learning for multi-modal transportation recommendation. In AAAI, 1036–1043.

[Liu et al. 2019b] Liu, H.; Tong, Y.; Zhang, P.; Lu, X.; Duan, J.; and Xiong, H. 2019b. Hydra: A personalized and context-aware multi-modal transportation recommendation system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2314–2324.

[Marlin and Zemel 2009] Marlin, B. M., and Zemel, R. S. 2009. Collaborative prediction and ranking with non-random missing data. In Proceedings of the Third ACM Conference on Recommender Systems, 5–12. ACM.

[McDiarmid 1989] McDiarmid, C. 1989. On the method of bounded differences. Surveys in Combinatorics.

[Mnih and Salakhutdinov 2008] Mnih, A., and Salakhutdinov, R. R. 2008. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, 1257–1264.

[Pan and Chen 2013a] Pan, W., and Chen, L. 2013a. Cofiset: Collaborative filtering via learning pairwise preferences over item-sets. In Proceedings of the 2013 SIAM International Conference on Data Mining, 180–188. SIAM.

[Pan and Chen 2013b] Pan, W., and Chen, L. 2013b. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In Twenty-Third International Joint Conference on Artificial Intelligence.

[Qin et al. 2019] Qin, C.; Zhu, H.; Zhu, C.; Xu, T.; Zhuang, F.; Ma, C.; Zhang, J.; and Xiong, H. 2019. DuerQuiz: A personalized question recommender system for intelligent job interview. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2165–2173. ACM.

[Rendle et al. 2009] Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 452–461. AUAI Press.

[Shi, Larson, and Hanjalic 2010] Shi, Y.; Larson, M.; and Hanjalic, A. 2010. List-wise learning to rank with matrix factorization for collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, 269–272. ACM.

[Talagrand 2006] Talagrand, M. 2006. The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Science & Business Media.

[Wang et al. 2016] Wang, S.; Huang, S.; Liu, T.-Y.; Ma, J.; Chen, Z.; and Veijalainen, J. 2016. Ranking-oriented collaborative filtering: A listwise approach. ACM Transactions on Information Systems (TOIS).

[Wang, Wang, and Yeung 2015] Wang, H.; Wang, N.; and Yeung, D.-Y. 2015. Collaborative deep learning for recommender systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1235–1244. ACM.

[Weimer et al. 2008] Weimer, M.; Karatzoglou, A.; Le, Q. V.; and Smola, A. J. 2008. CofiRank — maximum margin matrix factorization for collaborative ranking. In Advances in Neural Information Processing Systems, 1593–1600.

[Wu, Hsieh, and Sharpnack 2017] Wu, L.; Hsieh, C.-J.; and Sharpnack, J. 2017. Large-scale collaborative ranking in near-linear time. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 515–524. ACM.

[Wu, Hsieh, and Sharpnack 2018] Wu, L.; Hsieh, C.-J.; and Sharpnack, J. 2018. SQL-Rank: A listwise approach to collaborative ranking. In Proceedings of the 35th International Conference on Machine Learning, volume 80, 5315–5324.

[Xia et al. 2008] Xia, F.; Liu, T.-Y.; Wang, J.; Zhang, W.; and Li, H. 2008. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning, 1192–1199. ACM.

[Xia 2019] Xia, L. 2019. Learning and decision-making from rank data. Synthesis Lectures on Artificial Intelligence and Machine Learning.

[Xue et al. 2017] Xue, H.-J.; Dai, X.; Zhang, J.; Huang, S.; and Chen, J. 2017. Deep matrix factorization models for recommender systems. In IJCAI, 3203–3209.

[Zhu et al. 2018] Zhu, C.; Zhu, H.; Xiong, H.; Ma, C.; Xie, F.; Ding, P.; and Li, P. 2018. Person-job fit: Adapting the right talent for the right job with joint representation learning.