Revisit Recommender System in the Permutation Prospective
Yufei Feng, Yu Gong, Fei Sun, Qingwen Liu, Wenwu Ou
Alibaba Group, Hangzhou. [email protected], {gongyu.gy,ofey.sf,xiangsheng.lqw,santong.oww}@alibaba-inc.com
ABSTRACT
Recommender systems (RS) work effectively at alleviating information overload and matching user interests in various web-scale applications. Most RS retrieve the user's favorite candidates and then rank them by their rating scores in a greedy manner. From the permutation perspective, however, current RS reveal the following two limitations: 1) they neglect the permutation-variant influence within the recommended results; 2) considering permutations extends the latent solution space exponentially, and current RS lack the ability to evaluate permutations. Both drive RS away from permutation-optimal recommended results and better user experience. To approximate the permutation-optimal recommended results effectively and efficiently, we propose a novel permutation-wise framework PRS in the re-ranking stage of RS, which consists of the successive Permutation-Matching (PMatch) and Permutation-Ranking (PRank) stages. Specifically, the PMatch stage is designed to obtain the candidate list set, where we propose the FPSA algorithm to generate multiple candidate lists via a permutation-wise and goal-oriented beam search. Afterwards, for the candidate list set, the PRank stage provides a unified permutation-wise ranking criterion named the LR metric, which is calculated from the rating scores of the elaborately designed permutation-wise model DPWN. Finally, the list with the highest LR score is recommended to the user. Empirical results show that PRS consistently and significantly outperforms state-of-the-art methods. Moreover, PRS has achieved performance improvements of 11.0% on the PV metric and 8.7% on the IPV metric after successful deployment in one popular recommendation scenario of the Taobao application.
KEYWORDS
Recommender System; Re-ranking; Permutation-wise
ACM Reference Format:
Yufei Feng, Yu Gong, Fei Sun, Qingwen Liu, Wenwu Ou. 2021. Revisit Recommender System in the Permutation Prospective. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Recommender systems (RS) have been widely deployed in various web-scale applications, including e-commerce [10, 14, 28, 34], social
Figure 1: (1) shows two real-world cases. (2) and (3) present the comparison of list-wise and permutation-wise methods when meeting the two cases.
media [10, 17], and news [11, 23, 24]. Typically, web-scale RS consist of three successive stages, i.e., matching, ranking, and re-ranking. With the continuous advances in recommendation technologies, various methods have been proposed in each stage to improve user experience and recommendation performance. Overall, most methods in RS contribute to retrieving the user's favorite items from tremendous candidates, followed by a greedy strategy based on the rating scores of given user-item pairs.
Different from the point-wise methods of the matching and ranking stages, various list-wise methods [1, 25, 26, 35] have been proposed and deployed in the re-ranking stage. The widely adopted pipeline of existing list-wise methods usually contains three key components: 1) Ranking: the initial list is generated according to the scores of the basic ranking model; 2) Refining: the list-wise feature distribution of the initial list is extracted by a well-designed module (e.g., LSTM [1] or self-attention [25, 26]) to refine the rating scores; 3) Re-ranking: candidate items are re-ranked by the refined list-wise rating scores in a greedy manner. Overall, existing list-wise methods achieve improvements in recommendation mainly by modeling the list-wise feature distribution to refine each item's rating score.
From the permutation perspective, however, this popular re-ranking pipeline cannot lead current RS to permutation-optimal results, because it neglects the permutation-variant influence within the recommended results. As shown in Fig. 1 (1), here are two recommended results collected from a real-world dataset. Surprisingly, even though they are composed of the same items and exhibited to the same user, the user responds to permutation 2 rather than permutation 1. One possible reason is that placing the more expensive item B ahead stimulates the user's desire to buy the cheaper item A. We refer to such influence on the user's decision making caused by the change of permutations as the permutation-variant influence within the recommended results.
Figure 2: Difference of architecture between current RS and PRS in the re-ranking stage.
By addressing such permutation-variant influence, the ideal method should recommend permutation 2 to the user, which is the permutation-optimal result from the permutation perspective. In contrast, after supervised training on item A in permutation 2, the list-wise models in Fig. 1 (2) can only acquire the priority of item A given the shared initial list (e.g., items B, A, C). The list-wise models then re-rank items in a greedy manner by the refined rating scores (i.e., producing permutation 1) and miss the better permutation 2. Ideally, with the permutation-wise methods in Fig. 1 (3), the predicted interaction probability of item A is greatly improved by placing item B ahead of A (from permutation 1 to 2), which leads to the correct judgment that permutation 2 is better than permutation 1. Accordingly, fully considering and leveraging the permutation-variant influence contained in lists is the key to approximating permutation-optimal recommended results and better user experience.
Practically, the permutation perspective brings new challenges to current RS, which can be summarized in the following two aspects: 1) Exponential solutions.
Suppose n (e.g., 10) items need to be recommended from the initial list of size m (e.g., 100); considering permutations, there are up to A_m^n (about O(100^10)) candidate lists. Given the permutation-variant influence within the final item lists, the user will respond differently to each of these lists. In fact, only one list will finally be recommended to the user, so reaching the permutation-optimal list effectively and efficiently is a new challenge for current RS; 2) Permutation-wise evaluation.
Most methods applied in current RS attempt to predict the user-item interaction probability point-wisely (e.g., click-through rate and conversion rate). As mentioned above, it is necessary to provide a unified permutation-wise ranking criterion for selecting the permutation-optimal one from massive qualified lists, which existing efforts have not addressed.
In this work, as shown in Fig. 2, we promote the list-wise methods to a novel permutation-wise framework named PRS (Permutation Retrieve System) in the re-ranking stage of RS. Specifically, PRS consists of the successive PMatch (Permutation-Matching) and PRank (Permutation-Ranking) stages. The PMatch stage focuses on generating multiple candidate lists in an efficient and parallel way, with the aim of considering the permutation-variant influence and alleviating the problem of exponential solutions. Here we propose FPSA (Fast Permutation Searching Algorithm), a permutation-wise and goal-oriented beam search algorithm, to generate candidate lists efficiently. Afterwards, the PRank stage provides a unified ranking criterion to meet the challenge of permutation-wise evaluation. We equip the proposed model DPWN (Deep Permutation-Wise Network) with a Bi-LSTM, with which the permutation-variant influence can be fully addressed. Moreover, we propose the LR (List Reward) metric to rank the candidate lists, calculated by summing DPWN's rating scores over the items of each candidate list. Finally, the list with the highest LR score is recommended to the user.
To demonstrate the effectiveness of PRS, we perform a series of experiments on a public dataset from Alimama and a proprietary dataset from the Taobao application. Experimental results demonstrate that PRS consistently and significantly outperforms various state-of-the-art methods. Moreover, PRS has been successfully deployed in one popular recommendation scenario of the Taobao application and gained performance improvements of 11.0% on the PV (Page View) metric and 8.7% on the IPV (Item Page View) metric.
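To make the scale of the first challenge concrete, the number of ordered lists of length n drawn from m candidates can be computed directly; here is a quick check with the example sizes assumed above (m = 100, n = 10):

```python
from math import perm  # Python 3.8+

m, n = 100, 10  # initial-list size and number of recommended slots (example values)

# Number of ordered item lists of length n drawn from m candidates: A(m, n) = m!/(m-n)!
print(f"A({m},{n}) = {perm(m, n):.3e}")  # ~6.3e19, far too many to score exhaustively
```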
2 RELATED WORK
In this section, we review the studies most related to the re-ranking stage in both academic and industrial recommender systems. Typically, the re-ranking stage serves as the last stage after the matching and ranking stages, where the initial ranking list obtained from the previous stages is re-ranked into the final item list (i.e., the recommended results) to better satisfy user demands. Roughly speaking, existing re-ranking methods fall into three categories: point-wise, pair-wise, and list-wise methods.
Point-wise methods [9, 10, 19] regard the recommendation task as a binary classification problem and globally learn a scoring function for a given user-item pair with manual feature engineering. Despite continuous achievements, these methods neglect the comparison and mutual influence between items in the input ranking list. To solve this problem, pair-wise and list-wise models have gradually received more attention in current RS. Pair-wise models consider the semantic distance of one item pair at a time: RankSVM [20], GBRank [32], and RankNet [5] apply SVM, GBT, and DNN, respectively, with carefully designed pair-wise loss functions to compare item pairs in the input ranking list. However, pair-wise methods neglect the local information in the list and increase model complexity. List-wise models are proposed to capture the interior correlations among the items of a list in different ways. LambdaMART [6] is a well-known tree-based method with a list-wise loss function. MIDNN [35] relies on handmade global list-wise features, which requires much domain knowledge and limits generalization. DLCM [1], PRM [26], and SetRank [25] apply GRU, self-attention, and induced self-attention, respectively, to encode the list-wise information of the input ranking list for better prediction. Though effective, this type of list-wise model does not escape the paradigm of greedy ranking based on rating scores, where the permutation-variant influence within the recommended results is not fully addressed.
There are also works [7, 15, 16, 29] focusing on the trade-off between relevance and diversity in the re-ranking stage. Another emerging direction is group-wise methods [2], where the relevance score of a document is determined jointly by multiple documents in the list. Though effective, group-wise methods may not be appropriate for industrial recommender systems due to their high computation complexity (at least O(N^2)). Note that our proposed PRS can be easily integrated with existing works by deploying them in parallel and merging their outputs into the candidate list set in the PMatch stage.

3 PROBLEM FORMULATION
In general, a web-scale recommender system (e.g., for e-commerce or news) is composed of three stages chronologically: matching, ranking, and re-ranking. In this paper, we focus on the final re-ranking stage, whose input is the ranking list produced by the previous two stages (i.e., matching and ranking). The task of re-ranking is to elaborately select candidates from the input ranking list and rearrange them into the final item list exhibited to users. Mathematically, with user set U and item set I, we denote the labeled list interaction records as R = {(u, C, V, Y^CTR, Y^NEXT) | u ∈ U, V ⊂ C ⊂ I}. Here, C and V represent the recorded input ranking list with m items for the re-ranking stage and the recorded final item list with n items exhibited to user u, respectively; intuitively, n ≤ m. y_t^CTR ∈ Y^CTR is the implicit feedback of user u w.r.t. the t-th item v_t ∈ V, where y_t^CTR = 1 when positive feedback (e.g., a click) is observed and y_t^CTR = 0 otherwise; analogously, y_t^NEXT ∈ Y^NEXT equals 1 if the user continues browsing after the t-th item and 0 otherwise. Each user u is associated with a user profile x_u consisting of sparse features x_s^u (e.g., user id and gender) and dense features x_d^u (e.g., age), while each item i is associated with an item profile x_i consisting of sparse features x_s^i (e.g., item id and brand) and dense features x_d^i (e.g., price).
Given the above definitions, we now formulate the re-ranking task to be addressed in this paper:
Definition 1. Task Description. Typically, the purpose of an industrial RS is to make users perform more browsing (PV, page view) and more interactions (IPV, item page view), and the same holds for the re-ranking task. Given a certain user u and his/her input ranking list C, the task is to learn a re-ranking strategy π: C → P, which selects and rearranges items from C and subsequently recommends the final item list P.
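For concreteness, one labeled record of R can be sketched as a small data structure; the class and field names below are ours, not the paper's:

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch of one labeled list-interaction record in R, following the
# notation above; values are toy examples.
@dataclass
class ListRecord:
    user_id: str
    input_ranking: List[str]   # C: m candidate items from matching/ranking
    final_list: List[str]      # V: the n items actually exhibited (V subset of C, n <= m)
    y_ctr: List[int]           # Y^CTR: 1 if item t was clicked, else 0
    y_next: List[int]          # Y^NEXT: 1 if the user kept browsing after item t, else 0

record = ListRecord(
    user_id="u42",
    input_ranking=[f"i{j}" for j in range(8)],  # toy m = 8
    final_list=["i3", "i0", "i5", "i1"],        # toy n = 4
    y_ctr=[0, 1, 0, 0],
    y_next=[1, 1, 1, 0],  # the user left after the fourth item
)
assert len(record.final_list) == len(record.y_ctr) == len(record.y_next)
```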
Table 1: Key notations.
U, I: the sets of users and items, respectively.
C, V: the recorded input ranking list and the labeled final item list, respectively.
Y^CTR, Y^NEXT: the implicit feedback on whether the user clicks, or continues browsing after, each item of the final item list, respectively.
m, n: the numbers of items in the recorded input ranking list and the final item list, respectively.
π, P: the learned re-ranking strategy and the generated final item list, respectively.
x_s^u (x_s^i), x_d^u (x_d^i): the sparse and dense feature sets of users (items), respectively.
α, β: the fusion coefficients of r_PV and r_IPV in FPSA, respectively.
r_PV, r_IPV, r_sum: the estimated PV, IPV, and mixed rewards of a list, respectively.
S: the candidate list set generated by the PMatch stage.

4 THE PROPOSED FRAMEWORK
In this section, we introduce our proposed permutation-wise framework PRS, which aims to leverage the permutation-variant influence effectively and efficiently for better item rearrangement in the re-ranking stage of RS. To address the exponential-solution and permutation-wise-evaluation challenges, we transform the re-ranking task into two successive stages: Permutation-Matching (PMatch) and Permutation-Ranking (PRank). In the PMatch stage, multiple efficient methods can be deployed in parallel to consider the permutation-variant influence and search for effective candidate item lists among the exponentially many solutions. Successively, the PRank stage proposes the LR metric as a unified permutation-wise ranking criterion for the candidate list set, with which the list with the highest LR score is finally recommended. The key notations used throughout the article are summarized in Tab. 1.
First of all, we begin with the representations of users and items, which are the basic inputs of our proposed framework. Following previous works [10, 34], we parameterize the available profiles into vector representations of users and items. Given a user u, associated with sparse features x_s^u and dense features x_d^u, we embed each sparse feature value into a d-dimensional space. Each user u can then be represented as x_u ∈ R^(|x_s^u| × d + |x_d^u|), where |x_s^u| and |x_d^u| denote the sizes of the sparse and dense feature spaces of user u, respectively. Similarly, we represent each item i as x_i ∈ R^(|x_s^i| × d + |x_d^i|). Naturally, we represent the recorded input ranking list C as C = [x_c^1, ..., x_c^m] and the labeled final item list V as V = [x_v^1, ..., x_v^n], where m and n are the numbers of items in the input ranking list and the final item list, respectively.
In the following sections, we zoom into the PMatch and PRank stages of the proposed permutation-wise framework PRS. For each stage, we split the introduction into two parts: offline training (preparation) and online serving (inference). The major difference between the two parts is that V, Y^CTR, and Y^NEXT in R are given in offline training but not in online serving.
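As an illustration of the representation construction above, here is a minimal NumPy sketch; the feature names, vocabulary sizes, and values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding size per sparse feature (the paper's setting in Section 5)

# Toy embedding tables for two sparse user features (vocabulary sizes are made up).
embed_tables = {
    "user_id": rng.normal(size=(1000, d)),
    "gender": rng.normal(size=(3, d)),
}

def user_representation(sparse: dict, dense: list) -> np.ndarray:
    """Embed each sparse feature value, then concatenate with dense features,
    giving x_u of dimension |x_s^u| * d + |x_d^u|."""
    parts = [embed_tables[name][idx] for name, idx in sparse.items()]
    parts.append(np.asarray(dense, dtype=float))
    return np.concatenate(parts)

x_u = user_representation({"user_id": 42, "gender": 1}, dense=[0.31])  # e.g., normalized age
assert x_u.shape == (2 * d + 1,)
```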
4.1 Permutation-Matching (PMatch) Stage
Figure 3: Overall architecture of the PRS framework.
As motivated, the Permutation-Matching (PMatch) stage is proposed to obtain multiple effective candidate lists in an efficient way. As shown in the left part of Fig. 3, various algorithms [1, 25, 26, 35] can be deployed in parallel to generate item lists, which are then merged into the candidate list set. Though effective, current efforts neglect the permutation-variant influence within the final item list. To leverage the permutation-variant influence in an efficient way, we propose a permutation-wise and goal-oriented beam search algorithm named FPSA (Fast Permutation Searching Algorithm).
Offline training. To make users perform more browsing and more interactions, besides the commonly applied CTR score [14, 34], we first design the NEXT score, which predicts the probability that the user will continue browsing after viewing an item. Items with higher NEXT scores increase the user's desire for continuous browsing, which improves the probability that subsequent items are browsed and clicked.
Specifically, with the labeled interaction records R, we elaborate two point-wise models, the CTR prediction model M_CTR(v_t | u; Θ_CTR) and the NEXT prediction model M_NEXT(v_t | u; Θ_NEXT), calculated as follows:

ŷ_t^CTR = M_CTR(v_t | u; Θ_CTR) = σ( f(f(f(x_v^t ⊕ x_u))) ),
ŷ_t^NEXT = M_NEXT(v_t | u; Θ_NEXT) = σ( f(f(f(x_v^t ⊕ x_u))) ),    (1)

where f(x) = ReLU(Wx + b), σ(·) is the logistic function, and the parameters of the different f(·)s are not shared. The two models can be optimized via the binary cross-entropy loss:

L_CTR = -(1/N) Σ_{(u,V)∈R} Σ_{x_v^t ∈ V} [ y_t^CTR log ŷ_t^CTR + (1 - y_t^CTR) log(1 - ŷ_t^CTR) ],
L_NEXT = -(1/N) Σ_{(u,V)∈R} Σ_{x_v^t ∈ V} [ y_t^NEXT log ŷ_t^NEXT + (1 - y_t^NEXT) log(1 - ŷ_t^NEXT) ],    (2)

where N is the number of records. We train Θ_CTR and Θ_NEXT until convergence according to Eq. (2).
Online serving. In this work, we regard the re-ranking task as selecting items from the input ranking list sequentially until a pre-defined length is reached. Beam search [21, 30] is a common technique for generating multiple effective lists by exploring and expanding the most promising candidates in a limited set. In the FPSA algorithm, we implement beam search in a goal-oriented way: at each step, we keep the lists with the highest estimated reward.
We present and illustrate the proposed FPSA algorithm in Fig. 4 and Alg. 1. First, we attach two predicted scores to each item c_i in the input ranking list C using the converged CTR prediction model M_CTR and NEXT prediction model M_NEXT: the click-through probability P_{c_i}^CTR and the continue-browsing probability P_{c_i}^NEXT. Afterwards, at each step we exhaustively expand each list in the set S with the remaining candidates and reserve the top k candidate lists according to their estimated rewards (lines 1-16). In lines 18-27, while iterating through the items of a list, we maintain the transitive expose probability p_Expose, the product of the NEXT scores of the preceding items. Based on p_Expose, we accumulate the PV reward r_PV and the IPV reward r_IPV, and fuse them into the reward r_sum at the end of the iteration. Here r_PV, r_IPV, and r_sum represent the estimated PV, IPV, and mixed rewards of the list, respectively. At the end of the algorithm, we obtain the candidate list set S = {O_1, ..., O_k} of size k, from which the top list according to the reward r_sum can be selected directly. Moreover, the final item lists generated by other methods (e.g., DLCM and PRM) can be merged into S and further ranked by the successive PRank stage.
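Returning to the offline-training models above, the following is a minimal NumPy sketch of the towers in Eq. (1) and the per-item loss of Eq. (2). It is illustrative only (toy dimensions, random weights, no training loop), not the authors' TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu = lambda z: np.maximum(z, 0.0)

def make_tower(in_dim, hidden=(128, 64, 32)):
    """Three f(x) = ReLU(Wx + b) layers plus a sigmoid output, as in Eq. (1);
    the CTR and NEXT towers share this structure but not their parameters."""
    dims = (in_dim,) + hidden
    layers = [(rng.normal(scale=0.1, size=(o, i)), np.zeros(o))
              for i, o in zip(dims[:-1], dims[1:])]
    out = (rng.normal(scale=0.1, size=dims[-1]), 0.0)
    return layers, out

def tower_forward(x, tower):
    layers, (w_out, b_out) = tower
    for W, b in layers:
        x = relu(W @ x + b)
    return sigmoid(w_out @ x + b_out)

def bce(y, y_hat):
    """Binary cross-entropy of Eq. (2) for a single item."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

x_uv = rng.normal(size=24)  # x_v ⊕ x_u for one user-item pair (toy size)
ctr_tower, next_tower = make_tower(24), make_tower(24)
print(tower_forward(x_uv, ctr_tower), bce(1, tower_forward(x_uv, ctr_tower)))
```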
We make several efforts to improve the efficiency of the FPSA algorithm. Specifically, we deploy the CTR and NEXT prediction models in parallel within the ranking stage to compute the two scores for each item, which adds O(1) complexity. For Alg. 1, we run the loops in lines 4-13 in parallel and adopt the min-max heap algorithm [3] to select the top-k results in line 14, reducing the complexity of Alg. 1 to O(n(n + k log k)). Overall, the complexity of FPSA is a·O(1) + b·O(n(n + k log k)), which is more efficient than existing list-wise methods (e.g., DLCM [1] and PRM [26]) at a·O(n) + b·O(n log n); here a ≫ b reflects that the matrix calculations in deep models cost much more than the numerical calculations.
Figure 4: Illustration of the FPSA algorithm.
Algorithm 1: Fast Permutation Searching Algorithm (FPSA).
Input: input ranking list C; CTR score list P^CTR; NEXT score list P^NEXT; output length n; beam size k; fusion coefficients α, β.
Output: candidate list set S.
1:  S = {[ ]}; estimated reward set R = {}    // beam search
2:  for i = 1, 2, ..., n do
3:    S_t = S; clear S and R
4:    for O ∈ S_t do
5:      for c_i ∈ C do
6:        if c_i ∉ O then
7:          obtain O_t by appending c_i to O
8:          r = Calculate-Estimated-Reward(O_t)
9:          R ← R ∪ {r}
10:         S ← S ∪ {O_t}
11:       end if
12:     end for
13:   end for
14:   according to R, S ← top k of S
15: end for
16: return S
17: // calculate reward
18: function Calculate-Estimated-Reward(O)
19:   r_PV ← 1; r_IPV ← 0; p_Expose ← 1
20:   for c_i ∈ O do
21:     r_PV ← r_PV + p_Expose · P_{c_i}^NEXT
22:     r_IPV ← r_IPV + p_Expose · P_{c_i}^CTR
23:     p_Expose ← p_Expose · P_{c_i}^NEXT
24:   end for
25:   r_sum ← α · r_PV + β · r_IPV
26:   return r_sum
27: end function
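A compact Python transcription of Alg. 1 follows, under the simplification that candidate expansion and top-k selection run serially (the production version parallelizes the loops and uses min-max heaps, as noted above); the item ids and scores in the toy run are fabricated:

```python
def estimate_reward(item_list, p_ctr, p_next, alpha, beta):
    """Lines 18-27 of Alg. 1: r_PV starts at 1 (the first item is always exposed),
    and p_expose decays by the NEXT score of each preceding item."""
    r_pv, r_ipv, p_expose = 1.0, 0.0, 1.0
    for item in item_list:
        r_pv += p_expose * p_next[item]
        r_ipv += p_expose * p_ctr[item]
        p_expose *= p_next[item]
    return alpha * r_pv + beta * r_ipv

def fpsa(candidates, p_ctr, p_next, n, k, alpha, beta):
    """Goal-oriented beam search of Alg. 1: grow every kept list by one unused
    item per step and keep the k lists with the highest estimated reward."""
    beams = [()]
    for _ in range(n):
        expanded = [
            (estimate_reward(beam + (c,), p_ctr, p_next, alpha, beta), beam + (c,))
            for beam in beams for c in candidates if c not in beam
        ]
        expanded.sort(key=lambda pair: pair[0], reverse=True)  # heap-based top-k in production
        beams = [beam for _, beam in expanded[:k]]
    return beams

# Toy run: 6 scored candidates, n = 3 slots, beam size k = 2 (scores fabricated).
p_ctr = {"a": 0.10, "b": 0.30, "c": 0.05, "d": 0.20, "e": 0.15, "f": 0.02}
p_next = {"a": 0.90, "b": 0.60, "c": 0.95, "d": 0.70, "e": 0.80, "f": 0.99}
print(fpsa(list(p_ctr), p_ctr, p_next, n=3, k=2, alpha=1.0, beta=1.0))
```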
4.2 Permutation-Ranking (PRank) Stage
As motivated, the PRank stage is proposed to provide a unified permutation-wise ranking criterion for the candidate list set generated by the PMatch stage. Most methods in current RS follow the greedy strategy based on rating scores. Though effective, such a strategy neglects the permutation-variant influence within the final item list and is thus incapable of evaluating permutations. To this end, as shown in the right part of Fig. 3, we propose the LR (List Reward) metric to select the permutation-optimal list from the candidate list set provided by the PMatch stage; it is calculated from the rating scores of the elaborately designed permutation-wise model DPWN (Deep Permutation-Wise Network).
Figure 5: Technical architecture of the DPWN model.
Offline training. The overall architecture of DPWN is shown in Fig. 5. As motivated, DPWN is proposed to capture and model the permutation-variant influence contained in the final item list. With this motivation, DPWN M(x_v^t | u, V; Θ_D), parameterized by Θ_D, predicts the permutation-wise interaction probability between user u and the t-th item x_v^t of the final item list V. Naturally, a Bi-LSTM [18] excels at capturing such time-dependent and long- and short-term information within the permutation. Mathematically, the forward output state for the t-th item x_v^t is calculated as follows:

i_t = σ(W_xi x_v^t + W_hi h_{t-1} + W_ci c_{t-1} + b_i),
f_t = σ(W_xf x_v^t + W_hf h_{t-1} + W_cf c_{t-1} + b_f),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_v^t + W_hc h_{t-1} + b_c),
o_t = σ(W_xo x_v^t + W_ho h_{t-1} + W_co c_t + b_o),
h_t^(f) = o_t ⊙ tanh(c_t),    (3)

where σ(·) is the logistic function, and i, f, o, and c are the input gate, forget gate, output gate, and cell vectors, all of the same size as x_v^t. The shapes of the weight matrices are indicated by their subscripts. Similarly, we obtain the backward output state h_t^(b). We then concatenate the two output states, h_t = h_t^(f) ⊕ h_t^(b), and denote h_t as the sequential representation of x_v^t.
Owing to its powerful ability to model complex interactions in the CTR prediction field [14, 27, 33, 34], we integrate a multi-layer perceptron (MLP) into DPWN for better feature interaction. Hence, we formalize DPWN as follows:

M(x_v^t | u, V; Θ_D) = σ( f(f(f(x_u ⊕ x_v^t ⊕ h_t))) ),    (4)

where f(x) = ReLU(Wx + b) and σ(·) is the logistic function. The parameter set of DPWN is Θ_D = {W_*, b_*}, i.e., the union of the parameters of the Bi-LSTM and the MLP.
Clearly, DPWN can be optimized via the binary cross-entropy loss:

L_D = -(1/N) Σ_{(u,V)∈R} Σ_{x_v^t ∈ V} [ y_uv^t log ŷ_uv^t + (1 - y_uv^t) log(1 - ŷ_uv^t) ],    (5)

where, for convenience, ŷ_uv^t = M(x_v^t | u, V; Θ_D) and y_uv^t ∈ {0, 1} is the ground truth. We optimize the parameters Θ_D by minimizing L_D.
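The following is a compact NumPy sketch of the DPWN forward pass under simplified assumptions (no peephole terms, a single hidden MLP layer instead of three, toy dimensions, random weights); it illustrates Eq. (3)-(4) rather than reproducing the authors' TensorFlow model:

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, h_dim = 16, 8  # toy item-vector and LSTM sizes

def make_lstm():
    # The 4*h_dim rows hold the input (i), forget (f), candidate, and output (o) blocks.
    return (rng.normal(scale=0.1, size=(4 * h_dim, x_dim)),
            rng.normal(scale=0.1, size=(4 * h_dim, h_dim)),
            np.zeros(4 * h_dim))

def lstm_pass(xs, params):
    """One directional pass of Eq. (3), returning the hidden state at every step."""
    W, U, b = params
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c, outs = np.zeros(h_dim), np.zeros(h_dim), []
    for x in xs:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        outs.append(h)
    return outs

def dpwn_scores(x_u, item_xs, fwd, bwd, W1, b1, w2, b2):
    """Eq. (4): concatenate the user vector, item vector, and bidirectional state
    h_t, then apply an MLP (one hidden ReLU layer here) with a sigmoid output."""
    hs = [np.concatenate([hf, hb]) for hf, hb in
          zip(lstm_pass(item_xs, fwd), lstm_pass(item_xs[::-1], bwd)[::-1])]
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return [float(sig(w2 @ np.maximum(W1 @ np.concatenate([x_u, x_t, h_t]) + b1, 0) + b2))
            for x_t, h_t in zip(item_xs, hs)]

# Toy final list of n = 4 items for one user.
x_u = rng.normal(size=x_dim)
items = [rng.normal(size=x_dim) for _ in range(4)]
fwd, bwd = make_lstm(), make_lstm()
in_dim = 2 * x_dim + 2 * h_dim
W1, b1 = rng.normal(scale=0.1, size=(32, in_dim)), np.zeros(32)
w2, b2 = rng.normal(scale=0.1, size=32), 0.0
print(dpwn_scores(x_u, items, fwd, bwd, W1, b1, w2, b2))  # per-item probabilities
```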
Note that DPWN models the final item list rather than the input ranking list: practically, the user is influenced by the permutation information of the exhibited final item list.
Online serving. As mentioned above, we propose the unified permutation-wise LR metric based on the DPWN model, which is applied to obtain the permutation-optimal list from the candidate list set generated in the PMatch stage. Specifically, we calculate the LR (list reward) score of each list O_t in the candidate list set S as follows:

LR(O_t) = Σ_{x_o^i ∈ O_t} M(x_o^i | u, O_t; Θ_D).    (6)

Afterwards, the list P with the highest LR score is finally selected and recommended to the user, with the hope of obtaining the permutation-optimal list and best meeting the user's demands. Finally, we refer to the online serving parts of the PMatch and PRank stages of the PRS framework as the learned re-ranking strategy π, which generates the permutation-optimal list P from the input ranking list C.

5 EXPERIMENTS
In this section, we perform a series of online and offline experiments to verify the effectiveness of the proposed framework PRS, aiming to answer the following questions:
• RQ1: How precisely does the proposed LR metric in the PRank stage evaluate the profits of permutations?
• RQ2: How effectively does the proposed FPSA algorithm in the PMatch stage generate the candidate list set from the exponentially many list solutions?
• RQ3: How does PRS perform in real-world recommender systems?
5.1 Experimental Setup
Datasets. We conduct extensive experiments on two real-world datasets: a proprietary dataset from the Taobao application and a public dataset from Alimama (https://tianchi.aliyun.com/dataset/dataDetail?dataId=56):
• Rec dataset consists of large-scale list interaction logs collected from the Taobao application, one of the most popular e-commerce platforms. It contains user profiles (e.g., id, age, and gender), item profiles (e.g., id, category, and brand), the input ranking list provided by the previous stages for each user, and the labeled final item list.
• Ad dataset records interactions between users and advertisements and contains user profiles (e.g., id, age, and occupation) and item profiles (e.g., id, campaign, and brand). According to the timestamps at which each user browsed the advertisements, we slice each user's records into lists of 10 items. Moreover, we mix an additional 90 items with each final item list to form the input ranking list, making the dataset more suitable for re-ranking.
Table 2: Statistics of datasets.
The detailed statistics of the two datasets are shown in Tab. 2. For both datasets, we randomly split the entire records into training and test sets, i.e., we utilize 90% of the records to predict the remaining 10% (we hold out 10% of the training data as a validation set for parameter tuning).
Baselines. We select two kinds of representative methods as baselines, point-wise and list-wise methods, which are widely adopted in most industrial recommender systems. Point-wise methods (i.e., DNN and DeepFM) predict the interaction probability of a given user-item pair from raw features derived from the user and item profiles. List-wise methods (i.e., MIDNN, DLCM, and PRM) extract list-wise information in different ways. The comparison methods are detailed below:
• DNN [10] is a standard deep learning method in industrial recommender systems, which applies an MLP for complex feature interaction.
• DeepFM [19] is a general deep model for recommendation, which combines a factorization-machine component and a deep neural network component.
• MIDNN [35] extracts list-wise information from the input ranking list with complex handmade feature engineering.
• DLCM [1] first applies a GRU to encode the input ranking list into a local vector, and then combines the global vector and each feature vector to learn a powerful scoring function for list-wise re-ranking.
• PRM [26] applies the self-attention mechanism to explicitly capture the mutual influence between items in the input ranking list.
• DPWN is the elaborately designed permutation-wise model of this work, based on which we propose the LR metric.
To answer RQ1, we compute the sum of the rating scores of the recorded final item list for each baseline (denoted SR for distinction), while our proposed LR metric is calculated by DPWN in the same pattern. To answer RQ2, we compare the ranking results of the baselines with the list of highest estimated reward r_sum from FPSA under the LR metric. To answer RQ3, we deploy the PMatch stage with the FPSA algorithm and the PRank stage with the LR metric based on the DPWN model, and report the online performance of each stage and of the complete PRS framework.
Note that pair-wise and group-wise methods are not selected as baselines in our experiments due to their high training or inference complexities (O(N^2)) compared with point-wise (O(1)) or list-wise (O(N)) models, which may not be appropriate for industrial RS.
Table 3: Overall performance comparison w.r.t. interaction probability prediction (bold: best; underline: runner-up).
Model | Rec: Loss / AUC / Pearson | Ad: Loss / AUC / Pearson
DNN (SR) | 0.191 / 0.701 / 0.066 | 0.187 / 0.587 / 0.131
DeepFM (SR) | 0.189 / 0.703 / 0.072 | 0.186 / 0.588 / 0.140
MIDNN (SR) | 0.182 / 0.706 / 0.090 | 0.185 / 0.600 / 0.155
DLCM (SR) | 0.177 / 0.710 / 0.102 | 0.185 / 0.602 / 0.159
PRM (SR) | 0.175 / 0.712 / 0.108 | 0.185 / 0.602 / 0.159
DPWN (LR) | * / * / * | * / * / *

Evaluation metrics. We apply Loss (the cross-entropy) and AUC (the area under the ROC curve) to evaluate the accuracy of model predictions. Moreover, to answer RQ1, we calculate the Pearson correlation coefficient [4] (Pearson for short) of the different metrics w.r.t. the ground truth, which refers to the sum of Y^CTR (i.e., the list IPV) for each list interaction record in R. Overall, a higher Pearson score indicates that the estimated list rewards of a model are more correlated with the real ones. To illustrate the superiority of our proposed FPSA algorithm over its competitors for RQ2, we apply the proposed LR metric to the ranking results of the FPSA algorithm and the other methods; besides, we present the relative improvement [12, 13] (RI) that FPSA achieves over each compared method w.r.t. the LR metric. For the online A/B test for RQ3, we choose the PV and IPV metrics, to which most industrial recommender systems pay the most attention; PV and IPV are defined as the total numbers of items browsed and clicked by users, respectively.
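The three offline metrics are standard and can be checked with off-the-shelf routines; the scores below are fabricated purely to show the calls:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import log_loss, roc_auc_score

# Per-item labels and predicted interaction probabilities (toy values).
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([0.8, 0.2, 0.4, 0.7, 0.6, 0.3, 0.9, 0.5])

print("Loss:", log_loss(y_true, y_pred))      # binary cross-entropy
print("AUC:", roc_auc_score(y_true, y_pred))  # area under the ROC curve

# Pearson between estimated list rewards and the ground-truth list IPV
# (the sum of Y^CTR per list), here over five hypothetical lists.
estimated_lr = np.array([2.1, 0.7, 1.4, 3.0, 1.1])
true_ipv = np.array([2, 1, 1, 3, 1])
print("Pearson:", pearsonr(estimated_lr, true_ipv)[0])
```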
Implementation details. We implement all models in TensorFlow 1.4. For fair comparison, pre-training, batch normalization, and regularization are not adopted in our experiments. We initialize model parameters with random uniform initialization and adopt Adam as the optimizer with a learning rate of 0.001. The embedding size of each feature is set to 8, and the MLP architecture is set to [128, 64, 32]. We run each model three times and report the mean of the results. For the FPSA algorithm on the Rec dataset, the beam size k is set to 50 and the output length n is set to 4. For the experimental results in Tab. 3 and 4, we use "*" to indicate that the best method differs significantly from the runner-up method based on paired t-tests at the 0.01 significance level.
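For convenience, the settings above can be consolidated into a single configuration object; the dict layout and key names below are ours, not the paper's:

```python
# Shared training configuration collected from Section 5 (key names are ours).
CONFIG = {
    "framework": "tensorflow-1.4",
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "embedding_size": 8,          # per sparse feature value
    "mlp_layers": [128, 64, 32],
    "num_runs": 3,                # results reported as the mean of three runs
    "fpsa_rec": {"beam_size_k": 50, "output_length_n": 4},
}
```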
5.2 Performance Comparison (RQ1)
We report the comparison results of DPWN and the other baselines on the two datasets in Tab. 3. The major findings are summarized as follows:
• Generally, list-wise methods (i.e., MIDNN, DLCM, and PRM) achieve larger improvements in most cases than point-wise methods (i.e., DNN and DeepFM). By considering the mutual influence among the items of the input ranking list, these methods learn a refined scoring function that is aware of the feature distribution of the input ranking list.
• DPWN consistently yields the best performance on the Loss and AUC metrics on both datasets. In particular, DPWN improves over the best baseline, PRM, by 0.018 and 0.006 in AUC on the Rec and Ad datasets, respectively. By taking advantage of the permutation information of the final item list, DPWN provides more precise permutation-wise interaction probabilities for each item. Different from DLCM and PRM, which extract information from the input ranking list, the Bi-LSTM captures the permutation-variant influence contained in the final item list, which is the more essential and effective factor for permutation-wise predictions.
• It is worth mentioning that LR based on DPWN performs better on the Pearson metric than SR based on the other competitors on both datasets. This indicates that DPWN, by leveraging the permutation-variant influence within the final list, estimates the revenue (i.e., IPV) of a permutation more accurately. Hence, its derived LR metric is applied to rank the candidate lists in the PRank stage.
• Compared with the results on the Rec dataset, the performance lift on the Ad dataset is relatively small. One possible reason is that the Ad dataset was published with random sampling, resulting in inconsistent and incomplete list records and weaker intra-permutation relevance of user feedback.

Table 4: Overall performance comparison w.r.t. the ranking results on the LR metric (bold: best; underline: runner-up).
Model | LR | RI
DNN | 0.252 | +13.09%
DeepFM | 0.261 | +9.19%
MIDNN | 0.272 | +4.77%
DLCM | 0.274 | +4.01%
PRM | 0.278 | +2.51%
FPSA (fine-tuned α, β)* | 0.285 | -
5.3 Analysis of the FPSA Algorithm (RQ2)
First, the rationale for using the LR metric to evaluate the models' ranking results offline is as follows: 1) as evidenced in Section 5.2, our proposed LR metric based on DPWN estimates the list revenue more accurately than the SR metric of the other models; 2) it is model-based and independent of the ground truth, which means it can be directly applied to evaluate lists in the PRank stage for online serving, ensuring online-offline consistency.
We report the comparison results of the FPSA algorithm and the baselines on the Rec dataset in Tab. 4 (the Ad dataset is not involved, since its sampling results in discontinuous records). We mainly have the following observations:
• List-wise models (i.e., MIDNN, DLCM, and PRM) achieve better performance than point-wise models (i.e., DNN and DeepFM) on the LR metric, which indicates that making the model aware of the feature distribution of the input ranking list helps achieve a better final item list.
Figure 6: Impact of the hyper-parameter α on the LR metric.
• Our proposed FPSA algorithm with fine-tuned hyper-parameters α and β achieves the best result on the LR metric, outperforming all compared methods.
Practically, we have noticed that the pre-defined hyper-parameters α and β have a great influence on the final item list and the LR metric. Therefore, we conduct several grid-search experiments on these two hyper-parameters to gain a comprehensive understanding of the FPSA algorithm. Specifically, we fix β, vary α from 0 to MAX, and observe how LR varies under each setting. Note that setting α to 0 or MAX means that the final item list is in descending order w.r.t. the NEXT or CTR score, respectively.
Fig. 6 presents the influence of the hyper-parameters on the final item lists. The LR metric declines as α moves from 7 toward 0 and from 7 toward MAX, dropping sharply near 0. This indicates that excessively guiding users to browse or to interact may miss their demands or increase their fatigue, respectively. When α is set to 7, the recommendation results of FPSA strike a balance between browsing and interacting and deliver the best user experience.
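A hypothetical sketch of the grid search just described: evaluate_lr stands in for "run FPSA with (alpha, beta), rank with DPWN, and average the LR metric over lists"; it is not an API from the paper, and the alpha grid is illustrative.

```python
def grid_search_alpha(evaluate_lr, alphas, beta_fixed):
    """Sweep alpha with beta held fixed and return the best setting plus the curve."""
    curve = {alpha: evaluate_lr(alpha, beta_fixed) for alpha in alphas}
    best_alpha = max(curve, key=curve.get)
    return best_alpha, curve

# Example: best_alpha, curve = grid_search_alpha(evaluate_lr, [0, 1, 3, 5, 7, 9, 11], 1.0)
```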
5.4 Online Experiments (RQ3)
To verify the effectiveness of our proposed framework PRS in a real-world setting, PRS has been fully deployed in the Minidetail scenario, one of the most popular recommendation scenarios of the Taobao application. As shown in the left part of Fig. 7, in this waterfall scenario users can browse and interact with items sequentially and endlessly; a fixed-length recommended result is displayed each time the user reaches the bottom of the current page. Considering the potential harm of high latency to the user experience, deploying PRS into production brings great challenges.
Figure 7: (1) The Minidetail scenario. (2) and (3) present the difference between the baseline RS and the promoted RS with the PRS framework.
Figure 8: Online improvements on the PV and IPV metrics over a week.
We first introduce our effective baseline RS, and then the efforts of merging the PRS framework into the promoted RS. As shown in the right part of Fig. 7, the baseline RS consists of the matching, ranking, and re-ranking stages, equipped with I2I [31] and MIND [22] in matching, BST [8] in ranking, and PRM [26] in re-ranking. For the FPSA algorithm in the PMatch stage, we concurrently deploy BST_CTR and BST_NEXT to calculate the CTR and NEXT scores of each item within the ranking stage, without extra time cost. Multiple candidate final item lists are then generated by the FPSA algorithm and ranked by the LR metric calculated by the deployed DPWN model in the PRank stage. Finally, the final item list with the highest LR score is recommended to the user.
Thanks to the above efforts, we successfully deployed PRS into production; Fig. 8 reports the online relative improvements from 2020-09-11 to 2020-09-17. Specifically, the FPSA algorithm in the PMatch stage obtains a % lift on the PV metric and a % lift on the IPV metric, with an average cost of milliseconds for online inference. Afterwards, the LR metric based on the DPWN model in the PRank stage obtains a further % lift on the IPV metric, with an average cost of milliseconds for online inference. In total, the PRS framework takes only milliseconds to obtain a significant 11.0% lift on the PV metric and an 8.7% improvement on the IPV metric. Considering the massive number of influenced users and the maturity of the baseline RS, this promotion of recommendation performance verifies the effectiveness of our proposed framework PRS.
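As hypothetical glue code, the promoted flow of Fig. 7 (3) can be summarized as below; score_ctr, score_next, and dpwn_score_fn stand in for the deployed BST_CTR, BST_NEXT, and DPWN models (the names are ours), and fpsa is the beam-search sketch from Section 4.1:

```python
def rerank_with_prs(user, input_ranking, score_ctr, score_next, dpwn_score_fn,
                    n=4, k=50, alpha=1.0, beta=1.0):
    """End-to-end sketch: PMatch (FPSA over precomputed scores) then PRank (LR)."""
    # PMatch: per-item CTR/NEXT scores come for free from the ranking stage.
    p_ctr = {i: score_ctr(user, i) for i in input_ranking}
    p_next = {i: score_next(user, i) for i in input_ranking}
    candidate_lists = fpsa(input_ranking, p_ctr, p_next, n, k, alpha, beta)
    # PRank: LR(O) = sum of DPWN's permutation-wise scores over the list (Eq. 6).
    return max(candidate_lists, key=lambda o: sum(dpwn_score_fn(user, o)))
```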
6 CONCLUSION
In this paper, we highlight the importance of the permutation knowledge in the final ranking list and address how to leverage and transform such information for better recommendation. We propose a novel two-stage permutation-wise framework PRS for the re-ranking stage, consisting of the successive PMatch and PRank stages. The PMatch stage with the proposed FPSA algorithm is designed to generate multiple candidate item lists, and the PRank stage with the LR metric based on the DPWN model provides a unified ranking criterion to select the most effective list for the user. Extensive experiments on both industrial and benchmark datasets demonstrate the effectiveness of our framework compared to state-of-the-art point-wise and list-wise methods. Moreover, PRS has achieved impressive improvements on the PV and IPV metrics in online experiments after its successful deployment in one popular recommendation scenario of the Taobao application.
In the future, we will concentrate on proposing more effective methods for the PMatch and PRank stages. Besides, with improvements in computing efficiency, PRS has the potential to replace the matching and ranking stages with the PMatch and PRank stages, respectively.
REFERENCES
[1] Qingyao Ai, Keping Bi, Jiafeng Guo, and W. Bruce Croft. 2018. Learning a Deep Listwise Context Model for Ranking Refinement. In SIGIR. 135–144.
[2] Qingyao Ai, Xuanhui Wang, Sebastian Bruch, Nadav Golbandi, Michael Bendersky, and Marc Najork. 2019. Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval. 85–92.
[3] Michael D. Atkinson, J.-R. Sack, Nicola Santoro, and Thomas Strothotte. 1986. Min-max Heaps and Generalized Priority Queues. Commun. ACM 29, 10 (1986), 996–1000.
[4] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. 2009. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing. Springer, 1–4.
[5] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. Learning to Rank Using Gradient Descent. In ICML. 89–96.
[6] Christopher J. C. Burges. [n. d.]. From RankNet to LambdaRank to LambdaMART: An Overview. Learning ([n. d.]).
[7] Laming Chen, Guoxin Zhang, and Eric Zhou. 2018. Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity. In NeurIPS. 5622–5633.
[8] Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data. 1–4.
[9] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & Deep Learning for Recommender Systems. In RecSys Workshop. 7–10.
[10] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In RecSys. 191–198.
[11] Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google News Personalization: Scalable Online Collaborative Filtering. In WWW. 271–280.
[12] Yufei Feng, Binbin Hu, Fuyu Lv, Qingwen Liu, Zhiqiang Zhang, and Wenwu Ou. 2020. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2231–2240.
[13] Yufei Feng, Fuyu Lv, Binbin Hu, Fei Sun, Kun Kuang, Yang Liu, Qingwen Liu, and Wenwu Ou. 2020. MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate Prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2421–2428.
[14] Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, and Keping Yang. 2019. Deep Session Interest Network for Click-Through Rate Prediction. In IJCAI. 2301–2307.
[15] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. 2019. DeepMDP: Learning Continuous Latent Space Models for Representation Learning. arXiv preprint arXiv:1906.02736 (2019).
[16] Anupriya Gogna and Angshul Majumdar. 2017. Balancing Accuracy and Diversity in Recommendations Using Matrix Completion Framework. Knowledge-Based Systems 125 (2017), 83–95.
[17] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems 6, 4 (2015), 1–19.
[18] Alex Graves and Jürgen Schmidhuber. 2005. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks 18, 5-6 (2005), 602–610.
[19] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction. In IJCAI. 1725–1731.
[20] Thorsten Joachims. 2006. Training Linear SVMs in Linear Time. In SIGKDD. 217–226.
[21] Roger Levy and T. Florian Jaeger. 2007. Speakers Optimize Information Density through Syntactic Reduction. Advances in Neural Information Processing Systems 19 (2007), 849.
[22] Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Huan Zhao, Pipei Huang, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2615–2623.
[23] Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padmanabhan. 2011. SCENE: A Scalable Two-Stage Personalized News Recommendation System. In SIGIR '11. ACM, New York, NY, USA, 125–134. https://doi.org/10.1145/2009916.2009937
[24] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized News Recommendation Based on Click Behavior. In ACM IUI. 31–40.
[25] Liang Pang, Jun Xu, Qingyao Ai, Yanyan Lan, Xueqi Cheng, and Jirong Wen. 2020. SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 499–508.
[26] Changhua Pei, Yi Zhang, Yongfeng Zhang, Fei Sun, Xiao Lin, Hanxiao Sun, Jian Wu, Peng Jiang, Junfeng Ge, Wenwu Ou, and Dan Pei. 2019. Personalized Re-Ranking for Recommendation. In RecSys. 3–11.
[27] Qi Pi, Weijie Bian, Guorui Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction. In SIGKDD. 2671–2679.
[28] Badrul Munir Sarwar, George Karypis, Joseph A. Konstan, John Riedl, et al. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In WWW. 285–295.
[29] Mark Wilhelm, Ajith Ramanathan, Alexander Bonomo, Sagar Jain, Ed H. Chi, and Jennifer Gillenwater. 2018. Practical Diversified Recommendations on YouTube with Determinantal Point Processes. In CIKM. 2165–2173.
[30] Sam Wiseman and Alexander M. Rush. 2016. Sequence-to-Sequence Learning as Beam-Search Optimization. arXiv preprint arXiv:1606.02960 (2016).
[31] Xiaoyong Yang, Yadong Zhu, Yi Zhang, Xiaobo Wang, and Quan Yuan. 2020. Large Scale Product Graph Construction for Recommendation in E-commerce. arXiv preprint arXiv:2010.05525 (2020).
[32] Zhaohui Zheng, Keke Chen, Gordon Sun, and Hongyuan Zha. 2007. A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments. In SIGIR. 287–294.
[33] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep Interest Evolution Network for Click-Through Rate Prediction. In AAAI. 5941–5948.
[34] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In SIGKDD. 1059–1068.
[35] Tao Zhuang, Wenwu Ou, and Zhirong Wang. 2018. Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search. In IJCAI.