Leveraging Review Properties for Effective Recommendation
Xi Wang
University of Glasgow, [email protected]
Iadh Ounis
University of Glasgow, [email protected]
Craig Macdonald
University of Glasgow, [email protected]
ABSTRACT
Many state-of-the-art recommendation systems leverage explicit item reviews posted by users by considering their usefulness in representing the users' preferences and describing the items' attributes. These posted reviews may have various associated properties, such as their length, their age since they were posted, or their item rating. However, it remains unclear how these different review properties contribute to the usefulness of their corresponding reviews in addressing the recommendation task. In particular, users show distinct preferences when considering different aspects of the reviews (i.e. properties) for making decisions about the items. Hence, it is important to model the relationship between the reviews' properties and the usefulness of reviews while learning the users' preferences and the items' attributes. Therefore, we propose to model the reviews with their associated available properties. We introduce a novel review properties-based recommendation model (RPRM) that learns which review properties are more important than others in capturing the usefulness of reviews, thereby enhancing the recommendation results. Furthermore, inspired by the users' information adoption framework, we integrate two loss functions and a negative sampling strategy into our proposed RPRM model, to ensure that the properties of reviews are correlated with the users' preferences. We examine the effectiveness of RPRM using the well-known Yelp and Amazon datasets. Our results show that RPRM significantly outperforms one classical and five state-of-the-art baselines. Moreover, we experimentally show the advantages of using our proposed loss functions and negative sampling strategy, which further enhance the recommendation performance of RPRM.
ACM Reference Format:
Xi Wang, Iadh Ounis, and Craig Macdonald. 2021. Leveraging Review Properties for Effective Recommendation. In Proceedings of WWW '21: International World Wide Web Conference (WWW '21). ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WWW '21, March 8-12, 2021, Ljubljana, Slovenia
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
In recent years, there has been an increase in the amount of available information and interaction choices online. As a consequence, recommender systems are increasingly being deployed on various platforms to alleviate the complexity of decision making for users and to help them find their desired items. Several studies [4, 53] focused on leveraging the item reviews posted by users. For example, Li et al. [19] modelled the users' dynamic preferences by first aggregating the reviews of users and the reviews of the items they interacted with in a time-sequential manner. They converted these reviews into embedding vectors to represent the users' preferences. However, not all reviews can be useful to represent the users' preferences and items' attributes [4]. Instead, by estimating the usefulness of reviews, the recommendation models can focus on those valuable reviews among the large volume of available information, thereby leading to improved recommendation performance [2, 12]. Moreover, the performances of review-based recommendation models can be limited if they capture the users' preferences and items' attributes by solely using the review text [36]. Hence, many studies aimed to incorporate the usefulness of reviews in review-based recommendation [2, 4]. Some other approaches use an attention mechanism to model the usefulness of reviews [12] or those portions of the textual content of reviews that contribute most to the recommendation performances [6]. However, we argue that a limitation of such approaches is that they capture the usefulness of reviews by relying on historical data, which often do not generalise to reviews that are unseen by the trained model [42].

To address the limitation above, we propose to consider review properties to model the usefulness of reviews. Indeed, the reviews posted by users on items have a corresponding set of properties, such as their length, the number of days since they were posted (i.e. age) or their writing style. These review properties are associated with both the historical reviews and the reviews unseen by the trained model.
A number of studies [32, 44] have previously attempted to leverage the review properties when making recommendations. The underlying premise of such studies is that the review properties encapsulate rich information about both the users' preferences and the items' attributes. In particular, each review property can bring useful insights about the users' preferences and the items' attributes. Therefore, by integrating such review properties into the review modelling process, a review-based recommendation model could also encapsulate the usefulness of reviews when capturing the users' preferences and items' attributes. In the literature, a number of review properties have been used as side/contextual information to enrich the user-item interactions when addressing recommendation tasks. For instance, the geographical property of reviews has been used to capture those venues visited by a user or to estimate the users' locations when making local recommendations [22, 24]. The temporal property of reviews has been frequently leveraged in sequential recommendation to predict the next action of users [24, 47, 52]. However, these studies do not consider the review properties to examine the usefulness of reviews so as to improve the recommendation performances. In particular, it remains unclear how these different review properties contribute to estimating the usefulness of their corresponding reviews in effectively addressing the recommendation task. We address this limitation by considering various review properties and examining their actual effectiveness in capturing the reviews' usefulness in addressing the recommendation task. In [39], Sussman et al.
proposed the users' adoption of information framework, which showed that each user follows a particular scheme or strategy in using different properties or aspects of the review information, and thereby each user makes different interaction decisions. For example, according to the Elaboration Likelihood Model (ELM) [11], users can follow two strategies to process the posted reviews, namely the central route and the peripheral route. Users that follow the central route show a stronger willingness to process in-depth information (e.g. the descriptions of items in the reviews), while users who adopt the peripheral route frequently use the overall ranking or rating as key factors to make decisions. The various reviews posted by the users differ in their characteristics (e.g. length, language, details). Such differences can be described by the reviews' properties, and are correlated to the level of the users' adoption of information [3, 30, 46]. Therefore, we argue that a model can better capture the users' preferences by learning how users use the reviews and by examining their preferences on different properties of the reviews.

In this paper, we model the importance of different review properties in capturing the usefulness of reviews and learning the users' preferences and the items' attributes. We propose a novel review property-based neural network model (RPRM) to effectively address the recommendation task. RPRM investigates the usage of review properties to model the usefulness of reviews and aims to enhance the ability of the recommendation model in capturing the usefulness of reviews and the users' adoption of information.
In particular, RPRM uses six review properties commonly encountered in recommendation scenarios to encode the usefulness of reviews: the age of the reviews, the length of the reviews, the reviews' associated ratings, the number of helpful votes associated to the reviews, the probability of the reviews being helpful and the sentiment expressed in the reviews. Note that in this paper, 'helpfulness' refers to whether users found the reviews helpful for their task, for example by capturing if they voted them to be helpful. Different review properties can have various importance levels in capturing the usefulness of reviews for different users and items. Moreover, according to the aforementioned users' adoption of information framework [39], the properties of reviews are correlated to the level of users' adoption of information. This suggests that users tend to prefer items whose associated useful reviews capture the same important properties as those the users prefer. Therefore, we also propose two loss functions and a negative sampling strategy that reward the situation where a user and the interacted items agree on the most important properties and penalise the situation where the user disagrees with the negatively sampled items on the most important properties.

The main contributions of this paper are: (1) We propose a novel review-based recommendation model, RPRM, which leverages the usefulness of reviews to address the recommendation task. To the best of our knowledge, this is the first work that integrates the review properties in estimating the reviews' usefulness in a recommendation model through leveraging how users make use of such reviews in their interaction with the system (we will release all our code upon the acceptance of this paper). (2) Inspired by the users' adoption of information framework, we propose two loss functions and one negative sampling strategy that model the agreement on the importance of review properties between the users and items. (3) We show that RPRM significantly outperforms one classical and five state-of-the-art recommendation approaches on the commonly-used Amazon and Yelp datasets. (4) We show that our proposed loss functions and the negative sampling strategy can further enhance the recommendation performances of RPRM across the two used datasets.
2 RELATED WORK
We briefly discuss three bodies of related work, namely recommendation approaches based on reviews, recommendation approaches leveraging the use of review properties, and work investigating the users' behaviour while interacting with information.
2.1 Review-based Recommendation
The main objective of applying a recommendation model is to observe the users' behaviours and to learn how to distinguish among items for a given user, thereby estimating the users' preferences and recommending suitable items that the users might be interested in. User-generated reviews encapsulate rich semantic information, such as possible explanations of the users' preferences and descriptions of specific item attributes [5]. Therefore, many recommendation models have aimed to leverage these reviews to construct user/item representations and to address the recommendation task [1, 7, 15, 19, 27]. Many previous review-based recommendation approaches captured the semantic similarity between the review content [4, 53], which makes it possible to encode additional relationships among the users and items and to better suggest items the users might be interested in. Indeed, the reviews posted by users are valuable in modelling the interactions among users and items from a textual semantic perspective. However, the quality and usefulness of the reviews vary markedly as the number of users, and the reviews they post online, increases. Therefore, Chen et al. [4] applied an attention mechanism to estimate the usefulness of different reviews. Unlike previous work [2, 4], which used an attention mechanism to learn the usefulness of reviews, we argue that the review properties can be directly leveraged to effectively capture the usefulness of reviews. Moreover, many existing approaches [24, 32, 44] extract the review properties and integrate them as side or contextual information to enhance the recommendation performance. However, unlike our work in this paper, such approaches do not make use of the reviews themselves. In the following, we further describe such approaches using the properties as side information.
2.2 Review Properties as Side Information
Various existing approaches [21, 32, 33] aimed to leverage different review properties as side information to model the users' behaviours or the items' attributes. For example, Raghavan et al. [32] leveraged the extent to which a review is helpful to measure the reliability of the associated users' ratings and to incorporate such reliability scores into a recommendation model. The geographical property of the users' reviews [22, 31] has also been well studied in venue recommendation models. Another property is whether a review expresses a sentiment. For instance, Wang et al. [44] replaced the explicit ratings provided by users with the review sentiment scores to enhance the recommendation performance. The temporal and age properties of reviews have been integrated into various recommendation models, and especially into sequential recommendation models [24, 47, 52]. For example, Manotumruksa et al. [24] encoded the temporal information in a recurrent neural network to model the users' dynamic preferences in the venue recommendation task. In particular, unlike previous works that have used review properties solely as side information in a recommendation model, in this paper we instead use them to estimate a given review's usefulness in enhancing the performance of a recommendation model. Additionally, the distinct focus of various review properties is correlated to the users' adoption of information [3, 46]. Furthermore, although these approaches extracted various review properties as contextual information in order to further enrich the collaborative interactions among users and items, they have ignored the textual information of reviews.
Indeed, while collaborative approaches can be effective with rich interaction information and side information [24, 50], we argue that it is still important to use both the textual information of reviews and their associated review properties. Indeed, in this study, we postulate that leveraging the review properties (e.g. length, age, sentiment) and their relationships to the users' preferences can help the recommendation model to more accurately learn the usefulness of reviews for effective recommendation. In particular, an effective recommendation model needs to also capture the users' preferences and the items' attributes along with the usage of these reviews' properties. For instance, a user might prefer to read recent reviews on a hotel to obtain more accurate information on its current condition and services, instead of reading much older reviews. This intuition also aligns with the users' adoption of information framework described in the next section.
2.3 Users' Adoption of Information
Information adoption concerns how consumers modify their behaviour by making use of the suggestions made in online reviews [39]. The communication routes and the customers' involvement in a consumer opinion sharing website might persuade a customer to visit a particular destination or purchase a specific product [41]. A number of user studies have examined various properties of reviews that influence the users' adoption of information [3, 9, 11, 30, 46]. These studies observed that the properties of reviews are correlated to the level of users' adoption of information; indeed, such correlations are important motivations for our present work. For example, Filieri and Mcleay [11] used the Elaboration Likelihood Model (ELM) [29] to group the factors and properties of reviews according to two information processing routes (i.e. the central and peripheral routes). The same authors also observed that a peripheral route-based user would prefer to process the information about a product that simply has a good overall ranking. On the other hand, a central route-based user would consider in-depth information to make decisions. Furthermore, Sussman et al. [39] considered the information usefulness as a mediator between the information process and the information adoption by users, and showed a strong linkage between the usefulness of the information and the users' decision making. Our work is inspired by the aforementioned users' adoption of information framework. In particular, we argue that leveraging the users' preferences in relation to the reviews' properties can improve recommendation effectiveness by providing additional insights about the users' information processing behaviours and their personalised preferences on the items' attributes as conveyed by the reviews' properties. To the best of our knowledge, this is the first work that uses and leverages the users' adoption of information to address the recommendation task.
3 THE RPRM MODEL
We first introduce the recommendation task and the notations used in this paper. Next, we describe our proposed RPRM model, which leverages the reviews posted by users to enhance the recommendation task. RPRM takes into account the review properties when modelling the user/item information by learning the importance of different review properties for enriching the representations of the users' preferences and the items' attributes. RPRM accounts for the importance of the review properties for users and items by proposing two loss functions and a negative sampling strategy that model the extent to which the users and items agree on the important review properties. For example, the agreement will be high if a user considers longer reviews to be more useful and a given item's review usefulness is better described by long reviews.
3.1 Preliminaries
We address the recommendation task, which aims to effectively identify and recommend items to users according to their preferences. The recommendation task involves connecting two key entity types, namely the set of users U = {u_1, u_2, ..., u_N} with size N and the set of items I = {i_1, i_2, ..., i_M} with size M. To address the recommendation task, we aim to accurately estimate the users' preferences on items, so that the items that a given user might find the most interesting are ranked highest. To do so, we investigate the use of the reviews that the users have posted on items, as well as their associated properties. Each user u or item i has an associated set of reviews, e.g. posted by that user, C_u, or posted on that item, C_i. Furthermore, the reviews of a user or an item can be described using k review properties P = {P_1, P_2, ..., P_k}. For example, to capture the preferences of user u for a given property P_1, we estimate the corresponding review property scores for each review in the review set of user u, i.e. P_{1,u} = {p_{1,1}, p_{1,2}, ..., p_{1,|C_u|}}, where p_{1,t} is the property score of the t-th review of user u. For example, for the length (or age) property, the score will correspond to the length of the review (or its age, resp.). These scores could be computed for any property, provided that the property values are mapped into scalars in the range [0, 1] using an adequate function. For example, the geographical property ('near' vs. 'distant'), the length property ('long' vs. 'short'), or the age of reviews ('old' vs. 'recent') can all be mapped into a scalar in the interval [0, 1]. In particular, the property scores also define the usefulness of reviews. For example, for the length property, a longer review will have its length property score closer to 1 than other, shorter reviews, and hence this would indicate, according to the length property scores, that a longer review is more useful.
The computed property scores enable the modelling of reviews from different aspects and the examination of the relationship between the review usefulness and the review properties.

[Figure 1: The architecture of the RPRM model. The User Modelling and Item Modelling branches share the same structure: CNNs process the review embeddings, followed by the Review Property Encoding and Review Property Attention components; each branch is combined with the corresponding user/item id embedding to produce the prediction R̂.]
3.2 The Architecture of RPRM
To address the introduced recommendation task, we propose the novel Review Properties-based Recommendation Model (RPRM), which is a neural recommendation model that takes reviews and their associated properties into account. In particular, we use a dot-product attention mechanism to score and learn which review properties are more useful in describing the usefulness of reviews, and are thereby important for making good recommendations. We first present the architecture of our proposed RPRM model in Figure 1. In general, RPRM is a collaborative filtering-based framework, which models the interactions between users and items. It is of note that RPRM models both the users and items using the same neural network architecture (i.e. User and Item Modelling in Figure 1). The RPRM architecture is organised into four layers, which we discuss in turn below: (1) The review property encoding layer, which combines the semantic textual representations of reviews obtained using BERT with the properties of reviews (Section 3.2.1); (2) The review embedding processing layer, which creates a low-dimensional representation of each review (Section 3.2.2); (3) The review property attention layer (Section 3.2.3), which identifies the properties of reviews that are more useful to represent the users' preferences and items' attributes; (4) The prediction layer (Section 3.2.4), which takes the output from the previous layer, along with the identification embeddings of a given user and item, to score the user's preferences on items. Later, in Section 3.3, we discuss our proposed new loss functions and new negative sampling strategy, which aid learning while encapsulating the properties of reviews.
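As an illustration of layer (4), the scoring step can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the concatenation and element-wise product follow the description above, while reducing the product to a scalar with a sum is our assumption, since the final reduction is left implicit here.

```python
import numpy as np

def predict_preference(O_u, V_u, O_i, V_i):
    """Sketch of the prediction layer: concatenate each entity's processed
    review vector with its id embedding, combine user and item via an
    element-wise product, and (our assumption) sum to a scalar score."""
    user_vec = np.concatenate([O_u, V_u])  # O'_u concatenated with V_u
    item_vec = np.concatenate([O_i, V_i])  # O'_i concatenated with V_i
    return float(np.sum(user_vec * item_vec))

score = predict_preference(np.array([1.0, 2.0]), np.array([3.0]),
                           np.array([1.0, 1.0]), np.array([2.0]))
```

In a trained model, `O_u`/`O_i` would come from the attention layer and `V_u`/`V_i` would be learned id embeddings; here they are fixed toy vectors.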
3.2.1 Review Property Encoding Layer. RPRM first models the users' reviews and the items' reviews. To process and summarise the semantic information of each review, we convert each review into a 768-dimensional embedding vector using the pre-trained BERT model [10], a recent and widely used language modelling approach (we use BERT for ease of integration, but any language modelling approach could be used, e.g. ALBERT [18] or RoBERTa [23]). Next, in this layer, the model encodes the embedding vectors of the reviews with various review properties through a dot-product function. The objective of encoding the review latent vectors with different review properties is to model the usefulness of reviews from different perspectives (e.g. length, age, sentiment). Each review property can be represented by a list of normalised review property scores. These scores allow RPRM to focus on different reviews and encode the knowledge of the corresponding reviews' properties. For example, by encoding the review length property, the model can capture how reviews with different lengths influence the recommendation outcome and how the length property of reviews is associated with the user/item representations. The encoding process of a given review property can be described as follows:

O_{u,P_1} = [X_1 p_{1,1}, X_2 p_{1,2}, ..., X_{|C_u|} p_{1,|C_u|}]    (1)

where X_1, ..., X_{|C_u|} are the embedding vectors of the reviews of user u and |C_u| is the size of his/her review set. In particular, Equation (1) encodes the review property P_1 for user u. After encoding the k review properties, for user u, we have O_u = [O_{u,P_1}, ..., O_{u,P_k}].

In this work, we use six commonly available review properties to describe the reviews from different perspectives:
• Age: We calculate the number of days d since a review has been posted. Then, we compute the Age score of a review to be: p = 1 − d/max(D), where max(D) is the age of the oldest review in the collection. In this case, a recent review is considered more useful than an older review.
• Length: The number of words that are included in a review.
• Rating: The rating associated with the review (1-5 stars).
• Polar_Senti: The Polar_Senti property indicates the probability of a given review being polarised (strongly positive or negative). We use a CNN classifier, which identifies reviews as being positive or negative and which has been validated as a strongly effective classifier with >95% classification accuracy in [44]. We obtain the corresponding probabilities of the positive reviews being actually positive or the negative reviews being negative.
• Helpful: The number of helpful votes given by other users to a particular review.
• Prob_Helpful: We classify the reviews with a state-of-the-art review helpfulness classification model [45] and obtain the probability of a given review being helpful.

In addition, for the Length, Rating and Helpful review properties, whose property scores can be larger than 1, we apply min-max normalisation to scale the property scores into [0, 1]. Note that we use the aforementioned review properties as typical review properties that are commonly available in various datasets. However, our approach is general in that it could also incorporate other review properties (e.g. the geographical and part-of-speech properties) into the proposed RPRM model.
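As an illustration, the raw property values above could be turned into [0, 1] scores as follows. This is a minimal NumPy sketch with function and argument names of our own choosing; the Polar_Senti and Prob_Helpful scores are omitted since they are produced by trained classifiers.

```python
import numpy as np

def min_max(x):
    """Min-max normalise raw property values into [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def property_scores(ages_days, lengths, ratings, helpful_votes):
    """Build one [0, 1] score list per review property for a set of reviews.
    Polar_Senti / Prob_Helpful would come from classifiers and are omitted."""
    ages_days = np.asarray(ages_days, dtype=float)
    return {
        # Age: p = 1 - d / max(D); recent reviews score closer to 1.
        "Age": 1.0 - ages_days / ages_days.max(),
        # Length, Rating, Helpful: raw values can exceed 1, so min-max normalise.
        "Length": min_max(lengths),
        "Rating": min_max(ratings),
        "Helpful": min_max(helpful_votes),
    }

scores = property_scores(ages_days=[10, 100, 50],
                         lengths=[120, 30, 60],
                         ratings=[5, 1, 3],
                         helpful_votes=[7, 0, 2])
```

Note that this sketch normalises over the given batch of reviews; in practice the normalisation would be computed over the whole collection (e.g. max(D) is the age of the oldest review in the collection).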
3.2.2 Review Embedding Processing Layer. After encoding the embedding vectors of the reviews with their review property scores, we use convolutional operators, as in other review-based deep neural network approaches [4, 53], to model the embedding vector of each review. The convolutional operators consist of m neurons, with the j-th neuron modelling the review embedding vector as Z_j = ReLU(V ∗ K_j + b_j), where V is the review embedding vector, ∗ is the convolution operation with the j-th filter K_j, and b_j is a bias term. The ReLU activation function is applied to process the generated features. Next, each neuron j applies a sliding window over the features Z with a max pooling function to obtain the convolutional output o_j for the corresponding neuron. Therefore, for each review, we concatenate the convolutional outputs from the m neurons and obtain the processed embedding vector for each review (i.e. O = [o_1, o_2, ..., o_m]).

(A review is treated as positive (negative) if it has a rating ≥ 4 (≤ 2). The polarity of a 3-star review is predicted by the CNN classifier.)

3.2.3 Review Property Attention Layer. In the review property encoding layer, RPRM converts the user/item modelling latent vectors into a set of latent vectors by considering various review properties. In this attention layer, the main objective is to observe which properties of reviews are more useful to represent the users' preferences and items' attributes. We hypothesise that the dot-product attention mechanism would enhance the recommendation performance of RPRM. Moreover, each user or item is associated with a review property weighted vector φ_u or φ_i with size k, where k is the number of used review properties. For a given user u, the review property attention layer is defined as:

O'_u = (Σ_{t=1}^{k} φ_{u,t} O_{u,P_t}) / k    (2)

3.2.4 Prediction Layer. In this layer, RPRM concatenates the processed review latent vectors and the identification embedding vectors of users and items to make recommendations.
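The property encoding of Equation (1), the convolution-and-pooling step of the review embedding processing layer, and the weighted combination of Equation (2) can be sketched as follows. This is a simplified NumPy sketch: a 1-D convolution stands in for the model's learned CNN, and all names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def encode_property(review_embs, prop_scores):
    """Eq. (1) sketch: scale each review embedding X_t by its score p_{1,t}."""
    return review_embs * np.asarray(prop_scores, dtype=float)[:, None]

def conv_maxpool(V, filters, biases):
    """Sketch of one review's processing: each neuron j computes
    ReLU(V * K_j + b_j) with a sliding window, then max-pools to a
    scalar o_j; the outputs form O = [o_1, ..., o_m]."""
    outs = []
    for K, b in zip(filters, biases):
        Z = relu(np.convolve(V, K, mode="valid") + b)
        outs.append(Z.max())
    return np.array(outs)

def property_attention(O_props, phi):
    """Eq. (2) sketch: O'_u = (sum_t phi_{u,t} O_{u,P_t}) / k."""
    k = len(O_props)
    return sum(p * O for p, O in zip(phi, O_props)) / k

# Toy usage: 2 reviews with 3-dim embeddings, one length-property score each.
E = encode_property(np.ones((2, 3)), [0.5, 1.0])
out = conv_maxpool(np.array([1.0, -1.0, 2.0, 0.0]),
                   filters=[np.array([1.0, 1.0])], biases=[0.0])
O_u = property_attention([np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                         phi=[1.0, 1.0])
```

In the actual model, `phi` is a learned per-user (or per-item) weight vector over the k properties; here it is fixed for illustration.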
The final prediction of the users' preferences on items can be computed as:

R̂_{u,i} = (O'_u ⊕ V_u) ⊙ (O'_i ⊕ V_i)    (3)

where ⊕ is the concatenation operation, which combines the review embedding vector O' and the identification embedding vector V. Moreover, ⊙ denotes the element-wise product of the latent vectors between user u and item i to calculate the preference score R̂_{u,i} of user u on item i.

3.3 Loss Functions and Negative Sampling. The RPRM model addresses a ranking-based recommendation task, i.e. for a given user, it ranks first those items likely to be of interest to the user. A common and popular ranking scheme is to first apply the Bayesian Personalised Ranking (BPR) loss function [34] to optimise the model by comparing the prediction scores for users U with the positive items I+ and the negative items I−. The positive items are those items the user has interacted with, while the negative items are sampled from those items the user did not interact with thus far. In particular, the uniform sampling strategy is commonly used to generate the negative items from the users' unseen items. We use this learning scheme as a basic setup of our proposed RPRM model. Aside from building upon the BPR loss function and uniform sampling for generating negative items, we propose novel learning schemes to enhance the recommendation effectiveness. According to the users' adoption of information framework [39] that we discussed in Section 2.3, users show distinct information processing behaviours. In the recommendation scenario, users tend to have different preferences; for example, some users might prefer shorter reviews while others might favour in-depth reviews that describe the advantages/disadvantages of a given item. In particular, there is a relationship between the users' behaviour and the review properties [39]. This suggests that users tend to prefer items whose associated useful reviews capture the same important properties as those the users prefer. Therefore, we propose two loss functions (i.e.
PropLoss_uu and PropLoss_ui) that reward the case where a user and the interacted items agree on the most important properties, and penalise the case where the user disagrees with the negatively sampled items on the most important properties. In particular, based on the users' adoption of information framework, we assume that users would prefer to process information from items that have similar usefulness importance scores on the review properties. Using the information from the similarly scored items' properties, users would exhibit a higher probability of interacting with these items than with other unknown items. Therefore, both of our proposed loss functions ensure that there is an agreement in the importance of review properties between the users and their interacted items (i.e. positive items). However, PropLoss_uu amplifies the disagreement in the importance of review properties between the users and the unseen (negatively sampled) items, while PropLoss_ui amplifies the disagreement between the interacted items and the unseen (negatively sampled) items of users. These two loss functions are defined as follows:

PropLoss_uu(u, i+, i−) = Sim(φ_u, φ_{i−}) − Sim(φ_u, φ_{i+})    (4)

PropLoss_ui(u, i+, i−) = Sim(φ_{i+}, φ_{i−}) − Sim(φ_u, φ_{i+})    (5)

where Sim(.) is a function that measures the similarity between the weighted vectors of the review properties. Before applying the similarity function, we scale the weighting scores by dividing the scores by the sum of scores in each weighted vector, to generate a discrete probability distribution of scores in [0, 1] over the review properties. In particular, we use the Cosine similarity (Cos) function and the Kullback–Leibler (KL) divergence measure as the similarity functions, which have shown good performances in measuring latent vector similarities [43]. Note that, since KL is a divergence measure, we use the inverse of KL to compute similarity.
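Under the normalisation described above, the two loss functions can be sketched as follows, together with their combination with BPR and a similarity-proportional negative-sampling distribution of the kind proposed in Section 3.3. This is a NumPy sketch using cosine similarity only; BPR itself is not implemented here, and the function names are ours.

```python
import numpy as np

def to_dist(phi):
    """Scale a property-weight vector into a discrete probability distribution."""
    phi = np.asarray(phi, dtype=float)
    return phi / phi.sum()

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def prop_loss_uu(phi_u, phi_pos, phi_neg, sim=cos_sim):
    """Eq. (4): Sim(phi_u, phi_{i-}) - Sim(phi_u, phi_{i+})."""
    phi_u, phi_pos, phi_neg = map(to_dist, (phi_u, phi_pos, phi_neg))
    return sim(phi_u, phi_neg) - sim(phi_u, phi_pos)

def prop_loss_ui(phi_u, phi_pos, phi_neg, sim=cos_sim):
    """Eq. (5): Sim(phi_{i+}, phi_{i-}) - Sim(phi_u, phi_{i+})."""
    phi_u, phi_pos, phi_neg = map(to_dist, (phi_u, phi_pos, phi_neg))
    return sim(phi_pos, phi_neg) - sim(phi_u, phi_pos)

def combined_loss(bpr_value, prop_value, alpha):
    """Eq. (6): L = alpha * BPR + (1 - alpha) * PropLoss."""
    return alpha * bpr_value + (1 - alpha) * prop_value

def prop_sample_probs(phi_pos, phi_negs, sim=cos_sim):
    """Similarity-proportional sampling probabilities over unseen items."""
    sims = np.array([sim(to_dist(phi_pos), to_dist(pn)) for pn in phi_negs])
    return sims / sims.sum()

# Toy usage: the user and positive item agree on property 1; the negative
# item stresses property 2, so both losses are at their minimum of -1.
loss_uu = prop_loss_uu([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
loss_ui = prop_loss_ui([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
probs = prop_sample_probs([1.0, 0.0], [[1.0, 0.0], [1.0, 1.0]])
```

Both losses are minimised (most negative) when the user and the positive item agree and the negative item disagrees, matching the reward/penalty behaviour described above.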
Furthermore, we combine the PropLoss functions with the commonly-used BPR loss function as follows:

L = α × BPR(u, i+, i−) + (1 − α) × PropLoss(u, i+, i−)    (6)

where α controls the emphasis on the two loss functions.

We also propose a novel negative sampling strategy, called PropSample, which models the agreement in the importance of review properties between the users' interacted items and the unseen items. We argue that if the same properties are important to two items (e.g. i_1 and i_2), but a particular user interacts with item i_1 and not with item i_2, then this user shows a clearer preference for item i_1 over item i_2. Therefore, we sample negative items from each user's unseen items by selecting items that have similar review properties to those items the user has already interacted with. For a given positive item i+ ∈ I, we again use a similarity function Sim() to calculate the similarity of the paired property weighted vectors between the positive item i+ and all negative (unseen) items (i.e. I−). Next, similar to the loss functions, we normalise the similarity scores across all negative items into a probability distribution. This probability distribution gives the likelihood of sampling each item as a negative instance for learning.

4 EXPERIMENTS
In this section, we examine the performance of our proposed model and approaches on two real-world datasets. Moreover, we compare
WWW '21, March 8-12, 2021, Ljubljana, Slovenia. Xi Wang, Iadh Ounis, and Craig Macdonald.

the performance of RPRM with one classical and five state-of-the-art recommendation approaches. In particular, we evaluate the performances of our proposed loss functions and negative sampling strategy in addressing the following research questions:
RQ 1: Does RPRM outperform the recommendation baselines on the two used datasets?

RQ 2: Do the proposed loss functions, PropLoss_uu and PropLoss_ui, improve the recommendation performances of RPRM in comparison to the classical BPR loss function?

RQ 3: Does the proposed negative sampling strategy, namely PropSample, further enhance the recommendation performance of RPRM compared to the uniform sampling strategy?
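As a concrete illustration of the PropSample strategy examined in RQ 3, the following minimal sketch (our own naming, not the authors' code) scores each unseen item by the similarity of its property-weight vector to that of a positive item, normalises the scores into a probability distribution, and samples a negative item accordingly:

```python
import math
import random

def cos_sim(p, q):
    """Cosine similarity between two property-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

def prop_sample(pos_item, unseen_items, prop_weights, sim=cos_sim, rng=random):
    """Sample one negative item, biased towards unseen items whose review
    properties resemble those of the given positive item.

    prop_weights maps an item id to its property-importance vector (our assumption).
    """
    scores = [sim(prop_weights[pos_item], prop_weights[i]) for i in unseen_items]
    total = sum(scores)
    probs = [s / total for s in scores]  # normalise into a sampling distribution
    return rng.choices(unseen_items, weights=probs, k=1)[0]
```

Items whose property importance closely matches that of an interacted item therefore receive a higher probability of being drawn as negatives, which is the intuition behind PropSample.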
For answering the aforementioned research questions, we use two real-world datasets, namely the Yelp dataset and the Amazon Product dataset [14, 26], to examine the effectiveness of our RPRM model as well as of our proposed loss functions and negative sampling strategy. The Yelp dataset includes user reviews on its most popular category (i.e. 'restaurant') and the Amazon dataset contains user reviews on products from six categories. The use of various categories of the Amazon dataset allows us to capture the users' preferences across different types of items/products. These two datasets have been used in several previous studies (e.g. [25, 38]). We use the Yelp dataset from the most recent round of the Yelp challenge dataset (i.e. round 13).

In our experiments, we remove cold-start users and items from both datasets, as in [8, 36], to ensure that each user and item have at least 5 associated reviews. The resulting Yelp dataset has 47k users, 16k items and 551k reviews; the Amazon dataset has 26k users, 16k items and 285k reviews. Then, following [4, 36], these two datasets are divided into 80% training, 10% validation and 10% test sets in a time-sensitive manner. In particular, we ensure that the same data split ratio applies to the interactions of each user. Next, we measure the recommendation effectiveness by examining whether the items interacted with by the users in the test sets are actually chosen for recommendation by the tested models. Hence, we compute the Precision and Recall metrics at different standard rank cutoff positions (namely, P@1, P@10 and R@10) as well as Mean Average Precision (MAP), following [20, 48], to examine the effectiveness of the tested recommendation approaches. To test statistical significance, we apply a paired t-test, with the significance level set to p < .05, and use the post-hoc Tukey Honest Significant Difference (HSD) test [37] at p < .05 to account for the multiple comparisons with the t-tests. In the following, we describe the experimental setup of both RPRM and the used baselines.
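The reported ranking metrics can be computed per user as in the following sketch (standard definitions; `ranked` and `relevant` are our own illustrative names):

```python
def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """R@k: fraction of the relevant items found in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def average_precision(ranked, relevant):
    """AP for one user; MAP is this value averaged over all users."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant)
```

For example, with a ranking [1, 2, 3, 4] and relevant items {1, 3}, P@1 = 1.0, R@2 = 0.5 and AP = (1/1 + 2/3)/2 = 5/6.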
We implement our proposed RPRM model and the NN-based baseline approaches (namely DREAM, CASER, DeepCoNN, JRL and NARRE) using the PyTorch framework [28]. For the setup of RPRM, in the review processing layer introduced in Section 3.2, we use the pre-trained BERT model [10] to convert each review into a 768-dimensional latent vector. However, since BERT is limited to encoding a maximum of 512 tokens, we limit the maximum review length to 512 tokens. Next, in the review property encoding layer, we use the Negative Confidence-aware Weakly Supervised (NCWS) review helpfulness classifier [45] to generate the 'Polar_Helpful' property scores, which estimate the probability of the reviews being helpful. In particular, we follow [45] in training the NCWS model using reviews from the 'food' and 'nightlife' categories of the Yelp Challenge dataset round 12 and on the Kindle reviews from Amazon. We use NCWS to predict the 'Polar_Helpful' property scores of reviews in both the Yelp and Amazon datasets. Similarly, to generate the 'Polar_Senti' review property scores, we use a CNN-based binary sentiment classifier [16], which has been shown to have a strong classification accuracy (>95%) [44]. We train it on 50,000 positive and 50,000 negative sentiment reviews sampled from the Yelp Challenge dataset round 12 to conduct sentiment classification [44]. We label the polarity of each review according to the user's posted rating: positive if the rating is >= 4, and negative if the rating is <= 2. This CNN classifier provides each review with its probability of carrying a strongly polarised sentiment. Finally, when training our proposed RPRM model, we apply early-stopping and use the Adam optimiser [17] with 5e− and 1e− learning rates for the Yelp and Amazon datasets, respectively. These learning rates are selected after tuning the model on the validation set, varying the learning rates between 1e− and 1e−. (The Amazon Product dataset is available at http://jmcauley.ucsd.edu/data/amazon/; the six used categories are 'amazon instant video', 'automotive', 'grocery and gourmet food', 'musical instruments', 'office products' and 'patio lawn and garden'.)

We use six baselines: one classical baseline and five state-of-the-art recommendation approaches. (1) BPR-MF [34] is a traditional and commonly used recommendation baseline that uses a pairwise ranking loss function (i.e. BPR) to learn the matrix-factorised interactions between users and items. (2) DREAM [49] encodes the age property of the reviews and models the dynamic representations of the users' preferences with a recurrent neural network. (3) CASER [40] is a recent approach that sequentially models the implicit user historical interactions with convolutional neural networks. It is a state-of-the-art recommendation model [13] that encodes the age property of reviews. (4) DeepCoNN [53] is a review-based recommendation model that jointly models users and items through a convolutional neural network. (5) JRL [51] is a heterogeneous recommendation model that encodes various types of information resources, including product images, review text and user ratings. In particular, we implement the JRL model using only the review text. (6) NARRE [4] models users and items with two parallel neural networks, both of which include a convolutional layer and an attention layer to capture the usefulness of reviews. To ensure a fair comparison, we also apply early-stopping on all baseline approaches. In particular, since we use a pre-trained BERT model to convert the reviews into embedding vectors for our proposed RPRM model, we also extend the DeepCoNN and NARRE baselines by using the BERT-encoded review embedding vectors.
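Two of the preprocessing rules above, the 512-token review limit and the rating-based polarity labelling, can be sketched as follows (helper names are ours; the handling of 3-star reviews is our assumption, as the text does not specify it):

```python
MAX_TOKENS = 512  # BERT's encoding limit, as stated above

def truncate_review(tokens):
    """Keep at most the first 512 tokens of a tokenised review."""
    return tokens[:MAX_TOKENS]

def polarity_label(rating):
    """Map a posted rating to a sentiment-training label: >= 4 positive, <= 2 negative."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None  # assumption: 3-star reviews are left out of the training data
```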
(Note: we use different Yelp dataset rounds and different categories, and removed reviews that belong to 'restaurant' from 'food' and 'nightlife', to avoid overlaps between the NCWS training and the RPRM evaluation settings; likewise, we use a different Amazon review category for training NCWS to avoid overlap with the RPRM evaluation.) In particular, for DeepCoNN, we concatenate all reviews given by/to a single user/item to form a user/item review document. Similar to RPRM, for both approaches, we limit the maximum length of the
user/item document in the DeepCoNN model, and the maximum number of tokens of each review in NARRE, to be 512 tokens. We then fine-tune every baseline model with learning rates in [e−, e−, e−] and compare our approaches with the settings that exhibited the best performances on the validation set. Furthermore, we incrementally evaluate the various components of our proposed RPRM model. First, to capture the effectiveness of using the review properties in a review-based recommendation model, we remove the review property encoding layer from RPRM (denoted 'No-Prop') and examine the effectiveness of the resulting model on our datasets. Next, we also examine the effectiveness of using each single review property from the properties included in Section 3.2.

Table 1: Recommendation performances. Significant differences w.r.t. 'No-Prop' are indicated by '*' (according to both the paired t-test and the Tukey HSD test, p < .05). 1/2/3 denote a significant difference according to both tests w.r.t. the indicated approach. ↑ indicates that the corresponding approach is significantly outperformed by RPRM on all ranking metrics according to both tests. ('–': value not available.)

  Dataset        |              Amazon                |                Yelp
  Model          | P@1     P@10    R@10    MAP        | P@1     P@10    R@10    MAP
  ↑ BPR-MF       | 0.0053* 0.0034* 0.0301* 0.0111*    | 0.0101* 0.0058* 0.0391* 0.0145*
  ↑ DREAM        | 0.0030* 0.0008* 0.0062* 0.0029*    | 0.0083* 0.0065* 0.0469* 0.0155*
  ↑ CASER        | 0.0093* 0.0060* 0.0499* 0.0239*    | 0.0111* 0.0083* 0.0571* 0.0229*
  ↑ DeepCoNN     | –       –       –       –          | –       –       –       –
  ↑ JRL          | –       –       –       –          | –       –       –       –
  ↑ NARRE        | –       –       –       –          | –       –       –       –
  ↑ No-Prop      | 0.0208  0.0088  0.0805  0.0357     | 0.0153  0.0099  0.0745  0.0260
  Age            | 0.0215* 0.0089  0.0820* 0.0372*    | 0.0157  0.0105* 0.0756* 0.0267*
  Length         | 0.0214  0.0089  0.0815* 0.0364*    | 0.0159* 0.0101  0.0726* 0.0262
  Helpful        | 0.0218* 0.0089  0.0817* 0.0365*    | 0.0151  0.0100  0.0719* 0.0255
  Prob-Helpful   | 0.0214  0.0093  0.0852* 0.0376*    | 0.0152  0.0103  0.0750  0.0264
  Rating         | 0.0206  0.0087  0.0795* 0.0352     | 0.0160* 0.0102  0.0730* 0.0264
  Polar-Senti    | 0.0211  0.0086  0.0783* 0.0355     | 0.0155  0.0102  0.0738  0.0262
  RPRM           | –       –       –       –          | –       –       –       –
Therefore, we apply each single review property in the review property attention layer of RPRM to evaluate its effectiveness in identifying the usefulness of reviews. We denote the resulting recommendation models by the name of the corresponding review property (e.g. 'Age' for using the Age property). Next, we examine the effectiveness of RPRM's proposed learning schemes, namely the two loss functions (PropLoss_uu and PropLoss_ui) and the negative sampling strategy PropSample, by comparing their performances with those of the commonly used BPR loss function and with uniform sampling, respectively.
We present and analyse the results of our experiments to answer the research questions in Section 4. Our experiments focus on investigating the performance of RPRM, as well as the effectiveness of our proposed loss functions and negative sampling strategies, in comparison to the six strong baselines from the literature.
To answer RQ1, we first examine the performances of the six baselines and compare them to the performance of our proposed RPRM model and its variants. In particular, we integrate each review property separately, before combining all of them together in the full RPRM model. The results on the two used datasets are presented in Table 1. First, we compare the performance of the RPRM model without the review property encoding layer (namely No-Prop) to the baseline approaches from Table 1. We observe that No-Prop significantly outperforms all baseline approaches, including the state-of-the-art recommendation approaches (namely CASER, DeepCoNN and NARRE), according to both the paired t-test and the Tukey HSD test, regardless of whether they use any review information. In particular, we focus on DeepCoNN, JRL and NARRE, which make use of review information. Both DeepCoNN and JRL exhibit weak recommendation performances on the two used datasets, with precision, recall and MAP scores that are lower than those of the traditional BPR-MF approach. BPR-MF is a strong baseline that was recently shown to outperform various state-of-the-art recommendation approaches from the literature [35]. Among these three baselines, NARRE significantly outperforms both the DeepCoNN and JRL approaches, according to both the paired t-test and the Tukey HSD test, on the two datasets. However, NARRE is itself significantly outperformed by our No-Prop variant (according to both the paired t-test and the Tukey HSD test), despite No-Prop having a simpler structure than NARRE. The effectiveness of this simple review-based recommendation approach is consistent with the conclusions in [36].
Moreover, by comparing No-Prop and DeepCoNN, we note that the only architectural difference between these two models is that No-Prop integrates the identification embedding vectors of users and items. The significantly enhanced performances of No-Prop over DeepCoNN on all used metrics (paired t-test and Tukey HSD test) demonstrate the benefits of using such embedding vectors to model the users' preferences and the items' attributes. In summary, we find that No-Prop significantly outperforms all baseline approaches. In particular, the use of the embedding vectors, which model the users' preferences and items' attributes, explains the superior performances of both the No-Prop and NARRE models in comparison to the other baseline approaches.

Next, we evaluate the effectiveness of integrating different review properties into the basic No-Prop approach to model the usefulness of reviews. The results from Table 1 show that in general,
the review properties can significantly improve the performances of No-Prop on both used datasets, according to both the paired t-test and the Tukey HSD test. In particular, we observe that the 'Age' and 'Prob_Helpful' review properties are the two most effective of the six review properties we tested in capturing the usefulness of reviews and improving the recommendation effectiveness of No-Prop. The other review property-based approaches show different performances on the two datasets. For example, the 'Helpful' property enhances the performances of No-Prop on the Amazon dataset but decreases its performances on the Yelp dataset. Moreover, the 'Rating' property improves the recommendation performances of No-Prop on the Yelp dataset but not on the Amazon dataset. Therefore, these results suggest that it is more effective to selectively apply the right review properties in the recommendation model to assess the usefulness of the reviews and leverage them in the made recommendations, which is one of the main ideas underlying the proposed RPRM model. In particular, these results indicate the necessity of understanding the importance of different review properties on different datasets or recommendation applications. Therefore, next, we evaluate the performance of our proposed RPRM model, which integrates all six review properties and appropriately scores (or weights) the importance of the different reviews' properties. The results for RPRM in Table 1 show that the RPRM model provides the best recommendation effectiveness on the two used datasets. Moreover, RPRM significantly outperforms both No-Prop and all the baseline approaches, including the existing state-of-the-art recommendation models (namely NARRE, CASER and DeepCoNN), according to both the paired t-test and the Tukey HSD test.

Table 2: Impact of the model's learning schemes on RPRM. Statistically significant differences with respect to 'RPRM-basic' are indicated by '*' (according to both the paired t-test and the Tukey HSD test, p < .05). ('–': value not available.)

  Dataset                        |              Amazon                |                Yelp
  Model                          | P@1     P@10    R@10    MAP        | P@1     P@10    R@10    MAP
  RPRM-basic                     | 0.0223  0.0095  0.0865  0.0378     | 0.0161  0.0104  0.0761  0.0271
  PropLoss_uu-KL                 | 0.0217  0.0094  0.0867  0.0381     | 0.0163  0.0106  0.0772* 0.0281*
  PropLoss_uu-Cos                | 0.0211* 0.0093  0.0853* 0.0369*    | 0.0154* 0.0105  0.0764  0.0271
  PropLoss_ui-KL                 | –       –       –       –          | –       –       –       –
  PropLoss_ui-Cos                | 0.0211* 0.0093  0.0859* 0.0382     | 0.0165  0.0105  0.0772* 0.0278*
  PropSample-KL                  | 0.0225  0.0093  0.0863  0.0385*    | 0.0163  0.0102  0.0738* 0.0268
  PropSample-Cos                 | 0.0220  0.0093  0.0855  0.0376     | 0.0166  0.0105  0.0767* 0.0276
  PropLoss_ui-KL+PropSample-KL   | 0.0210* 0.0095  0.0871* 0.0373     | 0.0162  0.0100  0.0740* 0.0266
  PropLoss_ui-KL+PropSample-Cos  | 0.0211* 0.0094  0.0863  0.0372*    | 0.0159  0.0105  0.0770* 0.0273
These results demonstrate the benefits of using all review properties and weighting their importance for capturing the useful reviews and leveraging them in the recommendation.

Therefore, in answering RQ1, we conclude that different review properties show distinct effectiveness levels in enhancing the performance of a review-based recommendation model. Among the six used review properties, the 'Age' and 'Prob_Helpful' properties are the most effective, and consistently enhance the effectiveness of the No-Prop recommendation model. Furthermore, by integrating all six review properties and weighting their importance in the full RPRM model, we observe that RPRM achieves the best performance among all tested approaches on both used datasets. Our results also validate our hypothesis in Section 3.2.3, namely that weighting the importance of review properties with a dot-product attention mechanism can enhance the recommendation performances.
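For illustration, weighting review properties with a dot-product attention mechanism, as hypothesised in Section 3.2.3, can be sketched as follows. This is a simplified version under our own naming; the actual RPRM layer operates on learned embeddings inside the network:

```python
import math

def dot_attention_weights(query, prop_embeddings):
    """Dot-product attention over review properties: dot a query vector with one
    embedding per property, then softmax the scores into importance weights."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in prop_embeddings]
    m = max(scores)                             # subtract max for a stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]
```

A property whose embedding aligns more closely with the user/item query vector thus receives a larger importance weight, which is how RPRM can emphasise, e.g., 'Age' for one user and 'Length' for another.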
To answer RQ2, we examine the impact of using our proposed loss functions (namely PropLoss_uu and PropLoss_ui) with two different similarity approaches (namely the KL divergence and the Cosine similarity). We also compare the RPRM model that uses our proposed loss functions with the same model using a standard ranking scheme, namely the BPR loss function and a uniform sampling strategy for generating negative items (i.e. RPRM-basic). By exploring different combinations, we have four possible model learning setups, i.e. PropLoss_uu with KL or Cosine, and PropLoss_ui with KL or Cosine. Table 2 presents the obtained experimental results on the Amazon and Yelp datasets for these four model learning setups. First, from Table 2, we observe that our two proposed loss functions can consistently improve the performance of the basic RPRM, with the exception of the PropLoss_uu-Cos model setup on the Amazon dataset. In particular, by comparing the evaluation performances of the PropLoss-based approaches with that of the basic RPRM, we observe that PropLoss_ui-KL improves upon the recommendation performances of RPRM-basic with significantly higher MAP scores, according to both the paired t-test and the Tukey HSD test, on the two used datasets. Moreover, the PropLoss_ui-based approaches outperform the PropLoss_uu-based approaches on both datasets. This observation suggests that, in terms of setting the importance of the used review properties, it is more effective to amplify the disagreement between the users and the negatively sampled items than that between the users' interacted items and the negatively sampled items. Our results also demonstrate that leveraging the users' adoption of information framework is a promising approach.
Figure 2: The properties' importance scores of reviews for randomly selected users and their interacted items: (a) User A (few interactions); (b) User B (few interactions); (c) User C (many interactions). 'Help', 'P_Help' and 'P_Senti' are the abbreviations of 'Helpful', 'Prob_Helpful' and 'Polar_Senti', respectively. 'PI' refers to the Properties' Importance scores.

Finally, by examining the effectiveness of the two similarity measurement approaches, we observe that both the KL and Cosine similarity-based approaches can outperform the RPRM-basic model when applied with the
PropLoss_ui approaches. However, KL is consistently more effective on both datasets in comparison to the Cosine similarity method, and significantly better than RPRM-basic according to both the paired t-test and the Tukey HSD test. This result demonstrates the overall effectiveness of modelling the divergence similarity between the weighting vectors of the review properties on our used datasets.

After analysing the results from Table 2, we can now answer RQ2: our proposed loss functions PropLoss_uu and PropLoss_ui can both improve the performances of RPRM-basic. Moreover, PropLoss_ui shows a higher effectiveness than PropLoss_uu in enhancing the recommendation performance. We conclude that modelling the divergence between the weighted vectors of the review properties with the KL divergence measure, i.e. PropLoss_ui-KL, enhances the RPRM-basic model and gives the best overall recommendation performances. We now examine the effectiveness of our proposed negative sampling strategy (namely
PropSample), so as to answer RQ3. From Table 2, we observe that PropSample does not consistently outperform the RPRM-basic model when using the same similarity approach. In particular, the PropSample model with the KL divergence can improve the recommendation performance of RPRM-basic on the Amazon dataset but not on the Yelp dataset. On the other hand, the PropSample model that uses the Cosine similarity can improve the recommendation performance of RPRM-basic on the Yelp dataset but not on the Amazon dataset. Next, we investigate combining the PropSample negative sampling strategy with the best performing loss function, namely PropLoss_ui-KL (see the last two rows of Table 2). We observe that the combination of both PropSample and PropLoss_ui-KL with RPRM-basic does not lead to a better performance than solely using PropLoss_ui-KL. These results might be caused by the fact that both PropLoss_ui and PropSample similarly capture the importance of the review properties between the users' interacted items and the unseen items. Furthermore, PropSample only considers the agreement between the users' interacted (positive) and the unseen (negative) items on the important reviews' properties, which might not be sufficient to sample useful negative items. We leave the modelling of further additional information in the PropSample approach (e.g. the agreement on the important reviews' properties between the users and their interacted (positive) and/or unseen (negative) items) to future work.

Therefore, for RQ3, we conclude that PropSample can enhance RPRM if an adequate similarity measure is applied on each used dataset. In addition, by comparing the performances of the PropSample and PropLoss approaches, our results showed that the PropLoss loss function has more impact on the recommendation effectiveness than the negative sampling strategy, suggesting that it is more important to capture the reviews' properties importance between the users and their interacted or unseen items.
The users' adoption of information is one of the main arguments underlying our proposed RPRM model. In Section 5, we showed the effectiveness of modelling the agreement between the users and items in terms of the reviews' properties. Therefore, in this section, we use three randomly selected users to illustrate the users' preferences on different review properties and the agreement on the importance of review properties between the users and their interacted items. To this end, Figures 2(a)-(c) plot the learned RPRM property importance scores for the review properties of three randomly selected users, say A, B & C, as well as their interacted items. The users' property importance preferences are shown using a blue dashed line with square markers; their interacted items in the test set are also shown (solid lines in Figures 2(a) and 2(b), and dots in different colours in Figure 2(c)) from the Amazon dataset. In particular, we selected users A & B from the Yelp dataset as example users that have few interactions with items, to illustrate the importance scores on the review properties between the target users and their interacted items. Indeed, we selected users with few interactions so as to be able to visually plot all their items in a figure. From Figure 2(a), we observe that user A shows stronger preferences for the 'Length' property (i.e. A prefers longer reviews) and that the 'Length' property is also highly weighted when determining the usefulness of reviews associated with that user's interacted items. On the other hand, for user B (Figure 2(b)), the 'Age' property is an important review property (i.e. B prefers recent reviews) to capture
the review usefulness, which is similar to the high weights on 'Age' for the interacted items.

Next, since users A and B have few item interactions, they might not accurately reflect the behaviour of the general user population in using different reviews. Therefore, we plot the learned importance scores of the review properties for a third user, C, who has interacted with a higher number of items. We also plot the mean importance scores of the review properties of his/her interacted items in Figure 2(c). From Figure 2(c), we observe that the importance scores on the review properties of user C are close to the mean importance scores on the review properties of his/her interacted items, especially on the two most important review properties ('Age' and 'Helpful'). This tells us that the 'Age' and 'Helpful' properties are important properties for observing the usefulness of reviews for both user C and his/her interacted items.

The above figures provide further evidence that users and their interacted items agree on the important review properties. We envisage that an online platform could leverage these weighting scores to customise the review presentation to different users according to their preferences for different review properties. For example, it is better to present recent reviews to user B than to user A, while user A would prefer to see longer reviews so as to obtain more information about the items' features. Our proposed RPRM model can learn the importance of the review properties to identify useful reviews, and enables making such review presentation decisions.
We proposed the review-based RPRM model, which leverages the importance of different review properties in capturing the usefulness of reviews, thereby enhancing the recommendation performance. Inspired by the users' adoption of information framework [39], we proposed two new loss functions and a negative sampling strategy that account for the usefulness of the review properties. RPRM consistently outperformed six strong recommendation approaches across the two used datasets. Moreover, we have shown that both of our proposed loss functions and our negative sampling strategy can further improve the recommendation performances of RPRM. These results demonstrated the advantages of leveraging the agreement on the review properties' importance between users and items. Through a qualitative analysis, we have also illustrated the recommendation added-value of RPRM by examining the usefulness of several review properties for a sample of users and their interacted items. This analysis has exemplified the promise of RPRM in guiding online review platforms in customising the presentation of reviews and deploying more effective recommendation systems.
REFERENCES
[1] Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, and Aaron Courville. 2015. Learning distributed representations from reviews for collaborative filtering. In Proc. of RecSys.
[2] Konstantin Bauman, Bing Liu, and Alexander Tuzhilin. 2017. Aspect based recommendations: Recommending items with the most valuable aspects based on user reviews. In Proc. of SIGKDD.
[3] Yolanda YY Chan and Eric WT Ngai. 2011. Conceptualising electronic word of mouth activity. Marketing Intelligence & Planning (2011).
[4] Chong Chen, Min Zhang, Yiqun Liu, and Shaoping Ma. 2018. Neural attentional rating regression with review-level explanations. In Proc. of WWW.
[5] Li Chen, Guanliang Chen, and Feng Wang. 2015. Recommender systems based on user reviews: the state of the art. User Modeling and User-Adapted Interaction 25, 2 (2015), 99–154.
[6] Xu Chen, Hanxiong Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2019. Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation. In Proc. of SIGIR.
[7] Yifan Chen, Yang Wang, Xiang Zhao, Hongzhi Yin, Ilya Markov, and Maarten de Rijke. 2020. Local Variational Feature-based Similarity Models for Recommending Top-N New Items. ACM Transactions on Information Systems (TOIS) 38, 2 (2020), 1–33.
[8] Zhiyong Cheng, Ying Ding, Xiangnan He, Lei Zhu, Xuemeng Song, and Mohan S Kankanhalli. 2018. A3NCF: An Adaptive Aspect Attention Model for Rating Prediction. In Proc. of IJCAI.
[9] Cindy Man-Yee Cheung, Choon-Ling Sia, and Kevin KY Kuan. 2012. Is this review believable? A study of factors affecting the credibility of online consumer reviews from an ELM perspective. Journal of the Association for Information Systems.
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL-HLT.
[11] Raffaele Filieri and Fraser McLeay. 2014. E-WOM and accommodation: An analysis of the factors that influence travelers' adoption of information from online reviews. Journal of Travel Research 53, 1 (2014), 44–57.
[12] Xinyu Guan, Zhiyong Cheng, Xiangnan He, Yongfeng Zhang, Zhibo Zhu, Qinke Peng, and Tat-Seng Chua. 2019. Attentive Aspect Modeling for Review-aware Recommendation. ACM Transactions on Information Systems (TOIS) 37, 3 (2019), 28.
[13] Qing Guo, Zhu Sun, Jie Zhang, and Yin-Leng Theng. 2020. An Attentional Recurrent Neural Network for Personalized Next Location Recommendation. In Proc. of AAAI.
[14] Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proc. of WWW.
[15] Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. TriRank: Review-aware explainable recommendation by modeling aspects. In Proc. of CIKM.
[16] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proc. of EMNLP.
[17] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proc. of ICLR.
[18] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In Proc. of ICLR.
[19] Chenliang Li, Xichuan Niu, Xiangyang Luo, Zhenzhong Chen, and Cong Quan. 2019. A Review-Driven Neural Model for Sequential Recommendation. In Proc. of IJCAI.
[20] Yuqi Li, Weizheng Chen, and Hongfei Yan. 2017. Learning Graph-based Embedding For Time-Aware Product Recommendation. In Proc. of CIKM.
[21] Guang Ling, Michael R Lyu, and Irwin King. 2014. Ratings meet reviews, a combined approach to recommend. In Proc. of RecSys.
[22] Wei Liu, Zhi-Jie Wang, Bin Yao, and Jian Yin. 2019. Geo-ALM: POI Recommendation by Fusing Geographical Information and Adversarial Learning Mechanism. In Proc. of IJCAI.
[23] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019).
[24] Jarana Manotumruksa, Craig Macdonald, and Iadh Ounis. 2018. A contextual attention recurrent architecture for context-aware venue recommendation. In Proc. of SIGIR.
[25] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proc. of RecSys.
[26] Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proc. of SIGKDD. 785–794.
[27] Jianmo Ni, Jiacheng Li, and Julian McAuley. 2019. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proc. of EMNLP-IJCNLP. 188–197.
[28] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proc. of NeurIPS.
[29] Richard E Petty and John T Cacioppo. 2012. Communication and persuasion: Central and peripheral routes to attitude change. Springer Science & Business Media.
[30] Hamed Qahri-Saremi and Ali Reza Montazemi. 2019. Factors Affecting the Adoption of an Electronic Word of Mouth Message: A Meta-Analysis. Journal of Management Information Systems 36, 3 (2019).
[31] Tieyun Qian, Bei Liu, Quoc Viet Hung Nguyen, and Hongzhi Yin. 2019. Spatiotemporal representation learning for translation-based POI recommendation. ACM Transactions on Information Systems (TOIS) 37, 2 (2019), 1–24.
[32] Sindhu Raghavan, Suriya Gunasekar, and Joydeep Ghosh. 2012. Review quality aware collaborative filtering. In Proc. of RecSys.
[33] Zhaochun Ren, Shangsong Liang, Piji Li, Shuaiqiang Wang, and Maarten de Rijke. 2017. Social collaborative viewpoint regression with explainable recommendations. In Proc. of WSDM.
[34] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proc. of UAI.
[35] Steffen Rendle, Walid Krichene, Li Zhang, and John R. Anderson. 2020. Neural Collaborative Filtering vs. Matrix Factorization Revisited. In Proc. of RecSys.
[36] Noveen Sachdeva and Julian McAuley. 2020. How Useful are Reviews for Recommendation? A Critical Review and Potential Improvements. In Proc. of SIGIR.
[37] Tetsuya Sakai. 2018. Laboratory Experiments in Information Retrieval - Sample Sizes, Effect Sizes, and Statistical Power. Springer.
[38] Sungyong Seo, Jing Huang, Hao Yang, and Yan Liu. 2017. Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In Proc. of RecSys.
[39] Stephanie Watts Sussman and Wendy Schneier Siegal. 2003. Informational influence in organizations: An integrated approach to knowledge adoption. Information Systems Research 14, 1 (2003), 47–65.
[40] Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proc. of WSDM.
[41] Liang Rebecca Tang, Soocheong Shawn Jang, and Alastair Morrison. 2012. Dual-route communication of destination websites. Tourism Management 33, 1 (2012), 38–49.
[42] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In Proc. of ICLR.
[43] Xi Wang, Anjie Fang, Iadh Ounis, and Craig Macdonald. 2019. Evaluating Similarity Metrics for Latent Twitter Topics. In
Proc. of ECIR .[44] Xi Wang, Iadh Ounis, and Craig Macdonald. 2019. Comparison of SentimentAnalysis and User Ratings in Venue Recommendation. In
Proc. of ECIR .[45] Xi Wang, Iadh Ounis, and Craig Macdonald. 2020. Negative Confidence-AwareWeakly Supervised Binary Classification for Effective Review Helpfulness Classi-fication. In
Proc. of CIKM .[46] Chenghuan Wu and David R Shaffer. 1987. Susceptibility to persuasive appealsas a function of source credibility and prior experience with the attitude object.
Journal of personality and social psychology
52, 4 (1987), 677.[47] Jibang Wu, Renqin Cai, and Hongning Wang. 2020. Déjà vu: A ContextualizedTemporal Attention Mechanism for Sequential Recommendation. In
Proc. ofWWW .[48] Jheng-Hong Yang, Chih-Ming Chen, Chuan-Ju Wang, and Ming-Feng Tsai. 2018.HOP-rec: high-order proximity for implicit recommendation. In
Proc. of RecSys .[49] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. A dynamicrecurrent model for next basket recommendation. In
Proc. of SIGIR .[50] Jia-Dong Zhang and Chi-Yin Chow. 2015. GeoSoCa: Exploiting geographical,social and categorical correlations for point-of-interest recommendations. In
Proc.of SIGIR .[51] Yongfeng Zhang, Qingyao Ai, Xu Chen, and W Bruce Croft. 2017. Joint repre-sentation learning for top-n recommendation with heterogeneous informationsources. In
Proc. of CIKM .[52] Pengpeng Zhao, Haifeng Zhu, Yanchi Liu, Jiajie Xu, Zhixu Li, Fuzhen Zhuang,Victor S Sheng, and Xiaofang Zhou. 2019. Where to go next: A spatio-temporalgated network for next poi recommendation. In
Proc. of AAAI .[53] Lei Zheng, Vahid Noroozi, and Philip S Yu. 2017. Joint deep modeling of usersand items using reviews for recommendation. In