Context-aware Helpfulness Prediction for Online Product Reviews
This is the preliminary version. The final version has been published in The 15th Asia Information Retrieval Societies Conference proceedings (2019) and can be obtained from https://link.springer.com/chapter/10.1007/978-3-030-42835-8_6
Iyiola E. Olatunji, Xin Li, and Wai Lam
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
{olatunji, lixin, wlam}@se.cuhk.edu.hk

Abstract.
Modeling and prediction of review helpfulness has become more prominent due to the proliferation of e-commerce websites and online shops. Since the functionality of a product cannot be tested before buying, people often rely on different kinds of user reviews to decide whether or not to buy a product. However, quality reviews might be buried deep in the heap of a large number of reviews. Therefore, recommending reviews to customers based on review quality is of the essence. Since there is no direct indication of review quality, most approaches use the information that "X out of Y" users found the review helpful to estimate review quality. However, this approach undermines helpfulness prediction because not all reviews have statistically abundant votes. In this paper, we propose a deep neural model that predicts the helpfulness score of a review. The model is based on a convolutional neural network (CNN) and a context-aware encoding mechanism that can directly capture relationships between words irrespective of their distance in a long sequence. We validated our model on a human-annotated dataset, and the results show that it significantly outperforms existing models for helpfulness prediction.
Keywords: Helpfulness prediction · Context-aware · Product review.
1 Introduction

Reviews have become an integral part of users' experience when shopping online. This trend makes product reviews an invaluable asset because they help customers make purchasing decisions and, consequently, drive sales [7]. Due to the enormous number of reviews, it is important to analyze review quality and to present useful reviews to potential customers. The quality of a review can vary from a well-detailed opinion and argument, to excessive appraisal, to spam. Therefore, predicting the helpfulness of a review involves automatically detecting the influence the review will have on a customer's purchasing decision. Such reviews should be informative and self-contained [6].
The work described in this paper is substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Code: 14204418).
Votes: [1, 8]    HS: 0.13    HAS: 0.85
I received an updated charger as well as the updated replacement power supply due to the recall. It looks as if this has solved all of the problems. I have been using this charger for some time now with no problems. No more heat issues and battery charging is quick and accurate. I can now recommend this charger with no problem. I use this charger several times a week and much prefer it over the standard wall type chargers. I primarily use Eneloop batteries with this charger.
Fig. 1. Example of a review text with helpfulness scores. HS = helpfulness score based on the "X of Y" approach; HAS = human-annotated helpfulness score.
Review helpfulness prediction has been studied using arguments [12], aspects (ASP) [25], and structural (STR) and unigram (UGR) features [26]. Semantic features such as linguistic inquiry and word count (LIWC), general inquirer (INQ) [26], and the Geneva affect label coder (GALC) [14] have also been used to determine review helpfulness. However, using handcrafted features is laborious and expensive due to manual feature engineering and data annotation. Recently, convolutional neural networks (CNNs) [10], more specifically the character-based CNN [5], have been applied to review helpfulness prediction and have been shown to outperform handcrafted features. However, a CNN does not fully capture the semantic representation of a review, since different reviewers have different perspectives and writing styles, and a reviewer's language may affect the meaning of the review; that is, the choice of words of an experienced reviewer differs from that of a new reviewer. Therefore, modelling dependencies between words is important.

Recent works on helpfulness prediction use the "X of Y approach": if X out of Y users vote that a review is helpful, then the helpfulness score of the review is X/Y. However, such a simple method is not effective, as shown in Figure 1. The helpfulness score (HS = 1/8 ≈ 0.13) implies that the review is unhelpful. However, this is not the case, as the review text describes user experience and provides the information needed to draw a reasonable conclusion (it is self-contained). Hence, the review is clearly of high quality (helpful), as pointed out by human annotators (HAS = 0.85). This observation demonstrates that the "X of Y" helpfulness score may not strongly correlate with review quality [26], which undermines the effectiveness of the prediction output.

Similarly, prior methods assume that reviews are coherent [26][5]. However, this is not the case for most reviews because of content divergence features embedded in reviews, such as sentiment divergence (the opinion polarity of the product compared to that of the review). To address these issues, we propose a model that predicts the helpfulness score of a review by considering the context in which words are used in relation to the entire review. We aim to understand the internal structure of the review text.

In order to learn this internal structure, we learn dependencies between words by taking into account the positional encoding of each word and employing the self-attention mechanism to cater for long sequences. We further feed the learned representation into a CNN to produce an output sequence representation.

In our experiments, our framework outperforms existing state-of-the-art methods for helpfulness prediction on the Amazon review dataset. We conducted detailed experiments to quantitatively evaluate the effectiveness of each designed component of our model, and we validated the model on human-annotated data. The code is available.
2 Related Work

Previous works on helpfulness prediction can be broadly grouped into three categories: (a) score regression (predicting a helpfulness score between 0 and 1), (b) classification (classifying a review as either helpful or not helpful), and (c) ranking (ordering reviews by their helpfulness scores). In this paper we define helpfulness prediction as a regression task. Most studies focus on extracting specific (handcrafted) features from review text.

Semantic features such as LIWC (linguistic inquiry and word count), general inquirer (INQ) [26], and GALC (Geneva affect label coder) [14] have been used to extract meaning from the text to determine the helpfulness of a review. Extracting argument features from review text has been shown to outperform semantic features [12], since arguments provide more detailed information about the product. Structural (STR) and unigram (UGR) features [26] have also been exploited.

Content-based features, such as the review text and star rating, and context-based features, such as reviewer/user information, have both been used for the helpfulness prediction task. Content-based features are features that can be obtained from the reviews themselves; they include the length of the review, review readability, the number of words in the review text, word-category features and content divergence features. Context-based features are features derived outside the review; these include reviewer features (profiles) and features that capture similarities between users and reviews (user-reviewer idiosyncrasy). Other metadata, such as the probability of a review and its sentences being subjective, have also been used successfully as features [19][9][17][20][22][8]. Since review text is mostly subjective, modelling all the features that contribute to helpfulness is a complicated task; handcrafted features thus have limited capability and are laborious to obtain due to manual data annotation.

Several methods have been applied to the helpfulness prediction task, including support vector regression [9,29,26], probabilistic matrix factorization [23], linear regression [13], extended tensor factorization models [16], an HMM-LDA based model [18] and multi-layer neural networks [11]. These methods allow the integration of robust constraints into the learning process, which in turn has improved prediction results. Recently, convolutional neural networks (CNNs) have shown significant improvement over existing methods, achieving state-of-the-art performance [10][28]. CNNs automatically extract deep features from raw text, which alleviates manual feature selection. Furthermore,
adding more levels of abstraction, as in character-level representations, has further improved prediction results over the vanilla CNN. Embedding-gated CNN [3] and multi-domain gated CNN [4] are recent methods for helpfulness prediction.

Moreover, attention mechanisms have been combined with CNNs, as in ABCNN for modelling sentence pairs [27]. Several tasks, including textual entailment, sentence representation, machine translation and abstractive summarization, have applied the self-attention mechanism with significant results [1]. However, self-attention has not previously been employed to build a context-aware encoding mechanism for helpfulness prediction. Using the self-attention mechanism on review text is quite intuitive, because even the same word can have a different meaning depending on the context in which it is used.
Fig. 2. Proposed context-aware helpfulness prediction model.
3 Our Proposed Model

We model the problem of predicting the helpfulness score of a review as a regression problem: given a review, we predict its helpfulness score based on the review text. As shown in Figure 2, the sequence of words in the review is embedded and augmented with the corresponding positional encodings to form the input features. These input features are processed by a self-attention block that generates context-aware representations for all tokens. Such representations are then fed into a convolutional neural network (CNN), which computes a vector representation of the entire sequence (i.e., the dependency between each token and the entire sequence), and finally a regression layer predicts the helpfulness score.
The context-aware component of our model consists of positional encoding and a self-attention mechanism. We augment the word embeddings with positional encoding vectors so as to learn a text representation that takes the absolute position of each word into consideration.

Let X = (x_1, x_2, ..., x_n) be a review consisting of a sequence of words. We map each word x_i in a review X to an l-dimensional word vector e_{x_i} stored in an embedding matrix E ∈ R^{V×l}, where V is the vocabulary size. We initialize E with pre-trained GloVe vectors [21] and set the embedding dimension l to 100. A review is therefore represented as Y = (e_{x_1}, e_{x_2}, ..., e_{x_n}).

Since the above representation captures only the meaning of each word, we also need the position of each word in order to understand its context. Let S = (s_1, s_2, ..., s_n) be the positions of the words in a sentence. Inspired by Vaswani et al. [24], the positional encoding, denoted as PE ∈ R^{n×l}, is a 2D constant matrix with position-specific values calculated by the sine and cosine functions below:

    PE(s_k, i) = sin(s_k / j^{i/l}),    PE(s_k, i+1) = cos(s_k / j^{i/l})    (1)

where i is the position along the embedding vector and j is a constant representing the distance between successive peaks (or successive troughs) of the sine and cosine functions. This constant lies between 2π and 10000; based on a tuning process, we set j to 1000. This yields the sequence P = (P_1, P_2, ..., P_n), where P_k = PE(s_k) is the row vector of PE corresponding to position s_k, as in Equation 1. The final representation e' = (e'_1, e'_2, ..., e'_n) is obtained by adding each word embedding to the positional values of its position in the sequence, i.e., e'_k = P_k + e_{x_k}.

Self-attention is employed in our model. Given an input e', self-attention applies the attention mechanism to each e'_i, using e'_i as the query vector against key-value pairs from all other positions. The reason for using the self-attention mechanism is to capture the internal structure of the sentence by learning dependencies between words. We use scaled dot-product attention, which allows faster computation than the standard additive attention mechanism [2]. It computes the attention scores as:

    Attention(Q, K, V) = softmax(QK^T / √l) V    (2)

where Q, K and V are the query, key and value matrices, respectively. The equation above implies that we divide the dot product of the queries with all keys by the square root of the key vector dimension to obtain the weights on the values.
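To make Equations 1 and 2 concrete, the following is a minimal NumPy sketch of the sinusoidal positional encoding (with j = 1000) and of scaled dot-product attention; the toy dimensions and variable names are illustrative only, not part of our implementation.

    import numpy as np

    def positional_encoding(n, l, j=1000.0):
        # Eq. 1: PE(s_k, i) = sin(s_k / j^(i/l)); PE(s_k, i+1) = cos(s_k / j^(i/l))
        pe = np.zeros((n, l))
        s = np.arange(n)[:, None]               # token positions s_k
        denom = j ** (np.arange(0, l, 2) / l)   # j^(i/l) for even indices i
        pe[:, 0::2] = np.sin(s / denom)
        pe[:, 1::2] = np.cos(s / denom)
        return pe

    def scaled_dot_product_attention(Q, K, V):
        # Eq. 2: Attention(Q, K, V) = softmax(Q K^T / sqrt(l)) V
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V

    # Toy example: a 7-token review with embedding dimension l = 100
    n, l = 7, 100
    embeddings = np.random.randn(n, l) * 0.1            # stand-in for GloVe vectors
    e_prime = embeddings + positional_encoding(n, l)    # e'_k = P_k + e_{x_k}
    context = scaled_dot_product_attention(e_prime, e_prime, e_prime)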
We use multi-head attention similar to [24]. The multi-head attention mechanism maps the input vector e' to query, key and value matrices using different linear projections, which allows the self-attention mechanism to be applied h times. The vectors produced by the different heads are then concatenated to form a single vector.

Concisely, our model captures context for a sequence as follows. We obtain the relative positions of the tokens in the sequence from Equation 1. The self-attention block learns the context by relating different positions s_1, s_2, ..., s_n of P via Equation 1 so as to compute a single encoding representation of the sequence. By employing multi-head attention, our model can attend to words from different encoding representations at different positions. We set the number of heads h to 2. The queries, keys and values used in the self-attention block are obtained from the output of the previous layer of the context-aware encoding block. This design allows every position in the context-aware encoding block to attend over all positions of the input sequence. To consider positional information in sequences longer than those observed in training, we apply the sinusoidal positional encoding to the input embedding.

The output e' of the context-aware encoding representation is fed into the CNN to obtain new feature representations for making predictions. We employ multiple filters f ∈ {1, 2, 3}, which is similar to learning uni-gram, bi-gram and tri-gram representations, respectively. Specifically, for each filter, we obtain a hidden representation r = Pool(Conv(e', filterSize(f, l, c))), where c is the channel size, Pool is the pooling operation and Conv(·) is the convolution operation. In our experiments, we use max pooling and average pooling. The final representation h is obtained by concatenating all hidden representations, i.e., h = [r_1, r_2, r_3]. These features are then passed to the regression layer to produce the helpfulness score.
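A condensed PyTorch sketch of the architecture described above is given below, under the stated settings (h = 2 heads, filter sizes f ∈ {1, 2, 3}, l = 100) and an assumed channel size c = 100; it illustrates the design rather than reproducing our exact implementation.

    import torch
    import torch.nn as nn

    class ContextAwareHelpfulness(nn.Module):
        # Sketch: multi-head self-attention over position-augmented embeddings,
        # then 1-/2-/3-gram convolutions, pooling, and a regression head.
        def __init__(self, l=100, heads=2, channels=100, filters=(1, 2, 3)):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=l, num_heads=heads,
                                              batch_first=True)
            self.convs = nn.ModuleList(
                [nn.Conv1d(in_channels=l, out_channels=channels, kernel_size=f)
                 for f in filters])
            self.dropout = nn.Dropout(0.5)            # regularization, rate 0.5
            self.out = nn.Linear(channels * len(filters), 1)

        def forward(self, e_prime):                   # e_prime: (batch, n, l)
            ctx, _ = self.attn(e_prime, e_prime, e_prime)  # Q = K = V
            x = ctx.transpose(1, 2)                   # (batch, l, n) for Conv1d
            # r = Pool(Conv(e', filterSize(f, l, c))) for each filter size f
            pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            h = self.dropout(torch.cat(pooled, dim=1))     # h = [r_1, r_2, r_3]
            return self.out(h).squeeze(-1)            # predicted helpfulness score

    model = ContextAwareHelpfulness()
    scores = model(torch.randn(8, 50, 100))           # a batch of 8 fifty-token reviews

For the average-pooling variant (S_Avg, discussed later), the max over time in the pooling step would simply be replaced by a mean.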
4 Experiments

We used two datasets for our experiments. The first dataset, called D1, is constructed from the Amazon product review dataset [15], which consists of over 142 million reviews posted on Amazon between 1996 and 2014. We used a subset of 21,776,678 reviews from 5 categories, namely health, electronics, home, outdoor and phone, and selected reviews with over 5 votes as done in [5,26]. The statistics of the dataset are shown in Table 1. We removed reviews having fewer than 7 words so that we could experiment with different filter sizes. Note that this is the largest dataset used for the helpfulness prediction task.

The second dataset, called D2, is the human-annotated dataset from Yang et al. [26]. It consists of 400 reviews, with 100 reviews selected randomly from each of four product categories (outdoor, electronics, home and books). The reason for using the human-annotated dataset is to verify that our model truly learns deep semantic features of review text; our model was therefore never trained on the human-annotated dataset, which is used only for evaluating its effectiveness. We used only three of its categories for our experiments and performed cross-domain experiments on categories not in D2.
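For illustration, the construction of D1 can be sketched as below; the field names (helpful, reviewText) follow the public Amazon review dump of [15], and the file name is a placeholder.

    import pandas as pd

    # Load one category of the raw Amazon dump; the file name is a placeholder
    reviews = pd.read_json("reviews_Health.json.gz", lines=True)

    # The dump stores votes as helpful = [helpful_votes, total_votes]
    votes = pd.DataFrame(reviews["helpful"].tolist(),
                         columns=["helpful_votes", "total_votes"])
    reviews = pd.concat([reviews, votes], axis=1)

    # Keep reviews with more than 5 votes and at least 7 words
    reviews = reviews[reviews["total_votes"] > 5]
    reviews = reviews[reviews["reviewText"].str.split().str.len() >= 7]

    # Ground-truth label: the "X of Y" helpfulness score
    reviews["score"] = reviews["helpful_votes"] / reviews["total_votes"]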
Table 1. Statistics of Amazon reviews from the 5 product categories. We used Health instead of Watches as done by Chen et al. [5] because Watches is excluded from the newly published Amazon dataset.

Product category   # reviews (>5 votes)   # reviews (total)
Phone                       261,370            3,447,249
Outdoor                     491,008            3,268,695
Health                      550,297            2,982,326
Home                        749,564            4,253,926
Electronics               1,310,513            7,824,482
Following previous works [5,26,25], all experiments were evaluated using correlation coefficients between the predicted helpfulness scores and the ground-truth scores. We split the dataset D1 into train/test/validation sets (70%/20%/10%). We used the same baselines as the state-of-the-art convolutional model for helpfulness prediction [5], i.e., STR, UGR, LIWC, INQ [26] and ASP [25]. CNN is the CNN model of [10], and C_CNN is the state-of-the-art character-based CNN from [5]. We added two variants (S_Attn and S_Avg) to test different components of our model: S_Attn uses only self-attention without CNN, while S_Avg is self-attention with CNN using average pooling; finally, our full model uses max pooling with context-aware encoding. We re-implemented all baselines as well as C_CNN as described by [5], but excluded the transfer-learning part of their model, since it is designed for tackling the insufficient-data problem. We used ReLU for non-linearity and set the dropout rate to 0.5 (for regularization). We used adaptive moment estimation (Adam) as our optimizer, with the learning rate set to 0.001 and l set to 100. We experimented with different filter sizes and found that f ∈ {1, 2, 3} produces the best results. We also tried recurrent neural networks (RNNs) such as LSTM and BiLSTM, but they performed worse than CNN.
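The following sketch summarizes the training and evaluation protocol under the stated settings (70/20/10 split, Adam with learning rate 0.001); the MSE objective and the choice of Pearson correlation are assumptions made for illustration, as the text does not name them explicitly.

    import numpy as np
    import torch
    from scipy.stats import pearsonr

    # Dummy stand-ins: position-augmented review tensors and X-of-Y labels
    X = torch.randn(1000, 50, 100)
    y = torch.rand(1000)

    # 70/20/10 train/test/validation split
    idx = np.random.permutation(len(y))
    train, test, val = np.split(idx, [700, 900])

    model = ContextAwareHelpfulness()             # from the sketch above
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = torch.nn.MSELoss()                  # assumed regression objective

    for epoch in range(3):                        # abbreviated training loop
        optimizer.zero_grad()
        loss = loss_fn(model(X[train]), y[train])
        loss.backward()
        optimizer.step()

    # Evaluate: correlation between predicted and ground-truth scores
    model.eval()
    with torch.no_grad():
        pred = model(X[test]).numpy()
    print("test correlation: %.3f" % pearsonr(pred, y[test].numpy())[0])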
As shown in Table 2, our context-aware encoding based model using max pooling outperforms all handcrafted features and CNN-based models, including C_CNN, by a large margin on D1. This is because, by applying attention at different positions of the word embedding, different kinds of information about word dependencies are extracted, which in turn handles context variation around the same word.
Table 2. Experimental results for the dataset D1.

         Phone   Outdoor   Health   Home   Electronics
STR      0.136    0.210    0.295   0.210      0.288
UGR      0.210    0.299    0.301   0.278      0.310
LIWC     0.163    0.287    0.268   0.285      0.350
INQ      0.182    0.324    0.310   0.291      0.358
ASP      0.185    0.281    0.342   0.233      0.366
CNN      0.221    0.392    0.331   0.347      0.411
C_CNN    0.270    0.407    0.371   0.366      0.442
S_Attn   0.219    0.371    0.349   0.358      0.436
S_Avg    0.194    0.236    0.336   0.318      0.382
Ours
Table 3. Experimental results for the dataset D2 (Fusion_all = STR + UGR + LIWC + INQ).

             Outdoor   Home   Electronics
Fusion_all    0.417    0.596     0.461
CNN           0.433    0.521     0.410
C_CNN         0.605    0.592     0.479
Ours
Table 4. Cross-domain investigation.

         D1-Phone → D2-Home   D1-Health → D2-Electronics
C_CNN          0.389                    0.436
Ours

However, using self-attention alone (S_Attn, Table 2) performs worse than CNN, since learning word dependencies alone is not sufficient for our task; we further need to understand the internal structure of the review text. Since self-attention can handle longer sequence lengths than CNN when modelling dependencies, we opt to capture the dependencies using self-attention and then encode them into a vector representation using CNN to further extract position-invariant features. Two variants are presented, using average pooling (S_Avg) and max pooling. S_Avg performs comparably to handcrafted features, probably due to its tendency to select tokens having low attention scores. Our proposed model with max pooling produces the best results on D1 (Table 2) and significantly better results on D2 (Table 3), since it selects the representation receiving the most attention. This implies that our model can capture the dependency between tokens and the entire sequence; likewise, it understands the internal structure of a review and correlates highly with human scores.

Since D2 does not include the Phone and Health categories, we tested our proposed model trained on the Phone and Health categories of D1 on the Home and Electronics categories of D2, respectively. Specifically, we used the training data of the Phone category from D1 to train our proposed model and the data of the Home category from D2 for testing. Similarly, we used the training data of the Health category from D1 and tested the model on the data of the Electronics category from D2.

As shown in Table 4, the result is quite surprising. It shows that our proposed model can effectively learn cross-domain features and is robust to the out-of-vocabulary (OOV) problem, predicting reasonable helpfulness scores that correlate highly with human scores.
5 Conclusions

Predicting review helpfulness can substantially save a potential customer's time by presenting the most useful reviews first. In this paper, we propose a context-aware encoding based method that learns dependencies between words in order to understand the internal structure of a review. Experimental results on human-annotated data show that our model is a good estimator of review helpfulness and is robust to the out-of-vocabulary (OOV) problem. In the future, we aim to explore learning-to-rank models to effectively rank reviews by helpfulness score while incorporating other factors that may affect helpfulness prediction, including the type of product.
References
1. Ambartsoumian, A., Popowich, F.: Self-attention: A better building block for sentiment analysis neural network classifiers. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. pp. 130–139 (2018)
2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proc. of ICLR. pp. 1–15 (2015)
3. Chen, C., Qiu, M., Yang, Y., Zhou, J., Huang, J., Li, X., Bao, F.S.: Review helpfulness prediction with embedding-gated CNN. arXiv (2018)
4. Chen, C., Qiu, M., Yang, Y., Zhou, J., Huang, J., Li, X., Bao, F.S.: Multi-domain gated CNN for review helpfulness prediction. In: Proc. of WWW. pp. 2630–2636 (2019)
5. Chen, C., Yang, Y., Zhou, J., Li, X., Bao, F.S.: Cross-domain review helpfulness prediction based on convolutional neural networks with auxiliary domain discriminators. In: Proc. of NAACL-HLT. pp. 602–607 (2018)
6. Diaz, G.O., Ng, V.: Modeling and prediction of online product review helpfulness: A survey. In: Proc. of ACL. pp. 698–708 (2018)
7. Duan, W., Gu, B., Whinston, A.B.: The dynamics of online word-of-mouth and product sales: An empirical investigation of the movie industry. Journal of Retailing, 233–242 (2008)
8. Ghose, A., Ipeirotis, P.G.: Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering (10), 1498–1512 (2011)
9. Kim, S.M., Pantel, P., Chklovski, T., Pennacchiotti, M.: Automatically assessing review helpfulness. In: Proc. of EMNLP. pp. 423–430 (2006)
10. Kim, Y.: Convolutional neural networks for sentence classification. In: Proc. of EMNLP. pp. 1746–1751 (2014)
11. Lee, S., Choeh, J.Y.: Predicting the helpfulness of online reviews using multilayer perceptron neural networks. Expert Systems with Applications (6), 3041–3046 (2014)
12. Liu, H., Gao, Y., Lv, P., Li, M., Geng, S., Li, M., Wang, H.: Using argument-based features to predict and analyse review helpfulness. In: Proc. of EMNLP. pp. 1358–1363 (2017)
13. Lu, Y., Tsaparas, P., Ntoulas, A., Polanyi, L.: Exploiting social context for review quality prediction. In: Proc. of WWW. pp. 691–700 (2010)
14. Martin, L., Pu, P.: Prediction of helpful reviews using emotions extraction. In: Proc. of AAAI. pp. 1551–1557 (2014)
15. McAuley, J., Targett, C., van den Hengel, A.: Image-based recommendations on styles and substitutes. In: Proc. of SIGIR (2015)
16. Moghaddam, S., Jamali, M., Ester, M.: ETF: Extended tensor factorization model for personalizing prediction of review helpfulness. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. pp. 163–172 (2012)
17. Mudambi, S.M., Schuff, D.: Research note: What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Quarterly (1), 185–200 (2010)
18. Mukherjee, S., Popat, K., Weikum, G.: Exploring latent semantic factors to find useful product reviews. In: Proceedings of the 2017 SIAM International Conference on Data Mining. pp. 480–488 (2017)
19. Otterbacher, J.: 'Helpfulness' in online communities: A measure of message quality. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 955–964 (2009)
20. Pan, Y., Zhang, J.Q.: Born unequal: A study of the helpfulness of user-generated product reviews. Journal of Retailing (4), 598–612 (2011)
21. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proc. of EMNLP. pp. 1532–1543 (2014)
22. Salehan, M., Kim, D.J.: Predicting the performance of online consumer reviews: A sentiment mining approach to big data analytics. Decision Support Systems, 30–40 (2016)
23. Tang, J., Gao, H., Hu, X., Liu, H.: Context-aware review helpfulness rating prediction. In: Proceedings of the 7th ACM Conference on Recommender Systems (2013)
24. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proc. of NIPS (2017)
25. Yang, Y., Chen, C., Bao, F.S.: Aspect-based helpfulness prediction for online product reviews. In: Proc. of International Conference on Tools with Artificial Intelligence (ICTAI). pp. 836–843 (2016). https://doi.org/10.1109/ICTAI.2016.0130
26. Yang, Y., Yan, Y., Qiu, M., Bao, F.S.: Semantic analysis and helpfulness prediction of text for online product reviews. In: Proc. of ACL-IJCNLP. pp. 38–44 (2015)
27. Yin, W., Schütze, H., Xiang, B., Zhou, B.: ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics 4 (2016)