AspeRa: Aspect-based Rating Prediction Model

Sergey I. Nikolenko, Elena Tutubalina, Valentin Malykh, Ilya Shenbin, and Anton Alekseev

Samsung-PDMI Joint AI Center, Steklov Mathematical Institute at St. Petersburg;
Chemoinformatics and Molecular Modeling Laboratory, Kazan Federal University;
Neural Systems and Deep Learning Laboratory, Moscow Institute of Physics and Technology;
Neuromation OU, Tallinn, 10111 Estonia
Abstract.
We propose a novel end-to-end Aspect-based Rating Prediction model (AspeRa) that estimates user ratings based on review texts for the items and at the same time discovers coherent aspects of reviews that can be used to explain predictions or profile users. The AspeRa model uses max-margin losses for joint item and user embedding learning and a dual-headed architecture; it significantly outperforms recently proposed state-of-the-art models such as DeepCoNN, HFT, NARRE, and TransRev on two real-world datasets of user reviews. With a qualitative examination of the aspects and a quantitative evaluation of rating prediction models based on these aspects, we show how aspect embeddings can be used in a recommender system.
Keywords: aspect-based sentiment analysis · recommender systems · aspect-based recommendation · explainable recommendation · user reviews · neural networks · deep learning

Introduction.
As the scale of online services and the Web itself grows, recommender systems increasingly attempt to utilize texts available online, either as items for recommendation or as their descriptions [1, 24, 27, 43]. One key complication is that a single text can touch upon many different features of the item; e.g., the same brief review of a laptop can assess its weight, performance, keyboard, and so on, with different results. Hence, real-world applications need to separate different aspects of reviews. This idea also has a long history [16, 28], and many recent works in recommender systems have applied deep learning methods [11, 33, 35, 43]. In this work, we introduce novel deep learning methods for making recommendations with full-text items, aiming to learn interpretable user representations that reflect user preferences and at the same time help predict ratings. We propose a novel Aspect-based Rating Prediction Model (AspeRa) that learns aspect-based representations of items by encoding word-occurrence statistics into word embeddings and applying dimensionality reduction to extract the most important aspects, which are then used for user-item rating estimation. We investigate how and in what settings such neural autoencoders can be applied to content-based recommendations for text items.
The AspeRa model combines the advantages of deep learning (end-to-end learning, spatial text representation) and topic modeling (interpretable topics) for text-based recommender systems. Fig. 1 shows the overall architecture of AspeRa. The model receives two reviews as input at once, treating both identically. Each review is embedded with self-attention to produce two vectors, one for author (user) features and the other for item features. These two vectors are used to predict the rating corresponding to the review. All vectors are forced to belong to the same feature space. The embedding is produced by the neural Attention-Based Aspect Extraction model (ABAE) [7]. As in topic modeling or clustering, with ABAE the designer can fix a finite number of topics/clusters/aspects, and the goal is to find out, for every document, to what extent it matches each aspect. From a bird's-eye view, ABAE is an autoencoder. Its main feature is the reconstruction loss between a bag-of-words embedding used as the sentence representation and a linear combination of aspect embeddings. The sentence embedding is additionally weighted by self-attention, an attention mechanism where the values are word embeddings and the key is the mean embedding of the words in a sentence.

The first step in ABAE is to compute the embedding $z_s \in \mathbb{R}^d$ of a sentence $s$; below we call it a text embedding: $z_s = \sum_{i=1}^{n} a_i e_{w_i}$, where $e_{w_i} \in \mathbb{R}^d$ is the word embedding of word $w_i$. As word vectors, the ABAE authors use word2vec embeddings trained with the skip-gram model [22]. The attention weights $a_i$ are computed by a multiplicative self-attention model: $a_i = \mathrm{softmax}(e_{w_i}^\top A y_s)$, where $y_s = \frac{1}{n}\sum_{i=1}^{n} e_{w_i}$ is the average of the word embeddings in the sentence and $A \in \mathbb{R}^{d \times d}$ is the learned attention matrix. The second step is to compute the aspect-based sentence representation $r_s \in \mathbb{R}^d$ from an aspect embedding matrix $T \in \mathbb{R}^{k \times d}$, where $k$ is the number of aspects: $p_s = \mathrm{softmax}(W z_s + b)$ and $r_s = T^\top p_s$, where $p_s \in \mathbb{R}^k$ is the vector of probability weights over the $k$ aspect embeddings and $W \in \mathbb{R}^{k \times d}$, $b \in \mathbb{R}^k$ are the parameters of a multi-class logistic regression model. Below we call $r_s$ the reconstructed embedding. To train the model, ABAE uses the cosine distance between $r_s$ and $z_s$ with a contrastive max-margin objective function [41] as the reconstruction error, also adding an orthogonality penalty term that pushes the aspect embedding matrix $T$ to produce aspect embeddings that are as diverse as possible.

The proposed architecture includes an embedder that provides text and reconstruction embeddings for an object, similar to ABAE ("user embedding" and "item embedding" in Fig. 1). The intuition behind this separation of user and item embeddings is as follows: some features (aspects) of an item are important to a given user, but the item also has other features. Hence, we want to extract user aspects from a user's reviews as well as item aspects from an item's reviews. The resulting embedding is conditioned on the aspect representation of the reviews; we will see below that this model can discover interpretable topics. The model contains four embedders in total, one pair of user and item embedders for each of the two reviews considered at once, as shown in Fig. 1.

Fig. 1: Architecture of the proposed AspeRa model.

First, each review is paired with another review by the same user, grouping the reviews by user and shuffling them inside a group, and then with another review of the same item. Thus, the training set gives rise to only twice as many pairs as there are reviews available for training. The rating score of the first review in a pair is used to train the rating predictor (MSE); at the prediction stage, only one "tower" is used.
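To make this two-step computation concrete, the following is a minimal PyTorch sketch of an ABAE-style embedder written directly from the formulas above for $z_s$, $y_s$, $p_s$, and $r_s$. It is an illustrative reimplementation under our reading of the equations, not the authors' released code; in practice the aspect matrix T would be initialized with k-means centroids, as described in the experimental setup below.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ABAEEmbedder(nn.Module):
    """ABAE-style embedder sketch: self-attention over word vectors,
    then projection onto k aspect embeddings (rows of T)."""
    def __init__(self, emb_dim: int, n_aspects: int):
        super().__init__()
        self.A = nn.Parameter(torch.eye(emb_dim))               # attention matrix A (d x d)
        self.W = nn.Linear(emb_dim, n_aspects)                  # logistic-regression W and b
        self.T = nn.Parameter(torch.randn(n_aspects, emb_dim))  # aspect matrix T (k x d)

    def forward(self, E: torch.Tensor):
        # E: (batch, n_words, d) holds fixed word embeddings of one review
        y = E.mean(dim=1)                                 # y_s: mean word embedding
        scores = torch.einsum('bnd,de,be->bn', E, self.A, y)
        a = F.softmax(scores, dim=1)                      # attention weights a_i
        z = (a.unsqueeze(-1) * E).sum(dim=1)              # text embedding z_s
        p = F.softmax(self.W(z), dim=-1)                  # aspect weights p_s
        r = p @ self.T                                    # reconstructed embedding r_s = T^T p_s
        return z, r

In AspeRa, four such embedders (two for users, two for items) share this structure; the text embedding z of a user tower is combined with z of the corresponding item tower to predict the rating, while r feeds the reconstruction and max-margin objectives.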
There are two losses in AspeRa: an MSE loss for rating prediction (Fig. 1) and a max-margin loss that puts user and item embeddings into the same space (Fig. 1). The MSE loss assumes that the rating is predicted as the dot product of the user and item embeddings for a review: $\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N} \left(z_{u_j}^\top z_{i_j} - r_j\right)^2$, where $z_{u_j}$ is the text embedding of the author of review $j$, $z_{i_j}$ is the text embedding of the item review $j$ is about, and $r_j$ is the true rating associated with $j$.

The max-margin loss aims to project all user and item embeddings into the same feature (aspect) space; see Fig. 1. We use it in two ways. First, we push the reconstructed and text embeddings closer together for each user $i$ while pushing the text embeddings of both considered items away: $\mathrm{MaxMargin}(i, j) = \frac{1}{N}\sum_{i,j} \max\left(0,\, 1 - r_{u_i}^\top z_{u_i} + r_{u_i}^\top z_{i_i} + r_{u_i}^\top z_{i_j}\right)$, where $i, j$ are indices of reviews, $r_{u_i}$ is the reconstructed embedding from ABAE for user $i$, $z_{u_i}$ is the text embedding for user $i$, and $z_{i_i}$, $z_{i_j}$ are text embeddings from ABAE for items $i$ and $j$ respectively. This loss is applied for all four possible combinations of users and items, i.e., $(u_i, i_i, i_j)$, $(u_j, i_i, i_j)$, $(i_i, u_i, u_j)$, $(i_j, u_i, u_j)$. Second, we keep user embeddings from two reviews by the same author close: $\mathrm{MaxMargin}(i, j) = \frac{1}{N}\sum_{i,j} \max\left(0,\, 1 - z_{u_i}^\top z_{u_j} + z_{u_i}^\top z_{i_i} + z_{u_i}^\top z_{i_j}\right)$, where $i, j$ are indices of reviews, $z_{u_i}$ and $z_{u_j}$ are user embeddings from ABAE for the authors of reviews $i$ and $j$, and $z_{i_i}$, $z_{i_j}$ are text embeddings from ABAE for items $i$ and $j$ respectively. This second form is applied symmetrically to item and user embeddings for two reviews of the same item written by different authors.
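Both objectives are simple to write down; below is a hedged PyTorch sketch. The margin value is an assumption: 1 is the standard choice for contrastive max-margin objectives [41], and the text does not fix it explicitly. All tensor names are illustrative.

import torch
import torch.nn.functional as F

def rating_mse(z_user, z_item, ratings):
    """MSE loss: the rating is predicted as the dot product
    of the user and item text embeddings of a review."""
    pred = (z_user * z_item).sum(dim=-1)      # z_u^T z_i per review
    return F.mse_loss(pred, ratings)

def max_margin(r_anchor, z_pos, z_neg_a, z_neg_b, margin=1.0):
    """Contrastive hinge: pull (r_anchor, z_pos) together and push the two
    negatives away; margin=1.0 is an assumption, not stated in the text."""
    pos = (r_anchor * z_pos).sum(dim=-1)
    neg = (r_anchor * z_neg_a).sum(dim=-1) + (r_anchor * z_neg_b).sum(dim=-1)
    return torch.clamp(margin - pos + neg, min=0).mean()

The first form is invoked with $(r_{u_i}, z_{u_i}, z_{i_i}, z_{i_j})$ and its three symmetric variants; the second with $(z_{u_i}, z_{u_j}, z_{i_i}, z_{i_j})$ and, symmetrically, for two reviews of the same item.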
Table 1: Two sets of AspeRa hyperparameters (for models with different initialization strategies).

Settings          | AspeRa (GloVe) | AspeRa (SGNS)
Embeddings        | GloVe          | SGNS
Optimization alg. | Adam [13]      | Adam

Table 2: Performance of text-based and collaborative rating prediction models (MSE; lower is better).

Model          | Instant Videos | Toys & Games
NMF            | 0.946          | 0.821
DeepCoNN       | 0.943          | 0.851
Attn+CNN       | 0.936          | -
SVD            | 0.904          | 0.788
HFT            | 0.888          | 0.784
TransRev       | 0.884          | 0.784
NARRE          | -              | 0.769
AspeRa (GloVe) | 0.870          | 0.730
AspeRa (SGNS)  | …              | …

Fig. 2: Comparing AspeRa with GloVe (SGNS clusters), ABAE (SGNS clusters), and LDA with the same vocabulary and 10 topics on Instant Videos; more is better. X-axis: number of top-ranked representative words per aspect; Y-axis: topic coherence scores.

Datasets and experimental setup. We evaluated the proposed model on Amazon Instant Videos 5-core reviews and Amazon Toys and Games 5-core reviews [9, 20] (http://jmcauley.ucsd.edu/data/amazon/). The first dataset consists of reviews written by users with at least five reviews on Amazon and/or for items with at least five reviews; it contains 37,126 reviews, 5,130 users, 1,685 items, and a total of 3,…,453 non-unique tokens. The second dataset follows the same five-minimum-reviews rule; it contains 167,597 reviews, 19,412 users, 11,924 items, and a total of 17,…,324 non-unique tokens. We randomly split each dataset into a 10% test set and a 90% training set, with 10% of the training set used as a validation set for tuning hyperparameters. Following ABAE [7], we set the ortho-regularization coefficient of the aspect matrix to 0.1. Since the model uses an aspect embedding matrix to approximate aspect words in the vocabulary, initialization of the aspect embeddings is crucial. The work [8] used k-means clustering-based initialization [17, 18, 36], where the aspect embedding matrix is initialized with the centroids of the resulting clusters of word embeddings. We compare two word embeddings for AspeRa: GloVe [29] and word2vec [21, 23]. We adopted a GloVe model trained on the Wikipedia 2014 + Gigaword 5 dataset (6B tokens, 400K-word vocabulary, uncased tokens) with dimension 50. For word2vec, we used the training set of reviews to train a skip-gram model (SGNS) with the gensim library [31] with dimension 200, window size 10, and 5 negative samples; see Table 1 for details.
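The SGNS training and the k-means initialization of the aspect matrix described above can be sketched as follows. Argument names follow gensim >= 4.0 and scikit-learn; the corpus below is a placeholder, and the exact preprocessing used in the experiments may differ.

from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Placeholder corpus; the real input is the tokenized training reviews.
tokenized_reviews = [
    ["great", "fun", "kids", "toy", "quality", "plastic"],
    ["boring", "movie", "plot", "acting", "season", "episode"],
] * 100

# Skip-gram with negative sampling (dimension 200, window 10,
# 5 negative samples, as in Table 1).
w2v = Word2Vec(sentences=tokenized_reviews, vector_size=200,
               window=10, negative=5, sg=1, min_count=1)

# k-means initialization of the aspect embedding matrix T [17, 18, 36]:
# one centroid of the word-embedding clusters per aspect.
k = 10                                        # number of aspects
km = KMeans(n_clusters=k, n_init=10).fit(w2v.wv.vectors)
T_init = km.cluster_centers_                  # shape (k, 200), initializes T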
Rating Prediction.
We evaluate the performance of AspeRa in comparison with state-of-the-art models: NMF [42], DeepCoNN [43], Attn+CNN [33], SVD [14], HFT [19], NARRE [4], and TransRev [5]; we introduce these models in the discussion of related work below. Table 2 compares the best mean squared error (MSE) of AspeRa and other models for rating prediction. Results of existing models were adopted from [5], which uses Amazon Instant Videos 5-core reviews with an 80:10:10 split. We also used the results of the NARRE model [4], obtained in the same setup as [5] but with a different random seed. Note that while AspeRa with generic GloVe word embeddings already works better than any other model, adding custom word embeddings trained on the same type of texts improves the results greatly.

Table 3: Sample aspects from Instant Videos discovered by AspeRa (SGNS).

Table 4: Sample aspects from Instant Videos discovered by AspeRa (GloVe).
Topic Quality.
We compared the performance of AspeRa with online LDA [10] trained with the gensim library [31] with the same vocabulary and number of topics, and with ABAE with 10 aspects trained for 18 epochs, initialized with the same word2vec vectors (SGNS) as AspeRa and having the same ortho-regularization coefficient as the best AspeRa model. We evaluate the results in terms of the topic coherence metrics NPMI [2] and PMI [25, 26], computed with the companion software for [15]. Figure 2 shows that quality is generally lower for larger numbers of representative words per aspect (horizontal axis), and that AspeRa achieves scores comparable to LDA and ABAE, although ABAE remains ahead. Tables 3 and 4 present several sample aspects discovered by AspeRa. Qualitative analysis shows that some aspects describe what could be called a topic (a set of words diverse in part of speech and function, describing a certain domain), some encode sentiment (top words are adjectives expressing attitude towards certain objects discussed in the text), and some encode names (actors, directors, etc.). We also found similar patterns in the output of the basic ABAE model [7]. Thus, most aspects are clearly coherent, but there is room for improvement.
Related Work.
Classical collaborative filtering based on matrix factorization (MF) [14, 42] has been extended with textual information, often in the form of topics/aspects; aspect extraction uses topic modeling [37, 38, 44] and phrase-based extraction [34]. Collaborative topic regression (CTR) [39] was one of the first models to combine collaborative-based and topic-based approaches to recommendation: to recommend research articles, it uses an LDA topic vector as a prior for item embeddings in MF. Hidden Factors and Hidden Topics (HFT) [19] also combines MF and LDA, but with user reviews used as contextual information. A few subsequent works use MF along with deep learning approaches; e.g., Collaborative Deep Learning (CDL) [40] improves upon CTR by replacing LDA with a stacked denoising autoencoder. Unlike our approach, all these models learn in an alternating rather than end-to-end manner.

Recent advances in distributed word representations have made them a cornerstone of modern natural language processing [6], with neural networks now widely used to learn text representations. He et al. [7] proposed an unsupervised neural attention-based aspect extraction (ABAE) approach that encodes word-occurrence statistics into word embeddings and applies an attention mechanism to remove irrelevant words, learning a set of aspect embeddings. Several recent works, including DeepCoNN [43], propose a completely different approach. DeepCoNN is an end-to-end model; both user and item embedding vectors in this model are trainable functions (convolutional neural networks) of the reviews associated with a user or item respectively. Experiments on Yelp and Amazon datasets showed significant improvements over HFT. TransNet [3] adds a regularizer on the penultimate layer that forces the network to predict the review embedding. TransRev [5] is based on the same idea of restoring the review embedding from user and item embeddings. Attn+CNN and D-Attn [32, 33] extend DeepCoNN with an attention mechanism on top of text reviews; it both improves performance and makes it possible to explain predictions by highlighting significant words. However, user and item embeddings in these models are learned in a fully supervised way, unlike in the proposed model. Our model combines semi-supervised embedding learning, which makes predictions interpretable similar to HFT, with a deep architecture and end-to-end training.
Conclusion.
We have introduced a novel approach to learning rating- and text-aware recommender systems based on ABAE, metric learning, and autoencoder-enriched learning. Our approach jointly learns interpretable user and item representations. It is, expectedly, harder to tune to achieve good quality, but the final model performs better at rating prediction and almost on par in aspect coherence with other state-of-the-art approaches. Our results can also be viewed as part of the research effort to analyze and interpret deep neural networks, a very important recent trend [12, 30]. We foresee the following directions for future work: (i) further improving prediction quality (especially for models that learn interpretable user representations), (ii) integrating methods that can remove "purely sentimental" aspects into the interpretable models for recommendation discussed above, and (iii) developing visualization techniques for user profiles.
Acknowledgements.
This research was done at the Samsung-PDMI Joint AI Center at PDMI RAS and was supported by Samsung Research.
References
1. Alekseev, A., Nikolenko, S.: Word embeddings for user profiling in online social networks. Computación y Sistemas (2) (2017)
2. Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL, pp. 31–40 (2009)
3. Catherine, R., Cohen, W.: TransNets: Learning to transform for recommendation. In: Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 288–296. ACM (2017)
4. Chen, C., Zhang, M., Liu, Y., Ma, S.: Neural attentional rating regression with review-level explanations. In: Proceedings of the 2018 World Wide Web Conference, pp. 1583–1592. WWW '18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2018). https://doi.org/10.1145/3178876.3186070
5. García-Durán, A., González, R., Oñoro-Rubio, D., Niepert, M., Li, H.: TransRev: Modeling reviews as translations from users to items. CoRR abs/1801.10095 (2018), http://dblp.uni-trier.de/db/journals/corr/corr1801.html
6. Goldberg, Y.: A primer on neural network models for natural language processing. CoRR abs/1510.00726 (2015), http://arxiv.org/abs/1510.00726
7. He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: An unsupervised neural attention model for aspect extraction. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 388–397 (2017)
8. He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: An unsupervised neural attention model for aspect extraction. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 388–397 (2017)
9. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517. International World Wide Web Conferences Steering Committee (2016)
10. Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, pp. 856–864 (2010)
11. Hsieh, C.K., Yang, L., Cui, Y., Lin, T.Y., Belongie, S., Estrin, D.: Collaborative metric learning. In: Proceedings of the 26th International Conference on World Wide Web, pp. 193–201. International World Wide Web Conferences Steering Committee (2017)
12. Kádár, A., Chrupała, G., Alishahi, A.: Representation of linguistic form and function in recurrent neural networks. Computational Linguistics (4), 761–780 (2017)
13. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014), http://arxiv.org/abs/1412.6980
14. Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recommender systems. IEEE Computer (8), 30–37 (2009)
15. Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 530–539 (2014)
16. Liu, B.: Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, vol. 5. Morgan & Claypool Publishers (2012)
17. Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory (2), 129–137 (1982)
18. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. Oakland, CA, USA (1967)
19. McAuley, J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 165–172 (2013)
20. McAuley, J., Targett, C., Shi, Q., Van Den Hengel, A.: Image-based recommendations on styles and substitutes. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 43–52. ACM (2015)
21. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013), http://arxiv.org/abs/1301.3781
22. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
24. Mitcheltree, C., Wharton, V., Saluja, A.: Using aspect extraction approaches to generate review summaries and user profiles. arXiv preprint arXiv:1804.08666 (2018)
25. Newman, D., Karimi, S., Cavedon, L.: External evaluation of topic models. In: Australasian Document Computing Symposium. Citeseer (2009)
26. Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic coherence. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 100–108. HLT '10, Association for Computational Linguistics, Stroudsburg, PA, USA (2010), http://dl.acm.org/citation.cfm?id=1857999.1858011
27. Nikolenko, S.I., Alekseyev, A.: User profiling in text-based recommender systems based on distributed word representations. In: Proc. 5th International Conference on Analysis of Images, Social Networks, and Texts (2016)
28. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval (1-2), 1–135 (2008)
29. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (2014)
30. Radford, A., Jozefowicz, R., Sutskever, I.: Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444 (2017)
31. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, Malta (May 2010), http://is.muni.cz/publication/884893/en
32. Seo, S., Huang, J., Yang, H., Liu, Y.: Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In: Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 297–305. ACM (2017)
33. Seo, S., Huang, J., Yang, H., Liu, Y.: Representation learning of users and items for review rating prediction using attention-based convolutional neural network. In: 3rd International Workshop on Machine Learning Methods for Recommender Systems (MLRec) (SDM '17) (2017)
34. Solovyev, V., Ivanov, V.: Dictionary-based problem phrase extraction from user reviews. In: International Conference on Text, Speech, and Dialogue, pp. 225–232. Springer (2014)
35. Srivastava, A., Sutton, C.: Autoencoding variational inference for topic models. arXiv preprint arXiv:1703.01488 (2017)
36. Steinhaus, H.: Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. (804), 801 (1956)
37. Tutubalina, E., Nikolenko, S.: Inferring sentiment-based priors in topic models. In: Mexican International Conference on Artificial Intelligence, pp. 92–104. Springer (2015)
38. Tutubalina, E., Nikolenko, S.: Constructing aspect-based sentiment lexicons with topic modeling. In: International Conference on Analysis of Images, Social Networks and Texts, pp. 208–220. Springer (2016)
39. Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448–456. ACM (2011)
40. Wang, H., Wang, N., Yeung, D.Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244. ACM (2015)
41. Weston, J., Bengio, S., Usunier, N.: WSABIE: Scaling up to large vocabulary image annotation. In: IJCAI, vol. 11, pp. 2764–2770 (2011)
42. Zhang, S., Wang, W., Ford, J., Makedon, F.: Learning from incomplete ratings using non-negative matrix factorization. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 549–553. SIAM (2006)
43. Zheng, L., Noroozi, V., Yu, P.S.: Joint deep modeling of users and items using reviews for recommendation. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434. ACM (2017)
44. Zhu, X., Blei, D., Lafferty, J.: TagLDA: Bringing document structure knowledge into topic models. Tech. rep., UWisc Technical Report TR-1533 (2006)