PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction
Yichun Yin
Huawei Noah’s Ark Lab [email protected]
Chenguang Wang
Amazon Web Services [email protected]
Ming Zhang
Peking University mzhang [email protected]
Abstract
Dependency context-based word embedding jointly learns the representations of words and dependency contexts, and has proved effective in aspect term extraction. In this paper, we design a positional dependency-based word embedding (PoD) which considers both dependency context and positional context for aspect term extraction. Specifically, the positional context is modeled via relative position encoding. Besides, we enhance the dependency context by integrating more lexical information (e.g., POS tags) along dependency paths. Experiments on SemEval 2014/2015/2016 datasets show that our approach outperforms other embedding methods in aspect term extraction. The source code will be publicly available soon.
1 Introduction

Aspect term extraction aims to extract expressions that represent properties of products or services from online reviews (Hu and Liu, 2004a,b; Popescu and Etzioni, 2007; Liu, 2010). Understanding the context between words in reviews, for example through conditional random fields (Pontiki et al., 2014, 2015, 2016), is the key to superior results in aspect term extraction. Word embeddings effectively capture contextual information across a wide range of NLP tasks (Tai et al., 2015; Lei et al., 2015; Bojanowski et al., 2017; Devlin et al., 2019), yet they produce only moderate results in aspect term extraction. Recent studies (e.g., Yin et al. (2016)) indicate that this is due to the distributed nature of word embeddings (Mikolov et al., 2013b), which ignores the rich context between words, such as syntactic information.

In this paper, we propose positional dependency-based word embedding (PoD) to enhance the context modeling capability for aspect term extraction. PoD explicitly captures two types of contexts: dependency context and positional context. Inspired by the simple-yet-effective position encoding in Transformer (Vaswani et al., 2017), PoD models the positional context via relative position encoding (Shaw et al., 2018) between words within a fixed window. Besides, the dependency context is defined as the dependency path as well as the attached lexical information (e.g., POS tags and words) along the path. Compared to Yin et al. (2016), PoD incorporates more lexical information into the semantic compositional model via the dependency context, making the representations of dependency paths more informative than ones that only consider grammatical information. We then linearly combine the dependency and positional contexts to produce the positional dependencies among words. We also define a margin-based ranking loss to efficiently optimize PoD.

Our contributions are two-fold: (i) we propose the positional dependency-based word embedding PoD, which incorporates both positional context and dependency context; (ii) we compare PoD with other state-of-the-art aspect term extraction methods and demonstrate that PoD yields better results on aspect term extraction datasets.

2 PoD: Positional Dependency-Based Word Embedding

PoD aims to maximize the likelihood of triples (w_t, c, w_c), where w_t and w_c represent a target word and a context word respectively, and c refers to the positional dependency-based context (an example is in Table 1), which consists of two types of contexts: the dependency context (the dependency path between the target and context word) and the positional context (the relative position encoding between the target and context word).

[Figure 1: Dependency parse of the example sentence "the prepared food smells wonderful" (tokens the/DT, prepared/NN, food/NN, smells/VBZ, wonderful/JJ; arcs det, amod, nsubj, xcomp), produced by Stanford CoreNLP (Manning et al., 2014).]
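To make the context construction concrete, the following sketch (ours, not the authors' released code) reads the positional and dependency contexts for the target word "food" off a hard-coded version of the Figure 1 parse; in practice the parse would come from a dependency parser such as Stanford CoreNLP. The resulting rows match Table 1 below.

```python
from collections import deque

# Hard-coded parse of the Figure 1 sentence; arcs are (head, dependent, relation), 0-based.
tokens = ["the", "prepared", "food", "smells", "wonderful"]
pos = ["DT", "NN", "NN", "VBZ", "JJ"]
arcs = [(2, 0, "det"), (2, 1, "amod"), (3, 2, "nsubj"), (3, 4, "xcomp")]

def dependency_path(src, dst):
    """BFS over arcs in both directions; returns a list of (arrow, node) steps."""
    frontier, seen = deque([(src, [])]), {src}
    while frontier:
        node, path = frontier.popleft()
        if node == dst:
            return path
        for head, dep, rel in arcs:
            for here, there, arrow in ((head, dep, f"-{rel}->"), (dep, head, f"<-{rel}-")):
                if here == node and there not in seen:
                    seen.add(there)
                    frontier.append((there, path + [(arrow, there)]))

def lexical_path(src, dst):
    """Render a lexical dependency path, e.g. '* <-nsubj- smells/VBZ -xcomp-> *'."""
    steps = dependency_path(src, dst)
    out = ["*"]
    for i, (arrow, node) in enumerate(steps):
        out.append(arrow)
        # endpoints are abstracted to '*'; interior words keep their word/POS form
        out.append("*" if i == len(steps) - 1 else f"{tokens[node]}/{pos[node]}")
    return " ".join(out)

target, window = 2, 5  # target word "food", positional window size 5
for ctx in range(len(tokens)):
    if ctx != target and abs(ctx - target) <= window // 2:
        print(tokens[ctx], lexical_path(target, ctx), ctx - target)
```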
Target | Context | DC | PC
food | the | ∗ −det→ ∗ | −2
food | prepared | ∗ −amod→ ∗ | −1
food | smells | ∗ ←nsubj− ∗ | 1
food | wonderful | ∗ ←nsubj− smells/VBZ −xcomp→ ∗ | 2

Table 1: Target word, context words, and their corresponding contexts: DC refers to dependency context and PC refers to positional context.

Figure 1 illustrates the sentence example according to the triples in Table 1.

2.1 Score Functions

We introduce two score functions for triples (w_t, c, w_c):

$$S_{add} = (\mathbf{w}_c + \mathbf{c}) \cdot \mathbf{w}_t^{\top}; \qquad S_{puct} = (\mathbf{w}_c \circ \mathbf{c}) \cdot \mathbf{w}_t^{\top}, \qquad (1)$$

where $S_{add}$ uses element-wise addition to combine the context word and its context $\mathbf{c}$, while $S_{puct}$ uses the element-wise product. We use two embedding matrices $M_t \in \mathbb{R}^{|V| \times d}$ and $M_c \in \mathbb{R}^{|V| \times d}$ to represent target words and context words respectively, where $|V|$ is the size of the vocabulary and $d$ is the dimension of the embeddings. The vectors $\mathbf{w}_c \in \mathbb{R}^{1 \times d}$ and $\mathbf{w}_t \in \mathbb{R}^{1 \times d}$ are obtained through lookup operations. Note that we describe how to derive $\mathbf{c}$ in Section 2.2.

2.2 Positional Dependency-Based Context

We construct the positional dependency-based context $\mathbf{c}$ by linearly combining the dependency context vector $\mathbf{c}_{dep}$, derived from the semantic composition of lexical dependency paths, and the positional context vector $\mathbf{c}_{pos}$, computed based on relative position encoding (Shaw et al., 2018). The representation of the positional dependency-based context is defined in Eq. (2):

$$\mathbf{c} = \alpha \cdot \mathbf{c}_{pos} + (1 - \alpha) \cdot \mathbf{c}_{dep}, \qquad (2)$$

where $\alpha$ trades off the effects of the dependency and positional contexts in the model. The basic idea of relative position encoding rests on the assumption that context words at different relative positions have different impacts on learning the representations of target words. Relative position encoding has proved useful in supervised relation classification (Zeng et al., 2014) and machine translation (Vaswani et al., 2017; Shaw et al., 2018). Similar to word embeddings, we introduce a matrix $M_l \in \mathbb{R}^{(s-1) \times d}$ to represent the relative position encodings and derive $\mathbf{c}_{pos}$ from it, where $s$ is the window size.

We also consider the lexical information along dependency paths when learning the representations of the dependency context. For example, for the pair (food, wonderful) in Figure 1, the corresponding dependency path is ∗ ←nsubj− smells/VBZ −xcomp→ ∗. We treat the words and POS tags as the lexical information, and use $dep = \{g_1, g_2, \ldots, g_{|c|}\}$ to denote the composite lexical dependency path. The embedding matrix $M_{dep} \in \mathbb{R}^{n \times d}$ is utilized to derive the distributed representations of the lexical dependency path $\{g_1, g_2, \ldots, g_{|c|}\}$, where $n$ is the size of the dictionary including words, POS tags and dependency paths. To obtain $\mathbf{c}_{dep}$, we use an RNN model which composes the dependency path representations along the sequence $dep$ in a recurrent manner.

2.3 Training Objective

We use a margin-based ranking objective to learn the model parameters in Eq. (1), which encourages scores of positive triples $(w_t, c, w_c) \in T$ to be higher than scores of sampled triples $(w'_t, c, w_c) \in T'$. The ranking loss is:

$$L = \sum_{(w_t, c, w_c) \in T} \sum_{(w'_t, c, w_c) \in T'} \max\{S(w'_t, c, w_c) - S(w_t, c, w_c) + \delta,\ 0\}, \qquad (3)$$

where $\delta$ is the margin value and $S(\ast)$ is the score function defined in Eq. (1), in which $\mathbf{c}$ is introduced in Eq. (2).
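As a minimal sketch of Eqs. (1)-(3), the snippet below assumes a GRU as the recurrent composition model and leaves the mapping from relative positions and path tokens to integer ids to the caller; the class and function names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class PoD(nn.Module):
    def __init__(self, vocab_size, dict_size, dim=100, window=5, alpha=0.5):
        super().__init__()
        self.M_t = nn.Embedding(vocab_size, dim)    # target-word embeddings
        self.M_c = nn.Embedding(vocab_size, dim)    # context-word embeddings
        self.M_l = nn.Embedding(window - 1, dim)    # relative-position encodings
        self.M_dep = nn.Embedding(dict_size, dim)   # words / POS tags / dependency paths
        self.drop = nn.Dropout(0.5)                 # dropout on the RNN input vectors
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.alpha = alpha

    def score(self, w_t, w_c, pos_idx, path_idx, mode="prod"):
        # compose the lexical dependency path; use the last RNN state as c_dep
        _, h = self.rnn(self.drop(self.M_dep(path_idx)))
        c_dep = h.squeeze(0)                                            # (batch, dim)
        c = self.alpha * self.M_l(pos_idx) + (1 - self.alpha) * c_dep   # Eq. (2)
        combined = self.M_c(w_c) * c if mode == "prod" else self.M_c(w_c) + c
        return (combined * self.M_t(w_t)).sum(-1)                       # Eq. (1)

def ranking_loss(model, w_t, w_t_neg, w_c, pos_idx, path_idx, delta=1.0):
    """Eq. (3): hinge loss pushing positive triples above negatively sampled targets."""
    s_pos = model.score(w_t, w_c, pos_idx, path_idx)
    s_neg = model.score(w_t_neg, w_c, pos_idx, path_idx)
    return torch.clamp(s_neg - s_pos + delta, min=0.0).sum()
```

Here `mode="prod"` corresponds to $S_{puct}$ and `mode="add"` to $S_{add}$.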
Note that Eq. (3) conducts negative sampling on target words rather than dependency paths, which has two advantages: (i) it can exploit dependency paths of arbitrary hops, and the words and POS tags along the path can be utilized; (ii) it avoids memorizing dependency path frequencies, which grow exponentially with the number of hops.

Negative sampling is employed to train the embedding model (Eq. (1)). The randomly chosen words in $T'$ are sampled from the marginal distribution $p(w)$, estimated from the word frequencies in the corpus raised to the 3/4 power (Mikolov et al., 2013a). We set the number of negative samples to 15, a trade-off between training time and performance. The margin $\delta$ is empirically set to 1, following (Collobert and Weston, 2008; Bollegala et al., 2015). To avoid overfitting in the RNN, we apply dropout to the input vectors with a dropout rate of 0.5. Asynchronous gradient descent is used for parallel training. Moreover, Adagrad (Duchi et al., 2011) is used to adaptively change the learning rate, with the initial learning rate set to 0.1.
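A small sketch of the sampling step described above; `word_counts` is a hypothetical frequency table, and the 3/4 power and 15 negatives per positive triple follow the settings in the text.

```python
import numpy as np

# Unigram distribution raised to the 3/4 power (Mikolov et al., 2013a).
word_counts = np.array([523, 77, 3104, 12, 660], dtype=np.float64)  # toy corpus counts
p = word_counts ** 0.75
p /= p.sum()

rng = np.random.default_rng(0)
negative_targets = rng.choice(len(word_counts), size=15, p=p)  # w'_t candidates for T'
```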
3 Experiments

We evaluate PoD on the aspect term extraction benchmark datasets of SemEval 2014/2015/2016. The SemEval 2014 datasets cover two domains, laptop and restaurant, denoted D1 and D2 respectively. The SemEval 2015/2016 datasets only include the restaurant domain; we denote them D3 and D4. We use the corpora introduced in (Yin et al., 2016) to learn the distributed representations of words and lexical dependency paths.

We compare PoD with the top systems in SemEval, which are as follows.
IHS RD (Chernyshevich, 2014) and
DLIREC (Zhiqiang and Wenting, 2014) are the top systems in D1 and D2 respectively, which are both based on CRFs with lexical, syntactic and statistical features.
EliXa (San Vicente et al., 2015) is the top system in D3, which adopts a perceptron.
Nlangp (Toh and Su, 2016) is the top system in D4, which is also based on a CRF model.

We also compare our method with the following embedding-based methods.
DRNLM (Mirowski and Vlachos, 2015) predicts the current word given the previous words, aiming at learning probabilities over sentences.
Skip-gram (Mikolov et al., 2013b) learns word embeddings by predicting context words given target words, while
CBOW (Mikolov et al., 2013a) predicts the target word given its context words.
Glove (Pennington et al., 2014) combines the advantages of global matrix factorization and local context window methods to learn word representations.
DepEmb (Levy and Goldberg, 2014) learns word embeddings using one-hop dependency context.
WDEmb (Yin et al., 2016) jointly learns distributed representations of words and dependency paths. However, WDEmb only considers grammatical information in the dependency context and does not capture positional context.

As the derived embeddings are not necessarily in a bounded range (Turian et al., 2010), which might lead to moderate results, we apply a simple discretization function to make the embedding features more effective (Yin et al., 2016):

$$f_{dis}(M_t^{ij}) = \left\lfloor \frac{(M_t^{ij} - \min(M_t^{*j})) \times l}{\max(M_t^{*j}) - \min(M_t^{*j})} \right\rfloor, \qquad (4)$$

where $\max(M_t^{*j})$ and $\min(M_t^{*j})$ are the maximum and minimum of the $j$-th dimension respectively, and $l$ is the number of discrete intervals. We use the embeddings of $w_i$ and its context words as features to label $w_i$. The window size of the positional context is set to 5, following (Collobert and Weston, 2008).
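A small sketch of the discretization in Eq. (4), with a toy random matrix standing in for the learned embedding matrix $M_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
M_t = rng.normal(size=(1000, 100))  # |V| x d learned embeddings (toy values)
l = 10                              # number of discrete intervals

# Eq. (4): map each dimension to an integer bucket via its column-wise min/max.
col_min, col_max = M_t.min(axis=0), M_t.max(axis=0)
buckets = np.floor((M_t - col_min) * l / (col_max - col_min)).astype(int)
buckets = np.clip(buckets, 0, l - 1)  # fold the column maximum into the last interval

# The discretized rows of word w_i and its window neighbours are then used as
# categorical features when labeling w_i.
features_for_w0 = buckets[0]
```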
To choose $l$, $d$ (Section 2.1) and $\alpha$ (Eq. (2)), 80% of the sentences in the training data are used as the training set, and the remaining 20% as the development set. The dimensions of the word and dependency path embeddings are set to 100; larger dimensions give similar results on the development set but cost more time. $l$ is set to 10, which performs best on the development set. Similarly, $\alpha$ is set to 0.7, 0.5, 0.5 and 0.5 for datasets D1, D2, D3 and D4 respectively.

To make fair comparisons, we also choose the parameters $l$ and $d$ on the development set for the embedding baselines. All embedding dimensions are set to 100. $l$ is set to 15 for Skip-gram, CBOW and WDEmb, and to 10 for Glove and DepEmb. The windows of Skip-gram, CBOW and Glove are set to 5, the same as our model.

The results are reported in Table 2; t-tests are conducted over random initializations.

Method | D1 | D2 | D3 | D4
IHS RD (Top system in D1) | – | – | – | –
DRNLM | 66.91 | 78.59 | 64.75 | 63.89
Skip-gram | 70.52 | 82.20 | 66.98 | 68.57
CBOW | 69.80 | 81.98 | 67.09 | 67.43
Glove | 67.23 | 80.69 | 64.12 | 64.39
DepEmb | 71.02 | 82.78 | 67.55 | 69.23
WDEmb | 73.72 | 83.52 | 68.27 | 70.20
PoD (S_add) | 73.54 | – | – | –
PoD (S_puct) | 74.07 | – | – | –

Table 2: Comparison of F1 scores on the SemEval 2014/2015/2016 datasets. In t-tests, the markers ∗ and † denote significance at two p-value thresholds.
Information | D1 | D2 | D3 | D4
Dependency path | 72.13 | 83.52 | 68.39 | 70.90
+ POS tags (only) | 72.48 | 83.87 | 69.03 | 71.02
+ Words (only) | 73.79 | 84.31 | 69.98 | 71.24
+ POS tags + Words | – | – | – | –

Table 3: Effects of information in the dependency context.
From Table 2, we find that PoD with both $S_{puct}$ and $S_{add}$ consistently outperforms WDEmb, one of the best embedding methods. The reasons are that (i) our model incorporates positional context as relative position encoding to help enhance word embeddings; and (ii) the dependency context leverages the lexical dependency path, capturing more specific lexical information, such as words and POS tags (extracted using Stanford CoreNLP), than WDEmb. PoD also achieves results comparable to the top systems, which are based on hand-crafted features, on all datasets, which shows that our learned embeddings are effective for aspect term extraction. $S_{puct}$ performs better than $S_{add}$, which indicates that the product-based composition is more capable of capturing useful features for aspect term extraction. Among the embedding-based baselines, DepEmb and WDEmb perform better than the others, which indicates that encoding syntactic knowledge into word embeddings is desirable for aspect term extraction.

We also analyze how the POS tags and words along dependency paths affect the final results; the results are presented in Table 3. We observe that both POS tags and words along dependency paths boost aspect term extraction, which indicates that lexical information encodes discriminative information for the representations of dependency paths. Meanwhile, PoD obtains better results by adding both POS tags and words.
4 Related Work

Association rule mining is used in (Hu and Liu, 2004b) to mine aspect terms, and opinion words are used to extract infrequent aspect terms. The relationship between opinion words and aspect words is crucial for extracting aspect terms and is exploited in many follow-up studies. In (Qiu et al., 2011), predefined dependency paths are utilized to iteratively extract aspect terms and opinion words. PoD instead learns the representation of the dependency context.

Dependency-based word embeddings (Levy and Goldberg, 2014; Komninos and Manandhar, 2016) encode dependencies into word embeddings; however, they encode the dependency information only implicitly, model the unit (word plus dependency path) as the context vector, and ignore multi-hop dependency paths. Yin et al. (2016) propose to jointly learn word and dependency context representations and experimentally show that dependency context-based embeddings are effective in aspect term extraction; however, only grammatical information along the dependency paths is considered. We instead introduce a positional dependency-based embedding method which considers both dependency context and positional context. End-to-end aspect term extraction methods (Wang et al., 2016d, 2017d; Li et al., 2018; Xu et al., 2018), based on neural networks and attention mechanisms, have been developed recently; compared to these methods, PoD can be applied to more applications. Compared to deep word representations (Peters et al., 2018; Devlin et al., 2019; Wang et al., 2019), PoD is more efficient, which is crucial for aspect term extraction. Text-to-network methods (Wang et al., 2015a,b, 2016a,c,b, 2017c, 2018) are in general relevant to aspect term extraction; we focus on proposing a more lightweight embedding method.
5 Conclusion

In this paper, we develop a word embedding method specifically for aspect term extraction. Our method considers both positional and dependency context when learning the word embeddings. Meanwhile, the lexical information along dependency paths is encoded into the representations of the dependency context. Compared with other embedding methods, our method achieves better results in aspect term extraction. We plan to apply our method to more NLP tasks (Wang et al., 2013, 2015c, 2017a,b).

References
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL, 5:135–146.
Danushka Bollegala, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Unsupervised cross-domain word representation learning. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 730–740.
Maryna Chernyshevich. 2014. IHS R&D Belarus: Cross-domain extraction of product features using conditional random fields. SemEval 2014, page 309.
Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML, pages 160–167.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186.
John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159.
Minqing Hu and Bing Liu. 2004a. Mining and summarizing customer reviews. In SIGKDD, pages 168–177.
Minqing Hu and Bing Liu. 2004b. Mining opinion features in customer reviews. In AAAI, volume 4, pages 755–760.
Alexandros Komninos and Suresh Manandhar. 2016. Dependency based embeddings for sentence classification tasks. In NAACL, pages 1490–1500.
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2015. Molding CNNs for text: non-linear, non-consecutive convolutions. In EMNLP, pages 1565–1575.
Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In ACL, pages 302–308.
Xin Li, Lidong Bing, Piji Li, Wai Lam, and Zhimou Yang. 2018. Aspect term extraction with history attention and selective transformation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4194–4200.
Bing Liu. 2010. Sentiment analysis and subjectivity. In Handbook of Natural Language Processing, Second Edition, pages 627–666.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL, pages 55–60.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119.
Piotr Mirowski and Andreas Vlachos. 2015. Dependency recurrent neural language models for sentence completion. arXiv preprint arXiv:1507.01193.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL, pages 2227–2237.
Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).
Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Núria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. 2016. SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 19–30.
Maria Pontiki, Haris Papageorgiou, Dimitrios Galanis, Ion Androutsopoulos, John Pavlopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In SemEval 2014, pages 27–35.
Ana-Maria Popescu and Orena Etzioni. 2007. Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining, pages 9–28.
Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target extraction through double propagation. Computational Linguistics, 37(1):9–27.
Iñaki San Vicente, Xabier Saralegi, and Rodrigo Agerri. 2015. EliXa: A modular and flexible ABSA platform. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 748–752.
Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In NAACL-HLT, pages 464–468.
Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566.
Zhiqiang Toh and Jian Su. 2016. NLANGP at SemEval-2016 task 5: Improving aspect based sentiment analysis using neural network features. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 287–293.
Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In ACL, pages 384–394.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, and Anbang Xu. 2017a. CROWD-IN-THE-LOOP: A hybrid approach for annotating semantic roles. In EMNLP, pages 1913–1922.
Chenguang Wang, Laura Chiticariu, and Yunyao Li. 2017b. Active learning for black-box semantic role labeling with neural factors. In IJCAI, pages 2908–2914.
Chenguang Wang, Nan Duan, Ming Zhou, and Ming Zhang. 2013. Paraphrasing adaptation for web search ranking. In ACL, pages 41–46.
Chenguang Wang, Mu Li, and Alexander J. Smola. 2019. Language models with transformers. CoRR, abs/1904.09408.
Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han. 2015a. Incorporating world knowledge to document clustering via heterogeneous information networks. In SIGKDD, pages 1215–1224.
Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, and Jiawei Han. 2017c. Distant meta-path similarities for text-based heterogeneous information networks. In CIKM, pages 1629–1638.
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. 2015b. KnowSim: A document similarity measure on structured heterogeneous information networks. In ICDM, pages 1015–1020.
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. 2016a. Text classification with heterogeneous information network kernels. In AAAI, pages 2130–2136.
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. 2018. Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks. Data Min. Knowl. Discov., 32(6):1735–1767.
Chenguang Wang, Yangqiu Song, Dan Roth, Chi Wang, Jiawei Han, Heng Ji, and Ming Zhang. 2015c. Constrained information-theoretic tripartite graph clustering to identify semantically similar relations. In IJCAI, pages 3882–3889.
Chenguang Wang, Yangqiu Song, Dan Roth, Ming Zhang, and Jiawei Han. 2016b. World knowledge as indirect supervision for document clustering. TKDD, 11(2):13:1–13:36.
Chenguang Wang, Yizhou Sun, Yanglei Song, Jiawei Han, Yangqiu Song, Lidan Wang, and Ming Zhang. 2016c. RelSim: Relation similarity search in schema-rich heterogeneous information networks. In SDM, pages 621–629.
Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, and Xiaokui Xiao. 2016d. Recursive neural conditional random fields for aspect-based sentiment analysis. In EMNLP, pages 616–626.
Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, and Xiaokui Xiao. 2017d. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In AAAI, pages 3316–3322.
Hu Xu, Bing Liu, Lei Shu, and Philip S. Yu. 2018. Double embeddings and CNN-based sequence labeling for aspect extraction. In ACL.
Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, and Ming Zhou. 2016. Unsupervised word and dependency path embeddings for aspect term extraction. In IJCAI.
Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, Jun Zhao, et al. 2014. Relation classification via convolutional deep neural network. In COLING, pages 2335–2344.
Toh Zhiqiang and Wang Wenting. 2014. DLIREC: Aspect term extraction and term polarity classification system. SemEval 2014.