Weakly-Supervised Deep Learning for Domain Invariant Sentiment Classification
Pratik Kayal
Indian Institute of Technology Gandhinagar, Gujarat, India
Mayank Singh
Indian Institute of Technology Gandhinagar, Gujarat, India
Pawan Goyal
Indian Institute of Technology Kharagpur, West Bengal, India
ABSTRACT
The task of learning a sentiment classification model that adapts well to any target domain, different from the source domain, is a challenging problem. The majority of existing approaches focus on learning a common representation by leveraging both source and target data during training. In this paper, we introduce a two-stage training procedure that leverages weakly supervised datasets for developing simple lift-and-shift-based predictive models without being exposed to the target domain during the training phase. Experimental results show that transfer with weak supervision from a source domain to various target domains provides performance very close to that obtained via supervised training on the target domain itself.
CCS CONCEPTS
• Computing methodologies → Supervised learning by classification; Semi-supervised learning settings; Neural networks.
KEYWORDS
Sentiment Analysis, Domain Transfer, Weakly labeled datasets
INTRODUCTION
Sentiment analysis is the practice of applying natural language processing and machine learning techniques to examine the polarity (sentiment) of subjective information in text. With advancements in internet infrastructure and cheaper Web services, a large volume of opinionated sentences is produced and consumed by users. The majority of this volume is available on social media, blogs, online retail shops, and discussion forums. Primarily, this user-generated content evaluates the utility and quality of products and their components, such as laptops, mobile phones, and books, and of services such as restaurants, hotels, and events. As of April 2013, 90% of customers' purchase decisions depended on online reviews [23]. However, automatic sentiment classification is a challenging problem due to several factors such as the unavailability of labeled training datasets [6], multilingualism [11], bias [13], etc. We describe relevant past work and our contributions in the following sections.
Figure 1 shows representative sentences from five unrelated domains. Development of supervised classification models is heavily dependent on labeled datasets, which are rarely available in resource-constrained domains. One line of work focuses on creating a general representation for multiple domains based on the co-occurrences of domain-specific and domain-independent features [3, 4, 15, 20, 28–30]. Peng et al. [24] propose an innovative method to simultaneously extract domain-specific and invariant representations, using labeled data from both the source and target domains. Qiu and Zhang [26] identify domain-specific words to improve cross-domain classification. Blitzer et al. [3] propose Structural Correspondence Learning (SCL), which learns a shared feature representation for source and target domains. Pan et al. [20] propose Spectral Feature Alignment (SFA), which constructs an alignment between the source and target domains using co-occurrences between them, in order to build a bridge between the domains. In general, the above methods leverage both source-target domain
Domain I: Laptop reviews
R: Laptop can get warm, to the point of discomfort near the WSAD keys, which I assume is where the GPU is located internally.
S: Negative
Domain II: Restaurant reviews
R: I finished the meal with the "Cookies & Cream" ice cream sandwich. That was a little disappointing... not much flavour to it.
S: Negative
Domain III: Movie reviews
R: Tired of sobby melodramas and stupid comedies? Why not watch a film with a difference?
S: Positive
Domain IV: Weather reviews
R: This week in NYC will mark the longest stretch of dry weather since February.
S: Positive
Domain V: Scientific reviews
R: Many approaches for POS tagging have been developed in the past, including rule-based tagging (Brill, 1995), HMM taggers (Brants, 2000), maximum-entropy models (Ratnaparkhi, 1996), etc. All of these approaches require either a large amount of annotated training data (for supervised tagging) or a lexicon listing all possible tags for each word (for unsupervised tagging).
S: Negative
R: Review Sentence; S: Sentiment
Figure 1: Example reviews from five domains.
CoDS COMAD 2020, January 5–7, 2020, Hyderabad, India. Pratik Kayal, Mayank Singh, and Pawan Goyal.
pairs for training. Obtaining labeled instances in resource-constrained domains remains a challenge. As a remedy, recent advances in transfer learning have led to the development of classification models (termed 'lift-and-shift models') that are trained on a labeled dataset from one domain but can perform significantly better on several other domains [2, 12, 27]. The lift-and-shift model is a natural extension of a single-domain sentiment classifier to cross-domain sentiment classification. Here, we pick a classifier trained on a source domain and classify reviews from the target domain without any prior training on the target domain. However, we observe that models trained on reviews of a particular domain generally do not do well when tested on reviews of an unknown and different target domain. Several limitations in textual domain transfer exist primarily due to out-of-vocabulary tokens [19], stylistic variations [16], non-generalizable features [12], etc. For example, Crammer et al. [5] assumed that the distributions of multiple sources are the same, but that the labelings of the data from different sources may differ from each other.
The majority of e-commerce, travel, and restaurant websites (such as Amazon, Flipkart, Airbnb, etc.) allow customers to submit their reviews along with a rating on a five-point scale. Even though the rating might not directly correlate with the sentiment of the review, it provides weak signals for estimating sentence polarity [10, 31]. We term these weakly supervised rating datasets "Weakly Labeled Datasets (WLDs)". We show that WLDs, in addition to the labeled dataset, produce significant improvements in domain transfer for resource-constrained target domains.
In this work, we propose a two-stage lift-and-shift training procedure that leverages standard labeled sentences along with polarity signals emerging from weakly labeled review datasets. Informally, the proposed model is trained on a single source domain but predicts sentiments for different target domains. We show that even though BERT (Bidirectional Encoder Representations from Transformers) [9] and ELMO [25] achieve state-of-the-art performance and outperform previous benchmarks for single-domain sentiment analysis, they consistently fail in cross-domain sentiment analysis. Our proposed training mechanism adapts to unknown target domains and even performs better than models that explicitly leverage target domain data.
In this paper, we address the cross-domain sentiment classification problem. Given a source domain D_src and a set of N distinct target domains {D_1, D_2, ..., D_N} (collectively denoted D_tar), with D_src ≠ D_tar, the task is to train a classifier on labeled D_src data that achieves high polarity prediction accuracy on sentences from D_tar. We use two types of review datasets: (i) weakly labeled datasets and (ii) fully labeled datasets. Table 1 details the dataset statistics.
Weakly labeled review datasets contain weak signals about the polarity of review sentences. In the current scenario, the weak signals are represented by the user rating associated with each review. User ratings are noisy labels and, used alone, would result in significantly weaker classifiers. As user ratings lie between 1 and 5, with one being the worst and five being the best review, we adopt a simple strategy to assign sentiment labels to these sentences:

sentiment = positive, if rated 4 or 5 stars; negative, if rated 1 or 2 stars.

Please note that we do not consider three-star rated reviews. The current study uses three weakly labeled datasets, Amazon product reviews [18, 31], Yelp restaurant reviews [8], and IMDB movie reviews [17], which we henceforth refer to as AWLD (Amazon weakly labeled dataset), YWLD (Yelp weakly labeled dataset), and IWLD (IMDB weakly labeled dataset), respectively. WLDs are easier to collect: they are not explicitly labeled for sentiment, but carry manual ratings given by reviewers to complement the review text.

Fully labeled datasets (FLDs) consist of manually labeled review sentences. The reviews are labeled into three classes: (i) positive, (ii) negative, or (iii) neutral. For the current study, we only consider reviews associated with positive or negative sentiments. The study leverages six fully labeled datasets: (i) Weather sentiment data [7], (ii) IMDB [14], (iii) Yelp [14], (iv) Amazon (Cell and Accessory) [14], (v) Scientific citation context data [1], and (vi) Amazon (Digital Cameras, Cell Phones and Laptops) [31].

Table 1 presents salient statistics of the two types of review datasets. Experiments are reported for those source domains that possess corresponding WLDs; thus, the scientific citation and weather datasets are not considered as source domains due to the unavailability of WLDs. The compiled datasets are available at https://bit.ly/2EnjsSe.
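The rating-to-label rule above can be sketched as a small helper; the function name `weak_label` and the sample data are ours, for illustration only:

```python
def weak_label(rating):
    """Map a 1-5 star rating to a weak sentiment label.

    Returns "positive" for 4-5 stars, "negative" for 1-2 stars,
    and None for 3-star reviews, which are discarded.
    """
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None  # neutral 3-star reviews are not used

# Build a weakly labeled dataset, dropping 3-star reviews.
reviews = [("Great battery life", 5), ("Screen died in a week", 1), ("It's okay", 3)]
wld = [(text, weak_label(stars)) for text, stars in reviews if weak_label(stars)]
```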
As discussed in previous sections, the intuition is to leverage weak signals generated by WLDs to complement the polarity classification training on FLDs. We therefore present a two-stage training procedure. In the first stage, the predictive model is pre-trained on WLD data with a lower learning rate for a few iterations (in the current paper, one iteration is a single pass over all training instances, i.e., one epoch). The second stage follows a standard training procedure: the predictive model is trained on FLD instances with a usually higher learning rate until convergence. Algorithm 1 presents the detailed methodology. The training procedure is followed by the
Type  Dataset     Abbreviation  Reviews  P/N Ratio
WLD   Amazon      AWLD          1.1M     1.43
WLD   Yelp        YWLD          1M       3.95
WLD   IMDB        IWLD          50,000   1
FLD   Weather     WEAT          980      0.81
FLD   Amazon      ACAD          11,800   0.88
FLD   IMDB        IMDB          1,000    1
FLD   Yelp        YELP          1,000    1
FLD   Amazon      ADLD          1,000    1
FLD   Scientific  SCCD          700      2.96
Table 1: Salient statistics of the datasets. Abbreviations are added for better readability in further sections. Amazon (ACAD) represents the Cell and Accessory category, and Amazon (ADLD) represents the Digital Cameras, Cell Phones, and Laptops category. The rightmost column displays the ratio of counts of positive reviews (P) to negative reviews (N).

testing procedure. Since the current work focuses on domain-invariant sentiment classification, the test dataset domain, in the majority of cases, is different from the train dataset domain.
Algorithm 1: Weakly-Supervised Training for Domain Generalization
Input: D_src: source domain; D_1, D_2, ..., D_N: N target domains
1. Pre-train the model on the WLD of source domain D_src with a low learning rate for 'n' epochs, where n is a small integer;
2. Train the model on the FLD of source domain D_src;
3. for i = 1; i ≤ N; i = i + 1 do: test the trained model on D_i.

For the current polarity prediction task, we train a standard fully-connected feed-forward network with softmax as the activation function and two output perceptrons. We plug this fully connected layer on top of recently published state-of-the-art natural language embedding models. We experiment with two models, (i) BERT base [9] and (ii) ELMO [25], pre-trained on general corpora such as Wikipedia articles. Our proposed algorithm uses their pre-trained weights as the initialization for basic language understanding and adapts them to the specific polarity knowledge.
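Algorithm 1 can be sketched framework-agnostically as a skeleton in which the actual optimizer step is passed in as a callback; the function and parameter names here are ours and the loop structure is the only part taken from the paper:

```python
def two_stage_train(model, wld_batches, fld_batches, pretrain_lr, train_lr,
                    step, n_pretrain_epochs=2, max_train_epochs=10, converged=None):
    """Two-stage procedure: (1) pre-train on weakly labeled data (WLD)
    with a low learning rate for a few epochs; (2) train on fully labeled
    data (FLD) with a higher learning rate until convergence.

    `step(model, batch, lr)` performs one optimizer update; `converged(model)`
    optionally signals early stopping in stage 2.
    """
    # Stage 1: weak-supervision pre-training (low LR, few epochs).
    for _ in range(n_pretrain_epochs):
        for batch in wld_batches:
            step(model, batch, pretrain_lr)
    # Stage 2: supervised training on the FLD (higher LR, until convergence).
    for _ in range(max_train_epochs):
        for batch in fld_batches:
            step(model, batch, train_lr)
        if converged and converged(model):
            break
    return model
```

A deep-learning framework would supply the real `step` (forward pass, loss, backward pass, optimizer update); here the separation simply makes the two learning-rate regimes of the algorithm explicit.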
Each review is tokenized and subjected to standard token- and character-level filtering such as lower-casing and special-character removal. All neutral reviews are also filtered out. Next, each FLD dataset is randomly split into two sets, training and test: 85% of the review sentences are allocated for training and the remaining 15% for testing. Note that WLDs are not subjected to random splitting.
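The preprocessing and split described above can be sketched as follows; `preprocess` is a minimal stand-in for the filtering steps, and the function names are ours:

```python
import random
import re

def preprocess(sentence):
    """Lower-case and strip special characters (a minimal stand-in for the
    token- and character-level filtering described above)."""
    return re.sub(r"[^a-z0-9\s]", "", sentence.lower()).strip()

def split_fld(labeled_reviews, train_frac=0.85, seed=0):
    """Drop neutral reviews, preprocess, and randomly split an FLD
    into 85% train / 15% test."""
    data = [(preprocess(s), y) for s, y in labeled_reviews if y != "neutral"]
    random.Random(seed).shuffle(data)  # fixed seed for a reproducible split
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]
```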
We compare our model with several lift-and-shift baselines that use no WLD-based pretraining. Lift-and-shift models are a natural extension of single-domain sentiment classification models: we pick a classifier trained on a source domain and predict sentiments for sentences from the target domain without any prior training on the target domain. The baseline training procedure does not involve the WLD datasets either.
We compare our proposed model against the standard lift-and-shift baselines described in the previous section. We leverage standard metrics in sentiment classification, (i) average accuracy score and (ii) F1 score, for comparing the predicted polarity with the ground-truth polarity.
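For reference, the two metrics reduce to a few lines over paired label lists (this is the standard definition, not code from the paper):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive="positive"):
    """Binary F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == positive for p in y_pred)   # predicted positives
    true_pos = sum(t == positive for t in y_true)   # actual positives
    if tp == 0:
        return 0.0
    precision, recall = tp / pred_pos, tp / true_pos
    return 2 * precision * recall / (precision + recall)
```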
The BERT and ELMO embedding vector sizes are fixed at 768 and 512 dimensions, respectively. We use a batch size of 64 and the binary cross-entropy loss function. In the case of BERT, we find that learning rates of 3.00e-5 for the training phase and 3.00e-8 for the pretraining phase perform best. Similarly, in the case of ELMO, learning rates of 3.00e-3 for training and 3.00e-6 for pretraining perform best.
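The reported hyperparameters can be collected into a single configuration object; the dictionary layout and key names below are ours, only the values come from the paper:

```python
# Hyperparameters reported in the experimental setup.
CONFIG = {
    "batch_size": 64,
    "loss": "binary_cross_entropy",
    # Per-encoder settings: embedding size, stage-1 (pretrain) LR, stage-2 (train) LR.
    "bert": {"embedding_dim": 768, "pretrain_lr": 3e-8, "train_lr": 3e-5},
    "elmo": {"embedding_dim": 512, "pretrain_lr": 3e-6, "train_lr": 3e-3},
}
```

Note that in both cases the stage-1 pretraining learning rate is three orders of magnitude below the stage-2 training rate.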
We first discuss the performance of the ELMO-based model. Table 2 presents accuracy and F1 scores for the ELMO-based classification model. Here, each target domain is represented by its 15% test data. The ELMO baselines perform best when the system is trained and tested on the same domain (shown by the diagonal cells in Table 2). Training only on WLD datasets produces the worst results (see the row with AWLD as the source). Our proposed training procedure (for example, ACAD-AWLD) yields only marginal gains. We witness similarly poor and marginal performances for the other WLDs and for our other proposed two-stage experiment settings, respectively. We attribute ELMO's poorer gains in cross-domain classification primarily to its limited generalizability compared to state-of-the-art embeddings generated by transformer-based language models. Next, we experiment with other competitive models.

Table 3 presents accuracy and F1 scores for the BERT-based classification model. BERT performs significantly better than ELMO. Our proposed training procedure performs exceptionally well, in some cases even at par with models that are trained and tested on the same domain. This training procedure not only improves over standard lift-and-shift models but also leads to higher transfer results. Again, training only on WLD datasets produces the worst results; however, the values are better than those of the corresponding ELMO-based models. Even though YELP-YWLD performs marginally worse on its own domain (0.3% lower than YELP), it performs significantly better on other domains such as ADLD (6% higher than YELP) and IMDB (4.7% higher than YELP). Similar transfer improvements are reported
                              Target domains
Source      ACAD         WEAT         ADLD         IMDB         YELP         SCCD
            A      F1    A      F1    A      F1    A      F1    A      F1    A      F1
ACAD        82.50  0.809 74.60  0.691 87.30  0.871 74.00  0.735 75.30  0.764 71.20  0.809
WEAT        69.10  0.570 82.10  0.813 69.30  0.596 72.70  0.682 70.70  0.735 29.40  0.182
ADLD        75.30  0.775 68.70  0.687 88.00  0.878 73.30  0.750 78.00  0.802 69.30  0.791
IMDB        77.30  0.756 74.60  0.761 86.00  0.847 78.70  0.802 79.30  0.812 70.60  0.802
YELP        76.10  0.744 74.60  0.746 82.70  0.783 74.70  0.740 81.30  0.823 37.90  0.371
SCCD        47.80  0.647 44.80  0.619 48.00  0.649 52.70  0.687 49.30  0.661 75.80  0.862
AWLD        54.40  0.676 46.30  0.625 52.70  0.670 52.00  0.684 53.30  0.679 75.80  0.862
IWLD        67.30  0.699 59.70  0.640 79.30  0.805 74.70  0.729 74.00  0.755 66.40  0.762
YWLD        73.90  0.765 68.70  0.720 84.00  0.848 74.00  0.780 80.00  0.824 77.80  0.861
ACAD-AWLD   80.00  0.794 71.60  0.708 80.70  0.803 75.30  0.764 73.30  0.762 69.90  0.807
IMDB-IWLD   75.60  0.740 68.70  0.704 80.70  0.788 83.30  0.843 74.70  0.756 61.00  0.691
YELP-YWLD   76.40  0.755 76.10  0.742 90.00  0.891 73.30  0.710 80.70  0.818 58.30  0.560
Table 2: [Color online] Accuracy (A) and F1 scores with ELMO as embeddings. Here, each target domain represents 15% held-out data. Blue and red colors represent the best and second-best values for a given target domain. AWLD, IWLD and YWLD represent training on weakly labeled datasets only. ACAD-AWLD, IMDB-IWLD and YELP-YWLD represent our two-stage training procedure.
                              Target domains
Source      ACAD         WEAT         ADLD         IMDB         YELP         SCCD
            A      F1    A      F1    A      F1    A      F1    A      F1    A      F1
ACAD        90.20  0.898 80.50  0.805 95.30  0.953 89.30  0.901 88.60  0.887 80.20  0.876
WEAT        79.90  0.766 85.00  0.848 86.60  0.857 87.30  0.876 85.30  0.853 67.00  0.731
ADLD        84.10  0.819 85.00  0.827 95.30  0.950 84.60  0.841 91.30  0.909 50.80  0.867
IMDB        82.30  0.805 86.50  0.852 92.60  0.925 91.30  0.915 87.30  0.875 65.80  0.867
YELP        81.90  0.791 79.10  0.787 89.30  0.887 87.30  0.872 95.30  0.953 55.00  0.867
SCCD        73.60  0.747 56.70  0.658 76.60  0.782 76.00  0.785 87.30  0.883 82.00  0.883
AWLD        78.65  0.812 50.70  0.645 89.30  0.898 78.60  0.829 79.30  0.820 77.20  0.870
IWLD        76.00  0.785 73.10  0.769 74.00  0.782 84.70  0.862 79.30  0.825 82.00  0.893
YWLD        85.40  0.855 76.10  0.778 94.00  0.941 90.70  0.912 91.30  0.916 76.60  0.868
ACAD-AWLD   90.50  0.903 85.00  0.843 94.00  0.938 92.60  0.931 92.00  0.921 80.80  0.879
IMDB-IWLD   86.50  0.857 85.10  0.844 93.30  0.933 92.70  0.929 92.70  0.928 76.60  0.868
YELP-YWLD   86.00  0.853 80.60  0.806 95.30  0.953 92.00  0.923 95.00  0.951 71.90  0.868
Table 3: [Color online] Accuracy (A) and F1 scores with BERT as embeddings. Here, each target domain represents 15% held-out data. Blue and red colors represent the best and second-best values for a given target domain. AWLD, IWLD and YWLD represent training on weakly labeled datasets only. ACAD-AWLD, IMDB-IWLD and YELP-YWLD represent our two-stage training procedure.

for ACAD-AWLD and IMDB-IWLD. This transfer performance improvement reconfirms the usefulness of the two-stage training procedure. As expected, domain transfer from SCCD and WEAT is inferior due to the high dissimilarity between SCCD, WEAT, and the other domains.

Note that the performance of models trained only on WLDs is poor compared to that of models trained on FLDs. This observation suggests that WLDs are themselves very noisy and, on their own, not good enough for the sentiment classification task [21, 22]. Also, the poor performance of the same-domain weakly labeled dataset on fully labeled data suggests that the star rating is not highly correlated with sentiment.
CONCLUSION
In this paper, we propose a two-stage training framework for cross-domain sentiment classification. We showcase the utility of combining weak labels and full labels for domain-invariant sentiment analysis. Even though the proposed approach uses more data than the baselines, curating WLD data is a far easier task than curating FLDs, and WLD datasets extend transfer capabilities to a significant extent. Our experimental results on the BERT-based model demonstrate the effectiveness of the proposed framework across a wide range of domains.

The primary focus of this paper has been to train more generalizable sentiment classification models using only single-domain data, without using any target domain signals. The idea of pretraining on weak signals can be explored further by combining various weakly labeled domains.
REFERENCES
[1] Awais Athar. 2011. Sentiment Analysis of Citations using Sentence Structure-Based Features. In Proceedings of the ACL 2011 Student Session.
[2] In Proceedings of Recent Advances in Natural Language Processing (RANLP), Vol. 1. Citeseer, 2–1.
[3] John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 440–447.
[4] Danushka Bollegala, David Weir, and John Carroll. 2011. Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 132–141.
[5] Koby Crammer, Michael Kearns, and Jennifer Wortman. 2007. Learning from multiple sources. In Advances in Neural Information Processing Systems. 321–328.
[6] Sajib Dasgupta and Vincent Ng. 2009. Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2.
[7] Weather sentiment data.
[8] Yelp restaurant reviews dataset.
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[10] Ziyu Guan, Long Chen, Wei Zhao, Yi Zheng, Shulong Tan, and Deng Cai. 2016. Weakly-supervised deep learning for customer review sentiment classification. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, 3719–3725.
[11] Alexander Hogenboom, Bas Heerschop, Flavius Frasincar, Uzay Kaymak, and Franciska de Jong. 2014. Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decision Support Systems 62 (2014), 43–53.
[12] Jing Jiang and ChengXiang Zhai. 2007. A two-stage approach to domain adaptation for statistical classifiers. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. ACM, 401–410.
[13] Muhammad Taimoor Khan, Mehr Durrani, Armughan Ali, Irum Inayat, Shehzad Khalid, and Kamran Habib Khan. 2016. Sentiment analysis and the complex natural language. Complex Adaptive Systems Modeling 4, 1 (2016), 2.
[14] Dimitrios Kotzias, Misha Denil, Nando De Freitas, and Padhraic Smyth. 2015. From group to individual labels using deep features. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 597–606.
[15] Fangtao Li, Sinno Jialin Pan, Ou Jin, Qiang Yang, and Xiaoyan Zhu. 2012. Cross-domain co-extraction of sentiment and topic lexicons. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 410–419.
[16] Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. arXiv preprint arXiv:1804.06437 (2018).
[17] Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 142–150.
[18] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43–52.
[19] Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web. ACM, 751–760.
[20] Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. 2011. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks 22, 2 (2011), 199–210.
[21] Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 115–124.
[22] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing-Volume 10. Association for Computational Linguistics, 79–86.
[23] Ling Peng, Geng Cui, Mengzhou Zhuang, and Chunyu Li. 2014. What do seller manipulations of online product reviews mean to consumers? (2014).
[24] Minlong Peng, Qi Zhang, Yu-gang Jiang, and Xuanjing Huang. 2018. Cross-Domain Sentiment Classification with Target Domain Specific Information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2505–2513.
[25] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018).
[26] Likun Qiu and Yue Zhang. 2015. Word segmentation for Chinese novels. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
[27] Songbo Tan, Gaowei Wu, Huifeng Tang, and Xueqi Cheng. 2007. A novel scheme for domain-transfer problem in the context of sentiment analysis. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. Citeseer, 979–982.
[28] Zhilin Yang, Ruslan Salakhutdinov, and William W Cohen. 2017. Transfer learning for sequence tagging with hierarchical recurrent networks. arXiv preprint arXiv:1703.06345 (2017).
[29] Jianfei Yu and Jing Jiang. 2016. Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 236–246.
[30] Meishan Zhang, Yue Zhang, Wanxiang Che, and Ting Liu. 2014. Type-supervised domain adaptation for joint segmentation and POS-tagging. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 588–597.
[31] Wei Zhao, Ziyu Guan, Long Chen, Xiaofei He, Deng Cai, Beidou Wang, and Quan Wang. 2017. Weakly-supervised deep embedding for product review sentiment analysis.