A Joint Training Dual-MRC Framework for Aspect Based Sentiment Analysis
Yue Mao, Yi Shen, Chao Yu, Longjun Cai
Alibaba Group, Beijing, China
{maoyue.my, sy133477, aiqi.yc, longjun.clj}@alibaba-inc.com

Abstract
Aspect based sentiment analysis (ABSA) involves three fundamental subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification. Early works only focused on solving one of these subtasks individually. Some recent work focused on solving a combination of two subtasks, e.g., extracting aspect terms along with sentiment polarities or extracting the aspect and opinion terms pair-wisely. More recently, the triple extraction task has been proposed, i.e., extracting the (aspect term, opinion term, sentiment polarity) triples from a sentence. However, previous approaches fail to solve all subtasks in a unified end-to-end framework. In this paper, we propose a complete solution for ABSA. We construct two machine reading comprehension (MRC) problems and solve all subtasks by jointly training two BERT-MRC models with parameter sharing. We conduct experiments on these subtasks, and results on several benchmark datasets demonstrate the effectiveness of our proposed framework, which significantly outperforms existing state-of-the-art methods.
Introduction
Aspect based sentiment analysis (ABSA) is an important research area in natural language processing. Consider the example in Figure 1: in the sentence "The ambience was nice, but the service was not so great.", the aspect terms (AT) are "ambience/service" and the opinion terms (OT) are "nice/not so great". Traditionally, there exist three fundamental subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification. Recent research works aim at a combination of two subtasks and have achieved great progress. For example, they extract (AT, OT) pairs, or extract ATs with the corresponding sentiment polarities (SP). More recently, work that aims to solve all related ABSA subtasks with a unified framework has raised increasing interest.

For convenience, we assume the following abbreviations of ABSA subtasks, as illustrated in Figure 1:
• AE: AT extraction
• OE: OT extraction
• SC: aspect-level sentiment classification
• AESC: AT extraction and sentiment classification (in some works also referred to as target based sentiment analysis, TBSA, or simply as ABSA)
• AOE: aspect-oriented OT extraction (also referred to as target oriented opinion word extraction, TOWE)
• Pair: (AT, OT) pair extraction
• Triple: (AT, OT, SP) triple extraction

[Figure 1: An illustrative example of ABSA subtasks.]

We mainly focus on the task of extracting (a, o, s) triples, since it is the hardest among all ABSA subtasks. Peng et al. (2020) proposed a unified framework to extract (AT, OT, SP) triples. However, it is computationally inefficient, as the framework has two stages and has to train three separate models.

In this paper, we propose a joint training framework to handle all ABSA subtasks (described in Figure 1) in one single model. We use BERT (Devlin et al. 2019) as our backbone network and use a span based model to detect the start/end positions of ATs/OTs in a sentence. Span based methods outperform traditional sequence tagging based methods for extraction tasks (Hu et al. 2019). Following this idea, a heuristic multi-span decoding algorithm is used, which is based on the non-maximum suppression (NMS) algorithm (Rosenfeld and Thurston 1971). We convert the original triple extraction task into two machine reading comprehension (MRC) problems. MRC methods are known to be effective when a pre-trained BERT model is used; the reason might be that BERT is usually pre-trained with the next sentence prediction objective to capture pairwise sentence relations. Theoretically, the triple extraction task can be decomposed into the subtasks AE, AOE and SC. Thus, we use the left MRC to handle AE and the right MRC to handle AOE and SC. Our main contributions in this paper are as follows:
• We show the triple extraction task can be jointly trained with three objectives.
• We propose a dual-MRC framework that can handle all subtasks in ABSA (as illustrated in Table 1).
• We conduct experiments to compare our proposed framework on these tasks.
Experimental results show that our proposed method outperforms the state-of-the-art methods.

Table 1: Our proposed dual-MRC can handle all ABSA subtasks.

Subtask   Left-MRC     Right-MRC        Right-MRC
          (extraction) (classification) (extraction)
AE        ✓
AOE                                     ✓
SC                     ✓
AESC      ✓            ✓
Pair      ✓                             ✓
Triple    ✓            ✓                ✓
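The routing in Table 1 can be restated programmatically. The dictionary below is our own restatement of the table for illustration; it is not code from the paper.

```python
# Which dual-MRC outputs each ABSA subtask consumes, restating Table 1.
# Columns: (left-MRC extraction, right-MRC classification, right-MRC extraction).
USES = {
    "AE":     (True,  False, False),
    "AOE":    (False, False, True),
    "SC":     (False, True,  False),
    "AESC":   (True,  True,  False),
    "Pair":   (True,  False, True),
    "Triple": (True,  True,  True),
}

# Triple is the only subtask that needs all three outputs.
assert [t for t, u in USES.items() if all(u)] == ["Triple"]
```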
Related Work
Aspect-based sentiment analysis (ABSA) has been widely studied since it was first proposed in (Hu and Liu 2004). In this section, we present existing works on ABSA according to the related subtasks.
SC. Various neural models have been proposed for this task in recent years. The core idea of these works is to capture the intricate relationship between an aspect and its context by designing various neural architectures, such as CNNs (Huang and Carley 2018; Li et al. 2018a), RNNs (Tang et al. 2016; Zhang, Zhang, and Vo 2016; Ruder, Ghaffari, and Breslin 2016), attention-based networks (Ma et al. 2017; Du et al. 2019; Wang et al. 2016; Gu et al. 2018; Yang et al. 2017), and memory networks (Tang, Qin, and Liu 2016; Chen et al. 2017; Fan et al. 2018). Sun, Huang, and Qiu (2019) convert SC to a BERT sentence-pair classification task, which achieves state-of-the-art results on this task.

AE. As the pre-task of SC, AE aims to identify all aspect terms in a sentence (Hu and Liu 2004; Pontiki et al. 2014) and is usually regarded as a sequence labeling problem (Li et al. 2018b; Xu et al. 2018; He et al. 2017). Besides, Ma et al. (2019) and Li et al. (2020) formulate AE as a sequence-to-sequence learning task and also achieve impressive results.

AESC. In order to make AESC meet the needs of practical use, plenty of previous works make efforts to solve AE and SC simultaneously. Simply merging AE and SC in a pipeline manner leads to an error-propagation problem (Ma, Li, and Wang 2018). Some works (Li et al. 2019a,b) attempt to extract aspects and predict the corresponding sentiment polarities jointly through sequence tagging with a unified tagging scheme. However, these approaches are inefficient due to the compositionality of candidate labels (Lee et al. 2016) and may suffer from the sentiment inconsistency problem. Zhou et al. (2019) and Hu et al. (2019) utilize span-based methods to conduct AE and SC at the span level rather than the token level, which overcomes the sentiment inconsistency problem. It is worth noting that the information of opinion terms is under-exploited in these works.

OE. Opinion term extraction (OE) is widely employed as an auxiliary task to improve the performance of AE (Yu, Jiang, and Xia 2019; Wang et al. 2017; Wang and Pan 2018), SC (He et al. 2019), or both of them (Chen and Qian 2020). However, the extracted ATs and OTs in these works are not in pairs; as a result, they cannot provide the cause for the corresponding polarity of an aspect.

AOE. The task AOE (Fan et al. 2019) has been proposed for pair-wise aspect and opinion term extraction in which the aspect terms are given in advance. Fan et al. (2019) design an aspect-fused sequence tagging approach for this task. Wu et al. (2020) utilize a transfer learning method that leverages latent opinion knowledge from auxiliary datasets to boost the performance of AOE.

Pair. Zhao et al. (2020) proposed the Pair task to extract aspect-opinion pairs from scratch. They develop a span-based multi-task framework, which first enumerates all candidate spans and then constructs two classifiers to identify the types of the spans (i.e., aspect or opinion terms) and the relationship between spans.

Triple. Peng et al. (2020) defined the triple extraction task for ABSA, which aims to extract all possible aspect terms as well as their corresponding opinion terms and sentiment polarities. The method proposed in (Peng et al. 2020) is a two-stage framework: the first stage contains two separate modules, a unified sequence tagging model for AE and SC and a graph convolutional neural network (GCN) for OE; in the second stage, all possible aspect-opinion pairs are enumerated, and a binary classifier is constructed to judge whether an aspect term and an opinion term match with each other. The main difference between our work and (Peng et al. 2020) is that we regard all subtasks as question-answering problems and propose a unified framework based on a single model.

Proposed Framework
Joint Training for Triple Extraction
In this section, we focus on the triple extraction task; the other subtasks can be regarded as special cases of it. Given a sentence x_j with max-length n as the input, let T_j = {(a, o, s)} be the set of annotated triples for x_j, where s ∈ {Positive, Neutral, Negative} and (a, o, s) refers to (aspect term, opinion term, sentiment polarity). For the training set D = {(x_j, T_j)}, we want to maximize the likelihood

L(D) = \prod_{j=1}^{|D|} \prod_{(a,o,s) \in T_j} P((a,o,s) \mid x_j).  (1)

Define

T_j|_a := {(o,s) : (a,o,s) \in T_j},  k_{j,a} := |T_j|_a|.  (2)

Consider the log-likelihood for x_j:

\ell(x_j) = \sum_{(a,o,s) \in T_j} \log P((a,o,s) \mid x_j)
          = \sum_{a \in T_j} \sum_{(o,s) \in T_j|_a} [\log P(a \mid x_j) + \log P((o,s) \mid a, x_j)]
          = \sum_{a \in T_j} \sum_{(o,s) \in T_j|_a} [\log P(a \mid x_j) + \log P(s \mid a, x_j) + \log P(o \mid a, x_j)].  (3)

The last equality holds because the opinion term o and the sentiment polarity s are conditionally independent given the sentence x_j and the aspect term a: (x_j, a) carries all the information needed to determine s, and o brings no additional information since it is implied by (x_j, a), hence P(s \mid x_j, a, o) = P(s \mid x_j, a). Collecting terms,

\ell(x_j) = \sum_{a \in T_j} k_{j,a} \cdot \log P(a \mid x_j) + \sum_{a \in T_j} \Big[ k_{j,a} \cdot \log P(s \mid a, x_j) + \sum_{o \in T_j|_a} \log P(o \mid a, x_j) \Big].  (4)

Summing the above equation over x_j \in D and normalizing both sides, we obtain a log-likelihood of the form

\ell(D) = \alpha \cdot \sum_{j=1}^{|D|} \sum_{a \in T_j} \log P(a \mid x_j) + \beta \cdot \sum_{j=1}^{|D|} \sum_{a \in T_j} \log P(s \mid a, x_j) + \gamma \cdot \sum_{j=1}^{|D|} \sum_{a \in T_j} \sum_{o \in T_j|_a} \log P(o \mid a, x_j),  (5)

where \alpha, \beta, \gamma \in [0, 1]; the first term is repeated k_{j,a} times in (4) in order to match the other two terms. From (5), we may conclude that the triple extraction task Triple can be converted to the joint training of AE, SC and AOE.

Dual-MRC Framework
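Before moving to the model, the factorization in (3) can be checked numerically. The toy Python snippet below uses made-up conditional probabilities (illustrative values, not model outputs) and confirms that the joint log-likelihood of a sentence equals the sum of the three decomposed log terms when o and s are conditionally independent given (x, a):

```python
import math

# Made-up conditional probabilities for one sentence x with two triples.
# P(a|x), P(s|a,x), P(o|a,x) are illustrative values only.
p_a = {"ambience": 0.9, "service": 0.8}
p_s = {"ambience": {"POS": 0.7}, "service": {"NEG": 0.6}}
p_o = {"ambience": {"nice": 0.85}, "service": {"not so great": 0.75}}

triples = [("ambience", "nice", "POS"), ("service", "not so great", "NEG")]

# Joint probability under conditional independence:
# P((a,o,s)|x) = P(a|x) * P(s|a,x) * P(o|a,x)
joint_ll = sum(
    math.log(p_a[a] * p_s[a][s] * p_o[a][o]) for a, o, s in triples
)

# Decomposed objective: sum of the three log terms, as in Eq. (3).
decomposed_ll = sum(
    math.log(p_a[a]) + math.log(p_s[a][s]) + math.log(p_o[a][o])
    for a, o, s in triples
)

assert abs(joint_ll - decomposed_ll) < 1e-12
```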
Now we propose our joint training dual-MRC framework. As illustrated in Figure 2, our model consists of two parts. Both parts use BERT (Devlin et al. 2019) as their backbone model to encode the context information; recall that BERT is a multi-layer bidirectional Transformer based language representation model. Let n denote the sentence length and d denote the hidden dimension. Suppose the last-layer outputs for all tokens are h^{l,s}, h^{r,s}, h^{l,e}, h^{r,e} \in R^{(n+2) \times d}, which are used for extraction, where l/r refer to the left/right part and s/e refer to the start/end token. Suppose the output of BERT at the [CLS] token is h^r_{cls} \in R^d, which is used for classification.

The goal of the left part is to extract all ATs from the given text, i.e., the task AE. As discussed previously, span based methods are proven to be effective for extraction tasks. Following the idea in (Hu et al. 2019), for the left part we obtain the logits and probabilities for the start/end positions:

g^{l,s} = W^{l,s} h^{l,s},  p^{l,s} = softmax(g^{l,s}),  (6)
g^{l,e} = W^{l,e} h^{l,e},  p^{l,e} = softmax(g^{l,e}),  (7)

where W^{l,s} \in R^{1 \times d} and W^{l,e} \in R^{1 \times d} are trainable weights and the softmax is taken over all tokens. Define the extraction loss of the left part as

J_{AE} = - \sum_i y^{l,s}_i \log(p^{l,s}_i) - \sum_i y^{l,e}_i \log(p^{l,e}_i),  (8)

where y^{l,s} and y^{l,e} are the ground-truth start and end positions for ATs.

The goal of the right part is to extract all OTs and find the sentiment polarity with respect to a given specific AT. Similarly, we obtain the logits and probabilities for the start/end positions:

g^{r,s} = W^{r,s} h^{r,s},  p^{r,s} = softmax(g^{r,s}),  (9)
g^{r,e} = W^{r,e} h^{r,e},  p^{r,e} = softmax(g^{r,e}),  (10)

where W^{r,s} \in R^{1 \times d} and W^{r,e} \in R^{1 \times d} are trainable weights and the softmax is applied over all tokens. Define the extraction loss of the right part as

J_{AOE} = - \sum_i y^{r,s}_i \log(p^{r,s}_i) - \sum_i y^{r,e}_i \log(p^{r,e}_i),  (11)

where y^{r,s}, y^{r,e} \in R^{n+2} are the true start and end positions for OTs given a specific AT.

In addition, for the right part, we also obtain the sentiment polarity

p^r_{cls} = softmax(W^r_{cls} h^r_{cls} + b^r_{cls}).  (12)

The cross entropy loss for the classification is

J_{SC} = CE(p^r_{cls}, y_{cls}),  (13)

where y_{cls} is the true label for the sentiment polarity. We then minimize the final joint training loss

J = \alpha \cdot J_{AE} + \beta \cdot J_{SC} + \gamma \cdot J_{AOE},  (14)

where \alpha, \beta, \gamma \in [0, 1] are hyper-parameters that control the contributions of the objectives.

[Figure 2: Proposed joint training dual-MRC framework.]

MRC Dataset Conversion
As illustrated in Figure 3, the original triple annotations have to be converted before being fed into the joint training dual-MRC model. Both MRCs use the input sentence as their context. The left MRC is constructed with the query

q = "Find the aspect terms in the text."  (15)

The answer to the left MRC is all ATs in the text. Given an AT, the right MRC is constructed with the query

q(AT) = "Find the sentiment polarity and opinion terms for AT in the text."  (16)

The output of the right MRC is all OTs and the sentiment polarity with respect to the given AT. Note that the number of right MRC instances equals the number of ATs; therefore, the left MRC instance is repeated that number of times.

[Figure 3: Dataset conversion.]

Inference Process
For Triple, we want to point out some differences between the training process and the inference process. During training, the ground truth of all ATs is known, so the right MRC can be constructed from these ATs; thus, the training process is end-to-end. During inference, however, the ATs are the output of the left MRC, so we run the two MRCs as a pipeline, as in Algorithm 1.

Algorithm 1: The inference process for triple extraction with the dual-MRC framework.
Input: sentence x. Output: triples T = {(a, o, s)}.
1. Initialize T = {}.
2. Feed x with the query q described in (15) into the left MRC, and output the AT candidates A.
3. If A = {}, return T.
4. For each a_i \in A: feed x with the query q(a_i) described in (16) into the right MRC, and output the sentiment polarity s and the OTs {o_j, j = 1, 2, ...}; then set T ← T ∪ {(a_i, o_j, s), j = 1, 2, ...}.
5. Return T.

The inference processes of the other tasks are similar. The task AE uses the span output from the left MRC. AOE and SC use the span and classification outputs from the right MRC. AESC and Pair use combinations of them; please refer to Table 1 for details.
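Algorithm 1 can be sketched in a few lines of Python. The functions `left_mrc` and `right_mrc` below are hypothetical stubs that stand in for the two trained BERT-MRC models and return canned predictions for the running example; the queries follow (15) and (16).

```python
# Pipeline inference for Triple (Algorithm 1). The two MRC models are
# replaced by hypothetical stubs returning canned predictions.
def left_mrc(x, query="Find the aspect terms in the text."):
    # Stub: a trained left MRC would extract aspect-term spans from x.
    return ["ambience", "service"]

def right_mrc(x, at):
    query = f"Find the sentiment polarity and opinion terms for {at} in the text."
    # Stub: a trained right MRC would predict (polarity, opinion terms).
    canned = {
        "ambience": ("POS", ["nice"]),
        "service": ("NEG", ["not so great"]),
    }
    return canned[at]

def extract_triples(x):
    triples = []                       # T = {}
    for a in left_mrc(x):              # AT candidates A from the left MRC
        s, opinions = right_mrc(x, a)  # one right-MRC pass per AT
        triples.extend((a, o, s) for o in opinions)
    return triples

t = extract_triples("The ambience was nice, but the service was not so great.")
assert t == [("ambience", "nice", "POS"), ("service", "not so great", "NEG")]
```

Note that the number of right-MRC passes grows with the number of predicted aspect terms, which is the pipeline cost at inference time.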
Experiments
Datasets
Original datasets are from the SemEval challenges (Pontiki et al. 2014, 2015, 2016), where ATs and the corresponding sentiment polarities are labeled. We evaluate our framework on three public datasets derived from them. The first dataset is from (Wang et al. 2017), where labels for opinion terms are additionally annotated; all datasets share a fixed training/test split. The second dataset is from (Fan et al. 2019), where (AT, OT) pairs are labeled. The third dataset is from (Peng et al. 2020), where (AT, OT, SP) triples are labeled; a small number of samples with overlapping ATs and OTs are corrected, and a portion of the training set is randomly selected as the validation set. The detailed statistics for the three sets of datasets above are shown in Table 2, Table 3 and Table 4.

[Table 2: Dataset statistics annotated by (Wang et al. 2017), covering 14res, 14lap and 15res.]

[Table 3: Dataset statistics annotated by (Fan et al. 2019), covering 14res, 14lap, 15res and 16res.]
Subtasks and Baselines
There exist three research lines in ABSA, each with different data annotations, ABSA subtasks, baselines and experimental settings. To fairly compare our proposed framework with previous baselines, we specify them clearly for each research line.

Using the dataset from (Wang et al. 2017), the following baselines were evaluated for AE, OE, SC and AESC:
• SPAN-BERT (Hu et al. 2019) is a pipeline method for AESC that takes BERT as the backbone network. A span boundary detection module is used for AE, followed by a polarity classifier based on span representations for SC.
• IMN-BERT (He et al. 2019) is an extension of IMN (He et al. 2019) with BERT as the backbone. IMN is a multi-task learning method involving joint training for AE and SC. A message-passing architecture is introduced in IMN to boost the performance of AESC.
• RACL-BERT (Chen and Qian 2020) is a stacked multi-layer network based on a BERT encoder and is the state-of-the-art method for AESC. A relation propagation mechanism is utilized in RACL to capture the interactions between the subtasks (i.e., AE, OE, SC).

Using the dataset from (Fan et al. 2019), the following baselines were evaluated for AOE:
• IOG (Fan et al. 2019) is the first model proposed to address AOE; it adopts six different BLSTMs to extract the corresponding opinion terms for aspects given in advance.
• LOTN (Wu et al. 2020) is the state-of-the-art method for AOE; it transfers latent opinion information from external sentiment classification datasets to improve performance.

Using the dataset from (Peng et al. 2020), the following baselines were evaluated for AESC, Pair and Triple:
• RINANTE (Dai and Song 2019) is a weakly supervised co-extraction method for AE and OE that makes use of the dependency relations of words in a sentence.
• CMLA (Wang et al. 2017) is a multilayer attention network for AE and OE, where each layer consists of a couple of attentions with tensor operators.
• Li-unified-R (Peng et al. 2020) is a modified variant of Li-unified (Li et al. 2019a), which is originally for AESC via a unified tagging scheme; Li-unified-R only adapts the original OE module for opinion term extraction.
• Peng-two-stage (Peng et al. 2020) is a two-stage framework with separate models for different subtasks in ABSA and is the state-of-the-art method for Triple.

[Table 4: Dataset statistics annotated by (Peng et al. 2020), covering 14res, 14lap, 15res and 16res.]

Model Settings
We use BERT-Base-Uncased or BERT-Large-Uncased as the backbone model for our proposed framework, depending on the baselines; please refer to (Devlin et al. 2019) for details of BERT. We use the Adam optimizer with warm-up over the first training steps and train for 3 epochs, with a fixed batch size and dropout probability. The results are not sensitive to the hyperparameters α, β, γ of the final joint training loss in Equation (14), so we fix them in our experiments. The results are very sensitive to the logit thresholds of the heuristic multi-span decoding algorithm (Hu et al. 2019), so these are manually tuned on each dataset; other hyperparameters are kept at their defaults. All experiments are conducted on a single Tesla V100 GPU.

Evaluation Metrics
For all tasks in our experiments, we use precision (P), recall (R), and F1 scores as the evaluation metrics, where a predicted term is counted as correct only if it exactly matches a gold term.

Main Results
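All scores below follow the exact-match protocol just described. A minimal pure-Python sketch of the metric (our own illustration, with made-up gold and predicted spans) is:

```python
# Exact-match precision/recall/F1 over extracted term spans.
# The gold/pred sets are made-up examples, not data from the paper.
def prf1(pred, gold):
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)                       # exact matches only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Spans represented as (term, start, end); one of two predictions matches.
gold = [("service", 8, 9), ("ambience", 1, 2)]
pred = [("service", 8, 9), ("staff", 3, 4)]

p, r, f1 = prf1(pred, gold)
assert (p, r) == (0.5, 0.5)
assert f1 == 0.5
```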
As mentioned previously, there are three research lines with different datasets, ABSA subtasks, baselines and experimental settings. For each research line, we keep the same dataset and experimental setting, compare our proposed dual-MRC framework with the baselines, and present the results in Table 5, Table 6 and Table 7.

First, we compare our proposed method for AE, SC and AESC on the dataset from (Wang et al. 2017). OE is not applicable to our proposed framework; if needed, we can train a separate model with the query "Find the opinion terms in the text." for OE. Since the pair-wise (AT, OT) relations are not annotated in this dataset, we use the right part of our model for classification only. A portion of the training set is randomly selected as the validation set. We adopt BERT-Large-Uncased (https://github.com/google-research/bert) as our backbone model since the baselines use it too, and we use F1 as the metric for aspect-level sentiment classification following (Chen and Qian 2020). The results, reported in Table 5, are the average scores of 5 runs with random initialization. All the baselines are BERT based, and our results achieve first or second place compared to them. Recall that our approach is inspired by SPAN-BERT, which is a strong baseline for extraction tasks; our results are close to SPAN-BERT on AE, but with the help of MRC we achieve much better results on SC and AESC.

[Table 5: Results for AE, SC and AESC on the datasets annotated by (Wang et al. 2017), comparing SPAN-BERT, IMN-BERT, RACL-BERT and our Dual-MRC. OE is not applicable to our proposed framework. All tasks are evaluated with F1. Baseline results are directly taken from (Chen and Qian 2020). Our model is based on BERT-Large-Uncased; results are the average of 5 runs with random initialization.]

[Table 6: Results (P/R/F1 on 14res, 14lap, 15res and 16res) for AOE on the datasets annotated by (Fan et al. 2019), comparing IOG, LOTN and our Dual-MRC; IOG achieves F1 scores of 80.23/70.99/71.91/81.60 on 14res/14lap/15res/16res. Baseline results are directly taken from (Wu et al. 2020). Our model is based on BERT-Base-Uncased.]

[Table 7: Results for AESC, Pair and Triple on the datasets annotated by (Peng et al. 2020), comparing RINANTE, CMLA, Li-unified-R, Peng-two-stage and our Dual-MRC. For Triple, the baseline F1 scores on 14res/14lap/15res/16res are 34.03/20.00/28.00/23.30 (RINANTE), 43.12/32.90/35.90/41.60 (CMLA), 51.68/42.47/46.69/44.51 (Li-unified-R) and 51.89/43.50/… (Peng-two-stage). Baseline results are directly taken from (Peng et al. 2020). Our model is based on BERT-Base-Uncased.]

Second, we compare our proposed method for
AOE on the dataset from (Fan et al. 2019), where the pair-wise (AT, OT) relations are annotated. This task can be viewed as a special case of our proposed full model. The results are shown in Table 6; BERT-Base-Uncased is used as our backbone model. Although the result on 16res is slightly lower than LOTN, most of our results significantly outperform the previous baselines, which indicates that our model has an advantage in matching ATs and OTs. In particular, our model performs much better than the baselines on 14lap, probably due to the domain difference between the laptop comments (14lap) and the restaurant comments (14res/15res/16res).

Third, we compare our proposed method for AESC, Pair and Triple on the dataset from (Peng et al. 2020). The full model of our proposed framework is implemented. The results are shown in Table 7; BERT-Base-Uncased is used as our backbone model. Our results significantly outperform the baselines, especially in the precision scores for extracting the pair-wise (AT, OT) relations. Note that Li-unified-R and Peng-two-stage both use the unified tagging scheme. For extraction tasks, span based methods outperform the unified tagging scheme, probably because determining the start/end positions is easier than determining the label of every token. More precisely, under the unified tagging scheme there are 7 possible labels for each token, say {B-POS, B-NEU, B-NEG, I-POS, I-NEU, I-NEG, O}, giving 7^n possible label sequences in total. For span based methods, each token only takes one of 4 possible combinations from {IS-START, NOT-START} × {IS-END, NOT-END}, giving 4^n (≪ 7^n) possibilities in total. Our proposed method combines MRC and span based extraction, and it brings large improvements on Pair and Triple.

[Table 8: Case study of the task Triple. Wrong predictions are marked with ✗. The three examples are exactly the same as the ones selected by (Peng et al. 2020).]

Example 1: "Rice is too dry, tuna was n't so fresh either."
- Ground truth: (Rice, too dry, NEG), (tuna, was n't so fresh, NEG)
- Our model: (Rice, too dry, NEG), (tuna, was n't so fresh, NEG)
- Peng-two-stage: (Rice, too dry, NEG), (tuna, was n't so fresh, NEG), (Rice, was n't so fresh, NEG) ✗, (tuna, too dry, NEG) ✗
- Li-unified-R: (Rice, dry, POS) ✗, (Rice, n't, POS) ✗, (tuna, dry, POS) ✗, (tuna, fresh, POS) ✗
- CMLA: (Rice, dry, POS) ✗, (tuna, dry, POS) ✗

Example 2: "I am pleased with the fast log on, speedy WiFi connection and the long battery life."
- Ground truth: (log on, pleased, POS), (log on, fast, POS), (WiFi connection, speedy, POS), (battery life, long, POS)
- Our model: (log on, pleased, POS), (log on, fast, POS), (WiFi connection, speedy, POS), (WiFi connection, pleased, POS) ✗, (battery life, long, POS)
- Peng-two-stage: (log, pleased, POS) ✗, (log, fast, POS) ✗, (WiFi connection, speedy, POS), (battery life, long, POS)
- Li-unified-R: (WiFi connection, speedy, POS), (battery life, long, POS)
- CMLA: (WiFi connection, speedy, POS), (WiFi connection, long, POS) ✗, (battery life, fast, POS), (battery life, long, POS)

Example 3: "The service was exceptional - sometime there was a feeling that we were served by the army of friendly waiters."
- Ground truth: (service, exceptional, POS), (waiters, friendly, POS)
- Our model: (service, exceptional, POS), (waiters, friendly, POS)
- Peng-two-stage: (service, exceptional, POS), (waiters, friendly, POS)
- Li-unified-R: (service, exceptional, POS), (waiters, friendly, POS), (service, feeling, POS) ✗
- CMLA: (service, exceptional, POS), (waiters, friendly, POS)

Analysis on Joint Learning
We give some analysis on the effectiveness of joint learning. The experimental results on the dataset from (Peng et al. 2020) are shown in Table 9. Overall, adding one or two learning objectives changes the F1 scores only slightly; however, joint learning is more efficient, since it handles more tasks with one single model.

[Table 9: Results of the analysis of joint learning for AESC and Pair on the dataset from (Peng et al. 2020), comparing variants that enable different combinations of the left-MRC extraction, right-MRC classification and right-MRC extraction objectives on 14res/14lap/15res/16res.]

For the task AESC, we compare the results with and without the span based extraction output from the right part of our model. By jointly learning to extract the opinion terms for a given aspect, the result of aspect-level sentiment classification improves slightly. This makes sense, because the extracted OTs are useful for identifying the sentiment polarity of the given AT.

For the task Pair, we compare the results with and without the classification output from the right part of our model. The F1 scores for OT extraction decrease slightly when the sentiment classification objective is added. The reason might be that a sentiment polarity can point to multiple OTs in a sentence, some of which are not paired with the given AT.
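These ablations amount to zeroing individual weights in the joint loss of Equation (14). A toy sketch (the loss values below are made-up numbers, not measured losses):

```python
# Joint loss of Eq. (14); the ablations in Table 9 correspond to
# zeroing individual weights. Loss values here are made-up numbers.
def joint_loss(j_ae, j_sc, j_aoe, alpha=1.0, beta=1.0, gamma=1.0):
    return alpha * j_ae + beta * j_sc + gamma * j_aoe

full = joint_loss(0.8, 0.3, 0.5)
no_cls = joint_loss(0.8, 0.3, 0.5, beta=0.0)  # drop the SC objective

assert abs(full - 1.6) < 1e-12
assert abs(no_cls - 1.3) < 1e-12
```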
Case Study
To validate the effectiveness of our model, we compare our method on exactly the same three examples as the baseline (Peng et al. 2020), since its source code is not public. The results are shown in Table 8. The first example shows that our MRC based approach performs better in matching ATs and OTs: Peng's approach matches "tuna" and "too dry" by mistake, while our approach converts the matching problem into an MRC problem. The second example shows that the span based extraction method is good at detecting the boundaries of entities: our approach successfully detects "log on", while Peng's approach detects "log" by mistake. Moreover, the sentiment classification results indicate that our MRC based approach is also good at SC.

We plot in Figure 4 the attention matrices between the input text and the query from our fine-tuned model. As we can see, "opinion term" has high attention scores with "fresh", and "sentiment" has high attention scores with "food/fresh/hot". The queries can thus capture important information for the task via self-attention.

[Figure 4: An example of attention matrices for the input text and query.]

Conclusions
In this paper, we propose a joint training dual-MRC framework that handles all subtasks of aspect based sentiment analysis (ABSA) in one shot, where the left MRC is for aspect term extraction and the right MRC is for aspect-oriented opinion term extraction and sentiment classification. The original dataset is converted and fed into the dual-MRC for joint training. Experiments are conducted on three research lines and compared with different ABSA subtasks and baselines. Experimental results indicate that our proposed framework outperforms all compared baselines.

References
Chen, P.; Sun, Z.; Bing, L.; and Yang, W. 2017. Recurrent Attention Network on Memory for Aspect Sentiment Analysis. In EMNLP, 452–461.
Chen, Z.; and Qian, T. 2020. Relation-Aware Collaborative Learning for Unified Aspect-Based Sentiment Analysis. In ACL, 3685–3694.
Dai, H.; and Song, Y. 2019. Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision. In ACL, 5268–5277.
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT, 4171–4186.
Du, C.; Sun, H.; Wang, J.; Qi, Q.; Liao, J.; Xu, T.; and Liu, M. 2019. Capsule Network with Interactive Attention for Aspect-Level Sentiment Classification. In EMNLP, 5488–5497.
Fan, C.; Gao, Q.; Du, J.; Gui, L.; Xu, R.; and Wong, K. 2018. Convolution-based Memory Network for Aspect-based Sentiment Analysis. In SIGIR, 1161–1164.
Fan, Z.; Wu, Z.; Dai, X.; Huang, S.; and Chen, J. 2019. Target-oriented Opinion Words Extraction with Target-fused Neural Sequence Labeling. In NAACL-HLT, 2509–2518.
Gu, S.; Zhang, L.; Hou, Y.; and Song, Y. 2018. A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis. In COLING, 774–784.
He, R.; Lee, W. S.; Ng, H. T.; and Dahlmeier, D. 2017. An Unsupervised Neural Attention Model for Aspect Extraction. In ACL, 388–397.
He, R.; Lee, W. S.; Ng, H. T.; and Dahlmeier, D. 2019. An Interactive Multi-Task Learning Network for End-to-End Aspect-Based Sentiment Analysis. In ACL, 504–515.
Hu, M.; and Liu, B. 2004. Mining and Summarizing Customer Reviews. In KDD, 168–177.
Hu, M.; Peng, Y.; Huang, Z.; Li, D.; and Lv, Y. 2019. Open-domain Targeted Sentiment Analysis via Span-based Extraction and Classification. In ACL, 537–546.
Huang, B.; and Carley, K. M. 2018. Parameterized Convolutional Neural Networks for Aspect Level Sentiment Classification. In EMNLP, 1091–1096.
Lee, K.; Kwiatkowski, T.; Parikh, A. P.; and Das, D. 2016. Learning Recurrent Span Representations for Extractive Question Answering. CoRR abs/1611.01436.
Li, K.; Chen, C.; Quan, X.; Ling, Q.; and Song, Y. 2020. Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation. In ACL, 7056–7066.
Li, X.; Bing, L.; Lam, W.; and Shi, B. 2018a. Transformation Networks for Target-Oriented Sentiment Classification. In ACL, 946–956.
Li, X.; Bing, L.; Li, P.; and Lam, W. 2019a. A Unified Model for Opinion Target Extraction and Target Sentiment Prediction. In AAAI, 6714–6721.
Li, X.; Bing, L.; Li, P.; Lam, W.; and Yang, Z. 2018b. Aspect Term Extraction with History Attention and Selective Transformation. In IJCAI, 4194–4200.
Li, X.; Bing, L.; Zhang, W.; and Lam, W. 2019b. Exploiting BERT for End-to-End Aspect-based Sentiment Analysis. In W-NUT@EMNLP, 34–41.
Ma, D.; Li, S.; and Wang, H. 2018. Joint Learning for Targeted Sentiment Analysis. In EMNLP, 4737–4742.
Ma, D.; Li, S.; Wu, F.; Xie, X.; and Wang, H. 2019. Exploring Sequence-to-Sequence Learning in Aspect Term Extraction. In ACL, 3538–3547.
Ma, D.; Li, S.; Zhang, X.; and Wang, H. 2017. Interactive Attention Networks for Aspect-Level Sentiment Classification. In IJCAI, 4068–4074.
Peng, H.; Xu, L.; Bing, L.; Huang, F.; Lu, W.; and Si, L. 2020. Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis. In AAAI, 8600–8607.
Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; AL-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; Hoste, V.; Apidianaki, M.; Tannier, X.; Loukachevitch, N.; Kotelnikov, E.; Bel, N.; Jiménez-Zafra, S. M.; and Eryiğit, G. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In SemEval@NAACL-HLT, 19–30.
Pontiki, M.; Galanis, D.; Papageorgiou, H.; Manandhar, S.; and Androutsopoulos, I. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In SemEval@NAACL-HLT, 486–495.
Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; and Manandhar, S. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In SemEval@COLING, 27–35.
Rosenfeld, A.; and Thurston, M. 1971. Edge and Curve Detection for Visual Scene Analysis. IEEE Transactions on Computers.
Ruder, S.; Ghaffari, P.; and Breslin, J. G. 2016. A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis. In EMNLP, 999–1005.
Sun, C.; Huang, L.; and Qiu, X. 2019. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. In NAACL-HLT, 380–385.
Tang, D.; Qin, B.; Feng, X.; and Liu, T. 2016. Effective LSTMs for Target-dependent Sentiment Classification. In COLING, 3298–3307.
Tang, D.; Qin, B.; and Liu, T. 2016. Aspect Level Sentiment Classification with Deep Memory Network. In EMNLP, 214–224.
Wang, W.; and Pan, S. J. 2018. Recursive Neural Structural Correspondence Network for Cross-domain Aspect and Opinion Co-Extraction. In ACL, 2171–2181.
Wang, W.; Pan, S. J.; Dahlmeier, D.; and Xiao, X. 2017. Coupled Multi-layer Attentions for Co-extraction of Aspect and Opinion Terms. In AAAI, 3316–3322.
Wang, Y.; Huang, M.; Zhu, X.; and Zhao, L. 2016. Attention-based LSTM for Aspect-level Sentiment Classification. In EMNLP, 606–615.
Wu, Z.; Zhao, F.; Dai, X.-Y.; Huang, S.; and Chen, J. 2020. Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction. In AAAI, 9298–9305.
Xu, H.; Liu, B.; Shu, L.; and Yu, P. S. 2018. Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction. In ACL, 592–598.
Yang, M.; Tu, W.; Wang, J.; Xu, F.; and Chen, X. 2017. Attention Based LSTM for Target Dependent Sentiment Classification. In AAAI, 5013–5014.
Yu, J.; Jiang, J.; and Xia, R. 2019. Global Inference for Aspect and Opinion Terms Co-Extraction Based on Multi-Task Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process.
Zhang, M.; Zhang, Y.; and Vo, D.-T. 2016. Gated Neural Networks for Targeted Sentiment Analysis. In AAAI, 3087–3093.
Zhao, H.; Huang, L.; Zhang, R.; Lu, Q.; and Xue, H. 2020. SpanMlt: A Span-based Multi-Task Learning Framework for Pair-wise Aspect and Opinion Terms Extraction. In ACL, 3239–3248.
Zhou, Y.; Huang, L.; Guo, T.; Han, J.; and Hu, S. 2019. A Span-based Joint Model for Opinion Target Extraction and Target Sentiment Classification. In IJCAI.