ECSP: A New Task for Emotion-Cause Span-Pair Extraction and Classification
Hongliang Bi, Pengyuan Liu
Beijing Language and Culture University, China
Abstract
Emotion cause analysis tasks such as emotion cause extraction (ECE) and emotion-cause pair extraction (ECPE) have gradually attracted the attention of many researchers. However, there are still two shortcomings in the existing research: 1) in most cases, the emotion expression and the cause are not whole clauses but spans within clauses, so extracting clause pairs rather than span pairs greatly limits its applications in real-world scenarios; 2) it is not enough to extract the emotion expression clause without identifying the emotion category, since the presence of an emotion clause does not necessarily convey emotional information explicitly, due to different possible factors. In this paper, we propose a new task, Emotion-Cause Span-Pair extraction and classification (ECSP), which aims to extract the potential span pairs of emotions and corresponding causes in a document and to classify the emotion of each pair. Under the new ECSP task, ECE and ECPE can be regarded as two special cases at the clause level. We propose a span-based extract-then-classify (ETC) model, in which emotions and causes are directly extracted and paired from the document under the supervision of target span boundaries, and the corresponding categories are then classified using their pair representations and localized context. Experiments show that our proposed ETC model outperforms the state-of-the-art models on the ECE and ECPE tasks, respectively, and achieves reasonable results on the new ECSP task.
Introduction

Emotion cause analysis tasks such as emotion cause extraction (ECE) and emotion-cause pair extraction (ECPE) have gradually attracted the attention of many researchers. Such analysis can be constructive in guiding the direction of future work, e.g., improving the quality of products or services according to the emotion causes of comments provided by users.

Emotion cause extraction (ECE) was first proposed by Lee et al. (2010); it aims at discovering the potential cause clauses behind a certain emotion expression in the text. Earlier work viewed ECE as a trigger word detection problem and tried to solve it with corresponding tagging techniques. Therefore, primary efforts were made on discovering refined linguistic features (Chen et al., 2010; Lee et al., 2013), yielding improved performance. More recently, instead of concentrating on word-level cause detection, clause-level extraction (Gui et al., 2016) was put forward, on the grounds that the impact of individual words in a clause can span over the whole sequence in the clause. While ECE has attracted increasing attention due to its theoretical and practical significance, it requires that emotion expression annotations be given in the test set. In light of recent advances in multi-task learning, Chen et al. (2018) investigated joint extraction of emotion categories and causes to exploit the mutual information between the two correlated tasks, and Xia and Ding (2019) proposed the emotion-cause pair extraction (ECPE) task, which aims to extract all potential clause pairs of emotion expressions and corresponding causes in a document, thus removing the requirement of the previous ECE task that emotions be annotated before cause extraction. Xia and Ding (2019) argue that, while co-extraction of emotion expressions and causes is important, ECPE is a more challenging problem that is worth putting more emphasis on.
Figure 1: An intuitive example of the difference between the ECE, ECPE and new ECSP tasks. The example (the 38th document in the benchmark corpus) contains the clauses "Wang was diagnosed with chronic renal failure last April", "This test result broke the originally happy family of three", "Xu said, 'It feels like the sky is falling right on top of me'", "Xu described how she felt when she learned that her husband was ill", and "Because Wang is the support of her and her two-year-old child". ECE extracts the cause clause given the annotated emotion clause, ECPE extracts the (emotion clause, cause clause) pair, and ECSP extracts the span pair ("It feels like the sky is falling right on top of me", "Wang is the support of her and her two-year-old child") together with the emotion category "fear".

However, ECPE still suffers from two shortcomings: 1) in most cases, the emotion expression and the cause are not whole clauses but spans within clauses, so extracting clause pairs rather than span pairs greatly limits its applications in real-world scenarios; 2) it is not enough to extract the emotion expression clause without identifying the emotion category, since the presence of an emotion clause does not necessarily convey emotional information, due to different possible causes such as negative polarity, sense ambiguity or rhetoric. For example, "It feels like the sky is falling right on top of me" is an emotion expression of "fear".

In this paper, we propose a new task: Emotion-Cause Span-Pair extraction and classification (ECSP), which aims to extract the potential span pairs of emotions and corresponding causes in a document and to classify the emotion of each pair. Therefore, ECE and ECPE can be regarded as two special cases of ECSP at the clause level. Figure 1 is an intuitive example of the difference between the ECE, ECPE and new ECSP tasks.

Inspired by recent span-based models in syntactic parsing and co-reference resolution (Lee et al., 2017; Stern et al., 2017), we propose a span-based model to solve this new ECSP task. The key insight is to annotate each emotion and cause with its span boundary followed by its emotion category. Under such annotation, we introduce a span-based extract-then-classify (ETC) model in which emotions and causes are directly extracted and paired from the document under the supervision of target span boundaries, and the corresponding categories are then classified using their pair representations and localized context. The advantage of this method is that clause-based tasks and span-based tasks can be interpreted uniformly. Moreover, since the polarity is decided using the targeted span representation, the model is able to take all target words into account before making predictions, thus naturally avoiding sentiment inconsistency.

We take BERT (Devlin et al., 2019) as the default backbone network and explore the following two aspects. First, we explore the feasibility of the ECSP task under different span length search schemes; the results show that the ECSP task can be solved well as the search length of the model increases, and that there is still some room for improvement. Second, following previous work (Gui et al., 2016; Xia and Ding, 2019), we compare our proposed ETC model with strong baselines under the clause-based search scheme. Our proposed ETC model outperforms the state-of-the-art models on the ECE and ECPE tasks, respectively, and achieves reasonable results on the ECSP task.
This proves the feasibility of the ECSP task and the effectiveness of our proposed ETC model.

Figure 2: Overall illustration of our proposed ETC model.
Instead of the traditional clause-based detection methods used to identify emotions and causes, we adopt a span-based search scheme as follows: given an input document D = (x_1, ..., x_n) of length n, an emotion-cause span-pair list P = {p_1, ..., p_m} is annotated, where m is the number of emotion-cause span pairs and each emotion expression span e and corresponding cause span c in a pair p_i is annotated with its START position, its END position, and its emotion category. Span i is defined by all the tokens from START(i) to END(i) inclusive, for 1 ≤ i ≤ N.

Our goal is to find all potential span pairs of emotions and corresponding causes in a document and to classify the emotion of each pair. The overall illustration of the proposed ETC model is shown in Figure 2. The basis of our proposed ETC model is the BERT encoder (Devlin et al., 2019): we map word embeddings into contextualized token representations using pre-trained Transformer blocks (Vaswani et al., 2017). A span classifier is first used to propose multiple candidate targets from the sentence. Then, an emotion classifier is designed to predict the emotion label for each extracted candidate span pair using its summarized span representation and localized context. We further study the performance of different span search schemes.
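For concreteness, the span search under this scheme can be sketched as follows; restricting candidates to a maximum length (as explored later in the experiments) keeps the number of candidates roughly linear in the document length. This is a minimal sketch in Python, and the function and variable names are illustrative rather than taken from any particular implementation:

```python
from typing import List, Tuple

def enumerate_spans(num_tokens: int, max_length: int) -> List[Tuple[int, int]]:
    """Enumerate all candidate spans (START, END), inclusive, of length <= max_length.

    The number of candidates is roughly num_tokens * max_length, i.e. linear in the
    document length, instead of the full quadratic num_tokens * (num_tokens + 1) / 2.
    """
    spans = []
    for start in range(num_tokens):
        for end in range(start, min(start + max_length, num_tokens)):
            spans.append((start, end))
    return spans

# Example: a 100-token document with maximum span length 20
# yields far fewer candidates than the unrestricted search.
print(len(enumerate_spans(100, 20)))   # 1810 candidate spans
print(100 * 101 // 2)                  # 5050 spans in the unrestricted search
```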
As mentioned before, we first obtain token features with BERT, which provides abundant language knowledge, position information, and contextual information. Given a document D = {x_t}, BERT begins by converting the sequence of L tokens into a sequence of vectors X = {x_t}, x_t ∈ R^d. Each of these vectors is the sum of a token embedding, a positional embedding that represents the position of the token in the sequence, and a segment embedding that represents whether the token is in the source text or the auxiliary text. We only have source text, so the segment embeddings are the same for all tokens. Then several Transformer (Vaswani et al., 2017) layers are applied to get the final representations:

X^{i+1} = Transformer(X^i), i ∈ [0, D − 1]    (1)

We use the final hidden output of BERT, X^D ∈ R^{L×d}, as the representations of the corresponding tokens.

The attention mechanism (Bahdanau et al., 2015) can quickly extract important features from sparse data, so it is widely used in natural language processing. However, the BERT encoder already uses attention extensively, so, in order to save resources, we do not apply it again. Instead, we use the following two convenient functions to create task-specific span features: (1) the sum of all vectors over the entire span, which can usually represent its semantics; (2) max pooling, a sample-based discretization process whose objective is to down-sample an input representation (image, text, hidden-layer output matrix, etc.). For each span i, its span representation g_i is defined as:

g_i = concat(x^D_{cls}, sum(i), max(i), Φ_i)    (2)

where x^D_{cls} represents the final hidden output of BERT for the global context, which is usually represented by the vector of the first token in BERT, and Φ_i encodes the length of span i in number of tokens. Each component of g_i is a span-specific feature that would be difficult to define and use in token-level models.

After obtaining span representations, we predict the type of each span. This prediction is done identically and in parallel for each span. For each span we compute a vector of type scores and apply the softmax function to this vector to obtain a distribution. For span i,

y^{span}_i = softmax(g_i w_i + b_i)    (3)

where w_i and b_i are learnable parameters. The predicted type for each span i is the type with the highest span type score. Only spans whose predicted type is not none are selected. Finally, we obtain a set of emotion expression spans E = {..., e_i, ...} and a set of cause spans C = {..., c_i, ...}.

Our goal is then to pair the two sets and construct a set of emotion-cause span pairs with an emotion relationship. Firstly, we apply a Cartesian product to E and C and obtain the set of all possible span pairs:

P_all = {..., (e_i, c_j), ...}    (4)
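A minimal sketch of the span representation (Eq. 2), the span type classifier (Eq. 3) and the subsequent pairing (Eq. 4) is given below in PyTorch; the module names, the use of a single shared scoring layer, and the type indices are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn as nn

NONE, EMOTION, CAUSE = 0, 1, 2   # assumed span type indices

class SpanClassifier(nn.Module):
    """Span representation g_i (Eq. 2) and span type distribution (Eq. 3)."""

    def __init__(self, d: int = 768, width_dim: int = 25, max_width: int = 20):
        super().__init__()
        self.width_embedding = nn.Embedding(max_width, width_dim)   # Phi_i
        # g_i = [x_cls; sum(i); max(i); Phi_i]; a single shared scorer is assumed here.
        self.scorer = nn.Linear(3 * d + width_dim, 3)

    def forward(self, hidden: torch.Tensor, spans):
        # hidden: final BERT output of shape (seq_len, d); spans: (start, end) pairs
        # with length <= max_width, END inclusive.
        x_cls = hidden[0]                                   # [CLS] as global context
        reps = []
        for start, end in spans:
            tokens = hidden[start:end + 1]                  # tokens of span i
            phi = self.width_embedding(torch.tensor(end - start))
            reps.append(torch.cat([x_cls, tokens.sum(0), tokens.max(0).values, phi]))
        g = torch.stack(reps)
        return g, torch.softmax(self.scorer(g), dim=-1)     # Eq. 3

# Keeping only spans whose predicted type is not `none`, then pairing (Eq. 4):
#   types = scores.argmax(-1)
#   E = [s for s, t in zip(spans, types) if t == EMOTION]
#   C = [s for s, t in zip(spans, types) if t == CAUSE]
#   P_all = [(e, c) for e in E for c in C]
```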
Despite advances in detecting long-distance relations using BERT or the attention mechanism, the noise induced by increasing context remains a challenge. By using a localized context (LC), i.e., the context between the span candidates, the emotion classifier can focus on the section of the sentence that is often most discriminative for the emotion type:

p_{(e_i, c_j)} = concat(g_{e_i}, g_{c_j}, LC_{(e_i, c_j)}),
LC_{(e_i, c_j)} = concat(sum(i → j), max(i → j), Ψ_{i→j})    (5)

where g_{e_i} and g_{c_j} are the representations of the emotion expression span and the corresponding cause span respectively, i → j denotes the localized context between spans i and j, and Ψ_{i→j} represents the distance (dist) between span i and span j. For each emotion-cause span pair (e_i, c_j), we thus obtain a representation by concatenating the respective span embeddings and the localized context features. Finally, we train a softmax classifier to identify the emotion category:

y^{pair}_{(e_i, c_j)} = softmax(p_{(e_i, c_j)} w_{(e_i, c_j)} + b_{(e_i, c_j)})    (6)

where w_{(e_i, c_j)} and b_{(e_i, c_j)} are learnable parameters.

Two learning signals are provided to train this model: the span type information for each span (emotion expression, cause, and none) and the emotion category information for each selected (ordered) span pair. Both are provided via a cross-entropy (Shore and Johnson, 1980) loss on Eq. 3 and Eq. 6 respectively.
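The pair representation with localized context (Eq. 5) and the emotion classifier (Eq. 6) can be sketched analogously; the distance definition, the dimension sizes and the single shared scoring layer are again illustrative assumptions:

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Pair representation p_(e_i, c_j) (Eq. 5) and emotion distribution (Eq. 6)."""

    def __init__(self, span_dim: int, num_emotions: int, d: int = 768,
                 dist_dim: int = 50, max_dist: int = 512):
        # num_emotions: one label per emotion category in the corpus.
        super().__init__()
        self.dist_embedding = nn.Embedding(max_dist, dist_dim)          # Psi
        # p = [g_e; g_c; sum(LC); max(LC); Psi]
        self.scorer = nn.Linear(2 * span_dim + 2 * d + dist_dim, num_emotions)

    def forward(self, hidden, g_e, g_c, emo_span, cause_span):
        left, right = sorted([emo_span, cause_span])
        ctx = hidden[left[1] + 1:right[0]]           # tokens strictly between the spans
        if ctx.shape[0] == 0:                        # adjacent or overlapping spans
            ctx = torch.zeros(1, hidden.shape[-1])
        gap = min(max(right[0] - left[1] - 1, 0), self.dist_embedding.num_embeddings - 1)
        psi = self.dist_embedding(torch.tensor(gap))
        lc = torch.cat([ctx.sum(0), ctx.max(0).values, psi])            # localized context
        p = torch.cat([g_e, g_c, lc])                                   # Eq. 5
        return torch.softmax(self.scorer(p), dim=-1)                    # Eq. 6

# Training combines cross-entropy losses over the span types (Eq. 3)
# and over the emotion categories of the selected pairs (Eq. 6).
```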
Item      Instances   Clauses   Causes   Cause 1   Cause 2   Cause 3
Number    2105        11799     2167     2046      56        3

Table 1: Details of the corpus: the number of instances (documents), clauses and annotated causes, and the number of documents containing one, two or three causes, together with the number of annotations by span length.
We evaluate on the benchmark ECE corpus (Gui et al., 2016), which is the most widely used corpus for emotion cause extraction. The corpus includes annotations of emotion expressions and the corresponding emotion causes. We use the boundaries of the annotations as the start and end of the spans. Note that the presence of an emotion expression does not necessarily convey emotional information, due to different possible causes such as negative polarity, sense ambiguity or rhetoric. Nor does the presence of an emotion expression necessarily guarantee the existence of an emotion cause. Therefore, for each emotion expression, we also use the emotion labels provided by the corpus. The emotion expressions and causes have different lengths, and their numbers are shown in Table 1.

The precision (P), recall (R), and F1 score are used as the evaluation metrics. These metrics for emotion cause extraction are defined by:
Precision = Σ correct_items / Σ proposed_items,   Recall = Σ correct_items / Σ annotated_items,   F1 = 2 · Precision · Recall / (Precision + Recall)

where proposed_items denotes the number of items that are predicted, annotated_items denotes the number of items in the corpus, and correct_items denotes the number of items that are correctly predicted. Unlike previous clause-level research, in the new ECSP task an item is considered correct only if both the start and the end of the item are correctly predicted.
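The strict boundary-matching criterion can be made explicit with a short sketch, assuming predicted and annotated items are given as (start, end) or (start, end, emotion) tuples:

```python
def span_prf(predicted, annotated):
    """Precision, recall and F1 where an item counts as correct only if it exactly
    matches an annotated item (both START and END, plus the emotion category when
    it is part of the tuples)."""
    correct = len(set(predicted) & set(annotated))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# e.g. one correctly bounded cause span out of two proposals:
print(span_prf([(3, 7), (10, 12)], [(3, 7), (15, 18)]))  # (0.5, 0.5, 0.5)
```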
We use the BERT-Chinese model (available at https://github.com/huggingface/transformers) as the default backbone network, which uses 12 layers, 768-dimensional embeddings and 12 heads per layer, resulting in a total of 110M parameters. Each span gets a span length feature Φ, a learned 25-dimensional vector representing the number of tokens in that span, and each pair also gets a localized context length feature Ψ whose dimension is twice that of Φ. We randomly divide the data with a proportion of 9:1, with 9 folds as training data and the remaining 1 fold as testing data. The following results are reported as an average over 10-fold cross-validation. We use the Adam optimizer with a linear warmup and linear decay learning rate schedule and a peak learning rate of 5e-5. Dropout with rate 0.1 is applied to all hidden layers of BERT and the classifiers. The mini-batch size is 1, and early stopping after 20 evaluations on the dev set is used.
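A minimal sketch of this optimization setup, using the transformers library referenced above, is shown below; the warmup proportion, the number of training steps and the training loop itself are illustrative assumptions rather than the exact configuration used in our experiments:

```python
import torch
from transformers import BertModel, BertTokenizer, get_linear_schedule_with_warmup

# Backbone: 12-layer, 768-dimensional Chinese BERT (~110M parameters).
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

params = list(encoder.parameters())            # plus the classifier heads sketched above
optimizer = torch.optim.Adam(params, lr=5e-5)  # peak learning rate 5e-5

num_training_steps = 10000                     # hypothetical; depends on the data size
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),   # assumed warmup proportion
    num_training_steps=num_training_steps)            # linear warmup and linear decay

encoder.train()
# for step, document in enumerate(train_documents):   # mini-batch size 1
#     loss = span_loss + pair_loss                    # cross-entropy on Eq. 3 and Eq. 6
#     loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```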
Table 2: Experimental results (P, R and F1 on the EESE, ECSE, ECSPE and ECSP sub-tasks) of the proposed ETC model and its variants, where ETC-n means that the maximum span length of the model is n.

Table 3: Experimental results (P, R and F1 on the EESE, ECSE, ECSPE and ECSP sub-tasks) with and without the localized context.
Table 2 shows the performance of our proposed ETC model with different span lengths on four sub-tasks: emotion expression span extraction (EESE), emotion cause span extraction (ECSE), emotion-cause span-pair extraction (ECSPE), and emotion-cause span-pair extraction and classification (ECSP). Given a document with T tokens, there are N = T(T+1)/2 possible spans, and this huge search space makes the task extremely challenging. In this experiment, we therefore created a length-restricted span (rather than single-token) representation that achieves a dual goal: it improves memory efficiency while still capturing the majority (more than 98%, see Table 1) of the annotated spans.

Compared with ETC-5 and ETC-15, ETC-20 obtains great improvements on the ECSP task as well as on the two sub-tasks. Specifically, we find that the improvements mainly come from the recall on the ECSE task, which finally leads to the great improvement in the recall of ECSP. The performance of the model does not decrease sharply as the length of the annotations increases, and our chosen span search scheme is far more memory-efficient than a naive search over all possible spans in the input document, while still considering more than 98% of all annotations. Our scheme is linear in the document length rather than quadratic, because we limit our proposed ETC model to spans that lie wholly within a document and have a maximum length of L = 20 tokens.

In addition, the model achieves an excellent F1 score of 88.71 on EESE, but the F1 score on ECSP is 3.14% lower than on ECSPE, which indicates that it is not enough to extract the emotion without identifying the emotion category: the presence of an emotion clause does not necessarily convey emotional information explicitly, and emotions need to be classified.

As shown in Table 3, the localized context can slightly improve the performance of the model. The localized context takes advantage of all the information between the two spans, so it enriches the source information when the model predicts the emotion labels, which leads to improved performance.
By relaxing the ECSP task to the clause level, we further examine our model by comparing it with the state of the art on the traditional ECE and ECPE tasks.
We employ a hierarchical Bi-LSTM network
Indep, proposed by Xia and Ding (2019), as a baseline on the ECPE task. The lower layer consists of a set of word-level Bi-LSTM modules, each of which corresponds to one clause and accumulates the context information for each word of the clause. An attention mechanism is then adopted to obtain a clause representation. The upper layer consists of two components: one for emotion expression extraction and another for cause extraction. Each component is a clause-level Bi-LSTM which receives the independent clause representations and finally feeds them to a softmax layer for emotion prediction and cause prediction. It has two interactive variants:
Inter-CE, where the predictions of cause extraction are used to improve emotion extraction, and
Inter-EC, where the predictions of emotion extraction are used to enhance cause extraction.

In addition to the baselines mentioned above, we also consider several state-of-the-art methods and models on the ECE task, which require annotations of emotion expressions to be provided in the test set in advance, to evaluate the results of our proposed ETC model: RB is a rule-based method (Lee et al., 2010); CB is a commonsense-based method (Russo et al., 2011); ConvMS-Memnet considers emotion cause analysis as a reading comprehension task and designs a multiple-slot deep memory network to model context information (Gui et al., 2017).

Model        Precision   Recall   F1
Indep        83.75       80.71    82.10
Inter-CE     84.94       81.22    83.00
Inter-EC     83.64       81.07    82.30
ETC-Clause   94.04       95.13
(a) Performance on the EEE task.

Model        Precision   Recall   F1
Indep        68.32       50.82    58.18
Inter-CE     69.02       51.35    59.01
Inter-EC     67.21       57.05    61.28
ETC-Clause   88.17       84.07
(b) Performance on the ECPE task.

Model           Precision   Recall   F1
RB              67.47       42.87    52.43
CB              26.72       71.30    38.87
ConvMS-Memnet   70.76       68.38    69.55
CANN            77.21       68.91    72.66
HCS             73.88       71.54    72.69
MANN            78.43       75.87    77.06
CANN-E          48.26       31.60    37.97
Indep           69.02       56.73    62.05
Inter-CE        68.09       56.34    61.51
Inter-EC        70.41       60.83    65.07
ETC-Clause      91.29       87.98
(c) Performance on the ECE task.

Table 4: Comparison between our proposed ETC model and the baselines at the clause level.
CANN uses a co-attention neural network to identify emotion causes (Li et al., 2018), and
CANN-E eliminates the dependence of CANN on emotion annotation in the test data.
HCS is proposed by Yu et al. (2019), using a multiple-level hierarchical network to detect the emotion causes.
MANN is the current state-of-the-art method employing a multi-attention-based model for emotion cause extraction (Li et al., 2019).
Past clause-level models regarded the ECE task as a set of independent clause classification problems. Observing Table 4 (c), we find that the proportions of emotion cause clauses and non-emotion-cause clauses are 18.36% and 81.64%, respectively. This is a serious class-imbalance classification problem, and such models tend to predict a clause as non-emotion-cause more often. This is also the reason why their recall scores were quite low (the highest was 75.87).

By contrast, it can be found in Table 4 (c) that our proposed ETC model is clearly higher on every indicator than the other baselines, without any need to manually annotate the test set. This is because it can capture the relations among multiple clauses, which helps in inferring the current clause. For example, if no other clause in a document has been detected as an emotion cause, the model will increase the probability of the current clause being predicted as an emotion cause, which finally increases the recall score. It is clear that by removing the emotion annotations (CANN-E), the F1 score of CANN drops dramatically (by about 34.69%). In contrast, our method does not need the emotion annotations and achieves an F1 score of 89.57%, which significantly outperforms the CANN-E model by 51.6%.

Xia and Ding (2019) conjectured that emotion clause extraction and cause clause extraction are not mutually independent: on the one hand, providing emotions can help better discover the causes; on the other hand, knowing causes may also help to extract emotions more accurately. Our proposed ETC model uses a single classifier to complete the classification of expressions and causes, forcing the classifier to learn the intrinsic relationship between them. Thanks to BERT's self-attention mechanism, our proposed ETC model can capture the relationships between multiple clauses. It can be found in Table 4 (b) that our proposed ETC model improves greatly on both expression clause extraction and cause clause extraction. Compared with Indep, Inter-CE and Inter-EC, our proposed ETC model obtains great improvements on the ECPE task as well as on the two sub-tasks. Our span-based model achieves 11.57%, 24.77% and 24.5% absolute gains on the three sub-tasks compared to the best classification model, indicating the efficacy of our proposed ETC model.
Related Work
First of all, our work is related to extracting causes based on the emotion expressions presented in documents, i.e., emotion cause extraction (ECE). ECE was first proposed by Lee et al. (2010): given the fact that an emotion is often triggered by cause events and that cause events are integral parts of emotions, they proposed a linguistic-rule-based system for emotion cause detection. To address the lack of a formal definition of events in emotion cause extraction and the absence of an open corpus for the task, Gui et al. (2016) released a corpus and re-formalized the ECE task as a clause classification problem. This corpus has received much attention in subsequent studies and has become the benchmark corpus for ECE research. Based on this corpus, several traditional rule-based models (Lee et al., 2010; Russo et al., 2011; Gui et al., 2014), machine learning models (Gui et al., 2016; Gui et al., 2017; Xu et al., 2017) and deep learning models (Gui et al., 2017; Li et al., 2018; Yu et al., 2019; Xia et al., 2019; Li et al., 2019) have been proposed. Recently, to remove the requirement of ECE that emotion expressions be annotated in the test set before cause extraction, Xia and Ding (2019) proposed the emotion-cause pair extraction (ECPE) task, which aims to extract all potential clause pairs of emotion expressions and corresponding causes in a document.
Conclusion

The key idea of the task and the model is to build span-based feature representations of emotion expressions and causes to efficiently extract document information. Furthermore, our proposed ETC model is able to utilize information based on an overall understanding of the document and a better localized context of the interactions between spans. Comprehensive empirical studies demonstrate the effectiveness of our proposed ETC model. Since our proposed ETC model has a single input structure, in the future we will explore how to incorporate discourse graphs into it to further improve performance, and we intend to annotate a large-scale emotion-cause span-pair corpus to facilitate research.
References
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.

Ying Chen, Sophia Yat Mei Lee, Shoushan Li, and Chu-Ren Huang. 2010. Emotion cause detection with linguistic constructions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 179–187. Association for Computational Linguistics.

Ying Chen, Wenjun Hou, Xiyao Cheng, and Shoushan Li. 2018. Joint learning for emotion classification and emotion cause detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 646–651.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Lin Gui, Li Yuan, Ruifeng Xu, Bin Liu, Qin Lu, and Yu Zhou. 2014. Emotion cause detection with linguistic construction in Chinese weibo text. In Natural Language Processing and Chinese Computing, pages 457–464. Springer.

Lin Gui, Dongyin Wu, Ruifeng Xu, Qin Lu, and Yu Zhou. 2016. Event-driven emotion cause extraction with corpus construction. In EMNLP, pages 1639–1649. World Scientific.

Lin Gui, Jiannan Hu, Yulan He, Ruifeng Xu, Lu Qin, and Jiachen Du. 2017. A question answering approach for emotion cause extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1593–1602.

Sophia Yat Mei Lee, Ying Chen, and Chu-Ren Huang. 2010. A text-driven rule-based system for emotion cause detection. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 45–53. Association for Computational Linguistics.

Sophia Yat Mei Lee, Ying Chen, Chu-Ren Huang, and Shoushan Li. 2013. Detecting emotion causes with a linguistic rule-based approach. Computational Intelligence, 29(3):390–416.

Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 188–197.

Xiangju Li, Kaisong Song, Shi Feng, Daling Wang, and Yifei Zhang. 2018. A co-attention neural network model for emotion cause analysis with emotional context awareness. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4752–4757.

Xiangju Li, Shi Feng, Daling Wang, and Yifei Zhang. 2019. Context-aware emotion cause analysis with multi-attention-based neural network. Knowledge-Based Systems, 174:205–218.

Irene Russo, Tommaso Caselli, Francesco Rubino, Ester Boldrini, and Patricio Martínez-Barco. 2011. EMOCause: An easy-adaptable approach to emotion cause contexts. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 153–160. Association for Computational Linguistics.

John Shore and Rodney Johnson. 1980. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26–37.

Mitchell Stern, Jacob Andreas, and Dan Klein. 2017. A minimal span-based neural constituency parser. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 818–827.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Rui Xia and Zixiang Ding. 2019. Emotion-cause pair extraction: A new task to emotion analysis in texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1003–1012. Association for Computational Linguistics.

Rui Xia, Mengran Zhang, and Zixiang Ding. 2019. RTHN: A RNN-Transformer hierarchical network for emotion cause extraction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 5285–5291. AAAI Press.

Ruifeng Xu, Jiannan Hu, Qin Lu, Dongyin Wu, and Lin Gui. 2017. An ensemble approach for emotion cause detection with event extraction and multi-kernel SVMs. Tsinghua Science and Technology, 22(6):646–659.

Xinyi Yu, Wenge Rong, Zhuo Zhang, Yuanxin Ouyang, and Zhang Xiong. 2019. Multiple level hierarchical network-based clause selection for emotion cause extraction.