TEST_POSITIVE at W-NUT 2020 Shared Task-3: Joint Event Multi-task Learning for Slot Filling in Noisy Text
Chacha Chen, Chieh-Yang Huang, Yaqi Hou, Yang Shi, Enyan Dai, Jiaqi Wang
Chacha Chen, Penn, [email protected]
Chieh-Yang Huang, Penn, [email protected]
Yaqi Hou, UNC at Chapel Hill, [email protected]
Yang Shi
Enyan Dai, Penn, [email protected]
Jiaqi Wang*, Penn, [email protected]
Abstract
The competition of extracting COVID-19 events from Twitter asks participants to develop systems that can automatically extract related events from tweets. The built system should identify different pre-defined slots for each event, in order to answer important questions (e.g., Who is tested positive? What is the age of the person? Where is he/she?). To tackle these challenges, we propose the Joint Event Multi-task Learning (JOELIN) model. Through a unified global learning framework, we make use of all the training data across different events to learn and fine-tune the language model. Moreover, we implement a type-aware post-processing procedure using named entity recognition (NER) to further filter the predictions. JOELIN outperforms the BERT baseline by . in micro F1.

Introduction

In this work, we report the system architecture and results of the team TEST POSITIVE in the competition of W-NUT 2020 Shared Task-3: extracting COVID-19 events from Twitter. Our code is available at https://github.com/Chacha-Chen/JOELIN.

Since February 2020, the COVID-19 pandemic has been spreading all over the world, posing a significant threat to mankind in every aspect. Information sharing about a pandemic is critical in stopping the spread of the virus. With the recent advances in social networks and machine learning, we are able to automatically detect potential events of COVID cases and identify key information to prepare ahead.

We are interested in COVID-19 related event extraction from tweets. With the prevalence of the coronavirus, Twitter has been a valuable source of news and information. Twitter users share COVID-19 related personal narratives and news on social media (Müller et al., 2020). This information could be helpful for doctors, epidemiologists, and policymakers in controlling the pandemic. However, manually extracting useful information from a tremendous amount of tweets is impossible. Hence, we aim to develop a system that automatically extracts structured knowledge from Twitter.

Extracting COVID-19 related events from Twitter is non-trivial due to the following challenges:

(1) How to deal with limited annotations in heterogeneous events and subtasks? The creation of the annotated data relies completely on human labor, and thus only a limited amount of data can be obtained in each event category. There is a variety of event types and subtasks. Many existing works tackle such low-resource problems with different approaches, including crowdsourcing (Müller et al., 2020; Finin et al., 2010; Potthast et al., 2018), unsupervised training (Xie et al., 2019; Hsu et al., 2017), or multi-task learning (Zhang and Yang, 2017; Pentyala et al., 2019).
Here we adopt a multi-task training paradigm to benefit from inter-event and intra-event (subtask) information sharing: JOELIN learns a shared embedding network globally from the data of all events. In this way, we implicitly augment the dataset through global training and fine-tuning of the language model.

(2)
How to make type-aware predictions?
Existing work (Zong et al., 2020) did not encode the information of different subtask types into the model, although it could be useful in suggesting the candidate slot entity type. In order to make type-aware predictions, we propose a NER-based post-processing procedure at the end of the JOELIN pipeline. We use NER to automatically tag the candidate slots and remove any candidate whose entity type does not match the corresponding subtask type. For example, as shown in Figure 1, in the subtask "Who", "my wife's grandmother" is a valid candidate slot, while "old persons home", tagged as a location entity, would be replaced with "Not Specified" during post-processing.

Figure 1: Illustration of NER-based post-processing. (Subtask "Who": PERSON "my wife's grandmother" is kept; LOCATION "old persons home" is filtered out.)
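This filtering rule can be sketched as follows. This is a minimal illustration with a stub tagger standing in for spaCy's NER model, and the subtask-to-entity-type mapping is an assumption, not the exact one used in the system:

```python
# Expected entity types per subtask (an illustrative mapping using
# spaCy-style label names; the real mapping is not given verbatim).
SUBTASK_TYPES = {"who": {"PERSON"}, "where": {"GPE", "LOC", "FAC"}}

def filter_prediction(candidate, subtask, tag_entity):
    """Replace a candidate slot whose entity type does not match
    the subtask's expected types with "Not Specified"."""
    expected = SUBTASK_TYPES.get(subtask)
    if expected is None:
        return candidate  # no type constraint for this subtask
    return candidate if tag_entity(candidate) in expected else "Not Specified"

# Stub tagger mimicking NER output for the Figure 1 example.
def tag_entity(text):
    return {"my wife's grandmother": "PERSON",
            "old persons home": "LOC"}.get(text)

print(filter_prediction("my wife's grandmother", "who", tag_entity))
# -> my wife's grandmother
print(filter_prediction("old persons home", "who", tag_entity))
# -> Not Specified
```

The person mention survives, while the location mention is invalidated for the "Who" subtask, exactly as Figure 1 illustrates.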
In summary, JOELIN is enabled by the following technical contributions:

• A joint event multi-task learning framework for different events and subtasks. With the unified global training framework, we train and fine-tune the language model across all events and make predictions based on multi-task learning, so as to learn from limited data.

• A NER-based type-aware post-processing approach. We apply NER tagging to the model predictions and filter out wrong predictions based on subtask types. In this way, JOELIN benefits from subtask type prior knowledge and further boosts the performance.
Related Work

Event Extraction from Twitter
Impressive efforts have been made to detect events from Twitter. Existing works include domain-specific event extraction and open-domain event extraction. Domain-specific approaches mainly focus on extracting a particular type of event, including natural disasters (Sakaki et al., 2010), traffic events (Dabiri and Heaslip, 2019), user mobility behaviors (Yuan et al., 2013), etc. The open-domain scenario is more challenging and usually relies on unsupervised approaches. Existing works usually create clusters with event-related keywords (Parikh and Karlapalem, 2013) or named entities (McMinn and Jose, 2015; Edouard et al., 2017). Additionally, Ritter et al. (2012) and Zhou et al. (2015) design general pipelines to extract and categorize events in a supervised and an unsupervised manner, respectively. Different from previous works, we deal with COVID-19 related event extraction in particular. Zong et al. (2020) provide a BERT baseline for the same task, whereas we create a unified framework that learns simultaneously across different categories of events and subtasks.
Type-aware Slot Filling
Yang et al. (2016) formulate entity type constraints and use integer linear programming to combine them with relation classification. Adel and Schütze (2019) propose to integrate entity and relation classes in convolutional neural networks and learn the correlation from data. We propose a NER-based post-processing technique for type-aware slot filling. By filtering out entity-mismatched predictions, JOELIN can efficiently boost the performance with minimal hand-crafted rules.
COVID-19 Twitter Analysis
Under quarantine, people share thoughts and make comments about COVID-19 on Twitter, which has become a valuable source for researchers to explore and study. Singh et al. (2020) show that Twitter conversations indicate a spatio-temporal relationship between information flow and new cases of COVID-19. There is also work on COVID-19 datasets. Banda et al. (2020) provide a large-scale curated dataset of over 152 million tweets. Chen et al. (2020) collect tweets and form a multilingual COVID-19 Twitter dataset. Based on the collected data, Jahanbin and Rahmanian (2020) propose a model to predict COVID-19 outbreaks by monitoring and tracking information on Twitter. Though there are some works on COVID-19 tweet analysis (Müller et al., 2020; Jimenez-Sotomayor et al., 2020; Lopez et al., 2020), work on automatically extracting structured knowledge of COVID-19 events from tweets is still limited.
Methodology

In this section, we introduce our approach JOELIN and its data pre-processing and post-processing steps in detail. First, we pre-process the noisy Twitter data following the data cleaning procedures in Müller et al. (2020). Second, we train JOELIN and fine-tune the pre-trained language model end-to-end. Specifically, we design the JOELIN classifier in a joint event multi-task learning framework. Moreover, we provide four options of embedding types and ensemble the outputs with the highest validation scores. Finally, we further utilize NER techniques to post-process our results with minimal hand-crafted rules.
Pre-processing
Prior to training, the original tweets are cleaned following Müller et al. (2020). Punctuation is standardized and unicode emoticons are expanded into textual ASCII representations. All Twitter usernames are replaced with a special token.
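A minimal sketch of this cleaning step is below. The special username token and the emoticon-to-text map are illustrative choices, not the exact values used by Müller et al. (2020):

```python
import re

# Hypothetical stand-ins for the actual cleaning rules.
USER_TOKEN = "@USER"
EMOTICONS = {"\U0001F637": ":face_with_medical_mask:"}

def clean_tweet(text):
    # Standardize curly quotes to ASCII punctuation.
    text = (text.replace("\u2019", "'")
                .replace("\u201c", '"')
                .replace("\u201d", '"'))
    # Expand unicode emoticons into textual representations.
    for emo, label in EMOTICONS.items():
        text = text.replace(emo, f" {label} ")
    # Replace all Twitter usernames with a special token.
    text = re.sub(r"@\w+", USER_TOKEN, text)
    # Collapse whitespace introduced by the substitutions.
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("@john_doe My wife\u2019s test came back \U0001F637"))
# -> @USER My wife's test came back :face_with_medical_mask:
```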
Figure 2: Our approach comprises two main components: (1) a global language model across events and subtasks; (2) a multi-task learning classifier.
Model
JOELIN consists of four modules, as shown in Figure 2: the pre-trained COVID-Twitter BERT (CT-BERT) (Müller et al., 2020), four different embedding layers, a joint event multi-task learning framework with global parameter sharing, and an output ensemble module.
COVID Twitter BERT
It has become common practice to take pre-trained language models, e.g., BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019), and fine-tune them in a supervised manner for specific downstream tasks. In this work, we use CT-BERT as JOELIN's pre-trained language model. CT-BERT is trained on a corpus of 160M tweets related to COVID-19 and shows great improvement compared to BERT-LARGE and RoBERTa. We further fine-tune CT-BERT with the provided dataset.
Feature Extraction
Given the hidden representation of the token <E> produced by CT-BERT, we apply various feature extraction methods to choose the more useful features. Inspired by Devlin et al. (2018), we implemented the following four feature extraction methods:

1. Last hidden layer: we directly use the last hidden layer of CT-BERT as our classifier input.
2. Summation of last four: we sum the outputs of the last four hidden layers as the classifier input.
3. Concatenation of last four (type-1): we directly concatenate the last four layers and flatten the vector before feeding it to the classifier.
4. Concatenation of last four (type-2): each of the last four layers is passed through a fully-connected layer and reduced to a quarter of its original hidden size. We flatten the vectors before passing them through the classifier.
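The four strategies can be sketched over toy layer outputs. The NumPy arrays below stand in for the <E> token's real CT-BERT hidden states, and the toy hidden size and random projection matrices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy hidden size for illustration; CT-BERT's is larger

# Stand-ins for the <E> token's hidden states from the last four
# encoder layers.
layers = [rng.standard_normal(HIDDEN) for _ in range(4)]

def last_layer(layers):
    """Method 1: the last hidden layer as-is."""
    return layers[-1]

def sum_last_four(layers):
    """Method 2: element-wise sum of the last four layers."""
    return np.sum(layers[-4:], axis=0)

def concat_last_four(layers):
    """Method 3 (type-1): flat concatenation of the last four layers."""
    return np.concatenate(layers[-4:])

def concat_last_four_projected(layers, proj):
    """Method 4 (type-2): project each layer to a quarter of its
    hidden size, then concatenate, keeping the feature size fixed."""
    return np.concatenate([h @ W for h, W in zip(layers[-4:], proj)])

# Random projections standing in for learned fully-connected layers.
proj = [rng.standard_normal((HIDDEN, HIDDEN // 4)) for _ in range(4)]

print(last_layer(layers).shape)                        # (8,)
print(sum_last_four(layers).shape)                     # (8,)
print(concat_last_four(layers).shape)                  # (32,)
print(concat_last_four_projected(layers, proj).shape)  # (8,)
```

Note that methods 1, 2, and 4 keep the classifier input at the original hidden size, while method 3 quadruples it.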
Joint Event Multi-task Learning
To tackle the challenge of limited annotated data, we apply a global parameter-sharing model across all events. Specifically, we jointly learn and fine-tune the language embedding across different events and apply a multi-task classifier for prediction. As shown in Figure 2, the language embedding as well as the feature extraction mechanism are jointly learned and fine-tuned globally. We then apply a fully-connected layer as our classifier for all the subtasks in the different categories of events. In this way, JOELIN benefits from using the data of all the events and their subtasks. Compared with training separate models for each event, joint training across different tasks significantly boosts the performance.
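A minimal sketch of this shared-encoder, per-subtask-head layout follows. The subtask inventory, the single linear head per subtask, and the random features are illustrative placeholders rather than the exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy feature size

# Hypothetical slot inventory (a subset of the shared task's subtasks).
SUBTASKS = {
    "tested_positive": ["age", "close_contact", "name"],
    "death": ["age", "name", "where"],
}

# One linear head per (event, subtask) pair; the encoder producing
# `features` is shared globally across all events.
heads = {
    (event, slot): rng.standard_normal(HIDDEN)
    for event, slots in SUBTASKS.items()
    for slot in slots
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(features, event):
    """Score every slot subtask of `event` from the shared features."""
    return {slot: float(sigmoid(features @ heads[(event, slot)]))
            for slot in SUBTASKS[event]}

features = rng.standard_normal(HIDDEN)  # stand-in for CT-BERT output
scores = predict(features, "tested_positive")
print(sorted(scores))
```

Because every head reads from the same shared features, gradients from each event's subtasks update the shared encoder, which is what lets the limited per-event data reinforce one another.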
Model Ensemble
It has long been observed that ensembles of models boost overall performance. Hence, in this work, we train multiple models with different feature extraction approaches, select the top 5 models with the best performance, and ensemble them by majority voting.
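The majority-voting step can be sketched as follows; the tie-breaking rule and the toy model predictions are illustrative assumptions:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine binary slot decisions from several models.

    `predictions` is a list of per-model dicts mapping a candidate
    slot to a 0/1 decision; ties break toward 1 here (an illustrative
    choice, not necessarily the system's).
    """
    combined = {}
    for slot in predictions[0]:
        votes = Counter(p[slot] for p in predictions)
        combined[slot] = 1 if votes[1] >= votes[0] else 0
    return combined

# Five hypothetical models voting on two candidate slots.
models = [
    {"my wife": 1, "old persons home": 1},
    {"my wife": 1, "old persons home": 0},
    {"my wife": 0, "old persons home": 0},
    {"my wife": 1, "old persons home": 0},
    {"my wife": 1, "old persons home": 0},
]
print(majority_vote(models))  # {'my wife': 1, 'old persons home': 0}
```

With an odd number of models (five here), ties cannot actually occur for binary decisions, so the tie-breaking choice only matters for even-sized ensembles.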
NER-based Post-processing
We further filter our predictions based on NER for post-processing. Specifically, we use spaCy's NER model (https://spacy.io/) to tag the predicted candidate slots. Then we compare the entity tag with the subtask. If the candidate tag does not match the subtask type, we invalidate the prediction by replacing it with "NOT SPECIFIED". For example, if the subtask is "who", we nullify those candidate slots whose tags are not related to persons, as shown in Figure 1.

Experiments and Analysis
The dataset is composed of annotated tweets sampled from January 15, 2020 to April 26, 2020. It contains 7,500 tweets covering the following 5 events: (1) tested positive, (2) tested negative, (3) can not test, (4) death, and (5) cure and prevention. Each event contains several slot subtasks.

We randomly split the dataset into training and validation sets in an 80:20 ratio. The model is trained with the AdamW optimizer (Loshchilov and Hutter, 2017) to minimize the binary cross-entropy loss, with a batch size of 32 and a learning rate of e-. To deal with the class imbalance issue, we apply class weighting in the loss function. With grid search, the best weights are 10 and 1 for positive and negative samples, respectively.

We evaluate JOELIN against the BERT and CT-BERT baselines. We measure the performance of the different models with F1 score and micro F1 score, in consideration of the imbalanced sample sizes. The overall results are shown in Table 1. Compared with the performance of BERT (Zong et al., 2020) and CT-BERT (Müller et al., 2020), JOELIN significantly outperforms the best baseline, CT-BERT, by . in micro F1. In terms of performance on subtasks, JOELIN outperforms the best baseline CT-BERT by up to . on the recent travel subtask of the event TESTED POSITIVE. The performance gains of JOELIN are attributed to the joint event multi-task learning framework and the type-aware NER-based post-processing.
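The class-weighted binary cross-entropy used in training can be sketched per sample as follows; the actual training applies it in batched form over logits, but `w_pos=10` and `w_neg=1` follow the grid-searched weights reported above:

```python
import math

def weighted_bce(p, y, w_pos=10.0, w_neg=1.0):
    """Class-weighted binary cross-entropy for one prediction p
    (probability of the positive class) against label y in {0, 1}."""
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))

# A confident mistake on a positive sample is penalized ten times as
# heavily as the mirror-image mistake on a negative sample.
loss_pos = weighted_bce(0.1, 1)  # missed positive
loss_neg = weighted_bce(0.9, 0)  # false positive
print(loss_pos / loss_neg)  # -> 10.0 (same log term, different weight)
```

Up-weighting positives this way pushes the classifier toward higher recall on the rare positive slots, which matters under the micro F1 metric.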
Ablation Study
We conduct an ablation study to understand the contribution of type-aware post-processing in JOELIN. We remove the post-processing step as a reduced model (JOELIN-P) and compare the micro F1 scores. As shown in Table 2, JOELIN has a better micro F1 score than the reduced model JOELIN-P. This supports the claim that our proposed type-aware post-processing with NER can significantly boost performance.
Conclusion
In this work, we build JOELIN upon a joint event multi-task learning framework and use NER-based post-processing to generate type-aware predictions. The results show that JOELIN significantly boosts the performance of extracting COVID-19 events from noisy tweets over the BERT and CT-BERT baselines. In the future, we would like to extend JOELIN to open-domain event extraction tasks, which are more challenging and require a more general pipeline.

https://github.com/viczong/extract_COVID19_events_from_Twitter

Table 1: Overall performance of JOELIN compared with BERT and CT-BERT on validation data. The results are reported with F1 score.

Sub-task             BERT   CT-BERT  JOELIN
TESTED POSITIVE
age                  0.519  0.571    0.769
close contact        0.262  0.333    0.420
employer             0.394  0.391    0.453
gender male          0.664  0.669    0.711
gender female        0.635  0.698    0.779
name                 0.740  0.774    0.807
recent travel        0.227  0.391    0.567
relation             0.476  0.621    0.769
when                 0.571  0.571    0.741
where                0.560  0.631    0.660
TESTED NEGATIVE
age                  0.000  0.750    0.750
close contact        0.000  0.133    0.133
gender male          0.479  0.660    0.706
gender female        0.214  0.649    0.766
how long             0.000  0.400    0.800
name                 0.519  0.646    0.675
relation             0.449  0.720    0.784
when                 0.000  0.471    0.471
where                0.372  0.578    0.651
CAN NOT TEST
relation             0.516  0.608    0.771
symptoms             0.517  0.704    0.757
name                 0.382  0.545    0.550
when                 0.000  0.000    0.000
where                0.509  0.500    0.638
DEATH
age                  0.727  0.722    0.789
name                 0.642  0.715    0.774
relation             0.378  0.646    0.680
symptoms             0.000  0.000    0.444
when                 0.633  0.605    0.690
where                0.483  0.613    0.628
CURE AND PREVENTION
opinion              0.520  0.573    0.627
what cure            0.583  0.671    0.671
who cure             0.389  0.515    0.545
micro avg. F1        0.576  0.647

Table 2: Ablation model comparison on test data.

Model      Micro F1
JOELIN-P   0.488
JOELIN

References
Heike Adel and Hinrich Schütze. 2019. Type-aware convolutional neural networks for slot filling. Journal of Artificial Intelligence Research, 66:297–339.

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, and Gerardo Chowell. 2020. A large-scale COVID-19 Twitter chatter dataset for open scientific research – an international collaboration. arXiv preprint arXiv:2004.03688.

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. COVID-19: The first public coronavirus Twitter dataset. arXiv preprint arXiv:2003.07372.

Sina Dabiri and Kevin Heaslip. 2019. Developing a Twitter-based traffic event detection model using deep learning architectures. Expert Systems with Applications, 118:425–439.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Amosse Edouard, Elena Cabrio, Sara Tonelli, and Nhan Le Thanh. 2017. Graph-based event extraction from Twitter.

Tim Finin, William Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating named entities in Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 80–88.

Wei-Ning Hsu, Yu Zhang, and James Glass. 2017. Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. In , pages 16–23. IEEE.

Kia Jahanbin and Vahid Rahmanian. 2020. Using Twitter and web news mining to predict COVID-19 outbreak. Asian Pacific Journal of Tropical Medicine, 13.

Maria Renee Jimenez-Sotomayor, Carolina Gomez-Moreno, and Enrique Soto-Perez-de Celis. 2020. Coronavirus, ageism, and Twitter: An evaluation of tweets about older adults and COVID-19. Journal of the American Geriatrics Society.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Christian E Lopez, Malolan Vasu, and Caleb Gallemore. 2020. Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset. arXiv preprint arXiv:2003.10359.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.

Andrew J McMinn and Joemon M Jose. 2015. Real-time entity-based event detection for Twitter. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 65–77. Springer.

Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503.

Ruchi Parikh and Kamalakar Karlapalem. 2013. ET: Events from tweets. In Proceedings of the 22nd International Conference on World Wide Web, pages 613–620.

Shiva Pentyala, Mengwen Liu, and Markus Dreyer. 2019. Multi-task networks with universe, group, and task feature learning. arXiv preprint arXiv:1907.01791.

Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. 2018. Crowdsourcing a large corpus of clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1498–1507.

Alan Ritter, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from Twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1104–1112.

Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 2010. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860.

Lisa Singh, Shweta Bansal, Leticia Bode, Ceren Budak, Guangqing Chi, Kornraphop Kawintiranon, Colton Padden, Rebecca Vanarsdall, Emily Vraga, and Yanchen Wang. 2020. A first look at COVID-19 information and misinformation sharing on Twitter. arXiv preprint arXiv:2003.13907.

Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. 2019. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848.

Bishan Yang, Ndapandula Nakashole, Bryan Kisiel, Emmanouil A Platanios, Abulhair Saparov, Shashank Srivastava, Derry Wijaya, and Tom M Mitchell. 2016. CMUML micro-reader system for KBP 2016 cold start slot filling, event nugget detection, and event argument linking. In TAC.

Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann. 2013. Who, where, when and what: Discover spatio-temporal topics for Twitter users. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 605–613.

Yu Zhang and Qiang Yang. 2017. A survey on multi-task learning. arXiv preprint arXiv:1707.08114.

Deyu Zhou, Liangyu Chen, and Yulan He. 2015. An unsupervised framework of exploring events on Twitter: Filtering, extraction and categorization. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Shi Zong, Ashutosh Baheti, Wei Xu, and Alan Ritter. 2020. Extracting COVID-19 events from Twitter. arXiv preprint arXiv:2006.02567.