TEST_POSITIVE at W-NUT 2020 Shared Task-3: Joint Event Multi-task Learning for Slot Filling in Noisy Text
Chacha Chen, Chieh-Yang Huang, Yaqi Hou, Yang Shi, Enyan Dai, Jiaqi Wang
Chacha Chen, Penn, [email protected]
Chieh-Yang Huang, Penn, [email protected]
Yaqi Hou, UNC at Chapel Hill, [email protected]
Yang Shi
Enyan Dai, Penn, [email protected]
Jiaqi Wang*, Penn, [email protected]
Abstract
The competition of extracting COVID-19 events from Twitter asks participants to develop systems that can automatically extract related events from tweets. The built system should identify different pre-defined slots for each event, in order to answer important questions (e.g., Who is tested positive? What is the age of the person? Where is he/she?). To tackle these challenges, we propose the Joint Event Multi-task Learning (JOELIN) model. Through a unified global learning framework, we make use of all the training data across different events to learn and fine-tune the language model. Moreover, we implement a type-aware post-processing procedure using named entity recognition (NER) to further filter the predictions. JOELIN outperforms the BERT baseline by . in micro F1.

Introduction

In this work, we report the system architecture and results of the team TEST POSITIVE in the competition of W-NUT 2020 Shared Task-3: extracting COVID-19 events from Twitter. Our code is available at https://github.com/Chacha-Chen/JOELIN.

Since February 2020, the COVID-19 pandemic has been spreading all over the world, posing a significant threat to mankind in every aspect. Information sharing about a pandemic is critical in stopping the spread of the virus. With the recent advances in social networks and machine learning, we are able to automatically detect potential events of COVID cases and identify key information to prepare ahead.

We are interested in COVID-19 related event extraction from tweets. With the prevalence of the coronavirus, Twitter has been a valuable source of news and information. Twitter users share COVID-19 related personal narratives and news on social media (Müller et al., 2020). This information could be helpful for doctors, epidemiologists, and policymakers in controlling the pandemic. However, manually extracting useful information from a tremendous amount of tweets is impossible. Hence, we aim to develop a system that automatically extracts structured knowledge from Twitter.

Extracting COVID-19 related events from Twitter is non-trivial due to the following challenges:

(1) How to deal with limited annotations in heterogeneous events and subtasks? The creation of the annotated data relies completely on human labor, and thus only a limited amount of data can be obtained in each event category. There is a variety of event types and subtasks. Many existing works tackle such low-resource problems with different approaches, including crowdsourcing (Müller et al., 2020; Finin et al., 2010; Potthast et al., 2018), unsupervised training (Xie et al., 2019; Hsu et al., 2017), or multi-task learning (Zhang and Yang, 2017; Pentyala et al., 2019).
Here we adopt a multi-task training paradigm to benefit from inter-event and intra-event (subtask) information sharing: JOELIN learns a shared embedding network globally from the data of all events. In this way, we implicitly augment the dataset through global training and fine-tuning of the language model.

(2)
How to make type-aware predictions?
Existing work (Zong et al., 2020) did not encode the information of different subtask types into the model, although it could be useful in suggesting the candidate slot entity type. In order to make type-aware predictions, we propose a NER-based post-processing procedure at the end of the JOELIN pipeline. We use NER to automatically tag the candidate slots and remove any candidate whose entity type does not match the corresponding subtask type. For example, as shown in Figure 1, in the subtask "Who", "my wife's grandmother" is a valid candidate slot, while "old persons home", tagged as a location entity, would be replaced with "Not Specified" during post-processing.

Figure 1: Illustration of NER-based post-processing. (Subtask "Who": PERSON "my wife's grandmother" is kept; LOCATION "old persons home" is filtered out.)
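This filtering rule can be sketched as follows. This is a minimal illustration with a stub tagger standing in for spaCy's NER model, and the subtask-to-entity-type mapping is an assumption, not the exact one used in the system:

```python
# Expected entity types per subtask (an illustrative mapping using
# spaCy-style label names; the real mapping is not given verbatim).
SUBTASK_TYPES = {"who": {"PERSON"}, "where": {"GPE", "LOC", "FAC"}}

def filter_prediction(candidate, subtask, tag_entity):
    """Replace a candidate slot whose entity type does not match
    the subtask's expected types with "Not Specified"."""
    expected = SUBTASK_TYPES.get(subtask)
    if expected is None:
        return candidate  # no type constraint for this subtask
    return candidate if tag_entity(candidate) in expected else "Not Specified"

# Stub tagger mimicking NER output for the Figure 1 example.
def tag_entity(text):
    return {"my wife's grandmother": "PERSON",
            "old persons home": "LOC"}.get(text)

print(filter_prediction("my wife's grandmother", "who", tag_entity))
# -> my wife's grandmother
print(filter_prediction("old persons home", "who", tag_entity))
# -> Not Specified
```

The person mention survives, while the location mention is invalidated for the "Who" subtask, exactly as Figure 1 illustrates.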
In summary, JOELIN is enabled by the following technical contributions:

• A joint event multi-task learning framework for different events and subtasks. With the unified global training framework, we train and fine-tune the language model across all events and make predictions based on multi-task learning, so as to learn from limited data.

• A NER-based type-aware post-processing approach. We apply NER tagging to the model predictions and filter out wrong predictions based on subtask types. In this way, JOELIN benefits from subtask type prior knowledge and further boosts the performance.
Related Work

Event Extraction from Twitter
Impressive efforts have been made to detect events from Twitter. Existing works include domain-specific event extraction and open-domain event extraction. Domain-specific approaches mainly focus on extracting a particular type of event, including natural disasters (Sakaki et al., 2010), traffic events (Dabiri and Heaslip, 2019), user mobility behaviors (Yuan et al., 2013), etc. The open-domain scenario is more challenging and usually relies on unsupervised approaches. Existing works usually create clusters with event-related keywords (Parikh and Karlapalem, 2013) or named entities (McMinn and Jose, 2015; Edouard et al., 2017). Additionally, Ritter et al. (2012) and Zhou et al. (2015) design general pipelines to extract and categorize events in a supervised and an unsupervised manner, respectively. Different from previous works, we deal with COVID-19 related event extraction in particular. Zong et al. (2020) provide a BERT baseline for the same task, whereas we create a unified framework that learns simultaneously across different categories of events and subtasks.
Type-aware Slot Filling
Yang et al. (2016) formulate entity type constraints and use integer linear programming to combine them with relation classification. Adel and Schütze (2019) propose to integrate entity and relation classes in convolutional neural networks and learn the correlation from data. We propose a NER-based post-processing technique for type-aware slot filling. By filtering out entity-mismatched predictions, JOELIN can efficiently boost the performance with minimal hand-crafted rules.
COVID-19 Twitter Analysis
Under quarantine, people share thoughts and make comments about COVID-19 on Twitter, which has become a valuable source for researchers to explore and study. Singh et al. (2020) show that Twitter conversations indicate a spatio-temporal relationship between information flow and new cases of COVID-19. There is also work on COVID-19 datasets. Banda et al. (2020) provide a large-scale curated dataset of over 152 million tweets. Chen et al. (2020) collect tweets and form a multilingual COVID-19 Twitter dataset. Based on the collected data, Jahanbin and Rahmanian (2020) propose a model to predict COVID-19 outbreaks by monitoring and tracking information on Twitter. Though there are some works on COVID-19 tweet analysis (Müller et al., 2020; Jimenez-Sotomayor et al., 2020; Lopez et al., 2020), work on automatically extracting structured knowledge of COVID-19 events from tweets is still limited.
Methodology

In this section, we introduce our approach JOELIN and its data pre-processing and post-processing steps in detail. First, we pre-process the noisy Twitter data following the data cleaning procedures in Müller et al. (2020). Second, we train JOELIN and fine-tune the pre-trained language model end-to-end. Specifically, we design the JOELIN classifier in a joint event multi-task learning framework. Moreover, we provide four options of embedding types and ensemble the outputs with the highest validation scores. Finally, we further utilize NER techniques to post-process our results with minimal hand-crafted rules.
Pre-processing
Prior to training, the original tweets are cleaned following Müller et al. (2020). Punctuation is standardized and unicode emoticons are expanded into textual ASCII representations. All Twitter usernames are replaced with a special token.
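A minimal sketch of this cleaning step is below. The special username token and the emoticon-to-text map are illustrative choices, not the exact values used by Müller et al. (2020):

```python
import re

# Hypothetical stand-ins for the actual cleaning rules.
USER_TOKEN = "@USER"
EMOTICONS = {"\U0001F637": ":face_with_medical_mask:"}

def clean_tweet(text):
    # Standardize curly quotes to ASCII punctuation.
    text = (text.replace("\u2019", "'")
                .replace("\u201c", '"')
                .replace("\u201d", '"'))
    # Expand unicode emoticons into textual representations.
    for emo, label in EMOTICONS.items():
        text = text.replace(emo, f" {label} ")
    # Replace all Twitter usernames with a special token.
    text = re.sub(r"@\w+", USER_TOKEN, text)
    # Collapse whitespace introduced by the substitutions.
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("@john_doe My wife\u2019s test came back \U0001F637"))
# -> @USER My wife's test came back :face_with_medical_mask:
```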
Figure 2: Our approach comprises two main components: (1) a global language model across events and subtasks; (2) a multi-task learning classifier.
Model
JOELIN consists of four modules, as shown in Figure 2: the pre-trained COVID-Twitter BERT (CT-BERT) (Müller et al., 2020), four different embedding layers, a joint event multi-task learning framework with global parameter sharing, and an output ensemble module.
COVID Twitter BERT
It has become common practice to take pre-trained language models, e.g., BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019), and fine-tune them in a supervised manner for specific downstream tasks. In this work, we use CT-BERT as JOELIN's pre-trained language model. CT-BERT is trained on a corpus of 160M tweets related to COVID-19 and shows great improvement compared to BERT-LARGE and RoBERTa. We further fine-tune CT-BERT with the provided dataset.
Feature Extraction
Given the hidden representation of the token <E> produced by CT-BERT, we apply various feature extraction methods to choose the more useful features. Inspired by Devlin et al. (2018), we implemented the following four feature extraction methods:

1. Last hidden layer: we directly use the last hidden layer of CT-BERT as our classifier input.
2. Summation of last four: we sum the outputs of the last four hidden layers as the classifier input.
3. Concatenation of last four (type-1): we directly concatenate the last four layers and flatten the vector before feeding it to the classifier.
4. Concatenation of last four (type-2): each of the last four layers is passed through a fully-connected layer and reduced to a quarter of its original hidden size. We flatten the vectors before passing them through the classifier.
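The four strategies can be sketched over toy layer outputs. The NumPy arrays below stand in for the <E> token's real CT-BERT hidden states, and the toy hidden size and random projection matrices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy hidden size for illustration; CT-BERT's is larger

# Stand-ins for the <E> token's hidden states from the last four
# encoder layers.
layers = [rng.standard_normal(HIDDEN) for _ in range(4)]

def last_layer(layers):
    """Method 1: the last hidden layer as-is."""
    return layers[-1]

def sum_last_four(layers):
    """Method 2: element-wise sum of the last four layers."""
    return np.sum(layers[-4:], axis=0)

def concat_last_four(layers):
    """Method 3 (type-1): flat concatenation of the last four layers."""
    return np.concatenate(layers[-4:])

def concat_last_four_projected(layers, proj):
    """Method 4 (type-2): project each layer to a quarter of its
    hidden size, then concatenate, keeping the feature size fixed."""
    return np.concatenate([h @ W for h, W in zip(layers[-4:], proj)])

# Random projections standing in for learned fully-connected layers.
proj = [rng.standard_normal((HIDDEN, HIDDEN // 4)) for _ in range(4)]

print(last_layer(layers).shape)                        # (8,)
print(sum_last_four(layers).shape)                     # (8,)
print(concat_last_four(layers).shape)                  # (32,)
print(concat_last_four_projected(layers, proj).shape)  # (8,)
```

Note that methods 1, 2, and 4 keep the classifier input at the original hidden size, while method 3 quadruples it.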
Joint Event Multi-task Learning
To tackle the challenge of limited annotated data, we apply a global parameter-sharing model across all events. Specifically, we jointly learn and fine-tune the language embedding across different events and apply a multi-task classifier for prediction. As shown in Figure 2, the language embedding as well as the feature extraction mechanism are jointly learned and fine-tuned globally. We then apply a fully-connected layer as our classifier for all the subtasks in the different categories of events. In this way, JOELIN benefits from using the data of all the events and their subtasks. Compared with training separate models for each event, joint training across different tasks significantly boosts the performance.
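A minimal sketch of this shared-encoder, per-subtask-head layout follows. The subtask inventory, the single linear head per subtask, and the random features are illustrative placeholders rather than the exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy feature size

# Hypothetical slot inventory (a subset of the shared task's subtasks).
SUBTASKS = {
    "tested_positive": ["age", "close_contact", "name"],
    "death": ["age", "name", "where"],
}

# One linear head per (event, subtask) pair; the encoder producing
# `features` is shared globally across all events.
heads = {
    (event, slot): rng.standard_normal(HIDDEN)
    for event, slots in SUBTASKS.items()
    for slot in slots
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(features, event):
    """Score every slot subtask of `event` from the shared features."""
    return {slot: float(sigmoid(features @ heads[(event, slot)]))
            for slot in SUBTASKS[event]}

features = rng.standard_normal(HIDDEN)  # stand-in for CT-BERT output
scores = predict(features, "tested_positive")
print(sorted(scores))
```

Because every head reads from the same shared features, gradients from each event's subtasks update the shared encoder, which is what lets the limited per-event data reinforce one another.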
Model Ensemble
It has long been observed that ensembles of models boost overall performance. Hence, in this work, we train multiple models with different feature extraction approaches, select the top 5 models with the best performance, and ensemble them by majority voting.
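The majority-voting step can be sketched as follows; the tie-breaking rule and the toy model predictions are illustrative assumptions:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine binary slot decisions from several models.

    `predictions` is a list of per-model dicts mapping a candidate
    slot to a 0/1 decision; ties break toward 1 here (an illustrative
    choice, not necessarily the system's).
    """
    combined = {}
    for slot in predictions[0]:
        votes = Counter(p[slot] for p in predictions)
        combined[slot] = 1 if votes[1] >= votes[0] else 0
    return combined

# Five hypothetical models voting on two candidate slots.
models = [
    {"my wife": 1, "old persons home": 1},
    {"my wife": 1, "old persons home": 0},
    {"my wife": 0, "old persons home": 0},
    {"my wife": 1, "old persons home": 0},
    {"my wife": 1, "old persons home": 0},
]
print(majority_vote(models))  # {'my wife': 1, 'old persons home': 0}
```

With an odd number of models (five here), ties cannot actually occur for binary decisions, so the tie-breaking choice only matters for even-sized ensembles.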
NER-based Post-processing
We further filter our predictions based on NER for post-processing. Specifically, we use spaCy's NER model (https://spacy.io/) to tag the predicted candidate slots. Then we compare the entity tag with the subtask. If the candidate tag does not match the subtask type, we invalidate the prediction by replacing it with "NOT SPECIFIED". For example, if the subtask is "who", we nullify those candidate slots whose tags are not related to persons, as shown in Figure 1.

Experiments and Analysis
The dataset is composed of annotated tweets sampled from January 15, 2020 to April 26, 2020. It contains 7,500 tweets covering the following 5 events: (1) tested positive, (2) tested negative, (3) can not test, (4) death, and (5) cure and prevention. Each event contains several slot subtasks.

We randomly split the dataset into training and validation sets in an 80:20 ratio. The model is trained with the AdamW optimizer (Loshchilov and Hutter, 2017) to minimize the binary cross-entropy loss, with a batch size of 32 and a learning rate of e-. To deal with the class imbalance issue, we apply class weighting in the loss function. With grid search, the best weights are 10 and 1 for positive and negative samples, respectively.

We evaluate JOELIN against the BERT and CT-BERT baselines. We measure the performance of the different models with F1 score and micro F1 score, in consideration of the imbalanced sample sizes. The overall results are shown in Table 1. Compared with the performance of BERT (Zong et al., 2020) and CT-BERT (Müller et al., 2020), JOELIN significantly outperforms the best baseline, CT-BERT, by . in micro F1. In terms of performance on subtasks, JOELIN outperforms the best baseline CT-BERT by up to . on the recent travel subtask of the event TESTED POSITIVE. The performance gains of JOELIN are attributed to the joint event multi-task learning framework and the type-aware NER-based post-processing.
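The class-weighted binary cross-entropy used in training can be sketched per sample as follows; the actual training applies it in batched form over logits, but `w_pos=10` and `w_neg=1` follow the grid-searched weights reported above:

```python
import math

def weighted_bce(p, y, w_pos=10.0, w_neg=1.0):
    """Class-weighted binary cross-entropy for one prediction p
    (probability of the positive class) against label y in {0, 1}."""
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))

# A confident mistake on a positive sample is penalized ten times as
# heavily as the mirror-image mistake on a negative sample.
loss_pos = weighted_bce(0.1, 1)  # missed positive
loss_neg = weighted_bce(0.9, 0)  # false positive
print(loss_pos / loss_neg)  # -> 10.0 (same log term, different weight)
```

Up-weighting positives this way pushes the classifier toward higher recall on the rare positive slots, which matters under the micro F1 metric.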
Ablation Study
We conduct an ablation study to understand the contribution of type-aware post-processing in JOELIN. We remove the post-processing step as a reduced model (JOELIN-P) and compare the micro F1 scores. As shown in Table 2, JOELIN has a better micro F1 score than the reduced model JOELIN-P. This supports the claim that our proposed type-aware post-processing with NER can significantly boost performance.
Conclusion
In this work, we build JOELIN upon a joint event multi-task learning framework and use NER-based post-processing to generate type-aware predictions. The results show that JOELIN significantly boosts the performance of extracting COVID-19 events from noisy tweets over the BERT and CT-BERT baselines. In the future, we would like to extend JOELIN to open-domain event extraction tasks, which are more challenging and require a more general pipeline.

https://github.com/viczong/extract_COVID19_events_from_Twitter

Table 1: Overall performance of JOELIN compared with BERT and CT-BERT on validation data. The results are reported with F1 score.

Sub-task             BERT   CT-BERT  JOELIN
TESTED POSITIVE
age                  0.519  0.571    0.769
close contact        0.262  0.333    0.420
employer             0.394  0.391    0.453
gender male          0.664  0.669    0.711
gender female        0.635  0.698    0.779
name                 0.740  0.774    0.807
recent travel        0.227  0.391    0.567
relation             0.476  0.621    0.769
when                 0.571  0.571    0.741
where                0.560  0.631    0.660
TESTED NEGATIVE
age                  0.000  0.750    0.750
close contact        0.000  0.133    0.133
gender male          0.479  0.660    0.706
gender female        0.214  0.649    0.766
how long             0.000  0.400    0.800
name                 0.519  0.646    0.675
relation             0.449  0.720    0.784
when                 0.000  0.471    0.471
where                0.372  0.578    0.651
CAN NOT TEST
relation             0.516  0.608    0.771
symptoms             0.517  0.704    0.757
name                 0.382  0.545    0.550
when                 0.000  0.000    0.000
where                0.509  0.500    0.638
DEATH
age                  0.727  0.722    0.789
name                 0.642  0.715    0.774
relation             0.378  0.646    0.680
symptoms             0.000  0.000    0.444
when                 0.633  0.605    0.690
where                0.483  0.613    0.628
CURE AND PREVENTION
opinion              0.520  0.573    0.627
what cure            0.583  0.671    0.671
who cure             0.389  0.515    0.545
micro avg. F1        0.576  0.647

Table 2: Ablation model comparison on test data.

Model      Micro F1
JOELIN-P   0.488
JOELIN

References
Heike Adel and Hinrich Schütze. 2019. Type-aware convolutional neural networks for slot filling. Journal of Artificial Intelligence Research, 66:297–339.

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, and Gerardo Chowell. 2020. A large-scale COVID-19 Twitter chatter dataset for open scientific research – an international collaboration. arXiv preprint arXiv:2004.03688.

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. COVID-19: The first public coronavirus Twitter dataset. arXiv preprint arXiv:2003.07372.

Sina Dabiri and Kevin Heaslip. 2019. Developing a Twitter-based traffic event detection model using deep learning architectures. Expert Systems with Applications, 118:425–439.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Amosse Edouard, Elena Cabrio, Sara Tonelli, and Nhan Le Thanh. 2017. Graph-based event extraction from Twitter.

Tim Finin, William Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating named entities in Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 80–88.

Wei-Ning Hsu, Yu Zhang, and James Glass. 2017. Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. In , pages 16–23. IEEE.

Kia Jahanbin and Vahid Rahmanian. 2020. Using Twitter and web news mining to predict COVID-19 outbreak. Asian Pacific Journal of Tropical Medicine, 13.

Maria Renee Jimenez-Sotomayor, Carolina Gomez-Moreno, and Enrique Soto-Perez-de Celis. 2020. Coronavirus, ageism, and Twitter: An evaluation of tweets about older adults and COVID-19. Journal of the American Geriatrics Society.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Christian E Lopez, Malolan Vasu, and Caleb Gallemore. 2020. Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset. arXiv preprint arXiv:2003.10359.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.

Andrew J McMinn and Joemon M Jose. 2015. Real-time entity-based event detection for Twitter. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 65–77. Springer.

Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503.

Ruchi Parikh and Kamalakar Karlapalem. 2013. ET: Events from tweets. In Proceedings of the 22nd International Conference on World Wide Web, pages 613–620.

Shiva Pentyala, Mengwen Liu, and Markus Dreyer. 2019. Multi-task networks with universe, group, and task feature learning. arXiv preprint arXiv:1907.01791.

Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. 2018. Crowdsourcing a large corpus of clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1498–1507.

Alan Ritter, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from Twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1104–1112.

Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 2010. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860.

Lisa Singh, Shweta Bansal, Leticia Bode, Ceren Budak, Guangqing Chi, Kornraphop Kawintiranon, Colton Padden, Rebecca Vanarsdall, Emily Vraga, and Yanchen Wang. 2020. A first look at COVID-19 information and misinformation sharing on Twitter. arXiv preprint arXiv:2003.13907.

Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. 2019. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848.

Bishan Yang, Ndapandula Nakashole, Bryan Kisiel, Emmanouil A Platanios, Abulhair Saparov, Shashank Srivastava, Derry Wijaya, and Tom M Mitchell. 2016. CMUML micro-reader system for KBP 2016 cold start slot filling, event nugget detection, and event argument linking. In TAC.

Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann. 2013. Who, where, when and what: Discover spatio-temporal topics for Twitter users. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 605–613.

Yu Zhang and Qiang Yang. 2017. A survey on multi-task learning. arXiv preprint arXiv:1707.08114.

Deyu Zhou, Liangyu Chen, and Yulan He. 2015. An unsupervised framework of exploring events on Twitter: Filtering, extraction and categorization. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Shi Zong, Ashutosh Baheti, Wei Xu, and Alan Ritter. 2020. Extracting COVID-19 events from Twitter. arXiv preprint arXiv:2006.02567.