A Hybrid BERT and LightGBM based Model for Predicting Emotion GIF Categories on Twitter
Ye Bi
Ping An Technology (Shenzhen), Shanghai, China
[email protected]
Shuo Wang
Ping An Technology (Shenzhen), Shanghai, China
[email protected]
Zhongrui Fan
Ping An Technology (Shenzhen), Shanghai, China
[email protected]
Abstract
Animated Graphical Interchange Format (GIF) images have been widely used on social media as an intuitive way of expressing emotion. Given their expressiveness, GIFs offer a more nuanced and precise way to convey emotions. In this paper, we present our solution for the EmotionGIF 2020 challenge, the shared task of SocialNLP 2020. To recommend GIF categories for unlabeled tweets, we regarded this problem as a matching task and proposed a learning-to-rank framework based on Bidirectional Encoder Representations from Transformers (BERT) and LightGBM. Our team won 4th place with a Mean Average Precision @ 6 (MAP@6) score of 0.5394 on the round 1 leaderboard.
1 Introduction

Animated GIFs are widely used in online social media nowadays. They provide a way to convey emotions more accurately and conveniently, but algorithms that operate on GIFs are still insufficient. Fortunately, EmotionGIF 2020 (Shmueli and Ku, 2020) gives us a chance to glimpse the relation between animated GIF prediction and emotion detection. The EmotionGIF task is defined as follows: given unlabeled tweets, predict the category of a GIF response. The evaluation metric is Mean Average Precision at 6 (MAP@6).

Unlike traditional emotion detection, a single animated GIF may contain various emotions, and may convey different sets of emotions under different circumstances, making it harder to predict. After examining several classical Natural Language Processing (NLP) models, such as variations of the Transformer (Vaswani et al., 2017) and BERT (Devlin et al., 2018), we found that those models are unsuitable to apply directly to the challenge task. Considering the multi-label nature of the GIFs and the metric, we improve on BERT and propose our own solution: a cascade pairwise classifier that uses BERT and LightGBM together.
Field        Value
idx          ...
text         Fell right under my trap
reply        Ouch!
mp4          fe6e...ff82.mp4
categories   ...

Table 1: A data sample.
The rest of the paper is organized as follows: Section 2 describes the dataset and preprocessing techniques. Our solution is introduced in Section 3. We present the experiments and results of our model in Section 4. Finally, we conclude our analysis of the challenge.
2 Dataset

The organizer builds a dataset from 40,000 two-turn Twitter threads. The dataset is split into three parts: 32,000 samples with gold labels for training, 4,000 unlabeled samples for practice, and 4,000 unlabeled samples for final evaluation. Each sample includes the text of the original tweet and the reply text. The training samples also contain the MP4-format file and the category of the GIF response, which is selected from a list of 43 categories. Any GIF response can belong to multiple categories, and the label in the dataset is a non-empty subset of 1 to 6 categories. The fields of a sample are shown in Table 1.
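Since every label is a set of 1 to 6 categories and a submission is a ranked list of 6 predictions, MAP@6 rewards placing a true category early in the list. Below is a minimal sketch of the metric, assuming the standard average-precision-at-k definition (the official scorer may differ in details; category names are illustrative):

```python
def ap_at_6(true_cats, ranked_preds, k=6):
    """Average precision at k for one tweet with a set of gold categories."""
    hits, score = 0, 0.0
    for rank, cat in enumerate(ranked_preds[:k], start=1):
        if cat in true_cats:
            hits += 1
            score += hits / rank            # precision at this cut-off
    return score / min(len(true_cats), k)   # normalize by best achievable hits

def map_at_6(all_true, all_preds, k=6):
    """Mean of the per-sample average precision over the evaluation set."""
    return sum(ap_at_6(t, p, k) for t, p in zip(all_true, all_preds)) / len(all_true)

# a gold category ranked second out of six yields AP@6 = 0.5
print(map_at_6([{"ouch"}], [["wow", "ouch", "hug", "eye-roll", "yes", "no"]]))
```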
Text preprocessing is a crucial step in many NLP tasks, and there is prior research (Magliani et al., 2016) and tooling for preprocessing Twitter text. We utilized NLTK's TweetTokenizer (Bird, 2006) and performed some basic preprocessing steps on the tweet text (a minimal sketch follows the list):

• convert all tweet text into lower case;
• convert emoji symbols to their corresponding meanings using the emoji Python package (https://pypi.org/project/emoji/);
• remove excessive spaces;
• replace user names, numbers, and websites with special tokens.
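A minimal sketch of these steps, using the emoji package's demojize function; the <user>, <url>, and <number> placeholders are illustrative, not necessarily the exact special tokens used:

```python
import re

import emoji
from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer()

def preprocess(text):
    text = text.lower()                        # lower-case the tweet text
    text = emoji.demojize(text)                # emoji -> ":face_with_tears_of_joy:"
    text = re.sub(r"\s+", " ", text).strip()   # remove excessive spaces
    tokens = []
    for tok in tokenizer.tokenize(text):
        if tok.startswith("@"):                # user names -> special token
            tokens.append("<user>")
        elif tok.startswith(("http", "www")):  # websites -> special token
            tokens.append("<url>")
        elif tok.isdigit():                    # numbers -> special token
            tokens.append("<number>")
        else:
            tokens.append(tok)
    return " ".join(tokens)

print(preprocess("@john Fell right  under my trap 😂 42 https://t.co/xyz"))
```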
3 Method

In this section, we first introduce the BERT model with the pointwise and pairwise approaches, respectively, and then describe the LightGBM model in detail. The pipeline of the proposed method is shown in Figure 1.

3.1 BERT

The BERT model has achieved great success in many NLP tasks such as sentiment classification, natural language inference, and machine reading comprehension, so it is a natural choice to employ BERT in our framework. We fine-tune the BERT model in two different ways: a pointwise scheme and a pairwise scheme.
3.1.1 Pointwise Approach

We first trained the BERT model in the pointwise way, which means that we treated the task as a binary classification problem. We concatenated the preprocessed tweet text and reply with the [SEP] token and fed them into the BERT model as a text pair. The pooled output of the BERT model is treated as the representation of the tweet content. The 43 GIF categories are embedded into a high-dimensional space, and a score function is defined to measure how well the tweet content and a GIF category match. Since the given labeled samples are all positive samples, we need to sample negative examples during the training process.

In the pointwise approach, the score for each category is independent of the other categories in the result list for the query; that is, the pointwise method cannot take into account the internal dependencies between the categories corresponding to the same query. This led us to consider a pairwise model.

3.1.2 Pairwise Approach

Pairwise approaches look at a pair of categories at a time in the loss function and try to learn the optimal ordering for that pair. To be specific, each training sample consists of a tweet text query and two GIF categories: the true matched category, and a negative category sampled randomly from the remaining categories. The pairwise model learns to compare the differences between the query and each category and to pick the true one. We used the margin ranking loss as our loss function and trained on triplets sharing the same tweet text but differing in GIF category. This achieved a higher score than the BERT model in pointwise mode.
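A condensed sketch of the pairwise scheme in PyTorch is shown below. The dot product between a projected pooled output and a learned category embedding stands in for the score function, which is not spelled out in the text; model names, category indices, and dimensions are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class Matcher(nn.Module):
    """Scores how well a (tweet, reply) pair matches a GIF category."""
    def __init__(self, num_categories=43, cat_dim=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.categories = nn.Embedding(num_categories, cat_dim)
        self.proj = nn.Linear(self.bert.config.hidden_size, cat_dim)

    def forward(self, enc, cat_ids):
        pooled = self.bert(**enc).pooler_output          # tweet representation
        return (self.proj(pooled) * self.categories(cat_ids)).sum(-1)  # dot-product score

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = Matcher()
loss_fn = nn.MarginRankingLoss(margin=1.0)

# one triplet: same query, gold category vs. a randomly sampled negative
enc = tok("fell right under my trap", "ouch!", return_tensors="pt")
pos = model(enc, torch.tensor([3]))   # index of the gold category (illustrative)
neg = model(enc, torch.tensor([17]))  # index of a sampled negative category
loss = loss_fn(pos, neg, torch.ones_like(pos))  # rank the gold category higher
loss.backward()
```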
3.2 LightGBM

We also train the LightGBM classifier in the pairwise way. Its input contains multiple features, including the embedding returned by the pointwise/pairwise BERT and other manual features, and its output is the score for the given label. Because of the pairwise training, the manual features we create are mainly similarity scores between the sentences (text, reply) and the label. Different NLP algorithms are used to generate similarity vectors, for instance TF-IDF transformation, Word2Vec pretraining, and FastText pretraining. To capture different linguistic granularities, we build the corpus in two ways: for word-level granularity, we separate all words in all sentences after removing the emoji; for sentence-level granularity, we directly use the sentence after removing the emoji. We choose three measures to calculate the similarity scores: Euclidean distance, Manhattan distance, and cosine distance. For labels, the similarity vectors cannot be computed directly, so we aggregate the vectors belonging to a specified label with mean pooling and use the result as the vector of that label. The same method is applied to the embeddings returned by the pointwise/pairwise BERT. Besides, we also create some statistical features such as the number of emoji, the weights of keywords, etc. A sketch of the similarity features is given below.
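This is a minimal sketch of the TF-IDF similarity features with mean pooling over each label's sentences; the toy corpus and variable names are illustrative, and Word2Vec or FastText vectors would be handled the same way:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import (cosine_similarity,
                                      euclidean_distances,
                                      manhattan_distances)

# toy stand-ins for the preprocessed corpus and per-label sentence groups
corpus = ["fell right under my trap", "ouch", "that is hilarious", "so sad"]
label_sentences = {"ouch": ["ouch", "so sad"], "haha": ["that is hilarious"]}

vectorizer = TfidfVectorizer().fit(corpus)

# mean-pool the sentence vectors that belong to each label
label_vecs = {lab: np.asarray(vectorizer.transform(sents).mean(axis=0))
              for lab, sents in label_sentences.items()}

def similarity_features(sentence, label):
    s = vectorizer.transform([sentence]).toarray()
    l = label_vecs[label]
    return [cosine_similarity(s, l)[0, 0],      # cosine similarity feature
            euclidean_distances(s, l)[0, 0],    # Euclidean distance feature
            manhattan_distances(s, l)[0, 0]]    # Manhattan distance feature

print(similarity_features("ouch that hurt", "ouch"))
```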
4 Experiments

In our experiments, the 32,000 training samples are split into a training set and a validation set at a ratio of 9:1. In the training phase, each positive sample is paired with 4 negative samples, as sketched below.
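A minimal sketch of this 1:4 negative sampling, assuming a list of the 43 category names (the names shown are illustrative):

```python
import random

def sample_negatives(gold_cats, all_cats, k=4):
    """Draw k categories that are not among a sample's gold labels."""
    pool = [c for c in all_cats if c not in gold_cats]
    return random.sample(pool, k)

# pair one positive sample with 4 negatives
print(sample_negatives({"ouch"}, ["ouch", "haha", "wow", "sad", "hug", "yes"]))
```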
[Figure 1 shows the pipeline: the tweet text and reply feed both BERT, with pointwise and pairwise learning, and a feature engineering module producing statistical/semantic and similarity-related features; their outputs are combined by LightGBM.]
Figure 1: The Model Overview.
Model               Offline    Online
BERT (pointwise)    0.5172     -
BERT (pairwise)     0.5354     0.5209
LightGBM            0.5645     0.5394

Table 2: MAP@6 results of different settings.
For the BERT-based model, the embedding dimension of the GIF category is set to 128. The code is implemented in the PyTorch framework and trained with the Adam optimizer. The initial learning rate is 0.00003 and decays every 10 epochs at a rate of 0.1. Training the BERT-based model takes about 4 hours on 4 NVIDIA Tesla V100 GPUs for 30 epochs with a minibatch size of 128. For the LightGBM-based model, we use hyperopt (https://github.com/hyperopt/hyperopt) for hyperparameter optimization, as sketched below.
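A minimal sketch of the hyperopt search, assuming a binary match/no-match objective on an illustrative feature matrix; the real search space and features differ:

```python
import numpy as np
import lightgbm as lgb
from hyperopt import STATUS_OK, fmin, hp, tpe
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# illustrative stand-in for the real (tweet, category) feature matrix
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1)

space = {
    "num_leaves": hp.quniform("num_leaves", 31, 255, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.2)),
    "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
}

def objective(params):
    # train one LightGBM model with the sampled hyperparameters
    model = lgb.LGBMClassifier(num_leaves=int(params["num_leaves"]),
                               learning_rate=params["learning_rate"],
                               min_child_samples=int(params["min_child_samples"]))
    model.fit(X_tr, y_tr)
    return {"loss": log_loss(y_val, model.predict_proba(X_val)), "status": STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)
```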
Here we compare the performance of our method under different settings. The results on the validation set are shown in Table 2. From the table, we can see that the pairwise training method outperforms the pointwise one. At the same time, the LightGBM model built on detailed feature engineering is very effective.

5 Conclusion

In this paper, we proposed a method based on BERT and LightGBM under a pairwise learning framework for the EmotionGIF 2020 challenge. Extensive experiments were conducted on the challenge dataset, and the results demonstrate the effectiveness of our method.
References
Steven Bird. 2006. NLTK: The Natural Language Toolkit. In ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Federico Magliani, Tomaso Fontanini, Paolo Fornacciari, Stefano Manicardi, and Eleonora Iotti. 2016. A comparison between preprocessing techniques for sentiment analysis in Twitter. In KDWeb 2016.

Boaz Shmueli and Lun-Wei Ku. 2020. SocialNLP EmotionGIF 2020 challenge overview: Predicting reaction GIF categories on social media. Technical report.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30.