HateMonitors: Language Agnostic Abuse Detection in Social Media
Punyajoy Saha, Binny Mathew, Pawan Goyal, and Animesh Mukherjee
Indian Institute of Technology, Kharagpur, West Bengal, India - 721302
Abstract.
Reducing hateful and offensive content in online social media poses a dual problem for moderators. On the one hand, rigid censorship cannot be imposed on social media; on the other, the free flow of such content cannot be allowed. Hence, we require efficient abusive language detection systems to detect such harmful content in social media. In this paper, we present our machine learning model, HateMonitor, developed for Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) [19], a shared task at FIRE 2019. We use a Gradient Boosting model, along with BERT and LASER embeddings, to make the system language agnostic. Our model achieved the first position in the German sub-task A. We have also made our model public (https://github.com/punyajoy/HateMonitors-HASOC).

Keywords: Hate speech · Offensive language · Multilingual · LASER embeddings · BERT embeddings · Classification.
1 Introduction

In social media, abusive language denotes a text which contains any form of unacceptable language in a post or a comment. Abusive language can be divided into hate speech, offensive language, and profanity. Hate speech is a derogatory comment that hurts an entire group in terms of ethnicity, race, or gender. Offensive language is similar to a derogatory comment, but is targeted towards an individual. Profanity refers to any use of unacceptable language without a specific target. While profanity is the least threatening, hate speech has the most detrimental effect on society.

Social media moderators are having a hard time combating the rampant spread of hate speech, as it is closely related to the other forms of abusive language. The evolution of new slang and multilingualism further add to the complexity. Recently, there has been a sharp rise in hate speech related incidents in India, the lynchings being a clear indication [3]. Arun et al. [3] suggest that hate speech in India is very complicated, as people are not directly spreading hate but are spreading misinformation against particular communities. Hence, it has become imperative to study hate speech in Indian languages.

For the first time, a shared task on abusive content detection has been released for the Hindi language at HASOC 2019. This will fuel hate speech and offensive language research for Indian languages. The inclusion of datasets for the English and German languages will allow a performance comparison for the detection of abusive content in high and low resource languages.

In this paper, we focus on the detection of multilingual hate speech written in Hindi, English, and German, and describe our submission (HateMonitors) for the HASOC at FIRE 2019 competition (https://github.com/punyajoy/HateMonitors-HASOC, https://tinyurl.com/y6tgv865). Our system concatenates two types of sentence embeddings to represent each tweet and uses machine learning models for classification.
2 Related Work

Analyzing abusive language in social media is a daunting task. Waseem et al. [33] categorize abusive language into two sub-classes: hate speech and offensive language. Classifying abusive language into these two subtypes is challenging due to the correlation between offensive language and hate speech [10]. Nobata et al. [22] use predefined language elements and embeddings to train a regression model. With the introduction of better classification models [23,29] and newer features [1,10,30], research in hate and offensive speech detection has gained momentum.

Silva et al. [28] performed a large scale study to understand the targets of such hate speech on two social media platforms: Twitter and Whisper. These targets could be refugees and immigrants [25], Jews [7,14], and Muslims [4,32]. People could also become the target of hate speech based on nationality [12], sex [5,26], and gender [24,16]. Public expressions of hate speech affect the devaluation of minority members [17] and the exclusion of minorities from society [21], and tend to diffuse through the network at a faster rate [20].

One of the key issues with the current state of hate and offensive language research is that the majority of it is dedicated to the English language only [15]. A few researchers have tried to solve the problem of abusive language in other languages [25,27], but the works are mostly monolingual. Any online social media platform contains people of different ethnicities, which results in the spread of information in multiple languages. Hence, a robust classifier is needed which can deal with abusive language in the multilingual domain. Several shared tasks like HASOC [19], HaSpeeDe [8], GermEval [34], AMI [13], and HatEval [6] have recently focused on the detection of abusive text in multiple languages.
3 Dataset and Task Description

The dataset at HASOC 2019 (https://hasoc2019.github.io/) was provided in three languages: Hindi, English, and German. The Hindi and English datasets had three subtasks each, while German had only two. We participated in all the tasks provided by the organisers and decided to develop a single model that would be language agnostic; we used the same model architecture for all three languages.

We present the statistics for the HASOC dataset in Table 1. From the table, we can observe that the dataset for the German language is highly unbalanced, while English and Hindi are more or less balanced for sub-task A. For sub-task B, the German dataset is balanced but the others are unbalanced. For sub-task C, both datasets are highly unbalanced.
Table 1. Statistics about the training and test data.

Sub-Task A    English         German          Hindi
              Train   Test    Train   Test    Train   Test
HOF           2261    288     407     136     2469    605
NOT           3591    865     3142    714     2196    713
Total         5852    1153    3819    850     4665    1318

Sub-Task B    English         German          Hindi
              Train   Test    Train   Test    Train   Test
HATE          1141    124     111     41      556     190
OFFN          451     71      210     77      676     197
PRFN          667     93      86      18      1237    218
Total         2261    288     407     136     2469    605

Sub-Task C    English         German          Hindi
              Train   Test    Train   Test    Train   Test
TIN           2041    245     -       -       1545    542
UNT           220     43      -       -       924     63
Total         2261    288     -       -       2469    605

Sub-task A consists of building a binary classification model which can predict if a given piece of text is hateful and offensive (HOF) or not (NOT). A data point is annotated as HOF if it contains any form of non-acceptable language such as hate speech, aggression, or profanity. Each of the three languages had this subtask.
Sub-task B consists of building a multi-class classification model which can predict the three different classes among the data points annotated as HOF: Hate speech (HATE), Offensive language (OFFN), and Profane (PRFN). Again, all three languages have this sub-task.
Sub-task C consists of building a binary classification model which can predictthe type of offense: Targeted (TIN) and Untargeted (UNT). Sub-task C was notconducted for the German dataset.
4 System Description

In this section, we explain the details of our system, which comprises two parts: feature generation and model selection. Figure 1 shows the architecture of our system.
Preprocessing: We preprocess the tweets before performing feature extraction. The following steps were followed:

– We remove all the URLs.
– We convert the text to lowercase. This step was not applied to the Hindi language, since the Devanagari script does not have lowercase and uppercase characters.
– We did not normalize the mentions in the text, as they could potentially reveal important information to the embedding encoders.
– Any numerical figure was normalized to the string ‘number’.

We did not remove any punctuation or stop-words, since the context of the sentence might get lost in such a process. As we are using sentence embeddings, it is essential to keep the context of the sentence intact.
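The steps above can be sketched in a few lines of Python. This is a minimal illustration, not our exact implementation; the `lang` argument is an illustrative addition used here to skip lowercasing for Hindi.

```python
import re

def preprocess(text: str, lang: str) -> str:
    """Apply the preprocessing steps listed above.

    `lang` (e.g. 'en', 'de', 'hi') is a hypothetical parameter used to
    skip lowercasing for Hindi, whose Devanagari script is caseless.
    """
    # Remove all URLs.
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Lowercase everything except Devanagari (Hindi) text.
    if lang != "hi":
        text = text.lower()
    # Normalize any numerical figure to the string 'number'.
    text = re.sub(r"\d+", "number", text)
    # Mentions, punctuation, and stop-words are deliberately kept intact.
    return text.strip()

print(preprocess("Check https://t.co/abc I scored 42 points @user!", "en"))
```

Note that mentions such as `@user` survive preprocessing, matching the third step above.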
Feature vectors:
The preprocessed posts are then used to generate features for the classifier. For our model, we decided to generate two types of feature vectors: BERT embeddings and LASER embeddings. For each post, we generate the BERT and LASER embeddings, which are then concatenated and fed as input to the final classifier.
Multilingual BERT embeddings:
Bidirectional Encoder Representations from Transformers (BERT) [11] has played a key role in the advancement of the natural language processing (NLP) domain. BERT is a language model which is trained to predict the masked words in a sentence. We use BERT-base-multilingual-cased, which covers 104 languages and has 12 layers, 768 hidden units, 12 attention heads, and 110M parameters. To generate the sentence embedding for a post, we take the mean of the last 11 layers (out of 12) to get a sentence vector of length 768.

LASER embeddings: Researchers at Facebook released language agnostic sentence embedding representations (LASER) [2], where the model is jointly learned on 93 languages. The model takes a sentence as input and produces a vector representation of length 1024. The model is able to handle code mixing as well [31].
Fig. 1. Architecture of our system: sentences pass through the preprocessing module, LASER and multilingual BERT each produce a sentence embedding, and the concatenated embeddings are fed to the LGBM classifier to predict the labels.
We pass the preprocessed sentences through each of these embedding models and obtain two separate sentence representations. Further, we concatenate the embeddings into one single feature vector of length 1792, which is then passed to the final classification model.
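The pooling and concatenation step can be sketched as follows. The tensors here are random stand-ins with the shapes the real encoders would produce; in practice the inputs would be multilingual BERT's hidden states and the LASER sentence vector.

```python
import numpy as np

def sentence_vector(bert_layers: np.ndarray, laser_vec: np.ndarray) -> np.ndarray:
    """Pool and concatenate the two sentence representations.

    bert_layers: (12, num_tokens, 768) hidden states from multilingual BERT.
    laser_vec:   (1024,) LASER sentence embedding.
    """
    # Mean over the last 11 of the 12 layers, then mean over tokens,
    # yielding a single 768-dimensional BERT sentence vector.
    bert_vec = bert_layers[-11:].mean(axis=0).mean(axis=0)
    # Concatenate BERT (768) and LASER (1024) into a 1792-dim feature.
    return np.concatenate([bert_vec, laser_vec])

# Random stand-ins with the shapes the real encoders would produce.
rng = np.random.default_rng(0)
features = sentence_vector(rng.normal(size=(12, 9, 768)),
                           rng.normal(size=1024))
print(features.shape)  # (1792,)
```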
Model: The amount of data in each category was insufficient to train a deep learning model; building such deep models would lead to overfitting. So, we resorted to simpler models such as SVM and gradient boosted trees. Gradient boosted trees [9] are often the choice for systems where features are pre-extracted from the raw data (https://tinyurl.com/yxmuwzla). In the category of gradient boosted trees, the Light Gradient Boosting Machine (LGBM) [18] is considered one of the most efficient in terms of memory footprint. Moreover, it has been part of the winning solutions of many competitions (https://tinyurl.com/y2g8nuuo). Hence, we used LGBM as the model for the downstream tasks in this competition.

5 Results

The performance of our models across different languages for sub-task A is shown in Table 2. Our model got the first position in the German sub-task with a macro F1 score of 0.62. The results of sub-task B and sub-task C are shown in Tables 3 and 4, respectively.
Table 2.
Language-wise results of sub-task A (macro F1 values).

Language    English   German    Hindi
HOF         0.59      0.36      0.76
NOT         0.79      0.87      0.79
Total       0.69      0.62      0.78
Table 3.
Language-wise results of sub-task B (macro F1 values).

Language    English   German    Hindi
HATE        0.28      0.04      0.29
OFFN        0.00      0.00      0.29
PRFN        0.31      0.19      0.59
NONE        0.79      0.87      0.79
Total       0.34      0.28      0.49
Table 4.
Language-wise results of sub-task C (macro F1 values).

Language    English   Hindi
TIN         0.51      0.63
UNT         0.11      0.17
NONE        0.79      0.79
Total       0.47      0.53
In the results of sub-task A, the models are mainly affected by the imbalance of the dataset. The training dataset for Hindi was more balanced than the English or German datasets; hence, its results were around 0.78. As the dataset for the German language was highly imbalanced, the result drops to 0.62. In sub-task B, the highest F1 score for each language was reached by the profane class, as shown in Table 3. The model got confused between the OFFN, HATE, and PRFN labels, which suggests that these models are not able to capture the context in the sentence. Sub-task C was again a case of an imbalanced dataset, as the targeted (TIN) label gets the highest F1 score in Table 4.
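Macro F1, the metric used throughout these tables, averages the per-class F1 scores so that every class counts equally; a poorly-predicted minority class therefore drags the overall score down, which is why imbalance hurts. A minimal sketch with scikit-learn, using toy labels rather than the competition data:

```python
from sklearn.metrics import f1_score

# Toy binary HOF (1) / NOT (0) labels: the minority class is predicted
# poorly, and its F1 counts as much as the majority class F1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]

per_class = f1_score(y_true, y_pred, average=None)  # per-label F1 scores
macro = f1_score(y_true, y_pred, average="macro")   # unweighted mean
print(per_class, macro)
```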
6 Conclusion

In this shared task, we experimented with zero-shot transfer learning for abusive text detection with pre-trained BERT and LASER sentence embeddings. We use an LGBM model trained on these embeddings to perform the downstream tasks. Our model for the German language got the first position. The results provide a strong baseline for further research in multilingual hate speech. We have also made the models public for use by other researchers (https://github.com/punyajoy/HateMonitors-HASOC).

References
1. Alorainy, W., Burnap, P., Liu, H., Williams, M.: The enemy among us: Detecting hate speech with threats based 'othering' language embeddings. arXiv preprint arXiv:1801.07495 (2018)
2. Artetxe, M., Schwenk, H.: Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. CoRR abs/1812.10464 (2018), http://arxiv.org/abs/1812.10464
3. Arun, C.: On WhatsApp, rumours, and lynchings. Economic & Political Weekly (6), 30–35 (2019)
4. Awan, I.: Islamophobia on social media: A qualitative analysis of the Facebook's walls of hate. International Journal of Cyber Criminology (1) (2016)
5. Bartlett, J., Norrie, R., Patel, S., Rumpel, R., Wibberley, S.: Misogyny on Twitter. Demos (2014)
6. Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F.M.R., Rosso, P., Sanguinetti, M.: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 54–63 (2019)
7. Bilewicz, M., Winiewski, M., Kofta, M., Wójcik, A.: Harmful ideas, the structure and consequences of anti-semitic beliefs in Poland. Political Psychology (6), 821–839 (2013)
8. Bosco, C., Felice, D., Poletto, F., Sanguinetti, M., Maurizio, T.: Overview of the EVALITA 2018 hate speech detection task. In: EVALITA 2018 – Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. vol. 2263, pp. 1–9. CEUR (2018)
9. Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. CoRR abs/1603.02754 (2016), http://arxiv.org/abs/1603.02754
10. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009 (2017)
11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), http://arxiv.org/abs/1810.04805
12. Erjavec, K., Kovačič, M.P.: "You don't understand, this is a new war!" Analysis of hate speech in news web sites' comments. Mass Communication and Society (6), 899–920 (2012)
13. Fersini, E., Nozza, D., Rosso, P.: Overview of the EVALITA 2018 task on automatic misogyny identification (AMI). In: EVALITA@CLiC-it (2018)
14. Finkelstein, J., Zannettou, S., Bradlyn, B., Blackburn, J.: A quantitative approach to understanding online antisemitism. arXiv preprint arXiv:1809.01644 (2018)
15. Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) (4), 85 (2018)
16. Gatehouse, C., Wood, M., Briggs, J., Pickles, J., Lawson, S.: Troubling vulnerability: Designing with LGBT young people's ambivalence towards hate crime reporting. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. p. 109. ACM (2018)
17. Greenberg, J., Pyszczynski, T.: The effect of an overheard ethnic slur on evaluations of the target: How to spread a social disease. Journal of Experimental Social Psychology (1), 61–72 (1985)
18. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y.: LightGBM: A highly efficient gradient boosting decision tree. In: NIPS (2017)
19. Mandl, T., Modha, S., Patel, D., Dave, M., Mandlia, C., Patel, A.: Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages. In: Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation (December 2019)
20. Mathew, B., Dutt, R., Goyal, P., Mukherjee, A.: Spread of hate speech in online social media. In: Proceedings of the 10th ACM Conference on Web Science. pp. 173–182. ACM (2019)
21. Mullen, B., Rice, D.R.: Ethnophaulisms and exclusion: The behavioral consequences of cognitive representation of ethnic immigrant groups. Personality and Social Psychology Bulletin (8), 1056–1067 (2003)
22. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proceedings of the 25th International Conference on World Wide Web. pp. 145–153. International World Wide Web Conferences Steering Committee (2016)
23. Qian, J., ElSherief, M., Belding, E., Wang, W.Y.: Hierarchical CVAE for fine-grained hate speech classification. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 3550–3559 (2018)
24. Reddy, V.: Perverts and sodomites: Homophobia as hate speech in Africa. Southern African Linguistics and Applied Language Studies (3), 163–175 (2002)
25. Ross, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., Wojatzki, M.: Measuring the reliability of hate speech annotations: The case of the European refugee crisis. arXiv preprint arXiv:1701.08118 (2017)
26. Saha, P., Mathew, B., Goyal, P., Mukherjee, A.: Hateminers: Detecting hate speech against women. arXiv preprint arXiv:1812.06700 (2018)
27. Sanguinetti, M., Poletto, F., Bosco, C., Patti, V., Stranisci, M.: An Italian Twitter corpus of hate speech against immigrants. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018)
28. Silva, L.A., Mondal, M., Correa, D., Benevenuto, F., Weber, I.: Analyzing the targets of hate in online social media. In: ICWSM. pp. 687–690 (2016)
29. Stammbach, D., Zahraei, A., Stadnikova, P., Klakow, D.: Offensive language detection with neural networks for GermEval task 2018. In: 14th Conference on Natural Language Processing KONVENS 2018. p. 58 (2018)
30. Unsvåg, E.F., Gambäck, B.: The effects of user features on Twitter hate speech detection. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). pp. 75–85 (2018)
31. Verma, S.: Code-switching: Hindi-English. Lingua (2), 153–165 (1976). https://doi.org/10.1016/0024-3841(76)90077-2,