Hate Speech detection in the Bengali language: A dataset and its baseline evaluation
Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam
Shahjalal University of Science and Technology, Kumargaon, Sylhet 3114, Bangladesh
{naurosromim,tainahmed96}@gmail.com, {hriteshwar-eee,saiful-cse}@sust.edu

Abstract.
Social media sites such as YouTube and Facebook have become an integral part of everyone's life, and in the last few years hate speech in social media comment sections has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges, including small imbalanced datasets, the choice of an appropriate model, and the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali-speaking community due to the lack of gold-standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts. All comments were collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity, and TikTok & meme. A total of 50 annotators annotated the dataset, with each comment annotated three times, and the majority vote was taken as the final annotation. Furthermore, we conducted baseline experiments with several deep learning models along with extensive pretrained Bengali word embeddings such as Word2Vec, FastText, and BengFastText on this dataset to facilitate future research opportunities. The experiments illustrated that although all the deep learning models performed well, SVM achieved the best result with 87.5% accuracy. Our core contribution is to make this benchmark dataset available and accessible to facilitate further research in the field of Bengali hate speech detection.
Keywords: Natural Language Processing (NLP) · Bengali Text Classification · Bengali Sentiment Analysis · Bengali Social Media Hate Speech Detection.
1 Introduction

Social media has become an essential part of every person's day-to-day life. It enables fast communication and easy access to sharing and receiving ideas and views from around the world. However, at the same time, this freedom of expression has led to a continuous rise of hate speech and offensive language on social media.
Part of this problem has been created by the corporate social media model and the gap between documented community policy and the real-life implications of hate speech [4]. Moreover, hate speech language is very diverse [17]. The language used in social media is often very different from traditional print media and has distinctive linguistic features, which makes automatic hate speech detection very hard [20].

Even though much work has been done on hate speech prevention in the English language, there is a significant lack of resources for hate speech detection in Bengali social media. Nevertheless, problems like online abuse, and especially online abuse towards women, are continuously on the rise in Bangladesh [19]. For a low-resource language like Bengali, developing and deploying machine learning models to tackle real-life problems is very difficult, since there is a shortage of datasets and other tools for Bengali text classification [6]. So the need for research on the nature and prevention of social media hate speech has never been higher.

This paper illustrates our attempt to address this problem. Our dataset comprises 30,000 Bengali comments from YouTube and Facebook comment sections, of which 10,000 are hate speech. We selected comments from seven different categories: sports, entertainment, crime, religion, politics, celebrity, and TikTok & meme, making the dataset diverse. We ran several deep learning models along with word embedding models such as Word2Vec, FastText, and the pretrained BengFastText on our dataset to obtain benchmark results. Lastly, we analyzed our findings and explained the challenges of detecting hate speech.
2 Related Work

Much work has been done on detecting hate speech using deep learning models [7], [10]. There have also been efforts to increase the accuracy of hate speech prediction by extracting unique semantic features of hate speech comments [22]. Researchers have also utilized fastText to build models that can be trained on billions of words in less than ten minutes and classify millions of sentences among hundreds of classes [14]. Other research indicates how an annotator's bias and worldview affect the performance of a dataset [21]. State-of-the-art research on hate speech detection now leverages advanced architectures such as transfer learning. For example, in [9], researchers compared deep learning, transfer learning, and multitask learning architectures on an Arabic hate speech dataset. There is also research on identifying hate speech in a multilingual dataset using pretrained state-of-the-art models such as mBERT and XLM-RoBERTa [3].

Unfortunately, very few works have addressed hate speech detection in Bengali social media. The main challenge is the lack of sufficient data. To the best of our knowledge, many of the existing datasets contain around 5,000 samples [5], [8], [12]. There is a publicly available corpus of around 10,000 samples annotated into five different classes [2]. However, one limitation the authors faced was that they could only use sentences labeled as toxic in their experiments, since the other labels were low in number. There is also a dataset of 2,665 samples translated from an English hate speech dataset [1]. Another study used a rule-based stemmer to obtain linguistic features [8]. Studies are emerging that use deep learning models such as CNN and LSTM to obtain better results [2], [5], [8]. One of the biggest challenges is that Bengali is a low-resource language. Research has been done to create a word embedding specifically for such a low-resource language, called BengFastText, trained on a 250-million-word Bengali corpus [15]. However, one thing is clear: there is a lack of a dataset that is both large and diverse. This paper tackles that problem by presenting a large dataset with comments from seven different categories, making it diverse. To our knowledge, this is the first Bengali social media hate speech dataset that is this large and diverse.
3 Dataset Creation

3.1 Data Collection

Our primary goal was to create a dataset with different varieties of data. For this reason, we extracted comments from YouTube and Facebook in seven different categories: sports, entertainment, crime, religion, politics, celebrity, and TikTok & meme and miscellaneous.

We extracted comments from the public Facebook page of Dr. Muhammad Zafar Iqbal, a prominent science fiction author, physicist, academic, and activist of Bangladesh. These comments belong to the celebrity category. However, due to Facebook's restrictions on its Graph API, we had to focus on YouTube as the primary source of data.

On YouTube, we looked for the most scandalous and controversial topics in Bangladesh between 2017 and 2020. We reasoned that since these topics were controversial, videos were made more frequently, people participated more in the comment sections, and the comments might contain more hate speech. We searched YouTube for videos using keywords related to these controversial events. For example, we searched for the renowned singer and actor couple's Mithila-Tahsan divorce, i.e., the Mithila controversy of 2020. We then selected videos with at least 700k views and extracted comments from them. In this way, we extracted comments from videos on controversial topics covering five categories: sports, entertainment, crime, religion, and politics. Finally, we searched for videos under memes, TikTok, and other keywords whose comment sections might contain hate speech; this is our seventh category.

We extracted all the comments using the open-source software FacePager (https://github.com/strohne/Facepager). After extraction, we labeled each comment with a number defining which category it belongs to, and with an additional number defining its keyword. In this paper, a keyword means a controversial event that falls under a category; for example, mithila is a keyword that belongs to the entertainment category. We labeled every comment with its corresponding category and keyword for future research.

After extracting the comments, we manually checked the whole dataset and removed all comments made entirely of English or other non-Bengali languages, emoji, punctuation, or numerical values. However, we kept a comment if it primarily consisted of Bengali words with some emoji, numbers, or non-Bengali words mixed in. We deleted non-Bengali comments because our research focuses on Bengali hate speech, so non-Bengali comments are outside its scope. We kept impure Bengali comments because people rarely write pure Bengali on social media: emoji, numbers, punctuation, and English words are features of social media comments. Thus we believe our dataset can prove very rich for future research purposes. In the end, we collected a total of 30,000 comments: mostly Bengali sentences with some emoji, punctuation, numbers, and English letters mixed in.
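The cleaning rule above (drop comments with no Bengali content, keep mostly-Bengali comments even with stray emoji or English words) can be sketched as a simple character-ratio check. The 0.5 threshold and function name here are our own illustrative choices, not values from the paper:

```python
import re

# Bengali script occupies the Unicode block U+0980-U+09FF.
BENGALI_CHAR = re.compile(r"[\u0980-\u09FF]")

def is_mostly_bengali(comment: str, min_ratio: float = 0.5) -> bool:
    """Keep a comment only if at least `min_ratio` of its
    non-whitespace characters are Bengali letters or signs."""
    chars = [c for c in comment if not c.isspace()]
    if not chars:
        return False
    bengali = sum(1 for c in chars if BENGALI_CHAR.match(c))
    return bengali / len(chars) >= min_ratio
```

Under this rule a pure-English comment is dropped, while a Bengali comment with an emoji or an English word mixed in is kept.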
3.2 Annotation

Hate speech is a subjective matter, so it is quite difficult to define what makes a comment hate speech. We therefore established some rigid rules, based on the community standards of Facebook (https://web.facebook.com/communitystandards/) and YouTube. Below, we list the criteria with necessary examples:

– Hate speech is a sentence that dehumanizes one or multiple persons or a community. Dehumanizing can be done by comparing the person or community to an insect, an object, or a criminal. It can also be done by targeting a person based on their race, gender, or physical or mental disability.
– A sentence might contain slang or inappropriate language, but unless that slang dehumanizes a person or community, we did not consider it hate speech. For example: িক বােলর মুিভ — here the slang word is not used to dehumanize any person, so it is not hate speech.
– If a comment does not itself dehumanize a person but directly supports another idea that clearly dehumanizes a person or community, it is considered hate speech. For example: েবারকা পের না, ধিষর্ত েতা হেবই। — this sentence supports a dehumanizing act against women, so we labeled it as hate speech.
– If additional context is needed to understand that a comment is hate speech, we did not consider it to be one. For example, consider this sentence: গতর্ েযাদ্ধা — it is a comment taken from Dr. Muhammad Zafar Iqbal's Facebook page and refers to a particular jab his detractors constantly use to attack him on social media. But unless an annotator is told that this comment belongs to that particular Facebook page, they have no way of knowing it is actually hate speech. Such comments were labeled as not hate speech.
– It does not matter whether the stance a hateful comment takes is right or wrong, because what is right or wrong is subjective. If a sentence, without any outside context, dehumanizes a person or community, we considered it hate speech.

We worked with 50 annotators, all undergraduate students of Shahjalal University of Science and Technology, to annotate the entire dataset, and instructed them to follow the guidelines above. The annotators thus have an excellent understanding of popular social media trends and have seen how hate speech propagates on social media. Every comment was annotated three times, and we took the majority decision as the final annotation.

After annotation, we wanted to check the validity of the annotations. We randomly sampled 300 comments from every category and manually checked each comment's majority decision. Since we, the authors of this paper, set the guidelines for defining hate speech but did not participate in the annotation procedure, our check was a neutral evaluation of the annotators' performance. After this evaluation, we found that our dataset annotation is 91.05% correct.
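The three-annotations-with-majority-vote rule described above is straightforward to express in code; a minimal sketch:

```python
from collections import Counter

def majority_label(annotations):
    """Final label under the majority rule: with three annotators
    and two labels (hate / not hate), a strict majority always exists."""
    label, _ = Counter(annotations).most_common(1)[0]
    return label

# Two of three annotators chose "hate", so that is the final label.
print(majority_label(["hate", "not hate", "hate"]))  # -> hate
```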
4 Dataset Statistics

Our dataset has a total of 30,000 comments, of which 10,000 are hate speech, as evident from table 2. So it is clear that our dataset is heavily biased towards not hate speech. If we look closely at each category in figure 1, this becomes even more apparent: in all categories, not-hate-speech comments dominate. In celebrity and politics in particular, the number of hate speech comments is very low, even compared to the other categories. During data collection, we observed that there were many hateful comments in the celebrity section, i.e., on Dr. Muhammad Zafar Iqbal's Facebook page, but they required outside context. As discussed in section 3.2, we only considered texts without context when labeling hate speech, so many such comments were labeled as not hate speech. For the politics category, we observed that people tend not to attack any person or group directly; rather, they add their own take on the current political environment. So the number of direct attacks is lower in the politics category.
When we look at the mean text length in table 3, we find a couple of interesting observations. First, meme comments are very short. This makes sense: when a person comments on a meme video, they are likely expressing their state of mind, which requires shorter sentences. The opposite is true for the celebrity category, which has the longest average text length. This is largely because when people comment on Dr. Zafar Iqbal's Facebook page, they add a lot of their own opinion and analysis, whether or not the comment is hate speech. This shows how distinctive the comment section of an individual celebrity page can be. Lastly, we see that hate speech comments tend to be shorter than not-hate-speech comments on average.

In table 4, we compare all state-of-the-art datasets. The table includes the total size of each dataset and the number of classes into which it was annotated. As the table shows, some datasets have multiple classes. In this paper, we focused on the total size of the dataset and extracted comments from different categories so that we could ensure linguistic variation.
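The mean-length statistics discussed above are easy to reproduce from the labeled comments. The record layout (a `"text"` field plus a grouping field such as `"category"`) is our illustrative assumption:

```python
from statistics import mean

def mean_length_by(comments, key):
    """Average character length of comment texts, grouped by one
    field of each record (e.g. category or hate-speech label)."""
    groups = {}
    for c in comments:
        groups.setdefault(c[key], []).append(len(c["text"]))
    return {k: mean(v) for k, v in groups.items()}
```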
Table 1. Sample dataset (columns: Sentence, Hate speech, Category, Keyword)

– মিহলার িবচার চাই
– হ্লারেপা েতার কত বড় সাহস?
– আমরা পােশ আিছ বৰ্াদার।
– এই গাধার বাচ্চা, িক বিলস তু ই
– েহ আল্লাহ, তার সুস্থতা দা কর
Table 2. Hate speech comments per category

Hate speech   Not hate speech   Total
10,000        20,000            30,000
5 Baseline Evaluation

Our 30k dataset contains raw Bengali comments with emoji, punctuation, and English letters mixed in. For the baseline evaluation, we removed all emoji, punctuation, numerical values, non-Bengali letters, and symbols from every comment. After that, we created a train set of 24,000 comments and a test set of 6,000 comments. The dataset was then ready for evaluation.
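The preprocessing step (strip everything except Bengali script and whitespace, then hold out 6,000 of the 30,000 comments for testing) might look like the following; the shuffle seed is our own choice, not a value from the paper:

```python
import random
import re

# Anything outside the Bengali block (U+0980-U+09FF) or whitespace
# is removed: emoji, punctuation, digits, English letters, symbols.
NON_BENGALI = re.compile(r"[^\u0980-\u09FF\s]")

def clean(comment: str) -> str:
    """Keep only Bengali script and single spaces."""
    return re.sub(r"\s+", " ", NON_BENGALI.sub(" ", comment)).strip()

def train_test_split(comments, test_size=0.2, seed=42):
    """80/20 split: 24,000 train / 6,000 test on the full dataset."""
    comments = list(comments)
    random.Random(seed).shuffle(comments)
    cut = int(len(comments) * (1 - test_size))
    return comments[:cut], comments[cut:]
```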
Table 3. Mean text length of the dataset

Category                   Mean text length
Sports                     75
Entertainment              65.9
Crime                      87.4
Religion                   71.4
Politics                   72.5
Celebrity                  134.5
Meme, TikTok and others    56.2
Hate speech                69.59
Not hate speech            84.39
Fig. 1. Distribution diagram of data in each category
Table 4. A comparison of all state-of-the-art datasets on Bengali hate speech

Paper                                                                          Total data   Number of classes
Hateful speech detection in public facebook pages for the bengali language [12]   5,126      6
Toxicity Detection on Bengali Social Media Comments using Supervised Models [2]  10,219      5
A Deep Learning Approach to Detect Abusive Bengali Text [8]                       4,700      7
Threat and Abusive Language Detection on Social Media in Bengali Language [5]     5,644      7
Detecting Abusive Comments in Discussion Threads Using Naïve Bayes [1]            2,665      7
Hate Speech detection in the Bengali language: A dataset and its
baseline evaluation (this work)                                                  30,000      2
We used three word embedding models: Word2Vec [16], FastText [13], and BengFastText [18]. To create the Word2Vec model, we used the gensim module (https://radimrehurek.com/gensim/models/word2vec.html) to train on the 30k dataset, using the CBoW method. For FastText, we also used the 30k dataset to create the embedding, using the skip-gram method. The embedding dimension for both models was set to 300. Lastly, BengFastText is the largest pretrained Bengali word embedding based on FastText. Since BengFastText was not trained on any YouTube data, we wanted to see how it performs on YouTube comments.

We used a Support Vector Machine [11] for the baseline evaluation, with a linear kernel and all other parameters kept at their default values.
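The SVM baseline above specifies only the linear kernel with default parameters; a sketch of what such a baseline might look like with scikit-learn, where the TF-IDF featurisation is our assumption since the feature extraction step is not detailed in the text:

```python
# Baseline linear-kernel SVM over bag-of-words features.
# The TF-IDF step is an illustrative assumption, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_baseline():
    return make_pipeline(
        TfidfVectorizer(),      # token counts -> TF-IDF weights
        SVC(kernel="linear"),   # all other parameters at defaults
    )

model = build_baseline()
# model.fit(train_texts, train_labels)
# accuracy = model.score(test_texts, test_labels)
```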
Long Short Term Memory (LSTM). For our experiment, we used an LSTM layer with 100 units, set both the dropout and recurrent dropout rates to 0.2, and used 'adam' as the optimizer.
Bi-directional Long Short Term Memory (Bi-LSTM). In this case, we used a Bi-LSTM layer with 64 units, a dropout rate of 0.2, and 'adam' as the optimizer.
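The Bi-LSTM configuration just described might be assembled as follows in Keras; the vocabulary size is a placeholder of ours, and the exact layer arrangement (embedding, bidirectional recurrent layer, sigmoid output) is a plausible sketch rather than the paper's exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm(vocab_size=20000, embed_dim=300, units=64):
    """Sketch of the Bi-LSTM classifier: 300-d word embeddings,
    a 64-unit bidirectional LSTM with dropout 0.2, adam optimizer."""
    model = keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Bidirectional(layers.LSTM(units, dropout=0.2)),
        layers.Dense(1, activation="sigmoid"),  # hate / not hate
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```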
We kept 80% of the dataset as the train set and 20% as the test set, and trained every word embedding with every deep learning algorithm on the train set. In every case, we kept all parameters standard; the epoch count and batch size were set to 5 and 64, respectively. We then tested all the trained models on the test set and measured accuracy and F-1 score. Below are all the models we tested on our dataset:

– Baseline evaluation: Support Vector Machine (SVM)
– FastText embedding with LSTM
– FastText embedding with Bi-LSTM
– BengFastText embedding with LSTM
– BengFastText embedding with Bi-LSTM
– Word2Vec embedding with LSTM
– Word2Vec embedding with Bi-LSTM
Fig. 2. Deep learning architecture with Word2Vec and Bi-LSTM
6 Results

We can observe from table 5 that all the models achieved good accuracy. SVM achieved the overall best result, with an accuracy of 87.5% and an F-1 score of 0.911. BengFastText with LSTM and Bi-LSTM had relatively the worst accuracy and F-1 scores; their low F-1 scores indicate that the deep learning models with BengFastText embedding overfitted the most. BengFastText is not trained on any YouTube data [18], while our dataset has a huge number of YouTube comments; this might be a reason for its drop in performance.

We then compared the Word2Vec and FastText embeddings. FastText performed better in terms of accuracy and achieved a higher F-1 score than Word2Vec; that is, Word2Vec overfitted more than FastText. FastText has one distinct advantage over Word2Vec: it learns from the words of a corpus and their substrings. Thus FastText can tell that 'love' and 'beloved' are similar words [13]. This might be a reason why FastText outperformed Word2Vec.
Table 5. Result of all models

Model name                Accuracy   F-1 Score
SVM                       87.5       0.911
Word2Vec + LSTM           83.85      0.89
Word2Vec + Bi-LSTM        81.52      0.86
FastText + LSTM           84.3       0.89
FastText + Bi-LSTM        86.55      0.901
BengFastText + LSTM       81         0.859
BengFastText + Bi-LSTM    80.44      0.857
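Because the dataset is imbalanced two-to-one toward not hate speech, high accuracy can coexist with poor minority-class recall, which is why the F-1 score is reported alongside accuracy. The metric can be computed without any external library:

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-1 for one class, computed from
    lists of true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive == p for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```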
We manually cross-checked all labels of the test set against the predictions of the SVM model. We looked at the false negative and false positive cases to find which types of sentences the model failed to predict accurately. We found that some of the labels were actually wrong and the model had in fact predicted correctly. Nevertheless, there were some unusual cases. For example:

ওরা চাইেবা েকন েতার কােছ তু ই িবিসিব েপৰ্িসেডন্ট হেয় িক বাল ফালাস েতার েতা েখয়াল রাখাদরকার িছল িকৰ্েকটারেদর িফিসিিলস িনেয়

This is not hate speech, but the model predicted it to be hate speech. There are several other similar examples. The reason is that this sentence contains aggressive words that are normally used in hate speech, but in this case they were not used to dehumanize another person; the model failed to understand that. This type of mistake was common among the false positive cases. It demonstrates that words in the Bengali language can be used in complicated contexts, and it is a tremendous challenge for machine learning models to understand the proper context of a word in a sentence.
7 Conclusion and Future Work

Hate speech in social media comments is a pervasive problem, so much more research is urgently needed to combat this issue and ensure a better online environment. One of the biggest obstacles to detecting hate speech with state-of-the-art deep learning models is the lack of a large and diverse dataset. In this paper, we created a large dataset of 30,000 comments, of which 10,000 are hate speech. Our dataset has comments from seven different categories, making it diverse and abundant. We showed that hate speech comments tend to be shorter in length and word count than not-hate-speech comments. Finally, we ran several deep learning models with word embeddings on our dataset. This showed that when the training dataset is highly imbalanced, the models become overfitted and biased towards not hate speech. Thus even though the overall accuracy is very high, the models cannot predict hate speech well.

However, we believe this only scratches the surface of this widespread problem. One of the biggest obstacles is that there is no proper word embedding for the Bengali language used in social media. Some word embeddings have been created from newspaper and blog article corpora, but the language used there is vastly different from social media language. The main reason is that, unlike traditional print media, no one checks social media for grammatical and spelling mistakes, so there are many misspellings, grammatical errors, cryptic meme language, emoji, etc. In fact, in our dataset we found the same word with multiple spellings. For example: জাব, যাব, যােবা, জাবও — these are all the same word. A human brain can understand that these words are the same, but to a deep learning model they are different. Another difference is the use of emoji to convey meaning: people often express a specific emotion with only an emoji. Emoji are a recent phenomenon, absent from blog posts, newspaper articles, and books, and currently there is no dataset or pretrained model that classifies the sentiment of emoji used in social media.

One of the critical challenges for accurate hate speech detection is to create models that can extract the necessary information from an unbalanced dataset and predict the minority class with reasonable accuracy. Our experiments demonstrated that standard deep learning models are not sufficient for this task. Advanced models like mBERT and XLM-RoBERTa can be of great use in this regard, as they are trained on large multilingual datasets and use attention mechanisms. Embedding models based on extensive and diverse social media comment datasets can also be of great help.
Acknowledgments. This work would not have been possible without the kind support of the SUST NLP Research Group and the SUST Research Center. We would also like to express our heartfelt gratitude to all the annotators and volunteers who made this journey possible.
References
1. Awal, M.A., Rahman, M.S., Rabbi, J.: Detecting Abusive Comments in Discussion Threads Using Naïve Bayes. In: 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET), pp. 163–167 (2018). https://doi.org/10.1109/ICISET.2018.8745565
2. Banik, N.: Toxicity Detection on Bengali Social Media Comments using Supervised Models (2019). https://doi.org/10.13140/RG.2.2.22214.01608
3. Baruah, A., Das, K., Barbhuiya, F., Dey, K.: Aggression identification in English, Hindi and Bangla text using BERT, RoBERTa and SVM. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 76–82 (2020)
4. Ben-David, A., Fernández, A.M.: Hate speech and covert discrimination on social media: Monitoring the Facebook pages of extreme-right political parties in Spain. International Journal of Communication, 27 (2016)
5. Chakraborty, P., Seddiqui, M.H.: Threat and Abusive Language Detection on Social Media in Bengali Language. In: 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–6 (2019). https://doi.org/10.1109/ICASERT.2019.8934609
6. Chakravarthi, B.R., Arcan, M., McCrae, J.P.: Improving wordnets for under-resourced languages using machine translation. In: Proceedings of the 9th Global WordNet Conference (GWC 2018), p. 78 (2018)
7. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International Conference on Web and Social Media (ICWSM 2017), pp. 512–515 (2017)
8. Emon, E.A., Rahman, S., Banarjee, J., Das, A.K., Mittra, T.: A Deep Learning Approach to Detect Abusive Bengali Text. In: 2019 7th International Conference on Smart Computing and Communications (ICSCC), pp. 1–5 (2019). https://doi.org/10.1109/ICSCC.2019.8843606
9. Farha, I.A., Magdy, W.: Multitask learning for Arabic offensive language and hate-speech detection. In: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pp. 86–90 (2020)
10. Gambäck, B., Sikdar, U.K.: Using Convolutional Neural Networks to Classify Hate-Speech, pp. 85–90 (2017). https://doi.org/10.18653/v1/w17-3013
11. Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their Applications (4), 18–28 (1998)
12. Ishmam, A.M., Sharmin, S.: Hateful speech detection in public Facebook pages for the Bengali language. In: Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 555–560 (2019). https://doi.org/10.1109/ICMLA.2019.00104
13. Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., Mikolov, T.: Fasttext.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 (2016)
14. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), pp. 427–431 (2017). https://doi.org/10.18653/v1/e17-2068
15. Karim, M.R., Chakravarthi, B.R., McCrae, J.P., Cochez, M.: Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network (2020). http://arxiv.org/abs/2004.07807
16. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
17. Mondal, M., Silva, L.A., Benevenuto, F.: A measurement study of hate speech in social media. In: Proceedings of the 28th ACM Conference on Hypertext and Social Media, pp. 85–94 (2017)
18. Rezaul Karim, M., Raja Chakravarthi, B., Arcan, M., McCrae, J.P., Cochez, M.: Classification benchmarks for under-resourced Bengali language based on multichannel convolutional-LSTM network. arXiv preprint arXiv:2004.07807 (2020)
19. Sambasivan, N., Batool, A., Ahmed, N., Matthews, T., Thomas, K., Gaytán-Lugo, L.S., Nemer, D., Bursztein, E., Churchill, E., Consolvo, S.: "They don't leave us alone anywhere we go": Gender and digital abuse in South Asia. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2019)
20. Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10 (2017)
21. Waseem, Z.: Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter, pp. 138–142 (2016). https://doi.org/10.18653/v1/w16-5618
22. Zhang, Z., Luo, L.: Hate speech detection: A solved problem? The challenging case of long tail on Twitter. Semantic Web 10 (2019)