Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis
Stavros Assimakopoulos, Rebecca Vella Muskat, Lonneke van der Plas, Albert Gatt
University of Malta, Institute of Linguistics and Language Technology, Msida MSD 2080, Malta
{stavros.assimakopoulos, rebecca.vella, lonneke.vanderplas, albert.gatt}@um.edu.mt

Abstract
This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta, which was conducted under the auspices of the EU-funded C.O.N.T.A.C.T. project. Based on the realization that hate speech is not a clear-cut category to begin with, appears to belong to a continuum of discriminatory discourse, and is often realized through the use of indirect linguistic means, it is argued that annotation schemes for its detection should refrain from directly including the label 'hate speech,' as different annotators might have different thresholds as to what constitutes hate speech and what does not. In view of this, we suggest a multi-layer annotation scheme, which is pilot-tested against a binary ±hate speech classification and appears to yield higher inter-annotator agreement. Having motivated the postulation of our scheme, we then present the MaNeCo corpus on which it will eventually be used: a substantial corpus of online newspaper comments spanning 10 years.

Keywords: hate speech, annotation, newspaper comments, corpus creation
1. Introduction
While the discussion of automatic hate speech detection goes back at least two decades (Spertus, 1997; Greevy and Smeaton, 2004), recent years have witnessed a renewed interest in the area. This is largely due to the proliferation of user-generated content as part of Web 2.0, which has given rise to continuous streams of content, produced in large volumes over multiple geographical regions and in multiple languages and language varieties. As a result, the exponential increase in potential sources of hate speech, in combination with the continuous introduction of legislation and policy-making aimed specifically at regulating the phenomenon across a number of countries (Banks, 2010), clearly necessitates the development of reliable automatic methods for detecting hate speech online that will complement human moderation.

From a natural language processing (NLP) perspective, treatments of hate speech focus mainly on the problem of identification (Fortuna and Nunes, 2018). Thus, given a span of text, the task is to identify whether it is an instance of hate speech or not. This makes the problem a case of binary classification (±hate speech), which in turn makes it amenable to treatment using a variety of classification methods. These supervised learning techniques require pre-labelled training data, consisting of manually annotated positive and negative examples of the class(es) to be identified, to learn a model which, given a new instance, can predict the label with some degree of probability. In this setting, the most common features traditionally used for hate speech classifiers are lexical and grammatical features (Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018), with more recent approaches making use of neural network models relying on word embeddings (Badjatiya et al., 2017; Agrawal and Awekar, 2018).
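To make this setup concrete, the following is a minimal sketch of what such a ±hate speech classifier might look like in Python with scikit-learn; the toy data, feature choice and model are illustrative assumptions on our part, not the setup of any study cited above.

```python
# Minimal sketch of the binary (+/- hate speech) classification setup
# described above; toy data and model choice are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Pre-labelled training data (1 = hate speech, 0 = not); in practice this
# would be thousands of manually annotated comments.
comments = ["toy example of a hateful comment", "toy example of a benign comment"]
labels = [1, 0]

# Word n-gram TF-IDF stands in for the lexical features traditionally used;
# neural approaches would rely on word embeddings instead.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# Given a new instance, the model predicts the label with some probability.
print(model.predict_proba(["a new, unseen comment"]))
```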
In this paper, we concentrate on the particular issues that one has to take into consideration when annotating Web 2.0 data for hate speech. The unique angle of our perspective is that it is informed by data-driven research in the field of Critical Discourse Analysis (CDA), a strand of applied linguistics that has long dealt with the ways in which language is used to express ideologically-charged attitudes, especially in relation to discrimination. For some of us, the interest in the area of hate speech detection in the Web 2.0 era was originally sparked through our involvement in the C.O.N.T.A.C.T. project, which specifically targeted online hate speech from a CDA point of view (Assimakopoulos et al., 2017). As a matter of fact, the ongoing compilation of the Maltese Newspaper Comments (MaNeCo) corpus, which this paper additionally launches, started off as a potential extension of the much smaller corpus that was compiled for the purposes of the in-depth qualitative analysis undertaken under the auspices of C.O.N.T.A.C.T. (Assimakopoulos and Vella Muskat, 2017; Assimakopoulos and Vella Muskat, 2018). Against this backdrop, the ensuing discussion focuses on the challenges faced while annotating this corpus, which we treat herein as a pilot for the development of the scheme used to annotate samples of MaNeCo, and which, given MaNeCo's special characteristics, is expected to lead to more reliable datasets for future training and testing of automatic hate speech detection models.

Thus, our main contributions are the following:

• A description of the challenges encountered when annotating hate speech;
• An annotation scheme that aims to address these challenges;
• A pilot dataset of online newspaper comments on migration and LGBTIQ+ matters, with multi-level annotations (described in Section 3.), which is available from the authors upon request;
• The MaNeCo corpus: a substantial corpus of Maltese newspaper comments spanning 10 years. Samples of this corpus can be made available upon request; a full release is expected in future, pending licensing agreements with the donors of the MaNeCo data, which will need to cover sensitive data such as comments written by online users but deleted by the moderators of the newspaper portal.
2. Background
When it comes to automatic hate speech detection, a sufficiency of training data with high-quality labelling is a crucial ingredient for the success of any model. Even so, previous studies "remain fairly vague when it comes to the annotation guidelines their annotators were given for their work" (Schmidt and Wiegand, 2017, p. 8). A review of the relevant literature reveals that the majority of previous attempts to annotate Web 2.0 data for hate speech involve simple binary classification into hate speech and non-hate speech (Kwok and Wang, 2013; Burnap and Williams, 2015; Djuric et al., 2015; Nobata et al., 2016). There are of course notable exceptions where the annotation scheme involved was more or less hierarchical in nature. For example, in Warner and Hirschberg (2012), annotators were tasked with classifying texts on the basis of whether they constitute hate speech or not, but were additionally asked to specify the target of said speech in the interest of distinguishing between seven different domains of hatred (e.g. sexism, xenophobia, homophobia, etc.). Then, acknowledging that hate speech is a subtype of the more general category of offensive language and can thus often be conflated with it, Davidson et al. (2017) asked annotators to label tweets in terms of three categories: hate speech, offensive but not hate speech, or neither offensive nor hate speech. Along similar lines, Zampieri et al. (2019) introduce an explicitly hierarchical annotation scheme that requested annotators to code tweets on three consecutive levels: (a) whether they contain offensive language; (b) whether the insult/threat is targeted at some individual or group; and (c) whether the target is an individual, a group or another type of entity (e.g. an organization or event). Finally, an annotation framework which, like the one proposed here, takes into account the intricate nature of hate speech was developed by Sanguinetti et al. (2018). Here, annotators were asked not only to provide a binary classification of Italian tweets as hate speech or not, but also to grade their intensity on a scale from 0 to 4, and to indicate whether each tweet contains ironical statements or stereotypical representations, as well as how it fares in terms of aggressiveness and offensiveness.

Despite the apparently increasing interest in the area and the development of ever more sophisticated annotation methodologies, a major cause for concern when it comes to annotations used for model training and testing is that reliability scores are consistently found to be low pretty much across the board (Warner and Hirschberg, 2012; Nobata et al., 2016; Ross et al., 2016; Tulkens et al., 2016; Waseem, 2016; Bretschneider and Peters, 2017; Schmidt and Wiegand, 2017; de Gibert et al., 2018). This effectively suggests that, even in the presence of more detailed guidelines (Ross et al., 2016; Malmasi and Zampieri, 2018;
Sanguinetti et al., 2018), annotators often fail to develop an intersubjective understanding of what ultimately counts as hate speech, and thus end up classifying the same remarks differently, depending on their own background and personal views (Saleem et al., 2017; Salminen et al., 2018).

The common denominator of the existing NLP literature on the matter seems to be that annotating for hate speech is in itself particularly challenging. The most pertinent reason for this seems to be the notorious elusiveness of the label 'hate speech' itself. Despite having been established in legal discourse to refer to speech that can be taken to incite discriminatory hatred, hate speech is now often used as an umbrella label for all sorts of hateful/insulting/abusive content (Brown, 2017). This much is strikingly evident when one takes into account that most of the studies reviewed in NLP state-of-the-art reports on hate speech detection (including our discussion so far) formulate the question at hand using different, albeit interrelated, terms, which apart from hate speech variously include harmful speech, offensive and abusive language, verbal attacks, hostility or even cyber-bullying, among others. In this vein, as Waseem et al. (2017, p. 78) observe and exemplify, this "lack of consensus has resulted in contradictory annotation guidelines – some messages considered as hate speech by Waseem and Hovy (2016) are only considered derogatory and offensive by Nobata et al. (2016) and Davidson et al. (2017)."

The apparent confusion as to how to adequately define hate speech is of course not exclusive to NLP research, but extends to social scientific (Gagliardone et al., 2015) and even legal treatments of the theme (Bleich, 2014; Sellars, 2016). Given this general confusion, it is certainly no surprise that annotators, especially ones who do not have domain-specific knowledge of the matter, will be prone to disagreeing as to how to classify a given text in this task, as opposed to other comparable annotation tasks.

Discussing this issue within the more general context of detecting abusive language, Waseem et al. (2017) identify two sources for the resulting annotation confusion: (a) the existence of abusive language directed towards some generalised outgroup, as opposed to a specific individual; and (b) the implicitness with which an abusive attitude can often be communicated. Despite targeting abusive language in general, the resulting two-fold typology has been taken to apply to the more particular discussion of hate speech too (ElSherief et al., 2018; MacAvaney et al., 2019; Mulki et al., 2019; Rizos et al., 2019), a remark that lacks an explicit insult/threat towards an individual target in terms of their membership of a protected group being more difficult to classify as hate speech than one that explicitly incites discriminatory hatred. This much seems to be further corroborated by a recent study by Salminen et al. (2019), which revealed that, when evaluating the hatefulness of online comments on a scale of 1 (not hateful at all) to 4 (very hateful), annotators agree more on the two extremes than in the middle ground.
Quite justifiably, this suggests that it is easier to classify directly threatening or insulting messages as hate speech than indirectly disparaging ones.

What transpires from a review of the literature, then, is that the difficulty in annotating for hate speech lies primarily in those instances of hate speech that appear to fall under the radar of some annotators, due to the ways in which incitement to discriminatory hatred can be concealed in linguistic expression. For us, this is precisely the point where CDA can offer valuable insight towards developing schemes for hate speech annotation. That is because CDA specifically seeks, through fine-grained analysis and a close reading of the text under investigation, to uncover the "consciousness of belief and value which are encoded in the language – and which are below the threshold of notice for anyone who accepts the discourse as 'natural'" (Fowler, 1991, p. 67). As we will now turn to show, an appreciation of hate speech as an ideology-based phenomenon can substantially inform NLP research in the area too.
3. Pilot dataset
For the purposes of the C.O.N.T.A.C.T. project, we focused specifically on analysing homophobia and xenophobia in local (that is, Maltese) user comment discourse (Assimakopoulos and Vella Muskat, 2017; Assimakopoulos and Vella Muskat, 2018). In this respect, the Maltese C.O.N.T.A.C.T. corpus, which served as a pilot for the presently proposed annotation scheme, was built and annotated following the common methodology established across the C.O.N.T.A.C.T. consortium (Assimakopoulos et al., 2017, pp. 17-20). The dataset was formed by scraping Maltese portals for comments found underneath news reports related to LGBTIQ+ and migrant minorities over two time periods: April-June 2015 and December 2015-February 2016. The identification of relevant articles was facilitated by the use of the EMM Newsbrief webcrawler (http://emm.newsbrief.eu), where we performed a search for articles from Malta containing keywords pertaining to migrants and the LGBTIQ+ community in turn. We then scraped comments amounting to approximately 5,000 words worth of content per keyword, equally distributed across the selected keywords, eventually forming two subcorpora: one for migration, comprising 1,130 comments (41,020 words), and one for LGBTIQ+ matters, comprising 1,109 comments (40,924 words).

Following the compilation of the corpus, we engaged in annotation comprising two steps. In the first instance, comments were classified in terms of their polarity, as positive or negative, depending on their underlying stance towards the minority groups in question. Then, through an in-depth reading of each comment labelled as negative, we identified the discursive strategies that underlie the communication of the negative attitude at hand. Crucially, this deeper level of annotation included the detection not only of linguistic forms, such as the use of derogatory vocabulary and slurs, tropes, or generics, but also of the implicit pragmatic functions that underlie the communication of insults, threats, jokes and stereotyping.
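By way of illustration, a record produced by this two-step procedure could be represented as follows; this is a sketch in Python, and the field names and strategy labels are our own rather than the exact format used in the project.

```python
# Illustrative record for the two-step C.O.N.T.A.C.T. annotation: a polarity
# judgement first, then (for negative comments only) the discursive
# strategies identified through close reading. Names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CommentAnnotation:
    comment_id: str
    text: str
    polarity: str                 # "positive" or "negative"
    # Filled in only for negative comments: linguistic forms and implicit
    # pragmatic functions, e.g. "slur", "trope", "generic", "insult",
    # "threat", "joke", "stereotyping".
    strategies: List[str] = field(default_factory=list)

example = CommentAnnotation(
    comment_id="mig-0042",        # hypothetical identifier
    text="[comment text]",
    polarity="negative",
    strategies=["generic", "stereotyping"],
)
```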
4. The special nature of hate speech
Right from the beginning of the aforementioned annotation of the pilot dataset, one issue that became immediately evident was that, much like in the majority of the corresponding NLP research, a shallow binary classification of positive and negative attitude cannot possibly do justice to the particularities of hate speech. This realisation appears to be in line with the argument made independently by the several NLP researchers who have proposed, as we have seen, more complex annotation systems than a simple ±hate speech classification. Against this backdrop, the minute analysis performed on each comment for C.O.N.T.A.C.T. could not lend itself to multiple repetitions by different annotators in order to establish agreement, but it still reveals some principal reasons why the traditional reliance on lexical and grammatical features for the training of a hate speech detection model falls short of capturing the intricate nature of the category in question.

As common experience indicates, the internet is full of emotional language that can be considered insulting and disrespectful towards both individuals and collective groups. In this setting, online hate speech might often contain offensive language, but not all offensive language used on the internet can be considered hate speech. That is because hate speech, in its specialised legal sense, is typically tied to the specific criterion of incitement to discriminatory hatred in the several jurisdictions where it is regulated, while the discriminatory attitude needs to specifically target a group that is legally protected.

Indeed, since our pilot corpus was annotated for hate speech specifically targeting migrants and the LGBTIQ+ community, it became obvious that there were a number of comments aimed at other groups of people, as well as at individuals. As the following response from one commenter to another makes clear, a simple positive-negative attitude classification could easily lead to the inaccurate labelling of data within the corpus:

(1)
I’m not your Hun pervert!
While this comment is not targeted at a minority, it is a very direct insult, unlike much of the discourse that pointed to minorities in our dataset, which was more often than not only indirectly offensive. Similarly, although less abusive, the following comment also offends another online user:

(2)
I hope you're just trolling [username]. If not, you truly are a sad being.
Clearly, despite being obviously offensive, the two examples do not constitute hate speech, since they are not targeted at any group, much less a protected minority. In a similar vein, neither (3) nor (4) takes issue with a minority group; rather, they target groups that can be seen as dominant in the Maltese context.

(3)
The issue or problem is that religion and government are not truly separate on Malta and never were or will be in the future no matter what is written in any constitution.

(4)
Just remember that they wouldn't be working Illegaly if there wasn't someone willing to employ them Illegaly, and a lot of these employers are Maltese.
Evidently, (3) criticises the fact that church and state in Malta are still intertwined, while (4) undermines the general negative stance taken toward migrants by expounding the hypocrisy of such a stance alongside the willingness of the Maltese to hire members of this group and thus benefit from cheap labour. Again, within the framework of discriminatory discourse, although these examples display a negative disposition against collective groups (church, state, the Maltese), they cannot be deemed hate speech, since the groups targeted are not protected under Maltese law.
Beyond the examples which show that not all comments labelled as negative constitute hate speech in the legislative sense of the word, a mere label of 'negative' fails to capture the complexity of the scale on which various forms of incitement to discriminatory hatred fall. In this respect, the analysis of the C.O.N.T.A.C.T. corpus also revealed that discriminatory hatred can vary from direct calls to violence, as in (5), to indirect forms of discrimination, like the one exemplified in (6):

(5)
They could use a woman to execute them because they believe that if they are killed by a woman they will not go to heaven and enjoy their 77 virgins.

(6)
I believe the problem is Europe wide, but I feel Malta is harder hit simply because of it's size. Malta simply has not got the room to keep receiving these people.
While the commenter in (5) suggests that Muslim migrants be executed, the one in (6) does not appear at first sight to articulate anything directly offensive or insulting; upon closer inspection, however, it can be taken to allude to ideas of exclusion and in-grouping, since the use of 'these people' serves to create an in-group of us (who belong here and deserve access to resources) and them (who do not belong here and should not be taking up our precious space).

While the two examples given above illustrate the two opposite ends of a spectrum, there is of course much that lies in between. In the following comment, for example, the commenter may acknowledge the need to respect members of the LGBTIQ+ community, but concurrently refers to specific members of the minority group, that is, transgender individuals, as 'too complicated and abnormal':

(7) We just need to teach our children to respect each other whoever they be. Teaching gender diversity is too complicated and abnormal for their standards. I'm afraid it would effect their perception that lgbti is the norm instead of a minority.
So, despite emphasising the importance of promoting diversity, the commenter also recommends that children be taught to differentiate between the 'normal' dominant identities and 'abnormal', and thus subordinate, ones. Equivalently, the commenter in (8) explicitly denies racism, while at the same time clearly employing tactics of negative stereotyping and categorisation with the use of the term 'illegals':

(8) [username] sure NO! The majority of Maltese people are against "ILLEGALS"! Do not mix racism with illegals please!
All in all, as the examples above show, there are varying degrees of discriminatory hatred that need to be accommodated within the purview of hate speech. Crucially, this is something that is underlined in non-NLP research on hate speech too (Vollhard, 2007; Assimakopoulos et al., 2017). Cortese (2007, pp. 8-9), for example, describes a framework that treats hate speech not as a single category, but rather as falling on a four-point scale:

1. unintentional discrimination: unknowingly and unintentionally offending a (member of a) minority group by, for example, referring to the group by means of a word that is considered offensive or outdated, such as referring to black people as 'coloured' or to asylum seekers as 'immigrants';

2. conscious discrimination: intentionally and consciously insulting a (member of a) minority group by, for example, using a pejorative term like 'faggot' or describing migrants as 'invaders';

3. inciting discriminatory hatred: intentionally and consciously generating feelings of hatred toward minorities by publicly encouraging society to hate and exclude the group in question, such as by suggesting that members of the LGBTIQ+ community are sick or that migrants bring contagious diseases from their countries;

4. inciting discriminatory violence: intentionally and consciously encouraging violence against minorities by, for example, suggesting that members of a minority group be executed.

The importance of a micro-classification of our negative comment data becomes apparent against the backdrop of Cortese's categorisation, since not all the examples given above would fall within the third or fourth regions of the scale, which are the points that most current hate speech legislation appears to regulate. That said, given the wider use of the term hate speech in everyday discourse, several annotators could take texts that fall under the first two points of Cortese's scale to constitute hate speech too.
The discussion so far inevitably leads to what we consider to be the main challenge in achieving an intersubjective understanding of hate speech. Clearly, explicit incitement, of the type expressed by an utterance of "kill all [members of a minority group]" or "let's make sure that all [members of a minority group] do not feel welcome," is not very difficult to discern. However, since "most contemporary societies do disapprove of" such invocations, "overt prejudicial bias has been transformed into subtle and increasingly covert expressions" (Leets, 2003, p. 146). Therefore, while clearly indicative of a negative attitude, the use of derogatory terms, as in (9), cannot account for all instances of hate speech:

(9)
That's because we're not simply importing destitute people. We're importing a discredited, disheveled and destructive culture.

In this respect, it is often acknowledged (Warner and Hirschberg, 2012; Gao and Huang, 2017; Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018; Sanguinetti et al., 2018; Watanabe et al., 2018) that discourse context plays a crucial role in evaluating remarks, as there are several indirect strategies for expressing discriminatory hatred, which our in-depth analysis enabled us to additionally unearth. The most pertinent example of this is the use of metaphor, which has long been emphasised as a popular strategy for communicating discrimination, since it typically involves "mappings from a conceptual 'source domain' to a 'target domain' with resulting conceptual 'blends' that help to shape popular world-views in terms of how experiences are categorized and understood" (Musolff, 2007, p. 24). In (10), for example, the commenter engages in incitement by making use of the metaphorical schema (Lakoff and Johnson, 2003)
MIGRATION IS A DISEASE:

(10)
Illegal immigration is a cancer which if not eliminated will bring the downfall of Europe and European culture.
Similar indirect strategies can be found in the frequent use of figurative language to underline the urgency of the situation, as exemplified by the use of allusion in (11) and hyperbole in (12):

(11)
Will Malta eventually become the New Caliphate?

(12) ... in 4 more days we will become the minority ...
Alongside figurative language, stereotypical representations and remarks generalising over a minority group, as in (13), also play a crucial role in the expression of discriminatory hatred (Brown, 2010; Haas, 2012; Maitra and McGowan, 2012; Kopytowska and Baider, 2017):

(13)
If anyone is lacking, it is you guys for lacking a sense of decency. Just look at the costumes worn at gay parades to prove my point.
Now, while such remarks might not appear to fall under the incitement criterion for the delineation of hate speech, there is still reason to include them in a corpus of hate speech. As Fortuna and Nunes (2018, p. 85:5) argue, "all subtle forms of discrimination, even jokes, must be marked as hate speech", because even seemingly harmless jokes indicate "relations between the groups of the jokers and the groups targeted by the jokes, racial relations, and stereotypes" (Kuipers and van der Ent, 2016), while their repetition "can become a way of reinforcing racist attitudes" (Kompatsiaris, 2017) and can have "negative psychological effects for some people" (Douglass et al., 2016).

Be that as it may, people are bound to disagree as to whether such implicit expressions can or should indeed be classified as hate speech. In view of this, we think that a strategy for obtaining more reliable annotation results would be to refrain as much as possible from asking raters to make a judgement that could be influenced by their subjective opinion as to what ultimately constitutes hate speech, as well as by their own views and attitudes towards the minority groups in question.
5. Towards a new annotation scheme
We trust that our discussion so far provides insight as to why it is particularly difficult to achieve adequate agreement among different crowd coders when it comes to classifying hate speech. Given that non-experts cannot always distinguish between hate speech and offensive language, and given the varying thresholds that hate speech laws place on the continuum of discriminatory discourse, it would probably be hard even for hate speech experts to fully agree on a classification (Waseem, 2016; Ross et al., 2016). It is thus clear that a simple binary classification of online posts into hate speech or non-hate speech is unlikely to be reliable, and that a more complex scheme is inevitably needed.

When we presented previous annotation frameworks above, we mentioned that the one developed by Sanguinetti et al. (2018) was informed by critical discussions of the concept of hate speech and was thus relatively complex. While on the right track, however, it seems to also incorporate categories, such as intensity, aggressiveness and offensiveness, that are susceptible to a rather subjective evaluation. Indispensable though they may be for the discussion of hate speech, such categories can easily compromise agreement as well, since individuals with different backgrounds and views could provide markedly different interpretations of the same data.

The proposal that we wish to make in this paper is that such subjective categories be left out of annotation instructions as much as possible. To this effect, the scheme that we developed for hate speech annotation in the MaNeCo corpus is hierarchical in nature and is expected to lead to the identification of discriminatory comments along the aforementioned scale by Cortese, without focusing on such impressionistic categories as intensity or degree of hatefulness, as follows:

1. Does the post communicate a positive, negative or neutral attitude? [Positive / Negative / Neutral]

2. If negative, who does this attitude target? [Individual / Group]

 (a) If it targets an individual, does it do so because of the individual's affiliation to a group? [Yes / No] If yes, name the group.

 (b) If it targets a group, name the group.

3. How is the attitude expressed in relation to the target group? Select all that apply. [Derogatory term / Generalisation / Insult / Sarcasm (including jokes and trolling) / Stereotyping / Suggestion / Threat]

 (a) If the post involves a suggestion, is it a suggestion that calls for violence against the target group? [Yes / No]

The selection of these particular communicative strategies, as opposed to alternative ones, is based on the categories used for the purposes of the C.O.N.T.A.C.T. project. We thus assume adequate coverage, since all these strategies were identified in the parallel annotation of corpora from the nine different national contexts represented in the project. That being said, we considered adding a field for annotators to suggest their own categories, but decided against it, since this could easily lead to confusion when it comes to grouping different categories together during the processing of responses.

With regard to the proposed annotation scheme, we cannot of course fail to acknowledge that the first label (positive, negative, neutral) is to a certain extent subjective. Still, we believe that it is formulated in a way that does not require the annotator to make a judgement in relation to a post's hatefulness or aggressiveness, but merely to evaluate the commenter's attitude in writing the comment. Obviously, this will inevitably lead to some disagreement, but it ultimately calls for a judgement that should be more straightforward to make, as it is not tied to an individual's understanding of what constitutes discrimination per se; after all, a negative attitude can easily crop up in various other settings too, such as when expressing disagreement with someone else's post in the same thread.

So, once a negative attitude is established, the second step in the process helps to indirectly assess whether the attitude can be taken to be discriminatory too, insofar as it targets a group or an individual on the basis of group membership. In this vein, depending on whether the group that each discriminatory post targets is protected by law (as migrant or LGBTIQ+ individuals usually are, but politicians or church officials are not), we can distinguish between hate speech and merely hateful discourse.

Finally, the third and last step helps determine the positioning of a post that fulfils the penultimate and final criteria in Cortese's scale. In this regard, posts that comprise a suggestion would fall under incitement to discriminatory hatred, while posts specifically calling for violent actions towards the target group, and those including threats, would belong to the category of incitement to discriminatory violence. By the same token, posts containing insults, derogatory terms, stereotyping, generalisations and sarcasm would fall under Cortese's categories of conscious and unintentional discrimination, which are not, at face value, regulated by hate speech law, but are still, as we have seen, relevant to the task at hand. In this way, one should be able to establish different sets of discriminatory talk, which could then be included in or excluded from subsequent analyses, depending on the threshold that one specifies for the delineation of hate speech.

Apart from enabling us to distinguish between explicit hate speech and softer forms of discrimination, a further merit of the proposed annotation scheme is that it indirectly allows us to control for annotator disagreement in some cases. The most obvious such case would be the presence of posts that are ambiguous between a literal and a sarcastic interpretation, or the identification of lexical items and generalisations that might be considered offensive by some annotators and not by others. At the same time, it could provide useful indications regarding the distinction between conscious and unintentional discrimination. The rule of thumb here would be that the more annotators agree on a comment belonging to the first two categories of discrimination in Cortese's scale, in the sense that it is discriminatory but does not include a suggestion regarding the target group, the closer that comment would be to conscious discrimination. That is because, as we have seen, unintentional discrimination has a marked tendency to go unnoticed, and is thus expected to be missed more often by annotators. Crude though this generalisation might be, we believe that it still provides a viable criterion.
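To illustrate how the scheme's micro-decisions can be combined downstream, the sketch below encodes a single annotated post and derives from it both a position on Cortese's scale and, given a threshold, a binary ±hate speech label. This is our own illustrative Python rendering, with a hypothetical protected-group list; it is not a normative implementation of the scheme.

```python
# Sketch: mapping the three annotation steps onto Cortese's scale and a
# derived +/- hate speech label. All names and the protected-group list are
# illustrative assumptions, not part of the scheme itself.
from dataclasses import dataclass, field
from typing import List, Optional

PROTECTED_GROUPS = {"migrants", "lgbtiq+"}  # hypothetical; depends on the law

@dataclass
class PostAnnotation:
    attitude: str                     # step 1: "positive"/"negative"/"neutral"
    target: Optional[str] = None      # step 2: "individual" or "group"
    group: Optional[str] = None       # named group (steps 2a/2b)
    strategies: List[str] = field(default_factory=list)  # step 3
    violent_suggestion: bool = False  # step 3a

def cortese_level(a: PostAnnotation) -> int:
    """Position on Cortese's four-point scale (0 = not discriminatory)."""
    if a.attitude != "negative" or a.group is None:
        return 0
    if a.violent_suggestion or "threat" in a.strategies:
        return 4  # inciting discriminatory violence
    if "suggestion" in a.strategies:
        return 3  # inciting discriminatory hatred
    # Levels 1 and 2 (unintentional vs conscious discrimination) are only
    # separated post hoc, via annotator agreement, as discussed above.
    return 2

def is_hate_speech(a: PostAnnotation) -> bool:
    """Binary label of the kind inferred for the pilot comparison: negative
    attitude, protected target group, and a suggestion or threat."""
    return cortese_level(a) >= 3 and (a.group or "").lower() in PROTECTED_GROUPS
```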
Obviously, this is not to say that disagreement can be completely eradicated. As a matter of fact, we cannot emphasise enough that the aim of our suggested annotation scheme (or of any other such scheme, for that matter) is not to achieve perfect agreement. After all, as our preliminary CDA analysis revealed, genuine ambiguity can present itself not only at the structural, but also at the attitudinal level. For example, the following comment was posted in reaction to an article about a Nigerian rape victim who was fighting to bring one of her two children to Malta:

(14)
For God's sake, make the process simpler and allow this woman to unite with her daughter in Nigeria.
At face value, it seems difficult to discern whether the commenter wishes for the woman in question, who is stranded in Malta due to not having received refugee status, to be allowed to reunite with her daughter by bringing her to Malta, or is suggesting that she be deported back to Nigeria in order to be with her daughter. Apart from this, some posts may still be difficult to classify on the grounds that they express multiple attitudes. For example, in stating (15), the commenter acknowledges that marriage should be a matter of choice between two people in love, but then directly implies that marriage should be a union between two people of the opposite (cis)gender.

(15)
People marry because they fall in love, and although it's a choice, it was meant to be like that even in the animal kingdom, for example swans mate for life, male and female, not male and male.
Despite the inevitable presence of such borderline ambiguous comments, however, we do expect our annotation scheme to fare better in terms of inter-annotator agreement than previous attempts, and particularly attempts based on a binary ±hate speech classification. In an attempt to assess its efficacy, then, we conducted a preliminary pilot study, to which we will now turn.

A total of 24 annotators took part in this pilot study. The participants, who were mostly academics and students ranging between 21 and 60 years of age, were divided into two gender- and age-balanced groups of 12. The first group was asked to label items using simple binary annotation (±hate speech) on the basis of the definition of hate speech provided within the Maltese Criminal Code, while the second used our proposed multi-level annotation scheme. Both groups were presented with 15 user-generated comments from the Maltese C.O.N.T.A.C.T. corpus in random order. (An alternative here would potentially involve some kind of user profiling that would allow for an identification of the commenter's more general stance towards the target minority. Such an alternative, however, could compromise the overall task, since grouping a user's comments together is bound to bias annotators.) In an effort to ensure variation among the items to be annotated, we went back to our original CDA classification and selected three comments from each of the following categories:

• comments involving incitement to discriminatory violence against the migrant minority;
• comments that were labeled as discriminatory towards the migrant minority but not as fulfilling the incitement criterion;
• comments that were labeled as negative but do not target the migrant minority;
• comments that were labeled as expressing a positive attitude towards the migrant minority; and
• comments that were labeled as ambiguous, along similar lines to the discussion of (14) and (15) above.

All individuals in both annotation conditions performed the task independently. Obviously, in order to meaningfully compare the levels of inter-annotator agreement for the two schemes, we needed to have an equal number of classes across the board. To achieve this, we screened the annotations received for the multi-level condition, with a view to inferring a binary ±hate speech classification in this case too.

While inferring a binary class from the multi-level scheme might seem counter-intuitive in view of the preceding discussion, there are several reasons why we did this. First, the decision was taken on pragmatic grounds: in order to achieve a fair comparison, agreement within the two groups needed to be estimated based on similar categories. That way, we were able to compute inter-annotator agreement between participants in the two annotation tasks in a way that would enable a direct comparison. More importantly, however, there are theoretical grounds for using the multi-level scheme to achieve a binary classification. One of the arguments in the previous section was that hate speech would be evident in comments that would be labeled as negative, targeting (members of) a minority group, and comprising either a suggestion or a threat. Indeed, the thrust of the argument presented above is not that a binary classification is undesirable, but that, in order to be reliably made, it had to supervene on micro-decisions that took into account these various dimensions.
The extent to which this is the case is of course an empirical question, one to which we return in the concluding section.

In Table 1, we report percent agreement, Fleiss's kappa (Fleiss, 1971), and Randolph's kappa (Randolph, 2005), which is known to be more suitable when raters do not have prior knowledge of the expected distribution of the annotation categories. All in all, the results from this pilot appear to corroborate our prediction: the proposed multi-level annotation scheme does indeed lead to higher inter-annotator agreement, which for Fleiss's kappa in particular marks a rise from moderate (0.54) to substantial agreement (0.69).

Metric          binary   multi-level
Percent agr.    76.8%    84.6%
Fleiss's kappa  0.54     0.69

Table 1: Inter-annotator agreement for the binary and multi-level annotation conditions.

Apart from enabling us to indirectly compare the scheme's performance in relation to the corresponding binary classification, the pilot thus provides preliminary evidence that drawing annotators' attention to separate aspects of a text improves agreement.
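For reference, both coefficients reported above can be computed from an item-by-category matrix of rating counts; the sketch below implements the standard formulas, with Randolph's free-marginal variant differing from Fleiss's only in its chance-agreement term. The toy matrix is an assumption for illustration, not our pilot data.

```python
# Chance-corrected inter-annotator agreement for n raters, k categories.
# ratings[i][j] = number of raters who assigned item i to category j.
def fleiss_kappa(ratings, free_marginal=False):
    N = len(ratings)        # number of items
    n = sum(ratings[0])     # raters per item (assumed constant)
    k = len(ratings[0])     # number of categories
    # Observed agreement: mean proportion of agreeing rater pairs per item.
    P_o = sum((sum(c * c for c in row) - n) / (n * (n - 1))
              for row in ratings) / N
    if free_marginal:
        P_e = 1.0 / k       # Randolph's kappa: uniform expected distribution
    else:
        # Fleiss's kappa: chance agreement from the observed marginals.
        totals = [sum(row[j] for row in ratings) for j in range(k)]
        P_e = sum((t / (N * n)) ** 2 for t in totals)
    return (P_o - P_e) / (1 - P_e)

# Toy example: 3 items rated by 12 annotators into 2 (+/- hate speech) classes.
toy = [[10, 2], [6, 6], [12, 0]]
print(fleiss_kappa(toy), fleiss_kappa(toy, free_marginal=True))
```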
6. Future steps: The MaNeCo corpus
Having motivated the proposed annotation framework on the basis of our pilot study, we are currently in the process of implementing it in the annotation of various samples of the MaNeCo corpus. At the moment, the MaNeCo corpus comprises original data donated to us by the
Times of Malta, the newspaper with the highest circulation in Malta. More specifically, it contains, in anonymised form, all the comments from the inception of the Times of Malta online platform (April 2008) up to January 2017 (when the data was obtained). This amounts to over 2.5 million user comments (over 124 million words), which are written in English, Maltese or a mixture of the two languages, even though the newspaper itself is English-language. Our aim is to eventually populate the corpus with data from other local news outlets too, to ensure an equal representation of opinions irrespective of any single news portal's political and ideological affiliations. That said, even this dataset on its own is an invaluable resource for hate speech annotation, since it also includes around 380K comments that were deleted by the newspaper moderators, and which are obviously the ones that tend to be particularly vitriolic. In this regard, seeing that there are generally "much fewer hateful than benign comments present in randomly sampled data, and therefore a large number of comments have to be annotated to find a considerable number of hate speech instances" (Schmidt and Wiegand, 2017, p. 7), MaNeCo is well suited to addressing the challenge of building a set that is balanced between hate speech and non-hate speech data.

Clearly, this does not mean that we will not have additional challenges to face before we end up with a dataset that could potentially be used for training and testing purposes. For one, being a corpus of user-generated content in the bilingual setting of Malta, it contains extensive code-switching and code-mixing (Rosner and Farrugia, 2007; Elfardy and Diab, 2012; Sadat et al., 2014; Eskander et al., 2014), which is further complicated by the inconsistent use of Maltese spelling online, particularly in relation to the specific Maltese graphemes ⟨ċ⟩, ⟨ġ⟩, ⟨għ⟩, ⟨ħ⟩, and ⟨ż⟩, whose diacritics are often omitted in online discourse. Then, given the casual nature of most communication on social media, it is full of non-canonical written text (Baldwin et al., 2013; Eisenstein, 2013) exhibiting unconventional orthography, arbitrary abbreviations and so on.
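By way of illustration, one simple way to cope with omitted diacritics during keyword search is to fold the affected graphemes to their bare counterparts on both the query and the text side. The sketch below is our own assumption about how this could be done; whether folding is appropriate depends on the downstream task, since the folded pairs are distinct letters in Maltese.

```python
# Fold Maltese diacritics so searches match both canonical and bare spellings.
# Folding table and example are illustrative; 'għ' is a digraph, so folding
# ħ -> h covers it as a side effect.
FOLD = str.maketrans({"ċ": "c", "Ċ": "C", "ġ": "g", "Ġ": "G",
                      "ħ": "h", "Ħ": "H", "ż": "z", "Ż": "Z"})

def fold_maltese(text: str) -> str:
    return text.translate(FOLD)

# 'ħobż' (bread) matches its frequently seen bare spelling 'hobz'.
assert fold_maltese("ħobż") == fold_maltese("hobz") == "hobz"
```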
Even so, these are challenges that can be faced once the effectiveness of our proposed scheme in yielding better inter-annotator agreement results has been established more concretely. In this regard, a line of future work, briefly discussed in Section 3., is to further validate the multi-level annotation scheme. Specifically, reliability needs to be estimated on the basis of larger and more diverse samples. Furthermore, we have already identified the question of whether, having conducted a micro-analysis of user-generated texts, it becomes easier to classify them, in a second step, as instances of hate or non-hate speech, thereby going from a multi-level to a binary classification. The results from our pilot study are encouraging, in that they evince higher agreement when annotators have their attention drawn to different aspects of the text in question. Whether this will also translate into a more reliable detection rate for hate speech, whether by humans or by classification algorithms (for which the multi-level annotation may provide additional features), is an empirical question we still need to test.

In closing, to the extent that we can assess our proposed annotation scheme's usefulness for training automatic classifiers, we believe that, by distinguishing between different target groups from the beginning, this scheme could prospectively enable us to select training material from different domains of hate speech (such as racism, sexism, homophobia, etc.) in a way that serves transfer learning too. This could be particularly useful in addressing the notorious challenge of retaining a model's good performance on a particular dataset source and/or domain of hate speech during transfer learning to other datasets and domains (Agrawal and Awekar, 2018; Arango et al., 2019). After all, as Gröndahl et al. (2018, p. 11) concluded, after demonstrating, through replication and cross-application of five model architectures, that any model can obtain comparable results across sources/domains insofar as it is trained on annotated data from within the same source/domain, future work in automatic hate speech detection "should focus on the datasets instead of the models," and more specifically on a comparison of "the linguistic features indicative of different kinds of hate speech (racism, sexism, personal attacks etc.), and the differences between hateful and merely offensive speech." This is a direction that we are particularly interested in, since our CDA research on the Maltese C.O.N.T.A.C.T. dataset correspondingly revealed that, although users discussing the LGBTIQ+ community and migrants appear to employ similar tactics in expressing discriminatory attitudes, the content of their utterances differs to a considerable extent depending on the target of their comments: for example, while migrants are often referred to as 'ungrateful' or 'dangerous' and migration is typically framed in terms of metaphors of invasion, members of the LGBTIQ+ community are characterised as 'abnormal' and 'sinners', with frequent appeal to metaphors of biblical doom.
7. Acknowledgements
The research reported in this paper has been substantially informed by the original work conducted by the University of Malta team on the C.O.N.T.A.C.T. project, which was co-funded by the Rights, Equality and Citizenship Programme of the European Commission Directorate-General for Justice and Consumers (JUST/2014/RRAC/AG). We gratefully acknowledge the support of the Times of Malta in making the MaNeCo data available to us. Last but not least, we would like to thank Slavomir Ceplo for his diligent work in extracting and organising the MaNeCo corpus, the three anonymous LREC reviewers for their insightful comments and, of course, all participants of the pilot study for their generous availability.
8. Bibliographical References
Agrawal, S. and Awekar, A. (2018). Deep learning for detecting cyberbullying across multiple social media platforms. In Gabriella Pasi, et al., editors, Advances in Information Retrieval, pages 141-153, Cham. Springer.

Arango, A., Pérez, J., and Poblete, B. (2019). Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 45-54, New York. ACM.

Assimakopoulos, S. and Vella Muskat, R. (2017). Exploring xenophobic and homophobic attitudes in Malta: Linking the perception of social practice with textual analysis. Lodz Papers in Pragmatics, 13(2):179-202.

Assimakopoulos, S. and Vella Muskat, R. (2018). Xenophobic and homophobic attitudes in online news portal comments in Malta. Xjenza Online, 6(1):25-40.

Assimakopoulos, S., Baider, F. H., and Millar, S. (2017). Online Hate Speech in the European Union: A Discourse-Analytic Perspective. Springer, Cham.

Badjatiya, P., Gupta, S., Gupta, M., and Varma, V. (2017). Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 759-760, Geneva. International World Wide Web Conferences.

Baldwin, T., Cook, P., Lui, M., MacKinlay, A., and Wang, L. (2013). How noisy social media text, how diffrnt social media sources? In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 356-364, Nagoya. Asian Federation of Natural Language Processing.

Banks, J. (2010). Regulating hate speech online. International Review of Law, Computers & Technology, 24(3):233-239.

Bleich, E. (2014). Freedom of expression versus racist hate speech: Explaining differences between high court regulations in the USA and Europe. Journal of Ethnic and Migration Studies, 40(2):283-300.

Bretschneider, U. and Peters, R. (2017). Detecting offensive statements towards foreigners in social media. In Proceedings of the 50th Hawaii International Conference on System Sciences, Maui. HICSS.

Brown, R. (2010). Prejudice: Its Social Psychology. Wiley, Oxford.

Brown, A. (2017). What is hate speech? Part 2: Family resemblances. Law and Philosophy, 36(5):561-613.

Burnap, P. and Williams, M. L. (2015). Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223-242.

Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media, Palo Alto. AAAI Press.

de Gibert, O., Perez, N., García-Pablos, A., and Cuadros, M. (2018). Hate speech dataset from a white supremacy forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 11-20, Brussels. Association for Computational Linguistics.

Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., and Bhamidipati, N. (2015). Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, pages 29-30, New York. ACM.

Douglass, S., Mirpuri, S., English, D., and Yip, T. (2016). "They were just making jokes": Ethnic/racial teasing and discrimination among adolescents. Cultural Diversity and Ethnic Minority Psychology, 22(1):69-82.

Eisenstein, J. (2013). What to do about bad language on the internet. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 359-369, Atlanta. Association for Computational Linguistics.

Elfardy, H. and Diab, M. (2012). Token level identification of linguistic code switching. In Proceedings of COLING 2012: Posters, pages 287-296, Mumbai. COLING 2012 Organizing Committee.

ElSherief, M., Kulkarni, V., Nguyen, D., Wang, W., and Belding, E. (2018). Hate lingo: A target-based linguistic analysis of hate speech in social media. In Proceedings of the 12th International AAAI Conference on Web and Social Media, Palo Alto. AAAI Press.

Eskander, R., Al-Badrashiny, M., Habash, N., and Rambow, O. (2014). Foreign words and the automatic processing of Arabic social media text written in Roman script. In Proceedings of the First Workshop on Computational Approaches to Code Switching, pages 1-12, Doha. Association for Computational Linguistics.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76:378-382.

Fortuna, P. and Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys, 51(4):85:1-85:30.

Gagliardone, I., Gal, D., Alves, T., and Martinez, G. (2015). Countering Online Hate Speech. UNESCO Publishing, Paris.

Gao, L. and Huang, R. (2017). Detecting online hate speech using context aware models. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 260-266, Varna. INCOMA.

Greevy, E. and Smeaton, A. F. (2004). Classifying racist texts using a support vector machine. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '04, pages 468-469, New York. ACM.

Gröndahl, T., Pajola, L., Juuti, M., Conti, M., and Asokan, N. (2018). All you need is "love": Evading hate speech detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 2-12, New York. ACM.

Haas, J. (2012). Hate speech and stereotypic talk. In H. Giles, editor, The Handbook of Intergroup Communication, pages 128-140. Routledge, London.

Kompatsiaris, P. (2017). Whitewashing the nation: Racist jokes and the construction of the African 'other' in Greek popular cinema. Social Identities, 23(3):360-375.

Kopytowska, M. and Baider, F. (2017). From stereotypes and prejudice to verbal and physical violence: Hate speech in context. Lodz Papers in Pragmatics, 13(2):133-152.

Kuipers, G. and van der Ent, B. (2016). The seriousness of ethnic jokes: Ethnic humor and social change in the Netherlands, 1995-2012. HUMOR, 29(4):605-633.

Kwok, I. and Wang, Y. (2013). Locate the hate: Detecting tweets against blacks. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pages 1621-1622, Palo Alto. AAAI Press.

Lakoff, G. and Johnson, M. (2003). Metaphors We Live By. University of Chicago Press, Chicago.

Leets, L. (2003). Disentangling perceptions of subtle racist speech: A cultural perspective. Journal of Language and Social Psychology, 22(2):145-168.

MacAvaney, S., Yao, H., Yang, E., Russell, K., Goharian, N., and Frieder, O. (2019). Hate speech detection: Challenges and solutions. PLoS ONE, 14(8):1-16.

Maitra, I. and McGowan, M. K. (2012). Introduction and overview. In I. Maitra et al., editors, Speech and Harm: Controversies over Free Speech, pages 1-23. Oxford University Press, Oxford.

Malmasi, S. and Zampieri, M. (2018). Challenges in discriminating profanity from hate speech. Journal of Experimental & Theoretical Artificial Intelligence, 30(2):187-202.

Mulki, H., Haddad, H., Bechikh Ali, C., and Alshabani, H. (2019). L-HSAB: A Levantine Twitter dataset for hate speech and abusive language. In Proceedings of the Third Workshop on Abusive Language Online, pages 111-118, Florence. Association for Computational Linguistics.

Musolff, A. (2007). What role do metaphors play in racial prejudice? The function of antisemitic imagery in Hitler's Mein Kampf. Patterns of Prejudice, 41(1):21-43.

Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., and Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 145-153, Geneva. International World Wide Web Conferences.

Randolph, J. J. (2005). Free-marginal multirater kappa: An alternative to Fleiss' fixed-marginal multirater kappa.

Rizos, G., Hemker, K., and Schuller, B. (2019). Augment to prevent: Short-text data augmentation in deep learning for hate-speech classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19), pages 991-1000, Beijing. ACM.

Rosner, M. and Farrugia, P.-J. (2007). A tagging algorithm for mixed language identification in a noisy domain. In INTERSPEECH 2007: 8th Annual Conference of the International Speech Communication Association, pages 190-193, Red Hook. Curran Associates.

Ross, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., and Wojatzki, M. (2016). Measuring the reliability of hate speech annotations: The case of the European refugee crisis. In Michael Beißwenger, et al., editors, Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication, pages 6-9, Frankfurt. Bochumer Linguistische Arbeitsberichte.

Sadat, F., Kazemi, F., and Farzindar, A. (2014). Automatic identification of Arabic language varieties and dialects in social media. In Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP), pages 22-27, Dublin. Association for Computational Linguistics and Dublin City University.

Saleem, H. M., Dillon, K. P., Benesch, S., and Ruths, D. (2017). A web of hate: Tackling hateful speech in online social spaces. In Proceedings of the First Workshop on Text Analytics for Cybersecurity and Online Safety at LREC 2016, Portorož. European Language Resources Association (ELRA).

Salminen, J., Veronesi, F., Almerekhi, H., Jung, S., and Jansen, B. J. (2018). Online hate interpretation varies by country, but more by individual: A statistical analysis using crowdsourced ratings. In Proceedings of the Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 88-94, Red Hook. Curran Associates.

Salminen, J., Almerekhi, H., Kamel, A. M., Jung, S.-g., and Jansen, B. J. (2019). Online hate ratings vary by extremes: A statistical analysis. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, CHIIR '19, pages 213-217, New York. ACM.

Sanguinetti, M., Poletto, F., Bosco, C., Patti, V., and Stranisci, M. (2018). An Italian Twitter corpus of hate speech against immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki. European Language Resources Association (ELRA).

Schmidt, A. and Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1-10, Valencia. Association for Computational Linguistics.

Sellars, A. (2016). Defining hate speech. Berkman Klein Center Research Publication No. 2016-20; Boston University School of Law, Public Law Research Paper No. 16-48.

Spertus, E. (1997). Smokey: Automatic recognition of hostile messages. In AAAI-97: Proceedings of the Fourteenth National Conference on Artificial Intelligence and the Ninth Annual Conference on Innovative Applications of Artificial Intelligence, pages 1058-1065, Boston. MIT Press.

Tulkens, S., Hilte, L., Lodewyckx, E., Verhoeven, B., and Daelemans, W. (2016). The automated detection of racist discourse in Dutch social media. Computational Linguistics in the Netherlands Journal, 6(1):3-20.

Warner, W. and Hirschberg, J. (2012). Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pages 19-26, Montréal. Association for Computational Linguistics.

Waseem, Z. and Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88-93, San Diego. Association for Computational Linguistics.

Waseem, Z., Davidson, T., Warmsley, D., and Weber, I. (2017). Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online, pages 78-84, Vancouver. Association for Computational Linguistics.

Waseem, Z. (2016). Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138-142, Austin. Association for Computational Linguistics.

Watanabe, H., Bouazizi, M., and Ohtsuki, T. (2018). Hate speech on Twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection. IEEE Access, 6:13825-13835.

Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., and Kumar, R. (2019). Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415-1420, Minneapolis. Association for Computational Linguistics.