Neural Character-based Composition Models for Abuse Detection
Pushkar Mishra
Dept. of CS & Technology, University of Cambridge, United Kingdom. [email protected]
Helen Yannakoudakis
The ALTA Institute, University of Cambridge, United Kingdom. [email protected]
Ekaterina Shutova
ILLC, University of Amsterdam, The Netherlands. [email protected]
Abstract
The advent of social media in recent years has fed into some highly undesirable phenomena such as the proliferation of offensive language, hate speech, sexist remarks, etc. on the Internet. In light of this, there have been several efforts to automate the detection and moderation of such abusive content. However, deliberate obfuscation of words by users to evade detection poses a serious challenge to the effectiveness of these efforts. The current state-of-the-art approaches to abusive language detection, based on recurrent neural networks, do not explicitly address this problem and resort to a generic OOV (out of vocabulary) embedding for unseen words. However, in using a single embedding for all unseen words, we lose the ability to distinguish between obfuscated and non-obfuscated or rare words. In this paper, we address this problem by designing a model that can compose embeddings for unseen words. We experimentally demonstrate that our approach significantly advances the current state of the art in abuse detection on datasets from two different domains, namely Twitter and Wikipedia talk pages.
Introduction

Pew Research Center has recently uncovered several disturbing trends in communications on the Internet. As per their report (Duggan, 2014), 40% of adult Internet users have personally experienced harassment online, and 60% have witnessed the use of offensive names and expletives. Expectedly, the majority (66%) of those who have personally faced harassment have had their most recent incident occur on a social networking website or app. While most of these websites and apps provide ways of flagging offensive and hateful content, only 8.8% of the victims have actually considered using such provisions.

Two conclusions can be drawn from these statistics: (i) abuse (a term we use henceforth to collectively refer to toxic language, hate speech, etc.) is prevalent in social media, and (ii) passive and/or manual techniques for curbing its propagation (such as flagging) are neither effective nor easily scalable (Pavlopoulos et al., 2017). Consequently, efforts to automate the detection and moderation of such content have been gaining popularity (Waseem and Hovy, 2016; Wulczyn et al., 2017). In their work, Nobata et al. (2016) describe the task of achieving effective automation as an inherently difficult one due to several ingrained complexities; a prominent one they highlight is the deliberate structural obfuscation of words (for example, fcukk, w0m3n, banislam, etc.) by users to evade detection. Simple spelling correction techniques and edit-distance procedures fail to provide information about such obfuscations because: (i) words may be excessively fudged (e.g., a55h0le, n1gg3r) or concatenated (e.g., stupidbitch, feminismishate), and (ii) they fail to take into account the fact that some character sequences, like musl and wom, are more frequent and more indicative of abuse than others (Waseem and Hovy, 2016).

Nobata et al. (2016) go on to show that simple character n-gram features prove to be highly promising for supervised classification approaches to abuse detection due to their robustness to spelling variations; however, they do not address obfuscations explicitly. Waseem and Hovy (2016) and Wulczyn et al. (2017) also use character n-grams to attain impressive results on their respective datasets. That said, the current state-of-the-art methods do not exploit character-level information, but instead utilize recurrent neural network (RNN) models operating on word embeddings alone (Pavlopoulos et al., 2017; Badjatiya et al., 2017). Since the problem of deliberately noisy input is not explicitly accounted for, these approaches resort to the use of a generic OOV (out of vocabulary) embedding for words not seen in the training phase. However, in using a single embedding for all unseen words, such approaches lose the ability to distinguish obfuscated words from non-obfuscated or rare ones. Recently, Mishra et al. (2018) and Qian et al. (2018), working with the same Twitter dataset as we do, reported that many of the misclassifications by their RNN-based methods happen due to intentional misspellings and/or rare words.

Our contributions are two-fold: first, we experimentally demonstrate that character n-gram features are complementary to the current state-of-the-art RNN approaches to abusive language detection and can strengthen their performance. We then explicitly address the problem of deliberately noisy input by constructing a model that operates at the character level and learns to predict embeddings for unseen words. We show that the integration of this model with the character-enhanced RNN methods further advances the state of the art in abuse detection on three datasets from two different domains, namely Twitter and Wikipedia talk pages. To the best of our knowledge, this is the first work to use character-based word composition models for abuse detection.
Related Work

Yin et al. (2009) were among the first to apply supervised learning to the task of abuse detection. They worked with a linear support vector machine trained on local (e.g., n-grams), contextual (e.g., similarity of a post to its neighboring posts), and sentiment-based (e.g., presence of expletives) features to recognize posts involving harassment.

Djuric et al. (2015) worked with comments taken from the Yahoo Finance portal and demonstrated that distributional representations of comments learned using the paragraph2vec framework (Le and Mikolov, 2014) can outperform simpler bag-of-words (BOW) features under supervised classification settings for hate speech detection. Nobata et al. (2016) improved upon the results of Djuric et al. by training their classifier on an amalgamation of features derived from four different categories: linguistic (e.g., count of insult words), syntactic (e.g., part-of-speech (POS) tags), distributional semantic (e.g., word and comment embeddings), and n-gram based (e.g., word bi-grams). They noted that while the best results were obtained with all features combined, character n-grams had the highest impact on performance.

Waseem and Hovy (2016) utilized a logistic regression (LR) classifier to distinguish amongst racist, sexist, and clean tweets in a dataset of approximately 16k tweets. They found that character n-grams coupled with gender information of users formed the optimal feature set for the task; on the other hand, geographic and word-length distribution features provided little to no improvement. Experimenting with the same dataset, Badjatiya et al. (2017) improved on their results by training a gradient-boosted decision tree (GBDT) classifier on averaged word embeddings learnt using a long short-term memory (LSTM) model initialized with random embeddings. Mishra et al. (2018) went on to incorporate community-based profiling features of users in their classification methods, which led to state-of-the-art performance on this dataset.

Waseem (2016) studied the influence of annotators' knowledge on the task of hate speech detection. For this, they sampled tweets from the same corpus as Waseem and Hovy (2016) and recruited expert and amateur annotators to annotate the tweets as racism, sexism, both or neither. Combining this dataset with that of Waseem and Hovy (2016), Park and Fung (2017) evaluated the efficacy of a 2-step classification process: they first used an LR classifier to separate abusive and non-abusive tweets, and then used another LR classifier to distinguish between the racist and sexist ones. They showed that this setup had comparable performance to a 1-step classification approach based on convolutional neural networks (CNNs) operating on word and character embeddings.

Wulczyn et al. (2017) created three different datasets of comments collected from the English Wikipedia Talk pages: one was annotated for personal attacks, another for toxicity, and the third for aggression. They achieved their best results with a multi-layered perceptron classifier trained on character n-gram features. Working with the personal attack and toxicity datasets, Pavlopoulos et al. (2017) outperformed the methods of Wulczyn et al. by using a gated recurrent unit (GRU) to model the comments as dense low-dimensional representations, followed by an LR layer to classify the comments based on those representations.

Davidson et al. (2017) produced a dataset of about 25k racist, offensive or clean tweets. They evaluated several multi-class classifiers with the aim of discerning clean tweets from racist and offensive tweets, while simultaneously being able to distinguish between the racist and offensive ones. Their best model was an LR classifier trained using TF–IDF and POS n-gram features coupled with features like the count of hashtags and number of words.
Datasets

Following the proceedings of the 1st Workshop on Abusive Language Online (Waseem et al., 2017), we use three datasets from two different domains.
Waseem and Hovy (2016) prepared a dataset of 16,914 tweets from a corpus of approximately 136k tweets retrieved over a period of two months. They bootstrapped their collection process with a search for commonly used slurs and expletives related to religious, sexual, gender and ethnic minorities. After manually annotating the tweets as racism, sexism or neither, they asked an expert to review their annotations in order to mitigate against any biases. The inter-annotator agreement was reported at κ = 0.84, with the further insight that 85% of all the disagreements occurred in the sexism class alone.

The authors released the dataset as a list of tweet IDs and their corresponding annotations. We could only retrieve 16,202 of the tweets with Python's Tweepy library, since some of them have been deleted or their visibility has been limited. Of the ones retrieved, 1,939 (12%) are racism, 3,148 (19.4%) are sexism, and the remaining 11,115 (68.6%) are neither; the original dataset has a similar distribution, i.e., 11.7% racism, 20.0% sexism, and 68.3% neither.

Wulczyn et al. (2017) extracted approximately 63M talk page comments from a public dump of the full history of English Wikipedia released in January 2016. From this corpus, they randomly sampled comments to form three datasets on personal attack, toxicity and aggression, and engaged workers from CrowdFlower to annotate them. Noting that the datasets were highly skewed towards the non-abusive classes, the authors oversampled comments from banned users to attain a more uniform distribution.

In this work, we utilize the toxicity and personal attack datasets, henceforth referred to as W-TOX and W-ATT respectively. Each comment in both of these datasets was annotated by at least 10 workers. We use the majority annotation of each comment to resolve its gold label: if a comment is deemed toxic (alternatively, attacking) by more than half of the annotators, we label it as abusive; otherwise, as non-abusive. 13,590 (11.7%) of the 115,864 comments in W-ATT and 15,362 (9.6%) of the 159,686 comments in W-TOX are abusive. Wikipedia comments, with an average length of 25 tokens, are considerably longer than the tweets, which have an average length of 8.
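As a small illustration of the gold-label resolution described above, the following sketch derives the abusive/non-abusive label from per-comment annotator votes; the annotation values here are toy data, not the released datasets.

```python
# Resolving gold labels by majority vote, as described above: a comment is
# 'abusive' if more than half of its (>= 10) annotators deem it
# toxic/attacking. The annotations below are toy values.
annotations = {
    "comment_1": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 1 = toxic/attacking
    "comment_2": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
}

gold = {
    cid: "abusive" if sum(votes) > len(votes) / 2 else "non-abusive"
    for cid, votes in annotations.items()
}
print(gold)  # {'comment_1': 'abusive', 'comment_2': 'non-abusive'}
```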
Methods

We experiment with ten different methods, eight of which have an RNN operating on word embeddings. Six of these eight also include character n-gram features, and four further integrate our word composition model. The remaining two comprise an RNN that works directly on character inputs.
Hidden-state (HS). As our first baseline, we adopt the "RNN" method of Pavlopoulos et al. (2017) since it produces state-of-the-art results on the Wikipedia datasets. Given a text formed of a sequence w_1, ..., w_n of words (represented by d-dimensional word embeddings), the method utilizes a 1-layer GRU to encode the words into hidden states h_1, ..., h_n. This is followed by an LR layer that classifies the text based on the last hidden state h_n. We modify the authors' original architecture in two minor ways: we extend the 1-layer GRU to a 2-layer GRU and use softmax as the activation in the LR layer instead of sigmoid. (We also experimented with a 1-layer GRU/LSTM and 1/2-layer bi-directional GRUs/LSTMs, but the performance only worsened or showed no gains; using sigmoid instead of softmax did not have any noteworthy effects on the results either.)

Following Pavlopoulos et al., we initialize the word embeddings to GloVe vectors (Pennington et al., 2014). In all our methods, words not present in the GloVe set are randomly initialized within a small range around zero, indicating the lack of semantic information. By not mapping these words to a single random embedding, we mitigate against the errors that may arise due to their conflation (Madhyastha et al., 2015). A special OOV (out of vocabulary) token is also initialized in the same range. All the embeddings are updated during training, allowing for some of the randomly-initialized ones to get task-tuned (Kim, 2014); the ones that do not get tuned lie closely clustered around the OOV token, to which unseen words in the test set are mapped.
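To make the architecture concrete, below is a minimal sketch of the HS baseline in Keras (the framework used in our experiments). Variable names such as MAX_LEN and VOCAB_SIZE, as well as the random initialization range, are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the HS baseline: GloVe-initialized embeddings, a
# 2-layer GRU, and a softmax LR layer over the last hidden state h_n.
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

MAX_LEN = 100        # maximum tokens per text (assumption)
VOCAB_SIZE = 50000   # word vocabulary size (assumption)
EMBED_DIM = 300      # 300d for Wikipedia, 200d for Twitter
NUM_CLASSES = 2      # e.g., abusive vs. non-abusive

# GloVe-initialized matrix; rows for words outside GloVe are drawn from a
# small range around zero (+/-0.05 here is an illustrative choice).
embedding_matrix = np.random.uniform(-0.05, 0.05, (VOCAB_SIZE, EMBED_DIM))

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM, weights=[embedding_matrix],
              input_length=MAX_LEN, trainable=True),  # embeddings get task-tuned
    GRU(128, return_sequences=True),   # first GRU layer
    GRU(128),                          # second GRU layer; emits h_n
    Dense(NUM_CLASSES, activation='softmax'),  # LR layer over h_n
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
```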
Word-sum (WS). The "LSTM + GloVe + GBDT" method of Badjatiya et al. (2017) constitutes our second baseline. The authors first employ an LSTM to task-tune GloVe-initialized word embeddings by propagating error back from an LR layer. They then train a gradient-boosted decision tree (GBDT) classifier to classify texts based on the average of the constituent word embeddings. We make two minor modifications to the original method: we utilize a 2-layer GRU instead of the LSTM to tune the embeddings, and we train the GBDT classifier on the L2-normalized sum of the embeddings instead of their average. (In their work, the authors report that initializing embeddings randomly rather than with GloVe yields state-of-the-art performance on the Twitter dataset that we are using. However, we found the opposite when performing 10-fold stratified cross-validation (CV). A possible explanation of this lies in the authors' decision to not use stratification, which for such a highly imbalanced dataset can lead to unexpected outcomes (Forman and Scholz, 2010). Furthermore, the authors train their LSTM on the entire dataset, including the test part, without any early stopping criterion, which facilitates over-fitting of the randomly-initialized embeddings. The deeper 2-layer GRU slightly improves performance. The L2-normalized sum ensures uniformity of range across the feature set in all our methods; GBDT, being a tree-based model, is not affected by the choice of monotonic function.)

Hidden-state + char n-grams (HS + CNG). Here we extend the hidden-state baseline: we train the 2-layer GRU architecture as before, but now concatenate its last hidden state h_n with L2-normalized character n-gram counts to train a GBDT classifier.
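A minimal sketch of the HS + CNG feature construction follows, using scikit-learn for the character n-gram counts; the toy corpus and the way the GRU states are obtained are illustrative assumptions.

```python
# Sketch of HS + CNG features: the last GRU hidden state h_n concatenated
# with L2-normalized character n-gram counts; illustrative only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

texts = ["you are a a55h0le", "have a nice day"]  # toy corpus

# 1-5 character n-grams for Wikipedia (1-4 for Twitter).
vectorizer = CountVectorizer(analyzer='char', ngram_range=(1, 5))
cng = vectorizer.fit_transform(texts).toarray().astype(float)
cng = normalize(cng, norm='l2')  # L2-normalize the counts

# hidden_states: last hidden states h_n from the trained 2-layer GRU,
# e.g., extracted with a truncated Keras model (illustrative placeholder).
hidden_states = np.zeros((len(texts), 128))

features = np.hstack([hidden_states, cng])  # input to the GBDT classifier
```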
Augmented hidden-state + char n-grams (AUGMENTED HS + CNG). In the above methods, unseen words in the test set are simply mapped to the OOV token since we do not have a way of obtaining any semantic information about them. However, this is undesirable since racial slurs and expletives are often deliberately fudged by users to prevent detection. In using a single embedding for all unseen words, we lose the ability to distinguish such obfuscations from other non-obfuscated or rare words. Taking inspiration from the effectiveness of character-level features in abuse detection, we address this issue by having a character-based word composition model that can compose embeddings for unseen words in the test set (Pinter et al., 2017). We then augment the hidden-state + char n-grams method with it.
Specifically, our model (Figure 1b) comprises a 2-layer bi-directional LSTM, followed by a hidden layer with tanh non-linearity and an output layer at the end. The model takes as input a sequence c_1, ..., c_k of characters, represented as one-hot vectors, from a fixed vocabulary (i.e., lowercase English alphabet and digits) and outputs a d-dimensional embedding for the word 'c_1 ... c_k'. The bi-directionality of the LSTM allows for the semantics of both the prefix and the suffix (last forward and backward hidden states) of the input word to be captured, which are then combined to form the hidden state for the input word. The model is trained by minimizing the mean squared error (MSE) between the embeddings that it produces and the task-tuned embeddings of words in the training set. This ensures that newly composed embeddings are endowed with characteristics from both the GloVe space as well as the task-tuning process. While approaches like that of Bojanowski et al. (2017) can also compose embeddings for unseen words, they cannot endow the newly composed embeddings with characteristics from the task-tuning process; this may constitute a significant drawback (Kim, 2014).

During the training of our character-based word composition model, to emphasize frequent words, we feed a word as many times as it appears in the training corpus. We note that a 1-layer CNN with global max-pooling in place of the 2-layer LSTM provides comparable performance while requiring significantly less time to train. This is expected since words are not very long sequences, and the filters of the CNN are able to capture the different character n-grams within them.
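The following is a minimal sketch of the word composition model described above, trained with MSE against the task-tuned embeddings; the sizes and names (MAX_WORD_LEN, CHAR_VOCAB) are illustrative assumptions.

```python
# Sketch of the character-based word composition model: a 2-layer
# bi-directional LSTM over one-hot characters, a tanh hidden layer, and a
# linear output layer producing the d-dimensional word embedding.
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense

MAX_WORD_LEN = 20   # maximum characters per word (assumption)
CHAR_VOCAB = 36     # lowercase English alphabet + digits
EMBED_DIM = 300     # dimensionality of the target word embeddings

composer = Sequential([
    Bidirectional(LSTM(256, return_sequences=True),
                  input_shape=(MAX_WORD_LEN, CHAR_VOCAB)),
    Bidirectional(LSTM(256)),       # combines prefix (forward) and suffix (backward)
    Dense(256, activation='tanh'),  # hidden layer with tanh non-linearity
    Dense(EMBED_DIM),               # outputs the composed embedding
])
# Targets are the task-tuned embeddings of training-set words; each word is
# fed as many times as it appears in the training corpus.
composer.compile(loss='mse', optimizer='adam')
```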
Context hidden-state + char n-grams (CONTEXT HS + CNG). In the augmented hidden-state + char n-grams method, the word composition model infers the semantics of unseen words solely on the basis of the characters in them. However, for many words, semantic inference and sense disambiguation require context, i.e., knowledge of character sequences in the vicinity. An example is the word cnt, which has different meanings in the sentences "I cnt undrstand this!" and "You feminist cnt!", i.e., cannot in the former and the sexist slur cunt in the latter. Yet another example is an obfuscation like "You mot herf ucker!", where the expletive motherfucker cannot be properly inferred from any fragment without the knowledge of surrounding character sequences.
Figure 1: Context-aware approach to word composition. The figure on the left (a) shows how the encoder extracts context-aware representations of characters in the phrase "cat sat on" from their one-hot representations. The dotted lines denote the space character ⊔, which demarcates word boundaries. The semantics of an unseen word, e.g., sat, can then be inferred by our word composition model, shown on the right (b).

To address this, we develop context-aware representations for characters as inputs to our character-based word composition model instead of one-hot representations. We introduce an encoder architecture to produce the context-aware representations. Specifically, given a text formed of a sequence w_1, ..., w_n of words, the encoder takes as input one-hot representations of the characters c_1, ..., c_k within the concatenated sequence 'w_1 ⊔ ... ⊔ w_n', where ⊔ denotes the space character. This input is passed through a bi-directional LSTM that produces hidden states h_1, ..., h_k, one for every character. Each hidden state, referred to as a context-aware character representation, is the average of its designated forward and backward states; hence, it captures both the preceding as well as the following contexts of the character it corresponds to. Figure 1 illustrates how the context-aware representations are extracted and used for inference by our character-based word composition model. The model is trained in the same manner as in the augmented hidden-state + char n-grams method, i.e., by minimizing the MSE between the embeddings that it produces and the task-tuned embeddings of words in the training set (initialized with GloVe); however, the inputs are now context-aware representations of characters instead of one-hot representations. (We also experimented with word-level context but did not get any significant improvements. We believe this is due to higher variance at the word level than at the character level.)
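A minimal sketch of the context-aware character encoder follows; the text length, vocabulary size, and variable names are illustrative assumptions.

```python
# Sketch of the context-aware character encoder: a bi-directional LSTM over
# the one-hot characters of the whole text, where each character's
# representation is the average of its forward and backward hidden states.
from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional

MAX_TEXT_CHARS = 500  # maximum characters per text (assumption)
CHAR_VOCAB = 37       # lowercase alphabet + digits + the space character

chars = Input(shape=(MAX_TEXT_CHARS, CHAR_VOCAB))
# merge_mode='ave' averages the forward and backward states per character.
context_reps = Bidirectional(LSTM(64, return_sequences=True),
                             merge_mode='ave')(chars)
encoder = Model(inputs=chars, outputs=context_reps)
# The per-character outputs of `encoder` replace the one-hot vectors as
# inputs to the word composition model, which is again trained with MSE.
```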
Word-sum + char n-grams (WS + CNG), Augmented word-sum + char n-grams (AUGMENTED WS + CNG), and Context word-sum + char n-grams (CONTEXT WS + CNG). These methods are identical to the (context/augmented) hidden-state + char n-grams methods except that here we include the character n-grams and our character-based word composition model on top of the word-sum baseline.
Char hidden-state (CHAR HS) and Char word-sum (CHAR WS). In all the methods described up till now, the input to the core RNN is word embeddings. To gauge whether character-level inputs are themselves sufficient, we construct two methods based on the character-to-word (C2W) approach of Ling et al. (2015). For the char hidden-state method, the input is one-hot representations of characters from a fixed vocabulary. These representations are encoded into a sequence w_1, ..., w_n of intermediate word embeddings by a 2-layer bi-directional LSTM. The word embeddings are then fed into a 2-layer GRU that transforms them into hidden states h_1, ..., h_n. Finally, as in the hidden-state baseline, an LR layer with softmax activation uses the last hidden state h_n to perform classification while propagating error backwards to train the network. The char word-sum method is similar except that, once the network has been trained, we use the intermediate word embeddings produced by it to train a GBDT classifier in the same manner as in the word-sum baseline.
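A minimal sketch of the CHAR HS architecture is given below; the nesting of the character-level word encoder inside the sentence-level GRU is the point being illustrated, and all shapes and names are assumptions.

```python
# Sketch of CHAR HS: a character-level word encoder (2-layer bi-directional
# LSTM) applied to every word position, followed by a 2-layer GRU and a
# softmax LR layer over the last hidden state.
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, GRU, Dense, TimeDistributed

MAX_WORDS = 100     # words per text (assumption)
MAX_WORD_LEN = 20   # characters per word (assumption)
CHAR_VOCAB = 36     # lowercase alphabet + digits
NUM_CLASSES = 2

# Encodes one word's one-hot characters into an intermediate word embedding.
word_encoder = Sequential([
    Bidirectional(LSTM(64, return_sequences=True),
                  input_shape=(MAX_WORD_LEN, CHAR_VOCAB)),
    Bidirectional(LSTM(64)),
])

model = Sequential([
    # Apply the word encoder to each word position independently.
    TimeDistributed(word_encoder,
                    input_shape=(MAX_WORDS, MAX_WORD_LEN, CHAR_VOCAB)),
    GRU(128, return_sequences=True),
    GRU(128),
    Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

For CHAR WS, the same trained network's intermediate word embeddings would instead be summed and passed to a GBDT classifier, as in the word-sum baseline.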
Experimental Setup

We normalize the input by lowercasing all words and removing stop words. For the GRU architecture, we use exactly the same hyper-parameters as Pavlopoulos et al. (2017), i.e., 128 hidden units, Glorot initialization, cross-entropy loss, and the Adam optimizer (Kingma and Ba, 2015). Badjatiya et al. (2017) also use the same settings, except that they have fewer hidden units. The LSTM in our character-based word composition model has 256 hidden units, while that in our encoder has 64; the CNN has filters of widths varying from 1 to 4. The results we report are with an LSTM-based word composition model. In all the models, besides dropout regularization (Srivastava et al., 2014), we hold out a small part of the training set as validation data to prevent over-fitting. We use 300d embeddings and 1 to 5 character n-grams for Wikipedia, and 200d embeddings and 1 to 4 character n-grams for Twitter. We implement the models in Keras (Chollet et al., 2015) with the Theano back-end. We employ LightGBM (Ke et al., 2017) as our GBDT classifier and tune its hyper-parameters using 5-fold grid search.
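A minimal sketch of the GBDT training with LightGBM and a 5-fold grid search follows; the parameter grid and the toy feature matrix are illustrative assumptions, not the grid used in our experiments.

```python
# Sketch of GBDT training with LightGBM and 5-fold grid search.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 428)      # e.g., hidden state + char n-gram features
y = np.random.randint(0, 2, 200)  # toy labels

param_grid = {                    # illustrative grid, not the tuned values
    'num_leaves': [31, 63],
    'learning_rate': [0.05, 0.1],
    'n_estimators': [100, 200],
}
search = GridSearchCV(LGBMClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```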
Results

For the Twitter dataset, unlike previous research (Badjatiya et al., 2017; Park and Fung, 2017), we report the macro precision, recall, and F1 averaged over 10 folds of stratified CV (Table 1). (The authors have not released their models; we replicate their methods based on the details in their papers.) For a classification problem with N classes, macro precision (similarly, macro recall and macro F1) is given by:

\[ \text{Macro-}P = \frac{1}{N} \sum_{i=1}^{N} P_i \]

where P_i denotes the precision on class i. Macro metrics provide a better sense of effectiveness on the minority classes (Van Asch, 2013).

We observe that character n-grams (CNG) consistently enhance performance, while our augmented approach (AUGMENTED) further improves upon the results obtained with character n-grams. All the improvements are statistically significant with p < 0.05 under a 10-fold CV paired t-test. As Ling et al. (2015) noted in their POS tagging experiments, we observe that the CHAR HS and CHAR WS methods perform worse than their counterparts that use pre-trained word embeddings, i.e., the HS and WS baselines respectively.

To further analyze the performance of our best methods (CONTEXT/AUGMENTED WS/HS + CNG), we also examine the results on the racism and sexism classes individually (Table 2). As before, we see that our approach consistently improves over the baselines, and the improvements are statistically significant under paired t-tests.
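The macro-averaged metrics reported above correspond to scikit-learn's average='macro' setting; a minimal sketch with toy labels (not our actual predictions):

```python
# Macro precision/recall/F1 as defined above: the unweighted mean of the
# per-class scores, computed here with scikit-learn on toy predictions.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ['racism', 'sexism', 'neither', 'neither', 'sexism']
y_pred = ['racism', 'neither', 'neither', 'neither', 'sexism']

print(precision_score(y_true, y_pred, average='macro'))
print(recall_score(y_true, y_pred, average='macro'))
print(f1_score(y_true, y_pred, average='macro'))
```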
Table 1: Results on the Twitter dataset (macro P, R, and F1). Rows: HS; CHAR HS; HS + CNG†; AUGMENTED HS + CNG†; CONTEXT HS + CNG†; WS; CHAR WS; WS + CNG†; AUGMENTED WS + CNG†; CONTEXT WS + CNG†. The methods we propose are denoted by †. Our best method (AUGMENTED WS + CNG) significantly outperforms all other methods.
Table 2: The baselines (WS, HS) vs. our best approaches (†) on the racism and sexism classes (macro P, R, and F1). Rows for each class: HS; AUGMENTED HS + CNG†; CONTEXT HS + CNG†; WS; AUGMENTED WS + CNG†; CONTEXT WS + CNG†.

Additionally, we note that the AUGMENTED WS + CNG method improves the F1 score of the WS + CNG method from 74.12 to 75.01 for the racism class, and from 74.03 to 74.44 for the sexism class. The AUGMENTED HS + CNG method similarly improves the F1 score of the HS + CNG method from 74.00 to 74.40 on the racism class, while making no notable difference on the sexism class.

We see that the CONTEXT HS/WS + CNG methods do not perform as well as the AUGMENTED HS/WS + CNG methods. One reason for this is that the Twitter dataset is not able to expose the methods to enough contexts due to its small size. Moreover, because the collection of this dataset was bootstrapped with a search for certain commonly-used abusive words, many such words are shared across multiple tweets belonging to different classes. Given the above, context-aware character representations perhaps do not provide substantial distinctive information.
Following previous work (Pavlopoulos et al., 2017; Wulczyn et al., 2017), we conduct a standard 60:40 train–test split experiment on the two Wikipedia datasets: 60% of the comments in each dataset are used for training and the remaining 40% for testing. Table 3 reports the macro F1 scores. We do not report scores from the CHAR HS and CHAR WS methods since they showed poor preliminary results compared to the HS and WS baselines.

Table 3: Macro F1 scores on the two Wikipedia datasets (W-TOX and W-ATT). Rows: HS; HS + CNG†; AUGMENTED HS + CNG†; CONTEXT HS + CNG†; WS; WS + CNG†; AUGMENTED WS + CNG†; CONTEXT WS + CNG†. The current state-of-the-art method for these datasets is HS; † denotes the methods we propose. Our best method (CONTEXT HS + CNG) outperforms all the other methods.

Mirroring the analysis carried out for the Twitter dataset, Table 4 further compares the performance of our best methods for Wikipedia (CONTEXT/AUGMENTED HS + CNG) with that of the state-of-the-art baseline (HS), specifically on the abusive classes of W-TOX and W-ATT.

Table 4: The current state-of-the-art baseline (HS) vs. our best methods (†) on the abusive classes of (a) W-TOX and (b) W-ATT (macro P, R, and F1). Rows for each dataset: HS; AUGMENTED HS + CNG†; CONTEXT HS + CNG†.

We observe that the augmented approach substantially improves over the state-of-the-art baseline. Unlike in the case of Twitter, our context-aware setup for word composition is now able to further enhance performance, courtesy of the larger size of the datasets, which increases the availability of contexts. All improvements are significant (p < 0.05) under paired t-tests. We note, however, that the gains we get here with the word composition model are relatively small compared to those we get for Twitter. This difference can be explained by the fact that: (i) Wikipedia comments are less noisy than the tweets and contain fewer obfuscations, and (ii) the Wikipedia datasets, being much larger, expose the methods to more words during training, hence reducing the likelihood of unseen words being important to the semantics of the comments they belong to (Kim et al., 2016).

Like Pavlopoulos et al. (2017), we see that the methods that involve summation of word embeddings (WS) perform significantly worse on the Wikipedia datasets compared to those that use the hidden state (HS); however, their performance is comparable or even superior on the Twitter dataset. This contrast is best explained by the observation of Nobata et al. (2016) that taking the average or sum of word embeddings compromises contextual and word order information. While this is beneficial in the case of tweets, which are short and loosely structured, it leads to poor performance of the WS and WS + CNG methods on the Wikipedia datasets, with the addition of the word composition model (CONTEXT/AUGMENTED WS + CNG) providing little to no improvements.
Table 5: Improved classification upon the addition of character n-grams (CNG) and our word composition model (AUGMENTED). Names of users have been replaced with @mention for anonymity.

Abusive sample | WS | WS + CNG | AUGMENTED WS + CNG
@mention I love how the Islamofascists recruit 14 and 15 year old jihadis and then talk about minors in reference to 17 year olds. | neither | racism | racism
@mention @mention @mention As a certified inmate of the Islamasylum, you don't have the ability to judge. | neither | racism | racism
@mention "I'll be ready in 5 minutes" from a girl usually means "I'll be ready in 20+ minutes." | | |
Analysis

To investigate the extent to which obfuscated words can be a problem, we extract a number of statistics. Specifically, we notice that a considerable number of the unique tokens present in the Twitter dataset cannot be found in the English dictionary (we use the US English spell-checking utility provided by the PyEnchant library of Python). Some of these tokens are present in the racist tweets, others in the sexist tweets, and the rest in tweets that are neither. Examples from the racist tweets include fuckbag, ezidiz, islamofascists, islamistheproblem, islamasylum and isisaremuslims, while those from the sexist tweets include c*nt, bbbbitch, feminismisawful, and stupidbitch. Given that the racist and sexist tweets come from a small number of unique users, 5 and 527 respectively, we believe that the presence of obfuscated words would be even more pronounced if tweets were procured from more unique users.

In the case of the Wikipedia datasets, a large number of unique tokens in the abusive comments of both W-TOX and W-ATT are not attested in the English dictionary. Examples of such tokens from W-TOX include fuggin, n*gga, and fuycker; and from W-ATT, f**king, beeeitch, musulmans, and motherfucken. In comparison to the tweets, the Wikipedia comments use more "standard" language. This is validated by the fact that only 14% of the tokens present in W-TOX and W-ATT are absent from the English dictionary, as opposed to 32% of the tokens in the Twitter dataset, even though the Wikipedia datasets are almost ten times larger.

Across the three datasets, we note that the addition of character n-gram features enhances the performance of RNN-based methods, corroborating the previous findings that they capture complementary structural and lexical information of words. The inclusion of our character-based word composition model yields state-of-the-art results on all the datasets, demonstrating the benefits of inferring the semantics of unseen words. Table 5 shows some abusive samples from Twitter that are misclassified by the WS baseline method but are correctly classified upon the addition of character n-grams (WS + CNG) and the further addition of our character-based word composition model (AUGMENTED WS + CNG).

Many of the abusive tweets that remain misclassified by the AUGMENTED WS + CNG method are those that are part of some abusive discourse (e.g., @Mich McConnell Just "her body" right?) or contain URLs to abusive content (e.g., @salmonfarmer1: Logic in the world of Islam http://t.co/6nALv2HPc3). In the case of the Wikipedia datasets, there are abusive examples like smyou have a message re your last change, go fuckyourself!!! and F-uc-k you, a-ss-hole Motherf–ucker! that are misclassified by the state-of-the-art HS baseline and the HS + CNG method but are correctly classified by our best method for the datasets, i.e., CONTEXT HS + CNG.
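The dictionary-membership statistics above rely on a simple per-token lookup; a minimal sketch of such a check with PyEnchant (the token list here is a toy assumption):

```python
# Counting tokens absent from the US English dictionary with PyEnchant,
# as in the statistics above; the token list is a toy example.
import enchant

dictionary = enchant.Dict("en_US")
tokens = ["women", "w0m3n", "jihad", "jihaaadi", "stupidbitch"]

out_of_dictionary = [t for t in tokens if not dictionary.check(t)]
print(len(out_of_dictionary) / len(tokens))  # fraction of out-of-dictionary tokens
```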
Table 6: Words in the training set that exhibit high cosine similarity to the given word. The ones marked with † are not seen during training; embeddings for them are composed using our word composition model.

Word | Similar words in training set
women | girls, woman, females, chicks, ladies
w0m3n † | woman, women, girls, ladies, chicks
cunt | twat, prick, faggot, slut, asshole
a5sh0les † | assholes, stupid, cunts, twats, faggots
stupidbitch † | idiotic, stupid, dumb, ugly, women
jihad | islam, muslims, sharia, terrorist, jihadi
jihaaadi † | terrorists, islamist, jihadists, muslims
terroristislam † | terrorists, muslims, attacks, extremists
fuckyouass † | fuck, shit, fucking, damn, hell

To ascertain the effectiveness of our task-tuning process for embeddings, we conducted a qualitative analysis, validating that semantically similar words cluster together in the embedding space. Analogously, we assessed the merits of our word composition model by verifying the neighbors of embeddings it forms for obfuscated words not seen during training. Table 6 provides some examples. We see that our model correctly infers the semantics of obfuscated words, even in cases where the obfuscation is by concatenation of words.

Conclusion

In this paper, we considered the problem of obfuscated words in the field of automated abuse detection. Working with three datasets from two different domains, namely Twitter and Wikipedia talk pages, we first comprehensively replicated the previous state-of-the-art RNN methods for the datasets. We then showed that character n-grams capture complementary information and are hence able to enhance the performance of the RNNs. Finally, we constructed a character-based word composition model in order to infer semantics for unseen words, and further extended it with context-aware character representations. The integration of our composition model with the enhanced RNN methods yielded the best results on all three datasets. We have experimentally demonstrated that our approach to modeling obfuscated words significantly advances the state of the art in abuse detection. In the future, we wish to explore its efficacy in tasks such as grammatical error detection and correction. We will make our models and logs of experiments publicly available at https://github.com/pushkarmishra/AbuseDetection.

Acknowledgements
Special thanks to the anonymous reviewers for their valuable comments and suggestions.
References
Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, WWW '17 Companion, pages 759–760, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

François Chollet et al. 2015. Keras.

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM '17.

Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion, pages 29–30, New York, NY, USA. ACM.

Maeve Duggan. 2014. Online harassment.

George Forman and Martin Scholz. 2010. Apples-to-apples in cross-validation studies: Pitfalls in classifier performance measurement. SIGKDD Explor. Newsl., 12(1):49–57.

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3149–3157. Curran Associates, Inc.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.

Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2016. Character-aware neural language models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pages 2741–2749. AAAI Press.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR '15.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on International Conference on Machine Learning, ICML '14.

Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luis Marujo, and Tiago Luis. 2015. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1520–1530. Association for Computational Linguistics.

Pranava Swaroop Madhyastha, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2015. Mapping unseen words to task-trained embedding spaces. CoRR, abs/1510.02387.

Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, and Ekaterina Shutova. 2018. Author profiling for abuse detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1088–1098. Association for Computational Linguistics.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 145–153, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Ji Ho Park and Pascale Fung. 2017. One-step and two-step classification for abusive language detection on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pages 41–45. Association for Computational Linguistics.

John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. 2017. Deep learning for user comment moderation. In Proceedings of the First Workshop on Abusive Language Online, pages 25–35. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. 2017. Mimicking word embeddings using subword RNNs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 102–112. Association for Computational Linguistics.

J. Qian, M. ElSherief, E. Belding, and W. Wang. 2018. Leveraging intra-user and inter-user representation learning for automated hate speech detection. In NAACL HLT, New Orleans, LA, June 2018, page to appear.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.

Vincent Van Asch. 2013. Macro- and micro-averaged evaluation measures [[basic draft]]. Computational Linguistics & Psycholinguistics, University of Antwerp, Belgium.

Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142. Association for Computational Linguistics.

Zeerak Waseem, Wendy Hui Kyong Chung, Dirk Hovy, and Joel Tetreault. 2017. Proceedings of the First Workshop on Abusive Language Online. Association for Computational Linguistics.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93, San Diego, California. Association for Computational Linguistics.

Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In
Proceedings of the 26th International Conference on World Wide Web, WWW '17, pages 1391–1399, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Dawei Yin, Brian D. Davison, Zhenzhen Xue, Liangjie Hong, April Kontostathis, and Lynne Edwards. 2009. Detection of harassment on Web 2.0. In Proceedings of the Content Analysis in the Web 2.0 (CAW2.0) Workshop.