TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook

Abstract

Since its inception, Facebook has become an integral part of the online social community. People rely on Facebook to make connections with others and build communities. As a result, it is paramount to protect the integrity of such a rapidly growing network in a fast and scalable manner. In this paper, we present our efforts to protect various social media entities at Facebook from people who try to abuse our platform. We present a novel Temporal Interaction EmbeddingS (TIES) model that is designed to capture rogue social interactions and flag them for further suitable actions. TIES is a supervised, deep learning, production ready model at Facebook-scale networks. Prior works on integrity problems are mostly focused on capturing either only static or certain dynamic features of social entities. In contrast, TIES can capture both these variant behaviors in a unified model owing to the recent strides made in the domains of graph embedding and deep sequential pattern learning. To show the real-world impact of TIES, we present a few applications especially for preventing spread of misinformation, fake account detection, and reducing ads payment risks in order to enhance the platform's integrity.


Nima Noorshams, Core Data Science
Saurabh Verma, Core Data Science
Aude Hofleitner, Core Data Science


1. INTRODUCTION

People use online social media such as Facebook to connect with family and friends, build communities, and share experiences every day. But the rapid growth of social media in recent years has introduced several challenges. First is the rise of fake and inauthentic accounts, which pose potential threats to the safety of online communities. Second is the rise of threatening and/or disparaging content such as hate-speech, misinformation, bullying, terrorist propaganda, etc. These can occur through both authentic and inauthentic accounts. In Q1-2019, Facebook acted on 4 million and 2.6 million pieces of content for violating hate-speech and bullying policies, respectively [3; 2]. We broadly refer to these as social media integrity challenges.

There is a volume of research on fake account detection in online social media. Graph propagation techniques to detect spam over user-following graphs have been proposed in [32; 24]. Integro [8] focuses on predicting victims using user-level features and performs a random-walk-style algorithm on a modified graph. Other works utilize the graph structure by clustering and identifying cohorts of malicious actors that share common IP addresses and other common networking properties [29; 33; 31]. Researchers in [20] create graphs based on user activities in order to detect fake engagements. Similarly, hand-designed features based on activity have been used in [9].

There is also a volume of research on content-based integrity problems [5; 6]. Natural language processing techniques have been widely used for hate-speech and cyberbullying detection [34; 26; 10; 14; 23]. Simple token and character-level n-grams are included in the feature set by [34; 26; 10]. Word topic distribution using Latent Dirichlet Allocation has been used by [14] to detect cyberbullying on Instagram. Alternatively, paragraph embedding for hate-speech detection was proposed in [23]. Dinakar et al. [17] presented a knowledge-based approach utilizing domain-specific assertions in detecting hate-speech. In [15; 14], the authors combine image and other media features with text, essentially incorporating context through multimodal information. User metadata such as the violation history, number of profane words in prior comments, gender, etc. have also been shown to be predictive [22; 36]. More recently, deep learning has been used to fight child pornography [25], hate-speech [13; 27], and misinformation [16; 12].

The majority of previous approaches tackling integrity challenges are static in nature. More specifically, they utilize engineered user-level, graph, or content features that do not change over time. However, entities on social media (accounts, posts, stories, Groups, Pages, etc.) generate numerous interactions from other entities over time (see Figure 1). For instance,
• posts get likes, shares, comments, etc. by users, or
• accounts send or reject friend requests, send or block messages, etc. from other accounts.
These temporal sequences can potentially reveal a lot about the entities to which they belong. The manner in which fake accounts behave is different from normal accounts. Hateful posts generate different types of engagement compared to regular posts. Not only the type but also the target of these engagements can be informative. For instance, an account with a history of spreading hate or misinformation sharing or engaging positively with a post can be indicative of a piece of questionable content.

In this work, we present Temporal Interaction EmbeddingS (TIES), a supervised deep-learning model for encoding interactions between social media entities for integrity purposes. As its input, TIES takes a sequence of (source, target, action) triplets in addition to miscellaneous source and target features. It then learns model parameters by minimizing a loss function over a labeled dataset. Finally, it outputs prediction scores as well as embedding vectors.

Figure 1: Entities on social media interact with each other in numerous ways: (a) user-user interactions for fake account detection; (b) post-user interactions for hate-speech detection. Interactions generated by bad entities differ from normal entities. We can enhance the platform integrity by capturing/encoding these interactions.

arXiv preprint (cs.AI), February.

There have also been other works on temporal interaction networks and embeddings. Recently and simultaneously to our work, JODIE, a novel embedding technique to learn joint user and item embeddings from sequential data, was proposed [18]. While these authors apply their algorithm to the problem of detecting banned accounts from Wikipedia and Reddit, which is similar to the problem of fake account detection, their work is different from ours. For example, they apply two recurrent neural networks to learn joint user-item embeddings based on temporal interaction sequences, while we only learn the embedding for the source entities. In order to leverage the target entities' current state, we also use pre-existing embeddings or application-specific feature sets, which JODIE does not. A notable limitation of JODIE and other existing approaches is scalability. On Facebook, we have billions of accounts and trillions of interactions per day. Therefore, scalability and reasonable computational costs are of utmost importance.

The remainder of the paper is organized as follows. We begin in Section 2 with the problem formulation and description of the model architecture. In Section 3, we discuss a couple of integrity case studies and results. Finally, we conclude the paper with discussions in Section 4.

2. PROTECTING FACEBOOK SOCIAL MEDIA INTEGRITY

We start by providing a mathematical formulation for solving various social media integrity problems encountered on Facebook's platform and subsequently present the TIES model in detail.

At Facebook, we are interested in verifying the integrity of various social media entities such as accounts, posts, Pages, Groups, etc. As mentioned earlier, in this work we exploit the interaction information between such entities to determine their integrity. We refer to an entity under inspection as the source (denoted by $u$), while the other entities it interacts with are referred to as targets (denoted by $v$). Suppose at time $t$, the source entity $u$ interacts with target entity $v_t$ by taking an action $a_t$ such as receiving a friend request, sending an event invite, or liking a post. We might also have source- and target-specific features $f_t$ at each timestamp (e.g., text or image related features or time gaps between consecutive actions). As a result, we will have a sequence of temporal interactions represented by $I = \{(u, v_1, a_1, f_1), (u, v_2, a_2, f_2), \ldots, (u, v_T, a_T, f_T)\}$. Based on the interaction sequence $I$, TIES determines the integrity of $u$, for instance, how likely the entity is to be a fake account or a hateful post. We do so by training a supervised model using a labeled training set $\{(I_k, l_k)\}_{k=1}^{N}$, where $l_k$ is the ground truth label for the sequence $I_k$. Thus, in our framework solving social media integrity is formulated as a sequential or temporal learning problem.

At the core of the TIES model, there are two types of embeddings: 1) graph based and 2) temporal based. These embeddings are constantly trained to capture the ongoing behavior of the entities. We first make use of a large-scale Facebook graph to capture the static behavior of the entities. Subsequently, these graph-based embeddings are used to initialize the temporal model that captures the dynamic behavior of the entities. This is a distinguishing feature of TIES that has not been explored in prior integrity works. We now describe these parts in more detail.
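The interaction sequence $I$ and a labeled training pair $(I_k, l_k)$ can be pictured as a small data structure. The following is a minimal sketch, not the production representation: the class name `Interaction`, the helper `build_sequence`, the entity identifiers, and the label encoding are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One element (u, v_t, a_t, f_t) of the interaction sequence I."""
    source: str                        # entity under inspection, u
    target: str                        # entity interacted with, v_t
    action: str                        # action type, a_t (e.g. "like")
    timestamp: float                   # time t of the interaction, in seconds
    features: Tuple[float, ...] = ()   # optional extra features, f_t


def build_sequence(source: str, events: List[Interaction]) -> List[Interaction]:
    """Keep only the events of `source` and order them by time."""
    own = [e for e in events if e.source == source]
    return sorted(own, key=lambda e: e.timestamp)


# Toy event log (made-up identifiers).
events = [
    Interaction("post_1", "user_b", "share", 30.0),
    Interaction("post_1", "user_a", "like", 10.0),
    Interaction("post_2", "user_a", "comment", 20.0),
]
sequence = build_sequence("post_1", events)

# A labeled training example (I_k, l_k); label 1 = violating is an assumed encoding.
example = (sequence, 1)
```

Sorting by timestamp matters for the order-sensitive encoders described later; a set-based encoder would not depend on it.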

One of the novel components of the TIES model is making use of the large-scale graph structure formed by various social media entities in addition to the dynamic interaction information. In social networks, entities may be connected to each other through friend relationships or belong to the same groups. Such prior knowledge can capture the 'static' nature of various entities. It should be noted that even though these static relations (such as friend relationships, group membership, etc.) are not truly static, they vary at a much slower pace compared to other more 'dynamic' interactions (such as post reactions, commenting, etc.). Past studies mainly focus on either static or dynamic behavior but do not account for both in the model. Moreover, the scale of the graph structure considered in this work is much greater than in previous works and thus presents unique challenges.

Figure 2: Facebook social media entity graph. To capture the static behavior of entities, the PyTorch-BigGraph system is used to compute graph embeddings. These vectors are treated as pre-trained embeddings for the TIES model.

Footnote: Actions could originate from either source or target.

Let $G = (V, R, E)$ denote a large-scale multi-relation graph formed by social media entities (see Figure 2). Here, $V$ denotes a set of nodes (i.e., entities), $R$ is a set of relations, and $E$ denotes a set of edges. Each edge $e = (s, r, d)$ consists of a source $s$, a relation $r$, and a destination $d$, where $s, d \in V$ and $r \in R$. In order to learn graph embeddings for entities in the graph $G$, we utilize the PyTorch-BigGraph (PBG) [19] distributed system due to its scalability to billions of nodes and trillions of edges. This is essential for solving our real-world needs.

Suppose $\theta_s, \theta_r, \theta_d$ are trainable parameter vectors (embeddings) associated with the source, relation, and destination. PBG assigns a score function $f(\theta_s, \theta_r, \theta_d)$ to each triplet, where higher values are expected for $(s, r, d) \in E$ as opposed to $(s, r, d) \notin E$.
PBG optimizes a margin-based ranking loss (shown below) for each edge $e$ in the training data. A set of judiciously constructed negative edges $e'$ is obtained by corrupting $e$ with either a sampled source or destination node:

$$\ell = \sum_{e \in G} \sum_{e' \in S'_e} \max\{ f(e) - f(e') + \lambda,\ 0 \}$$

Here $\lambda$ is the margin hyperparameter and $S'_e = \{(s', r, d) \mid s' \in V\} \cup \{(s, r, d') \mid d' \in V\}$. Finally, entity embeddings and relation parameters are learned by performing mini-batch stochastic gradient descent. More details about these embeddings can be found in [19]. The PBG-trained embeddings on a large-scale graph with billions of nodes and trillions of edges are fed as pre-trained source and target embeddings to the temporal model, to which we now turn.

Temporal based embeddings are designed to capture the information encoded in the sequence of interactions $I$ as discussed in Section 2.1. Consider all the interactions of a source entity $u$ in a given window of time. Suppose at time $t$, source entity $u$ interacted with target entity $v_t$ by performing an action $a_t$. This whole interaction at time $t$ is encoded into a single feature vector as follows (see Figure 3):

1. Action Features: Action $a_t$, for instance commenting or liking, is represented by a fixed-size vector that is initialized randomly and learned during the training process (also referred to as a trainable embedding). Depending upon the task, the same type of action can learn different embeddings and can have multiple meanings based on the context. As mentioned earlier, one can also encode the direction information in the action by splitting it into two directional events, a-send and a-receive.

2. Source Entity Features: Source entity $u$ is represented by a pre-trained embedding. More specifically, we utilize the graph-based embeddings obtained from the PBG system as described above. One can further fine-tune the pre-trained embeddings in the TIES model, but this is only feasible if the number of unique source entities is not greater than a few million (due to computational cost).

3. Target Entity Features: Similar to the source entity, the target entity $v_t$ is also represented by a pre-trained embedding (if available) or by a trainable embedding (if the problem dimension allows, i.e., limited to a few million). Multiple types of pre-trained target embeddings can be utilized in the same TIES model.

4. Miscellaneous Features: We can also encode useful time-related information such as the rate of interaction via $\Delta_t = t_{i+1} - t_i$ (may need to normalize the range appropriately). Rate of interaction is an important signal for detecting abusiveness. Other features like text or images can also be plugged into TIES in a similar manner.

All these features are packaged into a single feature vector by performing an aggregation operation. We obtain a single embedding capturing the full interaction information:

$$x_t = e(u) \odot e(v_t) \odot \cdots \odot \Delta_t,$$

where $\odot$ is an aggregation operator, $e(\cdot)$ represents the entity embedding, and $x_t$ is the resultant embedding obtained at time $t$. In our case, we simply choose concatenation as the aggregation operation. Next, we pass the sequence of these temporal embeddings, i.e., $X = [x_1, x_2, \ldots, x_T] \in \mathbb{R}^{T \times d}$, where $T$ is the sequence length and $d$ is the input dimension, into a sequence encoder to yield the TIES embedding $z \in \mathbb{R}^h$ as follows:

$$z = \mathrm{TIESEncoder}(X).$$

Figure 3: TIES Model Architecture: At each time step, the (source, target, action) triplet is converted into a feature vector that consists of trainable action embeddings, pre-trained source and target embeddings, and other miscellaneous features. The feature vectors are then fed into a deep sequential learning model to capture the dynamic behavior of entities.

TIES Encoder: It yields our final TIES embedding by capturing the temporal aspect present in the sequence of interaction embeddings. Our general purpose sequence encoder has the following components:

1. Sequence Encoding Layer: This encoding layer transforms the input into a hidden state that is aware of the interaction context in the sequence:

$$H = \mathrm{SeqEncoder}(X).$$

Here, $H \in \mathbb{R}^{T \times h}$ is the hidden-state matrix with $h$ as the hidden dimension. We consider three types of sequence encoding layer in our TIES framework with varying training and inference costs and benefits:

(a) Recurrent neural networks: RNNs such as long short-term memory networks (LSTM) are quite capable of capturing dependencies in a sequence [35]. But they are inherently slow to train.

(b) Convolutional neural networks: 1D sequence CNNs can also capture sequential information but are limited to local context and need greater depth for capturing global context, depending on the task at hand [7].

(c) DeepSet: When inputs are treated as sets and their order does not matter, we can use DeepSets as sequence encoders [21]. Here, we first pass each input in a sequence through an MLP (small neural network) and then perform a sum operation followed by another MLP layer, yielding a single embedding.

Besides the application at hand and deployment challenges, the choice among RNN, CNN, and DeepSet depends on the tradeoff between performance and inference time. Due to its recurrent nature, RNN is expensive, while DeepSet has the lowest inference time in production.

2. Attention Layer: An attention layer [30] can be used to weigh the embeddings differently according to their contribution towards specific tasks. Attention values can also be used to visualize which parts of the interaction sequence receive more focus than others, which can provide more interpretable outcomes. The output is given by

$$Z = \mathrm{Attention}(H),$$

where $Z \in \mathbb{R}^{T \times h}$ is the attention layer output.

3. Pooling Layer: A final pooling layer such as a mean, max, or sum operation is used to yield a single embedding for the whole interaction sequence. Here, we have

$$z = \mathrm{Pooling}(Z),$$

where $z \in \mathbb{R}^h$ is the output of the pooling layer and serves as the final TIES embedding.

Parameters of TIES are learned based on the task labels associated with each training sequence $I_k$, $k = 1, 2, \ldots, N$. For instance, in the case of abusive account detection we have a training set of sequences labeled as either abusive or benign, and binary cross-entropy can be considered as the loss function. In general, the TIES embedding $z$ is fed to a feed-forward neural network for learning the parameters in an end-to-end fashion as follows:

$$\ell = \sum_{i=1}^{N} L(X_i, f(z_i)),$$

where $\ell$ is the overall task loss, $L$ is the loss function, $f(\cdot)$ is the feed-forward neural network, $X_i \in \mathbb{R}^{T \times d}$ is the $i$-th input training sequence, $z_i \in \mathbb{R}^h$ is the corresponding learned TIES embedding, and $N$ is the total number of training samples. Depending upon the task, different metrics are taken into consideration, and the class-imbalance issue is handled by weighting the classes properly. This completes the overall description of the TIES model architecture.
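To make the margin-based ranking idea behind the graph embeddings concrete, here is a minimal sketch in plain Python. It is not PBG code: the toy embeddings, the translation-style scorer `score`, and the helper names are assumptions made for illustration. The hinge term is ordered so that the loss shrinks as a true edge outscores its corrupted copies, following the statement that higher scores are expected for true edges; that ordering is an assumption of this sketch, and [19] should be consulted for the exact convention.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source, relation, destination)


def corrupt(edge: Edge, nodes: Sequence[str], rng: random.Random) -> Edge:
    """Build a negative edge by resampling the source or the destination.

    The sampled node may coincide with the original one; real systems
    typically filter or tolerate such accidental positives.
    """
    s, r, d = edge
    if rng.random() < 0.5:
        return (rng.choice(nodes), r, d)
    return (s, r, rng.choice(nodes))


def margin_ranking_loss(pos_score: float, neg_scores: List[float], margin: float) -> float:
    """Sum of hinge penalties for negatives scoring within `margin` of the positive."""
    return sum(max(0.0, margin - pos_score + neg) for neg in neg_scores)


# Toy 2-d embeddings for three nodes and one relation (made-up values).
emb: Dict[str, List[float]] = {
    "alice": [1.0, 0.0],
    "bob": [1.0, 0.0],
    "carol": [0.0, 1.0],
    "friend": [0.0, 0.0],
}


def score(edge: Edge) -> float:
    """Illustrative scorer: dot product of (theta_s + theta_r) with theta_d."""
    s, r, d = edge
    return sum((a + b) * c for a, b, c in zip(emb[s], emb[r], emb[d]))


nodes = ["alice", "bob", "carol"]
pos = ("alice", "friend", "bob")
rng = random.Random(0)
negatives = [corrupt(pos, nodes, rng) for _ in range(3)]
loss = margin_ranking_loss(score(pos), [score(e) for e in negatives], margin=0.5)
```

In a real training loop the embeddings would be updated by mini-batch stochastic gradient descent on this loss; the sketch only evaluates it once.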

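The per-step feature vector $x_t$ and the final pooling step can be sketched as follows. This is an illustrative sketch only: the dimensions, entity names, and helper functions are hypothetical, and the sequence encoding and attention layers are skipped, so pooling is applied directly to $X$ rather than to $Z$.

```python
import random
from typing import Dict, List, Sequence

ACTION_DIM = 4  # hypothetical size of the trainable action embedding


def init_action_table(actions: List[str], seed: int = 0) -> Dict[str, List[float]]:
    """Randomly initialised action embeddings (these would be trained)."""
    rng = random.Random(seed)
    return {a: [rng.uniform(-0.1, 0.1) for _ in range(ACTION_DIM)] for a in actions}


def interaction_vector(src_emb: Sequence[float], tgt_emb: Sequence[float],
                       act_emb: Sequence[float], delta_t: float) -> List[float]:
    """x_t: concatenation of source, target, and action embeddings plus the time gap."""
    return list(src_emb) + list(tgt_emb) + list(act_emb) + [delta_t]


def mean_pool(rows: List[List[float]]) -> List[float]:
    """Average a T x d matrix over the time axis, giving one d-dimensional vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Made-up 2-d "pre-trained" embeddings standing in for graph embeddings.
pretrained = {"post_1": [0.2, 0.4], "user_a": [1.0, 0.0], "user_b": [0.0, 1.0]}
actions = init_action_table(["like", "share"])

X = [
    interaction_vector(pretrained["post_1"], pretrained["user_a"], actions["like"], 0.0),
    interaction_vector(pretrained["post_1"], pretrained["user_b"], actions["share"], 20.0),
]
z = mean_pool(X)  # one vector summarising the whole sequence
```

With concatenation as the aggregation operator, the input dimension $d$ is simply the sum of the component sizes (here 2 + 2 + 4 + 1).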
3. FACEBOOK INTEGRITY APPLICATIONS AND RESULTS

In this section, we briefly describe our implementation of TIES and introduce some of its integrity applications and use cases. These applications cover a wide range of issues from content-based to account-based.

Our framework is built on PyTorch, more specifically TorchScript, to streamline productionization [4]. Training is generally performed on GPUs, and when needed we use DistributedDataParallel provided by PyTorch for multi-GPU training. Inference, on the other hand, is performed in parallel on up to 200 machines (CPUs suffice during inference). To maintain relative consistency in embedding vectors over time, we use warm-start, that is, we initialize the model with a previously trained one. For our experiments, we use the Adam optimizer with learning rate 0.0005 and clip gradients at 1. To mitigate overfitting, we use dropout with probability 0.1. Finally, we weight positive samples such that the datasets are balanced.
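One common way to weight positive samples so that a dataset is balanced is to scale the positive term of the binary cross-entropy by the negative-to-positive ratio. The sketch below illustrates that idea in plain Python; the exact weighting scheme used in TIES is not specified in the text, so the function names and the ratio-based weight are assumptions.

```python
import math
from typing import List


def positive_weight(labels: List[int]) -> float:
    """Weight for positive samples so both classes carry equal total weight."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    return n_neg / n_pos


def weighted_bce(probs: List[float], labels: List[int], pos_weight: float) -> float:
    """Mean binary cross-entropy with positive terms scaled by `pos_weight`."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # toy data, 10% positive
w = positive_weight(labels)              # 9 negatives / 1 positive
loss = weighted_bce([0.5] * 10, labels, w)
```

With this weight the single positive example contributes as much to the loss as the nine negatives combined.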

False news can spread rapidly, either intentionally or unintentionally, by various actors. Therefore, proactive identification of misinformation posts is a particularly important issue. It is also very challenging, as some such posts contain material that is not demonstrably false but rather is designed to be misleading and/or reflects only one side of a story.

Some of the existing techniques train classifiers on carefully crafted features [28]. There are also approaches that detect "rumors" based on users' reactions to microblogs over time [16; 12]. To the best of our knowledge, users' histories (captured via the graph embeddings described in 2.2) and their interactions with posts have not been used to detect misinformation systematically. More recently, researchers at Facebook have devised a multimodal technique for identifying misinformation, inspired by the work of Kiela et al. [11]. In the multimodal model, separate encoders are learned for various content modalities such as image, post-text, image-embedded-text, comments, etc. The encoded features are then collected into a set and pooled in an order-invariant way. Finally, the pooled vector is passed through a fully connected layer to generate predictions.

In the context of TIES, we have posts (sources) interacting with users (targets). Here, we consider a small set of interactions: like, love, sad, wow, anger, haha, comment, and share. Moreover, we use the embeddings described in Section 2.2 for source and target entities. For this experiment, we split our training dataset, consisting of 130K posts (roughly 10% of which are labeled as misinformation), into train-1, train-2, and test sets. (Footnote: We use two disjoint training sets to prevent overfitting.) It should be noted that this dataset is sampled differently for positive and negative cases and does not reflect the accurate distribution of the posts on our platform. We then use the set train-1 to train a few TIES models (CNN, RNN, and Deepset). For all models, we set the interaction embeddings as well as hidden dimensions to 64. We consider sequences of length 512, where longer sequences are cropped from the beginning and shorter sequences are padded accordingly. The CNN model consists of 2 convolution layers of width 5 and stride 1. The RNN model consists of a 1-layer bidirectional LSTM. And finally, the Deepset model consists of pre- and post-aggregation MLPs with one hidden layer of size 64. In addition to TIES, we train a multimodal model using post images and texts. Finally, we use the train-2 dataset to train a hybrid model, a simple logistic regression with two features, TIES-score and multimodal-score. In order to specify confidence intervals, we run the aforementioned experiment on several train/test data splits. Table 1 illustrates the difference/delta performance for various models with respect to the content-only model on the test dataset.

Figure 4: 2-dimensional TSNE projection of the TIES embeddings for misinformation. Clearly, the misinformation posts (yellow) have a different distribution than the regular posts (purple).

Table 1 (columns: Model, PR-AUC Median Gap ± MAD), misinformation detection: TIES-CNN, -0.1130 ± …; the remaining entries are not recoverable. Combining TIES with content improves the performance significantly.

Figure 5: Sample misinformation posts (a) and (b), as identified by independent human reviewers, with low content-only but high hybrid score.

At this point a few observations are worth highlighting. First, TIES-RNN seems to be the best performing TIES model. Second, Deepset appears to be the worst performing TIES model, perhaps highlighting the importance of ordered sequences (as opposed to sets) in identifying questionable posts. Third, the content-only model seems to outperform interaction-only models. Finally, combining the interaction signal with content (hybrid models) improves the performance significantly.

Figure 4 illustrates the TSNE projection of the post embeddings. It should be noted that the final hidden state of the TIES model is considered the source-entity embedding, in this case the post embedding. Interestingly, the distribution of the misinformation posts (yellow dots) in the latent space is clearly different from that of the regular posts (purple dots). Figure 5 illustrates a couple of positive posts that were not identified by the content-only model but registered high scores in the hybrid (content+TIES) model.
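The fixed-length preprocessing used in these experiments (sequences of length 512, cropped or padded) can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: "cropped from the beginning" is read as dropping the earliest interactions and keeping the most recent ones, and padding is appended at the end using a reserved index `PAD`.

```python
from typing import List, Sequence

PAD = 0  # hypothetical padding index reserved for "no interaction"


def crop_or_pad(seq: Sequence[int], length: int = 512, pad: int = PAD) -> List[int]:
    """Return exactly `length` items: drop the earliest ones or pad at the end."""
    if len(seq) >= length:
        # Keep the most recent `length` items.
        return list(seq[len(seq) - length:])
    return list(seq) + [pad] * (length - len(seq))


short = crop_or_pad([1, 2], length=4)           # padded at the end
long_ = crop_or_pad([1, 2, 3, 4, 5], length=3)  # earliest items dropped
fixed = crop_or_pad(list(range(1, 1001)))       # default length of 512
```

Whichever convention is chosen, the same one has to be applied at training and inference time so that the encoder sees consistently aligned inputs.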

It is understood that fake accounts are a potential security and integrity risk to the online community. Therefore, identifying and removing them proactively is of utmost importance. Most existing approaches to detecting fake accounts revolve around graph propagation [32; 24] or clustering of malicious actors and/or activities [29; 33; 31; 20]. Fakebuster [9] trained a model on activity-related features. To the best of our knowledge, sequences of accounts and associated activities have not been used in fake account detection.

One of the fake engagement baseline models at Facebook is a multilayer classifier trained on over one thousand carefully engineered features. These features cover a range of signals, including metadata and activity statistics, among others. In this experiment, we train a few TIES models and combine them with the output of the baseline classifier in order to evaluate the effectiveness of the sequence data. Here, sources are accounts engaging with various targets. Targets, on the other hand, could be other accounts, posts, or pages. As engagements, we consider a set of 44 sentry-level actions that includes liking a post, following a page, messaging a user, friending, etc. As source and target features, we use the graph-based embeddings described in Section 2.2 for accounts, as well as for post and page creators, where applicable.

Our dataset consists of 2.5M accounts with an 80/20 good/bad (fake) split. It should be emphasized that this dataset is sampled differently for positive and negative cases and does not reflect the accurate distribution of fake accounts on our platform. We randomly divide this dataset into train-1, train-2, and test sets consisting of 2M, 250K, and 250K accounts, respectively. We then use the set train-1 to train a few TIES models similar to the previous experiment described in Section 3.2. The main difference is that here the CNN model has 3 convolution layers of width 5 and stride 1, and the RNN model consists of a 2-layer bidirectional LSTM. Subsequently, we use the train-2 dataset to train a logistic regression with two features, TIES-score and baseline-score. This experiment is repeated on several train/test splits in order to calculate confidence intervals. The performance gaps between various models and the baseline on the test dataset are illustrated in Table 2.

Table 2 (columns: Model, PR-AUC Median Gap ± MAD), fake engagement detection: TIES-CNN, -0.0566 ± …; the remaining entries are not recoverable. Combining TIES with the baseline improves the performance.

Our observations are largely consistent with the ones made in Section 3.2. Namely, TIES-RNN appears to be the best performing TIES model, the baseline outperforms TIES, and as an additional feature, TIES can provide a boost to the baseline model. The fact that the baseline outperforms TIES is not surprising, as it includes a lot more information through over 1000 carefully engineered features. Moreover, gains from TIES appear to be small, but they are statistically significant and outside the confidence interval. It should also be noted that, at the scale of Facebook, even a couple of percentage points improvement in recall for the same precision translates into a significant number of additional fake accounts being caught.

Figure 6 illustrates the 2-d projection of the TIES embeddings and, as expected, the distribution of the fake accounts (yellow) is quite different from normal accounts (purple).
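The tables report a median gap ± MAD over several train/test splits. Assuming MAD denotes the median absolute deviation, the summary statistic could be computed as in the sketch below; the gap values are invented for illustration and are not results from the paper.

```python
from statistics import median
from typing import List, Tuple


def median_and_mad(values: List[float]) -> Tuple[float, float]:
    """Median and median absolute deviation (MAD) of a list of numbers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical per-split PR-AUC gaps (model minus baseline); made-up numbers.
gaps = [-0.125, -0.25, -0.0625, -0.125, -0.1875]
med, mad = median_and_mad(gaps)
```

Both statistics are robust to a single unusual split, which is one reason they are often preferred to mean and standard deviation when only a handful of repetitions are available.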

Figure 6: 2-dimensional TSNE projection of the TIES embeddings for fake engagements. Clearly, the fake accounts (yellow) have a different distribution than the regular accounts (purple).

With more than two billion monthly active users, Facebook enables small- and medium-sized businesses to connect with a large audience. It allows them to reach out to their target audiences in an efficient way and grow their businesses as a result [1].

Some of the integrity issues facing this advertising platform include fraudulent requests and unpaid service fees or substantial reversals. Researchers at Facebook have devised various models to prevent such abuses. These models generally include thousands of carefully crafted features that cover a wide range of information sources, such as user metadata, activity history, etc. In order to test TIES' viability for identifying bad accounts that have failed to pay fees, we train a few models using interaction signals that generally include details about the account and associated payment trends. Here, sources are ads accounts, source features are graph-based embeddings, and targets are generally nulls. We follow the settings in the previous two experiments: we split our dataset into train-1, train-2, and test sets consisting of roughly 200K, 10K, and 10K accounts, respectively. The datasets are sampled such that we have roughly the same number of bad and good accounts. We then train TIES-CNN (2 layers of width 5), TIES-RNN (1-layer biLSTM), and TIES-Deepset (pre- and post-aggregation MLPs with 1 hidden layer of size 64) as well as the baseline model on the set train-1. We then use the train-2 dataset to combine baseline and TIES scores via a simple logistic regression and finally test the outcome on the test dataset. In order to calculate the confidence intervals, we repeat this experiment on several random train/test splits. Precision-recall area under the curve gaps with respect to the baseline model are illustrated in Table 3. Figure 7, on the other hand, demonstrates the 2-d projection of embedding vectors and, as expected, the distribution of bad accounts is quite different from normal accounts.

Table 3 (columns: Model, PR-AUC Median Gap ± MAD), ads payment risks: TIES-CNN, -0.0705 ± …; the remaining entries are not recoverable. Combining TIES with the baseline improves the performance.

Figure 7: 2-dimensional TSNE projection of the TIES embeddings for ads payment risk. Clearly, accounts with unpaid fees (yellow) have a different distribution than the regular accounts (purple).

TIES models do a decent job in identifying bad accounts, although they are not as predictive as the baseline model. Moreover, similar to the previous two experiments, RNN and CNN perform roughly the same and both outperform Deepset by a wide margin. Finally, combining TIES with the baseline provides about 1% gain in precision-recall area under the curve, which is statistically significant. It is worth noting that even a small improvement in recall for the same precision would translate to a large monetary value in savings.
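All three experiments combine a TIES score with a second score (content-only or baseline) through a logistic regression trained on the held-out train-2 set. The following is a minimal sketch of that stacking step using plain batch gradient descent; the toy scores, labels, learning rate, and function names are assumptions, and a production system would more likely rely on an established library.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_stacker(rows: List[Tuple[float, float]], labels: List[int],
                lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w_ties, w_base] by batch gradient descent on log loss."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (ties, base), y in zip(rows, labels):
            err = sigmoid(w[0] + w[1] * ties + w[2] * base) - y
            grad[0] += err
            grad[1] += err * ties
            grad[2] += err * base
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def hybrid_score(w: List[float], ties: float, base: float) -> float:
    """Combined probability from the two input scores."""
    return sigmoid(w[0] + w[1] * ties + w[2] * base)


# Made-up (TIES-score, baseline-score) pairs standing in for the train-2 set.
rows = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.4), (0.2, 0.3), (0.1, 0.6), (0.3, 0.1)]
labels = [1, 1, 1, 0, 0, 0]
weights = fit_stacker(rows, labels)
high = hybrid_score(weights, 0.9, 0.9)
low = hybrid_score(weights, 0.1, 0.1)
```

Fitting the combiner on a set disjoint from the one used to train the input models keeps it from simply trusting whichever model overfit the most.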

4. CONCLUSION

In this paper, we provided an overview of the temporal interaction embeddings (TIES) framework and demonstrated its effectiveness in fighting abuse at Facebook. Social media entities such as accounts, posts, Pages, and Groups interact with each other over time. The types of interactions generated by bad entities such as fake accounts and hateful posts are different from those of normal entities. TIES, a supervised deep-learning framework, embeds these interactions. The embedding vectors in turn can be used to identify bad entities (or improve existing classifiers). The TIES framework is quite general and can be used by various forms of account, post, Page, or Group integrity applications. Moving forward, we plan to continue exploring other applications of this methodology within Facebook. Moreover, we can add additional features such as model interpretability, hyperparameter optimization, unsupervised learning, etc. to the framework in order to create a more complete tool.

5. ACKNOWLEDGEMENTS

The authors would like to thank He Li, Bolun Wang, Yingyezhe Jin, Despoina Magka, Austin Reiter, Hamed Firooz, and Shaili Jain for numerous helpful conversations, data preparation, and experiment setup.
