Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion
Subhabrata Dutta, Gunkirat Kaur, Shreyans Mongia, Arpan Mukherjee, Dipankar Das, Tanmoy Chakraborty
Subhabrata Dutta, Dipankar Das
Jadavpur University, Kolkata, India
{subha0009,dipankar.dipnil2005}@gmail.com
Gunkirat Kaur, Shreyans Mongia, Arpan Mukherjee, Tanmoy Chakraborty
IIIT-Delhi, India
{gunkirat15032,shreyans15178,arpan17007,tanmoy}@iiitd.ac.in
ABSTRACT
Over the last decade, online forums have become primary news sources for readers around the globe, and social media platforms are the space where these news forums find most of their audience and engagement. Our particular focus in this paper is to study conflict dynamics over online news articles on Reddit, one of the most popular online discussion platforms. We choose to study how conflicts develop around news inside a discussion community, the r/news subreddit. Mining the characteristics of these engagements often provides useful insights into the behavioral dynamics of large-scale human interactions. Such insights are useful for many reasons – for news houses to improve their publishing strategies and reach potential audiences, for data analysts to get a better introspection into media engagement, and for social media platforms to avoid unnecessary and perilous conflicts.

In this work, we present a novel quantification of conflict in online discussion. Unlike previous studies on conflict dynamics, which model conflict as a binary phenomenon, our measure is continuous-valued, which we validate with manually annotated ratings. We address a two-way prediction task. Firstly, we predict the probable degree of conflict a news article will face from its audience. We employ multiple machine learning frameworks for this task using various features extracted from news articles. Secondly, given a pair of users and their interaction history, we predict if their future engagement will result in a conflict. We fuse textual and network-based features together using a support vector machine, which achieves an AUC of 0.89. Moreover, we implement a graph convolutional model which exploits engagement histories of users to predict whether a pair of users who never met each other before will have a conflicting interaction, with an AUC of 0.69. We perform our studies on a massive discussion dataset crawled from the Reddit news community, containing over 41k news articles and 5.5 million comments.

ACM Reference Format:
Subhabrata Dutta, Dipankar Das, Gunkirat Kaur, Shreyans Mongia, Arpan Mukherjee, and Tanmoy Chakraborty. 2019. Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion. In
CIKM ’19, November 3–7, 2019, Beijing, China
The 28th ACM International Conference on Information and Knowledge Management (CIKM ’19), November 3–7, 2019, Beijing, China.
ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3357384.3358037
1 INTRODUCTION

Mining knowledge from social media has gained tremendous attention among the research community in recent years. Endeavours started with entity recognition, opinion mining, object detection, etc.; current advances are pushing barriers towards more complex analyses such as influence detection, malicious activity identification, multi-modal and heterogeneous data mining, etc. Usage of social media platforms is now so all-encompassing that these analyses yield rich insight into individual and community interaction in general. Online discussion forums are a particular type of social media and networks which, due to their ever-growing usage and popularity, need no introduction today. Reddit is one such example, where people across the globe engage in discussions related to innumerable sets of topics.

With more and more people coming together in this virtual world, differences of opinion and conflict are an inevitability. Conflict may arise from several premises – partial knowledge, socio-political understandings, clashes of cultural and moral positions, and many more. It can be raised and developed from purely virtual individual interactions as well as real-world happenings. Although any difference of opinion can be identified as a conflict, its actual aspects are versatile. It may manifest itself within a vast spectrum, from constructive debates with well-formed argumentation to degenerated, unhealthy cyber-bullying and abuse. Thus, a better introspection into the complex dynamics of conflict over online discussion platforms may provide useful insights to the data analytics and social computing community, as well as help moderators of online platforms to identify and eliminate abusive conflicts and make the web a better place.

Versatility in the manifestation of conflict is also the primary challenge of modeling conflict dynamics. Let us take the following three comments taken from Reddit:

Comment 1:
I’m talking specifically about the 2010 Afghan War Diary, when Wikileaks was too lazy to scrub the names of about 100 Afghan civilian informants, thus revealing their identities to Taliban death squads. You actually sound a lot like Assange, who when asked why he didn’t bother scrubbing the names said “Well, they’re informants, so, if they get killed, they’ve got it coming to them. They deserve it.”
Figure 1: Hypothetical state-transition model of conflict for a pair of users; state 0 signifies the start of engagement between a hypothetical user pair.

Comment 2: Assange didn’t put them in danger. Participating in an illegal war and murdering innocent people in a country that never attacked the US put them in danger. Being actual Nazis put them in danger. Anyone who does that deserves to have a light shone on what they are doing, so that they hopefully stop. So sorry your conscience is worked up over maybes, instead of the hard reality of all the ACTUAL MURDERING that the US committed.
Comment 3:
You seem a bit daft. If you were in Vichy France, informing for the Nazis, do you think you would have an expectation of privacy? When people like you start owning up to who the real monsters are, then the world can change.

Both comments 2 and 3 are put in reply to comment 1, and both of them hold an opposite view. But how do we decide which one is more conflicting? Comment 3 is more subjectively aggressive towards the user posting comment 1. However, if we look at the content, comment 2 presents an opposite opinion in a more profound sense. Previous studies [25] on conflicts either treated conflict as a binary phenomenon, or identified controversy scores over topics and not between two text segments [15]. Sophisticated NLP tools may come in handy in this context; however, one major downside is their lack of scalability in handling large-scale online data. In this work, we focus more on objective, argumentative conflict rather than subjective, aggressive conflict. Simply put, we define comment 2 to hold a more conflicting opinion towards comment 1 than comment 3 does.

Online discussion platforms, through the lens of engagement conflict, become a more complex dynamical process when the system interacts frequently with external sources. In this work, the external source is online news. Reddit has a specific community, r/news, dedicated to discussing news articles from various online news sources. Users post their views regarding news reports and engage in discussion. Here, a two-way conflict comes into play – users holding opposite opinions against a report, and users holding opposite opinions towards each other. These two conflicts are even related; previous studies showed that certain news reports tend to blow up conflicts of opinion between readers, mostly due to the topic of the news, language usage, political bias, etc. [8, 28].

The state-transition model in Figure 1 can be hypothesized as an abstract model of inter-user conflict dynamics. A transition from 0
Notation    Denotation
T           Corpus-wide keyword set
T_D         Keywords present in document D
TS_D        Target-sentiment vector of document D
TS_u        Target-sentiment vector averaged over comments from user u
N_i^k       No. of comments from user u_i containing term T[k]
cf(D_1,D_2) Conflict score between documents D_1, D_2
nc(N)       Total conflict towards news article N
G'(t)       Dynamic user engagement network
Table 1: Important notations used throughout the paper.

to 1 or 2 signifies interaction between a hypothetical user pair. Any state transition from 1 or 2 can be of two types: either the users interact with each other (solid lines) or they interact with the rest of the users (solid + dashed lines). Then state 1 corresponds to users having only non-conflicting engagements with each other, while state 2 denotes only conflicting engagements. State 3, in either way, identifies user groups who have preferential conflict.

This abstract model can be actuated with a dynamic user-user interaction graph, with edges between users signifying previous interactions. We can further weigh these edges according to the degree of conflict that arose in previous interactions. The problem of predicting future conflict between any two users then translates into a signed link prediction task.

Our contributions in this work are as follows:

(1) We define a simple yet powerful and scalable measurement of conflict between a pair of documents, focused on objective expression of opinion. We use target-dependent sentiment scoring to compute a continuous-valued score between text documents. We use this metric to quantify conflict between news reports and their audience, as well as between user pairs interacting over discussion comments, on a large dataset of news articles and corresponding discussions from Reddit r/news. We manually annotate randomly selected news report-comment pairs and comment-comment pairs to test our conflict metric. We achieve 0.96 and 0.79 mean squared error over the [0, 10] interval of conflict rating. For ranking comments according to the conflict they express towards particular news reports and comments, our method achieves mean average precision of 0.77 and 0.83, respectively.

(2) Using the conflict measurement, we attempt to predict the degree of conflict a news article will experience from the users reading and discussing it. Our prediction is solely based on the content of the article. We extract several textual features from the articles and employ multiple machine learning algorithms. We achieve a symmetric mean average percentage error of 0.077 with a Support Vector Regression model.

(3) We attempt to predict whether a future interaction between any two given users will be conflicting or not, given their previous history of comments and engagement. We implement a Support Vector Machine based framework with selected textual and network-based features for this task, which achieves 0.89 AUC. We perform a fusion of textual features extracted from users’ comment history and their interaction features over the engagement network using graph convolution over dynamic user-user engagement, which correctly predicts the conflict type between users (who have no previous history of interaction) with 0.69 AUC.

(4) We conduct several experiments using the conflict metric to reveal intriguing patterns of conflict dynamics of news reporting over the r/news community. We explore how conflict towards news articles from different online news sources varies over time, and how different news sources trigger inter-user conflict at different degrees.

(5) We explore how inter-user conflict patterns emerge over time in discussion threads as well as in the interaction network. We identify different community formations through conflict, which closely follow the abstract conflict model we described in Figure 1.
2 RELATED WORK

In this section, we describe previous studies that we deem to be closely related to our work.
Conflict in community interaction, which is the prime theme of our study, is a well-studied problem in social network theory, psychology, and sociology [30, 39]. Different models and valuable introspections have emerged from these studies, such as how people tend to adapt towards certain acquaintances after initial conflict, fission in small group networks post conflict, emotional effects of conflict on individuals, etc. However, studies on its online counterparts are much more recent. Most of the studies on controversy and polarization over social media are based on Twitter [7, 16]. Garimella et al. [15] proposed a graph-based approach to identify controversial topics on Twitter and measures to quantify the controversy of a topic. They used 20 different hashtags to classify topics of conversation. Partitioning retweet, follow, and reply graphs, they computed the controversy related to each topic. Their work suggested the inefficiency of content-based measurements of controversy, mainly attributed to the short span of texts in tweets and high noise. Guerra et al. [18] proposed a similar approach to measure polarization over social media; their data is also mostly based on Twitter. However, one must keep in mind that the nature of conflict for microblogs is substantially different from that of discussion forums, primarily due to the size of the text. Kumar et al. [25] focused on Reddit to identify roles of conflict in community interactions. They performed their study on 36,000 Reddit communities (subreddits), identifying relations between inter-community mobilization and conflict. Their study also includes patterns of how people ‘gang up’ on the verge of conflicting engagements. They predicted mobilizations between communities based on conflicts using user-level, community-level, and text-level features.
They achieved 0.67, 0.72, and 0.76 AUC using Random Forest, LSTM, and an ensemble of both, respectively. Our work can be thought of as another side of their story – while they focused on conflict as an inter-community phenomenon, we attempt to address its dynamics at a microscopic level, inside a single community.
Stance detection and opinion mining are closely related to conflict identification and measurement. Most of the previous works in stance detection are based on stance classification of rumors on Twitter [27, 29, 42]. Rosenthal and McKeown [33] proposed an agreement-disagreement identification framework for discussions on Create Debate and Wikipedia Talk pages. They defined various lexical and semantic features from discussion comments and achieved an average accuracy of 77% on the Create Debate corpus. Zhang et al. [40] used discourse act classification on Reddit discussions to characterize agreement-disagreement over discussion threads. Dutta et al. [11] employed an attention-based hierarchical LSTM model to further improve discourse act classification on the same dataset.
News popularity prediction, though it does not handle conflict explicitly, is related to this work as it deals with the engagement dynamics of online news. Previous studies can be classified into two main heads of approach – popularity of news on social media platforms [31, 32, 38], and popularity of news on the web in general [12, 22]. The second approach deals with the prediction problem unaware of inter-user network information, thereby excluding the explicit interactions of users among themselves and with the news sources. Popularity prediction models focus only on the degree of engagement a news article gets, without concern for the types of engagement, which is our focus in this work.
Link prediction on social networks, as we already stated, is closely related to our formulated problem of predicting future conflict between users. There is rich literature focusing on this task [1, 17, 26, 37]. Bliss et al. [5] used an evolutionary algorithm for link prediction in dynamic networks. One important recent advancement for learning on graph-based data is Graph Convolutional Networks [9, 23]. Zhang and Chen [41] applied convolution on enclosing subgraphs for link prediction. Berg et al. [3] also defined recommendation as a link prediction problem and used a graph auto-encoder built by deep stacking of graph convolutional layers.
3 DATASET

We crawled discussion threads containing at least one news link in the posts or comments from the r/news subreddit, from 2016-09-01 to 2019-01-16. Out of 43,343 discussion threads crawled, we discarded threads containing less than 10 comments. The remaining 17,351 threads, containing a total of 5,502,258 comments, were used for the experiments. We also crawled the news articles mentioned in the threads, resulting in a total of 41,430 news articles from 5,175 different news sources.

To evaluate our conflict measurement strategy, we employed three expert annotators to identify conflict between two given texts (articles/comments). We asked them to rate an interaction with a higher conflict score than another if they found more elaborate opposition in the first one. We provided the annotators with multiple examples annotated by us (one of these examples is presented in Section 1). We asked them to annotate the conflict on a [0, 10] scale such that non-conflicting and highly conflicting texts will receive 0 and 10, respectively. For any interaction where only negativity has been expressed (sarcasm, popular slang without mentioning what or whom it is addressed to), we asked the annotators to rate it as 1. We compute final ratings as the average of the ratings received. A total of 3,734 randomly selected news-comment pairs and 6,725 comment-comment pairs were annotated. The inter-annotator agreement based on Fleiss’ κ [13] is 0. We have made the dataset containing the news articles public. The annotators were experts on social media, and their age ranged between 25 and 40 years.
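Inter-annotator agreement via Fleiss’ κ can be computed directly from per-item category counts. A minimal pure-Python sketch follows; the rating bins and counts below are illustrative toy values, not our annotation data:

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a ratings table.

    table[i][j] = number of annotators who put item i in category j.
    Every row must sum to the same number of raters n.
    """
    N = len(table)                 # number of items
    n = sum(table[0])              # raters per item
    k = len(table[0])              # number of categories

    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]

    # Per-item agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]

    P_bar = sum(P_i) / N           # mean observed agreement
    P_e = sum(p * p for p in p_j)  # expected agreement by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 items, 3 annotators, 3 coarse conflict bins.
ratings = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
kappa = fleiss_kappa(ratings)
```

In practice, the continuous [0, 10] ratings would first be binned into a small number of categories before computing κ.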
4 MEASURING CONFLICT

Given two text segments, we measure the conflict between them as how much opposite sentiment they exhibit. Here, we use target-dependent sentiment measurement (TD-sentiment), as sentence-level sentiment may not be a good indicator of stance towards a motion. Let us take the following two sentences:

(1)
Applauds for the writer to rightly explain why immigration is not a real problem.

(2)
This is an extremely good analysis of why immigration should be stopped.
Both of these sentences have positive sentence-level sentiment, though they carry conflicting opinions towards immigration. The TD-sentiment for the term ‘immigration’ is neutral for sentence 1 and negative for sentence 2. From this, we can conclude that these two sentences are potential indicators of conflict.

As defined in our problem statement, we compute conflict between news articles and platform users as well as between pairs of users. Firstly, we compute a set of keywords from our dataset (comments + news articles). We tag the sentences using the Spacy parts-of-speech tagger and collect nouns only, after stop-word removal and lemmatization. To handle co-references of persons, we substitute the nominal pronouns ‘he’ and ‘she’ by the last named entity found with a ‘Person’ tag. We include all the named entities in our keyword set, and the top 60% of the rest, ranked in order of tf-idf values. This results in a final corpus-wide term set T.

Next, we compute the TD-sentiment of news articles and comments using the Multi-Task Target Dependent Sentiment Classifier (MTTDSC), a state-of-the-art deep learning framework recently proposed by Gupta et al. [20]. MTTDSC is informed by feature representations learned for the related auxiliary task of passage-level sentiment classification. For the auxiliary task and the main task, it uses separate gated recurrent units (GRUs), and sends the respective states to a fully connected layer trained for the respective task. The model is trained and evaluated using multiple manually annotated datasets [10, 34, 36].

Let a document D (a single comment or a news article) be a sequence of sentences [s_1, s_2, ..., s_n], and let T_D ⊂ T be the keyword set present in D (where T is the corpus-wide term set defined earlier). For any t ∈ T occurring in s_i, MTTDSC computes a three-class probability (positive, negative, and neutral) vector v_t^i.
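The keyword-set construction described above (named entities kept unconditionally, the remaining nouns ranked by tf-idf) can be sketched as follows, assuming the tokens have already been POS-filtered and lemmatized by a pipeline such as Spacy; the helper and its toy inputs are illustrative:

```python
import math
from collections import Counter

def build_keyword_set(doc_tokens, entities, keep_frac=0.6):
    """Corpus-wide term set T: all named entities, plus the top
    `keep_frac` of the remaining nouns ranked by their best tf-idf.

    doc_tokens: one list of lemmatized noun tokens per document.
    entities: set of named-entity strings found in the corpus.
    """
    n_docs = len(doc_tokens)
    df = Counter()                           # document frequency
    for toks in doc_tokens:
        df.update(set(toks))
    score = {}                               # best tf-idf per term
    for toks in doc_tokens:
        tf = Counter(toks)
        for t, f in tf.items():
            score[t] = max(score.get(t, 0.0), f * math.log(n_docs / df[t]))
    rest = sorted((t for t in score if t not in entities),
                  key=lambda t: score[t], reverse=True)
    return set(entities) | set(rest[:int(keep_frac * len(rest))])

docs = [["immigration", "policy"],
        ["immigration", "border"],
        ["policy", "border", "wall"]]
T = build_keyword_set(docs, entities={"wall"})
```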
Then, for all the occurrences of t in D, we compute the aggregate sentiment of D towards t as S_{D,t} = argmax(Σ_{i=1}^{n} v_t^i), S_{D,t} ∈ {1, 2, 3}, where negative, neutral, and positive sentiments are represented by 1, 2, and 3, respectively. Following this, we construct a vector TS_D of size |T| such that

TS_D[i] = { S_{D,T[i]} if T[i] ∈ T_D; 0 otherwise }   (1)

TS_D now represents the aggregate sentiments of document D towards all the terms present in it. For any two documents D_1 and D_2, we then compute the conflict factor (cf) between them using their aggregate TD-sentiment vectors TS_{D_1} and TS_{D_2} as follows:

cf(D_1, D_2) = (1 / |T|) Σ_{i=1}^{|T|} min(TS_{D_1}[i], TS_{D_2}[i], 1) · |TS_{D_1}[i] − TS_{D_2}[i]|   (2)

The component min(TS_{D_1}[i], TS_{D_2}[i], 1) returns 0 when either of the i-th terms of TS_{D_1} and TS_{D_2} is 0, i.e., the term is not common, and 1 otherwise. This excludes terms which are not present in both of the texts from contributing to the conflict computation. The value of the component |TS_{D_1}[i] − TS_{D_2}[i]| can be 0 (when both texts have the same sentiment towards the term), 1 (when one of the texts holds a neutral sentiment and the other one positive or negative), or 2 (when the texts hold opposite sentiments).

Spacy tagger: https://spacy.io/usage/linguistic-features

5 PREDICTING CONFLICT TOWARDS NEWS ARTICLES

Given a news article N and the set of all comments C related to N, we define the News Conflict Score as

nc(N) = (1 / |C|) Σ_{c ∈ C} cf(N, c)   (3)

This is a normalized score referring to what degree users oppose the views presented in the news article. We then extract the following features from news texts to predict this score given a news article:

(1) TD-sentiment vector, the entity-wise sentiment expressed in the news, as we compute TS_D in Eq. 1.

(2) Count of positive, negative, and neutral words, tagged using SenticNet [6].

(3)
Cumulative entropy of terms, given by

p = (1 / |T|) Σ_{t ∈ T} tf_t (log |T| − log(tf_t))

where T is the set of all unique tokens in the corpus, and tf_t is the frequency of term t in the news text.

(4) Fraction of controversy and bias words, measured using the lexicon sets General Inquirer and Biased Language; we use the fractions of these lexicons present in the article as controversy and bias features.

(5) Latent semantic features using ConceptNet Numberbatch pretrained word vectors [35] (https://github.com/commonsense/conceptnet-numberbatch); we compute the TF-IDF-weighted average of the vectors of the words present in an article to represent the latent semantics of the article.

(6) LIX readability [4], computed as r = |w|/|s| + 100 · |cw|/|w|, where w and s are the sets of words and sentences, respectively, and cw is the set of words with more than six characters. A higher value of r indicates that the article is harder for users to read.

(7) Gunning Fog [19], computed as 0.4 · (ASL + PCW), where ASL is the average sentence length, and PCW is the percentage of complex words. A higher value of this index likewise indicates that the article is harder to read.

(8) Subjectivity, calculated using TextBlob (https://textblob.readthedocs.io/en/dev/). Its values lie in the range [0, 1].

To predict the conflict score nc(N), we use three regression models: Lasso, Random Forest Regressor, and
Support Vector Regressor.

6 PREDICTING INTER-USER CONFLICT

As already stated, we define inter-user conflict prediction as a binary classification task to decide whether two users will engage in a conflict given their previous engagement history. We represent the engagement history as a weighted undirected graph G = {V, E, W}, where every node v_i ∈ V represents a user u_i, and every edge e_ij ∈ E connects two nodes v_i, v_j if and only if u_i and u_j have been engaged with each other earlier (i.e., either of them has commented in reply to at least one comment/post put by the other). Every edge e_ij is accompanied by a weight w_ij ∈ W equal to the average conflict between u_i and u_j, which is computed as follows:

w_ij = (1 / N_ij) Σ_{k=1}^{N_ij} cf(D_i^k, D_j^k)   (4)

where D_i^k and D_j^k represent the comments posted by u_i and u_j, respectively, at their k-th interaction, and N_ij is the total number of such interactions that have already occurred. cf(D_i^k, D_j^k) is computed following Eq. 2.

To predict conflict between user pairs, we propose four different frameworks: one using graph convolution and three using Support Vector Machines (SVM) with different feature combinations.

6.1 Graph convolution model

As typical user-user engagement networks of online discussion platforms are huge in size, we need to implement graph convolution over a subgraph. To predict the engagement type between a pair of users corresponding to vertices v_i and v_j, we compute an enclosed subgraph G_sub = {V_sub, E_sub} containing v_i, v_j from G such that ∀ v_k ∈ V_sub, dis(v_i, v_k), dis(v_j, v_k) ≤ dis_max, where dis(v_i, v_k) is the length of the shortest path between v_i and v_k, and dis_max is a threshold distance (see Section 7 for more details). All the edges in E_sub share the same weight as in G.

We compute the adjacency matrix A from G_sub as follows:

A[i][j] = A[j][i] = { w_ij if e_ij ∈ E_sub; 0 otherwise }   (5)

We attribute every vertex v_i with a d-dimensional feature vector x_i ∈ R^d, which represents the previous commenting history of user u_i.
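A minimal sketch of the conflict factor (Eq. 2), the edge weighting built on it (Eq. 4), and the enclosed-subgraph extraction; the graph encoding and toy data are illustrative:

```python
from collections import deque

def conflict_factor(ts1, ts2):
    """Eq. 2 (sketch): entries are 0 = term absent, 1 = negative,
    2 = neutral, 3 = positive; normalized by vocabulary size."""
    total = sum(min(a, b, 1) * abs(a - b) for a, b in zip(ts1, ts2))
    return total / len(ts1)

def edge_weight(history):
    """Eq. 4 (sketch): average conflict over a pair's past interactions.
    history: list of (TS vector of u_i's comment, TS vector of u_j's)."""
    return sum(conflict_factor(a, b) for a, b in history) / len(history)

def enclosed_subgraph(adj, vi, vj, dis_max):
    """Vertices within dis_max hops of BOTH vi and vj (BFS).
    adj: {node: set(neighbours)} for the engagement graph G."""
    def bfs(src):
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == dis_max:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist
    return sorted(set(bfs(vi)) & set(bfs(vj)))

# Opposite sentiment on term 0, agreement on term 1, term 2 uncommon.
cf = conflict_factor([1, 3, 2, 0], [3, 3, 0, 0])     # -> 0.5

# Toy engagement graph: path 0-1-2-3 with a pendant node 4 on 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
sub = enclosed_subgraph(adj, 1, 2, dis_max=1)        # -> [1, 2]
```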
We compute x_i as the average over all the feature vectors corresponding to previous comments from u_i, using the same feature selection method as in Section 5, with one additional feature – a binary vector representing the news sources the user is engaged with. This leaves us with a tensor representation of user vertex features X = {x_1, x_2, ..., x_|V|}.

The adjacency matrix A and the vertex feature tensor X now represent the network history and the comment history of all the users at an instance, respectively. First, we learn a lower-dimensional feature representation X' from X as follows:

X' = σ_r(K_f^⊤ X + B_f)   (6)
where K_f and B_f are kernel and bias matrices to be learned while training, and σ_r(x) = max(x, 0). We fuse these two histories together using graph convolution. We compute a degree-normalized adjacency matrix Â = D^{-1/2} A D^{-1/2}, where D is the degree matrix of A. This multiplication normalizes the effect of neighboring vertices so that higher-degree vertices do not get over-weighted. Now, our convolution at the m-th depth is computed as

H^{m+1} = σ_r(Â · H^m · K_g^m)   (7)

where K_g^m is the graph convolution kernel to be learned while training, and H^m and H^{m+1} are the input and the output of the m-th convolution, respectively. Since we use three consecutive convolution layers, the final feature representation is H^3.

For predicting whether there will be a conflicting engagement between users u_i, u_j, we select the i-th and the j-th feature vectors of H^3 and compute a score y ∈ (0, 1) as follows:

E = [H^3[i], H^3[j]]   (8)

y = σ_s(K_c^⊤ · E + B_c)   (9)

where [·, ·] stands for the concatenation operator, K_c and B_c are the kernel and bias of the classification layer, respectively, and σ_s(x) = (1 + e^{−x})^{−1}. The complete architecture of the model is illustrated in Figure 2. The model is trained to minimize the cross-entropy loss between true and predicted labels.

Figure 2: Inter-user conflict prediction using graph convolution.

6.2 SVM-based models

Graph convolution automatically learns feature representations for the interaction between user pairs from node features and the connectivity of the nodes. For SVM, we need to manually identify interaction features. We extract the following textual and network-based features for each user pair u_i, u_j:

(1) Count of relevant common tokens from the previous comments of the users; we take the sum of tf-idf values of common unigrams and bigrams in the comment history of both the users.
(2) Conflict vector CV_ij between the pair, computed using the TD-sentiment vectors TS_D following Eq. 1; given the previous N_i^k comments of user u_i, {C_1, C_2, ..., C_{N_i^k}}, where the term T[k] appears, we compute TS_{u_i}, the target-sentiment vector of u_i averaged over the history, as

TS_{u_i}[k] = (1 / N_i^k) Σ_{l=1}^{N_i^k} TS_{C_l}[k]   (10)

We compute CV_ij as the element-wise absolute difference between TS_{u_i} and TS_{u_j}.

(3) Common news sources, CN_ij, taken as a vector of length equal to the number of news sources; for news source k, CN_ij[k] indicates the number of articles from this news source where u_i, u_j are both engaged.

(4) Common discussions, indicating the count of discussions where both u_i and u_j are engaged.

(5) Previous mutual engagement, the total number of previous interactions between u_i and u_j.

(6) Previous conflict, the average of mutual conflicts between u_i and u_j over their previous engagements.

(7) Neighbor interactions, the count of conflicting and non-conflicting engagements for each user with its neighbor nodes.

We use three SVMs with Gaussian kernels – the first SVM uses all the features mentioned above (SVM-all), the second one (SVM-text) uses only text-based features (features 1 and 2), and the third one (SVM-net) uses only network-based features (features 3-5). SVM-net, which has been used for negative link prediction by Wang et al. [37], serves as our external baseline.
7 EXPERIMENTAL SETUP

For the news-user conflict prediction task, the total size of our feature vector is 8,. With the 41,430 news articles, we used an 80:20 train-test split, keeping the fractions of different news sources the same over train and test data.

For the user-user conflict prediction task, the number of features representing user nodes in the graph convolution model is 8,. We set dis_max (defined in Section 6.1) to 100. This results in adjacency matrices with an upper bound of 5000 nodes. We perform this prediction on 25 instances of the dynamic user engagement network, taking a total of 1,637 different subgraphs from these instances. For any user pair in these subgraphs, if there is a conflicting engagement between them over an interval of the next 24 hours, we label them as positive, otherwise negative. We take 213,998 different user pairs altogether, randomly sampling equal numbers of positive and negative labels to avoid bias. Here again, we split the samples into 80:20 train-test splits, with 15% of the train data used as the development set to tune the parameters. We use Nadam (Adam with Nesterov momentum) optimization to train the model, with a batch size of 256. We used the scikit-learn framework (https://scikit-learn.org/stable/) to implement all the regression models mentioned.

Conflict type              RMSE  MAP   MRR
News-comment conflict      0.96  0.77  0.86
Comment-comment conflict   0.79  0.83  0.91

Table 2: Evaluation of conflict measurement on manually annotated conflict ratings.

Figure 3: Error in conflict score vs. size of comments in words.
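The ranking metrics reported in Table 2 (MAP over ranked comments, MRR of the top-ranked conflicting comment) can be sketched as follows, where binarizing annotated relevance per ranked list is an assumption:

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; ranked_relevance[k] is 1 if the item at
    rank k+1 is truly conflicting (per annotation), else 0."""
    hits, score = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / k
    return score / max(hits, 1)

def mean_reciprocal_rank(ranked_lists):
    """MRR over several ranked lists: mean of 1/rank of the first
    relevant item in each list (0 if a list has none)."""
    total = 0.0
    for rels in ranked_lists:
        for k, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / k
                break
    return total / len(ranked_lists)

ap = average_precision([1, 0, 1, 0])                  # relevant at ranks 1, 3
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]])    # first hits at 2, 1
```

MAP is then the mean of such AP values over all news articles (or all replied-to comments).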
8 EVALUATION

We test our conflict measurement on the manually annotated news-comment and comment-comment pairs (Section 3). To deal with different ranges, we normalize the cf values to the [0, 10] interval and measure the Root Mean Squared Error (RMSE). We also consider ranking comments according to their conflicting tendency towards a particular news article and a particular comment. We compute the Mean Average Precision (MAP) of the ranking and the Mean Reciprocal Rank (MRR) for the top ranking position based on the ground-truth annotation mentioned in Section 3.

As observed in Table 2, measuring inter-comment conflict is rather an easier task compared to news-comment conflict. The feedback obtained from the annotators reveals that, as most news articles are written in an objective style with less explicit opinion, it is hard to apprehend whether a comment holds an opposite opinion to the news.

As there is no previous work on quantifying conflict between two text documents over online discussions, we implement the agreement-disagreement detection models proposed by Rosenthal and McKeown [33] (Baseline-I) and Dutta et al. [11] (
Baseline-II). Baseline-I performs a three-class classification: agreement, disagreement and none. We identify disagreement as conflict and the rest of the classes as non-conflict. We also define the probability of the disagreement class (predicted by Baseline-I) for an interaction as a unit-norm conflict score. Similarly, Baseline-II performs a ten-class classification of discourse acts, from which we identify the classes disagreement and negative reaction together as conflict, and the rest of the classes as non-conflict. The sum of the probabilities of these two classes is defined as the unit-norm conflict score predicted by Baseline-II.

We compare our strategy of conflict score prediction with the baselines through a three-way evaluation strategy:

(1) We define a binary classification of interactions into conflict and non-conflict, evaluated using ROC-AUC;
(2) We define a regression of the degree of conflict, where we scale the outputs of each model to the interval [0, 1] and evaluate using RMSE;
(3) We define a ranking problem of the interactions according to their degree of conflict, and evaluate using MAP.

As both the baselines perform their corresponding tasks (stance classification and discourse act classification) on discussion data, we perform this comparison only for the comment-comment conflict prediction.

Quantifying and Modeling Intra-community Conflicts in Online Discussion. CIKM ’19, November 3–7, 2019, Beijing, China.

Metric  Our method (conflict factor)  Baseline-I  Baseline-II
AUC     0.79                          0.79        0.62
MAP     0.83                          0.61        0.55
RMSE    0.79                          1.67        2.09

Table 3: Comparison of conflict score with baselines.

Table 3 shows that our proposed strategy outperforms both the baselines for the ranking and regression tasks. This is quite expected, as both the baseline models are actually classification frameworks. For the binary classification of conflicting and non-conflicting interactions, our strategy ties with Baseline-I.

Figure 3 plots the variance of error in conflict score with the change in comment length. For news-comment pairs, we only take the comment length, while for comment-comment pairs we take the average length of the two comments. To see whether the error in our score has any bias towards underestimation or overestimation, we take the difference (y_true − y_pred), where y_true and y_pred are the manually annotated score and the computed cf, respectively. As we can see in Figure 3, our computed score underestimates conflict when comments are short, and overestimates as the size grows (more negative errors for sizes approximately less than 60 words; more positive errors afterwards). Also, the absolute error rate decreases with increasing size of comments.

Such an error pattern can be explained from the definition of the conflict measurement itself. We use, as the conflict score, the sum of the absolute differences of sentiment towards specific targets common to the documents, which increases with the number of common targets present.
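This measurement can be sketched in a few lines (an illustration, assuming each document has already been reduced to a map from target terms to target-dependent sentiment scores in [−1, 1]):

```python
def conflict_factor(targets_a, targets_b):
    """Sum of absolute sentiment differences over targets common to both texts.

    targets_a, targets_b: dicts mapping a target term to its target-dependent
    sentiment in [-1, 1], as produced by some TD-sentiment model.
    """
    common = set(targets_a) & set(targets_b)
    return sum(abs(targets_a[t] - targets_b[t]) for t in common)
```

Opposite sentiments toward a shared target produce a large score, while pairs with no shared targets score zero, which is exactly the short-comment failure mode discussed below.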
As the length of the comments increases, the common word set also increases, and small differences add up to large conflict scores. For short comments, the number of common targets is also small, and the score tends to reflect less conflict than actual. For shorter comments, another problem is the use of semantically similar words occurring as targets in any of the comments in a given pair. For example, the sentences ‘We do not support Democrats’ and ‘We support Hilary’ are actually conflicting, as the targets Hilary and Democrats are semantically similar. But due to no common words, these pairs will be identified as non-conflicting. However, as our dataset suggests, the fraction of comments having greater than 50 words is 0.79, and the ratio between the number of words and targets is 17. We achieve 0.96 and 0.79 RMSE for news-comment and comment-comment conflict, respectively, over the interval [−1, 1], which might be considered significantly accurate for conflict modeling.

Model          MSE    RMSE   sMAPE
Random Forest  6.194  2.489  0.099
SVR            4.041  2.010  —
Lasso          —      —      —

Table 4: Performance of different regression algorithms for news-user conflict prediction.

Figure 4: Importance of different features for news-user conflict prediction (features compared: TD-sentiment, negative polarity words, positive polarity words, controversy and bias lexicon, subjectivity, latent semantic, LIX, entropy, Gunning Fog).
In our dataset, the news conflict scores (computed using Eq. 3) of the news articles vary from 0 to 138.15. In Table 4, we present the MSE (Mean Squared Error), RMSE and sMAPE (Symmetric Mean Absolute Percentage Error) for predicting news conflict scores using different regression algorithms. In terms of MSE and RMSE, Lasso regression performs the best, while SVR is the best performing one when evaluated using sMAPE.

We check which features are given more importance by our best performing regression algorithms. As we can see in Figure 4, target-dependent sentiments are the most useful for predicting how likely a news article is to get negative feedback. In fact, this feature achieves far more importance compared to its nearest competitors, which again are polarity-oriented features. Interestingly, the count of negative polarity words has higher importance than the count of positive polarity words. The high importance of polarity-related features may signify that news reports expressing polarized bias tend to get more conflicting remarks. Readability indices (Gunning-Fog and LIX), albeit low, play some role in the prediction task. In fact, Gunning-Fog is substantially more useful compared to LIX.
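The comparison in Table 4 can be reproduced in outline with scikit-learn (a sketch on synthetic stand-in data; the real feature matrix comes from the textual features above, and the hyperparameters here are illustrative defaults, not the paper's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.svm import SVR

def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, as a fraction in [0, 2]."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    denom[denom == 0] = 1.0                      # treat 0/0 as zero error
    return float(np.mean(np.abs(y_true - y_pred) / denom))

# Synthetic stand-ins for the article feature matrix and conflict scores
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))
y = np.abs(X @ rng.normal(size=9)) * 10

for model in (RandomForestRegressor(random_state=0), SVR(), Lasso(alpha=0.1)):
    model.fit(X[:160], y[:160])
    pred = model.predict(X[160:])
    mse = float(np.mean((y[160:] - pred) ** 2))
    print(f"{type(model).__name__}: MSE={mse:.3f} RMSE={np.sqrt(mse):.3f} "
          f"sMAPE={smape(y[160:], pred):.3f}")
```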
We evaluate all four models for two cases: (i) the whole of the test data, where a pair of users may or may not have a previous interaction history, and (ii) user pairs who have no interaction history before the prediction instance. We present the evaluation results in Table 5. For the whole test data, the SVM model with all the features performs the best. It readily follows that network-based features are of greater importance than text-based features for this task. However, when there is no previous interaction history between two users, graph convolution beats all the models by a substantial margin. In fact, when there is no previous engagement history between users, the only feature available to the SVM model is the neighbour interactions, which means that SVM-all and SVM-net actually become the same model, and SVM-text becomes a model with all-zero features and hence all-zero output.

Evaluation  SVM-all  SVM-text  SVM-net  GCN
Acc.        —        —         —        —
AUC (new)   0.65     0.43      0.65     0.69

Table 5: Evaluation of conflict link prediction models; ‘new’ denotes user pairs with no previous interaction history.

Figure 5: Distribution of maximum, minimum and average conflict scores for different news sources. This plot is for only the top 7 news sources (ranked by number of articles); sources shown: The Guardian, Reuters, New York Times, BBC, Fox News, USA Today, NBC News, Independent.co.uk.
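The paper does not spell out the graph convolutional architecture here; the underlying propagation rule of Kipf and Welling [23], on which such a model rests, can be sketched as follows (an illustration, not the authors' implementation):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])               # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)       # ReLU non-linearity
```

A pair's conflict probability could then be scored from the two users' final-layer embeddings, e.g. via a dot product passed through a sigmoid; this readout is an assumption for illustration.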
We now examine the dynamics of conflict in the r/news community using the conflict measurements that we propose in Eq. 2 (for inter-user conflict) and Eq. 3 (for the aggregate conflict that a news article receives from the users).
Different news sources tend to face different degrees of conflict from the users. In Figure 5, we plot the maximum, minimum and average news conflict for different news sources in our dataset. Although the average conflict for different sources is in a comparable range, the maximum values vary greatly. News sources such as Fox News, USA Today or NBC News maintain a sustained negative response, whereas New York Times or Reuters provoke sharp outrage at some point.

We find that this outrage is signified by an article published in the New York Times on Dec 1, 2017, titled ‘Michael Flynn Pleads Guilty to Lying to the F.B.I. and Will Cooperate With Russia Inquiry’. Figure 6 also indicates the sharp peak for the New York Times corresponding to this article. The Guardian, Fox News and NBC News have similar peaks (red-circled) at nearby time instances, all corresponding to articles related to the same event. One can draw an intuitive correlation between the posting time of an article in the forum and the rise in conflict. It is important to note that by posting time we mean the time when the news appeared on Reddit, not the time of its appearance on the web.

Figure 6: Temporal variation of news-user conflict for various news sources (NBC News, New York Times, BBC, The Guardian, Fox News, Reuters); conflict score and time are represented on the y- and x-axes, respectively. All the plots cover the time frame Nov 17 to Dec 28, 2017. Red-circled peaks denote rises in conflict due to articles corresponding to a particular event.
To explore how conflict affects user engagement over r/news, we construct a temporal graph G′(t) = {V′(t), E′(t)}, where v_i(t_i) ∈ V′(t) corresponds to user u_i who engaged in a discussion at time t_i for the first time. For every pair of users (u_i, u_j) engaging with each other (either of them commenting in reply to the other) at time t_ij, there is an edge e_ij(t_ij) ∈ E′(t). For better visualization, we classify edges as conflicting (blue) and non-conflicting (green), and plot only a subgraph using 5000 vertices. We use the Fruchterman-Reingold layout algorithm [14] on Gephi [2] to plot the graph and DyCoNet [21] to identify communities. In Figure 7, we present snapshots of the evolving graph. Each snapshot is taken at a time difference of approximately 24 hours, presenting a 4-day-long abstraction of this engagement subgraph.

We can observe the formation of separate user clusters in terms of engagement. It is interesting to see that there are some clusters where users are predominantly engaged with each other in a conflicting manner (blue regions) and some in a non-conflicting manner (green regions). We also identify three different types of engagement patterns in user clusters:

• Type-I clusters tend to be formed with non-conflicting engagement between users. Users in these clusters do not seem to engage in a conflicting manner with users in other clusters either.
• Type-II clusters are formed with users having mutual conflict. They tend to have conflicting interactions with other clusters as well.
• Type-III clusters show an organization-like behavior. These users maintain almost non-conflicting engagement with each other, but are aggressive towards other clusters (mostly green regions inside the cluster and blue ones outwards in Figure 7).
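The construction of G′(t) might be sketched as follows (an illustration with plain dictionaries; the reply-record format and the conflict threshold are hypothetical placeholders, not the paper's):

```python
def build_engagement_graph(replies, conflict_threshold=0.5):
    """Assemble a snapshot of the engagement graph G'(t).

    replies: iterable of (user_i, user_j, time, conflict_score) tuples, one per
    reply between two users. Returns (nodes, edges), where nodes maps a user to
    the time of their first engagement, and edges maps an unordered user pair
    to (first engagement time, is_conflicting).
    """
    nodes, edges = {}, {}
    for u_i, u_j, t, cf in sorted(replies, key=lambda r: r[2]):
        for u in (u_i, u_j):
            nodes.setdefault(u, t)               # first time the user engaged
        pair = frozenset((u_i, u_j))
        edges.setdefault(pair, (t, cf > conflict_threshold))
    return nodes, edges
```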
Figure 7: Snapshots of cluster formation in the user-user engagement graph (left to right), with Type-I, Type-II and Type-III clusters marked; blue and green edges correspond to controversial and non-controversial engagements, respectively.
Type-III clusters tend to grow the most compared to Type-I and Type-II clusters. Different Type-III clusters have the most inter-cluster conflicts, even greater than those of Type-II clusters. Type-I clusters show the least growth rate among the three types, signifying that these users are less prone to go out of their ‘comfort zone’.

These cluster types are, of course, not completely rigid. Although there is no sign of conversion between Type-I and Type-II, both of them can slowly convert into Type-III. It is intriguing to observe two different patterns in the formation of Type-III clusters: (i) Some of them emerge as Type-III from the beginning. Users having no previous engagement form non-conflicting connections with each other. This may signify a probable community interaction among them beyond the discussion platform, such as organized campaigners, small groups of people using multiple fake user accounts aka sockpuppets [24], or people accustomed to each other in real life and sharing similar opinions. (ii) Some of them start as Type-I or Type-II and slowly convert into Type-III, which possibly signifies the evolution of engagement via predominant platform interaction. Users in Type-II clusters start changing opinion towards each other with long-term interaction and convert into Type-III. Similarly, Type-I users tend to start interacting with opposite opinions and convert themselves into Type-III. We observe that 33% of the Type-III clusters at the end of the time frame are ones converted from Type-II, whereas 48% are from Type-I. The rest started growing as Type-III clusters.
Figure 8: Variation of normalized inter-comment conflict with depth of comments in the discussion tree.
Formation and evolution of these clusters closely follow the abstract model of user engagement in Figure 1. A repeated transition from state 1 along the self-loop results in a Type-I cluster, whereas the same happening on state 2 results in a Type-II cluster. If all the user pairs in state 1 start conflicting with each other, it will lead to a transition to state 2, which implies that a Type-I cluster is transformed into Type-II. This is only possible hypothetically; we did not find any such evidence in our dataset. Likewise, a transition from state 1 or 2 to state 3 signifies preferential conflict, resembling Type-III clusters.

In Figure 8, we plot the variation of inter-comment conflict with the depth of the comments in the discussion thread tree. We normalize conflict scores to the (0, 1) interval. For comment pairs at depths i and i+1, we plot their conflict at depth i. As is evident from the plot, a discussion thread is most prone to conflict at depth levels 3 and 4. For interactions at greater depth, the variance goes up substantially, but the average inter-comment conflict score drops steadily.

Table 6 shows example statistics of different news sources regarding which discussions lead to user clusters. We report this for three different instances G′(t1), G′(t2), and G′(t3) at times t1, t2, and t3, respectively. We take the discussions initiated within the past 24 hours for each instance of the network and map the users in each of the three largest clusters to those discussions. As each discussion is related to a news source, this finally maps news sources to clusters. As we can see in Table 6, there are several common news sources present in the first and second instances, whereas almost no common source is found in the third instance.

In this paper, we studied conflict dynamics over online discussions inside the Reddit r/news community. We proposed a novel, continuous-valued quantification of inter-document conflict. Using this measurement, we attempted to predict how much negative response a news article is going to face from its audience on online discussion platforms, solely based on its textual features. We proposed an SVM-based model and a graph convolutional model to predict future conflict between pairs of users. Extensive evaluation showed that network-based features are more important in conflict link prediction than textual content-based features.

Our analyses provide novel insights into conflict dynamics over large-scale online discussion. We show how different news sources get different reactions from their audiences and how this varies temporally. We identified three distinct types of user clusters developed in the Reddit r/news community, based on attitude towards other users and engagement patterns. We also provided a
We also provided ahypothetical state-transition model of user engagement, which isclosely followed by actual interaction patterns.
Cluster index (ranked by size)   Instance 1   Instance 2   Instance 3

Table 6: Percentage of different news sources in user clusters of the user-user engagement network. We show the statistics of the three largest clusters at three different instances of the network. Up to the top four news sources (according to %-contribution) are shown.
ACKNOWLEDGEMENT

The project was partially supported by the Ramanujan Fellowship (SERB, India), the Early Career Research Award (ECR/2017/001691), the Infosys Centre for AI, IIITD, and the State Government Fellowship, Jadavpur University.
REFERENCES

[1] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in social media. ACM Transactions on the Web (TWEB) 6, 2 (2012), 9–29.
[2] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: an open source software for exploring and manipulating networks. In ICWSM. 11–20.
[3] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
[4] Carl-Hugo Björnsson. 1983. Readability of newspapers in 11 languages. Reading Research Quarterly (1983), 480–497.
[5] Catherine A Bliss, Morgan R Frank, Christopher M Danforth, and Peter Sheridan Dodds. 2014. An evolutionary algorithm approach to link prediction in dynamic social networks. Journal of Computational Science 5, 5 (2014), 750–764.
[6] Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In AAAI. 1–10.
[7] Michael D Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In ICWSM. 1–10.
[8] Peter A Cramer. 2011. Controversy as news discourse. Vol. 19. Springer Science & Business Media.
[9] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS. 3844–3852.
[10] Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In ACL, Vol. 2. 49–54.
[11] Subhabrata Dutta, Tanmoy Chakraborty, and Dipankar Das. 2019. How did the discussion go: Discourse act classification in social media conversations. In Linking and Mining Heterogeneous and Multi-view Data. Springer, 137–160.
[12] Kelwin Fernandes, Pedro Vinagre, and Paulo Cortez. 2015. A proactive intelligent decision support system for predicting the popularity of online news. In Portuguese Conference on Artificial Intelligence. Springer, 535–546.
[13] Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 5 (1971), 378.
[14] Thomas MJ Fruchterman and Edward M Reingold. 1991. Graph drawing by force-directed placement. Software: Practice and Experience 21, 11 (1991), 1129–1164.
[15] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing 1, 1 (2018), 3.
[16] Venkata Rama Kiran Garimella and Ingmar Weber. 2017. A long-term analysis of polarization on Twitter. In ICWSM. 1–10.
[17] Eric Gilbert and Karrie Karahalios. 2009. Predicting tie strength with social media. In SIGCHI. ACM, 211–220.
[18] Pedro Calais Guerra, Wagner Meira Jr, Claire Cardie, and Robert Kleinberg. 2013. A measure of polarization on social media networks based on community boundaries. In ICWSM. 1–10.
[19] Robert Gunning. 1969. The fog index after twenty years. Journal of Business Communication 6, 2 (1969), 3–13.
[20] Divam Gupta, Kushagra Singh, Soumen Chakrabarti, and Tanmoy Chakraborty. 2019. Multi-task learning for target-dependent sentiment classification. arXiv preprint arXiv:1902.02930 (2019).
[21] Julie Kauffman, Aristotelis Kittas, Laura Bennett, and Sophia Tsoka. 2014. DyCoNet: a Gephi plugin for community detection in dynamic complex networks. PLoS ONE 9, 7 (2014), e101357.
[22] Yaser Keneshloo, Shuguang Wang, Eui-Hong Han, and Naren Ramakrishnan. 2016. Predicting the popularity of news articles. In SDM. SIAM, 441–449.
[23] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[24] Srijan Kumar, Justin Cheng, Jure Leskovec, and V.S. Subrahmanian. 2017. An army of me: Sockpuppets in online discussion communities. In WWW. 857–866.
[25] Srijan Kumar, William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2018. Community interaction and conflict on the web. In WWW. International World Wide Web Conferences Steering Committee, 933–943.
[26] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58, 7 (2007), 1019–1031.
[27] Michal Lukasik, PK Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, and Trevor Cohn. 2016. Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In ACL, Vol. 2. 393–398.
[28] Marian Meyers. 1994. Defining homosexuality: News coverage of the ‘repeal the ban’ controversy. Discourse & Society 5, 3 (1994), 321–344.
[29] Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. SemEval-2016 task 6: Detecting stance in tweets. In SemEval. 31–41.
[30] Reed E Nelson. 1989. The strength of strong ties: Social networks and intergroup conflict in organizations. Academy of Management Journal 32, 2 (1989), 377–401.
[31] Alicja Piotrkowicz, Vania Dimitrova, Jahna Otterbacher, and Katja Markert. 2017. Headlines matter: Using headlines to predict the popularity of news articles on Twitter and Facebook. In ICWSM. 1–10.
[32] Georgios Rizos, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2016. Predicting news popularity by mining online discussions. In WWW. International World Wide Web Conferences Steering Committee, 737–742.
[33] Sara Rosenthal and Kathy McKeown. 2015. I couldn’t agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 168–177.
[34] Niek J Sanders. 2011. Sanders-Twitter sentiment corpus. Sanders Analytics LLC 242 (2011).
[35] Robert Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692 (2016).
[36] Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. 2017. TDParse: Multi-target-specific sentiment recognition on Twitter. In EMNLP. 483–493.
[37] Peng Wang, BaoWen Xu, YuRong Wu, and XiaoYu Zhou. 2015. Link prediction in social networks: the state-of-the-art. Science China Information Sciences 58, 1 (2015), 1–38.
[38] Bo Wu and Haiying Shen. 2015. Analyzing and predicting news popularity on Twitter. International Journal of Information Management 35, 6 (2015), 702–711.
[39] Wayne W Zachary. 1977. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 4 (1977), 452–473.
[40] Amy Zhang, Bryan Culbertson, and Praveen Paritosh. 2017. Characterizing online discussion using coarse discourse sequences. (2017).
[41] Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. In NIPS. 5165–5175.
[42] Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, and Isabelle Augenstein. 2018. Discourse-aware rumour stance classification in social media using sequential classifiers.