Demarcating Endogenous and Exogenous Opinion Dynamics: An Experimental Design Approach
Paramita Koley, Avirup Saha, Sourangshu Bhattacharya, Niloy Ganguly, Abir De
PARAMITA KOLEY,
Department of Computer Science and Engineering, IIT Kharagpur, India
AVIRUP SAHA,
Department of Computer Science and Engineering, IIT Kharagpur, India
SOURANGSHU BHATTACHARYA,
Department of Computer Science and Engineering, IIT Kharagpur, India
NILOY GANGULY,
Department of Computer Science and Engineering, IIT Kharagpur, India
ABIR DE,
Department of Computer Science and Engineering, IIT Bombay, India

Networked opinion diffusion in online social networks (OSNs) is often governed by two genres of opinions: endogenous opinions, which are driven by the influence of social contacts among users, and exogenous opinions, which are formed by external effects like news, feeds, etc. Accurate demarcation of endogenous and exogenous messages offers an important cue for opinion modeling, thereby enhancing its predictive performance. In this paper, we design a suite of unsupervised classification methods based on experimental design approaches, in which we aim to select the subsets of events that minimize different measures of mean estimation error. In more detail, we first show that these subset selection tasks are NP-Hard. Then we show that the associated objective functions are weakly submodular, which allows us to cast efficient approximation algorithms with guarantees. Finally, we validate the efficacy of our proposal on various real-world datasets crawled from Twitter as well as diverse synthetic datasets. Our experiments range from validating prediction performance on unsanitized and sanitized events to checking the effect of selecting optimal subsets of various sizes. Through various experiments, we have found that our method offers a significant improvement in opinion forecasting accuracy over several competitors.

CCS Concepts: •
Networks → Network dynamics; • Human-centered computing → Social networks; • Computing methodologies → Network science; Modeling methodologies; Agent / discrete models; Anomaly detection; Markov decision processes.

Additional Key Words and Phrases: Opinion dynamics, robust inference, submodularity, subset selection, temporal point process
ACM Reference Format:
Paramita Koley, Avirup Saha, Sourangshu Bhattacharya, Niloy Ganguly, and Abir De. 2021. Demarcating Endogenous and Exogenous Opinion Dynamics: An Experimental Design Approach.
ACM Trans. Knowl. Discov. Data.
1, 1, Article 1 (January 2021), 25 pages. https://doi.org/10.1145/3449361
Authors’ addresses: Paramita Koley, [email protected], Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India, 721302; Avirup Saha, Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India, 721302, [email protected]; Sourangshu Bhattacharya, Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India, 721302, [email protected]; Niloy Ganguly, Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India, 721302, [email protected]; Abir De, Department of Computer Science and Engineering, IIT Bombay, Mumbai, India, 400076, [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2021 Copyright held by the owner/author(s). 1556-4681/2021/1-ART1 https://doi.org/10.1145/3449361
ACM Trans. Knowl. Discov. Data., Vol. 1, No. 1, Article 1. Publication date: January 2021.

Research on understanding opinion dynamics, from both modeling and control perspectives, abounds in the literature [1, 8, 14–16, 18, 20–22, 24, 26, 35, 47, 48, 67], predominantly following two approaches. The first approach is grounded in the concepts of statistical physics; it is barely data-driven and therefore shows poor predictive performance [1, 8, 14, 20–22, 24, 26, 35, 47, 48, 67]. The second class of models aims to overcome such limitations by learning a tractable linear model from transient opinion dynamics [15, 16, 18]. Barring the individual limitations of these existing approaches, they all have assumed the absence or lack of external effects, despite empirical evidence advocating the presence of such signals [2, 28, 29, 49, 60]. Since a social network is an open system encouraging both inward and outward flow of information, a continuous flux of external information is funneled to its users via a gamut of sources like news, feeds, etc. As a result, a networked opinion formation process that involves extensive interactive discussions among connected users is also propelled by such external sources recommended to those users. Therefore, at the very outset, we observe two families of opinions: endogenous opinions, which evolve due to the influence from neighbors, and exogenous opinions, which are driven mostly by externalities. In most practical situations, the true labels of the posts (endogenous or exogenous) are not available. Therefore, an accurate unsupervised labeling of the posts has immense potential impact on opinion modeling, boosting the predictive performance for a broad spectrum of applications like poll prediction, brand sentiment estimation, etc. In this paper, our goal is to demarcate endogenous and exogenous messages and demonstrate the utility of our proposal from an opinion modeling viewpoint.
We begin by investigating the dynamics of organic opinion in the presence of exogenous actions, using a recent temporal point-process based model, SLANT [18]. It allows users' latent endogenous opinions to be modulated over time by both the endogenous and exogenous opinions of their neighbours, expressed as sentiment messages (Section 3). Subsequently, in Section 4 we propose CherryPick, a suite of learning algorithms based on experimental design methods that optimally demarcate the endogenous and exogenous opinions under various circumstances. In a nutshell, in order to categorize messages, we aim to select the set of events that comply with the organic dynamics with high confidence, i.e. a low variance of influence estimation. To this end, we pose this problem as an inference task over the message category (endogenous or exogenous) by means of subset selection (i.e. demarcating a subset of endogenous messages from the whole message stream). We find that this proposed inference problem can be formulated as an instance of a cardinality-constrained submodular maximization problem. To solve this optimization problem, we present a greedy approach which, like an ordinary greedy submodular maximization algorithm, enjoys approximation bounds. However, since some of the optimization objectives we consider are only weakly submodular, they admit the special approximation bounds proposed recently in [33]. Finally, we perform experiments on various real datasets crawled from Twitter about diverse topics (Section 5) and synthetic datasets built over diverse networks (Section 6), and show that CherryPick can accurately classify endogenous and exogenous messages, thereby helping to achieve a substantial performance boost in forecasting opinions. This paper is an extension of [17], where the idea of CherryPick was first introduced; it has been substantially refined and expanded here.
Opinion modeling and its applications have been widely studied in different guises over the years. In this section, we review some of them from two major perspectives: (i) opinion dynamics modeling and (ii) opinion sensing.
Opinion dynamics modeling.
Modeling the evolution of opinion flow over networks mostly follows two approaches, based on (a) statistical physics and (b) data-driven techniques. The first type of models, e.g. Voter, Flocking, DeGroot, etc., is traditionally designed to capture various regulatory real-life phenomena, e.g. consensus, polarization, clustering, coexistence, etc. [10, 14, 19, 20, 23, 34, 40, 45, 46, 56, 61, 62, 64, 66, 67]. The Voter model [14] is a discrete opinion model, where opinions are represented as nominal values and copied from influencing neighbors in every step. This underlying principle is still a major workhorse for many discrete opinion models [10, 23, 45, 46, 56, 61, 62, 66, 67]. In contrast to these models, Flocking and DeGroot are continuous opinion models. In the Flocking model and its variations [19, 34, 40, 64], a node 𝑖 having opinion 𝑥𝑖 first selects the set of neighbors 𝑗 with |𝑥𝑖 − 𝑥𝑗| ≤ 𝜖, and then updates its own opinion by averaging these opinions. The DeGroot model [20], on the other hand, allows a user to update her opinion with the average opinions of all her neighbors. In this model, the underlying influence matrix is row stochastic, enforcing consensus for a strongly connected graph. The second class of models, e.g. Biased Voter, AsLM, SLANT, etc., aims to learn a tractable linear model from a temporal message stream reflecting transient opinion dynamics [15, 16, 18, 30]. While the Biased Voter model [15] unifies various aspects of the DeGroot and Flocking models, AsLM [16] generalizes the DeGroot model by relaxing the structure of the influence matrix. In contrast to these models that ignore the temporal effects of messages (post-rate), SLANT [18] blends the opinion dynamics with the message dynamics using a stochastic generative model. In contrast to the above modeling approaches, there exist abundant empirical studies [3–5, 11, 27] which investigate various factors influencing information diffusion in social networks through large-scale field experiments.
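As a minimal sketch of the two continuous-opinion updates described above (illustrative code, not from any of the cited implementations; the DeGroot influence weights are taken as uniform row-stochastic averages over neighbors including self):

```python
import numpy as np

def flocking_step(opinions, eps):
    # Bounded-confidence (Flocking-style) update: node i averages over the
    # opinions within distance eps of its own.
    new = np.empty_like(opinions)
    for i, x_i in enumerate(opinions):
        close = opinions[np.abs(opinions - x_i) <= eps]
        new[i] = close.mean()
    return new

def degroot_step(opinions, adjacency):
    # DeGroot update: x <- W x with a row-stochastic influence matrix W,
    # here taken as uniform weights over neighbors plus self.
    W = adjacency + np.eye(len(opinions))
    W = W / W.sum(axis=1, keepdims=True)
    return W @ opinions

# A strongly connected triad: repeated DeGroot updates drive consensus.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
x = np.array([0.0, 0.5, 1.0])
for _ in range(50):
    x = degroot_step(x, A)
# x is now approximately [0.5, 0.5, 0.5]
```

Because the influence matrix here is row stochastic and the graph strongly connected, the iteration converges to a consensus value, illustrating the consensus property noted for the DeGroot model.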
However, all these approaches skirt the effect of externalities, which severely constrains their forecasting prowess.

Opinion sensing.
Sensing opinions, or mining sentiments from textual data, traditionally relies on sophisticated NLP-based machinery; see the comprehensive surveys in [44, 53] for details. In general, LIWC [54] is widely considered a benchmark tool for computing sentiments from rich textual data. On the other hand, Hannak et al. developed a simple yet effective method for sentiment mining from short informal text like tweets [31], which is also used by [16, 18]. Recently, a class of works [36, 37, 41, 42] designs simple supervised strategies to sense opinion spam, and some of them [36, 37, 42] also advocate the role of temporal signals in opinion spamming. Note that exogenous opinions are fundamentally different from opinion spam. In contrast to a spam message, which is unsolicited and irrelevant to the discussion, an exogenous post is often relevant, yet just an informed reflection of some external news or feeds. Also, since the spamminess of a message is its intrinsic property, it does not depend on the messages before it; an exogenous post, however, can become endogenous when retweeted. Furthermore, opinion spam detection techniques rest on the principle of supervised classification, which in turn requires labeled messages. In the context of networked opinion dynamics, the messages (tweets) come unlabeled, which renders spam detection techniques practically inapplicable in such scenarios. Finally, we conclude this section by noting that our work closely resembles SLANT [18] and is built upon its modeling framework; the major difference is that SLANT assumes the entire event stream is endogenous, whereas our work explores various techniques for systematically demarcating the externalities from the heterogeneous event stream. Similarly, our proposed algorithms are closely influenced by recent progress in the subset selection literature [33], where the authors design new alphabetical optimality criteria for
Symbol      Meaning
G           A directed social network (such as Twitter)
V           Vertices of G (users)
E           Edges of G (follower-followee links)
N(𝑢)        Set of users followed by 𝑢, a subset of V
U(𝑡)        History of messages posted by all users until time 𝑡
U𝑢(𝑡)       History of messages posted by user 𝑢 until time 𝑡
𝑒𝑖          𝑖-th message in U(𝑡)
𝑢𝑖          User posting the 𝑖-th message in U(𝑡)
𝜁𝑖          Opinion/sentiment value of the 𝑖-th message in U(𝑡)
𝑡𝑖          Timestamp of the 𝑖-th message in U(𝑡)
H(𝑡)        Set of endogenous messages in U(𝑡)
𝑚𝑖          Opinion/sentiment value of the 𝑖-th endogenous message in H(𝑡)
H𝑢(𝑡)       Set of endogenous messages posted by user 𝑢 in U𝑢(𝑡)
C(𝑡)        Set of exogenous messages in U(𝑡), complementary to H(𝑡)
𝑤𝑖          Opinion/sentiment value of the 𝑖-th exogenous message in C(𝑡)
C𝑢(𝑡)       Set of exogenous messages posted by user 𝑢 in U𝑢(𝑡), complementary to H𝑢(𝑡)
𝑁𝑢(𝑡)       Counting process for endogenous messages of user 𝑢, equal to |H𝑢(𝑡)|
𝑀𝑢(𝑡)       Counting process for exogenous messages of user 𝑢, equal to |C𝑢(𝑡)|
𝑵(𝑡)        Set of counting processes (𝑁𝑢(𝑡))𝑢∈V
𝑴(𝑡)        Set of counting processes (𝑀𝑢(𝑡))𝑢∈V
𝜆∗𝑢(𝑡)      Intensity of 𝑁𝑢(𝑡), i.e. the endogenous message rate of user 𝑢
𝝀∗(𝑡)       Set of intensities (𝜆∗𝑢(𝑡))𝑢∈V
Table 1. List of important notations used in Section 3.

quadratic models. However, our work is motivated towards investigating the subset selection problem for linear models in a temporal setting.
In what follows, we describe the scenario of a social network of users who post opinion-bearing messages. For ease of reference, we list a compendium of all important notations used in this section in Table 1.

We use two sources of data as input: (I) a directed social network G = (V, E) of users with the connections between them (e.g. friends, following, etc.), and (II) an aggregated history U(𝑇) of the messages posted by these users during a given time window [0, 𝑇). In this paper, we summarize each message-event 𝑒𝑖 ∈ U(𝑇) using only three components: the user 𝑢𝑖 who has posted the message, the opinion or sentiment value 𝜁𝑖 associated with the message, and the timestamp 𝑡𝑖 of the post. Therefore, U(𝑇) := {𝑒𝑖 = (𝑢𝑖, 𝜁𝑖, 𝑡𝑖) | 𝑡𝑖 < 𝑇}. We also use the notation U(𝑡) to denote the set of messages collected until 𝑡 < 𝑇, i.e. U(𝑡) := {𝑒𝑖 = (𝑢𝑖, 𝜁𝑖, 𝑡𝑖) | 𝑡𝑖 < 𝑡}.

In the spirit of [18], we assume that the history of events until time 𝑡 influences the arrival process of events after time 𝑡. However, in direct contrast to [18], which skirts the potential influence of externalities, we posit that the message events belong to two categories: endogenous and exogenous. Whereas the arrivals of endogenous events are driven by the previous events in the network, i.e. these are history-dependent, exogenous events originate from external influence outside the given social network and are therefore not history-dependent. Note that the distinction between endogenous and exogenous events is not directly observable from the data, but needs to be inferred from the characteristics of the event sequence.
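As a minimal sketch of this data representation (the field and helper names are illustrative, not from the paper's implementation), the message history can be stored as time-ordered triples and sliced per user:

```python
from collections import namedtuple

# A message-event e_i = (u_i, zeta_i, t_i): posting user, sentiment, timestamp.
Event = namedtuple("Event", ["user", "sentiment", "time"])

def history_until(events, t):
    # U(t): all events with timestamp strictly before t, in temporal order.
    return sorted((e for e in events if e.time < t), key=lambda e: e.time)

def user_history(events, u, t):
    # U_u(t): messages posted by user u before time t.
    return [e for e in history_until(events, t) if e.user == u]

events = [Event("a", 0.7, 1.0), Event("b", -0.2, 2.5), Event("a", 0.4, 3.0)]
print(len(history_until(events, 3.0)))      # 2
print(len(user_history(events, "a", 4.0)))  # 2
```

The per-user histories U𝑢(𝑡) obtained this way are exactly the unions described above: concatenating them over all users recovers U(𝑡).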
To this end, we split the entire set of messages observed until time 𝑡, U(𝑡), into two complementary subsets, H(𝑡) and C(𝑡), representing the sets of endogenous and exogenous events respectively, with U(𝑡) = H(𝑡) ∪ C(𝑡) and H(𝑡) ∩ C(𝑡) = ∅. At a user level, we denote H𝑢(𝑡) = {𝑒𝑖 = (𝑢𝑖, 𝑚𝑖, 𝑡𝑖) | 𝑢𝑖 = 𝑢 and 𝑡𝑖 < 𝑡} as the collection of all endogenous messages with sentiments 𝑚𝑖 posted by user 𝑢 until time 𝑡. Similarly, C𝑢(𝑡) = {𝑒𝑖 = (𝑢𝑖, 𝑤𝑖, 𝑡𝑖) | 𝑢𝑖 = 𝑢 and 𝑡𝑖 < 𝑡} denotes the history of exogenous messages with sentiments 𝑤𝑖 posted by user 𝑢 ∈ V until time 𝑡. Finally, we write U𝑢(𝑡) = H𝑢(𝑡) ∪ C𝑢(𝑡) as the history gathering both types of messages posted by user 𝑢 until time 𝑡. Therefore, ∪𝑢∈V H𝑢(𝑡) = H(𝑡), ∪𝑢∈V C𝑢(𝑡) = C(𝑡), and ∪𝑢∈V U𝑢(𝑡) = U(𝑡). Note that, for clarity, we denote 𝑚𝑖 and 𝑤𝑖 as endogenous and exogenous sentiments respectively, while 𝜁𝑖 denotes an opinion of either type; both types of sentiments belong to an identical domain.

To model the endogenous message dynamics, we represent the message times by a set of counting processes, denoted as a vector 𝑵(𝑡), in which the 𝑢-th entry, 𝑁𝑢(𝑡) ∈ {0} ∪ Z+, counts the number of endogenous messages user 𝑢 posted until time 𝑡, i.e. 𝑁𝑢(𝑡) = |H𝑢(𝑡)|. Then, we characterize the message rates with the conditional intensity function

E[𝑑𝑵(𝑡) | U(𝑡)] = 𝝀∗(𝑡) 𝑑𝑡,   (1)

where 𝑑𝑵(𝑡) := (𝑑𝑁𝑢(𝑡))𝑢∈V counts the endogenous messages per user in the interval [𝑡, 𝑡 + 𝑑𝑡) and 𝝀∗(𝑡) := (𝜆∗𝑢(𝑡))𝑢∈V denotes the user intensities, which depend on the history U(𝑡).

Note that we assume the endogenous events do not depend on their own history H(𝑡) only, but rather on the combined history U(𝑡) of both endogenous and exogenous events. Hence, every exogenous post influences the subsequent endogenous events in the same manner as the previous endogenous events.
This is because a recipient user cannot distinguish between exogenous and endogenous posts made by her neighbors.

In order to represent the arrival times of the exogenous message set C(𝑡), we introduce an additional counting process 𝑴(𝑡) that describes the rate of generation of exogenous events, in which the 𝑢-th entry, 𝑀𝑢(𝑡) ∈ {0} ∪ Z+, counts the number of exogenous messages user 𝑢 posted until time 𝑡, i.e. 𝑀𝑢(𝑡) = |C𝑢(𝑡)|. Note that we do not aim to model the dynamics of exogenous events, since their source is not known to us.

For clarity, we briefly discuss the proposal by De et al. [18], which ignores the effect of exogenous messages. The user intensities 𝜆∗𝑢(𝑡) are generally modeled using a multivariate Hawkes process [43]. We denote the set of users that 𝑢 follows by N(𝑢). In the absence of exogenous actions, i.e. when U(𝑡) = H(𝑡), we have:

𝜆∗𝑢(𝑡) = 𝜇𝑢 + ∑_{𝑣∈N(𝑢)} 𝑏𝑣𝑢 ∑_{𝑒𝑖∈H𝑣(𝑡)} 𝜅(𝑡 − 𝑡𝑖).   (2)

Here, the first term, 𝜇𝑢 ⩾ 0, captures the posts by user 𝑢 on her own initiative, and the second term, with 𝑏𝑣𝑢 ⩾ 0, reflects the influence of previous posts on her intensity (self-excitation). The users' latent opinions are represented as a history-dependent, multidimensional stochastic process 𝒙∗(𝑡):

𝑥∗𝑢(𝑡) = 𝛼𝑢 + ∑_{𝑣∈N(𝑢)} 𝑎𝑣𝑢 ∑_{𝑒𝑖∈H𝑣(𝑡)} 𝑚𝑖 𝑔(𝑡 − 𝑡𝑖),   (3)

where the first term, 𝛼𝑢 ∈ R, models the original opinion of user 𝑢 and the second term, with 𝑎𝑣𝑢 ∈ R, models updates in user 𝑢's opinion due to the influence from previous messages of her neighbours. Here, 𝜅(𝑡) = 𝑒^(−𝜈𝑡) and 𝑔(𝑡) = 𝑒^(−𝜔𝑡) (where 𝜈, 𝜔 ⩾ 0) denote exponential triggering kernels, which model the decay of influence over time. Finally, when a user 𝑢 posts a message at time 𝑡, the message sentiment 𝑚 reflects the expressed opinion, which is sampled from a distribution 𝑝(𝑚 | 𝑥∗𝑢(𝑡)). Here, the sentiment distribution 𝑝(𝑚 | 𝑥∗𝑢(𝑡)) is assumed to be normal, i.e. 𝑝(𝑚 | 𝑥𝑢(𝑡)) = N(𝑥𝑢(𝑡), 𝜎𝑢).

In this section, we model the effect of exogenous events, C(𝑡), on the latent endogenous opinion process 𝒙∗(𝑡) and the endogenous rate 𝝀∗(𝑡). Recall that N(𝑢) denotes the set of users that 𝑢 follows. We present the dynamics of the latent opinion 𝑥∗𝑢(𝑡) of user 𝑢 in the presence of exogenous messages in the following equation:

𝑥∗𝑢(𝑡) = 𝛼𝑢 + ∑_{𝑣∈N(𝑢)} 𝑎𝑣𝑢 ( ∑_{𝑒𝑖∈H𝑣(𝑡)} 𝑚𝑖 𝑔(𝑡 − 𝑡𝑖) + ∑_{𝑒𝑖∈C𝑣(𝑡)} 𝑤𝑖 𝑔(𝑡 − 𝑡𝑖) ),   (4)

where the last term captures signals from exogenous posts. Similarly, the endogenous message rate 𝜆∗𝑢(𝑡) of a user 𝑢 evolves as

𝜆∗𝑢(𝑡) = 𝜇𝑢 + ∑_{𝑣∈N(𝑢)} 𝑏𝑣𝑢 ( ∑_{𝑒𝑖∈H𝑣(𝑡)} 𝜅(𝑡 − 𝑡𝑖) + ∑_{𝑒𝑖∈C𝑣(𝑡)} 𝜅(𝑡 − 𝑡𝑖) ).   (5)

Note that the same parameters, 𝑎𝑣𝑢 and 𝑏𝑣𝑢, are used to model the effect of the endogenous and exogenous processes on both the opinion and message dynamics. The above equations can be equivalently written as:

𝒙∗(𝑡) = 𝜶 + ∫₀ᵗ 𝑔(𝑡 − 𝑠) 𝑨 [𝒎(𝑠) ⊙ 𝑑𝑵(𝑠) + 𝒘(𝑠) ⊙ 𝑑𝑴(𝑠)],   (6)
𝝀∗(𝑡) = 𝝁 + ∫₀ᵗ 𝑩 𝜅(𝑡 − 𝑠) [𝑑𝑵(𝑠) + 𝑑𝑴(𝑠)].   (7)

Here 𝑨 = (𝑎𝑣𝑢) ∈ R^(|V|×|V|), 𝑩 = (𝑏𝑣𝑢) ∈ R₊^(|V|×|V|), and 𝒙∗(𝑡) = (𝑥∗𝑢(𝑡))𝑢∈V; we define 𝝀∗(𝑡), 𝒎(𝑠), and 𝒘(𝑠) similarly. Furthermore, the exogenous intensity is given by E[𝑑𝑴(𝑡) | U(𝑡)] = 𝜼(𝑡) 𝑑𝑡.
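A direct way to read Eqs. (4) and (5) is as weighted sums of exponentially decayed kernel values over past events. The sketch below (illustrative names and toy values, not the authors' implementation) evaluates 𝑥∗𝑢(𝑡) and 𝜆∗𝑢(𝑡) for one user over a combined history, where endogenous and exogenous events enter identically:

```python
import math

def opinion_and_intensity(u, t, events, followees, alpha, mu, a, b, omega, nu):
    # Evaluate x*_u(t) (Eq. 4) and lambda*_u(t) (Eq. 5) for one user.
    # events holds (poster v, sentiment, time) triples, endogenous and exogenous
    # combined, since both enter the sums with the same weights a_vu, b_vu.
    x, lam = alpha[u], mu[u]
    for v, zeta, s in events:
        if v in followees[u] and s < t:
            gap = t - s
            x += a[(v, u)] * zeta * math.exp(-omega * gap)  # g(t - s) = e^{-omega (t-s)}
            lam += b[(v, u)] * math.exp(-nu * gap)          # kappa(t - s) = e^{-nu (t-s)}
    return x, lam

# Toy instance with a single followee message (all names/values illustrative).
followees = {"u": {"v"}}
alpha, mu = {"u": 0.1}, {"u": 0.2}
a, b = {("v", "u"): 0.5}, {("v", "u"): 0.3}
x, lam = opinion_and_intensity("u", 2.0, [("v", 0.8, 1.0)],
                               followees, alpha, mu, a, b, omega=1.0, nu=1.0)
# x = 0.1 + 0.5 * 0.8 * e^{-1};  lam = 0.2 + 0.3 * e^{-1}
```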
We do not aim to model 𝜼(𝑡). By defining 𝑷(𝑡) := 𝑵(𝑡) + 𝑴(𝑡) as the counting process associated with the combined history U(𝑡) = H(𝑡) ∪ C(𝑡) of both endogenous and exogenous events, we further simplify Eqs. (6) and (7) as:

𝒙∗(𝑡) = 𝜶 + ∫₀ᵗ 𝑔(𝑡 − 𝑠) 𝑨 [𝜻(𝑠) ⊙ 𝑑𝑷(𝑠)],   (8)
𝝀∗(𝑡) = 𝝁 + ∫₀ᵗ 𝑩 𝜅(𝑡 − 𝑠) 𝑑𝑷(𝑠).   (9)

In this section, we propose a novel technique for demarcating the endogenous messages
H(𝑇) and the exogenous messages C(𝑇) from a stream of unlabelled messages U(𝑇) gathered during time [0, 𝑇). Then, based on the categorized messages, we find the optimal parameters 𝜶, 𝝁, 𝑨 and 𝑩 by solving a maximum likelihood estimation (MLE) problem. From now onwards, we write U(𝑇), H(𝑇), C(𝑇) as U𝑇, H𝑇 and C𝑇 to lighten the notation. Hence, succinctly, the problem can be stated as follows:
(1) identify a subset H𝑇 ⊆ U𝑇 of endogenous events;
(2) find the optimal (maximum-likelihood) parameters 𝜶, 𝝁, 𝑨 and 𝑩 based only on H𝑇.

Our proposed approach. We now design an unsupervised learning algorithm to isolate the endogenous events H𝑇 and the exogenous events C𝑇 from the stream of unlabeled sentiment messages U𝑇, which is equivalent to assigning each event 𝑒 ∈ U𝑇 to either H𝑇 or C𝑇. This is achieved by extracting the set of events that comply with the endogenous dynamics with high confidence, which in turn is indicated by a low variance of the estimated parameters.

In more detail, given a candidate set of endogenous events H𝑇, the opinion parameters 𝑨, 𝜶 can be estimated by maximizing the likelihood of the endogenous opinions 𝑚𝑖, ∑𝑖 log 𝑝(𝑚𝑖 | 𝑥∗𝑢𝑖(𝑡𝑖)), i.e., minimizing the following:

min_{𝑨,𝜶} ∑_{𝑢∈V} ∑_{𝑒𝑖∈H𝑇} 𝜎⁻² (𝑚𝑖 − 𝛼𝑢 − ∫₀^{𝑡𝑖} 𝑔(𝑡 − 𝑠) (𝜻(𝑠) ⊙ 𝑑𝑷(𝑠))ᵀ 𝑨𝑢)² + 𝑐||𝑨||²_F + 𝑐||𝜶||².   (10)

Here, the first term is derived using the Gaussian nature of 𝑝(𝑚 | 𝑥∗𝑢(𝑡)), and the last two are regularization terms. The optimal parameters (𝑨̂, 𝜶̂) depend on the candidate set of endogenous messages H𝑇. To this end, we compute the estimation covariance as

𝚺(H𝑇) := E[(𝜽̂ − 𝜽)(𝜽̂ − 𝜽)ᵀ], 𝜽 := vec([𝑨, 𝜶]).
(11)

Here, the expectation is taken over the noise process induced while generating the message sentiment 𝑚𝑖 from the opinion 𝑥∗𝑢𝑖(𝑡𝑖) according to the distribution 𝑝(𝑚𝑖 | 𝑥∗𝑢𝑖(𝑡𝑖)). Before proceeding further, we clarify that we exclude the intensity parameters (𝝁 and 𝑩) from the covariance estimation, as their MLE does not admit a closed-form solution, making it mathematically inconvenient. Prior to going into the selection mechanism for H𝑇, we first look into the expression for the covariance matrix 𝚺 in Lemma 1. Note that the inference problem given by Eq. (10) is one of regularized least squares estimation, so the covariance matrix of the optimal parameters can be derived in the closed form given in the following:

Lemma 1. For a given endogenous message-set H𝑇,

𝚺(H𝑇) = diag_{𝑢∈V} (𝑐𝑰 + 𝜎⁻² ∑_{𝑒𝑖∈H𝑇} 𝝓𝑢𝑖 𝝓𝑢𝑖ᵀ)⁻¹,   (12)

where

𝝓𝑢𝑖 = vec([∫₀^{𝑡𝑖} 𝑔(𝑡 − 𝑠) 𝜻(𝑠) ⊙ 𝑑𝑷(𝑠), 1]) if 𝑢𝑖 = 𝑢, and 𝝓𝑢𝑖 = 0 otherwise.   (13)

The proof of this lemma is given in the Appendix (Section A.1).

Our objective is to identify H𝑇, given its size 𝑁_H, so that 𝚺(H𝑇) is small. Such a demarcated message-set H𝑇 would then follow the endogenous opinion dynamics more faithfully than its complement U𝑇 \ H𝑇. In order to compute the best candidate for H𝑇, we need to minimize a suitable function Ω𝑋(H𝑇), which is some measure of 𝚺(H𝑇). In accordance with the alphabetical design criteria of A-optimality, D-optimality, E-optimality, and T-optimality used by [12, 33], we define:

Ω𝐴(H𝑇) := tr[𝚺(H𝑇)],   (14)
Ω𝐷(H𝑇) := tr[log 𝚺(H𝑇)] = log[det(𝚺(H𝑇))],   (15)
Ω𝐸(H𝑇) := 𝜆_max[𝚺(H𝑇)],   (16)
Ω𝑇(H𝑇) := −tr[𝚺(H𝑇)⁻¹],   (17)

where log 𝚺 is the matrix logarithm of 𝚺, and 𝜆_max[𝚺(H𝑇)] refers to the maximum eigenvalue of 𝚺(H𝑇).
These functions Ω𝑋(H𝑇), where 𝑋 ∈ {𝐴, 𝐷, 𝐸, 𝑇}, can be viewed as complexity measures of 𝚺(H𝑇) [33], which makes them good candidates for minimizing 𝚺. Hence, by defining 𝑓𝑋(H𝑇) := −Ω𝑋(H𝑇), where 𝑋 ∈ {𝐴, 𝐷, 𝐸, 𝑇}, we pose the following optimization problem to obtain the best cardinality-constrained candidate set H𝑇:

maximize_{H𝑇 ⊆ U𝑇} 𝑓𝑋(H𝑇) subject to |H𝑇| = 𝑁_H.   (18)

Normally, such a cardinality-constrained subset selection problem is NP-Hard [38, 65]. Hence, we rely on a greedy heuristic for maximizing 𝑓𝑋 (Algorithm 1) that, as we show later, gives a (1 − 1/𝑒) approximation bound. Before going into that, we first specify two properties defined for any set function ℎ(𝑉) in general (Definition 2). We will show that 𝑓𝑋 enjoys these properties, thereby affording an approximation guarantee for the proposed simple greedy algorithm.

Definition 2. A set function ℎ(𝑉) in a set argument 𝑉 ⊆ 𝑈 is said to be

(1) submodular, if for any sets 𝑉₁ ⊆ 𝑉₂ ⊆ 𝑈 and 𝑥 ∉ 𝑉₂,

ℎ(𝑉₁ ∪ {𝑥}) − ℎ(𝑉₁) ≥ ℎ(𝑉₂ ∪ {𝑥}) − ℎ(𝑉₂).   (19)

In addition, if for all sets 𝑉 ⊆ 𝑈, ℎ(𝑉) can be expressed as a linear function of weights of individual set elements, i.e.

ℎ(𝑉) = 𝑤(∅) + ∑_{𝑥∈𝑉} 𝑤(𝑥)   (20)

for some weight function 𝑤 : 𝑈 → R, then ℎ(𝑉) is said to be modular.

(2) weakly submodular, if for any sets 𝑉₁ ⊆ 𝑉₂ ⊆ 𝑈 and 𝑥 ∉ 𝑉₂, the two quantities

𝑐ℎ = max_{𝑉₁,𝑉₂,𝑥} [ℎ(𝑉₂ ∪ {𝑥}) − ℎ(𝑉₂)] / [ℎ(𝑉₁ ∪ {𝑥}) − ℎ(𝑉₁)]   (21)

and

𝜖ℎ = max_{𝑉₁,𝑉₂,𝑥} (ℎ(𝑉₂ ∪ {𝑥}) − ℎ(𝑉₂)) − (ℎ(𝑉₁ ∪ {𝑥}) − ℎ(𝑉₁))   (22)

are bounded. 𝑐ℎ and 𝜖ℎ are called the multiplicative and additive weak submodularity constants, respectively.

Theorem 3 (
Characterizing 𝑓𝑋). Let V(H𝑇) be the set of users of the message set H𝑇.
(1) 𝑓𝑋(H𝑇) is monotone in H𝑇 for all 𝑋 ∈ {𝐴, 𝐷, 𝐸, 𝑇}.
(2) 𝑓𝐴(H𝑇) and 𝑓𝐸(H𝑇) are weakly submodular in H𝑇.
(3) 𝑓𝐷(H𝑇) is submodular in H𝑇.
(4) 𝑓𝑇(H𝑇) is modular in H𝑇.

Proof idea: The key to the proof of monotonicity relies on mapping the given set function 𝑓𝑋 to a suitably chosen continuous function 𝑔(𝑝), so that 𝑔(1) > 𝑔(0) implies the monotonicity of 𝑓𝑋. Noting that 𝑓𝑋 is linear, the rest of the proof follows from the properties of the A, D, E and T optimality criteria presented in [33] and the citations therein. The complete proof is given in the Appendix (Sec. A.2).

Maximizing 𝑓𝑋(H𝑇). Since 𝑓𝑋 is (weakly) submodular in H𝑇, it can be maximized by the traditional greedy approach adopted for maximizing submodular functions of a single set [51]. The maximization routine is formally shown in Algorithm 1. At each step, it greedily adds to H𝑇 the event 𝑒 that maximizes the marginal gain 𝑓𝑋(H𝑇 ∪ {𝑒}) − 𝑓𝑋(H𝑇) (step 5, Algorithm 1), until |H𝑇| reaches 𝑁_H.

Lemma 4 (Solution quality for 𝑓𝐷 and 𝑓𝑇). Algorithm 1 admits a (1 − 1/𝑒) approximation bound for 𝑓𝐷(H𝑇) and 𝑓𝑇(H𝑇).

This result is due to the submodularity and monotonicity of the functions 𝑓𝐷(H𝑇) and 𝑓𝑇(H𝑇). It has been shown in [52] that such a greedy algorithm for maximizing a monotone and submodular function admits a (1 − 1/𝑒) approximation bound.

Lemma 5 (Solution quality for 𝑓𝐴 and 𝑓𝐸). Let 𝑐_{𝑓𝐴} and 𝜖_{𝑓𝐴} be the multiplicative and additive weak submodularity constants (see Definition 2) for 𝑓𝐴, and 𝑐_{𝑓𝐸} and 𝜖_{𝑓𝐸} be those for 𝑓𝐸. By Theorem 3, these quantities exist and are bounded provided V(H𝑇).
Let H𝑇^{𝐴𝑔} and H𝑇^{𝐸𝑔} be the subsets obtained by greedily maximizing 𝑓𝐴 and 𝑓𝐸 respectively, and H𝑇^{𝐴∗} and H𝑇^{𝐸∗} be the optimal subsets achieving the maximum of 𝑓𝐴 and 𝑓𝐸 respectively. Then,

𝑓𝐴(H𝑇^{𝐴𝑔}) ≥ (1 − 𝑒^{−1/𝑐𝐴}) 𝑓𝐴(H𝑇^{𝐴∗}),   (23)
𝑓𝐸(H𝑇^{𝐸𝑔}) ≥ (1 − 𝑒^{−1/𝑐𝐸}) 𝑓𝐸(H𝑇^{𝐸∗}),   (24)

where 𝑐𝐴 = max{𝑐_{𝑓𝐴}, 1} and 𝑐𝐸 = max{𝑐_{𝑓𝐸}, 1}. Also,

𝑓𝐴(H𝑇^{𝐴𝑔}) ≥ (1 − 1/𝑒)(𝑓𝐴(H𝑇^{𝐴∗}) − (𝑁_H − 1) 𝜖_{𝑓𝐴}),   (25)
𝑓𝐸(H𝑇^{𝐸𝑔}) ≥ (1 − 1/𝑒)(𝑓𝐸(H𝑇^{𝐸∗}) − (𝑁_H − 1) 𝜖_{𝑓𝐸}),   (26)

where 𝑁_H is the required number of endogenous events, fed as an input to Algorithm 1.

This result is due to the weak submodularity and monotonicity of the functions 𝑓𝐴(H𝑇) and 𝑓𝐸(H𝑇), and follows directly from Proposition 2 of [33].

Note that one can find fast adaptive algorithms as an alternative to standard greedy for submodular function maximization. Since 𝑓𝐷 is submodular and monotone, the recent fast adaptive algorithm proposed in [9] applies readily to 𝑓𝐷. Though recent fast adaptive techniques in submodular maximization do not apply to weakly submodular functions in general, it has been shown that 𝑓𝐴 satisfies 𝛾-differential submodularity [55], and thereby DASH, a recent fast adaptive technique proposed by [55], can be applied for maximizing 𝑓𝐴. However, in practice, we find that the performance obtained by both fast adaptive algorithms, for maximizing 𝑓𝐴 as well as 𝑓𝐷, is inferior to standard greedy in some cases, despite speeding up the demarcation process.
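To make the constants in Definition 2 and Lemma 5 concrete, the multiplicative constant 𝑐ℎ can be estimated by brute force for a toy set function. The sketch below is purely illustrative (exhaustive enumeration is only feasible for tiny ground sets) and skips pairs with zero base gain:

```python
from itertools import chain, combinations

def subsets(universe):
    return chain.from_iterable(combinations(universe, r) for r in range(len(universe) + 1))

def mult_weak_submod_constant(h, universe):
    # c_h = max over V1 ⊆ V2, x ∉ V2 of [h(V2∪{x}) − h(V2)] / [h(V1∪{x}) − h(V1)];
    # for a submodular h this ratio never exceeds 1.
    c = 0.0
    for V2 in map(set, subsets(universe)):
        for V1 in map(set, subsets(V2)):
            for x in universe - V2:
                g2, g1 = h(V2 | {x}) - h(V2), h(V1 | {x}) - h(V1)
                if g1 > 0:
                    c = max(c, g2 / g1)
    return c

# Set cover, a classic submodular function: the constant is exactly 1.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
coverage = lambda V: len(set().union(*(cover[v] for v in V))) if V else 0
print(mult_weak_submod_constant(coverage, {1, 2, 3}))  # 1.0
```

A function whose marginal gains grow with set size, e.g. h(V) = |V|², yields a constant greater than 1, quantifying how far it is from submodularity.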
We have therefore added a detailed comparative evaluation of standard greedy and the adaptive algorithms as additional results in the supplementary material (see Section B).

The event-set H𝑇 thus obtained is then used to estimate all the parameters 𝑨, 𝝁, 𝜶, 𝑩 (see Algorithm 2) by maximizing L(𝜶, 𝝁, 𝑨, 𝑩 | H𝑇), which is given by

L(𝜶, 𝝁, 𝑨, 𝑩 | H𝑇) = ∑_{𝑒𝑖∈H𝑇} log 𝑝(𝑚𝑖 | 𝑥∗𝑢𝑖(𝑡𝑖)) + ∑_{𝑒𝑖∈H𝑇} log(𝜆∗𝑢𝑖(𝑡𝑖)) − ∑_{𝑢∈V} ∫₀^𝑇 𝜆∗𝑢(𝑠) 𝑑𝑠.   (27)

Since L is a concave function, it can be maximized efficiently. We adopt the method given by the authors of [18], which can accurately compute the parameters. In conclusion, the above procedures yield four distinct methods for demarcating the endogenous and exogenous dynamics,
ALGORITHM 1: 𝚼 = CherryPick𝑋(𝑓𝑋, 𝑁_H, U𝑇)
1:  /* Initialization */
2:  H𝑇 ← ∅, C𝑇 ← U𝑇
3:  /* General subroutine */
4:  while |H𝑇| < 𝑁_H do
5:      𝑒 ← arg max_{𝑒∈C𝑇} 𝑓𝑋(H𝑇 ∪ {𝑒}) − 𝑓𝑋(H𝑇)
6:      C𝑇 ← C𝑇 \ {𝑒}
7:      /* Update endogenous message-set */
8:      H𝑇 ← H𝑇 ∪ {𝑒}
9:  end while
10: 𝚼 = (H𝑇, C𝑇)
11: return 𝚼

ALGORITHM 2: Parameter Estimation
1: Input: 𝑁_H, U𝑇
2: Output: (𝜶∗, 𝝁∗, 𝑨∗, 𝑩∗)
3: /* First find the endogenous messages */
4: (H𝑇, C𝑇) = CherryPick𝑋(𝑓𝑋, 𝑁_H, U𝑇)
5: /* Estimate parameters over only H𝑇 */
6: (𝜶∗, 𝝁∗, 𝑨∗, 𝑩∗) = arg max L(𝜶, 𝝁, 𝑨, 𝑩 | H𝑇)
7: return (𝜶∗, 𝝁∗, 𝑨∗, 𝑩∗)

viz. CherryPick𝐴, CherryPick𝐷, CherryPick𝐸 and CherryPick𝑇, according to the optimality criterion applied.

Since 𝑓𝐴 and 𝑓𝐸 are only weakly submodular while 𝑓𝐷 and 𝑓𝑇 enjoy full submodularity, CherryPick𝐴 and CherryPick𝐸 may, in theory, achieve poorer performance than CherryPick𝐷 and CherryPick𝑇. However, [33] have noted that if the difference between the minimum and maximum information of individual observations is small, E-optimality is nearly submodular, and under these conditions CherryPick𝐸 is expected to find a good (informative) subset. Furthermore, [13, 33] have also noted that the behavior of both A-optimality and E-optimality approaches that of a submodular function if the highest SNR of the observations is relatively small; in this case, CherryPick𝐴 and CherryPick𝐸 should perform well. Also, [13, 33] have noted that even when the SNRs are large, as long as the observations are not too correlated, greedy design for A- and E-optimality achieves good results. In this context, we note that Proposition 1 of [7] provides a lower bound on the submodularity ratio of A-optimality in terms of the spectral norm ||H𝑇|| of the observations H𝑇; from there, it follows that if the spectral norm ||H𝑇|| is low, then the lower bound on the submodularity ratio approaches 1, i.e. the behaviour of A-optimality approaches that of a submodular function. We also note that the modular nature of 𝑓𝑇(H𝑇) implies that each 𝑥 ∈ H𝑇 contributes independently to the function value.
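Algorithm 1 can be sketched in a few lines. The sketch below uses the D-optimality objective 𝑓𝐷 = −log det 𝚺 for a single user with hypothetical 2-D feature vectors 𝝓𝑖 (names and data are illustrative, and the lazy-evaluation tricks of practical greedy implementations are omitted):

```python
import numpy as np

def f_D(selected, phis, c=1e-3, sigma2=1.0):
    # f_D(H) = -log det Sigma(H) = log det(c I + sigma^{-2} sum_i phi_i phi_i^T),
    # following Lemma 1 / Eq. (15) for a single-user block.
    d = phis.shape[1]
    info = c * np.eye(d) + sum((np.outer(phis[i], phis[i]) for i in selected),
                               np.zeros((d, d))) / sigma2
    return np.linalg.slogdet(info)[1]

def cherry_pick(f, n_events, n_select, **kw):
    # Greedy subset selection (Algorithm 1): repeatedly add the event with
    # the largest marginal gain until N_H events are chosen.
    H, C = [], set(range(n_events))
    while len(H) < n_select:
        e = max(C, key=lambda i: f(H + [i], **kw) - f(H, **kw))
        C.remove(e)
        H.append(e)
    return H, C

rng = np.random.default_rng(0)
phis = rng.normal(size=(20, 2))
H, C = cherry_pick(f_D, 20, 5, phis=phis)
# H holds 5 "endogenous" indices; the remaining 15 in C play the role of C_T.
```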
Consequently, the optimization of f_T is easily achieved by simply evaluating f_T for each individual event, sorting the results, and choosing the top N_H events from the sorted list to obtain the best subset H_T. Finally, as noted by [33], under certain conditions A-optimality and D-optimality have more intuitive interpretations than E-optimality and T-optimality: A-optimality is related to the mean squared error (MSE) of the model parameters, and D-optimality to the maximization of their entropy.

ACM Trans. Knowl. Discov. Data., Vol. 1, No. 1, Article 1. Publication date: January 2021.
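For illustration, the greedy subroutine of Algorithm 1 and the sorting shortcut available for the modular objective f_T can be sketched in Python; `f_X` and `score` are stand-ins for the optimality objectives, and this is a minimal sketch rather than the implementation used in the paper:

```python
def cherry_pick(f_X, N_H, U_T):
    """Greedy marginal-gain selection (sketch of Algorithm 1)."""
    H_T, C_T = set(), set(U_T)
    while len(H_T) < N_H and C_T:
        base = f_X(H_T)
        # pick the candidate with the largest marginal gain f_X(H ∪ {e}) − f_X(H)
        e = max(C_T, key=lambda e: f_X(H_T | {e}) - base)
        C_T.remove(e)
        H_T.add(e)  # update the endogenous message-set
    return H_T, C_T

def cherry_pick_T(score, N_H, U_T):
    """For a modular objective, greedy selection reduces to scoring
    each event once and keeping the top-N_H events."""
    ranked = sorted(U_T, key=score, reverse=True)
    return set(ranked[:N_H]), set(ranked[N_H:])
```

For a modular f_X (e.g. f_X(S) = Σ_{e∈S} score(e)), both routines return the same endogenous subset; the second simply avoids re-evaluating the objective at every step.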
In this section, we provide a comprehensive evaluation of the four variants of CherryPick on a diverse set of real datasets. Since the category of a message event, exogenous or endogenous, is latent in most public datasets, our proposals cannot be tested in terms of their classification error. Hence, we resort to measuring the utility of our methods in terms of their impact on the predictive power of the underlying endogenous model. To that aim, we address the following research questions:
(1) How do the variants of CherryPick compare against the competitive baselines in terms of the predictive accuracy of the trained endogenous model?
(2) How does the pre-specified value of the fraction of exogenous messages γ impact the predictive accuracy?
(3) Do the proposed methods have any positive impact on the long-term forecasting task?
(4) How do they perform on a curated test set, which contains only endogenous messages?
(5) How does their performance vary across different sizes of training set?

We consider five real datasets (summarized in Table 2) corresponding to various real-world events, collected from Twitter. Each consists of tweets about a particular story. Specifically, we have:
(1) Club [68]: Barcelona winning La Liga, from May 8 to May 16, 2016.
(2) Elections [68]: British national election, from May 7 to May 15, 2015.
(3) Verdict [68]: Verdict for the corruption case against Jayalalitha, an Indian politician, from May 6 to May 17, 2015.
(4) Sports [68]: Champions League final in 2015, between Juventus and Real Madrid, from May 8 to May 16, 2015.
(5) Delhi [18]: Delhi assembly elections, from December 9 to December 15, 2013.

Along with other statistics, the last column (r̄) in Table 2 reports the average absolute correlation of the observation matrix for each dataset. We observe that the average correlations are quite small (in the range 0.001-0.006) despite high SNR values (in the range 30-50 dB), which justifies the application of CherryPick_A or CherryPick_E on these datasets, as the greedy design for A- and E-optimality should achieve good performance in such cases even though these objectives are only weakly submodular (see Section 4.2).

For all datasets, we follow a standard setup for both network construction and message sentiment computation [16, 18, 63]. We built the follower-followee network of the users that posted related tweets using the Twitter REST API (https://dev.twitter.com/rest/public). Then, we filtered out users that posted fewer than 200 tweets during the account lifetime, follow fewer than 100 users, or have fewer than 50 followers. For each dataset, we compute the sentiment value of each message using a popular sentiment analysis toolbox [31]. Here, the sentiment takes values m ∈ [−1, 1], and we take the sentiment polarity to be simply sign(m). Note that, while other sentiment analysis tools [54] can be used to extract sentiments from tweets, we appeal to [31] for two major reasons: its ability to accurately extract sentiments from short informal texts like tweets, and its wide usage in validating data-driven opinion models [16, 18].

The temporal stream of sentiment messages is split into training and test sets, assigning the first 90% of the total number of messages to the training set. The training set U_T, collected until time
Datasets    |V|    |E|    |U_T|   E[m]    ρ[m]   r̄
Club        703    4154   9409    0.109   0.268  0.0026
Elections   231    1108   1584   -0.096   0.232  0.0062
Verdict     1059   17452  10691   0.062   0.266  0.0016
Sports      703    4154   7431    0.562   0.224  0.0041
Delhi       548    5271   20026   0.016   0.178  0.0018

Table 2. Statistics of real datasets.

T, is categorized into endogenous messages H_T and exogenous messages C_T, and the model parameters are estimated over the classified H_T. During categorization, we take a range of pre-specified values of |H_T|/|U_T|, the fraction of organic messages. Finally, using the estimated model, we forecast the sentiment value m of each message in the test set given the history up to T hours before the time of the message, i.e. m̂ = E_{H_t \ H_{t−T}}[x_u*(t) | H_{t−T}], which we compute using an efficient simulation method given by [18, 25].

We compare the four variants of CherryPick with four unsupervised event classification techniques, borrowed from the robust regression literature as well as from various outlier detection techniques. In later experiments, we compare a representative of CherryPick with the three best performing baselines for each experiment.
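The chronological 90/10 split described above can be sketched as follows; events are represented here as hypothetical (time, sentiment) pairs:

```python
def temporal_split(events, train_frac=0.9):
    """Chronological split of the message stream: the earliest
    train_frac of messages form the training set U_T, the remainder
    the test set."""
    events = sorted(events, key=lambda e: e[0])  # order by timestamp
    k = int(train_frac * len(events))
    return events[:k], events[k:]
```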
Huber regression [59]. Here, we apply the Huber penalty in our learning objective, which follows from the underlying assumption that a subset of the samples are outliers:

min_{A,α} Σ_{e_i ∈ U_T} ρ_h( m_i − α_u − ∫_0^{t_i} g(t_i − s) (ζ(s) ⊙ dP(s))^⊤ A_u )    (28)

where ρ_h : R → R is defined as ρ_h(u) = u² if |u| ≤ c/2, and ρ_h(u) = c|u| − c²/4 otherwise.

Robust lasso [50]. Here, we define o_i as a measure of the exogenous behavior of an event e_i. This measure is matched with the training error using a mean squared loss, which is further penalized by an L1 regularizer on o = (o_i)_{i ∈ U_T}. The regularizer controls the fraction of exogenous messages identified by the model; in our implementation, we tune it so that this fraction is close to γ:

min_{A,α,o} Σ_{e_i ∈ U_T} ( m_i − α_u − ∫_0^{t_i} g(t_i − s) (ζ(s) ⊙ dP(s))^⊤ A_u − o_i )² + c₁( ||A||²_F + ||α||²₂ ) + c₂ ||o||₁.

Robust hard thresholding [6]. Instead of minimizing different measures of variance, this method directly minimizes the training error of the endogenous model using a hard thresholding based approach:

min_{A, α, H_T: |H_T| ≥ (1−γ)|U_T|} (1/|U_T|) Σ_{e_i ∈ H_T} ( m_i − α_u − ∫_0^{t_i} g(t_i − s) (ζ(s) ⊙ dP(s))^⊤ A_u )² + c( ||A||²_F + ||α||²₂ ).

Soft thresholding. This method is designed by assuming the presence of unbounded error signals on a limited number of data points, and alternates between (α, A) and {o_i}_{i ∈ U_T} to solve the following objective:

min_{A,α,o} (1/|U_T|) Σ_{e_i ∈ U_T} ( m_i − α_u − ∫_0^{t_i} g(t_i − s) (ζ(s) ⊙ dP(s))^⊤ A_u − o_i )² + c₁( ||A||²_F + ||α||²₂ ) + c₂ ||o||₁.

SLANT [18]. In this baseline, we use all the samples for parameter estimation, without any filtering.
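For concreteness, the Huber penalty ρ_h and the soft-thresholding (proximal) step that underlies the last alternating baseline can be sketched as follows; `c` and `lam` are hypothetical tuning constants, and this is a sketch rather than the baselines' actual implementation:

```python
import math

def huber_penalty(u, c=1.0):
    """Huber-type penalty rho_h: quadratic for small residuals,
    linear beyond the threshold c/2."""
    if abs(u) <= c / 2:
        return u * u
    return c * abs(u) - c * c / 4  # continuous at |u| = c/2

def soft_threshold(o, lam):
    """Soft-thresholding operator, the proximal map of lam*|o|;
    it shrinks a per-event residual estimate o_i toward zero."""
    return math.copysign(max(abs(o) - lam, 0.0), o)
```

The quadratic and linear pieces of `huber_penalty` meet at |u| = c/2, so large residuals (candidate outliers) are penalized only linearly instead of quadratically.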
We measure the performance of our methods and the baselines via the prediction errors of the correspondingly trained endogenous models. Specifically, we use (i) the mean squared error (MSE) between the actual sentiment value (m) and the estimated sentiment value (m̂), i.e. E[(m − m̂)²], and (ii) the failure rate (FR), the probability that the polarity of the actual sentiment does not coincide with the polarity of the predicted opinion, i.e. P(sign(m) ≠ sign(m̂)).

[Table 3: mean squared error E[(m − m̂)²] and failure rate P(sign(m) ≠ sign(m̂)) for CherryPick_A/D/E/T and the competing methods (Hard, Huber, Lasso, Soft, Slant) on the five real datasets. Only fragments of the entries are recoverable: MSE on Club for CherryPick_A is 0.038; FR on Club for CherryPick_A/D/E/T is 0.120/0.113/0.121/0.122.]

Table 3. Sentiment prediction performance on the five real-world datasets for all competing methods, for a fixed γ = 0.2. For each message in the test set, we predict its sentiment value given the history up to T = 4 hours before the time of the message; over those T hours, we predict the opinion stream using a sampling algorithm. Mean squared error and failure rate are reported. The variants of CherryPick generally perform better than the baselines on all datasets. Among the baselines, Robust_HT performs comparably with CherryPick on some of the datasets presented here. Comparative analysis.
Here we aim to address research question (1). More specifically, we compare the endogenous model obtained using our method against the baselines. Table 3 summarizes the results, which reveal the following observations. (I) Our method outperforms the other methods. (II) CherryPick_A and CherryPick_D perform best among the variants of CherryPick. This is because CherryPick_E and CherryPick_T often suffer from poor training, due to the computational inefficiency of the eigenvalue optimization in CherryPick_E and of the trace optimization of inverse matrices in CherryPick_T. (III) Robust regression with hard thresholding performs best among the baselines, which, together with the superior performance of our methods, indicates that hard thresholding based methods are more effective than soft thresholding based ones in the context of unsupervised demarcation of opinionated messages. We clarify that Table 3 presents a complete comparative evaluation of all the baselines along with all variants of CherryPick, whereas, in subsequent experiments, we present only the three best performing baselines for clarity.
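The two error metrics used throughout the comparison can be computed directly; a minimal pure-Python sketch:

```python
def sign(x):
    """Sign of a sentiment value: +1, -1 or 0."""
    return (x > 0) - (x < 0)

def mse(m, m_hat):
    """Mean squared error between actual and predicted sentiments."""
    return sum((a - b) ** 2 for a, b in zip(m, m_hat)) / len(m)

def failure_rate(m, m_hat):
    """Fraction of messages whose predicted polarity disagrees
    with the actual polarity."""
    return sum(sign(a) != sign(b) for a, b in zip(m, m_hat)) / len(m)
```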
[Figure 1: mean squared error and failure rate vs. |H_T|/|U_T| for CherryPick_A, CherryPick_D, CherryPick_E and CherryPick_T on the Club, Verdict and Delhi datasets.]

Fig. 1. Performance variation with the size of the endogenous subset for the Club, Verdict and Delhi datasets, where the size of the endogenous subset is varied from 50% to 100% of the data. The time span T is set to 4 hours. Mean squared error and failure rate are reported. Our method performs best when the endogenous subset comprises 70% to 90% of the whole data, whereas performance deteriorates when the endogenous subset size is set too small (e.g. 60%) or too high (e.g. 100%).
Variation of performance with the fraction of outliers (γ). Next, we address research question (2). Figure 1 shows the variation of forecasting performance for different values of γ, i.e. the pre-specified fraction of outliers, on the Club, Verdict and Delhi datasets. In this experiment, γ is varied from 0 to 0.5, followed by parameter estimation over the refined set. As this experiment is mainly intended to show the effect of the parameter γ on the predictive performance of our methods, baselines are omitted here. The time span is fixed at 4 hours. We observe that, as we start refining the event set, the prediction performance improves; but if we increase γ beyond around 0.4, the performance drops, strongly suggesting that an optimal fraction of outliers is present in the training data. A high value of γ makes our methods misclassify many regular events as outliers, while a small value of γ ignores the effect of the outliers. Forecasting performance.
Next, we address research question (3). In particular, we compare the forecasting performance of CherryPick against the baselines. Figure 2 shows the forecasting performance as T varies, across the datasets, for the best performing variant of CherryPick and the three best performing baselines (best according to MSE at T = 4 hours), with γ = 0.2. We make the following observations. (I) CherryPick outperforms the baselines in the majority of the cases. (II) Robust_HT generally performs best among the baselines. (III) Performance deteriorates for all methods as we predict further into the future, but it stabilizes after a while.
[Figure 2: mean squared error and failure rate vs. T (hours) for CherryPick, Robust_HT, RobustLasso and SoftThreshold on the Club, Elections, Verdict, Sports and Delhi datasets.]

Fig. 2. Sentiment prediction performance on the five real-world datasets for the best performing variant of CherryPick and the three best performing baselines, for a fixed γ = 0.2. For each message in the test set, we predict its sentiment value given the history up to T hours before the time of the message; over those T hours, we predict the opinion stream using a sampling algorithm. Mean squared error and failure rate are reported. CherryPick generally performs better than the baselines on all datasets. Among the baselines, Robust_HT performs comparably with CherryPick on some of the datasets presented here. Effect of sanitizing test data.
Next, we address research question (4). To that aim, we remove the outliers from the test data by refining the entire datasets using the variants of CherryPick, and then check the prediction performance of the previously computed model (in the experimental settings of Section 5.5) on the refined test set. We compare the prediction error on this sanitized test set with the prediction error on the unrefined test set and report the improvement. We report the prediction errors in Table 4 (improvements over the error on the unrefined test set are given in brackets). As the results show, for the majority of the cases, refining the test set improves prediction performance, confirming the presence of outliers in the test set, which the estimated model cannot be expected to predict well. Depending on the type and nature of the dataset, the error reduction varies, reaching its highest for Sports and its lowest for Club. Among our methods, CherryPick_A and CherryPick_D perform better than the rest in terms of error reduction. In summary, the effectiveness of CherryPick is more prominent after demarcating the test set. Variation of performance with training set size.
Next, we answer research question (5). Specifically, we evaluate the efficacy of our approaches over varying training set sizes as follows. We use a subset of the training data, varying from the initial 50% to the entire 100% of the training set, for estimating the parameters, and test the estimated model on the same test data used in all the experiments, i.e. the last 10% of the entire stream of events. Figure 3 summarizes the results on three of the five real datasets (Club, Elections and Sports), revealing the following observations. (I) Performance deteriorates with decreasing training set size for all methods. (II) Our methods generally perform better than the baselines across training set sizes, showing the effectiveness of the demarcation technique. (III) For Elections, where the total number of events is quite small compared to the rest, Robust_HT performs better than the variants of CherryPick, indicating that hard thresholding is quite effective for demarcation. (IV) The performance of CherryPick at smaller sample sizes, relative to its competitors, indicates its stability and robustness.

Datasets    CherryPick_A    CherryPick_D    CherryPick_E    CherryPick_T
MSE
Club        0.038 (1.78%)   0.044 (-16.%)   0.039 (-1.1%)   0.042 (-10.%)
Elections   0.057 (-4.5%)   0.053 (1.48%)   0.051 (6.56%)   0.062 (-19.%)
Verdict     0.068 (2.04%)   0.069 (3.89%)   0.071 (0.51%)   0.074 (-4.4%)
Sports      0.059 (11.9%)   0.042 (24.8%)   0.057 (6.43%)   0.059 (17.1%)
Delhi       0.033 (6.41%)   0.039 (1.15%)   0.0 (100.%)     0.036 (1.41%)
FR
Club        0.744 (0.29%)   0.140 (-23.%)   0.740 (1.52%)   0.650 (-7.1%)
Elections   0.314 (13.9%)   0.168 (4.59%)   0.252 (28.4%)   0.140 (28.3%)
Verdict     0.685 (0.32%)   0.204 (2.94%)   0.701 (-0.4%)   0.694 (-0.0%)
Sports      0.062 (22.4%)   0.071 (10.9%)   0.051 (20.4%)   0.051 (43.8%)
Delhi       0.474 (19.2%)   0.143 (2.70%)   0.0 (100.%)     0.480 (0.83%)

Table 4. Mean squared error and failure rate of CherryPick_X on all the datasets after demarcating exogenous events from the test set. Error reduction (from the error reported on the entire test set) is given in brackets. After demarcating the test set, the error reduction is positive in the majority of the cases, indicating that the error on an unfiltered test set is an overestimate of the true error.

[Figure 3: mean squared error and failure rate vs. training set size (50% to 100%) for CherryPick, Robust_HT, HuberRegression and RobustLasso on the Club, Elections and Sports datasets.]

Fig. 3. Mean squared error and failure rate on the Club, Elections and Sports datasets for training set sizes varying from 50% to 100% of the total training data (the initial 90% of the entire event collection), for the best performing variant of CherryPick and the three best performing baselines. The time span for future prediction T is set to 4 hours and γ = 0.2. For all methods, the error reduces with increasing training set size. Over the entire range, CherryPick generally performs better than the baselines.
In this section, we provide a comparative evaluation of the four variants of CherryPick against the baselines on a set of three synthetic datasets. To that aim, we address the following research questions:
(1) How do the variants of CherryPick compare in terms of parameter estimation performance across different sizes of training set?
(2) How does their predictive performance vary across different levels of noise present in the data?
To construct the synthetic datasets, we generated the following networks, each with 512 nodes, to use as input to the opinion model (SLANT) [18].
• Kronecker Core-Periphery networks: In this Kronecker network, the core is very well connected and there are comparatively fewer nodes in the periphery, sparsely connected with the core (parameter matrix [0.9, 0.5; 0.5, 0.3]).
• Kronecker Random networks: This Kronecker network is generated using the parameter matrix [0.5, 0.5; 0.5, 0.5].
• Barabasi-Albert: This is a scale-free network, grown iteratively under preferential attachment: better-connected nodes are more likely to receive new links.
The message stream is generated for each network by simulating the message sampling algorithm proposed in [18]. Each user starts with a latent opinion (α_i), and the non-zero opinion influence parameters corresponding to the directed edges of the network (A_ij) are sampled from a zero-mean, unit-variance Gaussian distribution. The base intensities (μ_i), as well as the intensity influence parameters (B_ij), are uniformly sampled from the range [0, 1]. Opinions are sampled from a unit-variance Gaussian whose mean is set to the latent opinion of the user. The events are generated using a multivariate Hawkes process. While generating event times, each event is marked as exogenous with 20% probability; if marked exogenous, it is sampled from a distinct distribution (described later). The exponential decay parameter of the opinion kernel (ω) is 1000, and that of the intensity kernel is 10. We compare our proposed approaches with the same baselines introduced in Section 5.3.
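The parameter setup above can be sketched as follows, assuming the uniform sampling range is [0, 1] and taking `edges` as any directed edge list (e.g. produced by a Barabási-Albert generator); this is a sketch of the setup, not the paper's generator:

```python
import random

def sample_slant_parameters(n_nodes, edges, seed=0):
    """Synthetic SLANT parameter setup: latent opinions and
    opinion-influence weights from a standard Gaussian; base
    intensities and intensity-influence weights uniform on [0, 1]
    (the uniform range is an assumption)."""
    rng = random.Random(seed)
    alpha = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]    # latent opinions
    A = {(u, v): rng.gauss(0.0, 1.0) for (u, v) in edges}    # opinion influence
    mu = [rng.uniform(0.0, 1.0) for _ in range(n_nodes)]     # base intensities
    B = {(u, v): rng.uniform(0.0, 1.0) for (u, v) in edges}  # intensity influence
    return alpha, mu, A, B
```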
The evaluation protocol used is the same as in the case of real datasets (Section 5.2).
We measure the performance of our methods and the baselines using the sentiment prediction error and the parameter estimation error of the correspondingly trained endogenous model, depending on the experiment. Specifically, we measure the sentiment prediction error as the mean squared error (MSE) between the actual sentiment value (m) and the estimated sentiment value (m̂), i.e. E[(m − m̂)²], and the parameter estimation error as the mean squared error between the estimated (x̄) and true (x) opinion parameters, i.e. E[(x − x̄)²].
Variation of performance with sample size.
Here, we investigate research question (1). More specifically, to understand the effect of sample size on parameter estimation, we vary the number of events per node from 20 to 400; 20% of the messages are perturbed as exogenous events, sampled from a distinct Gaussian distribution. Figure 4 summarizes the results, which reveal the following observations. (I) CherryPick_D and CherryPick_T are able to boost parameter estimation performance across a wide range of training set sizes, showing robustness and stability with decreasing sample size; however, CherryPick_A and CherryPick_E perform poorly over the entire range. (II) Robust_HT shows very comparable performance with the best performing variants of CherryPick. We conclude that CherryPick_D and CherryPick_T are able to identify more useful samples for accurately estimating the parameters than CherryPick_A and CherryPick_E. This is because CherryPick_A and CherryPick_E suffer from their weak submodularity, which puts them at a disadvantage given the high SNR of the datasets (40-50 dB) [13].
[Figure 4: parameter estimation MSE vs. events per node (25, 50, 100, 200, 400) for CherryPick_A/D/E/T, Robust_HT, SoftThreshold and HuberRegression on the Barabasi-Albert, Kronecker-CP and Kronecker-Random networks.]

Fig. 4. Mean squared error of parameter estimation on the three synthetic datasets against varying training set size. The best performing variants of CherryPick always achieve comparable (if not better) performance relative to the best of the baselines.
Variation of performance with noise.
Here we address research question (2). In particular, to better understand the effect of noise intensity on the performance of CherryPick, we gradually increase the noise intensity in the message stream of each synthetic dataset and report the sentiment prediction performance of all competing methods. 30000 events are sampled for each network, and 20% of them are perturbed by adding noise of increasing intensity. The noise is sampled from a Gaussian with variance fixed at 0.05 and mean varying over a range of increasing values. Figure 5 summarizes the results for all variants of CherryPick along with the three best performing baselines. It shows that (i) prediction error increases sharply with increasing noise for all methods, (ii) CherryPick_D and CherryPick_T outperform or perform comparably with the baselines, and (iii) interestingly, Robust_HT performs comparably with CherryPick_X in the majority of the cases. As already mentioned, the performance of CherryPick_A and CherryPick_E suffers in all cases because of their weak submodularity, which renders them ineffective given the high SNR of the corresponding datasets (40-50 dB) [13].
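The perturbation step can be sketched as follows; `noise_mean = 0.5` is a hypothetical value standing in for one point of the swept range, and the function is illustrative rather than the paper's exact procedure:

```python
import random

def perturb_messages(messages, frac=0.2, noise_mean=0.5,
                     noise_var=0.05, seed=0):
    """Marks a random fraction frac of sentiment messages as
    exogenous and shifts them by Gaussian noise with the given
    mean and variance."""
    rng = random.Random(seed)
    std = noise_var ** 0.5
    out = []
    for m in messages:
        if rng.random() < frac:              # mark as exogenous
            m = m + rng.gauss(noise_mean, std)
        out.append(m)
    return out
```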
[Figure 5: sentiment prediction MSE vs. noise intensity for CherryPick_A/D/E/T, Robust_HT, SoftThreshold and HuberRegression on the Barabasi-Albert, Kronecker-CP and Kronecker-Random networks.]

Fig. 5. Sentiment prediction error (mean squared error) on the three synthetic datasets against varying noise intensity. The best performing variants of CherryPick degrade gracefully in comparison with the baselines in most cases.
The principal contribution of this paper lies in emphatically establishing the dual nature of message flow over online social networks: injection of exogenous opinions, and influence-based dynamics internal to the network. This realization leads us to propose CherryPick_X, a set of novel learning methodologies that demarcate endogenous and exogenous opinions, and to illustrate their utility by analyzing their performance from an opinion modeling perspective. To this aim, in CherryPick_X we formulated the message classification problem as a submodular optimization task over the set of messages, which we solved using an efficient greedy algorithm. Our proposed techniques are very easy to implement, extremely scalable (particularly CherryPick_A and CherryPick_T) and quite effective in serving their purpose, showing their superiority over baselines drawn from the robust regression literature. Finally, on various real datasets crawled from Twitter as well as on synthetic datasets, we showed that our proposals consistently outperform various outlier removal algorithms in terms of predictive performance. The superior performance is all the more remarkable considering that we train our system on smaller (but relevant) amounts of data.

REFERENCES
[1] Claudio Altafini and Gabriele Lini. 2015. Predictable dynamics of opinion forming for networks with antagonistic interactions. IEEE Trans. Automat. Control 60, 2 (2015), 342–357.
[2] Aris Anagnostopoulos, Ravi Kumar, and Mohammad Mahdian. 2008. Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 7–15.
[3] Sinan Aral and Dylan Walker. 2011. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science 57, 9 (2011), 1623–1639.
[4] Eytan Bakshy, Dean Eckles, Rong Yan, and Itamar Rosenn. 2012. Social influence in social advertising: evidence from field experiments. In Proceedings of the 13th ACM conference on electronic commerce. 146–161.
[5] Eytan Bakshy, Itamar Rosenn, Cameron Marlow, and Lada Adamic. 2012. The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web. 519–528.
[6] Kush Bhatia, Prateek Jain, and Purushottam Kar. 2015. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems. 721–729.
[7] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. 2017. Guarantees for greedy maximization of non-submodular functions with applications. arXiv preprint arXiv:1703.02100 (2017).
[8] Vincent D Blondel, Julien M Hendrickx, and John N Tsitsiklis. 2009. On Krause's multi-agent consensus model with state-dependent connectivity. IEEE Trans. Automat. Control 54, 11 (2009), 2586–2597.
[9] Adam Breuer, Eric Balkanski, and Yaron Singer. 2020. The FAST algorithm for submodular maximization. In International Conference on Machine Learning. PMLR, 1134–1143.
[10] Xavier Castelló, Víctor M Eguíluz, and Maxi San Miguel. 2006. Ordering dynamics with two non-excluding options: bilingualism in language competition. New Journal of Physics 8, 12 (2006), 308.
[11] Damon Centola. 2010. The spread of behavior in an online social network experiment. Science.
[12] Statist. Sci. (1995), 273–304.
[13] Luiz Chamon and Alejandro Ribeiro. 2017. Approximate supermodularity bounds for experimental design. In Advances in Neural Information Processing Systems. 5403–5412.
[14] P. Clifford and A. Sudbury. 1973. A model for spatial conflict. Biometrika 60, 3 (1973), 581–588.
[15] A. Das, S. Gollapudi, and K. Munagala. 2014. Modeling opinion dynamics in social networks. In WSDM.
[16] A. De, S. Bhattacharya, P. Bhattacharya, N. Ganguly, and S. Chakrabarti. 2014. Learning a Linear Influence Model from Transient Opinion Dynamics. In CIKM.
[17] Abir De, Sourangshu Bhattacharya, and Niloy Ganguly. 2018. Demarcating endogenous and exogenous opinion diffusion process on social networks. In Proceedings of the 2018 World Wide Web Conference. 549–558.
[18] Abir De, Isabel Valera, Niloy Ganguly, Sourangshu Bhattacharya, and Manuel Gomez Rodriguez. 2016. Learning and Forecasting Opinion Dynamics in Social Networks. In NIPS.
[19] Guillaume Deffuant, David Neau, Frederic Amblard, and Gérard Weisbuch. 2000. Mixing beliefs among interacting agents. Advances in Complex Systems 3, 01n04 (2000), 87–98.
[20] M. H. DeGroot. 1974. Reaching a consensus. J. Amer. Statist. Assoc. 69, 345 (1974), 118–121.
[21] Jan Christian Dittmer. 2001. Consensus formation under bounded confidence. Nonlinear Analysis: Theory, Methods & Applications 47, 7 (2001), 4615–4621.
[22] Igor Douven and Alexander Riegler. 2009. Extending the Hegselmann–Krause Model I. Logic Journal of IGPL 18, 2 (2009), 323–335.
[23] Rick Durrett and Simon Levin. 1996. Spatial models for species-area curves. Journal of Theoretical Biology.
[24] IEEE Trans. Automat. Control 60, 7 (2015), 1886–1897.
[25] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. 2015. COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution. In NIPS '15: Advances in Neural Information Processing Systems.
[26] Noah E Friedkin. 2015. The problem of social control and coordination of complex systems in sociology: A look at the community cleavage problem. IEEE Control Systems 35, 3 (2015), 40–51.
[27] A. Friggeri, L. Adamic, D. Eckles, and J. Cheng. 2014. Rumor Cascades. In ICWSM.
[28] Przemyslaw A Grabowicz, Niloy Ganguly, and Krishna P Gummadi. 2016. Distinguishing between Topical and Non-Topical Information Diffusion Mechanisms in Social Media. In ICWSM. 151–160.
[29] Trisha Greenhalgh, Glenn Robert, Fraser Macfarlane, Paul Bate, and Olivia Kyriakidou. 2004. Diffusion of innovations in service organizations: systematic review and recommendations. The Milbank Quarterly 82, 4 (2004), 581–629.
[30] V. Gupta, A. De, S. Bhattacharya, and S. Bedathur. 2021. Learning Temporal Point Processes with Intermittent Observations. In AISTATS.
[31] Aniko Hannak, Eric Anderson, Lisa Feldman Barrett, Sune Lehmann, Alan Mislove, and Mirek Riedewald. 2012. Tweetin' in the Rain: Exploring Societal-Scale Effects of Weather on Mood. In ICWSM.
[32] Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, and Ufuk Topcu. 2018. A randomized greedy algorithm for near-optimal sensor scheduling in large-scale sensor networks. IEEE, 1027–1032.
[33] Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, and Ufuk Topcu. 2019. Submodular observation selection and information gathering for quadratic models. arXiv preprint arXiv:1905.09919 (2019).
[34] R. Hegselmann and U. Krause. 2002. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation 5, 3 (2002).
[35] P. Holme and M. E. Newman. 2006. Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E 74, 5 (2006), 056108.
[36] Marjan Hosseinia and Arjun Mukherjee. 2017. Detecting Sockpuppets in Deceptive Opinion Spam. arXiv preprint arXiv:1703.03149 (2017).
[37] Santosh KC and Arjun Mukherjee. 2016. On the temporal dynamics of opinion spamming: Case studies on yelp. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 369–379.
[38] Andreas Krause and Daniel Golovin. 2014. Submodular function maximization.
[39] Andreas Krause, Ajit Singh, and Carlos Guestrin. 2008. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research 9, Feb (2008), 235–284.
[40] Ulrich Krause. 2000. A discrete nonlinear and non-autonomous model of consensus formation. Communications in Difference Equations (2000), 227–236.
[41] Huayi Li, Zhiyuan Chen, Arjun Mukherjee, Bing Liu, and Jidong Shao. 2015. Analyzing and Detecting Opinion Spam on a Large-scale Dataset via Temporal and Spatial Patterns. In ICWSM. 634–637.
[42] Huayi Li, Geli Fei, Shuai Wang, Bing Liu, Weixiang Shao, Arjun Mukherjee, and Jidong Shao. 2017. Bimodal distribution and co-bursting in review spam detection. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1063–1072.
[43] T.J. Liniger. 2009. Multivariate Hawkes Processes. Ph.D. Dissertation. ETHZ.
[44] Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies 5, 1 (2012), 1–167.
[45] Nazareno GF Medeiros, Ana TC Silva, and FG Brady Moreira. 2006. Domain motion in the voter model with noise. Physical Review E 73, 4 (2006), 046120.
[46] Mauro Mobilia. 2003. Does a single zealot affect an infinite group of voters? Physical Review Letters 91, 2 (2003), 028701.
[47] Irinel-Constantin Morărescu, Antoine Girard, et al. 2011. Opinion dynamics with decaying confidence: Application to community detection in graphs. IEEE Trans. Automat. Control 56, 8 (2011), 1862–1873.
[48] Irinel-Constantin Morărescu, Samuel Martin, Antoine Girard, and Aurélie Muller-Gueudin. 2016. Coordination in networks of linear impulsive agents. IEEE Trans. Automat. Control 61, 9 (2016), 2402–2415.
[49] Seth A Myers, Chenguang Zhu, and Jure Leskovec. 2012. Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 33–41.
[50] Nasser M Nasrabadi, Trac D Tran, and Nam Nguyen. 2011. Robust lasso with missing and grossly corrupted observations. In Advances in Neural Information Processing Systems. 1881–1889.
[51] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14, 1 (1978), 265–294.
[52] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14, 1 (1978), 265–294.
[53] B. Pang and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1-2 (2008), 1–135.
[54] James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: LIWC 2001.
Mahway: Lawrence Erlbaum Associates
71, 2001 (2001), 2001.[55] Sharon Qian and Yaron Singer. 2019. Fast parallel algorithms for statistical subset selection problems. In
Advances inNeural Information Processing Systems . 5072–5081.[56] Frank Schweitzer and Laxmidhar Behera. 2009. Nonlinear voter models: the transition from invasion to coexistence.
The European Physical Journal B-Condensed Matter and Complex Systems
67, 3 (2009), 301–318.[57] Manohar Shamaiah, Siddhartha Banerjee, and Haris Vikalo. 2010. Greedy sensor selection: Leveraging submodularity.In . IEEE, 2572–2577.[58] Tyler H Summers, Fabrizio L Cortesi, and John Lygeros. 2015. On submodularity and controllability in complexdynamical networks.
IEEE Transactions on Control of Network Systems
3, 1 (2015), 91–101.[59] Efthymios Tsakonas, Joakim Jaldén, Nicholas D Sidiropoulos, and Björn Ottersten. 2014. Convergence of the Huberregression M-estimate in the presence of dense outliers.
IEEE Signal Processing Letters
21, 10 (2014), 1211–1214.[60] Thomas W Valente. 1996. Social network thresholds in the diffusion of innovations.
Social networks
18, 1 (1996), 69–89.[61] Federico Vazquez and Víctor M Eguíluz. 2008. Analytical solution of the voter model on uncorrelated networks.
NewJournal of Physics
10, 6 (2008), 063011.[62] Federico Vazquez, Paul L Krapivsky, and Sidney Redner. 2003. Constrained opinion dynamics: Freezing and slowevolution.
Journal of Physics A: Mathematical and General
36, 3 (2003), L61.[63] Yichen Wang, Grady Williams, Evangelos Theodorou, and Le Song. 2017. Variational Policy for Guiding Point Processes. arXiv preprint arXiv:1701.08585 (2017).[64] Gérard Weisbuch, Guillaume Deffuant, Frédéric Amblard, and Jean-Pierre Nadal. 2002. Meet, discuss, and segregate!
Complexity
7, 3 (2002), 55–63.[65] David P Williamson and David B Shmoys. 2011.
The design of approximation algorithms . Cambridge university press.[66] E. Yildiz, A. Ozdaglar, D. Acemoglu, A. Saberi, and A. Scaglione. 2013. Binary opinion dynamics with stubborn agents.
ACM Transactions on Economics and Computation
1, 4 (2013), 19.[67] M. E. Yildiz, R. Pagliari, A. Ozdaglar, and A. Scaglione. 2010. Voting models in random networks. In
Information Theoryand Applications Workshop . 1–7.[68] Ali Zarezade, Abir De, Hamid Rabiee, and Manuel Gomez-Rodriguez. 2017. Cheshire: An Online Algorithm for ActivityMaximization in Social Networks. In arXiv preprint arXiv:1703.02059 .ACM Trans. Knowl. Discov. Data., Vol. 1, No. 1, Article 1. Publication date: January 2021. :22 Koley, et al.
A PROOFS OF RESULTS

A.1 Proof of Lemma 1
For any $u \in \mathcal{V}$, define $\boldsymbol{\theta}_u = [\boldsymbol{A}_u; \alpha_u]$. We observe that the loss (Eq. 10) associated with user $u$ alone is a regularized least-squares loss, i.e.,
$$
\hat{\boldsymbol{\theta}}_u = \operatorname*{arg\,min}_{\boldsymbol{\theta}_u} \sum_{e_i \in \mathcal{H}_T} \sigma^{-2} \big( m^*_u(t_i) - \boldsymbol{\phi}^{uT}_i \boldsymbol{\theta}_u \big)^2 + c \, \|\boldsymbol{\theta}_u\|^2 , \qquad (29)
$$
which admits the closed form
$$
\hat{\boldsymbol{\theta}}_u = \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} m_u(t_i) \, \boldsymbol{\phi}^u_i .
$$
Using the fact that $m_u(t_i) = \boldsymbol{\phi}^{uT}_i \boldsymbol{\theta}_u + \epsilon(t_i)$, this becomes
$$
\hat{\boldsymbol{\theta}}_u = \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \boldsymbol{\theta}_u + \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \, \epsilon(t_i) \, \sigma^{-2} ,
$$
so that
$$
\hat{\boldsymbol{\theta}}_u - \boldsymbol{\theta}_u = -c \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \boldsymbol{\theta}_u + \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \, \epsilon(t_i) \, \sigma^{-2} .
$$
The covariance is then given by (the cross terms vanish since $\boldsymbol{\theta}_u$ and $\epsilon$ are independent and zero-mean):
$$
\mathbb{E}\big[ (\hat{\boldsymbol{\theta}}_u - \boldsymbol{\theta}_u)(\hat{\boldsymbol{\theta}}_u - \boldsymbol{\theta}_u)^T \big]
= c^2 \Big( c\boldsymbol{I} + \sigma^{-2} \textstyle\sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \mathbb{E}[\boldsymbol{\theta}_u \boldsymbol{\theta}_u^T] \Big( c\boldsymbol{I} + \sigma^{-2} \textstyle\sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1}
+ \Big( c\boldsymbol{I} + \sigma^{-2} \textstyle\sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \, \mathbb{E}[\epsilon(t_i)^2] \, \sigma^{-4} \Big( c\boldsymbol{I} + \sigma^{-2} \textstyle\sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \Big)^{-1} .
$$
Note that the regularizer corresponds to the prior $\boldsymbol{\theta}_u \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}/c)$, and furthermore $\mathbb{E}[\epsilon(t_i)^2] = \sigma^2$. A simple algebraic calculation then yields the stated value of $\boldsymbol{\Sigma}(\mathcal{H}_T)$.

A.2 Proof of Theorem 3

(i)
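The closed form above is a standard ridge-regression calculation. As a sanity check, the sketch below (synthetic data and constants, numpy only; not the paper's code) verifies that the matrix formula is indeed the minimizer of the regularized loss by confirming that the gradient of the loss vanishes at $\hat{\boldsymbol{\theta}}_u$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50            # feature dimension, number of events (illustrative)
c, sigma = 0.5, 0.3     # regularizer and noise level from the proof

Phi = rng.normal(size=(n, d))                       # feature vectors phi_{ui}
theta_true = rng.normal(size=d)                     # true parameter theta_u
m = Phi @ theta_true + sigma * rng.normal(size=n)   # noisy opinions m_u(t_i)

# Closed form from the proof:
#   theta_hat = (c I + sigma^{-2} sum_i phi_i phi_i^T)^{-1} sigma^{-2} sum_i m_i phi_i
G = c * np.eye(d) + (Phi.T @ Phi) / sigma**2
theta_hat = np.linalg.solve(G, (Phi.T @ m) / sigma**2)

# Gradient of the regularized least-squares loss,
#   L(theta) = sigma^{-2} ||m - Phi theta||^2 + c ||theta||^2,
# must vanish at theta_hat:
grad = -2.0 * (Phi.T @ (m - Phi @ theta_hat)) / sigma**2 + 2.0 * c * theta_hat
assert np.allclose(grad, 0.0, atol=1e-6)
```

The same `G` matrix, with the prior $\boldsymbol{\theta}_u \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}/c)$ plugged in, is exactly the quantity inverted in the covariance expression above.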
Monotonicity of $f_X$: To prove monotonicity, we need to show $f_X(\mathcal{H}_T \cup \{e_k\}) - f_X(\mathcal{H}_T) \ge 0$. Assume that user $u$ has posted the event $e_k = \{u_k, m_k, t_k\}$, i.e., $u_k = u$. We define an auxiliary function of $p \in [0, 1]$:
$$
g(p) := \sum_{u \in \mathcal{V}} \operatorname{tr} \log \Big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \Big) .
$$
Monotonicity of $f_X$ in $\mathcal{H}_T$ is equivalent to the condition $g(1) \ge g(0)$, which can be shown by proving $\frac{d}{dp} g(p) \ge 0$. For compactness, we define $\boldsymbol{G}_u = \big( c\boldsymbol{I} + \sigma^{-2} \sum_{e_i \in \mathcal{H}_T} \boldsymbol{\phi}^u_i \boldsymbol{\phi}^{uT}_i \big)$. Now we can show that
$$
\frac{d}{dp} g(p)
= \sum_{u \in \mathcal{V}} \operatorname{tr} \frac{d}{dp} \log \big( \boldsymbol{G}_u + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \big)
= \sum_{u \in \mathcal{V}} \operatorname{tr} \frac{d}{dp} \Big[ \log \big( \boldsymbol{G}_u + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \big) - \log \boldsymbol{G}_u \Big]
$$
$$
= \sum_{u \in \mathcal{V}} \operatorname{tr} \Big[ \big( \boldsymbol{I} + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \boldsymbol{G}_u^{-1} \big)^{-1} \sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \boldsymbol{G}_u^{-1} \Big]
= \sum_{u \in \mathcal{V}} \sigma^{-2} \operatorname{tr} \Big[ \boldsymbol{\phi}^{uT}_k \boldsymbol{G}_u^{-1} \big( \boldsymbol{I} + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \boldsymbol{G}_u^{-1} \big)^{-1} \boldsymbol{\phi}^u_k \Big]
$$
$$
= \sum_{u \in \mathcal{V}} \sigma^{-2} \boldsymbol{\phi}^{uT}_k \big( \boldsymbol{G}_u + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k \big)^{-1} \boldsymbol{\phi}^u_k \; \ge \; 0 ,
$$
where the last inequality holds because each $\boldsymbol{G}_u + p\,\sigma^{-2} \boldsymbol{\phi}^u_k \boldsymbol{\phi}^{uT}_k$ is positive definite.
Hence $f_X$ is monotone in $\mathcal{H}_T$ for all $X \in \{A, D, E, T\}$.

(ii) Weak submodularity of $f_A$ and $f_E$: Since $f_A$ and $f_E$ are linear and monotone as proved above, their weak submodularity follows from the weak submodularity of the A and E optimality criteria, which has been proved in [7, 13, 32] for linear and monotone observation models.

(iii) Submodularity of $f_D$: Since $f_D$ is linear and monotone as proved above, its submodularity follows from the submodularity of the D optimality criterion, which has been proved in [39, 57] for linear and monotone observation models.

(iv) Modularity of $f_T$: Since $f_T$ is linear and monotone as proved above, its modularity follows from the modularity of the T optimality criterion, which has been proved in [39, 58] for linear and monotone observation models.

B ADDITIONAL RESULTS
Mean squared error: $\mathbb{E}(m - \hat{m})^2$

Datasets   | CherryPick$_A$-Gr | CherryPick$_A$-Fast | CherryPick$_D$-Gr | CherryPick$_D$-Fast
Club       | 0.038 | 0.039 | 0.037 | 0.037
Elections  | 0.054 | 0.055 | 0.053 | 0.051
Verdict    | 0.069 | 0.071 | 0.071 | 0.072
Sports     | 0.066 | 0.067 | 0.056 | 0.057
Delhi      | 0.035 | 0.036 | 0.039 | 0.039

Failure rate: $\mathbb{P}(\operatorname{sign}(m) \ne \operatorname{sign}(\hat{m}))$

Datasets   | CherryPick$_A$-Gr | CherryPick$_A$-Fast | CherryPick$_D$-Gr | CherryPick$_D$-Fast
Club       | 0.120 | 0.124 | 0.113 | 0.116
Elections  | 0.169 | 0.176 | 0.176 | 0.182
Verdict    | 0.207 | 0.204 | 0.210 | 0.213
Sports     | 0.114 | 0.091 | 0.079 | 0.087
Delhi      | 0.137 | 0.144 | 0.147 | 0.152
Table 5. Comparative performance evaluation of CherryPick$_A$-Fast and CherryPick$_D$-Fast against CherryPick$_A$-Gr and CherryPick$_D$-Gr on five real-world datasets, for a fixed $\gamma$ and T = 4 hours. For each message $m$ in the test set, we predict its sentiment value given the history up to T = 4 hours before the time of the message. For the T hours, we predict the opinion stream using a sampling algorithm. Mean squared error and failure rate are reported. We observe that, in general, the greedy variants outperform the adaptive variants by a small margin. For these experimental results, we apply standard greedy for maximizing $f_A$ and $f_D$ (see Section 5).

As we have mentioned earlier, fast adaptive techniques exist as alternatives to greedy for maximizing both $f_D$ and $f_A$. In this section, we repeat our primary experiments with fast adaptive techniques for CherryPick$_A$ and CherryPick$_D$ and compare them with their greedy alternatives. For maximizing $f_D$, we use Fast-Full, a fast adaptive algorithm proposed in [9]; we refer to this method as CherryPick$_D$-Fast. For maximizing $f_A$, we adopt DASH, a fast adaptive technique proposed by [55], which we refer to as CherryPick$_A$-Fast. From now on, we refer to the greedy techniques for $f_A$ and $f_D$ as CherryPick$_A$-Gr and CherryPick$_D$-Gr respectively. We present their detailed comparative evaluation on our primary experiments on real datasets below.

Comparative analysis.
Here we compare the predictive performance of CherryPick$_A$-Fast and CherryPick$_D$-Fast with CherryPick$_A$-Gr and CherryPick$_D$-Gr, with the fraction of events chosen as endogenous held fixed. In general, the standard greedy algorithms outperform the adaptive methods by a small margin, while the adaptive methods considerably speed up the demarcation process.
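To make concrete what the greedy variants compute, the sketch below greedily selects events that maximize a single-user version of the D-optimality objective $f_D$, i.e., the log-determinant of the regularized information matrix. The feature vectors, constants, and function names here are illustrative assumptions, not the paper's implementation (which sums over all users and uses the fast adaptive alternatives discussed above):

```python
import numpy as np

def f_D(S, Phi, c=0.5, sigma=0.3):
    """Single-user D-optimality objective:
    tr log(cI + sigma^{-2} sum_{i in S} phi_i phi_i^T) = log det of that matrix."""
    d = Phi.shape[1]
    idx = sorted(S)
    M = c * np.eye(d) + (Phi[idx].T @ Phi[idx]) / sigma**2
    return np.linalg.slogdet(M)[1]

def greedy_select(Phi, k):
    """Standard greedy: repeatedly add the event with the largest marginal gain."""
    S = set()
    for _ in range(k):
        gains = {i: f_D(S | {i}, Phi) - f_D(S, Phi)
                 for i in range(len(Phi)) if i not in S}
        S.add(max(gains, key=gains.get))
    return S

rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 4))   # 30 candidate events, 4-dimensional features
S = greedy_select(Phi, k=10)

# Monotonicity (Theorem 3(i)): removing any chosen event never increases f_D
assert all(f_D(S, Phi) >= f_D(S - {i}, Phi) for i in S)
```

Since $f_D$ is monotone and submodular, this greedy loop enjoys the classical $(1 - 1/e)$ approximation guarantee of Nemhauser et al. [51].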
[Fig. 6 shows six panels: mean squared error and failure rate plotted against the endogenous-subset fraction for the Club, Verdict, and Delhi datasets, for Cherrypick$_A$-GR, Cherrypick$_A$-Fast, Cherrypick$_D$-GR, and Cherrypick$_D$-Fast.]

Fig. 6. Performance variation with the size of the endogenous subset for the Club, Verdict, and Delhi datasets, where the size of the endogenous subset is varied from 60% to 100%. The timespan T is set to 4 hours. Mean squared error and failure rate are reported. The adaptive variants display almost identical behavior to the greedy variants.
Variation of performance with the fraction of outliers ($\gamma$). Figure 6 compares the greedy and adaptive variants of CherryPick$_A$ and CherryPick$_D$ by observing the variation of forecasting performance for different values of $\gamma$, i.e., the pre-specified fraction of outliers, for the Club, Verdict, and Delhi datasets. In this experiment, $\gamma$ is varied over a range of values. We observe that CherryPick$_A$-Fast and CherryPick$_D$-Fast perform very similarly to CherryPick$_A$-Gr and CherryPick$_D$-Gr respectively. Moreover, CherryPick$_A$-Fast and CherryPick$_D$-Fast preserve the same behavioral pattern as their greedy counterparts, i.e., both perform best at a particular value of $\gamma$, with a performance drop at both ends. Forecasting performance.
Finally, we compare the forecasting performance of CherryPick$_A$-Fast and CherryPick$_D$-Fast with their greedy counterparts with respect to the variation of T, fixing $\gamma = 0.2$. Figure 7 summarizes the results, from which we make the following observations. (I) In general, both CherryPick$_A$-Fast and CherryPick$_D$-Fast are outperformed by their greedy counterparts by a very small margin as T increases. (II) CherryPick$_A$-Fast and CherryPick$_D$-Fast display a similar pattern of performance drop at higher T, with small deviations from their greedy counterparts.
[Fig. 7 shows eight panels: mean squared error and failure rate plotted against T (hours) for the Club, Verdict, Sports, and Delhi datasets, for Cherrypick$_A$-GR, Cherrypick$_A$-Fast, Cherrypick$_D$-GR, and Cherrypick$_D$-Fast.]

Fig. 7. Forecasting performance of the greedy and adaptive variants of CherryPick$_A$ and CherryPick$_D$ for a fixed $\gamma = 0.2$ on the four major real datasets. For each message $m$ in the test set, we predict its sentiment value given the history up to T hours before the time of the message. For the T hours, we predict the opinion stream using a sampling algorithm. Mean squared error and failure rate are reported. We observe almost identical behavior of the greedy and adaptive variants, with the adaptive methods outperformed by the greedy methods by a negligible margin.
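For completeness, the two evaluation metrics reported throughout these tables and figures, mean squared error $\mathbb{E}(m - \hat{m})^2$ and failure rate $\mathbb{P}(\operatorname{sign}(m) \ne \operatorname{sign}(\hat{m}))$, can be computed as follows (a minimal sketch; the variable names are illustrative):

```python
import numpy as np

def mse(m_true, m_pred):
    """Mean squared error E(m - m_hat)^2 over the test messages."""
    m_true, m_pred = np.asarray(m_true, float), np.asarray(m_pred, float)
    return float(np.mean((m_true - m_pred) ** 2))

def failure_rate(m_true, m_pred):
    """Fraction of test messages whose predicted sentiment has the wrong sign."""
    m_true, m_pred = np.asarray(m_true, float), np.asarray(m_pred, float)
    return float(np.mean(np.sign(m_true) != np.sign(m_pred)))

# Tiny illustration on made-up sentiment values in [-1, 1]
m_true = [0.4, -0.2, 0.1, -0.5]
m_pred = [0.3, 0.1, 0.2, -0.4]
err = mse(m_true, m_pred)            # average of squared residuals
fr = failure_rate(m_true, m_pred)    # one of four signs is wrong here
```

Both metrics are averaged over all test messages of a dataset, which is how the per-dataset rows of Table 5 and the curves of Figures 6 and 7 are obtained.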