Learning and Forecasting Opinion Dynamics in Social Networks
Abir De, Isabel Valera, Niloy Ganguly, Sourangshu Bhattacharya, Manuel Gomez Rodriguez
IIT Kharagpur, {abir.de, niloy, sourangshu}@cse.iitkgp.ernet.in
Max Planck Institute for Software Systems, {ivalera, manuelgr}@mpi-sws.org

Abstract
Social media and social networking sites have become a global pinboard for the exposition and discussion of news, topics, and ideas, where social media users often update their opinions about a particular topic by learning from the opinions shared by their friends. In this context, can we learn a data-driven model of opinion dynamics that is able to accurately forecast users' opinions? In this paper, we introduce SLANT, a probabilistic modeling framework of opinion dynamics, which represents users' opinions over time by means of marked jump diffusion stochastic differential equations, and allows for efficient model simulation and parameter estimation from historical fine-grained event data. We then leverage our framework to derive a set of efficient predictive formulas for opinion forecasting and identify conditions under which opinions converge to a steady state. Experiments on data gathered from Twitter show that our model provides a good fit to the data and our formulas achieve more accurate forecasting than alternatives.
Social media and social networking sites are increasingly used by people to express their opinions, give their "hot takes", on the latest breaking news, political issues, sports events, and new products. As a consequence, there has been an increasing interest in leveraging social media and social networking sites to sense and forecast opinions, as well as to understand opinion dynamics. For example, political parties routinely use social media to sense people's opinion about their political discourse; quantitative investment firms measure investor sentiment and trade using social media [17]; and corporations leverage brand sentiment, estimated from users' posts, likes and shares in social media and social networking sites, to design their marketing campaigns. In this context, multiple methods for sensing opinions, typically based on sentiment analysis [20], have been proposed in recent years. However, methods for accurately forecasting opinions are still scarce [6, 7, 18], despite the extensive literature on theoretical models of opinion dynamics [5, 8].

In this paper, we develop a novel modeling framework of opinion dynamics in social media and social networking sites, SLANT, which allows for accurate forecasting of individual users' opinions. The proposed framework is based on two simple intuitive ideas: i) users' opinions are hidden until they decide to share them with their friends (or neighbors); and ii) users may update their opinions about a particular topic by learning from the opinions shared by their friends. While the latter is one of the main underlying premises used by many well-known theoretical models of opinion dynamics [5, 8, 21], the former has been ignored by models of opinion dynamics, despite its relevance to closely related processes such as information diffusion [11].

More in detail, our proposed model represents users' latent opinions as continuous-time stochastic processes driven by a set of marked jump stochastic differential equations (SDEs) [13].
Such a construction allows each user's latent opinion to be modulated over time by the opinions asynchronously expressed by her neighbors as sentiment messages. (A slant is a particular point of view from which something is seen or presented.) Here, every time a user expresses an opinion by posting a sentiment message, she reveals a noisy estimate of her current latent opinion. Then, we exploit a key property of our model, the Markov property, to develop:

I. An efficient estimation procedure to find the parameters that maximize the likelihood of a set of (millions of) sentiment messages via convex programming.
II. A scalable simulation procedure to sample millions of sentiment messages from the proposed model in a matter of minutes.
III. A set of novel predictive formulas for efficient and accurate opinion forecasting, which can also be used to identify conditions under which opinions converge to a steady state of consensus or polarization.

Finally, we experiment on both synthetic and real data gathered from Twitter and show that our model provides a good fit to the data and our predictive formulas achieve more accurate opinion forecasting than several alternatives [6, 7, 8, 14, 25].

Related work.
There is an extensive line of work on theoretical models of opinion dynamics and opinion formation [3, 5, 8, 14, 16, 25]. However, previous models typically share the following limitations: (i) they do not distinguish between latent opinion and sentiment (or expressed opinion), which is a noisy observation of the opinion (e.g., thumbs up/down, text sentiment); (ii) they consider users' opinions to be updated synchronously in discrete time, although opinions may be updated asynchronously following complex temporal patterns [11]; (iii) the model parameters are difficult to learn from real fine-grained data and instead are set arbitrarily and, as a consequence, they provide inaccurate fine-grained predictions; and (iv) they focus on analyzing only the steady state of the users' opinions, neglecting the transient behavior of real opinion dynamics, which allows for opinion forecasting methods. More recently, there have been some efforts on designing models that overcome some of the above limitations and provide more accurate predictions [6, 7]. However, they do not distinguish between opinion and sentiment and still consider opinions to be updated synchronously in discrete time. Our modeling framework addresses the above limitations and, by doing so, achieves more accurate opinion forecasting than alternatives.
In this section, we first formulate our model of opinion dynamics, starting from the data it is designed for, and thenintroduce efficient methods for model parameter estimation and model simulation.
Opinions data.
Given a directed social network G = (V, E), we record each message as e := (u, m, t), where the triplet means that user u ∈ V posted a message with sentiment m at time t. Given a collection of messages {e_1 = (u_1, m_1, t_1), ..., e_n = (u_n, m_n, t_n)}, the history H_u(t) gathers all messages posted by user u up to but not including time t, i.e.,

H_u(t) = {e_i = (u_i, m_i, t_i) | u_i = u and t_i < t},   (1)

and H(t) := ∪_{u ∈ V} H_u(t) denotes the entire history of messages up to but not including time t.

Generative process.
We represent users' latent opinions as a multidimensional stochastic process x*(t), in which the u-th entry, x*_u(t) ∈ R, represents the opinion of user u at time t and the sign * means that it may depend on the history H(t). Then, every time a user u posts a message at time t, we draw its sentiment m from a sentiment distribution p(m | x*_u(t)). Here, we can also think of the sentiment m of each message as a sample from a noisy stochastic process m_u(t) ∼ p(m_u(t) | x*_u(t)). Further, we represent the message times by a set of counting processes. In particular, we denote the set of counting processes as a vector N(t), in which the u-th entry, N_u(t) ∈ {0} ∪ Z_+, counts the number of sentiment messages user u posted up to but not including time t. Then, we can characterize the message rate of the users using their corresponding conditional intensities as

E[dN(t) | H(t)] = λ*(t) dt,   (2)

where dN(t) := (dN_u(t))_{u ∈ V} denotes the number of messages per user in the window [t, t + dt) and λ*(t) := (λ*_u(t))_{u ∈ V} denotes the associated user intensities, which may depend on the history H(t). We denote the set of users that u follows by N(u). Next, we specify the intensity functions λ*(t), the dynamics of the users' opinions x*(t), and the sentiment distribution p(m | x*_u(t)).

Intensity for messages.
There is a wide variety of message intensity functions one can choose from to model the users' intensity λ*(t) [1]. In this work, we consider two of the most popular functional forms used in the growing literature on social activity modeling using point processes [9, 23]:

I. Poisson process.
The intensity is assumed to be independent of the history H(t) and constant, i.e., λ*_u(t) = µ_u.

II. Multivariate Hawkes processes.
The intensity captures a mutual excitation phenomenon between message events and depends on the whole history of message events ∪_{v ∈ u ∪ N(u)} H_v(t) before t:

λ*_u(t) = µ_u + Σ_{v ∈ u ∪ N(u)} b_vu Σ_{e_i ∈ H_v(t)} κ(t − t_i) = µ_u + Σ_{v ∈ u ∪ N(u)} b_vu (κ(t) ⋆ dN_v(t)),   (3)

where the first term, µ_u ≥ 0, models the publication of messages by user u on her own initiative, and the second term, with b_vu ≥ 0, models the publication of additional messages by user u due to the influence that previous messages posted by the users she follows have on her intensity. Here, κ(t) = e^{−νt} is an exponential triggering kernel modeling the decay of influence of past events over time and ⋆ denotes the convolution operation. In both cases, the pair (N(t), λ*(t)) is a Markov process, i.e., future states of the process (conditional on past and present states) depend only upon the present state, and we can express the users' intensity more compactly using the following jump stochastic differential equation (SDE): dλ*(t) = ν(µ − λ*(t)) dt + B dN(t), where the initial condition is λ*(0) = µ. The Markov property will become important later.

Stochastic process for opinion.
The opinion x*_u(t) of a user u at time t adopts the following form:

x*_u(t) = α_u + Σ_{v ∈ N(u)} a_vu Σ_{e_i ∈ H_v(t)} m_i g(t − t_i) = α_u + Σ_{v ∈ N(u)} a_vu (g(t) ⋆ (m_v(t) dN_v(t))),   (4)

where the first term, α_u ∈ R, models the original opinion a user u starts with, and the second term, with a_vu ∈ R, models updates in user u's opinion due to the influence that previous messages with opinions m_i posted by the users u follows have on her opinion. Here, g(t) = e^{−ωt} denotes an exponential triggering kernel, which models the decay of influence over time. Under this form, the resulting opinion dynamics are Markovian and can be compactly represented by a set of coupled marked jump stochastic differential equations (proven in Appendix A):

Proposition 1.
The tuple (x*(t), λ*(t), N(t)) is a Markov process, whose dynamics are defined by the following marked jump stochastic differential equations (SDEs):

dx*(t) = ω(α − x*(t)) dt + A (m(t) ⊙ dN(t))   (5)
dλ*(t) = ν(µ − λ*(t)) dt + B dN(t)   (6)

where the initial conditions are λ*(0) = µ and x*(0) = α, the marks are the sentiment messages m(t) = (m_u(t))_{u ∈ V}, with m_u(t) ∼ p(m | x*_u(t)), and the sign ⊙ denotes the pointwise product.

The above Markov property will be the key to the design of efficient model parameter estimation and model simulation algorithms.
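As a concrete illustration of why the Markov property matters computationally, the SDEs in Eqs. 5 and 6 imply that, between consecutive events, opinions and intensities simply decay exponentially toward their baselines α and µ, and that each event adds a jump given by one column of A or B. A minimal sketch (using dense NumPy arrays; the function names are ours, for illustration only):

```python
import numpy as np

def propagate(x, lam, alpha, mu, omega, nu, dt):
    """Drift part of Eqs. 5-6 over an event-free interval of length dt:
    opinions decay toward alpha at rate omega, intensities toward mu at rate nu."""
    x_new = alpha + (x - alpha) * np.exp(-omega * dt)
    lam_new = mu + (lam - mu) * np.exp(-nu * dt)
    return x_new, lam_new

def apply_event(x, lam, A, B, u, m):
    """Jump part of Eqs. 5-6: when user u posts sentiment m, opinions jump by
    m times the u-th column of A and intensities by the u-th column of B."""
    return x + m * A[:, u], lam + B[:, u]
```

Because both updates cost O(1) per affected neighbor, an event-driven simulation never needs to integrate the SDEs numerically.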
Sentiment distribution.
The particular choice of sentiment distribution p(m | x*_u(t)) depends on the recorded marks. For example, one may consider:

I. Gaussian.
The sentiment is assumed to be a real random variable m ∈ R, i.e., p(m | x_u(t)) = N(x_u(t), σ_u). This fits well scenarios in which sentiment is extracted from text using sentiment analysis [12].

II. Logistic.
The sentiment is assumed to be a binary random variable m ∈ {−1, 1}, i.e., p(m | x_u(t)) = 1/(1 + exp(−m · x_u(t))). This fits well scenarios in which sentiment is measured by means of up votes, down votes or likes.

Our model estimation method can be easily adapted to any log-concave sentiment distribution. However, in the remainder of the paper, we consider the Gaussian distribution since, in our experiments, sentiment is extracted from text using sentiment analysis.

Given a collection of messages H(T) = {(u_i, m_i, t_i)} recorded during a time period [0, T) in a social network G = (V, E), we can find the optimal parameters α, µ, A and B by solving a maximum likelihood estimation (MLE) problem. (If one decides to model the message intensities with a Poisson process, B = 0.) To do so, it is easy to show that the log-likelihood of the messages is given by

L(α, µ, A, B) = Σ_{e_i ∈ H(T)} log p(m_i | x*_{u_i}(t_i))  [message sentiments]  +  Σ_{e_i ∈ H(T)} log λ*_{u_i}(t_i) − Σ_{u ∈ V} ∫_0^T λ*_u(τ) dτ  [message times].   (7)

Then, we can find the optimal parameters (α, µ, A, B) using MLE as

maximize_{α, µ ≥ 0, A, B ≥ 0}  L(α, µ, A, B).   (8)

Note that, as long as the sentiment distributions are log-concave, the MLE problem above is concave and thus can be solved efficiently. Moreover, the problem decomposes into 2|V| independent subproblems, two per user u, since the first term in Eq. 7 only depends on (α, A) whereas the last two terms only depend on (µ, B), and thus can be readily parallelized. Then, we find (µ*, B*) using spectral projected gradient descent [4], which works well in practice and achieves ε accuracy in O(log(1/ε)) iterations, and find (α*, A*) analytically, since, for Gaussian sentiment distributions, the problem reduces to a least-squares problem.
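For intuition, the Gaussian case of the first term in Eq. 7 is, for each user, an ordinary least-squares fit: each observed sentiment of user u is regressed on an intercept (α_u) and on exponentially discounted sums of each neighbor's past sentiments (the coefficients a_vu). A toy sketch of this per-user subproblem, assuming a small, chronologically sorted message list; the naive O(n²) feature computation here stands in for the linear-time precomputation the paper uses:

```python
import numpy as np

def fit_opinion_params(msgs, u, neighbors, omega):
    """Least-squares fit of (alpha_u, a_{.u}) for one user under a Gaussian
    sentiment model: m_i ~ alpha_u + sum_v a_vu * h_v(t_i), where h_v(t_i)
    discounts neighbor v's past sentiments with the kernel g(t) = exp(-omega t).
    `msgs` is a chronologically sorted list of (user, sentiment, time) triplets."""
    rows, targets = [], []
    for i, (ui, mi, ti) in enumerate(msgs):
        if ui != u:
            continue
        feats = [sum(mj * np.exp(-omega * (ti - tj))
                     for (uj, mj, tj) in msgs[:i] if uj == v and tj < ti)
                 for v in neighbors]
        rows.append([1.0] + feats)   # leading 1 carries the baseline alpha_u
        targets.append(mi)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[0], theta[1:]       # alpha_u, then a_vu for v in neighbors
```

On noiseless synthetic data generated from Eq. 4 this recovers (α_u, a_vu) exactly, which is a useful sanity check before running the full estimator.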
Fortunately, in each subproblem, we can use the Markov property from Proposition 1 to precompute the sums and integrals in (8) in linear time, i.e., O(|H_u(T)| + |∪_{v ∈ N(u)} H_v(T)|). Appendix H summarizes the overall estimation algorithm.

We leverage the efficient sampling algorithm for multivariate Hawkes processes introduced by Farajtabar et al. [10] to design a scalable algorithm to sample opinions from our model. The two key ideas that allow us to adapt the procedure by Farajtabar et al. to our model of opinion dynamics, while keeping its efficiency, are as follows: (i) the opinion dynamics, defined by Eqs. 5 and 6, are Markovian and thus we can update individual intensities and opinions in O(1): let t_i and t_{i+1} be two consecutive events; then we can compute λ*(t_{i+1}) as (λ*(t_i) − µ) exp(−ν(t_{i+1} − t_i)) + µ and x*(t_{i+1}) as (x*(t_i) − α) exp(−ω(t_{i+1} − t_i)) + α, respectively; and (ii) social networks are typically sparse and thus both A and B are also sparse, so, whenever a node expresses its opinion, only a small number of opinions and intensity functions in its local neighborhood will change. As a consequence, we can reuse the majority of samples from the intensity functions and sentiment distributions for the next new sample. Appendix I summarizes the overall simulation algorithm.

Our goal here is to develop efficient methods that leverage our model to forecast a user u's opinion x_u(t) at time t given the history H_{t_0} up to time t_0 < t.

I. Poisson process. Assume each user's messages follow a Poisson process with rate µ_u. Then, the conditional average opinion is given by (proven in Appendix C):

Theorem 2.
Given a collection of messages H_{t_0} recorded during a time period [0, t_0) and λ*_u(t) = µ_u for all u ∈ G, then,

E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] = e^{(AΛ − ωI)(t − t_0)} x(t_0) + ω (AΛ − ωI)^{−1} (e^{(AΛ − ωI)(t − t_0)} − I) α,   (9)

where Λ := diag[µ] and (x(t_0))_{u ∈ V} = α_u + Σ_{v ∈ N(u)} a_vu Σ_{t_i ∈ H_v(t_0)} e^{−ω(t_0 − t_i)} m_v(t_i).

Remarkably, we can efficiently compute both terms in Eq. 9 by using the iterative algorithm by Al-Mohy et al. [2] for the matrix exponentials and the well-known GMRES method [22] for the matrix inversion. Given this predictive formula, we can easily study the stability condition and, for stable systems, find the steady-state conditional average opinion (proven in Appendix D):

Theorem 3. Given the conditions of Theorem 2, if Re[ξ(AΛ)] < ω, then,

lim_{t→∞} E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] = (I − AΛ/ω)^{−1} α.   (10)

The above results indicate that the conditional average opinions are nonlinearly related to the parameter matrix A, which depends on the network structure, and the message rates µ, which in this case are assumed to be constant and independent of the network structure. Figure 1 provides empirical evidence of these results.

II. Multivariate Hawkes process. Assume each user's messages follow a multivariate Hawkes process, given by Eq. 3, with b_vu = 0 for all v, u ∈ G, v ≠ u. Then, the conditional average opinion is given by (proven in Appendix E):

Theorem 4.
Given a collection of messages H_{t_0} recorded during a time period [0, t_0) and λ*_u(t) = µ_u + b_uu Σ_{e_i ∈ H_u(t)} e^{−ν(t − t_i)} for all u ∈ G, then, the conditional average satisfies the following differential equation:

d E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] / dt = [−ωI + AΛ(t)] E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] + ωα,   (11)

where Λ(t) = diag(E_{H_t \ H_{t_0}}[λ*(t) | H_{t_0}]),

E_{H_t \ H_{t_0}}[λ*(t) | H_{t_0}] = e^{(B − νI)(t − t_0)} η(t_0) + ν (B − νI)^{−1} (e^{(B − νI)(t − t_0)} − I) µ   ∀t ≥ t_0,

(η(t_0))_{u ∈ V} = µ_u + b_uu Σ_{t_i ∈ H_u(t_0)} e^{−ν(t_0 − t_i)},  and  B = diag([b_11, ..., b_{|V||V|}]^T).

Here, we can compute the conditional average by solving numerically the differential equation above, which is not stochastic, and we can efficiently compute the vector E_{H_t \ H_{t_0}}[λ*(t) | H_{t_0}] by using again the algorithm by Al-Mohy et al. [2] and the GMRES method [22]. In this case, the stability condition and the steady-state conditional average opinion are given by (proven in Appendix F):

Theorem 5. Given the conditions of Theorem 4, if the transition matrix Φ(t) associated to the time-varying linear system described by Eq. 11 satisfies ||Φ(t)|| ≤ γ e^{−ct} ∀t > 0, where γ, c > 0, then,

lim_{t→∞} E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] = (I − AΛ/ω)^{−1} α,   (12)

where Λ := diag[(I − B/ν)^{−1} µ].

The above results indicate that the conditional average opinions are nonlinearly related to the parameter matrices A and B. This suggests that the effect of the temporal influence on the opinion evolution, by means of the parameter matrix B of the multivariate Hawkes process, is nontrivial.
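The Poisson-case formula of Theorem 2 (Eq. 9) maps directly onto standard numerical primitives: the action of a matrix exponential (the Al-Mohy and Higham algorithm, available as `scipy.sparse.linalg.expm_multiply`) and a GMRES solve for the matrix inversion. A sketch for small dense matrices, under the assumption that AΛ − ωI is invertible; the variable names are ours:

```python
import numpy as np
from scipy.sparse.linalg import expm_multiply, gmres

def forecast_poisson(A, mu, alpha, x0, omega, dt):
    """Conditional average opinion under Poisson intensities (Eq. 9):
    e^{(A Lam - wI) dt} x(t0) + w (A Lam - wI)^{-1} (e^{(A Lam - wI) dt} - I) alpha,
    with Lam = diag(mu), w = omega, and dt = t - t0."""
    M = A * mu[np.newaxis, :] - omega * np.eye(len(mu))  # A @ diag(mu) - omega I
    term1 = expm_multiply(M * dt, x0)            # action of the matrix exponential
    v = expm_multiply(M * dt, alpha) - alpha     # (e^{M dt} - I) alpha
    w, info = gmres(M, omega * v)                # GMRES solve for the inversion
    assert info == 0                             # GMRES converged
    return term1 + w
```

With A = 0 the formula reduces to a pure exponential relaxation toward α, and for stable systems the output approaches the steady state of Theorem 3 as dt grows, which gives an easy sanity check.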
We illustrate this result empirically in Figure 1.

Figure 1: Opinion dynamics on two 50-node networks G_1 (top) and G_2 (bottom) for Poisson (P) and Hawkes (H) message intensities. The first column visualizes the two networks and the opinion of each node at t = 0 (positive/negative opinions in red/blue). The second column shows the temporal evolution of the theoretical and empirical average opinion for Poisson intensities. The third column shows the temporal evolution of the empirical average opinion for Hawkes intensities, where we compute the average separately for positive (+) and negative (−) opinions in the steady state. The fourth and fifth columns show the polarity of the average opinion per user over time.

Given the efficient simulation procedure described in Section 2.2, we can readily derive a general simulation-based formula for opinion forecasting:

E_{H_t \ H_{t_0}}[x*(t) | H_{t_0}] ≈ x̂*(t) = (1/n) Σ_{l=1}^n x*_l(t),   (13)

where n is the number of times that we simulate the opinion dynamics and x*_l(t) gathers the users' opinions at time t for the l-th simulation. Moreover, we have the following theoretical guarantee (proven in Appendix G):

Theorem 6.
Simulate the opinion dynamics up to time t > t_0 the following number of times:

n ≥ (1/(3ε²)) (6σ²_max + 4 x_max ε) log(2/δ),   (14)

where σ²_max = max_{u ∈ G} σ²_{H_t \ H_{t_0}}(x*_u(t) | H_{t_0}) is the maximum variance of the users' opinions, which we analyze in Appendix G, and x_max ≥ |x_u(t)|, ∀u ∈ G, is an upper bound on the users' (absolute) opinions. Then, for each user u ∈ G, the error between her true and estimated average opinion satisfies |x̂*_u(t) − E_{H_t \ H_{t_0}}[x*_u(t) | H_{t_0}]| ≤ ε with probability at least 1 − δ.

We first provide empirical evidence that our model is able to produce different types of opinion dynamics, which may or may not converge to a steady state of consensus or polarization. Then, we show that our model estimation and simulation algorithms as well as our predictive formulas scale to networks with millions of users and events. Appendix J contains an evaluation of the accuracy of our model parameter estimation method.

Different types of opinion dynamics. We first simulate our model on two different small networks using Poisson intensities, i.e., λ*_u(t) = µ_u, with µ_u drawn uniformly at random for every u, and then simulate our model on the same networks while using Hawkes intensities, with b_vu drawn uniformly at random on 5% of the nodes, chosen at random, and the original Poisson intensities on the remaining nodes. Figure 1 summarizes the results, which show that (i) our model is able to produce opinion

Figure 2: Panels (a) and (b) show the running time of our estimation and simulation procedures against the number of nodes, for a fixed average number of events per node.
Panels (c) and (d) show the running time needed to compute our analytical formulas against the number of nodes and the time horizon T = t − t_0; in panel (d), the number of nodes is fixed, and in panel (c), T = 6 hours. For all panels, the average degree per node is fixed. The experiments are carried out on a single machine with 24 cores and 64 GB of main memory.

dynamics that converge to consensus (second column) and polarization (third column); (ii) the opinion forecasting formulas described in Section 3 closely match a simulation-based estimation (second column); and (iii) the evolution of the average opinion, and whether opinions converge to a steady state of consensus or polarization, depend on the functional form of the message intensity.

Scalability. Figure 2 shows that our model estimation and simulation algorithms, described in Sections 2.1 and 2.2, and our analytical predictive formulas, described in Section 3.1, scale to networks with millions of users and events. For example, our algorithm takes minutes to estimate the model parameters from 10 million events generated by one million nodes using a single machine with 24 cores and 64 GB RAM.

We use real data gathered from Twitter to show that our model can forecast users' opinions more accurately than six state-of-the-art methods [6, 7, 8, 14, 18, 25].

Experimental setup. We experimented with five Twitter datasets about current real-world events (Politics, Movie, Fight, Bollywood and US), in which, for each recorded message i, we compute its sentiment value m_i using a popular sentiment analysis toolbox, specially designed for Twitter [12]. Here, the sentiment takes values m ∈ (−1, 1) and we consider the sentiment polarity to be simply sign(m). Appendix K contains further details and statistics about these datasets.

Opinion forecasting. We first evaluate the performance of our model at predicting sentiment (expressed opinion) at a message level.
To do so, for each dataset, we first estimate the parameters of our model, SLANT, using messages from a training set containing the (chronologically) first 90% of the messages. Here, we set the decay parameters of the exponential triggering kernels κ(t) and g(t) by cross-validation. Then, we evaluate the predictive performance of our opinion forecasting formulas using the last 10% of the messages. More specifically, we predict the sentiment value m for each message posted by user u in the test set given the history up to T hours before the time of the message, i.e., m̂ = E_{H_t \ H_{t−T}}[x*_u(t) | H_{t−T}]. We compare the performance of our model with the asynchronous linear model (AsLM) [7], DeGroot's model [8], the voter model [25], the biased voter model [6], the flocking model [14], and the sentiment prediction method based on collaborative filtering by Kim et al. [18], in terms of: (i) the mean squared error between the true (m) and the estimated (m̂) sentiment value for all messages in the held-out set, i.e., E[(m − m̂)²]; and (ii) the failure rate, defined as the probability that the true and the estimated polarity do not coincide, i.e., P(sign(m) ≠ sign(m̂)). For the baseline algorithms, which work in discrete time, we simulate N_T rounds in (t − T, t), where N_T is the number of posts in time T. Figure 3 summarizes the results, which show that: (i) our opinion forecasting formulas consistently outperform the others both in terms of MSE (often by an order of magnitude) and failure rate; (ii) their forecasting performance degrades gracefully with respect to T, whereas competing methods often fail catastrophically; and (iii) they achieve additional mileage by using Hawkes processes instead of Poisson processes. To some extent, we believe SLANT's superior performance is due to its ability to leverage historical data to learn its model parameters and then simulate realistic temporal patterns.
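The two evaluation metrics above, MSE on the sentiment value and failure rate on the sentiment polarity, can be sketched in a few lines (the function name is ours):

```python
import numpy as np

def forecast_metrics(m_true, m_pred):
    """Evaluation metrics used above: mean squared error E[(m - m_hat)^2] on
    the sentiment value, and failure rate P(sign(m) != sign(m_hat)) on the
    sentiment polarity, both averaged over the held-out messages."""
    m_true, m_pred = np.asarray(m_true), np.asarray(m_pred)
    mse = np.mean((m_true - m_pred) ** 2)
    failure_rate = np.mean(np.sign(m_true) != np.sign(m_pred))
    return mse, failure_rate
```

For example, `forecast_metrics([1.0, -1.0], [0.5, 0.5])` yields an MSE of 1.25 and a failure rate of 0.5, since only the second prediction flips the polarity.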
For these particular networks, Poisson intensities lead to consensus while Hawkes intensities lead to polarization; however, we did find other examples in which Poisson intensities lead to polarization and Hawkes intensities lead to consensus. Here, we do not distinguish between analytical and sampling-based forecasting since, in practice, they closely match each other. The failure rate is very close to zero for those datasets in which most users post messages with the same polarity.

Figure 3: Sentiment prediction performance using a 10% held-out set for each real-world dataset. Performance is measured in terms of mean squared error (MSE) on the sentiment value, E[(m − m̂)²], and failure rate on the sentiment polarity, P(sign(m) ≠ sign(m̂)). For each message in the held-out set, we predict the sentiment value m given the history up to T hours before the time of the message, for different values of T. Nowcasting corresponds to T = 0 and forecasting to T > 0. The sentiment value m ∈ (−1, 1) and the sentiment polarity sign(m) ∈ {−1, 1}.

Figure 4: Macroscopic sentiment prediction given by our model for two real-world datasets.
The panels show the observed sentiment m̄(t) (in blue, running average), the inferred opinion x̄(t) on the training set (in red), and the forecasted opinion E_{H_t \ H_{t−T}}[x_u(t) | H_{t−T}] for T = 1, 3, and 5 hours on the test set (in black, green and gray, respectively), where the bar denotes the average across users.

Finally, we look at the forecasting results at a network level and show that our forecasting formulas can also predict the evolution of opinions macroscopically (in terms of the average opinion across users). Figure 4 summarizes the results for two real-world datasets, which show that the forecasted opinions become less accurate as the time T becomes larger, since the average is computed over longer time periods. As expected, our model is more accurate when the message intensities are modeled using multivariate Hawkes processes. We found qualitatively similar results for the remaining datasets.

We proposed a modeling framework of opinion dynamics, whose key innovation is modeling users' latent opinions as continuous-time stochastic processes driven by a set of marked jump stochastic differential equations (SDEs) [13]. Such a construction allows each user's latent opinion to be modulated over time by the opinions asynchronously expressed by her neighbors as sentiment messages. We then exploited a key property of our model, the Markov property, to design efficient parameter estimation and simulation algorithms, which scale to networks with millions of nodes. Moreover, we derived a set of novel predictive formulas for efficient and accurate opinion forecasting and identified conditions under which opinions converge to a steady state of consensus or polarization. Finally, we experimented with real data gathered from Twitter and showed that our framework achieves more accurate opinion forecasting than state-of-the-art methods.

Our model opens up many interesting avenues for future work. For example, in Eq.
4, our model assumes a linear dependence between users' opinions; however, in some scenarios, this may be a coarse approximation. A natural follow-up to improve the opinion forecasting accuracy would be to consider nonlinear dependences between opinions. It would also be interesting to augment our model to jointly consider correlations between different topics. One could leverage our modeling framework to design opinion shaping algorithms based on stochastic optimal control [13, 24]. Finally, one of the key modeling ideas is realizing that users' expressed opinions (be it in the form of thumbs up/down or text sentiment) can be viewed as noisy discrete samples of the users' latent opinion localized in time. It would be very interesting to generalize this idea to any type of event data and derive sampling theorems and conditions under which an underlying general continuous signal of interest (be it a user's opinion, expertise, or wealth) can be recovered from event data with provable guarantees.

References

[1] O. Aalen, Ø. Borgan, and H. Gjessing. Survival and Event History Analysis: A Process Point of View. Springer Verlag, 2008.
[2] A. H. Al-Mohy and N. J. Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM Journal on Scientific Computing, 33(2):488–511, 2011.
[3] R. Axelrod. The dissemination of culture: A model with local convergence and global polarization. Journal of Conflict Resolution, 41(2):203–226, 1997.
[4] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4), 2000.
[5] P. Clifford and A. Sudbury. A model for spatial conflict. Biometrika, 60(3):581–588, 1973.
[6] A. Das, S. Gollapudi, and K. Munagala. Modeling opinion dynamics in social networks. In WSDM, 2014.
[7] A. De, S. Bhattacharya, P. Bhattacharya, N. Ganguly, and S. Chakrabarti. Learning a linear influence model from transient opinion dynamics. In CIKM, 2014.
[8] M. H.
DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345), 1974.
[9] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, L. Song, and H. Zha. Shaping social activity by incentivizing users. In NIPS, 2014.
[10] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. Coevolve: A joint point process model for information diffusion and network co-evolution. In NIPS, 2015.
[11] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, 2011.
[12] A. Hannak, E. Anderson, L. F. Barrett, S. Lehmann, A. Mislove, and M. Riedewald. Tweetin' in the rain: Exploring societal-scale effects of weather on mood. In ICWSM, 2012.
[13] F. B. Hanson. Applied Stochastic Processes and Control for Jump-Diffusions. SIAM, 2007.
[14] R. Hegselmann and U. Krause. Opinion dynamics and bounded confidence: models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5(3), 2002.
[15] D. Hinrichsen, A. Ilchmann, and A. Pritchard. Robustness of stability of time-varying linear systems. Journal of Differential Equations, 82(2):219–250, 1989.
[16] P. Holme and M. E. Newman. Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 74(5):056108, 2006.
[17] T. Karppi and K. Crawford. Social media, financial algorithms and the hack crash. TC&S, 2015.
[18] J. Kim, J.-B. Yoo, H. Lim, H. Qiu, Z. Kozareva, and A. Galstyan. Sentiment prediction using collaborative filtering. In ICWSM, 2013.
[19] J. Leskovec, D. Chakrabarti, J. M. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. JMLR, 2010.
[20] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 2008.
[21] B. H. Raven. The bases of power: Origins and recent developments. Journal of Social Issues, 49(4), 1993.
[22] Y. Saad and M. H. Schultz.
GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, 1986.
[23] I. Valera and M. Gomez-Rodriguez. Modeling adoption and usage of competing products. In ICDM, 2015.
[24] Y. Wang, E. Theodorou, A. Verma, and L. Song. Steering opinion dynamics in information diffusion networks. arXiv preprint arXiv:1603.09021, 2016.
[25] M. E. Yildiz, R. Pagliari, A. Ozdaglar, and A. Scaglione. Voting models in random networks. In Information Theory and Applications Workshop, pages 1–7, 2010.

A Proof of Proposition 1

Given x*(t) := [x*_1(t), ..., x*_{|V|}(t)]^T and λ*(t) := [λ*_1(t), ..., λ*_{|V|}(t)]^T, we can compactly rewrite Eqs. 3 and 4 for all users as

x*(t) = α + A ∫_0^t g(t−θ) (m(θ) ⊙ dN(θ))   (15)

and

λ*(t) = μ + B ∫_0^t κ(t−θ) ⊙ dN(θ).   (16)

Then, it easily follows that

dx*(t) = A ∫_0^t g′(t−θ) (m(θ) ⊙ dN(θ)) dt + g(0) A (m(t) ⊙ dN(t)),   (17)

where g(t) = e^{−ωt} and g′(t−θ) = −ω g(t−θ). Hence, we can rewrite the above expression as

dx*(t) = ω(α − x*(t)) dt + A (m(t) ⊙ dN(t)).   (18)

Similarly, we can show that dλ*(t) = ν(μ − λ*(t)) dt + B dN(t).

B Auxiliary theoretical results

The proofs of Theorems 2 and 5 rely on the following auxiliary lemma.

Lemma 7. The expected opinion E_{H_t \ H_{t0}}[x*(t)|H_{t0}] in the model of opinion dynamics defined by Eqs. 3 and 4 with exponential triggering kernels with parameters ω and ν satisfies the following differential equation:

dE_{H_t \ H_{t0}}[x*(t)|H_{t0}]/dt = A E_{H_t \ H_{t0}}[x*(t) ⊙ λ*(t)|H_{t0}] − ω E_{H_t \ H_{t0}}[x*(t)|H_{t0}] + ω α,   (19)

where A = (a_vu)_{v,u∈G} and ⊙ denotes the pointwise product.
Proof. Using that E[m_v(θ)|x*_v(θ)] = x*_v(θ), we can compute the average opinion of user u across all possible histories from Eq. 4 as

E_{H_t \ H_{t0}}[x*_u(t)|H_{t0}] = α_u + Σ_{v∈N(u)} a_uv ∫_0^t g(t−θ) E_{H_t \ H_{t0}}[m_v(θ) dN_v(θ)|H_{t0}]
= α_u + Σ_{v∈N(u)} a_uv Σ_{t_i∈H_v(t0)} g(t−t_i) m_v(t_i) + Σ_{v∈N(u)} a_uv ∫_{t0}^t g(t−θ) E_{H(θ) \ H_{t0}}[x*_v(θ) λ*_v(θ)|H_{t0}] dθ,

and we can write the expectation of the opinion for all users in vectorial form as

E_{H_t \ H_{t0}}[x*(t)] = v(t) + A ∫_{t0}^t g(t−θ) E_{H(θ) \ H_{t0}}[x*(θ) ⊙ λ*(θ)] dθ,   (20)

where ⊙ denotes the pointwise product and

(v(t))_u = α_u + Σ_{v∈N(u)} a_uv Σ_{t_i∈H_v(t0)} g(t−t_i) m_v(t_i).

Since g(t) = e^{−ωt}, one may observe that ω v(t) + v̇(t) = ω α. Then, by differentiating Eq. 20, we obtain

dE_{H_t \ H_{t0}}[x*(t)|H_{t0}]/dt = A E_{H_t \ H_{t0}}[x*(t) ⊙ λ*(t)|H_{t0}] − ω E_{H_t \ H_{t0}}[x*(t)|H_{t0}] + ω α.   (21)

C Proof of Theorem 2

Using Lemma 7 and λ*_u(t) = μ_u, we obtain

dE_{H_t}[x*(t)]/dt = [−ωI + AΛ₁] E_{H_t}[x*(t)] + ω α,   (22)

where Λ₁ = diag[μ]. Then, we apply the Laplace transform to the expression above and obtain

x̂(s) = [sI + ωI − AΛ₁]^{−1} x(t0) + (ω/s) [sI + ωI − AΛ₁]^{−1} α,

where we leverage the fact that, conditioned on the prior history, the opinion is non-random, i.e., E_{H_t \ H_{t0}}[x(t0)|H_{t0}] = x(t0). Finally, applying the inverse Laplace transform, we obtain the average opinion E_{H_t \ H_{t0}}[x*(t)|H_{t0}] in the time domain as

E_{H_t \ H_{t0}}[x*(t)|H_{t0}] = e^{(AΛ₁−ωI)(t−t0)} x(t0) + ω (AΛ₁−ωI)^{−1} (e^{(AΛ₁−ωI)(t−t0)} − I) α.
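As a quick numerical illustration, the closed-form forecast above can be evaluated with a matrix exponential. All parameter values below are illustrative assumptions, not taken from the paper, and the matrix exponential is computed via eigendecomposition to stay within NumPy:

```python
import numpy as np

# Sketch of the Theorem 2 forecast for Poisson message intensities:
#   E[x*(t)] = e^{(A L1 - w I)(t - t0)} x(t0)
#              + w (A L1 - w I)^{-1} (e^{(A L1 - w I)(t - t0)} - I) alpha.
omega = 2.0
alpha = np.array([0.2, -0.1, 0.4])                    # baseline opinions
A = np.array([[0.0, 0.3, 0.0],
              [0.2, 0.0, 0.1],
              [0.0, 0.4, 0.0]])                       # influence matrix (a_uv)
Lam1 = np.diag([1.0, 0.5, 1.5])                       # Lambda_1 = diag(mu)
x0 = np.array([1.0, -1.0, 0.5])                       # opinions at t0
G = A @ Lam1 - omega * np.eye(3)

def expm(M):
    """Matrix exponential via eigendecomposition (M assumed diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V * np.exp(w)) @ np.linalg.inv(V)

def forecast(dt):
    """E[x*(t0 + dt) | H_t0] under Theorem 2."""
    E = np.real(expm(G * dt))
    return E @ x0 + omega * np.linalg.solve(G, (E - np.eye(3)) @ alpha)

# Since Re[xi(A Lam1)] < omega here, the forecast converges to the
# steady state of Theorem 3, (I - A Lam1 / omega)^{-1} alpha:
steady = np.linalg.solve(np.eye(3) - A @ Lam1 / omega, alpha)
print(forecast(50.0), steady)
```

For short horizons the forecast stays close to x(t0); for long horizons it relaxes to the Theorem 3 steady state, which makes the stability condition Re[ξ(AΛ₁)] < ω easy to check empirically.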
D Proof of Theorem 3

Theorem 2 states that the average users' opinion E_{H_t}[x*(t)] in the time domain is given by

E_{H_t \ H_{t0}}[x*(t)|H_{t0}] = e^{(AΛ₁−ωI)(t−t0)} x(t0) + ω (AΛ₁−ωI)^{−1} (e^{(AΛ₁−ωI)(t−t0)} − I) α.

If Re[ξ(AΛ₁)] < ω, where ξ(X) denotes the eigenvalues of matrix X, it easily follows that

lim_{t→∞} E_{H_t}[x*(t)] = (I − AΛ₁/ω)^{−1} α.   (23)

E Proof of Theorem 4

Assume b_vu = 0 for all v, u ∈ G, v ≠ u. Then, λ*_v(t) only depends on user v's history and, since x*_v(t) only depends on the history of user v's neighbors N(v), we can write

E_{H_t \ H_{t0}}[x*(t) ⊙ λ*(t)|H_{t0}] = E_{H_t \ H_{t0}}[x*(t)|H_{t0}] ⊙ E_{H_t \ H_{t0}}[λ*(t)],

and rewrite Eq. 21 as

dE_{H_t \ H_{t0}}[x*(t)|H_{t0}]/dt = A (E_{H_t \ H_{t0}}[x*(t)|H_{t0}] ⊙ E_{H_t \ H_{t0}}[λ*(t)|H_{t0}]) − ω E_{H_t \ H_{t0}}[x*(t)|H_{t0}] + ω α.   (24)

We can now compute E_{H(θ) \ H_{t0}}[λ*(θ)|H_{t0}] analytically as follows. From Eq. 3, we obtain

E_{H_t \ H_{t0}}[λ*(t)|H_{t0}] = η(t) + ∫_{t0}^t B κ(t−θ) E_{H_θ \ H_{t0}}[λ*(θ)] dθ,   (25)

where κ(t) = e^{−νt}, [η(t)]_{u∈V} = μ_u + Σ_{v∈N(u)} b_uv Σ_{t_i∈H_v(t0)} κ(t−t_i) and B = (b_vu)_{v,u∈V}, with b_vu = 0 for all v ≠ u by assumption. Differentiating with respect to t, we get

d/dt E_{H_t \ H_{t0}}[λ*(t)|H_{t0}] = −ν E_{H_t \ H_{t0}}[λ*(t)|H_{t0}] + B E_{H_t \ H_{t0}}[λ*(t)|H_{t0}] + ν μ,

with initial condition E_{H(t0+) \ H(t0)}[λ*(t0)] = η(t0). By taking the Laplace transform and then applying the inverse Laplace transform, we obtain

E_{H_t \ H_{t0}}[λ*(t)|H_{t0}] = e^{(B−νI)(t−t0)} η(t0) + ν (B−νI)^{−1} (e^{(B−νI)(t−t0)} − I) μ   ∀ t ≥ t0,   (26)

where η(t0) = E_{H(t0+) \ H(t0)}[λ*(t0)]. Using Eqs.
24 and 26, as well as

E_{H_t \ H_{t0}}[x*(t)] ⊙ E_{H_t \ H_{t0}}[λ*(t)] = Λ(t) E_{H_t \ H_{t0}}[x*(t)], where Λ(t) := diag[E_{H_t \ H_{t0}}[λ*(t)]],

we obtain

dE_{H_t \ H_{t0}}[x*(t)]/dt = [−ωI + AΛ(t)] E_{H_t \ H_{t0}}[x*(t)] + ω α,

with initial conditions (E_{H(t0+) \ H(t0)}[x*(t0)])_{u∈V} = α_u + Σ_{v∈N(u)} a_uv Σ_{t_i∈H_v(t0)} g(t0−t_i) m_v(t_i).

F Proof of Theorem 5

Theorem 4 states that the average users' opinion E_{H_t}[x*(t)] in the time domain is given by

dE_{H_t \ H_{t0}}[x*(t)|H_{t0}]/dt = [−ωI + AΛ(t)] E_{H_t \ H_{t0}}[x*(t)|H_{t0}] + ω α.   (27)

In such time-varying linear systems, solutions can be written as [15]

E_{H_t \ H_{t0}}[x*(t)|H_{t0}] = Φ(t, t0) x(t0) + ω ∫_{t0}^t Φ(t, s) α ds,   (28)

where the transition matrix Φ(t, s) is defined as the solution of the matrix differential equation

Φ̇(t, s) = [−ωI + AΛ(t)] Φ(t, s) with Φ(s, s) = I.

If Φ(t, s) satisfies ||Φ(t, s)|| ≤ γ e^{−c(t−s)} for all t > s and some γ, c > 0, then the steady-state solution to Eq. 28 is given by [15]

lim_{t→∞} E_{H_t \ H_{t0}}[x*(t)|H_{t0}] = (I − AΛ∞/ω)^{−1} α,

where Λ∞ := lim_{t→∞} Λ(t) = diag[(I − B/ν)^{−1} μ].

G Proof of Theorem 6

Let {x̂*_l(t)}_{l=1}^n be the simulated opinions for all users and define s(t) = (1/n) Σ_{l=1}^n s_l(t), where s_l(t) = x̂*_l(t) − E_{H_t \ H_{t0}}[x*(t)|H_{t0}]. Clearly, for a given t, all elements of s_l(t) are i.i.d. random variables with zero mean and finite variance, and we can bound |s_{l,u}(t)| < 2 x_max. Then, by Bernstein's inequality, the following holds true:

P(|s_u(t)| > ε) = P(s_u(t) > ε) + P(s_u(t) < −ε) ≤ 2 exp(−n ε² / (2 σ²_{H_t \ H_{t0}}(x*_u(t)|H_{t0}) + (4/3) x_max ε)).

Let σ²_max(t) = max_{u∈G} σ²_{H_t \ H_{t0}}(x*_u(t)|H_{t0}). If we choose n such that

δ ≥ 2 exp(−n ε² / (2 σ²_max(t) + (4/3) x_max ε)),   (29)

i.e., n ≥ ε^{−2} (2 σ²_max(t) + (4/3) x_max ε) log(2/δ), we obtain the required bound for n.
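Inverting a Bernstein-type bound of this form for the number of simulation runs n can be sketched as follows; the constants follow the standard Bernstein inequality (the exact constants in Eq. 29 were partially lost in extraction) and the input values are illustrative assumptions:

```python
import math

# Sketch: number of simulation runs n needed so that the sample-average
# forecast of user u's opinion is within eps of the true conditional
# mean with probability at least 1 - delta, by inverting a Bernstein
# bound of the form  delta >= 2 exp(-n eps^2 / (2 s^2 + (4/3) x_max eps)).
def runs_needed(eps, delta, sigma_max, x_max):
    # solve the exponent for n and round up
    return math.ceil((2 * sigma_max**2 + 4 * x_max * eps / 3)
                     / eps**2 * math.log(2 / delta))

print(runs_needed(0.1, 0.05, sigma_max=0.5, x_max=1.0))
```

As expected, the required number of runs grows as the tolerance ε shrinks or the confidence level 1 − δ increases.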
Moreover, given this choice of n, we have P(|s_u(t)| < ε) > 1 − δ immediately. However, a finite bound on n requires the variance σ²_max(t) to be bounded for all t. Hence, we analyze the variance and its stability below.

G.1 Dynamics of the variance

In this section, we compute the time-domain evolution of the variance and characterize its stability for Poisson-driven opinion dynamics. A general analysis of the variance for multidimensional Hawkes intensities is left for future work.

Lemma 8. Given a collection of messages H_{t0} recorded during a time period [0, t0) and λ*_u(t) = μ_u for all u ∈ G, the covariance matrix Γ(t0, t) at any time t conditioned on the history H_{t0} can be described as

vec(Γ(t0, t)) = ∫_{t0}^t Φ(t−θ) vec[σ² AΛ₁Aᵀ + A diag(E_{H_θ \ H_{t0}}[x*(θ)])² Λ₁Aᵀ] dθ,

where Φ(t) = e^{((−ωI+AΛ₁) ⊗ I + I ⊗ (−ωI+AΛ₁) + (A⊗A) Λ̂₁) t}, Λ̂₁ := Σ_i Λ₁P_i ⊗ P_i, and Λ₁ := diag[μ]. Moreover, the stability of the system is characterized by

Re ξ[(−ωI+AΛ₁) ⊗ I + I ⊗ (−ωI+AΛ₁) + (A⊗A) Λ̂₁] < 0.

Proof. By definition, the covariance matrix is given by

Γ(t0, t) := E_{H_t \ H_{t0}}[(x*(t) − E_{H_t \ H_{t0}}(x*(t))) (x*(t) − E_{H_t \ H_{t0}}(x*(t)))ᵀ | H_{t0}].   (30)

Hence, if we define Δx = x*(t) − E_{H_t \ H_{t0}}(x*(t)), we can compute the differential of the covariance matrix as

dΓ(t0, t) = E_{H_t \ H_{t0}}[d(Δx Δxᵀ)|H_{t0}] = E_{H_t \ H_{t0}}[Δx d(Δxᵀ) + d(Δx) Δxᵀ + d(Δx) d(Δxᵀ)|H_{t0}],   (31)

where

d(Δx) = d(x*(t) − E_{H_t \ H_{t0}}(x*(t))) = d(x*(t)) − d(E_{H_t \ H_{t0}}(x*(t))).   (32)

Next, note that

E(dN(t) dNᵀ(t)) = E[dN_i(t) dN_j(t)]_{i,j∈V} = E[diag(dN(t))] = Λ₁ dt,   (33)

where the off-diagonal entries vanish, since two jumps cannot happen at the same time point [13]. Now, recall the Markov representation of our model, i.e.
,

dx*(t) = −ω x*(t) dt + A M*(t) dN(t) + ω α dt,   (34)

where M*(t) := diag[m(t)] is the diagonal matrix formed by the sentiment vector, and note that

m(t) ⊙ dN(t) = M*(t) dN(t) = diag[dN(t)] m(t).   (35)

Moreover, using Eq. 21,

dE_{H_t \ H_{t0}}[x*(t)|H_{t0}] = −ω E_{H_t \ H_{t0}}[x*(t)|H_{t0}] dt + AΛ₁ E_{H_t \ H_{t0}}[x*(t)|H_{t0}] dt + ω α dt.   (36)

Then, if we substitute Eqs. 34 and 36 in Eq. 32, we obtain

d(Δx) = −ω [x*(t) − E_{H_t \ H_{t0}}(x*(t)|H_{t0})] dt + A [M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)|H_{t0}) dt].   (37)

As a result, we can write

dΓ(t0, t) = E_{H_t \ H_{t0}}[ −2ω Δx Δxᵀ dt
+ Δx (M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)ᵀ Aᵀ   (Term 1)
+ A (M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt) Δxᵀ
+ A (M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)(M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)ᵀ Aᵀ   (Term 2)
| H_{t0}],   (38)

where Term 1 gives

E_{H_t \ H_{t0}}[Δx (M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)ᵀ Aᵀ | H_{t0}]
= E_{H_t \ H_{t0}}[Δx (m(t) − E_{H_t \ H_{t0}}(x*(t)))ᵀ | H_{t0}] Λ₁ Aᵀ dt   (using Eq. 35 and the fact that E[diag(dN(t))] = Λ₁ dt)
= E_{H_t \ H_{t0}}[Δx · E[(m(t) − E_{H_t \ H_{t0}}(x*(t)))ᵀ | x(t), H_{t0}]] Λ₁ Aᵀ dt
= Γ(t0, t) Λ₁ Aᵀ dt,

and Term 2 gives

A E_{H_t \ H_{t0}}[(M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)(M*(t) dN(t) − Λ₁ E_{H_t \ H_{t0}}(x*(t)) dt)ᵀ | H_{t0}] Aᵀ
= A E_{H_t \ H_{t0}}[M*(t) E(dN(t) dNᵀ(t)) M*(t) | H_{t0}] Aᵀ + O(dt²)
= A E_{H_t \ H_{t0}}[M*(t)² | H_{t0}] Λ₁ Aᵀ dt   (from Eq. 33)
= A (σ² I + diag(E_{H_t \ H_{t0}}(x*(t)² | H_{t0}))) Λ₁ Aᵀ dt
= A [σ² I + diag(Γ(t0, t)) + diag(E_{H_t \ H_{t0}}(x*(t)|H_{t0}))²] Λ₁ Aᵀ dt.

Hence, substituting the expectations above into Eq.
38, we obtain

dΓ(t0, t)/dt = −2ω Γ(t0, t) + Γ(t0, t) Λ₁Aᵀ + AΛ₁ Γ(t0, t) + A diag(Γ(t0, t)) Λ₁Aᵀ + σ² AΛ₁Aᵀ + A diag(E_{H_t \ H_{t0}}[x*(t)|H_{t0}])² Λ₁Aᵀ.   (39)

Eq. 39 can be readily written in vectorial form by exploiting the properties of the Kronecker product as

d vec(Γ(t0, t))/dt = V vec(Γ(t0, t)) + vec(σ² AΛ₁Aᵀ + A diag(E_{H_t \ H_{t0}}[x*(t)|H_{t0}])² Λ₁Aᵀ),

where

V = (−ωI + AΛ₁) ⊗ I + I ⊗ (−ωI + AΛ₁) + Σ_{i=1}^{|V|} (AΛ₁P_i ⊗ AP_i),

P_i is the matrix whose only nonzero entry is (P_i)_{ii} = 1, and ⊗ stands for the Kronecker product. Hence, the closed-form solution of the above equation can be written as

vec(Γ(t0, t)) = ∫_{t0}^t e^{V(t−θ)} vec[σ² AΛ₁Aᵀ + A diag(E_{H(θ) \ H_{t0}}[x*(θ)|H_{t0}])² Λ₁Aᵀ] dθ.   (40)

Moreover, the covariance matrix Γ(t0, t) is bounded iff

Re ξ[(−ωI + AΛ₁) ⊗ I + I ⊗ (−ωI + AΛ₁) + (A⊗A) Λ̂₁] < 0,

where Λ̂₁ := Σ_{i=1}^{|V|} Λ₁P_i ⊗ P_i. In this case, the steady-state solution Γ = lim_{t→∞} Γ(t0, t) satisfies

−2ω Γ + ΓΛ₁Aᵀ + AΛ₁Γ + A diag(Γ) Λ₁Aᵀ + σ² AΛ₁Aᵀ + A diag(x*_∞)² Λ₁Aᵀ = 0,   (41)

which means that lim_{t→∞} vec[Γ(t0, t)] is given by

vec(Γ) = −[(−ωI + AΛ₁) ⊗ I + I ⊗ (−ωI + AΛ₁) + (A⊗A) Λ̂₁]^{−1} vec[σ² AΛ₁Aᵀ + A diag(x*_∞)² Λ₁Aᵀ],   (42)

where x*_∞ = lim_{t→∞} E_{H_t \ H_{t0}}(x*(t)|H_{t0}). Finally, the variance of node u's opinion at any time t is given by

σ²_{H_t \ H_{t0}}(x*_u(t)|H_{t0}) = [Γ(t0, t)]_{u,u} = vec(P_u)ᵀ vec(Γ(t0, t)),

where P_u is the sparse matrix whose only nonzero entry is (P_u)_{uu} = 1.

H Parameter estimation algorithm

H.1 Estimation of α and A

Algorithm 1 summarizes the estimation of the opinion dynamics parameters, i.e., α and A, which reduces to a least-squares problem.
Algorithm 1: Estimation of α and A

Input: H_T, G(V, E), regularizer λ, error bound ε
Output: α̂, Â
Initialize: IndexForV[1:|V|] = ⃗0
for i = 1 to |H(T)| do
    T[u_i](IndexForV[u_i]) = (t_i, u_i, m_i)
    IndexForV[u_i]++
for u ∈ V do
    i = 0
    S = T[u]
    for v ∈ N(u) do
        S = MergeTimes(S, T[v])   // merges two sets w.r.t. t_i; takes O(|H_u(T)| + |∪_{v∈N(u)} H_v(T)|) steps
    x_last = 0, t_last = 0, j = 0
    for i = 1 to |S| do   // this loop takes O(|S|) = O(|H_u(T)| + |∪_{v∈N(u)} H_v(T)|) steps
        (t_v, v, m_v) = S[i]
        t_now = t_v
        if u = v then
            x_now = x_last · e^{−ω(t_now − t_last)}
            j++
            g[u](j, v) = x_now
            Y[u](j) = m_u
        else
            x_now = x_last · e^{−ω(t_now − t_last)} + m_v
        t_last = t_now
        x_last = x_now
// Estimation of (α, A)
for u ∈ V do
    a = InferOpinionParams(Y[u], λ, g[u])
    α̂_u = a[1]
    Â[∗][u] = a[2:end]

Algorithm 2: InferOpinionParams(Y_u, λ, g_u)

s = numRows(g_u)
X = [⃗1_s  g_u]
Y = Y_u
L = XᵀX
x = (λI + L)^{−1} XᵀY
return x

H.2 Estimation of (μ, B)

The first step of the parameter estimation procedure for the temporal dynamics also involves the computation of the triggering kernels, which we do in the same way as for the opinion dynamics in Algorithm 1. In order to estimate the parameters, we adopt the spectral projected gradient descent (SPG) method proposed in [4].
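The least-squares step of Algorithm 2 above can be sketched in a few lines; the synthetic data, the number of neighbors, and the regularizer value below are illustrative assumptions:

```python
import numpy as np

# Sketch of InferOpinionParams (Algorithm 2): each observed sentiment
# of user u is regressed on [1, g_u], where g_u collects the decayed
# influence of each neighbor's past messages at the event times.
rng = np.random.default_rng(1)
s, d = 200, 3                        # events by user u, number of neighbors
g_u = rng.random((s, d))             # decayed neighbor features (synthetic)
true = np.array([0.5, 0.2, -0.3, 0.1])   # [alpha_u, a_u1, a_u2, a_u3]
Y = np.c_[np.ones(s), g_u] @ true + 0.01 * rng.normal(size=s)  # sentiments

lam = 1e-3                           # ridge regularizer
X = np.c_[np.ones(s), g_u]           # prepend a column of ones for alpha_u
coef = np.linalg.solve(lam * np.eye(d + 1) + X.T @ X, X.T @ Y)
alpha_u, a_u = coef[0], coef[1:]
print(alpha_u, a_u)
```

With enough events per user, the regularized normal equations recover the baseline α_u and the influence weights a_uv accurately, mirroring the per-user decomposition that lets Algorithm 1 run independently (and in parallel) across users.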
Algorithm 3: Spectral projected gradient descent for μ_u and B_{u∗}

Input: H_T, G(V, E), initial μ_u, B_{u∗}, step-length bound α_max > 0, initial step length α_bb ∈ (0, α_max], error bound ε and memory size h
Output: μ̂_u, B̂_{u∗}
Notation: x = [μ_u, B_{u∗}], f(x) = Σ_{e_i∈H(T)} log λ*_{u_i}(t_i) − Σ_{u∈V} ∫_0^T λ*_u(τ) dτ
Initialize: k = 0, x_0 = [μ_u, B_{u∗}]
while ||d_k|| > ε do
    ᾱ_k ← min{α_max, α_bb}
    d_k ← P_c[x_k − ᾱ_k ∇f_u(x_k)] − x_k
    f_ub ← max{f_u(x_k), f_u(x_{k−1}), ..., f_u(x_{k−h})}
    α ← 1
    while q_k(x_k + α d_k) > f_ub + ν α ∇f_u(x_k)ᵀ d_k do
        select α by backtracking line search
    x_{k+1} ← x_k + α d_k
    s_k ← α d_k
    y_k ← α B_k d_k
    α_bb ← y_kᵀ y_k / s_kᵀ y_k
    k = k + 1

I Model simulation algorithm

We leverage the simulation algorithm in [10] to design an efficient algorithm for simulating our model of opinion dynamics, summarized in Algorithm 4. The two basic premises of the simulation algorithm are the sparsity of the network and the Markov property of the model. Due to sparsity, any sampled event affects only a small number of intensity functions, namely those of the local neighbors of the node. Therefore, to generate a new sample and to identify the intensity functions that require changes, we need O(log |V|) operations to maintain the heap for the priority queue. On the other hand, the Markov property allows us to update each rate and opinion in O(1) operations. The worst-case time complexity of the algorithm is O(d_max |H(T)| log |V|), where d_max is the maximum degree.

Algorithm 4: OpinionModelSimulation(T, μ, B, α, A)

Initialize the priority queue Q
LastOpinionUpdateTime[1:|V|] = ⃗0, LastIntensityUpdateTime[1:|V|] = ⃗0
LastOpinionUpdateValue[1:|V|] = α, LastIntensityUpdateValue[1:|V|] = μ
H(0) ← ∅
for u ∈ V do
    t = SampleEvent(μ[u], 0, u)
    Q.insert([t, u])
while t < T do
    // extract the minimum time over all posts
    [t′, u] = Q.
ExtractMin()
    t_u = LastOpinionUpdateTime[u]
    x_{t_u} = LastOpinionUpdateValue[u]
    x_{t′,u} ← α[u] + (x_{t_u} − α[u]) e^{−ω(t′ − t_u)}
    LastOpinionUpdateTime[u] = t′
    LastOpinionUpdateValue[u] = x_{t′,u}
    m_u ∼ p(m | x_u(t′))
    H(t′) ← H(t) ∪ (t′, m_u, u)
    // update the neighbors
    for all v such that u ⇝ v do
        t_v = LastIntensityUpdateTime[v]
        λ_{t_v} = LastIntensityUpdateValue[v]
        λ_{t′,v} ← μ[v] + (λ_{t_v} − μ[v]) e^{−ν(t′ − t_v)} + B_{uv}
        LastIntensityUpdateTime[v] = t′
        LastIntensityUpdateValue[v] = λ_{t′,v}
        t_v = LastOpinionUpdateTime[v]
        x_{t_v} = LastOpinionUpdateValue[v]
        x_{t′,v} ← α[v] + (x_{t_v} − α[v]) e^{−ω(t′ − t_v)} + A_{uv} m_u
        LastOpinionUpdateTime[v] = t′
        LastOpinionUpdateValue[v] = x_{t′,v}
        // resample only the affected nodes
        t_+ = SampleEvent(λ_v(t′), t′, v)
        Q.UpdateKey(v, t_+)
    t ← t′
return H(T)

Algorithm 5: SampleEvent(λ, t, v)

λ̄ = λ, s ← t
while s < T do
    U ∼ Uniform[0, 1]
    s = s − ln U / λ̄
    λ(s) ← μ[v] + (λ_v(t) − μ[v]) e^{−ν(s−t)}
    d ∼ Uniform[0, 1]
    if d λ̄ < λ(s) then
        break
    else
        λ̄ = λ(s)
return s

Experiments on synthetic data

Parameter estimation accuracy. We evaluate the accuracy of our model estimation procedure on three types of Kronecker networks [19]: i) assortative networks (parameter matrix [0.96, 0.3; 0.3, 0.96]), in which nodes tend to link to nodes with similar degree; ii) dissortative networks ([0.3, 0.96; 0.96, 0.3]), in which nodes tend to link to nodes with different degree; and iii) core-periphery networks ([0.9, 0.5; 0.5, 0.3]). For each network, the message intensities are multivariate Hawkes, μ and B are drawn from a uniform distribution U(0, 1), and α and A are drawn from a Gaussian distribution N(μ = 0, σ = 1). We use exponential kernels with parameters ω = 100 and ν = 1 for the opinions x*_u(t) and the intensities λ*(t), respectively.
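For intuition, a minimal sketch of how such synthetic event data can be generated from the Markov form of the model, specialized to constant (Poisson) intensities for brevity, so that Algorithm 5's thinning step is not needed; all parameter values are illustrative assumptions:

```python
import numpy as np

# Sketch: exact simulation of the Markov form of the model for a tiny
# network with constant (Poisson) intensities lambda*_u(t) = mu_u.
rng = np.random.default_rng(0)
n, omega, sigma = 2, 1.0, 0.1
alpha = np.array([0.5, -0.5])            # baseline opinions
A = np.array([[0.0, 0.4], [0.3, 0.0]])   # a_uv: influence of v on u
mu = np.array([1.0, 1.0])                # constant message rates
T = 10.0

x = alpha.copy()                         # latent opinions x*(t)
t = 0.0
events = []
while True:
    # next event over all users: exponential gap with total rate sum(mu)
    dt = rng.exponential(1.0 / mu.sum())
    if t + dt > T:
        break
    t += dt
    u = rng.choice(n, p=mu / mu.sum())   # pick the posting user
    # drift term: decay all opinions toward alpha between events
    x = alpha + (x - alpha) * np.exp(-omega * dt)
    # user u posts a noisy sample of her current opinion ...
    m = x[u] + sigma * rng.normal()
    events.append((t, u, m))
    # ... and the jump term updates her neighbors' opinions
    x = x + A[:, u] * m

print(len(events), x)
```

In the full model the intensities are themselves Hawkes processes, which is where the thinning loop of Algorithm 5 and the priority queue of Algorithm 4 come in; the exponential-decay updates between events are identical.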
We evaluate the accuracy of our estimation procedure in terms of the mean squared error (MSE) between the estimated and true parameters, i.e., E[(x − x̂)²]. Figure 5 shows the MSE of the parameters (μ, B), which control the Hawkes intensities, and the parameters (α, A), which control the opinion updates. As we feed more messages into the estimation procedure, it becomes more accurate.

Figure 5: Performance of model estimation for Kronecker networks in terms of the mean squared error between estimated and true parameters, plotted against the average number of events per node for homophily, heterophily and core-periphery networks: (a) opinion parameters (α, A); (b) temporal parameters (μ, B). As we feed more messages into the estimation procedure, the estimation becomes more accurate.

Twitter dataset description

We used the Twitter search API to collect all the tweets (corresponding to a 2–3 week period around the event date) that contain hashtags related to the following events/topics:

• Politics: Delhi Assembly election, from the 9th to the 15th of December 2013.
• Movie: Release of the movie "Avengers: Age of Ultron", from April 28th to May 5th, 2015.
• Fight: Professional boxing match between the eight-division world champion Manny Pacquiao and the undefeated five-division world champion Floyd Mayweather Jr., on May 2, 2015.
• Bollywood: Verdict declaring Salman Khan (a popular Bollywood movie star) guilty of causing the death of a person by rash and negligent driving, from May 5th to 16th, 2015.
• US: Presidential election in the United States, from April 7th to 13th, 2016.

We then built the follower-followee network for the users that posted the collected tweets using the Twitter REST API. Finally, we filtered out users that posted less than 200 tweets during the account lifetime, follow less than 100 users, or have less than 50 followers.
Dataset         |V|    |E|     |H(T)|   E[m]      std[m]
Tw: Politics    548    5271    20026    0.0169    0.1780
Tw: Movie       567    4886    14016    0.5969    0.1358
Tw: Fight       848    10118   21526   −0.0123    0.2577
Tw: Bollywood   1031   34952   46845    0.5101    0.2310
Tw: US          533    20067   18704   −0.0186    0.7135

Table 1: Real dataset statistics.

https://dev.twitter.com/rest/public/search
https://dev.twitter.com/rest/public