Cheshire: An Online Algorithm for Activity Maximization in Social Networks
Ali Zarezade∗, Abir De∗, Hamid R. Rabiee, and Manuel Gomez-Rodriguez
Sharif University ([email protected]), IIT Kharagpur ([email protected]), Max Planck Institute for Software Systems ([email protected])
Abstract
User engagement in social networks depends critically on the number of online actions their users take in the network. Can we design an algorithm that finds when to incentivize users to take actions to maximize the overall activity in a social network? In this paper, we model the number of online actions over time using multidimensional Hawkes processes, derive an alternate representation of these processes based on stochastic differential equations (SDEs) with jumps and, exploiting this alternate representation, address the above question from the perspective of stochastic optimal control of SDEs with jumps. We find that the optimal level of incentivized actions depends linearly on the current level of overall actions. Moreover, the coefficients of this linear relationship can be found by solving a matrix Riccati differential equation, which can be solved efficiently, and a first order differential equation, which has a closed form solution. As a result, we are able to design an efficient online algorithm, Cheshire, to sample the optimal times of the users' incentivized actions. Experiments on both synthetic and real data gathered from Twitter show that our algorithm is able to consistently maximize the number of online actions more effectively than the state of the art.
Introduction

People are constantly performing a wide variety of online actions in a growing number of online platforms such as social networking sites, question answering (Q&A) sites or wikis. In most of these platforms, these actions can be exogenous, taken by users at their own initiative, or endogenous, taken by users as a response to previous actions by other users. For example, in social networking sites, users post small pieces of information, which can then trigger likes, shares or replies by other users; in Q&A sites, users ask questions, which are then answered and curated by other users; and, in wikis, editors write content, which is later reviewed and refined by other editors in a collaborative fashion. In this context, a natural question emerges: how much should we incentivize a small number of users to take more initiatives to drive the overall number of actions to a certain level (e.g., at least twice as many actions per day per user)?

The "activity shaping" problem was first studied by Farajtabar et al. [8], who derived a time dependent linear relation between the intensity of exogenous actions and the overall intensity of actions in a social network under a model of actions based on multivariate Hawkes processes and, exploiting this connection, developed a convex optimization framework for activity shaping. One of the main shortcomings of their framework is that it provides deterministic exogenous intensities that do not adapt to changes in the users' intensities and, as a consequence, it is less effective than our proposed algorithm, as shown in Section 5. More recently, Farajtabar et al. [10] developed a heuristic method that splits the time window of interest into stages and adapts to changes in the users' intensities at the beginning of each stage.

∗ Authors contributed equally. This work was done during Ali Zarezade's and Abir De's internships at the Max Planck Institute for Software Systems.
However, their method is suboptimal, does not have provable guarantees, and achieves lower performance than our method.

In this paper, we design a novel online algorithm for the "activity maximization" problem in social networks, one of the most important instances of the activity shaping problem, in which the goal is to maximize the overall number of actions in the network. More in detail, we represent users' actions using the framework of temporal point processes, which characterizes the continuous time interval between actions using conditional intensity functions [1], and model endogenous and exogenous actions using multidimensional Hawkes processes [15]. Then, we derive an alternate representation of multidimensional Hawkes processes using SDEs with jumps and, exploiting this alternate representation, cast the activity maximization problem as a novel optimal control problem for SDEs with jumps. Our problem formulation differs from the traditional control literature [14] in two key technical aspects, which have not been considered until very recently [31]:

I. The control signal, used to sample the times of incentivized actions, is a multidimensional conditional intensity, while previous work considered the control signal to be a time-varying real vector.

II. The users' intensities are stochastic Markov processes and thus the dynamics are doubly stochastic. In contrast, previous work considered deterministic (typically constant) intensities.

Moreover, we find that the optimal level of incentivized actions depends linearly on the current level of overall actions, and that the coefficients of this linear relationship can be found by solving a matrix Riccati differential equation, which can be solved using many well-known efficient numerical solvers [12], and a first order differential equation, which has a closed form solution. This allows for an efficient and relatively simple online procedure to sample the optimal times of the users' incentivized actions, which can be implemented in a few lines of code (refer to Algorithm 1). Finally, we perform experiments on both synthetic and real data gathered from Twitter and show that our method is able to consistently maximize the number of actions more effectively than the state of the art [8, 10].
Related work

In addition to the paucity of work on the activity shaping problem [8, 10], discussed previously, our work also relates to previous work on stochastic optimal control, the influence maximization problem, and temporal point processes.

In the traditional control literature [14], two key aspects of our problem formulation—intensities as control signals and stochastic intensities—have been largely understudied. Only very recently have Zarezade et al. [31] and Wang et al. [30] considered these aspects; however, our approach differs from theirs in several ways. The work by Zarezade et al. is most closely related to ours, but they solve a different problem, the when-to-post problem [26], where one aims to optimize a social network user's broadcasting strategy to capture the greatest attention from the followers. Such a problem is significantly less challenging than activity maximization—it only considers one-hop information propagation—and, as a consequence, their resulting sampling algorithm is very different from ours. The work by Wang et al. focuses on two different problems, the when-to-post problem and opinion control; their strategy is open-loop and their control policy depends on the expectation of the uncontrolled dynamics, which needs to be calculated approximately by a time consuming sampling process. In contrast, our framework is closed-loop, our control policy only depends on the current state of the dynamics, and the feedback coefficients only need to be calculated once, off-line.

The influence maximization problem [4, 6, 17, 25] aims to find a set of nodes in a social network whose initial adoption of a certain idea or product can trigger the largest expected number of follow-ups. However, previous work on the influence maximization problem differs from our work in two main aspects. First, in influence maximization, the state of each user is often assumed to be binary, either adopting a product or not; such an assumption does not capture the recurrent nature of product usage, where the frequency of the usage matters. Second, while influence maximization methods identify a set of users to provide incentives to, they do not reason about the timing of those incentives. (Our methodology can be easily extended to online platforms without an explicit underlying network between users; however, for ease of exposition, we focus on social networking sites.)

Finally, temporal point processes have been increasingly used to model a wide variety of phenomena in social networks, e.g., information propagation [13, 6, 32], opinion dynamics [5], product competition [29], information reliability [28], or human learning [22]. However, in such context, there is still a paucity of algorithms based on stochastic optimal control of temporal point processes [31, 30].
Preliminaries

In this section, we first revisit the framework of temporal point processes [1] and then describe how to use such framework to model endogenous and exogenous actions in social networks [7].

Temporal point processes
A univariate temporal point process is a stochastic process whose realization consists of a sequence of discrete events localized in time, $\mathcal{H} = \{ t_i \in \mathbb{R}^+ \mid i \in \mathbb{N}^+,\ t_i < t_{i+1} \}$. A univariate temporal point process can be equivalently represented by a counting process $N(t)$, which counts the number of events before time $t$, i.e.,
$$N(t) = \sum_{t_i \in \mathcal{H}} u(t - t_i),$$
where $u(t) = 1$ if $t \geq 0$ and $u(t) = 0$ otherwise. Then, we can characterize the counting process using the conditional intensity function $\lambda^*(t)$, which is the conditional probability of observing an event in an infinitesimal window $[t, t + dt)$ given the history of event times up to time $t$, $\mathcal{H}(t) = \{ t_i \in \mathcal{H} \mid t_i < t \}$, i.e.,
$$\lambda^*(t)\, dt = \mathbb{P}\{\text{event in } [t, t + dt) \mid \mathcal{H}(t)\} = \mathbb{E}[dN(t) \mid \mathcal{H}(t)],$$
where $dN(t) := N(t + dt) - N(t) \in \{0, 1\}$, the sign $*$ means that the intensity may depend on the history $\mathcal{H}(t)$, and the functional form of the intensity is often designed to capture the phenomena of interest. Moreover, given a function $f(t)$, it will be useful to define the convolution with respect to $dN(t)$ as
$$f(t) \star dN(t) := \int_0^t f(t - s)\, dN(s) = \sum_{t_i \in \mathcal{H}(t)} f(t - t_i).$$

One can readily extend the above definitions to multivariate (or multidimensional) temporal point processes, which have been recently used to represent many different types of event data produced in social networks, such as the times of tweets [16], retweets [32] or links [9]. More specifically, a realization of an $m$-dimensional temporal point process consists of $m$ sequences of discrete events localized in time, $\mathcal{H} = \cup_{u \in [m]} \mathcal{H}_u$, where $\mathcal{H}_u = \{ t_i \in \mathbb{R}^+ \mid i \in \mathbb{N}^+,\ t_i < t_{i+1} \}$, and it can be represented by an $m$-dimensional counting process $\mathbf{N}(t)$, where $N_u(t)$ counts the number of events in the $u$-th sequence before time $t$. Similarly, such a counting process can be characterized by $m$ intensity functions, i.e.,
$$\boldsymbol{\lambda}^*(t)\, dt = \mathbb{E}[d\mathbf{N}(t) \mid \mathcal{H}(t)],$$
where $\boldsymbol{\lambda}^*(t) = (\lambda^*_1(t), \ldots, \lambda^*_m(t))$ and $\mathcal{H}(t) = \{ t_i \in \mathcal{H} \mid t_i < t \}$, and, given a function $f(t)$, one can define the convolution with respect to $d\mathbf{N}(t)$ as
$$f(t) \star d\mathbf{N}(t) := \int_0^t f(t - s)\, d\mathbf{N}(s) = \Big( \sum_{t_i \in \mathcal{H}_u(t)} f(t - t_i) \Big)_{u \in [m]}.$$

Next, we use the above background to revisit how to jointly model endogenous and exogenous actions in social networks [7] and then derive an alternate model representation based on SDEs with jumps, which will be useful to design our stochastic optimal control algorithm.

Modeling endogenous and exogenous actions in social networks
Given a directed network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $|\mathcal{V}| = n$ users, we model both endogenous and exogenous actions taken by all the users using an $m$-dimensional Hawkes process $\mathbf{N}(t)$, where $N_u(t)$ counts the number of actions taken by user $u$ before time $t$. More specifically, the intensities of this process are given by [7, 15]:
$$\boldsymbol{\lambda}^*(t) = \boldsymbol{\mu} + \mathbf{A} \int_0^t \kappa(t - s)\, d\mathbf{N}(s), \quad (1)$$
where the first term models exogenous actions—actions users take at their own initiative—and the second term models endogenous actions—actions users take as a response to the actions taken by their neighbors within the network. Here, we parametrize the strength of influence between users using a sparse nonnegative influence matrix $\mathbf{A} = (a_{uv}) \in \mathbb{R}^{m \times m}_+$, where $a_{uv} > 0$ means that user $v$'s actions directly trigger follow-ups by user $u$, and $\kappa(t) = e^{-\omega t} u(t)$ is an exponential kernel modeling the decay of influence over time. Note that the second term makes the intensities dependent on the history and thus a stochastic process by itself. In the remainder of the paper, we drop the sign $*$ from the intensities for notational convenience.

The following alternative representation of the above process will be useful to design our stochastic optimal control algorithm for activity maximization (proven in Appendix A):

Proposition 1 Let $\mathbf{N}(t)$ be an $m$-dimensional counting process with an associated intensity $\boldsymbol{\lambda}(t)$ given by Eq. 1. Then, the tuple $(\mathbf{N}(t), \boldsymbol{\lambda}(t))$ is a doubly stochastic Markov process, whose dynamics can be defined by the following jump SDEs:
$$d\boldsymbol{\lambda}(t) = [\omega \boldsymbol{\mu} - \omega \boldsymbol{\lambda}(t)]\, dt + \mathbf{A}\, d\mathbf{N}(t), \quad (2)$$
with the initial condition $\boldsymbol{\lambda}(0) = \boldsymbol{\lambda}_0$.

Problem formulation

In this section, we first describe a mechanism to steer endogenous actions in social networks and then formally state the online activity maximization problem for such mechanism.
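For intuition, the jump SDE view in Proposition 1 suggests a direct simulation scheme for the dynamics of Eq. 2: between events the intensity decays exponentially toward $\boldsymbol{\mu}$, and an event of user $j$ adds column $j$ of $\mathbf{A}$. A minimal sketch via Ogata-style thinning, on a hypothetical two-user network with illustrative parameter values (not taken from the paper):

```python
import math
import random

def simulate_hawkes(mu, A, omega, t_max, seed=0):
    """Sample (time, user) events of a multidimensional Hawkes process with
    exponential kernel (Eq. 1) by thinning, using the jump SDE of Eq. 2:
    between events lambda decays toward mu; an event of user j adds column j of A."""
    rng = random.Random(seed)
    n = len(mu)
    lam = list(mu)                        # lambda(0) = mu
    t, history = 0.0, []
    while True:
        lam_bar = sum(lam)                # valid bound: lambda only decays until the next jump
        dt = rng.expovariate(lam_bar)
        # exact decay over dt: lambda -> mu + (lambda - mu) * exp(-omega * dt)
        lam = [mu[i] + (lam[i] - mu[i]) * math.exp(-omega * dt) for i in range(n)]
        t += dt
        if t > t_max:
            return history
        if rng.random() * lam_bar <= sum(lam):          # accept the candidate event
            r, acc, j = rng.random() * sum(lam), 0.0, 0
            for j in range(n):                          # pick the user proportionally to lambda_j
                acc += lam[j]
                if acc >= r:
                    break
            history.append((t, j))
            lam = [lam[i] + A[i][j] for i in range(n)]  # jump: d lambda = A dN

# illustrative parameters (spectral radius of A/omega below one, so the process is stable)
events = simulate_hawkes(mu=[0.5, 0.2], A=[[0.0, 0.3], [0.3, 0.0]], omega=1.0, t_max=50.0)
```

Since the intensity can only decrease between jumps, the current total intensity is a valid upper bound for thinning, which is why no separate majorant needs to be constructed.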
Steering endogenous actions

We steer endogenous actions (or events) in the counting process $\mathbf{N}(t)$ by introducing a counting process $\mathbf{M}(t)$, with control intensities $\mathbf{u}(t)$, which triggers additional follow-ups. More specifically, this counting process modulates the corresponding intensities $\boldsymbol{\lambda}(t)$ as follows:
$$\boldsymbol{\lambda}^*(t) = \underbrace{\boldsymbol{\mu} + \mathbf{A} \int_0^t \kappa(t - s)\, d\mathbf{N}(s)}_{\text{Organic actions}} + \underbrace{\mathbf{A} \int_0^t \kappa(t - s)\, d\mathbf{M}(s)}_{\text{Incentivized actions}}, \quad (3)$$
where we assume the strength of influence $\mathbf{A}$ between users is the same for both organic and incentivized actions, as in previous work [7, 10]. Then, it is easy to derive the following alternative representation, similarly as in Proposition 1, which we will use in our stochastic optimal control algorithm:

Proposition 2 Let $\mathbf{N}(t)$ be a multidimensional counting process with associated intensities $\boldsymbol{\lambda}(t)$, given by Eq. 3, and $\mathbf{M}(t)$ be a controllable counting process with an associated intensity $\mathbf{u}(t)$. Then, the system dynamics can be defined by the following jump SDEs:
$$d\boldsymbol{\lambda}(t) = [\omega \boldsymbol{\mu} - \omega \boldsymbol{\lambda}(t)]\, dt + \mathbf{A}\, d\mathbf{N}(t) + \mathbf{A}\, d\mathbf{M}(t), \quad (4)$$
with the initial condition $\boldsymbol{\lambda}(0) = \boldsymbol{\lambda}_0$.

The (online) activity maximization problem

Given a directed network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $|\mathcal{V}| = n$ users, our goal is to find the optimal control intensities $\mathbf{u}(t)$ that minimize the expected value of a particular loss function $\ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))$ of the users' organic and control intensities over a time window $(t_0, t_f]$, i.e.,
$$\begin{aligned}
\underset{\mathbf{u}(t_0, t_f]}{\text{minimize}} \quad & \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t_0, t_f]} \Big[ \phi(\boldsymbol{\lambda}(t_f)) + \int_{t_0}^{t_f} \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt \Big] \\
\text{subject to} \quad & u_i(t) \geq 0, \quad \forall t \in (t_0, t_f],\ i = 1, \ldots, n, \quad (5)
\end{aligned}$$
where $\mathbf{u}(t_0, t_f]$ denotes the control intensity functions from $t_0$ to $t_f$, the dynamics of $\mathbf{N}(t)$ are given by Eq. 4, and the expectation is taken over all possible realizations of the two counting processes $\mathbf{N}(t)$ and $\mathbf{M}(t)$ during the interval $(t_0, t_f]$. Here, by considering a loss that is nonincreasing with respect to the organic intensities $\boldsymbol{\lambda}(t)$, we penalize low organic intensities, and, by considering a loss that is nondecreasing with respect to the control intensities $\mathbf{u}(t)$, we limit the number of posts we steer. Finally, note that the optimal intensities $\mathbf{u}(t)$ at time $t$ may depend on the organic intensities $\boldsymbol{\lambda}(t)$ and thus the associated counting process $\mathbf{M}(t)$ may be doubly stochastic.

Stochastic optimal control algorithm

In this section, we tackle the activity maximization problem defined by Eq. 5 from the perspective of stochastic optimal control of jump SDEs [14].
More specifically, we first define a novel optimal cost-to-go function that accounts for the above unique aspects of our problem, show that Bellman's principle of optimality [2] still holds, and finally derive and solve the Hamilton-Jacobi-Bellman (HJB) equation to find the optimal control intensity.
Definition 3
The optimal cost-to-go is defined as the minimum of the expected value of the cost of going from the state with intensity $\boldsymbol{\lambda}(t)$ at time $t$ to the final state at time $t_f$:
$$J(\boldsymbol{\lambda}(t), t) = \min_{\mathbf{u}(t, t_f]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t_f]} \Big[ \phi(\boldsymbol{\lambda}(t_f)) + \int_t^{t_f} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds \Big], \quad (6)$$
where the expectation is taken over all trajectories of the counting processes $\mathbf{N}$ and $\mathbf{M}$ in the $(t, t_f]$ interval, given the initial values of $\boldsymbol{\lambda}(t)$ and $\mathbf{u}(t)$.

To find the optimal control $\mathbf{u}(t, t_f]$ and cost-to-go $J$, we break the problem into smaller subproblems, using Bellman's principle of optimality (proven in Appendix B):

Theorem 4 (Bellman's Principle of Optimality) The optimal cost-to-go function defined by Eq. 6 satisfies the following recursive equation:
$$J(\boldsymbol{\lambda}(t), t) = \min_{\mathbf{u}(t, t+dt]} \big\{ \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} [ J(\boldsymbol{\lambda}(t+dt), t+dt) ] + \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt \big\}, \quad (7)$$
where the expectation is taken over all realizations of the counting processes $\mathbf{N}(t)$ and $\mathbf{M}(t)$ in the infinitesimal interval $(t, t+dt]$.

Next, we use Bellman's principle of optimality to derive a partial differential equation on $J$, often called the HJB equation. To do so, we first assume $J$ is continuous and then rewrite Eq. 7 as
$$\begin{aligned}
J(\boldsymbol{\lambda}(t), t) &= \min_{\mathbf{u}(t, t+dt]} \big\{ \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} [ J(\boldsymbol{\lambda}(t), t) + dJ(\boldsymbol{\lambda}(t), t) ] + \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt \big\} \\
0 &= \min_{\mathbf{u}(t, t+dt]} \big\{ \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} [ dJ(\boldsymbol{\lambda}(t), t) ] + \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt \big\}. \quad (8)
\end{aligned}$$
Then, we differentiate $J$ with respect to time $t$ and $\boldsymbol{\lambda}(t)$ according to Ito's calculus [14], by the following theorem (proven in Appendix C):

Theorem 5 The differential of the cost-to-go function $J(\boldsymbol{\lambda}(t), t)$ given by Eq. 6 is given by:
$$dJ(\boldsymbol{\lambda}(t), t) = J_t(\boldsymbol{\lambda}(t), t)\, dt + [\omega \boldsymbol{\mu} - \omega \boldsymbol{\lambda}(t)]^T \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t), t)\, dt + \sum_{i=1}^n \big[ J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t) - J(\boldsymbol{\lambda}(t), t) \big] \, [dN_i(t) + dM_i(t)],$$
where $\mathbf{a}_i$ is the $i$-th column of $\mathbf{A}$, $J_t$ is the derivative of $J$ with respect to $t$, and $\nabla_{\boldsymbol{\lambda}} J$ is the gradient of $J$ with respect to $\boldsymbol{\lambda}(t)$.

Next, if we plug the above equation into Eq. 8 and use $\mathbb{E}[dN_i(t)] = \lambda_i(t)\, dt$ and $\mathbb{E}[dM_i(t)] = u_i(t)\, dt$, the HJB equation follows:
$$0 = J_t(\boldsymbol{\lambda}(t), t) + [\omega \boldsymbol{\mu} - \omega \boldsymbol{\lambda}(t)]^T \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t), t) + \boldsymbol{\lambda}^T(t)\, \Delta_{\mathbf{A}} J + \min_{\mathbf{u}(t)} \big\{ \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t)) + \mathbf{u}^T(t)\, \Delta_{\mathbf{A}} J \big\}, \quad (9)$$
where $\Delta_{\mathbf{A}} J$ denotes a vector whose $i$-th element is given by $(\Delta_{\mathbf{A}} J)_i = J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t) - J(\boldsymbol{\lambda}(t), t)$.

To solve the above equation, we need to define the loss and penalty functions, $\ell$ and $\phi$.
Following the literature on stochastic optimal control [2, 14], we consider the following quadratic forms, which will turn out to be a tractable choice:
$$\ell(\boldsymbol{\lambda}(t), \mathbf{u}(t)) = -\tfrac{1}{2} \boldsymbol{\lambda}^T(t)\, \mathbf{Q}\, \boldsymbol{\lambda}(t) + \tfrac{1}{2} \mathbf{u}^T(t)\, \mathbf{S}\, \mathbf{u}(t), \qquad \phi(\boldsymbol{\lambda}(t_f)) = -\tfrac{1}{2} \boldsymbol{\lambda}^T(t_f)\, \mathbf{F}\, \boldsymbol{\lambda}(t_f), \quad (10)$$
where $\mathbf{Q}$, $\mathbf{F}$ and $\mathbf{S}$ are given symmetric matrices with $q_{ij} \geq 0$, $f_{ij} \geq 0$ and $s_{ij} \geq 0$ for all $i, j \in [n]$. These matrices allow us to trade off the number of non incentivized actions, both over time and at time $t_f$, against the number of incentivized actions. Under these definitions, we can find the relationship between the optimal intensity and the optimal cost by solving the minimization in the HJB equation (Eq. 9):
$$\underset{\mathbf{u}(t)}{\text{minimize}} \quad \mathbf{u}^T(t)\, \Delta_{\mathbf{A}} J + \tfrac{1}{2} \mathbf{u}^T(t)\, \mathbf{S}\, \mathbf{u}(t) \qquad \text{subject to} \quad u_i(t) \geq 0,\ i = 1, \ldots, n.$$
By differentiating with respect to $\mathbf{u}(t)$, the solution of the unconstrained minimization is given by:
$$\mathbf{u}^*(t) = -\mathbf{S}^{-1} \Delta_{\mathbf{A}} J, \quad (11)$$
which coincides with the solution of the constrained problem, since $(\Delta_{\mathbf{A}} J)_i \leq 0$, as proven in Appendix D, and $s_{ij} \geq 0$ by definition.

Then, we substitute Eq. 11 into Eq. 9 and find that the optimal cost $J$ needs to satisfy the following partial differential equation:
$$0 = J_t(\boldsymbol{\lambda}(t), t) + [\omega \boldsymbol{\mu} - \omega \boldsymbol{\lambda}(t)]^T \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t), t) + \boldsymbol{\lambda}^T(t)\, \Delta_{\mathbf{A}} J - \tfrac{1}{2} \boldsymbol{\lambda}^T(t)\, \mathbf{Q}\, \boldsymbol{\lambda}(t) - \tfrac{1}{2} (\Delta_{\mathbf{A}} J)^T \mathbf{S}^{-1} \Delta_{\mathbf{A}} J, \quad (12)$$
with $J(\boldsymbol{\lambda}(t_f), t_f) = \phi(\boldsymbol{\lambda}(t_f))$ as the terminal condition. The following lemma provides us with a solution to the above equation (proven in Appendix E):

Lemma 6 Any solution to the nonlinear differential equation given by Eq. 12 can be approximated as desired by the following quadratic form:
$$J(\boldsymbol{\lambda}(t), t) = f(t) + \mathbf{g}(t)^T \boldsymbol{\lambda}(t) + \tfrac{1}{2} \boldsymbol{\lambda}(t)^T \mathbf{H}(t)\, \boldsymbol{\lambda}(t),$$
where $\mathbf{g}(t)$ and $\mathbf{H}(t)$ can be found by solving the following differential equations:
$$\begin{aligned}
\dot{\mathbf{H}}(t) &= (\omega \mathbf{I} - \mathbf{A})^T \mathbf{H}(t) + \mathbf{H}(t)(\omega \mathbf{I} - \mathbf{A}) + \mathbf{H}(t)\, \mathbf{A} \mathbf{S}^{-1} \mathbf{A}^T \mathbf{H}(t) + \mathbf{Q}, \\
\dot{\mathbf{g}}(t) &= \big[ \omega \mathbf{I} - \mathbf{A}^T + \mathbf{H}(t)\, \mathbf{A} \mathbf{S}^{-1} \mathbf{A}^T \big]\, \mathbf{g}(t) - \omega \mathbf{H}(t)\, \boldsymbol{\mu} + \tfrac{1}{2} \big[ \mathbf{H}(t)\, \mathbf{A} \mathbf{S}^{-1} - \mathbf{I} \big]\, \mathrm{diag}(\mathbf{A}^T \mathbf{H}(t)\, \mathbf{A}).
\end{aligned}$$

Algorithm 1: Cheshire — it returns user $i$ and time $\tau$ for the next incentivized action.
1: Initialization:
2:   Compute $H(t)$ and $g(t)$
3:   $u(t) \leftarrow -S^{-1} [A^T (g(t) + H(t)\mu) + \tfrac{1}{2}\mathrm{diag}(A^T H(t) A)]$
4: General subroutine:
5:   $(i, \tau) \leftarrow$ Sample($u(t)$)
6:   $(j, s) \leftarrow$ NextAction()
7:   while $s < \tau$ do
8:     $\lambda_N(t) \leftarrow A e_j\, \kappa(t - s)$
9:     $u_N(t) \leftarrow -S^{-1} A^T H(t)\, \lambda_N(t)$
10:    $(k, r) \leftarrow$ Sample($u_N(t)$)
11:    if $r < \tau$ then $\tau \leftarrow r$; $i \leftarrow k$
12:    $u(t) \leftarrow u(t) + u_N(t)$
13:    $(j, s) \leftarrow$ NextAction()
14:  $\lambda_M(t) \leftarrow A e_i\, \kappa(t - \tau)$
15:  $u_M(t) \leftarrow -S^{-1} A^T H(t)\, \lambda_M(t)$
16:  $u(t) \leftarrow u(t) + u_M(t)$
17:  return $(i, \tau)$

In the above lemma, note that the first differential equation is a matrix Riccati differential equation, which can be solved using many well-known efficient numerical solvers [12], and the second one is a first order differential equation which has a closed form solution. Both equations are solved backward in time with final conditions $\mathbf{g}(t_f) = \mathbf{0}$ and $\mathbf{H}(t_f) = -\mathbf{F}$.

Finally, given the above cost-to-go function, the optimal intensity is given by the following theorem:

Theorem 7
The optimal intensity for the activity maximization problem defined by Eq. 5 with quadratic loss and penalty function is given by:
$$\mathbf{u}^*(t) = -\mathbf{S}^{-1} \big[ \mathbf{A}^T \mathbf{g}(t) + \mathbf{A}^T \mathbf{H}(t)\, \boldsymbol{\lambda}(t) + \tfrac{1}{2}\, \mathrm{diag}(\mathbf{A}^T \mathbf{H}(t)\, \mathbf{A}) \big]. \quad (13)$$

This optimal intensity is linear in $\boldsymbol{\lambda}(t)$, and the coefficients $\mathbf{g}(t)$ and $\mathbf{H}(t)$ can be found off-line. Hence, it allows for a very efficient procedure to sample the times of the users' incentivized actions, which takes inspiration from the algorithm RedQueen [31]. The key idea is as follows. At any given time $t$, we can view the multidimensional control signal $\mathbf{u}(t)$ as a superposition of inhomogeneous multidimensional Poisson processes, one per non incentivized action, which start when the actions take place. Algorithm 1 summarizes our method, which we name Cheshire [3].

Within the algorithm, NextAction() returns the time of the next (non incentivized) action in the network, as well as the identity of the user who takes the action, once the action happens; $\mathbf{e}_j$ is an indicator vector whose entry corresponding to user $j$ is 1; and Sample($\mathbf{u}(t)$) samples from a multidimensional inhomogeneous Poisson process with intensity $\mathbf{u}(t)$ and returns both the sampled time and dimension. To sample from a multidimensional inhomogeneous Poisson process, there exist multiple methods, e.g., refer to Lewis et al. [21]. Finally, note that one can precompute most of the quantities the algorithm needs, i.e., lines 2-3, $\mathbf{A}\mathbf{e}_j$ in line 8, and $\mathbf{S}^{-1}\mathbf{A}^T\mathbf{H}(t)$ in line 9. Given these precomputations, the algorithm only needs to perform $O(n)$ operations and sample $O(N(t_f))$ times from an inhomogeneous Poisson process.

Figure 1: Activity on two 64-node networks, a core-periphery network (top) and a dissortative network (bottom), under uncontrolled (Eq. 2; without Cheshire) and controlled (Eq. 4; with Cheshire) dynamics. The first and second columns visualize the final number of non incentivized actions $\mathbf{N}(t_f)$ under uncontrolled and controlled dynamics, where darker red corresponds to a higher number of actions. The third column visualizes the final number of incentivized actions $\mathbf{M}(t_f)$ under controlled dynamics, where darker green corresponds to a higher number of actions. The fourth column shows the temporal evolution of the number of incentivized and non incentivized actions across the whole networks for controlled and uncontrolled dynamics, where the solid line is the average across simulation runs and the shadowed region represents the standard error. By incentivizing a relatively small number of actions, Cheshire is able to increase the overall number of (non incentivized) actions dramatically.

Experiments on synthetic data

In this section, we shed light on Cheshire's sampling strategy in two small Kronecker networks [19] by recording, on the one hand, the number of incentivized actions per node and, on the other hand, the number of additional actions per node these incentivized actions trigger in comparison with an uncontrolled setup. Additionally, Appendix F compares the performance of our method against several baselines and state of the art methods [8, 10] on a variety of large synthetic networks and provides a scalability analysis.
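Before the experiments, it is worth making the off-line part of the method concrete. The following sketch integrates the two differential equations of Lemma 6 backward in time with a plain explicit Euler scheme and then evaluates the feedback rule of Eq. 13 (pure-Python matrix helpers; $\mathbf{Q}$, $\mathbf{S}$, $\mathbf{F}$ taken diagonal as in our experiments; all numeric values are illustrative, not from the paper):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

def transpose(X):
    return [list(c) for c in zip(*X)]

def diag_AtHA(A, H):
    # i-th entry of diag(A^T H A): a_i^T H a_i, with a_i the i-th column of A
    n = len(A)
    return [sum(A[r][i] * H[r][s] * A[s][i] for r in range(n) for s in range(n))
            for i in range(n)]

def solve_feedback(A, omega, Q, S, F, mu, t_f, steps=2000):
    """Integrate H(t) (matrix Riccati) and g(t) (linear ODE) of Lemma 6
    backward from t_f with H(t_f) = -F and g(t_f) = 0, by explicit Euler.
    Q, S, F are passed as diagonals."""
    n = len(A)
    At = transpose(A)
    ASi = [[A[i][j] / S[j] for j in range(n)] for i in range(n)]   # A S^{-1}
    ASiAt = matmul(ASi, At)                                        # A S^{-1} A^T
    B = [[(omega if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    Bt = transpose(B)                                              # omega I - A^T
    Qm = [[Q[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    H = [[-F[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [0.0] * n
    dt = t_f / steps
    for _ in range(steps):
        d = diag_AtHA(A, H)
        Hdot = madd(madd(matmul(Bt, H), matmul(H, B)),
                    madd(matmul(matmul(H, ASiAt), H), Qm))
        M1 = madd(Bt, matmul(H, ASiAt))
        HASi = matmul(H, ASi)
        Hmu = matvec(H, mu)
        gdot = [sum(M1[i][k] * g[k] for k in range(n)) - omega * Hmu[i]
                + 0.5 * (sum(HASi[i][k] * d[k] for k in range(n)) - d[i])
                for i in range(n)]
        # step backward: X(t - dt) = X(t) - dt * Xdot(t)
        H = [[H[i][j] - dt * Hdot[i][j] for j in range(n)] for i in range(n)]
        g = [g[i] - dt * gdot[i] for i in range(n)]
    return H, g  # coefficients at t = 0

def optimal_intensity(lam, A, H, g, S):
    """Feedback rule of Eq. 13: u*(t) = -S^{-1}[A^T g + A^T H lam + 1/2 diag(A^T H A)]."""
    At = transpose(A)
    d = diag_AtHA(A, H)
    AtHlam = matvec(At, matvec(H, lam))
    Atg = matvec(At, g)
    return [-(Atg[i] + AtHlam[i] + 0.5 * d[i]) / S[i] for i in range(len(lam))]

# illustrative two-user example (all values hypothetical)
A = [[0.0, 0.2], [0.2, 0.0]]
H0, g0 = solve_feedback(A, omega=1.0, Q=[1.0, 1.0], S=[1.0, 1.0],
                        F=[1.0, 1.0], mu=[0.5, 0.5], t_f=1.0)
u0 = optimal_intensity([0.5, 0.5], A, H0, g0, [1.0, 1.0])  # nonnegative, as Theorem 7 predicts
```

This is only a sketch: in practice one would use a dedicated Riccati solver [12] and the closed form solution for $\mathbf{g}(t)$ rather than Euler steps, but the structure (solve once off-line, then evaluate a linear feedback in $\boldsymbol{\lambda}(t)$ online) is the same.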
Experimental setup.
We experiment with two small Kronecker networks [19] with 64 nodes, a core-periphery network and a dissortative network, shown in Figure 1. For each network, we draw the entries of the influence matrix A from a uniform distribution, draw μ from a uniform distribution for a fraction of the nodes and set μ = 0 for the remaining ones, and set ω = 16, t_0 = 0 and t_f = 5. In practice, we consider diagonal matrices Q, S and F. Then, we compare the number of actions over time under uncontrolled dynamics (Eq. 2; without Cheshire) and controlled dynamics (Eq. 4; with Cheshire). In both cases, we perform several simulation runs and sample (non incentivized) actions from a multidimensional Hawkes process using Ogata's thinning algorithm [23]. In the case of controlled dynamics, we sample incentivized actions using Algorithm 1.

Results. Figure 1 summarizes the results in terms of the number of non incentivized and incentivized actions, which show that: (i) a relatively small number of incentivized actions suffices to increase the overall number of actions dramatically (fourth column); (ii) the majority of the incentivized actions concentrate in a small set of influential nodes (third column; nodes in dark green); and, (iii) the variance of the overall number of actions (fourth column; shadowed regions) is consistently reduced when using Cheshire; in other words, the networks become more robust.

Figure 2: Performance over time of Cheshire against several competitors (MSC, OPL, PRK, DEG and the uncontrolled baseline UNC) for each Twitter dataset. Performance is measured in terms of the overall number of tweets $\bar N(t) = \sum_{u \in \mathcal{V}} \mathbb{E}[N_u(t)]$. In all cases, we tune the parameters Q, S and F to be diagonal matrices such that the total number of incentivized tweets posted by our method is equal to the budget used in the competing methods and baselines. Cheshire (in red) consistently outperforms competing methods over time and triggers substantially more posts than the second best performer (in blue) as time goes by.
Experiments on real data

In this section, we experiment with data gathered from Twitter and show that our model can maximize the number of online actions more effectively than several baselines and state of the art methods [8, 10].
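The setup below estimates the influence matrix A of Eq. 1 from observed (re)tweet times by maximum likelihood. For concreteness, here is a minimal sketch of the exponential-kernel Hawkes log-likelihood that such estimation maximizes, computed with the standard linear-time decay recursion (illustrative code; the full pipeline additionally maximizes this quantity over A and cross-validates ω):

```python
import math

def hawkes_loglik(events, mu, A, omega, T):
    """Log-likelihood of a multidimensional Hawkes process with exponential
    kernel kappa(t) = exp(-omega t) (Eq. 1), for events = time-sorted
    (time, user) pairs observed in [0, T]:
    log L = sum_i log lambda_{u_i}(t_i) - sum_u int_0^T lambda_u(t) dt."""
    n = len(mu)
    # R[j] tracks sum over past events of user j of exp(-omega (t - t_k)):
    # it decays between events and jumps by 1 at each event of user j
    R = [0.0] * n
    last_t, loglik = 0.0, 0.0
    for (t, u) in events:
        decay = math.exp(-omega * (t - last_t))
        R = [r * decay for r in R]
        lam_u = mu[u] + sum(A[u][j] * R[j] for j in range(n))
        loglik += math.log(lam_u)
        R[u] += 1.0
        last_t = t
    # compensator: mu_u * T + sum over events of user j of A[u][j]*(1 - e^{-omega(T-t_k)})/omega
    for u in range(n):
        comp = mu[u] * T
        for (t, j) in events:
            comp += A[u][j] * (1.0 - math.exp(-omega * (T - t))) / omega
        loglik -= comp
    return loglik

# toy check: with A = 0 the model reduces to an independent Poisson process,
# so two events of a unit-rate user over [0, 2] give 2*log(1) - 1*2 = -2.0
ll = hawkes_loglik([(0.5, 0), (1.5, 0)], mu=[1.0], A=[[0.0]], omega=1.0, T=2.0)
```

This log-likelihood is concave in A and μ for fixed ω, which is why standard convex or gradient-based solvers suffice for the estimation step described below.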
Experimental setup.
We experiment with five Twitter datasets about current real-world events (Elections, Verdict, Club, Sports, TV Show), where actions are tweets (and retweets). Appendix G contains further details and statistics about these datasets. We compare the performance of our algorithm, Cheshire, with two state of the art methods, which we denote as OPL [8] and MSC [10], and two baselines, which we denote as PRK and DEG. PRK and DEG distribute the users' control intensities u(t) proportionally to the users' PageRank and outgoing degree in the network, respectively. More in detail, we proceed as follows.

For each dataset, we estimate the influence matrix A of the multidimensional Hawkes process defined by Eq. 1 using maximum likelihood, as elsewhere [7, 29]. Moreover, we set the decay parameter ω of the corresponding exponential triggering kernel κ(t) by cross-validation. Then, we perform several simulation runs for each method and baseline, where we sample non incentivized actions from the multidimensional Hawkes process learned from the corresponding Twitter dataset using Ogata's thinning algorithm [23]. For the competing methods and baselines, the control intensities u(t) are deterministic and thus we only need to sample incentivized actions from inhomogeneous Poisson processes [21]. For our method, the control intensities are stochastic and thus we sample incentivized actions using Algorithm 1. Finally, we compare their performance in terms of the (empirical) average number of tweets E[N(t)]. In the above procedure, for a fair comparison, we tune the parameters Q, S and F to be diagonal matrices such that the total number of incentivized tweets posted by our method is equal to the budget used in the state of the art methods and baselines.

Results.
We first compare the performance of our algorithm against others in terms of the overall average number of tweets $\bar N(t) = \sum_{u \in \mathcal{V}} \mathbb{E}[N_u(t)]$ for a fixed budget $\bar M(t_f) = \sum_{u \in \mathcal{V}} \mathbb{E}[M_u(t_f)]$. Figure 2 summarizes the results, which show that: (i) our algorithm consistently outperforms competing methods by large margins; (ii) it triggers substantially more posts than the second best performer as time goes by; and, (iii) the baselines PRK and DEG have an underwhelming performance, suggesting that the network structure alone is not an accurate measure of influence.

Next, we evaluate the performance of our algorithm against others with respect to the available budget. To this aim, we compute the average time $\bar t_K$ required by each method to reach a given milestone number of tweets against the number of incentivized tweets $\bar M(t_f)$ (i.e., the budget). Here, we do not report the results for the uncontrolled dynamics (UNC) since it did not reach the milestone even long after the slowest competitor had reached it. Figure 3 summarizes the results, which show that: (i) our algorithm consistently reaches the milestone faster than the competing methods; (ii) it exhibits a greater competitive advantage when the budget is low; and, (iii) it reaches the milestone markedly faster than the second best performer for low budgets.

Figure 3: Performance vs. number of incentivized tweets for each Twitter dataset (Elections, Verdict, Club, Sports, Series). Performance is measured in terms of the average time $\bar t_K$ required by each method to reach a milestone number of tweets. Cheshire (in red) consistently reaches the milestone faster than the competing methods, particularly for low budgets.

Conclusions

In this paper, we tackled the problem of activity maximization in social networks from the perspective of stochastic optimal control of temporal point processes and showed that the optimal level of incentivized actions depends linearly on the current level of overall actions. Moreover, the coefficients of this linear relationship can be found by solving a matrix Riccati differential equation, which can be solved efficiently, and a first order differential equation, which has a closed form solution. As a result, we were able to design
Cheshire (Algorithm 1), an efficient and relatively simple online algorithm to sample the optimal times of the users' incentivized actions. We experimented with synthetic and real-world data gathered from Twitter and showed that our algorithm is able to consistently maximize activity more effectively than the state of the art.

Our work also opens many interesting avenues for future work. For example,
Cheshire optimizes a sum of quadratic losses on the users' intensities and assumes the influence matrix does not change over time; however, it would be useful to derive the optimal level of incentivized actions for other losses, e.g., a minimax loss, and to consider time-varying influence. One of the challenges one would face is to find solutions that ensure the nonnegativity of the control intensity. Moreover, our experimental evaluation is based on simulation, using models whose parameters (A, ω) are learned from data. It would be very interesting to evaluate our method using actual interventions in a social network. Finally, optimal control of jump SDEs with doubly stochastic temporal point processes can potentially be applied to design online algorithms for a wide variety of control problems in social and information systems, such as human learning [24], opinion control [30], and rumor control [11].

References

[1] O. Aalen, O. Borgan, and H. Gjessing. Survival and Event History Analysis: A Process Point of View. Springer Science & Business Media, 2008.

[2] D. Bertsekas.
Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, 1995.

[3] L. Carroll. Through the Looking Glass: And What Alice Found There. Rand, McNally, 1917.

[4] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1029–1038. ACM, 2010.

[5] A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez-Rodriguez. Learning and forecasting opinion dynamics in social networks. In Advances in Neural Information Processing Systems, 2016.

[6] N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems, pages 3147–3155, 2013.

[7] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, H. Zha, and L. Song. Shaping social activity by incentivizing users. In Advances in Neural Information Processing Systems, pages 2474–2482, 2014.

[8] M. Farajtabar, M. Gomez-Rodriguez, N. Du, M. Zamani, H. Zha, and L. Song. Back to the past: Source identification in diffusion networks from partially observed cascades. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 2015.

[9] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. COEVOLVE: A joint point process model for information diffusion and network co-evolution. Advances in Neural Information Processing Systems, 2015.

[10] M. Farajtabar, X. Ye, S. Harati, L. Song, and H. Zha. Multistage campaigning in social networks. In Advances in Neural Information Processing Systems, pages 4718–4726, 2016.

[11] A. Friggeri, L. A. Adamic, D. Eckles, and J. Cheng. Rumor cascades. In ICWSM, 2014.

[12] C. Garrett. Numerical Integration of Matrix Riccati Differential Equations with Solution Singularities. PhD thesis, The University of Texas at Arlington, May 2013.

[13] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proceedings of the 28th International Conference on Machine Learning (ICML'11), pages 561–568, 2011.

[14] F. Hanson. Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis, and Computation, volume 13. SIAM, 2007.

[15] A. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90, 1971.

[16] M. Karimi, E. Tavakoli, M. Farajtabar, L. Song, and M. Gomez-Rodriguez. Smart broadcasting: Do you want to be seen? In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[17] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146. ACM, 2003.

[18] J. Kingman. Poisson Processes. Oxford University Press, 1992.

[19] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985–1042, 2010.

[20] J. Leskovec, D. Chakrabarti, J. M. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. JMLR, 2010.

[21] P. A. Lewis and G. S. Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly, 26(3):403–413, 1979.

[22] C. Mavroforakis, I. Valera, and M. Gomez-Rodriguez. Modeling the dynamics of online learning activity. In Proceedings of the 26th International World Wide Web Conference, 2017.

[23] Y. Ogata. On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.

[24] S. Reddy, I. Labutov, S. Banerjee, and T. Joachims. Unbounded human learning: Optimal scheduling for spaced repetition. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[25] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 61–70. ACM, 2002.

[26] N. Spasojevic, Z. Li, A. Rao, and P. Bhattacharyya. When-to-post on social networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2127–2136. ACM, 2015.

[27] M. Stone. The generalized Weierstrass approximation theorem. Mathematics Magazine, 21(5):237–254, 1948.

[28] B. Tabibian, I. Valera, M. Farajtabar, L. Song, B. Schoelkopf, and M. Gomez-Rodriguez. Distilling information reliability and source trustworthiness from digital traces. In Proceedings of the 26th International World Wide Web Conference, 2017.

[29] I. Valera, M. Gomez-Rodriguez, and K. Gummadi. Modeling diffusion of competing products and conventions in social media. In ICDM, 2015.

[30] Y. Wang, E. Theodorou, A. Verma, and L. Song. A unifying framework for guiding point processes with stochastic intensity functions. arXiv preprint arXiv:1701.08585, 2017.

[31] A. Zarezade, U. Upadhyay, H. Rabiee, and M. Gomez-Rodriguez. RedQueen: An online algorithm for smart broadcasting in social networks. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 2017.

[32] Q. Zhao, M. Erdogdu, H. He, A. Rajaraman, and J. Leskovec. SEISMIC: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.
A Proof of Proposition 1
Using the left-continuity of Poisson processes and the definition of the derivative, dλ(t) = λ(t + dt) − λ(t), we can find the dynamics of the process by Itô's calculus [14] as follows:

\begin{align*}
d\boldsymbol{\lambda}(t) &= \mathbf{A} \int_0^{t+dt} g(t + dt - s)\, d\mathbf{N}(s) - \mathbf{A} \int_0^{t} g(t - s)\, d\mathbf{N}(s) \\
&= \mathbf{A} \int_0^{t+dt} \left( g(t - s) + g'(t - s)\, dt \right) d\mathbf{N}(s) - \mathbf{A} \int_0^{t} g(t - s)\, d\mathbf{N}(s) \\
&= \mathbf{A} \int_t^{t+dt} g(t - s)\, d\mathbf{N}(s) + dt\, \mathbf{A} \int_0^{t+dt} g'(t - s)\, d\mathbf{N}(s) \\
&= \mathbf{A}\, g(0)\, d\mathbf{N}(t) - w\, dt\, \mathbf{A} \int_0^{t+dt} g(t - s)\, d\mathbf{N}(s) \\
&= \mathbf{A}\, d\mathbf{N}(t) - w\, dt\, \mathbf{A} \int_0^{t} g(t - s)\, d\mathbf{N}(s) \\
&= \mathbf{A}\, d\mathbf{N}(t) + w\, dt \left[ \boldsymbol{\mu} - \boldsymbol{\lambda}(t) \right]
\end{align*}

B Proof of Theorem 4

\begin{align*}
J(\boldsymbol{\lambda}(t), t) &= \min_{\mathbf{u}(t, t_f]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t_f]} \left[ \phi(\boldsymbol{\lambda}(t_f)) + \int_t^{t_f} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds \right] \\
&= \min_{\mathbf{u}(t, t_f]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t_f]} \left[ \phi(\boldsymbol{\lambda}(t_f)) + \int_t^{t+dt} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds + \int_{t+dt}^{t_f} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds \right] \\
&= \min_{\mathbf{u}(t, t_f]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} \left[ \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t+dt, t_f]} \left[ \phi(\boldsymbol{\lambda}(t_f)) + \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt + \int_{t+dt}^{t_f} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds \right] \right] \\
&= \min_{\mathbf{u}(t, t+dt]} \min_{\mathbf{u}(t+dt, t_f]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} \left[ \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt + \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t+dt, t_f]} \left[ \phi(\boldsymbol{\lambda}(t_f)) + \int_{t+dt}^{t_f} \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s))\, ds \right] \right] \\
&= \min_{\mathbf{u}(t, t+dt]} \mathbb{E}_{(\mathbf{N}, \mathbf{M})(t, t+dt]} \left[ J(\boldsymbol{\lambda}(t+dt), t+dt) \right] + \ell(\boldsymbol{\lambda}(t), \mathbf{u}(t))\, dt
\end{align*}

C Proof of Theorem 5
Using the definition of the derivative, we can evaluate the differential of the cost-to-go as follows:

\begin{align*}
dJ(\boldsymbol{\lambda}(t), t) &= J(\boldsymbol{\lambda}(t+dt), t+dt) - J(\boldsymbol{\lambda}(t), t) \\
&= J(\boldsymbol{\lambda}(t) + d\boldsymbol{\lambda}(t), t+dt) - J(\boldsymbol{\lambda}(t), t).
\end{align*}
To evaluate the first term on the right hand side of the above equality, we substitute dλ(t) by the SDE dynamics (4). Then, using the zero-one jump law [18], we can write:

\begin{align*}
J(\boldsymbol{\lambda}(t) + d\boldsymbol{\lambda}(t), t+dt) &= J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{A}\, d\mathbf{N}(t) + \mathbf{A}\, d\mathbf{M}(t), t+dt) \\
&= \sum_i J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{a}_i, t+dt)\, dN_i(t) + \sum_i J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{a}_i, t+dt)\, dM_i(t) \\
&\quad + J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt, t+dt) \prod_i [1 - dN_i(t)][1 - dM_i(t)] \\
&= J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt, t+dt) \Big[ 1 - \sum_i \big( dN_i(t) + dM_i(t) \big) \Big] + \sum_i J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{a}_i, t+dt) \big[ dN_i(t) + dM_i(t) \big] \\
&= J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt, t+dt) + \sum_i \big[ J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{a}_i, t+dt) - J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt, t+dt) \big] \big[ dN_i(t) + dM_i(t) \big]
\end{align*}

where we denote f(t) := wμ − wλ(t) for notational simplicity, and used that the bilinear differential form dt dN(t) = 0, as in [14]. By the total derivative rule we have:

\begin{align*}
J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt + \mathbf{a}_i, t+dt) &= J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t) + \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t)\, \mathbf{f}(t)\, dt + J_t(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t)\, dt \\
J(\boldsymbol{\lambda}(t) + \mathbf{f}(t)\, dt, t+dt) &= J(\boldsymbol{\lambda}(t), t) + \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t), t)\, \mathbf{f}(t)\, dt + J_t(\boldsymbol{\lambda}(t), t)\, dt.
\end{align*}

If we plug these two relations into the previous one, we obtain:

\begin{align*}
J(\boldsymbol{\lambda}(t) + d\boldsymbol{\lambda}(t), t+dt) &= J(\boldsymbol{\lambda}(t), t) + \nabla_{\boldsymbol{\lambda}} J(\boldsymbol{\lambda}(t), t)\, \mathbf{f}(t)\, dt + J_t(\boldsymbol{\lambda}(t), t)\, dt \\
&\quad + \sum_i \big[ J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t) - J(\boldsymbol{\lambda}(t), t) \big] \big[ dN_i(t) + dM_i(t) \big],
\end{align*}

which completes the proof.

D Proof of (Δ_A J)_i ≤ 0

Let t < s. Then, according to the definition, we can write

\begin{equation*}
\boldsymbol{\lambda}(s) = \boldsymbol{\mu} + \int_0^s g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau) = \boldsymbol{\mu} + \int_0^t g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau) + \int_t^s g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau).
\end{equation*}
For the exponential kernel g(t) = e^{−wt} we have

\begin{equation*}
\int_0^t g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau) = \int_0^t e^{-w(s - \tau)}\, \mathbf{A}\, d\mathbf{N}(\tau) = e^{-w(s - t)} \int_0^t e^{-w(t - \tau)}\, \mathbf{A}\, d\mathbf{N}(\tau) = e^{-w(s - t)} \left( \boldsymbol{\lambda}(t) - \boldsymbol{\mu} \right),
\end{equation*}

so, given the value of λ(t) at time t, we can write λ(s) for later times s > t as

\begin{equation*}
\boldsymbol{\lambda}(s) = \boldsymbol{\mu} + e^{-w(s - t)} \left( \boldsymbol{\lambda}(t) - \boldsymbol{\mu} \right) + \int_t^s g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau).
\end{equation*}

Let us consider a process ξ(s) whose intensity value at time t equals λ(t) + a_i, i.e.,

\begin{equation*}
\boldsymbol{\xi}(s) = \boldsymbol{\mu} + e^{-w(s - t)} \left( \boldsymbol{\lambda}(t) + \mathbf{a}_i - \boldsymbol{\mu} \right) + \int_t^s g(s - \tau)\, \mathbf{A}\, d\mathbf{N}(\tau).
\end{equation*}

Since a_i ⪰ 0, given the same history in the interval (t, s), we have ξ(s) ⪰ λ(s). Then, we have:

\begin{equation*}
\ell(\boldsymbol{\xi}(s), \mathbf{u}(s)) \leq \ell(\boldsymbol{\lambda}(s), \mathbf{u}(s)).
\end{equation*}

Now, integrating the above inequality, then taking the expectation (over all histories), and finally minimizing does not change the direction of the inequality, so the required result readily follows:

\begin{equation*}
J(\boldsymbol{\lambda}(t) + \mathbf{a}_i, t) \leq J(\boldsymbol{\lambda}(t), t).
\end{equation*}
[Figure 4 panels omitted: (a) Core-periphery, (b) Hierarchical, (c) Homophily, (d) Heterophily, (e) Random; each panel plots the overall number of tweets over time for CHE, MSC, OPL, PRK, DEG and UNC.]

Figure 4: Performance over time of Cheshire against several competitors for several types of Kronecker networks. Performance is measured in terms of the overall number of tweets $\bar{N}(t) = \sum_{u \in \mathcal{V}} \mathbb{E}[N_u(t)]$. In all cases, we tune the parameters Q, S and F such that the total number of incentivized tweets posted by our method is equal to the budget used in the competing methods and baselines.

E Proof of Lemma 6
Consider the following proposal of degree three for the cost-to-go function:

\begin{equation*}
J(\boldsymbol{\lambda}(t), t) = f(t) + \sum_i g_i(t)\, \lambda_i(t) + \sum_i \sum_j \lambda_i(t)\, \lambda_j(t)\, H_{ij}(t) + \sum_i \sum_j \sum_k \lambda_i(t)\, \lambda_j(t)\, \lambda_k(t)\, H_{ijk}(t)
\end{equation*}

If we plug this proposal into Eq. 9, evaluate the coefficients of the fourth-degree terms, and equate them to zero, then we can find the unknown coefficients H_{ijk}(t) as follows:

\begin{equation*}
\forall\, i, j, t: \quad \sum_k \Big( \sum_{\ell} A_{\ell k} \Big) H_{ijk}(t) = 0
\end{equation*}

Since a sum of positive terms is zero if and only if all of them are zero, the H_{ijk}(t), and consequently the degree-three terms in the proposal, are all zero. So the proposal reduces to a quadratic proposal.

It is quite straightforward to extend this argument to proposals of order m > 3: by equating the terms of degree m + 1, we similarly conclude that the coefficients of the degree-m terms in the proposal are zero. If we repeat this argument for m − 1, ..., 3, we deduce that any proposal of arbitrary degree m ≥ 3 results in a quadratic optimal cost.

Finally, according to the Stone–Weierstrass theorem, any continuous function on a closed interval can be approximated as closely as desired by a polynomial [27]. So, by assuming continuity of the cost function, it can be approximated by a polynomial as closely as desired, and hence it also reduces to a quadratic proposal.

F Additional Experiments on Synthetic Data
Experimental setup. In this section, we experiment with five different types of Kronecker networks [20]: (i) assortative networks (parameter matrix [0.96, 0.3; 0.3, 0.96]); (ii) dissortative networks ([0.3, 0.96; 0.96, 0.3]); (iii) random networks ([0.7, 0.7; 0.7, 0.7]); (iv) hierarchical networks ([0.9, 0.1; 0.1, 0.9]); and, (v) core-periphery networks ([0.9, 0.5; 0.5, 0.3]). For each network, we draw μ and A from a uniform distribution U(0, 1) and set ω = 100. As in the experiments with Twitter data, we compare the performance of our algorithm with two state of the art methods, OPL [8] and MSC [10], and three baselines, PRK, DEG and UNC.

Performance.
Figure 4 compares the performance of our algorithm against the others in terms of the overall average number of tweets $\bar{N}(t) = \sum_{u \in \mathcal{V}} \mathbb{E}[N_u(t)]$ for a fixed budget $\bar{M}(t_f) = \sum_{u \in \mathcal{V}} M_u(t_f)$. We find that: (i) our algorithm consistently outperforms the competing methods by large margins at time t_f; (ii) it triggers substantially more posts than the second best performer by time t_f; (iii) MSC tends to use the budget too early and, as a consequence, although it initially beats our method, it eventually gets outperformed by time t_f; and, (iv) the baselines PRK and DEG have an underwhelming performance, suggesting that the network structure alone is not an accurate measure of influence.

[Figure 5 panels omitted: (a) running time vs. the cut-off time t_f, and (b) running time vs. network size, for CHE, MSC and OPL.]

Figure 5: Scalability of Cheshire against several competitors. Panel (a) shows the running time against the cut-off time t_f for a Kronecker network. Panel (b) shows the running time for several Kronecker networks of increasing size. In both panels, the average degree per node is fixed. The experiments are carried out in a single machine with 24 cores and 64 GB of main memory.

Scalability.
Figure 5 shows that our algorithm scales to large networks and is almost an order of magnitude faster than the second best performer, MSC [10]. For example, our algorithm takes on the order of seconds to steer a network with thousands of nodes, while MSC takes minutes.

G Twitter Datasets Description
We used the Twitter search API to collect all the tweets (corresponding to a 2–3 week period around the event date) that contain hashtags related to the following events/topics:

• Elections: British election, from May 7 to May 15, 2015.

• Verdict: Verdict for the corruption case against Jayalalitha, an Indian politician, from May 6 to May 17, 2015.

• Club: Barcelona getting the first place in La Liga, from May 8 to May 16, 2016.

• Sports: Champions League final in 2015, between Juventus and Real Madrid, from May 8 to May 16, 2015.

• TV Show: The promotion of the TV show "Game of Thrones", from May 4 to May 12, 2015.

We then built the follower-followee network for the users that posted the collected tweets using the Twitter REST API. Finally, we filtered out users that posted fewer than 200 tweets during the account lifetime, follow fewer than 100 users, or have fewer than 50 followers. An account of the dataset statistics is given in Table 1.

Dataset     |V|    |E|     |H(T_Data)|   T = T_simulation
Elections    231    1108     1584         120.2
Verdict     1059   10691    17452          22.11
Club         703    4154     9409          19.23
Sports       703    4154     7431          21.53
TV Show      947   10253    13203          12.11

Table 1: Real datasets statistics.

https://dev.twitter.com/rest/public/search
https://dev.twitter.com/rest/public