A Survey on Influence Maximization in a Social Network
Suman Banerjee (a), Mamata Jenamani (a), Dilip Kumar Pratihar (a)
(a) Indian Institute of Technology, Kharagpur, West Bengal, India.
Abstract
Given a social network with diffusion probabilities as edge weights and an integer $k$, which $k$ nodes should be chosen for the initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and, more popularly, as the Social Influence Maximization Problem (SIM Problem). It has been an active area of research in computational social network analysis for the last decade and a half. Due to its practical importance in various domains, such as viral marketing, target advertisement, and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence, there is a need for an organized and comprehensive review of this topic. This paper presents a survey of the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions.
Keywords:
Target Set Selection Problem, Social Networks, Influence Maximization, Inapproximability Results, Approximation Algorithm, Greedy Strategy, NP-Hard Problem.
∗ Corresponding author: Dilip Kumar Pratihar
Email addresses: [email protected] (Suman Banerjee), [email protected] (Mamata Jenamani), [email protected] (Dilip Kumar Pratihar)
Preprint submitted to Elsevier August 17, 2018

1. Introduction

A social network is an interconnected structure of a group of agents formed for social interactions [1]. Nowadays, social networks play an important role in spreading information, opinions, ideas, innovations, rumors, etc. [2] [3]. This spreading process has huge practical importance in viral marketing [4] [5], personalized recommendation [6], feed ranking [7], target advertisement [8], selecting influential twitters [9] [10], selecting informative blogs [11], etc. Hence, recent years have witnessed significant attention to the study of influence propagation in online social networks. Consider the case of viral marketing by a commercial house, where the goal is to attract users to purchase a particular product. A natural way to do this is to select a set of highly influential users and distribute free samples to them. If they like the product, they will share the information with their neighbors. Due to their high influence, many of the neighbors will try the product and share the information with their own neighbors. This cascading process continues, and ultimately a large fraction of the users will try the product. Naturally, the number of free samples is limited for economic reasons. Hence, this process will be fruitful only if the free samples are distributed among highly influential users, and the problem boils down to selecting influential users from the network. This problem is known as the Social Influence Maximization Problem.

Social influence occurs due to the diffusion of information in the network. This phenomenon in a networked system is well studied [12] [13]. Specifically, there are two popularly adopted models to study the diffusion process, namely the Independent Cascade Model (abbreviated as IC Model), which captures the independent behavior of the agents, and the Linear Threshold Model (abbreviated as LT Model), which captures the collective behavior of the agents (detailed discussion is deferred till Section 2.5) [14]. In both the models, infor[...] graph with the users as the vertex set and social ties among the users as the edge set. It is also assumed that the diffusion threshold (a numerical measure of how hard it is to influence a user; the higher the value, the harder the user is to influence) is given as the vertex weight, and the influence probability between two users is given as the edge weight. In this setting, the SIM Problem is stated as follows: for a given size $k$ ($k \in \mathbb{Z}^+$), choose a set $S$ of $k$ nodes such that $|\sigma(S)|$ is maximized [15]. Here $\sigma(.)$ is the social influence function. For any given seed set $S$, $\sigma(S)$ returns the set of influenced nodes when the diffusion process is over.

(Footnote: From now onwards, we will use Target Set Selection and Social Influence Maximization interchangeably.)

In this survey, we have mainly focused on three aspects of the problem, as mentioned below.
• Variants of this problem studied in the literature,
• Hardness results of this problem in both the traditional and the parameterized complexity frameworks,
• Different solution approaches proposed in the literature.
The overview of this survey is shown in Figure 1. There are several other aspects of the problem, such as SIM in the presence of adversaries, in a time-varying social network, in a competitive scenario, etc., which we have not considered in this survey.

Figure 1: Overview of this survey

The main goal of this survey is threefold:
• to provide a comprehensive understanding of the SIM Problem and its different variants studied in the literature,
• to develop a taxonomy for classifying the existing solution methodologies and present them in a concise manner,
• to present an overview of the current research trends and future research directions regarding this problem.
We set the following two criteria for the studies to be included in this survey:
• The research work presented in the publication should produce theoretically or empirically better results than some of the previously published results.
• The presented solution methodology should be generic, i.e., it should work for a network of any topology.
The rest of the paper is organized as follows: Section 2 describes the background material required to understand the subsequent sections of this paper. Section 3 formally introduces the SIM Problem and its variants studied in the literature. Section 4 describes hardness results of this problem in both the traditional and the parameterized complexity theory frameworks. Section 5 describes some major research challenges in and around this problem. Section 6 describes the proposed taxonomy for classifying the existing solution methodologies into different categories and discusses them. Section 7 presents the summary of the survey and gives some future research directions. Finally, Section 8 presents concluding remarks regarding this survey.
2. Background
In this section, we describe the relevant background topics to the required depth, namely basic graph theory, the relation between SIM and existing graph theoretic problems, approximation algorithms, parameterized complexity theory, and information diffusion models in social networks. The symbols and notations used in the subsequent sections of this paper are given in Table 1.

Table 1: Symbols and Notations

| Symbol | Interpretation |
|---|---|
| $G(V, E, \theta, P)$ | Directed, vertex and edge weighted social network |
| $V(G)$ | Set of vertices of network $G$ |
| $E(G)$ | Set of edges of network $G$ |
| $U$ | Set of users of the network, i.e., $U = V(G)$ |
| $n$ | Number of users of the network, i.e., $n = \lvert V(G) \rvert$ |
| $m$ | Number of edges of the network, i.e., $m = \lvert E(G) \rvert$ |
| $\theta$ | Vertex weight function of $G$, i.e., $\theta : V(G) \to [0, 1]$ |
| $\theta_i$ | Weight of vertex $u_i$, i.e., $\theta_i = \theta(u_i)$ |
| $P$ | Edge weight function, i.e., $P : E(G) \to (0, 1]$ |
| $p_{ij}$ | Edge weight of the edge $(u_i u_j)$ |
| $\mathcal{N}(u_i)$ | Open neighborhood of vertex $u_i$ |
| $\mathcal{N}[u_i]$ | Closed neighborhood of vertex $u_i$ |
| $[n]$ | Set $\{1, 2, \ldots, n\}$ |
| $\mathcal{N}_{in}(u_i)$ | Incoming neighbors of vertex $u_i$ |
| $\mathcal{N}_{out}(u_i)$ | Outgoing neighbors of vertex $u_i$ |
| $deg_{in}(u_i)$ | Indegree of vertex $u_i$ |
| $deg_{out}(u_i)$ | Outdegree of vertex $u_i$ |
| $dist(u, v)$ | Number of edges in the shortest path between $u$ and $v$ |
| $S$ | Seed set for diffusion, i.e., $S \subset V(G)$ |
| $k$ | Maximum allowable cardinality of the seed set, i.e., $\lvert S \rvert \le k$ |
| $r$ | Maximum allowable number of rounds for diffusion |

Graphs are popularly used to represent most real-world networked systems, including social networks [16] [17]. Here, we report some preliminary concepts of basic graph theory from [18]. A graph is denoted by $G(V, E)$, where $V(G)$ and $E(G)$ are the vertex set and edge set of $G$, respectively. For any arbitrary vertex $u_i \in V(G)$, its open neighborhood is defined as $\mathcal{N}(u_i) = \{u_j \mid (u_i u_j) \in E(G)\}$. The closed neighborhood of $u_i$ is $\mathcal{N}[u_i] = \{u_i\} \cup \mathcal{N}(u_i)$. The degree of a vertex is defined as the cardinality of its open neighborhood, i.e., $deg(u_i) = |\mathcal{N}(u_i)|$. For any $S \subset V(G)$, its open neighborhood and closed neighborhood are $\mathcal{N}(S) = \bigcup_{u_i \in S} \mathcal{N}(u_i)$ and $\mathcal{N}[S] = S \cup \mathcal{N}(S)$, respectively. Two vertices $u_i$ and $u_j$ are said to be true twins if $\mathcal{N}[u_i] = \mathcal{N}[u_j]$, and false twins if $\mathcal{N}(u_i) = \mathcal{N}(u_j)$. A graph is weighted if a real number is associated with its vertices or edges or both. A graph is directed if its edges have directions. Edges that join the same pair of vertices are known as parallel edges, and an edge whose two end points are the same is known as a self-loop. A graph is simple if it is free from self-loops and parallel edges.

The information diffusion process in a social network is represented by a simple, directed, vertex and edge weighted graph $G(V, E, \theta, P)$. Here, $V(G) = \{u_1, u_2, \ldots, u_n\}$ is the set of users of the network and $E(G) = \{e_1, e_2, \ldots, e_m\}$ is the set of social ties among the users. $\theta$ and $P$ are the vertex and edge weight functions, which assign a numerical value between 0 and 1 to each vertex and edge, respectively, as its weight, i.e., $\theta : V(G) \to [0, 1]$ and $P : E(G) \to (0, 1]$. In information diffusion, vertex and edge weights are called node threshold and diffusion probability, respectively [19]. The higher the value of $\theta_i$, the harder it is to influence the user $u_i$, and the higher the value of $p_{ij}$, the more probable it is that $u_i$ can influence $u_j$. For any user $u_i \in V(G)$, its incoming neighbors and outgoing neighbors are defined as $\mathcal{N}_{in}(u_i) = \{u_j \mid (u_j u_i) \in E(G)\}$ and $\mathcal{N}_{out}(u_i) = \{u_j \mid (u_i u_j) \in E(G)\}$, respectively. For any user $u_i \in V(G)$, its indegree and outdegree are defined as $deg_{in}(u_i) = |\mathcal{N}_{in}(u_i)|$ and $deg_{out}(u_i) = |\mathcal{N}_{out}(u_i)|$, respectively. A path in a directed graph is a sequence of vertices without repetition, such that there is an edge between every two consecutive vertices. Two users are connected in the graph $G$ if there exists a directed path between them. A directed graph is said to be connected if there exists a path between every pair of users.

The TSS Problem is a generalization of many standard graph theoretic problems discussed in the literature, such as dominating set with threshold [20], vector domination problem [21], k-tuple dominating set [22] (in all these problems, diffusion runs for only one round instead of multiple rounds), vertex cover [23] (in this problem, the vertex threshold is set equal to the number of neighbors of the node), irreversible k-conversion problem [24], r-neighbor bootstrap percolation problem [25] (where the threshold of each vertex is k or r, respectively), and dynamic monopolies [26] (in this case, the threshold is half of the number of neighbors of the user).

Most of the optimization problems arising in real life are NP-Hard [27]. Hence, unless P = NP, we cannot expect to solve them exactly by any deterministic algorithm in polynomial time. So, the goal is to get an approximate solution of the problem within affordable time.
Approximation algorithms serve this purpose and also provide a worst-case guarantee on solution quality. For a maximization problem $\mathcal{P}$, let $\mathcal{A}$ be an algorithm that provides its solution and $\mathcal{I}$ be the set of all possible input instances of $\mathcal{P}$. For an input instance $I$ of $\mathcal{P}$, let $\mathcal{A}^*(I)$ be the value of the optimal solution and $\mathcal{A}(I)$ be the value of the solution generated by the algorithm $\mathcal{A}$. Now, $\mathcal{A}$ is called an $\alpha$-factor absolute approximation algorithm if $\forall I \in \mathcal{I}$, $|\mathcal{A}^*(I) - \mathcal{A}(I)| \le \alpha$, and an $\alpha$-factor relative approximation algorithm if $\forall I \in \mathcal{I}$, $\max\left\{ \frac{\mathcal{A}^*(I)}{\mathcal{A}(I)}, \frac{\mathcal{A}(I)}{\mathcal{A}^*(I)} \right\} \le \alpha$ (with $\mathcal{A}(I), \mathcal{A}^*(I) \neq 0$) [28]. Section 6.1 of this paper describes relative approximation algorithms for solving the SIM Problem.

Parameterized complexity theory is another way of dealing with NP-Hard optimization problems. It aims to classify computational problems based on their inherent difficulty with respect to multiple parameters related to the problem. There are several complexity classes in parameterized complexity theory. The class FPT (Fixed Parameter Tractable) contains the problems whose instances $(x, k) \in \mathcal{I}$, where $x$ is the input, $k$ is the parameter, and $\mathcal{I}$ is the set of instances, can be solved in time $O(f(k) |x|^{O(1)})$, where $f(k)$ is a function depending only on $k$ and $|x|$ denotes the length of the input. The W hierarchy is the collection of complexity classes with the property $W[0] = FPT$ and $W[i] \subseteq W[j]$ for all $i \le j$ [29]. Many natural computational problems occupy the lower levels of the hierarchy, i.e., $W[1]$ and $W[2]$. In Section 4, we describe hardness results of the TSS Problem in the parameterized complexity theoretic setting.

Diffusion phenomena in networked systems have received attention from different disciplines, such as epidemiology (how do diseases spread in a human contact network?) [30], social network analysis (how does information propagate in a social network?) [31], computer networks (how does a computer virus propagate in an e-mail network?) [32], etc.
Information diffusion in an online social network is the phenomenon by which the word-of-mouth effect occurs electronically. Hence, the mechanism of information diffusion is very well studied [33] [34]. There are several models in the literature to study the diffusion process [35]. The nature of these models varies from deterministic to probabilistic. Here, we describe some well-studied information diffusion models from the literature.

• Independent Cascade Model (IC Model) [14]: This is one of the well-studied probabilistic diffusion models, used by Kempe et al. [36] in their seminal work on social influence maximization. In this model, a node can be either in the active state (i.e., influenced) or in the inactive state (i.e., not influenced). Initially (i.e., at $t = 0$), all the nodes except the seeds are inactive. Every node $u_i$ that becomes active at time stamp $t$ gets a chance to activate each of its currently inactive outgoing neighbors ($u_j \in \mathcal{N}_{out}(u_i)$ with $u_j$ inactive), with probability equal to the edge weight $p_{ij}$. If $u_i$ succeeds, then $u_j$ becomes an active node at time stamp $t + 1$. A node can change its state from inactive to active, but not from active to inactive. This cascading process continues until no new node becomes active in a time stamp. Suppose this diffusion process starts at $t = 0$ and continues till $t = T$, and $\mathcal{A}_t$ denotes the set of active nodes till time stamp $t$, where $t \in [0, T]$; then $\mathcal{A}_0 \subseteq \mathcal{A}_1 \subseteq \cdots \subseteq \mathcal{A}_t \subseteq \mathcal{A}_{t+1} \subseteq \cdots \subseteq \mathcal{A}_T \subseteq V(G)$. Node $u_i$ is said to be active at time stamp $t$ (i.e., newly activated at $t$) if $u_i \in \mathcal{A}_t \setminus \mathcal{A}_{t-1}$.

• Linear Threshold Model (LT Model) [14]: This is another diffusion model considered by Kempe et al. [36]. In this model, for any node (say $u_i$), all its incoming neighbors that are already active jointly try to activate that node. This activation is successful if the sum of the edge weights from the active incoming neighbors is greater than or equal to the node's threshold, i.e., if $\sum_{u_j \in \mathcal{N}_{in}(u_i) \cap \mathcal{A}_t} p_{ji} \ge \theta_i$, then $u_i$ becomes active at time stamp $t + 1$. This process continues until no more activation is possible. This model can accommodate negative influence, which is not possible in the IC Model. Later, several extensions of these two fundamental models have been proposed [37].
In both the IC and the LT Model, it is assumed that the diffusion probability between two users is known. However, there have since been several studies on computing diffusion probabilities [38] [39] [40] [41] [42].

• Shortest Path Model (SP Model): This is a special case of the IC Model proposed by Kimura et al. [33]. In this model, an inactive node gets a chance to become active only through the shortest path from the initially active nodes, i.e., at $t = \min_{u \in \mathcal{A}_0, v \in V(G) \setminus \mathcal{A}_0} dist(u, v)$. A slightly different variation of the SP Model proposed by the same authors is the SP1 Model, in which an inactive node gets a chance of activation at $t = \min_{u \in \mathcal{A}_0, v \in V(G) \setminus \mathcal{A}_0} dist(u, v)$ and at $t = \min_{u \in \mathcal{A}_0, v \in V(G) \setminus \mathcal{A}_0} dist(u, v) + 1$.

• Majority Threshold Model (MT Model): This is the deterministic threshold model proposed by Valente [43]. In this model, the vertex threshold is defined as $\theta_i = \lceil \frac{deg(u_i)}{2} \rceil$, which means that a node becomes active when at least half of its neighbors are already active.

• Constant Threshold Model (CT Model): This is another deterministic diffusion model, where the vertex threshold can be any value from 1 to its degree, i.e., $\theta_i \in [deg(u_i)]$.

• Unanimous Threshold Model (UT Model) [23]: This is the most influence-resistant model of diffusion. In this model, the threshold value of each node in the network is set to its degree, i.e., $\forall u_i \in V(G)$, $\theta_i = deg(u_i)$.

There are many other diffusion models, such as the weighted cascade model, where the edge weight is the reciprocal of the degree of the node, and the trivalency model, where the edge weights are drawn uniformly from a set of three fixed probability values. Readers requiring a detailed and exhaustive treatment of information diffusion models may refer to [44].

3. SIM Problem and its Variants

In the literature, the SIM Problem has been studied since the early 2000s. Initially, this problem was introduced by Domingos and Richardson in the context of viral marketing [45]. Due to its substantial practical importance across multiple domains, different variants of this problem have been introduced. In this section, we describe them one by one.
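Every variant in this section is phrased in terms of the influence function $\sigma(S)$. Under a probabilistic model such as the IC Model described in Section 2, the expected size of $\sigma(S)$ is commonly estimated by repeated simulation. The following Python sketch illustrates that idea only; it is not taken from any of the surveyed papers, and the function names (`simulate_ic`, `estimate_spread`), the adjacency-dictionary input format, and the default number of runs are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade simulation; return the final active set.

    out_edges maps a node u to a list of (v, p_uv) pairs (hypothetical format).
    Each newly activated node gets exactly one chance per inactive out-neighbor.
    Processing nodes from a queue instead of in synchronous rounds does not
    change the distribution of the final active set, because every edge is
    still tried at most once.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy network (hypothetical): a -> b -> c are certain, a -> d never succeeds.
toy = {"a": [("b", 1.0), ("d", 0.0)], "b": [("c", 1.0)]}
print(estimate_spread(toy, ["a"], runs=100))
```

With fractional edge probabilities the estimate varies with the random seed and improves as `runs` grows, which is one reason simulation-based evaluation of $\sigma(S)$ is expensive on large networks.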
Basic SIM Problem [46]: In the basic version of the TSS Problem, along with a directed social network $G(V, E, \theta, P)$, we are given two integers, $k$ and $\lambda$, and asked to find a subset of at most $k$ nodes such that at least $\lambda$ nodes are activated when the diffusion process is over. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $\lambda \in [n]$ and $k \in \mathbb{Z}^+$.
Problem: Basic TSS Problem [Find an $S \subset V(G)$ such that $|S| \le k$ and $|\sigma(S)| \ge \lambda$].
Output: The seed set for diffusion $S \subset V(G)$ with $|S| \le k$.

Top k-node Problem / Social Influence Maximization Problem (SIM Problem) [47]: This variant of the problem is the most well studied. For a given social network $G(V, E, \theta, P)$, this problem asks to choose a set $S$ of $k$ nodes (i.e., $S \subset V(G)$ and $|S| = k$) such that the maximum number of nodes of the network become influenced at the end of the diffusion process, i.e., $|\sigma(S)|$ is maximized. Most of the algorithms presented in Section 6 are developed solely for solving this problem. Mathematically, the Top k-node Problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $k \in \mathbb{Z}^+$.
Problem: Top k-node Problem [Find an $S \subset V(G)$ with $|S| = k$ such that for any other $S' \subset V(G)$ with $|S'| = k$, $|\sigma(S)| \ge |\sigma(S')|$].
Output: The seed set for diffusion $S \subset V(G)$ with $|S| = k$.

Influence Spectrum Problem [48]: In this problem, along with the social network $G(V, E, \theta, P)$, we are also given two integers, $k_{lower}$ and $k_{upper}$, with $k_{upper} > k_{lower}$. Our goal is to choose a set $S$ for each $k \in [k_{lower}, k_{upper}]$ such that the social influence in the network, $|\sigma(S)|$, is maximum in each case. Intuitively, solving one instance of this problem is equivalent to solving $(k_{upper} - k_{lower} + 1)$ instances of the SIM Problem. As viral marketing is typically done in different phases, and seed sets of different cardinalities can be used in each phase, the influence spectrum problem appears in a natural way. Mathematically, the influence spectrum problem can be written as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $k_{lower}, k_{upper} \in \mathbb{Z}^+$ with $k_{upper} > k_{lower}$.
Problem: Influence Spectrum Problem [For each $k \in [k_{lower}, k_{upper}]$, find an $S \subset V(G)$ with $|S| = k$ such that for any other $S' \subset V(G)$ with $|S'| = k$, $|\sigma(S)| \ge |\sigma(S')|$].
Output: The seed set for diffusion $S \subset V(G)$ with $|S| = k$, for each $k \in [k_{lower}, k_{upper}]$.

λ Coverage Problem [47]: This is another variant of the SIM Problem, which considers the minimum number of influenced nodes required at the end of diffusion. For a given social network $G(V, E, \theta, P)$ and a constant $\lambda \in [n]$, this problem asks to find a subset $S$ of its nodes with minimum cardinality such that at least $\lambda$ nodes are influenced at the end of the diffusion process. Mathematically, this problem can be described in the following way:

Instance: A directed graph $G(V, E, \theta, P)$ and $\lambda \in [n]$.
Problem: λ Coverage Problem [Find a minimum cardinality subset $S \subset V(G)$ such that $|\sigma(S)| \ge \lambda$].
Output: The minimum cardinality seed set $S$ for diffusion.

Weighted Target Set Selection Problem (WTSS Problem) [49]: This is another (in fact, weighted) variant of the SIM Problem. Along with a social network $G(V, E, \theta, P)$, we are given another vertex weight function, $\phi : V(G) \to \mathbb{N}$, signifying the cost associated with each vertex. This problem asks to find a subset $S$ that minimizes the total selection cost such that all the nodes are influenced at the end of diffusion. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and a vertex cost function $\phi : V(G) \to \mathbb{N}$.
Problem: Weighted TSS Problem [Find a subset $S \subset V(G)$ such that $\phi(S)$ is minimum and $|\sigma(S)| = n$].
Output: The seed set for diffusion $S \subset V(G)$ with minimum $\phi(S)$ value.

r-round min-TSS Problem [50]: This is a variant of the SIM Problem that considers the number of rounds required to complete the diffusion process. Along with a directed graph $G(V, E, \theta, P)$, we are given the maximum number of allowable rounds $r \in \mathbb{Z}^+$, and asked to find a minimum cardinality seed set $S$ that activates all the nodes of the network within $r$ rounds. Mathematically, this problem can be described as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $r \in \mathbb{Z}^+$.
Problem: r-round min-TSS Problem [Find a minimum cardinality subset $S$ such that $\bigcup_{i=1}^{r} \sigma_i(S) = V(G)$].
Output: The seed set for diffusion $S \subset V(G)$.

Here, $\sigma_i(S)$ denotes the set of nodes influenced from the seed set $S$ at the $i$-th round of diffusion.

Budgeted Influence Maximization Problem (BIM Problem) [51]: This is another variant of the SIM Problem, which has recently been gaining popularity. Along with a directed graph $G(V, E, \theta, P)$, we are given a cost function $C : V(G) \to \mathbb{Z}^+$ and a fixed budget $B \in \mathbb{Z}^+$. The cost function $C$ assigns a nonuniform selection cost to every vertex of the network, which is the amount of incentive that needs to be paid if that vertex is selected as a seed node. This problem asks for a seed set within the budget that maximizes the spread of influence in the network.

Instance: A directed graph $G(V, E, \theta, P)$, a cost function $C : V(G) \to \mathbb{Z}^+$ and an affordable budget $B \in \mathbb{Z}^+$.
Problem: Budgeted Influence Maximization Problem [Find a seed set $S$ such that $\sum_{u \in S} C(u) \le B$ and for any other seed set $S'$ with $\sum_{v \in S'} C(v) \le B$, $|\sigma(S)| \ge |\sigma(S')|$].
Output: The seed set for diffusion $S \subset V(G)$ with $\sum_{u \in S} C(u) \le B$.

$(\lambda, \beta, \alpha)$ TSS Problem [52]: This is another variant of the TSS Problem, which considers the maximum cardinality of the seed set ($\beta$), the maximum allowable number of diffusion rounds ($\lambda$), and the number of influenced nodes at the end of the diffusion process ($\alpha$) all together. Along with the input graph $G(V, E, \theta, P)$, we are given the parameters $\lambda$, $\beta$ and $\alpha$. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, and three parameters $\lambda, \beta \in \mathbb{N}$ and $\alpha \in [n]$.
Problem: $(\lambda, \beta, \alpha)$ TSS Problem [Find a subset $S \subset V(G)$ such that $|S| \le \beta$ and $|\bigcup_{i=1}^{\lambda} \sigma_i(S)| \ge \alpha$].
Output: The seed set for diffusion $S \subset V(G)$ with $|S| \le \beta$.

$(\lambda, \beta, A)$ TSS Problem [52]: This is slightly different from the $(\lambda, \beta, \alpha)$ TSS Problem: instead of the required number of influenced nodes after the diffusion process, it explicitly specifies which nodes should be influenced. Along with the input social network $G(V, E, \theta, P)$, we are also given the maximum allowable number of rounds ($\lambda$), the maximum cardinality of the seed set ($\beta$), and the set of nodes $A \subseteq V(G)$ that need to be influenced at the end of the diffusion process. This problem asks for a seed set of at most $\beta$ elements that influences all the nodes in $A$ within $\lambda$ rounds of diffusion. Mathematically, the problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $A \subseteq V(G)$ and two parameters $\lambda, \beta \in \mathbb{N}$.
Problem: $(\lambda, \beta, A)$ TSS Problem [Find a subset $S \subset V(G)$ such that $|S| \le \beta$ and $A \subseteq \bigcup_{i=1}^{\lambda} \sigma_i(S)$].
Output: The seed set for diffusion $S \subset V(G)$ with $|S| \le \beta$.

$(\lambda, A)$ TSS Problem [52]: This is slightly different from the $(\lambda, \beta, A)$ TSS Problem. Here, we are interested in finding a minimum cardinality seed set such that a given subset of the nodes ($A$) is influenced within a fixed number of diffusion rounds ($\lambda$). Mathematically, the problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $A \subset V(G)$ and $\lambda \in \mathbb{N}$.
Problem: $(\lambda, A)$ TSS Problem [Find a subset $S$ such that $A \subseteq \bigcup_{i=1}^{\lambda} \sigma_i(S)$ and for any other $S'$ with $|S'| < |S|$, $A \not\subseteq \bigcup_{i=1}^{\lambda} \sigma_i(S')$].
Output: The minimum cardinality seed set for diffusion $S \subset V(G)$.

We have described the different variants of the TSS Problem in social networks available in the literature. It is surprising to see that only the Top k-node Problem has been studied in depth.
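To make the difference between two of these formulations concrete, the sketch below solves the Top k-node Problem and the λ Coverage Problem by exhaustive search on a toy graph. It assumes a deterministic threshold rule in which a node becomes active once the number of its active incoming neighbors reaches its integer threshold, and it is exponential in the number of nodes, so it is only meant for very small instances. The function names and the example graph are hypothetical and do not come from the surveyed literature.

```python
from itertools import combinations


def threshold_spread(in_nbrs, theta, seeds):
    """Deterministic threshold diffusion; returns the final active set.

    in_nbrs maps each node to the list of its incoming neighbors and theta
    maps each node to an integer threshold (assumed input format).
    Activation is monotone, so sweeping until nothing changes yields the
    same final set as round-by-round updates.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in in_nbrs:
            if v not in active and sum(u in active for u in in_nbrs[v]) >= theta[v]:
                active.add(v)
                changed = True
    return active


def top_k(in_nbrs, theta, k):
    """Top k-node Problem: a size-k seed set with maximum spread."""
    nodes = sorted(in_nbrs)
    return max(combinations(nodes, k),
               key=lambda s: len(threshold_spread(in_nbrs, theta, s)))


def lambda_coverage(in_nbrs, theta, lam):
    """Lambda Coverage Problem: a smallest seed set influencing >= lam nodes."""
    nodes = sorted(in_nbrs)
    for size in range(len(nodes) + 1):
        for s in combinations(nodes, size):
            if len(threshold_spread(in_nbrs, theta, s)) >= lam:
                return set(s)
    return None


# Hypothetical example: c needs both a and b; e has no incoming neighbor.
example_in = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["c"], "e": []}
example_theta = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
print(top_k(example_in, example_theta, 1))
print(lambda_coverage(example_in, example_theta, 5))
```

The two solvers share the same diffusion routine and differ only in what is fixed (the seed set size or the required coverage) and what is optimized, which mirrors the Instance/Problem/Output statements above.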
4. Hardness Results of TSS Problem
In this section, we describe hardness results of the SIM Problem from both the general and the parameterized complexity theoretic perspectives. Initially, the problem of social influence maximization was posed by Domingos and Richardson [45] [53] in the context of viral marketing. However, Kempe et al. [36] were the first to investigate the computational issues of the problem. They showed that the Set Cover Problem and the Vertex Cover Problem are special cases of the SIM Problem under the IC and LT Models, respectively. Since both set cover and vertex cover are well-known NP-Hard problems [27], the SIM Problem is NP-Hard under both models. The conclusion is presented as Theorem 1.
Theorem 1. [36] The Social Influence Maximization Problem is NP-Hard for both the IC and the LT model, and is also NP-Hard to approximate within a factor of $n^{1-\epsilon}$ for all $\epsilon > 0$.

Chen [23] studied a variant of the SIM Problem, namely the λ Coverage Problem. His study differs from that of Kempe et al. [36] in two ways. First, Kempe et al. [36] investigated the Top k-node Problem, whereas Chen [23] studied the λ Coverage Problem. Second, Kempe et al. [36] studied the diffusion process under the IC and LT Models, which are probabilistic in nature, whereas Chen [23] considered deterministic diffusion models such as the majority threshold model, the constant threshold model and the unanimous threshold model. In general, for the λ Coverage Problem, Chen [23] came up with a seminal result, presented in Theorem 2.
Theorem 2. [23] The TSS Problem cannot be approximated within a factor of $O(2^{\log^{1-\epsilon} n})$ for any fixed constant $\epsilon > 0$, unless $NP \subset DTIME(n^{polylog(n)})$.

This theorem can be proved by a reduction from the Minimum Representative Problem given in [54]. Next, Chen [23] showed that under the majority threshold model too, the λ Coverage Problem satisfies a result similar to the one presented in Theorem 2. However, when $\theta(u) = 1$ $\forall u \in V(G)$, the TSS Problem can be solved easily, as targeting one node in each component results in the activation of all the nodes of the network. Surprisingly, the problem becomes hard when we allow the vertex thresholds to be at most 2, i.e., $\theta(u) \le 2$ $\forall u \in V(G)$. The following result was proved in this regard.

Theorem 3. [23] The TSS Problem is NP-Hard when thresholds are at most 2, even for bounded bipartite graphs.
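The easy case $\theta(u) = 1$ mentioned above can be sketched in a few lines. The code assumes an undirected graph given as an adjacency dictionary (an assumption for illustration; the function names are hypothetical): it picks one seed per connected component and then checks that a diffusion with unit thresholds reaches every node.

```python
def one_seed_per_component(adj):
    """Pick the first-seen node of every connected component as a seed."""
    seen, seeds = set(), []
    for start in adj:
        if start in seen:
            continue
        seeds.append(start)
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of the component of `start`
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seeds


def spread_with_unit_thresholds(adj, seeds):
    """With theta(u) = 1, one active neighbor suffices to activate a node."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in active:
                active.add(v)
                frontier.append(v)
    return active


# Hypothetical graph with two components: 1-2-3 and 4-5.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
seeds = one_seed_per_component(graph)
print(seeds, spread_with_unit_thresholds(graph, seeds) == set(graph))
```

Fewer seeds than components cannot work, since no influence crosses between components; this is why the number of components is exactly the optimum in this special case.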
This theorem can be proved by a reduction from a variant of the 3-SAT Problem presented in [55]. Moreover, Chen [23] has shown that under the unanimous threshold model, the TSS Problem is equivalent to the vertex cover problem, which is a well-known NP-Complete problem.
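The equivalence with vertex cover can be checked exhaustively on a small example. The sketch below assumes a simple undirected graph stored as an adjacency dictionary and the deterministic rule that a node becomes active once the number of its active neighbors reaches its threshold; all names are hypothetical. The intuition is that a non-seed node with threshold $deg(u)$ can only be activated when all of its neighbors are active, so an edge with two non-seed end points can never be activated from either side.

```python
from itertools import combinations


def activates_all(adj, seeds):
    """True if `seeds` activates every node under unanimous thresholds."""
    theta = {v: len(adj[v]) for v in adj}  # theta(u) = deg(u)
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v not in active and sum(u in active for u in adj[v]) >= theta[v]:
                active.add(v)
                changed = True
    return len(active) == len(adj)


def is_vertex_cover(adj, seeds):
    """True if every edge has at least one end point in `seeds`."""
    chosen = set(seeds)
    return all(u in chosen or v in chosen for u in adj for v in adj[u])


def equivalence_holds(adj):
    """Compare both predicates on every subset of nodes (exponential)."""
    nodes = list(adj)
    return all(
        activates_all(adj, s) == is_vertex_cover(adj, s)
        for size in range(len(nodes) + 1)
        for s in combinations(nodes, size)
    )


# Hypothetical example: a cycle on four nodes.
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
print(equivalence_holds(cycle))
```

Because the check enumerates all subsets, it is only a sanity test for tiny graphs and says nothing about how to solve either problem efficiently.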
Theorem 4. [23] If all the vertex thresholds of the graph are unanimous (i.e., $\forall u \in V(G)$, $\theta(u) = deg(u)$), then the TSS Problem is identical to the vertex cover problem.

Chen [23] has also shown that if the underlying graph is a tree, then the TSS Problem can be solved in polynomial time, and has given the ALG-Tree algorithm, which performs this computation. To the best of the authors' knowledge, there is no other literature that focuses on the hardness analysis of the TSS Problem from the traditional complexity theoretic perspective. We have summarized the results in Table 2.

Now, we describe the hardness results from the parameterized complexity theoretic perspective. For basic notions of parameterized complexity, readers may refer to [56]. Bazgan et al. [57] showed that the SIM Problem under the constant threshold model (CTM) does not have any parameterized approximation algorithm with respect to the parameter seed set size. Chopin et al. [58], [59] studied the TSS Problem in parameterized settings with respect to parameters related to network cohesiveness, like clique cover number (number of cliques required to cover all the vertices of the network [60]), distance to clique (number of vertices that need to be deleted to obtain a clique), and cluster vertex deletion number (number of vertices to delete in order to obtain a collection of disjoint cliques); parameters related to network density, like distance to cograph and distance to interval graph; and parameters related to the sparsity of the network, namely vertex cover number (number of vertices to remove to obtain an edgeless graph), feedback edge set number and feedback vertex set number (number of edges or vertices to remove to obtain a forest), pathwidth, and bandwidth. It is interesting to note that computing each of these parameters, except the feedback edge set number, is an NP-Hard problem. The version of the TSS Problem they worked with is the λ Coverage Problem with $\lambda = n$. They came up with the following two important results related to the sparsity parameters of the network:

Theorem 5. [58] The TSS Problem with the majority threshold model is W[1]-Hard even with respect to the combined parameter feedback vertex set number, distance to cograph, distance to interval graph, and pathwidth.
Theorem 6. [58] TSS Problem is fixed-parameter tractable with respect to theparameter bandwidth.
For proving the above two theorems, the authors used reduction rules from [61] and [62]. Results related to the dense structure properties of the network are given in Theorems 7 through 9.

Theorem 7. The TSS Problem is W[1]-Hard with respect to the parameter cluster vertex deletion number.

Theorem 8. The TSS Problem is NP-Hard and W[2]-Hard with respect to the parameter target set size ($k$), even on graphs with clique cover number two.

Theorem 9. The TSS Problem is fixed-parameter tractable with respect to the parameter 'distance $l$ to clique', if the threshold function satisfies the following property: $\theta(u) > g(l) \Rightarrow \theta(u) = f(\Gamma(u))$ $\forall u \in V(G)$, where $f : \mathcal{P}(V(G)) \to \mathbb{N}$ and $g : \mathbb{N} \to \mathbb{N}$.

For detailed proofs of Theorems 7 through 9, readers may refer to [58]. All the results related to parameterized complexity theory are summarized in Table 3.
5. Major Research Challenges
Before entering into the critical review of the existing solution methodologies, in this section, we provide a brief discussion of the major research challenges concerning the SIM Problem. This will help the reader to understand which category of solution methodology can handle which challenge.
• Trade-off Between Accuracy and Computational Time: From the discussion in Section 4, it is now well understood that the SIM Problem is, in general, computationally hard from both the traditional and the parameterized complexity-theoretic perspectives. Hence, for a given k ∈ Z+, obtaining the most influential k nodes within feasible time is, in general, not possible. In this scenario, the intuitive approach is to use some heuristic method for selecting seed nodes. This leads to less time for seed set generation. However, the number of nodes influenced by the selected seed nodes could then be arbitrarily small. It is therefore important to design algorithms that run in affordable time while keeping the gap between the optimal spread and the spread due to the selected seed set as small as possible.

| Name of the Problem | Diffusion Model | Major Findings |
|---|---|---|
| SIM | IC Model | A special case of set cover problem and hence NP-Hard. |
| SIM | LT Model | A special case of vertex cover problem and hence NP-Hard. |
| λ-Coverage Problem | MT Model | Not only NP-Hard, but also cannot be approximated within a factor of $O(2^{\log^{1-\epsilon} n})$ unless $NP \subset DTIME(n^{\mathrm{polylog}(n)})$. |
| λ-Coverage Problem | CT Model with θ(u) = 1, ∀ u ∈ V(G) | Can be solved trivially by selecting a vertex from each component of the network. |
| λ-Coverage Problem | CT Model with θ(u) ≤ …, ∀ u ∈ V(G) | NP-Hard even for bounded bipartite graphs. |
| λ-Coverage Problem | UT Model | Identical to vertex cover problem and hence NP-Hard. |

Table 2: Hardness results of TSS Problem and its variants in traditional complexity theory perspective.

| Name of the Problem | Diffusion Model | Parameter | Major Findings |
|---|---|---|---|
| SIM | CT Model with θ(u) ∈ [deg(u)] | Seed set size | Does not have any parameterized approximation algorithm. |
| λ-coverage Problem with λ = n | MT Model | Feedback vertex set number, Pathwidth, Distance to cograph, Distance to interval graph | The problem is W[1]-Hard. |
| λ-coverage Problem with λ = n | GT Model | Cluster vertex deletion number | The problem is W[1]-Hard. |
| λ-coverage Problem with λ = n | CT Model | Cluster vertex deletion number | The problem is fixed-parameter tractable. |
| λ-coverage Problem with λ = n | GT Model | Seed set size | The problem is W[2]-Hard. |
| λ-coverage Problem with λ = n | MT Model, CT Model | Distance to clique | The problem is fixed-parameter tractable. |

Table 3: Hardness results of TSS Problem and its variants in parameterized complexity theory perspective.

• Breaking the Barrier of Submodularity:
In general, the social influence function σ(.) is submodular (discussed in Section 6.1). However, in many practical situations, such as opinion-specific and topic-specific influence maximization, the social influence function may not be submodular [63] [64]. This happens because a node can switch its state from positive opinion to negative opinion and vice versa. In this scenario, solving the SIM Problem is more challenging because the submodularity property of the social influence function is absent.
• Practicality of the Problem:
In general, the SIM Problem makes many assumptions, for example, that every selected seed will perform up to expectation in the spreading process, and that influencing each node of the network is equally important. These assumptions may be unrealistic in some situations. Consider the case of target advertisement, where, instead of all the nodes, a set of target nodes is chosen and the aim is to maximize the influence within the target nodes [65] [66]. Likewise, due to the probabilistic nature of diffusion, a seed node may not perform up to expectation in the influence spreading process. Solving the SIM Problem and its variants becomes more challenging if we relax these assumptions.
• Scalability:
Real-life social networks have millions of nodes and billions of edges. So, for solving the SIM and related problems on real-life social networks, scalability is an important issue for any solution methodology.
• Theoretical Challenges:
For a computational problem, any solution methodology is concerned with two aspects. The first is the computational time, which is measured as the execution time when the methodology is implemented and run on real-life problem instances. The second is the computational complexity, which is measured as the asymptotic bound of the methodology. Theoretical research on any computational problem is always concerned with the second aspect. Hence, the theoretical challenge for the SIM Problem is to design algorithms with good asymptotic bounds.
Figure 2: Proposed taxonomy for classifying the solution methodologies.
6. Solution Methodologies
Due to the inherent hardness of the SIM Problem, over the years researchers have developed algorithms that find a seed set achieving near-optimal influence spread. In this section, the solution methodologies available in the literature are described. First, we describe our proposed taxonomy for classifying them. Figure 2 gives a diagrammatic representation of the proposed taxonomy, and its categories are described below.
• Approximation algorithms with provable guarantee : Algorithms in this category give a worst-case bound on the influence spread. However, most of them suffer from scalability issues, meaning that the running time grows heavily with the network size. Many of the algorithms in this category have near-optimal asymptotic bounds.
• Heuristic solutions : Algorithms of this category do not give any worst-case bound on the influence spread. However, most of them are more scalable and have better running time than the algorithms of the previous category.
• Meta-heuristic solutions : Methodologies of this category are meta-heuristic optimization algorithms, and many of them are based on evolutionary computation techniques. These algorithms also do not give any worst-case bound on the influence spread.
• Community-Based Solutions : Algorithms of this category use community detection on the underlying social network as an intermediate step to bring the problem down to the community level and thereby improve scalability. Most of the algorithms of this category are heuristic and hence do not provide any worst-case bound on the influence spread.
• Miscellaneous : Algorithms of this category do not share any particular property, and hence, we put them under this heading.
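To illustrate the difference between the first two categories, the following is a minimal sketch that contrasts a greedy selector driven by Monte Carlo spread estimates under the IC Model with a simple maximum-degree heuristic. The graph representation (a dict of out-edges with probabilities), the function names, and the tiny example network are illustrative assumptions, not taken from any of the surveyed papers.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # each newly active u gets one chance to activate v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, nodes, k, runs=200, seed=0):
    """Repeatedly add the node with the largest estimated spread gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for u in nodes:
            if u in chosen:
                continue
            s = estimate_spread(graph, chosen + [u], runs, rng)
            if s > best_spread:
                best, best_spread = u, s
        chosen.append(best)
    return chosen

def max_degree_seeds(graph, nodes, k):
    """Heuristic: pick the k nodes with the highest out-degree."""
    return sorted(nodes, key=lambda u: len(graph.get(u, [])), reverse=True)[:k]

# Hypothetical network; all probabilities are 1.0 so runs are deterministic.
nodes = list(range(7))
graph = {
    0: [(1, 1.0), (2, 1.0), (3, 1.0)],
    4: [(1, 1.0), (2, 1.0)],   # overlaps with the audience of node 0
    5: [(6, 1.0)],
}
```

On this example the degree heuristic picks the two highest-degree nodes even though their audiences overlap, whereas the greedy selector accounts for the marginal gain of each candidate. The price is that the greedy selector runs many simulations per candidate per iteration, which is the scalability issue noted above.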
Kempe et al. [36] [67] [68] were the first to study the problem of social influence maximization as a combinatorial optimization problem, and they investigated its computational issues under two diffusion models, namely the LT and IC models. In their studies, they assumed that the social influence function σ() is submodular and monotone. The function $\sigma : 2^{V(G)} \rightarrow \mathbb{R}^{+}$ is submodular if it satisfies the diminishing returns property, which means that ∀ S ⊂ T ⊂ V(G) and $u_i \in V(G) \setminus T$, $\sigma(S \cup \{u_i\}) - \sigma(S) \geq \sigma(T \cup \{u_i\}) - \sigma(T)$; and σ is monotone if, for any S ⊂ V(G) and ∀ $u_i \in V(G) \setminus S$, $\sigma(S \cup \{u_i\}) \geq \sigma(S)$. They proposed a greedy strategy for selecting the seed set, presented in Algorithm 1.

Algorithm 1: Kempe et al.'s [36] Greedy Algorithm for Seed Set Selection (Basic Greedy).
Data: Given Social Network G(V, E, θ, P) and some k ∈ Z+.
Result: Seed Set for diffusion S ⊂ V(G).
  S ← φ;
  for i = 1 to k do
      u = $\mathrm{argmax}_{u_i \in V(G) \setminus S}\ \sigma(S \cup \{u_i\}) - \sigma(S)$;
      S ← S ∪ {u};
  return S

Starting with the empty seed set (S), Algorithm 1 iteratively selects the node that is currently not in S and whose inclusion in S causes the maximum marginal increment in σ(). Let $S_i$ denote the seed set at the i-th iteration of the 'for' loop in Algorithm 1. In the (i+1)-th iteration, $S_{i+1} = S_i \cup \{u\}$, where u maximizes $\sigma(S_i \cup \{u\}) - \sigma(S_i)$ among all $u \in V(G) \setminus S_i$. This iterative process is continued until the allowed cardinality of S is reached. Kempe et al. [36] established the following approximation guarantee for Algorithm 1.
Theorem 10. Algorithm 1 provides a $(1 - \frac{1}{e} - \epsilon)$ factor approximation bound, with ε > 0, for the SIM Problem; i.e., if $S^{*}$ is the k-element optimal seed set, then $\sigma(S) \geq (1 - \frac{1}{e}) \cdot \sigma(S^{*})$, where $e = \sum_{x=0}^{\infty} \frac{1}{x!}$.
Though Algorithm 1 gives a good approximation bound on the influence spread, it suffers from two major shortcomings. First, for any given seed set S, exact computation of the influence spread (i.e., σ(S)) is #P-Hard. Hence, they approximated the influence spread by running a large number of Monte Carlo Simulations (MCS), counting the total number of influenced nodes over all simulation runs, and dividing by the number of runs. However, recently Maehara et al. [69] developed the first procedure for exact computation of influence spread using binary decision diagrams. Second, the number of times the influence function σ(.) needs to be evaluated is quite large. Selecting a seed set of size k with R MCS runs in a social network having n nodes and m edges requires O(kmnR) time for influence function evaluations. Hence, application of this algorithm even to a medium-size network (consisting of only 15000 nodes, though real-life networks are much larger) appears to be unrealistic [70], which means that the algorithm is not scalable enough.
In spite of these drawbacks, Kempe et al.'s [36] study is considered to be the foundational work on the SIM Problem. This study has triggered a vast amount of research in this direction. In most cases, the main focus was to reduce the scalability problem incurred by the Basic Greedy Algorithm of Kempe et al. Some of these studies ended up with heuristics, in which the obtained solution could be far from the optimum. Still, there are a few studies in which the scalability problem was reduced significantly without losing the approximation ratio. Here, we list the algorithms that provide an approximation guarantee, whereas in Section 6.2, we describe the heuristic methods.
• CELF : To improve scalability, Leskovec et al. [11] proposed a
Cost Effective Lazy Forward (CELF) scheme by exploiting the submodularity property of the social influence function. The key idea in their study was that, for any node, its marginal gain in influence spread in the current iteration cannot be more than its marginal gain in the previous iterations. Using this idea, they were able to drastically reduce the number of evaluations of the influence estimation function σ(.), which leads to a significant improvement in running time, though the asymptotic complexity remains the same as that of the Basic Greedy Algorithm (i.e., O(kmnR)). Results reported in their paper show that CELF can speed up the computation by up to 700 times compared to the Basic Greedy Algorithm on benchmark data sets. This algorithm is also applicable in many other contexts, such as finding informative blogs in a web blog network and optimal placement of sensors in a water distribution network for detecting outbreaks.
• CELF++ : Goyal et al. [71] proposed an optimized version of CELF by exploiting the submodularity property of the social influence function and named it CELF++. For each node u of the network, CELF++ maintains a table of the form <u.mg1, u.prev_best, u.mg2, u.flag>, where u.mg1 is the marginal gain of u in σ(.) for the current S; u.prev_best is the node with the maximum marginal gain among the users scanned till now in the current iteration; u.mg2 is the marginal gain in σ(.) for u with respect to S ∪ {prev_best}; and u.flag is the iteration number when u.mg1 was last updated. The idea is that if u.prev_best is included in the seed set in the current iteration, then the marginal gain of u in σ(.) with respect to S ∪ {prev_best} need not be recomputed in the next iteration. Reported results showed that CELF++ is 35-55% faster than CELF, though the asymptotic complexity remains the same.
• Static Greedy : Cheng et al. [72] developed this algorithm for solving the SIM Problem, which provides both guaranteed accuracy and high scalability. This algorithm works in two stages.
In the first stage, R Monte Carlo snapshots are taken from the social network, where each edge (uv) is selected based on the associated diffusion probability $p_{uv}$. In the second stage, starting from the empty seed set, the node having the maximum average marginal gain in influence spread over all sampled snapshots is selected as a seed node. This process is continued until k nodes are selected. This algorithm has a running time of $O(Rm + kRm'n)$ and a space requirement of $O(Rm')$, where R and $m'$ are the number of Monte Carlo samples and the average number of active edges in the snapshots, respectively. Reported results show that Static Greedy reduces the computational time by two orders of magnitude, while achieving better influence spread compared to the Degree Discount Heuristic (DDH), Maximum Degree Heuristic (MDH), and Prefix excluding Maximum Influence Arborescence (PMIA) algorithms (discussed in Section 6.2).
• Borgs et al.'s Method:
Borgs et al. [73] proposed a completely different approach for solving the SIM Problem under the IC Model using a reverse reachable sampling technique. Unlike MCS runs, this is a new approach for estimating the influence spread. Their algorithm is randomized, succeeds with constant probability, and has a running time of $O((m + n)\epsilon^{-3} \log n)$, which improves on the previously best known algorithm having a complexity of $O(mnk\,\mathrm{POLY}(\epsilon^{-1}))$. The algorithm proposed by Borgs et al. is near-optimal, since the lower bound is Ω(m + n). This algorithm works in two phases. In the first phase, a hypergraph (H) is generated stochastically from the input social network. The second phase is concerned with the seed set selection. This is done by repeatedly choosing the node with the maximum degree in H and deleting it along with its incident edges from H. The k-element set obtained in this way is the seed set for diffusion. This work is mostly theoretical and lacks practical experimentation.
• Zohu et al.'s Method : Zohu et al. [74] improved the approximation bound from $(1 - \frac{1}{e})$ (which is approximately 0.63) to 0.857. They designed two approximation algorithms: the first works for the problem where the cardinality of the seed set (S) is not restricted, and the second works when there is an upper bound on the cardinality of the seed set. They formulated the influence maximization problem as the optimization problem given below.
$\max_{S \subset V(G)} \sum_{u \in S,\ v \in V(G) \setminus S} p_{uv}$,   (1)
where $p_{uv}$ is the influence probability between the users u and v. They converted this optimization problem into a quadratic integer programming problem and solved it using the concept of semidefinite programming [75].
• SKIM : Cohen et al. [76] proposed a
Sketch-Based Influence Maximization (SKIM) algorithm, which improves the Basic Greedy Algorithm by ensuring that in every iteration, with sufficiently high probability, or in expectation, the node chosen to be added to the seed set has a marginal gain that is close to the maximum one. The running time of this algorithm is $O(nl + \sum_{i=1}^{l} \lvert E_i \rvert + m\epsilon^{-2} \log^{2} n)$, where l is the number of snapshots of G and $E_i$ is the edge set of $G_i$. Reported results show that SKIM scales better than Basic Greedy, Two-phase Influence Maximization (TIM), Influence Ranking and Influence Estimation (IRIE), etc., without compromising influence spread.
• TIM : Tang et al. [77] developed a
Two-phase Influence Maximization (TIM) algorithm, which has an expected running time of $O((k + l)(n + m) \log n / \epsilon^{2})$ and succeeds with probability at least $(1 - n^{-l})$ for given k, ε and l. As its name suggests, this algorithm has two phases. In the first phase, TIM computes a lower bound on the maximum expected influence spread among all k-sized sets and uses this lower bound to estimate a parameter φ. In the second phase, φ reverse reachability (RR) set samples are drawn from the social network. Then, it derives a k-sized seed set that covers the maximum number of RR sets and returns it as the final result. Reported results show that TIM is two times faster than CELF++ and Borgs et al.'s [73] Method, while achieving the same influence spread. To improve the running time of TIM, Tang et al. [77] proposed a heuristic that takes as input all the RR sets generated in an intermediate step of the second phase of TIM. Then, it uses a greedy approach for the maximum coverage problem to select the seed set. This modified version of TIM is named TIM+. Reported results showed that TIM+ is two times faster than TIM.
• IMM : Tang et al. [78] proposed
Influence Maximization via Martingales (IMM) (a martingale is a kind of stochastic process in which, given the current and preceding values, the conditional expectation of the next value is the current value itself), which achieves an $O((k + l)(n + m) \log n / \epsilon^{2})$ expected running time and returns a $(1 - \frac{1}{e} - \epsilon)$ factor approximate solution with probability $(1 - n^{-l})$. The IMM Algorithm also has two phases, like TIM and TIM+. The first phase is concerned with sampling RR sets from the given social network, and the second phase is concerned with the seed set selection. Unlike in TIM and TIM+, the RR sets generated in the first phase of IMM are dependent, because the (i + 1)-th RR set is generated based on whether the first i RR sets satisfy the stopping criterion or not. In IMM, the RR sets generated in the sampling phase are reused in the node selection phase, which is not the case in TIM or TIM+. In this way, IMM can eliminate a lot of unnecessary computation, which leads to a significant improvement in running time, though the asymptotic complexity remains the same as that of TIM. Reported results conclude that IMM outperforms TIM, TIM+ and IRIE (described in Section 6.2) in terms of running time, while achieving comparable influence spread.
• Stop-and-Stare : Nguyen et al. [79] developed the Stop-and-Stare Algorithm (SSA) and its dynamic version DSSA for
Topic-aware Viral Marketing (TVM) problem. We have not discussed this problem, as it comes under topic-aware influence maximization. However, this solution methodology can be used for solving the SIM Problem with minor modification. They showed that the number of RR set samples used by their algorithms is asymptotically minimum. As a result, Stop-and-Stare is reported to be 1200 times faster than the state-of-the-art IMM algorithm. We are not discussing the results, as they are for the TVM problem and out of the scope of this survey.
• BCT : Recently, Nguyen et al. [80] proposed
Billion-scale Cost-aware Targeted (BCT) algorithm for solving the cost-aware targeted viral marketing (CTVM) problem introduced by them. We have not discussed this problem, as it comes under topic-aware influence maximization. However, this solution methodology can be adopted for solving the SIM Problem as well under both the IC and LT Models, with running times of $O((k + l)(n + m) \log n / \epsilon^{2})$ and $O((k + l) n \log n / \epsilon^{2})$, respectively. We are not discussing the results, as they are for the CTVM Problem and out of the scope of this survey.
• Nguyen et al.'s Method : Nguyen et al. [51] studied the
Budgeted Influence Maximization Problem described in Section 3. They formulated the following optimization problem in the context of Budgeted Influence Maximization:
$\max\ \sigma(S)$   (2)
subject to
$\sum_{u \in S} C(u) \leq B$   (3)
Now, if ∀ u ∈ V(G), C(u) = 1, then it becomes the SIM Problem. To solve this problem, they proposed two algorithms. The first is a modification of the basic greedy algorithm proposed by Kempe et al. [36] (Algorithm 1), and the second was adopted from [81]. In the first algorithm, ∀ u ∈ V(G) \ S, they computed the increment of influence per unit cost as follows:
$\delta(u) = \frac{\sigma(S \cup \{u\}) - \sigma(S)}{C(u)}$   (4)
The algorithm then chooses u for inclusion in the seed set (S) if it maximizes this ratio and also satisfies C(S ∪ {u}) ≤ B. This iterative process is continued until no more nodes can be added within the budget. However, this algorithm does not give any constant approximation ratio. It can be modified to obtain a constant approximation ratio, as given in Algorithm 2.

Algorithm 2: Nguyen et al.'s [51] Greedy Algorithm for BIM Problem.
Data: Given Social Network G(V, E, θ, P), cost function C : V(G) → Z+, and some B ∈ Z+.
Result: Seed Set for diffusion S ⊂ V(G).
  S = result of Naive Greedy;
  $S_{max} = \mathrm{argmax}_{u \in V(G)}\ \sigma(u)$;
  S = argmax(σ(S), σ($S_{max}$));
  return S

Theorem 11. Algorithm 2 guarantees a $(1 - \frac{1}{\sqrt{e}})$ approximate solution for the BIM Problem.
… ≻ CELF ≻ CELF++ ≻ Static Greedy.
Another scope of improvement in Kempe et al.'s [36] work was estimating the influence spread by some method other than the heavily time-consuming MCS runs. Borgs et al. [73] explored this scope by proposing a drastically different approach for spread estimation, namely the reverse reachable sampling technique. The algorithms that used this method (such as TIM, TIM+, IMM) were seen to be much faster than CELF++ while having competitive influence spread. Among TIM, TIM+, and IMM, IMM was found to be the fastest, both theoretically (in terms of computational complexity) and empirically (in terms of computational time from experimentation), due to the reuse of the RR sets in the node selection phase. To the best of the authors' knowledge, IMM is the fastest algorithm that was solely proposed for solving the SIM Problem. However, the BCT Algorithm proposed by Nguyen et al. [80], which was originally proposed for solving the CTVM problem, is the fastest solution methodology available in the literature that can be adopted for solving the SIM Problem.
Now, from this discussion, it is important to note that the scalability problem incurred by the Basic Greedy Algorithm has been reduced by subsequent research. However, as social network data sets have become gigantic, development of highly scalable algorithms remains the thrust area. The solution methodologies described till now have been summarized in Table 4. For algorithms whose complexity analysis was not done by the author(s), we have left that column of the table blank.

Algorithms of this category do not provide any approximation bound on the influence spread but have better running time and scalability. Here, we describe the heuristic solution methodologies from the literature.
• Random Heuristic : To select a seed set by this method, randomly pick k nodes of the network and return them as the seed set. In Kempe et al.'s [36] experiments, this method was used as a baseline.
• Centrality-Based Heuristics : Centrality is a well-known measure in network analysis, which signifies how important a node is in the network [84] [85]. Many centrality-based heuristics have been proposed in the literature for the SIM Problem, such as the Maximum Degree Heuristic (MDH) (select the k highest-degree nodes as seed nodes), the High Clustering Coefficient Heuristic (HCH) (select the k nodes with the highest clustering coefficient value) [86] [87], and the High PageRank Heuristic [88] (select the k nodes with the highest PageRank value).
• Degree Discount Heuristic (DDH): This is basically a modified version of MDH and was proposed by Chen et al. [70]. The key idea behind this method is the following: for any two nodes u, v ∈ V(G) with (uv) ∈ E(G), if u has already been selected as a seed node by MDH, then the edge (uv) should not be considered while counting the degree of v. Hence, due to the presence of u in the seed set, the degree of v is discounted by 1. This method is also named the Single Discount Heuristic (SDH). Experimental results of [70] show that DDH can achieve better influence spread than MDH.

| Name of the Algorithm | Proposed By | Complexity | Applicable For | Model |
|---|---|---|---|---|
| Basic Greedy | Kempe et al. [36] | $O(kmnR)$ | SIM | IC & LT |
| CELF | Leskovec et al. [83] | $O(kmnR)$ | SIM | IC & LT |
| CELF++ | Goyal et al. [71] | $O(kmnR)$ | SIM | IC & LT |
| Static Greedy | Cheng et al. [72] | $O(Rm + knRm')$ | SIM | IC & LT |
| Borgs et al.'s Method | Borgs et al. [73] | $O(kl^{2}(m + n) \log^{2} n / \epsilon^{3})$ | SIM | IC & LT |
| Zohu et al.'s Method | Zohu et al. [74] | - | SIM | IC & LT |
| SKIM | Cohen et al. [76] | $O(nl + \sum_{i=1}^{l} \lvert E_i \rvert + m\epsilon^{-2} \log^{2} n)$ | SIM | IC & LT |
| TIM+, IMM | Tang et al. [77], [78] | $O((k + l)(n + m) \log n / \epsilon^{2})$ | SIM | IC & LT |
| Stop-and-Stare | Nguyen et al. [79] | - | TVM | IC & LT |
| Nguyen's Method | Nguyen et al. [51] | $O(n(\log n + d) + kn(1 + d))$ | BIM | IC & LT |
| BCT | Nguyen et al. [80] | $O((k + l)(n + m) \log n / \epsilon^{2})$ | SIM, BIM, CTVM | IC |
| BCT | Nguyen et al. [80] | $O((k + l) n \log n / \epsilon^{2})$ | SIM, BIM, CTVM | LT |

Table 4: Approximation algorithms for SIM Problem and its variants.

• SIMPATH : This heuristic was proposed by Goyal et al. [89] for solving the SIM Problem under the LT Model. SIMPATH works on the principle of CELF (discussed in Section 6.1). However, instead of using computationally expensive Monte Carlo Simulations for estimating the influence spread, SIMPATH uses path enumeration techniques for this purpose. This algorithm has a parameter (η) for controlling the trade-off between influence spread and running time. Reported results conclude that SIMPATH outperforms other heuristics, such as MDH, PageRank and LDAG, with respect to influence spread.
• SPIN : Narayanam et al. [47] studied the SIM Problem and the λ-Coverage Problem as a co-operative game and proposed a
Shapley Value-Based Discovery of Influential Nodes (SPIN) Algorithm, which has a running time of O(t(n + m)R + n log n + kn + kRm), where t is the cardinality of the sample collision set being considered for the computation of the Shapley value. This algorithm has two main steps: first, generate a rank list of the nodes based on their Shapley values; then, choose the top-k of them and return them as the seed set. Reported results show that SPIN consistently outperforms MDH and HCH.
• MIA and PMIA : Chen et al. [5] and Wang et al. [90] proposed the maximum influence arborescence (MIA) and Prefix excluding MIA (PMIA) models of influence propagation. They computed the propagation probability from a seed node to a non-seed node by multiplying the influence probabilities of the edges present in the shortest path. The Maximum Influence Path is the one having the maximum propagation probability, and they considered that influence spreads through the local arborescence only (an arborescence is a directed graph in which, for a vertex u called the root and any other vertex v, there is exactly one directed path from u to v). Hence, the model is called MIA. In the PMIA (Prefix excluding MIA) model, for any seed $s_i$, its maximum influence path to other nodes should avoid all seeds that are before $s_i$. They proposed greedy algorithms for selecting the seed set based on these two diffusion models. Reported results show that both MIA and PMIA can achieve a high level of scalability.
• LDAG : Chen et al. [91] developed this heuristic for solving the SIM Problem under the LT Model. Influence spread in a Directed Acyclic Graph (DAG) is easy to compute. Hence, for computing the influence spread in general social networks, they introduced a Local Directed Acyclic Graph (LDAG) based influence model, which computes local DAGs for each node to approximate the influence spread. After constructing the DAGs, the basic greedy algorithm proposed by Kempe et al. [36] can be used to select the seed nodes. Reported results show that LDAG consistently outperforms DDH and the PageRank heuristic.
• IRIE : Jung et al. [92] proposed this heuristic, based on influence ranking (IR) and influence estimation (IE), for solving the SIM Problem under the IC Model and its extension, the IC-N (independent cascade with negative opinion) Model. They developed a global influence ranking method similar to the belief propagation approach. If we simply select the top-k nodes of this ranking, there will be an overlap in the influence spread of the selected nodes. To avoid this shortcoming, they integrated a simple influence estimation technique to predict the additional influence impact of a seed on the other nodes of the network. Reported results show that IRIE can achieve better influence spread compared to heuristics such as MDH, PageRank and PMIA. Moreover, IRIE has lower running time and memory consumption.
• ASIM : Galhotra et al. [93] designed this highly scalable heuristic for the SIM Problem. For each node u ∈ V(G), this algorithm assigns a score value (the weighted sum of the number of simple paths of length at most d starting from that node). ASIM has a running time of O(kd(m + n)), and its idea is quite similar to that of the SIMPATH Algorithm proposed by Goyal et al. [89]. Results show that ASIM takes less computational time and consumes less memory compared to CELF++ and TIM, while achieving comparable influence spread.
• EaSyIm : Galhotra et al. [94] proposed the opinion cum interaction (OCI) model, which considers negative opinion as well. Based on the OCI Model, they formulated the maximizing effective opinion problem and proposed two fast and scalable heuristics for this problem, namely Opinion Spread Influence Maximization (OSIM) and EaSyIm, having a running time of O(kD(m + n)), where D is the diameter of the graph.
Both the algorithms work in two phases. In the first phase, each node is assigned a score based on its contribution to the influence spread over all the paths starting at that node. The second phase is concerned with node processing: the nodes with the maximum score values are selected as seed nodes. Reported empirical results show that OSIM and EaSyIm can achieve better influence spread than TIM+ and CELF++ with less running time.
• Cordasco et al.'s [95] [96] Method : Later, Cordasco et al. proposed a fast and effective heuristic method for selecting the target set in an undirected social network [95] [96]. This heuristic produces optimal solutions for trees, cycles and complete graphs. Moreover, for real-life social networks, this heuristic is reported to perform much better than the other methods available in the literature. They extended this work to directed social networks as well [97].
There are several other studies that focused on developing heuristics. Nguyen et al. [51] proposed an efficient heuristic for solving the BIM Problem. Wu et al. [98] developed a two-stage stochastic programming approach for solving the SIM Problem. In this study, instead of choosing a seed set of size exactly k, the problem is to choose a seed set of size less than or equal to k.
Now, the studies related to heuristic methods are summarized. Centrality-based heuristics (CBHs) consider the topology of the network only, and hence, the obtained influence spread is in most cases quite low compared to that of other state-of-the-art methods. However, DDH performs slightly better than other CBHs, as it puts a small restriction on the selection of two adjacent nodes. The application of SIMPATH for seed selection is somewhat advantageous, as it has a user-controlled parameter η to balance the trade-off between accuracy and running time. SPIN has the advantage that it can be used for solving both the Top-k node problem and the λ-Coverage Problem.
MIA and PMIA have better scalability compared to Basic Greedy. As LDAG works based on the principle of computing influence spread in DAGs, it is seen to be faster. As the various heuristics were evaluated on different benchmark data sets, drawing a general conclusion about their performance is difficult. Here, we have summarized some of the important algorithms for solving SIM and related problems, as presented in Table 5. For algorithms whose complexity analysis is not given in the original paper, the complexity column is marked with "-".

Table 5: Heuristic solutions for SIM Problem
Name of the Algorithm | Proposed By | Complexity | Model
SIMPATH | Goyal et al. [89] | O(kmnR) | LT
SPIN | Narayanam et al. [47] | O(t(n + m)R + n log n + kn + kRm) | IC & LT
MIA, PMIA | Chen et al. [5], Wang et al. [90] | - | MIA, PMIA
LDAG | Chen et al. [5] | O(n + kn log n) | MIA
IRIE | Jung et al. [92] | - | IC & IC-N
ASIM | Galhotra et al. [93] | O(kd(m + n)) | IC
EaSyIm | Galhotra et al. [94] | O(kD(m + n)) | OI

Since the early seventies, metaheuristic algorithms have been used successfully to solve optimization problems arising in the broad domain of science and engineering [99] [100]. The SIM Problem is no exception.
• Bucur et al. [101] solved the SIM Problem using a genetic algorithm. They demonstrated that, with simple genetic operators, it is possible to find an approximate solution for influence spread within feasible run time. In most cases, the influence spread obtained by their method was comparable with that of the Basic Greedy Algorithm proposed by Kempe et al. [36].
• Jiang et al. [102] proposed a simulated annealing-based algorithm for solving the SIM Problem under the IC Model. Reported results indicate that their proposed methodology runs 2-3 times faster compared to the existing heuristic methods in the literature.
• Tsai et al. [103] developed the Genetic New Greedy Algorithm (GNA) for solving the SIM Problem under the IC Model by combining a genetic algorithm with the new greedy algorithm proposed by Chen et al. [70]. Their reported results conclude that GNA can give 10% more influence spread compared to the genetic algorithm.
• Gong et al. [104] proposed a discrete particle swarm optimization algorithm for solving the SIM Problem. They used the degree discount heuristic proposed by Chen et al. [70] to initialize the seed set and a local influence estimation (LIE) function to approximate the two-hop influence. They also introduced a network-specific local search strategy for fast convergence of their proposed algorithm. Reported results conclude that this methodology outperforms the state-of-the-art CELF++ with less computational time.
After that, several other studies were carried out in this direction [105], [106], [107] [108]. Though there are a large number of metaheuristic algorithms [109], only a few have been used for solving the SIM Problem. Hence, metaheuristic algorithms remain largely unexplored for the SIM Problem and its variants. Next, we describe the community-based solution methodologies for the SIM Problem.
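To make the metaheuristic idea concrete, the following is a minimal sketch of a genetic algorithm for seed selection under the IC Model. It is not the implementation of Bucur et al. [101] or of any other cited work: the encoding (a seed set as a set of k nodes), the union-based crossover, the single-node mutation, and all names and parameter values (`genetic_seed_selection`, `pop_size`, `mutation_rate`, and so on) are assumptions chosen for illustration. Fitness is estimated by Monte Carlo simulation of the cascade.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of activated nodes.

    graph: dict mapping node -> list of (neighbour, probability) pairs (assumed format).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets a single chance per out-edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(graph, seeds, rng, runs=100):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def genetic_seed_selection(graph, nodes, k, rng, pop_size=20,
                           generations=30, mutation_rate=0.2, runs=50):
    """Illustrative genetic search over k-node seed sets (hypothetical design)."""
    population = [frozenset(rng.sample(nodes, k)) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = [(estimate_spread(graph, ind, rng, runs), ind) for ind in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # Selection: keep the better half as parents.
        parents = [ind for _, ind in scored[: pop_size // 2]]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k nodes from the union of both parents.
            child = set(rng.sample(sorted(a | b), k))
            # Mutation: swap one seed for a node outside the set.
            if rng.random() < mutation_rate:
                outside = [n for n in nodes if n not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(frozenset(child))
        population = children
    return set(best), best_fit
```

A call such as `genetic_seed_selection(graph, list(graph_nodes), k=5, rng=random.Random(0))` returns a seed set together with its estimated spread. Because the fitness is itself a noisy Monte Carlo estimate, results vary with `runs` and with the random seed, which is one reason such methods are usually compared empirically rather than through guarantees.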
Most real-life social networks exhibit a community structure [110]. A community is basically a subset of nodes that are densely connected among themselves and sparsely connected with the other nodes of the network. In recent years, the community-based solution framework (CBSF) has been developed for solving the SIM Problem.
• Wang et al. [111] proposed the community-based greedy algorithm for solving the SIM Problem. This method consists of two steps, namely detecting communities based on information propagation and selecting communities for finding influential nodes. This algorithm could outperform the degree discount and random heuristics.
• Chen et al. [112] [113] developed a CBSF for solving the SIM Problem and named it CIM. By exploiting the community structure, they selected candidate seed sets for each community, and from these candidate seed sets they selected the final seed set for diffusion. CIM could achieve better influence spread compared to some state-of-the-art heuristic methods, such as CDH-Kcut, CDH-SHRINK and maximum degree.
• Rahimkhan et al. [114] proposed a CBSF for solving the SIM Problem under the LT Model and named it ComPath. They used the Speaker-listener Label Propagation Algorithm (SLPA) proposed by Xie et al. [115] for detecting communities and then identified the most influential communities and candidate seed nodes. From the candidate seed set, they selected the final seed set based on the intra distance among the nodes of the candidate seed set. ComPath could outperform CELF, CELF++, the maximum degree heuristic, the maximum PageRank heuristic and LDAG.
• Bozorgi et al. [116] developed a CBSF for solving the SIM Problem under the LT Model and named it INCIM. Like ComPath, INCIM also uses SLPA for detecting the communities. They proposed an algorithm for selecting seeds, which computes the influence spread using the algorithm developed by Goyal et al. [89]. INCIM could outperform some state-of-the-art methodologies such as LDAG, SIMPATH, IPA (a parallel algorithm for the SIM Problem proposed by [117]), and the high PageRank and high degree heuristics.
• Shang et al. [118] proposed a CBSF for solving the SIM Problem and named it CoFIM. In this study, they introduced a diffusion model that works in two phases. In the first phase, the seed set S is expanded to the neighbor nodes of S, which would usually be allocated to different communities. Then, in the second phase, influence propagation within the communities is computed. Based on this diffusion model, they developed an incremental greedy algorithm for selecting the seed set, which is analogous to the algorithm proposed by Kempe et al. [36]. CoFIM could achieve better influence spread compared to that of IPA, TIM+, MDH and IMM.
• Recently, Li et al. [119] proposed a community-based approach for solving the SIM Problem where the users have a specific geographical location. They developed a social influence-based community detection algorithm using a spectral clustering technique, and a seed selection methodology that considers a community-based influence index. Reported results show that this methodology is more efficient than many state-of-the-art methodologies, while achieving almost the same influence spread.
It is important to note that, except the methodology proposed by Wang et al. [111], all these methods are basically heuristics. However, these methods use community detection of the underlying social network as an intermediate step to scale the SIM Problem down to the community level. There are a large number of algorithms available in the literature for detecting communities [120], [121]. Among them, which one should be used for solving the SIM Problem? How are the quality of community detection and the influence spread related? These questions are largely ignored in the literature.
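The common pattern behind these community-based frameworks (detect communities, then pick seeds per community) can be illustrated with a small sketch. It does not reproduce CIM, ComPath, INCIM or CoFIM: the communities are assumed to be given already by some detection algorithm, and the ranking rule (internal degree) and the round-robin allocation across communities are hypothetical choices made only for illustration, as is the function name `community_seed_selection`.

```python
def community_seed_selection(adj, communities, k):
    """Pick up to k seeds by visiting communities in round-robin order.

    adj: dict mapping node -> set of neighbours (undirected graph, assumed format).
    communities: list of node lists, assumed to come from any detection algorithm.
    """
    # Visit larger communities first.
    ordered = sorted(communities, key=len, reverse=True)

    # Inside each community, rank nodes by the number of neighbours
    # that belong to the same community (internal degree).
    ranked = []
    for community in ordered:
        members = set(community)
        ranked.append(
            sorted(community,
                   key=lambda u: len(adj.get(u, set()) & members),
                   reverse=True)
        )

    seeds = []
    depth = 0
    # Round-robin: take the best remaining node of each community in turn.
    while len(seeds) < k and any(depth < len(r) for r in ranked):
        for r in ranked:
            if depth < len(r) and len(seeds) < k:
                seeds.append(r[depth])
        depth += 1
    return seeds
```

Spreading the seeds over communities avoids placing several seeds in the same dense region, where their influence would largely overlap. How strongly the result depends on the community detection step is exactly the open question raised above.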
In this section, we describe some solution methodologies for the SIM Problem that are very different from the methodologies discussed so far. Also, each solution methodology presented here is different from the others. It is reported in the literature that, in any information diffusion process, less than 10% of the nodes are influenced beyond hop count 2 [122]. Based on this phenomenon, Tang et al. [123] [124] recently developed a hop-based approach for the SIM Problem. Their methodology also gives a theoretical guarantee on influence spread. Ma et al. [125] proposed an algorithm for the SIM Problem that works based on the heat diffusion process. It could produce better influence spread compared to the Basic Greedy Algorithm. Goyal et al. [126] developed a data-based approach for solving the SIM Problem. They introduced the credit distribution (CD) model, which exploits the propagation traces to learn the influence flow pattern for approximating the influence spread. They showed that the SIM Problem under the CD Model is NP-Hard, and reported results show that this model can achieve even better influence spread compared to the IC and LT Models with less running time. Lee et al. [127] introduced a query-based approach for solving the SIM Problem under the IC Model. Here, the query is: for activating all the users of a given set T, what should be the seed set? This methodology is intended for maximizing the influence on a particular group of users, which is the case in target-aware viral marketing. Zhu et al. [128] introduced the CTMC-ICM diffusion model, which is basically a blend of the IC Model with a Continuous Time Markov Chain. They studied the SIM Problem under this model and came up with a new centrality metric, Spread Rank. Their reported results show that seed nodes selected based on spread rank centrality can achieve better influence spread compared to the traditional distance-based centrality measures, such as degree, closeness and betweenness. Wang et al. [129] proposed the methodology Fluidspread, which works based on fluid dynamic principles and can reveal the dynamics of the diffusion process. Kang et al. [130] introduced the notion of diffusion centrality for selecting influential nodes in a social network.
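The observation that little influence travels beyond two hops suggests scoring a candidate seed from its two-hop neighbourhood only. A minimal sketch under the IC Model follows. It is not the algorithm of Tang et al. [123] [124] and carries no approximation guarantee: it simply sums path probabilities over paths of length at most two, so a node reachable along several paths is counted more than once. The function names and the adjacency format (node -> list of (neighbour, probability) pairs) are assumptions.

```python
def two_hop_score(graph, u):
    """Rough estimate of the expected spread of the single seed u,
    using only paths of length one and two that start at u."""
    score = 1.0  # the seed itself is always active
    for v, p_uv in graph.get(u, []):
        if v == u:
            continue
        score += p_uv  # one-hop contribution
        for w, p_vw in graph.get(v, []):
            if w != u and w != v:
                score += p_uv * p_vw  # two-hop contribution via v
    return score


def top_k_by_two_hop(graph, nodes, k):
    """Rank nodes by their two-hop score and return the best k."""
    return sorted(nodes, key=lambda u: two_hop_score(graph, u), reverse=True)[:k]
```

For the graph `{0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.5)], 2: [(3, 0.5)], 3: []}`, node 0 scores 1 + 0.5 + 0.5 + 0.25 + 0.25 = 2.5, where node 3 contributes twice because it is reachable through both node 1 and node 2. The score needs no simulation, which is the appeal of hop-limited estimates, at the price of the overcounting just described.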
7. Summary of the Survey and Future Research Directions
Based on the survey of the existing literature presented in Sections 3 through 6, we summarize in this section the current research trends and give future directions.
• Practicality of the Problem: Most of the current studies are focused on the practical issues of the SIM Problem. One of the major applications of social influence maximization is viral marketing. In this context, influencing a user is beneficial only if that user is able to influence a reasonable number of other users of the network. Recent studies, such as [131] [80], consider benefit as another component in the SIM Problem, along with the node selection cost.
• Scalability: Starting from Kempe et al.'s [36] seminal work, scalability has remained an important issue in this area. To improve scalability, instead of using Monte Carlo simulation-based spread estimation, Borgs et al. [73] introduced reverse reachable set-based spread estimation. After this work, the popular algorithms for the SIM Problem, such as TIM, IMM and TIM+, use this concept as an influence spread estimation technique for improving scalability.
• Diffusion Probability Computation: The TSS Problem assumes that the influence probability between any pair of users is known. However, this is a very unrealistic assumption. Though there were some previous studies in this direction, researchers have recently tried to predict influence probabilities using machine learning techniques [132].
Though the TSS Problem has been studied extensively over the last one and a half decades or so, in both theoretical and applied contexts, to the best of our knowledge some corners of this problem are either not investigated or only partially investigated. Here, we list some future research directions from both the problem specification and the solution methodology points of view.
Further research may be carried out in and around the TSS Problem in social networks in the following directions:
• As online social networks are formed by rational agents, incentivization is required if a node is selected as a seed node. For practical applications, it is also important to consider what benefit will be obtained (e.g., how many other non-seed nodes become influenced through that node) by activating that node. At the same time, for influence propagation of time-sensitive events (where influencing a person after the event does not make any sense, such as a political campaign before an election or viral marketing for a seasonal product), consideration of diffusion time is also important. To the best of our knowledge, there is no reported study on the TSS Problem considering all three issues: cost, benefit, and time.
• Most of the studies on the SIM Problem and its variants are under either the IC or the LT diffusion model. However, some other diffusion models have recently been developed, such as the Independent Cascade Model with Negative Opinion (IC-N) [133], the Opinion cum Interaction Model (OI) [94] and the Opinion-based Cascading Model (OC) [134], which consider negative opinion. The SIM Problem and its different variants can also be studied under these newly developed diffusion models.
• Most of the studies on the SIM Problem consider that the underlying social network is static, including the influence probabilities. However, this is not a practical assumption, as most social networks are time varying. Recent studies on the SIM Problem have started considering the temporal nature of the social network [135], [136]. As this line of work has just started, there is a lot of scope to work on the TSS Problem in time-varying social networks.
• In real-world social networks, users have specific topics of choice. So, one user will be influenced by other users if both of them have similar choices. Taking the topic into consideration, the spread of influence can be increased, which is known as topic-aware influence maximization. Recent studies on influence maximization consider this phenomenon [137] [8]. The SIM Problem and its variants can be studied in this setting as well.
• Among all the variants of the TSS Problem in social networks described in Section 3, it is surprising to see that only the SIM Problem is well studied. Hence, solution methodologies developed for the SIM Problem can be modified accordingly, so that they can be adopted for solving the other variants as well.
• One of the major issues in the solution methodologies for the SIM Problem is scalability. It is important to observe that the social network used in Kempe et al.'s [36] experiment had 10748 nodes and 53000 edges, whereas the recent study of Nguyen et al. [80] used a social network with 41 . × nodes and 1 . × edges. From this example, it is clear that the size of the social network data sets is increasing day by day. Hence, developing more scalable algorithms is extremely important to handle large data sets.
• From the discussion in Section 6.3, it is understood that though there are many evolutionary algorithms, only the genetic algorithm, artificial bee colony optimization and the discrete particle swarm optimization algorithm have been used to date for solving the SIM Problem. Hence, other metaheuristics, such as ant colony optimization and differential evolution, can also be used for this purpose.
• There are many solution methodologies proposed in the literature. However, which one should be chosen in which situation and for what kind of network structure? To answer this question, a strong experimental evaluation of all the proposed methodologies on benchmark data sets is required. Recently, Arora et al. [138] have done a benchmarking study with the 11 most popular algorithms from the literature, and they have found some contradictions between their own experimental results and the ones reported in the literature. More such benchmarking studies are required to investigate these issues.
• Most of the algorithms presented in the literature are serial in nature. The issue of scalability in the SIM Problem can be tackled by developing distributed and parallel algorithms. To the best of the authors' knowledge, except dIRIEr developed by Zong et al. [139], there is no distributed algorithm existing in the literature. Recently, a few parallel algorithms have been developed for the SIM Problem [117] [140]. So, studying the SIM Problem and its variants under parallel and distributed settings is an open area.
• Most of the solution methodologies are concerned with the selection of the seeds in one go, before the diffusion starts. In this case, if any one of the selected seeds does not perform up to expectation, then the number of influenced nodes will be lower than expected. Considering this case, the framework of multiphase diffusion has recently been developed [141], [142]. Different variants of this problem can be studied in this framework.
8. Concluding Remarks
In this survey, we have first discussed the SIM Problem and its different variants studied in the literature. Next, we have reported the hardness results of the problem. After that, we have reported the major research challenges concerned with the SIM Problem and its variants. Subsequently, based on the approach, we have classified the proposed solution methodologies and discussed the algorithms of each category. At the end, we have discussed the current research trends and given future directions. From this survey, we can conclude that the SIM Problem is well studied, though its variants are not, and that there is a continuing demand for more scalable algorithms for these problems. We hope that presenting three dimensions of the problem together (variants, hardness results and solution methodologies) will help researchers and practitioners to have a better understanding of the problem and better exposure to this field.
Acknowledgement
The authors thank the Ministry of Human Resource and Development (MHRD), Government of India, for sponsoring the project: E-business Center of Excellence under the scheme of Center for Training and Research in Frontier Areas of Science and Technology (FAST), Grant No. F.No.5-5/2014-TS.VII.
References [1] Bing Liu. Social network analysis.
Web Data Mining , pages 269–309,2011.[2] Damon Centola. The spread of behavior in an online social network ex-periment. science , 329(5996):1194–1197, 2010.[3] Maziar Nekovee, Yamir Moreno, Ginestra Bianconi, and Matteo Marsili.Theory of rumour spreading in complex social networks.
Physica A: Sta-tistical Mechanics and its Applications , 374(1):457–470, 2007.474] Jure Leskovec, Lada A Adamic, and Bernardo A Huberman. The dynam-ics of viral marketing.
ACM Transactions on the Web (TWEB) , 1(1):5,2007.[5] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximizationfor prevalent viral marketing in large-scale social networks. In
Proceed-ings of the 16th ACM SIGKDD international conference on Knowledgediscovery and data mining , pages 1029–1038. ACM, 2010.[6] Xiaodan Song, Belle L Tseng, Ching-Yung Lin, and Ming-Ting Sun. Per-sonalized recommendation driven by information flow. In
Proceedings ofthe 29th annual international ACM SIGIR conference on Research anddevelopment in information retrieval , pages 509–516. ACM, 2006.[7] Dino Ienco, Francesco Bonchi, and Carlos Castillo. The meme rankingproblem: Maximizing microblogging virality. In , pages 328–335. IEEE,2010.[8] Yuchen Li, Dongxiang Zhang, and Kian-Lee Tan. Real-time targeted in-fluence maximization for online advertisements.
Proceedings of the VLDBEndowment , 8(10):1070–1081, 2015.[9] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. Twitterrank: find-ing topic-sensitive influential twitterers. In
Proceedings of the third ACMinternational conference on Web search and data mining , pages 261–270.ACM, 2010.[10] Eytan Bakshy, Jake M Hofman, Winter A Mason, and Duncan J Watts.Everyone’s an influencer: quantifying influence on twitter. In
Proceed-ings of the fourth ACM international conference on Web search and datamining , pages 65–74. ACM, 2011.[11] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos,Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection48n networks. In
Proceedings of the 13th ACM SIGKDD international con-ference on Knowledge discovery and data mining , pages 420–429. ACM,2007.[12] Robin Cowan and Nicolas Jonard. Network structure and the diffusion ofknowledge.
Journal of economic Dynamics and Control , 28(8):1557–1575,2004.[13] R Kasprzak. Diffusion in networks.
Journal of Telecommunications andInformation Technology , pages 99–106, 2012.[14] Paulo Shakarian, Abhinav Bhatnagar, Ashkan Aleali, Elham Shaabani,and Ruocheng Guo. The independent cascade and linear threshold models.In
Diffusion in Social Networks , pages 35–48. Springer, 2015.[15] Jimeng Sun and Jie Tang. A survey of models and algorithms for socialinfluence analysis.
Social network data analytics , pages 177–214, 2011.[16] William M Campbell, Charlie K Dagli, and Clifford J Weinstein. Socialnetwork analysis with content and graphs.
Lincoln Laboratory Journal ,20(1):61–81, 2013.[17] Tianyi Wang, Yang Chen, Zengbin Zhang, Tianyin Xu, Long Jin, Pan Hui,Beixing Deng, and Xing Li. Understanding graph sampling algorithmsfor social network analysis. In , pages 123–128.IEEE, 2011.[18] Reinhard Diestel. Graph theory. 2005.
Grad. Texts in Math , 101, 2005.[19] Daniel Gruhl, Ramanathan Guha, David Liben-Nowell, and AndrewTomkins. Information diffusion through blogspace. In
Proceedings of the13th international conference on World Wide Web , pages 491–501. ACM,2004. 4920] Jochen Harant, Anja Pruchnewski, and Margit Voigt. On dominating setsand independent sets of graphs.
Combinatorics, Probability and Comput-ing , 8(6):547–553, 1999.[21] Venkatesh Raman, Saket Saurabh, and Sriganesh Srihari. Parameterizedalgorithms for generalized domination.
Lecture Notes in Computer Sci-ence , 5165:116–126, 2008.[22] Ralf Klasing and Christian Laforest. Hardness results and approxima-tion algorithms of k-tuple domination in graphs.
Information ProcessingLetters , 89(2):75–83, 2004.[23] Ning Chen. On the approximability of influence in social networks.
SIAMJournal on Discrete Mathematics , 23(3):1400–1415, 2009.[24] Paul A Dreyer and Fred S Roberts. Irreversible k-threshold processes:Graph-theoretical threshold models of the spread of disease and of opinion.
Discrete Applied Mathematics , 157(7):1615–1627, 2009.[25] J´ozsef Balogh, B´ela Bollob´as, and Robert Morris. Bootstrap percolation inhigh dimensions.
Combinatorics, Probability and Computing , 19(5-6):643–692, 2010.[26] David Peleg. Local majorities, coalitions and monopolies in graphs: areview.
Theoretical Computer Science , 282(2):231–257, 2002.[27] Michael R Garey and David S Johnson.
Computers and intractability ,volume 29. wh freeman New York, 2002.[28] David P Williamson and David B Shmoys.
The design of approximationalgorithms . Cambridge university press, 2011.[29] Rodney G Downey, Michael R Fellows, and Kenneth W Regan. Param-eterized circuit complexity and the w hierarchy.
Theoretical ComputerScience , 191(1-2):97–115, 1998. 5030] Marcel Salath´e, Maria Kazandjieva, Jung Woo Lee, Philip Levis, Mar-cus W Feldman, and James H Jones. A high-resolution human contactnetwork for infectious disease transmission.
Proceedings of the NationalAcademy of Sciences , 107(51):22020–22025, 2010.[31] Bo Xu and Lu Liu. Information diffusion through online social networks.In , pages 53–56. IEEE, 2010.[32] Cliff C Zou, Don Towsley, and Weibo Gong. Modeling and simulationstudy of the propagation and defense of internet e-mail worms.
IEEETransactions on dependable and secure computing , 4(2), 2007.[33] Masahiro Kimura and Kazumi Saito. Tractable models for informationdiffusion in social networks.
Knowledge Discovery in Databases: PKDD2006 , pages 259–271, 2006.[34] Thomas W Valente. Network models of the diffusion of innovations. 1995.[35] Nima Heidari. Modeling information diffusion in social networks. arXivpreprint arXiv:1603.02178 , 2016.[36] David Kempe, Jon Kleinberg, and ´Eva Tardos. Maximizing the spreadof influence through a social network. In
Proceedings of the ninth ACMSIGKDD international conference on Knowledge discovery and data min-ing , pages 137–146. ACM, 2003.[37] Jaewon Yang and Jure Leskovec. Modeling information diffusion in im-plicit networks. In , pages 599–608. IEEE, 2010.[38] Kazumi Saito, Kouzou Ohara, Yuki Yamagishi, Masahiro Kimura, andHiroshi Motoda. Learning diffusion probability based on node attributesin social networks. In
International Symposium on Methodologies for In-telligent Systems , pages 153–162. Springer, 2011.5139] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction ofinformation diffusion probabilities for independent cascade model. In
Knowledge-based intelligent information and engineering systems , pages67–75. Springer, 2008.[40] Amit Goyal, Francesco Bonchi, and Laks VS Lakshmanan. Learning in-fluence probabilities in social networks. In
Proceedings of the third ACMinternational conference on Web search and data mining , pages 241–250.ACM, 2010.[41] Kazumi Saito, Masahiro Kimura, Kouzou Ohara, and Hiroshi Motoda.Selecting information diffusion models over social networks for behavioralanalysis.
Machine Learning and Knowledge Discovery in Databases , pages180–195, 2010.[42] Masahiro Kimura, Kazumi Saito, Ryohei Nakano, and Hiroshi Motoda.Finding influential nodes in a social network from information diffusiondata.
Social Computing and Behavioral Modeling , pages 1–8, 2009.[43] Thomas W Valente. Social network thresholds in the diffusion of innova-tions.
Social networks , 18(1):69–89, 1996.[44] Huiyuan Zhang, Subhankar Mishra, My T Thai, J Wu, and Y Wang.Recent advances in information diffusion and influence maximization incomplex social networks.
Opportunistic Mobile Social Networks , 37(1.1),2014.[45] Pedro Domingos and Matt Richardson. Mining the network value of cus-tomers. In
Proceedings of the seventh ACM SIGKDD international confer-ence on Knowledge discovery and data mining , pages 57–66. ACM, 2001.[46] Eyal Ackerman, Oren Ben-Zwi, and Guy Wolfovitz. Combinatorial modeland bounds for target set selection.
Theoretical Computer Science , 411(44-46):4017–4022, 2010. 5247] Ramasuri Narayanam and Yadati Narahari. A shapley value-based ap-proach to discover influential nodes in social networks.
IEEE Transactionson Automation Science and Engineering , 8(1):130–147, 2011.[48] Hung T Nguyen, Preetam Ghosh, Michael L Mayo, and Thang N Dinh.Social influence spectrum at scale: Near-optimal solutions for multiplebudgets at once.
ACM Transactions on Information Systems (TOIS) ,36(2):14, 2017.[49] S Raghavan and Rui Zhang. Weighted target set selection on social net-works. Technical report, Working paper, University of Maryland, 2015.[50] Moses Charikar, Yonatan Naamad, and Anthony Wirth. On approximat-ing target set selection. In
LIPIcs-Leibniz International Proceedings in In-formatics , volume 60. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,2016.[51] Huy Nguyen and Rong Zheng. On budgeted influence maximization insocial networks.
IEEE Journal on Selected Areas in Communications ,31(6):1084–1094, 2013.[52] Ferdinando Cicalese, Gennaro Cordasco, Luisa Gargano, Martin Milaniˇc,and Ugo Vaccaro. Latency-bounded target set selection in social networks.
Theoretical Computer Science , 535:1–15, 2014.[53] Matthew Richardson and Pedro Domingos. Mining knowledge-sharingsites for viral marketing. In
Proceedings of the eighth ACM SIGKDDinternational conference on Knowledge discovery and data mining , pages61–70. ACM, 2002.[54] Guy Kortsarz. On the hardness of approximating spanners.
Algorithmica ,30(3):432–450, 2001.[55] Craig A Tovey. A simplified np-complete satisfiability problem.
Discreteapplied mathematics , 8(1):85–89, 1984.5356] Rodney G Downey and Michael R Fellows.
Fundamentals of parameterizedcomplexity , volume 4. Springer, 2013.[57] Cristina Bazgan, Morgan Chopin, Andr´e Nichterlein, and Florian Sikora.Parameterized approximability of maximizing the spread of influence innetworks.
Journal of Discrete Algorithms , 27:54–65, 2014.[58] Morgan Chopin, Andr´e Nichterlein, Rolf Niedermeier, and Mathias Weller.Constant thresholds can make target set selection tractable.
Theory ofComputing Systems , 55(1):61–83, 2014.[59] Morgan Chopin, Andr´e Nichterlein, Rolf Niedermeier, and Mathias Weller.
Constant Thresholds Can Make Target Set Selection Tractable , pages 120–133. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.[60] Richard M Karp. Reducibility among combinatorial problems. In
Com-plexity of computer computations , pages 85–103. Springer, 1972.[61] Andr´e Nichterlein, Rolf Niedermeier, Johannes Uhlmann, and MathiasWeller. On tractable cases of target set selection.
Social Network Analysisand Mining , 3(2):233–256, 2013.[62] Andr´e Nichterlein, Rolf Niedermeier, Johannes Uhlmann, and MathiasWeller. On tractable cases of target set selection.
Algorithms and Com-putation , pages 378–389, 2010.[63] Yanhua Li, Wei Chen, Yajun Wang, and Zhi-Li Zhang. Influence diffusiondynamics and influence maximization in social networks with friend andfoe relationships. In
Proceedings of the sixth ACM international conferenceon Web search and data mining , pages 657–666. ACM, 2013.[64] Aristides Gionis, Evimaria Terzi, and Panayiotis Tsaparas. Opinion max-imization in social networks. In
Proceedings of the 2013 SIAM Interna-tional Conference on Data Mining , pages 387–395. SIAM, 2013.5465] Alessandro Epasto, Ahmad Mahmoody, and Eli Upfal. Real-timetargeted-influence queries over large graphs. In
Proceedings of the 2017IEEE/ACM International Conference on Advances in Social NetworksAnalysis and Mining 2017 , pages 224–231. ACM, 2017.[66] Xiangyu Ke, Arijit Khan, and Gao Cong. Finding seeds and relevanttags jointly: For targeted influence maximization in social networks. In
Proceedings of the 2018 International Conference on Management of Data ,pages 1097–1111. ACM, 2018.[67] David Kempe, Jon M Kleinberg, and ´Eva Tardos. Influential nodes in adiffusion model for social networks. In
ICALP , volume 5, pages 1127–1138.Springer, 2005.[68] David Kempe, Jon M Kleinberg, and ´Eva Tardos. Maximizing the spreadof influence through a social network.
Theory of Computing , 11(4):105–147, 2015.[69] Takanori Maehara, Hirofumi Suzuki, and Masakazu Ishihata. Exact com-putation of influence spread by binary decision diagrams. In
Proceedingsof the 26th International Conference on World Wide Web , pages 947–956.International World Wide Web Conferences Steering Committee, 2017.[70] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximizationin social networks. In
Proceedings of the 15th ACM SIGKDD interna-tional conference on Knowledge discovery and data mining , pages 199–208.ACM, 2009.[71] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Celf++: optimizingthe greedy algorithm for influence maximization in social networks. In
Proceedings of the 20th international conference companion on World wideweb , pages 47–48. ACM, 2011.[72] Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, and XueqiCheng. Staticgreedy: solving the scalability-accuracy dilemma in influence55aximization. In
Proceedings of the 22nd ACM international conferenceon Information & Knowledge Management , pages 509–518. ACM, 2013.[73] Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier.Maximizing social influence in nearly optimal time. In
Proceedings ofthe Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms ,pages 946–957. SIAM, 2014.[74] Yuqing Zhu, Weili Wu, Yuanjun Bi, Lidong Wu, Yiwei Jiang, and WenXu. Better approximation algorithms for influence maximization in onlinesocial networks.
Journal of Combinatorial Optimization , 30(1):97–108,2015.[75] Uriel Feige and Michel Goemans. Approximating the value of two powerproof systems, with applications to max 2sat and max dicut.[76] Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F Werneck.Sketch-based influence maximization and computation: Scaling up withguarantees. In
Proceedings of the 23rd ACM International Conference onConference on Information and Knowledge Management , pages 629–638.ACM, 2014.[77] Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence maximization:Near-optimal time complexity meets practical efficiency. In
Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 75–86. ACM, 2014.
[78] Youze Tang, Yanchen Shi, and Xiaokui Xiao. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1539–1554. ACM, 2015.
[79] Hung T Nguyen, My T Thai, and Thang N Dinh. Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In Proceedings of the 2016 International Conference on Management of Data, pages 695–710. ACM, 2016.
[80] Hung T Nguyen, My T Thai, and Thang N Dinh. A billion-scale approximation algorithm for maximizing benefit in viral marketing. IEEE/ACM Transactions on Networking, 2017.
[81] Samir Khuller, Anna Moss, and Joseph Seffi Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.
[82] Huy Nguyen and Rong Zheng. On budgeted influence maximization in social networks. arXiv preprint arXiv:1204.4491, 2012.
[83] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.
[84] Linton C Freeman. Centrality in social networks conceptual clarification. Social networks, 1(3):215–239, 1978.
[85] Andrea Landherr, Bettina Friedl, and Julia Heidemann. A critical review of centrality measures in social networks. Business & Information Systems Engineering, 2(6):371–385, 2010.
[86] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao. User interactions in social networks and their implications. In Proceedings of the 4th ACM European conference on Computer systems, pages 205–218. ACM, 2009.
[87] Benjamin M Tabak, Marcelo Takami, Jadson MC Rocha, Daniel O Cajueiro, and Sergio RS Souza. Directed clustering coefficient as a measure of systemic risk in complex banking networks. Physica A: Statistical Mechanics and its Applications, 394:211–216, 2014.
[88] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine.
Comput. Netw. ISDN Syst., 30(1-7):107–117, April 1998.
[89] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In , pages 211–220. IEEE, 2011.
[90] Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545, 2012.
[91] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In , pages 88–97. IEEE, 2010.
[92] Kyomin Jung, Wooram Heo, and Wei Chen. Irie: Scalable and robust influence maximization in social networks. In , pages 918–923. IEEE, 2012.
[93] Sainyam Galhotra, Akhil Arora, Srinivas Virinchi, and Shourya Roy. Asim: A scalable algorithm for influence maximization under the independent cascade model. In Proceedings of the 24th International Conference on World Wide Web, pages 35–36. ACM, 2015.
[94] Sainyam Galhotra, Akhil Arora, and Shourya Roy. Holistic influence maximization: Combining scalability and efficiency with opinion-aware models. In Proceedings of the 2016 International Conference on Management of Data, pages 743–758. ACM, 2016.
[95] Gennaro Cordasco, Luisa Gargano, Marco Mecchia, Adele A Rescigno, and Ugo Vaccaro. A fast and effective heuristic for discovering small target sets in social networks. In Combinatorial Optimization and Applications, pages 193–208. Springer, 2015.
[96] Gennaro Cordasco, Luisa Gargano, and Adele A Rescigno. Active spreading in networks. In ICTCS, pages 149–162, 2016.
[97] Gennaro Cordasco, Luisa Gargano, and Adele Anna Rescigno. Influence propagation over large scale social networks. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pages 1531–1538. ACM, 2015.
[98] Hao-Hsiang Wu and Simge Küçükyavuz. A two-stage stochastic programming approach for influence maximization in social networks. Computational Optimization and Applications, pages 1–33, 2017.
[99] Huizhi Yi, Qinglin Duan, and T Warren Liao. Three improved hybrid metaheuristic algorithms for engineering design optimization. Applied Soft Computing, 13(5):2433–2444, 2013.
[100] Xin-She Yang, Su Fong Chien, and Tiew On Ting. Computational intelligence and metaheuristic algorithms with applications. The Scientific World Journal, 2014, 2014.
[101] Doina Bucur and Giovanni Iacca. Influence maximization in social networks with genetic algorithms. In European Conference on the Applications of Evolutionary Computation, pages 379–392. Springer, 2016.
[102] Qingye Jiang, Guojie Song, Gao Cong, Yu Wang, Wenjun Si, and Kunqing Xie. Simulated annealing based influence maximization in social networks. In AAAI, volume 11, pages 127–132, 2011.
[103] Chun-Wei Tsai, Yo-Chung Yang, and Ming-Chao Chiang. A genetic newgreedy algorithm for influence maximization in social network. In , pages 2549–2554. IEEE, 2015.
[104] Maoguo Gong, Jianan Yan, Bo Shen, Lijia Ma, and Qing Cai. Influence maximization in social networks based on discrete particle swarm optimization.
Information Sciences, 367:600–614, 2016.
[105] C Prem Sankar, S Asharaf, and K Satheesh Kumar. Learning from bees: An approach for influence maximization on viral campaigns. PloS one, 11(12):e0168125, 2016.
[106] Qixiang Wang, Maoguo Gong, Chao Song, and Shanfeng Wang. Discrete particle swarm optimization based influence maximization in complex networks. In , pages 488–494. IEEE, 2017.
[107] Shi-Jui Liu, Chi-Yuan Chen, and Chun-Wei Tsai. An effective simulated annealing for influence maximization problem of online social networks. Procedia Computer Science, 113:478–483, 2017.
[108] Kaiqi Zhang, Haifeng Du, and Marcus W Feldman. Maximizing influence in a social network: Improved results using a genetic algorithm. Physica A: Statistical Mechanics and its Applications, 478:20–30, 2017.
[109] Xin-She Yang. Nature-inspired metaheuristic algorithms. Luniver press, 2010.
[110] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004.
[111] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1039–1048. ACM, 2010.
[112] Y Chen, S Chang, C Chou, W Peng, and S Lee. Exploring community structures for influence maximization in social networks. In Proceedings of the 6th SNA-KDD Workshop on Social Network Mining and Analysis held in conjunction with KDD12 (SNA-KDD12), pages 1–6, 2012.
[113] Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. Cim: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):25, 2014.
[114] Khadije Rahimkhani, Abolfazl Aleahmad, Maseud Rahgozar, and Ali Moeini. A fast algorithm for finding most influential people based on the linear threshold model. Expert Systems with Applications, 42(3):1353–1361, 2015.
[115] Jierui Xie, Boleslaw K Szymanski, and Xiaoming Liu. Slpa: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In , pages 344–349. IEEE, 2011.
[116] Arastoo Bozorgi, Hassan Haghighi, Mohammad Sadegh Zahedi, and Mojtaba Rezvani. Incim: A community-based algorithm for influence maximization problem under the linear threshold model. Information Processing & Management, 52(6):1188–1199, 2016.
[117] Jinha Kim, Seung-Keol Kim, and Hwanjo Yu. Scalable and parallelizable processing of influence maximization for large-scale social networks? In , pages 266–277. IEEE, 2013.
[118] Jiaxing Shang, Shangbo Zhou, Xin Li, Lianchen Liu, and Hongchun Wu. Cofim: A community-based framework for influence maximization on large-scale networks. Knowledge-Based Systems, 117:88–100, 2017.
[119] Xiao Li, Xiang Cheng, Sen Su, and Chenna Sun. Community-based seeds selection algorithm for location aware influence maximization. Neurocomputing, 275:1601–1613, 2018.
[120] Santo Fortunato. Community detection in graphs.
Physics reports, 486(3):75–174, 2010.
[121] Tanmoy Chakraborty, Ayushi Dalmia, Animesh Mukherjee, and Niloy Ganguly. Metrics for community analysis: A survey. ACM Computing Surveys (CSUR), 50(4):54, 2017.
[122] Sharad Goel, Duncan J Watts, and Daniel G Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM conference on electronic commerce, pages 623–638. ACM, 2012.
[123] Jing Tang, Xueyan Tang, and Junsong Yuan. Influence maximization meets efficiency and effectiveness: A hop-based approach. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pages 64–71. ACM, 2017.
[124] Jing Tang, Xueyan Tang, and Junsong Yuan. An efficient and effective hop-based approach for influence maximization in social networks. Social Network Analysis and Mining, 8(1):10, 2018.
[125] Hao Ma, Haixuan Yang, Michael R Lyu, and Irwin King. Mining social networks using heat diffusion processes for marketing candidates selection. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 233–242. ACM, 2008.
[126] Amit Goyal, Francesco Bonchi, and Laks VS Lakshmanan. A data-based approach to social influence maximization. Proceedings of the VLDB Endowment, 5(1):73–84, 2011.
[127] Jong-Ryul Lee and Chin-Wan Chung. A query approach for influence maximization on specific users in social networks. IEEE Transactions on knowledge and data engineering, 27(2):340–353, 2015.
[128] Tian Zhu, Bai Wang, Bin Wu, and Chuanxi Zhu. Maximizing the spread of influence ranking in social networks. Information Sciences, 278:535–544, 2014.
[129] Feng Wang, Wenjun Jiang, Xiaolin Li, and Guojun Wang. Maximizing positive influence spread in online social networks via fluid dynamics. Future Generation Computer Systems, 2017.
[130] Chanhyun Kang, Sarit Kraus, Cristian Molinaro, Francesca Spezzano, and VS Subrahmanian. Diffusion centrality: A paradigm to maximize spread in social networks.
Artificial Intelligence, 239:70–96, 2016.
[131] Hung T Nguyen, Thang N Dinh, and My T Thai. Cost-aware targeted viral marketing in billion-scale networks. In IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, pages 1–9. IEEE, 2016.
[132] Devesh Varshney, Sandeep Kumar, and Vineet Gupta. Predicting information diffusion probabilities in social networks: A bayesian networks based approach. Knowledge-Based Systems, 133:66–76, 2017.
[133] Wei Chen, Alex Collins, Rachel Cummings, Te Ke, Zhenming Liu, David Rincon, Xiaorui Sun, Yajun Wang, Wei Wei, and Yifei Yuan. Influence maximization in social networks when negative opinions may emerge and propagate. In Proceedings of the 2011 SIAM International Conference on Data Mining, pages 379–390. SIAM, 2011.
[134] Huiyuan Zhang, Thang N Dinh, and My T Thai. Maximizing the spread of positive influence in online social networks. In , pages 317–326. IEEE, 2013.
[135] Guangmo Tong, Weili Wu, Shaojie Tang, and Ding-Zhu Du. Adaptive influence maximization in dynamic social networks. IEEE/ACM Transactions on Networking (TON), 25(1):112–125, 2017.
[136] Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, and Xiaoming Sun. Influence maximization in dynamic social networks. In International Conference on Data Mining (ICDM), pages 1313–1318. IEEE, 2013.
[137] Shuo Chen, Ju Fan, Guoliang Li, Jianhua Feng, Kian-lee Tan, and Jinhui Tang. Online topic-aware influence maximization. Proceedings of the VLDB Endowment, 8(6):666–677, 2015.
[138] Akhil Arora, Sainyam Galhotra, and Sayan Ranu. Debunking the myths of influence maximization: An in-depth benchmarking study. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 651–666. ACM, 2017.
[139] Zhou Zong, Bo Li, and Chunming Hu. dirier: Distributed influence maximization in social network. In , pages 119–125. IEEE, 2014.
[140] Hong Wu, Kun Yue, Xiaodong Fu, Yujie Wang, and Weiyi Liu. Parallel seed selection for influence maximization based on k-shell decomposition. In International Conference on Collaborative Computing: Networking, Applications and Worksharing, pages 27–36. Springer, 2016.
[141] Swapnil Dhamal, KJ Prabuchandran, and Y Narahari. Information diffusion in social networks in two phases. IEEE Transactions on Network Science and Engineering, 3(4):197–210, 2016.
[142] Kai Han, Keke Huang, Xiaokui Xiao, Jing Tang, Aixin Sun, and Xueyan Tang. Efficient algorithms for adaptive influence maximization.