A Survey on Influence Maximization in a Social Network

Abstract

Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and, more popularly, as the Social Influence Maximization Problem (SIM Problem). It has been an active area of research in computational social network analysis for about one and a half decades. Due to its practical importance in various domains, such as viral marketing, target advertisement and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence, there is a need for an organized and comprehensive review of this topic. This paper presents a survey of the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions.


Suman Banerjee^a, Mamata Jenamani^a, Dilip Kumar Pratihar^a

^a Indian Institute of Technology, Kharagpur, West Bengal, India.


Keywords: Target Set Selection Problem, Social Networks, Influence Maximization, Inapproximability Results, Approximation Algorithm, Greedy Strategy, NP-Hard Problem.

∗ Corresponding author: Dilip Kumar Pratihar

Email addresses: [email protected] (Suman Banerjee), [email protected] (Mamata Jenamani), [email protected] (Dilip Kumar Pratihar)

Preprint submitted to Elsevier, August 17, 2018

1. Introduction

A social network is an interconnected structure of a group of agents formed for social interactions [1]. Nowadays, social networks play an important role in spreading information, opinions, ideas, innovations, rumors, etc. [2] [3]. This spreading process has huge practical importance in viral marketing [4] [5], personalized recommendation [6], feed ranking [7], target advertisement [8], selecting influential Twitter users [9] [10], selecting informative blogs [11], etc. Hence, recent years have witnessed significant attention to the study of influence propagation in online social networks. Consider the case of viral marketing by a commercial house, where the goal is to attract users to purchase a particular product. The best way to do this is to select a set of highly influential users and distribute free samples to them. If they like the product, they will share the information with their neighbors. Due to their high influence, many of the neighbors will try the product and share the information with their own neighbors. This cascading process continues, and ultimately a large fraction of the users will try the product. Naturally, the number of free samples is limited for economic reasons. Hence, this process is fruitful only if the free samples are distributed among highly influential users, and the problem boils down to selecting influential users from the network. This problem is known as the Social Influence Maximization Problem (footnote 1).

Footnote 1: From now on, we use Target Set Selection and Social Influence Maximization interchangeably.

Social influence occurs due to the diffusion of information in the network. This phenomenon in a networked system is well studied [12] [13]. Specifically, there are two popularly adopted models to study the diffusion process: the Independent Cascade Model (abbreviated as IC Model), which captures the independent behavior of the agents, and the Linear Threshold Model (abbreviated as LT Model), which captures the collective behavior of the agents (detailed discussion is deferred till Section 2.5) [14]. In both the models, infor[…] graph with the users as the vertex set and social ties among the users as the edge set. It is also assumed that the diffusion threshold (a numerical measure of how hard it is to influence a user; the larger the value, the harder the user is to influence) is given as the vertex weight and the influence probability between two users as the edge weight. In this setting, the SIM Problem is stated as follows: for a given size $k$ ($k \in \mathbb{Z}^{+}$), choose the set $\mathcal{S}$ of $k$ nodes such that $\sigma(\mathcal{S})$ is maximized [15]. Here $\sigma(\cdot)$ is the social influence function. For any given seed set $\mathcal{S}$, $\sigma(\mathcal{S})$ returns the set of influenced nodes when the diffusion process is over.

In this survey, we have mainly focused on three aspects of the problem, as mentioned below.

• Variants of this problem studied in the literature,

• Hardness results of this problem in both the traditional and the parameterized complexity frameworks,

• Different solution approaches proposed in the literature.

The overview of this survey is shown in Figure 1. There are several other aspects of the problem, such as SIM in the presence of adversaries, in a time-varying social network, in a competitive scenario, etc., which we have not considered in this survey.

Figure 1: Overview of this survey

The main goal of this survey is threefold:

• to provide a comprehensive understanding of the SIM Problem and its different variants studied in the literature,

• to develop a taxonomy for classifying the existing solution methodologies and present them in a concise manner,

• to present an overview of the current research trends and future research directions regarding this problem.

We set the following two criteria for the studies to be included in this survey:

• The research work presented in the publication should produce theoretically or empirically better results than some of the previously published results.

• The presented solution methodology should be generic, i.e., it should work for a network of any topology.

The rest of the paper is organized as follows: Section 2 describes the background material required to understand the subsequent sections of this paper. Section 3 formally introduces the SIM Problem and its variants studied in the literature. Section 4 describes hardness results of this problem in both the traditional and the parameterized complexity theory frameworks. Section 5 describes some major research challenges in and around this problem. Section 6 describes the proposed taxonomy for classifying the existing solution methodologies into different categories and discusses them. Section 7 presents the summary of the survey and gives some future research directions. Finally, Section 8 presents concluding remarks regarding this survey.

2. Background

In this section, we describe relevant background topics up to the required depth, such as basic graph theory, the relation between SIM and existing graph theoretic problems, approximation algorithms, parameterized complexity theory and information diffusion models in social networks. The symbols and notations used in the subsequent sections of this paper are given in Table 1.

Table 1: Symbols and Notations

| Symbol | Interpretation |
|---|---|
| $G(V, E, \theta, P)$ | Directed, vertex and edge weighted social network |
| $V(G)$ | Set of vertices of network $G$ |
| $E(G)$ | Set of edges of network $G$ |
| $U$ | Set of users of the network, i.e., $U = V(G)$ |
| $n$ | Number of users of the network, i.e., $n = \lvert V(G) \rvert$ |
| $m$ | Number of edges of the network, i.e., $m = \lvert E(G) \rvert$ |
| $\theta$ | Vertex weight function of $G$, i.e., $\theta : V(G) \to [0, 1]$ |
| $\theta_i$ | Weight of vertex $u_i$, i.e., $\theta_i = \theta(u_i)$ |
| $P$ | Edge weight function, i.e., $P : E(G) \to [0, 1]$ |
| $p_{ij}$ | Edge weight of the edge $(u_i u_j)$ |
| $\mathcal{N}(u_i)$ | Open neighborhood of vertex $u_i$ |
| $\mathcal{N}[u_i]$ | Closed neighborhood of vertex $u_i$ |
| $[n]$ | Set $\{1, 2, \ldots, n\}$ |
| $\mathcal{N}^{in}(u_i)$ | Incoming neighbors of vertex $u_i$ |
| $\mathcal{N}^{out}(u_i)$ | Outgoing neighbors of vertex $u_i$ |
| $deg^{in}(u_i)$ | Indegree of vertex $u_i$ |
| $deg^{out}(u_i)$ | Outdegree of vertex $u_i$ |
| $dist(u, v)$ | Number of edges in the shortest path between $u$ and $v$ |
| $\mathcal{S}$ | Seed set for diffusion, i.e., $\mathcal{S} \subset V(G)$ |
| $k$ | Maximum allowable cardinality for the seed set, i.e., $\lvert \mathcal{S} \rvert \le k$ |
| $r$ | Maximum allowable round for diffusion |

Graphs are popularly used to represent most real-world networked systems, including social networks [16] [17]. Here, we report some preliminary concepts of basic graph theory from [18]. A graph is denoted by $G(V, E)$, where $V(G)$ and $E(G)$ are the vertex set and edge set of $G$, respectively. For any arbitrary vertex $u_i \in V(G)$, its open neighborhood is defined as $\mathcal{N}(u_i) = \{u_j \mid (u_i u_j) \in E(G)\}$. The closed neighborhood of $u_i$ is $\mathcal{N}[u_i] = \{u_i\} \cup \mathcal{N}(u_i)$. The degree of a vertex is defined as the cardinality of its open neighborhood, i.e., $deg(u_i) = |\mathcal{N}(u_i)|$. For any $S \subset V(G)$, its open neighborhood and closed neighborhood are $\mathcal{N}(S) = \bigcup_{u_i \in S} \mathcal{N}(u_i)$ and $\mathcal{N}[S] = S \cup \mathcal{N}(S)$, respectively. Two vertices $u_i$ and $u_j$ are said to be true twins if $\mathcal{N}[u_i] = \mathcal{N}[u_j]$, and false twins if $\mathcal{N}(u_i) = \mathcal{N}(u_j)$. A graph is weighted if a real number is associated with its vertices or edges or both. A graph is directed if its edges have directions. Edges that join the same pair of vertices are known as parallel edges, and an edge whose two end points are the same is known as a self-loop. A graph is simple if it is free from self-loops and parallel edges.

The information diffusion process in a social network is represented by a simple, directed, vertex and edge weighted graph $G(V, E, \theta, P)$. Here, $V(G) = \{u_1, u_2, \ldots, u_n\}$ is the set of users of the network and $E(G) = \{e_1, e_2, \ldots, e_m\}$ is the set of social ties among the users. $\theta$ and $P$ are the vertex and edge weight functions, which assign a numerical value between 0 and 1 to each vertex and edge, respectively, as its weight, i.e., $\theta : V(G) \to [0, 1]$ and $P : E(G) \to (0, 1]$. In information diffusion, vertex and edge weights are called node threshold and diffusion probability, respectively [19]. The larger the value of $\theta_i$, the harder it is to influence user $u_i$, and the larger the value of $p_{ij}$, the more probable it is that $u_i$ can influence $u_j$. For any user $u_i \in V(G)$, its incoming neighbors and outgoing neighbors are defined as $\mathcal{N}^{in}(u_i) = \{u_j \mid (u_j u_i) \in E(G)\}$ and $\mathcal{N}^{out}(u_i) = \{u_j \mid (u_i u_j) \in E(G)\}$, respectively. For any user $u_i \in V(G)$, its indegree and outdegree are defined as $deg^{in}(u_i) = |\mathcal{N}^{in}(u_i)|$ and $deg^{out}(u_i) = |\mathcal{N}^{out}(u_i)|$, respectively. A path in a directed graph is a sequence of vertices without repetition, such that there is an edge between every two consecutive vertices. Two users are connected in the graph $G$ if there exists a directed path between them. A directed graph is said to be connected if there exists a path between every pair of users.

The TSS Problem is a generalized version of many standard graph theoretic problems discussed in the literature, such as dominating set with threshold [20], vector domination problem [21], k-tuple dominating set [22] (in all these problems, instead of multiple rounds, diffusion can run only for one round), vertex cover [23] (in this problem, the vertex threshold is set equal to the number of neighbors of the node), irreversible k-conversion problem [24], r-neighbor bootstrap percolation problem [25] (where the threshold of each vertex is k or r, respectively) and dynamic monopolies [26] (in this case, the threshold is half of the neighbors of the user).

Most of the optimization problems arising in real life are NP-Hard [27]. Hence, unless P = NP, we cannot expect to solve them by any deterministic algorithm in polynomial time. So, the goal is to get an approximate solution of the problem within affordable time.
Approximation algorithms serve this purpose and also provide a worst-case guarantee on solution quality. For a maximization problem $P$, let $\mathcal{A}$ be an algorithm that provides its solution and $\mathcal{I}$ be the set of all possible input instances of $P$. For an input instance $I$ of $P$, let $\mathcal{A}^{*}(I)$ be the optimal solution and $\mathcal{A}(I)$ be the solution generated by the algorithm $\mathcal{A}$. Now, $\mathcal{A}$ is called an $\alpha$-factor absolute approximation algorithm if $\forall I \in \mathcal{I}$, $|\mathcal{A}^{*}(I) - \mathcal{A}(I)| \le \alpha$, and an $\alpha$-factor relative approximation algorithm if $\forall I \in \mathcal{I}$, $\max\left\{ \frac{\mathcal{A}^{*}(I)}{\mathcal{A}(I)}, \frac{\mathcal{A}(I)}{\mathcal{A}^{*}(I)} \right\} \le \alpha$ (with $\mathcal{A}(I), \mathcal{A}^{*}(I) \neq 0$) [28]. Section 6.1 of this paper describes relative approximation algorithms for solving the SIM Problem.

Parameterized complexity theory is another way of dealing with NP-Hard optimization problems. It aims to classify computational problems based on their inherent difficulty with respect to multiple parameters related to the problem. There are several complexity classes in parameterized complexity theory. The class FPT (Fixed Parameter Tractable) contains the problems that, for instances $(x, k) \in I$, where $x$ is the input, $k$ is the parameter and $I$ is the set of instances, can be solved in running time $O(f(k)\,|x|^{O(1)})$, where $f(k)$ is a function depending only on $k$ and $|x|$ denotes the length of the input. The W hierarchy is the collection of complexity classes with the property $W[0] = FPT$ and $W[i] \subseteq W[j]$ $\forall i \le j$ [29]. Many natural computational problems occupy the lower levels of the hierarchy, i.e., $W[1]$ and $W[2]$. In Section 4, we describe hardness results of the TSS Problem in the parameterized complexity theoretic setting.

Diffusion phenomena in networked systems have received attention from different disciplines, such as epidemiology (how do diseases spread in a human contact network?) [30], social network analysis (how does information propagate in a social network?) [31], computer networks (how does a computer virus propagate in an e-mail network?) [32], etc.

Information diffusion in an online social network is a phenomenon by which the word-of-mouth effect occurs electronically. Hence, the mechanism of information diffusion is very well studied [33] [34]. To study the diffusion process, there are some models in the literature [35]. The nature of these models varies from deterministic to probabilistic. Here, we describe some well-studied information diffusion models from the literature.

• Independent Cascade Model (IC Model) [14]: This is one of the well-studied probabilistic diffusion models, used by Kempe et al. [36] in their seminal work on social influence maximization. In this model, a node can either be in the active state (i.e., influenced) or in the inactive state (i.e., not influenced). Initially (i.e., at $t = 0$), all the nodes except the seeds are inactive. Every node $u_i$ that becomes active at time stamp $t$ gets a chance to activate each of its currently inactive neighbors ($u_j \in \mathcal{N}^{out}(u_i)$ and $u_j$ is inactive) with probability equal to the weight of the edge between them. If $u_i$ succeeds, then $u_j$ becomes an active node at time stamp $t + 1$. A node can change its state from inactive to active but not from active to inactive. This cascading process continues until no new node becomes active in a time stamp. Suppose this diffusion process starts at $t = 0$ and continues till $t = T$, and $\mathcal{A}_t$ denotes the set of nodes active till time stamp $t$, where $t \in [0, T]$; then $\mathcal{A}_0 \subseteq \mathcal{A}_1 \subseteq \cdots \subseteq \mathcal{A}_t \subseteq \mathcal{A}_{t+1} \subseteq \cdots \subseteq \mathcal{A}_T \subseteq V(G)$. Node $u_i$ is said to be activated at time stamp $t$ if $u_i \in \mathcal{A}_t \setminus \mathcal{A}_{t-1}$.

• Linear Threshold Model (LT Model) [14]: This is another probabilistic diffusion model proposed by Kempe et al. [36]. In this model, for any node (say $u_i$), all its neighbors that are active at the previous time stamp together try to activate that node. This activation process is successful if the sum of the incoming active neighbors' probabilities is greater than or equal to the node's threshold, i.e., if $\sum_{u_j \in \mathcal{N}^{in}(u_i);\, u_j \in \mathcal{A}_t} p_{ji} \ge \theta_i$, then $u_i$ becomes active at time stamp $t + 1$. This process continues until no more activation is possible. In this model, we can use negative influence, which is not possible in the IC Model. Later, several extensions of these two fundamental models have been proposed [37]. In both the IC and the LT Model, it is assumed that the diffusion probability between two users is known. However, later there were several studies on computing the diffusion probability [38] [39] [40] [41] [42].

• Shortest Path Model (SP Model): This is a special case of the IC Model proposed by Kimura et al. [33]. In this model, an inactive node gets a chance to become active only through the shortest path from the initially active nodes, i.e., at $t = \min_{u \in \mathcal{A}_0,\, v \in V(G) \setminus \mathcal{A}_0} dist(u, v)$. A slightly different variation of the SP Model proposed by the same authors is the SP1 Model, in which an inactive node gets a chance of activation at $t = \min_{u \in \mathcal{A}_0,\, v \in V(G) \setminus \mathcal{A}_0} dist(u, v)$ and at $t = \min_{u \in \mathcal{A}_0,\, v \in V(G) \setminus \mathcal{A}_0} dist(u, v) + 1$.

• Majority Threshold Model (MT Model): This is the deterministic threshold model proposed by Valente [43]. In this model, the vertex threshold is defined as $\theta_i = \left\lceil \frac{deg(u_i)}{2} \right\rceil$, which means that a node becomes active when at least half of its neighbors are already active.

• Constant Threshold Model (CT Model): This is another deterministic diffusion model, where the vertex threshold can be any value from 1 to its degree, i.e., $\theta_i \in [deg(u_i)]$.

• Unanimous Threshold Model (UT Model) [23]: This is the most influence-resistant model of diffusion. In this model, the threshold value of each node in the network is set to its degree, i.e., $\forall u_i \in V(G)$, $\theta_i = deg(u_i)$.

There are many other diffusion models, such as the weighted cascade model, where the edge weight is the reciprocal of the degree of the node, and the trivalency model, where the edge weights are taken uniformly from the set $\{0.1, 0.01, 0.001\}$. Readers requiring a detailed and exhaustive treatment of information diffusion models may refer to [44].

3. SIM Problem and its Variants

In the literature, the SIM Problem has been studied since the early 2000s. Initially, this problem was introduced by Domingos and Richardson in the context of viral marketing [45]. Due to its substantial practical importance across multiple domains, different variants of this problem have been introduced. In this section, we describe them one by one.
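Every variant in this section is defined through the influence function σ(S), so it may help to see how σ(S) can be evaluated in practice. The following is a minimal sketch, assuming the Independent Cascade Model described in Section 2 and a plain Monte Carlo average over repeated cascades. The function names `simulate_ic` and `estimate_spread`, the adjacency format and the toy graph are illustrative assumptions, not part of the surveyed work.

```python
import random


def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade and return the final set of active nodes.

    out_edges: dict mapping node -> list of (out_neighbor, probability) pairs
               (hypothetical input format chosen for this sketch).
    seeds:     iterable of initially active nodes.
    rng:       a random.Random instance, so runs are reproducible.
    """
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time stamp
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                # A newly active node gets one chance per inactive out-neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy network: a -> b and b -> d always succeed, a -> c never does.
toy = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
print(sorted(simulate_ic(toy, {"a"}, random.Random(1))))  # ['a', 'b', 'd']
```

With probabilities strictly between 0 and 1 the estimate varies with `runs` and `seed`; more runs give a tighter estimate at a proportionally higher cost.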

Basic SIM Problem [46]: In the basic version of the TSS Problem, along with a directed social network $G(V, E, \theta, P)$, we are given two integers $k$ and $\lambda$, and asked to find a subset of at most $k$ nodes such that, after the diffusion process is over, at least $\lambda$ nodes are activated. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $\lambda \in [n]$ and $k \in \mathbb{Z}^{+}$.
Problem: Basic TSS Problem [Find an $\mathcal{S} \subset V(G)$ such that $|\mathcal{S}| \le k$ and $|\sigma(\mathcal{S})| \ge \lambda$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| \le k$.

Top k-node Problem / Social Influence Maximization Problem (SIM Problem) [47]: This variant of the problem is the most well studied. For a given social network $G(V, E, \theta, P)$, this problem asks to choose a set $\mathcal{S}$ of $k$ nodes (i.e., $\mathcal{S} \subset V(G)$ and $|\mathcal{S}| = k$) such that the maximum number of nodes of the network become influenced at the end of the diffusion process, i.e., $\sigma(\mathcal{S})$ is maximized. Most of the algorithms presented in Section 6 are developed solely for solving this problem. Mathematically, the Top k-node Problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $k \in \mathbb{Z}^{+}$.
Problem: Top k-node Problem [Find an $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| = k$ such that for any other $\mathcal{S}' \subset V(G)$ with $|\mathcal{S}'| = k$, $\sigma(\mathcal{S}) \ge \sigma(\mathcal{S}')$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| = k$.

Influence Spectrum Problem [48]: In this problem, along with the social network $G(V, E, \theta, P)$, we are also given two integers $k_{lower}$ and $k_{upper}$ with $k_{upper} > k_{lower}$. Our goal is to choose a set $\mathcal{S}$ for each $k \in [k_{lower}, k_{upper}]$ such that the social influence in the network ($\sigma(\mathcal{S})$) is maximum in each case. Intuitively, solving one instance of this problem is equivalent to solving $(k_{upper} - k_{lower} + 1)$ instances of the SIM Problem. As viral marketing is basically done in different phases, and in each phase seed sets of different cardinalities can be used, the influence spectrum problem appears in a natural way. Mathematically, the influence spectrum problem can be written as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $k_{lower}, k_{upper} \in \mathbb{Z}^{+}$ with $k_{upper} > k_{lower}$.
Problem: Influence Spectrum Problem [Find an $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| = k$, $\forall k \in [k_{lower}, k_{upper}]$, such that for any other $\mathcal{S}' \subset V(G)$ with $|\mathcal{S}'| = k$, $\sigma(\mathcal{S}) \ge \sigma(\mathcal{S}')$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| = k$ for each $k \in [k_{lower}, k_{upper}]$.

λ Coverage Problem [47]:

This is another variant of the SIM Problem, which considers the minimum number of influenced nodes required at the end of diffusion. For a given social network $G(V, E, \theta, P)$ and a constant $\lambda \in [n]$, this problem asks to find a subset $\mathcal{S}$ of its nodes with minimum cardinality, such that at least $\lambda$ nodes are influenced at the end of the diffusion process. Mathematically, this problem can be described in the following way:

Instance: A directed graph $G(V, E, \theta, P)$ and $\lambda \in [n]$.
Problem: λ Coverage Problem [Find the minimum cardinality subset $\mathcal{S} \subset V(G)$ such that $|\sigma(\mathcal{S})| \ge \lambda$].
Output: The minimum cardinality seed set $\mathcal{S}$ for diffusion.

Weighted Target Set Selection Problem (WTSS Problem) [49]: This is another (in fact, weighted) variant of the SIM Problem. Along with a social network $G(V, E, \theta, P)$, we are given another vertex weight function, $\phi : V(G) \to \mathbb{N}$, signifying the cost associated with each vertex. This problem asks to find a subset $\mathcal{S}$ that minimizes the total selection cost while ensuring that all the nodes are influenced at the end of diffusion. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and a vertex cost function $\phi : V(G) \to \mathbb{N}$.
Problem: Weighted TSS Problem [Find the subset $\mathcal{S} \subset V(G)$ such that $\phi(\mathcal{S})$ is minimum and $|\sigma(\mathcal{S})| = n$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with minimum $\phi(\mathcal{S})$ value.

r-round min-TSS Problem [50]: This is a variant of the SIM Problem that considers the number of rounds required to complete the diffusion process. Along with a directed graph $G(V, E, \theta, P)$, we are given the maximum number of allowable rounds $r \in \mathbb{Z}^{+}$, and asked to find a minimum cardinality seed set $\mathcal{S}$ that activates all the nodes of the network within $r$ rounds. Mathematically, this problem can be described as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and $r \in \mathbb{Z}^{+}$.
Problem: r-round min-TSS Problem [Find the minimum cardinality subset $\mathcal{S}$ such that $\bigcup_{i=1}^{r} \sigma_i(\mathcal{S}) = V(G)$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$.

Here, $\sigma_i(\mathcal{S})$ denotes the set of nodes influenced from the seed set $\mathcal{S}$ at the $i$-th round of diffusion.

Budgeted Influence Maximization Problem (BIM Problem) [51]:

This is another variant of the SIM Problem, which has recently been gaining popularity. Along with a directed graph $G(V, E, \theta, P)$, we are given a cost function $C : V(G) \to \mathbb{Z}^{+}$ and a fixed budget $B \in \mathbb{Z}^{+}$. The cost function $C$ assigns a nonuniform selection cost to every vertex of the network, which is the amount of incentive that needs to be paid if that vertex is selected as a seed node. This problem asks for a seed set within the budget that maximizes the spread of influence in the network.

Instance: A directed graph $G(V, E, \theta, P)$, a cost function $C : V(G) \to \mathbb{Z}^{+}$ and an affordable budget $B \in \mathbb{Z}^{+}$.
Problem: Budgeted Influence Maximization Problem [Find the seed set $\mathcal{S}$ such that $\sum_{u \in \mathcal{S}} C(u) \le B$ and for any other seed set $\mathcal{S}'$ with $\sum_{v \in \mathcal{S}'} C(v) \le B$, $|\sigma(\mathcal{S})| \ge |\sigma(\mathcal{S}')|$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $\sum_{u \in \mathcal{S}} C(u) \le B$.

(λ, β, α) TSS Problem [52]:

This is another variant of the TSS Problem, which considers the maximum cardinality of the seed set ($\beta$), the maximum allowable diffusion rounds ($\lambda$) and the number of influenced nodes at the end of the diffusion process ($\alpha$) all together. Along with the input graph $G(V, E, \theta, P)$, we are given the parameters $\lambda$, $\beta$ and $\alpha$. Mathematically, this problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$ and three parameters $\lambda, \beta \in \mathbb{N}$ and $\alpha \in [n]$.
Problem: (λ, β, α) TSS Problem [Find the subset $\mathcal{S} \subset V(G)$ such that $|\mathcal{S}| \le \beta$ and $\left| \bigcup_{i=1}^{\lambda} \sigma_i(\mathcal{S}) \right| \ge \alpha$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| \le \beta$.

(λ, β, A) TSS Problem [52]:

This is slightly different from the (λ, β, α) TSS Problem: instead of the required number of influenced nodes after the diffusion process, it explicitly specifies which nodes should be influenced. Along with the input social network $G(V, E, \theta, P)$, we are also given the maximum allowable rounds ($\lambda$), the maximum cardinality of the seed set ($\beta$) and the set of nodes $A \subseteq V(G)$ that need to be influenced at the end of the diffusion process. This problem asks for a seed set of at most $\beta$ elements that influences all the nodes in $A$ within $\lambda$ rounds of diffusion. Mathematically, the problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $A \subseteq V(G)$ and two parameters $\lambda, \beta \in \mathbb{N}$.
Problem: (λ, β, A) TSS Problem [Find the subset $\mathcal{S} \subset V(G)$ such that $|\mathcal{S}| \le \beta$ and $A \subseteq \bigcup_{i=1}^{\lambda} \sigma_i(\mathcal{S})$].
Output: The seed set for diffusion $\mathcal{S} \subset V(G)$ with $|\mathcal{S}| \le \beta$.

(λ, A) TSS Problem [52]:

This is slightly different from the (λ, β, A) TSS Problem. Here, we are interested in finding the minimum cardinality seed set such that, within some fixed number of diffusion rounds ($\lambda$), a given subset of the nodes ($A$) is influenced. Mathematically, the problem can be stated as follows:

Instance: A directed graph $G(V, E, \theta, P)$, $A \subset V(G)$ and $\lambda \in \mathbb{N}$.
Problem: (λ, A) TSS Problem [Find the subset $\mathcal{S}$ such that $A \subseteq \bigcup_{i=1}^{\lambda} \sigma_i(\mathcal{S})$ and for any other $\mathcal{S}'$ with $|\mathcal{S}'| < |\mathcal{S}|$, $A \not\subseteq \bigcup_{i=1}^{\lambda} \sigma_i(\mathcal{S}')$].
Output: Minimum cardinality seed set for diffusion $\mathcal{S} \subset V(G)$.

We have described the different variants of the TSS Problem in social networks available in the literature. It is surprising to see that only the Top k-node Problem has been studied in depth.
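To make the problem statements of this section concrete, the following is a minimal sketch that solves two of the variants by exhaustive search on a toy network. It assumes a deterministic threshold rule (the LT activation condition with fixed thresholds), counts the seeds themselves as influenced, and treats the seeds as round 0. The names `diffuse`, `top_k` and `lambda_coverage` and the input format are hypothetical, and the search is exponential in the number of nodes, so it only illustrates the definitions and is not one of the surveyed algorithms.

```python
from itertools import combinations


def diffuse(in_edges, theta, seeds, max_rounds=None):
    """Deterministic threshold diffusion.

    in_edges:   dict node -> {in_neighbor: weight p_ji} (assumed format).
    theta:      dict node -> threshold; its keys define the node set.
    max_rounds: optional cap on the number of diffusion rounds.
    Returns the set of active nodes when diffusion stops.
    """
    active = set(seeds)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        newly = {
            v for v in theta
            if v not in active
            and sum(w for u, w in in_edges.get(v, {}).items() if u in active)
            >= theta[v]
        }
        if not newly:
            break  # fixed point: no further activation is possible
        active |= newly
        rounds += 1
    return active


def top_k(in_edges, theta, k):
    """Top k-node Problem: a size-k seed set with the largest final spread."""
    return max(combinations(sorted(theta), k),
               key=lambda s: len(diffuse(in_edges, theta, s)))


def lambda_coverage(in_edges, theta, lam, max_rounds=None):
    """lambda Coverage Problem: a smallest seed set influencing >= lam nodes.

    Passing max_rounds=r and lam=n gives the r-round min-TSS Problem.
    """
    nodes = sorted(theta)
    for size in range(len(nodes) + 1):
        for s in combinations(nodes, size):
            if len(diffuse(in_edges, theta, s, max_rounds)) >= lam:
                return set(s)
    return None


# Toy directed path a -> b -> c -> d with unit weights and thresholds.
in_edges = {"b": {"a": 1.0}, "c": {"b": 1.0}, "d": {"c": 1.0}}
theta = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
print(top_k(in_edges, theta, 1))                          # ('a',)
print(lambda_coverage(in_edges, theta, 4, max_rounds=1))  # the set {a, c}
```

On the path, one seed suffices without a round limit, while restricting diffusion to a single round forces a second seed, which is the distinction the round-bounded variants capture.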

4. Hardness Results of TSS Problem

In this section, we describe hardness results for the SIM Problem from both the general and the parameterized complexity theoretic perspectives. Initially, the problem of social influence maximization was posed by Domingos and Richardson [45] [53] in the context of viral marketing. However, Kempe et al. [36] were the first to investigate the computational issues of the problem. They showed that the Set Cover Problem and the Vertex Cover Problem are special cases of the SIM Problem under the IC and LT Models, respectively. Both the set cover and vertex cover problems are well-known NP-Hard problems [27]. The conclusion is presented as Theorem 1.

Theorem 1. [36] The Social Influence Maximization Problem is NP-Hard for both the IC and the LT Model, and it is also NP-Hard to approximate within a factor of $n^{(1-\epsilon)}$ $\forall \epsilon > 0$.

Chen [23] studied a variant of the SIM Problem, namely the λ Coverage Problem. His study was different from that of Kempe et al. [36] in two ways. First, Kempe et al. [36] investigated the Top k-node Problem, whereas Chen [23] studied the λ Coverage Problem. Second, Kempe et al. [36] studied the diffusion process under the IC and LT Models, which are probabilistic in nature, whereas Chen [23] considered deterministic diffusion models, namely the majority threshold model, the constant threshold model and the unanimous threshold model. In general, for the λ Coverage Problem, Chen [23] came up with a seminal result presented in Theorem 2.

Theorem 2. [23] The TSS Problem cannot be approximated within a factor of $O(2^{\log^{(1-\epsilon)} n})$ unless $NP \subset DTIME(n^{polylog(n)})$, for any fixed constant $\epsilon > 0$.

This theorem can be proved by a reduction from the Minimum Representative Problem given in [54]. Next, Chen [23] showed that in the majority threshold model also, the λ Coverage Problem follows a similar result to the one presented in Theorem 2. However, when $\theta(u) = 1$, $\forall u \in V(G)$, the TSS Problem can be solved very intuitively, as targeting one node in each component results in the activation of all the nodes of the network. Surprisingly, this problem becomes hard when we allow the vertex threshold to be at most 2, i.e., $\theta(u) \le 2$, $\forall u \in V(G)$. The following result was proved in this regard.

Theorem 3. [23] The TSS Problem is NP-Hard when thresholds are at most 2, even for bounded bipartite graphs.

This theorem can be proved by a reduction from a variant of the 3-SAT Problem presented in [55]. Moreover, Chen [23] has shown that for the unanimous threshold model, the TSS Problem is equivalent to the vertex cover problem, which is a well-known NP-Complete problem.

Theorem 4. [23] If all the vertex thresholds of the graph are unanimous (i.e., $\forall u \in V(G)$, $\theta(u) = deg(u)$), then the TSS Problem is identical to the vertex cover problem.

Chen [23] has also shown that if the underlying graph is a tree, then the TSS Problem can be solved in polynomial time, and has given the ALG-Tree Algorithm, which does this computation. To the best of the authors' knowledge, there is no other literature that focuses on the hardness analysis of the TSS Problem from the traditional complexity theoretic perspective. We have summarized the results in Table 2.

Now, we describe the hardness results from the parameterized complexity theoretic perspective. For basic notions of parameterized complexity, readers may refer to [56]. Bazgan et al. [57] showed that the SIM Problem under the constant threshold model (CTM) does not have any parameterized approximation algorithm with respect to the parameter seed set size. Chopin et al. [58] [59] studied the TSS Problem in parameterized settings with respect to parameters related to network cohesiveness, such as clique cover number (number of cliques required to cover all the vertices of the network [60]), distance to clique (number of vertices that need to be deleted to obtain a clique) and cluster vertex deletion number (number of vertices to delete in order to obtain a collection of disjoint cliques); parameters related to network density, such as distance to cograph and distance to interval graph; and parameters related to the sparsity of the network, namely vertex cover number (number of vertices to remove to obtain an edgeless graph), feedback edge set number and feedback vertex set number (number of edges or vertices to remove to obtain a forest), pathwidth and bandwidth. It is interesting to note that computing each of these parameters, except the feedback edge set number, is an NP-Hard problem. The version of the TSS Problem they worked with is the λ Coverage Problem with $\lambda = n$. They came up with the following two important results related to the sparsity parameters of the network:

Theorem 5. [58] The TSS Problem with the majority threshold model is W[1]-Hard even with respect to the combined parameter feedback vertex set, distance to cograph, distance to interval graph and pathwidth.

Theorem 6. [58] The TSS Problem is fixed-parameter tractable with respect to the parameter bandwidth.

To prove the above two theorems, the authors used the reduction rules of [61] and [62]. Results related to the dense-structure properties of the network are given in Theorems 7 through 9.

Theorem 7. TSS Problem is W[1]-Hard with respect to the parameter cluster vertex deletion number.

Theorem 8. TSS Problem is NP-Hard and W[2]-Hard with respect to the parameter target set size ($k$), even on graphs with clique cover number of two.

Theorem 9. TSS Problem is fixed-parameter tractable with respect to the parameter 'distance $l$ to clique', if the threshold function satisfies the following property: $\theta(u) > g(l) \Rightarrow \theta(u) = f(\Gamma(u))$ $\forall u \in V(G)$, where $f : P(V(G)) \to \mathbb{N}$ and $g : \mathbb{N} \to \mathbb{N}$.

For detailed proofs of Theorems 7 through 9, readers may refer to [58]. All the results related to parameterized complexity theory are summarized in Table 3.
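The $\lambda$-coverage variant with $\lambda = n$ used in these results asks whether a chosen seed set eventually activates every vertex. A minimal Python sketch of that check under a deterministic threshold process is given below. The adjacency-dictionary encoding, the function names `activate` and `is_target_set`, and the toy path graph are assumptions made for illustration; they are not taken from [58], and thresholds are assumed to be at least 1.

```python
from collections import deque

def activate(adj, theta, seeds):
    """Run a deterministic threshold process and return the final active set.

    adj   : dict mapping each vertex to a list of its neighbours (undirected)
    theta : dict mapping each vertex to its integer threshold (assumed >= 1)
    seeds : iterable of initially active vertices
    """
    active = set(seeds)
    active_nbrs = {u: 0 for u in adj}   # active neighbours counted so far
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in active:
                continue
            active_nbrs[v] += 1
            if active_nbrs[v] >= theta[v]:   # threshold reached: v becomes active
                active.add(v)
                queue.append(v)
    return active

def is_target_set(adj, theta, seeds):
    """True if the seeds activate all n vertices (lambda-coverage with lambda = n)."""
    return len(activate(adj, theta, seeds)) == len(adj)

# Hypothetical toy instance: the path 0 - 1 - 2 - 3, where vertex 2 needs two active neighbours.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
thresholds = {0: 1, 1: 1, 2: 2, 3: 1}
print(is_target_set(path, thresholds, [0]))
print(is_target_set(path, thresholds, [0, 3]))
```

Each vertex enters the queue at most once, so the check runs in time linear in the size of the graph; the hardness discussed above lies in finding a small target set, not in verifying one.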

Table 2: Hardness results of TSS Problem and its variants from the traditional complexity-theoretic perspective.

| Name of the Problem | Diffusion Model | Major Findings |
|---|---|---|
| SIM | IC Model | Contains the set cover problem as a special case and hence NP-Hard. |
| SIM | LT Model | Contains the vertex cover problem as a special case and hence NP-Hard. |
| $\lambda$-Coverage Problem | MT Model | Not only NP-Hard, but also cannot be approximated within a factor of $O(2^{\log^{1-\epsilon} n})$ unless $NP \subseteq DTIME(n^{polylog(n)})$. |
| $\lambda$-Coverage Problem | CT Model with $\theta(u) = 1$, $\forall u \in V(G)$ | Can be solved trivially by selecting a vertex from each component of the network. |
| $\lambda$-Coverage Problem | CT Model with $\theta(u) \le$ […], $\forall u \in V(G)$ | NP-Hard even for bounded bipartite graphs. |
| $\lambda$-Coverage Problem | UT Model | Identical to the vertex cover problem and hence NP-Hard. |

Table 3: Hardness results of TSS Problem and its variants from the parameterized complexity-theoretic perspective.

| Name of the Problem | Diffusion Model | Parameter | Major Findings |
|---|---|---|---|
| SIM | CT Model with $\theta(u) \in [deg(u)]$ | Seed set size | Does not have any parameterized approximation algorithm. |
| $\lambda$-coverage Problem with $\lambda = n$ | MT Model | Feedback vertex set number, pathwidth, distance to cograph, distance to interval graph | The problem is W[1]-Hard. |
| $\lambda$-coverage Problem with $\lambda = n$ | GT Model | Cluster vertex deletion number | The problem is W[1]-Hard. |
| $\lambda$-coverage Problem with $\lambda = n$ | CT Model | Cluster vertex deletion number | The problem is fixed-parameter tractable. |
| $\lambda$-coverage Problem with $\lambda = n$ | GT Model | Seed set size | The problem is W[2]-Hard. |
| $\lambda$-coverage Problem with $\lambda = n$ | MT Model, CT Model | Distance to clique | The problem is fixed-parameter tractable. |

5. Major Research Challenges

Before entering into the critical review of the existing solution methodologies, this section briefly discusses the major research challenges concerned with the SIM Problem. This will help the reader to understand which category of solution methodology can handle which challenge.

• Trade-off Between Accuracy and Computational Time: From the discussion in Section 4, the SIM Problem is, in general, computationally hard from both the traditional and the parameterized complexity-theoretic perspectives. Hence, for a given $k \in \mathbb{Z}^{+}$, obtaining the $k$ most influential nodes within feasible time is, in general, not possible. In this scenario, the intuitive approach is to use some heuristic method for selecting the seed nodes, which reduces the time for seed set generation. However, the number of nodes influenced by such seed nodes could be arbitrarily small. It is therefore important to design algorithms that run in affordable time while keeping the gap between the optimal spread and the spread of the selected seed set as small as possible.

• Breaking the Barrier of Submodularity: In general, the social influence function $\sigma(\cdot)$ is submodular (discussed in Section 6.1). However, in many practical situations, such as opinion- and topic-specific influence maximization, the social influence function may not be submodular [63] [64]. This happens because a node can switch its state from positive opinion to negative opinion and vice versa. In this scenario, solving the SIM Problem may be more challenging due to the absence of the submodularity property in the social influence function.

• Practicality of the Problem: In general, the SIM Problem makes many assumptions, for example that every selected seed will perform up to expectation in the spreading process, or that influencing each node of the network is equally important. These assumptions may be unrealistic in some situations. Consider the case of target advertisement, where, instead of all the nodes, a set of target nodes is chosen and the aim is to maximize the influence within the target nodes [65] [66]. Similarly, due to the probabilistic nature of diffusion, a seed node may not perform up to expectation in the influence spreading process. Solving the SIM Problem and its variants becomes more challenging if we relax these assumptions.

• Scalability: Real-life social networks have millions of nodes and billions of edges. So, for solving the SIM and related problems on real-life social networks, scalability is an important issue for any solution methodology.

• Theoretical Challenges: For a computational problem, any solution methodology is concerned with two aspects. The first is the computational time, measured as the execution time when the methodology is implemented and run on real-life problem instances. The second is the computational complexity, measured as the asymptotic bound of the methodology. Theoretical research on any computational problem is always concerned with the second aspect. Hence, the theoretical challenge for the SIM Problem is to design algorithms with good asymptotic bounds.

Figure 2: Proposed taxonomy for classifying the solution methodologies.
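The submodularity challenge listed in this section can be made concrete on very small instances. The sketch below brute-forces the monotonicity and diminishing-returns conditions for an arbitrary set function. The helper names and the two toy functions (a coverage function and a squared-cardinality function) are illustrative assumptions rather than objects from the surveyed papers, and the exhaustive enumeration is only practical for a handful of elements.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    """Check f(S + u) >= f(S) for every S and every u outside S."""
    return all(f(S | {u}) >= f(S)
               for S in subsets(ground) for u in ground - S)

def is_submodular(f, ground):
    """Check f(S + u) - f(S) >= f(T + u) - f(T) for all S subset of T, u outside T."""
    all_sets = list(subsets(ground))
    for T in all_sets:
        for S in all_sets:
            if not S <= T:
                continue
            for u in ground - T:
                if f(S | {u}) - f(S) < f(T | {u}) - f(T):
                    return False
    return True

# Hypothetical toy data: each "node" reaches a fixed set of items.
reach = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
ground = frozenset(reach)

def coverage(S):
    """Number of distinct items reached by the nodes in S (a submodular function)."""
    covered = set()
    for u in S:
        covered |= reach[u]
    return len(covered)

def squared_size(S):
    """|S|^2: monotone, but marginal gains grow, so it is not submodular."""
    return len(S) ** 2

print(is_monotone(coverage, ground), is_submodular(coverage, ground))
print(is_monotone(squared_size, ground), is_submodular(squared_size, ground))
```

A function that fails the second check, like the squared-cardinality example, is the kind of objective for which the greedy guarantee based on submodularity no longer applies.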

6. Solution Methodologies

Due to the inherent hardness of the SIM Problem, researchers have, over the years, developed algorithms that find a seed set with near-optimal influence spread. This section describes the solution methodologies available in the literature. We first describe our proposed taxonomy for classifying them. Figure 2 gives a diagrammatic representation of the proposed taxonomy, and its categories are described below.

• Approximation algorithms with provable guarantee: Algorithms in this category give a worst-case bound on the influence spread. However, most of them suffer from scalability issues, which means that their running time grows heavily as the network size increases. Many of the algorithms in this category have near-optimal asymptotic bounds.

• Heuristic solutions: Algorithms in this category do not give any worst-case bound on the influence spread. However, most of them are more scalable and have better running time than the algorithms of the previous category.

• Meta-heuristic solutions: Methodologies in this category are meta-heuristic optimization algorithms, many of which are based on evolutionary computation techniques. These algorithms also do not give any worst-case bound on the influence spread.

• Community-Based Solutions: Algorithms in this category use community detection on the underlying social network as an intermediate step to bring the problem down to the community level, which improves scalability. Most of the algorithms in this category are heuristic and hence do not provide any worst-case bound on the influence spread.

• Miscellaneous: Algorithms in this category do not follow any particular property and hence, we put them under this heading.
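To make the first two categories concrete, the sketch below contrasts a greedy hill-climbing selection, which estimates the spread by Monte Carlo simulation of the Independent Cascade (IC) model, with a simple highest-degree heuristic. Everything in it is an illustrative assumption: the graph encoding (a dictionary mapping each node to its out-neighbours and edge probabilities), the function names, the number of simulation runs, and the toy network. It is not the implementation of any surveyed paper.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph[u].items():
                # each newly active node gets one chance to activate each neighbour
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def estimate_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of `seeds`."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, k, runs=200, seed=0):
    """Greedy hill climbing: repeatedly add the node with the largest estimated marginal gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        base = estimate_spread(graph, chosen, runs, rng) if chosen else 0.0
        best, best_gain = None, float("-inf")
        for u in graph:
            if u in chosen:
                continue
            gain = estimate_spread(graph, chosen + [u], runs, rng) - base
            if gain > best_gain:
                best, best_gain = u, gain
        chosen.append(best)
    return chosen

def max_degree_seeds(graph, k):
    """Heuristic baseline: the k nodes with the most out-neighbours."""
    return sorted(graph, key=lambda u: len(graph[u]), reverse=True)[:k]

# Hypothetical toy network: "a" has one edge, but it leads to the hub "h".
toy = {"a": {"h": 1.0}, "h": {"x": 1.0, "y": 1.0, "z": 1.0},
       "x": {}, "y": {}, "z": {}}
print(greedy_seeds(toy, 1, runs=5), max_degree_seeds(toy, 1))
```

On this toy network the degree heuristic picks the hub, while the simulation-based greedy selection picks the node that reaches the hub and everything behind it. The price is that greedy evaluates the spread estimator once per candidate node per iteration, which is the scalability issue mentioned for the first category.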

6.1. Approximation Algorithms with Provable Guarantee

Kempe et al. [36] [67] [68] were the first to study the problem of social influence maximization as a combinatorial optimization problem, and they investigated its computational issues under two diffusion models, namely the LT and IC models. In their studies, they assumed that the social influence function $\sigma(\cdot)$ is submodular and monotone. The function $\sigma : 2^{V(G)} \to \mathbb{R}^{+}$ is submodular if it follows the diminishing return property, which means that $\forall S \subset T \subset V(G)$ and $u_i \in V(G) \setminus T$, $\sigma(S \cup \{u_i\}) - \sigma(S) \ge \sigma(T \cup \{u_i\}) - \sigma(T)$. The function $\sigma$ is monotone if, for any $S \subset V(G)$ and $\forall u_i \in V(G) \setminus S$, $\sigma(S \cup \{u_i\}) \ge \sigma(S)$. They proposed a greedy strategy for selecting the seed set, presented in Algorithm 1.

Algorithm 1: Kempe et al.'s [36] Greedy Algorithm for Seed Set Selection (Basic Greedy).
  Data: Given Social Network $G(V, E, \theta, P)$ and some $k \in \mathbb{Z}^{+}$.
  Result: Seed Set for diffusion $S \subset V(G)$.
  $S \leftarrow \phi$;
  for $i = 1$ to $k$ do
      $u = \arg\max_{u_i \in V(G) \setminus S} \; \sigma(S \cup \{u_i\}) - \sigma(S)$;
      $S \leftarrow S \cup \{u\}$
  return $S$

Starting with the empty seed set ($S$), Algorithm 1 iteratively selects the node that is currently not in $S$ and whose inclusion in $S$ causes the maximum marginal increment in $\sigma(\cdot)$. Let $S_i$ denote the seed set at the $i$-th iteration of the 'for' loop in Algorithm 1. In the $(i+1)$-th iteration, $S_{i+1} = S_i \cup \{u\}$, where $u$ is the node for which $\sigma(S_i \cup \{u\}) - \sigma(S_i)$ is the maximum among all $u \in V(G) \setminus S_i$. This iterative process continues until the allowed cardinality of $S$ is reached. Kempe et al. [36] showed that Algorithm 1 provides a $(1 - \frac{1}{e} - \epsilon)$ factor approximation bound with $\epsilon > 0$, as stated in Theorem 10.

Theorem 10. Algorithm 1 provides a $(1 - \frac{1}{e} - \epsilon)$ factor approximation bound, with $\epsilon > 0$, for the SIM Problem; i.e., if $S^{*}$ is the $k$-element optimal seed set, then $\sigma(S) \ge (1 - \frac{1}{e}) \cdot \sigma(S^{*})$, where $e = \sum_{x=0}^{\infty} \frac{1}{x!}$.

Though Algorithm 1 gives a good approximation bound on the influence spread, it suffers from two major shortcomings. First, for any given seed set $S$, exact computation of the influence spread (i.e., $\sigma(S)$) is #P-hard. Hence, they approximate the influence spread by running a huge number of Monte Carlo Simulations (MCS), counting the total number of influenced nodes over all simulation runs, and averaging over the number of runs. However, recently Maehara et al. [69] developed the first procedure for exact computation of the influence spread using binary decision diagrams. Secondly, the number of times the influence function ($\sigma(\cdot)$) needs to be evaluated is quite large. Selecting a seed set of size $k$ with $R$ MCS runs in a social network having $n$ nodes and $m$ edges requires $O(kmnR)$ influence function evaluations. Hence, applying this algorithm even to a medium-sized network (consisting of only 15000 nodes, though real-life networks are much larger) appears to be unrealistic [70], which means that the algorithm is not scalable enough.

In spite of these drawbacks, Kempe et al.'s [36] study is considered to be the foundational work on the SIM Problem, and it has triggered a vast amount of research in this direction. In most cases, the main focus was to reduce the scalability problem incurred by the Basic Greedy Algorithm of Kempe et al.'s work. Some of these studies ended up with heuristics, in which the obtained solution could be far away from the optimum. Still, there are a few studies in which the scalability problem was reduced significantly without losing the approximation ratio. Here, we list the algorithms that provide an approximation guarantee, whereas Section 6.2 describes the heuristic methods.

• CELF: To improve scalability, Leskovec et al. [11] proposed the Cost Effective Lazy Forward (CELF) scheme, which exploits the submodularity property of the social influence function. The key idea of their study is that the marginal gain in influence spread of any node in the current iteration cannot be more than its marginal gain in the previous iterations. Using this idea, they drastically reduced the number of evaluations of the influence estimation function ($\sigma(\cdot)$), which leads to a significant improvement in running time, though the asymptotic complexity remains the same as that of the Basic Greedy Algorithm (i.e., $O(kmnR)$). Reported results in their paper show that CELF can speed up the computation by up to 700 times compared to the Basic Greedy Algorithm on benchmark data sets. This algorithm is also applicable in many other contexts, such as finding informative blogs in a web blog network and optimal placement of sensors in a water distribution network for detecting outbreaks.

• CELF++: Goyal et al. [71] proposed an optimized version of CELF by further exploiting the submodularity property of the social influence function and named it CELF++. For each node $u$ of the network, CELF++ maintains a table of the form $\langle u.mg1, u.prev\_best, u.mg2, u.flag \rangle$, where $u.mg1$ is the marginal gain in $\sigma(\cdot)$ for the current $S$; $u.prev\_best$ is the node with the maximum marginal gain among the users scanned till now in the current iteration; $u.mg2$ is the marginal gain in $\sigma(\cdot)$ for $u$ with respect to $S \cup \{prev\_best\}$; and $u.flag$ is the iteration number when $u.mg1$ was last updated. The key idea is that if $u.prev\_best$ is included in the seed set in the current iteration, then the marginal gain of $u$ in $\sigma(\cdot)$ with respect to $S \cup \{prev\_best\}$ need not be recomputed in the next iteration. Reported results show that CELF++ is 35-55 % faster than CELF, though the asymptotic complexity remains the same.

• Static Greedy: Cheng et al. [72] developed this algorithm for solving the SIM Problem; it provides both guaranteed accuracy and high scalability. This algorithm works in two stages.
In the first stage, $R$ Monte Carlo snapshots are taken from the social network, where each edge $(uv)$ is selected based on the associated diffusion probability $p_{uv}$. In the second stage, starting from the empty seed set, the node having the maximum average marginal gain in influence spread over all sampled snapshots is selected as a seed node. This process continues until $k$ nodes are selected. This algorithm has a running time of $O(Rm + kRm'n)$ and a space requirement of $O(Rm')$, where $R$ and $m'$ are the number of Monte Carlo samples and the average number of active edges in the snapshots, respectively. Reported results show that Static Greedy reduces the computational time by two orders of magnitude, while achieving better influence spread compared to the Degree Discount Heuristic (DDH), Maximum Degree Heuristic (MDH), and Prefix excluding Maximum Influence Arborescence (PMIA) algorithms (discussed in Section 6.2).

• Borgs et al.'s Method: Borgs et al. [73] proposed a completely different approach for solving the SIM Problem under the IC Model, using the reverse reachable sampling technique. This is an approach for estimating the influence spread that does not rely on MCS runs. Their algorithm is randomized, succeeds with probability […], and has a running time of $O((m+n)\,\epsilon^{-3} \log n)$, which improves on the previously best known algorithm, whose complexity is $O(mnk \cdot POLY(\epsilon^{-1}))$. The algorithm proposed by Borgs et al. is near-optimal, since the lower bound is $\Omega(m+n)$. It works in two phases. In the first phase, a hypergraph ($H$) is generated stochastically from the input social network. The second phase is concerned with seed set selection, which is done by repeatedly choosing the node with maximum degree in $H$ and deleting it, along with its incident edges, from $H$. The $k$-element set obtained in this way is the seed set for diffusion. This work is mostly theoretical and lacks practical experimentation.

• Zohu et al.'s Method: Zohu et al. [74] improved the approximation bound from $(1 - \frac{1}{e})$ (which is approximately 0.63) to 0.857. They designed two approximation algorithms: the first works for the problem in which the cardinality of the seed set ($S$) is not restricted, and the second works when there is an upper bound on the cardinality of the seed set. They formulated the influence maximization problem as the optimization problem given below.

$$\max_{S \subset V(G)} \sum_{u \in S,\; v \in V(G) \setminus S} p_{uv}, \qquad (1)$$

where $p_{uv}$ is the influence probability between the users $u$ and $v$. They converted this optimization problem into a quadratic integer programming problem and solved it using the concept of semidefinite programming [75].

• SKIM: Cohen et al. [76] proposed the Sketch-Based Influence Maximization (SKIM) algorithm, which improves the Basic Greedy Algorithm by ensuring that, in every iteration, with sufficiently high probability or in expectation, the node chosen for the seed set has a marginal gain that is close to the maximum one. The running time of this algorithm is $O(nl + \sum_{i=1}^{l} |E_i| + m\epsilon^{-2} \log^{2} n)$, where $l$ is the number of snapshots of $G$ and $E_i$ is the edge set of $G_i$. Reported results show that SKIM is more scalable than Basic Greedy, Two-phase Influence Maximization (TIM), Influence Ranking and Influence Estimation (IRIE), etc., without compromising the influence spread.

• TIM: Tang et al. [77] developed the Two-phase Influence Maximization (TIM) algorithm, which has an expected running time of $O((k+l)(n+m) \log n / \epsilon^{2})$ and succeeds with probability at least $(1 - n^{-l})$ for some given $k$, $\epsilon$ and $l$. As its name suggests, this algorithm has two phases. In the first phase, TIM computes a lower bound on the maximum expected influence spread among all $k$-sized sets and uses this lower bound to estimate a parameter $\phi$. In the second phase, $\phi$ reverse reachability (RR) set samples are drawn from the social network. Then, it derives a $k$-sized seed set that covers the maximum number of RR sets and returns it as the final result. Reported results show that TIM is two times faster than CELF++ and Borgs et al.'s [73] Method, while achieving the same influence spread. To improve the running time of TIM, Tang et al. [77] proposed a heuristic that takes as input all the RR sets generated in an intermediate step of the second phase of TIM and then uses a greedy approach for the maximum coverage problem to select the seed set. This modified version of TIM is named TIM+. Reported results show that TIM+ is two times faster than TIM.

• IMM: Tang et al. [78] proposed Influence Maximization via Martingales (IMM) (a martingale is a kind of stochastic process in which, given the current and preceding values, the conditional expectation of the next value is the current value itself). IMM achieves an $O((k+l)(n+m) \log n / \epsilon^{2})$ expected running time and returns a $(1 - \frac{1}{e} - \epsilon)$ factor approximate solution with probability $(1 - n^{-l})$. Like TIM and TIM+, the IMM Algorithm has two phases. The first phase is concerned with sampling RR sets from the given social network, and the second phase is concerned with seed set selection. Unlike in TIM and TIM+, the RR sets generated in the first phase are dependent, because the $(i+1)$-th RR set is generated based on whether the first $i$ RR sets satisfy the stopping criteria or not. In IMM, the RR sets generated in the sampling phase are reused in the node selection phase, which is not the case in TIM or TIM+. In this way, IMM eliminates a lot of unnecessary computation, which leads to a significant improvement in running time, though the asymptotic complexity remains the same as that of TIM. Reported results conclude that IMM outperforms TIM, TIM+, and IRIE (described in Section 6.2) in running time, while achieving comparable influence spread.

• Stop-and-Stare: Nguyen et al. [79] developed the Stop-and-Stare Algorithm (SSA) and its dynamic version DSSA for the Topic-aware Viral Marketing (TVM) problem. We have not discussed this problem, as it comes under topic-aware influence maximization. However, this solution methodology can be used for solving the SIM Problem with minor modification. They showed that the number of RR set samples used by their algorithms is asymptotically minimum, and Stop-and-Stare is reported to be 1200 times faster than the state-of-the-art IMM algorithm. We do not discuss the results further, as they are for the TVM problem and out of the scope of this survey.

• BCT: Recently, Nguyen et al. [80] proposed the Billion-scale Cost-aware Targeted (BCT) algorithm for solving the cost-aware targeted viral marketing (CTVM) problem introduced by them. We have not discussed this problem, as it comes under topic-aware influence maximization. However, this solution methodology can be adopted for solving the SIM Problem as well, under both the IC and LT Models, with running times of $O((k+l)(n+m) \log n / \epsilon^{2})$ and $O((k+l)\, n \log n / \epsilon^{2})$, respectively. We do not discuss the results, as they are for the CTVM Problem and out of the scope of this survey.

• Nguyen et al.'s Method: Nguyen et al. [51] studied the Budgeted Influence Maximization Problem described in Section 3. They formulated the following optimization problem in the context of Budgeted Influence Maximization:

$$\max \; \sigma(S) \qquad (2)$$

$$\text{subject to} \quad \sum_{u \in S} C(u) \le B \qquad (3)$$

Now, if $\forall u \in V(G)$, $C(u) = 1$, then it becomes the SIM Problem. To solve this problem, they proposed two algorithms. The first is a modification of the basic greedy algorithm proposed by Kempe et al. [36] (Algorithm 1), and the second was adopted from [81]. In the first algorithm, $\forall u \in V(G) \setminus S$, they computed the increment of influence per unit cost as follows:

$$\delta(u) = \frac{\sigma(S \cup \{u\}) - \sigma(S)}{C(u)} \qquad (4)$$

The algorithm chooses $u$ for inclusion in the seed set ($S$) if it maximizes the objective function and satisfies $C(S_i \cup \{u\}) \le B$. This iterative process continues until no more nodes can be added within the budget. However, this algorithm does not give any constant approximation ratio. It can be modified to obtain a constant approximation ratio, as given in Algorithm 2.

Algorithm 2: Nguyen et al.'s [51] Greedy Algorithm for BIM Problem.
  Data: Given Social Network $G(V, E, \theta, P)$, cost function $C : V(G) \to \mathbb{Z}^{+}$ and some $B \in \mathbb{Z}^{+}$.
  Result: Seed Set for diffusion $S \subset V(G)$.
  $S$ = result of Naive Greedy;
  $S_{max} = \arg\max_{u \in V(G)} \sigma(u)$;
  $S = \arg\max(\sigma(S), \sigma(S_{max}))$;
  return $S$

Theorem 11. Algorithm 2 guarantees a $(1 - \frac{1}{\sqrt{e}})$ approximate solution for the BIM Problem.

[…] $\succ$ CELF $\succ$ CELF++ $\succ$ Static Greedy.

Another scope for improvement in Kempe et al.'s [36] work was to estimate the influence spread by some method other than the heavily time-consuming MCS runs. Borgs et al. [73] explored this scope by proposing a drastically different approach for spread estimation, namely the reverse reachable sampling technique. The algorithms that use this method (such as TIM, TIM+, and IMM) were seen to be much faster than CELF++ while having competitive influence spread. Among TIM, TIM+, and IMM, IMM was found to be the fastest one both theoretically (in terms of computational complexity) and empirically (in terms of computational time in experiments), due to the reuse of the RR sets in the node selection phase. To the best of the authors' knowledge, IMM is the fastest algorithm that was solely proposed for solving the SIM Problem. However, the BCT Algorithm proposed by Nguyen et al. [80], which was originally proposed for solving the CTVM problem, is the fastest solution methodology available in the literature that can be adopted for solving the SIM Problem.

Now, from this discussion, it is important to note that the scalability problem incurred by the Basic Greedy Algorithm has been reduced by the subsequent research. However, as the size of social network data sets has become gigantic, the development of highly scalable algorithms remains the thrust area. The solution methodologies described till now are summarized in Table 4. For algorithms whose complexity analysis was not done by the author(s), we left that column of the table blank.

6.2. Heuristic Solutions

Algorithms of this category do not provide any approximation bound on the influence spread, but they have better running time and scalability. Here, we describe the heuristic solution methodologies from the literature.

• Random Heuristic: In this method, $k$ nodes of the network are picked at random and returned as the seed set. In Kempe et al.'s [36] experiments, this method was used as a baseline.
• Centrality-Based Heuristics: Centrality is a well-known measure in network analysis, which signifies how much importance a node has in the network [84] [85]. Many centrality-based heuristics have been proposed in the literature for the SIM Problem, like the Maximum Degree Heuristic (MDH) (select the $k$ highest-degree nodes as seed nodes), the High Clustering Coefficient Heuristic (HCH) (select the $k$ nodes with the highest clustering coefficient value) [86] [87], and the High PageRank heuristic [88] (select the $k$ nodes with the highest PageRank value).

• Degree Discount Heuristic (DDH): This is basically a modified version of MDH and was proposed by Chen et al. [70]. The key idea behind this method is the following: for any two nodes $u, v \in V(G)$ with $(uv) \in E(G)$, if $u$ has already been selected as a seed by MDH, then the edge $(uv)$ should not be considered while counting the degree of $v$. Hence, due to the presence of $u$ in the seed set, the degree of $v$ is discounted by 1. This method is also named the Single Discount Heuristic (SDH). Experimental results of [70] show that DDH can achieve better influence spread than MDH.

• SIMPATH: This heuristic was proposed by Goyal et al. [89] for solving the SIM Problem under the LT Model. SIMPATH works on the principle of CELF (discussed in Section 6.1). However, instead of using computationally expensive Monte Carlo Simulations for estimating the influence spread, SIMPATH uses path enumeration techniques for this purpose. This algorithm has a parameter ($\eta$) for controlling the trade-off between influence spread and running time. Reported results conclude that SIMPATH outperforms other heuristics, such as MDH, PageRank, and LDAG, with respect to information spread.

• SPIN: Narayanam et al. [47] studied the SIM Problem and the $\lambda$-Coverage Problem as a co-operative game and proposed the Shapley Value-Based Discovery of Influential Nodes (SPIN) Algorithm, which has a running time of $O(t(n+m)R + n \log n + kn + kRm)$, where $t$ is the cardinality of the sample collision set considered for the computation of the Shapley value. This algorithm has two main steps: the first generates a rank list of the nodes based on the Shapley value, and the second chooses the top-$k$ of them and returns them as the seed set. Reported results show that SPIN consistently outperforms MDH and HCH.

• MIA and PMIA: Chen et al. [5] and Wang et al. [90] proposed the maximum influence arborescence (MIA) and Prefix excluding MIA (PMIA) models of influence propagation. They computed the propagation probability from a seed node to a non-seed node by multiplying the influence probabilities of the edges present in the shortest path. The Maximum Influence Path is the one having the maximum propagation probability, and they considered that influence spreads through the local arborescence only (an arborescence is a directed graph in which, for a vertex $u$ called the root and any other vertex $v$, there is exactly one directed path from $u$ to $v$). Hence, the model is called MIA. In the PMIA (Prefix excluding MIA) model, for any seed $s_i$, its maximum influence path to other nodes should avoid all seeds that come before $s_i$. They proposed greedy algorithms for selecting the seed set based on these two diffusion models. Reported results show that both MIA and PMIA can achieve a high level of scalability.

• LDAG: Chen et al. [91] developed this heuristic for solving the SIM Problem under the LT Model. Influence spread in a Directed Acyclic Graph (DAG) is easy to compute. Hence, for computing the influence spread in general social networks, they introduced a Local Directed Acyclic Graph (LDAG) based influence model, which computes local DAGs for each node to approximate the influence spread. After constructing the DAGs, the basic greedy algorithm proposed by Kempe et al. [36] can be used to select the seed nodes. Reported results show that LDAG consistently outperforms DDH and the PageRank heuristic.

• IRIE: Jung et al. [92] proposed this heuristic, based on influence ranking (IR) and influence estimation (IE), for solving the SIM Problem under the IC Model and its extension, the IC-N (independent cascade with negative opinion) Model. They developed a global influence ranking method similar to a belief propagation approach. If only the top-$k$ nodes of this ranking are selected, there will be an overlap in the influence spread of the individual nodes. To avoid this shortcoming, they integrated a simple influence estimation technique to predict the additional influence impact of a seed on the other nodes of the network. Reported results show that IRIE can achieve better influence spread compared to heuristics such as MDH, PageRank, and PMIA. Moreover, IRIE has lower running time and memory consumption.

• ASIM: Galhotra et al. [93] designed this highly scalable heuristic for the SIM Problem. For each node $u \in V(G)$, this algorithm assigns a score value (the weighted sum of the number of simple paths of length at most $d$ starting from that node). ASIM has a running time of $O(kd(m+n))$, and its idea is quite similar to the SIMPATH Algorithm proposed by Goyal et al. [89]. Results show that ASIM takes less computational time and consumes less memory compared to CELF++ and TIM, while achieving comparable influence spread.

• EaSyIm: Galhotra et al. [94] proposed the opinion cum interaction (OCI) model, which considers negative opinion as well. Based on the OCI Model, they formulated the maximizing effective opinion problem and proposed two fast and scalable heuristics for it, namely Opinion Spread Influence Maximization (OSIM) and EaSyIm, with a running time of $O(kD(m+n))$, where $D$ is the diameter of the graph. Both algorithms work in two phases. In the first phase, each node is assigned a score based on its contribution to the influence spread over all the paths starting at that node. The second phase is the node processing step, in which the nodes with the maximum score value are selected as seed nodes. Reported empirical results show that OSIM and EaSyIm can achieve better influence spread compared to TIM+ and CELF++, with less running time.

• Cordasco et al.'s [95] [96] Method: Later, Cordasco et al. proposed a fast and effective heuristic method for selecting the target set in an undirected social network [95] [96]. This heuristic produces optimal solutions for trees, cycles and complete graphs. Moreover, for real-life social networks, this heuristic performs much better than the other methods available in the literature. They extended this work to directed social networks as well [97].

There are several other studies that focused on developing heuristics. Nguyen et al. [51] proposed an efficient heuristic for solving the BIM Problem. Wu et al. [98] developed a two-stage stochastic programming approach for solving the SIM Problem. In this study, instead of choosing a seed set of size exactly $k$, the problem is to choose a seed set of size less than or equal to $k$.

Table 4: Approximation algorithms for SIM Problem and its variants.

| Name of the Algorithm | Proposed By | Complexity | Applicable For | Model |
|---|---|---|---|---|
| Basic Greedy | Kempe et al. [36] | $O(kmnR)$ | SIM | IC < |
| CELF | Leskovec et al. [83] | $O(kmnR)$ | SIM | IC < |
| CELF++ | Goyal et al. [71] | $O(kmnR)$ | SIM | IC < |
| Static Greedy | Cheng et al. [72] | $O(Rm + knRm')$ | SIM | IC < |
| Borgs et al.'s Method | Borgs et al. [73] | $O(kl^{2}(m+n) \log^{2} n / \epsilon^{3})$ | SIM | IC < |
| Zohu et al.'s Method | Zohu et al. [74] | - | SIM | IC < |
| SKIM | Cohen et al. [76] | $O(nl + \sum_{i=1}^{l} \lvert E_i \rvert + m\epsilon^{-2} \log^{2} n)$ | SIM | IC < |
| TIM+, IMM | Tang et al. [77], [78] | $O((k+l)(n+m) \log n / \epsilon^{2})$ | SIM | IC < |
| Stop-and-Stare | Nguyen et al. [79] | - | TVM | IC < |
| Nguyen's Method | Nguyen et al. [51] | $O(n(\log n + d) + kn(1 + d))$ | BIM | IC < |
| BCT | Nguyen et al. [80] | $O((k+l)(n+m) \log n / \epsilon^{2})$ | SIM, BIM, CTVM | IC |
| BCT | Nguyen et al. [80] | $O((k+l)\, n \log n / \epsilon^{2})$ | SIM, BIM, CTVM | LT |

Now, the studies related to heuristic methods are summarized here. Centrality-based heuristics (CBHs) consider only the topology of the network and hence, in most cases, the obtained influence spread is quite low compared to that of other state-of-the-art methods. However, DDH performs slightly better than other CBHs, as it puts a small restriction on the selection of two adjacent nodes. The application of SIMPATH for seed selection is somewhat advantageous, as it has a user-controlled parameter $\eta$ to balance the trade-off between accuracy and running time. SPIN has the advantage that it can be used for solving both the Top-$k$ node problem and the $\lambda$-Coverage Problem.
MIA and PMIA have better scalability than Basic Greedy. As LDAG is based on computing the influence spread in DAGs, it is seen to be faster. As the various heuristics were evaluated on different benchmark data sets, drawing a general conclusion about their relative performance is difficult. Here, we have summarized some of the important algorithms for solving SIM and related problems, as presented in Table 5. For algorithms whose complexity analysis has not been done in the corresponding paper, we have left that column empty (shown as "-") in the table.

Name of the Algorithm | Proposed By                       | Complexity                        | Model
SIMPATH               | Goyal et al. [89]                 | O(kmnR)                           | LT
SPIN                  | Narayanam et al. [47]             | O(t(n + m)R + n log n + kn + kRm) | IC & LT
MIA, PMIA             | Chen et al. [5], Wang et al. [90] | -                                 | MIA, PMIA
LDAG                  | Chen et al. [5]                   | O(n + kn log n)                   | MIA
IRIE                  | Jung et al. [92]                  | -                                 | IC & IC-N
ASIM                  | Galhotra et al. [93]              | O(kd(m + n))                      | IC
EaSyIm                | Galhotra et al. [94]              | O(kD(m + n))                      | OI

Table 5: Heuristic solutions for SIM Problem

Since the early seventies, metaheuristic algorithms have been used successfully to solve optimization problems arising in the broad domain of science and engineering [99] [100]. The SIM Problem is no exception.
• Bucur et al. [101] solved the SIM Problem using a genetic algorithm. They demonstrated that, with simple genetic operators, it is possible to find an approximate solution for influence spread within a feasible run time. In most of the cases, the influence spread obtained by their method was comparable with that of the Basic Greedy Algorithm proposed by Kempe et al. [36].
• Jiang et al. [102] proposed a simulated annealing-based algorithm for solving the SIM Problem under the IC Model. Reported results indicate that their proposed methodology runs 2-3 times faster than the existing heuristic methods in the literature.
• Tsai et al. [103] developed the Genetic New Greedy Algorithm (GNA) for solving the SIM Problem under the IC Model by combining a genetic algorithm with the new greedy algorithm proposed by Chen et al. [70]. Their reported results conclude that GNA can give 10% more influence spread than the genetic algorithm.
• Gong et al. [104] proposed a discrete particle swarm optimization algorithm for solving the SIM Problem. They used the degree discount heuristic proposed by Chen et al. [70] to initialize the seed set and a local influence estimation (LIE) function to approximate the two-hop influence. They also introduced a network-specific local search strategy for fast convergence of their proposed algorithm. Reported results conclude that this methodology outperforms the state-of-the-art CELF++ with less computational time.

After that, several further studies were carried out in this direction [105], [106], [107], [108]. Though there are a large number of metaheuristic algorithms [109], only a few have been used for solving the SIM Problem. Hence, the use of metaheuristic algorithms for solving the SIM Problem and its variants remains largely unexplored. Next, we describe the community-based solution methodologies for the SIM Problem.
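To illustrate how a metaheuristic can be applied to seed selection, the following is a minimal Python sketch of simulated annealing under the IC Model, with the spread estimated by Monte Carlo simulation. It is not the algorithm of Jiang et al. [102]. The graph format (a dictionary mapping each node to a list of (neighbor, probability) pairs), the function names, the swap-based neighbourhood and the cooling parameters are all assumptions made for illustration.

```python
import math
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade trial and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


def anneal_seeds(graph, nodes, k, iters=200, runs=20, t0=1.0, cooling=0.98, seed=0):
    """Hypothetical simulated-annealing search for a size-k seed set."""
    rng = random.Random(seed)
    nodes = list(nodes)
    current = set(rng.sample(nodes, k))
    current_val = estimate_spread(graph, current, runs, rng)
    best, best_val = set(current), current_val
    temp = t0
    for _ in range(iters):
        outside = [v for v in nodes if v not in current]
        if not outside:
            break
        # Neighbouring solution: swap one seed for one non-seed node.
        removed = rng.choice(sorted(current))
        added = rng.choice(outside)
        candidate = (current - {removed}) | {added}
        cand_val = estimate_spread(graph, candidate, runs, rng)
        delta = cand_val - current_val
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_val = candidate, cand_val
            if current_val > best_val:
                best, best_val = set(current), current_val
        temp *= cooling
    return best, best_val
```

The sketch keeps the best seed set seen so far, so occasional uphill moves accepted early in the search cannot make the returned solution worse. The cost is dominated by the repeated Monte Carlo spread estimates.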

Most real-life social networks exhibit a community structure [110]. A community is a subset of nodes that are densely connected among themselves and sparsely connected with the other nodes of the network. In recent years, the community-based solution framework (CBSF) has been developed for solving the SIM Problem.
• Wang et al. [111] proposed the community-based greedy algorithm for solving the SIM Problem. This method consists of two steps, namely detecting communities based on information propagation and selecting communities for finding influential nodes. This algorithm could outperform the degree discount and random heuristics.
• Chen et al. [112] [113] developed a CBSF for solving the SIM Problem and named it CIM. By exploiting the community structure, they selected candidate seed sets for each community, and from the candidate seed sets they selected the final seed set for diffusion. CIM could achieve better influence spread than some state-of-the-art heuristic methods, such as CDH-Kcut, CDH-SHRINK and maximum degree.
• Rahimkhan et al. [114] proposed a CBSF for solving the SIM Problem under the LT Model and named it ComPath. They used the Speaker-listener Label Propagation Algorithm (SLPA) proposed by Xie et al. [115] for detecting communities and then identified the most influential communities and candidate seed nodes. From the candidate seed set, they selected the final seed set based on the intra distance among the nodes of the candidate seed set. ComPath could outperform CELF, CELF++, the maximum degree heuristic, the maximum pagerank heuristic and LDAG.
• Bozorgi et al. [116] developed a CBSF for solving the SIM Problem under the LT Model and named it INCIM. Like ComPath, INCIM also uses the SLPA Algorithm for detecting the communities. They proposed an algorithm for selecting seeds, which computes the influence spread using the algorithm developed by Goyal et al. [89]. INCIM could outperform some state-of-the-art methodologies, such as LDAG, SIMPATH, IPA (a parallel algorithm for the SIM Problem proposed in [117]), and the high pagerank and high degree heuristics.
• Shang et al. [118] proposed a CBSF for solving the SIM Problem and named it CoFIM. In this study, they introduced a diffusion model that works in two phases. In the first phase, the seed set S is expanded to the neighbor nodes of S, which are usually allocated to different communities. Then, in the second phase, influence propagation within the communities is computed. Based on this diffusion model, they developed an incremental greedy algorithm for selecting the seed set, which is analogous to the algorithm proposed by Kempe et al. [36]. CoFIM could achieve better influence spread than IPA, TIM+, MDH and IMM.
• Recently, Li et al. [119] proposed a community-based approach for solving the SIM Problem where the users have a specific geographical location. They developed a social influence-based community detection algorithm using a spectral clustering technique and a seed selection methodology that considers a community-based influence index. Reported results show that this methodology is more efficient than many state-of-the-art methodologies, while achieving almost the same influence spread.

It is important to note that, except for the methodology proposed by Wang et al. [111], all these methods are basically heuristics. However, these methods use community detection on the underlying social network as an intermediate step to scale the SIM Problem down to the community level. There are a large number of algorithms available in the literature for detecting communities [120], [121]. Among them, which one should be used for solving the SIM Problem? How are the quality of community detection and the influence spread related? These questions are largely ignored in the literature.
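The general two-step pattern shared by these frameworks (detect communities, then pick seeds community by community) can be sketched as follows. This is a deliberately simplified illustration and does not reproduce CIM, ComPath, INCIM or CoFIM: the community labels are assumed to be given, candidates are ranked by intra-community degree, and seeds are drawn round-robin from the largest community downwards. The function name `community_seeds` and the input formats are hypothetical.

```python
from collections import defaultdict


def community_seeds(adj, community_of, k):
    """Pick up to k seeds, spreading them across precomputed communities.

    adj: dict mapping node -> iterable of neighbours (undirected graph).
    community_of: dict mapping node -> community label (assumed given).
    """
    members = defaultdict(list)
    for node, label in community_of.items():
        members[label].append(node)

    def internal_degree(node):
        # Count only neighbours that belong to the same community.
        return sum(1 for nb in adj.get(node, ())
                   if community_of.get(nb) == community_of[node])

    # Candidate list per community, best candidate first (ties broken by node id).
    ranked = {
        label: sorted(nodes, key=lambda n: (-internal_degree(n), n))
        for label, nodes in members.items()
    }
    # Visit larger communities first (ties broken by label).
    order = sorted(ranked, key=lambda label: (-len(ranked[label]), label))

    seeds = []
    rank = 0
    while len(seeds) < k and any(rank < len(ranked[label]) for label in order):
        for label in order:
            if rank < len(ranked[label]) and len(seeds) < k:
                seeds.append(ranked[label][rank])
        rank += 1
    return seeds


# Two communities joined by the single bridge edge 3-4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5, 6], 5: [4], 6: [4]}
community_of = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(community_seeds(adj, community_of, 2))  # [0, 4]
```

Restricting the ranking to each community keeps the per-community work small, which is the scaling argument behind these frameworks. The quality of the result, however, depends on the community detection step, which is exactly the open question raised above.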

In this section, we describe some solution methodologies for the SIM Problem that are very different from the methodologies discussed so far. Also, each solution methodology presented here is different from the others. It is reported in the literature that, in any information diffusion process, less than 10% of the nodes are influenced beyond hop count 2 [122]. Based on this phenomenon, Tang et al. [123] [124] recently developed a hop-based approach for the SIM Problem. Their methodology also gives a theoretical guarantee on influence spread. Ma et al. [125] proposed an algorithm for the SIM Problem that is based on the heat diffusion process. It could produce better influence spread than the Basic Greedy Algorithm. Goyal et al. [126] developed a data-based approach for solving the SIM Problem. They introduced the credit distribution (CD) model, which uses the propagation traces to learn the influence flow pattern for approximating the influence spread. They showed that the SIM Problem under the CD Model is NP-Hard, and reported results show that this model can achieve even better influence spread than the IC and LT Models with less running time. Lee et al. [127] introduced a query-based approach for solving the SIM Problem under the IC Model. Here, the query is: for activating all the users of a given set T, what should be the seed set? This methodology is intended for maximizing the influence on a particular group of users, which is the case in target-aware viral marketing. Zhu et al. [128] introduced the CTMC-ICM diffusion model, which is basically a blending of the IC Model with a Continuous Time Markov Chain. They studied the SIM Problem under this model and came up with a new centrality metric, Spread Rank. Their reported results show that seed nodes selected based on spread rank centrality can achieve better influence spread than the traditional distance-based centrality measures, such as degree, closeness and betweenness. Wang et al. [129] proposed the methodology Fluidspread, which is based on fluid dynamic principles and can reveal the dynamics of the diffusion process. Kang et al. [130] introduced the notion of diffusion centrality for selecting influential nodes in a social network.
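The observation that little influence travels beyond two hops suggests scoring a node by its expected reach within two hops only. The Python sketch below shows one rough way to do this under the IC Model. It is not the estimator of Tang et al. [123] [124]: it simply adds the activation probabilities of direct neighbours and the path probabilities of two-hop paths, and it ignores overlaps between paths, so it can overcount. The graph format and the names `two_hop_score` and `rank_by_two_hop` are assumptions for illustration.

```python
def two_hop_score(graph, u):
    """First-order estimate of the spread of the single seed u within two hops.

    graph: dict mapping node -> list of (neighbour, activation probability).
    """
    score = 1.0  # the seed itself
    for v, p_uv in graph.get(u, []):
        score += p_uv  # one-hop contribution
        for w, p_vw in graph.get(v, []):
            if w != u:
                score += p_uv * p_vw  # two-hop path u -> v -> w
    return score


def rank_by_two_hop(graph, nodes, k):
    """Return the k nodes with the highest two-hop score (stable on ties)."""
    return sorted(nodes, key=lambda u: -two_hop_score(graph, u))[:k]


graph = {"a": [("b", 0.5)], "b": [("a", 0.5), ("c", 0.5)]}
print(two_hop_score(graph, "a"))                  # 1.75
print(rank_by_two_hop(graph, ["a", "b", "c"], 2)) # ['b', 'a']
```

Each score needs only the two-hop neighbourhood of a node, which is why hop-limited estimates are cheap compared with full Monte Carlo simulation.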

7. Summary of the Survey and Future Research Directions

Based on the survey of the existing literature presented in Sections 3 through 6, we summarize in this section the current research trends and give future directions.
• Practicality of the Problem: Most of the current studies are focused on the practical issues of the SIM Problem. One of the major applications of social influence maximization is viral marketing. In this context, influencing a user is beneficial only if that user is able to influence a reasonable number of other users of the network. Recent studies, such as [131] [80], consider benefit as another component of the SIM Problem, along with the node selection cost.
• Scalability: Starting from Kempe et al.'s [36] seminal work, scalability has remained an important issue in this area. To reduce the scalability problem, instead of using Monte Carlo simulation-based spread estimation, Borgs et al. [73] introduced reverse reachable set-based spread estimation. After this work, all the popular algorithms for the SIM Problem, such as TIM, IMM and TIM+, use this concept as an influence spread estimation technique for improving scalability.
• Diffusion Probability Computation: The TSS Problem assumes that the influence probability between any pair of users is known. However, this is a very unrealistic assumption. There have been some studies in this direction, in which the influence probability is predicted using machine learning techniques [132].

Although the TSS Problem has been studied extensively over the last one and a half decades or so, from both theoretical and applied perspectives, to the best of our knowledge some corners of this problem are either not investigated or only partially investigated. Here, we list some future research directions from both the problem specification and the solution methodology points of view.
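The reverse reachable (RR) set idea mentioned under Scalability can be sketched in a few lines of Python. Under the IC Model, a random RR set is obtained by picking a node uniformly at random and walking edges backwards, keeping each incoming edge with its activation probability. The fraction of RR sets hit by a seed set, multiplied by the number of nodes, then estimates the expected spread. The code below is a simplified illustration with a fixed number of samples. It does not implement the sample-size bounds of Borgs et al. [73], TIM, TIM+ or IMM, and the names `random_rr_set` and `rr_greedy` as well as the input format are hypothetical.

```python
import random


def random_rr_set(in_edges, nodes, rng):
    """Sample one reverse reachable set under the IC Model.

    in_edges: dict mapping node v -> list of (u, p) for each edge u -> v.
    """
    root = rng.choice(nodes)
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            # Keep the incoming edge u -> v with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def rr_greedy(in_edges, nodes, k, num_sets=1000, seed=0):
    """Greedy maximum coverage over sampled RR sets; returns (seeds, spread estimate)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    rr_sets = [random_rr_set(in_edges, nodes, rng) for _ in range(num_sets)]
    covered = [False] * num_sets
    seeds = []
    for _ in range(k):
        # Count, for every node, how many still-uncovered RR sets contain it.
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:
            break  # every sampled RR set is already covered
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    estimate = len(nodes) * sum(covered) / num_sets
    return seeds, estimate
```

Because the RR sets are sampled once and reused for every candidate seed, no per-candidate Monte Carlo simulation is needed, which is the source of the scalability gain.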

Further research on and around the TSS Problem in social networks may be carried out in the following directions:
• As online social networks are formed by rational agents, incentivization is required if a node is selected as a seed node. For practical applications, it is also important to consider what benefit will be obtained by activating that node (e.g., how many other non-seed nodes become influenced through that node). At the same time, for influence propagation of time-sensitive events (where influencing a person after the event does not make any sense, such as a political campaign before an election or viral marketing for a seasonal product), consideration of diffusion time is also important. To the best of our knowledge, there is no reported study on the TSS Problem considering all three issues: cost, benefit, and time.
• Most of the studies on the SIM Problem and its variants are under either the IC or the LT diffusion model. However, some other diffusion models that consider negative opinion have recently been developed, such as the Independent Cascade Model with Negative Opinion (IC-N) [133], the Opinion cum Interaction Model (OI) [94] and the Opinion-based Cascading Model (OC) [134]. The SIM Problem and its different variants can also be studied under these newly developed diffusion models.
• Most of the studies on the SIM Problem consider the underlying social network, including the influence probabilities, to be static. However, this is not a practical assumption, as most social networks are time varying. Recent studies on the SIM Problem have started considering the temporal nature of the social network [135], [136]. As this has just started, there is a lot of scope to work on the TSS Problem in time-varying social networks.
• In real-world social networks, users have specific topics of choice. So, one user will be influenced by another user if both of them have similar choices. Taking the topic into consideration, the spread of influence can be increased, which is known as topic-aware influence maximization. Recent studies on influence maximization consider this phenomenon [137] [8]. The SIM Problem and its variants can be studied in this setting as well.
• Among all the variants of the TSS Problem in social networks described in Section 3, it is surprising to see that only the SIM Problem is well studied. Hence, the solution methodologies developed for the SIM Problem can be modified so that they can be adopted for solving the other variants as well.
• One of the major issues in the solution methodologies for the SIM Problem is scalability. It is important to observe that the social network used in Kempe et al.'s [36] experiment had 10748 nodes and 53000 edges, whereas the recent study of Nguyen et al. [80] used a social network with 41 . × nodes and 1 . × edges. From this example, it is clear that the size of social network data sets is increasing day by day. Hence, developing more scalable algorithms is extremely important for handling large data sets.
• From the discussion in Section 6.3, it is understood that, though there are many evolutionary algorithms, only the genetic algorithm, artificial bee colony optimization and the discrete particle swarm optimization algorithm have been used to date for solving the SIM Problem. Hence, other metaheuristics, such as ant colony optimization and differential evolution, can also be used for this purpose.
• There are many solution methodologies proposed in the literature. However, which one should be chosen in which situation and for what kind of network structure? To answer this question, a strong experimental evaluation of the proposed methodologies on benchmark data sets is required. Recently, Arora et al. [138] have done a benchmarking study with the 11 most popular algorithms from the literature, and they have found some contradictions between their own experimental results and the ones reported in the literature. More such benchmarking studies are required to investigate these issues.
• Most of the algorithms presented in the literature are serial in nature. The issue of scalability in the SIM Problem can be tackled by developing distributed and parallel algorithms. To the best of the authors' knowledge, except for dIRIEr developed by Zong et al. [139], there is no distributed algorithm in the literature. Recently, a few parallel algorithms have been developed for the SIM Problem [117] [140]. So, studying the SIM Problem and its variants under parallel and distributed settings is an open area.
• Most of the solution methodologies are concerned with the selection of the seeds in one go, before the diffusion starts. In this case, if any one of the selected seeds does not perform up to expectation, the number of influenced nodes will be lower than expected. Considering this case, the framework of multiphase diffusion has recently been developed [141], [142]. Different variants of this problem can be studied in this framework.

8. Concluding Remarks

In this survey, we have first discussed the SIM Problem and its different variants studied in the literature. Next, we have reported the hardness results of the problem. After that, we have reported the major research challenges concerned with the SIM Problem and its variants. Subsequently, we have classified the proposed solution methodologies based on their approach and discussed the algorithms of each category. At the end, we have discussed the current research trends and given future directions. From this survey, we can conclude that the SIM Problem is well studied, though its variants are not, and that there is a continuing need for more scalable algorithms for these problems. We hope that presenting three dimensions of the problem together (variants, hardness results and solution methodologies) will help researchers and practitioners to gain a better understanding of the problem and better exposure to this field.

Acknowledgement

The authors thank the Ministry of Human Resource and Development (MHRD), Government of India, for sponsoring the project E-business Center of Excellence under the scheme of Center for Training and Research in Frontier Areas of Science and Technology (FAST), Grant No. F.No.5-5/2014-TS.VII.

References

[1] Bing Liu. Social network analysis. Web Data Mining, pages 269–309, 2011.
[2] Damon Centola. The spread of behavior in an online social network experiment. Science, 329(5996):1194–1197, 2010.
[3] Maziar Nekovee, Yamir Moreno, Ginestra Bianconi, and Matteo Marsili. Theory of rumour spreading in complex social networks. Physica A: Statistical Mechanics and its Applications, 374(1):457–470, 2007.
[4] Jure Leskovec, Lada A Adamic, and Bernardo A Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.
[5] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1029–1038. ACM, 2010.
[6] Xiaodan Song, Belle L Tseng, Ching-Yung Lin, and Ming-Ting Sun. Personalized recommendation driven by information flow. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 509–516. ACM, 2006.
[7] Dino Ienco, Francesco Bonchi, and Carlos Castillo. The meme ranking problem: Maximizing microblogging virality. In , pages 328–335. IEEE, 2010.
[8] Yuchen Li, Dongxiang Zhang, and Kian-Lee Tan. Real-time targeted influence maximization for online advertisements. Proceedings of the VLDB Endowment, 8(10):1070–1081, 2015.
[9] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining, pages 261–270. ACM, 2010.
[10] Eytan Bakshy, Jake M Hofman, Winter A Mason, and Duncan J Watts. Everyone's an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 65–74. ACM, 2011.
[11] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 420–429. ACM, 2007.
[12] Robin Cowan and Nicolas Jonard. Network structure and the diffusion of knowledge. Journal of Economic Dynamics and Control, 28(8):1557–1575, 2004.
[13] R Kasprzak. Diffusion in networks. Journal of Telecommunications and Information Technology, pages 99–106, 2012.
[14] Paulo Shakarian, Abhinav Bhatnagar, Ashkan Aleali, Elham Shaabani, and Ruocheng Guo. The independent cascade and linear threshold models. In Diffusion in Social Networks, pages 35–48. Springer, 2015.
[15] Jimeng Sun and Jie Tang. A survey of models and algorithms for social influence analysis. Social network data analytics, pages 177–214, 2011.
[16] William M Campbell, Charlie K Dagli, and Clifford J Weinstein. Social network analysis with content and graphs. Lincoln Laboratory Journal, 20(1):61–81, 2013.
[17] Tianyi Wang, Yang Chen, Zengbin Zhang, Tianyin Xu, Long Jin, Pan Hui, Beixing Deng, and Xing Li. Understanding graph sampling algorithms for social network analysis. In , pages 123–128. IEEE, 2011.
[18] Reinhard Diestel. Graph theory. Grad. Texts in Math, 101, 2005.
[19] Daniel Gruhl, Ramanathan Guha, David Liben-Nowell, and Andrew Tomkins. Information diffusion through blogspace. In Proceedings of the 13th international conference on World Wide Web, pages 491–501. ACM, 2004.
[20] Jochen Harant, Anja Pruchnewski, and Margit Voigt. On dominating sets and independent sets of graphs. Combinatorics, Probability and Computing, 8(6):547–553, 1999.
[21] Venkatesh Raman, Saket Saurabh, and Sriganesh Srihari. Parameterized algorithms for generalized domination. Lecture Notes in Computer Science, 5165:116–126, 2008.
[22] Ralf Klasing and Christian Laforest. Hardness results and approximation algorithms of k-tuple domination in graphs. Information Processing Letters, 89(2):75–83, 2004.
[23] Ning Chen. On the approximability of influence in social networks. SIAM Journal on Discrete Mathematics, 23(3):1400–1415, 2009.
[24] Paul A Dreyer and Fred S Roberts. Irreversible k-threshold processes: Graph-theoretical threshold models of the spread of disease and of opinion. Discrete Applied Mathematics, 157(7):1615–1627, 2009.
[25] József Balogh, Béla Bollobás, and Robert Morris. Bootstrap percolation in high dimensions. Combinatorics, Probability and Computing, 19(5-6):643–692, 2010.
[26] David Peleg. Local majorities, coalitions and monopolies in graphs: a review. Theoretical Computer Science, 282(2):231–257, 2002.
[27] Michael R Garey and David S Johnson. Computers and intractability, volume 29. WH Freeman, New York, 2002.
[28] David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge University Press, 2011.
[29] Rodney G Downey, Michael R Fellows, and Kenneth W Regan. Parameterized circuit complexity and the w hierarchy. Theoretical Computer Science, 191(1-2):97–115, 1998.
[30] Marcel Salathé, Maria Kazandjieva, Jung Woo Lee, Philip Levis, Marcus W Feldman, and James H Jones. A high-resolution human contact network for infectious disease transmission. Proceedings of the National Academy of Sciences, 107(51):22020–22025, 2010.
[31] Bo Xu and Lu Liu. Information diffusion through online social networks. In , pages 53–56. IEEE, 2010.
[32] Cliff C Zou, Don Towsley, and Weibo Gong. Modeling and simulation study of the propagation and defense of internet e-mail worms. IEEE Transactions on Dependable and Secure Computing, 4(2), 2007.
[33] Masahiro Kimura and Kazumi Saito. Tractable models for information diffusion in social networks. Knowledge Discovery in Databases: PKDD 2006, pages 259–271, 2006.
[34] Thomas W Valente. Network models of the diffusion of innovations. 1995.
[35] Nima Heidari. Modeling information diffusion in social networks. arXiv preprint arXiv:1603.02178, 2016.
[36] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM, 2003.
[37] Jaewon Yang and Jure Leskovec. Modeling information diffusion in implicit networks. In , pages 599–608. IEEE, 2010.
[38] Kazumi Saito, Kouzou Ohara, Yuki Yamagishi, Masahiro Kimura, and Hiroshi Motoda. Learning diffusion probability based on node attributes in social networks. In International Symposium on Methodologies for Intelligent Systems, pages 153–162. Springer, 2011.
[39] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of information diffusion probabilities for independent cascade model. In Knowledge-based intelligent information and engineering systems, pages 67–75. Springer, 2008.
[40] Amit Goyal, Francesco Bonchi, and Laks VS Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 241–250. ACM, 2010.
[41] Kazumi Saito, Masahiro Kimura, Kouzou Ohara, and Hiroshi Motoda. Selecting information diffusion models over social networks for behavioral analysis. Machine Learning and Knowledge Discovery in Databases, pages 180–195, 2010.
[42] Masahiro Kimura, Kazumi Saito, Ryohei Nakano, and Hiroshi Motoda. Finding influential nodes in a social network from information diffusion data. Social Computing and Behavioral Modeling, pages 1–8, 2009.
[43] Thomas W Valente. Social network thresholds in the diffusion of innovations. Social Networks, 18(1):69–89, 1996.
[44] Huiyuan Zhang, Subhankar Mishra, My T Thai, J Wu, and Y Wang. Recent advances in information diffusion and influence maximization in complex social networks. Opportunistic Mobile Social Networks, 37(1.1), 2014.
[45] Pedro Domingos and Matt Richardson. Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57–66. ACM, 2001.
[46] Eyal Ackerman, Oren Ben-Zwi, and Guy Wolfovitz. Combinatorial model and bounds for target set selection. Theoretical Computer Science, 411(44-46):4017–4022, 2010.
[47] Ramasuri Narayanam and Yadati Narahari. A shapley value-based approach to discover influential nodes in social networks. IEEE Transactions on Automation Science and Engineering, 8(1):130–147, 2011.
[48] Hung T Nguyen, Preetam Ghosh, Michael L Mayo, and Thang N Dinh. Social influence spectrum at scale: Near-optimal solutions for multiple budgets at once. ACM Transactions on Information Systems (TOIS), 36(2):14, 2017.
[49] S Raghavan and Rui Zhang. Weighted target set selection on social networks. Technical report, Working paper, University of Maryland, 2015.
[50] Moses Charikar, Yonatan Naamad, and Anthony Wirth. On approximating target set selection. In LIPIcs-Leibniz International Proceedings in Informatics, volume 60. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
[51] Huy Nguyen and Rong Zheng. On budgeted influence maximization in social networks. IEEE Journal on Selected Areas in Communications, 31(6):1084–1094, 2013.
[52] Ferdinando Cicalese, Gennaro Cordasco, Luisa Gargano, Martin Milanič, and Ugo Vaccaro. Latency-bounded target set selection in social networks. Theoretical Computer Science, 535:1–15, 2014.
[53] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 61–70. ACM, 2002.
[54] Guy Kortsarz. On the hardness of approximating spanners. Algorithmica, 30(3):432–450, 2001.
[55] Craig A Tovey. A simplified np-complete satisfiability problem. Discrete Applied Mathematics, 8(1):85–89, 1984.
[56] Rodney G Downey and Michael R Fellows. Fundamentals of parameterized complexity, volume 4. Springer, 2013.
[57] Cristina Bazgan, Morgan Chopin, André Nichterlein, and Florian Sikora. Parameterized approximability of maximizing the spread of influence in networks. Journal of Discrete Algorithms, 27:54–65, 2014.
[58] Morgan Chopin, André Nichterlein, Rolf Niedermeier, and Mathias Weller. Constant thresholds can make target set selection tractable. Theory of Computing Systems, 55(1):61–83, 2014.
[59] Morgan Chopin, André Nichterlein, Rolf Niedermeier, and Mathias Weller. Constant Thresholds Can Make Target Set Selection Tractable, pages 120–133. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[60] Richard M Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85–103. Springer, 1972.
[61] André Nichterlein, Rolf Niedermeier, Johannes Uhlmann, and Mathias Weller. On tractable cases of target set selection. Social Network Analysis and Mining, 3(2):233–256, 2013.
[62] André Nichterlein, Rolf Niedermeier, Johannes Uhlmann, and Mathias Weller. On tractable cases of target set selection. Algorithms and Computation, pages 378–389, 2010.
[63] Yanhua Li, Wei Chen, Yajun Wang, and Zhi-Li Zhang. Influence diffusion dynamics and influence maximization in social networks with friend and foe relationships. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 657–666. ACM, 2013.
[64] Aristides Gionis, Evimaria Terzi, and Panayiotis Tsaparas. Opinion maximization in social networks. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 387–395. SIAM, 2013.
[65] Alessandro Epasto, Ahmad Mahmoody, and Eli Upfal. Real-time targeted-influence queries over large graphs. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pages 224–231. ACM, 2017.
[66] Xiangyu Ke, Arijit Khan, and Gao Cong. Finding seeds and relevant tags jointly: For targeted influence maximization in social networks. In Proceedings of the 2018 International Conference on Management of Data, pages 1097–1111. ACM, 2018.
[67] David Kempe, Jon M Kleinberg, and Éva Tardos. Influential nodes in a diffusion model for social networks. In ICALP, volume 5, pages 1127–1138. Springer, 2005.
[68] David Kempe, Jon M Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11(4):105–147, 2015.
[69] Takanori Maehara, Hirofumi Suzuki, and Masakazu Ishihata. Exact computation of influence spread by binary decision diagrams. In Proceedings of the 26th International Conference on World Wide Web, pages 947–956. International World Wide Web Conferences Steering Committee, 2017.
[70] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 199–208. ACM, 2009.
[71] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Celf++: optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th international conference companion on World wide web, pages 47–48. ACM, 2011.
[72] Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, and Xueqi Cheng. Staticgreedy: solving the scalability-accuracy dilemma in influence maximization. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 509–518. ACM, 2013.
[73] Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. Maximizing social influence in nearly optimal time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 946–957. SIAM, 2014.
[74] Yuqing Zhu, Weili Wu, Yuanjun Bi, Lidong Wu, Yiwei Jiang, and Wen Xu. Better approximation algorithms for influence maximization in online social networks. Journal of Combinatorial Optimization, 30(1):97–108, 2015.
[75] Uriel Feige and Michel Goemans. Approximating the value of two power proof systems, with applications to max 2sat and max dicut.
[76] Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F Werneck. Sketch-based influence maximization and computation: Scaling up with guarantees. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pages 629–638. ACM, 2014.
[77] Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 75–86. ACM, 2014.
[78] Youze Tang, Yanchen Shi, and Xiaokui Xiao. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1539–1554. ACM, 2015.
[79] Hung T Nguyen, My T Thai, and Thang N Dinh. Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In Proceedings of the 2016 International Conference on Management of Data, pages 695–710. ACM, 2016.
[80] Hung T Nguyen, My T Thai, and Thang N Dinh. A billion-scale approximation algorithm for maximizing benefit in viral marketing. IEEE/ACM Transactions on Networking, 2017.
[81] Samir Khuller, Anna Moss, and Joseph Seffi Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.
[82] Huy Nguyen and Rong Zheng. On budgeted influence maximization in social networks. arXiv preprint arXiv:1204.4491, 2012.
[83] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.
[84] Linton C Freeman. Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239, 1978.
[85] Andrea Landherr, Bettina Friedl, and Julia Heidemann. A critical review of centrality measures in social networks. Business & Information Systems Engineering, 2(6):371–385, 2010.
[86] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao. User interactions in social networks and their implications. In Proceedings of the 4th ACM European conference on Computer systems, pages 205–218. ACM, 2009.
[87] Benjamin M Tabak, Marcelo Takami, Jadson MC Rocha, Daniel O Cajueiro, and Sergio RS Souza. Directed clustering coefficient as a measure of systemic risk in complex banking networks. Physica A: Statistical Mechanics and its Applications, 394:211–216, 2014.
[88] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1-7):107–117, April 1998.
[89] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In , pages 211–220. IEEE, 2011.
[90] Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545, 2012.
[91] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In , pages 88–97. IEEE, 2010.
[92] Kyomin Jung, Wooram Heo, and Wei Chen. Irie: Scalable and robust influence maximization in social networks. In , pages 918–923. IEEE, 2012.
[93] Sainyam Galhotra, Akhil Arora, Srinivas Virinchi, and Shourya Roy. Asim: A scalable algorithm for influence maximization under the independent cascade model. In Proceedings of the 24th International Conference on World Wide Web, pages 35–36. ACM, 2015.
[94] Sainyam Galhotra, Akhil Arora, and Shourya Roy. Holistic influence maximization: Combining scalability and efficiency with opinion-aware models. In Proceedings of the 2016 International Conference on Management of Data, pages 743–758. ACM, 2016.
[95] Gennaro Cordasco, Luisa Gargano, Marco Mecchia, Adele A Rescigno, and Ugo Vaccaro. A fast and effective heuristic for discovering small target sets in social networks. In Combinatorial Optimization and Applications, pages 193–208. Springer, 2015.
[96] Gennaro Cordasco, Luisa Gargano, and Adele A Rescigno. Active spreading in networks. In ICTCS, pages 149–162, 2016.
[97] Gennaro Cordasco, Luisa Gargano, and Adele Anna Rescigno. Influence propagation over large scale social networks. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pages 1531–1538. ACM, 2015.
[98] Hao-Hsiang Wu and Simge Küçükyavuz. A two-stage stochastic programming approach for influence maximization in social networks. Computational Optimization and Applications, pages 1–33, 2017.
[99] Huizhi Yi, Qinglin Duan, and T Warren Liao. Three improved hybrid metaheuristic algorithms for engineering design optimization.

Applied SoftComputing , 13(5):2433–2444, 2013.[100] Xin-She Yang, Su Fong Chien, and Tiew On Ting. Computational in-telligence and metaheuristic algorithms with applications.

The ScientiﬁcWorld Journal , 2014, 2014.[101] Doina Bucur and Giovanni Iacca. Inﬂuence maximization in social net-works with genetic algorithms. In

European Conference on the Applica-tions of Evolutionary Computation , pages 379–392. Springer, 2016.[102] Qingye Jiang, Guojie Song, Gao Cong, Yu Wang, Wenjun Si, and KunqingXie. Simulated annealing based inﬂuence maximization in social networks.In

AAAI , volume 11, pages 127–132, 2011.[103] Chun-Wei Tsai, Yo-Chung Yang, and Ming-Chao Chiang. A geneticnewgreedy algorithm for inﬂuence maximization in social network. In , pages 2549–2554. IEEE, 2015.59104] Maoguo Gong, Jianan Yan, Bo Shen, Lijia Ma, and Qing Cai. Inﬂuencemaximization in social networks based on discrete particle swarm opti-mization.

Information Sciences , 367:600–614, 2016.[105] C Prem Sankar, S Asharaf, and K Satheesh Kumar. Learning from bees:An approach for inﬂuence maximization on viral campaigns.

PloS one ,11(12):e0168125, 2016.[106] Qixiang Wang, Maoguo Gong, Chao Song, and Shanfeng Wang. Discreteparticle swarm optimization based inﬂuence maximization in complex net-works. In ,pages 488–494. IEEE, 2017.[107] Shi-Jui Liu, Chi-Yuan Chen, and Chun-Wei Tsai. An eﬀective simulatedannealing for inﬂuence maximization problem of online social networks.

Procedia Computer Science , 113:478–483, 2017.[108] Kaiqi Zhang, Haifeng Du, and Marcus W Feldman. Maximizing inﬂuencein a social network: Improved results using a genetic algorithm.

PhysicaA: Statistical Mechanics and its Applications , 478:20–30, 2017.[109] Xin-She Yang.

Nature-inspired metaheuristic algorithms . Luniver press,2010.[110] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding com-munity structure in very large networks.

Physical review E , 70(6):066111,2004.[111] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-basedgreedy algorithm for mining top-k inﬂuential nodes in mobile social net-works. In

Proceedings of the 16th ACM SIGKDD international conferenceon Knowledge discovery and data mining , pages 1039–1048. ACM, 2010.[112] Y Chen, S Chang, C Chou, W Peng, and S Lee. Exploring communitystructures for inﬂuence maximization in social networks. In

Proceedings f the 6th SNA-KDD Workshop on Social Network Mining and Analysisheld in conjunction with KDD12 (SNA-KDD12) , pages 1–6, 2012.[113] Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, andSuh-Yin Lee. Cim: Community-based inﬂuence maximization in so-cial networks. ACM Transactions on Intelligent Systems and Technology(TIST) , 5(2):25, 2014.[114] Khadije Rahimkhani, Abolfazl Aleahmad, Maseud Rahgozar, and AliMoeini. A fast algorithm for ﬁnding most inﬂuential people based on thelinear threshold model.

Expert Systems with Applications , 42(3):1353–1361, 2015.[115] Jierui Xie, Boleslaw K Szymanski, and Xiaoming Liu. Slpa: Uncoveringoverlapping communities in social networks via a speaker-listener inter-action dynamic process. In , pages 344–349. IEEE, 2011.[116] Arastoo Bozorgi, Hassan Haghighi, Mohammad Sadegh Zahedi, and Mo-jtaba Rezvani. Incim: A community-based algorithm for inﬂuence maxi-mization problem under the linear threshold model.

Information Process-ing & Management , 52(6):1188–1199, 2016.[117] Jinha Kim, Seung-Keol Kim, and Hwanjo Yu. Scalable and parallelizableprocessing of inﬂuence maximization for large-scale social networks? In ,pages 266–277. IEEE, 2013.[118] Jiaxing Shang, Shangbo Zhou, Xin Li, Lianchen Liu, and Hongchun Wu.Coﬁm: A community-based framework for inﬂuence maximization onlarge-scale networks.

Knowledge-Based Systems , 117:88–100, 2017.[119] Xiao Li, Xiang Cheng, Sen Su, and Chenna Sun. Community-based seedsselection algorithm for location aware inﬂuence maximization.

Neurocom-puting , 275:1601–1613, 2018. 61120] Santo Fortunato. Community detection in graphs.

Physics reports ,486(3):75–174, 2010.[121] Tanmoy Chakraborty, Ayushi Dalmia, Animesh Mukherjee, and NiloyGanguly. Metrics for community analysis: A survey.

ACM ComputingSurveys (CSUR) , 50(4):54, 2017.[122] Sharad Goel, Duncan J Watts, and Daniel G Goldstein. The structure ofonline diﬀusion networks. In

Proceedings of the 13th ACM conference onelectronic commerce , pages 623–638. ACM, 2012.[123] Jing Tang, Xueyan Tang, and Junsong Yuan. Inﬂuence maximizationmeets eﬃciency and eﬀectiveness: A hop-based approach. In

Proceedingsof the 2017 IEEE/ACM International Conference on Advances in SocialNetworks Analysis and Mining 2017 , pages 64–71. ACM, 2017.[124] Jing Tang, Xueyan Tang, and Junsong Yuan. An eﬃcient and eﬀectivehop-based approach for inﬂuence maximization in social networks.

SocialNetwork Analysis and Mining , 8(1):10, 2018.[125] Hao Ma, Haixuan Yang, Michael R Lyu, and Irwin King. Mining socialnetworks using heat diﬀusion processes for marketing candidates selection.In

Proceedings of the 17th ACM conference on Information and knowledgemanagement , pages 233–242. ACM, 2008.[126] Amit Goyal, Francesco Bonchi, and Laks VS Lakshmanan. A data-basedapproach to social inﬂuence maximization.

Proceedings of the VLDB En-dowment , 5(1):73–84, 2011.[127] Jong-Ryul Lee and Chin-Wan Chung. A query approach for inﬂuencemaximization on speciﬁc users in social networks.

IEEE Transactions onknowledge and data engineering , 27(2):340–353, 2015.[128] Tian Zhu, Bai Wang, Bin Wu, and Chuanxi Zhu. Maximizing the spread ofinﬂuence ranking in social networks.

Information Sciences , 278:535–544,2014. 62129] Feng Wang, Wenjun Jiang, Xiaolin Li, and Guojun Wang. Maximizingpositive inﬂuence spread in online social networks via ﬂuid dynamics.

Fu-ture Generation Computer Systems , 2017.[130] Chanhyun Kang, Sarit Kraus, Cristian Molinaro, Francesca Spezzano, andVS Subrahmanian. Diﬀusion centrality: A paradigm to maximize spreadin social networks.

Artiﬁcial Intelligence , 239:70–96, 2016.[131] Hung T Nguyen, Thang N Dinh, and My T Thai. Cost-aware targetedviral marketing in billion-scale networks. In

IEEE INFOCOM 2016-The35th Annual IEEE International Conference on Computer Communica-tions , pages 1–9. IEEE, 2016.[132] Devesh Varshney, Sandeep Kumar, and Vineet Gupta. Predicting infor-mation diﬀusion probabilities in social networks: A bayesian networksbased approach.

Knowledge-Based Systems , 133:66–76, 2017.[133] Wei Chen, Alex Collins, Rachel Cummings, Te Ke, Zhenming Liu, DavidRincon, Xiaorui Sun, Yajun Wang, Wei Wei, and Yifei Yuan. Inﬂuencemaximization in social networks when negative opinions may emerge andpropagate. In

Proceedings of the 2011 SIAM International Conference onData Mining , pages 379–390. SIAM, 2011.[134] Huiyuan Zhang, Thang N Dinh, and My T Thai. Maximizing the spreadof positive inﬂuence in online social networks. In , pages317–326. IEEE, 2013.[135] Guangmo Tong, Weili Wu, Shaojie Tang, and Ding-Zhu Du. Adaptiveinﬂuence maximization in dynamic social networks.

IEEE/ACM Trans-actions on Networking (TON) , 25(1):112–125, 2017.[136] Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, and Xiaoming Sun.Inﬂuence maximization in dynamic social networks. In ernational Conference on Data Mining (ICDM) , pages 1313–1318. IEEE,2013.[137] Shuo Chen, Ju Fan, Guoliang Li, Jianhua Feng, Kian-lee Tan, and Jin-hui Tang. Online topic-aware inﬂuence maximization. Proceedings of theVLDB Endowment , 8(6):666–677, 2015.[138] Akhil Arora, Sainyam Galhotra, and Sayan Ranu. Debunking the myths ofinﬂuence maximization: An in-depth benchmarking study. In

Proceedingsof the 2017 ACM International Conference on Management of Data , pages651–666. ACM, 2017.[139] Zhou Zong, Bo Li, and Chunming Hu. dirier: Distributed inﬂuence max-imization in social network. In , pages 119–125. IEEE,2014.[140] Hong Wu, Kun Yue, Xiaodong Fu, Yujie Wang, and Weiyi Liu. Paral-lel seed selection for inﬂuence maximization based on k-shell decomposi-tion. In

International Conference on Collaborative Computing: Network-ing, Applications and Worksharing , pages 27–36. Springer, 2016.[141] Swapnil Dhamal, KJ Prabuchandran, and Y Narahari. Information dif-fusion in social networks in two phases.

IEEE Transactions on NetworkScience and Engineering , 3(4):197–210, 2016.[142] Kai Han, Keke Huang, Xiaokui Xiao, Jing Tang, Aixin Sun, and XueyanTang. Eﬃcient algorithms for adaptive inﬂuence maximization.