Theories for influencer identification in complex networks
Sen Pei, Flaviano Morone and Hernán A. Makse
In Complex Spreading Phenomena in Social Systems, edited by Sune Lehmann and Yong-Yeol Ahn (Springer Nature, 2018)
Abstract
In social and biological systems, the structural heterogeneity of interaction networks gives rise to the emergence of a small set of influential nodes, or influencers, in a series of dynamical processes. Although much smaller than the entire network, these influencers were observed to be able to shape the collective dynamics of large populations in different contexts. As such, the successful identification of influencers should have profound implications in various real-world spreading dynamics such as viral marketing, epidemic outbreaks and cascading failure. In this chapter, we first summarize the centrality-based approach in finding single influencers in complex networks, and then discuss the more complicated problem of locating multiple influencers from a collective point of view. Progress rooted in collective influence theory, belief-propagation and computer science will be presented. Finally, we present some applications of influencer identification in diverse real-world systems, including online social platforms, scientific publication, brain networks and socioeconomic systems.
Sen Pei
Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, USA, e-mail: [email protected]

Flaviano Morone
Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA, e-mail: [email protected]

Hernán A. Makse
Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA, e-mail: [email protected]

In spreading processes of information, it is well known that certain individuals are more influential than others. In the field of information diffusion, it has been accepted that the ability of influencers to initiate large-scale spreading is attributed to their privileged locations in the underlying social networks [92, 41, 71, 59]. Due to the direct relevance of influencer identification in such phenomena as viral marketing [46], innovation diffusion [81], behavior adoption [17] and epidemic spreading [69], research on searching for influential spreaders in different settings has become increasingly important in recent years [71].

In the relatively simple case of locating individual influencers, given the rich structural information encoded in nodes' locations in the network, it is straightforward to measure the influence of a single node using centrality-based heuristics. Over the years, a growing number of predictors have been developed and routinely employed to rank a single node's influence in spreading processes, among which the most widely used include the number of connections [1], k-core [85], betweenness centrality [25] and PageRank [13], just to name a few. Beyond this non-interacting problem, a more challenging task is to identify a set of influencers that achieves maximal collective influence. Originally formulated in the context of viral marketing [80], collective influence maximization is in fact a core optimization problem in an array of important applications in various domains, ranging from cost-effective marketing in commercial promotion and optimal immunization in epidemic control to strategic protection against targeted attacks on infrastructures. In addition to the topological complexity of network structure, collective influence maximization is further complicated by the entwined interactions between multiple spreaders, which renders the aforementioned centrality-based approaches invalid. As a result, the problem has to be treated from a collective point of view in order to develop effective solutions [61].
In reality, many spreading phenomena are typically initiated by a single spreader. For instance, an epidemic outbreak in a local area is usually caused by the first infected person. For such processes, ranking the spreading capability of individual spreaders is of great significance in both accelerating and confining the diffusion.
Intuitively, the nodes with large numbers of connections should have more influence on their direct neighbors. The disproportionate effect of highly-connected nodes, or hubs, on dynamical processes has been revealed in the early works on the vulnerability of scale-free networks [1, 22]. A targeted attack on a very small number of high-degree nodes will rapidly collapse the giant component of networks with a heavy-tailed degree distribution. Compared with other more complex centrality measures, the computational burden of degree is almost negligible. Due to this, the simple degree centrality has been playing an important role in influencer identification. In implementation, the performance of high-degree ranking can be further enhanced by a simple adaptive calculation procedure, that is, recalculating the degree of the remaining nodes after the removal of previously selected nodes.

[Figure 1. Panel a: shells $k_S = 1, 2, 3$. Panels b-d: Node A ($k = 96$, $k_S = 63$), Node B ($k = 96$, $k_S = 26$), Node C ($k = 65$, $k_S = 63$); color scale in %.]
Fig. 1 a, A schematic diagram of k-shell decomposition. The two highlighted nodes (blue and yellow), although both with degree $k = 8$, are in different k-shells. b-d, Infections starting from single nodes with the same degree $k = 96$ (A and B) can result in totally different outcomes, whereas infections originating from node C, located in the same k-shell as node A ($k_S = 63$) but with a smaller degree, are quite similar to the spreading from node A. The colors indicate nodes' probability of being infected in SIR simulations with infection rate $\beta = 0.035$ and recovery rate $\mu = 1$. Results are averaged over 10,000 realizations. Figure is adapted from Kitsak et al. [41].

An obvious drawback of degree centrality is that it only considers the number of direct neighbors. However, as indicated by empirical studies, most spreading phenomena proceed in a cascading fashion. Therefore, the ultimate influence of a single spreader is also affected by the global network structure. In realistic complex networks, high-degree nodes can appear either in the core area or in the periphery. This implies that the number of connections may not be a reliable indicator of influencers in real-world systems. Recently, Kitsak et al. confirmed this speculation through extensive simulations of susceptible-infected-recovered (SIR) and susceptible-infected-susceptible (SIS) dynamics on diverse real-world social networks [41]. In the SIR model, a susceptible individual becomes infected with probability $\beta$ upon contact with an infected neighbor, and infected individuals recover with probability $\mu$ and become immune to the disease. In the SIS model, the infection follows the same dynamics, but infected persons become susceptible again with probability $\mu$. As shown in Fig. 1b-d, SIR spreading processes initiated by two hubs with the same degree can result in quite different infected populations, depending on their global position in the network. In contrast, the k-core index, which distinguishes the network core from the periphery, is a more reliable predictor of influence.

The k-core index is obtained by the k-shell decomposition, in which nodes are iteratively pruned according to their remaining degree in the network (see Fig. 1a) [85]. Specifically, nodes with degree $k = 1$ are removed repeatedly until no node with degree 1 remains, and the removed nodes are assigned $k_S = 1$. Then we remove nodes with degree $k = 2$ in the same way and assign them $k_S = 2$, and the procedure continues with higher values of $k$ until all nodes are removed. The decomposition can be carried out in $O(M)$ operations, where $M$ is the number of links [7]. Thus k-core ranking is feasible for large-scale complex networks encountered in big-data analysis.

As illustrated in Fig. 1a, the classification by k-core can be very different from that by degree. A hub with a low k-core index is usually surrounded by many low-degree neighbors that limit the influence of the hub. On the contrary, nodes located in the core region, although they may have only moderate degree, are capable of generating large-scale spreading facilitated by their well-connected neighbors. In the case where recovered individuals do not develop immunity, infections would persist in the high k-core area. These findings challenge the previously predominant focus on the number of connections. The simple yet effective k-core measure has inspired several generalizations that consider the detailed local environment in the vicinity of high k-core nodes [95, 50, 51, 54].

Although k-core was found effective in SIR and SIS spreading dynamics, some studies indicate that it may not be a good predictor of influence for other spreading models. For instance, in a rumor spreading model, Borge-Holthoefer and Moreno [11] showed that the spreading capabilities of the nodes did not depend on their k-core values. These contradictory results, which depend on the choice of the specific spreading model, call for more extensive empirical validation with real information flow [72].

Apart from the k-core index, another measure that takes into account the global network structure is eigenvector centrality [10, 79]. The reasoning behind eigenvector centrality is that the influence of an individual is determined by the spreading capability of his/her neighbors. Starting from a uniform score assigned to each node, the scores propagate along the links until a steady state is reached. In calculation, each step of score propagation corresponds to a left multiplication of the current score vector by the adjacency matrix.
This procedure is in fact the power method for computing the principal eigenvalue of the adjacency matrix. As a result, the steady score vector is proportional to the right eigenvector corresponding to the largest eigenvalue. Notice that, supposing the initial score of each node is one, the first step of the iteration recovers the degree centrality.

Despite the wide application of eigenvector centrality, it was recently found that the scores could be localized at a few high-degree nodes due to the repeated reflection of scores from their neighbors during the iteration. Martin et al. solved this problem by using the leading eigenvector of the Hashimoto Non-Backtracking (NB) matrix [56]. In the NB matrix, immediate backtracking steps ($i \to j$ followed by $j \to i$) are not permitted [34], which avoids the heavy score accumulation caused by the recurrent one-step reflection. Recently, by mapping the SIR spreading process to bond percolation, Radicchi and Castellano proved that the NB centrality is an optimized predictor for single influencers in the SIR model at criticality [76]. In the next section, we will see the important role of the NB matrix in collective influence maximization and optimal percolation [61].

Beyond the above purely topological measures, a number of centralities have been developed on the basis of specific assumptions about the spreading dynamics. In some classical centralities proposed in the field of social networks, much emphasis is put on the shortest path. Along this line, several renowned centralities were developed and widely accepted in social network ranking. For instance, the closeness centrality quantifies the shortest distance from a given node to all other reachable nodes in the network [84], while betweenness centrality measures the fraction of shortest paths between all node pairs that pass through a given individual [25].
A useful generalization of closeness centrality is the Katz centrality [39], which considers all possible paths in the network, but assigns a larger weight to shorter paths using a tunable parameter. In practice, the applicability of these shortest-path-based centralities is limited by the high computational complexity of calculating the shortest paths between all pairs of nodes. As a result, they are more suitable for small or medium scale networks.

Another group of metrics is designed based on random walks. A famous random-walk-based centrality is PageRank [13]. As a revolutionary webpage ranking algorithm, PageRank mimics a random walk process along the directed hyperlinks. To prevent the random walker from being trapped at dangling nodes, a jump is introduced that allows the walker to move to a randomly chosen node. The PageRank score is the stationary probability of each node being visited by the random walker, which can be calculated through iteration. In applications, the PageRank of a node $i$ in a network of $N$ nodes can be calculated from

$$p_t(i) = \frac{1-\alpha}{N} + \alpha \sum_j \frac{A_{ij}\, p_{t-1}(j)}{k_{\mathrm{out}}(j)},$$

where $A_{ij} = 1$ if there is a link from node $j$ to node $i$, $k_{\mathrm{out}}(j)$ is the number of outgoing links from node $j$, and $\alpha$ is the parameter controlling the jump: in this form the walker follows a link with probability $\alpha$ and jumps to a random node with probability $1-\alpha$. In a generalization called LeaderRank [53], a ground node is connected to all other nodes by additional bidirectional links. This procedure ensures that the network is strongly connected, so that the convergence becomes faster.
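Two of the rankings discussed above, the k-shell index and PageRank, are simple enough to sketch in a few lines. The following Python code is only an illustrative sketch under stated assumptions, not the implementation used in the cited works: the function names `k_shell` and `pagerank` and the dictionary-based graph format are hypothetical, the k-shell routine is a plain iterative pruning rather than the $O(M)$ algorithm of [7], and the score of dangling nodes is spread uniformly over all nodes, which is one common convention.

```python
def k_shell(adj):
    """Return {node: k_S} for an undirected graph {node: neighbors}.

    Plain iterative pruning (not the O(M) bucket algorithm): nodes whose
    remaining degree is <= k are peeled repeatedly and assigned shell k.
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to prune at this level, move to next shell
            continue
        for v in peel:
            shell[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return shell


def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate p_t(i) = (1 - alpha)/N + alpha * sum_j A_ij p_{t-1}(j)/k_out(j).

    out_links maps each node to the list of nodes it points to; every
    target must also be a key. Assumption: dangling nodes (no outgoing
    links) redistribute their score uniformly.
    """
    nodes = list(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(p[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for j in nodes:
            for i in out_links[j]:
                new[i] += alpha * p[j] / len(out_links[j])
        if sum(abs(new[v] - p[v]) for v in nodes) < tol:
            return new
        p = new
    return p
```

For a triangle with one pendant node attached, `k_shell` places the pendant node in shell 1 and the triangle in shell 2, which mirrors the core and periphery distinction of Fig. 1a.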
In addition to the aforementioned centralities designed for general spreading processes, several measures have been proposed for specific dynamics, depending explicitly on model parameters. In these approaches, the development of measures is based on the equations describing the dynamical process. Usually, the analysis of the equations naturally leads to a path-counting procedure in which the number of possible spreading paths is assessed. For instance, Klemm et al. developed a general framework to evaluate the dynamical importance (DI) of nodes in a series of dynamical processes [43]. The iterative calculation of DI centrality essentially counts the total number of arbitrarily long walks departing from each node. Another metric relying on possible spreading paths is the expected force (ExF) proposed by Lawyer [45]. To compute the expected force, all possible clusters of infected nodes after $n$ transmission events starting from a given node are enumerated. Then the entropy of their cluster degree (i.e., the number of outgoing links of the cluster, or infected-susceptible edges) is calculated as the expected force for each node.

The approaches introduced here are far from complete. A growing number of metrics and methods are continuously proposed in the active area of finding single influencers [52]. In designing effective methods for more complex spreading models, the basic principles behind these measures should be universal.

In spite of the great value of estimating individual nodes' influence with centralities, in a realistic situation it is more relevant to understand spreading processes initiated by several spreaders. In applications such as viral marketing, it is expected that the spreaders can be coordinated in an optimal manner so that the final collective influence is maximized. Although it sounds similar to the problem of locating single influencers, collective influence maximization is in fact a fundamentally different and more difficult problem.
In the seminal work of Kempe et al. [40], the influence maximization problems in both the Independent Cascade Model (ICM) and the Linear Threshold Model (LTM) were shown to be NP-hard through reductions from NP-complete covering problems such as Vertex Cover. This implies that, unless P = NP, the influence maximization problem cannot be solved exactly in polynomial time, which leaves heuristic and approximate approaches as the practical choice.

A straightforward idea to find multiple influencers is to select the top-ranked spreaders as individual seeds using centrality measures. However, this approach neglects the interactions and collective effect among spreaders. As demonstrated in SIR simulations, the selected spreaders have significant overlap in their influenced population [41]. Therefore, the set of influencers identified with centrality metrics is usually far from optimal. To solve this conundrum, the problem needs to be treated from a collective point of view [61].
We start our discussion from the percolation model point of view. As a well-studied dynamical process, percolation was shown to be closely related to spreading and immunization [67, 70, 16]. Percolation is a classical physical process in which nodes or links are randomly removed from a graph [86]. The critical quantity of particular interest is the fraction of nodes or links whose removal will collapse the giant component. It is well known that the size of the giant component decreases continuously to zero as the number of removed nodes or links increases. In the pioneering works of Newman [67, 68], the class of SIR models was mapped to the percolation process, for which the critical point of the continuous transition could be solved exactly.

In contrast to the studies focused on random removal, the problem of optimal percolation aims to find the minimal set of nodes that could guarantee the global connectivity of the network, or equivalently, dismantle the network if removed. Morone and Makse showed that, mathematically, the optimization of a spreading process following exactly the Linear Threshold Model with threshold $k-1$ ($k$ is the degree of each node) can be mapped to the optimal percolation problem [61]. For this specific spreading model, finding the minimum number of seeds so that the information percolates the entire network is essentially equivalent to locating the optimal set of nodes in the optimal percolation problem. Similarly, the optimal immunization problem, the dual of optimal spreading, can also be mapped to optimal percolation [61]. The relation between the cohesion of a network and influence spreading indicates that the most influential spreaders are the nodes that maintain the integrity of the network.

The collective influence theory for optimal percolation is developed based on the message passing equations of the percolation process. For a network with $N$ nodes and $M$ edges, suppose $n = (n_1, \cdots, n_N)$ indicates whether node $i$ is removed ($n_i = 0$) or left ($n_i = 1$) in the network. The total fraction of removed nodes is therefore $q = 1 - \sum_{i=1}^{N} n_i / N$. For a directed link from $i$ to $j$ ($i \to j$), let $\nu_{i\to j}$ denote the probability of node $i$ belonging to the giant component $G$ in the absence of node $j$. The evolution of $\nu_{i\to j}$ satisfies the following self-consistent equation:

$$\nu_{i\to j} = n_i\left[1 - \prod_{k\in\partial i\setminus j}\left(1 - \nu_{k\to i}\right)\right], \tag{1}$$

where $\partial i\setminus j$ denotes the nearest neighbors of $i$ excluding $j$. The final probability $\nu_i$ of node $i$ belonging to the giant component is then determined by $\nu_{k\to i}$ ($k\in\partial i$) through

$$\nu_i = n_i\left[1 - \prod_{k\in\partial i}\left(1 - \nu_{k\to i}\right)\right]. \tag{2}$$

The fraction of nodes in the giant component is then given by $G(q) = \sum_{i=1}^{N} \nu_i / N$.

Fig. 2 a, For $q \ge q_c$, the global minimum of the largest eigenvalue $\lambda$ of the NB matrix over $n$ is 0. In this case, $G = 0$ is stable (although non-optimal configurations with $\lambda > 1$, for which $G > 0$, still exist). For $q < q_c$, the minimum of the largest eigenvalue is always $\lambda > 1$. Therefore the solution $G = 0$ is unstable and $G > 0$. At the optimal percolation transition, the minimum is at $n^*$ such that $\lambda(n^*, q_c) = 1$. At $q = 0$, $\lambda = \kappa - 1$, where $\kappa = \langle k^2\rangle / \langle k\rangle$. At $\lambda = 1$, the giant component is reduced to a tree plus one single loop. This loop is destroyed at the transition $q_c$, and $\lambda$ abruptly falls to 0. b, $\mathrm{Ball}(i, \ell)$ of radius $\ell$ around node $i$ is shown. $\partial\mathrm{Ball}$ is the set of nodes on the boundary. The highlighted route is the shortest path from $i$ to $j$. c-d, Giant component $G(q)$ of Twitter ($N = 469{,}013$) and of the Mexican mobile phone call network ($N = 1.4\times 10^{7}$) computed using CI, high degree adaptive (HDA), PageRank (PR), high degree (HD) and k-core strategies. Figure is adapted from Morone et al. [61].

For the continuous phase transition in the percolation process, the stability of the zero solution $G = 0$ is controlled by the largest eigenvalue $\lambda(n; q)$ of the coupling matrix $\hat{\mathcal{M}}$ for the linearized Eq. (1) evaluated at $\{\nu_{i\to j} = 0\}$ (see Fig. 2a). Concretely, $\hat{\mathcal{M}}$ is defined on the $2M \times 2M$ directed links as

$$\hat{\mathcal{M}}_{k\to\ell,\, i\to j} \equiv \left.\frac{\partial \nu_{i\to j}}{\partial \nu_{k\to\ell}}\right|_{\{\nu_{i\to j} = 0\}}.$$

A simple calculation reveals that for locally tree-like random networks, $\hat{\mathcal{M}}$ is given in terms of the Non-Backtracking (NB) matrix $\hat{\mathcal{B}}$ [34] via $\hat{\mathcal{M}}_{k\to\ell,\, i\to j} = n_i \hat{\mathcal{B}}_{k\to\ell,\, i\to j}$, in which $\hat{\mathcal{B}}_{k\to\ell,\, i\to j} = 1$ if $\ell = i$ and $j \neq k$, and 0 otherwise.

To guarantee the stability of the solution $\{\nu_{i\to j} = 0\}$, it is required that $\lambda(n; q) \le 1$. For a given $q$, the optimal influence problem can then be rephrased as finding the optimal configuration $n$ that minimizes the largest eigenvalue $\lambda(n; q)$. As $q$ approaches the optimal threshold $q_c$, there exists a decreasing number of configurations that satisfy $\lambda(n; q) \le 1$. At $q_c$, only one configuration $n^*$ exists such that $\lambda(n^*; q_c) = 1$, and all other configurations give $\lambda(n; q) > 1$. The optimal configuration of $N q_c$ influencers $n^*$ is therefore obtained when the minimum of the largest eigenvalue satisfies $\lambda(n^*; q_c) = 1$. In practice, the largest eigenvalue can be calculated by the power method (we leave out $q$ in $\lambda(n; q)$):

$$\lambda(n) = \lim_{\ell\to\infty}\left[\frac{|w_\ell(n)|}{|w_0|}\right]^{1/\ell}. \tag{3}$$

Here $|w_\ell(n)|$ is the norm after $\ell$ iterations of $\hat{\mathcal{M}}$ on the initial vector $w_0$: $|w_\ell(n)| = |\hat{\mathcal{M}}^{\ell} w_0|$. To find the best configuration of $n$, we need to minimize the cost function $|w_\ell(n)|$ for a finite $\ell$. Through a proper simplification, we have an approximation of $|w_\ell(n)|^2$ of order $1/N$ as

$$|w_\ell(n)|^2 = \sum_{i=1}^{N} (k_i - 1) \sum_{j\in\partial\mathrm{Ball}(i,\, 2\ell-1)} \left(\prod_{k\in P_{2\ell-1}(i,j)} n_k\right)(k_j - 1), \tag{4}$$

in which $\partial\mathrm{Ball}(i, \ell)$ is the frontier of the ball of radius $\ell$ in terms of shortest path centered around node $i$, $P_\ell(i, j)$ is the shortest path of length $\ell$ connecting $i$ and $j$, and $k_i$ is the degree of node $i$. See an example in Fig. 2b.

Based on the form of Eq. (4), an energy function for each configuration $n$ can be defined as follows:

$$E_\ell(n) = \sum_{i=1}^{N} (k_i - 1) \sum_{j\in\partial\mathrm{Ball}(i,\, \ell)} \left(\prod_{k\in P_{\ell}(i,j)} n_k\right)(k_j - 1), \tag{5}$$

where $E_\ell(n) = |w_{(\ell+1)/2}|^2$ for $\ell$ odd and $E_\ell(n) = \langle w_{\ell/2}|\hat{\mathcal{M}}|w_{\ell/2}\rangle$ for $\ell$ even. For $\ell = 1$, $E_\ell(n)$ is exactly the energy function of an Ising model, which can be optimized using the cavity method [57]. For $\ell \ge 2$, it becomes a hard optimization problem involving many-body interactions. To develop a scalable algorithm for big-data analysis, an adaptive method is proposed, which is essentially a greedy algorithm for minimizing the largest eigenvalue of the stability matrix $\hat{\mathcal{M}}$ for a given $\ell$ in the form of Eq. (4). In fact, Eq. (5) can be rewritten as the sum of the collective influence of single nodes:

$$E_\ell(n) = \sum_{i=1}^{N} \mathrm{CI}_\ell(i), \tag{6}$$

in which the collective influence (CI) of node $i$ at length $\ell$ is defined as

$$\mathrm{CI}_\ell(i) = (k_i - 1) \sum_{j\in\partial\mathrm{Ball}(i,\, \ell)} (k_j - 1). \tag{7}$$

The main idea behind the CI algorithm is to remove the nodes that cause the largest decrease of the energy function in Eq. (4). In each iteration of the CI algorithm, the node with the largest CI value is deleted, after which the CI values of the remaining nodes are recalculated. The adaptive removal continues until the giant component is fragmented, i.e. $G(q) = 0$. Notice that the procedure minimizes $q_c$ but does not guarantee the minimization of $G$ in the percolation phase $G > 0$. If we want to optimize the configuration for $G(q) > 0$, a reinsertion procedure is applied starting from the configuration at $G(q) = 0$. In practice, if we use a heap structure to find the node with the largest CI and only update the nodes inside the $(\ell+1)$-radius ball around the removed node, the computational complexity of the CI algorithm can reach $O(N\log N)$ [62]. As a result, the CI algorithm is scalable for massively large-scale networks in modern social network analysis. For a Twitter network with 469,013 users (Fig. 2c) and a social network of $1.4\times 10^{7}$ mobile phone users in Mexico (Fig. 2d), the CI algorithm finds a smaller set of influencers than simple scalable heuristics including high degree adaptive (HDA), PageRank (PR), high degree (HD), and k-core [61]. To apply the CI algorithm to real-time influencer ranking, a Twitter search engine was developed. Notice that, for $\ell = 0$, the CI algorithm degenerates to high-degree ranking. So degree can be interpreted as the zero-order approximation of CI in Eq. (7).

To guarantee the scalability of the algorithm, CI essentially takes an adaptive greedy approach. The performance of the CI algorithm can be further improved by a simple extension of CI using the message passing framework for $\ell\to\infty$: the CI propagation algorithm (CI$_P$) [62]. Remarkably, the CI propagation algorithm can reproduce the exact analytical threshold of optimal percolation for cubic random regular graphs [8]. Another belief-propagation variant of the CI algorithm based on optimal immunization (CI$_{BP}$) has performance similar to that of CI$_P$ [62]. However, the improvement over the CI algorithm comes at the price of a higher computational complexity, $O(N^2\log N)$, which makes both CI$_P$ and CI$_{BP}$ unscalable.

Recent studies have shown that the optimal percolation problem is closely related to the optimal decycling problem, or minimum feedback vertex set (FVS) problem [38]. Using belief-propagation (BP) algorithms, the optimal percolation problem was addressed in recent works [65, 12]. The results of the BP algorithms were found to be better than those of the CI algorithm. Another approach to the optimal destruction of networks makes use of explosive percolation theory [21].

The percolation process is deterministic on a given network with a given seed set. An important class of spreading models with stochasticity is the independent cascade model (ICM) [42]. In these models, a node is infected or activated by each of its neighbors independently with a predefined probability. Frequently used independent cascade models include the susceptible-infected (SI) model, the susceptible-infected-susceptible (SIS) model and the susceptible-infected-removed (SIR) model. These models are widely adopted in modeling infectious disease outbreaks and information spreading in social networks [35, 41, 87, 74, 94, 93].
They are therefore of particular interest in the relevant applications.

In the pioneering work of Kempe et al. [40], influence maximization was first formalized as a discrete optimization problem: for a given spreading process on a network and an integer $k$, how can one find the optimal set of $k$ seeds that generates the largest influence? For a large class of ICM and LTM, the influence maximization problem can be well approximated by a simple greedy strategy with a provable approximation guarantee [40]. In the basic greedy algorithm, the seed set is obtained by repeatedly selecting the node that provides the largest marginal increase of influence at each step. The performance guarantee is built on the submodular property of the influence function $\sigma(S)$ [66], which is defined as the expected number of active nodes if the initial seed set is $S$. The influence function $\sigma(\cdot)$ is submodular if the incremental influence of adding a node $u$ to a seed set $S$ is no smaller than the incremental influence of adding the same node to a larger set $V$ containing $S$. That is, $\sigma(S\cup\{u\}) - \sigma(S) \ge \sigma(V\cup\{u\}) - \sigma(V)$ for all nodes $u$ and any sets $S\subseteq V$. Leveraging the results on submodular functions [66], the greedy algorithm is guaranteed to approximate the true optimal influence within a factor of $1 - 1/e \approx 0.63$, that is, $\sigma(S) \ge (1 - 1/e)\,\sigma(S^*)$, where $S$ is the seed set obtained by the greedy algorithm and $S^*$ is the true optimal seed set. Although the basic greedy algorithm is simple to implement and comes with a performance guarantee, it requires massive Monte Carlo simulations to estimate the marginal gain of each candidate node. Several works have been proposed to improve the efficiency of the greedy algorithm [47, 30, 20, 19].

Despite its performance guarantee, from an optimization point of view the greedy algorithm may get stuck in a local optimum. This drawback can be addressed by a more sophisticated message passing approach. Altarelli et al. developed message passing algorithms (both belief-propagation (BP) and max-sum (MS)) for the problem of optimal immunization in the SIR and SIS models [2], which can be applied to general ICMs. From another point of view, the independent cascade model can be naturally mapped to bond percolation. Hu et al. found that in a series of real-world networks, most SIR spreading is restrained to a local area, while global-scale spreading rarely occurs [37]. Using bond percolation theory, a characteristic local length termed the influence radius was revealed. They argue that the global spreading optimization problem can in fact be solved locally, with knowledge of the local environment within the influence radius.

Compared with the independent cascade model, the linear threshold model is more complex in the sense that a node's state is collectively determined by its neighbors' states. In a typical instance of LTM, each node $v$ is assigned a threshold value $\theta_v$ and each link $(u, v)$ is assigned a weight $w(u, v)$. During the cascade, a node is activated only if the sum of the weights of its activated neighbors reaches the threshold value, i.e. $\sum_{u\in\partial v,\ u\ \mathrm{active}} w(u, v) \ge \theta_v$. In the case where the weights and thresholds are drawn uniformly from the interval $[0, 1]$, LTM was proven to be submodular [40]. Therefore, influence maximization in this class of LTM can be well approximated by the greedy strategy introduced in the previous section. However, even with the lazy forward update [47], the algorithm still does not scale to large networks. Chen et al. found a way to approximate the influence of a node in a local subgraph [19], and developed a scalable greedy algorithm. Goyal et al. [31] further improved this algorithm by considering more choices of paths.

The above greedy approach and its variants are applicable to LTM with the submodular property. However, the general class of LTM with fixed weights and thresholds is not guaranteed to be submodular [40].
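The basic greedy strategy with Monte Carlo estimation of $\sigma(S)$ can be illustrated on the random-threshold LTM described above. The code below is a minimal sketch under explicit assumptions and is not the algorithm of [40] or its accelerated variants: the names `simulate_ltm`, `estimate_sigma` and `greedy_seeds` are hypothetical, the graph is undirected, link weights are set to $w(u, v) = 1/k_v$ (an illustrative choice that keeps the total incoming weight at one), and thresholds are redrawn uniformly from $[0, 1)$ in every run.

```python
import random


def simulate_ltm(adj, seeds, rng):
    """One LTM run; returns the number of active nodes at the end.

    Assumptions: undirected graph {node: neighbors}, weights w(u, v) = 1/k_v,
    thresholds theta_v drawn uniformly from [0, 1) for this run.
    """
    theta = {v: rng.random() for v in adj}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in active or not adj[v]:
                continue
            weight = sum(1 for u in adj[v] if u in active) / len(adj[v])
            # weight > 0 guards against activation without any active neighbor
            if weight > 0 and weight >= theta[v]:
                active.add(v)
                changed = True
    return len(active)


def estimate_sigma(adj, seeds, runs, rng):
    """Monte Carlo estimate of the influence function sigma(seeds)."""
    return sum(simulate_ltm(adj, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(adj, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated sigma."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_sigma = None, -1.0
        for u in adj:
            if u in chosen:
                continue
            sigma = estimate_sigma(adj, chosen + [u], runs, rng)
            if sigma > best_sigma:  # ties keep the earlier candidate
                best, best_sigma = u, sigma
        chosen.append(best)
    return chosen
```

Every greedy step re-simulates the cascade for each candidate, so the cost grows as (number of seeds) × (number of nodes) × (number of runs); this is the Monte Carlo burden that the lazy forward and local-subgraph variants cited above aim to reduce.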
An important class of LTM that may not be submodular is defined as follows: a node $i$ is activated only after a certain number $m_i$ of its neighbors are activated. The choice of the threshold $m_i$ can generate two qualitatively different cascade regimes, with continuous and discontinuous phase transitions. For instance, in the special case of $m_i = k_i - 1$ ($k_i$ is the degree of node $i$), a continuous phase transition of influence occurs as the seed set grows [61]. However, there also exists a wide class of LTM exhibiting a first-order, or discontinuous, phase transition. In the case where seeds are selected randomly, the transition between these two regimes has been explored in detail in the context of bootstrap percolation [9, 29] and a simple cascade model [91]. But these results are based on the typical dynamical properties starting from random initial conditions. For influence maximization, which starts from a special initial condition, the dynamical behavior is expected to deviate from the typical one. Altarelli et al. proposed a BP algorithm that could estimate statistical properties of nontypical trajectories and find the initial conditions that lead to cascades with desired properties [3]. To obtain the exact set of seeds, MS equations were derived by setting the inverse temperature $\beta\to\infty$ in the energy function [4]. Extending the work done under the assumption of replica symmetry, the theoretical limit of the minimal contagious set (the minimal seed set that can activate the entire graph) in random regular graphs was obtained using the cavity method with the effect of replica symmetry breaking [33].

In big-data analysis, an efficient and scalable algorithm designed for general LTM is needed. Starting from the message passing equations of LTM, generalized from Eq. (1) of percolation, a scalable algorithm named collective influence for threshold model (CI-TM) can be developed [75].
By iteratively solving the linearized message passing equations, the cascading process can be decomposed into separate components, each of which corresponds to the contribution made by a single seed. Interestingly, it is found that the contribution of a seed is determined by the subcritical paths along which the cascade propagates. In order to design a scalable algorithm, the node with the largest number of subcritical paths is recursively selected into the seed set. After each selection, the selected node and the subcritical paths attached to it are removed, and the status of the remaining nodes is recalculated. Making use of the heap structure, the CI-TM algorithm can achieve a complexity of $O(N\log N)$. On the one hand, computing the CI-TM$_\ell$ value for a given length $\ell$ is equivalent to iteratively visiting the subcritical neighbors of each node layer by layer within radius $\ell$. Because of the finite search radius, computing CI-TM$_\ell$ for each node takes $O(1)$ time. Initially, we have to calculate CI-TM$_\ell$ for all nodes. However, during the later adaptive calculation, there is no need to update CI-TM$_\ell$ for all nodes. We only have to recalculate it for nodes within $\ell+1$ steps of the removed nodes, a number that scales as $O(1)$ compared with the network size as $N\to\infty$, as shown in [62]. On the other hand, selecting the node with maximal CI-TM can be realized with a heap data structure, which takes $O(\log N)$ time [62]. Therefore, the overall complexity of ranking $N$ nodes is $O(N\log N)$ even when we remove the top CI-TM nodes one by one. In both homogeneous and scale-free random networks, CI-TM achieves larger collective influence given the same number of seeds compared with other scalable approaches. This provides a practical method that can be applied to massively large-scale networks.

The problem of influencer identification is ubiquitous in a wide class of applications. So far, the theory of influencer identification has been applied to a number of important problems.
In this section, we will introduce the application of influ-encer identification in three different areas: information diffusion, brain networks,and socioeconomic systems.
The most direct application of influencer identification is to maximize information diffusion in social networks. In recent years, a large number of studies have aimed to relate users' spreading power to their network locations or personal features [72, 88, 58]. These works, mainly focusing on various types of online social networks including email communication [49], Facebook [90, 60], Twitter [18, 6, 44], and blog-sharing communities [5, 77], enrich our understanding of information diffusion in social networks.

A great challenge in developing effective predictors of influencers comes from validation. In most previous works, the validation of proposed measures depends on modeling information spreading in a given network. This approach, however, has led to several contradictory results on the best predictor of influence, depending on the particular model [41, 11]. These models are built on simplified assumptions about human behavior [36] that neglect some of the most important features of real information diffusion [28], such as activity frequency [83, 64] and behavior patterns [73, 48, 89]. Therefore, the various proposed predictors need to be validated using empirical diffusion records in real-world social media.

We first compare the performance of different predictors for single influencers [72]. Realistic information diffusion instances as well as the underlying social networks are collected from four dissimilar social platforms: the blog-sharing community LiveJournal, the scientific journals of the American Physical Society (APS), the online social network Facebook, and the microblogging service Twitter. To determine the real influence of each node, a directed diffusion graph is first constructed for each system by combining all directed diffusion links together. Then, starting from a source node $i$, the total influence $M_i$ of node $i$ is computed by tracking the diffusion links layer by layer in a breadth-first-search (BFS) fashion. Once we have the realistic influence, it is convenient to compare the performance of different predictors, including degree, k-core, and PageRank. Specifically, we can calculate the average influence $M(k_S, k_{in})$ for nodes with a given combination of k-core value $k_S$ and in-degree $k_{in}$:

$$M(k_S, k_{in}) = \sum_{i \in \Upsilon(k_S, k_{in})} M_i \,/\, N(k_S, k_{in}),$$

where $\Upsilon(k_S, k_{in})$ is the collection of users in the $(k_S, k_{in})$ bin, and $N(k_S, k_{in})$ is the size of this collection. In all the systems, it is consistently observed that nodes with fixed degree can have either large or small influence, while nodes located in the same k-core have similar influence (see Fig. 3). Thus the influence of nodes is more closely related to their global location in the network, indicated by their k-core values. The same conclusion is also obtained in the comparison with PageRank. K-core not only predicts the average influence better, but also recognizes influencers more accurately. Although k-core is effective, it is too coarse to distinguish different nodes within the same shell. In some cases, there may be millions of nodes in one shell.

Fig. 3 K-core predicts the average influence of spreading more reliably than in-degree. Logarithmic values (base 10) of the average size of the influence region $M(k_S, k_{in})$ when spreading originates from nodes with $(k_S, k_{in})$ are shown for LiveJournal (a), APS journals (b), Facebook (c) and Twitter (d). Figure is adapted from Pei et al. [72].

Fig. 4 a, Calculation of the influence strength on node $u$ from two spreaders $s_1$ and $s_2$, for a fixed maximum spreading layer $L$. The collective influence imposed on $u$ is the largest of the strengths $I_u(s_1)$ and $I_u(s_2)$. b, An illustration of single influence and collective influence. The three circle-like areas represent the influence ranges $R_{s_1}$, $R_{s_2}$ and $R_{s_3}$ of the spreaders $s_1$, $s_2$ and $s_3$. The contour lines show the levels of influence strength. The collective influence (grey curve) is obtained by combining the single influence strengths of all spreaders. Figure is adapted from Teng et al. [88].

We further investigate the identification of multiple influencers [88]. Again, we use the realistic diffusion instances from the above four platforms. However, the empirical data cannot be directly mapped to ideal multi-source spreading. Such ideal multi-source spreading instances, in which spreaders send out the same message at the same time, rarely exist in reality. Even if we can find such instances, the initial spreaders are hardly the same as the set of nodes selected by CI or other heuristic strategies. To circumvent this difficulty, we can construct virtual multi-source spreading processes by leveraging the behavior patterns of users extracted from the data. Suppose $n$ spreaders $S = \{s_i \mid i = 1, 2, \cdots, n,\ n = qN\}$ are activated at the beginning of the virtual process. The influence strength $I_{g_1}(s)$ from seed $s$ to its neighbor $g_1$ depends on the tendency of $g_1$ to receive information from $s$. Assume that during the observation time, $s$ has sent out $r(s)$ messages and $g_1$ has accepted $r(s, g_1)$ of them. Then the influence strength can be approximated by $I_{g_1}(s) = r(s, g_1)/r(s)$. In subsequent spreading, $g_1$ may affect its neighbor $g_2 \neq s$ in the same manner.
Following the spreading paths, we can obtain the influence strength that $s$ imposes on its $\ell$-step neighbor $g_\ell$: $I_{g_\ell}(s) = \prod_{k=1}^{\ell} r(g_{k-1}, g_k)/r(g_{k-1})$, where $g_0 = s$. The collective influence $I_u$ imposed on node $u$ by the seed set $S$ is therefore $I_u = \max_{i=1}^{n} I_u(s_i)$. See Fig. 4 for an example. Finally, averaging over all $N$ nodes in the network, the collective influence of the spreaders imposed on the entire system is $Q(q) = \sum_{u=1}^{N} I_u / N$. Based on this virtual spreading process, we can evaluate the collective influence of the spreaders selected by different methods. In particular, we compare the influencers selected by the collective influence algorithm (CI), adaptive high degree (HDA), high degree (HD), PageRank (PR), and k-core. In all the systems, CI consistently outperforms the other ranking methods.

The human brain is a robust modular system interconnected as a Network of Networks (NoN) [15, 78, 26]. How this robustness emerges in a modular structure is an important question in many disciplines. Previous interdependent NoN models inspired by the power grid are extremely fragile [14], and thus cannot explain the observed robustness of brain networks. To reveal the mechanism underlying this robustness, a NoN model has been proposed that can afford inter-link functionality and remain robust at the same time [63, 82].

In a NoN system, the links are classified into two types: inter-modular links, which represent the mutual dependencies between modules, and intra-modular links, which are not involved in the inter-modular dependencies. Denote by $S(i)$ and $F(i)$ the sets of nodes connected to node $i$ via intra-modular and inter-modular links, respectively. Suppose the state of node $i$ is $\sigma_i \in \{0, 1\}$ (inactive or active), and the external input to node $i$ is $n_i \in \{0, 1\}$ (no input or input). In the general activation model, the state is related to the input through $\sigma_i = n_i \left[ 1 - \prod_{j \in F(i)} (1 - n_j) \right]$.
That is, node $i$ is activated only if $i$ receives the input ($n_i = 1$) and at least one of its neighbors connected via inter-modular links also receives the input. In a robust brain network, for a typical input configuration $\mathbf{n} = (n_1, \cdots, n_N)$, the giant (largest) component $G$ of the active nodes with $\sigma_i = 1$ should remain globally connected. Robustness can then be characterized by the critical fraction $q_{rand} = 1 - \langle n_i \rangle$ of zero inputs such that $G(q_{rand}) = 0$. Here the input configuration $\mathbf{n}$ is sampled from a flat distribution. Ideally, a robust NoN should have no disconnected phase, with a large value of $q_{rand}$ close to 1.

Fig. 5 a, Spatial location of the three main modules (AC, PPC, and V1/V2) in the 3NoN. b, Topology of the 3NoN. Inter-links and intra-links are displayed. c, Size of the largest active cluster $G(q)$ as a function of the fraction $q$ of nodes with $n_i = 0$, for inputs chosen according to collective influence ($\ell = 3$) and for random states (black curve, random percolation). Figure is adapted from Morone et al. [63].

To explain both the robustness and the inter-link functionality of brain networks, a robust NoN (R-NoN) model is proposed [63]. Define $\rho_{i \to j} \in \{0, 1\}$ as the message running along an intra-modular link $i \to j$, and $\varphi_{i \to j} \in \{0, 1\}$ as the message running along an inter-modular link $i \to j$. The information flow follows the self-consistent equations

$$\rho_{i \to j} = \sigma_i \left[ 1 - \prod_{k \in S(i) \setminus j} (1 - \rho_{k \to i}) \prod_{\ell \in F(i)} (1 - \varphi_{\ell \to i}) \right], \tag{8}$$

$$\varphi_{i \to j} = \sigma_i \left[ 1 - \prod_{k \in S(i)} (1 - \rho_{k \to i}) \prod_{\ell \in F(i) \setminus j} (1 - \varphi_{\ell \to i}) \right]. \tag{9}$$

The physical meaning of the above equations is easy to interpret. For instance, in Eq. (8), a positive message $\rho_{i \to j}$ is transmitted from $i$ to $j$ in the same module if node $i$ is active ($\sigma_i = 1$) and if it receives at least one positive message from either a node $k$ in the same module ($\rho_{k \to i} = 1$) or a node $\ell$ in the other module ($\varphi_{\ell \to i} = 1$). This logical OR between intra-modular and inter-modular messages characterizes the R-NoN. The final probability of node $i$ belonging to the largest active component $G$ is

$$\rho_i = \sigma_i \left[ 1 - \prod_{k \in S(i)} (1 - \rho_{k \to i}) \prod_{\ell \in F(i)} (1 - \varphi_{\ell \to i}) \right]. \tag{10}$$

The size of $G$ is therefore $G = \langle \rho_i \rangle$. In the R-NoN model, the system is robust since a node can be active ($\sigma_i = 1$) even if it does not belong to the giant component $G$. This prevents the catastrophic cascading effects found in the catastrophic C-NoN model inspired by power grid failure [14]. In the C-NoN model, a node remains functional only if it belongs to the giant component in both networks. This implies that the status of a node in one network depends on its status in the other network.
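As an illustration of the activation rule $\sigma_i = n_i [1 - \prod_{j \in F(i)} (1 - n_j)]$ and of the largest active component $G$ discussed above, the sketch below evaluates both directly on a hypothetical two-module toy network. It computes $G$ by an explicit breadth-first search on one input configuration rather than by solving the message passing equations (8)-(10), so it should be read as a finite-size illustration and not as the method of [63]. The function names, the toy links, and the choice that a node without inter-modular links is active whenever $n_i = 1$ are assumptions of this example.

```python
from collections import deque

def node_states(inputs, inter):
    """sigma_i = n_i * [1 - prod_{j in F(i)} (1 - n_j)] for nodes with inter-links."""
    sigma = {}
    for i, n_i in inputs.items():
        partners = inter.get(i, [])
        if partners:
            # the bracket equals 1 exactly when at least one partner has n_j = 1
            sigma[i] = n_i * int(any(inputs[j] for j in partners))
        else:
            # ASSUMPTION of this sketch: no inter-links means sigma_i = n_i
            sigma[i] = n_i
    return sigma

def largest_active_component(sigma, intra, inter):
    """Size of the largest cluster of active nodes, connected via either link type."""
    seen, best = set(), 0
    for start in sigma:
        if sigma[start] == 0 or start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            i = queue.popleft()
            size += 1
            for j in list(intra.get(i, [])) + list(inter.get(i, [])):
                if sigma[j] == 1 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        best = max(best, size)
    return best

# Hypothetical toy NoN: two chains a1-a2-a3 and b1-b2-b3 joined by one inter-link a3-b1
intra = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"],
         "b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
inter = {"a3": ["b1"], "b1": ["a3"]}

all_on = {node: 1 for node in intra}
one_off = dict(all_on, b1=0)  # a single zero input on an inter-linked node

G_all_on = largest_active_component(node_states(all_on, inter), intra, inter)
G_one_off = largest_active_component(node_states(one_off, inter), intra, inter)
print(G_all_on, G_one_off)
```

In this toy case, switching off the input of the single inter-linked node `b1` also deactivates its partner `a3`, and the largest active cluster drops from 6 nodes to 2. Nodes on inter-modular links can therefore matter far more than their degree alone suggests.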
The fundamental difference between C-NoN and R-NoN is that, in the C-NoN model, the size of $G$ is computed through

$$\rho_i = \sigma_i \left[ 1 - \prod_{k \in S(i)} (1 - \rho_{k \to i}) \right] \left[ 1 - \prod_{\ell \in F(i)} (1 - \varphi_{\ell \to i}) \right]. \tag{11}$$

So the logical OR in Eq. (10) is replaced by a logical AND in C-NoN. This stricter condition makes the system extremely sensitive to small perturbations. In synthetic NoN made of ER and SF random graphs, the percolation threshold $q_{rand}$ of the R-NoN model is found to be close to 1. On the contrary, the C-NoN model has a threshold $q_{rand}$ close to 0. This indicates that the two models indeed capture two different phenomena.

After exploring the behavior of the R-NoN model under typical inputs, it is necessary to study the response to rare events targeting the influencers in brain networks. Rare inputs $\{n_i = 0\}$ targeting influencers may interrupt global communication in the brain, which has been conjectured to be responsible for certain neurological disorders. Conversely, activating the influencers would optimally broadcast information to the entire network. Therefore, it is important to predict the location of the most influential nodes involved in information processing in the brain. To find the minimal fraction of nodes $q_{infl}$ in the brain network whose removal would optimally fragment the giant component, the R-NoN model is mapped to optimal percolation. The collective influence of nodes is calculated by minimizing the largest eigenvalue of the modified NB matrix. In particular, the collective influence of node $i$ is given by

$$\mathrm{CI}_\ell(i) = z_i \sum_{j \in \partial \mathrm{Ball}(i, \ell)} z_j + \sum_{j \in F(i):\, k^{out}_j = 1} z_j \sum_{m \in \partial \mathrm{Ball}(j, \ell)} z_m, \tag{12}$$

where $z_i \equiv k^{in}_i + k^{out}_i - 1$. The first term is the node-centric contribution, which is present in the single-network case of optimal percolation, while the second term is the node-eccentric contribution, which is a new feature of the brain NoN.

Applying the R-NoN model and collective influence theory to real brain networks, it is possible to obtain the collective influence map of the brain NoN. The brain network is constructed from functional magnetic resonance imaging (fMRI) data from an experiment on stimulus-driven attention [63, 26, 27, 24]. In the experiment, each subject performs a dual visual-auditory task when receiving a visual stimulus and an auditory pitch simultaneously. This experiment requires the deployment of high-level control modules in the brain, and thus captures the role of dependency (inter-modular) connections. In the obtained brain network (see Fig. 5a-b), it is observed that the system is robust, with a large threshold $q_{rand} \approx 0.9$, while the minimal set of influencers requires only $q_{infl} \approx$ […].

It has long been recognized that the pattern of individuals' social connections in society can affect people's financial status [32]. However, how to quantify the relationship between the location of an individual in the social network and his or her economic wellness remains an open question. Although the effect of network diversity on economic development has been tested at the community level [23], inference of people's financial status from social network centralities or metrics at the individual level is still needed. The difficulty of such an investigation comes from the lack of empirical data containing both individuals' financial information and their pattern of social ties.

Fig. 6 a-b, Visualization of the communication activity of the population in the top 10% (a) and bottom 10% (b) total credit limit classes. Figure is adapted from Luo et al. [55].

To find a reliable social network predictor of people's financial status, a massively large social network of mobile and residential communication in Mexico containing 1.[…] × 10^[…] users, together with financial banking data, is analyzed [55]. With this dataset, it is possible to precisely cross-correlate the financial information of a person with his or her location in the communication network at the country level. In particular, the financial status of individuals is reflected by their credit limit. In the analysis of the 5.[…] × 10^[…] bank clients identified in the phone call network, the top 10% and bottom 10% of individuals present completely different communication patterns (see Fig. 6). Richer people maintain more active and diverse links, some connecting to remote locations and forming tightly linked "rich clubs".

To characterize the affluent people with network metrics, several centralities that are feasible for large-scale networks are compared, including degree, PageRank, k-core, and collective influence (CI). In the communication network, these four metrics are correlated. Therefore, they all show correlations with financial status when age is controlled. Among them, both k-core and CI capture the strong correlation with credit line, with $R^2$ values of 0.96 and 0.93, respectively. However, CI is preferable since it offers both a strong correlation and a high resolution.
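For readers who want to see how a CI score behaves on a small graph, the sketch below evaluates the single-network collective influence of Morone and Makse [61], $\mathrm{CI}_\ell(i) = (k_i - 1) \sum_{j \in \partial \mathrm{Ball}(i, \ell)} (k_j - 1)$, where $\partial \mathrm{Ball}(i, \ell)$ is the set of nodes at shortest-path distance exactly $\ell$ from $i$. It is a minimal, non-adaptive illustration under that assumed definition, not the NoN form of Eq. (12) and not the implementation used in the financial study [55]; the helper names and the toy edge list are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Build an undirected adjacency list from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def collective_influence(adj, node, ell):
    """CI_ell(node) = (k_node - 1) * sum of (k_j - 1) over nodes at distance exactly ell."""
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == ell:
            # u lies on the frontier of the ball: add its reduced degree, do not expand
            frontier_sum += len(adj[u]) - 1
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * frontier_sum

# Hypothetical toy graph: node "c" has 3 neighbours that are hubs themselves,
# while "star" has 6 neighbours that are all leaves.
edges = [("c", h) for h in ("h1", "h2", "h3")]
edges += [(h, f"{h}_leaf{k}") for h in ("h1", "h2", "h3") for k in range(3)]
edges += [("star", f"s{k}") for k in range(6)]
adj = build_adj(edges)

ci_c = collective_influence(adj, "c", 1)
ci_star = collective_influence(adj, "star", 1)
print(len(adj["c"]), ci_c, len(adj["star"]), ci_star)
```

Here `star` has the larger degree but a CI of zero at $\ell = 1$, while the lower-degree node `c`, whose neighbours are themselves well connected, receives the highest score. Degree alone would rank the two nodes the other way round.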
According to the definition of CI, top CI nodes are surrounded by hubs hierarchically. This is exactly the structure of the ego-centric network of the top 1% wealthy people.

The performance of the predictions can be further enhanced by considering the factor of age. An age-network combined metric $\mathrm{ANC} = \alpha\,\mathrm{Age} + (1 - \alpha)\,\mathrm{CI}$ with $\alpha =$ […] reaches $R^2 = 0.99$. Moreover, it is able to identify 70% of high-credit individuals at the highest earner level. To validate its effectiveness, a real social marketing campaign was performed. Specifically, text messages inviting new credit card clients were sent to 656,944 people selected by their high CI values in the social network. Meanwhile, the same message was sent to a control group of 48,000 individuals selected randomly. The response rate, measured by the fraction of recipients who requested the product, increased threefold among the top influencers identified by CI compared with the random control group.

The same analysis was also applied to the diversity of individuals' links [23]. The diversity of an individual can be measured by the diversity ratio $\mathrm{DR} = W_{out}/W_{in}$, i.e., the ratio of the total communication events with people in other communities, $W_{out}$, to those within the same community, $W_{in}$. The correlation between DR and CI is weak, so they should reflect different aspects of network structure. In comparison with financial data, the age-diversity composite $\mathrm{ADC} = \alpha\,\mathrm{Age} + (1 - \alpha)\,\mathrm{DR}$ ($\alpha =$ […]) […]

We acknowledge funding from NIH-NIBIB 1R01EB022720, NIH-NCI U54CA137788/U54CA132378 and NSF-IIS 1515022.
References
1. Albert, R., Jeong, H., Barabási, A.L.: Error and attack tolerance of complex networks. Nature (6794), 378–382 (2000)
2. Altarelli, F., Braunstein, A., Dall'Asta, L., Wakeling, J.R., Zecchina, R.: Containing epidemic outbreaks by message-passing techniques. Phys. Rev. X (2), 021024 (2014)
3. Altarelli, F., Braunstein, A., Dall'Asta, L., Zecchina, R.: Large deviations of cascade processes on graphs. Phys. Rev. E (6), 062115 (2013)
4. Altarelli, F., Braunstein, A., Dall'Asta, L., Zecchina, R.: Optimizing spread dynamics on graphs by message passing. J. Stat. Mech: Theory Exp. (09), P09011 (2013)
5. Backstrom, L., Huttenlocher, D., Kleinberg, J., Lan, X.: Group formation in large social networks: membership, growth, and evolution. In: Proc. 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 44–54. ACM (2006)
6. Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone's an influencer: quantifying influence on twitter. In: Proc. 4th ACM Intl. Conf. on Web Search and Data Mining, pp. 65–74. ACM (2011)
7. Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003)
8. Bau, S., Wormald, N.C., Zhou, S.: Decycling numbers of random regular graphs. Random Struct. Alg. (3-4), 397–413 (2002)
9. Baxter, G.J., Dorogovtsev, S.N., Goltsev, A.V., Mendes, J.F.: Bootstrap percolation on complex networks. Phys. Rev. E (1), 011103 (2010)
10. Bonacich, P.: Factoring and weighting approaches to status scores and clique identification. J. Math. Socio. (1), 113–120 (1972)
11. Borge-Holthoefer, J., Moreno, Y.: Absence of influential spreaders in rumor dynamics. Phys. Rev. E (2), 026116 (2012)
12. Braunstein, A., Dall'Asta, L., Semerjian, G., Zdeborová, L.: Network dismantling. Proc. Natl. Acad. Sci. U.S.A. (44), 12368–12373 (2016)
13. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems (1), 107–117 (1998)
14. Buldyrev, S.V., Parshani, R., Paul, G., Stanley, H.E., Havlin, S.: Catastrophic cascade of failures in interdependent networks. Nature (7291), 1025–1028 (2010)
15. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. (3), 186–198 (2009)
16. Callaway, D.S., Newman, M.E., Strogatz, S.H., Watts, D.J.: Network robustness and fragility: Percolation on random graphs. Phys. Rev. Lett. (25), 5468 (2000)
17. Centola, D.: The spread of behavior in an online social network experiment. Science (5996), 1194–1197 (2010)
18. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, P.K.: Measuring user influence in twitter: The million follower fallacy. Proc. 4th Intl. AAAI Conf. on Weblogs and Social Media (10-17), 30 (2010)
19. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proc. 16th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 1029–1038. ACM (2010)
20. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proc. 15th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 199–208. ACM (2009)
21. Clusella, P., Grassberger, P., Pérez-Reche, F.J., Politi, A.: Immunization and targeted destruction of networks using explosive percolation. Phys. Rev. Lett. (20), 208301 (2016)
22. Cohen, R., Erez, K., Ben-Avraham, D., Havlin, S.: Breakdown of the internet under intentional attack. Phys. Rev. Lett. (16), 3682 (2001)
23. Eagle, N., Macy, M., Claxton, R.: Network diversity and economic development. Science (5981), 1029–1031 (2010)
24. del Ferraro, G., Moreno, A., Min, B., Morone, F., Perez-Ramirez, U., Perez-Cervera, L., Parra, L., A, H., Canals, S., Makse, H.A.: Finding essential nodes for integration in the brain using network optimization theory (2017)
25. Freeman, L.C.: Centrality in social networks conceptual clarification. Soc. Netw. (3), 215–239 (1978)
26. Gallos, L.K., Makse, H.A., Sigman, M.: A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. Proc. Natl. Acad. Sci. U.S.A. (8), 2825–2830 (2012)
27. Gallos, L.K., Sigman, M., Makse, H.A.: The conundrum of functional brain networks: small-world efficiency or fractal modularity. Front. Psychol., 123 (2007)
28. Gallos, L.K., Song, C., Makse, H.A.: Scaling of degree correlations and its influence on diffusion in scale-free networks. Phys. Rev. Lett. (24), 248701 (2008)
29. Goltsev, A.V., Dorogovtsev, S.N., Mendes, J.F.F.: k-core (bootstrap) percolation on complex networks: Critical phenomena and nonlocal effects. Phys. Rev. E (5), 056101 (2006)
30. Goyal, A., Lu, W., Lakshmanan, L.V.: Celf++: optimizing the greedy algorithm for influence maximization in social networks. In: Proc. 20th Intl. Conf. World Wide Web, pp. 47–48. ACM (2011)
31. Goyal, A., Lu, W., Lakshmanan, L.V.: Simpath: An efficient algorithm for influence maximization under the linear threshold model. In: Data Mining (ICDM), 2011 IEEE 11th Intl. Conf. on, pp. 211–220. IEEE (2011)
32. Granovetter, M.S.: The strength of weak ties. Am. J. Sociol. (6), 1360–1380 (1973)
33. Guggiola, A., Semerjian, G.: Minimal contagious sets in random regular graphs. J. Stat. Phys. (2), 300–358 (2015)
34. Hashimoto, K.i.: Zeta functions of finite graphs and representations of p-adic groups. Adv. Stud. Pure Math., 211–280 (1989)
35. Hethcote, H.W.: The mathematics of infectious diseases. SIAM Rev. (4), 599–653 (2000)
36. Hu, Y., Havlin, S., Makse, H.A.: Conditions for viral influence spreading through multiplex correlated social networks. Phys. Rev. X (2), 021031 (2014)
37. Hu, Y., Ji, S., Feng, L., Havlin, S., Jin, Y.: Optimizing locally the spread of influence in large scale online social networks. arXiv preprint arXiv:1509.03484 (2015)
38. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of computer computations, pp. 85–103. Springer (1972)
39. Katz, L.: A new status index derived from sociometric analysis. Psychometrika (1), 39–43 (1953)
40. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 137–146. ACM (2003)
41. Kitsak, M., Gallos, L.K., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H.E., Makse, H.A.: Identification of influential spreaders in complex networks. Nat. Phys. (11), 888–893 (2010)
42. Kleinberg, J.: Cascading behavior in networks: Algorithmic and economic issues. Algorithmic game theory, 613–632 (2007)
43. Klemm, K., Serrano, M., Eguiluz, V.M., Miguel, M.S.: A measure of individual role in collective dynamics. Sci. Rep., 292 (2012)
44. Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proc. 19th ACM Intl. Conf. on World Wide Web, pp. 591–600. ACM (2010)
45. Lawyer, G.: Understanding the influence of all nodes in a network. Sci. Rep., 8665 (2015)
46. Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. ACM Trans. Web (1), 5 (2007)
47. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: Proc. 13th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 420–429. ACM (2007)
48. Li, W., Tang, S., Pei, S., Yan, S., Jiang, S., Teng, X., Zheng, Z.: The rumor diffusion process with emerging independent spreaders in complex networks. Physica A, 121–128 (2014)
49. Liben-Nowell, D., Kleinberg, J.: Tracing information flow on a global scale using internet chain-letter data. Proc. Natl. Acad. Sci. U.S.A. (12), 4633–4638 (2008)
50. Liu, Y., Tang, M., Zhou, T., Do, Y.: Core-like groups result in invalidation of identifying super-spreader by k-shell decomposition. Sci. Rep., 9602 (2015)
51. Liu, Y., Tang, M., Zhou, T., Do, Y.: Improving the accuracy of the k-shell method by removing redundant links-from a perspective of spreading dynamics. Sci. Rep., 13172 (2015)
52. Lü, L., Chen, D., Ren, X.L., Zhang, Q.M., Zhang, Y.C., Zhou, T.: Vital nodes identification in complex networks. Phys. Rep., 1–63 (2016)
53. Lü, L., Zhang, Y.C., Yeung, C.H., Zhou, T.: Leaders in social networks, the delicious case. PLoS ONE (6), e21202 (2011)
54. Lü, L., Zhou, T., Zhang, Q.M., Stanley, H.E.: The h-index of a network node and its relation to degree and coreness. Nat. Comm., 10168 (2016)
55. Luo, S., Morone, F., Sarraute, C., Makse, H.A.: Inferring personal financial status from social network location. Nat. Comm., 15227 (2017)
56. Martin, T., Zhang, X., Newman, M.: Localization and centrality in networks. Phys. Rev. E (5), 052808 (2014)
57. Mézard, M., Parisi, G.: The cavity method at zero temperature. J. Stat. Phys. (1), 1–34 (2003)
58. Min, B., Liljeros, F., Makse, H.A.: Finding influential spreaders from human activity beyond network location. PLoS ONE (8), e0136831 (2015)
59. Min, B., Morone, F., Makse, H.A.: Searching for influencers in big-data complex networks. In: Diffusive Spreading in Nature, Technology and Society (Springer Verlag, Edited by A. Bunde, J. Caro, J. Karger, G. Vogl) (2016)
60. Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: Proc. 7th ACM SIGCOMM Conf. on Internet Measurement, pp. 29–42. ACM (2007)
61. Morone, F., Makse, H.A.: Influence maximization in complex networks through optimal percolation. Nature, 65–68 (2015)
62. Morone, F., Min, B., Bo, L., Mari, R., Makse, H.A.: Collective influence algorithm to find influencers via optimal percolation in massively large social media. Sci. Rep., 30062 (2016)
63. Morone, F., Roth, K., Min, B., Stanley, H.E., Makse, H.A.: A model of brain activation predicts the neural collective influence map of the human brain. Proc. Natl. Acad. Sci. U.S.A. (15), 3849–3854 (2017)
64. Muchnik, L., Pei, S., Parra, L.C., Reis, S.D., Andrade Jr, J.S., Havlin, S., Makse, H.A.: Origins of power-law degree distribution in the heterogeneity of human activity in social networks. Sci. Rep., 1783 (2013)
65. Mugisha, S., Zhou, H.J.: Identifying optimal targets of network attack by belief propagation. Phys. Rev. E (1), 012305 (2016)
66. Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions-I. Math. Program. (1), 265–294 (1978)
67. Newman, M.E.: Spread of epidemic disease on networks. Phys. Rev. E (1), 016128 (2002)
68. Newman, M.E., Strogatz, S.H., Watts, D.J.: Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E (2), 026118 (2001)
69. Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Phys. Rev. Lett. (14), 3200 (2001)
70. Pastor-Satorras, R., Vespignani, A.: Immunization of complex networks. Phys. Rev. E (3), 036104 (2002)
71. Pei, S., Makse, H.A.: Spreading dynamics in complex networks. J. Stat. Mech: Theory Exp. (12), P12002 (2013)
72. Pei, S., Muchnik, L., Andrade Jr, J.S., Zheng, Z., Makse, H.A.: Searching for superspreaders of information in real-world social media. Sci. Rep., 5547 (2014)
73. Pei, S., Muchnik, L., Tang, S., Zheng, Z., Makse, H.A.: Exploring the complex pattern of information spreading in online blog communities. PLoS ONE (5), e0126894 (2015)
74. Pei, S., Tang, S., Zheng, Z.: Detecting the influence of spreading in social networks with excitable sensor networks. PLoS ONE (5), e0124848 (2015)
75. Pei, S., Teng, X., Shaman, J., Morone, F., Makse, H.A.: Efficient collective influence maximization in threshold models of behavior cascading with first-order transitions. Sci. Rep., 45240 (2017)
76. Radicchi, F., Castellano, C.: Leveraging percolation theory to single out influential spreaders in networks. Phys. Rev. E (6), 062314 (2016)
77. Ramos, M., Shao, J., Reis, S.D., Anteneodo, C., Andrade Jr, J.S., Havlin, S., Makse, H.A.: How does public opinion become extreme? Sci. Rep., 10032 (2015)
78. Reis, S.D., Hu, Y., Babino, A., Andrade Jr, J.S., Canals, S., Sigman, M., Makse, H.A.: Avoiding catastrophic failure in correlated networks of networks. Nat. Phys. (10), 762–767 (2014)
79. Restrepo, J.G., Ott, E., Hunt, B.R.: Characterizing the dynamical importance of network nodes and links. Phys. Rev. Lett. (9), 094102 (2006)
80. Richardson, M., Domingos, P.: Mining knowledge-sharing sites for viral marketing. In: Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 61–70. ACM (2002)
81. Rogers, E.M.: Diffusion of innovations. Simon and Schuster (2010)
82. Roth, K., Morone, F., Min, B., Makse, H.A.: Emergence of robustness in networks of networks. Phys. Rev. E (6), 062308 (2017)
83. Rybski, D., Buldyrev, S.V., Havlin, S., Liljeros, F., Makse, H.A.: Communication activity in a social network: relation between long-term correlations and inter-event clustering. Sci. Rep., 560 (2012)
84. Sabidussi, G.: The centrality index of a graph. Psychometrika (4), 581–603 (1966)
85. Seidman, S.B.: Network structure and minimum degree. Soc. Netw. (3), 269–287 (1983)
86. Stauffer, D., Aharony, A.: Introduction to percolation theory. CRC press (1994)
87. Tang, S., Teng, X., Pei, S., Yan, S., Zheng, Z.: Identification of highly susceptible individuals in complex networks. Physica A, 363–372 (2015)
88. Teng, X., Pei, S., Morone, F., Makse, H.A.: Collective influence of multiple spreaders evaluated by tracing real information flow in large-scale social networks. Sci. Rep., 36043 (2016)
89. Teng, X., Yan, S., Tang, S., Pei, S., Li, W., Zheng, Z.: Individual behavior and social wealth in the spatial public goods game. Physica A, 141–149 (2014)
90. Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in facebook. In: Proc. 2nd ACM Workshop on Online Social Networks, pp. 37–42. ACM (2009)
91. Watts, D.J.: A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. U.S.A. (9), 5766–5771 (2002)
92. Watts, D.J., Dodds, P.S.: Influentials, networks, and public opinion formation. J. Cons. Res. (4), 441–458 (2007)
93. Yan, S., Tang, S., Fang, W., Pei, S., Zheng, Z.: Global and local targeted immunization in networks with community structure. J. Stat. Mech: Theory Exp. (8), P08010 (2015)
94. Yan, S., Tang, S., Pei, S., Jiang, S., Zheng, Z.: Dynamical immunization strategy for seasonal epidemics. Phys. Rev. E (2), 022808 (2014)
95. Zeng, A., Zhang, C.J.: Ranking spreaders by decomposing complex networks. Phys. Lett. A 377