Algorithms for Online Influencer Marketing
Paul Lagrée, Olivier Cappé, Bogdan Cautis, and Silviu Maniu
LRI, Université Paris-Sud, Université Paris-Saclay; LIMSI, CNRS, Université Paris-Saclay
Abstract
Influence maximization is the problem of finding influential users, or nodes, in a graph so as to maximize the spread of information. It has many applications in advertising and marketing on social networks. In this paper, we study a highly generic version of influence maximization, one of optimizing influence campaigns by sequentially selecting "spread seeds" from a set of influencers, a small subset of the node population, under the hypothesis that, in a given campaign, previously activated nodes remain "persistently" active throughout and thus do not yield further rewards. This problem is in particular relevant for an important form of online marketing, known as influencer marketing, in which the marketers target a sub-population of influential people, instead of the entire base of potential buyers. Importantly, we make no assumptions on the underlying diffusion model and we work in a setting where neither a diffusion network nor historical activation data are available. We call this problem online influencer marketing with persistence (in short, OIMP). We first discuss motivating scenarios and present our general approach. We introduce an estimator of the influencers' remaining potential – the expected number of nodes that can still be reached from a given influencer – and justify its strength to rapidly estimate the desired value, relying on real data gathered from Twitter. We then describe a novel algorithm,
GT-UCB, relying on upper confidence bounds on the remaining potential. We show that our approach leads to high-quality spreads on both simulated and real datasets, even though it makes almost no assumptions on the diffusion medium. Importantly, it is orders of magnitude faster than state-of-the-art influence maximization methods, making it possible to deal with large-scale online scenarios.
Keywords:
Influencer marketing, information diffusion, online social networks, influence maximization, online learning, multi-armed bandits.
Advertising based on word-of-mouth diffusion in social media has become very important in the digital marketing landscape. Nowadays, social value and social influence are arguably the hottest concepts in the area of Web advertising, and most companies that advertise in the Web space must have a "social" strategy. For example, on widely used platforms such as Facebook or Twitter, promoted posts are interleaved with normal posts on user feeds. Users interact with these posts by actions such as "likes" (adoption), "shares" or "reposts" (network diffusion). This represents an unprecedented utility in advertising, be it with a commercial intent or not, as products, news, ideas, movies, political manifests, tweets, etc., can propagate easily to a large audience [40, 41].

Motivated by the need for effective viral marketing strategies, influence estimation and influence maximization have become important research problems, at the intersection of data mining and social sciences [12]. In short, influence maximization is the problem of selecting a set of nodes from a given diffusion graph, maximizing the expected spread under an underlying diffusion model. This problem was introduced in 2003 by the seminal work of Kempe et al. [21], through two stochastic, discrete-time diffusion models, Linear Threshold (LT) and
Independent Cascade (IC). These models rely on diffusion graphs whose edges are weighted by a score of influence. They show that selecting the set of nodes maximizing the expected spread is NP-hard for both models, and they propose a greedy algorithm that takes advantage of the sub-modularity property of the influence spread, but does not scale to large graphs. A rich literature followed, focusing on computationally efficient and scalable algorithms to solve influence maximization. The recent benchmarking study of Arora et al. [2] summarizes state-of-the-art techniques and also debunks many influence maximization myths. In particular, it shows that, depending on the underlying diffusion model and the choice of parameters, each algorithm's behavior can vary significantly, from very efficient to prohibitively slow, and that influence maximization at the scale of real applications remains an elusive target.

Importantly, all the influence maximization studies discussed in [2] have as starting point a specific diffusion model (IC or LT), whose graph topology and parameters – basically the edge weights – are known. In order to infer the diffusion parameters or the underlying graph structure, or both, [15, 16, 14, 18, 33, 11] propose offline, model-specific methods, which rely on observed information cascades. In short, information cascades are time-ordered sequences of records indicating when a specific user was activated or adopted a specific item.

There are however many situations where it is unreasonable to assume the existence of relevant historical data in the form of cascades. For such settings, online approaches, which can learn the underlying diffusion parameters while running diffusion campaigns, have been proposed.
Bridging influence maximization and inference, this is done by balancing between exploration steps (of yet uncertain model aspects) and exploitation ones (of the best solution so far), by so-called multi-armed bandit techniques, where an agent interacts with the network to infer influence probabilities [38, 10, 42, 37]. The learning agent sequentially selects seeds from which diffusion processes are initiated in the network; the obtained feedback is used to update the agent's knowledge of the model.

Nevertheless, all these studies on inferring diffusion networks, whether offline or online, rely on parametric diffusion models, i.e., assume that the actual diffusion dynamics are well captured by such a model (e.g., IC). This maintains significant limitations for practical purposes. First, the more complex the model, the harder it is to learn in large networks, especially in campaigns that have a relatively short timespan, making model inference and parameter estimation very challenging within a small horizon (typically tens or hundreds of spreads). Second, it is commonly agreed that the aforementioned diffusion models represent elegant yet coarse interpretations of a reality that is much more complex and often hard to observe fully. For examples of insights into this complex reality, the topical or non-topical nature of an influence campaign, the popularity of the piece of information being diffused, or its specific topic were all shown to have a significant impact on hashtag diffusions in Twitter [11, 19, 32].

Our contribution
Aiming to address such limitations, we propose in this paper a large-scale approach for online and adaptive influence maximization, in which the underlying assumptions for the diffusion processes are kept to a minimum (if, in fact, hardly any). We argue that it can represent a versatile tool in many practical scenarios, including in particular the one of influencer marketing, which, according to Wikipedia, can be described as follows: "a form of marketing in which focus is placed on influential people rather than the target market as a whole, identifying the individuals that have influence over potential buyers, and orienting marketing activities around these influencers". For instance, influential users may be contractually bound by a sponsorship, in exchange for the publication of promoted posts on their online Facebook or Twitter accounts. This new form of marketing is by now extensively used in online social platforms, as is discussed in the marketing literature [13, 1, 6].

More concretely, we focus on social media diffusion scenarios in which influence campaigns consist of multiple consecutive trials (or rounds) spreading the same type of information from an arbitrary domain (be it a product, idea, post, hashtag, etc.). The goal of each campaign is to reach (or activate) as many distinct users as possible, the objective function being the total spread. These potential influencees – the target market – represent the nodes of an unknown diffusion medium / online network. In our setting – as, arguably, in many real-world scenarios – the campaign selects from a known set of spread seed candidates, so-called influencers, a small subset of the potentially large and unknown target market. At each round, the learning agent picks among the influencers those from which a new diffusion process is initiated in the network, gathers some feedback on the activations, and adapts the subsequent steps of the campaign.
Only the effects of the diffusion process, namely the activations (e.g., purchases or subscriptions), are observed, but not the process itself. The agent may "re-seed" certain influencers (we may want to ask a particular one to initiate spreads several times, e.g., if it has a strong converting impact). This perspective on influence campaigns imposes naturally a certain notion of persistence, which is given the following interpretation: users that were already activated in the ongoing campaign – e.g., have adopted a product or endorsed a political movement – remain activated throughout that campaign, and thus will not be accounted for more than once in the objective function.

We call this problem online influencer marketing with persistence (in short, OIMP). Our solution for it follows the multi-armed bandit idea initially employed in Lei et al. [24], but we adopt instead a diffusion-independent perspective, whose only input are the spread seed candidates, while the population and underlying diffusion network – which may actually be the superposition of several networks – remain unknown. In our bandit approach, the parameters to be estimated are the values of the influencers – how good is a specific influencer – as opposed to the diffusion edge probabilities of a known graph as in [24]. Furthermore, we make the model's feedback more realistic by assuming that after each trial, the agent only gathers the set of activated nodes. The rationale is that oftentimes, for a given "viral" item, we can track in applications only when it was adopted by various users, but not why. A key difference w.r.t.
other multi-armed bandit studies for influence maximization such as [38, 10, 42, 37] is that these look for a constant optimal set of seeds, while the difficulty with OIMP is that the seemingly best action at a given trial depends on the activations of the previous trials (and thus the learning agent's past decisions).

The multi-armed bandit algorithm we propose, called GT-UCB, relies on a famous statistical tool known as the
Good-Turing estimator, first developed during WWII to crack the Enigma machine, and later published by Good in a study on species discovery [17]. Our approach is inspired by the work of Bubeck et al. [8], which proposed the use of the Good-Turing estimator in a context where the learning agent needs to sequentially select experts that only sample one of their potential nodes at each trial. In contrast, in OIMP, when an influencer is selected, it may have a potentially large spread and may activate many nodes at once. Our solution follows the well-known optimism in the face of uncertainty principle from the bandit literature (see [7] for an introduction to multi-armed bandit problems), by deriving an upper confidence bound on the estimator of the remaining potential for spreading information of each influencer, and by choosing in a principled manner between explore and exploit steps.

In Section 6 we evaluate the proposed approach on publicly available graph datasets, as well as a large snapshot of Twitter activity we collected. The proposed algorithm is agnostic with respect to the choice of influencers, who in most realistic applications will be selected based on a combination of

(Footnote: Repeated exposure, known also as the "effective frequency", is a crucial concept in marketing strategies, online or offline.)
Comparison with previous publication
We extend in this article a preliminary study published in Lagrée et al. [22], introducing the following new contributions, which allow us to give a complete picture of our model and algorithmic solutions:

• A more detailed discussion of the motivation behind our work, in relation to new forms of online marketing, such as influencer marketing.

• An empirical analysis over Twitter data, which comes to support the assumptions behind our choice of estimators for the remaining potential of each influencer.

• A detailed theoretical analysis and justification for the upper confidence bounds on which our algorithm GT-UCB relies.

• Theoretical guarantees on the performance of GT-UCB, formulated in terms of waiting time, a notion that is better suited to our bandit framework than the usual one of regret.

• An adaptation of GT-UCB (denoted Fat-GT-UCB) and the corresponding theoretical analysis, for scenarios in which influencers may experience fatigue, i.e., a diminishing tendency to activate their user base as they are re-seeded throughout a marketing campaign.

• A broader experimental analysis involving the two datasets previously used in [22], as well as a new set of experimental results in a completely different scenario, involving real influence spreads from Twitter, for both GT-UCB and Fat-GT-UCB.

To the best of our knowledge, our approach is the first to show that efficient and effective influence maximization can be done in a highly uncertain or under-specified social environment, along with formal guarantees on the achieved spread.
The goal of the online influencer marketing with persistence is to successively select (or activate) a number of seed nodes, in order to reach (or spread to) as many other nodes as possible. In this section, we formally define this problem.
Given a graph G = (V, E), the traditional problem of influence maximization is to select a set of seed nodes I ⊆ V, under a cardinality constraint |I| = L, such that the expected spread – that is, the number of activated nodes – of an influence cascade starting from I is maximized. Formally, denoting by the random variable S(I) the spread initiated by the seed set I, influence maximization aims to solve the following optimization problem:

arg max_{I ⊆ V, |I| = L} E[ |S(I)| ].
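For concreteness, the expected spread E[|S(I)|] under a specific model such as IC has no simple closed form and is typically estimated by Monte Carlo simulation. The sketch below is a minimal, hedged illustration; the toy graph and its probabilities are hypothetical, not taken from the paper.

```python
import random

def ic_cascade(graph, seeds, rng):
    """One Independent Cascade simulation; graph maps node -> list of (neighbor, prob)."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def expected_spread(graph, seeds, n_sims=5000, seed=0):
    """Monte Carlo estimate of E[|S(I)|] under the IC model."""
    rng = random.Random(seed)
    return sum(len(ic_cascade(graph, seeds, rng)) for _ in range(n_sims)) / n_sims

# Hypothetical toy graph: node 0 influences 1 and 2 w.p. 0.5; node 1 influences 3 w.p. 0.2.
graph = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.2)]}
est = expected_spread(graph, {0})  # exact value is 1 + 0.5 + 0.5 + 0.5*0.2 = 2.1
```

Such simulation-based estimation is exactly what makes full influence maximization expensive at scale, which motivates the model-free approach below.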
Figure 1: Three influencers with associated activation probabilities p_k(u).

As mentioned before, a plethora of algorithms have been proposed to solve the influence maximization problem, under specific diffusion models. These algorithms can be viewed as full-information and offline approaches: they choose all the seeds at once, in one step, and they have the complete diffusion configuration, i.e., the graph topology and the influence probabilities. In the online case, during a sequence of N (called hereafter the budget) consecutive trials, L seed nodes are selected at each trial, and feedback on the achieved spread from these seeds is collected. The short timespan of campaigns makes parameter estimation very challenging within small horizons. In other cases, the topology – or even the existence – of a graph is too strong an assumption. In contrast to [24], we do not try to estimate edge probabilities in some graph, but, instead, we assume the existence of a known set of spread seed candidates – in the following referred to as the influencers – who are the only access to the medium of diffusion. Formally, we let [K] := {1, . . . , K} be a set of influencers up for selection; each influencer is connected to an unknown and potentially large base (the influencer's support) of basic nodes, each with an unknown activation probability. For illustration, we give in Figure 1 an example of this setting, with 3 influencers connected to 4, 5, and 4 basic nodes, respectively.

Now, the problem boils down to estimating the value of the K influencers, which is typically much smaller than the number of parameters of the diffusion model. The medium over which diffusion operates may be a diffusion graph but we make no assumption on that, meaning that the diffusion may also happen in a completely unknown environment.
Finally, note that by choosing K = |V| influencers, the classic influence maximization problem can be seen as a special instance of our setting.

We complete the formal setting by assuming the existence of K sets A_k ⊆ V of basic nodes such that each influencer k ∈ [K] is connected to each node in A_k. We denote by p_k(u) the probability for influencer k to activate the child node u ∈ A_k. In this context, the diffusion process can be abstracted as follows.

Definition 1 (Influence process). When an influencer k ∈ [K] is selected, each basic node u ∈ A_k is sampled for activation, according to its probability p_k(u). The feedback for k's selection consists of all the activated nodes, while the associated reward consists only of the newly activated ones.

Remark
Limiting the influence maximization method to working with a small subset of the node base may allow to accurately estimate their value more rapidly, even in a highly uncertain environment; hence the algorithmic interest. At the same time, this is directly motivated by marketing scenarios involving marketers who may not have knowledge of the entire diffusion graph, only having access to a few influential people who can diffuse information (the influencers in our setting), or may simply prefer such a two-step flow of diffusion for various reasons, such as establishing credibility. Moreover, despite the fact that we model the social reach of every influencer by 1-hop links to the to-be-influenced nodes, these edges are just an abstraction of the activation probability, and may represent in reality longer paths in an underlying unknown real influence graph G. We are now ready to define the online influencer marketing with persistence task.
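The sampling process of Definition 1 can be sketched in a few lines; the instance below (supports and uniform probabilities, loosely echoing Figure 1) is hypothetical.

```python
import random

def play_influencer(k, supports, probs, already_active, rng):
    """Influence process of Definition 1: each basic node u in A_k is sampled
    independently with probability p_k(u). Feedback = all activated nodes;
    reward = only the newly activated ones (persistence)."""
    feedback = {u for u in supports[k] if rng.random() < probs[k][u]}
    reward = feedback - already_active
    return feedback, reward

# Hypothetical instance with 3 influencers over overlapping supports, as in Figure 1.
supports = {1: [1, 2, 3, 4], 2: [4, 5, 6, 7, 8], 3: [8, 9, 10, 11]}
probs = {k: {u: 0.3 for u in us} for k, us in supports.items()}
rng = random.Random(7)
feedback, reward = play_influencer(1, supports, probs, set(), rng)
# With no previously active nodes, the reward coincides with the feedback.
```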
Problem 1 (OIMP). Given a set of influencers [K] := {1, . . . , K}, a budget of N trials, and a number 1 ≤ L ≤ K of influencers to be activated at each trial, the objective of the online influencer marketing with persistence (OIMP) is to solve the following optimization problem:

arg max_{I_n ⊆ [K], |I_n| = L, ∀ 1 ≤ n ≤ N} E[ | ∪_{1 ≤ n ≤ N} S(I_n) | ].

As noticed in [24], the offline influence maximization can be seen as a special instance of the online one, where the budget is N = 1. Note that, in contrast to persistence-free online influence maximization – considered, e.g., in [38, 42] – the performance criterion used in OIMP displays the so-called diminishing returns property: the expected number of nodes activated by successive selections of a given seed is decreasing, due to the fact that nodes that have already been activated are discounted. We refer to the expected number of nodes remaining to be activated as the remaining potential of a seed. The diminishing returns property implies that there is no static best set of seeds to be selected, but that the algorithm must follow an adaptive policy, which can detect that the remaining potential of a seed is small and switch to another seed that has been less exploited.

Our solution to this problem has to overcome challenges on two fronts: (1) it needs to estimate the potential of nodes at each round, without knowing the diffusion model nor the activation probabilities, and (2) it needs to identify the currently best influencers, according to their estimated potentials. Other approaches for the online influence maximization problem rely on estimating diffusion parameters [24, 38, 42] – generally, a distribution over the influence probability of each edge in the graph.
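The OIMP objective and its diminishing returns property can be illustrated with a small deterministic sketch (the spreads below are made up for the example):

```python
def oimp_objective(spreads):
    """Objective of Problem 1: distinct nodes activated over the whole campaign."""
    return len(set().union(*spreads))

def marginal_rewards(spreads):
    """Per-trial rewards under persistence: repeated activations are discounted."""
    activated, gains = set(), []
    for s in spreads:
        gains.append(len(s - activated))
        activated |= s
    return gains

spreads = [{1, 2, 3}, {2, 4}, {2, 3}]
print(oimp_objective(spreads))    # 4 distinct nodes, despite 7 activations in total
print(marginal_rewards(spreads))  # [3, 1, 0]: diminishing returns across trials
```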
However, the assumption that one can accurately estimate the diffusion parameters – and notably the diffusion probabilities – may be overly ambitious, especially in cases where the number of allowed trials (the budget) is rather limited. A limited trial setting is arguably more in line with real-world campaigns: take as example political or marketing campaigns, which only last for a few weeks.

In our approach, we work with parameters on nodes, instead of edges. More specifically, these parameters represent the potentials of remaining spreads from each of the influencer nodes. We stress that this potential can evolve as the campaign proceeds. In this way, we can go around the dependencies on specific diffusion models and, furthermore, we can remove entirely the dependency on a detailed graph topology.

In this section, we describe our UCB-like algorithm, which relies on the Good-Turing estimator to sequentially select the seeds to activate at each round, from the available influencers.

3.1 Remaining potential and Good-Turing estimator
A good algorithm for OIMP should aim at selecting the influencer k with the largest potential for influencing its children A_k. However, the true potential value of an influencer is a priori unknown to the decision maker. In the following, we index trials by t when referring to the time of the algorithm, and we index trials by n when referring to the number of selections of the influencer. For example, the t-th spread initiated by the algorithm is noted S(t), whereas the n-th spread of influencer k is noted S_{k,n}.

Definition 2 (Remaining potential R_k(t)). Consider an influencer k ∈ [K] connected to A_k basic nodes. Let S(1), . . . , S(t) be the sets of nodes that were activated during the first t trials by the seeded influencers. The remaining potential R_k(t) is the expected number of new nodes that would be activated upon starting the (t+1)-th cascade from k:

R_k(t) := Σ_{u ∈ A_k} 1{ u ∉ ∪_{i=1}^t S(i) } p_k(u),

where 1{·} denotes the indicator function.

Definition 2 provides a formal way to obtain the remaining potential of an influencer k at a given time. The optimal policy would simply select the influencer with the largest remaining potential at each time step. The difficulty is, however, that the probabilities p_k(u) are unknown. Hence, we have to design a remaining potential estimator R̂_k(t) instead. It is important to stress that the remaining potential is a random quantity, because of the dependency on the spreads S(1), . . . , S(t). Furthermore, due to the diminishing returns property, the sequence (S_{k,n})_{n≥1} is stochastically decreasing.

Following ideas from [17, 8], we now introduce a version of the Good-Turing statistic, tailored to our problem of rapidly estimating the remaining potential. Denoting by n_k(t) the number of times influencer k has been selected after t trials, we let S_{k,1}, . . . , S_{k,n_k(t)} be the n_k(t) cascades sampled independently from influencer k.
We denote by U_k(u, t) the binary function whose value is 1 if node u has been activated exactly once by influencer k – such occurrences are called hapaxes in linguistics – and by Z_k(u, t) the binary function whose value is 1 if node u has never been activated by influencer k. The idea of the Good-Turing estimator is to estimate the remaining potential as the proportion of hapaxes in the n_k(t) sampled cascades, as follows:

R̂_k(t) := (1 / n_k(t)) Σ_{u ∈ A_k} U_k(u, t) Π_{l ≠ k} Z_l(u, t).

Albeit simple, this estimator turns out to be quite effective in practice. If an influencer is connected to a combination of both nodes having high activation probabilities and nodes having low activation probabilities, then successive traces sampled from this influencer will result in multiple activations of the high-probability nodes and few of the low-probability ones. Hence, after observing a few spreads, the influencer's potential will be low, a fact that will be captured by the low proportion of hapaxes. In contrast, estimators that try to estimate each activation probability independently will require a much larger number of trials to properly estimate the influencer's potential.

To verify this assumption in reality, we conducted an analysis of the empirical activation probabilities from a Twitter dataset. Specifically, we used a collection of tweets and re-tweets gathered via crawling in August 2012. For each original tweet, we find all corresponding retweets, and, for each user, we compute the empirical probability of a retweet occurring – this, in our case, is a proxy measure for influence probability. Specifically, for every user v "influenced" by u, i.e., v retweeted at least one original tweet from u, we compute the estimated diffusion probability: p_{u,v} = |u's tweets retweeted by v| / |tweets by u|. In Fig. 2 (left), we show the survival function of the resulting empirical probabilities in a log-log plot. We can see that most probabilities are small – the 9th decile has value 0.

Figure 2: (left) Twitter empirical retweet probabilities. (right) Sample of 50 empirical retweet probabilities.
While bearing similarities with the traditional missing mass concept, we highlight one fundamental difference between the remaining potential and the traditional missing mass studied in [8], which impacts both the algorithmic solution and the analysis. Since at each step, after selecting an influencer, every node connected to that influencer is sampled, the algorithm receives a larger feedback than in [8], whose feedback is in {0, 1}; moreover, the variables (U_k(u, t))_{u ∈ A_k} are independent. Interestingly, the quantity λ_k := Σ_{u ∈ A_k} p_k(u), which corresponds to the expected number of basic nodes an influencer activates or re-activates in a cascade, will prove to be a crucial ingredient for our problem.
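The hapax-proportion estimate can be put side by side with the oracle remaining potential of Definition 2 in simulation. The sketch below treats a single influencer (so the Π_l Z_l cross-terms vanish) and uses hypothetical activation probabilities: a few strong links and many weak ones, as in the Twitter data above.

```python
import random

def remaining_potential(probs, activated):
    """Oracle R(t) of Definition 2: expected number of not-yet-activated nodes
    reached by one more cascade. Requires the true p(u), so it is a benchmark only."""
    return sum(p for u, p in probs.items() if u not in activated)

def good_turing(cascades):
    """R̂_n: proportion of hapaxes (nodes activated exactly once) among n cascades."""
    counts = {}
    for s in cascades:
        for u in s:
            counts[u] = counts.get(u, 0) + 1
    return sum(1 for c in counts.values() if c == 1) / len(cascades)

rng = random.Random(0)
# Hypothetical influencer: 5 high-probability children, 45 low-probability ones.
probs = {u: (0.8 if u < 5 else 0.05) for u in range(50)}
cascades = [{u for u, p in probs.items() if rng.random() < p} for _ in range(10)]
activated = set().union(*cascades)
print(good_turing(cascades), remaining_potential(probs, activated))
```

After a few cascades, the high-probability nodes are no longer hapaxes, so the estimate drops quickly, mirroring the intuition given above.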
Figure 3: (left) Influence spread against number of rounds. (right) Bayesian estimator against Good-Turing estimator.
Following principles from the bandit literature, the
GT-UCB algorithm relies on optimism in the face of uncertainty. At each step (trial) t, the algorithm selects the highest upper-confidence bound on the remaining potential – denoted by b_k(t) – and activates (plays) the corresponding influencer k. This algorithm achieves robustness against the stochastic nature of the cascades, by ensuring that influencers who "underperformed" with respect to their potential in previous trials may still be selected later on. Consequently, GT-UCB aims to maintain a degree of exploration of influencers, in addition to the exploitation of the best influencers as per the feedback gathered so far.
ALGORITHM 1: GT-UCB (L = 1)
Require: Set of influencers [K], time budget N
1:  Initialization: play each influencer k ∈ [K] once, observe the spread S_{k,1}, set n_k = 1
2:  For each k ∈ [K]: update the reward W = W ∪ S_{k,1}
3:  for t = K + 1, . . . , N do
4:    Compute b_k(t) for every influencer k
5:    Choose k(t) = arg max_{k ∈ [K]} b_k(t)
6:    Play influencer k(t) and observe spread S(t)
7:    Update cumulative reward: W = W ∪ S(t)
8:    Update statistics of influencer k(t): n_{k(t)}(t + 1) = n_{k(t)}(t) + 1 and S_{k(t), n_{k(t)}} = S(t)
9:  end for
10: return W

Algorithm 1 presents the main components of
GT-UCB for the case L = 1, that is, when a single influencer is chosen at each step. The algorithm starts by activating each influencer k ∈ [K] once, in order to initialize its Good-Turing estimator. The main loop of GT-UCB occurs at lines 3-9. Let S(t) be the observed spread at trial t, and let S_{k,s} be the result of the s-th diffusion initiated at influencer k. At every step t > K, we recompute for each influencer k ∈ [K] its index b_k(t), representing the upper confidence bound on the expected reward in the next trial. The computation of this index uses the previous samples S_{k,1}, . . . , S_{k,n_k(t)} and the number of times each influencer k has been activated up to trial t, n_k(t). Based on the result of Theorem 1 – whose statement and proof are delayed to Section 4 – the upper confidence bound is set as:

b_k(t) = R̂_k(t) + (1 + √2) √( λ̂_k(t) log(4t) / n_k(t) ) + log(4t) / (3 n_k(t)),   (1)

where R̂_k(t) is the Good-Turing estimator and λ̂_k(t) := (1 / n_k(t)) Σ_{s=1}^{n_k(t)} |S_{k,s}| is an estimator for the expected spread from influencer k. Then, in line 5, GT-UCB selects the influencer k(t) with the largest index, and initiates a cascade from this node. The feedback S(t) is observed and is used to update the cumulative reward set W. We stress again that S(t) provides only the IDs of the nodes that were activated, with no information on how this diffusion happened in the hidden diffusion medium. Finally, the statistics associated to the chosen influencer k(t) are updated.

Extension to L > 1. Algorithm 1 can be easily adapted to select
L > 1 influencers, by selecting the L largest indices. Note that k(t) then becomes a set of L influencers. A diffusion is initiated from the associated nodes and, at termination, all activations are observed. Similarly to [37], the algorithm requires feedback to include the influencer responsible for the activation of each node, in order to update the corresponding statistics accordingly. In this section, we justify the upper confidence bound used by
GT-UCB in Eq. (1) and provide a theoretical analysis of the algorithm.
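As a concrete reference point for the quantities analyzed below, the loop of Algorithm 1 with the index of Eq. (1) can be sketched in Python. The toy instance and the `simulate` black box are hypothetical, and for simplicity the hapax counts ignore cross-influencer activations (the Π_l Z_l terms of the estimator).

```python
import math
import random

def gt_ucb(K, simulate, N):
    """Sketch of Algorithm 1 (L = 1): K influencers, budget N; simulate(k)
    returns the set of nodes activated by one cascade seeded at influencer k."""
    cascades = {k: [] for k in range(K)}  # observed spreads S_{k,1..n_k}
    W = set()                             # cumulative reward: distinct activated nodes
    for k in range(K):                    # initialization: play each influencer once
        s = simulate(k)
        cascades[k].append(s)
        W |= s

    def index(k, t):
        n_k = len(cascades[k])
        counts = {}
        for s in cascades[k]:
            for u in s:
                counts[u] = counts.get(u, 0) + 1
        r_hat = sum(1 for c in counts.values() if c == 1) / n_k  # Good-Turing estimate
        lam_hat = sum(len(s) for s in cascades[k]) / n_k         # mean spread size
        return (r_hat
                + (1 + math.sqrt(2)) * math.sqrt(lam_hat * math.log(4 * t) / n_k)
                + math.log(4 * t) / (3 * n_k))                   # index of Eq. (1)

    for t in range(K + 1, N + 1):
        k_t = max(range(K), key=lambda k: index(k, t))  # line 5: largest index
        s = simulate(k_t)
        cascades[k_t].append(s)
        W |= s
    return W

# Hypothetical toy instance: influencer 0 reaches nodes 0-19, influencer 1 nodes 15-24.
rng = random.Random(3)
supports = [list(range(0, 20)), list(range(15, 25))]
simulate = lambda k: {u for u in supports[k] if rng.random() < 0.4}
W = gt_ucb(2, simulate, N=30)
```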
In the following, to simplify the analysis and to allow for a comparison with the oracle strategy, we assume that the influencers have non-intersecting supports. This means that each influencer's remaining potential and corresponding Good-Turing estimator do not depend on the other influencers. Hence, for notational convenience, we also omit the subscript denoting the influencer k. After selecting the influencer n times, the Good-Turing estimator is simply written R̂_n = (1/n) Σ_{u ∈ A} U_n(u). We note that the non-intersecting assumption is for theoretical purposes only – our experiments are done with influencers that can have intersecting supports.

The classic Good-Turing estimator is known to be slightly biased (see Theorem 1 in [28] for example). We show in Lemma 1 that our remaining potential estimator adds an additional factor λ = Σ_{u ∈ A} p(u) to this bias:

Lemma 1.
The bias of the remaining potential estimator satisfies

E[R_n] − E[R̂_n] ∈ [ −λ/n, 0 ].

Proof.

E[R_n] − E[R̂_n] = Σ_{u ∈ A} [ p(u)(1 − p(u))^n − p(u)(1 − p(u))^{n−1} ]
                = −(1/n) Σ_{u ∈ A} p(u) × n p(u)(1 − p(u))^{n−1}
                = −(1/n) E[ Σ_{u ∈ A} p(u) U_n(u) ]
                ∈ [ −Σ_{u ∈ A} p(u) / n, 0 ].

Since λ is typically very small compared to |A|, in expectation, the estimation should be relatively accurate. However, in order to understand what may happen in the worst case, we need to characterize the deviation of the Good-Turing estimator:

Theorem 1.
With probability at least 1 − δ, for λ = Σ_{u ∈ A} p(u) and

β_n := (1 + √2) √( λ log(4/δ) / n ) + log(4/δ) / (3n),

the following holds:

−β_n − λ/n ≤ R_n − R̂_n ≤ β_n.

Note that the additional term appearing in the left deviation corresponds to the bias of our estimator, which leads to a non-symmetrical interval.
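Both quantities can be checked numerically. The bias of Lemma 1 has the closed form E[R_n] − E[R̂_n] = −Σ_u p(u)²(1 − p(u))^{n−1} (read off from the proof above), and β_n shrinks at rate roughly √(λ/n). The probabilities below are hypothetical.

```python
import math

def exact_bias(probs, n):
    """E[R_n] - E[R̂_n], using E[U_n(u)] = n p(u) (1 - p(u))^(n-1)."""
    e_R = sum(p * (1 - p) ** n for p in probs)
    e_Rhat = sum(p * (1 - p) ** (n - 1) for p in probs)
    return e_R - e_Rhat

def beta(lam, n, delta):
    """Half-width β_n of Theorem 1's confidence interval."""
    L = math.log(4 / delta)
    return (1 + math.sqrt(2)) * math.sqrt(lam * L / n) + L / (3 * n)

probs = [0.6, 0.3, 0.1, 0.05]  # hypothetical activation probabilities
lam = sum(probs)
for n in (1, 5, 20):
    assert -lam / n <= exact_bias(probs, n) <= 0  # Lemma 1's bracket holds
print([round(beta(lam, n, 0.05), 3) for n in (1, 10, 100)])  # shrinking interval
```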
Proof.
We prove the confidence interval in three steps: (1) Good-Turing estimator deviation, (2) remaining potential deviation, (3) combination of these two inequalities to obtain the final confidence interval. Here, the child nodes are assumed to be sampled independently, which is a simplification compared to the classic missing mass concentration results that rely on negatively associated samples [28, 27]. On the other hand, since we may activate several nodes at once, we need original concentration arguments to control the increments of both R̂_n and R_n.

(1) Good-Turing deviations. Let X_n(u) := U_n(u)/n. We have that

v := Σ_{u ∈ A} E[X_n(u)²] = (1/n²) Σ_{u ∈ A} E[U_n(u)] ≤ λ/n.

Moreover, clearly the following holds: X_n(u) ≤ 1/n. Applying Bennett's inequality (Theorems 2.9, 2.10 in [5]) to the independent random variables {X_n(u)}_{u ∈ A} yields

P( R̂_n − E[R̂_n] ≥ √( 2λ log(1/δ) / n ) + log(1/δ) / (3n) ) ≤ δ.   (2)

The same inequality can be derived for left deviations.

(2) Remaining potential deviations. Remember that Z_n(u) denotes the indicator equal to 1 if u has never been activated up to trial n. We can rewrite the remaining potential as R_n = Σ_{u ∈ A} Z_n(u) p(u). Let Y_n(u) = p(u)( Z_n(u) − E[Z_n(u)] ) and q(u) = P( Z_n(u) = 1 ) = (1 − p(u))^n. For some t >
0, we have next that

P( R_n − E[R_n] ≥ ε ) ≤ e^{−tε} Π_{u∈A} E[ e^{t Y_n(u)} ]
= e^{−tε} Π_{u∈A} ( q(u) e^{t p(u)(1−q(u))} + (1 − q(u)) e^{−t p(u) q(u)} )
≤ e^{−tε} Π_{u∈A} exp( p(u) t² / (4n) )
= exp( −tε + λ t² / (4n) ).

The first inequality is standard in exponential concentration bounds and relies on Markov's inequality. The second inequality follows from [4] (Lemma 3.5). Then, choosing t = 2nε/λ, we obtain

P( R_n − E[R_n] ≥ √( λ log(1/δ) / n ) ) ≤ δ.   (3)

We can proceed similarly to obtain the left deviation.

(3) Putting it all together. We combine Lemma 1 with Eqs. (2) and (3) to obtain the final result. Note that δ is replaced by δ/4 to ensure that the left and right bounds hold simultaneously for both the Good-Turing estimator and the remaining potential.

We now provide an analysis of the waiting time (defined below) of
GT-UCB, by comparing it to the waiting time of an oracle policy, following ideas from [8]. Let R_k(t) be the remaining potential of influencer k at trial number t. This differs from R_{k,n}, which is the remaining potential of influencer k once it has been played n times.

Definition 3 (Waiting time). Let λ_k = Σ_{u∈A_k} p(u) denote the expected number of activations obtained by the first call to influencer k. For α ∈ (0, 1], the waiting time T_UCB(α) of GT-UCB is the round at which the remaining potential of every influencer k has become smaller than αλ_k. Formally,

T_UCB(α) := min{ t : ∀k ∈ [K], R_k(t) ≤ αλ_k }.

The above definition can be applied to any strategy for influencer selection and, in particular, to an oracle one that knows beforehand the targeted value of α, the spreads (S_{k,s})_{k∈[K], 1≤s≤t} sampled up to the current time, and the individual activation probabilities p_k(u), u ∈ A_k. A policy having access to all this information performs the fewest possible activations of each influencer. We denote by T*(α) the waiting time of the oracle policy. We are now ready to state the main theoretical property of the GT-UCB algorithm.
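As a toy illustration of Definition 3 (ours; the remaining-potential trajectory below is fabricated, not computed from a diffusion model), the waiting time of any recorded run can be computed as:

```python
def waiting_time(remaining, lambdas, alpha):
    """Waiting time T(alpha): first trial t at which every influencer k
    satisfies R_k(t) <= alpha * lambda_k; `remaining[t][k]` is R_k(t)
    recorded along one run of some selection policy."""
    for t, r_t in enumerate(remaining):
        if all(r_t[k] <= alpha * lambdas[k] for k in range(len(lambdas))):
            return t
    return None  # threshold never reached within the recorded horizon

lambdas = [10.0, 4.0]
remaining = [
    [10.0, 4.0],   # t = 0: nothing played yet
    [6.0, 4.0],    # t = 1: influencer 0 was played
    [3.0, 4.0],    # t = 2: influencer 0 again
    [3.0, 1.5],    # t = 3: influencer 1
    [1.8, 1.5],    # t = 4
]
t_half = waiting_time(remaining, lambdas, 0.5)
```

On this trajectory both influencers first drop below α·λ_k at t = 3 for α = 0.5; the same function applied to an oracle run would give T*(α).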
Theorem 2 (Waiting time). Let λ_min := min_{k∈[K]} λ_k and λ_max := max_{k∈[K]} λ_k. Assuming that λ_min ≥ 4, for any α ∈ [4/λ_min, 1], if we define τ* := T*(α − 4/λ_min), then with probability at least 1 − 2K/λ_max the following holds:

T_UCB(α) ≤ τ* + Kλ_max log(4τ* + 11Kλ_max) + 2K.

The proof of this result is given in Appendix B. Unsurprisingly, Theorem 2 says that
GT-UCB must perform slightly more activations of the influencers than the oracle policy. With high probability – assuming that the best influencer has an initial remaining potential that is much larger than the number of influencers – the waiting time of GT-UCB is comparable to T*(α′), up to a term that is only logarithmic in the waiting time of the oracle strategy. Here, α′ is smaller than α – hence T*(α′) is larger than T*(α) – by an offset that is inversely proportional to the initial remaining potential of the worst influencer. This essentially says that, if we deal with large graphs, and if the influencers trigger reasonably large spreads, our algorithm is competitive with the oracle.

In our study of the OIMP problem so far, a key assumption has been that the influencers have a constant tendency to activate their followers. This hypothesis may not be verified in certain situations, in which influencers promote products that do not align with their image (misalignment) or persist in promoting the same services (weariness). In such cases, they can expect their influence to diminish [34, 23]. To cope with such cases of weariness, we propose in this section an extension to
GT-UCB that incorporates the concept of influencer fatigue.

In terms of bandits, the idea of our extension is similar in spirit to that of Levine et al. [25]: a new type of bandit – called rotting bandits – where each arm's value decays as a function of the number of times it has been selected. We also mention the work of Louëdec et al. [26], in which the authors propose to take into account the gradual obsolescence of the items to be recommended, while allowing new items to be added to the pool of candidates. In this latter work, an item's value is modeled by a decreasing function of the number of steps elapsed since the item was added to the pool, whereas in our work – and in that of [25] – the value is a function of the number of times the item has been selected. The OIMP problem with influencer fatigue can be defined as follows.
Problem 2 (OIMP with influencer fatigue). Given a set of influencers [K], a budget of N trials, and a number 1 ≤ L ≤ K of influencers to be activated at each trial, the objective of online influencer marketing with persistence (OIMP) and with influencer fatigue is to solve the following optimization problem:

argmax_{I_n ⊆ [K], |I_n| = L, ∀ 1 ≤ n ≤ N}  E[ | ∪_{1 ≤ n ≤ N} S(I_n) | ],

knowing that, at the s-th selection of an influencer k ∈ [K], the probability that k activates some basic node u is p_s(u) = γ(s) p(u), for γ : N* → (0, 1] a known non-increasing function and p(u) ∈ [0, 1].

Our initial OIMP formulation can be seen as a special instance of the one with influencer fatigue, where the non-increasing function γ – referred to as the weariness function in the following – is the constant function n ↦
1. We follow the same strategy to solve this new OIMP variant, by estimating the remaining potential of a given influencer through an adaptation of the Good-Turing estimator. What makes the problem more complex in this setting is the fact that our hapax statistics must now take into account the round at which they occurred.

The Fat-GT-UCB algorithm
As we did previously, to simplify the analysis, we assume that the influencers have non-intersecting supports. We redefine the remaining potential in the setting with influencer fatigue as

R_k(t) := Σ_{u∈A_k} 1{u never activated} γ(n_k(t) + 1) p(u),

where p(u) is the probability that the influencer activates node u, independently of the number of spreads initiated by the influencer. Again, the remaining potential is equal to the expected number of additional conversions upon starting the (t+1)-th cascade from k. The Good-Turing estimator adapted to the setting with influencer fatigue is defined as follows:

R̂_k(t) = (1/n_k(t)) Σ_{u∈A_k} U^γ_{k,n_k(t)}(u),

where

U^γ_{k,n}(u) := Σ_{1≤i≤n} 1{ X_{k,1}(u) = … = X_{k,i−1}(u) = X_{k,i+1}(u) = … = X_{k,n}(u) = 0, X_{k,i}(u) = 1 } · γ(n+1)/γ(i).

In short, if i is the round at which a hapax has been activated, we reweight it by the factor γ(n+1)/γ(i), since we are interested in its contribution at the (n+1)-th spread initiated by the influencer. We provide a formal justification for this estimator by computing its bias in Appendix C. Following the same strategy and principles from the bandit literature, the Fat-GT-UCB adaptation of
GT-UCB selects at each step (trial) t the influencer with the highest upper confidence bound on the remaining potential – denoted by b_k(t) – and activates (plays) it. The upper confidence bound can now be set as follows (the full details can also be found in Appendix C – see Theorem 4):

b_k(t) = R̂_k(t) + (1 + √2) √( λ̂_k(t) log(4t) / n_k(t) ) + log(4t) / (3 n_k(t)),   (4)

where R̂_k(t) is the Good-Turing estimator and

λ̂_k(t) := ( γ(n_k(t) + 1) / n_k(t) ) Σ_{s=1}^{n_k(t)} |S_{k,s}| / γ(s)

is an estimator of the expected spread from influencer k.

We conducted experiments on two types of datasets: (i) two graphs widely used in the influence maximization literature, and (ii) a dataset crawled from Twitter, consisting of tweets posted during August 2012. All methods are implemented in C++ and the simulations are run on an Ubuntu 16.04 machine with a 20-core Intel Xeon 2.4GHz CPU and 98GB of RAM. The code is available at https://github.com/smaniu/oim.

GT-UCB does not make any assumptions about the topology of the nodes under the scope of the influencers. Indeed, in many settings it may be more natural to assume that the set of influencers is given, while the graph G and its dynamics remain unknown. In other settings, we may start from an existing social network G, in which case we need to extract a set of K representative influencers from it. Ideally, we should choose influencers whose "scopes of influence" intersect little, to avoid useless seed selections. While this may be interpreted and performed differently from one application to another, we discuss next some of the most natural heuristics for selecting influencers, which we use in our experiments.

MaxDegree.
This method selects the K nodes with the highest out-degrees in G. Note that by this criterion we may select influencers with overlapping influence scopes.

Greedy MaxCover.
This strategy follows the well-known greedy approximation algorithm for selecting a cover of the graph G. Specifically, the algorithm executes the following steps K times:

1. Select the node with the highest out-degree.
2. Remove all out-neighbors of the selected node.

To limit intersections among influencer scopes even more, nodes reachable in more than one hop may be removed at step (2).

DivRank [29].
DivRank is a PageRank-like method relying on reinforced random walks, with the goal of producing diverse high-ranking nodes while maintaining the rich-gets-richer paradigm. We adapted the original DivRank procedure by inverting the edge directions; in doing so, we obtain influential nodes instead of prestigious ones. When selecting the K highest-scoring nodes as influencers, diversity is naturally induced by the reinforcement of the random walks. This ensures that the influencers are fairly scattered in the graph and should have limited impact on each other.

Influence maximization approximation algorithms.
The fourth method we tested assigns a propagation probability uniformly at random to each edge of G, assuming the IC model. Then, a state-of-the-art influence maximization algorithm – PMC in our experiments – is executed on G to obtain the set of K influencers having the highest potential spread.

Similarly to [24], we tested our algorithm on HepPh and DBLP, two publicly available collaboration networks. HepPh is a citation graph, where a directed edge is established when an author has cited at least one paper of another author. In DBLP, undirected edges connect authors who have collaborated on at least one indexed paper. The datasets are summarized in Table 1. We emphasize that we kept the datasets relatively small to allow for comparison with computation-heavy baselines, even though
GT-UCB easily scales to large data, as will be illustrated in Section 6.3.

Table 1: Summary of the datasets.

    Dataset    Nodes    Edges
    HepPh      34.5K    422K
    DBLP       317K     1.05M
    Twitter    (retweet network reconstructed from the August 2012 logs; see Section 6.3)

Diffusion models.
In the work closest to ours, Lei et al. [24] compared their solution on the Weighted Cascade (WC) instance of IC, where the influence probabilities on incoming edges sum up to 1. More precisely, every edge (u, v) has weight 1/d_v, where d_v is the in-degree of node v. In this experimental study, and to illustrate that our approach is diffusion-independent, we added two other diffusion scenarios to the set of experiments. First, we included the tri-valency model (TV), which randomly associates a probability from {0.1, 0.01, 0.001} with every edge and follows the IC propagation model. We also conducted experiments under the Linear Threshold (LT) model, where the edge probabilities are set as in the WC case and the node thresholds are sampled uniformly from [0, 1].

Figure 4: Impact of K and the influencer extraction criterion on influence spread: (a) HepPh (WC – impact of K); (b) HepPh (WC – influencer extraction); (c) DBLP (WC – impact of K); (d) DBLP (WC – influencer extraction).

Baselines.
We compare GT-UCB to several baselines. Random chooses a random influencer at each round. MaxDegree selects the node with the largest degree at each step, where the degree does not count previously activated nodes. Finally, EG corresponds to the confidence-bound explore-exploit method with exponentiated gradient update from [24]; it is the state-of-the-art method for the OIMP problem (code provided by the authors). We use this last baseline on WC- and TV-weighted graphs and tune its parameters in accordance with the results of their experiments: maximum likelihood estimation is adopted for the graph updates, and the edge priors are set to Beta(1, ·). Note that EG learns parameters for the IC model, and hence is not applicable to LT. These baselines are compared to an Oracle that knows beforehand the diffusion model together with its probabilities. At each round, it runs an approximated influence maximization algorithm – PMC for IC propagation, SSA for LT. Note that previously activated nodes are not counted when estimating the value of a node with PMC or SSA, thus making Oracle an adaptive strategy.

All experiments are done by fixing the trial horizon to N = 500, a setting that is in line with many real-world marketing campaigns, which are fairly short and do not aim to reach the entire population.

Choice of the influencers.
We show in Fig. 4b and 4d the impact of the influencer extraction criterion on HepPh and DBLP under the WC model. We can observe that the spread is only slightly affected by the extraction criterion, and that different datasets lead to different optimal criteria. On the HepPh network, DivRank clearly leads to larger influence spreads. On DBLP, however, the extraction method has little impact on the resulting spreads. We emphasize that on some other graph and model combinations we observed that other extraction routines can perform better than DivRank. In summary, we note that GT-UCB performs consistently as long as the method leads to influencers that are well spread over the graph. In the following, for each graph, we used DivRank as the influencer extraction criterion, in accordance with these observations.

In Fig. 4a and 4c, we measure the impact of the number of influencers K on the influence spread. We can observe that, on DBLP, a small number of influencers is sufficient to yield high-quality results. If too many influencers (relative to the budget) are selected (e.g., K = 200), the initialization step required by GT-UCB is too long relative to the full budget, and hence GT-UCB does not reach its optimal spread – some influencers still have a large remaining potential at the end. On the other hand, a larger number of influencers leads to greater influence spreads on HepPh: this network is relatively small (34.5K nodes), and thus half of the nodes are already activated after 400 trials. By having more influencers, we are able to access parts of the network that would not be accessible otherwise.
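For reference, the diffusion mechanics used in these simulations can be sketched as follows (our illustration on a toy edge list; the actual experiments rely on the PMC and SSA implementations):

```python
import random

def wc_weights(edges):
    """Weighted Cascade: edge (u, v) gets probability 1 / in_degree(v)."""
    indeg = {}
    for u, v in edges:
        indeg[v] = indeg.get(v, 0) + 1
    return {(u, v): 1.0 / indeg[v] for u, v in edges}

def tv_weights(edges, rng):
    """Tri-valency: each edge gets a probability drawn from {0.1, 0.01, 0.001}."""
    return {(u, v): rng.choice([0.1, 0.01, 0.001]) for u, v in edges}

def simulate_ic(seeds, out_edges, weights, rng):
    """One independent-cascade run: every newly activated node gets a single
    chance to activate each of its still-inactive out-neighbors."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v in out_edges.get(u, []):
            if v not in active and rng.random() < weights[(u, v)]:
                active.add(v)
                frontier.append(v)
    return active

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
w = wc_weights(edges)        # node 3 has in-degree 2, so w[(1, 3)] == 0.5
tv = tv_weights(edges, random.Random(42))
out_edges = {}
for u, v in edges:
    out_edges.setdefault(u, []).append(v)
spread = simulate_ic({1}, out_edges, w, random.Random(0))
```

Under WC the incoming weights of each node sum to 1 by construction, which is why TV and WC behave so differently in the plots above.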
Figure 5: DBLP (WC) – Execution time (in seconds, per trial) for Oracle, EG-CB, Random, MaxDegree, and GT-UCB.
GT-UCB vs. baselines. We evaluate the execution time of the different algorithms in Fig. 5. As expected, GT-UCB largely outperforms EG (and Oracle). These two baselines require the execution of an approximated influence maximization algorithm at each round. In line with [2], we observed that SSA has a prohibitive computational cost when incoming edge weights do not sum up to 1, which is the case with both WC and TV. Thus, both Oracle and EG run PMC in all our experiments with IC propagation. GT-UCB is several orders of magnitude faster: it concentrates most of its running time on extracting the influencers, while the statistic updates and UCB computations are negligible.

In Fig. 6, we show the growth of the spread for GT-UCB and the baselines. For each experiment, GT-UCB uses K = 50 if L = 1 and K = 100 if L = 10. First, we can see that MaxDegree is a strong baseline in many cases, especially for WC and LT. GT-UCB results in good-quality spreads across every combination of network and diffusion model. Interestingly, on the smaller graph HepPh, we observe an increase in the slope of the spread after initialization, particularly visible at t = 50 with WC and LT. This corresponds to the step at which GT-UCB starts to select the influencers maximizing b_k(t) in the main loop. It shows that our strategy adapts well to the previous activations and chooses good influencers at each iteration. Interestingly, Random performs surprisingly well in many cases,
Figure 6: Growth of spreads against the number of rounds: (a) HepPh (WC – L = 1); (b) DBLP (WC – L = 1); (c) DBLP (WC – L = 10); (d) HepPh (TV – L = 1); (e) DBLP (TV – L = 1); (f) DBLP (TV – L = 10); (g) HepPh (LT – L = 1); (h) DBLP (LT – L = 1); (i) DBLP (LT – L = 10).

especially under TV weight assignment. However, when certain influencers are significantly better than others, it cannot adapt to select the best influencer, unlike
GT-UCB. EG performs well on HepPh, especially under TV weight assignment. However, it fails to provide competitive cumulative spreads on DBLP. We believe that EG tries to estimate too many parameters for a horizon of T = 500: after reaching this time step, less than 10% of all nodes for WC, and 20% for TV, are activated. This implies that we have hardly any information regarding the majority of the edge probabilities, as most nodes are located in parts of the graph that have never been explored. We continue the experimental section with an evaluation of
GT-UCB on the Twitter data, introduced as a motivating example in Section 3. The interest of this experiment is to observe actual spreads, instead of simulated ones, over data that does not provide an explicit influence graph.
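Since no influence graph is given, the influencer supports must be reconstructed from the logs themselves. A minimal sketch (ours; the log format and user names are hypothetical):

```python
def build_supports(retweet_logs):
    """retweet_logs: list of (author, retweeter) pairs from the logs.
    Returns a map from each active user to the set of users who retweeted
    at least one of their tweets (their potential support)."""
    supports = {}
    for author, retweeter in retweet_logs:
        supports.setdefault(author, set()).add(retweeter)
    return supports

def greedy_set_cover(supports, k):
    """Pick k influencers greedily, by maximal marginal coverage of
    not-yet-covered users."""
    covered, chosen = set(), []
    candidates = dict(supports)
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda a: len(candidates[a] - covered))
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen

logs = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("c", "z")]
supports = build_supports(logs)
influencers = greedy_set_cover(supports, 2)
```

On this toy log, user "a" is picked first (covering x and y), and "b" second, adding z.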
Figure 7: Twitter spread against rounds: (left) L = 1, (right) L = 10.

From the retweeting logs, for each active user u – a user who posted more than 10 tweets – we select the users having retweeted at least one of u's tweets. By doing so, we obtain the set of potentially influenceable users associated with each active user. We then apply the greedy algorithm to select the users maximizing the corresponding set cover. These are the influencers given to GT-UCB and Random. MaxDegree is given the entire reconstructed network (described in Table 1), that is, the network connecting active users to re-tweeters.

To test realistic spreads, at each step, once an influencer is selected by GT-UCB, a random cascade initiated by that influencer is chosen from the logs and we record its spread. This provides realistic, model-free spread samples to the compared algorithms. Since Twitter only contains successful activations (re-tweets) and not the failed ones, we could not test against EG, which needs both kinds of feedback.

In Fig. 7, we show the growth of the diffusion spread of GT-UCB against MaxDegree and Random. Again, GT-UCB uses K = 50 if L = 1 and K = 100 if L = 10. We can see that GT-UCB outperforms the baselines, especially when a single node is selected at each round. We can also observe that MaxDegree performs surprisingly well in both experiments. We emphasize that it relies on knowledge of the entire network reconstructed from the retweeting logs, whereas GT-UCB is only given a small set of fixed influencers.
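The model-free feedback loop can be sketched as follows (ours; `logs` is a hypothetical mapping from influencers to their recorded cascades):

```python
import random

def replay_spread(influencer, logs, activated, rng):
    """Model-free feedback: draw one recorded cascade of the chosen influencer
    uniformly from the logs, and count only newly activated users."""
    cascade = rng.choice(logs[influencer])
    new_users = set(cascade) - activated
    activated |= new_users
    return new_users

logs = {"alice": [{"u1", "u2"}, {"u2", "u3"}], "bob": [{"u4"}]}
activated = set()
rng = random.Random(1)
first = replay_spread("alice", logs, activated, rng)
second = replay_spread("alice", logs, activated, rng)
```

By construction, users activated in an earlier round yield no further reward (persistence), so the successive returned sets are disjoint and the activated set only grows.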
We conclude the experimental section with a series of experiments on the Twitter data, this time taking into account influencer fatigue.

In a similar way to Section 6.3, we compute the set of potentially influenceable users (the support) associated with each active user – the set of all users who retweeted at least one tweet from the active user. We then choose 20 influencers as follows: we take the 5 best influencers, that is, the 5 active users with the largest supports; then the 51st to 55th best influencers; then the 501st to 505th best influencers; and finally the 5 worst influencers. By doing so, we obtain a set of 20 influencers with diverse profiles, roughly covering the possible influencing outcomes. Ideally, a good algorithm that takes into account influencer fatigue would focus on the 5 best influencers at the beginning, but would move to other influencers when the initially optimal ones start to lose influence due to fatigue.

We compare
Fat-GT-UCB to GT-UCB, which does not use the influencer fatigue function, and
Figure 8: Fat-GT-UCB vs. competitors on the Twitter logs with (left) γ1 and (right) γ2.

to the Random baseline. As in Section 6.3, when an algorithm selects an influencer, we choose a random spread from the logs (belonging to the selected influencer), and we now simulate the fatigue by removing every user in the spread with probability 1 − γ(n), where n is the number of times the influencer has already been played.

We show the results of this comparison in Fig. 8. We tested two different weariness functions, namely γ1(n) = 1/n and γ2(n) = 1/√n. We can see that, in both scenarios, Fat-GT-UCB performs the best, showing that our UCB-like approach can effectively handle the notion of influencer fatigue in the OIMP problem. Unsurprisingly, GT-UCB performs better with the weariness function γ2 than it does with γ1: γ2 has a lower diminishing impact, and thus the penalty of not incorporating fatigue is less problematic.

We have already discussed in Section 1 the main related studies in the area of influence maximization. For further details, we refer the interested reader to the recent survey in [2], which discusses the pros and cons of the best-known techniques for influence maximization. In particular, the authors highlight that the
Weighted Cascade (WC) instance of IC, where the weights associated with a node's incoming edges must sum to one, leads to poor performance for otherwise rather fast IC algorithms. They conclude that PMC [31] is the state-of-the-art method to efficiently solve the IC optimization problem, while TIM+ [36] and IMM [35] – later improved by [30] with SSA – are the best current algorithms for the WC and LT models.

Other methods have been devised to handle the prevalent uncertainty in diffusion media, e.g., by replacing edge probability scores with ranges thereof and solving an influence maximization problem whose robust outcome should provide some effectiveness guarantees w.r.t. all possible instantiations of the uncertain model [20, 9].

Methods for influence maximization that take into account more detailed information, such as topical categories, have been considered in the literature [11, 3, 39]. Interestingly, [32] experimentally validates the intuition that different kinds of information spread differently in social networks, by relying on two complementary properties, namely stickiness and persistence. The former can be seen as a measure of how viral the piece of information is, passing from one individual to the next. The latter can be seen as an indicator of the extent to which repeated exposures to that piece of information impact its adoption, and it was shown to characterize complex contagions, typical of controversial information (e.g., from politics).
We propose in this paper a diffusion-independent approach for online and adaptive influencer marketing, whose role is to maximize the number of activated nodes in an arbitrary environment, under the OIMP framework. We focus on scenarios motivated by influencer marketing, in which campaigns consist of multiple consecutive trials conveying the same piece of information, and which require, as their only interfaces with the real world, the identification of potential seeds (the influencers) and the spread feedback (i.e., the set of activated nodes) at each trial. Our method's online iterations are very fast, making it possible to scale to very large graphs, where other approaches become infeasible. The efficiency of GT-UCB comes from the fact that it only relies on an estimate of a single quantity for each influencer – its remaining potential. This novel approach is shown to be very competitive on influence maximization benchmark tasks and on influence spreads in Twitter. Finally, we extend our method to scenarios where, during a marketing campaign, the influencers may have a diminishing tendency to activate their user base.
Acknowledgments
This work was partially supported by the French research project ALICIA (grant ANR-13-CORD-0020).
References

[1] Duncan Brown and Nick Hayes, editors. Influencer Marketing. Butterworth-Heinemann, Oxford, 2008.
[2] A. Arora, S. Galhotra, and S. Ranu. Debunking the myths of influence maximization: An in-depth benchmarking study. In SIGMOD. ACM, 2017.
[3] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware social influence propagation models. Knowl. Inf. Syst., 37(3):555–584, 2013.
[4] D. Berend and A. Kontorovich. On the concentration of the missing mass. Electronic Communications in Probability, pages 1–7, 2013.
[5] S. Boucheron, G. Lugosi, P. Massart, and M. Ledoux. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[6] D. Brown and S. Fiorella. Influence Marketing: How to Create, Manage, and Measure Brand Influencers in Social Media Marketing. Que, 2013.
[7] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations & Trends in Machine Learning, 2012.
[8] S. Bubeck, D. Ernst, and A. Garivier. Optimal discovery with probabilistic expert advice: Finite time analysis and macroscopic optimality. Journal of Machine Learning Research, 14(1):601–623, 2013.
[9] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou. Robust influence maximization. In SIGKDD, pages 795–804, 2016.
[10] W. Chen, Y. Wang, Y. Yuan, and Q. Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. JMLR, 17(1), 2016.
[11] N. Du, L. Song, H. Woo, and H. Zha. Uncover topic-sensitive information diffusion networks. In AISTATS, pages 229–237, 2013.
[12] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
[13] Paul Gillin. The New Influencers: A Marketer's Guide to the New Social Media. Quill Driver Books, Sanger, CA, 2007.
[14] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, pages 561–568, 2011.
[15] M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data, 5(4), 2012.
[16] M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Structure and dynamics of information pathways in online media. In WSDM, pages 23–32, 2013.
[17] I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4):237–264, 1953.
[18] A. Goyal, F. Bonchi, and L. Lakshmanan. Learning influence probabilities in social networks. In WSDM, pages 241–250, 2010.
[19] P. Grabowicz, N. Ganguly, and K. Gummadi. Distinguishing between topical and non-topical information diffusion mechanisms in social media. In ICWSM, pages 151–160, 2016.
[20] X. He and D. Kempe. Robust influence maximization. In SIGKDD, pages 885–894, 2016.
[21] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In SIGKDD, pages 137–146. ACM, 2003.
[22] Paul Lagrée, Olivier Cappé, Bogdan Cautis, and Silviu Maniu. Effective large-scale online influence maximization. In ICDM, 2017.
[23] Kamiu Lee. Influencer marketing 2.0: Key trends in 2017. https://influence.bloglovin.com/influencer-marketing-2-0-key-trends-in-2017-a5fb97424cd, 2017.
[24] S. Lei, S. Maniu, L. Mo, R. Cheng, and P. Senellart. Online influence maximization. In SIGKDD, 2015.
[25] Nir Levine, Koby Crammer, and Shie Mannor. Rotting bandits. In Advances in Neural Information Processing Systems 30 (NIPS), 2017.
[26] Jonathan Louëdec, Laurent Rossi, Max Chevalier, Aurélien Garivier, and Josiane Mothe. Algorithme de bandit et obsolescence : un modèle pour la recommandation [A bandit algorithm and obsolescence: A model for recommendation]. 2016.
[27] D. McAllester and L. Ortiz. Concentration inequalities for the missing mass and for histogram rule error. JMLR, 4:895–911, 2003.
[28] D. McAllester and R. Schapire. On the convergence rate of Good-Turing estimators. In COLT, pages 1–6, 2000.
[29] Q. Mei, J. Guo, and D. Radev. DivRank: The interplay of prestige and diversity in information networks. In SIGKDD, 2010.
[30] H. T. Nguyen, M. T. Thai, and T. N. Dinh. Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In SIGMOD, 2016.
[31] N. Ohsaka, T. Akiba, Y. Yoshida, and K. Kawarabayashi. Fast and accurate influence maximization on large networks with pruned Monte-Carlo simulations. In AAAI, 2014.
[32] D. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In WWW, pages 695–704, 2011.
[33] K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. In KES, 2008.
[35] Y. Tang, Y. Shi, and X. Xiao. Influence maximization in near-linear time: A martingale approach. In SIGMOD, pages 1539–1554, 2015.
[36] Y. Tang, X. Xiao, and Y. Shi. Influence maximization: Near-optimal time complexity meets practical efficiency. In SIGMOD, pages 75–86, 2014.
[37] S. Vaswani, B. Kveton, Z. Wen, M. Ghavamzadeh, L. Lakshmanan, and M. Schmidt. Diffusion independent semi-bandit influence maximization. In ICML, 2017.
[38] S. Vaswani, L. V. S. Lakshmanan, and M. Schmidt. Influence maximization with bandits. In NIPS Workshop, 2015.
[39] S. Wang, X. Hu, P. Yu, and Z. Li. MMRate: Inferring multi-aspect diffusion networks with multi-pattern cascades. In SIGKDD, pages 1246–1255, 2014.
[40] D. Watts. Six Degrees: The Science of a Connected Age. W. W. Norton, NY, 2003.
[41] D. Watts and P. Dodds. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34(4):441–458, 2007.
[42] Z. Wen, B. Kveton, and M. Valko. Influence maximization with semi-bandit feedback. Technical report, 2016.
A Useful Lemmas
Lemma 2 (Bennett’s inequality (Theorem 2.9 and 2.10 [5])) . Let X , . . . , X n be independentrandom variables with finite variance such that X i ≤ b for some b > for all i ≤ n . Let S := (cid:80) ni =1 ( X i − E [ X i ]) and v := (cid:80) ni =1 E [ X i ] . Writing φ ( u ) = e u − u − , then for all t > , log E (cid:2) e tS (cid:3) ≤ vb φ ( bt ) ≤ vt − bt/ . This implies that, P (cid:16) S > (cid:112) v log / δ + b log / δ (cid:17) ≤ δ . Lemma 3 (Lemma 7 – [4]) . Let n ≥ , λ ≥ , p ∈ [0 , and q = (1 − p ) n . Then, qe λp (1 − q ) + (1 − q ) e − λpq ≤ exp( pλ / (4 n )) (5) qe λp ( q − + (1 − q ) e λpq ≤ exp( pλ / (4 n )) (6) B Analysis of the Waiting Time of GT-UCB Algorithm
Lemma 4.
For any s ≥ 2,

P( R̂_s ≤ R̂_{s−1} − λ/(e(s−1)) − √( 2λ log(1/δ) / (s−1) ) − log(1/δ)/(3(s−1)) ) ≤ δ.
Denote $X_s(x) := \frac{U_{s-1}(x)}{s-1} - \frac{U_s(x)}{s} \le \frac{1}{s-1}$. We can rewrite $\hat R_{s-1} - \hat R_s = \sum_{x \in A} X_s(x)$ and can easily verify that
$$v(x) := \mathbb{E}\big[X_s(x)^2\big] = p(x)(1-p(x))^{s-2}\left(\frac{1}{s-1} - \frac{1-p(x)}{s}\right) \le \frac{p(x)}{s-1}. \quad (7)$$
By applying Lemma 2 with $b = \frac{1}{s-1}$ and $v = \sum_{x \in A} v(x) \le \frac{\lambda}{s-1}$, one obtains
$$\mathbb{P}\left(\hat R_{s-1} - \hat R_s \ge \mathbb{E}\big[\hat R_{s-1} - \hat R_s\big] + \sqrt{\frac{2\lambda \log(1/\delta)}{s-1}} + \frac{\log(1/\delta)}{3(s-1)}\right) \le \delta.$$
We conclude by remarking that $\mathbb{E}[X_s(x)] = p(x)^2 (1-p(x))^{s-2} \le \frac{p(x)}{e(s-2)}$, that is, $\mathbb{E}[\hat R_{s-1} - \hat R_s] \le \frac{\lambda}{e(s-2)}$.

Theorem 3 (Waiting time). Denote $\lambda_{\min} := \min_{k \in [K]} \lambda_k$ and $\lambda_{\max} := \max_{k \in [K]} \lambda_k$. Assume that $\lambda_{\min} \ge 7$. Then, for any $\alpha \in \big[\frac{7}{\lambda_{\min}}, 1\big]$, if we define $\tau^* := T^*\big(\alpha - \frac{7}{\lambda_{\min}}\big)$, with probability at least $1 - \frac{2K}{\lambda_{\max}}$,
$$T_{\mathrm{UCB}}(\alpha) \le \tau^* + K \lambda_{\max} \log(4\tau^* + 11 K \lambda_{\max}) + 2K.$$
Proof.
Let us define the following confidence bounds:
$$b^+_{k,s}(t) := (1+\sqrt{2})\sqrt{\frac{\lambda_k \log(4t^3)}{s}} + \frac{\log(4t^3)}{3s}, \qquad b^-_{k,s}(t) := (1+\sqrt{2})\sqrt{\frac{\lambda_k \log(4t^3)}{s}} + \frac{\log(4t^3)}{3s} + \frac{\lambda_k}{s},$$
and
$$c^-_{k,s}(t) := \frac{\lambda_k}{e(s-2)} + \sqrt{\frac{2\lambda_k \log(t^3)}{s-1}} + \frac{\log(t^3)}{3(s-1)}.$$
Let $S > 0$. Using these definitions, we introduce the following events:
$$F := \Big\{\forall k \in [K], \forall t > S, \forall s \le t,\ \hat R_{k,s} - b^-_{k,s}(t) \le R_{k,s} \le \hat R_{k,s} + b^+_{k,s}(t)\Big\},$$
$$G := \Big\{\forall k \in [K], \forall t > S, \forall s \le t,\ \hat R_{k,s} \ge \hat R_{k,s-1} - c^-_{k,s}(t)\Big\},$$
$$E := F \cap G.$$
Using Theorem 1, Lemma 4 and a union bound, one obtains $\mathbb{P}(E) \ge 1 - \frac{2K}{S}$ (by setting $\delta \equiv t^{-3}$). Indeed,
$$\mathbb{P}(\bar E) \le \mathbb{P}(\bar F) + \mathbb{P}(\bar G) \le \sum_{k=1}^K \sum_{t > S} \sum_{s \le t} \frac{2}{t^3} = 2K \sum_{t > S} \frac{1}{t^2} \le \frac{2K}{S}.$$
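The last inequality bounds the tail of the series $\sum_{t > S} t^{-2}$ by the integral $\int_S^\infty x^{-2}\,dx = 1/S$. A quick numerical sanity check (not part of the proof; the truncation horizon below is an arbitrary choice):

```python
def tail_inverse_squares(S, horizon=100000):
    """Partial sum of 1/t^2 for t = S+1, ..., horizon; the full series is
    bounded by the integral of 1/x^2 over [S, infinity), which equals 1/S."""
    return sum(1.0 / (t * t) for t in range(S + 1, horizon + 1))

for S in (1, 5, 10, 100):
    assert tail_inverse_squares(S) <= 1.0 / S
```

The same series bound is what turns the per-round failure probabilities into the overall guarantee on $\mathbb{P}(\bar E)$.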
In the following, we work on the event $E$. Recall that we want to control $T_{\mathrm{UCB}}(\alpha)$, the time at which every influencer attains a remaining potential smaller than $\alpha\lambda_k$ when following the GT-UCB strategy. We aim at comparing $T_{\mathrm{UCB}}(\alpha)$ to $T^*(\alpha)$, the same quantity for the omniscient strategy. With that in mind, one can write:
$$T_{\mathrm{UCB}}(\alpha) = \min\big\{t : \forall k \in [K],\ R_{k, N_k(t)} \le \alpha\lambda_k\big\}, \qquad T^*(\alpha) = \sum_{k=1}^K T^*_k(\alpha), \ \text{ where } T^*_k(\alpha) = \min\{s : R_{k,s} \le \alpha\lambda_k\}.$$
Following ideas from [8], we can control $T_{\mathrm{UCB}}(\alpha)$ by comparing it to $U(\alpha)$ defined below, which replaces the remaining potential by an upper bound on the estimator of the remaining potential (the Good-Turing estimator); indeed, recall that we can control this on event $F$:
$$U(\alpha) = \min\big\{t \ge 1 : \forall k \in [K],\ \hat R_{k, N_k(t)} + b^+_{k, N_k(t)}(t) \le \alpha\lambda_k\big\}.$$
Let $S' \ge S$. On event $E$, one has $T_{\mathrm{UCB}}(\alpha) \le \max(S', U(\alpha))$. If $U(\alpha) \ge S'$, one has
$$R_{k, N_k(U(\alpha))} \ge \hat R_{k, N_k(U(\alpha))} - b^-_{k, N_k(U(\alpha))}(U(\alpha)) \quad \text{(we are on event $F$ and $U(\alpha) > S' \ge S$)}$$
$$\ge \hat R_{k, N_k(U(\alpha)) - 1} - b^-_{k, N_k(U(\alpha))}(U(\alpha)) - c^-_{k, N_k(U(\alpha))}(U(\alpha)) \quad \text{(we are on event $G$)}$$
$$\ge \big(\alpha\lambda_k - b^+_{k, N_k(U(\alpha)) - 1}(U(\alpha))\big) - b^-_{k, N_k(U(\alpha))}(U(\alpha)) - c^-_{k, N_k(U(\alpha))}(U(\alpha)).$$
The third inequality's justification is more involved. Let $t$ be the time such that $N_k(t) = N_k(U(\alpha)) - 1$ and $N_k(t+1) = N_k(U(\alpha))$. This implies that $k$ is the chosen expert at time $t$, that is, the one maximizing the GT-UCB index. Moreover, since $t < U(\alpha)$, one knows that this index is greater than $\alpha\lambda_k$. If $N_k(U(\alpha)) \ge S' + 2$, some basic calculations lead to
$$R_{k, N_k(U(\alpha))} \ge \alpha\lambda_k - 6\sqrt{\frac{\lambda_k \log(2U(\alpha))}{S'}} - \frac{2\log(2U(\alpha))}{S'} - \frac{\lambda_k}{S'}.$$
We denote by $\lambda_{\max} := \max_k \lambda_k$.
If we take $S' = \lambda_{\max} \log(2U(\alpha))$, we can rewrite the previous inequality as
$$R_{k, N_k(U(\alpha))} \ge \alpha\lambda_k - 6\sqrt{\frac{\lambda_k}{\lambda_{\max}}} - \frac{2}{\lambda_{\max}} - \frac{\lambda_k}{\lambda_{\max}\log(2U(\alpha))},$$
and, if $\lambda_{\max} > 6$, one gets
$$N_k(U(\alpha)) \le T^*_k\Big(\alpha - \frac{7}{\lambda_k}\Big) + S' + 2.$$
Finally, if we denote $\lambda_{\min} = \min_k \lambda_k$, we obtain that
$$U(\alpha) \le K(S' + 2) + T^*\Big(\alpha - \frac{7}{\lambda_{\min}}\Big).$$
We now apply Lemma 5 and obtain
$$U(\alpha) \le 2K + \tau^* + K\lambda_{\max}\log(8K + 4\tau^* + 10K\lambda_{\max}) \le \tau^* + K\lambda_{\max}\log(4\tau^* + 11K\lambda_{\max}) + 2K.$$
We conclude with $T_{\mathrm{UCB}}(\alpha) \le \max(S', U(\alpha))$.

Lemma 5 (Lemma 3 from [8]). Let $a > 0$, $b \ge 1$, and $x \ge e$, such that $x \le a + b \log x$. Then one has
$$x \le a + b \log\big(2a + 4b \log(4b)\big).$$
Moreover, we add that if $b \ge 2$, then $x \le a + b \log(2a + 5b)$.

C Confidence intervals in the influencer fatigue setting
In this section, we consider a single influencer and omit its index $k$. We recall that we make the assumption that influencers have non-intersecting supports. Thus, after selecting the influencer $n$ times, the remaining potential can be rewritten
$$R_n = \sum_{u \in A} \mathbb{1}\{u \text{ never activated}\}\, p_{n+1}(u)$$
– $n+1$ because this is the remaining potential for the $(n+1)$-th spread – and the corresponding Good-Turing estimator is
$$\hat R_n = \frac{1}{n} \sum_{u \in A} U^\gamma_n(u), \quad \text{where} \quad U^\gamma_n(u) = \sum_{i=1}^n \mathbb{1}\{X_1(u) = \dots = X_{i-1}(u) = X_{i+1}(u) = \dots = X_n(u) = 0,\ X_i(u) = 1\}\, \frac{\gamma(n+1)}{\gamma(i)}.$$

Estimator bias. Lemma 6 shows that the estimator of the remaining potential in the influencer fatigue setting is hardly biased.
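As a quick numerical sanity check (not part of the analysis – the base probabilities `p`, the fatigue function `gamma`, and the horizon `n` below are arbitrary choices), both expectations can be computed exactly by enumerating all activation histories of each node, and the resulting bias compared to the bound of Lemma 6:

```python
from itertools import product

def exact_bias(p, gamma, n):
    """Exact E[R_n] - E[R_hat_n] for a single influencer whose node u
    activates at spread i with probability gamma(i) * p[u], independently
    across nodes and spreads (non-intersecting supports)."""
    bias = 0.0
    for pu in p:
        probs = [gamma(i) * pu for i in range(1, n + 1)]
        p_next = gamma(n + 1) * pu  # activation probability at spread n+1
        e_R, e_Rhat = 0.0, 0.0
        for history in product((0, 1), repeat=n):  # all activation histories
            w = 1.0  # probability of this history
            for x, pi in zip(history, probs):
                w *= pi if x else 1.0 - pi
            if sum(history) == 0:    # u never activated: contributes to R_n
                e_R += w * p_next
            elif sum(history) == 1:  # hapax at spread i: contributes to the estimator
                i = history.index(1) + 1
                e_Rhat += w * gamma(n + 1) / gamma(i) / n
        bias += e_R - e_Rhat
    return bias

p = [0.30, 0.10, 0.05]            # arbitrary base probabilities
gamma = lambda i: 0.8 ** (i - 1)  # arbitrary non-increasing fatigue
n = 6
bias = exact_bias(p, gamma, n)
lam = sum(p)
# Lemma 6: the bias lies in [-gamma(n+1) * lambda / n, 0]
assert -gamma(n + 1) * lam / n <= bias <= 0.0
```

Since every hapax pattern is enumerated explicitly, this check does not rely on the closed-form simplification used in the proof below.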
Lemma 6.
Denoting $\lambda = \sum_{u \in A} p(u)$, the bias of the remaining potential estimator satisfies
$$\mathbb{E}[R_n] - \mathbb{E}[\hat R_n] \in \Big[-\frac{\gamma(n+1)\,\lambda}{n},\ 0\Big].$$
Proof.
We have that
$$\mathbb{E}[U^\gamma_n(u)] = \sum_{i=1}^n p_i(u) \prod_{j \ne i}(1 - p_j(u))\, \frac{\gamma(n+1)}{\gamma(i)} = p_{n+1}(u) \sum_{i=1}^n \prod_{j \ne i}(1 - p_j(u)).$$
We can now compute the bias of the estimator:
$$\mathbb{E}[R_n] - \mathbb{E}[\hat R_n] = \frac{1}{n} \sum_{u \in A} p_{n+1}(u) \Big(\sum_{i=1}^n \prod_{j=1}^n (1 - p_j(u)) - \sum_{i=1}^n \prod_{j \ne i}(1 - p_j(u))\Big)$$
$$= \frac{1}{n} \sum_{u \in A} p_{n+1}(u) \sum_{i=1}^n \prod_{j \ne i}(1 - p_j(u))\,\big[(1 - p_i(u)) - 1\big] = -\frac{1}{n} \sum_{u \in A} p_{n+1}(u) \sum_{i=1}^n p_i(u) \prod_{j \ne i}(1 - p_j(u))$$
$$= -\frac{1}{n}\, \mathbb{E}\Big[\sum_{u \in A} p_{n+1}(u)\, U_n(u)\Big] \in \Big[-\frac{1}{n}\sum_{u \in A} p_{n+1}(u),\ 0\Big].$$
Note that the random variable $U_n(u)$ corresponds to the hapax definition given in the original OIMP problem, that is, $U_n(u) = \mathbb{1}\{u \text{ activated exactly once}\}$. Unsurprisingly, we obtain the same bias as in the case where $\gamma$ is constant, equal to 1 (no fatigue).

Confidence Intervals.
To derive an optimistic algorithm, we need confidence intervals on the remaining potential. We operate in three steps.

1. Good-Turing deviations: Remember that $\hat R_n = \frac{1}{n}\sum_{u \in A} U^\gamma_n(u)$. We have that
$$\mathbb{E}\big[U^\gamma_n(u)^2\big] = \sum_{i=1}^n p_i(u) \prod_{j \ne i}(1 - p_j(u)) \Big(\frac{\gamma(n+1)}{\gamma(i)}\Big)^2 = \sum_{i=1}^n p_{n+1}(u) \prod_{j \ne i}(1 - p_j(u))\, \frac{\gamma(n+1)}{\gamma(i)} \le n\, p(u)\, \gamma(n+1).$$
Thus, we have that $v := \sum_{u \in A} \mathbb{E}\big[\big(\frac{U^\gamma_n(u)}{n}\big)^2\big] \le \frac{\sum_{u \in A} p_{n+1}(u)}{n}$. Applying Bennett's inequality to the independent random variables $\{U^\gamma_n(u)/n\}_{u \in A}$ yields
$$\mathbb{P}\left(\hat R_n - \mathbb{E}[\hat R_n] \ge \sqrt{\frac{2\lambda_{n+1}\log(1/\delta)}{n}} + \frac{1}{3n}\log(1/\delta)\right) \le \delta, \quad (8)$$
where $\lambda_n := \gamma(n) \sum_{u \in A} p(u)$. The same inequality can be derived for left deviations.

2. Remaining potential deviations:
Remember that $R_n = \sum_{u \in A} Z_n(u)\, p_{n+1}(u)$, where $Z_n(u) = \mathbb{1}\{u \text{ never activated}\} = \mathbb{1}\{X_1(u) = \dots = X_n(u) = 0\}$. We denote $Y_n(u) = p_{n+1}(u)\big(Z_n(u) - \mathbb{E}[Z_n(u)]\big)$ and $q_n(u) = \mathbb{P}(Z_n(u) = 1) = \prod_{i=1}^n (1 - p_i(u))$. For any $t > 0$, we have that
$$\mathbb{P}(R_n - \mathbb{E}[R_n] \ge \epsilon) \le e^{-t\epsilon} \prod_{u \in A} \mathbb{E}\big[e^{t Y_n(u)}\big] = e^{-t\epsilon} \prod_{u \in A} \Big(\mathbb{P}(Z_n(u) = 1)\, e^{t p_{n+1}(u)(1 - q_n(u))} + \mathbb{P}(Z_n(u) = 0)\, e^{-t p_{n+1}(u)\, q_n(u)}\Big)$$
$$= e^{-t\epsilon} \prod_{u \in A} \Big(q_n(u)\, e^{t p_{n+1}(u)(1 - q_n(u))} + (1 - q_n(u))\, e^{-t p_{n+1}(u)\, q_n(u)}\Big) \le e^{-t\epsilon} \prod_{u \in A} \exp\Big(\frac{p_{n+1}(u)\, t^2}{4n}\Big) \quad \text{(by Eq. (9) in Lemma 7).}$$
Minimizing over $t$ (the minimum is attained at $t = \frac{2\epsilon n}{\sum_{u \in A} p_{n+1}(u)}$), we obtain
$$\mathbb{P}(R_n - \mathbb{E}[R_n] \ge \epsilon) \le \exp\Big(-\frac{\epsilon^2\, n}{\sum_{u \in A} p_{n+1}(u)}\Big).$$
We can proceed similarly to obtain the left deviation.

3. Putting it all together, we obtain the confidence intervals of Theorem 4, which can be used in the design of the optimistic algorithm Fat-GT-UCB.

Theorem 4.
With probability at least $1 - \delta$, for $\lambda_n = \gamma(n) \sum_{u \in A} p(u)$ and
$$\beta_n := (1 + \sqrt{2}) \sqrt{\frac{\lambda_{n+1} \log(4/\delta)}{n}} + \frac{1}{3n} \log\frac{4}{\delta},$$
the following holds:
$$-\beta_n - \frac{\lambda_{n+1}}{n} \le R_n - \hat R_n \le \beta_n.$$

Lemma 7 (Adaptation of Lemma 3.5 in [4]). Let $n \ge 1$, $p \in [0, 1]$, $\gamma : \mathbb{N} \to [0, 1]$ a non-increasing function and $t \ge 0$. We denote $p_n = \gamma(n)\, p$ and $q_n = \prod_{i \le n} (1 - p_i)$. Then,
$$(a) \quad q_n e^{t p_n (1 - q_n)} + (1 - q_n)\, e^{-t p_n q_n} \le \exp\Big(\frac{p_n t^2}{4n}\Big), \quad (9)$$
$$(b) \quad q_n e^{t p_n (q_n - 1)} + (1 - q_n)\, e^{t p_n q_n} \le \exp\Big(\frac{p_n t^2}{4n}\Big). \quad (10)$$
Proof.
Let $q'_n = (1 - p_n)^n$. Clearly, $q_n \le q'_n$, since $\gamma$ is non-increasing and thus $p_i \ge p_n$ for all $i \le n$.

(a) Using Theorem 3.2 in [4] with $p \equiv q_n$ and $t \equiv t p_n$, we have that
$$q_n e^{t p_n (1 - q_n)} + (1 - q_n)\, e^{-t p_n q_n} \le \exp\Big(\frac{(1 - 2q_n)\, t^2 p_n^2}{4 \log((1 - q_n)/q_n)}\Big).$$
So it suffices to show that
$$\frac{(1 - 2q_n)\, t^2 p_n^2}{4 \log((1 - q_n)/q_n)} \le \frac{p_n t^2}{4n}, \quad \text{or equivalently,} \quad \frac{(1 - 2q_n)\, p_n}{\log((1 - q_n)/q_n)} \le \frac{\log(1 - p_n)}{\log q'_n},$$
where we used $\log q'_n = n \log(1 - p_n)$. Rearranging, this amounts to
$$L(q_n, q'_n) := \frac{(1 - 2q_n) \log(1/q'_n)}{\log((1 - q_n)/q_n)} \le \frac{-\log(1 - p_n)}{p_n} =: R(p_n).$$
As in [4], we show that $L \le 1 \le R$. The second inequality is true (see [4]). The left-hand side can be written $L(q_n, q'_n) = \ell(q_n) \log(1/q'_n)$, with $\ell(q_n) := \frac{1 - 2q_n}{\log((1 - q_n)/q_n)} \ge 0$ (the numerator and the denominator always have the same sign). Since $q_n \le q'_n$, we can upper bound the left-hand side as
$$L(q_n, q'_n) \le \frac{(1 - 2q_n) \log(1/q_n)}{\log((1 - q_n)/q_n)},$$
which is proven to be less than 1 in [4]. This concludes the proof of (a).

(b) It is shown in the proof of Lemma 3.5 (b) in [4] that
$$L(t) := \frac{1}{t^2 p_n^2} \log\Big[q_n e^{-t p_n (1 - q_n)} + (1 - q_n)\, e^{t p_n q_n}\Big] \le \frac{\log(1 - p_n)}{4\, p_n \log q_n} =: R(q_n).$$
Since $R(q'_n) = \frac{1}{4 n p_n}$, it suffices to show that $R(q_n) \le R(q'_n)$ to obtain the desired inequality. This is true because
$$0 \le \frac{\log(1 - p_n)}{\log q_n} \le \frac{\log(1 - p_n)}{\log q'_n}.$$
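As a numerical sanity check of inequalities (9) and (10) – and of Lemma 3, which corresponds to the special case $\gamma \equiv 1$ (with $t$ playing the role of $\lambda$) – both sides can be evaluated on a grid of parameters; the grid values and the fatigue functions below are arbitrary choices:

```python
import math
from itertools import product

def lemma7_holds(n, p, t, gamma, slack=1e-12):
    """Evaluate both sides of (9) and (10) with p_i = gamma(i) * p and
    q_n = prod_{i <= n} (1 - p_i); return True when both inequalities hold."""
    p_n = gamma(n) * p
    q_n = 1.0
    for i in range(1, n + 1):
        q_n *= 1.0 - gamma(i) * p
    rhs = math.exp(p_n * t * t / (4 * n))
    lhs9 = q_n * math.exp(t * p_n * (1 - q_n)) + (1 - q_n) * math.exp(-t * p_n * q_n)
    lhs10 = q_n * math.exp(t * p_n * (q_n - 1)) + (1 - q_n) * math.exp(t * p_n * q_n)
    return lhs9 <= rhs + slack and lhs10 <= rhs + slack

fatigues = [lambda i: 1.0,             # no fatigue: recovers Lemma 3
            lambda i: 0.9 ** (i - 1),  # geometric fatigue
            lambda i: 1.0 / i]         # harmonic fatigue
for n, p, t in product((1, 2, 5, 10), (0.05, 0.3, 0.7, 0.95), (0.0, 0.5, 1.0, 2.0, 4.0)):
    for gamma in fatigues:
        assert lemma7_holds(n, p, t, gamma)
```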