Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint
Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, Rebecca Reiffenhäuser
July 13, 2020
Abstract
Constrained submodular maximization problems encompass a wide variety of applications, including personalized recommendation, team formation, and revenue maximization via viral marketing. The massive instances occurring in modern-day applications can render existing algorithms prohibitively slow, while frequently, those instances are also inherently stochastic. Focusing on these challenges, we revisit the classic problem of maximizing a (possibly non-monotone) submodular function subject to a knapsack constraint. We present a simple randomized greedy algorithm that achieves a 5.83-approximation and runs in O(n log n) time, i.e., at least a factor n faster than other state-of-the-art algorithms. The robustness of our approach allows us to further transfer it to a stochastic version of the problem. There, we obtain a 9-approximation to the best adaptive policy, which is the first constant approximation for non-monotone objectives. Experimental evaluation of our algorithms showcases their improved performance on real and synthetic data.

1 Introduction

Constrained submodular maximization is a fundamental problem at the heart of discrete optimization. The reason for this is as simple as it is clear: submodular functions capture the notion of diminishing returns present in a wide variety of real-world settings.

Owing to its striking importance and its coinciding NP-hardness [20], extensive research has been conducted on submodular maximization since the seventies (e.g., [15, 42]), with the focus lately shifting towards handling the massive datasets emerging in modern applications. With a wide variety of possible constraints, often regarding cardinality, independence in a matroid, or knapsack-type restrictions, the number of applications is vast. To name just a few, there are recent works on feature selection in machine learning [13, 14, 32], influence maximization in viral marketing [2, 31], and data summarization [43, 38, 45]. Many of these applications have non-monotone submodular objectives, meaning that adding an element to an existing set might actually decrease its value. Two such examples are discussed in detail in Section 5.

∗This work was supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and the MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets.”
†Department of Mathematical Sciences, University of Essex, UK, and Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands. Email: [email protected]
‡Department of Computer, Control, and Management Engineering “Antonio Ruberti”, Sapienza University of Rome, Italy. Email: {fuscof, lazos, leonardi, rebeccar}@diag.uniroma1.it

Such massive instances raise two main challenges. The first concerns speed: since the running time of algorithms in this setting is typically dominated by the number of objective function evaluations (also known as value oracle calls), it is typically measured (as in this work) by their number. So, here the goal is to design algorithms requiring an almost linear number of such evaluations. There is extensive research focusing on this issue, be it in the standard algorithmic setting [39], or in streaming [9, 3] and distributed submodular maximization [38, 12]. The second challenge is the inherent uncertainty in problems like sensor placement or revenue maximization, where one does not learn the exact marginal value of an element until it is added to the solution (and thus “paid for”). This, too, has motivated several works on adaptive submodular maximization [25, 26, 28, 41]. Note that even estimating the expected value of a partially unknown objective function can be very costly, and this makes the reduction of the number of such calls all the more important.

Knapsack constraints are one of the most natural types of restriction that occur in real-world problems, often taking the form of hard budget, time, or size constraints. Other combinatorial constraints, like partition matroid constraints, on the other hand, model less stringent requirements, e.g., avoiding too many similar items in the solution. As the soft versions of such constraints can often be hardwired into the objective itself (see the
Video Recommendation application in Section 5), we do not deal with them directly here.

The nearly-linear time requirement, without large constants involved, leaves little room for using sophisticated approaches like continuous greedy methods [23] or enumeration of initial solutions [44]. To further highlight the delicate balance between function evaluations and approximation, it is worth mentioning that, even for the monotone case, the first result combining O(n log n) oracle calls with an approximation better than 2 is the very recent (e/(e−1) + ε)-approximation algorithm of Ene and Nguyen [16]. While this is a very elegant theoretical result, the huge constants involved render it unusable in practice.

At the same time, the strikingly simple 2-approximation modified density greedy algorithm of Wolsey [46] deals well with both issues in the monotone case: Sort the items in decreasing order according to their marginal value over cost ratio and pick as many items as possible in that order without violating the constraint. Finally, return the best among this solution and the best single item. When combined with lazy evaluations [37], this algorithm requires only O(n log n) value oracle calls and can be adjusted to work equally well for adaptive submodular maximization [25]. For non-monotone objectives, however, the only practical algorithm is the (10 + ε)-approximation FANTOM algorithm of Mirzasoleiman et al. [39], requiring O(n² log n) value oracle calls. Moreover, there is no known algorithm for the adaptive setting that can handle anything beyond a cardinality constraint [26].

We aim to tackle both aforementioned challenges for non-monotone submodular maximization under a knapsack constraint, by revisiting the simple algorithmic principle of Wolsey’s density greedy algorithm. Our approach is along the lines of recent results on random greedy combinatorial algorithms [6, 24], which show that introducing randomness into greedy algorithms can extend their guarantees to the non-monotone case. Here, we give the first such algorithm for a knapsack constraint.

The density greedy algorithm mentioned above may produce arbitrarily poor solutions when the objective is non-monotone. In this work we show that introducing some randomization leads to a simple algorithm,
SampleGreedy, that outperforms existing algorithms both in theory and in practice. SampleGreedy flips a coin before greedily choosing any item in order to decide whether to include it in the solution or ignore it. The algorithmic simplicity of such an approach keeps SampleGreedy fast, easy to implement, and flexible enough to adjust to other related settings. At the same time, the added randomness prevents it from getting trapped in solutions of poor quality.

In particular, in Section 3 we show that SampleGreedy is a 5.83-approximation algorithm using O(n log n) value oracle calls. When all singletons have small value compared to an optimal solution, the approximation factor improves to almost 4. This is the first constant-factor approximation algorithm for the non-monotone case using this few queries. The only other algorithm fast enough to be suitable for large instances is the aforementioned FANTOM [39] which, for a knapsack constraint, achieves an approximation factor of (10 + ε) with O(nrε⁻¹ log n) queries, where r is the size of the largest feasible set and can be as large as Θ(n). Even if we modify FANTOM to use lazy evaluations, we still improve the query complexity by a logarithmic factor (see also Remark 1).

For the adaptive setting, where the stochastic submodular objective is learned as we build the solution, we show in Section 4 that a variant of our algorithm, AdaptiveGreedy, still guarantees a 9-approximation to the best adaptive policy. This is not only a relatively small loss given the considerably stronger benchmark, but is in fact the first constant approximation known for the problem in the adaptive submodular maximization framework of Golovin and Krause [25] and Gotovos et al. [26]. Hence we fill a notable theoretical gap, given that models with incomplete prior information or those capturing evolving settings are becoming increasingly important in practice.

From a technical point of view, our algorithm combines the simple principle of always choosing a high-density item with maintaining a careful exploration-exploitation balance, as is the case in many stochastic learning problems. It is therefore directly related to the recent simple randomized greedy approaches for maximizing non-monotone submodular objectives subject to other (i.e., non-knapsack) constraints [6, 9, 24]. However, there are underlying technical difficulties that make the analysis for knapsack constraints significantly more challenging.
Every single result in this line of work critically depends on making a random choice in each step, in a way such that “good progress” is consistently made. This is not possible under a knapsack constraint. Instead, we argue globally about the value of the SampleGreedy output via a comparison with a carefully maintained almost integral solution. When it comes to extending this approach to the adaptive non-monotone submodular maximization framework, we crucially use the fact that the algorithm builds the solution iteratively, committing in every step to all the past choices. This is the main technical reason why it is not possible to adjust algorithms with multiple “parallel” runs, like FANTOM, to the adaptive setting.

Our algorithms provably handle well the aforementioned emerging, modern-day challenges, i.e., stochastically evolving objectives and rapidly growing real-world instances. In Section 5 we showcase the fact that our theoretical results indeed translate into applied performance. We focus on two applications that fit within the framework of non-monotone submodular maximization subject to a knapsack constraint, namely video recommendation and influence-and-exploit marketing. We run experiments on real and synthetic data indicating that SampleGreedy consistently performs better than FANTOM while being much faster. For AdaptiveGreedy we highlight the fact that its adaptive behavior results in a significant improvement over non-adaptive alternatives.
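To ground the notion of a non-monotone submodular objective used throughout, here is a small self-contained illustration (a standard textbook example, not one of the paper’s applications): the cut function of a graph is submodular, yet adding elements to a set can strictly decrease its value. The snippet verifies both properties exhaustively on a toy instance.

```python
from itertools import combinations

# A classic non-monotone submodular function: the cut function of an
# undirected graph, v(S) = number of edges with exactly one endpoint in S.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
V = range(4)

def cut_value(S):
    S = set(S)
    return sum(1 for a, b in EDGES if (a in S) != (b in S))

# Non-monotone: growing the set can strictly decrease the value.
assert cut_value({0}) == 2
assert cut_value({0, 1, 2}) == 1  # adding elements lowered the cut

# Submodularity (diminishing returns), checked exhaustively on this tiny
# instance: v(S + i) - v(S) >= v(T + i) - v(T) for all S ⊆ T and i ∉ T.
subsets = [set(c) for r in range(5) for c in combinations(V, r)]
for S in subsets:
    for T in subsets:
        if S <= T:
            for i in set(V) - T:
                assert (cut_value(S | {i}) - cut_value(S)
                        >= cut_value(T | {i}) - cut_value(T))
```

The same diminishing-returns check applies verbatim to any set function given as a Python callable.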
Related Work

There is an extensive literature on submodular maximization subject to knapsack or other constraints, going back several decades; see, e.g., [42, 46]. For a monotone submodular objective and a knapsack constraint there is an e/(e−1)-approximation algorithm [33, 44], which is tight unless P = NP [20].

For non-monotone submodular functions, Lee et al. [35] provided a 5-approximation algorithm for k knapsack constraints, which was the first constant-factor algorithm for the problem. Fadaei et al. [17], building on the approach of Lee et al. [35], reduced this factor to 4. One of the most interesting algorithms for a single knapsack constraint is the 6-approximation algorithm of Gupta et al. [27]. As this is a greedy combinatorial algorithm based on running Sviridenko’s algorithm twice, it is often used as a subroutine by other algorithms in the literature, e.g., [12], despite its running time of O(n⁵). A number of continuous greedy approaches [23, 34, 8] led to the current best factor of e when a knapsack—or even a general downwards closed—constraint is involved. However, continuous greedy algorithms are impractical for most real-world applications. The fastest such algorithm for our setting is the (e + ε)-approximation algorithm of Chekuri et al. [10], requiring O(n² polylog(n)) function evaluations for any fixed ε. Possibly the only algorithm that is directly comparable to our SampleGreedy in terms of running time is FANTOM by Mirzasoleiman et al. [39]. FANTOM achieves a (1 + ε)(p + 1)(2p + 2ℓ + 1)/p-approximation for ℓ knapsack constraints and a p-system constraint in time O(nrpε⁻¹ log(n)), where r is the size of the largest feasible solution. (FANTOM can handle more general constraints, like a p-system constraint and ℓ knapsack constraints; here we refer to its performance and running time when restricted to a single knapsack constraint.)

As mentioned above, there is a number of recent results on randomizing simple greedy algorithms so that they work for non-monotone submodular objectives [6, 9, 26, 24, 22].
Our paper extends this line of work, as we are the first to successfully apply this approach for a knapsack constraint.

Golovin and Krause [25] introduced the notions of adaptive monotonicity and submodularity and showed it is possible to achieve guarantees with respect to the optimal adaptive policy that are similar to the guarantees one gets in the standard algorithmic setting with respect to an optimal solution. Our Section 4 fits into this framework as it was generalized by Gotovos et al. [26] for non-monotone objectives; they showed that a variant of the random greedy algorithm of Buchbinder et al. [6] achieves an e-approximation in the case of a cardinality constraint.

Implicitly related to our quest for few value oracle calls is the recent line of work on the adaptive complexity of submodular maximization, which measures the number of sequential rounds of independent value oracle calls needed to obtain a constant-factor approximation; see [4, 5, 18, 19] and references therein. To the best of our knowledge, nothing nontrivial is known for non-monotone functions and a knapsack constraint.

2 Preliminaries

In this section we formally introduce the problem of submodular maximization with a knapsack constraint in both the standard and the adaptive setting. Let A = {1, 2, . . . , n} be a set of n items.

Definition 1 (Submodularity). A function v : 2^A → R is submodular if and only if

    v(S ∪ {i}) − v(S) ≥ v(T ∪ {i}) − v(T)

for all S ⊆ T ⊆ A and i ∉ T.

A function v : 2^A → R is non-decreasing (often referred to as monotone) if v(S) ≤ v(T) for any S ⊆ T ⊆ A. We consider general (i.e., not necessarily monotone), normalized (i.e., v(∅) = 0), non-negative submodular valuation functions. Since marginal values are used extensively, we adopt the shortcut v(T | S) for the marginal value of a set T with respect to a set S, i.e., v(T | S) = v(T ∪ S) − v(S). If T = {i} we simply write v(i | S).
While this is the most standard definition of submodularity in this setting, there are alternative equivalent definitions that will be useful later on.

Theorem 1 (Nemhauser et al. [42]). Given a function v : 2^A → R, the following are equivalent:
1. v(i | S) ≥ v(i | T) for all S ⊆ T ⊆ A and i ∉ T.
2. v(S) + v(T) ≥ v(S ∪ T) + v(S ∩ T) for all S, T ⊆ A.
3. v(T) ≤ v(S) + Σ_{i ∈ T\S} v(i | S) − Σ_{i ∈ S\T} v(i | S ∪ T \ {i}) for all S, T ⊆ A.

Moreover, we restate a key result which connects random sampling and submodular maximization. The original version of the theorem is due to Feige et al. [21], although here we use a variant from Buchbinder et al. [6].
Lemma 1 (Lemma 2.2 of Buchbinder et al. [6]). Let f : 2^A → R be a submodular set function, let X ⊆ A, and let X(p) be a sampled subset, where each element of X appears with probability at most p (not necessarily independently). Then

    E[f(X(p))] ≥ (1 − p) · f(∅).

We assume access to a value oracle that returns v(S) when given as input a set S. We also associate a positive cost c_i with each element i ∈ A and consider a given budget B. The goal is to find a subset of A of maximum value among the subsets whose total cost is at most B. Formally, we want some S∗ ∈ arg max {v(S) | S ⊆ A, Σ_{i∈S} c_i ≤ B}. Without loss of generality, we may assume that c_i ≤ B for all i ∈ A, since any element with cost exceeding B is not contained in any feasible solution and can be discarded.

We next present the adaptive optimization framework. On a high level, here we do know how the world works and which situations occur with which probability. However, which of those we will actually be dealing with is inferred over time from the bits of information we learn. Along with the set A, we introduce the state space Ω, which is endowed with some probability measure. By ω = (ω_i)_{i∈A} ∈ Ω we specify the state of each element in A. The adaptive valuation function v is then defined over 2^A × Ω; the value of a subset S ⊆ A depends on both the subset and ω. Due to the probability measure over Ω, v(S, ω) is a random variable. We define v(S) = E[v(S, ω)], the expectation being with respect to ω. As before, the costs c_i are deterministic and known in advance.

For each ω ∈ Ω and S ⊆ A, we define the partial realization of state ω on S as the pair (S, ω|_S), where ω|_S = (ω_i)_{i∈S}. It is natural to assume that the true value of a set S does not depend on the whole state, but only on ω|_S, i.e., v(S, ω) = v(S, ψ) for all ω, ψ ∈ Ω such that ω|_S = ψ|_S. Therefore, we sometimes overload the notation and use v(S, ω|_S) instead of v(S, ω).
There is a clear partial ordering on the set of all possible partial realizations: (S, ω|_S) ⊆ (T, ω|_T) if S ⊆ T and ω|_T coincides with ω|_S over all the elements of S. The marginal value of an element i given a partial realization (S, ω|_S) is

    v(i | (S, ω|_S)) = E[ v({i} ∪ S, ω) − v(S, ω) | ω|_S ].

We are now ready to introduce the concepts of adaptive submodularity and monotonicity.
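To make the conditional-expectation notation above concrete, here is a toy stochastic model (ours, purely illustrative, not one of the paper’s settings): each item i is independently “active” with known probability q[i], and v(S, ω) is the largest weight of an active item in S. The closed-form marginal is checked against a brute-force conditional expectation.

```python
# Toy adaptive model: item i is "active" with probability q[i],
# independently; v(S, ω) is the largest weight among active items of S.
w = {0: 4.0, 1: 3.0, 2: 1.0}   # weights (illustrative numbers)
q = {0: 0.5, 1: 0.8, 2: 0.9}   # activation probabilities

def value(S, active):
    # v(S, ω): depends only on the states of the elements of S.
    return max((w[i] for i in S if active[i]), default=0.0)

def marginal(i, S, observed):
    # v(i | (S, ω|S)) = E[ v({i} ∪ S, ω) − v(S, ω) | ω|S ]:
    # only the state of i is still random here.
    best = max((w[j] for j in S if observed[j]), default=0.0)
    return q[i] * max(0.0, w[i] - best)

# Sanity check against the brute-force conditional expectation.
S, observed, i = {1}, {1: True}, 0   # item 1 was observed active
exact = sum(prob * (value(S | {i}, {**observed, i: a}) - value(S, observed))
            for a, prob in [(True, q[i]), (False, 1 - q[i])])
assert abs(marginal(i, S, observed) - exact) < 1e-12
```

Note how the marginal of item 0 shrinks once a heavier item has been observed active, which is exactly the adaptive diminishing-returns behavior formalized next.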
Definition 2. The function v(·, ·) is adaptive submodular if v(i | (S, ω|_S)) ≥ v(i | (T, ω|_T)) for all partial realizations (S, ω|_S) ⊆ (T, ω|_T) and for any i ∉ T. Further, v(·, ·) is adaptive monotone if v(i | (S, ω|_S)) ≥ 0 for all partial realizations (S, ω|_S) and for all i ∉ S.

In Section 4 we assume access to a value oracle that, given an element i and a partial realization, returns the expected marginal value of i. Using the properties of conditional expectation, it is straightforward to show that if v(·, ·) is adaptive submodular, then its expected value v(·) is submodular. In analogy with [26], we assume v to be state-wise submodular, i.e., v(·, ω) is a submodular set function for each ω ∈ Ω.

In this framework it is possible to define adaptive policies to maximize v. An adaptive policy is a function which associates with every partial realization a distribution on the next element to be added to the solution. The optimal solution to the adaptive submodular maximization problem is an adaptive policy that maximizes the expected value while respecting the knapsack constraint (the expectation being taken over Ω and the randomness of the policy itself).

3 The SampleGreedy Algorithm

We present and analyze
SampleGreedy, a randomized 5.83-approximation algorithm. SampleGreedy is based on the modified density greedy algorithm of Wolsey [46]. Since the latter may perform arbitrarily badly for non-monotone objectives, we add a sampling phase, similar to the sampling phase of the Sample Greedy algorithm of Feldman et al. [24].

SampleGreedy first selects a random subset A′ of A by independently picking each element with probability p. Then it runs Wolsey’s algorithm only on A′. To formalize this second step, using v(i) as a shorthand for v({i}), let j_1 ∈ arg max_{i ∈ A′} v(i)/c_i, while, for k ≥ 1,

    j_{k+1} ∈ arg max_{i ∈ A′ \ {j_1, . . . , j_k}} v(i | {j_1, . . . , j_k})/c_i.

If ℓ is the largest integer such that Σ_{i=1}^{ℓ} c_{j_i} ≤ B, then S = {j_1, . . . , j_ℓ}. In the end, the output is the one yielding the largest value between S and an element from arg max_{i ∈ A′} v(i).

We formally present this algorithm in pseudocode below. Notice that, to simplify the analysis, instead of selecting the entire set A′ immediately at the start of the algorithm, we defer this decision and toss a coin with probability of success p each time an item is considered to be added to the solution. Both versions of the algorithm behave identically.

Theorem 2.
For p = √2 − 1, SampleGreedy is a (3 + 2√2)-approximation algorithm.

Proof. For the analysis of the algorithm we are going to use an auxiliary set O, an extension of the set S that respects the knapsack constraint and uses feasible items from an optimal solution. In particular, let S∗ be an optimal solution and let s_1, s_2, . . . , s_r be its elements. Then, O is a fuzzy set that is initially equal to S∗, and during each iteration of the while loop it is updated as follows:

• If r_i = 1, then O = O ∪ {i}. In case this addition violates the knapsack constraint, i.e., c(O) > B, then we repetitively remove items from O \ S in increasing order with respect to their cost until the cost of O becomes exactly B. Note that this means that the last item removed may be removed only partially. More precisely, if c(O) > B and c(O \ {s_j}) ≤ B, where s_j is the item of S∗ of maximum index in O \ S, then we keep a (B − c(O) + c_{s_j})/c_{s_j} fraction of s_j in O and stop its update for the current iteration.

• Else (i.e., if r_i = 0), O = O \ {i}.

SampleGreedy(A, v, c, B)
 1: i∗ ∈ arg max_{k ∈ A} v(k)                  /* best single item */
 2: S = ∅                                      /* greedy solution */
 3: F = {k ∈ A | v(k | S) > 0 and c_k ≤ B}     /* initial set of feasible items */
 4: R = B                                      /* remaining knapsack capacity */
 5: while F ≠ ∅ do
 6:   Let i ∈ arg max_{k ∈ F} v(k | S)/c_k
 7:   Let r_i ∼ Bernoulli(p)                   /* independent random bit */
 8:   if r_i = 1 then
 9:     S = S ∪ {i}
10:     R = R − c_i
11:   A = A \ {i}
12:   F = {k ∈ A | v(k | S) > 0 and c_k ≤ R}
13: return max{v(i∗), v(S)}

If an item j was considered (in line 6) in some iteration of the while loop, then let S_j and O_j denote the sets S and O, respectively, at the beginning of that iteration. Moreover, let O′_j denote O at the end of that iteration. If j was never considered, then S_j and O_j (or O′_j) denote the final versions of S and O, respectively.
In fact, in what follows we exclusively use S and O for their final versions. It should be noted that, for all j ∈ A, S_j ⊆ O_j, and also that no item in O_j \ S_j has been considered in any of the previous iterations of the while loop.

Before stating the next lemma, let us introduce some notation for the sake of readability. Note that, by construction, O \ S is either empty or consists of a single fractional item ı̂. In case O \ S = ∅, by ı̂ we denote the last item removed from O. For every j ∈ A, we define Q_j = O_j \ (O′_j ∪ S ∪ {ı̂}). Note that if j was never considered during the execution of the algorithm, then Q_j = ∅.

Lemma 2.
For every realization of the Bernoulli random variables, it holds that

    v(S ∪ S∗) ≤ v(S) + v(ı̂) + Σ_{j ∈ A} c(Q_j) · v(j | S_j)/c_j.

Proof of Lemma 2.
Assume that the random bits r_1, r_2, . . . are fixed. Also, without loss of generality, assume the items are numbered according to the order in which they are considered by SampleGreedy, with the ones not considered by the algorithm numbered arbitrarily (but after the ones considered). That is, item j—if considered—is the item considered during the j-th iteration.

Consider now any round j of the while loop of SampleGreedy. An item is removed from O_j in two cases. First, it could be item j itself that was originally in S∗ but r_j = 0 (and hence it will never get back into O_k for any k > j). Second, it could be some other item that was in S∗ and is taken out to make room for the new item j. In the latter case the only possibility for the removed item to return to O_k for some k > j is to be selected by the algorithm and inserted into S. We can hence conclude that Q_j ∩ Q_k = ∅ for all j ≠ k. In addition to that, it is clear that S ∪ S∗ = S ∪ {ı̂} ∪ ⋃_{j=1}^{ℓ} Q_j.

Therefore, if items 1, 2, . . . , ℓ were all the items ever considered, using submodularity and the fact that S_j ⊆ S ⊆ S ∪ ⋃_{r=j+1}^{ℓ} Q_r, we have

    v(S ∪ S∗) − v(S) − v(ı̂) ≤ v((S ∪ S∗) \ ı̂) − v(S)
                             = Σ_{j=1}^{ℓ} v(Q_j | S ∪ ⋃_{r=j+1}^{ℓ} Q_r)
                             ≤ Σ_{j=1}^{ℓ} v(Q_j | S_j)
                             ≤ Σ_{j=1}^{ℓ} Σ_{x ∈ Q_j} (v(x | S_j)/c_x) · c_x
                             ≤ Σ_{j=1}^{ℓ} Σ_{x ∈ Q_j} (v(j | S_j)/c_j) · c_x
                             = Σ_{j=1}^{ℓ} (v(j | S_j)/c_j) · Σ_{x ∈ Q_j} c_x
                             = Σ_{j=1}^{ℓ} (v(j | S_j)/c_j) · c(Q_j),     (1)

where in a slight abuse of notation we consider c_x to be the fractional (linear) cost if x ∈ Q_j is a fractional item. While the first three inequalities directly follow from the submodularity of v, for the last inequality we need to combine the optimality of v(j | S_j)/c_j at the step j was selected with the fact that every single item x appearing in the sum Σ_{j=1}^{ℓ} Σ_{x ∈ Q_j} v(x | S_j) was feasible (as a whole item) at that step. The latter is true because of the way we remove items from O. If x is removed, it is removed before (any part of) ı̂ is removed. Thus, x is removed when the available budget is still at least c_ı̂. Given that c_x ≤ c_ı̂, we get that x is feasible until removed.

To conclude the proof of the lemma it is sufficient to note that c(Q_j) = 0 for all items that were not considered.

While the previous lemma holds for each realization of the random coin tosses in the algorithm, we next consider inequalities holding in expectation over the randomness of the {r_i}_{i=1}^{|A|} in SampleGreedy. The indexing of the elements is hence to be considered deterministic and fixed in advance, not as in the proof of Lemma 2.
Lemma 3.

    E[ Σ_{j ∈ A} c(Q_j) · v(j | S_j)/c_j ] ≤ (max{p, 1 − p}/p) · E[v(S)].

Proof of Lemma 3.
For all i ∈ A, we define G_i to be the random gain due to i at the time i is added to the solution (G_i = v(i | S_i) if i is added and 0 otherwise). Since v(S) = Σ_{i ∈ A} G_i, by linearity, it suffices to show that the following inequalities hold in expectation over the coin tosses:

    c(Q_i) · v(i | S_i)/c_i ≤ (max{p, 1 − p}/p) · G_i,   for all i ∈ A.   (2)

In order to achieve that, following [24], let E_i be any event specifying the random choices of the algorithm up to the point i is considered (if i is never considered, E_i captures all the randomness). If E_i is an event that implies i is not considered, then Eq. (2) is trivially true, due to G_i = 0 and Q_i = ∅. We focus now on the case where E_i implies that i is considered. Analyzing the algorithm, it is clear that

    E[c(Q_i) | E_i] ≤ { 0 · P(r_i = 1) + c_i · P(r_i = 0) = (1 − p) · c_i,   if i ∈ O_i,
                      { c_i · P(r_i = 1) + 0 · P(r_i = 0) = p · c_i,         otherwise.   (3)

In short, E[c(Q_i) | E_i] ≤ max{p, 1 − p} · c_i. It is here that we use the fuzziness of O: without the fractional items it would be hopeless to bound c(Q_i) with c_i.

At this point, we exploit the fact that E_i contains the information on S_i, i.e., S_i = S_i(E_i) deterministically. Recall that S_i is the solution set at the time item i is considered by the algorithm. Then

    E[G_i | E_i] = P(i ∈ S | E_i) · v(i | S_i) = P(r_i = 1) · v(i | S_i) = p · c_i · v(i | S_i)/c_i
                ≥ (p/max{p, 1 − p}) · E[c(Q_i) | E_i] · v(i | S_i)/c_i
                = (p/max{p, 1 − p}) · E[ c(Q_i) · v(i | S_i)/c_i | E_i ].

We can hence conclude the proof by using the law of total probability over E_i and the monotonicity of the conditional expectation:

    E[G_i] = E[ E[G_i | E_i] ] ≥ E[ (p/max{p, 1 − p}) · E[ c(Q_i) · v(i | S_i)/c_i | E_i ] ]
           = (p/max{p, 1 − p}) · E[ c(Q_i) · v(i | S_i)/c_i ].

Lemma 4. v(S∗) ≤ (1/(1 − p)) · E[v(S ∪ S∗)].

Proof of Lemma 4.
Let S∗ be an optimal set for the constrained submodular maximization problem. We define g : 2^A → R_+ as follows: g(T) = v(T ∪ S∗). It is a simple exercise to see that this function is indeed submodular; moreover, g(∅) = v(S∗). If we now apply Lemma 1 to g, observing that the elements in the set S output by the algorithm are chosen with probability at most p, we conclude that

    E[v(S ∪ S∗)] = E[g(S)] ≥ (1 − p) · g(∅) = (1 − p) · v(S∗).

Combining Lemmata 2, 3 and 4 we get

    (1 − p) · v(S∗) ≤ E[v(S ∪ S∗)]
                   ≤ E[ v(S) + v(ı̂) + Σ_{j ∈ A} c(Q_j) · v(j | S_j)/c_j ]
                   ≤ E[v(S)] + v(i∗) + (max{p, 1 − p}/p) · E[v(S)]
                   = max{2, 1/p} · E[v(S)] + v(i∗).   (4)

By substituting p = √2 − 1, this yields v(S∗) ≤ (3 + 2√2) · max{E[v(S)], v(i∗)}. This establishes the claimed approximation factor.

A naive implementation of SampleGreedy needs Θ(n²) value oracle calls in the worst case. Indeed, in each iteration all the remaining elements have their marginals updated, and for large enough B the greedy solution may contain a constant fraction of A. Applying lazy evaluations [37], however, we can cut the number of queries down to O(nε⁻¹ log(n/ε)) losing only an ε in the approximation factor (see also [16]). To achieve this, instead of recomputing all the marginals at every step, we maintain an ordered queue of the elements sorted by their last known densities (i.e., their marginal value per cost ratios) and use it to get a sufficiently good element to add.

More formally, the lazy implementation of SampleGreedy maintains the elements in a priority queue in decreasing order of density, which is initialised using the ratios v(i)/c_i. At each step we pop the element on top of the queue. If its density with respect to the current solution is within a 1 + ε factor of its old one, then it is picked by the algorithm; otherwise it is reinserted into the queue according to its new density and we pop the next element. Submodularity guarantees that the density of a picked element is at least 1/(1 + ε) of the best density for that step. As soon as an element has been updated log(n/ε)/ε times, we discard it.

Theorem 3.
The lazy version of SampleGreedy achieves an approximation factor of 3 + 2√2 + ε using O(nε⁻¹ log(n/ε)) value oracle calls.

Proof. For a given ε ∈ (0, 1) let ε′ = ε/6. We perform lazy evaluations using ε′, with log denoting the binary logarithm.

It is straightforward to argue about the number of value oracle calls. Since the marginal value of each element i has been updated at most log(n/ε′)/ε′ times, we have a total of at most n · log(n/ε′)/ε′ = O(n log(n/ε)/ε) function evaluations.

The approximation ratio is also easy to show. There are two distinct sources of loss in approximation. We first bound the total value of the elements discarded due to too many updates. This value appears as the upper bound of an extra additive term in Eq. (1). Indeed, now besides Σ_{j=1}^{ℓ} v(Q_j | S ∪ ⋃_{r=j+1}^{ℓ} Q_r) we need to account for the elements of O that were ignored because of too many updates. Such elements, once they become “inactive”, do not contribute to the cost of the current O and are never pushed out as new elements come into S. The definition of the sets Q_j in the proof of Theorem 2 should be adjusted accordingly. That is, if W_j are the elements of O that become inactive because they were updated too many times during iteration j, we have

    v((S ∪ S∗) \ ı̂) − v(S) ≤ Σ_{j=1}^{ℓ} v(Q_j | S_j) + Σ_{j=1}^{ℓ} v(W_j | S_j).

However, by noticing that for x ∈ (0, 1) it holds that x ≤ log(1 + x), we have

    Σ_{j=1}^{ℓ} v(W_j | S_j) ≤ Σ_{i ∈ ⋃_j W_j} (1 + ε′)^{−log(n/ε′)/ε′} · v(i)
                             ≤ Σ_{i ∈ ⋃_j W_j} (1 + ε′)^{−log(n/ε′)/log(1+ε′)} · max_{k ∈ A} v(k)
                             ≤ Σ_{i ∈ A} 2^{−log(n/ε′)} · v(S∗)
                             = Σ_{i ∈ A} (ε′/n) · v(S∗) = ε′ · v(S∗).

For the second source of loss in approximation, recall that the marginals only decrease, due to submodularity. So, we know that if some item j is considered during iteration j (following the renaming of Lemma 2), then (1 + ε′) · v(j | S_j)/c_j ≥ max_{k ∈ F} v(k | S_j)/c_k. The only difference this makes (compared to the proof of Theorem 2) is that in the last inequality of Eq. (1) we have an extra factor of 1 + ε′.

Combining the above, we get the following analog of Lemma 2:

    v(S ∪ S∗) ≤ v(S) + v(ı̂) + ε′ · v(S∗) + Σ_{j ∈ A} (1 + ε′) · c(Q_j) · v(j | S_j)/c_j,

which carries over to Eq. (4), while Lemmata 3 and 4 are not affected at all. It is then a matter of simple calculations to see that for p = √2 − 1 we still get v(S∗) ≤ (3 + 2√2 + ε) · max{E[v(S)], v(i∗)}.

Additionally, our analysis implies that SampleGreedy performs significantly better in the large instance scenario, i.e., when the value of the optimal solution is much larger than the value of any single element. While it is not expected to have exact knowledge of the factor δ in the following proposition, often some estimate is accessible. Especially for massive instances, it is reasonable to assume that δ is bounded by a very small constant.

Theorem 4. If max_{i ∈ A} v(i) ≤ δ · opt for δ ∈ (0, 1/2), then SampleGreedy with p = (1 − δ)/2 is a (4 + ε_δ)-approximation algorithm, where ε_δ = 4δ(2 − δ)/(1 − δ)².

Proof.
Starting from Eq. (4) and exploiting the large instance property, we get

(1 − p) v(S∗) ≤ max{2, 1/p} · E[v(S)] + v(i∗) ≤ max{2, 1/p} · E[v(S)] + δ · v(S∗).

Rearranging the terms and assuming p + δ < 1, we have

v(S∗) ≤ (max{2, 1/p} / (1 − p − δ)) · E[v(S)].

Optimizing for p ∈ (0, 1 − δ), we get the desired statement: the right hand side is minimized at p = (1 − δ)/2, where it equals 4/(1 − δ)² · E[v(S)] = (4 + ε_δ) E[v(S)].

In this section we modify
SampleGreedy to achieve a good approximation guarantee in the adaptive framework. Recall that the adaptive valuation function v(·, ·) depends on the state of the system, which is discovered a bit at a time, in an adaptive fashion. Indeed, SampleGreedy is compatible with this framework and can be applied nearly as is. We stick to the interpretation of SampleGreedy discussed right before Theorem 2. That is, there is no initial sampling phase. Instead, we directly begin to choose greedily with respect to the density (marginal value with respect to the current solution over cost). Each time we are about to pick an element of A, we toss a p-biased coin that determines whether we keep or discard the element.

Here the main difference from the greedy part of SampleGreedy is that the marginals are to be considered with respect to the partial realization relative to the current solution. Moreover, since it is not possible to return the largest between max_{i∈A} v(i) and the result of the greedy exploration, the choice between these two quantities has to be settled before starting the exploration. Formally, at the beginning of the algorithm a p₁-biased coin is tossed to decide between the two. The pseudocode for the resulting algorithm, AdaptiveGreedy, is given below.

Before proving that AdaptiveGreedy works as promised, we need some observations. Let us denote by S the output of a run of our algorithm, and by S∗ the output of a run of the optimal adaptive strategy. Fix a realization ω ∈ Ω. Now, Lemma 1 of [26] implies

E[v(S ∪ S∗, ω) | ω] ≥ (1 − p) · v(S∗, ω),

where p = p₂ is the acceptance probability in the greedy phase.

AdaptiveGreedy
1:  Let r ∼ Bernoulli(p₁)
2:  if r = 1 then
3:      i∗ ∈ arg max_{k∈A} v(k)                  /* best single item in expectation */
4:      Observe ω_{i∗} and return v(i∗, ω_{i∗})
5:  S = ∅, R = B                                  /* greedy solution and remaining knapsack capacity */
6:  F = {k ∈ A | v(k) > 0}                        /* initial set of candidate items */
7:  while F ≠ ∅ do
8:      Let i ∈ arg max_{k∈F} v(k | (S, ω|_S))/c_k
9:      Let r_i ∼ Bernoulli(p₂)                   /* independent random bit */
10:     if r_i = 1 then
11:         Observe ω_i: S = S ∪ {i}, R = R − c_i
12:     A = A \ {i}, F = {k ∈ A | v(k | (S, ω|_S)) > 0, c_k ≤ R}
13: return S, v(S, ω|_S)

Since ω (and therefore S∗) is fixed, the only randomness is due to the coin flips in our algorithm. We stress that the union of S and S∗ is to be interpreted in the following sense: run our algorithm and, independently, also the optimal one, both for the same realization ω. The previous inequality is true for any ω. So, by the law of total probability, we also have

E[v(S ∪ S∗)] ≥ (1 − p) · E[v(S∗)].     (5)

For the next observation, assume our algorithm has picked (and therefore observed) exactly the set S. That is, we know only ω|_S. We number all items a ∈ A with positive marginal with respect to S by decreasing ratio v(a | (S, ω|_S))/c_a, i.e., a₁ = arg max_{a∈A} v(a | (S, ω|_S))/c_a, and so on. Note that this captures a notion of the best-looking items after already adding S. For k = min{i ∈ ℕ | Σ_{l=1}^i c_{a_l} ≥ B}, we get, in analogy to Lemma 1 of Gotovos et al. [26],

Σ_{i=1}^k v(a_i | (S, ω|_S)) ≥ E[Σ_{a∈S∗} v(a | (S, ω|_S)) | ω|_S] ≥ v(S∗ | (S, ω|_S)).     (6)

Note that it could be the case that k is not well defined, as there may not be enough elements with positive marginal to fill the knapsack. If that is the case, just take k to be the number of elements with positive marginals.

The point of Eq. (6) is that, given (S, ω|_S), the set of elements a₁, ..., a_k is deterministic, while S∗ is not, because it corresponds to the set opened by the best adaptive policy. Moreover, notice that in the middle term the conditioning influences the valuation, but not the policy, since we assume it is run obliviously. This is fundamental for the analysis.

Since this holds for any set S, we can again generalize to the expectation over all possible runs of the algorithm (fixing the coin flips or not, as they only influence S; the best adaptive policy and the best-looking items a₁, a₂, ..., a_k are not affected). So, we get

E[Σ_{i=1}^k v(a_i | (S, ω|_S))] ≥ E[v(S∗ | (S, ω|_S))].     (7)

We remark that k above is a random variable which depends on S. We use these observations to prove the approximation ratio of our algorithm.

Theorem 5.
For p₁ = 1/6 and p₂ = 1/3, AdaptiveGreedy yields a 9-approximation of opt_Ω, while its lazy version achieves a (9 + ε)-approximation using O(n ε⁻¹ log(n/ε)) value oracle calls. Moreover, when max_{i∈A} v(i) ≤ δ · opt_Ω for δ ∈ (0, 1/2), then for p₁ = 0 and p₂ = (√(3 − 2δ) − 1)/2, AdaptiveGreedy yields a (4 + 2√3 + ε′_δ)-approximation, where 4 + 2√3 + ε′_δ = 4/(√(3 − 2δ) − 1)² and ε′_δ → 0 as δ → 0.

Proof. For any run of the algorithm, i.e., a fixed set S, the corresponding partial realization ω|_S and the coin flips observed, define for convenience the set C as those items in {a₁, ..., a_k} that have been considered during the algorithm and then not added to S because of the coin flips. Define U = {a₁, ..., a_k} \ C. Additionally, define C′ to be the set of all items that are considered, but not chosen, during the run of our algorithm and have positive expected marginal contribution to S. That is, C captures the items from the good-looking set after choosing S that we missed due to coin tosses, and C′ all items we missed for the same reason which should have had a positive contribution in hindsight. Note that C ⊆ C′. Throughout the proof we write p for p₂.

We can then split the left hand side of Eq. (7) into two parts: the sum over C (upper bounded by the sum over C′) and the sum over U. We now control these terms separately, using linear combinations of v(S) and v(i∗).

Lemma 5. E[v(S)] ≥ p · E[Σ_{a∈C} v(a | (S, ω|_S))].

Proof. Since C ⊆ C′ and C′ contains all considered elements with nonnegative expected contribution to S, it is sufficient to show E[v(S)] ≥ p · E[Σ_{a∈C′} v(a | (S, ω|_S))].

We proceed as in Lemma 3. Consider, for each a ∈ A, the events E_a capturing the history of a run of the algorithm up to the point where element a is considered (the whole history if it is never considered). Let G_a be the marginal contribution of element a to the solution set S. If E_a corresponds to a history in which element a is not considered, then it contributes to neither the left nor the right hand side of the inequality we are trying to prove. Otherwise, let (S_a, ω_a) be the partial solution and realization when a is indeed considered:

E[G_a | E_a] = p · v(a | (S_a, ω_a)) ≥ p · E[v(a | (S, ω|_S)) | E_a].

The statement follows from the law of total probability with respect to the E_a, and state-wise submodularity of v.

Lemma 6. E[v(S)] + v(i∗) ≥ E[Σ_{a∈U} v(a | (S, ω|_S))].

Proof.
Let us now turn to the items in U, which were not considered by the algorithm. The intuition behind the claim is that if they were not considered, then they were not good enough, in expectation, to compare with S. The proof, though, has to deal with some probabilistic subtleties.

Let us start by fixing a history of the algorithm, i.e., the coin tosses and (S, ω|_S), with S = s₁, s₂, ..., s_T numbered according to insertion order, i.e., s_t is the t-th element added to S. For the sake of simplicity, let us also renumber the elements of U as a₁, ..., a_l, respecting the order given by the marginals over costs.

There are two cases. If during the whole algorithm the elements in U have ratio v(a | (S_t, ω_t))/c_a smaller than that of the item which was considered instead, then one can easily argue, by adaptive submodularity, that

Σ_{a∈U} v(a | (S, ω|_S)) ≤ Σ_{t=1}^T v(s_t | (S_t, ω_t)) + v(u | (S, ω|_S)) ≤ Σ_{t=1}^T v(s_t | (S_t, ω_t)) + v(u) ≤ Σ_{t=1}^T v(s_t | (S_t, ω_t)) + v(i∗),

where S_t = (s₁, ..., s_{t−1}) and ω_t is the restriction of ω|_S to S_t. Note that the last element u is added to account for the budget left unspent by the solution. We claim that the above inequality holds also in the case in which there is an element in U whose marginal over cost is greater than that of some element in S. Such an element can exist because of the budget constraint: during the algorithm it had better marginal over cost, but was discarded because there was not enough room for it. We observe that there can exist at most one such element, due to the budget constraint, and its value is accounted for by the term v(u), so the above formula still holds.

Once we know that, by the law of total probability, we have

E[Σ_{a∈U} v(a | (S, ω|_S))] ≤ E[v(S)] + v(i∗),     (8)

concluding the proof.

Combining the two lemmata, we get

(1 + 1/p) E[v(S)] + E[v(i∗)] ≥ E[Σ_{a∈U} v(a | (S, ω|_S))] + E[Σ_{a∈C} v(a | (S, ω|_S))] = E[Σ_{a∈U∪C} v(a | (S, ω|_S))].

Eq. (7) then implies

(1 + 1/p) E[v(S)] + E[v(i∗)] ≥ E[Σ_{a∈U∪C} v(a | (S, ω|_S))] ≥ E[v(S∗ | (S, ω|_S))].

Also, by Eq. (5) and some algebra,

E[v(S∗ | (S, ω|_S))] = E[E[v(S∗ ∪ S, ω) − v(S, ω) | ω|_S]] = E[v(S∗ ∪ S)] − E[v(S)] ≥ (1 − p) · E[v(S∗)] − E[v(S)].

All together, denoting E[v(S∗)] by OPT, we get

(2p + 1) E[v(S)] + p E[v(i∗)] ≥ p(1 − p) OPT.     (9)

Let ALG denote the expected value of the solution output by the algorithm. Since the algorithm chooses, with a coin flip, either the best expected single item or S, it holds that

ALG = (1 − p₁) E[v(S)] + p₁ E[v(i∗)].

Picking p₁ = p/(3p + 1),

ALG = ((2p + 1)/(3p + 1)) E[v(S)] + (p/(3p + 1)) E[v(i∗)] ≥ (p(1 − p)/(3p + 1)) OPT.
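As a quick numeric sanity check on the calculation above (our own illustration, not part of the original analysis), one can verify that the coefficient p(1 − p)/(3p + 1) of OPT is maximized around p = 1/3, where it equals 1/9:

```python
# Numeric sanity check: the coefficient of OPT in the bound
# ALG >= p(1 - p)/(3p + 1) * OPT is maximized at p = 1/3, giving 1/9.

def rhs_coefficient(p: float) -> float:
    """Coefficient of OPT in the lower bound on ALG."""
    return p * (1 - p) / (3 * p + 1)

# Simple grid search; adequate for a one-dimensional sanity check.
grid = [i / 10**5 for i in range(1, 10**5)]
best_p = max(grid, key=rhs_coefficient)

print(best_p)                   # close to 1/3
print(rhs_coefficient(best_p))  # close to 1/9, i.e., a 9-approximation
```

Setting p₁ = p/(3p + 1) with p = 1/3 then gives p₁ = 1/6, matching the constants in the statement of Theorem 5.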
The right hand side is maximized for p = 1/3, yielding p₁ = 1/6 and a factor of 1/9, which concludes the proof of the first part of the statement. The lazy version of AdaptiveGreedy is analogous to the non-adaptive setting, both for the algorithm and the analysis, so we do not repeat the proof.

In order to prove the last part of the statement, we start from Eq. (9), divide by p, and apply the large instance property:

(1 − p) E[v(S∗)] ≤ (2 + 1/p) E[v(S)] + E[v(i∗)] ≤ (2 + 1/p) E[v(S)] + δ · E[v(S∗)].

Rearranging terms and assuming p + δ < 1,

E[v(S∗)] ≤ ((2 + 1/p)/(1 − p − δ)) · E[v(S)].

Optimizing for p ∈ (0, 1 − δ), we get the claimed result. Specifically, the right hand side is minimized at p = (√(3 − 2δ) − 1)/2, where the approximation factor becomes 4/(√(3 − 2δ) − 1)² = 4 + 2√3 + ε′_δ, with ε′_δ → 0 as δ → 0.

Out of the numerous applications of submodular maximization subject to a knapsack constraint, we evaluate
SampleGreedy and AdaptiveGreedy on two selected examples, using real and synthetic graph topologies. Variants of these have been studied in a similar context; see [39]. As our algorithms are randomized, but extremely fast, we use the best output out of 5 independent runs. A delicate point is tuning the probability of acceptance p (line 9 of AdaptiveGreedy) for improved performance. While the choices of p in Theorems 2 and 5 minimize our theoretical worst-case approximation factor, two factors suggest that a value much closer to 1 works best in practice: the small value of any singleton solution, and the much better guarantee of Lemma 4 for the most widely used non-monotone submodular objectives. We do not micro-optimize for p but rather choose it uniformly at random from a fixed subinterval of [0, 1] close to 1.

Video Recommendation:
Suppose we have a large collection A of videos from various categories (represented as possibly intersecting subsets C₁, ..., C_k ⊆ A) and we want to design a recommendation system. When a user inputs a subset of categories and a target total length B, the system should return a set of videos from the selected categories of total duration at most B that maximizes an appropriate objective function. (Of course, instead of time, we could use costs and a budget constraint here.) Each video has a rating, and there is some measure of similarity between any two videos. We use a weighted graph on A to model the latter: each edge {i, j} between two videos i and j has a weight w_ij ∈ [0, 1] capturing the percentage of their similarity. To pave the way for our v(·), we start from the auxiliary objective

f(S) = Σ_{i∈S} Σ_{j∈A} w_ij − λ Σ_{i∈S} Σ_{j∈S} w_ij,

for some λ ≥ 1. This is a maximal-marginal-relevance inspired objective [7] that rewards coverage, while penalizing similarity. For λ = 1, internal similarities are irrelevant and f becomes a cut function. However, one can penalize similarities even more severely, as f remains submodular for larger λ (here we used λ = 5).

In order to mimic the effect of a partition matroid constraint, i.e., the avoidance of many videos from the same category, we may use two parameters λ ≥ 1, μ ≥ 0. While λ is as above, μ puts extra weight on similarities between videos that belong to the same category. That leads to the more general auxiliary objective

g(S) = Σ_{i∈S} Σ_{j∈A} w_ij − Σ_{i∈S} Σ_{j∈S} (λ + χ_ij μ) w_ij,

where χ_ij is equal to 1 if there exists ℓ such that i, j ∈ C_ℓ, and 0 otherwise. To interpolate between choosing highly rated videos and videos that represent the whole collection well, here we use the submodular function v(S) = α Σ_{i∈S} ρ_i + β g(S) for α, β ≥ 0, where ρ_i is the rating of video i. We use λ = 3, μ = 7 and set the parameters α, β so that the two terms are of comparable size.

We evaluate SampleGreedy on an instance based on the latest version of the MovieLens dataset [29], which includes 62000 movies, 13816 of which have both user-generated tags and ratings. We calculate the weights w_ij using these tags (via the L2 norm of the pairwise minimum tag vector; see Appendix A), while the costs are drawn independently from U(0, 1). SampleGreedy consistently performs better than FANTOM for a wide range of budgets (Fig. 1a). Plotting the number of function evaluations against the budget, SampleGreedy is much faster (Fig. 1d) despite the fact that it is run 5 times!
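The objective above is easy to state in code. The following sketch (our own illustration; the helper names, default parameters, and toy data are ours) computes g(S) and v(S) with the values λ = 3, μ = 7 used in the experiments:

```python
def g(S, w, categories, lam=3.0, mu=7.0):
    """Auxiliary objective g from the text: coverage minus a penalty on
    internal similarity, with extra weight mu on same-category pairs.
    w is a symmetric similarity matrix with w[i][i] = 0; categories is a
    list of (possibly intersecting) sets C_1, ..., C_k."""
    n = len(w)
    coverage = sum(w[i][j] for i in S for j in range(n))

    def same_cat(i, j):
        return any(i in C and j in C for C in categories)

    penalty = sum((lam + (mu if same_cat(i, j) else 0.0)) * w[i][j]
                  for i in S for j in S)
    return coverage - penalty

def v(S, w, categories, ratings, alpha=1.0, beta=1.0):
    """Full objective: alpha * sum of ratings plus beta * g(S)."""
    return alpha * sum(ratings[i] for i in S) + beta * g(S, w, categories)
```

On a toy three-video instance one can check that adding a second, highly similar same-category video decreases g, i.e., the objective is indeed non-monotone.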
Remark 1. The running time of FANTOM for fixed ε is O(nr log n), where r is the cardinality of the largest feasible solution. For a knapsack constraint this translates to O(n² log n). To be as fair as possible, we implemented FANTOM using lazy evaluations, which improves the number of evaluations of the objective function to O(n log² n) and is indeed much faster in practice for the knapsack sizes we consider. Even so, our SampleGreedy is faster by a factor of Ω(log n) which, including the improvement in the constants involved, still makes a huge difference. Note that in both Figs. 1d and 1e one can discern the superlinear increase of the function evaluations for FANTOM but not for SampleGreedy.

Influence-and-Exploit Marketing:
Consider a seller of a single digital good (i.e., producing extra units of the good comes at no extra cost) and a social network on a set A of potential buyers. Suppose that the buyers influence each other, and that this is quantified by a weight w_ij on each edge {i, j} between buyers i and j. Each buyer's value for the good depends on who owns it within her immediate social circle and how they influence her. A possible revenue-maximizing strategy for the seller is to first give the item for free to a selected set S of influential buyers (influence phase) and then extract revenue by selling to each of the remaining buyers at a price matching their value for the item due to the influential nodes (exploit phase). Here we further assume, similarly to the adaptation of this model by Mirzasoleiman et al. [39], that each buyer comes with a cost of convincing her to advertise the product to her friends. The seller has a budget B, and the set S should be such that Σ_{i∈S} c_i ≤ B.
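The exploit-phase revenue of a seed set S can be sketched as follows (our own illustration; the square-root valuations anticipate the concrete instantiation f_i(x) = a_i √x used in the experiments described later in this section). Since seeded buyers receive the good for free, they generate no revenue themselves, which is exactly why the objective is non-monotone:

```python
import math

def revenue(S, w, a):
    """Influence-and-exploit revenue for seed set S.
    w is a symmetric weight matrix; buyer i's valuation is f_i(x) = a[i] * sqrt(x),
    evaluated at the total influence x from her seeded neighbors."""
    n = len(w)
    total = 0.0
    for i in range(n):
        if i in S:
            continue  # seeded buyers get the good for free: no revenue from them
        influence = sum(w[i][j] for j in S)
        total += a[i] * math.sqrt(influence)
    return total
```

For instance, seeding every buyer yields zero revenue, while adding a buyer to S can lower the total by removing her own contribution to the exploit phase.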
(a) Recommendation on MovieLens. (b) Graph Cut on G(n, 0.2). (c) Revenue on YouTube graph. (d) Recommendation on MovieLens. (e) Graph Cut on G(n, 0.2). (f) Revenue on G(n, 1/√n).

Figure 1: The four plots on the left compare the performance and the number of function evaluations of SampleGreedy and FANTOM on the video recommendation problem for the MovieLens dataset ((a), (d)) and on the maximum weighted cut problem on random graphs ((b), (e)). We use ε = 1 to achieve the maximum possible speedup. The plots on the far right illustrate the performance of AdaptiveGreedy (ignoring single item solutions, i.e., p₁ = 0) on the influence-and-exploit problem for two distinct topologies: the YouTube graph (c) and random graphs (f). All budgets are shown as fractions of the total cost.

We adopt the generalization of the
Concave Graph Model of Hartline et al. [30] to non-monotone functions [2]. Each buyer i ∈ A is associated with a non-negative concave function f_i. For any i ∈ A and any set S ⊆ A \ {i} of agents already owning the good, the value of i for the good is v_i(S) = f_i(Σ_{j∈S∪{i}} w_ij). The total potential revenue v(S) = Σ_{i∈A\S} v_i(S) that we aim to maximize is a non-monotone submodular function. Besides the theoretical guarantees for influence-and-exploit marketing in the Bayesian setting [30], there is strong experimental evidence of its performance in practice [2]. The problem generalizes naturally to different stochastic versions. We assume that the valuation function of each buyer i is of the form f_i(x) = a_i √x, where a_i is drawn independently from a Pareto Type II distribution with λ = 1, α = 2. We only learn the exact value of a buyer when we give the good for free to someone in her neighborhood.

We evaluate AdaptiveGreedy on an instance based on the YouTube graph [47], containing 1,134,890 vertices. The (known) weights are drawn independently from U(0, 1), and the costs are proportional to the sum of the weights of the incident edges. As AdaptiveGreedy is the first adaptive algorithm for the problem, we compare with non-adaptive alternatives like Greedy (the simple greedy algorithm that in each step picks the element with the largest marginal value) and Density Greedy (the greedy part of Wolsey's algorithm [46]) for different values of the budget. AdaptiveGreedy outperforms the alternatives by up to 20% (Fig. 1c). We observe similar improvements for Erdős–Rényi random graphs of different sizes, with edge probability 1/√n and a fixed budget of 10% of the total cost (Fig. 1f).

Maximum Weighted Cut:
Beyond the above applications, we would like to compare SampleGreedy to FANTOM with respect to both their performance and the number of value oracle calls as n grows. We turn to weighted cut functions, one of the most prominent subclasses of non-monotone submodular functions, on dense Erdős–Rényi random graphs with edge probability 0.2. The weights and the costs are drawn independently and uniformly from [0, 1], and the budget is fixed to 15% of the total cost. Again, SampleGreedy consistently performs better than FANTOM, albeit by 5–15% (Fig. 1b). In terms of running time, there is a large difference in favor of SampleGreedy (even for multiple runs), while the superlinear increase for FANTOM is evident (Fig. 1e).
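For completeness, here is a simplified, self-contained rendition of the SampleGreedy idea on exactly this kind of weighted-cut instance (our own sketch: it omits the lazy evaluations, takes a generic value oracle, and defaults to p = √2 − 1, the value suggested by the analysis):

```python
import math
import random

def sample_greedy(items, cost, value, budget, p=math.sqrt(2) - 1, rng=random):
    """Greedy by marginal value per unit of cost, with p-biased acceptance,
    under a knapsack constraint. `value` is a set-function oracle. Returns
    the better of the greedy set and the best feasible singleton."""
    S, spent = set(), 0.0
    remaining = set(items)
    while remaining:
        # Elements that still fit in the budget and have positive marginal value.
        feasible = [i for i in remaining
                    if spent + cost[i] <= budget and value(S | {i}) - value(S) > 0]
        if not feasible:
            break
        best = max(feasible, key=lambda i: (value(S | {i}) - value(S)) / cost[i])
        remaining.discard(best)          # considered exactly once
        if rng.random() < p:             # keep the element only with probability p
            S.add(best)
            spent += cost[best]
    # Compare against the best single feasible element.
    singles = [i for i in items if cost[i] <= budget]
    best_single = max(singles, key=lambda i: value({i}), default=None)
    if best_single is not None and value({best_single}) > value(S):
        return {best_single}
    return S
```

On a small weighted graph, plugging in the cut function `lambda S: sum(w[i][j] for i in S for j in range(n) if j not in S)` as the oracle produces a feasible set of positive cut value, regardless of the coin flips.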
The proposed random greedy method yields a considerable improvement over state-of-the-art algorithms, especially, but not exclusively, regarding the handling of huge instances. With all the subtleties of our work affecting solely our analysis, the algorithm remains strikingly simple, and we are confident this will also contribute to its use in practice. Simultaneously, this very simplicity translates into a generality that can be employed to achieve comparably good results for a variety of settings; we demonstrated this in the case of the adaptive submodularity setting.

Specifically, we expect that our approach can be directly utilised to improve the performance and running time of algorithms that now use some variant of the algorithm of Gupta et al. [27]. Such examples include the distributed algorithm of da Ponte Barbosa et al. [12] and the streaming algorithm of Mirzasoleiman et al. [40] in the case of a knapsack constraint. We further suspect that the same algorithmic principle can be applied in the presence of incentives. This would largely improve the current state of the art in budget-feasible mechanism design for non-monotone objectives [11, 1].

Finally, a major question here is whether the same high level approach is valid even in the presence of additional combinatorial constraints. In particular, is it possible to achieve similar guarantees as FANTOM for a p-system and multiple knapsack constraints using only O(n log n) value queries?

References

[1] G. Amanatidis, P. Kleer, and G. Schäfer. Budget-feasible mechanism design for non-monotone submodular objectives: Offline and online. In Proceedings of the 2019 ACM Conference on Economics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019, pages 901–919. ACM, 2019.

[2] M. Babaei, B. Mirzasoleiman, M. Jalili, and M. A. Safari. Revenue maximization in social networks through discounting.
Social Netw. Analys. Mining , 3(4):1249–1262, 2013.[3] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular max-imization: massive data summarization on the fly. In
The 20th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - Au-gust 24 - 27, 2014 , pages 671–680. ACM, 2014.[4] E. Balkanski and Y. Singer. The adaptive complexity of maximizing a submodular function.In
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC2018, Los Angeles, CA, USA, June 25-29, 2018 , pages 1138–1151. ACM, 2018.[5] E. Balkanski, A. Rubinstein, and Y. Singer. An exponential speedup in parallel running timefor submodular maximization without loss in approximation. In
Proceedings of the ThirtiethAnnual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California,USA, January 6-9, 2019 , pages 283–302. SIAM, 2019.[6] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. Submodular maximization with car-dinality constraints. In
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium onDiscrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014 , pages 1433–1452. SIAM, 2014.[7] J. Carbinell and J. Goldstein. The use of MMR, diversity-based reranking for reorderingdocuments and producing summaries.
SIGIR Forum , 51(2):209–210, 2017.[8] C. Chekuri, J. Vondrák, and R. Zenklusen. Submodular function maximization via the mul-tilinear relaxation and contention resolution schemes.
SIAM J. Comput. , 43(6):1831–1879,2014.[9] C. Chekuri, S. Gupta, and K. Quanrud. Streaming algorithms for submodular function max-imization. In
Automata, Languages, and Programming - 42nd International Colloquium,ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I , volume 9134 of
LectureNotes in Computer Science , pages 318–330. Springer, 2015.[10] C. Chekuri, T. S. Jayram, and J. Vondrák. On multiplicative weight updates for concave andsubmodular function maximization. In
Proceedings of the 2015 Conference on Innovationsin Theoretical Computer Science, ITCS 2015, Rehovot, Israel, January 11-13, 2015 , pages201–210. ACM, 2015.[11] N. Chen, N. Gravin, and P. Lu. On the approximability of budget feasible mechanisms. In
Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms,SODA 2011, San Francisco, California, USA, January 23-25, 2011 , pages 685–699, 2011.[12] R. da Ponte Barbosa, A. Ene, H. L. Nguyen, and J. Ward. The power of randomization:Distributed submodular maximization on massive datasets. In
ICML , volume 37 of
JMLR Workshop and Conference Proceedings, pages 1236–1244. JMLR.org, 2015. [13] A. Das and D. Kempe. Algorithms for subset selection in linear regression. In
Proceedingsof the 40th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia,Canada, May 17-20, 2008 , pages 45–54. ACM, 2008.[14] A. Das and D. Kempe. Approximate submodularity and its applications: Subset selection,sparse approximation and dictionary selection.
Journal of Machine Learning Research , 19(3):1–34, 2018.[15] J. Edmonds. Matroids and the greedy algorithm.
Math. Program., 1(1):127–136, 1971. [16] A. Ene and H. L. Nguyen. A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, volume 132 of
LIPIcs , pages53:1–53:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.[17] S. Fadaei, M. Fazli, and M. Safari. Maximizing non-monotone submodular set functions subjectto different constraints: Combined algorithms.
Oper. Res. Lett. , 39(6):447–451, 2011.[18] M. Fahrbach, V. S. Mirrokni, and M. Zadimoghaddam. Submodular maximization with nearlyoptimal approximation, adaptivity and query complexity. In
Proceedings of the ThirtiethAnnual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California,USA, January 6-9, 2019 , pages 255–273. SIAM, 2019.[19] M. Fahrbach, V. S. Mirrokni, and M. Zadimoghaddam. Non-monotone submodular maxi-mization with nearly optimal adaptivity and query complexity. In
Proceedings of the 36thInternational Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach,California, USA , volume 97 of
Proceedings of Machine Learning Research , pages 1833–1842.PMLR, 2019.[20] U. Feige. A threshold of ln( n ) for approximating set cover. J. ACM , 45(4):634–652, 1998.[21] U. Feige, V. S. Mirrokni, and J. Vondrák. Maximizing non-monotone submodular functions.
SIAM J. Comput. , 40(4):1133–1153, 2011.[22] M. Feldman and R. Zenklusen. The submodular secretary problem goes linear.
SIAM J.Comput. , 47(2):330–366, 2018.[23] M. Feldman, J. Naor, and R. Schwartz. A unified continuous greedy algorithm for submodularmaximization. In
IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS2011, Palm Springs, CA, USA, October 22-25, 2011 , pages 570–579. IEEE Computer Society,2011.[24] M. Feldman, C. Harshaw, and A. Karbasi. Greed is good: Near-optimal submodular maxi-mization via greedy optimization. In
Proceedings of the 30th Conference on Learning Theory,COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017 , volume 65 of
Proceedings of Ma-chine Learning Research , pages 758–784. PMLR, 2017.[25] D. Golovin and A. Krause. Adaptive submodularity: Theory and applications in active learningand stochastic optimization.
J. Artif. Intell. Res. , 42:427–486, 2011.[26] A. Gotovos, A. Karbasi, and A. Krause. Non-monotone adaptive submodular maximization.In
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1996–2003. AAAI Press, 2015. [27] A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In
Internet and Network Economics - 6thInternational Workshop, WINE 2010, Stanford, CA, USA, December 13-17, 2010. Proceedings ,volume 6484 of
LNCS , pages 246–257. Springer, 2010.[28] A. Gupta, V. Nagarajan, and S. Singla. Adaptivity gaps for stochastic probing: Submodularand XOS functions. In
Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium onDiscrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19 , pages1688–1702. SIAM, 2017.[29] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context.
ACM Trans.Interact. Intell. Syst. , 5(4), Dec. 2015. ISSN 2160-6455.[30] J. D. Hartline, V. S. Mirrokni, and M. Sundararajan. Optimal marketing strategies over socialnetworks. In
Proceedings of the 17th International Conference on World Wide Web, WWW2008, Beijing, China, April 21-25, 2008 , pages 189–198. ACM, 2008.[31] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through asocial network.
Theory of Computing , 11:105–147, 2015.[32] R. Khanna, E. R. Elenberg, A. G. Dimakis, S. N. Negahban, and J. Ghosh. Scalable greedyfeature selection via weak submodularity. In
Proceedings of the 20th International Conferenceon Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale,FL, USA , volume 54 of
Proceedings of Machine Learning Research , pages 1560–1568. PMLR,2017.[33] S. Khuller, A. Moss, and J. S. Naor. The budgeted maximum coverage problem.
Informationprocessing letters , 70(1):39–45, 1999.[34] A. Kulik, H. Shachnai, and T. Tamir. Approximations for monotone and nonmonotone sub-modular maximization with knapsack constraints.
Math. Oper. Res. , 38(4):729–739, 2013.[35] J. Lee, V. S. Mirrokni, V. Nagarajan, and M. Sviridenko. Maximizing nonmonotone sub-modular functions under matroid or knapsack constraints.
SIAM J. Discrete Math. , 23(4):2053–2078, 2010.[36] H. Lin and J. A. Bilmes. Multi-document summarization via budgeted maximization of sub-modular functions. In
Human Language Technologies: Conference of the North AmericanChapter of the Association of Computational Linguistics, Proceedings, June 2-4, 2010, LosAngeles, California, USA , pages 912–920. The Association for Computational Linguistics,2010.[37] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In
Optimization Techniques , pages 234–243, Berlin, Heidelberg, 1978. Springer Berlin Heidelberg.[38] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximiza-tion: Identifying representative elements in massive data. In
Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, NIPS 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 2049–2057, 2013. [39] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi. Fast constrained submodular maximization: Personalized data summarization. In
Proceedings of the 33nd International Conferenceon Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016 , volume 48of
JMLR , pages 1358–1367. JMLR.org, 2016.[40] B. Mirzasoleiman, S. Jegelka, and A. Krause. Streaming non-monotone submodular maximiza-tion: Personalized video summarization on the fly. In
Proceedings of the Thirty-Second AAAIConference on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, USA, February2-7, 2018 , pages 1379–1386. AAAI Press, 2018.[41] M. Mitrovic, E. Kazemi, M. Feldman, A. Krause, and A. Karbasi. Adaptive sequence sub-modularity. In
Advances in Neural Information Processing Systems 32: Annual Conference onNeural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver,BC, Canada , pages 5353–5364, 2019.[42] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maxi-mizing submodular set functions - I.
Math. Program., 14(1):265–294, 1978.

[43] R. Sipos, A. Swaminathan, P. Shivaswamy, and T. Joachims. Temporal corpus summarization using submodular word coverage. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, pages 754–763. ACM, 2012.

[44] M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett., 32(1):41–43, 2004.

[45] S. Tschiatschek, R. K. Iyer, H. Wei, and J. A. Bilmes. Learning mixtures of submodular functions for image collection summarization. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, NIPS 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1413–1421, 2014.

[46] L. A. Wolsey. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Math. Oper. Res., 7(3):410–425, 1982.

[47] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.
Additional Details on Section 5
All graphs contain error bars indicating the standard deviation between different runs of the experiments. This deviation is usually insignificant due to the concentrating effect of the large size of the instances, despite the randomly initialized weights and the inherent randomness of the algorithms used. Nevertheless, all results are obtained by running each experiment a number of times. For all algorithms involved, we use lazy evaluations with ε = 0.

Video Recommendation:
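As a concrete illustration of the similarity measure that is defined in detail below, the computation of w_ij reduces to a few lines. This is a minimal sketch assuming numpy; the function name is illustrative:

```python
import numpy as np

def similarity(t_i: np.ndarray, t_j: np.ndarray) -> float:
    """w_ij: the L2 norm of the coordinate-wise minimum of two tag vectors.

    Illustrative helper; tag vectors are assumed to lie in [0, 1]^d.
    """
    m = np.minimum(t_i, t_j)               # coordinate-wise minimum
    return float(np.sqrt(np.sum(m ** 2)))  # its L2 norm
```

Note that under this measure a movie is always at least as similar to itself as to the all-1 vector, since similarity(t, t) and similarity(t, 1) both equal the L2 norm of t; an inner-product metric lacks this property.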
We expand on the exact definition of the similarity measure that is only tersely described in the main text. Each movie i is associated with a tag vector t_i ∈ [0, 1]^d, where each of the d coordinates contains a relevance score for the corresponding tag. These tag vectors are not normalized and have no additional structure, other than each coordinate being restricted to [0, 1]. We define the similarity w_{ij} between two movies i, j as

w_{ij} = \sqrt{\sum_{k=1}^{d} \left( \min\{t_{ik}, t_{jk}\} \right)^2 }.

In other words, it is the L2 norm of the coordinate-wise minimum of t_i and t_j. This metric was chosen so that if both movies have a high value in some tag, this counts as a much stronger similarity than one having a high value and the other a low one. For example, if we consider an inner product metric, any movie with all tags set to 1 would be as similar as possible to all other movies, even though it would include many tags that are missing from the others. In particular, any movie would appear more similar to the all-1 movie than to itself! Choosing the minimum of both tags avoids this issue. Another possibility would be to normalize each tag vector before taking the inner product, to obtain the cosine similarity. Although this alleviates some of the issues, there is some information loss, as one movie could meaningfully have higher scores in all tags than another one; tags are not mutually exclusive. Ultimately, any sensible metric has advantages and disadvantages, and the exact choice has little bearing on our results. The similarity scores are then divided by their maximum as a final normalization step.

The experiment was repeated 5 times. The budget is represented as a fraction of the total cost, starting at 1/100 and geometrically increasing to 1/10 in 10 steps. The total computation time was around 3 hours.

Influence-and-Exploit Marketing:
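The random instances for this experiment, described below, are Erdős–Rényi graphs G(n, p) with edge probability p = 5/√n. Sampling such a graph can be sketched as follows; this is a minimal stdlib-only sketch, and the function name is illustrative:

```python
import math
import random

def erdos_renyi_edges(n: int, seed: int = 0) -> list:
    """Sample the edge list of G(n, p) with p = 5 / sqrt(n).

    Illustrative helper: each of the n*(n-1)/2 possible edges is
    included independently with probability p (capped at 1).
    """
    p = min(1.0, 5 / math.sqrt(n))
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]
```

For n = 50 this gives p ≈ 0.71, so the smallest instances are quite dense, while at n = 2500 the probability drops to 0.1 and the graphs become sparser relative to their size.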
For the YouTube graph, the experiment was repeated 5 times for a budget starting at 1/100 of the total cost and geometrically increasing to 1/3 in 20 steps, leading to a total computation time of 7 hours. For the Erdős–Rényi graph with n vertices and edge probability 5/√n, it was repeated 10 times, for n starting at 50 and geometrically increasing to 2500 in 20 steps, taking approximately 10 minutes.

Maximum Weighted Cut:
The experiment was repeated 10 times for nn