Submodular Maximization subject to a Knapsack Constraint: Combinatorial Algorithms with Near-optimal Adaptive Complexity
Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, Alberto Marchetti Spaccamela, Rebecca Reiffenhäuser
SSubmodular Maximization subject to a Knapsack Constraint:Combinatorial Algorithms with Near-optimal Adaptive Complexity ∗Georgios Amanatidis † Federico Fusco ‡ Philip Lazos ‡ Stefano Leonardi ‡ Alberto Marchetti Spaccamela ‡ Rebecca Reiffenhäuser ‡ February 17, 2021
Abstract
The growing need to deal with massive instances motivates the design of algorithms balancingthe quality of the solution with applicability. For the latter, an important measure is the adaptivecomplexity , capturing the number of sequential rounds of parallel computation needed. In thiswork we obtain the first constant factor approximation algorithm for non-monotone submodularmaximization subject to a knapsack constraint with near-optimal O (log n ) adaptive complexity.Low adaptivity by itself, however, is not enough: one needs to account for the total numberof function evaluations (or value queries) as well. Our algorithm asks ˜ O ( n ) value queries, butcan be modified to run with only ˜ O ( n ) instead, while retaining a low adaptive complexity of O (log n ). Besides the above improvement in adaptivity, this is also the first combinatorial approach with sublinear adaptive complexity for the problem and yields algorithms comparableto the state-of-the-art even for the special cases of cardinality constraints or monotone objectives.Finally, we showcase our algorithms’ applicability on real-world datasets. Submodular optimization is a very popular topic relevant to various research areas as it captures thenatural notion of diminishing returns . Its numerous applications include viral marketing [27, 29],data summarization [39, 35], feature selection [13, 14, 36] and clustering [34]. Prominent examplesfrom combinatorial optimization are cut functions in graphs and coverage functions.Submodularity is often implicitly associated with monotonicity, and many results rely on thatassumption. However, non-monotone submodular functions do naturally arise in applications,either directly or from combining monotone submodular objectives and modular penalization or regularization terms [27, 39, 35, 6, 1]. Additional constraints, like cardinality, matroid, knapsack,covering, and packing constraints, are prevalent in applications and have been extensively studied.In this list, knapsack constraints are among the most natural, as they capture limitations on budget,time, or size of the elements. Like matroid constraints, they generalize cardinality constraints, yetthey are not captured by the former. ∗ This work was supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism DesignResearch in Online Markets”, the MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”,and the NWO Veni project No. VI.Veni.192.153. † Department of Mathematical Sciences, University of Essex, UK, and Institute for Logic, Language and Computation,University of Amsterdam, The Netherlands. Email: [email protected] ‡ Department of Computer, Control, and Management Engineering “Antonio Ruberti”, Sapienza University ofRome, Italy. Email: {fuscof, lazos, leonardi, alberto, rebeccar}@diag.uniroma1.it a r X i v : . [ c s . D S ] F e b he main computational bottleneck in submodular optimization comes from the necessity torepeatedly evaluate the objective function for various candidate sets. These so-called value queries are often notoriously heavy to compute, e.g., for exemplar-based clustering [15], log-determinant ofsubmatrices [28], and accuracy of ML models [13, 30]. With real-world instances of these problemsgrowing to enormous sizes, simply reducing the number of queries is not always sufficient andparallelisation has become an increasingly central paradigm. 
However, classic results in the area,often based on the greedy method, are inherently sequential: the intuitive approach of building asolution element-by-element contradicts the requirement of running independent computations onmany machines in parallel. The degree to which an algorithm can be parallelized is measured by thenotion of adaptive complexity , or adaptivity, introduced in Balkanski et al. [3]. It is defined as thenumber of sequential rounds of parallel computation needed to terminate. In each of these rounds,polynomially many value queries may be asked, but they can only depend on the answers to queriesissued in past rounds. Contribution.
We design the first combinatorial randomized algorithms for maximizing a (possi-bly) non-monotone submodular function subject to a knapsack constraint that combine constantapproximation, low adaptive complexity, and a small number of queries. In particular, we obtain• a 9 . ParKnapsack , that has O (log n ) adaptivity and uses O ( n log n ) value queries. This is the first constant factor approximation to the problem withoptimal adaptive complexity up to a O (log log n ) factor (Theorem 1).• a variant of our algorithm with the same approximation, near-optimal O ( n log n ) query com-plexity, and O (log n ) adaptivity (Theorem 2). This is the first constant factor approximationalgorithm that uses only ˜ O ( n ) queries and has sublinear adaptivity.• 3-approximation algorithms for monotone objectives that combine O (log n ) adaptivity with O ( n log n ) total queries, and O (log n ) adaptivity with O ( n log n ) queries, respectively(Theorem 3). Even in the monotone setting, the latter is the first O (1)-approximationalgorithm combining ˜ O ( n ) queries and sublinear adaptivity.• 5 . cardinality constraints that match or surpass the state-of-the-art when it comes to the combination of approximation, adaptivity and total queries(Theorem 4).See Table 1 for an overview of our results. Technical Challenges.
Like existing work for cardinality or matroid constraints, e.g., [5, 6], inorder to reduce the adaptive complexity we iteratively sample sequences of feasible elements andadd large chunks of them to our solution. However, knapsack constraints do not allow for theelegant counting arguments used in the case of cardinality or matroid constraints. The reason isthat while the latter can be interpreted as a 1-independence system, a knapsack constraint inducesa Θ( n )-independence system, leading to poor results when naively adjusting existing approaches.A natural and very successful way of circumventing the resulting difficulties is to turn towards a continuous version of the problem. This, however requires evaluating the objective function also for fractional sets, i.e., such algorithms require access to an oracle for the multilinear relaxation and itsgradient. Typically, these values are estimated by sampling, requiring ˜Θ( n ) samples (see Chekuriand Quanrud [9]). Our choice to avoid the resulting increase in query complexity and deal directlywith the discreteness of the problem calls for specifically tailored algorithmic approaches. Most2eference Objective Constraint Approx. Adaptivity QueriesEne et al. [19] General Knapsack e + ε O (log n ) ˜ O ( n )Theorem 1 (this work) General Knapsack 9 .
465 + ε O (log n ) ˜ O ( n )Theorem 2 (this work) General Knapsack 9 .
465 + ε O (log n ) ˜ O ( n ) Ene et al. [19] Monotone Knapsack ee − + ε O (log n ) ˜ O ( n )Chekuri and Quanrud [9] Monotone Knapsack ee − + ε O (log n ) ˜ O ( n )Theorem 3 (this work) Monotone Knapsack 3 + ε O (log n ) ˜ O ( n )Theorem 3 (this work) Monotone Knapsack 3 + ε O (log n ) ˜ O ( n ) Ene and Nguyen [17] General k -Cardinality e + ε O (log n ) ˜ O ( nk )Kuhnle [31] General k -Cardinality 6 + ε O (log n ) ˜ O ( n ) Kuhnle [31] General k -Cardinality 5 .
18 + ε O (log n ) ˜ O ( n ) Theorem 4 (this work) General k -Cardinality 5 .
83 + ε O (log n ) ˜ O ( nk )Theorem 4 (this work) General k -Cardinality 5 .
83 + ε O (log n log k ) ˜ O ( n ) Table 1: Our results—main result highlighted—compared to the state-of-the-art for low-adaptivity. Boldindicates the best result(s) in each setting. In the last two columns the dependence on ε is omitted; in thelast column only the leading terms are stated. crucially, our main subroutine ThreshSeq needs to balance a suitable definition of good quality candidates with a way to also reduce the size (not simply by cardinality, but a combination of overallcost and absolute marginal values) of the candidate set by a constant factor in each adaptive round.Both these goals are further hindered by our second main challenge, non-monotonicity. Inpresence of elements with negative marginals, not only is it harder to maintain a good quality of oursolution, but size measures like the overall absolute marginals of our candidate sets are no longerinclusion-monotone. In fact, even one such element can arbitrarily deteriorate intuitive qualitymeasures like the overall marginal density of the candidate set, causing a new adaptive round. Ourapproach combines carefully designed stopping times in
ThreshSeq with a separate handling ofthe elements responsible for most of the above mentioned discreteness issues , i.e., elements withcost less than 1 /n of the budget and elements of maximum value. Related Work.
Submodular maximization has been studied extensively since the seminal workof Nemhauser et al. [37]. For monotone submodular objectives subject to a knapsack constraintthe ee − -approximation algorithm of Sviridenko [38] is best-possible, unless P = NP [22]. For the non-monotone case, a number of continuous greedy approaches [24, 32, 11] led to the currentbest factor of e when a knapsack, or any downward closed, constraint is involved. Combinatorialapproaches [25, 1] achieve somewhat worse approximation, but are often significantly faster andthus relevant in practice.Adaptive complexity for submodular maximization was introduced by Balkanski and Singer [2] formonotone objectives and a cardinality constraint, where they achieved an O (1) approximation with O (log n ) adaptivity, along with an almost matching lower bound: to get an o (log n ) approximation,adaptivity must be Ω (cid:0) log n log log n (cid:1) . This result has been then improved [4, 16, 20] and recentlyBreuer et al. [6] achieved an optimal ee − -approximation in O (log n log log k ) adaptive rounds and O ( n log log k ) query complexity, where k is the cardinality constraint.The study of adaptivity for non-monotone objectives was initiated by Balkanski et al. [3] againfor a cardinality constraint, showing a constant approximation in O (log n ) adaptive rounds, laterimproved by [21, 17, 31]. Non-monotone maximization is also interesting in the unconstrained3cenario. Recently, Ene et al. [18] and Chen et al. [12] achieved a 2 + ε approximation with constantadaptivity depending only on ε . Note that the algorithm of Chen et al. [12] needs only ˜ O ( n ) valuequeries, where the ˜ O hides terms poly-logarithmic in ε − and n .Richer constraints, e.g., matroids and multiple packing constraints, have also been studied[5, 19, 10, 9]. For knapsack constraints (as a special case of packing constraints) Ene et al. [19]and Chekuri and Quanrud [9] provide low adaptivity results— O (log n ) for non-monotone and O (log n ) for monotone—via continuous approaches (see Table 1; notice that the query complexityof these algorithms is stated with respect to queries to f and not to its multilinear extension).Chekuri and Quanrud [9] also provide two combinatorial algorithms for the monotone case: onewith optimal approximation and adaptivity but O ( n ) value queries, and one with linear querycomplexity, optimal adaptivity but an approximation factor parameterized by max x ∈N c ( x ) whichcan be arbitrarily bad. Let f : 2 N → R be a set function over a ground set N of n elements. For S, T ⊆ N , f ( S | T )denotes the marginal value of S with respect to T , i.e., f ( S ∪ T ) − f ( T ). To ease notation, we write f ( x | T ) instead of f ( { x } | T ). The function f is non-negative if f ( S ) ≥ ∀ S ⊆ N , monotone if f ( S ) ≤ f ( T ), ∀ S, T ⊆ N , and submodular if f ( x | T ) ≤ f ( x | S ), ∀ S, T ⊆ N with S ⊆ T and x / ∈ T .We study non-negative, possibly non-monotone , submodular maximization under a knapsackconstraint . Formally, we are given a budget B >
0, a non-negative submodular function f and anon-negative additive cost function c : 2 N → R > . The goal is to find O ∗ ∈ arg max T ⊆N : c ( T ) ≤ B f ( T ).Let OP T = f ( O ∗ ) denote the value of such an optimal set. Given a (randomized) algorithm forthe problem, let ALG denote the expected value of its output. We say that the algorithm is a β -approximation algorithm if ALG · β ≥ OP T . Throughout this work, we assume, without loss ofgenerality, that max x ∈N c ( x ) ≤ B .We assume access to f through value queries, i.e., for each S ⊆ N , an oracle returns f ( S ) inconstant time. Given such an oracle for f , the adaptive complexity or adaptivity of an algorithmis the minimum number of rounds in which the algorithm makes O (poly( n )) independent queriesto the evaluation oracle. In each adaptive round the queries may only depend on the answers toqueries from past rounds. With respect to the same oracle, the query complexity of an algorithm isthe total number of value queries it makes.We finally state some widely known properties of submodular functions that are extensivelyused in the rest of the paper. The first lemma summarizes two equivalent definitions of submodularfunctions shown by Nemhauser et al. [37]. Lemma 1.
Let f : 2 N → R be a submodular function and S, T, U be any subsets of N , with S ⊆ T .Then i) f ( U | T ) ≤ f ( U | S ) , ii) f ( S | T ) ≤ P x ∈ S f ( x | T ) . The second lemma, Lemma 2.2 of Buchbinder et al. [7], is an important tool for tacklingnon-monotonicity.
Lemma 2 (Sampling Lemma) . Let f : 2 N → R be a submodular function, X ⊆ N and X p bea random subset of X , where each element of X is contained with probability at most p . Then E [ f ( X p )] ≥ (1 − p ) f ( ∅ ) . Finally, we assume access to
SubmodMax , an unconstrained submodular maximization oracle.For instance, this can be implemented via the combinatorial algorithm of Chen et al. [12], whichoutputs a (2 + ε )-approximation of max T ⊆N f ( T ) for a given precision ε in O ( ε log ε ) adaptive4ounds and linear query complexity. For our experiments, we use the much simpler 4-approximationof Feige et al. [23], which has adaptive complexity 1. To achieve sublinear adaptivity we need to add large chunks of elements to the solution withoutusing intermediate value queries. The sequence of elements that are candidates to be added to thecurrent solution is randomly drawn in
SampleSeq . This subroutine receives as input a partialsolution S , a set of feasible elements X and a budget, and outputs a sequence A each element ofwhich is sequentially drawn uniformly at random among the remaining elements of X that do notcause S ∪ A to exceed the budget. We do, however, need to restrict ourselves to only adding a suitable prefix of A ; with each element added, the original “good” quality of the leftover candidatesin X can quickly deteriorate.The selection of the prefix of the sequence A = [ a , . . . , a d ] to be added to the current solution S is then done by ThreshSeq . Given a threshold τ , we add to S a prefix A i = [ a , . . . , a i ] suchthat for all j < i the average contribution to S ∪ A j of the elements in X \ A j is comparable to τ .Then the expected value of f ( A i | S ) should be comparable to τ E [ c ( A i )]. In order to compute A i inone single parallel round, one can a posteriori compute for each prefix A j of A the a priori (withrespect to the uniform samples) expected marginal value of a j +1 ; with a j +1 drawn uniformly atrandom from the elements in X \ A j still fitting the budget, this means simply averaging over theirmarginal densities. Since all the value queries depend only on S and A , finding the prefix needsonly a single adaptive round.The crucial difficulty lies in the fact that limiting the expected marginal density is insufficientto bound the number of adaptive steps. In the worst case, a single very negative element couldtrigger this condition. We circumvent the resulting adaptive complexity of up to n by imposingtwo different stopping conditions, corresponding to i ∗ and j ∗ in ThreshSeq . The cost condition istriggered once an ε -fraction of all remaining candidates’ cost is due to elements that are no longer good , i.e., they now have marginal density below τ . The value condition is triggered at most ‘ times,which happens whenever the elements with negative marginal value make up an ε -fraction of theentire leftover marginal value. Now, in each adaptive step, either the overall cost or the summed-upmarginal contributions of the candidate set decrease by a factor of (1 − ε ). These observations areformalized below. Algorithm 1:
SampleSeq ( S, X, B ) Input: current solution S , set X of remaining elements and budget B > A ← [ ], i ← while X = ∅ do Draw a i uniformly at random from X A ← [ a , . . . , a i − , a i ] X ← { x ∈ X \ { a i } : c ( x ) + c ( A ) + c ( S ) ≤ B } i ← i + 1 return A = [ a , a , . . . , a d ] Lemma 3.
Let κ ( X ) = max x,y ∈ X c ( x ) /c ( y ) . Then ThreshSeq runs in O (cid:16) ε log( nκ ( X )) + ‘ (cid:17) adaptive rounds and issues O (cid:16) n (cid:16) ε log nκ ( X ) + ‘ (cid:17)(cid:17) value queries. roof. The adaptive rounds correspond to iterations of the while loop. In fact, once a new sequenceis drawn by
SampleSeq , all the value queries needed are deterministically induced by it and hencecan be assigned to different independent machines. Gathering this information we can determine k ∗ and start another iteration of the while loop. Bounding the number of such iterations where thevalue condition is triggered is easy, since it is forced to be at most ‘ . For the cost condition we usethe geometric decrease in the total cost of X : every time it is triggered, the total cost of the feasibleelements X is decreased by at least a (1 − ε ) factor. At the beginning of the algorithm, that cost isat most Cn , with C = max x ∈ X c ( x ), and it needs to decrease below c = min x ∈ X c ( x ) to ensure that X = ∅ . Call r the number of such rounds. In the worst case we need Cn (1 − ε ) r < c , meaning thatthe adaptivity is upper bounded by ε log ( nκ ( X )) + ‘ . Finally, notice that the query complexityis just a n factor greater than the adaptivity: each adaptive round contains O ( n ) value queries,since the length of the sequence output by SampleSeq may be linear in n and for each prefix thevalue of the marginals of all the remaining elements has to be considered.Having settled the adaptive and query complexity of ThreshSeq , we move to proving that ourconditions ensure good expected marginal density.
Algorithm 2:
ThreshSeq ( X, τ, ε, ‘, B ) Input: set X of elements, threshold τ >
0, precision ε ∈ (0 , ‘ and budget B S ← ∅ , ctr ← X ← { x ∈ X : f ( x ) ≥ τ c ( x ) } while X = ∅ and ctr < ‘ do [ a , a , . . . , a d ] ← SampleSeq ( S, X, B ); for i = 1 , . . . , d do A i ← { a , a , . . . , a i } X i ← { a ∈ X \ A i : c ( a ) + c ( S ∪ A i ) ≤ B } ; G i ← { a ∈ X i : f ( a | S ∪ A i ) ≥ τ · c ( a ) } E i ← { a ∈ X i : f ( a | S ∪ A i ) < } i ∗ ← min { i : c ( G i ) ≤ (1 − ε ) c ( X ) } j ∗ ← min n j : X x ∈ G j εf ( x | S ∪ A j ) ≤ X x ∈ E j | f ( x | S ∪ A j ) | o k ∗ ← min { i ∗ , j ∗ } S ← S ∪ A k ∗ X ← G k ∗ if j ∗ < i ∗ then ctr ← ctr + 1 return S Lemma 4.
For any X , τ , ε ∈ (0 , , ‘ and b , the random set S output by ThreshSeq is such that E [ f ( S )] ≥ (1 − ε ) τ E [ c ( S )] and c ( S ) ≤ B .Proof. We first note that c ( S ) ≤ B with probability 1 since SampleSeq always returns feasiblesequences. The algorithm adds a chunk of elements to the solution in each iteration of the while loop. This, along with the fact that each of these chunks is an ordered prefix of a sequence outputby
SampleSeq , induces a total ordering on the elements in S . To facilitate the presentation of thisproof, we imagine that the elements of S are added one after the other, according to this total order.6et us call the t -th such element s t , and let F t denote the filtration capturing the randomness ofthe algorithm up to, but excluding, the adding of s t to its chunk’s random sequence. We show thatwhenever any s t is added, its expected marginal density is at least (1 − ε ) τ .Fix some s t and consider the iteration of the while loop in which it is added to the solution.We denote with S old the partial solution at the beginning of that while loop, with X the candidateset { x ∈ N : f ( x | S old ) ≥ τ · c ( x ) , c ( x ) + c ( S old ) ≤ B } at that point, and with A the sequencedrawn in that iteration by SampleSeq . Let A ( t ) be the prefix of A up to, and excluding, s t .Then S t = S old ∪ A ( t ) is the set of all elements added to the solution before s t . Note that, given F t , the sets X , S old and A ( t ) are deterministic, while the rest of A is random. Recall that s t is drawn uniformly at random from X ( t ) = { x ∈ X \ A ( t ) | c ( S t ) + c ( x ) ≤ B } . We need to showthat E [ f ( s t | S t ) | F t ] ≥ (1 − ε ) τ E [ c ( s t ) | F t ], where the randomness is with respect to the uniformsampling in X ( t ) .If s t is the first element in A , this holds since all the elements in X exhibit a marginal densitygreater that τ . If s t is not the first element, it means that the value and cost condition were nottriggered for the previous one. Call G and E the sets of the good and negative elements with respectto S t , i.e., G = { x ∈ X ( t ) : f ( x | S t ) ≥ τ c ( x ) } and E = { x ∈ X ( t ) : f ( x | S t ) < } , which are alsodeterministically defined by F t . Finally, let p x be P ( s t = x | F t ) which is equal to | X ( t ) | − for all x ∈ X ( t ) and zero otherwise, then E [ f ( s t | S t ) | F t ] − (1 − ε ) τ E [ c ( s t ) | F t ] == X x ∈ X p x f ( x | S t ) − (1 − ε ) τ X x ∈ X p x c ( x ) ≥ X x ∈ G ∪ E p x f ( x | S t ) − (1 − ε ) τ X x ∈ X p x c ( x )= ε X x ∈ G p x f ( x | S t ) − X x ∈ E p x | f ( x | S t ) | (1)+ (1 − ε ) τ h X x ∈ G p x c ( x ) − (1 − ε ) X x ∈ X p x c ( x ) i (2)+ (1 − ε ) X x ∈ G p x h f ( x | S t ) − τ c ( x ) i ≥ . (3)Expressions (1) and (2) are nonnegative since the value and cost conditions were not triggeredbefore adding s t . Expression (3) is nonnegative by the definition of G .We have shown for all t and F t , the expected marginal density of the t -th element (if any)added by our algorithm is large enough. Next we carefully apply conditional expectations to get thedesired bound. We have already argued how the algorithm induces an ordering on the elements itadds to the solution, so that they can be pictured as being added one after the other. To avoidcomplication in the analysis we suppose that after the algorithm stops it keeps on adding dummyelements of no cost and no value, so that in total it runs for n time steps. Consider the filtration {F t } nt =1 generated by the stochastic process associated to the algorithm, where F t narrates whathappens up to the point when element s t is considered. So that F is empty and F n contains allthe story of the algorithm except for the last—possibly dummy—added element. From the aboveanalysis we know that for each time t ∈ { , . . . , n } and any possible story F t of the algorithm itholds E [ f ( s t | S t ) | F t ] ≥ (1 − ε ) τ E [ c ( s t ) | F t ] . 
(4)Note that this claim holds also if one considers the dummy elements after the actual termination of7he algorithm. E [ f ( S )] = E " n X t =1 f ( s t | S t ) = n X t =1 E [ f ( s t | S t )] = n X t =1 E [ E [ f ( s t | S t ) | F t ]]= E " n X t =1 E [ f ( s t | S t ) | F t ] ≥ (1 − ε ) τ E " n X t =1 E [ c ( s t ) | F t ] = (1 − ε ) τ E " n X t =1 c ( s t ) = (1 − ε ) τ E [ c ( S )] . The second and fourth equalities hold by linearity of expectation, the third and fifth equalities holdby the law of total expectation. Finally, the inequality follows from monotonicity of the conditionalexpectation and inequality (4).Having established that S has expected density comparable to our threshold τ , we move onto showing that when ThreshSeq terminates, either a large portion of the budget is used up inexpectation, or we can bound the value of good candidates that are left outside the solution.
Lemma 5.
When
ThreshSeq terminates we have f ( S ) ≥ ε‘ P x ∈ G f ( x | S ) , where G = { x ∈ X \ S : f ( x | S ) ≥ τ c ( x ) , c ( x ) + c ( S ) ≤ B } .Proof. ThreshSeq terminates in one of two cases. Either X is empty, meaning that there are noelements still fitting in the budget whose marginal density is greater than τ —and in that case theinequality we want to prove trivially holds—or the value condition has been triggered ‘ times.For the latter, suppose that the value condition was triggered for the i th time during iteration t i of the while loop. Denote by S t i the solution at the end of that iteration. We are interested inthe sets X j ∗ , G j ∗ , E j ∗ of that particular iteration of the while loop. In order to be consistent acrossiterations, we use X ( i ) , G ( i ) , and E ( i ) to denote these sets for iteration t i . Since the value conditionwas triggered during t i , we have ε P x ∈ G ( i ) f ( x | S t i ) ≤ P x ∈ E ( i ) | f ( x | S t i ) | . Clearly, G ( ‘ ) is what wedenoted by G in the statement and S t ‘ is S . Also notice that E ( j ) ∩ E ( k ) = ∅ for j = k .Now, by non-negativity of f and Lemma 1, we have0 ≤ f (cid:16) S t ‘ ∪ ‘ S i =1 E ( i ) (cid:17) ≤ f ( S t ‘ ) + ‘ X i =1 X x ∈ E ( i ) f ( x | S t i ) . Rearranging the terms and using the value condition, we get f ( S t ‘ ) ≥ ‘ X i =1 X x ∈ E ( i ) | f ( x | S t i ) | ≥ ‘ X i =1 ε X x ∈ G ( i ) f ( x | S t i ) ≥ ‘ε X x ∈ G ( ‘ ) f ( x | S t ‘ ) . The last inequality follows from the submodularity of f and the fact that that G (1) ⊇ G (2) ⊇ . . . ⊇ G ( ‘ ) .Lemma 5 still leaves a gap: how can we account for the elements which have marginal densitygreater than τ but are not considered due to the budget constraint? It can be the case that due tosome poor random choices we initially filled the solution with low quality elements, preventing thealgorithm at later stages to consider good elements with large costs. To handle this, we need thefollowing simple lemma. 8 emma 6. Suppose that E [ c ( S )] < B (1 − ε ) . Then, running ThreshSeq ε log( ε ) times, withprobability at least (1 − ε ) , there is at least one run where c ( S ) < B .Proof. Let E be the event { c ( S ) ≥ B } , then(1 − ε ) B > E [ c ( S )] ≥ E [ c ( S ) | E ] P ( E ) ≥ P ( E ) B . Hence P (cid:16) E C (cid:17) > ε >
0, so repeating the experiment ε log( ε ) independent times is enough to have aprobability at least 1 − ε of observing at least once E C .We can now present the full parallel algorithm ParKnapsack for the non-monotone case.It considers separately the set N − of “small” elements, each with cost smaller than B/n , andthe set of “large” elements N + = N \ N − . The set N − is fed to the low adaptive complexityunconstrained maximization routine SubmodMax as discussed in Section 2. For the large elements,
ParKnapsack samples each element of N + with probability p to get a random subset H , andthen it runs ThreshSeq a logarithmic number of times on H , in parallel, for different guesses ofthe “right” threshold. The partition between N + and N − is critical in bounding the adaptivity of ThreshSeq , as κ ( N + ) ≤ n . Algorithm 3:
ParKnapsack ( N , f, ε, α, p, B ) Input:
Ground set N , submodular function f , budget B , precision ε , parameter α andsampling probability p N − ← { x ∈ N : c ( x ) < Bn } ; N + ← N \ N − x ∗ ← max x ∈N + f ( x ) ; ˆ τ ← αn f ( x ∗ ) B ˆ ε ← ε/
125 ; ‘ ← ˆ ε − ; k ← ˆ ε − log( n ) S − ← SubmodMax ( N − , ˆ ε ) H ← sample each element in N + independently at random with probability p for i = 0 , , . . . , k in parallel do τ i ← ˆ τ · (1 − ˆ ε ) i for j = 1 , , . . . , ˆ ε − log(ˆ ε − ) in parallel do S ij ← ThreshSeq ( H, τ i , ˆ ε, ‘, B ) return T ∈ arg max i,j { f ( S ij ) , f ( x ∗ ) , f ( S − ) } Theorem 1.
For α = −√ , p = − α and ε < , ParKnapsack is a (9 .
465 + ε ) -approximationalgorithm with O ( ε log n ) adaptivity and O ( n ε log n log ε ) total queries.Proof. Excluding the call to
SubmodMax , the claim on the adaptivity follows directly fromLemma 3 with ‘ = O ( ε − ), and the observation that κ ( N + ) ≤ n . The adaptivity is indeed only dueto ThreshSeq , since the guessing of the threshold, as well as the multiple runs of
ThreshSeq ,happen independently in parallel. Relative to the query complexity, we have the bound in Lemma 3multiplied by an extra O ( log nε log ε ) factor caused by the two for loops. SubmodMax does notaffect these asymptotics since it has adaptivity bounded by O ( ε ) and linear query complexity.Consider now the approximation guarantee. Call O ∗ the optimal solution, and O + , O − itsintersections with N + and N − respectively. We can upper bound f ( O − ) with the unconstrained max9n N − , since there are at most n elements in N − whose cost is at most Bn . Using the combinatorialalgorithm of Chen et al. [12], we get f ( O − ) ≤ (2 + ˆ ε ) · f ( S − ) ≤ (2 + ˆ ε ) · ALG (5)Let O ∈ arg max { f ( T ) : T ⊆ N + , c ( T ) ≤ B } , i.e., O is an optimal solution in N + . Clearly f ( O + ) ≤ f ( O ), so we will upper bound the latter. Let O H = O ∩ H . By submodularity andmonotonicity of f ( · ∩ O ), we have pf ( O ) ≤ E [ f ( O H )]. Outside O , the function may be non-monotone,so we need Lemma 2. In particular, we apply it on the submodular function g ( · ) = f ( · ∪ O ). Sinceelements belong to H with probability p , for S ⊆ H we get p (1 − p ) f ( O ) ≤ (1 − p ) E [ f ( O H )] ≤ E [ E [ f ( S ∪ O H ) | O H ]] = E [ f ( S ∪ O H )] . (6)Let τ ∗ = αf ( O ) /B . By the parallel guesses we have that there exists τ = τ i such that (1 − ˆ ε ) τ ∗ ≤ τ <τ ∗ . This directly follows from the definitions of τ ∗ and ˆ τ and the fact that nf ( x ∗ ) ≥ f ( O ) ≥ f ( x ∗ ).We focus only on this particular τ and consider two cases, depending on E [ c ( S )], where S is the setoutputted by ThreshSeq for τ . If E [ c ( S )] ≥ (1 − ˆ ε ) B , then, from Lemma 4 we have ALG ≥ E [ f ( S )] ≥ (1 − ˆ ε ) τ E [ c ( S )] ≥ (1 − ˆ ε ) αf ( O ) E [ c ( S )] B ≥ (1 − ˆ ε ) α f ( O ) . (7)If E [ c ( S )] < (1 − ˆ ε ) B we need a more careful analysis, via Lemmata 5 and 6. Consider the multipleruns of ThreshSeq corresponding to τ . Let G be the event that at least one of those runs outputs S with c ( S ) < B and consider that solution; recall that P ( G ) ≥ (1 − ˆ ε ) from Lemma 6. What wewant to bound is the total value of the elements of O H which are not in S . The ones retaininga good marginal density with respect to S can be divided into two categories, depending on thereason why they were not added to S : G = { x ∈ H : f ( x | S ) ≥ τ c ( x ) , c ( x ) + c ( S ) ≤ B } , ˜ G = { x ∈ H : f ( x | S ) ≥ τ c ( x ) , c ( x ) + c ( S ) > B } . The total contribution of the elements in G can be bounded applying Lemma 5. For ˜ G ∩ O H weknow that it contains at most one element ˜ x , since we are conditioning on G and thus, if such ˜ x exists, c (˜ x ) > B . Moreover, f (˜ x ) ≤ f ( x ∗ ). Finally, we know that the marginal density of all theother elements in O H \ S is at most τ . Let E be the event that ˜ G ∩ O H = ∅ given G and q itsprobability. We have f ( S ∪ O H ) ≤ f ( S ) + E · f (˜ x | S ) + X x ∈ G f ( x | S ) + X x ∈ O H \ ( G ∪ ˜ G ) f ( x | S ) ≤ f ( S )(1 + ˆ ε ) + E · ( f (˜ x | S ) − τ c (˜ x )) + τ c ( O H ) ≤ f ( S )(1 + ˆ ε ) + E · ( f ( x ∗ ) − τ B ) + τ c ( O H ) . Keeping fixed H , let’s apply the expectation on the randomness in ThreshSeq , conditioning on G and recalling that both f ( S ) and f ( x ∗ ) are upper bounded by ALG : E [ f ( S ∪ O H ) | G ] ≤ (1 + ˆ ε + q ) ALG + τ c ( O H ) − qτ B , Now move on to the expectation with respect to H . 
Note that by submodularity f ( S ∪ O H ) ≤ f ( O ).We have E [ f ( S ∪ O H )] = E [ f ( S ∪ O H ) | G ] P ( G ) + E h f ( S ∪ O H ) | G C i P (cid:16) G C (cid:17) E [ f ( S ∪ O H ) | G ] + 2ˆ εf ( O ) . Putting together the last two inequalities and recalling that E [ c ( O H )] = pc ( O ) ≤ pB , we have E [ f ( S ∪ O H )] ≤ (2ˆ ε + α − q α ) f ( O ) + (1 + ˆ ε + q ) ALG.
Combining that with (6), we finally obtain f ( O ) ≤ (1 + q + ˆ ε ) (cid:2) p (1 − p ) − αp + αq − ε (cid:3) ALG (8)At this point we need to optimize the constants in (5), (6), (8), also using that
OP T ≤ f ( O + ) + f ( O − ) ≤ f ( O ) + f ( O − ) . Setting p = ( √ − α = 2 − √ ε = ε we get, for small enough ˆ ε and for any value of q ∈ (0 ,
1) the desired bound:
OP T ≤ (2(3 + √
3) + ε ) ALG . As mentioned already, an interesting feature of our approach is that—with few modifications—yields a number of algorithms that match or improve the state-of-the-art. We only sketch thesemodifications here and we defer the details to the appendix.We begin with a discussion on the possible trade-offs between adaptivity and query complexity.
ThreshSeq can be adapted to spare Θ( n log n ) value queries at the cost of O (log n ) extra adaptiverounds. The idea is to use binary search to locate k ∗ in the while loop of ThreshSeq . Onlya logarithmic number of prefixes needs to be sequentially considered, instead of all of them inparallel. To be able to binary search k ∗ , though, a carefully modified version of the value conditionis implemented, since the one used in ThreshSeq exhibits a multi-modal behaviour.
Theorem 2.
For ε ∈ (0 , / , it is possible to achieve a (9 .
465 + ε ) -approximation in O ( ε log n ) adaptive rounds and O ( nε log n log ε ) queries. For monotone objectives, the approximation ratio of
ParKnapsack can be significantly improved.In particular, in
ThreshSeq , we do not need to address the value condition any more. Moreover,the small elements can be accounted for without any extra loss in the approximation. As in thecase of Theorems 1 and 2, it is possible to trade a logarithmic loss in adaptivity for an almost lineargain in query complexity.
Theorem 3.
For ε ∈ (0 , it is possible to achieve a ε approximation in O ( ε log n ) adaptiverounds and O ( n ε log n log ε ) queries or in O ( ε log n ) adaptive rounds and O ( nε log n log ε ) queries. Note that the variant using ˜ O ( n ) queries is the first O (1)-approximation algorithm for theproblem combining this few queries with sublinear adaptivity.11 .1.2 Cardinality constraints ParKnapsack can be directly applied to cardinality constraints for (possibly) non-monotone objec-tives. Again with some simple modifications, it is possible to achieve a much better approximation.
Theorem 4.
For ε ∈ (0 , / it is possible to achieve a .
83 + ε approximation, in O ( ε log n ) adaptive rounds and O ( nkε log n log k log ε ) queries, or in O ( ε log n log k ) adaptive rounds and O ( nε log n log k log( ε )) queries. Although we do not heavily adjust our algorithms to cardinality constraints, Theorem 4 isdirectly comparable to the very recent results of Ene and Nguyen [17] and Kuhnle [31] which aretailored for the problem.
We evaluate the performance of
ParKnapsack on real datasets and real-world applications,as is often the case in the related literature [35, 21, 1, 6, 31]. All three objectives we use arenon-monotone submodular. We compare against the state-of-the-art of fast algorithms for non-monotone submodular maximization subject to a knapsack constraint, in order to demonstrate that
ParKnapsack produces almost equally good solutions with an exponential improvement on theadaptivity. We provide two kinds of figures: objective versus budget (or instance size) and objectiveversus adaptive steps, for a given instance, as in Balkanski et al. [3] and Fahrbach et al. [21]. Notethat we use the version of
ParKnapsack from Theorem 2 to ensure ˜ O ( n ) query complexity. Thebenchmarks we use are plain Greedy , Fantom of Mirzasoleiman et al. [35] and
SampleGreedy ofAmanatidis et al. [1]. The last two have the state-of-the-art performance in terms of objective valuefor knapsack constraints among algorithms with practical running times, i.e., among algorithmswith subquadratic query complexity. On the other hand, these algorithms are not designed forlow adaptivity but, the only alternative, i.e. continuous methods, are impractical for the instancesizes we consider. The
Greedy algorithm builds a solution step by step by adding the elementwith the highest marginal value, until the budget is exhausted. While this naive approach has notheoretical guarantees, it is very fast and often has acceptable performance in practice.
Fantom builds on
Greedy and is robust for intersecting p -systems and knapsack constraints providing a10(1 + (cid:15) )-approximation for our setting. Finally, SampleGreedy greedily selects elements accordingto their marginal value per cost ratio, but only adds them to the solution with some probability.This leads to a 5 . O ( n log n ) queries and adaptive steps,when implemented using with lazy evaluations [33].Apart from Greedy , all other algorithms need some constant parameters as part of the input,in addition to the submodular instance. For
SampleGreedy and
ParKnapsack , we set p = 0 . ε = 1 / Fantom and ε = 1 / , α = 2 − √ ParKnapsack balancing performance and running time. Each experimenton the top row was repeated 3 times, to gain an estimate of the variance. In total, the whole arrayof tests ran on four t2.micro instances on the Amazon Elastic Compute Cloud (EC2), which ispart of Amazon Web Services (AWS).
Movie Recommendation.
Given a set of movies A , a list of genres C i such that C ∪ C ∪ . . . ∪ C k = A and a list of user generated keyword tags t iu and ratings r iu , where i ∈ A and u is the id of a12 .
02 0 .
04 0 .
06 0 .
08 0 . Budget . . . . . . O b j ec t i v e F un c t i o n × FantomSampleGreedy
ParKnapsack
Greedy (a) Movie Recommendation (b) Movie Recommendation .
01 0 .
02 0 .
03 0 .
04 0 .
05 0 . Budget . . . . . . O b j ec t i v e F un c t i o n × FantomSampleGreedy
ParKnapsack
Greedy (c) Revenue Maximization (d) Revenue Maximization
Number of Vertices . . . . . . . O b j ec t i v e F un c t i o n × FantomSampleGreedy
ParKnapsack
Greedy (e) Graph Cut on G ( n, .
1) (f) Graph Cut on G ( n, . Figure 1: Each row contains two plots, corresponding to the same submodular problem. The left columncontains the objective function value for different budget or instance sizes. The right column focuses onone vertical slice of the top one: for a specific budget and instance size, the objective value is presented asa function of the number of adaptive rounds, indicating that if we required to stop after a small numberof rounds all other algorithms would perform extremely poorly. In all cases, the results are consistent:
ParKnapsack has comparable performance, with drastically improved adaptivity. user, a movie recommendation system aims to use this information to provide a short list of diverseoptions that match certain preferences. The MovieLens dataset [26] provides a very large set ofmovies that include user generated tags and ratings. We calculate the similarity between two movies(following the procedure of Amanatidis et al. [1]; see below) and produce a weighted complete graph,where each vertex is a movie. For i, j ∈ A the weight w ij represents their similarity. In addition,13e use χ ij to indicate if the two movies share a genre. Putting everything together, the objectivefunction is: v ( S ) = α P i ∈ S r i + β ( P i ∈ S P j ∈ A w ij − P i ∈ S P j ∈ S ( λ + χ ij µ ) w ij ) for λ, µ, α, β ≥ r i represents the average rating of movie i . This is a weighted average of the ratings of the moviesin S and a modified maximal marginal relevance [8]. The second part is similar to a max cut (infact it is a max cut for λ = 1 and µ = 0), but allows the internal edges to be penalized differently,depending on whether the movies are similar or belong to the same genre. For the experiments weconsider a subset of 5000 movies and set α = β = 0 . , λ = 3 and µ = 7. Each movie is assigned acost sampled uniformly from [0 ,
1] and the total budget ranges from 0 .
01 to 0 . w ij are created, given the MovieLens dataset. Each movie i isassociated with a tag vector t i ∈ [0 , , which encodes how much each tag applies to it. The tagsare user generated. For example, a movie like “Titanic” could have a score of 0 . . w ij = vuut X k =1 (cid:16) min { t ik , t jk } (cid:17) . (9)This approach ensures that movies with tag vectors that are close appear more similar. There arevarious trade-offs in the selection of the similarity metric. For instance, if one defined w ij = t i · t j ,then a movie with all tags set to 1 would appear more similar to any other movie, even whencomparing a movie with itself! Going to the other extreme, if w ij = t i · t j / ( | t i || t j | ) this issue wouldbe avoided, but some information would be lost as every movie would have a normalised tag vector,even though having a high score on one tag should not impact the score on other tags. Ultimately,using (9) appears to be a reasonable compromise between the two. We stress that our experimentalfindings about the performance of each algorithm remain qualitatively the same for any sensiblechoice of w ij . Revenue Maximization.
Representing a social network as a weighted graph, where each edgesignifies how much one user is influenced by another, our goal is to select a subset S of users whoare given a product to advertise, in order to maximize the revenue from sales. We use the YouTubeNetwork [40], and consider the subgraph induced by selecting its Top 5000 communities, whichhas 39841 vertices and 224235 edges. We assign edge weights w ij sampled uniformly in [0 ,
1] andeach user i ∈ V has a suggestibility parameter α i drawn from a Pareto Type II distribution with λ = 1 , α = 2. The objective to maximize is: v ( S ) = P i ∈ V \ S α i qP j ∈ S w ij . Each user is assigned acost proportional to their incident edges, with the budget ranging from 0 .
01 to 0 . Maximum Weighted Cut.
Given an Erdős–Rényi graph G ( n, p ) where n is the number ofvertices and p the probability of including each edge, the objective is to find a cut of maximumweight . Fixing p = 0 .
1, we let n ∈ { , . . . , } (with an exponential step) and assign random edgeweights and costs sampled uniformly from [0 , In this paper we close the gap for the adaptive complexity of non-monotone submodular maximizationsubject to a knapsack constraint, up to a O (log log n ) factor. Our algorithm, ParKnapsack , iscombinatorial and can be modified to achieve trade-offs between adaptivity and query complexity.In particular, it may use nearly linear queries, while achieving an exponential improvement onadaptivity compared to existing algorithms with subquadratic query complexity.14
Adapting
ParKnapsack to Use Binary Search
This appendix addresses the adaptivity versus queries trade-off mentioned in Section 3.1 which leadsto Theorem 2 and, implicitly, to Theorems 3 and 4. The algorithms and the proofs of these twotheorems are presented in Appendices B and C, respectively.As already pointed out in the main text, the value condition in
ThreshSeq may exhibit amulti-modal behaviour along a single iteration of the while loop. In order to enable binary searchfor k ∗ , we want to tweak the value condition so that if it is triggered for a certain prefix A i , itremains activated for all A j for j ≥ i in the specific while loop iteration.So, fix an arbitrary iteration of the while loop. Call S the initial solution and A the sequencedrawn by SampleSeq , { X i } i the sequence of the sets of elements still fitting into the budget relativeto each prefix A i and with G i and E i the subsets of X i containing the good , i.e., marginal densitygreater than τ , and bad , i.e., negative marginal density, elements, respectively. First, note that thecost condition is clearly unimodal: the { X i } i is a decreasing sequence of sets and hence c ( X i ) isa non-increasing sequence of costs, while c ( X ) stays fixed: as soon as the cost of X i drops below(1 − ε ) c ( X ) it stays there for all the prefixes longer than A i .For the value condition we need a bit more work; if for some j it holds that ε P x ∈ G j f ( x | S ∪ A j ) ≤ P x ∈ E j | f ( x | S ∪ A j ) | , it may be the case that the inequality switches direction later in the sameiteration of the while loop. Notice that it can happen for one of two reasons: either elements withnegative marginals are added to the solution or they are thrown away due to budget constraint.We want a modification which is robust to these corner cases. To this end, we add to the valuecondition the absolute contribution of two sets of items.First, we redefine the set E i to contain also all the bad elements considered in that while loop,regardless of the budget condition, i.e., E i ← { a ∈ X : f ( a | S ∪ A i ) < } . Moreover for each prefix A i , we define E i as the set of all the items in the prefix A i which added negative marginal wheninserted in the solution. i.e., E i = { a t ∈ A i : f ( a t | S ∪ A t − ) < } .The new value condition then reads: ε X x ∈ G i f ( x | S ∪ A i ) ≤ X x ∈ E i | f ( x | S ∪ A i ) | + X a j ∈E i | f ( a j | S ∪ A j ) | . Notice that now everything works out just fine: the left hand side of the condition is monotonicallydecreasing in i , while the right hand side is monotonically increasing, by submodularity and thefact that now { E t ∪ E t } t is an increasing set sequence.Given the new algorithm, ThreshBin , we need to show that it retains the right properties of
ThreshSeq and argue about its adaptive and query complexity.
Lemma 7.
Consider a run of
ThreshBin and denote with S and ¯ S the preliminary and finalsolution as in the algorithm. Then the following properties hold: • c ( ¯ S ) ≤ c ( S ) ≤ B • f ( ¯ S ) ≥ f ( S )• E [ f ( S )] ≥ τ (1 − ε ) E [ c ( S )]• Call G the set of elements still fitting in the budget after S , whose marginal density withrespect to S is greater than τ . Then f ( ¯ S ) ≥ ε‘ P x ∈ G f ( x | S ) . ThreshBin needs O (cid:16) log n (cid:16) log nκ ( X ) ε + ‘ (cid:17)(cid:17) adaptive rounds and O (cid:16) n log n (cid:16) log nκ ( X ) ε + ‘ (cid:17)(cid:17) valuequeries. lgorithm 4: ThreshBin ( X, τ, ε, ‘, B )Variant of
ThreshSeq that utilises binary search Input: set X of elements, threshold τ >
0, precision ε ∈ (0 , ‘ and budget B S ← ∅ ; ctr ←
0; flag ← X ← { x ∈ X : f ( x ) ≥ τ c ( x ) } while X = ∅ and ctr < ‘ do [ a , a , . . . , a d ] ← SampleSeq ( S, X, B ); b l ← b r ← d ; while b l < b r do i = b ( b r + b l ) / c ; A i ← { a , a , . . . , a i } X i ← { a ∈ X \ A i : c ( a ) + c ( S ∪ A i ) ≤ B } ; G i ← { a ∈ X i : f ( a | S ∪ A i ) ≥ τ · c ( a ) } E i ← { a ∈ X : f ( a | S ∪ A i ) < } E i ← { a s ∈ A i : f ( a s | S ∪ A s ) < } c ← c ( G i ) ≤ (1 − ε ) c ( X ) ; c ← X x ∈ G i εf ( x | S ∪ A i ) ≤ X x ∈ E i | f ( x | S ∪ A i ) | + X a j ∈E i | f ( a j | S ∪ A j ) | ; if c or c then b r ← i else b l ← i + 1 if c and not c then flag ← else flag ← k ∗ = b r ; S ← S ∪ A k ∗ ; X ← G k ∗ ctr ← ctr + flag Suppose S = { s , s , . . . , s | S | } , where the indices imply the total ordering from the proof ofLemma 4 ¯ S ← ∅ for t = 1 , . . . , | S | do if f ( s t | { s , . . . , s t − } ) > then ¯ S ← ¯ S ∪ { s t } return ¯ S roof. The proof of this Lemma is quite similar to the one for
ThreshSeq , so we just highlight thedifferences.First, the new value condition is stricter than the old one, so the E [ f ( s t | S t ) | F t ] ≥ τ (1 − ε ) E [ c ( s t ) | F t ] inequality holds as well, where S t and F t are as in the proof of Lemma 4. This impliesthat E [ f ( S )] ≥ τ (1 − ε ) E [ c ( S )] . The bounds on adaptivity and query complexity follow easily from the binary search and theanalysis of Lemma 3. Further, the first and second bullets follow directly from ¯ S ⊆ S and the factthat we only filter out from S elements with negative contribution.Consider now the last remaining statement to prove (the analog of Lemma 5). For all realnumbers a we denote with a + = max { a, } its positive part and with a − = max {− a, } the negativeone. Clearly a = a + − a − . f ( S ) = T X t =1 ( f ( s t | S t ) + − f ( s t | S t ) − ) ≤ T X t =1 f ( s t | S t ) + ≤ X s t ∈ ¯ S f ( s t | ¯ S t ) = f ( ¯ S ) , where in the last inequality we used submodularity.Similarly to the proof for ThreshSeq , consider t , . . . , t ‘ , E (1) , . . . , E ( ‘ ) , G (1) , . . . , G ( ‘ ) , and E (1) ,. . . , E ( ‘ ) . Notice that they are all disjoint. Like before, for s i ∈ S , S i denotes { s , . . . , s i − } , but weslightly abuse the notation and have S t j denote the set S at the end of the iteration of the outerwhile loop where ctr is increased for the ‘ th time. We have0 ≤ f (cid:0) S t ‘ ∪ ‘ S j =1 E ( j ) (cid:1) ≤ f ( S t ‘ ) + f (cid:0) ‘ S j =1 E ( j ) | S t ‘ (cid:1) ≤ X s i ∈ S t‘ ( f ( s i | S i ) + − f ( s i | S i ) − ) + ‘ X j =1 f ( E ( j ) | S t j ) ≤ X s i ∈ S t‘ ( f ( s i | S i ) + − f ( s i | S i ) − ) + ‘ X j =1 X x ∈ E ( j ) f ( x | S t j ) . Rearranging terms, and using the value condition, we get f ( ¯ S ) ≥ X s i ∈ S t‘ f ( s i | S i ) + ≥ X s i ∈ S t‘ f ( s i | S i ) − + ‘ X j =1 X x ∈ E ( j ) | f ( x | S t j ) |≥ ‘ X j =1 X x ∈ E ( j ) | f ( x | S t j ) | + X s i ∈E ( j ) | f ( s i | S i ) | ≥≥ ε ‘ X j =1 X x ∈ G ( j ) f ( x | S t j ) ≥ ‘ε X x ∈ G ( ‘ ) f ( x | S t ‘ ) . Observing that S t ‘ = S concludes the proof. Theorem 5.
For ε ∈ (0 , / , it is possible to achieve a (9 .
465 + ε ) -approximation in O ( ε log n ) adaptive rounds and O ( nε log n log ε ) queries.Proof. A large part of this proof is very similar to the proof of Theorem 1. Hence, we only highlightthe differences, while retaining the same notation. Note that now S is no more the output of the17lgorithm, but the non-filtered output of ThreshBin (the filtered version being ¯ S ). The two casesin the analysis are similar.If E [ c ( S )] ≥ (1 − ε ) B , then we have a result analogous to (7) in the main text: ALG ≥ E h f ( ¯ S ) i ≥ E [ f ( S )] ≥ τ (1 − ε ) E [ c ( S )] ≥ α (1 − ε ) f ( O ) . Otherwise, we can repeat the algorithm ε log( ε ) times to be sure that, with probability at least1 − ε , we observe B > c ( S ) > c ( ¯ S ). From this we can infer that at most one element is containedin ˜ G ∩ O H . Notice however a difference here: G and ˜ G are the elements with good marginal withrespect to S , not with respect to ¯ S . f ( S ∪ O H ) ≤ f ( S ) + E · f (˜ x | S ) + X x ∈ G f ( x | S ) + X x ∈ O H \ ( G ∪ ˜ G ) f ( x | S ) ≤ f ( ¯ S )(1 + ˆ ε ) + E · ( f (˜ x | S ) − τ c (˜ x )) + τ c ( O H ) ≤ f ( ¯ S )(1 + ˆ ε ) + E · ( f ( x ∗ ) − τ B ) + τ c ( O H ) , where E is the event that ˜ G ∩ O H is not empty given that c ( S ) < B . Proceeding as in the proof ofTheorem 1 and noting that f ( S ) ≤ f ( ¯ S ) ≤ ALG we arrive at the same inequality as in (8) in themain text: f ( O ) ≤ (1 + q + ˆ ε ) (cid:2) p (1 − p ) − αp + αq − ε (cid:3) ALG.
The rest of the proof is essentially the same with the proof of Theorem 1.
B Monotone Objectives and a Knapsack Constraint
For monotone objectives we can improve the approximation factor by slightly modifying the mainalgorithm. Notice, moreover, that in
ThreshSeq the only relevant condition is the cost conditionsince no element can have a negative marginal value.
Lemma 8.
For any set X , threshold τ , precision ε ∈ (0 , , parameter ‘ and budget B , the randomset S output by ThreshSeq is such that E [ f ( S )] ≥ τ (1 − ε ) E [ c ( S )] . The random set S is always a feasible solution and if c ( S ) < B , then all the elements in X \ S have either marginal density with respect to S smaller than τ or there is no room for them in thebudget. Finally, the adaptivity is upperbounded by ε log ( nκ ( X )) , while the query complexity by n ε log ( nκ ( X )) .Proof. Again the proof is similar to the one for the non-monotone case in the main text. There arethree main differences. First, the adaptive complexity is given only by the number of times the costcondition is triggered, hence an upper bound is given by ε log ( nκ ( X )). The query complexity issimply obtained multiplying that by a n factor as in the proof of Theorem 1.Second, the algorithm can now only stop if the budget is exhausted or there are no good elementsfitting within the budget; the while loop terminates only in those two cases. Finally, the main chainof inequalities is simply E [ f ( s t | S t ) | F t ] = X x ∈ X p x f ( x | S t ) ≥ τ X x ∈ G p x c ( x ) ≥ τ (1 − ε ) X x ∈ X p x c ( x ) = (1 − ε ) τ E [ c ( s t ) | F t ] , H and use the SamplingLemma, since f ( S ) ≤ f ( S ∪ O ). Second, if one defines the small elements to be the ones with costsmaller than ε Bn it is possible to account for them by simply adding all of them to the solution atthe cost of filling an ε fraction of the budget. Notice that this can be done while keeping κ ( N + )linear in n . The remaining (1 − ε ) fraction of the budget is then filled via ThreshSeq on the largeelements. The pseudocode is given in Algorithm 5 below.
Algorithm 5:
ParKnapsackMonotone ( N , f, ε, α, B )Full algorithm for monotone objectives and a knapsack constraint Input:
Ground set N , monotone submodular function f , budget B , precision ε ∈ (0 ,
1) andparameter α ∈ (0 , N − ← { x ∈ N : c ( x ) < ε Bn } N + ← N \ N − x ∗ ← max x ∈N f ( x ), ˆ τ ← αn f ( x ∗ ) B ˆ ε ← ε , k ← ε log( n ) for i = 0 , . . . , k in parallel do τ i ← ˆ τ · (1 − ˆ ε ) i for j = 1 , . . . , ε log( ε ) in parallel do S ij ← ThreshSeq ( N + , τ i , ˆ ε, (1 − ˆ ε ) B ) T ij ← S ij ∪ N − T ← arg max { f ( T ji ) , f ( x ∗ ) } Return T Theorem 6.
For ε ∈ (0 , it is possible to achieve a ε approximation in O ( ε log n ) adaptiverounds and O ( n ε log n log ε ) queries or in O ( ε log n ) adaptive rounds and O ( nε log n log ε ) queries.Proof. We show that
ParKnapsackMonotone with parameters α = and any ε ∈ (0 ,
1) satisfiesthe statement of the theorem. We start by noting that the adaptivity bound is given by combiningLemma 8, the fact that the thresholds are guessed in parallel, and the fact that κ ( N + ) ∈ O ( nε ). Weremark that now, since the cost condition is unimodal, binary search in ThreshSeq works withoutany major adjustment.Let O ∗ be the optimal solution and let τ ∗ = α f ( O ∗ ) B . By the parallel guesses we have thatthere exists a τ = τ i such that (1 − ˆ ε ) τ ∗ ≤ τ < τ ∗ . As in the non-monotone case, this is because f ( x ∗ ) ≥ f ( O ∗ ) ≥ f ( x ∗ ). Let S be the random set outputted by ThreshSeq for that τ . Also, let T = S ∪ N − and notice that c ( S ∪ N − ) ≤ B .We can distinguish two cases. First, if E [ c ( S )] ≥ B (1 − ε )(1 − ˆ ε ), then we apply Lemma 8 andwe have f ( O ∗ ) ≤ α (1 − ˆ ε ) (1 − ε ) f ( S ) ≤ α (1 − ˆ ε ) (1 − ε ) f ( T ) . (10)Let’s now address the other case. We can argue as we did in the monotone case: if we run ε log( ε ) independent times the algorithm, at least one of them respects c ( S ) < B ( − ˆ ε ), with19robability at least (1 − ˆ ε ) (the proof of this fact is practically the same as the one of Lemma 6).Let’s call G that event, similarly to what we have done in the main text. Focus on that run and call E the event that in that run there is a good element with respect to the solution not fitting in thebudget. Clearly there may be at most one such item which belongs to the optimal solution O ∗ ; wecall such element ˜ x . If ˜ x exists, then c (˜ x ) ≥ B . This is because in the budget of ThreshSeq , i.e., B (1 − ˆ ε ), at least B budget is empty, under G . f ( O ∗ ) ≤ f ( S ∪ O ∗ ) = f ( T ∪ ( O ∗ \ N − )) ≤ f ( T ) + E f (˜ x | T ) + X x ∈ O ∗ \{ ˜ x } f ( x | T ) ≤ f ( T ) + E f ( x ∗ ) + X x ∈ O ∗ \{ ˜ x } f ( x | S ) ≤ f ( T ) + E ( f ( x ∗ ) − α f ( O ∗ )2 ) + αf ( O ∗ ) . Passing to the expectation and calling q the conditioned probability of the event E given G we have,similarly to the main text: f ( O ∗ ) ≤ E [ f ( O ∗ ∪ S )] = E [ f ( O ∗ ∪ S ) | G ] P ( G ) + E h f ( O ∗ ∪ S ) | G C i P (cid:16) G C (cid:17) ≤ E [ f ( O ∗ ∪ S ) | G ] (1 − ˆ ε ) + ˆ ε f ( O ∗ ) ≤ (1 + q )(1 − ˆ ε ) ALG + (2ˆ ε + α (1 − ˆ ε )(1 − q )) f ( O ) . Notice we used the bound f ( S ∪ O ∗ ) ≤ f ( O ∗ ) which is universal as long as c ( S ) ≤ B . Rearrangingthe terms we have f ( O ∗ ) ≤ (1 + q )(1 − ˆ ε )1 − ε − α (1 − ˆ ε )(1 − q ) ALG . (11)Putting together (10) and (11) and setting α = , we have OP T ≤ (3 + 10ˆ ε ) ALG , for any value of q . Rescaling ˆ ε by a factor of 10 one yields the desired result. C Non-Monotone Objectives and a Cardinality Constraint
In presence of cardinality constraints, there is no need to address separately small and large elements.Moreover, when bounding the elements of the solution whose marginal density is greater than τ butdo not fit in the budget, we just need to consider the case E [ c ( S )] > (1 − ˆ ε ) k instead of consideringhalf of the “budget”.If E [ | S | ] ≥ (1 − ˆ ε ) k we have immediately a good bound in expectation. Otherwise, if we run itat least ε log( ε ) independent times, we have that, with probability at least (1 − ˆ ε ) the cardinalityconstraint k is not met, meaning that all the good elements belong to the set G , as defined in themain text. The full algorithm, ParCardinal , is presented in Algorithm 6 below.
Theorem 7.
For ε ∈ (0, 1/2), it is possible to achieve a (5.83 + ε)-approximation in O(ε⁻¹ log n) adaptive rounds and O(nk ε⁻¹ log n log k log ε⁻¹) queries, or in O(ε⁻¹ log n log k) adaptive rounds and O(n ε⁻¹ log n log k log ε⁻¹) queries.

Algorithm 6: ParCardinal(N, f, k, ε, α, p). Full algorithm for non-monotone objectives and a cardinality constraint.
Input: ground set N, submodular function f, cardinality k, precision ε ∈ (0, 1), α ∈ (0, 1), and sampling probability p
  x* ← arg max_{x∈N} f(x);  τ̂ ← α n f(x*)/k;  ε̂ ← ε/10;  ℓ ← ε̂⁻¹;  M ← ε̂⁻¹ log(n)
  H ← sample each element of N independently at random with probability p
  for i = 0, …, M in parallel do
      τ_i ← τ̂ · (1 − ε̂)^i
      for j = 1, …, ε⁻¹ log(ε⁻¹) in parallel do
          S_ij ← ThreshSeq(H, τ_i, ε̂, ℓ, k)
  T ← arg max {f(S_ij), f(x*)}
  return T

Proof. ParCardinal with parameters α = 3 − 2√2, p = (1 − α)/2, and any ε ∈ (0, 1/2) does the job. The adaptive and query complexities are as in the knapsack case; the only difference is that now each sequence drawn from SampleSeq has length at most k. For the approximation guarantee, we consider two cases, this time depending on the relative ordering of E[|S|] and (1 − ε̂)k.

If E[|S|] ≥ (1 − ε̂)k, then, by Lemma 4, we have

    f(O*) ≤ (1 / (α(1 − ε̂))) E[f(S)].   (12)

Otherwise, each run of the algorithm leaves the cardinality constraint unsaturated with probability at least ε, meaning that over ε⁻¹ log(ε⁻¹) independent runs we have |S| < k in at least one run with probability at least 1 − ε̂; as usual, we call G that event. Focus on that run and recall the definition of the set G from Lemma 5: G contains the good elements still fitting in the cardinality constraint. We have

    f(S ∪ O_H) ≤ f(S) + Σ_{x ∈ G} f(x | S) + Σ_{x ∈ O_H \ (S ∪ G)} f(x | S) ≤ f(S)(1 + ε̂) + τ c(O_H).

Passing to the conditional expectation with respect to G and using f(S ∪ O_H) ≤ 2 f(O*), we have

    p(1 − p) f(O*) ≤ E[f(S ∪ O_H)] = E[f(S ∪ O_H) | G] P(G) + E[f(S ∪ O_H) | G^C] P(G^C)
                  ≤ E[f(S ∪ O_H) | G] (1 − ε̂) + 2ε̂ f(O*)
                  ≤ E[f(S)] (1 + ε̂)(1 − ε̂) + α p (1 − ε̂) f(O*) + 2ε̂ f(O*).

By rearranging the terms, we get

    f(O*) ≤ ((1 + ε̂)(1 − ε̂) / (p(1 − p) − (1 − ε̂)α p − 2ε̂)) E[f(S)].   (13)

Plugging α = 3 − 2√2 and p = (1 − α)/2 into (12) and (13), one gets OPT ≤ (3 + 2√2 + O(ε̂)) ALG. Rescaling ε̂ by a suitable constant factor yields the desired result, since 3 + 2√2 < 5.83. □
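For intuition, here is a self-contained Python sketch that mirrors the control flow of ParCardinal under explicit simplifying assumptions: ThreshSeq is replaced by thresh_seq_stub (our name), a naive sequential stand-in that scans a random permutation of the sample and keeps any element whose marginal value is at least τ. The sketch reproduces the logic of Algorithm 6, not its low adaptivity, and the constants (ε̂ = ε/10, the number of guesses and repetitions) follow the reconstruction above.

import math
import random

def par_cardinal_sketch(N, f, k, eps, alpha, p):
    # Sequential toy version of ParCardinal; f maps frozensets to reals.
    def marg(x, S):
        return f(S | {x}) - f(S)

    def thresh_seq_stub(H, tau, k):
        # Stand-in for ThreshSeq: random order, keep elements with marginal >= tau.
        S, pool = frozenset(), list(H)
        random.shuffle(pool)
        for x in pool:
            if len(S) >= k:
                break
            if marg(x, S) >= tau:
                S = S | {x}
        return S

    eps_hat = eps / 10
    x_star = max(N, key=lambda x: f(frozenset({x})))
    tau_hat = alpha * len(N) * f(frozenset({x_star})) / k
    m = math.ceil(math.log(len(N)) / eps_hat)          # threshold guesses
    reps = math.ceil(math.log(1 / eps_hat) / eps_hat)  # repetitions per guess
    H = {x for x in N if random.random() < p}          # subsampled ground set

    best = frozenset({x_star})
    for i in range(m + 1):        # run in parallel in the actual algorithm
        tau = tau_hat * (1 - eps_hat) ** i
        for _ in range(reps):     # run in parallel in the actual algorithm
            S = thresh_seq_stub(H, tau, k)
            if f(S) > f(best):
                best = S
    return best

# toy run with a modular (hence submodular) objective; alpha and p as in Theorem 7
universe = set(range(20))
w = {x: random.random() for x in universe}
f = lambda S: sum(w[x] for x in S)
sol = par_cardinal_sketch(universe, f, k=5, eps=0.5,
                          alpha=3 - 2 * 2 ** 0.5, p=2 ** 0.5 - 1)
print(len(sol), f(sol))

Taking the best set over all (i, j) pairs and the singleton {x*} matches the final step of Algorithm 6; in the actual algorithm both loops consist of independent calls, so they contribute only a multiplicative factor to the query count and nothing to the adaptivity.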
References

[1] G. Amanatidis, F. Fusco, P. Lazos, S. Leonardi, and R. Reiffenhäuser. Fast adaptive non-monotone submodular maximization subject to a knapsack constraint. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[2] E. Balkanski and Y. Singer. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 1138–1151. ACM, 2018.

[3] E. Balkanski, A. Breuer, and Y. Singer. Non-monotone submodular maximization in exponentially fewer iterations. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 2359–2370, 2018.

[4] E. Balkanski, A. Rubinstein, and Y. Singer. An exponential speedup in parallel running time for submodular maximization without loss in approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 283–302. SIAM, 2019.

[5] E. Balkanski, A. Rubinstein, and Y. Singer. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 66–77. ACM, 2019.

[6] A. Breuer, E. Balkanski, and Y. Singer. The FAST algorithm for submodular maximization. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1134–1143. PMLR, 2020.

[7] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1433–1452. SIAM, 2014.

[8] J. Carbinell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR Forum, 51(2):209–210, 2017.

[9] C. Chekuri and K. Quanrud. Submodular function maximization in parallel via the multilinear relaxation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 303–322. SIAM, 2019.

[10] C. Chekuri and K. Quanrud. Parallelizing greedy for submodular set function maximization in matroids and beyond. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 78–89. ACM, 2019.

[11] C. Chekuri, J. Vondrák, and R. Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput., 43(6):1831–1879, 2014.

[12] L. Chen, M. Feldman, and A. Karbasi. Unconstrained submodular maximization with constant adaptive complexity. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 102–113. ACM, 2019.

[13] A. Das and D. Kempe. Algorithms for subset selection in linear regression. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada, May 17-20, 2008, pages 45–54. ACM, 2008.

[14] A. Das and D. Kempe. Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection. J. Mach. Learn. Res., 19:3:1–3:34, 2018.

[15] D. Dueck and B. J. Frey. Non-metric affinity propagation for unsupervised image categorization. In IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil, October 14-20, 2007, pages 1–8. IEEE Computer Society, 2007.

[16] A. Ene and H. L. Nguyen. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 274–282. SIAM, 2019.

[17] A. Ene and H. L. Nguyen. Parallel algorithm for non-monotone DR-submodular maximization. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 2902–2911. PMLR, 2020.

[18] A. Ene, H. L. Nguyen, and A. Vladu. A parallel double greedy algorithm for submodular maximization. CoRR, abs/1812.01591, 2018.

[19] A. Ene, H. L. Nguyen, and A. Vladu. Submodular maximization with matroid and packing constraints in parallel. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 90–101. ACM, 2019.

[20] M. Fahrbach, V. S. Mirrokni, and M. Zadimoghaddam. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 255–273. SIAM, 2019.

[21] M. Fahrbach, V. S. Mirrokni, and M. Zadimoghaddam. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 1833–1842. PMLR, 2019.

[22] U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.

[23] U. Feige, V. S. Mirrokni, and J. Vondrák. Maximizing non-monotone submodular functions. SIAM J. Comput., 40(4):1133–1153, 2011.

[24] M. Feldman, J. Naor, and R. Schwartz. A unified continuous greedy algorithm for submodular maximization. In IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, pages 570–579. IEEE Computer Society, 2011.

[25] A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In Internet and Network Economics - 6th International Workshop, WINE 2010, Stanford, CA, USA, December 13-17, 2010. Proceedings, volume 6484 of Lecture Notes in Computer Science, pages 246–257. Springer, 2010.

[26] F. M. Harper and J. A. Konstan. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, 2016.

[27] J. D. Hartline, V. S. Mirrokni, and M. Sundararajan. Optimal marketing strategies over social networks. In Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008, pages 189–198. ACM, 2008.

[28] E. Kazemi, M. Zadimoghaddam, and A. Karbasi. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 2549–2558. PMLR, 2018.

[29] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. Theory Comput., 11:105–147, 2015.

[30] R. Khanna, E. R. Elenberg, A. G. Dimakis, S. N. Negahban, and J. Ghosh. Scalable greedy feature selection via weak submodularity. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, volume 54 of Proceedings of Machine Learning Research, pages 1560–1568. PMLR, 2017.

[31] A. Kuhnle. Nearly linear-time, parallelizable algorithms for non-monotone submodular maximization. To appear in AAAI, abs/2009.01947, 2021.

[32] A. Kulik, H. Shachnai, and T. Tamir. Approximations for monotone and nonmonotone submodular maximization with knapsack constraints. Math. Oper. Res., 38(4):729–739, 2013.

[33] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, pages 234–243, Berlin, Heidelberg, 1978. Springer Berlin Heidelberg.

[34] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 2049–2057, 2013.

[35] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 1358–1367. JMLR.org, 2016.

[36] B. Mirzasoleiman, J. A. Bilmes, and J. Leskovec. Coresets for data-efficient training of machine learning models. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 6950–6960. PMLR, 2020.

[37] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Math. Program., 14(1):265–294, 1978.

[38] M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett., 32(1):41–43, 2004.

[39] S. Tschiatschek, R. K. Iyer, H. Wei, and J. A. Bilmes. Learning mixtures of submodular functions for image collection summarization. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1413–1421, 2014.

[40] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst., 42(1):181–213, 2015.