Budget-Smoothed Analysis for Submodular Maximization
Aviad Rubinstein, Stanford University, [email protected]
Junyao Zhao, Stanford University, [email protected]
Abstract
The greedy algorithm for submodular function maximization subject to a cardinality constraint is guaranteed to approximate the optimal solution to within a 1 − 1/e factor. For worst-case instances, it is well known that this guarantee is essentially tight — for greedy and in fact any efficient algorithm. Motivated by the question of why greedy performs better in practice, we introduce a new notion of budget-smoothed analysis. Our framework requires larger perturbations of budgets than traditional smoothed analysis for, e.g., linear programming. Nevertheless, we show that under realistic budget distributions, greedy and related algorithms enjoy provably better approximation guarantees that hold even for worst-case submodular functions.

∗ We thank Eric Balkanski and Matt Weinberg for interesting discussions of the model and/or comments on earlier drafts.

Introduction
Monotone submodular function maximization subject to a cardinality constraint is a fundamental problem in combinatorial optimization with a wide variety of applications, including feature selection, sensor placement, influence maximization in social networks, document summarization, etc. (see e.g. [KG14] and references therein). We will use influence maximization in social networks as a running example: an advertiser has a limited budget of k free product samples that she wishes to distribute to seed consumers, who will then propagate the news about the product to their friends, then their friends' friends, etc. The standard approach to this problem [KKT15] models the expected final reach of the campaign as a monotone submodular function f : P([n]) → R of the set of seed consumers (where [n] is the set of all users in the network). The goal of the optimization problem is to find a set of k seed consumers that (approximately) maximizes f.

Classic work shows that the simple greedy algorithm achieves a 1 − 1/e approximation to the optimal solution in the worst case [NWF78]. Furthermore, this bound is tight for algorithms that make sub-exponentially many queries to the function [NW78, Von13]; and even succinctly representable functions (e.g. simple models of influence propagation on a social network graph) do not allow better approximation algorithms unless P = NP [Fei98]. In theory, this tight characterization of the optimal approximation factor is very satisfying.

Given the importance of this problem in practical applications, it is also interesting to ask what is the optimal approximation factor that can be obtained on realistic instances. As one can expect, the performance of the greedy algorithm tends to be significantly better in practice (e.g. [TSP20]).
When reasoning about real-world instances, there is a natural tradeoff between quality and generality of the guarantees: at one extreme, worst-case analysis only gives a (1 − 1/e)-approximation but applies to every instance; at the other extreme, we could, in principle, empirically evaluate the exact performance of the greedy algorithm on each instance of interest, but we would have to redo this for every new instance.¹ Our goal in this work is to provide the simplest possible explanation for better-than-(1 − 1/e)-approximation in practice. I.e., we want to extend the classic worst-case model, while making minimal assumptions, to explain why efficient algorithms like greedy should obtain better approximation factors.

Coming up with useful and realistic assumptions about submodular functions continues to be an interesting and active topic of research. In Section 1.3 we survey several natural restrictions, including recent success stories that allow for improved approximation algorithms [KL14, SVW17, BRS16, HS16, Yos16, CRV17, TSP20, STY20]. Deferring details for later, we argue that the bottom line of this discussion is that submodular functions are complex objects, and as such, modeling their beyond-worst-case behavior is tricky and often application-dependent.

In this work, we circumvent the complexity of modeling typical submodular functions by focusing on the beyond-worst-case behavior of a much simpler object: the cardinality constraint. Namely, inspired by the celebrated smoothed analysis [ST04], we initiate the study of submodular function maximization in a budget-smoothed analysis. We show that without making any assumptions about the submodular function², we can make significant progress by merely perturbing the cardinality constraint.

Clearly, tiny perturbations of the constraint cannot escape the hardness-of-approximation results.³ Instead we consider larger perturbations, e.g. a budget drawn uniformly from [x, 10x] for some x. To motivate this assumption, consider a small and a large social marketing campaign. While they both advertise and propagate on the same social network (hence maximize essentially the same, possibly worst-case, submodular function), their budgets can easily differ by an order of magnitude or more. Thus even if the social network/submodular function is worst case, the "average" advertiser uses an "average" budget, which is independent of the social network/submodular function.

¹ In general, calculating the approximation factor on a real-world instance requires computing the value of the optimal solution; but if we could do that efficiently, we wouldn't need the greedy algorithm in the first place.
² Except monotonicity.
³ By submodularity, a (1 + ε)-multiplicative perturbation of the budget cannot affect the value of the solution by more than a (1 + ε) factor.

The budget-smoothed analysis model
We study monotone submodular function maximization subject to a cardinality constraint in the following semi-adversarial setting:
Definition 1.1 (Budget-smoothed analysis).

1. The distribution D̃ of budgets (e.g. uniform over [x, 10x]) is given as input to the adversary.
2. The adversary chooses a (monotone submodular) function f.
3. The cardinality constraint k ∼ D̃ is drawn at random and given as input to the algorithm.
4. The algorithm (approximately) maximizes f over all sets of size at most k.

For any distribution D̃, we are interested in the expected ratio R_ALG(f, D̃) between the value obtained by the algorithm and the optimal solution,

    R_ALG(f, D̃) := E_{k ∼ D̃} [ f(ALG) / f(OPT) ].

We focus on algorithms that do not assume knowledge of D̃, but, somewhat surprisingly, they will be competitive with algorithms that know D̃.

Naturally, the approximation factors we obtain will depend on the distribution D̃. For example, as we explained above, if D̃ is concentrated in a small multiplicative interval [x, (1 + ε)x], we cannot hope to get more than a (1 + ε)-factor advantage.

For notational convenience, we make the following change to the above model: rather than sampling the budget from a distribution, each instance will be characterized by a base budget k and a budget perturbation distribution D, with the final cardinality constraint being ρ · k for ρ ∼ D. This will allow us to talk about a distribution D like "uniform over [x, 10x]" while studying the asymptotic complexity as the instance size and cardinality constraint go to infinity.

Our results
In our model of budget-smoothed analysis, we investigate the following questions: Is it possible to obtain better approximation factors? If so, how much better compared to the worst-case 1 − 1/e? Are simple algorithms like Greedy still optimal?

Remark.
All the hardness results below hold both in the black-box oracle model (for any algorithm that makes a subexponential number of queries) and, assuming P ≠ NP, in the computational model for coverage functions (on a polynomial-size graph).

Table 1: Empirical Results

    Budget perturbation distribution      Worst-case approximation ratio
    Baseline (no perturbation)            0.6321
    Uniform over [1, 10]                  0.6675
    Log-scale-uniform over [1, 10]        0.6674
    Log-scale-uniform over [1, …]         …

Result 1: Optimal approximation factors
For any budget perturbation distribution D, we characterize the optimal possible approximation factor as a simple (but non-convex) optimization program (Section 5). We also include some numerical estimates for natural distributions (Table 1). In particular, we see that our analysis guarantees non-negligible improvements even for worst-case submodular functions. For the special case of D supported on two budgets, we also give a closed-form solution (Proposition A.4).

Result 2: Optimal approximation algorithms
We prove that a large class of algorithms that are (near-)optimal in the worst case continue to obtain (near-)optimal approximation factors under budget-smoothed analysis for any distribution (Observation 3.5 and Appendix B). This class includes the simple greedy algorithms, as well as (variants of) recent efficient parallel algorithms and Map-Reduce algorithms. In particular, these algorithms are optimal even in comparison to algorithms that know the budget perturbation distribution D.

Result 3: Bounding the best-case budget distribution
We also prove that for every budget distribution and any efficient algorithm, the optimal budget-smoothed analysis approximation factor is bounded away from 1 (in particular, it is at most 0.…).

At a super high level, our results show that while the worst-case approximation factor for monotone submodular maximization is ≈ 0.63, our new model can explain, under minimal assumptions, why we should expect at least ≈ 0.67. Is the improvement from 0.63 to 0.67 significant? First, we believe that given the vast interest in this problem, this improvement is already interesting. Clearly, it is insufficient to fully explain the success of greedy on real-world instances; without making any assumptions on the submodular function, it would be very ambitious to expect that in a first attempt.⁴ For comparison, [ST04]'s original polynomial upper bound for smoothed analysis (of a non-trivial variant) of the Simplex algorithm was Õ(d⁵⁵n⁸⁶σ⁻³⁰ + d⁷⁰n⁸⁶) iterations.⁵

⁴ Although we did expect the approximation factor to be higher, given that our budget perturbations completely shatter known hardness constructions. We find the robustness of our new hardness constructions quite surprising.
⁵ Here d is the number of variables, n is the number of constraints, and σ is the variance of the perturbation; see [DH18].
We believe that there are countless future directions to explore in our new model of budget-smoothed analysis. To exhibit this breadth of possibilities, we mention a couple of preliminary results that we have for other problems that fit into our new model:

• Submodular maximization subject to a knapsack constraint. While the optimal 1 − 1/e factor can again be recovered in polynomial time, the state-of-the-art algorithms for this problem are still not completely satisfying [Svi04, EN19, NS20], and the greedy algorithm does not provide any non-trivial approximation guarantee. Our preliminary results show that with budget-smoothed analysis, greedy guarantees a constant-factor approximation with a knapsack constraint, and in fact, to date we have not been able to rule out a 1 − 1/e approximation (or better).

• Budget-feasible mechanism design: this is a well-studied problem in AGT [Sin10, CGL11, DPS11, BKS12, SM13, CC14, EG14, GNS14, HIM14, BH16, CC16, NSKK16, ZLM16, ZWG+…], where the state-of-the-art mechanism of [AGN18] guarantees an approximation factor related to 1 − 1/e. Our preliminary results show that this mechanism does not improve at all under budget-smoothed analysis, but this inspires a new mechanism that is optimal under budget-smoothed analysis and also significantly outperforms [AGN18]'s mechanism on realistic distributions, both theoretically and empirically.

In Section 1.3 we survey several other approaches to beyond-worst-case submodular functions. In Section 3 we prove our core technical result, namely that greedy obtains optimal approximation factors for any distribution; in Appendix B we extend this result to other related algorithms. In the following sections we build on these techniques; in particular, we henceforth simply analyze the approximation factors of the greedy algorithm. In Section 4, we prove that the optimal approximation factor is bounded away from 1 for any distribution. In Section 5, we characterize the optimal approximation factor as an optimization program. We then use this program to simulate several exemplary distributions.
Beyond-worst-case submodular functions

Due to the popularity of submodular maximization in practice, there is a lot of interest in understanding and designing algorithms for "typical" cases. We discuss a few approaches below. We note that our model of a beyond-worst-case constraint is orthogonal to any assumptions about the submodular function, and in principle could be combined with any of them to obtain even stronger results.

The model closest in spirit to our smoothed-analysis-like approach is to take a worst-case submodular function and perturb it with random noise. The most straightforward way of doing this is independently perturbing the value of the function for each set. Unfortunately, this breaks submodularity, which makes the problem significantly harder, even for small perturbations: [HS17] barely recover the 1 − 1/e approximation factor in this setting (under further restrictions and with a technically involved algorithm).

Another approach is to consider coverage functions, an important special class of monotone submodular functions. This restriction has been successful for learning submodular functions [BCIW12, BDF+12, FK14], but Feige's NP-hard instance already rules out efficient algorithms with improved approximation ratios for this case. One may combine this restriction with perturbations of the weights of the elements of the ground set; but it is not hard to show that Feige's instance can be made robust even to very large amounts of noise. Another alternative is to consider special classes of graphs, e.g. power-law, small-world, or triangle-dense graphs, that are common for social networks [WS98, GRS16]. But again, Feige's instance either already satisfies all of those properties or can be adapted to do so.

Another popular restriction of monotone submodular functions is bounded curvature [CC84], which restricts the extent to which ground elements interact; this indeed allows for better algorithms, with applications to e.g. maximum entropy sampling [CC84, SVW17, BRS16, HS16, Yos16]. But bounded curvature seems too restrictive for applications like influence in social networks and consumers' valuations with diminishing returns.

To cope with the limited applicability of curvature, the original paper of [CC84] also defined a relaxed notion of greedy curvature, which only restricts the interaction between elements selected by the greedy algorithm and elements in the optimal solution. In exciting recent work, [TSP20] define various notions of sharpness, which only restrict the interactions of the average element of the optimal solution. Both the greedy curvature and sharpness parameters suffer from the disadvantage that they may be intractable to compute (both are assumptions about the interaction of elements with the optimal solutions, and if we knew the optimal solution...). Moreover, due to their complicated form, it is hard to heuristically reason about their fit for any particular application.
Nevertheless, on the positive side, both are more realistic than the vanilla curvature assumption, and combining them with our budget-smoothed analysis model is an interesting direction for future research.

[CRV17] study submodular maximization under a stability assumption, i.e., they assume that the optimal subset does not change when the function is perturbed. [TSP20] argue that in the context of submodular maximization, stable instances may fail to capture significant interaction between elements. As in the case of greedy curvature and sharpness, it is also not clear how to compute the stability of a function, or reason about instances that we expect to be stable.

Finally, one setting that is both natural and allows for improved approximation factors is influence maximization in undirected graphs [KL14, ST19, STY20]. Specifically, [KL14] prove that the greedy algorithm obtains a (1 − 1/e + ε)-approximation (for some small unspecified constant ε > 0) for the independent cascade model on undirected graphs. [STY20] show that in the linear threshold model, the greedy algorithm does not beat the (1 − 1/e)-approximation factor (by any constant, in the worst case).

Preliminaries

Definition 2.1.
A function f : 2^V → R≥0 is submodular if for all S ⊆ T ⊆ V and i ∈ V \ T, f(S ∪ {i}) − f(S) ≥ f(T ∪ {i}) − f(T), where V is called the ground set. Moreover, we denote the marginal gain by f(X | S) := f(X ∪ S) − f(S).

We make the following conventions in this paper. When we say "efficient algorithm", we mean polynomial-time algorithms in the general computation model assuming P ≠ NP, or algorithms using a sub-exponential number of function queries in the oracle query model. Moreover, we consider a continuous distribution of budget perturbations D, and we let D(k) denote the distribution of budgets in which a budget is sampled by multiplying a random perturbation factor ρ ∼ D with k (if ρ · k is fractional, we can round it to an integer). Furthermore, following Definition 1.1, we denote

    R_ALG(D(k)) := min_f R_ALG(f, D(k))  s.t. f is monotone and submodular.

[Footnote to the curvature discussion above: If, for example, we already selected all of a node's neighbors, the marginal contribution of adding this node is diminished to zero. For consumers' valuations, the marginal contribution of, e.g., the one-thousandth apple, is again diminished to essentially zero. This means that curvature is unbounded in both settings; see [CC84, SVW17] for formal definitions.]

Our hardness results build on the max-k-cover instances [Fei98] or Vondrák's hard instances [Von13]. In the following theorem, we summarize useful properties of these two hardness results.

Theorem 2.2.
There exists a class of monotone submodular functions C such that for every ε > 0, for any efficient algorithm A that, given a submodular function f and an integer l, outputs a set X_l of cardinality l, for every sufficiently large k that grows with the size of the instance, there is a submodular function f_k ∈ C such that

(i) for all l ≤ k, f_k(O_l) = (l/k) f_k(O_k), where O_l is the optimal set that maximizes f_k among all cardinality-l sets, and

(ii) for all l, f_k(X_l) ≤ (1 − e^{−l/k} + ε) f_k(O_k).

Next, we state a standard lemma for greedy analysis.
Lemma 2.3.
Given a monotone submodular f, we let X_k and O_k denote the greedy solution and the optimal solution of cardinality k, respectively. Then, for all i, k > 0,

    f(X_i) − f(X_{i−1}) ≥ (1/k) · (f(O_k) − f(X_{i−1})).

Proof. Let x_i denote the i-th element selected by greedy. It holds that

    f(X_i) − f(X_{i−1}) = f(x_i | X_{i−1}) ≥ (1/k) · Σ_{o ∈ O_k} f(o | X_{i−1}) ≥ (1/k) · f(O_k | X_{i−1}) ≥ (1/k) · (f(O_k) − f(X_{i−1})),

where the first inequality is by greedy selection, the second is by submodularity, and the third is by monotonicity.

Greedy is optimal

In this section, we prove our core technical result: greedy is optimal for submodular maximization with respect to an arbitrary distribution of budget perturbations.
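Lemma 2.3 is easy to verify numerically. The following sketch (our own illustration, not code from the paper) runs plain greedy on a small random coverage function, a canonical monotone submodular function, and checks the per-step inequality against a brute-force optimum:

```python
import itertools
import random

random.seed(0)

# A random coverage function: each of 8 ground elements covers 5 points of a 15-point universe.
covers = {i: frozenset(random.sample(range(15), 5)) for i in range(8)}

def f(S):
    """Monotone submodular coverage function: number of universe points covered by S."""
    return len(set().union(*(covers[i] for i in S))) if S else 0

def greedy(k):
    """Plain greedy: k steps, each adding the element with the largest marginal gain."""
    S = []
    for _ in range(k):
        S.append(max((i for i in covers if i not in S),
                     key=lambda i: f(S + [i]) - f(S)))
    return S

k = 3
opt_val = max(f(list(T)) for T in itertools.combinations(covers, k))
S = greedy(k)
for i in range(1, k + 1):
    # Lemma 2.3: f(X_i) - f(X_{i-1}) >= (f(O_k) - f(X_{i-1})) / k
    assert f(S[:i]) - f(S[:i - 1]) >= (opt_val - f(S[:i - 1])) / k
# Unrolling the lemma yields the worst-case bound f(X_k) >= (1 - (1 - 1/k)^k) f(O_k).
assert f(S) >= (1 - (1 - 1 / k) ** k) * opt_val
```

Unrolling the per-step inequality is exactly how the classic (1 − 1/e) worst-case bound of [NWF78] is derived.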
Theorem 3.1.
For any distribution of budget perturbations D, for every ε′ > 0, for any efficient algorithm A, for every sufficiently large k that grows with the size of the instance, it holds that R_A(D(k)) ≤ (1 + ε′) R_greedy(D(k)).

Theorem 3.1 follows directly from Theorem 3.2 using a discretization argument.
Theorem 3.2.
There is a class of monotone submodular functions C such that, for any perturbation factors 0 < ρ₁ < ρ₂ < · · · < ρ_m, for every ε > 0, there exists a sufficiently large k that grows with the size of the instance such that, given m budgets k₁ = ρ₁ · k, . . . , k_m = ρ_m · k, for any monotone submodular function f̂, for any efficient algorithm A, there exists a function f ∈ C such that for all i ∈ [m], the solution Y_{k_i} computed by A for budget k_i has value f(Y_{k_i}) ≤ (1 + ε) f(X_{k_i}), where X_{k_i} is the greedy solution for budget k_i; moreover, greedy achieves no worse an approximation ratio on f̂ than on f for every budget k_i.

(Footnote: The implicit dependence of k on ε in Theorem 2.2 carries over to the dependence of k on ε′ in this statement; therefore, we keep such dependence implicit, and we are mostly interested in the asymptotic result.)

Proof of Theorem 3.1. For arbitrarily small τ > 0, let ρ_min and ρ_max be such that the mass of D on [ρ_min, ρ_max] is at least 1 − τ. We discretize {ρ_min · k, ρ_min · k + 1, . . . , ρ_max · k} into {ρ_min · k, (1 + δ)ρ_min · k, (1 + δ)²ρ_min · k, . . . , ρ_max · k}. Without loss of generality, we assume that there exists m such that (1 + δ)^{m−1} ρ_min = ρ_max and that every k_i := (1 + δ)^{i−1} ρ_min · k is integral. Let f̂ be the worst-case monotone submodular function, for which greedy achieves only an R_greedy(D(k)) approximation in expectation. By Theorem 3.2, for any efficient algorithm A, there is a monotone submodular function f such that for all i ∈ [m], the solution Y_{k_i} outputted by A for budget k_i only achieves f(Y_{k_i}) ≤ (1 + ε) f(X_{k_i}), where X_{k_i} is the greedy solution for budget k_i, and moreover, for every budget k_i,

    f(X_{k_i}) / f(O_{k_i}) ≤ f̂(X̂_{k_i}) / f̂(Ô_{k_i}),    (1)

where O_{k_i} and Ô_{k_i} denote optimal size-k_i sets of f and f̂ respectively, and X̂_{k_i} denotes the size-k_i greedy solution for f̂.

Besides, because the marginal gain in each iteration of greedy is non-increasing, we have that (1 + δ) f(X_{k_{i−1}}) ≥ f(X_{k_i}). Furthermore, without loss of generality, we assume that f(Y_b) is non-decreasing in b, since otherwise, for budget b, we can let the algorithm choose the best solution among the Y_l for all l ≤ b instead. For any 2 ≤ i ≤ m and any budget b such that k_{i−1} ≤ b ≤ k_i, it follows that

    f(Y_b)/f(O_b) ≤ f(Y_{k_i})/f(O_b)    (since b ≤ k_i)
      ≤ (1 + ε) · f(X_{k_i})/f(O_b)    (by Theorem 3.2)
      ≤ (1 + ε)(1 + δ) · f(X_{k_{i−1}})/f(O_b)    (diminishing marginal gain)
      ≤ (1 + ε)(1 + δ) · f(X_{k_{i−1}})/f(O_{k_{i−1}})    (since b ≥ k_{i−1})
      ≤ (1 + ε)(1 + δ) · f̂(X̂_{k_{i−1}})/f̂(Ô_{k_{i−1}})    (by Eq. (1))
      ≤ (1 + ε)(1 + δ) · f̂(X̂_b)/f̂(Ô_{k_{i−1}})    (since X̂_{k_{i−1}} ⊆ X̂_b)
      ≤ (1 + ε)(1 + δ)² · f̂(X̂_b)/f̂(Ô_b).    (since f̂(Ô_b) ≤ (b/k_{i−1}) f̂(Ô_{k_{i−1}}) ≤ (1 + δ) f̂(Ô_{k_{i−1}}) by submodularity)

Therefore, for every budget b in {ρ_min · k, ρ_min · k + 1, . . . , ρ_max · k}, A can achieve on f in expectation at most a factor of (1 + ε)(1 + δ)² better than what greedy achieves on f̂. The proof finishes since δ, ε, τ can be arbitrarily small.

In our proof, we will not derive the analytic formula of the approximation ratio; instead, the proof works in a black-box way. First, we introduce an array of parameters such that every instance can be characterized by these parameters, and we can show a parameterized guarantee of the marginal gain for each iteration of greedy. Then, we construct a hard instance characterized by the same parameters, such that the best possible marginal gains for this instance always match the parameterized guarantees from greedy. It follows that the performance of greedy is optimal for every budget. Our hard instance has the following nice structure: it is a convex combination of disjoint-support copies of the classic hard instances guaranteed by Theorem 2.2.
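For intuition about these hard instances: unrolling the per-step guarantee of Lemma 2.3 reproduces exactly the 1 − e^{−l/k} curve of Theorem 2.2(ii), so on the hard instances greedy's value trajectory is essentially pinned down. A quick numerical sketch (our own, not code from the paper):

```python
import math

def greedy_guarantee(l, k):
    """Unroll Lemma 2.3: F_i = F_{i-1} + (1 - F_{i-1})/k, with f(O_k) normalized to 1."""
    F = 0.0
    for _ in range(l):
        F += (1.0 - F) / k
    return F

# The discrete recurrence converges to 1 - e^{-l/k}, the curve in Theorem 2.2(ii).
k = 1000
for l in (k // 2, k, 2 * k):
    assert abs(greedy_guarantee(l, k) - (1 - math.exp(-l / k))) < 1e-3
# At l = k this is the familiar 1 - 1/e ~ 0.632 worst-case factor (Table 1's baseline).
assert abs(greedy_guarantee(k, k) - (1 - 1 / math.e)) < 1e-3
```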
Proof of Theorem 3.2.
Proof setup: bounding a single step of greedy performance
We first lower bound the single-step performance of greedy solutions. By Lemma 2.3, we have the following performance guarantees for each iteration of greedy:

    ∀ l ∈ [m]:  f(X_i) − f(X_{i−1}) ≥ (1/k_l) · (f(O_{k_l}) − f(X_{i−1}))    (the l-th guarantee),

where O_{k_l} denotes the optimal solution of cardinality k_l, and we call the inequality associated with O_{k_l} the l-th guarantee. Given any 1 ≤ l₁ < l₂ ≤ m, if the l₂-th guarantee dominates the l₁-th guarantee (i.e., its right-hand side is at least as large) at some iteration i, then the l₂-th guarantee will keep dominating the l₁-th guarantee for all iterations i′ ≥ i, because the two guarantees are linear functions of the variable f(X_{i−1}), and the l₂-th guarantee decreases more slowly than the l₁-th guarantee. Therefore, as f(X_i) increases, the best guarantee can only transit from some l to some l′ > l. Given an instance, we let t ≤ m be the number of guarantees that ever dominate before the k_m-th iteration, and we let l₁ < l₂ < · · · < l_t be the indices of the corresponding best guarantees.

For j ≤ t − 1, let F_j be the lowest possible value of f(X_{i−1}) for which the j-th transition occurs:

    (1/k_{l_j}) · (f(O_{k_{l_j}}) − F_j) = (1/k_{l_{j+1}}) · (f(O_{k_{l_{j+1}}}) − F_j),    (2)

where the left-hand side is the l_j-th guarantee and the right-hand side is the l_{j+1}-th guarantee. We will be particularly interested in the quantity r_j := F_j / f(O_{k_{l_j}}). Plugging into Eq. (2), we have that

    r_j = (1 − (k_{l_j}/k_{l_{j+1}}) · (f(O_{k_{l_{j+1}}}) / f(O_{k_{l_j}}))) / (1 − k_{l_j}/k_{l_{j+1}}).    (3)

Lower bounding the total value of the greedy solution recursively
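The recursion about to be defined, iterating the best per-step guarantee, is easy to simulate. The following sketch (our own, with illustrative budgets and optimal values) tracks which guarantee dominates and checks the transition point given by Eq. (3):

```python
# Illustrative budgets k_{l_1} < k_{l_2} < k_{l_3} and optimal values f(O_{k_{l_j}});
# toy numbers (ours), chosen to respect submodularity: f(O_{k'}) <= (k'/k) f(O_k).
ks = [10.0, 25.0, 60.0]
opts = [1.0, 1.8, 2.6]

def best(F):
    """Index of the dominating guarantee (opt_l - F) / k_l at current value F."""
    return max(range(len(ks)), key=lambda l: (opts[l] - F) / ks[l])

# Iterate f^{greedy-lb}(q) = f^{greedy-lb}(q-1) + max_l (opt_l - f^{greedy-lb}(q-1)) / k_l.
Fs, path = [0.0], []
for _ in range(int(ks[-1])):
    l = best(Fs[-1])
    path.append(l)
    Fs.append(Fs[-1] + (opts[l] - Fs[-1]) / ks[l])

# The dominating guarantee only ever moves forward (from l to some l' > l).
assert all(b >= a for a, b in zip(path, path[1:]))

# Eq. (3): the first transition happens once the bound passes r_1 * f(O_{k_{l_1}}).
r1 = (1 - (ks[0] / ks[1]) * (opts[1] / opts[0])) / (1 - ks[0] / ks[1])
i1 = path.index(1)
assert Fs[i1 - 1] <= r1 * opts[0] < Fs[i1]
```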
For q ≥ 0, we denote by f^{greedy-lb}(q) the best lower bound induced by the union of the "l-th guarantees" on the value of the q-th iterate of the greedy algorithm, namely

    f^{greedy-lb}(q) := f^{greedy-lb}(q − 1) + max_l { (1/k_l) · (f(O_{k_l}) − f^{greedy-lb}(q − 1)) }.

Now we analyze f^{greedy-lb}(q) specifically for the instance with the aforementioned guarantee transitions. We start from the l₁-th guarantee and let f^{greedy-lb}(0) = 0. Inductively, suppose that in the current iteration q the l_j-th guarantee dominates the others; we apply the l_j-th guarantee,

    f^{greedy-lb}(q) − f^{greedy-lb}(q − 1) = (f(O_{k_{l_j}}) − f^{greedy-lb}(q − 1)) / k_{l_j},

and continue iteratively until we reach some i_j-th iteration such that f^{greedy-lb}(i_j − 1) ≤ r_j · f(O_{k_{l_j}}) < f^{greedy-lb}(i_j). At the i_j-th iteration, the l_{j+1}-th guarantee starts dominating, and thus we switch to the l_{j+1}-th guarantee and continue as above.

The approximation ratio based on the f^{greedy-lb}(q)'s is determined by the r_j's

We claim that the parameters r_j fully determine the ratio between the greedy lower bound f^{greedy-lb}(k_{l_i}) and f(O_{k_{l_i}}) for all i ∈ [t]. To see this, first observe that by Eq. (3) we can infer f(O_{k_{l_{j+1}}})/f(O_{k_{l_j}}) from r_j. We can assume that f(O_{k_{l_1}}) is fixed without loss of generality, and then the parameters r_j determine all the remaining f(O_{k_{l_j}}), i.e., for all 1 < j ≤ t,

    f(O_{k_{l_j}}) = f(O_{k_{l_1}}) · Π_{j′=1}^{j−1} ( k_{l_{j′+1}}/k_{l_{j′}} − (k_{l_{j′+1}}/k_{l_{j′}} − 1) · r_{j′} ).    (4)

Moreover, the greedy lower bound f^{greedy-lb}(k_{l_i}) is by definition a linear combination of the f(O_{k_{l_j}})'s. Therefore, the ratio between any f^{greedy-lb}(k_{l_i}) and f(O_{k_{l_i}}) is fully characterized by the r_j's. In other words, given an instance, we can get approximation ratios of the greedy algorithm that depend only on its parameters r_j.

By definition of r₁, any feasible r₁ has to satisfy r₁ ≤ 1, and by our assumption on the transitions, any feasible r_j should satisfy r_{j−1} · f(O_{k_{l_{j−1}}}) ≤ r_j · f(O_{k_{l_j}}) for all 1 < j ≤ t, which is equivalent to

    r_{j−1} ≤ r_j · (f(O_{k_{l_j}}) / f(O_{k_{l_{j−1}}}))    (5)
      = r_j · ( k_{l_j}/k_{l_{j−1}} − (k_{l_j}/k_{l_{j−1}} − 1) · r_{j−1} )    (by Eq. (4))
      = r_j · ( (k_{l_j} − (k_{l_j} − k_{l_{j−1}}) · r_{j−1}) / k_{l_{j−1}} )
      = r_j · ( 1 + ((k_{l_j} − k_{l_{j−1}}) / k_{l_{j−1}}) · (1 − r_{j−1}) ).    (6)

Next, for any feasible r_j's, we construct a hard instance that is characterized by the same r_j's (i.e., it satisfies Eq. (3) for the given r_j's), such that for every budget, the greedy approximation ratio mentioned above is optimal on this instance. It follows that greedy is optimal.

Construction of hard instances
Let Δ₁ = k_{l_1}, Δ_j = k_{l_j} − k_{l_{j−1}} for all 1 < j < t, and Δ_t = k_m − k_{l_{t−1}}. We apply Theorem 2.2 to create t hard (with respect to an arbitrary efficient algorithm) functions f_{Δ_1}, . . . , f_{Δ_t} over disjoint ground sets V₁, . . . , V_t. We normalize these functions such that they have the same optimal value 1 (i.e., f_{Δ_j}(O^{(j)}) = 1, where O^{(j)} denotes the optimal size-Δ_j solution for f_{Δ_j}) and extend them to the ground set V := ∪_{i=1}^t V_i. The final submodular function is

    f(X) := Σ_{j=1}^t α_j · f_{Δ_j}(X),

where α₁ := 1 and

    α_j := (Δ_j / Σ_{s=1}^{j−1} Δ_s) · (Σ_{s=1}^{j−1} α_s) · (1 − r_{j−1}),  for all 1 < j ≤ t.

Claim 3.3.
For any r_j that satisfy Eq. (5), r_j · Σ_{s=1}^j α_s is non-decreasing in j.

Proof of Claim 3.3.

    r_j · Σ_{s=1}^j α_s
      = r_j · ( Σ_{s=1}^{j−1} α_s + α_j )
      = r_j · ( Σ_{s=1}^{j−1} α_s + (Δ_j / Σ_{s=1}^{j−1} Δ_s) · (Σ_{s=1}^{j−1} α_s) · (1 − r_{j−1}) )    (definition of α_j)
      = r_j · ( Σ_{s=1}^{j−1} α_s + ((k_{l_j} − k_{l_{j−1}}) / k_{l_{j−1}}) · (Σ_{s=1}^{j−1} α_s) · (1 − r_{j−1}) )    (telescoping sum: Σ_{s=1}^{j−1} Δ_s = k_{l_{j−1}})
      = ( 1 + ((k_{l_j} − k_{l_{j−1}}) / k_{l_{j−1}}) · (1 − r_{j−1}) ) · r_j · Σ_{s=1}^{j−1} α_s
      ≥ r_{j−1} · Σ_{s=1}^{j−1} α_s,    (Ineq. (5)-(6))

because any feasible r_j's that we need to consider satisfy Ineq. (5).

Moreover, since the definition of α_j gives Σ_{s=1}^{j−1} α_s − (Σ_{s=1}^{j−1} Δ_s / Δ_j) · α_j = r_{j−1} · Σ_{s=1}^{j−1} α_s, it is easy to verify that α_j/Δ_j is decreasing as j increases. Hence, f(O_{k_{l_j}}) = Σ_{i=1}^j α_i for all j ∈ [t] (the optimum of size k_{l_j} spends its budget on the densest parts V₁, . . . , V_j, whose sizes sum to exactly k_{l_j}). Moreover, it is easy to verify that the r_j's indeed characterize the f constructed above, in the sense that Eq. (3) holds for it.

Upper bounding greedy performance on the hard instances: a single step
First, we analyze the best possible improvement of a single step of greedy on this instance. Suppose that greedy has chosen some size-i set X_i^{(j)} ⊂ V_j; if it chooses another element from V_j, then we claim that the marginal gain is almost always (f_{Δ_j}(O^{(j)}) − f_{Δ_j}(X_i^{(j)}))/Δ_j (it is at least this amount by the greedy guarantee). Assume otherwise: for some γ, ε₁, ε₂ > 0, in the first γ · Δ_j iterations in which greedy chooses elements from V_j, there are more than ε₁ · Δ_j iterations i in which the marginal gain is larger than ((1 + ε₂)/Δ_j) · (f_{Δ_j}(O^{(j)}) − f_{Δ_j}(X_i^{(j)})). Suppose that at the (γ · Δ_j)-th such iteration f_{Δ_j}(X^{(j)}_{γ·Δ_j}) = c · f_{Δ_j}(O^{(j)}); then each of those ε₁ · Δ_j iterations gains at least an extra (ε₂/Δ_j) · (1 − c) f_{Δ_j}(O^{(j)}) in addition to the basic greedy guarantee, which implies that f_{Δ_j}(X^{(j)}_{γ·Δ_j}) ≥ (1 − e^{−γ} + ε₁ · ε₂ · (1 − c)) f_{Δ_j}(O^{(j)}). Then, for some ε₃ > 0, f_{Δ_j}(X^{(j)}_{γ·Δ_j}) ≥ (1 − e^{−γ} + ε₃) f_{Δ_j}(O^{(j)}), which is impossible by Theorem 2.2. Henceforth, we can assume that the marginal gain for f_{Δ_j} is always (f_{Δ_j}(O^{(j)}) − f_{Δ_j}(X_i^{(j)}))/Δ_j for the i-th iteration in which greedy chooses elements from V_j, and this will only decrease all the values of interest by an arbitrarily small multiplicative error.

Upper bounding greedy performance on the hard instances: total value
When we start running the greedy algorithm, for a while it only select elements from V sincethose have the highest marginal contribution. Specifically, suppose that at the beginning of the q -th step, greedy has selected X q − ⊂ V . Then the best achievable marginal gain of an elementfrom V for f is α ( f ∆ ( O (1) ) − f ∆ ( X q − )) / ∆ . In comparison, the best singleton value of anelement in V is ( α / ∆ ) f ∆ ( O (2) ), which is dominated by α ( f ∆ ( O (1) ) − f ∆ ( X q − )) / ∆ , when f ∆ ( X q − ) ≤ r · f ∆ ( O (1) ), because α ∆ ( f ∆ ( O (1) ) − f ∆ ( X q − )) ≥ α ∆ ( f ∆ ( O (1) ) − r · f ∆ ( O (1) )) (By f ∆ ( X q − ) ≤ r · f ∆ ( O (1) ))= α ∆ (1 − r ) (By f ∆ ( O (1) ) = 1)10 α ∆ (By definition of α )= α ∆ f ∆ ( O (2) ) (By f ∆ ( O (2) ) = 1) . (7)Thus, when f ( X q − ) = α · f ∆ ( X q − ) < r · f ( O (1) ), greedy should always prefer choosing elementsfrom V over V (and other V i ’s), and the single step improvement is f ( X q ) − f ( X q − ) = ( f ( O (1) ) − f ( X q − )) / ∆ = ( f ( O (1) ) − f ( X q − )) /k (this matches how f (greedy-lb) ( q ) changes).We now analyze what happens when, after running greedy for a while, the marginal contributionfrom V -elements decays so that greedy may prefer V -elements. By Eq. (7), it is when f ∆ ( X q − ) = r · f ∆ ( O (1) ) that the best singleton value of V -elements ( α / ∆ ) f ∆ ( O (2) ) becomes equal tothe best marginal contribution of a V -element α ( f ∆ ( O (1) ) − f ∆ ( X q − )) / ∆ . Therefore, once r · f ( O (1) ∪ O (2) ) > f ( X q − ) ≥ r · f ( O (1) ), greedy should start choosing elements from V and V alternatively to keep the identity α ( f ∆ ( O (1) ) − f ∆ ( X q − )) / ∆ = α ( f ∆ ( O (2) ) − f ∆ ( X q − )) / ∆ (up to negligible error), it follows that α ( f ∆ ( O (1) ) − f ∆ ( X q − ))∆ = α ( f ∆ ( O (2) ) − f ∆ ( X q − ))∆ = α ( f ∆ ( O (1) ) − f ∆ ( X q − )) + α ( f ∆ ( O (2) ) − f ∆ ( X q − ))∆ + ∆ = f ( O (1) ∪ O (2) ) − f ( X q − ) k l . 
Thus, there is a transition of the best marginal gain when f(X_{q−1}) = r_1·f(O^{(1)}) with X_{q−1} ⊆ V_1, and after that the best achievable marginal gain is characterized by (f(O^{(1)} ∪ O^{(2)}) − f(X_{q−1}))/k_{l_2} (this matches the guarantee transition for f^{(greedy-lb)}(q)), which is larger than (α_3/∆_3)·f_{∆_3}(O^{(3)}) by definition of α_3. Similarly, for every 3 ≤ p ≤ t, when f(X_{q−1}) = r_{p−1}·f(∪_{j=1}^{p−1} O^{(j)}) with X_{q−1} ⊆ ∪_{j≤p−1} V_j, it holds that (f(∪_{j=1}^{p−1} O^{(j)}) − f(X_{q−1}))/k_{l_{p−1}} = (α_p/∆_p)·f_{∆_p}(O^{(p)}) by definition of α_p, and hence greedy starts to choose elements from V_1, …, V_p so as to keep the α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))/∆_j for all j ≤ p approximately equal to one another. Hence, for all j ≤ p,

α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))/∆_j = [Σ_{j≤p} α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))] / [Σ_{j≤p} ∆_j] = (f(∪_{j=1}^{p} O^{(j)}) − f(X_{q−1}))/k_{l_p},

and this is a transition of the best marginal gain from (f(∪_{j=1}^{p−1} O^{(j)}) − f(X_{q−1}))/k_{l_{p−1}} to (f(∪_{j=1}^{p} O^{(j)}) − f(X_{q−1}))/k_{l_p} (this matches the guarantee transition for f^{(greedy-lb)}(q)). Therefore, we have shown that the greedy performance f(X_q) changes in exactly the same way as f^{(greedy-lb)}(q), and hence the approximation ratio based on the f^{(greedy-lb)}(q)'s is tight for greedy on the hard instance.

How greedy spends the budget.
Finally, following the above derivation, we emphasize how greedy spends the budget. As we have shown, for any 1 ≤ p ≤ t, when r_{p−1}·f(∪_{j=1}^{p−1} O^{(j)}) ≤ f(X_{q−1}) ≤ r_p·f(∪_{j=1}^{p} O^{(j)}), greedy splits its budget over V_1, …, V_p so as to keep all the α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))/∆_j approximately equal to one another. Moreover, for any j' > p, the best singleton value (α_{j'}/∆_{j'})·f_{∆_{j'}}(O^{(j')}) of V_{j'} is smaller than α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))/∆_j for any j ≤ p. Suppose that greedy has spent budget b̂_j on V_j for each j, which implies that f_{∆_j}(X_{q−1}) = 1 − e^{−b̂_j/∆_j}. Then we have that α_j·(f_{∆_j}(O^{(j)}) − f_{∆_j}(X_{q−1}))/∆_j = d[α_j·(1 − e^{−x/∆_j})]/dx |_{x=b̂_j}, and thus the derivatives d[α_j·(1 − e^{−x/∆_j})]/dx |_{x=b̂_j} for j ≤ p are equal to one another. Moreover, since (α_{j'}/∆_{j'})·f_{∆_{j'}}(O^{(j')}) = d[α_{j'}·(1 − e^{−x/∆_{j'}})]/dx |_{x=0}, we have d[α_j·(1 − e^{−x/∆_j})]/dx |_{x=b̂_j} ≥ d[α_{j'}·(1 − e^{−x/∆_{j'}})]/dx |_{x=0} for all j ≤ p and j' > p.

Greedy spends the budget optimally on the hard instance
By Theorem 2.2, for any budget b_j, the best possible value an efficient algorithm can get by spending budget b_j on f_{∆_j} is u_j(b_j) = (1 − e^{−b_j/∆_j})·α_j. Suppose the algorithm spends budget b_j on each f_{∆_j}, where Σ_{j=1}^t b_j = b for some b. Then the best possible value in total is Σ_{j=1}^t u_j(b_j), and hence, in general, the best possible value for budget b is the maximum of the following program:

max Σ_{j=1}^t u_j(b_j)   s.t. Σ_{j=1}^t b_j = b and b_j ≥ 0 for all j.

We observe that for an arbitrary fixed b, the maximizer b*_j's for this program should satisfy that for all positive b*_j, the derivatives of the u_j's at the b*_j's are equal (notice that the way greedy spends the budget also satisfies this property), and moreover, they are not smaller than the derivative of u_{j'} at 0 for any j' such that b*_{j'} = 0. Otherwise, there must exist j_1, j_2 with du_{j_1}(x)/dx |_{x=b*_{j_1}} < du_{j_2}(x)/dx |_{x=b*_{j_2}}, where b*_{j_1} is strictly positive; then decreasing b*_{j_1} by δ and increasing b*_{j_2} by δ for sufficiently small δ > 0 will increase the objective value while preserving the feasibility of the b*_j's.

Now we prove that for any fixed b, the b*_j's satisfying the above-mentioned property are unique. (It then follows that the maximizer matches exactly how greedy spends the budget, and moreover, greedy attains the optimal value of the program.) Suppose that besides the b*_j's, the b̃_j's also satisfy the property. Let supp(b*) be the set of j such that b*_j > 0. We first argue that if j' ∉ supp(b*), then b̃_{j'} = 0. Suppose otherwise that b̃_{j'} > 0; then Σ_{j∈supp(b*)} b̃_j < b = Σ_{j∈supp(b*)} b*_j, and hence there must exist a j ∈ supp(b*) such that b̃_j < b*_j. By strict concavity of u_j, du_j(x)/dx |_{x=b̃_j} > du_j(x)/dx |_{x=b*_j}. However, since the b*_j's satisfy the above-mentioned property and b*_{j'} = 0, du_j(x)/dx |_{x=b*_j} ≥ du_{j'}(x)/dx |_{x=0}, and by strict concavity of u_{j'}, du_{j'}(x)/dx |_{x=0} > du_{j'}(x)/dx |_{x=b̃_{j'}}. Altogether du_j(x)/dx |_{x=b̃_j} > du_{j'}(x)/dx |_{x=b̃_{j'}} with both b̃_j, b̃_{j'} positive, which contradicts the property for the b̃_j's. Furthermore, we can argue that for all j ∈ supp(b*), b̃_j = b*_j, because otherwise there must exist j, j' ∈ supp(b*) such that b*_j > b̃_j and b*_{j'} < b̃_{j'}, and hence du_j(x)/dx |_{x=b*_j} < du_j(x)/dx |_{x=b̃_j} and du_{j'}(x)/dx |_{x=b*_{j'}} > du_{j'}(x)/dx |_{x=b̃_{j'}}, which contradicts the property du_j(x)/dx |_{x=b̃_j} = du_{j'}(x)/dx |_{x=b̃_{j'}}.

One might wonder whether the transitions of the greedy guarantees in the above analysis of Theorem 3.2 always occur in the order 1, 2, …, m and never skip to a proper subsequence, namely whether r_{l−1}·f(O_{k_{l−1}}) ≤ r_l·f(O_{k_l}), which is equivalent to

[f(O_{k_l}) − (k_l/k_{l+1})·f(O_{k_{l+1}})] / (1 − k_l/k_{l+1}) ≥ [f(O_{k_{l−1}}) − (k_{l−1}/k_l)·f(O_{k_l})] / (1 − k_{l−1}/k_l).

This is equivalent to

f(O_{k_{l+1}}) − f(O_{k_l}) ≤ [(k_{l+1} − k_l)/(k_l − k_{l−1})]·(f(O_{k_l}) − f(O_{k_{l−1}})),

which is actually true for our instances but not in general. See Example 3.4.

Example 3.4. Consider the function f on the ground set {1, 2, 3, 4} with values f(∅) = 0, f({1}) = 1, f({2}) = f({3}) = f({4}) = 1/2, f({1,2}) = f({1,3}) = f({1,4}) = 7/6, f({2,3}) = f({2,4}) = f({3,4}) = 1, f({2,3,4}) = 3/2, f({1,2,3}) = f({1,2,4}) = f({1,3,4}) = 4/3, and f({1,2,3,4}) = 3/2. It is straightforward to check that f is submodular and monotone, and that f(O_3) − f(O_2) > f(O_2) − f(O_1).

Finally, we end this section with the following observation. In the appendix, we give the proof of this observation and show that many practical algorithms satisfy the condition of this observation.

Observation 3.5.
For any perturbation factors 0 < ρ_1 < ρ_2 < ⋯ < ρ_m, there exists a sufficiently large k that grows with the size of the instance such that, given the m budgets k_1 = ρ_1·k, …, k_m = ρ_m·k, the optimality described in Theorem 3.2 actually holds for the general class of algorithms A such that:

• Given budget k_i, the algorithm A runs in T rounds (T is sufficiently large), each of which selects about k_i/T elements.

• For any ε > 0, it holds for all t ∈ [T] and all j ∈ [m] that f(X^A_{t·k_i/T}) − f(X^A_{(t−1)·k_i/T}) ≥ ((1−ε)·ρ_i/(ρ_j·T))·(f(O_{k_j}) − f(X^A_{t·k_i/T})), where X^A_s denotes the set of the first s elements chosen by A.

The main thesis of this paper is that worst-case instances of submodular maximization are really tailored to a specific budget constraint. It is natural to hope that, as the distribution of budget perturbations becomes arbitrarily spread (i.e., arbitrarily far from the worst case of a single budget), the approximation factor approaches 1. In this section, we give a negative answer to this question.
Theorem 4.1.
For any distribution of budget perturbations D, for every ε > 0, for any efficient algorithm A, and for every sufficiently large k that grows with the size of the instance, R_A(D(k)) ≤ 0.92 + ε. (We did not seriously try to optimize the constant 0.92.)

Proof.
For arbitrarily small τ > 0, let ρ_min and ρ_max be such that the mass of D on [ρ_min, ρ_max] is at least 1 − τ. Let q = 50. Let K_1 = q^{−(i*−1)}·k, where i* is the largest i such that q^{−(i−1)}·k ≤ ρ_min·k. Let N be the smallest i such that q^{i−1}·K_1 ≥ ρ_max·k. We first construct the hard instances, and by Theorem 3.2 it suffices to upper bound the approximation ratio achieved by greedy on these instances.

Construction of hard instances
Let K_i = q^{i−1}·K_1. We use Theorem 2.2 to create hard (with respect to the greedy algorithm) functions f_{K_i} for all i ∈ [N] over disjoint ground sets. We normalize these functions such that they have the same optimal value 1 (i.e., f_{K_i}(O^{(i)}) = 1, where O^{(i)} denotes the optimal size-K_i solution for f_{K_i}) and extend them to the union of all the ground sets. The final submodular function is f(X) = Σ_{i=1}^N (q/e)^{i−1}·f_{K_i}(X).

Upper bounding the approximation ratio on the hard instances
Consider a budget K between Σ_{j=1}^{i} K_j and Σ_{j=1}^{i+1} K_j, for any i ≤ N − 1. We first show that the contribution of the f_{K_j} with j ≤ i − 1 is negligible. The best singleton value of f_{K_j} is (q/e)^{j−1}/(q^{j−1}·K_1), which is decreasing in j. Hence, we can generously assume that the algorithm spends a budget of size Σ_{j=1}^{i−1} K_j getting all the utility from the f_{K_j} with j ≤ i − 1, which is the best one can hope for. The total value of these f_{K_j}'s is Σ_{j=1}^{i−1} (q/e)^{j−1} = ((q/e)^{i−1} − 1)/(q/e − 1) < (q/e)^{i−1}/(q/e − 1) < 0.06·(q/e)^{i−1}, i.e., less than a 0.06 fraction of the optimal value of f_{K_i}. Therefore, the best possible approximation ratio for budget K is at most the best possible approximation ratio for budget K − Σ_{j=1}^{i−1} K_j on the f_{K_j} with j ≥ i, plus 0.06.

The best singleton value of f_{K_{i+1}} is at most (q/e)^{i}/(q^{i}·K_1). On the other hand, with budget K_i on f_{K_i}, greedy achieves approximation ratio at most 1 − 1/e by Theorem 2.2, and thus at the (K_i + 1)-th iteration, greedy has marginal gain at least (1 − (1 − 1/e))·(q/e)^{i−1}/(q^{i−1}·K_1), which is equal to the best singleton value of f_{K_{i+1}}. Hence, greedy will not choose anything from f_{K_{i+1}} until it has selected K_i elements from f_{K_i}. The remaining budget K − Σ_{j=1}^{i} K_j is at most K_{i+1}, and it follows for the same reason that greedy will not spend any remaining budget on the f_{K_j} with j ≥ i + 2.

It remains to show how greedy performs on f_{K_i} and f_{K_{i+1}} with budget K' = K − Σ_{j=1}^{i−1} K_j. Let a = K'/K_i (so 1 ≤ a ≤ q + 1). Notice that greedy splits its budget in the way that the marginal gain of choosing the next element from f_{K_i} is approximately equal to that of choosing the next element from f_{K_{i+1}}. This can be expressed as the following equations:

a_1·K_i + a_2·K_i = a·K_i,   e^{−a_1}·(e/q)/K_i = e^{−a_2·K_i/K_{i+1}}/K_{i+1},

where a_1·K_i and a_2·K_i are the budgets spent on f_{K_i} and f_{K_{i+1}}, respectively. The solution is a_1 = (a + q)/(q + 1) and a_2 = q·(a − 1)/(q + 1). Hence the approximation ratio of greedy on f_{K_i} and f_{K_{i+1}} with budget K' is at most

[(1 − e^{−a_1·K_i/K_i})·(e/q) + (1 − e^{−a_2·K_i/K_{i+1}})] / [e/q + (a·K_i − K_i)/K_{i+1}] = [(1 − e^{−(a+q)/(q+1)})·(e/q) + (1 − e^{−(a−1)/(q+1)})] / [e/q + (a − 1)/q].

The maximum is approximately 0.85, attained at a ≈ 9.4. Hence the approximation ratio of greedy is at most roughly 0.85 + 0.06 < 0.92 for any budget in [ρ_min·k, ρ_max·k], and this finishes the proof because τ is arbitrarily small.
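Assuming the ratio reconstructed above, the final step is a one-dimensional maximization over a ∈ [1, q+1] that is easy to check numerically. This is a re-derivation sketch, not the authors' code; with q = 50 the maximum comes out around 0.85:

```python
import math

q = 50.0  # the parameter from the construction above

def greedy_ratio(a):
    """Greedy-to-optimal ratio on the two blocks f_{K_i}, f_{K_{i+1}} for
    combined budget a*K_i, 1 <= a <= q+1, normalized by (q/e)^i."""
    a1 = (a + q) / (q + 1)        # budget spent on f_{K_i}, in units of K_i
    a2 = q * (a - 1) / (q + 1)    # budget spent on f_{K_{i+1}}, in units of K_i
    value = (1 - math.exp(-a1)) * (math.e / q) + (1 - math.exp(-a2 / q))
    # the optimum spends K_i on f_{K_i} and the rest on f_{K_{i+1}}
    opt = math.e / q + (a - 1) / q
    return value / opt

best_a = max((1 + 0.001 * t for t in range(50001)), key=greedy_ratio)
best = greedy_ratio(best_a)
print(best_a, best)
```

At the endpoint a = 1 the ratio degenerates to roughly 1 − 1/e; the interior maximum is what drives the constant in Theorem 4.1.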
We formulate a mathematical program that computes the worst possible optimal expected approximation ratio, for any fixed distribution on any fixed choice of m budgets ρ_1·k < ρ_2·k < ⋯ < ρ_m·k = k (we also let ρ_0 = 0). We denote the probability of budget ρ_i·k by p_i for each i.

Reducing the hard instances to a standard form
Recall that the hard instances in the proof of Theorem 3.2 have the following form: f*(X) = Σ_{j=1}^t α_j·f_{(ρ_{l_j}−ρ_{l_{j−1}})k}(X), where l_1, …, l_t is a subsequence of 1, …, m, and f_{(ρ_{l_j}−ρ_{l_{j−1}})k}(X) is the hard submodular function from Theorem 2.2, normalized such that its optimal value for budget (ρ_{l_j} − ρ_{l_{j−1}})·k is 1. Moreover, α_j/(ρ_{l_j} − ρ_{l_{j−1}}) is non-increasing in j. We show that there is a submodular function that is as hard as f* to approximately maximize in the following standard form: f(X) = Σ_{i=1}^m β_i·f_{(ρ_i−ρ_{i−1})k}(X), where the β_i's satisfy

β_i/(ρ_i − ρ_{i−1}) ≥ β_{i+1}/(ρ_{i+1} − ρ_i), ∀ i < m,  (8)

where f_{(ρ_i−ρ_{i−1})k} is defined analogously to f_{(ρ_{l_j}−ρ_{l_{j−1}})k}, and we denote the ground set of f_{(ρ_i−ρ_{i−1})k} by V_i. For i ∈ [m] and j such that l_{j−1} < i ≤ l_j, we define λ_i := (ρ_i − ρ_{i−1})/(ρ_{l_j} − ρ_{l_{j−1}}).

Claim 5.1.
Given budget x·k for any x ≥ 0, the best achievable approximation ratio for Σ_{i=l_{j−1}+1}^{l_j} λ_i·f_{(ρ_i−ρ_{i−1})k}(X) is equal to that for f_{(ρ_{l_j}−ρ_{l_{j−1}})k}(X).

Proof.
For any budget x·k, the best achievable approximation ratio for Σ_{i=l_{j−1}+1}^{l_j} λ_i·f_{(ρ_i−ρ_{i−1})k}(X) is the optimum of

max over the x_i's:  Σ_{i=l_{j−1}+1}^{l_j} λ_i·(1 − e^{−x_i/(ρ_i−ρ_{i−1})})   s.t. Σ_{i=l_{j−1}+1}^{l_j} x_i = x and the x_i's are non-negative.

For any feasible x_i's,

Σ_{i=l_{j−1}+1}^{l_j} λ_i·(1 − e^{−x_i/(ρ_i−ρ_{i−1})})
 ≤ 1 − e^{−Σ_i λ_i·x_i/(ρ_i−ρ_{i−1})}   (Jensen's inequality and Σ_i λ_i = 1)
 = 1 − e^{−Σ_i x_i/(ρ_{l_j}−ρ_{l_{j−1}})}   (by definition of λ_i)
 = 1 − e^{−x/(ρ_{l_j}−ρ_{l_{j−1}})}   (by Σ_i x_i = x).

Moreover, when the x_i/(ρ_i−ρ_{i−1}) are all equal to each other, we have x_i/(ρ_i−ρ_{i−1}) = Σ_i x_i / Σ_i (ρ_i−ρ_{i−1}) = x/(ρ_{l_j}−ρ_{l_{j−1}}), and then Σ_i λ_i·(1 − e^{−x_i/(ρ_i−ρ_{i−1})}) = 1 − e^{−x/(ρ_{l_j}−ρ_{l_{j−1}})}. Hence, 1 − e^{−x/(ρ_{l_j}−ρ_{l_{j−1}})} is exactly the best achievable approximation ratio for Σ_i λ_i·f_{(ρ_i−ρ_{i−1})k}(X). Notice that it is also the best achievable approximation ratio for f_{(ρ_{l_j}−ρ_{l_{j−1}})k}(X).

Henceforth, we can replace each f_{(ρ_{l_j}−ρ_{l_{j−1}})k} with Σ_{i=l_{j−1}+1}^{l_j} λ_i·f_{(ρ_i−ρ_{i−1})k}(X) in f*, which reduces f* to the standard form. Then, Eq. (8) follows by definition of the λ_i's and the monotonicity of α_j/(ρ_{l_j} − ρ_{l_{j−1}}). Finally, we note that Eq. (8) implies that the optimal value of f for budget ρ_i·k is Σ_{j=1}^i β_j, and that for any i, whenever V_i is used by the greedy algorithm, so should be the V_{i'}'s for any i' ≤ i. The difference between f* and the standard form f is that in f there is a sub-instance for every budget.

Formulating the mathematical program

For each budget ρ_i·k, the best possible approximation ratio is achieved by choosing elements from the first l subsets V_1, …, V_l for a certain l ≤ m (which we do not know a priori), and the budget should be split in a way such that the marginal contribution from the next element is (approximately) equal among V_1, …, V_l. That is,

d[β_1·(1 − e^{−x^{(i,l)}_1/ρ_1})]/dx^{(i,l)}_1 = d[β_j·(1 − e^{−x^{(i,l)}_j/(ρ_j−ρ_{j−1})})]/dx^{(i,l)}_j, ∀ j ≤ l,
Σ_{j≤l} x^{(i,l)}_j = ρ_i, and x^{(i,l)}_j ≥ 0, ∀ j ≤ l.

Solving the system of equations in the above constraint gives us

x^{(i,l)}_1/ρ_1 = ρ_i/ρ_l − Σ_{j=1}^l ln(β_j·ρ_1/(β_1·(ρ_j−ρ_{j−1})))·(ρ_j−ρ_{j−1})/ρ_l,
x^{(i,l)}_j/(ρ_j−ρ_{j−1}) = x^{(i,l)}_1/ρ_1 + ln(β_j·ρ_1/(β_1·(ρ_j−ρ_{j−1}))), ∀ j ≤ l.  (9)

We let h^{(i,l)}(β_1, …, β_m) denote the approximation ratio achieved by the x^{(i,l)}_j's; it is given by

h^{(i,l)}(β_1, …, β_m) = [Σ_{j=1}^l β_j·(1 − e^{−x^{(i,l)}_j/(ρ_j−ρ_{j−1})})] / Σ_{j=1}^i β_j
 = [Σ_{j=1}^l β_j − Σ_{j=1}^l (β_1·(ρ_j−ρ_{j−1})/ρ_1)·e^{−x^{(i,l)}_1/ρ_1}] / Σ_{j=1}^i β_j   (solution of x^{(i,l)}_j)
 = [Σ_{j=1}^l β_j − β_1·(ρ_l/ρ_1)·e^{−x^{(i,l)}_1/ρ_1}] / Σ_{j=1}^i β_j   (telescoping sum)
 = [Σ_{j=1}^l β_j − β_1·(ρ_l/ρ_1)·e^{−ρ_i/ρ_l + Σ_{j=1}^l ln(β_j·ρ_1/(β_1·(ρ_j−ρ_{j−1})))·(ρ_j−ρ_{j−1})/ρ_l}] / Σ_{j=1}^i β_j   (solution of x^{(i,l)}_1),

where the numerator is the value achieved by the x^{(i,l)}_j's, and the denominator is the optimal value. Since we do not know the right choice of l a priori, we will enumerate all possible choices of l and pick the best. Moreover, for every l ≤ m, we consider l as a candidate choice only if the solutions of the x^{(i,l)}_j's by Eq. (9) are non-negative, because this holds for the right choice of l. (Note that l = 1 is always a candidate choice, because it means that all the budget is spent on the first sub-instance, and then x^{(i,1)}_1 = ρ_i ≥ 0.) We let h^{(i)} denote the best approximation ratio for budget ρ_i·k; it is given by

h^{(i)}(β_1, …, β_m) = max_{1≤l≤m} h^{(i,l)}(β_1, …, β_m)·1[x^{(i,l)}_l ≥ 0] = max_{1≤l≤m} (h^{(i,l)}(β_1, …, β_m) − C·1[x^{(i,l)}_l < 0])   (C is a large constant),

where x^{(i,l)}_l can be represented as a function of the β_i's. Note that we only restrict x^{(i,l)}_l to be non-negative, which actually implies that every x^{(i,l)}_j for j ≤ l is non-negative, by Eq. (8) and Eq. (9). Finally, the expected approximation ratio h is given by h(β_1, …, β_m) = Σ_{i=1}^m p_i·h^{(i)}(β_1, …, β_m). Given any fixed ρ_i's, the worst possible optimal average approximation ratio is the result of the following program:

min_{β_1,…,β_m ≥ 0} h(β_1, …, β_m)   s.t. Eq. (8) and Eq. (9).

We solve this program numerically for various distributions of budget perturbations (ρ_i's); the results are summarized in Table 1.

Table 2: Campaign Budgets (in millions). Candidates: Bennet, Biden, Bloomberg, Buttigieg, Gabbard, …

Canonical distributions
It is natural to ask what the expected approximation factor is when the budget is drawn from the uniform distribution over [x, 10x]. Since we don't know how to compute this value exactly, we take a discretization of this distribution, namely 25 budgets evenly spaced between x and 10x. Similarly, we experiment with discretizations of log-scale uniform distributions over [x, 10x] and [x, 100x].

Top social/political campaigns on Facebook
With the application of influence maximization on social networks in mind, we use the budgets of the top ten campaigns in Facebook's database of social/political campaigns. We use the total campaign budgets reported by candidates in the 2020 Democratic Party primary elections during the months October–December 2019 [AKS20] (see Table 2).
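The mathematical program above can be sketched numerically end to end for a toy distribution of budgets like the ones in these experiments. Everything below is illustrative (m = 2 budgets with ρ = (1/2, 1) and uniform probabilities; brute-force budget splits instead of the closed form (9)); it is not the authors' solver:

```python
import math

rho = [0.5, 1.0]      # budget perturbation factors (illustrative toy choice)
p = [0.5, 0.5]        # uniform distribution over the two budgets
widths = [0.5, 0.5]   # rho_i - rho_{i-1}

def best_ratio(budget, beta):
    """Best achievable ratio for sum_i beta_i * f_{(rho_i - rho_{i-1})k} with the
    given budget (in units of k), using the bound 1 - e^{-x_i/width_i} per block.
    By Eq. (8), the optimal value for budget rho_i is beta_1 + ... + beta_i."""
    opt = sum(b for b, r in zip(beta, rho) if r <= budget + 1e-9)
    best = 0.0
    for t in range(1001):  # brute-force split: x_1 = (t/1000) * budget
        x1 = budget * t / 1000.0
        x2 = budget - x1
        val = beta[0] * (1 - math.exp(-x1 / widths[0])) \
            + beta[1] * (1 - math.exp(-x2 / widths[1]))
        best = max(best, val / opt)
    return best

def expected_ratio(beta):
    return sum(pi * best_ratio(r, beta) for pi, r in zip(p, rho))

# Minimize over beta_2/beta_1 in (0, 1]; Eq. (8) forces beta_2 <= beta_1 here.
worst = min(expected_ratio([1.0, s / 100.0]) for s in range(1, 101))
print(worst)
```

For this two-budget toy the minimum comes out around 0.65 — strictly above the single-budget worst case 1 − 1/e ≈ 0.632, which is the qualitative behavior reported in Table 1 for richer distributions.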
We use a somewhat sparse discretization since the program is non-convex. "Top ten" refers to the amounts spent during the month before Mar. 26, 2020 in the Facebook Ad report [Fac20].

References

[AGN18] Nima Anari, Gagan Goel, and Afshin Nikzad. Budget feasible procurement auctions. Operations Research, 66(3):637–652, 2018.
Proceedings of the 2019 ACM Conference on Economics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019, pages 901–919, 2019.
[AKS20] Sarah Almukhtar, Thomas Kaplan, and Rachel Shorey. 2020 Democrats went on a spending spree in the final months of 2019, 2020.
[BCIW12] Maria-Florina Balcan, Florin Constantin, Satoru Iwata, and Lei Wang. Learning valuation functions. In
COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, pages 4.1–4.24, 2012.
[BDF+12] Ashwinkumar Badanidiyuru, Shahar Dobzinski, Hu Fu, Robert Kleinberg, Noam Nisan, and Tim Roughgarden. Sketching valuation functions. In
Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto,Japan, January 17-19, 2012 , pages 1025–1035, 2012.[BH16] Eric Balkanski and Jason D. Hartline. Bayesian budget feasibility with posted pricing.In
Proceedings of the 25th International Conference on World Wide Web, WWW 2016,Montreal, Canada, April 11 - 15, 2016 , pages 189–203, 2016.[BKS12] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Yaron Singer. Learning on a bud-get: posted price mechanisms for online procurement. In
Proceedings of the 13th ACMConference on Electronic Commerce, EC 2012, Valencia, Spain, June 4-8, 2012 , pages128–145, 2012.[BRS16] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. The power of optimization fromsamples. In
Advances in Neural Information Processing Systems 29: Annual Confer-ence on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona,Spain , pages 4017–4025, 2016.[BRS19] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An exponential speedup in par-allel running time for submodular maximization without loss in approximation. In
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms,SODA 2019, San Diego, California, USA, January 6-9, 2019 , pages 283–302, 2019.[CC84] Michele Conforti and G´erard Cornu´ejols. Submodular set functions, matroids andthe greedy algorithm: Tight worst-case bounds and some generalizations of the rado-edmonds theorem.
Discret. Appl. Math. , 7(3):251–274, 1984.[CC14] Hau Chan and Jing Chen. Truthful multi-unit procurements with budgets. In
Weband Internet Economics - 10th International Conference, WINE 2014, Beijing, China,December 14-17, 2014. Proceedings , pages 89–105, 2014.[CC16] Hau Chan and Jing Chen. Budget feasible mechanisms for dealers. In
Proceedingsof the 2016 International Conference on Autonomous Agents & Multiagent Systems,Singapore, May 9-13, 2016 , pages 113–122, 2016.[CGL11] Ning Chen, Nick Gravin, and Pinyan Lu. On the approximability of budget feasiblemechanisms. In
Proceedings of the twenty-second annual ACM-SIAM symposium onDiscrete Algorithms , pages 685–699. Society for Industrial and Applied Mathematics,2011. 18CRV17] Vaggos Chatziafratis, Tim Roughgarden, and Jan Vondr´ak. Stability and recovery forindependence systems. In , pages 26:1–26:15, 2017.[DH18] Daniel Dadush and Sophie Huiberts. A friendly smoothed analysis of the simplexmethod. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors,
Proceed-ings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC2018, Los Angeles, CA, USA, June 25-29, 2018 , pages 390–403. ACM, 2018.[DPS11] Shahar Dobzinski, Christos H. Papadimitriou, and Yaron Singer. Mechanisms forcomplement-free procurement. In
Proceedings 12th ACM Conference on ElectronicCommerce (EC-2011), San Jose, CA, USA, June 5-9, 2011 , pages 273–282, 2011.[EG14] Ludwig Ensthaler and Thomas Giebe. A dynamic auction for multi-object procurementunder a hard budget constraint.
Research Policy, 43(1):179–189, 2014.
[EN19] Alina Ene and Huy L. Nguyen. A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, volume 132 of
LIPIcs, pages 53:1–53:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[Fac20] Facebook. Facebook ad library report, 2020. Accessed: 2020-03-26.
[Fei98] Uriel Feige. A threshold of ln n for approximating set cover.
Journal of the ACM(JACM) , 45(4):634–652, 1998.[FK14] Vitaly Feldman and Pravesh Kothari. Learning coverage functions and private releaseof marginals. In
Proceedings of The 27th Conference on Learning Theory, COLT 2014,Barcelona, Spain, June 13-15, 2014 , pages 679–702, 2014.[GJLZ19] Nick Gravin, Yaonan Jin, Pinyan Lu, and Chenhao Zhang. Optimal budget-feasiblemechanisms for additive valuations. In
Proceedings of the 2019 ACM Conference onEconomics and Computation , pages 887–900. ACM, 2019.[GNS14] Gagan Goel, Afshin Nikzad, and Adish Singla. Mechanism design for crowdsourcingmarkets with heterogeneous tasks. In
Proceedings of the Second AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2014, November 2-4, 2014, Pittsburgh, Pennsylvania, USA, 2014.
[GRS16] Rishi Gupta, Tim Roughgarden, and C. Seshadhri. Decompositions of triangle-dense graphs.
SIAM J. Comput. , 45(2):197–215, 2016.[HIM14] Thibaut Horel, Stratis Ioannidis, and S. Muthukrishnan. Budget feasible mechanismsfor experimental design. In
LATIN 2014: Theoretical Informatics - 11th Latin AmericanSymposium, Montevideo, Uruguay, March 31 - April 4, 2014. Proceedings , pages 719–730, 2014. 19HS16] Thibaut Horel and Yaron Singer. Maximization of approximately submodular functions.In
Advances in Neural Information Processing Systems 29: Annual Conference on Neu-ral Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain , pages3045–3053, 2016.[HS17] Avinatan Hassidim and Yaron Singer. Submodular optimization under noise. In
Pro-ceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, TheNetherlands, 7-10 July 2017 , pages 1069–1122, 2017.[KG14] Andreas Krause and Daniel Golovin. Submodular function maximization. In
Tractability: Practical Approaches to Hard Problems, pages 71–104. 2014.
[KKT15] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network.
Theory of Computing , 11:105–147, 2015.[KL14] Sanjeev Khanna and Brendan Lucier. Influence maximization in undirected networks.In
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algo-rithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014 , pages 1482–1496,2014.[KT18] Pooya Jalaly Khalilabadi and ´Eva Tardos. Simple and efficient budget feasible mech-anisms for monotone submodular valuations. In
Web and Internet Economics - 14thInternational Conference, WINE 2018, Oxford, UK, December 15-17, 2018, Proceed-ings , pages 246–263, 2018.[LMSZ17] Stefano Leonardi, Gianpiero Monaco, Piotr Sankowski, and Qiang Zhang. Budget feasi-ble mechanisms on matroids. In
Integer Programming and Combinatorial Optimization - 19th International Conference, IPCO 2017, Waterloo, ON, Canada, June 26-28, 2017, Proceedings, pages 368–379, 2017.
[LV18] Paul Liu and Jan Vondrák. Submodular optimization in the MapReduce model. In . Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
[LZY20] Juan Li, Yanmin Zhu, and Jiadi Yu. Redundancy-aware and budget-feasible incentive mechanism in crowd sensing.
Comput. J. , 63(1):66–79, 2020.[NS20] Zeev Nutov and Elad Shoham. Practical budgeted submodular maximization.
CoRR ,abs/2007.04937, 2020.[NSKK16] Besmira Nushi, Adish Singla, Andreas Krause, and Donald Kossmann. Learning andfeature selection under budget constraints in crowdsourcing. In
Proceedings of theFourth AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2016,30 October - 3 November, 2016, Austin, Texas, USA , pages 159–168, 2016.[NW78] George L Nemhauser and Laurence A Wolsey. Best algorithms for approximating themaximum of a submodular set function.
Mathematics of operations research , 3(3):177–188, 1978.[NWF78] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis ofapproximations for maximizing submodular set functions.
Mathematical Programming, 14(1):265–294, 1978.
[Sin10] Yaron Singer. Budget feasible mechanisms. In 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 765–774. IEEE, 2010.
[SM13] Yaron Singer and Manas Mittal. Pricing mechanisms for crowdsourcing markets. In Proceedings of the 22nd International Conference on World Wide Web, WWW 2013, pages 1157–1166, 2013.
[ST04] Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time.
J. ACM , 51(3):385–463, 2004.[ST19] Grant Schoenebeck and Biaoshuai Tao. Influence maximization on undirected graphs:Towards closing the (1-1/e) gap. In
Proceedings of the 2019 ACM Conference on Eco-nomics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019 , pages 423–453, 2019.[STY20] Grant Schoenebeck, Biaoshuai Tao, and Fang-Yi Yu. Limitations of greed: Influencemaximization in undirected networks re-visited. In
AAMAS 2020 , 2020. To appear.[Svi04] Maxim Sviridenko. A note on maximizing a submodular set function subject to aknapsack constraint.
Oper. Res. Lett. , 32(1):41–43, 2004.[SVW17] Maxim Sviridenko, Jan Vondr´ak, and Justin Ward. Optimal approximation for sub-modular and supermodular optimization with bounded curvature.
Math. Oper. Res. ,42(4):1197–1218, 2017.[TSP20] Alfredo Torrico, Mohit Singh, and Sebastian Pokutta. On the unreasonable effectivenessof the greedy algorithm: Greedy adapts to sharpness.
CoRR , abs/2002.04063, 2020.[Von13] Jan Vondr´ak. Symmetry and approximability of submodular maximization problems.
SIAM J. Comput. , 42(1):265–304, 2013.[WS98] Duncan Watts and Steven Strogatz. Collective dynamics of ’small-world’ networks.
Nature , 1998.[Yos16] Yuichi Yoshida. Maximizing a monotone submodular function with a bounded curvatureunder a knapsack constraint.
CoRR , abs/1607.04527, 2016.[ZLM16] Dong Zhao, Xiang-Yang Li, and Huadong Ma. Budget-feasible online incentive mech-anisms for crowdsourcing tasks truthfully.
IEEE/ACM Trans. Netw., 24(2):647–661, 2016.
[ZWG+17] Zhenzhe Zheng, Fan Wu, Xiaofeng Gao, Hongzi Zhu, Shaojie Tang, and Guihai Chen. A budget feasible incentive mechanism for weighted coverage maximization in mobile crowdsensing.
IEEE Trans. Mob. Comput. , 16(9):2392–2407, 2017.
A Explicit Analysis for Two Budgets
In this section, we establish an analytic formula for the optimal expected approximation ratio for the uniform distribution over two budgets k_1 = ρ·k < k, for every sufficiently large k_1 and k − k_1 that grow with the size of the instance. This is done by implementing the analysis of Theorem 3.2 explicitly. We start by re-stating the hard instances.

Construction of hard instances. Using Theorem 2.2, we create two hard (with respect to an arbitrary efficient algorithm) submodular functions f_{k_1} and f_{k−k_1} with disjoint ground sets V_1 and V_2. To simplify notation, we let f_1 and f_2 denote f_{k_1} and f_{k−k_1}, respectively. Then, we normalize the two functions such that f_1(O^{(1)}) = f_2(O^{(2)}) = 1, where O^{(1)} denotes the optimal size-k_1 solution for f_1 and O^{(2)} the optimal size-(k − k_1) solution for f_2. Furthermore, we extend both functions to the ground set V := V_1 ∪ V_2 in the natural way: subsets of V_2 have zero value for f_1 and vice versa. Finally, for α > 0, we define f : 2^V → R_{≥0} as f(X) = α·f_1(X) + f_2(X).

Lemma A.1.
For any efficient algorithm A, there is a submodular function f constructed as above that has the following properties:

(i) For every k' = β·k with 0 ≤ β ≤ 1, the optimal value of f with budget k' is min{1, β/(1−ρ)} + max{(β − (1−ρ))/ρ, 0}·α if α < ρ/(1−ρ), and is min{1, β/ρ}·α + max{(β − ρ)/(1−ρ), 0} otherwise.

(ii) For every k' = β·k with 0 ≤ β ≤ 1, the solution value of A on f with budget k' is upper bounded by max_{0≤x≤β} (1 − e^{−x/ρ})·α + (1 − e^{(x−β)/(1−ρ)}).

Proof. (i) By the first property in Theorem 2.2, the optimal values of f_1, f_2 grow linearly with the budget. Moreover, the marginal gain of an element of O^{(1)} is α/k_1 = α/(ρ·k) until we select all of O^{(1)}, and is zero after that. The marginal gain of an element of O^{(2)} is 1/(k − k_1) = 1/((1−ρ)·k) until O^{(2)} is exhausted. Hence, if α < ρ/(1−ρ), the optimal solution for f should prefer the elements of O^{(2)} until it exhausts O^{(2)}, and then spend the rest of the budget on O^{(1)}. Hence, the optimal value is min{1, β/(1−ρ)} + max{(β − (1−ρ))/ρ, 0}·α. The other case is similar. (ii) Suppose x·k = (x/ρ)·k_1 elements are chosen from V_1; then the remaining (β − x)·k = ((β−x)/(1−ρ))·(k − k_1) elements are from V_2. By the second property in Theorem 2.2, the value we can obtain is at most (1 − e^{−x/ρ})·α + (1 − e^{(x−β)/(1−ρ)}). Therefore, the maximum of this objective over x is an upper bound on the optimum of A on f with budget k'.

With this lemma, we can easily prove a parametrized hardness result.

Proposition A.2.
Given any 0 < ρ < 1 and α with α(1−ρ)/ρ ≥ 1, for any ε > 0, no efficient algorithm can approximate the submodular maximization problem for the two budgets k and k_1 = ρ·k with average approximation ratio larger than

(i) (1/2)·[ ((1 − e^{−ρ}·(ρ/(α(1−ρ)))^{1−ρ})·α + 1 − e^{−ρ}·(α(1−ρ)/ρ)^{ρ}) / α + ((1 − e^{−1}·(ρ/(α(1−ρ)))^{1−ρ})·α + 1 − e^{−1}·(α(1−ρ)/ρ)^{ρ}) / (α+1) ] + ε,  (10)

if α(1−ρ)/ρ ≤ e;

(ii) (1/2)·[ (1 − e^{−1}) + ((1 − e^{−1}·(ρ/(α(1−ρ)))^{1−ρ})·α + 1 − e^{−1}·(α(1−ρ)/ρ)^{ρ}) / (α+1) ] + ε,  (11)

if e ≤ α(1−ρ)/ρ ≤ e^{1/ρ};

(iii) (1/2)·[ (1 − e^{−1}) + (1 − e^{−1/ρ})·α/(α+1) ] + ε,  (12)

if α(1−ρ)/ρ ≥ e^{1/ρ}.

Proof. Since α(1−ρ)/ρ ≥ 1, by the first property in Lemma A.1, we know the optimal value for budget k_1 is α and that for budget k is α + 1. We can maximize the best achievable solution value (1 − e^{−x/ρ})·α + (1 − e^{(x−β)/(1−ρ)}) for β = ρ and β = 1 by standard calculus. In general, we find that the optimal x is (ln(α(1−ρ)/ρ) + β/(1−ρ)) / (1/(1−ρ) + 1/ρ). Then, we observe that if 1 ≤ α(1−ρ)/ρ ≤ e, the optimal x for β = ρ is between 0 and ρ, and that for β = 1 is between ρ and 1. Hence the optimal x for both β is feasible, and we can calculate the analytic formula of each maximum. Therefore, we have an upper bound on the approximation ratio for each budget. Obviously, the average of these two upper bounds, which is given in Eq. (10), is an upper bound for the average approximation ratio. If e < α(1−ρ)/ρ ≤ e^{1/ρ}, then the optimal x is ρ when β = ρ and is still (ln(α(1−ρ)/ρ) + β/(1−ρ)) / (1/(1−ρ) + 1/ρ) when β = 1. As before, we can calculate the upper bound, which is given in Eq. (11). Finally, if α(1−ρ)/ρ ≥ e^{1/ρ}, then the optimal x is ρ when β = ρ and is 1 when β = 1. The corresponding upper bound is given in Eq. (12).

Next, we derive the closed-form parametrized formulas of the approximation ratios of the greedy algorithm for monotone submodular maximization with two budgets, which will match the hardness in Proposition A.2. Before that, we establish a useful lemma for greedy analysis.

Lemma A.3.
Given the same conditions as in Lemma 2.3, for all $k_1 = \theta \cdot k$ with $\theta \ge 0$ and $k_2 = \eta \cdot k_1$ with $0 \le \eta \le 1$, the following inequality holds:
$$f(X_{k_1}) \ge (1 - e^{\theta\eta - \theta}) f(O_k) + e^{\theta\eta - \theta} \cdot f(X_{k_2}),$$
and in particular, by letting $\eta = 0$,
$$f(X_{k_1}) \ge (1 - e^{-\theta}) f(O_k).$$

Proof.
We start from Lemma 2.3,
$$f(X_i) - f(X_{i-1}) \ge \frac{1}{k}\left(f(O_k) - f(X_{i-1})\right).$$
We rearrange the terms as follows,
$$f(O_k) - f(X_i) \le \left(1 - \frac{1}{k}\right) \cdot \left(f(O_k) - f(X_{i-1})\right),$$
and we recursively apply this step and get
$$f(O_k) - f(X_{k_1}) \le \left(1 - \frac{1}{k}\right)^{k_1 - k_2} \cdot \left(f(O_k) - f(X_{k_2})\right) \le e^{\theta\eta - \theta} \left(f(O_k) - f(X_{k_2})\right).$$
The proof finishes by rearranging the terms.

Proposition A.4.
Given a monotone submodular function $f$ and two budgets $k_1$ and $k_2 = \rho \cdot k_1$ with $0 < \rho < 1$, we let $X_{k_i}$ and $O_{k_i}$ denote the greedy solution and the optimal solution of cardinality $k_i$. Suppose $f(O_{k_2}) = c \cdot f(O_{k_1})$, where $\rho \le c \le 1$. Then, the greedy algorithm has the following average approximation ratios:

(i)
$$\frac{1}{2}\left(\left(\left(1 - e^{-\rho}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho}\right) \cdot \frac{1}{c} + e^{-\rho}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} \cdot \frac{1-\rho/c}{1-\rho}\right) + \left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} + e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} \cdot \frac{c-\rho}{1-\rho}\right)\right), \quad (13)$$
if $(1-\rho/c)/(1-\rho) \le 1 - e^{-1}$,

(ii)
$$\frac{1}{2}\left((1 - e^{-1}) + \left(\left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho}\right) \cdot (1-c) + \left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho-1}\right) c\right)\right), \quad (14)$$
if $1 - e^{-1} \le (1-\rho/c)/(1-\rho) \le 1 - e^{-1/\rho}$,

(iii)
$$\frac{1}{2}\left((1 - e^{-1}) + (1 - e^{-1/\rho}) c\right), \quad (15)$$
if $(1-\rho/c)/(1-\rho) \ge 1 - e^{-1/\rho}$. Moreover, no efficient algorithm can achieve better approximation ratios.

Proof. By Lemma 2.3, we have the following two guarantees,
$$f(X_i) - f(X_{i-1}) \ge \frac{1}{k_1}\left(f(O_{k_1}) - f(X_{i-1})\right), \qquad f(X_i) - f(X_{i-1}) \ge \frac{1}{k_2}\left(f(O_{k_2}) - f(X_{i-1})\right).$$
Observe that $(f(O_{k_2}) - f(X_{i-1}))/k_2 \ge (f(O_{k_1}) - f(X_{i-1}))/k_1$ holds if and only if $f(X_{i-1}) \le \frac{c-\rho}{1-\rho} f(O_{k_1})$, where the right-hand side is equal to $\frac{1-\rho/c}{1-\rho} f(O_{k_2})$. We let $i^* - 1 = \tau \cdot k_2$ be the largest $i - 1$ for which $f(X_{i-1}) \le \frac{1-\rho/c}{1-\rho} f(O_{k_2})$ holds. We first consider the case where
$$(1-\rho/c)/(1-\rho) \le 1 - e^{-1}, \quad (16)$$
which implies that $\tau \le$
1, because $f(X_{k_2}) \ge (1 - e^{-1}) f(O_{k_2})$. Then, by Lemma A.3 with $\theta = \tau$, $\eta = 0$, we have $f(X_{i^*-1}) \ge (1 - e^{-\tau}) f(O_{k_2})$. It follows that
$$(1-\rho/c)/(1-\rho) \ge 1 - e^{-\tau}. \quad (17)$$
Now we apply Lemma A.3 with $\theta = \rho$, $\eta = i^*/k_2 = \tau + o(1)$,
$$f(X_{k_2}) \ge (1 - e^{\rho\tau - \rho}) f(O_{k_1}) + e^{\rho\tau - \rho} \cdot f(X_{i^*}) \ge (1 - e^{\rho\tau - \rho}) f(O_{k_1}) + e^{\rho\tau - \rho} \cdot \frac{1-\rho/c}{1-\rho} f(O_{k_2}) = \left(\frac{1 - e^{\rho\tau - \rho}}{c} + e^{\rho\tau - \rho} \cdot \frac{1-\rho/c}{1-\rho}\right) f(O_{k_2}) \ge \left(\left(1 - e^{-\rho}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho}\right) \cdot \frac{1}{c} + e^{-\rho}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} \cdot \frac{1-\rho/c}{1-\rho}\right) f(O_{k_2}),$$
where the second inequality is by the definition of $i^*$, and the last inequality follows from Eq. (17). Then, we apply Lemma A.3 with $\theta = 1$, $\eta = \rho$ and use the previous bound on $f(X_{k_2})$,
$$f(X_{k_1}) \ge (1 - e^{\rho - 1}) f(O_{k_1}) + e^{\rho - 1} f(X_{k_2}) \ge \left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} + e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho} \cdot \frac{c-\rho}{1-\rho}\right) f(O_{k_1}).$$
We relate this case, where we assume Eq. (16), to the first case of Proposition A.2 by noticing that $c = \alpha/(1+\alpha)$. It is straightforward to verify that the approximation ratios for $f(X_{k_2})$ and $f(X_{k_1})$ match the ratios there. Therefore, greedy is optimal in this case. Next, we consider the case where
$$1 - e^{-1} \le (1-\rho/c)/(1-\rho) \le 1 - e^{-1/\rho}, \quad (18)$$
which implies $1 \le \tau \le 1/\rho$, and Eq. (17) still holds. In this case, we know $f(X_{k_2}) \ge (1 - e^{-1}) f(O_{k_2})$, and similar to before, we apply Lemma A.3 with $\theta = 1$, $\eta = i^*/k_1 = \rho \cdot \tau + o(1)$,
$$f(X_{k_1}) \ge (1 - e^{\rho\tau - 1}) f(O_{k_1}) + e^{\rho\tau - 1} \cdot f(X_{i^*}) \ge (1 - e^{\rho\tau - 1}) f(O_{k_1}) + e^{\rho\tau - 1} (1 - e^{-\tau}) f(O_{k_2}) = \left((1 - e^{\rho\tau - 1}) + e^{\rho\tau - 1} (1 - e^{-\tau}) c\right) f(O_{k_1}) \ge \left(\left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho}\right) (1-c) + \left(1 - e^{-1}\left(\frac{1-\rho}{\rho(1/c-1)}\right)^{\rho-1}\right) c\right) f(O_{k_1}),$$
where the second inequality is again by Lemma A.3 with $\theta = \tau$, $\eta = 0$, and the last inequality follows from Eq. (17).
It is not hard to verify that this case corresponds to the second case of Proposition A.2 and that greedy achieves the optimal ratios. Finally, we consider the last case where
$$(1-\rho/c)/(1-\rho) \ge 1 - e^{-1/\rho}. \quad (19)$$
In this case, we know that $f(X_{k_2}) \ge (1 - e^{-1}) f(O_{k_2})$ and $f(X_{k_1}) \ge (1 - e^{-1/\rho}) f(O_{k_2}) = (1 - e^{-1/\rho}) c \cdot f(O_{k_1})$. We conclude that greedy is optimal by comparing this case with the third case of Proposition A.2.

Figure 1: Optimal average approximation ratios for two budgets.

For every $0 < \rho <$
1, using Proposition A.4, we can compute the worst $c$ that minimizes the approximation ratio, and it follows that this minimal ratio is the best achievable approximation guarantee for submodular maximization with budgets $k_1$ and $k_2 = \rho \cdot k_1$. It turns out that the first case of Proposition A.4 is always the worst case. We illustrate the best achievable approximation ratios for $\rho$ up to $0.99$ in Figure 1.
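Under our reading of Eqs. (13)-(15), the worst $c$ can be located by a direct numeric search. The sketch below (the function name, the grid resolution, and the illustrative choice $\rho = 0.5$ are ours, not the paper's) evaluates the three cases and confirms that the resulting two-budget guarantee stays strictly above the single-budget bound $1 - 1/e$:

```python
import math

def avg_ratio(rho, c):
    # Average greedy ratio for budgets k1 and k2 = rho*k1 with
    # f(O_{k2}) = c*f(O_{k1}), following the case split of Proposition A.4
    # as reconstructed here; B is the recurring base (1-rho)/(rho(1/c-1)).
    B = (1 - rho) / (rho * (1 / c - 1))
    thr = (1 - rho / c) / (1 - rho)
    if thr <= 1 - math.exp(-1):                      # case (i)
        r2 = (1 - math.exp(-rho) * B ** rho) / c + math.exp(-rho) * B ** rho * thr
        r1 = 1 - math.exp(-1) * B ** rho * (1 - (c - rho) / (1 - rho))
    elif thr <= 1 - math.exp(-1 / rho):              # case (ii)
        r2 = 1 - math.exp(-1)
        r1 = (1 - math.exp(-1) * B ** rho) * (1 - c) \
             + (1 - math.exp(-1) * B ** (rho - 1)) * c
    else:                                            # case (iii)
        r2 = 1 - math.exp(-1)
        r1 = (1 - math.exp(-1 / rho)) * c
    return (r1 + r2) / 2

rho = 0.5
# scan c over (rho, 1); the parametrization requires rho <= c <= 1
worst = min(avg_ratio(rho, c / 1000) for c in range(501, 1000))
# the two-budget guarantee strictly beats the single-budget 1 - 1/e bound
assert 1 - math.exp(-1) < worst < 1
```

For $\rho = 0.5$ the search returns a worst-case average ratio around $0.65$, visibly above $1 - 1/e \approx 0.632$, which is the qualitative shape Figure 1 reports.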
Optimal Algorithms in Practical Settings
In this section, we show that our main result, namely that greedy is optimal for multiple budgets, generalizes to the constant-round MapReduce algorithm in the distributed setting [LV18] and to the logarithmic-round parallel algorithm [BRS19]. We sketch the main ideas behind these algorithms and point out how to adapt them so that Observation 3.5 applies. Before that, we provide the proof of Observation 3.5.
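As a quick numeric illustration of the round-based condition in Observation 3.5, the sketch below compares the element-by-element greedy recursion with the round-level recursion, solved with equality to get the most pessimistic trajectory (a single budget, $\epsilon = 0$, and the values of $k$ and $T$ are illustrative assumptions of ours):

```python
import math

def greedy_estimate(k):
    # per-element greedy recursion fhat += (F - fhat)/k, with F = 1
    fhat = 0.0
    for _ in range(k):
        fhat += (1.0 - fhat) / k
    return fhat  # equals 1 - (1 - 1/k)^k, about 1 - 1/e

def round_estimate(T):
    # round-level condition f_t - f_{t-1} >= (1 - f_t)/T, taken with equality
    f = 0.0
    for _ in range(T):
        f = (f + 1.0 / T) / (1.0 + 1.0 / T)
    return f  # equals 1 - (1 + 1/T)^(-T), also about 1 - 1/e

exact = greedy_estimate(10_000)
assert abs(exact - (1 - math.exp(-1))) < 1e-3
# the round-based guarantee converges to the greedy guarantee as T grows
assert abs(round_estimate(1000) - exact) < abs(round_estimate(10) - exact)
assert abs(round_estimate(1000) - exact) < 1e-3
```

The gap between the two recursions shrinks like $O(1/T)$, which is the quantitative content of the "sufficiently large $T$" condition in the observation.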
Observation B.1 (Observation 3.5 restated). For any perturbation factors $0 < \rho_1 < \rho_2 < \cdots < \rho_m$, there exists a sufficiently large $k$ that grows with the size of the instance such that given $m$ budgets $k_1 = \rho_1 \cdot k, \ldots, k_m = \rho_m \cdot k$, the optimality described in Theorem 3.2 actually holds for the general class of algorithms such that:
• Given budget $k_i$, the algorithm $A$ runs in $T$ rounds ($T$ is sufficiently large), each of which selects about $k_i/T$ elements.
• For any $\epsilon >$
0, it holds for all $t \in [T]$ and all $j \in [m]$ that $f(X^A_{t k_i/T}) - f(X^A_{(t-1) k_i/T}) \ge ((1-\epsilon)\rho_i/(\rho_j T)) \cdot (f(O_{k_j}) - f(X^A_{t k_i/T}))$, where $X^A_s$ denotes the first $s$ elements chosen by $A$.

Proof.
We let $\hat{f}(X^{\mathrm{greedy}}_l)$ denote the lower bound estimate of $f(X^{\mathrm{greedy}}_l)$ that we get by iteratively applying the best greedy guarantees until the $l$-th iteration ($\hat{f}(X^A_l)$ is defined similarly). Note that the second property in the observation is similar to the performance guarantee of the greedy algorithm with respect to each $O_{k_j}$. The only difference is that with respect to any $O_{k_j}$, on average, every element selected in round $t$ of $A$ has the same guarantee $(f(O_{k_j}) - \hat{f}(X^A_{t k_i/T}))/k_j$ (ignoring the $1-\epsilon$ factor), while for each $s \le k_i/T$, the $((t-1) k_i/T + s)$-th element selected by standard greedy has guarantee $(f(O_{k_j}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T + s - 1}))/k_j$. However, we can show that this difference between the two guarantees can be ignored. First, observe that $\hat{f}(X^A_{t k_i/T}) \le \hat{f}(X^{\mathrm{greedy}}_{t k_i/T})$ for all $t \le T$, because greedy has better choices of guarantees than $A$. Moreover, notice that for all $t$, $(f(O_{k_j}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T + s - 1}))/k_j \le (f(O_{k_j}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T}))/k_j$, and the difference between $(f(O_{k_j}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T}))/k_j$ and $(f(O_{k_j}) - \hat{f}(X^A_{t k_i/T}))/k_j$ for any $j$ is upper bounded by $(\hat{f}(X^A_{t k_i/T}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T}))/k_1$, which in turn is upper bounded by $\epsilon_t := (\hat{f}(X^A_{t k_i/T}) - \hat{f}(X^A_{(t-1) k_i/T}))/k_1$.
Furthermore, if we iteratively apply the guarantee for each $t \le T$ and $s \le k_i/T$ for $A$ as follows,
$$\hat{f}(X^A_{(t-1) k_i/T + s}) \ge \hat{f}(X^A_{(t-1) k_i/T + s - 1}) - \hat{f}(X^A_{t k_i/T})/k_j + f(O_{k_j})/k_j \ge \hat{f}(X^A_{(t-1) k_i/T + s - 1}) - \hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T + s - 1})/k_j + f(O_{k_j})/k_j - \epsilon_t,$$
where for each $t$ and $s$, $j$ is chosen to be the same as the best choice of $j$ for greedy in this iteration, then by an inductive argument (the base case is $\hat{f}(X^A_0) = \hat{f}(X^{\mathrm{greedy}}_0)$), we have that $\hat{f}(X^{\mathrm{greedy}}_{(t-1) k_i/T + s}) - \hat{f}(X^A_{(t-1) k_i/T + s}) \le \sum_{r=1}^{t-1} (k_i/T)\epsilon_r + s \cdot \epsilon_t$ for each $t \le T$ and $s \le k_i/T$. By a telescoping sum, $\sum_{t=1}^T (k_i/T)\epsilon_t \le (f(X^A_{k_i}) - f(\emptyset))/(\rho_1 T/\rho_i)$, which is negligible if $T$ is sufficiently large. Therefore, the final performance guarantee of $A$ is approximately equal to the final greedy guarantee.

MapReduce algorithm.
Suppose the budget is $k_i$. The setup is that there are $\sqrt{n/k_i}$ machines and a central machine, each with memory $\tilde{O}(\sqrt{n k_i})$. The algorithm has $t = O(1/\epsilon)$ MapReduce rounds and maintains a solution set $G$, which is empty initially. At the $l$-th round, the algorithm sets a threshold $\frac{1}{k_i}(1 - \frac{1}{t})^l f(O_{k_i})$ and wants to add $k_i/t$ elements to $G$ (actually, it might differ from this amount, but this is fine, as we will explain later in this paragraph), each with marginal gain above the threshold. To achieve this, each machine selects from its storage a candidate set consisting of the elements that have marginal gains above the threshold with respect to $G$ ($G$ is not updated), and sends the candidates to the central machine; the central machine then enumerates all the candidates and adds an element to $G$ if it has marginal gain above the threshold with respect to the latest $G$. The chosen threshold is exactly the greedy guarantee on the marginal gain when the cumulative utility reaches $(1 - (1 - \frac{1}{t})^l) f(O_{k_i})$. Hence, during the enumeration procedure on the central machine, either it successfully selects $k_i/t$ elements with marginal contribution above $\frac{1}{k_i}(1 - \frac{1}{t})^l f(O_{k_i})$, or there is no such element left, in which case the cumulative utility already reaches $(1 - (1 - \frac{1}{t})^l) f(O_{k_i})$. In either case, we achieve roughly $(1 - (1 - \frac{1}{t})^l) f(O_{k_i})$ at the end of the $l$-th round, and the final $1 - 1/e$ approximation ratio follows by standard greedy analysis. Two issues remain: first, we do not know $f(O_{k_i})$, which can be fixed by the standard "guessing the optimal value" trick; second, we need to bound the memory usage.
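The descending-threshold loop can be sketched on a single machine, stripped of the partitioning, sampling, and guessing machinery (the toy coverage instance, $t = 50$, and the brute-forced `opt_value` are illustrative choices of ours):

```python
import itertools, math

def threshold_greedy(sets, k, opt_value, t=50):
    # At round l the threshold is (1/k) * (1 - 1/t)^l * opt_value; any element
    # whose marginal coverage w.r.t. the current solution clears it is added.
    covered, chosen = set(), []
    for l in range(t):
        thr = (1 - 1 / t) ** l * opt_value / k
        for i, s in enumerate(sets):
            if i not in chosen and len(chosen) < k and len(s - covered) >= thr:
                chosen.append(i)
                covered |= s
    return len(covered)

sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6, 7}, {1, 7}, {8}, {2, 8, 9}]
k = 3
opt = max(len(set().union(*(sets[i] for i in combo)))
          for combo in itertools.combinations(range(len(sets)), k))
val = threshold_greedy(sets, k, opt)
assert val >= (1 - math.exp(-1)) * opt  # standard threshold-greedy guarantee
```

On this toy instance the rule recovers the optimal coverage; in general the geometric threshold schedule is what yields the $(1 - 1/e)$-type guarantee with only $O(1/\epsilon)$ rounds.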
For the ordinary machines, we can simply randomly partition the ground set, and for the central machine, this can be fixed as follows: at the beginning of each round, the central machine samples a random set $S$ of size $4\sqrt{n k_i}$ and sequentially adds the elements from $S$ to $G$ if the element has marginal gain above the threshold with respect to the latest $G$; if this procedure ends up selecting at least $k_i$ elements, the algorithm can stop, otherwise it continues as before. Using a martingale argument, it can be shown that if there are many (more than $\sqrt{n k_i}$) candidate elements chosen by the ordinary machines, then with high probability, the central machine has already chosen at least $k_i$ elements in the above procedure.

In order to apply Observation 3.5, we need the greedy guarantees with respect to all the $O_{k_j}$'s. To this end, we can guess the $f(O_{k_j})$'s rather than just $f(O_{k_i})$, and moreover, we set the threshold at the $l$-th round as the largest of $\frac{1}{k_j}(f(O_{k_j}) - \hat{f}(X_{k_i l/t}))$ over all $j$, where $\hat{f}(X_{k_i l/t})$ is the lower bound estimate of $f(X_{k_i l/t})$ that we get by applying the best guarantee for each iteration of greedy. Finally, if we want the algorithm to be oblivious to the budget distribution, we can simply discretize the domain of perturbed budgets and apply the above-mentioned trick to the budgets in the discretized domain.

Parallel algorithm.
Suppose the budget is $k_i$. The parallel algorithm is similar to the MapReduce algorithm. It runs in $t$ rounds and maintains a solution set $G$. In each round, it adds to $G$ a set of $k_i/t$ elements with total marginal gain above the threshold $\frac{1-\epsilon}{t}(f(O_{k_i}) - f(G))$. Specifically, the algorithm first selects a candidate set $X$ by iteratively discarding from $X$ all the elements that have marginal contribution roughly below $\frac{1-\epsilon}{k_i}(f(O_{k_i}) - f(G))$ with respect to the union of $G$ and a random subset of $X$ of size $k_i/t$, until the expected total marginal gain of a random size-$k_i/t$ subset of $X$ achieves the threshold; it then samples a set of size $k_i/t$ from $X$ and adds it to $G$. Each round terminates quickly because if the expected total marginal gain of a random subset is low in one iteration, then there must be many elements with low marginal contribution, and they will all be discarded in this iteration.

In order to apply our analysis, we can adapt the algorithm similarly to what we did for the MapReduce algorithm, i.e., we set the threshold as the largest of $\frac{1-\epsilon}{k_j}(f(O_{k_j}) - f(G))$ over all $j$.
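For intuition, the discarding rule of a single round can be sketched with a modular (additive) $f$, where marginals do not depend on $G$ and the martingale machinery is unnecessary. The weights, $k$, and $\epsilon$ below are illustrative assumptions of ours; the real algorithm measures marginals against $G$ plus a random size-$k_i/t$ sample and repeats until a random sample clears the round threshold:

```python
import random

def filter_round(weights, G_value, opt_value, k, eps=0.1):
    # Discard every element whose marginal gain falls below
    # (1 - eps)/k * (opt_value - f(G)); for a modular f the marginal of
    # element e is just weights[e], regardless of G.
    thr = (1 - eps) / k * (opt_value - G_value)
    return [w for w in weights if w >= thr]

random.seed(0)
weights = [random.random() for _ in range(100)]
k = 10
opt = sum(sorted(weights)[-k:])  # modular OPT is the top-k total weight
survivors = filter_round(weights, 0.0, opt, k)
# every survivor clears the threshold, and the heaviest element always does,
# since opt <= k * max(weights) implies thr < max(weights)
assert all(w >= 0.9 * opt / k for w in survivors)
assert max(weights) in survivors
```

The point of the rule is exactly the round-termination argument above: whenever a random sample fails the round threshold, many individual elements must sit below the per-element threshold, so a large fraction of $X$ is discarded at once.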