A Refined Analysis of Submodular Greedy
A Faster Tight Approximation for Submodular Maximization Subject to a Knapsack Constraint
Ariel Kulik∗, Roy Schwartz†, and Hadas Shachnai‡
Computer Science Department, Technion, Haifa 3200003, Israel
February 26, 2021
Abstract
The problem of maximizing a monotone submodular function subject to a knapsack constraint admits a tight (1 − e^{-1})-approximation: exhaustively enumerate over all subsets of size at most three and extend each using the greedy heuristic [Sviridenko, 2004]. We prove it suffices to enumerate only over all subsets of size at most two and still retain a tight (1 − e^{-1})-approximation. This improves the running time from O(n^5) to O(n^4) queries. The result is achieved via a refined analysis of the greedy heuristic.

Keywords:
Submodular functions, Knapsack constraint, Approximation Algorithms
1 Introduction

Submodularity is a fundamental mathematical notion that captures the concept of economy of scale and is prevalent in many areas of science and technology. Given a ground set E, a set function f : 2^E → R over E is called submodular if it has the diminishing returns property: f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B) for every A ⊆ B ⊆ E and e ∈ E \ B. (An equivalent definition is: f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for any A, B ⊆ E.) Submodular functions naturally arise in different areas such as combinatorics, graph theory, probability, game theory, and economics. Some well known examples include coverage functions, cuts in graphs and hypergraphs, matroid rank functions, entropy, and budget additive functions.

A submodular function f is monotone if f(S) ≤ f(T) for every S ⊆ T ⊆ E. In this note we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint (MSK). An instance of the problem is a tuple (E, f, w, W), where E is a set of n elements, f : 2^E → R_{≥0} is a non-negative, monotone and submodular set function given by a value oracle, w : E → N_+ is a weight function over the elements, and W ∈ N is the knapsack capacity. We use N to denote the set of non-negative integers, and N_+ = N \ {0}. A subset S ⊆ E is feasible if ∑_{e ∈ S} w(e) ≤ W, i.e., the total weight of the elements in S does not exceed the capacity W; the value of S ⊆ E is f(S). The objective is to find a feasible subset S ⊆ E of maximal value.

MSK arises in many applications. Some examples include sensor placement [7], document summarization [8], and network optimization [13]. The problem is a generalization of monotone submodular maximization with a cardinality constraint (i.e., w(e) = 1 for all e ∈ E), for which a simple greedy algorithm yields a (1 − e^{-1})-approximation [11]. This is the best ratio which can be obtained in polynomial time in the oracle model [10]. The approximation ratio of (1 − e^{-1}) is also optimal in the special case of coverage functions under P ≠ NP [5].

The first (1 − e^{-1})-approximation for MSK was given by Sviridenko [14] as an adaptation of an algorithm of Khuller, Moss and Naor [6] proposed for the special case of coverage functions. The algorithm of Sviridenko exhaustively enumerates (iterates) over all subsets G ⊆ E of at most 3 elements, and extends each such subset G using a greedy approach. Within the greedy phase the algorithm maintains a feasible subset A ⊆ E, and in each step an element e ∈ {e′ ∈ E | w(A ∪ {e′}) ≤ W} which maximizes (f(A ∪ {e}) − f(A)) / w(e) is added to A. Overall, the algorithm uses O(n^5) oracle calls and arithmetic operations. In [3], Ene and Nguyen presented a (1 − e^{-1} − ε)-approximation for MSK in time O(n log^2 n) for any fixed ε > 0, improving upon an earlier O(n^2 · polylog(n)) algorithm with the same approximation ratio due to Badanidiyuru and Vondrák [1]. We note, however, that the dependence of the running times of these algorithms on ε renders them purely theoretical. To date, the algorithm of [14] has the best running time for a (1 − e^{-1})-approximation for MSK.

Our main result is the following theorem.
Theorem 1.1. There is a (1 − e^{-1})-approximation for MSK using O(n^4) value oracle calls and arithmetic operations.

To prove the result we use a simple variant of the algorithm of [14] which only enumerates over sets G of up to 2 elements, essentially showing that one can save in the enumeration step of [14]. Intuitively, the analysis in [14] bounds the value of the solution generated by the greedy phase assuming a worst-case submodular function f, and then bounds the value loss due to a discarded element (the element is discarded by the analysis, not by the algorithm) assuming a worst-case submodular function g. The main insight for our improved result is that g = f; that is, there is no function which attains simultaneously the worst cases assumed in [14] for the outcome of greedy and for the value loss due to the discarded element. Based on this insight, we give in Section 2 a refined analysis of the greedy phase. The refined analysis is the key for the proof of Theorem 1.1, given in Section 3. We believe our refined analysis may find additional applications.

2 A Refined Analysis of Greedy

We start with some definitions and notation. Given a monotone submodular function f : 2^E → R_{≥0} and A ⊆ E, we define the function f_A : 2^E → R_{≥0} by f_A(S) = f(A ∪ S) − f(A) for any S ⊆ E. It is well known that f_A is also monotone, submodular and non-negative (see, e.g., Claim 13 in [4]). We also use f(e) = f({e}) for e ∈ E.
Algorithm 1: Greedy(E, f, w, W)

Input: An MSK instance (E, f, w, W).
1. Set E′ ← E and A ← ∅.
2. While E′ ≠ ∅ do:
3.     Find e ∈ E′ such that f_A(e) / w(e) is maximal.
4.     Set E′ ← E′ \ {e}.
5.     If w(A ∪ {e}) ≤ W, set A ← A ∪ {e}.
6. Return A.

The greedy procedure is given in Algorithm 1. While the procedure is useful for deriving efficient approximations, as a stand-alone algorithm it does not guarantee any constant approximation ratio. We say that the element e ∈ E found in Step 3 is considered in the specific iteration of the loop in Step 2. Furthermore, if the element was also added to A in Step 5 we say it was selected in this iteration.
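For concreteness, the following minimal Python sketch mirrors Algorithm 1. The representation is our own choice for illustration: the value oracle f is a callable on sets and w is a dictionary of positive weights.

def greedy(E, f, w, W):
    """Greedy (Algorithm 1): repeatedly consider the element of maximal marginal
    density f_A(e) / w(e), and add it to A if it still fits within capacity W."""
    remaining, A = set(E), set()          # E' and the current solution A
    while remaining:
        e = max(remaining, key=lambda x: (f(A | {x}) - f(A)) / w[x])   # Step 3
        remaining.remove(e)                                            # Step 4: e is "considered"
        if sum(w[a] for a in A) + w[e] <= W:                           # Step 5: e is "selected"
            A = A | {e}
    return A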
Lemma 2.1. For any MSK instance (E, f, w, W), Algorithm 1 returns a feasible solution for (E, f, w, W).
For any MSK instance (E, f, w, W), we define a value function V. Let {a_1, . . . , a_ℓ} be the output of Greedy(E, f, w, W), listed in the order by which the elements are added to A in Step 5 of Algorithm 1. Furthermore, define A_i = {a_1, . . . , a_i} for i ∈ [ℓ], and A_0 = ∅. For a set A ⊆ E we use w(A) = ∑_{e ∈ A} w(e), and for any k ∈ N_+ we use [k] to denote the set {i ∈ N | 1 ≤ i ≤ k} = {1, 2, . . . , k}. We define V : [0, w(A_ℓ)] → R_{≥0} by

    ∀ i ∈ [ℓ], w(A_{i−1}) ≤ u ≤ w(A_i):   V(u) = f(A_{i−1}) + (u − w(A_{i−1})) · f_{A_{i−1}}({a_i}) / w(a_i).
We note that the value of V(w(A_i)) is well defined for i ∈ [ℓ − 1] since

    f(A_{i−1}) + (w(A_i) − w(A_{i−1})) · f_{A_{i−1}}({a_i}) / w(a_i) = f(A_i) = f(A_i) + (w(A_i) − w(A_i)) · f_{A_i}({a_{i+1}}) / w(a_{i+1}).

That is, the value function V is piecewise linear and continuous. By definition we have that V(0) = f(∅) and V(w(A_ℓ)) = f(A_ℓ). Intuitively, V(u) can be viewed as the value attained by Algorithm 1 while using a capacity of u. We use V′ to denote the first derivative of V. We note that V′(u) is defined for almost all u ∈ [0, w(A_ℓ)]. Similar to [14], our analysis is based on lower bounds on V′ (in [14] the analysis used a discretization of V, thus omitting the differentiation). For every i ∈ [ℓ] and u ∈ (w(A_{i−1}), w(A_i)) we have V′(u) = f_{A_{i−1}}(a_i) / w(a_i).
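Since V is simply the piecewise linear interpolation of the greedy trace, it is easy to evaluate; the sketch below (an illustration under the same representation assumptions as the previous snippet) builds V from the sequence of selected elements.

def value_function(selected, f, w):
    """Value function V of a Greedy run: `selected` = (a_1, ..., a_l) in order of
    selection, f is the value oracle, w maps elements to weights. V interpolates
    linearly between the points (w(A_{i-1}), f(A_{i-1})) and (w(A_i), f(A_i))."""
    breakpoints, values, A = [0.0], [f(set())], set()
    for a in selected:
        A = A | {a}
        breakpoints.append(breakpoints[-1] + w[a])    # w(A_i)
        values.append(f(A))                           # f(A_i)

    def V(u):
        assert 0 <= u <= breakpoints[-1]
        for i in range(1, len(breakpoints)):
            if u <= breakpoints[i]:
                slope = (values[i] - values[i - 1]) / w[selected[i - 1]]   # f_{A_{i-1}}(a_i) / w(a_i)
                return values[i - 1] + (u - breakpoints[i - 1]) * slope
        return values[-1]
    return V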
The next lemma gives a lower bound on V′.

Lemma 2.2. Let E′ be the set from Algorithm 1 at the beginning of the iteration in which a_i is selected, and let ∅ ≠ Y ⊆ A_{i−1} ∪ E′. Then,

    f_{A_{i−1}}(a_i) / w(a_i) ≥ (f(Y) − f(A_{i−1})) / w(Y).

Proof. Let m = |Y| and Y = {y_1, . . . , y_m}. For any 1 ≤ j ≤ m, if y_j ∈ A_{i−1} then f_{A_{i−1}}({y_j}) / w(y_j) = 0 ≤ f_{A_{i−1}}({a_i}) / w(a_i). Otherwise, by the assumption of the lemma, y_j ∈ E′ and therefore f_{A_{i−1}}({y_j}) / w(y_j) ≤ f_{A_{i−1}}({a_i}) / w(a_i), since a_i was selected in Step 5 of Algorithm 1 when the value of the variable A was A_{i−1}. Thus, f_{A_{i−1}}({y_j}) / w(y_j) ≤ f_{A_{i−1}}({a_i}) / w(a_i) for every j ∈ [m]. By the last inequality, and since f is monotone and submodular, we have the following.

    f(Y) ≤ f(A_{i−1} ∪ Y)
        = f(A_{i−1}) + ∑_{j=1}^{m} f_{A_{i−1} ∪ {y_1,...,y_{j−1}}}({y_j})
        ≤ f(A_{i−1}) + ∑_{j=1}^{m} f_{A_{i−1}}({y_j})
        = f(A_{i−1}) + ∑_{j=1}^{m} w(y_j) · f_{A_{i−1}}({y_j}) / w(y_j)
        ≤ f(A_{i−1}) + ∑_{j=1}^{m} w(y_j) · f_{A_{i−1}}({a_i}) / w(a_i)
        = f(A_{i−1}) + w(Y) · f_{A_{i−1}}({a_i}) / w(a_i).

By rearranging the terms, we have f_{A_{i−1}}(a_i) / w(a_i) ≥ (f(Y) − f(A_{i−1})) / w(Y), as desired.

To lower bound V, we use Lemma 2.2 with several different sets as Y. Let X be a solution for an MSK instance I = (E, f, w, W), and let X_1, . . . , X_k be a partition of X such that

    f_{X_1 ∪ ... ∪ X_{i−1}}(X_i) / w(X_i) ≥ f_{X_1 ∪ ... ∪ X_i}(X_{i+1}) / w(X_{i+1})

for every i ∈ [k − 1]. The analysis first uses Lemma 2.2 with Y = X_1 and utilizes the resulting differential inequality to lower bound V on an interval [0, D_1]. The point D_1 is set such that, on the interval [D_1, D_2], using Lemma 2.2 with Y = X_1 ∪ X_2 yields a better lower bound on V′ in comparison to Y = X_1. Subsequently, the resulting differential inequality is used to bound V on [D_1, D_2]. When repeated k times, the process results in the bounding function h, formally given in Definition 2.3. Lemma 2.4 shows that indeed h lower bounds V.
Definition 2.3. Let I = (E, f, w, W) be an MSK instance, and consider X ⊆ E such that w(X) ≤ W, and a partition X_1, . . . , X_k of X (X_i ≠ ∅ for all i ∈ [k]). Denote S_j = ⋃_{i=1}^{j} X_i (S_0 = ∅) and r_j = f_{S_{j−1}}(X_j) / w(X_j) for j ∈ [k]. Also, assume that r_1 ≥ r_2 ≥ . . . ≥ r_k, and define D_0 = 0,

    D_j = ∑_{i=1}^{j} w(X_i) · ln(r_i / r_{j+1})   for 1 ≤ j ≤ k − 1,

and D_k = ∞ (we use the convention ln(a/0) = ∞). The bounding function of X_1, . . . , X_k and I is h : R_{≥0} → R_{≥0} defined by

    ∀ j ∈ [k], D_{j−1} ≤ u < D_j :   h(u) = f(S_j) − r_j · w(S_j) · exp( −(u − D_{j−1}) / w(S_j) ).        (1)

It can be easily verified that D_0 ≤ D_1 ≤ . . . ≤ D_k, and that D_{j′} = D_j for j′ < j if and only if r_{j′+1} = r_{j+1}. By definition, the bounding function is differentiable almost everywhere. Furthermore, for every j ∈ [k − 1] such that D_j < D_{j+1} and D_j ≠ 0, let j′ be the minimal index in [j] such that D_{j′} = D_j. Then,

    lim_{u ↗ D_j} h(u) = f(S_{j′}) − r_{j′} · w(S_{j′}) · exp( −(D_{j′} − D_{j′−1}) / w(S_{j′}) )
        = f(S_{j′}) − r_{j′} · w(S_{j′}) · exp( −( ∑_{i=1}^{j′} w(X_i) · ln(r_i / r_{j′+1}) − ∑_{i=1}^{j′−1} w(X_i) · ln(r_i / r_{j′}) ) / w(S_{j′}) )
        = f(S_{j′}) − r_{j′} · w(S_{j′}) · exp( −( ∑_{i=1}^{j′} w(X_i) / w(S_{j′}) ) · ln(r_{j′} / r_{j′+1}) )
        = f(S_{j′}) − r_{j′+1} · w(S_{j′})
        = f(S_{j′}) + ∑_{i=j′+1}^{j+1} f_{S_{i−1}}(X_i) − ∑_{i=j′+1}^{j+1} r_i · w(X_i) − r_{j+1} · w(S_{j′})
        = f(S_{j+1}) − r_{j+1} · w(S_{j+1}) = h(D_j).

The first equality follows from (1), the second and fourth equalities follow from the definitions of D_{j′} and S_{j′}, and the fifth and sixth equalities hold since for every j′ < i ≤ j + 1 we have f_{S_{i−1}}(X_i) / w(X_i) = r_i = r_{j+1}. Thus, the bounding function h is continuous.

Let S∗ be an optimal solution for an MSK instance I = (E, f, w, W). Then, the bounding function of I and S∗ (i.e., k = 1) is h(u) = f(S∗) · (1 − exp(−u / w(S∗))). It follows from [14] that V(u) ≥ (1 − exp(−u / W)) · f(S∗) for u ∈ [0, W − max_{e ∈ S∗} w(e)] ∩ N, where the restriction to integer values can be easily relaxed. Since w(S∗) ≤ W, this bound is implied by V(u) ≥ h(u). Thus, the following lemma can be viewed as a generalization of the analysis of [14].
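For intuition, the bounding function of Definition 2.3 can be evaluated directly from its data. The sketch below is our own illustration; it assumes the parts X_1, . . . , X_k are supplied in an order with non-increasing, strictly positive densities r_j, and that f and w are given as in the earlier snippets.

import math

def bounding_function(parts, f, w):
    """Bounding function h of Definition 2.3 for the partition `parts` = [X_1, ..., X_k]
    (sets, ordered so that the densities r_j are non-increasing and positive)."""
    weight = lambda S: sum(w[e] for e in S)
    S, prefix_val, prefix_w, r = set(), [f(set())], [0.0], []
    for X in parts:
        r.append((f(S | X) - f(S)) / weight(X))      # r_j = f_{S_{j-1}}(X_j) / w(X_j)
        S = S | X
        prefix_val.append(f(S))                      # f(S_j)
        prefix_w.append(weight(S))                   # w(S_j)
    k = len(parts)
    D = [0.0]                                        # D_0 = 0
    for j in range(1, k):                            # D_j for 1 <= j <= k - 1
        D.append(sum(weight(parts[i]) * math.log(r[i] / r[j]) for i in range(j)))
    D.append(math.inf)                               # D_k = infinity

    def h(u):
        j = next(t for t in range(1, k + 1) if D[t - 1] <= u < D[t])
        return prefix_val[j] - r[j - 1] * prefix_w[j] * math.exp(-(u - D[j - 1]) / prefix_w[j])
    return h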
Lemma 2.4. Let I = (E, f, w, W) be an MSK instance, V its value function, and A the output of Algorithm 1 for the instance I. Consider a subset of elements X ⊆ E where w(X) ≤ W, and a partition X_1, . . . , X_k of X such that f_{X_1 ∪ ... ∪ X_{i−1}}(X_i) / w(X_i) ≥ f_{X_1 ∪ ... ∪ X_i}(X_{i+1}) / w(X_{i+1}) for any i ∈ [k − 1]. Let h be the bounding function of I and X_1, . . . , X_k, and W_max = min{W − max_{e ∈ X} w(e), w(A)}. Then, for any u ∈ [0, W_max], it holds that V(u) ≥ h(u).

The proof of Lemma 2.4 uses a differential comparison argument. We say a function ϕ : Z → R, Z ⊆ R², is positively linear in the second dimension if there is K > 0 such that for every u, t_1, t_2 where (u, t_1), (u, t_2) ∈ Z it holds that ϕ(u, t_1) − ϕ(u, t_2) = K · (t_1 − t_2). The following is a simple variant of standard differential comparison theorems (see, e.g., [9]).
Lemma 2.5. Let [a, b] = ⋃_{r=1}^{s} [c_r, c_{r+1}] be an interval such that c_1 ≤ c_2 ≤ . . . ≤ c_{s+1}, and let ϑ_1, ϑ_2 : [a, b] → R be two continuous functions such that ϑ_1(a) ≥ ϑ_2(a) and the derivatives ϑ′_1, ϑ′_2 are defined and continuous on (c_r, c_{r+1}) for every r ∈ [s]. Also, for any r ∈ [s] let ϕ_r : (c_r, c_{r+1}) × R → R be positively linear in the second dimension. If ϑ′_1(u) ≥ ϕ_r(u, ϑ_1(u)) and ϑ′_2(u) ≤ ϕ_r(u, ϑ_2(u)) for every r ∈ [s] and u ∈ (c_r, c_{r+1}), then ϑ_1(u) ≥ ϑ_2(u) for every u ∈ [a, b].

The lemma follows from standard arguments in the theory of differential equations. A formal proof is given in Appendix A.

Proof of Lemma 2.4. Let (D_j)_{j=0}^{k} and (S_j)_{j=0}^{k} be as in Definition 2.3. Define

    ϕ(u, v) = (f(S_j) − v) / w(S_j)        (2)

for j ∈ [k], D_{j−1} ≤ u < D_j and v ∈ R. Let A = {a_1, . . . , a_ℓ}, where a_1, . . . , a_ℓ is the order by which the elements were added to A in Step 5 of Algorithm 1. As before, we use A_i = {a_1, . . . , a_i} for i ∈ [ℓ] and A_0 = ∅. Let C = (0, W_max) \ {D_1, . . . , D_{k−1}} \ {w(A_1), . . . , w(A_ℓ)}. Then, for any u ∈ C, there is j ∈ [k] such that D_{j−1} < u < D_j. Hence,

    ϕ(u, h(u)) = (f(S_j) − h(u)) / w(S_j) = r_j · exp( −(u − D_{j−1}) / w(S_j) ) = h′(u),        (3)

where h′ is the first derivative of h. As u ∈ C, there is also i ∈ [ℓ] such that w(A_{i−1}) < u < w(A_i). Hence,

    V′(u) = f_{A_{i−1}}(a_i) / w(a_i) ≥ (f(S_j) − f(A_{i−1})) / w(S_j) = (f(S_j) − V(w(A_{i−1}))) / w(S_j) ≥ (f(S_j) − V(u)) / w(S_j) = ϕ(u, V(u)).        (4)

For the first inequality, we note that X ⊆ A_{i−1} ∪ E′, where E′ is the set at the beginning of the iteration in which a_i was selected. Indeed, otherwise X contains an element e ∈ E that was considered by the algorithm at an earlier iteration, in which the value of the variable A was A_{i′} for some 0 ≤ i′ < i, but was not selected since w(A_{i′} ∪ {e}) > W. This would imply that u > w(A_{i−1}) ≥ w(A_{i′}) > W − w(e) ≥ W_max, contradicting u < W_max. Thus, as S_j ⊆ X, we have the conditions of Lemma 2.2. The second inequality holds since V is increasing. By (3) and (4) we have

    ∀ u ∈ C :   h′(u) = ϕ(u, h(u))   and   V′(u) ≥ ϕ(u, V(u)).        (5)

We can write C = ⋃_{r=1}^{s} (c_r, c_{r+1}), where 0 = c_1 ≤ c_2 ≤ . . . ≤ c_{s+1} = W_max. For any r ∈ [s] let ϕ_r : (c_r, c_{r+1}) × R → R be the restriction of ϕ to (c_r, c_{r+1}) × R (ϕ_r(u, v) = ϕ(u, v) for any u ∈ (c_r, c_{r+1}) and v ∈ R). It can be easily verified that ϕ_r is continuous and positively linear in the second dimension. Furthermore, it holds that V′ and h′ are continuous on (c_r, c_{r+1}) for any r ∈ [s]. Thus, by (5) and Lemma 2.5 it holds that V(u) ≥ h(u) for any u ∈ [0, W_max].
3 Proof of Theorem 1.1

Algorithm 2: EnumGreedy_κ(E, f, w, W)

Input: An MSK instance (E, f, w, W), and an enumeration size κ ∈ N.
1. Set S∗ ← ∅.
2. For every G ⊆ E with |G| ≤ κ do:
3.     A ← Greedy(E, f_G, w, W − w(G)).
4.     If f(A ∪ G) ≥ f(S∗) then S∗ ← A ∪ G.
5. Return S∗.
To prove Theorem 1.1 we use EnumGreedy_2, i.e., we take Algorithm 2 with κ = 2. We note that EnumGreedy_3 is the (1 − e^{-1})-approximation algorithm of [14].
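The following minimal Python sketch of Algorithm 2 is again our own illustration; it reuses the greedy routine sketched after Algorithm 1, and infeasible seed sets are simply skipped.

from itertools import combinations

def enum_greedy(E, f, w, W, kappa=2):
    """EnumGreedy_kappa (Algorithm 2): extend every seed set G with |G| <= kappa
    by running Greedy with the contracted oracle f_G and capacity W - w(G).
    `greedy` refers to the Algorithm 1 sketch given earlier."""
    best = set()
    for size in range(kappa + 1):
        for G in map(set, combinations(list(E), size)):
            wG = sum(w[e] for e in G)
            if wG > W:
                continue                                   # skip infeasible seeds
            f_G = lambda S, G=G: f(G | S) - f(G)           # f_G(S) = f(G ∪ S) − f(G)
            A = greedy(E, f_G, w, W - wG)                  # Step 3 of Algorithm 2
            if f(A | G) >= f(best):
                best = A | G
    return best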
Lemma 3.1. EnumGreedy_2 is a (1 − e^{-1})-approximation for MSK.

Proof. It can be easily verified that the algorithm always returns a feasible solution for the input instance. Let (E, f, w, W) be an MSK instance and Y ⊆ E an optimal solution for the instance. Let Y = {y_1, . . . , y_{|Y|}}, and assume the elements are ordered by their marginal values. That is, f_{{y_1,...,y_{i−1}}}({y_i}) = max_{i ≤ j ≤ |Y|} f_{{y_1,...,y_{i−1}}}({y_j}) for every 1 ≤ i ≤ |Y|.

If |Y| ≤ 2, consider the iteration of Algorithm 2 in which G = {y_i | i ∈ {1, 2}, i ≤ |Y|}. In this iteration it holds that f(G ∪ A) ≥ f(G) ≥ f(Y) (since f is monotone); thus, following this iteration we have f(S∗) ≥ f(Y) ≥ (1 − e^{-1}) · f(Y). Therefore, in this special case the algorithm returns an approximate solution as required. Hence, we may assume that |Y| > 2, and consider the iteration in which G = {y_1, y_2}. Let A be the output of Greedy(E, f_G, w, W − w(G)) in Step 3 in this iteration. If Y \ G ⊆ A then f(A ∪ G) ≥ f(Y); thus, following this iteration it holds that f(S∗) ≥ f(Y), and the algorithm returns an optimal solution. Therefore, we may assume that Y \ G ⊄ A.

Let e∗ ∈ Y \ G such that w(e∗) = max_{e ∈ Y \ G} w(e), and denote R = Y \ G \ {e∗}. Define two sets X_1, X_2 such that {X_1, X_2} = {{e∗}, R} and f_G(X_1) / w(X_1) ≥ f_G(X_2) / w(X_2). As f is submodular it follows that f_{G ∪ X_1}(X_2) / w(X_2) ≤ f_G(X_2) / w(X_2) ≤ f_G(X_1) / w(X_1). Let h be the bounding function of (E, f_G, w, W − w(G)) and X_1, X_2. Also, let r_1, r_2, and D_1 be the values from Definition 2.3.

By Step 5 of Algorithm 1, as Y \ G ⊄ A, it follows that w(A) ≥ W − w(G) − w(e∗). Thus, by Lemma 2.4, it holds that f_G(A) ≥ V(W − w(G) − w(e∗)) ≥ h(W − w(G) − w(e∗)). We consider the following cases.

Case 1: W − w(G) − w(e∗) ≥ D_1. In this case it holds that

    f(G) + f_G(A) ≥ f(G) + h(W − w(G) − w(e∗))
        = f(G) + f_G(X_1 ∪ X_2) − w(X_1 ∪ X_2) · r_2 · exp( −(W − w(G) − w(e∗) − D_1) / w(X_1 ∪ X_2) )
        = f(Y) − w(X_1 ∪ X_2) · exp( −(W − w(G) − w(e∗) − w(X_1) · ln(r_1/r_2)) / w(X_1 ∪ X_2) + ln r_2 )
        ≥ f(Y) − w(X_1 ∪ X_2) · exp( −1 + (w(e∗) + w(X_1) · ln r_1 + w(X_2) · ln r_2) / w(X_1 ∪ X_2) ).        (6)

The first and second equalities follow from the definitions of h and D_1 (Definition 2.3). The last inequality follows from w(X_1 ∪ X_2) + w(G) ≤ W.

Define two sets H_{e∗}, H_R as follows. If X_1 = {e∗} then H_{e∗} = ∅ and H_R = {e∗}. If X_1 = R then H_{e∗} = R and H_R = ∅. It follows that

    w(X_1) · ln r_1 + w(X_2) · ln r_2 = w(X_1) · ln( f_G(X_1) / w(X_1) ) + w(X_2) · ln( f_{G ∪ X_1}(X_2) / w(X_2) )
        = w(e∗) · ln( f_{G ∪ H_{e∗}}(e∗) / w(e∗) ) + w(R) · ln( f_{G ∪ H_R}(R) / w(R) ).        (7)

As the elements y_1, . . . , y_{|Y|} are ordered according to their marginal values, we have that f(y_1) ≥ f_{y_1}(y_2) ≥ f_G(e∗) ≥ f_{G ∪ H_{e∗}}(e∗). Therefore, f(G) ≥ 2 · f_{G ∪ H_{e∗}}(e∗), and we have that

    f(Y) − f_{G ∪ H_R}(R) = f(G) + f_{G ∪ H_{e∗}}(e∗) ≥ 3 · f_{G ∪ H_{e∗}}(e∗).        (8)

By combining (7) and (8) we obtain the following.

    w(X_1) · ln r_1 + w(X_2) · ln r_2 ≤ w(e∗) · ln( (f(Y) − f_{G ∪ H_R}(R)) / (3 · w(e∗)) ) + w(R) · ln( f_{G ∪ H_R}(R) / w(R) )
        = −w(e∗) · ln 3 + w(e∗) · ln( (f(Y) − f_{G ∪ H_R}(R)) / w(e∗) ) + w(R) · ln( f_{G ∪ H_R}(R) / w(R) )
        ≤ −w(e∗) + w(R ∪ {e∗}) · ln( f(Y) / w(R ∪ {e∗}) )
        = −w(e∗) + w(X_1 ∪ X_2) · ln( f(Y) / w(X_1 ∪ X_2) ).        (9)

The first inequality follows from (7) and (8). The second inequality follows from the log-sum inequality (see, e.g., Theorem 2.7.1 in [2]) and ln 3 > 1. By combining (6) and (9), we have

    f(G) + f_G(A) ≥ f(Y) − w(X_1 ∪ X_2) · exp( −1 + ln( f(Y) / w(X_1 ∪ X_2) ) ) = f(Y) · (1 − e^{-1}).

Case 2: W − w(G) − w(e∗) < D_1 and X_1 = {e∗}.
We can use the assumption in this case to lower bound f_G(X_1) / f_{G ∪ X_1}(X_2) as follows.

    W − w(G) − w(X_1) < D_1 = w(X_1) · ln(r_1/r_2) = w(X_1) · ( ln( f_G(X_1) / f_{G ∪ X_1}(X_2) ) + ln( w(X_2) / w(X_1) ) ).

By rearranging the terms we have

    ln( f_G(X_1) / f_{G ∪ X_1}(X_2) ) > (W − w(G) − w(X_1)) / w(X_1) − ln( w(X_2) / w(X_1) ).

Thus,

    f_G(X_1) > f_{G ∪ X_1}(X_2) · (w(X_1) / w(X_2)) · exp( (W − w(G) − w(X_1)) / w(X_1) ) ≥ f_{G ∪ X_1}(X_2) · δ^{−1} · exp(δ),        (10)

where δ = (W − w(G) − w(X_1)) / w(X_1), and the last inequality follows from w(X_1) + w(X_2) + w(G) ≤ W. We use (10) to lower bound f(G ∪ A).

    f(G) + f_G(A) ≥ f(G) + h(W − w(G) − w(X_1))
        = f(G) + f_G(X_1) − f_G(X_1) · exp(−δ)
        = (2/3) · (f(G) + f_G(X_1)) + (1/3) · (f(G) + f_G(X_1)) − f_G(X_1) · exp(−δ)
        ≥ (2/3) · (f(G) + f_G(X_1)) + f_G(X_1) − f_G(X_1) · exp(−δ)
        ≥ (2/3) · (f(G) + f_G(X_1)) + f_{G ∪ X_1}(X_2) · (1 − exp(−δ)) · δ^{−1} · exp(δ)
        ≥ (2/3) · (f(G) + f_G(X_1) + f_{G ∪ X_1}(X_2)) ≥ (1 − e^{-1}) · f(Y).

The second inequality follows from f(G) = f(y_1) + f_{y_1}(y_2) ≥ 2 · f_G(e∗) = 2 · f_G(X_1), due to the ordering of the elements in Y. The third inequality follows from (10). The fourth inequality follows from (1 − exp(−δ)) · δ^{−1} · exp(δ) = (exp(δ) − 1) · δ^{−1} ≥ 1 ≥ 2/3, as exp(δ) ≥ 1 + δ.

Case 3: W − w(G) − w(e∗) < D_1 and X_1 = R. In this case, we have

    f(G) + f_G(A) ≥ f(G) + h(W − w(G) − w(e∗))
        = f(G) + f_G(R) − f_G(R) · exp( −(W − w(G) − w(e∗)) / w(R) )
        ≥ (2/3) · (f(Y) − f_G(R)) + f_G(R) − f_G(R) · exp(−1)
        ≥ (1 − e^{-1}) · f(Y).

The second inequality follows from w(X_1) + w(X_2) + w(G) ≤ W, and from f(G) ≥ (2/3) · (f(G) + f_{G ∪ R}({e∗})) = (2/3) · (f(Y) − f_G(R)), which holds since f_{G ∪ R}(e∗) ≤ (1/2) · f(G) due to the ordering of the elements in Y, as G = {y_1, y_2}.

Thus, in all cases f(A ∪ G) = f(G) + f_G(A) ≥ (1 − e^{-1}) · f(Y). Hence, in the iteration where G = {y_1, y_2} we have that f(S∗) ≥ (1 − e^{-1}) · f(Y).

Theorem 1.1 follows from Lemma 3.1 and the observation that EnumGreedy_2 uses O(n^4) oracle calls and arithmetic operations.

It is natural to ask whether EnumGreedy_1 also yields a (1 − e^{-1})-approximation. Here, the answer is clearly negative.
For any N > 0, consider the MSK instance I = (E, f, w, W) with E = {1, 2, 3}, w(1) = w(2) = N, w(3) = 1, W = 2N, and f(S) = |S ∩ {1, 2}| · N + 2 · |S ∩ {3}|. While the optimal solution for the instance is {1, 2} with f({1, 2}) = 2N, EnumGreedy_1(E, f, w, W) returns either {1, 3} or {2, 3}, where f({1, 3}) = f({2, 3}) = N + 2. Already for N = 8 the solution returned is not a (1 − e^{-1})-approximation. We note that the function f in this example is modular (linear).
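A quick numeric check of this instance (a brute-force computation under the stated f and w, with N = 8; the threshold 1 − e^{-1} is compared directly):

import math
from itertools import combinations

N = 8
E, W = [1, 2, 3], 2 * N
w = {1: N, 2: N, 3: 1}
f = lambda S: len(S & {1, 2}) * N + 2 * len(S & {3})     # the modular objective

# Optimal feasible value, by brute force over all subsets of E.
opt = max(f(set(S)) for r in range(len(E) + 1) for S in combinations(E, r)
          if sum(w[e] for e in S) <= W)
returned = f({1, 3})                                     # value of the solution EnumGreedy_1 ends up with
print(opt, returned, returned / opt < 1 - math.exp(-1))  # 16 10 True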
References

[1] Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1497–1514. SIAM, 2014.

[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA, second edition, 2006.

[3] Alina Ene and Huy L. Nguyen. A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming (ICALP), pages 53:1–53:12, 2019.

[4] Yaron Fairstein, Ariel Kulik, Joseph (Seffi) Naor, Danny Raz, and Hadas Shachnai. A (1 − e^{-1} − ε)-approximation for the monotone submodular multiple knapsack problem. In Proceedings of the 28th Annual European Symposium on Algorithms (ESA), volume 173 of LIPIcs, pages 44:1–44:19, 2020.

[5] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, July 1998.

[6] Samir Khuller, Anna Moss, and Joseph Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.

[7] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007.

[8] Hui Lin and Jeff Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 912–920, 2010.

[9] Alex McNabb. Comparison theorems for differential equations. Journal of Mathematical Analysis and Applications, 119(1-2):417–428, 1986.

[10] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

[11] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.

[12] Thomas C. Sideris. Ordinary Differential Equations and Dynamical Systems. Springer, 2013.

[13] K. Son, H. Kim, Y. Yi, and B. Krishnamachari. Base station operation and user association mechanisms for energy-delay tradeoffs in green cellular networks. IEEE Journal on Selected Areas in Communications, 29(8):1525–1536, 2011.

[14] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43, 2004.
A Proof of Lemma 2.5
To prove Lemma 2.5 we first show a simpler claim.
Lemma A.1.
Within the settings of Lemma 2.5, if ϑ_1(c_r) ≥ ϑ_2(c_r) for some r ∈ [s], then ϑ_1(u) ≥ ϑ_2(u) for any u ∈ [c_r, c_{r+1}].

Proof. If c_r = c_{r+1} the claim trivially holds. Therefore we may assume that c_r < c_{r+1}. Define δ : [c_r, c_{r+1}] → R by δ(u) = ϑ_2(u) − ϑ_1(u), and let δ′ = ϑ′_2 − ϑ′_1 be its derivative. We note that ϑ′_1 and ϑ′_2 are continuous on (c_r, c_{r+1}), and hence δ′ is continuous and integrable on (c_r, c_{r+1}). Thus, for any u ∈ (c_r, c_{r+1}) it holds that

    δ(u) = lim_{t ↘ c_r} ( δ(u) − δ(t) ) + δ(c_r) = lim_{t ↘ c_r} ∫_t^u δ′(z) dz + δ(c_r) = δ(c_r) + ∫_{c_r}^{u} δ′(z) dz,        (11)

where the first equality holds since δ is continuous. Furthermore, as ϕ_r is positively linear in the second dimension, there is K_r > 0 such that ϕ_r(u, t_1) − ϕ_r(u, t_2) = K_r · (t_1 − t_2) for any u ∈ (c_r, c_{r+1}) and t_1, t_2 ∈ R. Hence, for any u ∈ (c_r, c_{r+1}), it holds that

    δ′(u) = ϑ′_2(u) − ϑ′_1(u) ≤ ϕ_r(u, ϑ_2(u)) − ϕ_r(u, ϑ_1(u)) = K_r · (ϑ_2(u) − ϑ_1(u)) = K_r · ( δ(c_r) + ∫_{c_r}^{u} δ′(z) dz ) ≤ K_r · | ∫_{c_r}^{u} δ′(z) dz |.

The last equality follows from (11), and the last inequality holds since δ(c_r) = ϑ_2(c_r) − ϑ_1(c_r) ≤ 0. By Gronwall's inequality (see, e.g., Lemma 3.3 in [12]), it follows that δ(u) ≤ 0 for u ∈ (c_r, c_{r+1}), and therefore ϑ_1(u) ≥ ϑ_2(u) for any u ∈ (c_r, c_{r+1}). As δ is continuous, δ(c_{r+1}) = lim_{u ↗ c_{r+1}} δ(u) ≤ 0; thus, ϑ_1(c_{r+1}) ≥ ϑ_2(c_{r+1}) as well.
Proof of Lemma 2.5. The lemma essentially follows immediately from Lemma A.1 using an inductive claim. We prove by induction on r ∈ [s + 1] that ϑ_1(u) ≥ ϑ_2(u) for any u ∈ [a, c_r]. For r = 1 the claim holds since c_1 = a and ϑ_1(a) ≥ ϑ_2(a). Let r ∈ [s] and assume that ϑ_1(u) ≥ ϑ_2(u) for any u ∈ [a, c_r]. Then, by Lemma A.1, ϑ_1(u) ≥ ϑ_2(u) for any u ∈ [c_r, c_{r+1}] as well. That is, the claim holds for r + 1. Taking r = s + 1, we have that ϑ_1(u) ≥ ϑ_2(u) for any u ∈ [a, c_{s+1}] = [a, b].