Constrained Submodular Maximization: Beyond 1/e
Alina Ene∗   Huy L. Nguyễn†

August 15, 2016
Abstract
In this work, we present a new algorithm for maximizing a non-monotone submodular function subject to a general constraint. Our algorithm finds an approximate fractional solution for maximizing the multilinear extension of the function over a down-closed polytope. The approximation guarantee is 0.372, and it is the first improvement over the 1/e approximation achieved by the unified Continuous Greedy algorithm [Feldman et al., FOCS 2011].

∗ Department of Computer Science, Boston University, [email protected]. This work was done in part while the author was with the Computer Science department at the University of Warwick and a visitor at the Toyota Technological Institute at Chicago.
† College of Computer and Information Science, Northeastern University, [email protected]. This work was done in part while the author was with the Toyota Technological Institute at Chicago.

Introduction
A set function f : 2^V → R is submodular if for every A, B ⊆ V, we have f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B). Submodular functions naturally arise in a variety of contexts, both in theory and practice. From a theoretical perspective, submodular functions provide a common abstraction of cut functions of graphs and digraphs, Shannon entropy, weighted coverage functions, and log-determinants. From a practical perspective, submodular functions are used in a wide range of application domains, from machine learning to economics. In machine learning, they are used for document summarization [15], sensor placement [12], exemplar clustering [9], potential functions for image segmentation [10], etc. In an economics context, they can be used to model market expansion [6], influence in social networks [11], etc. The core mathematical problem underpinning many of these applications is the meta problem of maximizing a submodular objective function subject to some constraints, i.e., max_{S ∈ S} f(S), where S is a down-closed family of sets.

A common approach to this problem is a two-step framework based on the multilinear extension F of f, a continuous function that extends f to the domain [0, 1]^V. The program first (1) maximizes F(x) subject to x ∈ C, where C is a convex relaxation of S, and then (2) rounds x to an integral solution in S. This paradigm has been very successful and it has led to the current best approximation algorithms for a wide variety of constraints, including cardinality constraints, knapsack constraints, matroid constraints, etc. The contention resolution scheme framework of Chekuri et al. [5] gives a generic way to perform step (2) of the program. Thus, many recent works [7, 4] focus on improving step (1), as it immediately leads to improved algorithms when combined with the known rounding schemes. Feldman et al.
[7] give a beautiful generalization of the continuous greedy algorithm [18], achieving a 1/e approximation for step (1) for any down-closed and solvable polytope C. While it is known that the continuous greedy algorithm is optimal when we restrict to monotone functions, it is not known whether the 1/e approximation for the multilinear extension achieved by [7] is optimal. Recently, the work of Buchbinder et al. [1] showed that, for the special case of a cardinality constraint, it is possible to beat 1/e. However, this result still leaves open the possibility that a 1/e approximation is best possible for a harder constraint. This possibility is consistent with our current knowledge: the best known hardness for a cardinality constraint is 0.491 while the best known hardness for a matroid constraint is 0.478 [8], suggesting that the matroid constraint may be strictly harder to approximate. In this paper, we rule out this possibility and show that it is possible to go beyond the 1/e barrier in the same generic setting as considered by [7].

Theorem 1.
Let f be a non-negative, non-monotone, submodular function f. Let F be the multilinear extension of f. Let C be a down-closed and solvable polytope. There is an efficient algorithm that constructs a solution x ∈ C such that F(x) ≥ 0.372 · F(OPT), where OPT is an optimal integral solution to the problem max_{x ∈ C ∩ {0,1}^V} F(x).

In order to keep the analysis as simple as possible, we have not tried to optimize the constant in the theorem above. We believe a better constant can be obtained using the techniques in this paper, and we hope that our work will lead to improved approximation guarantees for the problem.

A family of sets S is down-closed if for all A ⊂ B, if B ∈ S then A ∈ S. A polytope C is solvable if there is an oracle for optimizing linear functions over C, i.e., for solving max_{c ∈ C} ⟨v, c⟩ for any vector v. A function f : 2^V → R is monotone if f(A) ≤ f(B) for all A ⊆ B.

Combined with known rounding schemes, our result yields a 0.372 − o(1) approximation for submodular maximization subject to a matroid constraint and a 0.372 − ε approximation for a constant number of knapsack constraints, which improve over the 1/e − o(1) and 1/e − ε approximations [7].

Our starting point is the unified continuous greedy algorithm [7] achieving the approximation factor 1/e. The algorithm grows a solution x over the time interval [0, 1], increasing F(x) in each time step by an amount proportional to (1 − ‖x‖∞) F(OPT). As x increases over time, the gain starts out large and decreases to 0 at the end of the process. The change in x in each time step is proportional to v = argmax_{c ∈ C} ⟨∇F(x) ∘ (1 − x), c⟩. Notice that we can improve the gain ((1 − ‖x‖∞) F(OPT)) by slowing down the growth of ‖x‖∞. Thus, in our algorithm, we add a new constraint ‖c‖∞ ≤ α and have v = argmax_{c ∈ C, ‖c‖∞ ≤ α} ⟨∇F(x) ∘ (1 − x), c⟩. How does this change affect the performance of the algorithm and, in particular, the quantity ⟨∇F(x) ∘ (1 − x), v⟩?
Intuitively, if there is a second solution other than OPT with value close to OPT, we can pick v to be a mixture of OPT and this solution and obtain a new solution comparable to OPT but with lower ℓ∞ norm. Thus, if this is the case, we do not lose very much in ⟨∇F(x) ∘ (1 − x), v⟩ while simultaneously increasing the gain (1 − ‖x‖∞) F(OPT). On the other hand, if there is no such solution, the crucial insight is that v must be well correlated with OPT. Thus we can identify a good fraction of OPT by searching for argmax_{x ≤ v} F(x). (Here we crucially use the fact that C is a down-closed polytope.) This problem turns out to be not very different from the unconstrained setting, and we can use a variant of the double greedy algorithm of Buchbinder et al. [2] to find a good solution.

The above description is an intuitive but simplified overview of the algorithm. Nevertheless, each routine mentioned above corresponds to a part of the algorithm described in Section 4. The main technical difficulty is in formalizing the statement "v must be well correlated with OPT" and the subsequent identification of (a large part of) OPT. We describe how to overcome these difficulties in Section 5.2.

Comparison with the work of [1] for a cardinality constraint.
The idea of adding a new constraint ‖c‖∞ ≤ α is inspired by a recent algorithm by [1] for the cardinality constraint. In their setting, the goal is to maximize F(x) subject to a constraint ‖x‖₁ ≤ k. In the i-th step, their algorithm picks 2(k − i) elements with maximum marginal gain and randomly adds one of them to the current solution. This can be viewed as an analog of adding a constraint ‖c‖∞ ≤ k/(2(k − i)) when the time is in the range [(i − 1)/k, i/k]. This more sophisticated use of varying thresholds depending on the time is allowed in our framework, but in the simple solution presented here, we just use a fixed threshold throughout.

When the marginal gain (⟨∇F(x) ∘ (1 − x), v⟩ in our case) is small, they also use a variant of the double greedy algorithm to finish the solution. From the 2(k − i) elements that are picked in the last iteration, their algorithm picks k − i elements using a sophisticated variant of Double Greedy, and adds them to the i elements it picked in the previous iterations. Unfortunately, this step crucially uses the structure of the solution when the constraint is a cardinality constraint. From the point of view of a continuous Greedy algorithm, this is analogous to doubling a fractional solution and then selecting half of the coordinates. It is not clear what the analog of this step should be in the general setting. Instead, our algorithm computes p = argmax_{c ∈ C} ⟨∇F(x) ∘ (1 − x), c⟩ (dropping the dampening constraint) and uses standard double greedy to find z = argmax_{c ≤ p} F(c). While v and p could potentially be very different, we manage to connect ⟨∇F(x) ∘ (1 − x), v⟩ and ⟨∇F(x) ∘ (1 − x), p⟩, which is enough to prove the existence of z ≤ p with large F(z).

We use x ∘ y to denote the vector whose i-th coordinate is x_i · y_i.

In summary, in this work, we introduce several novel insights that lead to a much more general algorithm and analysis.
We believe that our algorithm and analysis are conceptually much simpler and cleaner, and we hope that our techniques will lead to further improvements for the problem.

Submodular maximization problems are well-studied, and it is hard to do justice to all previous literature on this subject. Starting with the seminal work of Nemhauser et al. [16], the classical approaches to these problems are largely combinatorial and based on greedy and local search algorithms. Over the years, with increasing sophistication, this direction has led to many tight results, such as the algorithm of [17] for a knapsack constraint, and the current best results for constraints such as multiple matroid constraints [14]. In the last few years, another approach emerged [3] that follows the popular paradigm in approximation algorithms of optimizing a continuous relaxation and rounding the resulting fractional solution. A key difficulty that separates the submodular setting from the classical setting is that even finding a fractional solution may be quite challenging, and in particular it is NP-hard to solve the continuous relaxation for maximizing submodular functions that is based on the multilinear extension. Thus, a line of work has been developed to approximately optimize this relaxation [3, 13, 8, 7], culminating in the work [7], which we extend here.

Here we consider the problem of maximizing a submodular function f subject to a downward-closed convex constraint x ∈ C. We use the following notation. Let n = |V|. We write x ≤ y if x_i ≤ y_i for all coordinates i ∈ [n]. Let x ∘ y denote the vector whose i-th coordinate is x_i · y_i. Let x ∨ y (resp. x ∧ y) be the vector whose i-th coordinate is max{x_i, y_i} (resp. min{x_i, y_i}).
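In code, the componentwise operations just defined map directly onto elementwise list operations; a minimal illustration (the vector values are arbitrary):

```python
# Vectors in [0, 1]^n represented as plain Python lists.
x = [0.25, 0.5, 0.0]
y = [0.5, 0.75, 1.0]

hadamard = [a * b for a, b in zip(x, y)]   # x ∘ y: componentwise product
join = [max(a, b) for a, b in zip(x, y)]   # x ∨ y: componentwise maximum
meet = [min(a, b) for a, b in zip(x, y)]   # x ∧ y: componentwise minimum

print(hadamard)  # -> [0.125, 0.375, 0.0]
print(join)      # -> [0.5, 0.75, 1.0]
print(meet)      # -> [0.25, 0.5, 0.0]
```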
Let 1_S ∈ {0,1}^V denote the indicator vector of S ⊆ V, i.e., the vector that has a 1 in entry i if and only if i ∈ S. Let F : [0,1]^n → R₊ denote the multilinear extension of f:

F(x) = E[f(R(x))] = Σ_{S ⊆ V} f(S) · Π_{i ∈ S} x_i · Π_{i ∈ V∖S} (1 − x_i),

where R(x) is a random subset of V in which each i ∈ V is included independently at random with probability x_i. We use ∇F to denote the gradient of F and ∂F/∂x_i to denote the i-th coordinate of the gradient of F. The multilinear extension has the following well-known properties; see for instance [18].

Claim 2. ∂F/∂x_i (x) = F(x ∨ 1_{i}) − F(x ∧ 1_{V∖{i}}).

Proof:
Note that
F(x) = x_i · F(x ∨ 1_{i}) + (1 − x_i) · F(x ∧ 1_{V∖{i}}).
Thus, if we take the partial derivative with respect to x_i, we obtain the claim. □

Algorithm 1: Algorithm for a box constraint
  u^(0) = u, v^(0) = v
  for i ∈ [n] do
    a_i ← (v_i − u_i) (F(u^(i−1) ∨ 1_{i}) − F(u^(i−1) ∧ 1_{V∖{i}}))
    b_i ← (v_i − u_i) (F(v^(i−1) ∧ 1_{V∖{i}}) − F(v^(i−1) ∨ 1_{i}))
    a′_i = max(a_i, 0), b′_i = max(b_i, 0)
    if a′_i + b′_i ≠ 0 then
      u^(i) ← u^(i−1) + (a′_i / (a′_i + b′_i)) · (v_i − u_i) · 1_{i}
      v^(i) ← v^(i−1) − (b′_i / (a′_i + b′_i)) · (v_i − u_i) · 1_{i}
    else
      // Can update u^(i)_i and v^(i)_i to any common value
      // We set u^(i)_i = v^(i)_i = v^(i−1)_i
      u^(i) ← u^(i−1) + (v_i − u_i) · 1_{i}
      v^(i) ← v^(i−1)
  return u^(n)

Figure 1: Double Greedy algorithm for a box constraint {x : u ≤ x ≤ v}.

Claim 3. If x ≤ y then ∇F(x) ≥ ∇F(y).

Proof:
Fix a coordinate i. Since x ≤ y, submodularity implies that
F(x ∨ 1_{i}) − F(x ∧ 1_{V∖{i}}) ≥ F(y ∨ 1_{i}) − F(y ∧ 1_{V∖{i}}).
By Claim 2, the left-hand side of the inequality above is ∂F/∂x_i (x) and the right-hand side is ∂F/∂x_i (y). □

Claim 4. F is concave along any line of direction d ≥ 0. That is, for any vector x, the function φ : R → R such that φ(t) = F(x + t · d), for every t ∈ R for which x + t · d ∈ [0,1]^V, is concave.

Proof sketch:
By submodularity, we can verify that φ″(t) ≤ 0, and thus φ is concave. □

In this section, we describe an algorithm for maximizing the multilinear extension subject to a box constraint: given u and v, find max_{u ≤ x ≤ v} F(x). The algorithm is similar to the Double Greedy algorithm of [2] and it is given in Figure 1. The proof of the following lemma is similar to the analysis of the Double Greedy algorithm and it can be found in Appendix A.

Algorithm 2: Algorithm for a general constraint
  Initialize x* = 0
  for θ ∈ [0, 1] do
    // Dampened Continuous Greedy stage
    Initialize x^(0) = 0
    for t ∈ [0, θ] do
      v^(t) = argmax_{c ∈ C, ‖c‖∞ ≤ α} ⟨∇F(x^(t)) ∘ (1 − x^(t)), c⟩
      Update x^(t) according to dx^(t)/dt = v^(t) ∘ (1 − x^(t))
    // Standard Continuous Greedy stage
    Initialize y^(θ) = x^(θ)
    for t ∈ (θ, 1] do
      v^(t) = argmax_{c ∈ C} ⟨∇F(y^(t)) ∘ (1 − y^(t)), c⟩
      Update y^(t) according to dy^(t)/dt = v^(t) ∘ (1 − y^(t))
    // Double Greedy stage
    p = argmax_{c ∈ C} ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), c⟩
    Find z approximating argmax_{c ≤ p} F(c) using the Double Greedy algorithm (Algorithm 1)
    // Update the best solution
    if F(y^(1)) > F(x*) then x* = y^(1)
    if F(z) > F(x*) then x* = z
  Return x*

Figure 2: Continuous algorithm for a general constraint
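To make the procedure in Figure 1 concrete, here is a small Python sketch of the Double Greedy algorithm for a box constraint. The multilinear extension is evaluated exactly by enumerating all 2^n subsets, which is feasible only for tiny ground sets; the triangle cut function used as the objective is our own illustrative choice and does not appear in the paper.

```python
from itertools import combinations

# Illustrative objective (not from the paper): the cut function of a
# triangle graph, which is non-negative, submodular, and non-monotone.
EDGES = [(0, 1), (1, 2), (0, 2)]
N = 3

def f_cut(S):
    return sum(1 for (i, j) in EDGES if (i in S) != (j in S))

def F(x):
    """Exact multilinear extension: sum over all subsets S of
    f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i)."""
    total = 0.0
    for r in range(N + 1):
        for S in combinations(range(N), r):
            p = 1.0
            for i in range(N):
                p *= x[i] if i in S else 1.0 - x[i]
            total += f_cut(set(S)) * p
    return total

def double_greedy_box(u, v):
    """Double Greedy for max_{u <= x <= v} F(x), following Figure 1."""
    u, v = list(u), list(v)
    for i in range(N):
        def at(x, val):
            y = list(x); y[i] = val; return y
        gap = v[i] - u[i]
        a = gap * (F(at(u, 1.0)) - F(at(u, 0.0)))   # gain of raising u_i
        b = gap * (F(at(v, 0.0)) - F(at(v, 1.0)))   # gain of lowering v_i
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b > 0:
            u[i] += a / (a + b) * gap
            v[i] -= b / (a + b) * gap
        else:
            u[i] = v[i]    # any common value works; we pick v_i
    return u

x = double_greedy_box([0.0] * N, [1.0] * N)
print(x, F(x))   # -> [0.5, 0.5, 1.0] 1.5
```

On this instance the fractional optimum over the box is 2 (attained, e.g., at (1, 0, 0)), so the returned point satisfies the guarantee of Lemma 5 below: F(x) = 1.5 ≥ (1/2) · 2 + (1/4) · F(0) + (1/4) · F(1) = 1.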
Lemma 5.
Algorithm 1 finds a solution x to the problem max_{u ≤ x ≤ v} F(x) such that
F(x) ≥ (1/2) · F(OPT) + (1/4) · F(u) + (1/4) · F(v),
where OPT is an optimal solution to the problem max_{u ≤ x ≤ v} F(x). Alternatively, we can reduce the problem with a box constraint to the unconstrained problem by defining a suitable submodular function g(S) = F(u + (v − u) ∘ 1_S). We can show that this reduction is correct using the same argument given in the appendix.

The algorithm
In this section, we describe our main algorithm for the problem max_{x ∈ C} F(x). We first give a continuous version of our algorithm; in order to efficiently implement the algorithm, we discretize it using a standard approach, and we give the details in Appendix B.

The algorithm picks the best out of two solutions. The first solution is constructed by running the Continuous Greedy algorithm with an additional dampening constraint as long as the marginal gain remains high despite the dampening constraint, and then finishing the solution via the standard Continuous Greedy algorithm for the remaining time. The second solution is constructed by running Double Greedy exactly when the marginal gain becomes low because of the dampening constraint, which must happen early if the first solution is not good. Since we do not know precisely when the marginal gain becomes low, the algorithm tries all possible values via the outer for loop on line 2.

In this section, we analyze the algorithm and show that it achieves an approximation greater than 0.372. We fix an iteration θ of the outer loop and analyze the solutions constructed in that iteration. In the remainder of this section, all of the vectors x^(·), y^(·), etc. refer to the vectors during iteration θ.

Analysis of the solution y^(1)

In this section, we analyze the solution y^(1) constructed in any given iteration θ ∈ [0, 1].

Theorem 6.
Let θ ∈ [0, 1] and let y^(1) be the solution constructed by Algorithm 2 in iteration θ of the outer loop. We have
F(y^(1)) ≥ e^{−1} ( e^{θ} F(x^(θ)) + (1 − θ) e^{(1−α)θ} F(OPT) ).

We devote the rest of this section to the proof of Theorem 6. In the remainder of this section, all of the vectors x^(·), y^(·), etc. refer to the vectors during iteration θ. We start by upper bounding the ℓ∞ norm of x^(t) and y^(t).

Lemma 7.
Consider the following process. Let time run from t = t₀ ≥ 0 to t = t₁ ≤ 1. Let u^(t) be a vector updated according to du^(t)/dt = v^(t) ∘ (1 − u^(t)), where v^(t) is a vector such that ‖v^(t)‖∞ ≤ δ. Then
‖u^(t)‖∞ ≤ 1 − (1 − ‖u^(t₀)‖∞) e^{−δ(t − t₀)}
for each t ∈ [t₀, t₁]. (The dampening constraint imposes such an ℓ∞ constraint in line 5 of Algorithm 2.)

Proof: The i-th coordinate u^(t)_i of u^(t) is updated according to
du^(t)_i/dt = v^(t)_i (1 − u^(t)_i) ≤ δ (1 − u^(t)_i).
By solving the differential inequality above, we obtain u^(t)_i ≤ 1 + C · e^{−δt}, where C is a constant. Using the initial condition 1 + C · e^{−δt₀} = u^(t₀)_i, we obtain C = (u^(t₀)_i − 1) e^{δt₀}, and thus
u^(t)_i ≤ 1 − (1 − u^(t₀)_i) e^{−δ(t − t₀)}. □

Corollary 8.
For every t ∈ [0, θ], ‖x^(t)‖∞ ≤ 1 − e^{−αt}.

Proof: The vector x^(t) starts at 0 and it is updated according to the update rule in line 6, where v^(t) is a vector of ℓ∞ norm at most α. Thus it follows from Lemma 7 that ‖x^(t)‖∞ ≤ 1 − e^{−αt}. □

Corollary 9.
For every t ∈ [θ, 1], ‖y^(t)‖∞ ≤ 1 − e^{(1−α)θ − t}.

Proof: The vector y^(t) starts at y^(θ) = x^(θ) and it is updated according to the update rule in line 11, where v^(t) is a vector of ℓ∞ norm at most 1. Thus it follows from Lemma 7 and the upper bound on ‖x^(θ)‖∞ given by Corollary 8 that
‖y^(t)‖∞ ≤ 1 − (1 − ‖x^(θ)‖∞) e^{−(t−θ)} ≤ 1 − e^{(1−α)θ − t}. □

We will also need the following lemma that was shown in [7].
Lemma 10 ([7, Lemma III.5]). Let x ∈ [0,1]^n and let S ⊆ V. We have F(x ∨ 1_S) ≥ (1 − ‖x‖∞) f(S).

Proof of Theorem 6:
Using the chain rule, for every t ∈ [θ, 1]:

dF(y^(t))/dt = ⟨∇F(y^(t)), dy^(t)/dt⟩
            = ⟨∇F(y^(t)), v^(t) ∘ (1 − y^(t))⟩
            = ⟨∇F(y^(t)) ∘ (1 − y^(t)), v^(t)⟩
            ≥ ⟨∇F(y^(t)) ∘ (1 − y^(t)), 1_OPT⟩
            ≥ F(y^(t) ∨ 1_OPT) − F(y^(t))
            ≥ e^{(1−α)θ − t} · F(OPT) − F(y^(t)).   (By Corollary 9 and Lemma 10)

By solving the differential inequality, we obtain
F(y^(t)) ≥ e^{−t} ( e^{θ} F(y^(θ)) + (t − θ) e^{(1−α)θ} F(OPT) ),
and thus
F(y^(1)) ≥ e^{−1} ( e^{θ} F(y^(θ)) + (1 − θ) e^{(1−α)θ} F(OPT) ).
The theorem now follows, since y^(θ) = x^(θ). □

Analysis of the solution z

In this section, we analyze the solution z constructed using the Double Greedy algorithm, and this is the crux of our argument.

Theorem 11.
Let θ ∈ [0, 1] and let z be the solution constructed by Algorithm 2 in iteration θ of the outer loop. If α ≥ 1/2, we have
F(z) ≥ (1 / (2(1 − α))) ( e^{−αθ} F(OPT) − F(x^(θ)) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ ).

We first give some intuition for the approach. Consider the solution x constructed by the dampened Continuous Greedy algorithm. The rate of growth of F(x) is given by the inner product ⟨∇F(x) ∘ (1 − x), v⟩, and thus the crux of the analysis is to understand how this inner product evolves over time. The inner product ⟨∇F(x) ∘ (1 − x), v⟩ is always at least ⟨∇F(x) ∘ (1 − x), α · 1_OPT⟩, and intuitively we should gain proportionally to the difference between the two. If this difference is very small, the key insight is that once we drop the dampening constraint and compute the vector that maximizes ⟨∇F(x) ∘ (1 − x), c⟩ over all feasible vectors c, we obtain a vector p that is well-correlated with OPT in the sense that p ∧ 1_OPT is a good solution. We formalize this intuition in the remainder of this section.

In the remainder of this section, we analyze the solution
ẑ := (1 − x^(θ)) ∘ (p ∧ 1_OPT).
By submodularity, we have F(ẑ) − F(0) ≥ F(ẑ + x^(θ)) − F(x^(θ)), and thus it suffices to analyze F(ẑ + x^(θ)). Note that
x^(θ) ≤ ẑ + x^(θ) ≤ x^(θ) + (1 − x^(θ)) ∘ 1_OPT = x^(θ) ∨ 1_OPT.
Thus Claim 4 and Claim 3 give

F(x^(θ) ∨ 1_OPT) − F(ẑ + x^(θ))
  ≤ ⟨∇F(ẑ + x^(θ)), (1 − x^(θ)) ∘ ((1 − p) ∧ 1_OPT)⟩   (By Claim 4)
  ≤ ⟨∇F(x^(θ)), (1 − x^(θ)) ∘ ((1 − p) ∧ 1_OPT)⟩       (By Claim 3)
  = ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩.

In the first inequality, we have used the fact that (x^(θ) ∨ 1_OPT) − (ẑ + x^(θ)) = (x^(θ) + (1 − x^(θ)) ∘ 1_OPT) − (ẑ + x^(θ)) = (1 − x^(θ)) ∘ ((1 − p) ∧ 1_OPT).

In order to upper bound ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩, we consider the following vector:
v̂ := (1 − α) ((1 − p) ∧ 1_OPT) + α p.
The intuition behind the choice of v̂ is to connect the fact that the inner product ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v⟩ is not much more than ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), α · 1_OPT⟩ to the insight that ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩ is small.

Since α ≥ 1/2, we have 1 − α ≤ α. Therefore, for each i ∈ [n], we have
v̂_i = (1 − α)(1 − p_i) + α p_i ≤ α   if i ∈ OPT,
v̂_i = α p_i ≤ α                      otherwise.
Therefore ‖v̂‖∞ ≤ α. Additionally, v̂ ∈ C, since it is a convex combination of two vectors in C (recall that C is downward closed and convex). It follows from the definition of v^(θ) on line 5 that

⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩
  ≥ ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v̂⟩
  = (1 − α) ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩ + α ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), p⟩
  ≥ (1 − α) ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩ + α ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), 1_OPT⟩
  ≥ (1 − α) ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩ + α (F(x^(θ) ∨ 1_OPT) − F(x^(θ))).

By rearranging the inequality above, we obtain

⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩
  ≤ (1 / (1 − α)) ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ − (α / (1 − α)) (F(x^(θ) ∨ 1_OPT) − F(x^(θ))).

By combining all of the inequalities, we obtain

F(ẑ) ≥ F(ẑ + x^(θ)) − F(x^(θ))
     ≥ F(x^(θ) ∨ 1_OPT) − F(x^(θ)) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), (1 − p) ∧ 1_OPT⟩
     ≥ (1 / (1 − α)) ( F(x^(θ) ∨ 1_OPT) − F(x^(θ)) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ )
     ≥ (1 / (1 − α)) ( e^{−αθ} F(OPT) − F(x^(θ)) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ ).

In the last inequality, we have used Corollary 8 and Lemma 10 to lower bound F(x^(θ) ∨ 1_OPT) by e^{−αθ} F(OPT). Since ẑ is a candidate solution for the Double Greedy step on line 14, it follows from Lemma 5 that F(z) ≥ (1/2) F(ẑ), and the theorem follows. □
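The dampened and standard Continuous Greedy stages analyzed above can be sketched end to end in Python. Here C is instantiated as the cardinality polytope {c ∈ [0,1]^n : Σ_i c_i ≤ k}, for which the linear maximization step reduces to a simple greedy waterfill; the triangle cut instance, the step size δ = 0.05, and the parameters α = 1/2, θ = 0.2 are our own illustrative choices, and the Double Greedy branch and the outer loop over θ are omitted.

```python
from itertools import combinations

EDGES = [(0, 1), (1, 2), (0, 2)]   # illustrative triangle cut instance
N, K = 3, 1                        # cardinality constraint: sum(c) <= K

def f_cut(S):
    return sum(1 for (i, j) in EDGES if (i in S) != (j in S))

def F(x):
    # Exact multilinear extension by enumeration (tiny n only).
    total = 0.0
    for r in range(N + 1):
        for S in combinations(range(N), r):
            p = 1.0
            for i in range(N):
                p *= x[i] if i in S else 1.0 - x[i]
            total += f_cut(set(S)) * p
    return total

def weights(x):
    # i-th coordinate of grad F(x) ∘ (1 - x), using Claim 2:
    # dF/dx_i = F(x with x_i = 1) - F(x with x_i = 0).
    w = []
    for i in range(N):
        hi = list(x); hi[i] = 1.0
        lo = list(x); lo[i] = 0.0
        w.append((F(hi) - F(lo)) * (1.0 - x[i]))
    return w

def lin_argmax(w, cap):
    # max <w, c> over {0 <= c <= cap, sum(c) <= K}: waterfill the
    # largest positive weights (fractional-knapsack greedy).
    c = [0.0] * N
    budget = float(K)
    for i in sorted(range(N), key=lambda j: -w[j]):
        if w[i] <= 0 or budget <= 0:
            break
        c[i] = min(cap, budget)
        budget -= c[i]
    return c

def continuous_greedy(alpha=0.5, theta=0.2, delta=0.05):
    steps = round(1 / delta)
    damp_steps = round(theta / delta)
    x = [0.0] * N
    for s in range(steps):
        cap = alpha if s < damp_steps else 1.0   # dampening until time theta
        v = lin_argmax(weights(x), cap)
        x = [xi + delta * vi * (1.0 - xi) for xi, vi in zip(x, v)]
    return x

y = continuous_greedy()
print(round(F(y), 3))
```

On this instance the best integral solution has value 2 (any single vertex), and the fractional point returned comfortably clears the 0.372 threshold of Theorem 1 even without the Double Greedy branch.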
As we have shown above, in every iteration θ of the algorithm, we obtain two solutions y^(1) and z satisfying:

F(y^(1)) ≥ e^{−1} ( (1 − θ) e^{(1−α)θ} F(OPT) + e^{θ} F(x^(θ)) ),
F(z) ≥ (1 / (2(1 − α))) ( e^{−αθ} F(OPT) − F(x^(θ)) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ ).

In the following, we show that there is an iteration θ for which max{F(y^(1)), F(z)} > C · F(OPT), where C is a constant that we will set later. We proceed by contradiction and assume that max{F(y^(1)), F(z)} ≤ C · F(OPT) for all θ.

Note that the coefficient of F(x^(θ)) is positive in the first inequality above and negative in the second inequality, so there is a trade-off between the two solutions. We can get a handle on this trade-off as follows. To simplify matters, we take a positive combination of the two inequalities and eliminate F(x^(θ)): multiplying the first inequality by e^{1−θ}, the second by 2(1 − α), and adding them, we get that, for all θ ∈ [0, 1],

(2 − θ) e^{−αθ} F(OPT) − ⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ ≤ (e^{1−θ} + 2(1 − α)) C · F(OPT).

Thus, for all θ ∈ [0, 1],

⟨∇F(x^(θ)) ∘ (1 − x^(θ)), v^(θ)⟩ ≥ ( (2 − θ) e^{−αθ} − (e^{1−θ} + 2(1 − α)) C ) F(OPT).

Now note that

F(x^(θ)) = ∫₀^θ ⟨∇F(x^(t)) ∘ (1 − x^(t)), v^(t)⟩ dt
         ≥ ∫₀^θ ( (2 − t) e^{−αt} − (e^{1−t} + 2(1 − α)) C ) F(OPT) dt
         = [ e^{−αt} (α(t − 2) + 1) / α² − (−e^{1−t} + 2(1 − α)t) C ]₀^θ · F(OPT)
         = ( (e^{−αθ} (α(θ − 2) + 1) + 2α − 1) / α² − (−e^{1−θ} + 2(1 − α)θ + e) C ) F(OPT).

Therefore

F(y^(1)) ≥ ( (1 − θ) e^{(1−α)θ−1} + e^{(1−α)θ−1} (α(θ − 2) + 1) / α² + e^{θ−1} (2α − 1) / α² − (−1 + 2(1 − α)θ e^{θ−1} + e^{θ}) C ) F(OPT).

In order to obtain a contradiction, we need the coefficient of F(OPT) in the inequality above to be at least C for some θ ∈ [0, 1] and some α ∈ [1/2, 1]:

(1 − θ) e^{(1−α)θ−1} + e^{(1−α)θ−1} (α(θ − 2) + 1) / α² + e^{θ−1} (2α − 1) / α² − (−1 + 2(1 − α)θ e^{θ−1} + e^{θ}) C ≥ C.

By rearranging, we have

C ≤ ( (1 − θ) e^{(1−α)θ−1} + e^{(1−α)θ−1} (α(θ − 2) + 1) / α² + e^{θ−1} (2α − 1) / α² ) / ( 2(1 − α)θ e^{θ−1} + e^{θ} ).

Thus, in order to obtain the best approximation C, we need to maximize the right-hand side of the inequality above over θ ∈ [0, 1] and α ∈ [1/2, 1]. Setting α = 1/2 and θ = 0.18 gives
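This choice of parameters is easy to verify numerically. The snippet below evaluates the closed-form bound on C from the derivation above and scans a coarse grid over θ ∈ [0, 1] and α ∈ [1/2, 1]; the grid resolution is an arbitrary choice.

```python
import math

def C_bound(theta, alpha):
    # Right-hand side of the bound on C derived above (alpha in [1/2, 1]).
    num = ((1 - theta) * math.exp((1 - alpha) * theta - 1)
           + math.exp((1 - alpha) * theta - 1) * (alpha * (theta - 2) + 1) / alpha ** 2
           + math.exp(theta - 1) * (2 * alpha - 1) / alpha ** 2)
    den = 2 * (1 - alpha) * theta * math.exp(theta - 1) + math.exp(theta)
    return num / den

print(round(C_bound(0.18, 0.5), 4))   # -> 0.3721

best = max(C_bound(t / 100, 0.5 + a / 200)
           for t in range(101) for a in range(101))
print(best > 0.372)                   # -> True
```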
C > 0.372.

References

[1] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proc. 25th SODA, pages 1433–1452, 2014.
[2] Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In Proc. 53rd FOCS, 2012.
[3] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a submodular set function subject to a matroid constraint. SIAM J. Comput., 40(6):1740–1766, 2011.
[4] Chandra Chekuri, T. S. Jayram, and Jan Vondrák. On multiplicative weight updates for concave and submodular function maximization. In Proc. 6th ITCS, pages 201–210, 2015.
[5] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput., 43(6):1831–1879, 2014.
[6] Shaddin Dughmi, Tim Roughgarden, and Mukund Sundararajan. Revenue submodularity. Theory of Computing, 8(1):95–119, 2012.
[7] Moran Feldman, Joseph Naor, and Roy Schwartz. A unified continuous greedy algorithm for submodular maximization. In Proc. 52nd FOCS, pages 570–579, 2011.
[8] Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing. In Proc. 22nd SODA, pages 1098–1117. SIAM, 2011.
[9] Ryan Gomes and Andreas Krause. Budgeted nonparametric learning from data streams. In Proc. 27th ICML, pages 391–398, 2010.
[10] Stefanie Jegelka and Jeff A. Bilmes. Submodularity beyond submodular energies: Coupling edges in graph cuts. In Proc. 24th CVPR, pages 1897–1904, 2011.
[11] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proc. 9th KDD, pages 137–146, 2003.
[12] Andreas Krause, Ajit Paul Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235–284, 2008.
[13] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proc. 41st STOC, pages 323–332. ACM, 2009.
[14] Jon Lee, Maxim Sviridenko, and Jan Vondrák. Submodular maximization over multiple matroids via generalized exchange properties. Mathematics of Operations Research, 35(4):795–806, 2010.
[15] Hui Lin and Jeff A. Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Proc. HLT-NAACL, pages 912–920, 2010.
[16] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.
[17] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43, 2004.
[18] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In Proc. 40th STOC, pages 67–74. ACM, 2008.
A Analysis of Algorithm 1
In this section, we analyze Algorithm 1 given in Section 3. We start by showing that the problem max_{u ≤ x ≤ v} F(x) has an optimal solution OPT with the following property.

Lemma 12.
There is an optimal solution OPT to the problem max_{u ≤ x ≤ v} F(x) such that, for all i, either OPT_i = u_i or OPT_i = v_i.

Proof:
Let OPT = argmax_{u ≤ x ≤ v} F(x) be an arbitrary optimal solution. Note that we can write each OPT_i as a convex combination of u_i and v_i: OPT_i = γ_i u_i + (1 − γ_i) v_i, where γ_i = (v_i − OPT_i)/(v_i − u_i) if u_i ≠ v_i, and γ_i = 1 otherwise. Let o_i be a random variable that is equal to u_i with probability γ_i and is equal to v_i with probability 1 − γ_i, and let o be the vector whose coordinates are the o_i's. By the definition of the multilinear extension, we have F(OPT) = E_o[F(o)]. Thus, there exists a realization o = ô such that F(ô) ≥ F(OPT). Thus there is an optimal solution such that, for all i, its i-th coordinate is either u_i or v_i. □

We will also use the following observation that follows from the definition of the multilinear extension.
Claim 13.
Let x ∈ [0,1]^n and δ ∈ [−x_i, 1 − x_i]. We have
F(x + δ · 1_{i}) − F(x) = δ · ( F(x ∨ 1_{i}) − F(x ∧ 1_{V∖{i}}) ).

Proof:
Note that, for each y ∈ [0,1]^n and each j ∈ [n],
F(y) = y_j · F(y ∨ 1_{j}) + (1 − y_j) · F(y ∧ 1_{V∖{j}}).
Therefore
F(x + δ · 1_{i}) = (x_i + δ) · F(x ∨ 1_{i}) + (1 − x_i − δ) · F(x ∧ 1_{V∖{i}}),
F(x) = x_i · F(x ∨ 1_{i}) + (1 − x_i) · F(x ∧ 1_{V∖{i}}),
and the claim follows. □

Proof of Lemma 5:
Let OPT = argmax_{u ≤ x ≤ v} F(x). By Lemma 12, we may assume that, for each i, either OPT_i = u_i or OPT_i = v_i. Let OPT^(i) = (OPT ∨ u^(i)) ∧ v^(i). Note that u^(i) ≤ OPT^(i) ≤ v^(i) and therefore ∇F(u^(i)) ≥ ∇F(OPT^(i)) ≥ ∇F(v^(i)) by Claim 3. We will show that

F(OPT^(i−1)) − F(OPT^(i)) ≤ (1/2) ( F(u^(i)) − F(u^(i−1)) + F(v^(i)) − F(v^(i−1)) ).   (1)

Note that (1) immediately implies the lemma. We prove (1) in the following.

Suppose that a′_i + b′_i ≠ 0. By Claim 13,

F(u^(i)) − F(u^(i−1)) = (a′_i / (a′_i + b′_i)) (v_i − u_i) ( F(u^(i−1) ∨ 1_{i}) − F(u^(i−1) ∧ 1_{V∖{i}}) ) = (a′_i)² / (a′_i + b′_i),

and

F(v^(i)) − F(v^(i−1)) = (b′_i / (a′_i + b′_i)) (v_i − u_i) ( −F(v^(i−1) ∨ 1_{i}) + F(v^(i−1) ∧ 1_{V∖{i}}) ) = (b′_i)² / (a′_i + b′_i).

Recall that either OPT_i = u_i or OPT_i = v_i. If OPT_i = u_i, we have

F(OPT^(i−1)) − F(OPT^(i)) = (a′_i / (a′_i + b′_i)) (v_i − u_i) ( −F(OPT^(i−1) ∨ 1_{i}) + F(OPT^(i−1) ∧ 1_{V∖{i}}) )
  = −(a′_i / (a′_i + b′_i)) (v_i − u_i) ∂F/∂x_i (OPT^(i−1))
  ≤ −(a′_i / (a′_i + b′_i)) (v_i − u_i) ∂F/∂x_i (v^(i−1))
  = (a′_i b_i) / (a′_i + b′_i)
  ≤ (a′_i b′_i) / (a′_i + b′_i).

On the first two lines, we have used Claim 13 and Claim 2. On the third line, we have used the fact that ∇F(OPT^(i−1)) ≥ ∇F(v^(i−1)). On the fourth and fifth lines, we have used the definitions of b_i and b′_i.

If OPT_i = v_i, we use an analogous argument:

F(OPT^(i−1)) − F(OPT^(i)) = (b′_i / (a′_i + b′_i)) (v_i − u_i) ( F(OPT^(i−1) ∨ 1_{i}) − F(OPT^(i−1) ∧ 1_{V∖{i}}) )
  = (b′_i / (a′_i + b′_i)) (v_i − u_i) ∂F/∂x_i (OPT^(i−1))
  ≤ (b′_i / (a′_i + b′_i)) (v_i − u_i) ∂F/∂x_i (u^(i−1))
  ≤ (a′_i b′_i) / (a′_i + b′_i).

Since 2 a′_i b′_i ≤ (a′_i)² + (b′_i)², the inequality (1) follows.

Finally, suppose that a′_i + b′_i = 0. Notice that a_i ≥ −b_i by submodularity, so this case can only happen if a_i = b_i = 0. In this case, we can set the i-th coordinate of u and v to an arbitrary common value, and by the same argument as above, we have F(u^(i)) = F(u^(i−1)), F(v^(i)) = F(v^(i−1)), and F(OPT^(i)) = F(OPT^(i−1)). Therefore (1) is trivially satisfied. □

B Discretized algorithm
In Figure 3, we give a discretized version of Algorithm 2. In the remainder of this section, we show how to modify the analysis from Section 5. We discretize the time interval $[0,1]$ into segments of size $\delta = n^{-5}$.

Analysis of the solution $y^{(1)}$. We modify the analysis from Subsection 5.1 as follows. (Note that the rest of the analysis remains unchanged.)

Consider an iteration $\theta$ of the outer for loop of Algorithm 3. In the remainder of this section, all of the vectors $x^{(\cdot)}$, $y^{(\cdot)}$, etc. refer to the vectors during iteration $\theta$. All of the time steps are implicitly assumed to be the discrete time steps $\{0, \delta, 2\delta, 3\delta, \ldots, 1\}$.

Algorithm 3: Discretized algorithm for a general constraint
  Initialize $x^* = 0$, $\delta = n^{-5}$
  for $\theta \in \{0, \delta, 2\delta, 3\delta, \ldots, 1\}$ do
    // Dampened Continuous Greedy stage
    Initialize $x^{(0)} = 0$, $t = 0$
    while $t < \theta$ do
      $v^{(t)} = \arg\max_{c \in C,\, \|c\|_\infty \le \alpha} \langle \nabla F(x^{(t)}) \circ (1 - x^{(t)}), c \rangle$
      $x^{(t+\delta)} = x^{(t)} + \delta\, v^{(t)} \circ (1 - x^{(t)})$
      $t = t + \delta$
    // Standard Continuous Greedy stage
    Initialize $y^{(\theta)} = x^{(\theta)}$
    while $t < 1$ do
      $v^{(t)} = \arg\max_{c \in C} \langle \nabla F(y^{(t)}) \circ (1 - y^{(t)}), c \rangle$
      $y^{(t+\delta)} = y^{(t)} + \delta\, v^{(t)} \circ (1 - y^{(t)})$
      $t = t + \delta$
    // Double Greedy stage
    $p = \arg\max_{c \in C} \langle \nabla F(x^{(\theta)}) \circ (1 - x^{(\theta)}), c \rangle$
    Find $z$ approximating $\arg\max_{c \le p} F(c)$ using the DoubleGreedy algorithm (Algorithm 1)
    // Update the best solution
    if $F(y^{(1)}) > F(x^*)$ then $x^* = y^{(1)}$
    if $F(z) > F(x^*)$ then $x^* = z$
  Return $x^*$

Figure 3: Discretized algorithm for a general constraint

First we prove bounds on $\|x^{(t)}\|_\infty$ and $\|y^{(t)}\|_\infty$ with a similar argument to Lemma 7. Consider a fixed $i \in [n]$. Since $v^{(t)}_i \le \alpha$ for all $t < \theta$, we have $1 - x^{(t+\delta)}_i \ge (1 - \delta\alpha)(1 - x^{(t)}_i)$ for all $t < \theta$. Thus, $1 - x^{(t)}_i \ge (1 - \delta\alpha)^{t/\delta}$. Similarly, for all $t \in [\theta,$
$1]$ with $t$ a multiple of $\delta$,
\[
1 - y^{(t)}_i \ge (1 - \delta\alpha)^{\theta/\delta} (1 - \delta)^{(t-\theta)/\delta}.
\]
Next we prove a lower bound on $F(y^{(1)})$ with a similar argument to Theorem 6.

Lemma 14. Consider $u, v$ satisfying $|u_i - v_i| \le \delta$ for every $i$. Then $F(v) - F(u) \le \delta n^2 M$, where $M = \max_{i \in [n]} f(\{i\})$.

Proof:
First consider the case that $u, v$ agree on all but coordinate $i$. Let $R(x)$ be the random set where each element $i$ is independently included with probability $x_i$. This process can be thought of as picking a random $r_i \in [0,$
$1]$ and including $i$ if $r_i \le x_i$. One can generate (coupled) $R(u)$ and $R(v)$ by sharing the same $r_i$'s. Notice that $R(u)$ and $R(v)$ agree on all coordinates other than $i$, and they disagree on coordinate $i$ with probability at most $\delta$. Thus we have
\[
\mathbb{E}[f(R(u)) - f(R(v))] \le \delta\, \mathbb{E}[f(R(u)) \mid R(u) \neq R(v)] \le \delta \max_{S \subseteq V} f(S) \le \delta n M.
\]
Next, for $0 \le i \le n$, let $u^i$ be the vector whose first $i$ coordinates agree with $u$ and whose last $n - i$ coordinates agree with $v$. We have
\[
F(v) - F(u) = \sum_{i=0}^{n-1} \left( F(u^i) - F(u^{i+1}) \right) \le \delta n^2 M,
\]
where the inequality comes from the above special case. □

Corollary 15.
Consider $u$ and $v$ satisfying $u \le v$ and $v_i \le u_i + \delta$ for every $i$. Then
\[
\frac{\partial F}{\partial x_i}(u) - \frac{\partial F}{\partial x_i}(v) \le 2\delta n^2 M.
\]
Proof:
By Claim 2,
\[
\frac{\partial F}{\partial x_i}(u) - \frac{\partial F}{\partial x_i}(v) = F(u \vee \mathbf{1}_{\{i\}}) - F(v \vee \mathbf{1}_{\{i\}}) - F(u \wedge \mathbf{1}_{V \setminus \{i\}}) + F(v \wedge \mathbf{1}_{V \setminus \{i\}}) \le 2\delta n^2 M,
\]
where the last step applies Lemma 14 to each of the two differences. □

For any given $t \in [\theta, 1 - \delta]$, we have
\begin{align*}
F(y^{(t+\delta)}) - F(y^{(t)}) &= \int_0^1 \langle \nabla F((1-z) y^{(t)} + z y^{(t+\delta)}), y^{(t+\delta)} - y^{(t)} \rangle\, dz \\
&= \langle \nabla F(y^{(t)}), y^{(t+\delta)} - y^{(t)} \rangle + \int_0^1 \langle \nabla F((1-z) y^{(t)} + z y^{(t+\delta)}) - \nabla F(y^{(t)}), y^{(t+\delta)} - y^{(t)} \rangle\, dz \\
&\ge \langle \nabla F(y^{(t)}), y^{(t+\delta)} - y^{(t)} \rangle - \int_0^1 \| \nabla F((1-z) y^{(t)} + z y^{(t+\delta)}) - \nabla F(y^{(t)}) \|_\infty\, \| y^{(t+\delta)} - y^{(t)} \|_1\, dz \\
&\ge \langle \nabla F(y^{(t)}), y^{(t+\delta)} - y^{(t)} \rangle - 2\delta^2 n^3 M \\
&= \langle \nabla F(y^{(t)}), \delta\, v^{(t)} \circ (1 - y^{(t)}) \rangle - 2\delta^2 n^3 M \\
&= \langle \nabla F(y^{(t)}) \circ (1 - y^{(t)}), \delta\, v^{(t)} \rangle - 2\delta^2 n^3 M \\
&\ge \langle \nabla F(y^{(t)}) \circ (1 - y^{(t)}), \delta\, \mathrm{OPT} \rangle - 2\delta^2 n^3 M \\
&\ge \delta \left( F(y^{(t)} \vee \mathrm{OPT}) - F(y^{(t)}) \right) - 2\delta^2 n^3 M \\
&\ge \delta (1 - \delta\alpha)^{\theta/\delta} (1 - \delta)^{(t-\theta)/\delta} \cdot F(\mathrm{OPT}) - \delta \cdot F(y^{(t)}) - 2\delta^2 n^3 M. \quad \text{(By Lemma 10)}
\end{align*}
Therefore we have the following recurrence for $F(y^{(t)})$:
\[
F(y^{(t+\delta)}) \ge (1 - \delta) F(y^{(t)}) + \delta (1 - \delta\alpha)^{\theta/\delta} (1 - \delta)^{(t-\theta)/\delta} \cdot F(\mathrm{OPT}) - 2\delta^2 n^3 M.
\]
By expanding the recurrence, we obtain
\[
F(y^{(t)}) \ge (1 - \delta)^{(t-\theta)/\delta} \left( (t - \theta)(1 - \delta\alpha)^{\theta/\delta} \cdot F(\mathrm{OPT}) + F(x^{(\theta)}) \right) - 2 t \delta n^3 M.
\]
By substituting $t = 1$ and $\delta = n^{-5}$ and by using the inequalities $e^{-x/(1-x)} \le 1 - x \le e^{-x}$ for all $0 < x <$
$1$, we obtain
\[
F(y^{(1)}) \ge e^{-(1-\theta)} \left( (1-\theta) e^{-\alpha\theta} F(\mathrm{OPT}) + F(x^{(\theta)}) \right) - O(n^{-2}) F(\mathrm{OPT}) = \frac{1}{e} \left( (1-\theta) e^{(1-\alpha)\theta} F(\mathrm{OPT}) + e^{\theta} F(x^{(\theta)}) \right) - O(n^{-2}) F(\mathrm{OPT}).
\]

Combining the two solutions $y^{(1)}$ and $z$.
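To make the discretized procedure concrete, the two continuous greedy stages of Algorithm 3 can be sketched in Python for a single value of $\theta$. Everything below is an illustrative assumption rather than the paper's setup: the instance is a tiny coverage-minus-cost objective (submodular but non-monotone), the polytope is a cardinality constraint, the step size $\delta = 0.01$ is far coarser than the polynomially small step the analysis requires, the multilinear extension and its gradient are evaluated exactly by brute-force enumeration (feasible only because the ground set is tiny), and the Double Greedy stage and the outer loop over $\theta$ are omitted.

```python
import numpy as np

# Illustrative toy instance (not from the paper): element i covers the items
# in COVER[i]; f(S) = |items covered by S| - cost(S) is submodular but
# non-monotone, since element 3 costs more than it can ever cover.
COVER = [{0, 1}, {1, 2}, {2, 3}, {3}]
COST = np.array([0.2, 0.2, 0.2, 1.5])
N = 4
K = 2  # C = {x in [0,1]^N : sum(x) <= K}, a down-closed polytope

def f(S):
    S = list(S)
    covered = set().union(*(COVER[i] for i in S)) if S else set()
    return len(covered) - COST[S].sum()

def F(x):
    # Exact multilinear extension: E[f(R(x))], enumerating all 2^N subsets.
    total = 0.0
    for mask in range(1 << N):
        S = [i for i in range(N) if mask >> i & 1]
        p = np.prod([x[i] if i in S else 1.0 - x[i] for i in range(N)])
        total += p * f(S)
    return total

def grad(x):
    # By multilinearity, dF/dx_i = F(x with x_i = 1) - F(x with x_i = 0).
    g = np.zeros(N)
    for i in range(N):
        hi, lo = x.copy(), x.copy()
        hi[i], lo[i] = 1.0, 0.0
        g[i] = F(hi) - F(lo)
    return g

def best_direction(weights, cap):
    # Linear maximization over {c in C : ||c||_inf <= cap}: fill the largest
    # positive weights at value `cap` until the cardinality budget K is spent.
    c = np.zeros(N)
    budget = float(K)
    for i in np.argsort(-weights):
        if weights[i] <= 0 or budget <= 1e-12:
            break
        c[i] = min(cap, budget)
        budget -= c[i]
    return c

def discretized_greedy(theta, alpha, delta=0.01):
    x, t = np.zeros(N), 0.0
    while t < theta - 1e-12:   # dampened stage: directions capped by alpha
        v = best_direction(grad(x) * (1.0 - x), alpha)
        x = x + delta * v * (1.0 - x)
        t += delta
    x_theta = x.copy()
    y = x.copy()
    while t < 1.0 - 1e-12:     # standard continuous greedy stage
        v = best_direction(grad(y) * (1.0 - y), 1.0)
        y = y + delta * v * (1.0 - y)
        t += delta
    return x_theta, y

x_theta, y = discretized_greedy(theta=0.2, alpha=0.5)
print("F(x^(theta)) =", round(F(x_theta), 3), " F(y^(1)) =", round(F(y), 3))
```

On this instance the best integral solution under the constraint is $f(\{0,2\}) = 3.6$, and the fractional $y^{(1)}$ recovers a constant fraction of it; the full Algorithm 3 would repeat this for every discrete $\theta$, also run DoubleGreedy on the capped point $p$, and return the best of all candidates. Note that the multiplicative update $y \leftarrow y + \delta\, v \circ (1 - y)$ is exactly what keeps $\|y^{(t)}\|_\infty$ bounded away from $1$, the property the analysis above uses via Lemma 10.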