Optimal approximation for submodular and supermodular optimization with bounded curvature
Maxim Sviridenko∗   Jan Vondrák†   Justin Ward‡

December 15, 2014
Abstract
We design new approximation algorithms for the problems of optimizing submodular and supermodular functions subject to a single matroid constraint. Specifically, we consider the case in which we wish to maximize a nondecreasing submodular function or minimize a nonincreasing supermodular function in the setting of bounded total curvature c. In the case of submodular maximization with curvature c, we obtain a (1 − c/e)-approximation — the first improvement over the greedy (1 − e^{−c})/c-approximation of Conforti and Cornuéjols from 1984, which holds for a cardinality constraint, as well as over recent approaches that hold for an arbitrary matroid constraint.

Our approach is based on modifications of the continuous greedy algorithm and non-oblivious local search, and allows us to approximately maximize the sum of a nonnegative, nondecreasing submodular function and a (possibly negative) linear function. We show how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature. We prove that the approximation results we obtain are the best possible in the value oracle model, even in the case of a cardinality constraint.

We define an extension of the notion of curvature to general monotone set functions and show a (1 − c)-approximation for maximization and a 1/(1 − c)-approximation for minimization in this setting. Finally, we give two concrete applications of our results in the settings of maximum entropy sampling and the column-subset selection problem.

1 Introduction

The problem of maximizing a submodular function subject to various constraints is a meta-problem that appears in various settings, from combinatorial auctions [30, 12, 35] and viral marketing in social networks [23] to optimal sensor placement in machine learning [26, 27, 28, 25]. A classic result by Nemhauser, Wolsey and Fisher [33] is that the greedy algorithm provides a (1 − 1/e)-approximation for maximizing a nondecreasing submodular function subject to a cardinality constraint. The factor of 1 − 1/e cannot be improved, under the assumption that the algorithm queries the objective function a polynomial number of times [32].

The greedy algorithm has been applied in numerous settings in practice. Although it is useful to know that it never performs worse than 1 − 1/e compared to the optimum, in practice its performance is often even better than this, in fact very close to the optimum. To get a quantitative handle on this phenomenon, various assumptions can be made about the input. One such assumption is the notion of curvature, introduced by Conforti and Cornuéjols [9]: a function f : 2^X → R_+ has curvature c ∈ [0, 1] if the marginal value f(S + j) − f(S) does not change by a factor larger than 1 − c when varying S. A function with c = 0 is linear, so the parameter measures in some sense how far f is from linear. It was shown in [9] that the greedy algorithm for nondecreasing submodular functions provides a (1 − e^{−c})/c-approximation, which tends to 1 as c → 0.

∗ Yahoo! Labs, New York, NY, USA. [email protected]
† IBM Almaden Research Center, San Jose, CA, USA. [email protected]
‡ Department of Computer Science, University of Warwick, Coventry, United Kingdom.
Work supported by EPSRC grant EP/J021814/1.

[Figure 1: Comparison of Approximation Ratios for Submodular Maximization — the ratio f(S)/f(O) as a function of the curvature c, for the previous bound [37] and for this paper.]

Recently, various applications have motivated the study of submodular optimization under various more general constraints. In particular, the (1 − 1/e)-approximation under a cardinality constraint has been generalized to any matroid constraint in [5]. This captures various applications such as welfare maximization in combinatorial auctions [35], generalized assignment problems [4] and variants of sensor placement [28]. Assuming curvature c, [37] generalized the (1 − e^{−c})/c-approximation of [9] to any matroid constraint, and hypothesized that this is the optimal approximation factor. It was proved in [37] that this factor is indeed optimal for instances of curvature c with respect to the optimum (a technical variation of the definition, which depends on how values change when measured on top of the optimal solution). In the following, we use total curvature to refer to the original definition of [9], to distinguish it from curvature w.r.t. the optimum [37].

1.1 Our results

Our main result is that, given total curvature c ∈ [0, 1], the (1 − e^{−c})/c-approximation of Conforti and Cornuéjols for monotone submodular maximization subject to a cardinality constraint [9] is suboptimal and can be improved to a (1 − c/e − O(ε))-approximation. We prove that this guarantee holds for the maximization of a nondecreasing submodular function subject to any matroid constraint, thus improving the result of [37] as well. We give two techniques that achieve this result: a modification of the continuous greedy algorithm of [5], and a variant of the local search algorithm of [18].

Using the same techniques, we obtain an approximation factor of 1 + (c/(1 − c))·e^{−1} + (1/(1 − c))·O(ε) for minimizing a nonincreasing supermodular function subject to a matroid constraint. Our approximation guarantees are strictly better than existing algorithms for every value of c except c = 0 and c = 1. The relevant ratios are plotted in Figures 1 and 2. In the case of minimization, we have also plotted the inverse approximation ratio to aid in comparison.

We also derive complementary negative results, showing that no algorithm that evaluates f on only a polynomial number of sets can have approximation performance better than the algorithms we give. Thus, we resolve the question of optimal approximation as a function of curvature in both the submodular and supermodular case.

Further, we show that the assumption of bounded curvature alone is sufficient to achieve certain approximations, even without assuming submodularity or supermodularity. Specifically, there is a (simple) algorithm that achieves a (1 − c)-approximation for the maximization of any nondecreasing function of total curvature at most c, subject to a matroid constraint. (In contrast, we achieve a (1 − c/e − O(ε))-approximation with the additional assumption of submodularity.) Also, there is a 1/(1 − c)-approximation for the minimization of any nonincreasing function of total curvature at most c subject to a matroid constraint, compared with a (1 + (c/(1 − c))·e^{−1} + (1/(1 − c))·O(ε))-approximation for supermodular functions.

[Figure 2: Comparison of Approximation Ratios for Supermodular Minimization — two panels showing the approximation ratio f(S)/f(O) and the inverse approximation ratio f(O)/f(S) as functions of c, for the previous bound [20] and for this paper.]
1.2 Applications

We provide two applications of our results. In the first application, we are given a positive semidefinite matrix M. Let M[S, S] be the principal submatrix defined by the columns and rows indexed by the set S ⊆ {1, . . . , n}. In the maximum entropy sampling problem (or, more precisely, in a generalization of that problem) we would like to find a set S with |S| = k maximizing f(S) = ln det M[S, S]. It is well known that this set function is submodular [22] (many earlier and alternative proofs of that fact are known). In addition, we know that solving this problem exactly is NP-hard [24] (see also Lee [29] for a survey on known optimization techniques for the problem).

We consider the maximum entropy sampling problem when the matrix M has eigenvalues 1 ≤ λ_1 ≤ · · · ≤ λ_n. Since the determinant of any matrix is just the product of its eigenvalues, the Cauchy interlacing theorem implies that the submodular function f(S) = ln det M[S, S] is nondecreasing. In addition, we can easily derive a bound on its curvature: c ≤ 1 − 1/λ_n (see the formal definition of curvature in Section 1.3). This immediately implies that our new algorithms for submodular maximization have an approximation guarantee of 1 − (1 − 1/λ_n)/e for the maximum entropy sampling problem.
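To make the objective concrete, the following minimal numpy sketch (our illustration, not code from the paper) evaluates f(S) = ln det M[S, S] and runs the plain greedy baseline on a hypothetical random instance shifted so that all eigenvalues are at least 1, as the curvature bound above requires. The function and variable names are ours.

```python
import numpy as np

def entropy_obj(M, S):
    """f(S) = ln det M[S, S]; by convention f(empty set) = 0."""
    if not S:
        return 0.0
    idx = np.ix_(sorted(S), sorted(S))
    sign, logdet = np.linalg.slogdet(M[idx])
    return logdet  # M is PSD with eigenvalues >= 1, so sign is +1

def greedy_entropy(M, k):
    """Plain greedy baseline for max{f(S) : |S| = k}."""
    n, S = M.shape[0], set()
    for _ in range(k):
        S.add(max((j for j in range(n) if j not in S),
                  key=lambda j: entropy_obj(M, S | {j})))
    return S

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
M = B @ B.T + np.eye(8)          # PSD, all eigenvalues >= 1
print(sorted(greedy_entropy(M, 3)))
```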
Our second application is the column-subset selection problem arising in various machine learning settings. The goal is, given a matrix A ∈ R^{m×n}, to select a subset of k columns such that the matrix is well approximated (say, in squared Frobenius norm) by a matrix whose columns are in the span of the selected k columns. This is a variant of feature selection, since the rows might correspond to examples and the columns to features. The problem is to select a subset of k features such that the remaining features can be approximated by linear combinations of the selected features. This is related, but not identical, to principal component analysis (PCA), where we want to select a subspace of rank k (not necessarily generated by a subset of columns) such that the matrix is well approximated by its projection to this subspace. While PCA can be solved optimally by spectral methods, the column-subset selection problem is less well understood. Here we take the point of view of approximation algorithms: given a matrix A, we want to find a subset of k columns such that the squared Frobenius distance of A from its projection on the span of these k columns is minimized. To the best of our knowledge, this problem is not known to be NP-hard; on the other hand, the approximation factors of known algorithms are quite large. The best known algorithm for the problem as stated is a (k + 1)-approximation algorithm given by Deshpande and Rademacher [10]. For the related problem in which we may select any set of r ≥ k columns that form a rank-k submatrix of A, Deshpande and Vempala [11] showed that there exist matrices for which Ω(k/ε) columns must be chosen to obtain a (1 + ε)-approximation. Boutsidis et al. [2] give a matching algorithm, which obtains a set of O(k/ε) columns that give a (1 + ε)-approximation. We refer the reader to [2] for further background on the history of this and related problems.

Here, we return to the setting in which only k columns of A may be chosen and show that this is a special case of nonincreasing function minimization with bounded curvature. We show a relationship between curvature and the condition number κ of A, which allows us to obtain an approximation factor of κ²(A). We define the problem and the related notions more precisely in Section 9.

1.3 Related work

The problem of maximizing a nondecreasing submodular function subject to a cardinality constraint (i.e., a uniform matroid) was studied by Nemhauser, Wolsey, and Fisher [33], who showed that the standard greedy algorithm gives a (1 − e^{−1})-approximation. However, in [19], they show that the greedy algorithm gives only a 1/2-approximation for an arbitrary matroid constraint. Calinescu, Chekuri, Pál, and Vondrák [5] later obtained a (1 − e^{−1})-approximation for an arbitrary matroid constraint. In their approach, the continuous greedy algorithm first maximizes approximately a multilinear extension of the given submodular function and then applies a pipage rounding technique inspired by [1] to obtain an integral solution. The running time of this algorithm is dominated by the pipage rounding phase. Chekuri, Vondrák, and Zenklusen [6] later showed that pipage rounding can be replaced by an alternative rounding procedure called swap rounding based on the exchange properties of the underlying constraint. In later work [8, 7], they developed the notion of a contention resolution scheme, which gives a unified treatment for a variety of constraints, and allows rounding approaches for the continuous greedy algorithm to be composed in order to solve submodular maximization problems under combinations of constraints. Later, Filmus and Ward [17] obtained a (1 − e^{−1})-approximation for submodular maximization in an arbitrary matroid by using a non-oblivious local search algorithm that does not require rounding.

On the negative side, Nemhauser and Wolsey [32] showed that it is impossible to improve upon the bound of (1 − e^{−1}) in the value oracle model, even under a single cardinality constraint. In this model, f is given as a value oracle and an algorithm can evaluate f on only a polynomial number of sets. Feige [16] showed that (1 − e^{−1}) is the best possible approximation even when the function is given explicitly, unless P = NP. In later work, Vondrák [36] introduced the notion of the symmetry gap of a submodular function f, which unifies many inapproximability results in the value oracle model, and proved new inapproximability results for some specific constrained settings. Later, Dobzinski and Vondrák [14] showed how these inapproximability bounds may be converted to matching complexity-theoretic bounds, which hold when f is given explicitly, under the assumption that RP ≠ NP.

Conforti and Cornuéjols [9] defined the total curvature c of a nondecreasing submodular function f as

c = max_{j∈X} (f_∅(j) − f_{X−j}(j)) / f_∅(j) = 1 − min_{j∈X} f_{X−j}(j) / f_∅(j).  (1)

They showed that the greedy algorithm has an approximation ratio of 1/(1 + c) for the problem of maximizing a nondecreasing submodular function with curvature at most c subject to a single matroid constraint. In the special case of a uniform matroid, they were able to show that the greedy algorithm is a (1 − e^{−c})/c-approximation algorithm. Later, Vondrák [37] considered the continuous greedy algorithm in the setting of bounded curvature. He introduced the notion of curvature with respect to the optimum, which is a weaker notion than total curvature, and showed that the continuous greedy algorithm is a (1 − e^{−c})/c-approximation for maximizing a nondecreasing submodular function f subject to an arbitrary matroid constraint whenever f has curvature at most c with respect to the optimum.
He also showed that it is impossible to obtain a better than (1 − e^{−c})/c-approximation in this setting when evaluating f on only a polynomial number of sets. Unfortunately, unlike total curvature, it is in general not possible to compute the curvature of a function with respect to the optimum, as it requires knowledge of an optimal solution.

We shall also consider the problem of minimizing nonincreasing supermodular functions f : 2^X → R_{≥0}. By analogy with total curvature, Il'ev [20] defines the steepness s of a nonincreasing supermodular function. His definition, which is stated in terms of the marginal decreases of the function, is equivalent to (1) when reformulated in terms of marginal gains. He showed that, in contrast to submodular maximization, the simple greedy heuristic does not give a constant factor approximation algorithm in the general case. However, when the supermodular function f has total curvature at most c, he shows that the reverse greedy algorithm is an (e^p − 1)/p-approximation, where p = c/(1 − c).

2 Preliminaries

We now fix some of our notation and give two lemmas pertaining to functions with bounded curvature.
A set function f : 2^X → R_{≥0} is submodular if f(A) + f(B) ≥ f(A ∩ B) + f(A ∪ B) for all A, B ⊆ X. Submodularity can equivalently be characterized in terms of marginal values, defined by f_A(i) = f(A + i) − f(A) for i ∈ X and A ⊆ X − i. Then, f is submodular if and only if f_A(i) ≥ f_B(i) for all A ⊆ B ⊆ X and i ∉ B. That is, submodular functions are characterized by decreasing marginal values. Intuitively, f is supermodular if and only if −f is submodular. That is, f is supermodular if and only if f_A(i) ≤ f_B(i) for all A ⊆ B ⊆ X and i ∉ B. Finally, we say that a function is non-decreasing, or monotone increasing, if f_A(i) ≥ 0 for all i ∈ X and A ⊆ X − i, and non-increasing, or monotone decreasing, if f_A(i) ≤ 0 for all i ∈ X and A ⊆ X − i.

We now present the definitions and notation that we shall require when dealing with matroids. We refer the reader to [34] for a detailed introduction to basic matroid theory. Let M = (X, I) be a matroid defined on ground set X with independent sets given by I. We denote by B(M) the set of all bases (inclusion-wise maximal sets in I) of M. We denote by P(M) the matroid polytope for M, given by

P(M) = conv{1_I : I ∈ I} = {x ≥ 0 : Σ_{j∈S} x_j ≤ r_M(S) for all S ⊆ X},

where r_M denotes the rank function associated with M. The second equality above is due to Edmonds [15]. Similarly, we denote by B(M) the base polytope associated with M:

B(M) = conv{1_I : I ∈ B(M)} = {x ∈ P(M) : Σ_{j∈X} x_j = r_M(X)}.

For a matroid M = (X, I), we denote by M* the dual system (X, I*) whose independent sets I* are defined as those subsets A ⊆ X that satisfy A ∩ B = ∅ for some B ∈ B(M) (i.e., those subsets that are disjoint from some base of M). A standard result of matroid theory shows that M* is a matroid whenever M is a matroid and, moreover, that B(M*) is precisely the set {X \ B : B ∈ B(M)} of complements of bases of M.

Finally, given a set of elements D ⊆ X, we denote by M|D the matroid (D, I′) obtained by restricting M to D. The independent sets I′ of M|D are simply those independent sets of M that contain only elements from D. That is, I′ = {A ∈ I : A ∩ D = A}.

We now give two general lemmas pertaining to functions of bounded curvature that will be useful in our analysis. The proofs, which follow directly from (1), are given in the Appendix.

Lemma 2.1. If f : 2^X → R_{≥0} is a monotone increasing submodular function with total curvature at most c, then Σ_{j∈A} f_{X−j}(j) ≥ (1 − c) f(A) for all A ⊆ X.

Lemma 2.2. If f : 2^X → R_{≥0} is a monotone decreasing supermodular function with total curvature at most c, then (1 − c) Σ_{j∈A} f_∅(j) ≥ −f(X \ A) for all A ⊆ X.
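The following small sketch (our illustration, using a toy coverage function as a stand-in for a general value oracle) computes marginal values and the total curvature of equation (1), and checks the inequality of Lemma 2.1 by brute force. All names are ours.

```python
from itertools import combinations

def marginal(f, A, j):
    """f_A(j) = f(A + j) - f(A)."""
    return f(A | {j}) - f(A)

def total_curvature(f, X):
    """Equation (1): c = 1 - min_{j in X} f_{X-j}(j) / f_empty(j)."""
    return 1.0 - min(marginal(f, X - {j}, j) / marginal(f, set(), j)
                     for j in X)

# Toy monotone submodular function: coverage of a small universe.
sets = {0: {1, 2}, 1: {2, 3, 6}, 2: {3, 4, 5}}
f = lambda S: len(set().union(*[sets[i] for i in S]))
X = set(sets)
c = total_curvature(f, X)   # equals 2/3 for this instance

# Lemma 2.1: sum_{j in A} f_{X-j}(j) >= (1 - c) f(A) for every A.
for r in range(len(X) + 1):
    for A in map(set, combinations(X, r)):
        assert sum(marginal(f, X - {j}, j) for j in A) >= (1 - c) * f(A) - 1e-9
```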
3 Maximizing a submodular plus a linear function

Our new results for both submodular maximization and supermodular minimization with bounded curvature make use of an algorithm for the following meta-problem: we are given a monotone increasing, nonnegative, submodular function g : 2^X → R_{≥0}, a linear function ℓ : 2^X → R, and a matroid M = (X, I), and must find a base S ∈ B(M) maximizing g(S) + ℓ(S). Note that we do not require ℓ to be nonnegative. Indeed, in the case of supermodular minimization (discussed in Section 6.2), our approach shall require that ℓ be a nonpositive, monotone decreasing function. For j ∈ X, we shall write ℓ(j) and g(j) as shorthand for ℓ({j}) and g({j}). We note that because ℓ is linear, we have ℓ(A) = Σ_{j∈A} ℓ(j) for all A ⊆ X.

Let v̂_g = max_{j∈X} g(j), v̂_ℓ = max_{j∈X} |ℓ(j)|, and v̂ = max(v̂_g, v̂_ℓ). Then, because g is submodular and ℓ is linear, we have both g(A) ≤ n·v̂ and |ℓ(A)| ≤ n·v̂ for every set A ⊆ X. Moreover, given ℓ and g, we can easily compute v̂ in time O(n). Our main technical result is the following, which gives a joint approximation for g and ℓ.

Theorem 3.1.
For every ε > 0, there is an algorithm that, given a monotone increasing submodular function g : 2^X → R_{≥0}, a linear function ℓ : 2^X → R, and a matroid M, produces a set S ∈ B(M) in polynomial time satisfying

g(S) + ℓ(S) ≥ (1 − e^{−1}) g(O) + ℓ(O) − O(ε)·v̂

for every O ∈ B(M), with high probability.

In the next two sections, we give two different algorithms satisfying the conditions of Theorem 3.1.
4 The modified continuous greedy algorithm

The first algorithm we consider is a modification of the continuous greedy algorithm of [5]. We first sketch the algorithm conceptually in the continuous setting, ignoring certain technicalities.
Consider x ∈ [0, 1]^X. For any function f : 2^X → R, the multilinear extension of f is a function F : [0, 1]^X → R given by F(x) = E[f(R_x)], where R_x is a random subset of X in which each element e appears independently with probability x_e. We let G denote the multilinear extension of the given monotone increasing submodular function g, and L denote the multilinear extension of the given linear function ℓ. Note that, due to the linearity of expectation, L(x) = E[ℓ(R_x)] = Σ_{j∈X} x_j ℓ(j). That is, the multilinear extension L corresponds to the natural, linear extension of ℓ. Let P(M) and B(M) be the matroid polytope and matroid base polytope associated with M, and let O be the arbitrary base in B(M) to which we shall compare our solution in Theorem 3.1.

Our algorithm is shown in Figure 3. In contrast to the standard continuous greedy algorithm, in the third step we require a direction that is larger than both the value of ℓ(O) and the residual value γ − G(x(t)). Applying the standard continuous greedy algorithm to (g + ℓ) gives a direction that is larger than the sum of these two values, but this is insufficient for our purposes.

Modified Continuous Greedy
• Guess the values of λ = ℓ(O) and γ = g(O).
• Initialize x = 0.
• For time running from t = 0 to t = 1, update x according to dx/dt = v(t), where v(t) ∈ P(M) is a vector satisfying both:
  L(v(t)) ≥ λ and v(t) · ∇G(x(t)) ≥ γ − G(x(t)).
  Such a vector exists because O is one possible candidate (as in the analysis of [5]).
• Apply pipage rounding to the point x(1) and return the resulting solution.

Figure 3: The modified continuous greedy algorithm

Our analysis proceeds separately for L(x) and G(x). First, because L is linear, we obtain

dL/dt (x(t)) = L(dx/dt) ≥ λ

for every time t, and hence L(x(1)) ≥ λ = ℓ(O). For the submodular component, we obtain

dG/dt (x(t)) = dx/dt · ∇G(x(t)) ≥ γ − G(x(t)),

similar to the analysis in [5]. This leads to a differential equation that gives G(x(t)) ≥ (1 − e^{−t})γ.

The final pipage rounding phase is oblivious to the value of the objective function and so is not affected by the potential negativity of ℓ. We can view G + L as the multilinear extension of g + ℓ, and so, as in the standard continuous greedy analysis, pipage rounding produces an integral solution S satisfying g(S) + ℓ(S) ≥ G(x(1)) + L(x(1)) ≥ (1 − e^{−1}) g(O) + ℓ(O).

Now we discuss the technical details of how the continuous greedy algorithm can be implemented efficiently. There are three main issues that we ignored in our previous discussion: (1) how do we "guess" the values of ℓ(O) and g(O); (2) how do we find a suitable direction v(t) in each step of the algorithm; and (3) how do we discretize time efficiently? Let us now address them one by one.
Guessing the optimal values: In fact, it is enough to guess the value of ℓ(O); we will optimize over v(t) · ∇G later. Recall that |ℓ(O)| ≤ n·v̂. We discretize the interval [−n·v̂, n·v̂] with O(ε^{−1}) points of the form iε·v̂ for −ε^{−1} ≤ i ≤ ε^{−1}, filling the interval [−v̂, v̂], together with O(ε^{−1} n log n) points of the form (1 + ε/n)^i·v̂ and −(1 + ε/n)^i·v̂ for 0 ≤ i ≤ log_{1+ε/n} n, filling the intervals [v̂, n·v̂] and [−n·v̂, −v̂], respectively. We then run the following algorithm using each point as a guess for λ, and return the best solution found. Then if |ℓ(O)| < v̂, we must have

ℓ(O) ≥ λ ≥ ℓ(O) − ε·v̂

for some iteration (using one of the guesses in [−v̂, v̂]). Similarly, if |ℓ(O)| ≥ v̂, then for some iteration we have

ℓ(O) ≥ λ ≥ ℓ(O) − (ε/n)|ℓ(O)| ≥ ℓ(O) − ε·v̂

(using one of the guesses in [v̂, n·v̂] or [−n·v̂, −v̂]). For the remainder of our analysis we consider this particular iteration. (In the applications we consider, ℓ is either nonnegative or nonpositive, and so we need only consider half of the given interval. For simplicity, here we give a general approach that does not depend on the sign of ℓ. In general, we have favored, whenever possible, simplicity in the analysis over obtaining the best runtime bounds.)

Finding a suitable direction: Given our guess of λ and a current solution x(t), our goal is to find a direction v(t) ∈ P(M) such that L(v(t)) ≥ λ and v(t) · ∇G(x(t)) ≥ γ − G(x(t)). As in [5], we must estimate ∇G(x(t)) by random sampling. Then, given an estimate ∇̃G, we solve the linear program:

max{v · ∇̃G : v ∈ B(M), L(v) ≥ λ}.  (2)

We can do this by the ellipsoid method, for example (or more efficiently using other methods). Following the analysis of [5], we can obtain, in polynomial time, an estimate satisfying

|v(t) · ∇̃G(x(t)) − v(t) · ∇G(x(t))| ≤ O(ε)·v̂

with high probability. Since L(O) = ℓ(O) ≥ λ, the base O is a feasible solution of (2). Because G(x(t)) is concave along any nonnegative direction, we then have

v · ∇G(x(t)) ≥ g(O) − G(x(t)) − O(ε)·v̂,  (3)

just as in the analysis of [5].
Discretizing the algorithm: We discretize time into steps of length δ = ε/n²; let us assume for simplicity that 1/δ is an integer. In each step, we find a direction v(t) as described above and we update

x(t + δ) = x(t) + δ·v(t).

Clearly, after 1/δ steps we obtain a solution x(1) which is a convex combination of points v(t) ∈ P(M), and therefore a feasible solution. In each step we had L(v(t)) ≥ λ, and so

L(x(1)) = Σ_{i=1}^{1/δ} δ·L(v(i·δ)) ≥ λ ≥ ℓ(O) − ε·v̂.

The analysis of the submodular component follows along the lines of [5]. In one time step, we gain

G(x(t + δ)) − G(x(t)) ≥ (1 − nδ)·(x(t + δ) − x(t))·∇G(x(t))
 ≥ (1 − nδ)·δ·[g(O) − G(x(t)) − O(ε)·v̂]
 ≥ δ·[g(O) − G(x(t)) − O(ε)·v̂ − (ε/n)·g(O)]
 = δ·[g(O) − G(x(t)) − O(ε)·v̂],

using the bound (3) and the fact that (ε/n)·g(O) ≤ ε·v̂. By induction (as in [5]), we obtain

G(x(t)) ≥ (1 − e^{−t}) g(O) − O(ε)·v̂,

and so

G(x(1)) + L(x(1)) ≥ (1 − e^{−1}) g(O) + ℓ(O) − O(ε)·v̂,

as required.
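The following sketch (ours, under stated simplifications) puts the pieces of this section together for the special case of a uniform matroid of rank k, so that B(M) = {v ∈ [0, 1]^n : Σ v_j = k} and the direction-finding step (2) becomes an explicit LP. We use scipy's linprog as a stand-in for the ellipsoid method, and a sampling estimator for ∇G; all names, the sample counts, and the grid parameters are our own choices, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def grad_G(g, x, rng, samples=50):
    """Sampling estimate of the multilinear gradient:
    dG/dx_j = E[g(R + j) - g(R - j)], where R ~ x."""
    n, grad = len(x), np.zeros(len(x))
    for _ in range(samples):
        R = {j for j in range(n) if rng.random() < x[j]}
        for j in range(n):
            grad[j] += g(R | {j}) - g(R - {j})
    return grad / samples

def lam_guesses(vhat, n, eps=0.1):
    """Candidate guesses for lam = ell(O), mirroring the grid above."""
    m = int(np.ceil(np.log(n) / np.log1p(eps / n)))
    small = [i * eps * vhat for i in range(-int(1 / eps), int(1 / eps) + 1)]
    big = [s * (1 + eps / n) ** i * vhat for s in (1, -1) for i in range(m + 1)]
    return small + big

def modified_continuous_greedy(g, ell, n, k, lam, steps=100, seed=0):
    """One run for a fixed guess lam, over the uniform matroid of rank k.
    Returns the fractional point x(1); pipage or swap rounding would then
    convert it to a base."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for _ in range(steps):
        grad = grad_G(g, x, rng)
        # Direction v in B(M) with L(v) >= lam, maximizing v . grad:
        res = linprog(-grad, A_ub=[-ell], b_ub=[-lam],
                      A_eq=[np.ones(n)], b_eq=[k], bounds=(0, 1))
        if res.success:          # guesses that make (2) infeasible are skipped
            x += res.x / steps
    return x
```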
5 The non-oblivious local search algorithm

We now give another proof of Theorem 3.1, using a modification of the local search algorithm of [18]. In contrast to the modified continuous greedy algorithm, our modified local search algorithm does not need to guess the optimal value of ℓ(O), and also does not need to solve the associated continuous optimization problem given in (2). However, here the convergence time of the algorithm becomes an issue that must be dealt with.

Non-Oblivious Local Search
• Let δ = (ε/n²)·v̂.
• S ← an arbitrary base S ∈ B(M).
• While there exist a ∈ S and b ∈ X \ S such that S − a + b ∈ B(M) and ψ(S − a + b) ≥ ψ(S) + δ, set S ← S − a + b.
• Return S.

Figure 4: The non-oblivious local search algorithm

We begin by presenting a few necessary lemmas and definitions from the analysis of [18]. We shall require the following general property of matroid bases, first proved by Brualdi [3], which can also be found in, e.g., [34, Corollary 39.12a].
Lemma 5.1.
Let M be a matroid and let A and B be two bases in B(M). Then, there exists a bijection π : A → B such that A − x + π(x) ∈ B(M) for all x ∈ A.

We can restate Lemma 5.1 as follows: let A = {a_1, . . . , a_r} and B be bases of a matroid M of rank r. Then we can index the elements b_i of B so that b_i = π(a_i), and then we have that A − a_i + b_i ∈ B(M) for all 1 ≤ i ≤ r. The resulting collection of sets {A − a_i + b_i}_{i∈[r]} will define the set of feasible swaps between the bases A and B that we consider when analyzing our local search algorithm.

The local search algorithm of [18] maximizes a monotone submodular function g using a simple local search routine that evaluates the quality of the current solution using an auxiliary potential h, derived from g as follows:

h(A) = Σ_{B⊆A} g(B) · ∫_0^1 (e^p/(e − 1)) · p^{|B|−1} (1 − p)^{|A|−|B|} dp.

We defer a discussion of issues related to convergence and computing h until the next subsection, and first sketch the main idea of our modified algorithm. We shall make use of the following fact, proved in [18, Lemma 4.4, pp. 524–5]: for all A,

g(A) ≤ h(A) ≤ C · g(A) ln n,

for some constant C. In order to jointly maximize g(S) + ℓ(S), we employ a modified local search algorithm that is guided by the potential ψ, given by:

ψ(A) = (1 − e^{−1}) h(A) + ℓ(A).

Our final algorithm is shown in Figure 4.
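For intuition, the following brute-force sketch (ours) evaluates the potential h on small ground sets by expanding the sum over subsets and approximating the integral with a midpoint rule. We assume here that g is normalized so that g(∅) = 0 (so the B = ∅ term vanishes); the grid size and all names are our own choices.

```python
import math
from itertools import combinations

def weight(r, a, grid=2000):
    """integral_0^1 e^p/(e-1) * p^(r-1) * (1-p)^(a-r) dp (midpoint rule)."""
    return sum(math.exp(p) / (math.e - 1) * p ** (r - 1) * (1 - p) ** (a - r)
               for p in ((i + 0.5) / grid for i in range(grid))) / grid

def h_exact(g, A):
    """Brute-force h(A) for small |A|; assumes g(empty set) = 0."""
    A, a = set(A), len(A)
    return sum(weight(r, a) * sum(g(set(B)) for B in combinations(A, r))
               for r in range(1, a + 1))

def psi(g, ell, A):
    """The non-oblivious potential guiding the local search of Figure 4."""
    return (1 - math.exp(-1)) * h_exact(g, A) + sum(ell[j] for j in A)
```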
The following lemma shows that if it is impossible to significantly improve ψ(S) by exchanging a single element, then both g(S) and ℓ(S) must have relatively high values.

Lemma 5.2. Let A = {a_1, . . . , a_r} and B = {b_1, . . . , b_r} be any two bases of a matroid M, and suppose that the elements of B are indexed according to Lemma 5.1 so that A − a_i + b_i ∈ B(M) for all 1 ≤ i ≤ r. Then,

g(A) + ℓ(A) ≥ (1 − e^{−1}) g(B) + ℓ(B) + Σ_{i=1}^r [ψ(A) − ψ(A − a_i + b_i)].

Proof. Filmus and Ward [18, Theorem 5.1, p. 526] show that for any submodular function g, the associated function h satisfies

(e/(e − 1)) g(A) ≥ g(B) + Σ_{i=1}^r [h(A) − h(A − a_i + b_i)].  (4)

We note that since ℓ is linear, we have:

ℓ(A) = ℓ(B) + Σ_{i=1}^r [ℓ(a_i) − ℓ(b_i)] = ℓ(B) + Σ_{i=1}^r [ℓ(A) − ℓ(A − a_i + b_i)].  (5)

Adding (1 − e^{−1}) times (4) to (5) then completes the proof.

Suppose that S ∈ B(M) is locally optimal for ψ under single-element exchanges, and let O be an arbitrary base of M. Then, local optimality of S implies that ψ(S) − ψ(S − s_i + o_i) ≥ 0 for all i ∈ [r], where the elements s_i of S and o_i of O have been indexed according to Lemma 5.1. Then, Lemma 5.2 gives g(S) + ℓ(S) ≥ (1 − e^{−1}) g(O) + ℓ(O), as required by Theorem 3.1.

5.1 Obtaining a polynomial-time algorithm

We now show how to obtain a polynomial-time algorithm from Lemma 5.2. We face two technical difficulties: (1) how do we compute ψ efficiently in polynomial time; and (2) how do we ensure that the search for improvements converges to a local optimum in polynomial time? As in the case of the continuous greedy algorithm, we can address these issues by using standard techniques, but we must be careful since ℓ may take negative values. As in that case, we have not attempted to obtain the most efficient possible running time analysis here, focusing instead on simplifying the arguments.

Estimating ψ efficiently: Although the definition of h requires evaluating g on a potentially exponential number of sets, Filmus and Ward show that h can be estimated efficiently using a sampling procedure:

Lemma 5.3 ([18, Lemma 5.1, p. 525]). Let h̃(A) be an estimate of h(A) computed from N = Ω(ε_0^{−2} ln n ln M) samples of g. Then, Pr[|h̃(A) − h(A)| ≥ ε_0 · h(A)] = O(M^{−1}).

We let ψ̃(A) = (1 − e^{−1}) h̃(A) + ℓ(A) be an estimate of ψ. Set δ = (ε/n²)·v̂. We shall ensure that ψ̃(A) differs from ψ(A) by at most

δ = (ε/n²)·v̂ ≥ (ε/n³)·g(A) = (ε/(C·n³ ln n)) · C·g(A) ln n ≥ (ε/(C·n³ ln n)) · h(A).

Applying Lemma 5.3 with ε_0 = ε/(C·n³ ln n), we can then ensure that

Pr[|ψ̃(A) − ψ(A)| ≥ δ] = O(M^{−1}),

by using Ω(ε^{−2} n^6 ln³ n ln M) samples for each computation of ψ̃. By the union bound, we can ensure that |ψ̃(A) − ψ(A)| ≤ δ holds with high probability for all sets A considered by the algorithm, by setting M appropriately. In particular, if we evaluate ψ̃ on any polynomial number of distinct sets A, it suffices to take M polynomially large, which requires only a polynomial number of samples for each evaluation.

Bounding the convergence time of the algorithm: We initialize our search with an arbitrary base S ∈ B(M), and at each step of the algorithm, we restrict our search to those improvements that yield a significant increase in the value of ψ. Specifically, we require that each improvement increases the current value of ψ by at least an additive term δ = (ε/n²)·v̂. We now bound the total number of improvements made by the algorithm. We suppose that all values ψ̃(A) computed by the algorithm satisfy

ψ(A) − δ ≤ ψ̃(A) ≤ ψ(A) + δ.

From the previous discussion, we can ensure that this is indeed the case with high probability. Let O_ψ = argmax_{A∈B(M)} ψ(A).
Then, the total number of improvements applied by the algorithm is at most:

(1/δ)(ψ̃(O_ψ) − ψ̃(S)) ≤ (1/δ)(ψ(O_ψ) − ψ(S) + 2δ)
 = (1/δ)((1 − e^{−1})·(h(O_ψ) − h(S)) + ℓ(O_ψ) − ℓ(S) + 2δ)
 ≤ (1/δ)((1 − e^{−1})·h(O_ψ) + |ℓ(O_ψ)| + |ℓ(S)| + 2δ)
 ≤ (1/δ)((1 − e^{−1})·C·g(O_ψ) ln n + |ℓ(O_ψ)| + |ℓ(S)| + 2δ)
 ≤ (1/δ)((1 − e^{−1})·C·v̂·n ln n + n·v̂ + n·v̂ + 2δ)
 = O(ε^{−1} n³ ln n).

Each improvement step requires O(n²) evaluations of ψ̃. From the discussion in the previous section, setting M sufficiently high ensures that all of the estimates made during the first O(ε^{−1} n³ ln n) iterations satisfy our assumptions with high probability, and so the algorithm will converge in polynomial time. In order to obtain a deterministic bound on the running time of the algorithm, we simply terminate our search if it has not converged after O(ε^{−1} n³ ln n) steps and return the current solution. Then, when the resulting algorithm terminates, with high probability we indeed have ψ̃(S) − ψ̃(S − s_i + o_i) ≤ δ for every i ∈ [r], and so

Σ_{i=1}^r [ψ(S) − ψ(S − s_i + o_i)] ≤ r(δ + 2δ) ≤ ε·v̂.

From Lemma 5.2, the set S produced by the algorithm then satisfies

g(S) + ℓ(S) ≥ (1 − e^{−1}) g(O) + ℓ(O) − O(ε)·v̂,

as required by Theorem 3.1.

6 Submodular maximization and supermodular minimization with bounded curvature

We now return to the problems of submodular maximization and supermodular minimization with bounded curvature. We reduce both problems to the general setting introduced in Section 3. In both cases, we suppose that we are seeking to optimize a function f : 2^X → R_{≥0} over a given matroid M = (X, I), and we let O denote any optimal base of M (i.e., a base of M that either maximizes or minimizes f, according to the setting).

6.1 Submodular maximization

Suppose that f is a monotone increasing submodular function with curvature at most c ∈ [0, 1] and we seek to maximize f over a matroid M.

Theorem 6.1. For every ε > 0 and c ∈ [0, 1], there is an algorithm that, given a monotone increasing submodular function f : 2^X → R_{≥0} of curvature c and a matroid M = (X, I), produces a set S ∈ I in polynomial time satisfying f(S) ≥ (1 − c/e − O(ε)) f(O) for every O ∈ I, with high probability.

Proof. Define the functions:

ℓ(A) = Σ_{j∈A} f_{X−j}(j),
g(A) = f(A) − ℓ(A).

Then ℓ is linear and g is submodular, monotone increasing, and nonnegative (as verified in Lemma A.1 of the appendix). Moreover, because f has curvature at most c, Lemma 2.1 implies that for any set A ⊆ X,

ℓ(A) = Σ_{j∈A} f_{X−j}(j) ≥ (1 − c) f(A).

In order to apply Theorem 3.1 we must bound the term v̂. By optimality of O and nonnegativity of ℓ and g, we have v̂ ≤ g(O) + ℓ(O) ≤ f(O). From Theorem 3.1, we can find a solution S satisfying:

f(S) = g(S) + ℓ(S) ≥ (1 − e^{−1}) g(O) + ℓ(O) − O(ε)·f(O)
 = (1 − e^{−1}) f(O) + e^{−1} ℓ(O) − O(ε)·f(O)
 ≥ (1 − e^{−1}) f(O) + (1 − c) e^{−1} f(O) − O(ε)·f(O)
 = (1 − c·e^{−1} − O(ε)) f(O).
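The decomposition used in this proof is easy to implement. The following sketch (ours, reusing the toy coverage function from Section 2 as a stand-in for f) builds ℓ and g from oracle access to f and checks the basic properties numerically.

```python
def curvature_split(f, X):
    """The decomposition from the proof of Theorem 6.1:
    ell(A) = sum_{j in A} f_{X-j}(j) (linear), and g = f - ell."""
    tail = {j: f(X) - f(X - {j}) for j in X}    # f_{X-j}(j)
    ell = lambda A: sum(tail[j] for j in A)
    g = lambda A: f(A) - ell(A)
    return g, ell

sets = {0: {1, 2}, 1: {2, 3, 6}, 2: {3, 4, 5}}
f = lambda S: len(set().union(*[sets[i] for i in S]))
X = set(sets)
g, ell = curvature_split(f, X)
A = {0, 2}
assert abs(g(A) + ell(A) - f(A)) < 1e-9 and g(A) >= 0 and ell(A) >= 0
```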
6.2 Supermodular minimization

Suppose that f is a monotone decreasing supermodular function with curvature at most c ∈ [0, 1) and we seek to minimize f over a matroid M.

Theorem 6.2.
For every ε > 0 and c ∈ [0, 1), there is an algorithm that, given a monotone decreasing supermodular function f : 2^X → R_{≥0} of curvature c and a matroid M = (X, I), produces a set S ∈ I in polynomial time satisfying

f(S) ≤ (1 + (c/(1 − c))·e^{−1} + (1/(1 − c))·O(ε)) f(O)

for every O ∈ I, with high probability.

Proof. Define the linear and submodular functions:

ℓ(A) = Σ_{j∈A} f_∅(j),
g(A) = −ℓ(A) − f(X \ A).

Because f is monotone decreasing, we have f_∅(j) ≤ 0 and hence ℓ(A) ≤ 0 for all A ⊆ X. Thus, ℓ is a nonpositive, decreasing linear function. However, as we verify in Lemma A.2 of the appendix, g is submodular, monotone increasing, and nonnegative.

We shall consider the problem of maximizing g(S) + ℓ(S) = −f(X \ S) over the dual matroid M*, whose bases correspond to complements of bases of M. We compare our solution S for this problem to the base O* = X \ O of M*. Again, in order to apply Theorem 3.1, we must bound the term v̂. Here, because ℓ(A) is nonpositive, we cannot bound v̂ directly as in the previous section. Rather, we proceed by partial enumeration. Let ê = argmax_{j∈O*} max(g(j), |ℓ(j)|). We iterate through all possible guesses e ∈ X for ê, and for each such e consider v̂_e = max(g(e), |ℓ(e)|). We set X_e to be the set {j ∈ X : g(j) ≤ v̂_e and |ℓ(j)| ≤ v̂_e}, and consider the matroid M*_e = M*|X_e, obtained by restricting M* to the ground set X_e. For each e satisfying r_{M*_e}(X_e) = r_{M*}(X), we apply our algorithm to the problem max{g(A) + ℓ(A) : A ∈ B(M*_e)}, and return the best solution S obtained. Note that since r_{M*_e}(X_e) = r_{M*}(X), the set S is also a base of M*, and so X \ S is a base of M.

Consider the iteration in which we correctly guess e = ê. In the corresponding restricted instance we have g(j) ≤ v̂_e = v̂ and |ℓ(j)| ≤ v̂_e = v̂ for all j ∈ X_e. Additionally, O* ⊆ X_e, and so O* ∈ B(M*_e), as required by our analysis. Finally, from the definition of g and ℓ, we have f(O) = −ℓ(O*) − g(O*). Since ê ∈ O*, and ℓ is nonpositive while f is nonnegative,

v̂ ≤ g(O*) + |ℓ(O*)| = −ℓ(O*) − f(O) − ℓ(O*) ≤ −2ℓ(O*).

Therefore, by Theorem 3.1, the base S of M* returned by the algorithm satisfies:

g(S) + ℓ(S) ≥ (1 − e^{−1}) g(O*) + ℓ(O*) + O(ε)·ℓ(O*).

Finally, since f is supermodular with curvature at most c, Lemma 2.2 implies that for all A ⊆ X,

−ℓ(A) = −Σ_{j∈A} f_∅(j) ≤ (1/(1 − c))·f(X \ A).

Thus, we have

f(X \ S) = −g(S) − ℓ(S) ≤ −(1 − e^{−1}) g(O*) − ℓ(O*) − O(ε)·ℓ(O*)
 = (1 − e^{−1}) f(O) − (e^{−1} + O(ε)) ℓ(O*)
 ≤ (1 − e^{−1}) f(O) + (e^{−1} + O(ε))·(1/(1 − c))·f(O)
 = (1 + (c/(1 − c))·e^{−1} + (1/(1 − c))·O(ε)) f(O).

We note that because the error term depends on 1/(1 − c), our result requires that c be bounded away from 1 by a constant.
7 Hardness of approximation

We now show that our approximation guarantees are the best achievable using only a polynomial number of function evaluations, even in the special case in which M is a uniform matroid (i.e., a cardinality constraint). Nemhauser and Wolsey [32] considered the problem of finding a set S of cardinality at most r that maximizes a monotone submodular function. They give a class of functions for which obtaining a (1 − e^{−1} + ε)-approximation, for any constant ε > 0, cannot be done while evaluating f on only a polynomial number of sets. We use the following properties of any function f in their class: let p = max_{i∈X} f_∅(i), and let O be a set of size r on which f takes its maximum value. Then, f(O) = rp.

Theorem 7.1.
For any constant δ > 0 and c ∈ (0, 1], there is no (1 − c·e^{−1} + δ)-approximation algorithm for the problem max{f̂(S) : |S| ≤ r}, where f̂ is a monotone increasing submodular function with curvature at most c, that evaluates f̂ on only a polynomial number of sets.

Proof. Let f be a function in the family given by [32] for the cardinality constraint r, and let O be a set of size r on which f takes its maximum value. Consider the function:

f̂(A) = f(A) + ((1 − c)/c)·|A|·p.

In Lemma A.3 of the appendix, we show that f̂ is monotone increasing, submodular, and nonnegative with curvature at most c. We consider the problem max{f̂(S) : |S| ≤ r}. Let α = (1 − c·e^{−1} + δ), and suppose that some algorithm returns a solution S satisfying f̂(S) ≥ α·f̂(O), evaluating f̂ on only a polynomial number of sets. Because f̂ is monotone increasing, we assume without loss of generality that |S| = r. Then,

f(S) + ((1 − c)/c)·rp ≥ α·f(O) + α·((1 − c)/c)·rp = α·f(O) + α·((1 − c)/c)·f(O) = (1/c)·α·f(O),

and so

f(S) ≥ (1/c)·α·f(O) − ((1 − c)/c)·rp
 = ((1/c)·α − (1 − c)/c)·f(O)
 = ((1 − c·e^{−1} + δ)/c − (1 − c)/c)·f(O)
 = (1 − e^{−1} + δ/c)·f(O).

Because each evaluation of f̂ requires only a single evaluation of f, this contradicts the negative result of [32].
Theorem 7.2. For any constant δ > 0 and c ∈ (0, 1), there is no (1 + (c/(1 − c))·e^{−1} − δ)-approximation algorithm for the problem min{f̂(S) : |S| ≤ r}, where f̂ is a monotone decreasing supermodular function with curvature at most c, that evaluates f̂ on only a polynomial number of sets.

Proof. Again, let f be a function in the family given by [32] for the cardinality constraint r. Let O be a set of size r on which f takes its maximum value, and recall that f(O) = rp, where p = max_{i∈X} f_∅(i). Consider the function:

f̂(A) = (p/c)·|X \ A| − f(X \ A).

In Lemma A.4 of the appendix we show that f̂ is monotone decreasing, supermodular, and nonnegative with curvature at most c. We consider the problem min{f̂(A) : |A| ≤ n − r}. Let α = (1 + (c/(1 − c))·e^{−1} − δ), and suppose that some algorithm returns a solution A satisfying f̂(A) ≤ α·f̂(X \ O), evaluating f̂ on only a polynomial number of sets.

Suppose we run the algorithm for minimizing f̂ and return the set S = X \ A. Because f̂ is monotone decreasing, we assume without loss of generality that |A| = n − r, and so |S| = r. Then,

(rp/c) − f(S) ≤ α·((rp/c) − f(O)) = α·((f(O)/c) − f(O)) = α·((1 − c)/c)·f(O),

and so

f(S) ≥ (rp/c) − α·((1 − c)/c)·f(O)
 = ((1/c) − α·(1 − c)/c)·f(O)
 = ((1 − (1 − c) − c·e^{−1} + (1 − c)δ)/c)·f(O)
 = (1 − e^{−1} + ((1 − c)/c)·δ)·f(O).

Again, since each evaluation of f̂ requires only one evaluation of f, this contradicts the negative result of [32].

8 Bounded curvature without submodularity

Now we consider the problem of maximizing (respectively, minimizing) an arbitrary monotone increasing (respectively, monotone decreasing) nonnegative function f of bounded curvature subject to a single matroid constraint. We do not require that f be supermodular or submodular, but only that there is a bound on the following generalized notion of curvature. Let f be an arbitrary monotone increasing or monotone decreasing function. We define the curvature c of f as

c = 1 − min_{j∈X} min_{S,T⊆X\{j}} f_S(j) / f_T(j).  (6)

Note that in the case that f is either monotone increasing and submodular or monotone decreasing and supermodular, the minimum of f_S(j)/f_T(j) over S and T is attained when S = X − j and T = ∅. Thus, (6) agrees with the standard definition of curvature given in (1). Moreover, if a monotone increasing f has curvature at most c for some c ∈ [0, 1], then we have

(1 − c) f_B(j) ≤ f_A(j),  (7)

for any j ∈ X and A, B ⊆ X \ {j}. Analogously, for a monotone decreasing function f we have

(1 − c) f_B(j) ≥ f_A(j),  (8)

for any j ∈ X and A, B ⊆ X \ {j}. As with the standard notion of curvature for submodular and supermodular functions, when c = 0, we find that (7) and (8) require f to be a linear function, while when c = 1, they require only that f is monotone increasing or monotone decreasing, respectively.

First, we consider the case in which we wish to maximize a monotone increasing function f subject to a matroid constraint M = (X, I). Suppose that we run the standard greedy algorithm, which at each step adds to the current solution S the element e yielding the largest marginal gain in f, subject to the constraint S + e ∈ I.

Theorem 8.1.
Suppose that f is a monotone increasing function with curvature at most c ∈ [0, 1], and M is a matroid. Let S ∈ B(M) be the base produced by the standard greedy maximization algorithm on f and M, and let O ∈ B(M) be any base of M. Then, f(S) ≥ (1 − c) f(O).

Proof. Let r be the rank of M. Let s_i be the i-th element picked by the greedy algorithm, and let S_i be the set containing the first i elements picked by the greedy algorithm. We use the bijection guaranteed by Lemma 5.1 to order the elements o_i of O so that S − s_i + o_i ∈ I for all i ∈ [r], and let O_i = {o_j : j ≤ i} be the set containing the first i elements of O in this ordering. Then,

(1 − c) f(O) = (1 − c) f(∅) + (1 − c) Σ_{i=1}^r f_{O_{i−1}}(o_i)
 ≤ f(∅) + Σ_{i=1}^r f_{S_{i−1}}(o_i)
 ≤ f(∅) + Σ_{i=1}^r f_{S_{i−1}}(s_i)
 = f(S).

The first inequality follows from (7) and f(∅) ≥ 0.
The last inequality is due to the fact that S_{i−1} + o_i ∈ I but s_i was chosen by the greedy maximization algorithm in the i-th round.

Similarly, we can consider the problem of finding a base of M that minimizes f. In this setting, we again employ a greedy algorithm, but at each step choose the element e yielding the smallest marginal gain in f, terminating only when no element can be added to the current solution. We call this algorithm the standard greedy minimization algorithm.
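Both greedy variants are simple to implement given an independence oracle for M. The following sketch (ours; the partition-matroid example and all names are hypothetical illustrations, not from the paper) covers maximization and minimization with a single flag.

```python
def greedy_base(f, X, independent, minimize=False):
    """Standard greedy of Theorems 8.1/8.2: repeatedly add the feasible
    element with the largest (or smallest) marginal gain until the current
    set is a base."""
    S = set()
    while True:
        cand = [j for j in X - S if independent(S | {j})]
        if not cand:
            return S
        gain = lambda j: f(S | {j}) - f(S)
        S.add(min(cand, key=gain) if minimize else max(cand, key=gain))

# Example with a partition matroid: at most one element from each part.
part = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}
independent = lambda S: all(
    sum(1 for j in S if part[j] == p) <= 1 for p in set(part.values()))
w = {0: 3.0, 1: 1.0, 2: 2.0, 3: 5.0}
f = lambda S: sum(w[j] for j in S) + 0.5 * len(S)   # monotone increasing
print(sorted(greedy_base(f, set(part), independent)))         # [0, 3]
print(sorted(greedy_base(f, set(part), independent, True)))   # [1, 2]
```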
Theorem 8.2. Suppose that f is a monotone increasing function with curvature at most c ∈ [0, 1) and M is a matroid. Let S ∈ B(M) be the base produced by the standard greedy minimization algorithm on f and M, and let O ∈ B(M) be any base of M. Then, f(S) ≤ (1/(1 − c)) f(O).

Proof.
Let r, S_i, s_i, O_i, and o_i be defined as in the proof of Theorem 8.1. Then,

f(O) = f(∅) + Σ_{i=1}^r f_{O_{i−1}}(o_i)
 ≥ (1 − c) f(∅) + (1 − c) Σ_{i=1}^r f_{S_{i−1}}(o_i)
 ≥ (1 − c) f(∅) + (1 − c) Σ_{i=1}^r f_{S_{i−1}}(s_i)
 = (1 − c) f(S).

As in the proof of Theorem 8.1, the first inequality follows from (7) and f(∅) ≥ 0.
The last inequality is due to the fact that S_{i−1} + o_i ∈ I but s_i was chosen by the greedy minimization algorithm in the i-th round.

Now, we consider the case in which f is a monotone decreasing function. For any function f : 2^X → R_{≥0}, we define the function f* : 2^X → R_{≥0} by f*(S) = f(X \ S) for all S ⊆ X. Then, since f is monotone decreasing, f* is monotone increasing. Moreover, the next lemma shows that the curvature of f* is the same as that of f.

Lemma 8.3.
Let f be a monotone decreasing function with curvature at most c ∈ [0, 1], and define f*(S) = f(X \ S) for all S ⊆ X. Then, f* also has curvature at most c.

Proof. From the definition of f*, we have:

f*_A(j) = f(X \ (A + j)) − f(X \ A) = −f_{X\(A+j)}(j),

for any j ∈ X and A ⊆ X − j. Consider any j ∈ X and S, T ⊆ X \ {j}. Since f is monotone decreasing with curvature at most c, (8) implies

f*_S(j) = −f_{X\(S+j)}(j) ≥ −(1 − c) f_{X\(T+j)}(j) = (1 − c) f*_T(j).

Thus, f*_S(j)/f*_T(j) ≥ 1 − c for all j ∈ X and S, T ⊆ X \ {j}.

Given a matroid M, we consider the problem of finding a base of M minimizing f. This problem is equivalent to finding a base of the dual matroid M* that minimizes f*. Similarly, the problem of finding a base of M that maximizes f can be reduced to that of finding a base of M* that maximizes f*. Since f* is monotone increasing with curvature no more than that of f, we obtain the following corollaries, which show how to employ the standard greedy algorithm to optimize monotone decreasing functions.
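For the special case of a uniform matroid, this complementation trick is a one-liner; the following sketch (ours, with hypothetical names) makes the reduction explicit.

```python
def complement_view(f, X, k):
    """Lemma 8.3 for a uniform matroid of rank k: optimizing a decreasing f
    over k-element sets is the same as optimizing the increasing function
    f*(S) = f(X - S) over (n - k)-element sets of the dual matroid, and f*
    has the same curvature bound as f."""
    return (lambda S: f(X - S)), len(X) - k

# Example usage: a decreasing f over 2-element bases of a 4-element X.
X = {0, 1, 2, 3}
f = lambda S: 10.0 - sum(j + 1 for j in S)     # monotone decreasing
f_star, dual_rank = complement_view(f, X, 2)   # optimize f_star over 2-sets
```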
Corollary 8.4. Suppose that f is a monotone decreasing function with curvature at most c ∈ [0, 1] and M is a matroid. Let S* ∈ B(M*) be the base of M* produced by running the standard greedy maximization algorithm on f* and M*. Let O ∈ B(M) be any base of M, O* = X \ O, and S = X \ S* ∈ B(M). Then,

f(S) = f*(S*) ≥ (1 − c) f*(O*) = (1 − c) f(O).

Corollary 8.5.
Suppose that f is a monotone decreasing function with curvature at most c ∈ [0, 1) and M is a matroid. Let S* ∈ B(M*) be the base of M* produced by running the standard greedy minimization algorithm on f* and M*. Let O ∈ B(M) be any base of M, O* = X \ O, and S = X \ S* ∈ B(M). Then,

f(S) = f*(S*) ≤ (1/(1 − c)) f*(O*) = (1/(1 − c)) f(O).

The approximation factors of 1 − c and 1/(1 − c) for maximization and minimization, respectively, are best possible, given curvature c. The hardness result for minimization follows from [21], where it is shown that no algorithm using polynomially many value queries can achieve an approximation factor of

ρ(n, ε) = n^{1/2−ε} / (1 + (n^{1/2−ε} − 1)(1 − c))

for the problem min{f(S) : |S| ≥ r}, where f is monotone increasing (even submodular) of curvature c. This implies that no 1/(1 − c + ε)-approximation, for a constant ε > 0, is possible with polynomially many queries.

Theorem 8.6.
For any constant c ∈ (0, 1] and ε > 0, there is no (1 − c + ε)-approximation using polynomially many queries for the problem max{f(S) : |S| ≤ k}, where f is monotone increasing of curvature c.

Proof. Fix c ∈ (0, 1]. Let |X| = n and let O ⊆ X be a random subset of size k = n^{1/2} (assume k is an integer). We define the following function:

f(S) = max{|S ∩ O|, min{(1 − c)|S| + c·n^{1/3}, |S|}}.

The marginal values of f are always between 1 − c and 1; therefore, its curvature is at most c. Consider a query Q. For |Q| ≤ n^{1/3}, we have f(Q) = |Q| in any case. For |Q| > n^{1/3}, we have f(Q) = (1 − c)|Q| + c·n^{1/3}, unless |Q ∩ O| > (1 − c)|Q| + c·n^{1/3}. Since O is a random n^{−1/2}-fraction of the ground set, we have E[|Q ∩ O|] = n^{−1/2}|Q|.
Therefore, by the Chernoff bound, Pr[|Q ∩ O| > (1 − c)|Q| + c·n^{1/3}] is exponentially small in n^{1/3} (with respect to the choice of O). Furthermore, as long as |Q ∩ O| ≤ (1 − c)|Q| + c·n^{1/3}, f(Q) depends only on |Q|, and hence the algorithm does not learn anything about the identity of the optimal set O. Unless the algorithm queries an exponential number of sets, it will never find a set S satisfying |S ∩ O| > (1 − c)|S| + c·n^{1/3}, and hence the value of the returned solution is f(S) ≤ (1 − c)|S| + c·n^{1/3} ≤ (1 − c)n^{1/2} + c·n^{1/3} with high probability. On the other hand, the optimum is f(O) = |O| = n^{1/2}. Therefore, the algorithm cannot achieve a better than (1 − c + o(1))-approximation.

Therefore, the approximation factors in Theorems 8.1 and 8.2 are optimal. Combining these inapproximability results with Lemma 8.3, we derive similar inapproximability results showing the optimality of Corollaries 8.4 and 8.5.

9 The column-subset selection problem

Let A be an m × n real matrix. We denote the columns of A by c_1, . . . , c_n; i.e., for x ∈ R^n, Ax = Σ x_i c_i. The (squared) Frobenius norm of A is defined as

||A||²_F = Σ_{i,j} a²_{ij} = Σ_{i=1}^n ||c_i||²,

where here, and throughout this section, we use ||·|| to denote the standard ℓ_2 vector norm. For a matrix A with independent columns, the condition number is defined as

κ(A) = sup_{||x||=1} ||Ax|| / inf_{||x||=1} ||Ax||.

If the columns of A are dependent, then κ(A) = ∞ (there is a nonzero vector x such that Ax = 0). Given a matrix A with columns c_1, . . . , c_n, and a subset S ⊆ [n], we denote by

proj_S(x) = argmin_{y∈span({c_i : i∈S})} ||x − y||

the projection of x onto the subspace spanned by the respective columns of A. Given S ⊆ [n], it is easy to see that the matrix A(S) with columns spanned by {c_i : i ∈ S} that is closest to A in squared Frobenius norm is A(S) = (proj_S(c_1), proj_S(c_2), . . . , proj_S(c_n)). The distance between the two matrices is thus

||A − A(S)||²_F = Σ_{i=1}^n ||c_i − proj_S(c_i)||².

We define f_A : 2^[n] → R to be this quantity as a function of S:

f_A(S) = Σ_{i=1}^n ||c_i − proj_S(c_i)||² = Σ_{i=1}^n (||c_i||² − ||proj_S(c_i)||²),

where the final equality follows from the fact that proj_S(c_i) and c_i − proj_S(c_i) are orthogonal. Given a matrix A ∈ R^{m×n} and an integer k, the column-subset selection problem (CSSP) is to select a subset S of k columns of A so as to minimize f_A(S). It follows from standard properties of projection that f_A is non-increasing, and so CSSP is a special case of non-increasing minimization subject to a cardinality constraint. We now show that the curvature of f_A is related to the condition number of A.
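The objective f_A is easy to evaluate with standard linear algebra. The following sketch (ours) computes f_A via an explicit projector and empirically checks the bound of Lemma 9.1 below on a hypothetical random instance; the exhaustive min over pairs (S, T) implements the generalized curvature (6), and all names are our own.

```python
import numpy as np
from itertools import chain, combinations

def f_A(A, S):
    """CSSP objective: squared Frobenius distance from A to the projection
    of its columns onto span{c_i : i in S}."""
    if not S:
        return float(np.sum(A ** 2))
    C = A[:, sorted(S)]
    P = C @ np.linalg.pinv(C)                 # orthogonal projector onto span(C)
    return float(np.sum((A - P @ A) ** 2))

# Empirical check of Lemma 9.1 on a random instance: 1/(1-c) <= kappa(A)^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
n = A.shape[1]
marg = lambda S, i: f_A(A, S) - f_A(A, S | {i})    # |f^A_S(i)|, nonnegative
subsets = lambda U: map(set, chain.from_iterable(
    combinations(U, r) for r in range(len(U) + 1)))
one_minus_c = min(marg(S, i) / marg(T, i)
                  for i in range(n)
                  for S in subsets(set(range(n)) - {i})
                  for T in subsets(set(range(n)) - {i}))
sv = np.linalg.svd(A, compute_uv=False)
assert 1 / one_minus_c <= (sv[0] / sv[-1]) ** 2 + 1e-6
```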
Lemma 9.1. For any non-singular matrix A, the curvature c = c(f_A) of the associated set function f_A satisfies 1/(1 − c) ≤ κ²(A).

Proof.
We want to prove that for any S and i ∉ S,

min_{||x||=1} ||Ax||² ≤ |f^A_S(i)| ≤ max_{||x||=1} ||Ax||².  (9)

This implies that by varying the set S, a marginal value can change by at most a factor of κ²(A). Recall that the marginal values of f_A are negative, but only the ratio matters, so we can consider the respective absolute values. The inequalities (9) imply that

1/(1 − c) = max_{j∈[n]} max_{S,T⊆[n]\{j}} |f^A_S(j)| / |f^A_T(j)| ≤ κ²(A).

We now prove the inequalities (9). Let v_{i,S} = c_i − proj_S(c_i) denote the component of c_i orthogonal to span(S). We have

|f^A_S(i)| = f_A(S) − f_A(S + i)
 = Σ_{j=1}^n (||c_j − proj_S(c_j)||² − ||c_j − proj_{S+i}(c_j)||²)
 = Σ_{j=1}^n (||c_j − proj_S(c_j)||² − ||c_j − proj_S(c_j) − proj_{v_{i,S}}(c_j)||²)
 = Σ_{j=1}^n ||proj_{v_{i,S}}(c_j)||²,

because proj_S(c_j), proj_{v_{i,S}}(c_j), and c_j − proj_S(c_j) − proj_{v_{i,S}}(c_j) are orthogonal.

Our first goal is to show that if |f^A_S(i)| is large, then there is a unit vector x such that ||Ax|| is large. In particular, let us define p_j = (v_{i,S} · c_j)/||v_{i,S}|| and x_j = p_j / sqrt(Σ_{ℓ=1}^n p²_ℓ). We have ||x||² = Σ_{j=1}^n x²_j = 1. Multiplying by the matrix A, we obtain Ax = Σ_{j=1}^n x_j c_j. We can estimate ||Ax|| as follows:

v_{i,S} · (Ax) = v_{i,S} · Σ_{j=1}^n x_j c_j = Σ_{j=1}^n (p_j / sqrt(Σ_ℓ p²_ℓ)) (v_{i,S} · c_j)
 = Σ_{j=1}^n (p_j / sqrt(Σ_ℓ p²_ℓ)) p_j ||v_{i,S}|| = ||v_{i,S}|| · sqrt(Σ_{j=1}^n p²_j).

By the Cauchy–Schwarz inequality, this implies that ||Ax|| ≥ sqrt(Σ_{j=1}^n p²_j) = sqrt(|f^A_S(i)|).

On the other hand, using the expression above, we have

|f^A_S(i)| = Σ_{j=1}^n ||proj_{v_{i,S}}(c_j)||² ≥ ||proj_{v_{i,S}}(c_i)||² = ||v_{i,S}||²,

since v_{i,S} = c_i − proj_S(c_i) is the component of c_i orthogonal to span(S). We claim that if ||v_{i,S}|| is small, then there is a unit vector x′ such that ||Ax′|| is small. To this purpose, write proj_S(c_i) as a linear combination of the vectors {c_j : j ∈ S}: proj_S(c_i) = Σ_{j∈S} y_j c_j. Finally, we define y_i = −1, and normalize to obtain x′ = y/||y||. We get the following:

||Ax′|| = (1/||y||) ||Ay|| = (1/||y||) ||Σ_{j=1}^n y_j c_j|| = (1/||y||) ||proj_S(c_i) − c_i|| = (1/||y||) ||v_{i,S}||.

Since ||y|| ≥ 1 and ||v_{i,S}|| ≤ sqrt(|f^A_S(i)|), we obtain ||Ax′|| ≤ sqrt(|f^A_S(i)|).

In summary, we have given two unit vectors x, x′ with ||Ax|| ≥ sqrt(|f^A_S(i)|) and ||Ax′|| ≤ sqrt(|f^A_S(i)|). This proves that

min_{||x||=1} ||Ax||² ≤ |f^A_S(i)| ≤ max_{||x||=1} ||Ax||².

By Corollary 8.5, the standard greedy minimization algorithm is then a κ²(A)-approximation for the column-subset selection problem. The following lemma shows that Lemma 9.1 is asymptotically tight.

Lemma 9.2.
There exists a matrix A with condition number κ for which the associated function f_A has curvature 1 − O(1/κ²).

Proof. Let us denote by dist_S(x) the distance from x to the subspace spanned by the columns corresponding to S:

dist_S(x) = ||x − proj_S(x)|| = min_{y∈span({c_i : i∈S})} ||x − y||.

For some ε > 0, consider A = (c_1, . . . , c_n), where c_1 = e_1 and c_j = ε·e_1 + e_j for j ≥ 2. (A similar example was used in [2] for a lower bound on column-subset approximation.) Here, e_i is the i-th canonical basis vector in R^n. We claim that the condition number of A is κ = O(max{1, ε²(n − 1)}), while the curvature of f_A is 1 − O(1/max{1, ε²(n − 1)}²) = 1 − O(1/κ²).

To bound the condition number, consider a unit vector x. We have

Ax = (x_1 + ε Σ_{i=2}^n x_i, x_2, x_3, . . . , x_n)

and

||Ax||² = (x_1 + ε Σ_{i=2}^n x_i)² + Σ_{i=2}^n x²_i = ||x||² + 2εx_1 Σ_{i=2}^n x_i + ε²(Σ_{i=2}^n x_i)².

We need a lower bound and an upper bound on ||Ax||², assuming that ||x|| = 1. On the one hand, by Cauchy–Schwarz we have

||Ax||² ≤ (1 + ε√(n − 1))² + 1 = O(max{1, ε²(n − 1)}).

On the other hand, to get a lower bound: if x²_1 ≤ 1/2, then ||Ax||² ≥ Σ_{i=2}^n x²_i = 1 − x²_1 ≥ 1/2. If x²_1 > 1/2, then either |Σ_{i=2}^n x_i| ≤ 1/(2√2·ε), in which case

||Ax||² ≥ (x_1 + ε Σ_{i=2}^n x_i)² ≥ 1/8,

or |Σ_{i=2}^n x_i| > 1/(2√2·ε), in which case by convexity we get

Σ_{i=2}^n x²_i ≥ (Σ_{i=2}^n |x_i|)²/(n − 1) > 1/(8ε²(n − 1)).

So, in all cases ||Ax||² = Ω(1/max{1, ε²(n − 1)}). This means that the condition number of A is κ = O(max{1, ε²(n − 1)}).

To lower-bound the curvature of f_A, consider the first column c_1 and let us estimate f^A_∅(1) and f^A_{[n]\{1}}(1). We have

|f^A_∅(1)| = ||c_1||² + Σ_{j=2}^n ||proj_{{1}}(c_j)||² = 1 + ε²(n − 1).

On the other side,

|f^A_{[n]\{1}}(1)| = ||c_1 − proj_{[n]\{1}}(c_1)||² = (dist_{[n]\{1}}(c_1))².

We exhibit a linear combination of the columns c_2, . . . , c_n which is close to c_1: let y = (1/(ε(n − 1))) Σ_{j=2}^n c_j. We obtain

dist_{[n]\{1}}(c_1) ≤ ||c_1 − y|| = (1/(ε(n − 1))) ||(0, −1, −1, . . . , −1)|| = 1/(ε√(n − 1)).

Alternatively, we can also pick y = 0, which shows that dist_{[n]\{1}}(c_1) ≤ ||c_1|| = 1. So we have

|f^A_{[n]\{1}}(1)| = (dist_{[n]\{1}}(c_1))² ≤ min{1, 1/(ε²(n − 1))} = 1/max{1, ε²(n − 1)}.

We conclude that the curvature of f_A is at least

1 − 1/max{1, ε²(n − 1)}² = 1 − O(1/κ²).
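The tight example is easy to reproduce numerically. The following sketch (ours; the parameter values are arbitrary illustrations) builds the matrix of Lemma 9.2 and compares the two extreme marginal values of f_A against the condition number.

```python
import numpy as np

# The construction from Lemma 9.2: c_1 = e_1, c_j = eps*e_1 + e_j for j >= 2.
n, eps = 40, 1.0
A = np.eye(n)
A[0, 1:] = eps

def f_val(S):
    if not S:
        return float(np.sum(A ** 2))
    C = A[:, sorted(S)]
    P = C @ np.linalg.pinv(C)
    return float(np.sum((A - P @ A) ** 2))

sv = np.linalg.svd(A, compute_uv=False)
kappa = sv[0] / sv[-1]
first = f_val(set()) - f_val({0})                      # = 1 + eps^2 (n-1)
last = f_val(set(range(1, n))) - f_val(set(range(n)))  # <= 1/(eps^2 (n-1))
print(kappa, 1 - last / first)   # curvature approaches 1 - O(1/kappa^2)
```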
Acknowledgment

We thank Christos Boutsidis for suggesting a connection between curvature and condition number.
References

[1] Alexander Ageev and Maxim Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. Combinatorial Optimization, 8(3):307–328, 2004.
[2] Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. Near-optimal column-based matrix reconstruction. SIAM J. Comput., 43(2):687–717, 2014.
[3] Richard A. Brualdi. Comments on bases in dependence structures. Bull. of the Australian Math. Soc., 1(2):161–167, 1969.
[4] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a submodular set function subject to a matroid constraint (extended abstract). In Proc. 12th IPCO, pages 182–196, 2007.
[5] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a submodular set function subject to a matroid constraint. SIAM J. Comput., 40(6):1740–1766, 2011.
[6] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In Proc. 51st FOCS, pages 575–584, 2010.
[7] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Multi-budgeted matchings and matroid intersection via dependent rounding. In Proc. 22nd SODA, pages 1080–1097, 2011.
[8] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. In Proc. 43rd STOC, pages 783–792, 2011.
[9] Michele Conforti and Gérard Cornuéjols. Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.
[10] Amit Deshpande and Luis Rademacher. Efficient volume sampling for row/column subset selection. In Proc. 51st FOCS, pages 329–338, 2010.
[11] Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In Proc. 9th APPROX, pages 292–303, 2006.
[12] S. Dobzinski, N. Nisan, and M. Schapira. Approximation algorithms for combinatorial auctions with complement-free bidders. In Proc. 37th STOC, pages 610–618, 2005.
[13] Shahar Dobzinski and Michael Schapira. An improved approximation algorithm for combinatorial auctions with submodular bidders. In Proc. 17th SODA, pages 1064–1073, 2006.
[14] Shahar Dobzinski and Jan Vondrák. From query complexity to computational complexity. In Proc. 44th STOC, pages 1107–1116, 2012.
[15] Jack Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1(1):127–136, 1971.
[16] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45:634–652, 1998.
[17] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In Proc. 53rd FOCS, 2012.
[18] Yuval Filmus and Justin Ward. Monotone submodular maximization over a matroid via non-oblivious local search. SIAM J. Comput., 43(2):514–542, 2014.
[19] M.L. Fisher, G.L. Nemhauser, and L.A. Wolsey. An analysis of approximations for maximizing submodular set functions—II. Mathematical Programming Studies, 8:73–87, 1978.
[20] Victor P. Il'ev. An approximation guarantee of the greedy descent algorithm for minimizing a supermodular set function. Discrete Applied Mathematics, 114(1-3):131–146, 2001.
[21] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. In Proc. NIPS, 2013.
[22] A.K. Kelmans. Multiplicative submodularity of a matrix's principal minor as a function of the set of its rows. Discrete Mathematics, 44(1):113–116, 1983.
[23] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proc. 9th KDD, pages 137–146, 2003.
[24] C.W. Ko, Jon Lee, and Maurice Queyranne. An exact algorithm for maximum entropy sampling. Operations Research, 43(4):684–691, 1996.
[25] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Trans. on Intelligent Systems and Technology, 2(4):32, 2011.
[26] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. Near-optimal sensor placements: maximizing information while minimizing communication cost. In Proc. 5th IPSN, pages 2–10, 2006.
[27] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Machine Learning Research, 9:235–284, 2008.
[28] Andreas Krause, Ram Rajagopal, Anupam Gupta, and Carlos Guestrin. Simultaneous placement and scheduling of sensors. In Proc. 8th IPSN, pages 181–192, 2009.
[29] Jon Lee. Maximum entropy sampling. Encyclopedia of Environmetrics, 3:1229–1234, 2002.
[30] B. Lehmann, D. J. Lehmann, and N. Nisan. Combinatorial auctions with decreasing marginal utilities. Games and Economic Behavior, 55:1884–1899, 2006.
[31] Vahab S. Mirrokni, Michael Schapira, and Jan Vondrák. Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proc. 9th ACM EC, pages 70–77, 2008.
[32] G.L. Nemhauser and L.A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.
[33] G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
[34] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2003.
[35] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In Proc. 40th STOC, pages 67–74, 2008.
[36] Jan Vondrák. Symmetry and approximability of submodular maximization problems. In Proc. 50th FOCS, pages 651–670, 2009.
[37] Jan Vondrák. Submodularity and curvature: the optimal algorithm. In RIMS Kokyuroku Bessatsu, volume B23, pages 253–266, Kyoto, 2010.
A Proofs and Claims Omitted from the Main Body
Lemma 2.1. If $f : 2^X \to \mathbb{R}_{\ge 0}$ is a monotone increasing submodular function with total curvature at most $c$, then $\sum_{j \in A} f_{X-j}(j) \ge (1-c) f(A)$ for all $A \subseteq X$.

Proof. We order the elements of $X$ arbitrarily, and let $A_j$ be the set containing all those elements of $A$ that precede the element $j$. Then, $\sum_{j \in A} f_{A_j}(j) = f(A) - f(\emptyset)$. From (1), we have
$$c \ge 1 - \frac{f_{X-j}(j)}{f_\emptyset(j)},$$
which, since $f_\emptyset(j) \ge 0$, is equivalent to $f_{X-j}(j) \ge (1-c) f_\emptyset(j)$, for each $j \in A$. Because $f$ is submodular, we have $f_\emptyset(j) \ge f_{A_j}(j)$ for all $j$, and so
$$\sum_{j \in A} f_{X-j}(j) \ge (1-c) \sum_{j \in A} f_\emptyset(j) \ge (1-c) \sum_{j \in A} f_{A_j}(j) = (1-c)[f(A) - f(\emptyset)] \ge (1-c) f(A).$$

Lemma 2.2. If $f : 2^X \to \mathbb{R}_{\ge 0}$ is a monotone decreasing supermodular function with total curvature at most $c$, then $(1-c) \sum_{j \in A} f_\emptyset(j) \ge -f(X \setminus A)$ for all $A \subseteq X$.

Proof. Order $A$ arbitrarily, and let $A_j$ be the set of all elements in $A$ that precede element $j$, including $j$ itself. Then, $\sum_{j \in A} f_{X \setminus A_j}(j) = f(X) - f(X \setminus A)$. From (1), we have
$$c \ge 1 - \frac{f_{X-j}(j)}{f_\emptyset(j)},$$
which, since $f_\emptyset(j) \le 0$, is equivalent to $f_{X-j}(j) \le (1-c) f_\emptyset(j)$. Then, since $f$ is supermodular, we have $f_{X \setminus A_j}(j) \le f_{X-j}(j)$ for all $j \in A$, and so
$$(1-c) \sum_{j \in A} f_\emptyset(j) \ge \sum_{j \in A} f_{X-j}(j) \ge \sum_{j \in A} f_{X \setminus A_j}(j) = f(X) - f(X \setminus A) \ge -f(X \setminus A).$$
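Both inequalities are easy to test exhaustively on a small ground set. The following check (our illustration; the weighted coverage function below is an arbitrary toy example) computes the total curvature directly from definition (1) and verifies Lemma 2.1 by brute force; an analogous check applies to Lemma 2.2 with a monotone decreasing supermodular function.

    import itertools

    # toy weighted coverage function: monotone increasing, submodular, f(empty) = 0
    universe_sets = {1: {'a', 'x'}, 2: {'b', 'x'}, 3: {'d', 'x', 'y'}}
    w = {'a': 1.0, 'b': 2.0, 'd': 1.5, 'x': 3.0, 'y': 0.5}
    X = set(universe_sets)

    def f(S):
        covered = set().union(*(universe_sets[j] for j in S)) if S else set()
        return sum(w[u] for u in covered)

    def marg(S, j):                     # marginal value f_S(j) = f(S + j) - f(S)
        return f(S | {j}) - f(S)

    # total curvature, directly from definition (1)
    c = 1 - min(marg(X - {j}, j) / marg(set(), j) for j in X)

    # Lemma 2.1: sum_{j in A} f_{X-j}(j) >= (1 - c) f(A) for every A
    for r in range(len(X) + 1):
        for A in map(set, itertools.combinations(X, r)):
            assert sum(marg(X - {j}, j) for j in A) >= (1 - c) * f(A) - 1e-9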
Lemma A.1. Let $f : 2^X \to \mathbb{R}_{\ge 0}$ be a monotone-increasing submodular function and define $\ell(A) = \sum_{j \in A} f_{X-j}(j)$ and $g(A) = f(A) - \ell(A)$. Then, $g$ is submodular, monotone increasing, and nonnegative.

Proof. The function $g$ is the sum of a submodular function $f$ and a linear function $-\ell$, and so must be submodular. For any set $A \subseteq X$ and element $j \notin A$, we have
$$g_A(j) = f_A(j) - f_{X-j}(j) \ge 0,$$
since $f$ is submodular and $A \subseteq X - j$. Thus, $g$ is monotone increasing. Finally, we note that $g(\emptyset) = f(\emptyset) - \ell(\emptyset) = f(\emptyset) \ge 0$, and since $g$ is monotone increasing, $g$ must be nonnegative.
Lemma A.2. Let $f : 2^X \to \mathbb{R}_{\ge 0}$ be a monotone-decreasing supermodular function and define $\ell(A) = \sum_{j \in A} f_\emptyset(j)$ and $g(A) = -\ell(A) - f(X \setminus A)$. Then, $g$ is submodular, monotone increasing, and nonnegative.

Proof. We first show that $g$ is monotone increasing. Consider an arbitrary $A \subseteq X$ and $j \notin A$. Then,
$$g_A(j) = g(A+j) - g(A) = -\ell(A+j) - f((X \setminus A) - j) + \ell(A) + f(X \setminus A) = -\ell(\{j\}) + f_{(X \setminus A)-j}(j) = -f_\emptyset(j) + f_{(X \setminus A)-j}(j) \ge 0,$$
where the last inequality holds because $f$ is supermodular. Moreover, note that $g(\emptyset) = -\ell(\emptyset) - f(X) = 0$, so $g$ is nonnegative.

Finally, we show that $g$ is submodular. Suppose $A \subseteq B$ and $j \notin B$. Then, $(X \setminus B) - j \subseteq (X \setminus A) - j$ and so, since $f$ is supermodular, $f_{(X \setminus B)-j}(j) \le f_{(X \setminus A)-j}(j)$. Thus,
$$g_A(j) = -f_\emptyset(j) + f_{(X \setminus A)-j}(j) \ge -f_\emptyset(j) + f_{(X \setminus B)-j}(j) = g_B(j).$$
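The decomposition of Lemma A.1 can be checked in the same exhaustive fashion; the sketch below (our illustration, reusing the toy coverage function from above) verifies that $g = f - \ell$ is nonnegative, monotone increasing, and submodular. The analogous check for Lemma A.2 only requires swapping in a monotone decreasing supermodular $f$.

    import itertools

    universe_sets = {1: {'a', 'x'}, 2: {'b', 'x'}, 3: {'d', 'x', 'y'}}
    w = {'a': 1.0, 'b': 2.0, 'd': 1.5, 'x': 3.0, 'y': 0.5}
    X = set(universe_sets)

    def f(S):
        covered = set().union(*(universe_sets[j] for j in S)) if S else set()
        return sum(w[u] for u in covered)

    def ell(A):                         # linear part: ell(A) = sum_{j in A} f_{X-j}(j)
        return sum(f(X) - f(X - {j}) for j in A)

    def g(A):                           # Lemma A.1: g = f - ell
        return f(A) - ell(A)

    subsets = [set(A) for r in range(len(X) + 1) for A in itertools.combinations(X, r)]
    for A in subsets:
        assert g(A) >= -1e-9                               # nonnegative
        for j in X - A:
            assert g(A | {j}) - g(A) >= -1e-9              # monotone increasing
            for B in subsets:                              # diminishing marginal values
                if A <= B and j not in B:
                    assert g(A | {j}) - g(A) >= g(B | {j}) - g(B) - 1e-9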
Lemma A.3. Let $f$ be a monotone increasing submodular function, satisfying $f_A(j) \le p$ for all $j, A$, and let $c \in (0, 1]$. Define
$$\hat f(A) = f(A) + \frac{1-c}{c} \cdot |A| \cdot p.$$
Then, $\hat f$ is submodular, monotone increasing, and nonnegative, and has curvature at most $c$.

Proof. Because $\hat f$ is the sum of a monotone increasing, nonnegative submodular function and a nonnegative linear function, it must be monotone increasing, nonnegative, and submodular. Furthermore, for any $A \subseteq X$ and $j \notin A$, we have $\hat f_A(j) = f_A(j) + \frac{1-c}{c} p$. Thus,
$$\frac{\hat f_{X-j}(j)}{\hat f_\emptyset(j)} = \frac{f_{X-j}(j) + \frac{1-c}{c} \cdot p}{f_\emptyset(j) + \frac{1-c}{c} \cdot p} \ge \frac{\frac{1-c}{c} \cdot p}{p + \frac{1-c}{c} \cdot p} = \frac{1-c}{c} \cdot c = 1 - c,$$
and so $\hat f$ has curvature at most $c$.
Lemma A.4. Let $f$ be a monotone increasing submodular function, satisfying $f_A(j) \le p$ for all $j, A$, and let $c \in (0, 1]$. Define:
$$\hat f(A) = \frac{p}{c} \cdot |X \setminus A| - f(X \setminus A).$$
Then, $\hat f$ is supermodular, monotone decreasing, and nonnegative, and has curvature at most $c$.

Proof. Because $f$ is submodular, so is $A \mapsto f(X \setminus A)$, and hence $-f(X \setminus A)$ is supermodular. Thus, $\hat f$ is the sum of a supermodular function and a linear function and so is supermodular. In order to see that $\hat f$ is decreasing, we consider the marginal $\hat f_A(j)$, which is equal to
$$\frac{p}{c} \cdot |X \setminus (A+j)| - f(X \setminus (A+j)) - \frac{p}{c} \cdot |X \setminus A| + f(X \setminus A) = -\frac{p}{c} + f_{X \setminus (A+j)}(j) \le -\frac{p}{c} + p \le 0.$$
Additionally, we note that $\hat f(X) = 0$, and so, since $\hat f$ is decreasing, $\hat f$ must be nonnegative. Finally, we show that $\hat f$ has curvature at most $c$. We have
$$\hat f_\emptyset(j) = -\frac{p}{c} + f_{X-j}(j) \ge -\frac{p}{c}, \qquad \hat f_{X-j}(j) = -\frac{p}{c} + f_\emptyset(j) \le -\frac{p}{c} + p = -\frac{1-c}{c} \cdot p,$$
and therefore
$$\frac{\hat f_{X-j}(j)}{\hat f_\emptyset(j)} \ge \frac{\frac{1-c}{c} \cdot p}{\frac{p}{c}} = 1 - c,$$
so $\hat f$ has total curvature at most $c$.
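Finally, the curvature-reduction construction of Lemma A.3 admits the same kind of numerical confirmation. The sketch below (our illustration; $f$ is the toy coverage function from above and the choice $c = 0.6$ is arbitrary) verifies that the shifted function $\hat f$ has total curvature at most $c$ under definition (1); the construction of Lemma A.4 can be checked analogously.

    universe_sets = {1: {'a', 'x'}, 2: {'b', 'x'}, 3: {'d', 'x', 'y'}}
    w = {'a': 1.0, 'b': 2.0, 'd': 1.5, 'x': 3.0, 'y': 0.5}
    X = set(universe_sets)

    def f(S):
        covered = set().union(*(universe_sets[j] for j in S)) if S else set()
        return sum(w[u] for u in covered)

    p = max(f({j}) for j in X)          # by submodularity, p bounds every marginal f_A(j)
    c = 0.6                             # target curvature bound, any c in (0, 1]

    def fhat(A):                        # Lemma A.3: add a linear term to flatten curvature
        return f(A) + (1 - c) / c * len(A) * p

    def marg(S, j):
        return fhat(S | {j}) - fhat(S)

    c_hat = 1 - min(marg(X - {j}, j) / marg(set(), j) for j in X)
    assert c_hat <= c + 1e-9
    print(c_hat, c)                     # the total curvature of fhat is at most c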