[PDF] Beyond Submodular Maximization via One-Sided Smoothness

Abstract

The multilinear framework has achieved the breakthrough 1−1/e approximation for maximizing a monotone submodular function subject to a matroid constraint. This framework has a continuous optimization part and a rounding part. We extend both parts to a wider array of problems. In particular, we make a conceptual contribution by identifying a family of parameterized functions. As a running example we focus on solving diversity problems maxf(S)= 1 2 ∑ i,j∈A A ij :S∈M , where M is a matroid. These diversity functions have A ij ≥0 as a measure of dissimilarity of i,j , and A has 0 -diagonal. The multilinear framework cannot be directly applied to the multilinear extension of such functions. We introduce a new parameter for functions F∈ C 2 which measures the approximability of the associated problem max{F(x):x∈P} , for solvable downwards-closed polytopes P . A function F is called one-sided σ -smooth if 1 2 u T ∇ 2 F(x)u≤σ⋅ ||u| | 1 ||x| | 1 u T ∇F(x) for all u,x≥0 , x≠0 . We give an Ω(1/σ) -approximation for the maximization problem of monotone, normalized one-sided σ -smooth F with an additional property: non-positive third order partial derivatives. Using the multilinear framework and new matroid rounding techniques for quadratic objectives, we give an Ω(1/ σ 3/2 ) -approximation for maximizing a σ -semi-metric diversity function subject to matroid constraint. This improves upon the previous best bound of Ω(1/ σ 2 ) and we give evidence that it may be tight. For general one-sided smooth functions, we show the continuous process gives an Ω(1/ 3 2σ ) -approximation, independent of n . In this setting, by discretizing, we present a poly-time algorithm for multilinear one-sided σ -smooth functions.

Full PDF

aa r X i v : . [ c s . D S ] A p r Beyond Submodular Maximization

Mehrdad Ghadiri ∗ Richard Santiago † Bruce Shepherd ‡ Abstract

While there are well-developed tools for maximizing a submodular function f ( S ) subjectto a matroid constraint S ∈ M , there is much less work on the corresponding supermodularmaximization problems. We develop new techniques for attacking these problems inspired by thecontinuous greedy method applied to the multi-linear extension of a submodular function. Weﬁrst adapt the continuous greedy algorithm to work for general twice-continuously diﬀerentiablefunctions. Reminiscent of how the Lipschitz constant governs the convergence rate in convexoptimization, the performance of the adapted algorithm depends on a new smoothness parameter.If F : [0 , n → R ≥ is one-sided σ -smooth, then it yields an approximation factor dependingonly on σ . We apply the new algorithm to a broad class of quadratic supermodular functionsarising in diversity maximization. The case σ = 2 captures metric diversity maximization andgeneral σ includes the densest subgraph problem. We also develop new methods for roundingquadratics over a matroid polytope. These are based on extensions to swap rounding andapproximate integer decomposition. Together with the adapted continuous greedy this leads toa O ( σ / ) -approximation. This is the best asymptotic approximation known for this class ofdiversity maximization and we give some evidence for why we believe it may be tight.We then consider general (non-quadratic) functions. We give a broad parameterized familyof monotone functions which include submodular functions and the just-discussed supermodularfamily of discrete quadratics. The new family is deﬁned by restricting the one-sided smoothnesscondition to the boolean hypercube; such set functions are called γ -meta-submodular . We de-velop local search algorithms with approximation factors that depend only on γ . We show thatthe γ -meta-submodular families include well-known function classes including meta-submodularfunctions ( γ = 0 ), proportionally submodular ( γ = 1 ), and diversity functions based on negative-type distances or Jensen-Shannon divergence (both γ = 2 ) and (semi-)metric diversity functions. ∗ [email protected] , University of British Columbia, Vancouver, Canada. † [email protected] , McGill University, Montreal, Canada. ‡ [email protected] , University of British Columbia, Vancouver, Canada. Introduction

In the past decade, the catalogue of algorithms available to combinatorial optimizers has beensubstantially extended to new settings which allow submodular objective functions. For instance,while classical work [42, 43, 25] already established a -approximation for maximizing a non-negativemonotone submodular function subject to a matroid constraint, it was not until recently when thework from [49, 12] achieved a tight (1 − e )-approximation for this problem. The latter required thedevelopment of new continuous optimization machinery for the associated multi-linear relaxation.These developments in submodular maximization were occurring at the same time that researchersfound a wealth of new applications for these models [33, 39, 10, 36, 30, 40, 47, 41, 44, 18].The related supermodular maximization models (submodular minimization) also oﬀer an abun-dance of applications, but they appeared to be highly intractable even under simple cardinalityconstraints [48]. One exception came from a speciﬁc model for diversity maximization. Given aset function f ( S ) which measures the ‘diversity’ amongst elements of a set S , a problem of broadinterest is to ﬁnd a set S of maximum diversity subject to a prescribed bound on its cardinality | S | ≤ k , or more generally, subject to a matroid M constraint: ( DivM ax ) max { f ( S ) : S ∈ M} . One class of diversity functions that has wide applications in machine learning are the so-called remote-clique functions [1, 51, 26]. These are based on having a dis-similarity measure d ( u, v ) between each pair of objects u, v in the ground set. The corresponding max-sum problem is thento maximize f ( S ) := P u,v ∈ S A ( u, v ) [37, 15]. If A ( u, v ) ≥ , then one easily checks that f issupermodular. We sometimes abuse nomenclature and conﬂate A with its associated diversityfunction f . These functions are essentially a special case of what we term discrete quadratic functions .Namely, a function which is the restriction of a quadratic x T Ax + b T x to the boolean hypercube.Throughout we assume that b ≥ (all our functions are non-negative) and the associated matrix issymmetric, non-negative and has diagonal (so the quadratic is multi-linear).Even for the subclass of discrete quadratic diversity functions, the problem DivMax is ostensiblyintractable in the sense that it includes the densest subgraph problem [9]. However, for metricdiversity functions (remote-clique function when A forms a metric), there is a -approximationsubject to a cardinality constraint [45, 27]. Moreover, this has been generalized to the case of matroidconstraint [1, 9]. Borodin et al. [7, 8] introduced the class of proportionally submodular (monotone)functions which include these metric diversity functions as well as monotone submodular functions.They give a . -approximation for maximizing these functions subject to a matroid constraint.The weaker notion of σ -semi-metric (that is, satisfying a σ -approximate triangle inequality for σ ≥ ) is considered in [50]. They provide a σ -approximation under a cardinality constraint and a σ -approximation under a matroid constraint.The preceding results motivate the key impetus for our work, namely, to explain and explore thereasons for the fortunate cases when supermodular maximization is actually tractable. We argue thata one-sided smoothness parameter governs the degree to which we can approximate these problems.Two driving questions become: ( div+ ) Find a parameterized family of supermodular functionswhich contains metric, and more generally σ -semi-metric, diversity functions and remains tractablein terms of σ . A second motivating question is ( sub+div ) to ﬁnd a parameterized tractable familyof monotone set functions which includes all monotone submodular functions and the aforementioneddiversity functions. Our Results

In 1978 Fisher et al. [42, 43, 25] gave a 1/2-approximation for max { f ( S ) : S ∈ M} where M isa matroid and f is non-negative monotone submodular. In the special case of uniform matroids, M = { S : | S | ≤ k } , they gave a, provably tight, (1 − /e ) -approximation. Whether this ratio couldbe achieved for general matroids remained open for 35 years. Partly motivated by interest in thesubmodular welfare problem, Calinescu, Chekuri, Pál and Vondrak [49, 12] gave such a (1 − /e ) -approximation algorithm. This was based on a new (non-convex) relaxation followed by an elegantapplication of lossless pipage rounding of the fractional solution to a vertex of the matroid polytope.We examine both phases of their framework for clues to the question ( div+ ) on supermodularmaximization.At the heart of their approach is the problem of maximizing the multi-linear extension of asubmodular set function over a downwards-closed polytope. Submodularity in this context ensuressome nice properties for the multi-linear extension. For instance, concavity along a direction d ≥ is used to bound a Taylor series expansion in the continuous greedy analysis [49]. Since non-submodular multi-linear extensions will not have this concavity property, we propose a “smoothness”condition which guarantees an alternative bound based on Taylor series. A continuously twicediﬀerentiable function F : [0 , n → R is called one-sided σ -smooth at x = ~ if for any u ∈ [0 , n u T ∇ F ( x ) u ≤ σ · || u || || x || u T ∇ F ( x ) . We call such a function F one-sided σ -smooth if it is σ -smooth at any non-zero point of its domain.As we see, approximation algorithms exist for maximizing these nonlinear functions due to a boundon their second derivatives in terms of their gradient. This is the essential ingredient in several ofthe main results - see Lemma 3.We give an adaptation of continuous greedy which yields approximation factors that are upper-bounded by a function of the smoothness parameter σ . These results are used in a 2-phase (relaxand round) algorithm for maximizing a discrete quadratic function. Interestingly, however, one-sided smoothness also plays a role in the analysis of a local search algorithm discussed in the nextsection. Theorem 1 (Maximizing a One-Sided Smooth Function over Downwards-Closed Polytope) . Let F :[0 , n → R ≥ be a monotone one-sided σ -smooth function, and P ⊆ [0 , n be a polytime separable,downward-closed polytope. If we run the jump-start continuous greedy process (Algorithm 1) with c = 1 / , then x (1) ∈ P and F ( x (1)) ≥ [1 − exp( − (0 . / σ )] · OP T ≥ . σ +0 . · OP T where

OP T =max( F ( x ) : x ∈ P ) . In the above result, the one-sided smoothness parameter σ governs the performance ratio ofcontinuous greedy. This is somewhat reminiscent of how convergence rates in convex minimizationcan be tied to Lipschitz constants. As with Lipschitz conditions, we can improve our performanceratios by requiring smoothness on higher derivatives. This is encapsulated in the following, whichshows that if the partials ∇ i F are one-sided -smooth, then the approximation ratio improves tolinear. Theorem 2.

Let F : [0 , n → R ≥ be a monotone one-sided σ -smooth function with non-positivethird order partial derivatives. Let c ∈ (0 , and P be a polytime separable, downward-closed,polytope. If we run the jump-start continuous greedy process (Algorithm 1) with c = 1 / , then x (1) ∈ P and F ( x (1)) ≥ [1 − exp ( − σ +2 )] · OP T ≥ σ +3 · OP T , where

OP T := max { F ( x ) : x ∈ P } .

3y standard techniques (see [49, 12]) one may discretize the continuous greedy process to obtain aﬁnite algorithm, which deviates from the above guarantees by a o (1) additive error.We now return to the discrete setting and question ( div+ ). We focus on the tractability of thefollowing class of supermodular functions: f ( S ) = P { u,v }⊆ S A ( u, v ) + P v ∈ S b ( v ) , where A, b ≥ and A is a symmetric -diagonal matrix.This class is of interest for a variety of reasons. First, it is a natural family since these arejust restrictions to the hypercube of quadratic forms x T Ax + b T x . This family also coincideswith the class of second-order-modular functions introduced in [38] (see Lemma 5 in Appendix C.1).Second, in the special case when b = 0 and A ( u, v ) forms a metric, this class corresponds to metricdiversity functions and, as pointed out, the maximization problem over a matroid constraint has a2-approximation [9, 1]. Third, discrete quadratics have interesting behaviour with respect to theirone-sided smoothness. The previous mentioned metric diversity functions have one-sided smoothness σ = 2 . If A is a negative type distance , then the corresponding problems have been shown to admita PTAS [13, 14]. Another well-known distance measure is the Jensen-Shannon divergence used tomeasure dis-similarity of two probability distributions. Both JS and negative-type distances haveassociated smoothness parameter σ = 4 (Propositions 5, 6 in Appendix C).For general σ ≥ , let O σ denote the family of discrete quadratic functions which are one-sided σ -smooth. One may show (Proposition 7 in Appendix C.1) that O σ includes functions which aredetermined by a matrix A which is a σ -semi-metric . That is, A ( u, v ) ’s satisfy a σ -approximate trian-gle inequality - in this case, we refer to the associated discrete quadratic as a σ -semi-metric function .Generalizing the metric case, these semi-metric diversity problems admit a σ -approximation formatroid constraint (and a σ -approximation subject to a cardinality constraint) [50]. The nextresult, that relies on the hardness of the planted clique problem [3], shows that the approximationguarantee necessarily degrades as the smoothness parameter grows - see Appendix C.2 for the proof. Theorem 3.

There is an O ( σ / ) -approximation algorithm for maximizing f ∈ O σ over a matroid. This result is proved as follows. First, if f ∈ O σ , then one may show that its multi-linearrelaxation F satisﬁes ∇ F = A , and thus ∇ F = 0 . Hence Theorem 2 gives an O ( σ ) -approximationfor continuous greedy applied to the multi-linear relaxation. We then show that for any matroidpolytope P M and fractional x ∗ ∈ P M , we may round to an integral vector in P M with a loss of atmost O ( √ σ ) . The two phases together give the desired O ( σ / ) bound for general matroids and, asdiscussed later, a tight O ( σ ) bound for uniform matroids.A key obvious ingredient is to bound the rounding phase. This is non-standard since we aredealing with quadratic objectives. This is achieved by two diﬀerent types of rounding. We obtaina O ( rc − ) bound by a technique inspired by approximate integer decomposition methods (here r denotes the rank of the matroid, and c is the size of its smallest circuit ). A second rounding A symmetric, -diagonal matrix A represents a negative-type distance if x T Ax ≤ for all x such that T x = 0 .These include ℓ , ℓ and Jaccard distances. A circuit in a matroid is any minimal dependent set. O (1 + σr ) on the integrality gap. This is inspired by swap roundingtechniques but requires substantial modiﬁcation. The results yield the following - see Section 5. Theorem 5 (Quadratic Integrality Gap over Matroid) . Let f ∈ O σ be a set function and F itsmulti-linear extension. Let M be a matroid of rank r , minimum circuit size c , and matroid polytope P M . Then there is a polytime algorithm which given x ∗ ∈ P M produces an integral vector I ∈ P M such that F ( x ∗ ) ≤ O (min { rc − , σr } ) f ( I ) ≤ O ( √ σ ) f ( I ) . The O ( √ σ ) bound is pessimistic for some matroids. For instance, in uniform matroids (i.e., forcardinality constraint | S | ≤ k ) we have r = k and c = k + 1 . If k ≥ , then the ﬁrst rounding boundis O (1) . Hence the algorithm gives a tight O ( σ ) -approximation in this case. These observations, andthe planted clique hardness result (Theorem 3) show that we cannot expect a o ( σ ) -approximationalgorithm even for continuous greedy applied to the multi-linear extension of f ∈ O σ .These integrality gap bounds are also tight in the sense that we give almost-matching lowerbound examples - see Proposition 1. Now consider any algorithm which maximizes the multi-linearextension of f ∈ O σ in a ﬁrst phase, and then rounds the solution to an integral point. We haveseen that the ﬁrst phase should not asymptotically beat a O ( σ ) factor and in the worst case, the(quadratic) integrality gap may be as bad as Ω( √ σ ) . Therefore, for this class of algorithms one maynaively expect a best case approximation factor O ( σ / ) . In this section we no longer restrict attention to discrete quadratic functions, and study moregeneral monotone set functions. To motivate our approach we consider the deﬁnition of one-sided σ -smoothness restricted to only integral points of a function F instead of its whole domain. Namely,for any non-empty S ⊆ [ n ] : u T ∇ F ( S ) u ≤ σ · || u || | S | u T ∇ F ( S ) . If we also limit our attention todirections u = e i + e j , the inequality becomes ∇ ij F ( S ) ≤ σ · ( ∇ i F ( S ) + ∇ j F ( S ) | S | ) . (1)Now suppose that F is the multi-linear extension of a set function f : 2 [ n ] → R ≥ , and so F ( S ) = f ( S ) . One may show [49] that ∇ i F ( S ) = f ( S + i ) − f ( S − i ) and ∇ ij F ( S ) = f ( S + i + j ) − f ( S + i − j ) − f ( S − i + j ) + f ( S − i − j ) = ∇ j ( S + i ) − ∇ j ( S − i ) . To abbreviate notation we write B i ( S ) = ∇ i F ( S ) and A ij ( S ) = ∇ ij F ( S ) and so (1) becomes: A ij ( S ) ≤ σ · ( B i ( S ) + B j ( S ) | S | ) . (2)We now call a set function f σ -meta-submodular if it satisﬁes this inequality for any S = ∅ . Onemay view this as the discrete analogue of bounding the second-order term of a Taylor series by thecorresponding ﬁrst-order term. We primarily focus on monotone functions and so we denote by G σ the family of non-negative, monotone set functions which are σ -meta-submodular. Note that sincethe B i ’s are non-negative, we then have that G σ ⊆ G σ ′ if σ < σ ′ .We ﬁrst discuss the structure around the meta-submodular family (see Fig. 1). Most importantlywith respect to ( sub+div ) is that G σ includes all monotone submodular functions and σ -semi-metricdiversity functions. More precisely, the -meta-submodular functions coincide with the class ofmeta-submodular functions deﬁned by Kleinberg et al [35] which properly includes all submodularfunctions - see Proposition 2 and 4 in Appendix B. A second property is that every proportionalsubmodular function (cf. Borodin et al [8]) is -meta-submodular (see Proposition 3 in Appendix B).Given the performance guarantees of continuous greedy for smooth functions, it is natural tostudy the smoothness of multi-linear extensions from the meta-submodular families. First, one5 upermodularMonotone Submodular Div σ -Meta-submodular σ -semi-metric proportionally submodular One-sided σ -Smooth neg Figure 1: The Meta-Submodular Familiescan show that if the multi-linear extension of a set function is one-sided σ -smooth, then the setfunction itself is σ -meta-submodular (Proposition 8 in Appendix D). The converse is not necessarilytrue however: the multi-linear extension of a σ -meta-submodular function is not always one-sided σ -smooth. Hence, we prefer to use a diﬀerent parameter γ when referring to meta-submodularity.In other words we speak of γ -meta-submodular set functions and write G γ . One may think of γ as a discrete smoothness parameter. The following result shows that a set function’s multi-linear extension is one-sided smooth whenever a stronger probabilistic version of (2) is satisﬁed (seeAppendix D for proof details). We call this the expectation inequality (3), where R ∼ x denotes arandom set that contains element i independently with probability x i . Lemma 1 (Expectation Inequality) . Let f be a non-negative, monotone set function and F be itsmulti-linear extension. Let x ∈ [0 , n and σ ≥ . If for any i, j ∈ [ n ] we have the following: E R ∼ x [ | R | ] · E R ∼ x [ A ij ( R )] ≤ σ · ( E R ∼ x [ B i ( R )] + E R ∼ x [ B j ( R )]) , (3) then F is one-sided σ -smooth at x . We have proved that this inequality holds (modulo a constant factor) in the supermodular case, i.e.for the intersection of supermodular functions and γ -meta-submodular functions (see Lemma 7 inAppendix D). This yields the following. Theorem 6.

Let f be a supermodular function in G γ and F be its multi-linear extension. Then F is one-sided (max { γ, γ + 2 } ) -smooth. We conjecture that this also holds for any γ -meta-submodular function with γ > . Conjecture 1.

Let f ∈ G γ and F be its multi-linear extension where γ > . Then F is one-sided O ( γ ) -smooth. While we do not have the continuous greedy available to us for the general family G γ , ironicallyone may use a weakened smoothness property to analyze a local search algorithm for the discreteproblem max f ( S ) : S ∈ M . The weakened property asks for f to be one-sided smooth on asubdomain which dominates some integral point S . Theorem 7.

Let f ∈ G γ and F be its multi-linear extension. Let α ≥ and S ⊆ [ n ] be non-empty.Then F is one-sided αγ -smooth on { x | x ≥ S , || x || ≤ α | S |} . This sub-domain smoothness property is used in a technical analysis to obtain the following ap-proximation factor depending only on γ . This result provides a very general answer to question( sub+div ), and for low values of γ we obtain a new tractable parameterized class of functions. Theorem 8.

Let f ∈ G γ . Then there is an O ( γ γ ) -approximation via local search for maximizing f subject to a matroid constraint.

6s with the continuous setting (Theorem 2), one can improve the performance ratios by requiringadditional (discrete smoothness) conditions on higher order (ﬁrst derivative) terms. As we have seenthe discrete analog of ∇ i F is the marginal gain set function B i ( S ) . The following result shows thatif these set functions are submodular, then the exponential factor from Theorem 8 improves toa quadratic factor. We remark that submodularity of the B i ’s is just the notion of second-order-submodularity introduced in [38], and is also equivalent to the non-positivity of the third-orderpartial derivatives of the multi-linear extension. Theorem 9.

Let f ∈ G γ such that f is also second-order-submodular (that is, f ’s marginal gainsare submodular). Let M = ([ n ] , I ) be a matroid of rank r . Then the modiﬁed local search algorithm(Algorithm 2) gives an O ( γ + γ r ) -approximation for maximizng f subject to M . In order to achieve a sub-quadratic approximation matching Theorem 4 we also require the functionto be supermodular. Moreover, the local search algorithm must be signiﬁcantly adapted and ﬁnd amaximum matching in the last step - see Algorithm 2. The full proof is included in Appendix F.4.

Theorem 10. If f ∈ G γ is also second-order-submodular and supermodular, then Algorithm 2 givesan O ( γ / ) -approximation. Let S γ denote the class of functions f ∈ G γ which are also supermodular and 2nd-order-submodular. Note that S γ properly contains the family O γ of discrete quadratic functions whichare one-sided γ -smooth. By Theorem 10 there is an O ( γ / ) -approximation factor for functions in S γ , and hence this class provides our most general answer to question ( div+ ) . Other extensions of submodular functions with respect to some sliding parameter γ (measuring howclose a set function is to being submodular) have been considered in the literature. These include theclass of γ -weakly submodular functions, introduced in [17] and further studied in [19, 34, 29, 16, 6].The class of set functions with supermodular degree d (an integer between and n − such that d = 0 if and only if f is submodular), introduced in [21] and further considered in [22, 23]. The classof ǫ -approximate submodular functions studied in [28]. The hierarchy over monotone set functionsintroduced in [20], where levels of the hierarchy correspond to the degree of complementarity in agiven function. They refer to this class as MPH (Maximum over Positive Hypergraphs), and MPH-kdenotes the k -th level in the hierarchy where ≤ k ≤ n . The highest level MPH-n of the hierarchycaptures all monotone functions, while the lowest level MPH-1 captures the class of XOS functions(which include submodular).We remark that our class of γ -meta-submodular functions diﬀers from all the above extensions,since, for instance, none of them captures the class of metric diversity functions (in the sense ofgiving a good, say O (1) , approximation) while ours does. Moreover, a discussion about the one-sidedsmoothness and Lipschitz smoothness is provided in Appendix A.1.Other adaptations of the original continuous greedy algorithm from [49] have been used indiﬀerent submodular maximization settings, including non-monotone [24] and distributed [4] maxi-mization. We use { e , . . . , e n } to denote the standard basis of R n and [ n ] := { , . . . , n } to refer to the groundset of a set function. If R ⊆ [ n ] and x = ( x , . . . , x n ) ∈ [0 , n , p x ( R ) denotes the probability ofpicking set R with respect to vector x . In other words, p x ( R ) = Q v ∈ R x v Q v ∈ [ n ] \ R (1 − x v ) .7he multi-linear extension of a set function f : 2 [ n ] → R is F : [0 , n → R , where F ( x ) = X R ⊆ [ n ] f ( R ) p x ( R ) = E R ∼ x [ f ( R )] . For a set R ⊆ [ n ] , we denote by R its characteristic vector. Given a vector x we denote its supportby supp ( x ) , i.e., the set of non-zero coordinates of x . The following lemma describes the connectionbetween the terms A ij and B i (see Appendix A for proof details). Lemma 2 (Discrete integral) . Let f : 2 [ n ] → R , i ∈ [ n ] , and R = { v , . . . , v r } ⊆ [ n ] . Moreover, let R m = { v , . . . , v m } for ≤ m ≤ r and R = ∅ . Then B i ( R ) = f ( { i } ) + P rj =1 A iv j ( R j − ) . For vector x ∈ R n and i ∈ [ n ] , we use the (somewhat unfortunate) notation x − i ∈ R n − to denotethe vector produced by eliminating the i -th coordinate of x . The following result describes a property of one-sided smoothness that plays a key role in the analysisof both our continuous and discrete (local search) algorithms.

Lemma 3.

Let x ∈ [0 , n \ { ~ } , u ∈ [0 , n and ǫ > such that x + ǫu ∈ [0 , n . Let F : [0 , n → R be a non-negative, monotone function which is one-sided σ -smooth on { y | x + ǫu ≥ y ≥ x } . Then u T ∇ F ( x + ǫu ) ≤ (cid:18) || x + ǫu || || x || (cid:19) σ ( u T ∇ F ( x )) . Proof.

Let g ( t ) := u T ∇ F ( x + tu ) . By the Chain Rule we have g ′ ( t ) = u T ∇ F ( x + tu ) u .By one-sided σ -smoothness on { y | x + ǫu ≥ y ≥ x } , for any ≤ t ≤ ǫ , g ′ ( t ) = u T ∇ F ( x + tu ) u ≤ σ || u || || x + tu || u T ∇ F ( x + tu ) = σ || u || || x + tu || g ( t ) ≤ σ || u || || x + tu || ( g ( t ) + c ) , for any c > . Therefore, using that g ( t ) + c > for all t (since g ( t ) ≥ ), we have g ′ ( t ) g ( t ) + c ≤ σ || u || || x + tu || . (4)We integrate both sides of (4) with respect to t . On the left hand side we get Z ǫ g ′ ( t ) g ( t ) + c dt = ln( g ( t ) + c ) (cid:12)(cid:12)(cid:12)(cid:12) ǫ = ln( g ( ǫ ) + cg (0) + c ) , and on the right hand side we get σ Z ǫ || u || || x + tu || dt = σ ln( || x + tu || ) (cid:12)(cid:12)(cid:12)(cid:12) ǫ = σ ln( || x + ǫu || || x || ) , where we use that || u || = P i u i = ddt P i ( x i + tu i ) = ddt || x + tu || .Therefore ln( g ( ǫ )+ cg (0)+ c ) ≤ σ ln( || x + ǫu || || x || ) , and hence g ( ǫ ) + c ≤ (cid:16) || x + ǫu || || x || (cid:17) σ ( g (0) + c ) . Since thisholds for any c > taking the limit yields the desired result.8 lgorithm 1: Jump-Start Continuous Greedy Input:

A monotone one-sided σ -smooth function F : [0 , n → R ≥ , a polytime separable downward-closedpolytope P , and c ∈ (0 , v ∗ ← arg max x ∈ P || x || x (0) ← cv ∗ v max ( x ) ← arg max v ∈ P { v T ∇ F ( x ) } for t ∈ [0 , do Solve x ′ ( t ) = (1 − c ) v max ( x ( t )) with boundary condition x (0) = cv ∗ return x (1) ; σ -Smoothness In this section, we provide an adaptation of the continuous greedy algorithm , originally introducedin [49]. Algorithm 1 is for maximizing a monotone one-sided σ -smooth function over a polytimeseparable downward-close polytope. Unlike the classical continuous greedy, our algorithm startsfrom a non-zero point, which allows us to take advantage of Lemma 3. Because of this, we call ouralgorithm jump-start continuous greedy . Theorem 1.

Let F : [0 , n → R ≥ be a monotone one-sided σ -smooth function. Let c ∈ (0 , and P be a polytime separable, downward-closed, polytope. If we run the jump-start continuousgreedy process (Algorithm 1) then x (1) ∈ P and F ( x (1)) ≥ [1 − exp ( − (1 − c )( cc +1 ) σ )] · OP T where

OP T := max { F ( x ) : x ∈ P } . The proof details are provided in Appendix E.1. Here we discuss the main idea of the proof.That is to show that moving in the v max direction guarantees a fractional progress proportional to ( cc +1 ) σ towards OP T . Let x ∗ ∈ P be such that F ( x ∗ ) = OP T . Also, let x ∈ { x ( t ) : 0 ≤ t ≤ } and u = ( x ∗ − x ) ∨ , i.e., x ∗ ∨ x = x + u (where ∨ denotes the component-wise maximum operation).We have by Taylor’s Theorem that for some ǫ ∈ (0 , : OP T ≤ F ( x ∗ ∨ x ) = F ( x ) + u T ∇ F ( x + ǫu ) ≤ F ( x ) + (cid:18) || x + ǫu || || x || (cid:19) σ u T ∇ F ( x ) , where the last inequality follows from Lemma 3. By the choice of x (0) we have that || x (0) || ≥ c || w || for any w ∈ P , and then since u ∈ P and x ( t ) is non-decreasing in each component (because v max is always non-negative) we also have || x + ǫu || || x || ≤ || x + u || || x || = 1 + || u || || x || ≤ || u || || x (0) || ≤ c = c + 1 c . By the choice of v max and above inequalities it follows that for any x ∈ { x ( t ) : 0 ≤ t ≤ } , v max ( x ) · ∇ F ( x ) ≥ u T ∇ F ( x ) ≥ (cid:16) || x + ǫu || || x || (cid:17) σ ( OP T − F ( x )) ≥ ( cc + 1 ) σ ( OP T − F ( x )) . In Proposition 12 in Appendix E we provide an explicit expression for the best value of c (interms of σ ) for Algorithm 1 when we are dealing with one-sided σ -smooth functions.As discussed in Section 2.1, Theorem 2, if F also satisﬁes the higher order smoothness conditionof having non-positive third order partial derivatives, then the approximation factor of Algorithm 1improves to O ( σ ) (proof details are provided in Appendix E). Finally, we remark that by standardtechniques (see [49, 12]) one may discretize the continuous greedy process to obtain a ﬁnite algorithm,which deviates from the above guarantees by a o (1) additive error.9 Integrality Gaps of Quadratic Objectives over Matroids

Let M = ([ n ] , I ) be a matroid and P M be its polytope. In this section we consider the integralitygap for the quadratic program: max F ( x ) : x ∈ P M . Here F is a non-negative, quadratic multi-linear function F ( x ) = x T Ax + b T x such that A, b ≥ and A is a symmetric, zero diagonalmatrix.Gaps for such quadratic programmes may be unbounded even for graphic matroids if we allowparallel edges. Fortunately these large gaps transpire due to a simple reason, namely that thematroids have very small circuits. This is encapsulated in the following integrality gap upperbound. Theorem 11.

Let F be a non-negative, quadratic multi-linear polynomial and M be a matroid withrank r and minimum circuit size c ≥ . If x ∗ ∈ P M , then there is an independent set I of M suchthat (3 + rc − ) F ( I ) ≥ F ( x ∗ ) . We actually prove the following decomposition result. For x ∗ ∈ P M , we deﬁne the coverage ofa pair u, v to be the quantity x ∗ ( u ) x ∗ ( v ) . Let Cov ∈ R ( n ) be the vector with entries Cov ( u, v ) = x ∗ ( u ) x ∗ ( v ) . As F is quadratic it is linear in these coverage values and the vector x ∗ : F ( x ∗ ) = P u = v ( A ( u,v )2 ) Cov ( u, v ) + P v b ( v ) x ∗ ( v ) . For a set X we say its coverage set is cov ( X ) = {{ u, v } : u, v ∈ X, u = v } . A quadratic coverage of x ∗ is a collection C = { I i , µ i } of weighted independent setswith properties (1) for each u = v , P i : { u,v }⊆ cov ( I i ) µ i ≥ Cov ( u, v ) , and (2) for each v , P i : I i ∋ v µ i ≥ x ∗ ( v ) . Recall that A, b ≥ . It follows that P i µ i F ( I i ) ≥ F ( x ∗ ) and hence if the size P i µ i ≤ K ,then some I i satisﬁes F ( I i ) ≥ F ( x ∗ ) K . This bound depends on the fact that entries of A are non-negative. By condition (1) of quadratic coverages, we have P i µ i cov ( I i ) ≥ Cov and by condition(2), P i µ i I i ≥ x ∗ . Therefore, for such a collection we have P i µ i F ( I i ) ≥ F ( x ∗ ) . This reasoningshows that to deduce Theorem 11, it suﬃces to ﬁnd a quadratic coverage with P i µ i ≤ (3 + rc − ) .We show how to do this in Appendix G.2, Theorem 17.Our other rounding approach (Algorithm 7 in Appendix G.3) is inspired by swap rounding, andshows an integrality gap of at most O (1 + γr ) . It starts from a convex combination x = P pk =1 λ k I k of bases and at each step it merges two of them. Given two bases I k and I m , the algorithm pickselements i ∈ I k \ I m and j ∈ I m \ I k and depending on the change of value in the objective, iteither replaces i by j in I k , or j by i in I m . This is repeated until I k and I m merge into one basis.The set of pairs used to produce the merged basis form a matching M = { ( i , j ) , . . . , ( i ℓ , j ℓ ) } . Weshow (Lemma 14) that this process reduces the objective by at most λ m λ k P ℓt =1 A ( i t , j t ) . Let B be the ﬁnal basis obtained during the merging process. We show that its overall loss is at mosthalf the weight of the maximum of the matchings encountered. We then return to the bases I k , I m corresponding to the maximum matching and do a merge on them to produce basis B ′ . The outputof the algorithm is the better of B, B ′ - see Theorem 18 in Appendix G.3.We also have an almost matching lower bound to the integrality gaps in Theorem 5. SeeAppendix G.1 for proof details. Proposition 1.

Let k, t ∈ N with ≤ t ≤ k . There exists a σ -semi-metric with multi-linearextension F , and a matroid M = ([2 k ] , I ) with rank r = k + t − and minimum circuit size c = 2 t ,where the integrality gap of F ( x ) over the matroid polytope P M is Ω(min { rc − , σr } ) . The main algorithmic result for general monotone γ -meta-submodular functions is as follows.10 heorem 8. Let f ∈ G γ and M = ([ n ] , I ) be a matroid of rank r . Let A ∈ I be an optimum set,i.e., A ∈ arg max R ∈I f ( R ) , and S ∈ I be an (1 + ǫn ) -approximate local optima, i.e., for any i and j such that S − i + j ∈ I , (1 + ǫn ) f ( S ) ≥ f ( S − i + j ) , where ǫ > is a constant. Then if γ = O ( r ) , f ( A ) ≤ O ( γ γ ) f ( S ) and if γ = ω ( r ) , f ( A ) ≤ O ( γ γ ) f ( S ) . This result does not need the last step of Algorithm 2 where we ﬁnd a maximum matching. Theanalysis relies on two technical lemmas (Lemma 8, 9 in Appendix F) that use subdomain one-sidedsmoothness to bound the second term of the Taylor series in the right hand side of the followingexpression - see Appendix F for proof details. F ( A ) ≤ F ( A ∨ S ) = F ( S ) + TA \ S ∇ F ( S + ǫ ′ A \ S ) . We discuss the runtime of Algorithm 2 in Appendix F.2.

Algorithm 2:

Local search under matroid constraint Input:

A set function f , a matroid M = ([ n ] , I ) with circuits of minimum cardinality c , and ǫ > . S ← arg max { v,v ′ }∈I f ( { v, v ′ } ) S ← a base of M that contains S while S is not an approximate local optima do Find i ∈ S and j ∈ [ n ] \ S such that S − i + j ∈ I and f ( S − i + j ) ≥ (1 + ǫn ) f ( S ) S ← S − i + j Create a complete weighted bipartite graph G with node sets S and [ n ] \ S , and edge weights w ( i, j ) := A ij ( S ) for each i ∈ S and j / ∈ S . Find a maximum weighted matching M in G of (edge)cardinality c − , and let S ′ denote the node set of M . Return arg max { f ( S ) , f ( S ′ ) } As discussed in Theorems 9 and 10 on Section 2.2, one can get improved approximation factorsby requiring additional conditions on the marginal gains of the set function f . More precisely, if f ∈ G γ has marginal gain terms B i ( S ) = f ( S + i ) − f ( S − i ) which are submodular, then Algorithm 2gives an improved O ( γ + γ r ) -approximation. If we go one step further and also require f to besupermodular, then the approximation factor becomes O (min { γ + γ r , γrc − } ) ≤ O ( γ / ) .The analysis goes as follows. By Taylor’s theorem and non-positivity of third-order partialderivatives, we have F ( A ) ≤ F ( A ∨ S ) = F ( S )+ TA \ S ∇ F ( S )+ 12 TA \ S ∇ F ( S ) A \ S ≤ F ( S )+(1+ γ | A \ S || S | ) TA \ S ∇ F ( S ) , where the last inequality follows from the deﬁnition of smoothness and Theorem 7 about subdomainsmoothness (for α = 1 ). Writing it with other notation, we have f ( A ) ≤ f ( S ) + (1 + γ | A \ S || S | ) X i ∈ A \ S B i ( S ) ≤ f ( S ) + (1 + γ ) X i ∈ A \ S B i ( S ) . Let g : A \ S → S \ A be a bijective mapping where S − g ( i ) + i ∈ M for any i . Then by the aboveinequality, Lemma 2 (discrete integral) and approximate local optimality of S , we have f ( A ) ≤ f ( S ) + (1 + γ ) X i ∈ A \ S B i ( S ) = f ( S ) + (1 + γ ) X i ∈ A \ S ( B i ( S − g ( i )) + A ig ( i ) ( S − g ( i ))) ≤ f ( S ) + (1 + γ ) X i ∈ A \ S ( B g ( i ) ( S − g ( i )) + ǫn f ( S ) + A ig ( i ) ( S − g ( i ))) ≤ O ( γ ) f ( S ) + (1 + γ ) X i ∈ A \ S A ig ( i ) ( S − g ( i ))) , P i ∈ A \ S A ig ( i ) ( S − g ( i ))) in two diﬀerent ways: First, by using the deﬁnition of meta-submodularity, we obtain an O ( γ + γ r ) -approximation. Second, by using the maximum weightedmatching (explained in Algorithm 2) with node sets S ′ and returning the better of S and S ′ , weobtain an O ( γrc − ) -approximation. We include a further discussion and proof details in Appendix F.4. References [1] Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroidconstraints. In

The 19th ACM SIGKDD International Conference on Knowledge Discovery andData Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013 , pages 32–40, 2013.[2] Alexander A Ageev and Maxim I Sviridenko. Pipage rounding: A new method of construct-ing algorithms with proven performance guarantee.

Journal of Combinatorial Optimization ,8(3):307–328, 2004.[3] Noga Alon, Sanjeev Arora, Rajsekar Manokaran, Dana Moshkovitz, and Omri Weinstein. In-approximability of densest κ -subgraph from average case hardness. Unpublished manuscript , 1,2011.[4] Rafael da Ponte Barbosa, Alina Ene, Huy L Nguyen, and Justin Ward. A new framework fordistributed submodular maximization. In , pages 645–654. Ieee, 2016.[5] Aditya Bhaskara, Mehrdad Ghadiri, Vahab S. Mirrokni, and Ola Svensson. Linear relaxationsfor ﬁnding diverse elements in metric spaces. In

Advances in Neural Information ProcessingSystems 29: Annual Conference on Neural Information Processing Systems 2016, December5-10, 2016, Barcelona, Spain , pages 4098–4106, 2016.[6] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guar-antees for greedy maximization of non-submodular functions with applications. arXiv preprintarXiv:1703.02100 , 2017.[7] Allan Borodin, Dai Le, and Yuli Ye. Weakly submodular functions.

CoRR , abs/1401.6697,2014.[8] Allan Borodin, Dai Le, and Yuli Ye. A proportionally submodular functions. , 2015.[9] Allan Borodin, Hyun Chul Lee, and Yuli Ye. Max-sum diversiﬁcation, monotone submodularfunctions and dynamic updates. In

Proceedings of the 31st ACM SIGMOD-SIGACT-SIGARTSymposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24,2012 , pages 155–166, 2012.[10] Yuri Y Boykov and M-P Jolly. Interactive graph cuts for optimal boundary & region segmen-tation of objects in nd images. In

Computer Vision, 2001. ICCV 2001. Proceedings. EighthIEEE International Conference on , volume 1, pages 105–112. IEEE, 2001.[11] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a submodularset function subject to a matroid constraint. In

International Conference on Integer Program-ming and Combinatorial Optimization , pages 182–196. Springer, 2007.1212] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotonesubmodular function subject to a matroid constraint.

SIAM Journal on Computing , 40(6):1740–1766, 2011.[13] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Max-sum diversity via convexprogramming. In , pages 26:1–26:14, 2016.[14] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Local search for max-sum diver-siﬁcation. In

Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on DiscreteAlgorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19 , pages 130–142,2017.[15] Barun Chandra and Magnús M. Halldórsson. Facility dispersion and remote subgraphs. In

Algorithm Theory - SWAT ’96, 5th Scandinavian Workshop on Algorithm Theory, Reykjavík,Iceland, July 3-5, 1996, Proceedings , pages 53–65, 1996.[16] Lin Chen, Moran Feldman, and Amin Karbasi. Weakly submodular maximization beyondcardinality constraints: Does randomization help greedy? arXiv preprint arXiv:1707.04347 ,2017.[17] Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subsetselection, sparse approximation and dictionary selection. arXiv preprint arXiv:1102.3975 , 2011.[18] Debadeepta Dey, Tian Yu Liu, Martial Hebert, and J Andrew Bagnell. Contextual sequenceprediction with application to control library optimization. 2012.[19] Ethan Elenberg, Alexandros G Dimakis, Moran Feldman, and Amin Karbasi. Streaming weaksubmodularity: Interpreting neural networks on the ﬂy. In

Advances in Neural InformationProcessing Systems , pages 4044–4054, 2017.[20] Uriel Feige, Michal Feldman, Nicole Immorlica, Rani Izsak, Brendan Lucier, and Vasilis Syrgka-nis. A unifying hierarchy of valuations with complements and substitutes. In

Proceedings of theTwenty-Ninth AAAI Conference on Artiﬁcial Intelligence, January 25-30, 2015, Austin, Texas,USA. , pages 872–878, 2015.[21] Uriel Feige and Rani Izsak. Welfare maximization and the supermodular degree. In

Proceedingsof the 4th conference on Innovations in Theoretical Computer Science , pages 247–256. ACM,2013.[22] Moran Feldman and Rani Izsak. Constrained monotone function maximization and the super-modular degree. arXiv preprint arXiv:1407.6328 , 2014.[23] Moran Feldman and Rani Izsak. Building a good team: Secretary problems and the supermod-ular degree. In

Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 1651–1670. SIAM, 2017.[24] Moran Feldman, Joseph Naor, and Roy Schwartz. A uniﬁed continuous greedy algorithmfor submodular maximization. In , pages 570–579. IEEE, 2011.[25] Marshall L Fisher, George L Nemhauser, and Laurence A Wolsey.

An analysis of approxima-tions for maximizing submodular set functions-II . Springer, 1978.1326] Mehrdad Ghadiri and Mark Schmidt. Distributed maximization of submodular plus diversityfunctions for multi-label feature selection on huge datasets.

CoRR , abs/1903.08351, 2019.[27] R Hassin, S Rubinstein, and A Tamir. Notes on dispersion problems.

Unpublished manuscript ,1994.[28] Thibaut Horel and Yaron Singer. Maximization of approximately submodular functions. In

Advances in Neural Information Processing Systems , pages 3045–3053, 2016.[29] Hanzhang Hu, Alexander Grubb, J Andrew Bagnell, and Martial Hebert. Eﬃcient featuregroup sequencing for anytime linear prediction. arXiv preprint arXiv:1409.5495 , 2014.[30] Stefanie Jegelka and Jeﬀ Bilmes. Submodularity beyond submodular energies: coupling edgesin graph cuts. In

Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conferenceon , pages 1897–1904. IEEE, 2011.[31] Mark Jerrum. Large cliques elude the metropolis process.

Random Struct. Algorithms , 3(4):347–360, 1992.[32] R Karp. The probabilistic analysis of some combinatorial search problems, algorithms andcomplexity (proc. sympos., carnegie-mellon univ., pittsburgh, pa. 1976), 1-19, 1976.[33] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of inﬂuence through a so-cial network. In

Proceedings of the ninth ACM SIGKDD international conference on Knowledgediscovery and data mining , pages 137–146. ACM, 2003.[34] Rajiv Khanna, Ethan Elenberg, Alexandros G Dimakis, Sahand Negahban, and Joydeep Ghosh.Scalable greedy feature selection via weak submodularity. arXiv preprint arXiv:1703.02723 ,2017.[35] Jon M. Kleinberg, Christos H. Papadimitriou, and Prabhakar Raghavan. Segmentation prob-lems. In

Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing,Dallas, Texas, USA, May 23-26, 1998 , pages 473–482, 1998.[36] Pushmeet Kohli, M Pawan Kumar, and Philip HS Torr. P & beyond: Move making algo-rithms for solving higher order functions. Pattern Analysis and Machine Intelligence, IEEETransactions on , 31(9):1645–1656, 2009.[37] Guy Kortsarz and David Peleg. On choosing a dense subgraph (extended abstract). In , pages 692–701, 1993.[38] Nitish Korula, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Online submodular welfaremaximization: Greedy beats 1/2 in random order. In

Proceedings of the Forty-Seventh AnnualACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17,2015 , pages 889–898, 2015.[39] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodularfunctions. In

AAAI , volume 7, pages 1650–1654, 2007.[40] Hui Lin and Jeﬀ Bilmes. A class of submodular functions for document summarization. In

Pro-ceedings of the 49th Annual Meeting of the Association for Computational Linguistics: HumanLanguage Technologies-Volume 1 , pages 510–520. Association for Computational Linguistics,2011. 1441] Yuzong Liu, Kai Wei, Katrin Kirchhoﬀ, Yisong Song, and Jeﬀ Bilmes. Submodular featureselection for high-dimensional acoustic score spaces. In

Acoustics, Speech and Signal Processing(ICASSP), 2013 IEEE International Conference on , pages 7184–7188. IEEE, 2013.[42] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approxima-tions for maximizing submodular set functions - i.

Mathematical Programming , 14(1):265–294,1978.[43] George L Nemhauser and Leonard A Wolsey. Best algorithms for approximating the maximumof a submodular set function.

Mathematics of operations research , 3(3):177–188, 1978.[44] Adarsh Prasad, Stefanie Jegelka, and Dhruv Batra. Submodular meets structured: Findingdiverse subsets in exponentially-large structured item sets. In

Advances in Neural InformationProcessing Systems , pages 2645–2653, 2014.[45] S. S. Ravi, Daniel J. Rosenkrantz, and Giri Kumar Tayi. Heuristic and special case algorithmsfor dispersion problems.

Operations Research , 42(2):299–310, 1994.[46] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in stronglypolynomial time.

Journal of Combinatorial Theory, Series B , 80(2):346–355, 2000.[47] Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular func-tions. In

Advances in Neural Information Processing Systems , pages 1577–1584, 2009.[48] Zoya Svitkina and Lisa Fleischer. Submodular approximation: Sampling-based algorithms andlower bounds.

SIAM Journal on Computing , 40(6):1715–1737, 2011.[49] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oraclemodel. In

Proceedings of the fortieth annual ACM symposium on Theory of computing , pages67–74. ACM, 2008.[50] Sepehr Abbasi Zadeh and Mehrdad Ghadiri. Max-sum diversiﬁcation, monotone submodularfunctions and semi-metric spaces.

CoRR , abs/1511.02402, 2015.[51] Sepehr Abbasi Zadeh, Mehrdad Ghadiri, Vahab S. Mirrokni, and Morteza Zadimoghaddam.Scalable feature selection via distributed diversity maximization. In

Proceedings of the Thirty-First AAAI Conference on Artiﬁcial Intelligence, February 4-9, 2017, San Francisco, California,USA. , pages 2876–2883, 2017.

A Appendix: Preliminaries

The following result describes the connection between the terms A ij and B i . One can see it as adiscrete integral formula. Lemma 2.

Let f : 2 [ n ] → R , i ∈ [ n ] , and R = { v , . . . , v r } ⊆ [ n ] . Moreover, let R m = { v , . . . , v m } for ≤ m ≤ r and R = ∅ . Then B i ( R ) = f ( { i } ) + r X j =1 A iv j ( R j − ) . roof. First, we consider the case where i / ∈ R . Then B i ( R ) = f ( R + i ) − f ( R ) and the right handside is equal to f ( R r − + i + v r ) − f ( R r − − i + v r ) − f ( R r − + i − v r ) + f ( R r − − i − v r )+ f ( R r − + i + v r − ) − f ( R r − − i + v r − ) − f ( R r − + i − v r − ) + f ( R r − − i − v r − )+ · · · + f ( R + i + v ) − f ( R − i + v ) − f ( R + i − v ) + f ( R − i − v )+ f ( R + i + v ) − f ( R − i + v ) − f ( R + i − v ) + f ( R − i − v )+ f ( { i } )= f ( R + i ) − f ( R ) − f ( R r − + i ) + f ( R r − )+ f ( R r − + i ) − f ( R r − ) − f ( R r − + i ) + f ( R r − )+ · · · + f ( R + i ) − f ( R ) − f ( R + i ) + f ( R )+ f ( R + i ) − f ( R ) − f ( R + i ) + f ( R )+ f ( { i } )= f ( R + i ) − f ( R ) The last equality holds because the third and the fourth elements of each line cancel out the ﬁrstand the second element of the next line (except for the last two lines), respectively. For the last twolines, note that f ( R ) = f ( ∅ ) = 0 and f ( R + i ) = f ( { i } ) .Now, we consider the case that i ∈ R . Let i = v j . Then B i ( R ) = f ( R ) − f ( R − i ) and the right16and side is equal to f ( R r − + i + v r ) − f ( R r − − i + v r ) − f ( R r − + i − v r ) + f ( R r − − i − v r )+ f ( R r − + i + v r − ) − f ( R r − − i + v r − ) − f ( R r − + i − v r − ) + f ( R r − − i − v r − )+ · · · + f ( R j + i + v j +1 ) − f ( R j − i + v j +1 ) − f ( R j + i − v j +1 ) + f ( R j − i − v j +1 )+ f ( R j − + i + v j ) − f ( R j − − i + v j ) − f ( R j − + i − v j ) + f ( R j − − i − v j )+ f ( R j − + i + v j − ) − f ( R j − − i + v j − ) − f ( R j − + i − v j − ) + f ( R j − − i − v j − )+ · · · + f ( R + i + v ) − f ( R − i + v ) − f ( R + i − v ) + f ( R − i − v )+ f ( R + i + v ) − f ( R − i + v ) − f ( R + i − v ) + f ( R − i − v )+ f ( { i } )= f ( R ) − f ( R − i ) − f ( R r − ) + f ( R r − − i )+ f ( R r − ) − f ( R r − − i ) − f ( R r − ) + f ( R r − − i )+ · · · + f ( R j +1 ) − f ( R j +1 − i ) − f ( R j ) + f ( R j − )+ f ( R j ) − f ( R j ) − f ( R j − ) + f ( R j − )+ f ( R j ) − f ( R j − ) − f ( R j − + i ) + f ( R j − )+ · · · + f ( R + i ) − f ( R ) − f ( R + i ) + f ( R )+ f ( R + i ) − f ( R ) − f ( R + i ) + f ( R )+ f ( { i } )= f ( R ) − f ( R − i ) . Like before the last equality holds because the last two terms of each line cancels out the ﬁrst twoterms of the next line except for the last two lines, the ﬁrst f ( R j ) line and the f ( R j +1 ) line. Theterms of the ﬁrst f ( R j ) line cancel each other out, while the last two terms of the f ( R j +1 ) linecancel the ﬁrst two terms of the second f ( R j ) line.The following result connects the ﬁrst order diﬀerence ( B i ) and the second order diﬀerence ( A ij )to the ﬁrst and the second order partial derivatives of the multi-linear extension of a set function. Lemma 4 ([49]) . Let f be a set function and F its multi-linear function. Then for any x =( x , . . . , x n ) ∈ [0 , n and i, j ∈ [ n ] , ∇ i F ( x ) = E R ∼ x [ B i ( R )] = X R ⊆ [ n ] B i ( R ) p x ( R ) = X R ⊆ [ n ] − i [ f ( R + i ) − f ( R )] Y v ∈ R x v Y v ∈ [ n ] \ ( R + i ) (1 − x v ) , and, ∇ ij F ( x ) = E R ∼ x [ A ij ( R )] = X R ⊆ [ n ] A ij ( R ) p x ( R )= X R ⊆ [ n ] − i − j [ f ( R + i + j ) − f ( R + i ) − f ( R + j ) + f ( R )] Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v ) . roof. First of all, note that if i / ∈ R then B i ( R + i ) = B i ( R ) . Now, we write the multi-linearfunction F ( x ) = X R ⊆ [ n ] f ( R ) Y v ∈ R x v Y v ∈ [ n ] \ R (1 − x v )= X R ⊆ [ n ] − i ( f ( R + i ) x i + f ( R )(1 − x i )) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i ) (1 − x v ) . Therefore ∇ i F ( x ) = X R ⊆ [ n ] − i ( f ( R + i ) − f ( R )) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i ) (1 − x v )= x i X R ⊆ [ n ] − i ( f ( R + i ) − f ( R )) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i ) (1 − x v )+ (1 − x i ) X R ⊆ [ n ] − i ( f ( R + i ) − f ( R )) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i ) (1 − x v )= X R ⊆ [ n ] − i ( f ( R + i ) − f ( R )) Y v ∈ R + i x v Y v ∈ [ n ] \ ( R + i ) (1 − x v )+ X R ⊆ [ n ] − i ( f ( R + i ) − f ( R )) Y v ∈ R x v Y v ∈ [ n ] \ R (1 − x v )= X R ⊆ [ n ] − i B i ( R + i ) p x ( R + i ) + X R ⊆ [ n ] − i B i ( R ) p x ( R )= X R ⊆ [ n ] B i ( R ) p x ( R ) . Now, to prove the other part of the lemma, we write the multi-linear function again. F ( x ) = X R ⊆ [ n ] f ( R ) Y v ∈ R x v Y v ∈ [ n ] \ R (1 − x v )= x i x j X R ⊆ [ n ] − i − j f ( R + i + j ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ x i (1 − x j ) X R ⊆ [ n ] − i − j f ( R + i ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ (1 − x i ) x j X R ⊆ [ n ] − i − j f ( R + j ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ (1 − x i )(1 − x j ) X R ⊆ [ n ] − i − j f ( R ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v ) . Therefore, by using the fact that x i x j + (1 − x i ) x j + x i (1 − x j ) + (1 − x i )(1 − x j ) = 1 , and A ij ( R + i + j ) = A ij ( R + i ) = A ij ( R + j ) = A ij ( R ) = f ( R + i + j ) − f ( R + i ) − f ( R + j ) + f ( R ) R ⊆ [ n ] − i − j , we have ∇ ij F ( x ) = X R ⊆ [ n ] − i − j ( f ( R + i + j ) − f ( R + i ) − f ( R + j ) + f ( R )) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )= x i x j X R ⊆ [ n ] − i − j A ij ( R + i + j ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ (1 − x i ) x j X R ⊆ [ n ] − i − j A ij ( R + j ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ x i (1 − x j ) X R ⊆ [ n ] − i − j A ij ( R + i ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )+ (1 − x i )(1 − x j ) X R ⊆ [ n ] − i − j A ij ( R ) Y v ∈ R x v Y v ∈ [ n ] \ ( R + i + j ) (1 − x v )= X R ⊆ [ n ] − i − j A ij ( R + i + j ) Y v ∈ R + i + j x v Y v ∈ [ n ] \ R (1 − x v )+ X R ⊆ [ n ] − i − j A ij ( R + j ) Y v ∈ R + j x v Y v ∈ [ n ] \ ( R + i ) (1 − x v )+ X R ⊆ [ n ] − i − j A ij ( R + i ) Y v ∈ R + i x v Y v ∈ V \ ( R + i ) (1 − x v )+ X R ⊆ [ n ] − i − j A ij ( R ) Y v ∈ R x v Y v ∈ [ n ] \ R (1 − x v )= X R ⊆ [ n ] − i − j A ij ( R + i + j ) p x ( R + i + j )+ X R ⊆ [ n ] − i − j A ij ( R + j ) p x ( R + j )+ X R ⊆ [ n ] − i − j A ij ( R + i ) p x ( R + i )+ X R ⊆ [ n ] − i − j A ij ( R ) p x ( R )= X R ⊆ [ n ] A ij ( R ) p x ( R ) . A.1 One-Sided Smoothness versus Lipschitz Smoothness

Lipschitz smoothness is an important, widely-used property in convex optimization and machinelearning. One-sided σ -smoothness is diﬀerent from Lipschitz smoothness (and other smoothnessnotions based on Holder’s or uniform continuity) and we believe it may also have applications tothese areas.A diﬀerentiable function is Lipschitz smooth if its gradient is Lipschitz continuous. In other words, f is Lipschitz smooth if there exists L ≥ such that for any x and y , ||∇ f ( x ) −∇ f ( y ) || ≤ L || x − y || or equivalently for twice diﬀerentiable functions, u T ∇ f ( x ) u ≤ L || u || . We then call f L -Lipschitzsmooth . One could deﬁne the one-sided version of this smoothness if the above inequality holdsfor any x ≤ y (second deﬁnition/inequality holds for any u ≥ ~ ). With this deﬁnition, it is easy19o see that submodular functions are one-sided -Lipschitz smooth. On the other hand one-sided σ -smoothness is not equivalent to one-sided L -Lipschitz smoothness. To see an important diﬀerence,consider g = cf function where c is a constant and f is one-sided smooth. We have ∇ g ( x ) = c ∇ f ( x ) .Thus if f is one-sided L -Lipschitz smooth we may only assert that g is one-sided cL -Lipschitz smooth.In particular, Lipschitz smoothness is not closed under multiplication. On the other hand, the one-sided σ -smooth functions form a cone. Intuitively, the reason is that in σ -smooth functions, theratio of the gradients is bounded (as shown in Lemma 3) unlike Lipschitz smoothness where thediﬀerence of the gradients is bounded. B Appendix: Meta-Submodular Family

In this section, we discuss the meta-submodularity parameter of the class of meta-submodularfunctions (deﬁned by Kleinberg et al. [35]) and the class of proportionally submodular functions(deﬁned by Borodin et al. [8]).

Proposition 2. f is -meta-submodular if and only if it is meta-submodular (by Kleinberg et al.deﬁnition [35]).Proof. Kleinberg et al [35] show that a set function f is meta-submodular if and only if f ( S + i ) − f ( S ) ≥ f ( T + i ) − f ( T ) , ∀ ∅ = S ⊆ T, ∀ i / ∈ T. The above is clearly equivalent to f ( S + i ) − f ( S ) ≥ f ( S + j + i ) − f ( S + j ) , ∀ S = ∅ , ∀ i = j / ∈ S. (5)Then f is 0-meta submodular ⇐⇒ A ij ( S ) ≤ , ∀ S = ∅ , ∀ i, j ∈ V ⇐⇒ f ( S + i + j ) − f ( S + i ) − f ( S + j ) + f ( S ) ≤ , ∀ S = ∅ , ∀ i, j ∈ V ⇐⇒ f ( S + i ) − f ( S ) ≥ f ( S + j + i ) − f ( S + j ) , ∀ S = ∅ , ∀ i, j ∈ V ⇐⇒ f ( S + i ) − f ( S ) ≥ f ( S + j + i ) − f ( S + j ) , ∀ S = ∅ , ∀ i = j / ∈ S ⇐⇒ (5) holds . Proposition 3.

Any monotone propotionally submodular function is -meta-submodular.Proof. The proof is by case analysis. • If i, j / ∈ R then using weak submodularity property we have ( | R | + 2) f ( R ) + ( | R | ) f ( R + i + j ) ≤ ( | R | + 1) f ( R + i ) + ( | R | + 1) f ( R + j ) , which means | R | · ( f ( R ) + f ( R + i + j ) − f ( R + i ) − f ( R + j )) ≤ f ( R + i ) + f ( R + j ) − f ( R ) . f ( R + i + j ) − f ( R + i − j ) − f ( R + j − i ) + f ( R − i − j )= f ( R + i + j ) − f ( R + i ) − f ( R + j ) + f ( R ) ≤ f ( R + i ) − f ( R ) + f ( R + j ) − f ( R ) | R | = f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − j ) | R | . • If i, j ∈ R then using weak submodularity property we have ( | R | − f ( R ) + ( | R | ) f ( R − i − j ) ≤ ( | R | − f ( R − i ) + ( | R | − f ( R − j ) , which means | R | · ( f ( R ) + f ( R − i − j ) − f ( R − i ) − f ( R − j )) ≤ f ( R ) − f ( R − i ) − f ( R − j ) . Hence f ( R + i + j ) − f ( R + i − j ) − f ( R + j − i ) + f ( R − i − j )= f ( R ) − f ( R − j ) − f ( R − i ) + f ( R − i − j ) ≤ f ( R ) − f ( R − i ) + f ( R ) − f ( R − j ) | R | = f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − j ) | R | . • If i ∈ R and j / ∈ R then using weak submodularity property we have ( | R | − f ( R + j ) + ( | R | + 1) f ( R − i ) ≤ ( | R | ) f ( R ) + ( | R | ) f ( R + j − i ) , which means | R | · ( f ( R + j ) + f ( R − i ) − f ( R ) − f ( R + j − i )) ≤ f ( R + j ) − f ( R − i )= f ( R + j ) − f ( R − j ) + f ( R + i ) − f ( R − i ) , where the equality is correct because f ( R ) = f ( R − j ) = f ( R + i ) . Hence f ( R + i + j ) − f ( R + i − j ) − f ( R + j − i ) + f ( R − i − j )= f ( R + j ) − f ( R ) − f ( R + j − i ) + f ( R − i ) ≤ f ( R + j ) − f ( R − i ) | R | = f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − j ) | R | . Proposition 4.

Any second-order-modular function with a σ -semi-metric distance function ( σ ≥ )and a non-negative modular function is a σ -meta submodular function. roof. Let f ( R ) = P q ∈ R g ( q ) + P { q,q ′ }⊆ R d ( q, q ′ ) be a second-order modular function (by Lemma 5,it has this form). The proof is by case analysis. • If i, j / ∈ R , we have | R | A ij ( R ) = | R | ( f ( R + i + j ) − f ( R + i − j ) − f ( R − i + j ) + f ( R − i − j ))= | R | ( X q ∈ R + i + j g ( q ) + X { q,q ′ }⊆ R + i + j d ( q, q ′ ) − X q ∈ R + i g ( q ) − X { q,q ′ }⊆ R + i d ( q, q ′ ) − X q ∈ R + j g ( q ) − X { q,q ′ }⊆ R + j d ( q, q ′ ) + X q ∈ R g ( q ) + X { q,q ′ }⊆ R d ( q, q ′ ))= | R | d ( i, j ) . We also have σ ( B i ( R ) + B j ( R )) = σ ( f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − i ))= σ ( X q ∈ R + i g ( q ) + X { q,q ′ }⊆ R + i d ( q, q ′ ) − X q ∈ R g ( q ) − X { q,q ′ }⊆ R d ( q, q ′ )+ X q ∈ R + j g ( q ) + X { q,q ′ }⊆ R + j d ( q, q ′ ) − X q ∈ R g ( q ) − X { q,q ′ }⊆ R d ( q, q ′ ))= σg ( i ) + σg ( j ) + σ X q ∈ R d ( i, q ) + σ X q ∈ R d ( j, q ) . Therefore | R | A ij ( R ) ≤ σ ( B i ( R ) + B j ( R )) because g is non-negative and d is non-negative σ -semi-metric. • If i, j ∈ R , we have | R | A ij ( R ) = | R | ( f ( R + i + j ) − f ( R + i − j ) − f ( R − i + j ) + f ( R − i − j ))= | R | ( X q ∈ R g ( q ) + X { q,q ′ }⊆ R d ( q, q ′ ) − X q ∈ R − j g ( q ) − X { q,q ′ }⊆ R − j d ( q, q ′ ) − X q ∈ R − i g ( q ) − X { q,q ′ }⊆ R − i d ( q, q ′ ) + X q ∈ R − i − j g ( q ) + X { q,q ′ }⊆ R − i − j d ( q, q ′ ))= | R | d ( i, j ) . We also have σ ( B i ( R ) + B j ( R )) = σ ( f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − i ))= σ ( X q ∈ R g ( q ) + X { q,q ′ }⊆ R d ( q, q ′ ) − X q ∈ R − i g ( q ) − X { q,q ′ }⊆ R − i d ( q, q ′ )+ X q ∈ R g ( q ) + X { q,q ′ }⊆ R d ( q, q ′ ) − X q ∈ R − j g ( q ) − X { q,q ′ }⊆ R − j d ( q, q ′ ))= σg ( i ) + σg ( j ) + 2 σd ( i, j ) + σ X q ∈ R − i − j d ( i, q ) + σ X q ∈ R − i − j d ( j, q ) . Therefore | R | A ij ( R ) ≤ σ ( B i ( R ) + B j ( R )) because g is non-negative, d is non-negative σ -semi-metric, and σ ≥ . 22 If i ∈ R and j / ∈ R , we have | R | A ij ( R ) = | R | ( f ( R + i + j ) − f ( R + i − j ) − f ( R − i + j ) + f ( R − i − j ))= | R | ( X q ∈ R + j g ( q ) + X { q,q ′ }⊆ R + j d ( q, q ′ ) − X q ∈ R g ( q ) − X { q,q ′ }⊆ R d ( q, q ′ ) − X q ∈ R − i + j g ( q ) − X { q,q ′ }⊆ R − i + j d ( q, q ′ ) + X q ∈ R − i g ( q ) + X { q,q ′ }⊆ R − i d ( q, q ′ ))= | R | d ( i, j ) . We also have σ ( B i ( R ) + B j ( R )) = σ ( f ( R + i ) − f ( R − i ) + f ( R + j ) − f ( R − i ))= σ ( X q ∈ R g ( q ) + X { q,q ′ }⊆ R d ( q, q ′ ) − X q ∈ R − i g ( q ) − X { q,q ′ }⊆ R − i d ( q, q ′ )+ X q ∈ R + j g ( q ) + X { q,q ′ }⊆ R + j d ( q, q ′ ) − X q ∈ R g ( q ) − X { q,q ′ }⊆ R d ( q, q ′ ))= σg ( i ) + σg ( j ) + σd ( i, j ) + σ X q ∈ R − i d ( i, q ) + σ X q ∈ R − i d ( j, q ) . Therefore | R | A ij ( R ) ≤ σ ( B i ( R ) + B j ( R )) because g is non-negative, d is non-negative σ -semi-metric, and σ ≥ . C Appendix: Semi-Metric Diversity

In this section, we establish the smoothness parameter associated with several of the discretequadratic functions discussed. In other words, we bound the approximate triangle inequality fortheir associated distance functions.

Deﬁnition 1.

Let d : [ n ] × [ n ] → R ≥ be a distance function with the corresponding distance matrix D ∈ R n × n ≥ where D a,b = d ( a, b ) . We say d is a negative-type distance if for any x ∈ R n with || x || = 0 we have x T Dx ≤ . Proposition 5.

Any negative-type distance d : [ n ] × [ n ] → R ≥ is -semi-metric.Proof. Let x = 0 . e a + 0 . e b − e c . We know x T Dx = 0 . d ( a, b ) − d ( a, c ) − d ( b, c ) ≤ . Therefore d ( a, b ) ≤ d ( a, c ) + 2 d ( b, c ) and d is -semi metric.Jensen-Shannon Divergence is a function which measures dis-similarity between probability dis-tributions. It is well-known that if d is a JS measure, then √ d is a metric. Hence JS distances forma -semi-metric by the following result. Proposition 6.

Let d : [ n ] × [ n ] → R ≥ be a distance function such that p d ( · , · ) is a metric. Then d ( · , · ) is a -semi-metric. roof. By deﬁnition, we have p d ( i, j ) ≤ p d ( i, k ) + p d ( j, k ) . Therefore, d ( i, j ) ≤ d ( i, k ) + d ( j, k ) + 2 p d ( i, k ) d ( j, k ) . We also know that d ( i, k ) + d ( j, k ) − p d ( i, k ) d ( j, k ) = ( p d ( i, k ) − p d ( j, k )) ≥ . Hence, d ( i, j ) ≤ d ( i, k ) + d ( j, k )) . C.1 Second-Order-Modular Functions

In this section, we describe the structure of second-order-modular functions (deﬁned by Korula etal. [38]). We also discuss the smoothness parameter of quadratic functions deﬁned on a σ -semi-metricdistance. Moreover, we discuss the meta-submodularity parameter of the second-order-modularfunctions deﬁned on a σ -semi-metric distance. Deﬁnition 2 ([38]) . A set functions f : 2 [ n ] → R is called second-order modular if B i ( S ∪ R ) − B i ( S ) = B i ( T ∪ R ) − B i ( T ) for any S ⊆ T , R ⊆ [ n ] \ T , and i ∈ [ n ] \ ( T ∪ R ) . The following lemma characterize the structure of second-order modular functions.

Lemma 5. f is a second-order modular function if and only if there exist symmetric d : [ n ] × [ n ] → R ,and g : [ n ] → R such that f ( R ) = X { i,j }⊂ R d ( i, j ) + X i ∈ R g ( i ) . If f is also supermodular (submodular), then d is non-negative (non-positive).Proof. Suﬃciency is easy since B i ( S ∪ R ) − B i ( S ) = ( g ( i ) + X m ∈ S ∪ R d ( m, i )) − ( g ( i ) + X m ∈ S d ( m, i )) = X m ∈ R d ( m, i )= ( g ( i ) + X m ∈ T ∪ R d ( m, i )) − ( g ( i ) + X m ∈ T d ( m, i ))= B i ( T ∪ R ) − B i ( T ) . To prove necessity, we ﬁrst show that if i, j ∈ [ n ] and S ⊆ [ n ] − i − j then, by second-order modularity B j ( S + i ) − B j ( S ) = B j ([ n ] − j ) − B j ([ n ] − i − j ) , because S ⊆ [ n ] − i ( [ n ] − i plays the role of T in the deﬁnition of second-order modular). Now, let d ( i, j ) = B j ([ n ] − j ) − B j ([ n ] − i − j ) and g ( i ) = B i ( ∅ ) . Note that d is symmetric because d ( i, j ) = B j ([ n ] − j ) − B j ([ n ] − i − j ) = ( f ([ n ]) − f ([ n ] − j )) − ( f ([ n ] − i ) − f ([ n ] − i − j ))= ( f ([ n ]) − f ([ n ] − i )) − ( f ([ n ] − j ) − f ([ n ] − i − j )) = B i ([ n ] − i ) − B i ([ n ] − i − j ) = d ( j, i ) . m , let R m = { v , . . . , v m } , and set R = ∅ . Consider a set R = { v , . . . , v r } . Then we have f ( R ) = r − X m =0 ( f ( R m + v m +1 ) − f ( R m )) = r − X m =0 B v m +1 ( R m )= r − X m =0 ( m X t =1 ( B v m +1 ( R t ) − B v m +1 ( R t − )) + B v m +1 ( R )) telescoping sum = r − X m =0 ( m X t =1 ( B v m +1 ([ n ] − v m +1 ) − B v m +1 ([ n ] − v t − v m +1 )) + B v m +1 ( R ))= r − X m =0 m X t =1 d ( v t , v m +1 ) + r − X m =0 g ( v m +1 ) . If f is supermodular, i, j ∈ [ n ] , and R ⊆ [ n ] − i − j , we have f ( R + i + j ) − f ( R + i ) ≥ f ( R + j ) − f ( R ) . Therefore, g ( j ) + X v ∈ R + i d ( v, j ) ≥ g ( j ) + X v ∈ R d ( v, j ) , which means d ( i, j ) ≥ . Similarly, if f is submodular, d is non-positive. Proposition 7.

Let A ∈ R n × n be a symmetric, -diagonal matrix. Let b ∈ R n and b ≥ . Then F ( x ) = x T Ax + b T x is one-sided σ -smooth if A is σ -semi-metric.Proof. Note that ∇ F ( x ) = A and ∇ F ( x ) = Ax + b . Therefore, σ ( ∇ i F ( x ) + ∇ j F ( x )) ≥ σ ( n X k =1 A ( i, k ) x k + n X k =1 A ( j, k ) x k ) = n X k =1 σ ( A ( i, k ) + A ( j, k )) x k ≥ n X k =1 A ( i, j ) x k = || x || A ( i, j ) = || x || ∇ ij F ( x ) , where the ﬁrst inequality follows from b ≥ and the last inequality holds because A is σ -semi-metric.Now by Lemma 1, we conclude that F is one-sided σ -smooth. C.2 Hardness of Approximation for σ -semi-metrics In this section, we provide a hardness result for approximate maximization of remote-clique functionsdeﬁned on a semi-metric distance.

Theorem 3.

Assuming the Planted Clique Conjecture: (1) for any constant σ ≥ , it is hard toapproximate the maximum of a σ -semi-metric function subject to a cardinality constraint within afactor of σ − ǫ for any ǫ > and (2) for a super-constant σ , there is no constant factor (polytime)approximation algorithm for maximizing a σ -semi-metric function subject to a cardinality constraint.Proof. Planted Clique problem asks for an algorithm to distinguish between the following graphswith probability of at least / : 1) A graph drawn from G ( n, / , 2) A graph drawn from G ( n, / and then a clique of size n / − δ is planted in it ( δ > ) [32]. The planted clique conjecture states25hat there is no polynomial time algorithm to do this task [3, 31]. It has been shown that assumingthe planted clique conjecture, it is hard to approximate the maximum of a metric diversity functionwithin a factor better than [5, 9].Given a graph G , in the densest k -subgraph problem we need to ﬁnd an induced subgraph of size k with the maximum number of edges. Let R be a subset of vertices of G and E ( R ) be the numberof edges in the induced subgraph of R . The density of R is deﬁned as ρ ( R ) = E ( R ) / (cid:0) | R | (cid:1) . Alonet al. [3] showed that if there is no polynomial time algorithm for the planted clique problem for aplanted clique of size n / , then there is no polynomial time algorithm for distinguishing between agraph G of size n that contains a clique of size n / , and a graph G of the same size in which thedensity of every subset of vertices of size n / is at most δ > .We can reduce the densest k -subgraph problem to σ -semi-metric function maximization in thefollowing way. Consider an instance of densest k -subgraph ( k = n / ) on graph G with vertex set [ n ] . Create the distance function d : [ n ] × [ n ] → R . If there is an edge between i, j ∈ [ n ] in G , set d ( i, j ) = 2 σ , otherwise set d ( i, j ) = 1 . It is easy to see that this distance function is σ -semi-metric.Let f ( R ) = P { i,j }⊆ R d ( i, j ) . If | R | = k , we have f ( R ) = 2 σE ( R ) + ( (cid:18) k (cid:19) − E ( R )) . We know (cid:0) k (cid:1) ≥ E ( R ) . Therefore σE ( R ) ≤ f ( R ) ≤ σE ( R ) + (cid:18) k (cid:19) , and dividing both sides by σ (cid:0) k (cid:1) we get ρ ( R ) ≤ f ( R )2 σ (cid:0) k (cid:1) ≤ ρ ( R ) + 12 σ . (6)It is easy to see that arg max R ⊆ [ n ] | R | = k ρ ( R ) = arg max R ⊆ [ n ] | R | = k f ( R ) . Now, assume that for some ﬁxed constant c ≥ there is a c -factor approximate algorithm for ﬁndingthe maximum of σ -semi-metric function ( σ is super-constant) and its output on G is S . Also, let OPT ∈ arg max R ⊆ [ n ] | R | = k ρ ( R ) . We have ρ ( OPT ) ≤ f ( OPT )2 σ (cid:0) k (cid:1) ≤ cf ( S )2 σ (cid:0) k (cid:1) ≤ cρ ( S ) + c σ Since σ ∈ ω (1) , for some n large enough we have that c σ ≤ . Hence ρ ( OPT ) ≤ cρ ( S )+ . Set δ = c and note that δ > is a constant. If G is a graph in which the density of every subset of verticesof size k is at most δ then clearly ρ ( S ) ≤ δ . If G is a graph that contains a clique of size k then ρ ( OPT ) ≤ cρ ( S ) + , which means ρ ( S ) ≥ c = 2 δ . This means that our c -factor approximatealgorithm can distinguish between these two graphs which is in contrast with the planted cliqueconjecture and Alon et al. result. 26or the ﬁrst part, given any constant σ , assume there is a (2 σ − ǫ ) -factor approximate algorithmfor some ǫ > for ﬁnding the maximum of σ -semi-metric function. Denote its output on G by S ,and let OP T be deﬁned as above. We then have ρ ( OPT ) ≤ f ( OPT )2 σ (cid:0) k (cid:1) ≤ (2 σ − ǫ ) f ( S )2 σ (cid:0) k (cid:1) ≤ (2 σ − ǫ ) ρ ( S ) + 2 σ − ǫ σ . Set δ = ( σ − ǫ − σ ) / ǫ σ (2 σ − ǫ ) , and note that δ > is a constant. If G is a graph in which thedensity of every subset of vertices of size k is at most δ then clearly ρ ( S ) ≤ δ . If G is a graph thatcontains a clique of size k then ρ ( OPT ) ≤ (2 σ − ǫ ) ρ ( S )+ σ − ǫ σ which means ρ ( S ) ≥ σ − ǫ − σ = 2 δ .This means that our (2 σ − ǫ ) -factor approximate algorithm can distinguish between these two graphswhich is in contrast with the planted clique conjecture and Alon et al. result. D Appendix: One-Sided Smoothness

In this section, we discuss the connection between meta-submodularity of a function and the smooth-ness of its multi-linear extension. We show that if a probabilistic version of ( 2) holds at a point x ,then the multi-linear extension of the functions is smooth at x . We also show that the smoothnessof the multi-linear extension results in the meta-submodularity of the corresponding set function. Lemma 1.

Let f be a non-negative, monotone set function and F be its multi-linear function. Let x ∈ [0 , n and γ ≥ . If for any i, j ∈ [ n ] we have E R ∼ x [ | R | ] · E R ∼ x [ A ij ( R )] ≤ γ · ( E R ∼ x [ B i ( R )] + E R ∼ x [ B j ( R )]) , or equivalently (by Lemma 4), || x || ∇ ij F ( x ) ≤ γ ( ∇ i F ( x ) + ∇ j F ( x )) , then F is one-sided γ -smooth at x .Proof. We have u T ∇ F ( x ) u = n X i =1 n X j =1 u i u j ∇ ij F ( x ) ≤ γ || x || n X i =1 n X j =1 u i u j ( ∇ i F ( x ) + ∇ j F ( x ))= γ || x || ( n X i =1 n X j =1 u i u j ∇ i F ( x ) + n X i =1 n X j =1 u i u j ∇ j F ( x ))= γ || x || ( n X i =1 u i ∇ i F ( x )( n X j =1 u j ) + n X i =1 u i ( n X j =1 u j ∇ j F ( x )))= γ || x || ( || u || n X i =1 u i ∇ i F ( x ) + || u || n X j =1 u j ∇ j F ( x ))= 2 γ (cid:18) || u || || x || (cid:19) ( u T ∇ F ( x )) . roposition 8. Let f be a set function and F be its multi-linear extension. If F is one-sided γ -smooth, then f is γ -meta-submodular.Proof. Let non-empty R ⊆ [ n ] and i, j ∈ [ n ] . Consider the inequality of one-sided γ -smoothness for u = { i,j } and x = R : u i u j ∇ F ij ( x ) ≤ γ u i + u j || x || ( u i ∇ i F ( x ) + u j ∇ j F ( x )) Since u i = u j = 1 , || x || = | R | , ∇ F ij ( x ) = A ij ( R ) , and ∇ i F ( x ) + ∇ j F ( x ) = B i ( S ) + B j ( S ) weobtain the γ -meta-submodular inequality. D.1 Smoothness of Supermodular γ -Meta-Submodular Functions In this section, we show that the multi-linear extension of a supermodular γ -meta-submodularfunctions is one-sided O ( γ ) -smooth. Lemma 6.

Let f : 2 [ n ] → R + be a non-negative, monotone, supermodular, γ -meta-submodular setfunction. Let x ∈ [0 , n \ { ~ } and R ⊆ [ n ] such that ≤ | R | < || x || . Then for all i, j ∈ [ n ] we have ( || x || − | R | ) A ij ( R ) p x ( R ) ≤ γ X e ∈ [ n ] \ R ( B i ( R + e ) + B j ( R + e ) | R | + 1 ) p x ( R + e ) . Also, for the empty set, ( || x || ) A ij ( ∅ ) p x ( ∅ ) ≤ γ + 1) X e ∈ [ n ] ( B i ( { e } ) + B j ( { e } )) p x ( { e } ) . Proof.

Let | R | = r . Note that r < n because | R | = r < || x || . Also, note that if x e = 1 for some e ∈ [ n ] \ R then p x ( R ) = 0 , which means that the left hand side is zero. In that case, the inequalityholds because f is monotone and the right hand side is non-negative. Hence, we assume that x e < for all e ∈ [ n ] \ R . We know that X e ∈ [ n ] x e = || x || . Therefore, because each x e ≤ , X e ∈ [ n ] \ R x e = || x || − X e ∈ R x e ≥ || x || − X e ∈ R || x || − | R | . Hence, since < − x e ≤ for all e ∈ [ n ] \ R , we get ( || x || − | R | ) A ij ( R ) p x ( R ) ≤ X e ∈ [ n ] \ R x e A ij ( R ) p x ( R ) ≤ X e ∈ [ n ] \ R x e − x e A ij ( R ) p x ( R )= X e ∈ [ n ] \ R A ij ( R ) p x ( R + e ) . Moreover, | R | ≥ | R | + 1 because | R | ≥ , and we have X e ∈ [ n ] \ R A ij ( R ) p x ( R + e ) ≤ X e ∈ [ n ] \ R | R | A ij ( R ) | R | + 1 p x ( R + e ) . γ -meta-submodularity and supermodularity we have X e ∈ [ n ] \ R | R | A ij ( R ) | R | + 1 p x ( R + e ) ≤ γ X e ∈ [ n ] \ R B i ( R ) + B j ( R ) | R | + 1 p x ( R + e ) ≤ γ X e ∈ [ n ] \ R B i ( R + e ) + B j ( R + e ) | R | + 1 p x ( R + e ) Combining all of these inequalities yields the ﬁrst part of the lemma. For the second part of thelemma, we consider the set { i, j, e } . By Lemma 2 and the γ -meta-submodularity, we have f ( { i, j, e } ) = B i ( { j, e } ) + B j ( { e } ) + f ( { e } )= A ij ( { e } ) + B i ( { e } ) + B j ( { e } ) + f ( { e } ) ≤ ( γ + 1)( B i ( { e } ) + B j ( { e } )) + f ( { e } ) . Also, by Lemma 2, we have f ( { i, j, e } ) = B i ( { j, e } ) + B j ( { e } ) + f ( { e } )= A ie ( { j } ) + A ij ( ∅ ) + f ( { i } ) + B j ( { e } ) + f ( { e } ) . Therefore A ie ( { j } ) + A ij ( ∅ ) + f ( { i } ) + B j ( { e } ) + f ( { e } ) ≤ ( γ + 1)( B i ( { e } ) + B j ( { e } )) + f ( { e } ) . Hence, because f is non-negative, monotone and supermodular, it follows that A ij ( ∅ ) ≤ A ie ( { j } ) + A ij ( ∅ ) + f ( { i } ) + B j ( { e } ) ≤ ( γ + 1)( B i ( { e } ) + B j ( { e } )) . (7)Moreover, because f is non-negative and monotone, we have A ij ( ∅ ) = f ( { i, j } ) − f ( { i } ) − f ( { j } ) + f ( ∅ ) = B j ( { i } ) − f ( { j } ) ≤ B j ( { i } ) + B i ( { i } ) ≤ ( γ + 1)( B j ( { i } ) + B i ( { i } )) , and A ij ( ∅ ) = f ( { i, j } ) − f ( { i } ) − f ( { j } ) + f ( ∅ ) = B i ( { j } ) − f ( { i } ) ≤ B i ( { j } ) + B j ( { j } ) ≤ ( γ + 1)( B i ( { j } ) + B j ( { j } )) . If x e = 1 for an e ∈ [ n ] then p x ( ∅ ) = 0 and the inequality holds because the left hand side is zeroand the right hand side is non-negative (since f is monotone). Therefore, we assume that x e < for all e ∈ [ n ] . Combining the above inequalities, we have ( || x || ) A ij ( ∅ ) p x ( ∅ ) = X e ∈ [ n ] x e A ij ( ∅ ) p x ( ∅ ) ≤ X e ∈ [ n ] x e − x e A ij ( ∅ ) p x ( ∅ )= X e ∈ [ n ] A ij ( ∅ ) p x ( { e } ) ≤ ( γ + 1) X e ∈ [ n ] ( B i ( { e } ) + B j ( { e } )) p x ( { e } ) , where the last inequality follows from ( ). This completes the proof.29 emma 7. Let f be a non-negative, monotone, supermodular, γ -meta-submodular set function and F be its multi-linear function. Then for any x ∈ [0 , n \ { ~ } and i, j ∈ [ n ] , || x || ∇ ij F ( x ) ≤ (max { γ, γ + 1 } )( ∇ i F ( x ) + ∇ j F ( x )) . Proof.

By using Lemma 6 for all the sets of size less than || x || , we can write ( || x || ) A ij ( ∅ ) p x ( ∅ ) + X R ⊆ [ n ]1 ≤| R | < || x || ( || x || − | R | ) A ij ( R ) p x ( R ) ≤ ( γ + 1) X e ∈ [ n ] ( B i ( { e } ) + B j ( { e } )) p x ( { e } ) + 2 γ X R ⊆ [ n ]1 ≤| R | < || x || X e ∈ [ n ] \ R ( B i ( R + e ) + B j ( R + e ) | R | + 1 ) p x ( R + e )= ( γ + 1) X e ∈ [ n ] ( B i ( { e } ) + B j ( { e } )) p x ( { e } ) + 2 γ X R ⊆ [ n ]2 ≤| R | < || x || +1 ( B i ( R ) + B j ( R )) p x ( R ) ≤ max { γ + 1 , γ } X R ⊆ [ n ] ( B i ( R ) + B j ( R )) p x ( R ) = max { γ + 1 , γ } ( ∇ i F ( x ) + ∇ j F ( x )) , (8)where the equality follows from a simple counting argument, and in the last inequality we used themonotonicity of f (i.e., the B i ’s are non-negative).By γ -meta-submodularity, we also have that X R ⊆ [ n ]1 ≤| R | < || x || | R | A ij ( R ) p x ( R ) + X R ⊆ [ n ] | R |≥|| x || ( || x || ) A ij ( R ) p x ( R ) ≤ X | R |≥ | R | A ij ( R ) p x ( R ) ≤ X | R |≥ γ ( B i ( R ) + B j ( R )) p x ( R ) ≤ X R ⊆ [ n ] γ ( B i ( R ) + B j ( R )) p x ( R ) = γ ( ∇ i F ( x ) + ∇ j F ( x )) . (9)By adding (8) and (9), we conclude that || x || X R ⊆ [ n ] A ij ( R ) p x ( R ) = || x || ∇ ij F ( x ) ≤ max { γ + 1 , γ } ( ∇ i F ( x ) + ∇ j F ( x )) . D.2 Smoothness of Submodular and -Meta-Submodular Functions In this section, we provide results about the smoothness of the multi-linear extension of submodu-lar functions and also sub-domain smoothness of the multi-linear extension of -meta-submodularfunctions. Proposition 9.

Let f : 2 [ n ] → R and F be its multi-linear extension. Then f is submodular if andonly if F is one-sided -smooth.Proof. A set function f is submodular if and only if A ij ( S ) ≤ for all S ⊆ [ n ] and i, j ∈ [ n ] .Let f be submodular. Then ∇ ij F ( x ) = E R ∼ x [ A ij ( R )] ≤ , for any x ∈ [0 , n . It follows that u T ∇ F ( x ) u ≤ for any u ∈ [0 , n , and thus F is one-sided -smooth.30or the opposite direction, let F be one-sided -smooth and let u = e i + e j . Then u T ∇ F ( x ) u =2 ∇ ij F ( x ) ≤ for all x = . Moreover, by continuity of ∇ F ( x ) , the inequality also holds at x = .We then have that A ij ( S ) = ∇ ij F ( S ) ≤ for all S ⊆ [ n ] , and thus f is submodular. Proposition 10.

Let f be a non-negative, monotone, -meta-submodular function and F be itsmulti-linear extension. Then for any v ∈ [ n ] , F is one-sided -smooth on { x ∈ [0 , n : x ≥ { v } } .Proof. By -meta-submodularity, for any set R , we have | R | A ij ( R ) ≤ . This means that for anynon-empty R , A ij ( R ) ≤ . Since x v = 1 , the probability of picking a set that does not include v iszero. Therefore, we have ∇ ij F ( x ) = X R ⊆ [ n ] A ij ( R ) p x ( R ) = X R ⊆ [ n ] − v A ij ( R + v ) p x ( R + v ) ≤ . Hence for u ∈ [0 , n , u T ∇ F ( x ) u = 2 X { i,j }⊆ [ n ] u i u j ∇ ij F ( x ) ≤ . D.3 Sub-domain Smoothness of Meta-Submodular Functions and General Mono-tone Functions

In this section, we discuss the sub-domain smoothness of the multi-linear extension of general γ -meta-submodular functions and monotone set functions. Theorem 7.

Let f be a non-negative, monotone, γ -meta-submodular set function and F be itsmulti-linear extension. Let c ≥ and S ⊆ [ n ] be non-empty. Then F is one-sided cγ -smooth on { x | x ≥ S , || x || ≤ c | S |} .Proof. Let y ∈ { x | x ≥ S , || x || ≤ c | S |} . First, we show that || y || ∇ ij F ( y ) ≤ γc ( ∇ i F ( y ) + ∇ j F ( y )) . We know ∇ ij F ( y ) = P R ⊆ [ n ] A ij ( R ) p y ( R ) . Since y ≥ S , p y ( R ) = 0 for any R that is not a supersetof S . Therefore, ∇ ij F ( y ) = P R ⊆ [ n ] \ S A ij ( S ∪ R ) p y ( S ∪ R ) . We have || y || ∇ ij F ( y ) = || y || X R ⊆ [ n ] \ S A ij ( S ∪ R ) p y ( S ∪ R ) ≤ c | S | X R ⊆ [ n ] \ S A ij ( S ∪ R ) p y ( S ∪ R ) ≤ X R ⊆ [ n ] \ S γc | S || S ∪ R | ( B i ( S ∪ R ) + B j ( S ∪ R )) p y ( S ∪ R ) ≤ X R ⊆ [ n ] \ S γc ( B i ( S ∪ R ) + B j ( S ∪ R )) p y ( S ∪ R ) ≤ γc ( ∇ i F ( y ) + ∇ j F ( y )) . Now, by Lemma 1, we conclude that F is one-sided ( cγ )-smooth at y . Proposition 11.

Let f : 2 [ n ] → R be a non-negative, monotone function and F be its multi-linearextension. Let x ∈ [0 , n such that x v > for each v ∈ [ n ] . Then there is a σ ≥ , such that F is one-sided σ -smooth at x . Moreover, let z ∈ [0 , n whose smallest component value is z min > .Then F is nz min -smooth on { x : 1 ≥ x ≥ z } . roof. Let i, j ∈ [ n ] . By Lemma 4 we have ∇ ij F ( x ) = X R ⊆ [ n ] A ij ( R ) p x ( R ) = X R ⊆ [ n ] ( B i ( R + j ) − B i ( R − j )) p x ( R )= X R ⊆ [ n ] B i ( R + j ) p x ( R ) − X R ⊆ [ n ] B i ( R − j ) p x ( R ) . We ﬁrst show that there is γ ij > such that || x || ∇ ij F ( x ) ≤ γ ij ( ∇ i F ( x ) + ∇ j F ( x )) . (10)Since f is monotone, the right hand side is non-negative. Hence, if ∇ ij F ( x ) is non-positive, theinequality holds for any γ ij > . Therefore, we assume that ∇ ij F ( x ) is positive which implies that P R ⊆ [ n ] B i ( R + j ) p x ( R ) > by monotonicity. Hence < ∇ ij F ( x ) ≤ X R ⊆ [ n ] B i ( R + j ) p x ( R ) = X R ⊆ [ n ] − j B i ( R + j ) p x ( R ) + X R ⊆ [ n ] − j B i (( R + j ) + j ) p x ( R + j )= X R ⊆ [ n ] − j B i ( R + j )( p x ( R ) + p x ( R + j )) = X R ⊆ [ n ] − j B i ( R + j )( 1 − x j x j p x ( R + j ) + p x ( R + j ))= X R ⊆ [ n ] − j B i ( R + j )( 1 x j p x ( R + j )) = 1 x j X R ⊆ [ n ] − j B i ( R + j ) p x ( R + j ) ≤ x j ( X R ⊆ [ n ] − j B i ( R ) p x ( R ) + X R ⊆ [ n ] − j B i ( R + j ) p x ( R + j )) = 1 x j X R ⊆ [ n ] B i ( R ) p x ( R ) = 1 x j ∇ i F ( x ) . Hence, we conclude that ∇ i F ( x ) ≥ ∇ ij F ( x ) and so if ∇ ij F ( x ) is positive, then ∇ i F ( x ) + ∇ j F ( x ) is also positive. Now, set γ ij = 0 if ∇ ij F ( x ) is non-positive and otherwise we set γ ij = || x || ∇ ij F ( x ) ∇ i F ( x ) + ∇ j F ( x ) ≤ || x || ∇ ij F ( x )( x i + x j ) ∇ ij F ( x ) = || x || x i + x j . (11)Let γ = 2 max { i,j }⊆ [ n ] γ ij . Then for u ∈ [0 , n , we have by (11) u T ∇ F ( x ) u = n X i =1 n X j =1 u i u j ∇ ij F ( x ) ≤ || x || n X i =1 n X j =1 γ ij u i u j ( ∇ i F ( x ) + ∇ j F ( x )) ≤ γ || x || n X i =1 n X j =1 u i u j ( ∇ i F ( x ) + ∇ j F ( x ))= γ || x || ( n X i =1 n X j =1 u i u j ∇ i F ( x ) + n X i =1 n X j =1 u i u j ∇ j F ( x ))= γ || x || ( n X i =1 u i ∇ i F ( x )( n X j =1 u j ) + n X i =1 u i ( n X j =1 u j ∇ j F ( x )))= γ || x || ( || u || n X i =1 u i ∇ i F ( x ) + || u || n X j =1 u j ∇ j F ( x ))= γ (cid:18) || u || || x || (cid:19) ( u T ∇ F ( x )) . Now for the second part of the proof we must choose a γ that works for all x ≥ z and each i, j .By (11) it is suﬃcient to choose γ = max i,j { || x || x i + x j : x ∈ [0 , n , x ≥ z } ≤ nz min .32 Appendix: Jump-Start Continuous Greedy

In this section, we provide the omitted results and proofs about the jump-start continuous greedyalgorithm.

E.1 Jump-Start Continuous Greedy for One-Sided σ -smooth functions In this section, we provide the complete proof of the approximation bound of the jump-start con-tinuous greedy algorithm for one-sided σ -smooth functions. We also discuss the optimum c valuefor the algorithm when it runs on a one-sided σ -smooth function. Theorem 1.

OP T := max { F ( x ) : x ∈ P } .Proof. For each t ∈ [0 , we have x ( t ) = x (0) + (1 − c ) Z t v max ( x ( τ )) dτ = cv ∗ + (1 − c ) Z t v max ( x ( τ )) dτ. (12)Since P is convex and v ∗ ∈ P , we have that x ( t ) ∈ P as long as y ( t ) := R t v max ( x ( τ )) dτ ∈ P . Giventhat each v max ( x ( τ )) ∈ P and also ~ ∈ P , it follows that y ( t ) is a convex combination of points in P , and hence belongs to P .Let x ∗ ∈ P be such that F ( x ∗ ) = OP T . Also let x ∈ { x ( t ) : 0 ≤ t ≤ } and u = ( x ∗ − x ) ∨ ,i.e., x ∗ ∨ x = x + u . We have by Taylor’s Theorem that for some ǫ ∈ (0 , : F ( x ∗ ∨ x ) = F ( x )+ u T ∇ F ( x + ǫu ) ≤ F ( x )+ (cid:18) || x + ǫu || || x || (cid:19) σ u T ∇ F ( x ) ≤ F ( x )+ (cid:18) || x + u || || x || (cid:19) σ u T ∇ F ( x ) where the ﬁrst inequality follows from Lemma 3. Hence u T ∇ F ( x ) ≥ (cid:16) || x + u || || x || (cid:17) σ ( F ( x ∨ x ∗ ) − F ( x )) ≥ (cid:16) || x + u || || x || (cid:17) σ ( OP T − F ( x )) , (13)where the last inequality follows from monotonicity since then F ( x ∨ x ∗ ) ≥ F ( x ∗ ) = OP T . We alsohave that v max ( x ) · ∇ F ( x ) ≥ x ∗ · ∇ F ( x ) ≥ u · ∇ F ( x ) , where the ﬁrst inequality follows by deﬁnition of v max and the fact that x ∗ ∈ P , and the secondinequality from the fact that x ∗ ≥ u and ∇ F ≥ . Combining this with (13) yields: v max ( x ) · ∇ F ( x ) ≥ (cid:16) || x + u || || x || (cid:17) σ ( OP T − F ( x )) . (14)By the choice of x (0) we have that || x (0) || ≥ c || w || for any w ∈ P . Since u ∈ P and x ( t ) isnon-decreasing in each component (because v max is always non-negative), we thus have || x + u || || x || ≤ || u || || x || ≤ || u || || x (0) || ≤ c = c + 1 c . (cid:16) || x + u || || x || (cid:17) σ ≥ ( cc + 1 ) σ (15)for any x ∈ { x ( t ) : 0 ≤ t ≤ } . Let us deﬁne ρ to be the righthand side quantity above. Intuitively,(14) indicates that the direction v max makes at least a ρ “fractional progress” towards OPT.Moreover, we can use the Chain Rule to get ddt F ( x ( t )) = ∇ F ( x ( t )) · x ′ ( t ) = ∇ F ( x ( t )) · (1 − c ) v max ( x ( t )) ≥ ρ (1 − c )[ OP T − F ( x ( t ))] , (16)where the last inequality follows from (14) and (15).We solve the above diﬀerential inequality by multiplying by e ρ (1 − c ) t . ddt [ e ρ (1 − c ) t · F ( x ( t ))] = ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + e ρ (1 − c ) t · ddt F ( x ( t )) ≥ ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + ρ · e ρ (1 − c ) t (1 − c )[ OP T − F ( x ( t ))]= ρ (1 − c ) e ρ (1 − c ) t · OP T. where the inequality follows from Equation (16).Integrating the LHS and RHS of the above equation between and t we get e ρ (1 − c ) t · F ( x ( t )) − e · F ( x (0)) ≥ ρ (1 − c ) OP T Z t e ρ (1 − c ) τ dτ = ρ (1 − c ) OP T · [ e ρ (1 − c ) t ρ (1 − c ) − ρ (1 − c ) ] = OP T · [ e ρ (1 − c ) t − . Hence F ( x ( t )) ≥ [1 − e ρ (1 − c ) t ] OP T + F ( x (0)) e ρ (1 − c ) t ≥ [1 − e ρ (1 − c ) t ] OP T, where the last inequality follows from the fact that F is non-negative. Substituting t = 1 and ρ = ( cc +1 ) σ gives the desired result. Proposition 12.

For any σ > the best approximation guarantee in Theorem 1 is attained at c = √ σ + 6 σ + 1 − ( σ + 1)2 . Proof.

We need to ﬁnd the maximizer of g ( c ) = (1 − c )( cc +1 ) σ where c ∈ [0 , . Hence, we solve g ′ ( c ) = 0 . g ′ ( c ) = σc σ − ( c + 1) σ − ( σ + 1) c σ ( c + 1) σ − σ ( c + 1) σ − c σ + σ ( c + 1) σ − c σ +1 ( c + 1) σ = 0 ⇒ σc σ − ( c + 1) σ − − σc σ ( c + 1) σ − = c σ ( c + 1) σ ⇒ σc σ − ( c + 1) σ − (1 − c ) = c σ ( c + 1) σ ⇒ σ (1 − c ) = c ( c + 1) ⇒ c + (1 + σ ) c − σ = 0 ⇒ c = − ( σ + 1) ± √ σ + 6 σ + 12 The only solution in (0 , is − ( σ +1)+ √ σ +6 σ +12 and this yields the proposition.34 .2 Jump-Start Continuous Greedy for second-order smooth functions The following result improves the approximation factor of jump-start continuous greedy algorithmfor smooth functions that also satisfy higher order smoothness conditions.

Theorem 2.

Let F : [0 , n → R ≥ be a monotone one-sided σ -smooth function with non-positivethird order partial derivatives. Let c ∈ (0 , and P be a polytime separable, downward-closed,polytope. If we run the jump-start continuous greedy process (Algorithm 1) then x (1) ∈ P and F ( x (1)) ≥ [1 − exp ( − c (1 − c )2 c + σ )] · OP T where

OP T := max { F ( x ) : x ∈ P } . In particular, taking c = 1 / we get F ( x (1)) ≥ [1 − exp ( − σ +2 )] · OP T and so F ( x (1)) ≥ σ +3 · OP T (since e x ≥ x + 1 for x < ).Proof. For each t ∈ [0 , we have x ( t ) = x (0) + (1 − c ) Z t v max ( x ( τ )) dτ = cv ∗ + (1 − c ) Z t v max ( x ( τ )) dτ. (17)Since P is convex and v ∗ ∈ P , we have that x ( t ) ∈ P as long as y ( t ) := R t v max ( x ( τ )) dτ ∈ P . Giventhat each v max ( x ( τ )) ∈ P and also ~ ∈ P , it follows that y ( t ) is a convex combination of points in P , and hence belongs to P .Let x ∗ ∈ P be such that F ( x ∗ ) = OP T . Also let x ∈ { x ( t ) : 0 ≤ t ≤ } and u = ( x ∗ − x ) ∨ ,i.e., x ∗ ∨ x = x + u . By Taylor’s Theorem and non-positivity of the third order derivatives of F wehave F ( x ∗ ∨ x ) ≤ F ( x )+ u T ∇ F ( x )+ 12 u T ∇ F ( x ) u ≤ F ( x )+ (cid:16) σ || u || || x || (cid:17) u T ∇ F ( x ) ≤ F ( x )+ (cid:16) σ c (cid:17) u T ∇ F ( x ) , where the second inequality follows from smoothness, and the third from the fact that || x ( t ) || ≥|| x (0) || = c || v ∗ || ≥ c || u || . Thus u T ∇ F ( x ) ≥ (cid:16) c c + σ (cid:17)(cid:16) F ( x ∨ x ∗ ) − F ( x ) (cid:17) ≥ (cid:16) c c + σ (cid:17)(cid:16) OP T − F ( x ) (cid:17) , (18)where the last inequality follows from monotonicity. We also have that v max ( x ) · ∇ F ( x ) ≥ x ∗ · ∇ F ( x ) ≥ u · ∇ F ( x ) , where the ﬁrst inequality follows by deﬁnition of v max and the fact that x ∗ ∈ P , and the secondinequality from the fact that x ∗ ≥ u and ∇ F ≥ . Combining this with (18) yields: v max ( x ) · ∇ F ( x ) ≥ (cid:16) c c + σ (cid:17)(cid:16) OP T − F ( x ) (cid:17) , (19)for any x ∈ { x ( t ) : 0 ≤ t ≤ } . Let us denote ρ = 2 c/ (2 c + σ ) . We can use the Chain Rule to get ddt F ( x ( t )) = ∇ F ( x ( t )) · x ′ ( t ) = ∇ F ( x ( t )) · (1 − c ) v max ( x ( t )) ≥ ρ (1 − c ) h OP T − F ( x ( t )) i , (20)where the last inequality follows from (19).We solve the above diﬀerential inequality by multiplying by e ρ (1 − c ) t . ddt [ e ρ (1 − c ) t · F ( x ( t ))] = ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + e ρ (1 − c ) t · ddt F ( x ( t )) ≥ ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + ρ · e ρ (1 − c ) t (1 − c )[ OP T − F ( x ( t ))]= ρ (1 − c ) e ρ (1 − c ) t · OP T. and t we get e ρ (1 − c ) t · F ( x ( t )) − e · F ( x (0)) ≥ ρ (1 − c ) OP T Z t e ρ (1 − c ) τ dτ = ρ (1 − c ) OP T · [ e ρ (1 − c ) t ρ (1 − c ) − ρ (1 − c ) ] = OP T · [ e ρ (1 − c ) t − . Hence F ( x ( t )) ≥ [1 − e ρ (1 − c ) t ] OP T + F ( x (0)) e ρ (1 − c ) t ≥ [1 − e ρ (1 − c ) t ] OP T, where the last inequality follows from the fact that F is non-negative. Substituting t = 1 and ρ = 2 c/ (2 c + σ ) gives the desired result. E.3 Continuous Greedy and Pipage Rounding for -meta-submodular functions In this section, we provide an adaptation of the continuous greedy algorithm for maximizing a -meta-submodular function over a polytime separable downward closed polytope. We also show thatthe pipage rounding algorithm can be used to round the solution of the continuous greedy over amatroid polytope. Theorem 12.

There is a randomized (1 − e − o (1)) -approximation for maximizing a non-negative,monotone, -meta-submodular function subject to a matroid constraint. Given a matroid M = ([ n ] , I ) , and an independent set R ∈ I , we denote by M R = ([ n ] − R, I R ) the contraction of M by R . That is, I ∈ I R if and only if R ∪ I ∈ I . We denote by P R ⊆ [0 , [ n ] − R its associated matroid polytope. We also deﬁne an extended version of P R , as ¯ P R = { x ∈ [0 , n : x | R = 0 , x | [ n ] − R ∈ P R } , where x | R ∈ [0 , R denotes the restriction of x to the components in R .That is, ¯ P R is obtained by extending the contracted polytope P R to the original space [0 , n , andsetting all components x i = 0 for i ∈ R . Theorem 13.

Let f be a non-negative monotone -meta submodular function and F be its multi-linear extension. Let M = ([ n ] , I ) be a matroid, P ( M ) its corresponding polytope, and R ∈ I anindependent set. Then, the continuous greedy process described in Algorithm 3, outputs a vector x ∈ P ( M ) satisfying x ≥ R and F ( x ) ≥ [1 − e − ] · OP T R where OP T R := max { F ( x ) : x ≥ R } .Proof. For each t ∈ [0 , we have x ( t ) = x (0) + Z t v max ( x ( τ )) dτ = R + Z t v max ( x ( τ )) dτ. (21)Note that x ∈ ¯ P R if and only if x is a convex combination x = P mi =1 λ i S i of some independentsets S i ∈ I R (i.e. R ∪ S i ∈ I ). Thus, R + x = R + P mi =1 λ i S i = P mi =1 λ i [ R + S i ] ∈ P ( M ) since R + S i ∈ P ( M ) for each i ∈ [ m ] . Given that each v max ( x ( τ )) ∈ ¯ P R for each τ , it follows that ( R t v max ( x ( τ )) dτ ) ∈ ¯ P R and therefore x ( t ) ∈ P ( M ) . Moreover, it is clear that x ( t ) ≥ R .36et U := { y + R : y ∈ ¯ P R } , or equivalently, U = { x ∈ P ( M ) : x | R = R } . Let x, x ∗ ∈ U besuch that F ( x ∗ ) = OP T R and u = ( x ∗ − x ) ∨ , i.e., x ∗ ∨ x = x + u . By Theorem 7, we know that F is one-sided -smooth at U . Hence, we have by Taylor’s Theorem that for some ǫ ∈ (0 , : F ( x ∗ ∨ x ) = F ( x ) + u T ∇ F ( x + ǫu ) ≤ F ( x ) + (cid:18) || x + ǫu || || x || (cid:19) u T ∇ F ( x ) = F ( x ) + u T ∇ F ( x ) where the inequality follows from Lemma 3. Hence u T ∇ F ( x ) ≥ F ( x ∨ x ∗ ) − F ( x ) ≥ OP T R − F ( x ) . (22)We also have that v max ( x ) · ∇ F ( x ) ≥ ( x ∗ − R ) · ∇ F ( x ) ≥ u · ∇ F ( x ) , where the ﬁrst inequality follows by deﬁnition of v max and the fact that x ∗ − R ∈ ¯ P R , and thesecond inequality from the fact that x ∗ − R ≥ u and ∇ F ≥ . Combining this with (22) yields: v max ( x ) · ∇ F ( x ) ≥ OP T R − F ( x ) . (23)We can now use the Chain Rule to get ddt F ( x ( t )) = ∇ F ( x ( t )) · x ′ ( t ) = ∇ F ( x ( t )) · v max ( x ( t )) ≥ OP T R − F ( x ( t )) , (24)where the last inequality follows from Equation (23).We solve the above diﬀerential inequality by multiplying by e t . ddt [ e t · F ( x ( t ))] = e t · F ( x ( t )) + e t · ddt F ( x ( t )) ≥ e t · F ( x ( t )) + e t [ OP T R − F ( x ( t ))] = e t · OP T R . where the inequality follows from Equation (24).Integrating the LHS and RHS of the above equation between and t we get e t · F ( x ( t )) − e · F ( x (0)) ≥ OP T R Z t e τ dτ = OP T R · [ e t − . Hence F ( x ( t )) ≥ [1 − e t ] OP T R + F ( x (0)) e t ≥ [1 − e t ] OP T R , where the last inequality follows from the fact that F is nonnegative. Taking t = 1 we get F ( x (1)) ≥ [1 − e ] OP T R . Algorithm 3:

Jump-Start Continuous Greedy for Contracted Matroids Input:

A monotone set function f , its multi-linear extension F , a matroid M , an independent set R , and itsextended contracted polytope ¯ P R x (0) ← R v max ( x ) ← arg max v ∈ ¯ P R { v T ∇ F ( x ) } for t ∈ [0 , do Solve x ′ ( t ) = v max ( x ( t )) with boundary condition x (0) = R return x (1) ; This now leads to the following result. 37 orollary 1.

Let f be a non-negative monotone -meta-submodular function and F be its multi-linear extension. Let M = ([ n ] , I ) be a matroid, and P ( M ) its corresponding polytope. For each i ∈ [ n ] , let x i denote the output of Algorithm 3 run with R = { i } , and let ¯ x = arg max i ∈ [ n ] F ( x i ) .Then ¯ x ∈ P ( M ) and F (¯ x ) ≥ [1 − e − ] · max { f ( S ) : S ∈ I} . Proof.

Let O = arg max S ∈I f ( S ) and i ∈ O . Then O ≥ { i } , and hence F (¯ x ) ≥ F ( x i ) ≥ (1 − e ) · max { F ( x ) : x ≥ { i } } ≥ (1 − e ) F ( O ) = (1 − e ) f ( O ) . where the second inequality follows from Theorem 13.Hence, we can ﬁnd a (1 − /e ) -approximate fractional solution by running the continuous greedyprocess n times. By standard techniques (see [49, 12]), one may discretize the continuous greedyprocess to obtain a ﬁnite algorithm achieving a (1 − /e − o (1)) -approximation. In fact, it may bethe case that a more careful analysis provides a clean (1 − /e ) -approximation.We now discuss a randomized technique that allows to round eﬃciently in the matroid polytope.This rounding technique was initially introduced by Ageev and Sviridenko [2], and later adapted formatroid polytopes by Calinescu et al. [11]. This rounding procedure is known as randomized pipagerounding and we describe it in Algorithm 5 (also note that it uses Algorithm 4 as a subroutine). Algorithm 4:

Reﬁnement Subroutine

Input:

A vector x ∈ [0 , n and two components i, j ∈ { , , . . . , n } Let S = { S ⊆ V : i ∈ S, j / ∈ S } Compute S ∗ = arg min S ∈S [ r ( S ) − x ( S )] and let ξ ∗ = r ( S ∗ ) − x ( S ∗ ) if x j < ξ ∗ then x i ← x i + x j , x j ← , S ′ ← { j } else x i ← x i + ξ ∗ , x j ← x j − ξ ∗ , S ′ ← S ∗ Output ( x, S ′ ) By monotonicity we may assume that the output x ∗ of the continuous greedy algorithm (de-scribed in Section 4) is without loss of generality in the base polytope. We then have the following. Theorem 14.

Let f : 2 [ n ] → R ≥ be a 0-meta-submodular set function and F : [0 , n → R ≥ itsmultilinear extension. Let M be a matroid and x ∗ ∈ B ( M ) be the output of Corollary 1 over M .Then Algorithm 5 outputs in polynomial time a random base B of M such that E [ B ] = x ∗ and E [ f ( B )] ≥ F ( x ∗ ) .Proof. It is well known [11] that the randomized pipage rounding algorithm ﬁnishes in polynomialtime. We next argue that there is no loss (on expectation) in the objective value during the rounding.Let x ∗ be the output of Corollary 1. Hence x ∗ i ∗ = 1 for some i ∗ ∈ [ n ] , and by Proposition 10 itfollows that F is -smooth over the region R := { x ∈ [0 , n : x i ∗ = 1 } , that is, ∇ ij F ( x ) ≤ for all x ∈ R .Given any x ∈ R and i ∗ = i, j ∈ [ n ] , let φ x ( t ) := F ( x + t ( { i } − { j } )) . Then φ ′′ x ( t ) = − ∇ ij F ( x + t ( { i } − { j } )) ≥ , since x + t ( { i } − { j } ) ∈ R . Hence φ x is convex.Let x be the current point during the rounding procedure, and i, j be the current changingcoordinates. The next point is then given by x ′ = x + t ( { i } − { j } ) , where t is a random variablesuch that E [ t ] = 0 . Then conditioning on the current point x and changing coordinates i, j , by38 lgorithm 5: Pipage Rounding

Input:

A vector x ∈ [0 , n and a matroid polytope P ( M ) while x not integral do S ← V while S has fractional variables do Choose i, j ∈ S fractional ( x + , S + ) ← Reﬁnement Subroutine ( x, i, j ) ( x − , S − ) ← Reﬁnement Subroutine ( x, j, i ) if x = x + = x − then S ← S ∩ S + else p ← || x + − x |||| x + − x − || With probability p x ← x − , S ← S ∩ S − Otherwise x ← x + , S ← S ∩ S + Output x Jensen’s inequality we get E [ F ( x ′ | x, i, j )] = E [ φ x ( t )] ≥ φ x (0) = F ( x ) . Since this is true for anychoice of i, j that could be modiﬁed at that step, the result follows.Note that Corollary 1 and Theorem 14 now prove Theorem 12. E.4 Jump-Start Continuous Greedy for General Monotone Functions

In this section, we provide an adaptation of the jump-start continuous greedy algorithm that canbe used for maximizing the multi-linear extension of a general monotone set function (Algorithm 6).This relies on the sub-domain smoothness result provided in Proposition 11 (Appendix D).

Theorem 15.

Let f : 2 [ n ] → R be a non-negative, monotone set function and F be its multi-linear extension. Let c ∈ (0 , and P be a polytime separable, downward-closed, convex polytopesuch that { i } ∈ P for any i ∈ [ n ] . Let σ be the one-sided smoothness parameter on { y | y ≥ c ( || v ∗ || +1 [ n ] n + || v ∗ || || v ∗ || +1 v ∗ ) } where, v ∗ = arg max x ∈ P || x || . Then Algorithm 6 outputs x (1) ∈ P suchthat F ( x (1)) ≥ [1 − exp ( − (1 − c )( cc + 2 ) σ )] · OP T where

OP T := max { F ( x ) : x ∈ P } .Proof. We know that { i } ∈ P for any i ∈ [ n ] and so a convex combination of these is also in thepolytope which means [ n ] n ∈ P . Hence, since v ∗ ∈ P and P is convex, ( 1 || v ∗ || + 1 [ n ] n + || v ∗ || || v ∗ || + 1 v ∗ ) ∈ P. For each t ∈ [0 , we have x ( t ) = x (0) + (1 − c ) Z t v max ( x ( τ )) dτ (25)39ince P is convex and ( || v ∗ || +1 [ n ] n + || v ∗ || || v ∗ || +1 v ∗ ) ∈ P , we have that x ( t ) ∈ P as long as y ( t ) := R t v max ( x ( τ )) dτ ∈ P . Given that each v max ( x ( τ )) ∈ P and also ~ ∈ P , it follows that y ( t ) is aconvex combination of points in P , and hence belongs to P .Let x ∗ ∈ P be such that F ( x ∗ ) = OP T . Let y ≥ x (0) and u = ( x ∗ − y ) ∨ , i.e., x ∗ ∨ y = y + u .Note that all the coordinate of x (0) are non-zero. We have by Taylor’s Theorem that for some ǫ ∈ (0 , : F ( x ∗ ∨ y ) = F ( y )+ u T ∇ F ( y + ǫu ) ≤ F ( y )+ (cid:18) || y + ǫu || || y || (cid:19) σ u T ∇ F ( y ) ≤ F ( y )+ (cid:18) || y + u || || y || (cid:19) σ u T ∇ F ( y ) where the ﬁrst inequality follows from Proposition 11 and Lemma 3. Hence u T ∇ F ( y ) ≥ (cid:16) || y + u || || y || (cid:17) σ ( F ( y ∨ x ∗ ) − F ( y )) ≥ (cid:16) || y + u || || y || (cid:17) σ ( OP T − F ( y )) , (26)where the last inequality follows from monotonicity since then F ( y ∨ x ∗ ) ≥ F ( x ∗ ) = OP T .The deﬁnition of v max implies that v max ( y ) · ∇ F ( y ) ≥ x ∗ · ∇ F ( y ) . Since f is monotonic, ∇ F ≥ .Hence since u = ( x ∗ − y ) ∨ ≤ x ∗ , we also have x ∗ · ∇ F ( y ) ≥ u · ∇ F ( y ) . Combining these with (26)yields: v max ( y ) · ∇ F ( y ) ≥ (cid:16) || y + u || || y || (cid:17) σ ( OP T − F ( y )) . (27)By the choice of x (0) we have that for any w ∈ P , || x (0) || = || c ( 1 || v ∗ || + 1 [ n ] n + || v ∗ || || v ∗ || + 1 v ∗ ) || = c || v ∗ || + 1 || v ∗ || + 1 = c || v ∗ || + 1) || v ∗ || + 1 ≥ c || v ∗ || ≥ c || w || Since u ∈ P and x ( t ) is non-decreasing in each component (because v max is always non-negative),we thus have || x ( t ) + u || || x ( t ) || ≤ || u || || x ( t ) || ≤ || u || || x (0) || ≤ c = c + 2 c . Hence we deduce that (cid:16) || x ( t )+ u || || x ( t ) || (cid:17) σ ≥ ( cc + 2 ) σ for all x ( t ) . Let us deﬁne ρ to be the righthand side quantity above. Intuitively, (27) indicates thatthe direction v max makes at least a ρ “fractional progress” towards OPT.Moreover, we can use the Chain Rule to get ddt F ( x ( t )) = ∇ F ( x ( t )) · x ′ ( t ) = ∇ F ( x ( t )) · (1 − c ) v max ( x ( t )) ≥ ρ (1 − c )[ OP T − F ( x ( t ))] , (28)where the last inequality follows from Equation (27).We solve the above diﬀerential inequality by multiplying by e ρ (1 − c ) t . ddt [ e ρ (1 − c ) t · F ( x ( t ))] = ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + e ρ (1 − c ) t · ddt F ( x ( t )) ≥ ρ (1 − c ) e ρ (1 − c ) t · F ( x ( t )) + ρ · e ρ (1 − c ) t (1 − c )[ OP T − F ( x ( t ))]= ρ (1 − c ) e ρ (1 − c ) t · OP T. and t we get e ρ (1 − c ) t · F ( x ( t )) − e · F ( x (0)) ≥ ρ (1 − c ) OP T Z t e ρ (1 − c ) τ dτ = ρ (1 − c ) OP T · [ e ρ (1 − c ) t ρ (1 − c ) − ρ (1 − c ) ] = OP T · [ e ρ (1 − c ) t − . Hence F ( x ( t )) ≥ [1 − e ρ (1 − c ) t ] OP T + F ( x (0)) e ρ (1 − c ) t ≥ [1 − e ρ (1 − c ) t ] OP T, where the last inequality follows from the fact that F is nonnegative. Taking t = 1 we get F ( x (1)) ≥ [1 − e ρ (1 − c ) ] OP T.

Substituting ρ = ( cc +2 ) σ gives the desired result. Algorithm 6:

Jump-Start Continuous Greedy for Monotone Functions Input:

A monotone set function f , its multi-linear extension F , a polytime separable, downward-closedpolytope P ⊆ [0 , n and c ∈ (0 , . v ∗ ← arg max x ∈ P || x || x (0) ← c ( || v ∗ || +1 [ n ] n + || v ∗ || || v ∗ || +1 v ∗ ) v max ( x ) ← arg max v ∈ P { v T ∇ F ( x ) } for t ∈ [0 , do Solve x ′ ( t ) = (1 − c ) v max ( x ( t )) with boundary condition x (0) = c ( || v ∗ || +1 [ n ] n + || v ∗ || || v ∗ || +1 v ∗ ) return x (1) ; F Appendix: Local Search

In this section, we provide two key lemmas for bounding the Taylor series expansion for γ -meta-submodular functions. We later use these results to analyze the local search algorithm. Lemma 8.

Let f be a non-negative, monotone, γ -meta submodular function and F be its multi-linear extension. Let R ⊆ [ n ] such that | R | ≥ . Then TR ∇ F ( R ) = X i ∈ R B i ( R − i ) ≤ (2( ⌊ | R | ⌋ + ⌈ | R | ⌉ ⌊ | R | ⌋⌈ | R | ⌉ + 2) γ + 2) f ( R ) ≤ (9 γ + 2) f ( R ) Proof.

Partition R into two sets of size ⌊ | R | ⌋ and of size ⌈ | R | ⌉ like S and T . Using Theorem 7, weknow that F is one-sided (2( ⌊ | R | ⌋ / ⌈ | R | ⌉ + 1) γ ) -smooth on { y | T ≤ y ≤ R } and it is one-sided (2( ⌈ | R | ⌉ / ⌊ | R | ⌋ + 1) γ ) -smooth on { y | S ≤ y ≤ R } . Let c = 2( ⌈ | R | ⌉ / ⌊ | R | ⌋ + 1) γ . We show that X i ∈ T B i ( R − i ) ≤ cf ( R ) . h ( t ) = F ( S + t T ) and g ( t ) = TT ∇ F ( S + t T ) where ≤ t ≤ . Note that g ( t ) = h ′ ( t ) and TT ∇ F ( S + t T ) T = g ′ ( t ) . Since F is one-sided c -smooth at any given point S ≤ y ≤ R , wehave g ′ ( t ) = TT ∇ F ( S + t T ) T ≤ c ( || T || || S + t T || )( TT ∇ F ( S + t T )) ≤ c t g ( t ) . Therefore, tg ′ ( t ) ≤ cg ( t ) . Integrating both sides, we get Z tg ′ ( t ) dt ≤ Z cg ( t ) dt. Applying the integration by parts formula to the left hand side, we get tg ( t ) (cid:12)(cid:12)(cid:12)(cid:12) − Z g ( t ) dt ≤ c Z g ( t ) dt. It follows that · g (1) − · g (0) = TT ∇ F ( S + T ) = TT ∇ F ( R ) = X i ∈ T B i ( R − i ) ≤ ( c + 1) Z g ( t ) dt. By using g ( t ) = h ′ ( t ) we have X i ∈ T B i ( R − i ) ≤ ( c + 1) Z h ′ ( t ) dt = ( c + 1)( h (1) − h (0)) = ( c + 1)( F ( S + T ) − F ( S )) ≤ ( c + 1) F ( R ) = ( c + 1) f ( R ) . This means that X i ∈ T B i ( R − i ) ≤ (2( ⌈ | R | ⌉ / ⌊ | R | ⌋ + 1) γ + 1) f ( R ) . With the same argument we can conclude that X i ∈ S B i ( R − i ) ≤ (2( ⌊ | R | ⌋ / ⌈ | R | ⌉ + 1) γ + 1) f ( R ) , and combining these inequalities yields the lemma. Lemma 9.

Let f be a non-negative, monotone, γ -meta-submodular function, F be its multi-linearfunction, R ⊂ [ n ] , and x ∈ [0 , n such that || x || ≤ | R | . Let u = R ∨ x − R . Then for ≤ ǫ ≤ u T ∇ F ( R + ǫu ) ≤ γ u T ∇ F ( R ) Proof.

By Theorem 7, we know that F is one-sided γ -smooth on A = { y | y ≥ R , || y || ≤ | R |} .Therefore F is one-sided γ -smooth on B = { y | R + ǫu ≥ y ≥ R } because B ⊆ A . Therefore, thedesired result yields by Lemma 3. 42 .1 Local Search for γ -meta-submodular functions Theorem 8.

Let f ∈ G γ and M = ([ n ] , I ) be a matroid of rank r . Let A ∈ I be an optimum set,i.e., A ∈ arg max R ∈I f ( R ) , and S ∈ I be an (1 + ǫn ) -approximate local optima, i.e., for any i and j such that S − i + j ∈ I , (1 + ǫn ) f ( S ) ≥ f ( S − i + j ) , where ǫ > is a constant. Then if γ = O ( r ) , f ( A ) ≤ O ( γ γ ) f ( S ) and if γ = ω ( r ) , f ( A ) ≤ O ( γ γ ) f ( S ) .Proof. Since f is monotone, we assume that | S | = | A | = r . Given the exchangeability propertyof matroids, there is a bijective mapping ([46]) g : S \ A → A \ S such that S − i + g ( i ) ∈ I where i ∈ S \ A . Since S is a (1 + ǫn ) -approximate local optima, for all i ∈ S \ A we have (1 + ǫn ) f ( S ) ≥ f ( S − i + g ( i )) . That is, ǫn f ( S ) + B i ( S − i ) ≥ B g ( i ) ( S − i ) . Using this we get B g ( i ) ( S ) = B g ( i ) ( S − i ) + A ig ( i ) ( S − i ) ≤ B g ( i ) ( S − i ) + γ ( B g ( i ) ( S − i ) + B i ( S − i ) r − ≤ γ + r − r − B i ( S − i ) + ǫ ( γ + r − r − n f ( S ) , where the equality follows from Lemma 2 and the ﬁrst inequality from γ -meta-submodularity. There-fore, X i ∈ S \ A B g ( i ) ( S ) ≤ γ + r − r − X i ∈ S \ A B i ( S − i ) + o (1) f ( S ) . Now, by Taylor’s Theorem, Lemma 9, and the above inequality, we have f ( S ∪ A ) = F ( S ∨ A ) = F ( S + A \ S ) = F ( S ) + TA \ S ∇ F ( S + ǫ ′ A \ S ) ≤ F ( S ) + 2 γ TA \ S ∇ F ( S ) = F ( S ) + 2 γ X i ∈ S \ A B g ( i ) ( S ) ≤ (1 + 2 γ · o (1)) f ( S ) + 2 γ + r − r − γ X i ∈ S \ A B i ( S − i ) Therefore, using the monotonicity of f and Lemma 8 we get f ( A ) ≤ f ( S ∪ A ) ≤ (1 + 2 γ · o (1)) f ( S ) + 2 γ + r − r − γ (9 γ + 2) f ( S )= [ 2 γ + r − r − γ (9 γ + 2) + 1 + 2 γ · o (1)] f ( S ) . F.2 Runtime of the Local Search Algorithm

In this section, we analyze the runtime of the local search algorithm that ﬁnds an approximate localoptima.

Lemma 10.

Let f be a non-negative, monotone, γ -meta submodular function and M = ([ n ] , I ) bea matroid of rank r . Let A ∈ I be an optimum set, i.e., A ∈ arg max R ∈I f ( R ) , and S ∈ arg max { v,v ′ }∈I f ( { v, v ′ } ) . Then f ( A ) ≤ O ( r ( γ + 1) r − ) f ( S ) . roof. Let A = { a , . . . , a r } and A i = { a , . . . , a i } for ≤ i ≤ r . By deﬁnition of S we know that f ( A ) ≤ f ( S ) . Now by induction we show that for any ≤ i < j ≤ n , B a j ( A i ) ≤ O (( γ +1) i − ) f ( S ) .The base case is i = 2 . By deﬁnition of f ( S ) , monotonicity and meta submodularity of f , we have B a j ( A ) = B a j ( A ) + A a a j ( A ) ≤ B a j ( A ) + γ ( B a j ( A ) + B a ( A )) ≤ (2 γ + 1) f ( S ) ≤ O ( γ + 1) f ( S ) . Now assume that for k < j ≤ n , we have B a j ( A k ) ≤ O ( γ k − ) f ( S ) . We want to show that for k + 1 < j ≤ n , we have B a j ( A k +1 ) ≤ O ( γ k ) f ( S ) . B a j ( A k +1 ) = B a j ( A k ) + A a k +1 a j ( A k ) ≤ B a j ( A k ) + γk ( B a k +1 ( A k ) + B a j ( A k )) ≤ (1 + 2 γk ) O (( γ + 1) k − ) f ( s ) ≤ O (( γ + 1) k ) f ( S ) . We know that f ( A ) = f ( A ) + r X i =3 B a i ( A i − ) ≤ f ( S ) + r X i =3 O (( γ + 1) i − ) f ( S ) ≤ O ( r ( γ + 1) r − ) f ( S ) Proposition 13.

Local search algorithm (Algorithm 2) runs in O ( n (log( r ) + r log( γ + 1) /ǫ ) timeon a γ -meta submodular functions and a matorid of rank r .Proof. Cost of ﬁnding S is O ( n ) . Also, each iteration of the while loop costs O ( n ) . Let S k bethe solution after k iterations and A be an optimum solution. By Lemma 10, we know f ( S k ) ≤ (1 + ǫn ) k f ( S ) ≤ f ( A ) ≤ O ( r ( γ + 1) r − ) f ( S ) . Taking the logarithm, we have k ln(1 + ǫn ) ≤ O (ln( r ) + ( r −

2) ln( γ + 1)) . Noting that x − x ≤ ln x for any x > , we have k ( ǫn ) / ( n + ǫn ) ≤ O (ln( r ) + ( r −

2) ln( γ + 1)) . This yields the result.

F.3 Local Search for Set Functions with a Smooth Multi-Linear Extension

In this section, we ﬁrst provide a key lemma for bounding the Taylor series expansion of smoothmulti-linear extension. Then we show that the local search algorithm ﬁnds a solution which iswithin O ( σ σ ) -approximation of the optimal solution of the matroid polytope. One can also viewthis result as an integrality gap result for the matroid polytope. Lemma 11.

Let F : [0 , n be a one-sided σ -smooth function where F ( ~ ≥ . Then x T ∇ F ( x ) ≤ ( σ + 1) F ( x ) and x T ∇ F ( x ) x ≤ σ ( σ + 1) F ( x ) . roof. Given x ∈ [0 , n , let h x ( t ) = F ( tx ) and g x ( t ) = x T ∇ F ( tx ) where t ∈ R . Note that g x ( t ) = h ′ x ( t ) and x T ∇ F ( tx ) x = g ′ x ( t ) . Since F is one-sided σ -smooth, for ≤ t ≤ we have g ′ x ( t ) = x T ∇ F ( tx ) x ≤ σ ( || x || || tx || )( x T ∇ F ( tx )) = σ t g x ( t ) . Therefore, tg ′ x ( t ) ≤ σg x ( t ) , and integrating both sides, we get Z tg ′ x ( t ) dt ≤ Z σg x ( t ) dt. Applying the integration by parts formula to the left hand side, we get tg x ( t ) (cid:12)(cid:12)(cid:12)(cid:12) − Z g x ( t ) dt ≤ σ Z g x ( t ) dt. It follows that · g x (1) − · g x (0) = x T ∇ F ( x ) ≤ ( σ + 1) Z g x ( t ) dt. By using g x ( t ) = h ′ x ( t ) we have x T ∇ F ( x ) ≤ ( σ + 1) Z h ′ x ( t ) dt = ( σ + 1)( h x (1) − h x (0)) = ( σ + 1)( F ( x ) − F ( ~ σ + 1) F ( x ) . By one-sided σ -smoothness we have x T ∇ F ( x ) x ≤ σx T ∇ F ( x ) . Hence, x T ∇ F ( x ) x ≤ σ ( σ + 1) F ( x ) . Theorem 16.

Let f be a non-negative, monotone set function such that its multi-linear extension F is one-sided σ -smooth, for some non-negative integer σ . Let M = ([ n ] , I ) be a matroid of rank r and P be its associated polytope. Let x ∈ P be such that || x || = c where c ∈ { , . . . , r } . Let S ∈ I ofsize c be an approximate local optima such that S ⊆ supp ( x ) , i.e., for any a ∈ S and b ∈ supp ( x ) \ S such that S − a + b ∈ I , (1 + ǫn ) f ( S ) ≥ f ( S − a + b ) , where ǫ > . Then if σ = O ( c ) , F ( x ) ≤ O ( σ σ ) f ( S ) and if σ = ω ( c ) , F ( x ) ≤ O ( σ σ ) f ( S ) .Proof. Let u = ( S ∨ x ) − S , i.e. S ∨ x = S + u . It follows that || u || ≤ || x || = c . By Taylor’sTheorem and Lemma 3 we have that for some ǫ ∈ (0 , F ( S ∨ x ) = F ( S + u ) = F ( S ) + u T ∇ F ( S + ǫu ) ≤ F ( S ) + u T ∇ F ( S ) (cid:18) || S + ǫu || || S || (cid:19) σ . Using that | S | = c , ǫ ∈ (0 , , and || u || ≤ c , we get F ( x ) ≤ F ( S ∨ x ) ≤ F ( S ) + u T ∇ F ( S ) (cid:18) cc (cid:19) σ ≤ f ( S ) + 2 σ u T ∇ F ( S ) . (29)45et e ∈ supp ( u ) . Because of the exchange property, there is an a ∈ S such that S − a + e ∈ I .Because of the selection of S , we know that (1+ ǫn ) f ( S ) ≥ f ( S − a + e ) . Hence ǫn f ( S )+ B a ( S − a ) ≥ B e ( S − a ) . Therefore, We have ∇ e F ( S ) = B e ( S ) = B e ( S − a ) + A ae ( S − a ) ≤ B e ( S − a ) + σ ( B e ( S − a ) + B a ( S − a ) c − ≤ c − σc − B a ( S − a ) + ( c − σ ) ǫ ( c − n f ( S ) Let S = { a , . . . , a c } such that B a ( S − a ) ≥ · · · ≥ B a c ( S − a c ) . Bounding B e ( S ) with B a i ( S − a i ) where i is large is better. Let R i = { e i , . . . , e ik i } be the set of elements in supp ( u ) thatare exchangeable with a i but are not exchangeable with any of a i +1 , . . . , a c . It is obvious that R i ’spartition supp ( u ) . Let t i = P e ∈ R i u e . By contradiction, we show that if i ≤ c − then P ij =1 t j ≤ i .We know that for R ⊆ [ n ] and y ∈ P we have P e ∈ R y e ≤ r M ( R ) where r M is the rank function ofthe matroid. If P ij =1 t j > i then r M ( S ij =1 R i ) > i . This means that there is R ⊆ S ij =1 R i such that | R | ≥ i + 1 and R ∈ I . Now because of the exchange properties of matroids, we can add elementsof S to R until they are the same size. Call this new set R ′ . Let T S = S \ R ′ and T R = R ′ \ S . | T S | = | T R | = i +1 . Therefore, there is a perfect matching of exchangeablity between T R and T S [46].This contradicts our assumption because elements in S ij =1 R i are only exchangeable with a , . . . , a i .Now, we have u T ∇ F ( S ) = X e ∈ supp ( u ) u e ∇ e F ( S ) ≤ c X j =1 X e ∈ R i u e ( c − σc − B a j ( S − a j ) + ( c − σ ) ǫ ( c − n f ( S ))= c X j =1 t j ( c − σc − B a j ( S − a j ) + ( c − σ ) ǫ ( c − n f ( S ))= c − σc − c X j =1 t j B a j ( S − a j )) + c ( c − σ ) ǫ ( c − n f ( S ) . (30)By Lemma 11, we know that c X j =1 B a j ( S − a j ) = TS ∇ F ( S ) ≤ ( σ + 1) F ( S ) . We also know that B a ( S − a ) ≥ · · · ≥ B a c ( S − a c ) , P cj =1 t j = || u || ≤ c , and P ij =1 t j ≤ i for i = 1 , . . . , c − . Now, we show that c X j =1 t j B a j ( S − a j ) ≤ ( σ + 1) f ( S ) . We try to ﬁnd the maximizer of the above. Fix the value of B a j ( S − a j ) ’s. For any j < k , if weincrease the value of t j by ǫ and decrease the value of t k by ǫ , the value of the summation willincrease. This means that the maximum happens when t , . . . , t ⌊|| u || ⌋ are equal to one and t ⌈|| u || ⌉ is equal to || u || − ⌊|| u || ⌋ . Therefore, c X j =1 t j B a j ( S − a j ) ≤ c X j =1 B a j ( S − a j ) ≤ ( σ + 1) f ( S ) . u T ∇ F ( S ) ≤ c − σc − σ + 1) f ( S ) + c ( c − σ ) ǫ ( c − n f ( S ) . Hence, if σ = O ( c ) then u T ∇ F ( S ) ≤ O ( σ ) f ( S ) and if σ = ω ( c ) then u T ∇ F ( S ) ≤ O ( σ ) f ( S ) .Combining this with (29) yields the result. F.4 Local Search for Second-Order-Submodular γ -Meta-Submodular Functions In this section, we ﬁrst provide a key lemma for bounding the Taylor series expansion of multi-linearextension of second-order-submodular functions. Then using this, we show that the modiﬁed localsearch algorithm (Algorithm 2) can be used to ﬁnd an O ( γ / ) -approximation for maximizing asecond-order-submodular γ -meta-submodular subject to a matroid constraint. Lemma 12.

Let f : 2 n → R be a non-negative, second-order-submodular set function and F be itsmulti-linear extension. Then for any R ⊆ [ n ] , P u ∈ R B u ( R ) ≤ f ( R ) . If f is also monotone then x ∈ [0 , n , x T ∇ F ( x ) x ≤ F ( x ) .Proof. For the ﬁrst part, WLOG let R = [ k ] (we can always relabel the elements so that this istrue) and R i = [ i ] . By Lemma 2, we have X i ∈ R B i ( R ) = k X i =1 (cid:0) f ( { i } ) + k X j =1 A ij ( R j − ) (cid:1) . Since B i ( R i ) = B i ( R i − ) , and f ( R ) = f ( ∅ ) = 0 we have f ( R ) = 2 k X i =1 B i ( R i ) = 2 k X i =1 (cid:0) f ( { i } ) + i X j =1 A ij ( R j − ) (cid:1) . Moreover, note that k X i =1 k X j =1 A ij ( R j − ) ≤ k X i =1 i X j =1 A ij ( R j − ) since k X i =1 k X j = i +1 A ij ( R j − ) = k X j =1 j − X i =1 A ij ( R j − ) = k X j =1 j − X i =1 A ji ( R j − ) ≤ k X j =1 j − X i =1 A ji ( R i − ) = k X j =1 j X i =1 A ji ( R i − )= k X i =1 i X j =1 A ij ( R j − ) , where the second equality follows from the fact that A ij ( S ) = A ji ( S ) for all i, j ∈ [ n ] and S ⊆ [ n ] ,and the third equality from the fact that A ii ( S ) = 0 for all i ∈ [ n ] and S ⊆ [ n ] . The inequalityfollows since R j − ⊇ R i − and f is second-order-submodular.By non-negativity we also have that f ( { i } ) ≥ f ( { i } ) . This yields the ﬁrst part of the lemma.We now discuss the second part. By the Taylor’s Theorem, non-negativity, monotononicity andsecond-order-submodularity, we have F ( x ) = F (0) + x T ∇ F (0) + 12 x T ∇ F ( ǫx ) x ≥ x T ∇ F ( ǫx ) x ≥ x T ∇ F ( x ) x. heorem 10. Let f be a γ -meta-submodular function which is also second order submodular (thatis, f ’s marginal gains are submodular). Let M = ([ n ] , I ) be a matroid of rank r and minimumcircuit size c . Let A ∈ I be an optimum set, i.e., A ∈ arg max R ∈I f ( R ) , and S ∈ I be an (1 + ǫn ) -approximate local optima, i.e., for any i and j such that S − i + j ∈ I , (1 + ǫn ) f ( S ) ≥ f ( S − i + j ) , where ǫ > is a constant. Then f ( A ) ≤ O ( γ + γ r ) f ( S ) . So Algorithm 2 gives an O ( γ + γ r ) -approximation. If f is also supermodular then Algorithm 2 gives an O (min { γ + γ r , γrc − } ) ≤ O ( γ / ) -approximation.Proof. Since f is monotone, we assume that | S | = | A | = r . Given the exchangeability propertyof matroids, there is a bijective mapping ([46]) g : S \ A → A \ S such that S − i + g ( i ) ∈ I where i ∈ S \ A . Since S is a (1 + ǫn ) -approximate local optima, for all i ∈ S \ A we have (1 + ǫn ) f ( S ) ≥ f ( S − i + g ( i )) . That is, ǫn f ( S ) + B i ( S − i ) ≥ B g ( i ) ( S − i ) . (31)Using this we get B g ( i ) ( S ) = B g ( i ) ( S − i ) + A ig ( i ) ( S − i ) ≤ B g ( i ) ( S − i ) + γ ( B g ( i ) ( S − i ) + B i ( S − i ) r − ≤ γ + r − r − B i ( S − i ) + ǫ ( γ + r − r − n f ( S ) = (cid:16) γr − (cid:17) B i ( S ) + ǫ ( γ + r − r − n f ( S ) , where the ﬁrst equality follows from Lemma 2, the ﬁrst inequality from γ -meta-submodularity, andthe last equality from B i ( S ) = B i ( S − i ) for all i ∈ [ n ] and S ⊆ [ n ] . Thus, X i ∈ S \ A B g ( i ) ( S ) ≤ (cid:16) γr − (cid:17) X i ∈ S \ A B i ( S ) + | S \ A | · ǫ ( γ + r − r − n f ( S ) ≤ (cid:16) γr − (cid:17) X i ∈ S B i ( S ) + ǫ ( γ + r − r − n f ( S ) ≤ (cid:16) γr − o (1) (cid:17) · f ( S ) . where the second inequality follows from monotonicity (i.e. B i ( S ) ≥ ), and the last one followsfrom Lemma 12.Now, by Taylor’s Theorem and the submodularity of the marginal gains of f (i.e. the non-positivity of the third order marginal gains), γ -meta submodularity, and the above inequality, wehave f ( A ) ≤ f ( S ∪ A ) = F ( S ∨ A ) = F ( S + A \ S ) ≤ F ( S ) + TA \ S ∇ F ( S ) + 12 TA \ S ∇ F ( S ) A \ S ≤ F ( S ) + (cid:16) γ | A \ S || S | (cid:17) TA \ S ∇ F ( S ) ≤ F ( S ) + (1 + γ ) TA \ S ∇ F ( S )= F ( S ) + (1 + γ ) X i ∈ S \ A B g ( i ) ( S ) ≤ (cid:16) γ r − γ (cid:0) r − o (1) (cid:1) + 3 + o (1) (cid:17) f ( S )= O (cid:16) γ r + γ (cid:17) f ( S ) . f is also supermodular. Let S ∩ S ′ = { a , . . . , a p } and S ′ \ S = { b , . . . , b p } where { a i , b i } ’s are the edges of the matching. Also, let T i = { a , . . . , a i } and R i = { b , . . . , b i } .Then since M is a maximum weighted matching, we have X i ∈ S \ A A ig ( i ) ( S ) ≤ · | S \ A | c − p X i =1 A a i b i ( S ) ≤ rc − p X i =1 A a i b i ( S ) . (32)We also have that f ( S ′ ) = p X i =1 ( f ( T i ∪ R i ) − f ( T i − ∪ R i − )) = p X i =1 ( B a i ( T i − ∪ R i − ) + B b i ( T i − ∪ R i − + a i ))= p X i =1 (cid:16) B a i ( T i − ∪ R i − ) + f ( { b i } ) + i X j =1 A b i a j ( T j − ) + i − X j =1 A b i b j ( T i − + a i ∪ R j − ) (cid:17) = p X i =1 (cid:16) B a i ( T i − ∪ R i − ) + A b i a i ( T i − ) + f ( { b i } ) + i − X j =1 A b i a j ( T j − ) + i − X j =1 A b i b j ( T i − ∪ R j − + a i ) (cid:17) ≥ p X i =1 A a i b i ( T i − ) ≥ p X i =1 A a i b i ( S ) . (33)where the third equality follows from Lemma 2, the ﬁrst inequality from monotonocity and super-modularity (i.e. all the B i and A ij terms are non-negative), and the last inequality from second-order-submodularity and the fact that T i ⊆ S for any i = 1 , . . . , p .Hence, by combining (32) and (33), we get X i ∈ S \ A A ig ( i ) ( S − i ) = X i ∈ S \ A A ig ( i ) ( S ) ≤ rc − p X i =1 A a i b i ( S ) ≤ rc − f ( S ′ ) . (34)Using Taylor’s Theorem f ( A ) ≤ f ( S ∪ A ) = F ( S ∨ A ) = F ( S + A \ S ) ≤ F ( S ) + TA \ S ∇ F ( S ) + 12 TA \ S ∇ F ( S ) A \ S ≤ F ( S ) + (cid:16) γ | A − S || S | (cid:17) TA \ S ∇ F ( S ) ≤ F ( S ) + (1 + γ ) TA \ S ∇ F ( S )= F ( S ) + (1 + γ ) X i ∈ S \ A B g ( i ) ( S ) = f ( S ) + (1 + γ )( X i ∈ S \ A B g ( i ) ( S − i ) + X i ∈ S \ A A ig ( i ) ( S − i )) ≤ f ( S ) + (1 + γ ) (cid:16) rǫn f ( S ) + X i ∈ S \ A B i ( S − i ) + 2 rc − f ( S ′ ) (cid:17) ≤ f ( S ) + (1 + γ ) (cid:16) rǫn f ( S ) + 2 f ( S ) + 2 rc − f ( S ′ ) (cid:17) ≤ O (cid:0) γrc − (cid:1) max { f ( S ) , f ( S ′ ) } . where the second inequality follows from second-order-submodularity (i.e. the non-positivity of thethird order derivatives), the third inequality from γ -meta submodularity, the ﬁfth inequality from(31) and (34), and the second to last inequality from Lemma 12.We then have that if r ≤ √ γ then γr ≤ γ / , and if r ≥ √ γ then γ r + γ ≤ γ / . Therefore, f ( A ) ≤ O ( γ / ) max { f ( S ) , f ( S ′ ) } . 49 Appendix: Integrality Gaps and Rounding Algorithms

In this section, we provide the omitted results and proofs about the integrality gap and diﬀerentrounding techniques.

G.1 Integrality Gap Lower Bound

In this section, we describe an example that shows the integrality gap of a quadratic function witha σ -semi-metric distance over a matroid polytope is Ω(min { rc − , σr } ) in the worst case, where r isthe rank of the matroid and c is the size of the smallest circuit. Proposition 1.

Let k, t ∈ N with ≤ t ≤ k . There exists a σ -semi-metric with multilinear extension F , and a matroid M = ([2 k ] , I ) with rank r = k + t − and minimum circuit size c = 2 t , wherethe integrality gap of F ( x ) over the matroid polytope P M is Ω(min { rc − , σr } ) .Proof. Let S i = { i − , i } for ≤ i ≤ k , and S = { S , S , . . . , S k } . We deﬁne a matroid M = ([2 k ] , I ) in terms of its circuits as follows. A set C is a circuit of M if and only if C is theunion of any t sets S i . It is then clear that the minimum size c of a circuit is t , and the rank r ofthe matroid is k + t − . For example, M could be the graphic matroid corresponding to the graphin Figure 2. Circuits here correspond to cycles of size , and the dashed lines show the non-zerocoeﬃcients of F .Let F ( x ) = P { u,v }∈S x u x v + P { u,v } / ∈S σ x u x v . It is straightforward to see that F is the multi-linear extension of a σ -semi-metric induced by a complete graph which has weight on edges from S and weight /σ otherwise.By deﬁnition of M and F , it is clear that any integral solution x I ∈ P M maximizing F will pick t − pairs from S and then singletons from other pairs. Therefore F ( x I ) := max x ∈ P M ∩{ , } k F ( x ) = ( t − σ (cid:16)(cid:18) r (cid:19) − ( t − (cid:17) = (1 − σ )( t − σ (cid:18) r (cid:19) = ( σ − c −

2) + r ( r − σ . On the other hand, x = k + t − k [2 k ] ∈ P M and F ( x ) = k ( k + t − k ) + (cid:16)(cid:18) k (cid:19) − k (cid:17) σ ( k + t − k ) = k (cid:0) k + t − k (cid:1) (cid:0) k − σ (cid:1) . Using that r = k + t − and k = r − c + 1 we have k ( k + t − k ) = r r − c + 1) = r r − c + 2) ≥ r , where the last inequality follows since c ≥ . Hence, F ( x ) ≥ r (1 + k − σ ) . It follows that theintegrality gap is at least F ( x ) F ( x I ) ≥ · σr + 2 r ( k − σ − c −

2) + r ( r − ≥ · σrσ ( c −

2) + r ≥ · min { rc − , σr } . G.2 Quadratic Coverage Rounding

In this section, we provide the details about the quadratic coverage rounding.We actually prove the following decomposition result. For x ∗ ∈ P M , we deﬁne the coverage ofa pair u, v to be the quantity x ∗ ( u ) x ∗ ( v ) . Let Cov ∈ R ( n ) be the vector with entries Cov ( u, v ) = x ∗ ( u ) x ∗ ( v ) . As F is quadratic it is linear in these coverage values and the vector x ∗ : F ( x ∗ ) = P u = v ( A ( u,v )2 ) Cov ( u, v ) + P v b ( v ) x ∗ ( v ) . For a set X we say its coverage set is cov ( X ) = {{ u, v } : u, v ∈ X, u = v } . A quadratic coverage of x ∗ is a collection C = { I i , µ i } of weighted independent setswith properties (1) for each u = v , P i : { u,v }⊆ cov ( I i ) µ i ≥ Cov ( u, v ) , and (2) for each v , P i : I i ∋ v µ i ≥ x ∗ ( v ) . Recall that A, b ≥ . It follows that P i µ i F ( I i ) ≥ F ( x ∗ ) and hence if the size P i µ i ≤ K ,then some I i satisﬁes F ( I i ) ≥ F ( x ∗ ) K . This bound depends on the fact that entries of A are non-negative. By condition (1) of quadratic coverages, we have P i µ i cov ( I i ) ≥ Cov and by condition(2), P i µ i I i ≥ x ∗ . Therefore, for such a collection we have P i µ i F ( I i ) ≥ F ( x ∗ ) . This reasoningshows that to deduce Theorem 11, it suﬃces to ﬁnd a quadratic coverage with P i µ i ≤ (3 + rc − ) . Theorem 17.

Let F ( x ) = x T Ax + b T x be a non-negative, quadratic multi-linear polynomial and M be a matroid with rank r = r ([ n ]) and minimum circuit size c ≥ . If x ∗ ∈ P M , then it has aquadratic coverage of size at most rc − .Proof. We start with an arbitrary representation of x ∗ as a convex combination of independent sets: P i λ i B i .First note that Cov ( u, v ) = ( P B i ∋ u λ i )( P B j ∋ v λ j ) = P ( i,j ): B i ∋ u,B j ∋ v λ i λ j . Hence an orderedpair ( B i , B j ) contributes λ i λ j to Cov ( u, v ) if u ∈ B i , v ∈ B j . This implies that if B i = B j , thenthis contributes exactly λ i for every u, v ∈ B i . If B i = B j , then the unordered pair { B i , B j } contributes to coverages as follows. It contributes λ i λ j for every u, v ∈ B i ∩ B j and λ i λ j foreach uv ∈ δ ( B i − B j , B j − B i , B i ∩ B j ) . Here for disjoint node sets X , X , . . . , X p we deﬁne δ ( X , X , . . . , X p ) to be the set of edges which have endpoints in distinct sets from the X i ’s. Hencewe can express the coverage vector Cov for x ∗ in R ( n ) as: X i λ i · cov ( B i ) + X i

In this section, we analyze a modiﬁed version of the swap rounding algorithm (Algorithm 7) and weshow that it ﬁnds an integral solution which is an O (1 + σr ) -approximation of the initial fractionalsolution.First we deﬁne the following notation. d ( S ) = P { i,j }⊆ S d ( i, j ) and d ( S, S ′ ) = P i ∈ S P j ∈ S ′ d ( i, j ) and g ( S ) = P i ∈ S g ( i ) . The following result provides a decomposition of the multi-linear extensionof a quadratic function based on the convex decomposition of a point to the bases of the matroid. Lemma 13.

Let f ( S ) = P i ∈ S g ( i ) + P { i,j }⊆ S d ( i, j ) where g : [ n ] → R ≥ and d : [ n ] × [ n ] → R ≥ with d ( i, i ) = 0 for all i ∈ [ n ] . Let b ∈ R n be a vector such that b i = g ( i ) and A ∈ R n × n be a matrixsuch that A ij = d ( i, j ) . Then the multi-linear extension of f is F ( x ) = x T Ax + x T b . Moreover, if x = P pk =1 λ k I k for some scalars λ k ’s and subsets I k ⊆ [ n ] , then F ( x ) = p X k =1 λ k g ( I k ) + p X k =1 λ k d ( I k ) + p − X k =1 p X ℓ = k +1 λ k λ ℓ d ( I k , I ℓ ) . (37)52 roof. For the ﬁrst part of the lemma note that F ( x ) = X S ⊆ [ n ] f ( S ) Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k ) = X S ⊆ [ n ] ( g ( S ) + d ( S )) Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k )= X S ⊆ [ n ] ( X i ∈ S g ( i )) Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k ) + X S ⊆ [ n ] ( X { i,j }⊆ S d ( i, j )) Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k )= X i ∈ [ n ] g ( i ) X S ⊆ [ n ] i ∈ S ( Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k )) + X { i,j }⊆ [ n ] d ( i, j ) X S ⊆ [ n ] { i,j }⊆ S ( Y k ∈ S x k Y k ∈ [ n ] \ S (1 − x k ))= X i ∈ [ n ] g ( i ) x i X S ⊆ [ n ] − i ( Y k ∈ S x k Y k ∈ [ n ] − i \ S (1 − x k )) + X { i,j }⊆ [ n ] d ( i, j ) x i x j X S ⊆ [ n ] − i − j ( Y k ∈ S x k Y k ∈ [ n ] − i − j \ S (1 − x k ))= X i ∈ [ n ] g ( i ) x i + X { i,j }⊆ [ n ] d ( i, j ) x i x j = x T b + 12 x T Ax.

To see the second part, observe that b T x = b T ( X k λ k I k ) = X k λ k ( b T I k ) = X k λ k g ( I k ) , and x T Ax = ( p X k =1 λ k I k ) A ( p X ℓ =1 λ ℓ I ℓ ) = p X k,ℓ =1 λ k λ ℓ I k A I ℓ = p X k,ℓ =1 λ k λ ℓ d ( I k , I ℓ )= p X k =1 λ k d ( I k , I k ) + 2 X k<ℓ λ k λ ℓ d ( I k , I ℓ ) = 2 p X k =1 λ k d ( I k ) + 2 p − X k =1 p X ℓ = k +1 λ k λ ℓ d ( I k , I ℓ ) . Lemma 14.

Let M = ([ n ] , I ) be a matroid and P be its corresponding base polytope. Let F ( z ) = z T Az + z T b where A, b ≥ and A is a symmetric matrix such that its diagonal is zero. Let f ( S ) = F ( S ) for any S ⊆ [ n ] . Let x = P pi =1 λ i I i ∈ P where I i ’s are bases of the matroid, P pi =1 λ i = 1 , and λ i ≥ , for i = 1 , . . . , p . Let ( I ′ , M ) be the output of MergeBases (deﬁned inAlgorithm 7) on ( I , . . . , I p ) and ( λ , . . . , λ p ) . Let y = ( λ + λ ) I ′ + P pi =3 λ i I i . Then F ( x ) ≤ F ( y ) + λ λ P ( i,j ) ∈ M d ( i, j ) .Proof. Let I = I and I = I (the original inputs of the function). Let I m and I m be the resulting I and I after the m -th iteration of the while loop. Let x m = λ I m + λ I m + P pk =3 λ k I k .Let i m , j m be the elements we pick at the m -th iteration of the loop. We show that F ( x m − ) ≤ F ( x m )+ λ λ d ( i m , j m ) and this yields the desired result using a simple recursion argument. Withoutloss of generality, we assume g ( i m ) + λ d ( i m , I m − − i m ) + λ d ( i m , I m − − j m ) + p X k =3 λ k d ( i m , I k ) ≥ g ( j m ) + λ d ( j m , I m − − i m ) + λ d ( j m , I m − − j m ) + p X k =3 λ k d ( j m , I k ) (38)53e have F ( x m − ) = λ g ( I m − ) + λ g ( I m − ) + p X k =3 λ k g ( I k ) + λ d ( I m − ) + λ d ( I m − ) + p X k =3 λ k d ( I k )+ λ λ d ( I m − , I m − ) + λ p X k =3 λ k d ( I m − , I k ) + λ p X k =3 λ k d ( I m − , I k ) + p − X k =3 p X k ′ = k +1 λ k λ k ′ d ( I k , I k ′ )= λ g ( I m − ) + λ g ( I m − − j m ) + p X k =3 λ k g ( I k ) + λ d ( I m − ) + λ d ( I m − − j m ) + p X k =3 λ k d ( I k )+ λ λ d ( I m − , I m − − j m ) + λ p X k =3 λ k d ( I m − , I k ) + λ p X k =3 λ k d ( I m − − j m , I k )+ p − X k =3 p X k ′ = k +1 λ k λ k ′ d ( I k , I k ′ ) + λ g ( j m ) + λ d ( j m , I m − − j m ) + λ λ d ( j m , I m − − i m )+ λ p X k =3 λ k d ( j m , I k ) + λ λ d ( i m , j m ) ≤ λ g ( I m − ) + λ g ( I m − − j m ) + p X k =3 λ k g ( I k ) + λ d ( I m − ) + λ d ( I m − − j m ) + p X k =3 λ k d ( I k )+ λ λ d ( I m − , I m − − j m ) + λ p X k =3 λ k d ( I m − , I k ) + λ p X k =3 λ k d ( I m − − j m , I k )+ p − X k =3 p X k ′ = k +1 λ k λ k ′ d ( I k , I k ′ ) + λ g ( i m ) + λ d ( i m , I m − − j m ) + λ λ d ( i m , I m − − i m )+ λ p X k =3 λ k d ( i m , I k ) + λ λ d ( i m , j m )= λ g ( I m ) + λ g ( I m ) + p X k =3 λ k g ( I k ) + λ d ( I m ) + λ d ( I m ) + p X k =3 λ k d ( I k )+ λ λ d ( I m , I m ) + λ p X k =3 λ k d ( I m , I k ) + λ p X k =3 λ k d ( I m , I k )+ p − X k =3 p X k ′ = k +1 λ k λ k ′ d ( I k , I k ′ ) + λ λ d ( i m , j m ) = F ( x m ) + λ λ d ( i m , j m ) . The inequality holds because of (38), and the ﬁrst and the last equalities follow from Lemma 14.The second to the last equality uses that I m = I m − and I m = I m − − j m + i m . Theorem 18.

Let M ([ n ] , I ) be a matroid of rank r and P be its corresponding base polytope. Let F ( z ) = z T Az + z T b where A, b ≥ and A is a symmetric matrix with zero diagonal that satisﬁesthe σ -semi-metric inequality, i.e., A ij ≤ σ ( A ik + A jk ) . Let f ( S ) = F ( S ) for any S ⊆ [ n ] . Let x ∈ P and S be the output of the modiﬁed swap rounding (Algorithm 7) on x . Then F ( x ) ≤ O (1 + σr ) f ( S ) .Proof. Let x = P pi =1 λ i I i ∈ P where I i ’s are bases of the matroid, P pi =1 λ i = 1 , and λ i ≥ , for i = 1 , . . . , p . Let S be the output of the swap rounding (Algorithm 7) if it starts from ( I , . . . , I p ) ( λ , . . . , λ p ) . Let x k denote the vector corresponding to I k = ( I k ′ , I k +1 , . . . , I p ) and λ k =( λ k ′ , λ k +1 , . . . , λ p ) , i.e. x k = λ ′ k I ′ k + P pi = k +1 λ i I i . By Lemma 14, for k = 1 , . . . , n − , we have F ( x k ) ≤ F ( x k +1 ) + λ ′ k λ k +1 X ( i,j ) ∈ M k d ( i, j ) ≤ F ( x k +1 ) + λ ′ k λ k +1 X ( i,j ) ∈ M t d ( i, j ) , where t = arg max k =1 ,...,p − { P ( i,j ) ∈ M k d ( i, j ) } . Therefore F ( x ) ≤ F ( x p ) + ( p − X k =1 λ ′ k λ k +1 ) X ( i,j ) ∈ M t d ( i, j ) = F ( x p ) + ( p − X k =1 k X m =1 λ m λ k +1 ) X ( i,j ) ∈ M t d ( i, j ) ≤ F ( x p ) + 12 X ( i,j ) ∈ M t d ( i, j ) = f ( I ′ p ) + 12 X ( i,j ) ∈ M t d ( i, j ) , (39)where the last inequality holds since P p − k =1 P km =1 λ m λ k +1 ≤ ( P pk =1 λ k ) = 1 . Now, we bound theterm P ( i,j ) ∈ M t d ( i, j ) . By deﬁnition of M t , note that M t ⊆ I ′ t × I t +1 . Using this and Lemma 13 itfollows that X ( i,j ) ∈ M t d ( i, j ) ≤ d ( I ′ t , I t +1 ) ≤ · F ( 12 I ′ t + 12 I t +1 ) . (40)By Lemma 14 and the σ -semi-metric assumption, we also know that F ( 12 I ′ t + 12 I t +1 ) ≤ F ( I ∗ ) + 14 X ( i,j ) ∈ M ∗ d ( i, j ) ≤ F ( I ∗ ) + 14 X ( i,j ) ∈ M ∗ σr − (cid:0) d ( i, I ′ t − i ) + d ( j, I ′ t − i ) (cid:1) . (41)Note that none of the edges of M ∗ is present in the right hand side summation. Therefore X ( i,j ) ∈ M ∗ ( d ( i, I ′ t − i ) + d ( j, I ′ t − i )) ≤ d ( I ′ t ) + d ( I ′ t , I t +1 ) − X ( i,j ) ∈ M ∗ d ( i, j ) ≤ · F ( 12 I ′ t + 12 I t +1 ) − X ( i,j ) ∈ M ∗ d ( i, j ) ≤ F ( I ∗ ) = 4 f ( I ∗ ) . (42)where the second inequality follows from Lemma 13 and the last inequality holds because ofLemma 14. Combining (40), (41), and (42), we get X ( i,j ) ∈ M t d ( i, j ) ≤ (cid:0) σr − (cid:1) f ( I ∗ ) . (43)Hence, by (39) and (43), we have F ( x ) ≤ f ( I ′ p ) + (cid:16) σr − (cid:17) f ( I ∗ ) , and this yields the result. 55 lgorithm 7: Swap rounding for monotone second-order-modular functions under matroidconstraints Input:

A matroid M = ([ n ] , I ) , its base polytope P , and a fractional solution x ∈ P . A set function f ( S ) = P i ∈ S g ( i ) + P { i,j }⊆ S d ( i, j ) . Find λ = ( λ , λ , . . . , λ p ) and I = ( I , I , . . . , I p ) such that x = P pi =1 λ i I i , λ i ≥ (for any i ), P pi =1 λ i = 1 ,and I i ’s are bases of the matroid; I ′ ← I ; λ ′ ← λ ; for k = 1 , . . . , p − do ( I ′ k +1 , M k ) ← MergeBases( I k , λ k ) ; λ ′ k +1 ← λ ′ k + λ k +1 ; I k +1 ← ( I ′ k +1 , I k +2 , . . . , I p ) ; λ k +1 ← ( λ ′ k +1 , λ k +2 , . . . , λ p ) ; t ← arg max k =1 ,...,p − { P ( i,j ) ∈ M k d ( i, j ) } ; ( I ∗ , M ∗ ) ← MergeBases( ( I ′ t , I t +1 ) , (0 . , . ) ; return arg max { f ( I ∗ ) , f ( I ′ p ) } ; Function

MergeBases( I = ( I , I , . . . , I m ) , λ = ( λ , λ , . . . , λ m ) ) : M ← ∅ ; while I = I do Pick i ∈ I \ I and j ∈ I \ I such that I − i + j ∈ I and I − j + i ∈ I ; M ← M ∪ { ( i, j ) } ; if g ( i ) + λ d ( i, I − i ) + λ d ( i, I − j ) + P mk =3 λ k d ( i, I k ) ≥ g ( j ) + λ d ( j, I − i ) + λ d ( j, I − j ) + P mk =3 λ k d ( j, I k ) then I ← I − j + i ; else I ← I − i + j ; return ( I , M ) ; End Function

G.4 Pipage Rounding

In section G.1, we provide super-constant lower bounds for rounding discrete quadratics over ma-troids. In this section we show that for uniform matroids, there is a constant-factor roundingalgorithm even for the much more general class of second-order-submodular functions. We analyzethe pipage rounding algorithm (Algorithm 8) for this purpose.Recall that given a vector x ∈ [0 , n and i ∈ [ n ] , we denote by x − i the vector resulting fromsetting the ith coordinate of x to zero. That is, ( x − i ) j = x j for all j = i and ( x − i ) i = 0 . Lemma 15.

Let f be a set function and F be its multi-linear extension. Let x ∈ [0 , n and i = j ∈ [ n ] such that ∇ i F ( x − i − j ) ≥ ∇ j F ( x − i − j ) . Consider the vector y = x + ǫ ( e i − e j ) ,where e i denotes the characteristic vector of i ∈ [ n ] , and ǫ = min { x j , − x i } . That is, y k =  x i + ǫ = min { , x i + x j } , k = ix j − ǫ = max { , x i + x j − } , k = jx k , o.w. Then F ( y ) + max { , x i x j ∇ ij F ( x ) } ≥ F ( x ) . roof. For any z ∈ [0 , n , we have F ( z ) = X R ⊆ [ n ] f ( R ) Y v ∈ R z v Y v / ∈ R (1 − z v )= z i z j X R ⊆ [ n ] − i − j f ( R + i + j ) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ z i (1 − z j ) X R ⊆ [ n ] − i − j f ( R + i ) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ (1 − z i ) z j X R ⊆ [ n ] − i − j f ( R + j ) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ (1 − z i )(1 − z j ) X R ⊆ [ n ] − i − j f ( R ) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )= z i z j X R ⊆ [ n ] − i − j (cid:0) f ( R + i + j ) − f ( R + i ) − f ( R + j ) + f ( R ) (cid:1) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ z i X R ⊆ [ n ] − i − j (cid:0) f ( R + i ) − f ( R ) (cid:1) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ z j X R ⊆ [ n ] − i − j (cid:0) f ( R + j ) − f ( R ) (cid:1) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )+ X R ⊆ [ n ] − i − j f ( R ) Y v ∈ R z v Y v / ∈ R + i + j (1 − z v )= z i z j ∇ ij F ( z − i − j ) + z i ∇ i F ( z − i − j ) + z j ∇ j F ( z − i − j ) + F ( z − i − j ) . Note that x − i − j = y − i − j . Also, by deﬁnition of ǫ we have ǫ ≥ x j − x i , and hence y i y j = ( x i + ǫ )( x j − ǫ ) = x i x j + ǫ ( x j − x i − ǫ ) ≤ x i x j . It follows that F ( x ) = x i x j ∇ ij F ( x − i − j ) + x i ∇ i F ( x − i − j ) + x j ∇ j F ( x − i − j ) + F ( x − i − j )= x i x j ∇ ij F ( y − i − j ) + x i ∇ i F ( y − i − j ) + x j ∇ j F ( y − i − j ) + F ( y − i − j ) ≤ x i x j ∇ ij F ( y − i − j ) + y i ∇ i F ( y − i − j ) + y j ∇ j F ( y − i − j ) + F ( y − i − j )= ( x i x j − y i y j ) ∇ ij F ( y − i − j ) + y i y j ∇ ij F ( y − i − j ) + y i ∇ i F ( y − i − j )+ y j ∇ j F ( y − i − j ) + F ( y − i − j )= ( x i x j − y i y j ) ∇ ij F ( x − i − j ) + F ( y ) ≤ ( x i x j − y i y j ) max { , ∇ ij F ( x − i − j ) } + F ( y ) ≤ x i x j max { , ∇ ij F ( x − i − j ) } + F ( y )= max { , x i x j ∇ ij F ( x ) } + F ( y ) , where the ﬁrst inequality follows from the assumption ∇ i F ( x − i − j ) ≥ ∇ j F ( x − i − j ) , and thelast equality follows from ∇ ij F ( x − i − j ) = ∇ ij F ( x ) (see Lemma 4). Theorem 19.

Let f be a non-negative, monotone, second-order-submodular function and F be itsmulti-linear extension. Let x ∈ [0 , n such that || x || = k . Then Algorithm 8 ﬁnds S ⊆ [ n ] suchthat | S | = k and f ( S ) ≥ F ( x ) . lgorithm 8: Rounding second order modular functions under cardinality constraint Input:

A fractional solution x = ( x i ) ∈ [0 , n where P i ∈ [ n ] x i = k . while the sum of fractional coordinates of x is greater than 2 do x F ← fractional coordinates of x ; { i, j } ← arg min { q,q ′ }⊂ supp ( x F ) x q x q ′ ∇ qq ′ F ( x ) ; if x i + x j ≤ then if ∇ i F ( x − i − j ) ≥ ∇ j F ( x − i − j ) then x i ← x i + x j ; x j ← ; else x j ← x i + x j ; x i ← ; else if ∇ i F ( x − i − j ) ≥ ∇ j F ( x − i − j ) then x j ← x i + x j − ; x i ← ; else x i ← x i + x j − ; x j ← ; x F ← fractional coordinates of x ; x I ← integral coordinates of x ; { i, j } ← arg max { q,q ′ }⊂ supp ( x f ) ( d ( q, q ′ ) + g ( q ) + g ( q ′ ) + P v ∈ supp ( x I ) ( d ( q, v ) + d ( q ′ , v ))) ; S ← supp ( x I ) ; { i, j } ← arg max { q,q ′ }⊂ supp ( x F ) B q ( S ) + B q ′ ( S ) + A qq ′ ( S ) ; x i ← ; x j ← ; for q ∈ supp ( x F ) − i − j do x q ← ; return supp ( x ) ; Proof.

Let z ∈ [0 , n and z F be its fractional part (coordinates). Also let z ′ be z after one step ofpipage rounding algorithm (Algorithm 19). By Lemma 15, we have F ( z ) ≤ F ( z ′ ) + max { , z i z j ∇ ij F ( z ) } . (44)By second-order-submodularity, Lemma 12, and monotonicity, we have

12 ( z F ) T ∇ F ( z )( z F ) ≤

12 ( z F ) T ∇ F ( z F )( z F ) ≤ F ( z F ) ≤ F ( z ) . Therefore, z i z j ∇ ij F ( z ) = min { q,q ′ }⊂ supp ( z F ) z q z q ′ ∇ qq ′ F ( z ) ≤ (cid:0) | supp ( z F ) | (cid:1) F ( z ) Hence, by non-negativity of f , we have max { , z i z j ∇ ij F ( z ) } ≤ (cid:0) | supp ( z F ) | (cid:1) F ( z ) Using this and (44), we have (cid:0) | supp ( z F ) | (cid:1) − (cid:0) | supp ( z F ) | (cid:1) F ( z ) ≤ F ( z ′ ) (45)58et x be the initial vector in Algorithm 19 and x i +1 be the vector after i ’th iteration of theloop. Also, let n i = | supp ( x i ) | . If the loop iterates t times, we have n ≥ n > n > · · · > n t ≥ because in each iteration, the number of integral coordinate increases by at least , and || x t || > (the loop’s condition). By (45), for i = 1 , . . . , t , we have F ( x i +1 ) ≥ n i − n i − n i ( n i − F ( x i ) . Let x t +2 be theﬁnal vector in the algorithm (it is integral). We show that F ( x t +2 ) ≥ F ( x t +1 ) . Let x F be thefractional part of the x t +1 , x I be its integral part, S = supp ( x I ) , and { i, j } = arg max { q,q ′ }⊂ supp ( x f ) ( B q ( S ) + B q ′ ( S ) + A qq ′ ( S )) . Note that || x F || = 2 because the norm of the fractional part decreases by at most at any iterationand also it is always an integer. Therefore, because of the selection of i, j , we have ( X { q,q ′ }⊂ supp ( x F ) x q x q ′ )( B i ( S ) + B j ( S ) + A ij ( S )) ≥ X { q,q ′ }⊂ supp ( x F ) x q x q ′ ( B q ( S ) + B q ′ ( S ) + A qq ′ ( S ))= X { q,q ′ }⊂ supp ( x F ) x q x q ′ B q ( S ) + X { q,q ′ }⊂ supp ( x F ) x q x q ′ B q ′ ( S ) + X { q,q ′ }⊂ supp ( x F ) x q x q ′ A qq ′ ( S )= X q ∈ supp ( x F ) X q ′ ∈ supp ( x F ) q ′ = q x q x q ′ B q ( S ) + X { q,q ′ }⊂ supp ( x F ) x q x q ′ A qq ′ ( S )= X q ∈ supp ( x F ) x q (2 − x q ) B q ( S ) + X { q,q ′ }⊂ supp ( x F ) x q x q ′ A qq ′ ( S ) ≥ X q ∈ supp ( x F ) x q B q ( S ) + X { q,q ′ }⊂ supp ( x F ) x q x q ′ A qq ′ ( S )= ( x F ) T ∇ F ( x I ) + 12 ( x F ) T ∇ F ( x I )( x F ) The second inequality holds because || x f || = P q ∈ supp ( x F ) x q = 2 and x q is fractional, i.e., x q < .By the Lagrange multipliers’ method and the fact that P q ∈ supp ( x F ) x q = 2 , we can conclude that X { q,q }⊂ supp ( x F ) x q x q ′ ≤ , and the equality happens when all x q = 2 / ( | supp ( x F ) | ) . Using non-negativity and monotonicity of f , the Taylor’s theorem, the above inequalities, and Lemma 2, we have F ( x t +1 ) = F ( x I ) + ( x F ) T ∇ F ( x I ) + 12 ( x F ) T ∇ F ( x I )( x F ) ≤ F ( x I ) + ( X { q,q ′ }⊂ supp ( x F ) x q x q ′ )( B i ( S ) + B j ( S ) + A ij ( S ))= (2 − X { q,q ′ }⊂ supp ( x F ) x q x q ′ ) F ( x I ) + ( X { q,q ′ }⊂ supp ( x F ) x q x q ′ )( F ( x I ) + B i ( S ) + B j ( S ) + A ij ( S ))= (2 − X { q,q ′ }⊂ supp ( x F ) x q x q ′ ) F ( x I ) + ( X { q,q ′ }⊂ supp ( x F ) x q x q ′ ) F ( x I + { i,j } ) ≤ F ( x I + { i,j } ) = 2 F ( x t +2 )

59y the above inequalities, we have F ( x t +2 ) ≥ ( t Y i =1 (cid:0) n i (cid:1) − (cid:0) n i (cid:1) ) 12 F ( x ) ≥ ( n Y i =3 (cid:0) i (cid:1) − (cid:0) i (cid:1) ) 12 F ( x ) ≥ ( n Y i =3 i − i − i ( i −

1) ) 12 F ( x )= ( n Y i =3 ( i + 1)( i − i ( i −

1) ) 12 F ( x ) = n + 13( n −

1) ( n − Y i =3 ( i − i + 1)( i + 1)( i −

1) ) 12 F ( x )= n + 13( n −

1) 12 F ( x ) ≥ F ( x ) ..